### The Interpretation Of Statistical Tests

by Albert Frank

In this article, we adopt the following hypothesis: if the
reliability of a dichotomous (yes/no) test is *f*, then the
probability that it gives a wrong result is
1-*f*.

The following question arises: below what reliability does a positive test result have a probability of less than 0.5 of being correct?

Let P be the number of elements in the population, *a* the
(known) probability that an element of this population
has a given feature K, and *f* the reliability of the
test. The number of K-elements correctly detected by the test equals *a
f* P. The number of non-K elements detected (wrongly) is
(1-*a*) (1-*f*) P.
The probability that an element detected by the test is
actually a K-element equals 0.5 when *a f* P =
(1-*a*) (1-*f*) P,
which is equivalent to *f* = 1-*a*. So, as
soon as *f* ≤ 1-*a*, a positive result is no more
likely to be right than a coin toss: the test becomes
nonsense.
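The derivation above can be sketched in a few lines of Python. This is a minimal illustration, assuming (as the article's hypothesis states) that the test's error rate is the same for K and non-K elements, i.e. sensitivity = specificity = *f*; the function name is ours, not the author's.

```python
def prob_detection_valid(a, f):
    """Probability that a positive test result is a true positive.

    a: base rate of feature K in the population
    f: reliability of the test (assumed symmetric, per the article)
    """
    valid = a * f                 # fraction correctly detected as K
    invalid = (1 - a) * (1 - f)   # fraction of non-K wrongly detected
    return valid / (valid + invalid)

# Break-even point: with f = 1 - a, a positive result is a coin flip.
assert abs(prob_detection_valid(0.01, 0.99) - 0.5) < 1e-12
```

Note that the population size P cancels out, which is why the function needs only the two probabilities.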

The rarer the thing a test attempts to detect, the more reliable the test must be.

This simple fact is very often neglected.

Let's take an example: the alcohol test. We assume that one driver out of 100 is at "0.8 or more" (the European norm for a serious offence is in excess of 0.8 g/l). In the following table, we examine, for several reliabilities of the test, the probability that somebody with a positive test is actually positive. We take a population of 100,000 persons, of whom 1,000 are supposed to be "at 0.8 or more."

| Reliability of the test | Valid detections | Invalid detections | Probability a "detection" is valid |
|---|---|---|---|
| 0.9999 | 1,000 | 10 | 0.99 |
| 0.999 | 999 | 99 | 0.91 |
| 0.99 | 990 | 990 | 0.50 |
| 0.95 | 950 | 4,950 | 0.16 |
| 0.9 | 900 | 9,900 | 0.08 |
| 0.8 | 800 | 19,800 | 0.04 |
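The table can be reproduced directly from the definitions above. A short sketch (the helper function and formatting are ours):

```python
def detections(population, base_rate, f):
    """Return (valid detections, invalid detections, P(detection is valid))."""
    k = population * base_rate              # elements that really are "at 0.8 or more"
    valid = f * k                           # correctly detected
    invalid = (1 - f) * (population - k)    # wrongly detected non-K elements
    return valid, invalid, valid / (valid + invalid)

# Reproduce the article's table: 100,000 drivers, 1% at "0.8 or more".
for f in (0.9999, 0.999, 0.99, 0.95, 0.9, 0.8):
    valid, invalid, p = detections(100_000, 0.01, f)
    print(f"{f:<8} {valid:>8,.0f} {invalid:>10,.0f} {p:>6.2f}")
```

Running it prints the same six rows, confirming that at a reliability of 0.99 a "detection" is already only a coin flip.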

We can imagine the dangers of bad interpretations of tests in, for example, the medical field.

## 1 comment:

It's not just medical tests that can result in incorrect interpretations.

The same applies to tests that try to identify someone as a terrorist, for example airport passenger screening, data-mining citizen call records, or random roving wiretaps. In each case, the value of P(terrorist) is known to be small ("optimistically" 1,000 out of 300 million, or about 0.00033%, but more likely smaller) and the value of f(test) is either uncertain or not very high (optimistically on the order of 80-90%, but probably worse). The chances of false positives swamp the likelihood of ever detecting true positives. Further, false positives themselves decrease f(test) because of the "boy who cried wolf" effect. As the cost of imposing such tests is demonstrably high (GDP, liberties, etc.), the question arises: why do we use them, and how deluded are we in believing in their efficacy? The cost-benefit is dubious at best.
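Plugging the commenter's illustrative figures into the article's formula makes the point concrete. The numbers below (1,000 terrorists, a population of 300 million, reliability 0.9) are the comment's hypothetical assumptions, not measured values:

```python
# Hypothetical figures from the comment above.
population = 300_000_000
terrorists = 1_000
f = 0.9  # assumed test reliability, optimistic per the comment

true_positives = f * terrorists                        # 900 people
false_positives = (1 - f) * (population - terrorists)  # ~30 million people
p_valid = true_positives / (true_positives + false_positives)

print(f"People flagged: {true_positives + false_positives:,.0f}")
print(f"P(flagged person is a terrorist) = {p_valid:.6f}")
```

With these assumptions, roughly 30 million innocent people would be flagged for every 900 true positives, so the probability that a flagged person is actually a terrorist is about 0.00003.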

This obvious result is what leads some to wonder whether the real target isn't terrorists but rather anyone who disagrees with Administration and/or GOP ideology. The P value in that case would put such tests into the effective range.
