The Interpretation Of Statistical Tests
by Albert Frank
In this article, we assume the following hypothesis: if the reliability of a dichotomous (yes/no) test is f, then the probability that it gives a wrong result is 1 - f.
The following question arises: below what reliability does a positive test result have less than a 0.5 probability of being correct?
Let P be the number of elements in the population, a the (known) probability that an element of this population has a certain feature K, and f the reliability of the test. The number of K-elements detected by the test is a f P. The number of non-K elements wrongly flagged is (1-a)(1-f) P. The probability that an element flagged by the test really is a K-element equals 0.5 when a f P = (1-a)(1-f) P; expanding the right-hand side gives a f = 1 - a - f + a f, hence f = 1 - a. So as soon as the error rate 1 - f reaches the prevalence a (that is, as soon as f ≤ 1 - a), a positive result is no more likely to be right than wrong, and the test becomes nonsense.
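To make this concrete, here is a minimal sketch in Python (my own illustration, not part of the original article; the function name prob_detection_valid is hypothetical) that computes the probability that a flagged element really is a K-element, under the model above in which the reliability f is the probability of a correct result for K and non-K elements alike.

```python
def prob_detection_valid(a, f):
    """Probability that an element flagged by the test really has feature K.

    a : prevalence of feature K in the population
    f : reliability of the test (probability of a correct result,
        assumed identical for K and non-K elements)
    """
    true_positives = a * f                # K-elements correctly detected (per unit of population)
    false_positives = (1 - a) * (1 - f)   # non-K elements wrongly flagged
    return true_positives / (true_positives + false_positives)


# Break-even point: with f = 1 - a the probability is exactly 0.5.
print(prob_detection_valid(0.01, 0.99))  # -> 0.5
```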
The rarer the feature a test tries to detect, the more reliable the test must be.
This simple fact is very often neglected.
Let's take an example: the roadside alcohol test. We assume that one driver out of 100 is at "0.8 or more" (the European threshold for a serious offence is a blood alcohol level in excess of 0.8 g/l). In the following table we examine, for several reliabilities of the test, the probability that somebody with a positive result actually is at 0.8 or more. We take a population of 100,000 drivers, of whom 1,000 are assumed to be "at 0.8 or more."
Reliability of the test | Valid detections (true positives) | Invalid detections (false positives) | Probability a detection is valid |
---|---|---|---|
.9999 | 1,000 | 10 | 0.99 |
.999 | 999 | 99 | 0.91 |
.99 | 990 | 990 | 0.50 |
.95 | 950 | 4,950 | 0.16 |
.9 | 900 | 9,900 | 0.08 |
.8 | 800 | 19,800 | 0.04 |
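As a cross-check, the rows of the table can be reproduced with a short Python script (again a sketch of my own under the same model, not the article's code), using P = 100,000 and a = 0.01:

```python
# Reproduce the table: P = 100,000 drivers, a = 0.01 prevalence of "0.8 or more".
P, a = 100_000, 0.01

for f in (0.9999, 0.999, 0.99, 0.95, 0.9, 0.8):
    valid = a * f * P                  # true positives
    invalid = (1 - a) * (1 - f) * P    # false positives
    prob = valid / (valid + invalid)
    print(f"{f:<7} {valid:>8,.0f} {invalid:>8,.0f} {prob:>6.2f}")
```

The printed counts match the table above once rounded to whole persons.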
One can imagine the dangers of such misinterpretations of tests in, for example, the medical field, where the condition being screened for is often rare.