Saturday, December 02, 2006

The Interpretation Of Statistical Tests

by Albert Frank

In this article, we adopt the following hypothesis: if the reliability of a dichotomous test is f, then the probability that it gives a wrong result is 1 − f.

The question then arises: below what reliability does a positive test result have less than a 0.5 probability of being correct?

Let P be the number of elements in the population, a the (known) probability that an element of this population has a given feature K, and f the reliability of the test. The number of K-elements correctly detected by the test equals a f P. The number of non-K elements wrongly detected is (1 − a)(1 − f) P. The probability that an element detected by the test is actually a K-element equals 0.5 when a f P = (1 − a)(1 − f) P; expanding gives a f = 1 − f − a + a f, which simplifies to f = 1 − a. So as soon as f < 1 − a, the test becomes nonsense: a positive result is more likely to be wrong than right.
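The calculation above can be sketched in a few lines of Python. The function name and signature are illustrative, not from the original article; it computes the probability that a detection is valid from the base rate a and the reliability f (note that P cancels out of the ratio).

```python
def prob_valid_detection(a, f):
    """Probability that a positive test result is a true positive,
    given base rate a and reliability f (error probability 1 - f)."""
    true_pos = a * f              # a f P, with P cancelled
    false_pos = (1 - a) * (1 - f)  # (1 - a)(1 - f) P, with P cancelled
    return true_pos / (true_pos + false_pos)

# At the break-even reliability f = 1 - a, the probability is exactly 0.5:
print(prob_valid_detection(0.01, 0.99))  # → 0.5
```

For a = 0.01, any reliability above 0.99 pushes the probability above 0.5, and any reliability below it pushes the probability below 0.5.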

The rarer the condition a test attempts to detect, the more reliable the test must be.

This simple fact is very often neglected.

Let's take an example: the breath alcohol test. Assume that one driver in 100 is at "0.8 or more" (the European norm for a serious offence is in excess of 0.8 g/l). In the following table, we examine, for several test reliabilities, the probability that somebody with a positive result is actually positive. We take a population of 100,000 persons, of whom 1,000 are assumed to be "at 0.8 or more."

Reliability of the test | Valid detections | Invalid detections | Probability a "detection" is valid
.9999                   | 1,000            | 10                 | 0.99
.999                    | 999              | 99                 | 0.91
.99                     | 990              | 990                | 0.50
.95                     | 950              | 4,950              | 0.16
.9                      | 900              | 9,900              | 0.08
.8                      | 800              | 19,800             | 0.04

One can imagine the dangers of such misinterpretations of tests in, for example, the medical field.

