### The Statistics of Stereotyping

by Fred Vaughan

In his article "The Interpretation of Statistical Tests," Albert
Frank provides formulas and examples of the errors one can fall
into when applying a test of known high reliability to determine
whether subjects taken at random qualify as members of a select
(and generally negatively perceived) group. This is the problem
with "racial profiling" that *"Renaissance"* identified in his
posted comment, and which many fail to understand properly.

Let us take as a given that a particular, readily identifiable racial type happens to be represented at a much higher frequency in some sort of crime or other. This could be theft, murder, terrorism, or whatever statistics provide convincing "justification." And let us suppose further that the statistics used are completely valid, such that, for example, although one race constitutes only 10% of the total population, its members who perpetrate said crime outnumber those of the majority racial type who do so. Why would profiling in such cases be unwarranted even (or especially) from a mathematical perspective?

Here's why.

Suppose that there is a test in place that can be applied to
individuals that is extremely reliable (defined as *f* as in
the original article) with regard to determining the culpability
of an individual having already committed (or who will in the
future commit) said crime. There is *nothing* in the
justification statement given above that has any direct bearing
on the appropriateness of implementing such a program. Although
those may be completely valid statistics, they are *not*
sufficient to determine the efficacy of a program which they
attempt to justify. The appropriate statistic is "what is the
probability that an individual of the subject race may commit
such a crime" — the parameter *a* in Albert Frank's article.
This number will always be small — much, much smaller than the
probability that an individual who has already committed the
crime is a member of the subject race. This may seem like a
subtle difference, but it is not!
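The difference can be made concrete with a short calculation. This is a minimal sketch using illustrative numbers (the same ones that appear in the worked example below); the variable names are mine, not those of either original article:

```python
# Illustrative numbers, matching the worked example in the article.
pop_A = 0.10    # group A is 10% of the population
c = 0.001       # 1 in 1,000 of the total population commit crime C
a_a = 0.6       # 60% of the perpetrators belong to group A

# The statistic usually quoted: P(member of A | committed the crime)
p_A_given_crime = a_a

# The statistic that actually matters: P(commits the crime | member of A)
p_crime_given_A = a_a * c / pop_A

print(p_A_given_crime)            # 0.6
print(round(p_crime_given_A, 6))  # 0.006
```

Quoting the first number as though it were the second overstates the apparent case for profiling by a factor of a hundred.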

Suppose the racial mix of a population is only 10% A and 90% B,
and that some percentage *a*<sub>a</sub> of those who commit
crime C are from the subset A. So far we have nothing to go on.
We need to know the percentage of the entire population who
commit crime C. Let *c* be that probability. Then we can
determine the likelihood of a member of A or B committing that
crime. If we define *a* and *b* as the probabilities that members
of A and B will, respectively, commit the crime, we can then
solve the problem using what we know as follows:

*a* × 0.1 + *b* × 0.9 = *c*, and

*a* × 0.1 / *c* = *a*<sub>a</sub>

Given *a*<sub>a</sub> (which is usually all that is given, and
which is usually insinuated as though it were *a* itself — which
it is *not*), and one of *a*, *b*, or *c*, we can determine the
effectiveness of profiling for a given reliability of testing.

Let's say by way of example that one in a thousand (*c* = 0.1%)
of the total population commit the crime. Then for
*a*<sub>a</sub> = 0.6 we would have:

*a* = *a*<sub>a</sub> × *c* / 0.1 = *a*<sub>a</sub> / 100 = 0.006, and *b* = (0.001 − 0.1 × *a*) / 0.9 ≈ 0.00044.
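This arithmetic can be checked by solving the article's two equations directly (a quick sketch; the variable names are mine):

```python
c = 0.001                # overall rate: 1 in 1,000 commit crime C
a_a = 0.6                # fraction of perpetrators who belong to A
pop_A, pop_B = 0.1, 0.9  # population shares of A and B

# From a * 0.1 / c = a_a:
a = a_a * c / pop_A          # probability a member of A commits the crime
# From a * 0.1 + b * 0.9 = c:
b = (c - a * pop_A) / pop_B  # probability a member of B commits the crime

print(round(a, 6))   # 0.006
print(round(b, 6))   # 0.000444
```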

Since in racial profiling the population is effectively reduced
to that of A rather than the much larger A + B, it is *a* (as
defined here) that corresponds to the same term in Albert
Frank's article. So if profiling is not to produce
law-enforcement nonsense, the reliability of any test applied to
an individual, once he is singled out, must be so good that 1
− *f* is much less than 0.006, or there will be as many or
more unlawful (false-positive) arrests as lawful ones. And in
law enforcement a reliability as high as even 0.9 (let alone the
required 0.994) is unheard of!
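To see how demanding that requirement is, one can tally the expected lawful and unlawful arrests per 100,000 members of A tested. This sketch assumes, as a simplification, that *f* is both the detection rate for the guilty and one minus the false-positive rate for the innocent; the function and its parameter names are mine, not Albert Frank's:

```python
def arrests_per_100k(f, a=0.006, n=100_000):
    """Expected lawful vs. unlawful arrests when n members of group A
    are tested: the guilty are flagged with probability f, and the
    innocent are wrongly flagged with probability 1 - f."""
    guilty = n * a
    innocent = n - guilty
    lawful = guilty * f            # true positives
    unlawful = innocent * (1 - f)  # false positives
    return lawful, unlawful

lawful, unlawful = arrests_per_100k(0.9)
print(round(lawful), round(unlawful))    # 540 9940

lawful, unlawful = arrests_per_100k(0.994)
print(round(lawful), round(unlawful))    # 596 596
```

Even at a reliability of 0.9, unlawful arrests outnumber lawful ones by nearly twenty to one; only at *f* ≈ 0.994 do the two break even.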

Therefore, quite aside from the impertinence of the practice, it is a very ineffective approach to fighting crime.
