Classifying New Cases Using Discriminant Analysis
We are often asked how to classify new cases based on a discriminant analysis. The first section of this note describes how SYSTAT classifies cases into classes internally. This is how classification is done in a file saved from a discriminant analysis, and it is how the columns GROUP and PREDICT are calculated.
The second section discusses how to use the discriminant function classification coefficients to classify a new observation. The third section describes a trick that makes SYSTAT classify new observations automatically. If you just want to classify some new cases, go directly to section 3.
Classification in a Discriminant Analysis
When SYSTAT uses discriminant analysis, it classifies cases into classes in the ‘standard’ way. Here is how that works in a little more detail.
First, suppose no prior probabilities are specified and there are n possible classes. To classify a case, SYSTAT first calculates D( 1 ), …, D( n ), the Mahalanobis distances from that case to the centroid of each of the classes. It then calculates the probability of the case being in class j in two steps. First it calculates:
R( j ) = exp( -.5 * D( j ) ^2 ).
Then it calculates:
P( j ) = R( j )/( R( 1 ) + … + R( n ) )
Finally, the program classifies a case into the class with the highest probability.
For example, if there is a case whose Mahalanobis distances are .5 to the centroid of the first group and 3 to the centroid of the second, we calculate:
R( 1 ) = exp( -.5 * D( 1 ) ^2 ) = exp(-.5*(.5)^2)=.88250
R( 2 ) = exp( -.5 * D( 2 ) ^2 ) = exp(-.5*3^2)=.01111
Then P(1), the probability of this case being in the first group, is:
P( 1 )= R( 1 )/( R( 1 )+R( 2 ) ) = .88250/(.88250+.01111) = .98757
P( 2 )= R( 2 )/( R( 1 )+R( 2 ) ) = .01111/(.88250+.01111) = .01243
Since the higher probability is for the first group, the discriminant analysis classifies this case as being in group 1.
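The arithmetic above can be checked with a short script. Here is a minimal sketch in Python (Python is used only to verify the numbers; the computation SYSTAT performs internally is equivalent):

```python
import math

def posterior_probs(distances):
    """Posterior class probabilities from Mahalanobis distances,
    assuming equal prior probabilities."""
    r = [math.exp(-0.5 * d ** 2) for d in distances]
    total = sum(r)
    return [rj / total for rj in r]

# Distances .5 and 3 from the example above.
p = posterior_probs([0.5, 3.0])
print(round(p[0], 5))  # 0.98757
print(round(p[1], 5))  # 0.01243
```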
The above is the case for equal prior probabilities. Sometimes it is known that the classes do not occur with uniform frequency, and it is worthwhile to specify a prior probability distribution in the analysis. That is to say, we know that the classes occur with relative frequencies Q( 1 ), Q( 2 ), …, Q( n ), and these frequencies may not be equal. In this case, the above formula is modified to be:
P( j ) = Q( j )R( j )/( Q( 1 )R( 1 ) + … + Q( n )R( n ) )
For example, in the case above, suppose we know that the two classes have prior probabilities of .2 and .8 respectively. Then we have:
P( 1 ) = .2*R( 1 ) / ( .2*R( 1 ) + .8*R( 2 ) )
= (.2*.88250)/( .2*.88250 + .8*.01111) = .95206
P( 2 ) = .8*R( 2 ) / ( .2*R( 1 ) + .8*R( 2 ) )
= (.8*.01111)/( .2*.88250 + .8*.01111) = .04794
Thus, the case is still classified in the first class. With different prior probabilities, however, the classification could change.
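The prior-weighted formula can be checked the same way. Here is a minimal Python sketch (not SYSTAT output) using the distances and priors from the example:

```python
import math

def posterior_probs(distances, priors):
    """Posterior class probabilities from Mahalanobis distances
    and prior probabilities Q( j )."""
    qr = [q * math.exp(-0.5 * d ** 2) for d, q in zip(distances, priors)]
    total = sum(qr)
    return [x / total for x in qr]

# Same distances as before, with priors .2 and .8.
p = posterior_probs([0.5, 3.0], [0.2, 0.8])
print(round(p[0], 5))  # 0.95206
print(round(p[1], 5))  # 0.04794
```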
You can find the theoretical basis behind this classification procedure in section 6.2 of Anderson’s book.
Group Classification Function Coefficients
Group classification function coefficients in discriminant analysis are used to classify new cases. The idea is this. Suppose you have a case that has not been classified. It will have observations on all of the continuous variables of the discriminant analysis and, from those observations, it should be possible to classify it in one of the given classes. So, pick one of the group classification functions and its associated constant.
Multiply each of the observations by its associated coefficient, add up the products, and add the constant. Do this for each of the group classification functions. Then classify the new observation into the class whose function has the highest value.
The above is fairly complicated and an example might help. Below is an example from page 295 of Seber’s book on multivariate analysis. It concerns two species of ‘flea beetles’ – Haltica oleracea L. and Haltica carduorum Guer.
Run the following command sequence to create the data file for the example:
INPUT X1 X2 X3 X4 BUG
189 245 137 163 1
192 260 132 217 1
217 276 141 192 1
221 299 142 213 1
171 239 128 158 1
192 262 147 173 1
213 278 136 201 1
192 255 128 185 1
170 244 128 192 1
201 276 146 186 1
195 242 128 192 1
205 263 147 192 1
180 252 121 167 1
192 283 138 183 1
200 294 138 188 1
192 277 150 177 1
200 287 136 173 1
181 255 146 183 1
192 287 141 198 1
181 305 184 209 2
158 237 133 188 2
184 300 166 231 2
171 273 162 213 2
181 297 163 224 2
181 308 160 223 2
177 301 166 221 2
198 308 141 197 2
180 286 146 214 2
177 299 171 192 2
176 317 166 213 2
192 312 166 209 2
176 285 141 200 2
169 287 162 214 2
164 265 147 192 2
181 308 157 204 2
192 276 154 209 2
181 278 149 235 2
175 271 140 192 2
197 303 170 205 2
Run a discriminant analysis on these data:
MODEL BUG = X1 X2 X3 X4
PRINT NONE / FMATRIX FSTATS EIGEN CMEANS SUM MEANS WILKS CFUNC TRACES CDFUNC,
SCDFUNC CLASS JCLASS
(You can, of course, use the Statistics->Classification->Discriminant Analysis dialog box to estimate the model. If you do, be sure to use the Statistics button in the dialog to request the results listed in the PRINT command above.) Among the results, you will see the following constants and coefficients for the group classification functions:
             Group 1    Group 2
Constant    -178.309   -194.114
X1             0.956      0.610
X2            -0.021      0.110
X3             0.684      0.791
X4             0.435      0.579
Now, suppose we have a new bug with measurements X1=200, X2=260, X3=140, X4=170. Compute the value of each group's classification function for it:
For group 1: (0.956)*200+(-0.021)*260+(0.684)*140+(0.435)*170-178.309 = 177.141
For group 2: (0.610)*200+( 0.110)*260+(0.791)*140+(0.579)*170-194.114 = 165.656
The new observation should be classified in group 1, since that function has the larger value.
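The two function evaluations above are easy to script. Here is a minimal Python sketch (not SYSTAT) using the coefficients SYSTAT prints for this example:

```python
# Coefficients and constants printed by SYSTAT for the flea beetle example.
COEFS = {
    1: ([0.956, -0.021, 0.684, 0.435], -178.309),
    2: ([0.610,  0.110, 0.791, 0.579], -194.114),
}

def score(group, case):
    """Value of one group's classification function for a case."""
    coefs, constant = COEFS[group]
    return sum(c * x for c, x in zip(coefs, case)) + constant

new_bug = [200, 260, 140, 170]
print(round(score(1, new_bug), 3))  # 177.141
print(round(score(2, new_bug), 3))  # 165.656
```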
Sometimes people want ‘Fisher’s linear discriminant function.’ You get that by subtracting column 2 from column 1 and constant 2 from constant 1. That will get you coefficients for a linear function.
If you plug the values from a new observation into this function, then you should classify the new observation into group 1 if the value is greater than zero and into group 2 if it is less than zero.
If you think about it for a second, this rule is the same as the rule illustrated above. In the case from Seber, if you subtract column 2 from column 1 and constant 2 from constant 1, you will get the linear discriminant function on page 296 of his book.
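The column subtraction can be scripted as well. A minimal Python sketch, using the group classification coefficients from the flea beetle example above:

```python
# Group classification coefficients and constants from the example.
g1 = [0.956, -0.021, 0.684, 0.435]
g2 = [0.610,  0.110, 0.791, 0.579]
c1, c2 = -178.309, -194.114

# Fisher's linear discriminant function: column 1 minus column 2,
# and constant 1 minus constant 2.
fisher = [a - b for a, b in zip(g1, g2)]
constant = c1 - c2

new_bug = [200, 260, 140, 170]
value = sum(f * x for f, x in zip(fisher, new_bug)) + constant
print(round(value, 3))        # 11.485
print(1 if value > 0 else 2)  # 1
```

Note that 11.485 is exactly 177.141 - 165.656, so the sign test gives the same answer as comparing the two classification functions directly.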
The output from the example above includes the canonical discriminant functions. These values can be used in a manner similar to the Fisher coefficients to derive a linear classification function.
If you look at Mardia, Kent and Bibby’s book, on page 311 they have an example of discriminant analysis that uses a slight variation on the IRIS discriminant analysis of the SYSTAT manual. They have a slightly different viewpoint on classification functions, but, in the end, the classification functions they use agree with SYSTAT’s.
Automatically Classifying New Cases
Suppose you have a discriminant analysis that you have run successfully and you wish to classify some new cases that were not part of the original data set. There is a way to use the SYSTAT DISCRIM procedure to classify a number of cases automatically.
First, add a new variable to your data file, called COUNT. Set COUNT to 1 for all cases, using the LET command or the Data->Transform->Let dialog box.
Second, add the new cases to the end of your data file. You won’t necessarily know in which category to put the new cases, so you can enter an arbitrary classification or none at all. If you choose to enter an arbitrary classification, just make sure it is one of the classifications or categories of your original data.
Third, and most important, set the variable COUNT to 0 for all the new cases. You will use COUNT as a FREQUENCY variable that, in effect, tells SYSTAT which cases to use in estimating the classification coefficients. Finally, run a discriminant analysis, but save the results to a file and request the table of Mahalanobis distances and posterior probabilities for each case.
Once you’ve added the variable COUNT, added the new cases and set their COUNT value to 0, use the following commands to run the discriminant analysis and save the results to a file:
FREQUENCY = COUNT
SAVE DISCRIM.SYD / SCORES,DATA
MODEL BUG = X1 X2 X3 X4
PRINT NONE / MAHAL
After the command FREQUENCY=COUNT, those cases (your original cases) that have COUNT=1 are used to estimate the model, but a predicted class is saved for all cases in the file DISCRIM.SYD. As a result, your new cases, where COUNT=0, will now have a predicted class in that file.
If you examine the statistical output from the discriminant analysis, the table will show a posterior probability for membership in each class for each case. In the file DISCRIM.SYD, SYSTAT has assigned each case a predicted class, in the variable PREDICTD, that matches the class for which the case shows the highest probability.
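Outside SYSTAT, the same predicted classes can be reproduced by applying the group classification functions to every case, which is what the saved predicted-class column amounts to. A minimal Python sketch (not SYSTAT itself; the coefficients are those printed for the flea beetle example):

```python
# Classification functions from the flea beetle example.
FUNCS = {
    1: ([0.956, -0.021, 0.684, 0.435], -178.309),
    2: ([0.610,  0.110, 0.791, 0.579], -194.114),
}

def predict(case):
    """Predicted class: the group whose classification
    function has the highest value for this case."""
    def score(group):
        coefs, constant = FUNCS[group]
        return sum(c * x for c, x in zip(coefs, case)) + constant
    return max(FUNCS, key=score)

# Two cases from the original data plus the unclassified new bug.
cases = [[189, 245, 137, 163],   # an oleracea beetle (group 1)
         [181, 305, 184, 209],   # a carduorum beetle (group 2)
         [200, 260, 140, 170]]   # the new bug
print([predict(c) for c in cases])  # [1, 2, 1]
```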
- Anderson, T. W., An Introduction to Multivariate Statistical Analysis, Second Edition, John Wiley and Sons, New York, 1984, ISBN 0-471-88987-3
- Mardia, K. V., Kent, J. T. and Bibby, J. M., Multivariate Analysis, Academic Press, New York, 1979
- Seber, G. A. F., Multivariate Observations, John Wiley and Sons, New York, 1984