Relationships Between Component Loadings & Factor Scores in SYSTAT

 

If you perform a principal components analysis on a correlation matrix in SYSTAT, there are a number of relationships between component loadings, factor scores and eigenvalues that might be of interest in your analysis. You can use SYSTAT’s MATRIX module to explore these relationships.

As an example, estimate a principal components model on the file USDATA.SYD, one of the sample data files included with SYSTAT. The variables SPIRITS, WINE and BEER contain per capita consumption for each of the United States. The principal components analysis yields the following output:

Matrix to be factored

           SPIRITS    WINE    BEER

SPIRITS      1.000
WINE         0.731   1.000
BEER         0.625   0.484   1.000

Latent Roots (Eigenvalues)

1 2 3

2.232 0.527 0.241

Empirical upper bound for the first Eigenvalue = 2.3558.

Chi-Square Test that all Eigenvalues are Equal, N = 50
CSQ = 60.7642 P = 0.0000 df = 3.00

Component loadings

                 1       2       3

SPIRITS      0.919   0.109  -0.380
WINE         0.861   0.425   0.280
BEER         0.804  -0.579   0.134

Variance Explained by Components

1 2 3

2.232 0.527 0.241

Percent of Total Variance Explained

1 2 3

74.400 17.580 8.020

Differences: Original Minus Fitted Correlations or Covariances

           SPIRITS    WINE    BEER
SPIRITS      0.000
WINE         0.000   0.000
BEER         0.000   0.000   0.000

NOTE: To include the correlation matrix at the head of your results, you need to request MEDIUM or LONG output. Ordinarily you would do this in the Edit > Options dialog box, but the Statistics > Data Reduction > Factor Analysis dialog box always issues the command PRINT SHORT, so you will need to run the principal components analysis using commands. For this example, the commands are:

FACTOR
USE usdata
PRINT MEDIUM
MODEL spirits wine beer
ESTIMATE / METHOD=PCA LISTWISE CORR NUMBER=3

Copy these commands to the SYSTAT Command window and use one of the Submit options to run them.

Using these results, we can explore the relationships between loadings, scores and eigenvalues.

(At the end of this document there is a SYSTAT command file you can use to verify the operations described here.)

1) The squared principal component scores for each component sum to N-1, where N is the number of cases in the analysis. You can save the principal component scores for each case to a SYSTAT data file: in the Factor Analysis dialog box, click on the Save button and select the “Factor scores” radio button.

You will be prompted for a file name when you click on OK in the Factor Analysis dialog box. In the MATRIX module, open the file of saved principal component scores and square all of the values, then take the sum of each column. For the example above, N = 50, so each column sums to 49.
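The N-1 total is what you would expect if each column of saved scores is standardized to mean 0 and sample standard deviation 1. A minimal Python sketch of that arithmetic, run outside SYSTAT on made-up numbers (the list `raw` is hypothetical and is not taken from USDATA), behaves the same way:

```python
from statistics import mean, stdev

# Hypothetical values standing in for one column of scores (not USDATA).
raw = [2.0, 3.5, 1.0, 4.25, 2.75, 3.0]

# Standardize using the sample standard deviation (divisor N - 1).
m, s = mean(raw), stdev(raw)
z = [(x - m) / s for x in raw]

# The squared standardized values sum to N - 1.
sum_sq = sum(v * v for v in z)
print(round(sum_sq, 6), len(raw) - 1)
```

The same identity holds for any column that has been standardized this way, whatever the original values were.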

2) The squared component loadings for each component sum to that component’s eigenvalue. Save the component loadings by clicking on the Save button in the Factor Analysis dialog and selecting the “Factor loadings” radio button. In the MATRIX module, open the file of saved component loadings.

Each row of the saved file contains the loadings for the corresponding column of the Factor Analysis output. (That is, case 1 contains column 1 of the component loadings from the printed results.) Transpose the matrix to match the layout of the component loadings in the Factor Analysis output, square the values of all cells and then sum the columns. The results will match the eigenvalues listed in the Factor Analysis results.
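As a quick cross-check outside SYSTAT, the same column sums can be computed from the three-decimal loadings printed in the output. This is only a sketch in Python; because the inputs are rounded, the sums agree with the printed eigenvalues to about the third decimal place rather than exactly.

```python
# Component loadings as printed in the Factor Analysis output
# (rows: SPIRITS, WINE, BEER; columns: components 1-3).
loadings = [
    [0.919,  0.109, -0.380],
    [0.861,  0.425,  0.280],
    [0.804, -0.579,  0.134],
]
eigenvalues = [2.232, 0.527, 0.241]  # printed latent roots

# Square every loading and sum down each column.
col_sums = [sum(row[j] ** 2 for row in loadings) for j in range(3)]

for j, (total, root) in enumerate(zip(col_sums, eigenvalues), start=1):
    print(f"component {j}: sum of squares = {total:.3f}, eigenvalue = {root:.3f}")
```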

3) The inverse of the component loadings matrix is the transpose of the matrix of factor score coefficients. That is, if you take the file of saved component loadings and transpose it in the MATRIX module to get the component loadings matrix, you will have the inverse of the transposed factor score coefficient matrix. If you then transpose and invert the component loadings matrix, you will have the factor score coefficient matrix itself.
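The inversion can also be checked by hand outside SYSTAT. The Python sketch below inverts the printed (rounded) loadings with the standard 3x3 cofactor formula and transposes the result; the helper name `inverse3` is introduced here purely for illustration. With three-decimal inputs, the result should match the factor score coefficients reported by SYSTAT only to within rounding.

```python
def inverse3(m):
    """Invert a 3x3 matrix with the adjugate (cofactor) formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]

# Printed loadings (rows: SPIRITS, WINE, BEER; columns: components 1-3).
loadings = [
    [0.919,  0.109, -0.380],
    [0.861,  0.425,  0.280],
    [0.804, -0.579,  0.134],
]

inv = inverse3(loadings)

# Transposing the inverse puts variables in rows and components in columns,
# the same layout as the factor score coefficient matrix.
coefficients = [[inv[j][i] for j in range(3)] for i in range(3)]

for name, row in zip(("SPIRITS", "WINE", "BEER"), coefficients):
    print(f"{name:<8}" + "".join(f"{v:8.3f}" for v in row))
```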

4) The sum of the squares of each column of the factor score coefficient matrix is the reciprocal of the eigenvalue corresponding to that column. If you’ve derived the factor score coefficient matrix in the preceding step, square the values of the matrix and sum the columns. The results will be the reciprocals of the eigenvalues printed in the Factor Analysis output.
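A short Python sketch, using the three-decimal factor score coefficients that SYSTAT reports for this example, illustrates the reciprocal relationship. Treat it as an approximate check only: the inputs are rounded, so the recovered eigenvalues differ slightly from the printed ones in the third decimal place.

```python
# Factor score coefficients as reported by SYSTAT for this example
# (rows: SPIRITS, WINE, BEER; columns: components 1-3).
coefficients = [
    [0.412,  0.206, -1.579],
    [0.386,  0.806,  1.163],
    [0.360, -1.097,  0.558],
]
eigenvalues = [2.232, 0.527, 0.241]  # printed latent roots

# Sum of squared coefficients in each column, then its reciprocal.
col_sums = [sum(row[j] ** 2 for row in coefficients) for j in range(3)]
recovered = [1.0 / total for total in col_sums]

for j, (value, root) in enumerate(zip(recovered, eigenvalues), start=1):
    print(f"component {j}: 1 / sum of squares = {value:.3f}, eigenvalue = {root:.3f}")
```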

5) Many people refer to the principal components as the eigenvectors. But, in fact, the columns of component loadings are the eigenvectors of the correlation matrix multiplied by the square roots of the eigenvalues of the matrix. Using USDATA, generate the PEARSON correlation matrix for the variables SPIRITS, WINE and BEER; save the correlation matrix to a data file.

In MATRIX, create a full matrix from the triangular matrix saved from the CORR module (use the FOLD function), and then use the CALL command to run the EIGEN procedure, generating a matrix of eigenvalues and eigenvectors. Note that the eigenvalues for the correlation matrix are identical to those reported in the Factor Analysis output, but the eigenvectors are not the same as the component loadings.

Next, still in the MATRIX module, take the square root of the eigenvalues and multiply the matrix of eigenvectors by these values. The resulting matrix will contain the values of the component loadings. The signs of the component loadings in the Factor Analysis output and as calculated in MATRIX may be different; don’t be alarmed, because the sign of an eigenvector is arbitrary.
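The relationship can be run in reverse as a sanity check outside SYSTAT: dividing each column of printed loadings by the square root of its eigenvalue should give a unit-length vector v that satisfies Rv = (eigenvalue)v for the printed correlation matrix R. The Python sketch below does this with the rounded output values, so the agreement is approximate.

```python
from math import sqrt

# Printed correlation matrix, filled out to a full symmetric matrix.
corr = [
    [1.000, 0.731, 0.625],
    [0.731, 1.000, 0.484],
    [0.625, 0.484, 1.000],
]
# Printed loadings (rows: SPIRITS, WINE, BEER; columns: components 1-3).
loadings = [
    [0.919,  0.109, -0.380],
    [0.861,  0.425,  0.280],
    [0.804, -0.579,  0.134],
]
eigenvalues = [2.232, 0.527, 0.241]

lengths, residuals = [], []
for j, root in enumerate(eigenvalues):
    # Undo the scaling: loading column / sqrt(eigenvalue) = eigenvector.
    vec = [row[j] / sqrt(root) for row in loadings]
    lengths.append(sqrt(sum(v * v for v in vec)))
    # For an eigenvector, corr * vec equals root * vec.
    rv = [sum(corr[i][k] * vec[k] for k in range(3)) for i in range(3)]
    residuals.append(max(abs(rv[i] - root * vec[i]) for i in range(3)))
    print(f"component {j + 1}: length = {lengths[-1]:.3f}, "
          f"largest |Rv - root*v| = {residuals[-1]:.4f}")
```

Flipping the sign of every element of `vec` leaves both checks unchanged, which is why the sign of a column of loadings carries no meaning.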

Deriving a Factor Score For Each Case

You can use factor score coefficients generated during the principal components analysis to derive a factor score for each case in your original data file. To obtain the factor score coefficients, use the Save button in the Factor Analysis dialog box and select the “Factor coefficients” radio button. You will be asked to specify a file name for the saved results after you click on OK in the Factor Analysis dialog box. Once the analysis is completed, open the file of saved results to view the matrix of factor score coefficients.

If you standardize the variables WINE, BEER and SPIRITS and consider one particular case, then the factor score of that case on FACTOR(1) is obtained by multiplying the standardized scores by the coefficients for FACTOR(1) and then adding them up. The factor score coefficients for the sample data are:

FACTOR SCORE COEFFICIENTS

                 1       2       3

SPIRITS      0.412   0.206  -1.579
WINE         0.386   0.806   1.163
BEER         0.360  -1.097   0.558

The state of Rhode Island has standardized scores of:

 SPIRITS    WINE    BEER

  0.1641  1.1228 -0.0348

If we then calculate:

.412*.1641+.386*1.1228+.360*(-.0348)

we get .488. (There may be rounding error.) This is the factor score for Rhode Island on FACTOR(1).
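The same weighted sum can be written as a short Python sketch, using the FACTOR(1) coefficients and the standardized Rhode Island scores from the tables in this section. The variable names are illustrative only.

```python
# FACTOR(1) coefficients from the factor score coefficient table.
coef_factor1 = {"SPIRITS": 0.412, "WINE": 0.386, "BEER": 0.360}

# Standardized scores for Rhode Island.
z_rhode_island = {"SPIRITS": 0.1641, "WINE": 1.1228, "BEER": -0.0348}

# Factor score = sum over variables of coefficient * standardized score.
score = sum(coef_factor1[v] * z_rhode_island[v] for v in coef_factor1)
print(round(score, 3))  # 0.488
```

Swapping in the coefficients from column 2 or column 3 would give the case's score on the second or third component in the same way.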

Command File Examples

REM *****
REM RELATIONSHIPS BETWEEN COMPONENT LOADINGS AND FACTOR SCORES
REM *****

REM Generate example results, including correlation matrix.

FACTOR
USE usdata
PRINT MEDIUM
MODEL spirits wine beer
ESTIMATE / METHOD=PCA LISTWISE CORR NUMBER=3

REM 1) Save ‘Factor scores’, calculate squares and sum.
REM Sum of SQUARES(n) will be the number of valid cases,
REM less 1.

FACTOR
USE usdata
SAVE scores.syd / SCORES
MODEL spirits wine beer
ESTIMATE / METHOD=PCA LISTWISE CORR NUMBER=3

MATRIX
USE scores /MATRIX=scores
MAT squares=scores##2
MAT sum=COLSUM(squares)
CLEAR squares
SHOW sum

REM 2) Save ‘Factor loadings’. Transpose the file of
REM component loadings to get the component loadings matrix.
REM Square the loadings and take the sum of each column.
REM The result will equal the eigenvalue for that column in
REM Factor Analysis output.

FACTOR
USE usdata
SAVE loadings.syd /LOADINGS
MODEL SPIRITS WINE BEER
ESTIMATE / METHOD=PCA LISTWISE CORR NUMBER=3

MATRIX
USE loadings /MATRIX=loadxvar
MAT loadings=TRP(loadxvar)
COLNAME loadings=factor1 factor2 factor3
MAT squares=loadings##2
MAT evalue=COLSUM(squares)
CLEAR loadxvar squares
SHOW evalue

REM 3) Transpose and invert the matrix of component
REM loadings (LOADINGS) to derive the factor score matrix.

MAT factscor=INV(TRP(loadings))
COLNAME factscor=factor1 factor2 factor3
SHOW factscor

REM 4) Use the factor score matrix (FACTSCOR) to derive
REM the inverse of the corresponding eigenvalues.

MAT squares=factscor##2
MAT evalinv=COLSUM(squares)
MAT evalue=1/evalinv
CLEAR squares
SHOW evalinv evalue

REM 5) The component loadings are the eigenvectors of the
REM correlation matrix multiplied by the square roots of the
REM eigenvalues of the correlation matrix.

CORR
USE usdata
SAVE halfcorr
PEARSON spirits wine beer

MATRIX
USE halfcorr /MATRIX=halfcorr
MAT fullcorr=FOLD(halfcorr)
CALL EIGEN(evalues,evectrs,fullcorr)
SHOW evalues evectrs

MAT evalues=DIAG(evalues)
MAT loadings=evectrs#SQR(evalues)
SHOW evalues evectrs loadings

REM *****
REM Calculating a factor score for a case.
REM *****

REM Save ‘Factor coefficients’ from principal components
REM analysis. Use MATRIX to standardize raw data values
REM of variables used in PCA, multiply standardized values
REM by coefficients for first factor to derive the factor score
REM for each case. Repeat for each factor. Concatenate
REM individual factor scores with raw data.

FACTOR
USE usdata
SAVE factor3.syd / COEF
MODEL spirits wine beer
ESTIMATE / METHOD=PCA LISTWISE CORR NUMBER=3

MATRIX

REM Open raw data file, standardize variables of interest

USE usdata
MAT data=COLZSC(usdata(;spirits wine beer))
CLEAR usdata

REM Open file of factor coefficients, name rows
REM for later reference.

USE factor3 /MATRIX=factcoef
ROWNAME factcoef=factor1 factor2 factor3

REM Calculate factor score for each row, repeat
REM for each factor

MAT score1=ROWSUM(data#factcoef(factor1;))
COLNAME score1=score1
MAT score2=ROWSUM(data#factcoef(factor2;))
COLNAME score2=score2
MAT score3=ROWSUM(data#factcoef(factor3;))
COLNAME score3=score3

REM Concatenate matrices of factor scores with raw data
REM values.

USE usdata
MAT scrdfile=usdata(;state$ spirits wine beer)||score1||score2||score3

SAVE usdscore /MATRIX=scrdfile