SigmaPlot Has Extensive and Easy-to-Use Statistical Analysis Features
SigmaPlot is now bundled with SigmaStat as an easy-to-use package for complete graphing and data analysis. The statistical functionality was designed with the non-statistician in mind. This wizard-based statistical software package guides users through every step and performs powerful statistical analyses without requiring statistical expertise.
Each statistical analysis has certain assumptions that must be met by a data set. If the underlying assumptions are not met, you may be given inaccurate or inappropriate results without knowing it.
SigmaPlot, however, checks whether your data set meets the test criteria and, if it does not, suggests which test to run instead.
[toggle border=’2′ title=’Statistical Analysis Features’]Describe Data / Single Group
Compare Two Groups
Compare Many Groups
Before and After

Repeated Measures
Rates and Proportions

Regression
Principal Components Analysis
Correlation
Survival
Normality 
New Statistics Macros
 The Histogram and Kernel Density macro creates graphical estimates of underlying data distributions
Enhancements to Existing Features
 Analytic P values are implemented for all nonparametric ANOVAs
 All P values can now be specified for any value between 0 and 1
 The Akaike Information Criterion (AICc) is now found in the Regression Wizard and Dynamic Fit Wizard reports and the Report Options dialog
 The Rerun button has returned to the SigmaStat group
 Implemented the 24 probability functions in the curve fitter in standard.jfl
 Added seven weighting functions to all curve fit equations in standard.jfl. There is a slight variant added for 3D equations
Introduction
A single-factor ANOVA model is based on a completely randomized design in which the subjects of a study are randomly sampled from a population and each subject is then randomly assigned to one of several factor levels or treatments, so that each subject has an equal probability of receiving a treatment. A common assumption of this design is that the subjects are homogeneous.
This means that any other variable, where differences between the subjects exist, does not significantly alter the treatment effect and need not be included in the model. However, there are often variables, outside the investigator’s control, that affect the observations within one or more factor groups, leading to necessary adjustments in the group means, their errors, the sources of variability, and the P-values of the group effect, including multiple comparisons.
These variables are called covariates. They are typically continuous variables, but can also be categorical. Since they are usually of secondary importance to the study and, as mentioned above, not controllable by the investigator, they do not represent additional main-effects factors, but can still be included in the model to improve the precision of the results. Covariates are also known as nuisance variables or concomitant variables.
ANCOVA (Analysis of Covariance) is an extension of ANOVA obtained by specifying one or more covariates as additional variables in the model. The ANCOVA data arrangement in a SigmaPlot worksheet has one column with the factor and one column with the dependent variable (the observations) as in an ANOVA design. In addition, you will have one column for each covariate. When using a model that includes the effects of covariates, there is more explained variability in the value of the dependent variable.
This generally reduces the unexplained variance that is attributed to random sampling variability, which increases the sensitivity of the ANCOVA as compared to the same model without covariates (the ANOVA model). Higher test sensitivity means that smaller mean differences between treatments will become significant as compared to a standard ANOVA model, thereby increasing statistical power.
As a simple example of using ANCOVA, consider an experiment where students are randomly assigned to one of three types of teaching methods and their achievement scores are measured. The goal is to measure the effect of the different methods and determine if one method achieves a significantly higher average score than the others.
The methods are Lecture, Self-paced, and Cooperative Learning. Performing a One Way ANOVA on this hypothetical data gives the results in the table below, under the ANOVA column heading. We conclude there is no significant difference among the teaching methods. Also note that the variance unexplained by the ANOVA model, which is due to the random sampling variability in the observations, is estimated as 35.17.
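The One Way ANOVA quantities quoted here (group means, the P value, and the residual mean square MSres) come from the standard sums-of-squares decomposition. A minimal sketch in plain Python, using made-up achievement scores rather than the study's actual data:

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within, MS_residual) for a one-way ANOVA."""
    observations = [x for g in groups for x in g]
    grand_mean = mean(observations)
    k, n = len(groups), len(observations)
    # Between-groups SS: variability explained by differences in group means.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups (residual) SS: unexplained random sampling variability.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_b, df_w = k - 1, n - k
    ms_between, ms_residual = ss_between / df_b, ss_within / df_w
    return ms_between / ms_residual, df_b, df_w, ms_residual

# Hypothetical scores for the three methods (illustrative only).
lecture = [88, 85, 90, 84]
self_paced = [82, 86, 81, 84]
coop = [78, 80, 77, 82]
F, df_b, df_w, ms_res = one_way_anova([lecture, self_paced, coop])
```

The F statistic is then compared against the F distribution with (df_b, df_w) degrees of freedom to obtain the P value.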
It is possible that students in our study may benefit more from one method than the others, based on their previous academic performance. Suppose we refine the study to include a covariate that measures some prior ability, such as a statesanctioned Standards Based Assessment (SBA). Performing a One Way ANCOVA on this data gives the results in the table below, under the ANCOVA column heading.
                 ANOVA                  ANCOVA
Method   Mean    Std. Error    Adjusted Mean   Std. Error
Coop     79.33   2.421         82.09           0.782
Self     83.33   2.421         82.44           0.751
Lecture  86.83   2.421         84.97           0.764
         P = 0.124             P = 0.039
         MSres = 35.17         MSres = 3.355
The adjusted mean given in the table for each method is a correction to the group mean that controls for the effect of the covariate. The results show the adjusted means are significantly different, with the Lecture method the most successful. Notice how the standard errors of the means have decreased by almost a factor of three, while the variance due to random sampling variability has decreased by a factor of ten. A reduction in error is the usual consequence of introducing covariates and performing an ANCOVA analysis.
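To make the adjusted-mean computation concrete, here is a hedged sketch in Python (NumPy) on simulated data: the factor is effects-coded, the model includes the covariate, and each group's adjusted mean is the model's prediction at the grand mean of the covariate. All names and numbers here are illustrative assumptions, not SigmaPlot output:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30                                     # subjects per group (illustrative)
group = np.repeat([0, 1, 2], n)            # three teaching methods
sba = rng.normal(70, 10, size=3 * n)       # covariate: prior-ability score
effect = np.array([0.0, 2.0, 5.0])[group]  # true group effects (simulated)
score = 30 + 0.7 * sba + effect + rng.normal(0, 2, size=3 * n)

# Effects coding: the last group takes -1 in every dummy column.
d1 = np.where(group == 0, 1.0, np.where(group == 2, -1.0, 0.0))
d2 = np.where(group == 1, 1.0, np.where(group == 2, -1.0, 0.0))
X = np.column_stack([np.ones(3 * n), d1, d2, sba])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)

# Adjusted mean = model prediction for each group at the covariate grand mean.
cbar = sba.mean()
adjusted = [beta[0] + beta[1] + beta[3] * cbar,
            beta[0] + beta[2] + beta[3] * cbar,
            beta[0] - beta[1] - beta[2] + beta[3] * cbar]
```

Because the covariate absorbs part of the variability in the scores, the standard errors of these adjusted means are smaller than those of the raw group means, which is exactly the gain in sensitivity described above.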
[/toggle]
[toggle border=’2′ title=’Assumption Checking’]
In addition to the Normality and Equal Variance options, the One Way ANCOVA options will include testing for the equality of slopes (the Interaction Model).
 Normality Test: There are two options, Shapiro-Wilk and Kolmogorov-Smirnov, as provided for the parametric ANOVA tests. The default P Value to Reject is .05 and the default test is Shapiro-Wilk. Normality testing is performed on the residuals of the Equal Slopes Model or, if the Equality of Slopes Test fails, on the residuals of the Interaction Model.
 Equal Variance Test: The default P Value to Reject is .05. Levene’s mean test is used to assess equal variance. The test is performed on the residuals of the Equal Slopes Model or, if the Equality of Slopes Test fails, then the equal variance test is performed on the residuals of the Interaction Model.
 Equality of Slopes: One of the assumptions of ANCOVA is that there is no interaction between the Factor variable (the treatment levels) and the covariate variables. In other words, the coefficient of each covariate in the model is assumed to be the same for all treatments. An Equality of Slopes option provides the calculations to test this assumption. If the Equality of Slopes option is not selected, then equality of the slopes will be assumed and the analysis for the report focuses on the results of fitting the ANCOVA Model (also called the Equal Slopes Model) to the user’s data. If the Equality of Slopes option is selected, then the Interaction Model will be fit to the data to determine if any of the interactions is significant. If an interaction between the factor and any covariate is significant (so the equality of slopes test fails), then the analysis will stop, but the regression equations for each group will be provided. If there is no significant interaction between the factor and any covariate (so the equality of slopes test passes), then the report will continue to provide the results of the Equal Slopes Model.
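The equality-of-slopes check amounts to comparing the Equal Slopes Model against the Interaction Model with a partial F test. A minimal sketch on simulated two-group data (all names and numbers are assumptions for illustration, not SigmaPlot internals):

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ beta) ** 2))

rng = np.random.default_rng(1)
n = 40
g = np.repeat([0.0, 1.0], n)                # two treatment groups
x = rng.normal(0, 1, 2 * n)                 # covariate
y = 1.0 + 0.5 * g + 0.8 * x + rng.normal(0, 0.3, 2 * n)  # equal slopes by design

ones = np.ones(2 * n)
reduced = np.column_stack([ones, g, x])     # Equal Slopes Model
full = np.column_stack([ones, g, x, g * x]) # Interaction Model (adds group x covariate)

rss_r, rss_f = rss(reduced, y), rss(full, y)
df_extra = 1                                # one interaction term in this sketch
df_resid = 2 * n - full.shape[1]
F = ((rss_r - rss_f) / df_extra) / (rss_f / df_resid)
# A small F (large P) means the slopes do not differ significantly,
# so the analysis proceeds with the Equal Slopes Model.
```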
An example of the ANCOVA report is shown. The assumption checking results are displayed followed by the ANOVA table and the results interpretation. The adjusted means are displayed and then the multiple comparison results.
One Way Analysis of Covariance
Data source: Data 1 in ANCOVA_DataSets.JNB
Dependent Variable: Length
Group Name  N  Missing  Mean  Std Dev  SEM 
control  62  0  1.217  0.140  0.0178 
exposed 1  27  0  1.206  0.0578  0.0111 
exposed 2  40  0  1.050  0.0994  0.0157 
Total  129  0  1.163  0.137  0.0121 
Normality Test (Shapiro-Wilk): Passed (P = 0.057)
Equal Variance Test: Failed (P < 0.050)
Equal Slopes Test:
The equality of slopes assumption is tested by extending the ANCOVA regression model to include terms for the interactions of the factor with the covariates.
R = 0.625  Rsqr = 0.390  AdjRsqr = 0.365 
Analysis of Variance:
Source of Variation  DF  SS  MS  F  P 
Col 1  2  0.124  0.0618  5.166  0.007 
Age  1  0.0369  0.0369  3.086  0.081 
Col 1 x Age  2  0.0681  0.0340  2.844  0.062 
Residual  123  1.472  0.0120  –  – 
Total  128  2.413  0.0189  –  – 
The effect of the different treatment groups does not depend upon the value of covariate Age, averaging over the values of the remaining covariates. There is not a significant interaction between the factor Col 1 and the covariate Age (P = 0.062).
There are no significant interactions between the factor and the covariates. The equal slopes assumption passes and the equal slopes model is analyzed below.
Analysis of Equal Slopes Model:
R = 0.602  Rsqr = 0.362  AdjRsqr = 0.347 
Analysis of Variance:
Source of Variation  DF  SS  MS  F  P 
Col 1  2  0.229  0.114  9.290  <0.001 
Age  1  0.128  0.128  10.421  0.002 
Residual  125  1.540  0.0123  –  – 
Total  128  2.413  0.0189  –  – 
The differences in the adjusted means among the treatment groups are greater than would be expected by chance; there is a statistically significant difference (P < 0.001). To isolate which group(s) differ most from the others, use a multiple comparison procedure. The adjusted means and their statistics are given in the table below.
The coefficient of covariate Age in the equal slopes regression model is significantly different from zero (P = 0.002). The covariate significantly affects the values of the dependent variable.
Adjusted Means of the Groups:
Group Name  Adjusted Mean  Std. Error  95%ConfL  95%ConfU 
control  1.200  0.0150  1.170  1.230 
exposed 1  1.193  0.0217  1.150  1.236 
exposed 2  1.085  0.0206  1.044  1.126 
The adjusted means are the predicted values of the model for each group where each covariate variable is evaluated at the grand mean of its sampled values.
All Pairwise Multiple Comparison Procedures (HolmSidak method):
Comparisons for factor: Col 1
Comparison  Diff of Means  t  P  P<0.050 
control vs. exposed 2  0.115  4.178  <0.001  Yes 
exposed 1 vs. exposed 2  0.109  3.456  0.001  Yes 
control vs. exposed 1  0.00693  0.271  0.787  No 
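The Holm-Sidak adjustment applied in the table above can be sketched in a few lines of plain Python; the raw P values below are illustrative stand-ins, not the report's exact inputs:

```python
def holm_sidak(pvalues):
    """Step-down Sidak-adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for step, idx in enumerate(order):
        # Sidak correction for the m - step hypotheses still in play,
        # with a running maximum to keep the adjusted values monotone.
        p_adj = 1.0 - (1.0 - pvalues[idx]) ** (m - step)
        running_max = max(running_max, p_adj)
        adjusted[idx] = min(1.0, running_max)
    return adjusted

raw = [0.0005, 0.001, 0.787]   # illustrative raw P values for three comparisons
adjusted = holm_sidak(raw)     # each adjusted value is compared to 0.05
```

Comparisons whose adjusted P value falls below the 0.05 level are flagged "Yes" in the table's final column.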
Regression Equations for the Equal Slopes Model:
Each equation below is obtained by restricting the effects-coded dummy variables in the regression model to the values corresponding to each factor group.
A significant difference in the adjusted means of the factor groups is equivalent to a significant difference in the intercepts of the dependent variable for these equations.
Group: control
Length = 1.307 - (0.00280 * Age)
Group: exposed 1
Length = 1.300 - (0.00280 * Age)
Group: exposed 2
Length = 1.192 - (0.00280 * Age)
[/toggle]
[toggle border=’2′ title=’ANCOVA Result Graphs’]
There are four ANCOVA result graphs: Regression Lines in Groups, Scatter Plot of Residuals, Adjusted Means with Confidence Intervals, and Normal Probability Plot.
Examples of ANCOVA Graphs
[/toggle]
[toggle border=’2′ title=’Principal Components Analysis (PCA)’]
Introduction
Principal component analysis (PCA) is a technique for reducing the complexity of high-dimensional data by approximating the data with fewer dimensions. Each new dimension is called a principal component and represents a linear combination of the original variables. The first principal component accounts for as much variation in the data as possible. Each subsequent principal component accounts for as much of the remaining variation as possible and is orthogonal to all of the previous principal components.
You can examine principal components to understand the sources of variation in your data. You can also use them in forming predictive models. If most of the variation in your data exists in a low-dimensional subset, you might be able to model your response variable in terms of the principal components. You can use principal components to reduce the number of variables in regression, clustering and other statistical techniques.
The primary goal of Principal Components Analysis is to explain the sources of variability in the data and to represent the data with fewer variables while preserving most of the total variance.
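The computation described above can be sketched directly: standardize the variables, form their correlation matrix, and take its eigendecomposition. This illustrative NumPy version uses random data in place of an actual data set:

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(size=(50, 7))   # 50 observations of 7 variables (random stand-in)

# Standardize each variable, then form the correlation matrix.
z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
corr = (z.T @ z) / (len(z) - 1)

# Eigendecomposition; eigh returns ascending order, so reverse to largest-first.
eigvals, eigvecs = np.linalg.eigh(corr)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

proportion = 100 * eigvals / eigvals.sum()  # percent of total variance per component
scores = z @ eigvecs                        # principal component scores
```

Each eigenvalue is the variance of its principal component, and for a correlation matrix the eigenvalues sum to the number of variables.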
Assumption Checking
Normality test: There are two options, Mardia’s skewness and kurtosis tests and the Henze-Zirkler test.
Principal Components Results
An example of the Principal Components report is shown. The assumption checking results are displayed followed by descriptive statistics, the correlation matrix and its eigenvalues. The number of in-model principal components is displayed along with a test for equality of eigenvalues. Based on these results, interpretations are given as to the number of principal components supported.
Principal Components Analysis
Normality Test (Henze-Zirkler):
Statistic = 1.351 Failed (P < 0.050)
Descriptive Statistics:
Variable  Mean  Std Dev 
Murder  7.444  3.867 
Rape  25.734  10.760 
Robbery  124.092  88.349 
Assault  211.300  100.253 
Burglary  1291.904  432.456 
Larceny  2671.288  725.909 
Auto_Theft  377.526  193.394 
Total Observations: 50
Missing: 0
Valid Observations: 50
An observation is missing if any worksheet cell in its row has a nonnumeric value.
Correlation Matrix:
Murder  Rape  Robbery  Assault  Burglary  Larceny  Auto_Theft  
Murder  1.000  –  –  –  –  –  – 
Rape  0.601  1.000  –  –  –  –  – 
Robbery  0.484  0.592  1.000  –  –  –  – 
Assault  0.649  0.740  0.557  1.000  –  –  – 
Burglary  0.386  0.712  0.637  0.623  1.000  –  – 
Larceny  0.102  0.614  0.447  0.404  0.792  1.000  – 
Auto_Theft  0.0688  0.349  0.591  0.276  0.558  0.444  1.000 
Total Variance = 7.000
Eigenvalues of the Correlation Matrix:
Eigenvalue  Difference  Proportion(%)  Cumulative(%)  
1  4.115  2.876  58.785  58.785 
2  1.239  0.513  17.696  76.481 
3  0.726  0.409  10.369  86.850 
4  0.316  0.0585  4.520  91.370 
5  0.258  0.0359  3.685  95.056 
6  0.222  0.0980  3.172  98.228 
7  0.124  —  1.772  100.000 
If two or more eigenvalues have the same value, then the corresponding principal components are not welldefined and any interpretation of them is suspect.
Number of In-Model Principal Components = 2
The in-model components correspond to all eigenvalues greater than or equal to the average eigenvalue. When analyzing the correlation matrix, the average eigenvalue is always 1.0. This criterion can be changed in the Test Options dialog on the Criterion panel. The variance of each principal component equals its corresponding eigenvalue.
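Applied to the eigenvalues in the table above, the average-eigenvalue criterion selects two components:

```python
# Eigenvalues from the report's correlation-matrix table.
eigenvalues = [4.115, 1.239, 0.726, 0.316, 0.258, 0.222, 0.124]

# For a correlation matrix the average eigenvalue is 1.0 (the total variance
# equals the number of variables), so the criterion keeps eigenvalues >= 1.
average = sum(eigenvalues) / len(eigenvalues)
in_model = [ev for ev in eigenvalues if ev >= average]
```

Only the first two eigenvalues (4.115 and 1.239) meet the criterion, matching the report's two in-model components.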
Chi-Square Tests for the Equality of Eigenvalues:
Hypothesis: All eigenvalues are equal.
Statistic = 224.295
Degrees of freedom = 21.000
P value = <0.001
There is a significant difference in the eigenvalues. A principal components analysis can be conducted.
Hypothesis: The last 5 eigenvalues are equal.
Statistic = 39.287
Degrees of freedom = 13.209
P value = <0.001
There is a significant difference in the last 5 eigenvalues. You may want to include additional principal components in your model by changing the settings in the Test Options dialog on the Criterion panel.
Eigenvectors of the Correlation Matrix:
PC 1  PC 2  
Murder  0.300  0.629 
Rape  0.432  0.169 
Robbery  0.397  -0.0422 
Assault  0.397  0.344 
Burglary  0.440  -0.203 
Larceny  0.357  -0.402 
Auto_Theft  0.295  -0.502 
Each principal component is a linear combination of the original variables, after each original variable has been standardized to have unit variance. The coefficients of this linear combination are the entries in the corresponding column of the above table. These coefficients provide the interpretation of the principal components in terms of the original variables.
Standard Errors for the Eigenvector Entries:
PC 1  PC 2  
Murder  0.0754  0.0790 
Rape  0.0387  0.100 
Robbery  0.0476  0.153 
Assault  0.0512  0.0888 
Burglary  0.0371  0.0898 
Larceny  0.0620  0.151 
Auto_Theft  0.0729  0.160 
Component Loadings:
PC 1  PC 2  
Murder  0.609  0.700 
Rape  0.876  0.189 
Robbery  0.805  -0.0470 
Assault  0.805  0.382 
Burglary  0.893  -0.226 
Larceny  0.725  -0.448 
Auto_Theft  0.599  -0.559 
If the principal components are standardized to have unit variance, the loadings are the coefficients of the linear combination of in-model principal components used to approximate the original variables. If a correlation matrix is analyzed, then the loadings equal the correlations between the original variables and the principal components.
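When the correlation matrix is analyzed, each loading is the corresponding eigenvector entry scaled by the square root of the component's eigenvalue, which can be checked against the tables above:

```python
import math

# From the report: first eigenvalue and Murder's entry in eigenvector PC 1.
eigenvalue_pc1 = 4.115
murder_eigvec_pc1 = 0.300

# Loading = eigenvector entry * sqrt(eigenvalue); this reproduces the 0.609
# listed for Murder under PC 1 in the Component Loadings table.
murder_loading_pc1 = murder_eigvec_pc1 * math.sqrt(eigenvalue_pc1)
```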
Fitted Correlation Matrix:
Murder  Rape  Robbery  Assault  Burglary  Larceny  Auto_Theft  
Murder  0.861  –  –  –  –  –  – 
Rape  0.666  0.803  –  –  –  –  – 
Robbery  0.457  0.696  0.650  –  –  –  – 
Assault  0.758  0.777  0.630  0.794  –  –  – 
Burglary  0.385  0.739  0.729  0.632  0.848  –  – 
Larceny  0.128  0.550  0.605  0.412  0.749  0.726  – 
Auto_Theft  -0.0268  0.419  0.508  0.268  0.661  0.684  0.671 
This is an estimate of the correlation matrix that results by approximating the original variables with the in-model principal components.
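This approximation is simply the product of the in-model loadings matrix with its transpose. A sketch using the PC 1 and PC 2 loadings from the report (the signs of the PC 2 entries are inferred here so that the product reproduces the fitted table):

```python
import numpy as np

# In-model loadings (7 variables x 2 components), taken from the report.
L = np.array([
    [0.609,  0.700],   # Murder
    [0.876,  0.189],   # Rape
    [0.805, -0.0470],  # Robbery
    [0.805,  0.382],   # Assault
    [0.893, -0.226],   # Burglary
    [0.725, -0.448],   # Larceny
    [0.599, -0.559],   # Auto_Theft
])

# Fitted correlation matrix from the two in-model components.
fitted = L @ L.T
# e.g. fitted[0, 0] reproduces the 0.861 shown for Murder on the diagonal,
# and fitted[1, 0] the 0.666 shown for Rape vs. Murder.
```

The diagonal entries (e.g. 0.861 for Murder) are the communalities: the share of each variable's variance explained by the two retained components.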
[/toggle]
[toggle border=’2′ title=’Principal Components Results Graphs’]
There are three PCA result graphs: Scree Plot, Component Loadings Plot, and Component Scores Plot. Below are examples of the result graphs together with captions explaining the information the graphs contain. The graphs are based on a study of crime data gathered across the United States. The original variables in the data are seven types of crimes: Murder, Rape, Larceny, Burglary, Auto Theft, Robbery and Assault. The rates per 100,000 people were measured for all 50 states.
[/toggle]