SigmaPlot Has Extensive And Easy-To-Use Statistical Analysis Features
SigmaPlot is now bundled with SigmaStat as an easy-to-use package for complete graphing and data analysis. The statistical functionality was designed with the non-statistician user in mind. This wizard-based statistical software package guides users through every step and performs powerful statistical analyses, so you do not have to be a statistical expert.
Each statistical analysis has certain assumptions that have to be met by a data set. If underlying assumptions are not met, you may be given inaccurate or inappropriate results without knowing it.
SigmaPlot, however, checks whether your data set meets the test criteria and, if it does not, suggests which test to run instead.
Statistical Analysis Features
- Describe Data (Single Group)
- Compare Two Groups
- Compare Many Groups
- Before and After
- Repeated Measures
- Rates and Proportions
- Regression
- Principal Components Analysis
- Correlation
- Survival
- Normality
New Statistics Macros
- The Histogram and Kernel Density macro creates graphical estimates of underlying data distributions
Enhancements to Existing Features
- Analytic P values are implemented for all nonparametric ANOVAs
- All P values can now be set to any value between 0 and 1
- The Akaike Information Criterion (AICc) is now found in the Regression Wizard and Dynamic Fit Wizard reports and the Report Options dialog
- The Rerun button has returned to the SigmaStat group
- Implemented the 24 probability functions in the curve fitter in standard.jfl
- Added seven weighting functions to all curve fit equations in standard.jfl, with a slight variant for 3D equations
New Statistics Features Found in Version 13 – One Way ANCOVA
Introduction
A single-factor ANOVA model is based on a completely randomized design in which the subjects of a study are randomly sampled from a population and then each subject is randomly assigned to one of several factor levels or treatments so that each subject has an equal probability of receiving a treatment. A common assumption of this design is that the subjects are homogeneous.
This means that any other variable, where differences between the subjects exist, does not significantly alter the treatment effect and need not be included in the model. However, there are often variables, outside the investigator’s control, that affect the observations within one or more factor groups, leading to necessary adjustments in the group means, their errors, the sources of variability, and the P-values of the group effect, including multiple comparisons.
These variables are called covariates. They are typically continuous variables, but can also be categorical. Since they are usually of secondary importance to the study and, as mentioned above, not controllable by the investigator, they do not represent additional main-effects factors, but can still be included in the model to improve the precision of the results. Covariates are also known as nuisance variables or concomitant variables.
ANCOVA (Analysis of Covariance) is an extension of ANOVA obtained by specifying one or more covariates as additional variables in the model. The ANCOVA data arrangement in a SigmaPlot worksheet has one column with the factor and one column with the dependent variable (the observations) as in an ANOVA design. In addition, you will have one column for each covariate. When using a model that includes the effects of covariates, there is more explained variability in the value of the dependent variable.
This generally reduces the unexplained variance that is attributed to random sampling variability, which increases the sensitivity of the ANCOVA as compared to the same model without covariates (the ANOVA model). Higher test sensitivity means that smaller mean differences between treatments will become significant as compared to a standard ANOVA model, thereby increasing statistical power.
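As a rough illustration of this model structure outside SigmaPlot, the sketch below fits both an ANOVA and an equal-slopes ANCOVA model with Python's statsmodels package. The column names ("method", "sba", "score") and the data are hypothetical, chosen only to mirror the teaching-method example that follows.

```python
# A minimal ANCOVA sketch using hypothetical data: one factor column,
# one covariate column, and one dependent-variable column, matching the
# worksheet arrangement described above.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "method": ["Lecture", "Self", "Coop"] * 4,           # factor
    "sba":    [72, 65, 70, 80, 78, 75, 68, 71, 66, 77, 74, 79],  # covariate
    "score":  [85, 82, 78, 90, 86, 80, 84, 81, 77, 88, 85, 82],  # response
})

# ANOVA model: factor only.
anova_fit = smf.ols("score ~ C(method)", data=df).fit()

# ANCOVA (equal slopes) model: factor plus covariate.
ancova_fit = smf.ols("score ~ C(method) + sba", data=df).fit()

print(sm.stats.anova_lm(anova_fit, typ=2))
print(sm.stats.anova_lm(ancova_fit, typ=2))
```

Comparing the two ANOVA tables shows the effect described above: the covariate absorbs part of the residual variability, shrinking the unexplained variance in the ANCOVA fit.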
As a simple example of using ANCOVA, consider an experiment where students are randomly assigned to one of three types of teaching methods and their achievement scores are measured. The goal is to measure the effect of the different methods and determine if one method achieves a significantly higher average score than the others.
The methods are Lecture, Self-paced, and Cooperative Learning. Performing a One Way ANOVA on this hypothetical data gives the results in the table below, under the ANOVA column heading. We conclude there is no significant difference among the teaching methods. Also note that the variance unexplained by the ANOVA model, which is due to the random sampling variability in the observations, is estimated as 35.17.
It is possible that students in our study may benefit more from one method than the others, based on their previous academic performance. Suppose we refine the study to include a covariate that measures some prior ability, such as a state-sanctioned Standards Based Assessment (SBA). Performing a One Way ANCOVA on this data gives the results in the table below, under the ANCOVA column heading.
| Method | ANOVA Mean | ANOVA Std. Error | ANCOVA Adjusted Mean | ANCOVA Std. Error |
|---|---|---|---|---|
| Coop | 79.33 | 2.421 | 82.09 | 0.782 |
| Self | 83.33 | 2.421 | 82.44 | 0.751 |
| Lecture | 86.83 | 2.421 | 84.97 | 0.764 |
| | P = 0.124 | | P = 0.039 | |
| | MSres = 35.17 | | MSres = 3.355 | |
The adjusted mean that is given in the table for each method is a correction to the group mean to control for the effects of the covariate. The results show the adjusted means are significantly different, with the Lecture method as the most successful. Notice how the standard errors of the means have decreased by almost a factor of three, while the variance due to random sampling variability has decreased by a factor of ten. A reduction in error is the usual consequence of introducing covariates and performing an ANCOVA analysis.
Assumption Checking
In addition to the Normality and Equal Variance options, the One Way ANCOVA options will include testing for the equality of slopes (the Interaction Model).
- Normality Test: There are two options, Shapiro-Wilk and Kolmogorov-Smirnov, as provided for the parametric ANOVA tests. The default P Value to Reject is 0.05 and the default test is Shapiro-Wilk. Normality testing is performed on the residuals of the Equal Slopes Model or, if the Equality of Slopes Test fails, on the residuals of the Interaction Model.
- Equal Variance Test: The default P Value to Reject is 0.05. Levene’s mean test is used to assess equal variance. The test is performed on the residuals of the Equal Slopes Model or, if the Equality of Slopes Test fails, on the residuals of the Interaction Model.
- Equality of Slopes: One of the assumptions of ANCOVA is that there is no interaction between the Factor variable (the treatment levels) and the covariate variables. In other words, the coefficient of each covariate in the model is assumed to be the same for all treatments. An Equality of Slopes option provides the calculations to test this assumption. If the option is not selected, then equality of the slopes is assumed and the report focuses on the results of fitting the ANCOVA Model (also called the Equal Slopes Model) to the user’s data. If the option is selected, then the Interaction Model is fit to the data to determine whether any of the interactions is significant. If an interaction between the factor and any covariate is significant (the equality of slopes test fails), then the analysis stops, but the regression equations for each group are still provided. If there is no significant interaction between the factor and any covariate (the test passes), then the report continues with the results of the Equal Slopes Model. A minimal sketch of this check appears after this list.
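Continuing the hypothetical df/smf/sm setup from the ANCOVA sketch above, the equality-of-slopes check can be sketched by extending the model with a factor-by-covariate interaction term and testing its significance; this is an illustration of the idea, not SigmaPlot's implementation.

```python
# Sketch of the equality-of-slopes check: fit the Interaction Model and
# test whether the factor-by-covariate interaction is significant.
interaction_fit = smf.ols("score ~ C(method) * sba", data=df).fit()
table = sm.stats.anova_lm(interaction_fit, typ=2)

# A non-significant interaction supports the equal-slopes assumption.
p_interaction = table.loc["C(method):sba", "PR(>F)"]
print("Equal slopes assumed" if p_interaction > 0.05
      else "Slopes differ; report per-group regression lines instead")
```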
ANCOVA Results
An example of the ANCOVA report is shown below. The assumption checking results are displayed, followed by the ANOVA table and the results interpretation. The adjusted means are displayed next, and then the multiple comparison results.
One Way Analysis of Covariance
Data source: Data 1 in ANCOVA_DataSets.JNB
Dependent Variable: Length
| Group Name | N | Missing | Mean | Std Dev | SEM |
|---|---|---|---|---|---|
| control | 62 | 0 | 1.217 | 0.140 | 0.0178 |
| exposed 1 | 27 | 0 | 1.206 | 0.0578 | 0.0111 |
| exposed 2 | 40 | 0 | 1.050 | 0.0994 | 0.0157 |
| Total | 129 | 0 | 1.163 | 0.137 | 0.0121 |
Normality Test (Shapiro-Wilk): Passed (P = 0.057)
Equal Variance Test: Failed (P < 0.050)
Equal Slopes Test:
The equality of slopes assumption is tested by extending the ANCOVA regression model to include terms for the interactions of the factor with the covariates.
R = 0.625, Rsqr = 0.390, Adj Rsqr = 0.365
Analysis of Variance:
| Source of Variation | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Col 1 | 2 | 0.124 | 0.0618 | 5.166 | 0.007 |
| Age | 1 | 0.0369 | 0.0369 | 3.086 | 0.081 |
| Col 1 x Age | 2 | 0.0681 | 0.0340 | 2.844 | 0.062 |
| Residual | 123 | 1.472 | 0.0120 | – | – |
| Total | 128 | 2.413 | 0.0189 | – | – |
The effect of the different treatment groups does not depend upon the value of covariate Age, averaging over the values of the remaining covariates. There is not a significant interaction between the factor Col 1 and the covariate Age (P = 0.062).
There are no significant interactions between the factor and the covariates. The equal slopes assumption passes and the equal slopes model is analyzed below.
Analysis of Equal Slopes Model:
R = 0.602, Rsqr = 0.362, Adj Rsqr = 0.347
Analysis of Variance:
| Source of Variation | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Col 1 | 2 | 0.229 | 0.114 | 9.290 | <0.001 |
| Age | 1 | 0.128 | 0.128 | 10.421 | 0.002 |
| Residual | 125 | 1.540 | 0.0123 | – | – |
| Total | 128 | 2.413 | 0.0189 | – | – |
The differences in the adjusted means among the treatment groups are greater than would be expected by chance; there is a statistically significant difference (P < 0.001). To isolate which group(s) differ most from the others, use a multiple comparison procedure. The adjusted means and their statistics are given in the table below.
The coefficient of covariate Age in the equal slopes regression model is significantly different from zero (P = 0.002). The covariate significantly affects the values of the dependent variable.
Adjusted Means of the Groups:
| Group Name | Adjusted Mean | Std. Error | 95% Conf. Lower | 95% Conf. Upper |
|---|---|---|---|---|
| control | 1.200 | 0.0150 | 1.170 | 1.230 |
| exposed 1 | 1.193 | 0.0217 | 1.150 | 1.236 |
| exposed 2 | 1.085 | 0.0206 | 1.044 | 1.126 |
The adjusted means are the predicted values of the model for each group where each covariate variable is evaluated at the grand mean of its sampled values.
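A minimal sketch of this computation, reusing the hypothetical ancova_fit and df from the earlier example: predict the response for each group with the covariate fixed at its grand (overall) mean.

```python
# Adjusted means: model predictions per group at the covariate grand mean.
grand_mean = df["sba"].mean()
at_grand_mean = pd.DataFrame({"method": df["method"].unique(),
                              "sba": grand_mean})
at_grand_mean["adjusted_mean"] = ancova_fit.predict(at_grand_mean)
print(at_grand_mean)
```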
All Pairwise Multiple Comparison Procedures (Holm-Sidak method):
Comparisons for factor: Col 1
| Comparison | Diff of Means | t | P | P<0.050 |
|---|---|---|---|---|
| control vs. exposed 2 | 0.115 | 4.178 | <0.001 | Yes |
| exposed 1 vs. exposed 2 | 0.109 | 3.456 | 0.001 | Yes |
| control vs. exposed 1 | 0.00693 | 0.271 | 0.787 | No |
Regression Equations for the Equal Slopes Model:
Each equation below is obtained by restricting the effects-coded dummy variables in the regression model to the values corresponding to each factor group.
A significant difference in the adjusted means of the factor groups is equivalent to a significant difference in the intercepts of the dependent variable for these equations.
Group: control
Length = 1.307 – (0.00280 * Age)
Group: exposed 1
Length = 1.300 – (0.00280 * Age)
Group: exposed 2
Length = 1.192 – (0.00280 * Age)
ANCOVA Result Graphs
There are four ANCOVA result graphs: Regression Lines in Groups, Scatter Plot of Residuals, Adjusted Means with Confidence Intervals, and Normal Probability Plot.
Examples of ANCOVA Graphs
Principal Components Analysis (PCA)
Introduction
Principal component analysis (PCA) is a technique for reducing the complexity of high-dimensional data by approximating the data with fewer dimensions. Each new dimension is called a principal component and represents a linear combination of the original variables. The first principal component accounts for as much variation in the data as possible. Each subsequent principal component accounts for as much of the remaining variation as possible and is orthogonal to all of the previous principal components.
You can examine principal components to understand the sources of variation in your data. You can also use them in forming predictive models. If most of the variation in your data exists in a low-dimensional subset, you might be able to model your response variable in terms of the principal components. You can use principal components to reduce the number of variables in regression, clustering and other statistical techniques.
The primary goal of Principal Components Analysis is to explain the sources of variability in the data and to represent the data with fewer variables while preserving most of the total variance.
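As a rough illustration of this procedure (not SigmaPlot's implementation), the sketch below performs a PCA by eigendecomposition of the correlation matrix in Python. The data matrix X is a hypothetical stand-in with 50 observations of 7 variables, mirroring the shape of the crime-rate example reported below.

```python
# A minimal PCA sketch via eigendecomposition of the correlation matrix.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 7))          # hypothetical 50 x 7 data matrix

R = np.corrcoef(X, rowvar=False)      # correlation matrix of the variables
eigvals, eigvecs = np.linalg.eigh(R)  # eigensolver for symmetric matrices

# Sort from largest to smallest eigenvalue; the eigenvector columns are
# the principal component coefficients for the standardized variables.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Proportion of the total variance explained by each component.
print(np.round(100 * eigvals / eigvals.sum(), 3))
```

Because a correlation matrix has unit diagonal, the total variance equals the number of variables, and the eigenvalues partition it among the components.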
Assumption Checking
Normality Test: There are two options, Mardia’s Skewness and Kurtosis tests and the Henze-Zirkler test.
Principal Components Results
An example of the Principal Components report is shown. The assumption checking results are displayed followed by descriptive statistics, the correlation matrix and its eigenvalues. The number of in-model principal components is displayed along with a test for equality of eigenvalues. Based on these results, interpretations are given as to the number of principal components supported.
Principal Components Analysis
Normality Test (Henze-Zirkler):
Statistic = 1.351 Failed (P < 0.050)
Descriptive Statistics:
| Variable | Mean | Std Dev |
|---|---|---|
| Murder | 7.444 | 3.867 |
| Rape | 25.734 | 10.760 |
| Robbery | 124.092 | 88.349 |
| Assault | 211.300 | 100.253 |
| Burglary | 1291.904 | 432.456 |
| Larceny | 2671.288 | 725.909 |
| Auto_Theft | 377.526 | 193.394 |
Total Observations: 50
Missing: 0
Valid Observations: 50
An observation is missing if any worksheet cell in its row has a non-numeric value.
Correlation Matrix:
| | Murder | Rape | Robbery | Assault | Burglary | Larceny | Auto_Theft |
|---|---|---|---|---|---|---|---|
| Murder | 1.000 | – | – | – | – | – | – |
| Rape | 0.601 | 1.000 | – | – | – | – | – |
| Robbery | 0.484 | 0.592 | 1.000 | – | – | – | – |
| Assault | 0.649 | 0.740 | 0.557 | 1.000 | – | – | – |
| Burglary | 0.386 | 0.712 | 0.637 | 0.623 | 1.000 | – | – |
| Larceny | 0.102 | 0.614 | 0.447 | 0.404 | 0.792 | 1.000 | – |
| Auto_Theft | 0.0688 | 0.349 | 0.591 | 0.276 | 0.558 | 0.444 | 1.000 |
Total Variance = 7.000
Eigenvalues of the Correlation Matrix:
| | Eigenvalue | Difference | Proportion (%) | Cumulative (%) |
|---|---|---|---|---|
| 1 | 4.115 | 2.876 | 58.785 | 58.785 |
| 2 | 1.239 | 0.513 | 17.696 | 76.481 |
| 3 | 0.726 | 0.409 | 10.369 | 86.850 |
| 4 | 0.316 | 0.0585 | 4.520 | 91.370 |
| 5 | 0.258 | 0.0359 | 3.685 | 95.056 |
| 6 | 0.222 | 0.0980 | 3.172 | 98.228 |
| 7 | 0.124 | – | 1.772 | 100.000 |
If two or more eigenvalues have the same value, then the corresponding principal components are not well-defined and any interpretation of them is suspect.
Number of In-Model Principal Components = 2
The in-model components correspond to all eigenvalues greater than or equal to the average eigenvalue. When analyzing the correlation matrix, the average eigenvalue is always 1.0. This criterion can be changed in the Test Options dialog on the Criterion panel. The variance of each principal component equals its corresponding eigenvalue.
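A short sketch of this default criterion, reusing eigvals from the PCA sketch above:

```python
# Keep every component whose eigenvalue is at least the average
# eigenvalue (always 1.0 when a correlation matrix is analyzed).
n_in_model = int(np.sum(eigvals >= eigvals.mean()))
print("In-model principal components:", n_in_model)
```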
Chi-Square Tests for the Equality of Eigenvalues:
Hypothesis: All eigenvalues are equal.
Statistic = 224.295
Degrees of freedom = 21.000
P value < 0.001
There is a significant difference in the eigenvalues. A principal components analysis can be conducted.
Hypothesis: The last 5 eigenvalues are equal.
Statistic = 39.287
Degrees of freedom = 13.209
P value < 0.001
There is a significant difference in the last 5 eigenvalues. You may want to include additional principal components in your model by changing the settings in the Test Options dialog on the Criterion panel.
Eigenvectors of the Correlation Matrix:
| | PC 1 | PC 2 |
|---|---|---|
| Murder | 0.300 | -0.629 |
| Rape | 0.432 | -0.169 |
| Robbery | 0.397 | 0.0422 |
| Assault | 0.397 | -0.344 |
| Burglary | 0.440 | 0.203 |
| Larceny | 0.357 | 0.402 |
| Auto_Theft | 0.295 | 0.502 |
Each principal component is a linear combination of the original variables, after each original variable has been standardized to have unit variance. The coefficients of this linear combination are the entries in the corresponding column of the above table. These coefficients provide the interpretation of the principal components in terms of the original variables.
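A minimal sketch of this linear combination, reusing X and eigvecs from the PCA sketch above: standardize each original variable to unit variance, then apply the eigenvector coefficients to obtain the component scores.

```python
# Component scores: standardized variables times eigenvector coefficients.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardized variables
scores = Z @ eigvecs                              # one column per component
print(scores[:, :2])                              # scores on PC 1 and PC 2
```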
Standard Errors for the Eigenvector Entries:
| | PC 1 | PC 2 |
|---|---|---|
| Murder | 0.0754 | 0.0790 |
| Rape | 0.0387 | 0.100 |
| Robbery | 0.0476 | 0.153 |
| Assault | 0.0512 | 0.0888 |
| Burglary | 0.0371 | 0.0898 |
| Larceny | 0.0620 | 0.151 |
| Auto_Theft | 0.0729 | 0.160 |
Component Loadings:
| | PC 1 | PC 2 |
|---|---|---|
| Murder | 0.609 | -0.700 |
| Rape | 0.876 | -0.189 |
| Robbery | 0.805 | 0.0470 |
| Assault | 0.805 | -0.382 |
| Burglary | 0.893 | 0.226 |
| Larceny | 0.725 | 0.448 |
| Auto_Theft | 0.599 | 0.559 |
If the principal components are standardized to have unit variance, the loadings are the coefficients of the linear combination of in-model principal components used to approximate the original variables. If a correlation matrix is analyzed, then the loadings equal the correlations between the original variables and the principal components.
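For a correlation-matrix PCA, the loadings can be sketched as the eigenvector entries scaled by the square roots of the corresponding eigenvalues, reusing eigvals and eigvecs from the sketch above:

```python
# Loadings: eigenvector columns scaled by sqrt of their eigenvalues,
# which equal the variable-component correlations for a correlation PCA.
loadings = eigvecs * np.sqrt(eigvals)  # scales each column
print(np.round(loadings[:, :2], 3))    # loadings on PC 1 and PC 2
```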
Fitted Correlation Matrix:
| | Murder | Rape | Robbery | Assault | Burglary | Larceny | Auto_Theft |
|---|---|---|---|---|---|---|---|
| Murder | 0.861 | – | – | – | – | – | – |
| Rape | 0.666 | 0.803 | – | – | – | – | – |
| Robbery | 0.457 | 0.696 | 0.650 | – | – | – | – |
| Assault | 0.758 | 0.777 | 0.630 | 0.794 | – | – | – |
| Burglary | 0.385 | 0.739 | 0.729 | 0.632 | 0.848 | – | – |
| Larceny | 0.128 | 0.550 | 0.605 | 0.412 | 0.749 | 0.726 | – |
| Auto_Theft | -0.0268 | 0.419 | 0.508 | 0.268 | 0.661 | 0.684 | 0.671 |
This is an estimate of the correlation matrix that results by approximating the original variables with the in-model principal components.
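A short sketch of this approximation, reusing loadings and n_in_model from the sketches above: the fitted correlation matrix is the low-rank product of the in-model loadings with their transpose.

```python
# Fitted correlation matrix from the in-model components only.
L_k = loadings[:, :n_in_model]  # loadings of the in-model components
fitted_R = L_k @ L_k.T          # rank-k approximation of the correlations
print(np.round(fitted_R, 3))
```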
Principal Components Results Graphs
There are three PCA result graphs – Scree Plot, Component Loadings Plot, and Component Scores Plot. Below are examples of the result graphs together with captions explaining the information the graphs contain. The graphs are based on a study of crime data gathered across the United States. The original variables in the data are seven types of crimes: Murder, Rape, Larceny, Burglary, Auto Theft, Robbery and Assault. The rates per 100,000 people were measured for all 50 states.