# SigmaPlot Has Extensive And Easy-To-Use Statistical Analysis Features

SigmaPlot is now bundled with SigmaStat as an easy-to-use package for complete graphing and data analysis. The statistical functionality was designed with the non-statistician in mind. This wizard-based statistical software guides users through every step of an analysis, so you can perform powerful statistical analyses without being a statistics expert.

Each statistical test makes certain assumptions that a data set must meet. If the underlying assumptions are not met, you may be given inaccurate or inappropriate results without knowing it.

SigmaPlot, however, checks whether your data set meets the test criteria and, if it does not, suggests a more appropriate test to run.

[toggle border='2' title='Statistical Analysis Features']
• Describe Data
• Single Group: One Sample t-test, One Sample Signed Rank Test
• Compare Two Groups: t-test, Rank Sum Test
• Compare Many Groups: One Way ANOVA, Two Way ANOVA, Three Way ANOVA, ANOVA on Ranks, One Way ANCOVA
• Before and After: Paired t-test, Signed Rank Test
• Repeated Measures: One Way Repeated Measures ANOVA, Two Way Repeated Measures ANOVA, Repeated Measures ANOVA on Ranks
• Rates and Proportions: z-test, Chi-Square, Fisher Exact Test, McNemar's Test, Relative Risk, Odds Ratio
• Regression: Linear, Multiple Logistic, Multiple Linear, Polynomial, Stepwise, Best Subsets, Regression Wizard, Deming
• Principal Components Analysis
• Correlation: Pearson Product Moment, Spearman Rank Order
• Survival: Kaplan-Meier, Cox Regression
• Normality
[/toggle] [toggle border='2' title='New Statistics Macros']

New Statistics Macros

• The Histogram and Kernel Density macro creates graphical estimates of underlying data distributions
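For context, a kernel density estimate smooths each observation into a small Gaussian bump and averages the bumps to estimate the underlying distribution. The macro does this inside SigmaPlot; the following is only a generic Python sketch of the computation, not SigmaPlot code:

```python
import numpy as np

def gaussian_kde(data, grid, bandwidth):
    # Average of Gaussian kernels centered at each data point,
    # evaluated on a grid of x-values.
    data = np.asarray(data, dtype=float)
    z = (grid[:, None] - data[None, :]) / bandwidth
    weights = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return weights.sum(axis=1) / (len(data) * bandwidth)

rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=500)   # hypothetical data
grid = np.linspace(-4, 4, 81)
density = gaussian_kde(sample, grid, bandwidth=0.4)
```

The resulting density is non-negative and integrates to approximately one; the bandwidth controls how smooth the estimate is.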
[/toggle] [toggle border='2' title='Enhancements to Existing Features']

Enhancements to Existing Features

• Analytic P values are implemented for all nonparametric ANOVAs
• All P values can now be specified for any value between 0 and 1
• The Akaike Information Criterion (AICc) is now found in the Regression Wizard and Dynamic Fit Wizard reports and the Report Options dialog
• The Rerun button has returned to the SigmaStat group
• The 24 probability functions in standard.jfl have been implemented in the curve fitter
• Seven weighting functions have been added to all curve fit equations in standard.jfl, with a slight variant for 3D equations
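For reference, one common form of the corrected AIC for least-squares fits is sketched below. This is the generic small-sample formula, not a description of SigmaPlot's internal code:

```python
import math

def aicc(n, k, rss):
    # Corrected Akaike Information Criterion for a least-squares fit:
    # n observations, k fitted parameters, rss = residual sum of squares.
    # The extra term 2k(k+1)/(n-k-1) penalizes small samples.
    aic = n * math.log(rss / n) + 2 * k
    return aic + 2 * k * (k + 1) / (n - k - 1)
```

When comparing candidate fits to the same data, the model with the smaller AICc is preferred; for equal residual error, the model with fewer parameters wins.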
[/toggle] [toggle border='2' title='New Statistics Features Found in Version 13'] One Way ANCOVA

Introduction
A single-factor ANOVA model is based on a completely randomized design in which the subjects of a study are randomly sampled from a population and then each subject is randomly assigned to one of several factor levels or treatments so that each subject has an equal probability of receiving a treatment. A common assumption of this design is that the subjects are homogeneous.

This means that any other variable, where differences between the subjects exist, does not significantly alter the treatment effect and need not be included in the model. However, there are often variables, outside the investigator's control, that affect the observations within one or more factor groups, leading to necessary adjustments in the group means, their errors, the sources of variability, and the P-values of the group effect, including multiple comparisons.

These variables are called covariates. They are typically continuous variables, but can also be categorical. Since they are usually of secondary importance to the study and, as mentioned above, not controllable by the investigator, they do not represent additional main-effects factors, but can still be included in the model to improve the precision of the results. Covariates are also known as nuisance variables or concomitant variables.

ANCOVA (Analysis of Covariance) is an extension of ANOVA obtained by specifying one or more covariates as additional variables in the model. The ANCOVA data arrangement in a SigmaPlot worksheet has one column with the factor and one column with the dependent variable (the observations) as in an ANOVA design. In addition, you will have one column for each covariate. When using a model that includes the effects of covariates, there is more explained variability in the value of the dependent variable.

This generally reduces the unexplained variance that is attributed to random sampling variability, which increases the sensitivity of the ANCOVA as compared to the same model without covariates (the ANOVA model). Higher test sensitivity means that smaller mean differences between treatments will become significant as compared to a standard ANOVA model, thereby increasing statistical power.
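This variance reduction can be seen directly by fitting both models with least squares. The sketch below uses hypothetical simulated data and NumPy, not SigmaPlot's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
groups = np.repeat([0, 1, 2], 20)                  # three hypothetical treatments
covariate = rng.normal(50.0, 10.0, groups.size)    # e.g. a prior-ability score
group_effect = np.array([0.0, 1.0, 2.0])[groups]
y = 10.0 + 0.5 * covariate + group_effect + rng.normal(0.0, 1.0, groups.size)

def residual_ms(X, y):
    # Residual mean square (unexplained variance) after a least-squares fit.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid / (len(y) - X.shape[1])

dummies = np.eye(3)[groups]                        # one indicator column per group
ms_anova = residual_ms(dummies, y)                 # ANOVA: factor only
ms_ancova = residual_ms(np.column_stack([dummies, covariate]), y)  # factor + covariate
# The covariate absorbs variability that the ANOVA model treats as random
# error, so ms_ancova comes out far smaller than ms_anova.
```

Smaller residual error means the same group differences produce larger F statistics, which is exactly the sensitivity gain described above.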

As a simple example of using ANCOVA, consider an experiment where students are randomly assigned to one of three types of teaching methods and their achievement scores are measured. The goal is to measure the effect of the different methods and determine if one method achieves a significantly higher average score than the others.

The methods are Lecture, Self-paced, and Cooperative Learning. Performing a One Way ANOVA on this hypothetical data gives the results in the table below, under the ANOVA column heading. We conclude there is no significant difference among the teaching methods. Also note that the variance unexplained by the ANOVA model, which is due to random sampling variability in the observations, is estimated as 35.17.

It is possible that students in our study may benefit more from one method than the others, based on their previous academic performance. Suppose we refine the study to include a covariate that measures some prior ability, such as a state-sanctioned Standards Based Assessment (SBA). Performing a One Way ANCOVA on this data gives the results in the table below, under the ANCOVA column heading.

| Method | ANOVA Mean | ANOVA Std. Error | ANCOVA Adjusted Mean | ANCOVA Std. Error |
|---|---|---|---|---|
| Coop | 79.33 | 2.421 | 82.09 | 0.782 |
| Self | 83.33 | 2.421 | 82.44 | 0.751 |
| Lecture | 86.83 | 2.421 | 84.97 | 0.764 |
| | P = 0.124 | MSres = 35.17 | P = 0.039 | MSres = 3.355 |

The adjusted mean given in the table for each method is a correction to the group mean that controls for the effect of the covariate. The results show the adjusted means are significantly different, with the Lecture method being the most successful. Notice how the standard errors of the means have decreased by almost a factor of three, while the variance due to random sampling variability has decreased by a factor of ten. A reduction in error is the usual consequence of introducing covariates and performing an ANCOVA analysis.
[/toggle] [toggle border='2' title='Assumption Checking'] In addition to the Normality and Equal Variance options, the One Way ANCOVA options will include testing for the equality of slopes (the Interaction Model).

• Normality Test: There are two options, Shapiro-Wilk and Kolmogorov-Smirnov, as is provided for the parametric ANOVA tests. The default P Value to Reject is .05 and the default test is Shapiro-Wilk. Normality testing is performed on the residuals of the Equal Slopes model or, if the Equality of Slopes Test fails, then the normality test is performed on the residuals of the Interaction Model.
• Equal Variance Test: The default P Value to Reject is .05. Levene’s mean test is used to assess equal variance. The test is performed on the residuals of the Equal Slopes Model or, if the Equality of Slopes Test fails, then the equal variance test is performed on the residuals of the Interaction Model.
• Equality of Slopes: One of the assumptions of ANCOVA is that there is no interaction between the Factor variable (the treatment levels) and the covariate variables; in other words, the coefficient of each covariate in the model is assumed to be the same for all treatments. The Equality of Slopes option provides the calculations to test this assumption. If the option is not selected, equal slopes are assumed and the report focuses on the results of fitting the ANCOVA Model (also called the Equal Slopes Model) to the data. If the option is selected, the Interaction Model is fit to the data to determine whether any of the interactions is significant. If an interaction between the factor and any covariate is significant (the equality of slopes test fails), the analysis stops, but the regression equations for each group are still provided. If no interaction between the factor and any covariate is significant (the test passes), the report continues with the results of the Equal Slopes Model.
[/toggle] [toggle border='2' title='ANCOVA Results']

An example of the ANCOVA report is shown. The assumption-checking results are displayed, followed by the ANOVA table and the results interpretation. The adjusted means are then displayed, followed by the multiple comparison results.

One Way Analysis of Covariance

Data source: Data 1 in ANCOVA_DataSets.JNB

Dependent Variable: Length

| Group Name | N | Missing | Mean | Std Dev | SEM |
|---|---|---|---|---|---|
| control | 62 | 0 | 1.217 | 0.140 | 0.0178 |
| exposed 1 | 27 | 0 | 1.206 | 0.0578 | 0.0111 |
| exposed 2 | 40 | 0 | 1.050 | 0.0994 | 0.0157 |
| Total | 129 | 0 | 1.163 | 0.137 | 0.0121 |

Normality Test (Shapiro-Wilk): Passed (P = 0.057)

Equal Variance Test: Failed (P < 0.050)

Equal Slopes Test:

The equality of slopes assumption is tested by extending the ANCOVA regression model to include terms for the interactions of the factor with the covariates.

R = 0.625, Rsqr = 0.390, Adj Rsqr = 0.365

Analysis of Variance:

| Source of Variation | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Col 1 | 2 | 0.124 | 0.0618 | 5.166 | 0.007 |
| Age | 1 | 0.0369 | 0.0369 | 3.086 | 0.081 |
| Col 1 x Age | 2 | 0.0681 | 0.0340 | 2.844 | 0.062 |
| Residual | 123 | 1.472 | 0.0120 | – | – |
| Total | 128 | 2.413 | 0.0189 | – | – |

The effect of the different treatment groups does not depend upon the value of covariate Age, averaging over the values of the remaining covariates. There is not a significant interaction between the factor Col 1 and the covariate Age (P = 0.062).

There are no significant interactions between the factor and the covariates. The equal slopes assumption passes and the equal slopes model is analyzed below.

Analysis of Equal Slopes Model:

R = 0.602, Rsqr = 0.362, Adj Rsqr = 0.347

Analysis of Variance:

| Source of Variation | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Col 1 | 2 | 0.229 | 0.114 | 9.290 | <0.001 |
| Age | 1 | 0.128 | 0.128 | 10.421 | 0.002 |
| Residual | 125 | 1.540 | 0.0123 | – | – |
| Total | 128 | 2.413 | 0.0189 | – | – |

The differences in the adjusted means among the treatment groups are greater than would be expected by chance; there is a statistically significant difference (P < 0.001). To isolate which group(s) differ most from the others, use a multiple comparison procedure. The adjusted means and their statistics are given in the table below.

The coefficient of covariate Age in the equal slopes regression model is significantly different from zero (P = 0.002). The covariate significantly affects the values of the dependent variable.

| Group Name | Adjusted Mean | Std. Error | 95% Conf-L | 95% Conf-U |
|---|---|---|---|---|
| control | 1.200 | 0.0150 | 1.170 | 1.230 |
| exposed 1 | 1.193 | 0.0217 | 1.150 | 1.236 |
| exposed 2 | 1.085 | 0.0206 | 1.044 | 1.126 |

The adjusted means are the predicted values of the model for each group where each covariate variable is evaluated at the grand mean of its sampled values.
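Concretely, each adjusted mean can be reproduced by evaluating the per-group regression equations at a single common Age value. The grand mean of Age is not listed in the report; the value of roughly 38.2 used below is back-calculated from the control group's intercept and adjusted mean (solving 1.307 + slope·x = 1.200), so treat it as illustrative:

```python
# Equal-slopes model from the report: Length = intercept_g + slope * Age
intercepts = {"control": 1.307, "exposed 1": 1.300, "exposed 2": 1.192}
slope = -0.00280

# Grand mean of Age (back-calculated, hypothetical; not shown in the report)
age_grand_mean = 38.2

adjusted_means = {group: b0 + slope * age_grand_mean
                  for group, b0 in intercepts.items()}
# Reproduces the report's adjusted means: 1.200, 1.193, 1.085
```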

All Pairwise Multiple Comparison Procedures (Holm-Sidak method):

Comparisons for factor: Col 1

| Comparison | Diff of Means | t | P | P<0.050 |
|---|---|---|---|---|
| control vs. exposed 2 | 0.115 | 4.178 | <0.001 | Yes |
| exposed 1 vs. exposed 2 | 0.109 | 3.456 | 0.001 | Yes |
| control vs. exposed 1 | 0.00693 | 0.271 | 0.787 | No |
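The Holm-Sidak procedure tests the p-values from smallest to largest against progressively less strict significance levels. A minimal sketch of the step-down logic (the report's "<0.001" entry is represented by a hypothetical 0.0001):

```python
def holm_sidak(p_values, alpha=0.05):
    # Step-down Holm-Sidak: compare the i-th smallest p-value against the
    # Sidak-corrected level 1 - (1 - alpha)^(1/(k - i)); once one comparison
    # fails, all remaining (larger) p-values are declared not significant.
    k = len(p_values)
    order = sorted(range(k), key=lambda i: p_values[i])
    significant = [False] * k
    for rank, i in enumerate(order):
        level = 1 - (1 - alpha) ** (1.0 / (k - rank))
        if p_values[i] <= level:
            significant[i] = True
        else:
            break
    return significant

# P-values from the pairwise comparisons above ("<0.001" taken as 0.0001)
flags = holm_sidak([0.0001, 0.001, 0.787])   # -> [True, True, False]
```

The result matches the Yes/Yes/No column of the report.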

Regression Equations for the Equal Slopes Model:

Each equation below is obtained by restricting the effects-coded dummy variables in the regression model to the values corresponding to each factor group.

A significant difference in the adjusted means of the factor groups is equivalent to a significant difference in the intercepts of the dependent variable for these equations.

Group: control

Length = 1.307 – (0.00280 * Age)

Group: exposed 1

Length = 1.300 – (0.00280 * Age)

Group: exposed 2

Length = 1.192 – (0.00280 * Age)
[/toggle] [toggle border='2' title='ANCOVA Result Graphs'] There are four ANCOVA result graphs: Regression Lines in Groups, Scatter Plot of Residuals, Adjusted Means with Confidence Intervals, and Normal Probability Plot.

Examples of ANCOVA Graphs [/toggle] [toggle border='2' title='Principal Components Analysis (PCA)']

Principal Components Analysis (PCA)

Introduction

Principal component analysis (PCA) is a technique for reducing the complexity of high-dimensional data by approximating the data with fewer dimensions. Each new dimension is called a principal component and represents a linear combination of the original variables. The first principal component accounts for as much variation in the data as possible. Each subsequent principal component accounts for as much of the remaining variation as possible and is orthogonal to all of the previous principal components.

You can examine principal components to understand the sources of variation in your data. You can also use them in forming predictive models. If most of the variation in your data exists in a low-dimensional subset, you might be able to model your response variable in terms of the principal components. You can use principal components to reduce the number of variables in regression, clustering and other statistical techniques.

The primary goal of Principal Components Analysis is to explain the sources of variability in the data and to represent the data with fewer variables while preserving most of the total variance.
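A correlation-matrix PCA can be sketched in a few lines of NumPy: standardize the variables, eigendecompose their correlation matrix, and project. This is a generic illustration with simulated data, not SigmaPlot's code:

```python
import numpy as np

rng = np.random.default_rng(2)
latent = rng.normal(size=(100, 1))                 # shared source of variation
X = np.column_stack([
    latent[:, 0] + 0.3 * rng.normal(size=100),     # two strongly correlated
    latent[:, 0] + 0.3 * rng.normal(size=100),     # hypothetical variables
    rng.normal(size=100),                          # one independent variable
])

# PCA on the correlation matrix: standardize, then eigendecompose.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
corr = Z.T @ Z / (len(Z) - 1)
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]                  # largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = Z @ eigvecs                               # principal component scores
explained = eigvals / eigvals.sum()                # proportion of total variance
```

Because the first two variables share a latent source, the first principal component captures most of the variance, and the scores for different components are uncorrelated by construction.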

Assumption Checking

Normality test: There are two options, Mardia's Skewness and Kurtosis tests and the Henze-Zirkler test.

Principal Components Results

An example of the Principal Components report is shown. The assumption checking results are displayed followed by descriptive statistics, the correlation matrix and its eigenvalues. The number of in-model principal components is displayed along with a test for equality of eigenvalues. Based on these results, interpretations are given as to the number of principal components supported.

Principal Components Analysis

Normality Test (Henze-Zirkler):

Statistic = 1.351; Failed (P < 0.050)

Descriptive Statistics:

| Variable | Mean | Std Dev |
|---|---|---|
| Murder | 7.444 | 3.867 |
| Rape | 25.734 | 10.760 |
| Robbery | 124.092 | 88.349 |
| Assault | 211.300 | 100.253 |
| Burglary | 1291.904 | 432.456 |
| Larceny | 2671.288 | 725.909 |
| Auto_Theft | 377.526 | 193.394 |

Total Observations: 50

Missing: 0

Valid Observations: 50

An observation is missing if any worksheet cell in its row has a non-numeric value.

Correlation Matrix:

| | Murder | Rape | Robbery | Assault | Burglary | Larceny | Auto_Theft |
|---|---|---|---|---|---|---|---|
| Murder | 1.000 | – | – | – | – | – | – |
| Rape | 0.601 | 1.000 | – | – | – | – | – |
| Robbery | 0.484 | 0.592 | 1.000 | – | – | – | – |
| Assault | 0.649 | 0.740 | 0.557 | 1.000 | – | – | – |
| Burglary | 0.386 | 0.712 | 0.637 | 0.623 | 1.000 | – | – |
| Larceny | 0.102 | 0.614 | 0.447 | 0.404 | 0.792 | 1.000 | – |
| Auto_Theft | 0.0688 | 0.349 | 0.591 | 0.276 | 0.558 | 0.444 | 1.000 |

Total Variance = 7.000

Eigenvalues of the Correlation Matrix:

| | Eigenvalue | Difference | Proportion (%) | Cumulative (%) |
|---|---|---|---|---|
| 1 | 4.115 | 2.876 | 58.785 | 58.785 |
| 2 | 1.239 | 0.513 | 17.696 | 76.481 |
| 3 | 0.726 | 0.409 | 10.369 | 86.850 |
| 4 | 0.316 | 0.0585 | 4.520 | 91.370 |
| 5 | 0.258 | 0.0359 | 3.685 | 95.056 |
| 6 | 0.222 | 0.0980 | 3.172 | 98.228 |
| 7 | 0.124 | — | 1.772 | 100.000 |

If two or more eigenvalues have the same value, then the corresponding principal components are not well-defined and any interpretation of them is suspect.

Number of In-Model Principal Components = 2

The in-model components correspond to all eigenvalues greater than or equal to the average eigenvalue. When analyzing the correlation matrix, the average eigenvalue is always 1.0. This criterion can be changed in the Test Options dialog on the Criterion panel. The variance of each principal component equals its corresponding eigenvalue.
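Using the eigenvalues from the report above, the default criterion amounts to:

```python
eigenvalues = [4.115, 1.239, 0.726, 0.316, 0.258, 0.222, 0.124]  # from the report
average = sum(eigenvalues) / len(eigenvalues)   # equals 1.0 for a correlation matrix
in_model = [ev for ev in eigenvalues if ev >= average]
# Two eigenvalues (4.115 and 1.239) meet the criterion, matching the report.
```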

Chi-Square Tests for the Equality of Eigenvalues:

Hypothesis: All eigenvalues are equal.
Statistic = 224.295
Degrees of freedom = 21.000
P value = <0.001

There is a significant difference in the eigenvalues. A principal components analysis can be conducted.

Hypothesis: The last 5 eigenvalues are equal.
Statistic = 39.287
Degrees of freedom = 13.209
P value = <0.001

There is a significant difference in the last 5 eigenvalues. You may want to include additional principal components in your model by changing the settings in the Test Options dialog on the Criterion panel.

Eigenvectors of the Correlation Matrix:

| | PC 1 | PC 2 |
|---|---|---|
| Murder | 0.300 | -0.629 |
| Rape | 0.432 | -0.169 |
| Robbery | 0.397 | 0.0422 |
| Assault | 0.397 | -0.344 |
| Burglary | 0.440 | 0.203 |
| Larceny | 0.357 | 0.402 |
| Auto_Theft | 0.295 | 0.502 |

Each principal component is a linear combination of the original variables, after each original variable has been standardized to have unit variance. The coefficients of this linear combination are the entries in the corresponding column of the above table. These coefficients provide the interpretation of the principal components in terms of the original variables.

Standard Errors for the Eigenvector Entries:

| | PC 1 | PC 2 |
|---|---|---|
| Murder | 0.0754 | 0.0790 |
| Rape | 0.0387 | 0.100 |
| Robbery | 0.0476 | 0.153 |
| Assault | 0.0512 | 0.0888 |
| Burglary | 0.0371 | 0.0898 |
| Larceny | 0.0620 | 0.151 |
| Auto_Theft | 0.0729 | 0.160 |

Component Loadings:

| | PC 1 | PC 2 |
|---|---|---|
| Murder | 0.609 | -0.700 |
| Rape | 0.876 | -0.189 |
| Robbery | 0.805 | 0.0470 |
| Assault | 0.805 | -0.382 |
| Burglary | 0.893 | 0.226 |
| Larceny | 0.725 | 0.448 |
| Auto_Theft | 0.599 | 0.559 |

If the principal components are standardized to have unit variance, the loadings are the coefficients of the linear combination of in-model principal components used to approximate the original variables. If a correlation matrix is analyzed, then the loadings equal the correlations between the original variables and the principal components.
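For a correlation-matrix PCA the loadings follow from the eigenvectors and eigenvalues by a simple rescaling, so the first row of the loadings table can be checked by hand:

```python
import math

eigenvalues = [4.115, 1.239]      # the two in-model components (report above)
murder_coeffs = [0.300, -0.629]   # Murder row of the eigenvector table

# Loading = eigenvector entry * sqrt(eigenvalue); for a correlation-matrix
# PCA this equals the correlation between the variable and the component.
murder_loadings = [c * math.sqrt(ev) for c, ev in zip(murder_coeffs, eigenvalues)]
# Gives approximately 0.609 and -0.700, matching the loadings table.
```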

Fitted Correlation Matrix:

| | Murder | Rape | Robbery | Assault | Burglary | Larceny | Auto_Theft |
|---|---|---|---|---|---|---|---|
| Murder | 0.861 | – | – | – | – | – | – |
| Rape | 0.666 | 0.803 | – | – | – | – | – |
| Robbery | 0.457 | 0.696 | 0.650 | – | – | – | – |
| Assault | 0.758 | 0.777 | 0.630 | 0.794 | – | – | – |
| Burglary | 0.385 | 0.739 | 0.729 | 0.632 | 0.848 | – | – |
| Larceny | 0.128 | 0.550 | 0.605 | 0.412 | 0.749 | 0.726 | – |
| Auto_Theft | -0.0268 | 0.419 | 0.508 | 0.268 | 0.661 | 0.684 | 0.671 |

This is an estimate of the correlation matrix that results by approximating the original variables with the in-model principal components.
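Each fitted correlation is the sum, over the in-model components, of the products of the two variables' loadings. The first two rows of the loadings table reproduce the corresponding fitted entries:

```python
import numpy as np

# Component loadings for the two in-model PCs (from the loadings table above)
loadings = np.array([
    [0.609, -0.700],   # Murder
    [0.876, -0.189],   # Rape
])

# Each fitted correlation is the dot product of the two variables' loadings.
fitted = loadings @ loadings.T
# fitted[0, 0] ~ 0.861 (Murder-Murder), fitted[0, 1] ~ 0.666 (Murder-Rape),
# matching the fitted correlation matrix above.
```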
[/toggle] [toggle border='2' title='Principal Components Results Graphs']

Principal Components Results Graphs

There are three PCA result graphs: Scree Plot, Component Loadings Plot, and Component Scores Plot. Below are examples of the result graphs together with captions explaining the information the graphs contain. The graphs are based on a study of crime data gathered across the United States. The original variables in the data are seven types of crimes: Murder, Rape, Larceny, Burglary, Auto Theft, Robbery and Assault. The rates per 100,000 people were measured for all 50 states. [/toggle]