SigmaStat Helps You Analyze Data Confidently, Visualize Results Easily
SigmaStat is an easy-to-use, wizard-based statistical software package that guides users through every step of an analysis, so they can perform powerful statistical analyses without being statistical experts. SigmaStat is tailored to life science and medical research, but can be a valuable product for scientists in many fields.
With SigmaStat, you can be confident that you have analyzed your data correctly.
And you save time, too!
Because it takes you step by step through the analysis, SigmaStat ensures that you:
• Use the proper statistical method to analyze your data
• Avoid the risk of statistical error
• Interpret the results correctly
• Generate an appropriate display and a professional report
SigmaStat 4.0 Overview
SigmaStat provides a wide range of powerful yet easy-to-use statistical analyses specifically designed to meet the needs of research scientists and engineers.
With the features in the program, you are guided through the process of choosing the appropriate test to analyze your data, running the test, and interpreting the results in the test report.
For many tests, graphs are available to summarize the test results.
The editing features in the program allow you to customize the appearance of reports and graphs. Your final results may be distributed using the large variety of file formats available for export.
- Solve a nonlinear regression problem with one of our 150 built-in fit equations or with a user-defined fit equation
- Raw data for the regression can be selected from the worksheet using a variety of data formats or can be selected from a plot in a graph
- The program’s default fit library contains models based on polynomials, rational functions, exponential growth and decay, sigmoidal functions, ligand binding, waveforms, logarithmic functions, probability distribution and density functions, piecewise linear and others
- User-defined equations are created from the Regression Wizard using the Edit Function dialog and are saved to our default fit library. They can optionally be saved to any notebook file. An equation item in a notebook will launch the Regression Wizard when double-clicked
- Fit model equations are coded with the Transform Language and can contain definitions of constants, weight variables, linear equality and inequality constraints and other variables
- Initial parameter values required by the algorithm can be specified as constants or be defined using our automatic parameter estimation functions in the Transform Language
- Fit equations can contain up to 500 parameters and up to 50 independent variables
- Weighted regression is supported with weights defined as a constant per observation or as functions of the regression parameters. Weight functions permit the user to apply robust procedures for parameter estimation that mitigate the effects of outliers. Examples are given in the program’s fit library and the installed sample files
- Options are available to create many types of results shown in the regression report or the worksheet.
- Graphs of the best-fit equation with the raw data can be created for models with two or three independent variables. Confidence bands can be added
- Models can be created to solve other types of problems besides ordinary data fitting. Examples are contained in the sample files and include global curve fitting, solving systems of equations, quantile regression and distribution fitting
- Produce a frequency histogram of a worksheet column
- Select from multiple graph output styles
Plot Equation Dialog
- Create the graph of a function in two or three dimensions
- Enter a user-defined function or select an equation item from an equation library
- Evaluate functions for specific values of the independent variable(s) or solve equations to obtain values for the independent variable for specified values of the dependent variable. Copy the results to paste to the worksheet, report or graph page
- Write your own numerical procedures, called Transforms, in the User-Defined Transform dialog. The Transform Language provides a vector-based computational environment with operations and functions that can manipulate worksheet data and perform many computations important to data analysis
- Transforms can be saved as items in a notebook file or as separate files with the extension (.xfm). The installed program contains several (.xfm) files. Examples of computational procedures in these transforms include cumulative distribution functions, bootstrapping, peak finding, frequency tables, the D’Agostino-Pearson normality test and estimating the variance of functions of a random variable
- The Transform Language is used in the Plot Equation dialog and the Regression Wizard to define Equation items in a notebook
- You can use the Quick Transform dialog for quickly creating and computing single-line transforms
- The dialog supports column picking and a functions palette to create transforms easily. The output columns from a quick transform can be titled with the transform itself for later reference
- Quick transforms are saved with the worksheet they use for their output
- The output of a quick transform can be updated automatically for changes in the input data
Typical graphs that can be rendered using the new SigmaStat v4:
Test Result Graphs
Graph Wizard and Histogram Example Graphs
Some of the many features found in new SigmaStat Version 4:
- Over 50 statistical tests to analyze your data
- Ribbon interface improves access to program commands
- An Advisor Wizard to assist you in choosing the correct test that is appropriate for your data
- A programming capability, the transform language, for creating additional analysis procedures and creating user-defined equations for fitting and plotting
- Powerful graph editing capabilities to enhance the appearance of graphs
- A Graph Wizard to create graphs that supplement the result graphs produced by the tests
- Additional graphing interfaces, including the Histogram Wizard, the Plot Equation dialog, and the Plot Regression dialog
- Sample files to assist you in understanding data formats, graph editing features, and the nonlinear regression capabilities
- Numerous import/export file formats for notebooks, worksheets, reports, and graphs
- Several data formats for statistics, including raw, indexed, and tabulated formats
- Several data formats for creating 2D and 3D graphs
Import/Export Worksheets and Notebooks
- ASCII file importer that imports comma delimited files and files with user-selected delimiters
- Excel, SPSS, Minitab, SYSTAT and SAS input data formats are supported by SigmaStat
- Excel and Access database files are supported
- Import any ODBC compliant database
- Run SQL queries on tables and selectively import information
- Export worksheets to plain and comma delimited text files
- Export worksheets to Excel, Minitab, SYSTAT, SAS, and more
- Export notebook files to earlier versions of SigmaStat and SigmaPlot
- Export a selection on a graph page or the entire graph page to bitmap or vector image files
- You can specify the color resolution, spatial resolution (DPI), height, and width of the exported images
- The bitmap formats include JPEG, GIF, PNG, BMP, TIFF-RGB, TIFF-CMYK, and PDF raster
- The vector formats include Enhanced Metafile (EMF), Windows Metafile (WMF), EPS, PDF vector, SVG and SWF
- Save a graph as a web page
- Can hold SigmaPlot worksheets, Excel worksheets, graph pages, reports, equations, and transforms
- A notebook docking pane that has several states: docks, resizable, hide-able, summary information mode, etc
- Browser-like notebook functionality that supports drag-n-drop capabilities
- Direct-editing of notebook summary information
- Multiple undo
- Adjust row height and column widths
- Long text strings and variable names
- Add row titles
- Format cells and empty columns
- Formatted text (subscript, etc.) in worksheet cells
- Data linked to graphs
- Handles missing data
- Interactive column title editing
- Promote text labels to column titles
- 32 million rows, 32,000 columns (limited by available memory)
- Sort, index, and stack data
- Transpose rows/columns
- Cut, Paste, Copy, etc
- Export in PDF, RTF, and HTML formats
- Select contents, copy & paste to Microsoft Word for additional editing
- Drag and drop Word documents into reports
- Multiple undo
- Print preview
- Supports embedded objects
- Result graphs can be copied and pasted into reports
- Graphical ruler to set tabs and margins
- Change font, text size, and text color
- Add numbered and bulleted lists
- Support for tables
- Change alignment, indentation, background color
- Find and Replace
- Option to “Explain Test Results”
- Option to set the number of significant digits for numeric data
- Help – Extensive online Help and PDF documentation for using SigmaStat
- Quick Access Toolbar – Customize to quickly access frequently used commands
- Tip of the Day – Provides useful information about program features
- Statistics Samples – A file of sample data for all statistical tests arranged in various formats
- Password Protection and Auditing – Limit access to notebook files
Typical Graphing Features
Graph Wizard Graphs
- Single and Multiple Scatter Plots
- Single and Multiple Line Plots
- Step Plots
- Bar Charts with Error Bars
- Column Means with Error Bars
- Point Plots
- Point and Column Means
- Box Plots
- Pie Chart
- Raw Residuals
- Standardized Residuals
- Normal Probability Plot
- 3D Scatter Plots
- 3D Mesh Plots
- Edit the attributes for graphs, axes, and plots with the Graph Properties dialog
- Customize attributes for graph objects with the Object Properties dialog
- Modify text on a graph page with the Edit Text dialog
- Change graph page options with the Page Setup dialog
- Multiple undo
- Select plots to perform regression analysis
- Predefined color schemes for symbols, lines, and solids
- Modify the computation of error bars
- Eight axis scaling options
- Gradient fills, color transparency, antialiasing
- Legends created automatically
- Multi-level zooming
Test Result Graphs
- Column Means (bar and scatter plots) with Error Bars
- Point Plot
- Histogram of Residuals
- Normal Probability Plot
- Before and After Plots
- Multiple Comparison Graph
- Grouped Bar Charts
- 3D Residual Scatter
- 3D Category Scatter
- ANOVA Profile Plots for Main Effects and Interactions
- Scatter Plot Residuals
- Adjusted Means Scatter with Error Bars
- Regression Lines and Scatter for Groups
- Box Plot
- Bar Chart Standardized Residuals
- Regression Curves with Confidence and Prediction Bands
- 3D Scatter and Mesh Plot
- Scatter Correlation Matrix
- Point and Column Means
- Survival Curves
- Adjusted Survival Curve
- Cumulative Hazard Curves
- Log-Log Survival Curves
- Scree Plot
- Component Loadings Plot
- Component Scores Plot
In 2007, SigmaStat’s functions and features were integrated into SigmaPlot starting at version 11.
However, on February 1, 2016, SigmaStat 4.0 was released and is now available as a standalone product.
Below are the many improvements to the statistical analysis functions in SigmaStat:
- Principal Components Analysis (PCA) – Principal component analysis is a technique for reducing the complexity of high-dimensional data by approximating the data with fewer dimensions.
- Analysis of Covariance (ANCOVA) – Analysis of Covariance is an extension of ANOVA (Analysis of Variance) obtained by specifying one or more covariates as additional variables in the model.
- Cox Regression – This includes the proportional hazards model with stratification to study the impact of potential risk factors on the survival time of a population. The input data can be categorical.
- One-Sample T-test – Tests the hypothesis that the mean of a population equals a specified value.
- Odds Ratio and Relative Risk tests – Both test the hypothesis that a treatment has no effect on the rate of occurrence of some specified event in a population. Odds Ratio is used in retrospective studies to determine the treatment effect after the event has been observed. Relative Risk is used in prospective studies where the treatment and control groups have been chosen before the event occurs.
- Shapiro-Wilk Normality test – A more accurate test than Kolmogorov-Smirnov for assessing the normality of sampled data. Used in assumption checking for many statistical tests, but can also be used directly on worksheet data.
- New Result Graph – ANOVA Profile Plots: Used to analyze the main effects and higher-order interactions of factors in a multi-factor ANOVA design by comparing averages of the least square means.
- New Probability Transforms – Thirty-four new functions have been added to SigmaStat’s Transform Language for calculating probabilities and scores associated with distributions that arise in many fields of study.
- New Interface Change – Nonlinear Regression: An easy to use wizard interface and more detailed reports.
- New Interface Change – Quick Transforms: An easier way of performing computations in the worksheet.
- New Interface Change – New User Interface: Allows the user to work more easily with Excel worksheets.
- Yates correction added to the Mann-Whitney test – The Yates correction for continuity, best known from the Yates chi-square test of independence in a contingency table, can now be applied in the Mann-Whitney test, which assesses whether two samples of observations come from the same distribution.
- Improved Error Messaging – Error messages now include additional information when assumption checking for ANOVA fails.
- Deming Regression – Deming regression allows for errors in both the X and Y variables. It is a technique for method comparison in which the X data come from one method and the Y data from the other. The Deming regression method extends ordinary linear regression, where the X values are considered to be error-free, to the case where both X and Y (both methods) have error. Hypotheses can then be tested, for example whether the slope differs from 1.0, to determine if the methods are the same. For example, it might be used to compare two instruments designed to measure the same substance or to compare two algorithmic methods of detecting tumors in images. The graph compares the two methods to determine if they are different or the same. A report gives statistical results.
- Akaike Information Criterion (AICc) – The Akaike Information Criterion is now available in nonlinear regression reports. It is a goodness of fit criterion that also accounts for the number of parameters in the equation. It also is valid for non-nested equations that occur, for example, in enzyme kinetics analyses.
- New Probability Functions for Nonlinear Regression – A total of 24 probability functions have been added to the curve fit library. Automatic initial parameter estimate equations have been created for each.
- Nonlinear Regression Weighting – There are now seven different weighting functions built into each nonlinear regression equation (3D are slightly different). These functions are reciprocal y, reciprocal y squared, reciprocal x, reciprocal x squared, reciprocal predicteds, reciprocal predicteds squared and Cauchy. The iteratively reweighted least squares algorithm is used to allow the weights to change during each nonlinear regression iteration.
- Multiple Comparison Test Improvements – Two important improvements have been made. P values for the results of nonparametric ANOVA have been added. These did not exist before. Also, multiple comparison P values were limited to discrete choices (0.05, 0.01, etc.). This limitation no longer exists and any valid P value may be used.
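The Deming regression item in the list above can be illustrated with a short Python sketch. It uses the standard closed-form Deming estimator and is not SigmaStat's implementation. The function name `deming_fit` and the parameter `delta` (the assumed ratio of the Y error variance to the X error variance, with 1.0 meaning equal error in both methods) are hypothetical names chosen for this example.

```python
import math

def deming_fit(x, y, delta=1.0):
    """Return (slope, intercept) of the Deming regression line.

    delta is the assumed ratio of the error variance in y to the
    error variance in x; it must be supplied by the analyst.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx = sum(x) / n
    my = sum(y) / n
    # Sample variances and covariance of the two measurement methods
    sxx = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - my) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    if sxy == 0:
        raise ValueError("x and y are uncorrelated; the slope is undefined")
    d = syy - delta * sxx
    slope = (d + math.sqrt(d * d + 4.0 * delta * sxy * sxy)) / (2.0 * sxy)
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical method-comparison data: readings of the same samples
# taken with two instruments.
method_a = [1.0, 2.0, 3.0, 4.0, 5.0]
method_b = [3.0, 5.0, 7.0, 9.0, 11.0]
slope, intercept = deming_fit(method_a, method_b)
```

If the two methods agree, the slope should be close to 1.0 and the intercept close to 0. A formal test of that hypothesis also needs standard errors, which this sketch does not compute.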
Major New Statistical Tests
Principal Components Analysis (PCA) – Principal component analysis is a technique for reducing the complexity of high-dimensional data by approximating the data with fewer dimensions. Each new dimension is called a principal component and represents a linear combination of the original variables. The first principal component accounts for as much variation in the data as possible. Each subsequent principal component accounts for as much of the remaining variation as possible and is orthogonal to all of the previous principal components.
You can examine principal components to understand the sources of variation in your data. You can also use them in forming predictive models. If most of the variation in your data exists in a low-dimensional subset, you might be able to model your response variable in terms of the principal components. You can use principal components to reduce the number of variables in regression, clustering, and other statistical techniques. The primary goal of Principal Components Analysis is to explain the sources of variability in the data and to represent the data with fewer variables while preserving most of the total variance.
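As a minimal sketch of the idea, the following Python code computes the principal components of a data set with just two variables, where the eigenvalues of the covariance matrix have a closed form. It illustrates the technique only and is not how SigmaStat performs the analysis. The function name `pca_2d` and the sample data are hypothetical.

```python
import math

def pca_2d(xs, ys):
    """Principal components of two variables from their covariance matrix.

    Returns (eigenvalues, unit eigenvectors, fraction of variance explained),
    each ordered from the first to the second principal component.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) ** 2 for x in xs) / (n - 1)                      # var(x)
    c = sum((y - my) ** 2 for y in ys) / (n - 1)                      # var(y)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)    # cov(x, y)
    # Eigenvalues of the symmetric matrix [[a, b], [b, c]]
    mid = (a + c) / 2.0
    half = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    lam1, lam2 = mid + half, mid - half
    # Eigenvector belonging to the largest eigenvalue
    if b != 0:
        v = (b, lam1 - a)
    elif a >= c:
        v = (1.0, 0.0)
    else:
        v = (0.0, 1.0)
    norm = math.hypot(v[0], v[1])
    v1 = (v[0] / norm, v[1] / norm)
    v2 = (-v1[1], v1[0])        # orthogonal to the first component
    total = lam1 + lam2
    if total == 0:
        raise ValueError("the data have no variance")
    return (lam1, lam2), (v1, v2), (lam1 / total, lam2 / total)

# Hypothetical data: two strongly correlated measurements
xs = [2.5, 0.5, 2.2, 1.9, 3.1]
ys = [2.4, 0.7, 2.9, 2.2, 3.0]
eigenvalues, components, explained = pca_2d(xs, ys)
```

Here `explained[0]` is the share of the total variance captured by the first principal component. When it is close to 1, the two variables can be summarized by a single component with little loss.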
Examples of Principal Components Graphs:
Analysis of Covariance (ANCOVA) – Analysis of Covariance is an extension of ANOVA obtained by specifying one or more covariates as additional variables in the model. If you arrange ANCOVA data in a SigmaPlot worksheet using the indexed data format, one column will represent the factor and one column will represent the dependent variable (the observations) as in an ANOVA design. In addition, you will have one column for each covariate.
Examples of ANCOVA Graphs:
New Statistical Features
Multiple Comparison Improvements – A significant improvement in multiple comparison P value computation has been made. The following multiple comparison procedures previously did not compute p-values analytically: Tukey (non-parametric tests), SNK (non-parametric tests), Dunnett’s, Dunn’s (non-parametric only), and Duncan. Instead, they used lookup tables of critical values for a particular distribution to determine, using interpolation if necessary, whether a computed test statistic represented a significant difference in the group means. Thus, p-values were not reported for these tests, only the conclusion of whether a significant difference existed or not. A major problem with this approach is that the lookup tables are only available for two significance levels, .05 and .01. Another problem is that many customers want to know the p-values. For SigmaPlot, algorithms have been coded to compute the distributions for the test statistics for all post-hoc procedures, making the lookup tables obsolete. As a result, adjusted p-values for all post-hoc procedures are now placed in the report. Also, there is no longer any need to restrict the significance level of multiple comparisons to .05 or .01. Instead, the significance level of multiple comparisons will be the same as the significance level of the main (omnibus) test. There is no limitation on this p value – any valid value may be used.
Akaike Information Criterion (AICc) – The Akaike Information Criterion has been added to the Regression Wizard and Dynamic Fit Wizard reports and the Report Options dialog. It provides a method for measuring the relative performance in fitting a regression model to a given set of data. Founded on the concept of information entropy, the criterion offers a relative measure of the information lost in using a model to describe the data.
More specifically, it gives a tradeoff between maximizing the likelihood for the estimated model (the same as minimizing the residual sum of squares if the data is normally distributed) and keeping the number of free parameters in the model to a minimum, reducing its complexity. Although goodness-of-fit is almost always improved by adding more parameters, overfitting will increase the sensitivity of the model to changes in the input data and can ruin its predictive capability.
The basic reason for using AIC is as a guide to model selection. In practice, it is computed for a set of candidate models and a given data set. The model with the smallest AIC value is selected as the model in the set which best represents the “true” model, or the model that minimizes the information loss, which is what AIC is designed to estimate.
After the model with the minimum AIC has been determined, a relative likelihood can also be computed for each of the other candidate models to measure the probability of reducing the information loss relative to the model with the minimum AIC. The relative likelihood can assist the investigator in deciding whether more than one model in the set should be kept for further consideration.
The computation of AIC is based on the following general formula obtained by Akaike¹:
$$\mathrm{AIC} = 2k - 2\ln(L)$$
where $k$ is the number of free parameters in the model and $L$ is the maximized value of the likelihood function for the estimated model.
When the sample size $n$ of the data is small relative to the number of parameters (some authors say when $n$ is not more than a few times larger than $k$), AIC will not perform as well to protect against overfitting. In this case, there is a corrected version of AIC given by:
$$\mathrm{AICc} = \mathrm{AIC} + \frac{2k(k+1)}{n - k - 1}$$
It is seen that AICc imposes a greater penalty than AIC when there are extra parameters. Most authors seem to agree that AICc should be used instead of AIC in all situations.
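The model selection procedure described above can be sketched in Python as follows. The sketch assumes least squares fitting with normally distributed errors, in which case AIC can be written, up to a constant shared by all models fitted to the same data, as n·ln(RSS/n) + 2k, where RSS is the residual sum of squares. Whether k should also count the error variance as a parameter is a convention that varies between authors. The function names and the candidate values are hypothetical and are not SigmaStat output.

```python
import math

def aic_least_squares(rss, n, k):
    """AIC for a least squares fit, up to an additive constant:
    n*ln(RSS/n) + 2k (assumes normally distributed errors)."""
    return n * math.log(rss / n) + 2 * k

def aicc(aic, n, k):
    """Small-sample corrected AIC: AIC + 2k(k+1)/(n-k-1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return aic + 2.0 * k * (k + 1) / (n - k - 1)

def relative_likelihoods(scores):
    """exp((min - score)/2) for each candidate; the best model gets 1.0."""
    best = min(scores)
    return [math.exp((best - s) / 2.0) for s in scores]

# Hypothetical candidates: (name, residual sum of squares, free parameters)
candidates = [("two-parameter", 12.0, 2), ("four-parameter", 11.5, 4)]
n_obs = 20

scores = [aicc(aic_least_squares(rss, n_obs, k), n_obs, k)
          for _, rss, k in candidates]
rel = relative_likelihoods(scores)
best_name = candidates[scores.index(min(scores))][0]
```

In this made-up example, the four-parameter model lowers the residual sum of squares only slightly, so the penalty for its two extra parameters outweighs the gain and the two-parameter model has the smaller AICc.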
Probability Fit Functions – 24 new probability fit functions have been added to the fit library standard.jfl. These functions and some equations and graph shapes are shown below:
As an example, the fit file for the Lognormal Density function contains the equation for the lognormal density lognormden(x,a,b), equations for automatic initial parameter estimation and the seven new weighting functions.
x = col(1)
y = col(2)
reciprocal_y = 1/abs(y)
reciprocal_ysquare = 1/y^2
reciprocal_x = 1/abs(x)
reciprocal_xsquare = 1/x^2
reciprocal_pred = 1/abs(f)
reciprocal_predsqr = 1/f^2
weight_Cauchy = 1/(1+4*(y-f)^2)
'Automatic Initial Parameter Estimate Functions
u = subblock(s,1,1,1, size(x))
v = subblock(s,2,1,2, size(y))
meanest = trap(u*v,u)
p = 1+varest/meanest^2
a= if(meanest > 0, ln(meanest/sqrt(p)), 0)
b= if(p >= 1, sqrt(ln(p)), 1)
fit f to y
''fit f to y with weight reciprocal_y
''fit f to y with weight reciprocal_ysquare
''fit f to y with weight reciprocal_x
''fit f to y with weight reciprocal_xsquare
''fit f to y with weight reciprocal_pred
''fit f to y with weight reciprocal_predsqr
''fit f to y with weight weight_Cauchy
Weight Functions in Nonlinear Regression – SigmaPlot equation items sometimes use a weight variable for the purpose of assigning a weight to each observation (or response) in a regression data set. The weight for an observation measures its uncertainty relative to the probability distribution from which it’s sampled. A larger weight indicates an observation which varies little from the true mean of its distribution while a smaller weight refers to an observation that is sampled more from the tail of its distribution.
Under the statistical assumptions for estimating the parameters of a fit model using the least squares approach, the weights are, up to a scale factor, equal to the reciprocal of the population variances of the (Gaussian) distributions from which the observations are sampled. Here, we define a residual, sometimes called a raw residual, to be the difference between an observation and the predicted value (the value of the fit model) at a given value of the independent variable(s). If the variances of the observations are not all the same (heteroscedasticity), then a weight variable is needed and the weighted least squares problem of minimizing the weighted sum of squares of the residuals is solved to find the best-fit parameters.
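Written out, the weighted least squares problem described above takes the following form. The symbols are notation introduced here for illustration: $y_i$ is the i-th observation, $f(x_i;\theta)$ is the value of the fit model with parameters $\theta$, $w_i$ is the weight, and $\sigma_i^2$ is the population variance of the i-th observation.

```latex
\hat{\theta} = \arg\min_{\theta} \sum_{i=1}^{n} w_i \,\bigl[\, y_i - f(x_i;\theta) \,\bigr]^2,
\qquad w_i \propto \frac{1}{\sigma_i^2}
```

When all the variances are equal, the weights are constant and the problem reduces to ordinary (unweighted) least squares.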
Our new feature will allow the user to define a weight variable as a function of the parameters contained in the fit model. Seven predefined weight functions have been added to each fit function (3D functions are slightly different). The seven shown below are 1/y, 1/y^2, 1/x, 1/x^2, 1/predicted, 1/predicted^2 and Cauchy.
One application of this more general adaptive weighting is in situations where the variances of the observations cannot be determined prior to performing the fit. For example, if all observations are Poisson distributed, then the population means, which are what the predicted values are estimating, are equal to the population variances. Although the least squares approach for estimating parameters is designed for normally distributed data, other distributions are sometimes used with least squares when other methods are unavailable. In the case of Poisson data, we need to define the weight variable as the reciprocal of the predicted values. This procedure is sometimes referred to as “weighting by predicted values”.
Another application of adaptive weighting is to obtain robust procedures for estimating parameters that mitigate the effects of outliers. Occasionally, there may be a few observations in a data set that are sampled from the tail of their distributions with small probability, or there are a few observations that are sampled from distributions that deviate slightly from the normality assumption used in least squares estimation and thus contaminate the data set. These aberrant observations, called outliers in the response variable, can have a significant impact on the fit results because they have relatively large (raw or weighted) residuals and thus inflate the sum of squares that is being minimized.
One way to mitigate the effects of outliers is to use a weight variable that is a function of the residuals (and hence, also a function of the parameters), where the weight assigned to an observation is inversely related to the size of the residual. The definition of what weighting function to use depends upon assumptions about the distributions of observations (assuming they are not normal) and a scheme for deciding what size of residual to tolerate. The Cauchy weighting function is defined in terms of the residuals, y-f where y is the dependent variable value and f is the fitting function, and can be used for minimizing the effect of outliers.
weight_Cauchy = 1/(1+4*(y-f)^2)
The parameter estimation algorithm we use for adaptive weighting, iteratively reweighted least squares (IRLS), is based on solving a sequence of constant-weighted least squares problems where each sub-problem is solved using our current implementation of the Levenberg-Marquardt algorithm. This process begins by evaluating the weights using the initial parameter values and then minimizing the sum of squares with these fixed weights. The best-fit parameters are then used to re-evaluate the weights.
With the new weight values, the above process of minimizing the sum of squares is repeated. We continue in this fashion until convergence is achieved. The criterion used for convergence is that the relative error between the square root of the sum of the weighted residuals for the current parameter values and the square root of the sum of weighted residuals for the parameter values of the previous iteration is less than the tolerance value set in the equation item. As with other estimation procedures, convergence is not guaranteed.
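The iteration described above can be sketched in Python. To keep the example self-contained, each fixed-weight sub-problem is a straight-line fit solved in closed form. This replaces the Levenberg-Marquardt solver the program uses and is only an illustration of the reweighting loop. The convergence check follows the description above, with the sum of weighted squared residuals evaluated using the weights of the current iteration (an assumption). All function names and the data are hypothetical.

```python
import math

def weighted_line_fit(x, y, w):
    """Closed-form weighted least squares for the line y = a + b*x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b

def cauchy_weights(residuals):
    """Cauchy weight function from the text: 1/(1 + 4*(y - f)^2)."""
    return [1.0 / (1.0 + 4.0 * r * r) for r in residuals]

def irls_line(x, y, a0, b0, tol=1e-8, max_iter=200):
    """Iteratively reweighted least squares for a line with Cauchy weights."""
    a, b = a0, b0
    prev_norm = None
    for iteration in range(1, max_iter + 1):
        # Evaluate the weights at the current parameter values ...
        w = cauchy_weights([yi - (a + b * xi) for xi, yi in zip(x, y)])
        # ... then minimize the sum of squares with those weights held fixed.
        a, b = weighted_line_fit(x, y, w)
        norm = math.sqrt(sum(wi * (yi - (a + b * xi)) ** 2
                             for wi, xi, yi in zip(w, x, y)))
        if prev_norm is not None and abs(norm - prev_norm) <= tol * max(prev_norm, 1e-300):
            return a, b, iteration
        prev_norm = norm
    return a, b, max_iter   # convergence is not guaranteed

# Hypothetical data on the line y = 1 + 2x with one outlier at x = 5
x = [float(i) for i in range(10)]
y = [1.0 + 2.0 * xi for xi in x]
y[5] += 20.0

a_ols, b_ols = weighted_line_fit(x, y, [1.0] * len(x))   # unweighted fit
a_irls, b_irls, n_iter = irls_line(x, y, a_ols, b_ols)   # robust fit
```

The unweighted fit is pulled toward the outlier, while the reweighted fit assigns the outlier a very small weight and stays close to the underlying line. Replacing `cauchy_weights` with a function that returns the reciprocal of the predicted values would give the "weighting by predicted values" scheme mentioned for Poisson data.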
Analysis of Variance and Covariance
- Independent and paired two sample t-tests, one-sample t-test
- One, two, and three way ANOVA
- One and two way repeated measures and mixed ANOVA
- One way ANCOVA with multiple covariates
- One-sample signed rank test
- Mann-Whitney rank sum test
- Wilcoxon signed rank test
- Kruskal-Wallis ANOVA on ranks
- Friedman repeated measures ANOVA
- Pearson product moment
- Spearman rank order
- Sample mean, standard deviation, standard error of mean, median, percentiles, sum and sum of squares, skewness, kurtosis, confidence interval for the mean, range, maximum and minimum values, normality tests, sample size, missing value content
Principal Components Analysis
- Covariance or correlation matrix analysis, multiple methods of component selection
Power and Sample Size in Experimental Design
- T-tests, ANOVA, proportions, chi-square and correlation
ANOVA Multiple Comparison Procedures
- Duncan’s Multiple Range
- Fisher LSD
- Bonferroni t-test
- Dunnett’s t-test
- Dunn’s test
- Stack data
- Index and Un-index data for one or two factor variables
- Center, standardize, and rank data
- Apply simple transforms to data using arithmetic operations and basic numerical functions
- Create dummy variables with either reference coding or effects coding
- Create sequences of random numbers that are uniform or normally distributed
- Filter data from worksheet columns using a key column
- Linear and multiple linear
- Polynomial for a specific order and model comparisons for several orders
- Stepwise, forward and backward
- Best subsets
- Multiple logistic
- Nonlinear, using built-in and user-defined models
Rates and Proportions
- Chi-Square analysis of contingency tables
- McNemar’s test
- Fisher exact test
- Odds Ratio
- Relative Risk
- Kaplan-Meier, including single group, LogRank, and Gehan-Breslow
- Cox Regression, including proportional hazards and stratified models
- Kolmogorov-Smirnov with Lilliefors correction
- Brown-Forsythe test for ANOVA analysis
- Spearman rank correlation test for regression analysis
- Check test assumptions like normality and equal variance that, if violated, may prompt the user to choose an alternate test
- Criterion options that change the way the analysis for a test is performed
- Options for running multiple comparison tests depending on the significance of main test effects
- Options for placing residuals, confidence intervals, predicted values, and weights in the worksheet
- Additional statistics and diagnostics in reports that explore your data and enhance the results, including outlier detection, multicollinearity, retrospective power and residual analysis
- Result graph options for scaling, error bars and color.