# ActivStats Tools

## Statistics Tools

These tools are provided free for students of statistics (and others) and are based on the ActivStats statistics e-book.

The first three demonstrate fundamental principles important to statistics.

The four online tables are a convenient way to look up critical values from those distributions.

The final four tools provide randomization-based ways to perform inference that can supplement classical frequentist inference methods, or even replace them.

These tools provide built-in datasets, but can also be used with any tab-delimited text data file for which the first row holds variable names.

Contact Us to submit an issue to the Data Desk Technical Support Team.

This is a simulation that demonstrates the Law of Large Numbers. Click on the striped bar to see a cursor speed across the bar. Release your mouse button to stop the cursor. The color over which it stops appears in the circle to the right and falls into the appropriate bin for its color. The graph at the bottom shows the accumulated percentage of red balls. Notice that the percentage settles down with more and more results, as the Law of Large Numbers says it should. Because the selection is under your control, you can be assured that the results are truly random and not pre-determined or even “pseudo-random”.

To accelerate the process, drag the slider to specify a “batch” size and click the Sample button to run the simulation many times.

Note that the “population” proportion of red (the fraction of the bar that actually is red) is randomly determined, so the value to which the experimental results converge is different each time you use the tool. Click the Reset button to make a new colored bar (with a different true percentage of red).
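The convergence described above can also be sketched in a few lines of Python. This is only an illustration, not the tool's own code: the function name `running_proportion` is made up here, and the draws come from Python's pseudo-random generator rather than from a mouse-driven cursor.

```python
import random


def running_proportion(p_red, n_trials, seed=None):
    """Return the proportion of 'red' outcomes after each of n_trials draws."""
    rng = random.Random(seed)
    reds = 0
    proportions = []
    for i in range(1, n_trials + 1):
        if rng.random() < p_red:  # a draw lands on red with probability p_red
            reds += 1
        proportions.append(reds / i)
    return proportions


if __name__ == "__main__":
    # Like the tool, pick the true proportion of red at random.
    p_red = random.random()
    props = running_proportion(p_red, 10_000)
    for n in (10, 100, 1_000, 10_000):
        print(f"after {n:>6} draws: {props[n - 1]:.3f} (true value {p_red:.3f})")
```

Early proportions can wander far from the true value; later ones should stay close to it.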

This tool is the continuous version of the Law of Large Numbers tool. Click on the blue bar to start the cursor. Release the mouse button to stop the cursor. The cursor runs at a constant speed between 0 and 1. The value at which it stops is shown to the right and falls into the collection bin. The graph at the bottom shows the mean of the accumulated values. As in the LLN tool, you can adjust the slider to select the size of a batch.

This tool demonstrates the Central Limit Theorem. In operation, it works much like the Random Quantitative Generator tool (which you might want to look at first). Click on the blue bar to start the cursor. Release the mouse button to stop the cursor. The cursor runs at a constant speed between 0 and 1. The value at which it stops is shown to the right and falls into the collection bin. Now set your sample size, n, between 1 and 36 with the slider. The tool will draw that many values at random, placing them in the bin on the right. It will then find the mean of these values and add it to the histogram at the bottom of the window. To speed things up, select a Batch size with the second slider. The tool will then repeat the experiment that many times, accumulating a histogram of the sample means. If the maximum of 300 samples isn’t enough, click the Sample button several times to add to your sample of means.

Notice that the histogram tends to be unimodal and symmetric and to resemble a Normal model. Experiment with the sample size to see how that affects the shape and spread of the histogram.

Now select a different underlying shape for the data from the list of alternatives. Physically, what happens is that the speed of the cursor adjusts to the height of the population distribution so that values are drawn at random from that population.

How does the tendency for the means to follow a Normal model depend on the sample size for different population distributions?
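One way to explore this question outside the tool is a small simulation. The sketch below is an assumption-laden stand-in, not the tool's implementation: the two population shapes (`uniform_draw` and `skewed_draw`) and the function `sample_means` are names invented for this example.

```python
import random
import statistics


def uniform_draw(rng):
    """One value from a flat population on [0, 1)."""
    return rng.random()


def skewed_draw(rng):
    """One value from a right-skewed (exponential, mean 1) population."""
    return rng.expovariate(1.0)


def sample_means(draw, n, n_samples, seed=None):
    """Means of n_samples independent samples, each of size n."""
    rng = random.Random(seed)
    return [
        statistics.fmean(draw(rng) for _ in range(n))
        for _ in range(n_samples)
    ]


if __name__ == "__main__":
    for name, draw in (("uniform", uniform_draw), ("skewed", skewed_draw)):
        for n in (1, 4, 36):
            means = sample_means(draw, n, 300)
            print(f"{name:8s} n={n:2d}  "
                  f"mean of means={statistics.fmean(means):.3f}  "
                  f"sd of means={statistics.stdev(means):.3f}")
```

The standard deviation of the means should shrink roughly in proportion to 1/sqrt(n). Plotting a histogram of `means` for each case shows how quickly the shape approaches a Normal model for each population.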

This is an interactive Normal probability table. It works just like those found in the back of most statistics textbooks, except that the graph at the top of the page changes to show the selected area under the Normal curve.

You can also drag the flag in the graph to any point you like and see the corresponding values in the table.

If you prefer a direct interface, click the Switch View command and type the z-score or probability into the box provided. The graph is still interactive here.
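The same two lookups (z-score to probability, and probability to z-score) are available in Python's standard library. This sketch uses `statistics.NormalDist`; the helper names `area_below` and `z_for_area` are made up for illustration.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def area_below(z):
    """Probability that a standard Normal value falls below z."""
    return STANDARD_NORMAL.cdf(z)


def z_for_area(p):
    """z-score with probability p below it (0 < p < 1)."""
    return STANDARD_NORMAL.inv_cdf(p)


if __name__ == "__main__":
    print(f"P(Z < 1.96)        = {area_below(1.96):.4f}")
    print(f"z with 97.5% below = {z_for_area(0.975):.4f}")
```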

This is an interactive Student’s t probability table. It works just like those found in the back of most statistics textbooks, except that the graph at the top of the page changes to show the shape of the distribution (varying by degrees of freedom) and to show the selected area under the curve, and the table extends to 1,000 degrees of freedom.

You can also drag the flag in the graph to any point you like and see the corresponding values in the table at the df currently selected in the table.

Paper t-tables are ordinarily reported only for selected probabilities, but by clicking the Insert Probability button, you can add a column for any probability value you like.

This is an interactive chi-square probability table. It works just like those found in the back of most statistics textbooks, except that the graph at the top of the page changes to show the selected area under the curve and to show the shape of the curve as it corresponds to the degrees of freedom, and the table extends to 200 df. You can also drag the flag in the graph to any point you like and see the corresponding values in the table at the df currently selected in the table.

Paper chi-square tables are ordinarily reported only for selected probabilities, but by clicking the Insert Probability button, you can add a column for any probability value you like.

The chi-square distribution changes radically from 1 df to more df. Try clicking at the top of a column and dragging down the table while watching the graph of the distribution. Chi-square becomes more nearly Normal, and its mean slides to higher values with higher df. In fact, the mean of the chi-square is the number of df. (The mode is at df − 2 for df of at least 2.) Above 200 df, use the fact that as the df, k, approaches infinity, chi-square on k df approaches the Normal with mean k and standard deviation sqrt(2k).
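The large-df Normal approximation can be turned into a rough critical-value calculation. The sketch below applies exactly that approximation (Normal with mean k and standard deviation sqrt(2k)); the function name `approx_chi2_critical` is invented here, and the result is only approximate, not an exact chi-square value.

```python
from math import sqrt
from statistics import NormalDist


def approx_chi2_critical(df, upper_tail_prob):
    """Approximate chi-square critical value for large df.

    Uses Normal(mean=df, sd=sqrt(2*df)) in place of the chi-square
    distribution, so it is reasonable only when df is large.
    """
    approximating_normal = NormalDist(mu=df, sigma=sqrt(2.0 * df))
    return approximating_normal.inv_cdf(1.0 - upper_tail_prob)


if __name__ == "__main__":
    for df in (250, 500, 1000):
        value = approx_chi2_critical(df, 0.05)
        print(f"df={df:5d}  approximate 5% critical value: {value:.1f}")
```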

This is an interactive F probability table. It works just like those found in the back of most statistics textbooks, except that the graph at the top of the page changes to show the selected area under the curve and to show the shape of the curve as it corresponds to the degrees of freedom. The table extends to 120 numerator and 120 denominator df.

You can also drag the flag in the graph to any point you like and see the corresponding values in the table at the df currently selected in the table.

This tool provides a way to draw a simple random sample from a set of data. From the Data menu select Built-in for a dataset from the archive or Import… to use your own data. Imported data should be in tab-delimited text format with variable names in the first row.

You will then see the first row of the data and a drop-down menu from which to choose the column of your data table you wish to sample from. When you select a column, the tool will make a histogram of the data in that column. To choose a different column, select the Change Variable command from the Data menu.

Now specify the sample size and the number of samples to draw. Click the Run button to perform the sampling. The lower histogram displays the means of the samples as they are drawn. It also shows limits on a middle portion of those means. You can drag the flags at the edges to see how wide an interval is needed to “catch” what percentage of the means. When you do this, the upper histogram (of the data) shows the corresponding values on the scale of the original data to show how the means vary less than the data. The Options menu offers a way to “unlink” the limits of the selected interval if you wish to make a one-sided interval.
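The procedure above can be approximated in a short script. This is a sketch under stated assumptions rather than the tool's code: `read_column`, `srs_means`, and `middle_interval` are hypothetical helper names, and the interval is taken from simple order statistics of the simulated means.

```python
import csv
import random
import statistics


def read_column(lines, name):
    """Read one numeric column from tab-delimited text.

    `lines` is any iterable of text lines (for example an open file);
    the first line must hold the variable names.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return [float(row[name]) for row in reader if row[name] not in (None, "")]


def srs_means(data, sample_size, n_samples, seed=None):
    """Means of repeated simple random samples drawn without replacement."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.sample(data, sample_size))
        for _ in range(n_samples)
    ]


def middle_interval(values, fraction=0.95):
    """Limits that enclose roughly the middle `fraction` of the values."""
    ordered = sorted(values)
    tail = (1.0 - fraction) / 2.0
    last = len(ordered) - 1
    return ordered[round(tail * last)], ordered[round((1.0 - tail) * last)]
```

With a file of your own, something like `read_column(open("mydata.txt"), "Height")` would load a column, where both the file name and the variable name are placeholders. Comparing `statistics.pstdev(means)` with `statistics.pstdev(data)` shows how much less the means vary than the data.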

The regression simple sample tool draws repeated samples (without replacement) from the data provided, computes the least squares regression slope relating two variables for each sample, and accumulates the resulting slopes. From the Data menu select Built-in for a dataset from the archive or Import… to use your own data. Imported data should be in tab-delimited text format with variable names in the first row.

You will then see the first row of the data and a drop-down menu from which to choose the columns of your data table you wish to relate. When you select the columns, the tool will make a scatterplot of the data and show a least squares regression line. The means of the two variables and the slope of the line are shown to the right of the scatterplot.

To choose different columns, select the Change Variable command from the Data menu.

Now specify the size of the samples and the number of samples to draw.

Click the Run button. As a sample is drawn, the corresponding points highlight on the scatterplot and the least squares line corresponding to those points is drawn on the plot. The scatterplot shows a “sheaf” of regression lines, one fit for each sample. The lower graph is a histogram of the slopes of the samples as they are drawn. It also shows limits on a middle portion of those slopes. You can drag the flags at the edges to locate a middle fraction of the slopes. When you do this, the scatterplot of the data shows the corresponding slope limits with red lines.

The bootstrap mean tool finds a bootstrapped confidence interval for the mean of any variable. From the Data menu select Built-in for a dataset from the archive or Import… to use your own data. Imported data should be in tab-delimited text format with variable names in the first row.

You will then see the first row of the data and a drop-down menu from which to choose the column of your data table for which you wish to find a bootstrapped confidence interval. When you select a column, the tool will make a histogram of the data in that column. To choose a different column, select the Change Variable command from the Data menu.

Now specify the number of bootstrap samples to draw. Click the Run button to perform the bootstrap. The lower histogram displays the means of the bootstrap samples as they are drawn. It also shows limits on a middle portion of those means. You can drag the flags at the edges to see how wide an interval is needed to find a bootstrap confidence interval for the mean. When you do this, the upper histogram (of the data) shows the corresponding values on the scale of the original data to show how the means vary less than the data. The Options menu offers a way to “unlink” the limits of the selected interval if you wish to make a one-sided interval.
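A minimal sketch of a bootstrap interval for a mean follows. It assumes the simple percentile method (taking the middle portion of the bootstrap means); the tool may compute its limits differently. The names `bootstrap_means` and `percentile_interval` are invented for this example, and the demonstration data are synthetic.

```python
import random
import statistics


def bootstrap_means(data, n_boot, seed=None):
    """Means of n_boot resamples, each drawn WITH replacement at full size."""
    rng = random.Random(seed)
    n = len(data)
    return [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_boot)]


def percentile_interval(values, level=0.95):
    """Middle `level` portion of the values, read off the sorted list."""
    ordered = sorted(values)
    tail = (1.0 - level) / 2.0
    last = len(ordered) - 1
    return ordered[round(tail * last)], ordered[round((1.0 - tail) * last)]


if __name__ == "__main__":
    # Synthetic illustration data, not a dataset from the tool's archive.
    rng = random.Random()
    data = [rng.gauss(50.0, 10.0) for _ in range(40)]
    low, high = percentile_interval(bootstrap_means(data, 2000), 0.95)
    print(f"sample mean {statistics.fmean(data):.2f}, "
          f"95% bootstrap interval ({low:.2f}, {high:.2f})")
```

Unlike the simple random sample tool, each bootstrap sample is the same size as the data and is drawn with replacement, so some observations appear more than once and others not at all.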

The regression bootstrap tool performs a bootstrapped estimate of a confidence interval for the least squares regression slope relating two variables. From the Data menu select Built-in for a dataset from the archive or Import… to use your own data. Imported data should be in tab-delimited text format with variable names in the first row.

You will then see the first row of the data and a drop-down menu from which to choose the columns of your data table you wish to relate. When you select the columns, the tool will make a scatterplot of the data and show a least squares regression line. The means of the two variables and the slope of the line are shown to the right of the scatterplot.

To choose different columns, select the Change Variable command from the Data menu.

Now specify the number of bootstrap samples to draw.

Click the Run button to perform the bootstrap. As a bootstrap sample is drawn, the corresponding points highlight on the scatterplot and the least squares line corresponding to those points is drawn on the plot. The scatterplot shows a “sheaf” of regression lines, one fit for each bootstrap sample. The lower graph is a histogram of the slopes of the bootstrap samples as they are drawn. It also shows limits on a middle portion of those slopes. You can drag the flags at the edges to see how wide an interval is needed to find a bootstrap confidence interval for the slope. When you do this, the scatterplot of the data shows the corresponding slope limits with red lines.

Technical note: There are other ways to bootstrap for regression slopes. This is the simplest, but it may not yield the narrowest confidence intervals. Consult more technical sources for details.
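For readers who want to experiment, here is a sketch of one simple scheme: resampling whole (x, y) pairs with replacement and refitting the slope each time. That this matches the tool's method is an assumption based on the description above; `ls_slope` and `bootstrap_slopes` are hypothetical names.

```python
import random
import statistics


def ls_slope(xs, ys):
    """Least squares slope of y on x."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def bootstrap_slopes(xs, ys, n_boot, seed=None):
    """Slopes from n_boot resamples of the (x, y) pairs, with replacement."""
    rng = random.Random(seed)
    pairs = list(zip(xs, ys))
    slopes = []
    while len(slopes) < n_boot:
        resample = rng.choices(pairs, k=len(pairs))
        boot_x, boot_y = zip(*resample)
        if len(set(boot_x)) < 2:
            continue  # slope is undefined when every x is identical
        slopes.append(ls_slope(boot_x, boot_y))
    return slopes
```

Sorting the returned slopes and reading off the middle 95% gives a percentile-style interval for the slope, in the same way as for the bootstrap mean.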

Take two variables: a binary variable X and a quantitative variable Y. Here Y is the speed of cars driving on a hill, and X records whether each car was driving uphill or downhill.

Determine the difference between the mean of Y for one value of X and the mean of Y for the other.

Randomly reassign the Y values to the X values, and repeat this difference of means calculation.

By plotting many random cases, we can compare our result to a distribution of null cases.
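The steps above can be sketched as a short permutation test. The function names are made up for this example, the two-sided p-value convention (counting shuffled differences at least as large in absolute value, plus one) is one common choice rather than something the tool documents, and the speeds in the demonstration are invented values, not the tool's dataset.

```python
import random
import statistics


def mean_difference(groups, values, first, second):
    """Mean of values labeled `first` minus mean of values labeled `second`."""
    a = [v for g, v in zip(groups, values) if g == first]
    b = [v for g, v in zip(groups, values) if g == second]
    return statistics.fmean(a) - statistics.fmean(b)


def permutation_differences(groups, values, first, second, n_perm, seed=None):
    """Differences of means after randomly reassigning values to groups."""
    rng = random.Random(seed)
    shuffled = list(values)
    diffs = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between X and Y
        diffs.append(mean_difference(groups, shuffled, first, second))
    return diffs


def two_sided_p(observed, null_diffs):
    """Share of null differences at least as extreme as the observed one."""
    extreme = sum(1 for d in null_diffs if abs(d) >= abs(observed))
    return (extreme + 1) / (len(null_diffs) + 1)


if __name__ == "__main__":
    # Invented speeds for illustration only.
    groups = ["up"] * 6 + ["down"] * 6
    speeds = [21.0, 24.5, 22.0, 19.5, 23.0, 20.5,
              26.0, 28.5, 25.0, 27.5, 24.0, 29.0]
    observed = mean_difference(groups, speeds, "down", "up")
    null = permutation_differences(groups, speeds, "down", "up", 1999)
    print(f"observed difference {observed:.2f}, "
          f"p ~ {two_sided_p(observed, null):.3f}")
```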

Take two binary variables, X and Y: for example, sex and handedness.

Record the count in the bottom left cell of the two-way table of X by Y.

Randomly reassign the Y values to the X values, and record the new value of the bottom left cell.

By plotting many random cases, we can compare our result to a distribution of null cases.
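A comparable sketch for two binary variables is shown below. Which cell counts as "bottom left" depends on how the table is laid out, so this example asks for the two levels that define the cell explicitly. The function names and the sex and handedness labels in the demonstration are invented for illustration.

```python
import random


def cell_count(xs, ys, x_level, y_level):
    """Number of cases with X equal to x_level and Y equal to y_level."""
    return sum(1 for x, y in zip(xs, ys) if x == x_level and y == y_level)


def permutation_counts(xs, ys, x_level, y_level, n_perm, seed=None):
    """Counts in the chosen cell after randomly reassigning Y to X."""
    rng = random.Random(seed)
    shuffled = list(ys)
    counts = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # row and column totals stay fixed
        counts.append(cell_count(xs, shuffled, x_level, y_level))
    return counts


if __name__ == "__main__":
    # Invented labels for illustration only.
    sex = ["F"] * 10 + ["M"] * 10
    hand = ["L"] * 3 + ["R"] * 7 + ["L"] * 2 + ["R"] * 8
    observed = cell_count(sex, hand, "M", "L")
    null = permutation_counts(sex, hand, "M", "L", 1000)
    as_small = sum(1 for c in null if c <= observed) / len(null)
    print(f"observed count {observed}; "
          f"share of shuffles with a count this small: {as_small:.3f}")
```

Because shuffling keeps the totals for each variable fixed, the count in one cell determines the other three, which is why tracking a single cell is enough.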