statistical summary – R-posts.com

CougarStats: a free and open-source Statistics web app for Teaching and Learning

Hello,

I’d like to share CougarStats, a free and open-source R Shiny web app I developed to support the teaching and learning of Statistics. CougarStats runs entirely in a browser and is designed for accessibility and ease of use. You can explore the app here: https://www.cougarstats.ca/

The name CougarStats is inspired by Mount Royal University’s athletics mascot, the cougar, symbolizing strength and agility, and by the app’s focus on statistics.

Key features of CougarStats

Descriptive Statistics: Compute measures like mean, median, mode, quartiles, IQR, standard deviation, and identify potential outliers.
Data Visualization: Construct Boxplots, Histograms, and Scatterplots.
Probability: Calculate marginal, joint, union, and conditional probability for contingency tables; exact and cumulative probabilities for Binomial, Poisson, Negative Binomial and Hypergeometric distributions; and cumulative probabilities for the Normal distribution.
Sample Size Estimation: Determine the required sample sizes for various scenarios.
Statistical Inference: Construct confidence intervals and conduct hypothesis tests for one and two samples (means, proportions, and standard deviations).
ANOVA: Perform one-way Analysis of Variance with an option to conduct Bonferroni post hoc tests.
Regression and Correlation: Fit simple linear, multiple linear, and logistic regression models, and compute the Pearson correlation coefficient.
Categorical Data Analysis: Perform the chi-square test of independence, with or without Yates’ continuity correction, and Fisher’s exact test.
Nonparametric Tests: Perform the Mann-Whitney U test, the Kruskal-Wallis test, and others.

I would be delighted if you could explore CougarStats and share it with your students and colleagues who might find it useful.

Thank you for your time, and I look forward to hearing your thoughts.

Sincerely,

Ashok

—

Ashok Krishnamurthy, PhD

Associate Professor

Department of Mathematics and Computing

Mount Royal University

4825 Mount Royal Gate SW

Calgary, AB, T3E 6K6 Canada

[email protected]

Understanding R’s describe() Function: A Complete Guide to Summary Statistics

Introduction to describe()
Breaking Down the Output Columns
Key Statistics and Their Interpretation
Practical Examples
When to Use Which Statistic
Extending the Functionality
Conclusion

Introduction to `describe()`

The describe() function from R’s psych package (Revelle, 2023) provides a comprehensive statistical summary of your dataset. Unlike R’s base summary() function, it includes additional metrics that are particularly useful for data exploration and assumption checking.

library(psych)
describe(your_data)
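To see what describe() adds, it can help to look at base summary() first. The sketch below uses the built-in mtcars data and only calls psych when it is installed; the requireNamespace() guard is an addition for illustration, not part of the original example.

```r
# Base R: summary() reports six values for a numeric vector
# (minimum, quartiles, median, mean, maximum).
x <- mtcars$mpg
base_stats <- summary(x)
print(base_stats)

# psych::describe() additionally reports sd, trimmed mean, mad,
# skew, kurtosis and se. The guard keeps the script runnable
# on machines where psych is not installed.
if (requireNamespace("psych", quietly = TRUE)) {
  print(psych::describe(x))
}
```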

Breaking Down the Output Columns

Here’s what each column in the output represents:

Column	Description	Formula/Calculation	Ideal Use Case
vars	Variable index number	–	Tracking variable order
n	Complete cases	`length(na.omit(x))`	Data completeness check
mean	Arithmetic average	`sum(x)/n`	Normally distributed data
sd	Standard deviation	`sqrt(var(x))`	Measuring spread
median	50th percentile	`quantile(x, 0.5)`	Skewed distributions
trimmed	Mean after removing extremes	`mean(x, trim=0.1)`	Robust central tendency
mad	Median absolute deviation (scaled)	`1.4826 * median(abs(x - median(x)))`	Outlier-resistant spread
min	Minimum value	`min(x)`	Range assessment
max	Maximum value	`max(x)`	Range assessment
range	Max – Min	`max(x)-min(x)`	Total spread
skew	Distribution asymmetry	`sum((x-mean(x))^3)/(n*sd(x)^3)`	Detecting skew direction
kurtosis	Tailedness (excess)	`sum((x-mean(x))^4)/(n*sd(x)^4)-3`	Outlier propensity
se	Standard error	`sd(x)/sqrt(n)`	Precision of mean estimate
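The columns can be rebuilt by hand, which is a useful check on what each one means. This is a minimal base-R sketch, assuming psych's defaults (a 10% trim and the "type 3" skew and kurtosis formulas); the object name manual is just for illustration.

```r
# Base-R reconstruction of the describe() columns for one numeric vector.
# Assumes the psych defaults: trim = 0.1 and type-3 skew/kurtosis.
x <- mtcars$mpg
x <- x[!is.na(x)]   # work on non-missing values only
n <- length(x)
m <- mean(x)
s <- sd(x)

manual <- c(
  n        = n,
  mean     = m,
  sd       = s,
  median   = median(x),
  trimmed  = mean(x, trim = 0.1),
  mad      = mad(x),                        # 1.4826 * median(|x - median(x)|)
  min      = min(x),
  max      = max(x),
  range    = max(x) - min(x),
  skew     = sum((x - m)^3) / (n * s^3),
  kurtosis = sum((x - m)^4) / (n * s^4) - 3, # excess kurtosis
  se       = s / sqrt(n)
)
print(round(manual, 2))
```

The rounded values should line up with the describe(mtcars$mpg) output shown in the Practical Examples section.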

Key Statistics and Their Interpretation

Central Tendency

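Mean vs. Median: A noticeable gap between the two suggests skewness (the mean is pulled toward the longer tail)
Trimmed Mean: Reduces the influence of outliers (by default, the top and bottom 10% of values are dropped before averaging)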

Variability

SD vs. MAD: Use MAD when outliers are present
Range: Simple but outlier-sensitive
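A small made-up example shows why the robust statistics matter. The vectors x and x_out below are invented for illustration: the same ten values with and without a single extreme point.

```r
# Ten evenly spaced values, then the same values plus one extreme point.
x     <- 10:19
x_out <- c(x, 100)

# Compare outlier-sensitive statistics (mean, sd) with robust ones
# (median, 10% trimmed mean, scaled MAD).
compare <- rbind(
  clean        = c(mean = mean(x),     median = median(x),
                   trimmed = mean(x, trim = 0.1),
                   sd = sd(x),         mad = mad(x)),
  with_outlier = c(mean = mean(x_out), median = median(x_out),
                   trimmed = mean(x_out, trim = 0.1),
                   sd = sd(x_out),     mad = mad(x_out))
)
print(round(compare, 2))
```

The mean and SD move a lot once the value 100 is added, while the median, trimmed mean, and MAD barely change.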

Distribution Shape

Skewness:
- >0: Right-tailed
- <0: Left-tailed
- ≈0: Approximately symmetric
Kurtosis (Excess):
- >0: Heavy-tailed (more outliers than normal)
- <0: Light-tailed
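The sign rules above can be checked on tiny made-up vectors. The helpers skew3() and kurt3() are hypothetical names for this sketch; they implement the same formulas listed in the column table, which are assumed to match psych's default.

```r
# Type-3 skewness and excess kurtosis, written in base R.
skew3 <- function(x) {
  x <- x[!is.na(x)]
  sum((x - mean(x))^3) / (length(x) * sd(x)^3)
}
kurt3 <- function(x) {
  x <- x[!is.na(x)]
  sum((x - mean(x))^4) / (length(x) * sd(x)^4) - 3
}

right_tailed <- c(1, 2, 2, 3, 3, 3, 4, 10)  # one large value stretches the right tail
left_tailed  <- -right_tailed               # mirror image of the above
symmetric    <- c(1, 2, 3, 4, 5)            # flat and symmetric
heavy_tailed <- c(-10, rep(0, 8), 10)       # mostly zeros plus two extremes

print(c(right = skew3(right_tailed),
        left  = skew3(left_tailed),
        sym   = skew3(symmetric)))
print(c(heavy = kurt3(heavy_tailed),
        flat  = kurt3(symmetric)))
```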

Practical Examples

Example 1: MPG from mtcars

describe(mtcars$mpg)

Output Interpretation:

   vars  n   mean    sd median trimmed   mad min  max range skew kurtosis   se
1     1 32 20.09 6.03   19.2   19.70 5.41 10.4 33.9  23.5 0.61    -0.37 1.07

Right-skewed (mean > median, positive skew)
Light-tailed (negative kurtosis)
SD (6.03) > MAD (5.41): The SD is somewhat larger than the scaled MAD, which may hint at mild influence from the larger values
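One hedged way to probe the outlier reading is the usual boxplot rule, which flags values more than 1.5 times the box width beyond the hinges. This sketch uses base R's boxplot.stats(); the names flagged and ratio are just for illustration.

```r
x <- mtcars$mpg

# Values beyond 1.5 * (hinge-based IQR) from the box, if any.
flagged <- boxplot.stats(x)$out
print(flagged)

# How much larger the SD is than the scaled MAD.
ratio <- sd(x) / mad(x)
print(round(ratio, 2))
```

If flagged comes back empty, no point lies beyond the whiskers, and the gap between SD and MAD is more likely a reflection of the mild right skew than of isolated outliers.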

When to Use Which Statistic

Scenario	Recommended Statistics
Normal Distribution	Mean, SD
Skewed Data	Median, IQR, MAD
Outlier Detection	MAD, trimmed mean, kurtosis
Parametric Testing	Mean, SE
Nonparametric Analysis	Median, IQR

Extending the Functionality

Adding IQR

The default describe() doesn’t show IQR, but you can add it:

library(dplyr)
describe(mtcars) %>% 
  mutate(IQR = apply(mtcars, 2, IQR, na.rm = TRUE))
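As a cross-check that does not depend on dplyr, the same IQR values can be computed in base R. The psych documentation also lists an IQR argument for describe(); the guarded call below assumes that argument is available in your installed version.

```r
# IQR for every column of mtcars using base R (all columns are numeric).
iqr_by_var <- sapply(mtcars, IQR, na.rm = TRUE)
print(round(iqr_by_var, 2))

# If psych is installed, ask describe() for the IQR column directly.
if (requireNamespace("psych", quietly = TRUE)) {
  print(psych::describe(mtcars, IQR = TRUE))
}
```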

Comparing Groups

Use describeBy() for grouped statistics:

describeBy(mtcars$mpg, group = mtcars$cyl)
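To see roughly what a grouped summary contains, here is a base-R sketch that computes a few of the same statistics per cylinder group. It is not a replacement for describeBy(), just an illustration; by_cyl is a name chosen for this example.

```r
# Grouped n, mean, sd and median of mpg by number of cylinders, in base R.
groups <- split(mtcars$mpg, mtcars$cyl)
by_cyl <- do.call(rbind, lapply(groups, function(x) {
  c(n = length(x), mean = mean(x), sd = sd(x), median = median(x))
}))
print(round(by_cyl, 2))
```

Each row corresponds to one cylinder count (4, 6, or 8), so the means and medians can be compared across groups at a glance.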

Conclusion

R’s describe() function provides a powerful starting point for exploratory data analysis. By understanding each statistic it provides, you can:

Detect data quality issues
Choose appropriate analysis methods
Understand your variables’ distributions
Make informed decisions about data transformations

For formal reporting, consider supplementing these metrics with visualization and statistical tests.

Pro Tip: Always visualize your data alongside these statistics – numbers tell part of the story, but plots reveal the full picture!
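A minimal base-graphics sketch of that advice, using the same mpg variable as in the example above (titles and axis labels are my own choices):

```r
x <- mtcars$mpg

op <- par(mfrow = c(1, 2))   # two panels side by side

# Histogram with the mean (solid line) and median (dashed line) marked.
h <- hist(x, main = "Histogram of mpg", xlab = "Miles per gallon")
abline(v = c(mean(x), median(x)), lty = c(1, 2))

# Boxplot of the same variable for a quick look at spread and extremes.
boxplot(x, horizontal = TRUE, main = "Boxplot of mpg",
        xlab = "Miles per gallon")

par(op)                      # restore the previous plotting layout
```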

Happy coding!

—
Reference:
Revelle, W. (2023). psych: Procedures for Psychological, Psychometric, and Personality Research. Northwestern University.
