R-posts.com

Understanding R’s `describe()` Function: A Complete Guide to Summary Statistics


Introduction to describe()

The describe() function from R’s psych package (Revelle, 2023) returns a statistical summary of your dataset with one row per variable. Unlike base R’s summary() function, it also reports the standard deviation, trimmed mean, MAD, skew, kurtosis, and standard error, which are particularly useful for data exploration and assumption checking.

library(psych)
describe(your_data)

Breaking Down the Output Columns

Here’s what each column in the output represents:

| Column | Description | Formula/Calculation | Ideal Use Case |
|---|---|---|---|
| vars | Variable index number | | Tracking variable order |
| n | Number of non-missing values | length(na.omit(x)) | Data completeness check |
| mean | Arithmetic average | sum(x)/n | Normally distributed data |
| sd | Standard deviation | sqrt(var(x)) | Measuring spread |
| median | 50th percentile | quantile(x, 0.5) | Skewed distributions |
| trimmed | Mean after removing the lowest and highest 10% | mean(x, trim=0.1) | Robust central tendency |
| mad | Median absolute deviation, scaled by 1.4826 | mad(x), i.e. 1.4826*median(abs(x-median(x))) | Outlier-resistant spread |
| min | Minimum value | min(x) | Range assessment |
| max | Maximum value | max(x) | Range assessment |
| range | Max – Min | max(x)-min(x) | Total spread |
| skew | Distribution asymmetry | sum((x-mean(x))^3)/(n*sd(x)^3) | Detecting skew direction |
| kurtosis | Excess kurtosis (tailedness) | sum((x-mean(x))^4)/(n*sd(x)^4)-3 | Outlier propensity |
| se | Standard error of the mean | sd(x)/sqrt(n) | Precision of mean estimate |
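To make the formulas in the table concrete, here is a minimal base R sketch that recomputes each column for a single numeric vector. It assumes the default describe() settings (10% trim, missing values dropped); the name by_hand is just an illustrative choice, not part of psych.

```r
# Recompute describe()'s columns for one numeric vector using base R only.
x <- mtcars$mpg            # built-in dataset; any numeric vector works
x <- x[!is.na(x)]          # drop missing values, as describe() does by default
n <- length(x)

by_hand <- c(
  n        = n,
  mean     = mean(x),
  sd       = sd(x),
  median   = median(x),
  trimmed  = mean(x, trim = 0.1),   # drops 10% from each tail
  mad      = mad(x),                # includes the 1.4826 scaling constant
  min      = min(x),
  max      = max(x),
  range    = max(x) - min(x),
  skew     = sum((x - mean(x))^3) / (n * sd(x)^3),
  kurtosis = sum((x - mean(x))^4) / (n * sd(x)^4) - 3,
  se       = sd(x) / sqrt(n)
)

print(round(by_hand, 2))
```

Rounded to two decimals, these values should line up with what describe(mtcars$mpg) reports, which is a quick way to check your understanding of any single column.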

Key Statistics and Their Interpretation

Central Tendency

Variability

Distribution Shape

Practical Examples

Example 1: MPG from mtcars

describe(mtcars$mpg)

Output Interpretation:

   vars  n   mean    sd median trimmed   mad min  max range skew kurtosis   se
1     1 32 20.09 6.03   19.2   19.70 5.41 10.4 33.9  23.5 0.61    -0.37 1.07
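One way to read this row is to do a little arithmetic on it. The sketch below copies a few values from the output above and works through two common checks. The 1.96 multiplier assumes a normal approximation for the mean, which is a simplification for n = 32, and the name out is purely illustrative.

```r
# Values copied from the describe(mtcars$mpg) output above
out <- c(mean = 20.09, median = 19.2, sd = 6.03, skew = 0.61, se = 1.07)

# A mean above the median, together with positive skew,
# points to a longer right tail (a few high-mpg cars)
mean_minus_median <- out[["mean"]] - out[["median"]]

# Rough 95% interval for the mean: mean +/- 1.96 * se
# (normal approximation; an assumption, not something describe() reports)
ci <- out[["mean"]] + c(-1, 1) * 1.96 * out[["se"]]
print(round(ci, 2))
```

Here the interval works out to roughly 17.99 to 22.19 mpg, and the slightly negative kurtosis (-0.37) suggests tails a little lighter than a normal distribution's.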

When to Use Which Statistic

| Scenario | Recommended Statistics |
|---|---|
| Normal Distribution | Mean, SD |
| Skewed Data | Median, IQR, MAD |
| Outlier Detection | MAD, trimmed mean, kurtosis |
| Parametric Testing | Mean, SE |
| Nonparametric Analysis | Median, IQR |
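A small made-up example shows why the robust statistics are recommended for skewed or outlier-prone data. The vectors x and x_out below are invented for illustration only; the point is how differently the two families of statistics react to a single extreme value.

```r
# Toy data (invented): five ordinary values, then the same values plus one outlier
x     <- c(10, 12, 11, 13, 12)
x_out <- c(x, 100)

# Classical statistics shift a lot when the outlier is added
classical <- c(mean_before = mean(x), mean_after = mean(x_out),
               sd_before   = sd(x),   sd_after   = sd(x_out))

# Robust statistics barely move
robust <- c(median_before = median(x), median_after = median(x_out),
            mad_before    = mad(x),    mad_after    = mad(x_out))

print(round(classical, 2))
print(round(robust, 2))
```

In this toy case the mean more than doubles while the median and MAD do not change at all, which is the behaviour the table relies on.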

Extending the Functionality

Adding IQR

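By default, describe() doesn’t show the IQR. You can request it with describe(mtcars, IQR = TRUE), or append it yourself:

library(dplyr)
# sapply() returns one IQR per column, in the same order as describe()'s rows
describe(mtcars) %>% 
  mutate(IQR = sapply(mtcars, IQR, na.rm = TRUE))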

Comparing Groups

Use describeBy() for grouped statistics:

describeBy(mtcars$mpg, group = mtcars$cyl)
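To see what the grouping is doing, here is a rough base R approximation that splits mpg by cylinder count and computes a handful of the same columns within each group. It is only a sketch of the idea, not how describeBy() is implemented, and the names groups and grouped are illustrative.

```r
# Split mpg into one vector per cylinder count (4, 6, 8)
groups <- split(mtcars$mpg, mtcars$cyl)

# Compute a few of describe()'s columns within each group;
# t() puts groups in rows and statistics in columns
grouped <- t(sapply(groups, function(v) {
  c(n = length(v), mean = mean(v), sd = sd(v), median = median(v))
}))

print(round(grouped, 2))
```

Each row corresponds to one cylinder group, so differences in mean mpg across 4-, 6-, and 8-cylinder cars are visible at a glance.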

Conclusion

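R’s describe() function provides a powerful starting point for exploratory data analysis. By understanding each statistic it provides, you can check data completeness, choose summaries that suit the shape of your distribution, and spot skew or outliers early.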

For formal reporting, consider supplementing these metrics with visualization and statistical tests.

Pro Tip: Always visualize your data alongside these statistics – numbers tell part of the story, but plots reveal the full picture!

Happy coding!


Reference:
Revelle, W. (2023). psych: Procedures for Psychological, Psychometric, and Personality Research. Northwestern University.
