I had been meaning to write a script and blog post on descriptive statistics for some time now. But between work, winter weather, the extra work that winter weather brings, and, now that winter is over, trying to get back into an exercise routine (running up a hill is such a challenging experience, but when I get to the top of that hill I feel like Rocky Balboa on the steps at the entrance of the Philadelphia Museum of Art), I haven’t had the time to devote to this site that I would have liked. Well, that’s not entirely true. I have still been programming in my spare time. I just haven’t been able to share it here. I went to a conference in February, and in my downtime I was able to write a script on descriptive statistics that I think gives a nice introduction to the area.

Before I go into descriptive statistics though, let’s talk about statistics, which is concerned with the collection, analysis, interpretation and presentation of data. Statistics can generally be broken down into two categories, descriptive statistics and ~~infernal~~ inferential statistics, depending on what we would like to do with that data. When we are concerned with visualizing and summarizing the data we have, descriptive statistics gives us methods to operate on that data set. On the other hand, if we wish to draw conclusions about a larger population from our sample, then we would use methods from inferential statistics.

In the script on descriptive statistics I’ve written, I consider four different types of summaries for descriptive statistics:

Measures of Central Tendency

**Mean** – the arithmetic average of a set of values

**Median** – the middle number in a set of values

**Mode** – the most frequently occurring value in a set of values
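
As a rough illustration of the three measures above, here is a minimal Python sketch using the standard library's `statistics` module. This is not the script from this post; the sample data and the `central_tendency` helper are made up for the example.

```python
from statistics import mean, median, multimode

def central_tendency(values):
    """Return the mean, median and mode(s) of a list of numbers."""
    return {
        "mean": mean(values),        # arithmetic average
        "median": median(values),    # middle value (average of the two middle values if the count is even)
        "modes": multimode(values),  # every value tied for most frequent
    }

# Hypothetical sample data, chosen only for illustration
data = [2, 3, 3, 5, 7, 10]
summary = central_tendency(data)
print(summary)
```

I used `multimode` rather than `mode` because a data set can have more than one most frequent value, and returning a list makes that case explicit.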

Dispersion

**Maximum** – the largest value in the data set

**Minimum** – the smallest value in the data set

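**Standard Deviation** – the typical distance of the data values from the mean, computed as the square root of the variance

**Variance** – how far a set of numbers is spread out, computed as the average squared distance from the mean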
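
The dispersion measures above can be sketched the same way. One caveat worth stating: variance and standard deviation come in a population version (divide by n) and a sample version (divide by n - 1), and I am not claiming here which one the script uses. The data below is hypothetical.

```python
from statistics import pstdev, pvariance, stdev, variance

# Hypothetical sample data with mean 5
data = [2, 4, 4, 4, 5, 5, 7, 9]

dispersion = {
    "minimum": min(data),
    "maximum": max(data),
    # Population versions divide the sum of squared deviations by n
    "population_variance": pvariance(data),
    "population_std_dev": pstdev(data),
    # Sample versions divide by n - 1 instead
    "sample_variance": variance(data),
    "sample_std_dev": stdev(data),
}

for name, value in dispersion.items():
    print(f"{name}: {value}")
```

For this data the squared deviations sum to 32, so the population variance is 32 / 8 and the sample variance is 32 / 7.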

Shape

**Kurtosis** – how peaked or flat a data set is, usually judged relative to a normal distribution

**Skewness** – how far a data set departs from symmetry about its mean
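
Python's standard library has no skewness or kurtosis function, so the sketch below computes both from central moments. It assumes the simple population (biased) moment formulas and reports excess kurtosis (kurtosis minus 3, so a normal distribution scores 0). Other conventions exist, and the helper names here are my own.

```python
def central_moment(values, k):
    """k-th central moment: the average of (x - mean)**k."""
    m = sum(values) / len(values)
    return sum((v - m) ** k for v in values) / len(values)

def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (0 for symmetric data)."""
    m2 = central_moment(values, 2)
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return central_moment(values, 3) / m2 ** 1.5

def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2**2 - 3."""
    m2 = central_moment(values, 2)
    if m2 == 0:
        raise ValueError("kurtosis is undefined for constant data")
    return central_moment(values, 4) / m2 ** 2 - 3.0

# Hypothetical data: one symmetric set, one with a long right tail
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 2, 10]

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_tailed), excess_kurtosis(right_tailed))
```

A positive skewness indicates a longer right tail, a negative one a longer left tail.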

Plots

**Histogram Plots** – a bar diagram where the horizontal axis is divided into intervals (bins) of values, and the height of each bar is related to the number of observations in the corresponding bin.
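
To make the binning idea concrete, here is a small text-only sketch. It assumes non-negative integer data and equal-width bins that start at multiples of the bin width. The function name and data are hypothetical.

```python
from collections import Counter

def histogram_counts(values, bin_width):
    """Count how many values fall in each bin [left, left + bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    # Sort by the left edge of each bin; bins with no observations are simply absent
    return dict(sorted(counts.items()))

# Hypothetical sample data
data = [1, 3, 4, 7, 8, 8, 12, 15, 15, 16, 21]
counts = histogram_counts(data, 5)

# Draw one row of '#' characters per bin
for left, n in counts.items():
    print(f"{left:>2}-{left + 4:<2} | {'#' * n}")
```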

**Box and Whisker Plots** – A box-and-whisker plot for a list of numbers consists of a rectangle whose left edge is at the first quartile of the data and whose right edge is at the third quartile, with a left whisker sticking out to the smallest value, and a right whisker sticking out to the largest value.
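
The numbers behind a box-and-whisker plot are often called the five-number summary. Here is a sketch using `statistics.quantiles`. Quartiles can be computed under several conventions, and I am assuming the `inclusive` method here, so other tools may give slightly different box edges for the same data.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    # n=4 splits the data into quartiles and returns the three cut points
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)

# Hypothetical sample data
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
low, q1, med, q3, high = five_number_summary(data)
print(f"whisker {low} | box {q1} to {q3}, median {med} | whisker {high}")
```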

**Stem and Leaf Plots** – A stem and leaf plot illustrates the distribution of a group of numbers by arranging the numbers in categories based on their leading digit or digits (the stem), with the final digit of each number listed as a leaf.
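
A stem and leaf plot is simple enough to build by hand. The sketch below assumes non-negative integers, with everything except the ones digit as the stem and the ones digit as the leaf. The helper name and data are made up for the example.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (value // 10) to its sorted list of leaves (value % 10)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)

# Hypothetical sample data
data = [23, 12, 47, 15, 30, 21, 34, 23]
plot = stem_and_leaf(data)

for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```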

Lol @infernal stats #truth #InferentialCanAlsoBeInfernal #YouDidThatOnPurpose

@Alexis, thanks for noticing this – bad typo on my part. I really don’t have a problem with inferential stats, and I’ll probably do something similar for them one day. But with that area I have a lot of possibilities for what to include, and there’s a whole question of whether I should do a single page like this or a script for each page. But getting back to my mistake, it’s more a product of not having an editor and thus using spellcheck as my main editor. I try not to make the common mistakes like your/you’re, their/they’re, its/it’s. Never thought I’d make the infernal/inferential mistake.