I had been meaning to write a script and blog post on descriptive statistics for some time now, but with work and winter weather and the extra work that winter weather brings, and now that the winter weather is over trying to get back into an exercise routine (running up a hill is such a challenging experience, but when I get to the top of that hill I feel like Rocky Balboa on the steps at the steps at the entrance of the Philadelphia Museum of Art), I haven’t had the time to devote to this site that I would have liked. Well, that’s not entirely true. I have still been programming in my spare time. I just haven’t been able to share it here. I went to a conference in February and in my down time, I was able to write a script on descriptive statistics that I think gives a nice introduction to the area.
Before I go into descriptive statistics though, lets talk about statistics, which is concerned with the collection, analysis, interpretation and presentation of data. Statistics can generally be broken down into two categories, descriptive statistics and infernal statistics, depending on what we would like to do with that data. When we are concerned with visualizing and summarizing the given data, descriptive statistics gives methods to operate on this data set. On the other hand, if we wish to draw conclusions about a larger population from our sample, then we would use methods from inferential statistics.
In the script on descriptive statistics I’ve written, I consider three different types of summaries for descriptive statistics:
Measures of Central Tendency
Mean – the arithmetic average of a set of values
Median – the middle number in a set of values
Mode – the most used number in a set of values
Maximum – the largest value in the data set
Minimum – the smallest value in the data set
Standard Deviation – the amount of variation in a set of data values
Variance – how far a set of numbers is spread out
Kurtosis – how peaked or flat a data set is
Skewness – how symmetric a data set is
Histogram Plots – a bar diagram where the horizontal axis shows different categories of values, and the height of each bar is related to the number of observations in the corresponding category.
Box and Whisker Plots – A box-and-whisker plot for a list of numbers consists of a rectangle whose left edge is at the first quartile of the data and whose right edge is at the third quartile, with a left whisker sticking out to the smallest value, and a right whisker sticking out to the largest value.
Stem and Leaf Plots – A stem and leaf plot illustrates the distribution of a group of numbers by arranging the numbers in categories based on the first digit.