Category Archives: Blog

These are my thoughts on various topics.

Learn About Descriptive Statistics

I had been meaning to write a script and blog post on descriptive statistics for some time now, but with work and winter weather and the extra work that winter weather brings, and now that the winter weather is over trying to get back into an exercise routine (running up a hill is such a challenging experience, but when I get to the top of that hill I feel like Rocky Balboa on the steps at the steps at the entrance of the Philadelphia Museum of Art), I haven’t had the time to devote to this site that I would have liked. Well, that’s not entirely true. I have still been programming in my spare time. I just haven’t been able to share it here. I went to a conference in February and in my down time, I was able to write a script on descriptive statistics that I think gives a nice introduction to the area.

Before I go into descriptive statistics though, lets talk about statistics, which is concerned with the collection, analysis, interpretation and presentation of data. Statistics can generally be broken down into two categories, descriptive statistics and infernalinferential statistics, depending on what we would like to do with that data. When we are concerned with visualizing and summarizing the given data, descriptive statistics gives methods to operate on this data set. On the other hand, if we wish to draw conclusions about a larger population from our sample, then we would use methods from inferential statistics.

In the script on descriptive statistics I’ve written, I consider three different types of summaries for descriptive statistics:

Measures of Central Tendency
Mean – the arithmetic average of a set of values
Median – the middle number in a set of values
Mode – the most used number in a set of values

Maximum – the largest value in the data set
Minimum – the smallest value in the data set
Standard Deviation – the amount of variation in a set of data values
Variance – how far a set of numbers is spread out

Kurtosis – how peaked or flat a data set is
Skewness – how symmetric a data set is

Histogram Plots – a bar diagram where the horizontal axis shows different categories of values, and the height of each bar is related to the number of observations in the corresponding category.
Box and Whisker Plots – A box-and-whisker plot for a list of numbers consists of a rectangle whose left edge is at the first quartile of the data and whose right edge is at the third quartile, with a left whisker sticking out to the smallest value, and a right whisker sticking out to the largest value.
Stem and Leaf Plots – A stem and leaf plot illustrates the distribution of a group of numbers by arranging the numbers in categories based on the first digit.

Slope Formula

I was watching a football game a few days ago, and to prepare for it, we decided to pick up some snacks. As the game progressed, I found myself eating a lot of chips, but at halftime, there was one snack that remained unopened, a snack I had been thinking about all night, a snack that I couldn’t seem to locate until that moment – Recee’s Pieces. We purchased a pretty large sized bag that I thought would last a while. So at the end of halftime, when the bag was almost empty, someone made sure to warn me that at the rate I’m eating these things I’d be sure to be sick the next day.

Just to give you some insight into my how my mind works, the fact that she used the term “rate” took me back to classes of Algebra 1 where we were first learning about linear equations, slopes, y-intercepts, point slope form, slope intercept form, and so on. The more I thought about it, the more I thought this would be a good script to add to this site as it would probably provide help to many students who are currently enrolled in those classes as well as a remembrance to adults who took those classes years ago and wish to recall the concepts.

So I have added a script which helps go over the slope formula. It randomly generates two points and asks users to select which of a set of choices is the slope of those two numbers. In case you forget, there is a button where you can be given the slope as well.

Interactive Midpoint Formula

Interactive Midpoint Formula

I hope everyone had a great time over the holidays reconnecting with their family and friends. I definitely enjoyed hearing about the changes in their lives since we last spoke, and meeting new additions to the families. One of the best times I had was with a two year old who I had met earlier in the year. She was much more responsive this time, and as we sat and talked, she was eager to tell me the things she knew, from the alphabet to her numbers, to her shapes. So today’s blog-script is inspired by this conversation, which became a game of point and describe. I would point to an object and she would say “That’s a circle”, or “That’s a red triangle”. I was amazed at both how simple it was and how much I enjoyed this activity.

I enjoyed it so much that I decided that I’d like to have some programs on my site that were more of this genre. The first that I decided to write is a lesson on the midpoint formula. But instead of simply giving the formula and writing a script to walk users through the steps in calculating the midpoint, I thought I’d write a point-and-click approach to it.

The script will randomly generate two points in the XY plane and ask users to calculate the midpoint between these points. Five options are then given and the user is asked to select the radio button next to the correct choice. Once the submit button is pressed, the program will let users know if their choice is correct. Users can have the program generate new points at any time. There is also an option for users to have the midpoint formula displayed.

Because this is my first program of this sort, I am curious to know what users think. When generating choices that are supposed to be incorrect, what’s a good method for doing so? I decided not to keep a timer or a score for the “score” of a user, but I did think about it. Ultimately I wanted this to feel less like a “test” and more like a “game” so I decided against this option. But I would like to know what you think – either through your comments here or on twitter @MindAfterMath.

Once again, here’s the link to the script. I’ll keep you updated on if my new two year old friend likes this.

The Bridge Crossing Problem

Most puzzles are fun in their own right. Some puzzles are so fun that they have the added benefit that they are likely to come up in unexpected places, like maybe in a job interview. I was recently reading a paper by Günter Rote entitled “Crossing the Bridge at Night” where Rote analyzes such a puzzle. Upon finishing the paper, I decided to write a script so that users could see the general form of this puzzle.

The problem can be stated as follows: There is a set of people, lets make the set finite by saying that there are exactly n people, who wish to cross a bridge at night. There are a few restrictions that make crossing this bridge somewhat complicated.

  • Each person has a travel time across the bridge.
  • No more than two people can cross the bridge at one time.
  • If two people are on the bridge together, they must travel at the pace of the slower person.
  • There is only one flashlight and no party (of one or two people) can travel across the bridge without the flashlight.
  • The flashlight cannot be thrown across the bridge, and nobody can go to the store to purchase another flashlight

The image above shows the optimal solution when the 4 people have travel times of 1, 2, 5, and 10. The script I have written allows users to work with different numbers of people with random travel times. Give it a try and see if you can spot the patterns in the solution.

Hierarchical Clustering

Hierarchical Clustering algorithms give a nice introduction for computer science students to unsupervised machine learning. I say this because the bottom-up approach to Hierarchical clustering (which I have implemented here) is very similar to Kruskal’s algorithm for finding the minimum spanning tree of a graph.

In Kruskal’s algorithm, we begin by creating a forest, or a set of trees where each node is its own tree. The algorithm then selects the two trees that are closest together (closest being defined as the minimum cost edge between two distinct trees) and merges those trees together. This process of merging the closest two trees is then repeated until there is only one tree remaining, which is a minimum spanning tree of the graph.

Similarly, bottom-up hierarchical clustering of a group of points begins by saying that each point is its own cluster. Then the clusters are compared to one another to check if two clusters will be merged into one. Generally, there will be some stopping criteria, , saying that we do not want to merge two clusters together if their distance is greater than . So if the minimum distance between two clusters is less than we will proceed as in Kruskal’s algorithm by merging these two clusters together into one cluster. We repeat this process of merging the closest two clusters together until we find that the minimum distance between two clusters is greater than or equal to , in which case we can stop and the result is a partition of our data set into distinct clusters.

Hierarchical clustering is comparable to K-Means Clustering. Here are some differences between the two approaches:

  1. K-Means Clustering requires an initial number of desired clusters, while Hierarchical clustering does not.
  2. A run of K-Means Clustering will always give K clusters, whereas Hierarchical Clustering can give more or less, depending on our tolerance .
  3. K-Means can undo previous mistakes (assignments of an element to the wrong cluster), while Hierarchical Clustering cannot.

So, here is a link to my page on Hierarchical Clustering. Hope you enjoy.

Introduction to JavaScript Programming

Here is a link to my sample JavaScript code.

I received a lot of attention from friends interested in programming after my recent blog post entitled “Introduction to Python Programming”. While many found it interesting, the fact that Python is more useful to mathematicians hindered sine of my friends desire to learn it as their first language.

In out conversations, my recommendation for a first language was JavaScript. This is a powerful language in the sense that just about anybody who is involved with the internet knows it, and it’s likely to boost a person’s resume. It also has many similarities to more powerful languages like C++ and Java, so while not trivial, it could be a good launch pad into more advanced languages. But my favorite reason is that unlike many other programming languages that rely in an MS-DOS like command like approach for run time interaction, JavaScript’s basic interaction is with the standard internet browsers we use everyday. There isn’t even anything you need to download or install. Just create a basic HTML file in a text editor (like notepad, wordpad, or notepad++). This makes it easier to show off your creations which makes learning more fun.

The script I’ve finished provides examples on writing output, declaring variables, data types, conditionals, loops, and functions. Although I do not go into detail about all the events and objects on an HTML page, I do finish with three examples of more advanced JavaScript programs. Once you’ve selected a program, the code well be revealed in the text area. There is also a button that, when clicked, will execute that script on a new HTML tab.

I hope you enjoy, and let me know if you have any suggestions or comments.

With that being said, here is a link to my sample JavaScript code.

Simple Linear Regression

I finished a script that helps explain the concepts of simple linear regression.

We live in a world that is filled with patterns – patterns all around us just waiting to be discovered. Some of these patterns are not as easily discovered because of the existence of outside noise.

Consider for example an experiment where a set of people were each given the task of drinking a number of beers and having their blood alcohol level taken afterwards. Some noise factors in this could include the height and weight of the individual, the types of drinks, the amount of food eaten, and the time between drinks. Even with this noise, though, we can still see a correlation between the number of drinks and their blood alcohol level. Consider the following graph showing people’s blood-alcohol level after a given number of drinks. The x-axis represents the number of drinks and the y-axis is the corresponding blood alcohol level.

Example of Linear Regression

x 5 2 9 8 3 7 3 5 3 5 4 6 5 7 1 4
y 0.10 0.03 0.19 0.12 0.04 0.095 0.070 0.060 0.020 0.050 0.070 0.10 0.085 0.090 0.010 0.050

We can definitely see a correlation, and although the data doesn’t quite fit on a straight line. It leads us to ask further questions like can we use this data to build a model that estimates a person’s blood-alcohol level and how strong is this model?

One of the tools we can use to model this problem is linear regression. A linear regression takes a two-dimensional data set, with the assumption that one column (generally represented by the x variable) is independent and the second column (generally represented by the y variable) being dependent on the first column. The assumption is that the relationship between the two columns is linear and can be represented by the linear equation

y = 0 + 1x + e.

The right hand side of the above equation has three terms. The first two (0 and 1) are the parameters of the linear equation (the y-intercept and slope respectively), while the third term of the right hand side of the above equation represents the error term. The error term represents the difference between this linear equation and the y values in the data provided. We are seeking a line that minimizes the error term. That is, we are seeking to minimize

D = i = 1 to n [yi – (0 + 1xi)]2

There are several ways one could approach this problem. In fact, there are several lines that one could use to build a linear model. The first line that one may use to model these points is the one generated by only mean of the y values of the points, called the horizontal line regression.

For the data set above, the mean of the y values can be calculated as = 0.0738, so we could build a linear model based on this mean that would be y = 0.738. This horizontal line regression model is a horizontal line that predicts the same score (the mean), regardless of the x value. This lack of adjustments means it is generally a poor fit for most models. But as we will see later, this horizontal line regression model does serve a purpose in determining how well the model we develop performs.

A second attempt at solving this problem would be to generate the least squares line. This is the line that minimizes the D value listed above. We can see that D is a multi-variable polynomial, and we can find the minimum of such a polynomial using calculus, partial derivatives and Gaussian elimination (I will omit the work here because it deters us from the main point of this blog post, however Steven J. Miller has a good write-up of this).

The calculus leads us to the following equations:

SXY = i = 1 to n(xy) –
(i = 1 to nx)(i = 1 to ny)

SXX = i = 1 to n(x2) –
(i = 1 to nx)2

1 =

0 = 1

To calculate the least squares line for this example, we first need to calculate a few values:
i = 1 to n(xy) = 6.98
i = 1 to n(x2) = 443
i = 1 to nx = 77
i = 1 to ny = 1.18
Sxx = 72.44
Sxy = 1.30

This lets us evaluate that
1 = 0.018
0 = -0.0127

So the resulting linear equation for this data is

= -0.0127 + 0.018*x

Below is a graph of the two attempts at building a linear model for this data.

Example of Models of Linear Regression

In the above image, the green line represents the horizontal line regression model and the blue line represents the least-squares line. As stated above, the horizontal line regression model is a horizontal line that does not adjust as the data changes. The least-squares line adjusts both the slope and y-intercept of this line according to the data provided to better fit the data provided. The question becomes how well does the least-squares line fit the data.

The Sum of Squares Error (SSE) sums the deviation at each point of our data from the least-squares line.

SSE = i = 1 to n(yii)2

A second metric that we are interested in is how well the horizontal line regression linear model estimates our data. This is called the Total Sum of Squares (SST).

SST = i = 1 to n(yi)2

The horizontal line regression model ignores the independent variable x from our data set and thus any line that takes this independent variable into account will be an improvement on the horizontal line regression model. Because of this, the SST sum is a worse case scenario of how poorly our model can perform.

Knowing now that SST is always greater than SSE, the regression sum of squares (SSR) is the difference between the total sum of squares and the sum of squares error.


This tells us how much of the total sum of squares is accounted for by the model.

Finally, the coefficient of determination (r2) is defined by

r2 = SSR / SST

This tells SSR as a percentage of SST, or the amount of the variation in the data that is explained by the model.

So, check out my script on simple linear regression and let me know what you think.

Probability: Sample Spaces

I’ve been doing a few games lately (can be seen here, here and here) and, while I think those are very good ways to become interested in some of the avenues of math research, I also have had a few people come to me with questions regarding help with their classes. So I decided to write a script to try to help understand some elementary probability theory, focusing on discrete sample spaces.

Probability Image

In statistics, any process of observation is referred to as an experiment.
The set of all possible outcomes of an experiment is called the sample space and it is usually denoted by S. Each outcome in a sample space is called an element of the sample space. An event is a subset of the sample space or which the event occurs. Two events are said to be mutually exclusive if they have no elements in common.

Similar to set theory, we can form new events by performing operations like unions, intersections and compliments on other events. If A and B are any two subsets of a sample space S, then their union A ∪ B is the subset of S that contains all the elements that are in either A, in B, or in both; their intersection A ∩ B is the subset of S that contains all the elements that are in both A and B; the compliment A’ of A is the subset of S that contains all the elements of S that are not in A.

A probability is a function that assigns real numbers to events of a sample space. The following are the axioms of probability that apply when the sample space is discrete (finite or countable).

Axiom 1: The probability of an event is a non-negative real number; that is P(A) ≥ 0 for any subset A of S.
Axiom 2: The probability of the entire sample space is 1; that is P(S) = 1.
Axiom 3: If A1, A2, A3, … , is a finite or infinite sequence of mutually exclusive events of S, then
P(A1 ∪ A2 ∪ A3 ∪ …) = P(A1) + P(A2) + P(A3) + …
If A and B are any two events in a sample space S and P(A) ≠ 0, the conditional probability of B given A is

P(B | A) =
P(A ∩ B)


Two events A and B are independent if and only if P(A | B) = P(A) ∙ P(B).

Dots and Boxes Game

Dots and Boxes Game

When I was in high school, one of my favorite ways to waste time in class (not recommended) was to play a game called dots and boxes (although at the time we just called it dots). I was very surprised to find later that this game belongs to a class of games called “Impartial Combinatorial Games”. These are games where the moves available to the player depend only on the position of the game, and not the player.

In a game of Dots and Boxes, we start with an initial grid with dots at each row and column intersection. At each player’s turn, they have the option of drawing either a horizontal or vertical line between two neighboring dots (depending on if the dots are in the same row or column). If a player fills in the last line on a box (the 4th side), we say that player “owns” the box. The game ends when there are no neighboring dots without a line between them. At the conclusion of the game, the player who owns the most dots is declared the winner.

The game is impartial because there is no restriction on which move a player can make other than the fact that a player cannot re-do a move that has already been made (a partial version of this game would be if player one could only move horizontally and player two could only move vertically).

I have implemented a javascript version of this game. Check it out and let me know what you think.

I also spoke earlier about the discovery that this game in particular was an active area of research. I wanted to provide a link to a paper entitled “Solving Dots and Boxes” by Joseph K. Barker and Richard E Korf that speaks about winning strategies for each player in a game of dots and boxes.