Category Archives: Blog

These are my thoughts on various topics.

Range Puzzles

Range Puzzles

I have always liked puzzles. I really enjoy discovering a new puzzle. Sometimes when I have discovered a new puzzle, I enjoy it so much that I can’t seem to do enough of them. In such cases, I quickly run out of these puzzles to do and find myself looking for either a new puzzle or a way to generate them on my own.

Such was the case when I discovered the “Range Puzzles”. The rules are simple:

Every cell is marked either blue or gray
No two gray cells can be next to one another
The grid must be a connected (i.e. there is always a path from every cell to every other cell using horizontal and vertical connections.
Some cells have a number inside. This indicates the number of cells that can be viewed (horizontally and vertically in both directions) by this cell, including the cell itself.

To try out these puzzles, visit Range Puzzles at

How To Take Notes in Math Class

I was recently asked by someone how they should be taking notes in math class. I could immediately relate because I once asked this very same question. In both undergrad and grad, I had to ask myself how to take notes because I often would leave class with a bunch of sentence fragments based on what the professor said, but without anything I could use as a study guide. At best, it would be a garbled thing that I could combine with somebody else’s notes and try to make some legible out of it. Generally though, I would just ignore my notes and go to the text book if it was well written, or the library if the text book was not well written.

So how did I get past this? Well, after taking the Set Theory course, I started seeing mathematics as more of a construction job, like building a house. Mathematics is based on proofs which is nothing more than logical reasoning, and logical reasoning is just a series of statements that are either assumptions, definitions, or conclusions drawn from those assumptions based on known facts. The main purpose of classes is to present these “known facts” to the students and help them become more informed mathematicians. So what is a mathematical fact?

There are two important types of facts we generally learn in mathematics. The first gives us a language we can work from – these are our definitions. These are the most important things in a mathematics class because if it is a class on “group theory”, then one of the first definitions will be of a “group,” and not knowing this definition will cause problems when you are trying to prove that something is or is not a group. Similarly, if your class is on solving quadratic equations, then it is very important to be able to define what a “quadratic equation” is. Definitions in mathematics are important because, unlike in an English class, you cannot always derive the meaning of a mathematical term from its usage. Mathematical definitions are very precise and your notes should include this precision.

As a side note, I will state (if not stress) that examples are not definitions. Examples are meant to bring the definition to life, and to help connect the definition to the student, but if you are asked what is a group and respond that “the set of integers mod 7 under the operation of addition is a group”, you’d be incorrect because you gave an example of a group without telling why this is a group. Similarly, if I were to ask you to define a quadratic equation and you said “x^2 + 2x + 1 = 0 is a quadratic equation”, you’d be giving me an example but not a definition. It is important to understand the distinction between examples and definitions because when we understand the definition we can clearly explain why (ala prove) that the given instance is in fact an example.

Some professors will briefly mention a definition and then focus most of their time on examples in an attempt to make the concepts easier to understand. As a student though, it is important to ask the question “what is the definition of _____” because the exams, or the research that follows, or the applications of this concept in real life are not likely to be that same example. If the professor does not write out the definition of whatever concept you are studying, or you are unclear of this definition you should ask questions, go to an online resource, or a text book for supplementary reference.

The second type of fact presented in mathematical classes is the theorem (aka lemma or corollary). A theorem is a statement that is provable by mathematical reasoning. In a classroom setting, theorems will generally be presented in an orderly fashion such that if Theorem 1 is presented before Theorem 2, then Theorem 1 does not require Theorem 2 in order to be proven.

Because theorems are provable statements, it may be tempting to jump directly into the proof, and particularly to only take notes on the proof. I cannot stress against this enough. Before writing the proof, you should make sure to begin the proof with a clear statement of the theorem. This should be a declarative statement that is thus provable. Do not confuse the questions a professor may ask before proving a theorem with the theorem itself. An example of a theorem from group theory is “Any group that has a prime number of elements of elements in that group is cyclic (or can be generated by a single element). Also, do not confuse the nicknames of theorems with the theorem itself. For instance Lagrange’s Theorem says that “the number of elements of any subgroup of a finite group divides the number of elements in the original group.” Remembering this as Lagrange’s Theorem is fine, but it is much more important to remember the declarative statement proved.

I’m sure there are other things people use to take notes in math classes. Feel free to leave a comment sharing some of this advice.

Learn About Descriptive Statistics

I had been meaning to write a script and blog post on descriptive statistics for some time now, but with work and winter weather and the extra work that winter weather brings, and now that the winter weather is over trying to get back into an exercise routine (running up a hill is such a challenging experience, but when I get to the top of that hill I feel like Rocky Balboa on the steps at the steps at the entrance of the Philadelphia Museum of Art), I haven’t had the time to devote to this site that I would have liked. Well, that’s not entirely true. I have still been programming in my spare time. I just haven’t been able to share it here. I went to a conference in February and in my down time, I was able to write a script on descriptive statistics that I think gives a nice introduction to the area.

Before I go into descriptive statistics though, lets talk about statistics, which is concerned with the collection, analysis, interpretation and presentation of data. Statistics can generally be broken down into two categories, descriptive statistics and infernalinferential statistics, depending on what we would like to do with that data. When we are concerned with visualizing and summarizing the given data, descriptive statistics gives methods to operate on this data set. On the other hand, if we wish to draw conclusions about a larger population from our sample, then we would use methods from inferential statistics.

In the script on descriptive statistics I’ve written, I consider three different types of summaries for descriptive statistics:

Measures of Central Tendency
Mean – the arithmetic average of a set of values
Median – the middle number in a set of values
Mode – the most used number in a set of values

Maximum – the largest value in the data set
Minimum – the smallest value in the data set
Standard Deviation – the amount of variation in a set of data values
Variance – how far a set of numbers is spread out

Kurtosis – how peaked or flat a data set is
Skewness – how symmetric a data set is

Histogram Plots – a bar diagram where the horizontal axis shows different categories of values, and the height of each bar is related to the number of observations in the corresponding category.
Box and Whisker Plots – A box-and-whisker plot for a list of numbers consists of a rectangle whose left edge is at the first quartile of the data and whose right edge is at the third quartile, with a left whisker sticking out to the smallest value, and a right whisker sticking out to the largest value.
Stem and Leaf Plots – A stem and leaf plot illustrates the distribution of a group of numbers by arranging the numbers in categories based on the first digit.

Slope Formula

I was watching a football game a few days ago, and to prepare for it, we decided to pick up some snacks. As the game progressed, I found myself eating a lot of chips, but at halftime, there was one snack that remained unopened, a snack I had been thinking about all night, a snack that I couldn’t seem to locate until that moment – Recee’s Pieces. We purchased a pretty large sized bag that I thought would last a while. So at the end of halftime, when the bag was almost empty, someone made sure to warn me that at the rate I’m eating these things I’d be sure to be sick the next day.

Just to give you some insight into my how my mind works, the fact that she used the term “rate” took me back to classes of Algebra 1 where we were first learning about linear equations, slopes, y-intercepts, point slope form, slope intercept form, and so on. The more I thought about it, the more I thought this would be a good script to add to this site as it would probably provide help to many students who are currently enrolled in those classes as well as a remembrance to adults who took those classes years ago and wish to recall the concepts.

So I have added a script which helps go over the slope formula. It randomly generates two points and asks users to select which of a set of choices is the slope of those two numbers. In case you forget, there is a button where you can be given the slope as well.

Interactive Midpoint Formula

Interactive Midpoint Formula

I hope everyone had a great time over the holidays reconnecting with their family and friends. I definitely enjoyed hearing about the changes in their lives since we last spoke, and meeting new additions to the families. One of the best times I had was with a two year old who I had met earlier in the year. She was much more responsive this time, and as we sat and talked, she was eager to tell me the things she knew, from the alphabet to her numbers, to her shapes. So today’s blog-script is inspired by this conversation, which became a game of point and describe. I would point to an object and she would say “That’s a circle”, or “That’s a red triangle”. I was amazed at both how simple it was and how much I enjoyed this activity.

I enjoyed it so much that I decided that I’d like to have some programs on my site that were more of this genre. The first that I decided to write is a lesson on the midpoint formula. But instead of simply giving the formula and writing a script to walk users through the steps in calculating the midpoint, I thought I’d write a point-and-click approach to it.

The script will randomly generate two points in the XY plane and ask users to calculate the midpoint between these points. Five options are then given and the user is asked to select the radio button next to the correct choice. Once the submit button is pressed, the program will let users know if their choice is correct. Users can have the program generate new points at any time. There is also an option for users to have the midpoint formula displayed.

Because this is my first program of this sort, I am curious to know what users think. When generating choices that are supposed to be incorrect, what’s a good method for doing so? I decided not to keep a timer or a score for the “score” of a user, but I did think about it. Ultimately I wanted this to feel less like a “test” and more like a “game” so I decided against this option. But I would like to know what you think – either through your comments here or on twitter @MindAfterMath.

Once again, here’s the link to the script. I’ll keep you updated on if my new two year old friend likes this.

The Bridge Crossing Problem

Most puzzles are fun in their own right. Some puzzles are so fun that they have the added benefit that they are likely to come up in unexpected places, like maybe in a job interview. I was recently reading a paper by Günter Rote entitled “Crossing the Bridge at Night” where Rote analyzes such a puzzle. Upon finishing the paper, I decided to write a script so that users could see the general form of this puzzle.

The problem can be stated as follows: There is a set of people, lets make the set finite by saying that there are exactly n people, who wish to cross a bridge at night. There are a few restrictions that make crossing this bridge somewhat complicated.

  • Each person has a travel time across the bridge.
  • No more than two people can cross the bridge at one time.
  • If two people are on the bridge together, they must travel at the pace of the slower person.
  • There is only one flashlight and no party (of one or two people) can travel across the bridge without the flashlight.
  • The flashlight cannot be thrown across the bridge, and nobody can go to the store to purchase another flashlight

The image above shows the optimal solution when the 4 people have travel times of 1, 2, 5, and 10. The script I have written allows users to work with different numbers of people with random travel times. Give it a try and see if you can spot the patterns in the solution.

Hierarchical Clustering

Hierarchical Clustering algorithms give a nice introduction for computer science students to unsupervised machine learning. I say this because the bottom-up approach to Hierarchical clustering (which I have implemented here) is very similar to Kruskal’s algorithm for finding the minimum spanning tree of a graph.

In Kruskal’s algorithm, we begin by creating a forest, or a set of trees where each node is its own tree. The algorithm then selects the two trees that are closest together (closest being defined as the minimum cost edge between two distinct trees) and merges those trees together. This process of merging the closest two trees is then repeated until there is only one tree remaining, which is a minimum spanning tree of the graph.

Similarly, bottom-up hierarchical clustering of a group of points begins by saying that each point is its own cluster. Then the clusters are compared to one another to check if two clusters will be merged into one. Generally, there will be some stopping criteria, , saying that we do not want to merge two clusters together if their distance is greater than . So if the minimum distance between two clusters is less than we will proceed as in Kruskal’s algorithm by merging these two clusters together into one cluster. We repeat this process of merging the closest two clusters together until we find that the minimum distance between two clusters is greater than or equal to , in which case we can stop and the result is a partition of our data set into distinct clusters.

Hierarchical clustering is comparable to K-Means Clustering. Here are some differences between the two approaches:

  1. K-Means Clustering requires an initial number of desired clusters, while Hierarchical clustering does not.
  2. A run of K-Means Clustering will always give K clusters, whereas Hierarchical Clustering can give more or less, depending on our tolerance .
  3. K-Means can undo previous mistakes (assignments of an element to the wrong cluster), while Hierarchical Clustering cannot.

So, here is a link to my page on Hierarchical Clustering. Hope you enjoy.

Introduction to JavaScript Programming

Here is a link to my sample JavaScript code.

I received a lot of attention from friends interested in programming after my recent blog post entitled “Introduction to Python Programming”. While many found it interesting, the fact that Python is more useful to mathematicians hindered sine of my friends desire to learn it as their first language.

In out conversations, my recommendation for a first language was JavaScript. This is a powerful language in the sense that just about anybody who is involved with the internet knows it, and it’s likely to boost a person’s resume. It also has many similarities to more powerful languages like C++ and Java, so while not trivial, it could be a good launch pad into more advanced languages. But my favorite reason is that unlike many other programming languages that rely in an MS-DOS like command like approach for run time interaction, JavaScript’s basic interaction is with the standard internet browsers we use everyday. There isn’t even anything you need to download or install. Just create a basic HTML file in a text editor (like notepad, wordpad, or notepad++). This makes it easier to show off your creations which makes learning more fun.

The script I’ve finished provides examples on writing output, declaring variables, data types, conditionals, loops, and functions. Although I do not go into detail about all the events and objects on an HTML page, I do finish with three examples of more advanced JavaScript programs. Once you’ve selected a program, the code well be revealed in the text area. There is also a button that, when clicked, will execute that script on a new HTML tab.

I hope you enjoy, and let me know if you have any suggestions or comments.

With that being said, here is a link to my sample JavaScript code.

Simple Linear Regression

I finished a script that helps explain the concepts of simple linear regression.

We live in a world that is filled with patterns – patterns all around us just waiting to be discovered. Some of these patterns are not as easily discovered because of the existence of outside noise.

Consider for example an experiment where a set of people were each given the task of drinking a number of beers and having their blood alcohol level taken afterwards. Some noise factors in this could include the height and weight of the individual, the types of drinks, the amount of food eaten, and the time between drinks. Even with this noise, though, we can still see a correlation between the number of drinks and their blood alcohol level. Consider the following graph showing people’s blood-alcohol level after a given number of drinks. The x-axis represents the number of drinks and the y-axis is the corresponding blood alcohol level.

Example of Linear Regression

x 5 2 9 8 3 7 3 5 3 5 4 6 5 7 1 4
y 0.10 0.03 0.19 0.12 0.04 0.095 0.070 0.060 0.020 0.050 0.070 0.10 0.085 0.090 0.010 0.050

We can definitely see a correlation, and although the data doesn’t quite fit on a straight line. It leads us to ask further questions like can we use this data to build a model that estimates a person’s blood-alcohol level and how strong is this model?

One of the tools we can use to model this problem is linear regression. A linear regression takes a two-dimensional data set, with the assumption that one column (generally represented by the x variable) is independent and the second column (generally represented by the y variable) being dependent on the first column. The assumption is that the relationship between the two columns is linear and can be represented by the linear equation

y = 0 + 1x + e.

The right hand side of the above equation has three terms. The first two (0 and 1) are the parameters of the linear equation (the y-intercept and slope respectively), while the third term of the right hand side of the above equation represents the error term. The error term represents the difference between this linear equation and the y values in the data provided. We are seeking a line that minimizes the error term. That is, we are seeking to minimize

D = i = 1 to n [yi – (0 + 1xi)]2

There are several ways one could approach this problem. In fact, there are several lines that one could use to build a linear model. The first line that one may use to model these points is the one generated by only mean of the y values of the points, called the horizontal line regression.

For the data set above, the mean of the y values can be calculated as = 0.0738, so we could build a linear model based on this mean that would be y = 0.738. This horizontal line regression model is a horizontal line that predicts the same score (the mean), regardless of the x value. This lack of adjustments means it is generally a poor fit for most models. But as we will see later, this horizontal line regression model does serve a purpose in determining how well the model we develop performs.

A second attempt at solving this problem would be to generate the least squares line. This is the line that minimizes the D value listed above. We can see that D is a multi-variable polynomial, and we can find the minimum of such a polynomial using calculus, partial derivatives and Gaussian elimination (I will omit the work here because it deters us from the main point of this blog post, however Steven J. Miller has a good write-up of this).

The calculus leads us to the following equations:

SXY = i = 1 to n(xy) –
(i = 1 to nx)(i = 1 to ny)

SXX = i = 1 to n(x2) –
(i = 1 to nx)2

1 =

0 = 1

To calculate the least squares line for this example, we first need to calculate a few values:
i = 1 to n(xy) = 6.98
i = 1 to n(x2) = 443
i = 1 to nx = 77
i = 1 to ny = 1.18
Sxx = 72.44
Sxy = 1.30

This lets us evaluate that
1 = 0.018
0 = -0.0127

So the resulting linear equation for this data is

= -0.0127 + 0.018*x

Below is a graph of the two attempts at building a linear model for this data.

Example of Models of Linear Regression

In the above image, the green line represents the horizontal line regression model and the blue line represents the least-squares line. As stated above, the horizontal line regression model is a horizontal line that does not adjust as the data changes. The least-squares line adjusts both the slope and y-intercept of this line according to the data provided to better fit the data provided. The question becomes how well does the least-squares line fit the data.

The Sum of Squares Error (SSE) sums the deviation at each point of our data from the least-squares line.

SSE = i = 1 to n(yii)2

A second metric that we are interested in is how well the horizontal line regression linear model estimates our data. This is called the Total Sum of Squares (SST).

SST = i = 1 to n(yi)2

The horizontal line regression model ignores the independent variable x from our data set and thus any line that takes this independent variable into account will be an improvement on the horizontal line regression model. Because of this, the SST sum is a worse case scenario of how poorly our model can perform.

Knowing now that SST is always greater than SSE, the regression sum of squares (SSR) is the difference between the total sum of squares and the sum of squares error.


This tells us how much of the total sum of squares is accounted for by the model.

Finally, the coefficient of determination (r2) is defined by

r2 = SSR / SST

This tells SSR as a percentage of SST, or the amount of the variation in the data that is explained by the model.

So, check out my script on simple linear regression and let me know what you think.

Probability: Sample Spaces

I’ve been doing a few games lately (can be seen here, here and here) and, while I think those are very good ways to become interested in some of the avenues of math research, I also have had a few people come to me with questions regarding help with their classes. So I decided to write a script to try to help understand some elementary probability theory, focusing on discrete sample spaces.

Probability Image

In statistics, any process of observation is referred to as an experiment.
The set of all possible outcomes of an experiment is called the sample space and it is usually denoted by S. Each outcome in a sample space is called an element of the sample space. An event is a subset of the sample space or which the event occurs. Two events are said to be mutually exclusive if they have no elements in common.

Similar to set theory, we can form new events by performing operations like unions, intersections and compliments on other events. If A and B are any two subsets of a sample space S, then their union A ∪ B is the subset of S that contains all the elements that are in either A, in B, or in both; their intersection A ∩ B is the subset of S that contains all the elements that are in both A and B; the compliment A’ of A is the subset of S that contains all the elements of S that are not in A.

A probability is a function that assigns real numbers to events of a sample space. The following are the axioms of probability that apply when the sample space is discrete (finite or countable).

Axiom 1: The probability of an event is a non-negative real number; that is P(A) ≥ 0 for any subset A of S.
Axiom 2: The probability of the entire sample space is 1; that is P(S) = 1.
Axiom 3: If A1, A2, A3, … , is a finite or infinite sequence of mutually exclusive events of S, then
P(A1 ∪ A2 ∪ A3 ∪ …) = P(A1) + P(A2) + P(A3) + …
If A and B are any two events in a sample space S and P(A) ≠ 0, the conditional probability of B given A is

P(B | A) =
P(A ∩ B)


Two events A and B are independent if and only if P(A | B) = P(A) ∙ P(B).