Monthly Archives: April 2012

My Life (as a Number)

A few days ago I went on a nice run that really helped to clear my mind. Afterwards, the song “I Gave You Power” by Nas stayed in my head, and I wondered if I could do something similar, but more math-related. What follows is what I came up with after a quick free-write.

My Life (as a Number)
– by AfterMath

I’ve been used, abused, and mistreated
Like a tool for you to do what you’ve got to do.
But when you’re through, you’re through
No “how was it for you”
No asking about what I’ve been through
Nah, I don’t get a word from you

Except for those odd days
When you want to whine and moan and complain
About how much you don’t like me
About how much you can’t stand me
About how much you hate me
But you keep coming back to me

And I keep letting you come back to me.
I keep solving your problems for you
And when you get into trouble
I’m the one you turn to.
But I guess that’s what I was created to do
At least now you know what numbers go through.

K-Means Clustering

Figure: how the data looks before the clustering algorithm is run.
Figure: how the data looks after the clustering algorithm is run.

I have now uploaded my K-Means Clustering Script. The script generates a set of random points (as ordered pairs), then asks the user how many clusters to divide the points into and the maximum number of iterations to run before stopping.
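As a rough illustration of that setup (this is not the uploaded script itself), random ordered pairs could be generated and the two user settings sanity-checked along these lines. The names `random_points` and `check_parameters`, the coordinate range, and the seed are all assumptions made up for this sketch.

```python
import random

def random_points(n, low=0.0, high=100.0, seed=None):
    """Return n random ordered pairs (x, y) with coordinates in [low, high]."""
    rng = random.Random(seed)  # private generator, so a seed gives repeatable data
    return [(rng.uniform(low, high), rng.uniform(low, high)) for _ in range(n)]

def check_parameters(num_points, k, max_iterations):
    """Reject settings that cannot produce a sensible clustering."""
    if not 1 <= k <= num_points:
        raise ValueError("k must be between 1 and the number of points")
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")

# Hypothetical values standing in for what the user would type in.
points = random_points(200, seed=1)
check_parameters(len(points), k=3, max_iterations=50)
```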

One of the largest problems that we face today is understanding data. Before we even get to the point of interpreting what the data means and making decisions based on it, there is often a problem with the sheer amount of data. Clustering algorithms seek to solve this problem by defining some notion of similarity and using that notion to group the data into sets or ‘clusters’, where two elements belong to the same cluster if they are considered similar. Once elements are placed into clusters, we can analyze this (generally much smaller) set of clusters instead of the entire data set, which should help in understanding the data.

Finding an exact solution to this problem is computationally difficult. Instead, we can approximate a solution rather quickly using the k-means clustering algorithm. This algorithm attempts to separate a given data set into a user-specified number of groups, k. The algorithm works by initially selecting k elements of the data to serve as the “center” of each cluster. Every element of the data is then compared to each cluster center and assigned to the cluster whose center is closest. Once every element has been assigned, the original centers may no longer sit in the middle of their clusters, so the next step is to average the elements inside each cluster to determine the new cluster center (the mean, hence the name). The process of assigning elements to clusters and recomputing cluster centers is repeated until either no element changes cluster or we reach the maximum number of iterations specified by the user.
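The steps above can be sketched in a few lines of Python. This is a minimal illustration and not the uploaded script: it assumes the data are 2-D points stored as tuples, uses straight-line (Euclidean) distance as the notion of similarity, and leaves a cluster's center where it is if the cluster ends up empty. The function name `k_means` and its parameters are my own choices for the sketch.

```python
import math
import random

def k_means(points, k, max_iterations=100, seed=None):
    """Cluster 2-D points into k groups; returns (centers, assignments)."""
    rng = random.Random(seed)
    # Step 1: pick k distinct elements of the data as the initial centers.
    centers = rng.sample(points, k)
    assignments = [None] * len(points)

    for _ in range(max_iterations):
        # Step 2: assign every point to the cluster with the closest center.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Stop as soon as no point changes cluster.
        if new_assignments == assignments:
            break
        assignments = new_assignments

        # Step 3: move each center to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # assumption: an empty cluster keeps its old center
                centers[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centers, assignments

# Usage on a tiny hand-made data set.
points = [(1, 1), (1, 2), (2, 1), (9, 9), (9, 10), (10, 9)]
centers, labels = k_means(points, k=2, seed=0)
```

Because the starting centers are chosen at random, different seeds can lead to different final clusters on less tidy data, which is part of why the result is an approximation and not an exact solution.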

K-Means Clustering is an example of unsupervised machine learning. Machine learning is a field of artificial intelligence that focuses on computer programs that can learn without being explicitly programmed. Unsupervised machine learning seeks to interpret data without any knowledge of what a “correct” interpretation is. In comparison, supervised machine learning algorithms are useful for data that has already been labeled with categories. These algorithms generally divide the data into a training set and a test set, seek to produce a function that agrees with the labels on the training set, and then use the test set to check how well that function handles data it has not seen.
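To make the training set / test set idea concrete, here is a small sketch of splitting labeled data. The function `train_test_split`, the 25% test fraction, and the sample data are all hypothetical choices for illustration; it only does the split and does not train anything.

```python
import random

def train_test_split(labeled_data, test_fraction=0.25, seed=None):
    """Shuffle labeled examples and split them into (training, test) lists."""
    rng = random.Random(seed)
    shuffled = list(labeled_data)  # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Made-up labeled data: (x, y) points tagged with a category.
data = [
    ((1, 1), "low"), ((2, 1), "low"), ((1, 2), "low"), ((2, 2), "low"),
    ((9, 9), "high"), ((9, 10), "high"), ((10, 9), "high"), ((10, 10), "high"),
]
training, test = train_test_split(data, test_fraction=0.25, seed=0)
```

Note that k-means never needs the "low"/"high" labels at all, which is exactly what makes it unsupervised.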