# Set Partition Problems

This is my first post of 2019 and my first post in a while. There was one posted a few months ago, but not really geared towards the algorithms and learning focus of the site. I have been doing a lot of coding in my spare time, but honestly life has just gotten in the way. Its not a bad thing, but life is life and sometimes I have to prioritize the things. In particular, I have been having ideas and actually coding things up but the time it takes to clean up code, write a blog entry and finding a nice way to visualize these things has been something that I haven’t been able to really focus on as much as I’ve wanted to.

That said, I want to talk to you about the Set Partition Problem today. You can go to Wikipedia to get more information about this problem, but I will give you a brief introduction to it and then talk about two different approaches to it. The problem assumes that we are given as input a (multi)-set S. The reason we say it is a multi-set and not a simple set is because we can have the same element appear multiple times in the set. So if S1 = {2} and S2 = {2, 2}, then although as sets they are both equal to the set {2} = S1, as multi-sets allow for multiple instances of an element. The elements of S are assumed to be positive integers.

So given this multi-set S, we ask the question of can the elements of S be divided into two smaller multi-sets, C1 and C2 where

• C1 [union] C2 = S
• C1 [intersect] C2 = [empty set]
• [Sigma]_[x in C1] = [Sigma]_[x in C2].C1 [union] C2 = S, C1 [intersect] C2 = [empty set], and [Sigma]_[x in C1] = [Sigma]_[x in C2]

The first two bullets above say that the sets C1 and C2 form a partition of S. The third bullet says that the sums of the elements in the two children multi-sets are equal.

This problem is known to be NP Complete. This means that it is one of the more difficult decision problems. Because of this finding an algorithm that solves this problem exactly will generally take a long running time. And finding an algorithm that runs quickly will more than likely be incorrect in some instances.

I will show you two approaches to this problem. One is based on Dynamic Programming (DP) and one is based on Greedy Algorithms. The DP version solves the problem exactly but has a slow running time. The greedy algorithm is fast but is not guaranteed to always give the correct answer.

Dynamic Programming is based on the principle of optimality. This says that in order to have a correct solution to the overall problem, we need optimal solutions to all subproblems of this problem. This is done by keeping track of a table which can be used for looking up these subproblems and the optimal solutions to these subproblems.

For this problem, we will build a table where the rows represent the final sums and columns represent subsets of the given multi-set (containing the first 0…n elements). The question we are repeatedly asking is “can we find a subset of the set in this column whose sum is exactly the given rowsum?” Here is an example of the DP algorithm on the multi-set {5, 6, 5, 6, 7}.

Greedy Algorithms are generally based on sorting the elements based on some principle and using that to try to answer the underlying question. The main problem with this approach is that it is very short sided because they do not look at the overall picture. Such algorithms are known for finding local optima that are not always globally optimal. The benefit to these problems though is that they are generally easier to code and easier to understand.

For the Set Partition Problem , the Greedy approach is to sort elements in descending order. Once this is done, the goal is to keep two subsets while iterating through the array, adding the element to the smaller of the two sets whenever possible.

For more examples, check out Set Partition Problems

# ID3 Algorithm Decision Trees

As I grow LEARNINGlover.com, I’m always thinking of different ways to expose my own personality through the site. This is partially because it is easier for me to talk about subjects where I am already knowledgeable, but it is more-so being done to help make some of these algorithms and concepts I encode more understandable, and sometimes relating foreign concepts to everyday life makes them easier to understand.

Today, I’d like to write about decision trees, and the ID3 algorithm for generating decision trees in particular. This is a machine learning algorithm that builds a model from a training data set consisting of a feature vector and an outcome. Because our data set consists of an outcome element, this falls into the category of supervised machine learning.

The model that the ID3 algorithm builds is called a decision tree. It builds a tree based on the features, or columns of the data set with a possible decision corresponding to each value that the feature can have. The algorithm selects the next feature by asking “which feature tells me the most about our data set?” This question can be answered first by asking how much “information” is in the data set, and then comparing that result with the amount of information in each individual feature.

In order to execute this algorithm we need a way to measure both the amount the information in outcomes of the overall data set as well as how much each feature tells us about the data set. For the first, we will use entropy, which comes from the field of information theory and encoding. Entropy is based on the question of how many bits are necessary to encode the information in a set. The more information, the higher the entropy, and the more bits required to encode that information. Although we are not encoding, the correlation between high information and high entropy suits our purposes.

To understand how much each feature tells us about the outcomes of the data set we will build on the concept of entropy to define the information gain of a feature. Each feature has multiple options, so the dataset can be partitioned based on each possible value of this feature. Once we have this partition, we can calculate the entropy of each subset of the rows of data. We define the information gain of a feature as the sum over all possible outcomes of that feature can have of the entropy of that outcome multiplied by the probability of that outcome.

To illustrate this algorithm, I decided to relate it to the question of whether we think of a character in a novel as a hero or villain. This is interesting because I try to read at least one book a month and as I’m reading, I often find myself asking this question about characters based on the traits of the characters as well as characters I’ve read about. In order to build an interactive script for this problem, I considered 25 possible character traits that could be present. A subset of these 25 character traits will be selected and a row will be generated grading a fictional character on a scale of 0 to 3 (0 meaning that they do not possess the trait at all, 3 meaning that the trait is very strong in their personality), and users will be asked whether they think a character with the given character traits should be listed as a hero or a villain. Then there is a button at the bottom of the script with the text “Build Tree” that executes the ID3 Algorithm and shows a decision tree that could be used to reach the set of decisions given by the user.

The possible features are:
Abstract, Adaptable, Aggressive, Ambition, Anxiety, Artistic, Cautious, Decisive, Honesty, Dutiful, Fitness, Intellect, Independent, Introverted, Lively, Open-minded, Orderly, Paranoid, Perfectionist, Romantic, Sensitive, Stable, Tension, Warmth and Wealthy

Once users select the option to build the tree, there will be several links outlining each step in the process to build this tree. These links will allow for users to expand the information relating to that step and minimize that information when done. Hopefully this will help users to understand each step more. I must say that as much fun as it has been writing this program, there were several questions when trying to explain it to others. Hopefully users get as much fun from using this tool as I had in creating it. As always feel free to contact me with any comments and or questions.

Ok, so here’s a link to the ID3 Algorithm Page. Please check it out and let me know what you think.

# The A* Algorithm

As a child I remember traveling on road trips, sitting in the back of a car trying to do my best to keep myself busy for what would occupy the next six to ten hours of my life. One of the things I grew to like were the simple maze books that were sold on the magazine racks at some of the gas stations where we’d stop for food. There are two basic strategies I employed for solving these mazes: For simpler mazes, I could generally just take a look at the overall maze structure, decide upon a path through the maze, then write a path without any mistakes. For more complex mazes though, I would generally begin a route that looks the most promising. If that route reaches a point where I can see that it will be impossible to finish, then I’d go back to where the decision was made, exclude the route I had just tried, and select the “most promising” remaining route. I’d continue this process until I had completed the maze, or until it became simple enough for me to solve the maze using only my memory.

The A* Algorithm works in a similar manner to the second approach I just described. We begin at a starting point, and consider where to move next from that starting point. The set of possible options for this next move is determined by the neighbor function for a given cell. For each neighbor the algorithm estimates the length of the route through that cell by calculating the sum of two values, f(n) = g(n) + h(n), where

g(n) is the (known) cost to travel from the starting point to the cell n.
h(n) is the (approximate) cost to travel from the cell n to the final cell.

The sum g(n) + h(n) allows us to approximate the total cost of a route through the cell n.

The A* algorithm begins at the starting position of the maze. There are two sets we will be considering throughout the process of determining the optimal route, called the closedSet and the openSet. The elements of closedSet are the nodes whose total distance from the starting position has been calculated, whereas the elements of openSet represents nodes whose total distance is still under consideration. There is also a map called prev which is used to reconstruct the path. Below is how the algorithm operates:

A* Algorithm Pseudocode
closedSet is the empty set
openSet = {start}
prev is the empty set

While there are still elements in openSet,
Find the element c* in openSet with the lowest value f(c).
If c* is the target position
Reconstruct the path.
Else If c* is not the target position,
Remove c* from openSet
For each neighbor n of c* that is not in closedSet,
Calculate a temporary g value, temp_g(n) = g(c*) + dist(c*, n).
If n is not in openSet, or if n is in openSet and temp_g(n) < g(n),                     Set prev(n) = c*                     Set g(n) = temp_g(n)                     Set f(n) = g(n) + h(n).                     if n is not in openSet                          add n to openSet.      End If End While End Algorithm An important question becomes what makes a good heuristic function to approximate the distance to the goal. This can lead to an in depth discussion based on the word "good", but the necessary condition for any heuristic is that it NEVER over-estimates the cost of the path from the cell to the goal. Some examples of possible heuristics for mazes are the Euclidean Distance (the square root of the sum of the squares of the horizontal and vertical differences in distances, i.e. dE(x, y) = (i(xi – yi)2) and TaxiCab Distance (the sum of the differences in the horizontal and vertical dimensions, i.e. dT(x, y) = i|xi – yi|). Both of these are feasible metrics for heuristics on a maze. Other herusitics, like h(n) = 0 for all n are possible, but doing this would make the algorithm treat all cells equally and ignore the heuristic part of the A* algorithm, turning it into Dijkstra’s algorithm.

I’ve written a script that generates random mazes and uses the A* algorithm to find the optimal path through this maze. For this script, I used the taxicab distance heuristic.

Check it out and let me know what you think.

# Learn About “the Other” Algebra

When I visit family for the holidays, the topic of my being a mathematician always seems to come up, and there’s always a child in the family struggling with maths, and when I ask the subject of their struggles the word “algebra” is always the culprit. I’ll save for another post my ideas on how this subject should be taught in high school and some of the main problems facing students.

I want to concentrate this post on a topic that few outside the mathematical world know about, but which many inside this world (myself included) hold dearly – the topic of modern or abstract algebra. I refer to this as “the other” algebra because a general conversation about the word “algebra” will generally revolve around concepts such as systems of equations, slopes, intercepts, intersection, rise-over-run, point-slope, and other terminology that limits algebra to a specific domain (the set of real or complex numbers) while at the same time ignoring the underlying beauty associated with this area.

I wrote previously about the area of set theory and the beauty associated with taking math out of the scope of a basic number line and into a much more undefined space. Abstract algebra is a continuation of set theory where in addition to our set, we have a (binary) operation defined on any two elements of this set. The inclusion of this binary operation allows us to consider several different structures based on the properties that this binary operation holds.

The structures I’d like to write about today are called groups. A group is a set along with an operation (or function) defined on any two elements of the set with the following properties:
– It is closed. This means that any time we run this function on two elements on the set, the function gives us a member of the set. In mathematical terms, for all a, b in the set A, f(a, b) must also be a member of A.
– There is an identity element. An identity element is defined as an element where is we include it in the binary operator with any other element, the operator will always return the other element. So if the element i is the identity element, then f(i, a) = a and f(a, i) = a for any other a in the set A. Any group must have an identity element.
– Every element has an inverse. Inverse elements are based on the identity element. What the inverse says is that for every element, there is a way to use the binary operator to get to the identity element. So for all elements a in the set A, there is an element b in the set A such that f(a, b) = i, where i is the identity element.
– The binary operator is associative. I described the associative property when I discussed the functions and relations of set theory. A function is associative if the way we group things (aka associate them) doesn’t matter. This means that for any elements a, b, and c of the set A, f(f(a, b), c) must be the same as f(a, f(b, c)).

If these four properties hold for a set A and a binary function f, then we say that the pair (A, f) is a group. We will generally use a common notation such as a · b, or a * b or simply ab to represent f(a, b).

Another important concept in group theory is the idea of a Cayley table. These are similar to multiplication tables that we drew out when we were first learning our “times tables”. For a group with n elements, we form a table with n rows and n columns. Each element of the group is written out to the left of each row and above each column (so really we can think of it as an n+1 by n+1 table with the first row and column being descriptive rows). Each cell of the table is the binary operator applied to the two elements indicated by the row and column (with an understanding of whether we have row before column or vice versa). Obviously, we can only do this for finite groups as we cannot write out all the elements of an infinite set.

The script I’ve added is a tester to allow users to input the information for a possible group (size, name of each element, and a Cayley table) and with this information the user is informed whether or not it forms a group. If it does not form a group, the reasons why it does not form one are also given. There are also some sample groups given to give insight into this area.

# Understanding Bayes’ Theorem

I’ve finished a script that helps understand Bayes’ Theorem.

If we have a set of mutually exclusive (aka non-overlapping) sets Bi for i {0, 1, 2, …, n} for some integer n, then the union of these sets forms a sample space. Lets call the sample space S. Suppose that we also have some set (also known as an event) A which is also a subset of S. Bayes’ Theorem considers the probability that one of these mutually exclusive events (one of the Bi‘s) caused the observed event (A).

This probability can be calculated by the formula

 Pr(Bj | A) = Pr(Bj) Pr(A | Bj) Pr(Bi) Pr(A | Bi)

The theorem helps us determine the the probability of the event Bj given A, or in more plain English, the probability that the event Bj is the cause that gives rise to the observed event A. The numerator is given by the product of of the probability of the causal event (Pr(Bj) times the conditional probability of the observed event given the causal event (Pr(A | Bj)). This numerator could be replaced by its equivalent statement of the set A Bj. Likewise, the denominator the sum (over all the causal events) of the probaility of each causal event times the conditional probability of the observed event given that particular causal event. Each term in this denominator could be replaced b its equivalent staetment A Bi, which when summed give the total probability of A because each pair of the Bj‘s is mutually exclusive. So we are able to replace the probability of A with Pr(Bi) Pr(A | Bi) because of the fundamental law of probability.

An example that would use Bayes’ Theorem is analyzing the results of an election. The set of mutually exclusive events could be membership in a political party (Democrat, Republican, or Independent). The observed event could be the election of an individual. And the conditional distributions could be the percentage of each party that voted for this individual. If we want to calculate how significant each party was to the individual’s election, we’d use Bayes’ Theorem.

The script I’ve written to help understand Bayes’ Theorem works as follows:
– A set of mutually exclusive sets is randomly generated (the number of sets also varies). These sets are called Bi for i (0, …, n}.
– A set A is randomly generated from the union of the Bi‘s.
– A table is displayed showing:
Pr(Bi) for each i on line 1.
Pr(A | Bi) for each i on line 2.

– The user is given the option to select which of the mutually exclusive sets they would like to use to calculate the probability that this set caused the event A.
– Once a set is chosen, the user clicks the “Calculate Conditional” button and Bayes’ Theorem gives the result.
– If the “show work” checkbox was checked, then the steps used in this calculation are also shown.
– All work is done using fractions to give an idea of where the numbers come from.

Other Blogs that have covered this topic:
Better Explained
Bayes’ Theorem-qed

# Learn Math Through Set Relations

Much of our world deals with relationships – both in the sense of romantic ones or ones that show some interesting property between two sets. When mathematicians think of set theory, a relation between the set A and the set B is a set of ordered pairs, where the first element of the ordered pair is from the set A and the second element of the ordered pair id from the set B. So if we say that R is a relation on the sets A and B, that would mean that R consists of elements that look like (a, b) where a is in A and b is in B. Another way of writing this is that R is a subset of A x B. For more on subsets and cross product, I refer you to my earlier script work on set operations.

Relations can provide a useful means of relating an abstract concept to a real world one. I think of things like the QB rating system in the NFL as an example. We have a set of all quarterbacks in the NFL (or really all people who have thrown a pass) and we would like some means of saying that one QB is performing better than another. The set of statistics kept on a QB is a large set, so attempting to show that one QB is better by showing that every year that they played one is better in every statistical category can be (a) exhaustive, and (b) will lead to very few interesting comparisons. Most of the really good QBs have some areas that they are really good and others that they are not. The QB rating system provides a relation between the set of all QBs in the NFL and the set of real numbers. Once this relation was defined, we can say that one QB is performing better than another if his QB rating is higher. Similarly we can compare a QB to his own statistics at different points in his career to see the changes and trends.

This is just one example, and there are countless others that I could have used instead.

Once we understand what a relation is, we have several properties that we are interested in. Below I list four, although there are many more.

Properties of Relations:
A relation R is symmetric if whenever an element (a, b) belongs to R, then so does (b, a).

A relation R is reflexive if for every element a in the universe of the relation, the element (a, a) belongs to R.

A relation R is transitive if for every pair of elements (a, b) and (c, d) and b = c, then the element (a, d) belongs to R.

A relation R is anti-symmetric if the elements (a, b) and (b, a) do not belong to the relation whenever a is not equal to b.

Once we understand what a relation is, there are a few common ones that we are interested in. Below I list four, but again, I want to stress that these are some of the more common ones, but there are several others.

Types of Relations:
A relation R is a function (on its set of defined elements) if there do not exist elements (a, b) and (a, c) which both belong to R.

A relation R is an equivalence relation if R is symmetric, reflexive and transitive.

A relation R is a partial order set if R is anti-symmetric, reflexive and transitive.

A relation R is a total order set if it is a partial order set and for every pair of elements a and b, either (a, b) is in R or (b, a) is in R.

A partial order is just an ordering, but not everything can be compared to everything else. Think about the Olympics, and a sport like gymnastics. Consider the floor and the balance beam. One person can win gold on the floor and another person wins gold on the balance beam. That puts each of them in the “top” of the order for their particular section, but there’s no way of comparing the person who won the floor exercise to the person who won the balance beam. So we say the set is “partially ordered”. More formally, lets say that two people (person X and person Y) relate if they competed in the same event and the the first person (in this case person X) received an equal or higher medal in that event than the second person (in this case person Y). Obviously any person receives the same medal as themselves, so this relation is reflexive. And if Jamie received an equal or higher medal than Bobby and Bobby received an equal or higher medal than Chris, then Jamie must have received an equal or higher medal than Chris so this relation is transitive. To test this relation for anti-symmetry, suppose that Chuck received an equal or higher medal than Charlie and Charlie received an equal or higher medal than Chuck. This means that they must have received the same medal, but since only one medal is awarded at each color for each event (meaning one gold, one silver and one bronze…if this is not true, assume it is), this must mean that Chuck and Charlie are the same person, and this relation is thus anti-symmetric.

If we have a partial ordering where we can compare everything, then we say that the set is “totally ordered”.

An equivalence relation tries to mimic equality on our relation. So, staying with that example of the Olympics, an example of an equivalence relation could be to say that two athletes relate to one another if they both received the same color medal in their event (for the sake of argument lets assume that no athlete competes in more than one event). Then obviously an athlete receives the same medal as themselves, so this relation is reflexive. If two people received the same medal, then it doesn’t matter if we say Chris and Charlie or Charlie and Chris, so the relation is symmetric. And Finally if Chris received the same medal as Charlie an if Charlie received the same medal as Jesse, then all three people received the same medals, so Chris and Jesse received the same medals and this relation is transitive. Because this relation has these three properties, it is called an equivalence relation.

# How Could You Possibly Love/Hate Math?

Growing up, I never really liked math. I saw it as one of those necessary evils of school. People always told me that if I wanted to do well and get into college, I needed to do well in math. So I took the courses required of a high school student, but I remember feeling utter confusion from being in those classes. My key problem was my inquisitive nature. I really didn’t like being “told” that certain things were true in math (I felt this way in most classes). I hated just memorizing stuff, or memorizing it incorrectly, and getting poor grades because I couldn’t regurgitate information precise enough. If this stuff was in fact “true”, I wanted to understand why. It seemed like so much was told to us without any explanation, that its hard to expect anybody to just buy into it. But that’s what teachers expected. And I was sent to the principal’s office a number of times for what they called “disturbing class”, but I’d just call it asking questions.

At the same time, I was taking a debate class. This class was quite the opposite of my math classes, or really any other class I’d ever had. We were introduced to philosophers like Immanuel Kant, John Stuart Mill, Thomas Hobbes, John Rawls, etc. The list goes on and on. We discussed theories, and spoke of how these concepts could be used to support or reject various propositions. Although these philosophies were quite complex, what I loved was the inquiries we were allowed to make into understanding the various positions. Several classmates and I would sit and point out apparent paradoxes in the theories. We’d ask about them and sometimes find that others (more famous than us) had pointed out the same paradoxes and other things that seemed like paradoxes could be resolved with a deeper understanding of the philosophy.

Hate is a strong word, but I remember feeling that mathematicians were inferior to computer programmers because “all math could be programmed”. This was based on the number of formulas I had learned through high school and I remember having a similar feeling through my early years of college. But things changed when I took a course called Set Theory. Last year, I wrote a piece that somewhat describes this change:

```They Do Exist!

Let me tell you a story about when I was a kid
See, I was confused and here's what I did.
I said "irrational number, what’s that supposed to mean?
Infinite decimal, no pattern? Nah, can't be what it seems."
So I dismissed them and called the teacher wrong.
Said they can't exist, so let’s move along.
The sad thing is that nobody seemed to mind.
Or maybe they thought showing me was a waste of time.

Then one teacher said "I can prove they exist to you.
Let me tell you about my friend, the square root of two."
I figured it'd be the same ol' same ol', so I said,
"Trying to show me infinity is like making gold from lead"
So he replies, "Suppose you're right, what would that imply?"
And immediately I thought of calling all my teachers lies.
"What if it can be written in lowest terms, say p over q.
Then if we square both sides we get a fraction for two."

He did a little math and showed that p must be even.
Then he asked, "if q is even, will you start believing?"
I stood, amazed by what he was about to do.
But I responded, "but we don't know anything about q"
He says, "but we do know that p squared is a factor of 4.
And that is equal to 2 q squared, like we said before."
Then he divided by two and suddenly we knew something about q.
He had just shown that q must be even too.

Knowing now that the fraction couldn't be in lowest terms
a rational expression for this number cannot be confirmed.
So I shook his hand and called him a good man.
Because for once I yould finally understand
a concept that I had denied all my life,
a concept that had caused me such strife.
And as I walked away from the teacher's midst,
Excited, I called him an alchemist and exhaled "THEY DO EXIST!"```

Aside from its lack of poetic content, I think that many mathematicians can relate to this poem, particularly the ones who go into the field for its theoretic principles. For many of us, Set Theory is somewhat of a “back to the basics” course where we learn what math is really about. The focus is no longer on how well you can memorize a formula. Instead, its more of a philosophy course on mathematics – like an introduction to the theory of mathematics, hence the name Set Theory.

The poem above focuses on a particular frustration of mine, irrational numbers. Early on, we’re asked to believe that these numbers exist, but we’re not given any answers as to why they should exist. The same could be said for a number of similar concepts though – basically, whenever a new concept is introduced, there is a reasonable question of how do we know this is true. This is not just a matter of practicality, but a necessity of mathematics. I mean I could say “lets now consider the set of all numbers for which X + 1 = X + 2”, but if this is true for any X, then it means that 1 equals 2, which we know is not true. So the set I’d be referring to is the empty set. We can still talk about it, but that’s the set I’d be talking about.

So why is this concept of answering the why’s of mathematics ignored, sometimes until a student’s college years? This gives students a false impression of what math really is, which leads to people making statements like “I hate math”, not really knowing what math is about.

# Learning Math through Set Theory

In grade school, we’re taught that math is about numbers. When we get to college (the ones of us who are still interested in math), we’re taught that mathematics is about sets, operations on sets and properties of those sets.

Understanding Set Theory is fundamental to understanding advanced mathematics. Iv wrote these scripts so that users could begin to play with the different set operations that are taught in a basic set theory course. Here, the sets are limited to positive integers and we’re only looking at a few operations, in particular the union, intersection, difference, symmetric difference, and cross product of two sets. I will explain what each of these is below.

The union of the sets S1 and S2 is the set S1 [union] S2, which contains the elements that are in S1 or S2 (or in both).
Note: S1 [union] S2 is the same as S2 [union] S1.

The intersection of the sets S1 and S2 is the set S1 [intersect] S2, which contains the elements that are in BOTH S1 and S2.
Note: S1 [intersect] S2 is the same as S2 [intersect] S1.

The difference between the sets S1 and S2 is the set S1 / S2, which contains the elements that are in S1 and not in S2.
. Note. S1 / S2 IS NOT the same as S2 / S1.
Note. S1 / S2 is the same as S1 [intersect] [not]S2.

The symmetric difference between the sets S1 and S2 is the set S1 [symm diff] S2, which contains the elements that are in S1 and not in S2, or the elements that are in S2 and not in S1.
Note. S1 [symm diff] S2 is the same as S2 [symm diff] S1.
Note. S1 [symm diff] S2 is the same as (S1 [intersect] [not] S2) [union] (S2 [intersect] [not] S1).

The cartesian product of the two sets S1 and S2 is the set of all ordered pairs (a, b), where a [in] S1 and b [in] S2.