
GroupSort 3: Putting Data in Perspective

Part 1: Naive algorithm

Part 2: “Team Captain” algorithm

For this next part, I wanted to build some tools to put the data coming out of our two algorithms in better context: the naive-but-perfect algorithm, and our homemade “team-captain” style algorithm.

We left off with just simple, single-integer values to compare against each other, and we saw that the team-captain algorithm fared worse (there were typically 2-3 times as many pairing overlaps as with the naive output), but how much worse is this?

We’re going to get a bigger look at the data here, by introducing the concept of “microstates”. As it relates to our problem, we know that there can be several tens of thousands of possible solutions, even when we keep our numbers down to something small (10 students, split into three groups). Each of these possible solutions is called a “microstate.” In a random shuffling of students into groups, every outcome is equally likely as every other (like a single 5-card hand dealt from a shuffled deck of cards). Also! Each solution has its own “overlap value”: the sum of the historical pairing counts between every member of every group and every other member of their group (phew!).

As a simple example, if we said that Billy and Margot had been in the same group twice already, Margot and Peppa once, and Billy and Peppa three times, then the sum of the historical pairings in a group of Billy + Margot + Peppa would be just those values summed together: 2 + 1 + 3 = 6. That’s what we’re calling our overlap value.

But previously, our historical pair counts were just randomly generated. Let’s really put some effort into simulating these cases! We write the following function + associated helper:
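The original listing is missing here, so here’s a minimal sketch of what the pair plausibly looks like (the names update_history, build_group_history, and night_count come from the discussion below; the exact bodies are my reconstruction):

```python
from collections import Counter
from itertools import combinations

def build_group_history(group, night_count=1):
    # Tally every pairing within a single group; each pair gets
    # night_count (how many nights this group stayed together).
    # Pairs are stored as sorted tuples so (1, 2) == (2, 1).
    return Counter(
        {tuple(sorted(pair)): night_count for pair in combinations(group, 2)}
    )

def update_history(history, group_list, night_count=1):
    # Sum the per-group Counters into the existing history,
    # returning the result as a new Counter.
    for group in group_list:
        history = history + build_group_history(group, night_count)
    return history
```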

We’re turning our history value into a Counter (from the collections module) instead of a basic dict. We do this because we’re using our history to tally counts of things (how many times everyone has been in the same group as each other), and the Counter object has a few niceties that make dealing with these tallies easier.

In build_group_history, we simply construct a Counter representing every pairing of members in a single group (night_count is a way of keeping track of how many nights this group stayed together, in the case of hotel rooming configurations).

These values are then summed together, one for each group in a group_list, and added to the history Counter, before being returned as the output of the update_history function. This is one of the niceties of Counters! You can’t “sum” dictionaries; it doesn’t really make sense, but Counters can be summed, because it’s assumed that their values are all integers (assumed, though not enforced).
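A quick illustration of that summing, using the Billy/Margot/Peppa counts from earlier:

```python
from collections import Counter

history = Counter({("billy", "margot"): 2,
                   ("billy", "peppa"): 3,
                   ("margot", "peppa"): 1})
tonight = Counter({("billy", "margot"): 1})

# Plain dicts have no "+", but Counters do: counts add key-wise.
print(history + tonight)
# Counter({('billy', 'margot'): 3, ('billy', 'peppa'): 3, ('margot', 'peppa'): 1})
```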

So we pass our existing history Counter into the function, along with a list of groups and a number of nights, and what we get back is a new Counter, with the number of nights updated for each pairing in each group!

Now we can track history across many iterations of our algorithms!

We write one more function to provide us with some helpful data. It should look familiar; it’s almost identical to our naive algorithm:
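Since that listing didn’t survive either, here’s a sketch of the idea (the helper names overlap_value and iter_group_lists are stand-ins of mine, and the fixed group sizes are an assumption):

```python
from collections import Counter
from itertools import combinations

def overlap_value(group_list, history):
    # Sum the historical pair counts across every group.
    # history is a Counter, so unseen pairs count as 0.
    return sum(
        history[tuple(sorted(pair))]
        for group in group_list
        for pair in combinations(group, 2)
    )

def iter_group_lists(members, group_sizes):
    # Yield every way of filling the groups, in order, with the given
    # sizes. (Partitions that differ only by swapping two equal-sized
    # groups are yielded once per ordering, which scales every tally
    # by the same factor and leaves the distribution's shape intact.)
    if not group_sizes:
        yield []
        return
    size = group_sizes[0]
    for group in combinations(members, size):
        rest = [m for m in members if m not in group]
        for tail in iter_group_lists(rest, group_sizes[1:]):
            yield [group] + tail

def tally_solution_states(members, group_sizes, history):
    # Count how many possible group_lists land on each overlap value.
    return Counter(
        overlap_value(gl, history)
        for gl in iter_group_lists(members, group_sizes)
    )
```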

The only difference is that, rather than returning a single group_list value, this new function tallies the overlap value of every single possible group_list, and counts just how many combinations have each overlap value.

This data can put our blind comparison from the last entry into context! If the naive algorithm has an overlap count of 5, and the “team-captain” algorithm has a count of 7, we don’t know much about what that means. BUT if we can see that, of all the possible combinations, a count of 7 falls in the top 10% of solutions, then we have a quantitative way of determining whether or not it’s any good!
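For instance, a little helper like this one (my own addition, not from the earlier entries) would tell us what fraction of all solutions a given result matches or beats:

```python
def percentile_rank(states, value):
    # Fraction of all possible solutions whose overlap value is
    # at or below `value` (lower overlap = better solution).
    total = sum(states.values())
    return sum(n for v, n in states.items() if v <= value) / total
```

Feeding it the tally from tally_solution_states and an algorithm’s overlap value gives exactly that “top 10%”-style figure.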

We write the following little program, which loads our module, runs a few iterations of groupsort to build up a realistic history, and then runs both algorithms, printing the overlap value for each and saving a plot of the total solution states, so we can see how our algorithm compares!
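A sketch of such a driver, reusing the helpers above (the groupsort module name, the naive_sort and team_captain_sort functions, and the 10-student [4, 3, 3] split are all assumptions of mine):

```python
from collections import Counter
import matplotlib.pyplot as plt

import groupsort  # hypothetical module holding both algorithms

MEMBERS = list(range(1, 11))   # 10 students
GROUP_SIZES = [4, 3, 3]        # assumed three-group split

# Run a few rounds first so there is a realistic history to compare against.
history = Counter()
for _ in range(5):
    groups = groupsort.team_captain_sort(MEMBERS, GROUP_SIZES, history)
    history = update_history(history, groups)

# Run both algorithms against the same history.
naive = groupsort.naive_sort(MEMBERS, GROUP_SIZES, history)
captain = groupsort.team_captain_sort(MEMBERS, GROUP_SIZES, history)
print("naive overlap:       ", overlap_value(naive, history))
print("team captain overlap:", overlap_value(captain, history))

# Plot how many possible solutions land on each overlap value.
states = tally_solution_states(MEMBERS, GROUP_SIZES, history)
values = sorted(states)
plt.bar(values, [states[v] for v in values])
plt.xlabel("overlap value")
plt.ylabel("number of possible solutions")
plt.savefig("groupsort.png")
```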

And our output is the following:

[Plot: number of possible solutions at each overlap value]

What we see is that most possible solutions have an overlap value of 22 or 23, with a very small number of “perfect” solutions (18), and several more with 20, which is what the “team captain” algorithm got.

Running the simulation a few more times, for kicks, the “team captain” algorithm pretty consistently returns 20s, with the occasional 18 or 21. On a cursory inspection, it looks as though this puts us in the upper 10-20% of solutions, at least for our small sample of data, with these test figures (notably, smaller than what we expect from our real-world figures).

This is a pretty useful benchmark, I’d say! We can have a moderate degree of confidence, at this stage, that our algorithm does its job! Next step: let’s step up our analysis, and run some simulations with more real-world-like data (obviously not with our naive algorithm, which can’t run when the numbers get that high).

NOTE: Guess what! During the above testing, I was pretty disheartened that the “team captain” algorithm was performing so much worse than the naive approach. I ran a few simulations of each, and output the histories to see what the final spread of values was (as discussed in our first entry, ideally we want a very even spread of values across all historical pairings; this tells us that members got paired with other members evenly over time). What I saw was a terrible spread from the “team captain” algo history. What I also saw was that members with IDs 1, 2 and 3 were never getting paired together.
Why?

Here’s the main loop of our “team captain” algorithm again:
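Reconstructed from the description, it’s something along these lines (the variable names are mine):

```python
# groups start as empty lists; remaining holds everyone not yet placed.
while remaining:
    for group, size in zip(groups, group_sizes):
        if not remaining or len(group) >= size:
            continue
        # The "ideal" next member: the one with the lowest total
        # historical overlap with everyone already in this group.
        best = min(
            remaining,
            key=lambda m: sum(
                history[tuple(sorted((m, teammate)))]
                for teammate in group
            ),
        )
        group.append(best)
        remaining.remove(best)
```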

We start with empty groups, then run this puppy across each group, until all groups are full. See the problem?

The Python min function behaves in a specific way: when several values are tied for the minimum, it returns the first of the tied values it encounters.
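You can see this in one line:

```python
members = [1, 2, 3, 4]
# Every key is identical, so min() falls back to list order
# and hands back the first element.
print(min(members, key=lambda m: 0))  # -> 1
```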

Well, when we start making our groups, all the groups are empty; thus, any member is just as “ideal” to add as any other, and every key ties. So the min function just returned the first. It did this for all 3 groups, every single time. This means that members 1, 2 and 3 were always their respective team captains, and were never paired together, throwing the whole solution out of whack!

How do we fix this? Have a look at our new lines 47 and 48:
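Adapting the sketch above, the fix looks something like this (the real code reportedly feeds a list comprehension into min; here the candidate list is just remaining):

```python
from random import sample

# Take the min over a *shuffled copy* of the candidates, so ties are
# broken randomly instead of always favouring the lowest-numbered members.
best = min(
    sample(remaining, len(remaining)),
    key=lambda m: sum(
        history[tuple(sorted((m, teammate)))]
        for teammate in group
    ),
)
```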

We just need to add a shuffle in there somewhere, to randomize our team captains each time. Here, we do this using the random module’s sample, which returns a shuffled copy of the list. (The actual shuffle function shuffles in place; we could use it if we broke the list comprehension out into its own line, shuffled the variable it’s assigned to, and passed the now-mutated list into the min function.)

Anyway, now it kicks ass, and is a much better approach to sorting our students!
