Part 2: “Team Captain” algorithm
For this next part, I wanted to build some tools to put our data in better context as it comes out of our two algorithms: the naive-but-perfect algorithm, and our homemade “team-captain” style algorithm.
We left off with just simple, single-integer values to compare against each other, and we saw that the team-captain algorithm fared worse (there were typically 2-3 times as many pairing overlaps as with the naive output), but how much worse is this?
We’re going to get a bigger look at the data here, by introducing the concept of “microstates”. As it relates to our problem, we know that there can be several tens of thousands of possible solutions, even when we keep our numbers down to something small (10 students, split into three groups). Each of these possible solutions is called a “microstate.” In a random shuffling of students into groups, each outcome is exactly as likely as every other (like a single 5-card hand dealt from a shuffled deck of cards). Also! Each solution has its own “overlap value”, representing the sum of historical pairings between every member of every group and every other member of their group (phew!).
As a simple example, if we said that Billy and Margot had been in the same group twice already, Margot and Peppa once, and Billy and Peppa three times, then the sum of the historical pairings in a group of Billy + Margot + Peppa would be just those values summed together: 2 + 1 + 3 = 6. That’s what we’re calling our overlap value.
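In code, with the history stored as pair counts, that calculation is just a sum over the group’s pairs. Here’s a minimal sketch (the Counter-based history it assumes is introduced just below; the names are made up for the example):

from collections import Counter
import itertools

# Hypothetical history for our example trio
history = Counter({
    ("Billy", "Margot"): 2,
    ("Margot", "Peppa"): 1,
    ("Billy", "Peppa"): 3,
})

group = ("Billy", "Margot", "Peppa")
overlap = sum(history[tuple(sorted(pair))]
              for pair in itertools.combinations(group, 2))
print(overlap)  # 6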
But previously, our historical pair counts were just randomly generated. Let’s really put some effort into simulating these cases! We write the following function + associated helper:
from collections import Counter
import itertools

def update_history(history, group_list, night_count=1):
    # Returns a new updated history counter with counts from group_list
    return sum((build_group_history(history, group, night_count)
                for group in group_list), history)

def build_group_history(history, group, night_count=1):
    # Build a pair history Counter for a single group
    return Counter({
        tuple(sorted(pair)): night_count
        for pair in itertools.combinations(group, 2)
    })
We’re turning our history value into a Counter (from the collections module) instead of a basic dict. We do this because we’re using our history to tally counts of things (how many times everyone has been in the same group as each other), and the Counter object has a few niceties that make dealing with these tallies easier.
In build_group_history, we simply construct a Counter representing every pairing of members in a single group (night_count is a way of keeping track of how many nights this group stayed together, in the case of hotel rooming configurations).
These values are then summed together, one for each group in a group_list, and added to the history Counter, before being returned as the output of the update_history function. This is one of the niceties of Counters! You can’t “sum” dictionaries; it doesn’t really make sense, but Counters can be summed, because it’s assumed that their values are all integers (assumed, though not enforced).
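For instance, a quick illustration of Counter addition (nothing specific to our module here):

from collections import Counter

a = Counter({("Billy", "Margot"): 2})
b = Counter({("Billy", "Margot"): 1, ("Margot", "Peppa"): 1})
print(a + b)  # Counter({('Billy', 'Margot'): 3, ('Margot', 'Peppa'): 1})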
So, what we do is pass our existing history Counter into the function, along with a list of groups and a number of nights; and what we get back is a new Counter, with the number of nights updated for each pairing in each group!
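Concretely, the round-trip looks something like this (a small usage sketch of the functions above, with made-up IDs):

h = Counter()
h = update_history(h, [(1, 2, 3), (4, 5, 6)])                 # one night in these groups
h = update_history(h, [(1, 2, 4), (3, 5, 6)], night_count=2)  # two nights in these
print(h[(1, 2)])  # 3, together the first night, plus two more
print(h[(1, 3)])  # 1, only together on the first night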
Now we can track history across many iterations of our algorithms!
We write one more function to provide us with some helpful data. It should look familiar; it’s almost identical to our naive algorithm:
def microstate_analysis(ids, history, group_schema):
    microstates = Counter()

    def make_group(id_set, group_list, _group_schema):
        if not _group_schema:
            # Terminating condition - check our built group_list
            pair_sum = get_group_list_pair_sum(history, group_list)
            microstates[pair_sum] += 1
            return
        # Take our schema's last item without mutating
        group_size = _group_schema[-1]
        # Make a list of all combinations from the ID set for group_size
        group_combinations = itertools.combinations(id_set, group_size)
        for group in group_combinations:
            # Recursively make groups with the remaining IDs
            make_group(id_set - set(group),
                       group_list + [group],
                       _group_schema[0:-1])

    make_group(ids, [], group_schema)
    return microstates
The only difference is, rather than return a single group_list value, this new function tallies the overlap value of every single possible group_list, and keeps count of just how many combinations have a given overlap value.
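One note: get_group_list_pair_sum comes from our earlier work and isn’t shown in this entry. Based on how we call it, a minimal reconstruction (not necessarily the module’s exact code) would be:

def get_group_list_pair_sum(history, group_list):
    # Sum the historical pair counts across every group in the list
    return sum(history[tuple(sorted(pair))]
               for group in group_list
               for pair in itertools.combinations(group, 2))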
This data can put our blind comparison from the last entry into context! If the naive algorithm has an overlap count of 5, and the “team-captain” algorithm has a count of 7, we don’t know much about what that means. BUT if we can see that, of all the possible combinations, a count of 7 falls in the top 10% of solutions, then we have some quantitative way of determining whether or not it’s any good!
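The microstates Counter gives us everything we need for that percentile calculation. A little helper along these lines (hypothetical, not part of the module) would do it:

def overlap_percentile(microstates, overlap_value):
    # Fraction of all possible solutions whose overlap is at or below ours;
    # lower overlap is better, so a small fraction means a near-optimal result
    total = sum(microstates.values())
    at_or_below = sum(count for value, count in microstates.items()
                      if value <= overlap_value)
    return at_or_below / total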
We write the following little program, which loads our module, runs a few iterations of groupsort to build up a realistic history, and then runs both algorithms, outputting the overlap value for each, along with saving a plot of the total solution states, so we can see how our algorithm compares!
import groupsort
import matplotlib.pyplot as plt
from collections import Counter

def main():
    """ Run a series of simulations with the groupsort algorithms """
    h = Counter()
    ids = set(range(1, 10))
    for i in range(10):
        g = groupsort.groupsort(ids, h, [3, 3, 3])
        h = groupsort.update_history(h, g)

    g1 = groupsort.groupsort(ids, h, [3, 3, 3])
    g2 = groupsort.groupsort_captains(ids, h, [3, 3, 3])
    m = groupsort.microstate_analysis(ids, h, [3, 3, 3])

    plt.plot(list(m.keys()), list(m.values()), linestyle='None', marker='o')
    plt.ylabel('# of solutions')
    plt.xlabel('overlap value')
    plt.savefig('groupsort.png')

    print("Overlap of naive algo: {}".format(groupsort.get_group_list_pair_sum(h, g1)))
    print("Overlap of captain algo: {}".format(groupsort.get_group_list_pair_sum(h, g2)))

if __name__ == "__main__":
    main()
And our output is the following:
Overlap of naive algo: 18
Overlap of captain algo: 20
What we see is that most possible solutions have an overlap value of 22 or 23, with a very small number of “perfect” solutions (18), and several more with 20, which is what the “team captain” algorithm got.
Running the simulation a few more times, for kicks, I found the “team captain” algorithm pretty consistently returns 20s, with the occasional 18 or 21. On cursory inspection, it looks as though this puts us in the upper 10-20% of solutions, at least for our small sample of data with these test figures (notably smaller than what we expect from our real-world figures).
This is a pretty useful benchmark, I’d say! We can have a moderate degree of confidence, at this stage, that our algorithm does its job! Next step: let’s step up our analysis, and run some simulations with more real-world-like data (obviously not with our naive algorithm, which can’t run when the numbers get that high).
NOTE: Guess what! During the above testing, I was pretty disheartened that the “team captain” algorithm was performing so much worse than the naive approach. I ran a few simulations of each, and output the histories to see what the final spread of values was (as discussed in our first entry, ideally we want a very even spread of values across all historical pairings; this tells us that members got paired with other members evenly over time). What I saw was a terrible spread from the “team captain” algo history. What I also saw was that members with IDs 1, 2 and 3 were never getting paired together.
Why?
Here’s the main loop of our “team captain” algorithm again:
42  while not all_groups_full(group_list):
43      group = next(group_cycle)
44      if is_group_full(group):
45          continue
46      ideal_member = min(
47          [(get_new_member_pair_count(group, member), member) for member in id_pool],
48          key=itemgetter(0))
49      add_member(group, ideal_member[1], id_pool)
We start with empty groups, then run this puppy across each group, until all groups are full. See the problem?
The Python min function behaves in a specific way: if several items tie for the minimum, it returns the first one it encounters. Well, when we start making our groups, all the groups are empty; thus, the “ideal member” to add to a group is any member, since they all tie with a pair count of zero. So the min function just returned the first. It would do this for the 3 groups, every single time. This means that members 1, 2 and 3 were always their respective team captains, and were never paired together, throwing the whole solution out of whack!
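You can see this tie-breaking behaviour for yourself with just the standard library:

from operator import itemgetter

candidates = [(0, 'alice'), (0, 'bob'), (0, 'carol')]
print(min(candidates, key=itemgetter(0)))  # (0, 'alice'), the first tie wins, every time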
How do we fix this? Have a look at our new lines 47 and 48:
47          sample([(get_new_member_pair_count(group, member), member) for member in id_pool],
48                 len(id_pool)), key=itemgetter(0))
We just need to add a shuffle in there somewhere, to randomize our team captains each time. Here, we do this by making use of the random module’s sample function, which returns a shuffled copy of the list. (The actual shuffle function shuffles in place; we could use it if we wanted, by breaking the list comprehension out into its own line, shuffling the variable it’s assigned to, and passing the now-mutated list into the min function.)
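For the curious, that shuffle-based alternative would look something like this inside the loop (a sketch, equivalent in effect to the sample version above):

from random import shuffle

# Inside the while loop, in place of the min(sample(...)) call:
candidates = [(get_new_member_pair_count(group, member), member)
              for member in id_pool]
shuffle(candidates)  # shuffles in place, returns None
ideal_member = min(candidates, key=itemgetter(0))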
Anyway, now it kicks ass, and is a much better approach to sorting our students!