GroupSort 2: Let’s be Real Here

In the last post, we wrote a naive algorithm to sort a set of users into groups of a specified size, minimizing the number of “repeat pairs” of group members, as recorded in a pairing history variable. In other words, you’re trying to put students into rooming configurations, and you want students who’ve been roomed together before to be less likely to be roomed together again. We want to maximize pairing diversity because meeting new people is fun and a memorable part of your formative years.

But our algorithm failed those students, because it could only realistically handle up to about 10 students before the computation time exploded. So today we’re gonna revisit the approach I took when I first tackled this problem about 6 or 7 years ago, and see how it stacks up against the “ideal” algorithm.

But before we do that, some pieces of code from our naive algorithm are going to be useful to us, so we’ll pull them out into their own functions:
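Here’s a minimal sketch of what those helpers might look like, assuming the history object is a dict keyed by frozensets of two student IDs (the names get_pair_sum and get_group_pair_sum are assumptions for illustration; only get_group_list_pair_sum is named in the description below):

```python
from itertools import combinations

def get_pair_sum(history, id_a, id_b):
    """How many times have these two students been paired together before?"""
    return history.get(frozenset((id_a, id_b)), 0)

def get_group_pair_sum(history, group):
    """Total historical pairings across every pair of members within a group."""
    return sum(get_pair_sum(history, a, b) for a, b in combinations(group, 2))

def get_group_list_pair_sum(history, group_list):
    """Total historical overlap for an entire grouping solution."""
    return sum(get_group_pair_sum(history, group) for group in group_list)
```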

These helper functions take our history object and return the historical sum of pairings we’re looking for – at the level of a single pair of student IDs (e.g. “How many times have Bert and Amy been paired together?” => “Three”), at the level of a group (the sum of all the pairings of every pair of members within a group), or at the level of a group list, which is a collection of groups encapsulating all our student IDs. When we arrive at a solution, our last function there, get_group_list_pair_sum, returns a single number representing our total historical overlap. In a way, this could serve as the basis of a very useful measure of just how good our solution is, since this is the number we’re seeking to minimize.

“Team Captains” Algorithm

With that, let’s describe the algorithm we’re gonna write today. I call this the “Team Captains” method, because it’s akin to picking teams for dodgeball or whatever in a schoolyard setting – each group gets a “turn” to choose a person from the set of IDs, and this repeats until there’s nobody left and all the groups are full. The trick comes in how a group picks its new member: we check the history of each candidate against each member already in the group, and whoever has the lowest historical pair count with the group’s current members gets added to that group and removed from the pool of available members. In this way, it’s like a team captain looking across their fellow classmates, having an in-depth discussion with their team about who’s been paired with who in the past, and then picking a new member based on who has the smallest total count.

This algorithm looks kinda like this:
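Here’s a rough sketch of that approach. The names team_captains_sort and get_candidate_cost are assumed for illustration, but the Group named tuple and the turn-taking while loop follow the description above.

```python
import random
from collections import namedtuple

# A Group tracks its target size and the members picked so far.
Group = namedtuple("Group", ["size", "members"])

def get_candidate_cost(history, group, candidate):
    """Sum of the candidate's historical pairings with the group's current members."""
    return sum(history.get(frozenset((candidate, member)), 0)
               for member in group.members)

def team_captains_sort(user_ids, group_sizes, history):
    """Each group takes a turn picking the available candidate with the
    lowest historical pair count against its existing members."""
    available = list(user_ids)
    random.shuffle(available)  # so ties don't always break the same way
    groups = [Group(size=size, members=[]) for size in group_sizes]

    # Keep cycling through the groups until the pool is empty
    # and every group is full.
    while available and any(len(g.members) < g.size for g in groups):
        for group in groups:
            if not available or len(group.members) >= group.size:
                continue
            best = min(available,
                       key=lambda candidate: get_candidate_cost(history, group, candidate))
            group.members.append(best)
            available.remove(best)

    return [list(group.members) for group in groups]
```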

Here we’ve introduced the concept of a Group object (a Python named tuple) along with a few helper functions. The gist of the algorithm is the while loop, where each group takes its turn picking the candidate with the lowest historical pair count.

So how does this stack up to our naive algorithm? We can perform a few simple test runs to get an at-a-glance look, but we’re really going to need some kind of benchmarking to get a good sense of how we’re doing.

Here’s what a cursory run comparison looks like for a few sets of randomly generated data, with user IDs ranging from 1 to 10 and a group schema of [3, 3, 4]:
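A harness along these lines can generate that kind of comparison. The helpers random_history and total_overlap here are hypothetical stand-ins written for this sketch, team_captains_sort is the sketch from above, and the naive brute-force from the last post would be called the same way; the overlap numbers will vary from run to run.

```python
import random
from itertools import combinations

def total_overlap(history, group_list):
    """Sum of historical pairings across every pair in every group."""
    return sum(history.get(frozenset(pair), 0)
               for group in group_list
               for pair in combinations(group, 2))

def random_history(user_ids, group_sizes, num_rounds=5):
    """Simulate a few past rounds of randomly assigned groups to build
    a pairing history keyed by frozensets of two IDs."""
    history = {}
    for _ in range(num_rounds):
        shuffled = random.sample(list(user_ids), len(user_ids))
        start = 0
        for size in group_sizes:
            for a, b in combinations(shuffled[start:start + size], 2):
                key = frozenset((a, b))
                history[key] = history.get(key, 0) + 1
            start += size
    return history

user_ids = list(range(1, 11))
schema = [3, 3, 4]
history = random_history(user_ids, schema)

# Compare the team captains result against the naive brute-force
# (naive_sort here would stand in for the function from the previous post).
captain_groups = team_captains_sort(user_ids, schema, history)
print("team captains overlap:", total_overlap(history, captain_groups))
```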

As expected, our “team captain” algorithm gets a worse overlap value pretty consistently. But how much worse is it really? It certainly can’t find a better solution than the naive algorithm, but how would it compare against just randomly chosen groups? In our next post, we’ll write up some tools to give us a better analysis of a specific algorithm’s approach, so that we can write yet more implementations and decide which among them best suits our needs, by the numbers. As a little look ahead, I think we can say that speed matters, but as long as a solution is found within a reasonable time limit, we should strongly favour solutions that consistently do a better job of minimizing this overlap.
