Skip to content

GroupSort -or- Please Wait Students, the Algorithm is Almost Done

A few years back, a teacher friend of mine asked me if I could write a program to solve a problem for him. His school travels, so they’re spending every night in hotels, and need to group the students into rooms. Ideally, the students will be with new people as often as possible, and not just sharing rooms with the same people every hotel. Adding to the complexity, the rooms can be of varying size, and the number of nights in each hotel will differ slightly.

The problem, then, is this:

By the end of the trip, we want each student to have spent the same number of nights sharing a room with each other student; or as close as is reasonably possible. (The math won’t always work out such that a perfectly even distribution is possible.)

Now I’ve since learned that this is similar to a sort of textbook problem about grouping golfers for a tournament, minimizing repeated pairings. But since the golf problem is always groups of 4, and always 1 game per grouping, it’s really a special case of the more general solution we need here.

So let’s get to work building our groupsort module. Python’s my dominant language, so we’ll start with that. Let’s define a function prototype that takes a set of student IDs, the current history of each student pairing, and the desired group schema:

We want our result to be a list of Python sets, representing the groupings of student IDs. These sets should be generated such that the sum of the pairing histories between each unique pair of members in a group, is minimized, compared to other possible group combinations. So given the following example pairing history for a group of 3 students:

We can see that students 1 and 2 have been grouped together once; 2 and 3 have also been grouped together; but 1 and 3 have not. So if we given a schema of  [1, 2]  (a group of 1 and a group of 2), our desired output would be  [{1, 3}, {2}] . Our updated history, after this grouping, would then be an even spread.

A naive, brute-force approach to getting there would have our algorithm perform the following steps:

  1. Create every possible unique combination of IDs across the groups.
  2. Sum the historical pairings of each pair of members in each group simulation.
  3. Return the group with the lowest value for this summing. Pick one randomly if multiple variations exist.

Sounds great! We’re ready to roll! Let’s see what that looks like:

Looks great! We’re ready to ship! (Voice from deep withinnaive … brute force …)

Anyone hear something? Oh, what the heck, let’s just run a simple test to make sure we’re in good shape.

The following can generate some test data for us, with a small range of IDs and mock history where any given pair of students has been partnered randomly either once or never:

This data yields the following output, lickety-split:

How’d we do? Well, looking at our history, we can clearly see that every member of every group is paired with new members! 2 and 4 have no history, 3 and 6 have none, and 1, 5 and 7 are all new pairings.

Voice… factorial … increase … NP … completeness

Could’ve swore I… oh, well. Just to be sure, let’s test it with a more real-world number, say, 30 students, broken into 6 groups of 5.

[1,000,000 years pass]

I am older and much wiser since I began running this program, and indeed, how foolish I was. I have since invented a time machine, and attempted to send messages back in time to myself before I ran the program, but alas, my efforts were in vain.

As you may understand, this problem is NP-complete – the computational time required to solve it explodes as our number of variables (students & groups) increases. Now, there’s probably a few little tweaks we could make on our above algorithm to improve its performance, but we’ll never catch up with the explosion in time. A real-world number like 30 students into 6 groups is unsolvable.

Also everyone’s immortal now and it kicks ass.

To give a sense of just how our function explodes, look squarely at our invocation of  itertools.combinations() . In our 30-student example, we find every combination of some group of 5 from those students (142,506), then for each of those, we find every combination of the remaining 25 students (53,130), then the remaining 20 (15,504), the remaining 15, (3,003), 10 (252), and the last 5 (1).

That’s ~ 8.9 x 1019 combinations to evaluate.

What’s the age of universe in seconds again? (Our 7-member test run had 35 * 6 * 1 = 210 combinations to compare.)

To come up with a useful algorithm, we’re going to need to employ some heuristics — little “cheats” to estimate our best moves, based on things we know about the problem and our desired solution.

Coming in part 2!

Be First to Comment

Leave a Reply

Your email address will not be published. Required fields are marked *