A few years back, a teacher friend of mine asked me if I could write a program to solve a problem for him. His school travels, so the students spend every night in hotels and need to be grouped into rooms. Ideally, each student will room with new people as often as possible, rather than sharing rooms with the same people at every hotel. Adding to the complexity, the rooms can be of varying sizes, and the number of nights in each hotel will differ slightly.

The problem, then, is this:

**By the end of the trip, we want each student to have spent the same number of nights sharing a room with each other student; or as close as is reasonably possible.** (The math won’t always work out such that a perfectly even distribution is possible.)

Now, I’ve since learned that this is similar to a textbook problem about grouping golfers for a tournament while minimizing repeated pairings. But since the golf problem is *always* groups of 4, and *always* 1 game per grouping, it’s really a special case of the more general solution we need here.

So let’s get to work building our **groupsort** module. Python’s my dominant language, so we’ll start with that. Let’s define a function prototype that takes a set of student IDs, the current history of each student pairing, and the desired group schema:

```python
def groupsort(ids, history, group_schema):
    """Sort a list of IDs into groups.

    Arguments:
    ids -- set of integers; the complete set of IDs to be sorted
    history -- pairing history. A dictionary mapping pairs of IDs to the
        historical count of previous pairings
    group_schema -- list of group sizes; must sum to len(ids)
    """
```

We want our result to be a list of Python sets representing the groupings of student IDs. These sets should be generated such that the *sum* of the pairing histories between *each unique pair of members in a group* is *minimized*, compared to other possible group combinations. So, given the following example *pairing history* for a group of 3 students:

```python
sample_history = {
    (1, 2): 1,
    (1, 3): 0,
    (2, 3): 1,
}
```

We can see that students 1 and 2 have been grouped together once; 2 and 3 have also been grouped together once; but 1 and 3 have not. So if we’re given a schema of `[1, 2]` (a group of 1 and a group of 2), our desired output would be `[{1, 3}, {2}]`. Our updated history, after this grouping, would then be an even spread.
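To make that concrete, here’s a tiny sketch (the `pair_sum` helper is mine, just for illustration) that scores each possible grouping against `sample_history`:

```python
import itertools

sample_history = {
    (1, 2): 1,
    (1, 3): 0,
    (2, 3): 1,
}

def pair_sum(groups):
    # Sum the historical pairings across every unique pair in every group.
    # Keys are stored low-ID-first, so we sort each pair before the lookup.
    return sum(
        sample_history[tuple(sorted(pair))]
        for group in groups
        for pair in itertools.combinations(group, 2)
    )

print(pair_sum([{1, 3}, {2}]))  # 0 -- students 1 and 3 have never paired
print(pair_sum([{1, 2}, {3}]))  # 1
print(pair_sum([{2, 3}, {1}]))  # 1
```

The grouping `[{1, 3}, {2}]` scores lowest, which is exactly the output we said we wanted.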

A naive, brute-force approach to getting there would have our algorithm perform the following steps:

- Create every possible unique combination of IDs across the groups.
- Sum the historical pairings of each pair of members in each group simulation.
- Return the group combination with the *lowest* value for this sum. Pick one randomly if multiple combinations tie.

Sounds great! We’re ready to roll! Let’s see what that looks like:

```python
def groupsort(ids, history, group_schema):
    """Sort a list of IDs into groups.

    Arguments:
    ids -- set of integers; the complete set of IDs to be sorted
    history -- pairing history. A dictionary mapping pairs of IDs to the
        historical count of previous pairings
    group_schema -- list of group sizes; must sum to len(ids)
    """
    import itertools

    # These are single-item lists so that our enclosed functions can mutate them.
    # In Python 3, we would instead declare them with the `nonlocal` keyword.
    lowest_sum = [None]
    return_group_list = [None]

    def get_pair_count(id_pair):
        # Return the historical count of pairings for these 2 IDs
        return history.get((min(id_pair), max(id_pair))) or 0

    def get_group_pair_sum(group):
        # Return the sum of historical pairings in this group
        return sum([get_pair_count(pair)
                    for pair in itertools.combinations(group, 2)])

    def get_group_list_pair_sum(group_list):
        # Return sum of historical pairings for each group in list
        return sum([get_group_pair_sum(group) for group in group_list])

    def make_group(id_set, group_list, _group_schema):
        """
        Recursively make groups with remaining IDs until we've run out of
        groups in our schema, at which point, we terminate, and check if our
        current combination has a lower pairing sum than our current lowest.
        """
        if not _group_schema:
            # Terminating condition - check our built group_list
            pair_sum = get_group_list_pair_sum(group_list)
            if lowest_sum[0] is None or pair_sum < lowest_sum[0]:
                lowest_sum[0] = pair_sum
                return_group_list[0] = group_list
            return
        # Take our schema's last item without mutating
        group_size = _group_schema[-1]
        # Make a list of all combinations from the ID set for group_size
        group_combinations = itertools.combinations(id_set, group_size)
        for group in group_combinations:
            # Recursively make groups with the remaining IDs
            make_group(id_set - set(group),
                       group_list + [group],
                       _group_schema[0:-1])

    make_group(ids, [], group_schema)
    # Return the group_list that had the lowest pairing sum
    return return_group_list[0]
```

Looks great! We’re ready to ship! (**Voice from deep within:** *naive … brute force …*)

Anyone hear something? Oh, what the heck, let’s just run a simple test to make sure we’re in good shape.

The following can generate some test data for us, with a small range of IDs and mock history where any given pair of students has been partnered randomly either once or never:

```python
import itertools
import random

TEST_IDS = [i for i in range(1, 8)]
TEST_HISTORY = {pair: random.choice((0, 1))
                for pair in itertools.combinations(TEST_IDS, 2)}
TEST_GROUP_SCHEMA = [3, 2, 2]
```

This data yields the following output, lickety-split:

```python
TEST_HISTORY = {
    (1, 2): 1, (1, 3): 0, (1, 4): 1, (1, 5): 0, (1, 6): 1, (1, 7): 0,
    (2, 3): 1, (2, 4): 0, (2, 5): 1, (2, 6): 0, (2, 7): 0,
    (3, 4): 1, (3, 5): 1, (3, 6): 0, (3, 7): 1,
    (4, 5): 1, (4, 6): 1, (4, 7): 1,
    (5, 6): 0, (5, 7): 0,
    (6, 7): 1,
}

# Output:
# [(2, 4), (3, 6), (1, 5, 7)]
```

How’d we do? Well, looking at our history, we can clearly see that every member of every group is paired with *new* members! 2 and 4 have no history, 3 and 6 have none, and 1, 5 and 7 are all new pairings.
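We can confirm that mechanically, too. Here’s a quick check (the `output` variable is just the result from above, pasted in) that sums the history entries for every unique pair inside each returned group:

```python
import itertools

TEST_HISTORY = {
    (1, 2): 1, (1, 3): 0, (1, 4): 1, (1, 5): 0, (1, 6): 1, (1, 7): 0,
    (2, 3): 1, (2, 4): 0, (2, 5): 1, (2, 6): 0, (2, 7): 0,
    (3, 4): 1, (3, 5): 1, (3, 6): 0, (3, 7): 1,
    (4, 5): 1, (4, 6): 1, (4, 7): 1,
    (5, 6): 0, (5, 7): 0,
    (6, 7): 1,
}

output = [(2, 4), (3, 6), (1, 5, 7)]

# Sum the history entries for every unique pair inside each group
total = sum(
    TEST_HISTORY[tuple(sorted(pair))]
    for group in output
    for pair in itertools.combinations(group, 2)
)
print(total)  # 0 -- every pairing in the output is brand new
```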

*Voice*: *… factorial … increase … NP … completeness*

Could’ve sworn I… oh, well. Just to be sure, let’s test it with a more real-world number, say, 30 students, broken into 6 groups of 5.

[1,000,000 years pass]

I am older and much wiser since I began running this program, and indeed, how foolish I was. I have since invented a time machine, and attempted to send messages back in time to myself before I ran the program, but alas, my efforts were in vain.

As you may understand, this problem is NP-complete: the computational time required to solve it explodes as our number of variables (students and groups) increases. Now, there are probably a few little tweaks we could make to the above algorithm to improve its performance, but we’ll never catch up with the explosion in time. A real-world number like 30 students in 6 groups is effectively unsolvable by brute force.

Also everyone’s immortal now and it kicks ass.

To give a sense of just *how* our function explodes, look squarely at our invocation of `itertools.combinations()`. In our 30-student example, we find every combination of some group of 5 from those students (142,506), then for each of those, we find every combination of 5 from the remaining 25 students (53,130), then the remaining 20 (15,504), the remaining 15 (3,003), 10 (252), and the last 5 (1).

That’s ~8.9 × 10¹⁹ combinations to evaluate.
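If you’d like to check that arithmetic yourself, `math.comb` (available since Python 3.8) makes it quick:

```python
import math

remaining = 30
total = 1
for group_size in [5] * 6:
    # Multiply in the combinations for this group, then remove its members
    total *= math.comb(remaining, group_size)
    remaining -= group_size

print(f"{total:.2e}")  # 8.88e+19
```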

What’s the age of the universe in seconds again? (Our 7-member test run had 35 × 6 × 1 = 210 combinations to compare.)

To come up with a useful algorithm, we’re going to need to employ some heuristics — little “cheats” to estimate our best moves, based on things we know about the problem and our desired solution.
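To give a flavor of what I mean, here’s a hypothetical greedy sketch of my own (not necessarily the approach we’ll land on): instead of evaluating every combination, fill one group at a time, always adding the remaining student with the lowest pairing history against the group built so far.

```python
import itertools

def greedy_groupsort(ids, history, group_schema):
    # Hypothetical greedy heuristic: build each group incrementally by
    # always adding the remaining student whose pairing history with the
    # current group is lowest. Runs in polynomial time, but unlike the
    # brute-force version, it is not guaranteed to find the optimum.
    remaining = set(ids)
    groups = []
    for size in group_schema:
        group = {remaining.pop()}  # seed with an arbitrary student
        while len(group) < size:
            best = min(
                remaining,
                key=lambda s: sum(
                    history.get((min(s, m), max(s, m)), 0) for m in group
                ),
            )
            group.add(best)
            remaining.remove(best)
        groups.append(group)
    return groups
```

A heuristic like this trades guaranteed optimality for tractable runtime; the trick is finding cheats whose answers stay close to the true optimum.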

Coming in part 2!
