Assignment 4: Fair Elections of Foods (55 Points)

Chris Tralie

Overview / Logistics

The purpose of this assignment is to give you practice with recursion and ideas from permutation theory/sorting in a fun application to which the whole class has contributed: fair elections of foods. Each member of the class ranked a subset of foods that students volunteered on discord at the beginning of the course. In this assignment, we will define mathematically an optimal "average ranking" that incorporates everyone's choices in a fair and even way, and we will explore computational issues that arise when we attempt to create algorithms to compute this average ranking.

Though this is a pretty whimsical application of election theory, hopefully it will get you thinking about some of the computational issues involved with ranked choice voting as people begin to reckon with the myriad issues we have with fairness in our current democractic system. Click here to read a recent Time article on this topic.

Click here to review notes on Kendall-Tau and Kemeny Optimal rankings.

Learning Objectives

  • Implement an exact and an approximate version of the same algorithm.
  • Use recursion to implement both efficient and brute source solutions to problems.
  • Manage permutations, orders, and sorts in arrays.
  • Be introduced to the the notion of an algorithm reduction.

What To Submit

Submit your file to canvas

Suggested Timeline

  • Finish brute force kendall-tau by Wednesday 10/19
  • Finish diameter and average rank by Friday 10/21
  • Finish brute force kemeny by Monday 10/24
  • Finish efficient Kendall-Tau by Thursday 10/27

Programming Tasks

Click here to download the starter code for this assignment. You will be editing Code has been provided for loading in the rankings and plotting a dimension reduced version of all pairwise ranking comparisons.

Rankings As Numbers

To keep things simple, if we have N items we're ranking, we'll label them from 0 to N-1 in some arbitrary order. In this assignment, we'll be using the food votes that students cast, and we'll use the following labels:

	0: Sushi
	1: Tacos
	2: Pho
	3: Spaghetti_Meatballs
	4: Strawberries
	5: Dumplings
	6: Pizza
	7: Brownies

So, for instance, if someone gives the rankings

  1. Strawberries
  2. Pho
  3. Tacos
  4. Spaghetti_Meatballs
  5. Dumplings
  6. Brownies
  7. Pizza
  8. Sushi

then this would correspond to the list 4, 2, 1, 3, 5, 7, 6, 0. I have intentionally kept the list small so we can explore all permutations of this list in some later tasks.

In the starter code, you will see a method called load_permutations which loads the class rankings from a text file that's been provided in the starter code. There is a list of 8 food strings. Then, the rankings of each student are represented as arrays of numbers between 0 and 7 which index this array. All of the rankings are stored in a dictionary, where the key is the student's name as a string, and the value is such a list. The data is shown below

Brute Force Kendall-Tau Distance (10 Points)

Fill in the method kendall_tau to compute the Kendall-Tau distance between two rankings. For N rankings, your method should run in O(N2) time and use O(N) space.

To test this on the example in the background, you can write the code

If this works, you should see an output of 7, as explained in the background. For a more comprehensive example, you can run the method plot_mds_distances, which will use a method known as multidimensional scaling (MDS) to come up with a set of 2D coordinates that best preserve the Kendall-Tau distances between every pair of raters (though it will usually introduce some distortion, for a similar reason to the fact that it's impossible to create a square map of the earth without stretching some areas). If you run the code below You should see the following plot

You can study this plot to choose your next partner with whom to setup a dinner CS 271 work session, or perhaps with whom to setup a catering business!

Diameter (7 Points)

As hinted by the MDS example above, the The Kendall-Tau distance is truly a distance, in the sense that it is a metric space. Even though we can't perfectly visualize it spatially without error, we can still reason geometrically about it with numerics. For instance, we can search for the two rankings which are closest to each other in this space. In this case, those would be

who are at a distance of 5 from each other; they agree except for the the pairs (7, 5), (7, 3), (2, 1), (0, 2), (0, 4).

Actually, we see here some of the distortion that's introduced in the MDS plot above, because, for instance, Connor and Tom appear to be further apart than Connor and Breeze, but the latter two are actually 10 apart.

Aside from closest rankings, we can also define the opposite notion of the two ratings that are furthest apart from each other, which we refer to as the "diameter."

Your Task

Write a method that takes in the rankings dictionary, and which prints out the two people who achieve the diameter, along with their rankings and the Kendall-Tau distance between them. For M people who rated and N foods, your method should run in O(M2 N2) time.

If this worked properly, you should see that Tom and Shiba L are the most different, with 26 pairwise disagreements (in fact, they are nearly exact reverses of each other except for two pairs: (5, 3) and (3, 7)!).

Hint: You might take inspiration by studying the code in plot_mds_distances, which does some similar preprocessing to the code you will have to write.

Average Rank Aggregation (8 Points)

We discussed the Kemeny optimal rank aggregation, but there is a simpler technique we can try first to incorporate everyone's votes. We can think of every ranking as giving rise to a place number associated to each index. For example, if a number shows up at index 6 in the a permutation, we give it a place of 7 (since we're zero-indexed). We then simply average these place numbers to get a final average place. For example, in our rankings, we get the following averages:

which leads to these results in the election if we sort in ascending order of average place number

Expressed as a list, this corresponds to the permutation [0, 1, 2, 7, 3, 6, 5, 4]

Your Task

Create a method get_average_rankings that accepts the parameters foods and raters and which prints out the foods in the order of their average aggregated rankings. For N choices and M rankings, your method should run in O(MN) time. You may use a python or numpy sorting method to help with this if you'd like.

Brute Force Kemeny Optimal Ranking (15 Points)

The above answer seemed to match peoples' preferences, but what about the Kemeny Optimal rank? (click here to review notes on this). Unfortunately, there is no known way to compute this efficiently. But we can create a brute force scheme to recursively examine every permutation and find the one which minimizes the sum of the Kendall-Tau distances to all rankings. In this case, this is doable, because we only have 8 alternatives, and 8! = 40320, so we can just crunch through this number of possibilities (though many more foods than this would be intractable).

Your Task

Create a recursive method which computes the Kemeny Optimal ranking by trying all possible permutations. You can use the permutation enumeration code from the sorting module as a starting point. You will probably want to pass a python list or dictionary by reference recursively that stores the minimum cost seen so far, as well as a copy of the permutation that achieved that cost (you can use list.copy() to copy it over).

If this works correctly, you should get the following permutation as the one that minimizes the sum of Kendall-Tau distances:

[0, 1, 7, 2, 6, 3, 4, 5]

Let's look at what characters these are and compare them side by side with the average ranking

Average Raking Kemeny Optimal

Interestingly, the trends are the same, but if we swap brownies and tacos, strawberries and pizza, and pho and Spaghetti_Meatballs, we get more agreements overall.

It's encouraging that they are so similar, but how do we know that the Kemeny optimal one is better? Well, if we examine the sum of the Kendall-Tau distances of the average ranking to all rankings, we see that it is 195. But the Kemeny optimal one has a sum of Kendall-Tau distances of 187. Therefore, though it is very close, the Kemeny optimal rank has eight fewer disagreements with everyone in the class than the average ranking. And that's all there is to it!

Efficient Kendall-Tau (15 Points)

The implementation of brute force Kemeny Optimal ranking using your O(N2) code for Kendall-Tau probably took a few minutes to complete. It's worthwhile to speed up the individual Kendall-Tau distance computations since we have to do so many of them. In fact, it's possible to compute the Kendall-Tau distance in O(N log N) time using merge sort. Review the technique here for how to do this.

Your task: Implement a method kendall_tau_fast which computes the Kendall-Tau distance between two rankings in O(N log N) time. Then, replace the calls to Kendall-Tau in your Kemeny Optimal code with this method. To quantify the performance increase, time your Kemeny Optimal code with the brute force method, and compare that to Kemeny Optimal with your faster merge sort-based method. You can use the snippet below to help you time the code: