Millionaire’s Problem
Lets assume there are two millionaires, named Alice and Bob. Alice and Bob what to know who is richer without revealing how much money either one of them have. This is what is known as the millionaires problem, originally introduced by Andrew Yao [paper link]. Why am I talking about this? Well I’m interested in the ability to share information among people while preserving their data privacy. In this case the two millionaires will know which is richer (information) without sharing how much either is worth (their private data).
The solution we will be exploring relies upon two assumptions. First, it is assumed that the two millionaires knows approximately how rich each are. In our example we’ll assume that Alice and Bob knows that they are both worth somewhere between 1 and 10 million dollars. The second assumption is that Alice and Bob are using a public key encryption system, in this example we’ll be using RSA. With those assumptions lets begin working through an example.
We’ll give Alice 8 million dollars (i=8) and Bob 6 million dollars (j=6). Then lets assume that Alice’s public key is (17, 1591) and her private key is (89, 1591). These keys are really small to make the math easy, also note that the maximum number that can be encoded with this pair is limited (actually the maximum is 1590).
STEP 1: Lets have Bob start the process. To do this Bob first generates a N bit random number called U, we’ll choose 1180. Bob then takes that number and computes C = 1180^17 mod 1591 = 749. Note this is actually the encrypted version of Bob’s random number using Alice’s public key.
STEP 2: Bob takes the C value he computed and adds his wealth value J and subtracts one, C - J + 1 = 749 - 6 + 1 = 744. Bob then sends Alice this value.
STEP 3: After receiving this value Alice creates values Y1, Y2, …, Y10. Where Y1 is the RSA deciphering of (C - J + 1), Y2 is the deciphering of (C - J + 2), Y3 is the deciphering of (C - J + 3), etc. Although Alice doesn’t know C or J, Alice can still compute this because Bob sent her (C - J +1). This produces the following table of Y values.
| x | (C - J + x) | RSA Function | Yx |
|---|---|---|---|
| 1 | 744 | 744^89 mod 1591 | 805 |
| 2 | 745 | 745^89 mod 1591 | 281 |
| 3 | 746 | 746^89 mod 1591 | 339 |
| 4 | 747 | … | 1311 |
| 5 | 748 | … | 1244 |
| 6 | 749 | … | 1180 |
| 7 | 750 | … | 1062 |
| 8 | 751 | … | 1359 |
| 9 | 752 | … | 219 |
| 10 | 753 | 753^89 mod 1591 | 942 |
STEP 4: Alice generates a random N/2 prime number called p, remember Bob’s random number was N bits. We’ll use 631, its prime and its close enough to N/2 bits. Using this number Alice generates Z1, Z2, Z3, … Z10. Where Z1 = Y1 mod p, Z2 = Y2 mod p, etc. This operation generates the following table of Z values.
| x | (C - J + x) | RSA Function | Yx | Zx (= Yx mod 631) |
|---|---|---|---|---|
| 1 | 744 | 744^89 mod 1591 | 805 | 174 |
| 2 | 745 | 745^89 mod 1591 | 281 | 281 |
| 3 | 746 | 746^89 mod 1591 | 339 | 339 |
| 4 | 747 | … | 1311 | 49 |
| 5 | 748 | … | 1244 | 613 |
| 6 | 749 | … | 1180 | 549 |
| 7 | 750 | … | 1062 | 431 |
| 8 | 751 | … | 1359 | 97 |
| 9 | 752 | … | 219 | 219 |
| 10 | 753 | 753^89 mod 1591 | 942 | 311 |
Detail Note: The random prime needs to be chosen such that the consecutive Z values differ by at least 2, the one we chose does and its easy to verify in the general case.
STEP 5: The rest of this is just the setup, heres the punchline. Alice then sends Bob the random prime she selected and the calculated Z values: Z1, Z2, Z3, … up to the i value (the number of millions Alice has i = 8). The remaining 10 values Alice sends but adds one to each value; Z9 + 1 and Z10 + 1. To recap Alice sends the first i Z values as they were calculated, she then sends the remaining Z values plus 1. Here are the numbers alice sends:
| p | Z1 | Z2 | Z3 | Z4 | Z5 | Z6 | Z6 | Z8 | Z9+1 | Z10+1 |
|---|---|---|---|---|---|---|---|---|---|---|
| 631 | 174 | 281 | 339 | 49 | 613 | 549 | 431 | 97 | 219+1=220 | 311+1=312 |
STEP 6: Bob then computes G = U mod p = 1180 mod 631 = 549. Recall U is the random number that Bob selected in step 1. The value p is known because Alice told Bob the random prime she selected (p). Bob then looks at the jth value (his wealth in millions) of the 10 Z values sent to him by Alice (ignoring the random prime Alice sent first). If this jth value is equal to G then Bob knows that Alice is of greater than or equal wealth to himself. In our case G does equal the jth value (549 = 549), this means that Alice is either of greater or equal wealth. On the other hand if G is not equal to jth value it means that Bob is richer than Alice.
STEP 7: Bob then tells Alice that she is richer. Both parties now know which person is richer and was able to determine this without exposing their actual wealth. Note that Bob actually determines if Alice is greater than or equal wealth, to determine if its equal or greater than you would have to run the algorithm in reverse (swap the roles for Alice and Bob).
At the end they both know more, for example Alice knows that Bob is worth between 1 and 8 million and Bob knows that Alice is between 6 and 10 million. This is a result of just knowing who is richer than the other and the original range of values they already knew. Besides this no other information can be determined by the involved parties.
Still not convinced? Here is some python code that demonstrates how this algorithm works, it implements RSA public key encryption and then walks through one example (similar to this article). You can easily change the wealth of Alice and Bob and see how it works. How to play with this:
- Download million.py
- Execute this file using python.
python million.py - At the bottom of the file is the “main” body of the program. You can edit various values in this section and generally play with everything there. Specifically BOB_VALUE, ALICE_VALUE, and MAX_VALUE.
There are several implementation problems with this when in a real world problem. First, what if people lie? Bob and or Alice could simply not tell the truth about their values. By lying Alice and Bob could run the algorithm multiple times to and eventually determine exactly how rich the other party is. Yao actually addresses this issue in his original paper and proposes a few solutions. Second, computation cost is extremely high, since you are computing public key encryption values for several values the computation time could be very high. Third, the range of values, in most practical problems the range of values is greater than 10 which means lots of computation but also lots of Z values need to be sent between Alice and Bob. Fourth, if you implement this for multiple parties there can be collusion among the parties which could reveal private data about an individual in the group.
Its interesting to think that this was written in 1982 just a few years after public key encryption was discovered. Was it a problem looking for a solution or a solution looking for a problem? Which ever direction the solution came from the millionaires problem proves that you can share information yet protect ones private data. While this solution might have some issues at least there is a solution! Since this paper was published there has been several more sophisticated solutions but this one is elegant and is fairly obvious after working through a few examples.
October 29th, 2007 at 1:58 pm
This is near, but it took me a while to identify something that was bugging me about it: it’s simply a way to compare two numbers. The fact that we’re comparing monetary amounts is OOB data that needs to be communicated and agreed on between the two parties. I think that it’s generally the OOB data that’s the interesting part of the interaction. Going back to a previous post of yours.. I don’t really care how many Neil Diamond albums you own. The fact that you do own a Neil Diamond album, on the other hand, is an interesting bit (literally) of data. In order to ask/answer that question, though, we still need to agree on the subject.
For me, an interesting problem is figuring out how to answer those binary questions without revealing additional information, *including* the OOB information about the subject.
October 31st, 2007 at 4:24 pm
Agreed, the interesting item to know is that I own a Neil Diamond album or just that I have some interest in X. Using the music taste example its really not important that you know that I own an album but more important to know that people (aggregate) that own similar albums as you do also own some other album you don’t have. This means you need a way of grouping people without revealing identities.
I’ve looked at are partial hashing of collections in the hope of grouping multiple people together. You can’t use exact hashing because I might own 1 album different than you which would cause us not to be grouped. If you do partial hashing of your collection of music then you can group with one another and then having a mapping to other hash values that contain other albums. The hash works as just a way of aggregating preferences in an anonymous way. I actually have a working prototype of this and will try to post this soon.
The problem with my little prototype is it current looks at all permutations of your album collection to and creates hash entries for each. I’ve looking at using two hash functions and creating a 2d space where you then map your entries into that instead of the 1d hash function. Then you can do things like range queries over the 2d space that should help cut down the number of permutations needed. This still needs some more thought, although if your interested you should look at how you can do range queries using a DHT.
November 5th, 2007 at 5:46 pm
Re: the hashing for album collections:
How about using shingles instead? They can be compared. It’s used in search engines a lot.
Re: millionares problem
I remember a paper that came out around 1992 or so. The problem was to do the classic “flip a coin, call it in the air” with two remote people and prevent lying. It was similar, except that I actually understood (eventually) the math.
Bug:
I assume that should have the words “adds” and “substracts” flipped…
Ciao!
November 17th, 2007 at 3:11 am
I’ve thought about the problem (”Is X interested in Y”) in terms of sparse bloom filters, with an agreement on how data is canonicalized before it is hashed. This allows you to construct a binary blob that you can hand to me, and that I can then query, without knowing exactly what data went in to making up the blob. It still requires some OOB data - but that’s an agreement on protocol, not subject.