I like Neil Diamond! Is that private data?

Recently I purchased a CD that Amazon recommended to me. I was really impressed; although I’ve not listened to the entire CD, I’m glad Amazon recommended it. I’m not the only one with this experience, but it got me thinking about the privacy of my data (in this case my purchase data) and, more importantly, who actually owns it.

We are told to only allow trusted parties access to our private information, but what if those trusted parties don’t have what I’m interested in? Or maybe they just can’t figure out what I like. Is there a benefit in allowing untrusted parties access to my private information so they can recommend new items, or at least target their presentation towards me? I’m not talking about spam; I’m talking about user-aware presentation logic. If I visit a new website selling shoes, it would be useful for both of us if the site knew that I usually buy brown shoes. But do you want to tell everyone in the world that you buy brown shoes? Maybe preferred shoe color doesn’t violate my privacy, but what if the site could know that I support gun rights or that I’m pro-choice? Where do you draw the line? Can they know my favorite color but not my preferred style of underwear?

I believe that a much better approach is a decentralized algorithm that lets me use my peers to determine what items I might be interested in. The idea is that I want to tell a group of peers, “Hey, I’ve bought these items. Do you have any suggestions for items I should buy?” How might we do this? The constraint is that I want to keep both my data and my peers’ data private.

To recap: I have to ask my peers about purchases without revealing that I made those purchases myself. How can I do that? How do you do that with your friends today? Ahh… we use the old “I have a friend who really likes Neil Diamond. What should I buy her for her birthday?” That’s right, blame the friend for your bad music tastes :) How can the friend respond without admitting they too listen to Neil Diamond? By using the same trick: “I have a friend who loves Neil Diamond, and he also likes Bread.”

There you have it: a decentralized, privacy-preserving approach to creating personal recommendations. The actual implementation isn’t so simple, but the general idea is easy to see. Get a bunch of people together and form groups. Ask your neighbors, and then have your neighbors ask the people in their groups. As long as requests and responses hide the origin of the message, no one will know who has this sickening music taste. In the end everyone is happy, because the data is kept private but you still get recommendations. How might this work in the context of the web? That’s what I’m working on right now, so I’ll let you know if I come up with a solution.
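
To make the “I have a friend who…” trick a bit more concrete, here is a toy Python sketch of the idea. It is only an illustration under assumptions of my own: the Peer class, the ask and handle methods, the ttl hop limit and the random query id are all hypothetical names, not part of any real protocol or library. A query carries a list of items and a random id but no sender name, and every peer merges its own suggestions with whatever comes back from further away, so a reply is just a pile of counts.

```python
import secrets
from collections import Counter


class Peer:
    """Hypothetical peer in a toy 'I have a friend who...' network."""

    def __init__(self, name, purchases):
        self.name = name                  # local label only, never put in a message
        self._purchases = set(purchases)  # private purchase history
        self.neighbors = []               # peers this one talks to directly
        self._seen = set()                # query ids already handled (stops loops)

    def ask(self, ttl=2):
        """Ask the network for suggestions based on my own purchases."""
        qid = secrets.token_hex(8)        # random id, carries no identity
        self._seen.add(qid)               # so I never answer my own query
        items = frozenset(self._purchases)
        votes = Counter()
        for peer in self.neighbors:
            votes += peer.handle(qid, items, ttl)
        return votes

    def handle(self, qid, items, ttl):
        """Answer a query and pass it on; the message names no sender."""
        if qid in self._seen:
            return Counter()
        self._seen.add(qid)
        votes = Counter()
        # "I have a friend who likes that, and he also likes...":
        # only speak up if I share at least one item with the query.
        if self._purchases & items:
            votes.update(self._purchases - items)
        # Forward to my own neighbors and merge their answers with mine,
        # so the asker only ever sees combined counts.
        if ttl > 1:
            for peer in self.neighbors:
                votes += peer.handle(qid, items, ttl - 1)
        return votes


def link(a, b):
    """Make two peers neighbors of each other."""
    a.neighbors.append(b)
    b.neighbors.append(a)


alice = Peer("alice", {"Neil Diamond"})
bob = Peer("bob", {"Neil Diamond", "Bread"})
carol = Peer("carol", {"Neil Diamond", "Bread", "ABBA"})
dave = Peer("dave", {"Metallica"})
link(alice, bob)
link(bob, carol)
link(alice, dave)

suggestions = alice.ask(ttl=2)
print(suggestions.most_common())  # [('Bread', 2), ('ABBA', 1)]
```

In a real network a neighbor would still see who handed it the message, but it could not tell whether that peer wrote the query or is just passing it along for “a friend”. The sketch does nothing about eavesdroppers, peers who compare notes, or bad suggestions, so treat it as a picture of the idea rather than a solution.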

In the meantime, if you want to learn more about data privacy, you might want to look at the various papers on the subject in the data mining literature. There are also some papers dealing with peer-to-peer technologies and how to implement private data sharing; an interesting one is the Friends Troubleshooting Network (FTN) paper out of Princeton University (NDSS 2005). Over the next few weeks I’ll post some code samples that implement various ideas about privacy-preserving data mining.

6 Responses to “I like Neil Diamond! Is that private data?”

  1. The Doctor What Says:

    Yeah, but if you know someone who likes Neil Diamond, that’s *also* information.

    Besides, there may be lots of other information that correlates with a good suggestion as well. People who buy less than $20 worth of CDs might like Neil Diamond. Do you share that information? What if it gets you a better suggestion?

    If you share too much of the sort of info you can blame on the friend you’re buying for, they can identify your “friend” as well!

    It’s sort of like the derivative of data.

    I think it’s even more complicated than you suggest.

    Ciao!

  2. Bryan Mills Says:

    Absolutely, this is much more difficult than I suggest. The general idea I presented has several major problems.

    We assumed you have some secure communications channel, otherwise a person overhearing many requests/responses can determine what people “like”.

    If people “gossip” they can determine who “likes” what.

    As Doctor What pointed out, the system I proposed assumes you trust everyone’s suggestions. One way to solve this is to vote on suggestions: instead of blindly accepting a suggestion, you can put it to a vote using a similar protocol of asking your friends.

    Some of the problems with these types of schemes have been solved but there are still plenty of open questions surrounding this problem.

    -bryan

  3. Samrobb Says:

    Gene & I were talking about something similar to this at one point - the idea would be to do personal data mining and correlation based on data flow in a smart phone (OK, iPhone :-) Part of the problem is that you’d want to keep it from becoming the ultimate spyware app. I want my machine / phone / whatever to build a profile that *I* can use to find stuff I might be interested in, not that *others* can use to figure out how to spam me with advertisements.

    BTW, Bryan, you’re one of the top Google hits for “Apple Store Pittsburgh” :-)

  4. Bryan Mills Says:

    It’s interesting that Samrobb would mention cell phones, because you could envision this aggregation actually occurring as the data travels through a multi-hop network. If you did this, you could actually ensure the privacy of the data as it travels through the network.

  5. Samrobb Says:

    Is this a current project, or just speculative thinking?

  6. Samrobb Says:

    … and, now, the iPhone finally has an SDK coming, so something like this may actually be doable.
