Crunching the OKCupid database

The OKCupid people have been running a free online dating service, backed by psychological matching algorithms driven by user-written tests, for many years, and have built up a huge corpus of data about how people interact. Now they have started a blog, where they discuss the statistical findings that may be gathered from comparing people's profiles and message counts.

One blog post looks at how well different profile attributes predict whether two people will match. Not surprisingly, the zodiac signs of any two people have no effect on their actual personalities, and thus none on how well they would get along.

Race has a slightly greater influence (of a few percentage points either way), presumably because of uneven distribution of cultural backgrounds, but it is still fairly small. (Keep in mind that the match scores are computed from how users answer others' questions, and not from explicitly asking questions like "would you date a Virgo/Polynesian/Buddhist".) Religion, however, turns out to be a lot more telling.

According to OKCupid's figures, atheists, agnostics, Jews and Buddhists seem to get along just swell (in fact, Buddhists appear to be slightly more compatible with the nonbelievers than with other Buddhists), whereas Christians, Hindus and Muslims tend to be somewhat more contentious, getting along less well not only with other religions but also with each other. Additionally, the more seriously one takes religion, it seems, the less likely one is to get along with others.

Looking again at the issue of race: while it has little effect on actual compatibility scores, it does affect how likely people are to get responses.

Love may be blind, but it also seems that it, or at least attraction, is deeply racist.

On a lighter note, OKCupid have crunched the word frequencies of successful and unsuccessful opening messages and discovered what to write if you want a reply. Netspeak and "hip" misspellings ('u', 'luv', 'wat') and physical compliments are out, whereas mentions of specific interests are helpful. Unsurprisingly, mentioning religion is generally a bad idea as well.
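
For the curious, here is a minimal sketch of how this kind of word-level analysis could be done. It is not OKCupid's actual method, which the blog does not spell out in code; the function name `reply_rate_by_word` and the toy messages are made up purely for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def reply_rate_by_word(messages: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """For each word, return the fraction of messages containing it that got a reply.

    `messages` is a sequence of (message text, got_reply) pairs.
    """
    sent = Counter()     # number of messages containing each word
    replied = Counter()  # number of those messages that received a reply
    for text, got_reply in messages:
        # Count each word once per message, ignoring case.
        for word in set(text.lower().split()):
            sent[word] += 1
            if got_reply:
                replied[word] += 1
    return {word: replied[word] / sent[word] for word in sent}


# Invented example data, not real OKCupid messages.
toy_messages = [
    ("u r hot", False),
    ("I love that band too", True),
    ("luv u", False),
    ("that band is great", True),
]
rates = reply_rate_by_word(toy_messages)
```

With real data one would also want to discard words that appear in only a handful of messages, since their reply rates are mostly noise.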

There are 4 comments on "Crunching the OKCupid database":

Posted by: Greg Mon Oct 26 06:12:46 2009

I suspect this is based on circular reasoning. They make claims about 'types' of people, based on counts of who 'matches' whom. The types are races and religions, which are guaranteed to spark headlines. But their definition of a 'match' is based on profile data: "If, for example, a couple match each other 71%, it means they are likely to like each other, based on their own individual definitions of what makes a person cool, sexy, and attractive, not ours." Well, for mine, if they're going to claim that person A is a match for person B, they'd want to watch them contact each other, meet, date ... to whatever extent one feels a relationship must go before the pair can be deemed a match - not just *assume* they will match based on criteria they typed in. Of course, a dating site wants to believe that profile data predict a match. But humans do not always get these predictions right ("s/he seemed right until we got married" etc). As far as I can tell there is no observation of 'post first meeting' events.

Posted by: acb Mon Oct 26 09:17:26 2009

AFAIK, a match is derived from how two people answered a set of questions (from a large body of user-defined questions), and what significance they gave them. For example, if two people give the same answer to a question and mark it as important, this boosts their compatibility score more than if they gave the same answer and rated it as unimportant, and if they give conflicting answers and mark it as important, it decreases their compatibility score.
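
A rough sketch of that scheme in Python, as an illustration only: the weights, the names `IMPORTANCE_WEIGHT` and `compatibility`, and the way the two users' importance ratings are combined are all assumptions, not OKCupid's actual formula.

```python
from typing import Dict, Tuple

# Hypothetical importance weights; the real values are not given in this discussion.
IMPORTANCE_WEIGHT = {"unimportant": 1, "important": 10}

# Each user's answers: question id -> (chosen answer, importance label)
Answers = Dict[str, Tuple[str, str]]


def compatibility(a: Answers, b: Answers) -> float:
    """Return a score in [0, 1] over the questions both users answered.

    Each shared question contributes a weight based on how important
    both users rated it; the weight is earned only if the answers agree.
    """
    earned = 0.0
    possible = 0.0
    for question in a.keys() & b.keys():
        answer_a, importance_a = a[question]
        answer_b, importance_b = b[question]
        weight = IMPORTANCE_WEIGHT[importance_a] + IMPORTANCE_WEIGHT[importance_b]
        possible += weight
        if answer_a == answer_b:
            earned += weight
    return earned / possible if possible else 0.0


# Agreement on the important question, disagreement on the unimportant one.
alice = {"q1": ("yes", "important"), "q2": ("no", "unimportant")}
bob = {"q1": ("yes", "important"), "q2": ("yes", "unimportant")}

# The reverse: disagreement on the important question.
carol = {"q1": ("yes", "important"), "q2": ("no", "unimportant")}
dave = {"q1": ("no", "important"), "q2": ("no", "unimportant")}

high = compatibility(alice, bob)
low = compatibility(carol, dave)
```

Here `high` comes out well above `low`, because agreeing on a question both people marked as important counts for much more than agreeing on one they marked as unimportant.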

Posted by: Greg Mon Oct 26 10:07:42 2009

That's my gripe. Data like those don't predict much about attractiveness, which is quite embodied, as shown in a range of studies such as the classic "smell the t-shirt" paper. People might *think* they'll be attracted to someone who holds similar political beliefs, entertainment preferences and so on, but these turn out to be outweighed by visual and olfactory estimates of genetic suitability. An interesting study would be to see how well 'conscious preferences' such as profile data predict relationship longevity, seeing that those other studies suggest not. But instead of testing this, the dating site (self-servingly) assumes that they correlate, and goes on to draw conclusions about racial and religious groups that are probably quite outrageously unsupportable. (I say 'self-servingly' because dating sites are predicated on the idea that suitable partners can be found via data typed into a website, though this is probably not the case.)

Posted by: acb Mon Oct 26 13:37:37 2009

I think you're onto something there.

