The Null Device

Sexing text

Researchers at the Illinois Institute of Technology have written a program that identifies an author's sex from how frequently they use particular words. Apparently women use relationship-related words like "with" and "for", whereas men use more specific and absolute words like "the", "this" and "as", which brings us back to the old rock-logic/water-logic cliché.
The results showed that the words favoured most heavily by men were what grammarians call determinative words such as "the," "a," "as," "that" and "one." Female writers favoured "she" and relationship words such as "for," "with," "in," "and" and "not."
"This is surprising, since, unlike conversation, writing a book or an article does not involve direct social interaction"

Hmmm; if one wrote up such a program and applied it to, say, blogs on the web, I wonder what proportion it would sex accurately.
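
For the curious, here is a rough sketch of what a toy version might look like. To be clear, this is not the researchers' method: the word lists are only the handful of words quoted above, and the decision rule (whichever list shows up more often wins) is my own assumption, as are the names `marker_rates` and `guess`. The real program presumably uses many more features and learned weights.

```python
import re
from collections import Counter

# Word lists taken only from the words quoted in the article above;
# the actual study's feature set and weights are not given here.
MALE_MARKERS = {"the", "a", "as", "that", "one"}
FEMALE_MARKERS = {"she", "for", "with", "in", "and", "not"}


def marker_rates(text):
    """Return (male_rate, female_rate): the share of tokens in each list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0, 0.0
    counts = Counter(tokens)
    male = sum(counts[w] for w in MALE_MARKERS)
    female = sum(counts[w] for w in FEMALE_MARKERS)
    return male / len(tokens), female / len(tokens)


def guess(text):
    """Toy decision rule (an assumption): the more frequent list wins."""
    male_rate, female_rate = marker_rates(text)
    if male_rate > female_rate:
        return "male"
    if female_rate > male_rate:
        return "female"
    return "undecided"
```

Feeding it a blog post would be a matter of calling `guess(post_text)` on the plain text; I would not expect much accuracy from anything this crude, especially on short posts.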

Update: the paper may be found here (though you have to subscribe to get the PDF). However, there is also a copy on the personal page of Prof. Moshe Koppel, one of the authors. And it appears that they're from Israel, not Illinois. (Perhaps the journalist confused the abbreviations?)

There are 2 comments on "Sexing text":

Posted by: gjw http://the-fix.org Tue May 27 23:53:29 2003

I wish this program was running on a CGI so I could give it a go - it looks like fun. I wonder if it's capable of discriminating in technical writing, where your choice of words is much more restricted.

Posted by: acb http://dev.null.org Wed May 28 08:09:08 2003

It appears that the paper may be found here:

http://www3.oup.co.uk/litlin/current/170401.sgm.abs.html

The abstract also mentions that the same technique may be used to determine whether a text is fiction or non-fiction.
