How and what Google learns

The London Review of Books reviews several recently published books about Google in an essay on Google's data-collecting and machine-learning operations; it appears that a lot of the services Google provide are there largely to gather data for its machines to learn from:
In 2007, Google told the New York Times that it was now using more than 200 signals in its ranking algorithm, and the number must now be higher. What every one of those signals is and how they are weighted is Google’s most precious trade secret, but the most useful signal of all is the least predictable: the behaviour of the person who types their query into the search box. A click on the third result counts as a vote that it ought to come higher. A ‘long click’ – when you select one of the results and don’t come back – is a stronger vote. To test a new version of its algorithm, Google releases it to a small subset of its users and measures its effectiveness through the pattern of their clicks: more happy surfers and it’s just got cleverer. We teach it while we think it’s teaching us. Levy tells the story of a new recruit with a long managerial background who asked Google’s senior vice-president of engineering, Alan Eustace, what systems Google had in place to improve its products. ‘He expected to hear about quality assurance teams and focus groups’ – the sort of set-up he was used to. ‘Instead Eustace explained that Google’s brain was like a baby’s, an omnivorous sponge that was always getting smarter from the information it soaked up.’ Like a baby, Google uses what it hears to learn about the workings of human language. The large number of people who search for ‘pictures of dogs’ and also ‘pictures of puppies’ tells Google that ‘puppy’ and ‘dog’ mean similar things, yet it also knows that people searching for ‘hot dogs’ get cross if they’re given instructions for ‘boiling puppies’. If Google misunderstands you, and delivers the wrong results, the fact that you’ll go back and rephrase your query, explaining what you mean, will help it get it right next time. Every search for information is itself a piece of information Google can learn from.
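To make the "click as a vote" idea concrete, here is a minimal sketch of reordering one query's results by click votes. It is only an illustration of the principle the essay describes, not Google's method: the weights, the `rerank` function and the shape of the click log are all assumptions made up for this example, and the real signals and their weighting are, as the essay says, a trade secret.

```python
from collections import defaultdict

# Hypothetical weights: the essay only says a long click is a "stronger vote",
# not how much stronger.
CLICK_VOTE = 1.0
LONG_CLICK_VOTE = 3.0


def rerank(results, click_log):
    """Reorder the results for one query using click votes.

    results:   list of result ids in their current order
    click_log: iterable of (result_id, came_back) pairs; came_back is False
               for a 'long click', where the user did not return to the
               results page.
    """
    votes = defaultdict(float)
    for result_id, came_back in click_log:
        votes[result_id] += CLICK_VOTE if came_back else LONG_CLICK_VOTE

    # Highest vote total first. sorted() is stable, so results with equal
    # votes keep their original relative order.
    return sorted(results, key=lambda r: -votes.get(r, 0.0))


if __name__ == "__main__":
    log = [("c", False), ("c", True), ("b", True)]
    print(rerank(["a", "b", "c"], log))  # ['c', 'b', 'a']
```

In this toy log the third result gets one long click and one ordinary click, so it overtakes the two results above it. A real system would presumably have to correct for the fact that higher-placed results attract more clicks simply by being higher, which this sketch ignores.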
By 2007, Google knew enough about the structure of queries to be able to release a US-only directory inquiry service called GOOG-411. You dialled 1-800-4664-411 and spoke your question to the robot operator, which parsed it and spoke you back the top eight results, while offering to connect your call. It was free, nifty and widely used, especially because – unprecedentedly for a company that had never spent much on marketing – Google chose to promote it on billboards across California and New York State. People thought it was weird that Google was paying to advertise a product it couldn’t possibly make money from, but by then Google had become known for doing weird and pleasing things. ... What was it getting with GOOG-411? It soon became clear that what it was getting were demands for pizza spoken in every accent in the continental United States, along with questions about plumbers in Detroit and countless variations on the pronunciations of ‘Schenectady’, ‘Okefenokee’ and ‘Boca Raton’. GOOG-411, a Google researcher later wrote, was a phoneme-gathering operation, a way of improving voice recognition technology through massive data collection. Three years later, the service was dropped, but by then Google had launched its Android operating system and had released into the wild an improved search-by-voice service that didn’t require a phone call.
One takeaway from the article is that, although it is often said that "if you don't know what the product is, you are the product", Google don't really give much personal information to advertisers, or even allow advertisers to target ads very precisely (as they can, for example, on Facebook). Google collect a wealth of information, but the bulk of it remains in the machine:
It isn’t possible, using Google’s tools, to target an ad to 32-year-old single heterosexual men living in London who work at Goldman Sachs and like skiing, especially at Courchevel. You can do exactly that using Facebook, but the options Google gives advertisers are, by comparison, limited: the closest it gets is to allow them to target display ads to people who may be interested in the category of ‘skiing and snowboarding’ – and advertisers were always able to do that anyway by buying space in Ski & Snowboard magazine. The rest of the time, Google decides the placement of ads itself, using its proprietary algorithms to display them wherever it knows they will get the most clicks. The advertisers are left out of the loop.

