What can we learn about ourselves from the things we ask online? Seth Stephens-Davidowitz analysed anonymous Google search data, uncovering disturbing truths about our desires, beliefs and prejudices
Everybody lies. People lie about how many drinks they had on the way home. They lie about how often they go to the gym, how much those new shoes cost, whether they read that book. They call in sick when they're not. They say they'll be in touch when they won't. They say it's not about you when it is. They say they love you when they don't. They say they're happy while in the dumps. They say they like women when they really like men. People lie to friends. They lie to bosses. They lie to kids. They lie to parents. They lie to doctors. They lie to husbands. They lie to wives. They lie to themselves. And they damn sure lie to surveys. Here's my brief survey for you:
Have you ever cheated in an exam?
Have you ever fantasised about killing someone?
Were you tempted to lie?
Many people underreport embarrassing behaviours and thoughts on surveys. They want to look good, even though most surveys are anonymous. This is called social desirability bias. An important paper in 1950 provided powerful evidence of how surveys can fall victim to such bias. Researchers collected data, from official sources, on the residents of Denver: what percentage of them voted, gave to charity, and owned a library card. They then surveyed the residents to see if the percentages would match. The results were, at the time, shocking. What the residents reported to the surveys was very different from the data the researchers had gathered. Even though nobody gave their names, people, in large numbers, exaggerated their voter registration status, voting behaviour, and charitable giving.
Has anything changed in 65 years? In the age of the internet, not owning a library card is no longer embarrassing. But, while what's embarrassing or desirable may have changed, people's tendency to deceive pollsters remains strong. A recent survey asked University of Maryland graduates various questions about their college experience. The answers were compared with official records. People consistently gave wrong information, in ways that made them look good. Fewer than 2% reported that they graduated with lower than a 2.5 GPA (grade point average). In reality, about 11% did. And 44% said they had donated to the university in the past year. In reality, about 28% did.
Then there's that odd habit we sometimes have of lying to ourselves. Lying to oneself may explain why so many people say they are above average. How big is this problem? More than 40% of one company's engineers said they are in the top 5%. More than 90% of college professors say they do above-average work. One-quarter of high school seniors think they are in the top 1% in their ability to get along with other people. If you are deluding yourself, you can't be honest in a survey.
The more impersonal the conditions, the more honest people will be. For eliciting truthful answers, internet surveys are better than phone surveys, which are better than in-person surveys. People will admit more if they are alone than if others are in the room with them. However, on sensitive topics, every survey method will elicit substantial misreporting. People have no incentive to tell surveys the truth.
How, therefore, can we learn what our fellow humans are really thinking and doing? Big data. Certain online sources get people to admit things they would not admit anywhere else. They serve as a digital truth serum. Think of Google searches. Remember the conditions that make people more honest. Online? Check. Alone? Check. No person administering a survey? Check.
The power in Google data is that people tell the giant search engine things they might not tell anyone else. Google was invented so that people could learn about the world, not so researchers could learn about people, but it turns out the trails we leave as we seek knowledge on the internet are tremendously revealing.
I have spent the past four years analysing anonymous Google data. The revelations have kept coming. Mental illness, human sexuality, abortion, religion, health. Not exactly small topics, and this dataset, which didn't exist a couple of decades ago, offered surprising new perspectives on all of them. I am now convinced that Google searches are the most important dataset ever collected on the human psyche.
The Truth About Sex
How many American men are gay? This is a regular question in sexuality research. Yet it has been among the toughest questions for social scientists to answer. Psychologists no longer believe Alfred Kinsey's famous estimate (based on surveys that oversampled prisoners and prostitutes) that 10% of American men are gay. Representative surveys now tell us about 2% to 3% are. But sexual preference has long been among the subjects upon which people have tended to lie. I think I can use big data to give a better answer to this question than we have ever had.
First, more on that survey data. Surveys tell us there are far more gay men in tolerant states than intolerant states. For example, according to a Gallup survey, the proportion of the population that is gay is almost twice as high in Rhode Island, the state with the highest support for gay marriage, as in Mississippi, the state with the lowest support for gay marriage. There are two likely explanations for this. First, gay men born in intolerant states may move to tolerant states. Second, gay men in intolerant states may not divulge that they are gay. Some insight into explanation number one, gay mobility, can be gleaned from another big data source: Facebook, which allows users to list what gender they are interested in. About 2.5% of male Facebook users who list a gender of interest say they are interested in men; that corresponds roughly with what the surveys indicate.