Intelligent Machines

Can You Really Spot Cancer Through a Search Engine?

A new study argues that mining people’s searches could help catch cancer sooner, providing a tantalizing glimpse of how our online habits might be used to improve health.

Jun 8, 2016

In the world of cancer treatment, early diagnosis can mean the difference between being cured and being handed a death sentence. At the very least, catching a tumor early increases a patient’s chances of living longer.

Researchers at Microsoft think they may know of a tool that could help detect cancers before you even think to go to a doctor: your search engine.

In a study published Tuesday in the Journal of Oncology Practice, the Microsoft team mined the anonymized search queries of 6.4 million Bing users to find searches indicating that someone had been diagnosed with pancreatic cancer (such as “why did I get cancer in pancreas” and “I was told I have pancreatic cancer what to expect”). Then, working backward through those users’ earlier queries, the researchers identified search patterns suggesting that the users had been experiencing symptoms before they ever sought medical treatment.

Pancreatic cancer is a particularly deadly form of the disease. It’s the fourth-leading cause of cancer death in the U.S., and three-quarters of people diagnosed with it die within a year. But catching it early still improves the odds of living longer.

By looking for searches for symptoms (which include yellowing, itchy skin, and abdominal pain) and checking each user’s search history for signs of other risk factors like alcoholism and obesity, the team was often able to identify symptom searches made up to five months before the user was diagnosed.
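As a rough illustration of that look-back idea, and not the researchers' actual method (which this article does not detail), one could find the first query that reads like a diagnosis and then collect symptom searches from the preceding five months. The phrase lists, the 150-day window, and the function name below are all assumptions made for the sketch.

```python
from datetime import date, timedelta

# Hypothetical phrase lists, chosen only for illustration.
DIAGNOSIS_PHRASES = ("i have pancreatic cancer", "why did i get cancer in pancreas")
SYMPTOM_PHRASES = ("yellowing skin", "itchy skin", "abdominal pain")

LOOKBACK = timedelta(days=150)  # roughly five months; an assumed window


def symptom_searches_before_diagnosis(log):
    """Return symptom queries made within LOOKBACK before the first diagnostic query.

    `log` is a list of (date, query) tuples for one anonymized user.
    Returns an empty list if no query looks like a diagnosis.
    """
    ordered = sorted(log, key=lambda entry: entry[0])
    # First day on which a query contains one of the diagnosis phrases.
    diagnosis_day = next(
        (day for day, query in ordered
         if any(p in query.lower() for p in DIAGNOSIS_PHRASES)),
        None,
    )
    if diagnosis_day is None:
        return []
    # Keep only symptom-like queries inside the look-back window.
    return [
        (day, query) for day, query in ordered
        if diagnosis_day - LOOKBACK <= day < diagnosis_day
        and any(p in query.lower() for p in SYMPTOM_PHRASES)
    ]
```

Simple substring matching like this would miss most real phrasing; it is only meant to make the two-step structure (find the diagnosis query, then look backward) concrete.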

In their paper, the team acknowledged the limitations of the work, saying that it is not meant to provide people with a diagnosis. Instead they suggested that it might one day be turned into a tool that warns users whose searches indicate they may have symptoms of cancer.

“The goal is not to perform the diagnosis,” said Ryen White, one of the researchers and chief technology officer of Microsoft Health, in a post on Microsoft’s blog. “The goal is to help those at highest risk to engage with medical professionals who can actually make the true diagnosis.”

White and his colleague Eric Horvitz have performed many similar studies looking at what types of information can be gleaned from search engines, including a study last month on how people’s searches evolve as they cope with breast cancer. In 2013, they showed that people’s searches could be mined for adverse effects of prescription drugs even before the U.S. Food and Drug Administration was aware of any problems. Social media also appears to be rich territory—the city of Chicago has used tweets to look for signs of food-borne illnesses stemming from local restaurants.

But other initiatives have disappointed. Google Flu Trends aimed to track flu and dengue outbreaks based on people’s searches, but was discontinued when it didn't work as well as hoped.

In their latest work, the Microsoft researchers acknowledge the drawbacks of their study. For one thing, search queries make for a messy data set. The team started with data from 9.2 million users but had to cut it to 6.4 million because, for example, some people searched for health-related terms more than 20 percent of the time. Those people are likely health-care professionals, but dropping 2.8 million users, roughly 30 percent of the original sample, leaves a large chunk by the wayside.
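The 20 percent cutoff mentioned above can be sketched as a simple per-user filter. This is a minimal illustration, not the study's pipeline: the keyword list, the `is_health_query` classifier, and the function names are hypothetical, and only the 20 percent threshold comes from the article.

```python
HEALTH_SHARE_CUTOFF = 0.20  # the 20 percent figure cited in the article

# Hypothetical keyword list standing in for a real health-query classifier.
HEALTH_KEYWORDS = ("cancer", "symptom", "pain", "diagnosis", "tumor")


def is_health_query(query):
    """Crude stand-in classifier: does the query mention a health keyword?"""
    text = query.lower()
    return any(word in text for word in HEALTH_KEYWORDS)


def health_share(queries):
    """Fraction of a user's queries that look health-related (0.0 if none)."""
    if not queries:
        return 0.0
    return sum(1 for q in queries if is_health_query(q)) / len(queries)


def keep_user(queries):
    """Keep a user unless more than 20 percent of their queries are health-related."""
    return health_share(queries) <= HEALTH_SHARE_CUTOFF
```

A user with one health query in ten would be kept under this rule, while a user with three in ten would be dropped as a likely health-care professional.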

All this leads to an interesting question: how much health insight can we pull from the data we generate online? Research like the Microsoft team’s provides a tantalizing glimpse of an answer, but for now, at least, that answer seems likely to remain just out of reach.

(Read more: New York Times, Microsoft blog, Centers for Disease Control and Prevention, “Software Predicts Tomorrow’s News by Analyzing Today’s and Yesterday’s”)