Google Flu Trends Proves Inaccurate
The Google Flu Trends tracker, a source of pride for "big data" aficionados, has proven something of a failure because of its persistent inaccuracy, according to a recent study.
Google's Flu Trends tracker is a data analysis system that interprets the rate of flu-related searches to estimate how prevalent flu is in a region before health officials, such as the Centers for Disease Control and Prevention (CDC), can report it. Launched in 2008, it tracks search engine queries by location and flags those that relate to flu symptoms. Using complex algorithms, the tracker converts this data into an estimated flu infection rate and then predicts the number of additional infections per region.
However, findings recently published in Science show that for the past three years Google Flu Trends (GFT) has been remarkably inaccurate. After an article in the scientific journal Nature noted that last year GFT was estimating roughly double the flu rates the CDC was reporting, researchers from Northeastern University and Harvard University set out to measure the accuracy of GFT's predictions since its launch in 2008.
What they found was striking. Not only did GFT miss the mark considerably in 2012, but it overestimated flu infection rates in 100 of the 108 weeks since August 2011. Worse, the system entirely missed the 2009 H1N1 outbreak, even though the term "swine flu" filled headlines for months. At the time, CDC reports were slow to come in because the outbreak occurred out of season, but GFT, as an algorithmic system, should not have been nearly as thrown off by a "surprise" outbreak; computers don't surprise easily, or so researchers had assumed.
So what caused the inaccuracies? The study indicated that GFT has grown less accurate partly because Google's own business model encourages users to run several searches on the same subject in one sitting, and so see more ads. Unfortunately for GFT, this inflates the apparent rate of flu-related searches in a way the model did not account for, contributing to its overestimates of flu infection rates.
Keyword misinterpretation has also played a part. According to the study, even searches about unrelated topics such as college basketball sometimes contained the right combination of terms, drawn from a pool of more than 50 million candidate search terms, to trick GFT into counting them as flu-related. Because college basketball season coincides with flu season in the U.S., the interference went unnoticed.
What does this mean for GFT? A few lessons learned and a "better luck next year" are probably all Google's programmers can expect. The researchers maintain that if you want to follow annual flu trends, the weekly CDC reports are still your best bet, even if they contain some human error. Computer error, it seems, can be much worse.
The study was published in Science on March 14.
Mar 16, 2014 12:25 AM EDT