Culturomics 2.0 Aims to Predict Future Events
By analyzing tens of millions of news stories, a supercomputer in Tennessee may be able to predict future human events.
Last week, shortly after Idea Lobby blogger Emily Badger wrote about “a new R&D project to test tools that would mine publicly available data to predict political and humanitarian crises, disease outbreaks, mass violence and instability,” a professor at the University of Illinois published his findings on how a computational analysis of millions of news stories could have predicted the Arab Spring.
Kalev H. Leetaru, writing in the online journal First Monday, showed how data mining in the worldwide news archive could “have forecasted the revolutions in Tunisia, Egypt, and Libya, including the removal of Egyptian President Mubarak, predicted the stability of Saudi Arabia (at least through May 2011), estimated Osama Bin Laden’s likely hiding place as a 200-kilometer radius in Northern Pakistan that includes Abbotabad, and offered a new look at the world’s cultural affiliations.”
The forecasts came after the fact, but Leetaru’s work suggests a proof of concept for the Open Source Indicators Program, a project sponsored by the U.S. Office of the Director of National Intelligence.
As Badger wrote, the project “is premised on the idea that big events are preceded by population-level changes, and that those population-level changes should be identifiable if we just look in the right places.” To Leetaru, the “right place” is within news stories, both print and online.
“While heavily biased and far from complete, the news media captures the only cross-national real-time record of human society available to researchers,” he wrote. He observed that news stories convey much more than factual information: “an array of cultural and contextual influences strongly impact how events are framed for an outlet’s audience, offering a window into national consciousness.”
The federal project expects to spend more time observing Internet activity. In his paper, Leetaru discusses the benefits that would come from examining social media and search engine trends, “but the technical and linguistic complexities, especially the need to operate on large numbers of vernacular languages across the world, made it beyond the scope of this study.”
Leetaru gathered data from a range of sources, including online news archives and organizations that monitor local media. A supercomputer at the University of Tennessee analyzed more than 100 million articles, and its computations suggested (albeit in hindsight) that a similar effort may be able to predict events such as revolutions in Libya or Egypt.
If this draws comparisons with the hot new pursuit of “culturomics,” in which an historical issue is analyzed based on mentions in a giant body of printed material, that’s no accident. Leetaru dubs his work “culturomics 2.0.”
He concludes, “The findings of this study suggest that Culturomics, which has thus far focused on the digested history of books, can yield intriguing new understandings of human society when applied to the real-time data of news.”