
Alternative Data Mining: From 19th‑Century Cholera Mapping to Modern AI‑Driven Risk Modeling

This talk reviews the concept of alternative data, illustrates its early use in John Snow's cholera map, explores contemporary AI‑powered systems such as IBM's Debater and satellite‑based poverty estimation, and presents the speaker's own research on using unconventional data for financial‑market risk detection and prediction.

DataFunTalk

The presentation begins by defining "alternative data" as niche, under‑exploited datasets and outlines the agenda: historical examples, cutting‑edge engineering advances, and the speaker's own research on risk modeling.

1. Historical example – John Snow’s cholera map (19th century): During the 1854 cholera outbreak in London, Snow surveyed households, plotted the cases on a street map, and traced the outbreak to a contaminated public water pump. He then persuaded the local authorities to remove the pump's handle, an early demonstration of data‑driven epidemiology.

2. Modern AI applications – IBM Debater: The Debater system, a decade‑long, multinational effort, combines deep learning, natural‑language processing, and data mining to generate arguments from news articles and historical debate transcripts, showcasing AI’s ability to construct arguments and debate human opponents.

3. Satellite imagery for poverty estimation (Science, 2016): Researchers combined publicly available night‑time light intensity data with high‑resolution satellite images to derive features (e.g., building density) and predict poverty indicators in African countries, working around the lack of reliable ground‑truth socioeconomic data.

4. Risk modeling with unconventional data: The speaker’s group uses alternative data to monitor sudden risk events (terrorist attacks, natural disasters, pandemics) and to predict how secondary markets react to them. They build event‑driven market models from historical incident databases and real‑time news extraction, and use night‑light data to weight each event by the economic development of the affected area. With decision‑tree classifiers, they report roughly 70% accuracy for event‑impact prediction.

5. Political‑tweet analysis for market forecasting: The team extracts entities from high‑profile officials' tweets (e.g., former U.S. President Trump), links them to knowledge bases, and applies sentiment analysis and causal reasoning, with the aim of forecasting market movements and generating early risk warnings.
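To make the decision‑tree step in item 4 more concrete, here is a minimal sketch of a depth‑limited classification tree in plain Python. It is not the speaker's implementation: the two features (`event_severity` and `night_light_index`), the labels, and every number in `TRAIN` are invented purely for illustration, and the talk does not say which features or splitting criterion were used. The sketch assumes Gini impurity as the criterion.

```python
from collections import Counter

# Hypothetical toy rows: (event_severity, night_light_index) -> label.
# Label 1 = "market fell after the event", 0 = "no notable move".
# All values are invented for illustration; none come from the talk.
TRAIN = [
    ((0.9, 0.8), 1),
    ((0.8, 0.9), 1),
    ((0.7, 0.7), 1),
    ((0.9, 0.2), 0),
    ((0.2, 0.9), 0),
    ((0.1, 0.1), 0),
    ((0.3, 0.4), 0),
    ((0.6, 0.8), 1),
]


def gini(rows):
    """Gini impurity of the labels in rows (0.0 means a pure node)."""
    n = len(rows)
    if n == 0:
        return 0.0
    counts = Counter(label for _, label in rows)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def partition(rows, feature, threshold):
    """Split rows into (feature <= threshold, feature > threshold)."""
    left = [r for r in rows if r[0][feature] <= threshold]
    right = [r for r in rows if r[0][feature] > threshold]
    return left, right


def best_split(rows):
    """Return the (feature, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(rows)
    for feature in range(len(rows[0][0])):
        for threshold in sorted({x[feature] for x, _ in rows}):
            left, right = partition(rows, feature, threshold)
            if not left or not right:
                continue  # a split must send rows to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (feature, threshold), score
    return best


def build_tree(rows, depth=0, max_depth=3):
    """Grow the tree recursively; a leaf is the majority label of its rows."""
    split = best_split(rows) if depth < max_depth else None
    if split is None:
        return Counter(label for _, label in rows).most_common(1)[0][0]
    feature, threshold = split
    left, right = partition(rows, feature, threshold)
    return (
        feature,
        threshold,
        build_tree(left, depth + 1, max_depth),
        build_tree(right, depth + 1, max_depth),
    )


def predict(tree, x):
    """Walk from the root to a leaf and return that leaf's label."""
    while isinstance(tree, tuple):
        feature, threshold, left, right = tree
        tree = left if x[feature] <= threshold else right
    return tree


tree = build_tree(TRAIN)
```

Calling `predict(tree, (0.85, 0.85))` classifies a hypothetical severe event in a brightly lit (more developed) area. In practice a library implementation such as scikit-learn's `DecisionTreeClassifier` would replace the hand-rolled functions, and accuracy would be measured on held-out events rather than on the training rows.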

The talk concludes with a brief recap, encouraging further exploration of alternative data to uncover hidden “water pumps” that can drive societal and technological progress.
