There is a vast amount of unstructured information about historical events (news articles, government reports, and local bulletins), but extracting this information manually at scale is impossible. Our methodology analyzes news reports in which flooding is a primary subject. We then use the Google Read Aloud user-agent to isolate the primary text of articles across 80 languages, which is standardized into English via the Cloud Translation API.
The most critical step of the extraction process is performed by the Gemini large language model (LLM). We engineered a sophisticated prompt that guides Gemini through a strict analytical verification process:
- Classification: The model distinguishes between reports of actual, ongoing, or past floods and articles that merely discuss future warnings, policy meetings, or general risk modeling.
- Temporal reasoning: Gemini anchors relative references (e.g., “last Tuesday”) against an article’s publication date to determine precise event timing.
- Spatial precision: The system identifies granular locations (neighborhoods and streets) and maps them to standardized spatial polygons using Google Maps Platform.
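To make the temporal reasoning step concrete, here is a minimal Python sketch of anchoring a relative reference to a publication date. It is illustrative only: in Groundsource this reasoning is performed by Gemini through the prompt, not by rules like these. The function name `resolve_relative_date` and the small set of supported phrases are assumptions made for this example.

```python
from datetime import date, timedelta
from typing import Optional

# Weekday names indexed to match date.weekday() (Monday == 0).
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def resolve_relative_date(phrase: str, published: date) -> Optional[date]:
    """Hypothetical helper: map a relative phrase to a calendar date.

    Handles only "today", "yesterday", and "last <weekday>".
    Returns None for anything else rather than guessing.
    """
    text = phrase.strip().lower()
    if text == "today":
        return published
    if text == "yesterday":
        return published - timedelta(days=1)
    if text.startswith("last "):
        name = text[len("last "):]
        if name in WEEKDAYS:
            target = WEEKDAYS.index(name)
            # Most recent matching weekday strictly before the publication
            # date: a same-weekday match steps back a full week.
            days_back = (published.weekday() - target) % 7 or 7
            return published - timedelta(days=days_back)
    return None


# An article published on Friday, 15 March 2024 that mentions "last Tuesday".
event_day = resolve_relative_date("last Tuesday", date(2024, 3, 15))
```

Here "last Tuesday" is read as the most recent Tuesday strictly before the publication date. Real articles use the phrase less consistently, which is one reason a language model is better suited to this step than fixed rules.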
The technical validation of Groundsource confirms its reliability for high-stakes research. In manual evaluations, we found that 60% of extracted events were accurate in both location and timing. Crucially, 82% were accurate enough to be practically useful for real-world analysis, for example by capturing the correct administrative district or pinpointing the event within a single day of its reported peak.
The coverage provided by Groundsource represents a massive-scale expansion over existing archives. By transforming unstructured media into data, we have generated 2.6 million events, a significant increase compared to the records found in traditional monitoring systems. Furthermore, spatiotemporal matching shows that Groundsource captured between 85% and 100% of the severe flood events recorded by GDACS between 2020 and 2026, a sign of its effectiveness in identifying high-impact disasters alongside smaller, localized events.
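As a rough illustration of what spatiotemporal matching against a reference catalog can look like, the Python sketch below counts a reference event as captured when some extracted event falls within a distance and time window of it. The matching criteria used for the GDACS comparison are not described here, so the `FloodEvent` type, the 50 km and 3 day thresholds, and the point-based distance are all assumptions chosen for this example.

```python
from dataclasses import dataclass
from datetime import date
from math import asin, cos, radians, sin, sqrt
from typing import Sequence


@dataclass(frozen=True)
class FloodEvent:
    """Hypothetical minimal event record: a point location and a day."""
    lat: float
    lon: float
    day: date


def haversine_km(a: FloodEvent, b: FloodEvent) -> float:
    """Great-circle distance in kilometres between two events."""
    lat1, lon1, lat2, lon2 = map(radians, (a.lat, a.lon, b.lat, b.lon))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(h))


def is_match(ref: FloodEvent, cand: FloodEvent,
             max_km: float = 50.0, max_days: int = 3) -> bool:
    """Assumed criterion: close enough in both space and time."""
    close_in_time = abs((ref.day - cand.day).days) <= max_days
    return close_in_time and haversine_km(ref, cand) <= max_km


def capture_rate(reference: Sequence[FloodEvent],
                 extracted: Sequence[FloodEvent]) -> float:
    """Fraction of reference events matched by at least one extracted event."""
    if not reference:
        return 0.0
    hits = sum(any(is_match(r, c) for c in extracted) for r in reference)
    return hits / len(reference)


# Two reference events; only the first has a nearby extracted event.
reference = [FloodEvent(10.0, 20.0, date(2021, 6, 1)),
             FloodEvent(-30.0, 150.0, date(2022, 1, 10))]
extracted = [FloodEvent(10.1, 20.1, date(2021, 6, 2))]
rate = capture_rate(reference, extracted)
```

Because the extracted events are mapped to polygons rather than points, a production comparison would more plausibly test polygon overlap or containment. The point-distance check is used here only to keep the sketch self-contained.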
