I’ve been interested in big data ever since I first read about it, but never really understood how it worked in practice or considered how it might be used in the humanitarian world. So when I was asked to be an advisor to a new project at Georgetown University exploring the use of big data to forecast forced migration, I jumped at the opportunity. The participants in this NSF-funded planning project are an impressive and diverse group, including experts in immunology, data retrieval, law, anthropology, risk management, and, of course, displacement. While this was my first meeting with the group, it was clear that they had established ways of not only communicating but also working together across disciplinary boundaries.
I must say that at one level, it was a frustrating meeting. While I feel fairly confident of my expertise on humanitarian issues, particularly displacement, there were times in the meeting when I felt like I was listening to a discussion in Aymara or Finnish. I could understand some of the words – like scenario planning, algorithms and corpus. Others I could guess at: data scraping, ground truth documents, localized data. Still other terms were far beyond anything I had ever imagined: non-negative matrix factorization, latent Dirichlet allocation.
The project seeks to do three things: to create an interdisciplinary community of computational and social scientists, to foster capacity for analyzing early warning indicators, and to use big data to test the feasibility of creating an early warning system to detect forced migration in the context of humanitarian crises.
As Jeff Crisp, another Subject Matter Expert (SME), observed, “Early warning has been the holy grail of the humanitarian community for decades.” How can you predict how and when and where people will move when a conflict or disaster or famine occurs? There are other early warning systems for conflict and famine (though they all have their shortcomings), but none that try to predict displacement. Susan Martin, one of the principal investigators in the project, explained that the intention was to identify the proximate indicators associated with displacement: what are the factors that occur before – sometimes right before – large-scale displacement?
This project is based on analysis of the unique Raptor dataset, held at Georgetown, which presently consists of over 600 million pieces of information, compiled through daily scraping of more than 20,000 newspaper sources in more than 45 languages, including local, national, regional and international sources (though curiously it originally did not include US-based sources). That’s a lot of data. The computational scientists are figuring out not only how to retrieve useful data from all those newspaper articles, but also how to identify what is important amid all the noisy data. We talked a lot about the reliability of the data and of the particular news sources being scraped. One person emphasized the importance of having credible sources, but Martin explained that in the context of understanding forced migration, sometimes widely circulated false information can be more of a trigger for migration than accurate information. Perceptions may matter more than the truth, an observation reflected in the project’s use of the Dread Threat concept. It’s not just an objective, observable phenomenon that constitutes a threat; the perception of the threat can itself act as a driver of migration. And Dread Threats vary between individuals. For example, I might experience a Dread Threat when a rampaging militia is 100 miles away, while my more laid-back sister doesn’t feel one until the militia is just up the street.
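To make the “finding signal amid noisy data” problem concrete, here is a toy sketch of my own (not the project’s actual pipeline, which uses far more sophisticated techniques such as topic models): a simple keyword-based relevance score that separates displacement-related headlines from the rest of a scraped news feed. The term list and the example headlines are invented for illustration.

```python
# Toy illustration only: score scraped news text for displacement relevance.
# The real Raptor pipeline uses techniques like non-negative matrix
# factorization and latent Dirichlet allocation; this keyword overlap
# merely shows the shape of the filtering problem.

DISPLACEMENT_TERMS = {"flee", "fled", "refugees", "displaced", "evacuate", "militia"}

def relevance_score(article: str) -> float:
    """Return the fraction of displacement-related terms found in the text."""
    words = set(article.lower().split())
    return len(words & DISPLACEMENT_TERMS) / len(DISPLACEMENT_TERMS)

articles = [
    "Thousands fled the region as militia violence spread",
    "Local team wins the championship final",
]
for a in articles:
    print(round(relevance_score(a), 2), a)
```

Even this crude score ranks the displacement story above the sports story; the hard part, which the computational scientists are tackling, is doing something like this reliably across tens of thousands of sources in dozens of languages.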
We talked about platforms and data transformation. We looked at causal loop diagrams and reviewed mini-sims. There was so much enthusiasm in the room that it was hard to admit that I was having a hard time keeping up. But beyond the new-to-me terminology, the diagrams made sense. How do people respond when they are exposed to violence or other threats? The model is based on several alternative response strategies: people exploit current opportunities, they pursue alternative livelihoods, they organize self-defense, or they move. And at different points in time, in response to different perceptions of risk, the mixture of strategies they use changes. This makes intuitive sense, which in turn leads to other questions: what influences the particular “blend” of strategies a community uses, and how do you incorporate the complex array of factors related to local perceptions of threat and to a family’s threat-management strategy into a model that can explain forced migration?
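As I understand it, the idea of a shifting “blend” of strategies can be sketched in a few lines of code. This is a toy of my own, not the project’s model: the weight functions are made up, and the single `threat` parameter stands in for the whole complex array of local perceptions the researchers are actually trying to capture.

```python
# Toy model (my own invention, not the project's): how a household's mix of
# response strategies might shift as perceived threat rises from 0 to 1.

def strategy_mix(threat: float) -> dict:
    """Return a normalized blend of four response strategies.

    The linear/quadratic weight functions below are arbitrary assumptions,
    chosen only so that staying put dominates at low threat and moving
    dominates at high threat.
    """
    raw = {
        "exploit_current_opportunities": max(0.0, 1.0 - threat),
        "pursue_alternative_livelihoods": max(0.0, 0.8 - 0.5 * threat),
        "organize_self_defense": 0.3 + 0.4 * threat,
        "move": threat ** 2,  # flight dominates only at high perceived threat
    }
    total = sum(raw.values())
    return {k: v / total for k, v in raw.items()}

for t in (0.1, 0.5, 0.9):
    mix = strategy_mix(t)
    print(t, max(mix, key=mix.get))
```

Run it and the dominant strategy moves from exploiting current opportunities at low threat to flight at high threat, which is exactly the intuition the causal loop diagrams were formalizing; the open research question is what the real weight functions look like.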
I wonder if big data can provide a better way not only of understanding how and why people decide to flee their homes, but also of forecasting when and where they will move. If we ever get to that point, it will raise a whole different set of issues that will need discussion, but for now I’m looking forward to participating in future meetings of the group and watching how this attempt to apply big data to life-and-death issues of displacement turns out.