How often do policymakers have to decide where to concentrate poverty-alleviation programs based on limited data, for example, where to roll out cash transfer programs that aim to reach the most vulnerable or chronically poor? New methods are emerging for measuring poverty, consumption expenditure, and asset wealth from high-resolution satellite imagery. These methods may be able to provide systematic and objective measures of where such households are more likely to be found when other data are not available.
World Bank researchers Jonathon Hersh and David Newhouse, together with Ryan Engstrom from George Washington University, have been exploring how features derived from high spatial resolution satellite images, such as buildings, roads, and cars, can help predict spatial variation in poverty in Sri Lanka. At the same time, a consortium of researchers at Stanford has been working on the problem using data on the African countries of Nigeria, Tanzania, Uganda, Malawi, and Rwanda. They recently published their results in Science.
How do they do it?
Each group uses a slightly different methodology. For the Sri Lanka poverty prediction, the researchers first use both machine learning (ML) algorithms and manual digitization to identify features (such as the number of cars, the number and size of buildings, the type of farmland, the type of building materials, and road extent and material). They combine these with a series of textural and spectral measurements that detect levels of contrast and image patterns (such as edges and points) within the satellite images. The next step is to use a linear model to regress the poverty rate in each administrative area (on average 2.17 square kilometers) on the features and textural measurements for that area, as well as a series of controls (such as the size of the area). This approach explains about 60 percent of the variation in the estimated share of the population below the 40th percentile of the national income distribution. When the out-of-sample performance of the model is assessed through cross-validation, they obtain an error rate of 33 percent for the 40th percentile poverty measure.
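To make the regression and cross-validation step concrete, here is a minimal sketch in Python. It is not the researchers' code: the data are synthetic, the four feature columns and their weights are invented for illustration, and mean absolute error is used as the out-of-sample metric only as an assumption, since the exact definition of the reported error rate is not given here.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y ~ [1, X] (an intercept column is added here)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

def r_squared(y, y_hat):
    """Share of the variance in y explained by the predictions."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def kfold_mae(X, y, k=5, seed=0):
    """Mean absolute out-of-sample error, averaged over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coef = fit_ols(X[train], y[train])
        errors.append(np.mean(np.abs(y[test] - predict_ols(coef, X[test]))))
    return float(np.mean(errors))

# Synthetic stand-in data (assumption): one row per administrative area,
# columns are hypothetical standardized features such as building density,
# road extent, car counts, and a texture measure.
rng = np.random.default_rng(42)
n_areas = 300
X = rng.normal(size=(n_areas, 4))
illustrative_weights = np.array([-0.08, -0.05, 0.03, 0.02])  # invented values
noise = rng.normal(scale=0.05, size=n_areas)
poverty_rate = np.clip(0.4 + X @ illustrative_weights + noise, 0.0, 1.0)

coef = fit_ols(X, poverty_rate)
in_sample_r2 = r_squared(poverty_rate, predict_ols(coef, X))
cv_mae = kfold_mae(X, poverty_rate)
print(f"in-sample R^2: {in_sample_r2:.2f}, 5-fold CV mean absolute error: {cv_mae:.3f}")
```

The in-sample R^2 plays the role of the "share of variation explained" figure, while the cross-validated error is the more honest guide to how the model would perform in areas without survey data.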
Researchers at Stanford first trained an ML algorithm popular for image recognition tasks (a convolutional neural network, or CNN) on a publicly available database of images (ImageNet) in which the images already have labels. This helps the model learn to classify images correctly based on low-level image features such as edges and corners. They then apply this trained CNN model to a new problem: predicting nighttime light intensity from daytime satellite imagery. In this way the model learns which features detected in daytime satellite imagery are more likely to be predictive of poverty levels, relying on the finding that nightlights are positively, albeit noisily, correlated with economic activity. The last step builds ridge regression models (an approach that helps prevent overfitting) to estimate average expenditure and wealth levels within administrative areas (covering 10-27 households) using the features identified in the second step. These ridge regressions can explain up to 56 percent (in the Tanzania model) of the variation in average household consumption and 75 percent (in the Rwanda model) of the variation in average household asset levels. For predicting the share of the population below the 40th percentile, the researchers show that their multi-stage approach outperforms models that rely only on variation in nightlight intensity. To examine the potential of the model to predict poverty out-of-sample, they use a pooled model trained on data from all of the countries and examine how well it predicts consumption and asset levels in each country individually. They find that this pooled model performs almost as well as each country-specific model, suggesting that the model may be informative where ground truth data on consumption and assets are not available.
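The final ridge regression step, and the pooled versus country-specific comparison, can be sketched as follows. This is an illustration under stated assumptions rather than the Stanford team's implementation: the CNN stage is skipped entirely, random vectors stand in for the image features, the two countries are hypothetical, the penalty value is arbitrary, and a simple half-and-half holdout split is assumed for evaluation.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Closed-form ridge regression; the intercept is left unpenalized by centering."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return beta, y_mean - x_mean @ beta

def predict(model, X):
    beta, intercept = model
    return X @ beta + intercept

def r_squared(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

# Synthetic stand-in data (assumption): each row is one survey cluster, each
# column one image feature. Both hypothetical countries share the same
# underlying relationship between features and (log) consumption.
rng = np.random.default_rng(0)
n_features = 20
shared_w = rng.normal(scale=0.3, size=n_features)

def make_country(n_clusters):
    X = rng.normal(size=(n_clusters, n_features))
    y = X @ shared_w + rng.normal(scale=0.5, size=n_clusters)
    return X, y

countries = {name: make_country(120) for name in ("country_a", "country_b")}
train = {c: (X[:60], y[:60]) for c, (X, y) in countries.items()}
test = {c: (X[60:], y[60:]) for c, (X, y) in countries.items()}

lam = 1.0  # arbitrary penalty; in practice it would be tuned by cross-validation
pooled_X = np.vstack([train[c][0] for c in train])
pooled_y = np.concatenate([train[c][1] for c in train])
pooled_model = fit_ridge(pooled_X, pooled_y, lam)

results = {}
for c in countries:
    own_model = fit_ridge(*train[c], lam)
    X_te, y_te = test[c]
    results[c] = {
        "country_specific_r2": r_squared(y_te, predict(own_model, X_te)),
        "pooled_r2": r_squared(y_te, predict(pooled_model, X_te)),
    }
    print(c, {k: round(v, 2) for k, v in results[c].items()})
```

The penalty term `lam` shrinks the coefficients toward zero, which is what limits overfitting when there are many image features and relatively few surveyed clusters. In this toy setup the pooled model does well by construction, because both synthetic countries share the same relationship; whether that holds for real countries is the empirical question the researchers examine.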
Figure 1: World Bank/George Washington University approach versus Stanford approach to predicting poverty
Both approaches can predict poverty from space at a finer geographic resolution (e.g., villages) than most national expenditure surveys can (e.g., districts or counties). In the Sri Lanka analysis, the researchers used a supervised approach that focuses on features and spectral measurements they expect ex ante to be more predictive, while with the Africa data, the researchers use an unsupervised approach. However, both approaches find that information such as the density of buildings, building materials, roads, type of agricultural area, and bodies of water is most important for predicting poverty. On the surface, this may not seem particularly revelatory; most of us already know that poverty is concentrated in remote areas with lower road access, smaller and more rudimentary agricultural plots, and household dwellings made with basic materials. However, if we can predict poverty from publicly available satellite imagery, then when survey data do not exist or are incomplete, there are other ways to make objective decisions about where to focus poverty alleviation programs.
In the future, it would also be interesting to explore whether satellite imagery can help us understand the dynamic nature of poverty. Would it be able to accurately detect seasonal changes in vulnerability? For example, can we extract information from satellite imagery that is correlated with changing agricultural yields, food prices, or livestock health? Could we also detect sensitive information about when households or communities move out of extreme poverty and could graduate from cash transfer programs?