Position: Chancellor's Associate Professor, School of Information and Public Policy
Director, Data-Intensive Development Lab
Faculty Co-Director, Center for Effective Global Action
University of California, Berkeley
My work focuses on using novel data and methods to try to improve the lives of disadvantaged people around the world. You can find a short professional bio here.
Aiken, E, Bellue, S, Karlan, D, Udry, C, and Blumenstock, JE (2021). Machine Learning and Mobile Phone Data Can Improve the Targeting of Humanitarian Assistance, Revise and Resubmit, Nature
Targeting is a central challenge in the administration of anti-poverty programs: given available data, how does one rapidly identify the individuals and families with the greatest need? Here we show that non-traditional “big” data from satellites and mobile phone networks can improve the targeting of anti-poverty programs. Our analysis compares outcomes – including exclusion errors, total social welfare, and measures of fairness – under different targeting regimes. Relative to the geographic targeting options considered by the Government of Togo at the time, the machine learning approach reduces errors of exclusion by 4-21%. These results highlight the potential for new data sources to contribute to humanitarian response efforts, particularly in crisis settings when traditional data are missing or out of date.
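As an illustration of the exclusion-error metric mentioned above, here is a minimal sketch. It assumes the common definition (the share of truly poor households that a targeting rule fails to select) and a simple "target the k lowest predicted-wealth households" rule; the function names and toy data are hypothetical and are not taken from the paper.

```python
def target_lowest(predicted_wealth, k):
    """Return a boolean list marking the k households with the lowest predicted wealth."""
    order = sorted(range(len(predicted_wealth)), key=lambda i: predicted_wealth[i])
    chosen = set(order[:k])
    return [i in chosen for i in range(len(predicted_wealth))]


def exclusion_error(true_poor, targeted):
    """Share of truly poor households that were NOT targeted (assumed definition)."""
    poor = [i for i, is_poor in enumerate(true_poor) if is_poor]
    if not poor:
        raise ValueError("no poor households in the sample")
    missed = sum(1 for i in poor if not targeted[i])
    return missed / len(poor)


# Toy data (hypothetical): ground-truth poverty status and model-predicted wealth.
true_poor = [True, True, False, False, True]
predicted_wealth = [1.0, 5.0, 2.0, 9.0, 3.0]

targeted = target_lowest(predicted_wealth, k=3)
error = exclusion_error(true_poor, targeted)
```

Comparing this quantity across targeting rules (for example, geographic versus model-based) is one way the regimes described in the abstract could be contrasted.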
Chi, G, Fang, S, and Blumenstock, JE (2021). Micro-Estimates of Wealth and Poverty for all Low- and Middle-Income Countries, Accepted, Proceedings of the National Academy of Sciences
Many critical policy decisions, from strategic investments to the allocation of humanitarian aid, rely on data about the geographic distribution of wealth and poverty. Yet many poverty maps are out of date or exist only at very coarse levels of granularity. Here we develop the first micro-estimates of wealth and poverty that cover the populated surface of all 135 low- and middle-income countries (LMICs) at 2.4km resolution. The estimates are built by applying machine learning algorithms to vast and heterogeneous data from satellites, mobile phone networks, and topographic maps, as well as aggregated and de-identified connectivity data from Facebook. We train and calibrate the estimates using nationally-representative household survey data from 56 LMICs, then validate their accuracy using four independent sources of household survey data from 18 countries. We also provide confidence intervals for each micro-estimate to facilitate responsible downstream use...
Blumenstock, JE, Chi, G, and Tan, X (2021). Migration and the Value of Social Networks, Conditionally Accepted, Review of Economic Studies [pdf]
How do social networks influence the decision to migrate? Prior work suggests two distinct mechanisms that have historically been difficult to differentiate: as a conduit of information, and as a source of social and economic support. We disentangle these mechanisms using a massive 'digital trace' dataset that allows us to observe the migration decisions made by millions of individuals over several years, as well as the complete social network of each person in the months before and after migration. These data allow us to establish a new set of stylized facts about the relationship between social networks and migration. Our main analysis indicates that the average migrant derives more social capital from 'interconnected' networks that provide social support than from 'extensive' networks that efficiently transmit information.
Rolf, Simchowitz, Dean, Liu, Björkegren, Hardt, and Blumenstock (2020). Score-Based Classifiers for Welfare-Aware Machine Learning, International Conference on Machine Learning (ICML '20) [pdf]
While real-world decisions involve many competing objectives, algorithmic decisions are often evaluated with a single objective function. We study algorithmic policies which explicitly trade off between a private objective (such as profit) and a public objective (such as social welfare). We analyze a natural class of policies which trace an empirical Pareto frontier based on learned scores, and focus on how such decisions can be made in noisy or data-limited regimes. Our theoretical results characterize the optimal strategies in this class, bound the Pareto errors due to inaccuracies in the scores, and show an equivalence between optimal strategies and a rich class of fairness-constrained profit-maximizing policies.
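The idea of tracing an empirical trade-off curve from learned scores can be sketched as follows. This is a simplified illustration, not the paper's implementation: it assumes each individual has a predicted profit score and a predicted welfare score, and that a policy selects an individual when a weighted sum of the two scores is positive. The weight `alpha`, the function names, and the toy scores are all hypothetical.

```python
def weighted_threshold_policy(profit_scores, welfare_scores, alpha):
    """Select unit i when (1 - alpha) * profit_i + alpha * welfare_i > 0."""
    return [
        (1 - alpha) * p + alpha * w > 0
        for p, w in zip(profit_scores, welfare_scores)
    ]


def frontier(profit_scores, welfare_scores, alphas):
    """For each alpha, return (alpha, total profit, total welfare) of the selected units."""
    points = []
    for alpha in alphas:
        selected = weighted_threshold_policy(profit_scores, welfare_scores, alpha)
        total_profit = sum(p for p, s in zip(profit_scores, selected) if s)
        total_welfare = sum(w for w, s in zip(welfare_scores, selected) if s)
        points.append((alpha, total_profit, total_welfare))
    return points


# Toy learned scores (hypothetical values).
profit = [2.0, -1.0, 1.0, -3.0]
welfare = [-1.0, 3.0, 1.0, 1.0]

curve = frontier(profit, welfare, [0.0, 0.5, 1.0])
```

Sweeping `alpha` from 0 (profit only) to 1 (welfare only) moves along the trade-off: in the toy data, total profit falls while total welfare rises as `alpha` increases.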
Blumenstock, JE, Callen, M, and Ghani, T (2018). Why Do Defaults Affect Behavior? Experimental Evidence from Afghanistan, American Economic Review, 108 (10), 2868-2901 [pdf]
We report on an experiment examining why default options impact behavior. By randomly assigning employees to different varieties of a salary-linked savings account, we find that default enrollment increases participation by 40 percentage points -- an effect equivalent to providing a 50% matching incentive. We then use a series of experimental interventions to differentiate between explanations for the default effect, which we conclude is driven largely by present-biased preferences and the cognitive cost of thinking through different savings scenarios. Default assignment also changes employees' saving habits, and makes them more likely to actively decide to save after the study concludes.
Blumenstock, JE (2018). Don't forget people in the use of big data for development, Nature, 561 (7722), 170-172 [pdf]
Aid organizations, researchers and private companies are looking for ways to leverage the 'data revolution' to transform international development. In the rush to find technological solutions to complex global problems, however, there's a danger that we get distracted by the technology and lose track of the deeper issues that are unique to each local context... The CEO of a popular big-data platform recently described data science as "a blend of Red-Bull-fueled hacking and espresso-inspired statistics." In my view, the successful use of big data in development will require a data science that is considerably more humble than this version that has captured the popular imagination.
Blumenstock, JE, Cadamuro, G, On, R (2015). Predicting Poverty and Wealth from Mobile Phone Metadata, Science, 350(6264), 1073-1076 [pdf]
Accurate and timely estimates of population characteristics are a critical input to social and economic research and policy. We show that an individual's past history of phone use can be used to infer his or her socioeconomic status, and that the predicted attributes of millions of individuals can in turn be used to accurately reconstruct the distribution of wealth of an entire nation, or to infer the asset distribution of micro-regions comprised of just a few households. In resource-constrained environments where censuses and household surveys are rare, this creates an option for gathering localized and timely information at a fraction of the cost of traditional methods.
Working Papers / Active Projects
Violence and Financial Decisions: Evidence from Mobile Money in Afghanistan - joint with Michael Callen, Tarek Ghani, and Robert Gonzalez (Accepted, Review of Economics and Statistics)
We provide evidence that violence changes the financial decisions people make. Exploiting the quasi-random timing of several thousand violent incidents in Afghanistan, we show that individuals who are exposed to violence retain more cash and are less likely to adopt and use mobile money, a new financial technology. This effect is corroborated using three independent sources of data: (i) the universe of mobile money transactions in Afghanistan; (ii) high-frequency data from a randomized experiment designed to increase mobile money adoption; and (iii) a behavioral lab-in-the-field experiment with experienced mobile money users. Collectively, the evidence highlights an economic cost of violence that operates through individual beliefs, which is large enough to impede the development of formal financial systems in conflict settings.
Many decisions that once were made by humans are now made using algorithms. These algorithms are typically designed with a single, profit-related objective in mind: Loan approval algorithms are designed to maximize profit, smart phone apps are optimized for engagement, and news feeds are optimized for clicks. However, these decisions have side effects: irresponsible payday loans, addictive apps, and fake news can harm individuals and society. This project develops and tests a new paradigm for prioritizing the social impact of an algorithmic decision from the start, rather than as an afterthought. The key insight is to leverage recent advances in machine learning -- which make it possible to predict who will benefit from a decision and how -- to design algorithms that balance those predicted benefits alongside traditional profit-related objectives.
Manipulation-Proof Machine Learning - joint with Daniel Björkegren and Samsun Knight
An increasing number of decisions are guided by machine learning algorithms. In many settings, from consumer credit to criminal justice, those decisions are made by applying an estimator to data on an individual's observed behavior. But when consequential decisions are encoded in rules, individuals may strategically alter their behavior to achieve desired outcomes. This paper develops a new class of estimator that is stable under manipulation, even when the decision rule is fully transparent. We explicitly model the costs of manipulating different behaviors, and identify decision rules that are stable in equilibrium. Through a large field experiment in Kenya, we show that decision rules estimated with our strategy-robust method outperform those based on standard supervised learning approaches.
How Do Firms Respond to Insecurity? Evidence from Afghan Phone Records - joint with Tarek Ghani, Sylvan Herskowitz, Ethan B. Kapstein, Thomas Scherer, and Ott Toomet
We provide new evidence on how insecurity affects firm behavior by linking data on violent conflict in Afghanistan to geo-stamped corporate mobile phone records. We begin by developing a method for observing firm location choice with phone data, and validate these measurements using independent sources of administrative and survey data. Next, we show that deadly terrorist attacks reduce the presence of firms in targeted districts by 4-6%. The effect includes both an increase in the local exit of existing firms following attacks and a decrease in new firm entry. We find large negative spillovers from attacks in provincial capitals on firm presence in nearby rural districts. After violence, employees in provincial capitals are 33% more likely to move to Kabul and 15% more likely to leave for another province.
Scalable Methods for Discovering Latent Structure in Societal-Scale Data - joint with Sham Kakade
The proliferation of digital devices has created an unparalleled opportunity to observe, model, and understand the changing structure of social networks in developing and conflict-affected states. However, current state-of-the-art computational methods used to analyse such data are notoriously ill-suited to answer basic, fundamental questions in the social science and policy arena. While many new, provably efficient algorithms for community detection have been recently developed, these methods have several key limitations: they rarely scale to real-world datasets consisting of millions of interconnected actors; they are not applicable to dynamic contexts where network structure evolves over time; and they are almost never validated. This project adapts recent algorithmic advances in theoretical computer science to build scalable tools capable of reliably discovering hidden structure in societal-scale network data.
(Machine) Learning what Governments Value - joint with Daniel Björkegren and Samsun Knight
This paper develops a method to uncover the values consistent with observed allocation decisions. We use machine learning estimators for heterogeneous treatment effects to identify who benefits from an allocation. We then decompose the objective underlying the allocation into: (i) differential treatment effects; (ii) welfare weights between entities; and (iii) impact weights across outcomes. We apply this approach to Mexico's PROGRESA anti-poverty program and estimate the preferences consistent with its design. We find evidence of heterogeneous impacts by income and age; accounting for this heterogeneity, allocations imply higher welfare weights on the indigenous, the poor, and families with more children. The implied value of each missed school day and child sick day is estimated imprecisely but does not rule out conventional valuations or preferences reported by Mexican residents. Alternate eligibility criteria could have improved either average consumption, health or schooling outcomes.
Program Targeting with Machine Learning and Mobile Phone Data: Evidence from an Anti-Poverty Intervention in Afghanistan - joint with Emily Aiken, Guadalupe Bedoya, Aidan Coville
Can mobile phone data improve program targeting? By combining rich survey data from a “big push” anti-poverty program in Afghanistan with detailed mobile phone logs from program beneficiaries, we study the extent to which machine learning methods can accurately differentiate ultra-poor households eligible for program benefits from ineligible households. We show that supervised learning methods leveraging mobile phone data can identify ultra-poor households nearly as accurately as survey-based measures of consumption and wealth; and that combining survey-based measures with mobile phone data produces classifications more accurate than those based on a single data source.
The Impact of Mobile Phones: Experimental Evidence from the Random Assignment of New Cell Towers - joint with Niall Keleher, Arman Rezaee, Erin Troland
We present experimental evidence on the economic impacts of mobile phone access. Our results are based on a randomized control trial in the Philippines, through which 14 isolated and previously unconnected villages were randomly assigned to either receive or not receive a new cellphone tower. Following a pre-analysis plan, we find that the introduction of mobile phones had large and significant impacts on household income and expenditure, particularly for wage workers. Mobile phone access also increased social connections within and between communities. However, there are no consistent impacts on market access, informedness, or subjective well being. In post-specified analysis, we find suggestive evidence that the improved economic conditions are driven by increases in migration, remittances, and self-employment. Working paper available by request.
How Important are the Yellow Pages? Experimental Evidence from Tanzania - joint with Brian Dillon and Jenny Aker
Mobile phones reduce the cost of communicating with existing social contacts, but do not eliminate frictions in forming new relationships. We report the findings of a two-sided randomized control trial in central Tanzania, centered on the production and distribution of a "yellow pages" phone directory with contact information for local enterprises. Enterprises randomly assigned to be listed in the directory receive more business calls, make more use of mobile money, and employ more workers. There is evidence of positive spillovers, as both listed and unlisted enterprises in treatment villages experience significant increases in sales relative to a pure control group. Households randomly assigned to receive copies of the directory make greater use of their phones for farming, are more likely to rent land and hire labor, have lower rates of crop failure, and sell crops for weakly higher prices. Willingness-to-pay to be listed in future directories is significantly higher for treated enterprises.