International migration is one of the key drivers of demographic change. However, official statistics on “stocks of migrants”, i.e. how many people with origin country X are residing in country Y, are often unreliable. Reasons for this include the free movement of EU nationals within the EU, as well as generally inadequate census and civil registration systems in many developing countries.
Work done by Emilio Zagheni, Krishna Gummadi and me tries to address some of the shortcomings of traditional methods for producing migration
statistics by tapping into a new kind of data: audience estimates provided by
Facebook.
Facebook and other internet giants collect
rich data on their users in order to serve more targeted and more
relevant advertising. The data collected includes
self-declared attributes such as age or gender; metadata such as
the device or internet connection type used to access the service;
third-party information
such as credit card or voter registration data; and attributes such
as topical interests, inferred from behavior such as "liking" posts on
Facebook or visiting websites with social plugins.
See https://www.cision.com/us/2017/07/how-to-improve-social-media-targeting/
for a good list of available targeting options on Facebook, Twitter, LinkedIn
and Snapchat.
The detailed user profiles are generally not
available to researchers outside the companies. However, aggregate and
anonymized data is shared with potential advertisers in the form of audience estimates. Basically,
Facebook and other social networks provide advertisers with information on
"how many users match criteria X". For example, to help with planning
an advertising campaign, an advertiser could inquire "how many monthly
active Facebook users are married, male German expats aged 30-50 living in
Qatar?" Answer: 120 (as of Dec 20, 2017).
This type of real-time digital census over Facebook's user base could potentially be of value to augment existing population estimates, in particular for countries where official statistics are unreliable or outdated. However, due to selection biases and an estimated 13% of duplicate or fake accounts, it is clear that using this data set as a simplistic enumeration tool for the whole population will not give accurate results. See https://www.theguardian.com/technology/2017/sep/07/facebook-claims-it-can-reach-more-people-than-actually-exist-in-uk-us-and-other-countries for more indications of shortcomings of the data.
In our own research, we do not use the raw
advertising audience estimates as the final answer. Rather, we treat them as one
of potentially many input signals for an estimation task of the kind "how
many Germans are living in Qatar today?" As long as the biases in the
underlying data are either (i) uniform, e.g. 13% of duplicate or fake Facebook
accounts for all countries, or (ii) systematic, e.g. Western Europeans are
always less likely to be on Facebook compared to Arab nationals, an
appropriately fitted model can account for and correct such biases.
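To make the idea of correcting a uniform bias concrete, here is a minimal sketch in Python. It is an illustration only, not the model used in our research: the function names are hypothetical, and the numbers are synthetic values invented for the example (a 50% Facebook penetration rate combined with 13% inflation from duplicate or fake accounts). It assumes that reference counts are available for some populations and fits a log-linear relationship between raw audience estimates and those counts.

```python
import numpy as np

def fit_bias_model(fb_estimates, reference_counts):
    """Fit log(reference) = intercept + slope * log(fb) by least squares."""
    x = np.log(np.asarray(fb_estimates, dtype=float))
    y = np.log(np.asarray(reference_counts, dtype=float))
    slope, intercept = np.polyfit(x, y, 1)  # coefficients, highest degree first
    return intercept, slope

def correct(fb_estimates, intercept, slope):
    """Map raw audience estimates to bias-corrected population counts."""
    x = np.log(np.asarray(fb_estimates, dtype=float))
    return np.exp(intercept + slope * x)

# Synthetic training data (invented for illustration, not real statistics):
# assume 50% of migrants are on Facebook and counts are inflated by 13%.
reference = np.array([1_000.0, 5_000.0, 20_000.0, 80_000.0])
fb = reference * 0.5 * 1.13

intercept, slope = fit_bias_model(fb, reference)

# A raw estimate of 5,650 users corresponds to roughly 10,000 people
# under the assumed uniform bias.
corrected = correct([5_650.0], intercept, slope)
```

A systematic bias that differs by origin group could be handled in the same spirit, for example by adding one intercept per group, but that extension is not shown here.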
In our paper “Leveraging Facebook's Advertising Platform to Monitor Stocks of Migrants”,
Emilio Zagheni, Krishna Gummadi and I show the feasibility of this approach to estimate
stocks of migrants across different US states and around the world. Concretely,
we show that it is indeed possible to build models to make out-of-sample
predictions on how many people from a certain origin country are residing in a
particular US state. Similarly, it is possible to predict the percentage of
expats out of the whole population for countries around the globe.
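"Out-of-sample" here means that a prediction is scored on a population that was not used to fit the model. The sketch below shows one generic way to do this, leave-one-out evaluation of a log-linear fit. It is not the evaluation procedure from the paper; the function name is hypothetical and the data is synthetic, generated with a fixed random seed purely for illustration.

```python
import numpy as np

def leave_one_out_errors(fb_estimates, reference_counts):
    """Relative error of each held-out prediction from a log-linear fit."""
    x = np.log(np.asarray(fb_estimates, dtype=float))
    y = np.log(np.asarray(reference_counts, dtype=float))
    errors = []
    for i in range(len(x)):
        keep = np.arange(len(x)) != i  # hold out observation i
        slope, intercept = np.polyfit(x[keep], y[keep], 1)
        predicted = np.exp(intercept + slope * x[i])
        actual = np.exp(y[i])
        errors.append(abs(predicted - actual) / actual)
    return np.array(errors)

# Synthetic data (invented): a uniform bias factor plus multiplicative noise.
rng = np.random.default_rng(0)
reference = np.array([1_000.0, 3_000.0, 8_000.0, 20_000.0, 60_000.0])
noise = rng.lognormal(mean=0.0, sigma=0.1, size=reference.size)
fb = reference * 0.565 * noise

errors = leave_one_out_errors(fb, reference)
print("mean relative out-of-sample error:", errors.mean())
```

Holding out one observation at a time is convenient when only a handful of reference counts exist; with more data, a separate held-out test set would serve the same purpose.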
Potentially, the Facebook audience estimates
could also give estimates for stocks of migrants at the sub-national and even
the sub-city level. To illustrate this, Matheus Araujo, Michael Aupetit,
Yelena Mejova and I created a data visualization of the Facebook data for Doha: http://fb-doha.qcri.org.
As an example, this shows a density map of Nepali expats across
Doha, with the highest density in the Industrial Area.
The tool also shows that Nepali expats in Doha are predominantly male (93%) and
Android users (94%). Contrast this with the same map for Western expats, which shows the highest densities in West Bay and the Pearl.
Western expats are more gender balanced (44% female) and more likely to own
iPhones (56%). A similar visualization
for New York City can be explored at http://fb-nyc.qcri.org. [Usage info for the two
data visualizations: Select several filters on the left to drill down to
smaller populations by nationality, gender or other criteria. Click a selection
again to de-select and revert to the whole category such as all nationalities
or all genders.]
Given Facebook’s global reach of 2.1B monthly active users,
we believe there is a lot of potential in using this data source to support
global development efforts, in particular given its easy accessibility through
official APIs.
At the same time, no single data source is a cure-all and many have
complementary strengths. Satellite data has truly global reach and can give
estimates of population densities,
but it will never reveal the nationality or gender of earthlings.
Call detail records (CDR, https://en.wikipedia.org/wiki/Call_detail_record) are great
for studying dynamic changes in population density,
but there are limitations for monitoring international migration as people
often change their SIM cards once they move.
I’m truly optimistic that as Digital
Demography advances and matures as a field and as researchers start to work
collaboratively, combining different data sources, we will see more and more
scientific work with real impact on the creation of migration statistics. If
you’re interested in how to use new data sources and methodologies to help fill
data gaps around the globe, please get in touch by email at: iweber -atsignal -
hbku.edu.qa.
Using internet advertising data for studying
international migration (https://www.slideshare.net/IngmarWeber/using-internet-advertising-data-for-studying-international-migration)
Digital Demography - WWW'17 Tutorial - Part II (https://www.slideshare.net/IngmarWeber/digital-demography-www17-tutorial-part-ii)
Wrapper libraries to obtain Facebook advertising audience estimates:
Wrapper library in R (https://github.com/CSDE-UW/IUSSP-digital-demog-2017)
by Connor Gilroy (https://soc.washington.edu/people/connor-gilroy)
Wrapper library in Python (https://github.com/maraujo/pySocialWatcher) by Matheus Araujo (https://sites.google.com/view/matheusaraujo/)
All of my publications are available at https://ingmarweber.de/publications/.
Feel free to follow me at https://twitter.com/ingmarweber.