Brian Head is a research methodologist at RTI International. This post first appeared on SurveyPost on 20 May, 2014. You can follow Brian on Twitter @BrianFHead.
Survey researchers have become interested in big data because it offers potential solutions to problems we’re experiencing with traditional methods. Much of the focus so far has been on social media (e.g., Tweets), but sensors (wearable tech) and the internet of things (IoT) are producing an increasingly rich, complex, and massive source of data. These new data sources could lead to an important change in how individuals see the data collected about them, and thus have ramifications for those interested in gathering and analyzing those data.
Who compiles data?
Quantitative data about people have been gathered for millennia. But with technological advances and the identification of new purposes for such data, the past 100 years have seen significant increases in the amount produced and collected, e.g., data on consumer patterns and other market research, probability surveys, etc.
Common to these data are three factors: 1) the data are a commodity compiled, used, or traded by third parties; 2) generally there are no direct benefits to the individuals about whom data are gathered; and 3) the organizations interested in the data gather, store, and analyze them.
All this is not to say that throughout history individuals haven’t collected information about themselves. Individuals have collected qualitative data in the form of diaries and biographies. They have also collected some quantitative data, but this has generally been to satisfy a third party (e.g., collecting financial information to file taxes). Now, in addition to all of the data others compile about them, new technologies like wearables (sensors) and IoT devices allow people to voluntarily produce and compile massive amounts of data about themselves, and doing so can benefit them directly. (Involuntary data collection through connected devices is already taking place; for example, internet-connected devices are being used for geo-targeted advertising.)
Who owns or controls data?
Data are collected in different ways. Census data are collected periodically (intervals vary by nation) through a mandatory government data collection. Surveys generally operate under the requirement of voluntary participation, although there are exceptions. Much of the consumer data now gathered is collected surreptitiously. Examples include browser cookies that collect information about the websites we visit, search engines that collect information about the internet searches people conduct, email providers that scan emails, and apps that use geodata to market goods and services to prospective clients.
It seems the public is increasingly aware of and concerned with the sum of these data collections. According to a recent Robert Wood Johnson Foundation (RWJF) study, large majorities of self-tracking app/device users think they own (84%) or want to own (75%) the data collected with the device. There have been attempts to limit data collection, such as the recent attempt to limit the data the U.S. government collects on citizens. Advocates of efforts like this tend to cite concerns over burden and privacy. The exponential growth of data collected both voluntarily and involuntarily through apps, sensors, and the IoT may prompt similar (perhaps successful) attempts to change government and corporate policies to give individuals more control over their data. In fact, market researchers are already beginning to respond to this interest among consumers by offering to pay them for access to their browsing history, social network activity, and online transactions, while at the same time giving those consumers control over which data they sell to the brokers.
As the amount of data collected about us increases, there’s a good chance individuals will increasingly see their data as their own, understand the value the data have to various third parties, demand more control over them, and expect to be compensated for them. At first blush that may seem concerning. However, the type of compensation individuals desire for their data will likely depend on how the data will be used. For example, consumers are likely to continue to trade data for convenience in services (see thesis # 12). And the RWJF report cited above suggests the usual leverages used to gain survey participation (e.g., topic salience and altruism) may work in gaining access to big data when the purpose of the study is “public good research.”
Need for further research
Further research is needed in this area of big data to answer questions like: 1) to what extent, and how soon, will a larger proportion of the population begin to voluntarily use sensor and IoT devices; 2) will the general public continue to tolerate involuntary data collection when those data are collected by connected devices; 3) will the general public have opinions similar to those of the early adopters in the RWJF study about sharing personal data from connected devices with survey researchers; 4) will the leverages that work for gaining survey participation work for gaining access to personal big data, or will new or additional leverages be needed; and 5) will we be able to use techniques similar to those used to access administrative record data, or will we need to develop new protocols for seeking permission to access these data? I look forward to seeing and contributing to the research that answers these questions. What are your thoughts?