Joe Murphy is a senior survey methodologist with over 17 years of research and project management experience. Mr. Murphy has extensive experience developing and applying new technologies and modes of communication to improve the quality, relevance, and efficiency of survey research. His recent work has centered on the use and analysis of social media to supplement survey data, with a detailed focus on Twitter. Mr. Murphy also investigates optimal designs for multi-mode data collection platforms, data visualization, crowdsourcing, and social research in virtual worlds. Mr. Murphy is a demographer by training and survey methodologist by practice. His significant research experience includes the substantive topics of energy, hospitals and health care, and substance use and mental health. Mr. Murphy is also a proficient SAS programmer, experienced in the analysis and manipulation of large, complex data sets. @joejohnmurphy
1. Twitter is like a giant opt-in survey with one question.
Twitter started in 2006 with a simple prompt for its users: “what are you doing?” From a survey methodologist’s perspective, this isn’t really optimal question design. How people actually use Twitter is so varied, there might as well be no question at all. We aren’t used to working with answers to a question no one asked, and Twitter is a good example of what has been described as "organic data" – it just appears without our having designed for it. Tweets are limited to 140 characters in length. Pretty short, but a Tweet can capture a lot of information, and include links to other websites, photos, videos, and conversations.
2. Twitter is massive.
Every day, half a billion Tweets are posted. Half a billion! That means by the time you finish reading this, there will be approximately one million new Tweets. And the pace is only growing. With Twitter’s application programming interface (API) you can pull from a random 1% of Tweets. To get at all Tweets, known as the Firehose (100% of Tweets), you need to go through one of a few vendors and pay a fee, though the Library of Congress is working on providing access in the future.
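As a rough check on that scale, the arithmetic can be sketched in a few lines of Python. The only number taken from the text is the half a billion Tweets per day; the assumption that Tweets arrive at an even rate through the day is a simplification, and the function names are purely illustrative.

```python
TWEETS_PER_DAY = 500_000_000      # figure quoted in the text
SECONDS_PER_DAY = 24 * 60 * 60


def tweets_per_second(tweets_per_day=TWEETS_PER_DAY):
    """Average posting rate, assuming Tweets arrive evenly through the day."""
    return tweets_per_day / SECONDS_PER_DAY


def seconds_until(n_tweets, tweets_per_day=TWEETS_PER_DAY):
    """Seconds needed for n_tweets new Tweets to appear at the average rate."""
    return n_tweets / tweets_per_second(tweets_per_day)


def sample_size(share, tweets_per_day=TWEETS_PER_DAY):
    """Daily number of Tweets in a random sample of the given share."""
    return round(tweets_per_day * share)


print(f"Tweets per second: {tweets_per_second():,.0f}")
print(f"Minutes until one million new Tweets: {seconds_until(1_000_000) / 60:.1f}")
print(f"Tweets per day in a 1% sample: {sample_size(0.01):,}")
```

Under these assumptions, one million new Tweets take a little under three minutes, and even the 1% sample amounts to millions of Tweets a day.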
3. Twitter is increasingly popular on mobile devices like smartphones and tablets.
You’ll see people tweeting at events, as news is happening right in front of them, or where you don’t really expect or want to see them tweeting, like while they’re driving. Many use Twitter on mobile devices with another screen on at the same time. That’s called multiscreening, as when people tweet while watching television in a backchannel discussion with friends and fans of their favourite shows.
4. The user-base is large, but it doesn’t exactly reflect the general population.
It would be kind of weird if it did, honestly. There are surely many factors that influence the likelihood of adoption, and wouldn’t it be surprising if we saw no differences by demographics? The Pew Research Center estimates 16% of online Americans now use Twitter, and about half of those do so on a typical day. Users are younger, more urban, and disproportionately black non-Hispanic compared to the general population. This is interesting when thinking about new approaches for sometimes hard-to-reach populations.
5. It is made up of more than just people.
Twitter is not cleanly defined with one account per person or even just one person behind every account. Some people have multiple accounts and some accounts are inactive. Groups and organizations use Twitter to promote products and inform followers. They can purchase “promoted Tweets” that show up in users’ streams like a commercial. And watch out for robots! Some software applications run automated tasks to query or Retweet content, making it extra challenging when trying to interpret the data.
6. There are research applications beyond trying to supplant survey estimates.
Think about the survey lifecycle and where there may be needs for a large, cheap, timely source of data on behaviours and opinions or a standing network of users to provide information. In the design phase of a survey, can we use Twitter to help identify items to include? Can we identify and recruit subjects for a study using Twitter? How about a diary study when we need a more continuous data collection and want to let people work with a system they know instead of trying to train them to do something unfamiliar? Can Twitter be used to disseminate study results? What about network analysis? Is there information that can be gleaned from someone’s network of friends and followers, or the spread of tweets from one (or few) users to many? We often think of public opinion as characterizing sentiment at a specific place and time, but are there insights to be had from Twitter on opinion formation and influence?
7. Twitter is cheap and fast, but making sense of it may not be.
What’s the unit of analysis? Can we apply or adapt the total survey error framework when looking at Twitter? What does it mean when someone tweets as opposed to giving a response in a survey? Beyond demographics, how do Twitter users differ from other populations? How can we account for Twitter’s exponential growth when analysing the data? The best answer to each right now is “it depends” or “more research is needed.” We need a more solid understanding and some common metrics as we look to use Twitter for research. Work on this front is beginning but has a long way to go.
8. Naïve and general text mining methods for tweets can be severely lacking in quality.
The brevity of tweets and their misnomers, misspellings, slang, and sarcasm make sentiment analysis a real challenge. We’ve found the off-the-shelf systems pretty bad and inconsistent when coding sentiment on tweets. If you’re going to do automated sentiment analysis, be sure to account for nuances of your topic or population as much as possible and have a human coding component for validation. One approach we’ve found to be promising is to use crowdsourcing for human coding of tweet content.
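One way to put the human validation step into practice is to compare automated codes against human codes on the same tweets. The sketch below computes raw agreement and Cohen's kappa, which is one common chance-corrected agreement measure and not necessarily the one used in the work described here. The labels are made-up toy values for illustration only, not real coding results, and the function and variable names are hypothetical.

```python
from collections import Counter


def percent_agreement(codes_a, codes_b):
    """Share of items on which two coders assigned the same label."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(a == b for a, b in zip(codes_a, codes_b)) / len(codes_a)


def cohens_kappa(codes_a, codes_b):
    """Agreement between two coders, corrected for chance agreement."""
    n = len(codes_a)
    observed = percent_agreement(codes_a, codes_b)
    counts_a, counts_b = Counter(codes_a), Counter(codes_b)
    labels = set(counts_a) | set(counts_b)
    # Agreement expected if each coder labelled at random with their own label mix.
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / n**2
    if expected == 1:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Toy, invented labels: human coding versus an automated system on eight tweets.
human = ["pos", "neg", "neg", "neu", "pos", "neg", "neu", "pos"]
machine = ["pos", "pos", "neg", "neu", "neu", "neg", "pos", "pos"]

print(f"Raw agreement: {percent_agreement(human, machine):.3f}")
print(f"Cohen's kappa: {cohens_kappa(human, machine):.3f}")
```

Kappa falls below raw agreement because some matches would occur by chance alone, which is why raw agreement on its own can flatter an automated coder.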
9. Beware of the curse of Big Data and the file cabinet effect.
Searching for patterns in trillions of data points, you’re bound to find coincidences with no predictive power or that can’t be replicated. The file cabinet effect is when researchers publish exciting results about Twitter but hide away their null or negative findings.
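The first of these problems is easy to reproduce with a small simulation. The sketch below generates pure random noise, so no series has any real relationship to the target, and then reports the strongest correlation found among the candidates. The series length, number of candidates, and seed are arbitrary assumptions chosen for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def strongest_chance_correlation(n_series, n_points=20, seed=0):
    """Largest |r| between a random target and n_series unrelated noise series."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_series):
        noise = [rng.gauss(0, 1) for _ in range(n_points)]
        best = max(best, abs(pearson(target, noise)))
    return best


for n in (10, 1000):
    print(n, "candidate series, strongest |r|:",
          round(strongest_chance_correlation(n), 2))
```

The more candidate series you search, the stronger the best "pattern" looks, even though every one of them is noise. Checking a finding against fresh data is one guard against this.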
10. Surveys aren’t perfect either.
Surveys are getting harder to complete with issues like declining response rates and reduced landline coverage. Twitter isn’t a fix-all but it may be able to fill some gaps. It’ll take some focused study and creative thinking to get there.