
Thursday, 6 November 2014

You Are What You Tweet: An Exploration of Tweets as an Auxiliary Data Source

Ashley Richards is a survey methodologist at RTI International. This post first appeared on SurveyPost on 29 July 2014.

Last fall at MAPOR, Joe Murphy presented the findings of a fun study he did with our colleague Justin Landwehr and me. We asked survey respondents if we could look at their recent Tweets and combine them with their survey data. We took a subset of those respondents and masked their responses on six categorical variables. We then had three human coders and a machine algorithm try to predict the masked responses by reviewing the respondents’ Tweets and guessing how they would have responded on the survey. The coders looked for any clues in the Tweets, while the algorithm used a subset of Tweets and survey responses to find patterns in the way words were used. We found that both the humans and the machine were better than chance at predicting values of most of the variables.
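To make the machine side a bit more concrete, here is a minimal sketch of the kind of supervised text classifier that could be trained on respondents’ Tweets. The post doesn’t name the actual model, so the bag-of-words features, the naive Bayes classifier, and the scikit-learn calls below are illustrative assumptions rather than the study’s implementation.

```python
# A minimal, hypothetical sketch of the "machine algorithm" step: train a
# bag-of-words classifier on the respondents whose survey answers are known,
# then predict the masked answers for the rest. The actual study's model is
# not specified; these library choices are assumptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# One string of concatenated Tweets per respondent, paired with a known answer.
train_tweets = ["haha love this :)", "the debate last night was a disaster"]
train_labels = ["female", "male"]        # unmasked survey responses
test_tweets = ["just voted!", "new gym routine starts today"]  # masked half

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_tweets, train_labels)    # learn word-usage patterns
print(model.predict(test_tweets))        # guesses for the masked responses
```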

We recently took this research a step further and compared the accuracy of these approaches to multiple imputation, with the help of our colleague Darryl Creel. Imputation is the approach traditionally used to account for missing data, and we wanted to see how the nontraditional approaches stack up. We also wanted to examine these approaches because imputation cannot be used when survey questions are not asked at all, which commonly happens because of space limitations, the desire to reduce respondent burden, or other factors. I will be presenting this research at the upcoming Joint Statistical Meetings (JSM) in early August. I’ll give a brief summary here, but if you’d like more details, please check out my presentation or email me for a copy of the paper.
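For readers less familiar with imputation, the sketch below illustrates one very simple variant, random hot-deck imputation: fill each missing value with a randomly drawn observed response, and repeat the process to produce multiple completed datasets. Our study used a proper multiple imputation procedure with Darryl’s help; the pandas/NumPy code here is only an assumed illustration of the general idea, not our actual method.

```python
# Hypothetical sketch of hot-deck-style imputation for a categorical item.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
income = pd.Series(["<25k", "25-50k", np.nan, "50-100k", np.nan, ">100k"])

observed = income.dropna().to_numpy()  # donor pool of observed responses
M = 5                                  # number of imputed datasets

imputed_sets = []
for _ in range(M):
    filled = income.copy()
    mask = filled.isna()
    filled[mask] = rng.choice(observed, size=mask.sum())  # random donors
    imputed_sets.append(filled)

# Each completed dataset can then be analysed and the results pooled.
```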

Income was the only variable for which imputation was the most accurate approach, but the differences between imputation and the other approaches were not statistically significant. Imputation correctly predicted income 32% of the time, compared to 25% for human coders and 26% for the machine algorithm. Considering that there were four income categories and a person would have a 25% chance of randomly selecting the correct response, I am unimpressed with these success rates of 25%-32%.

Human coders outperformed imputation on the other demographic items (age and sex), though imputation was more accurate than the machine algorithm. For these variables, the human coders picked up on clues in respondents’ Tweets. I was one of the coders and found myself jumping to conclusions, but I did so with a pretty good rate of success. For instance, if a Tweeter said “haha” a lot or used smiley faces, I was more likely to guess the person was young and/or female. These are tendencies I’ve observed personally, but I’ve also read about them.

As a coder I struggled to predict respondents’ health and depression statuses, and this was evident in the results. Imputation was better than humans at predicting these, but the machine algorithm was even more accurate. The machine was also best at predicting who respondents voted for in the previous presidential election, with human coders in second place and imputation in last place. As a coder I found that predicting voting was fairly simple among the subset of respondents who Tweeted about politics. Many Tweeters avoided the subject altogether, but those who Tweeted about politics tended to make it obvious who they supported.

[Figure: twitter_predictions]
So what does this all mean? We found that even with a small set of respondents, Tweets can be used to produce estimates with accuracy in the same range as, or better than, imputation procedures.[1] There is quite a bit of room for improvement in our methods that could make them even more accurate. For example, we could use a larger sample of Tweets to train the machine algorithm, and we could select human coders who are especially perceptive and detail-oriented. The finding that Tweets are as good as or better than imputation is important because imputation cannot be used where survey questions were not asked.

As interesting as these findings may be, they need to be taken with a grain of salt, especially because of our small sample size (n=29).[2] Relying on Twitter data is challenging because many respondents are not on Twitter, and those who are on Twitter are not representative of the general population and may not be willing to share their Tweets for these purposes. Another challenge is the variation in Tweet content. For example, as I mentioned earlier, some people Tweet their political views while others stay away from the topic on Twitter.

Despite these limitations, Twitter may represent an important resource for estimating values that are desired but not asked for in a survey. Many of our survey respondents are dropping clues about these values across the Internet, and now it’s time to decide if and how to use them. How many clues have you dropped about yourself online? Is your online identity revealing of your true characteristics?!?

[1] Even if approaches using Tweets are more accurate than imputation, they require more time and money, and in many cases may not be worth the tradeoff. As discussed later, these findings need to be taken with a grain of salt.

[2] We had more than 2,000 respondents, but our sample size for this portion of the study was greatly reduced after excluding respondents who don’t use Twitter, respondents who did not authorize our use of their Tweets, and respondents whose Tweets were not in English. Furthermore, half of the remaining respondents’ Tweets were used to train the machine algorithm.

Monday, 14 July 2014

Exhausted, intrigued and all about social media!

Kelsey Beninger is a researcher at NatCen Social Research and can be contacted through Twitter @KBeninger.

Attending two conferences related to social media over the past three days took me on a long journey, and not just because of the six train rides, nonexistent mobile reception, and overly complex campus layouts! My journey also included reflecting on different approaches to talking about social media research and the need, now more than ever, to be transparent about what we do, why we do it, and what is wrong with our approaches (because we aren’t perfect; it’s still early days!).

At the European Conference on Social Media in Brighton I was fortunate to meet and learn from some of the people from the 35 different countries represented. While I won’t shy away from saying that I think the strongly academic focus was a wee bit theoretical for my applied social policy mind, I did learn about very different types of projects, from e-health platforms in Lithuania and online civic participation in Egypt and Malaysia, to keynotes on the evolution of social media and the feasibility of moving knowledge cafes to the online space.

The enthusiasm and excitement for social media was palpable throughout the conference and inspired hope for future innovations. I wondered, though, why the presentations I attended focused mostly on the research outcome: the findings. What was lacking was clarity about the means that led to that end: the design and methodology. Indeed, this emerged in discussions with participants who were also student supervisors. A recurring theme was: what theory should I tell my students to use when they want to use social media in their dissertations? What’s the best tool, the best approach? What issues do my students need to be aware of from the beginning? There are no right answers, but it would be quite handy to have an idea of the approaches, tools, considerations and adjustments researchers have used to succeed in their work, so we can inform future work.

I led a roundtable on just this topic, one of only two roundtables in two full days of lectures. I was a bit worried there wouldn’t be dialogue, but participants quickly opened up about the challenges they are facing, and these very much echoed the work we at @NatCen and @NSMNSS have been doing this year (see this post and this one too). The challenge of distinguishing between what is legal and what is acceptable in research using social media sparked an engaging debate among attendees. Some initially viewed this point in black and white, before the discussion turned to the grey area perceived by social media users: while a platform’s T&Cs or a country’s digital technology laws may make the use of data for research legal, some users still see it as the researcher’s moral obligation to gain consent or anonymise data.

Also flagged up were the difficulty of defining social media (by far the most heated topic!), knowing what questions to ask to decide what to do next in your design, how to sample ‘properly’ online, and understanding big data collection tools and their weaknesses. Yet none of this came up in the other sessions I attended that day. The session may have left some people feeling at a loss for solutions, and I believe that it is OK not to have a strict answer. What the session made obvious to me was the importance of researchers talking, sharing and questioning. Without more of that, research will stagnate and researchers will get complacent.

The Research Methods Festival in Oxford was equally diverse and engaging, yet provided what I felt was missing from ECSM2014: the methods! I presented at and attended a panel on the opportunities and challenges of social media research (see slides here), and it was so refreshing to have everyone in the room be open and honest about what they can and can’t do with their research. @Donna_Peach of @SocPHD discussed how she created a community of practice on Twitter and on a website, sharing examples of collaboration between different groups. The team at CASM discussed Natural Language Processing and shared a great concrete example that even my teeny mind can comprehend! COSMOS also gave us a sneak peek of their new platform. And I was intrigued by the Visual Social Media Lab discussed by Farida Vis, a new collaboration aiming to tap into visual data online (often overlooked in favour of the text that is more commonly the focus of research).

Basically, there is one moral to my ramblings: as a researcher, don’t be afraid to bare your research design soul. Without more of that, how can the field persevere and innovate, gain credibility from the skeptics, and avoid reinventing the wheel?

Check out the #ECSM2014 and #RMF14 Twitter feeds for a roundup of both conferences.

Papers and slides from the ESRC Research Methods Festival 2014 are here: http://www.ncrm.ac.uk/RMF2014/programme/session.php?id=E8

Review the sessions at the European Conference on Social Media 2014: http://academic-conferences.org/ecsm/ecsm2015/ecsm15-timetable.htm

Tuesday, 15 April 2014

Back to University: Summary of ‘Research Ethics into the Digital Age’ Conference

Kelsey Beninger is a researcher at NatCen Social Research.

Recently I was invited to speak at Sheffield University’s ‘Research Ethics into the Digital Age’ conference. It was an exciting opportunity, not least because it involved a swanky pre-conference speakers’ dinner. But really, it was exciting because the University was celebrating ten years of its research ethics committee and launching its new purpose-built online ethics submission portal. Paper-based applications, be gone!

Usually these types of ethics and internet-mediated research events draw a diverse bunch of cross-discipline and cross-institution technicians, practitioners and ethics specialists, to name a few. This event had a diverse audience too, but in a different way: attendees were mostly from Sheffield University. The great turnout demonstrated not only a commitment to high ethical standards, but genuine interest from across departments and job roles at the University. I met a few administrators who manage the huge numbers of ethics applications, members of the university research ethics committee, and students and professors galore!

The morning had a great line-up of keynote speakers. Professor Richard Jenkins from Sheffield University provided a nice overview of ethics in international projects around three themes:

  1. data must satisfy the host country’s legal and ethical requirements,
  2. data must satisfy your university’s REC policy, and
  3. data must satisfy the professional standards of the profession you are associated with.
Some obvious but important key points included knowing the specific cultural, legal and social norms of the country you are going to research in before you get there, and keeping a detailed paper trail of everything you do. You also share risks with any collaborators, so if they don’t have a governance framework in their country, discuss it early and encourage the application of your UK standards. The moral of the story: the ethical situation is more complex with international research, but your responsibilities as a researcher with respect to ethics are the same.

Professor Joe Cannataci from the University of Malta cut through the legal jargon and conveyed important points about data protection and the use of personal data from the internet. He drew on an intriguing array of international projects (one of which may have involved a funny story of him dancing his entrance to gain acceptance when meeting a rural tribe in Malaysia). He started his presentation by discussing the principle of relevance in data protection law. This is something many of us in research are familiar with: collect only the right data, by the right people, at the right time, used by the right people in the right way, for an agreed time. Say that ten times fast! Of particular relevance to researchers beginning studies across Europe is knowing where the data will be stored, because data protection laws differ between EU member states and other European countries.

Next up was Claire Hewson from the Open University. Claire provided an overview of the challenges associated with the ethics of internet-mediated research. A point that got me thinking (‘Is there truly an “unobtrusive” type of data collection?’) was her distinction between obtrusive and unobtrusive methods. Obtrusive methods involve activities such as actively recruiting individuals who knowingly take part in research. Unobtrusive methods include big data, data mining and observation. They are only unobtrusive because people are not aware of them in the first instance; it would become pretty obtrusive to some participants if they were cognisant of what was being done with their personal data. A nice take-away point from the presentation was that ‘thinking is not optional’ when it comes to applying ethical frameworks to changing online environments.

After a tasty lunch in Sheffield University’s lovely new student union building, my turn was up! I delivered a session on the challenges of social media research, drawing upon recent exploratory research with users of social media (http://www.natcen.ac.uk/our-research/research/research-using-social-media-users-views/) about their views on researchers using their data. I also drew on the work of the network and the survey conducted by network member Janet Salmons on researcher and practitioner views of the ethics of online research. My group explored the challenges associated with three themes: recruitment and data collection; interviewer identity and wellbeing; and analysis and presentation of data. Summarising key points that we as researchers are familiar with, I pushed the issues home by using direct examples from our exploratory research.

We wrapped up with a section on recommendations, including only using social media in research if it is appropriate for your question; being transparent with participants and other researchers about the risks of the research and the limitations of your sample; and taking reasonable steps to inform users of your intention to use their data in research. Read the full list of recommendations in the report, here.