Showing posts with label Twitter. Show all posts

Thursday, 4 August 2016

Consultation – Asking consent to link Twitter data and survey data

Curtis Jessop is a Senior Researcher at NatCen Social Research, where he works on longitudinal surveys and is the Network Lead for the NSMNSS network.

Next year, I, along with Luke Sloan and Tarek Al Baghal, will be running an experiment on the Understanding Society Innovation Panel to look at the feasibilities and practicalities of linking Twitter and survey data in a longitudinal context, and how they can be combined to improve the quality of both.

We will be specifically focussing on how survey data can be enhanced using social media data (for example by creating new measures, validating survey estimates or improving non-response adjustments) and how social media data can be validated using survey data. However, we are aware that such a dataset has a greater potential than this, so we are also thinking about the ethics and practicalities that may be involved in making this dataset available more widely.

This will no doubt be tricky – as far as we are aware, it is unprecedented to attempt to link data in this way and make it available to a wider set of researchers, so it is difficult to predict what issues may arise. We therefore aim to be as open as possible about these issues. This will involve documenting the choices we make so that others may learn from any mistakes, but we would also like to consult with the wider research community at key points in the process:

  • Consent to data linkage
  • Social media data collection/linking to survey data
  • Data archiving

We are currently at the first stage – asking consent to link participants’ survey data and Twitter data. To an extent, by asking consent we are going beyond what many social media researchers may do, but linking to survey data and aiming to archive the result changes the dynamic somewhat.

There are constraints to what we can do: the survey will be administered in web, telephone or face-to-face modes, so the process must work in all contexts. There is also limited questionnaire space, so we cannot add any more questions, and we also need to consider burden on the participant – a large amount of information may overwhelm and leave them less informed.

Below, I have outlined the template for the three questions we would like to ask. We are proposing to use ‘help links’ during the questionnaire to allow the participant to find out more information if they want it online, or an interviewer to answer questions in an interviewer-administered mode:

Q1 [Ask All]
Do you have a personal Twitter account?
1. Yes
2. No

Q2 [IF Q1 = Yes]
We are interested in being able to link people’s answers to this survey to the ways in which they use Twitter. We would also like to know who uses Twitter.

We will not use your tweets to identify you in any way and your Twitter information will be treated as confidential and given the same protections as your interview data. Your Twitter name, and any information that would allow you to be identified would not be published.

HELP SCREEN: What data will you collect from my Twitter account?
HELP SCREEN: What will the data be used for?
HELP SCREEN: Who will be able to access the linked data?
HELP SCREEN: What will you do to protect my data?

Are you willing to tell me the name of your personal Twitter account and for your Twitter information to be linked with your answers to this survey?
1. Yes
2. No

Q3 [IF Q2 = Yes]

INTERVIEWER: Please enter the respondent’s Twitter name here: [OPEN]

We would really appreciate any feedback you may have on what information we might include in these help links, or on how we might change the question wording or administration. If you have thoughts that may not be workable in this context, they would still be useful to hear so that we can document them for others who may want to do this in the future.

If you have any suggestions, or would like to discuss this further, please do contact me at curtis.jessop@natcen.ac.uk. As we need to submit our final version of the question text to ISER by mid-September, please do try to get any comments to me by the end of August. 

Wednesday, 20 May 2015

Getting to ‘Yes’ in the digital age

Last Monday (11th May) the SRA, along with NSMNSS, hosted a seminar looking at ethics and informed consent in the context of online research. As long-standing network members will know, the ethical challenges of conducting research have consistently cropped up as a core concern for social scientists looking to use social media in their research, and this event demonstrated that this continues to be the case.


Matt Williams from the Social Data Science Lab kicked off proceedings with a discussion of his experiences of ethics in social media research based on his work with COSMOS, a platform for accessing Twitter data.
Matt focussed on some of the issues relating to publishing research based on Twitter data, and how Twitter’s policies relating to the publication of Tweets (showing names, @usernames, and unmodified Tweet text) can clash with researchers’ requirements of anonymity and protecting participants from harm, before outlining their approach to ‘risk assessment’ before publishing Tweets.
He also introduced some survey research suggesting that social media users were split in how concerned they were about their data being used for research, and that some types of users were more likely to be concerned than others. However, participants seemed to be less concerned about their data being used by universities than government or commercial organisations.


Following Matt (and some role playing!), Janet Salmons talked through some of her experiences of the best way of getting informed participation from potential research respondents in an online environment.
Janet emphasised the importance of building trust and credibility with research participants, and that different types of online communication actually provide opportunities to engage and inform participants; the key is thinking about who your target population is.

For those who missed out, or went and didn’t get everything down at the time, you can find copies of Matt and Janet’s slides here and here, and Janet has also provided some additional resources about ethics and online research here.

As is often the case, perhaps the most interesting session was the Q&A at the end, chaired by NatCen’s Kandy Woodfield. The panel & audience discussed a range of questions covering topics such as the difficulties (and ethics) involved in removing participants once analysis has started (or been published!), whether consent can ever be truly ‘informed’, and whether participants need to be reminded of consent for passive (ongoing) observation.
Unfortunately (although perhaps not unexpectedly), there was not enough time to answer all of the attendees’ questions, so NSMNSS are going to be hosting a follow-up #NSMNSS Tweetchat on the ethics of online research on Monday 1st June at 5pm (UK time).

If you have any questions that you would like to put to the NSMNSS community, please message us @NSMNSS, email us at nsmnss@natcen.ac.uk, or simply leave a comment below!

Wednesday, 21 January 2015

New Year, New #NSMNSS Twitter Chats, Vote Now!

Last year we had a number of successful twitter chats on topics from generating representative online datasets to the changing roles of social media researchers in an increasingly mediated world. Based on topics arising from our previous discussions, and current issues and emerging trends in online research, we have a number of topics for upcoming #NSMNSS twitter chats this year (see below). However, we really want our chats to be led by you: which topic should we discuss first? Have a read and cast your votes here! We will announce the winning topic on Twitter. The first chat is on Tuesday 3rd February at 5pm (UK time; you can check the time for your location here).
  1. To blog or not to blog? What a question! Why are we blogging (or not)? What are the opportunities and challenges of blogging for our research? What are the relationships between blogging and traditional forms of publishing? How can we use quantitative metrics to know more about our audiences and what other information do we need? Can blogging improve scholarship? How can we avoid running out of steam?
  2. How can we use social media to reflect participants’ voices? An experiment in sharing our experiences of representing/reflecting participants’ voices through different mediums and ideas for how research might make use of social media to co-produce research with participants. How might Pinterest be used to reflect participants’ experiences? How might twitter be used to generate insights on people’s lives?
  3. The first #NSMNSS Research Exchange of Favourite Projects and Fresh Ideas: What social media research and initiatives inspire you? What are your ambitions and aims for social media research projects? An opportunity to pool our resources and links to useful articles/websites/tools too!
  4. Big data discussion! How can we define ‘big data’? Where do you start in a ‘big data’ project (tips/advice)? What are the strengths and limitations of using ‘big data’? How can we ensure users’ ethical rights? What about ‘small data’? What does the future hold for social sciences in terms of new forms of data? Not to forget, we will also discuss ethical issues in big data research!
  5. Ways to say ‘I do’ in social media research! How are traditional forms (e.g. print out, sign here) in play for gaining consent in the online world? Which electronic forms work well? Can we record a verbal agreement to participate in research? What about terms and conditions attached to the sites people use? Should we seek permission to quote from social media users?
  6. How to pick a platform in social media research: who is using what and why? Where are our participants? How does the platform shape the research? How should we attend to our online identities as researchers? Sharing our experiences of using different platforms in research.

#NSMNSS chats will run on the first Tuesday of every month at 5pm (UK time). Remember to include #NSMNSS in all your posts to help us capture all of the discussion. We will provide a transcript of the Tweetchat on our blog following the event.

Don’t forget to vote! Suggestions for other topics are also most welcome!

 

Wednesday, 10 December 2014

Missed out? Read the twitter feed from our chat on changing role of researchers in a social media world

See below for the twitter feed from our tweetchat on 9th December 2014, all about the changing role of researchers involved in online and social media research. Scroll to the bottom and work your way up to follow the conversation in the order it occurred.

A summary of the questions posed to those taking part is included below:

Q1: How is social media impacting upon you as a researcher? E.g. identity, work/life balance, ethics
Q2: How is social media changing our identities as researchers?
Q3 from : How does insider-outsider position influence your role, identity, access or objectivity?
Q4 from : What does it mean to be ‘virtually ethical’?
Q5: What are the key issues around social media for researchers going forward?
Q6: What topics shall we chat about in the new year?




Wednesday, 19 November 2014

Making the most out of big data: computer mediated methods

Patrick Readshaw is a Media and Cultural Studies Doctoral Candidate at Canterbury Christ Church University. Patrick is interested in social media as an alternative and empowering source of information on current events, free from the constraints of other agenda-setting media forms. You can contact Patrick by email on p.j.readshaw68@canterbury.ac.uk  

When I was asked to write a blog for NSMNSS, I was certainly excited and, this being my first post of this kind, suitably anxious about the prospect. However, my ongoing thesis has never ceased to provide interesting discussions with individuals in linked or parallel fields relating to social media. The main caveat in these discussions is that I often have to try not to overcomplicate things. With that in mind, and my ham-fisted introduction out of the way, I want to take some time to break down the value of so-called “new media systems” like Twitter and how I personally go about dealing with the data I collect.

Since Social Media sites such as “Facebook” burst onto the scene 10 years ago, researchers and market analysts have been looking for a way to tap into the content on these sites. In recent years, there have been several attempts to do this with some being more successful than others (Lewis, Zamith & Hermida, 2013), particularly with regards to the scale of the medium in question. For those uninitiated (apologies to those that are) the term “Big Data” is the catch-all for the enormous trails of information generated by consumers going about their day in an increasingly digitized world (Manyika et al., 2011). It is this sheer volume of information that poses the first hurdle to be overcome when conducting research online. For example, earlier this year I was collecting data on the European Parliamentary Election and generated over 16,000 tweets in about three weeks. Bearing in mind that on average a tweet contains approximately 12 words in 1.5 sentences (Twitter, 2013), for those three weeks I had 196,500 words or 24,500 sentences to come to terms with. That is a lot of data for one person to deal with alone, especially if only applying manual techniques such as content analysis. 
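The scale involved can be checked with some back-of-envelope arithmetic. The sketch below is illustrative only: the function name is hypothetical, and the round figure of 16,000 tweets is an assumption standing in for "over 16,000". At exactly 16,000 tweets the per-tweet averages give 192,000 words and 24,000 sentences, so the totals quoted above correspond to a collection somewhat larger than 16,000 tweets.

```python
WORDS_PER_TWEET = 12        # average quoted in the text
SENTENCES_PER_TWEET = 1.5   # average quoted in the text


def estimate_corpus_size(n_tweets):
    """Return (words, sentences) implied by the per-tweet averages."""
    return n_tweets * WORDS_PER_TWEET, n_tweets * SENTENCES_PER_TWEET


# Assumed round number of tweets, used only for illustration.
words, sentences = estimate_corpus_size(16_000)

# Number of tweets implied by the quoted total of 196,500 words.
implied_tweets = 196_500 / WORDS_PER_TWEET

print(words, sentences, implied_tweets)
```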

So ultimately you have to ask two questions. First, how many undergraduates or interns chained to computers running basic content analysis will it take to complete the analysis in a reasonable space of time, and will that analysis be reliable between analysts? Second, while computational methods save time on analysis, can you guarantee the same level of depth as with manual content analysis? Given that content analysis goes beyond the basic frequency statistics that can be collected simply from Twitter’s own search engine, I advocate the use of computer-mediated techniques in which the collected data are first reduced, using filters to remove retweets and spam responses, and then structured with techniques such as hierarchical cluster analysis, or at least conceptualised along a number of important factors. Both Howard (2011) and Papacharissi (2010) use this mixed-methods approach, as do Lewis, Zamith and Hermida (2013), whose method I adapted to my own work and applied as described above. These individual pieces of research also suggest the value of the medium overall as a source of data, owing to its role as one of the primary news disseminators when access to mainstream news media is blocked, as during the 2011 Arab Spring events. Burgess and Bruns (2012) conducted additional research on the 2010 federal election campaign in Australia, advising the use of computational methods to reduce the sample so that manual methods remain feasible, thereby maintaining depth during content analysis. As can be imagined, Lewis, Zamith and Hermida (2013) and Manovich (2012) both support the methodologies used in the studies above and advocate making the most of the technical advances that have allowed the content in question to be organized and harnessed efficiently.
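To make the two-step idea concrete, here is a minimal sketch of filtering followed by a simple hierarchical (agglomerative) clustering. It is not the procedure used in the studies cited above: the "RT @user" retweet rule, the Jaccard word-overlap distance, the single-linkage merging and the 0.6 threshold are all assumptions chosen to keep the example small, and every function name is hypothetical.

```python
import re


def is_retweet(text):
    """Assume a retweet starts with the conventional 'RT @user' prefix."""
    return re.match(r"\s*RT\s+@\w+", text) is not None


def normalise(text):
    """Lower-case, drop URLs and collapse whitespace so repeated posts match."""
    text = re.sub(r"https?://\S+", "", text.lower())
    return " ".join(text.split())


def filter_tweets(tweets):
    """Drop retweets and repeated texts, keeping the first occurrence of each."""
    seen, kept = set(), []
    for text in tweets:
        key = normalise(text)
        if is_retweet(text) or key in seen:
            continue
        seen.add(key)
        kept.append(text)
    return kept


def jaccard_distance(a, b):
    """1 minus the share of words two tweets have in common."""
    union = a | b
    return 0.0 if not union else 1.0 - len(a & b) / len(union)


def single_linkage(texts, threshold):
    """Merge the two closest clusters until the closest pair exceeds threshold."""
    tokens = [set(normalise(t).split()) for t in texts]
    clusters = [[i] for i in range(len(texts))]
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = min(jaccard_distance(tokens[i], tokens[j])
                        for i in clusters[x] for j in clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        if best[0] > threshold:
            break
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


raw = [
    "RT @someone: vote labour today",
    "vote labour today http://t.co/a",
    "Vote labour today http://t.co/b",
    "vote labour tomorrow",
    "weather is sunny",
]
kept = filter_tweets(raw)
groups = single_linkage(kept, threshold=0.6)
print(kept)
print(groups)
```

The pairwise search makes this quadratic or worse in the number of tweets, so it only suits a sample that has already been reduced; for a full collection a library implementation would be the sensible choice.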

The application of mixed methodologies will continue to develop the techniques integral to the oncoming age of computational social science (Lazer et al., 2009), or “New Social Science”. Even so, it is vitally important that this readily available source of data is not exploited in a way that could damage the medium as a whole, and that we maintain good research practice concerning the ethics of consumer privacy. As a final aside, I would like to remind everyone that this data is hugely fascinating and rich beyond all belief, but there are dangers associated with quantifying social life, and these should be at the front of our minds before, during and after conducting research online (Boyd & Crawford, 2012; Oboler, Welsh & Cruz, 2012).


References

Boyd, d. & Crawford, K. (2012). Critical questions for Big Data: Provocations for a cultural, technological, and scholarly phenomenon. Information, Communication & Society, 15 (5), 662–679.

Burgess, J., & Bruns, A. (2012). (Not) the Twitter election: The dynamics of the #ausvotes conversation in relation to the Australian media ecology. Journalism Practice, 6 (3), 384–402.

Howard, P. (2011). The digital origins of dictatorship and democracy: Information technology and political Islam. London, UK: Oxford University Press.

Lazer, D., Pentland, A., Adamic, L., Aral, S., Barabási, A. L., Brewer, D., Christakis, N., Contractor, N., Fowler, J., Gutmann, M., Jebara, T., King, G., Macy, M., Roy, D. & Van Alstyne, M. (2009). Life in the network: The coming age of computational social science. Science, 323 (5915), 721–723.

Lewis, S. C., Zamith, R., & Hermida, A. (2013). Content Analysis in an Era of Big Data: A Hybrid Approach to Computational and Manual Methods. Journal of Broadcasting & Electronic Media, 57 (1), 34–52.

Manovich, L. (2012). Trending: The promises and the challenges of big social data. In M. K. Gold (Ed.), Debates in the Digital Humanities (pp. 460–475). Minneapolis, MN: University of Minnesota Press.

Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., & Byers, A. H. (2011). Big data: The next frontier for innovation, competition, and productivity. McKinsey Global Institute.

Oboler, A., Welsh, K., & Cruz, L. (2012). The danger of big data: Social media as computational social science. First Monday, 17 (7-2). Retrieved from http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/3993/3269.

Papacharissi, Z. (2010). A private sphere: Democracy in a digital age. Cambridge, England: Polity Press.




Thursday, 6 November 2014

You Are What You Tweet: An Exploration of Tweets as an Auxiliary Data Source

Ashley Richards is a survey methodologist at RTI International. This post first appeared on SurveyPost on 29 July 2014.

Last fall at MAPOR, Joe Murphy presented the findings of a fun study he did with our colleague, Justin Landwehr, and me. We asked survey respondents if we could look at their recent Tweets and combine them with their survey data. We took a subset of those respondents and masked their responses on six categorical variables. We then had three human coders and a machine algorithm try to predict the masked responses by reviewing the respondents’ Tweets and guessing how they would have responded on the survey. The coders looked for any clues in the Tweets, while the algorithm used a subset of Tweets and survey responses to find patterns in the way words were used. We found that both the humans and machine were better than random in predicting values of most of the variables.

We recently took this research a step further and compared the accuracy of these approaches to multiple imputation, with the help of our colleague Darryl Creel. Imputation is the approach traditionally used to account for missing data and we wanted to see how the nontraditional approaches stack up. Furthermore, we wanted to check out these approaches because imputation cannot be used in the case where survey questions are not asked. This commonly occurs because of space limitations, the desire to reduce respondent burden, or other factors. I will be presenting on this research at the upcoming Joint Statistical Meetings (JSM), in early August. I’ll give a brief summary here, but if you’d like more details on it please check out my presentation or email me for a copy of the paper.

Income was the only variable for which imputation was the most accurate approach, but the differences between imputation and the other approaches were not statistically significant. Imputation correctly predicted income 32% of the time, compared to 25% for human coders and 26% for the machine algorithm. Considering that there were four income categories and a person would have a 25% chance of randomly selecting the correct response, I am unimpressed with these success rates of 25%-32%.

Human coders outperformed imputation on the other demographic items (age and sex), but imputation was more accurate than the machine algorithm. For these variables, the human coders picked up on clues in respondents’ Tweets. I was one of the coders and found myself jumping to conclusions, but I did so with a pretty good rate of success. For instance, if a Tweeter said “haha” a lot or used smiley faces, I was more likely to guess the person was young and/or female. These are tendencies that I’ve observed personally but I’ve read about them too.

As a coder I struggled to predict respondents’ health and depression statuses, and this was evident in the results. Imputation was better than humans at predicting these, but the machine algorithm was even more accurate. The machine was also best at predicting who respondents voted for in the previous presidential election, with human coders in second place and imputation in last place. As a coder I found that predicting voting was fairly simple among the subset of respondents who Tweeted about politics. Many Tweeters avoided the subject altogether, but those who Tweeted about politics tended to make it obvious who they supported.

[Figure: twitter_predictions]
So what does this all mean? We found that even with a small set of respondents, Tweets can be used to produce estimates with accuracy in the same range as, or better than,[1] imputation procedures. There is quite a bit of room for improvement in our methods that could make them even more accurate. For example, we could use a larger sample of Tweets to train the machine algorithm and we could select human coders who are especially perceptive and detail-oriented. The finding that Tweets are as good as or better than imputation is important because imputation cannot be used in the case where survey questions were not asked.

As interesting as these findings may be, they need to be taken with a grain of salt, especially because of our small sample size (n=29).[2] Relying on Twitter data is challenging because many respondents are not on Twitter, and those who are on Twitter are not representative of the general population and may not be willing to share their Tweets for these purposes. Another challenge is the variation in Tweet content. For example, as I mentioned earlier, some people Tweet their political views while others stay away from the topic on Twitter.
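To illustrate why a sample this small calls for caution, here is a minimal sketch of how widely accuracy from pure guessing can vary. It is not part of the original analysis. It assumes, purely for illustration, that 29 predictions were scored on the four-category income item (the scored set may have been smaller) and it uses a normal approximation to the binomial; the function name is hypothetical.

```python
import math


def chance_interval(p_chance, n, z=1.96):
    """Approximate 95% range of accuracy expected from random guessing."""
    se = math.sqrt(p_chance * (1 - p_chance) / n)
    return p_chance - z * se, p_chance + z * se


# Four income categories give a 25% chance of a correct random guess.
low, high = chance_interval(p_chance=0.25, n=29)

# Accuracy rates reported for income in the post.
reported = {"imputation": 0.32, "human coders": 0.25, "machine algorithm": 0.26}
within_chance = {name: low <= acc <= high for name, acc in reported.items()}

print(round(low, 3), round(high, 3))
print(within_chance)
```

Under these assumptions all three reported income rates sit inside the range that guessing alone could produce, which is consistent with the differences not being statistically significant.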

Despite these limitations, Twitter may represent an important resource for estimating values that are desired but not asked for in a survey. Many of our survey respondents are dropping clues about these values across the Internet, and now it’s time to decide if and how to use them. How many clues have you dropped about yourself online? Is your online identity revealing of your true characteristics?!?

[1] Even if approaches using Tweets may be more accurate than imputation, they require more time and money and in many cases may not be worth the tradeoff. As discussed later, these findings need to be taken with a grain of salt.

[2] We had more than 2,000 respondents, but our sample size for this portion of the study was greatly reduced after excluding respondents who don’t use Twitter, respondents who did not authorize our use of their Tweets, and respondents whose Tweets were not in English. Furthermore, half of the remaining respondents’ Tweets were used to train the machine algorithm.

Tuesday, 21 October 2014

It started with a tweet...


Kandy Woodfield is the Learning and Enterprise Director at NatCen Social Research, and the co-founder of the NSMNSS network. You can reach Kandy on Twitter @jess1ecat.


It started with a tweet, a blog post and a nervous laugh. Three months later I found myself looking at a book of blogs. How did that happen?! Being involved in the NSMNSS network since its beginning has been an ongoing delight for me. It's full of researchers who aren't afraid to push the boundaries, question established thinking and break down a few silos. When I began my social research career, mobile phones were suitcase-sized and collecting your data meant lugging a tape recorder and tapes around with you. That world is gone. The smartphone most of us carry in our pockets now replaces most of the researcher's kitbag, and one single device is our street atlas, translator, digital recorder, video camera and so much more. Our research world today is a different place from 20 years ago: social media are common and we don't bat an eyelid at running a virtual focus group or online survey. We navigate and manage our social relationships using a plethora of tools, apps and platforms, and the worlds we inhabit physically no longer limit our ability to make connections.

Social research as a craft, a profession, is all about making sense of the worlds and networks we and others live in. How strange would it be, then, if the methods and tools we use to navigate these new social worlds were not also changing and flexing? Our network set out to give researchers a space to reflect on how social media and new forms of data were challenging conventional research practice and how we engage with research participants and audiences. If we had found little to discuss and little change it would have been worrying. I am relieved to report the opposite: researchers have been eager to share their experiences, dissect their success at using new methods and explore knotty questions about robustness, ethics and methods.

Our forthcoming book of blogs is our members’ take on what that changing methodological world feels like to them. It's about where the boundaries are blurring between disciplines and methods, roles and realities. It is not a peer-reviewed collection and it's not meant to be used as a textbook; what we hope it offers is a series of challenging, interesting, topical perspectives on how social research is adapting, or not, in the face of huge technological and social change.

We are holding a launch event on Wednesday 29th October at NatCen Social Research. If you would like more details, please contact us.

I want to thank every single author, from the established bloggers to the new writers, who has shared their thoughts with us in this volume. I hope you enjoy the book as much as I have enjoyed curating it. Remember you can follow the network and join in the discussion @NSMNSS, #NSMNSS or at our blog, http://nsmnss.blogspot.co.uk/

Thursday, 16 October 2014

Analytics, Social Media Management and Research Impact

Sebastian Stevens is an Associate Lecturer and Research Assistant at Plymouth University. He teaches research methods to social science students specialising in quantitative methods. He is on twitter @sebstevens99 and has a blog site at www.everydaysocialresearch.com. 

A key benefit that social media can bring to social science research is through impact and engagement. Demonstrating how a research project will achieve impact and engage the public is a key requirement of most social science research bids today, with many funders no longer regarding the traditional conference and journal article as sufficient. Funders today want to see not only how your research will contribute to the current body of knowledge, but also how it could affect other areas of academia and provide public engagement and wider economic and societal benefits.

To promote your research to the widest possible audience, it is often necessary to use a number of social media platforms in order to reach different populations. It is also now possible to measure this level of engagement through web analytics, with the two most common social media platforms (Facebook and Twitter) both providing their users with free access to analytic software. Managing the content and evaluating the impact of a number of social media platforms can, however, become tiresome and laborious, an issue overcome by the use of a Social Media Management System (SMMS).

Using a SMMS takes the hassle out of managing multiple social media platforms for your research, for a reasonable yearly subscription. There are many SMMSs on the market today; one that I am currently using on a project is Hootsuite. This particular SMMS offers a research team the following benefits:

1.    Scheduling – Researchers are busy people and have little time to manage multiple social media accounts. With a SMMS you can schedule posts to be sent to multiple social media platforms at times of the day known to deliver the largest impact.

2.    Enhanced analytics – The standard analytics of the accounts included in the SMMS are available in one place, alongside extra features including Google Analytics and Klout scores.  

3.    Streams – These provide the opportunity to keep up to date with features of your accounts such as your newsfeeds, retweets, mentions, hashtag usage plus many others.

4.    Multiple Authors – Multiple authors can be added to the system taking the responsibility away from one member of the team.

5.    RSS/Atom feeds – You can keep up with updates of other websites related to your research by adding the RSS/Atom feeds to the system.

By adopting a SMMS, a research team gains a centralised, hassle-free dashboard in which to create and post content and evaluate its impact. Each management system comes at a different price and includes different features; however, most will take the hassle out of managing your social media platforms and provide greater opportunities to evaluate your research impact.

 

 

 

Friday, 13 June 2014

Some Thoughts on Why You Would Like to Archive and Share [Small] Twitter Data Sets

This post by Ernesto Priego was originally published here. We're delighted to share it as it touches on many issues network members have been discussing, so thanks to Ernesto for allowing us to repost.

Twitter ecosystem quadrant, via blog.twitter.com

This is just a quick snippet to jot down some ideas as some kind of follow-up to my blog post on the Ethics of researching Twitter datasets republished today [28 May 2014] by the LSE Impact blog.

If you have ever tried to keep up with Twitter you will know how hard it is. Tweets are like butterflies: one can only really look at them for long if one pins them down out of their natural environment. The reason why we have access to Twitter in any form is because of Twitter’s API, which stands for “Application Programming Interface”. As Twitter explains it,
“An API is a defined way for a program to accomplish a task, usually by retrieving or modifying data. In Twitter’s case, we provide an API method for just about every feature you can see on our website. Programmers use the Twitter API to make applications, websites, widgets, and other projects that interact with Twitter. Programs talk to the Twitter API over HTTP, the same protocol that your browser uses to visit and interact with web pages.”
You might also know that free access to historic Twitter search results is limited to the last 7 days. This is due to several reasons, including the incredible amount of data that is requested from Twitter’s API, and (this is an educated guess) not disconnected from the fact that Twitter’s business model relies on its data being a commodity that can be resold for research. Twitter’s data is stored and managed by at least one well-known third party, Gnip, one of their “certified data reseller partners”.

For the researcher interested in Twitter data, this means that harvesting needs to be done not only automatically (needless to say, Storifying won’t cut it, even if your dataset is to be very small), but in real time.
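As a rough illustration of what automated, real-time harvesting involves, the sketch below appends tweets to an archive as they arrive and skips any it has already stored. It deliberately makes no calls to Twitter's API: `archive_stream`, the `fake_feed` list and the dictionary shape with `id` and `text` keys are all hypothetical stand-ins for whatever a real collection tool would deliver.

```python
import io
import json


def archive_stream(source, out, seen_ids=None):
    """Write each new tweet from `source` to `out` as one JSON line.

    `source` is any iterable of dicts with an "id" key (assumed shape);
    `out` is a writable text file object. Returns the number written.
    """
    seen = set() if seen_ids is None else seen_ids
    written = 0
    for tweet in source:
        if tweet["id"] in seen:
            continue  # already archived, e.g. delivered twice by the feed
        seen.add(tweet["id"])
        out.write(json.dumps(tweet, ensure_ascii=False) + "\n")
        written += 1
    return written


# In-memory stand-in for a live feed; a real harvest would read from a
# running collector and write to a file opened in append mode.
fake_feed = [
    {"id": 1, "text": "Polls open"},
    {"id": 2, "text": "Results tonight"},
    {"id": 1, "text": "Polls open"},
]
buffer = io.StringIO()
count = archive_stream(fake_feed, buffer)
print(count)
```

Storing one JSON object per line means the archive can be appended to indefinitely and read back line by line, which suits a collection that grows while the event is still unfolding.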

As Twitter grew, their ability to satisfy the requests from countless services changed. Around August 2012 they announced that version 1.0 of their API would be switched off in March 2013. About a month later they announced the release of a new version of their API. This imposed new limitations and guidelines (what they call their “Developer Rules of the Road”). I am not a developer, so I won’t attempt to explain these changes like one. For a researcher, this basically means that there is no way to do proper research on Twitter data without understanding how it works at API level, and this means understanding the limitations and possibilities this imposes on researchers.

Taking how the Twitter API works into consideration, it is not surprising that González-Bailón et al (2012) should alert us that the Twitter Search API isn’t 100% reliable, as it “over-represents the more central users and does not offer an accurate picture of peripheral activity” (“Assessing the bias in communication networks sampled from twitter”, SSRN 2185134). What’s a researcher to do? The whole butterfly colony cannot be captured with the nets most of us have available.

In April 2010, the Library of Congress and Twitter signed an agreement providing the Library with an archive of public tweets from 2006 through April 2010. The Library and Twitter agreed that Twitter would provide all public tweets on an ongoing basis under the same terms. On 4 January 2013, the Library of Congress announced an update on their Twitter collection, publishing a white paper [PDF] that summarized the Library’s work collecting Twitter (we haven’t heard of any new updates yet). There they said that
“Archiving and preserving outlets such as Twitter will enable future researchers access to a fuller picture of today’s cultural norms, dialogue, trends and events to inform scholarship, the legislative process, new works of authorship, education and other purposes.”
To get an idea of the enormity of the project, the Library’s white paper says that
“On February 28, 2012, the Library received the 2006-2010 archive through Gnip in three compressed files totaling 2.3 terabytes. When uncompressed the files total 20 terabytes. The files contained approximately 21 billion tweets, each with more than 50 accompanying metadata fields, such as place and description.
As of December 1, 2012, the Library has received more than 150 billion additional tweets and corresponding metadata, for a total including the 2006-2010 archive of approximately 170 billion tweets totaling 133.2 terabytes for two compressed copies.”
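
To put those figures in perspective, here is a rough back-of-the-envelope calculation in Python. It uses only the numbers quoted above; the one assumption is that a terabyte means 10^12 bytes, which the white paper does not specify.

```python
TB = 10**12  # assumption: decimal terabytes; the white paper does not say

# 2006-2010 archive, as quoted from the white paper
tweets_2006_2010 = 21e9
compressed_tb = 2.3
uncompressed_tb = 20

# Full collection as of 1 December 2012, as quoted from the white paper
total_tweets = 170e9
two_copies_tb = 133.2

compression_ratio = uncompressed_tb / compressed_tb
bytes_per_tweet = uncompressed_tb * TB / tweets_2006_2010
compressed_bytes_per_tweet = (two_copies_tb / 2) * TB / total_tweets

print(f"compression ratio: roughly {compression_ratio:.1f} to 1")
print(f"uncompressed size per tweet, with metadata: roughly {bytes_per_tweet:.0f} bytes")
print(f"compressed size per tweet, one copy: roughly {compressed_bytes_per_tweet:.0f} bytes")
```

The uncompressed figure is several times larger than a 140-character message, which fits the white paper’s remark that each tweet comes with more than 50 metadata fields.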

To date, none of this data is yet publicly available to researchers. This is why many of us were very excited when on 5 February 2014 Twitter announced their call for “Twitter Data Grants” [closed on 15 March 2014]. This was/is a pilot programme [Editor’s note: successful applicants were announced on April 17th]. In the call, Twitter clarified that
“For this initial pilot, we’ll select a small number of proposals to receive free datasets. We can do this thanks to Gnip, one of our certified data reseller partners. They are working with us to give selected institutions free and easy access to Twitter datasets. In addition to the data, we will also be offering opportunities for the selected institutions to collaborate with Twitter engineers and researchers.”
As Martin Hawksey pointed out at the time,
“It’s worth stressing that Twitter’s initial pilot will be limited to a small number of proposals, but those who do get access will have the opportunity to “collaborate with Twitter engineers and researchers”. This isn’t the first time Twitter have opened data to researchers having made data available for a Jisc funded project to analyse the London Riot and while I expect Twitter end up with a handful of elite researchers/institutions hopefully the pilot will be extended.”
Most researchers out there are likely not to benefit from access to huge Twitter data dumps. We are working with relatively small data sets, limited by the methods we use to collect, archive and study the data (and by our own disciplinary frameworks, [lack of] funding and other limitations). We are trying to do the talk whilst doing the walk, and conduct research on Twitter and about Twitter.

There should be no question now about how valuable Twitter data can be for researchers of perhaps all disciplines. Given the difficulty of properly collecting and analysing Twitter data as viewable from most Twitter Web and mobile clients (as most users get it) and the very short span of search results, there is the danger of losing huge amounts of valuable historical material. As Jason Leavey (2013) says, “social media presents a growing body of evidence that can inform social and economic policy”, but
“A more sophisticated and overarching approach that uses social media data as a source of primary evidence requires tools that are not yet available. Making sense of social media data in a robust fashion will require a range of different skills and disciplines to come together. This is a process already taking shape in the research community, but it could be hastened by government.”
At the moment, unlimited access to the data has been the privilege of a few lucky individuals and elite institutions.

So, why collect and share Twitter data?

In my case, Martin Hawksey’s Twitter Archive Google Spreadsheet has provided a relatively-simple method to collect some tweets from academic Twitter backchannels (for an example, start with this post and this dataset). I have been steadily collecting them for qualitative and quantitative analysis and archival and historical reasons since at least 2010.

My interest is also to share this data with the participants of the studied networks, in order to encourage collaboration, interest, curiosity, wider dissemination, awareness, reproducibility of my own findings and ideally further research. For the individual researcher there is a wealth of data out there that, within the limitations imposed by the Twitter API and the archival methods we have at our disposal, can be saved and made accessible before it disappears.

Figshare has been a brilliant addition to my Twitter data research workflow, enabling me to get a Digital Object Identifier for my uploaded outputs, and useful metrics that give me a smoke signal that I am not completely mad and alone in the world.

I believe that you should cite data in just the same way that you can cite other sources of information, such as articles and books. Sharing research data openly can have several benefits, not limited to
  • enabling easy reuse of data
  • allowing the reach of data to be measured or tracked
  • strengthening research networks and fostering exchange and collaboration.

Finally, some useful sources of information that have inspired me to share small data sets are:
…and many others…

This coming academic year with my students at City University London I am looking forward to discussing and dealing practically with the challenges and opportunities of researching, collecting, curating, sharing and preserving data such as the kind we can obtain from Twitter.

If you’ve read this far you might be interested to know that James Baker (British Library) and I will lead a workshop at the dhAHRC event ‘Promoting Interdisciplinary Engagement in the Digital Humanities’ [PDF] at the University of Oxford on 13 June 2014.

This session will offer a space to consider the relationships between research in the arts and humanities and the use and reuse of research data. Some thoughts on what research data is, the difference between available and useable data, mechanisms for sharing, and what types of sharing encourage reuse will open the session. Through structured group work, the remainder of the session will encourage participants to reflect on their own research data, to consider what they would want to keep, to share with restrictions, or to share for unrestricted reuse, and the reasons for these choices.

Update: for some recent work with a small Twitter dataset, see http://epriego.wordpress.com/2014/06/11/digital-humanities-summer-institute-some-charts-from-my-dhsi2014-archive/

Ernesto Priego is a Lecturer in Library Science at the Centre for Information Science, City University London. His blog at City is here. His research interests include comics scholarship, digital humanities, library science, online publishing, journalism, social media, alt-metrics, data research and scholarly communications.

Thursday, 12 June 2014

Tweets and The Streets: A video interview about Paolo Gerbaudo’s book

By Jamileh Kadivar, student in the MA in Social Media at the University of Westminster.

Watch the video here https://www.youtube.com/watch?v=mKeMTkqYF9I

The video I produced is an introduction to Tweets and The Streets: Social Media And Contemporary Activism, a must-read and a great book with lots of interesting insights and a strong research foundation. The book was written by Paolo Gerbaudo and published by Pluto Press in 2012.

The video includes two interviews. In the beginning, I present the main arguments of the book and introduce the author as an academic, a journalist and also an activist; then, you can watch the interviewees' replies to my questions. Finally, I present my own conclusion.

The first interview was conducted with Paolo Gerbaudo on March 2nd, 2014, by Skype. Dr. Paolo Gerbaudo, a lecturer in Digital Culture and Society at King’s College London, answers questions about the key terms in his book, such as 'choreography of assembly' and 'choreographic leaders', and then explains how he defines leadership. He also challenges techno-optimistic and techno-pessimistic views and discusses the theoretical approach he took in the book. He narrates why he selected the 2011 movements, considering that there were also movements in the years before 2011 that have used social media such as Facebook, Twitter, and YouTube as mobilizing tools. He also explains why out of so many movements, he chose the Arab Spring in Egypt, the Indignados movement in Spain, and the Occupy Wall Street movement in the US. He talks about the more recent events in Egypt and argues that what happened there in 2013 was a coup, and not a revolution or an extension of the 2011 Egyptian movement. He describes the main differences between anti-globalization movements and street protests clearly in this video.

My second interview in this video was conducted on March 5th, 2014. It is a face-to-face discussion with Dr. Miriyam Aouragh, an expert in social media and social movements, and a lecturer at the University of Westminster. She has written a review of Paolo Gerbaudo’s book. In this video, she speaks about the book's strengths and discusses her views and criticisms regarding the book in a fair and clear way.

For me, as a person without any previous experience of film production before this stimulating and challenging experiment, making a video at first seemed strange and far-fetched! At this time last year, I did not have a clue how to do it. But now, I think it was both an interesting and fun activity that added well to the theoretical aspects of our modules. Still, I would rather write 10 essays than make a video!

Last, but not least, the book is very well written and easy to read, not only for academics, but also for journalists, activists and ordinary people. I hope this video helps you to find good information about the main ideas of Tweets and The Streets. I think watching this video before starting to read the book can be helpful for getting a good overview of the key arguments.


Thursday, 15 May 2014

Vlogging: Video interview about Dhiraj Murthy’s book “Twitter: Social Communication in the Twitter Age”


By Barrie Schooling, student in the MA in Social Media at the University of Westminster

Watch the video interview here: https://www.youtube.com/watch?v=N3u4CyVStk4

Google ‘Twitter book review’ and you’ll get reviews of 1,000 page tomes reduced to 140 characters, for example “The Unbearable Lightness of Being - Beautiful and thought provoking novel of love, obsession, lust and oppression”. This sort of activity, indeed every sort of activity on Twitter from the profound to the banal, is the subject of a book called ‘Twitter: Social Communication in the Twitter Age’ by Dhiraj Murthy.

Having recently written a review of the book, I arranged to interview Dhiraj in his Goldsmiths lab to chat to him a bit more about his work and to ask questions that might help me to critically review it.

In ‘Twitter: Social Communication in the Twitter Age’, Dhiraj Murthy discusses how the platform gives ‘ordinary’ people a medium to publish user-generated ‘news/updates’. Twitter is a ‘social media’ where the ‘social’ is derived from the content being created by users rather than by a ‘traditional’ media outlet.

Traditional media content is determined by the producer, whereas the process is more democratic on Twitter. Murthy notes that ‘whoever is considered to be an expert or simply worthy of being listened to is potentially determined by consumers rather than producers’. Users can choose what information they wish to receive through the accounts they follow: ‘they can choose from a variety of sources: traditional media, individual commentators, friends, leaders in an occupational field’.

Murthy explores the notion of the ‘global village’ and others who see Twitter as the realisation of the global village, for example ‘fashion enthusiasts can interact with fashionistas in London and Paris, regardless of where they live’.

On the one hand, Twitter ‘can be thought of as a megaphone that makes public the voices/conversations of any individual or entity’; on the other, if the voices ‘reflect influence already present in traditional media’ then it isn’t truly a democratizing technology, merely a different medium for the same messages. Murthy presents an ‘event society’, but given that these ‘trending topics’ or ‘events’ can often be a mixture of a major news story alongside a celebrity story, he questions what constitutes an event. A major political rebellion is undoubtedly an ‘event’, but is what your colleague had for breakfast?

Murthy observes many changes to traditional journalism, from Twitter adding to journalists’ ‘source mix’, to historically impartial journalists giving personal viewpoints via Twitter, to the idea of ‘crowdsourcing’, and he gives examples like users looking through the great volume of MPs’ expenses for discrepancies.

The ‘citizen journalist’ is presented through coverage of the Mumbai bombings in 2008 and the US Airways flight that came down in the Hudson River in 2009, both of which were first reported via Twitter. Although the ‘citizen journalist’ was the first to report on the story in both cases, Murthy presents a case for the traditional news media still being the main reference point for verification of the story.

Throughout the book Murthy is careful not to over- or under-credit Twitter’s importance to the subject at hand, for example in saving lives following a disaster. The chapter on activism uses statistics to debunk a number of myths about the role of Twitter in the Arab Spring uprisings and the so-called ‘Twitter Revolution’.

Whilst not entirely convinced about ‘revolutions’ happening as a direct result of Twitter activity, Murthy does constantly note social changes arising as a result of its usage, and the chapter ‘Twitter and Health’ provides one such example. Throughout the chapter we are given examples of changes (and limitations), from people’s medical advice and opinion being shared via Twitter, to new support groups forming via Twitter, to the use of hashtags to filter the usually like-minded into the similarly-diagnosed.

I’d like to thank Dhiraj for allowing me to interview him and I hope you enjoy listening to his opinions on the subject.

Monday, 17 February 2014

Collecting stories, moving along

Keurkoon Phoomwittaya is a student in the Social Media MA at the University of Westminster.

They travelled with me to many places in my hometown in Thailand and flew across the sky to England. I take them on every journey I make.

Since I was young, I used to write stories about travelling experiences in my secret notebook. I kept it in my backpack. Now I have a smart phone as another object which I use for capturing moments and telling my stories via social media platforms.

Turkle says that "We think with the objects we love; we love the objects we think with" (2007:5). The things that we take around us provoke emotions toward moments that we relate to.

Not only do we use mobile phones to share life moments online, we also observe stories that our friends tell. We often imagine along with their stories, but we do not feel exactly what they feel.

Indeed, we know best our own embodied experience of being in a place, and why we choose to tell about it. The important reminder then is to see values attached in stories that people share about the space they are in (Farman, 2012). This is what I would like to bring into focus as I think of my research approach.

I am currently working on a postgraduate dissertation entitled “Girlguiding’s Use of Twitter in Storytelling”. My interest is in how people reflect on their experiences by sharing their stories via Twitter. Interestingly, Twitter’s hashtag search helps me as a researcher to find out what people say about events they are involved in.

What draws my attention are hashtags which create a collaboration of common values that people in social groups reflect through stories, even though they are in different locations. 

As a Girlguiding volunteer, I would like to investigate how using hashtags to tell stories via Twitter represents the organization’s values of giving girls and young women a space where they have fun and can be themselves.

A search for #girlguides on Twitter enables me to collect data on the values projected by messages and attached pictures. The expressions seen are of girls and young women having fun at concerts and campsites in different countries. Textual analysis will be the major approach, as it will let me see the bigger picture of how hashtags create a collective feminist identity among Girlguiding participants.

Qualitative interviews will be a minor method to let the participants reflect on how they use Twitter as a platform for sharing Girlguiding stories. 

I think meaningful research projects are reflected by being part of social groups. When I went to a Girlguides’ campsite in London for the first time, I was impressed by its green and comforting environment. However, what interests me more is the stories. 

Similarly, while people give meaning to communities they live in as part of their histories, mobile media lets us explore and tell our own experiences from every corner in which we belong.
 
References
  • Farman, J. 2012. Mobile Interface Theory: Embodied Space and Locative Media. United Kingdom: Routledge.
  • Turkle, S. 2007. Evocative Objects. United States: Massachusetts Institute of Technology.

 

Friday, 10 January 2014

Reflections on the influence of social media on privacy

Akin Olaniyan is a student in the Social Media MA at the University of Westminster. 

Making sense of the obvious tension between online visibility and privacy is never going to be a straightforward thing for me. Having worked as a reporter and a corporate communication specialist for more than two decades, I have some sense of dealing with public scrutiny of my work. Until now, social media was, for me, just another platform. If you know newspapers, I used to think, you shouldn’t have problems functioning in the new environment that digital convergence has created.

Or so I thought.

Just weeks after arriving for the Social Media MA at Westminster University, I have come to agree with danah boyd that being visible through social media can both complicate and enrich our lives (boyd, 2012). Social media networking sites like Facebook and Twitter offer new ways of engagement that have collapsed the walls of privacy, sometimes with terrible consequences. Henry Jenkins captures this well when he says, ‘when people take media into their own hands, the results can be wonderfully creative; they can also be bad news for all involved.’ For me, herein lies the irony; the thought that we would be willing to trade off a slice of our privacy for a chance to make ourselves ‘visible.’

The culture of sharing that is one of the tenets of the convergent media environment may be fraught with minefields, but Castells’ point, that, ‘In our society, the protocols of communication are not based on a sharing of culture but on the culture of sharing’ (Castells, 2009) is useful here. The new environment has given us ‘power’ to determine what we create, remix, share, anytime we want and with those we choose.

True, in the social media environment, ‘the media are no longer just what we watch, listen to or read – the media are now what we do’ (Meikle and Young, 2012). Oh! How we enjoy the newfound freedom to do away with the middleman and reach out in our network. Never mind that I performed a similar role in a newspaper. Maybe it sounds out of place to ask whether social media serves a critical need. The status updates, the likes and the sometimes meaningless chatter all serve a need. They bind us together. ‘Our playful conventions and in-jokes may create insider symbols that help groups to cohere’, as Baym (2010) notes very well.

Notwithstanding, it looks to me like Rosen’s description of the people formerly known as the audience is rather too ‘romantic’. For one, corporate media may no longer ‘own the eyeballs’ as he states, but in this process of becoming more active, we lose something important as well. Given what I have learnt in just a few weeks, boyd’s argument that ‘when people assume you share everything, they don’t ask about what you don’t share’ (boyd, 2012) sounds frightening to me, even in the era of ‘Big Brother’. We all have a way of ignoring ‘Big Brother’ until we’re caught in compromising positions.

boyd’s position brought to mind the story of the UK university students whose Facebook profiles were swiped by ratemash.com and published without permission, in the latest example of third-party misuse of online data. The all-pervasive power of both Facebook and Twitter to remove whatever exists of the thin line between the private and the public has got me taking a second look at my accounts on social media platforms.

My goal? Cut out all but the most important of my engagements online. Every text, every image and every engagement is an opportunity to say something and connect. As Baym (2010) says, “…as people appropriate the possibilities of textual media to convey social cues, create immediacy, entertain, and show off for one another, they build for themselves, build interpersonal relationships, and create social concepts….”

I realize as I do this though, that there’s a chance that I may miss out in other ways but the thought that my profile and other data can be taken, remixed and shared sounds worrying.

Maybe I’m old fashioned but I strongly think there’s a creepy feeling to having a text, say, an unflattering selfie made available on the scale that convergent media makes possible. But having seen the reaction of some of the students whose profiles were swiped by ratemash.com, I don’t know anyone who wouldn’t be embarrassed unless they were in showbiz.



References:
boyd, d. (2012): Participating in the always-on lifestyle. In: Mandiberg, M (ed.) The social media reader. New York/London: New York University Press, pp. 71-76

Jenkins, H. (2008): Convergence culture: Where old media and new media collide. New York/London: New York University Press

Castells, M. (2009): Communication power. Oxford: Oxford University Press

Meikle, G. and Young, S. (2012): Media convergence: Networked Digital media in everyday life. London: Palgrave

Baym, N. K. (2010): Personal connections in the digital age. Cambridge: Polity


Rosen, Jay: The people formerly known as the audience. In: Mandiberg, M (ed.) The social media reader. New York/London: New York University Press, pp. 13-16

Wednesday, 4 December 2013

What Social Media means to my research

By Amy Aisha Brown, PhD Blogger

After reading Dr Janet Salmons’s last post (Defining our terms: What is “social media”?), I was inspired to think about what the term means to me, but coming up with a definition is not that easy. Is social media defined by platforms, by the affordances of platforms, or by something else? Is it the online factor? Is it because it is participatory? How far does the label extend? I am excited about the #NSMNSS tweetchat we are going to have on 5 December (details here) where we can discuss some of these questions and think about what social media means to us as a group, but after failing to find a personal definition for this potentially all-encompassing term, I thought a better approach might be for me to think about what social media means or brings to my doctoral research.

On a basic level I could say, “social media is data”. That is, I use Twitter as my way into investigating how the English language in Japan is talked about. This is not a common approach in studies of language ideology, but Twitter has a massive active user base (around 10% of the Japanese population if recent statistics from in the loop are accurate), and tweets are both accessible and plentiful. In fact, tweets in Japanese mentioning the English language are so numerous that it is impossible for me to collect them all—a matter for another time. But while these features show the potential of Twitter to collect a vast amount of potentially relevant data, what is it that makes Twitter, as opposed to some other source of information, useful for undertaking my investigations?

I could have chosen to look at how the English language is talked about in Japanese newspaper reports, policy documents, or organisational websites, or used interviews or focus groups. However, what previous studies using these techniques have yet to focus on is how English is talked about across domains, from the mass media, to business, through to daily conversation. And it is at this point that Twitter comes in.

Until recently Twitter’s about page claimed it to be “a real time information network”, and the phrase still appears elsewhere on the site. However, Twitter is not just news and information, it is also a social space where people interact, comment, and chat. This social side of Twitter means that I can look at how ‘official’ information sources (newspapers, organisations, etc) talk about English and how people talk about English in response to those sources, but I can also take into consideration how it is talked about in wider conversation, something that would be near impossible using other kinds of data.





So, for me, the massive user base of Twitter and its online, real-time accessibility should not be underestimated, but what is really important is the social, collaborative, participatory side of Twitter that enables the talk about my topic (and pretty much any other) to thrive.