My presentation at the SRA and #NSMNSS event allowed me to finally meet face to face with 7 expert speakers presenting tools for social media research. It was a day of learning for me: an all-day event that left me elated and keen to get on with my research, in the knowledge that I can call on the expertise of the others if needed. Of course, my door is always open to the other speakers too. I highly recommend taking part in similar future events.
The talk was titled “Critically Engaging with Social Media Research Tools”. It was about using the tools with ethical concerns at the forefront of the social researcher’s mind, rather than relegating them to a mere paragraph in the methods section. To illustrate the fluid nature of the visualisations that the software can co-create, I decided to collect, analyse and visualise Twitter data on the hashtag #BigData. By selecting this hashtag, I also hoped to find out which people or organisations were promoting the buzz surrounding big social data.
The tools that I introduced (TAGS, YourTwapperKeeper, DMI-TCAT, Gephi and Leximancer) collect and analyse data from Twitter and YouTube, and they enable the social researcher to take part (in a limited capacity) in surveillance capitalism. Researchers are able to collect big social data from people’s lives without their knowledge or consent. I was keen to highlight that researchers who are in this position of observing others’ interactions have a duty of care to those they are researching, just as we do when applying any other research tool.
The question of who or what institution is the key influencer controlling the flow of communication on #BigData was answered by analysing 1,040,000 #BigData tweets with Leximancer. On Twitter, the key influencer around the term #BigData is Booz Allen Hamilton, a contractor that supplies staff to the National Security Agency in the United States. Booz Allen Hamilton is the contractor that employed Edward Snowden.
This visualisation was presented with the caveat that the graphs and images shown are the result of numerous steps and decisions by the researcher, guided by certain principles from social network analysis (SNA) and graph theory. What was presented were a few of the techniques and tools of data mining and analytics, with machine learning and automation in Leximancer. Insights that ‘come’ from the data and the application of algorithms need to be validated in the light of an informed understanding of the ‘never raw data’ position. The existence of this ‘data’ is the result of a long chain of requirements, goals and a shift in the wider political economy: surveillance capitalism. The ‘insights’ are at the macro level, devoid of this context.
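To make the point about researcher decisions concrete, here is a minimal sketch in Python. It uses an invented handful of tweets (the account names and texts are hypothetical and not taken from the #BigData collection), and it is not how Leximancer or Gephi work internally. It simply ranks accounts in two plausible ways and shows that the ‘key influencer’ depends on which measure the researcher picks.

```python
import re
from collections import Counter

# Hypothetical toy data: (author, tweet text) pairs invented for illustration.
TWEETS = [
    ("alice", "Great report from @vendor_x on #BigData"),
    ("bob", "@vendor_x webinar today #BigData"),
    ("carol", "Reading @vendor_x and @dave on #BigData ethics"),
    ("dave", "Thread on consent #BigData"),
    ("dave", "More on surveillance #BigData @alice"),
    ("dave", "Slides are up #BigData"),
    ("dave", "Notes from the session #BigData @bob"),
]

# Simplified pattern for @mentions; real handles have further rules.
MENTION = re.compile(r"@(\w+)")


def mention_edges(tweets):
    """Return one (author, mentioned_account) pair per @mention."""
    return [
        (author, name.lower())
        for author, text in tweets
        for name in MENTION.findall(text)
    ]


def rank_by_mentions_received(tweets):
    """Rank accounts by how often others mention them (in-degree)."""
    return Counter(target for _, target in mention_edges(tweets)).most_common()


def rank_by_tweets_sent(tweets):
    """Rank accounts by how many tweets they posted (volume)."""
    return Counter(author for author, _ in tweets).most_common()


if __name__ == "__main__":
    print("Most mentioned:", rank_by_mentions_received(TWEETS)[0])
    print("Most active:   ", rank_by_tweets_sent(TWEETS)[0])
```

In this toy data the most mentioned account and the most active account are different, so each measure would produce a different ‘key influencer’ in the final graphic. That choice belongs to the researcher, not to the data.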
Big/social data does not represent what we think it does. It represents something, and this is worth investigating. We are looking at the various ways in which populations are defined, managed, and governed. The modelling algorithms used to visualise the social data know nothing about letters, nothing about narrative form, nothing about people.
The algorithm’s lack of knowledge of semantic meaning, and particularly its lack of knowledge of social media as a form or genre, lets it point us to a very different model of the social. Such ‘Reading Machines’ are engaged in datafication of the social. The concern with datafication is that, as it attempts to describe a certain state of affairs, it flattens human experience. It is this flattening by computer-aided approaches to researching social media platforms that requires caution, and it can be ameliorated by applying ethnographic approaches to collecting social media data from Twitter and other platforms.
A major worry is that designers, developers, and policy makers will continue to take big/social data at face value, as an object or representation of a truth that can be extracted from it and that reflects the social.
We are glimpsing the various ways in which we are to be defined, managed, and governed. As social researchers we too engage in defining, managing and governing. The first ethical step when using the tools listed below is to have a carefully formulated research question and to select your targets carefully.
What follows is the list of tools referred to during the talk and links to each tool with installation support where provided. Also found here: https://snacda.com
- TAGS – available here – https://tags.hawksey.info/ by Martin Hawksey. Contains useful instructions and videos to help with setting it up.
The only concern is that Twitter now requires you to not only have a Twitter account but also have installed their app on your phone and provide them with your phone number and verify it. So it’s “Free”! Just provide us with your entire identity and all the data that goes with it!
- YOURTWAPPERKEEPER – available here – https://github.com/540co/yourTwapperKeeper
It has been seriously undermined by changes to Twitter’s rules and regulations, and its creator John O’Brien III seems to have sold it to Hootsuite and left it at that. It may now be in contravention of Twitter’s Terms of Service.
- DMI-TCAT – available here – https://github.com/digitalmethodsinitiative/dmi-tcat
The Digital Methods Initiative Twitter Capture and Analysis Toolset (DMI-TCAT) allows you to retrieve and collect tweets from Twitter and to analyse them in various ways. Please check https://github.com/digitalmethodsinitiative/dmi-tcat/wiki for further information and installation instructions.
This software is highly recommended – it also has a version that can access YouTube – https://github.com/bernorieder/YouTube-Data-Tools
- GEPHI – available here – https://gephi.org/
GEPHI can now be used to collect Twitter data – and operates on Windows and Apple operating systems – just be very careful with Java updates and incompatible versions of macOS.
- TROPES – available here – http://www.semantic-knowledge.com/tropes.htm
Designed for Information Science, Market Research, Sociological Analysis and Scientific studies, Tropes is a Natural Language Processing and Semantic Classification software that “guarantees” pertinence and quality in Text Analysis.
- LEXIMANCER – available here – http://info.leximancer.com/