Friday, 22 April 2016

Who Tweets? Making Twitter Data ‘Useful’ for Social Science


Dr Luke Sloan is a Senior Lecturer in Quantitative Methods, Deputy Director of Cardiff Q-Step and a member of the Collaborative Online Social Media Observatory (COSMOS: www.cosmosproject.net). He is based in the School of Social Sciences at Cardiff University and his research focuses on the development of demographic proxies for Twitter data and understanding how social media data can augment traditional modes of social scientific analysis. @drlukesloan

A perennial criticism of Twitter data is that it’s missing many of the variables that we find interesting as social scientists and, because of this, it will never be a viable source of data for social scientific analysis. We are anchored to the practices of survey methodology in which a question is asked and answered, thus we ensure that the researcher collects the relevant demographic information allowing us to compare gender/ethnic/socio-economic groups. This is the bread and butter of social science.
In contrast, social media data is naturally occurring; it is not elicited! Because of this it is unfocused, messy and does not neatly address a pre-conceived research question. But it is a rich source of information on attitudes and provides insights into immediate reactions following key events. It’s been used to predict elections, box office revenue and even to calculate the epicentre of an earthquake. So clearly we shouldn’t be so quick to dismiss this data as useless, particularly if we are creative and innovative in how we conceptualise the manner in which demographic data may manifest and thus open this data up to social scientific analysis.
Imagine that you are walking down the street and have decided that today you are going to guess the demographic characteristics of the people that you see. The only rule is that you cannot ask them outright: you must observe their behaviour without being obtrusive. How might you work out someone’s gender? Well, perhaps you overhear someone shouting his or her name. What about their occupation? Maybe they have an ID badge or are carrying tools. What about their age? Well, we all make guesses about age based on appearance, often at the risk of offending someone. The point is that through the passive uptake of incidental information which is there to be analysed (and which you have not elicited!) you can tell quite a bit about a person.

Now let’s consider this in the context of Twitter. People put their name on Twitter, thus allowing us to derive a proxy for their gender. For those who have geo-tagging switched on we can tell where they were when they tweeted, or we can use profile information to work out their home town. If we have enough time we can even look at the places to which they refer in their tweets. We know about their hobbies as they report on their leisure activities and we know a bit about their work if they report on it via social media. Are they employed? Well, we can have a look at whether they’re complaining about work, about colleagues or about the printer breaking down (‘again!’). When we look closely enough we are flooded with ‘signatures’ that offer us an indication of characteristics that would typically be found in the demographics section of a survey.

The sticking point is that we can’t derive this information for all tweeters and not all the proxies are as reliable as others. First names are actually quite an accurate proxy for gender as identity play is a minority pursuit. As long as you have stringent classification rules and accept that around 52% of UK users can’t be classified, you still have information for 48%* (around 600,000 successfully identified users). You could think of this 48% as a sample of Twitter users which is analogous to a survey sample, although not randomly sampled… but even then do we have any reason to think that the users we have been able to identify are substantively different to those we can’t?
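To make the idea of ‘stringent classification rules’ a little more concrete, here is a minimal sketch in Python. It is an illustration only, not the method used in the research described above: the two name lists are tiny, made-up stand-ins for a proper name database, and the function names (classify_gender, coverage) are hypothetical. The rule is deliberately conservative, so anything ambiguous is left unclassified rather than guessed.

```python
from collections import Counter

# Hypothetical, deliberately tiny lookup tables (an assumption for illustration).
# A real study would need a large, curated database of first names.
FEMALE_NAMES = {"emma", "sarah", "olivia", "alex"}
MALE_NAMES = {"luke", "james", "david", "alex"}


def classify_gender(display_name):
    """Return 'female', 'male' or 'unclassified' for a profile display name."""
    tokens = display_name.strip().split()
    if not tokens:
        return "unclassified"
    first = tokens[0].lower()
    # Stringent rule 1: the first token must consist of letters only.
    if not first.isalpha():
        return "unclassified"
    in_female = first in FEMALE_NAMES
    in_male = first in MALE_NAMES
    # Stringent rule 2: names found in both lists (or neither) stay unclassified.
    if in_female and not in_male:
        return "female"
    if in_male and not in_female:
        return "male"
    return "unclassified"


def coverage(display_names):
    """Share of display names that received a gender proxy (0.0 to 1.0)."""
    counts = Counter(classify_gender(name) for name in display_names)
    total = sum(counts.values())
    classified = total - counts["unclassified"]
    return classified / total if total else 0.0
```

Calling coverage() on a list of display names gives the proportion that could be classified, which is the same kind of figure as the 48% quoted above. The design choice worth noting is that the sketch trades coverage for accuracy: a name such as ‘Alex’ appears in both lists and is therefore not assigned to either group.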

The bottom line is that it is possible to derive important demographic information from Twitter data if we’re prepared to think creatively. The methods will get better and programmes of work will emerge which allow the confirmation of proxy demographic reliability. We’re only a few metres off the ground on our climb up this new methodological edifice, but seeking out a viable trail enables others to follow and establish safer, more secure routes.

Friday, 15 April 2016

They flirt, they share porn and they gossip

This is another blog from the UCL project Why We Post, written by Juliano Spyer (@jasper). The project is researching the uses and consequences of social media around the globe. The online course has now finished, but will be restarted in the coming months. There are also seven more books to be launched, which can be downloaded for free via Open Access. Follow them on Twitter @UCLWhyWePost for more information and check out their website to learn about the different discoveries they have made.

The last four months of 2015 were tough. I was locking myself in a claustrophobic student carrel every day, spending 9 hours staring at a computer screen but not being able to finish the final draft of my book. I began having trouble sleeping and pictured a clock ticking everywhere I went. But the source of this anxiety – as I realized later – was a prolonged and unconscious struggle to say something about my research while the evidence was pointing the other way. I wanted very badly to conclude my book by saying that this poor settlement in Brazil had a lot of problems, but that because of social media things are changing for the better. But they aren’t.

This realization came after a long conversation with a friend who kindly took the time to read a previous draft of my book. The last chapter is about the effects of social media on relationships between people who are not relatives or friends. I did not notice this before, but I ordered the cases in a way that constructed an argument that social media was empowering locals to protest against injustices. But this friend summarized her impression of that chapter by saying that despite all this fuss about social mobility in Brazil, people are still living as second-rate citizens. If a relative is murdered, not only do they have to accept that the police will not investigate: they also have to keep quiet or risk being subjected to more violence.

The internet, and particularly social media, is everywhere in this settlement. Teenagers and young people are crazy about it but adults and older folks also share the excitement. There is the enchantment with the new possibilities of being in touch with people and also the pride related to having a computer and being able to use it. It shows that they are not as “ignorant” [illiterate] as others might have thought and the PC looks good in the living room next to the flat screen TV. But how much of this represents real change and how much is – as my friend’s commentary indicates – just an appearance of change?

In short, I wanted to sympathise with “the oppressed” and also show the internet is empowering. And in order to claim that, I denied the basic evidence of what they do with social media. It is not about learning, though that happens. (For instance, they are much more interested in reading and writing in order to better use things like Facebook and WhatsApp.) However, their reason for wanting to be on social media is mostly to flirt, to share some (very) gruesome videos and to spy on one another and gossip about it.

Evangelical Christianity is much more clearly responsible for “positive” change there than the internet or social media: the Protestant ideology promotes literacy and education, helps people get and keep their jobs, and reduces the incidence of alcoholism and family violence. Social media, on the other hand, is usually not used for opening and expanding access to information and to new relationships, but for restoring and strengthening local networks. Facebook and WhatsApp are in some cases a possibility for young people to harness the desire to study and move beyond their subordinate position in society, but they are also intensely used for social control – i.e. for spying and spreading rumours attacking people who want to challenge conformity.

The picture I have now is not as neat and “positive”. But perhaps the best contribution anthropological research has to offer is just that: to challenge generalizations and expose how contradictory human relations can be.