The second session at the Knowledge Exchange Seminar on quantitative
methods on the 26th of September was given by Grant Blank of the OII.
The topic was Populations and Sampling, and
he asked three questions: What is the “population” on social media platforms? How
do platforms differ in population characteristics? How can we select cases or
sample on social media?
One of the key
issues in sampling online is that it’s difficult to develop a sampling
frame; Grant pointed out that a biased
sampling frame is unavoidable in much online research. Despite
the potential problems, however, the advantages of online data collection often outweigh
the challenges, not least because it’s cheap and fast.
Since Twitter
data are so easy to collect, much of the discussion following the session
centred on the challenges of sampling tweets.
How can we get a random, representative sample of tweets, especially if we’re
interested in more than just a snapshot in time? It seems to me that
a potential aim for the network might be to put together some guidelines on sampling from Twitter
for new researchers. Again, the question was
raised about what kinds of questions Twitter data can really help us to answer,
if we know that Twitter users are not representative of the whole population
and that even getting a random, representative sample of tweets is problematic.
Some case studies and examples of research questions where Twitter data have
been used to good effect could also be helpful to network members.
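To make the sampling question a little more concrete, here is a minimal sketch of reservoir sampling, one standard way to draw a uniform random sample of fixed size from a stream whose length isn't known in advance. It was not presented at the session; the function names and the made-up tweet records are my own assumptions for illustration. Note that it only gives a uniform sample of the tweets you actually collected, so it does nothing to fix a biased collection in the first place.

```python
import random
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return up to k items chosen uniformly at random from the stream."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot in 0..i (inclusive); the new item is kept only
            # if that slot falls inside the reservoir.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


def fake_tweets(n: int) -> Iterator[dict]:
    """Hypothetical stand-in for a stream of collected tweets."""
    for i in range(n):
        yield {"id": i, "text": f"tweet {i}"}


sample = reservoir_sample(fake_tweets(10_000), k=100, seed=42)
```

Because the sample is built in a single pass, the same approach could in principle run over a long collection period rather than a single snapshot, although whether the underlying collection is representative remains a separate problem.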
Little time
was spent discussing sampling from other social media platforms, but an
interesting reference was provided: Gjoka et al. (2010), which promotes a
random walk technique for obtaining an unbiased sample of users of social network sites;
see: