The second session at the Knowledge Exchange Seminar on quantitative methods on the 26th of September was given by Grant Blank of the OII. The topic was Populations and Sampling, and he asked three questions: What is the “population” on social media platforms? How do platforms differ in population characteristics? How can we select cases or sample on social media?
A key issue in online sampling is that it’s difficult to develop a sampling frame: Grant pointed out that a biased sampling frame is unavoidable in much online research. Despite these potential problems, the advantages of online data collection often outweigh the challenges, not least because it’s cheap and fast.
Since Twitter data are so easy to collect, much of the discussion following the session was about the challenges of sampling tweets. How can we get a random, representative sample of tweets, especially if we’re interested in more than just a snapshot in time? It seems to me that a potential aim for the network might be to put together some guidelines on sampling from Twitter for new researchers. Again, the question was raised of what kinds of questions Twitter data can really help us to answer, given that Twitter users are not representative of the whole population and that even getting a random, representative sample of tweets is problematic. Some case studies and examples of research questions where Twitter data have been used to good effect could also be helpful to network members.
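To make the "more than a snapshot" problem a little more concrete, here is a minimal sketch of reservoir sampling, one standard way to draw a uniform random sample from a stream whose total length isn't known in advance. This was not presented at the session; the function name and the toy tweet IDs are hypothetical, and the sketch assumes you already have an iterable of collected tweets. It only makes the sample uniform over whatever was collected, so it does nothing about bias in the collection itself or in who uses Twitter.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return k items chosen uniformly at random from a stream of unknown length.

    If the stream holds fewer than k items, all of them are returned.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-indexed) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Hypothetical usage: tweet IDs arriving over a long collection period.
collected_tweet_ids = (f"tweet_{n}" for n in range(100_000))
sample = reservoir_sample(collected_tweet_ids, k=500, seed=42)
```

Because every collected tweet ends up in the sample with equal probability, busy periods contribute proportionally more tweets; if equal coverage of each day or week matters, sampling within time strata would probably be a better fit.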
Little time was spent discussing sampling from other social media platforms, but an interesting reference was provided: Gjoka et al. (2010), which promotes a random walk technique to obtain an unbiased sample of users of social network sites, see: