Thursday, 7 July 2016

Using Big Data to Solve Social Science Problems

Curtis Jessop is a Senior Researcher at NatCen Social Research and is the Network Lead for the NSMNSS network.

On Wednesday 29th June I attended a roundtable hosted by our network partners SAGE on using big data to solve social science problems. It was a great day, with contributions from leading researchers and lots of discussion of some of the key issues of working with big data in social science.

Jane Elliott began with an overview of the ESRC’s Big Data Network. She identified the difficulties with data access that earlier phases had faced, but also highlighted key challenges that big data social science currently faces:

1. Methodological
  • Can we apply the same qualitative techniques/statistical inferences we have used in the past?
  • Are social scientists (falling) behind in using machine learning & algorithms? What are the implications of these methods?
2. Relevance of research
  • Making sure we use big data to answer pertinent social science questions, and not just focus on methods
3. Ethics at a macro & micro level
  • Working ethically with big data - data security, anonymity, informed consent & data ownership
  • What are the implications of a ‘big data society’/algorithm-led decision making?

New methods, tools and techniques for big data research

Giuseppe Veltri outlined how data-driven science differs from ‘traditional’ social science research: it generates hypotheses and insights from the data rather than from theory, combining abductive, inductive & deductive approaches. Further, Phillip Brooker identified a tension in big data analysis between wanting to use qualitative research approaches and working with data on a scale that requires numerical treatment. As a result, social scientists need to work with ‘unfamiliar’ techniques and software.

Tools for Big Data analysis

It was generally agreed that existing software is not fit for addressing academic/social science research questions. Also, tools offered by commercial companies are often ‘black boxes’, whereas social scientists need to be transparent about the algorithms they use, as these form part of the methodology.

Many at the roundtable have therefore developed their own tools (e.g. COSMOS, TextonicsChorus, & Method52 from CASM) to enable them to conduct analysis in the way they wanted. However, it was felt there was still some way to go – many of these tools are ‘in-house’, and ongoing funding/support is needed to develop something more stable, well-supported, and ‘outward facing’.

Interdisciplinary working

One approach to addressing the challenges of big data analysis is working in interdisciplinary teams (in particular linking social & computer science departments). Luke Sloan and Mark Carrigan identified that the key challenge at a ‘human level’ is ensuring a common understanding of language; once that was in place, it was easy to have an open discussion and there were rarely disagreements. Mark argued that the key was not necessarily making sure that everyone had the same definitions, but that there was an understanding that different fields may have different perspectives.

Mark Kennedy, based on his experiences at the Data Science Institute, emphasised the importance of ‘getting excited’ about the right research question, not just focusing on the technology, and then building a team around the skills needed to address it.

However, attendees felt that there were structural barriers to interdisciplinary working in academia – departmental silos, geography, navigating different funding bodies, finding journals to publish in, and demonstrating value for the REF were all recognised as problems, although it was also mentioned that funding increasingly supported this approach.

Training in the social sciences

Quite early in the discussion, the question was raised: if there is such a clear skills gap in the social sciences, why have universities not responded to it?

Although it was accepted that training needed to address big data methods, there were differing opinions on how feasible this might be. Adding new techniques to methods courses was welcomed, but to what extent was this achievable when these courses are already packed with ‘traditional’ methods? Further, given the relative rarity of established social scientists with this skill-set, who would provide this teaching?

Although new students were felt to be open to using Python or R and new statistical techniques, the scarcity of trainers with the skills to teach both programming and its application within the social sciences was again identified as a problem. Giving students (and academics) access to data science training materials that are framed by social science problems, with relevant dummy data to work with, was suggested as a way to start addressing this.

Answering social science questions with Big Data

While discussing his own research, Slava Mikhaylov highlighted that a good way to make an impact is to aim to solve a problem rather than to start with a research question. This was echoed by Carl Miller, who outlined some principles that Demos follow for making an impact:
  • Look beyond academic funders – if research is funded by a government department, they’re going to have to listen to it!
  • Ask the right question – what is interesting to a researcher vs. a policy maker
  • Answer quickly – policy interests change, and research won’t make an impact if everyone’s moved on
  • Diversify outputs – can they be real-time, interactive, engaging?
  • Networking – who are the champions of big data research?

Carl emphasised that this was just the approach that Demos used, and may not be appropriate for all research or audiences. He also mentioned that, in a new discipline, you need to work hard to be responsible and transparent about what your research doesn’t do or say.

Ethics of research using Big Data

Anne Alexander differentiated between the ethics of research using big data and the ethics of doing research in a networked world.

On the latter, Anne felt that there has not been enough reflection on the implications of the ‘datafication’ of human interaction, and that we need to de-mystify these processes and consider what the use of machine learning/algorithms means for society (e.g. their potential for discrimination).

Anne emphasised the need to take into consideration the public’s views when considering Big Data research, a point reinforced by Steve Ginnis, whose work at Ipsos MORI on developing ethical guidelines for social media research drew on public ethics, existing industry guidelines and legal frameworks.

Steve’s research identified that the public both have low awareness of, and are not keen on, their social media data being used for research. This was not just due to concerns about privacy/anonymization – people were uncomfortable with being profiled and its possible implications.

That said, participants were willing to weigh up the risks and benefits, and context (who is doing the research and why) was important. Nonetheless, the ‘fundamentals’ (consent, what information, anonymization, etc.) played a much larger role in whether they felt research using social data was appropriate.

Both Anne & Steve emphasised that ethics is an ongoing process, not a one-off event at the start of a project – ethical issues need to be considered at the collection, analysis and publication stages of the research cycle.

Some concluding thoughts

Carl Miller argued that, in the context of pressure for evidence-based policy, digital by default, and the open data initiative, there has never been a better time for social scientists to make an impact with big data research.

Wednesday’s session demonstrated how far big data analysis in the social sciences has come over recent years and it is impressive to hear how much work has been put into developing the tools and methods to mould this rich, but novel, form of data into social insights.

However, the session also showed that there are a number of areas that still need to be addressed if we are to make the most of big data:
  • Access to large data sets continues to be an issue, be they proprietary, public, or administrative. We need to bargain collectively to talk to large, often global, actors and argue for academic access.
  • There is a skills gap among social scientists for analysing big data, and support is needed to help develop the required methodological and programming skills.
  • The interdisciplinary working required for big data analysis can be challenging, and we need to work to enable effective collaboration.
  • Developing an ethical approach to big data analysis is challenging given its novelty, variety, and changing nature. Any framework needs to provide practical guidance to researchers while remaining flexible and responsive to changing contexts.
  • Available tools for big data analysis can be expensive, lack transparency, or be inappropriate for social science research. A maintained central library of available tools, with appropriate documentation and guidance, could be extremely useful.
