#NSMNSS: Using “Small Data” to Improve the Use of “Big Data”

Friday, 28 February 2014

This post was first published on Survey Post on Feb. 3rd, 2014.

Recently, I attended two statistical events in the Washington, DC, area: the 23rd Morris Hansen Lecture, “Envisioning the 2030 U.S. Census,” and the SAMSI workshop “Computational Methods for Censuses and Surveys.” “Big data” was a popular keyword at both events. It stirred up discussion of how big data sources (such as administrative records and online data) could be used for current government statistics, especially when combined with traditional survey data.

Statisticians are exploring new ways to use big data. The U.S. Census Bureau has begun investigating the use of administrative records in the 2020 Census. The National Center for Health Statistics (NCHS) has identified research opportunities that combine multiple data sources. University-based researchers have launched studies on using Google Trends and other online data in small area estimation.

While big data dominated the discussion at these events, I started thinking more about “small data.” Can small data help us make better use of big data? Here are some of my thoughts.

Applying a conventional sampling-based approach to big data: more and more administrative records are collected electronically, and statisticians are excited about using these records, which may contain information on the entire population, for analytic purposes. The literature of the past two decades has discussed the advantages of administrative records extensively. Processing administrative records, however, can be quite time-consuming, and running analyses on such large datasets can be cumbersome because of the sheer data volume. This is especially true when analysts use conventional statistical software such as SAS, Stata, and R, with which handling, storing, and analyzing these data becomes increasingly complex. The question is: is there a way to reduce the data volume and increase computational speed? Applying conventional sampling-based approaches (e.g. optimal sampling, calibration weighting) may make big data smaller and more manageable while allowing researchers to maintain decent data quality.
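
A minimal Python sketch of this idea follows, under stated assumptions. The “administrative file” is simulated, and every record carries a group code known for the whole file. Post-stratification, the simplest form of calibration weighting, stands in for the more general calibration methods mentioned above. The sketch draws a 1% simple random sample and calibrates its weights so that weighted group counts match counts tabulated from the full file. The names (population, pop_counts, poststratified_mean) are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter

random.seed(2014)

# Hypothetical "administrative file": each record has a group code known for
# every record (e.g. region) and an outcome y we want to summarise.
GROUP_MEANS = {"A": 10.0, "B": 20.0, "C": 35.0}
population = [
    (g, random.gauss(GROUP_MEANS[g], 5.0))
    for g in random.choices(["A", "B", "C"], weights=[0.5, 0.3, 0.2], k=200_000)
]

# Control totals: group counts tabulated once from the full file (a cheap pass).
pop_counts = Counter(g for g, _ in population)

def poststratified_mean(sample, pop_counts):
    """Post-stratification, the simplest calibration estimator: each sampled
    record gets weight N_g / n_g, so weighted group counts equal pop_counts.
    Assumes every group in pop_counts appears in the sample."""
    n_by_group = Counter(g for g, _ in sample)
    weighted_total = sum(pop_counts[g] / n_by_group[g] * y for g, y in sample)
    return weighted_total / sum(pop_counts.values())

# Analyse a 1% simple random sample instead of all 200,000 records.
sample = random.sample(population, k=2_000)

full_mean = sum(y for _, y in population) / len(population)
srs_mean = sum(y for _, y in sample) / len(sample)
cal_mean = poststratified_mean(sample, pop_counts)
print(f"full file: {full_mean:.3f}  SRS: {srs_mean:.3f}  calibrated: {cal_mean:.3f}")
```

Tabulating control totals takes a single pass over the file, and the heavier analysis then runs on the small sample. How much calibration helps depends on how strongly the calibration variables predict the outcome.
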
Combining non-probability sample data with probability sample data: many big data sources, such as data collected by Google, Twitter, or Facebook, are not census (population) data, so we may treat them as non-probability samples. Elements enter these datasets arbitrarily, and there is no way to estimate the probability that each element in the population will be included. Nor is it guaranteed that every element has a chance of being included. This makes it impossible to assess either the validity (typically measured in terms of “bias”) or the reliability (typically measured in terms of “variance”) of the data. One way to make such data more representative of the entire population is to combine them with probability sample data (e.g. survey data), which can be relatively small. This approach can also help us estimate sampling variability and identify potential bias in big data.
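
One common way to do this is sketched below in Python, under loudly stated assumptions. The population, the opt-in “big data” source, and all rates are simulated, and the small probability sample is a simple random sample. The adjustment reweights the big data so that their age distribution matches the distribution estimated from the survey; it is only one of several possible combination methods. It removes bias only if, within each age group, people in the big data resemble those who are not in it, and real data may violate that assumption. All names (P_ONLINE, reweight_to_survey, and so on) are hypothetical.

```python
import random
from collections import Counter

random.seed(7)

# Hypothetical population: age group drives both the outcome y (0/1) and the
# chance of showing up in an opt-in online source.
AGE_SHARES = {"18-34": 0.3, "35-64": 0.5, "65+": 0.2}
P_OUTCOME = {"18-34": 0.6, "35-64": 0.4, "65+": 0.2}    # P(y = 1 | age)
P_ONLINE = {"18-34": 0.10, "35-64": 0.04, "65+": 0.01}  # P(in big data | age)

groups = random.choices(list(AGE_SHARES), weights=list(AGE_SHARES.values()), k=100_000)
population = [(g, int(random.random() < P_OUTCOME[g])) for g in groups]
true_mean = sum(y for _, y in population) / len(population)

# Non-probability "big data": inclusion depends on age; the analyst never sees P_ONLINE.
big = [(g, y) for g, y in population if random.random() < P_ONLINE[g]]
naive_mean = sum(y for _, y in big) / len(big)

# Small probability sample: simple random sample, so every design weight is N / n.
survey = random.sample(population, k=1_000)
design_weight = len(population) / len(survey)
survey_mean = sum(y for _, y in survey) / len(survey)
est_counts = Counter()
for g, _ in survey:
    est_counts[g] += design_weight  # survey estimate of the population count per age group

def reweight_to_survey(big, est_counts):
    """Weight each big-data record by est_N_g / n_g so the big data reproduce the
    survey-estimated age distribution. Assumes every group in est_counts occurs in big."""
    n_g = Counter(g for g, _ in big)
    weighted_total = sum(est_counts[g] / n_g[g] * y for g, y in big)
    return weighted_total / sum(est_counts.values())

adjusted_mean = reweight_to_survey(big, est_counts)
print(f"true: {true_mean:.3f}  naive big data: {naive_mean:.3f}  "
      f"survey only: {survey_mean:.3f}  adjusted: {adjusted_mean:.3f}")
```

Comparing naive_mean with survey_mean, the survey's own direct estimate of the same outcome, is a simple check for bias in the big data. Because the survey has known design weights, the variability of the adjusted estimate can also be estimated, for example by resampling the survey.
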
Using high-quality small data to measure and adjust for errors in big data: big data are not only non-representative of the target population but can also carry substantial measurement error, because the construct behind a particular measure in these data can differ from the construct that analysts require. To evaluate errors in big data and improve precision, small survey data can be collected for validation. Take the National Health Interview Survey (NHIS) as an example. It is a household interview survey that collects only self-reported data. To improve analyses of the NHIS self-reported data, an imputation-based strategy was implemented that uses clinical information from an examination-based health survey, the National Health and Nutrition Examination Survey (NHANES), to predict clinical values from self-reported values and covariates. Estimates of health measures based on the multiply imputed clinical values differ from those based on the NHIS self-reported data alone. They also have smaller estimated standard errors than estimates based solely on the NHANES clinical data. Similarly, we may assess potential errors in big data by using a more sophisticated and accurate small survey.
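
The imputation idea can be sketched in Python as follows. This is a deliberately simplified illustration, not the NHIS/NHANES procedure itself. All data are simulated, a single self-reported measure is the only predictor (no covariates), and the model is ordinary least squares. Each imputation adds random residual noise but does not draw new regression coefficients, so it understates between-imputation variance compared with proper multiple imputation. The small validation survey has both self-reported and clinical values, while the large file has only self-reports. Function and variable names are hypothetical.

```python
import math
import random
import statistics

random.seed(11)

def simulate_person():
    """Hypothetical measure: a clinical value plus a self-report that is
    systematically too low and noisy."""
    clinical = random.gauss(120.0, 15.0)
    self_report = clinical - 8.0 + random.gauss(0.0, 6.0)
    return self_report, clinical

def fit_ols(xs, ys):
    """Simple linear regression of ys on xs; returns intercept, slope and the
    residual standard deviation (n - 2 degrees of freedom)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, math.sqrt(ss_res / (len(xs) - 2))

# Small examination survey: both self-reported and clinical values observed.
small = [simulate_person() for _ in range(500)]
a, b, s = fit_ols([sr for sr, _ in small], [c for _, c in small])

# Large interview file: only self-reports are used in the analysis
# (clinical values are kept here solely to check the result).
big = [simulate_person() for _ in range(50_000)]
big_self_report = [sr for sr, _ in big]
true_mean = statistics.fmean(c for _, c in big)
naive_mean = statistics.fmean(big_self_report)

# M imputations of the clinical value for every record in the large file.
M = 5
imputed_means = []
for _ in range(M):
    draws = [a + b * x + random.gauss(0.0, s) for x in big_self_report]
    imputed_means.append(statistics.fmean(draws))
mi_estimate = statistics.fmean(imputed_means)  # average of the M point estimates

print(f"true: {true_mean:.2f}  self-report only: {naive_mean:.2f}  imputed: {mi_estimate:.2f}")
```

The combined point estimate is the average over imputations (Rubin's combining rule for a mean). A full analysis would also combine the within- and between-imputation variances.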

While big data provides us with massive and timely information from various sources (e.g. social media, administrative records), small data is simple, easy to collect and process, and can be more accurate and representative. Can small data help you deal with your big data problems?

Dan Liao is a research statistician at RTI International. She currently works on multiple aspects of data processing and analysis for large, multistage surveys of health care in the United States, including sampling design, calibration weighting, data editing and imputation, statistical disclosure control, and the analysis of survey data. Her survey research interests include multiphase survey designs, combining survey and administrative data, domain estimation, calibration weighting, and regression diagnostics for complex survey data. Dan has a PhD in Survey Methodology from the Joint Program in Survey Methodology at the University of Maryland and has published research on regression diagnostics, calibration weighting, and predictive modeling.
