This post was first published on Survey Post on Feb. 3rd, 2014.
Recently, I attended two statistical events in the Washington, DC, area: the 23rd Morris Hansen Lecture, “Envisioning the 2030 U.S. Census,” and the SAMSI workshop on “Computational Methods for Censuses and Surveys.” “Big data” was a popular keyword at both events. It prompted discussion of how such data (for example, administrative records and online data sources) can be used to produce current government statistics, especially when big data is combined with traditional survey data.
Statisticians are exploring new ways to use big data. The U.S. Census Bureau has begun investigating the use of administrative records in the 2020 Census. The National Center for Health Statistics (NCHS) has identified research opportunities in combining multiple data sources. University-based researchers have launched studies on using Google Trends and other online data in small area estimation.
When big data dominated the mainstream discussion at these events, I started thinking more about “small data.” Can small data help us make better use of big data? Here are some of my thoughts.
- Applying a conventional sampling-based approach to big data: more and more administrative records are collected electronically, and statisticians are excited about using these records, which may contain information on the entire population, for analytic purposes. The literature of the past two decades has extensively discussed the advantages of administrative records. Processing administrative records, however, can be quite time-consuming, and the sheer data volume can make analyses cumbersome to run. This is especially true when analysts use conventional statistical software such as SAS, Stata, and R, which makes these data increasingly difficult to handle, store, and analyze. The question is: is there a way to reduce the data volume and increase computational speed? Applying conventional sampling-based approaches (e.g., optimal sampling and calibration weighting) may make big data smaller and more manageable while still allowing researchers to maintain decent data quality.
- Combining non-probability sample data with probability sample data: many big data sources, such as data collected by Google, Twitter, and Facebook, are not census (population) data, so we may treat them as non-probability samples. Elements enter these datasets arbitrarily, and there is no way to estimate the probability that each element in the population will be included. Nor is every element guaranteed a chance of inclusion, which makes it impossible to assess either the validity (usually measured in terms of “bias”) or the reliability (usually measured in terms of “variance”) of the data. One way to make such data more representative of the entire population is to combine them with probability sample data (e.g., survey data), which can be relatively small. This approach can also help us estimate sampling variability and identify potential bias in big data.
- Using high-quality small data to measure and adjust for errors in big data: big data is not only unrepresentative of the target population but can also carry substantial measurement error, because the construct behind a particular measure in these data can differ from the construct that analysts require. To evaluate errors in big data and improve precision, small survey data can be collected for validation. Take the National Health Interview Survey (NHIS) as an example. It is a household interview survey with only self-reported data. To improve analyses of the NHIS self-reported data, an imputation-based strategy was implemented that uses clinical information from an examination-based health survey (the National Health and Nutrition Examination Survey, NHANES) to predict clinical values from self-reported values and covariates. Estimates of health measures based on the multiply imputed clinical values differ from those based on the NHIS self-reported data alone and have smaller estimated standard errors than those based solely on the NHANES clinical data. Similarly, we may assess potential errors in big data through a more sophisticated and accurate small survey.
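The first idea, subsampling a large administrative file and then calibrating the weights, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: the records are hypothetical `(region, value)` tuples, the function names are made up for this example, and the sample is a simple random sample whose base weights N/n are ratio-adjusted to region counts tabulated from the full file (post-stratification, the simplest form of calibration weighting). Optimal allocation across strata is not shown.

```python
import random
from collections import Counter


def subsample_with_poststrat_weights(records, cell_of, n, seed=0):
    """Draw a simple random sample of n records and calibrate its weights.

    Hypothetical helper. Base weights are N / n; each weight is then
    ratio-adjusted so that the weighted count in every cell (e.g. region)
    equals the count tabulated from the full file (post-stratification).
    """
    rng = random.Random(seed)
    sample = rng.sample(records, n)
    base_weight = len(records) / n

    # Control totals: a cheap one-pass count over the full administrative file.
    pop_counts = Counter(cell_of(r) for r in records)
    samp_counts = Counter(cell_of(r) for r in sample)

    # Post-stratification needs at least one sampled record in every cell.
    missing = set(pop_counts) - set(samp_counts)
    if missing:
        raise ValueError(f"cells with no sampled records: {sorted(missing)}")

    weights = []
    for r in sample:
        c = cell_of(r)
        # Ratio adjustment: scale the base weight so cell totals hit the controls.
        adjustment = pop_counts[c] / (base_weight * samp_counts[c])
        weights.append(base_weight * adjustment)
    return sample, weights


def weighted_total(sample, weights, value_of):
    """Calibration-weighted estimate of the population total of value_of(record)."""
    return sum(w * value_of(r) for r, w in zip(sample, weights))


if __name__ == "__main__":
    # Hypothetical administrative file of (region, value) records.
    gen = random.Random(1)
    admin = [(gen.choice("ABCD"), gen.uniform(0, 100)) for _ in range(200_000)]
    sample, w = subsample_with_poststrat_weights(admin, lambda r: r[0], n=5_000)
    print("estimated total:", weighted_total(sample, w, lambda r: r[1]))
    print("full-file total:", sum(r[1] for r in admin))
```

Because the control totals come from a single cheap pass over the full file, the heavier analyses can then run on the much smaller weighted sample.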
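For the second idea, one common way to combine the two sources (among several, such as propensity-score pseudo-weights or mass imputation; the post does not prescribe a method) is to estimate the population share of each cell from the weighted probability survey and post-stratify the non-probability data to those shares. The sketch below assumes a hypothetical cell variable (say, an age group) measured the same way in both sources, and assumes that within each cell the big-data units resemble the population. Under those assumptions, the gap between the naive and adjusted means is a rough indicator of selection bias in the big data. It covers point estimates only, not variance estimation.

```python
from collections import defaultdict


def cell_shares(survey):
    """Estimate population shares of each cell from a probability sample.

    survey: iterable of (cell, design_weight) pairs (hypothetical layout).
    """
    totals = defaultdict(float)
    for cell, weight in survey:
        totals[cell] += weight
    grand_total = sum(totals.values())
    return {cell: t / grand_total for cell, t in totals.items()}


def adjusted_mean(big_data, shares):
    """Post-stratify a non-probability sample to survey-estimated cell shares.

    big_data: iterable of (cell, y) pairs from the non-probability source.
    ASSUMPTION: within each cell, big-data units resemble the population.
    Cells present only in big_data (not in shares) are ignored.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for cell, y in big_data:
        sums[cell] += y
        counts[cell] += 1
    missing = [c for c in shares if counts[c] == 0]
    if missing:
        raise ValueError(f"no big-data records in cells: {missing}")
    # Weighted average of cell means, using the survey's population shares.
    return sum(shares[c] * sums[c] / counts[c] for c in shares)


def naive_mean(big_data):
    """Unadjusted mean of y, treating the big data as if it were representative."""
    ys = [y for _, y in big_data]
    return sum(ys) / len(ys)
```

For example, if a social-media source is 90% young users but the survey says the population is split evenly, the adjusted mean reweights the two groups to 50/50, and `naive_mean(big) - adjusted_mean(big, shares)` estimates how far the raw big-data figure is off.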
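The imputation strategy in the third idea (fit a model on the small, accurate data, then multiply impute the accurate measure in the large data) can be illustrated with a deliberately simplified sketch. It is not the model used in the NHIS/NHANES work: it assumes a single self-reported predictor instead of self-reports plus covariates, a linear model with normal errors, and hypothetical names (`valid_self`, `valid_clin`, `big_self`). It is also an “improper” imputation, because it keeps the fitted regression parameters fixed rather than drawing them, which understates the between-imputation variance. The M completed datasets are combined with Rubin's rules: the point estimate is the average of the M means, and the total variance is the average within-imputation variance plus (1 + 1/M) times the between-imputation variance.

```python
import random
import statistics


def fit_ols(x, y):
    """Least squares fit of y = a + b*x; returns (a, b, residual SD)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sigma = (sum(r * r for r in resid) / (len(x) - 2)) ** 0.5
    return a, b, sigma


def multiply_impute_mean(valid_self, valid_clin, big_self, m=5, seed=0):
    """Impute clinical values for the large self-report-only file m times.

    Simplified (improper) imputation: (a, b, sigma) are held fixed across
    imputations. Returns (point estimate, total variance) via Rubin's rules.
    """
    # Model the clinical measure from the self-report in the validation data.
    a, b, sigma = fit_ols(valid_self, valid_clin)
    rng = random.Random(seed)
    n = len(big_self)
    means, within = [], []
    for _ in range(m):
        # Prediction plus a random residual, so imputations keep realistic spread.
        imputed = [a + b * x + rng.gauss(0.0, sigma) for x in big_self]
        means.append(statistics.fmean(imputed))
        # ASSUMPTION: variance of the mean as if the large file were an SRS.
        within.append(statistics.variance(imputed) / n)
    q_bar = statistics.fmean(means)          # combined point estimate
    w_bar = statistics.fmean(within)         # average within-imputation variance
    b_var = statistics.variance(means)       # between-imputation variance
    return q_bar, w_bar + (1 + 1 / m) * b_var
```

A proper version would redraw the regression parameters for each imputation (for example, from a bootstrap of the validation sample) and would include the covariates alongside the self-reported value.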
While big data provides massive and timely information from various sources (e.g., social media and administrative records), small data is simple, easy to collect and process, and can be more accurate and representative. Can small data help you with your big data problems?
Dan Liao is a research statistician at RTI International. She currently works on multiple aspects of data processing and analysis for large, multistage surveys of health care in the United States, including sampling design, calibration weighting, data editing and imputation, statistical disclosure control, and the analysis of survey data. Her survey research interests include multiphase survey designs, combining survey and administrative data, domain estimation, calibration weighting, and regression diagnostics for complex survey data. Dan has a PhD in Survey Methodology from the Joint Program in Survey Methodology at University of Maryland and has published research focusing on regression diagnostics, calibration weighting and predictive modeling.