It takes time, and sometimes changes in policy and utilization, to fully accommodate a new technology's potential. This certainly has been true for big data in health. This two-part post discusses key areas in health policy and data use where current approaches may be keeping big data from delivering its full potential.
Big data tools facilitate pulling together great amounts of available data to support an objective, whether those data were recorded specifically and narrowly for that objective or not. In health, sometimes these are called "secondary" data if they were recorded initially for clinical care purposes but then used for something else. It’s all about increasing the amounts of data you can get, instead of getting the exact data you think you need. We’ll call these new ways of looking at the data that you can get "observational."
Big data tools offer great promise for new approaches to health research. As healthcare tries to broaden the scientific basis for treatments and begins to engage in comparative effectiveness work, it needs to use data that have been accumulated for clinical care, communications and other purposes. It also needs to reuse data previously accumulated in other research for follow-up, further extension or new hypothesis testing. Big data tools offer the opportunity to supplement the "traditional" research analysis of limited sets of specifically extracted and highly specified data with big data analysis of huge amounts of less well-structured, less well-specified electronic clinical care, "social media" and other data.
Of course, where possible, it is still desirable to have well-structured, highly specified data, but that route to research alone does not seem to address the size of the research problem in front of us. What is more, scientific research spending, like much of the rest of the national discretionary budget, is threatened by funding reductions.
Big data has many potential roles in research, but a major one is the "mining" of large, less well-structured data that exist as a byproduct of clinical care and other electronic systems in use. In general there are two policy approaches to using health data that originate with an identifiable patient or person. One is to de-identify the data; the other is to obtain the individual's consent for their data to be used.
In the U.S., the Health Insurance Portability and Accountability Act (HIPAA) "consent" presents issues for some big data uses. The conflict here is that while big data approaches offer great opportunity for additional queries and subsequent analysis of large data sets for unexpected findings and secondary conclusions, HIPAA, even after the Omnibus update, requires that patients be re-consented if the new investigations are of a different nature than the original work.
The ironic part is that these HIPAA consent constraints apply to work defined as being "designed to develop or contribute to generalizable knowledge," and not to local treatment, payment and operations uses. Many people struggle to understand why such a laudable population health goal actually brings tighter constraints on how data can be used.
In my next post, I will consider some of the problems with HIPAA “de-identified data” as well as other population health big data issues.