Data science and predictive analytics have become integral to many facets of our daily lives: Facebook tags our friends in photos, our cars automatically detect when we’re drifting out of our lane or need to brake, and Netflix knows which shows we’ll most enjoy. At first glance, healthcare seems primed for this data revolution:
• Data: Gathering the data, the most costly part of any big-data solution, is already done, and has been for decades. Every time anyone goes to the doctor, a detailed record of the event is created, often in the form of an electronic health record (EHR).
• Value Proposition: High-stakes questions are waiting to be answered. Interested parties include not only patients themselves, but insurance companies, physicians, and hospital systems.
So why don’t we have an app for that? Why haven’t we seen advances in medical data modeling as dramatic as we have in more commonplace aspects of our lives?
One challenge is the quality and cohesion of EHR data.
• Quality: Any model is only as good as the data on which it is based, and many electronic health records are haphazard. Take medication codes: a properly formatted code consists of ten digits in three dash-separated segments identifying the company, product, and packaging. Segments can vary in length and can have leading zeros. Some EHRs omit the dashes and leading zeros, making it unclear whether 12345678 represents 1234-0567-08, 01234-0567-8, or some other valid combination. Attempts are underway to standardize this notation, but historical records and imperfect compliance delay the benefits.
• Cohesion: There is no all-inclusive repository of health data, though some data services are making large strides. Instead, holistic data sets must be cobbled together from the repositories of disparate hospital systems. Although EHRs are required to include certain information and follow certain formats, data structures and standards vary from database to database, so merging them into one consistent database requires careful consideration. For example, some schemas contain demographic information on patients whereas others do not. Demographics can shape the course of a patient’s care: certain treatments may be excluded because of religious objections, and diagnoses may be informed by diseases and genetic disorders that are more prevalent in certain age, race, gender, or religious groups. When the demographic fields are missing, the resulting treatment decisions still appear in the data, but without the context in which they were made.
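To make the medication-code ambiguity described under Quality concrete, here is a minimal Python sketch that lists every properly formatted code a stripped digit string could stand for. It rests on several assumptions that are not in the text above: that the codes follow the three 10-digit layouts used by the U.S. National Drug Code (4-4-2, 5-3-2, and 5-4-1), that only dashes and leading zeros were dropped, and that no segment was entirely zeros. The names `NDC_LAYOUTS`, `candidate_ndcs`, and `_fits` are hypothetical and chosen purely for illustration.

```python
from itertools import combinations

# Assumption: segment widths (company, product, packaging) of the three
# 10-digit layouts. These are not stated in the article itself.
NDC_LAYOUTS = ((4, 4, 2), (5, 3, 2), (5, 4, 1))


def _fits(part: str, width: int) -> bool:
    """A part fits a segment if it fills it exactly, or if it is shorter
    and does not start with zero (a stripped segment has no leading zero)."""
    return len(part) == width or (len(part) < width and part[0] != "0")


def candidate_ndcs(stripped: str) -> list[str]:
    """Return every dashed, zero-padded code that could have produced
    `stripped` once dashes and leading zeros were removed."""
    if not stripped.isdigit():
        raise ValueError("expected a non-empty string of digits")
    candidates = set()
    # Pick two cut points that split the digits into three non-empty parts.
    for i, j in combinations(range(1, len(stripped)), 2):
        parts = (stripped[:i], stripped[i:j], stripped[j:])
        for widths in NDC_LAYOUTS:
            if all(_fits(p, w) for p, w in zip(parts, widths)):
                # Restore the leading zeros and the dashes.
                candidates.add(
                    "-".join(p.zfill(w) for p, w in zip(parts, widths))
                )
    return sorted(candidates)


if __name__ == "__main__":
    for code in candidate_ndcs("12345678"):
        print(code)
```

Running it on 12345678 yields both readings mentioned above, 1234-0567-08 and 01234-0567-8, along with the other combinations. In practice a list like this would likely be checked against a reference directory of known codes rather than resolved by guesswork.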
Medicine is complex. This is why physicians train for a decade or more before beginning their practice and why years can pass between the discovery of a disease and its treatment. This domain expertise is a critical component of any modeling effort. Without it, data models can produce misleading and potentially dangerous results. Effective data science for healthcare requires pairing practitioners and medical experts, who provide context for the data, with data scientists, who execute the big-data analytics. Not many such teams exist. Only a small fraction of physicians spend significant time on research, and data scientists are currently scarce. As science becomes more interdisciplinary, cross-functional teams should become more common.
If an algorithm chooses the wrong entertainment content, at worst, a user wastes two hours watching a bad movie. If an algorithm misdiagnoses a patient, the stakes can be much higher. For this reason, just like any drug, device, or practitioner in healthcare, algorithms must go through an approval process before they are used on the general public. That’s not a bad thing, as it ensures the models in use are safe and effective, but the process takes extra time (1-4 years) and money ($24-$75 million) compared to less-regulated domains.
These obstacles are not insurmountable. Indeed, great work is being performed daily by large initiatives like IBM Watson, academic and private sector groups, and individual contributors through competitions like Kaggle. New researchers are emerging with backgrounds in medicine and training as data scientists. So although there’s no instantaneous body scanner or tricorder to immediately identify and treat any malady, we’re making progress every day.