Enabling Personalized Medicine through Health Information Technology: Advancing the Integration of Information

Darrell M. West

Executive Summary

With federal officials pursuing the goal of a personal human genome map for under $1,000 within five years (White House, 2010), it is possible to envision a future where treatments are tailored to individuals’ genetic structures, prescriptions are analyzed in advance for likely effectiveness, and researchers study clinical data in real time to learn what works. Implementation of these regimens creates a situation where treatments are better targeted, health systems save money by identifying therapies not likely to be effective for particular people, and researchers have a better understanding of comparative effectiveness (President’s Council of Advisors on Science and Technology, 2010).

Yet despite these benefits, consumer and system-wide gains remain limited by an outmoded policy regime.  Federal regulations were developed years before recent advances in gene sequencing, electronic health records, and information technology.  With scientific innovation running far ahead of public policy, physicians, researchers, and patients are not receiving the full advantage of the latest developments.  Current policies should leverage new advances in genomics and personalized medicine in order to individualize diagnosis and treatment.  Similarly, policies creating incentives for the adoption of health information technology should ensure that the resulting infrastructure supports new care paradigms rather than automating yesterday’s health care practices.

To determine what needs to be done, a number of key leaders from government, academia, non-profit organizations, and business were interviewed about ways to promote a better use of health information technology to enable personalized medicine.  The interviews focused on policy and operational issues surrounding interoperability, standards, data sharing protocols, privacy, predictive modeling, and rapid learning feedback models.

This paper outlines the challenges of enabling personalized medicine, as well as the policy and operational changes that would facilitate connectivity, integration, reimbursement reform, and analysis of information.   Our health system requires a seamless and rapid flow of digital information, including genomic, clinical outcome, and claims data.  Research derived from clinical care must feed back into assessment in order to advance care quality for consumers.  There currently are discrete data on diagnosis, treatment, medical claims, and health outcomes that exist in parts of the system, but it is hard to determine what works and how treatments differ across subgroups.  Changes in reimbursement practices would better align incentives with effective health care practices. 

Furthermore, we need privacy rules that strike the right balance between privacy and innovation.  These rules should distinguish health research from clinical practice, and create mechanisms to connect data from multiple sources into databases for secondary research usage and population cohort analysis.  More balanced rules would improve innovation.  It is nearly impossible to evaluate treatment effectiveness without being able to aggregate data and compare results.  Faster knowledge management would enable “rapid learning” models and evidence-based decision-making on the part of physicians and public health officials.

As more information on treatment, lab tests, genomics, and financial costs gets integrated into health care, it becomes harder to incorporate data from medical history, vital signs, genetic background, and lab testing into diagnosis and treatment.  Predictive modeling represents a way for physicians to move towards systematic and evidence-based decision-making.  While the first step toward enabling personalized medicine is ensuring clinicians have access to what is known about patient gene variants, computer models can go beyond this approach to predict what treatments are likely to be most effective given observed symptoms.  Public policy should incorporate rapid learning and predictive modeling to gain the full benefits of personalized medicine.

There are several ways in which personalized medicine can be enabled: (1) “meaningful use” requirements promulgated by the executive branch, (2) change driven by consumer demand for personalized medicine, (3) pilot and demonstration projects supported by the Centers for Medicare and Medicaid Services (CMS) Innovation Center, and (4) academic-industry collaborations encouraged by the government through investment.  The declining costs of DNA sequencing will drive consumer demand and generate growing pressure on physicians to personalize medicine.  In addition, CMS should deploy some of its $10 billion in pilot project resources through its new Innovation Center to encourage personalized medicine.  Along with the National Institutes of Health, the agency could fund new projects designed to demonstrate innovation in health care (Wechsler, 2009).

The Challenges of Enabling Personalized Medicine

There are a number of policy and operational challenges that interfere with the public’s ability to gain the benefits of personalized medicine through health information technology (Pollack, 2010; Wade, 2010).  These include issues such as interoperability, inconsistent coding and language standards, problems in data sharing, weak feedback loops, privacy concerns, and ineffective reimbursement policies. 

Interoperability represents a major challenge because of the difficulty of integrating data from different sources.  If researchers and healthcare providers are not able to exchange information, it raises the cost of health care and makes it difficult to learn in real time.  A considerable amount of medical information is collected, but too little of it is integrated or put into databases that are usable for research or public health purposes.

As our understanding of diseases becomes ever more stratified by their genomic signatures, even larger data sets will be needed to establish treatment protocols.  Patient data across geography and health care plans will need to be queried simultaneously.  This can only be achieved through large, federated pools of information that include patients’ genomic data and health histories.

Legitimate concerns over privacy and confidentiality complicate secondary use of health care information.  Even when data are aggregated and depersonalized, it is hard for researchers to gain access to information that helps them spot trends or gain insights into public health.

Regulatory processes will be strained by genomic discoveries.  There will be no way to conduct conventional clinical studies for every genetic signature as a unique diagnostic test.  One solution would be a statistical strength standard that must be demonstrated before genetics can be applied to medical decision-making.  Statistical strength could be determined through mining of federated data pools.  This mechanism could alleviate capacity constraints and costs associated with clinical studies and speed innovation to the marketplace.
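One way to picture a statistical strength standard applied to federated data is the minimal sketch below. It assumes each participating site shares only aggregate counts (carriers and non-carriers of a variant, split by whether they responded to a treatment), and that the standard is expressed as a minimum lower confidence bound on the pooled odds ratio. The class name, function names, and the 1.5 threshold are illustrative assumptions, not part of any existing regulation. Simple pooling of counts ignores differences between sites; a stratified method would be more appropriate in practice.

```python
import math
from dataclasses import dataclass


@dataclass
class SiteCounts:
    """Aggregate counts shared by one site; no patient-level rows leave the site."""
    carriers_responded: int
    carriers_not_responded: int
    noncarriers_responded: int
    noncarriers_not_responded: int


def pooled_odds_ratio(sites):
    """Sum the 2x2 counts across sites and return (odds ratio, 95% CI low, 95% CI high)."""
    a = sum(s.carriers_responded for s in sites)
    b = sum(s.carriers_not_responded for s in sites)
    c = sum(s.noncarriers_responded for s in sites)
    d = sum(s.noncarriers_not_responded for s in sites)
    if min(a, b, c, d) == 0:
        raise ValueError("every cell of the pooled table needs at least one patient")
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio (Woolf's method).
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(odds_ratio) - 1.96 * se)
    high = math.exp(math.log(odds_ratio) + 1.96 * se)
    return odds_ratio, low, high


def meets_strength_standard(sites, min_lower_bound=1.5):
    """Hypothetical rule: the lower confidence bound must clear a preset threshold."""
    _, low, _ = pooled_odds_ratio(sites)
    return low >= min_lower_bound


if __name__ == "__main__":
    sites = [SiteCounts(40, 10, 20, 30), SiteCounts(35, 15, 25, 25)]
    print(pooled_odds_ratio(sites))
    print(meets_strength_standard(sites))
```

The design choice worth noting is that only counts cross institutional boundaries, which is what allows the pool to stay federated rather than centralized.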

There also are problems in terms of reimbursement policies.  Many programs are not well aligned with laudable goals such as preventive medicine or positive health outcomes.  This mismatch makes it difficult to judge quality or build incentives for healthy outcomes.  We need to reward providers for good behavior and reduce incentives for wasteful or unnecessary treatment.

Three Revolutions and How They Affect Health Care

The Medical Delivery Revolution:  New Actors and New Relationships

Health care is shifting from a hierarchical delivery system to one that features greater transparency, collaboration, and patient involvement.  For much of the 20th century, medicine was dominated by physicians with considerable professional autonomy, hospitals, the pharmaceutical industry, insurance companies, and government agencies that focused on the elderly, veterans, and the poor (Starr, 1984).

Now we are seeing a more empowered relationship between primary care doctors and their patients, and the emergence of customer-driven medicine that has expanded the range of non-traditional health care providers and placed more information-gathering responsibility on patients and their care-givers, such as children with elderly parents.  Businesses such as CVS and Wal-Mart have developed in-store treatment centers (Jones and Japsen, 2010).  Out-patient facilities have proliferated at a rapid rate.  Patients can order drugs through Internet sites.  Rather than rely only on doctors, consumers can get health information from the Internet, social networking sites, fellow patients, and chat rooms (Miller, 2010). 

Remote monitoring devices and mobile health applications allow people to monitor their own weight, blood pressure, pulse, and sugar levels, and send results electronically to health care providers.  Patients can store their medical records online and have access regardless of where they are in the United States or around the world.  Some get personalized feedback via email and reminders when they gain weight, have an uptick in their cholesterol levels, don’t take their medicine, or have high blood pressure (West, October, 2009).

Scans and imaging have improved to a high level of resolution.  Imaging tests, especially computed tomography or CT scans, can measure tissue down to one-third of a millimeter in size.  This development allows health care providers to describe physical anomalies with tremendous precision and monitor patient responses to various therapies.  Imaging enhances medical personalization and tailors treatment to someone’s individual circumstances.

The Digital Revolution and Ways to Convert Data into Knowledge


Concurrent with major changes in medical care delivery has been an explosion of digital resources for patients as well as physicians. A number of health websites answer questions and provide links to discussion groups about particular illnesses.  In states such as Massachusetts, California, New York, and Michigan, consumers can visit health department sites and compare quality performance data on provider care programs.  Nationally, the U.S. government has a website that evaluates 2,500 hospitals on mortality rates, room cleanliness, call button responses, and how patients judge their quality of care (West, 2009).

Social networking sites represent another way to share information among chronic condition sufferers.  For example, a network developed by the company PatientsLikeMe has 23,000 patients who have signed up to share information regarding five different illnesses:  mood disorders, Parkinson’s, multiple sclerosis, HIV/AIDS, and Lou Gehrig’s disease.  Particularly for rare illnesses where it is hard to generate the patient numbers required for clinical trials, site organizers say “patients have been a tremendously underutilized resource.”  While large clinical trials with randomized assignment clearly need to remain central to drug assessment, digital technology that helps providers and researchers identify worrisome trends represents an additional way to gain useful feedback.

Through these and other digital resources, doctors and patients have much more information at their disposal (Christensen, Grossman, and Hwang, 2008).  Patients know more about their own histories, can link to additional sources of information, and can interact electronically with health care providers.  This level of access helps them ask more informed questions about their medical conditions.

As part of the 2009 American Recovery and Reinvestment Act, Congress authorized $44 billion in public funding of physician and hospital adoption of electronic health records.  Policymakers hope to extend the utilization of electronic health records by providing grants to hospitals and physicians meeting “meaningful use” standards.  Their goal is to raise the share of health care providers that have adopted electronic health records in meaningful ways from 10 to 90 percent by 2015.  The new investment creates the opportunity to adopt information systems that accelerate personalized medicine as opposed to merely automating systems designed years ago.

Genomics and the Impact on Medical Care

Scientists have made extensive progress over the last two decades in understanding human genetics and the role of proteins and chemicals in gene behavior (Goodman, 2009).  In 1990, the National Institutes of Health launched the Human Genome Project in an effort to identify the basic building blocks for human beings.  By 2003, investigators had sequenced the genome and identified three billion discrete “chemical units.”

Since that time, scientists have worked to establish links between gene structures, human illnesses, treatment effectiveness, and adverse effects (Institute of Medicine, 2010b).  Integrating genetic sequencing data into electronic health records potentially cuts health care costs through more effective targeting of treatments and more accurate diagnoses.  This type of connectivity speeds research feedback into clinical care, and gets more timely information to patients, physicians, and medical researchers. 

Advances in DNA sequencing have made it possible to develop greater understanding regarding the role of genetic structures in disease susceptibility and treatment efficacy (Wade, 2010).  Scientists have identified genes that raise the odds of getting illnesses such as breast cancer, or increase the likelihood of adverse reactions or bleeding.

For example, they have found that those carrying certain mutations in the BRCA1 or BRCA2 genes have a higher risk of breast cancer and those expressing the HER2 protein are at greater risk of recurrence.  By combining these findings with detailed family histories and diagnostic tests, doctors can pinpoint who is most susceptible to breast cancer and therefore needs to be monitored most carefully.  Physicians have documented that they cannot treat patients based on population averages alone, but need to be aware of subgroup and individual differences.

Investigators have made progress in determining who is most likely to benefit from possible treatments and who is likely to be harmed.  In oncology, for example, pathologists measure estrogen receptor expression to determine eligibility for tamoxifen hormone therapy among those suffering from breast cancer.  Effectiveness has been found to be contingent on the cytochrome P450 2D6 enzyme needed to metabolize the drug, although the results have not always been consistent across studies (Goodman, 2009). Genetic tests for HLA-B*1502, a particular variant of human leukocyte antigen (HLA), are already available and can predict increased susceptibility to dangerous or even fatal skin reactions such as Stevens-Johnson syndrome and toxic epidermal necrolysis resulting from carbamazepine therapy used in the treatment of seizures.  This allele occurs almost exclusively in patients with ancestry across broad areas of Asia.

There has been mixed evidence regarding a link between CYP450 genotype and treatments using selective serotonin reuptake inhibitor (SSRI) antidepressants.  Some patients suffering from metastatic colorectal cancer whose tumors have a mutation in the KRAS gene have not responded well to treatments using panitumumab or cetuximab (Downing, 2009).

Analysis has demonstrated that many patients are not able to benefit from particular drug therapies.  Iressa and Tarceva are drugs for treatment of non-small cell lung cancer, but they are effective only in tumors that express the epidermal growth factor receptor gene.  Other medications are ineffective for 70 percent of Alzheimer’s sufferers, 50 percent of those with arthritis, 43 percent of those who are diabetic, 40 percent of those who suffer from asthma, and 38 percent of those who take SSRI antidepressants (Spear, Heath-Chiozzi, and Huff, 2001; Goodman, 2009).  Since people metabolize medicine in so many different ways depending on their particular combination of genes, the resulting enzymes, and their current health status, it is vital for a safer and more effective healthcare system to have an understanding of genomic information to reduce adverse events and determine optimal therapy (U.S. Department of Health and Human Services, 2008).
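To illustrate how established gene-drug findings of this kind might be surfaced at the point of prescribing, the following minimal sketch checks a proposed drug against a patient's recorded markers. The table contains only pairs named in the text above (HLA-B*1502 with carbamazepine, and KRAS mutation with panitumumab or cetuximab); the marker labels, message wording, and function names are hypothetical, and the sketch is illustrative only, not clinical guidance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    drug: str
    marker: str
    message: str


# Pairs drawn from the discussion above; illustrative only, not clinical guidance.
# Marker labels such as "KRAS mutation" are hypothetical names for recorded findings.
RULES = {
    ("carbamazepine", "HLA-B*1502"): "increased susceptibility to severe skin reactions",
    ("panitumumab", "KRAS mutation"): "poor response reported in metastatic colorectal cancer",
    ("cetuximab", "KRAS mutation"): "poor response reported in metastatic colorectal cancer",
}


def check_prescription(drug, patient_markers):
    """Return an alert for every recorded marker that matches a rule for this drug."""
    drug_key = drug.strip().lower()
    alerts = []
    for marker in sorted(patient_markers):
        message = RULES.get((drug_key, marker))
        if message is not None:
            alerts.append(Alert(drug_key, marker, message))
    return alerts


if __name__ == "__main__":
    for alert in check_prescription("Carbamazepine", {"HLA-B*1502"}):
        print(f"{alert.drug}: {alert.marker}: {alert.message}")
```

A lookup of this kind is only as good as the evidence behind each entry, which is why the text stresses that several of these associations remain inconsistent across studies.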

Policy Challenges and Recommendations

While technology, patient engagement, and scientific advancement are changing health care practices, genomic information in particular has the potential to transform medical practices and outcomes in fundamental respects.  Genomics is being introduced in several ways.  One is through patient empowerment, making genetic information available directly to patients so they can address genetic factors with their physician (Pollack, 2011).  Another model that is unfolding through public and private sector initiatives is to employ genomic information in medical practice and weave that material into patient care throughout their lifetime.

The capture and storage of genomic information will redefine health informatics data flows.  The result will be improved decision rules and streaming of information directly into medical decision-making.  This will make health care delivery more efficient.  Therapies can be given more precisely to those patients most likely to benefit and not offered to those patients who would be harmed by the treatment.  While there needs to be additional research on these questions, there is reason to believe that the net cost of care per patient could be reduced.

Using genomic information in health care has the potential to generate important benefits for patients, physicians, and public health officials.  In order to take advantage of these developments, though, it is vital to connect genomic and other personalized information to electronic health records, and to integrate established statistical correlations between genetics and drug effectiveness.  Diagnostic and treating physicians need this information to coordinate patient care effectively.  As researchers learn more, genetic information and susceptibility to drugs and side effects should be at the fingertips of doctors in the same way that family history, vital signs, and medical tests are.  Timely information would help caregivers incorporate what works and doesn’t work into their clinical decisions.

Better Data-Sharing Networks

One of the biggest barriers to gaining efficiencies in our current system is interoperability problems in connecting different information systems (President’s Council of Advisors on Science and Technology, 2010).  The United States has a health care system that is quite fragmented owing to the existence of 650,000 doctors and 5,800 hospitals.  The clinical records of patients do not travel with them electronically, and most of the computing systems do not enable data flow.  Many providers have different systems for compiling billing, lab tests, medical records, prescriptions, treatment, and appointments, which makes it very difficult for them to exchange information without electronic converters.  And to make matters worse, the information captured in some of these systems relies on different semantics that make cross-correlation nearly impossible.

There currently are discrete data sets that exist in parts of the system, but they are not integrated.  It is hard to determine what works and how to assess costs and benefits.  Technology has been used to improve the accounting and administrative aspects of health care, but not its knowledge management.  We need information systems that help us analyze the overall contours of health care.

In the medical area, the creation of national drug codes (NDC) created reimbursement efficiencies.  Establishing a 10-digit code for each medication helped to make drug administration safer and more economical.  It facilitated the tracking of pharmacological information, and produced benefits both for consumers and businesses.
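As a small illustration of why a fixed code structure makes tracking easier, the sketch below normalizes a hyphenated 10-digit NDC into the zero-padded 11-digit form commonly used in billing, so that the same product can be matched across systems. It assumes the three-segment layout (labeler, product, package) in the 4-4-2, 5-3-2, or 5-4-1 configurations; the function name and the sample digits are made up for illustration.

```python
VALID_LAYOUTS = {(4, 4, 2), (5, 3, 2), (5, 4, 1)}


def ndc10_to_ndc11(ndc):
    """Pad a hyphenated 10-digit NDC to the 5-4-2 layout with leading zeros."""
    parts = ndc.strip().split("-")
    if len(parts) != 3 or not all(p.isascii() and p.isdigit() for p in parts):
        raise ValueError(f"expected three numeric segments, got {ndc!r}")
    if tuple(len(p) for p in parts) not in VALID_LAYOUTS:
        raise ValueError(f"unrecognized segment lengths in {ndc!r}")
    labeler, product, package = parts
    # Only the short segment actually gains a zero; zfill leaves the others unchanged.
    return f"{labeler.zfill(5)}-{product.zfill(4)}-{package.zfill(2)}"


if __name__ == "__main__":
    for sample in ("1234-5678-90", "12345-678-90", "12345-6789-1"):
        print(sample, "->", ndc10_to_ndc11(sample))
```

Once every system stores the same normalized form, claims, prescriptions, and outcomes for a given medication can be joined on a single key.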

Health information technology and electronic health records can serve the same type of integrative role.  It is possible to track claim receipts in real time.  When combined with information on medical tests and clinical outcomes, this material will shorten evaluation cycles and enhance our ability to control costs in ways that do not weaken quality.  Treatment guidelines in electronic health records would help physicians understand their treatment options. 

The goal of data-sharing networks is to develop a so-called “virtuous cycle” for health care where improvements build on one another.  Electronic health records with proper coding and the use of that data could inform clinical care and help evaluate substantive value.  Treatment information should be linked to outcomes, with reimbursements based on the end result.  A balance between costs and benefits would help people make informed decisions.  Right now, there is much greater concern about the costs and burdens of integration than its possible benefits to patients, physicians, researchers, and public health administrators.

An excellent example of a new kind of data exchange is the cancer Biomedical Informatics Grid, or caBIG.  This network, launched in 2003 by the National Cancer Institute, connects more than 50 NCI-designated Cancer Centers, along with other academic and commercial organizations, making it the largest national biomedical information network in the United States.  Capabilities compliant with caBIG interoperability specifications enable the collection, analysis, and exchange of a wide range of biomedical information through a well-integrated, standards-based infrastructure coupled with open-source and commercial software applications.  These technologies create an integrated electronic system that enables clinical research, genomics, medical images, biospecimens, and patient outcomes data to flow easily but securely between and among authorized individuals, organizations, and institutions.  These capabilities enable health care providers to leverage resources developed in research settings to identify molecularly sub-grouped patients, collect and view their patients’ histories individually and in the aggregate, and collaborate across organizations to test research hypotheses and evaluate new treatments.   

Ending the Health Care “Tower of Babel”:  Improved Semantics and Data Coding

Current electronic health systems have a “Tower of Babel” feature that undermines connectivity.  Researchers, clinicians, and industry employ inconsistent standards in how medical terms are defined and applied to health conditions.  They don’t classify diseases in the same way or describe symptoms with similar language.  These semantic inconsistencies make it difficult to populate electronic records with data that are comparable.

In the world of specialty care, this problem becomes even more serious.  Health providers from various specialties require different information and often record symptoms in dissimilar ways.  Pathologists may have different informational protocols from oncologists or internal medicine physicians.  As long as there are semantic inconsistencies, it is hard to take full advantage of digital record-keeping systems.  The College of American Pathologists has worked with oncologists to develop cancer checklists which standardize reporting and help to close this gap.

Right now, the United States does not have adequate diagnostic coding protocols.  Each of our 5,800 hospitals has its own nomenclature and fields of description.  Many providers use different terms to describe the same symptoms.  The data are overly aggregated, so it is hard to determine what actually is going on.  For example, there now are 60 different types of leukemia, but the currently-used International Classification of Diseases (ICD) codes do not reflect the diversity of that disease.  Physicians and administrators say they need greater granularity in the coding conventions.  Since diagnostic tests inform up to 70 percent of physicians’ core decisions, according to informed experts, the best way to evaluate costs involves greater precision in coding lab tests.  The future implementation of ICD-10, however, will add granularity with regard to specific diagnoses.
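The nomenclature problem can be made concrete with a minimal sketch of term normalization: each site's local wording is mapped to one shared code before records are counted, and anything that cannot be mapped is set aside rather than guessed. The site names and the shared code label below are hypothetical placeholders, not entries from ICD, SNOMED CT, or any real terminology.

```python
from collections import Counter

# Hypothetical mapping tables; a real system would draw these from a maintained terminology.
LOCAL_TO_SHARED = {
    "hospital_a": {
        "aml": "LEUK-MYELOID-ACUTE",
        "acute myeloid leukemia": "LEUK-MYELOID-ACUTE",
    },
    "hospital_b": {
        "acute myelogenous leukemia": "LEUK-MYELOID-ACUTE",
    },
}


def normalize(site, local_term):
    """Return the shared code for a site's local term, or None if it is unmapped."""
    mapping = LOCAL_TO_SHARED.get(site, {})
    return mapping.get(local_term.strip().lower())


def aggregate(records):
    """Count records per shared code and collect the ones that could not be mapped."""
    counts = Counter()
    unmapped = []
    for site, term in records:
        code = normalize(site, term)
        if code is None:
            unmapped.append((site, term))
        else:
            counts[code] += 1
    return counts, unmapped


if __name__ == "__main__":
    records = [
        ("hospital_a", "AML"),
        ("hospital_b", "Acute Myelogenous Leukemia"),
        ("hospital_b", "leukemia NOS"),
    ]
    print(aggregate(records))
```

Keeping the unmapped list visible matters: overly broad entries such as an unspecified leukemia are exactly the cases where more granular codes are needed.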

The same problem develops in regard to genomics.  Although the CPT (Current Procedural Terminology) Editorial Panel of the American Medical Association is working to correct this issue, we currently do not have differentiated billing codes for various molecular or genetic conditions.  Many health care systems don’t distinguish gene tests for breast cancer versus other illnesses where genetic tests are employed.  This makes it impossible to aggregate data or link genomic information to disease diagnosis and treatment.  Validating genetic links to particular diseases will improve drug targeting and treatment. 

There have been improvements on some of these dimensions.  The Systematized Nomenclature of Medicine – Clinical Terms (SNOMED CT) has developed a disease categorization nomenclature that is used in 15 countries.  But many of its codes are not detailed enough for research purposes.  There are disconnects between clinical and research communities that prevent each from building on the work of the other.  Common points of reference are required that make use of new research as it develops.  This includes more variegated descriptors, more detailed codes, and more specificity on the different types of cancers that are being identified.  The system needs to be dynamic in nature so that it is regularly updated as researchers develop new knowledge about medical illness.

Professional networks are making progress on coding and language description.  The Clinical Data Interchange Standards Consortium has been helpful, and its reliance on professional experts from several fields has led to the development of widely-used standards.  Clinical genomics guidelines for health care providers would be useful in order to provide an overall framework for integrating genetic information into electronic health records.  There remains a need for greater specification of different types of genetic background.

Of course, standardized vocabularies work only when there is uniformity in what is being described.  Medicine still is constrained by the imprecision of human language and vagueness of patient symptoms.  True semantic uniformity may come best with machine-generated data, not perceptual data (Shirky, 2003).

More Balanced Privacy Rules

Privacy represents a major issue for the American public.  According to survey data, many Americans are concerned over the confidentiality of online medical information.  Sixty-two percent of adults in a national poll felt that use of electronic medical records makes it more difficult to ensure patients’ privacy (PR Newswire, 2007).  Seventy-five percent of Internet users worry about health care websites sharing information without their permission.  Seventeen percent of people in a Harris Interactive survey (2007) reported that they withhold information from medical personnel due to concerns that these individuals would disclose the data to unauthorized individuals.

The rise of personalized medicine makes these concerns even more pronounced.  Genetic information, by definition, is deeply personal since genotypes, enzymes, and proteins are unique to the specific person studied.  If released publicly, this information has ramifications for possible employment, economic prospects, and social relationships.  If employers knew someone carried a gene that seriously increased the odds of a chronic disease, would they hire that person?  Knowing the benefits of genetic testing for diagnosis and treatment does not mitigate the possible risk of privacy violations.

There is no question that strong privacy rules are required (Goldman and Hudson, 2000).  People fear discrimination or adverse job consequences from medical information not being kept confidential.  In part, this is why Congress in 1996 adopted the Health Insurance Portability and Accountability Act.  That legislation was designed to address patient privacy concerns and ensure that appropriate safeguards were put into place.  The American Recovery and Reinvestment Act has strengthened these privacy rules.  The Department of Health and Human Services has a proposed rule out for public comment that would apply HIPAA rules to business associates of covered entities.

In 2008, Congress passed the Genetic Information Nondiscrimination Act, which prohibits the use of genetic information in hiring or firing decisions and has been very helpful in ensuring patient protection.  The law also bars health insurers from using genetic testing to determine rates and helps to reassure consumers that genomic information cannot be used against them.

However, some experts question whether current privacy rules strike the right balance between privacy and innovation.  A 2009 Institute of Medicine (IOM) report concluded that “the HIPAA Privacy Rule does not protect privacy as well as it should, and that, as currently implemented, it impedes important health research.”  The report suggests people should distinguish “information-based research” from “interventional clinical research.”  For research projects, analysts argued that it was not feasible to get consent for all secondary uses, especially in situations where the data were “de-identified.”

The IOM Report authors suggest the need to revise privacy rules to distinguish health research from practice and to allow for a “mechanism for linking an individual’s data from multiple sources such as databases so that more useful databases can be made available for research in a manner that protects privacy, confidentiality, and security.”  In its conclusion, the report calls for a new approach to privacy, saying “effective privacy protections must be implemented in a way that does not hinder health research or inhibit medical advances.”
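One common way to approximate the linking mechanism described above is to replace each patient identifier with a keyed hash, so that records from different sources can be joined without the identifier itself appearing in the research database. The sketch below is a minimal illustration under that assumption; the field names, the sample key, and the idea of a trusted party holding the key are all hypothetical, and keyed hashing alone does not amount to de-identification because the remaining fields may still identify a person.

```python
import hashlib
import hmac
from collections import defaultdict


def pseudonym(secret_key, patient_id):
    """Derive a stable token from an identifier using HMAC-SHA256."""
    normalized = patient_id.strip().upper().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()


def link_sources(secret_key, sources):
    """Join rows from several sources on the token and drop the raw identifier."""
    linked = defaultdict(dict)
    for source_name, rows in sources.items():
        for row in rows:
            token = pseudonym(secret_key, row["patient_id"])
            fields = {k: v for k, v in row.items() if k != "patient_id"}
            linked[token][source_name] = fields
    return dict(linked)


if __name__ == "__main__":
    key = b"demo-key-held-by-a-trusted-party"  # assumption: never shared with researchers
    sources = {
        "clinical": [{"patient_id": "p-001", "outcome": "responded"}],
        "genomic": [{"patient_id": " P-001 ", "variant": "example-variant"}],
    }
    for token, record in link_sources(key, sources).items():
        print(token[:12], record)
```

Because the same key always yields the same token, new data sources can be added later without ever redistributing the original identifiers.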

New Approaches to Privacy and Access Control

The question for the health community is how to protect privacy and provide mechanisms for patients’ access control.  Technology helps in this regard because it allows patients to make decisions in more refined and differentiated ways.  Intruders leave digital fingerprints so that patients and administrators can determine who saw electronic records, what files they looked at, and how long they browsed particular parts of the record.  Digital systems produce a level of accountability that is not possible with paper records and make it easier to enforce penalties where there are intrusions.
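The accountability described here can be sketched as a record that checks per-section permissions and logs every read attempt, whether or not it succeeds. The class names, section names, and user names below are hypothetical, and the sketch records only the time of each access; measuring how long a section was viewed would require additional events.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AccessEvent:
    user: str
    section: str
    allowed: bool
    at: datetime


@dataclass
class PatientRecord:
    sections: dict
    permissions: dict = field(default_factory=dict)  # section name -> set of permitted users
    audit_log: list = field(default_factory=list)

    def read(self, user, section):
        """Log the attempt first, then return the section only if the patient allowed it."""
        allowed = user in self.permissions.get(section, set())
        self.audit_log.append(
            AccessEvent(user, section, allowed, datetime.now(timezone.utc))
        )
        if not allowed:
            raise PermissionError(f"{user} may not read {section}")
        return self.sections[section]


if __name__ == "__main__":
    record = PatientRecord(
        sections={"medications": "example entry", "genomics": "example entry"},
        permissions={"medications": {"dr_lee"}},
    )
    print(record.read("dr_lee", "medications"))
    try:
        record.read("dr_lee", "genomics")
    except PermissionError as error:
        print("blocked:", error)
    for event in record.audit_log:
        print(event)
```

Logging before the permission check is the deliberate choice here: denied attempts are often the entries that patients and administrators most need to see.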

Current limits on data sharing and secondary analysis of de-identified data make it difficult to get the benefits of genomics.  Data access is n