In the Future We Will Store Data Not in a Cloud But in a Lake

[Image: A data center]

Big data has led to huge gains in efficiency and productivity. The private sector has developed many innovative and beneficial uses for big data, while the public sector has lagged behind and is only beginning to apply these analytical techniques to deliver better services and run government programs more efficiently.

Current Government Big Data Policy

In May, the Obama Administration released the report Big Data: Seizing Opportunities, Preserving Values. The report emphasizes the President's goal of harnessing big data more effectively for the benefit of the American public. One innovative approach is coming from the Centers for Medicare & Medicaid Services (CMS), which recently launched the CMS Virtual Research Data Center (VRDC). This data portal allows researchers to access vast amounts of health care data from their own computers. The VRDC allows for greater data transparency and could yield better insights into how to improve the effectiveness of current health care systems. This type of access should be extended to other sectors of government to encourage the same kind of analysis.

The President’s report calls for improved data standards and places a premium on machine-readable formats. The strategy is to enable entrepreneurs and others to analyze such data sets using big data analytics. Given that the public sector lags behind in its use of data, this is undoubtedly the best approach, but it is not without disadvantages. To make data easier to share, government agencies must clean it and convert it into machine-readable formats. The release version of the data is often aggregated to support only specific types of analysis. For example, CMS provides data on how often doctors perform certain procedures and how much they charge. The process of cleaning and aggregating the data strips out a significant amount of record-level detail, which places a strict limit on the possible avenues of study. To make finer-grained data available, the government must adopt a new approach to the management of data.
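To make that trade-off concrete, here is a minimal sketch in Python (using pandas, with entirely hypothetical column names and values, not actual CMS data) of the kind of aggregation described above: individual claim records are rolled up into per-procedure counts and average charges, and record-level detail such as patient age never appears in the release version.

```python
import pandas as pd

# Hypothetical raw claim records of the kind an agency might hold internally.
raw_claims = pd.DataFrame({
    "provider_id": ["D001", "D001", "D002", "D002", "D002"],
    "procedure":   ["99213", "99213", "99213", "93000", "93000"],
    "charge":      [120.0, 135.0, 110.0, 95.0, 105.0],
    "patient_age": [34, 67, 45, 58, 72],  # detail dropped on release
})

# The "release version": counts and average charges per provider/procedure.
released = (
    raw_claims
    .groupby(["provider_id", "procedure"], as_index=False)
    .agg(times_performed=("charge", "size"),
         avg_charge=("charge", "mean"))
)

print(released)
# Researchers can compare providers, but can no longer study, say, how
# charges vary with patient age; that column never leaves the agency.
```

The aggregation is useful for the analyses it was designed around, but any question that depends on the discarded columns is simply unanswerable from the released data.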

Data Lakes: The Future of Big Data

Data lakes are widely regarded as the next step in the rapid evolution of big data. These massive repositories hold data in its raw, “unclean” state, giving users access to a purer form of the data. Administrators would format and standardize data only enough to make it machine readable and easy to use. Private-sector developers are building new types of search engines to query these vast stores of complex data, and several enterprise data companies have begun developing the concept in the hope of offering data lakes as a common enterprise solution. The government spends considerable resources collecting, cleaning, and aggregating data. Data lakes would cut down on the resources needed to process data and make it easier to combine data sources from across government agencies. Implementing a data lake could lead to more efficient government processes and access to more diverse data sets.
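One way to picture the difference: a traditional release pipeline enforces a schema when data is written, while a data lake stores raw records and applies a schema only when someone queries them. The sketch below (plain Python, with invented field names standing in for records from two different agencies) illustrates this “schema-on-read” idea.

```python
import json

# A data lake keeps records in their raw, uncleaned form. Here, raw JSON
# lines from two hypothetical sources with inconsistent field names.
lake = [
    '{"doctor": "D001", "proc": "99213", "amount": "120.00"}',
    '{"provider_id": "D002", "procedure_code": "93000", "charge": 95.0}',
]

def read_claim(raw_line):
    """Apply a schema at read time, mapping whichever field names
    appear in the raw record onto one common shape."""
    rec = json.loads(raw_line)
    return {
        "provider":  rec.get("doctor") or rec.get("provider_id"),
        "procedure": rec.get("proc") or rec.get("procedure_code"),
        "charge":    float(rec.get("amount") or rec.get("charge")),
    }

# Nothing was cleaned or converted up front; interpretation happens only
# when the query runs, so other users remain free to read the same raw
# records under a different schema suited to their own analysis.
claims = [read_claim(line) for line in lake]
print(claims)
```

Because the raw records are never thrown away, each new research question can bring its own mapping, which is what makes combining sources across agencies more tractable than exchanging pre-aggregated releases.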

The depth of analysis that data lakes enable holds immense potential for improving existing information systems and achieving new insights. However, serious consideration should be given to the level of information accessible through a data lake. In the health care industry, a data lake could provide access to patient-level information spanning a patient’s entire lifetime. Who will be able to access this information, and through what medium? These are just a couple of the questions that must be answered before this type of technology can be safely and effectively implemented.
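Whatever the policy answer turns out to be, it ultimately has to be encoded in software somewhere between the lake and the user. As a purely illustrative sketch (the roles and redaction rules here are invented, not drawn from any actual agency policy), one common pattern is to filter records according to the requester’s role before anything leaves the lake:

```python
# Illustrative only: hypothetical roles and redaction rules.
PATIENT_FIELDS = {"patient_id", "birth_date"}

def view_record(record, role):
    """Return only the fields of a record that a given role may see."""
    if role == "clinician":
        return dict(record)  # full record, identifiers included
    if role == "researcher":  # de-identified view
        return {k: v for k, v in record.items() if k not in PATIENT_FIELDS}
    raise PermissionError(f"role {role!r} has no access")

record = {"patient_id": "P123", "birth_date": "1950-04-02",
          "procedure": "93000", "charge": 95.0}

print(view_record(record, "researcher"))  # no patient identifiers
print(view_record(record, "clinician"))   # everything
```

A real deployment would need far more than this, including audit logs and protection against re-identification from combined queries, which is precisely why these questions need answers before the technology is rolled out.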


Kevin Risser contributed to this post.