The world’s 3.6 billion internet users depend on computer algorithms to sort through the vast ocean of information available online. Algorithms follow a set of programmed instructions to transform data into a form that humans can understand, deciding everything from the content of social media feeds to the creditworthiness of borrowers. Though algorithms handle digital data, their decisions also have consequences in the analog world. Last month, a homeowner in Illinois filed a lawsuit against the real estate data website Zillow, alleging that its home value estimator tool significantly undervalued her home and impeded its sale. To protect European citizens when their data is used in “automated decisionmaking”, the European Union enacted new data protection rules last year. This kind of scrutiny may increase as reliance on algorithms to make sense of online data grows.
Big data, big money
Zillow responded to the Illinois lawsuit by stating that its “Zestimate” tool did not constitute an appraisal, which requires a license in the state of Illinois. Without Zillow or a similar service calculating a property value estimate based on square footage, tax appraisals, and sales of similar homes, buyers would have to research this information on their own. However, the sums involved raise the stakes for computing home price estimates: Zestimate’s reported average error rate of five percent of the final sale price translates into a difference of tens of thousands of dollars on a home that sells for several hundred thousand dollars.
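To make the arithmetic behind that figure concrete, the short sketch below applies the reported five percent error rate to a few sale prices. The function name and the sale prices are hypothetical, chosen only for illustration; they are not Zillow data.

```python
def estimate_error_dollars(sale_price: float, error_rate: float = 0.05) -> float:
    """Return the dollar gap implied by a relative error on a sale price.

    The 0.05 default mirrors the five percent average error rate cited
    in the text; everything else here is an illustrative assumption.
    """
    return sale_price * error_rate


if __name__ == "__main__":
    # Hypothetical sale prices, not figures from the lawsuit or from Zillow.
    for price in (200_000, 400_000, 800_000):
        gap = estimate_error_dollars(price)
        print(f"${price:,} sale price: estimate could be off by about ${gap:,.0f}")
```

Because the error scales with the sale price, the same percentage that looks small in the abstract grows into a large dollar amount as home values rise.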
Regardless of how the case is decided, the lawsuit raises the question of how an individual can dispute the results of an algorithm. Typically, the underlying code is not open to public scrutiny, and technology companies guard the content of algorithms as extremely valuable intellectual property. For example, Zillow recently offered a million-dollar prize to the individual or team that could best improve the accuracy of its estimation tool. Forcing companies to disclose the contents of their algorithms could devalue their intellectual property, but without some form of disclosure, the possibility of adverse consequences for individuals remains. The Zillow lawsuit or future regulation could mandate some recourse for individuals questioning how their information is used.
Explaining a “right to explanation”
Across the Atlantic, European Union regulators have long held an interest in how online information impacts EU citizens. The General Data Protection Regulation (GDPR), scheduled to take effect in 2018, updates the previous set of rules from 1995. The GDPR codifies the “right to be forgotten”, allowing EU citizens to request that search engines remove links to information deemed incorrect or irrelevant. In addition, the regulation establishes “the right not to be subject to a decision based solely on automated processing, including profiling, which produces legal effects concerning him or her or similarly significantly affects him or her.” To uphold this right, the GDPR guarantees a right to human intervention whenever the results of an algorithm come into question.
Technology observers worry that the GDPR establishes a “right to explanation” for automated decisions, though the extent of this right is subject to interpretation. Sandra Wachter, Brent Mittelstadt, and Luciano Floridi at the Oxford Internet Institute argue that the language of the regulation is too vague to offer individuals anything more than a “right to be informed” that their data is used. Even setting the legal ambiguity aside, it remains unclear exactly how to explain outcomes in a satisfactory manner. Algorithms find statistical correlations in sample data, but are not designed to establish causation between data inputs and outputs.
This becomes especially problematic in cases where input data reflects human biases based on race, gender, age, sexual orientation, and other categories protected by anti-discrimination laws. Without additional oversight, algorithms can amplify these biases. One potential solution would be to strip away any identifying information that could lead to discrimination, intended or otherwise. However, this preventative measure would not explain any disputed outcomes after a decision was made. According to Bryce Goodman and Seth Flaxman, also at the Oxford Internet Institute, algorithms must be designed so that a human can interpret the outcome. The two researchers nonetheless highlight a tradeoff between the representation and interpretation of algorithms. Simpler models are easier to explain, but can fail to capture complex interactions among many variables.
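As a rough illustration of both ideas, the sketch below first drops protected fields from a record and then scores what remains with a simple linear model whose per-field contributions a person can read directly. The field names, weights, and applicant record are all hypothetical assumptions made for this example; they do not describe any real scoring system or the researchers’ own method.

```python
# Hypothetical list of protected fields; a real system would define its own.
PROTECTED_FIELDS = {"race", "gender", "age", "sexual_orientation"}

# Hypothetical weights for a toy linear score, chosen only for illustration.
WEIGHTS = {
    "income_thousands": 0.5,
    "years_at_address": 2.0,
    "missed_payments": -10.0,
}


def strip_protected(record: dict) -> dict:
    """Return a copy of the record without any protected fields."""
    return {key: value for key, value in record.items()
            if key not in PROTECTED_FIELDS}


def explain_score(record: dict) -> tuple[float, dict]:
    """Score a record and report how much each field contributed.

    A linear model is used because each contribution is simply
    weight * value, which a person can inspect line by line.
    """
    clean = strip_protected(record)
    contributions = {name: weight * clean[name]
                     for name, weight in WEIGHTS.items() if name in clean}
    return sum(contributions.values()), contributions


if __name__ == "__main__":
    # Hypothetical applicant record.
    applicant = {
        "age": 52,
        "gender": "F",
        "income_thousands": 60,
        "years_at_address": 4,
        "missed_payments": 1,
    }
    score, parts = explain_score(applicant)
    print("score:", score)
    for name, amount in parts.items():
        print(f"  {name}: {amount:+.1f}")
```

The per-field breakdown is what makes this model easy to explain, and it is also what limits it: a sum of independent terms cannot represent interactions among variables, which is the tradeoff described above.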
While algorithms can perform routine functions faster than any human, they cannot evaluate themselves on how well they serve human needs. The Zillow lawsuit and the EU’s General Data Protection Regulation show a growing recognition of the consequences of some “automated decisionmaking”. Those impacted should understand how algorithms use their data to arrive at an outcome without needing any computer science expertise. Giving users an overview of how their data is used also preserves intellectual property protections of the algorithm’s underlying code.
To determine exactly what information users should receive, regulators should work with technology companies when developing rules for human oversight of algorithms. This will give the public a better understanding of how algorithms use their data, give regulators better tools to protect consumers, and give tech companies valuable feedback for improving their products. As more algorithms are deployed to make sense of increased personal data generated online, some effort must be made to make sense of all the algorithms.