
De-Identified Is Not Anonymous

by Michael Dean Thompson

Corporations collect all kinds of data about their customers, with few rules governing what they can do with it. Often, these collections come with assurances that the data will be de-identified before being sold to third parties such as data brokers. One telling example is the many apps that collect location data and then de-identify it by stripping everything but the device ID, location, and timestamp. That data, however, is anything but anonymous.

Nearly all data carry some degree of identifiability. Direct identifiers, such as Social Security and driver’s license numbers, are obvious. Indirect identifiers, such as phone numbers and email addresses, give a high degree of confidence about identity. If a specific email address is tied to a corresponding phone number during a transaction, the pair begins to approximate a direct identifier. Aggregation, however, can render data unidentifiable, as with aggregated census data. An example of unidentifiable aggregated data would be a count of people who list a specific restaurant as their favorite.

Even when indirect identifiers are removed from the data, law enforcement, data brokers, identity thieves, and many others may still be able to identify the people within it. The question is how many data points can be tied to a single person (i.e., the granularity of the data). Location data is collected quite frequently by various apps, including those built into automobiles. Studies have shown that just a few locations tied to a specific device are enough to identify the person behind the device.
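To make the point concrete, here is a minimal sketch of how a handful of known locations can narrow a "de-identified" dataset down to a single device. The records, device IDs, place names, and the devices_matching helper are all hypothetical and invented for illustration; real location data is far larger and messier, and this is not a description of any particular broker's method.

```python
from collections import defaultdict

# Hypothetical "de-identified" records: (device_id, place, hour_of_day).
# No names, emails, or phone numbers are present.
RECORDS = [
    ("dev-a", "home-1", 7), ("dev-a", "office-9", 9), ("dev-a", "gym-3", 18),
    ("dev-b", "home-2", 7), ("dev-b", "office-9", 9), ("dev-b", "cafe-4", 12),
    ("dev-c", "home-3", 8), ("dev-c", "school-5", 9), ("dev-c", "gym-3", 18),
]


def devices_matching(records, known_points):
    """Return the device IDs whose traces contain every known (place, hour) point."""
    traces = defaultdict(set)
    for device_id, place, hour in records:
        traces[device_id].add((place, hour))
    wanted = set(known_points)
    # A device matches only if all of the known points appear in its trace.
    return sorted(d for d, points in traces.items() if wanted <= points)


# One known point (where someone works) still leaves two candidate devices.
one_point = devices_matching(RECORDS, [("office-9", 9)])
# Adding a second known point (their evening gym visit) leaves only one.
two_points = devices_matching(RECORDS, [("office-9", 9), ("gym-3", 18)])

print(one_point)   # ['dev-a', 'dev-b']
print(two_points)  # ['dev-a']
```

Once a single device ID remains, every other record carrying that ID, including the overnight "home" location, belongs to the same person.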

A 2013 study in Nature found that just four GPS coordinates were enough to identify 95% of the people in a dataset containing location data for 1.5 million people. Armed with just a birth date, gender, and the first three digits of the ZIP code (a very rough location point), one study was able to identify 87% of the people. Yet another study of the same three data points identified 63%. If a website asked for those three tidbits of information, very few of us would hesitate to give them, not realizing that we are more likely than not identifying ourselves.
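The percentages in studies like these measure how many people are the only ones in a dataset with their particular combination of attributes. A rough sketch of that calculation, using a tiny made-up table (the PEOPLE rows and the unique_fraction helper are assumptions for illustration, not data or code from any of the studies), might look like this:

```python
from collections import Counter

# Hypothetical records: (birth_date, gender, zip_prefix). No names attached.
PEOPLE = [
    ("1980-02-14", "F", "787"),
    ("1980-02-14", "F", "787"),  # shares all three attributes with the row above
    ("1980-02-14", "M", "787"),
    ("1975-11-30", "M", "787"),
    ("1992-06-01", "F", "331"),
]


def unique_fraction(rows):
    """Fraction of rows whose attribute combination appears exactly once."""
    if not rows:
        return 0.0
    counts = Counter(rows)
    unique = sum(1 for row in rows if counts[row] == 1)
    return unique / len(rows)


fraction = unique_fraction(PEOPLE)
print(f"{fraction:.0%} of records are unique on these three attributes")
# prints: 60% of records are unique on these three attributes
```

Anyone who is unique on those attributes can be matched against another source that holds the same attributes alongside a name, such as a public record.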

The problem is that granular data is more valuable to data brokers. Aggregating data requires knowing in advance how it will be used, whereas granular data can be repackaged for many different buyers.

Increasingly, law enforcement is turning to corporations and data brokers for consumer data. Fusion centers armed with massive data processing capabilities are devouring consumer data to feed their surveillance apparatuses. Location history warrants account for up to 25% of the warrants Google receives. And while Google has recently announced efforts to change its practices with location data, it is not the only company collecting such data, nor is location the only identifiable data it collects. There must be a change at the public policy level that forces corporations to disclose the types of data they collect and how that data is shared (especially with law enforcement), and to do both only on an opt-in basis.


