GDPR reading notes

tl;dr these are my reading notes on the General Data Protection Regulation (GDPR). A few basic rights are guaranteed to end users, and a lot of obligations fall on service providers. Nevertheless, most specifics will be established by national regulatory bodies, which counters the initial intent stated in the text: reinforcing the single market by harmonizing data protection and data flows within the European Union. Most of the rights granted by the GDPR already exist in the laws of many European countries.

I am trying here to go a bit deeper than just yelling “data protection officer” and “huge fines” like a maniac.

I do not have an extended background in law, and this was the first time I read a full EU regulation. Maybe my biggest takeaway for next time: do not read the first pages, which read like a long prologue full of “should” (the recitals). They lay out very general concepts and intentions but contain no formal obligations, and they make up about half of the text. The law is about “shall” and “have to”, not “should” and “may”.

I have read the GDPR in English, but it is available in all official EU languages. You can find the reference document here. I will try to cite precisely the paragraphs related to my various points: when you read 24.3.c, look at Article 24, paragraph 3, subparagraph c.

This regulation applies starting 25 May 2018.

Intentions

The General Data Protection Regulation states that its intention is to provide a common ground to ease the flow of personal data within the European Union.

Scope

Personal data regarding people (“natural persons”) located within the Union.

Out of the scope:

Processing by an individual in the course of a purely personal or household activity (2.2.c); in other words, most organizations must comply.
Most if not all cases of data processing by authorities in charge of criminal matters (2.2.d)
Data related to the dead

Generally out of the scope (because of many exceptions):

Political parties
Scientific and historical purposes (5.1.b)
Member states (23)
Any state (27.2.b & 48)
Vital interests (6.1.d)
State actions (6.1.c & 6.1.e)

Note here that a lot of statements come with exceptions for “public interest, scientific or historical research purposes or statistical purposes” (89.1). In my understanding, this means that deriving models and usage analysis are authorized in a lot of cases, as long as the data is kept usable only for those purposes (see below “restriction of processing”).

Few statements are made to protect the press and freedom of speech in general; Member States shall reconcile them with data protection by law (85.1).

Processing of children's data (8) is more restricted than that of other data.

General concepts

Personal data & data subjects

“Personal data” is any information related to an “identified or identifiable natural person”, called a “data subject” in the GDPR (4.1).

Identifiability can be direct or indirect (4.1), which means that an ID, a postal address or an email address automatically falls under the personal data definition. But multiple factors can also establish identifiability: genetic, physiological, physical or social identity. Identification shall only be possible for as long as it is necessary for the purpose of the data collection; 5.1.e allows data to be kept longer for statistical purposes, subject to appropriate safeguards.

Note that data is associated with a purpose (5.1.b): it is collected for that purpose and cannot be processed in a way incompatible with it.

Processing, profiling and restriction of processing

“Processing” is a very broad term that covers collection, recording, storage, adaptation and destruction, whether automated or not (4.2).

“Profiling” is also mentioned as a form of automated processing (4.4). Its definition covers a lot of applications of machine learning, artificial intelligence or whatever you want to call it.

“Restriction of processing” is the marking of stored personal data with the aim of limiting its processing in the future (4.3). The related principle of “data minimization” (5.1.c) limits personal data to whatever is needed for the intended processing. “If you don’t need it, remove it”, a principle that will surely limit the impact of data leaks.
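Data minimization can be sketched as an allowlist of fields per declared purpose, applied before any processing. This is a minimal illustration, not something the GDPR prescribes: the purposes and field names (`PURPOSE_FIELDS`, `"shipping"`, `"newsletter"`) are hypothetical.

```python
from typing import Any

# Hypothetical mapping: each declared purpose lists the only fields it needs.
PURPOSE_FIELDS: dict[str, frozenset[str]] = {
    "shipping": frozenset({"name", "postal_address"}),
    "newsletter": frozenset({"email"}),
}


def minimise(record: dict[str, Any], purpose: str) -> dict[str, Any]:
    """Return a copy of `record` keeping only the fields needed for `purpose`."""
    allowed = PURPOSE_FIELDS.get(purpose)
    if allowed is None:
        # An undeclared purpose gets nothing rather than everything.
        raise ValueError(f"undeclared purpose: {purpose!r}")
    return {key: value for key, value in record.items() if key in allowed}


customer = {
    "name": "Ada",
    "email": "ada@example.org",
    "postal_address": "1 Main St",
    "birth_date": "1990-01-01",
}
print(minimise(customer, "newsletter"))  # {'email': 'ada@example.org'}
```

Failing closed on an unknown purpose also ties every processing to a declared purpose, in the spirit of 5.1.b.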

Data controller and data processor

“Data controller” and “data processor” are the entities, generally companies, that process the data. The controller is the entity that decides on the processing. Processors are the outsourcing companies, ranging from software engineering services to hosting companies (4.7 and 4.8). By default, processors cannot outsource processing further without the controller’s prior written authorization (28.2).

Data protection by design and by default, explained in 25, is a super fancy term, but its definition is unclear and it shall be proportionate to the risks. See “Security” below.

No matter who is actually performing the processing, all involved commanding entities are liable. (28.4)

If a processor can access some data, it does not mean that it has a right to process it. (29)

The subject has to actively “consent” to the processing (4.11), and the consent request has to be distinguishable from other matters (7.2). My guess is that onboarding processes should clearly distinguish accepting the general terms from agreeing to data processing. The regulators insist on “clear and plain language”, which the GDPR is not precisely a perfect example of :D

7.3 states that consent is withdrawable.

7.4 brings ambiguity about freely given consent: it is questionable when a service is conditional on consent to the processing of personal data that is not necessary for that service. While raising doubts about such consent, the regulation does not state that it fails to qualify as freely given.

Breach

“Personal data breach” covers security breaches leading to unauthorized access to data, but also to its loss, alteration or accidental destruction (4.12).

Because the definition starts from a breach of security, it is unclear whether it covers a mistaken data deletion, even though accidental destruction is listed.

Unless a breach is unlikely to result in a risk to the rights and freedoms of the data subjects, a notification must be sent to the supervisory authority without undue delay and, where feasible, within 72 hours after the controller becomes aware of it (33.1). Among other things, it should describe the measures taken or proposed to mitigate its possible adverse effects (33.3.d).

If, and that is a big “if”, the data breach is likely to result in a high risk for the data subjects and the data has not been encrypted, the data subjects must be informed without undue delay (34, 34.3.a).
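The two notification rules above can be sketched as a small decision helper. This is a simplification under stated assumptions: the risk level comes from your own risk analysis, `encrypted` stands in for the 34.3.a measures that render data unintelligible, and the names (`Risk`, `NotificationPlan`, `notification_plan`) are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class Risk(Enum):
    UNLIKELY = 0  # breach unlikely to result in a risk to data subjects
    RISK = 1      # risk to the rights and freedoms of data subjects
    HIGH = 2      # high risk to the rights and freedoms of data subjects


@dataclass(frozen=True)
class NotificationPlan:
    notify_authority: bool
    authority_deadline: Optional[datetime]  # 72 hours, "where feasible" (33.1)
    notify_subjects: bool


def notification_plan(aware_at: datetime, risk: Risk, encrypted: bool) -> NotificationPlan:
    """Decide who to notify, assuming the risk level was assessed beforehand."""
    notify_authority = risk is not Risk.UNLIKELY
    deadline = aware_at + timedelta(hours=72) if notify_authority else None
    # 34.3.a: no need to inform subjects if the data was made unintelligible.
    notify_subjects = risk is Risk.HIGH and not encrypted
    return NotificationPlan(notify_authority, deadline, notify_subjects)


aware = datetime(2018, 6, 1, 9, 0, tzinfo=timezone.utc)
plan = notification_plan(aware, Risk.HIGH, encrypted=False)
print(plan.authority_deadline)  # 2018-06-04 09:00:00+00:00
print(plan.notify_subjects)     # True
```

The hard part stays outside the code: the risk assessment itself, and the fact that "without undue delay" can mean sooner than 72 hours.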

Supervisory authorities

A lot of the details of the regulations will have to be defined by the local authorities of member states (4.21).

For example, the regulation states that appropriate security measures have to be laid out, but it does not state which ones, nor does it define a Union body that will define and update them. I am thinking about cases where two local jurisdictions come up with two different standards, and the mess it could lead to: for example, one stating that passwords should be hashed using SHA1 and another one preferring SHA256. There is a huge potential for incompatibilities, headaches and nightmares.

The authority concerned (4.22 & 79) can be determined by the location of the controller or processor, but also by where a complaint has been received. Yes, you could be based in Spain and have to deal with requests from Swedish and Polish authorities. #enjoy

There are mechanisms but no strong binding obligations for consistency. (63, 64)

My reading of 78.1 is that decisions of the authorities can be appealed, bringing those authorities back into a more traditional justice system.

Data protection officer

This is the new fancy phrase to drop if you are a consultant discussing GDPR, and if you are looking for a job, it has made “I am an entrepreneur” completely has-been. You can offer your services as an external data protection officer (37.6).

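Controllers and processors must designate a data protection officer if they are a public authority, or if their core activities consist of large-scale, regular and systematic monitoring of data subjects, or of large-scale processing of sensitive data (37.1.a to 37.1.c).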

The DPO mostly ensures compliance with the GDPR (38 & 39).

Codes of conduct and certifications

The GDPR gives a lot of room to mechanisms enabling controllers and processors to prove good faith in being compliant. Codes of conduct and certifications can be defined at many levels (40, 41 & 42). This is a guarantee of regulation conflicts between states, supervisory authorities, industries and perhaps companies.

I have read a couple of articles from consultants selling those certifications as a way to reduce risks. I would stress 42.4: a certification “does not reduce responsibility”, it only demonstrates a compliance effort.

Key principles

User rights

Data subjects are guaranteed transparency over processing (5.1.a), access to their data (12.2), rectification of inaccurate information without delay (5.1.d) and erasure on request (17.1.b & 17.1.c).

12.3: data access has to be easy (electronic when requested electronically) and free (12.5). The response time is one month, which can be extended by two further months depending on the complexity and number of requests. A request can only be refused if it is manifestly unfounded or excessive, and the controller will have to be able to prove this particular point (12.5). Data should be provided in a structured, commonly used and machine-readable format; this is mentioned as the “right to data portability” (15.3 & 20).
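For the structured, machine-readable format of 20.1, JSON is a common choice. A minimal sketch, assuming personal data lives in an SQLite database whose tables all carry a `user_id` column; the table names and sample rows are hypothetical.

```python
import json
import sqlite3


def export_user_data(conn: sqlite3.Connection, user_id: int, tables: list[str]) -> str:
    """Collect every row linked to `user_id` from `tables` and return it as JSON."""
    conn.row_factory = sqlite3.Row  # rows behave like mappings of column -> value
    export: dict[str, list[dict]] = {}
    for table in tables:
        # Table names cannot be bound as parameters, so they must come from a
        # trusted allowlist; the user id itself is bound safely.
        rows = conn.execute(f"SELECT * FROM {table} WHERE user_id = ?", (user_id,))
        export[table] = [dict(row) for row in rows]
    return json.dumps(export, indent=2, default=str)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profiles (user_id INTEGER, email TEXT)")
conn.execute("CREATE TABLE orders (user_id INTEGER, item TEXT)")
conn.execute("INSERT INTO profiles VALUES (1, 'ada@example.org'), (2, 'bob@example.org')")
conn.execute("INSERT INTO orders VALUES (1, 'bike')")
print(export_user_data(conn, 1, ["profiles", "orders"]))
```

In a real schema, the hard part is finding every table linked to the person, including internal tags or scores.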

If an entity has disclosed data to third parties, it should propagate rectification and erasure requests to them where feasible (19).
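Propagation can be sketched as erasing locally, then forwarding the request to every recipient and keeping track of those that did not acknowledge it. The `Recipient` protocol, `erase_everywhere` and `FakeRecipient` are hypothetical; in practice each recipient would have its own API or contractual channel.

```python
from typing import Protocol


class Recipient(Protocol):
    name: str

    def request_erasure(self, subject_id: str) -> bool:
        """Ask the recipient to erase the subject's data; True on acknowledgement."""
        ...


def erase_everywhere(subject_id: str, local_store: dict[str, dict],
                     recipients: list[Recipient]) -> list[str]:
    """Erase locally, forward the request, and return recipients that failed."""
    local_store.pop(subject_id, None)
    failed = []
    for recipient in recipients:
        try:
            acknowledged = recipient.request_erasure(subject_id)
        except Exception:
            # Treat errors (e.g. network failures) as unacknowledged.
            acknowledged = False
        if not acknowledged:
            failed.append(recipient.name)  # keep for follow-up or documentation
    return failed


class FakeRecipient:
    """Stand-in recipient used for the example below."""

    def __init__(self, name: str, ok: bool) -> None:
        self.name = name
        self.ok = ok
        self.requests: list[str] = []

    def request_erasure(self, subject_id: str) -> bool:
        self.requests.append(subject_id)
        return self.ok


store = {"u1": {"email": "ada@example.org"}, "u2": {}}
mailer, crm = FakeRecipient("mailer", True), FakeRecipient("crm", False)
print(erase_everywhere("u1", store, [mailer, crm]))  # ['crm']
```

Returning the failures rather than raising keeps a trace of recipients where propagation proved impossible, which you may need to justify later.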

No major rights are granted against state entities (6.1.c & 6.1.e), including mass population monitoring. I guess it is ok as long as you are the good guys. #ironic

Controller duties

See “Data protection officer” above.

13 clarifies the obligation of the controller to inform on data collection: identity of the controller, purpose of processing, storage period, right to withdraw consent, existence of automated decision making, right to lodge a complaint, right to access data and origin of the personal data.

Note that 13.2.f, about automated decision making (machine learning, artificial intelligence and other data joys), asks for meaningful information about the logic involved and the envisaged consequences of the processing. It does not require accountability for the models themselves, only a general description of their principles.

Processing activities must be recorded (30) by controllers and processors. The record must contain: the name and contact details of the controller and of the data protection officer, the purposes, a description of the categories of data subjects and of personal data, the categories of recipients, transfers of personal data, the envisaged time limits for erasure, and the technical and organizational security measures. 30.5 contains a list of exceptions to this obligation, but if you are reading this article, they are probably not relevant to you.
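The 30.1 record maps naturally onto a structured type. A minimal sketch: the regulation prescribes the content of the record, not a format, so the field names and the example entry below are my own illustration.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class ProcessingRecord:
    controller_contact: str              # name and contact details of the controller
    purposes: list[str]
    data_subject_categories: list[str]
    personal_data_categories: list[str]
    recipient_categories: list[str]
    dpo_contact: Optional[str] = None    # data protection officer, where designated
    third_country_transfers: list[str] = field(default_factory=list)
    erasure_time_limit: Optional[str] = None  # "where possible" in 30.1
    security_measures: list[str] = field(default_factory=list)


newsletter = ProcessingRecord(
    controller_contact="Example SARL, privacy@example.org",  # hypothetical
    purposes=["newsletter"],
    data_subject_categories=["subscribers"],
    personal_data_categories=["email address"],
    recipient_categories=["emailing provider"],
    dpo_contact="dpo@example.org",
    erasure_time_limit="30 days after unsubscription",
    security_measures=["TLS", "access control"],
)
print(asdict(newsletter))
```

Keeping the record as data makes it easy to export when a supervisory authority asks for it (30.4).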

Data breaches must be also kept in a record. (33.5)

Besides consent, 6.1.b also authorizes processing when it is necessary for the performance of a contract to which the data subject is party, or to take steps prior to entering into one.

I did not reach a satisfying understanding of 6.1.f, which authorizes processing for the legitimate interests of the controller unless they are overridden by the fundamental rights and freedoms of the data subject. This is a “by default” right to processing in many cases, such as crawling.

Ambiguity moment: when the data has not been obtained from the data subjects, controllers must inform them about their rights at most one month after collection (14.3.a). But this only stands if feasible (14.5.b).

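9.1 forbids, without explicit consent (9.2.a) or another listed exception, the processing of special categories of data: social, political, religious, biological (genetic, biometric, health) and sexual data.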

21 enables data subjects to object to processing on grounds relating to their particular situation, and in particular to direct marketing (21.3).

22.1: if a decision based solely on automated processing has legal or similarly significant effects, data subjects can refuse it, except notably when Member State law authorizes it (22.2.b).

Security

32 mentions a few elements for the security of processing, which must be appropriate to the likelihood and severity of the risk, among which encryption and pseudonymization. Understand here that encryption is not mandatory: it is left to the interpretation of risk.
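Pseudonymization (4.5) can be sketched with a keyed hash: the same identifier always maps to the same pseudonym, so records can still be joined, but linking a pseudonym back to a person requires the secret key, which must be kept separately. HMAC-SHA256 is one common technique, not one the GDPR mandates, and `PSEUDONYM_KEY` is a placeholder.

```python
import hashlib
import hmac
import secrets

# Placeholder: in practice, load the key from a secrets store kept apart from
# the pseudonymized data (4.5 requires the additional information to be
# kept separately).
PSEUDONYM_KEY = secrets.token_bytes(32)


def pseudonymise(identifier: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Map an identifier to a stable pseudonym using HMAC-SHA256."""
    # Assumption: identifiers are treated as case-insensitive, as email
    # addresses usually are in practice.
    normalised = identifier.strip().lower()
    return hmac.new(key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


print(pseudonymise("ada@example.org"))
```

Unlike a plain hash, someone without the key cannot rebuild the mapping by hashing a list of known emails. Pseudonymized data is still personal data (recital 26), so this reduces risk rather than taking the data out of scope.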

Accountability

Controllers and processors have to be able to demonstrate compliance (5.2). They have to be able to demonstrate that consent was collected before processing (7.1). I guess that storing the last accepted “terms and conditions” version number and the consent date should become industry practice.
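Storing the accepted policy version with a timestamp could look like the sketch below: consents and withdrawals (7.3) are appended to a log, and processing checks that the latest event is a consent to the current policy. The names, the version label and the in-memory storage are hypothetical; a real system would persist these records.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

CURRENT_POLICY_VERSION = "2018-05"  # hypothetical version label


@dataclass(frozen=True)
class ConsentEvent:
    subject_id: str
    policy_version: str
    given: bool  # False records a withdrawal (7.3)
    at: datetime


class ConsentLog:
    """Append-only log, so that past consents remain demonstrable (7.1)."""

    def __init__(self) -> None:
        self._events: list[ConsentEvent] = []

    def record(self, subject_id: str, policy_version: str, given: bool,
               at: Optional[datetime] = None) -> None:
        self._events.append(ConsentEvent(
            subject_id, policy_version, given, at or datetime.now(timezone.utc)))

    def latest(self, subject_id: str) -> Optional[ConsentEvent]:
        # Assumes events are recorded in chronological order.
        for event in reversed(self._events):
            if event.subject_id == subject_id:
                return event
        return None

    def has_valid_consent(self, subject_id: str) -> bool:
        event = self.latest(subject_id)
        return (event is not None and event.given
                and event.policy_version == CURRENT_POLICY_VERSION)


log = ConsentLog()
log.record("u1", "2018-05", given=True)
print(log.has_valid_consent("u1"))  # True
```

A new policy version automatically invalidates older consents, which forces you to ask again rather than assume.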

Fines & compensations

A lot of fuss has been made around the potential costs for companies.

In case of damage, data subjects have a right to compensation (82.1), and the controller is liable for it. This means that if a company uses a processor merely as a fuse to absorb liability, this just does not work. A controller can, however, claim back from the other parties the part of the compensation corresponding to their share of responsibility (82.5).

The maximum threshold for fines is very high: €20 million or 4% of the total worldwide annual turnover, whichever is higher (83.5; 83.4 sets a lower tier of €10 million or 2%). Even if fines should be effective, proportionate and dissuasive (83.1), that is an unambiguous hint that this might become a milking machine against internet giants.
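The cap is simply the higher of the two amounts, so under 83.5 the 4% branch only matters above €500 million of turnover (83.4 sets a lower tier of €10 million or 2%). A small sketch; the function name is mine, and it computes the maximum possible fine, not what an authority would actually impose.

```python
# Fixed cap in euros and percentage of total worldwide annual turnover.
TIERS = {
    "83.4": (10_000_000, 2),
    "83.5": (20_000_000, 4),
}


def max_fine_eur(worldwide_annual_turnover_eur: int, tier: str = "83.5") -> int:
    """Upper bound of a fine: fixed cap or percentage, whichever is higher."""
    fixed_cap, percent = TIERS[tier]
    # Integer euros; the percentage branch is rounded down to the euro.
    return max(fixed_cap, worldwide_annual_turnover_eur * percent // 100)


print(max_fine_eur(100_000_000))    # 20000000
print(max_fine_eur(2_000_000_000))  # 80000000
```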

New technologies and risk assessment

New technologies fall under 35, which requires a very cautious, or at least thought-through, approach to new risks. Controllers are required to assess the impact, reduce the risks and potentially consult the supervisory authority (36) before new applications.

International transfer

In principle, a transfer to a country outside the European Union shall not undermine the protection granted to data subjects (44).

Such protection can be guaranteed by an adequacy decision of the Commission (45) or by appropriate safeguards such as standard contractual clauses (46) and binding corporate rules (47).

Technicalities

Data warehousing

Data warehousing is a practice usually reserved for cases where a large volume of data is processed: you extract data from your operational database and structure it for analytical purposes in another database. Since you are supposed to erase personal data after use or upon request, it is a very good idea to build a data warehouse where you remove identifiers but keep the data that makes sense for your business's statistical and analytical purposes.
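A minimal sketch of that extraction step, with a hypothetical schema: direct identifiers are dropped, the birth date is coarsened to a year, and aggregates are computed on what remains. Whether the result is truly anonymous still depends on the remaining attributes (see “What is identification?” below).

```python
from collections import Counter
from typing import Iterable

DIRECT_IDENTIFIERS = {"name", "email", "phone", "user_id"}  # hypothetical schema


def to_warehouse_row(record: dict) -> dict:
    """Drop direct identifiers and coarsen the birth date to a year."""
    row = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "birth_date" in row:
        row["birth_year"] = row.pop("birth_date")[:4]  # "1990-04-12" -> "1990"
    return row


def aggregate(rows: Iterable[dict], *keys: str) -> Counter:
    """Count rows per combination of `keys`, e.g. customers per city."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


customers = [
    {"name": "Ada", "email": "ada@example.org", "birth_date": "1990-04-12", "city": "Paris"},
    {"name": "Bob", "email": "bob@example.org", "birth_date": "1985-01-01", "city": "Paris"},
]
rows = [to_warehouse_row(c) for c in customers]
print(aggregate(rows, "city"))  # Counter({('Paris',): 2})
```

Once the operational record is erased, the warehouse row no longer needs to be found by identifier, which is the point of the exercise.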

Random thoughts on random cases

What is identification?

What the GDPR does not state is to what extent statistical identification qualifies as identification. If I know your city, your income, the school you attended and your birth year, does that qualify as identification? If you are in London, probably not; if you are in Konz, Germany, most likely yes. Generally, the regulation calls for considering the state of the art and leaves specific cases to the judgment of local regulators and authorities. So my personal opinion would be to call this specific case a bit far-fetched and not qualifying as personal data.
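One way to put numbers on this question is k-anonymity, a technique from the privacy literature rather than from the GDPR: for each combination of quasi-identifiers (city, school, birth year...), count how many people share it. A combination held by a single person (k = 1) singles that person out. A minimal sketch with made-up records:

```python
from collections import Counter


def _key(record: dict, quasi_identifiers: list[str]) -> tuple:
    return tuple(record[q] for q in quasi_identifiers)


def k_anonymity(records: list[dict], quasi_identifiers: list[str]) -> int:
    """Size of the smallest group of records sharing the same quasi-identifiers."""
    if not records:
        return 0
    groups = Counter(_key(r, quasi_identifiers) for r in records)
    return min(groups.values())


def singled_out(records: list[dict], quasi_identifiers: list[str]) -> list[dict]:
    """Records whose quasi-identifier combination is unique (k = 1)."""
    groups = Counter(_key(r, quasi_identifiers) for r in records)
    return [r for r in records if groups[_key(r, quasi_identifiers)] == 1]


people = [  # made-up records
    {"city": "London", "school": "A", "birth_year": 1990},
    {"city": "London", "school": "A", "birth_year": 1990},
    {"city": "Konz", "school": "B", "birth_year": 1990},
]
qi = ["city", "school", "birth_year"]
print(k_anonymity(people, qi))  # 1
print(singled_out(people, qi))  # [{'city': 'Konz', 'school': 'B', 'birth_year': 1990}]
```

Coarsening values (income bands instead of exact income, region instead of city) raises k; which k is enough is exactly the judgment call the regulation leaves open.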

Feeding your CRM system with LinkedIn profiles

Marketing purposes are stated as being legitimate, and a LinkedIn profile is explicitly public information. So most likely you have nothing to change here, except respecting the rights to rectification, objection and erasure.

Email tracking

A lot of people doing sales love tracking the opening of their emails. By default, this seems to me to infringe on the privacy of the recipient (the data subject), so it might not be compliant.

If you can collect consent for the processing, though, go for it!

This might lead to the disappearance of a few email tracking companies.

Crawling

As for LinkedIn profiles, I would say that this is compliant by default. Crawling is a technicality; what matters is the purpose. If the purpose goes against the data subjects' interests, it becomes illegal, e.g. if you start crawling for proof of love affairs.

Revenge porn

Becomes illegal, not only in the EU but against any EU resident. It is clearly against the interests of the data subject and done without consent.

Leak of hashed passwords

If you store user passwords, you should hash them, otherwise you are doing very bad engineering.

Now let’s say that you have a database containing only a user identifier (email or phone number) and a hashed password, and it gets breached. This clearly qualifies as personal data because you have an identifier. You could argue, though, that the data is worthless because there is not much inside.

But considering that people tend to reuse passwords a lot, you may have stored not only their password for your service but for many others. This would make the data much more valuable.

Depending on how you hashed the passwords (user-specific salt or generic salt), you may also have to consider that they are very likely to be cracked.
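The salt question can be illustrated with PBKDF2 from Python's standard library. With a random per-user salt, two users sharing a password get different hashes, so a leaked table cannot be attacked with a single precomputed dictionary; with a generic salt, identical passwords produce identical hashes, and cracking one cracks them all. The iteration count is an assumption for the sketch; bcrypt, scrypt or Argon2 are common alternatives.

```python
import hashlib
import hmac
import os
from typing import Optional

ITERATIONS = 600_000  # assumption: tune to your hardware and current guidance


def hash_password(password: str, salt: Optional[bytes] = None) -> tuple[bytes, bytes]:
    """Return (salt, hash); a fresh random 16-byte salt is drawn when none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)  # constant-time comparison


salt, stored = hash_password("hunter2")
print(verify_password("hunter2", salt, stored))  # True
```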

You must notify the supervisory authority and update your breach record.

Whether you should also warn your users is more debatable.

Codes of conduct incompatibility

There is a high authority, the European Data Protection Board, to resolve conflicts between supervisory authorities, and there are guidelines to promote consistency, but nothing forces it. Hence I would say that if two codes of conduct are incompatible, there is a grey area; but since following a code of conduct or getting certified is not an obligation, it is not a major problem.

There is a notion of lead supervisory authority (56.1), determined by the location of the controller's main establishment, though this may still mean multiple authorities for service providers or subsidiaries.

Student assignment to universities

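In some countries, students are assigned to universities through a fully automated process governed by law. In principle, the GDPR gives an explicit right to refuse this decision and request a manual processing (22), although the exception for decisions authorized by Member State law (22.2.b) may apply here.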

Data transmission through post-Brexit UK

Either the Commission will consider post-Brexit UK an adequate country (45), or providers in the UK will have to accept binding contracts to ensure GDPR compliance (46).

Blockchain transactions

A lot of blockchain technologies have identifiable transactions, so they qualify as personal data. Since a heavily distributed system has no legal entity behind it and the data is technically not erasable, the rights to rectification and erasure are likely impossible to exercise.

Compliance attacks

Even once the regulation applies, it is very likely that a lot of organizations will not be ready to comply. Each data access request causes a certain amount of extra work that companies are forced to provide for free.

Even the compliance of big organizations can be disputed, and compliance paranoia can cost a lot of money. For example, Facebook now enables you to export “all” your data, but this export is missing internal user tagging or scoring as well as foreign keys. All of those are linked to you, so they qualify as personal data and should be accessible under the GDPR. Full compliance most likely means that organizations are going to leak a lot of their architecture and processes through data exports, and users are entitled to do whatever they want with them.

Movies & art production

Let’s say that in 2016 you hired an actor for a movie. In June 2018, he comes to you and asks you to remove him from every recording that you have.

This claim is heavily connected with image rights, which I do not know well. But under the GDPR, the movie contains personal data because you can clearly identify the actor (4.1), and he has a right to erasure (17.1.b) if he withdraws consent. But as the copyright owner, you can argue that the copyright gives you overriding legitimate grounds for the processing (17.1.c).

Interviews

The right of access has very few limitations. So if you are curious to know what has been said about you in an interview, you can now ask. All recorded data should be sent to you, whether digital or not.

Losing a phone or an access badge

Losing a phone or an access badge is clearly a data breach in every sense. Whether to inform the supervisory authority and the users depends on your risk analysis: what does this phone or badge give access to? How sensitive is the data? Is it encrypted?

Sending an email to the wrong address

Again, what kind of risk are we talking about here? If you have just sent a medical analysis to the wrong person, I would say it ranks among the risky, perhaps highly risky, cases. If it was just about buying a bike, that is a different call.

Checklist

A short checklist to jump to if you are a controller or processor acting under the GDPR. This is most likely the case if a database of people is at the core of your business.

Minimum:

Data protection officer (where required, see 37.1)
Rightful processing base: mostly consent or legitimate interest
If you have an obligation to inform the data subject, provide the controller's identity, the data protection officer's identity, the purpose of processing, the legitimate interests pursued, the recipients of the data and any transfer to third countries or international organisations.
If consent is collected from the data subject, provide a clear link to the data policy.
If consent is collected, maintain a record of when and which data policy was accepted.
A way for the data subjects to send data update, export or deletion request.
Record of processing activities (see above)
Record of data breaches

Enhanced, very basic:

Automated way for data export
Automated way for data deletion
Encryption

Without fully endorsing this list, since it goes beyond the regulation, I suggest checking the GDPR Compliance Checklist.

Conclusion

So I seem to have written a 10-page article about an 88-page law document. I am pretty proud of this; in my head it is now a 98-page topic.

More seriously, the GDPR brings a new administrative burden to data processing but provides neither norms nor a unified mechanism to define them in order to ensure better processing of data, as is now the habit with regulatory topics. I would not be surprised to see a GDPR 2 in a few years, like we have had PSD2, MiFID II and Basel III. If so, software is becoming the new finance: a fancy target of regulatory efforts.

As a conclusion, regardless of the content of the regulation, I am wondering what motivation drove it. There is a lot of media coverage about big companies selling people’s data, which in most cases is a very slanted way of saying that they make revenue from targeted advertising. There is not much of a track record suggesting that our society (especially in Europe) actually needed such a regulation.

Maybe the big internet corporations make easy targets, but they will adapt to all this pretty easily. New actors will not. Google has a huge user base that will opt in to whatever it asks, and its business will just run as usual. If a new company in Europe wants to take Google’s place, it will have to do so without a massive user base consenting to data analysis.

On the other hand, there is a clear need for wider markets in information technologies, a topic not addressed by the GDPR. Many European countries have industry-specific regulations requiring the approval of some technologies and service providers, like HDS (health data hosting) in France or PSF (financial sector professional) in Luxembourg. This leads to many markets being disconnected from the real-world tech market: as far as I know, none of AWS, Azure or Google Cloud is approved for either PSF or HDS. This means that selling you another pair of shoes involves more up-to-date information technology than curing your cancer or investing your retirement money.

This article perhaps lacks a few reviews; if you have comments or criticism, please use the Hacker News thread.