The site you are currently visiting does not have a cookie consent banner because it does not set first-party cookies, nor does it run software setting third-party cookies. It also doesn’t track or collect personal information from people visiting the site. There. Easy-peasy. End of story.
OK, so not quite.
Nobody likes consent banners, regardless of what you think of cookies or tracking. Ideally, in my view, that dislike should steer publishers away from tracking that requires consent.
Sadly, that is rarely the case. Most publishers – or at least their advertisers – prefer to redirect their readers’ annoyance at the lawmakers who “oblige” them to pester their users with complicated cookie consent forms. The lawmakers do no such thing.
The ePrivacy directives and the GDPR do not require you to annoy and pester your users. Neither do they require you to abjure data or analytics. They require a reasonable balancing of considerations when storing and processing information. If you want to store personal information without any other lawful basis, they require consent. Quite rightly so.
This series reflects the fact that for the longest time I left the question of what my site did with cookies and personal data unexamined. There were always better projects, more interesting ones to engage with. However, as I was increasingly making demands on others to respect my GDPR rights, I was starting to feel a bit hypocritical not really knowing what I was doing with data myself, or what I ought to be doing.
Here’s my starting point:
- I want to respect my visitors’ privacy.
- I want to be in compliance with the law.
- I do not want to profile or track individual visitors.
- I do not want to monetise visitor information.
- I do not want copy-pasta legalese, written and styled to manipulate “consent”.
- I am, however, curious to see what is useful to visitors to this site.
What EU rules are applicable with regard to cookies and analytics
In short, the ePrivacy directive(s) and the General Data Protection Regulation. In the future the ePrivacy Regulation should supersede the ePrivacy directives, but for now those three are the foundation for national laws or – in the case of the regulation – the law. In full:
- Directive 2002/58/EC of the European Parliament and of the Council of 12 July 2002 concerning the processing of personal data and the protection of privacy in the electronic communications sector (Directive on privacy and electronic communications)
- Directive 2009/136/EC of the European Parliament and of the Council of 25 November 2009 amending […] Directive 2002/58/EC concerning the processing of personal data and the protection of privacy in the electronic communications sector […]
- Regulation 2016/679/EU of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation)
A bit of nomenclature: In order to gain the full force of law, directives must be implemented in national legislation. Regulations are effectively law in the entire union straight away but will often also (eventually) result in national legislation to square away any differences. If the member state you live in keeps laws in conflict with a regulation on the books, please take your government to EU court.
I am a Danish citizen, living in Denmark, operating a site hosted in Germany. The differences in how the directives are reflected in national legislation, however, are only really interesting to marketing people desperately trying to find loopholes. I will just refer to the directives, and the choices I make should be legal in any member state.
It is also worth noting that the directives are set to be replaced by the long-delayed ePrivacy Regulation (remember the distinction between directives and regulations). Somewhat contrary to the EU Parliament’s character, the regulation currently (November 2021) looks set to make the rules on data gathering for analytics purposes more lenient. In other words: What I end up with here will in all likelihood continue to be legal for the foreseeable future. I will try to post an update once the regulation lands.
The 2002/58/EC directive dealt with a lot of things, and of its many articles the only relevant one here is article 5(3), and even that referred only to information stored locally with the end user, not explicitly to cookies. The 2009 amendment reworded that article, replacing the right to refuse with a requirement for the user’s consent, and also contained a recital that went into a bit more detail and called out cookies specifically. And when that directive resulted in national law in the member states, we got the whole “cookie law” debacle of the early 2010s.
If you are confused as to what is covered by the GDPR and what the directives (or “cookie law”) deal with, it’s really quite simple: The directives only deal with the act of storing and reading information on the end user’s machines (“terminal equipment”, which seems to cover everything on the end user’s network, including routers, ISP modems, etc., not just stuff that functions like a good old-fashioned terminal or its modern-day descendants). They do not operate with any notion of data or tracking. It’s simply regulation of a precise technical act. Whether you’re placing cookies to track or using visitors’ computers as some sort of distributed filesystem for your own files because you ran out of space, it falls under article 5(3) as an act “[…] to store information or to gain access to information stored in the terminal equipment […]”. The 2009 amending directive makes some distinctions; more on that below.
The GDPR, on the other hand, operates on another level. The GDPR regulates the storing and processing of personal information. The manner in which you come into possession of that information is unimportant (though there are limitations so that listening to village gossip does not make you a data controller). The implication for our purposes is that, unlike the ePrivacy directives, the GDPR applies regardless of whether the data collection happens server side or client side.
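| | Personal data | Not-personal data |
|---|---|---|
| Stored on or read from the user’s device (client side) | ePrivacy directives and GDPR | ePrivacy directives |
| Collected server side | GDPR | Neither |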
Finally, they both apply to me, regardless of the size of my tiny operation. I suspect I’m not the only one who has at some point conflated the GDPR with the similarly intentioned California Consumer Privacy Act (CCPA). While the CCPA only applies to businesses, and medium-sized to large ones at that, the EU legislation applies, AFAICT, to anybody operating a website.
I want to quote the 2002 directive article 5(3) in full because there really isn’t much to it.
Member States shall ensure that the use of electronic communications networks to store information or to gain access to information stored in the terminal equipment of a subscriber or user is only allowed on condition that the subscriber or user concerned is provided with clear and comprehensive information in accordance with Directive 95/46/EC, inter alia about the purposes of the processing, and is offered the right to refuse such processing by the data controller. This shall not prevent any technical storage or access for the sole purpose of carrying out or facilitating the transmission of a communication over an electronic communications network, or as strictly necessary in order to provide an information society service explicitly requested by the subscriber or user.
Article 5(3), Directive 2002/58/EC (https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32002L0058&from=EN)
As mentioned above the object of the article is “stor[ing] information or […] gain[ing] access to information stored in the terminal equipment of a subscriber”. This requires informed user consent. Two exceptions are made, one for what I think is ISPs storing information on their modem that they have loaned to you, and one for basic functionality that requires local storage. Shopping carts are an oft-cited example. Login cookies could be another. In essence, the user themselves does something that can only reasonably be accomplished using local storage. The user’s consent is therefore implied.
The 2009 directive didn’t add much to article 5(3) but it started a discussion about whether the 2009 and 2002 directives required opt-in or opt-out schemes. Some thought the 2009 directive signalled an opening for opt-out because recital 66 mentions “the right to refuse”. It seems a bit academic. The net result, whatever the classification, was the proliferation of consent banners. In some jurisdictions apparently (the UK, surprise surprise) consent could be inferred by the user not clicking on anything. In others it could not. So opt-out and opt-in respectively, I’m guessing.
General Data Protection Regulation
[Personal data shall be] adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed (‘data minimisation’)
Article 5(1)(c), General Data Protection Regulation
The General Data Protection Regulation (note the regulation moniker) regulates what rights apply when authorities or private enterprise store and process personal information. The GDPR can broadly be described as ‘anti-data harvesting’:
- Have a lawful purpose for your data processing
- Restrict your purpose as much as possible
- Restrict your data processing to only what is needed for that purpose
- Only keep that data for as long as you need it
I will always need a lawful basis for any processing of personal data. Lawful bases are listed in article 6(1) and include, among other things, compliance with other laws, delivery on a contract, “legitimate interest” (e.g. that of a school in their students), or consent. Even with consent (by reading this you hereby authorize me to gather absolutely all the data my trojan can get from the local files on your machine) there are limits on what I can do. Consent obviously has to be informed, and it can only be for using the data for specific, defined purposes, rather than an all-encompassing right to use it as I see fit.
I believe article 7(4), read together with the data minimisation principle in article 5(1)(c), also says, basically, that the personal data has to be relevant to what business I have with visitors. So even if I somehow obtained consent to record visitors through their webcams while they are on the site, it would not be considered valid, because the personal data I would be gathering, i.e. recordings, would not be relevant to the purposes of this blog. Were a live streaming website to ask a visitor for consent to the same, it would be much more appropriate and less likely to be struck down.
Most crucially from the perspective of individuals, the GDPR established various rights, including the right of access, rectification and erasure. With some exceptions – the intelligence services don’t have to tell you what they know about you; creditors can keep your address on file whether you like it or not – we enjoy wide-ranging rights to know what is registered about us and to ask for its deletion.
The GDPR also establishes the roles of data controller (the authority or private enterprise on whose behalf the data collection and processing is being done) and data processor, the agent who does the actual collecting, storing or processing. The essential implication of these definitions is that you cannot outsource your responsibility as a data controller. If Google did it on your behalf, you are still accountable.
An issue familiar to many who have tried to assert their GDPR rights is the question of what exactly is “personal data”. This is central because the scope of the GDPR is personal data:
This Regulation applies to the processing of personal data wholly or partly by automated means […]
Article 2(1), General Data Protection Regulation
… and, implicitly, not to data that are not personal (or not wholly or partly computerized or systematically filed). When I asked a charity that I had once, long ago, contributed to (and who persisted in calling me after I had told them not to) to share with me what personal information they had on me, they sent screenshots of their systems showing my address and phone number and nothing else. Yes, an address and a phone number are personal information, but so is a log of calls and responses. An evaluation of ‘squeeziness’ is personal information. Call notes about being asked to be left alone? Yep, that’s personal information. Only a subclass of personal information is labelled sensitive personal information and requires very good reasons for processing (e.g. my employer, while lawfully obligated to register a ton of data about our students, is not allowed to register their ethnicity). But any information, no matter how banal, can be personal information when it is tied to personal identifiers. To quote the definition in article 4 of the GDPR:
‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;
Article 4(1), General Data Protection Regulation
I am going to consider IP addresses a direct personal identifier, so any information attached to a request from a specified IP address is personal information. This seems to be the consensus on the web and it’s certainly the safest assumption.
However, it also follows that any data trove, however detailed, not attached to personal identifiers may no longer be “personal data” and so may not be covered at all by the GDPR. Anonymization and pseudonymization may achieve this to some degree. What constitutes effective anonymization or pseudonymization is a contentious and complicated subject, so take the following with a large grain of salt. Recital 26 is central to understanding the subject:
Personal data which have undergone pseudonymisation, which could be attributed to a natural person by the use of additional information should be considered to be information on an identifiable natural person. To determine whether a natural person is identifiable, account should be taken of all the means reasonably likely to be used, such as singling out, either by the controller or by another person to identify the natural person directly or indirectly. To ascertain whether means are reasonably likely to be used to identify the natural person, account should be taken of all objective factors, such as the costs of and the amount of time required for identification, taking into consideration the available technology at the time of the processing and technological developments.
Excerpt from Recital 26, General Data Protection Regulation
Let’s take a hypothetical example. I am creating a vast database of natural persons’ preferences for pears over apples or vice versa. Like so:
This is personal data because it relates to “identified […] natural persons”. Is the information banal and trivial? Sure. It’s still personal data. Reading the GDPR, I realize I don’t care about who likes what; I just care about frequencies: How popular are apples compared to pears? I don’t want to be held accountable for how I process the personal data of Alice and Bernard. So I anonymize the data:
Now I just have a list of observations from which I can calculate frequencies (40% prefer pears, 60% prefer apples). This is not personal information. Deleting the observations and only storing the frequencies, i.e. aggregation, would also mean that I was no longer in possession of personal data.
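The two steps above, anonymization by dropping the identifier and aggregation by keeping only frequencies, can be sketched in a few lines of Python. This is a minimal illustration, not the data set from this post: apart from Alice and Bernard, the names are hypothetical, and every preference is invented so that the counts match the 40/60 split mentioned above.

```python
from collections import Counter

# Hypothetical data set: names other than Alice and Bernard, and all
# preferences, are invented for illustration only.
personal_data = [
    ("Alice", "pears"),
    ("Bernard", "apples"),
    ("Carla", "apples"),
    ("Dennis", "pears"),
    ("Eva", "apples"),
]

# Anonymization: drop the identifier, keep only the observation.
observations = [preference for _name, preference in personal_data]

# Aggregation: keep only relative frequencies; the observations can then be deleted.
counts = Counter(observations)
frequencies = {fruit: count / len(observations) for fruit, count in counts.items()}

print(frequencies)  # {'pears': 0.4, 'apples': 0.6}
```

Once only `frequencies` is stored, there is nothing left that relates to an identified or identifiable person.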
Pseudonymization and hashing
What about pseudonymization? For work I once had to figure out how to share some data on students with an outside party. My employer, the data controller, had a lawful basis to process the personal data; it was required to do so by national law. The outside party did not have any legal basis to process the information. That means we would need a data processing agreement, as stated in article 28:
Where processing is to be carried out on behalf of a controller, the controller shall use only processors providing sufficient guarantees to implement appropriate technical and organisational measures in such a manner that processing will meet the requirements of this Regulation and ensure the protection of the rights of the data subject.
Article 28(1), General Data Protection Regulation
Our data protection officer and I figured the third party did not have the wherewithal to uphold a data processing agreement. The answer turned out to be pseudonymization, sufficient to make it practically impossible to identify the subjects. That way, the data set was no longer considered to contain personal data under the GDPR. The advantage of pseudonymization over full anonymization in that case was that some subjects appeared more than once in the data set: by pseudonymizing a personal identifier, I made it possible for the third party to spot duplicates. Completely anonymizing the set would have made that impossible.
How do you pseudonymize in practice? You run a hashing function on a string, it spits out another string, that is your pseudonym. A hashing function will always produce the same output given the same input. However, and this is the crucial bit, it is not possible to reverse the process and get the input from the output. I could also have run Caesar’s cipher or any other reversible cryptographic function on the personal identifiers (“Abcdef” becoming “Defghi”) but that would be easy to reverse or decrypt. Pseudonymizing my apples and pears data set produces the following:
The challenge with pseudonymization is to make it useful while also actually making it not-personal data. If you hadn’t read the input in the original table there was no way for you to uncover the identity of d4449d8e74b7d0affd8[…] and so it can reasonably be considered to not be personal data. However, in this case it isn’t really any more useful than the anonymized table.
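Which hash function produced the pseudonym quoted above isn’t stated, so the following minimal sketch simply assumes unsalted SHA-256 from Python’s standard `hashlib`; its output will not match d4449d8e74b7d0affd8[…]. The function name `pseudonymize` is hypothetical. It shows the two properties described above: the same input always gives the same pseudonym (so duplicates stay visible), and the pseudonym itself does not reveal the name.

```python
import hashlib


def pseudonymize(identifier: str) -> str:
    """Unsalted SHA-256 hex digest of an identifier (illustration only)."""
    return hashlib.sha256(identifier.encode("utf-8")).hexdigest()


# Hypothetical records; Alice appears twice on purpose.
records = [("Alice", "pears"), ("Bernard", "apples"), ("Alice", "pears")]
pseudonymized = [(pseudonymize(name), pref) for name, pref in records]

# Deterministic: Alice's two rows share a pseudonym, so the duplicate is visible...
assert pseudonymized[0][0] == pseudonymized[2][0]
# ...while different inputs give different pseudonyms.
assert pseudonymized[0][0] != pseudonymized[1][0]

for pseudonym, pref in pseudonymized:
    print(pseudonym[:19] + "[…]", pref)
```

Without a salt, anyone holding a list of candidate names can hash each one and compare, so for a small, guessable input space this offers very little protection.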
The sort-of-equivalent to an encryption key in hashing is called “salt”, basically a large random number that is used in the transformation. Say I want to pseudonymize IPv4 addresses of visitors to my site. If I need the same IP address to consistently produce the same pseudonym – otherwise what’s the point? – I have to keep the salt around somewhere. Now, even though I know the salt, I still cannot reverse the process. I can, however, use it to produce a large table of possible outputs given a range of inputs. There are a mere 4.3 billion IPv4 addresses, so it’s easy for me to generate a table of pseudonyms of every single IPv4 address ever. Then in order to uncover what IP address a pseudonym stands for, I just do a reverse lookup in my giant table. So while I might believe that the pseudonym I generate is not personal data, I have the means to undo the masking with little more than some CPU horsepower. And as recital 26 says, if data subjects can potentially be identified (within reasonable time and expense) it is personal data.
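To make the lookup-table argument concrete, here is a minimal Python sketch. It assumes the pseudonym is SHA-256 over a stored random salt followed by the 4 packed bytes of the address; the function names and the choice of `192.0.2.0/24` (a documentation range) are hypothetical. To keep it quick, it enumerates only one /24 network; covering the whole IPv4 space is the same loop run 2^32 = 4,294,967,296 times.

```python
import hashlib
import ipaddress
import secrets

# Hypothetical setup: a random salt the site keeps, so the same IP
# always maps to the same pseudonym.
SALT = secrets.token_bytes(16)


def pseudonymize_ip(ip: str, salt: bytes = SALT) -> str:
    """Salted SHA-256 pseudonym of an IPv4 address."""
    packed = ipaddress.IPv4Address(ip).packed  # the address as 4 raw bytes
    return hashlib.sha256(salt + packed).hexdigest()


def build_lookup_table(network: str, salt: bytes = SALT) -> dict[str, str]:
    """Map pseudonym -> IP address for every address in `network`.

    Whoever holds the salt can do this for 0.0.0.0/0 too; it only takes
    more CPU time, not more knowledge.
    """
    return {
        pseudonymize_ip(str(addr), salt): str(addr)
        for addr in ipaddress.IPv4Network(network)
    }


pseudonym = pseudonymize_ip("192.0.2.42")  # what ends up in the "anonymous" log
table = build_lookup_table("192.0.2.0/24")  # 256 entries
print(table[pseudonym])  # 192.0.2.42
print(2**32)  # 4294967296
```

The salt prevents someone *without* it from precomputing a table, but it does nothing against the party that keeps it.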
Even if I try to anonymize and remove all manner of direct personal identifiers – names, phone numbers, email addresses, etc. – there is still a risk of data being linked to data subjects and so being personal data. Example: If I am the only person at a place of employment who is a member of a specific union and my employer removes all personal identifiers in a data set that includes union membership… It doesn’t take much to realize that that data subject is me.
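The union example is what recital 26 calls “singling out”. One common yardstick for it, borrowed here from the k-anonymity literature rather than from the GDPR itself, is to count how many rows share each combination of indirectly identifying attributes and flag any combination shared by fewer than k rows. A minimal sketch; the rows, attribute names and the threshold `k=2` are all hypothetical.

```python
from collections import Counter

# Attributes that identify nobody on their own but may in combination.
QUASI_IDENTIFIERS = ("workplace", "union")


def singled_out(rows, keys=QUASI_IDENTIFIERS, k=2):
    """Return attribute combinations shared by fewer than k rows."""
    groups = Counter(tuple(row[key] for key in keys) for row in rows)
    return [combo for combo, size in groups.items() if size < k]


# Hypothetical "anonymized" data set: names removed, other attributes kept.
rows = [
    {"workplace": "Office A", "union": "Union X", "salary_band": 3},
    {"workplace": "Office A", "union": "Union X", "salary_band": 4},
    {"workplace": "Office A", "union": "Union Y", "salary_band": 2},
    {"workplace": "Office B", "union": "Union X", "salary_band": 3},
    {"workplace": "Office B", "union": "Union X", "salary_band": 5},
]

print(singled_out(rows))  # [('Office A', 'Union Y')]
```

The flagged row has no name attached, yet anyone who knows who at Office A belongs to Union Y also learns that person’s salary band.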
The point I’m trying to make here is that while pseudonymization was a good solution for the students case, it doesn’t work for website statistics. Why? Because it was a one-time thing and I could throw away the salt after use, making it impossible to generate tables. It also helped that there was a separation of data controller (my employer) and third party, so that they would never have seen the personal data input or the salt, while the one who had, me, was legally allowed to process personal data. And finally, the meat of the data, survey questions, could not be used to piece together identifiers (no background questions, just opinions).
Basically, when using pseudonymization as a get-out-of-GDPR-jail-free card it is crucial to consider how hard/easy it would be to discover the identity of the data subject given a lot of factors. My gut feeling says that at the very least you either need a “separation of powers” where a data controller (with a legal mandate to process the personal data) can keep secrets from a data processor or you need the pseudonymization process to be a timed affair where your past self can keep secrets from your present self because past self threw away the salt. See recital 29 for some support for this notion (essentially it encourages the use of pseudonymization and storing additional information separately “within the same controller” to avoid reidentification). In both cases you also need to make sure that data cannot be pieced together in such a fashion as to uniquely identify individuals. Most legal advice would, I suspect, go way further than that to make sure that there is no way that what you’re processing is personal data. Again, I am not a lawyer.
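The “past self throws away the salt” idea can be sketched as pseudonymizing each one-off batch with a fresh random salt that is never written anywhere. This is a minimal Python illustration under that assumption, with hypothetical identifiers; note that Python makes no guarantee that the salt bytes are wiped from memory, so it shows the principle rather than a hardened implementation.

```python
import hashlib
import secrets


def pseudonymize_batch(identifiers):
    """Pseudonymize one batch with a fresh salt that is never stored.

    Equal identifiers within the batch get equal pseudonyms, so duplicates
    stay visible. After the function returns, the salt is no longer
    referenced, so nobody can rebuild a lookup table for this batch.
    """
    salt = secrets.token_bytes(32)
    return [
        hashlib.sha256(salt + ident.encode("utf-8")).hexdigest()
        for ident in identifiers
    ]


# Hypothetical identifiers.
batch_1 = pseudonymize_batch(["student-17", "student-42", "student-17"])
batch_2 = pseudonymize_batch(["student-17"])

print(batch_1[0] == batch_1[2])  # True: duplicate spotted within the batch
print(batch_1[0] == batch_2[0])  # False (with overwhelming probability): no link across batches
```

This suits a one-time export like the students case; it is exactly what ongoing website statistics cannot do, since there the salt has to stick around to keep pseudonyms consistent.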
What does that mean for setting up my site? Let’s take the directives first, as they are the simplest to deal with.
If we go by the wording of 5(3) there is no way that cookies set for analytics purposes are covered by any of the exemptions. They are not necessary for operating a telecommunications network, nor are they necessary to fulfill a user’s wishes. Therefore they would require consent according to article 5(3). If I do not use any analytics cookies, I would also be in complete compliance with article 5(3) without asking the user anything, regardless of what’s happening server side. No cookies, no need for consent.
As previously mentioned, the upcoming ePrivacy Regulation may muddy those waters a bit. My national regulatory agency has already signalled a laissez-faire attitude to analytics cookies (in Danish) in anticipation of the new rules. I will stick to the letter of the law for now.
Even if I comply with the ePrivacy directives by not using cookies, that still leaves the question of what the GDPR means for analytics. The crucial question is whether the data I gather as a website owner is “personal data”. If it’s not, well, then I am compliant. If it is, I need to either
- stop gathering said data, or
- make it not-personal data or
- meet the GDPR’s requirements for handling personal data, such as asking for consent (lacking another lawful basis for processing) and respecting the various data subject rights.
As mentioned previously, IP addresses are considered personal identifiers. So any information keyed by IP addresses would definitely be personal information.
Finally, there is the question of what data it is really reasonable and necessary to collect. Is user profiling necessary? Detailed tracking of browsing within or outside of the site? Heat maps, mouse movements, etc.? Personally, I think the demand for that sort of thing rests on a vast overestimate of its usefulness. However, it is important to remember that if that data is not considered personal information, it may not be covered by the GDPR and so becomes either a question for national lawmakers or a question of morals and common sense.
To be continued
I think I have established a reasonably solid understanding of the law on which to build a compliant solution that also adheres to the goals I set out at the start of this post. While the ePrivacy directives do not leave a lot of wiggle room for website owners, the GDPR requires more interpretation. Fortunately, people more knowledgeable than me in matters of the law have done some of that work for me. More on that in the next post, where I will get into the nitty-gritty of implementing a solution.