The State of Online Tracking pt.1

Lola Odelola

Web Developer Advocate

Photo by Julissa Capdevilla on Unsplash

An Introduction to Cookies

Internet users are becoming cognisant of the ways we’re being followed and tracked online, and of the ways our behaviour is being analysed and our data sold on to line the pockets of the ad industry. Over the last few years, the news has been filled with debate about who truly owns our data, who can use it and how they can use it. While the law typically lags behind on tech matters, a few laws have arisen that aim to protect users (e.g. GDPR, LGPD, CCPA). However, some are really complicated to understand, and others only really deal with internet users in the context of capitalism (as consumers or advertisers/sellers). Ultimately though, these laws are hindered by national borders: GDPR is only relevant for users in the European Union, CCPA is only applicable to Californian users, and so on.

Tech workers aren’t ignorant of these issues. Tracking has been happening for decades, and as it becomes more egregious, there is pressure on tech workers to do something. In October 2020, the Technical Architecture Group at W3C (the main international standards organisation for the web) published the Ethical Web Principles. Standards play a big role in dictating how the web and associated web technologies are used and implemented, and the list acts as a bare minimum requirement for web standards to meet. The list includes:

Security and privacy are essential
We will write specs and build platforms in line with our responsibility to our users, knowing that we are making decisions that change their ability to protect their personal data. This data includes their conversations, their financial transactions and how they live their lives. We will start by creating web technologies that create as few risks as possible, and will make sure our users understand what they are risking in using our services.

This is the most pertinent to online tracking technology (although so many more points in the list are applicable) and will be crucial for any new standards to meet.

In this series I want to delve into some of the proposals being discussed in the W3C Privacy Community Group and demystify a lot of what’s being said. My hope is that we will all be more informed and encouraged to participate in the broader discussion.

HTTP Cookies

Before I go into detail about the new standards, it’s good to understand the current landscape and the foundation for a lot of the discussion, and cookies make up a considerable amount of that foundation.

What is a Cookie?

Over the last few years, we’ve seen an influx of cookie notices on websites and web apps. Some request permission to use cookies (usually if they’re using third-party cookies) while others just inform the user that cookies are being used. While we all know they’re not the edible kind, we may not all know exactly what they are.

The Inkey List and Uber Eats websites, both with cookie notices

HTTP Cookies are a type of data store on the browser, to put it succinctly. They allow websites to collect almost any data they want about the user so that they can tailor the user experience and/or analyse data about who is using their site. For example, here is some of the information Twitter is collecting about me in a cookie:

"dnt=1;
remember_checked_on=1;
lang=en;
eu_cn=1;
night_mode=2;
ads_prefs="******"; 
twid=********;"

Twitter are storing my advertising preferences, my Twitter ID, what theme I’m using (night mode), my language, whether I have Do Not Track enabled, and whether I ticked “remember me” when I signed in. These are pretty harmless and are used to make my user experience as seamless as possible. It’d be pretty annoying to have to set dark mode every time I logged in, or to sign in again after I’ve asked to be remembered. You can clear cookies in your browser settings, but even if you don’t, all cookies (with the exception of session cookies; more on those later) have an expiration date, although it tends to be very far into the future.
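To make the "key and value" idea concrete, here is a minimal Python sketch that splits a cookie string like the one above into a dictionary. The function name `parse_cookie_header` and the sample string are my own illustration, not anything Twitter or a browser exposes, and the masked values are left out.

```python
def parse_cookie_header(header: str) -> dict:
    """Split a cookie string of the form 'a=1; b=2' into a name -> value mapping."""
    cookies = {}
    for pair in header.split(";"):
        # partition() splits on the first "=" only, so values may contain "="
        name, sep, value = pair.strip().partition("=")
        if sep and name:  # skip empty or malformed fragments
            cookies[name] = value.strip('"')
    return cookies


# Sample string modelled on the values shown above (illustrative only)
raw = "dnt=1; remember_checked_on=1; lang=en; eu_cn=1; night_mode=2"
prefs = parse_cookie_header(raw)
print(prefs["night_mode"])  # prints 2
```

A site could read `night_mode` back like this on each visit to decide which theme to render.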

As seen above, cookies are a collection of key-value pairs. So setting a cookie means assigning a key and a value in an HTTP header:

Set-Cookie: `darkmode=1; Expires=Wed, 12 May 2021 00:00:00 GMT`
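The `Expires` attribute expects an HTTP date (day name, day, month, year, time, then `GMT`), which is easy to get wrong by hand. As a rough sketch, Python's standard library can format it for you. The helper name `build_set_cookie` is hypothetical, and a real server framework would normally build this header on your behalf.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def build_set_cookie(name: str, value: str, expires: datetime) -> str:
    """Return a Set-Cookie header line with an HTTP-formatted expiry date."""
    # usegmt=True requires a UTC datetime and renders the zone as "GMT"
    http_date = format_datetime(expires, usegmt=True)
    return f"Set-Cookie: {name}={value}; Expires={http_date}"


header = build_set_cookie("darkmode", "1", datetime(2021, 5, 12, tzinfo=timezone.utc))
print(header)  # Set-Cookie: darkmode=1; Expires=Wed, 12 May 2021 00:00:00 GMT
```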

Session Cookies

As I mentioned earlier, session cookies don’t have expiration dates. This is how they’re distinguished from other cookies and also what makes them temporary. They last for as long as the browser is active; once the user quits the browser, session cookies are deleted.
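That rule can be sketched as a small check on a `Set-Cookie` value. One assumption beyond what I've described: cookies can also be given a lifetime with a `Max-Age` attribute, so the sketch treats a cookie as a session cookie only when it has neither `Expires` nor `Max-Age`. The function name `is_session_cookie` is my own.

```python
def is_session_cookie(set_cookie_value: str) -> bool:
    """True when the cookie carries neither an Expires nor a Max-Age attribute."""
    # The first ";"-separated part is name=value; the rest are attributes
    attributes = set_cookie_value.split(";")[1:]
    names = {attr.strip().partition("=")[0].lower() for attr in attributes}
    return not ({"expires", "max-age"} & names)


print(is_session_cookie("sid=abc123; Path=/"))
print(is_session_cookie("darkmode=1; Expires=Wed, 12 May 2021 00:00:00 GMT"))
```

The first call reports a session cookie and the second a persistent one.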

Third Party Cookies

These are the cookies that are responsible for a lot of the tracking complaints we have, especially those related to ads. Typically cookies are first party, which means the domain key on the cookie matches the domain in the browser’s address bar.
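As a rough illustration of that matching rule, the sketch below compares a cookie's domain with the host in the address bar and treats anything that doesn't match as third party. This is a deliberate simplification: real browsers compare registrable domains using the Public Suffix List, which this sketch doesn't attempt. The function name `is_first_party` is hypothetical.

```python
def is_first_party(cookie_domain: str, page_host: str) -> bool:
    """Simplified check: does the cookie's domain match the address-bar host?"""
    cookie_domain = cookie_domain.lstrip(".").lower()  # ".example.com" -> "example.com"
    page_host = page_host.lower()
    # Exact match, or the page is a subdomain of the cookie's domain
    return page_host == cookie_domain or page_host.endswith("." + cookie_domain)


print(is_first_party("twitter.com", "twitter.com"))       # True
print(is_first_party("ads.pubmatic.com", "mtonews.com"))  # False
```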

Going back to the Twitter example, when I inspect the cookies, I see the following output:

List of cookie domains for twitter.com

As you can see, the only domain listed is https://twitter.com, and since this is the same as the domain in the address bar, this cookie is a first-party cookie.

A third-party cookie would have a different domain to what is in the address bar, which is how ads can do cross-site tracking. Doing the same thing on a popular gossip site produces a very different result:

List of cookie domains for mtonews.com

In this case, the first domain, mtonews.com, holds first-party cookies since that’s the domain in the address bar. However, there is a list of other domains which all have cookies on mtonews.com; these are all third-party cookies. These cookies work by building a (sort-of) profile of you based on the sites you visit. In this case we’ve visited mtonews.com, which has cookies belonging to ads.pubmatic.com. If we went to anothergossipsite.com, which also had a cookie from ads.pubmatic.com, the browser would send that same ads.pubmatic.com cookie to ads.pubmatic.com’s servers from both mtonews.com and anothergossipsite.com. This would allow them to create a browsing history of the sites we’ve been to that have an ads.pubmatic.com cookie on them. This is typically how cookies are used to track user behaviour.
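Here is a toy model of that flow, to show why one shared cookie is enough to link visits across sites. Everything in it (the `AdServer` and `Browser` classes, the `visitor-1` ID format) is invented for illustration and says nothing about how ads.pubmatic.com or any real ad server is implemented.

```python
from collections import defaultdict


class AdServer:
    """Toy third-party server that logs which site each cookie ID was seen on."""

    def __init__(self):
        self._next_id = 1
        self.history = defaultdict(list)  # cookie ID -> list of sites visited

    def handle_request(self, cookie_id, embedding_site):
        if cookie_id is None:  # first time we see this browser: issue an ID
            cookie_id = f"visitor-{self._next_id}"
            self._next_id += 1
        self.history[cookie_id].append(embedding_site)
        return cookie_id  # sent back to the browser via Set-Cookie


class Browser:
    """Toy browser with a cookie jar keyed by the domain that set the cookie."""

    def __init__(self):
        self.cookies = {}

    def visit(self, site, ad_server, ad_domain):
        # The page embeds content from ad_domain, so the browser sends
        # whatever cookie it already holds for ad_domain with the request.
        sent = self.cookies.get(ad_domain)
        self.cookies[ad_domain] = ad_server.handle_request(sent, site)


tracker = AdServer()
browser = Browser()
browser.visit("mtonews.com", tracker, "ads.pubmatic.com")
browser.visit("anothergossipsite.com", tracker, "ads.pubmatic.com")
print(dict(tracker.history))
# {'visitor-1': ['mtonews.com', 'anothergossipsite.com']}
```

Neither gossip site shares anything with the other; the link comes purely from the same third-party cookie being sent from both pages.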

Final Thoughts

There are other types of cookie, such as the Zombie cookie 🧟‍♀, which gets regenerated after it’s been deleted, or the HttpOnly cookie, which cannot be accessed via JavaScript and is only sent in HTTP requests, but for this post I wanted to give a brief explainer. The privacy web landscape is changing (for example, third-party cookies are being retired) and in order to understand the direction we’re moving in, we should understand a little of where we’re coming from. This post is the first in a series of explainers that will cover what’s happening in the web standards world when it comes to web privacy. Feel free to follow what’s happening by checking out what’s on the table in the W3C Privacy Community Group on their GitHub or by having a look through the meeting minutes.

Further Reading