The State of Online Tracking pt.1
Web Developer Advocate
An Introduction to Cookies
Internet users are becoming cognisant of the ways we’re being followed and tracked online, and of the ways our behaviour is being analysed and our data sold on to line the pockets of the ad industry. Over the last few years, the news has been filled with debate about who truly owns our data, who can use it and how they can use it. While the law typically lags behind on tech matters, a few laws have emerged that aim to protect users (e.g. GDPR, LGPD, CCPA). However, some are really complicated to understand, and others only really deal with internet users in the context of capitalism (as consumers or advertisers/sellers). Ultimately, though, these laws are hindered by national borders: GDPR is only relevant for users in the European Union, CCPA is only applicable to Californian users, and so on.
Tech workers aren’t ignorant of these issues. Tracking has been happening for decades, and as it becomes more egregious, there is pressure on tech workers to do something. In October 2020, the Technical Architecture Group at W3C (the main international standards organisation for the web) published the Ethical Web Principles. Standards play a big role in dictating how the web and associated web technologies are used and implemented, and the list acts as a bare minimum requirement for web standards to meet. The list includes:
Security and privacy are essential
We will write specs and build platforms in line with our responsibility to our users, knowing that we are making decisions that change their ability to protect their personal data. This data includes their conversations, their financial transactions and how they live their lives. We will start by creating web technologies that create as few risks as possible, and will make sure our users understand what they are risking in using our services.
This principle is the most pertinent to online tracking technology (although many other points in the list also apply), and it will be crucial for any new standards to meet.
In this series I want to delve into some of the proposed standards being discussed in the W3C Privacy Community Group and demystify a lot of what’s being discussed. My hope is that we will all be more informed and encouraged to participate in the broader discussion.
Before I go into detail about the new standards, it’s good to understand the current landscape and the foundation for a lot of the discussion, and cookies make up a considerable amount of that foundation.
What is a Cookie?
The Inkey List and Uber Eats websites, both showing cookie notices
HTTP cookies are, to put it succinctly, a type of data store in the browser. They allow websites to store almost any data they want about the user so that they can tailor the user experience and/or analyse data about who is using their site. For example, here is some of the information Twitter is collecting about me in a cookie:
Twitter is storing my advertising preference, my Twitter ID, what theme I’m using (night mode), my language, whether I have Do Not Track enabled, and whether I selected “remember me” when I signed in. These are pretty harmless and are used to make my user experience as seamless as possible. It’d be pretty annoying to have to set dark mode every time I logged in, or to sign in again after I’ve asked to be remembered. You can clear cookies in your browser settings, but even if you don’t, all cookies (with the exception of session cookies, more on those later) have an expiration date, although it tends to be very far in the future.
As seen above, cookies are a collection of key-value pairs. So setting a cookie means assigning a key and a value in an HTTP header:
`Set-Cookie: darkmode=1; Expires=Wed, 12 May 2021 00:00:00 GMT`
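To make this a little more concrete, here is a minimal sketch of how a server-side script might build that header using Python’s standard `http.cookies` module. The cookie name `darkmode` comes from the example above; the variable names are my own, and a real web framework would normally do this for you.

```python
from http.cookies import SimpleCookie

# Build a cookie named "darkmode" with the value "1".
cookie = SimpleCookie()
cookie["darkmode"] = "1"

# Expires takes an HTTP-date string, which is always expressed in GMT.
cookie["darkmode"]["expires"] = "Wed, 12 May 2021 00:00:00 GMT"

# Render the header line a server would include in its response.
header_line = cookie.output()
print(header_line)
```

The printed line is the `Set-Cookie` header with the key, the value and the expiry date. Attribute names such as `Expires` are case-insensitive, so the exact capitalisation in the output doesn’t matter to the browser.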
As I mentioned earlier, session cookies don’t have an expiration date (no `Expires` or `Max-Age` attribute). This is how they’re distinguished from other cookies, and also what makes them temporary. They last for as long as the browser is running; once the user quits the browser, session cookies are deleted.
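As a rough illustration of that distinction, the sketch below parses the value of a `Set-Cookie` header and reports whether it describes a session cookie. The function name `is_session_cookie` and the sample cookies are hypothetical, and the sketch assumes the string contains exactly one cookie.

```python
from http.cookies import SimpleCookie


def is_session_cookie(set_cookie_value: str) -> bool:
    """Return True if the cookie has neither an Expires nor a Max-Age attribute."""
    parsed = SimpleCookie()
    parsed.load(set_cookie_value)
    if not parsed:
        raise ValueError("could not parse a cookie from the given string")
    # Assumption: the string holds a single cookie, so take the only morsel.
    morsel = next(iter(parsed.values()))
    # Unset attributes are stored as empty strings by http.cookies.
    return morsel["expires"] == "" and morsel["max-age"] == ""


print(is_session_cookie("sessionid=abc123; Path=/"))
print(is_session_cookie("darkmode=1; Expires=Wed, 12 May 2021 00:00:00 GMT"))
```

The first cookie has no expiry information, so it is treated as a session cookie; the second carries an `Expires` date, so it persists until that date or until the user clears it.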
Third Party Cookies
These are the cookies responsible for a lot of the tracking complaints we have, especially those related to ads. Typically, cookies are first-party, which means the domain on the cookie matches the domain in the browser’s address bar.
Going back to the Twitter example, when I inspect the cookies, I see the following output:
List of cookie domains for twitter.com
As you can see, the only domain listed is https://twitter.com, and since this is the same as the domain in the address bar, this cookie is a first-party cookie.
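The comparison described here can be sketched in a few lines of Python. This is a deliberate simplification: real browsers compare registrable domains using the Public Suffix List, whereas this sketch only checks whether the page’s host equals the cookie’s domain or is a subdomain of it. The function name `is_first_party` and the domain `tracker.example` are made up for illustration.

```python
from urllib.parse import urlsplit


def is_first_party(cookie_domain: str, page_url: str) -> bool:
    """Simplified check: does the cookie's domain cover the host in the address bar?"""
    page_host = (urlsplit(page_url).hostname or "").lower()
    domain = cookie_domain.lower().lstrip(".")  # tolerate a leading dot
    if not page_host or not domain:
        return False
    # Match the exact host, or treat the page host as a subdomain of the cookie domain.
    return page_host == domain or page_host.endswith("." + domain)


print(is_first_party("twitter.com", "https://twitter.com/home"))      # True
print(is_first_party("tracker.example", "https://twitter.com/home"))  # False
```

A cookie whose domain fails this check while you are on the page is what the rest of this section calls a third-party cookie.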
A third-party cookie would have a different domain to what is in the address bar, which is how ads can do cross-site tracking. Doing the same thing on a popular gossip site produces a very different result:
List of cookie domains for mtonews.com
In this case, the first domain, mtonews.com, sets a first-party cookie since that’s the domain in the address bar. However, there is a list of other domains which all have cookies on mtonews.com; they’re all third-party cookies. These cookies work by building a (sort-of) profile of you based on the sites you visit. In this case we’ve visited mtonews.com, which has cookies belonging to ads.pubmatic.com. If we went to anothergossipsite.com, which also had a cookie from ads.pubmatic.com, the browser would send that same ads.pubmatic.com cookie to ads.pubmatic.com’s servers from both mtonews.com and anothergossipsite.com. That would allow them to build a browsing history of the sites we’ve been to that have an ads.pubmatic.com cookie on them. This is typically how cookies are used to track user behaviour.
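To see why a single shared cookie is enough to link visits together, here is a toy simulation in Python. It is only a sketch under my own assumptions: the `AdServer` class, its method names and the random identifier are hypothetical and don’t describe how any real ad network is implemented. In practice the server would learn which site embedded it from the request itself, for example from the `Referer` header.

```python
from collections import defaultdict
from uuid import uuid4


class AdServer:
    """Toy stand-in for a third-party ad server (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        # Maps each cookie ID to the list of sites it was seen on.
        self.history = defaultdict(list)

    def handle_request(self, cookie_id, embedding_site):
        # First contact: the browser has no cookie yet, so issue a new ID.
        if cookie_id is None:
            cookie_id = uuid4().hex
        # Record which site the request came from against that ID.
        self.history[cookie_id].append(embedding_site)
        # A real server would hand the ID back in a Set-Cookie header.
        return cookie_id


ad_server = AdServer()
browser_cookie = None  # the browser's stored cookie for the ad server's domain

# The same browser visits two unrelated sites that both embed the ad server.
for site in ["mtonews.com", "anothergossipsite.com"]:
    browser_cookie = ad_server.handle_request(browser_cookie, site)

print(ad_server.history[browser_cookie])
# ['mtonews.com', 'anothergossipsite.com']
```

Because the browser sends the same cookie back on every site that embeds the ad server, both visits end up filed under one identifier, which is exactly the browsing history described above.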