What is Metadata and Why Does it Matter?
In my last blog, I cited a statistic that the majority of the web is encrypted. This means that when you visit, say, Facebook, your Internet Service Provider (ISP) can see that you visited the site, but they can’t see your login credentials (username and password). Similarly, when you shop on Amazon, your ISP (and whoever owns the router of the network you’re using) can see that you connected to Amazon but not your card number. This is done with HTTPS, a powerful and increasingly popular protocol that encrypts web traffic. It’s quite effective and difficult to break.
So in effect, even the average person has, generally speaking, a basic level of strong security in their online lives. This raises the question that privacy enthusiasts everywhere have come to despise like nails on a chalkboard: “why should I care?” If your sensitive details such as your password and credit card number are safely encrypted, who cares if your ISP can see what websites you visit? Or if your network administrator can see where you shop? Or if Google remembers what you searched for last week? (Spoiler alert: the front page of my website has a curated list of reasons why you should care, but let’s ignore that for now.)
The information in question is called “metadata,” sometimes described as “data about data.” Maybe I can’t see exactly what you said in your email, but I can see who you emailed, at what time, and how large the email was. On the surface, that doesn’t seem so bad. Who cares if someone knows that I emailed my mom at 7pm and the email was 7KB?
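To make that concrete, here is a minimal sketch in Python of what an observer could pull out of an email without ever reading it. The message, the addresses, and the `extract_metadata` helper are all made up for illustration; a real mail provider's logs would look different, but the idea is the same.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical raw message, invented for this example.
# The body stands in for encrypted content the observer cannot read.
RAW_EMAIL = (
    "From: me@example.com\r\n"
    "To: mom@example.com\r\n"
    "Date: Tue, 02 Jan 2024 19:00:00 +0000\r\n"
    "\r\n"
    "-----BEGIN PGP MESSAGE-----\r\n"
    "(encrypted body)\r\n"
    "-----END PGP MESSAGE-----\r\n"
)


def extract_metadata(raw: str) -> dict:
    """Collect who, when, and how big, using only the headers and the size."""
    msg = message_from_string(raw)
    sent = parsedate_to_datetime(msg["Date"])
    return {
        "sender": msg["From"],
        "recipient": msg["To"],
        "sent_at": sent.isoformat(),
        "size_bytes": len(raw.encode("utf-8")),
    }


if __name__ == "__main__":
    for field, value in extract_metadata(RAW_EMAIL).items():
        print(f"{field}: {value}")
```

Notice that the function never touches the body. Everything it returns comes from the headers and the size of the message, which is exactly the kind of information that stays visible even when the content is encrypted.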
As is the case with most privacy and security concerns in the modern era, the problem isn’t so much what’s collected but rather how it can be used. Take this excellent article from the Electronic Frontier Foundation, for example. A couple of the examples they list of metadata that can be too revealing:
They know you rang a phone sex line at 2:24 am and spoke for 18 minutes. But they don't know what you talked about.
They know you got an email from an HIV testing service, then called your doctor, then visited an HIV support group website in the same hour. But they don't know what was in the email or what you talked about on the phone.
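The second example works because the events happen close together in time. As a rough sketch in Python, here is how little effort it takes to group metadata records by time. The records, the one-hour window, and the `cluster_by_window` helper are assumptions invented for illustration, not how any particular company does it.

```python
from datetime import datetime, timedelta

# Hypothetical metadata records: (timestamp, channel, counterpart).
# None of them contain any message content.
EVENTS = [
    (datetime(2024, 1, 2, 9, 5), "email", "testing-service.example"),
    (datetime(2024, 1, 2, 9, 20), "call", "doctor's office"),
    (datetime(2024, 1, 2, 9, 50), "web", "support-group.example"),
    (datetime(2024, 1, 2, 18, 30), "web", "news.example"),
]


def cluster_by_window(events, window=timedelta(hours=1)):
    """Group events that fall within `window` of the first event in a cluster."""
    clusters = []
    for event in sorted(events):
        if clusters and event[0] - clusters[-1][0][0] <= window:
            clusters[-1].append(event)
        else:
            clusters.append([event])
    return clusters


if __name__ == "__main__":
    for cluster in cluster_by_window(EVENTS):
        print([f"{channel}: {who}" for _, channel, who in cluster])
```

The first three records land in one cluster and the evening visit lands in another. No content was needed to tell a story about that morning.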
As you can see, metadata has the potential to be just as revealing as content itself, and therefore should be protected just as much as the actual data. “You keep saying potential,” you might say to yourself. “Do you think that’s likely?” The answer is yes. China is already notorious for its incredibly invasive, 1984-like “Social Credit System.” In the United States, the insurance industry is starting to make use of your social network. Oh, and the United States is working on its own “Social Credit System” too. So yeah, metadata is an important part of your attack surface that you need to consider as you protect your privacy and security.
Certain metadata is impossible to avoid. Most of us probably aren’t willing to leave our phones at home and go without, so location-based metadata is inevitable. Most websites also collect information about what operating system, and therefore what type of device, you’re using when you visit. Even connecting to a VPN service or sending encrypted email requires a certain amount of metadata to communicate. I wish this blog post ended with a list of suggested services to help reduce or eliminate the amount of metadata you leak in your daily life, but the fact is that no easy solution exists yet. Instead, the goal of this post is to make you aware of metadata, how it exists and is collected, and what it says about you.
As you pick services to help keep you safe and private in the digital world, it’s important to consider who those services are talking to and what they’re saying. Wire messenger, for example, collects metadata about your initial sign-up so they can create your unique account. This data isn’t shared outside the company without a warrant, but it is a good reminder that you are not 100% anonymous to Wire itself, if that’s your goal. On the other hand, Mullvad VPN allows you to pay with Bitcoin without surrendering any personal information at all, so in theory, if done carefully and correctly, Mullvad can make you 100% anonymous on the internet (disclaimer: there’s a lot more that goes into that than just buying a VPN with Bitcoin, so don’t get any ideas; it’s just an example of what is possible in theory).
Most of us probably don’t need to be 100% anonymous for any reason, but it is a good idea for us to protect our metadata just as much as our actual communications. Again, I wish I had some concrete advice, but instead it simply comes down to asking yourself “what metadata am I giving up, and to whom?” Using a VPN means you’re transferring a considerable amount of your metadata away from your ISP and over to your VPN provider. Assuming you use a reputable, trustworthy VPN provider, that’s a good strategy. Encrypted email works the same way. Many of these companies will surrender what they can if given a warrant, but they rarely have much to turn over aside from a few login locations and times (which can, again, be defeated with a VPN). It’s a multi-layered approach, but it’s one worth considering until technology catches up and protects our metadata by default.