what is privacy

2020-07-18

Something that is an ongoing irritation to me is the discourse and marketing around online privacy and anonymity tools. There is a great deal of misleading discussion and confusing argumentation not only in marketing and online discussions but also in the security community, where those who will let perfect be the enemy of good (namely: the security community) often make broad statements about the security properties of various technologies without actually considering the threat model... and thus make broad statements which are broadly false.

On a rare occasion I would like to try to convey something useful, namely a threat-centric approach to understanding online privacy and anonymity services. Then I will kvetch about technology as usual.

The biggest reason that discussions about online privacy go astray is that both "online" and "privacy" are words that seem innocuous enough but encapsulate an enormous realm of different and sometimes contradictory elements. "Privacy" means different things for different people and contexts. Let's turn to our Study Tech and approach the matter by defining words. Privacy, at least in the information assurance context, is best thought of as being the confidentiality of something from someone. That is, I feel that I have privacy when some thing, person, or group of things or people is not able to obtain some set of information about me.

The thing/person/group (subject) and information (object)---the subject and object of privacy---are highly variable depending on the context. In common situations, people expect many types of privacy. We expect that the letter carrier does not read our mail. We expect that the police are not tracking our location. Think about the internet, though: do we expect that Gmail is not reading our mail? Well, it is, in a sense. Privacy can be a complicated topic in that the expectations---in terms of subject and object---vary from person to person and case to case, and yet we have a tendency to refer to the whole thing simply as "privacy." You can imagine that this makes discussions of privacy inherently confusing, and some people seem content to perpetuate the confusion because they see discussing privacy without defining the case they are considering as being a sort of argument for that case (consider when people talk about "freedom," explicitly including something like free speech in the definition of freedom, in order to make a point).

So, at the risk of sounding too much like a rationalist, in order to discuss online privacy we must first define our terms. When it comes to the object of privacy, there are three significant and useful definitions:

1) Privacy of the payload, that is, the contents of the webpages we view, files we download, messages we send, etc.
2) Privacy of the metadata, that is, the IP addresses (and implicitly services/websites) that we communicate with, how much, how often.
3) Privacy of your identity, that is, your IP address and other identifying information.

Let's think about the first two practically. Object 1, payload, is protected by TLS (HTTPS). Object 2, metadata, is protected partially at best by TLS. While e.g. TLS with encrypted SNI can provide some protection of metadata in certain situations, it does not provide effective protection in the general case.

Let's also think about subject. This is where things get a bit more complicated:

A) Privacy from your ISP, network administrator, other users on your local network. We will call this the local segment.

B) Privacy from the broader internet, that is, transit providers. We will call this the internet segment.

C) Privacy from the operator(s) of the services you connect to, e.g., privacy of your identity from a website you view. We will call this the remote segment.

You can see that there is a certain relationship between the subjects and objects in common online privacy concerns. When we are talking about subject C, the remote segment, it is almost exclusively object 3, our own identity, that we might worry about. The website that we are connecting to will obviously know that we are connecting to them and retrieving certain data, but we might not want them to know who we are. And of course in many cases we don't care, because the website we're using might be one that we explicitly identify ourselves to anyway. Say, our bank.

Finally, there are some concepts that are in fact entirely separate from privacy but are still often co-mingled with privacy in discussions of technology and policy. The most prominent such concept is "countercensorship," which is the desire of users to view content that someone does not want them to. Countercensorship also varies by subject, that is, censorship is often discussed in consideration of two different subjects:

The local segment---that is, a user's ISP or even national government desiring to prevent them accessing certain content. The case of a national government may actually be found in the internet segment, but for practical purposes the situation is the same for countercensorship. Someone on the connection path is trying to prevent a user reaching content.
The remote segment---websites may decline to provide certain content to a user but the user might desire to access it anyways. The most common case is that of region-locking, in which say BBC iPlayer refuses to stream BBC TV series to users it believes to be outside of the UK.

There are various types of technologies and services intended to handle different combinations of all of these cases. However, the requirements of these different types of privacy and countercensorship are different and can be contradictory. This means that a service or technology which is suitable for one situation may be unsuitable for another situation.

This is the real problem that frustrates me endlessly: various services and technologies are constantly promoted for "privacy," and while they might possibly be useful for one case they are almost always entirely unsuitable for another case. Users do not understand this, and people hawking various services do not attempt to educate them, and so users who are concerned about their privacy are induced into making choices which, in fact, compromise their privacy, and sometimes by the well meaning. One of the groups that I consider guilty of this offense, for example, is the EFF, through their breathless promotion of the Tor project with little consideration of the utility of Tor in specific situations. Instead, it is presented as a silver bullet for both privacy and countercensorship---applications which it is not always ideal for, and is sometimes counter to the purpose.

So to start discussing online privacy within this framework, let's look at a technical (and commercial) solution which is widely advertised for user privacy: VPNs.

VPN stands for Virtual Private Network. The "Private" in this acronym is intended in a completely different context and it's best to ignore it; in the context of "the VPN" as it relates to common internet users, it is generally deceptive. From a technical perspective, a VPN is a technology which allows a network to be virtualized on top of a different network. For example, it allows a corporate internal network to be "extended" over the internet to the devices of employees who are working remotely, or for two different physical locations to have their local networks "unified" to a single network.

This is all rather boring to end users. To end users, or increasingly anyone who's ever heard of a YouTube video, a VPN is a service which makes the internet private.

Commercial VPN services such as Private Internet Access, NordVPN, ExpressVPN, the new Mozilla thing that probably no one uses, etc. are best viewed as services which take your computer and place it, logically, on the network of the operator---instead of the network you are currently connected to. This explanation, more than others, might help people to understand the privacy and security properties of such services.

So let's examine this from the perspective of subjects and objects of privacy. VPN services can easily protect objects 1 and 2 (payload and metadata) from subject A. That is, the use of a VPN service makes it so that the local network segment, the coffee shop WiFi you're on and/or your ISP, cannot view your traffic (even if unencrypted) and where it's going. They may still be able to collect certain metadata because VPNs are imperfect, for example, traffic volume, which research has found can sometimes be used on its own to derive useful information about payload. However, a VPN certainly provides stronger protection of 1 and 2 from A than not using one.

This protection of your traffic from the local segment is the primary function of a VPN. It is one of the key functions of corporate VPNs (in a client-to-site scenario) and the most significant value which can be derived from a commercial VPN service such as NordVPN, etc.

VPNs do not generally provide protection of objects 1/2 from subjects B/C, because once your traffic has departed the VPN provider network it traverses the internet the same as it would have otherwise. However, this situation is more than sufficient for most people. Privacy from subject C (a website/service operator) will always be limited by the fact that they necessarily have access to payload. Privacy from subject B (the greater internet, or transit providers) is generally of less concern to consumers because surveillance of internet transit is uncommon and generally limited to state actors. By far the greatest surveillance (and tampering) risk exists on the local segment.

VPNs may have some secondary value in protecting object 3 (your personal identity) from all three subjects. This basically occurs because most VPN providers present a large list of users as a single network identity (NAT is the technical mechanism), which entirely prevents methods like IP geolocation and also makes more sophisticated methods of identifying users somewhat more difficult because they tend to become "lost in the noise" of multiple users sharing the same address. However, the protection offered here is severely limited, and in general should not be a selling point of VPN services although it often is.

Protection of personal identity is most effective in the case of eavesdroppers on the link as it frustrates analysis via e.g. deep packet inspection, since numerous users will appear as having the same source IP. However, even this assertion relies on a long list of assumptions, perhaps most important of them being that your network traffic is indistinguishable (by means other than source IP) from other users of the same service coming from the same node of the same VPN service. This may sometimes hold for very busy/popular services, but in many cases you will be the sole user from that node of that VPN provider, and there are fingerprinting methods available even to DPI. In general, VPNs are not really designed to protect users from DPI occurring elsewhere on the internet and cannot be expected to be effective in doing so in the general case.
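To put the "lost in the noise" idea in concrete terms, here is a toy sketch of the crowd behind each exit address. Everything in it is made up for illustration: the user names, the flow list, and the addresses (which come from the reserved documentation ranges). A real observer would never see the user column; it is only there so we can count how big the crowd is.

```python
from collections import defaultdict

# Hypothetical flows after the VPN provider's NAT has rewritten the source
# address: (exit_ip, true_user). An on-path observer sees only exit_ip.
flows = [
    ("198.51.100.7", "alice"),
    ("198.51.100.7", "bob"),
    ("198.51.100.7", "carol"),
    ("203.0.113.9", "dave"),
]


def anonymity_sets(flows):
    """Map each exit IP to the set of users currently sharing it."""
    sets = defaultdict(set)
    for exit_ip, user in flows:
        sets[exit_ip].add(user)
    return dict(sets)


sets = anonymity_sets(flows)
for exit_ip, users in sorted(sets.items()):
    # A crowd of one is no crowd at all: the exit IP identifies the user.
    print(exit_ip, "shared by", len(users), "user(s)")
```

The second node has exactly one user behind it, so the shared address does nothing for that user, which is the "sole user from that node" case.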

Let's consider the case of protecting your personal identity (3) from website and service operators (C). This is perhaps the use-case for which VPNs are most misleadingly advertised and sought by some users. The use of a VPN service provides virtually no protection of your identity from the websites or services that you access. There are multiple reasons for this, but here are the two most compelling:

You may willingly provide your identity to many services and websites, e.g. via logging in to an account. This negates any privacy protections. This may seem obvious, but it is somewhat baffling how frequently people use "privacy tools" to log into Facebook with the expectation that it somehow mitigates Facebook's knowledge of their identity and behavior. There may be very limited cases in which it does, but in general, there is no value to most privacy protection technologies when you use them to access a service that you provide your identity to.
VPNs provide no mitigation against conventional fingerprinting methods, which rely on behavior of your web browser and features provided by your web browser to uniquely identify you. Because users normally roam between networks as part of normal practice, most advertising, analytics, etc. networks will seamlessly identify you between using a VPN and not using a VPN, without even any detection that anything is abnormal. In most modern surveillance contexts you are identified by fingerprinting, not by network origin, and use of a VPN does nothing to deter this.
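The fingerprinting point is easy to see in a minimal sketch. This is not how any particular tracker works; the attribute names, the hashing scheme, and the addresses are all assumptions chosen for illustration. The only thing it is meant to show is that nothing the tracker hashes depends on the source IP.

```python
import hashlib
import json


def fingerprint(browser_attrs):
    """Hash a dict of browser-visible attributes into a stable identifier."""
    canonical = json.dumps(browser_attrs, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical attributes a page script could read; none involve the network.
browser = {
    "user_agent": "ExampleBrowser/1.0",
    "screen": "2560x1440",
    "timezone": "America/Denver",
    "fonts": ["DejaVu Sans", "Liberation Mono"],
}

# The same browser seen once from a home connection and once through a VPN.
visit_home = {"source_ip": "192.0.2.10", "fp": fingerprint(browser)}
visit_vpn = {"source_ip": "198.51.100.7", "fp": fingerprint(browser)}

# The source IP changed, the fingerprint did not.
same_visitor = visit_home["fp"] == visit_vpn["fp"]
print(same_visitor)  # True
```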

A final consideration about VPNs is perhaps the most critical. We have seen that VPNs provide good protection in some cases (of payload and most metadata from the local segment) and limited protection in some other cases (of personal identity from the internet and website operators), although this protection is so limited that I feel it to be ethically very questionable to advertise it. And, when I say "advertise," I don't mean only in commercial advertising. I would apply this admonition equally to any number of well-intentioned "online privacy guides" etc. that advocate the use of VPNs as a "privacy measure" without an explanation of what protection they provide---and more importantly, what protection they do not provide.
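For those who prefer a table to prose, the subject/object framework can be written down as a lookup. This is only my summary of the cases discussed so far, squeezed into three coarse levels; the names Protection, VPN_PROTECTION, vpn_protects, and cases_at_level are invented for the sketch, and the levels are judgment calls rather than measurements.

```python
from enum import Enum


class Protection(Enum):
    GOOD = "good"
    LIMITED = "limited"
    NONE = "none"


# (object, subject) -> what a commercial VPN offers, per the discussion above.
VPN_PROTECTION = {
    ("payload", "local"): Protection.GOOD,
    ("metadata", "local"): Protection.GOOD,  # traffic volume may still leak
    ("payload", "internet"): Protection.NONE,
    ("metadata", "internet"): Protection.NONE,
    ("payload", "remote"): Protection.NONE,
    ("metadata", "remote"): Protection.NONE,
    ("identity", "local"): Protection.LIMITED,
    ("identity", "internet"): Protection.LIMITED,
    # Generous reading: fingerprinting and logins make this close to nothing.
    ("identity", "remote"): Protection.LIMITED,
}


def vpn_protects(obj, subject):
    """Look up the protection level for one object/subject pair."""
    return VPN_PROTECTION[(obj, subject)]


def cases_at_level(level):
    """List every (object, subject) pair rated at the given level."""
    return sorted(k for k, v in VPN_PROTECTION.items() if v is level)


print(cases_at_level(Protection.GOOD))
```

Only two of the nine cells come out as good, and both are on the local segment.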

Many, even in the security community, will justify recommending methods of limited efficacy by the fact that they do provide some benefit in limited cases, and so it is better to use them than not. That may very well be true in some cases, but there is a significant hazard to end-users who are falsely confident. That is, users who believe that they are "protected" may put themselves at risk because of the assumption that they cannot be identified. This is especially true of individuals in more critical situations where such privacy mechanisms are often recommended---journalists, subjects of oppressive regimes, etc. This makes it not only a matter of technical correctness but also moral correctness to educate users as to the limitations of privacy technologies.

The whole thing is further complicated by the fact that VPN services have become a Big Business, and so there is a great deal of paid promotion. Despite a strong tide of opinion (and law) against this practice, there is still plenty of unacknowledged or "native" promotion for VPNs to be found. This creates the unfortunate situation where even well-meaning people recommending VPNs to friends and family may inadvertently be acting as an agent of a commercial promotion scheme that is motivated by profit, not by any sense of privacy, safety, or security.

The situation is perhaps the most extreme when we consider the potential risks to privacy and security from using a VPN service. Recall the technical model of the behavior of a VPN: it replaces the local segment (local area network and consumer/commercial ISP) of the user with the local segment of the VPN provider. All of the risks once to be found on the local segment are still present on the VPN service's local segment. As a user, all you have is the VPN provider's assurance that they act in your interest and apply best practice security measures to their own internal network and internet service arrangement.

Considering the low price and fast multiplication of these VPN services, it is inevitable that there are problems in this area even without any malicious intent. The majority of VPN providers today operate out of a relatively small number of low-rent colocation facilities, often simply white-labeling nodes provided by a different VPN service (you can detect this simply by observing that many VPN providers have exactly identical lists of nodes). They may have no one on staff with significant technical expertise. They may not have invested in any security program whatsoever. If they operate their own infrastructure, they may be devoid of the most fundamental secure practices, such as limitation of privileges and patch management. All of this adds up to an inevitability: commercial VPN providers will experience security incidents which compromise their users' privacy. This is true of all providers but especially of the low-end providers which offer large numbers of nodes at very low prices, often consist of just a few people without technical expertise, and problematically often market the most heavily by more questionable means (such as "native" social media campaigns, e.g. "influencers").

Look at it this way: there can be very good reasons not to trust your local network segment. Some of the US's largest ISPs have displayed decidedly anti-consumer behavior and made it clear that they have little concern for consumer privacy. However, do you trust $5 MegaFastVPN more than your own ISP? At least Comcast has a security program, even if its concern for consumers is questionable. Many of these VPN providers could likely be subject to malicious outside surveillance for an extended period of time without knowing. This is especially true since so many lower-end VPN providers share infrastructure which is itself obtained from low-end colo and dedi providers with severely limited security programs. Considering that these VPN providers and especially infrastructure providers to VPN providers concentrate a large amount of consumer traffic into one soft target, they become extremely attractive to malicious actors.

There was recently a significant database breach of a commercial VPN provider (which provided services to many white-label providers), which released about 20M log entries from VPN providers which "retained no logs." In this particular case there appears to have been knowing deception involved as the data in question came from an ElasticSearch instance (you don't feed logs into ES if you don't intend to use them), but there is of course a substantial aspect of incompetence involved as the ES instance was left exposed to the internet and unsecured (a remarkably common mistake with ES which, as commonly deployed, listens on all interfaces and requires no authentication... also a sure sign of utterly lacking basic security practices). However, it's easy for this kind of thing to happen out of pure incompetence. A great deal of network and management software retains logs by default. Truly asserting that you "retain no logs" would require a degree of technical competence and effort, and ideally an auditing program to ensure ongoing compliance, that inexpensive VPN providers do not offer.

I am not necessarily here to give advice; after all, I'm probably not licensed to give computer advice in your state[1]. The problem, though, is that it's very hard to give advice in this area for two reasons. First, the concerns and behavior of users differ, and this impacts what privacy measures they should take. Second, commercial VPN providers are quite frankly a cesspool of questionable practices and I have a hard time trusting even the most reputable. There are probably a few things we can state with some confidence: PrivateInternetAccess is probably more trustworthy than the coffee shop's open wireless network, and in fact, the scenario of untrusted (shared, open, etc) local networks is the situation in which I have an easy time recommending the use of a VPN. But the sheer number of bad actors in the VPN space makes me extremely hesitant to ever tell another human being that they ought to look into one. They are likely to misunderstand the privacy protections as stronger than they are, and even worse there's a good chance that the VPN provider could itself turn out to either be a malicious actor or compromised by one.

Here's perhaps my best advice: if you're concerned about security and privacy on your local network segment you should use a VPN that you operate yourself (not too difficult if you have a background with Linux) or use my NordVPN affiliate link^w^w^w^w^wone operated by someone you know. Just ask the nearest neckbeard and smile and nod when they start going on about WireGuard. But all of these commercial VPNs are a disaster.

I hadn't really intended to only cover the topic of VPNs in this post but I did and it's already pretty long, so let's declare a multi-parter. Join us next time to talk about some privacy and countercensorship technologies other than commercial VPNs, and why I hate those too.

[1] I've been admitted to any number of bars, but it's the bouncers that have refused me that you probably ought to ask.