crying for onions

2020-08-01

*** In the interest of being up-front, there is some mention of child sexual abuse in this one. It's brief as I intend to take up the topic in more depth in a future message, but you still might want to skip from paragraph starting "I'm kidding, Tor..." to "pearl-clutching aside" if you would rather not think about it. This is a topic that I think is important to discuss (for reasons I outline here), but it is not easy to discuss, and I hope that it is clear that I may make light of it only because that is my way of discussing everything. The issue is not at all a light one, and that is why technologists should choose to engage with it.

I have been a bit busy lately due to some combination of finally deciding to commit to getting my private pilot's certificate and spending a greater than average amount of time getting angry at computers and resolving not to touch them ever again. However, I have finally returned to prattle on a bit longer about online privacy.

What I want to talk about today is: Tor.

I have always had a rather quarrelsome relationship with the Tor project. There are a few reasons for this, some technical and some not. Just for the sake of getting past the boring parts I'll dispose of the non-technical ones first: for one, prominent Tor developer Jacob Appelbaum (who often represented the project in public) was widely accused of sexual harassment, which a private investigator hired by the Tor project reported to be true. Because Appelbaum was the public face of the project to such an extent, this represented a bit of a black mark on the organization, which may have sat on the issue for a year or longer. Second, the Tor project has attracted funding from a wide variety of sources, most of which I personally feel was ill-spent supporting a project that has a good brand but poor credentials. But of course these are all issues which are quite separate from the actual technology, and I am here to complain about computers.

Let's talk about what Tor is. Tor is the most prominent implementation of a concept called "onion routing." The underlying idea is actually fairly simple and originates from some academic papers that led to the Tor project. Essentially, the idea is that if you route traffic through several "layers" of a network, each layer being unaware of the other layers (this blindness is achieved by encryption), no layer has the information from other layers to establish the actual origin of the traffic. An explanation that I personally think is simpler than the "onion" metaphor goes something like this: if you tell a friend to pass a message to a friend to pass a message to another friend, after a few rounds of this no one will be clear on where the message actually came from. This is basically what Tor does, but of course routing IP traffic requires having a return path, so Tor uses a cryptographic approach so that each "friend" is able to route traffic in the reverse direction as well but none of them know the route more than one hop in each direction.
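
To make the layering concrete, here is a minimal sketch of the idea in Python. It is emphatically not how Tor does it: the "cipher" is a toy XOR keystream built from SHA-256 purely so the example runs with the standard library, and the hop keys and function names are invented for illustration. The only point is that the sender wraps the message once per hop, and each hop can remove exactly one layer.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream (SHA-256 in counter mode). NOT secure; illustration only."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor_layer(key: bytes, data: bytes) -> bytes:
    """Add or remove one layer; XOR with the same keystream is its own inverse."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

def wrap(message: bytes, hop_keys: list[bytes]) -> bytes:
    """Sender side: apply one layer per hop, last hop's layer innermost."""
    for key in reversed(hop_keys):
        message = xor_layer(key, message)
    return message

def route(onion: bytes, hop_keys: list[bytes]) -> list[bytes]:
    """Each hop peels its own layer; returns what each hop holds afterwards."""
    views = []
    for key in hop_keys:
        onion = xor_layer(key, onion)
        views.append(onion)
    return views

# Hypothetical per-hop keys; a real system negotiates these with each hop.
hop_keys = [b"entry-key", b"middle-key", b"exit-key"]
message = b"GET / HTTP/1.1"

onion = wrap(message, hop_keys)
views = route(onion, hop_keys)

print(views[0] == message)   # entry hop still sees ciphertext
print(views[-1] == message)  # only the last hop recovers the message
```

Because XOR layers commute, this toy does not enforce the order in which layers come off; a real design uses proper ciphers and per-hop key agreement, so treat this strictly as a picture of the layering, not of the cryptography.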

So Tor routes your IP traffic through a series of nodes, each of which is blind as to the full traffic route. There are a couple of types of nodes: a Tor "node" or "router" in general is one that shuffles traffic around inside of the network, while an "exit node" is specifically a node that is willing to be the final node in the chain, forwarding traffic into the public internet. Exit nodes are somewhat less common because the Tor network is widely used for various types of abusive behavior and the exit nodes, being the apparent origin of this traffic, tend to catch most of the flak for it.

If each node only knows the previous and next hops in the route, as is the scheme with Tor, then three hops through the network is sufficient such that no one node knows the source and destination of the traffic. This creates a form of anonymity: the nodes that know who you are don't know who you're talking to, and the nodes that you're talking to don't know who you are. In the scheme that we devised in the last post, this provides anonymity of your identity from the operators of the websites you access. This is the primary objective of Tor: to allow you to access web services without the operators of those services knowing who you are.
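
The "who knows what" argument can be checked mechanically. The sketch below is a bookkeeping model only (no networking, no cryptography); the node names and the `HopView` type are made up for the example. Each relay records just its previous and next hop, and we ask whether any single relay ends up holding both the client and the destination.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HopView:
    """What one relay can observe: only its immediate neighbours."""
    name: str
    previous_hop: str
    next_hop: str

def circuit_views(path: list[str]) -> list[HopView]:
    """path is [client, relay..., destination]; return each relay's view."""
    return [
        HopView(path[i], path[i - 1], path[i + 1])
        for i in range(1, len(path) - 1)
    ]

def knows_both_ends(view: HopView, client: str, destination: str) -> bool:
    """True if this relay alone can link the client to the destination."""
    return view.previous_hop == client and view.next_hop == destination

# Hypothetical circuits: a single proxy versus a three-relay path.
one_hop = ["client", "relay", "website"]
three_hops = ["client", "entry", "middle", "exit", "website"]

single_relay_exposed = any(
    knows_both_ends(v, "client", "website") for v in circuit_views(one_hop)
)
three_relay_exposed = any(
    knows_both_ends(v, "client", "website") for v in circuit_views(three_hops)
)

print(single_relay_exposed)  # True: a lone relay sees both ends
print(three_relay_exposed)   # False: no single relay sees both ends
```

This only models what a single honest-but-curious relay learns from its neighbours. It says nothing about relays that collude or about an observer watching both ends of the circuit.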

To be clear, the concept of onion routing is not specific to Tor, although Tor was developed in part by the author of the first paper to describe the scheme, so it is perhaps the "reference" implementation. Early on, onion routing was also notably implemented by the Mixmaster anonymous email system (onion routing tends to be high latency and so is naturally more suited for asynchronous email than real-time IP routing), which is primarily used to send bomb threats to state universities and which I contributed to for some time in my wild youth[1]. While Mixmaster is still somehow operational, no one cares about it, and onion routing is mostly associated with various IP routers of which Tor is by far the most widely used.

Tor was further extended with something called the Rendezvous System. The implementation is somewhat complex, but the basic idea is that it makes the privacy protections of Tor bidirectional. Instead of just protecting the identity of the user from the web service, it allows a user to connect to a web service without either knowing the true network identity (IP address) of the other. Very roughly speaking this means using Tor in a "hairpin" manner, sending traffic through the Tor network which loops right back into the Tor network to get to the other end. The rendezvous functionality is generally referred to as "Tor hidden services" and even more widely as "THE DARKNET," and it is the facility that allows you to go to a website whose URL is a very long sequence of random characters followed by ".onion" in order to purchase drugs.

I'm kidding, Tor as a mechanism for purchasing drugs is largely a failure at scale, because the drugs still need to be physically delivered (which provides all kinds of opportunities for law enforcement to detect and identify participants) and because Ross Ulbricht was not especially competent at running a criminal empire[2]. Tor is actually used for child pornography[3].

It should be clear by now that I am being extremely critical of the Tor project and painting it in a rather poorer light than almost everyone else, although I am certainly not the only person arguing that Tor serves primarily as an aid to what is, in the industry, called child sexual abuse material (CSAM). I am not sure that I want to spend this otherwise fairly good evening articulating the significant concerns that exist surrounding CSAM and internet anonymity services; however, CSAM is very much the elephant in the room in this area. Even after making somewhat light of the long run of bomb threats against the University of Pittsburgh (which at least did not result in bodily injury), I feel that I would be participating in common but ethically questionable activity to critically discuss Tor without addressing the issue of child pornography.

It is a well-known reality among people familiar with internet anonymity systems that the majority of anonymous internet content distribution systems, whether old or new, centralized or decentralized, have seen a significant amount of use to distribute CSAM. This is a complex issue and there is clearly a certain amount of moral judgment involved in establishing whether or not these services are a net negative or positive for society. However, I firmly believe that progress in internet anonymity technology requires that we acknowledge and grapple with this uncomfortable fact. The vast majority of internet anonymity projects have addressed the problem of CSAM by simply ignoring it. I do not feel that this is excusable. The issue is not merely one of CSAM; CSAM is simply the most obvious case and the case which is most heavily pushed towards advanced anonymity technology because of aggressive prosecution by law enforcement. The broader issue is that all anonymity technologies are highly subject to a wide variety of abuse. Consider, as another example, anonymous social media like Whisper and all of the problems it has been associated with.

To pretend that the matter of abusive use (whether towards children, other users, people's email inboxes, etc) is a social or "non-technical" problem and thus not a consideration in the design of systems is, in my opinion, a regressive view that keeps the development of these technologies in a sort of "silicon tower" and promotes the continued development of technologies that are increasingly complex but fail to address actual social problems. Ethical and human safety concerns require that designers, developers, and operators of anonymity and privacy technologies take a holistic view in which they consider the way that their technologies engage with real-world behavior and impact real people. Taking a techno-libertarian, crypto-anarchist view of the matter and proclaiming that "information wants to be free" and anonymity technologies are indifferent to their uses has both a human cost (to a real extent literally, in the form of excess deaths as a result of these technologies) and a technology cost in that this view actually stifles attempts to develop a technical approach that is aware of and responsive to these ethical problems. Technology which fails, and especially willfully refuses, to address social realities is technology which is not fit for purpose.

Clearly this is a complex issue and I have just expressed a great deal of opinion, some of which is less technical and more moral and political. I intend to write an essay explicitly on this topic in the future, but it's not a very easy essay to write as the considerations involved are many, hard evidence on the issues is slim, and my opinions on the topic seem to be directly opposed to those of many technologists, and so I feel that I must be more effective in persuasion. Normally I write about how computers were a mistake, an opinion which is remarkably non-controversial among professional users of computers, and so I don't have to try very hard at all.

Okay, so, pearl-clutching aside, let's get back to the technical considerations. What is Tor good for?

As I mentioned, the primary function of Tor is to protect the identity of the user from the web services with which they interact. With the addition of the rendezvous system (the "dark web"), this protection becomes bidirectional, protecting both the user and the service from their identity being known by the other.

This is very neat. I don't want to seem like I am downplaying the achievement; there is very real technical achievement in designing a system which can meet these goals. However, if I was at all successful in articulating the previous post, you will know that I feel that privacy technologies generally fail to communicate their actual capabilities and limitations to users. Even this feature of Tor is an important example.

Remaining anonymous online is very difficult. As I suggested in the previous message, there is a strong temptation among internet privacy services (whether for-profit or not) to present online privacy as simple. For VPN services, and even for Tor, it often comes down to IP address: if they can see your IP address they can identify you, if they cannot, they can't. In reality, the user's IP address is a minor consideration in most analytics and tracking schemes, since it is prone to changing unexpectedly anyway. As a result, mere use of Tor does virtually nothing to protect user privacy as users can trivially be re-identified in other ways.
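
A rough illustration of why the IP address matters so little: a tracker can key its records on attributes of the browser itself. The sketch below assumes a made-up set of attributes and a made-up `fingerprint` helper; real fingerprinting uses many more signals and fuzzier matching, but the shape of the problem is the same. The addresses come from the ranges reserved for documentation.

```python
import hashlib
import json

def fingerprint(attributes: dict) -> str:
    """Hash a canonical form of browser attributes into a short identifier."""
    canonical = json.dumps(attributes, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

# Hypothetical attributes a page could collect; none involve the IP address.
browser = {
    "user_agent": "ExampleBrowser/1.0",
    "screen": "2560x1440",
    "timezone": "America/Denver",
    "fonts": ["Comic Sans MS", "Papyrus"],
}

# Same browser, seen once directly and once through some relay or VPN.
visit_direct = {"ip": "192.0.2.10", "fp": fingerprint(browser)}
visit_relayed = {"ip": "198.51.100.77", "fp": fingerprint(browser)}

same_user = visit_direct["fp"] == visit_relayed["fp"]
print(same_user)  # True: the IP changed, the fingerprint did not
```

Hiding the IP address changes one field in the tracker's record and leaves the rest intact, which is why a different, hardened browser does more work than the routing does.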

The Tor project has significantly improved this situation, more than any other project, by developing the Tor Browser. The Tor Browser is a modified version of Firefox which is dedicated to the purpose of browsing via the Tor network. While it has various privacy-centric features built-in (such as security levels that can disable JavaScript and various anti-fingerprinting actions like fudging the viewport dimensions), by far the most significant feature of the Tor Browser is simply being a different web browser, which means that it will not, from the start, have the user's Facebook cookies.

This serves as a critical precaution against what I would say is the most difficult problem in online privacy: no matter what you do technically, people have a habit of identifying themselves. For every development to mitigate JavaScript fingerprinting, there are five hundred people who use the Tor network to log into their email. At this point they are, of course, completely identifiable to their email service, and as a result are very probably identifiable to a variety of other people based on various means of correlation. The Tor network and browser do feature various mitigations for this issue (like the ability to force selection of new routes) but they have limited efficacy and are probably ignored by most users.

The bottom line is that technical means of anonymity are very often foiled by user behavior. Changing user behavior to prevent users revealing their identities is extremely difficult and not something which is really amenable to technical solutions. Throwing additional plugins into Firefox to prevent identifying user behavior will always be a game of whac-a-mole. While law enforcement has in some cases re-identified users of Tor using technically sophisticated methods of attacking the privacy technology, they have more often re-identified users by using "old-fashioned policework" like noticing that they use the same stupid username on crime forums as they did on MySpace in 2005.

So, I have hopefully convincingly argued that Tor is not an unmitigated success in protecting user identity. Certainly it has benefits, but the privacy it provides is not absolute, and really taking advantage of Tor requires a relatively sophisticated understanding of the practical and technical issues surrounding online privacy, tracking, etc.

As our dearly departed Billy Mays would say, wait, there is more. While Tor provides certain privacy advantages, it also provides a privacy disadvantage: the Tor exit node handles all traffic in cleartext, so the exit node used to route your Tor connections has visibility into all of your traffic to the public internet. This is somewhat (but not completely) mitigated by the use of HTTPS, but it is still clear that significant surveillance on Tor usage can be obtained by operating an exit node which records traffic. The utility of this kind of collection has not been well established in the open literature but it is known that government organizations have operated Tor exit nodes for this purpose. It is speculated (and supported by evidence) that the initial documents published by WikiLeaks were extracted from traffic intercepted by malicious Tor exit nodes and later shared with Assange, potentially by Chinese intelligence, although attribution in these matters is always shaky.

The point is that alongside its privacy benefits, Tor also presents a privacy concern by intentionally introducing an on-path attacker[4]. This concern is not merely speculative but realized. The extent of the risk is probably not enormous in typical usage but is also not well understood. The history of confidentiality research is generally one of finding that metadata exposes more information than anyone had ever expected, and so it seems like we ought to err on the side of overstating the unknown risk rather than understating.

Tor provides a privacy benefit but also a privacy concern, and users of the system must weigh these against each other to inform their behavior. But the public marketing of Tor expresses very little of this. This reflects my underlying concern: marketing and public discussion of privacy services fail to express the real capabilities, limitations, and privacy benefits of these systems, which are virtually always less than said.

But then there's a whole other matter: that of countercensorship.

Remember that?! We haven't talked about it for a while. A different but closely related topic to online privacy is that of countercensorship, allowing users to access content that someone (their local segment, national government, etc) doesn't want them to access. This is actually the most widely known and discussed benefit of Tor in most contexts. And yet, it is a purpose that Tor is fundamentally unsuited for.

Tor has a property that many distributed internet systems have: in order for the Tor network to function, a good portion of Tor nodes must be aware of the existence and network identity (IP address) of a good portion of the other nodes. This property (which is shared with other popular systems like BitTorrent) means that it is fairly trivial to collect a list of all nodes participating in the Tor network. Many commercial services collect this data and offer it for a modest price.

As a result, Tor is basically ineffective as a means of countering censorship. Network operators or regimes which wish to censor internet content simply censor Tor entirely, blocking access to all known Tor nodes. This is a standard practice in the oppressive regimes for which Tor is most widely advertised.
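
From the censor's side, this is about as hard as it sounds. The sketch below assumes the censor has already obtained a list of node addresses (the ones here are placeholders from the documentation ranges, and the function names are invented); the filtering itself is a set lookup.

```python
import ipaddress

def build_blocklist(node_addresses: list[str]) -> set:
    """Parse a collected list of node addresses into a set for fast lookup."""
    return {ipaddress.ip_address(addr) for addr in node_addresses}

def is_blocked(destination: str, blocklist: set) -> bool:
    """Return True if an outbound connection targets a known node."""
    return ipaddress.ip_address(destination) in blocklist

# Placeholder addresses standing in for a purchased or scraped node list.
known_nodes = ["192.0.2.1", "198.51.100.23", "2001:db8::5"]
blocklist = build_blocklist(known_nodes)

print(is_blocked("198.51.100.23", blocklist))    # listed node: dropped
print(is_blocked("203.0.113.9", blocklist))      # unlisted address: allowed
print(is_blocked("2001:0db8::0005", blocklist))  # same IPv6 node, other spelling
```

Parsing with the `ipaddress` module rather than comparing strings means different spellings of the same address still match. Nothing here requires inspecting the traffic itself, which is why publishing the node list is the whole ballgame.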

Of course the Tor project has a solution to this problem, called Tor bridging. Tor bridges are Tor nodes which do not provide standard routing and so are not readily discoverable from the Tor system. This makes it less likely that censors are aware of them, and so if a user behind such censorship can discover Tor bridges by a side channel they can connect to them for access to the Tor network. Tor bridges generally support various ways of obfuscating the Tor protocol (which is otherwise fairly easy to identify on the network) so that censors don't block connections to bridges based on fingerprinting of the protocol.

The adept reader might be wondering: if a Tor bridge can be used to connect to the Tor network from behind a censored connection, couldn't it simply be used to connect to the broader internet?

Indeed, a Tor bridge is essentially a proxy or VPN (the two are almost synonymous in this kind of usage) dedicated to providing access to the Tor network. From a technical perspective, such an obfuscated and hard-to-discover node, which is cooperative in the purpose of evading censorship, could simply forward traffic to censored services on the behalf of a user without any use of the Tor protocol. In fact, this is an old and still reasonably widely used method of evading censorship, and is often as simple as a website that will retrieve another website on a user's behalf. High school students 'round the globe are using such services to access adult websites during computer lab. Obviously there are privacy implications to these services, but they can be designed in a reasonably privacy-preserving way that presents the same exposure of user data as Tor.

So, if the mission of the Tor project is at least in part to allow those under oppressive regimes to evade censorship, why doesn't it provide such relays for more general use? I propose a reason which is a bit unconventional but I believe to be ultimately true (even if it is not consciously the reasoning behind the decision): internet proxies and relays are highly subject to abuse by all kinds of malicious actors, for example the ones that fill out the contact form on your website fifty times a day offering "negative SEO." Restricting Tor bridges to providing access to Tor makes them unattractive for this type of use, both because Tor is very slow and because many websites block Tor exit nodes because they generate a high level of abuse.

The bottom line is this: for many users in oppressive regimes, who need to use Tor bridges, Tor isn't used as a countercensorship mechanism at all. It's used to degrade the service of the Tor bridges so that they are less useful for abuse.

The Tor bridges are what actually provides the ability to evade censorship. The whole Tor network on the other side of them just makes it harder to fill out a thousand contact forms per minute.

In a way this is brilliant, because it seems to allow Tor bridges to persist longer than other types of proxy/countercensorship services. Tor bridges also seem to have seen a higher level of development for obfuscation techniques than other methods, although arguably VPNs provide a better level of obfuscation merely because the same protocols used by VPN services are also often used by corporate networks.

That said, Tor does not really provide a countercensorship function, when we get down to it. All we need for countercensorship is a node, which we can access, which will cooperate on forwarding traffic on our behalf. There are difficult parts of this problem (namely developing a way for users to locate these nodes without the censor being able to locate and block them), but the Tor project does not address these difficulties. The Tor project simply recommends that users find out the addresses of bridges by other means, such as in person or through messaging services.

Let me restate this, because this is a pet issue of mine and I am presently in a surly mood: there is a difficult problem in countercensorship, but the Tor project does not address it. What the Tor project does address is making their conventional countercensorship mechanism less effective so that it will attract less abuse.

As before I have injected a great deal of opinion into this discussion. There are use-cases which Tor addresses which more conventional countercensorship approaches (web proxies, VPNs, etc) do not address, most significantly the case where the person evading censorship is putting themselves at personal risk while accessing a service they do not necessarily trust, and so they desire stronger protection of their identity from the services they access. Of course this is subject to all the caveats and limitations I discussed earlier, but it is something that Tor is capable of addressing that simpler methods cannot.

That said, I do not think that this case is actually that common or important. If Facebook's decision at one point to offer a Tor hidden service tells us anything, it's that people use Tor to evade censorship in order to access Facebook (this was actually their explicit stated goal in the move). These are clearly not people who are trying to obscure their identity. I mean, sure, you can in theory sign up for Facebook without divulging your identity, but participating in Facebook in such a way as to not make yourself re-identifiable would be a difficult venture requiring a fairly high skill level.

Tor is widely advertised and recommended for purposes that it is either unsuitable for or (like nearly any technical solution) provides only limited utility for. This will almost certainly lead users to trust the technical solution to protect them, something that it cannot really do, and this will lead users in dangerous situations (say, journalists under oppressive regimes, the group that the Tor project really advertises itself to) to place themselves at risk by participating in illegal or "dissident" activity while still being identifiable.

The Tor project, in the hero banner of their website, says "Defend yourself against tracking and surveillance. Circumvent censorship." I have argued that it has limited utility for the first sentence, and is basically unsuitable for the second sentence. And yet, it is one of the most widely used solutions for both, because it has an excellent reputation developed in good part on the back of extensive corporate, government, and nonprofit funding.

I do not mean to accuse the Tor project of having ill intent. I completely believe that everyone at the Tor project is sincerely doing their best to address these real-world problems. Most people at the Tor project are no doubt completely aware of all of the problems I have raised, even things like Tor's basic unsuitability for censorship evasion, but believe that they are presenting a good trade-off to their users by providing a limited set of benefits in a highly user-friendly package. I respect and appreciate them for this effort.

However, I feel that, like commercial VPN providers, the Tor project has placed the acquisition of users and funding over the actual security of their users. Because of their desire to be user-friendly, popular, and well-funded, they make promises which they are not technically able to keep.

To summarize by somewhat rough analogy, many privacy and anonymity technologies could be compared to a handgun[5]. You can sell someone a handgun by telling them that it could defend their lives, and this is not untrue, but they are more likely to shoot themselves in the foot. And yet, all you tell them is that it will defend their lives, because you are in the business of selling handguns. Privacy advocates are in the business of selling privacy and subject to the same errors. It is (probably) not impossible to promote these products in a conscientious and utility-maximizing fashion, but it is significantly more difficult than selling them the easy way. If we are going to address the problems of online privacy and censorship, we need to learn to do it the hard way that works, not the easy way that only feels like it.

[1] I mean that I contributed to the Mixmaster project, not the steady stream of bomb threats it handles. It was a different era, back when Usenet was just dying and not quite dead.

[2] Incidentally, Ross Ulbricht was arrested at the branch of the San Francisco Public Library that I used, after having gone there because he couldn't find a seat at the next-door coffee shop from which he apparently usually operated the Silk Road. I was astounded by this fact because I frequented that coffee shop for the time I lived in San Francisco and there were never seats available. There is something delightful to me about Ross Ulbricht, international drug kingpin, standing in the front of the tiny coffee shop frowning at the other patrons taking all the seats at his preferred drug empire command post.

[3] It is virtually impossible to actually establish the typical usage patterns of Tor hidden services. The Justice Department once claimed that 80% of Tor traffic is child pornography but I am not especially familiar with their methodology and am inherently skeptical of their opinions in this area, given the DEA and all. That said, if I were in the mood I would make a lengthy technical argument that child pornography is almost the only purpose for which Tor hidden services are actually technically suited, and everyone using them for another purpose has been had.

[4] The term "on-path attacker" is actually not especially familiar to me, but I am steering towards compliance with draft Best Current Practice RFC "Terminology, Power, and Inclusive Language in Internet-Drafts and RFCs" which suggests it as a replacement for "man in the middle" which avoids the use of gendered language. While it is a bit of adjustment for me I also appreciate that "on-path attacker" more concisely expresses the technical concept.

[5] I both own handguns and do not desire to be on the receiving end of second amendment arguments, so please don't interpret this as a condemnation of firearms, but really this only serves to emphasize the idea that privacy services are something that have the potential to do good in some situations but the potential to do harm in others, which has the effect of requiring that they be promoted with the utmost caution.