   _____                   _                  _____            _____       _
  |     |___ _____ ___ _ _| |_ ___ ___ ___   |  _  |___ ___   | __  |___ _| |
  |   --| . |     | . | | |  _| -_|  _|_ -|  |     |  _| -_|  | __ -| .'| . |
  |_____|___|_|_|_|  _|___|_| |___|_| |___|  |__|__|_| |___|  |_____|__,|___|
  a newsletter by |_| j. b. crawford

>>> 2020-06-20 204 No Content

One of those things that nearly everyone knows about computers is that for some reason "404" means "file not found." Most people who work with computers seriously are aware that HTTP uses a set of three-digit numbers to report status back to the client, and that these codes are categorized by first digit. For example, the '2xx' codes generally mean 'success' and '200' means 'OK.' The '4xx' codes mean that there is something wrong with the request, and '404' means that the requested file could not be found by the server.

Perhaps less widely known is where this whole idea of status codes comes from.

It's not unique to HTTP at all. Another widely used internet protocol, SMTP, uses a very similar scheme of three-digit codes in which, for example, '250' means 'OK' (the requested action completed normally) and '4xx' codes indicate a transient failure. For example, '452' means that there is insufficient storage to accept the message, as when the recipient's mailbox is full. This is obviously very similar to HTTP, down to the rough meaning of the first-digit categories.

SMTP was first formally described (by Jon Postel!) in RFC 821, dated 1982. HTTP was first formally described (by Tim Berners-Lee!) in RFC 1945, dated 1996. Both protocols saw limited internal use prior to being published in RFC format, but it's clear from the gap in years that SMTP is the older protocol. In fact, it's kind of fascinating to me to consider that HTTP was published when I was alive, as it seems so ubiquitous that it must be older than me.

Anyway, FTP was formally described (also by Jon Postel!) in RFC 765 dated 1980, and in fact FTP uses a set of three-digit numeric status codes that also match the categories used by HTTP. RFC 765 elaborates somewhat on the concept of the reply codes:

The number is intended for use by automata to determine what state to
enter next; the text is intended for the human user.

We must remember that it was 1980, a rather different day in computing, when we read that a separate numeric representation must be provided "for use by automata." Indeed, a set of state diagrams is provided in the RFC based on those codes. It's an extremely "early computer science" way to approach the problem of designing a protocol. That is to say, it makes perfect logical sense and is perhaps the best approach, but has been largely abandoned today because such a state diagram for a "modern" protocol would span kilometers.

The question that interests me is whether or not FTP is the origin of the concept of three-digit status codes or reply codes, and the rough categorization of 100 for continuation, 200 for OK, 300 for redirect, 400 for temporary error, and 500 for permanent error (HTTP uses those last two a little bit differently, for client-side and server-side error).
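For the sake of illustration, here is a rough Python sketch of that first-digit categorization. The function name and both tables are my own invention, not anything from the RFCs: the first table uses the loose labels from the paragraph above, and the second uses the class names HTTP gives the same digits.

```python
# Hypothetical lookup tables, keyed by the first (hundreds) digit.
# REPLY_CATEGORIES paraphrases the rough FTP-style labels used in the text.
REPLY_CATEGORIES = {
    1: "continuation",
    2: "OK",
    3: "redirect",
    4: "temporary error",
    5: "permanent error",
}

# HTTP keeps the same digits but reads the last two classes differently.
HTTP_CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def category(code: int, http: bool = False) -> str:
    """Return the rough category for a three-digit reply code."""
    if not 100 <= code <= 599:
        raise ValueError(f"no category defined for code {code}")
    table = HTTP_CATEGORIES if http else REPLY_CATEGORIES
    return table[code // 100]  # integer division keeps only the first digit


if __name__ == "__main__":
    print(category(404, http=True))
    print(category(452))
```

The point is only that software never needs to look past the first digit to decide, roughly, what kind of thing just happened.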

RFC 765 was not the first discussion of FTP, which, being a very obvious idea (what if we could use this newfangled network to move files around!), has a long history. Numerous earlier RFCs represent different stages in the development of FTP. The three-digit error codes seem to first appear in RFC 354, a revision of the draft standard. Previous revisions of the draft (and protocol, prior to being TCP-based) use one-byte binary error codes or do not specify brief numeric error codes.

RFC 354 conveniently states that the FTP error codes are similar to those of the RJE protocol. RJE, or Remote Job Entry, is a now forgotten protocol which was essentially a very early form of RPC (as now done with protocols like XML-RPC and arguably basically all network APIs). Indeed, RJE, as described in draft form in RFC 360, includes a very similar set of status codes (including 200 OK), except that it also uses the 0xx series of codes.

Confusingly, RJE incorporates FTP as a component of the protocol, but an earlier form of FTP based on NCP (not TCP) that uses one-byte status codes.

As suggested by the sequence numbers, RFC 360 is very close in date to the previously mentioned RFC 354, and explicitly mentions that the same set of status codes is intended to be applicable to "other protocols besides RJE (like FTP.)" The wording in these two RFCs would seem to imply that the idea originated with RJE and was then also applied to FTP. The two both had authors at MIT who were presumably sharing notes, and there is logical overlap between the two protocols, including RJE essentially having an FTP "mode," which makes them difficult to completely separate.

This RJE protocol, as ultimately formally described in RFC 407 after revisions, was actually somewhat sparsely used. RJE protocols in general were mostly used with mainframe and time-sharing systems, which mostly predated ARPANET, and so already had their own various RJE protocols implemented by the vendor or the user (this was back in the days when owners of time-sharing systems sometimes wrote their own operating systems to get a few features they wanted). This makes it pretty difficult to trace the history of RFC 407 in much detail, not least because the term "RJE" refers collectively to at least a dozen different such published protocols.

I was able to track down contact information for one of the authors of RFC 407, Richard Guida. Unfortunately he didn't recall how the reply code numbers came about, but I'm not especially surprised. Of course this was quite a long time ago, but the reply codes also seem like a relatively obvious idea that probably didn't strike anyone as particularly noteworthy at the time.

Notably, there is some precedent. The pre-TCP (NCP) version of FTP, which predates RFC 407 RJE, uses a one-byte reply code in a fairly similar way to RJE and TCP FTP. Speculatively, it seems likely that one of the authors of RJE (or possibly TCP FTP which seems to have been written out more or less in parallel) was familiar with the previous NCP FTP protocol and decided that replacing the one-byte reply code with a three-digit ASCII reply code would both be more human-readable (useful in a time when debugging protocol implementations by interacting with them "manually" was probably more common) and would allow for hierarchical organization by digit.

In fact, the hierarchy was somewhat more specific then. Both the RJE and TCP FTP specifications refer to the reply codes as being organized into three levels by hundreds, tens, and ones. HTTP makes no mention of such a three-level hierarchy, only the two levels of hundreds and ones. While Tim Berners-Lee was clearly inspired by the RJE/FTP reply codes, he did not duplicate their structure as faithfully as SMTP.
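To make the three-level idea concrete, a reply code can be pulled apart into its digits like so. This is just a sketch: the ReplyCode type and split_reply_code function are names I made up, and I am deliberately not attaching meanings to the tens digit here, since those are specific to each protocol's specification.

```python
from typing import NamedTuple


class ReplyCode(NamedTuple):
    """A three-digit reply code split into its three levels (hypothetical type)."""
    hundreds: int  # broad category, e.g. 2 for success
    tens: int      # sub-grouping in the RJE/FTP style; unused by HTTP
    ones: int      # the specific reply within that group


def split_reply_code(code: int) -> ReplyCode:
    """Split a code such as 226 into (2, 2, 6)."""
    if not 0 <= code <= 999:
        raise ValueError(f"not a three-digit code: {code}")
    return ReplyCode(code // 100, (code // 10) % 10, code % 10)


if __name__ == "__main__":
    print(split_reply_code(226))
    print(split_reply_code(404))
```

An FTP-style client could branch on all three fields in turn, while an HTTP client, per the above, really only cares about the hundreds digit and then the code as a whole.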

In summary, the three-digit HTTP status codes date back to at least 1972, and were already about a quarter century old when they (or at least a similar set) were used for HTTP. We are now coming up on 50 years since 200 "OK" was first defined, and it does not seem likely that it will go away any time soon.

One might question the utility of having these numeric reply codes when there are also text explanations sent along with them. The original intent seems to have primarily been that the numeric codes were easier to parse and use in software. That said, all the way back, protocols which use these codes have stated that the text representation is not bound to a specific string. This means that a 404 error is a 404 error regardless of whether or not the accompanying text error is 'File Not Found,' which could allow for internationalization or just unusual server configurations.
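As a sketch of why this split is handy for software, here is a small Python example that reads a reply line, keeps the number for the program, and treats the text as decoration. The function names are hypothetical, and the parser assumes the simplified "code, space, text" shape; a real HTTP status line carries a version string in front of the code, and FTP and SMTP have multi-line reply forms that this ignores.

```python
from typing import Tuple


def parse_reply(line: str) -> Tuple[int, str]:
    """Split a reply such as '404 File Not Found' into (code, text)."""
    code_part, _, text = line.strip().partition(" ")
    # Require exactly three ASCII digits, as in the reply codes discussed above.
    if len(code_part) != 3 or not (code_part.isascii() and code_part.isdigit()):
        raise ValueError(f"not a three-digit reply code: {line!r}")
    return int(code_part), text


def is_not_found(line: str) -> bool:
    """Decide on the number alone; the text is only for the human user."""
    code, _text = parse_reply(line)
    return code == 404


if __name__ == "__main__":
    # The same decision comes out regardless of the wording or language.
    for reply in ("404 File Not Found", "404 Fichier introuvable", "200 OK"):
        print(reply, "->", is_not_found(reply))
```

Because the decision never touches the text, a server is free to say whatever it likes after the number without breaking any client.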

Of course, in the world of HTTP, these errors are almost always represented to the end user in the form of a dedicated page designed to express the error. As a result, the actual HTTP status code and conventional error string "File Not Found" are basically irrelevant. That said, both browsers and servers have long had default representations of these errors which included the literal phrase "404 File Not Found," and this has pushed the status code and error string into the cultural lexicon firmly enough that they remain in common use on custom-designed error pages that could say whatever they want.

In the end, a fairly minor detail of a network protocol could end up influencing the popular culture fifty years later. Kind of makes you nervous about your API designs today, doesn't it?