>>> 2020-05-25 more weird protocols

Some days ago I was talking about FTP. Now I would like to expand with some other protocols that face similar problems.

The basic phenomenon is this: some network protocols assume that any internet host can open a connection to any other internet host. That assumption was an important part of the Big Idea of the internet, but it is broadly untrue today due to NAT and other phenomena. In fact, to be clear, for the most part a values decision has been made that this behavior is explicitly undesirable for reasons of security (thus the existence of firewalls).

FTP is not alone in this sort of design; it's actually a fairly common design pattern to have a control channel which is independent of the data channels. It makes a lot of engineering sense: a separate control plane and data plane are viewed as key parts of the whole concept of "software defined networking" and other two-point-oh developments. The idea isn't new at all, but in addition to being found on the cutting edge it's also found in various ossified protocols that were designed some time ago and are still used today.

For an example which is, today, largely a toy example, consider RTMP. RTMP, or Real-Time Messaging Protocol, is one of those things which was clearly designed by people with high ideals but little awareness of how the 21st century internet would function. RTMP is a very flexible protocol which provides a control channel to negotiate streaming media connections between origins, mixers, and receivers. It is designed to accommodate sophisticated situations like live video production where there are multiple stages of switching and processing that media passes through before reaching an end-user.

And, in perfect old protocol fashion, it is regarded as completely unsuitable for this purpose today. There is no television studio running on a complex RTMP architecture. What does run on RTMP is the unidirectional, unicast link from a Twitch streamer to the Twitch service. In fact, if we squint and tilt our heads just right, we could look at most modern uses of RTMP as being the internet equivalent of the studio-transmitter links or STLs which television and radio studios use to deliver their content to the actual transmitter.

How is it that RTMP went from a flexible system for media processing and streaming to a simple unicast protocol that just presents an apparently over-complicated solution to delivering an MPEG stream to a streaming service that actually distributes it (via HLS or MPEG-DASH)? Well, it's essentially the same problem. RTMP defines a control channel which is used to agree on the connections that should be opened to actually move the media. The problem is that, today, if a client and server set up a stream from the server to the client (as is the most common use case for streaming media), the server won't actually be able to connect to the client, because the client is behind NAT or a firewall or something.
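
To make the failure concrete, here is a minimal sketch of the callback problem. It is a toy model and not a real network stack: the ToyNat class, the addresses, and the port numbers are all made up for illustration, and a real NAT tracks a good deal more state than this. The only rule it encodes is that traffic from outside gets in only as a reply to something an inside host already opened.

```python
# Toy model of a NAT / stateful firewall. Everything here is hypothetical
# and exists only to illustrate the "server connects back" problem.
class ToyNat:
    def __init__(self):
        # Flows opened from the inside, stored as (inside, outside) pairs.
        self.flows = set()

    def outbound(self, inside, outside):
        """An inside host opens a connection outward; always allowed."""
        self.flows.add((inside, outside))
        return True

    def inbound(self, outside, inside):
        """An outside host sends to an inside host; allowed only as a reply."""
        return (inside, outside) in self.flows


nat = ToyNat()
client_control = ("192.168.1.20", 50000)  # made-up private address
client_media = ("192.168.1.20", 50002)
server_control = ("203.0.113.10", 1935)   # made-up documentation address
server_media = ("203.0.113.10", 40000)

# Control channel: the client dials out, so replies on that flow get back in.
nat.outbound(client_control, server_control)
control_ok = nat.inbound(server_control, client_control)

# Data channel, old-school design: the server dials back to the client.
callback_ok = nat.inbound(server_media, client_media)

# Data channel, HTTP-style design: the client dials out a second time.
nat.outbound(client_media, server_media)
client_initiated_ok = nat.inbound(server_media, client_media)

print(control_ok, callback_ok, client_initiated_ok)  # True False True
```

The control channel works and the callback does not, which is the whole story in three booleans.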

So, RTMP was replaced with things like HLS (what YouTube seems to use most of the time) and MPEG-DASH (an up-and-coming open standard for the same thing) which rather crudely implement real-time media over HTTP, more or less by brute force of downloading the same file over and over again. It's hard to describe HLS/DASH as anything but ugly hacks to fit a round peg in a square hole, but they're what we have, because they don't encounter the problem of needing to open connections in the "wrong" direction---and, more strongly, they fit into the tidy world of everything running inside a web browser which highly constrains what kind of connections can be opened to where.
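
For a sense of what "downloading the same file over and over again" looks like, here is a rough sketch of the client side of HLS. The playlist tags (#EXT-X-MEDIA-SEQUENCE, #EXTINF) are real HLS tags, but the function names, the segment names, and the two canned playlist snapshots are assumptions for illustration. A real player would fetch the playlist over HTTP every few seconds instead of looping over strings, and would handle far more of the format.

```python
def parse_media_playlist(text):
    """Return (first_sequence_number, [segment URIs]) from an HLS media playlist."""
    first_seq = 0
    segments = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#EXT-X-MEDIA-SEQUENCE:"):
            first_seq = int(line.split(":", 1)[1])
        elif line and not line.startswith("#"):
            # Any non-blank line that is not a tag or comment is a segment URI.
            segments.append(line)
    return first_seq, segments


def new_segments(text, last_seen):
    """Return (sequence, uri) pairs that are newer than last_seen."""
    first_seq, segments = parse_media_playlist(text)
    return [(first_seq + i, uri)
            for i, uri in enumerate(segments)
            if first_seq + i > last_seen]


# Two successive downloads of the "same" file; the window has slid by one segment.
SNAPSHOTS = [
    "#EXTM3U\n#EXT-X-TARGETDURATION:4\n#EXT-X-MEDIA-SEQUENCE:10\n"
    "#EXTINF:4.0,\nseg10.ts\n#EXTINF:4.0,\nseg11.ts\n",
    "#EXTM3U\n#EXT-X-TARGETDURATION:4\n#EXT-X-MEDIA-SEQUENCE:11\n"
    "#EXTINF:4.0,\nseg11.ts\n#EXTINF:4.0,\nseg12.ts\n",
]

last_seen = -1
fetched = []
for playlist in SNAPSHOTS:  # stand-in for "poll the playlist URL again"
    for seq, uri in new_segments(playlist, last_seen):
        fetched.append(uri)  # a real client would HTTP GET this segment
        last_seen = seq

print(fetched)  # ['seg10.ts', 'seg11.ts', 'seg12.ts']
```

Every connection here is opened by the client, which is exactly why this works through NAT and in a browser.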

The change from RealMedia (which used RTMP to deliver streaming media to the consumer) to the HLS we have today is a huge one, and one that was not motivated by any kind of elegance but instead only by the expedience of needing a solution that worked in a web browser. If there is one overriding theme for the technology work of the 21st century, it's "make it work in a web browser."

How about another example? A rather ugly one that I run into pretty often is SIP or session initiation protocol. For the un-initiated (if you will pardon the semi-pun), SIP is the dominant protocol used by VoIP phones and infrastructure to set up audio channels between e.g. two people who would like to talk to each other. This seemingly rather simple application of computer technology (that is, replicate the thing that analogue phones have done for a century) is in actuality extremely complicated to use over the internet because, first, computers are bad, and second, because SIP uses a control-channel-plus-media-channels architecture that is fundamentally at odds with the modern internet.

When you dial a call on a VoIP phone, it uses a control channel connection (which was likely already established through SIP's REGISTER verb) to tell a server or another phone that it would like to open a voice channel. The other end then connects back for audio one way, and the phone connects out for audio the other way. This makes a lot of sense if you look at it from the abstract perspective of How Computers Ought To Work. In practice, if there is any kind of complexity involved it's very common to end up with one-way audio (you can hear but not speak or vice versa) because one of the connections could not be made.
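
One way to see where one-way audio comes from: the call setup usually carries an SDP body in which each side writes down the address and port where it wants to receive audio. The sketch below pulls that endpoint out of a hand-written SDP fragment and checks whether it is an RFC 1918 private address. The function names and the sample SDP are assumptions for illustration, and the parser only handles the two line types it needs (c= and m=audio), nowhere near the whole format.

```python
import ipaddress

# The three RFC 1918 private ranges.
RFC1918 = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def media_endpoint(sdp):
    """Return the (address, port) a phone asks to receive audio on."""
    addr = None
    port = None
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("c=IN IP4 "):
            addr = line.split()[2]       # c=IN IP4 <address>
        elif line.startswith("m=audio "):
            port = int(line.split()[1])  # m=audio <port> RTP/AVP ...
    return addr, port


def is_rfc1918(addr):
    """True if addr is in a private range that is unroutable from the internet."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918)


# Hand-written example of what a phone behind NAT might offer.
OFFER = """v=0
o=alice 1 1 IN IP4 192.168.1.20
s=call
c=IN IP4 192.168.1.20
t=0 0
m=audio 49170 RTP/AVP 0
"""

addr, port = media_endpoint(OFFER)
unreachable = is_rfc1918(addr)
print(addr, port, unreachable)  # 192.168.1.20 49170 True
```

The far end will dutifully send audio to 192.168.1.20, which from its side of the internet is either nothing or somebody else's printer. Audio in the other direction may work fine, hence one-way audio.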

The end result is that the VoIP protocol stack is more or less completely unsuitable to run over the actual internet, and so VoIP phones in corporate contexts are usually placed on a special VLAN where they can be coddled into believing that the internet is the perfect world that physicists once imagined. When it comes to actually moving VoIP across the internet, VPN or "virtual private ethernet" solutions are common to achieve the same thing. VoIP phones, I contend, only actually work when software is used to emulate the ideal internet for which the protocols were designed.

So that's where we are. It's the year 2020, and using computer technology to allow two people to talk to each other by voice is still a feat of technology largely only achieved by proprietary solutions that take stupidly simple approaches but can still advertise them as Advanced Technology because the open standards solutions don't work for very common use-cases. If you're cynical like me, look into WebRTC, which is a "works in a web browser" approach to a similar problem space which falls flat on its face for the same reasons---in that case HTTP is the control channel and opening connections both ways is still a mess.

There are yet more examples of the same problem, but RTMP and SIP are the ones that I run into somewhat regularly. SIP is particularly interesting because it is still widely used in corporate contexts where it requires special network preparation to work. There was a time when SIP was thought to be such a big innovation that it would be the basis of things like instant messaging. In fact, Microsoft Lync (now Skype for Bidness) uses SIP for transporting text messages. Anyone who has ever interacted with Lync, even a single time, will appreciate that this is an unmitigated disaster, and that applying the Skype name to it has done nothing but ruin the already grim reputation of Skype under Microsoft ownership.

To speak more broadly, I often contend that instead of the TCP/IP model of the network (Physical - IP - TCP - Application) we ought to discuss the Front-End Model of Physical - IP - TCP - HTTP - Application. A large part of the reason for this is that web browsers implement HTTP and everything in 2020 is expected to run in a web browser, but just as large a part is that HTTP maintains a more or less strict client-server unidirectional connection model which is better suited for the modern internet than most special-purpose protocols which were designed under the assumption that the internet actually, umm, worked.

But here we are: the internet is a capstone of society and barely functions. Fortunately there has been a lot of ingenuity to work around the problems by, well, making everything use exclusively HTTP. Go us.