_____                   _                  _____            _____       _ 
  |     |___ _____ ___ _ _| |_ ___ ___ ___   |  _  |___ ___   | __  |___ _| |
  |   --| . |     | . | | |  _| -_|  _|_ -|  |     |  _| -_|  | __ -| .'| . |
  |_____|___|_|_|_|  _|___|_| |___|_| |___|  |__|__|_| |___|  |_____|__,|___|
  a newsletter by |_| j. b. crawford               home archive subscribe rss
COMPUTERS ARE BAD is a newsletter semi-regularly issued directly to your doorstep to enlighten you as to the ways that computers are bad and the many reasons why. While I am not one to stay on topic, the gist of the newsletter is computer history, computer security, and "constructive" technology criticism.

I have an MS in information security, more certifications than any human should, and ready access to a keyboard. These are all properties which make me ostensibly qualified to comment on issues of computer technology. When I am not complaining on the internet, I work in professional services for a DevOps software vendor. I have a background in security operations and DevSecOps, but also in things that are actually useful like photocopier repair.

You can read this here, on the information superhighway, but to keep your neighborhood paperboy careening down that superhighway on a bicycle please subscribe. This also contributes enormously to my personal self esteem. There is, however, also an RSS feed for those who really want it. Fax delivery available by request.

--------------------------------------------------------------------------------

>>> 2024-02-11 the top of the DNS hierarchy

In the past (in fact two years ago, proof I have been doing this for a while now!) I wrote about the "inconvenient truth" that structural aspects of the Internet make truly decentralized systems infeasible, due to the lack of a means to perform broadcast discovery. As a result, most distributed systems rely on a set of central, semi-static nodes to perform initial introductions.

For example, Bitcoin relies on a small list of volunteer-operated domain names that resolve to known-good full nodes. Tor similarly uses a small set of central "directory servers" that provide initial node lists. Both systems have these lists hardcoded into their clients; coincidentally, both have nine trusted, central hostnames.

This sort of problem exists in basically all distributed systems that operate in environments where it is not possible to shout into the void and hope for a response. The internet, for good historic reasons, does not permit this kind of behavior. Here we should differentiate between distributed and decentralized, two terms I do not tend to select very carefully. Not all distributed systems are decentralized, indeed, many are not. One of the easiest and most practical ways to organize a distributed system is according to a hierarchy. This is a useful technique, so there are many examples, but a prominent and old one happens to also be part of the drivetrain mechanics of the internet: DNS, the domain name system.

My reader base is expanding and so I will provide a very brief bit of background. Many know that DNS is responsible for translating human-readable names like "computer.rip" into the actual numerical addresses used by the internet protocol. Perhaps a bit fewer know that DNS, as a system, is fundamentally organized around the hierarchy of these names. To examine the process of resolving a DNS name, it is sometimes more intuitive to reverse the name, and instead of "computer.rip", discuss "rip.computer" [1].

This name is hierarchical, it indicates that the record "computer" is within the zone "rip". "computer" is itself a zone and can contain yet more records, we tend to call these subdomains. But the term "subdomain" can be confusing as everything is a subdomain of something, even "rip" itself, which in a certain sense is a subdomain of the DNS root "." (which is why, of course, a stricter writing of the domain name computer.rip would be computer.rip., but as a culture we have rejected the trailing root dot).
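
If it helps to see that mechanically, here is a tiny sketch of the "reverse the name" view: splitting a fully-qualified name into the chain of zones a resolver walks, from the root down. It's pure string manipulation, no actual DNS involved.

    # A toy illustration of the zone hierarchy implied by a DNS name:
    # walk the labels right to left, starting from the root ".".
    def zone_chain(name: str) -> list[str]:
        labels = name.rstrip(".").split(".")      # ["computer", "rip"]
        chain = ["."]                             # start at the root zone
        suffix = ""
        for label in reversed(labels):            # walk right to left
            suffix = f"{label}.{suffix}" if suffix else f"{label}."
            chain.append(suffix)
        return chain

    print(zone_chain("computer.rip."))
    # ['.', 'rip.', 'computer.rip.']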

Many of us probably know that each level of the DNS hierarchy has authoritative nameservers, operated typically by whoever controls the name (or their third-party DNS vendor). "rip" has authoritative DNS servers provided by a company called Rightside Group, a subsidiary of the operator of websites like eHow that went headfirst into the great DNS land grab and snapped up "rip" as a bit of land speculation, alongside such attractive properties as "lawyer" and "navy" and "republican" and "democrat", all of which I would like to own the "computer" subdomain of, but alas such dictionary words are usually already taken.

"computer.rip", of course, has authoritative nameservers operated by myself or my delegate. Unlike some people I know, I do not have any nostalgia for BIND, and so I pay a modest fee to a commercial DNS operator to do it for me. Some would be surprised that I pay for this; DNS is actually rather inexpensive to operate and authoritative name servers are almost universally available as a free perk from domain registrars and others. I just like to pay for this on the general feeling that companies that charge for a given service are probably more committed to its quality, and it really costs very little and changing it would take work.

To the observant reader, this might leave an interesting question. If even the top-level domains are subdomains of a secret, seldom-seen root domain ".", who operates the authoritative name servers for that zone?

And here we return to the matter of even distributed systems requiring central nodes. Bitcoin uses nine hardcoded domain names for initial discovery of decentralized peers. DNS uses thirteen hardcoded root servers to establish the top level of the hierarchy.

These root servers are commonly referred to as a.root-servers.net through m.root-servers.net, and indeed those are their domain names, but remember that when we need to use those root servers we have no entrypoint into the DNS hierarchy and so are not capable of resolving names. The root servers are much more meaningfully identified by their IP addresses, which are "semi-hardcoded" into recursive resolvers in the form of what's often called a root hints file. You can download a copy, it's a simple file in BIND zone format that BIND basically uses to bootstrap its cache.
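
To make the bootstrapping concrete, here's a rough sketch using the third-party dnspython package. With no working DNS at all, you can still send a question to one of the addresses from the root hints file (198.41.0.4 is a.root-servers.net), and what comes back isn't an answer but a referral pointing you down the hierarchy:

    # Sketch of the first step a recursive resolver takes, using the
    # third-party dnspython package. 198.41.0.4 is a.root-servers.net,
    # straight out of the root hints; no prior DNS resolution needed.
    import dns.message
    import dns.query

    query = dns.message.make_query("computer.rip", "A")
    response = dns.query.udp(query, "198.41.0.4", timeout=5)

    # The root doesn't know the answer; it refers us down the hierarchy.
    print("answer:", response.answer)      # expect: [] (empty)
    for rrset in response.authority:       # NS records for the "rip." zone
        print(rrset)
    for rrset in response.additional:      # glue: addresses of those servers
        print(rrset)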

And yes, there are other DNS implementations too, a surprising number of them, even in wide use. But when talking about DNS history we can mostly stick to BIND. BIND used to stand for Berkeley Internet Name Domain, and it is an apt rule of thumb in computer history that anything with a reference to UC Berkeley in the name is probably structurally important to the modern technology industry.

One of the things I wanted to get at, when I originally talked about central nodes in distributed systems, is the impact it has on trust and reliability. The Tor project is aware that the nine directory servers are an appealing target for attack or compromise, and technical measures have been taken to mitigate the possibility of malicious behavior. The Bitcoin project seems to mostly ignore that the DNS seeds exist, but of course the design of the Bitcoin system limits their compromise to certain types of attacks. In the case of DNS, much like most decentralized systems, there is a layer of long-lived caching for top-level domains that mitigates the impact of unavailability of the root servers, but still, in every one of these systems, there is the possibility of compromise or unavailability if the central nodes are attacked.

And so there is always a layer of policy. A trusted operator can never guarantee the trustworthiness of a central node (the node could be compromised, or the trusted operator could turn out to be the FBI), but it sure does help. Tor's directory servers are operated by the Tor project. Bitcoin's DNS seeds are operated by individuals with a long history of involvement in the project. DNS's root nodes are operated by a hodgepodge of companies and institutions that were important to the early internet.

Verisign operates two, of course. A California university operates one, of course, but amusingly not Berkeley. Three are operated by arms of the US government, two of them defense agencies. The rest belong to internet industry organizations, the RIPE NCC, another university, and ICANN, which runs one itself. It's a pretty random assortment, though, and mostly just reflects the set of organizations prominently involved in the early internet.

Some people, even some journalists I've come across, hear that there are 13 name servers and picture 13 4U boxes with a lot of blinking lights in heavily fortified data centers. Admittedly this description was more or less accurate in the early days, and a couple of the smaller root server operators did have single machines until surprisingly recently. But today, all thirteen root server IP addresses are anycast groups.

Anycast is not a concept you run into every day, because it's not really useful on local networks where multicast can be used. But it's very important to the modern internet. The idea is this: an IP address (really a subnetwork) is advertised by multiple BGP nodes. Other BGP nodes can select the advertisement they like the best, typically based on lowest hop count. As a user, you connect to a single IP address, but based on the BGP-informed routing tables of internet service providers your traffic could be directed to any number of sites. You can think of it as a form of load balancing at the IP layer, but it also has the performance benefit of users mostly connecting to nearby nodes, so it's widely used by CDNs for multiple reasons.

For DNS, though, where we often have a bootstrapping problem to solve, anycast is extremely useful as a way to handle "special" IP addresses that are used directly. For authoritative DNS servers like 192.5.5.241 [2001:500:2f::f] [2] (root server F) or recursive resolvers like 8.8.8.8 [2001:4860:4860::8888] (Google public DNS), anycast is the secret that allows a "single" address to correspond to a distributed system of nodes.
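
You can poke at this yourself. Many root server instances answer the conventional "hostname.bind" CHAOS-class TXT query with an identifier for the specific anycast site that handled your packet; here's a rough dnspython sketch aimed at root server F:

    # Hedged sketch: ask a root server which anycast instance actually
    # answered, via the conventional "hostname.bind" CHAOS TXT query.
    # Uses the third-party dnspython package; 192.5.5.241 is root server F.
    import dns.message
    import dns.query
    import dns.rdataclass

    query = dns.message.make_query("hostname.bind", "TXT",
                                   rdclass=dns.rdataclass.CH)
    response = dns.query.udp(query, "192.5.5.241", timeout=5)
    for rrset in response.answer:
        print(rrset)   # a site/instance identifier; run this from two
                       # different networks and you may well get two
                       # different answers, courtesy of anycast routing.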

So there are thirteen DNS root servers in the sense that there are thirteen independently administered clusters of root servers (with the partial exception of A and J, both operated by Verisign, due to their acquisition of former A operator Network Solutions). Each of the thirteen root servers is, in practice, a fairly large number of anycast sites, sometimes over 100. The root server operators don't share much information about their internal implementation, but one can assume that in most cases the anycast sites consist of multiple servers as well, fronted by some sort of redundant network appliance. There may only be thirteen of them, but each of the thirteen is quite robust. For example, the root servers typically place their anycast sites in major internet exchanges distributed across both geography and provider networks. This makes it unlikely that any small number of failures would seriously affect the number of available sites. Even if a root server were to experience a major failure due to some sort of administration problem, there are twelve more.

Why thirteen, you might ask? No good reason. The number of root servers basically grew until the answer to an NS request for "." hit the 512 byte limit on UDP DNS responses. Optimizations over time allowed this number to grow (actually using single letters to identify the servers was one of these optimizations, allowing the basic compression used in DNS responses to collapse the matching root-servers.net part). Of course IPv6 blew DNS response sizes completely out of the water, leading to the development of the EDNS extension that allows for much larger responses.
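
The arithmetic is worth seeing. These byte counts are my own back-of-the-envelope reconstruction of the wire format rather than anything official, but they land near the commonly cited size of a v4-only, pre-EDNS priming response:

    # Back-of-the-envelope size of a pre-EDNS priming response (the answer
    # to "NS for .") with name compression and IPv4 glue only. Byte counts
    # are my own reconstruction of the wire format, so treat them as
    # approximate.
    header   = 12
    question = 1 + 2 + 2                  # root name "." + QTYPE + QCLASS
    first_ns = 1 + 2 + 2 + 4 + 2 + 20     # "a.root-servers.net" spelled out
    other_ns = 1 + 2 + 2 + 4 + 2 + 4      # one letter + compression pointer
    glue_a   = 2 + 2 + 2 + 4 + 2 + 4      # pointer to name + 4-byte address

    def priming_size(n_servers: int) -> int:
        return (header + question
                + first_ns + (n_servers - 1) * other_ns
                + n_servers * glue_a)

    print(priming_size(13))   # 436 bytes: comfortably under 512 once the
                              # single-letter names made compression work;
                              # the older assortment of hostnames did not
                              # compress nearly as well.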

13 is no longer the practical limit, but with how large some of the 13 are, no one sees a pressing need to add more. Besides, can you imagine the political considerations in our modern internet environment? The proposed operator would probably be Cloudflare or Google or Amazon or something and their motives would never be trusted. Incidentally, many of the anycast sites for root server F (operated by ISC) are Cloudflare data centers used under agreement.

We are, of course, currently trusting the motives of Verisign. You should never do this! But it's been that way for a long time, we're already committed. At least it isn't Network Solutions any more. I kind of miss when SRI was running DNS and military remote viewing.

But still, there's something a little uncomfortable about the situation. Billions of internet hosts depend on thirteen "servers" to have any functional access to the internet.

What if someone attacked them? Could they take the internet down? Wouldn't this cause a global crisis of a type seldom before seen? Should I be stockpiling DNS records alongside my canned water and iodine pills?

Wikipedia contains a great piece of comedic encyclopedia writing. In its article on the history of attacks on DNS root servers, it mentions the time, in 2012, that some-pastebin-user-claiming-to-be-Anonymous (one of the great internet security threats of that era) threatened to "shut the Internet down". "It may only last one hour, maybe more, maybe even a few days," the statement continues. "No matter what, it will be global. It will be known."

That's the end of the section. Some Wikipedia editor, no doubt familiar with the activities of Anonymous in 2012, apparently considered it self-evident that the attack never happened.

Anonymous may not have put in the effort, but others have. There have been several apparent DDoS attacks on the root DNS servers. One, in 2007, was significant enough that four of the root servers suffered---but there were nine more, and no serious impact was felt by internet users. This attack, like most meaningful DDoS, originated with a botnet. It had its footprint primarily in Korea, but C2 in the United States. The motivation for the attack, and who launched it, remains unknown.

There is a surprisingly large industry of "booters," commercial services that, for a fee, will DDoS a target of your choice. These tend to be operated by criminal groups with access to large botnets; the botnets are sometimes bought and sold and get their tasking from a network of resellers. It's a competitive industry. In the past, booters and botnet operators have sometimes been observed announcing a somewhat random target and taking it offline as, essentially, a sales demonstration. Since these demonstrations are a known behavior, any time a botnet targets something important for no discernible reason, analysts have a tendency to attribute it to a "show of force." I have little doubt that this is sometimes true, but as with the tendency to attribute monumental architecture to deity worship, it might be an overgeneralization of the motivations of botnet operators. Sometimes I wonder if they made a mistake, or maybe they were just a little drunk and a lot bored, who is to say?

The problem with this kind of attribution is evident in the case of the other significant attack on the DNS root servers, in 2015. Once again, some root servers were impacted badly enough that they became unreliable, but other root servers held on and there was little or even no impact to the public. This attack, though, had some interesting properties.

In the 2007 incident, the abnormal traffic to the root servers consisted of large, mostly-random DNS requests. This is basically the expected behavior of a DNS attack; using randomly generated hostnames in requests ensures that the responses won't be cached, making the DNS server exert more effort. Several major botnet clients have this "random subdomain request" functionality built in, normally used for attacks on specific authoritative DNS servers as a way to take the operator's website offline. Chinese security firm Qihoo 360, based on a large botnet honeypot they operate, reports that this type of DNS attack was very popular at the time.

The 2015 attack was different, though! Wikipedia, like many other websites, describes the attack as "valid queries for a single undisclosed domain name and then a different domain the next day." In fact, the domain names were disclosed, by at least 2016. The attack happened on two days. On the first day, all requests were for 336901.com. The second day, all requests were for 916yy.com.

Contemporaneous reporting is remarkably confused on the topic of these domain names, perhaps because they were not widely known, perhaps because few reporters bothered to check up on them thoroughly. Many sources make it sound like they were random domain names perhaps operated by the attacker, one goes so far as to say that they were registered with fake identities.

Well, my Mandarin isn't great, and I think the language barrier is a big part of the confusion. No doubt another part is a Western lack of familiarity with Chinese internet culture. To an American in the security industry, 336901.com would probably look at first like the result of a DGA or domain generation algorithm. A randomly-generated domain used specifically to be evasive. In China, though, numeric names like this are quite popular. Qihoo 360 is, after all, domestically branded as just 360---360.cn.

As far as I can tell, both domains were pretty normal Chinese websites related to mobile games. It's difficult or maybe impossible to tell now, but it seems reasonable to speculate that they were operated by the same company. I would assume they were something of a gray market operation, as there's a huge intersection between "mobile games," "gambling," and "target of DDoS attacks." For a long time, perhaps still today in the right corners of the industry, it was pretty routine for gray-market gambling websites to pay booters to DDoS each other.

In a 2016 presentation, security researchers from Verisign (Weinberg and Wessels) reported on their analysis of the attack based on traffic observed at Verisign root servers. They conclude that the traffic likely originated from multiple botnets or at least botnet clients with different configurations, since the attack traffic can be categorized into several apparently different types [3]. Based on command and control traffic from a source they don't disclose (perhaps from a Verisign honeynet?), they link the attack to the common "BillGates" [4] botnet. Most interestingly, they conclude that it was probably not intended as an attack on the DNS root: the choice of fixed domain names just doesn't make sense, and the traffic wasn't targeted at all root servers.

Instead, they suspect it was just what it looks like: an attack on the two websites the packets queried for, that for some reason was directed at the root servers instead of the authoritative servers for that second-level domain. This isn't a good strategy; the root servers are a far harder target than your average web hosting company's authoritative servers. But perhaps it was a mistake? An experiment to see if the root server operators might mitigate the DDoS by dropping requests for those two domains, incidentally taking the websites offline?

Remember that Qihoo 360 operates a large honeynet and was kind enough to publish a presentation on their analysis of root server attacks. Matching Verisign's conclusions, they link the attack to the BillGates botnet, and also note that they often observe multiple separate botnet C2 servers send tasks targeting the same domain names. This probably reflects the commercialized nature of modern botnets, with booters "subcontracting" operations to multiple botnet operators. It also handily explains Verisign's observation that the 2015 attack traffic seems to have come from more than one implementation of a DNS DDoS.

360 reports that, on the first day, five different C2 servers tasked bots with attacking 336901.com. On the second day, three C2 servers tasked for 916yy.com. But they also have a much bigger revelation: throughout the time period of the attacks, they observed multiple tasks to attack 916yy.com using several different methods.

360 concludes that the 2015 DNS attack was most likely the result of a commodity DDoS operation that decided to experiment, directing traffic at the DNS roots instead of the authoritative server for the target to see what would happen. I doubt they thought they'd take down the root servers, but it seems totally reasonable that they might have wondered if the root server operators would filter DDoS traffic based on the domain name appearing in the requests.

Intriguingly, they note that some of the traffic originated with a DNS attack tool that had significant similarities to BillGates but didn't produce quite the same packets. We will likely never know, but a plausible explanation is that some group modified the BillGates DNS attack module or implemented a new one based on the method used by BillGates.

Tracking botnets gets very confusing very fast, there are just so many different variants of any major botnet client! BillGates originated, for example, as a Linux botnet. It was distributed to servers, not only through SSH but through vulnerabilities in MySQL and ElasticSearch. It was unusual, for a time, in being a major botnet that skipped over the most common desktop operating system. But ports of BillGates to Windows were later observed, distributed through an Internet Explorer vulnerability---classic Windows. Why someone chose to port a Linux botnet to Windows instead of using one of the several popular Windows botnets (Conficker, for example) is a mystery. Perhaps they had spent a lot of time building out BillGates C2 infrastructure and, like any good IT operation, wanted to simplify their cloud footprint.

High in the wizard's tower of the internet, thirteen elders are responsible for starting every recursive resolver on its own path to truth. There's a whole Neal Stephenson for Wired article there. But in practice it's a large and robust system. The extent of anycast routing used for the root DNS servers, to say nothing of CDNs, is one of those things that challenges our typical stacked view of the internet. Geographic load balancing is something we think of at high layers of the system; it's surprising to encounter it as a core part of a very low-level process.

That's why we need to keep our thinking flexible: computers are towers of abstraction, and complexity can be added at nearly any level, as needed or convenient. Seldom is this more apparent than it is in any process called "bootstrapping." Some seemingly simpler parts of the internet, like DNS, rely on a great deal of complexity within other parts of the system, like BGP.

Now I'm just complaining about pedagogical use of the OSI model again.

[1] The fact that the DNS hierarchy is written from right-to-left while it's routinely used in URIs that are otherwise read left-to-right is one of those quirks of computer history. Basically an endianness inconsistency. Like American date order, to strictly interpret a URI you have to stop and reverse your analysis part way through. There's no particular reason that DNS is like that, there was just less consistency over most significant first/least significant first hierarchical ordering at the time and contemporaneous network protocols (consider the OSI stack) actually had a tendency towards least significant first.

[2] The IPv4 addresses of the root servers are ages old and mostly just a matter of chance, but the IPv6 addresses were assigned more recently and allowed an opportunity for something more meaningful. Reflecting the long tradition of identifying the root servers by their letter, many root server operators use IPv6 addresses where the host part can be written as the single letter of the server (i.e. root server C at [2001:500:2::c]). Others chose a host part of "53," a gesture at the port number used for DNS (i.e. root server J, [2001:7fe::53]). Others seem more random, Verisign uses 2:30 for both of their root servers (i.e. root server A, [2001:503:ba3e::2:30]), so maybe that means something to them, or maybe it was just convenient. Amusingly, the only operator that went for what I would call an address pun is the Defense Information Systems Agency, which put root server G at [2001:500:12::d0d].

[3] It really dates this story that there was some controversy around the source IPs of the attack, originating with none other than deceased security industry personality John McAfee. He angrily insisted that it was not plausible that the source IPs were spoofed. Of course botnets conducting DDoS attacks via DNS virtually always spoof the source IP, as there are few protections in place (at the time almost none at all) to prevent it. But John McAfee has always had a way of ginning up controversy where none was needed.

[4] Botnets are often bought, modified, and sold. They tend to go by various names from different security researchers and different variants. I'm calling this one "BillGates" because that's the funniest of the several names used for it.

--------------------------------------------------------------------------------

>>> 2024-01-31 multi-channel audio part 2

Last time, we left off at the fact that modern films are distributed with their audio in multiple formats. Most of the time, there is a stereo version of the audio, and a multi-channel version of the audio that is perhaps 5.1 or 7.1 and compressed using one of several codecs that were designed within the film industry for this purpose.

But that was all about film, in physical form. In the modern world, films go out to theaters in the form of Digital Cinema Packages, a somewhat elaborate format that basically comes down to an encrypted motion JPEG 2000 stream with PCM audio. There are a lot of details there that I don't know very well and I don't want to get hung up on anyway, because I want to talk about the consumer experience.

As a consumer, there are a lot of ways you get movies. If you are a weirdo, you might buy a Blu-Ray disc. Optical discs are a nice case, because they tend to conform to a specification that allows relatively few options (so that players are reasonable to implement). Blu-Ray discs are allowed to encode their audio as linear PCM [1], Dolby Digital, Dolby TrueHD, DTS, DTS-HD, or DRA.

DRA is a common standard in the Chinese market but not in the US (that's where I live), so I'll ignore it. That still leaves three basic families of codecs, each of which have some variations. One of the interesting things about the Blu-Ray specification is that PCM audio can incorporate up to eight channels. The Blu-Ray spec allows up to 27,648 Kbps of audio, so it's actually quite feasible to do uncompressed, 24-bit, 96 kHz, 7.1 audio on a Blu-Ray disc. This is an unusual capability in a consumer standard and makes the terribly named Blu-Ray High Fidelity Pure Audio standard for Blu-Ray audio discs make more sense. Stick a pin in that, though, because you're going to have a tough time actually playing uncompressed 7.1.
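
The arithmetic behind that claim, for the skeptical:

    # Quick check on the claim above: uncompressed 7.1 LPCM at 24-bit/96 kHz
    # against the Blu-Ray audio budget of 27,648 Kbps.
    channels, bits, rate = 8, 24, 96_000
    payload_kbps = channels * bits * rate / 1000
    print(payload_kbps)                  # 18432.0 kbps
    print(payload_kbps <= 27_648)        # True: it fits with room to spare
    # 24-bit/192 kHz 7.1 would be 36,864 kbps, which would not fit; that's
    # roughly where the spec's ceiling starts to matter.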

On the other hand, you might use a streaming service. There's about a million of those and half of them have inane names ending in Plus, so I'm going to simplify by pretending that we're back in 2012 and Netflix is all that really matters. We can infer from Netflix help articles that Netflix delivers audio as AAC or Dolby Digital.

Or, consider the case of video files that you obtained by legal means. I looked at a few of the movies on my NAS to take a rough sampling. Most older films, and some newer ones, have stereo AAC audio. Some have what VLC describes as A52 aka AC3. A/52 is an ATSC standard that is equivalent to AC3, and AC-3 (hyphen inconsistent) is sort of the older name of Dolby Digital or the name of the underlying transport stream format, depending on how you squint at it. Less common, in my hodgepodge sample, is DTS, but I can find a few.

VLC typically describes the DTS and Dolby Digital as 3F2M/LFE, which is a somewhat eccentric (and I think specific to VLC) notation for 5.1 surround. An interesting detail is that VLC differentiates 3F2M/LFE and 3F2R/LFE, both 5.1, but with the two "surround" channels assigned to either side or rear positions. While 5.1 configurations with the surround channels to the side seem to be more standard, you could potentially put the two surround channels to the rear. Some formats have channel mapping metadata that can differentiate the two.

Because there is no rest for the weary, there is some inconsistency between "5.1 side" and "5.1 rear" in different standards and formats. At the end of the day, most applications don't really differentiate. I tend to consider surround channels on the side to be "correct," in that movie theaters are configured that way and thus it's ostensibly the design target for films. One of few true specifications I could find for general use, rather than design standards specific to theaters like THX, is ITU-R BS 775. It states that the surround channels of a 5.1 configuration should be mostly to the side, but slightly behind the listener.

That digression aside, it's unsurprising that a video file could contain a multi-channel stream. Most video containers today can support basically arbitrary numbers of streams, and you could put uncompressed multichannel audio into such a container if you wanted. And yet, multi-channel audio in films almost always comes in the form of a Dolby Digital or DTS stream. Why is that? Well, in part, because of tradition: they used to be the formats used by theaters, although digital cinema has somewhat changed that situation and the consumer versions have usually been a little different in the details. But the point stands, films are usually mastered in Dolby or DTS, so the "home video" release goes out with Dolby or DTS.

Another reason, though, is the problem of interconnections.

Let's talk a bit about interconnections. In a previous era of consumer audio, the age of "hi-fi," component systems dominated residential living rooms. In a component system, you had various audio sources that connected to a device that came to be known as a "receiver" since it typically had an FM/AM radio receiver integrated. It is perhaps more accurate to refer to it as an amplifier since that's the main role it serves in most modern systems, but there's also an increasing tendency to think of its input selection and DSP features as part of a preamp. The device itself is sometimes referred to as a preamp, in audiophile circles, when component amplifiers are used to drive the actual speakers. You can see that in these conventional component systems you need to move audio signals between devices. This kind of setup, though, is not common in households with fewer than four bathrooms and one swimming pool.

Most consumers today seem to have a television and, hopefully, some sort of audio device like a soundbar. Sometimes there are no audio interconnections at all! Often the only audio interconnection is from the TV to the soundbar via HDMI. Sometimes it's wireless! So audio interconnects as a topic can feel a touch antiquated today, but these interconnects still matter a lot in practice. First, they are often either the same as something used in industry or similar to something used in industry. Second, despite the increasing prevalence of 5.1 and 7.1 soundbar systems with wireless satellites, the kind of people with a large Blu-Ray collection are still likely to have a component home theater system. Third, legacy audio interconnects don't die that quickly, because a lot of people have an older video game console or something that they want to work with their new TV and soundbar, so manufacturers tend to throw in one or two audio interconnects even if they don't expect most consumers to use them.

So let's think about how to transport multi-channel audio. An ancient tradition in consumer audio says that stereo audio will be sent between components on two sets of two-conductor cables terminated by RCA connectors. The RCA connector dates back to the Radio Corporation of America and, apparently, at least 1937. It remains in widespread service today. There are a surprising number of variations in this interconnect, in practice.

For one, the audio cables may be coaxial or just zipped up in a common jacket. Coaxial audio cables are a bit more expensive and a lot less flexible but admit less noise. There is a lot of confusion in this area because a particular digital transport we'll talk about later specified coaxial cables terminated in RCA connectors, but then is frequently used with non-coaxial cables terminated in RCA connectors, and for reasonable lengths usually still works fine. This has led to a lot of consumer confusion and people thinking that any cable with RCA connectors is coaxial, when in fact, most of them are not. Virtually all of them are not. Unless you specifically paid more money to get a coaxial one, it's not, and even then sometimes it's not, because Amazon is a hotbed of scams.

Second, though these connections are routinely described as "line level" as if that means something, there is remarkably little standardization of the actual signaling. There are various conventions like 1.7v peak-to-peak and 2v peak-to-peak and about 1v peak-to-peak, and few consumer manufacturers bother to tell you which convention they have followed. There are also a surprising number of ways of expressing signaling levels, involving different measurement bases (peak vs RMS) and units (dBv vs dBu), making it a little difficult to interpret specifications when they are provided. This whole mess is just one of the reasons you find yourself having to make volume adjustments for different sources, or having to tune input levels on receivers with that option [2].
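
To illustrate just how many ways there are to write down the "same" level, here's the arithmetic for one common convention. These are my own illustrative numbers, not any manufacturer's spec:

    # Converting one nominal "line level" between the common ways of
    # expressing it. Illustrative arithmetic only; consumer gear rarely
    # documents which convention it follows.
    import math

    def vpp_to_vrms(vpp: float) -> float:
        return (vpp / 2) / math.sqrt(2)          # assumes a sine wave

    def vrms_to_dbv(vrms: float) -> float:
        return 20 * math.log10(vrms / 1.0)       # dBV: referenced to 1 Vrms

    def vrms_to_dbu(vrms: float) -> float:
        return 20 * math.log10(vrms / 0.7746)    # dBu: referenced to 0.7746 Vrms

    vrms = vpp_to_vrms(2.0)                # the "2v peak-to-peak" convention
    print(round(vrms, 3))                  # 0.707 Vrms
    print(round(vrms_to_dbv(vrms), 2))     # about -3.0 dBV
    print(round(vrms_to_dbu(vrms), 2))     # about -0.8 dBu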

But that's all sort of a tangent, the point here is multi-channel audio. You could, conceptually, move 5.1 over six RCA cables, or 7.1 over eight RCA cables. Home theater receivers used to give you this option, but much like analog HDTV connections, it has largely disappeared.

There is one other analog option: remember Pro Logic, from the film soundtracks, that matrixed four channels into the analog stereo? Some analog formats like VHS and LaserDisc often had a Pro Logic soundtrack that could be "decoded" (really dematrixed) by a receiver with that capability, which used to be common. In this case you can transport multi-channel audio over your normal two RCA cables. The matrixing technique was always sort of cheating, though, and produces inferior results to actual multichannel interconnects. It's no longer common either.

Much like video, audio interconnects today have gone digital. Consumer digital audio really took flight with the elegantly named Sony/Philips Digital Interface, or S/PDIF. S/PDIF specifies a digital format that is extremely similar to, but not quite the same as, a professional digital interconnect called AES3. AES3 is typically carried on a three-conductor (balanced) cable with XLR connectors, though, which are too big and expensive for consumer equipment. In one of the weirder decisions in the history of consumer electronics, one that I can only imagine came out of an intractable political fight, S/PDIF specified two completely different physical transports: one electrical, and one optical.

The electrical format should be transmitted over a coaxial cable with RCA connectors. In practice it is often used over non-coaxial cables with RCA connectors, which will usually work fine if the length is short and nothing nearby is too electrically noisy. S/PDIF over non-coaxial cables is "fine" in the same way that HDMI cables longer than you are tall are "fine." If it doesn't work reliably, try a more expensive cable and you'll probably be good.

The optical format is used with cheap plastic optical cables terminated in a square connector called Toslink, originally for Toshiba Link, after the manufacturer that gave us the optical variant. Toslink is one of those great disappointments in consumer products. Despite the theoretical advantages of an optical interconnect, the extremely cheap cables used with Toslink mean it's mostly just worse than the electrical transport, especially when it comes to range [3].

But the oddity of S/PDIF's sibling formats isn't the interesting thing here. Let's talk about the actual S/PDIF bitstream, the very-AES3-like format the audio actually needs to get through.

S/PDIF was basically designed for CDs, and so it comfortably carries CD audio: two channels of 16 bit samples at 44.1kHz. In fact, it can comfortably go further, carrying 20 (or with the right equipment even 24) bit samples at the 48 kHz sampling rate more common in digital audio other than CDs. That's for two channels, though. Make the leap to six channels for 5.1 and you are well beyond the capabilities of an S/PDIF transceiver.
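
Rough numbers make the problem concrete. These simplify away S/PDIF's framing overhead (it actually moves 32-bit subframes with status and parity bits), so treat them as proportions rather than a spec:

    # Rough arithmetic on why 5.1 PCM doesn't fit where stereo PCM does.
    def pcm_kbps(channels: int, bits: int, rate_hz: int) -> float:
        return channels * bits * rate_hz / 1000

    print(pcm_kbps(2, 16, 44_100))   # 1411.2 kbps: CD audio, the design target
    print(pcm_kbps(2, 24, 48_000))   # 2304.0 kbps: still within reach
    print(pcm_kbps(6, 24, 48_000))   # 6912.0 kbps: 5.1 PCM, roughly 3x too much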

You see where this is going? compression.

See, the problems that Dolby Digital and DTS solved, of fitting multichannel audio onto the limited space of a 35mm film print, also very much exist in the world of S/PDIF. CDs brought us uncompressed digital audio remarkably early on, but also set sort of a constraint on the bitrate of digital audio streams that ensured the opposite in the world of multi-channel theatrical sound. It sort of makes sense, anyway. DTS soundtracks came on CDs!

Of course even S/PDIF is looking rather long in the tooth today. I don't think I use it at all any more, which is not something I expected to be saying this soon. Today, though, all of my audio sources and sinks are either analog or have HDMI. HDMI is the de facto norm for consumer digital audio today.

HDMI is a complex thing when it comes to audio or, really, just about anything. Details like eARC and the specific HDMI version have all kinds of impacts on what kind of audio can be carried, and the same is true for video as well. I am going to spare a lengthy diversion into the many variants of HDMI, which seem almost as numerous as those of USB, and talk about HDMI 2.1.

Unsurprisingly, considering the numerous extra conductors and newer line coding, HDMI offers a lot more bandwidth for audio than S/PDIF. In fact, you can transport 8 channels of uncompressed 24-bit PCM at 192kHz. That's about 37 Mbps, which is not that fast for a data transport but sure is pretty fast for an audio cable. Considering the bandwidth requirements for 4K video at 120Hz, though, it's only a minor ask. With HDMI, compression of audio is no longer necessary.
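
The "only a minor ask" claim, roughly quantified. The audio figure is exact arithmetic; the video figure is my ballpark for uncompressed 4K at 120Hz with 10-bit 4:4:4 color, ignoring blanking and line coding, not a number from the HDMI spec:

    audio_bps = 8 * 24 * 192_000              # 36,864,000: the ~37 Mbps above
    video_bps = 3840 * 2160 * 120 * 10 * 3    # roughly 29.9 Gbps
    print(round(audio_bps / 1e6, 1))          # 36.9
    print(round(video_bps / 1e9, 1))          # 29.9
    print(round(100 * audio_bps / video_bps, 2))   # ~0.12 percent: capacity
                                                   # is not why audio still
                                                   # gets compressed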

But we still usually do it.

Why? Well, basically everything can handle Dolby Digital or DTS, and so films are mostly mastered to Dolby Digital or DTS, and so we mostly use Dolby Digital or DTS. That's just the way of things.

One of the interesting implications of this whole thing is that audio stacks have to deal with multiple formats and figure out which format is in use. That's not really new, with Dolby Pro Logic you either had to turn it on/off with a switch or the receiver had to try to infer whether or not Pro Logic had been used to matrix a multichannel soundtrack to stereo. For S/PDIF, IEC 61937 standardizes a format that can be used to encapsulate a compressed audio stream with sufficient metadata to determine the type of compression. HDMI adopts the same standard to identify compressed audio streams (and, in general, HDMI audio is pretty much in the same bitstream format as good old S/PDIF, but you can have a lot more of it).
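
As a sketch of what that detection looks like in practice: IEC 61937 marks each burst of compressed data with a pair of sync words followed by a burst-info word that identifies the codec. The data-type values below are partial and from memory, so treat the table as illustrative rather than authoritative:

    # Hedged sketch of how a sink tells "this isn't PCM, it's a wrapped
    # compressed stream." IEC 61937 bursts start with the sync words
    # Pa=0xF872 and Pb=0x4E1F, followed by a burst-info word whose low
    # bits give the data type.
    from typing import Iterator, Tuple

    DATA_TYPES = {1: "AC-3 (Dolby Digital)", 11: "DTS type I",
                  12: "DTS type II", 13: "DTS type III"}

    def find_bursts(words: list[int]) -> Iterator[Tuple[int, str]]:
        """Scan a stream of 16-bit words for IEC 61937 burst preambles."""
        for i in range(len(words) - 3):
            if words[i] == 0xF872 and words[i + 1] == 0x4E1F:
                data_type = words[i + 2] & 0x1F       # low bits of Pc
                yield i, DATA_TYPES.get(data_type, f"data type {data_type}")

    # Toy stream: two samples of silence, then the start of an AC-3 burst
    # (0x0B77 happens to be the AC-3 sync word itself).
    stream = [0x0000, 0x0000, 0xF872, 0x4E1F, 0x0001, 0x0B77]
    for offset, kind in find_bursts(stream):
        print(offset, kind)       # -> 2 AC-3 (Dolby Digital)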

In practice, there are a lot of headaches around this format switching. For one, home theater receivers have to switch between decoding modes. They mostly do this transparently and without any fuss, but I've owned a couple that had occasional issues with losing track of which format was in use, leading to dropouts. Maybe related to signal dropouts but my current receiver has the same problem with internal sources, so it seems more like a software bug of some sort.

It's a lot more complicated when you get out of dedicated home theater devices, though. Consider the audio stack of a general-purpose operating system. First, PCs rarely have S/PDIF outputs, so we are virtually always talking about HDMI. For a surprisingly long time, common video cards had no support for audio over HDMI. This is fortunately a problem of the past, but unfortunately ubiquitous audio over HDMI means that your graphics drivers are now involved in the transport of audio, and graphics drivers are notoriously bad at reliably producing video, much less dealing with audio as a side business. I shudder to think of the hours of my life I have lost dealing with defects of AMD's DTS support.

Things are weird on the host software side, though. The operating system does not normally handle sound in formats even resembling Dolby Digital or DTS. So, when you play a video file with audio encoded in one of those formats, a "passthrough" feature is typically used to deliver the compressed stream directly to the audio (often actually video) device, without normal operating system intervention. We are reaching the point where this mostly just works but you will still notice some symptoms of the underlying complexity.

On Linux, it's possible to get this working, but in part because of licensing issues I don't think any distros will do it right out of the box. My knowledge may be out of date as I haven't tried for some time, but I am still seeing Kodi forum threads about bash scripts to bypass PulseAudio, so things seem mostly unchanged.

There are other frustrations, as well. For one, the whole architecture of multichannel audio interconnection is based around sinks detecting the mode used by the source. That means that your home theater receiver should figure out what your video player is doing, but your video player has no idea what your home theater receiver is doing. This manifests in maddening ways. Consider, for example, the number of blog posts I ran across (while searching for something else!) about how to make Netflix less quiet by disabling surround sound.

If Netflix has 5.1 audio they deliver it; they don't know what your speaker setup is. But what if you don't have 5.1 speakers? In principle you could downmix the 5.1 back to stereo, and a lot of home theater receivers have DSP modes that do this (and in general downmix 5.1 or 7.1 to whatever speaker channels are active, good for people with less common setups like my own 3.1). But you'd have to turn that on, which means having a receiver or soundbar or whatever that is capable, understanding the issue, and knowing how to enable that mode. That is way more than your average Netflix watcher wants to think about any of this. In practice, setting the Netflix player to only ever provide stereo audio is an easier fix.
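
For the curious, the downmix itself is simple arithmetic. This sketch uses the common ITU-style -3 dB coefficients; real products vary in how they treat dialog and the LFE channel, so consider it conceptual rather than a description of any particular receiver:

    import math

    A = 1 / math.sqrt(2)    # -3 dB

    def downmix_51(fl, fr, c, lfe, sl, sr):
        """Return (left, right) for one sample of each 5.1 channel."""
        left  = fl + A * c + A * sl
        right = fr + A * c + A * sr
        # The LFE channel is commonly just dropped in a stereo downmix;
        # some receivers mix it in at a reduced level instead.
        return left, right

    print(downmix_51(fl=0.2, fr=0.1, c=0.5, lfe=0.3, sl=0.05, sr=0.0))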

The use of compressed multichannel formats that are decoded in the receiver rather than by the computer doing the playback introduces other problems as well, like source equalization. If you have a computer connected to a home theater receiver (which is a ridiculous thing to do and yet here I am), you have two completely parallel audio stacks: "normal" audio that passes through the OS sound server and goes to the receiver as PCM, and "surround sound" that bypasses the OS sound server and goes to the receiver as Dolby Digital or DTS. It is very easy to have differences in levels, adjustments, latency, etc. between these two paths. The level problem here is just one of the several factors in the perennial "Plex is too quiet" forum threads [4].

Finally, let's talk about what may be, to some readers, the elephant in the room. I keep talking about Dolby Digital and DTS, but both are 5.1 formats, and 5.1 is going out of fashion in the movie world. Sure, there's Dolby Digital Plus, which supports 7.1, but it's so similar to the non-plus variant that there isn't much use in addressing them separately. Insert the "Plus" after Dolby Digital in the preceding paragraphs if it makes you feel better.

But there are two significantly different formats appearing on more and more film releases, especially in the relatively space-unconstrained Blu-Ray versions: lossless surround sound and object-based surround sound.

First, lossless is basically what it sounds like. Dolby TrueHD and DTS-HD are both formats that present 7.1 surround with only lossless compression, at the cost of a higher bitrate than older media and interconnects support. HDMI can easily handle these, and if you have a fairly new setup of a Blu-Ray player and recent home theater receiver connected by HDMI you should be able to enjoy a lossless digital soundtrack on films that were released with one. That's sort of the end of that topic, it's nothing that revolutionary.

But what about object-based surround sound? I'm using that somewhat lengthy term to try to avoid singling out one commercial product, but, well, there's basically one commercial product: Dolby Atmos. Atmos is heralded as a revolution in surround sound in a way that makes it sort of hard to know what it actually is. Here's the basic idea: instead of mastering a soundtrack by mixing audio sources into channels, you master a soundtrack by specifying the physical location (in cartesian coordinates) of each sound source.

When the audio is played back, an Atmos decoder then mixes the audio into channels on the fly, using whatever channels are available. Atmos allows the same soundtrack to be used by theaters with a variety of different speaker configurations, and as a result, makes it practical for theaters to expand into much higher channel counts.
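
Dolby doesn't publish the internals of its renderer, so what follows is only a conceptual sketch of object-based rendering in general: given an object's position and whatever speakers happen to exist, derive a gain for each speaker so the object seems to come from roughly the right place. The speaker layout and the gain law here are toy choices of mine, emphatically not Atmos:

    # Conceptual sketch of object-to-channel rendering, NOT Dolby's
    # actual renderer: inverse-distance gains over whatever speakers
    # the room has, normalized to preserve total power.
    import math

    SPEAKERS = {                  # (x, y) on a unit square, listener at center
        "L":  (-1.0,  1.0), "R":  (1.0,  1.0), "C": (0.0, 1.0),
        "Ls": (-1.0, -0.5), "Rs": (1.0, -0.5),
    }

    def render_gains(obj_x: float, obj_y: float) -> dict[str, float]:
        raw = {}
        for name, (sx, sy) in SPEAKERS.items():
            d = math.dist((obj_x, obj_y), (sx, sy))
            raw[name] = 1.0 / (d + 0.1)           # avoid division by zero
        norm = math.sqrt(sum(g * g for g in raw.values()))
        return {name: g / norm for name, g in raw.items()}

    # An object front-right of the listener lands mostly in R, some in C:
    print({k: round(v, 2) for k, v in render_gains(0.7, 0.9).items()})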

Theaters aren't nearly as important a part of the film industry as they used to be, though, and unsurprisingly Atmos is heavily advertised for consumer equipment as well. How exactly does that work?

Atmos is conveyed on consumer equipment as 7.1 Dolby Digital Plus or Dolby TrueHD with extra metadata.

If you know anything about HDR video, also known as SDR video with extra metadata, you will find this unsurprising. But some might be confused. The thing is, the vast majority of consumers don't have Atmos equipment, and with lossless compression soundtracks are starting to get very large so including two complete copies isn't very appealing. The consumer encoding of Atmos was selected to have direct backward compatibility to 7.1 systems, allowing normal playback on pre-Atmos equipment.

For Atmos-capable equipment, an extra PCM-like subchannel (at a reduced bitrate compared to the audio channels) is used to describe the 3D position of specific sound sources. Consumer Atmos decoders cannot support as many objects as the theatrical version, so part of the process of mastering an Atmos film for home release is clustering nearby objects into groups that are then treated as a single object by the consumer Atmos decoder. One way to think about this is that Atmos is downmixed to 7.1, and in the process a metadata stream is created that can be used to upmix back to Atmos mostly correctly. If it sounds kind of like matrix encoding it kind of is, in effect, which is perhaps part of why Dolby's marketing materials are so insistent that it is not matrix encoding. To be fair it is a completely different implementation, but has a similar effect of reducing the channel separation compared to the original source.

Also I don't think Atmos has really taken off in home setups? I might just be out of date here; I think half the soundbars on the market today claim Atmos support and amazing feats with their five channels, two of which are pointed up. I'm just pretty skeptical of the whole "we have made fewer, smaller speakers behave as if they were more, bigger speakers" school of audio products. Sorry Dr. Bose, there's just no replacement for displacement.

[1] The term Linear PCM or LPCM is used to clarify that no companding has been performed. This is useful because PCM originated for the telephone network, which uses companding as standard. LPCM clarifies that neither μ-law companding nor A-law companding has been performed. I will mostly just use PCM because I'm talking about movies and stuff, where companding digital audio is rare.
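
For anyone who hasn't run into companding: it's just a nonlinear mapping applied before quantization so that quiet signals get more of the available code values. A sketch of the μ-law curve, the North American telephone flavor, nothing specific to any movie format:

    # mu-law companding with mu = 255, as used in North American telephony.
    import math

    MU = 255

    def mu_law_compress(x: float) -> float:
        """x in [-1, 1] -> companded value in [-1, 1]."""
        return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

    for x in (0.01, 0.1, 0.5, 1.0):
        print(x, round(mu_law_compress(x), 3))
    # A quiet signal at 1% of full scale maps to about 23% of the output
    # range, which is the whole point.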

[2] There is also the matter of magnetic sources like turntables and microphones that produce much lower output levels than a typical "line level." Ideally you need a preamplifier with adjustable gain for these, although in the case of turntables there are generally accepted gain levels for the two common types of cartridges. A lot of preamplifiers either let you choose from those two or give you no control at all. Traditionally a receiver would have a built-in preamplifier to bring up the level of the signal on the turntable inputs, but a lot of newer receivers have left this out to save money, which leads to hipsters with vinyl collections having to really crank the volume.

[3] I don't feel like I should have to say this, but in the world of audio, I probably do: if it works, it doesn't matter! The problem with optical is that it develops reliability problems over shorter lengths than the electrical format. If you aren't getting missing samples (dropouts) in the audio, though, it's working fine and changing around cables isn't going to get you anything. In practice the length limitations on optical don't tend to matter very much anyway, since the average distance between two pieces of a component home theater system is, what, ten inches?

[4] Among the myriad other factors here is the more difficult problem that movies mix most of the dialog into the center channel while most viewers don't have a center channel. That means you need to remix the center channel into left and right to recover dialog. So-called professionals mastering Blu-Ray releases don't always get this right, and you're in even more trouble if you're having to do it yourself.

--------------------------------------------------------------------------------

>>> 2024-01-21 multi-channel audio part 1

Stereophonic or two-channel audio is so ubiquitous today that we tend to refer to all kinds of pieces of consumer audio reproduction equipment as "a stereo." As you might imagine, this is a relatively modern phenomenon. While stereo audio in concept dates to the late 19th century, it wasn't common in consumer settings until the 1960s and 1970s. Those were very busy decades in the music industry, and radio stations, records, and film soundtracks all came to be distributed primarily in stereo.

Given the success of stereo, though, one wonders why larger numbers of channels have met more limited success. There are, as usual, a number of factors. For one, two-channel audio was thought to be "enough" by some, considering that humans have two ears. Now it doesn't quite work this way in practice, and we are more sensitive to the direction from which sound comes than our binaural system would suggest. Still, there are probably diminishing returns, with stereo producing the most notable improvement in listening experience over mono.

There are also, though, technical limitations at play. The dominant form of recorded music during the transition to stereo was the vinyl record. There is a fairly straightforward way to record stereo on a record, by using a cartridge with coils on two opposing axes. This is the limit, though: you cannot add additional channels as you have run out of dimensions in the needle's free movement.

This was probably the main cause of the failure of quadraphonic sound, the first music industry attempt at pushing more channels. Introduced almost immediately after stereo in the 1970s, quadraphonic or four-channel sound seemed like the next logical step. It couldn't really be encoded on records, so a matrix encoding system was used in which the front-rear difference was encoded as phase shift in the left and right channels. In practice this system worked poorly, and especially early quadraphonic systems could sound noticeably worse than the stereo version. Wendy Carlos, an advocate of quadraphonic sound but harsh critic of musical electronics, complained bitterly about the inferiority of so-called quadraphonic records when compared to true four-channel recordings, for example on tape.

Of course, four-channel tape players were vastly more expensive than record players in the 1970s, as they ironically remain today. Quadraphonic sound was in a bind: it was either too expensive or too poor of quality to appeal to consumers. Quadraphonic radio using the same matrix encoding, while investigated by some broadcasters, had its own set of problems and never saw permanent deployment. Alan Parsons famously produced Pink Floyd's "Dark Side of the Moon" in quadraphonic sound; the effort was a failure in several ways but most memorably because, by the time of the album's release in 1973, the quadraphonic experiment was essentially over.

Three-or-more-channel-sound would have its comeback just a few years later, though, by the efforts of a different industry. Understanding this requires backtracking a bit, though, to consider the history of cinema prints.

Many are probably at least peripherally aware of Cinerama, an eccentric-seeming film format that used three separate cameras, and three separate projectors, to produce an exceptionally widescreen image. Cinerama's excess was not limited to the picture: it involved not only the three 35mm film reels for the three screen panels, but also a fourth 35mm film that was entirely coated with a magnetic substrate and was used to store seven channels of audio. Five channels were placed behind the screen, effectively becoming center, left, right, left side, and right side. The final two tracks were played back behind the audience, as the surround left and surround right.

Cinerama debuted in 1952, decades before 35mm films would typically carry even stereo audio. Like quadraphonic sound later, Cinerama was not particularly successful. By the time stereo records were common, Cinerama had been replaced by wider film formats and anamorphic formats in which the image was horizontally compressed by the lens of the camera, and expanded by the lens of the projector. Late Cinerama films like 2001: A Space Odyssey were actually filmed in Super Panavision 70 and projected onto Cinerama screens from a single projector with a specialized lens.

There's a reason people talk so much about Cinerama, though. While it was not a commercial success, it was influential on the film industry to come. Widescreen formats, mostly anamorphic, would become increasingly common in the following decades. It would take years longer, but so would seven-channel theatrical sound.

"Surround sound," as these multi-channel formats came to be known in the late '50s, would come and go in theatrical presentations throughout the mid-century even as the vast majority of films were presented monaurally, with only a single channel. Most of these relied on either a second 35mm reel for audio only, or the greater area for magnetic audio tracks allowed by 70mm film. Both of these options were substantially more expensive for the presenting theater than mono, limiting surround sound mostly to high-end theaters and premiers. For surround sound to become common, it had to become cheap.

1971's A Clockwork Orange (I will try not to fawn over Stanley Kubrick too much but you are learning something about my film preferences here) employed a modest bit of audio technology, something that was becoming well established in the music industry but was new to film. The magnetic recordings used during the production process employed Dolby Type A noise reduction, similar to what became popular on compact cassette tapes, for a slight improvement in audio quality. The film was still mostly screened in magnetic mono, but it was the beginning of a profitable relationship between Dolby Labs and the film industry. Over the following years a number of films were released with Dolby Type A noise reduction on the actual distribution print, and some theaters purchased decoders to use with these prints. Dolby had bigger ambitions, though.

Around the same time, Kodak had been experimenting with the addition of stereo audio to 35mm release prints, using two optical tracks. They applied Dolby noise reduction to these experimental prints, and brought Dolby in to consult. This presented the perfect opportunity to implement an idea Dolby had been considering. Remember the matrix encoded quadraphonic recording that had been a failure for records? Dolby licensed a later-generation matrix decoder design from Sansui, and applied it to Kodak's stereo film soundtracks, allowing separation into four channels. While the music industry had placed the four channels at the four corners of the soundstage, the film industry had different tastes, driven mostly by the need to place dialog squarely in the center of the field. Dolby's variant of quadraphonic audio was used to present left, right, center, and a "surround" or side channel. This audio format went through several iterations, including much improved matrix decoding, and along the way picked up a name that is still familiar today: Dolby Stereo.
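
The matrix itself is simple enough to sketch, if we ignore the 90-degree phase shift applied to the surround channel, the band-limiting, and the noise reduction, all of which matter a great deal in practice. This toy version only shows the level relationships:

    # A simplified sketch of 4:2:4 matrix encoding in the Dolby Stereo style.
    import math

    A = 1 / math.sqrt(2)   # -3 dB

    def encode_lt_rt(l, c, r, s):
        """Fold four channels (L, C, R, S) into two (Lt, Rt)."""
        lt = l + A * c + A * s    # surround added in phase on the left...
        rt = r + A * c - A * s    # ...and inverted on the right
        return lt, rt

    def decode_passive(lt, rt):
        """Passive decode: sum recovers center, difference recovers surround."""
        return lt, A * (lt + rt), rt, A * (lt - rt)     # L, C, R, S

    lt, rt = encode_lt_rt(l=0.0, c=1.0, r=0.0, s=0.0)   # center-only signal
    print(decode_passive(lt, rt))   # C comes back at full level, but leaks
                                    # into L and R; steering logic ("Pro
                                    # Logic") exists to suppress that leakage.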

That Dolby Stereo is, in fact, a quadraphonic format reflects a general atmosphere of terminological confusion in the surround sound industry. Keep this in mind.

One of Dolby Stereo's most important properties was its backwards compatibility. The two optical tracks could be played back on a two-channel (or actually stereo) system and still sound alright. They could even be placed on the print alongside the older magnetic mono audio, providing compatibility with mono theaters. This compatibility with fewer channels became one of the most important traits in surround sound systems, and somewhat incidentally served to bring them to the consumer. Since the Dolby Stereo soundtrack played fine on a two-channel system, home releases of films on formats like VHS and Laserdisc often included the original Dolby Stereo audio from the print. A small industry formed around these home releases, licensing the Dolby technology to sell consumer decoders that could recover surround sound from home video.

For cost reasons these decoders were inferior to Dolby's own in several ways, and to avoid the hazard of damage to the Dolby Stereo brand, Dolby introduced a new marketing name for consumer Dolby Stereo decoders: Dolby Surround.

By the 1980s, Dolby Stereo, or Dolby Surround, had become the most common audio format on theatrical presentations and their home video releases. Even some television programs and direct-to-video material were recorded in Dolby Surround. Consumer stereo receivers, in the variant that came to be known as the home theater receiver, often incorporated Dolby Surround decoders. Improvements in consumer electronics brought the cost of proper Dolby Stereo decoders down, and so the home systems came to resemble the theatrical systems as well. Seeking a new brand to unify the whole mess of Dolby Stereo and Dolby Surround (which, confusingly, were often 4 and 3 channel, respectively), Dolby seems to have turned to the "Advanced Logic" and "Full Logic" terms once used by manufacturers of quadraphonic decoders. The result came to be known as Dolby Pro Logic. A Dolby Pro Logic decoder processed two audio channels to produce a four-channel output. According to a modern naming convention, Dolby Pro Logic is a 4.0 system: four full-bandwidth channels.

This entire thing, so far, has been a preamble to the topic I actually meant to discuss. It's an interesting preamble, though! I just want to apologize that I didn't mean to write a history of multi-channel audio distribution and so this one isn't especially complete. I left out a number of interesting attempts at multi-channel formats, of which the film industry produced a surprising number, and instead focused on the ones that were influential and/or used for Kubrick films [1].

Dolby Pro Logic, despite its impressive name, was still an analog format, based on an early '70s technique. Later developments would see an increase in the number of channels, and the transition to digital audio formats.

Recall that 70mm film provided six magnetic audio channels, which were often used in an approximation of the seven-channel Cinerama format. Dolby experimented with the six-channel format, though, confusingly also under the scope of the Dolby Stereo product. During the '70s, Dolby observed that the ability of humans to differentiate the source of a sound is significantly reduced as the sound becomes lower in frequency. This had obvious potential for surround sound systems, enabling something analogous to chroma subsampling in video. The lower-frequency component of surround sound does not need to be directional, and for a sense of directionality the high frequencies are most important.

Besides, bassheads were coming to the film industry. The long-used Academy response curve fell out of fashion during the '70s, in part due to Dolby's work, in part due to generally improved loudspeaker technology, and in part due to the increasing popularity of bass-heavy action films. Several 70mm releases used one or more of the audio channels as dedicated bass channels.

For the 1979 film Apocalypse Now in its 70mm print, Dolby premiered a 5.1 format in which three full-bandwidth channels were used for center, left, and right, two channels with high-pass filtering were used for surround left and surround right, and one channel with low-pass filtering was used for bass. Apocalypse Now was not, in fact, the first film to use this channel configuration, but Dolby promoted it far more than the studios had.

Interestingly, while I know less about live production history, the famous cabaret Moulin Rouge apparently used a 5.1 configuration during the 1980s. Moulin Rouge was prominent enough to give the 5.1 format a boost in popularity, perhaps particularly important because of the film industry's indecision on audio formats.

The seven-channel concept of the original Cinerama must have hung around in the film industry, as there was continuing interest in a seven-channel surround configuration. At the same time, the music industry widely adopted eight-channel tape recorders for studio use, making eight-channel audio equipment readily available. The extension to 7.1 surround, adding left and right side channels to the 5.1 configuration, was perhaps obvious. Indeed, what I find strangest about 7.1 is just how late it was introduced to film. Would you believe that the first film released (not merely remastered or mixed for Blu-Ray) in 7.1 was 2010's Toy Story 3?

7.1 home theater systems were already fairly common by then, a notable example of a modern trend afflicting the film industry: the large installed base and cost avoidance of the theater industry mean that consumer home theater equipment now evolves more quickly than theatrical systems. Indeed, while 7.1 became the gold standard in home theater audio during the 2000s, 5.1 remains the dominant format in theatrical sound systems today.

Systems with more than eight channels are now in use, but haven't caught on in the consumer setting. We'll talk about those later. For most purposes, eight-channel 7.1 surround sound is the most complex you will encounter in home media. The audio may take a rather circuitous route to its 7.1 representation, but, well, we'll get to that.

Let's shift focus, though, and talk a bit about the actual encodings. Audio systems up to 7.1 can be implemented using analog recording, but numerous analog channels impose practical constraints. For one, they are physically large, making it infeasible to put even analog 5.1 onto 35mm prints. Prestige multi-channel audio formats like that of IMAX often avoided this problem by putting the audio onto an entirely separate film reel (much like Cinerama back at the beginning), synchronized with the image using a pulse track and special equipment. This worked well but drove up costs considerably. Dolby Stereo demonstrated that it was possible to matrix four channels into two channels (with limitations), but considering the practical bandwidth of the magnetic or optical audio tracks on film you couldn't push this technique much further.

Remember that the theatrical audio situation changed radically during the 1970s, going from almost universal mono audio to four channels as routine and six channels for premieres and 70mm. During the same decade, the music reproduction industry, especially in Japan, was exploring another major advancement: digital audio encoding.

The Compact Disc standard was finalized in 1980, and players reached the market a couple of years later. Numerous factors contributed to the rapid success of CDs over vinyl and, to a lesser but still great extent, the compact cassette. One of them was the quality of the audio reproduction. CDs were a night and day change: records could produce an excellent result but almost always suffered from dirt and damage. Cassette tapes were better than most of us remember but still had limited bandwidth and a high noise floor, requiring Dolby noise reduction for good results. The CD, though, provided lossless digital audio.

Audio is encoded on an audio CD in PCM format. PCM, or pulse code modulation, is a somewhat confusing term that originated in the telephone industry. If we were to reinvent it today, we would probably just call it digital modulation. To encode a CD, audio is sampled (at 44.1 kHz for historic reasons) and quantized to 16 bits. A CD carries two channels, stereo, which was by then the universal format for music. Put together, those add up to 1.4Mbps. This was a very challenging data rate in 1980, and indeed, practical CD players relied on the fact that the data did not need to be read perfectly (error correcting codes were used) and did not need to be stored (going directly to a digital to analog converter). These were conveniently common traits of audio reproduction systems, and the CD demonstrated that digital audio was far more practical than the computing technology of the time would suggest.
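
That 1.4Mbps figure falls straight out of those parameters:

    # Raw (uncompressed) bitrate of CD audio: sample rate x bit depth x channels.
    sample_rate_hz = 44_100
    bits_per_sample = 16
    channels = 2

    bits_per_second = sample_rate_hz * bits_per_sample * channels
    print(bits_per_second)                    # 1411200 bits per second
    print(round(bits_per_second / 1e6, 2))    # ~1.41 Mbps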

The future of theatrical sound would be digital. Indeed, many films would be distributed with their soundtracks on CD.

There remained a problem, though: a CD could encode two channels. Even four channels wouldn't fit within the data rate CD equipment was capable of, much less six or eight. The film industry would need formats that could encode six or eight channels of audio into either the bandwidth of a two-channel signal or into precious unused space on 35mm film prints.

Many ingenious solutions were developed. A typical 35mm film print today contains three distinct representations of the audio: a two-channel optical signal outside of the sprocket holes (which could encode Dolby Stereo), a continuous 2D barcode between the frame and sprocket holes which carries the SDDS (Sony Dynamic Digital Sound) digital signal, and individual 2D barcodes between the sprocket holes which encode the Dolby Digital signal. Finally, a small pulse pattern at the very edge of the film provides a time code used for synchronization with audio played back from a CD, the DTS system.

But then, a typical 35mm film print today wouldn't exist, as 35mm film distribution has all but disappeared. Almost all modern film is played back entirely digitally from some sort of flexible stream container. You would think, then, that the struggles of encoding multi-channel audio are over. Many media container formats can, after all, contain an arbitrary number of audio channels.

Nothing is ever so simple. Much like a dedicated audio reel adds cost, multiple audio channels inflate file sizes, media cost, and in the era of playback from optical media, could stress the practical read rate. Besides, constraints of the past have a way of sticking around. Every multichannel audio format to find widespread success in the film industry has done so by maintaining backwards compatibility with simple mono and stereo equipment. That continues to be true today: modern multi-channel digital audio formats are still mostly built as extensions of an existing stereo encoding, not as truly new arbitrary-channel formats.

At the same time, the theatrical sound industry has begun a transition away from channel-centric audio formats and towards a more flexible system that is much further removed from the actual playback equipment.

Another trend has emerged since 1980 as well, which you probably already suspected from the multiple formats included in 35mm prints. Dolby's supremacy in multi-channel audio was never as complete as I made it sound, although they did become (and for some time remained) the most popular surround sound solution. They have always had competition, and that's still true today. Just as 35mm prints came with the audio in multiple formats, current digitally distributed films often do as well.

In Part 2, I'll get to the topic I meant to write about today before I got distracted by history: the landscape of audio formats included in digitally distributed films and common video files today, and some of the ways they interact remarkably poorly with computers.

Postscript: Film dweebs will of course wonder where George Lucas is in this story. His work on the Star Wars trilogy led to the creation of THX, a company that will long be remembered for its distinctive audio identity. The odd thing is that THX was never exactly a technology company, although it was closely involved in sound technology developments of the time. THX was essentially a certification agency: THX theaters installed equipment by others (Altec Lansing, for much of the 20th century), and used any of the popular multi-channel audio formats.

To be a THX-certified theater, certain performance requirements had to be met, regardless of the equipment and format in use. THX certification requirements included architectural design standards for theaters, performance specifications for audio equipment, and a specific crossover configuration designed by Lucasfilm.

In 2002, Lucasfilm spun out THX and it essentially became a rental brand, shuffled into the ownership of gamer headphone manufacturer Razer today. THX certification still pops up in some consumer home theater equipment but is no longer part of the theatrical audio industry.

Read part 2 >

[1] Incidentally, Kubrick did not adapt to Dolby Stereo. Despite his early experience with Dolby noise reduction, all of his films would be released in mono except for 2001 (six-channel audio only in the Cinerama release) and Eyes Wide Shut (edited in Dolby Stereo after Kubrick's death).

--------------------------------------------------------------------------------

>>> 2024-01-16 the tacnet tracker

Previously on Deep Space Nine, I wrote that "the mid-2000s were an unsettled time in mobile computing." Today, I want to share a little example. Over the last few weeks, for various personal reasons, I have been doing a lot of reading about embedded operating systems and ISAs for embedded computing. Things like the NXP TriMedia (Harvard architecture!) and pSOS+ (ran on TriMedia!). As tends to happen, I kept coming across references to a device that stuck in my memory: the TacNet Tracker. It prominently features on Wikipedia's list of applications for the popular VxWorks real-time operating system.

It's also an interesting case study in the mid-2000s field of mobile computing, especially within academia (or at least the Department of Energy). You see, "mobile computing" used to be treated as a field of study, a subdiscipline within computer science. Mobile devices imposed practical constraints, and they invited more sophisticated models of communication and synchronization than were used with fixed equipment. I took a class on mobile computing in my undergraduate, although it was already feeling dated at the time.

Today, with the ubiquity of smartphones, "mobile computing" is sort of the normal kind. Perhaps future computer science students will be treated to a slightly rusty elective in "immobile computing." The kinds of strange techniques you use when you aren't constrained by battery capacity. Busy loop to blink the cursor!

Sometime around 2004, Sandia National Laboratories' 6452 started work on the TacNet Tracker. The goal: to develop a portable computer device that could be used to exchange real-time information between individuals in a field environment. A presentation states that an original goal of the project was to use COTS (commercial, off-the-shelf) hardware, but it was found to be infeasible. Considering the state of the mobile computing market in 2004, this isn't surprising. It's not necessarily that there weren't mobile devices available; if anything, the opposite. There were companies popping up with various tablets fairly regularly, and then dropping them two years later. You can find any number of Windows XP tablets, but the government needed something that could be supported long-term. That perhaps explains the "Life-cycle limitations" bullet point the presentation wields against COTS options.

The only products with long-term traction were select phones and PDAs like the iPaq and Axim. Even this market collapsed almost immediately with the release of the iPhone, although Sandia engineers wouldn't have known that would come. Still, the capabilities and expandability of these devices were probably too limited for the Tracker's features. There's a reason all those Windows XP tablets existed. They weighed ten pounds, but they were beefy enough to run the data entry applications that were the major application of commercial mobile computing at the time.

The TacNet Tracker, though, was designed to fit in a pocket and to incorporate geospatial features. Armed with a Tracker, you could see the real-time location of other Tracker users on a map. You could even annotate the map, marking points and lines, and share these annotations with others. This is all very mundane today! At the time, though, it was an obvious and yet fairly complex application for a mobile device.

The first question, of course, is of architecture. The Tracker was built around the XScale PXA270 SoC. XScale, remember, was Intel's marketing name for their ARMv5 chips manufactured during the first half of the '00s. ARM was far less common back then, but was already emerging as a leader in power-efficient devices. The PXA270 was an early processor to feature speed-stepping, decreasing its clock speed when under low load to conserve power.

The PXA270 was attached to 64MB of SDRAM and 32MB of flash. It supported more storage on CompactFlash, had an integrated video adapter, and a set of UARTs that, in the Tracker, would support a serial interface, a GPS receiver, and Bluetooth.

A rechargeable Li-Poly pack allowed the Tracker to operate for "about 4 hours," but the presentation promises 8-12 hours in the future. Battery life was a huge challenge in this era. It probably took about as long to charge as it did to discharge, too. There hadn't been much development in high-rate embedded battery chargers yet.

The next challenge was communication. 802.11 WiFi was achieving popularity by this time, but suffered from a difficult and power-intensive association process even more than it does today. Besides, in mobile applications like those the Tracker was intended for, conventional WiFi's requirement for network infrastructure was impractical. Instead, Sandia turned to Motorola. The Tracker used a PCMCIA WMC6300 Pocket PC MEA modem. MEA stands for "Mesh Enabled Architecture," which seems to have been the period term for something Motorola later rebranded as MOTOMESH.

Marketed primarily for municipal network and public safety applications, MOTOMESH is a vaguely 802.11-adjacent proprietary radio protocol that provides broadband mesh routing. One of the most compelling features of MEA and MOTOMESH is its flexibility: MOTOMESH modems will connect to fixed infrastructure nodes under central management, but they can also connect directly to each other, forming ad-hoc networks between adjacent devices. 802.11 itself was conceptually capable of the same, but in practice, the higher-level software to support this kind of use never really emerged. Motorola offered a complete software suite for MOTOMESH, though, and for no less than Windows CE.

Yes, it really enforces the period vibes that the user manual for the WMC6300 modem starts by guiding you through using Microsoft ActiveSync to transfer the software to an HP iPaq. One did not simply put files onto a mobile device at the time; you had to sync them. Microsoft tried to stamp out an ecosystem of proprietary mobile device sync protocols with ActiveSync. Ultimately none of them would really see much use; PDAs were always fairly niche.

Sandia validated performance of the Tracker's MEA modem using an Elektrobit Propsim C2. I saw one of these at auction once (possibly the same one!), and sort of wish I'd bid on it. It's a chunky desktop device with a set of RF ports and the ability to simulate a wide variety of different radio paths between those ports, introducing phenomena like noise, fading, and multipath that will be observed in the real world. The results are impressive: in a simulated hilly environment, Trackers could exchange a 1MB test image in just 13.6 seconds. Remember that next time you are frustrated by LTE; we really take what we have today for granted.

But what of the software? Well, the Tracker ran VxWorks. Actually, that's how I ran into it: it seems that Wind River (developer of VxWorks) published a whitepaper about the Tracker, which made it onto a list of featured applications, which was the source a Wikipedia editor used to flesh out the article. Unfortunately I can't find the original whitepaper, only dead links to it. I'm sure it would have been a fun read.

VxWorks is a real-time operating system mostly used in embedded applications. It supports a variety of architectures, provides a sophisticated process scheduler with options for hard real-time and opportunistic workloads, offers network, peripheral bus, and file system support, and even a POSIX-compliant userspace. It remains very popular for real-time control applications today, although I don't think you'd find many UI-intensive devices like the Tracker running it. A GUI framework is actually a fairly new feature.

The main application for the Tracker was a map, with real-time location and annotation features. It seems that a virtual whiteboard and instant messaging application were also developed. A charmingly cyberpunk Bluetooth wrist-mounted display was pondered, although I don't think it was actually made.

But what was it actually for?

Well, federal R&D laboratories have a tendency to start a project for one application and then try to shop it around to others, so the materials Sandia published present a somewhat mixed message. A conference presentation suggests it could be used to monitor the health of soldiers in-theater (an extremely frequent justification for grants in mobile computing research!), for situational awareness among security or rescue forces, or for remote control of weapons systems.

I think a hint comes, though, from the only concrete US government application I can find documented: in 2008, Sandia delivered the TacNet Tracker system to the DoE Office of Secure Transportation (OST). OST is responsible for the over-road transportation of nuclear weapons and nuclear materials in the United States. Put simply, they operate a fleet of armored trucks and accompanying security escorts. There is a fairly long history, back to at least the '70s, of Sandia developing advanced radio communications systems for use by OST convoys. Many of these radio systems seemed ahead of their time or at least state of the art, but they often failed to gain much traction outside of DoE. Perhaps this relates to DoE culture, perhaps to the extent to which private contractors have captured military purchasing.

Consider, for example, that Sandia developed a fairly sophisticated digital HF system for communication between OST convoys and control centers. It seemed rather more advanced than the military's ALE solution, but a decade or so later OST dropped it and went to using ALE like everyone else (likely for interoperability with the large HF ALE networks operated by the FBI and CBP for domestic security use, although at some point the DoE itself also procured its own ALE network). A whole little branch of digital HF technology that just sort of fizzled out in the nuclear weapons complex. There's a lot of things like that; it's what you get when you put an enormous R&D capability into a particularly insular and secretive part of the executive branch.

Sandia clearly hoped to find other applications for the system. A 2008 Sandia physical security manual for nuclear installations recommends that security forces consider the TacNet Tracker as a situational awareness solution. It was pitched for several military applications. It's a little hard to tell because the name "TacNet" is a little too obvious, but it doesn't seem that the Sandia device ever gained traction in the military.

As it does with many technical developments that don't go very far, Sandia licensed the technology out. A company called Homeland Integrated Security Systems (HISS) bought it, a very typical name for a company that sells licensed government technology. HISS partnered with a UK-based company called Arcom to manufacture the TacNet Tracker as a commercial product, and marketed it to everyone from the military to search and rescue teams.

HISS must have found that the most popular application of the Tracker was asset tracking. It makes sense: the Tracker device itself lacked a display, under the assumption that it would be in a dock or used with an accessory body-worn display. By the late 2000s, HISS had rebranded the TacNet Tracker as the CyberTracker, and re-engineered it around a Motorola iDEN board. I doubt they actually did much engineering on this product; it seems to have been pretty much an off-the-shelf Motorola iDEN radio that HISS just integrated into their tracking platform. It was advertised as a deterrent to automotive theft and a way to track hijacked school buses in real time---the Chowchilla kidnapping was mentioned.

And that's the curve of millennial mobile computing: a cutting-edge R&D project around special-purpose national security requirements, pitched as a general purpose tactical device, licensed to a private partner, turned into yet another commodity anti-theft tracker. Like if LoJack had started out for nuclear weapons. Just a little story about telecommunications history.

Sandia applied for a patent on the Tracker in 2009, so it's probably still in force (ask a patent attorney). HISS went through a couple of restructurings but, as far as I can tell, no longer exists. The same goes for Arcom; a company by the same name that makes cable TV diagnostic equipment seems to be unrelated. Like the OLPC again, all that is left of the Tracker is a surprising number of used units for sale. I'm not sure who ever used the commercial version, but they sure turn up on eBay. I bought one, of course. It'll make a good paperweight.

--------------------------------------------------------------------------------

>>> 2024-01-06 usb on the go

USB, the Universal Serial Bus, was first released in 1996. It did not achieve widespread adoption until some years later; for most of the '90s RS-232-ish serial and its awkward sibling the parallel port were the norm for external peripherals. It's sort of surprising that USB didn't take off faster, considering the significant advantages it had over conventional serial. Most significantly, USB was self-configuring: when you plugged a device into a host, a negotiation was performed to detect a configuration supported by both ends. No more decoding labels like 9600 8N1 and then trying both flow control modes!

There are some significant architectural differences between USB and conventional serial that come out of autoconfiguration. Serial ports had no real sense of which end was which. Terms like DTE and DCE were sometimes used, but they were a holdover from the far more prescriptive genuine RS-232 standard (which PCs and most peripherals did not follow) and often inconsistently applied by manufacturers. All that really mattered to a serial connection is that one device's TX pin went to the other device's RX pin, and vice versa. The real differentiation between DCE and DTE was the placement of these pins: in principle, a computer would have them one way around, and a peripheral the other way around. This meant that a straight-through cable would result in a crossed-over configuration, as expected.

In practice, plenty of peripherals used the same DE-9 wiring convention as PCs, and sometimes you wanted to connect two PCs to each other. Some peripherals used 8p8c modular jacks, some peripherals used real RS-232 connectors, and some peripherals used monstrosities that could only have emerged from the nightmares of their creators. The TX pin often ended up connected to the TX pin and vice versa. This did not work. The solution, as we so often see in networking, was a special cable that crossed over the TX and RX wires within the cable (or adapter). For historical reasons this was referred to as a null modem cable.

One of the other things that was not well standardized with serial connections was the gender of the connectors. Even when both ends featured the PC-standard DE-9, there was some inconsistency over the gender of the connectors on the devices and on the cable. Most people who interact with serial with any regularity probably have a small assortment of "gender changers" and null-modem shims in their junk drawer. Sometimes you can figure out the correct configuration from device manuals (the best manuals provide a full pinout), but often you end up guessing, stringing together adapters until the genders fit and then trying with and without a null modem adapter.

You will notice that we rarely go through this exercise today. For that we can thank USB's very prescriptive standards for connectors on devices and cables. The USB standard specifies three basic connectors, A, B, and C. There are variants of some connectors, mostly for size (mini-B, micro-B, even a less commonly used mini-A and micro-A). For the moment, we will ignore C, which came along later and massively complicated the situation. Until 2014, there was only A and B. Hosts had A, and devices had B.

Yes, USB fundamentally employs a host-device architecture. When you connect two things with USB, one is the host, and the other is the device. This differentiation is important, not just for the cable, but for the protocol itself. USB prior to 3, for example, does not feature interrupts. The host must poll the device for new data. The host also has responsibility for enumeration of devices to facilitate autoconfiguration, and for flow control throughout a tree of USB devices.
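
To make the host-driven model concrete, here is a rough sketch using the pyusb library (it needs a libusb backend installed). The vendor ID, product ID, and endpoint address are placeholders rather than any particular device; the thing to notice is that the device never pushes data, and the host asks for it over and over.

    import usb.core

    # Placeholder IDs: substitute the VID/PID of a real device.
    dev = usb.core.find(idVendor=0x1234, idProduct=0x5678)
    if dev is None:
        raise SystemExit("device not found")

    dev.set_configuration()

    while True:
        try:
            # Poll IN endpoint 0x81 for up to 64 bytes. Even "interrupt"
            # endpoints work this way at the wire level: the host polls on
            # a schedule and the device answers, or NAKs if it has nothing.
            data = dev.read(0x81, 64, timeout=100)
            print(bytes(data))
        except usb.core.USBError:
            pass  # most likely a timeout: nothing to report this interval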

This architecture makes perfect sense for USB's original 1990s use-case of connecting peripherals (like mice) to hosts (like PCs). In fact, it worked so well that once USB1.1 addressed some key limitations it became completely ubiquitous. Microsoft used the term "legacy-free PC" to identify a new generation of PCs at the very end of the '90s and early '00s. While there were multiple criteria for the designation, the most visible to users was the elimination of multiple traditional ports (like the game port! remember those!) in favor of USB.

Times change, and so do interconnects. The consumer electronics industry made leaps and bounds during the '00s and "peripheral" devices became increasingly sophisticated. The introduction of portables running sophisticated operating systems pushed the host-device model to a breaking point. It is, of course, tempting to talk about this revolution in the context of the iPhone. I never had an iPhone though, so the history of the iDevice doesn't have quite the romance to me that it has to so many in this space [1]. Instead, let's talk about Nokia. If there is a Windows XP to Apple's iPhone, it's probably Nokia. They tried so hard, and got so far, but [...].

The Nokia 770 Internet Tablet was not by any means the first tablet computer, but it was definitely a notable early example. Introduced in 2005, it premiered the Linux-based Maemo operating system beloved by Nokia fans until iOS and Android killed it off in the 2010s. The N770 was one of the first devices to fall into a new niche: with a 4" touchscreen and OMAP/ARM SoC, it wasn't exactly a "computer" in the consumer sense. It was more like a peripheral, something that you would connect to your computer in order to load it up with your favorite MP3s. But it also ran a complete general-purpose operating system. The software was perfectly capable of using peripherals itself, and MP3s were big when you were storing them on MMC. Shouldn't you be able to connect your N770 to a USB storage device and nominate even more MP3s as favorites?

Obviously Linux had mainline USB mass storage support in 2005, and by extension Maemo did. The problem was USB itself. The most common use case for USB on the N770 was as a peripheral, and so it featured a type-B device connector. It was not permitted to act as a host. In fact, every PDA/tablet/smartphone type device with sophisticated enough software to support USB peripherals would encounter the exact same problem. Fortunately, it was addressed by a supplement to the USB 2.0 specification released in 2001.

The N770 did not follow the supplement. That makes it fun to talk about, both because it is weird and because it is an illustrative example of the problems that need to be solved.

The N770 featured an unusual USB transceiver on its SoC, seemingly unique to Nokia and called "Tahvo." The Tahvo controller exposed an interface (via sysfs in the Linux driver) that allowed the system to toggle it between device mode (its normal configuration) and host mode. This worked well enough with Maemo's user interface, but host mode had a major limitation. The N770 wouldn't provide power on the USB port; it didn't have the necessary electrical components. Instead, a special adapter cable was needed to provide 5v power from an alternate source.

So there are several challenges for a USB device that needs to operate as either host or device: the connector it presents, the supply of bus power, and the host or device role its controller and software assume.

Note that "special cable" involved in host mode for the N770. You might think this was the ugliest part of the situation. You're not wrong, but it's also not really the hack. For many years to follow, the proper solution to this problem would also involve a special cable.

As I mentioned, since 2001 there has been a supplement USB specification called USB On-The-Go, commonly referred to as USB OTG, perhaps because On-The-Go is an insufferably early '00s name. It reminds me of, okay, here goes a full-on anecdote.

Anecdote

I attended an alternative middle school in Portland that is today known as the Sunnyside Environmental School. I could tell any number of stories about the bizarre goings-on at this school that you would scarcely believe, but it also had its merits. One of them, which I think actually came from the broader school district, was a program in which eighth graders were encouraged to "job shadow" someone in a profession they were interested in pursuing. By good fortune, a friend's father was an electrical engineer employed at Intel's Jones Farm campus, and agreed to be my host. I had actually been to Jones Farm a number of times on account of various extracurricular programs (in that era, essentially every STEM program in the Pacific Northwest operated on the largess of either Intel or Boeing, if not both). This was different, though: this guy had a row of engraved brass patent awards lining his cubicle wall and showed me through labs where technicians tinkered with prototype hardware. Foreshadowing a concerning later trend in my career, though, the part that stuck with me most was the meetings. I attended meetings, including one where this engineering team was reporting to leadership on the status of a few of their projects.

I am no doubt primed to make this comparison by the mediocre movie I watched last night, but I have to describe the experience as Wonka-esque. These EEs demonstrated a series of magical hardware prototypes to some partners from another company. Each was more impressive than the last. It felt like I was seeing the future in the making.

My host demonstrated his pet project, a bar that contained an array of microphones and used DSP methods to compare the audio from each and directionalize the source of sounds. This could be used for a sophisticated form of noise canceling in which sound coming from an off-axis direction could be subtracted, leaving only the voice of the speaker. If this sounds sort of unremarkable, that is perhaps a reflection of its success, as the same basic concept is now implemented in just about every laptop on the market. Back then, when the N770 was a new release, it was challenging to make work and my host explained that the software behind it usually crashed before he finished the demo, and sometimes it turned the output into a high pitched whine and he hadn't quite figured out why yet. I suppose that meeting was lucky.

But that's an aside. A long presentation, and then a debate with skeptical execs, revolved around a new generation of ultramobile devices that Intel envisioned. One, which I got to handle a prototype of, would eventually become the Intel Core Medical Tablet. It featured a chunky, colorful design that is clearly of the same vintage as the OLPC. It was durable enough to stand on, which a lab technician demonstrated with delight (my host, I suspect tired of this feat, picked up some sort of lab interface and dryly remarked that he could probably stand on it too). The Core Medical Tablet shared another trait with the OLPC: the kind of failure that leaves no impact on the world but a big footprint at recyclers. Years later, as an intern at Free Geek, I would come across at least a dozen.

Another facet of this program, though, was the Mobile Metro. The Metro was a new category of subnotebook, not just small but thin. A period article compares its 18mm profile to the somewhat thinner Motorola Razr, another product with an outsize representation in the Free Geek Thrift Store. Intel staff were confident that it would appeal to a new mobile workforce, road warriors working from cars and coffee shops. The Mobile Metro featured SideShow, a small e-ink display (in fact, I believe, a full Windows Mobile system) on the outside of a case that could show notifications and media controls.

The Mobile Metro was developed around the same time as the Classmate PC, but seems to have been even less successful. It was still in the conceptual stages when I heard of it. It was announced, to great fanfare, in 2007. I don't think it ever went into production. It had WiMax. It had inductive charging. It only had one USB port. It was, in retrospect, prescient in many ways both good and bad.

The point of this anecdote, besides digging up middle school memories while attempting to keep others well suppressed, is that the mid-2000s were an unsettled time in mobile computing. The technology was starting to enable practical compact devices, but manufacturers weren't really sure how people would use them. Some innovations were hits (thin form factors). Some were absolute misses (SideShow). Some we got stuck with (not enough USB ports).

End of anecdote

As far as I can tell, USB OTG wasn't common on devices until it started to appear on Android smartphones in the early 2010s. Android gained OTG support in 3.1 (codenamed Honeycomb, 2011), and it quickly appeared in higher-end devices. Now OTG support seems nearly universal for Android devices; I'm sure there are lower-end products where it doesn't work but I haven't yet encountered one. Android OTG support is even admirably complete. If you have an Android phone, amuse yourself sometime by plugging a hub into it, and then a keyboard and mouse. Android support for desktop input peripherals is actually very good and operating mobile apps with an MX Pro mouse is an entertaining and somewhat surreal experience. On the second smartphone I owned, I hazily think a Samsung in 2012-2013, I used to take notes with a USB keyboard.

iOS doesn't seem to have sprouted user-exposed OTG support until the iPhone 12, although it seems like earlier versions probably had hardware support that wasn't exposed by the OS. I could be wrong about this; I can't find a straightforward answer in Apple documentation. The Apple Community Forums seem to be... I'll just say "below average." iPads seem to have gotten OTG support a lot earlier than the iPhone despite using the same connector, making the situation rather confusing. This comports with my general understanding of iOS, though, from working with bluetooth devices: Apple is very conservative about hardware peripheral support in iOS, and so it's typical for iOS to be well behind Android in this regard for purely software reasons. Ask me about how this has impacted the Point of Sale market. It's not positive.

But how does OTG work? Remember, USB specifies that hosts must have an A connector, and devices a B connector. Most smartphones, besides Apple products and before USB-C, sported a micro-B connector as expected. How OTG?

The OTG specification decouples, to some extent, the roles of A/B connector, power supply, and host/device role. A device with USB OTG support should feature a type AB socket that accommodates either an A or a B plug. Type AB is only defined for the mini and micro sizes, typically used on portable devices. The A or B connectors are differentiated not only by the shape of their shells (preventing a type-A plug being inserted into a B-only socket), but also electrically. The observant among you may have noticed that mini and micro B sockets and plugs feature five pins, while USB2.0 only uses four. This is the purpose of the fifth pin: differentiation of type A and B plugs.

In a mini or micro type B plug, the fifth pin is floating (disconnected). In a mini or micro type A plug, it is connected to the ground pin. When you insert a plug into a type AB socket, the controller checks for connectivity between the fifth pin (called the ID pin) and the ground. If connectivity is present, the controller knows that it must act as an OTG A-device---it is on the "A" end of the connection. If there is no continuity, the more common case, the controller will act as an OTG B-device, a plain old USB device [2].

The OTG A-device is always responsible for supplying 5v power (see exception in [2]). By default, the A-device also acts as the host. This provides a basically complete solution for the most common OTG use-case: connecting a peripheral like a flash drive to your phone. The connector you plug into your phone identifies itself as an A connector via the ID pin, and your phone thus knows that it must supply power and act as host. The flash drive doesn't need to know anything about this, it has a B connection and acts as a device as usual. This simple case only became confusing when you consider a few flash drives sold specifically for use with phones that had a micro-A connector right on them. These were weird and I don't like them.

In the more common situation, though, you would use a dongle: a special cable. A typical OTG cable, which were actually included in the package with enough Android phones of the era that I have a couple in a drawer without having ever purchased one, provides a micro-A connector on one end and a full-size A socket on the other. With this adapter, you can plug any USB device into your phone with a standard USB cable.

Here's an odd case, though. What if you plug two OTG devices into each other? USB has always had this sort of odd edge-case. Some of you may remember "USB link cables," which don't really have a technical name but tend to get called Laplink cables after a popular vendor. Best Buy and Circuit City used to be lousy with these things, mostly marketed to people who had bought a new computer and wanted to transfer their files. A special USB cable had two A connectors, which might create the appearance that it connected two hosts, but in fact the cable (usually a chunky bit in the middle) acted as two devices to connect to two different hosts. The details of how these actually worked varied from product to product, but the short version is "it was proprietary." Most of them didn't work unless you found the software that came with them, but there are some pseudo-standard controllers supported out of the box by Windows or Linux. I would strongly suggest that you protect your mental state by not trying to use one.

OTG set out to address this problem more completely. First, it's important to understand that this in no way poses an exception to the rule that a USB connection has an A end and a B end. A USB cable you use to connect two phones together might, at first glance, appear to be B-B. But, if you inspect more closely, you will find that one end is mini or micro A, and the other is mini or micro B. You may have to look closely; the micro connectors in particular have similar shells!

If you are anything like me, you are most likely to have encountered such a cable in the box with a TI-84+. These calculators had a type AB connector and came with a micro A->B cable to link two units. You might think, by extension, that the TI-84+ used USB OTG. The answer is kind of! The USB implementation on the TI-84+ and TI-84+SE was very weird, and the OS didn't support anything other than TIConnect. Eventually the TI-84+CE introduced a much more standard USB controller, although I think support for any OTG peripheral still has to be hacked on to the OS. TI has always been at the forefront of calculator networking, and it has always been very weird and rarely used.

This solves part of the problem: it is clear, when you connect two phones, which should supply power and which should handle enumeration. The A-device is, by default, in charge. There are problems where this interacts with common USB devices types, though. One of the most common uses of USB with phones is mass storage (and its evil twin MTP). USB mass storage has a very strong sense of host and device at a logical level; the host can browse the devices files. When connecting two smartphones, though, you might want to browse from either end. Another common problem case here is that of the printer, or at least it would be if printer USB host support was ever usable. If you plug a printer into a phone, you might want to browse the phone as mass storage on the printer. Or you might want to use conventional USB printing to print a document from the phone's interface. In fact you almost certainly want to do the latter, because even with Android's extremely half-assed print spooler it's probably a lot more usable than the file browser your printer vendor managed to offer on its 2" resistive touchscreen.

OTG adds Host Negotiation Protocol, or HNP, to help in this situation. HNP allows the devices on a USB OTG connection to swap roles. While the A-device will always be the host when first connected, HNP can reverse the logical roles on demand.

This all sounds great, so where does it fall apart? Well, the usual places. Android devices often went a little off the script with their OTG implementations. First, the specification did not require devices to be capable of powering the bus, and phones couldn't. Fortunately that seems to have been a pretty short-lived problem, only common in the first couple of generations of OTG devices. This wasn't the only limitation of OTG implementations; I don't have a good sense of scale but I've seen multiple reports that many OTG devices in the wild didn't actually support HNP; they just determined a role when connected based on the ID pin and could not change after that point.

Finally, and more insidiously, the whole thing about OTG devices having an AB connector didn't go over as well as intended. We actually must admire TI for their rare dedication to standards compliance. A lot of Android phones with OTG support had a micro-B connector only, and as a result a lot of OTG adapters use a micro-B connector.

There's a reason this was common; since A and B plugs are electrically differentiable regardless of the shape of the shell, the shell shape arguably doesn't matter. You could be a heavy OTG user with such a noncompliant phone and adapter and never notice. The problem only emerges when you get a (rare) standards-compliant OTG adapter or, probably more common, OTG A-B cable. Despite being electrically compatible, the connector won't fit into your phone. Of course this behavior feeds itself; as soon as devices with an improper B port were common, manufacturers of cables were greatly discouraged from using the correct A connector.

The downside, conceptually, is that you could plug an OTG A connector (with a B-shaped shell) into a device with no OTG support. In theory this could cause problems, in practice the problems don't seem to have been common since both devices would think they were B devices and (if standards compliant) not provide power. Essentially these improper OTG adapters create a B-B cable. It's a similar problem to an A-A cable but, in practice, less severe. Like an extension cord with two female ends. Home Depot might even help you make one of those.

While trying to figure out which iPhones had OTG support, I ran across an Apple Community thread where someone helpfully replied "I haven't heard of OTG in over a decade." Well, it's not a very helpful reply, but it's not exactly wrong either. No doubt the dearth of information on iOS OTG is in part because no one ever really cared. Much like the HDMI-over-USB support that a generation of Android phones included, OTG was an obscure feature. I'm not sure I have ever, even once, seen a human being other than myself make use of OTG.

Besides, it was completely buried by USB-C.

The thing is that OTG is not gone at all; in fact, it's probably more popular than ever before. There seems to be some confusion about how OTG has evolved with USB specifications. I came across more than one article saying that USB 3.1 Dual Role replaced OTG. This assertion is... confusing. It's not incorrect, but there's a good chance of it leading you in the wrong direction.

Much of the confusion comes from the fact that Dual-Role doesn't mean anything that specific. The term Dual-Role and various resulting acronyms like DRD and DRP have been applied to multiple concepts over the life of USB. Some vendors say "static dual role" to refer to devices that can be configured as either host or device (like the N770). Some vendors use dual role to identify chipsets that detect role based on the ID pin but are not actually capable of OTG protocols like HNP. Some articles use dual role to identify chipsets with OTG support. Subjectively, I think the intent of the changes in USB 3.1 was mostly to formally adopt the "dual role" term that was already the norm in informal use---and hopefully standardize the meaning.

For USB-C connectors, it's more complicated. USB-C cables are symmetric; they do not identify a host or device end in any way. Instead, the USB-C ports use resistance values to indicate their type. When either end indicates that it is only capable of the device role, the situation is simple, behaving basically the same way that OTG did: the host detects that the other end is a device and behaves as the host.

When both ends support the host role, things work differently: the Dual Role feature of USB-C comes into play. The actual implementation is reasonably simple; a dual-role USB-C controller will attempt to set up a connection both ways and go with whichever succeeds. There are some minor complications on top of this, for example, the controller can be configured with a "preference" for host or device role. This means that when you plug your phone into your computer via USB-C, the computer will assume the host role, because although it's capable of either the phone is configured with a preference for the device role. That matches consumer expectations. When both devices are capable of dual roles and neither specifies a preference, the outcome is random. This scenario is interesting but not all that common in practice.

The detection of host or device role by USB-C is based on the CC pins, basically a more flexible version of OTG's ID pin. There's another important difference between the behavior of USB-C and A/B: USB-C interfaces provide no power until they detect, via the CC pins, that the other device expects it. This is an important ingredient to mitigate the problem with A-A cables, that both devices will attempt to power the same bus.
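
Here's a toy model of that behavior, with the timing details, the Try.SRC/Try.SNK preference machinery, and Power Delivery all left out: a port offering the host (source) role presents pull-ups on the CC pins, a port offering the device (sink) role presents pull-downs, and a dual-role port just alternates between the two presentations until the far end shows up as the opposite kind.

    import itertools, random

    def dual_role_port(preference=None):
        # Cycle through the roles this port advertises on its CC pins:
        # "host" means presenting pull-ups (Rp), "device" means pull-downs (Rd).
        order = ["host", "device"]
        if preference == "device":
            order = ["device", "host"]
        return itertools.cycle(order)

    def connect(port_a, port_b, max_steps=100):
        # Step both ports until one advertises host while the other advertises
        # device; that pairing sticks and determines the roles.
        for _ in range(max_steps):
            a, b = next(port_a), next(port_b)
            if {a, b} == {"host", "device"}:
                return "A is host" if a == "host" else "B is host"
            # Real ports toggle with slightly different timing, which is what
            # eventually breaks the tie; model that as a coin flip that lets
            # port A skip ahead half a cycle.
            if random.random() < 0.5:
                next(port_a)
        return "no connection"

    laptop = dual_role_port()                      # no strong preference
    phone = dual_role_port(preference="device")    # prefers the device role
    print(connect(laptop, phone))                  # "A is host": the laptop wins

    a, b = dual_role_port(), dual_role_port()
    print(connect(a, b))                           # random: either end may end up host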

The USB-C approach of using CC pins and having dual role controllers attempt one or the other at their preference is, for the most part, a much more elegant approach. There are a couple of oddities. First, in practice cables from C to A or B connectors are extremely common. These cables must provide the appropriate values on the CC pins to allow the USB-C controller to correctly determine its role, both for data and power delivery.

Second, what about role reversal? For type A and B connectors, this is achieved via HNP, but HNP is not supported on USB-C. Application notes from several USB controller vendors explain that, oddly enough, the only way to perform role reversal with USB-C is to implement USB Power Delivery (PD) and use the PD negotiation protocol to change the source of power. In other words, while OTG allows reversing host and device roles independently of the bus power source, USB-C does not. The end supplying power is always the host end. This apparent limitation probably isn't that big of a deal, considering that the role reversal feature of OTG was reportedly seldom implemented.

That's a bit of a look into what happens when you plug two USB hosts into each other. Are you confused? Yeah, I'm a little confused too. The details vary, and they depend more on the capabilities of the individual devices than on the USB version in use. This has been the malaise of USB for a solid decade now, at least: the specification has become so expansive, with so many non-mandatory features, that it's a crapshoot what capabilities any given USB port actually has. The fact that USB-C supports a bevy of alternate modes like Thunderbolt and HDMI only adds further confusion.

I sort of miss when the problem was just inappropriate micro-B connectors. Nonetheless, USB-C dual role support seems ubiquitous in modern smartphones, and that's the only place any of this ever really mattered. Most embedded devices still seem to prefer to just provide two USB ports: a host port and a device port. And no one ever uses the USB host support on their printer. It's absurd, no one ever would. Have you seen what HP thinks is a decent file browser? Good lord.

[1] My first smartphone was the HTC Thunderbolt. No one, not even me, will speak of that thing with nostalgia. It was pretty cool owning one of the first LTE devices on the market, though. There was no contention at all in areas with LTE service and I was getting 75+Mbps mobile tethering in 2011. Then everyone else had LTE too and the good times ended.

[2] There are actually several additional states defined by fixed resistances that tell the controller that it is the A-device but power will be supplied by the bus. These states were intended for Y-cables that allowed you to charge your phone from an external charger while using OTG. In this case neither device supplies power; the external charger does. The details of how this works are quite straightforward but would be confusing to keep adding as an exception, so I'm going to pretend the whole feature doesn't exist.

--------------------------------------------------------------------------------