   _____                   _                  _____            _____       _ 
  |     |___ _____ ___ _ _| |_ ___ ___ ___   |  _  |___ ___   | __  |___ _| |
  |   --| . |     | . | | |  _| -_|  _|_ -|  |     |  _| -_|  | __ -| .'| . |
  |_____|___|_|_|_|  _|___|_| |___|_| |___|  |__|__|_| |___|  |_____|__,|___|
  a newsletter by |_| j. b. crawford               home archive subscribe rss

>>> 2024-03-01 listening in on the neighborhood

Last week, someone leaked a spreadsheet of SoundThinking sensors to Wired. You are probably asking "What is SoundThinking," because the company rebranded last year. They used to be called ShotSpotter, and their outdoor acoustic gunfire detection system still goes by the ShotSpotter name.

ShotSpotter has attracted a lot of press and plenty of criticism for the gunfire detection service they provide to many law enforcement agencies in the US. The system involves installing acoustic sensors throughout a city, which use some sort of signature matching to detect gunfire and then use time of flight to determine the likely source.
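
ShotSpotter's actual algorithm is not public, so take this only as a generic sketch of how time-of-flight location can work. Given sensor positions and arrival times, search a grid for the point where every sensor implies the same moment of emission. The sensor layout, the grid size, and the speed of sound constant are all assumptions made for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, roughly right for air at 20 C (assumed)

def locate(sensors, arrivals, extent=1000, step=10):
    """Brute-force search for the source position on a square grid (meters)."""
    best, best_err = None, float("inf")
    for x in range(0, extent + 1, step):
        for y in range(0, extent + 1, step):
            # Emission time each sensor would imply if the source were at (x, y).
            implied = [t - math.hypot(x - sx, y - sy) / SPEED_OF_SOUND
                       for (sx, sy), t in zip(sensors, arrivals)]
            mean = sum(implied) / len(implied)
            # At the true source, all sensors agree, so the spread is near zero.
            err = sum((t - mean) ** 2 for t in implied)
            if err < best_err:
                best, best_err = (x, y), err
    return best

# Hypothetical layout: four sensors at the corners of a 1 km square.
sensors = [(0, 0), (1000, 0), (0, 1000), (1000, 1000)]
source, fired_at = (300, 700), 5.0
arrivals = [fired_at + math.hypot(source[0] - sx, source[1] - sy) / SPEED_OF_SOUND
            for sx, sy in sensors]

estimate = locate(sensors, arrivals)
```

Note that the search never needs to know when the shot was fired, only the differences between arrival times. That is also why sensor positions matter so much: an error in a sensor's location becomes an error in the answer.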

One of the principal topics of criticism is the immense secrecy with which they operate: ShotSpotter protects information on the location of its sensors as if it were a state secret, and does not disclose them even to the law enforcement agencies that are its customers. This secrecy attracts accusations that ShotSpotter's claims of efficacy cannot be independently validated, and that ShotSpotter is attempting to suppress research into the civil rights impacts of its product.

I have encountered this topic before: the Albuquerque Police Department is a ShotSpotter customer, and during my involvement in police oversight it was evasive in response to any questions about the system and resisted efforts to subject its surveillance technology purchases to more outside scrutiny. Many assumed that ShotSpotter coverage was concentrated in disadvantaged parts of the city, an unsurprising outcome but one that could contribute to systemic overpolicing. APD would not comment.

I have always assumed that it would not really be that difficult to find the ShotSpotter sensors, at least if you have my inclination to examine telephone poles. While the Wired article focuses heavily on sensors installed on buildings, it seems likely that in environments like Albuquerque with city-operated lighting and a single electrical utility, they would be installed on street lights. That's where you find most of the technology the city fields.

The thing is, I didn't really know what the sensors looked like. I've seen pictures, but I know they were quite old, and I assumed the design had gotten more compact over time. Indeed it has.

ShotSpotter sensor on light pole

An interesting thing about the Wired article is that it contains a map, but the MapBox embed produced with Flourish Studio had a surprisingly restrictive maximum zoom level: you could not zoom in very far. That made it more or less impossible to interpret the locations of the sensors exactly. I'm concerned that this was an intentional decision by Wired to partially obfuscate the data, because it is not an effective one. It was a simple matter to find the JSON payload the map viewer was using for the PoI overlay and then convert it to KML.

I worried that the underlying data would be obscured; it was not. The coordinates are exact. So, I took the opportunity to enjoy a nice day and went on an expedition.
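
The JSON-to-KML conversion really is only a few lines. Here is a rough sketch, assuming the payload is a JSON array of objects with `lat` and `lon` keys. Those field names are placeholders, since the real payload layout is not described here, and the coordinates below are made up.

```python
import json
from xml.sax.saxutils import escape

def points_to_kml(points):
    """Build a minimal KML document from dicts with 'lat' and 'lon' keys."""
    placemarks = []
    for i, p in enumerate(points):
        name = escape(str(p.get("name", f"sensor {i}")))
        # KML writes coordinates as longitude,latitude (in that order).
        placemarks.append(
            f"<Placemark><name>{name}</name>"
            f"<Point><coordinates>{p['lon']},{p['lat']}</coordinates></Point>"
            "</Placemark>"
        )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>\n'
        + "\n".join(placemarks)
        + "\n</Document></kml>\n"
    )

# Hypothetical payload, standing in for whatever the map viewer fetched.
payload = json.loads('[{"lat": 35.08, "lon": -106.65}]')
kml = points_to_kml(payload)
```

The one real trap is the coordinate order: KML wants longitude first, which is the opposite of how most people say it out loud.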

ShotSpotter sensor in a neighborhood

The sensors are pretty much what I imagined, innocuous beige boxes clamped to street light arms. There are a number of these boxes to be found in modern cities. Some are smart meter nodes, some are base stations for municipal data networks, others collect environmental data. Some are the police, listening in on your activities.

This is not as hypothetical a concern as it might sound. Conversations recorded by ShotSpotter sensors have twice been introduced as evidence in criminal trials. In one case the court allowed it, in another the court did not. The possibility clearly exists, and depending on interpretation of state law, it may be permissible for ShotSpotter to record conversations on the street for future use as evidence.

ShotSpotter sensor in a neighborhood

This ought to give us pause, as should the fact that ShotSpotter has been compellingly demonstrated to manipulate their "interpretation" of evidence to fit a prosecutor's narrative---even when ShotSpotter's original analysis contradicted it.

But pervasive surveillance of urban areas and troubling use of that evidence is nothing new. Albuquerque already has an expansive police-operated video surveillance network connected to the Real-Time Crime Center. APD has long used portable automated license plate readers (ALPR) under cover of "your speed is" trailers, and more recently has installed permanent ALPR at major intersections in the city.

All of this occurs with virtually no public oversight or even public awareness.

ShotSpotter sensor in a neighborhood

What most surprised me is the density of ShotSpotter sensors. In my head, I assumed they were fairly sparse. A Chicago report on the system says there are 20 to 25 per square mile. Density in Albuquerque is lower, probably reflecting the wide streets and relative lack of high rises. Still, there are a lot of them. 721 in Albuquerque, a city of about 190 square miles. At present, only parts of the city are covered.
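
To put the Albuquerque figure next to the Chicago one, here is the citywide average worked out from the two numbers above. It is only an average over the whole city; since only parts of the city are covered, density inside the covered areas must be higher than this.

```python
sensors = 721          # count from the leaked data, as stated above
city_area_sq_mi = 190  # approximate city area, as stated above

citywide = sensors / city_area_sq_mi
print(f"{citywide:.1f} sensors per square mile, averaged over the whole city")
```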

Map of ShotSpotter sensors in Albuquerque

And those coverage decisions are interesting. The valley (what of it is in city limits) is well covered, as is the west side outside of Coors/Old Coors. The International District, of course, is dense with sensors, as is inner NE, bounded roughly by the freeways to Louisiana and Montgomery.

Conspicuously empty is the rest of the northeast, from UNM's north campus area to the foothills. Indian School Road makes almost its entire east side length without any sensors.

ShotSpotter sensor in a neighborhood

The reader can probably infer how this coverage pattern relates to race and class in Albuquerque. It's not perfect, but the distance from your house to a ShotSpotter sensor correlates fairly well with your household income. The wealthier you are, the less surveilled you are.

The "pocket of poverty" south of Downtown where I live, the historically Spanish Barelas and historically Black South Broadway, are predictably well covered. All of the photos here were taken within a mile, and I did not come even close to visiting all of the sensors. Within a one mile radius of the center of Barelas, there are 31 sensors.
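
A count like that is easy to reproduce from a list of coordinates. This is a minimal sketch using the haversine great-circle distance; the center point and sensor coordinates are invented for the example and are not taken from the leaked data.

```python
import math

EARTH_RADIUS_MI = 3958.8  # mean Earth radius in miles

def haversine_mi(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))

def count_within(center, sensors, radius_mi=1.0):
    """Count (lat, lon) tuples that fall within radius_mi of center."""
    return sum(1 for lat, lon in sensors
               if haversine_mi(center[0], center[1], lat, lon) <= radius_mi)

# Made-up coordinates for illustration only.
center = (35.07, -106.66)
sensors = [(35.07, -106.66), (35.075, -106.655), (35.20, -106.50)]
nearby = count_within(center, sensors)

# 31 sensors inside a one mile radius is a bit under 10 per square mile.
density = 31 / (math.pi * 1.0 ** 2)
```

By that arithmetic, the area around Barelas is still well short of the 20 to 25 per square mile reported for Chicago.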

ShotSpotter sensor in a neighborhood

Some are conspicuous. Washington Middle School, where 13-year-old Bennie Hargrove was shot by another student, has a sensor mounted at its front entrance. Another sensor is in the cul de sac behind the Coors and I-40 Walmart, where a body was found in a burned-out car. Perhaps the deep gulch of the freeway poses a coverage challenge: there are two more less than a thousand feet away.

In the Downtown Core, buildings were preferred to light poles. The PNM building, the Anasazi condos, and the Banque building are all feeding data into the city's failing scheme of federal prosecutions for downtown gun crime.

The closest sensor to the wealthy Heights is at Embudo Canyon, and coverage stops north of Central in the affluent Nob Hill residential area. Old Town is almost completely uncovered, as is the isolationist Four Hills.

Highland High School has a sensor on its swimming pool building. The data says there are two at the intersection of Gibson and Chavez, probably an error; it also says there are two sensors on "Null Island." Don't worry about coverage in the south campus area, though. There are 16 in the area bounded by I-25 to Yale and Gibson to Coal.
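
Both kinds of oddity are easy to flag mechanically. A small sketch, assuming the data has been reduced to (lat, lon) tuples: drop anything at exactly 0, 0 ("Null Island") and report coordinates that appear more than once. The rounding precision and the sample points are my own choices, not anything from the leaked file.

```python
from collections import Counter

def clean(points, places=5):
    """Drop (0, 0) rows and list coordinates that repeat after rounding."""
    real = [(lat, lon) for lat, lon in points if (lat, lon) != (0.0, 0.0)]
    seen = Counter((round(lat, places), round(lon, places)) for lat, lon in real)
    duplicates = [coord for coord, n in seen.items() if n > 1]
    return real, duplicates

# Invented sample: one repeated location and two Null Island rows.
points = [(35.05, -106.62), (0.0, 0.0), (35.05, -106.62), (0, 0), (35.06, -106.6)]
real, duplicates = clean(points)
```

Whether a duplicate is a data entry error or genuinely two sensors on one pole is not something the coordinates alone can tell you.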

Detail of a ShotSpotter sensor

KOB quotes APD PIO Gallegos saying "We don't know, technically, where all the sensors are." Well, I suppose they do now; the leak has been widely reported on. APD received about 14,000 ShotSpotter reports last year. The accuracy of these reports, in terms of their correctly identifying gunfire, is contested. SoundThinking claims impressive statistics, but has actively resisted independent evaluation. A Chicago report found that only 11.3% of ShotSpotter reports could be confirmed as gunfire. APD, for its part, reports a few hundred suspects or victims identified as a result of ShotSpotter reports.

APD has used a local firearms training business, Calibers, to fire blanks around the city to verify detection. They say the system performed well.

But, if asked, they provide a form letter written by ShotSpotter. Their contract prohibits the disclosure of any actual data.


>>> 2024-02-25 a history of the tty

It's one of those anachronisms that is deeply embedded in modern technology. From cloud operator servers to embedded controllers in appliances, there must be uncountable devices that think they are connected to a TTY.

I will omit the many interesting details of the Linux terminal infrastructure here, as it could easily fill its own article. But most Linux users are at least peripherally aware that the kernel tends to identify both serial devices and terminals as TTYs, assigning them filesystem names in the form of /dev/tty*. Probably a lot of those people remember that this stands for teletype or perhaps teletypewriter, although in practice the term teleprinter is more common.
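
You can ask the kernel about this directly. A minimal sketch for Unix-like systems, using the standard library's `os.isatty` and `os.ttyname`; the helper name `tty_name` is mine.

```python
import os

def tty_name(fd):
    """Return the device path for fd if the kernel considers it a TTY, else None."""
    if os.isatty(fd):
        return os.ttyname(fd)
    return None

# A pipe is not a TTY, so this demonstrates the negative case.
read_end, write_end = os.pipe()
pipe_result = tty_name(read_end)
os.close(read_end)
os.close(write_end)
```

Calling `tty_name(0)` from an interactive shell will typically give you a path under /dev, often a pseudo-terminal rather than anything resembling a serial port.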

Indeed, from about the 1950s (the genesis of electronic computers) to the 1970s (the rise of video display terminals/VDTs), teleprinters were the most common form of interactive human-machine interface. The "interactive" distinction here is important; early computers were built primarily around noninteractive input and output, often using punched paper tape. Interactive operation was a more advanced form of computing, one that took almost until the widespread use of VDTs to mature. Look into the computers of the 1960s especially, the early days of interactive operation, and you will be amazed at how bizarre and unfriendly the command interface is. It wasn't really intended for people to use; it was for the Computer Operator (who had attended a lengthy training course on the topic) to troubleshoot problems in the noninteractive workload.

But interactive computing is yet another topic I will one day take on. Right now, I want to talk about the heritage of these input/output mechanisms. Why is it that punched paper tape and the teleprinter were the most obvious way to interact with the first electronic computers? As you might suspect, the arrangement was one of convenience. Paper tape punches and readers were already being manufactured, as were teleprinters. They were both used for communications.

Most people who hear about the telegraph think of Morse code keys and rhythmic beeping. Indeed, Samuel Morse is an important figure in the history of telegraphy. The form of "morse code" that we tend to imagine, though, a continuous wave "beep," is mostly an artifact of radio. For telegraphs, no carrier wave or radio modulation was required. You can transmit a message simply by interrupting the current on a wire.

This idea is rather simple to conceive and even to implement, so it's no surprise that telegraphy has a long history. By the end of the 18th century inventors in Europe and Great Britain were devising simple electrical telegraphs. These early telegraphs had limited ranges and even more limited speeds, though, a result mostly of the lack of a good way to indicate to the operator whether or not a current was present. It is an intriguing aspect of technical history that the first decades of experimentation with electricity were done with only the clumsiest means of measuring or even detecting it.

In 1820, three physicists or inventors (these were vague titles at the time) almost simultaneously worked out that electrical current induced a magnetic field. They invented various ways of demonstrating the effect, usually by deflecting a magnetic needle. This innovation quickly led to the "electromagnetic telegraph," in which a telegrapher operates a key to switch current, which causes a needle or flag to deflect at the other end of the circuit. This was tremendously simpler than previous means of indicating current and was applied almost immediately to build the first practical telegraphs. During the 1830s, the invention of the relay allowed telegraph signals to be repeated or amplified as the potential weakened (the origin of the term "relay"). Edward Davy, one of the inventors of the relay, also invented the telegraph recorder.

From 1830 to 1850, so many people invented so many telegraph systems that it is difficult to succinctly describe how an early practical telegraph worked. There were certain themes: for non-recording systems, a needle was often deflected one way or the other by the presence or absence of current, or perhaps by polarity reversal. Sometimes the receiver would strike a bell or sound a buzzer with each change. In recording systems, a telegraph printer or telegraph recorder embossed a hole or left a small mark on a paper tape that advanced through the device. In the first case, the receiving operator would watch the needle, interpreting messages as they came. In the second case, the operator could examine the paper tape at their leisure, interpreting the message based on the distances between the dots.

Recording systems tended to be used for less time-sensitive operations like passing telegrams between cities, while non-recording telegraphs were used for more real-time applications like railroad dispatch and signaling. Regardless, it is important to understand that the teleprinter is about as old as the telegraph. Many early telegraphs recorded received signals onto paper.

The interpretation of telegraph signals was as varied as the equipment that carried them. Samuel Morse popularized the telegraph in the United States based in part on his alphabetic code, but it was not the first. Gauss famously devised a binary encoding for alphabetic characters a few years earlier, which resembles modern character encodings more than Morse's scheme. In many telegraph applications, though, there was no alphabetic code at all. Railroad signal telegraphs, for example, often used application-specific schemes that encoded types of trains and routes instead of letters.

Morse's telegraph system was very successful in the United States, and in 1861 a Morse telegraph line connected the coasts. It surprises some that a transcontinental telegraph line was completed some fifty years before the transcontinental telephone line. Telegraphy is older, though, because it is simpler. There is no analog signaling involved; simple on/off or polarity signals can be amplified using simple mechanical relays. The tendency to view text as more complex than voice (SMS came after the first cellphones, for one) has more to do with the last 50 years than the 50 years before.

The Morse telegraph system was practical enough to spawn a large industry, but suffered a key limitation: the level of experience required to key and copy Morse quickly and reliably is fairly high. Telegraphers were skilled and, thus, fairly well paid and sometimes in short supply [1]. To drive down the cost of telegraphy, there would need to be more automation.

Many of the earliest telegraph designs had employed parallel signaling. A common scheme was to provide one wire for each letter, and a common return. These were impractical to build over any meaningful distance, and Morse's one-wire design (along with one-wire designs by others) won out for obvious reasons. The idea of parallel signaling stayed around, though, and was reintroduced during the 1840s with a simple form of multiplexing: one "logical channel" for each letter could be combined onto one wire using time division muxing, for example by using a transmitter and receiver with synchronized spinning wheels. Letters would be represented by positions on the wheel, and a pulse sent at the appropriate point in the revolution to cause the teleprinter to produce that letter. With this alphabetic teleprinter, an experienced operator was no longer required to receive messages. They appeared as text on a strip of paper, ready for an unskilled clerk to read or paste onto a message card.
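
The spinning wheel idea is easier to see as a toy model. This sketch is not a model of any particular historical machine: it assumes one character per revolution and a wheel layout I made up, and it simply shows how a pulse's timing within the revolution selects the letter.

```python
import string

# One wheel position per symbol (an assumed layout, 27 slots).
SLOTS = string.ascii_uppercase + " "

def transmit(message):
    """Return the line state over time: one revolution per character."""
    line = []
    for ch in message:
        revolution = [0] * len(SLOTS)
        revolution[SLOTS.index(ch)] = 1  # pulse as the wheel passes this letter
        line.extend(revolution)
    return line

def receive(line):
    """Recover text by noting which slot of each revolution carried the pulse."""
    text = []
    for start in range(0, len(line), len(SLOTS)):
        revolution = line[start:start + len(SLOTS)]
        text.append(SLOTS[revolution.index(1)])
    return "".join(text)

signal = transmit("HELLO")
decoded = receive(signal)
```

The receiver only works because it counts slots from the same starting point as the transmitter, which is exactly the synchronization burden these machines carried. It also shows the waste: 27 time slots on the wire to move one character.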

This system proved expensive but still practical to operate, and a network of such alphabetic teleprinters was built in the United States during the mid 19th century. A set of smaller telegraph companies operating one such system, called the Hughes system after its inventor, joined together to become the Western Union Telegraph Company. In a precedent that would be followed even more closely by the telephone system, practical commercial telegraphy was intertwined with a monopoly.

The Hughes system was functional but costly. The basic idea of multiplexing across 30 channels was difficult to achieve with mechanical technology. Émile Baudot was employed by the French telegraph service to find a way to better utilize telegraph lines. He first developed a proper form of multiplexing, using synchronized switches to combine five Hughes system messages onto one wire and separate them again at the other end. Likely inspired by his close inspection of the Hughes system and its limitations, Baudot went on to develop a more efficient scheme for the transmission of alphabetic messages: the Baudot code.

Baudot's system was similar to the Hughes system in that it relied on a transmitter and receiver kept in synchronization to interpret pulses as belonging to the correct logical channel. He simplified the design, though, by allowing for only five logical channels. Instead of each pulse representing a letter, the combination of all five channels would be used to form one symbol. The Baudot code was a five-bit binary alphabetic encoding, and most computer alphabetic encodings to the present day are at least partially derived from it.
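
To make the contrast concrete, here is a sketch of symbols formed from the combined state of five channels. The letter assignment below (space, then A through Z in order) is purely illustrative and is not Baudot's actual table.

```python
import string

# Illustrative assignment only: index 0 is space, A=1 ... Z=26.
# This uses 27 of the 32 combinations that five on/off channels allow.
ALPHABET = " " + string.ascii_uppercase

def to_channels(ch):
    """Return the on/off state of the five channels for one symbol."""
    value = ALPHABET.index(ch)
    return tuple((value >> bit) & 1 for bit in range(4, -1, -1))

def from_channels(channels):
    """Combine five channel states back into a symbol."""
    value = 0
    for state in channels:
        value = (value << 1) | state
    return ALPHABET[value]

encoded = [to_channels(ch) for ch in "TTY"]
decoded = "".join(from_channels(c) for c in encoded)
```

Five channels instead of one per letter is the whole trick: the hardware only has to keep five positions in step, and the combinations do the rest.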

One of the downsides of Baudot's design is that it was not quite as easy to operate as telegraphy companies would hope. Baudot equipment could keep up 30 words per minute with a skilled operator who could work the five-key piano-style keyboard in good synchronization with the mechanical armature that read it out. This took a great deal of practice, though, and pressing keys out of synchronization with the transmitter could easily cause incorrect letters to be sent.

In 1901, during the early days of the telephone, Donald Murray developed an important enhancement to the Baudot system. He was likely informed by an older practice that had been developed for Morse telegraphs, of having an operator punch a Morse message into paper tape to be transmitted by a simple tape reader later. He did the same for Baudot code: he designed a device with an easy to use typewriter-like keyboard that punched Baudot code onto a strip of paper tape with five rows, one for each bit. The tape punch had no need to be synchronized with the other end, and the operator could type at whatever pace they were comfortable.

The invention of Murray's tape punch brought about the low-cost telegram networks that we are familiar with from the early 20th century. A clerk would take down a message and then punch it onto paper tape. Later, the paper tape would be inserted into a reader that transmitted the Baudot message in perfect synchronization with the receiver, a teleprinter that typed it onto tape as text once again. The process of encoding and decoding messages for the telegraph was now fully automated.

The total operation of the system, though, was not. For one, the output was paper tape that had to be cut and pasted to compose a paragraph of text. For another, the transmitting and receiving equipment operated continuously, requiring operators to coordinate on the scheduling of sending messages (or they would tie up the line and waste a lot of paper tape).

In a wonderful time capsule of early 20th century industrialism, the next major evolution would come about with considerable help from the Morton Salt Company. Joy Morton, its founder, agreed to fund Frank Pearne's efforts to develop an even more practical printing telegraph. This device would use a typewriter mechanism to produce the output as normal text on a page, saving considerable effort by clerks. Even better, it would use a system of control codes to indicate the beginning and end of messages, allowing a teleprinter to operate largely unattended. This was more complex than it sounded, as it required finding a way for the two ends to establish clock synchronization before the message.
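
One well-known way to solve that synchronization problem is start-stop framing, where every character carries its own timing reference. I am not claiming this is exactly what Pearne built; it is a generic sketch, with an arbitrary bit order and made-up constants, of how a receiver can sit idle until a start signal arrives and then count off a fixed number of data bits.

```python
IDLE = 1   # the line rests in the "mark" state
START = 0  # a drop to "space" announces that a character follows
STOP = 1   # a return to mark, so the next start bit is detectable

def frame(codes, bits=5):
    """Wrap each code in a start bit and a stop bit."""
    line = [IDLE, IDLE]
    for code in codes:
        data = [(code >> i) & 1 for i in range(bits - 1, -1, -1)]
        line += [START] + data + [STOP]
    return line + [IDLE, IDLE]

def deframe(line, bits=5):
    """Wait for a start bit, then read a fixed number of data bits."""
    codes, i = [], 0
    while i < len(line):
        if line[i] != START:
            i += 1  # still idle, keep waiting
            continue
        code = 0
        for b in line[i + 1:i + 1 + bits]:
            code = (code << 1) | b
        codes.append(code)
        i += bits + 2  # skip the start bit, the data bits, and the stop bit
    return codes

sent = [20, 0, 31, 25]
received = deframe(frame(sent))
```

Because the receiver resynchronizes on every start bit, the two ends only need to agree on timing for the length of one character, and the line can sit idle for as long as it likes between messages.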

There were, it turned out, others working on the same concept. After a series of patent disputes, mergers, and negotiations, the Morkrum-Kleinschmidt Company would market this new technology. A fully automated teleprinter, lurching into life when the other end had a message to send, producing pages of text like a typewriter with an invisible typist.

In 1928, Morkrum-Kleinschmidt adopted a rather more memorable name: the Teletype Corporation. During the development of the Teletype system, the telephone network had grown into a nationwide enterprise and one of the United States' largest industrial ventures (at many points in time, the country's single largest employer). AT&T had already entered the telegraph business by leasing its lines for telegraph use, and work had already begun on telegraphs that could operate over switched telephone lines, transmitting text as if it were a phone call. The telephone was born of the telegraph but came to consume it. In 1930, the Teletype Corporation was purchased by AT&T and became part of Western Electric.

That same year, Western Electric introduced the Teletype Model 15. Receiving Baudot at 45 baud [2] with an optional tape punch and tape reader, the Model 15 became a workhorse of American communications. By some accounts, the Model 15 was instrumental in the prosecution of World War II. The War Department made extensive use of AT&T-furnished teletype networks and Model 15 teleprinters as the core of the military logistics enterprise. The Model 15 was still being manufactured as late as 1963, a production record rivaled by few other electrical devices.

It is difficult to summarize the history of the networks that teleprinters enabled. The concept of switching connections between teleprinters, as was done on the phone network, was an obvious one. The dominant switched teleprinter network was Telex, not really an organization but actually a set of standards promulgated by the ITU. The most prominent US implementation of Telex was an AT&T service called TWX, short for Teletypewriter Exchange Service. TWX used Teletype teleprinters on phone lines (in a special class of service), and was a very popular service for business use from the '40s to the '70s.

Incidentally, TWX was assigned the special purpose area codes 510, 610, 710, 810, and 910, which contained only teleprinters. These area codes would eventually be assigned to other uses, but for a long time ranked among the "unusual" NPAs.

Western Union continued to develop their telegraph network during the era of TWX, acting in many ways as a sibling or shadow of AT&T. Like AT&T, Western Union developed multiplexing schemes to make better use of their long-distance telegraph lines. Like AT&T, Western Union developed automatic switching systems to decrease operator expenses. Like AT&T, Western Union built out a microwave network to increase the capacity of their long-haul network. Telegraphy is one of the areas where AT&T struggled despite their vast network, and Western Union kept ahead of them, purchasing the TWX service from AT&T. Western Union would continue to operate the switched teleprinter network, under the Telex name, into the '80s when it largely died out in favor of the newly developed fax machine.

During the era of TWX, encoding schemes changed several times as AT&T and Western Union developed better and faster equipment (Western Union continued to make use of Western Electric-built Teletype machines among other equipment). ASCII came to replace Baudot, and so a number of ASCII teleprinters existed. There were also hybrids. For some time Western Union operated teleprinters on an ASCII variant that provided only upper case letters and some punctuation, with the benefit of requiring fewer bits. The encoding and decoding of this reduced ASCII set was implemented by the Bell 101 telephone modem, designed in 1958 to allow SAGE computers to communicate with one another and then widely included in TWX and Telex teleprinters. The Bell 101's descendants would bring about remote access to time-sharing computer systems and, ultimately, one of the major forms of long-distance computer networking.

You can see, then, that the history of teleprinters and the history of computers are naturally interleaved. From an early stage, computers operated primarily on streams of characters. This basic concept is still the core of many modern computer systems and, not coincidentally, also describes the operation of teleprinters.

When electronic computers were under development in the 1950s and 1960s, teleprinters were near the apex of their popularity as a medium for business communications. Most people working on computers probably had experience with teleprinters; most organizations working on computers already had a number of teleprinters installed. It was quite natural that teleprinter technology would be repurposed as a means of input and output for computers.

Some of the very earliest computers, for example those of Konrad Zuse, employed punched tape as an input medium. These were almost invariably repurposed or modified telegraphic punched tape systems, often in five-bit Baudot. Particularly in retrospect, as more materials have become available to historians, it is clear that much of the groundwork for digital computing was laid by WWII cryptological efforts.

Newly devised cryptographic machines like the Lorenz ciphers were essentially teleprinters with added digital logic. The machines built to attack these codes, like Colossus, are now generally recognized as the first programmable computers. The line between teleprinter and computer was not always clear. As more encoding and control logic was added, teleprinters came to resemble simple computers.

The Manchester Mark I, a pioneer of stored-program computing built in 1949, used a 5-bit code adopted from Baudot by none other than Alan Turing. The major advantage of this 5-bit encoding was, of course, that programs could be read and written using Baudot tape and standard telegraph equipment. The addition of a teleprinter allowed operators to "interactively" enter instructions into the computer and read the output, although the concept of a shell (or any other designed user interface) had not yet been developed. EDSAC, a contemporary of the Mark I and precursor to a powerful tea logistics system that would set off the development of business computing, also used a teleprinter for input and output.

Many early commercial computers limited input and output to paper tape, often 5-bit for Baudot or 8-bit for ASCII with parity, as in the early days of computing preparation of a program was an exacting process that would not typically be done "on the fly" at a keyboard. It was, of course, convenient that teleprinters with tape punches could be used to prepare programs for entry into the computer.
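
The eighth bit deserves a quick illustration. This sketch assumes even parity stored in the high bit, which is one common arrangement; equipment varied in both the sense of the parity and where the bit went.

```python
def with_even_parity(ch):
    """Return 7-bit ASCII plus a high parity bit making the count of 1s even."""
    code = ord(ch)
    if code > 0x7F:
        raise ValueError("not a 7-bit ASCII character")
    parity = bin(code).count("1") % 2  # 1 when the seven data bits have odd weight
    return code | (parity << 7)

def check_even_parity(byte):
    """True if the 8-bit value has an even number of 1 bits."""
    return bin(byte).count("1") % 2 == 0

a = with_even_parity("A")  # 0x41 has two 1 bits, so the parity bit stays 0
c = with_even_parity("C")  # 0x43 has three 1 bits, so the parity bit is set
```

A single flipped bit, whether from a mispunched hole or line noise, makes the count odd and fails the check. Two flipped bits cancel out and go unnoticed, which is the usual limitation of simple parity.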

Business computing is most obviously associated with IBM, a company that had large divisions building both computers and typewriters. The marriage of the two was inevitable considering the existing precedent. Beginning around 1960 it was standard for IBM computers to furnish a teleprinter as the operator interface, but IBM had a distinct heritage from the telecommunications industry and, for several reasons, was intent on maintaining that distinction. IBM's teleprinter-like devices were variously called Data Communications Systems, Printer-Keyboards, Consoles, and eventually Terminals. They generally operated over proprietary serial channels.

Other computer manufacturers didn't have typewriter divisions, and typewriters and teleprinters were actually rather complex mechanical devices and not all that easy to build. As a result, they tended to buy teleprinters from established manufacturers, often IBM or Western Electric. Consider the case of a rather famous non-IBM computer, the DEC PDP-1 of 1960. It came with a CRT graphics display as standard, and many sources will act as if this was the primary operator interface, but it is important to understand that early CRT graphics displays had a hard time with text. Text is rather complex to render when you are writing point-by-point to a CRT vector display from a rather slow machine. You would be surprised how many vertices a sentence has in it.

So despite the ready availability of CRTs in the 1960s (they were, of course, well established in the television industry), few computers used them for primary text input/output. Instead, the PDP-1 was furnished with a modified IBM typewriter as its console. This scheme of paying a third-party company (Soroban Engineering) to modify IBM typewriters for teleprinter control was apparently not very practical, and later DEC PDP models tended to use Western Electric Teletypes as user terminals. These had the considerable advantage that they were already designed to operate over long telephone circuits, making it easy to install multiple terminals throughout a building for time sharing use.

Indeed, time sharing was a natural fit for teleprinter terminals. With a teleprinter and a computer with a suitable modem, you could "call in" to a time sharing computer over the telephone from a remote office. Most of the first practical "computer networks" (term used broadly) were not actually networks of computers, but a single computer with many remote terminals. This architecture evolved into the BBS and early Internet-like services such as CompuServe. The idea was surprisingly easy to implement once time sharing operating systems were developed; the necessary hardware was already available from Western Electric.

While I cannot swear to the accuracy of this attribution, many sources suggest that the term "tty" as a generic reference to a user terminal or serial I/O channel originated with DEC. It seems reasonable; DEC's software was very influential on the broader computer industry, particularly outside of IBM. UNIX originally targeted a PDP-11 with teleprinters. While I can't prove it, it seems quite believable that the tty terminology was adopted directly from RT-11 or another operating system that Bell Labs staff might have used on the PDP-11.

Computers were born of the teleprinter and would inevitably come to consume them. After all, what is a computer but a complex teleprinter? Today, displaying text and accepting it from a keyboard is among the most basic functions of computers, and computers continue to perform this task using an architecture that would be familiar to engineers in the 1970s. They would likely be more surprised by what hasn't changed than what has: many of us still spend a lot of time in graphical software pretending to be a video display terminal built for compatibility with teleprinters.

And we're still using that 7-bit ASCII code a lot, aren't we. At least Baudot died out and we get to enjoy lower case letters.

[1] Actor, singer, etc. Gene Autry had worked as a telegrapher before he began his career in entertainment. This resulted in no small number of stories of a celebrity stand-in at the telegraph office. Yes, this is about to be a local history anecdote. It is fairly reliably reported that Gene Autry once volunteered to stand in for the telegrapher and station manager at the small Santa Fe Railroad station in Socorro, New Mexico, as the telegrapher had been temporarily overwhelmed by the simultaneous arrival of a packed train and a series of telegrams. There are enough of these stories about Gene that I think he really did keep his Morse sharp well into his acting career.

[2] Baud is a somewhat confusing unit derived from Baudot. Baud refers to the number of symbols per second on the underlying communication medium. For simple binary systems (and thus many computer communications systems we encounter daily), baud rate is equivalent to bit rate (bps). For systems that employ multi-level signaling, the bit rate will be higher than the baud rate, as multiple bits are represented per symbol on the wire. Methods like QAM are useful because they result in bit rates that are many multiples of the baud rate, reducing the bandwidth needed on the wire for a given bit rate.
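
To put numbers on the baud/bit rate relationship, here is a minimal sketch. It assumes an idealized line where every symbol carries log2(levels) bits; the 2400 baud figure and the function name are purely illustrative and not taken from any particular modem standard.

```python
import math

def bit_rate(baud: float, levels: int) -> float:
    """Bits per second for a line sending `baud` symbols per second,
    where each symbol takes one of `levels` distinct states."""
    return baud * math.log2(levels)

# Illustrative numbers only.
binary = bit_rate(2400, 2)    # two-level signaling: one bit per symbol
qam16 = bit_rate(2400, 16)    # 16-point constellation: four bits per symbol
print(binary, qam16)
```

Same symbol rate on the wire, four times the bits.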


>>> 2024-02-11 the top of the DNS hierarchy

In the past (in fact two years ago, proof I have been doing this for a while now!) I wrote about the "inconvenient truth" that structural aspects of the Internet make truly decentralized systems infeasible, due to the lack of a means to perform broadcast discovery. As a result, most distributed systems rely on a set of central, semi-static nodes to perform initial introductions.

For example, Bitcoin relies on a small list of volunteer-operated domain names that resolve to known-good full nodes. Tor similarly uses a small set of central "directory servers" that provide initial node lists. Both systems have these lists hardcoded into their clients; coincidentally, both have nine trusted, central hostnames.

This sort of problem exists in basically all distributed systems that operate in environments where it is not possible to shout into the void and hope for a response. The internet, for good historic reasons, does not permit this kind of behavior. Here we should differentiate between distributed and decentralized, two terms I do not tend to select very carefully. Not all distributed systems are decentralized, indeed, many are not. One of the easiest and most practical ways to organize a distributed system is according to a hierarchy. This is a useful technique, so there are many examples, but a prominent and old one happens to also be part of the drivetrain mechanics of the internet: DNS, the domain name system.

My reader base is expanding and so I will provide a very brief bit of background. Many know that DNS is responsible for translating human-readable names like "computer.rip" into the actual numerical addresses used by the internet protocol. Perhaps a bit fewer know that DNS, as a system, is fundamentally organized around the hierarchy of these names. To examine the process of resolving a DNS name, it is sometimes more intuitive to reverse the name, and instead of "computer.rip", discuss "rip.computer" [1].

This name is hierarchical, it indicates that the record "computer" is within the zone "rip". "computer" is itself a zone and can contain yet more records, we tend to call these subdomains. But the term "subdomain" can be confusing as everything is a subdomain of something, even "rip" itself, which in a certain sense is a subdomain of the DNS root "." (which is why, of course, a stricter writing of the domain name computer.rip would be computer.rip., but as a culture we have rejected the trailing root dot).
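
The reversed reading can be sketched in a few lines of Python. This is only an illustration of the hierarchy, not how any real resolver is written, and the function name is made up for the example.

```python
def delegation_chain(name: str) -> list[str]:
    """Return the zones passed through when resolving `name`, root first."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    chain = ["."]
    # Walk the labels right to left, from most significant to least.
    for i in range(len(labels) - 1, -1, -1):
        chain.append(".".join(labels[i:]) + ".")
    return chain

print(delegation_chain("computer.rip"))
```

Each entry in the result is a zone that can delegate to the next one, with the root dot written out explicitly.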

Many of us probably know that each level of the DNS hierarchy has authoritative nameservers, operated typically by whoever controls the name (or their third-party DNS vendor). "rip" has authoritative DNS servers provided by a company called Rightside Group, a subsidiary of the operator of websites like eHow that went headfirst into the great DNS land grab and snapped up "rip" as a bit of land speculation, alongside such attractive properties as "lawyer" and "navy" and "republican" and "democrat", all of which I would like to own the "computer" subdomain of, but alas such dictionary words are usually already taken.

"computer.rip", of course, has authoritative nameservers operated by myself or my delegate. Unlike some people I know, I do not have any nostalgia for BIND, and so I pay a modest fee to a commercial DNS operator to do it for me. Some would be surprised that I pay for this; DNS is actually rather inexpensive to operate and authoritative name servers are almost universally available as a free perk from domain registrars and others. I just like to pay for this on the general feeling that companies that charge for a given service are probably more committed to its quality, and it really costs very little and changing it would take work.

To the observant reader, this might leave an interesting question. If even the top-level domains are subdomains of a secret, seldom-seen root domain ".", who operates the authoritative name servers for that zone?

And here we return to the matter of even distributed systems requiring central nodes. Bitcoin uses nine hardcoded domain names for initial discovery of decentralized peers. DNS uses thirteen hardcoded root servers to establish the top level of the hierarchy.

These root servers are commonly referred to as a.root-servers.net through m.root-servers.net, and indeed those are their domain names, but remember that when we need to use those root servers we have no entrypoint into the DNS hierarchy and so are not capable of resolving names. The root servers are much more meaningfully identified by their IP addresses, which are "semi-hardcoded" into recursive resolvers in the form of what's often called a root hints file. You can download a copy, it's a simple file in BIND zone format that BIND basically uses to bootstrap its cache.
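
To give a feel for how simple that file is, here is a rough sketch that reads a three-line excerpt written in the style of a root hints file. The excerpt covers only root server A, and the parser is a toy that assumes four whitespace-separated columns per line; a general zone file parser has to cope with a good deal more than that.

```python
HINTS_EXCERPT = """\
; excerpt in the style of a root hints file
.                        3600000      NS    A.ROOT-SERVERS.NET.
A.ROOT-SERVERS.NET.      3600000      A     198.41.0.4
A.ROOT-SERVERS.NET.      3600000      AAAA  2001:503:ba3e::2:30
"""

def parse_hints(text: str) -> dict[str, list[str]]:
    """Map each root server name to the addresses listed for it."""
    addresses: dict[str, list[str]] = {}
    for line in text.splitlines():
        line = line.split(";", 1)[0].strip()   # drop comments
        if not line:
            continue
        owner, _ttl, rtype, rdata = line.split()
        if rtype in ("A", "AAAA"):
            addresses.setdefault(owner.lower(), []).append(rdata)
    return addresses

hints = parse_hints(HINTS_EXCERPT)
print(hints)
```

The addresses are what matter here: a resolver starting from nothing can send a query to one of them without resolving any name first.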

And yes, there are other DNS implementations too, a surprising number of them, even in wide use. But when talking about DNS history we can mostly stick to BIND. BIND used to stand for Berkeley Internet Name Domain, and it is an apt rule of thumb in computer history that anything with a reference to UC Berkeley in the name is probably structurally important to the modern technology industry.

One of the things I wanted to get at, when I originally talked about central nodes in distributed systems, is the impact it has on trust and reliability. The Tor project is aware that the nine directory servers are an appealing target for attack or compromise, and technical measures have been taken to mitigate the possibility of malicious behavior. The Bitcoin project seems to mostly ignore that the DNS seeds exist, but of course the design of the Bitcoin system limits their compromise to certain types of attacks. In the case of DNS, much like most decentralized systems, there is a layer of long-lived caching for top-level domains that mitigates the impact of unavailability of the root servers, but still, in every one of these systems, there is the possibility of compromise or unavailability if the central nodes are attacked.

And so there is always a layer of policy. A trusted operator can never guarantee the trustworthiness of a central node (the node could be compromised, or the trusted operator could turn out to be the FBI), but it sure does help. Tor's directory servers are operated by the Tor project. Bitcoin's DNS seeds are operated by individuals with a long history of involvement in the project. DNS's root nodes are operated by a hodgepodge of companies and institutions that were important to the early internet.

Verisign operates two, of course. A California university operates one, of course, but amusingly not Berkeley. Three are operated by various arms of US defense. Some internet industry associations, an NCC, another university, ICANN runs one of them themselves. It's pretty random, though, and just reflects a set of organizations prominently involved in the early internet.

Some people, even some journalists I've come across, hear that there are 13 name servers and picture 13 4U boxes with a lot of blinking lights in heavily fortified data centers. Admittedly this description was more or less accurate in the early days, and a couple of the smaller root server operators did have single machines until surprisingly recently. But today, all thirteen root server IP addresses are anycast groups.

Anycast is not a concept you run into every day, because it's not really useful on local networks where multicast can be used. But it's very important to the modern internet. The idea is this: an IP address (really a subnetwork) is advertised by multiple BGP nodes. Other BGP nodes can select the advertisement they like the best, typically based on lowest hop count. As a user, you connect to a single IP address, but based on the BGP-informed routing tables of internet service providers your traffic could be directed to any number of sites. You can think of it as a form of load balancing at the IP layer, but it also has the performance benefit of users mostly connecting to nearby nodes, so it's widely used by CDNs for multiple reasons.
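
A toy model of that selection step, heavily simplified: real BGP best-path selection weighs several attributes (local preference among them) before it ever gets to path length, and every site name and AS number below is made up. The prefix is a documentation range standing in for an anycast subnet.

```python
from dataclasses import dataclass

@dataclass
class Advertisement:
    site: str            # hypothetical anycast site name
    as_path: list[int]   # ASNs the route passed through (made up, private range)

def best_route(ads: list[Advertisement]) -> Advertisement:
    """Pick the advertisement with the shortest AS path."""
    return min(ads, key=lambda ad: len(ad.as_path))

PREFIX = "192.0.2.0/24"  # documentation prefix standing in for an anycast subnet
ads = [
    Advertisement("amsterdam", [64512, 64520, 64530]),
    Advertisement("denver", [64513, 64530]),
    Advertisement("tokyo", [64514, 64521, 64522, 64530]),
]
chosen = best_route(ads)
print(f"{PREFIX} -> {chosen.site}")
```

Every site advertises the same prefix; a router elsewhere on the internet would see a different set of paths and could just as well end up at a different site.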

For DNS, though, where we often have a bootstrapping problem to solve, anycast is extremely useful as a way to handle "special" IP addresses that are used directly. For authoritative DNS servers like [2001:500:2f::f] [2] (root server F) or recursive resolvers like [2001:4860:4860::8888] (Google public DNS), anycast is the secret that allows a "single" address to correspond to a distributed system of nodes.

So there are thirteen DNS root servers in the sense that there are thirteen independently administered clusters of root servers (with the partial exception of A and J, both operated by Verisign, due to their acquisition of former A operator Network Solutions). Each of the thirteen root servers is, in practice, a fairly large number of anycast sites, sometimes over 100. The root server operators don't share much information about their internal implementation, but one can assume that in most cases the anycast sites consist of multiple servers as well, fronted by some sort of redundant network appliance. There may only be thirteen of them, but each of the thirteen is quite robust. For example, the root servers typically place their anycast sites in major internet exchanges distributed across both geography and provider networks. This makes it unlikely that any small number of failures would seriously affect the number of available sites. Even if a root server were to experience a major failure due to some sort of administration problem, there are twelve more.

Why thirteen, you might ask? No good reason. The number of root servers basically grew until the answer to an NS request for "." hit the 512 byte limit on UDP DNS responses. Optimizations over time allowed this number to grow (actually using single letters to identify the servers was one of these optimizations, allowing the basic compression used in DNS responses to collapse the matching root-servers.net part). Of course IPv6 blew DNS response sizes completely out of the water, leading to the development of the EDNS extension that allows for much larger responses.
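
The arithmetic is easy to check with a back-of-the-envelope sketch. It assumes a response carrying thirteen NS records plus one IPv4 glue record per server, no IPv6 glue and no EDNS record, with every repeated name compressed to a pointer. The field sizes follow the standard DNS wire format, but the exact contents of a real response vary by server, so treat the totals as illustrative.

```python
def name_len(name: str) -> int:
    """Wire length of an uncompressed DNS name: one length byte per label,
    the label bytes, and a final zero byte for the root."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    return sum(1 + len(label) for label in labels) + 1

HEADER = 12                          # fixed DNS header
QUESTION = name_len(".") + 2 + 2     # qname, qtype, qclass
RR_FIXED = 2 + 2 + 4 + 2             # type, class, TTL, rdlength
POINTER = 2                          # a compression pointer
SERVERS = [f"{c}.root-servers.net." for c in "abcdefghijklm"]

def response_size(compress: bool) -> int:
    size = HEADER + QUESTION
    for i, server in enumerate(SERVERS):
        full = name_len(server)
        # Compressed: keep the one-letter label, point at the shared suffix.
        short = 1 + 1 + POINTER
        ns_rdata = short if (compress and i > 0) else full
        size += name_len(".") + RR_FIXED + ns_rdata   # NS record owned by "."
        glue_owner = POINTER if compress else full
        size += glue_owner + RR_FIXED + 4             # A record, 4-byte address
    return size

print(response_size(compress=True), response_size(compress=False))
```

Under these assumptions the compressed response fits under 512 bytes and the uncompressed one does not, which is the whole trick behind the single-letter names.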

13 is no longer the practical limit, but with how large some of the 13 are, no one sees a pressing need to add more. Besides, can you imagine the political considerations in our modern internet environment? The proposed operator would probably be Cloudflare or Google or Amazon or something and their motives would never be trusted. Incidentally, many of the anycast sites for root server F (operated by ISC) are Cloudflare data centers used under agreement.

We are, of course, currently trusting the motives of Verisign. You should never do this! But it's been that way for a long time, we're already committed. At least it isn't Network Solutions any more. I kind of miss when SRI was running DNS and military remote viewing.

But still, there's something a little uncomfortable about the situation. Billions of internet hosts depend on thirteen "servers" to have any functional access to the internet.

What if someone attacked them? Could they take the internet down? Wouldn't this cause a global crisis of a type seldom before seen? Should I be stockpiling DNS records alongside my canned water and iodine pills?

Wikipedia contains a great piece of comedic encyclopedia writing. In its article on the history of attacks on DNS root servers, it mentions the time, in 2012, that some-pastebin-user-claiming-to-be-Anonymous (one of the great internet security threats of that era) threatened to "shut the Internet down". "It may only last one hour, maybe more, maybe even a few days," the statement continues. "No matter what, it will be global. It will be known."

That's the end of the section. Some Wikipedia editor, no doubt familiar with the activities of Anonymous in 2012, apparently considered it self-evident that the attack never happened.

Anonymous may not have put in the effort, but others have. There have been several apparent DDoS attacks on the root DNS servers. One, in 2007, was significant enough that four of the root servers suffered---but there were nine more, and no serious impact was felt by internet users. This attack, like most meaningful DDoS, originated with a botnet. It had its footprint primarily in Korea, but C2 in the United States. The motivation for the attack, and who launched it, remains unknown.

There is a surprisingly large industry of "booters," commercial services that, for a fee, will DDoS a target of your choice. These tend to be operated by criminal groups with access to large botnets; the botnets are sometimes bought and sold and get their tasking from a network of resellers. It's a competitive industry. In the past, booters and botnet operators have sometimes been observed announcing a somewhat random target and taking it offline as, essentially, a sales demonstration. Since these demonstrations are a known behavior, any time a botnet targets something important for no discernible reason, analysts have a tendency to attribute it to a "show of force." I have little doubt that this is sometimes true, but as with the tendency to attribute monumental architecture to deity worship, it might be an overgeneralization of the motivations of botnet operators. Sometimes I wonder if they made a mistake, or maybe they were just a little drunk and a lot bored, who is to say?

The problem with this kind of attribution is evident in the case of the other significant attack on the DNS root servers, in 2015. Once again, some root servers were impacted badly enough that they became unreliable, but other root servers held on and there was little or even no impact to the public. This attack, though, had some interesting properties.

In the 2007 incident, the abnormal traffic to the root servers consisted of large, mostly-random DNS requests. This is basically the expected behavior of a DNS attack; using randomly generated hostnames in requests ensures that the responses won't be cached, making the DNS server exert more effort. Several major botnet clients have this "random subdomain request" functionality built in, normally used for attacks on specific authoritative DNS servers as a way to take the operator's website offline. Chinese security firm Qihoo 360, based on a large botnet honeypot they operate, reports that this type of DNS attack was very popular at the time.

The 2015 attack was different, though! Wikipedia, like many other websites, describes the attack as "valid queries for a single undisclosed domain name and then a different domain the next day." In fact, the domain names were disclosed, by at least 2016. The attack happened on two days. On the first day, all requests were for 336901.com. The second day, all requests were for 916yy.com.

Contemporaneous reporting is remarkably confused on the topic of these domain names, perhaps because they were not widely known, perhaps because few reporters bothered to check up on them thoroughly. Many sources make it sound like they were random domain names perhaps operated by the attacker, one goes so far as to say that they were registered with fake identities.

Well, my Mandarin isn't great, and I think the language barrier is a big part of the confusion. No doubt another part is a Western lack of familiarity with Chinese internet culture. To an American in the security industry, 336901.com would probably look at first like the result of a DGA or domain generation algorithm. A randomly-generated domain used specifically to be evasive. In China, though, numeric names like this are quite popular. Qihoo 360 is, after all, domestically branded as just 360---360.cn.

As far as I can tell, both domains were pretty normal Chinese websites related to mobile games. It's difficult or maybe impossible to tell now, but it seems reasonable to speculate that they were operated by the same company. I would assume they were something of a gray market operation, as there's a huge intersection between "mobile games," "gambling," and "target of DDoS attacks." For a long time, perhaps still today in the right corners of the industry, it was pretty routine for gray-market gambling websites to pay booters to DDoS each other.

In a 2016 presentation, security researchers from Verisign (Weinberg and Wessels) reported on their analysis of the attack based on traffic observed at Verisign root servers. They conclude that the traffic likely originated from multiple botnets or at least botnet clients with different configurations, since the attack traffic can be categorized into several apparently different types [3]. Based on command and control traffic from a source they don't disclose (perhaps from a Verisign honeynet?), they link the attack to the common "BillGates" [4] botnet. Most interestingly, they conclude that it was probably not intended as an attack on the DNS root: the choice of fixed domain names just doesn't make sense, and the traffic wasn't targeted at all root servers.

Instead, they suspect it was just what it looks like: an attack on the two websites the packets queried for, that for some reason was directed at the root servers instead of the authoritative servers for that second-level domain. This isn't a good strategy; the root servers are a far harder target than your average web hosting company's authoritative servers. But perhaps it was a mistake? An experiment to see if the root server operators might mitigate the DDoS by dropping requests for those two domains, incidentally taking the websites offline?

Remember that Qihoo 360 operates a large honeynet and was kind enough to publish a presentation on their analysis of root server attacks. Matching Verisign's conclusions, they link the attack to the BillGates botnet, and also note that they often observe multiple separate botnet C2 servers send tasks targeting the same domain names. This probably reflects the commercialized nature of modern botnets, with booters "subcontracting" operations to multiple botnet operators. It also handily explains Verisign's observation that the 2015 attack traffic seems to have come from more than one implementation of a DNS DDoS.

360 reports that, on the first day, five different C2 servers tasked bots with attacking 336901.com. On the second day, three C2 servers tasked for 916yy.com. But they also have a much bigger revelation: throughout the time period of the attacks, they observed multiple tasks to attack 916yy.com using several different methods.

360 concludes that the 2015 DNS attack was most likely the result of a commodity DDoS operation that decided to experiment, directing traffic at the DNS roots instead of the authoritative server for the target to see what would happen. I doubt they thought they'd take down the root servers, but it seems totally reasonable that they might have wondered if the root server operators would filter DDoS traffic based on the domain name appearing in the requests.

Intriguingly, they note that some of the traffic originated with a DNS attack tool that had significant similarities to BillGates but didn't produce quite the same packets. We will probably never know for sure, but a likely explanation is that some group modified the BillGates DNS attack module or implemented a new one based on the method used by BillGates.

Tracking botnets gets very confusing very fast, there are just so many different variants of any major botnet client! BillGates originated, for example, as a Linux botnet. It was distributed to servers, not only through SSH but through vulnerabilities in MySQL and ElasticSearch. It was unusual, for a time, in being a major botnet that skipped over the most common desktop operating system. But ports of BillGates to Windows were later observed, distributed through an Internet Explorer vulnerability---classic Windows. Why someone chose to port a Linux botnet to Windows instead of using one of the several popular Windows botnets (Conficker, for example) is a mystery. Perhaps they had spent a lot of time building out BillGates C2 infrastructure and, like any good IT operation, wanted to simplify their cloud footprint.

High in the wizard's tower of the internet, thirteen elders are responsible for starting every recursive resolver on its own path to truth. There's a whole Neal Stephenson for Wired article there. But in practice it's a large and robust system. The extent of anycast routing used for the root DNS servers, to say nothing of CDNs, is one of those things that challenges our typical stacked view of the internet. Geographic load balancing is something we think of at high layers of the system, it's surprising to encounter it as a core part of a very low level process.

That's why we need to keep our thinking flexible: computers are towers of abstraction, and complexity can be added at nearly any level, as needed or convenient. Seldom is this more apparent than it is in any process called "bootstrapping." Some seemingly simpler parts of the internet, like DNS, rely on a great deal of complexity within other parts of the system, like BGP.

Now I'm just complaining about pedagogical use of the OSI model again.

[1] The fact that the DNS hierarchy is written from right-to-left while it's routinely used in URIs that are otherwise read left-to-right is one of those quirks of computer history. Basically an endianness inconsistency. Like American date order, to strictly interpret a URI you have to stop and reverse your analysis part way through. There's no particular reason that DNS is like that, there was just less consistency over most significant first/least significant first hierarchical ordering at the time and contemporaneous network protocols (consider the OSI stack) actually had a tendency towards least significant first.

[2] The IPv4 addresses of the root servers are ages old and mostly just a matter of chance, but the IPv6 addresses were assigned more recently and allowed an opportunity for something more meaningful. Reflecting the long tradition of identifying the root servers by their letter, many root server operators use IPv6 addresses where the host part can be written as the single letter of the server (e.g. root server C at [2001:500:2::c]). Others chose a host part of "53," a gesture at the port number used for DNS (e.g. root server J, [2001:7fe::53]). Others seem more random, Verisign uses 2:30 for both of their root servers (e.g. root server A, [2001:503:ba3e::2:30]), so maybe that means something to them, or maybe it was just convenient. Amusingly, the only operator that went for what I would call an address pun is the Defense Information Systems Agency, which put root server G at [2001:500:12::d0d].

[3] It really dates this story that there was some controversy around the source IPs of the attack, originating with none other than deceased security industry personality John McAfee. He angrily insisted that it was not plausible that the source IPs were spoofed. Of course botnets conducting DDoS attacks via DNS virtually always spoof the source IP, as there are few protections in place (at the time almost none at all) to prevent it. But John McAfee has always had a way of ginning up controversy where none was needed.

[4] Botnets are often bought, modified, and sold. They tend to go by various names from different security researchers and different variants. I'm calling this one "BillGates" because that's the funniest of the several names used for it.


>>> 2024-01-31 multi-channel audio part 2

Last time, we left off at the fact that modern films are distributed with their audio in multiple formats. Most of the time, there is a stereo version of the audio, and a multi-channel version of the audio that is perhaps 5.1 or 7.1 and compressed using one of several codecs that were designed within the film industry for this purpose.

But that was all about film, in physical form. In the modern world, films go out to theaters in the form of Digital Cinema Packages, a somewhat elaborate format that basically comes down to an encrypted motion JPEG 2000 stream with PCM audio. There are a lot of details there that I don't know very well and I don't want to get hung up on anyway, because I want to talk about the consumer experience.

As a consumer, there are a lot of ways you get movies. If you are a weirdo, you might buy a Blu-Ray disc. Optical discs are a nice case, because they tend to conform to a specification that allows relatively few options (so that players are reasonable to implement). Blu-Ray discs are allowed to encode their audio as linear PCM [1], Dolby Digital, Dolby TrueHD, DTS, DTS-HD, or DRA.

DRA is a common standard in the Chinese market but not in the US (that's where I live), so I'll ignore it. That still leaves three basic families of codecs, each of which has some variations. One of the interesting things about the Blu-Ray specification is that PCM audio can incorporate up to eight channels. The Blu-Ray spec allows up to 27,648 Kbps of audio, so it's actually quite feasible to do uncompressed, 24-bit, 96 kHz, 7.1 audio on a Blu-Ray disc. This is an unusual capability in a consumer standard and makes the terribly named Blu-Ray High Fidelity Pure Audio standard for Blu-Ray audio discs make more sense. Stick a pin in that, though, because you're going to have a tough time actually playing uncompressed 7.1.
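
The "quite feasible" claim is just multiplication. A quick sketch, assuming 1 Kbps means 1,000 bits per second and ignoring any container or framing overhead (the function name is only for illustration):

```python
def pcm_kbps(bit_depth: int, sample_rate_hz: int, channels: int) -> float:
    """Raw bit rate of uncompressed linear PCM, in kilobits per second."""
    return bit_depth * sample_rate_hz * channels / 1000

AUDIO_LIMIT_KBPS = 27_648               # the audio budget quoted above
needed = pcm_kbps(24, 96_000, 8)        # 24-bit, 96 kHz, 7.1 (eight channels)
print(needed, needed <= AUDIO_LIMIT_KBPS)
```

That comes to roughly two thirds of the audio budget, so uncompressed 7.1 fits with room to spare.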

On the other hand, you might use a streaming service. There's about a million of those and half of them have inane names ending in Plus, so I'm going to simplify by pretending that we're back in 2012 and Netflix is all that really matters. We can infer from Netflix help articles that Netflix delivers audio as AAC or Dolby Digital.

Or, consider the case of video files that you obtained by legal means. I looked at a few of the movies on my NAS to take a rough sampling. Most older films, and some newer ones, have stereo AAC audio. Some have what VLC describes as A52 aka AC3. A/52 is an ATSC standard that is equivalent to AC3, and AC-3 (hyphen inconsistent) is sort of the older name of Dolby Digital or the name of the underlying transport stream format, depending on how you squint at it. Less common, in my hodgepodge sample, is DTS, but I can find a few.

VLC typically describes the DTS and Dolby Digital as 3F2M/LFE, which is a somewhat eccentric (and I think specific to VLC) notation for 5.1 surround. An interesting detail is that VLC differentiates 3F2M/LFE and 3F2R/LFE, both 5.1, but with the two "surround" channels assigned to either side or rear positions. While 5.1 configurations with the surround channels to the side seem to be more standard, you could potentially put the two surround channels to the rear. Some formats have channel mapping metadata that can differentiate the two.

Because there is no rest for the weary, there is some inconsistency between "5.1 side" and "5.1 rear" in different standards and formats. At the end of the day, most applications don't really differentiate. I tend to consider surround channels on the side to be "correct," in that movie theaters are configured that way and thus it's ostensibly the design target for films. One of the few true specifications I could find for general use, rather than design standards specific to theaters like THX, is ITU-R BS.775. It states that the surround channels of a 5.1 configuration should be mostly to the side, but slightly behind the listener.

That digression aside, it's unsurprising that a video file could contain a multi-channel stream. Most video containers today can support basically arbitrary numbers of streams, and you could put uncompressed multichannel audio into such a container if you wanted. And yet, multi-channel audio in films almost always comes in the form of a Dolby Digital or DTS stream. Why is that? Well, in part, because of tradition: they used to be the formats used by theaters, although digital cinema has somewhat changed that situation and the consumer versions have usually been a little different in the details. But the point stands, films are usually mastered in Dolby or DTS, so the "home video" release goes out with Dolby or DTS.

Another reason, though, is the problem of interconnections.

Let's talk a bit about interconnections. In a previous era of consumer audio, the age of "hi-fi," component systems dominated residential living rooms. In a component system, you had various audio sources that connected to a device that came to be known as a "receiver" since it typically had an FM/AM radio receiver integrated. It is perhaps more accurate to refer to it as an amplifier since that's the main role it serves in most modern systems, but there's also an increasing tendency to think of their input selection and DSP features as part of a preamp. The device itself is sometimes referred to as a preamp, in audiophile circles, when component amplifiers are used to drive the actual speakers. You can see that in these conventional component systems you need to move audio signals between devices. This kind of setup, though, is not common in households with fewer than four bathrooms and one swimming pool.

Most consumers today seem to have a television and, hopefully, some sort of audio device like a soundbar. Sometimes there are no audio interconnections at all! Often the only audio interconnection is from the TV to the soundbar via HDMI. Sometimes it's wireless! So audio interconnects as a topic can feel a touch antiquated today, but these interconnects still matter a lot in practice. First, they are often either the same as something used in industry or similar to something used in industry. Second, despite the increasing prevalence of 5.1 and 7.1 soundbar systems with wireless satellites, the kind of people with a large Blu-Ray collection are still likely to have a component home theater system. Third, legacy audio interconnects don't die that quickly, because a lot of people have an older video game console or something that they want to work with their new TV and soundbar, so manufacturers tend to throw in one or two audio interconnects even if they don't expect most consumers to use them.

So let's think about how to transport multi-channel audio. An ancient tradition in consumer audio says that stereo audio will be sent between components on two sets of two-conductor cables terminated by RCA connectors. The RCA connector dates back to the Radio Corporation of America and, apparently, at least 1937. It remains in widespread service today. There are a surprising number of variations in this interconnect, in practice.

For one, the audio cables may be coaxial or just zipped up in a common jacket. Coaxial audio cables are a bit more expensive and a lot less flexible but admit less noise. There is a lot of confusion in this area because a particular digital transport we'll talk about later specified coaxial cables terminated in RCA connectors, but then is frequently used with non-coaxial cables terminated in RCA connectors, and for reasonable lengths usually still works fine. This has led to a lot of consumer confusion and people thinking that any cable with RCA connectors is coaxial, when in fact, most of them are not. Virtually all of them are not. Unless you specifically paid more money to get a coaxial one, it's not, and even then sometimes it's not, because Amazon is a hotbed of scams.

Second, though these connections are routinely described as "line level" as if that means something, there is remarkably little standardization of the actual signaling. There are various conventions like 1.7 V peak-to-peak and 2 V peak-to-peak and about 1 V peak-to-peak, and few consumer manufacturers bother to tell you which convention they have followed. There are also a surprising number of ways of expressing signaling levels, involving different measurement bases (peak vs RMS) and units (dBV vs dBu), making it a little difficult to interpret specifications when they are provided. This whole mess is just one of the reasons you find yourself having to make volume adjustments for different sources, or having to tune input levels on receivers with that option [2].
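
To show how slippery these numbers are, here is a small conversion sketch. It assumes a pure sine wave (so RMS is peak-to-peak divided by 2√2), takes dBV as referenced to 1 V RMS and dBu as referenced to about 0.775 V RMS, and uses the three peak-to-peak figures above only as sample inputs.

```python
import math

DBV_REF_VRMS = 1.0
DBU_REF_VRMS = math.sqrt(0.6)   # about 0.775 V: 1 mW into a 600 ohm load

def vpp_to_vrms(vpp: float) -> float:
    """Peak-to-peak to RMS voltage, valid for a sine wave only."""
    return vpp / (2 * math.sqrt(2))

def to_dbv(vrms: float) -> float:
    return 20 * math.log10(vrms / DBV_REF_VRMS)

def to_dbu(vrms: float) -> float:
    return 20 * math.log10(vrms / DBU_REF_VRMS)

for vpp in (1.0, 1.7, 2.0):
    vrms = vpp_to_vrms(vpp)
    print(f"{vpp} Vpp = {vrms:.3f} Vrms = {to_dbv(vrms):.2f} dBV = {to_dbu(vrms):.2f} dBu")
```

The same signal reads a little over 2 dB higher in dBu than in dBV, which is plenty to cause confusion when a spec sheet doesn't say which one it means.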

But that's all sort of a tangent, the point here is multi-channel audio. You could, conceptually, move 5.1 over six RCA cables, or 7.1 over eight RCA cables. Home theater receivers used to give you this option, but much like analog HDTV connections, it has largely disappeared.

There is one other analog option: remember Pro Logic, from the film soundtracks, which matrixed four channels into analog stereo? Some analog formats like VHS and LaserDisc often had a Pro Logic soundtrack that could be "decoded" (really dematrixed) by a receiver with that capability, which used to be common. In this case you can transport multi-channel audio over your normal two RCA cables. The matrixing technique was always sort of cheating, though, and produces inferior results to actual multichannel interconnects. It's no longer common either.
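For a feel of why matrixing is "sort of cheating," here is a deliberately simplified Python sketch of a four-channel (left, center, right, surround) sum-and-difference matrix with a passive decoder. This is an assumption-laden toy: real Dolby encoders also phase-shift the surround channel, and Pro Logic decoders add steering logic to improve separation, neither of which is modeled here. The function names and the -3 dB gain are illustrative choices.

```python
import math

G = 1 / math.sqrt(2)  # -3 dB, assumed gain for the shared channels


def matrix_encode(left, center, right, surround):
    """Fold four channels into a two-channel (Lt, Rt) pair.

    Center goes in phase to both sides; surround goes out of phase.
    (The 90 degree phase shift used by real encoders is omitted.)
    """
    lt = left + G * center + G * surround
    rt = right + G * center - G * surround
    return lt, rt


def passive_decode(lt, rt):
    """Recover four channels by simple sum and difference, no steering."""
    return {
        "left": lt,
        "right": rt,
        "center": G * (lt + rt),
        "surround": G * (lt - rt),
    }


# A center-only signal decodes to full level in the center...
print(passive_decode(*matrix_encode(0.0, 1.0, 0.0, 0.0)))
# ...but a left-only signal leaks into center and surround too.
print(passive_decode(*matrix_encode(1.0, 0.0, 0.0, 0.0)))
```

A sound that was only ever in the left channel comes back out of three speakers at once. That leakage is the reduced channel separation that makes matrixed surround inferior to discrete channels.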

Much like video, audio interconnects today have gone digital. Consumer digital audio really took flight with the elegantly named Sony/Philips Digital Interface, or S/PDIF. S/PDIF specifies a digital format that is extremely similar to, but not quite the same as, a professional digital interconnect called AES3. AES3 is typically carried on a three-conductor (balanced) cable with XLR connectors, though, which are too big and expensive for consumer equipment. In one of the weirder decisions in the history of consumer electronics, one that I can only imagine came out of an intractable political fight, S/PDIF specified two completely different physical transports: one electrical, and one optical.

The electrical format should be transmitted over a coaxial cable with RCA connectors. In practice it is often used over non-coaxial cables with RCA connectors, which will usually work fine if the length is short and nothing nearby is too electrically noisy. S/PDIF over non-coaxial cables is "fine" in the same way that HDMI cables longer than you are tall are "fine." If it doesn't work reliably, try a more expensive cable and you'll probably be good.

The optical format is used with cheap plastic optical cables terminated in a square connector called Toslink, originally for Toshiba Link, after the manufacturer that gave us the optical variant. Toslink is one of those great disappointments in consumer products. Despite the theoretical advantages of an optical interconnect, the extremely cheap cables used with Toslink mean it's mostly just worse than the electrical transport, especially when it comes to range [3].

But the oddity of S/PDIF's sibling formats isn't the interesting thing here. Let's talk about the actual S/PDIF bitstream, the very-AES3-like format the audio actually needs to get through.

S/PDIF was basically designed for CDs, and so it comfortably carries CD audio: two channels of 16 bit samples at 44.1kHz. In fact, it can comfortably go further, carrying 20 (or with the right equipment even 24) bit samples at the 48 kHz sampling rate more common in digital audio other than CDs. That's for two channels, though. Make the leap to six channels for 5.1 and you are well beyond the capabilities of an S/PDIF transceiver.

You see where this is going? Compression.

See, the problems that Dolby Digital and DTS solved, of fitting multichannel audio onto the limited space of a 35mm film print, also very much exist in the world of S/PDIF. CDs brought us uncompressed digital audio remarkably early on, but also set sort of a constraint on the bitrate of digital audio streams that ensured the opposite in the world of multi-channel theatrical sound. It sort of makes sense, anyway. DTS soundtracks came on CDs!

Of course even S/PDIF is looking rather long in the tooth today. I don't think I use it at all any more, which is not something I expected to be saying this soon. Today, though, all of my audio sources and sinks are either analog or have HDMI. HDMI is the de facto norm for consumer digital audio today.

HDMI is a complex thing when it comes to audio or, really, just about anything. Details like eARC and the specific HDMI version have all kinds of impacts on what kind of audio can be carried, and the same is true for video as well. I am going to spare you a lengthy diversion into the many variants of HDMI, which seem almost as numerous as those of USB, and talk about HDMI 2.1.

Unsurprisingly, considering the numerous extra conductors and newer line coding, HDMI offers a lot more bandwidth for audio than S/PDIF. In fact, you can transport 8 channels of uncompressed 24-bit PCM at 192kHz. That's about 37 Mbps, which is not that fast for a data transport but sure is pretty fast for an audio cable. Considering the bandwidth requirements for 4K video at 120Hz, though, it's only a minor ask. With HDMI, compression of audio is no longer necessary.
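The arithmetic behind that figure is simple enough to check. A quick Python sketch, counting only the raw sample payload (no framing or other transport overhead, which I am ignoring here):

```python
def pcm_bitrate_bps(channels, bits_per_sample, sample_rate_hz):
    """Raw payload bitrate of uncompressed PCM, ignoring transport overhead."""
    return channels * bits_per_sample * sample_rate_hz


# 8 channels of 24-bit PCM at 192 kHz, the HDMI case described above.
hdmi_max = pcm_bitrate_bps(channels=8, bits_per_sample=24, sample_rate_hz=192_000)

# For comparison: CD-style stereo, 16 bits at 44.1 kHz.
cd_stereo = pcm_bitrate_bps(channels=2, bits_per_sample=16, sample_rate_hz=44_100)

print(f"HDMI 8ch/24-bit/192kHz: {hdmi_max / 1e6:.2f} Mbps")
print(f"CD stereo:              {cd_stereo / 1e6:.2f} Mbps")
print(f"ratio:                  {hdmi_max / cd_stereo:.1f}x")
```

That works out to 36,864,000 bits per second, which is where "about 37 Mbps" comes from.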

But we still usually do it.

Why? Well, basically everything can handle Dolby Digital or DTS, and so films are mostly mastered to Dolby Digital or DTS, and so we mostly use Dolby Digital or DTS. That's just the way of things.

One of the interesting implications of this whole thing is that audio stacks have to deal with multiple formats and figure out which format is in use. That's not really new, with Dolby Pro Logic you either had to turn it on/off with a switch or the receiver had to try to infer whether or not Pro Logic had been used to matrix a multichannel soundtrack to stereo. For S/PDIF, IEC 61937 standardizes a format that can be used to encapsulate a compressed audio stream with sufficient metadata to determine the type of compression. HDMI adopts the same standard to identify compressed audio streams (and, in general, HDMI audio is pretty much in the same bitstream format as good old S/PDIF, but you can have a lot more of it).

In practice, there are a lot of headaches around this format switching. For one, home theater receivers have to switch between decoding modes. They mostly do this transparently and without any fuss, but I've owned a couple that had occasional issues with losing track of which format was in use, leading to dropouts. That might be related to signal dropouts, but my current receiver has the same problem with internal sources, so it seems more like a software bug of some sort.

It's a lot more complicated when you get out of dedicated home theater devices, though. Consider the audio stack of a general-purpose operating system. First, PCs rarely have S/PDIF outputs, so we are virtually always talking about HDMI. For a surprisingly long time, common video cards had no support for audio over HDMI. This is fortunately a problem of the past, but unfortunately ubiquitous audio over HDMI means that your graphics drivers are now involved in the transport of audio, and graphics drivers are notoriously bad at reliably producing video, much less dealing with audio as a side business. I shudder to think of the hours of my life I have lost dealing with defects of AMD's DTS support.

Things are weird on the host software side, though. The operating system does not normally handle sound in formats even resembling Dolby Digital or DTS. So, when you play a video file with audio encoded in one of those formats, a "passthrough" feature is typically used to deliver the compressed stream directly to the audio (often actually video) device, without normal operating system intervention. We are reaching the point where this mostly just works but you will still notice some symptoms of the underlying complexity.

On Linux, it's possible to get this working, but in part because of licensing issues I don't think any distros will do it right out of the box. My knowledge may be out of date as I haven't tried for some time, but I am still seeing Kodi forum threads about bash scripts to bypass PulseAudio, so things seem mostly unchanged.

There are other frustrations, as well. For one, the whole architecture of multichannel audio interconnection is based around sinks detecting the mode used by the source. That means that your home theater receiver should figure out what your video player is doing, but your video player has no idea what your home theater receiver is doing. This manifests in maddening ways. Consider, for example, the number of blog posts I ran across (while searching for something else!) about how to make Netflix less quiet by disabling surround sound.

If Netflix has 5.1 audio they deliver it; they don't know what your speaker setup is. But what if you don't have 5.1 speakers? In principle you could downmix the 5.1 back to stereo, and a lot of home theater receivers have DSP modes that do this (and in general downmix 5.1 or 7.1 to whatever speaker channels are active, good for people with less common setups like my own 3.1). But you'd have to turn that on, which means having a receiver or soundbar or whatever that is capable, understanding the issue, and knowing how to enable that mode. That is way more than your average Netflix watcher wants to think about any of this. In practice, setting the Netflix player to only ever provide stereo audio is an easier fix.
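As a rough illustration of what such a downmix does, here is a Python sketch that folds one 5.1 sample frame into stereo. The -3 dB gains for center and surrounds are a commonly used convention that I am assuming here, not something any particular receiver is guaranteed to use, and dropping the LFE channel is likewise just one common choice. The class and function names are hypothetical.

```python
import math
from dataclasses import dataclass

MINUS_3_DB = 1 / math.sqrt(2)  # assumed gain for center and surround channels


@dataclass
class Frame51:
    """One sample instant of 5.1 audio, each channel in the range -1..1."""
    left: float = 0.0
    right: float = 0.0
    center: float = 0.0
    lfe: float = 0.0
    surround_left: float = 0.0
    surround_right: float = 0.0


def downmix_to_stereo(frame, center_gain=MINUS_3_DB,
                      surround_gain=MINUS_3_DB, normalize=True):
    """Fold 5.1 into stereo. LFE is dropped in this sketch."""
    lo = frame.left + center_gain * frame.center + surround_gain * frame.surround_left
    ro = frame.right + center_gain * frame.center + surround_gain * frame.surround_right
    if normalize:
        # Scale so that full-scale input on every channel cannot clip.
        scale = 1 / (1 + center_gain + surround_gain)
        lo *= scale
        ro *= scale
    return lo, ro


# Dialog usually lives in the center channel. If you only play the left and
# right channels, it vanishes; a downmix spreads it across both speakers.
dialog = Frame51(center=1.0)
print("left/right only:", (dialog.left, dialog.right))
print("downmixed:      ", downmix_to_stereo(dialog))
```

Note that the normalized downmix puts dialog well below full scale, which hints at why downmixed surround soundtracks so often seem quiet.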

The use of compressed multichannel formats that are decoded in the receiver rather than the computer playing back introduces other problems as well, like source equalization. If you have a computer connected to a home theater receiver (which is a ridiculous thing to do and yet here I am), you have two completely parallel audio stacks: "normal" audio that passes through the OS sound server and goes to the receiver as PCM, and "surround sound" that bypasses the OS sound server and goes to the receiver as Dolby Digital or DTS. It is very easy to have differences in levels, adjustments, latency, etc. between these two paths. The level problem here is just one of the several factors in the perennial "Plex is too quiet" forum threads [4].

Finally, let's talk about what may be, to some readers, the elephant in the room. I keep talking about Dolby Digital and DTS, but both are 5.1 formats, and 5.1 is going out of fashion in the movie world. Sure, there's Dolby Digital Plus which is 7.1, but it's so similar to the non-plus variant that there isn't much use in addressing them separately. Insert the "Plus" after Dolby Digital in the preceding paragraphs if it makes you feel better.

But there are two significantly different formats appearing on more and more film releases, especially in the relatively space-unconstrained Blu-Ray versions: lossless surround sound and object-based surround sound.

First, lossless is basically what it sounds like. Dolby TrueHD and DTS-HD are both formats that present 7.1 surround with only lossless compression, at the cost of a higher bitrate than older media and interconnects support. HDMI can easily handle these, and if you have a fairly new setup of a Blu-Ray player and recent home theater receiver connected by HDMI you should be able to enjoy a lossless digital soundtrack on films that were released with one. That's sort of the end of that topic, it's nothing that revolutionary.

But what about object-based surround sound? I'm using that somewhat lengthy term to try to avoid singling out one commercial product, but, well, there's basically one commercial product: Dolby Atmos. Atmos is heralded as a revolution in surround sound in a way that makes it sort of hard to know what it actually is. Here's the basic idea: instead of mastering a soundtrack by mixing audio sources into channels, you master a soundtrack by specifying the physical location (in Cartesian coordinates) of each sound source.

When the audio is played back, an Atmos decoder then mixes the audio into channels on the fly, using whatever channels are available. Atmos allows the same soundtrack to be used by theaters with a variety of different speaker configurations, and as a result, makes it practical for theaters to expand into much higher channel counts.
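To illustrate the general idea (and only the general idea: this is not Dolby's renderer, whose internals I don't know), here is a toy Python sketch that takes one sound object's Cartesian position and computes a gain for each speaker in whatever layout happens to be installed. It uses inverse-distance weighting normalized to constant total power, which is an assumption chosen for simplicity. The speaker coordinates and names are made up.

```python
import math


def render_gains(object_pos, speakers, rolloff=2.0, eps=1e-6):
    """Per-speaker gains for one object, by inverse-distance weighting.

    object_pos: (x, y, z) position of the sound source.
    speakers:   dict of speaker name -> (x, y, z) position.
    Gains are normalized so the sum of their squares is 1 (constant power).
    """
    weights = {
        name: 1.0 / (math.dist(object_pos, pos) ** rolloff + eps)
        for name, pos in speakers.items()
    }
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {name: w / norm for name, w in weights.items()}


# Two hypothetical rooms: x is left/right, y is back/front, z is height.
STEREO = {"L": (-1, 1, 0), "R": (1, 1, 0)}
FIVE = {"L": (-1, 1, 0), "C": (0, 1, 0), "R": (1, 1, 0),
        "Ls": (-1, -1, 0), "Rs": (1, -1, 0)}

# The same object, slightly left of center at the front, in both rooms.
helicopter = (-0.3, 1.0, 0.0)
for layout_name, layout in (("stereo", STEREO), ("five", FIVE)):
    gains = render_gains(helicopter, layout)
    print(layout_name, {name: round(g, 3) for name, g in gains.items()})
```

The point is that the soundtrack carries the position, and the mixing decision is deferred until the decoder knows which speakers actually exist.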

Theaters aren't nearly as important a part of the film industry as they used to be, though, and unsurprisingly Atmos is heavily advertised for consumer equipment as well. How exactly does that work?

Atmos is conveyed on consumer equipment as 7.1 Dolby Digital Plus or Dolby TrueHD with extra metadata.

If you know anything about HDR video, also known as SDR video with extra metadata, you will find this unsurprising. But some might be confused. The thing is, the vast majority of consumers don't have Atmos equipment, and with lossless compression soundtracks are starting to get very large so including two complete copies isn't very appealing. The consumer encoding of Atmos was selected to have direct backward compatibility to 7.1 systems, allowing normal playback on pre-Atmos equipment.

For Atmos-capable equipment, an extra PCM-like subchannel (at a reduced bitrate compared to the audio channels) is used to describe the 3D position of specific sound sources. Consumer Atmos decoders cannot support as many objects as the theatrical version, so part of the process of mastering an Atmos film for home release is clustering nearby objects into groups that are then treated as a single object by the consumer Atmos decoder. One way to think about this is that Atmos is downmixed to 7.1, and in the process a metadata stream is created that can be used to upmix back to Atmos mostly correctly. If it sounds kind of like matrix encoding it kind of is, in effect, which is perhaps part of why Dolby's marketing materials are so insistent that it is not matrix encoding. To be fair it is a completely different implementation, but has a similar effect of reducing the channel separation compared to the original source.

Also I don't think Atmos has really taken off in home setups? I might just be out of date here, I think half the soundbars on the market today claim Atmos support and amazing feats with their five channels two of which are pointed up. I'm just pretty skeptical of the whole "we have made fewer, smaller speakers behave as if they were more, bigger speakers" school of audio products. Sorry Dr. Bose, there's just no replacement for displacement.

[1] The term Linear PCM or LPCM is used to clarify that no companding has been performed. This is useful because PCM originated for the telephone network, which uses companding as standard. LPCM clarifies that neither μ-law companding nor A-law companding has been performed. I will mostly just use PCM because I'm talking about movies and stuff, where companding digital audio is rare.
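For the curious, companding is easy to sketch. The Python below implements the continuous μ-law curve with μ = 255, the value used in North American and Japanese telephony. It is the textbook formula only, not the segmented 8-bit encoding that telephone equipment actually uses, and the function names are mine.

```python
import math

MU = 255  # mu-law parameter used in North American and Japanese telephony


def mu_law_compress(x):
    """Map a linear sample in -1..1 onto the mu-law curve, also in -1..1."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


# Quiet samples get stretched across much more of the output range than loud
# ones, which is the whole point when you only have 8 bits to work with.
for x in (0.001, 0.01, 0.1, 0.5, 1.0):
    y = mu_law_compress(x)
    print(f"linear {x:>5} -> companded {y:.3f} -> expanded {mu_law_expand(y):.3f}")
```

Linear PCM skips this step entirely and stores the quantized sample value as-is.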

[2] There is also the matter of magnetic sources like turntables and microphones that produce much lower output levels than a typical "line level." Ideally you need a preamplifier with adjustable gain for these, although in the case of turntables there are generally accepted gain levels for the two common types of cartridges. A lot of preamplifiers either let you choose from those two or give you no control at all. Traditionally a receiver would have a built-in preamplifier to bring up the level of the signal on the turntable inputs, but a lot of newer receivers have left this out to save money, which leads to hipsters with vinyl collections having to really crank the volume.

[3] I don't feel like I should have to say this, but in the world of audio, I probably do: if it works, it doesn't matter! The problem with optical is that it develops reliability problems over shorter lengths than the electrical format. If you aren't getting missing samples (dropouts) in the audio, though, it's working fine and changing around cables isn't going to get you anything. In practice the length limitations on optical don't tend to matter very much anyway, since the average distance between two pieces of a component home theater system is, what, ten inches?

[4] Among the myriad other factors here is the more difficult problem that movies mix most of the dialog into the center channel while most viewers don't have a center channel. That means you need to remix the center channel into left and right to recover dialog. So-called professionals mastering Blu-Ray releases don't always get this right, and you're in even more trouble if you're having to do it yourself.


>>> 2024-01-21 multi-channel audio part 1

Stereophonic or two-channel audio is so ubiquitous today that we tend to refer to all kinds of pieces of consumer audio reproduction equipment as "a stereo." As you might imagine, this is a relatively modern phenomenon. While stereo audio in concept dates to the late 19th century, it wasn't common in consumer settings until the 1960s and 1970s. Those were very busy decades in the music industry, and radio stations, records, and film soundtracks all came to be distributed primarily in stereo.

Given the success of stereo, though, one wonders why larger numbers of channels have met more limited success. There are, as usual, a number of factors. For one, two-channel audio was thought to be "enough" by some, considering that humans have two ears. Now it doesn't quite work this way in practice, and we are more sensitive to the direction from which sound comes than our binaural system would suggest. Still, there are probably diminishing returns, with stereo producing the most notable improvement in listening experience over mono.

There are also, though, technical limitations at play. The dominant form of recorded music during the transition to stereo was the vinyl record. There is a fairly straightforward way to record stereo on a record, by using a cartridge with coils on two opposing axes. This is the limit, though: you cannot add additional channels as you have run out of dimensions in the needle's free movement.

This was probably the main cause of the failure of quadraphonic sound, the first music industry attempt at pushing more channels. Introduced almost immediately after stereo in the 1970s, quadraphonic or four-channel sound seemed like the next logical step. It couldn't really be encoded on records, so a matrix encoding system was used in which the front-rear difference was encoded as phase shift in the left and right channels. In practice this system worked poorly, and especially early quadraphonic systems could sound noticeably worse than the stereo version. Wendy Carlos, an advocate of quadraphonic sound but harsh critic of musical electronics, complained bitterly about the inferiority of so-called quadraphonic records when compared to true four-channel recordings, for example on tape.

Of course, four-channel tape players were vastly more expensive than record players in the 1970s, as they ironically remain today. Quadraphonic sound was in a bind: it was either too expensive or too poor of quality to appeal to consumers. Quadraphonic radio using the same matrix encoding, while investigated by some broadcasters, had its own set of problems and never saw permanent deployment. Alan Parsons famously produced Pink Floyd's "Dark Side of the Moon" in quadraphonic sound; the effort was a failure in several ways but most memorably because, by the time of the album's release in 1973, the quadraphonic experiment was essentially over.

Three-or-more-channel-sound would have its comeback just a few years later, though, by the efforts of a different industry. Understanding this requires backtracking a bit, though, to consider the history of cinema prints.

Many are probably at least peripherally aware of Cinerama, an eccentric-seeming film format that used three separate cameras, and three separate projectors, to produce an exceptionally widescreen image. Cinerama's excess was not limited to the picture: it involved not only the three 35mm film reels for the three screen panels, but also a fourth 35mm film that was entirely coated with a magnetic substrate and was used to store seven channels of audio. Five channels were placed behind the screen, effectively becoming center, left, right, left side, and right side. The final two tracks were played back behind the audience, as the surround left and surround right.

Cinerama debuted in 1952, decades before 35mm films would typically carry even stereo audio. Like quadraphonic sound later, Cinerama was not particularly successful. By the time stereo records were common, Cinerama had been replaced by wider film formats and anamorphic formats in which the image was horizontally compressed by the lens of the camera, and expanded by the lens of the projector. Late Cinerama films like 2001: A Space Odyssey were actually filmed in Super Panavision 70 and projected onto Cinerama screens from a single projector with a specialized lens.

There's a reason people talk so much about Cinerama, though. While it was not a commercial success, it was influential on the film industry to come. Widescreen formats, mostly anamorphic, would become increasingly common in the following decades. It would take years longer, but so would seven-channel theatrical sound.

"Surround sound," as these multi-channel formats came to be known in the late '50s, would come and go in theatrical presentations throughout the mid-century even as the vast majority of films were presented monaurally, with only a single channel. Most of these relied on either a second 35mm reel for audio only, or the greater area for magnetic audio tracks allowed by 70mm film. Both of these options were substantially more expensive for the presenting theater than mono, limiting surround sound mostly to high-end theaters and premieres. For surround sound to become common, it had to become cheap.

1971's A Clockwork Orange (I will try not to fawn over Stanley Kubrick too much but you are learning something about my film preferences here) employed a modest bit of audio technology, something that was becoming well established in the music industry but was new to film. The magnetic recordings used during the production process employed Dolby Type A noise reduction, similar to what became popular on compact cassette tapes, for a slight improvement in audio quality. The film was still mostly screened in magnetic mono, but it was the beginning of a profitable relationship between Dolby Labs and the film industry. Over the following years a number of films were released with Dolby Type A noise reduction on the actual distribution print, and some theaters purchased decoders to use with these prints. Dolby had bigger ambitions, though.

Around the same time, Kodak had been experimenting with the addition of stereo audio to 35mm release prints, using two optical tracks. They applied Dolby noise reduction to these experimental prints, and brought Dolby in to consult. This presented the perfect opportunity to implement an idea Dolby had been considering. Remember the matrix encoded quadraphonic recording that had been a failure for records? Dolby licensed a later-generation matrix decoder design from Sansui, and applied it to Kodak's stereo film soundtracks, allowing separation into four channels. While the music industry had placed the four channels at the four corners of the soundstage, the film industry had different tastes, driven mostly by the need to place dialog squarely in the center of the field. Dolby's variant of quadraphonic audio was used to present left, right, center, and a "surround" or side channel. This audio format went through several iterations, including much improved matrix decoding, and along the way picked up a name that is still familiar today: Dolby Stereo.

That Dolby Stereo is, in fact, a quadraphonic format reflects a general atmosphere of terminological confusion in the surround sound industry. Keep this in mind.

One of Dolby Stereo's most important properties was its backwards compatibility. The two optical tracks could be played back on a two-channel (or actually stereo) system and still sound alright. They could even be placed on the print alongside the older magnetic mono audio, providing compatibility with mono theaters. This compatibility with fewer channels became one of the most important traits in surround sound systems, and somewhat incidentally served to bring them to the consumer. Since the Dolby Stereo soundtrack played fine on a two-channel system, home releases of films on formats like VHS and Laserdisc often included the original Dolby Stereo audio from the print. A small industry formed around these home releases, licensing the Dolby technology to sell consumer decoders that could recover surround sound from home video.

For cost reasons these decoders were inferior to Dolby's own in several ways, and to avoid the hazard of damage to the Dolby Stereo brand, Dolby introduced a new marketing name for consumer Dolby Stereo decoders: Dolby Surround.

By the 1980s, Dolby Stereo, or Dolby Surround, had become the most common audio format on theatrical presentations and their home video releases. Even some television programs and direct-to-video material were recorded in Dolby Surround. Consumer stereo receivers, in the variant that came to be known as the home theater receiver, often incorporated Dolby Surround decoders. Improvements in consumer electronics brought the cost of proper Dolby Stereo decoders down, and so the home systems came to resemble the theatrical systems as well. Seeking a new brand to unify the whole mess of Dolby Stereo and Dolby Surround (which, confusingly, were often 4 and 3 channel, respectively), Dolby seems to have turned to the "Advanced Logic" and "Full Logic" terms once used by manufacturers of quadraphonic decoders. Dolby's theatrical sound solution came to be known as Dolby Pro Logic. A Dolby Pro Logic decoder processed two audio channels to produce a four-channel output. According to a modern naming convention, Dolby Pro Logic is a 4.0 system: four full-bandwidth channels.

This entire thing, so far, has been a preamble to the topic I actually meant to discuss. It's an interesting preamble, though! I just want to apologize that I didn't mean to write a history of multi-channel audio distribution and so this one isn't especially complete. I left out a number of interesting attempts at multi-channel formats, of which the film industry produced a surprising number, and instead focused on the ones that were influential and/or used for Kubrick films [1].

Dolby Pro Logic, despite its impressive name, was still an analog format, based on an early '70s technique. Later developments would see an increase in the number of channels, and the transition to digital audio formats.

Recall that 70mm film provided six magnetic audio channels, which were often used in an approximation of the seven-channel Cinerama format. Dolby experimented with the six-channel format, though, confusingly also under the scope of the Dolby Stereo product. During the '70s, Dolby observed that the ability of humans to differentiate the source of a sound is significantly reduced as the sound becomes lower in frequency. This had obvious potential for surround sound systems, enabling something analogous to chroma subsampling in video. The lower-frequency component of surround sound does not need to be directional, and for a sense of directionality the high frequencies are most important.

Besides, bassheads were coming to the film industry. The long-used Academy response curve fell out of fashion during the '70s, in part due to Dolby's work, in part due to generally improved loudspeaker technology, and in part due to the increasing popularity of bass-heavy action films. Several 70mm releases used one or more of the audio channels as dedicated bass channels.

For the 1979 film Apocalypse Now in its 70mm print, Dolby premiered a 5.1 format in which three full-bandwidth channels were used for center, left, and right, two channels with high-pass filtering were used for surround left and surround right, and one channel with low-pass filtering was used for bass. Apocalypse Now was not, in fact, the first film to use this channel configuration, but Dolby promoted it far more than the studios had.

Interestingly, while I know less about live production history, the famous cabaret Moulin Rouge apparently used a 5.1 configuration during the 1980s. Moulin Rouge was prominent enough to give the 5.1 format a boost in popularity, perhaps particularly important because of the film industry's indecision on audio formats.

The seven-channel concept of the original Cinerama must have hung around in the film industry, as there was continuing interest in a seven-channel surround configuration. At the same time, the music industry widely adopted eight-channel tape recorders for studio use, making eight-channel audio equipment readily available. The extension to 7.1 surround, adding left and right side channels to the 5.1 configuration, was perhaps obvious. Indeed, what I find strangest about 7.1 is just how late it was introduced to film. Would you believe that the first film released (not merely remastered or mixed for Blu-Ray) in 7.1 was 2010's Toy Story 3?

7.1 home theater systems were already fairly common by then, a notable example of a modern trend afflicting the film industry: the large installed base and cost avoidance of the theater industry means that consumer home theater equipment now evolves more quickly than theatrical systems. Indeed, while 7.1 became the gold standard in home theater audio during the 2000s, 5.1 remains the dominant format in theatrical sound systems today.

Systems with more than eight channels are now in use, but haven't caught on in the consumer setting. We'll talk about those later. For most purposes, eight-channel 7.1 surround sound is the most complex you will encounter in home media. The audio may take a rather circuitous route to its 7.1 representation, but, well, we'll get to that.

Let's shift focus, though, and talk a bit about the actual encodings. Audio systems up to 7.1 can be implemented using analog recording, but numerous analog channels impose practical constraints. For one, they are physically large, making it infeasible to put even analog 5.1 onto 35mm prints. Prestige multi-channel audio formats like that of IMAX often avoided this problem by putting the audio onto an entirely separate film reel (much like Cinerama back at the beginning), synchronized with the image using a pulse track and special equipment. This worked well but drove up costs considerably. Dolby Stereo demonstrated that it was possible to matrix four channels into two channels (with limitations), but considering the practical bandwidth of the magnetic or optical audio tracks on film you couldn't push this technique much further.

Remember that the theatrical audio situation changed radically during the 1970s, going from almost universal mono audio to four channels as routine and six channels for premieres and 70mm. During the same decade, the music reproduction industry, especially in Japan, was exploring another major advancement: digital audio encoding.

In 1982, the Compact Disc launched. Numerous factors contributed to the rapid success of CDs over vinyl and, to a lesser but still great extent, the compact cassette. One of them was the quality of the audio reproduction. CDs were a night and day change: records could produce an excellent result but almost always suffered from dirt and damage. Cassette tapes were better than most of us remember but still had limited bandwidth and a high noise floor, requiring Dolby noise reduction for good results. The CD, though, provided lossless digital audio.

Audio is encoded on an audio CD in PCM format. PCM, or pulse code modulation, is a somewhat confusing term that originated in the telephone industry. If we were to reinvent it today, we would probably just call it digital modulation. To encode a CD, audio is sampled (at 44.1 kHz for historic reasons) and quantized to 16 bits. A CD carries two channels, stereo, which was by then the universal format for music. Put together, those add up to 1.4Mbps. This was a very challenging data rate in 1980, and indeed, practical CD players relied on the fact that the data did not need to be read perfectly (error correcting codes were used) and did not need to be stored (going directly to a digital to analog converter). These were conveniently common traits of audio reproduction systems, and the CD demonstrated that digital audio was far more practical than the computing technology of the time would suggest.
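The sample-and-quantize process is simple enough to sketch in a few lines of Python. This toy example samples a sine wave at the CD rate and rounds each sample to a signed 16-bit integer, then works out the data rate. It is only an illustration of PCM itself: the error correction and channel coding that a real CD layers on top are not modeled, and the function names are my own.

```python
import math

SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2


def quantize(x, bits=BITS_PER_SAMPLE):
    """Round a sample in -1..1 to the nearest signed integer code."""
    full_scale = 2 ** (bits - 1) - 1  # 32767 for 16 bits
    x = max(-1.0, min(1.0, x))        # clip anything out of range
    return round(x * full_scale)


def sample_sine(freq_hz, duration_s, sample_rate_hz=SAMPLE_RATE_HZ):
    """Sample a full-scale sine wave and quantize each sample."""
    count = round(duration_s * sample_rate_hz)
    return [
        quantize(math.sin(2 * math.pi * freq_hz * i / sample_rate_hz))
        for i in range(count)
    ]


samples = sample_sine(freq_hz=1000, duration_s=0.01)
bitrate_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS

print(f"{len(samples)} samples for 10 ms of one channel")
print("first few codes:", samples[:5])
print(f"stereo data rate: {bitrate_bps} bits per second")
```

The stereo data rate comes out to 1,411,200 bits per second, the 1.4Mbps figure above.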

The future of theatrical sound would be digital. Indeed, many films would be distributed with their soundtracks on CD.

There remained a problem, though: a CD could encode two channels. Even four channels wouldn't fit within the data rate CD equipment was capable of, much less six or eight. The film industry would need formats that could encode six or eight channels of audio into either the bandwidth of a two-channel signal or into precious unused space on 35mm film prints.
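
To put rough numbers on that, here is a minimal sketch of the arithmetic in Python. It assumes every channel is carried as uncompressed CD-quality PCM (44.1 kHz, 16 bits), which is a simplification of my own: it counts only the audio payload, ignores error correction overhead, and treats a subwoofer channel like any other. The function name pcm_bit_rate is just made up for the illustration.

```python
# Uncompressed PCM bit rate: samples per second x bits per sample x channels.
SAMPLE_RATE_HZ = 44_100   # CD sampling rate
BITS_PER_SAMPLE = 16      # CD quantization depth


def pcm_bit_rate(channels, sample_rate_hz=SAMPLE_RATE_HZ, bits=BITS_PER_SAMPLE):
    """Return the raw audio bit rate in bits per second."""
    return channels * sample_rate_hz * bits


cd_rate = pcm_bit_rate(2)  # stereo CD audio, the rate CD hardware was built for

# Hypothetical multi-channel soundtracks at the same per-channel quality.
for channels in (2, 4, 6, 8):
    rate = pcm_bit_rate(channels)
    print(f"{channels} channels: {rate / 1e6:.4f} Mbps ({rate / cd_rate:.0f}x CD)")
```

Stereo works out to 1,411,200 bits per second, the "1.4 Mbps" figure, and six channels at the same quality would need three times that. Under these assumptions, anything past two channels has to be squeezed in some other way.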

Many ingenious solutions were developed. A typical 35mm film print today contains three distinct representations of the audio: a two-channel analog optical track between the sprocket holes and the frame (which could encode Dolby Stereo), a continuous 2D barcode along the outer edges of the film, outside the sprocket holes, which carries the SDDS (Sony Dynamic Digital Sound) digital signal, and individual 2D barcodes between the sprocket holes which encode the Dolby Digital signal. Finally, a small pulse pattern running alongside the optical track provides a time code used for synchronization with audio played back from a CD, the DTS system.

But then, a typical 35mm film print today wouldn't exist, as 35mm film distribution has all but disappeared. Almost all modern film is played back entirely digitally from some sort of flexible stream container. You would think, then, that the struggles of encoding multi-channel audio are over. Many media container formats can, after all, contain an arbitrary number of audio channels.

Nothing is ever so simple. Much like a dedicated audio reel adds cost, multiple audio channels inflate file sizes and media cost and, in the era of playback from optical media, could stress the practical read rate. Besides, constraints of the past have a way of sticking around. Every multi-channel audio format to find widespread success in the film industry has done so by maintaining backwards compatibility with simple mono and stereo equipment. That continues to be true today: modern multi-channel digital audio formats are still mostly built as extensions of an existing stereo encoding, not as truly new arbitrary-channel formats.

At the same time, the theatrical sound industry has begun a transition away from channel-centric audio formats and towards a more flexible system that is much further removed from the actual playback equipment.

Another trend has emerged since 1980 as well, which you probably already suspected from the multiple formats included in 35mm prints. Dolby's supremacy in multi-channel audio was never as complete as I made it sound, although they did become (and for some time remained) the most popular surround sound solution. They have always had competition, and that's still true today. Just as 35mm prints came with the audio in multiple formats, current digitally distributed films often do as well.

In Part 2, I'll get to the topic I meant to write about today before I got distracted by history: the landscape of audio formats included in digitally distributed films and common video files today, and some of the ways they interact remarkably poorly with computers. We're going to talk about:

Postscript: Film dweebs will of course wonder where George Lucas is in this story. His work on the Star Wars trilogy led to the creation of THX, a company that will long be remembered for its distinctive audio identity. The odd thing is that THX was never exactly a technology company, although it was closely involved in sound technology developments of the time. THX was essentially a certification agency: THX theaters installed equipment by others (Altec Lansing, for much of the 20th century), and used any of the popular multi-channel audio formats.

To be a THX-certified theater, certain performance requirements had to be met, regardless of the equipment and format in use. THX certification requirements included architectural design standards for theaters, performance specifications for audio equipment, and a specific crossover configuration designed by Lucasfilm.

In 2002, Lucasfilm spun out THX and it essentially became a rental brand, one that has since been shuffled into the ownership of gamer headphone manufacturer Razer. THX certification still pops up in some consumer home theater equipment but is no longer part of the theatrical audio industry.

Read part 2 >

[1] Incidentally, Kubrick did not adapt to Dolby Stereo. Despite his early experience with Dolby noise reduction, all of his films would be released in mono except for 2001 (six-channel audio only in the Cinerama release) and Eyes Wide Shut (edited in Dolby Stereo after Kubrick's death).
