It's one of those anachronisms that is deeply embedded in modern technology.
From cloud operator servers to embedded controllers in appliances, there
must be uncountable devices that think they are connected to a TTY.
I will omit the many interesting details of the Linux terminal infrastructure
here, as it could easily fill its own article. But most Linux users are at
least peripherally aware that the kernel tends to identify both serial devices
and terminals as TTYs, assigning them filesystem names in the form of
/dev/tty*. Probably a lot of those people remember that this stands for
teletype or perhaps teletypewriter, although in practice the term teleprinter
is more common.
Indeed, from about the 1950s (the genesis of electronic computers) to the 1970s
(the rise of video display terminals/VDTs), teleprinters were the most common
form of interactive human-machine interface. The "interactive" distinction here
is important; early computers were built primarily around noninteractive input
and output, often using punched paper tape. Interactive operation was a more
advanced form of computing, one that took almost until the widespread use of
VDTs to mature. Look into the computers of the 1960s especially, the early days
of interactive operation, and you will be amazed at how bizarre and unfriendly
the command interface is. It wasn't really intended for people to use; it was
for the Computer Operator (who had attended a lengthy training course on the
topic) to troubleshoot problems in the noninteractive workload.
But interactive computing is yet another topic I will one day take on. Right
now, I want to talk about the heritage of these input/output mechanisms. Why is
it that punched paper tape and the teleprinter were the most obvious way to
interact with the first electronic computers? As you might suspect, the
arrangement was one of convenience. Paper tape punches and readers were already
being manufactured, as were teleprinters. They were both used for
communications.
Most people who hear about the telegraph think of Morse code keys and rhythmic
beeping. Indeed, Samuel Morse is an important figure in the history of
telegraphy. The form of "morse code" that we tend to imagine, though, a
continuous wave "beep," is mostly an artifact of radio. For telegraphs, no
carrier wave or radio modulation was required. You can transmit a message
simply by interrupting the current on a wire.
This idea is rather simple to conceive and even to implement, so it's no
surprise that telegraphy has a long history. By the end of the 18th century
inventors in Europe and Great Britain were devising simple electrical
telegraphs. These early telegraphs had limited ranges and even more limited
speeds, though, a result mostly of the lack of a good way to indicate to the
operator whether or not a current was present. It is an intriguing aspect of
technical history that the first decades of experimentation with electricity
were done with only the clumsiest means of measuring or even detecting it.
In 1820, three physicists or inventors (these were vague titles at the time)
almost simultaneously worked out that electrical current induced a magnetic
field. They invented various ways of demonstrating the effect, usually by
deflecting a magnetic needle. This innovation quickly led to the
"electromagnetic telegraph," in which a telegrapher operates a key to switch
current, which causes a needle or flag to deflect at the other end of the
circuit. This was tremendously simpler than previous means of indicating
current and was applied almost immediately to build the first practical
telegraphs. During the 1830s, the invention of the relay allowed telegraph
signals to be repeated or amplified as the potential weakened (the origin of
the term "relay"). Edward Davy, one of the inventors of the relay, also
invented the telegraph recorder.
From 1830 to 1850, so many people invented so many telegraph systems that it is
difficult to succinctly describe how an early practical telegraph worked. There
were certain themes: for non-recording systems, a needle was often deflected
one way or the other by the presence or absence of current, or perhaps by
polarity reversal. Sometimes the receiver would strike a bell or sound a buzzer
with each change. In recording systems, a telegraph printer or telegraph
recorder embossed a hole or left a small mark on a paper tape that advanced
through the device. In the first case, the receiving operator would watch the
needle, interpreting messages as they came. In the second case, the operator
could examine the paper tape at their leisure, interpreting the message based
on the distances between the dots.
Recording systems tended to be used for less time-sensitive operations like
passing telegrams between cities, while non-recording telegraphs were used for
more real-time applications like railroad dispatch and signaling. Regardless,
it is important to understand that the teleprinter is about as old as the
telegraph. Many early telegraphs recorded received signals onto paper.
The interpretation of telegraph signals was as varied as the equipment that
carried them. Samuel Morse popularized the telegraph in the United States based
in part on his alphabetic code, but it was not the first. Gauss famously
devised a binary encoding for alphabetic characters a few years earlier, which
resembles modern character encodings more than Morse's scheme. In many telegraph
applications, though, there was no alphabetic code at all. Railroad signal
telegraphs, for example, often used application-specific schemes that encoded
types of trains and routes instead of letters.
Morse's telegraph system was very successful in the United States, and in 1861
a Morse telegraph line connected the coasts. It surprises some that a
transcontinental telegraph line was completed some fifty years before the
transcontinental telephone line. Telegraphy is older, though, because it is
simpler. There is no analog signaling involved; simple on/off or polarity
signals can be amplified using simple mechanical relays. The tendency to
view text as more complex than voice (SMS came after the first cellphones,
for one) has more to do with the last 50 years than the 50 years before.
The Morse telegraph system was practical enough to spawn a large industry, but
suffered a key limitation: the level of experience required to key and copy
Morse quickly and reliably is fairly high. Telegraphers were skilled and, thus,
fairly well paid and sometimes in short supply [1]. To drive down the cost of
telegraphy, there would need to be more automation.
Many of the earliest telegraph designs had employed parallel signaling. A
common scheme was to provide one wire for each letter, and a common return.
These were impractical to build over any meaningful distance, and Morse's
one-wire design (along with one-wire designs by others) won out for obvious
reasons. The idea of parallel signaling stayed around, though, and was
reintroduced during the 1840s with a simple form of multiplexing: one "logical
channel" for each letter could be combined onto one wire using time division
muxing, for example by using a transmitter and receiver with synchronized
spinning wheels. Letters would be presented by positions on the wheel, and a
pulse sent at the appropriate point in the revolution to cause the teleprinter
to produce that letter. With this alphabetic teleprinter, an experienced
operator was no longer required to receive messages. They appeared as text on a
strip of paper, ready for an unskilled clerk to read or paste onto a message
card.
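To make the wheel idea concrete, here is a minimal sketch. It assumes a hypothetical 27-position wheel (A to Z plus a space) and one character per revolution. Real Hughes wheels had their own layouts. The point is only that the letter is carried by *when* the pulse arrives in the revolution, not by the pulse itself.

```python
import string

# Hypothetical wheel layout: one position per letter plus a space.
# Real Hughes wheels differed; this layout is illustrative only.
WHEEL = string.ascii_uppercase + " "


def send(message: str) -> list[int]:
    """Transmitter: for each character (one per revolution), return the wheel
    position at which the pulse is sent."""
    return [WHEEL.index(ch) for ch in message.upper()]


def receive(pulse_positions: list[int]) -> str:
    """Receiver: a wheel spinning in sync maps each pulse's position back to
    the letter that was under the print head at that moment."""
    return "".join(WHEEL[pos] for pos in pulse_positions)


pulses = send("Hello world")
print(pulses)
print(receive(pulses))
```

This only works if both wheels really do stay in step. A receiver that drifts by one position prints the neighboring letter every time.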
This system proved expensive but still practical to operate, and a network of
such alphabetic teleprinters was built in the United States during the mid 19th
century. A set of smaller telegraph companies operating one such system, called
the Hughes system after its inventor, joined together to become the Western
Union Telegraph Company. In a precedent that would be followed even more closely
by the telephone system, practical commercial telegraphy was intertwined with a
monopoly.
The Hughes system was functional but costly. The basic idea of multiplexing
across 30 channels was difficult to achieve with mechanical technology. Émile
Baudot was employed by the French telegraph service to find a way to better
utilize telegraph lines. He first developed a proper form of multiplexing,
using synchronized switches to combine five Hughes system messages onto one
wire and separate them again at the other end. Likely inspired by his close
inspection of the Hughes system and its limitations, Baudot went on to develop
a more efficient scheme for the transmission of alphabetic messages: the Baudot
code.
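Baudot's synchronized switches are what we would now call time-division multiplexing. Below is a minimal sketch. I've made two simplifications of my own: characters stand in for the code groups each time slot actually carried, and shorter messages are padded with an idle space. The city names are just sample data.

```python
from itertools import zip_longest


def multiplex(messages: list[str], idle: str = " ") -> list[str]:
    """Interleave messages onto one line: each rotation of the distributor
    takes the next character from every channel in turn."""
    line: list[str] = []
    for rotation in zip_longest(*messages, fillvalue=idle):
        line.extend(rotation)
    return line


def demultiplex(line: list[str], channels: int) -> list[str]:
    """A receiver rotating in step assigns every n-th symbol to the same channel."""
    return ["".join(line[i::channels]) for i in range(channels)]


messages = ["PARIS", "LYON", "NICE", "LILLE", "METZ"]
line = multiplex(messages)
print(line[:5])  # first rotation: one symbol from each channel
print(demultiplex(line, len(messages)))
```

If the receiving switch slips by even one slot, every character lands in the wrong channel. That is why keeping the two ends synchronized was the hard part.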
Baudot's system was similar to the Hughes system in that it relied on a
transmitter and receiver kept in synchronization to interpret pulses as
belonging to the correct logical channel. He simplified the design, though, by
allowing for only five logical channels. Instead of each pulse representing a
letter, the combination of all five channels would be used to form one symbol.
The Baudot code was a five-bit binary alphabetic encoding, and most computer
alphabetic encodings to the present day are at least partially derived from it.
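Here is that idea as a minimal sketch. The code table is hypothetical: letters are simply numbered A=1 through Z=26, which is not the real Baudot or ITA2 assignment. It only shows how five parallel channels combine into one symbol. Real five-bit codes such as ITA2 also used some combinations for letters/figures shift characters, which is how they reached digits and punctuation.

```python
import string

# Hypothetical table: A=1 ... Z=26. NOT the real Baudot/ITA2 assignment.
CODE = {ch: n for n, ch in enumerate(string.ascii_uppercase, start=1)}
DECODE = {n: ch for ch, n in CODE.items()}


def to_channels(ch: str) -> tuple[int, ...]:
    """Split one character's code into five bits, one per logical channel
    (channel 1 carries the most significant bit, an arbitrary choice)."""
    value = CODE[ch]
    return tuple((value >> (4 - i)) & 1 for i in range(5))


def from_channels(bits: tuple[int, ...]) -> str:
    """Recombine the five channel bits into a single symbol."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return DECODE[value]


assert 2 ** 5 >= len(CODE)  # 32 combinations are enough for 26 letters
print(to_channels("B"))
print("".join(from_channels(to_channels(c)) for c in "BAUDOT"))
```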
One of the downsides of Baudot's design is that it was not quite as easy to
operate as telegraphy companies would have hoped. Baudot equipment could sustain 30
words per minute with a skilled operator who could work the five-key
piano-style keyboard in good synchronization with the mechanical armature that
read it out. This took a great deal of practice, though, and pressing keys out
of synchronization with the transmitter could easily cause incorrect letters to
be sent.
In 1901, during the early days of the telephone, Donald Murray developed an
important enhancement to the Baudot system. He was likely informed by an older
practice that had been developed for Morse telegraphs, of having an operator
punch a Morse message into paper tape to be transmitted by a simple tape reader
later. He did the same for Baudot code: he designed a device with an easy-to-use
typewriter-like keyboard that punched Baudot code onto a strip of paper
tape with five rows, one for each bit. The tape punch had no need to be
synchronized with the other end, and the operator could type at whatever pace
they were comfortable.
The invention of Murray's tape punch brought about the low-cost telegram
networks that we are familiar with from the early 20th century. A clerk would
take down a message and then punch it onto paper tape. Later, the paper tape
would be inserted into a reader that transmitted the Baudot message in perfect
synchronization with the receiver, a teleprinter that typed it onto tape as
text once again. The process of encoding and decoding messages for the
telegraph was now fully automated.
The total operation of the system, though, was not. For one, the output was
paper tape, which had to be cut and pasted to compose a paragraph of text.
For another, the transmitting and receiving equipment operated continuously,
requiring operators to coordinate on the scheduling of sending messages (or
they would tie up the line and waste a lot of paper tape).
In a wonderful time capsule of early 20th century industrialism, the next major
evolution would come about with considerable help from the Morton Salt Company.
Joy Morton, its founder, agreed to fund Frank Pearne's efforts to develop an
even more practical printing telegraph. This device would use a typewriter
mechanism to produce the output as normal text on a page, saving considerable
effort by clerks. Even better, it would use a system of control codes to
indicate the beginning and end of messages, allowing a teleprinter to operate
largely unattended. This was more complex than it sounded, as it required
finding a way for the two ends to establish clock synchronization before the
message.
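Teleprinters eventually settled on start-stop framing for this. Instead of synchronizing once per message, the receiver restarts its clock on every character. A minimal sketch follows. For simplicity it assumes a single stop bit and least-significant-bit-first order; real equipment differed in these details.

```python
def frame(code: int, data_bits: int = 5) -> list[int]:
    """Wrap one character code for start-stop transmission.

    The idle line sits at 1 (mark). A 0 (space) start bit tells the receiver
    to start timing, the data bits follow, and a 1 stop bit returns to idle.
    """
    data = [(code >> i) & 1 for i in range(data_bits)]  # LSB first (assumed)
    return [0] + data + [1]


def unframe(stream: list[int], data_bits: int = 5) -> list[int]:
    """Recover character codes from a stream of mark/space bits."""
    codes: list[int] = []
    i = 0
    while i < len(stream):
        if stream[i] == 1:  # idle mark between characters: keep waiting
            i += 1
            continue
        # Start bit found: sample the next data_bits bits, then expect a stop bit.
        stop = i + 1 + data_bits
        if stop >= len(stream) or stream[stop] != 1:
            raise ValueError(f"framing error at bit {i}")
        data = stream[i + 1 : stop]
        codes.append(sum(bit << n for n, bit in enumerate(data)))
        i = stop + 1
    return codes


# Idle time between characters is fine; the receiver just waits for a start bit.
line = [1, 1] + frame(3) + [1, 1, 1] + frame(30)
print(unframe(line))
```

The clock restarts at every start bit. So the two ends only need to agree on the bit rate closely enough to stay aligned for a single character.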
There were, it turned out, others working on the same concept. After a series
of patent disputes, mergers, and negotiations, the Morkrum-Kleinschmidt Company
would market this new technology. A fully automated teleprinter, lurching into
life when the other end had a message to send, producing pages of text like a
typewriter with an invisible typist.
In 1928, Morkrum-Kleinschmidt adopted a rather more memorable name: the
Teletype Corporation. During the development of the Teletype system, the
telephone network had grown into a nationwide enterprise and one of the United
States' largest industrial ventures (at many points in time, the country's
single largest employer). AT&T had already entered the telegraph business by
leasing its lines for telegraph use, and work had already begun on telegraphs
that could operate over switched telephone lines, transmitting text as if it
were a phone call. The telephone was born of the telegraph but came to consume
it. In 1930, the Teletype Corporation was purchased by AT&T and became part of
Western Electric.
That same year, Western Electric introduced the Teletype Model 15. Receiving
Baudot at 45 baud [2] with an optional tape punch and tape reader, the Model 15
became a workhorse of American communications. By some accounts, the Model 15
was instrumental in the prosecution of World War II. The War Department made
extensive use of AT&T-furnished teletype networks and Model 15 teleprinters as
the core of the military logistics enterprise. The Model 15 was still being
manufactured as late as 1963, a production record rivaled by few other
electrical devices.
It is difficult to summarize the history of the networks that teleprinters
enabled. The concept of switching connections between teleprinters, as was done
on the phone network, was an obvious one. The dominant switched teleprinter
network was Telex, not really an organization but actually a set of standards
promulgated by the ITU. The most prominent US implementation of Telex was an
AT&T service called TWX, short for Teletypewriter Exchange Service. TWX used
Teletype teleprinters on phone lines (in a special class of service), and was
a very popular service for business use from the '40s to the '70s.
Incidentally, TWX was assigned the special purpose area codes 510, 610, 710,
810, and 910, which contained only teleprinters. These area codes would
eventually be assigned to other uses, but for a long time ranked among the
"unusual" NPAs.
Western Union continued to develop their telegraph network during the era of
TWX, acting in many ways as a sibling or shadow of AT&T. Like AT&T, Western
Union developed multiplexing schemes to make better use of their long-distance
telegraph lines. Like AT&T, Western Union developed automatic switching systems
to decrease operator expenses. Like AT&T, Western Union built out a microwave
network to increase the capacity of their long-haul network. Telegraphy is one
of the areas where AT&T struggled despite their vast network, and Western Union
kept ahead of them, purchasing the TWX service from AT&T. Western Union would
continue to operate the switched teleprinter network, under the Telex name,
into the '80s when it largely died out in favor of the newly developed fax
machine.
During the era of TWX, encoding schemes changed several times as AT&T and
Western Union developed better and faster equipment (Western Union continued to
make use of Western Electric-built Teletype machines among other equipment).
ASCII came to replace Baudot, and so a number of ASCII teleprinters existed.
There were also hybrids. For some time Western Union operated teleprinters on
an ASCII variant that provided only upper case letters and some punctuation,
with the benefit of requiring fewer bits. The encoding and decoding of this
reduced ASCII set was implemented by the Bell 101 telephone modem, designed in
1958 to allow SAGE computers to communicate with one another and then widely
included in TWX and Telex teleprinters. The Bell 101's descendants would bring
about remote access to time-sharing computer systems and, ultimately, one of
the major forms of long-distance computer networking.
You can see, then, that the history of teleprinters and the history of
computers are naturally interleaved. From an early stage, computers operated
primarily on streams of characters. This basic concept is still the core of
many modern computer systems and, not coincidentally, also describes the
operation of teleprinters.
When electronic computers were under development in the 1950s and 1960s,
teleprinters were near the apex of their popularity as a medium for business
communications. Most people working on computers probably had experience with
teleprinters; most organizations working on computers already had a number of
teleprinters installed. It was quite natural that teleprinter technology would
be repurposed as a means of input and output for computers.
Some of the very earliest computers, for example those of Konrad Zuse, employed
punched tape as an input medium. These were almost invariably repurposed or
modified telegraphic punched tape systems, often in five-bit Baudot.
Particularly in retrospect, as more materials have become available to
historians, it is clear that much of the groundwork for digital computing was
laid by WWII cryptological efforts.
Newly devised cryptographic machines like the Lorenz ciphers were essentially
teleprinters with added digital logic. The machines built to attack these
codes, like Colossus, are now generally recognized as the first programmable
electronic computers. The line between teleprinter and computer was not always clear. As
more encoding and control logic was added, teleprinters came to resemble simple
computers.
The Manchester Mark I, a pioneer of stored-program computing built in 1949,
used a 5-bit code adopted from Baudot by none other than Alan Turing. The major
advantage of this 5-bit encoding was, of course, that programs could be read
and written using Baudot tape and standard telegraph equipment. The addition of
a teleprinter allowed operators to "interactively" enter instructions into the
computer and read the output, although the concept of a shell (or any other
designed user interface) had not yet been developed. EDSAC, a contemporary of
the Mark I and precursor to a powerful tea logistics system that would set off
the development of business computing, also used a teleprinter for input and
output.
Many early commercial computers limited input and output to paper tape, often
5-bit for Baudot or 8-bit for ASCII with parity, because in the early days of
computing, preparing a program was an exacting process that would not
typically be done "on the fly" at a keyboard. It was, of course, convenient
that teleprinters with tape punches could be used to prepare programs for entry
into the computer.
Business computing is most obviously associated with IBM, a company that had
large divisions building both computers and typewriters. The marriage of the
two was inevitable considering the existing precedent. Beginning around 1960 it
was standard for IBM computers to furnish a teleprinter as the operator
interface, but IBM's heritage was distinct from that of the telecommunications
industry and, for several reasons, IBM was intent on maintaining that distinction. IBM's
teleprinter-like devices were variously called Data Communications Systems,
Printer-Keyboards, Consoles, and eventually Terminals. They generally operated
over proprietary serial channels.
Other computer manufacturers didn't have typewriter divisions, and typewriters
and teleprinters were actually rather complex mechanical devices and not all
that easy to build. As a result, they tended to buy teleprinters from
established manufacturers, often IBM or Western Electric. Consider the case of
a rather famous non-IBM computer, the DEC PDP-1 of 1960. It came with a CRT
graphics display as standard, and many sources will act as if this was the
primary operator interface, but it is important to understand that early CRT
graphics displays had a hard time with text. Text is rather complex to render
when you are writing point-by-point to a CRT vector display from a rather slow
machine. You would be surprised how many vertices a sentence has in it.
So despite the ready availability of CRTs in the 1960s (they were, of course,
well established in the television industry), few computers used them for
primary text input/output. Instead, the PDP-1 was furnished with a modified IBM
typewriter as its console. This scheme of paying a third-party company (Soroban
Engineering) to modify IBM typewriters for teleprinter control was apparently
not very practical, and later DEC PDP models tended to use Western Electric
Teletypes as user terminals. These had the considerable advantage that they
were already designed to operate over long telephone circuits, making it easy
to install multiple terminals throughout a building for time sharing use.
Indeed, time sharing was a natural fit for teleprinter terminals. With a
teleprinter and a computer with a suitable modem, you could "call in" to a time
sharing computer over the telephone from a remote office. Most of the first
practical "computer networks" (term used broadly) were not actually networks of
computers, but a single computer with many remote terminals. This architecture
evolved into the BBS and early Internet-like services such as CompuServe. The
idea was surprisingly easy to implement once time sharing operating systems
were developed; the necessary hardware was already available from Western
Electric.
While I cannot swear to the accuracy of this attribution, many sources suggest
that the term "tty" as a generic reference to a user terminal or serial I/O
channel originated with DEC. It seems reasonable; DEC's software was very
influential on the broader computer industry, particularly outside of IBM.
UNIX was first developed on a PDP-7 and soon moved to the PDP-11, both used with
teleprinters. While I can't prove it, it seems quite believable that the tty
terminology was adopted directly from RT-11 or another operating system that
Bell Labs staff might have used on DEC hardware.
Computers were born of the teleprinter and would inevitably come to consume
them. After all, what is a computer but a complex teleprinter? Today,
displaying text and accepting it from a keyboard is among the most basic
functions of computers, and computers continue to perform this task using an
architecture that would be familiar to engineers in the 1970s. They would
likely be more surprised by what hasn't changed than what has: many of us still
spend a lot of time in graphical software pretending to be a video display
terminal built for compatibility with teleprinters.
And we're still using that 7-bit ASCII code a lot, aren't we? At least Baudot
died out and we get to enjoy lower case letters.
[1] Actor, singer, etc. Gene Autry had worked as a telegrapher before he began
his career in entertainment. This resulted in no small number of stories of a
celebrity stand-in at the telegraph office. Yes, this is about to be a local
history anecdote. It is fairly reliably reported that Gene Autry once
volunteered to stand in for the telegrapher and station manager at the small
Santa Fe Railroad station in Socorro, New Mexico, as the telegrapher had been
temporarily overwhelmed by the simultaneous arrival of a packed train and a
series of telegrams. There are enough of these stories about Gene that I think
he really did keep his Morse sharp well into his acting career.
[2] Baud is a somewhat confusing unit derived from Baudot. Baud refers to the
number of symbols per second on the underlying communication medium. For simple
binary systems (and thus many computer communications systems we encounter
daily), baud rate is equivalent to bit rate (bps). For systems that employ
multi-level signaling, the bit rate will be higher than the baud rate, as
multiple bits are represented per symbol on the wire. Methods like QAM are
useful because they result in bit rates that are many multiples of the baud
rate, reducing the bandwidth needed on the wire for a given bit rate.
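The relationship is just a logarithm: a symbol that can take one of M distinguishable states carries log2(M) bits. Here is a small sketch. The 2400-baud, 16-level figures are illustrative numbers I picked, not a specific modem standard.

```python
import math


def bit_rate(baud: float, levels: int) -> float:
    """Bits per second for a line sending `baud` symbols per second, where each
    symbol is one of `levels` distinguishable states (log2(levels) bits each)."""
    return baud * math.log2(levels)


# Binary signalling: one bit per symbol, so bit rate equals baud rate.
print(bit_rate(45.45, 2))
# A 16-level scheme such as 16-QAM carries 4 bits per symbol.
print(bit_rate(2400, 16))
```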
In the past (in fact two years ago, proof I have been doing this for a while
now!) I wrote about the "inconvenient truth" that structural aspects of the
Internet make truly
decentralized systems infeasible, due to the lack of a means to perform
broadcast discovery. As a result, most distributed systems rely on a set of
central, semi-static nodes to perform initial introductions.
For example, Bitcoin relies on a small list of volunteer-operated domain names
that resolve to known-good full nodes. Tor similarly uses a small set of
central "directory servers" that provide initial node lists. Both systems have
these lists hardcoded into their clients; coincidentally, both have nine
trusted, central hostnames.
This sort of problem exists in basically all distributed systems that operate
in environments where it is not possible to shout into the void and hope for a
response. The internet, for good historic reasons, does not permit this kind of
behavior. Here we should differentiate between distributed and decentralized,
two terms I do not tend to select very carefully. Not all distributed systems
are decentralized; indeed, many are not. One of the easiest and most practical
ways to organize a distributed system is according to a hierarchy. This is a
useful technique, so there are many examples, but a prominent and old one
happens to also be part of the drivetrain mechanics of the internet: DNS, the
domain name system.
My reader base is expanding and so I will provide a very brief bit of
background. Many know that DNS is responsible for translating human-readable
names like "computer.rip" into the actual numerical addresses used by the
internet protocol. Perhaps a bit fewer know that DNS, as a system, is
fundamentally organized around the hierarchy of these names. To examine the
process of resolving a DNS name, it is sometimes more intuitive to reverse
the name, and instead of "computer.rip", discuss "rip.computer" [1].
This name is hierarchical: it indicates that the record "computer" is within
the zone "rip". "computer" is itself a zone and can contain yet more records,
which we tend to call subdomains. But the term "subdomain" can be confusing
as everything is a subdomain of something, even "rip" itself, which in a
certain sense is a subdomain of the DNS root "." (which is why, of course,
a stricter writing of the domain name computer.rip would be computer.rip.,
but as a culture we have rejected the trailing root dot).
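Reading a name from right to left amounts to walking the hierarchy down from the root. Here is a minimal sketch that lists the zones a name sits under, root first. The function name is my own invention.

```python
def zones_from_root(name: str) -> list[str]:
    """Return the chain of zones containing `name`, from the root "." down.

    A trailing root dot is accepted but not required.
    """
    labels = [label for label in name.rstrip(".").split(".") if label]
    zones = ["."]
    # Walk the labels right to left, growing the name one label at a time.
    for i in range(len(labels) - 1, -1, -1):
        zones.append(".".join(labels[i:]) + ".")
    return zones


print(zones_from_root("computer.rip"))
print(".".join(reversed("computer.rip".split("."))))  # the "rip.computer" reading
```

Each step after "." corresponds roughly to one referral that a resolver starting from nothing follows. The root servers point it at the servers for "rip.", and those in turn point it at the servers for "computer.rip.".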
Many of us probably know that each level of the DNS hierarchy has authoritative
nameservers, operated typically by whoever controls the name (or their
third-party DNS vendor). "rip" has authoritative DNS servers provided by a
company called Rightside Group, a subsidiary of the operator of websites like
eHow that went headfirst into the great DNS land grab and snapped up "rip" as a
bit of land speculation, alongside such attractive properties as "lawyer" and
"navy" and "republican" and "democrat", all of which I would like to own the
"computer" subdomain of, but alas such dictionary words are usually already
taken.
"computer.rip", of course, has authoritative nameservers operated by myself or
my delegate. Unlike some people I know, I do not have any nostalgia for BIND,
and so I pay a modest fee to a commercial DNS operator to do it for me. Some
would be surprised that I pay for this; DNS is actually rather inexpensive to
operate and authoritative name servers are almost universally available as a
free perk from domain registrars and others. I just like to pay for this on the
general feeling that companies that charge for a given service are probably
more committed to its quality, and it really costs very little and changing it
would take work.
To the observant reader, this might leave an interesting question. If even the
top-level domains are subdomains of a secret, seldom-seen root domain ".", who
operates the authoritative name servers for that zone?
And here we return to the matter of even distributed systems requiring central
nodes. Bitcoin uses nine hardcoded domain names for initial discovery of
decentralized peers. DNS uses thirteen hardcoded root servers to establish the
top level of the hierarchy.
These root servers are commonly referred to as a.root-servers.net through
m.root-servers.net, and indeed those are their domain names, but remember that
when we need to use those root servers we have no entrypoint into the DNS
hierarchy and so are not capable of resolving names. The root servers are much
more meaningfully identified by their IP addresses, which are "semi-hardcoded"
into recursive resolvers in the form of what's often called a root hints file.
You can download a copy; it's a simple file in BIND zone format that BIND
basically uses to bootstrap its cache.
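To give a feel for the format, here is a small sketch that pulls root server names and addresses out of a root hints file. The embedded text is an abbreviated excerpt in the style of the published named.root. It includes only servers A and F and trims the comment header, so check the real file rather than trusting these lines. The parser also assumes the simple one-record-per-line layout that file uses.

```python
from collections import defaultdict

# Abbreviated excerpt in the style of named.root (servers A and F only).
ROOT_HINTS_EXCERPT = """\
; abbreviated root hints excerpt
.                        3600000      NS    A.ROOT-SERVERS.NET.
A.ROOT-SERVERS.NET.      3600000      A     198.41.0.4
A.ROOT-SERVERS.NET.      3600000      AAAA  2001:503:ba3e::2:30
.                        3600000      NS    F.ROOT-SERVERS.NET.
F.ROOT-SERVERS.NET.      3600000      A     192.5.5.241
F.ROOT-SERVERS.NET.      3600000      AAAA  2001:500:2f::f
"""


def parse_root_hints(text: str) -> dict[str, list[str]]:
    """Map each root server named in the NS records for "." to its addresses."""
    servers: list[str] = []
    addresses = defaultdict(list)
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()  # zone files use ';' for comments
        if not line:
            continue
        fields = line.split()
        # Owner, [TTL], [class], type, rdata: take type and rdata from the end.
        owner, rtype, rdata = fields[0].lower(), fields[-2].upper(), fields[-1].lower()
        if rtype == "NS" and owner == ".":
            servers.append(rdata)
        elif rtype in ("A", "AAAA"):
            addresses[owner].append(rdata)
    return {name: addresses[name] for name in servers}


print(parse_root_hints(ROOT_HINTS_EXCERPT)["a.root-servers.net."])
```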
And yes, there are other DNS implementations too, a surprising number of them,
even in wide use. But when talking about DNS history we can mostly stick to
BIND. BIND used to stand for Berkeley Internet Name Domain, and it is an apt
rule of thumb in computer history that anything with a reference to UC Berkeley
in the name is probably structurally important to the modern technology
industry.
One of the things I wanted to get at, when I originally talked about central
nodes in distributed systems, is the impact it has on trust and reliability.
The Tor project is aware that the nine directory servers are an appealing
target for attack or compromise, and technical measures have been taken to
mitigate the possibility of malicious behavior. The Bitcoin project seems to
mostly ignore that the DNS seeds exist, but of course the design of the Bitcoin
system limits their compromise to certain types of attacks. In the case of DNS,
much like most decentralized systems, there is a layer of long-lived caching
for top-level domains that mitigates the impact of unavailability of the root
servers, but still, in every one of these systems, there is the possibility of
compromise or unavailability if the central nodes are attacked.
And so there is always a layer of policy. A trusted operator can never
guarantee the trustworthiness of a central node (the node could be compromised,
or the trusted operator could turn out to be the FBI), but it sure does help.
Tor's directory servers are operated by the Tor project. Bitcoin's DNS seeds
are operated by individuals with a long history of involvement in the project.
DNS's root nodes are operated by a hodgepodge of companies and institutions
that were important to the early internet.
Verisign operates two, of course. A California university operates one, of
course, but amusingly not Berkeley. Three are operated by various arms of the US
government, two of them military. Others are run by internet industry
associations, the RIPE NCC, and another university, and ICANN runs one itself.
It's pretty random, though, and just reflects a
set of organizations prominently involved in the early internet.
Some people, even some journalists I've come across, hear that there are 13 name
servers and picture 13 4U boxes with a lot of blinking lights in heavily
fortified data centers. Admittedly this description was more or less accurate
in the early days, and a couple of the smaller root server operators did have
single machines until surprisingly recently. But today, all thirteen root
server IP addresses are anycast groups.
Anycast is not a concept you run into every day, because it's not really useful
on local networks where multicast can be used. But it's very important to the
modern internet. The idea is this: an IP address (really a subnetwork) is
advertised by multiple BGP nodes. Other BGP nodes can select the advertisement
they like the best, often based on the shortest AS path. As a user, you connect
to a single IP address, but based on the BGP-informed routing tables of
internet service providers your traffic could be directed to any number of
sites. You can think of it as a form of load balancing at the IP layer, but it
also has the performance benefit of users mostly connecting to nearby nodes, so
it's widely used by CDNs for multiple reasons.
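Here is a very simplified sketch of what each network does with competing anycast advertisements: pick the route with the shortest AS path. Real BGP best-path selection checks several attributes before path length (local preference, for one), so treat this as an illustration only. The prefix comes from the 192.0.2.0/24 documentation range and the AS numbers from the documentation ASN range. The site names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Advertisement:
    prefix: str         # the anycast prefix being announced
    site: str           # hypothetical anycast site that originated it
    as_path: list[int]  # ASes the announcement traversed to reach us


def best_route(ads: list[Advertisement]) -> Advertisement:
    """Pick the advertisement with the shortest AS path, breaking ties by site
    name so the choice is deterministic (a stand-in for BGP's real tie-breakers)."""
    return min(ads, key=lambda ad: (len(ad.as_path), ad.site))


# Two networks see the same prefix announced from two sites, via different paths.
view_from_network_1 = [
    Advertisement("192.0.2.0/24", "site-east", [64500, 64510]),
    Advertisement("192.0.2.0/24", "site-west", [64501, 64502, 64511]),
]
view_from_network_2 = [
    Advertisement("192.0.2.0/24", "site-east", [64503, 64504, 64510]),
    Advertisement("192.0.2.0/24", "site-west", [64511]),
]
print(best_route(view_from_network_1).site)
print(best_route(view_from_network_2).site)
```

Both networks send packets to the same address, but each delivers them to a different site. That is the whole trick.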
For DNS, though, where we often have a bootstrapping problem to solve, anycast
is extremely useful as a way to handle "special" IP addresses that are used
directly. For authoritative DNS servers like 192.5.5.241 [2001:500:2f::f] [2]
(root server F) or recursive resolvers like 8.8.8.8 [2001:4860:4860::8888]
(Google public DNS), anycast is the secret that allows a "single" address to
correspond to a distributed system of nodes.
So there are thirteen DNS root servers in the sense that there are thirteen
independently administered clusters of root servers (with the partial exception
of A and J, both operated by Verisign, due to their acquisition of former A
operator Network Solutions). Each of the thirteen root servers is, in practice,
a fairly large number of anycast sites, sometimes over 100. The root server
operators don't share much information about their internal implementation, but
one can assume that in most cases the anycast sites consist of multiple servers
as well, fronted by some sort of redundant network appliance. There may only be
thirteen of them, but each of the thirteen is quite robust. For example, the
root servers typically place their anycast sites in major internet exchanges
distributed across both geography and provider networks. This makes it unlikely
that any small number of failures would seriously affect the number of
available sites. Even if a root server were to experience a major failure due
to some sort of administration problem, there are twelve more.
Why thirteen, you might ask? No good reason. The number of root servers
basically grew until the answer to an NS request for "." hit the 512 byte limit
on UDP DNS responses. Optimizations over time allowed this number to grow
(actually using single letters to identify the servers was one of these
optimizations, allowing the basic compression used in DNS responses to collapse
the matching root-servers.net part). Of course IPv6 blew DNS response sizes
completely out of the water, leading to the development of the EDNS extension
that allows for much larger responses.
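The arithmetic behind the 512-byte ceiling is easy to reproduce. This Python sketch builds a simplified priming response (an NS query for ".") in DNS wire format, with and without the RFC 1035 pointer compression, and reports its size. It assumes the current a- through m.root-servers.net names, zeroed addresses, a fixed TTL, and no EDNS OPT record or DNSSEC signatures, so real responses will differ somewhat.

```python
import struct

TYPE_A, TYPE_NS, TYPE_AAAA, CLASS_IN = 1, 2, 28, 1


class MessageBuilder:
    """Minimal DNS wire-format writer with optional RFC 1035 name compression."""

    def __init__(self, compress):
        self.msg = bytearray()
        self.compress = compress
        self.offsets = {}  # lowercased name suffix -> offset where it was written

    def name(self, name):
        labels = [label for label in name.split(".") if label]
        for i in range(len(labels)):
            suffix = ".".join(labels[i:]).lower()
            if self.compress and suffix in self.offsets:
                # Two-byte pointer: top two bits set, low 14 bits are the offset.
                self.msg += struct.pack("!H", 0xC000 | self.offsets[suffix])
                return
            if self.compress and len(self.msg) < 0x4000:
                self.offsets[suffix] = len(self.msg)
            raw = labels[i].encode("ascii")
            self.msg += bytes([len(raw)]) + raw
        self.msg.append(0)  # zero-length root label ends the name

    def rr(self, owner, rtype, write_rdata):
        self.name(owner)
        self.msg += struct.pack("!HHI", rtype, CLASS_IN, 518400)  # TYPE, CLASS, TTL
        length_at = len(self.msg)
        self.msg += b"\x00\x00"  # RDLENGTH, patched once RDATA is written
        write_rdata()
        struct.pack_into("!H", self.msg, length_at, len(self.msg) - length_at - 2)


def priming_response_size(n_servers=13, compress=True, with_aaaa=False):
    b = MessageBuilder(compress)
    hosts = [f"{chr(ord('a') + i)}.root-servers.net." for i in range(n_servers)]
    additional = n_servers * (2 if with_aaaa else 1)
    # Header: ID, flags (response + authoritative), QD, AN, NS, AR counts.
    b.msg += struct.pack("!6H", 0, 0x8400, 1, n_servers, 0, additional)
    b.name(".")
    b.msg += struct.pack("!HH", TYPE_NS, CLASS_IN)  # question: ". IN NS"
    for host in hosts:
        b.rr(".", TYPE_NS, lambda host=host: b.name(host))
    for host in hosts:  # glue addresses (all zeros here)
        b.rr(host, TYPE_A, lambda: b.msg.extend(bytes(4)))
        if with_aaaa:
            b.rr(host, TYPE_AAAA, lambda: b.msg.extend(bytes(16)))
    return len(b.msg)


print(priming_response_size())
print(priming_response_size(compress=False))
print(priming_response_size(with_aaaa=True))
```

Under these assumptions, thirteen single-letter names with IPv4 glue come to 436 bytes with compression, comfortably under 512. Spelling every name out in full takes 862 bytes, and adding AAAA glue for all thirteen takes 800 bytes even with compression.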
13 is no longer the practical limit, but with how large some of the 13 are, no
one sees a pressing need to add more. Besides, can you imagine the political
considerations in our modern internet environment? The proposed operator would
probably be Cloudflare or Google or Amazon or something and their motives would
never be trusted. Incidentally, many of the anycast sites for root server F
(operated by ISC) are Cloudflare data centers used under agreement.
We are, of course, currently trusting the motives of Verisign. You should never
do this! But it's been that way for a long time, we're already committed. At
least it isn't Network Solutions any more. I kind of miss when SRI was running
DNS and military remote viewing.
But still, there's something a little uncomfortable about the situation.
Billions of internet hosts depend on thirteen "servers" to have any functional
access to the internet.
What if someone attacked them? Could they take the internet down? Wouldn't this
cause a global crisis of a type seldom before seen? Should I be stockpiling DNS
records alongside my canned water and iodine pills?
Wikipedia contains a great piece of comedic encyclopedia writing. In its
article on the history of attacks on DNS root servers, it mentions the time, in
2012, that some-pastebin-user-claiming-to-be-Anonymous (one of the great
internet security threats of that era) threatened to "shut the Internet down".
"It may only last one hour, maybe more, maybe even a few days," the statement
continues. "No matter what, it will be global. It will be known."
That's the end of the section. Some Wikipedia editor, no doubt familiar with
the activities of Anonymous in 2012, apparently considered it self-evident that
the attack never happened.
Anonymous may not have put in the effort, but others have. There have been
several apparent DDoS attacks on the root DNS servers. One, in 2007, was
significant enough that four of the root servers suffered---but there were nine
more, and no serious impact was felt by internet users. This attack, like most
meaningful DDoS, originated with a botnet. It had its footprint primarily in
Korea, but C2 in the United States. The motivation for the attack, and who
launched it, remains unknown.
There is a surprisingly large industry of "booters," commercial services that,
for a fee, will DDoS a target of your choice. These tend to be operated by
criminal groups with access to large botnets; the botnets are sometimes bought
and sold and get their tasking from a network of resellers. It's a competitive
industry. In the past, booters and botnet operators have sometimes been
observed announcing a somewhat random target and taking it offline as,
essentially, a sales demonstration. Since these demonstrations are a known
behavior, any time a botnet targets something important for no discernible
reason, analysts have a tendency to attribute it to a "show of force." I have
little doubt that this is sometimes true, but as with the tendency to attribute
monumental architecture to deity worship, it might be an overgeneralization of
the motivations of botnet operators. Sometimes I wonder if they made a mistake,
or maybe they were just a little drunk and a lot bored, who is to say?
The problem with this kind of attribution is evident in the case of the other
significant attack on the DNS root servers, in 2015. Once again, some root
servers were impacted badly enough that they became unreliable, but other root
servers held on and there was little or even no impact to the public. This
attack, though, had some interesting properties.
In the 2007 incident, the abnormal traffic to the root servers consisted of
large, mostly-random DNS requests. This is basically the expected behavior of a
DNS attack; using randomly generated hostnames in requests ensures that the
responses won't be cached, making the DNS server exert more effort. Several
major botnet clients have this "random subdomain request" functionality built
in, normally used for attacks on specific authoritative DNS servers as a way to
take the operator's website offline. Chinese security firm Qihoo 360, based on
a large botnet honeypot they operate, reports that this type of DNS attack was
very popular at the time.
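The effect of random names on caching can be shown with a toy model. In this Python sketch, the "resolver" is a plain set with no TTLs or size limit, and example.com stands in for a real domain. It counts how many queries would have to go upstream for a stream of repeated names versus a stream of random subdomains.

```python
import random
import string


def forwarded_queries(query_names):
    """Toy caching resolver: count cache misses, each of which would be sent
    upstream. No TTL expiry or eviction. Negative answers would be cached the
    same way, which doesn't help when every name is new."""
    cache = set()
    misses = 0
    for name in query_names:
        if name not in cache:
            misses += 1
            cache.add(name)
    return misses


def random_label(rng, length=12):
    alphabet = string.ascii_lowercase + string.digits
    return "".join(rng.choice(alphabet) for _ in range(length))


rng = random.Random(42)  # seeded so the run is repeatable
repeated = ["www.example.com"] * 10_000
randomized = [f"{random_label(rng)}.example.com" for _ in range(10_000)]

print(forwarded_queries(repeated))    # only the first query misses
print(forwarded_queries(randomized))  # one miss per distinct name
```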
The 2015 attack was different, though! Wikipedia, like many other websites,
describes the attack as "valid queries for a single undisclosed domain name and
then a different domain the next day." In fact, the domain names were
disclosed, by at least 2016. The attack happened on two days. On the first day,
all requests were for 336901.com. The second day, all requests were for
916yy.com.
Contemporaneous reporting is remarkably confused on the topic of these domain
names, perhaps because they were not widely known, perhaps because few
reporters bothered to check up on them thoroughly. Many sources make it sound
like they were random domain names perhaps operated by the attacker; one goes
so far as to say that they were registered with fake identities.
Well, my Mandarin isn't great, and I think the language barrier is a big part
of the confusion. No doubt another part is a Western lack of familiarity with
Chinese internet culture. To an American in the security industry, 336901.com
would probably look at first like the output of a DGA, or domain generation
algorithm: a randomly generated domain used specifically to be evasive. In
China, though, numeric names like this are quite popular. Qihoo 360 is, after
all, domestically branded as just 360, at 360.cn.
As far as I can tell, both domains were pretty normal Chinese websites related
to mobile games. It's difficult or maybe impossible to tell now, but it seems
reasonable to speculate that they were operated by the same company. I would
assume they were something of a gray market operation, as there's a huge
intersection between "mobile games," "gambling," and "target of DDoS attacks."
For a long time, perhaps still today in the right corners of the industry, it
was pretty routine for gray-market gambling websites to pay booters to DDoS
each other.
In a 2016 presentation, security researchers from Verisign (Weinberg and
Wessels) reported on their analysis of the attack based on traffic observed at
Verisign root servers. They conclude that the traffic likely originated from
multiple botnets or at least botnet clients with different configurations,
since the attack traffic can be categorized into several apparently different
types [3]. Based on command and control traffic from a source they don't disclose
(perhaps from a Verisign honeynet?), they link the attack to the common
"BillGates" [4] botnet. Most interestingly, they conclude that it was probably
not intended as an attack on the DNS root: the choice of fixed domain names
just doesn't make sense, and the traffic wasn't targeted at all root servers.
Instead, they suspect it was just what it looks like: an attack on the two
websites the packets queried for, that for some reason was directed at the root
servers instead of the authoritative servers for that second-level domain.
This isn't a good strategy; the root servers are a far harder target than your
average web hosting company's authoritative servers. But perhaps it was a
mistake? An experiment to see if the root server operators might mitigate the
DDoS by dropping requests for those two domains, incidentally taking the
websites offline?
Remember that Qihoo 360 operates a large honeynet and was kind enough to
publish a presentation on their analysis of root server attacks. Matching
Verisign's conclusions, they link the attack to the BillGates botnet, and also
note that they often observe multiple separate botnet C2 servers send tasks
targeting the same domain names. This probably reflects the commercialized
nature of modern botnets, with booters "subcontracting" operations to multiple
botnet operators. It also handily explains Verisign's observation that the 2015
attack traffic seems to have come from more than one implementation of a DNS DDoS.
360 reports that, on the first day, five different C2 servers tasked bots with
attacking 336901.com. On the second day, three C2 servers tasked for 916yy.com.
But they also have a much bigger revelation: throughout the time period of the
attacks, they observed multiple tasks to attack 916yy.com using several
different methods.
360 concludes that the 2015 DNS attack was most likely the result of a
commodity DDoS operation that decided to experiment, directing traffic at the
DNS roots instead of the authoritative server for the target to see what would
happen. I doubt they thought they'd take down the root servers, but it seems
totally reasonable that they might have wondered if the root server operators
would filter DDoS traffic based on the domain name appearing in the requests.
Intriguingly, they note that some of the traffic originated with a DNS attack
tool that had significant similarities to BillGates but didn't produce quite
the same packets. We will probably never know, but a likely explanation is that
some group modified the BillGates DNS attack module or implemented a new one
based on the method used by BillGates.
Tracking botnets gets very confusing very fast: there are just so many
different variants of any major botnet client! BillGates originated, for
example, as a Linux botnet. It was distributed to servers not only through SSH
but through vulnerabilities in MySQL and Elasticsearch. It was unusual, for a
time, in being a major botnet that skipped over the most common desktop
operating system. But ports of BillGates to Windows were later observed,
distributed through an Internet Explorer vulnerability: classic Windows. Why
someone chose to port a Linux botnet to Windows instead of using one of the
several popular Windows botnets (Conficker, for example) is a mystery. Perhaps
they had spent a lot of time building out BillGates C2 infrastructure and, like
any good IT operation, wanted to simplify their cloud footprint.
High in the wizard's tower of the internet, thirteen elders are responsible for
starting every recursive resolver on its own path to truth. There's a whole
Neal Stephenson for Wired article there. But in practice it's a large and
robust system. The extent of anycast routing used for the root DNS servers, to
say nothing of CDNs, is one of those things that challenges our typical stacked
view of the internet. Geographic load balancing is something we think of at
high layers of the system; it's surprising to encounter it as a core part of a
very low level process.
That's why we need to keep our thinking flexible: computers are towers of
abstraction, and complexity can be added at nearly any level, as needed or
convenient. Seldom is this more apparent than it is in any process called
"bootstrapping." Some seemingly simpler parts of the internet, like DNS, rely
on a great deal of complexity within other parts of the system, like BGP.
Now I'm just complaining about pedagogical use of the OSI model again.
[1] The fact that the DNS hierarchy is written from right-to-left while it's
routinely used in URIs that are otherwise read left-to-right is one of those
quirks of computer history. Basically an endianness inconsistency. Like
American date order, to strictly interpret a URI you have to stop and reverse
your analysis partway through. There's no particular reason that DNS is like
that; there was just less consistency over most significant first/least
significant first hierarchical ordering at the time and contemporaneous network
protocols (consider the OSI stack) actually had a tendency towards least
significant first.
[2] The IPv4 addresses of the root servers are ages old and mostly just a
matter of chance, but the IPv6 addresses were assigned more recently and
allowed an opportunity for something more meaningful. Reflecting the long
tradition of identifying the root servers by their letter, many root server
operators use IPv6 addresses where the host part can be written as the single
letter of the server (e.g. root server C at [2001:500:2::c]). Others chose a
host part of "53," a gesture at the port number used for DNS (e.g. root server
I, [2001:7fe::53]). Others seem more random: Verisign uses 2:30 for both of
their root servers (e.g. root server A, [2001:503:ba3e::2:30]), so maybe that
means something to them, or maybe it was just convenient. Amusingly, the only
operator that went for what I would call an address pun is the Defense
Information Systems Agency, which put root server G at [2001:500:12::d0d].
[3] It really dates this story that there was some controversy around the
source IPs of the attack, originating with none other than deceased security
industry personality John McAfee. He angrily insisted that it was not plausible
that the source IPs were spoofed. Of course botnets conducting DDoS attacks via
DNS virtually always spoof the source IP, as there are few protections in place
(at the time almost none at all) to prevent it. But John McAfee always had
a way of ginning up controversy where none was needed.
[4] Botnets are often bought, modified, and sold. They tend to go by various
names from different security researchers and different variants. I'm calling
this one "BillGates" because that's the funniest of the several names used for
it.
Last time, we left off at the fact that modern films are distributed with their
audio in multiple formats. Most of the time, there is a stereo version of the
audio, and a multi-channel version of the audio that is perhaps 5.1 or 7.1 and
compressed using one of several codecs that were designed within the film
industry for this purpose.
But that was all about film, in physical form. In the modern world, films go
out to theaters in the form of Digital Cinema Packages, a somewhat elaborate
format that basically comes down to an encrypted motion JPEG 2000 stream with
PCM audio. There are a lot of details there that I don't know very well and I
don't want to get hung up on anyway, because I want to talk about the consumer
experience.
As a consumer, there are a lot of ways you get movies. If you are a weirdo, you
might buy a Blu-Ray disc. Optical discs are a nice case, because they tend to
conform to a specification that allows relatively few options (so that players
are reasonable to implement). Blu-Ray discs are allowed to encode their audio
as linear PCM [1], Dolby Digital, Dolby TrueHD, DTS, DTS-HD, or DRA.
DRA is a common standard in the Chinese market but not in the US (that's where
I live), so I'll ignore it. That still leaves three basic families of codecs,
each of which has some variations. One of the interesting things about the
Blu-Ray specification is that PCM audio can incorporate up to eight channels.
The Blu-Ray spec allows up to 27,648 Kbps of audio, so it's actually quite
feasible to do uncompressed, 24-bit, 96 kHz, 7.1 audio on a Blu-Ray disc. This
is an unusual capability in a consumer standard, and it makes the terribly
named Blu-Ray High Fidelity Pure Audio standard for Blu-Ray audio discs seem
more sensible. Stick a pin in that, though, because you're going to have a tough time
actually playing uncompressed 7.1.
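The claim is easy to check, since uncompressed PCM needs channels × bits per sample × sample rate. This Python sketch compares a few configurations against the 27,648 kbps figure above. It counts raw sample data only and ignores whatever packetization overhead the disc format adds.

```python
BD_AUDIO_LIMIT_BPS = 27_648_000  # 27,648 kbps, the figure given in the text


def pcm_bitrate(channels, bits_per_sample, sample_rate_hz):
    """Raw LPCM bitrate in bits per second (no container or packet overhead)."""
    return channels * bits_per_sample * sample_rate_hz


configs = {
    "7.1, 24-bit, 96 kHz": (8, 24, 96_000),
    "5.1, 24-bit, 192 kHz": (6, 24, 192_000),
    "7.1, 24-bit, 192 kHz": (8, 24, 192_000),
}
for label, (ch, bits, rate) in configs.items():
    bps = pcm_bitrate(ch, bits, rate)
    verdict = "fits" if bps <= BD_AUDIO_LIMIT_BPS else "too much"
    print(f"{label}: {bps / 1e6:.3f} Mbps ({verdict})")
```

Under this simple model, 7.1 at 24-bit/96 kHz uses exactly two thirds of the budget, and 27,648 kbps turns out to be precisely six channels of 24-bit audio at 192 kHz.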
On the other hand, you might use a streaming service. There's about a million
of those and half of them have inane names ending in Plus, so I'm going to
simplify by pretending that we're back in 2012 and Netflix is all that really
matters. We can infer from Netflix help articles that Netflix delivers audio
as AAC or Dolby Digital.
Or, consider the case of video files that you obtained by legal means. I looked
at a few of the movies on my NAS to take a rough sampling. Most older films,
and some newer ones, have stereo AAC audio. Some have what VLC describes as A52
aka AC3. A/52 is an ATSC standard that is equivalent to AC3, and AC-3 (hyphen
inconsistent) is sort of the older name of Dolby Digital or the name of the
underlying transport stream format, depending on how you squint at it. Less
common, in my hodgepodge sample, is DTS, but I can find a few.
VLC typically describes the DTS and Dolby Digital as 3F2M/LFE, which is a
somewhat eccentric (and I think specific to VLC) notation for 5.1 surround. An
interesting detail is that VLC differentiates 3F2M/LFE and 3F2R/LFE, both 5.1,
but with the two "surround" channels assigned to either side or rear positions.
While 5.1 configurations with the surround channels to the side seem to be more
standard, you could potentially put the two surround channels to the rear. Some
formats have channel mapping metadata that can differentiate the two.
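One widely used example of that metadata is the dwChannelMask field of Microsoft's WAVE_FORMAT_EXTENSIBLE header, which has separate bits for side and back speakers. This Python sketch decodes a mask into speaker names. The bit values are the SPEAKER_* constants from the Windows headers (height speaker bits omitted), and the two masks shown are the usual "5.1 back" and "5.1 side" layouts.

```python
# SPEAKER_* bit values from the WAVE_FORMAT_EXTENSIBLE channel mask (height bits omitted).
SPEAKER_BITS = [
    (0x1, "front left"),
    (0x2, "front right"),
    (0x4, "front center"),
    (0x8, "LFE"),
    (0x10, "back left"),
    (0x20, "back right"),
    (0x40, "front left of center"),
    (0x80, "front right of center"),
    (0x100, "back center"),
    (0x200, "side left"),
    (0x400, "side right"),
]


def decode_channel_mask(mask):
    """List speaker positions in channel order (lowest set bit first)."""
    return [name for bit, name in SPEAKER_BITS if mask & bit]


FIVE_ONE_BACK = 0x3F   # FL FR FC LFE BL BR
FIVE_ONE_SIDE = 0x60F  # FL FR FC LFE SL SR

print(decode_channel_mask(FIVE_ONE_BACK))
print(decode_channel_mask(FIVE_ONE_SIDE))
```

Both masks describe six channels and differ only in the last two, which mirrors the side-versus-rear distinction VLC reports.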
Because there is no rest for the weary, there is some inconsistency between
"5.1 side" and "5.1 rear" in different standards and formats. At the end of the
day, most applications don't really differentiate. I tend to consider surround
channels on the side to be "correct," in that movie theaters are configured
that way and thus it's ostensibly the design target for films. One of the few
true specifications I could find for general use, rather than design standards
specific to theaters like THX, is ITU-R BS.775. It states that the surround
channels of a 5.1 configuration should be mostly to the side, but slightly
behind the listener.
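A little trigonometry shows what "mostly to the side, but slightly behind" means. BS.775 puts the front left and right speakers at ±30° from center and the surrounds somewhere in the 100° to 120° range. This Python sketch assumes 110° for the surrounds and places each speaker on a unit circle around the listener.

```python
import math

# Azimuths in degrees, clockwise from straight ahead. 110 deg is an assumed
# midpoint of the 100-120 deg surround range.
LAYOUT = {"C": 0, "L": -30, "R": 30, "Ls": -110, "Rs": 110}


def position(azimuth_deg):
    """Unit-circle position: x > 0 is to the listener's right, y > 0 is ahead."""
    a = math.radians(azimuth_deg)
    return math.sin(a), math.cos(a)


for name, az in LAYOUT.items():
    x, y = position(az)
    print(f"{name:>2}: x={x:+.2f} y={y:+.2f}")
```

At 110°, each surround lands at roughly x = 0.94, y = -0.34: far out to the side and only slightly behind the listener.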
That digression aside, it's unsurprising that a video file could contain a
multi-channel stream. Most video containers today can support basically
arbitrary numbers of streams, and you could put uncompressed multichannel audio
into such a container if you wanted. And yet, multi-channel audio in films
almost always comes in the form of a Dolby Digital or DTS stream. Why is that?
Well, in part, because of tradition: they used to be the formats used by
theaters, although digital cinema has somewhat changed that situation and the
consumer versions have usually been a little different in the details. But the
point stands: films are usually mastered in Dolby or DTS, so the "home video"
release goes out with Dolby or DTS.
Another reason, though, is the problem of interconnections.
Let's talk a bit about interconnections. In a previous era of consumer audio,
the age of "hi-fi," component systems dominated residential living rooms. In a
component system, you had various audio sources that connected to a device that
came to be known as a "receiver" since it typically had an FM/AM radio receiver
integrated. It is perhaps more accurate to refer to it as an amplifier since
that's the main role it serves in most modern systems, but there's also an
increasing tendency to think of its input selection and DSP features as part
of a preamp. The device itself is sometimes referred to as a preamp, in
audiophile circles, when component amplifiers are used to drive the actual
speakers. You can see that in these conventional component systems you need to
move audio signals between devices. This kind of setup, though, is not common
in households with fewer than four bathrooms and one swimming pool.
Most consumers today seem to have a television and, hopefully, some sort of
audio device like a soundbar. Sometimes there are no audio interconnections at
all! Often the only audio interconnection is from the TV to the soundbar via
HDMI. Sometimes it's wireless! So audio interconnects as a topic can feel a
touch antiquated today, but these interconnects still matter a lot in practice.
First, they are often either the same as something used in industry or similar
to something used in industry. Second, despite the increasing prevalence of 5.1
and 7.1 soundbar systems with wireless satellites, the kind of people with a
large Blu-Ray collection are still likely to have a component home theater
system. Third, legacy audio interconnects don't die that quickly, because a lot
of people have an older video game console or something that they want to work
with their new TV and soundbar, so manufacturers tend to throw in one or two
audio interconnects even if they don't expect most consumers to use them.
So let's think about how to transport multi-channel audio. An ancient tradition
in consumer audio says that stereo audio will be sent between components on two
sets of two-conductor cables terminated by RCA connectors. The RCA connector
dates back to the Radio Corporation of America and, apparently, at least
1937. It remains in widespread service today. There are a surprising number of
variations in this interconnect, in practice.
For one, the audio cables may be coaxial or just zipped up in a common jacket.
Coaxial audio cables are a bit more expensive and a lot less flexible but admit
less noise. There is a lot of confusion in this area because a particular
digital transport we'll talk about later specified coaxial cables terminated in
RCA connectors, but then is frequently used with non-coaxial cables terminated
in RCA connectors, and for reasonable lengths usually still works fine. This has
led to a lot of consumer confusion and people thinking that any cable with RCA
connectors is coaxial, when in fact, most of them are not. Virtually all of them
are not. Unless you specifically paid more money to get a coaxial one, it's not,
and even then sometimes it's not, because Amazon is a hotbed of scams.
Second, though these connections are routinely described as "line level" as if
that means something, there is remarkably little standardization of the actual
signaling. There are various conventions like 1.7 V peak-to-peak and 2 V
peak-to-peak and about 1 V peak-to-peak, and few consumer manufacturers bother
to tell you which convention they have followed. There are also a surprising
number of ways of expressing signaling levels, involving different measurement
bases (peak vs RMS) and units (dBV vs dBu), making it a little difficult to
interpret specifications when they are provided. This whole mess is just one of
the reasons you find yourself having to make volume adjustments for different
sources, or having to tune input levels on receivers with that option [2].
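Converting between these conventions is mechanical once you know the references. dBV is relative to 1 V RMS, dBu is relative to about 0.775 V RMS (the voltage that dissipates 1 mW in 600 Ω), and for a sine wave the peak-to-peak voltage is 2√2 times the RMS voltage. This Python sketch assumes sine-wave signals, since the peak-to-RMS ratio differs for other waveforms. The -10 dBV and +4 dBu values are the familiar consumer and professional nominal levels.

```python
import math

DBU_REF_VRMS = math.sqrt(0.6)  # ~0.775 V: 1 mW into 600 ohms
DBV_REF_VRMS = 1.0


def vrms_from_vpp(vpp):
    """Sine wave only: Vpp = 2 * sqrt(2) * Vrms."""
    return vpp / (2 * math.sqrt(2))


def to_dbv(vrms):
    return 20 * math.log10(vrms / DBV_REF_VRMS)


def to_dbu(vrms):
    return 20 * math.log10(vrms / DBU_REF_VRMS)


def from_dbv(dbv):
    return DBV_REF_VRMS * 10 ** (dbv / 20)


def from_dbu(dbu):
    return DBU_REF_VRMS * 10 ** (dbu / 20)


for vpp in (1.0, 1.7, 2.0):
    vrms = vrms_from_vpp(vpp)
    print(f"{vpp} Vpp = {vrms:.3f} Vrms = {to_dbv(vrms):+.1f} dBV = {to_dbu(vrms):+.1f} dBu")

print(f"-10 dBV = {from_dbv(-10):.3f} Vrms, +4 dBu = {from_dbu(4):.3f} Vrms")
```

The same voltage always reads about 2.2 dB higher in dBu than in dBV, which is exactly the kind of offset that makes unlabeled specifications hard to compare.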
But that's all sort of a tangent, the point here is multi-channel audio. You
could, conceptually, move 5.1 over six RCA cables, or 7.1 over eight RCA
cables. Home theater receivers used to give you this option, but much like
analog HDTV connections, it has largely disappeared.
There is one other analog option: remember Pro Logic, from the film
soundtracks, which matrixed four channels into analog stereo? Analog
formats like VHS and LaserDisc often had a Pro Logic soundtrack that could be
"decoded" (really dematrixed) by a receiver with that capability, which used to
be common. In this case you can transport multi-channel audio over your normal
two RCA cables. The matrixing technique was always sort of cheating, though,
and produces inferior results to actual multichannel interconnects. It's no
longer common either.
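The "cheating" is easy to see in a simplified passive 4:2 matrix. This Python sketch folds center and surround into the left/right pair at -3 dB (surround in opposite polarity on each side) and decodes by sum and difference. It deliberately leaves out what real Dolby Surround encoding adds, namely the ±90° phase shift, band-limiting, and noise reduction applied to the surround channel. It also leaves out Pro Logic's active steering, which exists precisely to improve on the poor separation shown here.

```python
import math

K = 1 / math.sqrt(2)  # -3 dB


def encode(left, center, right, surround):
    """Simplified passive matrix: four channels folded into a stereo pair."""
    lt = left + K * center + K * surround
    rt = right + K * center - K * surround
    return lt, rt


def decode(lt, rt):
    """Passive decode: L and R pass straight through, the sum recovers center,
    and the difference recovers surround."""
    return {"L": lt, "C": K * (lt + rt), "R": rt, "S": K * (lt - rt)}


# A center-only signal (dialogue) and a surround-only signal.
for name, channels in {"center only": (0, 1, 0, 0), "surround only": (0, 0, 0, 1)}.items():
    decoded = decode(*encode(*channels))
    print(name, {ch: round(v, 3) for ch, v in decoded.items()})
```

The intended channel comes back at full level, but it also leaks into the left and right outputs at about -3 dB. That crosstalk is what discrete multichannel formats avoid.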
Much like video, audio interconnects today have gone digital. Consumer digital
audio really took flight with the elegantly named Sony/Philips Digital
Interface, or S/PDIF. S/PDIF specifies a digital format that is extremely
similar to, but not quite the same as, a professional digital interconnect
called AES3. AES3 is typically carried on a three-conductor (balanced) cable
with XLR connectors, though, which are too big and expensive for consumer
equipment. In one of the weirder decisions in the history of consumer
electronics, one that I can only imagine came out of an intractable political
fight, S/PDIF specified two completely different physical transports: one
electrical, and one optical.
The electrical format should be transmitted over a coaxial cable with RCA
connectors. In practice it is often used over non-coaxial cables with RCA
connectors, which will usually work fine if the length is short and nothing
nearby is too electrically noisy. S/PDIF over non-coaxial cables is "fine" in
the same way that HDMI cables longer than you are tall are "fine." If it
doesn't work reliably, try a more expensive cable and you'll probably be good.
The optical format is used with cheap plastic optical cables terminated in a
square connector called Toslink, originally for Toshiba Link, after the
manufacturer that gave us the optical variant. Toslink is one of those great
disappointments in consumer products. Despite the theoretical advantages of an
optical interconnect, the extremely cheap cables used with Toslink mean it's
mostly just worse than the electrical transport, especially when it comes to
range [3].
But the oddity of S/PDIF's sibling formats isn't the interesting thing here.
Let's talk about the actual S/PDIF bitstream, the very-AES3-like format the
audio actually needs to get through.
S/PDIF was basically designed for CDs, and so it comfortably carries CD audio:
two channels of 16 bit samples at 44.1kHz. In fact, it can comfortably go
further, carrying 20 (or with the right equipment even 24) bit samples at the
48 kHz sampling rate more common in digital audio other than CDs. That's for
two channels, though. Make the leap to six channels for 5.1 and you are well
beyond the capabilities of an S/PDIF transceiver.
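The limit comes from the frame structure as much as raw speed. Each S/PDIF frame carries exactly two subframes, one per channel. Each subframe has 32 time slots: a 4-slot preamble, 24 slots for the sample (20 audio bits plus 4 auxiliary bits that can extend it to 24), and the validity, user, channel-status, and parity bits. This Python sketch works out the numbers at 48 kHz. The 640 kbps figure is the top bitrate AC-3 (Dolby Digital) allows, used here as a representative compressed stream.

```python
SLOTS_PER_SUBFRAME = 32   # 4 preamble + 24 sample + V, U, C, P
SAMPLE_SLOTS = 24         # 20 audio bits, optionally extended by 4 aux bits
SUBFRAMES_PER_FRAME = 2   # one subframe per channel: stereo only


def spdif_slot_rate(sample_rate_hz):
    """Time slots per second; biphase-mark coding puts two cells in each slot."""
    return SUBFRAMES_PER_FRAME * SLOTS_PER_SUBFRAME * sample_rate_hz


def spdif_sample_capacity_bps(sample_rate_hz):
    """Bits per second available for sample data across both subframes."""
    return SUBFRAMES_PER_FRAME * SAMPLE_SLOTS * sample_rate_hz


def pcm_bps(channels, bits, sample_rate_hz):
    return channels * bits * sample_rate_hz


rate = 48_000
capacity = spdif_sample_capacity_bps(rate)
print(f"slot rate: {spdif_slot_rate(rate) / 1e6:.3f} Mbps")
print(f"sample capacity: {capacity / 1e6:.3f} Mbps")
for label, bps in {
    "stereo, 16-bit": pcm_bps(2, 16, rate),
    "5.1, 16-bit": pcm_bps(6, 16, rate),
    "AC-3 at 640 kbps": 640_000,
}.items():
    print(f"{label}: {bps / 1e6:.3f} Mbps, fits: {bps <= capacity}")
```

Even ignoring the two-channel frame layout, six 16-bit channels need exactly twice the available sample capacity, while a compressed stream fits with room to spare.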
You see where this is going? Compression.
See, the problems that Dolby Digital and DTS solved, of fitting multichannel
audio onto the limited space of a 35mm film print, also very much exist in the
world of S/PDIF. CDs brought us uncompressed digital audio remarkably early on,
but also set sort of a constraint on the bitrate of digital audio streams that
ensured the opposite in the world of multi-channel theatrical sound. It sort of
makes sense, anyway. DTS soundtracks came on CDs!
Of course even S/PDIF is looking rather long in the tooth today. I don't think
I use it at all any more, which is not something I expected to be saying this
soon. Today, though, all of my audio sources and sinks are either analog or
have HDMI. HDMI is the de facto norm for consumer digital audio today.
HDMI is a complex thing when it comes to audio or, really, just about anything.
Details like eARC and the specific HDMI version have all kinds of impacts on
what kind of audio can be carried, and the same is true for video as well. I am
going to spare you a lengthy diversion into the many variants of HDMI, which seem
almost as numerous as those of USB, and talk about HDMI 2.1.
Unsurprisingly, considering the numerous extra conductors and newer line
coding, HDMI offers a lot more bandwidth for audio than S/PDIF. In fact, you
can transport 8 channels of uncompressed 24-bit PCM at 192kHz. That's about
37 Mbps, which is not that fast for a data transport but sure is pretty fast
for an audio cable. Considering the bandwidth requirements for 4K video at
120Hz, though, it's only a minor ask. With HDMI, compression of audio is no
longer necessary.
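To put "only a minor ask" in numbers, this Python sketch compares the 8-channel audio figure with uncompressed 4K video at 120 Hz. It assumes 3840x2160 active pixels at 24 bits per pixel (8 bits per color channel) and ignores blanking intervals, chroma subsampling, and link-layer coding overhead. Treat it as an order-of-magnitude comparison, not an HDMI bandwidth calculation.

```python
def pcm_bps(channels, bits, sample_rate_hz):
    return channels * bits * sample_rate_hz


def raw_video_bps(width, height, fps, bits_per_pixel):
    """Active-pixel data only: no blanking, subsampling, or line coding."""
    return width * height * fps * bits_per_pixel


audio = pcm_bps(8, 24, 192_000)
video = raw_video_bps(3840, 2160, 120, 24)
print(f"audio: {audio / 1e6:.1f} Mbps")
print(f"video: {video / 1e9:.1f} Gbps")
print(f"audio is {100 * audio / video:.2f}% of the video rate")
```

Under those assumptions, eight channels of 24-bit/192 kHz PCM come to roughly 0.15% of the raw video rate.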
But we still usually do it.
Why? Well, basically everything can handle Dolby Digital or DTS, and so films
are mostly mastered to Dolby Digital or DTS, and so we mostly use Dolby Digital
or DTS. That's just the way of things.
One of the interesting implications of this whole thing is that audio stacks
have to deal with multiple formats and figure out which format is in use.
That's not really new, with Dolby Pro Logic you either had to turn it on/off
with a switch or the receiver had to try to infer whether or not Pro Logic had
been used to matrix a multichannel soundtrack to stereo. For S/PDIF, IEC 61937
standardizes a format that can be used to encapsulate a compressed audio stream
with sufficient metadata to determine the type of compression. HDMI adopts the
same standard to identify compressed audio streams (and, in general, HDMI audio
is pretty much in the same bitstream format as good old S/PDIF, but you can
have a lot more of it).
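A receiver tells compressed data from PCM by looking for a burst preamble. IEC 61937 starts each burst with four 16-bit words. The first two are fixed sync words (Pa = 0xF872, Pb = 0x4E1F), the third is a burst-info word Pc whose low five bits give the data type, and the fourth is a length code Pd. This Python sketch scans a sequence of already-extracted 16-bit words for that pattern. The data-type table is partial, and the sketch skips details a real decoder needs, such as byte order, Pd's type-dependent units, and the other fields packed into Pc. The example stream is hypothetical.

```python
PA, PB = 0xF872, 0x4E1F  # IEC 61937 burst sync words

# Partial table of IEC 61937 data-type codes (low five bits of Pc).
DATA_TYPES = {
    1: "AC-3",
    11: "DTS type I",
    12: "DTS type II",
    13: "DTS type III",
    21: "E-AC-3",
    22: "MAT (TrueHD)",
}


def find_bursts(words):
    """Yield (index, data type, length code) for each burst preamble found in a
    sequence of 16-bit words taken from the PCM sample slots."""
    for i in range(len(words) - 3):
        if words[i] == PA and words[i + 1] == PB:
            pc, pd = words[i + 2], words[i + 3]
            code = pc & 0x1F
            yield i, DATA_TYPES.get(code, f"unknown type {code}"), pd


# Silence (zeros) followed by a hypothetical AC-3 burst header.
stream = [0x0000, 0x0000, PA, PB, 0x0001, 0x3800, 0x0B77]
print(list(find_bursts(stream)))
```

To a sink that doesn't look for the preamble, the same words are just (very loud, very noisy) PCM samples, which is one reason format detection matters so much.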
In practice, there are a lot of headaches around this format switching. For
one, home theater receivers have to switch between decoding modes. They mostly
do this transparently and without any fuss, but I've owned a couple that had
occasional issues with losing track of which format was in use, leading to
dropouts. Maybe it's related to signal dropouts, but my current receiver has the same
problem with internal sources, so it seems more like a software bug of some
sort.
It's a lot more complicated when you get out of dedicated home theater devices,
though. Consider the audio stack of a general-purpose operating system. First,
PCs rarely have S/PDIF outputs, so we are virtually always talking about HDMI.
For a surprisingly long time, common video cards had no support for audio over
HDMI. This is fortunately a problem of the past, but unfortunately ubiquitous
audio over HDMI means that your graphics drivers are now involved in the
transport of audio, and graphics drivers are notoriously bad at reliably
producing video, much less dealing with audio as a side business. I shudder to
think of the hours of my life I have lost dealing with defects of AMD's DTS
support.
Things are weird on the host software side, though. The operating system
does not normally handle sound in formats even resembling Dolby Digital or DTS.
So, when you play a video file with audio encoded in one of those formats, a
"passthrough" feature is typically used to deliver the compressed stream
directly to the audio (often actually video) device, without normal operating
system intervention. We are reaching the point where this mostly just works
but you will still notice some symptoms of the underlying complexity.
On Linux, it's possible to get this working, but in part because of licensing
issues I don't think any distros will do it right out of the box. My knowledge
may be out of date as I haven't tried for some time, but I am still seeing Kodi
forum threads about bash scripts to bypass PulseAudio, so things seem mostly
unchanged.
There are other frustrations, as well. For one, the whole architecture of
multichannel audio interconnection is based around sinks detecting the mode
used by the source. That means that your home theater receiver should figure
out what your video player is doing, but your video player has no idea what
your home theater receiver is doing. This manifests in maddening ways.
Consider, for example, the number of blog posts I ran across (while searching
for something else!) about how to make Netflix less quiet by disabling surround
sound.
If Netflix has 5.1 audio they deliver it; they don't know what your speaker
setup is. But what if you don't have 5.1 speakers? In principle, you could
downmix the 5.1 back to stereo, and a lot of home theater receivers have DSP
modes that do this (and in general downmix 5.1 or 7.1 to whatever speaker
channels are active, good for people with less common setups like my own 3.1).
But you'd have to turn that on, which means having a receiver or soundbar or
whatever that is capable, understanding the issue, and knowing how to enable
that mode. That is way more than your average Netflix watcher wants to think
about. In practice, setting the Netflix player to only ever provide
stereo audio is an easier fix.
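To make the downmix idea concrete, here is a minimal sketch of what happens to each sample frame. It assumes the commonly cited ITU-R BS.775 coefficients (center and surrounds folded in at -3 dB, LFE dropped); real receivers and players vary in their exact coefficients, in whether they fold in the LFE, and in how they manage headroom. The `downmix_51_to_stereo` name and the dict-based frame are made up for illustration.

```python
import math

# -3 dB (about 0.707) for center and surrounds, per the commonly cited
# ITU-R BS.775 downmix. The LFE channel is simply dropped in this sketch.
CENTER_GAIN = 1 / math.sqrt(2)
SURROUND_GAIN = 1 / math.sqrt(2)


def downmix_51_to_stereo(frame):
    """Downmix one 5.1 sample frame to a (left, right) pair.

    frame: dict with keys "L", "R", "C", "LFE", "Ls", "Rs" holding floats in [-1, 1].
    """
    left = frame["L"] + CENTER_GAIN * frame["C"] + SURROUND_GAIN * frame["Ls"]
    right = frame["R"] + CENTER_GAIN * frame["C"] + SURROUND_GAIN * frame["Rs"]
    # Scale down so that even full-scale L + C + Ls cannot exceed 1.0 (no clipping).
    # This is one headroom choice among several that real decoders make.
    headroom = 1 + CENTER_GAIN + SURROUND_GAIN
    return left / headroom, right / headroom


# Dialog-only frame: the center channel lands equally in left and right.
print(downmix_51_to_stereo({"L": 0.0, "R": 0.0, "C": 1.0, "LFE": 0.0, "Ls": 0.0, "Rs": 0.0}))
```

Note that the center channel, where films put most of the dialog, ends up split evenly between the two speakers at reduced level, which is exactly the part viewers tend to care about.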
The use of compressed multichannel formats that are decoded in the receiver
rather than in the computer doing the playback introduces other problems as well, like
source equalization. If you have a computer connected to a home theater
receiver (which is a ridiculous thing to do and yet here I am), you have two
completely parallel audio stacks: "normal" audio that passes through the OS
sound server and goes to the receiver as PCM, and "surround sound" that
bypasses the OS sound server and goes to the receiver as Dolby Digital or DTS.
It is very easy to have differences in levels, adjustments, latency, etc.
between these two paths. The level problem here is just one of the several
factors in the perennial "Plex is too quiet" forum threads [4].
Finally, let's talk about what may be, to some readers, the elephant in the
room. I keep talking about Dolby Digital and DTS, but both are 5.1 formats,
and 5.1 is going out of fashion in the movie world. Sure, there's Dolby
Digital Plus which is 7.1, but it's so similar to the non-plus variant that
there isn't much use in addressing them separately. Insert the "Plus" after
Dolby Digital in the preceding paragraphs if it makes you feel better.
But there are two significantly different formats appearing on more and more
film releases, especially in the relatively space-unconstrained Blu-Ray
versions: lossless surround sound and object-based surround sound.
First, lossless is basically what it sounds like. Dolby TrueHD and DTS-HD are
both formats that present 7.1 surround with only lossless compression, at the
cost of a higher bitrate than older media and interconnects support. HDMI can
easily handle these, and if you have a fairly new setup of a Blu-Ray player
and recent home theater receiver connected by HDMI you should be able to
enjoy a lossless digital soundtrack on films that were released with one.
That's sort of the end of that topic, it's nothing that revolutionary.
But what about object-based surround sound? I'm using that somewhat lengthy
term to try to avoid singling out one commercial product, but, well, there's
basically one commercial product: Dolby Atmos. Atmos is heralded as a
revolution in surround sound in a way that makes it sort of hard to know what
it actually is. Here's the basic idea: instead of mastering a soundtrack by
mixing audio sources into channels, you master a soundtrack by specifying the
physical location (in Cartesian coordinates) of each sound source.
When the audio is played back, an Atmos decoder then mixes the audio into
channels on the fly, using whatever channels are available. Atmos allows the
same soundtrack to be used by theaters with a variety of different speaker
configurations, and as a result, makes it practical for theaters to expand
into much higher channel counts.
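Atmos's actual renderer is proprietary and I won't pretend to describe it, but the basic idea of turning an object position into per-speaker gains can be sketched with plain pairwise amplitude panning. This minimal version assumes a flat ring of speakers and ignores height entirely: it takes an object's (x, y) position, works out its angle from the listener, and splits it between the two neighboring speakers with a constant-power sine/cosine law. The function name and the speaker layout are illustrative.

```python
import math


def pan_object(x, y, speakers):
    """Return {speaker_name: gain} for a sound object at (x, y).

    Listener at the origin, +y straight ahead, +x to the right.
    speakers: dict of name -> azimuth in degrees (0 = front, positive = right).
    """
    if len(speakers) < 2:
        raise ValueError("need at least two speakers to pan between")
    azimuth = math.degrees(math.atan2(x, y)) % 360
    ring = sorted((angle % 360, name) for name, angle in speakers.items())
    gains = {name: 0.0 for name in speakers}
    for i, (a1, n1) in enumerate(ring):
        a2, n2 = ring[(i + 1) % len(ring)]   # next speaker clockwise, wrapping around
        span = (a2 - a1) % 360 or 360
        offset = (azimuth - a1) % 360
        if offset <= span:
            frac = offset / span
            # Constant-power pan law: gains satisfy g1^2 + g2^2 == 1.
            gains[n1] = math.cos(frac * math.pi / 2)
            gains[n2] = math.sin(frac * math.pi / 2)
            return gains
    return gains


# A typical 5.0 layout; the same object position works unchanged on any other ring.
layout = {"L": -30, "C": 0, "R": 30, "Ls": -110, "Rs": 110}
print(pan_object(1.0, 1.0, layout))   # shared between R and Rs
```

The point of the design is that the soundtrack only carries positions; the speaker dict is supplied at playback time, so adding channels means changing the layout, not remixing the film.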
Theaters aren't nearly as important a part of the film industry as they used
to be, though, and unsurprisingly Atmos is heavily advertised for consumer
equipment as well. How exactly does that work?
Atmos is conveyed on consumer equipment as 7.1 Dolby Digital Plus or Dolby
TrueHD with extra metadata.
If you know anything about HDR video, also known as SDR video with extra
metadata, you will find this unsurprising. But some might be confused. The
thing is, the vast majority of consumers don't have Atmos equipment, and
with lossless compression, soundtracks are starting to get very large, so
including two complete copies isn't very appealing. The consumer encoding of
Atmos was selected to have direct backward compatibility to 7.1 systems,
allowing normal playback on pre-Atmos equipment.
For Atmos-capable equipment, an extra PCM-like subchannel (at a reduced bitrate
compared to the audio channels) is used to describe the 3D position of specific
sound sources. Consumer Atmos decoders cannot support as many objects as the
theatrical version, so part of the process of mastering an Atmos film for home
release is clustering nearby objects into groups that are then treated as a
single object by the consumer Atmos decoder. One way to think about this is
that Atmos is downmixed to 7.1, and in the process a metadata stream is created
that can be used to upmix back to Atmos mostly correctly. If it sounds kind of
like matrix encoding, it kind of is, in effect, which is perhaps part of why
Dolby's marketing materials are so insistent that it is not matrix encoding.
To be fair it is a completely different implementation, but has a similar
effect of reducing the channel separation compared to the original source.
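The clustering step can be illustrated with an ordinary agglomerative approach: repeatedly merge the two clusters whose centroids are closest until the count fits the decoder's budget. This is a hypothetical sketch, not Dolby's mastering tooling, which presumably also weighs loudness and perceptual importance rather than distance alone. Positions are (x, y, z) tuples, and `max_clusters` stands in for whatever object limit the consumer decoder has.

```python
import math


def centroid(points):
    """Average position of a list of (x, y, z) tuples."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def cluster_objects(positions, max_clusters):
    """Greedily merge the closest pair of clusters until at most max_clusters remain.

    Returns a list of (centroid, member_indices) pairs.
    """
    if max_clusters < 1:
        raise ValueError("max_clusters must be at least 1")
    clusters = [([p], [i]) for i, p in enumerate(positions)]
    while len(clusters) > max_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i][0]), centroid(clusters[j][0]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = (clusters[i][0] + clusters[j][0], clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [(centroid(points), members) for points, members in clusters]


# Two pairs of nearby objects collapse into two "objects" for the home decoder.
print(cluster_objects([(0, 0, 0), (0.1, 0, 0), (5, 5, 0), (5.1, 5, 0)], 2))
```

Each merged cluster is rendered from its centroid, so the individual objects inside it lose their separate positions; that is the reduced separation described above.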
Also I don't think Atmos has really taken off in home setups? I might just be
out of date here, I think half the soundbars on the market today claim Atmos
support and amazing feats with their five channels two of which are pointed up.
I'm just pretty skeptical of the whole "we have made fewer, smaller speakers
behave as if they were more, bigger speakers" school of audio products. Sorry
Dr. Bose, there's just no replacement for displacement.
[1] The term Linear PCM or LPCM is used to clarify that no companding has been
performed. This is useful because PCM originated for the telephone network,
which uses companding as standard. LPCM clarifies that neither μ-law companding
nor A-law companding has been performed. I will mostly just use PCM because I'm
talking about movies and stuff, where companding digital audio is rare.
[2] There is also the matter of magnetic sources like turntables and
microphones that produce much lower output levels than a typical "line level."
Ideally you need a preamplifier with adjustable gain for these, although in the
case of turntables there are generally accepted gain levels for the two common
types of cartridges. A lot of preamplifiers either let you choose from those
two or give you no control at all. Traditionally a receiver would have a
built-in preamplifier to bring up the level of the signal on the turntable
inputs, but a lot of newer receivers have left this out to save money, which
leads to hipsters with vinyl collections having to really crank the volume.
[3] I don't feel like I should have to say this, but in the world of audio, I
probably do: if it works, it doesn't matter! The problem with optical is that
it develops reliability problems over shorter lengths than the electrical
format. If you aren't getting missing samples (dropouts) in the audio, though,
it's working fine and changing around cables isn't going to get you anything.
In practice the length limitations on optical don't tend to matter very much
anyway, since the average distance between two pieces of a component home
theater system is, what, ten inches?
[4] Among the myriad other factors here is the more difficult problem that
movies mix most of the dialog into the center channel while most viewers don't
have a center channel. That means you need to remix the center channel into
left and right to recover dialog. So-called professionals mastering Blu-Ray
releases don't always get this right, and you're in even more trouble if you're
having to do it yourself.
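Since footnote [1] leans on it, here is what μ-law companding actually does: it squeezes each sample through a logarithmic curve so that quiet signals get more of the available resolution. This sketch uses the continuous μ-law formula with μ = 255. The G.711 telephone codec approximates this curve with piecewise linear segments over 8-bit codes, which I haven't reproduced here.

```python
import math

MU = 255  # the mu value used by mu-law telephony


def mu_law_compress(x, mu=MU):
    """Map a sample in [-1, 1] onto [-1, 1] with the continuous mu-law curve."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)


# A quiet sample at 1% of full scale is boosted to roughly 23% before quantization.
print(mu_law_compress(0.01))
```

Linear PCM skips this step entirely: sample values are quantized directly, with equal step sizes across the whole range.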
Stereophonic or two-channel audio is so ubiquitous today that we tend to refer
to all kinds of pieces of consumer audio reproduction equipment as "a stereo."
As you might imagine, this is a relatively modern phenomenon. While stereo
audio in concept dates to the late 19th century, it wasn't common in consumer
settings until the 1960s and 1970s. Those were very busy decades in the music
industry, and radio stations, records, and film soundtracks all came to be
distributed primarily in stereo.
Given the success of stereo, though, one wonders why larger numbers of channels
have met more limited success. There are, as usual, a number of factors. For
one, two-channel audio was thought to be "enough" by some, considering that
humans have two ears. Now it doesn't quite work this way in practice,
and we are more sensitive to the direction from which sound comes than our
binaural system would suggest. Still, there are probably diminishing returns,
with stereo producing the most notable improvement in listening experience over
mono.
There are also, though, technical limitations at play. The dominant form of
recorded music during the transition to stereo was the vinyl record. There is a
fairly straightforward way to record stereo on a record, by using a cartridge
with coils on two perpendicular axes. This is the limit, though: you cannot add
additional channels as you have run out of dimensions in the needle's free
movement.
This was probably the main cause of the failure of quadraphonic sound, the
first music industry attempt at pushing more channels. Introduced almost
immediately after stereo, in the early 1970s, quadraphonic or four-channel sound
seemed like the next logical step. It couldn't really be encoded on records, so
a matrix encoding system was used in which the front-rear difference was
encoded as phase shift in the left and right channels. In practice this system
worked poorly, and early quadraphonic systems in particular could sound noticeably
worse than the stereo version. Wendy Carlos, an advocate of quadraphonic sound
but harsh critic of musical electronics, complained bitterly about the
inferiority of so-called quadraphonic records when compared to true
four-channel recordings, for example on tape.
Of course, four-channel tape players were vastly more expensive than record
players in the 1970s, as they ironically remain today. Quadraphonic sound was
in a bind: it was either too expensive or too poor of quality to appeal to
consumers. Quadraphonic radio using the same matrix encoding, while
investigated by some broadcasters, had its own set of problems and never
saw permanent deployment. Alan Parsons famously produced Pink Floyd's
"Dark Side of the Moon" in quadraphonic sound; the effort was a failure in
several ways but most memorably because, by the time of the album's release
in 1973, the quadraphonic experiment was essentially over.
Sound with three or more channels would make its comeback just a few years
later, though, through the efforts of a different industry. Understanding this
requires backtracking a bit to consider the history of cinema prints.
Many are probably at least peripherally aware of Cinerama, an eccentric-seeming
film format that used three separate cameras, and three separate projectors, to
produce an exceptionally widescreen image. Cinerama's excess was not limited to
the picture: it involved not only the three 35mm film reels for the three
screen panels, but also a fourth 35mm film that was entirely coated with a
magnetic substrate and was used to store seven channels of audio. Five channels
were placed behind the screen, effectively becoming center, left, right, left
side, and right side. The final two tracks were played back behind the audience,
as the surround left and surround right.
Cinerama debuted in 1952, decades before 35mm films would typically carry even
stereo audio. Like quadraphonic sound later, Cinerama was not particularly
successful. By the time stereo records were common, Cinerama had been replaced
by wider film formats and anamorphic formats in which the image was
horizontally compressed by the lens of the camera, and expanded by the lens of
the projector. Late Cinerama films like 2001: A Space Odyssey were actually
filmed in Super Panavision 70 and projected onto Cinerama screens from a single
projector with a specialized lens.
There's a reason people talk so much about Cinerama, though. While it was not a
commercial success, it was influential on the film industry to come. Widescreen
formats, mostly anamorphic, would become increasingly common in the following
decades. Seven-channel theatrical sound would follow as well, though it took
considerably longer.
"Surround sound," as these multi-channel formats came to be known in the late
'50s, would come and go in theatrical presentations throughout the mid-century
even as the vast majority of films were presented monaurally, with only a
single channel. Most of these relied on either a second 35mm reel for audio
only, or the greater area for magnetic audio tracks allowed by 70mm film. Both
of these options were substantially more expensive for the presenting theater
than mono, limiting surround sound mostly to high-end theaters and premieres.
For surround sound to become common, it had to become cheap.
1971's A Clockwork Orange (I will try not to fawn over Stanley Kubrick too
much but you are learning something about my film preferences here) employed a
modest bit of audio technology, something that was becoming well established in
the music industry but was new to film. The magnetic recordings used during the
production process employed Dolby Type A noise reduction, similar to what
became popular on compact cassette tapes, for a slight improvement in audio
quality. The film was still mostly screened in magnetic mono, but it was the
beginning of a profitable relationship between Dolby Labs and the film
industry. Over the following years a number of films were released with Dolby
Type A noise reduction on the actual distribution print, and some theaters
purchased decoders to use with these prints. Dolby had bigger ambitions,
though.
Around the same time, Kodak had been experimenting with the addition of stereo
audio to 35mm release prints, using two optical tracks. They applied Dolby
noise reduction to these experimental prints, and brought Dolby in to consult.
This presented the perfect opportunity to implement an idea Dolby had been
considering. Remember the matrix encoded quadraphonic recording that had been a
failure for records? Dolby licensed a later-generation matrix decoder design
from Sansui, and applied it to Kodak's stereo film soundtracks, allowing
separation into four channels. While the music industry had placed the four
channels at the four corners of the soundstage, the film industry had different
tastes, driven mostly by the need to place dialog squarely in the center of the
field. Dolby's variant of quadraphonic audio was used to present left, right,
center, and a "surround" or side channel. This audio format went through
several iterations, including much improved matrix decoding, and along the way
picked up a name that is still familiar today: Dolby Stereo.
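The core of a four-into-two matrix is just sums and differences. This is a deliberately simplified sketch: real Dolby Stereo encoders apply ±90° phase shifts to the surround channel, band-limit it, and apply noise reduction, whereas here the surround is simply added to the two tracks in opposite polarity. A passive decoder then recovers center from the sum and surround from the difference. The function names are made up for illustration.

```python
import math

G = 1 / math.sqrt(2)  # -3 dB


def matrix_encode(l, c, r, s):
    """Fold four channels (left, center, right, surround) into two tracks (Lt, Rt)."""
    lt = l + G * c + G * s
    rt = r + G * c - G * s   # surround goes in with opposite polarity on Rt
    return lt, rt


def passive_decode(lt, rt):
    """Recover four channels using fixed sum/difference arithmetic."""
    return {"L": lt, "R": rt, "C": G * (lt + rt), "S": G * (lt - rt)}


# A sound panned hard left leaks into center and surround.
print(passive_decode(*matrix_encode(1.0, 0.0, 0.0, 0.0)))
```

Center-only and surround-only signals come back cleanly, but the hard-left example above also shows up in C and S at about 0.707, roughly -3 dB. That poor separation is what the later generations of active, steered decoders worked to hide. Played on an ordinary stereo system, Lt and Rt are simply left and right with the center folded in, which is the backward compatibility that mattered so much.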
That Dolby Stereo is, in fact, a quadraphonic format reflects a general
atmosphere of terminological confusion in the surround sound industry. Keep
this in mind.
One of Dolby Stereo's most important properties was its backwards
compatibility. The two optical tracks could be played back on a two-channel (or
actually stereo) system and still sound alright. They could even be placed on
the print alongside the older magnetic mono audio, providing compatibility with
mono theaters. This compatibility with fewer channels became one of the most
important traits in surround sound systems, and somewhat incidentally served to
bring them to the consumer. Since the Dolby Stereo soundtrack played fine on a
two-channel system, home releases of films on formats like VHS and Laserdisc
often included the original Dolby Stereo audio from the print. A small industry
formed around these home releases, licensing the Dolby technology to sell
consumer decoders that could recover surround sound from home video.
For cost reasons these decoders were inferior to Dolby's own in several ways,
and to avoid the hazard of damage to the Dolby Stereo brand, Dolby introduced a
new marketing name for consumer Dolby Stereo decoders: Dolby Surround.
By the 1980s, Dolby Stereo, or Dolby Surround, had become the most common audio
format on theatrical presentations and their home video releases. Even some
television programs and direct-to-video material were recorded in Dolby
Surround. Consumer stereo receivers, in the variant that came to be known as
the home theater receiver, often incorporated Dolby Surround decoders.
Improvements in consumer electronics brought the cost of proper Dolby Stereo
decoders down, and so the home systems came to resemble the theatrical systems
as well. Seeking a new brand to unify the whole mess of Dolby Stereo and Dolby
Surround (which, confusingly, were often 4 and 3 channel, respectively), Dolby
seems to have turned to the "Advanced Logic" and "Full Logic" terms once used
by manufacturers of quadraphonic decoders. Dolby's theatrical sound solution
came to be known as Dolby Pro Logic. A Dolby Pro Logic decoder processed two
audio channels to produce a four-channel output. According to a modern naming
convention, Dolby Pro Logic is a 4.0 system: four full-bandwidth channels.
This entire thing, so far, has been a preamble to the topic I actually meant to
discuss. It's an interesting preamble, though! I should apologize: I didn't
set out to write a history of multi-channel audio distribution, and so this
one isn't especially complete. I left out a number of interesting attempts at
multi-channel formats, of which the film industry produced a surprising number,
and instead focused on the ones that were influential and/or used for Kubrick
films [1].
Dolby Pro Logic, despite its impressive name, was still an analog format, based
on an early '70s technique. Later developments would see an increase in the
number of channels, and the transition to digital audio formats.
Recall that 70mm film provided six magnetic audio channels, which were often
used in an approximation of the seven-channel Cinerama format. Dolby
experimented with the six-channel format, though, confusingly also under the
scope of the Dolby Stereo product. During the '70s, Dolby observed that the
ability of humans to differentiate the source of a sound is significantly
reduced as the sound becomes lower in frequency. This had obvious potential
for surround sound systems, enabling something analogous to chroma subsampling
in video. The lower-frequency component of surround sound does not need to be
directional, and for a sense of directionality the high frequencies are most
important.
Besides, bassheads were coming to the film industry. The long-used Academy
response curve fell out of fashion during the '70s, in part due to Dolby's
work, in part due to generally improved loudspeaker technology, and in part
due to the increasing popularity of bass-heavy action films. Several 70mm
releases used one or more of the audio channels as dedicated bass channels.
For the 1979 film Apocalypse Now in its 70mm print, Dolby premiered a 5.1
format in which three full-bandwidth channels were used for center, left,
and right, two channels with high-pass filtering were used for surround left
and surround right, and one channel with low-pass filtering was used for bass.
Apocalypse Now was not, in fact, the first film to use this channel
configuration, but Dolby promoted it far more than the studios had.
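The observation that low frequencies don't need to be directional is what makes a dedicated bass channel work: the low end can be pulled out and handled on its own. Here is a minimal sketch of that split, using a first-order low-pass filter and taking the high band as whatever the low-pass removed, so the two bands always sum back to the input. This is closer to what a home receiver's bass management does than to how a film's bass track is mixed, and real crossovers use much steeper filters; the 80 Hz cutoff in the usage line is an illustrative assumption.

```python
import math


def split_bass(samples, sample_rate_hz, cutoff_hz):
    """Split a signal into (low, high) bands whose sum equals the input."""
    # Smoothing coefficient for a one-pole low-pass at the given cutoff.
    alpha = 1 - math.exp(-2 * math.pi * cutoff_hz / sample_rate_hz)
    low, high = [], []
    state = 0.0
    for x in samples:
        state += alpha * (x - state)   # one-pole low-pass filter
        low.append(state)
        high.append(x - state)         # complementary high-pass: the remainder
    return low, high


# The low band would go to the bass channel, the high band to the directional speakers.
low, high = split_bass([1.0, -1.0] * 1000, 48_000, 80)
```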
Interestingly, while I know less about live production history, the famous
cabaret Moulin Rouge apparently used a 5.1 configuration during the 1980s.
Moulin Rouge was prominent enough to give the 5.1 format a boost in popularity,
perhaps particularly important because of the film industry's indecision on
audio formats.
The seven-channel concept of the original Cinerama must have hung around in the
film industry, as there was continuing interest in a seven-channel surround
configuration. At the same time, the music industry widely adopted
eight-channel tape recorders for studio use, making eight-channel audio
equipment readily available. The extension to 7.1 surround, adding left and
right side channels to the 5.1 configuration, was perhaps obvious. Indeed,
what I find strangest about 7.1 is just how late it was introduced to film.
Would you believe that the first film released (not merely remastered or mixed
for Blu-Ray) in 7.1 was 2010's Toy Story 3?
7.1 home theater systems were already fairly common by then, a notable example
of a modern trend afflicting the film industry: the large installed base and
cost avoidance of the theater industry means that consumer home theater
equipment now evolves more quickly than theatrical systems. Indeed, while
7.1 became the gold standard in home theater audio during the 2000s, 5.1
remains the dominant format in theatrical sound systems today.
Systems with more than eight channels are now in use, but haven't caught on in
the consumer setting. We'll talk about those later. For most purposes,
eight-channel 7.1 surround sound is the most complex you will encounter in home
media. The audio may take a rather circuitous route to its 7.1 representation,
but, well, we'll get to that.
Let's shift focus, though, and talk a bit about the actual encodings. Audio
systems up to 7.1 can be implemented using analog recording, but numerous
analog channels impose practical constraints. For one, they are physically
large, making it infeasible to put even analog 5.1 onto 35mm prints. Prestige
multi-channel audio formats like that of IMAX often avoided this problem by
putting the audio onto an entirely separate film reel (much like Cinerama back
at the beginning), synchronized with the image using a pulse track and special
equipment. This worked well but drove up costs considerably. Dolby Stereo
demonstrated that it was possible to matrix four channels into two channels
(with limitations), but considering the practical bandwidth of the magnetic or
optical audio tracks on film you couldn't push this technique much further.
Remember that the theatrical audio situation changed radically during the
1970s, going from almost universal mono audio to four channels as routine and
six channels for premieres and 70mm. During the same decade, the music
reproduction industry, especially in Japan, was exploring another major
advancement: digital audio encoding.
The Compact Disc format was standardized in 1980, and the first players and
discs went on sale in 1982. Numerous factors contributed to the rapid
success of CDs over vinyl and, to a lesser but still great extent, the compact
cassette. One of them was the quality of the audio reproduction. CDs were a
night and day change: records could produce an excellent result but almost
always suffered from dirt and damage. Cassette tapes were better than most
of us remember but still had limited bandwidth and a high noise floor,
requiring Dolby noise reduction for good results. The CD, though, provided
lossless digital audio.
Audio is encoded on an audio CD in PCM format. PCM, or pulse code modulation,
is a somewhat confusing term that originated in the telephone industry. If we
were to reinvent it today, we would probably just call it digital modulation.
To encode a CD, audio is sampled (at 44.1 kHz for historic reasons) and
quantized to 16 bits. A CD carries two channels, stereo, which was by then
the universal format for music. Put together, those add up to 1.4Mbps. This
was a very challenging data rate in 1980, and indeed, practical CD players
relied on the fact that the data did not need to be read perfectly (error
correcting codes were used) and did not need to be stored (going directly
to a digital-to-analog converter). These were conveniently common traits
of audio reproduction systems, and the CD demonstrated that digital audio
was far more practical than the computing technology of the time would
suggest.
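The sampling and quantization steps are simple enough to sketch directly. This minimal version scales floats in [-1.0, 1.0] to 16-bit signed integers, clips anything out of range, and interleaves the two channels as left, right, left, right. The byte order is little-endian, the layout WAV files use; treat it as an assumption for anything else, and note that a real CD adds error-correcting codes and framing on top of this. The function names are made up for illustration.

```python
import math
import struct

SAMPLE_RATE_HZ = 44_100                       # CD sampling rate
BITS_PER_SAMPLE = 16
FULL_SCALE = 2 ** (BITS_PER_SAMPLE - 1) - 1   # 32767


def quantize(sample):
    """Map a float in [-1.0, 1.0] to a 16-bit signed integer, clipping overs."""
    return max(-FULL_SCALE - 1, min(FULL_SCALE, round(sample * FULL_SCALE)))


def pcm_encode_stereo(left, right):
    """Interleave two channels as L, R, L, R, ... 16-bit signed little-endian."""
    if len(left) != len(right):
        raise ValueError("both channels must have the same number of samples")
    interleaved = [quantize(s) for pair in zip(left, right) for s in pair]
    return struct.pack(f"<{len(interleaved)}h", *interleaved)


# One second of a 1 kHz tone at half scale in both channels.
tone = [0.5 * math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE_HZ) for n in range(SAMPLE_RATE_HZ)]
data = pcm_encode_stereo(tone, tone)
print(len(data))   # bytes for one second of stereo audio
```

One second comes out to 44,100 × 2 channels × 2 bytes = 176,400 bytes, or 1,411,200 bits, which is where the 1.4Mbps figure comes from.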
The future of theatrical sound would be digital. Indeed, many films would
be distributed with their soundtracks on CD.
There remained a problem, though: a CD could encode two channels. Even four
channels wouldn't fit within the data rate CD equipment was capable of, much
less six or eight. The film industry would need formats that could encode
six or eight channels of audio into either the bandwidth of a two-channel
signal or into precious unused space on 35mm film prints.
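The arithmetic behind that problem is short. This sketch computes the uncompressed PCM payload rate at CD parameters for various channel counts, ignoring the error correction and framing overhead a real medium adds.

```python
def pcm_bitrate(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM payload rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


for channels in (2, 4, 6, 8):
    print(f"{channels} channels: {pcm_bitrate(44_100, 16, channels) / 1e6:.2f} Mbps")
```

Stereo comes out at about 1.41 Mbps, while eight channels at the same quality need about 5.64 Mbps, four times what CD hardware was built to read.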
Many ingenious solutions were developed. A typical 35mm film print today
contains three distinct representations of the audio: a two-channel optical
signal outside of the sprocket holes (which could encode Dolby Stereo), a
continuous 2D barcode between the frame and sprocket holes which carries the
SDDS (Sony Dynamic Digital Sound) digital signal, and individual 2D barcodes
between the sprocket holes which encode the Dolby Digital signal. Finally, a
small pulse pattern at the very edge of the film provides a time code used for
synchronization with audio played back from a CD, the DTS system.
But then, a typical 35mm film print today wouldn't exist, as 35mm film
distribution has all but disappeared. Almost all modern film is played back
entirely digitally from some sort of flexible stream container. You would
think, then, that the struggles of encoding multi-channel audio are over. Many
media container formats can, after all, contain an arbitrary number of audio
channels.
Nothing is ever so simple. Much like a dedicated audio reel adds cost, multiple
audio channels inflate file sizes, media cost, and in the era of playback from
optical media, could stress the practical read rate. Besides, constraints of
the past have a way of sticking around. Every multichannel audio format to find
widespread success in the film industry has done so by maintaining backwards
compatibility with simple mono and stereo equipment. That continues to be true
today: modern multi-channel digital audio formats are still mostly built as
extensions of an existing stereo encoding, not as truly new arbitrary-channel
formats.
At the same time, the theatrical sound industry has begun a transition away
from channel-centric audio formats and towards a more flexible system that is
much further removed from the actual playback equipment.
Another trend has emerged since 1980 as well, which you probably already
suspected from the multiple formats included in 35mm prints. Dolby's supremacy
in multi-channel audio was never as complete as I made it sound, although they
did become (and for some time remained) the most popular surround sound
solution. They have always had competition, and that's still true today. Just
as 35mm prints came with the audio in multiple formats, current digitally
distributed films often do as well.
In Part 2, I'll get to the topic I meant to write about today before I got
distracted by history: the landscape of audio formats included in digitally
distributed films and common video files today, and some of the ways they
interact remarkably poorly with computers. We're going to talk about:
Dolby Digital/AC-3/AC-4
DTS
Dolby Atmos
MPEG Surround/MPEG-H 3D
HDMI (ugh)
And more!
Postscript: Film dweebs will of course wonder where George Lucas is in this
story. His work on the Star Wars trilogy led to the creation of THX, a company
that will long be remembered for its distinctive audio identity. The odd thing
is that THX was never exactly a technology company, although it was closely
involved in sound technology developments of the time. THX was essentially a
certification agency: THX theaters installed equipment by others (Altec
Lansing, for much of the 20th century), and used any of the popular
multi-channel audio formats.
To be a THX-certified theater, certain performance requirements had to be met,
regardless of the equipment and format in use. THX certification requirements
included architectural design standards for theaters, performance
specifications for audio equipment, and a specific crossover configuration
designed by Lucasfilm.
In 2002, Lucasfilm spun out THX, and it essentially became a rental brand,
eventually shuffled into the ownership of gamer headphone manufacturer Razer.
THX certification still pops up in some consumer home theater equipment but
is no longer part of the theatrical audio industry.
[1] Incidentally, Kubrick did not adopt Dolby Stereo. Despite his early
experience with Dolby noise reduction, all of his films would be released
in mono except for 2001 (six-channel audio only in the Cinerama release)
and Eyes Wide Shut (edited in Dolby Stereo after Kubrick's death).
Previously on Deep Space Nine, I wrote that "the mid-2000s were an unsettled
time in mobile computing."
Today, I want to share a little example. Over the last few weeks, for various
personal reasons, I have been doing a lot of reading about embedded operating
systems and ISAs for embedded computing. Things like the NXP TriMedia (Harvard
architecture!) and pSOS+ (ran on TriMedia!). As tends to happen, I kept coming
across references to a device that stuck in my memory: the TacNet Tracker. It
prominently features on Wikipedia's list of applications for the popular
VxWorks real-time operating system.
It's also an interesting case study in the mid-2000s field of mobile computing,
especially within academia (or at least the Department of Energy). You see,
"mobile computing" used to be treated as a field of study, a subdiscipline
within computer science. Mobile devices imposed practical constraints, and they
invited more sophisticated models of communication and synchronization than
were used with fixed equipment. I took a class on mobile computing as an
undergraduate, although it was already feeling dated at the time.
Today, with the ubiquity of smartphones, "mobile computing" is sort of the
normal kind. Perhaps future computer science students will be treated to a
slightly rusty elective in "immobile computing." The kinds of strange
techniques you use when you aren't constrained by battery capacity. Busy
loop to blink the cursor!
Sometime around 2004, Sandia National Laboratories' 6452 started work on the
TacNet Tracker. The goal: to develop a portable computer device that could be
used to exchange real-time information between individuals in a field
environment. A presentation states that an original goal of the project was to
use COTS (commercial, off-the-shelf) hardware, but it was found to be
infeasible. Considering the state of the mobile computing market in 2004, this
isn't surprising. It's not necessarily that there weren't mobile devices
available; if anything, the opposite. There were companies popping up with
various tablets fairly regularly, and then dropping them two years later. You
can find any number of Windows XP tablets; but the government needed something
that could be supported long-term. That perhaps explains the "Life-cycle
limitations" bullet point the presentation wields against COTS options.
The only products with long-term traction were select phones and PDAs like the
iPAQ and Axim. Even this market collapsed almost immediately with the release
of the iPhone, although Sandia engineers couldn't have known that was coming.
Still, the capabilities and expandability of these devices were probably too
limited for the Tracker's features. There's a reason all those Windows XP
tablets existed. They weighed ten pounds, but they were beefy enough to run the
data entry software that was the main use of commercial mobile computing at
the time.
The TacNet Tracker, though, was designed to fit in a pocket and to
incorporate geospatial features. Armed with a Tracker, you could see the
real-time location of other Tracker users on a map. You could even annotate
the map, marking points and lines, and share these annotations with others.
This is all very mundane today! At the time, though, it was an obvious and yet
fairly complex application for a mobile device.
The first question, of course, is of architecture. The Tracker was built around
the XScale PXA270 SoC. XScale, remember, was Intel's marketing name for their
ARMv5 chips manufactured during the first half of the '00s. ARM was far less
common back then, but was already emerging as a leader in power-efficient
devices. The PXA270 was an early processor to feature speed-stepping, decreasing
its clock speed when under low load to conserve power.
The PXA270 was attached to 64MB of SDRAM and 32MB of flash. It supported more
storage on CompactFlash, had an integrated video adapter, and provided a set of UARTs
that, in the Tracker, would support a serial interface, a GPS receiver, and
Bluetooth.
A rechargeable Li-Poly pack allowed the Tracker to operate for "about 4 hours,"
but the presentation promises 8-12 hours in the future. Battery life was a huge
challenge in this era. It probably took about as long to charge as it did to
discharge, too. There hadn't been much development in high-rate embedded battery
chargers yet.
The next challenge was communication. 802.11 WiFi was gaining popularity by
this time, but suffered from a difficult and power-intensive association
process even more than it does today. Besides, in mobile applications like
those the Tracker was intended for, conventional WiFi's requirement for
network infrastructure was impractical. Instead, Sandia turned to Motorola.
The Tracker used a PCMCIA WMC6300 Pocket PC MEA modem. MEA stands for "Mesh
Enabled Architecture," which seems to have been the period term for something
Motorola later rebranded as MOTOMESH.
Marketed primarily for municipal network and public safety applications,
MOTOMESH is a vaguely 802.11-adjacent proprietary radio protocol that provides
broadband mesh routing. One of the most compelling features of MEA and MOTOMESH
is their flexibility: MOTOMESH modems will connect to fixed infrastructure nodes
under central management, but they can also connect directly to each other,
forming ad-hoc networks between adjacent devices. 802.11 itself was
conceptually capable of the same, but in practice, the higher-level software to
support this kind of use never really emerged. Motorola offered a complete
software suite for MOTOMESH, though, and for no less than Windows CE.
Yes, it really reinforces the period vibes that the user manual for the WMC6300
modem starts by guiding you through using Microsoft ActiveSync to transfer the
software to an HP iPaq. One did not simply put files onto a mobile device at the
time; you had to sync them. Microsoft tried to stamp out an ecosystem of
proprietary mobile device sync protocols with ActiveSync. Ultimately none of
them would really see much use; PDAs were always fairly niche.
Sandia validated the performance of the Tracker's MEA modem using an Elektrobit
Propsim C2. I saw one of these at auction once (possibly the same one!), and
sort of wish I'd bid on it. It's a chunky desktop device with a set of RF ports
and the ability to simulate a wide variety of different radio paths between
those ports, introducing phenomena like noise, fading, and multipath that will
be observed in the real world. The results are impressive: in a simulated hilly
environment, Trackers could exchange a 1MB test image in just 13.6 seconds.
Remember that next time you are frustrated by LTE; we really take what we have
today for granted.
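For a sense of scale, that figure can be turned into a bit rate with some quick arithmetic. This is only a back-of-the-envelope sketch under my own assumptions: it treats the whole 13.6 seconds as payload transfer (ignoring any handshake or protocol overhead), and since I can't tell whether "1MB" meant 10^6 or 2^20 bytes, it computes both.

```python
# Back-of-the-envelope throughput for the simulated 1MB image transfer.
# Assumptions (not from the source): the 13.6 s is pure payload time, and
# "1MB" may be either 10**6 bytes (MB) or 2**20 bytes (MiB).

TRANSFER_SECONDS = 13.6


def effective_kbps(size_bytes: int, seconds: float) -> float:
    """Average throughput in kilobits per second (1 kbit = 1000 bits)."""
    return size_bytes * 8 / seconds / 1000


decimal_mb = effective_kbps(10**6, TRANSFER_SECONDS)  # 1 MB  = 1,000,000 bytes
binary_mib = effective_kbps(2**20, TRANSFER_SECONDS)  # 1 MiB = 1,048,576 bytes

print(f"1 MB  in {TRANSFER_SECONDS} s: {decimal_mb:.0f} kbit/s")
print(f"1 MiB in {TRANSFER_SECONDS} s: {binary_mib:.0f} kbit/s")
```

Either interpretation lands at roughly 590 to 620 kbit/s of average throughput across the simulated mesh link.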
But what of the software? Well, the Tracker ran VxWorks. Actually, that's how I
ran into it: it seems that Wind River (developer of VxWorks) published a
whitepaper about the Tracker. It made it onto a list of featured applications,
which a Wikipedia editor then used as a source to flesh out the
article. Unfortunately I can't find the original whitepaper, only dead links to
it. I'm sure it would have been a fun read.
VxWorks is a real-time operating system mostly used in embedded applications.
It supports a variety of architectures, provides a sophisticated process
scheduler with options for hard real-time and opportunistic workloads, offers
network, peripheral bus, and file system support, and even a POSIX-compliant
userspace. It remains very popular for real-time control applications today,
although I don't think you'd find many UI-intensive devices like the Tracker
running it. A GUI framework is actually a fairly new feature.
The main application for the Tracker was a map, with real-time location and
annotation features. It seems that a virtual whiteboard and instant messaging
application were also developed. A charmingly cyberpunk Bluetooth wrist-mounted
display was pondered, although I don't think it was actually made.
But what was it actually for?
Well, federal R&D laboratories have a tendency to start a project for one
application and then try to shop it around to others, so the materials Sandia
published present a somewhat mixed message. A conference presentation suggests
it could be used to monitor the health of soldiers in-theater (a very common
justification for grants in mobile computing research!), for
situational awareness among security or rescue forces, or for remote control of
weapons systems.
I think a hint comes, though, from the only concrete US government application
I can find documented: in 2008, Sandia delivered the TacNet Tracker system to
the DoE Office of Secure Transportation (OST). OST is responsible for the
over-road transportation of nuclear weapons and nuclear materials in the United
States. Put simply, they operate a fleet of armored trucks and accompanying
security escorts. There is a fairly long history, back to at least the '70s, of
Sandia developing advanced radio communications systems for use by OST convoys.
Many of these radio systems seemed ahead of their time or at least state of the
art, but they often failed to gain much traction outside of DoE. Perhaps this
relates to DoE culture, perhaps to the extent to which private contractors have
captured military purchasing.
Consider, for example, that Sandia developed a fairly sophisticated digital HF
system for communication between OST convoys and control centers. It seemed
rather more advanced than the military's ALE solution, but a decade or so later
OST dropped it and went to using ALE like everyone else (likely for
interoperability with the large HF ALE networks operated by the FBI and CBP for
domestic security use, although at some point the DoE itself also procured its
own ALE network). A whole little branch of digital HF technology that just sort
of fizzled out in the nuclear weapons complex. There are a lot of things like
that; it's what you get when you put an enormous R&D capability into a
particularly insular and secretive part of the executive branch.
Sandia clearly hoped to find other applications for the system. A 2008 Sandia
physical security manual for nuclear installations recommends that security
forces consider the TacNet Tracker as a situational awareness solution. It was
pitched for several military applications. It's hard to tell because
the name "TacNet" is a bit too obvious, but it doesn't seem that the Sandia
device ever gained traction in the military.
As it does with many technical developments that don't go very far, Sandia
licensed the technology out. It went to a company called Homeland Integrated
Security Systems (HISS), a very typical name for a company that sells licensed
government technology. HISS partnered with a UK-based company called Arcom to
manufacture the TacNet Tracker as a commercial product, and marketed it to
everyone from the military to search and rescue teams.
HISS must have found that the most popular application of the Tracker was asset
tracking. It makes sense: the Tracker device itself lacked a display, under the
assumption that it would be in a dock or used with an accessory body-worn
display. By the late 2000s, HISS had rebranded the TacNet Tracker as the
CyberTracker, and re-engineered it around a Motorola iDEN board. I doubt they
actually did much engineering on this product; it seems to have been pretty
much an off-the-shelf Motorola iDEN radio that HISS just integrated into their
tracking platform. It was advertised as a deterrent to automotive theft and a
way to track hijacked school buses in real time; the marketing even invoked the
Chowchilla kidnapping.
And that's the curve of millennial mobile computing: a cutting-edge R&D project
around special-purpose national security requirements, pitched as a general
purpose tactical device, licensed to a private partner, turned into yet another
commodity anti-theft tracker. It's as if LoJack had started out guarding
nuclear weapons. Just a little story about telecommunications history.
Sandia applied for a patent on the Tracker in 2009, so it's probably still in
force (ask a patent attorney). HISS went through a couple of restructurings
but, as far as I can tell, no longer exists. The same goes for Arcom; a company
by the same name that makes cable TV diagnostic equipment seems to be
unrelated. Like the OLPC again, all that is left of the Tracker is a surprising
number of used units for sale. I'm not sure who ever used the commercial
version, but they sure turn up on eBay. I bought one, of course. It'll make a
good paperweight.