I have admitted on HN to sometimes writing computer.rip posts which are
extensions of my HN comments, and I will make that admission here as well. A
discussion recently came up that relates to a topic I am extremely interested
in: the fundamental loss of peer-to-peer capability on the internet and various
efforts to implement peer-to-peer distributed systems on top of the internet.
Of course, as is usual, someone questioned my contention that there really is
no such thing as a distributed system on the internet with the example of
Bitcoin. Having once made the mistake of writing a graduate thesis on Bitcoin
implementation details, it is one of the things I feel most confident in
complaining about, and I do so frequently. Rest assured that 1) Bitcoin's
technical implementation is a heaping pile of trash which ought to immediately
dispel stories about Nakamoto being some kind of engineering savant, and 2)
Bitcoin is no exception to the fundamental truth that the internet does not
permit distributed systems.
But before we get there we need to start with history... which is fortunate,
since history is the part I really care to write about.
Some time ago I mentioned that I had a half-written blog post that I would
eventually finish. I still have it, although it's now more like 3/4 written.
The topic of that post tangentially involves early computer networks such as
ARPANET, BITNET, ALOHAnet, etc. that grew up in academic institutions and
incubated the basic concepts around which computer networks are built today.
One of my claims there is that ARPANET is overrated and had a smaller impact on
the internet of today than everyone thinks (PLATO and BITNET were probably both
more significant), but it is no exception to a general quality that most of
these early networks shared and that has been, well, part of the internet
story: they were fundamentally peer to peer.
This differentiation isn't a minor one. Many early commercial computer networks
were extensions of timeshare multiple access systems. That is, they had a
strictly client-server (or we could actually call it terminal-computer)
architecture in which service points and service clients were completely
separated. Two clients were never expected to communicate with each other
directly, and the architecture of the network often did not facilitate this
kind of use.
Another major type of network which predated computer networks and influenced
them was the telegraph network. Telegraph systems had long employed
some type of "routing," and you can make a strong argument that the very
concept of packet switching originated in telegraph systems. We tend to think
of telegraph systems as manual, Morse-code-based networks where routing
decisions and the actual message transfer were conducted by men wearing green
visors. By the 1960s, though, telegraph networks were gaining full automation.
Messages were sent in Baudot (a 5-bit character encoding) with standardized
headers and trailers that allowed electromechanical equipment and, later,
computers to route them from link to link automatically. The resemblance to packet
switched computer networks is very strong, and by most reasonable definitions
you could say that the first wide-scale packet network in the US was that of
Western Union [1].
Still, these telegraph networks continued to have a major structural difference
from what we now consider computer networks. Architecturally they were
"inverted" from how we think of hardware distribution on computer networks:
they relied on simple, low-cost, low-capability hardware at the customer site,
and large, complex routing equipment within the network. In other words, the
"brains of the operation" were located in the routing points, not in the users
equipment. This was useful for operators like Western Union in that it reduced
the cost of onboarding users and let them perform most of the maintenance,
configuration, upgrades etc. at their central locations. It was not so great
for computer systems, where it was desirable for the computers to have a high
degree of control over network behavior and to minimize the cost of
interconnecting computers given that the computers were typically already
there.
So there are two significant ways in which early computer networks proper were
differentiated from time-sharing and telegraph networks, and I put both of them
under the label of "network of equals," a business term that is loosely
equivalent to "peer to peer" but more to our point. First, a computer network
allows any node to communicate with any other node. There is no strict
definition of a "client" or "server" and the operation of the network does not
make any such assumptions as to the role of a given node. Second, a computer
network places complexity at the edge. Each node is expected to have the
ability to make its own decisions about routing, prioritization, etc. In
exchange, the "interior" equipment of the network is relatively simple and does
not restrict or dictate the behavior of nodes.
A major manifestation of this latter idea is distributed routing. Most earlier
networks had their routes managed centrally. In the phone and telegraph
networks, the maintenance of routing tables was considered part of "traffic
engineering," an active process performed by humans in a network operations
center. In order to increase flexibility, computer networks often found it more
desirable to generate routing tables automatically based on exchange of
information between peers. This helped to solidify the "network of equals"
concept by eliminating the need for a central NOC with significant control of
network operations, instead deferring routing issues to the prudence of the
individual system operators.
Both of these qualities make a great deal of sense in the context of computer
networking having been pioneered by the military during the Cold War.
Throughout the early days of both AUTODIN (the military automated telegraph
network) and then ARPANET, which was in some ways directly based on it, there
was a general atmosphere that survivability was an important characteristic of
networks. This could be presented specifically as survivability in nuclear war
(which we know was a key goal of basically all military communications projects
at the time), but it has had enduring value outside of the Cold War context as
we now view a distributed, self-healing architecture as being one of the great
innovations of packet-switched computer networks. The fact that this is also
precisely a military goal for survival of nuclear C2 may have more or less
directly influenced ARPANET depending on who you ask, but I think it's clear
that it was at least some of the background that informed many ARPANET design
decisions.
It might help illustrate these ideas to briefly consider the technical
architecture of ARPANET, which, while a somewhat direct precursor to the modern
internet, is both very similar to it and very different from it. Computers did not connect
"directly" to ARPANET because at the time it was unclear what such a "direct"
connection would even look like. Instead, ARPANET participant computers were
connected via serial line to a dedicated computer called an interface message
processor, or IMP. IMPs are somewhat tricky to map directly to modern concepts,
but you could say that they were network interface controllers, modems, and
routers all in one. Each IMP performed the line coding to actually transmit
messages over leased telephone lines, but also conducted a simple distributed
routing algorithm to determine which leased telephone line a message should
be sent over. The routing functionality is hard to describe because it evolved
rapidly over the lifetime of ARPANET, but it was, by intention, both simple and
somewhat hidden. Any computer could send a message to any other computer, using
its numeric address, by transferring that message to an IMP. The IMPs performed
some internal work to route the message but this was of little interest to the
computers. The IMPs only performed enough work to route the message and ensure
reliable delivery; they did not do anything further and certainly nothing
related to application-level logic.
Later, ARPANET equipment contractor BBN, along with the greater ARPANET
project, would begin to openly specify "internal" protocols such as for routing
in order to allow the use of non-BBN IMPs. Consequently, much of the
functionality of the IMP would be moved into the host computers themselves,
while the routing functionality would be moved into dedicated routing
appliances. This remains more or less the general idea of the internet today.
Let's take these observations about ARPANET (and many, but not all, other early
computer networks) and apply them to the internet today.
Peer-to-Peer Connectivity
The basic assumption that any computer could connect to any other computer at
will proved problematic by, let's say, 1971, when an employee of BBN created
something we might now call a computer worm as a proof of concept. The problem
grew far greater as minicomputers and falling connectivity costs significantly
increased the number of hosts on the internet, and by the end of the '90s
network-based computer worms had become routine. Most software and the
protocols it used were simply not designed to operate in an adversarial
climate. But, with the internet now so readily accessible, that's what it
became. Computers frequently exposed functionality to the network that was
hazardous when made readily available to anonymous others, with Windows SMB
being a frequent concern [2].
The scourge of internet-replicating computer worms was stopped mostly not by
enhancements in host security but instead as an incidental effect of the
architecture of home internet service. Residential ISPs had operated on the
assumption that they provided connectivity to a single device, typically using
PPP over some type of telephone connection. As it became more common for home
networks to incorporate multiple devices (especially after the introduction of
WiFi) residential ISPs did not keep pace. While it was easier to get a subnet
allocation from a residential ISP back then than it is now, it was very rare,
and so pretty much all home networks employed NAT as a method to make the
entire home network look, to the internet, like a single device. This remains
the norm today.
NAT, as a side effect of its operation, prohibits all inbound connections not
associated with an existing outbound one.
NAT was not the first introduction of this concept. Already by the time
residential internet was converging on the "WiFi router" concept, firewalls
had become common in institutional networks. These early firewalls were
generally stateless and so relatively limited, and a common configuration
paradigm (to this day) was to block all traffic that did not match a
restricted set of patterns for expected use.
Somewhere along this series of incremental steps, a major change emerged... not
by intention so much as by the simple accretion of additional network controls.
The internet was no longer peer to peer.
Today, the assumption that a given internet host can connect to another
internet host is one where exceptions are more common than not. The majority
of "end user" hosts are behind NAT and thus cannot accept inbound connections.
Even most servers are behind some form of network policy that prevents them
accepting connections that do not match an externally defined list. All of this
has benefits, but there is also a very real downside, which
is that the internet has effectively degraded to a traditional client-server
architecture.
One of the issues that clearly illustrated this to many of my generation was
multiplayer video games. Most network multiplayer games of the '90s to the
early '00s were built on the assumption that one player would "host" a game and
other players would connect to them. Of course as WiFi routers and broadband
internet became common, this stopped working without extra manual configuration
of the NAT appliance. Similar problems were encountered by youths in
peer-to-peer systems like BitTorrent and, perhaps this will date me more than
anything else, eD2k.
But the issue was not limited to such frivolous applications. Many early
internet protocols were designed with the peer-to-peer architecture of the
internet baked into the design. FTP is a major example we encounter today,
which was originally designed under the assumption that the server could open a
connection to the client. RTP and its entire family of protocols, including
SIP, which remains quite relevant today, suffer from the same basic problem.
For any of these use-cases to work today, we have had to clumsily re-invent the
ability for arbitrary hosts to connect to each other. WebRTC, for example, is
based on RTP and addresses these problems for the web by relying on STUN and
TURN, two separate but related approaches to NAT traversal. Essentially
every two-way real-time media application must take a similar approach, which
is particularly unfortunate since TURN introduces appreciable overhead as well
as privacy and security implications.
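To make the shape of the workaround concrete, here is a minimal Python sketch
of the STUN half of the trick: ask a public STUN server what source address and
port your packets appear to come from, which is the first step of nearly every
NAT traversal scheme. The server name and port below are just one commonly
cited public endpoint, used here as an assumption for illustration.

```python
# Minimal STUN (RFC 5389) binding request: ask a public server what our
# traffic looks like from outside the NAT. Illustrative sketch only.
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442

def stun_public_endpoint(server=("stun.l.google.com", 19302), timeout=2.0):
    txn_id = os.urandom(12)
    # Binding Request: type 0x0001, zero-length attribute section, magic cookie.
    request = struct.pack("!HHI", 0x0001, 0, MAGIC_COOKIE) + txn_id

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    sock.sendto(request, server)
    data, _ = sock.recvfrom(2048)

    # Walk the response attributes looking for XOR-MAPPED-ADDRESS (0x0020).
    pos = 20
    while pos + 4 <= len(data):
        attr_type, attr_len = struct.unpack_from("!HH", data, pos)
        if attr_type == 0x0020:
            _, family, xport = struct.unpack_from("!BBH", data, pos + 4)
            xaddr = struct.unpack_from("!I", data, pos + 8)[0]
            port = xport ^ (MAGIC_COOKIE >> 16)
            addr = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
            return addr, port
        pos += 4 + attr_len + (-attr_len % 4)  # attributes pad to 4 bytes
    return None

if __name__ == "__main__":
    print(stun_public_endpoint())
```

TURN goes a step further: when no direct path can be punched, both sides relay
their traffic through the TURN server, which is where the overhead and privacy
concerns mentioned above come from.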
Complexity at the Edge
This is a far looser argument, but I think one that is nonetheless easy to make
confidently: the concept of complexity at the edge has largely been abandoned.
This happens more at the application layer than at lower layers, since the
lower layers ossified decades ago. But almost all software has converged on the
web platform, and the web platform is inherently client-server. Trends such as
SPAs have somewhat moderated the situation, since even in web
browsers some behavior can happen on the client side, but a look at the larger
ecosystem of commercial software will show you that there is approximately zero
interest in doing anything substantial on an end-user device. The modern
approach to software architecture is to place all state and business logic "in
the cloud."
Like the shift away from P2P, this has benefits but also has decided
disadvantages. Moreover, it has been normalized to the extent that traditional
desktop development methods that were amenable to significant complexity at the
client appear to be atrophying on major platforms.
As the web platform evolves we may regain some of the performance, flexibility,
and robustness associated with complexity at the edge, but I'm far from
optimistic.
Peer-to-Peer and Distributed Applications Today
My contention that the internet is not P2P might be surprising to many as there
is certainly a bevy of P2P applications and protocols. Indeed, one of the
wonders of software is that with sufficient effort it is possible to build a
real-time media application on top of a best-effort packet-switched network...
the collective ingenuity of the software industry is great at overcoming the
limitations of the underlying system.
And yet, the fundamentally client-server nature of the modern internet cannot
be fully overcome. P2P systems rely on finding some mechanism to create and
maintain open connections between participants.
Early P2P systems such as BitTorrent (and most other file sharing systems)
relied mostly on partial centralization and user effort. In the case of
BitTorrent, for example, trackers are centralized services which maintain a
database of available peers. BitTorrent should thus be viewed as a partially
centralized system, or perhaps better as a distributed system with centralized
metadata (this is an extremely common design in practice, and in fact the
entire internet could be described this way if you felt like it). Further,
BitTorrent assumes that the inbound connection problem will be somehow solved
by the user, e.g. by configuring appropriate port forwarding or using a local
network that supports automated mechanisms such as UPnP.
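As a toy illustration of that "centralized metadata, distributed data" shape, a
tracker is conceptually little more than a map from info hash to the peers that
have announced it. This sketch shows only the shape of the design, not the real
BitTorrent announce protocol (which runs over HTTP or UDP with bencoded
responses).

```python
# Toy model of a tracker: centralized peer metadata, while the data transfer
# itself happens peer to peer. Not the actual BitTorrent wire protocol.
from collections import defaultdict

class ToyTracker:
    def __init__(self):
        self.swarms = defaultdict(set)   # info_hash -> {(ip, port), ...}

    def announce(self, info_hash, ip, port, max_peers=50):
        peers = [p for p in self.swarms[info_hash] if p != (ip, port)]
        self.swarms[info_hash].add((ip, port))
        return peers[:max_peers]         # the new peer takes it from here

tracker = ToyTracker()
tracker.announce("abc123", "203.0.113.5", 6881)
print(tracker.announce("abc123", "198.51.100.9", 6881))  # [('203.0.113.5', 6881)]
```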
Many P2P systems in use today have some sort of centralized directory or
metadata service that is used for peer discovery, as well as configuration
requirements for the user. But more recent advances in distributed methods have
somewhat alleviated the need for this type of centralization. Methods such as
distributed hole punching (both parties initiating connections to each other at
the same time to result in appropriate conntrack entries at the NAT devices)
allow two arbitrary hosts to connect, but require that there be some type of
existing communications channel to facilitate that connection.
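Here is a minimal sketch of the UDP flavor of hole punching, assuming both
peers have already learned each other's public address and port over some side
channel; the addresses and ports are placeholders. Each side sends first, which
creates the outbound NAT mapping that the other side's packets can then ride in
on. This works with common "cone" NATs but not with symmetric NATs.

```python
# UDP hole punching sketch: both peers run this at roughly the same time,
# each pointed at the other's public endpoint (exchanged out of band).
import socket

def punch(local_port, peer_addr, attempts=10):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", local_port))
    sock.settimeout(1.0)
    for _ in range(attempts):
        sock.sendto(b"punch", peer_addr)   # opens/refreshes our NAT mapping
        try:
            data, addr = sock.recvfrom(1024)
            if addr[0] == peer_addr[0]:
                return sock                # two-way path established
        except socket.timeout:
            continue                       # peer's packet hasn't gotten through yet
    raise RuntimeError("hole punching failed (symmetric NAT, timing, etc.)")

# Usage: both peers call something like punch(5000, ("198.51.100.7", 5000)).
```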
Distributed hash tables such as the Kademlia DHT are a well understood method
for a distributed system to share peer information, and indeed BitTorrent has
had it bolted on as an enhancement while many newer P2P systems rely on a DHT
as their primary mechanism of peer discovery. But for all of these there is a
bootstrapping problem: once connected to other DHT peers you can use the DHT to
obtain additional peers, but this assumes that you are aware of at least one
DHT peer to begin with. How do we get to that point?
You could conceivably search the entire internet space, but given the size of
the internet that's infeasible. The next approach you might reach for is some
kind of broadcast or multicast, but for long-standing abuse, security, and
scalability reasons broadcast and multicast cannot be routed across the
internet. Anycast offers similar potential and is feasible on the internet, but
it requires the cooperation of an AS owner, which would both be a point of
centralization and require a larger up-front investment than most P2P projects
are interested in.
Instead, real P2P systems address this issue by relying on a centralized
service for initial peer discovery. There are various terms for this that are
inconsistent between P2P protocols and include introducer, seed, bootstrap
node, peer helper, etc. For consistency, I will use the term introducer,
because I think it effectively describes the concept: an introducer is a
centralized (or at least semi-centralized) service that introduces new peers
to enough other peers that they can proceed from that point via distributed
methods.
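Once the introducer has handed over even a single live peer, the distributed
part can take over. The sketch below is a conceptual Kademlia-style bootstrap
walk; find_node() stands in for whatever FIND_NODE RPC the particular protocol
uses and is a placeholder, not a real API. Peers are modeled as (node_id,
address) tuples.

```python
# Conceptual sketch: starting from one introduced peer, repeatedly ask the
# closest known peers for nodes near our own ID, Kademlia style.
def bootstrap(my_id, introduced_peer, find_node, k=20):
    contacts = {introduced_peer}          # peers as (node_id, addr) tuples
    queried = set()
    while True:
        # The k closest (by XOR distance) contacts we have not asked yet.
        frontier = sorted(contacts - queried, key=lambda p: p[0] ^ my_id)[:k]
        if not frontier:
            break                         # nothing closer left to learn about
        for peer in frontier:
            queried.add(peer)
            contacts.update(find_node(peer, my_id))
    return sorted(contacts, key=lambda p: p[0] ^ my_id)[:k]
```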
As a useful case study, let's examine Bitcoin... and we have come full circle
back to the introduction, finally. Bitcoin prefers the term "seeding" to refer
to this initial introduction process. Dating back to the initial Nakamoto
Bitcoin codebase, the Bitcoin introduction mechanism was via IRC. New Bitcoin
nodes connected to a hardcoded IRC server and joined a hardcoded channel, where
they announced their presence and listened for other announcements. Because IRC
is a very simple and widely implemented message bus, this use of IRC was once
very common. Ironically it was its popularity that led to its downfall:
IRC-based C2 was extremely common in early botnets, at such a level that it has
become common for corporate networks to block or at least alert on IRC
connections, as the majority of IRC connections out of many corporate networks
are actually malware checking for instructions.
As a result, and for better scalability, the Bitcoin project changed the
introduction mechanism to DNS, which is more common for modern P2P protocols.
The Bitcoin Core codebase includes a hardcoded list of around a dozen DNS
names. When a node starts up, it queries a few of these
names and receives a list of A records that represent known-good,
known-accessible Bitcoin nodes. The method by which these lists are curated is
up to the operators of the DNS seeds, and it seems that some are automated
while some are hand-curated. The details don't really matter that much, as long
as it's a list that contains a few contactable peers so that peer discovery can
continue from there using the actual Bitcoin protocol.
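The DNS seed step is simple enough to show directly. The sketch below resolves
one seed name and treats the returned A records as candidate peers on the
standard Bitcoin port; seed.bitcoin.sipa.be is one of the names that has
appeared in Bitcoin Core's hardcoded list, but the list in the source is the
authoritative reference, not this example.

```python
# Resolve a Bitcoin DNS seed and treat the A records as candidate peers.
import socket

def peers_from_dns_seed(seed="seed.bitcoin.sipa.be", port=8333):
    results = socket.getaddrinfo(seed, port, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({sockaddr[0] for *_, sockaddr in results})

print(peers_from_dns_seed())  # a handful of IPs to try the P2P handshake with
```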
Other projects use similar methods. One of the more interesting and
sophisticated distributed protocols right now is Hypercore, which is the basis
of the Beaker Browser, in my opinion the most promising distributed web project
around... at least in that it presents a vision of a
distributed web that is so far not conjoined at the hip with Ethereum-driven
hypercapitalism. Let's take a look at how Hypercore and its underlying P2P
communications protocol Hyperswarm address the problem.
Well, it's basically the exact same way as Bitcoin with one less step of
indirection. When new Hypercore nodes start up, they connect to
bootstrap1.hyperdht.org through bootstrap3.hyperdht.org, each of which
represents one well-established Hypercore node that can be used to get a
toehold into the broader DHT.
This pattern is quite general. The majority of modern P2P systems bootstrap by
using DNS to look up either a centrally maintained node, or a centrally
maintained list of nodes. Depending on the project these introducer DNS entries
may be fully centralized and run by the project, or may be "lightly
decentralized" in that there is a list of several operated by independent
people (as in the case of Bitcoin). While this is slightly less centralized it
is only slightly so, and does not constitute any kind of real distributed
system.
Part of the motivation for Bitcoin to have multiple independently operated DNS
seeds is that they are somewhat integrity sensitive. Normally the Bitcoin
network cannot enter a "split-brain" state (i.e. two independent and equally
valid blockchains) because there are a large number of nodes which are strongly
interconnected, preventing any substantial number of Bitcoin nodes from being
unaware of blocks that other nodes are aware of. In actuality Bitcoin enters a
"split-brain" state on a regular basis (it's guaranteed to happen by the
stochastic proof of work mechanism), but as long as nodes are aware of all
"valid" blockchain heads they have an agreed upon convention to select a single
head as valid. This method can sometimes take multiple rounds to converge,
which is why Bitcoin transactions (and broadly speaking other blockchain
entries) are not considered valid until multiple "confirmations"---this simply
provides an artificial delay to minimize the probability of a transaction
being taken as valid when the Bitcoin blockchain selection algorithm has not
yet converged across the network.
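To make the convergence rule concrete, here is a toy illustration (not
Bitcoin's actual implementation) of the convention: among competing chain tips,
follow the one with the most cumulative proof of work. Real nodes compute work
from each block's difficulty target; here a tip is just a (cumulative_work,
tip_hash) pair, and the hash comparison is only a deterministic tie-break for
the toy, not Bitcoin's real rule for equal-work tips (which keeps the tip seen
first).

```python
# Toy fork-choice rule: the tip with the most cumulative work wins.
def select_head(tips):
    return max(tips, key=lambda tip: (tip[0], tip[1]))

tips = [(105_320, "00000000a1..."), (105_321, "00000000b7...")]
print(select_head(tips))  # the second tip, which has more cumulative work
```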
But this is only true of nodes which are already participating. When a new
Bitcoin node starts for the first time, it has no way to discover any other
nodes besides the DNS seeds. In theory, if the DNS seeds were malicious, they
could provide a list of nodes which were complicit in an attack by
intentionally not forwarding any information about some blocks or
advertisements of nodes which are aware of those blocks. In other words, the
cost of a Sybil attack is effectively reduced to the number of nodes directly
advertised by the DNS seeds, but only for new users and only if the DNS seeds
are complicit. The former is a massive limitation; to mitigate the latter, the
Bitcoin project allows only trusted individuals, but multiple such individuals,
to operate DNS seeds. In practice the risk is quite low, mostly due to the
limited impact of the attack rather than its difficulty level (very few people
are confirming Bitcoin transactions using a node which was just recently
started for the first time).
Multicast
One of the painful points here is that multicast and IGMP make this problem
relatively easy on local networks, and indeed mDNS/Avahi/Bonjour/etc solve
this problem on a daily basis, in a reasonably elegant and reliable way,
to enable things like automatic discovery of printers. Unfortunately we cannot
use these techniques across the internet because, among other reasons, IGMP
does not manageably scale to internet levels.
P2P systems can use them across local networks, though, and there are P2P
systems (and even non-P2P systems) which use multicast methods to opportunistically
discover peers on the same local network. When this works, it can potentially
eliminate the need for any centralized introducer. It's, well, not that likely
to work... that would require at least one, preferably more than one, fully
established peer on the same local network. Still, it's worth a shot, and
Hypercore for example does implement opportunistic peer discovery via mDNS
zeroconf.
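For a sense of what that looks like in practice, here is a sketch of LAN-local
peer discovery over mDNS, assuming the third-party python-zeroconf package
(whose API has shifted a bit across versions, so treat this as illustrative).
The service type, name, address, and port are all made up for the example; a
real system would advertise its own service type and stash whatever bootstrap
details it needs in the TXT properties.

```python
# Opportunistic LAN peer discovery via mDNS, sketched with python-zeroconf.
# Service type, hostname, address, and port below are placeholders.
import socket
from zeroconf import ServiceBrowser, ServiceInfo, Zeroconf

SERVICE_TYPE = "_examplep2p._udp.local."

class PeerListener:
    def add_service(self, zc, type_, name):
        info = zc.get_service_info(type_, name)
        if info:
            print("found LAN peer:", name, info.parsed_addresses(), info.port)

    def update_service(self, zc, type_, name):
        pass  # required by recent python-zeroconf versions

    def remove_service(self, zc, type_, name):
        print("LAN peer gone:", name)

zc = Zeroconf()
# Announce ourselves so other local nodes can skip the centralized introducer...
info = ServiceInfo(SERVICE_TYPE, "mynode." + SERVICE_TYPE,
                   addresses=[socket.inet_aton("192.168.1.50")], port=49737)
zc.register_service(info)
# ...and watch for anyone else announcing the same service type.
browser = ServiceBrowser(zc, SERVICE_TYPE, PeerListener())
```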
Multicast presents an interesting possibility: much like the PGP key parties of
yore, a P2P system can be bootstrapped without dependence on any central
service if its users join the same local network at some point. For
sufficiently widely-used P2P systems, going to a coffee shop with a stable,
working node once in order to collect initial peer information will likely be
sufficient to remain a member of the system into the future (as long as there
are enough long-running peers with stable addresses that you can still find
some and use them to discover new peers weeks and months into the future).
Of course by that point we could just as well say that an alternative method to
bootstrapping is to call your friends on the phone and ask them for lists of
good IP addresses. Still, I like my idea for its cypherpunk aesthetics, and when
I inevitably leave my career to open a dive bar I'll be sure to integrate it.
Hope for the future
We have seen that all P2P systems that operate over the internet have,
somewhere deep down inside, a little bit of centralized original sin. It's not
a consequence of the architecture of the internet so much as it's a consequence
of the fifty years of halting change that has brought the internet to its
contemporary shape... changes that were focused around the client-server use
cases that drive commercial computing for various reasons, and so had the
network shaped in their image to the point of excluding true P2P approaches.
Being who I am it is extremely tempting to blame the whole affair on
capitalism, but of course that's not quite fair. There are other reasons as
well, namely that the security, abuse, stability, scalability, etc. properties
of truly distributed systems are universally more complex than those of
centralized ones. Supporting fully distributed internet use cases is more
difficult, so it's received lower priority. The proliferation of relatively new
P2P/distributed
systems around today shows that there is some motivation to change this state
of affairs, but that's mostly been by working around internet limitations
rather than fixing them.
Fixing those limitations is difficult and expensive, and despite the number of
P2P systems the paucity of people who actually use them in a P2P fashion would
seem to suggest that the level of effort and cost is not justifiable to the
internet industry. The story of P2P networking ends where so many stories about
computing end: we got here mostly by accident, but it's hard to change now so
we're going to stick with it.
I've got a list of good IP addresses for you though if you need it.
[1] WU's digital automatic routing network was developed in partnership with
RCA and made significant use of microwave links as it expanded. Many have
discussed the way that the US landscape is littered with AT&T microwave relay
stations, fewer know that many of the '60s-'80s era microwave relays not built
by AT&T actually belonged to Western Union for what we would now call data
networking. The WU network was nowhere near as extensive as AT&T's but was
particularly interesting due to the wide variety of use-cases it served, which
ranged from competitive long distance phone circuits to a very modern looking
digital computer interconnect service.
[2] We should not get the impression that any of these problems are in any way
specific to Windows. Many of the earliest computer worms targeted UNIX systems
which were, at the time, more common. UNIX systems were in some ways more
vulnerable due to their relatively larger inventory of network services
available, basically all of which were designed with no thought towards
security. Malware developers tended to follow the market.
It's the first of the new year, which means we ought to do something momentous
to mark the occasion, like a short piece about telephones. Why so much on
telephones lately? I think I'm just a little burned out on software at the
moment and I need a vacation before I'm excited to write about failed Microsoft
ventures again, but the time will surely come. Actually I just thought of a
good one I haven't mentioned before, so maybe that'll be next time.
Anyway, let's talk a little bit about phones, but not quite about long distance
carriers this time. Something you may or may not have noticed about the
carriers we've discussed, perhaps depending on how interesting you find data
communications, is that we have covered only the physical layer. So far, there
has been no consideration of how switches communicated in order to set up and
tear down connections across multiple switches (i.e. long distance calls).
Don't worry, we will definitely get to this topic eventually and there's plenty
to be said about it. For the moment, though, I want to take a look at just one
little corner of the topic, and that's multifrequency tone systems.
Most of us are at least peripherally familiar with the term "dual-tone
multifrequency" or "DTMF." AT&T intended to promote Touch-Tone as the consumer
friendly name for this technology, but for various reasons (mainly AT&T's
trademark) most independent manufacturers and service providers have stuck to
the term DTMF. DTMF is the most easily recognizable signaling method in the
telephone system: it is used to communicate digital data over phone lines, but
generally only for "meta" purposes such as connection setup (i.e. dialed
digits). An interesting thing about DTMF that makes it rather recognizable is
that it is in-band, meaning that the signals are sent over the same audio
link as the phone call itself... and if your telephone does not mute during
DTMF (some do but most do not), you can just hear those tones.
Or, really, I should say: if your phone just makes the beep boop noises for fun
pretend purposes, like cellphones, which often emit DTMF tones during dialing
even though the cellular network uses entirely on-hook dialing and DTMF is not
actually used as part of call setup. But that's a topic for another day.
DTMF is not the first multi-frequency signaling scheme. It is directly based on
an earlier system called, confusingly, multifrequency or MF. While DTMF and MF
have very similar names, they are not compatible, and were designed for
separate purposes.
MF signaling was designed for call setup between switches, mostly for
long-distance calling. Whenever a call requires a tandem switch, so say you
call another city, your telephone switch needs to connect you to a trunk on a
tandem switch but also inform the tandem switch of where you intend to call.
Historically this was achieved by operators just talking to each other over the
trunk before connecting it to your local loop, but in the era of direct dialing
an automated method was needed. Several different techniques were developed,
but MF was the most common for long-distance calling in the early direct dial
era.
An interesting thing about MF, though, is that it was put into place in a time
period in which some cities had direct long distance dialing but others did
not. As a result, someone might be talking to an operator in order to set up a
call to a city with direct dial. This problem actually wasn't a new one, the
very earliest direct dialing implementations routinely ran into this issue, and
so it became common for operators' switchboards to include a telephone dial
mounted at each operator position. The telephone dial allowed the operator to
dial for a customer, and was especially important when connecting someone into
a direct dial service area.
MF took the same approach, and so one could say that there were two distinct
modes for MF: in machine-to-machine operation, a telephone switch automatically
sent a series of MF tones after opening a trunk, mainly to forward the dialed
number to the next switch in the series. At the same time, many operators had
MF keypads at their positions that allowed them to "dial" to a remote switch
by hand. The circuitry that implemented these keypads turned more or less
directly into the DTMF keypads we see on phones today.
Like DTMF, MF worked by sending a pair of frequencies [1]. The frequencies
were selected from a pool of 700, 900, 1100, 1300, 1500, and 1700 Hz. That's
six frequencies, and exactly two frequencies are always used, so the number of
possible symbols is 6 choose 2, or 15. Of course we have the ten digits, 0-9,
but what about the other five? The additional five possibilities were used for
control symbols. For reasons that are obscure to me, the names selected for
the control symbols were Key Pulse or KP and Start or ST. Confusingly, KP
and ST each had multiple versions and were labeled differently by different
equipment. The closest thing to a universal rule would be to say that MF could
express the symbols 0-9, KP1-KP2, and ST1-ST3.
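The arithmetic is easy to check: choosing an unordered pair from six
frequencies gives fifteen possible symbols, which is exactly the ten digits
plus the five KP/ST control signals.

```python
# C(6, 2) = 15: ten digits plus five KP/ST control symbols.
from itertools import combinations

MF_FREQUENCIES = [700, 900, 1100, 1300, 1500, 1700]  # Hz
print(len(list(combinations(MF_FREQUENCIES, 2))))    # 15
```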
Part of the reason that the labeling of the symbols was inconsistent is that
their usage was somewhat inconsistent from switch to switch. Generally
speaking, an operator would connect to a trunk and then press KP1, the number
to be called, and then ST1. KP1 indicated to the far-side switch that it should
set up for an incoming connection (e.g. by assigning a sending unit or other
actions depending on the type of switch), while ST1 indicated that dialing was
complete. Most of the time telephone switches used other means (digit-matching
based on "dial plans") to determine when dialing was complete, but since tandem
switches handled international calls MF was designed to gracefully handle
arbitrary length phone numbers (due to both variance between countries and the
bizarre choice of some countries to use variable-length phone numbers).
The additional KP and ST symbols had different applications but were most often
used to send "additional information" to the far side switch, in which case the
use of one of the control symbols differentiated the extra digits (e.g. an
account number) from the phone number.
MF keypads were conventionally three columns, two columns of digits (vertically
arranged) and one of control symbols on the right.
This is a good time to interject a quick note: the history of MF signaling turns
out to be surprisingly obscure. I had been generally aware of it for years, I'm
not sure why, but when I went to read the details I was surprised by... how few
details there are. Different sources online conflict about basic facts (for
example, Wikipedia lists 6 frequencies which is consistent with the keypad I
have seen and the set of symbols, but a 1960 BSTJ overview article says there
were only five...). So far as I can tell, MF was never formally described in
BSTJ or any other technical journal, and I can't find any BSPs describing the
components. I suspect that MF was an unusually loose standard for the telephone
system, and that the MF implementation on different switches sometimes varied
significantly. This is not entirely surprising since the use of MF spanned from
manual exchanges to modern digital exchanges (it is said to still be in use in
some areas today, although I am not aware of any examples), covering around 80
years of telephone history.
I didn't really intend to go into so much detail on MF here, but it's useful to
understand my main topic: DTMF. MF signaling went into use by the late 1940s
(date unclear for the reasons I just discussed), and by 1960 was considered a
main contender for AT&T's goal of introducing digital signaling not just
between switches but also from the subscriber to the switch [2]. A few years
later, AT&T introduced Touch-Tone or DTMF dialing. Unsurprisingly, DTMF is
really just MF with some problems solved.
MF posed a few challenges for use with subscriber equipment. The biggest was
the simple placement of the frequencies. The consistent 200 Hz separation
meant that certain tones were subject to harmonics and other intermodulation
products from other tones, requiring high signal quality for reliable decoding.
That wasn't much of a problem on toll circuits which were already maintained to
a high standard, but local loops were routinely expected to work despite very
poor quality, and there was a huge variety of different equipment in use on
local loops, some of which was very old and easily picked up spurious noise.
Worse, the MF frequencies were placed in a range that was fairly prominent in
human speech. This resulted in a high risk that a person talking would be
recognized by an MF decoder as a symbol, which could create all kinds of
headaches. This wasn't really a problem for MF because MF keypads were designed
to disconnect the subscriber when digits were pressed. DTMF, though, was
intended to be simpler to implement and convenient to use while in a call,
which made it challenging to figure out how to disconnect or "mute" both
parties during DTMF signaling.
To address these issues, a whole new frequency plan was devised for DTMF. The
numbers and combinations all seem a bit odd, but were chosen to avoid any kind
of potential intermodulation artifacts that would be within the sensitivity
range of the decoder. DTMF consisted of eight frequencies, organized
differently: a four by four grid. A grid layout was used, in which there is
one set of "low" frequencies and one set of "high" frequencies, and each
symbol pairs a low with a high (low is never mixed with low, nor high with
high), because it allowed much tighter planning of the harmonics that would
result from mixing the frequencies.
So, we can describe DTMF this way: there are four rows and four columns. The
four rows are assigned 697, 770, 852, and 941 Hz, while the four columns are
1209, 1336, 1477, and 1633 Hz. Each digit consists of one row frequency and
one column frequency, and they're laid out the same way as the keypad.
Wait a minute... four rows, four columns?
DTMF obviously needed to include the digits 0-9. Some effort was put into
selecting the other available symbols, and for various reasons * and # were
chosen as complements to the digits (likely owing to their common use in
typewritten accounting and other business documents at the time). That makes
up 12 symbols, the first three columns. The fourth column, intended mostly for
machine-to-machine applications [3], was labeled A, B, C, and D.
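The whole grid is small enough to write out. Here is a sketch, using numpy only
to show that a keypress is literally the sum of its row tone and its column
tone:

```python
# The DTMF grid: each key is one row frequency plus one column frequency.
import numpy as np

ROWS = [697, 770, 852, 941]          # Hz
COLS = [1209, 1336, 1477, 1633]      # Hz
KEYS = ["123A", "456B", "789C", "*0#D"]

FREQS = {key: (ROWS[r], COLS[c])
         for r, row in enumerate(KEYS)
         for c, key in enumerate(row)}

def dtmf_tone(key, duration=0.1, rate=8000):
    low, high = FREQS[key]
    t = np.arange(int(duration * rate)) / rate
    return np.sin(2 * np.pi * low * t) + np.sin(2 * np.pi * high * t)

print(FREQS["5"])   # (770, 1336)
print(FREQS["D"])   # (941, 1633)
```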
Ever since, DTMF has featured the mysterious symbols A-D, and they have seen
virtually no use. It is fairly clear to me that they were only included
originally because DTMF was based directly on MF and so tried to preserve the
larger set of control symbols and in general a similar symbol count. The
engineers likely envisioned DTMF taking over as a direct replacement for MF
signaling in switch-to-switch signaling, which did happen occasionally but
was not widespread as newer signaling methods were starting to dominate by
the time DTMF was the norm. Instead, they're essentially vestigial.
One group of people who would be generally aware of the existence of A-D is
amateur radio operators, as the DTMF encoders in radios almost always provide a
full 4x4 keypad and it is somewhat common for A-D to be used for controlling
telephone patches---once the telephone patch is connected, 0-9, *, and # will
be relayed directly to the phone network, while A-D provide four symbols
reserved for the patch itself to respond to.
Another group of people to whom this would be familiar is those in the
military from roughly the '70s to the '90s, during the period of widespread
use of AUTOVON. While AUTOVON was mostly the same as the normal telephone
network but reserved for military use, it introduced one major feature that
the public telephone system lacked: a precedence, or priority system.
Normally dialed AUTOVON calls were placed at "routine" priority, but
"priority," "immediate," "flash," and "flash override" were successively higher
precedence levels reserved for successively more important levels of military
command and control. While it is not exactly true, it is almost true, and
certainly very fun to say, that AUTOVON telephones feature a button that only
the President of the United States is allowed to press. The Flash Override or
FO button was mostly reserved for use by the national command authority in
order to invoke a nuclear attack, and as you would imagine would result in
AUTOVON switches abruptly terminating any other call as necessary to make
trunks available.
AUTOVON needed some way for telephones to indicate to the switch what the
priority of the call was, and so it was obvious to relabel the A, B, C, and D
DTMF buttons as FO, F, I, and P respectively. AUTOVON phones thus feature a
full 4x4 keypad, with the rightmost column typically in red and used to prefix
dialed calls with a precedence level. Every once in a while I have thought
about buying one of these phones to use with my home PABX but they tend to be
remarkably expensive... I think maybe restorers of military equipment are
holding up prices.
And that's what I wanted to tell you: the military has four extra telephone
buttons that they don't tell us about. Kinda makes you take the X-files a
little more seriously, huh?
In all seriousness, though, they both do and don't today. Newer military
telephone systems such as DSN and the various secure VoIP systems usually
preserve a precedence feature but offer it using less interesting methods.
Sometimes it's by prefixing dialing with a numeric code, sometimes via feature
line keys, but not by secret DTMF symbols.
[1] This was technically referred to as a "spurt" of MF, a term which I am
refusing to use because of my delicate sensibilities.
[2] One could argue that pulse dialing was "digital," but because it relied on
the telephone interrupting the loop current it was not really "in-band" in the
modern sense and so could not readily be relayed across trunks. Much of the
desire for a digital subscriber signaling system was for automated phone
systems, which could never receive pulses since they were "confined" to the
local loop. Nonetheless DTMF was also useful for the telephone system itself
and enabled much more flexible network architectures, especially related to
remote switches and line concentrators, since DTMF dialing could be decoded
by equipment "upstream" from wherever the phone line terminated without any
extra signaling equipment needed to "forward" the pulses.
[3] This might be a little odd from the modern perspective but by the '60s
machine-to-machine telephony using very simple encodings was becoming very
popular... at least in the eyes of the telephone company, although not always
the public. AT&T was very supportive of the concept of telephones which read
punched cards and emitted the card contents as DTMF. In practice this ended up
being mostly used as a whimsical speed-dial, but it was widely advertised for
uses like semi-automated delivery of mail orders (keypunch them in the field,
say going door to door, and then call an electromechanical order taking system
and feed them all through your phone) and did see those types of use for some
time.
I have another post about half-written that I will finish up and publish soon,
but in the mean time I have been thinking today about something that
perennially comes up in certain orange-tinted online communities: running your
own mail server.
I have some experience that might allow me to offer a somewhat nuanced opinion
on the matter. Some years ago I was a primary administrator of a mailserver for
a small university (~3k users), and today I operate two small mailservers, one
for automated use and one that has a small number of active human users. On the
other hand, while I operated a mailserver for my own personal email for years I
have now been using Fastmail since around 2015 and have had no regrets.
My requirements are perhaps a little unusual, and that no doubt colors my
opinion on options for email: I virtually never use webmail, instead relying on
SMTP/IMAP clients (mostly Thunderbird on desktop and Bluemail on Android,
although I would be hesitant to endorse either very strongly due to
long-running stability and usability problems). A strong filtering capability
is important to me, but I am relatively lax about spam filtering as I normally
review my junk folder manually anyway. I am very intolerant of any
deliverability problems as they have caused things like missed job
opportunities in the past.
The software stack that I normally use for email is a rather pedestrian one
that forms the core of many institutional and commercial email services:
postfix and dovecot, working off of a mix of Unix users and virtual ones.
Typically I have automated management through tools that output postfix map
text files rather than by moving postfix maps to e.g. an SQL server, although I
have worked with one or two mailservers configured that way. I have gone from
Squirrelmail to Roundcube to Rainloop for webmail, although as mentioned I do
not really use webmail personally.
My preference is to use Dovecot via LMTP and move as much functionality as
possible (auth, filtering, etc) into Dovecot rather than Postfix. I have always
used SpamAssassin and fail2ban to control log noise and brute force attempts.
All of this said, one of the great frustrations with email servers, especially to
the novice, is that even for popular combinations like Postfix/Dovecot there
are multiple ways to architect the mail delivery, storage, and management
process. For example, there are at least 4-5 distinct architectural options for
configuring Postfix to make mail available to Dovecot. Different distributions
may package these services pre-configured for one approach or the other, or
with nothing pre-configured at all. In fact, mail servers are an area where
your choice of distribution can matter a great deal: under some Linux
distributions, like RHEL, simply installing a few packages will result in a
mostly working mailserver configured for the most common architecture. Under
other distributions the packages will leave you with an empty or nonexistent
configuration and you will have a lot of reading to do before you get to the
most basic working configuration.
The need to support some of the newer anti-spam/anti-abuse technologies
introduces some further complication, as you'll need to figure out a DKIM
signing service and get it inserted into the mail flow at the right point.
Because of the historic underlying architecture of most MTAs/MDAs, this can
actually be surprisingly confusing as it's often difficult to "hook" into
mail processing at a point where you can clearly differentiate email that is
logically "inbound" and "outbound" [1].
Finally, as a general rule mail-related software tends to be "over-architected"
(e.g. follows "the Unix philosophy" in all the worst ways and few of the good
ones) and fairly old [2]. This makes a basic working body of configuration
surprisingly large and complex, and the learning curve can be formidable. It
took me years to feel generally conversant in the actual care and feeding of
Postfix, for example, which like many of these older network services has a lot
of fun details like its own service management and worker pooling system.
All of this goes to explain that configuring a mailserver has one of the
steeper learning curves of common network services. Fortunately, a number of
projects have appeared which aim to "auto-configure" mailservers for typical
use-cases. Mail-in-a-box, iRedMail, etc. promise ten minutes to a fully working
mailserver.
These projects, along with a general desire by many in the tech industry to
reduce their reliance on ad-supported services of major tech companies, have
resulted in a lot of ongoing discussion about the merits of running your own
mail. Almost inevitably these threads turn into surprisingly argumentative
back-and-forths about the merits of self-hosted mail, the maintenance load,
deliverability, and so on.
Years ago, before I had any sort of coherent place to put my writing, I wrote
an essay about email motivated by Ray Tomlinson's death: Obituary, for Ray
Tomlinson and Email. I will not merely
repeat the essay here, in large part because it's mostly philosophical in
nature and I intend to stay very practical in this particular message. The gist
of it, though, is that email as we now know it was one of the first real
federated systems and is also, in my opinion, one of the last. The tremendous
success of the SMTP ecosystem has also revealed the fundamental shortcomings
of federated/distributed systems, in that the loosely federated nature of
email leads to daily real-world frustrations that show close to zero sign of
ever improving.
There are practical implications to these more theoretical problems, and
they're the same ones that repeatedly tank efforts at decentralized
communications and social media. In the case of email, they are particularly
severe, as the problems emerged after email became widely used. Instead of
killing the concept or causing a redesign to eliminate the defects of the
federated design, in the case of email we just have workarounds and
mitigations. These are where most of the real complexity of email lies.
Spam
The most obvious, and usually largest, problem that any decentralized
communications system will run into is spam. There are various theoretical
approaches to mitigating this issue, such as proof of work, but in practice
real-world communications products pretty consistently mitigate spam by
requiring a proof of identity that is difficult to produce en masse.
The most common by far is a telephone number. Complaints about Telegram and
Signal requiring that accounts be associated with a phone number are widespread
(I am one of the people complaining!), but they often miss that this simple (if
irritating to some) step is incredibly effective in reducing spam. This tends
to turn into a real "I found the exception and therefore you are wrong" kind of
conversation, so let me acknowledge that there are plenty of ways to come up
with phone numbers that will pass the typical checks used by these services.
But that doesn't in any way invalidate the concept: all these methods of
obtaining phone numbers are relatively expensive and volume limited, so they
don't undermine the basic goal of using SMS validation of a phone number to
require a high effort level to register multiple accounts. The very low volume
of outright spam on both Telegram and Signal is an indication of the success of
this basic strategy.
Of course requiring a validated telephone number as part of identity is a
substantial compromise on privacy and effectively eliminates identity
compartmentalization (the mind boggles at the popularity of Telegram with
furries in consideration of this issue, as compared to common furry use
patterns on services like Twitter that do facilitate compartmentalization).
But there's a more significant problem: it is predicated on centralization.
Sure, it's theoretically possible to implement this in a distributed fashion,
but there's a few reasons that no one is going to. For properly federated
services it's a non-starter, as unless you significantly compromise on the
basic idea of federation you are reliant on all members of the federation to
perform their own validation of users against a scarce proof of identity...
but the federation members themselves are frequently crooked.
In other words, in some ways this approach has been applied to email, as
popular free email hosts like Google and Microsoft are increasingly pushing
telephone validation as a requirement on accounts. But that only protects
their own abuse handling resources. Email being federated means that you need
to accept mail from other servers, and you don't know what their validation
policy is.
This is a fundamental problem. Federated systems impose significant limits on
any kind of identity or intent validation for users, and so spam mitigation
needs to be done at the level of nodes or instances instead. This tends to
require an ad-hoc "re-centralization" in the form of community consensus
policy, blocklists of instances, etc. Modern federated systems still handle
this issue fairly poorly, but email, due to its age, lacks even the beginning
of a coordinated solution.
Instead, more reactive measures have had to be taken to protect the usability
of email, and those are the elephant in the room in all discussions of
self-hosted email. They have significant implications for self-hosted email
operators.
Most spam filtering solutions rely on some degree of machine learning or
dynamic tuning. Smaller email operators have an inherently harder time
performing effective spam blocking because of the smaller set of email
available for ongoing training. In practice SpamAssassin mostly performs okay
without significant additional training, so the issue exists but is rarely
severe.
Because mail servers come and go, and malicious/spam email often comes from new
mail servers, major email operators depend heavily on IP reputation and tend to
automatically distrust any new mail server. This leads to a few problems.
First, cheap or easy-to-access hosting services (from AWS to Uncle Ed's
Discount VPS) almost always have ongoing problems with fly-by-night customers
using their resources to send spam, which means that they almost always have
chunks of their IP space on various blocklists. This is true from the
sketchiest to the most reputable, although the problem tends to be less severe
as you get towards the Oracle(R) Enterprise Cloud(TM) version of the spectrum.
These issues can make DIY email a non-starter, as if you rely on a commodity
provider there's a fair chance you'll just get a "bad IP" and have a bit of an
ongoing struggle to get other providers to trust anything you send. That said,
it's also very possible to get recycled IPs that have no issues at all... it
tends to be a gamble. Less commodity, more bespoke service providers can
usually offer some better options here. In the best case, you may be able to
obtain IP space that hasn't been used in a long time and so is very unlikely to
be on any blocklists or reputation lists. This is ideal but doesn't happen so
often in this era of IPv4 exhaustion. As the next best thing, many providers
that have a higher-touch sales process (i.e. not a "cloud" provider) maintain a
sense of "good quality" IPs that have a history of use only by trusted clients.
If you spend enough money with them you can probably get some of these.
On the other hand, most cheap VPS and cloud providers are getting their IP
space at auction, which has a substantial risk of resulting in IP space with a
history of use by a large-scale organized email spam operation. If you spend
much time looking at websites like LowEndBox you'll see this happening a lot.
Even if you get an IP with no reputational problems, you will still run into
the worst part of this IP reputation issue: IPs with no history are themselves
suspicious. Most providers have logic in place that is substantially more
likely to reject or flag as spam any email coming from an IP address without a
history of originating reliable email. Large-scale email operations contend
with this by "warming up" IPs, using them to send progressively more traffic
over time in order to build up positive reputation. As an individual with a
single IP you are not going to be able to do this in such a pre-planned way, but
it does mean that things will get better over time.
A frustrating element of email deliverability is the inconsistency in the way
that email providers handle it. It used to be that it was often possible to get
feedback from email providers on your deliverability, but that information was
of course extremely useful to spammers, so major providers have mostly stopped
giving it out. Instead, email providers typically reject some portion of mail
they don't like entirely, giving an SMTP error that almost universally gives a
link to a support page or knowledgebase article that is not helpful. While
these SMTP rejections are frustrating, the good news is that you actually know
that delivery failed... although in some cases it will succeed on retry. The
mail servers I run have been around long enough that outright SMTP rejections
are unusual, but I still consistently get a seemingly random sample of emails
hard rejected by Apple Mail.
What's a little more concerning is, of course, a provider's decision of whether
or not to put a message into the junk folder. In a way this is worse than an
outright rejection, because the recipient will probably never see the message
but you don't know that. Unfortunately there aren't a lot of ways to get
metrics on this.
If you self-host email, you will run into an elevated number of delivery
problems. That is a guarantee. Fully implementing trust and authentication
measures will help, but it will not eliminate the problem because providers
weight their IP reputation information more than your ability to configure DKIM
correctly. Whether or not it becomes a noticeable problem for you depends on a
few factors, and it's hard to say in advance without just trying it.
Federated systems like email tend to rely on a high degree of informal social
infrastructure. Unfortunately, as email has become centralized into a small
number of major providers, that infrastructure has mostly decayed. It was not
that long ago that you could often resolve a deliverability problem with a
politely worded note to postmaster @ the problematic destination server.
Today, many email providers have some kind of method of contacting them, but I
have never once received a response or even evidence of action due to one of
these messages... both for complaints of abuse on their end and deliverability
problems [3].
Ongoing Maintenance
While it is fully possible to set up a mailserver and leave it for years
without much of any intervention beyond automatic updates, I wouldn't recommend
it. Whether you have one user or a thousand, mail service tends to benefit
appreciably from active attention. Manual tuning of spam detection parameters
in response to current spam trends can have a huge positive impact on your spam
filtering quality. I also manually maintain allow and blocklists of domains on
mailservers I run, which can also greatly improve spam results.
More importantly, though, because of the extremely high level of ambient email
abuse, mailservers are uniquely subject to attack. Any mailserver will receive
a big, ongoing flow of probes that range from simple open relay checks to
apparently Metasploit-driven checks for recently published vulnerabilities. A
mailserver which is vulnerable to compromise will start sending out
solicitations related to discount pharmaceuticals almost instantly. While I am
hesitant to discourage anyone trying to grow their own vegetables, I also feel
that it's a bit irresponsible to be too cavalier about mailservers. Any
mailserver should have at least a basic level of active maintenance to ensure
vulnerabilities are patched and to monitor for successful exploitation. I would
not recommend that a person operate a mailserver without at least a basic
competence in Linux administration and security.
Look at it this way: the security and abuse landscape of email is such that the
line between being one of the good guys, and being part of the problem, is
sometimes very thin. It's easy to cross by accident if you do not learn best
practices in mail administration and keep up with them, because they do change.
Privacy and Security
The major motivation for self-hosting email usually has to do with privacy.
There's a clear benefit here, as most ad-driven email providers are known to do
various types of analysis on email content and that feels very sketchy. There
may also be a benefit to privacy and security in that you have a greater
ability to control and audit the way that your email is handled and protected.
There can be some more subtle advantages to running your own mailserver, as
well, since you can customize configuration to meet your usage. For example,
you can simply not implement interfaces and protocols that you do not use (e.g.
POP) to reduce attack surface. You can set up more complex authentication with
fewer bypass options. And you can restrict and monitor access much more
narrowly in general. For example, if you run your own mailserver it may be
practical to put strict firewall restrictions on the submission interface.
One benefit that I rather like is mandatory TLS. SMTP for inter-MTA transfer
provides only opportunistic TLS, meaning that it is susceptible to an SSL
stripping attack if there is an on-path attacker. A countermeasure already
often implemented by email providers is a mandatory TLS list, which is
basically just a list of email domains that are known to support TLS. The MTA
is configured to refuse delivery to any of these domains if they do not accept
an upgrade to TLS and provide a valid certificate. This is basically the email
equivalent of HTTPS Everywhere, and if you run your own mailserver you can
configure it much more aggressively than a major provider can. This is a
substantial improvement in security since it nearly ensures in-transit
encryption, which generally cannot be assumed with email [4].
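To make that concrete, here is a minimal sketch, in Python rather than MTA
configuration, of the decision a mandatory TLS policy makes before handing
mail off to a destination. The domain list and MX host here are placeholders,
and in practice you would express this in your MTA's own TLS policy tables
(Postfix and Exim both have mechanisms for it) rather than in code.
    import smtplib
    import ssl
    # Hypothetical set of destination domains we refuse to deliver to in
    # plaintext; a real MTA keeps this in its TLS policy configuration.
    MANDATORY_TLS_DOMAINS = {"example.com"}
    def tls_policy_ok(mx_host: str, dest_domain: str) -> bool:
        with smtplib.SMTP(mx_host, 25, timeout=30) as smtp:
            smtp.ehlo()
            if not smtp.has_extn("starttls"):
                # Plaintext delivery is only acceptable for domains we have
                # not marked as mandatory-TLS.
                return dest_domain not in MANDATORY_TLS_DOMAINS
            # create_default_context() verifies the certificate chain and the
            # hostname of the MX we connected to.
            smtp.starttls(context=ssl.create_default_context())
            smtp.ehlo()
            return True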
We must remember, though, that in general the privacy and security situation
with email is an unmitigated disaster. Even when running your own mailserver
you should never make assumptions. A very large portion of the email you send
and receive will pass through the same major provider networks on the other end
anyway. There is always a real risk that you have actually compromised the
security of your email contents, as commercial providers generally have a
substantial professional security program and you do not. The chances are much
higher that your own mailserver will be compromised for an extended period of
time without your knowledge.
It is also more likely that your own mail server will compromise your privacy
by oversharing. Distribution packages fortunately often include a default
configuration with reasonable precautions, but mail services can include a lot
of privacy and security footguns and it can be hard to tell if you have
disarmed them all. For example, when using SMTP submission many MTAs will put
the IP address of your personal computer, and the identity and version of your
MUA, in the outbound email headers. This is a breach of your privacy that can
be particularly problematic when your email address is related to an endeavor
with a DoS problem like competitive gaming. Commercially operated services
virtually always remove or randomize this information, and you can too, but
you have to know what issues to check for, and that's hard to do without
putting appreciable time into it.
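If you want to see what your own setup leaks, a quick check is to send
yourself a message and inspect the headers of what arrives. This little
Python sketch surfaces the usual suspects; the header names are common
culprits, not an exhaustive list.
    import sys
    import email
    from email import policy
    # Read a raw message (e.g. one saved from your own inbox) on stdin and
    # print the headers that most often leak client IPs and software versions.
    msg = email.message_from_binary_file(sys.stdin.buffer, policy=policy.default)
    for header in ("Received", "X-Originating-IP", "X-Mailer", "User-Agent"):
        for value in msg.get_all(header, []):
            print(f"{header}: {value}")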
Advice
Do I think running your own mail server is a good idea? I do not have an
especially strong opinion on the matter, as you might have guessed from the
fact that I both run mailservers and pay Fastmail to do it for me. I think
it depends a great deal on your needs and desires, and your level of skill
and resources.
I will offer a few things that I think are good rules:
It's not a good idea to run a "set-and-forget" mailserver. If you decide
to run mail yourself, you should be ready to spend a bit of your time on
maintenance and monitoring at least monthly. This won't be very time intensive,
but it should be attended to regularly.
Email can be a critical, central identity proof that your access to a lot of
things relies on. Consider the substantial risk that an issue with your DIY
mail server will result in your losing your ability to access things like your
Google account until you get it fixed. I have seen college students go through
absolute hell because they decided to run their own mailserver, it broke in a
difficult to diagnose way, and now they have to call telephone support at the
bank, insurance company, etc etc to get the 2FA email on their account changed.
I am skeptical of one-click solutions like iRedMail due to how opaque email
troubleshooting can be. If you want to run your own mailserver, I would
strongly encourage you to take it as a learning experience and get smart on how
the MTA, MDA, spam filter, etc actually work. The architecture of a typical
modern mailserver can be hard to wrap your head around, but you will have a
significantly easier time troubleshooting problems and adjusting things to work
the way you want if you have a block-diagram-level understanding of how all the
software you rely on works. Unfortunately the "in a box" type offerings have a
habit of discouraging this, and often even go for a more complex architecture
than you need because greater generality makes automation simpler (but makes
the resulting configuration more complex).
You're better off running a mailserver on a well-worn distribution that is
popular for infrastructure, like the Red Hat or Debian families. They tend to
solve a lot of the problems you're likely to run into in advance via their
packaging and default config conventions.
And my advice for running a mailserver:
Do not cheap out on providers. Run your mailserver in the most reputable
IP space you can find. Most likely you will use some type of cloud instance or
VPS (probably in such a way that the line between the two is blurry).
Unfortunately, per-month cost tends to correlate directly with IP reputation
quality, as does name recognition. But neither is any guarantee. AWS has
plenty of IP reputation problems. Sometimes a little-known, brand-new VPS
provider can be a surprisingly good option because they got their IP space off
of some staid corporation that maintained an impeccable security program. Do
some research to try and find out what kind of results people get from
different providers.
From an IP reputation perspective, consistency is key. Mailservers and IP
reputation are like credit cards and your credit report: once you open one, you
want to keep it as long as possible so that it will build a good IP reputation.
Do not move your mailserver around between providers to save money; it will
only result in headaches.
Do not run a mailserver off of IP space that is not intended for commercial
services. For example, running a mailserver on a residential ISP is a terrible
idea (if it even works, many residential ISPs drop outbound on port 25). You
will be guaranteed to have IP reputation problems and, moreover, IP space
allocated to end-users usually gets listed on a "policy block list" such as
the one Spamhaus maintains... meaning your outbound email will be blocked by
recipients simply because it's coming from an IP range that should not have
mailservers on it, and so is assumed to most likely be a compromised end-user
device.
Check every box when it comes to auth and trust measures. Set up SPF, DKIM,
and DMARC---read enough about these that you are confident you have set them up
correctly. They will not prevent deliverability problems, but they will help,
and incorrect or missing SPF in particular is virtually guaranteed to cause a
high rate of SMTP rejections. Use an external service like
check-auth@verifier.port25.com to ensure you've set it all up correctly.
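As a quick sanity check (not a substitute for reading up on these), you can
also confirm the records are published at all with a few DNS queries. A
sketch using the dnspython library, with the domain and DKIM selector as
placeholders for your own:
    import dns.resolver
    domain = "example.org"    # your sending domain (placeholder)
    selector = "mail"         # whatever DKIM selector you configured (placeholder)
    def txt_records(name):
        try:
            return [b"".join(r.strings).decode() for r in
                    dns.resolver.resolve(name, "TXT")]
        except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
            return []
    spf = [t for t in txt_records(domain) if t.startswith("v=spf1")]
    dmarc = [t for t in txt_records("_dmarc." + domain) if t.startswith("v=DMARC1")]
    dkim = txt_records(selector + "._domainkey." + domain)
    print("SPF:  ", spf or "MISSING")
    print("DMARC:", dmarc or "MISSING")
    print("DKIM: ", dkim or "MISSING (check the selector)")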
Set up fail2ban and make sure it is working correctly. Otherwise you will
pull your hair out every time you have to look at a log due to all the auth
rejections, and even worse someone might actually guess a password correctly
one day.
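If you are curious just how much of this background noise there is, a few
minutes with your mail log is illuminating. The sketch below is not a
replacement for fail2ban, just an illustration of the volume it deals with;
the log path and regex assume a Postfix-style log and will need adjusting for
your MTA.
    import re
    from collections import Counter
    # Tally failed SASL authentication attempts per source IP.
    pattern = re.compile(r"\[([0-9a-fA-F.:]+)\]: SASL \w+ authentication failed")
    failures = Counter()
    with open("/var/log/mail.log") as log:    # placeholder path
        for line in log:
            match = pattern.search(line)
            if match:
                failures[match.group(1)] += 1
    for ip, count in failures.most_common(10):
        print(f"{count:6d}  {ip}")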
Be very conservative from a security perspective. After all, a big benefit
of running your own is the ability to do this. Disable interfaces you don't
use, restrict network connections to the mailserver, require two-factor auth,
etc.
If your mailserver has more than a trivial number of users, one of your
biggest headaches is going to be your own side. Especially at an institutional
or business scale, your users will get their credentials compromised (via
phishing or whatever) and those credentials will be used to send spam via
your mailserver. Even if you are the only user, vulnerabilities in webmail are
highly prized because they may allow someone to use your webmail to send email
via your mail server. All of this will cause your mailserver to end up on
blocklists and gain a high spam reputation with major providers. Somewhat
counterintuitively, this is why outbound spam detection can be very valuable.
Passing your
outbound email through SpamAssassin and alerting on hits (or for large servers,
on an unusual number of hits) will allow you to quickly detect compromised user
accounts (or webmail, or mailserver, etc) and cut them off quickly, hopefully
minimizing the future impact [5].
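The mechanics of this don't need to be fancy. As a rough sketch of the idea
(in practice you would hook it in as a content filter in your MTA), you can
pass a copy of each outbound message through SpamAssassin's spamc client and
alert on anything it flags; the "alerting" here is just a print statement.
    import subprocess
    import sys
    def flagged_as_spam(raw_message: bytes) -> bool:
        # spamc hands the message to spamd and returns it with X-Spam-*
        # headers added; X-Spam-Flag: YES means it scored over the threshold.
        result = subprocess.run(["spamc"], input=raw_message,
                                capture_output=True, check=True)
        return b"X-Spam-Flag: YES" in result.stdout
    if __name__ == "__main__":
        message = sys.stdin.buffer.read()
        if flagged_as_spam(message):
            # In real life: page someone, disable the account, hold the queue.
            print("outbound message flagged as spam", file=sys.stderr)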
If you don't use webmail, don't offer it. It's a huge increase in attack
surface. If you do run webmail, I recommend putting it behind another
authentication system (e.g. a reverse proxy implementing OAuth). It will reduce
the chances of compromise by orders of magnitude. Similarly, if you are the
type of person who heavily uses a VPN, restricting access to SMTP submission,
IMAP, manage-sieve, etc to the VPN endpoint might be a fairly cheap measure
that gets you a huge improvement in security.
Managing your email storage, retention, backup, etc. can become surprisingly
complex surprisingly quickly. You have multiple options for how to actually
store email ranging from relational database to various flatfile formats. Learn
enough about them to make an informed decision. Unfortunately, Linux being
Linux, there are weird caveats to basically all approaches.
This is more of a matter of opinion, but where possible I prefer to use Unix
accounts for mail rather than virtual accounts. Part of this is because I use
directory authentication (LDAP/Kerberos) wherever possible, but even if you
don't, using Unix accounts for mail tends to be simpler to troubleshoot than
virtual accounts, and it reduces the number of distinct authentication points.
On the other hand, I tend to recommend not storing mail in user home
directories. It does simplify backup and in general make more sense, but it
becomes way too easy to accidentally delete your whole mail spool.
And on that topic, if you're going to run a mail server and you're not using
a 100% virtual approach with everything stored in a relational database, it
will be well worth your time to become an expert on Linux file permissions and
the SetUID behavior of your MDA and, possibly, MTA, filter, etc if they
interact with file storage. Many of the real-world problems you'll run into
will turn out to be related to file permissions and what UID a specific
component of the mailserver is acting as when it takes a specific action.
Depending on details of the setup, processing an individual incoming email
often includes steps that run as a service user, and steps that run as the
user the email is being delivered to.
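A five-minute check that has saved me real time is simply walking the mail
store and looking for anything owned by the wrong user, since that is where a
surprising number of mysterious delivery failures end up. A sketch, assuming
a Maildir layout and a placeholder user:
    import os
    import pwd
    import sys
    maildir = "/home/alice/Maildir"               # placeholder path
    expected_uid = pwd.getpwnam("alice").pw_uid   # placeholder user
    # Report anything in the mail store not owned by the expected user.
    for root, dirs, files in os.walk(maildir):
        for name in dirs + files:
            path = os.path.join(root, name)
            st = os.lstat(path)
            if st.st_uid != expected_uid:
                print(f"unexpected owner (uid {st.st_uid}): {path}", file=sys.stderr)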
The Sieve filtering language is substantially better than procmail. Don't
even get started with procmail, use an MDA or IMAP server with Sieve support.
Writing Sieve rules is not always the most fun thing, but if you get
manage-sieve working (likely built into your IMAP server!) there are several
GUI web and desktop mail clients that will let you manage your Sieve rules
using a familiar, gmail-like interface.
Mail is an especially arcane and eldritch world of software and you will gain
some gray hair in the process of becoming competent. That's just the joy of
computing!
Surely I have said at least one thing here which is controversial or just
outright wrong, so you can let me know by sending an email to the MTA that I
pay some Australians to run.
[1] Basically, MTAs especially are designed around a fundamentally distributed
architecture and try to view mail handling as a completely general-purpose
routing exercise. That means that there's no strong differentiation between
"inbound" and "authenticated outbound" messages at many points in the system.
Obviously there are canonical ways to handle this with DKIM, but nonetheless
it's surprisingly easy to do something extremely silly like set up an open
relay where, as a bonus, OpenDKIM signs all of the relayed messages.
[2] Some of this over-architecture is simply a result of the natural worst
tendencies of academic software development (to make the system more general
until it is so general that the simplest use-cases are very complex), but some
of it is a result of a legitimate need for mail software to be unusually
flexible. I think we can attribute this to three basic underlying issues:
first, historically, there has been a strong temptation to "merge" mail with
other similar protocols. This has resulted in a lot of things like MDAs that
also handle NNTP, leading to more complex configuration. Second, there is a
surprising amount of variation in the actual network architecture of email in
institutional networks. You've experienced this if you've ever had to carefully
reread what a "smart host" vs. a "peer relay" is. Third, MTAs and MDAs have
traditionally been viewed as very separate concerns while other related
functions like LDA and filtering have been combined into MTAs and MDAs at
various points. So, most mail software tries to support many different
combinations of different mail services and inter-service protocols.
[3] I have a story about once submitting more than one abuse complaint to
gmail per week for over a year, during which they never stopped the clearly
malicious behavior by one of their users. The punchline is that the gmail user
was sending almost daily emails with so many addresses across the To: and Cc:
(over 700 in total) that the extreme length of the headers broke the validation
on the gmail abuse complaint form, requiring me to truncate them. I detailed
this problem in my abuse reports as well; it was never fixed either.
[4] This is increasingly getting written into compliance standards for various
industries, so more and more paid commercial email services also allow you to
configure your own mandatory TLS list.
[5] This issue will mostly not apply to "personal" mail servers, but at an
institutional scale it's one of the biggest problems you'll deal with. I went
through a period where I was having to flush the postfix queue of "3 inches
longer" emails more than once a week due to graduate students falling for
phishing. Yes, I'm kind of being judgmental, but it was somehow always graduate
students. The faculty created their own kinds of problems. Obviously 2FA will
help a lot with this, and it might also give you a bit of sympathy for your
employer's annoying phishing training program. They're annoyed too, by all the
phishing people are falling for.
Last time we perambulated on telephones, we discussed open-wire long-distance
telephone carriers and touched on carriers intended for cables. Recall that, in
the telephone industry, cable refers to an assembly of multiple copper twisted
pairs (often hundreds) in a single jacket. There is a surprising amount of
complexity to the detailed design of cables, but that's the general idea.
Cables were not especially popular for long-distance service because the close
proximity of the pairs always led to crosstalk problems. Open-wire was much
better in that regard but it was costly to install and maintain, and the large
physical size of open-wire arrays limited the channel capacity of long-distance
leads.
The system of carriers, multiplexing multiple lines onto a single set of wires,
allowed for a significant improvement in the capacity of both cables and
open-wire. However, even the highest quality open-wire circuits could offer
only a very limited bandwidth for multiplexing. In practice, anything near
100kHz became hopelessly noisy, as the balanced transmission system used on
these circuits was simply ineffective at high frequencies. Because phone
conversations required around 15kHz of bandwidth (assuming no companding, which
was not yet done at the time) this imposed a big limit... which helps to
explain why open-wire carriers basically topped out at 17 total channels.
Fortunately, in the early 1930s AT&T engineers [1] began to experiment with a
then obscure type of cable assembly called a coaxial line (I will stick to this
terminology to avoid confusion with the more specific industry meaning of
"cable"). Coaxial lines were first proposed in the 19th century and are in
widespread use today for all manner of RF applications, although people tend to
associate them most with cable television. At the time, though, there were no
commercial applications for long, high-frequency coaxial lines, and so AT&T's
efforts covered considerable new ground.
The basic concept of a coaxial line is this: a center conductor is surrounded
first by a dielectric (typically air in earlier lines with separation achieved
by means of wrapping a non-conductive fiber cord around the center conductor)
and then by a cylindrical metal outer conductor. Unlike open-wire lines, the
outside conductor is connected to ground. This system has two particularly
useful properties: first, because the signal is carried through the internal
space between the conductors which is effectively a capacitor, the system acts
much like the old loaded telephone lines (but more effectively) and can carry
very high frequencies. Second, the skin effect causes the outer conductor to
serve as a very effective shield: undesired RF energy follows the outside of
the outer conductor to ground, and is thus kept well isolated from the RF
energy following the inside of the outer conductor. Coaxial lines can support a
high bandwidth with low noise, and for this reason they are still the norm
today for most RF applications.
The high-bandwidth property of coaxial lines has an interesting implication
that the 1934 BSTJ article introducing the concept had to lay out explicitly,
since the technology was not yet familiar. Because a coaxial line can carry a
wide range of frequencies simultaneously, it can be used by a radio system much
like the air. We have previously discussed "wired radio," but coaxial lines
provide a much more literal "wired radio." Modern CATV systems, for example,
are basically an entire broadcast RF spectrum, up to about 1GHz, but contained
inside of a coaxial line instead of traveling through free air. The implication is
that coaxial lines can carry a lot of payload, using conventional radio
encoding methods, but are isolated such that two adjacent lines can use the
same frequencies for different purposes with no (or in practice minimal)
interference with each other.
We can presumably imagine how telephone over coaxial line works: much like the
open-wire carriers, telephone calls are modulated to higher frequencies, and
thus many telephone calls at different frequencies can be multiplexed onto one
line. The principle is simple, although in the 1930s the electronics necessary
to perform the modulation, amplification, and demodulation were rather complex.
Adding further complexity, the early coaxial lines which could be manufactured
at the time had rather high attenuation compared to modern cables, requiring
frequent amplification of the signal (you will be surprised, when we get to it,
by just how frequent). Further, the RF properties of the cables (in terms of
frequency response and attenuation) turned out to be significantly related to
the temperature of the cable, likely mostly because of expansion and
contraction of the outer conductor which was physically relatively large (1/2"
common in early telephony experiments) and secured loosely compared to modern
designs.
Another important trend occurring at the same time was the creation of national
television networks. Radio networks already often used leased telephone lines
to distribute their audio programming to various member stations, and this had
by the 1930s already become a profitable service line for AT&T. Television
networks were now looking to do the same, but the far higher bandwidth required
for a television signal posed a challenge to AT&T which had few options
technically capable of carrying them. This was a huge incentive to develop
significantly higher bandwidth carriers.
AT&T first created a 2600' test cable at a facility in Phoenixville,
Pennsylvania. Tests conducted on this length of copper pipe in 1929 validated
the concept and led to the 1930s project to fully realize the new carrier
scheme. In 1936, AT&T committed, building a real coaxial long-distance lead
between New York and Philadelphia that supported 240 voice channels or a single
television channel. The design bandwidth of this line was what we now call
1MHz, but in AT&T documents it is widely referred to as the "million-cycle
system" or, less picturesque, the 1,000 kc system. Because of the rather high
attenuation in the line, repeaters were required every 10.5 miles, and the
design of suitably wide-band repeaters was one of the greater challenges in
development of this experimental toll lead.
Powering these repeaters proved a challenge. Previous carrier systems had
usually had a local utility install three-phase power to each repeater station;
it was undesirable to run power along with the signal wires because AC hum in
telephone calls had been an ongoing frustration with telephone lines run along
with power. With repeaters as frequently as every 10 miles, though, the cost of
adding so many new power lines would have been excessive. Instead, the decision
was made to bundle the coaxial signal cable along with wires used for
high-voltage DC. "Primary power supply stations," later called "main
stations," had grid connections (and typically backup batteries and generators)
along with rectification equipment to inject HVDC onto the cable. Repeaters
between main stations ran off of this DC power. Much the same technique is used
today for transoceanic cables.
Following experiments performed on this early coaxial route, the frequency
division carrier on coaxial cable was productionized as the L-carrier, or
specifically L1. The first proper L1 route was installed from Stevens Point,
Wisconsin to Minneapolis in 1941. L1 combined voice channels into groups of 12,
called banks. Five banks were then modulated to different carriers to form a
group. Finally, eight groups were modulated onto carriers to form a
"supergroup" of 480 channels, which was transmitted on the cable [2]. The end
result spanned 68 kHz to 2044 kHz on the line, and some additional carriers at
higher frequencies were used as "pilots" for monitoring cable attenuation to
adjust amplifiers.
As L1 equipment and installation methods improved, additional supergroups were
added to reach 600 total channels, and it became the norm to combine multiple
coaxial lines into a single assembly (along with power wires and often several
basic twisted pairs, which served as order wires). Late L1 installations used
four coaxial lines, for a total of 2400 channels.
AT&T briefly experimented with an L2 carrier, a variation that was intended to
be simpler and lower cost and thus suitable for shorter toll leads (e.g. within
metro areas). The effort quickly proved uncompetitive with conventional
cables and was canceled; I mention it mostly to explain why most accounts of
L-carrier history skip L2 entirely.
In 1952, a major advancement to the technology came in the form of the L-3
carrier, initially installed between New York and Philadelphia for testing. L-3
carried three "mastergroups" spanning approximately 200kHz to 8.3 MHz. Each
mastergroup contained two submastergroups, which each contained six
supergroups, which matched L1 in containing eight groups of five banks of
twelve channels. Combined onto the 8 coaxial lines of a typical assembly,
this yielded a total of 14,880 voice channels per route, although both more and
fewer coaxial lines were used for some routes. As an additional feature, L-3
could optionally replace two mastergroups with a TV channel, allowing one TV
channel and 600 voice channels on a single line.
One of the larger improvements to L-3, besides its increased capacity, was a
significant expansion of the supporting infrastructure considered part of the
carrier installation. This included repeaters at 4 mile intervals, some of
which were fitted with additional signal conditioning equipment (namely for
equalization of the balanced pairs). Main stations were required to inject
power at roughly 100-mile intervals.
For an additional quality improvement, L-3 used a technique called "frogging" in
which the supergroups were periodically "rearranged" or swapped between
frequency slots. This prevented excessive accumulation of intermodulation
products in any one supergroup frequency range, and was done at some main
stations, typically about every 800 miles.
A more interesting feature, though, was L-3's substantial automatic protection
capabilities. Equipment at each main station (where power was injected)
monitored a set of pilot signals on each coaxial line, and each route was
provided with a spare coaxial line. A failure of any coaxial line triggered an
automatic switching of its modulation and demodulation equipment onto the
spare line, restoring service in about 15ms. L-3 also contained several other
automatic recovery features, and an extensive alarm system that detected and
reported faults in the cable or at repeaters.
Here, at L-3, we will spend more time discussing what is perhaps the most
interesting part of the L-carrier story: the physical infrastructure. L-3 was
the state of the art in long-distance toll leads during an especially
portentous period in US telecom history: the 1960s or, more to the point, the
apex of the cold war.
Beginning in 1963, the military began construction of the Automatic Voice
Network, or AUTOVON. AUTOVON, sometimes called "the DoD's phone company," was a
switched telephone network very much like the regular civilian one but
installed and operated for the military. But, of course, the military did not
really build telephone infrastructure... they contracted it to AT&T. So, in
practice, AUTOVON was a "shadow" or "mirror" telephone system that was
substantially similar to the PSTN, and operated by AT&T on largely the same
equipment, but only terminated at various military and government
installations.
A specific goal for AUTOVON was to serve as a hardened, redundant system that
would survive an initial nuclear attack to enable coordination of a reprisal.
In other words, AUTOVON was a critical survivable command and control system.
To that end, it needed a survivable long-distance carrier, and L-carrier was
the state of the art.
Previous L-carrier routes, including the initial L-3 routes, had enclosed their
repeaters and power feeds in existing telephone exchange buildings and brick
huts alongside highways. For AUTOVON, though, AT&T refined the design to L-3I,
or L-3 Improved. The "Improved," in this case, was a euphemism for nuclear
hardening, and L-3I routes consisted entirely of infrastructure designed to
survive nuclear war (within certain parameters, such as a maximum 2 PSI blast
overpressure for many facilities). Built in the context of the cold war, nearly
all subsequent L-carrier installations were nuclear hardened.
The first major L-3I project was the creation of a hardened transcontinental
route for AUTOVON. Like the 1915 open-wire route before it, the first L-3I
connected the east coast to the west coast via New York and Mojave---or rather,
connected the military installations of the east coast to the military
installations of the west coast.
L-3I routes had repeater stations every 4 miles, each consisting of a buried
concrete vault containing the repeater electronics and a sheet-metal hut
on the surface, directly over the vault manhole, containing test equipment and
accessories to aid maintenance technicians. Because repeaters were powered by
the line itself, no utility power was required, although many repeater huts
had it added at a later time to allow use of the lights and ventilation blower
without the need to run a generator.
Every 100 miles, a main station injected power onto the cable and contained
automatic protection equipment. Some main stations contained additional
equipment, up to a 4ESS tandem switch. Main stations also served as
interconnect points, and often had microwave antennas, cables, and sometimes
"stub" coaxial routes to connect the L-3I to nearby military and civilian
telephone exchanges (L-3I routes installed for AUTOVON were also used for
civilian traffic on an as-available basis). A few particularly large main
stations had even more equipment, as they were capable of serving as emergency
network control facilities for the AUTOVON system.
A typical main station consisted of a five to twenty thousand square foot
structure buried underground, with all sensitive equipment mounted on springs
to provide protection from shocks transmitted through the ground. Vent shafts
from the underground facility terminated at ground-level vents with blast
deflectors. A gamma radiation detector on the surface (and, in later
installations, a "bhangmeter" type optical detector) triggered automatic
closure of blast valves on all vents when a nearby nuclear detonation was
detected. Several diesel generators, either piston or turbine depending on the
facility, were backed by buried diesel tanks to provide a two-week isolated
runtime. Water wells (with head pit underground), water tanks, and supplies of
sealed food supported the station's staff for the same two week duration.
This was critical, as main stations required a 24/7 staff for monitoring and
maintenance of the equipment.
At those facilities with interconnections to microwave routes, even the
microwave antennas were often a variant of the KS-15676 modified for hardening
against blast overpressures by the addition of metal armor. L-3I main stations,
being hardened connection points to AUTOVON with a maintenance staff on-hand,
were often used as ground stations for the ECHO FOX and NORTH STAR contingency
communications networks that supported Air Force One and E-4B and E-6 "Doomsday
planes."
This first transcontinental L-3I ran through central New Mexico and had a main
station at Socorro, where I used to live. In fact, the Socorro main station [3]
housed a 4ESS tandem switch, a master synchronization time source for the later
digital upgrade of the system, and served as the contingency network control
center for the western half of AUTOVON, making it one of the larger main
stations. You would have no idea from the surface, as the surface structures
are limited to a short microwave tower (for interconnection to the Rio Grande
corridor microwave route) [4], a small surface entry building, and a garage for
vehicle storage. The only indications of the facility's nature as cold war
nuclear C2 infrastructure are the signs on the perimeter fence which bear a
distinctive warning about the criminality of tampering with communications
infrastructure used for military purposes. And the gamma radiation detector, if
you know what they look like.
Hopefully you can see why I have always found this fascinating. Rumors about
secret underground military facilities abound, and yet few really exist... but
somewhat secret underground telephone facilities are actually remarkably
common, as not only L-3I but following L-4 and L-5 main stations were all
hardened, buried facilities that were, at least initially, discreet. The fact
that such an important part of the network infrastructure was located in such a
rural area might be a surprise to you, for example (at least if you are
familiar with New Mexico geography), but this was an explicit goal: L-3I main
stations were required to be located at least 20 miles from any major
population centers, since they were designed around a nuclear detonation at a
distance of 5 miles. So not only are these sorts of underground facilities found
throughout the nation, they're almost always found in odd places... off the
side of rural US highways between modest towns.
Given the lengths I have already reached, I will spend less time on L-4 and
L-5. This isn't much of a loss to you, because L-4 and L-5 were mostly
straightforward incremental improvements on L-3I. L-4 reached up to 72,000
channels per route while L-5E ("L5 Enhanced," which if you read the relevant
BSTJ articles appears to be merely the original L5 scheme with a limitation in
the multiplexing equipment resolved) reached up to 108,000 channels, using
66 MHz of bandwidth on each coaxial line.
Somewhat counter-intuitively, AT&T achieved these increases in capacity at the
cost of increased attenuation, so repeaters had to be spaced more and more
closely as the L-carrier technology evolved. L-4 required a repeater every 2
miles and a
power feed every 150, while L-5 required a repeater every 1 mile and a power
feed every 75. Some L-3I routes, such as the segment between Socorro and
Mojave, were upgraded to L-4, resulting in an L-4 repeater added between each
pair of original L-3I repeaters. Most L-4 routes were upgraded to L-5,
resulting in "alternating" main stations as smaller L-5 power-feed-only main
stations were added between the older L-4 main stations with more extensive
protection equipment.
Both L-4 and L-5 made use of entirely underground repeaters (i.e. no hut at the
surface), although L-4 repeaters sometimes had huts above them... usually at
mid-span equalizing repeaters (every 50 miles) and occasionally randomly at
others. The L-4 huts are said to have been almost entirely empty, serving only
to give technicians a workspace out of the wind and rain.
These L-carrier systems were entirely analogue as you have likely gathered, and
started out as tube equipment that transitioned to transistorized in the L-3I
era. But the analogue limitation was not undefeatable, and Philips designed a
set of systems for digital data on L-carrier referred to as P-140 (140 Mbps on
L-4) and P-400 (400 Mbps on L-5). Most L-carrier routes still in service in the
'80s were upgraded to digital.
What came of all of this infrastructure? Not long after the development of L-5E
in 1975, fiber optics began to reach maturity. By the 1980s fiber optics had
become the obvious future direction in long-distance telecoms, and L-carrier
began to look obsolete. L-carrier routes generally went out of service in the
'90s, although some survived into the '00s. Many, but not all, L-carrier rights
of way were reused to trench fiber optic cable, and some L-carrier repeater
vaults were reused as land for fiber add/drop and regeneration huts (typically
every 24 miles on telco fiber lines).
More interestingly, what about the main stations? Large underground facilities
have long proven difficult to repurpose. At least twice a month someone
declares that it would be a good idea to purchase an underground bunker and
operate it as a "high security data center," and sometimes they even follow
through on it, despite the fact that these ventures have essentially never been
successful (and the exceptions seem to be the type that prove the rule, since
they are barely surviving and/or criminal enterprises). The nation is studded
with AT&T main stations and Atlas and Titan missile silos that suffer from
extremely haphazard remodeling started, but not finished, by a "data center"
operator before going bankrupt. There are two examples just in the sparsely
populated state of New Mexico (both around Roswell, in the former Walker
AFB missile silo complex).
In practice, the cost of restoring and upgrading a Cold War underground
facility for modern use usually exceeds the cost of building a new underground
facility. The rub of it is that no one actually wants to put their equipment in
an underground data center anyway. These Cold War facilities cannot practically
be upgraded to modern standards for redundant HVAC, power, and connectivity,
and are never operated by ventures with enough money to hire security, add
vehicle protections, and obtain certifications. Ironically, they are less
secure and reliable than your normal above-ground type. Most of them are highly
prone to flooding [5].
Many main stations, L-4 and L-5 in particular, have been sold into private
ownership. Some owners have tried to make use of the underground facility, but
most have abandoned it and only use the surface land (for example because it is
adjacent to their farm). A few are being restored but these restoration efforts
quickly become very expensive and usually fail due to lack of funds, meaning
these often come up on the market with odd quirks like new kitchen appliances
but a foot of water on the lower level.
On the other hand, because L-carrier main stations sat on high-capacity
long-distance lines and had a staff and space for equipment, they naturally
became junction points for other types of long-distance technology. Many
L-carrier main stations are still in use today as switching centers for fiber
routes, but in most cases the underground has been placed in mothballs and new
surface buildings contain the actual equipment (the cost of modernizing the
electrical infrastructure and adding new cable penetrations to the underground
areas is very high). Mojave is a major example, as the old Mojave L-3I main
station remains one of Southern California's key long-distance telephone
switching centers.
Still others exist somewhere in-between. I have heard from a former employee
that Socorro, for example, is no longer in use for any long-distance
application and is largely empty. But CenturyLink, the present owner, still
performs basic caretaking on the structure at least in part because they know
that details of the lease agreement (most western L-carrier facilities are on
land leased from the BLM or Forest Service) will require them to perform site
remediation that is expected to be very costly. As happens all too often with
old infrastructure, it's cheaper to keep the lights on and little else than to
actually close the facility.
I am not aware of any former main stations that are not fairly well secured.
Repeaters are a different story. L-4 and L-5 seldom lead to interesting
repeater sites since they were underground vaults that were often filled and in
any case are very dangerous to enter (hydrogen sulfide, etc.). L-3I, on the
other hand... nearly all visible signs of the L-3I transcontinental in
California were removed due to the environmental sensitivity of the Mojave
Desert region. Many other L-3I routes, though, in their more rural sections,
feature completely abandoned repeater huts with the doors left to flap in the
wind.
Even in the absence of visible structures, L-carrier has a very visible legacy.
In the desert southwest, where the land is slow to heal, the routes are clearly
visible in aerial images to this day. A 100' wide swath cleared of brush with a
furrow in the center is perhaps more likely to be a petroleum pipeline, but
some of them are old L-carrier routes. You can identify AT&T's routes in some
cases by their remarkable dedication to working in completely straight lines,
more so than the petroleum pipelines... perhaps an accommodation to the limited
surveying methods of the 1960s that created a strong desire for the routes to
be easy to pace out on the ground.
Since the genesis of the L-carrier system AT&T has maintained a practice of
marking underground lines to discourage backhoe interference. There are many
types of these markers, but a connoisseur learns certain tricks. Much like the
eccentric wording "Buried Light Guide" indicates one of the earliest fiber
optic routes, signs reading "Buried Transcontinental Telephone Line" usually
indicate L-carrier. Moreover, L-carrier routes all the way to L-5 used AT&T's
older style of ROW markers. These were round wooden posts, about 4" in diameter
and 4-8' tall, with one to three metal bands painted orange wrapped around the
top. Lower on the post, a rectangular sign gave a "Buried telephone line, call
before digging" warning and a small metal plate affixed to the post near the
sign gave a surveyor's description of the route each way from the post (in
terms of headings and distances). Maintenance crews would locate the trench if
needed by sighting off of two posts to find the vector between them, so they
are usually tall enough and close enough together that you can see more than
one at a time.
It's hard to find or put together maps of the entire system, as routes came and
went over time and AT&T often held maps close to their chest. Some are
available though, and a 1972 map [6] depicts L-3I and major microwave routes.
L-4 and L-5 routes were more common but fewer maps depict them; on maps of
long-distance routes from the 1980s, there is a high chance that any given
non-microwave route is L-4 or L-5.
At the peak of the L-carrier network, there were over 100 hardened underground
main stations, 1000 underground repeater vaults, and at least 5,000 miles of
right of way. For a period of around two decades, L-carrier was the dominant
long-distance carrier technology in the United States, and the whole thing was
designed for war.
There is so much more to say about both L-carrier and AT&T's role in Cold War
defense planning, and I have already said a lot here. My next post will
probably be on a different topic, but I will return to long-distance systems in
order to discuss microwave. Microwave relay systems were extensively built and
covered many more route-miles than L-carrier. The lower cost of installation
made microwave better for lower-capacity routes, and also spurred competitive
long distance carriers like MCI and Verizon to use almost entirely microwave.
Later, we will get to fiber, although I have previously written about SONET.
I will also return to AT&T and nuclear war, one of my greatest interests. The
practicalities of the Cold War---that it consisted primarily of an enormous
planning exercise in preparation for nuclear attack---meant that AT&T
functioned effectively as another branch of the military. Nearly every nuclear
scenario involved AT&T's infrastructure, and AT&T and its subsidiaries,
partners, and successors were knee deep in secret planning for the end of the
world. They still are today.
P.S.: I have a fairly extensive collection of information on L-carrier routes,
and particularly those built for AUTOVON. There is a surprisingly large
community of people interested in this topic, which means that many resources
are available. Nonetheless it has always been my intent to put together one of
the most comprehensive sources of information on the topic. For various reasons
I had put this project down for years, but I am picking it back up now and hope
to produce something more in the format of a book over the next year. I will
perhaps share updates on this from time to time.
[1] You have hopefully, by now, realized that I am using "AT&T" to refer to the
entirety of the Bell System apparatus, which at various time periods consisted
of different corporate entities with varying relationships. Much of the work I
attribute to AT&T was actually performed by AT&T Long Lines, Bell Laboratories,
and the Western Electric Company, but some of it was also performed by various
Bell Operating Companies (BOCs, although they became the somewhat more
specifically defined RBOCs post-breakup). All of these entities have been
through multiple rounds of restructuring and, often, M&A and divestitures, with
the result that you sort of need to settle on using one name for all of them to
avoid spending a lot of time explaining the relationships. The same organizations
usually exist today in the forms of AT&T, Alcatel-Lucent, Nokia, Avaya,
CenturyLink, etc., but often not recognizably.
[2] This hierarchy of multiple levels of multiplexing was used both to make the
multiplexing electronics more practical and to allow L-carrier's 12-channel
banks to "appear" the same as the J and K-carrier banks. The concept had a lot
of staying power, and virtually all later telephone-industry multiplexing
schemes used similar hierarchies, e.g. DS0, DS1, etc.
[3] If you are following along at home it is technically the Luis Lopez or
"Socorro #2" main station, located just south of Socorro, as the Socorro name
was already used within AT&T Long Lines for an en-route microwave relay located
somewhat north of Socorro.
[4] If you're one of the few who has seen my few YouTube videos, you might find
it interesting that documentation refers to a direct microwave connection
between the Socorro #2 main station and the Manzano Base nuclear weapons
repository (now disused and part of Kirtland AFB). It's unclear if this was a
dedicated system or merely reserved capacity on the Rio Grande route, although
the latter seems more likely since multiple relays would be required and
there's no evidence of any.
[5] Like most underground facilities, these main stations are often below the
water table and have always required sump pumps for water extraction. As they
get older the rate of water ingress tends to increase, and so if the pumps are
out of operation for any period of time they can quickly turn into
very-in-ground swimming pools.
I have probably described before the concept of the telephone network forming
a single, continuous pair of wires from your telephone to the telephone of the
person you are calling. This is the origin of "circuit switching" and the
source of the term: the notion that a circuit-switched system literally forms
an electrical circuit between two endpoints.
Of course, we presumably understand that modern systems don't actually do this.
For one, most long-distance data transmission today is by means of optics. And
more importantly, most modern systems that we call circuit-switching are
really, in implementation, packet switching systems that use fixed allocation
schemes to provide deterministic behavior that is "as good as a circuit."
Consider the case of MPLS, or Multi-Protocol Label Switching, a network
protocol which was formerly extremely popular in telecom and ISP backhaul
networks and is still common today, although improvements in IP switching have
reduced its popularity [1]. MPLS is a "circuit-switched" system in that it
establishes "virtual circuits," i.e. connection setup is a separate operation
from using the connection. But, in implementation, MPLS is a packet switching
system and inherits the standard limitations thereof (resulting in a need for
QoS and traffic engineering mechanisms to provide deterministic performance).
We can say that MPLS implements circuit switching on top of packet switching.
One of the fun things about networking is that we can do things like this.
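If it helps to see it stripped of all the protocol machinery, the core of a
label-switched router is just a table lookup populated at connection setup
time. This toy Python sketch is not real MPLS, but it shows why the
per-packet work looks nothing like an IP routing decision:
    from dataclasses import dataclass
    @dataclass
    class Packet:
        label: int
        payload: bytes
    # Populated at "connection setup" time (in real MPLS, by a protocol like
    # LDP or RSVP-TE). Maps incoming label -> (outgoing interface, new label).
    label_table = {
        100: ("eth1", 205),
        101: ("eth2", 310),
    }
    def switch(packet: Packet) -> tuple[str, Packet]:
        # The entire per-packet forwarding decision: one lookup, one relabel.
        out_interface, out_label = label_table[packet.label]
        return out_interface, Packet(out_label, packet.payload)
    print(switch(Packet(100, b"hello")))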
Why, though? Circuit switching is, conceptually, very simple. So why do we
bother with things like MPLS that make it very much more complicated, even as
simple as MPLS is?
There are two major reasons, one fundamental and one practical. First, the
conventional naive explanation of circuit switching implies that, when I call
someone in India, the telephone network allocates a set of copper wires all the
way from me to them. This is a distance of many thousands of miles, which
includes oceans, and it does not seem especially likely that the telecom
industry has sunk the thousands of tons of copper into the ocean that would be
required to accommodate the telephone traffic between the US and the Asian
region. It is obvious on any consideration that, somehow, my telephone call
is being combined with other telephone calls onto a shared medium.
Second, there is the issue of range. The microwatt signal produced by your
telephone will not endure thousands of miles of wire, even if the gauge was
made unreasonably large. For this simple practical reason, signals being
moved over long distances need to somehow be encoded differently in a way that
can cover very long distances.
Both of these things are quite unsurprising to us today, because we are
fortunate enough to live in a world in which these problems were solved long
ago. Today, I'm going to talk about how these problems were solved, in the
first case where they were encountered at large scale: the telephone network.
Your landline telephone is connected by means of a single pair, two wires. This
pair forms a loop, called the local loop, from the exchange to your phone and
back. Signals are conveyed by varying the voltage (and, by the same token,
current as the resistance is fixed) on the circuit, or in other words by
amplitude modulation. The fact that this works in full duplex on a single loop
is surprisingly clever from the modern perspective of digital protocols which
almost universally are either half-duplex or need separate paths for each
direction, but the electrical trick that enables this was invented at about the
same time as the telephone. It's reasonably intuitive, although not quite
technically accurate, to say that each end of the telephone line knows what
signal it is sending and can thus subtract it from the line potential.
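As a toy numeric illustration of that "subtract what you sent" idea (a real
hybrid does this with a transformer arrangement, not arithmetic, and real
lines add loss and echo):
    # Both parties' signals superimpose on the single shared pair; each end
    # recovers the far end's signal by subtracting its own contribution.
    def far_end(line_level: float, my_signal: float) -> float:
        return line_level - my_signal
    mine, theirs = 0.3, -0.7
    line_level = mine + theirs         # what is actually on the pair
    print(far_end(line_level, mine))   # recovers -0.7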
The possible length of the local loop is limited. It varies by the gauge of
wire used, which telephone companies selected based on loop length to minimize
their costs. In general, beyond about ten miles the practicality starts to drop
as more and more things need to be done to the line to adjust for resistance
and inductance attenuating the signal.
The end result is that the local loop, the part of the telephone system we are
used to seeing, is actually sort of the odd one out. Virtually every other part
of the telephone system uses significantly different signaling methods to
convey calls, and that's not just a result of digitization: it's pretty much
always been that way, since the advent of long-distance telephony.
Before we get too much further into this, though, a brief recap of the logical
architecture of the telephone system. Let's say you make a long distance call.
In simplified form (in the modern world there are often more steps for
optimization reasons), your phone is directly connected by the local loop to a
class 5 or local switch in your exchange office. The local switch consults
routing information and determines that the call cannot be completed locally,
so it connects your call to a trunk. A trunk is a phone line that does not
connect a switch to a phone... instead, it connects a switch to another switch.
Trunk lines thus make up the backbone of the telephone network.
In this case, the trunk line will go to a tandem, class 4, or toll
switch. These are all mostly interchangeable terms used at different periods.
Tandem switches, like trunk lines, are not connected to any subscriber phones.
Their purpose is to route calls from switch to switch, primarily to enable
long-distance calling between two local switches. In our example, the tandem
switch may either select a trunk to the local switch of the called party, or if
there is no one-hop route available it will select a trunk to another tandem
switch which is closer to [2] the called party. Eventually, the last tandem
switch in the chain will select a trunk line to the called party's local switch,
which will select the local loop to their phone.
What we are most interested in, here, are the trunks.
Trunk lines may be very long, reaching thousands of miles for trans-continental
calls. They are also expected to serve a high capacity. Almost regardless of
the technology [3], laying new trunk lines is a considerable expense to this
day, so it's desirable to concentrate a very large amount of traffic onto a
small number of major lines. As a result, common routing in the telephone
network tends to resemble a hub and spoke architecture, with calls between
increasingly larger regions being concentrated onto just a few main trunks
between those regions. The modern more mesh-like architecture of the internet,
the more flexible routing technology it required, and the convergence of
telephony on IP is lessening this effect, but it's still fairly prominent and
was completely true of the early long-distance network.
Consider, for example, calls from New York City to Los Angeles. These two major
cities are separated by a vast distance, yet many calls are placed between
them. For cost reasons, just a small number of transcontinental lines, each of
them a feat of engineering, must take the traffic. Onto those same lines is
aggregated basically the entire call volume between the east coast and the west
coast, easily surpassing one hundred thousand simultaneous connections.
Now, imagine you tried to do this by stringing one telephone line for each
call.
Well, in the earliest days of long-distance telephony, that's exactly what was
done. A very long two-wire telephone circuit was strung between cities just
like between exchange offices and homes. To manage the length, inductance coils
were added at frequent intervals to adjust frequency response, and at less
frequent intervals the line was converted to four-wire (one pair each
direction) so that an amplifier could be inserted in each pair to "boost" the
signal against line loss. These lines were expensive and the quality of the
connection was poor, irritating callers with low volume levels and excessive
noise.
Very quickly, long-distance trunks were converted to a method we now refer to
as open wire. On these open wire trunks, sets of four wires (one pair for
each direction) were strung alongside each other across poles with multiple
cross-arms. Because a set of four wires was required for every simultaneous
phone call the trunk could support (called a channel), it was common to have an
absolute maze of wires as, say, four cross-arms on each pole each supported two
four-wire circuits. This large, costly assembly supported only eight channels.
Four-wire circuits were used instead of two-wire circuits for several reasons,
including lower loss and greater noise immunity. But moreover, continuous use
of a four-wire circuit made it easier to install amplifiers without having to
convert back and forth (which somewhat degraded quality every time). Loading
coils to adjust inductance were still installed at regular intervals (every
mile was typical).
The size and cost of these trunks were huge. Nonetheless, in 1914 AT&T completed
the first transcontinental telephone trunk, connecting for the first time the
eastern network (through Denver) to the previously isolated west coast network.
The trunk used three amplifiers and countless loading coils. Amusingly, for
basically marketing reasons, it would not go into regular service until 1915.
The high cost of this and subsequent long-distance connections was a major
contributor to the extraordinary cost of long-distance calls, but demand was
high and so long-distance open-wire trunks were extensively built, especially
in the more densely populated northeast where they formed the primary
connections between smaller cities for decades to come.
Years later, the development of durable, low-cost plastics considerably reduced
the cost of these types of trunks by enabling cheap "sheathed" cables. These
cables combined a great number of wire pairs into a single, thick cable that
was far cheaper and faster to install over long distances. Nonetheless, the
fundamental problem of needing two pairs for each channel and extensive line
conditioning remained much the same. The only real difference in call quality
was that sheathed cables avoided the problem of partially shorting due to rain
or snow, which used to make open-wire routes very poor during storms.
It was clear to Bell System engineers that they needed some form of what we now
call multiplexing: the ability to place multiple phone calls onto a single set
of wires. The first, limited method of doing so was basically a way of
intentionally harnessing crosstalk, the tendency of signals on one pair to
"leak" onto the pair run next to it. By use of a clever transformer
arrangement, two pairs could each carry one direction of one call... and the
two pairs together, each used as one wire of a so-called phantom circuit, could
carry one direction of a third call. This represented a 50% increase in
capacity, and the method was widely used on inter-city trunks. Unfortunately,
combining phantom circuits into additional "super-phantom" circuits proved
impractical, so the improvement stopped at a modest 1.5x, far from addressing
the problem.
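The capacity arithmetic is simple: every pair carries its own "side" circuit,
and every group of two pairs yields one extra phantom circuit. A trivial
sketch in Python, using nothing beyond what's described above:

    def circuits_with_phantoms(pairs):
        """One side circuit per pair, plus one phantom per two pairs."""
        return pairs + pairs // 2

    for pairs in (2, 4, 8):
        print(pairs, "pairs ->", circuits_with_phantoms(pairs), "circuits")
    # 2 pairs -> 3 circuits (the 50% gain described above)
    # 4 pairs -> 6 circuits
    # 8 pairs -> 12 circuits
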
Vacuum tube technology, originally employed for amplifiers on open-wire
circuits, soon offered an interesting new potential: carriers. Prior to carrier
methods, all telephone calls were carried on trunks in the audible frequency
range, just like on local loops. Carrier systems entailed using the audio
frequency signal to modulate a higher frequency carrier, much like radio. At
the other end, the carrier frequency could be isolated and the original audio
frequency demodulated. By mixing multiple carriers together, multiple
channels could be placed on the same open-wire pair with what we now call
frequency-division multiplexing (FDM).
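The scheme is simple enough to sketch. Here is a minimal, illustrative
simulation in Python (numpy and scipy), with made-up frequencies and plain
double-sideband AM rather than the single-sideband modulation the Bell System
actually used: a few band-limited "voice" signals are shifted onto separate
carriers, summed onto one "line," and one of them is recovered at the far
end.

    import numpy as np
    from scipy.signal import butter, filtfilt

    fs = 200_000                    # sample rate, arbitrary for the demo
    nyq = fs / 2
    t = np.arange(0, 0.05, 1 / fs)  # 50 ms of signal

    def lowpass(x, cutoff_hz):
        b, a = butter(4, cutoff_hz / nyq)
        return filtfilt(b, a, x)

    def bandpass(x, lo_hz, hi_hz):
        b, a = butter(4, [lo_hz / nyq, hi_hz / nyq], btype="band")
        return filtfilt(b, a, x)

    # Three stand-in "voice" channels, limited to telephone bandwidth.
    rng = np.random.default_rng(0)
    voices = [lowpass(rng.standard_normal(t.size), 3400) for _ in range(3)]

    # Multiplex: modulate each channel onto its own carrier, sum onto one
    # "line". Real carrier systems used SSB with the carrier suppressed.
    carriers_hz = [10_000, 20_000, 30_000]
    line = sum(v * np.cos(2 * np.pi * f * t)
               for v, f in zip(voices, carriers_hz))

    # Demultiplex channel 1: isolate its band, mix back down, low-pass.
    f1 = carriers_hz[1]
    recovered = lowpass(bandpass(line, f1 - 4_000, f1 + 4_000)
                        * 2 * np.cos(2 * np.pi * f1 * t), 3400)

    # Away from filter edge transients, the recovered signal closely
    # tracks the original channel.
    core = slice(fs // 100, -(fs // 100))
    print(np.corrcoef(recovered[core], voices[1][core])[0, 1])
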
The first such multiplexed trunk went into service in 1918 using what AT&T
labeled the "A" carrier. A-carrier was capable of carrying four channels on a
pair, using a single-sideband signal with suppressed carrier frequency, much
like the radio systems of the time. These carrier systems operated entirely
above the audible range (voice frequency or VF) and left the baseband VF
channel untouched, with the result that an open-wire line with A-carrier
could convey five channels: four A-carrier channels and one VF channel.
Subsequent carriers were designed to use FDM on both open-wire and sheathed
cables, using improved electronics to fit more channels. Further, different
carrier frequencies could be used to separate the two directions of a call
rather than separate pairs, once again allowing full-duplex operation on a
single wire pair while still keeping amplifiers practical.
This line of development culminated in the J-carrier, which placed 12 channels
on a single open-wire trunk. J-carrier operated above the frequencies used by
older carriers such as C-carrier and VF, and so these carriers could be
"stacked" to a degree enabling a total of 17 bidirectional channels on a
four-wire trunk [4], using frequencies up to 140 KHz. This 17x improvement came
at the cost of relatively complex electronics and more frequent amplifiers, but
still yielded a substantial cost reduction on a per-channel basis. J-carrier
was widely installed beginning in the late 1930s as an upgrade to existing
open-wire trunks.
Sheathed cables yielded somewhat different requirements, as crosstalk was a
greater issue. A few methods of mitigating the problem led to the development
of the K-carrier, which multiplexed 12 channels onto each pair in a sheathed
cable. Typically, a separate sheathed cable was used for each direction of
transmission to reduce crosstalk. Sheathed cables could contain a large
number of pairs
(hundreds was typical), making the capacity highly scalable. Further, K-carrier
was explicitly designed to operate without loading coils, further lessening
the cost of the cable itself. In fact, loading coils improved frequency
response
only to a point and worsened it much beyond VF, so later technologies like
K-carrier and even DSL required that any loading coils on the line be removed.
As a downside, K-carrier required frequent repeaters: every 17 miles. Each
repeater consisted of two amplifiers, one each direction, per pair in use.
Clever techniques which I will not describe in depth were used to automatically
adjust amplifiers to maintain consistent signal levels throughout the line.
Because these repeaters were fairly large and power intensive, they were
installed in substantial brick buildings that resembled small houses but for
their unusual locations. Three-phase power had to be delivered to each
building, usually requiring additional poles and wires.
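To get a feel for the amount of plant involved, here's a back-of-the-envelope
calculation in Python. The 17-mile spacing and two amplifiers per pair per
site come from the description above; the route length and working pair count
are assumed purely for illustration:

    # Rough scale of a hypothetical K-carrier route. Route length and pair
    # count are assumed; spacing and amplifier counts are described above.
    route_miles = 600
    working_pairs = 100
    repeater_spacing_miles = 17

    repeater_sites = route_miles // repeater_spacing_miles
    # one amplifier per direction, per pair, at every site
    amplifiers = repeater_sites * 2 * working_pairs
    print(repeater_sites, "buildings,", amplifiers, "vacuum tube amplifiers")
    # 35 buildings, 7000 vacuum tube amplifiers
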
The size of the buildings is really quite surprising, but we must remember that
this was still prior to the invention of the transistor and so the work was
being done by relatively large, low-efficiency tubes, with sensitive
environmental requirements. The latter was a particularly tricky aspect of
analog carriers. Repeater buildings for most open-wire and cable carriers used
extremely thick brick walls, not yet for blast hardening but as a method of
passive temperature stabilization: the thermal mass of the brick greatly
smoothed the diurnal temperature cycle. A notable K-carrier trunk ran
between Denver and El Paso, and the red brick repeater buildings can still be
seen in some more rural places from I-25.
This post has already reached a much greater length than I expected, and I have
yet to reach the topics that I intended to spend most of it on (coaxial and
microwave carriers). So, let's call this Part I, and look forward to Part II in
which the telephone network will fight the Cold War.
[1] MPLS used to have massive dominance because it was practical to implement
MPLS switching in hardware, and IP switching required software. Of course, the
hardware improved and IP switching can now be done in silicon, which reduces
the performance advantage of MPLS. That said, MPLS continues to have benefits
and new MPLS systems are still being installed.
[2] Closer in the sense of network topology, not physical locality. The
topology of the telephone network often reflects history and convenience, and
so can have unexpected results. For much of the late 20th century, virtually
all calls in and out of the state of New Mexico passed through Phoenix, even if
they were to or from Texas. This was simply because the largest capacity trunk
out of the state was a fiber line from the Albuquerque Main tandem to the
Phoenix Main tandem, along the side of I-40. Phoenix, being by then a more
populous city, was better connected to other major cities.
[3] Basically the only exception is satellite, for which the lack of wires
means that cost tends to scale more with capacity than with distance. But
geosynchronous satellites introduce around a half second of latency, which
telephone callers absolutely despised. AT&T's experiments with using satellites
to connect domestic calls were quickly abandoned due to customer complaints.
Satellites are avoided even for international calls, with undersea cables much
preferred in terms of customer experience. Overall the involvement of
satellites in the telephone network has always been surprisingly minimal,
with their role basically limited to connecting small or remote countries to
which there was not yet sufficient cable capacity.
[4] Because the VF or non-carrier channel could be used by any plain old
telephone connected to the line (via a hybrid transformer to two-wire), it was
used as an order wire. The order wire was essentially a "bonus" channel that
was primarily used by linemen to communicate with exchange office staff during
field work, i.e. to obtain their orders. While radio technology somewhat
obsoleted this use of the order wire, it remained useful for testing and for
connecting automated maintenance alarms. Telephone carriers to this day usually
have some kind of dedicated order wire feature.