Today, as New Mexico celebrates 4/20 early, seems an appropriate time to talk
about bhang... or rather, the bhangmeter.
The name of the bhangmeter seems to have been a joke by its designer and Nobel
laureate Frederick Reines, although I must confess that I have never totally
gotten it (perhaps I simply haven't been high enough). In any case, the
bhangmeter is one of the earliest instruments designed for the detection of a
nuclear detonation. In short, a bhangmeter is a photosensor with accompanying
discrimination circuits (or today digital signal processing) that identify the
"double flash" optical and heat radiation pattern which is characteristic
of a nuclear detonation.
The double flash originates from the extreme nature of the period immediately
after a nuclear detonation: the detonation creates an immense amount of heat
and light, but very quickly the ionized shockwave emerging from the explosion
actually blocks much of the light output. As the shockwave expands and loses
energy, the light can escape again. The first pulse is only perhaps a
millisecond long and has very sharp edges, while the second pulse appears more
slowly and as much as a second or so later (depending on weapon type and
yield).
The immensely bright light of a nuclear detonation, accompanied by this double
flash intensity pattern, is fairly unique and has been widely used for remote
sensing for nuclear weapons. Today this is mostly done by GPS and other
military satellites using modern optical imaging sensors, and the same
satellites watch for other indications of a nuclear detonation, such as an
X-ray pulse, for confirmation. The bhangmeter itself, though, dates back to 1948 and
always showed potential for large-area, automated monitoring.
The United States' first effort at large-scale automated nuclear detonation
monitoring was entrusted to the Western Union company, at the time the nation's
largest digital communications operator. By 1962, Western Union had completed
build-out of the uncreatively named Bomb Alarm System (BAS). BAS covered 99
locations which were thought to be likely targets for nuclear attack, and was
continuously monitored (including state of health and remote testing) from six
master control stations. It operated until the late '60s, when improved space
technology began to obsolete such ground-based systems.
Let's spend some time to look at the detailed design of the BAS, because it
has some interesting properties.
At each target site, three sensors were placed in a circle of eleven miles
radius, roughly 120 degrees apart. This distance was chosen so that the
expected sensitivity of the sensors in poor weather would result in a
detonation at the center of the circle triggering all three, and because it
allowed ample time for a sensor to finish transmitting its alarm before it was
destroyed by shockwave-driven debris. If a nuclear weapon were to detonate off
center, it might destroy one station, but the other two should complete
transmission of the alarm. This even allowed a very basic form of
localization, since the pattern of which sensors reported could hint at where
the detonation occurred.
The sensors were white aluminum cylinders mostly mounted to the top of
telephone poles, although some were on building roofs. On casual observation
they might have been mistaken for common pole-top transformers except that each
had a small cylindrical Fresnel lens sticking out of the top, looking not
unlike a maritime obstruction light. The Fresnel lens focused light from any
direction towards a triangular assembly of three small photocells. A perforated
metal screen between the lens and the photocells served both to attenuate light
(since the expected brightness of a nuclear detonation was extremely high) and
as a mounting point for a set of xenon flash bulbs that could be activated
remotely as a self-test mechanism.
In the weatherproof metal canister below the lens was a substantial set of
analog electronics which amplified the signal from the photocells and then
checked for a bright pulse with a rise time of less than 30ms, a brightness
roughly equivalent to that of the sun, and a decay to half brightness within
30ms. A second pulse had to reach the same brightness within one second and
decay within one second.
Should such a double flash be detected, the sensor interrupted the 1100Hz
"heartbeat" tone modulated onto its power supply and instead emitted 920Hz for
one second followed by 720Hz for one second. These power supply lines, at 30vdc
(give or take the superimposed audio frequency tone), could run for up to 20
miles until reaching a signal generating station (SGS).
The SGS was a substantial equipment cabinet installed indoors that provided the
power supply to the sensor and, perhaps more importantly, monitored the tone
provided by the sensor. The SGS itself is very interesting, and seems to have
been well ahead of its time in terms of network design principles.
Long series of SGS could be connected together in a loop of telegraph lines.
Each SGS, when receiving a message on its inbound line, decoded and re-encoded
it to transmit on its outbound line. In this way the series of SGS functioned
as a ring network with digital regeneration at each SGS, allowing for very long
distances. This was quite necessary as the SGS rings each spanned multiple
states, starting and ending at one of the three master control stations.
Further, SGS performed basic collision avoidance by waiting for inbound
messages to complete before sending outbound messages, allowing the ring
network to appropriately queue up messages during busy periods.
During normal operation, the master control station transmitted into the ring a
four-character "poll" command, which seems to have been BBBG. This is based on
a telegraph tape shown in a testing document; it is not clear whether this was
always the signal used, but BBBG does have an interesting pattern property in
Baudot that suggests it may have been chosen as a polling message as a way of
testing timing consistency in the SGS. An SGS failing to maintain its
Baudot clock would have difficulty differentiating "B" and "G" and so would
fail to respond to polls and thus appear to be offline.
In response to the poll, each station forwarded on the poll message and checked
the tone coming from its attached sensor. If the normal heartbeat or "green"
tone was detected, it sent a "green" status report. For example, "JGBW," where
the first three characters are an identifier for the SGS. Should it fail to
detect a tone, it could respond with a trouble or "yellow" status, although I
don't have an example of that message.
Since each station sending its status would tie up the line, stations further
down would have to wait to report their status. The way this queuing worked
out, a noticeable amount of time after initiating the poll (around ten seconds
by my very rough estimation) the master control station would receive its own
poll command back, followed by green or yellow status messages from each SGS
in the loop, in order. This process, repeated every couple of minutes, was
the routine monitoring procedure.
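The poll round-trip can be sketched as a toy simulation. The "BBBG" poll and the "JGBW"-style green response are modeled on the examples above; the ring order, the second station ID, and the trouble-status letter are all invented for illustration:

```python
# Toy simulation of one BAS poll cycle. "BBBG" and the "JGBW"-style
# green status come from the text; everything else (station IDs,
# the "T" trouble letter) is made up for the sketch.

def poll_cycle(station_ids, sensor_ok):
    """Simulate a poll traveling around the SGS ring.

    station_ids: SGS identifiers in ring order, e.g. ["JGB", "KLM"]
    sensor_ok:   dict mapping station id -> True if the 1100Hz
                 heartbeat tone is heard from the attached sensor

    Returns the sequence of messages arriving back at the master
    control station: its own poll first, then statuses in ring order.
    """
    line = ["BBBG"]   # the master's poll circulates first
    for sid in station_ids:
        # Each SGS regenerates inbound traffic, then waits for the
        # line to go idle before appending its own status report.
        line.append(sid + ("W" if sensor_ok[sid] else "T"))
    return line
```

Run against two stations, one with a dead sensor, the master hears its own poll followed by one green and one trouble report, in ring order.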
Any SGS which failed to receive a poll command for 2.5 minutes would
preemptively send a status message. This might seem odd at first, but it was a
very useful design feature as it could be used to locate breaks in the loop. A
damaged telegraph line would result in no responses except for 2.5 minute
status messages from all of the SGS located after the break. This localized
the break to one section of the loop, a vital requirement for a system where
the total loop length could be over a thousand miles.
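The break-localization trick can be made concrete: since a break silences every station upstream of it (their traffic can't pass the break to reach the master) while stations downstream time out and send unsolicited statuses, the break must lie just before the first station heard from. This sketch, with invented names, shows the inference:

```python
# Sketch of localizing a ring break from 2.5-minute timeout messages.
# Assumes traffic flows master -> first station -> ... -> last
# station -> master; all identifiers here are hypothetical.

def locate_break(ring_order, heard_from):
    """Infer the broken section from unsolicited status messages.

    ring_order: SGS identifiers in ring order
    heard_from: set of stations whose timeout statuses reached the master

    Stations before the break never get their responses past it, and
    stations after it never see polls, so they time out; the break is
    between the last silent station and the first one heard from.
    """
    for i, sid in enumerate(ring_order):
        if sid in heard_from:
            upstream = ring_order[i - 1] if i > 0 else "master"
            return (upstream, sid)
    return None  # nothing heard: no break detected, or total failure
```

With a four-station loop and timeouts arriving only from the last two, the damaged section falls between the second and third stations.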
Should a sensor emit the 920Hz and 720Hz pattern, the attached SGS would wait
for the inbound line to be idle and then transmit a "red" message. For example,
"JGBCY," where "JG" is a station ID, "B" is an indicator of approximate yield
(this appears to have been a later enhancement to the system and I am not sure
of how it is communicated from sensor to SGS), "C" indicates an alarm and "Y"
is an optional terminator. The terminator does not seem to be present on
polling responses, perhaps since they are typically immediately followed by
the next station's response.
The SGS "prioritized" a red message in that, as soon as an inbound message
ended, it would transmit the red message, even if another inbound message
immediately followed. Such de-prioritized messages were queued to be sent
after the red alert. For redundancy, a second red message was transmitted a
bit later, after the loop had cleared.
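The prioritization rule amounts to a two-level transmit queue, which can be sketched as follows; this is my model of the behavior, not Western Union's actual implementation:

```python
from collections import deque

class SGSQueue:
    """Two-priority transmit queue for an SGS: a pending red (alarm)
    message preempts ordinary traffic as soon as the inbound line
    goes idle, and de-prioritized traffic waits behind it."""

    def __init__(self):
        self.red = deque()
        self.normal = deque()

    def enqueue(self, message, alarm=False):
        (self.red if alarm else self.normal).append(message)

    def next_message(self):
        """Called whenever the inbound line goes idle; alarms first."""
        if self.red:
            return self.red.popleft()
        if self.normal:
            return self.normal.popleft()
        return None
```

Even if a routine status was queued first, a red message like "JGBCY" jumps ahead of it the next time the line is free.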
In the master control center, a computer sends poll messages and tracks
responses in order to make sure that all SGS are responsive. Should any red
message be received, polling immediately stops and the computer begins recording
the specific SGS that have sent alarms based on their ID letters. At the same
time, the computer begins to read out the in-memory list of alarming stations
and transmit it on to display stations. Following this alarm process, the
computer automatically polls again and reports any "yellow" statuses to the
display stations. This presumably added further useful information on the
location and intensity of the detonation, since any new "yellow" statuses
probably indicate sensors destroyed by the blast. Finally, the computer
resets to the normal polling process.
When desired, an operator at a master control station can trigger the
transmission of a test command to a specific SGS or the entire loop. When
receiving this command, the SGS triggers the xenon flash bulbs in the sensor.
This should cause a blast detection and the resulting red message, which is
printed at the master control center for operator confirmation. This represents
a remarkably well-thought-out complete end-to-end test capability, in good form
for Western Union which at the time seemed to have a cultural emphasis on
complete remote testing (as opposed to AT&T which tended to focus more on
redundant fault detection systems in every piece of equipment).
To architect the network, the nation was first split roughly in half to form
two regions. In each region, three master control centers operated various
SGS loops. Each target area had three sensors, and the SGS corresponding to
each of the three sensors was on a loop connected to a different one of the
three master control centers. This provided double redundancy of the MCCs,
making the system durable to destruction of an MCC as well as destruction
of a sensor (or really, destruction of up to two of either).
In each display center, a computer system decoded the received messages and lit
up appropriate green, yellow, or red lights corresponding to each sensor. The
green and yellow lights were mounted in a list of all sensors, but the red
lights were placed behind a translucent map, providing an at-a-glance view of
the receiving end of nuclear war.
In the '60s, testing of nuclear defense systems was not as theoretical as it is
today. While laboratory testing was performed to design the sensors, the
sensors and overall system were validated in 1962 by the Small Boy shot of
Operation Dominic II. A small nuclear weapon was detonated at the Nevada Test
Site with a set of three BAS sensors mounted around it, adjusted for greater
than usual sensitivity due to the unusually small yield of the test weapon.
They were connected via Las Vegas to the operational BAS network, and as
expected detonation alarms were promptly displayed at the Pentagon and Ent and
Offutt Air Force Bases of the Strategic Air Command, which at the time would be
responsible for a reprisal.
I have unfortunately not been able to find detailed geographical information on
the system. The three Master Control Stations for the Western United States
were located at Helena, SLC, and Tulsa, per the nuclear test report. A map in a
Western Union report on the system that is captioned "Theoretical system
layout" but seems to be accurate shows detector coverage for Albuquerque,
Wyoming, and Montana in the Western region. These would presumably correspond
to Sandia Labs and Manzano Base and the Minuteman missile fields going into
service in the rural north around the same time as BAS.
The same map suggests Eastern master control stations at perhaps Lancaster,
Charlottesville, and perhaps Greensboro, although these are harder to place.
BAS's eventual replacement was the satellite-based monitoring mentioned
earlier. This system, called USNDS as a whole, has a compact space segment that
flies second-class with other military space systems to save money. The main
satellites hosting USNDS are GPS and the Defense Support Program or DSP, a
sort of general-purpose heat sensing system that can detect various other
types of weapons as well.
I haven't written for a bit, in part because I am currently on vacation in
Mexico. Well, here's a short piece about some interesting behavior I've
noticed.
I use a cellular carrier with very good international roaming support, so for
the most part I just drive into Mexico and my phone continues to work as if
nothing has changed. I do get a notification shortly after crossing the border
warning that data might not work for a few minutes; I believe (but am not
certain) that this is because Google Fi uses eUICC.
eUICC, or Embedded Universal Integrated Circuit Card, essentially refers to a
special SIM card that can be field reprogrammed for different carrier
configurations. eUICC is attractive for embedded applications since it allows
for devices to be "personalized" to different cellular carriers without
physical changes, but it's also useful for typical smartphone applications
where it allows the SIM to be "swapped out" as a purely software process.
Note well: although the "embedded" seems to suggest it, eUICC is not the same
as an "embedded SIM" (e.g. one soldered to the board). eUICC is instead a set of
capabilities of the SIM card and can be implemented either in an embedded SIM
or in a traditional SIM card. Several vendors, particularly in the IoT area,
offer eUICC capable SIMs in the traditional full/mini/micro SIM form factors
to allow an IoT operator to move devices between cellular networks and
carriers without physically replacing the SIM.
Anyway, my suspicion is that Google Fi cuts down on their international service
costs by actually re-provisioning devices to connect to a local carrier in the
country where they are operating. I can't find any information supporting this
theory though, other than clarification that Fi does use embedded (eSIM) eUICC
capability in Pixel devices. Of course the eUICC capabilities can be delivered
in traditional SIM form factor as well, so carrier switching by this mechanism
would not be limited to devices with eSIM. The history of Google Fi as
requiring a custom kernel supports the theory that they rely on eUICC
capabilities, since until relatively recently eUICC was poorly standardized and
Android would likely not normally ship with device drivers capable of
reprovisioning the eUICC.
In any case, that wasn't even what I meant to talk about. I was going to say
a bit about cellular voice-over-IP capabilities including VoWiFi and VoLTE,
and the slightly odd way that they can behave in the situation where you are
using a phone in a country other than the one in which it's provisioned. To
get there, we should first cover a bit about how VoIP or "over-the-top
telephony" interacts with modern cellular devices.
Historically, high-speed data modes did not always combine gracefully with
cellular voice connections. Many older cellular air interface standards only
supported being "in a call" or a "data bearer channel," with the result that a
device could not participate in a voice call and a data connection at the same
time. This makes sense when you consider that the data standards were developed
with a goal of simple backwards-compatibility with existing cellular
infrastructure. The result was that basic cellular capabilities like voice
calls and management traffic (SMS, etc) were achieved by the cellular baseband
essentially regressing to an earlier version of the protocol, disabling
high-speed data protocols such as the high-speed-in-name-only HSDPA. Most
early LTE devices carried on this basic architecture, and so when you dialed a
call on many circa 2010s smartphones the baseband basically went back in time
to the 3G days and behaved as a basic GSM device. No LTE data could be
exchanged in the mean time, and some users noticed that they could not, for
example, load a web page while on a phone call.
This is a good time to insert a disclaimer: I am not an expert on cellular
technologies. I have done a fair amount of reading about them, but the full
architecture of modern cellular networks, then combined with all of the legacy
technologies still in use, is bafflingly complicated. I can virtually guarantee
that I will get at least one thing embarrassingly wrong in the length of this
post, especially since some of this is basically speculative. If you know
better I would appreciate if you emailed me, and I will make an edit to avoid
spreading rumors. There are a surprising number of untrue rumors about these
technologies in circulation.
This issue of not being able to use data while in a phone call became
increasingly irritating as more people started using Bluetooth headsets or
speakerphone and expected to be able to do things like make a restaurant
reservation while on a call with a friend. It clearly needed some kind of
resolution. Further, the many layers of legacy in the cellular network made
things a lot more complicated for carriers than they seemed like they ought
to be. Along with other trends like thinner base stations, carriers saw an
obvious way out... one shared with basically the entirety of the telecom
industry: over-the-top delivery.
If you are not familiar, over-the-top or OTT delivery is an architecture mostly
discussed in fixed telecoms (e.g. cable and wireline telephone) but also more
generally useful as a way of understanding telecom technologies. The basic
idea of OTT is IP convergence at the last mile. If you make every feature of
your telecom product run on top of IP, you simplify your whole outside plant to
broadband IP transport. The technology for IP is very mature, and there's a
wide spectrum of vendors and protocols available. In general, IP is less
expensive and more flexible than most other telecom transports. An ISP is a good
thing to be, and if cellular carriers can get phones to operate on IP alone,
they are essentially just ISPs with some supported applications.
Modern LTE networks are steering towards exactly this: an all-IP air segment
with a variety of services, including the traditional core of voice calls,
delivered over IP. The system for achieving this is broadly called the IP
Multimedia Subsystem or IMS. It is one of an alarming number of blocks in a
typical high-level diagram of the LTE architecture, and it does a lot of work.
Fundamentally, IMS is a layer of the LTE network that allows LTE devices to
connect to media services (mostly voice although video, for example, is also
possible) using traditional internet methods.
Under the hood this is not very interesting, because IMS tries to use standard
internet protocols to the greatest extent possible. Voice calls, for example,
are set up using SIP, just as in most VoIP environments. Some infrastructure is
required to get SIP to interact nicely with the traditional phone system, and
this is facilitated using SIP proxies, DNS records, etc so that both IMS
terminals (phones) and cellular phone switches can locate the "edges" of the
IMS segment... or in other words the endpoints that they need to connect to in
order to establish a call. While there are a lot of details, the most important
part of this bookkeeping is the Home Subscriber Server or HSS.
The HSS is responsible for tracking the association between end subscribers and
IMS endpoints. This works like a SIP version of the broader cellular network:
your phone establishes a SIP registration with a SIP proxy, which communicates
with the HSS to register your phone (state that it is able to set up a voice
connection to your phone) and obtain a copy of your subscriber information for
use in call processing decisions.
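In concrete terms, that registration step is an ordinary SIP REGISTER transaction. Here is a sketch of the sort of request a handset sends; all identities and hostnames are made up, and real IMS registration additionally involves AKA authentication, IPsec negotiation, and more headers than shown:

```python
# Simplified sketch of a SIP REGISTER request as an IMS terminal
# might send it. Identities/hostnames are invented; the Authorization
# header in particular is heavily abbreviated (a real one carries
# realm, nonce, response, etc.).

def build_register(impu, impi, pcscf, contact_ip, expires=600000):
    """impu: public identity (who you are reachable as)
       impi: private identity (who you authenticate as)
       pcscf: the SIP proxy (P-CSCF) the phone discovered"""
    return "\r\n".join([
        f"REGISTER sip:{pcscf} SIP/2.0",
        f"Via: SIP/2.0/UDP {contact_ip}:5060;branch=z9hG4bK776asdhds",
        f"From: <sip:{impu}>;tag=4721",
        f"To: <sip:{impu}>",
        "Call-ID: 843817637684230@998sdasdh09",
        "CSeq: 1 REGISTER",
        f"Contact: <sip:{impu.split('@')[0]}@{contact_ip}:5060>",
        f'Authorization: Digest username="{impi}"',
        f"Expires: {expires}",
        "Content-Length: 0",
        "",
        "",
    ])
```

The proxy that receives this consults the HSS, records where you are reachable, and calls to your number can then be routed to that Contact address.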
This all makes quite a bit of sense and is probably the arrangement that you
would come up with if asked to design an over-the-top cellular voice system.
Where things get a bit odd is, well, the same place things always get odd: the
edge cases. One of these is when phones travel internationally.
An interesting situation I discovered: when returning to our rented apartment,
I sometimes need to call my husband to let me in the front gate. If my phone
has connected to the apartment WiFi network by this point, the call goes
through normally, but with an odd ringing pattern: the typical "warble"
ringback plays only briefly, before being replaced by a fixed sine tone. If, on
the other hand, my phone has not connected to the WiFi (or the WiFi is not
working; the internet here is rather unreliable), the call fails with an error
message that I have misdialed ("El número marcado no es correcto," an unusually
curt intercept recording from Telcel).
Instead, calls via LTE must be dialed as if international: that is, dialed
00-1-NXX-XXX-XXXX. This works fine, and with normal ringback to boot.
So what's going on here?
This answer is partially speculative, but I think the general contours are
correct. First, Google Fi appears to use Telcel as their Mexican carrier
partner. I would suspect this works similarly to Fi's network switching to
Sprint and US Cellular, with a "ghost number" being temporarily assigned (at
least historically, all Google Fi numbers are "homed" with T-Mobile). When not
connected to WiFi, the phone is either using "traditional" GSM voice or is
connecting to Telcel IMS services located using LTE management facilities. As a
result, my phone is, for all intents and purposes, a Mexican cellphone. Calls
to US numbers must be dialed as international because they are international.
However, when connected to WiFi, the phone likely connects to a Google-operated
IMS segment which handles the phone normally, as if it were in the US. Calls to
US numbers are domestic again.
It's sort of surprising that the user experience here is so awkward. This is
pretty confusing behavior, especially to those unfamiliar with WiFi calling.
It's not so surprising though when you consider the generally poor quality of
Android's handling of international travel. Currently many text messages and
calls I receive are failing to match up with contacts, apparently because the
calling number is coming across with an '00' international dialing prefix and
so not matching the saved phone number. Of course, if the call arrives via
WiFi or the message by RCS, it works correctly. One would think that Android
core applications would correctly handle the scenario of having to remove the
international dialing prefix, but admittedly it would probably be difficult
to come up with an algorithmic rule for this that would work globally.
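A sketch of the kind of normalization contact matching would need illustrates why it's awkward: the international call prefix itself varies by country ("00" in Mexico and much of the world, "011" in NANP), so the stripping rule depends on where the phone currently is. Everything here is a hypothetical toy; a real implementation needs full per-country dialing plan data (this is roughly what Google's libphonenumber library exists for):

```python
# Hypothetical sketch of normalizing an incoming caller ID for
# contact matching. The prefix table is deliberately tiny; numbers
# in the examples are fictional.

IDD_PREFIXES = {
    "MX": "00",   # Mexico: international calls dialed 00 + CC + number
    "US": "011",  # NANP international dialing prefix
}

def normalize(number, current_country):
    """Convert a received number to +CC form where possible."""
    idd = IDD_PREFIXES.get(current_country)
    if number.startswith("+"):
        return number            # already in international form
    if idd and number.startswith(idd):
        return "+" + number[len(idd):]
    return number  # national format: needs the local plan to resolve
```

A US number arriving in Mexico as "0015055551234" normalizes to "+15055551234" and would match a saved contact; without the strip, it doesn't.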
Another interesting observation, also with some preamble: I believe I have
mentioned before that Mexico has a complex relationship with NANP, the unified
numbering scheme for North American countries that makes up the "+1" country
code. While Mexico originally intended to participate in NANP, a series of
events related to the generally complex history of the Mexican telecom industry
prevented that materializing and Mexico was instead assigned country code +52.
The result is that Mexico is "NANP-ish" but uses a distinct numbering scheme,
and the NANP area codes originally assigned to Mexico have since mostly been
recycled as overlays in the US.
A full history of telephone number planning in Mexico could occupy an entire
post (perhaps I'll write it next time I'm here). It includes some distinct
oddities. Most notably, area codes can be either 2 or 3 digits, with 2 digit
area codes being used for major cities. While Mexico had formerly used
type-of-service prefixes (specific dialing prefixes for mobile phones), these were
retired fairly recently and are no longer required or even permitted.
In principle, telephone numbers for 2-digit area codes can be written
XX-XXXX-XXXX, while three-digit area codes can be written XXX-XXX-XXXX. Note
the lack of Ns to specify digits constrained to 2-9 as in NANP. This is not
entirely intentional; I just don't know whether this restriction exists in Mexico
today. Putting together the current Mexican dialing plan from original sources
is a bit tricky as IFT has published changes rather than compiled versions of
the numbering plan. My Spanish is pretty bad so reading all of these is going
to take a while, and it's getting to be pretty late... I'll take this on later,
so you can look forward to a future post where I answer the big questions.
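The two groupings can be sketched in code. The only two-digit area codes I'll vouch for are the big three cities (55 Mexico City, 33 Guadalajara, 81 Monterrey); treat the table as illustrative rather than a complete dialing plan:

```python
# Formatting a 10-digit Mexican national number per the convention
# described above. The 2-digit area code table is intentionally
# incomplete: just the three well-known major-city codes.

TWO_DIGIT_CODES = {"55", "33", "81"}  # CDMX, Guadalajara, Monterrey

def format_mx(number):
    """number: 10-digit national number as a string of digits."""
    if number[:2] in TWO_DIGIT_CODES:
        return f"{number[:2]}-{number[2:6]}-{number[6:]}"   # XX-XXXX-XXXX
    return f"{number[:3]}-{number[3:6]}-{number[6:]}"       # XXX-XXX-XXXX
```

So a Mexico City number groups as 55-1234-5678 while a Tijuana (664) number groups as 664-123-4567; the formatter has to know the area code length before it can even place the first hyphen.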
An extremely common convention in Mexico is to write phone numbers as
XX-XX-XX-XX-XX. I'm not really sure where this came from as I don't see e.g.
IFT using it in their documents, but I see it everywhere from handwritten signs
to the customer service number on a Coca-Cola can. Further complicating things,
I have seen the less obvious XXX-XXXX-XXX in use, particularly for toll free
numbers. This seems like perhaps the result of a misunderstanding of the digit
grouping convention for 2 digit area codes.
It seems to be a general trend that countries with variable-length area codes
lack well agreed upon phone number formatting conventions. In the UK, for
example, there is also variability (albeit much less of it). This speaks to one
of the disadvantages of variable-length area codes: they make digit grouping
more difficult, as there's a logical desire to group around the "area code" but
it's not obvious what part of the number that is.
Anyway, there's some more telephone oddities for you. Something useful to think
about when you're trying to figure out why your calls won't connect.
Update: reader Gabriel writes in with some additional info on Mexican telephone
number conventions. Apparently in the era of manual exchanges, it was
conventional to write 4-digit telephone numbers as XX-XX. The "many groups of
two" format is sort of a habitual extension of this. They also note that in
common parlance Mexico City has a 1-digit area code '5' as all '5X' codes are
allocated to it.
This is an experiment in format for me: I would like to have something like
twitter for thoughts that are interesting but don't necessarily make a whole
post. The problem is that I'm loath to use Twitter and I somehow find most of
the federated solutions to be worse, although I'm feeling sort of good about
Pixelfed. But of course it's not amenable to text.
I would just make these blog posts, but blog posts get emailed out to a decent
number of subscribers now. I know that I, personally, react with swift anger to
any newsletter that dares to darken my inbox more than once a week. I don't want to
burden you with another subject line to scroll past unless it's really worth
it, you know? So here's my compromise: I will post short items on the blog, but
not email them out. When I write the next proper post, I'll include any short
items from the meantime in the email with that post. Seem like a fair
compromise? I hope so. Beats my other plan, at least, which was to start a
syndicated newspaper column.
Also, having now written Computers Are Bad for nearly two years, I went back
and read some of my old posts. I feel like my tone has gotten more formal
over time, something I didn't intend.
I would hate for anyone to accuse me of being "professional." In an effort to
change this trend, the tone of these will be decidedly informal and my typing
might be even worse than usual.
The good part
So tonight I was lying on my couch watching Arrested Development yet again
while not entirely sober, and I experienced something of a horror film
scenario: I noticed that the "message" light on my desk phone was flashing.
I remembered I'd missed several calls today, so I retrieved my voice mail.
That is, I pulled out my smartphone and scrolled through my inbox to find the
notification emails with a PCM file attached. Even I don't actually make a
phone call for that.
The voicemail, from a seemingly random phone number in California, was 1 minute
and 8 seconds long. I realized that this was a trend: over the last few days I
had received multiple 1 minute, 8 second voice messages but the first one I had
listened to seemed to be silent. I had since been ignoring them, assuming it
was a telephone spammer that hung up a little bit too late (an amusing defect
of answering machines and voicemail is that it has always been surprisingly hard
for a machine to determine whether a person answered or voicemail, although
there are a few heuristics). Just for the heck of it, though, realizing that I
had eight such messages, I listened to one again.
The contents: a sort of digital noise. It sounded like perhaps very far away
music, or more analytically it seemed like mostly white noise with a little
bit of detail that a very low-bitrate speech codec had struggled to handle.
It was quiet, and I could never quite make anything out, although it always
seemed like I was just on the edge of distinguishing a human voice.
Here's the best part: after finding about fifteen seconds of this to be
extremely creepy, I went back to my email. The sound kept on playing. I
checked the notifications. Still going. It wouldn't stop. I went to the task
switcher and dismissed the audio player. Still going. Increasingly agitated,
and on the latest version of Android which is somehow yet harder to use, I held
power and remembered it doesn't do that any more. I held power and volume down,
vaguely remembering they had made it something like that. No, screenshot.
Holding power and volume up finally got me to the six-item power menu, which
somehow includes an easy-access "911" button even though you have to remember
some physical button escape sequence to get it. Rebooting the phone finally
stopped the noise.
Thoroughly spooked, I considered how I came to this point.
Because I am a dweeb and because IP voice termination is very cheap if you look
in the right places, I hold multiple toll-free phone numbers, several of which
go through directly to the extension of my desk phone. This had been the case
for some time, a couple of years at least, and while I don't put it to a lot of
productive use I like to think I'm kind of running my own little cottage
PrimeTel. Of course basically the only calls these numbers ever get are spam
calls, including a surprising number of car warranty expiration reminders
considering the toll-free number.
But now I remember that there is another type of nuisance call that afflicts
some toll free numbers. You see, toll free numbers exhibit a behavior called
"reverse charging" or "reverse tolling" where the callee pays for the call
instead of the caller. Whether you get your TFN on a fixed contract basis or
pay a per-minute rate, your telephone company generally pays just a little bit
of money each minute to the upstream telephone providers to compensate them for
carrying the call that their customer wasn't going to pay for.
This means that, if you have a somewhat loose ethical model of the phone
system, you can make a bit of profit by making toll-free calls. If you either
are a telco or get a telco to give you a cut of the toll they receive, every
toll-free call you make now nets you a per-minute rate. There is obviously a
great temptation to exploit this. Find a slightly crooked telco, make thousands
of calls to toll-free numbers, get some of them to stay on the phone for a
while, and you are now participating in capitalism.
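The take per call is tiny but the scheme scales. With made-up but plausible numbers (real interconnect and kickback rates vary widely; the rate below is pure illustration):

```python
# Back-of-the-envelope toll-fraud economics. The per-minute kickback
# rate is invented for illustration only.

kickback_per_minute = 0.005        # $0.005/min share of the reverse toll
calls_per_day = 10_000
avg_minutes_per_call = 68 / 60     # e.g. those 1:08 voicemail connects

daily_take = calls_per_day * avg_minutes_per_call * kickback_per_minute
print(f"${daily_take:,.2f} per day")   # prints "$56.67 per day"
```

Not a fortune, but for a fully automated operation with essentially zero marginal cost per call, it adds up, which is exactly why telcos watch for it.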
The problem, of course, is that most telcos (even those that offer a kickback
for toll-free calls, which is not entirely unusual) will find out about the
thousands of calls you are making. They'll promptly, usually VERY promptly due
to automated precautions, give you the boot. Still, there are ways, especially
overseas or by fraud, to make a profit this way.
And so there is a fun type of nuisance call specific to the recipients of
toll-free calls: random phone calls that are designed to keep you on the phone
as long as possible. This is usually done by playing some sort of audio that is
just odd enough that you will probably stay on the phone to listen for a bit
even after you realize it's just some kind of abuse. Something that sounds
almost, but not quite, like someone talking is a classic example.
Presumably one of the many operations making these calls is happy to talk to
voicemail for a bit (voicemail systems typically "supe," meaning that the call
is charged as if it connected). Why one minute and eight seconds, I'm not sure;
that's not the limit on my voicemail system. Perhaps if you include the
greeting recording it's two minutes after the call connects or something.
I've known about this for some time; it's a relatively common form of toll
fraud. I likely first heard of it via an episode of "Reply All" back when that
was a going concern. Until now, I'd never actually experienced it. I don't know
why that's just changed, presumably some operation's crawler just now noticed
one of my TFNs on some website. Or they might have even wardialed it the old
fashioned way and now know that it answers.
Oh, and the thing where it kept on playing after I tried to stop it, as if it
were the distorted voice of some supernatural entity? No idea, as I said, I use
Android. God only knows what part of the weird app I use and the operating
system support for media players went wrong. Given the complexity and generally
poor reliability of the overall computing ecosystem, I can easily dismiss
basically any spooky behavior emanating from a smartphone. I'm not going to
worry about evil portents until it keeps going after a .45 to the chipset...
Maybe a silver one, just in the interest of caution.
One of the great joys of the '00s was the tendency of marketers to apply the
acronym "HD" to anything they possibly could. The funniest examples of this
phenomenon are those where HD doesn't even stand for "High Definition," but
instead for something a bit contrived like "Hybrid Digital." This is the case
with HD Radio.
For those readers outside of these United States and Canada (actually Mexico
as well), HD Radio might be a bit unfamiliar. In Europe, for example, a
standard called DAB for Digital Audio Broadcasting is dominant and, relative
to HD radio, highly successful. Another relatively widely used digital
broadcast standard is Digital Radio Mondiale, confusingly abbreviated DRM,
which is more widely used in the short and medium wave bands than in VHF
where we find most commercial broadcasting today... but that's not a
limitation, DRM can be used in the AM and FM broadcast bands.
HD radio differs from these standards in two important ways: first, it is
intended to completely coexist with analog broadcasting due to the lack of
North American appetite to eliminate analog. Second, no one uses it.
HD Radio broadcasts have been on the air in the US since the mid '00s. HD
broadcasts are reasonably common now, with 9 HD radio carriers carrying 16
stations here in Albuquerque. Less common are HD radio receivers. Many, but not
all, modern car stereos have HD Radio support. HD receivers outside of the car
center console are vanishingly rare. Stereo receivers virtually never have HD
decoding, and due to the small size of the market standalone receivers run
surprisingly expensive. I am fairly comfortable calling HD Radio a failed
technology in terms of its low adoption, but since it piggybacks on the broader
market of broadcast radio the ongoing costs are low. We can expect HD Radio
stations to remain available well into the future and continue to offer some
odd programming.
Santa Fe's 104.1 KTEG ("The Edge"), for example, a run of the mill iHeartMedia
alt rock station, features as its HD2 "subcarrier" a station called Dance
Nation '90s. The clearly automated programming includes Haddaway's "What Is
Love" seemingly every 30 minutes and no advertising whatsoever, because it
clearly doesn't have enough listeners for any advertisers to be willing to pay
for it. And yet it keeps on broadcasting, presumably an effort by iHeartMedia
to meet programming diversity requirements while still holding multiple top-40
licenses in the Albuquerque-Santa Fe market region.
So what is all this HD radio stuff? What is a subcarrier? And just what makes
HD radio "Hybrid Digital?" HD Radio has gotten some press lately because of the
curious failure mode of some Mazda head units, and that's more attention than
it's gotten for years, so let's look a bit at the details.
First, HD Radio is primarily, in the US, used in a format called In-Band
On-Channel, or IBOC. The basic idea is that a conventional analog radio station
continues to broadcast while an HD Radio station is superimposed on the same
frequency. The HD Radio signal is found "outside" of the analog signal, as two
prominent sideband signals outside of the bandwidth of analog FM stereo.
While the IBOC arrangement strongly resembles a single signal with both analog
and digital components, in practice it's very common for the HD signal to be
broadcast by a separate transmitter and antenna placed near the analog
transmitter (in order to minimize destructive interference issues). This isn't
quite considered the "correct" implementation but is often cheaper since it
avoids the need to make significant changes to the existing FM broadcast
equipment... which is often surprisingly old.
It's completely possible for a radio station to transmit only an HD signal,
but because of the rarity of HD receivers this has not become popular. The FCC
does not normally permit it, and has declined to extend the few experimental
licenses that were issued for digital-only operation. As a result, we see HD
Radio basically purely in the form of IBOC. Within IBOC, HD Radio supports
both a full hybrid mode with conventional FM audio quality and an
"extended" mode in which the digital sidebands intrude on the conventional FM
bandwidth. This results in mono-only, reduced-quality FM audio, but allows
for a greater digital data rate.
HD Radio was developed and continues to be maintained by a company called
iBiquity, which was acquired by DTS, which was in turn acquired by Xperi.
iBiquity
maintains a patent pool and performs (minimal) continuing development on the
standard. iBiquity makes their revenue from a substantial up-front license fee
for radio stations to use HD Radio, and from royalties on revenue from
subcarriers. To encourage adoption, no royalties are charged on each radio
station's primary audio feed. Further encouraging adoption (although not
particularly successfully), no royalty or license fees are required to
manufacture HD Radio receivers.
The adoption of HD Radio in North America stems from an evaluation process
conducted by the FCC in which several commercial options were considered. The
other major competitor was FMeXtra, a generally similar design that was not
selected by the FCC and so languished. Because US band planning for broadcast
radio is significantly different from the European approach, DAB was not a
serious contender (it has significant limitations due to the very narrow
RF bandwidth available in Europe, a non-issue in the US where each FM radio
station was effectively allocated 200kHz).
The actual HD Radio protocol is known more properly as NRSC-5, for its
standards number issued by the National Radio Systems Council. The actual
NRSC-5 protocol differs somewhat depending on whether the station is AM or FM
(the widely different bandwidth characteristics of the two bands require
different digital encoding approaches). In the more common case of FM, NRSC-5
consists of a set of separate OFDM data carriers, each conveying part of
several logical channels which we will discuss later. A total of 18 OFDM
subcarriers are typically present, plus several "reference" subcarriers which
are used by receivers to detect and cancel certain types of interference.
If you are not familiar with OFDM or Orthogonal Frequency Division
Multiplexing, it is an increasingly common encoding technique that essentially
uses multiple parallel digital signals (as we see with the 18 subcarriers in
the case of HD Radio) to allow each individual signal to operate at a lower
symbol rate. This has a number of advantages, but perhaps the most important is
that it is typically used to enable the addition of a "guard interval" between
each symbol. This intentional quiet period avoids subsequent symbols "blurring
together" in the form of inter-symbol interference, a common problem with
broadcast radio systems where multipath effects result in the same signal
arriving multiple times at slight time offsets.
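The arithmetic behind this tradeoff is easy to sketch. The numbers below are toy values for illustration, not actual NRSC-5 parameters: the point is that splitting one fast serial stream across N parallel subcarriers stretches each symbol, so a guard interval long enough to absorb multipath echoes costs only a small fraction of airtime.

```python
# Toy illustration (invented numbers, not NRSC-5 parameters): a guard
# interval long enough to cover multipath delay spread is cheap once
# the symbol time has been stretched by parallelizing across N
# subcarriers.

def symbol_time(total_rate_sps: float, n_subcarriers: int) -> float:
    """Duration of one symbol when the stream is split N ways."""
    return n_subcarriers / total_rate_sps

def guard_overhead(total_rate_sps: float, n_subcarriers: int,
                   guard_s: float) -> float:
    """Fraction of airtime spent on the guard interval."""
    t = symbol_time(total_rate_sps, n_subcarriers)
    return guard_s / (t + guard_s)

# 100k symbols/s stream, 5 microseconds of multipath delay spread:
serial = guard_overhead(100_000, 1, 5e-6)   # single carrier: ~33% overhead
ofdm = guard_overhead(100_000, 18, 5e-6)    # 18 subcarriers: ~2.7% overhead
print(f"single carrier: {serial:.1%}, 18 subcarriers: {ofdm:.1%}")
```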
A variety of methods are used to encode the logical channels onto the OFDM
subcarriers, things like scrambling and convolutional coding that improve the
ability of receivers to recover the signal due to mathematics that I am far
from an expert on. The end result is that an NRSC-5 standard IBOC signal in
the FM band can convey somewhere from 50kbps to 150kbps depending on the
operator's desired tradeoffs of bitrate to power and range.
The logical channels are the interface from layer 1 of NRSC-5 to layer 2. The
number and type of logical channels depends on the band (FM or AM), the
waveform (hybrid analog and digital, analog with reduced bandwidth and digital,
or digital only), and finally the service mode, which is basically a
configuration option that allows operators to select how digital capacity is
allocated.
In the case of FM, five logical channels are supported... but not all at once.
A typical full hybrid station broadcasts only primary channel P1 and the PIDS
channel, a low-bitrate channel for station identification. P1 operates at
approximately 98kbps. For stations using an "extended" waveform with mono FM,
the operator can select from configurations that provide 2-3 logical channels
with a total bitrate of 110kbps to 148kbps. Finally, all-digital stations can
operate in any extended service mode or at lower bitrates with different
primary channels present. Perhaps most importantly, all-digital stations can
include various combinations of secondary logical channels which can carry
yet more data.
The curious system of primary channels is one that was designed basically to
ease hardware implementation and is not very intuitive... we must remember that
when NRSC-5 was designed, embedded computing was significantly more limited.
Demodulation and decoding would have to be implemented in ASICs, and so many
aspects of the protocol were designed to ease that process. At this point it is
only important to understand that HD Radio's layer 1 can carry some combination
of 4 primary channels along with the PIDS channel, which is very low bitrate
but considered part of the primary channel feature set.
Layer 1, in summary, takes some combination of primary channels, the
low-bitrate PIDS channel, and possibly several secondary channels (only in the
case of all-digital stations) and encodes them across a set of OFDM subcarriers
arranged just outside of the FM audio bandwidth. The design of the OFDM encoding
and other features of layer 1 aid receivers in detecting and decoding this data.
Layer 2 operates on protocol data units or PDUs, effectively the packet of
NRSC-5. Specifically, it receives PDUs from services and then distributes them
to the layer 1 logical channels.
The services supported by NRSC-5 are the Main Program Service or MPS, which can
carry both audio (MPSA) and data (MPSD), the similar Supplemental Program
Service which also conveys audio and data, the Advanced Application Service
(AAS), and the Station Information Service (SIS).
MPS and SPS are where most of HD Radio happens. Each carries a program audio
stream along with program data that is related to the audio stream---things
like metadata of the currently playing track. These streams can go onto any
logical channel at layer 1, depending on the bitrate required and available.
An MPS stream is mandatory for an HD radio station, while an SPS is optional.
AAS is an optional feature that can be used for a variety of different
purposes, mostly various types of datacasting, which we'll examine later. And
finally, the SIS is the simplest of these services, as it has a dedicated
channel at layer 1 (the PIDS previously mentioned). As a result, layer 2 just
takes SIS PDUs and puts them directly on the layer 1 channel dedicated to them.
The most interesting part of layer 2 is the way that it muxes content together.
Rather than sending PDUs for each stream, NRSC-5 will combine multiple streams
within PDUs. This means that a PDU may contain only MPS or SPS audio, or it
might contain some combination of MPS or SPS with other types of data. While
this seems complicated, it has some convenient simplifying properties: PDUs can
be emitted for each program stream at a fixed rate based on the audio codec
rate. Any unused space in each PDU can then be used to send other types of
data, such as for AAS, on an as-available basis. The situation is somewhat
simplified for the receiver since it knows exactly when to expect PDUs
containing program audio, and that program audio is always the start of a PDU.
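A hypothetical sketch of this muxing idea: each PDU goes out at a fixed rate with the (variable-size) audio frame at the front, and whatever room is left over is filled from a queue of opportunistic data. The PDU size and field layout here are invented for illustration, not taken from the NRSC-5 spec.

```python
# Sketch of layer 2 muxing: audio first, spare room filled with
# opportunistic data, remainder zero-padded. All sizes are invented.

PDU_SIZE = 512  # hypothetical fixed PDU size

def build_pdu(audio: bytes, opportunistic: bytearray) -> bytes:
    """Pack one PDU: audio at the start, then as much of the
    opportunistic queue as fits, then zero padding."""
    if len(audio) > PDU_SIZE:
        raise ValueError("audio frame larger than PDU")
    spare = PDU_SIZE - len(audio)
    fill = bytes(opportunistic[:spare])
    del opportunistic[:spare]  # consume what we transmitted
    pad = bytes(PDU_SIZE - len(audio) - len(fill))
    return audio + fill + pad

queue = bytearray(b"album-art-bytes" * 100)  # pending AAS-style data
pdu = build_pdu(b"\x01" * 300, queue)        # a 300-byte audio frame
print(len(pdu), len(queue))                  # 512 1288
```

The receiver-side convenience the text describes falls out of this layout: audio is always at a known place (the front), and everything after it is a bonus.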
The MPS and one or more SPS streams, if present, are not combined together but
instead remain separate PDUs and are allocated to the logical channels in one
of several fixed schemes depending on the number of SPS present and the
broadcast configuration used by the station. In the most common configuration,
that of one logical channel on a full hybrid radio station, the MPS and up to
two SPS are multiplexed onto the single logical channel. In more complex
scenarios such as all-digital stations, the MPS and three SPS may be
multiplexed across three logical channels. Conceptually, up to seven distinct
SPS identified by a header field can be supported, although I'm not aware of
anyone actually implementing this.
It is worth discussing here some of the practical considerations around the MPS
and SPS. NRSC-5 requires that an MPS always be present, and the MPS must convey
a "program 0" which cannot be stopped and started. This is the main audio
channel on an HD radio station. The SPS, though, are used to convey
"subcarrier" stations. This is the capability behind the "HD2" second audio
channel present on some HD radio stations, and it's possible, although not at
all common, to have an HD3 or even HD4.
Interestingly, the PDU "header" is not placed at the beginning of the PDU.
Instead, its 24-bit sequence (chosen off a list based on what data types are
present in the PDU) are interleaved throughout the body of the PDU. This is
intended to improve robustness by allowing the receiver to correctly determine
the PDU type even when only part of the PDU is received. PDUs always contain
mixed data in a fixed order (program data, opportunistic data, fixed data),
with a "data delimiter" sequence after the program audio and a fixed data
length value placed at the end. This assists receivers in interpreting any
partial PDUs, since they can "backtrack" from the length suffix to identify the
full fixed data section and then search further back for the "data delimiter"
to identify the full opportunistic data section.
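The backtracking strategy can be shown with an invented layout: suppose the last two bytes give the fixed data length and a delimiter byte separates audio from opportunistic data. None of these sizes or byte values come from the NRSC-5 spec; they just demonstrate parsing a PDU from the tail forward.

```python
# Invented-layout sketch of tail-first PDU parsing. A real protocol
# would also escape the delimiter if it could occur in audio data;
# this sketch simply assumes it does not.

DELIM = b"\x7e"  # hypothetical data delimiter

def split_pdu(pdu: bytes):
    """Return (audio, opportunistic, fixed) sections of a PDU."""
    fixed_len = int.from_bytes(pdu[-2:], "big")   # length suffix
    fixed = pdu[-2 - fixed_len:-2]                # backtrack to fixed data
    rest = pdu[:-2 - fixed_len]
    audio, _, opportunistic = rest.partition(DELIM)  # search for delimiter
    return audio, opportunistic, fixed

pdu = b"AUDIO" + DELIM + b"OPPDATA" + b"FIXED" + (5).to_bytes(2, "big")
print(split_pdu(pdu))  # (b'AUDIO', b'OPPDATA', b'FIXED')
```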
And that's layer 2: audio, opportunistic data, and fixed data are collected for
the MPS and any SPS and/or AAS, gathered into PDUs, and then sent to layer 1 for
transmission. SIS is forwarded directly to layer 1 unmodified.
NRSC-5's application layer runs on top of layer 2. Applications consist most
obviously of the MPS and SPS streams, which are used mainly to convey audio...
you know, the thing that a radio station does. This can be called the Audio
Transport application and it runs the same way whether producing MPS (remember,
this is the main audio feed) or SPS (secondary audio feeds or subcarriers).
Audio transport starts with an audio encoder, which is a proprietary design
called HDC or High-Definition Coding. HDC is a DCT-based lossy compression
algorithm which is similar to AAC but adjusted to have some useful properties
for radio. Among them, HDC receives audio data (as PCM) at a fixed rate and
then emits encoded blocks at a fixed rate---but variable size. This variable
size but fixed rate is convenient to receivers but also makes "opportunistic
data," as discussed earlier, possible, because many PDUs will have spare room
at the end.
Another useful feature of HDC is its multi-stream output. HDC can be configured
to produce two different bit streams, a "core" bit stream which is lower bitrate
but sufficient to reproduce the audio at reduced quality, and an "enhanced" data
stream that allows the reproduction of higher fidelity audio. The core bit
stream can be placed on a different layer 1 channel than the enhanced data stream,
allowing receivers to decode only one channel and still produce useful audio when
the second channel is not available due to poor reception quality. This is not
typically used by hybrid stations, instead it's a feature intended for extended
and digital-only stations.
The variable size of the audio data and variable size of PDUs creates some
complexity for receivers, so the audio transport includes some extra data about
sample rate and size to assist receivers in selecting an appropriate amount of
buffering to ensure that the program audio does not underrun despite bursts of
large audio samples and fixed data. This results in a fixed latency from
encoding to decoding, which is fairly short but still a bit behind analog
radio. This latency is sometimes apparent on receivers that attempt to
automatically select between analog and digital signals, even though stations
should delay their analog audio to match the NRSC-5 encoder.
Finally, the audio transport section of each PDU (that is, the MPS or SPS part
at the beginning) contains regular CRC checksums that are used by the receiver
to ensure that any bad audio data is discarded rather than decoded.
MPS and SPS audio is supplemented by Program Service Data (PSD), which can be
either associated with the MPS (MPSD) or an SPS (SPSD). The PSD protocol
generates PDUs which are provided to the audio transport to be incorporated
into audio PDUs at the very beginning of the MPS or SPS data. The PSD is rather
low bitrate as it receives only a small number of bytes in each PDU. This is
quite sufficient, as the PSD only serves to move small, textual metadata about
the audio. Most commonly this is the title, artist, and album, although a few
other fields are included as well such as structured metadata for
advertisements, including a field for the price of the advertised deal. This
feature is rarely, if ever, used.
The PSD data is transmitted continuously in a loop, so that a receiver that has
just tuned to a station can quickly decode the PSD and display information
about whatever is being broadcast. The looping PSD data changes whenever
required, typically based on an outside system (such as a radio automation
system) sending new PSD data to the NRSC-5 encoder over a network connection.
PSD data is limited to 1024 bytes total and, at a minimum, the NRSC-5
specification requires that the title and artist fields be populated. Oddly, it
makes a half-exception for cases where no information on the audio program is
available: the artist field can be left empty, but the title field must be
populated with some fixed string. Some radio stations have added an NRSC-5
broadcast but not upgraded their radio automation to provide PSD data to the
encoder; in this case it's common to transmit the station call sign or name as
the track title, much as is the case with FM Radio Data Service.
Interestingly, the PSD data is viewed as a set of ID3 tags and, even though
very few ID3 fields are supported, it is expected that those fields be in
the correct ID3 format including version prefixes.
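To make that concrete, here is a minimal ID3v2.3 builder showing the kind of structure a PSD payload would carry: a 10-byte tag header with a synchsafe size, plus TIT2 (title) and TPE1 (artist) text frames. The ID3v2.3 layout itself is standard; whether NRSC-5 encoders emit exactly this subset is my assumption.

```python
# Minimal ID3v2.3 tag with title and artist text frames. Standard ID3
# layout; its use here as a stand-in for PSD payloads is an assumption.

def synchsafe(n: int) -> bytes:
    """ID3 tag sizes use 7 bits per byte so 0xFF never appears."""
    return bytes((n >> s) & 0x7F for s in (21, 14, 7, 0))

def text_frame(frame_id: str, text: str) -> bytes:
    body = b"\x00" + text.encode("latin-1")  # 0x00 = ISO-8859-1 encoding
    # v2.3 frame: 4-byte ID, 4-byte size (plain big-endian), 2 flag bytes
    return frame_id.encode() + len(body).to_bytes(4, "big") + b"\x00\x00" + body

def id3_tag(title: str, artist: str) -> bytes:
    frames = text_frame("TIT2", title) + text_frame("TPE1", artist)
    # "ID3", version 2.3.0, no flags, synchsafe size of the frames
    return b"ID3\x03\x00\x00" + synchsafe(len(frames)) + frames

tag = id3_tag("What Is Love", "Haddaway")
print(tag[:3], len(tag))  # b'ID3' 52
```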
Perhaps the most sophisticated feature of NRSC-5 is the Advanced Application
Service transport or AAS. AAS is a flexible system intended to send just about
any data alongside the audio programs. Along with each PDU, the audio transport
generates metadata indicating how much of the PDU's length is used.
The AAS can use that value to determine how many bytes are free, and then fill
them with opportunistic data of whatever type it likes. As a result, the AAS
basically takes advantage of any "slack" in the radio broadcast's capacity, as
well as reserving a portion for fixed data if desired by the station operator.
AAS data is encoded into AAS packets, an organizational unit independent of
PDUs (and included within PDUs generated by the audio transport) and loosely
based on computer networking conventions. Interestingly, AAS packets may be
fragmented or combined to fit into available space in PDUs. To account for this
variable structure, AAS specifies a transport layer below AAS packets which is
based on HDLC (ISO high-level data link control) or PPP (point-to-point
protocol, which is closely related to HDLC and very similar). So, in a way, AAS
consists of a loosely computer-network-like protocol over a protocol roughly
based on PPP over audio transport PDUs over OFDM.
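HDLC/PPP-style framing of the kind AAS borrows is easy to sketch: frames are delimited by a 0x7E flag byte, and any 0x7E or 0x7D inside the payload is escaped as 0x7D followed by the byte XOR 0x20. This is the generic byte-stuffing scheme from PPP-in-HDLC framing, not the exact NRSC-5 byte layout.

```python
# Generic HDLC/PPP-style byte stuffing (as in RFC 1662), shown as an
# illustration of the framing family AAS is loosely based on.

FLAG, ESC = 0x7E, 0x7D

def frame(payload: bytes) -> bytes:
    out = bytearray([FLAG])
    for b in payload:
        if b in (FLAG, ESC):
            out += bytes([ESC, b ^ 0x20])  # escape reserved bytes
        else:
            out.append(b)
    out.append(FLAG)
    return bytes(out)

def unframe(data: bytes) -> bytes:
    body = data[1:-1]  # strip the flag bytes
    out, i = bytearray(), 0
    while i < len(body):
        if body[i] == ESC:
            out.append(body[i + 1] ^ 0x20)  # undo the escape
            i += 2
        else:
            out.append(body[i])
            i += 1
    return bytes(out)

msg = bytes([0x01, 0x7E, 0x02, 0x7D])
assert unframe(frame(msg)) == msg
```

The virtue of this scheme for AAS's purposes is that frame boundaries survive arbitrary fragmentation: a receiver can always resynchronize by scanning for the next flag byte.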
Each AAS packet header specifies a sequence number for reconstruction of large
payloads and a port number, which indicates to the receiver how it should
handle the packet (or perhaps instead ignore the packet). A few ranges of
port numbers are defined, but the vast majority are left to user applications.
Port numbers are two bytes, and so there's a large number of applications
possible. Very few are defined by specification, limited basically to port
numbers for supplemental PSD. This might be a bit confusing since PSD has its
own reserved spot at the beginning of the audio transport. The PSD protocol
itself is limited to only small amounts of text, and so when desired AAS can
be used to send larger PSD-type payloads. The most common application of this
"extra PSD" is album art, which can be sent as a JPG or PNG file in the AAS
stream. In fact, multiple ports are reserved for each of MPSD (main PDS) and
SPSD, allowing different types of extra data to be sent via AAS.
Ultimately, the AAS specification is rather thin... because AAS is a highly
flexible feature that can be used in a number of ways. For example, AAS forms
the basis of the Artist Experience service which allows for delivery of more
complete metadata on musical tracks including album art. AAS can be used as
the basis of almost any datacasting application, and is applied to everything
from live traffic data to distribution of educational material to rural areas.
Finally, in our tour of applications, we should consider the station information
service or SIS. SIS is a very basic feature of NRSC-5 that allows a station to
broadcast its identification (call sign and name) along with some basic services
like a textual message related to the station and emergency alert system
messages. SIS has come up somewhat repeatedly here because it receives special
treatment; SIS is a very simple transport at a low bitrate and has its own
dedicated logical channel for easy decoding. As a result, SIS PDUs are typically
the first thing a receiver attempts to decode, and are very short and simple.
To sum up the structure of HD radio, it is perhaps useful to look at it as a
flow process: SIS data is generated by the encoder and sent to layer 2, which
passes it directly to layer 1, where it is transmitted on its own logical
channel. PSD data is provided to the audio transport which embeds it at the
beginning of audio PDUs. The audio transport informs the AAS encoder of the
amount of available free space in a PDU, and the AAS encoder provides an
appropriate amount of data to the audio transport to be added at the end of the
PDU. This PDU is then passed to layer 2 which encapsulates it in a complete
NRSC-5 PDU and arranges it into logical channels which are passed to layer 1.
Layer 1 encodes the data into multiple OFDM carriers using a somewhat complex
scheme that produces a digital signal that is easy for receivers to recover.
Non-Audio Applications of NRSC-5
While the NRSC-5 specification is clearly built mostly around transporting the
main and secondary program audio, the flexibility of its data components like
PSD and AAS allows its use for purposes other than audio. As a very simple
example, SIS packets include a value called the "absolute local frame number"
or ALFN that is effectively a timestamp, useful for receivers to establish the
currency of emergency alert messages and for various data applications.
Because the current time can be easily calculated from the ALFN, it can be used
to set the clocks on HD radio receivers such as car head units. To support
this, standard SIS fields include information on local time zone, daylight
savings time, and even upcoming leap seconds.
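The clock calculation itself is simple. The relationship used by the open-source nrsc5 decoder is that the ALFN counts L1 frames of 65536 samples at 44100 Hz elapsed since the GPS epoch (January 6, 1980); treat both constants here as assumptions rather than quotes from the spec.

```python
# ALFN-to-wall-clock sketch. Frame duration and epoch are assumptions
# (the values the open source nrsc5 decoder uses), not spec citations.
from datetime import datetime, timedelta, timezone

GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)
FRAME_SECONDS = 65536 / 44100  # ~1.486 s per L1 frame

def alfn_to_utc(alfn: int) -> datetime:
    """Convert an absolute L1 frame number to UTC. Ignores leap
    seconds, which a real receiver would apply from the SIS fields."""
    return GPS_EPOCH + timedelta(seconds=alfn * FRAME_SECONDS)

print(alfn_to_utc(0))  # 1980-01-06 00:00:00+00:00
```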
SIS packets include a one-bit flag that indicates whether or not the ALFN is
being generated based on a GPS-locked time source, or based on the NRSC-5
encoder's internal clock only. To avoid automatically adjusting radio clocks to
an incorrect time (something that had plagued the earlier CEA protocol for
automatic setting of VCR clocks via PBS member stations), NRSC-5 dictates that
receivers must not set their display time based on a radio station's ALFN
unless the flag indicating GPS lock is set. Unfortunately, it seems that it's
rather uncommon for radio stations to equip their encoder with a GPS time
source, and so in the Albuquerque market at least HD Radio-based automatic time
setting does not work.
Other supplemental applications were included in the basic SIS as well, notably
emergency alert messages. HD Radio stations can transmit emergency alert
messages in text format with start and end times. In practice this seems to be
appreciably less successful than the more flexible capability of SiriusXM,
and ironically despite its cost to the consumer SiriusXM might have better
market penetration than HD Radio.
NRSC-5's data capabilities can be used to deliver an enhanced metadata
experience around the audio programming. The most significant implementation of
this concept is the "artist experience" service, a non-NRSC-5 standard
promulgated by the HD Radio alliance that uses the AAS to distribute more
extensive metadata including album art in image format. This is an appreciably
more complex process and so is basically expected to be implemented in software
on a general-purpose embedded operating system, rather than the hardware-driven
decoding of audio programming and basic metadata. Of course this greater
complexity led more or less directly to the recent incident with Mazda HD
radio receivers in Seattle, triggered by a station inadvertently transmitting
invalid Artist Experience data in a way that seems to have caused the Mazda
infotainment system to crash during parsing. Fortunately infotainment-type HD
radio receivers typically store HD Radio metadata in nonvolatile memory to
improve startup time when tuning to a station, so these Mazda receivers
apparently repeatedly crashed every time they were powered on to such a degree
that it was not possible to change stations (and avoid parsing the cached
invalid file). Neat.
Since Artist Experience just sends JPG or PNG files of album art, we know that
AAS can be used to transmit files in general (and looking at the AAS protocol
you can probably easily come up with a scheme to do so). This opens the door to
"datacasting," or the use of broadcast technology to distribute computer data.
I have written on this topic before.
To cover the elements specific to our topic, New Mexico's KANW and some other
public radio stations are experimenting with transmitting educational materials
from local school districts as part of the AAS data stream on their HD2
subcarrier. Inexpensive dedicated receivers collect these files over time and
store them on an SD card. These receiver devices also act as WiFi APs and offer
the stored contents via an embedded web server. This allows the substantial
population of individuals with phones, tablets, or laptops but no home internet
or cellular service to retrieve their distance education materials at home,
without having to drive into town for cellular service (the existing practice
in many parts of the Navajo Nation, for example).
There is potential to use HD Radio to broadcast traffic information services,
weather information, and other types of data useful to car navigation systems.
While there's a long history of datacasting this kind of information via radio,
it was never especially successful and the need has mostly been obsoleted by
ubiquitous LTE connectivity. In any case, the enduring market for this type of
service (over-the-road truckers for example) has a very high level of SiriusXM
penetration and so already receives this type of data.
Fall of the House of Hybrid Digital
In fact, the satellite angle is too big to ignore in an overall discussion of
HD Radio. Satellite radio was introduced to the US at much the same time as HD
Radio, although XM proper was on the market slightly earlier. Satellite has the
significant downside of a monthly subscription fee. However, time seems to have
shown that the meaningful market for enhanced broadcast radio consists mostly
of people who are perfectly willing to pay a $20/mo subscription for a
meaningfully better service. Moreover, it consists heavily of people involved in
the transportation industry (Americans listen to the radio basically only in
vehicles, so it makes sense that the most dedicated radio listeners are those
who spend many hours in motion). Since many of these people regularly travel
across state lines, a nationwide service is considerably more useful than one
where they have to hunt for a new good station to listen to as they pass
through each urban area.
All in all, HD radio is not really competitive for today's serious radio
listeners because it fails to address their biggest complaint, that radio is
too local. Moreover, SiriusXM's ongoing subscription revenue seems to provide a
much stronger incentive to quality than iHeartMedia's declining advertising
relationships. The result is that, for the most part, the quality of
SiriusXM programming is noticeably better than most commercial radio stations,
giving it a further edge over HD Radio.
Perhaps HD Radio is simply a case of poor product-market fit, SiriusXM having
solved essentially the same problems but much better. Perhaps the decline of
broadcast media never really gave it a chance. The technology is quite
interesting, but adoption is essentially limited to car stereos, and not even
that many of them. I suppose that's the problem with broadcast radio in
general.
 The details here are complex and deserve their own post, but as a general
idea the FCC attempts to maintain a diversity of radio programming in each
market by refusing licenses to stations proposing a format that is already used
by other stations. Unfortunately there are relatively few radio formats that
are profitable to operate, so the broadcasting conglomerates tend to end up
playing games with operating stations in minor formats, at little profit, in
order to argue to the FCC that enough programming diversity is available to
justify another top 40 or "urban" station.
 The term "subcarrier" is used this way basically for historical reasons and
doesn't really make any technical sense. It's better to think of "HD2" as being
a subchannel or secondary channel, but because of the long history of radio
stations using actual subcarrier methods to convey an alternate audio stream
the subcarrier term has stuck.
 It seems inevitable that, as has frequently happened in the history of
datacasting, improving internet access technology will eventually obsolete this
concept. I would strongly caution you against thinking this has already
happened, though: even ignoring the issue of the long and somewhat undefined
wait, Starlink is considerably more expensive than the typical rates for rural
internet service in New Mexico. It is to some extent a false dichotomy to say
that Starlink is cost uncompetitive with DSL considering that it can service a
much greater area. However, I think a lot of "city folk" are used to the
over-$100-per-month pricing typical of urban gigabit service and so view
Starlink as inexpensive. They do not realize that, for all the downsides of
rural DSL, it is very cheap. This reflects the tight budget of its consumers.
For those who have access, CenturyLink DSL in New Mexico ranch country is
typically $45/mo no-contract with no install fee and many customers use a
$10/mo subsidized rate for low income households. Starlink's $99/mo and $500
initial is simply unaffordable in this market, especially since those outside
of the CenturyLink service area have, on average, an even lower disposable
income than those clustered near towns and highways.
 It is hard for me not to feel like iHeartMedia brought this upon
themselves. They gained essentially complete control of the radio industry
(with only even sadder Cumulus as a major competitor) and then squeezed it for
revenue until US commercial radio programming had become, essentially, a joke.
Modern commercial radio stations run on exceptionally tight budgets that have
mostly eliminated any type of advantage they might have had due to their
locality. This is most painfully apparent when you hear an iHeartMedia station
give a rare traffic update (they seem to view this today as a mostly pro forma
activity and do it as little as possible) in which the announcer pronounces
"Montaño" and, more puzzlingly, "Coors" wrong in the span of a single sentence.
I have heard a rumor that all of the iHeartMedia traffic announcements are done
centrally from perhaps Salt Lake City but I do not know if this is true.
I started writing a post about media container formats, and then I got severely
sidetracked by explaining how MPEG elementary streams aren't in a container but
still have most of the features of containers and had a hard time getting back
to topic until I made the decision that I ought to start down the media rabbit
hole with something more basic. So let's talk about an ostensibly basic audio
format: PCM.
PCM stands for Pulse Code Modulation and, fundamentally, it is a basic
technique for digitization of analog data. PCM is so obvious that explaining it
is almost a bit silly, but here goes: given an analog signal, at regular
intervals the amplitude of the signal is measured and quantized to the nearest
representable number (in other words, rounded). The resulting "PCM signal" is
this sequence of numbers. If you remember your Nyquist and Shannon from college
data communications, you might realize that the most important consideration in
this process is that the sampling frequency must be at least twice the highest
component in the signal to be digitized.
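As a minimal sketch (the function and parameter names here are my own invention, not any real codec's API), PCM encoding amounts to modeling the analog signal as a function of time, sampling it at fixed intervals, and rounding each sample to the nearest representable integer:

```python
import math

def pcm_encode(signal, sample_rate, duration, bit_depth):
    """Sample an analog signal (a function of time) at regular
    intervals and quantize each sample to the nearest representable
    integer -- the essence of linear PCM."""
    max_code = 2 ** (bit_depth - 1) - 1   # e.g. 127 for 8-bit samples
    samples = []
    for n in range(int(sample_rate * duration)):
        t = n / sample_rate
        amplitude = signal(t)             # assume signal in [-1.0, 1.0]
        samples.append(round(amplitude * max_code))
    return samples

# A 1kHz sine tone, sampled at 8kHz with 8-bit depth
tone = lambda t: math.sin(2 * math.pi * 1000 * t)
samples = pcm_encode(tone, 8000, 0.001, 8)
```

The "PCM signal" is just the list of integers that comes out the other end.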
In the telephone network, for example, PCM encoding is performed at 8kHz. This
might seem surprisingly low, but speech frequencies trail off above 3kHz and so
the up-to-4kHz represented by 8kHz PCM is perfectly sufficient for intelligible
speech. It is not particularly friendly to music, though, which is part of why
hold music is the way it is. For this reason, in music and general digital
audio a sampling rate of 44.1kHz is conventional due to having been selected
for CDs. Audible frequencies are often defined as being "up to 20kHz" although
few people can actually hear anything that high (my own hearing trails off at
14kHz, attributable to a combination of age and adolescent exposure to nu
metal). This implies a sampling rate of 40kHz; the reason that CDs use 44.1kHz
is essentially that they wanted to go higher for comfort and 44.1kHz was the
highest they could easily go on the equipment they had at the time. In other
words, there's no particular reason, but it's an enduring standard.
Another important consideration in PCM encoding is the number of discrete
values that samples can possibly take. This is commonly expressed as the number
of bits available to represent each sample and called "bit depth." For example,
a bit depth of eight allows each sample to have one of 256 values that we might
label -128 through 127. The bit depth is important because it limits the
dynamic range of the signal. Dynamic range, put simply, is the greatest
possible variation in amplitude, or the greatest possible variation between
quiet and loud. Handling large dynamic ranges can be surprisingly difficult in
both analog and digital systems, since both electronics and algorithms struggle
to handle values that span multiple orders of magnitude.
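The relationship between bit depth and dynamic range is easy to put numbers on: each additional bit doubles the ratio between the largest representable amplitude and the quantization step, which works out to roughly 6dB per bit. A quick back-of-the-envelope calculation:

```python
import math

def dynamic_range_db(bit_depth):
    # Ratio between the largest representable amplitude and the
    # quantization step, expressed in decibels: 20*log10(2^n)
    return 20 * math.log10(2 ** bit_depth)

for bits in (8, 16, 24):
    print(f"{bits}-bit: ~{dynamic_range_db(bits):.0f} dB")
```

That gives about 48dB for 8-bit audio, 96dB for CD-quality 16-bit, and 144dB for 24-bit.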
In PCM encoding, bit depth has a huge impact on the resulting bitrate. 16-bit
audio, as used on CDs, is capable of a significantly higher dynamic range than
8-bit audio at the cost of doubling the bitrate. Dynamic range is important in
music, but is also surprisingly important in speech, and a bit depth of 8
is actually insufficient to reproduce speech that will be easy to understand.
And yet, due to technical constraints, 8kHz and 8-bit samples were selected for
telephone calls. So how is speech acceptably carried over 8-bit PCM?
We need to talk a bit about the topics of compression and companding. There can
be some confusion here because "compression" is commonly used in computing to
refer to methods that reduce the bitrate of data. In audio engineering, though,
compression refers to techniques that reduce the dynamic range of audio, by
making quieter sounds louder and louder sounds quieter until they tend to
converge at a fixed volume. Like some other writers, I will use "dynamic
compression" when referring to the audio technique to avoid confusion. For both
practical and aesthetic reasons (not to mention, arguably, stupid reasons),
some degree of dynamic compression is applied to most types of audio that we
hear.
Companding, a portmanteau of compressing and expanding, is a method used to
pack a wide dynamic range signal into a channel with a smaller dynamic range.
As the name suggests, companding basically consists of compressing the signal,
transmitting it, and then expanding it. How can the signal be expanded, though,
given that dynamic range was lost when it was compressed? The trick is that
both sides of a compander are non-linear, compressing loud sounds more than
quiet sounds. This works well, because in practice many types of audio show a
non-linear distribution of amplitudes. In the case of speech, for example,
significantly more detail is found at low volume levels, and yet occasional
peaks must be preserved for good intelligibility.
In practice, companding is so commonly used with PCM that the compander is
often considered part of the PCM coding. When I have described PCM thus far, I
have been describing linear PCM or LPCM. LPCM matches each sample against a set
of evenly distributed discrete values. Many actual PCM systems use some form of
non-linear PCM in which the possible sample values are distributed
logarithmically. This makes companding part of PCM itself, as the encoder
effectively compresses and decoder effectively expands. One way to illustrate
this is to consider what would happen if you digitized audio using a non-linear
PCM encoder and then played it back using a linear PCM decoder: It would sound
compressed, with the quieter components moved into a higher-valued, or louder,
range.
Companding does result in a loss of fidelity, but it's one that is not very
noticeable for speech (or even for music in many cases) and it results in a
significant savings in bit depth. Companding is ubiquitous in speech coding.
One of the weird things you'll run into with PCM is the difference between
µ-law PCM and A-law PCM. In the world of telephony, a telephone call is usually
encoded as uncompressed 8kHz, 8-bit PCM, resulting in the 64kbps bitrate that
has become the basic unit of bandwidth in telecom systems. Given the simplicity
of uncompressed PCM, it can be surprising that many telephony systems like VoIP
software will expect you to choose from two different "versions" of PCM. The
secret of telephony PCM is that companding is viewed as part of the PCM codec,
and for largely historic reasons there are two common algorithms in use. The
actual difference is the function or curve used for companding, or in other
words, the exact nature of the non-linearity. In the US and Japan (owing to
post-WWII history Japan's phone system is very similar to that of the US), the
curve called µ-law is in common use. In Europe and most other parts of the
world, a somewhat different curve is used, called A-law. In practice the
difference between the two is not particularly significant, and it's difficult
to call one better than the other since both just make slightly different
trade-offs of dynamic range for quantization error (A-law is the option with
greater dynamic range and greater possible distortion).
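For illustration, here is the continuous µ-law curve in Python. Note this is a sketch of the underlying math, not a wire-compatible codec: real G.711 implementations use a segmented, table-driven approximation of this curve and pack the result into 8-bit codewords.

```python
import math

MU = 255  # the "mu" in mu-law, per ITU-T G.711

def mu_law_compress(x):
    """Compress a sample in [-1.0, 1.0] with the mu-law curve:
    quiet samples get more of the output range than loud ones."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y):
    """Inverse of the compressor, applied on the receiving end."""
    return math.copysign((math.exp(abs(y) * math.log1p(MU)) - 1) / MU, y)

# A quiet sample is pushed well up into the output range...
compressed = mu_law_compress(0.01)   # roughly 0.23
# ...and the round trip recovers the original value.
restored = mu_law_expand(compressed)
```

The non-linearity is easy to see: an input at 1% of full scale occupies over 20% of the compressed range, which is exactly the "more detail at low volume levels" property that suits speech.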
Companding is rarely applied in music and general multimedia applications. One
way to look at this is to understand the specializations of different audio
codecs: µ-law PCM and A-law PCM are both simple examples of what are called
speech codecs, Speex and Opus being more complex examples that use lossy
compression techniques for further bitrate reduction (or better fidelity at
64kbps). Speech codecs are specialized for the purpose of speech and so make
assumptions that are true of speech including a narrow frequency range and
certain temporal characteristics. Music fed through speech codecs tends to
become absolutely unlistenable, particularly with lossy speech codecs, as the
hold music on GSM cellphones painfully illustrates.
In multimedia audio systems, we instead have to use general-purpose audio
codecs, most of which were designed around music. Companding is effectively a
speech coding technique and is left out of these audio systems. PCM is still
widely used, but in general audio PCM is assumed to imply linear PCM.
As previously mentioned, the most common convention for PCM audio is 44.1kHz at
16 bits. This was the format used by CDs, which effectively introduced digital
audio to the consumer market. In the professional market, where digital audio
has a longer history, 48kHz is also in common use... however, you might be able
to tell just by mathematical smell that conversion from 48kHz to 44.1kHz is
prone to distortion problems due to the inconveniently large common multiple of
the two sample rates. An increasingly common sample rate in consumer audio is
96kHz, and "high resolution audio" usually refers to 96kHz, 24-bit samples.
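The arithmetic behind that "mathematical smell" is straightforward: 44.1kHz and 48kHz reduce to the awkward ratio 147:160, so resampling between them has to interpolate over a very long common period, while 96kHz is an exact multiple of 48kHz.

```python
import math

# The two common sample rates reduce to the inconvenient ratio 147:160
g = math.gcd(44100, 48000)            # 300
ratio = (44100 // g, 48000 // g)      # (147, 160)
lcm = 44100 * 48000 // g              # 7,056,000 Hz common multiple

# 96kHz, by contrast, divides evenly by 48kHz
assert 96000 % 48000 == 0
```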
There is some debate over whether or not 96kHz sampling is actually a good
idea. Remembering our Nyquist-Shannon, note that all of the extra fidelity we
get from the switch from 44.1kHz to 96kHz sampling is outside of the range
detectable by even the best human ears. In practice the bigger advantage of
96kHz is probably that it is an even multiple of the 48kHz often used by
professional equipment and thus eliminates effects from sample rate conversion.
On the other hand, there is some reason to believe that the practicalities of
real audio reproduction systems (namely the physical characteristics of
speakers, which are designed for reproduction of audible frequencies) cause
the high frequency components preserved by 96kHz sampling to turn into
distortion at lower, audible frequencies... with the counterintuitive result
that 96kHz sampling may actually reduce subjective audio quality, when
reproduced through real amplifiers and speakers. In any case, the change to
24-bit samples is certainly useful as it provides greater dynamic range.
Unfortunately, much like "HDR" video (which is the same concept, a greater
sample depth for greater dynamic range), most real audio is 16-bit and so
playback through a 24-bit audio chain requires scaling that doesn't typically
produce distortion but can reveal irritating bugs in software and equipment.
Fortunately the issue of subjective gamma, which makes scaling of non-HDR video
to HDR display devices surprisingly complex, is far less significant in the
case of audio.
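The scaling itself is trivial; the straightforward approach (though not the only one, since some implementations dither or shape the low bits) is a left shift:

```python
def scale_16_to_24(sample_16):
    """Widen a signed 16-bit sample to 24 bits by shifting left
    8 bits; the waveform is unchanged, only the scale grows."""
    return sample_16 << 8

# Full-scale 16-bit values land just inside the 24-bit range
loudest = scale_16_to_24(32767)    # 8,388,352 (24-bit max is 8,388,607)
quietest = scale_16_to_24(-32768)  # -8,388,608, the 24-bit minimum
```

The bugs mentioned above tend to live in code that gets this shift (or the sign extension around it) wrong.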
PCM audio, at whatever sample rate and bit depth, is not so often seen in the form
of files because of its size. That said, the "WAV" file format is a simple
linear PCM encoding stored in a somewhat more complicated container. PCM is far
more often used as a transport between devices or logical components of a
system. For example, if you use a USB audio device, the computer is sending a
PCM stream to the device. Unfortunately Bluetooth does not afford sufficient
bandwidth for multimedia-quality PCM, so our now ubiquitous Bluetooth audio
devices must use some form of compression. A now less common but clearer
example of PCM transport is found in the form of S/PDIF, a common consumer
digital audio transport that can carry two 44.1 or 48kHz 16-bit PCM channels
over a coaxial or fiber-optic cable.
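The size problem is easy to quantify, since uncompressed PCM bitrate is just the product of sample rate, bit depth, and channel count:

```python
def pcm_bitrate(sample_rate, bit_depth, channels):
    # Uncompressed PCM bitrate in bits per second
    return sample_rate * bit_depth * channels

cd = pcm_bitrate(44100, 16, 2)    # 1,411,200 bps for CD / S/PDIF stereo
phone = pcm_bitrate(8000, 8, 1)   # 64,000 bps, telecom's basic unit
```

An hour of CD-quality stereo is over 600MB, which is why PCM files are rare but PCM transports are everywhere.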
You might wonder how this relates to the most common consumer digital audio
transport today, HDMI. HDMI is one of a confusing flurry of new video standards
that were developed as a replacement for the analog VGA, but HDMI originated
more from the consumer A/V part of the market (the usual Japanese suspects,
mostly) and so is more associated with televisions than the (computer industry
backed) DisplayPort standard. A full treatment of HDMI's many features and
misfeatures would be a post of its own, but it's worth mentioning the forward
audio channel here.
HDMI carries the forward (main, not return) audio channel by interleaving it
with the digital video signal during the "vertical blanking interval," a concept
that comes from the mechanical operation of CRT displays but has remained a
useful way to take advantage of excess bandwidth in a video channel. The term
vertical blanking is now somewhat archaic but the basic idea is that
transmitting a frame takes less time than the frame is displayed for, and so
the unoccupied time between transmitting each frame can be used to transmit
other data. The HDMI spec allows for up to 8 channels of 24-bit PCM, at up to
192kHz sampling rate---although devices are only required to support 2 channels.
Despite the capability, 8-channel (usually actually "7.1" channel in the A/V
parlance) audio is not commonly seen on HDMI connections. Films and television
shows more often distribute multi-channel audio in the form of a compressed
format designed for use on S/PDIF, most often Dolby Digital and DTS (Xperi).
In practice the HDMI audio channel can move basically any format so long as the
devices on the ends support it. This can lead to some complexity in practice,
for example when playing a blu-ray disc with 7.1 channel DTS audio from a
general-purpose operating system that usually outputs PCM stereo. High-end HDMI
devices such as stereo receivers have to support automatic detection of a range
of audio formats, while media devices have to be able to output various formats
and often switch between them during operation.
On HDMI, the practicalities of inserting audio in the vertical blanking
interval requires that the audio data be packetized, or split up into chunks so
that it can be divided into the VBI and then reassembled into a continuous
stream on the receiving device. This concept of packetized audio and/or video
data is actually extremely common in the world of media formats, as
packetization is an easy way to achieve flexible muxing of multiple independent
streams. And that promise, that we are going to talk about packets, seems like
a good place to leave off for now. Packets are my favorite things!
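As a toy illustration of the general idea (this is not the actual HDMI "data island" format, just the packetize-then-reassemble pattern):

```python
def packetize(stream, packet_size):
    """Split a continuous byte stream into fixed-size chunks, as a
    transport like HDMI must do to fit audio into blanking intervals."""
    return [stream[i:i + packet_size]
            for i in range(0, len(stream), packet_size)]

def reassemble(packets):
    """The receiver concatenates the packets back into a stream."""
    return b"".join(packets)

pcm = bytes(range(10))
packets = packetize(pcm, 4)       # three chunks: 4 + 4 + 2 bytes
assert reassemble(packets) == pcm
```

Real packetized formats add headers for identification and timing, which is where the muxing flexibility comes from.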
Later on computer.rip: MPEG. Not much about the compression, but a lot about
the physical representations of MPEG media, such as elementary streams,
transport streams, and containers. These are increasingly important topics as
streaming media becomes a really common software application... plus it's all
pretty interesting and helps to explain the real behavior of terrible Hulu TV
apps.
A brief P.S.: If you were wondering, there is no good reason that PCM is called
PCM. The explanation seems to just be that it was developed alongside PWM and
PPM, so the name PCM provided a pleasing symmetry. It's hard to actually make
the term make a lot of sense, though, beyond that "code" was often used in the
telephone industry to refer to numeric digital channels.