Today, as New Mexico celebrates 4/20 early, seems an appropriate time to talk
about bhang... or rather, the bhangmeter.
The name of the bhangmeter seems to have been a joke by its designer and Nobel
laureate Frederick Reines, although I must confess that I have never totally
gotten it (perhaps I simply haven't been high enough). In any case, the
bhangmeter is one of the earliest instruments designed for the detection of a
nuclear detonation. In short, a bhangmeter is a photosensor with accompanying
discrimination circuits (or today digital signal processing) that identify the
"double flash" optical and heat radiation pattern which is characteristic
of a nuclear detonation.
The double flash originates from the extreme nature of the period immediately
after a nuclear detonation: the detonation creates an immense amount of heat
and light, but very quickly the ionized shockwave emerging from the explosion
actually blocks much of the light output. As the shockwave expands and loses
energy, the light can escape again. The first pulse is only perhaps a
millisecond long and has very sharp edges, while the second pulse appears more
slowly and as much as a second or so later (depending on weapon type and
yield).
The immensely bright light of a nuclear detonation, accompanied by this double
flash intensity pattern, is fairly unique and has been widely used for remote
sensing for nuclear weapons. Today this is mostly done by GPS and other
military satellites using modern optical imaging sensors, and the same
satellites watch for other indications of a nuclear detonation, such as an
X-ray pulse, for confirmation. The bhangmeter itself, though, dates back to 1948 and
always showed potential for large-area, automated monitoring.
The United States' first effort at large-scale automated nuclear detonation
monitoring was entrusted to the Western Union company, at the time the nation's
largest digital communications operator. By 1962, Western Union had completed
build-out of the uncreatively named Bomb Alarm System (BAS). BAS covered 99
locations which were thought to be likely targets for nuclear attack, and was
continuously monitored (including state of health and remote testing) from six
master control stations. It operated until the late '60s, when improved space
technology began to obsolete such ground-based systems.
Let's spend some time to look at the detailed design of the BAS, because it
has some interesting properties.
At each target site, three sensors were placed in a circle of eleven miles
radius, roughly 120 degrees apart. This distance was chosen so that the
expected sensitivity of the sensors in poor weather would result in a
detonation at the center of the circle triggering all three, and because it
allowed ample time for a sensor to finish transmitting its alarm before it was
destroyed by shockwave-driven debris. If a nuclear weapon were to detonate off
center, it might destroy one station, but the other two should complete
transmission of the alarm. This even allowed a very basic form of
localization, since the pattern of which sensors reported could hint at where
the detonation occurred.
The sensors were white aluminum cylinders mostly mounted to the top of
telephone poles, although some were on building roofs. On casual observation
they might have been mistaken for common pole-top transformers except that each
had a small cylindrical Fresnel lens sticking out of the top, looking not
unlike a maritime obstruction light. The Fresnel lens focused light from any
direction towards a triangular assembly of three small photocells. A perforated
metal screen between the lens and the photocells served both to attenuate light
(since the expected brightness of a nuclear detonation was extremely high) and
as a mounting point for a set of xenon flash bulbs that could be activated
remotely as a self-test mechanism.
In the weatherproof metal canister below the lens was a substantial set of
analog electronics which amplified the signal from the photocells and then
checked for a bright pulse with a rise time of less than 30ms, a brightness
roughly equivalent to that of the sun, and a decay to half brightness within
30ms. A second pulse had to reach the same brightness within one second and
decay within one second.
Should such a double flash be detected, the sensor interrupted the 1100Hz
"heartbeat" tone modulated onto its power supply and instead emitted 920Hz for
one second followed by 720Hz for one second. These power supply lines, at 30vdc
(give or take the superimposed audio frequency tone), could run for up to 20
miles until reaching a signal generating station (SGS).
The SGS was a substantial equipment cabinet installed indoors that provided the
power supply to the sensor and, perhaps more importantly, monitored the tone
provided by the sensor. The SGS itself is very interesting, and seems to have
been well ahead of its time in terms of network design principles.
Long series of SGS could be connected together in a loop of telegraph lines.
Each SGS, when receiving a message on its inbound line, decoded and re-encoded
it to transmit on its outbound line. In this way the series of SGS functioned
as a ring network with digital regeneration at each SGS, allowing for very long
distances. This was quite necessary as the SGS rings each spanned multiple
states, starting and ending at one of the three master control stations.
Further, SGS performed basic collision avoidance by waiting for inbound
messages to complete before sending outbound messages, allowing the ring
network to appropriately queue up messages during busy periods.
During normal operation, the master control station transmitted into the ring a
four-character "poll" command, which seems to have been BBBG. This is based on
a telegraph tape shown in a testing document; it is not clear whether this was
always the signal used, but BBBG does have an interesting pattern property in
Baudot that suggests it may have been chosen as a polling message as a way of
testing timing consistency in the SGS. An SGS failing to maintain its
Baudot clock would have difficulty differentiating "B" and "G" and so would
fail to respond to polls and thus appear to be offline.
In response to the poll, each station forwarded on the poll message and checked
the tone coming from its attached sensor. If the normal heartbeat or "green"
tone was detected, it sent a "green" status report. For example, "JGBW," where
the first three characters are an identifier for the SGS. Should it fail to
detect a tone, it could respond with a trouble or "yellow" status, although I
don't have an example of that message.
Since each station sending its status would tie up the line, stations further
down would have to wait to report their status. The way this queuing worked
out, a noticeable amount of time after initiating the poll (around ten seconds
by my very rough estimation) the master control station would receive its own
poll command back, followed by green or yellow status messages from each SGS
in the loop, in order. This process, repeated every couple of minutes, was
the routine monitoring procedure.
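The poll round-trip can be sketched as a toy simulation. The "BBBG" poll and the "JGBW"-style green response are modeled on the examples above; the ring order, the second station ID, and the trouble-status letter are all invented for illustration:

```python
# Toy simulation of one BAS poll cycle. "BBBG" and the "JGBW"-style
# green status come from the text; everything else (station IDs,
# the "T" trouble letter) is made up for the sketch.

def poll_cycle(station_ids, sensor_ok):
    """Simulate a poll traveling around the SGS ring.

    station_ids: SGS identifiers in ring order, e.g. ["JGB", "KLM"]
    sensor_ok:   dict mapping station id -> True if the 1100Hz
                 heartbeat tone is heard from the attached sensor

    Returns the sequence of messages arriving back at the master
    control station: its own poll first, then statuses in ring order.
    """
    line = ["BBBG"]   # the master's poll circulates first
    for sid in station_ids:
        # Each SGS regenerates inbound traffic, then waits for the
        # line to go idle before appending its own status report.
        line.append(sid + ("W" if sensor_ok[sid] else "T"))
    return line
```

Run against two stations, one with a dead sensor, the master hears its own poll followed by one green and one trouble report, in ring order.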
Any SGS which failed to receive a poll command for 2.5 minutes would
preemptively send a status message. This might seem odd at first, but it was a
very useful design feature as it could be used to locate breaks in the loop. A
damaged telegraph line would result in no responses except for 2.5 minute
status messages from all of the SGS located after the break. This localized
the break to one section of the loop, a vital requirement for a system where
the total loop length could be over a thousand miles.
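The break-localization trick can be made concrete: since a break silences every station upstream of it (their traffic can't pass the break to reach the master) while stations downstream time out and send unsolicited statuses, the break must lie just before the first station heard from. This sketch, with invented names, shows the inference:

```python
# Sketch of localizing a ring break from 2.5-minute timeout messages.
# Assumes traffic flows master -> first station -> ... -> last
# station -> master; all identifiers here are hypothetical.

def locate_break(ring_order, heard_from):
    """Infer the broken section from unsolicited status messages.

    ring_order: SGS identifiers in ring order
    heard_from: set of stations whose timeout statuses reached the master

    Stations before the break never get their responses past it, and
    stations after it never see polls, so they time out; the break is
    between the last silent station and the first one heard from.
    """
    for i, sid in enumerate(ring_order):
        if sid in heard_from:
            upstream = ring_order[i - 1] if i > 0 else "master"
            return (upstream, sid)
    return None  # nothing heard: no break detected, or total failure
```

With a four-station loop and timeouts arriving only from the last two, the damaged section falls between the second and third stations.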
Should a sensor emit the 920Hz and 720Hz pattern, the attached SGS would wait
for the inbound line to be idle and then transmit a "red" message. For example,
"JGBCY," where "JG" is a station ID, "B" is an indicator of approximate yield
(this appears to have been a later enhancement to the system and I am not sure
of how it is communicated from sensor to SGS), "C" indicates an alarm and "Y"
is an optional terminator. The terminator does not seem to be present on
polling responses, perhaps since they are typically immediately followed by
the next station's response.
The SGS "prioritized" a red message in that, as soon as an inbound message
ended, it would transmit the red message, even if another inbound message
immediately followed. Such de-prioritized messages were queued to be sent
after the red alert. For redundancy, a second red message was transmitted a
bit later, after the loop had cleared.
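The prioritization rule amounts to a two-level transmit queue, which can be sketched as follows; this is my model of the behavior, not Western Union's actual implementation:

```python
from collections import deque

class SGSQueue:
    """Two-priority transmit queue for an SGS: a pending red (alarm)
    message preempts ordinary traffic as soon as the inbound line
    goes idle, and de-prioritized traffic waits behind it."""

    def __init__(self):
        self.red = deque()
        self.normal = deque()

    def enqueue(self, message, alarm=False):
        (self.red if alarm else self.normal).append(message)

    def next_message(self):
        """Called whenever the inbound line goes idle; alarms first."""
        if self.red:
            return self.red.popleft()
        if self.normal:
            return self.normal.popleft()
        return None
```

Even if a routine status was queued first, a red message like "JGBCY" jumps ahead of it the next time the line is free.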
In the master control center, a computer sends poll messages and tracks
responses in order to make sure that all SGS are responsive. Should any red
message be received, polling immediately stops and the computer begins recording
the specific SGS that have sent alarms based on their ID letters. At the same
time, the computer begins to read out the in-memory list of alarming stations
and transmit it on to display stations. Following this alarm process, the
computer automatically polls again and reports any "yellow" statuses to the
display stations. This presumably added further useful information on the
location and intensity of the detonation, since any new "yellow" statuses
probably indicate sensors destroyed by the blast. Finally, the computer
resets to the normal polling process.
When desired, an operator at a master control station can trigger the
transmission of a test command to a specific SGS or the entire loop. When
receiving this command, the SGS triggers the xenon flash bulbs in the sensor.
This should cause a blast detection and the resulting red message, which is
printed at the master control center for operator confirmation. This represents
a remarkably well-thought-out complete end-to-end test capability, in good form
for Western Union which at the time seemed to have a cultural emphasis on
complete remote testing (as opposed to AT&T which tended to focus more on
redundant fault detection systems in every piece of equipment).
To architect the network, the nation was first split roughly in half to form
two regions. In each region, three master control centers operated various
SGS loops. Each target area had three sensors, and the SGS corresponding to
each of the three sensors was on a loop connected to a different one of the
three master control centers. This provided double redundancy of the MCCs,
making the system durable to destruction of an MCC as well as destruction
of a sensor (or really, destruction of up to two of either).
In each display center, a computer system decoded the received messages and lit
up appropriate green, yellow, or red lights corresponding to each sensor. The
green and yellow lights were mounted in a list of all sensors, but the red
lights were placed behind a translucent map, providing an at-a-glance view of
the receiving end of nuclear war.
In the '60s, testing of nuclear defense systems was not as theoretical as it is
today. While laboratory testing was performed to design the sensors, the
sensors and overall system were validated in 1962 by the Small Boy shot of
Operation Dominic II. A small nuclear weapon was detonated at the Nevada Test
Site with a set of three BAS sensors mounted around it, adjusted for greater
than usual sensitivity due to the unusually small yield of the test weapon.
They were connected via Las Vegas to the operational BAS network, and as
expected detonation alarms were promptly displayed at the Pentagon and Ent and
Offutt Air Force Bases of the Strategic Air Command, which at the time would be
responsible for a reprisal.
I have unfortunately not been able to find detailed geographical information on
the system. The three Master Control Stations for the Western United States
were located at Helena, SLC, and Tulsa, per the nuclear test report. A map in a
Western Union report on the system that is captioned "Theoretical system
layout" but seems to be accurate shows detector coverage for Albuquerque,
Wyoming, and Montana in the Western region. These would presumably correspond
to Sandia Labs and Manzano Base and the Minuteman missile fields going into
service in the rural north around the same time as BAS.
The same map suggests Eastern master control stations at perhaps Lancaster,
Charlottesville, and perhaps Greensboro, although these are harder to place.
BAS's eventual replacement was the satellite-based monitoring mentioned
earlier. This system, called USNDS as a whole, has a compact space segment that
flies second-class with other military space systems to save money. The main
satellites hosting USNDS are GPS and the Defense Support Program or DSP, a
sort of general-purpose heat sensing system that can detect various other
types of weapons as well.
I haven't written for a bit, in part because I am currently on vacation in
Mexico. Well, here's a short piece about some interesting behavior I've
noticed.
I use a cellular carrier with very good international roaming support, so for
the most part I just drive into Mexico and my phone continues to work as if
nothing has changed. I do get a notification shortly after crossing the border
warning that data might not work for a few minutes; I believe (but am not
certain) that this is because Google Fi uses eUICC.
eUICC, or Embedded Universal Integrated Circuit Card, essentially refers to a
special SIM card that can be field reprogrammed for different carrier
configurations. eUICC is attractive for embedded applications since it allows
for devices to be "personalized" to different cellular carriers without
physical changes, but it's also useful for typical smartphone applications
where it allows the SIM to be "swapped out" as a purely software process.
Note well: although the "embedded" seems to suggest it, eUICC is not the same
as an "embedded SIM" (e.g. one soldered to the board). eUICC is instead a set of
capabilities of the SIM card and can be implemented either in an embedded SIM
or in a traditional SIM card. Several vendors, particularly in the IoT area,
offer eUICC capable SIMs in the traditional full/mini/micro SIM form factors
to allow an IoT operator to move devices between cellular networks and
carriers without physically replacing the SIM.
Anyway, my suspicion is that Google Fi cuts down on their international service
costs by actually re-provisioning devices to connect to a local carrier in the
country where they are operating. I can't find any information supporting this
theory though, other than clarification that Fi does use embedded (eSIM) eUICC
capability in Pixel devices. Of course the eUICC capabilities can be delivered
in traditional SIM form factor as well, so carrier switching by this mechanism
would not be limited to devices with eSIM. The history of Google Fi as
requiring a custom kernel supports the theory that they rely on eUICC
capabilities, since until relatively recently eUICC was poorly standardized and
Android would likely not normally ship with device drivers capable of
reprovisioning the eUICC.
In any case, that wasn't even what I meant to talk about. I was going to say
a bit about cellular voice-over-IP capabilities including VoWiFi and VoLTE,
and the slightly odd way that they can behave in the situation where you are
using a phone in a country other than the one in which it's provisioned. To
get there, we should first cover a bit about how VoIP or "over-the-top
telephony" interacts with modern cellular devices.
Historically, high-speed data modes did not always combine gracefully with
cellular voice connections. Many older cellular air interface standards only
supported being "in a call" or a "data bearer channel," with the result that a
device could not participate in a voice call and a data connection at the same
time. This makes sense when you consider that the data standards were developed
with a goal of simple backwards-compatibility with existing cellular
infrastructure. The result was that basic cellular capabilities like voice
calls and management traffic (SMS, etc) were achieved by the cellular baseband
essentially regressing to an earlier version of the protocol, disabling
high-speed data protocols such as the high-speed-in-name-only HSDPA. Most
early LTE devices carried on this basic architecture, and so when you dialed a
call on many circa 2010s smartphones the baseband basically went back in time
to the 3G days and behaved as a basic GSM device. No LTE data could be
exchanged in the mean time, and some users noticed that they could not, for
example, load a web page while on a phone call.
This is a good time to insert a disclaimer: I am not an expert on cellular
technologies. I have done a fair amount of reading about them, but the full
architecture of modern cellular networks, then combined with all of the legacy
technologies still in use, is bafflingly complicated. I can virtually guarantee
that I will get at least one thing embarrassingly wrong in the length of this
post, especially since some of this is basically speculative. If you know
better I would appreciate if you emailed me, and I will make an edit to avoid
spreading rumors. There are a surprising number of untrue rumors about these
technologies in circulation.
This issue of not being able to use data while in a phone call became
increasingly irritating as more people started using Bluetooth headsets or
speakerphone and expected to be able to do things like make a restaurant
reservation while on a call with a friend. It clearly needed some kind of
resolution. Further, the many layers of legacy in the cellular network made
things a lot more complicated for carriers than they seemed like they ought
to be. Along with other trends like thinner base stations, carriers saw an
obvious way out... one shared with basically the entirety of the telecom
industry: over-the-top delivery.
If you are not familiar, over-the-top or OTT delivery is an architecture mostly
discussed in fixed telecoms (e.g. cable and wireline telephone) but also more
generally useful as a way of understanding telecom technologies. The basic
idea of OTT is IP convergence at the last mile. If you make every feature of
your telecom product run on top of IP, you simplify your whole outside plant to
broadband IP transport. The technology for IP is very mature, and there's a
wide spectrum of vendors and protocols available. In general, IP is less
expensive and more flexible than most other telecom transports. An ISP is a good
thing to be, and if cellular carriers can get phones to operate on IP alone,
they are essentially just ISPs with some supported applications.
Modern LTE networks are steering towards exactly this: an all-IP air segment
with a variety of services, including the traditional core of voice calls,
delivered over IP. The system for achieving this is broadly called the IP
Multimedia Subsystem or IMS. It is one of an alarming number of blocks in a
typical high-level diagram of the LTE architecture, and it does a lot of work.
Fundamentally, IMS is a layer of the LTE network that allows LTE devices to
connect to media services (mostly voice although video, for example, is also
possible) using traditional internet methods.
Under the hood this is not very interesting, because IMS tries to use standard
internet protocols to the greatest extent possible. Voice calls, for example,
are set up using SIP, just as in most VoIP environments. Some infrastructure is
required to get SIP to interact nicely with the traditional phone system, and
this is facilitated using SIP proxies, DNS records, etc so that both IMS
terminals (phones) and cellular phone switches can locate the "edges" of the
IMS segment... or in other words the endpoints that they need to connect to in
order to establish a call. While there are a lot of details, the most important
part of this bookkeeping is the Home Subscriber Server or HSS.
The HSS is responsible for tracking the association between end subscribers and
IMS endpoints. This works like a SIP version of the broader cellular network:
your phone establishes a SIP registration with a SIP proxy, which communicates
with the HSS to register your phone (state that it is able to set up a voice
connection to your phone) and obtain a copy of your subscriber information for
use in call processing decisions.
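In concrete terms, that registration step is an ordinary SIP REGISTER transaction. Here is a sketch of the sort of request a handset sends; all identities and hostnames are made up, and real IMS registration additionally involves AKA authentication, IPsec negotiation, and more headers than shown:

```python
# Simplified sketch of a SIP REGISTER request as an IMS terminal
# might send it. Identities/hostnames are invented; the Authorization
# header in particular is heavily abbreviated (a real one carries
# realm, nonce, response, etc.).

def build_register(impu, impi, pcscf, contact_ip, expires=600000):
    """impu: public identity (who you are reachable as)
       impi: private identity (who you authenticate as)
       pcscf: the SIP proxy (P-CSCF) the phone discovered"""
    return "\r\n".join([
        f"REGISTER sip:{pcscf} SIP/2.0",
        f"Via: SIP/2.0/UDP {contact_ip}:5060;branch=z9hG4bK776asdhds",
        f"From: <sip:{impu}>;tag=4721",
        f"To: <sip:{impu}>",
        "Call-ID: 843817637684230@998sdasdh09",
        "CSeq: 1 REGISTER",
        f"Contact: <sip:{impu.split('@')[0]}@{contact_ip}:5060>",
        f'Authorization: Digest username="{impi}"',
        f"Expires: {expires}",
        "Content-Length: 0",
        "",
        "",
    ])
```

The proxy that receives this consults the HSS, records where you are reachable, and calls to your number can then be routed to that Contact address.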
This all makes quite a bit of sense and is probably the arrangement that you
would come up with if asked to design an over-the-top cellular voice system.
Where things get a bit odd is, well, the same place things always get odd: the
edge cases. One of these is when phones travel internationally.
An interesting situation I discovered: when returning to our rented apartment,
I sometimes need to call my husband to let me in the front gate. If my phone
has connected to the apartment WiFi network by this point, the call goes
through normally, but with an odd ringing pattern: the typical "warble"
ringback plays only briefly, before being replaced by a fixed sine tone. If, on
the other hand, my phone has not connected to the WiFi (or the WiFi is not
working; the internet here is rather unreliable), the call fails with an error
message that I have misdialed ("El número marcado no es correcto," an unusually
curt intercept recording from Telcel).
Instead, calls via LTE must be dialed as if international: that is, dialed
00-1-NXX-XXX-XXXX. This works fine, and with normal ringback to boot.
So what's going on here?
This answer is partially speculative, but I think the general contours are
correct. First, Google Fi appears to use Telcel as their Mexican carrier
partner. I would suspect this works similarly to Fi's network switching to
Sprint and US Cellular, with a "ghost number" being temporarily assigned (at
least historically, all Google Fi numbers are "homed" with T-Mobile). When not
connected to WiFi, the phone is either using "traditional" GSM voice or is
connecting to Telcel IMS services located using LTE management facilities. As a
result, my phone is, for all intents and purposes, a Mexican cellphone. Calls
to US numbers must be dialed as international because they are international.
However, when connected to WiFi, the phone likely connects to a Google-operated
IMS segment which handles the phone normally, as if it were in the US. Calls to
US numbers are domestic again.
It's sort of surprising that the user experience here is so awkward. This is
pretty confusing behavior, especially to those unfamiliar with WiFi calling.
It's not so surprising though when you consider the generally poor quality of
Android's handling of international travel. Currently many text messages and
calls I receive are failing to match up with contacts, apparently because the
calling number is coming across with an '00' international dialing prefix and
so not matching the saved phone number. Of course, if the call arrives via
WiFi or the message by RCS, it works correctly. One would think that Android
core applications would correctly handle the scenario of having to remove the
international dialing prefix, but admittedly it would probably be difficult
to come up with an algorithmic rule for this that would work globally.
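A sketch of the kind of normalization contact matching would need illustrates why it's awkward: the international call prefix itself varies by country ("00" in Mexico and much of the world, "011" in NANP), so the stripping rule depends on where the phone currently is. Everything here is a hypothetical toy; a real implementation needs full per-country dialing plan data (this is roughly what Google's libphonenumber library exists for):

```python
# Hypothetical sketch of normalizing an incoming caller ID for
# contact matching. The prefix table is deliberately tiny; numbers
# in the examples are fictional.

IDD_PREFIXES = {
    "MX": "00",   # Mexico: international calls dialed 00 + CC + number
    "US": "011",  # NANP international dialing prefix
}

def normalize(number, current_country):
    """Convert a received number to +CC form where possible."""
    idd = IDD_PREFIXES.get(current_country)
    if number.startswith("+"):
        return number            # already in international form
    if idd and number.startswith(idd):
        return "+" + number[len(idd):]
    return number  # national format: needs the local plan to resolve
```

A US number arriving in Mexico as "0015055551234" normalizes to "+15055551234" and would match a saved contact; without the strip, it doesn't.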
Another interesting observation, also with some preamble: I believe I have
mentioned before that Mexico has a complex relationship with NANP, the unified
numbering scheme for North American countries that makes up the "+1" country
code. While Mexico originally intended to participate in NANP, a series of
events related to the generally complex history of the Mexican telecom industry
prevented that materializing and Mexico was instead assigned country code +52.
The result is that Mexico is "NANP-ish" but uses a distinct numbering scheme,
and the NANP area codes originally assigned to Mexico have since mostly been
recycled as overlays in the US.
A full history of telephone number planning in Mexico could occupy an entire
post (perhaps I'll write it next time I'm here). It includes some distinct
oddities. Most notably, area codes can be either 2 or 3 digits, with 2 digit
area codes being used for major cities. While Mexico had formerly used
type-of-service prefixes (specific dialing prefixes for mobile phones), these were
retired fairly recently and are no longer required or even permitted.
In principle, telephone numbers for 2-digit area codes can be written
XX-XXXX-XXXX, while three-digit area codes can be written XXX-XXX-XXXX. Note
the lack of Ns to specify digits constrained to 2-9 as in NANP. This is not
entirely intentional; I just don't know whether this restriction exists in Mexico
today. Putting together the current Mexican dialing plan from original sources
is a bit tricky as IFT has published changes rather than compiled versions of
the numbering plan. My Spanish is pretty bad so reading all of these is going
to take a while, and it's getting to be pretty late... I'll take this on later,
so you can look forward to a future post where I answer the big questions.
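The two groupings can be sketched in code. The only two-digit area codes I'll vouch for are the big three cities (55 Mexico City, 33 Guadalajara, 81 Monterrey); treat the table as illustrative rather than a complete dialing plan:

```python
# Formatting a 10-digit Mexican national number per the convention
# described above. The 2-digit area code table is intentionally
# incomplete: just the three well-known major-city codes.

TWO_DIGIT_CODES = {"55", "33", "81"}  # CDMX, Guadalajara, Monterrey

def format_mx(number):
    """number: 10-digit national number as a string of digits."""
    if number[:2] in TWO_DIGIT_CODES:
        return f"{number[:2]}-{number[2:6]}-{number[6:]}"   # XX-XXXX-XXXX
    return f"{number[:3]}-{number[3:6]}-{number[6:]}"       # XXX-XXX-XXXX
```

So a Mexico City number groups as 55-1234-5678 while a Tijuana (664) number groups as 664-123-4567; the formatter has to know the area code length before it can even place the first hyphen.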
An extremely common convention in Mexico is to write phone numbers as
XX-XX-XX-XX-XX. I'm not really sure where this came from as I don't see e.g.
IFT using it in their documents, but I see it everywhere from handwritten signs
to the customer service number on a Coca-Cola can. Further complicating things,
I have seen the less obvious XXX-XXXX-XXX in use, particularly for toll free
numbers. This seems like perhaps the result of a misunderstanding of the digit
grouping convention for 2 digit area codes.
It seems to be a general trend that countries with variable-length area codes
lack well agreed upon phone number formatting conventions. In the UK, for
example, there is also variability (albeit much less of it). This speaks to one
of the disadvantages of variable-length area codes: they make digit grouping
more difficult, as there's a logical desire to group around the "area code" but
it's not obvious what part of the number that is.
Anyway, there's some more telephone oddities for you. Something useful to think
about when you're trying to figure out why your calls won't connect.
Update: reader Gabriel writes in with some additional info on Mexican telephone
number conventions. Apparently in the era of manual exchanges, it was
conventional to write 4-digit telephone numbers as XX-XX. The "many groups of
two" format is sort of a habitual extension of this. They also note that in
common parlance Mexico City has a 1-digit area code '5' as all '5X' codes are
allocated to it.
This is an experiment in format for me: I would like to have something like
twitter for thoughts that are interesting but don't necessarily make a whole
post. The problem is that I'm loath to use Twitter and I somehow find most of
the federated solutions to be worse, although I'm feeling sort of good about
Pixelfed. But of course it's not amenable to text.
I would just make these blog posts, but blog posts get emailed out to a decent
number of subscribers now. I know that I, personally, react with swift anger to
any newsletter that dares to darken my inbox more than once a week. I don't want to
burden you with another subject line to scroll past unless it's really worth
it, you know? So here's my compromise: I will post short items on the blog, but
not email them out. When I write the next proper post, I'll include any short
items from the meantime in the email with that post. Seem like a fair
compromise? I hope so. Beats my other plan, at least, which was to start a
syndicated newspaper column.
Also, having now written Computers Are Bad for nearly two years, I went back
and read some of my old posts. I feel like my tone has gotten more formal
over time, something I didn't intend.
I would hate for anyone to accuse me of being "professional." In an effort to
change this trend, the tone of these will be decidedly informal and my typing
might be even worse than usual.
The good part
So tonight I was lying on my couch watching Arrested Development yet again
while not entirely sober, and I experienced something of a horror film
scenario: I noticed that the "message" light on my desk phone was flashing.
I remembered I'd missed several calls today, so I retrieved my voice mail.
That is, I pulled out my smartphone and scrolled through my inbox to find the
notification emails with a PCM file attached. Even I don't actually make a
phone call for that.
The voicemail, from a seemingly random phone number in California, was 1 minute
and 8 seconds long. I realized that this was a trend: over the last few days I
had received multiple 1 minute, 8 second voice messages but the first one I had
listened to seemed to be silent. I had since been ignoring them, assuming it
was a telephone spammer that hung up a little bit too late (an amusing defect
of answering machines and voicemail is that it has always been surprisingly hard
for a machine to determine whether a person answered or voicemail, although
there are a few heuristics). Just for the heck of it, though, realizing that I
had eight such messages, I listened to one again.
The contents: a sort of digital noise. It sounded like perhaps very far away
music, or more analytically it seemed like mostly white noise with a little
bit of detail that a very low-bitrate speech codec had struggled to handle.
It was quiet, and I could never quite make anything out, although it always
seemed like I was just on the edge of distinguishing a human voice.
Here's the best part: after finding about fifteen seconds of this to be
extremely creepy, I went back to my email. The sound kept on playing. I
checked the notifications. Still going. It wouldn't stop. I went to the task
switcher and dismissed the audio player. Still going. Increasingly agitated,
and on the latest version of Android which is somehow yet harder to use, I held
power and remembered it doesn't do that any more. I held power and volume down,
vaguely remembering they had made it something like that. No, screenshot.
Holding power and volume up finally got me to the six-item power menu, which
somehow includes an easy-access "911" button even though you have to remember
some physical button escape sequence to get it. Rebooting the phone finally
stopped the noise.
Thoroughly spooked, I considered how I came to this point.
Because I am a dweeb and because IP voice termination is very cheap if you look
in the right places, I hold multiple toll-free phone numbers, several of which
go through directly to the extension of my desk phone. This had been the case
for some time, a couple of years at least, and while I don't put it to a lot of
productive use I like to think I'm kind of running my own little cottage
PrimeTel. Of course basically the only calls these numbers ever get are spam
calls, including a surprising number of car warranty expiration reminders
considering the toll-free number.
But now I remember that there is another type of nuisance call that afflicts
some toll free numbers. You see, toll free numbers exhibit a behavior called
"reverse charging" or "reverse tolling" where the callee pays for the call
instead of the caller. Whether you get your TFN on a fixed contract basis or
pay a per-minute rate, your telephone company generally pays just a little bit
of money each minute to the upstream telephone providers to compensate them for
carrying the call that their customer wasn't going to pay for.
This means that, if you have a somewhat loose ethical model of the phone
system, you can make a bit of profit by making toll-free calls. If you either
are a telco or get a telco to give you a cut of the toll they receive, every
toll-free call you make now nets you a per-minute rate. There is obviously a
great temptation to exploit this. Find a slightly crooked telco, make thousands
of calls to toll-free numbers, get some of them to stay on the phone for a
while, and you are now participating in capitalism.
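The take per call is tiny but the scheme scales. With made-up but plausible numbers (real interconnect and kickback rates vary widely; the rate below is pure illustration):

```python
# Back-of-the-envelope toll-fraud economics. The per-minute kickback
# rate is invented for illustration only.

kickback_per_minute = 0.005        # $0.005/min share of the reverse toll
calls_per_day = 10_000
avg_minutes_per_call = 68 / 60     # e.g. those 1:08 voicemail connects

daily_take = calls_per_day * avg_minutes_per_call * kickback_per_minute
print(f"${daily_take:,.2f} per day")   # prints "$56.67 per day"
```

Not a fortune, but for a fully automated operation with essentially zero marginal cost per call, it adds up, which is exactly why telcos watch for it.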
The problem, of course, is that most telcos (even those that offer a kickback
for toll-free calls, which is not entirely unusual) will find out about the
thousands of calls you are making. They'll promptly, usually VERY promptly due
to automated precautions, give you the boot. Still, there are ways, especially
overseas or by fraud, to make a profit this way.
And so there is a fun type of nuisance call specific to the recipients of
toll-free calls: random phone calls that are designed to keep you on the phone
as long as possible. This is usually done by playing some sort of audio that is
just odd enough that you will probably stay on the phone to listen for a bit
even after you realize it's just some kind of abuse. Something that sounds
almost, but not quite, like someone talking is a classic example.
Presumably one of the many operations making these calls is happy to talk to
voicemail for a bit (voicemail systems typically "supe," meaning that the call
is charged as if it connected). Why one minute and eight seconds, I'm not sure;
that's not the limit on my voicemail system. Perhaps if you include the
greeting recording it's two minutes after the call connects or something.
I've known about this for some time; it's a relatively common form of toll
fraud. I likely first heard of it via an episode of "Reply All" back when that
was a going concern. Until now, I'd never actually experienced it. I don't know
why that's just changed, presumably some operation's crawler just now noticed
one of my TFNs on some website. Or they might have even wardialed it the old
fashioned way and now know that it answers.
Oh, and the thing where it kept on playing after I tried to stop it, as if it
were the distorted voice of some supernatural entity? No idea, as I said, I use
Android. God only knows what part of the weird app I use and the operating
system support for media players went wrong. Given the complexity and generally
poor reliability of the overall computing ecosystem, I can easily dismiss
basically any spooky behavior emanating from a smartphone. I'm not going to
worry about evil portents until it keeps going after a .45 to the chipset...
Maybe a silver one, just in the interest of caution.
One of the great joys of the '00s was the tendency of marketers to apply the
acronym "HD" to anything they possibly could. The funniest examples of this
phenomenon are those where HD doesn't even stand for "High Definition," but
instead for something a bit contrived like "Hybrid Digital." This is the case
with HD Radio.
For those readers outside of these United States and Canada (actually Mexico
as well), HD Radio might be a bit unfamiliar. In Europe, for example, a
standard called DAB for Digital Audio Broadcasting is dominant and, relative
to HD radio, highly successful. Another relatively widely used digital
broadcast standard is Digital Radio Mondiale, confusingly abbreviated DRM,
which is more widely used in the short and medium wave bands than in VHF
where we find most commercial broadcasting today... but that's not a
limitation, DRM can be used in the AM and FM broadcast bands.
HD radio differs from these standards in two important ways: first, it is
intended to completely coexist with analog broadcasting due to the lack of
North American appetite to eliminate analog. Second, no one uses it.
HD Radio broadcasts have been on the air in the US since the mid '00s. HD
broadcasts are reasonably common now, with 9 HD radio carriers carrying 16
stations here in Albuquerque. Less common are HD radio receivers. Many, but not
all, modern car stereos have HD Radio support. HD receivers outside of the car
center console are vanishingly rare. Stereo receivers virtually never have HD
decoding, and due to the small size of the market standalone receivers run
surprisingly expensive. I am fairly comfortable calling HD Radio a failed
technology in terms of its low adoption, but since it piggybacks on the broader
market of broadcast radio the ongoing costs are low. We can expect HD Radio
stations to remain available well into the future and continue to offer some
odd programming.
Santa Fe's 104.1 KTEG ("The Edge"), for example, a run of the mill iHeartMedia
alt rock station, features as its HD2 "subcarrier" a station called Dance
Nation '90s. The clearly automated programming includes Haddaway's "What Is
Love" seemingly every 30 minutes and no advertising whatsoever, because it
clearly doesn't have enough listeners for any advertisers to be willing to pay
for it. And yet it keeps on broadcasting, presumably an effort by iHeartMedia
to meet programming diversity requirements while still holding multiple top-40
licenses in the Albuquerque-Santa Fe market region.
So what is all this HD radio stuff? What is a subcarrier? And just what makes
HD radio "Hybrid Digital?" HD Radio has gotten some press lately because of the
curious failure mode of some Mazda head units, and that's more attention than
it's gotten for years, so let's look a bit at the details.
First, HD Radio is primarily, in the US, used in a format called In-Band
On-Channel, or IBOC. The basic idea is that a conventional analog radio station
continues to broadcast while an HD Radio station is superimposed on the same
frequency. The HD Radio signal is found "outside" of the analog signal, as two
prominent sideband signals outside of the bandwidth of analog FM stereo.
While the IBOC arrangement strongly resembles a single signal with both analog
and digital components, in practice it's very common for the HD signal to be
broadcast by a separate transmitter and antenna placed near the analog
transmitter (in order to minimize destructive interference issues). This isn't
quite considered the "correct" implementation but is often cheaper since it
avoids the need to make significant changes to the existing FM broadcast
equipment... which is often surprisingly old.
It's completely possible for a radio station to transmit only an HD signal,
but because of the rarity of HD receivers this has not become popular. The FCC
does not normally permit it, and has declined to extend the few experimental
licenses that were issued for digital-only operation. As a result, we see HD
Radio basically purely in the form of IBOC. Within IBOC, HD Radio supports
both a full hybrid mode with conventional FM audio quality and an
"extended" mode in which the digital sidebands intrude on the conventional FM
bandwidth. This results in mono-only, reduced-quality FM audio, but allows
for a greater digital data rate.
HD Radio was developed and continues to be maintained by a company called
iBiquity, which was acquired by DTS, which was in turn acquired by Xperi.
iBiquity
maintains a patent pool and performs (minimal) continuing development on the
standard. iBiquity makes their revenue from a substantial up-front license fee
for radio stations to use HD Radio, and from royalties on revenue from
subcarriers. To encourage adoption, no royalties are charged on each radio
station's primary audio feed. Further encouraging adoption (although not
particularly successfully), no royalty or license fees are required to
manufacture HD Radio receivers.
The adoption of HD Radio in North America stems from an evaluation process
conducted by the FCC in which several commercial options were considered. The
other major competitor was FMeXtra, a generally similar design that was not
selected by the FCC and so languished. Because US band planning for broadcast
radio is significantly different from the European approach, DAB was not a
serious contender (it has significant limitations due to the very narrow
RF bandwidth available in Europe, a non-issue in the US where each FM radio
station was effectively allocated 200kHz).
The actual HD Radio protocol is known more properly as NRSC-5, for its
standards number issued by the National Radio Systems Council. The actual
NRSC-5 protocol differs somewhat depending on whether the station is AM or FM
(the widely different bandwidth characteristics of the two bands require
different digital encoding approaches). In the more common case of FM, NRSC-5
consists of a set of separate OFDM data carriers, each conveying part of
several logical channels which we will discuss later. A total of 18 OFDM
subcarriers are typically present, plus several "reference" subcarriers which
are used by receivers to detect and cancel certain types of interference.
If you are not familiar with OFDM or Orthogonal Frequency Division
Multiplexing, it is an increasingly common encoding technique that essentially
uses multiple parallel digital signals (as we see with the 18 subcarriers in
the case of HD Radio) to allow each individual signal to operate at a lower
symbol rate. This has a number of advantages, but perhaps the most important is
that it is typically used to enable the addition of a "guard interval" between
each symbol. This intentional quiet period avoids subsequent symbols "blurring
together" in the form of inter-symbol interference, a common problem with
broadcast radio systems where multipath effects result in the same signal
arriving multiple times at slight time offsets.
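The arithmetic behind this tradeoff is easy to sketch. The numbers below are toy values for illustration, not actual NRSC-5 parameters: the point is that splitting one fast serial stream across N parallel subcarriers stretches each symbol, so a guard interval long enough to absorb multipath echoes costs only a small fraction of airtime.

```python
# Toy illustration (invented numbers, not NRSC-5 parameters): a guard
# interval long enough to cover multipath delay spread is cheap once
# the symbol time has been stretched by parallelizing across N
# subcarriers.

def symbol_time(total_rate_sps: float, n_subcarriers: int) -> float:
    """Duration of one symbol when the stream is split N ways."""
    return n_subcarriers / total_rate_sps

def guard_overhead(total_rate_sps: float, n_subcarriers: int,
                   guard_s: float) -> float:
    """Fraction of airtime spent on the guard interval."""
    t = symbol_time(total_rate_sps, n_subcarriers)
    return guard_s / (t + guard_s)

# 100k symbols/s stream, 5 microseconds of multipath delay spread:
serial = guard_overhead(100_000, 1, 5e-6)   # single carrier: ~33% overhead
ofdm = guard_overhead(100_000, 18, 5e-6)    # 18 subcarriers: ~2.7% overhead
print(f"single carrier: {serial:.1%}, 18 subcarriers: {ofdm:.1%}")
```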
A variety of methods are used to encode the logical channels onto the OFDM
subcarriers, things like scrambling and convolutional coding that improve the
ability of receivers to recover the signal due to mathematics that I am far
from an expert on. The end result is that an NRSC-5 standard IBOC signal in
the FM band can convey somewhere from 50kbps to 150kbps depending on the
operator's desired tradeoffs of bitrate to power and range.
The logical channels are the interface from layer 1 of NRSC-5 to layer 2. The
number and type of logical channels depends on the band (FM or AM), the
waveform (hybrid analog and digital, analog with reduced bandwidth and digital,
or digital only), and finally the service mode, which is basically a
configuration option that allows operators to select how digital capacity is
allocated.
In the case of FM, five logical channels are supported... but not all at once.
A typical full hybrid station broadcasts only primary channel P1 and the PIDS
channel, a low-bitrate channel for station identification. P1 operates at
approximately 98kbps. For stations using an "extended" waveform with mono FM,
the operator can select from configurations that provide 2-3 logical channels
with a total bitrate of 110kbps to 148kbps. Finally, all-digital stations can
operate in any extended service mode or at lower bitrates with different
primary channels present. Perhaps most importantly, all-digital stations can
include various combinations of secondary logical channels which can carry
yet more data.
The curious system of primary channels is one that was designed basically to
ease hardware implementation and is not very intuitive... we must remember that
when NRSC-5 was designed, embedded computing was significantly more limited.
Demodulation and decoding would have to be implemented in ASICs, and so many
aspects of the protocol were designed to ease that process. At this point it is
only important to understand that HD Radio's layer 1 can carry some combination
of 4 primary channels along with the PIDS channel, which is very low bitrate
but considered part of the primary channel feature set.
Layer 1, in summary, takes some combination of primary channels, the
low-bitrate PIDS channel, and possibly several secondary channels (only in the
case of all-digital stations) and encodes them across a set of OFDM subcarriers
arranged just outside of the FM audio bandwidth. The design of the OFDM encoding
and other features of layer 1 aid receivers in detecting and decoding this data.
Layer 2 operates on protocol data units or PDUs, effectively the packet of
NRSC-5. Specifically, it receives PDUs from services and then distributes them
to the layer 1 logical channels.
The services supported by NRSC-5 are the Main Program Service or MPS, which can
carry both audio (MPSA) and data (MPSD), the similar Supplemental Program
Service which also conveys audio and data, the Advanced Application Service
(AAS), and the Station Information Service (SIS).
MPS and SPS are where most of HD Radio happens. Each carries a program audio
stream along with program data that is related to the audio stream---things
like metadata of the currently playing track. These streams can go onto any
logical channel at layer 1, depending on the bitrate required and available.
An MPS stream is mandatory for an HD radio station, while an SPS is optional.
AAS is an optional feature that can be used for a variety of different
purposes, mostly various types of datacasting, which we'll examine later. And
finally, the SIS is the simplest of these services, as it has a dedicated
channel at layer 1 (the PIDS previously mentioned). As a result, layer 2 just
takes SIS PDUs and puts them directly on the layer 1 channel dedicated to them.
The most interesting part of layer 2 is the way that it muxes content together.
Rather than sending PDUs for each stream, NRSC-5 will combine multiple streams
within PDUs. This means that a PDU may contain only MPS or SPS audio, or it
might contain some combination of MPS or SPS with other types of data. While
this seems complicated, it has some convenient simplifying properties: PDUs can
be emitted for each program stream at a fixed rate based on the audio codec
rate. Any unused space in each PDU can then be used to send other types of
data, such as for AAS, on an as-available basis. The situation is somewhat
simplified for the receiver since it knows exactly when to expect PDUs
containing program audio, and that program audio is always the start of a PDU.
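A hypothetical sketch of this muxing idea: each PDU goes out at a fixed rate with the (variable-size) audio frame at the front, and whatever room is left over is filled from a queue of opportunistic data. The PDU size and field layout here are invented for illustration, not taken from the NRSC-5 spec.

```python
# Sketch of layer 2 muxing: audio first, spare room filled with
# opportunistic data, remainder zero-padded. All sizes are invented.

PDU_SIZE = 512  # hypothetical fixed PDU size

def build_pdu(audio: bytes, opportunistic: bytearray) -> bytes:
    """Pack one PDU: audio at the start, then as much of the
    opportunistic queue as fits, then zero padding."""
    if len(audio) > PDU_SIZE:
        raise ValueError("audio frame larger than PDU")
    spare = PDU_SIZE - len(audio)
    fill = bytes(opportunistic[:spare])
    del opportunistic[:spare]  # consume what we transmitted
    pad = bytes(PDU_SIZE - len(audio) - len(fill))
    return audio + fill + pad

queue = bytearray(b"album-art-bytes" * 100)  # pending AAS-style data
pdu = build_pdu(b"\x01" * 300, queue)        # a 300-byte audio frame
print(len(pdu), len(queue))                  # 512 1288
```

The receiver-side convenience the text describes falls out of this layout: audio is always at a known place (the front), and everything after it is a bonus.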
The MPS and one or more SPS streams, if present, are not combined together but
instead remain separate PDUs and are allocated to the logical channels in one
of several fixed schemes depending on the number of SPS present and the
broadcast configuration used by the station. In the most common configuration,
that of one logical channel on a full hybrid radio station, the MPS and up to
two SPS are multiplexed onto the single logical channel. In more complex
scenarios such as all-digital stations, the MPS and three SPS may be
multiplexed across three logical channels. Conceptually, up to seven distinct
SPS identified by a header field can be supported, although I'm not aware of
anyone actually implementing this.
It is worth discussing here some of the practical considerations around the MPS
and SPS. NRSC-5 requires that an MPS always be present, and the MPS must convey
a "program 0" which cannot be stopped and started. This is the main audio
channel on an HD radio station. The SPS, though, are used to convey
"subcarrier" stations. This is the capability behind the "HD2" second audio
channel present on some HD radio stations, and it's possible, although not at
all common, to have an HD3 or even HD4.
Interestingly, the PDU "header" is not placed at the beginning of the PDU.
Instead, its 24-bit sequence (chosen off a list based on what data types are
present in the PDU) are interleaved throughout the body of the PDU. This is
intended to improve robustness by allowing the receiver to correctly determine
the PDU type even when only part of the PDU is received. PDUs always contain
mixed data in a fixed order (program data, opportunistic data, fixed data),
with a "data delimiter" sequence after the program audio and a fixed data
length value placed at the end. This assists receivers in interpreting any
partial PDUs, since they can "backtrack" from the length suffix to identify the
full fixed data section and then search further back for the "data delimiter"
to identify the full opportunistic data section.
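The backtracking strategy can be shown with an invented layout: suppose the last two bytes give the fixed data length and a delimiter byte separates audio from opportunistic data. None of these sizes or byte values come from the NRSC-5 spec; they just demonstrate parsing a PDU from the tail forward.

```python
# Invented-layout sketch of tail-first PDU parsing. A real protocol
# would also escape the delimiter if it could occur in audio data;
# this sketch simply assumes it does not.

DELIM = b"\x7e"  # hypothetical data delimiter

def split_pdu(pdu: bytes):
    """Return (audio, opportunistic, fixed) sections of a PDU."""
    fixed_len = int.from_bytes(pdu[-2:], "big")   # length suffix
    fixed = pdu[-2 - fixed_len:-2]                # backtrack to fixed data
    rest = pdu[:-2 - fixed_len]
    audio, _, opportunistic = rest.partition(DELIM)  # search for delimiter
    return audio, opportunistic, fixed

pdu = b"AUDIO" + DELIM + b"OPPDATA" + b"FIXED" + (5).to_bytes(2, "big")
print(split_pdu(pdu))  # (b'AUDIO', b'OPPDATA', b'FIXED')
```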
And that's layer 2: audio, opportunistic data, and fixed data are collected for
the MPS and any SPS and/or AAS, gathered into PDUs, and then sent to layer 1 for
transmission. SIS is forwarded directly to layer 1 unmodified.
NRSC-5's application layer runs on top of layer 2. Applications consist most
obviously of the MPS and SPS streams, which are used mainly to convey audio...
you know, the thing that a radio station does. This can be called the Audio
Transport application and it runs the same way whether producing MPS (remember,
this is the main audio feed) or SPS (secondary audio feeds or subcarriers).
Audio transport starts with an audio encoder, which is a proprietary design
called HDC or High-Definition Coding. HDC is a DCT-based lossy compression
algorithm which is similar to AAC but adjusted to have some useful properties
for radio. Among them, HDC receives audio data (as PCM) at a fixed rate and
then emits encoded blocks at a fixed rate---but variable size. This variable
size but fixed rate is convenient to receivers but also makes "opportunistic
data," as discussed earlier, possible, because many PDUs will have spare room
at the end.
Another useful feature of HDC is its multi-stream output. HDC can be configured
to produce two different bit streams, a "core" bit stream which is lower bitrate
but sufficient to reproduce the audio at reduced quality, and an "enhanced" data
stream that allows the reproduction of higher fidelity audio. The core bit
stream can be placed on a different layer 1 channel than the enhanced data stream,
allowing receivers to decode only one channel and still produce useful audio when
the second channel is not available due to poor reception quality. This is not
typically used by hybrid stations, instead it's a feature intended for extended
and digital-only stations.
The variable size of the audio data and variable size of PDUs creates some
complexity for receivers, so the audio transport includes some extra data about
sample rate and size to assist receivers in selecting an appropriate amount of
buffering to ensure that the program audio does not underrun despite bursts of
large audio samples and fixed data. This results in a fixed latency from
encoding to decoding, which is fairly short but still a bit behind analog
radio. This latency is sometimes apparent on receivers that attempt to
automatically select between analog and digital signals, even though stations
should delay their analog audio to match the NRSC-5 encoder.
Finally, the audio transport section of each PDU (that is, the MPS or SPS part
at the beginning) contains regular CRC checksums that are used by the receiver
to ensure that any bad audio data is discarded rather than decoded.
MPS and SPS audio is supplemented by Program Service Data (PSD), which can be
either associated with the MPS (MPSD) or an SPS (SPSD). The PSD protocol
generates PDUs which are provided to the audio transport to be incorporated
into audio PDUs at the very beginning of the MPS or SPS data. The PSD is rather
low bitrate as it receives only a small number of bytes in each PDU. This is
quite sufficient, as the PSD only serves to move small, textual metadata about
the audio. Most commonly this is the title, artist, and album, although a few
other fields are included as well such as structured metadata for
advertisements, including a field for the price of the advertised deal. This
feature is rarely, if ever, used.
The PSD data is transmitted continuously in a loop, so that a receiver that has
just tuned to a station can quickly decode the PSD and display information
about whatever is being broadcast. The looping PSD data changes whenever
required, typically based on an outside system (such as a radio automation
system) sending new PSD data to the NRSC-5 encoder over a network connection.
PSD data is limited to 1024 bytes total and, at a minimum, the NRSC-5
specification requires that the title and artist fields be populated. Oddly, it
makes a half-exception for cases where no information on the audio program is
available: the artist field can be left empty, but the title field must be
populated with some fixed string. Some radio stations have added an NRSC-5
broadcast but not upgraded their radio automation to provide PSD data to the
encoder; in this case it's common to transmit the station call sign or name as
the track title, much as is the case with FM Radio Data Service.
Interestingly, the PSD data is viewed as a set of ID3 tags and, even though
very few ID3 fields are supported, it is expected that those fields be in
the correct ID3 format including version prefixes.
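To make that concrete, here is a minimal ID3v2.3 builder showing the kind of structure a PSD payload would carry: a 10-byte tag header with a synchsafe size, plus TIT2 (title) and TPE1 (artist) text frames. The ID3v2.3 layout itself is standard; whether NRSC-5 encoders emit exactly this subset is my assumption.

```python
# Minimal ID3v2.3 tag with title and artist text frames. Standard ID3
# layout; its use here as a stand-in for PSD payloads is an assumption.

def synchsafe(n: int) -> bytes:
    """ID3 tag sizes use 7 bits per byte so 0xFF never appears."""
    return bytes((n >> s) & 0x7F for s in (21, 14, 7, 0))

def text_frame(frame_id: str, text: str) -> bytes:
    body = b"\x00" + text.encode("latin-1")  # 0x00 = ISO-8859-1 encoding
    # v2.3 frame: 4-byte ID, 4-byte size (plain big-endian), 2 flag bytes
    return frame_id.encode() + len(body).to_bytes(4, "big") + b"\x00\x00" + body

def id3_tag(title: str, artist: str) -> bytes:
    frames = text_frame("TIT2", title) + text_frame("TPE1", artist)
    # "ID3", version 2.3.0, no flags, synchsafe size of the frames
    return b"ID3\x03\x00\x00" + synchsafe(len(frames)) + frames

tag = id3_tag("What Is Love", "Haddaway")
print(tag[:3], len(tag))  # b'ID3' 52
```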
Perhaps the most sophisticated feature of NRSC-5 is the Advanced Application
Service transport or AAS. AAS is a flexible system intended to send just about
any data alongside the audio programs. Along with each PDU, the audio transport
generates metadata indicating how much of the PDU's length is used.
The AAS can use that value to determine how many bytes are free, and then fill
them with opportunistic data of whatever type it likes. As a result, the AAS
basically takes advantage of any "slack" in the radio broadcast's capacity, as
well as reserving a portion for fixed data if desired by the station operator.
AAS data is encoded into AAS packets, an organizational unit independent of
PDUs (and included within PDUs generated by the audio transport) and loosely
based on computer networking conventions. Interestingly, AAS packets may be
fragmented or combined to fit into available space in PDUs. To account for this
variable structure, AAS specifies a transport layer below AAS packets which is
based on HDLC (ISO high-level data link control) or PPP (point-to-point
protocol, which is closely related to HDLC and very similar). So, in a way, AAS
consists of a loosely computer-network-like protocol over a protocol roughly
based on PPP over audio transport PDUs over OFDM.
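HDLC/PPP-style framing of the kind AAS borrows is easy to sketch: frames are delimited by a 0x7E flag byte, and any 0x7E or 0x7D inside the payload is escaped as 0x7D followed by the byte XOR 0x20. This is the generic byte-stuffing scheme from PPP-in-HDLC framing, not the exact NRSC-5 byte layout.

```python
# Generic HDLC/PPP-style byte stuffing (as in RFC 1662), shown as an
# illustration of the framing family AAS is loosely based on.

FLAG, ESC = 0x7E, 0x7D

def frame(payload: bytes) -> bytes:
    out = bytearray([FLAG])
    for b in payload:
        if b in (FLAG, ESC):
            out += bytes([ESC, b ^ 0x20])  # escape reserved bytes
        else:
            out.append(b)
    out.append(FLAG)
    return bytes(out)

def unframe(data: bytes) -> bytes:
    body = data[1:-1]  # strip the flag bytes
    out, i = bytearray(), 0
    while i < len(body):
        if body[i] == ESC:
            out.append(body[i + 1] ^ 0x20)  # undo the escape
            i += 2
        else:
            out.append(body[i])
            i += 1
    return bytes(out)

msg = bytes([0x01, 0x7E, 0x02, 0x7D])
assert unframe(frame(msg)) == msg
```

The virtue of this scheme for AAS's purposes is that frame boundaries survive arbitrary fragmentation: a receiver can always resynchronize by scanning for the next flag byte.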
Each AAS packet header specifies a sequence number for reconstruction of large
payloads and a port number, which indicates to the receiver how it should
handle the packet (or perhaps instead ignore the packet). A few ranges of
port numbers are defined, but the vast majority are left to user applications.
Port numbers are two bytes, and so there's a large number of applications
possible. Very few are defined by specification, limited basically to port
numbers for supplemental PSD. This might be a bit confusing since PSD has its
own reserved spot at the beginning of the audio transport. The PSD protocol
itself is limited to only small amounts of text, and so when desired AAS can
be used to send larger PSD-type payloads. The most common application of this
"extra PSD" is album art, which can be sent as a JPG or PNG file in the AAS
stream. In fact, multiple ports are reserved for each of MPSD (main PDS) and
SPSD, allowing different types of extra data to be sent via AAS.
Ultimately, the AAS specification is rather thin... because AAS is a highly
flexible feature that can be used in a number of ways. For example, AAS forms
the basis of the Artist Experience service which allows for delivery of more
complete metadata on musical tracks including album art. AAS can be used as
the basis of almost any datacasting application, and is applied to everything
from live traffic data to distribution of educational material to rural areas.
Finally, in our tour of applications, we should consider the station information
service or SIS. SIS is a very basic feature of NRSC-5 that allows a station to
broadcast its identification (call sign and name) along with some basic services
like a textual message related to the station and emergency alert system
messages. SIS has come up somewhat repeatedly here because it receives special
treatment; SIS is a very simple transport at a low bitrate and has its own
dedicated logical channel for easy decoding. As a result, SIS PDUs are typically
the first thing a receiver attempts to decode, and are very short and simple.
To sum up the structure of HD radio, it is perhaps useful to look at it as a
flow process: SIS data is generated by the encoder and sent to layer 2, which
passes it directly to layer 1, where it is transmitted on its own logical
channel. PSD data is provided to the audio transport which embeds it at the
beginning of audio PDUs. The audio transport informs the AAS encoder of the
amount of available free space in a PDU, and the AAS encoder provides an
appropriate amount of data to the audio transport to be added at the end of the
PDU. This PDU is then passed to layer 2 which encapsulates it in a complete
NRSC-5 PDU and arranges it into logical channels which are passed to layer 1.
Layer 1 encodes the data into multiple OFDM carriers using a somewhat complex
scheme that produces a digital signal that is easy for receivers to recover.
Non-Audio Applications of NRSC-5
While the NRSC-5 specification is clearly built mostly around transporting the
main and secondary program audio, the flexibility of its data components like
PSD and AAS allows its use for purposes other than audio. As a very simple
example, SIS packets include a value called the "absolute local frame number"
or ALFN that is effectively a timestamp, useful for receivers to establish the
currency of emergency alert messages and for various data applications.
Because the current time can be easily calculated from the ALFN, it can be used
to set the clocks on HD radio receivers such as car head units. To support
this, standard SIS fields include information on local time zone, daylight
savings time, and even upcoming leap seconds.
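The clock calculation itself is simple. The relationship used by the open-source nrsc5 decoder is that the ALFN counts L1 frames of 65536 samples at 44100 Hz elapsed since the GPS epoch (January 6, 1980); treat both constants here as assumptions rather than quotes from the spec.

```python
# ALFN-to-wall-clock sketch. Frame duration and epoch are assumptions
# (the values the open source nrsc5 decoder uses), not spec citations.
from datetime import datetime, timedelta, timezone

GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)
FRAME_SECONDS = 65536 / 44100  # ~1.486 s per L1 frame

def alfn_to_utc(alfn: int) -> datetime:
    """Convert an absolute L1 frame number to UTC. Ignores leap
    seconds, which a real receiver would apply from the SIS fields."""
    return GPS_EPOCH + timedelta(seconds=alfn * FRAME_SECONDS)

print(alfn_to_utc(0))  # 1980-01-06 00:00:00+00:00
```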
SIS packets include a one-bit flag that indicates whether or not the ALFN is
being generated based on a GPS-locked time source, or based on the NRSC-5
encoder's internal clock only. To avoid automatically adjusting radio clocks to
an incorrect time (something that had plagued the earlier CEA protocol for
automatic setting of VCR clocks via PBS member stations), NRSC-5 dictates that
receivers must not set their display time based on a radio station's ALFN
unless the flag indicating GPS lock is set. Unfortunately, it seems that it's
rather uncommon for radio stations to equip their encoder with a GPS time
source, and so in the Albuquerque market at least HD Radio-based automatic time
setting does not work.
Other supplemental applications were included in the basic SIS as well, notably
emergency alert messages. HD Radio stations can transmit emergency alert
messages in text format with start and end times. In practice this seems to be
appreciably less successful than the more flexible capability of SiriusXM,
and ironically despite its cost to the consumer SiriusXM might have better
market penetration than HD Radio.
NRSC-5's data capabilities can be used to deliver an enhanced metadata
experience around the audio programming. The most significant implementation of
this concept is the "artist experience" service, a non-NRSC-5 standard
promulgated by the HD Radio alliance that uses the AAS to distribute more
extensive metadata including album art in image format. This is an appreciably
more complex process and so is basically expected to be implemented in software
on a general-purpose embedded operating system, rather than the hardware-driven
decoding of audio programming and basic metadata. Of course this greater
complexity led more or less directly to the recent incident with Mazda HD
radio receivers in Seattle, triggered by a station inadvertently transmitting
invalid Artist Experience data in a way that seems to have caused the Mazda
infotainment system to crash during parsing. Fortunately infotainment-type HD
radio receivers typically store HD Radio metadata in nonvolatile memory to
improve startup time when tuning to a station, so these Mazda receivers
apparently repeatedly crashed every time they were powered on to such a degree
that it was not possible to change stations (and avoid parsing the cached
invalid file). Neat.
Since Artist Experience just sends JPG or PNG files of album art, we know that
AAS can be used to transmit files in general (and looking at the AAS protocol
you can probably easily come up with a scheme to do so). This opens the door to
"datacasting," or the use of broadcast technology to distribute computer data.
I have written on this topic before.
To cover the elements specific to our topic, New Mexico's KANW and some other
public radio stations are experimenting with transmitting educational materials
from local school districts as part of the AAS data stream on their HD2
subcarrier. Inexpensive dedicated receivers collect these files over time and
store them on an SD card. These receiver devices also act as WiFi APs and offer
the stored contents via an embedded web server. This allows the substantial
population of individuals with phones, tablets, or laptops but no home internet
or cellular service to retrieve their distance education materials at home,
without having to drive into town for cellular service (the existing practice
in many parts of the Navajo Nation, for example).
There is potential to use HD Radio to broadcast traffic information services,
weather information, and other types of data useful to car navigation systems.
While there's a long history of datacasting this kind of information via radio,
it was never especially successful and the need has mostly been obsoleted by
ubiquitous LTE connectivity. In any case, the enduring market for this type of
service (over-the-road truckers for example) has a very high level of SiriusXM
penetration and so already receives this type of data.
Fall of the House of Hybrid Digital
In fact, the satellite angle is too big to ignore in an overall discussion of
HD Radio. Satellite radio was introduced to the US at much the same time as HD
Radio, although XM proper was on the market slightly earlier. Satellite has the
significant downside of a monthly subscription fee. However, time seems to have
shown that the meaningful market for enhanced broadcast radio consists mostly
of people who are perfectly willing to pay a $20/mo subscription for a
meaningfully better service. Moreover, it consists heavily of people involved in
the transportation industry (Americans listen to the radio basically only in
vehicles, so it makes sense that the most dedicated radio listeners are those
who spend many hours in motion). Since many of these people regularly travel
across state lines, a nationwide service is considerably more useful than one
where they have to hunt for a new good station to listen to as they pass
through each urban area.
All in all, HD radio is not really competitive for today's serious radio
listeners because it fails to address their biggest complaint, that radio is
too local. Moreover, SiriusXM's ongoing subscription revenue seems to provide a
much stronger incentive to quality than iHeartMedia's declining advertising
relationships. The result is that, for the most part, the quality of
SiriusXM programming is noticeably better than most commercial radio stations,
giving it a further edge over HD Radio.
Perhaps HD Radio is simply a case of poor product-market fit, SiriusXM having
solved essentially the same problems but much better. Perhaps the decline of
broadcast media never really gave it a chance. The technology is quite
interesting, but adoption is essentially limited to car stereos, and not even
that many of them. I suppose that's the problem with broadcast radio in
general.
 The details here are complex and deserve their own post, but as a general
idea the FCC attempts to maintain a diversity of radio programming in each
market by refusing licenses to stations proposing a format that is already used
by other stations. Unfortunately there are relatively few radio formats that
are profitable to operate, so the broadcasting conglomerates tend to end up
playing games with operating stations in minor formats, at little profit, in
order to argue to the FCC that enough programming diversity is available to
justify another top 40 or "urban" station.
 The term "subcarrier" is used this way basically for historical reasons and
doesn't really make any technical sense. It's better to think of "HD2" as being
a subchannel or secondary channel, but because of the long history of radio
stations using actual subcarrier methods to convey an alternate audio stream
the subcarrier term has stuck.
 It seems inevitable that, as has frequently happened in the history of
datacasting, improving internet access technology will eventually obsolete this
concept. I would strongly caution you against thinking this has already
happened, though: even ignoring the issue of the long and somewhat undefined
wait, Starlink is considerably more expensive than the typical rates for rural
internet service in New Mexico. It is to some extent a false dichotomy to say
that Starlink is cost uncompetitive with DSL considering that it can service a
much greater area. However, I think a lot of "city folk" are used to the
over-$100-per-month pricing typical of urban gigabit service and so view
Starlink as inexpensive. They do not realize that, for all the downsides of
rural DSL, it is very cheap. This reflects the tight budget of its consumers.
For those who have access, CenturyLink DSL in New Mexico ranch country is
typically $45/mo no-contract with no install fee and many customers use a
$10/mo subsidized rate for low income households. Starlink's $99/mo and $500
initial is simply unaffordable in this market, especially since those outside
of the CenturyLink service area have, on average, an even lower disposable
income than those clustered near towns and highways.
 It is hard for me not to feel like iHeartMedia brought this upon
themselves. They gained essentially complete control of the radio industry
(with only even sadder Cumulus as a major competitor) and then squeezed it for
revenue until US commercial radio programming had become, essentially, a joke.
Modern commercial radio stations run on exceptionally tight budgets that have
mostly eliminated any type of advantage they might have had due to their
locality. This is most painfully apparent when you hear an iHeartMedia station
give a rare traffic update (they seem to view this today as a mostly pro forma
activity and do it as little as possible) in which the announcer pronounces
"Montaño" and, more puzzlingly, "Coors" wrong in the span of a single sentence.
I have heard a rumor that all of the iHeartMedia traffic announcements are done
centrally from perhaps Salt Lake City but I do not know if this is true.
I started writing a post about media container formats, and then I got severely
sidetracked by explaining how MPEG elementary streams aren't in a container but
still have most of the features of containers and had a hard time getting back
to topic until I made the decision that I ought to start down the media rabbit
hole with something more basic. So let's talk about an ostensibly basic audio
format: PCM.
PCM stands for Pulse Code Modulation and, fundamentally, it is a basic
technique for digitization of analog data. PCM is so obvious that explaining it
is almost a bit silly, but here goes: given an analog signal, at regular
intervals the amplitude of the signal is measured and quantized to the nearest
representable number (in other words, rounded). The resulting "PCM signal" is
this sequence of numbers. If you remember your Nyquist and Shannon from college
data communications, you might realize that the most important consideration in
this process is that the sampling frequency must be at least twice the highest
component in the signal to be digitized.
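As a minimal sketch (the function and parameter names here are my own invention, not any real codec's API), PCM encoding amounts to modeling the analog signal as a function of time, sampling it at fixed intervals, and rounding each sample to the nearest representable integer:

```python
import math

def pcm_encode(signal, sample_rate, duration, bit_depth):
    """Sample an analog signal (a function of time) at regular
    intervals and quantize each sample to the nearest representable
    integer -- the essence of linear PCM."""
    max_code = 2 ** (bit_depth - 1) - 1   # e.g. 127 for 8-bit samples
    samples = []
    for n in range(int(sample_rate * duration)):
        t = n / sample_rate
        amplitude = signal(t)             # assume signal in [-1.0, 1.0]
        samples.append(round(amplitude * max_code))
    return samples

# A 1kHz sine tone, sampled at 8kHz with 8-bit depth
tone = lambda t: math.sin(2 * math.pi * 1000 * t)
samples = pcm_encode(tone, 8000, 0.001, 8)
```

The "PCM signal" is just the list of integers that comes out the other end.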
In the telephone network, for example, PCM encoding is performed at 8kHz. This
might seem surprisingly low, but speech frequencies trail off above 3kHz and so
the up-to-4kHz represented by 8kHz PCM is perfectly sufficient for intelligible
speech. It is not particularly friendly to music, though, which is part of why
hold music is the way it is. For this reason, in music and general digital
audio a sampling rate of 44.1kHz is conventional due to having been selected
for CDs. Audible frequencies are often defined as being "up to 20kHz" although
few people can actually hear anything that high (my own hearing trails off at
14kHz, attributable to a combination of age and adolescent exposure to nu
metal). This implies a sampling rate of 40kHz; the reason that CDs use 44.1kHz
is essentially that they wanted to go higher for comfort and 44.1kHz was the
highest they could easily go on the equipment they had at the time. In other
words, there's no particular reason, but it's an enduring standard.
Another important consideration in PCM encoding is the number of discrete
values that samples can possibly take. This is commonly expressed as the number
of bits available to represent each sample and called "bit depth." For example,
a bit depth of eight allows each sample to have one of 256 values that we might
label -128 through 127. The bit depth is important because it limits the
dynamic range of the signal. Dynamic range, put simply, is the greatest
possible variation in amplitude, or the greatest possible variation between
quiet and loud. Handling large dynamic ranges can be surprisingly difficult in
both analog and digital systems, since both electronics and algorithms struggle
to handle values that span multiple orders of magnitude.
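The relationship between bit depth and dynamic range is easy to put numbers on: each additional bit doubles the ratio between the largest representable amplitude and the quantization step, which works out to roughly 6dB per bit. A quick back-of-the-envelope calculation:

```python
import math

def dynamic_range_db(bit_depth):
    # Ratio between the largest representable amplitude and the
    # quantization step, expressed in decibels: 20*log10(2^n)
    return 20 * math.log10(2 ** bit_depth)

for bits in (8, 16, 24):
    print(f"{bits}-bit: ~{dynamic_range_db(bits):.0f} dB")
```

That gives about 48dB for 8-bit audio, 96dB for CD-quality 16-bit, and 144dB for 24-bit.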
In PCM encoding, bit depth has a huge impact on the resulting bitrate. 16-bit
audio, as used on CDs, is capable of a significantly higher dynamic range than
8-bit audio at the cost of doubling the bitrate. Dynamic range is important in
music, but is also surprisingly important in speech, and a bit depth of 8
is actually insufficient to reproduce speech that will be easy to understand.
And yet, due to technical constraints, 8kHz and 8-bit samples were selected for
telephone calls. So how is speech acceptably carried over 8-bit PCM?
We need to talk a bit about the topics of compression and companding. There can
be some confusion here because "compression" is commonly used in computing to
refer to methods that reduce the bitrate of data. In audio engineering, though,
compression refers to techniques that reduce the dynamic range of audio, by
making quieter sounds louder and louder sounds quieter until they tend to
converge at a fixed volume. Like some other writers, I will use "dynamic
compression" when referring to the audio technique to avoid confusion. For both
practical and aesthetic reasons (not to mention, arguably, stupid reasons),
some degree of dynamic compression is applied to most types of audio that we
hear.
Companding, a portmanteau of compressing and expanding, is a method used to
pack a wide dynamic range signal into a channel with a smaller dynamic range.
As the name suggests, companding basically consists of compressing the signal,
transmitting it, and then expanding it. How can the signal be expanded, though,
given that dynamic range was lost when it was compressed? The trick is that
both sides of a compander are non-linear, compressing loud sounds more than
quiet sounds. This works well, because in practice many types of audio show a
non-linear distribution of amplitudes. In the case of speech, for example,
significantly more detail is found at low volume levels, and yet occasional
peaks must be preserved for good intelligibility.
In practice, companding is so commonly used with PCM that the compander is
often considered part of the PCM coding. When I have described PCM thus far, I
have been describing linear PCM or LPCM. LPCM matches each sample against a set
of evenly distributed discrete values. Many actual PCM systems use some form of
non-linear PCM in which the possible sample values are distributed
logarithmically. This makes companding part of PCM itself, as the encoder
effectively compresses and decoder effectively expands. One way to illustrate
this is to consider what would happen if you digitized audio using a non-linear
PCM encoder and then played it back using a linear PCM decoder: It would sound
compressed, with the quieter components moved into a higher-valued, or louder,
range.
Companding does result in a loss of fidelity, but it's one that is not very
noticeable for speech (or even for music in many cases) and it results in a
significant savings in bit depth. Companding is ubiquitous in speech coding.
One of the weird things you'll run into with PCM is the difference between
µ-law PCM and A-law PCM. In the world of telephony, a telephone call is usually
encoded as uncompressed 8kHz, 8-bit PCM, resulting in the 64kbps bitrate that
has become the basic unit of bandwidth in telecom systems. Given the simplicity
of uncompressed PCM, it can be surprising that many telephony systems like VoIP
software will expect you to choose from two different "versions" of PCM. The
secret of telephony PCM is that companding is viewed as part of the PCM codec,
and for largely historic reasons there are two common algorithms in use. The
actual difference is the function or curve used for companding, or in other
words, the exact nature of the non-linearity. In the US and Japan (owing to
post-WWII history Japan's phone system is very similar to that of the US), the
curve called µ-law is in common use. In Europe and most other parts of the
world, a somewhat different curve is used, called A-law. In practice the
difference between the two is not particularly significant, and it's difficult
to call one better than the other since both just make slightly different
trade-offs of dynamic range for quantization error (A-law is the option with
greater dynamic range and greater possible distortion).
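For illustration, here is the continuous µ-law curve in Python. Note this is a sketch of the underlying math, not a wire-compatible codec: real G.711 implementations use a segmented, table-driven approximation of this curve and pack the result into 8-bit codewords.

```python
import math

MU = 255  # the "mu" in mu-law, per ITU-T G.711

def mu_law_compress(x):
    """Compress a sample in [-1.0, 1.0] with the mu-law curve:
    quiet samples get more of the output range than loud ones."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y):
    """Inverse of the compressor, applied on the receiving end."""
    return math.copysign((math.exp(abs(y) * math.log1p(MU)) - 1) / MU, y)

# A quiet sample is pushed well up into the output range...
compressed = mu_law_compress(0.01)   # roughly 0.23
# ...and the round trip recovers the original value.
restored = mu_law_expand(compressed)
```

The non-linearity is easy to see: an input at 1% of full scale occupies over 20% of the compressed range, which is exactly the "more detail at low volume levels" property that suits speech.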
Companding is rarely applied in music and general multimedia applications. One
way to look at this is to understand the specializations of different audio
codecs: µ-law PCM and A-law PCM are both simple examples of what are called
speech codecs, Speex and Opus being more complex examples that use lossy
compression techniques for further bitrate reduction (or better fidelity at
64kbps). Speech codecs are specialized for the purpose of speech and so make
assumptions that are true of speech including a narrow frequency range and
certain temporal characteristics. Music fed through speech codecs tends to
become absolutely unlistenable, particularly with lossy speech codecs, as the
hold music on GSM cellphones painfully illustrates.
In multimedia audio systems, we instead have to use general-purpose audio
codecs, most of which were designed around music. Companding is effectively a
speech coding technique and is left out of these audio systems. PCM is still
widely used, but in general audio PCM is assumed to imply linear PCM.
As previously mentioned, the most common convention for PCM audio is 44.1kHz at
16 bits. This was the format used by CDs, which effectively introduced digital
audio to the consumer market. In the professional market, where digital audio
has a longer history, 48kHz is also in common use... however, you might be able
to tell just by mathematical smell that conversion from 48kHz to 44.1kHz is
prone to distortion problems due to the inconveniently large common multiple of
the two sample rates. An increasingly common sample rate in consumer audio is
96kHz, and "high resolution audio" usually refers to 96kHz, 24-bit samples.
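The arithmetic behind that "mathematical smell" is straightforward: 44.1kHz and 48kHz reduce to the awkward ratio 147:160, so resampling between them has to interpolate over a very long common period, while 96kHz is an exact multiple of 48kHz.

```python
import math

# The two common sample rates reduce to the inconvenient ratio 147:160
g = math.gcd(44100, 48000)            # 300
ratio = (44100 // g, 48000 // g)      # (147, 160)
lcm = 44100 * 48000 // g              # 7,056,000 Hz common multiple

# 96kHz, by contrast, divides evenly by 48kHz
assert 96000 % 48000 == 0
```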
There is some debate over whether or not 96kHz sampling is actually a good
idea. Remembering our Nyquist-Shannon, note that all of the extra fidelity we
get from the switch from 44.1kHz to 96kHz sampling is outside of the range
detectable by even the best human ears. In practice the bigger advantage of
96kHz is probably that it is an even multiple of the 48kHz often used by
professional equipment and thus eliminates effects from sample rate conversion.
On the other hand, there is some reason to believe that the practicalities of
real audio reproduction systems (namely the physical characteristics of
speakers, which are designed for reproduction of audible frequencies) cause
the high frequency components preserved by 96kHz sampling to turn into
distortion at lower, audible frequencies... with the counterintuitive result
that 96kHz sampling may actually reduce subjective audio quality, when
reproduced through real amplifiers and speakers. In any case, the change to
24-bit samples is certainly useful as it provides greater dynamic range.
Unfortunately, much like "HDR" video (which is the same concept, a greater
sample depth for greater dynamic range), most real audio is 16-bit and so
playback through a 24-bit audio chain requires scaling that doesn't typically
produce distortion but can reveal irritating bugs in software and equipment.
Fortunately the issue of subjective gamma, which makes scaling of non-HDR video
to HDR display devices surprisingly complex, is far less significant in the
case of audio.
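The scaling itself is trivial; the straightforward approach (though not the only one, since some implementations dither or shape the low bits) is a left shift:

```python
def scale_16_to_24(sample_16):
    """Widen a signed 16-bit sample to 24 bits by shifting left
    8 bits; the waveform is unchanged, only the scale grows."""
    return sample_16 << 8

# Full-scale 16-bit values land just inside the 24-bit range
loudest = scale_16_to_24(32767)    # 8,388,352 (24-bit max is 8,388,607)
quietest = scale_16_to_24(-32768)  # -8,388,608, the 24-bit minimum
```

The bugs mentioned above tend to live in code that gets this shift (or the sign extension around it) wrong.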
PCM audio, at whatever sample rate and bit depth, is not so often seen in the form
of files because of its size. That said, the "WAV" file format is a simple
linear PCM encoding stored in a somewhat more complicated container. PCM is far
more often used as a transport between devices or logical components of a
system. For example, if you use a USB audio device, the computer is sending a
PCM stream to the device. Unfortunately Bluetooth does not afford sufficient
bandwidth for multimedia-quality PCM, so our now ubiquitous Bluetooth audio
devices must use some form of compression. A now less common but clearer
example of PCM transport is found in the form of S/PDIF, a common consumer
digital audio transport that can carry two 44.1 or 48kHz 16-bit PCM channels
over a coaxial or fiber-optic cable.
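The size problem is easy to quantify, since uncompressed PCM bitrate is just the product of sample rate, bit depth, and channel count:

```python
def pcm_bitrate(sample_rate, bit_depth, channels):
    # Uncompressed PCM bitrate in bits per second
    return sample_rate * bit_depth * channels

cd = pcm_bitrate(44100, 16, 2)    # 1,411,200 bps for CD / S/PDIF stereo
phone = pcm_bitrate(8000, 8, 1)   # 64,000 bps, telecom's basic unit
```

An hour of CD-quality stereo is over 600MB, which is why PCM files are rare but PCM transports are everywhere.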
You might wonder how this relates to the most common consumer digital audio
transport today, HDMI. HDMI is one of a confusing flurry of new video standards
that were developed as a replacement for the analog VGA, but HDMI originated
more from the consumer A/V part of the market (the usual Japanese suspects,
mostly) and so is more associated with televisions than the (computer industry
backed) DisplayPort standard. A full treatment of HDMI's many features and
misfeatures would be a post of its own, but it's worth mentioning the forward
audio channel here.
HDMI carries the forward (main, not return) audio channel by interleaving it
with the digital video signal during the "vertical blanking interval," a concept
that comes from the mechanical operation of CRT displays but has remained a
useful way to take advantage of excess bandwidth in a video channel. The term
vertical blanking is now somewhat archaic but the basic idea is that
transmitting a frame takes less time than the frame is displayed for, and so
the unoccupied time between transmitting each frame can be used to transmit
other data. The HDMI spec allows for up to 8 channels of 24-bit PCM, at up to
192kHz sampling rate---although devices are only required to support 2 channels.
Despite the capability, 8-channel (usually actually "7.1" channel in the A/V
parlance) audio is not commonly seen on HDMI connections. Films and television
shows more often distribute multi-channel audio in the form of a compressed
format designed for use on S/PDIF, most often Dolby Digital and DTS (Xperi).
In practice the HDMI audio channel can move basically any format so long as the
devices on the ends support it. This can lead to some complexity in practice,
for example when playing a blu-ray disc with 7.1 channel DTS audio from a
general-purpose operating system that usually outputs PCM stereo. High-end HDMI
devices such as stereo receivers have to support automatic detection of a range
of audio formats, while media devices have to be able to output various formats
and often switch between them during operation.
On HDMI, the practicalities of inserting audio in the vertical blanking
interval requires that the audio data be packetized, or split up into chunks so
that it can be divided into the VBI and then reassembled into a continuous
stream on the receiving device. This concept of packetized audio and/or video
data is actually extremely common in the world of media formats, as
packetization is an easy way to achieve flexible muxing of multiple independent
streams. And that promise, that we are going to talk about packets, seems like
a good place to leave off for now. Packets are my favorite things!
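As a toy illustration of the general idea (this is not the actual HDMI "data island" format, just the packetize-then-reassemble pattern):

```python
def packetize(stream, packet_size):
    """Split a continuous byte stream into fixed-size chunks, as a
    transport like HDMI must do to fit audio into blanking intervals."""
    return [stream[i:i + packet_size]
            for i in range(0, len(stream), packet_size)]

def reassemble(packets):
    """The receiver concatenates the packets back into a stream."""
    return b"".join(packets)

pcm = bytes(range(10))
packets = packetize(pcm, 4)       # three chunks: 4 + 4 + 2 bytes
assert reassemble(packets) == pcm
```

Real packetized formats add headers for identification and timing, which is where the muxing flexibility comes from.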
Later on computer.rip: MPEG. Not much about the compression, but a lot about
the physical representations of MPEG media, such as elementary streams,
transport streams, and containers. These are increasingly important topics as
streaming media becomes a really common software application... plus it's all
pretty interesting and helps to explain the real behavior of terrible Hulu TV
apps.
A brief P.S.: If you were wondering, there is no good reason that PCM is called
PCM. The explanation seems to just be that it was developed alongside PWM and
PPM, so the name PCM provided a pleasing symmetry. It's hard to actually make
the term make a lot of sense, though, beyond that "code" was often used in the
telephone industry to refer to numeric digital channels.