On the front page of HN today was an [article with a confusing headline],
"Farmers 'crippled' by satellite failure as GPS-guided tractors grind to a
halt." Of course the headline doesn't actually say this, but it does seem
to imply that there has been some kind of failure of GPS.
This is not quite as outlandish as it might sound. The GPS system, maintained
by the US Space Force, has occasionally suffered from serious funding shortages
and bureaucratic paralysis. In 2009, the GAO issued a concerning report. They
found that the program to launch a new generation of GPS satellites was so far
behind schedule that the system was in danger of falling below the required 24
satellites. This would result in partial outages of GPS in different parts of
the world, and could potentially take years to resolve. In response, the Air
Force issued a set of strongly worded statements insisting that they had kept
GPS working for fifteen years and would continue to do so, and moreover made
some changes to accelerate the GPS-III program.
Fortunately, we are now largely out of the woods on this issue, as not just
GPS-III but subsequent designs are in service and the GPS constellation has
been restored to its full planned size---including on-orbit spares for use in
case of an unexpected failure. This is not to say that the administration of
GPS is all good news; the GAO continues to issue more or less annual reports on
how military programs to acquire more advanced GPS receivers (supporting the
newer "M-code" signal) are badly mismanaged. But at least the civilian aspect
of GPS should be quite reliable for years to come.
In any case, any significant failure of GPS would become major international
news. Another perennial topic of GAO reports is the failure of the US
government and infrastructure operators to develop any meaningful backup or
alternative for GPS. A GPS failure would cause huge swaths of transportation
and communications infrastructure to malfunction. While I remain a huge
advocate for the construction of the terrestrial PNT (position, navigation,
time) technology called eLORAN, it would cost money, and not in the F-35 kind
of way, so it's unlikely to happen.
What happened in Australia, the article tells us about halfway through, was
actually a problem with the Inmarsat 4F1 satellite. Inmarsat is a bit of a historical oddity,
in the same category as ARINC (Aeronautical Radio, Incorporated). It was founded as the
International Maritime Satellite Organization, a non-profit entity sponsored by
the UN's International Maritime Organization to develop a satellite network for
emergency communications at sea. In 1998, though, Inmarsat was privatized,
becoming a British company. Despite its historic legacy, Inmarsat is today just
one of several major commercial satellite communications networks. It offers
two-way telephony and data service, and carries both first- and third-party
broadcast services.
One of these services is SBAS, the Satellite-Based Augmentation System.
"Augmentation System" is a common, if not very specific, term for GPS
correction systems. The topic of GPS correction systems is sort of complicated,
and we're going to celebrate 04/20 by going into them in some depth. Speaking
of celebrating 04/20, this might be more of a ramble than a well-formed
narrative, but there's a lot of interesting ideas related to PNT correction.
First, let's start with the concept of GPS augmentation systems. GPS fixes are
subject to a number of sources of error. A high-quality GPS receiver with a
sufficiently long observation time can eliminate most of the receiver-based
positioning error, but there remain some pernicious errors which are hard to
eliminate. Two of the most prominent are orbital ephemeris error and atmospheric
effects.
First, orbital ephemerides: to produce a GPS fix, receivers need to know the
locations of the GPS satellites in orbit. To provide this information, ground
stations observe the locations of GPS satellites and produce orbital
ephemerides, sets of parameters that describe each satellite's elliptical path
in relation to the earth. As the root of the term suggests (it's the same one
as in "ephemeral"), these are "point in time" measurements that describe the
current path of the satellites. Realities of orbital mechanics, such as
gravitational perturbations, mean that satellite orbits are prone to change,
and satellites sometimes maneuver to correct their orbits. This is why the
ephemerides are determined by observation, and why these observations are
carried out on a continuous basis.
The GPS operators at the Second and Nineteenth Space Operations Squadrons
regularly upload updated ephemerides to the GPS satellites, which transmit them
for receivers to use. For practical reasons the ephemerides are transmitted at
a very low bitrate and can take some time to receive, which can contribute to
"cold start" times on traditional GPS receivers of upwards of ten minutes.
Most modern GPS receivers use some form of "assisted GPS" to reduce this time,
with the most common example being the widespread practice of smartphone GPS
receivers obtaining the current ephemerides from a web service instead of
"waiting for them to come around" on the low-rate GPS data feed.
The problem is that new ephemerides are only produced hourly, and satellite
orbits vary at a fine level too rapidly for hourly observations to keep up
with. Additional latency in the upload process means that the ephemerides
transmitted by GPS satellites can be several hours old. The observations are
somewhat limited in precision anyway, considering the incredible precision
modern GPS receivers are capable of. So, variation of the satellites' actual
orbits from their published ephemerides leads to ephemeris error in GPS fixes.
Ephemeris error isn't actually that big of a source of error, but it's an
interesting one so I wanted to talk about it anyway. Besides, it has
introduced us to the term "assisted GPS" or "AGPS," which is unrelated to
but often confused with augmentation systems. Assisted GPS just refers to
the use of an alternate data path, such as IP, to obtain the current GPS
network information that receivers need to make a first fix.
Second are atmospheric effects. GPS positioning relies on extremely accurate
time-of-flight measurements. Unfortunately, the atmosphere is a weirdly
complex thing, and numerous effects (such as "weather") result in radio
frequency radiation traveling by indirect, non-linear paths. This problem
is especially acute when dealing with satellites since, well, they're up in
space, and so the received signals have to travel through a whole lot of
atmosphere to get down here. This includes the ionosphere, which interacts
with RF in particularly strange ways and can add significant travel time.
There are a number of approaches to reducing these errors, but one of the most
common is differential GPS or DGPS. DGPS relies on a convenient property of
most sources of GPS error: satellite position, satellite clock drift, and
atmospheric conditions all tend to contribute a pretty similar error across a
region. This means that if you can measure the error of GPS fixes in one place,
the error at other nearby places is probably pretty similar. This is exactly
what DGPS does: a reference station with a precisely surveyed location
uses a high-quality GPS receiver to obtain very accurate fixes. The difference
between the GPS fix and known location is then distributed as a correction
signal that can be applied by other GPS receivers in the same area, canceling
much of the inaccuracy. DGPS is one of the most common types of augmentation
system, and is widely used by higher-quality GPS receivers.
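The arithmetic at the heart of DGPS is simple enough to sketch in a couple of shell commands. All of the numbers here are invented, and I'm collapsing the correction down to a single one-dimensional position offset when real DGPS corrections are per-satellite pseudorange adjustments, but the principle is the same:

```shell
# Reference station: surveyed position vs. what its GPS receiver reports
# (toy one-dimensional values, in meters)
KNOWN=1000.00
MEASURED=1003.20
CORR=$(awk -v k=$KNOWN -v m=$MEASURED 'BEGIN { printf "%.2f", k - m }')

# A nearby rover applies the broadcast correction to its own raw fix,
# canceling the error the two receivers have in common:
ROVER_RAW=2051.10
awk -v r=$ROVER_RAW -v c=$CORR 'BEGIN { printf "%.2f\n", r + c }'
```

A real correction signal carries per-satellite data and validity times, but the cancel-the-shared-error idea is all there is to it.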
Numerous augmentation systems exist, which makes them more complicated to talk
about. The big issue is how the correction signal is distributed. Historically,
one of the earliest major DGPS systems was the NDGPS or Nationwide DGPS. NDGPS
is operated by the US Coast Guard (having originally been developed for
maritime navigation) and uses a network of sites across the US that transmit
correction signals at around 300 kHz. NDGPS has a great historical detail: when
it was expanded from coastal stations only to a nationwide system, a great deal
of the new inland correction stations were installed on recently retired sites
of the Ground Wave Emergency Network (GWEN). GWEN was a short-lived survivable
radio system operated by the military for continuity of government and command
and control purposes, and much of the modern conspiracy theory around 5G can be
directly traced to historic conspiracy theories surrounding GWEN as a
government mind control system.
NDGPS is now being retired, though, as WAAS is viewed as a replacement. WAAS,
the Wide Area Augmentation System, is the FAA's version of the same concept,
intended for aviation use. Most WAAS reference stations are installed on the
roofs of air route traffic control centers, with some others added as needed to
fill gaps. WAAS differs from NDGPS in an important way: WAAS correction signals
are actually distributed by three different commercial communications
satellites. WAAS comes from space, just like GPS itself.
This is a good time to point out an odd bit of terminology: DGPS and
augmentation systems are essentially the same thing, but for largely historic
reasons "DGPS" usually refers to correction signals distributed by terrestrial
radio while "augmentation system" usually refers to correction signals
distributed by satellite.
You can probably see where this goes with Inmarsat. WAAS provides complete
coverage of North America, but not of elsewhere. Inmarsat operates equivalent
space-based augmentation signals as a contractor to several national
governments, including Australia, New Zealand, and likely soon the UK. And
that's what broke: some kind of problem with the satellite caused a
disruption of the SBAS feed for Australia.
While we're talking about GPS augmentation, we should also talk briefly about
RTK, Real-Time Kinematic correction. RTK, DGPS, and augmentation are sometimes
used almost synonymously, but once again there is a conventional difference in
the meanings. RTK most literally refers to the observation of the phase of
the GPS carrier signal. The phase can be used as additional timing information
to
estimate the distance between the receiver and the satellite. Microwave GPS
signals have a short wavelength (about 19cm), and electronics for observing
phase difference can be made very precise, so this method can produce extremely
accurate fixes. The problem is that RTK only tells you where the receiver is
within a wavelength, or in other words it tells you where you are in a 19cm
window, but not which 19cm window you're in.
To resolve this mystery, RTK is almost always used with a reference station so
that the phase can be compared between a fixed point (with a location
determined through long, 48-hour+ observation) and the moving receiver used for
surveying in the field. Since RTK is mostly used by surveyors who are striving
for huge accuracy and aren't in a hurry, it's most often done with a portable
reference station installed nearby for the duration of the surveying project.
If you've ever run into a skywards antenna connected to a chained-down Jobox
with a solar panel, it's very likely an RTK reference station installed for
some highway construction project. While RTK tends to imply DGPS, it's a
separate technique, and produces even higher precision than DGPS or
augmentation. It's also possible to use RTK without a reference station at all,
by analyzing fixes from multiple locations later... a method referred to as
virtual reference station.
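The integer ambiguity is easier to see with a toy calculation. These numbers are all invented (though the wavelength is roughly right for the L1 carrier), and they show how several candidate ranges are equally consistent with a single phase measurement:

```shell
# Toy illustration of the RTK integer ambiguity.
# A phase measurement pins down the fractional part of the range in
# wavelengths, but not the whole number of wavelengths N:
WL=0.19    # approximate L1 carrier wavelength, meters
FRAC=0.42  # measured carrier phase, as a fraction of one cycle
for N in 105263157 105263158 105263159; do
  awk -v n=$N -v w=$WL -v f=$FRAC \
    'BEGIN { printf "N=%d -> range %.4f m\n", n, (n + f) * w }'
done
```

Each candidate differs by exactly one wavelength, about 19cm; figuring out which N is correct is exactly what the reference station comparison is for.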
So how much does this stuff actually impact our lives? Well, for the most part
we tend to use smartphones for PNT in our daily lives, and smartphones have a
somewhat different bag of tricks that relies on the cellular network to assist
positioning. The thing is, smartphone GPS receivers really aren't very accurate
at all. They rely on network assistance not for precision but in order to
obtain any fix at all. Smartphones are constantly used in situations like urban
environments and even indoors where GPS reception is poor and subject to huge
multipath error. Cellular receivers bootstrap based on hints from the network
(using the known locations of cellular towers) to accelerate GPS fix, and to
produce a fix at all when they can't detect signals from a sufficient number of
GPS satellites.
And that's it on PNT for today, but maybe I'll come back to shill more for
eLORAN in the future.
One of the foundational goals of computer technology, at least as understood by
popular culture, is to automate away our jobs. When your job is (put simply) to
get computers to work correctly, there's sort of an irony to this view of
things. Still, just within the span of my career there have been technological
advancements that have both automated away parts of my job (Puppet might be an
example here, the prior art of CFEngine notwithstanding) and given me plenty
more to bill time for (Kubernetes).
With the rise of large language models (LLMs), there is a large camp of people
who think these ML applications are going to automate away larger portions of
more jobs. I find myself curious: should I fear for my employment?
The thing is, a large portion of my profession could be accurately described as
"Googling it," although lately I'm racking up a Kagi bill instead. Obviously
this is an important strategy for things like error messages where it's often
faster to see if someone has solved the same problem before than to figure it
out from first principles. But it's sort of true for routine tasks as well...
maybe my memory is just unusually poor, but there's a huge set of tasks where I
will repeatedly look up the same Digital Ocean guide because I just want to
copy-paste their config file snippets instead of having to look in docs to
remember the names of the right directives.
Of course, what makes me a "professional" is that I have opinions about the
correct ways certain things should be done, so I often ignore parts of these
guides or make changes to suit my preferences on important issues like Unix
domain sockets or localhost network sockets for communication with application
servers. In a way it's sort of silly to follow some guide on
best-linux-tutorials.biz and then just ignore the parts I disagree with but,
well, that's what I tend to do. So I wonder: would a better approach be an LLM?
In today's episode, I'm going to ask ChatGPT for guides for some increasingly
complex Linux sysadmin and DevOps tasks and then see whether or not I agree
with its output. I don't really intend this as that serious of a critique of
the use of ML in operations but hey, maybe we'll learn something useful!
The LEMP Stack
Back when I was a child first learning to hate software, the dominant stack for
websites was "LAMP." This stood for Linux, Apache, MySQL, PHP. Nowadays it
seems that at least a few people are saying "LEMP" for Linux, eNginx,
Mymariadb, PHP? I don't like this, aesthetically. In any case, there's a number
of things in PHP that I tend to deploy a lot, Dokuwiki being a prime example.
This means that I am repeatedly wanting to set up Nginx to proxy requests to
PHP-FPM, which is not difficult but does involve remembering a couple of config
directives and where php-fpm puts its socket by default. Most of the time I
don't actually want an RDBMS (personally I usually just use sqlite for
everything) so for a long time I've googled for some guide and copied their
snippets while ignoring the parts about MySQL/MariaDB. Let's see if ChatGPT
has something more to the point for this very common and simple task.
Prompt: "how do I set up nginx and php?"
To set up Nginx and PHP on your server, follow these steps:
Install Nginx on your server:
For Ubuntu/Debian:
sudo apt update
sudo apt install nginx
For CentOS/RHEL:
sudo yum update
sudo yum install nginx
The output starts off strong by offering snippets for both "Ubuntu/Debian" and
"CentOS/RHEL." These two cover the good majority of the Linux server landscape,
and while I could quibble with the label "CentOS/RHEL" rather than something
that doesn't invoke the mostly-dead CentOS project like "RHEL/Fedora," ChatGPT
is following the same convention most people do.
The instructions it provides are pretty reasonable, but there's something
interesting here that those of you familiar with the two dominant Linux package
manager families might detect. For Debian, it tells us to 'update' and then
'install.' For RHEL, it tells us to 'update' and then 'install.' These are
neatly parallel, except that the 'update' subcommands of apt and yum do fairly
different things! On Debian the pair will retrieve the repo metadata and then
install nginx (upgrading installed software is a different subcommand, 'apt
upgrade'); on RHEL it will upgrade all of the managed software on the system
and then install nginx (repo metadata is retrieved automatically by yum when
it exceeds a configured 'stale' timeout).
I'm not sure how much we can make of this. It seems likely to me that we are
seeing ChatGPT's lack of understanding of the underlying material: it is
extremely common for people to 'update' and then 'install' on both platforms,
so each in isolation is pretty reasonable, but it's odd for it to put them in
parallel without noting that they will do different things. If the 'yum update'
were an intentional choice we should see 'apt upgrade,' but we don't.
Install PHP and the required PHP extensions:
For Ubuntu/Debian:
sudo apt install php-fpm php-mysql php-mbstring
For CentOS/RHEL:
sudo yum install php-fpm php-mysql php-mbstring
This is a little odder. It says "the required PHP extensions," which are
apparently mysql and mbstring. It's not clear where or why these are
required, which makes me think ChatGPT got this command from some guide on
deploying specific software. WordPress might be to blame here, as it requires
mbstring, and the sheer number of LAMP/LEMP guides probably accounts for the
assumption that we will need the mysql extension.
Configure Nginx to use PHP:
Create a new server block configuration file in /etc/nginx/sites-available directory with the following content:
server {
listen 80;
server_name example.com;
root /var/www/html;
index index.php index.html index.htm;
location / {
try_files $uri $uri/ /index.php?$query_string;
}
location ~ \.php$ {
include snippets/fastcgi-php.conf;
fastcgi_pass unix:/var/run/php/php7.4-fpm.sock;
}
}
Replace example.com with your domain name and php7.4 with your installed PHP version.
This is pretty much what you would find in any guide. One interesting thing is
the 'try_files' directive. There are about a million ways to set this up in
nginx and some people prefer different approaches, but what we see here is
pretty typical of php setup guides and should work fine. The catch is
explicitly passing the $query_string which sort of gets eaten otherwise when
nginx internally redirects to use index.php.
Test your Nginx configuration:
sudo nginx -t
Reload Nginx to apply the new configuration:
sudo systemctl reload nginx
Hey, nice work! A surprising number of people don't know about '-t' to validate
config files.
Create a PHP file to test if everything is working:
sudo nano /var/www/html/info.php
Add the following content:
<?php
phpinfo();
?>
Open your web browser and navigate to http://your-server-ip/info.php. You should see the PHP information page.
That's it! You now have Nginx and PHP installed and configured on your server.
So yeah, this isn't exactly part of the install process, but still very
normal for guides.
Overall I'd say this is about an 8/10. The only things I disliked were:

- Telling the user to upgrade packages on RHEL... it's not unusual for guides
to update first "for good measure," but I personally feel that it's not great
practice to tell the user to 'yum update' without explaining that it will
update everything.

- Telling the user to install mysql and mbstring, which aren't really going to
do any harm, but if you're going to include a "common" set of PHP extensions
in the instructions it should be bigger than just these two. It's just odd.
'du' and 'df' disagree
This is actually an interview question I like to use for Linux-intensive
positions. It might seem a little in the weeds but to be honest I think I run
into this situation in real life probably at least once a month. The problem
looks like this: tools like 'df' tell you that a partition is 99% or 100% full,
and you might be having problems because of the full partition. But when you
use 'du' to investigate, it shows a total usage that is appreciably less than
'df' does. Why do these two-letter commands not get along? Is dd somehow the
answer?
Prompt: On Linux, a partition is full but du doesn't show as much data as df does.
If you are seeing a discrepancy between the output of du and df on a Linux
system, where df reports that a partition is full but du does not show as
much data, it's possible that there are files that are being held open by
processes and therefore are not being deleted even though they have been
unlinked (deleted).
This actually isn't the answer that I expected, but it's a reasonable one.
This could very well be the cause; I just find that this situation is less common
than having a mount that "shadows" data on the volume at the next higher mount
point. For example, say you have a volume mounted at / and a volume mounted at
/home. If you have some kind of problem that results in /home failing to mount,
or changed the filesystem setup at some point, or even have weird boot-order
problems, you can sometimes end up with data in /home/ on the volume mounted at
/. This data counts against the usage of the volume at / but won't show up in
tools like 'du' since it's "shadowed" by /home/ now being a mountpoint to
another volume. Easy problem to have if you are rearranging the volumes/mounts
on a system that's already been in use.
I'll omit the rest of the output, but it makes sense. It tells us to use 'lsof'
to figure out what process is keeping the files open and then kill that
process. The guide does include the requisite paragraph that you need to
investigate to determine how this happened in the first place.
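The open-but-deleted situation ChatGPT describes is easy to reproduce, if you want to see it for yourself. This sketch is Linux-specific and pokes at /proc directly, so it doesn't depend on lsof being installed:

```shell
# Reproduce the open-but-unlinked file case:
tmp=$(mktemp)
echo "some data" > "$tmp"
exec 3<"$tmp"     # hold the file open on file descriptor 3
rm "$tmp"         # unlink it; the inode and its blocks stay allocated
# du can no longer see the file, but df still counts its space.
# The kernel shows the dangling reference, marked "(deleted)":
ls -l /proc/$$/fd/3
exec 3<&-         # closing the descriptor finally frees the space
```

While the descriptor is open, df charges the space to the filesystem, but du, which walks the directory tree, never sees the file.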
Overall I'm giving this a 5/10, because it gave a reasonable answer but omitted
the possibility of mount shadowing. Which of these two causes is more likely
depends on the system and workload, but I think a thorough answer should
include both possibilities.
Per-process network policy
One sort of common "advanced" Linux networking scenario is when you are using a
full-tunnel VPN and want to route all traffic through it, but you have to get
the VPN itself to connect to its endpoint without trying to go through itself.
There are a few ways to solve this problem, but one of the less common and (in
my opinion) more elegant approaches is to get the VPN service to use its own
special routing table. This same concept also makes it easy to set up firewall
policy around the VPN to prevent leakage. Now there are a lot of caveats to
this and I'm really just talking about userspace VPNs here, but that probably
makes it a good challenge for ChatGPT.
Prompt: How do I configure per-process route tables on Linux
To configure per-process route tables on Linux, you can use the ip command and the setsockopt() system call.
This will create a new routing table with the name mytable and the ID 200.
It maybe seems a little weird that this is the way to create a routing table
but it's fine. The iproute2 stuff that facilitates multiple routing tables
without multiple headaches is pretty new and there isn't a lot of porcelain
around it... and I'm not sure there ever will be, this isn't a very common task
and while editing the file seems a little old-school compared to most of the
contemporary network tooling it works just fine. There's very little need to
expose this over a bus or anything.
Configure the routes for the new table:
sudo ip route add default via dev table mytable
This command will add a default route to the new table with the specified gateway and interface.
I will take this moment to make a couple of funny observations about the
mechanics of ChatGPT's output. First, every step in this guide is numbered
1. Some things here are probably copy-paste errors on my part (I'm reformatting
the output to look better in plaintext), but that isn't one of them: this
output has four step ones. Another is the code blocks. I'm just using
plaintext, but ChatGPT
uses nice embedded code blocks with syntax highlighting. Syntax highlighting
means that it needs to know (or guess) the language it's highlighting and
the codeblocks show what language they're set to. It is... seemingly random?
The above shell command is 'csharp', but 'sql' seems to be a more common
guess for shell commands than even 'bash'. This just means that most of the
time the syntax highlighting doesn't really make sense.
I'm not sure where either of these are going wrong, but I get the feeling
(without doing any actual research) that ChatGPT is outputting markdown.
Markdown numbered lists can be fiddly and there are a few ways to end up with
all step 1 like this. When it comes to the code blocks, I'm not sure if ChatGPT
is specifying the language or if the code block service is guessing it based on
contents. I tend to guess the former because otherwise the guesses are very
weird. I can't see any reason it would think one of these commands is csharp
and the other is sql, they both have the same set of characters in them.
And yeah, this instruction makes perfect sense. You need to add some routes to
the new route table.
Configure the process to use the new table:
sudo ip rule add from table mytable
This command will add a new rule to use the mytable routing table for traffic originating from the specified IP address.
:thinking_emoji:
There's some clear semantic confusion here. I think ChatGPT is just going
too readily to the many guides it's read that create a route table and then
apply that route table based on ip rules. This is a very common thing to do,
I would wager the most common way to use custom route tables, but it doesn't
help in our case. The intro of this step is also just wrong (this has nothing
to do with the process), but the sentence after correctly explains what it does.
Finally, start the process with the setsockopt() system call to bind it to the new table:
This code will bind the process to the eth0 interface and mark its packets with the mytable routing table ID.
This is, well, it's not exactly wrong but it's also not right.
If I fiddled with the prompt I'm sure I could get it to output something that
makes more sense, but I also feel like this is an unreasonable way to interpret
the prompt I did provide. To be clear, I expected that it would use cgroups and
iptables to apply the new custom route table to some existing application. This
seems like a way more common situation than to... write or modify an existing
application in C. On the upside, the code block correctly has the language as
'c'!
All that said, if we start from the assumption that we are writing a new
application this still doesn't work exactly. The option we see here, SO_MARK,
applies a 'mark' in the sense of netfilter to traffic on the socket. Marks are
a super useful feature that allows us to set an arbitrary tag on each packet
that we can then refer to in our other network policy. A very common use for
marks is to match the traffic up with an iptables rule that sets the routing
table for that traffic... but ChatGPT didn't tell us to set that rule, it had
us set the rule based on source port instead. I think maybe ChatGPT was
trying to cover this in step 1 number 3 but it didn't quite choose the right
rule.
Repeat steps 3 and 4 for each process that needs to use the new routing table.
There are no steps 3 and 4! There is only step 1!
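For contrast, here's a sketch of the mark-based approach I had in mind, which connects the same pieces ChatGPT had in a way that actually works. The table name, mark value, gateway, interface, and the 'vpn' user are all invented for illustration, and all of this needs root:

```shell
# 1. Create the custom table and give it its own default route:
echo "200 vpntable" | sudo tee -a /etc/iproute2/rt_tables
sudo ip route add default via 192.0.2.1 dev eth0 table vpntable

# 2. Mark packets from the VPN process; matching on the owning UID is
#    one convenient way to identify them:
sudo iptables -t mangle -A OUTPUT -m owner --uid-owner vpn \
     -j MARK --set-mark 0x1

# 3. Send marked traffic through the custom table:
sudo ip rule add fwmark 0x1 table vpntable
```

The 'ip rule add fwmark' line is the missing link: it ties the netfilter mark to the custom routing table, instead of selecting traffic by source address.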
Overall I'd give this a 3/10. Honestly I think ChatGPT did better than I
expected when considering that this is not a very common use-case, but
ChatGPT's output is worse than what you get if you google the same question.
Takeaways
This isn't intended to be a super carefully thought out research piece, but
like others, I've seen reporting that suggests that ChatGPT is pretty good at
generating source code but has a tendency to leave subtle errors. I wondered
if the same thing might be true in the old-school system administration space,
but the feeling I walked away with is a little different: ChatGPT does make a
lot of errors, but they aren't particularly subtle. I suspect this has to do
with the different nature of the training material for these kinds of questions,
which is more in the form of narrative guides and documentation that keep steps
simple but leave a lot of assumptions.
I pretty much covered every question I thought to ask in a short time spent
playing around. There was one question that I was just unable to get ChatGPT to
generate a good answer for... an AWS network architecture question involving
load balancing for both HTTP(S) and arbitrary TCP services on the same elastic
IP when you want layer 7 behavior. I could mark this as a 0/10 for ChatGPT but
it might be a case of more "prompt engineering" being required... it kept
producing output that was reasonable but failed to address one or more of my
requirements, so I kept making the requirements more explicit, and then it
would just miss a different set of requirements. It may very well be possible
to get ChatGPT to produce a correct solution but it was definitely getting to
be less useful than a search engine, even with how difficult the AWS
documentation can be to use for architecture questions.
On the whole, I felt that ChatGPT was performing more poorly than Google for
similar queries. It's possible to get ChatGPT to refine its output by adding
more detail (at least if you don't get too deep into AWS networking
capabilities), which is a big plus over a conventional search engine, but
honestly it still didn't feel to me like this was a savings of effort over
reading a few different articles and synthesizing.
One of the reasons this was on my mind is because I'm working with a client
right now who has an interesting habit of copying and pasting all the error
messages they get into ChatGPT, while still screen sharing. From this sort of
eavesdropping on AI I have not been very impressed with its output, which has
often been high-level to the point of uselessness. It sort of has the feel
of AWS documentation, actually... they would put a very specific error from,
let's say, Terraform into ChatGPT, and it would answer with a few paragraphs
about how Terraform works at a high level. I'm sure this can be improved with
a more specific training corpus but I'm not sure what that corpus would be,
exactly, which continues to stymie my plans to just forward all the emails I
get from clients to ChatGPT and give them the response.
I mean, I think it would keep them going back and forth for a while, but they
might feel like it's a good value for money.
Lately I tend to stick to topics that are historic by at least twenty years,
and that does have a lot of advantages. But I am supposedly a DevOps
professional, and so I will occasionally indulge in giving DevOps advice... or
at least opinions, which are sort of like advice but with less of a warranty.
There's been a lot of discussion lately about Docker, mostly about their
boneheaded reversal following their boneheaded apology for their boneheaded
decision to eliminate free teams. I don't really care much about this event in
terms of how it impacts my professional work. I long ago wrote off Docker, Inc.
as a positive part of the DevOps ecosystem. But what's very interesting to me
is how we got here: The story of Docker, Docker Inc., Docker Hub, and their
relation to the broader world of containerization is endlessly fascinating to
me.
How is it that Docker Inc., creator of one of the most important and ubiquitous
tools in the modern software industry, has become such a backwater of
rent-seeking and foot-shooting? Silicon Valley continually produces some
astounding failures, but Docker stands out to me. Docker as a software
product is an incredible success; Docker as a company is a joke; and the
work of computing professionals is complicated by the oddly distant and yet
oddly close connection between the two.
Docker, from a technical perspective, is more evolutionary than revolutionary.
It mostly glued together existing Linux kernel features, following a road that
had at least been graded, if not paved and striped, by projects like LXC.
Docker as a concept, though, had a revolutionary impact on the DevOps field.
Docker quickly became one of the most common ways of distributing server-side
software, and whole development workflows rearranged themselves around it.
Orchestration tools like the ones we use today are hard to picture without
Docker, and for many professionals Docker is on par with their text editor as a
primary tool of the trade.
But underlying all of this there has always been sort of a question: what is
Docker, exactly? I don't necessarily mean the software, but the concept. I have
always felt that the software is not really all that great. Many aspects of
Docker's user interface and API seem idiosyncratic; some of the abstraction it
introduces is more confusing than useful. In particular, the union file system
(UnionFS) image format is a choice that seems more academically aspirational than
practical. Sure, it has tidy properties in theory, but my experience has been
that developers spend a lot more time working around it than working with it.
All this is to say that I don't think that Docker, the tool, is really all that
important. In a different world, LXC might have gained all this market share.
Had Docker not come about, something like containerd would likely have emerged
anyway. Or perhaps we would all be using lightweight VMs instead; academic and
commercial research tends to show that the advantages containers have over more
conventional paravirtualization are far smaller than most believe.
I would argue that the Docker that matters is not software, but a concept. A
workflow, you might say, although I don't think it's even that concrete. The
Docker that swept DevOps like a savior come to spare us from Enterprise
JavaBeans isn't really about the runtime at all. It's about the images, and
more about the ease of programmatically creating images. Much of this benefit
comes from composition: perhaps the most important single feature of Docker
is the FROM keyword.
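To illustrate the point, here is a minimal Dockerfile sketch of composition via FROM; the image names are hypothetical examples, not recommendations:

```dockerfile
# Composition via FROM: this image inherits all of the base image's
# layers and only adds its own changes on top.
# "python:3.12-slim" is an illustrative base image.
FROM python:3.12-slim

# Each instruction below becomes a new layer over the base.
COPY app.py /srv/app.py
CMD ["python", "/srv/app.py"]
```

The entire value proposition is in that first line: someone else solved the base-system problem, and you just build on top of it.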
So Docker is an open-source software product, one that is basically free (as in
beer and as in freedom) although hindered by a history of messy licensing
situations. Docker is also a company, and companies are expected to produce
revenue. And that's where other facets of the greater identity we call "Docker"
come to light: Docker Desktop and Docker Hub.
Docker Desktop isn't really that interesting to me. Docker is closely coupled
to Linux in a way that makes it difficult to run on the predominant platform
used by developers [1]. Docker Inc. developed Docker Desktop, a tool that runs
Docker in a VM using fewer clicks than it would take to set that up yourself
(which is still not that many clicks). Docker Inc. then needed to make money,
so they slapped a licensing fee on Docker Desktop. I responded by switching to
Podman, but I get that some people are willing to pay the monthly fee for the
simplicity of Docker Desktop, even if I feel that the particular implementation
of Docker Desktop often makes things harder rather than easier.
Also I find the Docker Desktop "GUI" to be incredibly, intensely annoying,
especially since Docker Inc. seems to pressure you to use it in a desperate
attempt to dig what Silicon Valley types call a moat. But I fully acknowledge
that I am a weird computer curmudgeon who uses Thunderbird and pines for the
better performance of, well, pine.
Still, the point of this tangent about Docker Desktop is that Docker's decision
to monetize via Desktop---and in a pretty irritating way that caused a great
deal of heartburn to many software companies---was probably the first tangible
sign that Docker Inc. is not the benevolent force that it had long seemed to
be. Suddenly Docker, the open-source tool that made our work so much easier,
had an ugly clash with capitalism. Docker became a FOSS engine behind a
commercial tool that Docker Inc. badly wanted us to pay for.
Docker Desktop also illustrates a recurring problem with Docker: the borders
between free and paid within the scope of their commercial products. Docker
Desktop became free for certain use-cases including personal use and use in
small businesses, but requires a paid subscription for use in larger companies.
This kind of arrangement might seem like a charitable compromise but is also
sort of a worst-of-both-worlds: Docker Desktop is free enough to be ubiquitous
but commercial enough to pose an alarming liability to large companies. Some
companies exceeding Docker's definition of a small company have gone as far as
using their device management tools to forcibly remove Docker Desktop, in order
to mitigate the risk of a lawsuit for violating its license.
There is a fundamental problem with "free for some, paid for others": it
requires that users determine whether or not they are permitted to use the tool
for free. Even well-intentioned users will screw this up when the rules require
knowledge of their employer's financials and, moreover, are in small print at
the very bottom of a pricing page that says "free" at the top. Personally, I
think that Docker Inc.'s pricing page borders on outright deception by making
the licensing restrictions on Docker Desktop so unobvious.
Docker Hub, though: Docker Hub is really something.
That most compelling feature of Docker, the ability to easily pull images from
somewhere else and even build on top of them, depends on there being a place to
pull images from. It's easy to see how, at first, Docker Inc. figured that the
most important thing was to have a ubiquitous, open Docker registry that made
it easy for people to get started. In this way, we might view Docker Hub as
having been a sort of scaffolding for the Docker movement. The fact that you
could just run 'docker pull ubuntu' and have it work was probably actually
quite important to the early adoption of Docker, and many continue to depend
on it today.
Docker Hub, though, may yet be Docker's undoing. I can only assume that Docker
did not realize the situation they were getting into. Docker images are
relatively large, and Docker Hub became so central to the use of Docker that it
became common for DevOps toolchains to pull images to production nodes straight
from Docker Hub. Bandwidth is relatively expensive even before cloud provider
margins; the cost of operating Docker Hub must have become huge. Docker Inc.'s
scaffolding for the Docker community suddenly became core infrastructure for
endless cloud environments, and effectively a subsidy to Docker's many users.
It's hard to blame Docker Inc. too much for flailing. Docker Hub's operating
costs were probably unsustainable, and there aren't a lot of options to fix
this other than making Docker Hub expensive, or making Docker Hub worse, or
both. Docker Inc. seems to have opted for both. Docker Hub is not especially
fast, in fact it's pretty slow compared to almost any other option. Docker Hub
now imposes per-IP quotas, which probably would have been totally reasonable at
the start but were a total disaster when introduced post hoc and suddenly
caused thousands, if not millions, of DevOps pipelines to intermittently fail.
Docker Inc.'s goal was presumably that users would start using paid Docker
plans to raise the quotas but, well, that's only attractive for users that
either don't know about caching proxies or judge the overhead of using one to
be more costly than Docker Hub... and I have a hard time picturing an
organization where that would be true.
That's the strange thing about Docker Hub. It is both totally replaceable and
totally unreplaceable.
Docker Hub is totally replaceable in that the Docker registry API is really
pretty simple and easy to implement in other products. There are tons of
options for Docker registries other than Docker Hub, and frankly most of them
are much better options. I'm not just saying that because GitLab [2] has a
built-in Docker registry, but that sort of illustrates the point. Of course
GitLab has a built-in Docker registry, it's no big deal. It's not even that
GitLab introduced it as a competitor to Docker Hub, that's sort of absurd,
Docker Hub doesn't even really figure. GitLab introduced it as a competitor to
Sonatype Nexus and JFrog Artifactory, to say nothing of the Docker registries
offered by just about every cloud provider. For someone choosing a Docker
registry to deploy or subscribe to, Docker Hub has no clear advantage, and
probably ranks pretty low among the options.
And yet Docker Hub is the Docker registry, and the whole teetering tower of
DevOps is deeply dependent on it! What an odd contradiction, and yet it's
completely obvious why:
First, Docker Hub is free. Implausibly free, and as it turns out, probably
unsustainably free. There's an old maxim that if you're not paying, you're the
product. But Docker Hub reminds us that in the VC-driven (and not particularly
results-driven) world of Silicon Valley there is a potent second possibility:
if you're not paying, there may be no product at all. At least not once your
vendor gets to the end of the runway [3].
Second, Docker Hub is the default. Being the default can be a big deal, and
this is painfully true for Docker. The dominance of short, convenient
"user/image" or even just "image" references is so strong that Docker image
references that actually specify a registry feel almost like an off-label
hack, a workaround for how Docker is really supposed to be used. What's more,
Docker Hub's original quotas (or rather lack thereof) left no need for
authentication in many situations, so having to authenticate to a registry also
feels like an extra hassle. Many tools built around Docker don't make the use
of a non-Docker Hub registry, or any authentication to a registry, as
convenient as it probably should be. Tutorials and guides for Docker often omit
setup of any registry other than Docker Hub, since Docker Hub is already
configured and has everything available in it. You only find out the mistake
you've made when your pipelines stop working until the quota period resets, or
worse, pulls in production start failing and you have to hope you're lucky
enough to check the Kubernetes events before digging around a dozen other
places.
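The pull of the default is visible in how short references expand. Here is a sketch (not Docker's actual code, but following its documented defaults: missing registry means docker.io, a bare name means the "library" namespace, missing tag means "latest") of how a bare name quietly resolves to Docker Hub:

```python
def normalize_image_ref(ref: str) -> str:
    """Expand a short Docker image reference to its fully qualified form.

    Sketch of Docker's documented defaults; digests and other edge
    cases are ignored for simplicity.
    """
    # Split off the tag if present on the last path component.
    if ":" in ref.split("/")[-1]:
        name, tag = ref.rsplit(":", 1)
    else:
        name, tag = ref, "latest"

    parts = name.split("/")
    # A first component containing a dot or colon, or "localhost",
    # is treated as a registry host; anything else defaults to Docker Hub.
    if len(parts) > 1 and ("." in parts[0] or ":" in parts[0]
                           or parts[0] == "localhost"):
        registry, remainder = parts[0], "/".join(parts[1:])
    elif len(parts) == 1:
        registry, remainder = "docker.io", f"library/{name}"
    else:
        registry, remainder = "docker.io", name

    return f"{registry}/{remainder}:{tag}"

# "ubuntu" silently becomes a Docker Hub reference:
print(normalize_image_ref("ubuntu"))      # docker.io/library/ubuntu:latest
print(normalize_image_ref("myuser/app"))  # docker.io/myuser/app:latest
print(normalize_image_ref("registry.example.com/team/app:v2"))
```

The asymmetry is the point: the Docker Hub form is two syllables, and the anything-else form is a fully qualified hostname.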
So the solution to the Docker Hub problem is obvious: stop using Docker Hub. It
was probably a bad idea all along. But the reality of the situation is much
harder. Moving off of Docker Hub is a pain, and one that has a way of staying
pretty far down priority lists. Docker Hub references, or rather references
with no registry at all that default to Docker Hub, are so ubiquitous that any
project moving their official builds off of Docker Hub will probably break a
tremendous number of downstream users.
Docker Inc.'s behavior with Docker Desktop and especially Docker Hub feels like
rent-seeking at best, and potentially extortionate. It's not exactly fair to
blame all of this on Docker Inc.; both commercial users and the open-source
community should have foreseen the retrospectively obvious risk of Docker
actually thinking about the economics. Nonetheless, a cynical and not entirely
unreasonable take on this story is that Docker hoodwinked us. Perhaps Docker has
simply stumbled upon the "Embrace, Extend, Extinguish" of our age: employ FOSS
software defaults and lazy developer practices (that were inculcated by Docker's
documentation) to make everyone dependent on Docker Inc.'s free registry, then
tighten the quota screws until they have no choice but to pay up. This is a
very cynical take indeed! I don't really believe it, mostly because it involves
far more strategic vision than I would credit Docker Inc. with.
I decided to write about this because I think there are lessons to be learned.
Important lessons. No doubt some of this problem is directly attributable to
the economic conditions that dominated Silicon Valley for the last decade.
Docker Inc. probably wouldn't have gotten so far, burning so much money, had
there not been an incredible amount of money to burn. Still, it seems
inevitable that this kind of relationship between open-source software and
corporate strategy, and between free and paid services, will happen again.
I propose these takeaways, as discussion topics if nothing else:
Be skeptical of free services, especially ones that are required for any
part of your business (or open source venture, or hobby project, etc).
Free services should never become a deeply embedded dependency unless there
is very good reason to believe they will remain free. Perhaps the backing of
a large foundation or corporate sponsor with a good history with open source
would count, but even that is no promise. Consider the example of Red Hat,
its acquisition by IBM, and the impact of that business event on projects
previously considered extremely reliable like CentOS.
Free tools that rely on third-party services are only free for the time
being. Sure, this might be obvious, but it's probably a deeper problem than
you realize. Docker never relied on Docker Hub in that it has always been
possible to use other registries. But Docker and the community strongly
encouraged the use of Docker Hub through technical, economic, and social
means. This had the result of making Docker Hub a de facto hard requirement
for many projects and systems.
When writing documentation, guides, blog posts, advice to coworkers, etc.,
think about long-term sustainability even when it is less convenient. I
suspect that the ongoing slow-motion meltdown over Docker Hub would have
been greatly mitigated if the use of multiple Docker registries, or at least
the easy ability to specify a third-party registry and authenticate, were
considered best practices and more common in the Docker community.
[1] I mean macOS, but you can assume I mean Windows and it still works.
[2] My employer whose opinions these are not.
[3] I am here resisting the urge to write a convoluted aviation metaphor.
Something about being passengers on a whale-shaped plane that is hitting the
last thousand feet and still short of V_r, so the captain says we only get
100 builds per 6 hours per IP and the rest are going out the window.
p.s. I took so long to write this so late at night that now the date in the
title is wrong, haha whoops not fixing it
This will probably be a short one, and I know I haven't written for a while,
but it has always been the case that you get what you pay for and Computers
Are Bad is nothing if not affordable. Still, this is a topic on which I am
moderately passionate and so I can probably stretch it to an implausible
length.
Elevator control panels have long featured two buttons labeled "door open" and
"door close." One of these buttons does pretty much what it says on the label
(although I understand that European elevators sometimes have a separate "door
hold" button for the most common use of "door open"). The other usually doesn't
seem to, and that has led to a minor internet phenomenon. Here's the problem:
the internet is wrong, and I am here to set it right. This works every time!
A huge number of articles confidently state that "80% of door close buttons do
nothing." The origin of this 80% number seems to be a 2014 episode of Radiolab
titled "Buttons Not Buttons," which I just listened through while doing
laundry. Radiolab gets the statistic from the curator of an elevator history
museum, who says that most of them "aren't even hooked up." This is reason to
doubt our curator's accuracy. I don't think there is anything malicious going
on here, but I do think there is an element of someone who has been out of the
industry for a while who is at least misstating the details of the issue.
The problem is not unique to Radiolab, though. An Oct. 27, 2016 New York Times
article, "Pushing That Crosswalk Button May Make You Feel Better, but...,"
covers the exact same material as the Radiolab episode, a couple of years later.
And the article was widely repeated in other publications, not by syndication but
by "According to the New York Times..." paraphrasing. This means that often the
repetitions are more problematic than the original, but even the original says:
But some buttons we regularly rely on to get results are mere artifices -
placebos that promote an illusion of control but that in reality do not work.
Many versions of the article lean on this line even harder, asserting that
door close buttons in elevators are installed entirely or at least primarily
as placebos. But the NYTimes article provides brief mention of the deeper,
and less conspiratorial, reality:
Karen W. Penafiel, executive director of National Elevator Industry Inc., a
trade group, said the close-door feature faded into obsolescence a few years
after the enactment of the Americans With Disabilities Act in 1990.
...
The buttons can be operated by firefighters and maintenance workers who have
the proper keys or codes.
There are a few things to cover:
First, anyone who says that the "door close" buttons in elevators are routinely
"not even hooked up" shouldn't be trusted. The world is full of many elevators
and I'm sure some can be found with mechanically non-functional door close
buttons, but the issue should be infrequent. The "door close" button is
required to operate the elevator in fire service mode, which disables automatic
closing of the doors entirely so that the elevator does not leave a firefighter
stranded. Fire service mode must be tested as part of the regular inspection of
the elevator (ASME A17.1-2019, but implemented through various state and local
codes). Therefore, elevators with a "door close" button that isn't "hooked up"
will fail their annual inspections. While no doubt some slip through the cracks
(particularly in states with laxer inspection standards), something that
wouldn't meet inspection standards can hardly be called normal practice and the
affected elevators must be far fewer than 80%.
But perhaps I am being too pedantic. Elevator control systems are complex and
highly configurable. Whether or not the door close button is "hooked up"
is mostly irrelevant if the controller is configured to ignore the button, and
it's possible that some of these articles are actually referring to a
configuration issue. So what can we find about the way elevators are configured?
I did some desperate research in the hopes of finding openly available
documentation on elevator controller programming, but elevator manufacturers
hold their control systems very close to their chests. I was not lucky enough
to find any reasonably modern programming documentation that I could access.
Some years ago I did shoulder-surf an elevator technician for a while as he
attempted to troubleshoot a reasonably new two-story ThyssenKrupp hydraulic
that was repeatedly shutting off due to a trouble code. In the modern world
this kind of troubleshooting consists mostly of sitting on the floor of the
elevator with a laptop looking at various status reports available in the
configuration software. The software, as I recall, came from the school of
industrial software design where a major component of the interface was a large
tree view of every option and discoverability came in the form of some items
being in ALL CAPS.
The NYTimes article, though, puts us onto the important issue here: the ADA.
Multiple articles repeat that door close buttons have been non-functional since
1990, although I think most of them (if not all) are just paraphrasings of this
same NYTimes piece. The ADA is easy to find and section 4.10 addresses elevators.
Specifically, 4.10.7 and 4.10.8 have been mentioned by some elevator technicians
as the source of the "door close" trouble. With some less relevant material
omitted:
4.10.7* Door and Signal Timing for Hall Calls
The minimum acceptable time from notification that a car is answering a call
until the doors of that car start to close shall be calculated from the
following equation:
T = D/(1.5 ft/s) or T = D/(445 mm/s)
where T total time in seconds and D distance (in feet or millimeters) from a
point in the lobby or corridor 60 in (1525 mm) directly in front of the
farthest call button controlling that car to the centerline of its hoistway
door (see Fig. 21).
4.10.8 Door Delay for Car Calls
The minimum time for elevator doors to remain fully open in response to a car
call shall be 3 seconds.
Based on posts from various elevator technicians, it's clear that these ADA
requirements have at least been widely interpreted as stating hard minimums
regardless of any user interaction. In other words, the ADA timing constitutes
the minimum door hold time which cannot be shortened. Based on the 4.10.7
rule, we can see that that time will be as long as ten seconds in fairly normal
elevator lobbies (16 feet, or about two elevators, from door centerpoint to
furthest button). We can read the same in a compliance FAQ from Corada, an ADA
compliance consulting firm:
User activation of door close (or automatic operation) cannot reduce the
initial opening time of doors (3 seconds minimum) or the minimum door signal
timing (based on 1.5 ft/s travel speed for the distance from the hall call
button to car door centerline).
One point here can be kind of confusing. The minimum time for the door to be
fully open is 3 seconds, but the door signal timing is based on the time from
the indication of which elevator has arrived (usually a chime and illuminated
lamp) to the time that the doors start closing. This will be at least a couple
of seconds longer than the minimum door time due to the open and close time of
the door, but since it starts at 5 seconds and goes up from there it will
usually be the longer of the two requirements and thus set the actual minimum
door time. Where this is likely to not be the case are single-elevator setups
where the 5 second minimum timing will apply and the time from chime to door
open eats up the first two seconds... in that case, the 3 second fully open
time will become the limiting (or really maximizing) factor.
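The interaction of the two minimums can be sketched numerically. The 1.5 ft/s speed, the 5 second floor, and the 3 second fully-open rule come from the requirements quoted above; the door travel time is an illustrative assumption:

```python
def ada_minimum_door_timing(hall_call_distance_ft: float,
                            door_travel_s: float = 2.0) -> float:
    """Effective ADA minimum time (seconds) from the arrival chime until
    the doors may start to close.

    hall_call_distance_ft: distance from the farthest hall call button
        to the hoistway door centerline (per 4.10.7).
    door_travel_s: assumed time for the doors to fully open (not an ADA
        figure; varies by installation).
    """
    # 4.10.7: T = D / (1.5 ft/s), with a 5 second floor.
    signal_timing = max(hall_call_distance_ft / 1.5, 5.0)
    # 4.10.8: doors must stay fully open at least 3 seconds, so chime to
    # close must be at least the door travel time plus 3 seconds.
    fully_open_floor = door_travel_s + 3.0
    return max(signal_timing, fully_open_floor)

# A two-elevator lobby (~16 ft to the farthest button) is governed by
# the 4.10.7 signal timing:
print(ada_minimum_door_timing(16))   # ~10.7 seconds
# A single-elevator setup with a nearby button falls back to the floors:
print(ada_minimum_door_timing(4))    # 5.0 seconds
```

Any configured "door hold time" shorter than this value is simply unreachable, which is exactly the window where the door close button can do nothing.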
From some elevator manuals such as one for the Motion Control Engineering
VFMC-1000, we can gather that the "minimum door hold time" and "door hold
time" are separately configurable. I have seen several mentions online that in
most elevators the "door close" button functions totally normally during the
difference between the minimum door hold time and the door hold time. In
other words, there may be some period during which pushing the door close
button causes the door to close, but it will be after the end of the
ADA-required minimum door time.
Here is the obvious catch: since reducing the door hold time will make the
elevator more responsive (less time on the way to a call spent waiting with
the doors open), elevator installers are usually motivated to make the door
hold time as short as possible. Since the ADA requirements impose a minimum,
it's likely very common for the minimum door hold time and the "normal" door
hold time to be the same... meaning that the window to use the "door close"
button is zero seconds in duration.
We can confirm this behavior by finding an elevator with a very long configured
door hold time. That seems pretty easy to do: visit a hospital. Most hospitals
set the door hold time fairly high to accommodate people pushing hospital beds
around, so the normal door open time is longer than the ADA requirement (the
ADA rules are of course written assuming a person can cover 1.5 ft/s which
isn't very fast but still seems hard to achieve when accelerating a heavy
hospital bed in a tight space). Call an elevator, step inside, wait for around
ten seconds from the chime for the minimum door hold to elapse, and then push
the "door close" button. What happens? Well, in my experience the door promptly
closes, although I admit that I've only tested this on two hospitals so far.
Perhaps your experience will vary: I can see the possibility of a hospital
setting the minimum door hold time high, but of course that would get pretty
annoying and probably produce pushback from the staff. In the hospitals where
I've studiously observed the elevators the normal door hold time was close to
20 seconds, which feels like an eternity when you're waiting to get up one
floor.
Another way we can inspect this issue is via door reopening rules. While older
elevators used a rubber bumper on the door called a sensitive edge, most
elevators you'll see today use a "light curtain" instead. This device,
installed between the hall and car doors, monitors for the interruption of
infrared light beams to tell if the door is clear. When the door is obstructed,
ADA 4.10.6 requires the door to remain open for at least 20 seconds. After that
point ADA just refers to the ASME A17.1 standard, which allows for a behavior
called "nudging" in which the elevator controller encourages people to clear
the door by closing it anyway (at slow speed). The light curtain can also be
used to detect whether or not a person has entered the elevator, which can be
used as an input to hold time. Some articles online say that you can "hack" an
elevator waiting at an empty floor (because someone called the elevator and
walked away, for example) by momentarily interrupting the light curtain so that
the controller will believe that someone has entered.
Indeed this seems to work well on some elevators, but the ADA requirements do
not allow an exception to minimum hold times based on light curtain detection.
This means that the light curtain trick is basically equivalent to the door
close button: we can expect it to, at most, shorten the door hold time to the
ADA minimum. Nothing is allowed to decrease the time below the ADA minimum,
except when the elevator is in a special mode such as fire or perhaps
independent service.
So it seems that the reality of elevator "door close" buttons is rather less
dramatic than Radiolab and the NYTimes imply: the "door close" button is
perfectly functional, but details of the 1990 ADA mean that most of the time
people are pressing it the elevator controller isn't permitted to close the
door due to ADA rules. As far as I can tell, outside of the ADA minimum door
time, door close buttons work just fine.
And yet tons of articles online still tell us that the button is installed as a
placebo... something that is demonstrably untrue considering its significance
in fire (and maintenance, independent, etc) modes, and shows a general lack of
understanding of elevator codes and the ADA. Moreover, it seems like something
you would find out is untrue with about five minutes of research. So why is it
such "common knowledge" that it makes the rounds of major subreddits and minor
local news websites to this day?
No doubt a large portion of the problem is laziness. The "placebo" theory has a
lot of sizzle to it. Even though the NYTimes is somewhat noncommittal and only
implies that it is the true purpose of the button, most of the online pieces
about door close buttons I can find appear to be based solely on the 2016
NYTimes article and actually repeat the claim about the placebo effect more
strongly than the NYTimes originally makes it. In other words, the "fact" that
the door close button is a placebo seems to mostly just be a product of lazy
journalists rewriting an NYTimes piece enough to not feel like plagiarists.
There is also a matter of aesthetic appeal: the placebo theory sounds great. It
has the universal appeal of mundane reality but also hints at some kind of
conspiracy to deceive in the elevator industry. And, of course, it makes
everyone feel better about the high failure rate of mashing the "door close"
button without the complexity of an accurate explanation of the 1990 ADA rules.
The NYTimes piece basically makes it sound like the ADA banned door close
buttons, and it's easy to read the ADA and see that that's not true... but it
takes some real attention and thought to figure out how the ADA really did
change elevator controls.
This type of phenomenon, a sort of "internet urban pseudo-legend," is not at
all unique to elevator buttons. In fact the very same 2016 NYTimes article
that started that year's round of elevator button "fun facts" is also to blame
for another widespread belief in placebo buttons: crosswalk request buttons.
The NYTimes article says that most crosswalk buttons do nothing, explaining
that the buttons were made non-functional after an upgrade to computer light
controls. What the article does say, but many readers seem to miss, is that
this is a fact about crosswalk buttons in New York City.
Many traffic lights operate in "actuated mode," where they base their cycling
on knowledge of who is waiting where. Older traffic lights mostly used buried
inductive loops under the lanes to detect lane occupancy (that a vehicle is
present), but a lot of newer traffic lights use either video cameras or compact
radar sets. Since they don't require cutting into the pavement and then
resealing it, these are cheaper and faster to install. Newer video and radar
systems are also better at detecting cyclists than pavement loops---although
earlier video systems performed very poorly on this issue and gave video lane
presence detection a bad reputation in some cities.
New York City, though, was a very early adopter of large-area computer control
of traffic lights. One of the main advantages of central computer control of
traffic lights is the ability to set up complex schedules and dynamically
adjust timing. Not only can centrally-controlled traffic lights operate in
sequence timing matched to the speed limit of the street, they can also have
the durations in different directions and sequence speed adjusted based on
real-time traffic conditions.
The problem is that combining central timing control with actuated operation
is, well, tricky. In practice, most traffic lights that operate under sequence
timing or remote timing control don't operate in actuated mode, or at least not
at the same time. What some traffic lights do today is switch: sequence timing
during rush hour, and actuated mode during lower traffic. Even with today's
developments combining scheduled timing with actuation inputs is tricky, and
New York City adopted centralized control in the '70s!
So New York's adoption of central control was also, for the most part, an
abandonment of actuated operation. The crosswalk buttons are actuation inputs,
so they became non-functional as part of this shift. The 2016 NYTimes article
explained that the city had estimated the cost of removing the now
non-functional buttons at over a million dollars and so decided to skip the
effort... but they are removing the buttons as other work is performed.
For the second time, this runs directly counter to the "mechanical placebo"
argument the article is based on. The buttons weren't originally installed as
placebo at all; when they were put in they were fully functional. A different
decision, to switch to centralized timing control, resulted in their current
state, and even then, they are being removed over time.
Moreover, the same does not apply to other cities. The NYTimes makes a very
lazy effort at addressing this by referring to a now-unavailable 2010 ABC
News piece reporting that they "...found only one functioning crosswalk button
in a survey of signals in Austin, Tex.; Gainesville, Fla.; and Syracuse." It is
unclear what the extent of that survey is, and I lack the familiarity with
traffic signaling in those cities to comment on it. But in a great many cities,
most of them in my experience, actuated traffic signals remain the norm outside
of very high-traffic areas, and so the crosswalk buttons serve a real purpose.
Depending on the light configuration, you may never get a "walk" signal if you
don't press the button, or the duration of the "walk" signal (prior to the
flashing red hand clearing time) may be shorter.
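The two behaviors just described can be sketched in a few lines. This is a toy model, not any real controller's logic, and the interval durations are invented for illustration:

```python
# Minimal sketch of actuated pedestrian phasing: without a button press,
# some configurations serve no walk phase at all (or only a shortened
# one); with a press, the full walk interval plus the flashing-hand
# clearance time is served. Timings here are illustrative inventions.

def plan_pedestrian_phase(button_pressed: bool) -> dict:
    """Return pedestrian intervals (seconds) for the next signal cycle."""
    if button_pressed:
        # Actuated: full walk interval, then flashing-hand clearance.
        return {"walk": 7.0, "flashing_clearance": 12.0}
    # Unactuated: no pedestrian phase is served this cycle.
    return {"walk": 0.0, "flashing_clearance": 0.0}

print(plan_pedestrian_phase(True))   # full walk + clearance
print(plan_pedestrian_phase(False))  # no walk phase
```

Real controllers are far more elaborate (minimum greens, pedestrian recall modes, coordination offsets), but the essential asymmetry is this: the button is an input that changes what the cycle serves.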
Actually, one might wonder why those crosswalk buttons have so much staying
power, given the technical progress in lane presence detection. Video and radar
options for waiting pedestrian detection do exist. I have occasionally even
seen PIR sensors installed for this purpose in suburban areas. The problem, I
think, is that detecting a pedestrian waiting to cross involves more nuance
than a vehicle. Sidewalks don't have lane lines to clearly delineate different
queues for each movement. A video or radar-based system can detect a pedestrian
waiting on the corner, but not whether that person is waiting to cross one
direction, or the other, or for an Uber, or just chose that spot to catch up on
TikTok. Video-based waiting pedestrian detection may be too prone to false
positives, and in any case the button is a robust and low-cost option that can
also be used to meet ADA objectives through audible and tactile announcements.
So there's a story about buttons: the conspiracy about them being placebos is
itself a conspiracy to get you to read articles in publications like "Science
Alert." Or maybe that's just an old tale, and the reality of content-farmed
news websites falls out of some implications of the ADA. It's a strange world
out there.
As we discussed previously,
the search for UAP is often contextualized in terms of the events of 2017:
the public revelation of the AATIP and alien-hunting efforts by Robert Bigelow
and Tom DeLonge. While widely publicized, these programs seem to have led to
very little. I believe the termination of the AATIP (which led to the creation
of TTSA) to be a result of the AATIP's failure to address the DoD's actual
concern: that UAP represented a threat to airspace sovereignty.
I just used a lot of four- and five-letter acronyms without explaining them.
These topics were all discussed in the previous
post and if you
are not familiar with them I would encourage you to read it. Still, I will try
to knock it off. Besides, now there is a new set of four- and five-letter
acronyms. The end of the AATIP was not the end of the DoD's efforts to
investigate UAP. Instead, military UAP research was reorganized, first into
Naval intelligence as the UAP Task Force, and later into the cross-branch
military intelligence All-Domain Anomaly Resolution Office, or AARO.
It is unclear exactly what the AARO has accomplished. As a military
intelligence organization, the DoD will not comment on it. Most of what we know
comes from legislators briefed on the program, like Sen. Gillibrand and Sen.
Rubio. In various interviews and statements, they have said that AARO's work is
underway but hampered by underfunding---underfunding that is, embarrassingly, a
result of some kind of technical error in defense appropriation.
Administratively confused as they may be, the DoD's UAP efforts have led to
the creation of a series of reports. Issued by the Director of National
Intelligence (DNI) at the behest of Congress, the June 2021 unclassified report
appeared to be mostly a review of the same data analyzed by AATIP. The report
was short---9 pages---but contained enough information to produce a lot of
reporting. One of the most important takeaways is that, up to around 2020, the
military had no standardized way of collecting reports of UAP. Later
reporting would show that even after 2020 efforts to collect UAP reports were
uneven and often ineffective.
Much of the reason for this is essentially stigma: advocates of UAP research
have often complained that through the late 20th century the military developed
a widespread attitude of suppressing UAP incidents to avoid embarrassment. As a
result, it's likely that there are many more UAP encounters than known. This is
particularly important since analysis (including that in the 2021 report)
repeatedly finds that the majority of UAP reports are probably explainable,
while a few are more likely to result from some type of unknown object such as
an adversarial aircraft. In other words, the signal to noise ratio in UAP
reports is low. Taken one way this might discourage reporting and analysis,
since any individual report is unlikely to amount to anything. The opposite is
true as well, though: if most UAP encounters are not reported and analyzed,
it's likely that the genuinely troubling incidents will never be discovered.
The 2021 report broadly suggests that this is exactly what was happening for
many years: so few UAP incidents were seriously considered that no one noticed
that some of them posed real danger.
The 2021 report briefly mentions that some UAP incidents were particularly
compelling. For example, in 18 incidents the UAP demonstrated maneuvering.
This doesn't mean "shot into the sky as if by antigravity," but rather that the
objects appeared to be navigating towards targets, turning with intention, or
stationkeeping against the wind. In other words, they are incidents in which
the UAP appears to have been a powered craft under some type of control. Even
more importantly, the report notes that in a few cases there were indications
of RF activity. The military will never go into much detail on this topic
because it quickly becomes classified, but many military aircraft are equipped
with "electronic warfare" systems that use SDR and other radio technology to
detect and classify RF signals. Historically the main purpose of these systems
was to detect and locate anti-aircraft radar systems, but they have also been
extended to general ELINT use.
ELINT is an intelligence community term for "electronic intelligence." Readers
are more likely to be familiar with the term SIGINT, for signals intelligence,
and the difference between the two can be initially confusing. The key is that
the "electronic" in ELINT is the same as in "electronic warfare." SIGINT is
about receiving signals in order to analyze their payloads, for example by
cryptologic means. ELINT is about receiving signals for the sake of the signals
themselves. For example, to recognize the chirp patterns used by specific
adversarial radar systems, or to identify digital transmission modes used by
different types of communications systems, thus indicating the presence of
that communications system and its user. A simple and classic example of ELINT
would be to determine that an adversarial force uses a certain type of
encrypted digital radio system, and then monitor for transmissions matching
that system to locate adversarial forces in the field. The contents don't
matter and for an encrypted system may not be feasible to recover anyway. The
mere presence of the signal provides useful intelligence.
The concept of ELINT becomes important in several different ways when
discussing UAP. First, the 2021 DNI report's mention that several UAP were
associated with RF emissions almost certainly refers to ELINT information
collected by intelligence or electronic warfare equipment. These RF emissions
likely indicate some combination of remote control and real-time data
reporting, although a less likely possibility (in my opinion) is that it
reflects electronic warfare equipment on the UAP engaged in some type of
active countermeasure.
It's meaningful to contrast this view of the matter with the one widespread in
the media in 2017. A UAP that maneuvers and communicates by radio is not
exactly X-Files material, and almost by definition can be assumed to be an
sUAS---small unmanned aerial system, commonly referred to as a drone. Far from
the outlandish claims made by characters like Tom DeLonge, such a craft is
hardly paranormal in that we know such devices exist and are in use. What is a
startling discovery is that sUAS are being spotted operating near defense
installations and military maneuvers and cannot be identified. This poses a
very serious threat not only to airspace sovereignty as a general principle but
also to the operational security of the military.
Perhaps the component of the report that generated the most media interest is
its analysis of the nature of the reported UAP. In the vast majority of cases,
in fact all but one, the DNI report states that it was not possible to
definitively determine the nature of the UAP. This was almost always because of
the limited information available, often just one or two eyewitness accounts
and perhaps a poor photo and radar tracks. Most of these incidents presumably
do have explanations within the realm of the known that simply could not be
determined without additional evidence. On the other hand, the report does
state that there are some cases which "may require additional scientific
knowledge" to identify.
It is not entirely clear how dramatically this statement should be taken. It's
possible, even likely, that the phrase mostly refers to the possibility that
new methods of evidence collection will need to be developed, such as the new
generation of radar systems currently emerging to collect more accurate
information on sUAS with very low radar cross section due to their small size.
It's also possible that the phrase reflects the fact that some reported UAP
incidents involve the UAP behaving in ways that no known aerial system is
capable of, such as high speeds and maneuvers requiring extreme performance.
Once again, there is a temptation to take this possibility and run in the
direction of extraterrestrial technology. Occam's razor at the very least
suggests that it's more likely that some adversarial nation has made
appreciable advancements in aviation technology and kept them secret. While
perhaps unlikely this is not, in my mind, beyond reason. We know, for example,
that both Russia and China have now made more progress towards fielding a
practical hypersonic weapons system than the United States. This reinforces the
possibility that their extensive research efforts have yielded some interesting
results.
Following the 2021 UAP report, Congress ordered the DNI to produce annual
updates on the state of UAP research. The first such update, the 2022 report,
was released a few months ago. The unclassified version is quite short, but it
is accompanied by a significantly longer and more detailed classified version
which has been presented to some members of Congress. The unclassified document
states that the number of known UAP incidents has increased appreciably,
largely due to the substantial effort the military has made to encourage
reporting. To provide a sense of the scale, 247 new reports were received in
the roughly 1.5 years between the preliminary and 2022 reports. A number of
additional incidents occurring prior to the 2021 report also came to the
attention of military intelligence during the same period, and these were
analyzed as well.
Perhaps the most important part of the 2022 report is its statement that, of
the newly analyzed incidents, more than half were determined to be
"unremarkable." In most cases, it was judged that the incident was probably
caused by a balloon. While these are still of possible interest, they are less
interesting than the remainder which are more difficult to explain.
Intriguingly, the report states that some UAP "demonstrated unusual flight
characteristics or performance capabilities." This supports the more dramatic
interpretation of the 2021 report, that it is possible that some incidents
cannot be explained without the assumption that some adversary possesses a
previously unknown advanced technology.
While it already attracted a great deal of media attention, this entire matter
of DNI reports was only the opening act to the spy balloon. The airspace
sovereignty aspect of the UAP reports is not something that attracted much
discussion in the media, but it has become much more front of mind as a UAP of
the first kind drifted across the United States. This UAP was not unidentified
for long, with the military publicly attributing it to China---an attribution
that China has both formally and informally acknowledged.
Balloons are not new in warfare. Indeed, as the oldest form of aviation, the
balloon is also the oldest form of military aviation. The first practical
flying machine was the hot air balloon. While the technology originated in
France, the first regular or large-scale example of military aviation is
usually placed at the US Civil War. Hot air balloons were routinely used for
reconnaissance during the Civil War, and the slow movement and long dwell times
of balloons still make them attractive as reconnaissance platforms.
Military ballooning in the United States is not limited to the far past.
During World War II, the Japanese launched nearly 10,000 balloons equipped with
incendiaries. The hope was that these balloons would drift into the United
States and start fires---which some of them did, although a concerted press
censorship program largely prevented not only the Japanese but also Americans
from learning of the campaign. Ultimately the impact of the balloon bombs was
limited, but they are still often considered the first intercontinental weapon
system. They might also be viewed as the first profound challenge to US air
sovereignty, as the balloons required no nearby support (as aircraft of the era
did) and the technology of the time provided no effective means of protection.
Indeed, this was the calculus behind the press censorship: since there was no
good way to stop the balloon bombs, the hope was that if the US carefully
avoided any word of them being published, the Japanese might assume they were
all being lost at sea and stop sending them.
While the Cold War presented Soviet bombers and then missiles as top concerns,
it could be said that balloons have always been one of the greatest practical
threats to airspace sovereignty. Despite their slow travel and poor
maneuverability, balloons are hard to stop.
Balloons remain surprisingly relevant today. First, modern balloons can operate
at extremely high altitudes, similar to those achieved by the U-2 spy plane.
This provides an advantage both in terms of observation range and secrecy.
Second, balloons are notoriously difficult to detect. While the envelope is
large, the material is largely transparent to RF, resulting in a very low radar
cross section. Careful design of the suspended payload can give it a very low
radar cross section as well... often easier than it sounds, since the payload
is kept very lightweight. The sum result of these two factors is that even
large balloons are difficult to detect. They are most obvious visually, but the
United States and Canada have never had a very substantial ground observer
program and the idea has not been on the public mind for many decades. Many
people might see a balloon before any word reached air defense.
On January 28th, a large balloon operated by China entered US airspace over
Alaska. During the following week, it drifted across the country until leaving
the east coast near South Carolina, where it was shot down with a Sidewinder
missile. Circumstances suggest that both the Chinese and US administrations may
have intended to downplay the situation to avoid ratcheting tensions, as the US
government did not announce the balloon to the public until about a day after
it had initially been detected entering US airspace. Publicly, China claimed it
to be a weather balloon which had unintentionally drifted off course. The New
York Times reports that, privately, Chinese officials told US counterparts that
they had not intended for the balloon to become such a public incident and
would remove it from US airspace as quickly as possible.
Modern balloons of this type are capable of a limited but surprisingly flexible
form of navigation by adjusting their buoyancy, and thus altitude, to drift in
different winds. Perhaps the balloon spent a week crossing the US by intention,
perhaps an unfortunate coincidence of weather created a situation where they
were not able to navigate it out more quickly, or perhaps some equipment
failure had rendered the balloon unable to change its altitude. I tend to
suspect one of the latter two since it is hard to think of China's motivation
to leave the balloon so publicly over the United States. In any case, that's
what happened.
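The navigation scheme is simple in principle: winds at different altitudes blow in different directions, so the balloon climbs or descends to ride whichever layer carries it closest to where it wants to go. A toy sketch of that altitude-selection step, with an invented wind table:

```python
import math

# Toy illustration of buoyancy-based "steering": given forecast wind
# vectors (east, north components in m/s) at several float altitudes,
# pick the altitude whose wind direction best matches the desired
# heading. The wind table and altitudes are invented for illustration.

def best_altitude(winds: dict, desired_heading_deg: float) -> int:
    """Choose the altitude (m) whose wind best matches the heading."""
    h = math.radians(desired_heading_deg)
    # Unit vector of desired travel (heading measured clockwise from north).
    want = (math.sin(h), math.cos(h))

    def alignment(alt: int) -> float:
        e, n = winds[alt]
        speed = math.hypot(e, n) or 1e-9
        # Cosine similarity between wind vector and desired direction.
        return (e * want[0] + n * want[1]) / speed

    return max(winds, key=alignment)

winds = {12000: (5.0, 0.0),    # eastward wind at 12 km
         15000: (0.0, -8.0),   # southward wind at 15 km
         18000: (-3.0, 3.0)}   # drift toward the northwest at 18 km
print(best_altitude(winds, 90.0))  # want to go east -> 12000
```

The catch, of course, is that the balloon can only choose among the winds actually present, which is why unlucky weather (or a failed ballast system) can leave one parked over a continent it would rather not be over.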
We now know more about the balloon, not so much because of analysis of the
wreckage (although that is occurring) but more because the military and
administration have begun to share more information collected by means
including a U-2 spy plane (one of few aircraft capable of meeting the balloon's
altitude) and other military reconnaissance equipment. The balloon had large
solar arrays to power its equipment, it reportedly had small propellers (almost
certainly to control orientation of the payload frame rather than for
navigation), and it bristled with antennas.
This is an important point. One of the popular reactions to the balloon was
mystery at why China would employ balloons when they have a substantial
satellite capability. At least for anyone with a background in remote sensing
the reason is quite obvious: balloons are just a lot closer to the ground than
satellites, and that means that just about every form of sensing can be
performed with much lower gain and thus better sensitivity. This is true of
optical systems where balloons are capable of much better spatial resolution
than satellites, but also true of RF where atmospheric attenuation and
distortion both become very difficult problems when observing from orbit.
Further, balloons are faster and cheaper to build and launch than satellites,
allowing for much more frequent reconfigurations and earlier fielding of new
observation equipment. The cost and timeline of satellites are such that newly
developed intelligence technology takes years to make it from the lab to the
sky... Chinese intelligence balloons, on the other hand, can likely be
fabricated pretty quickly.
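The optical advantage is easy to quantify. Ground sample distance scales linearly with distance to the target, so a platform 25 times closer resolves 25 times finer detail with the same camera. A back-of-the-envelope comparison, with invented camera parameters:

```python
# Ground sample distance (GSD) for the same notional camera on a balloon
# vs. a satellite. GSD = altitude * pixel_pitch / focal_length, so it
# scales linearly with distance. Pixel pitch and focal length here are
# invented round numbers, not any real payload's specs.

def gsd_m(altitude_m: float, pixel_pitch_m: float,
          focal_length_m: float) -> float:
    """Meters of ground covered by one pixel, for a nadir-pointing camera."""
    return altitude_m * pixel_pitch_m / focal_length_m

pitch, focal = 5e-6, 1.0  # 5 micron pixels, 1 m focal length
print(gsd_m(20_000, pitch, focal))   # balloon at 20 km: roughly 0.1 m
print(gsd_m(500_000, pitch, focal))  # satellite at 500 km: roughly 2.5 m
```

A similar argument applies to RF collection: free-space path loss grows with the square of distance, so the balloon hears faint emissions that a satellite simply cannot.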
It's useful here to return to the topic of ELINT. First, it's very likely that
ELINT was a major mission of this balloon. Sensing RF emissions from military
equipment at close range is invaluable in creating ELINT signatures for
equipment like radar and encrypted communications systems, which directly
translates into a better capability to mount an offensive from the air. SIGINT
was likely also a mission. One of the advantages of ELINT collection is that
the data acquired for ELINT purposes can typically be processed to glean SIGINT
information, and even provides valuable material for cryptologists attempting
to break codes.
ELINT is also relevant in the detection of the balloon. While the spy balloon
in the recent incident was detected by conventional means, the DoD has reported
that they are now able to assert that this is at least the fifth such balloon
to enter US airspace. For those not familiar with ELINT methods this might be
surprising, but it makes a great deal of sense. The fact that this balloon was
tracked by the military for days provided ample opportunities to collect good
quality ELINT signatures of the communications equipment used by the balloon.
The military possesses a number of aircraft dedicated to the purpose of ELINT
and SIGINT collection, such as the RC-135---a modified C-135 Stratolifter
equipped with specialized antennas and hundreds of pounds of electronic
equipment. These types of aircraft could orbit the balloon for hours and collect
extensive recordings of raw RF emissions.
ELINT information is also collected by ground-based and orbital (satellite)
assets, including a family of satellites that deploy large parabolic reflectors
to collect RF signals with extremely high gain. The data collected by these
platforms is likely retained in raw form, allowing for retrospective analysis.
Information collected by similar means has been publicly used in the past. And
this is most likely how the first four balloons were discovered: by searching
historic data collected by various platforms for matching ELINT signatures. The
presence of the same digital data modem as in the recent spy balloon, in US
airspace, almost certainly indicates a similar Chinese asset operating in the
past.
It's important to understand that the RF environment is extremely busy, with a
great deal of noise originating from the many radio devices we use every day.
It's simply not feasible for someone in some military facility to carefully
review waterfall displays of the RF data collected by numerous ELINT assets.
What is much more feasible is to develop signatures and then use automation to
search for instances of similar traffic. It's the practical reality of
intelligence at scale.
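The shape of that automation can be sketched simply. Suppose an emitter signature is reduced to a few measured features, and historic intercepts are scanned for records matching those features within tolerance. Real ELINT systems are vastly more sophisticated; the fields, values, and tolerance below are invented for illustration:

```python
# Hypothetical sketch of automated signature search: a signature is a
# feature dict (center frequency, bandwidth, pulse repetition interval)
# and historic intercept records are scanned for matches within a
# relative tolerance. All fields and numbers are illustrative inventions.

def matches(record: dict, target: dict, tol: float = 0.02) -> bool:
    """True when every feature is within `tol` relative error of target."""
    return all(abs(record[k] - target[k]) <= tol * abs(target[k])
               for k in target)

def search(history: list, target: dict) -> list:
    # Linear scan for clarity; at real scale this would be indexed.
    return [r for r in history if matches(r, target)]

history = [
    {"freq_mhz": 5810.0, "bw_khz": 120.0, "pri_us": 400.0},
    {"freq_mhz": 1200.0, "bw_khz": 25.0, "pri_us": 1000.0},
]
target = {"freq_mhz": 5800.0, "bw_khz": 121.0, "pri_us": 402.0}
print(len(search(history, target)))  # -> 1 matching historic intercept
```

The point is that once a signature exists, finding earlier instances of the same emitter is a batch query over archived data rather than a human staring at waterfalls, which is presumably how those four earlier balloons surfaced after the fact.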
The discovery of the recent spy balloon has had an incredible effect on air
defense. I am of the general opinion, and have occasionally argued in the past,
that the US government has significantly under-invested in air defense since
the end of the Cold War. While we do need to move on from the hysteria of the
1970s, the lack of investment in air surveillance and defense over the last
fifty years or so has led to an embarrassing situation: our ability to detect
intrusion on our airspace is fairly poor, and when we do it can take well over
an hour to get a fighter in the air to investigate it. The balloon brought this
problem to the attention of not only the government but the public, and so some
action had to be taken.
Primary radar [1] is quite complex. Even decades into radar technology it
remains a fairly difficult problem to pick objects of interest, such as
aircraft, out of "clutter"---the many objects, ranging from the ground to
wind-blown dust, that can produce primary radar returns. One of the simplest
approaches is to ignore objects that are not large and moving fast. This type
of filtering is usually adequate for detection of aircraft, but fails entirely
for some objects like balloons and sUAS that may be small and slow moving.
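That filtering approach can be reduced to a caricature: keep a track only if it is both fast and large. A slow, small balloon fails both tests and is discarded as clutter even though it is a real object. The thresholds below are invented for illustration, not any actual radar system's parameters:

```python
# Caricature of the speed/size clutter filter described above: a radar
# track is reported only if it is both fast-moving and has a large radar
# cross section (RCS). Thresholds are illustrative inventions.

def is_reportable(speed_kts: float, rcs_m2: float,
                  min_speed: float = 80.0, min_rcs: float = 1.0) -> bool:
    """True if the track survives the clutter filter."""
    return speed_kts >= min_speed and rcs_m2 >= min_rcs

print(is_reportable(450.0, 10.0))  # airliner-like track -> True
print(is_reportable(20.0, 0.2))    # small, slow balloon -> False
```

Loosening those thresholds, as NORAD reportedly did after the spy balloon, is trivial in itself; the hard part is that everything the filter previously hid now lands on a controller's display and has to be dispositioned somehow.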
Further, the US and Canada are very large. Integrating data from the many radar
surveillance sites and presenting it in a way that allows an air defense
controller to identify suspicious objects in the sea of normal air traffic is a
difficult problem, and a problem that the US has not seriously invested in for
decades. The information systems used by both the FAA and NORAD for processing
of radar data are almost notoriously poor. In the wake of the spy balloon,
officials have admitted to the press that the military is struggling to process
the data from radar systems and identify notable objects.
Air defense is one of the oldest problems in computing as an industry. One of
the first (perhaps the first, depending on who you ask) networked computer
systems was SAGE: an air defense radar processing system. These problems are
still difficult today, but we are no longer mounting cutting-edge research and
development projects to face them. Instead, we are trapped in a morass of
defense contractors and acquisition projects that take decades to deliver
nothing.
In response to the discovery of the spy balloon, NORAD has changed the
parameters used to process radar data to exclude fewer objects. They have also
made a policy change to take action on more unknown objects than they had
before. This led directly to NORAD action to intercept several balloons over
the past two weeks. There are now indications that at least some of these
balloons may have been ordinary amateur radio balloons, not presenting a threat
to air sovereignty at all. Some will view this as an embarrassment or
indictment of NORAD's now more aggressive approach, but it's an untenable
problem. If China or some other adversary is sending small balloons into our
airspace, we need to make an effort to identify such balloons. But currently,
no organized system or method exists to identify balloons and other
miscellaneous aerial equipment.
One could argue (indeed, here I am) that up to about two weeks ago NORAD was
still looking for Soviet bombers, with a minor side project of light aircraft
smuggling drugs. Air defense largely ignored anything that wasn't large and
actively crossing a border (or more to the point an ADIZ). And that's how about
four large intelligence platforms apparently wandered in unnoticed... with UAP
reports suggesting that there may be many more.
My suspicion is that the coming year will involve many changes and challenges
in the way that we surveil our airspace. I think that we will likely become
more restrictive in airspace management, requiring more aircraft than before to
have filed flight plans. Otherwise it is very difficult to differentiate a
normal but untracked object from an adversarial intelligence asset.
And indications are that adversarial intelligence assets are a very real
problem. China's spy balloon program is apparently both long-running and
widespread, with similar balloons observed for years in other countries as
well. This shouldn't be surprising---after all, reconnaissance balloons are the
oldest form of military aviation. The US and allies made enormous use of
reconnaissance balloons during the Cold War, sending many thousands into the
USSR. It's likely the case that we only really slowed down because our modern
reconnaissance balloon projects have all become notorious defense contracting
failures. We're still trying, but projects like TARS have run far over budget
and still perform poorly in operational contexts.
It might feel like this situation is new, and in terms of press reporting it
is. But we should have seen it coming. In an interview following a classified
briefing, Senator John Kennedy said that "These objects have been flying over
us for years, many years. We've known about those objects for many years."
Robert Bigelow got into UAP research because he was searching for aliens.
Maybe aliens are out there, maybe they aren't, but there is one thing we know
for sure: our adversaries are out there, and they possess aviation technology
at least as advanced as ours. For decades we ignored UFOs as folly, and for
decades we ignored potential aviation advancements by our adversaries along
with them. Now those advancements are floating across the northern United
States and perhaps worse---the DNI is hoping they'll find out, if they can just
get people to report what they see.
[1] Radar that operates by detecting reflections or attenuation of an RF field
by an object. This is as opposed to secondary radar, more common in air traffic
control, that works by "interrogating" a cooperative transponder installed on
the aircraft.