_____                   _                  _____            _____       _ 
  |     |___ _____ ___ _ _| |_ ___ ___ ___   |  _  |___ ___   | __  |___ _| |
  |   --| . |     | . | | |  _| -_|  _|_ -|  |     |  _| -_|  | __ -| .'| . |
  |_____|___|_|_|_|  _|___|_| |___|_| |___|  |__|__|_| |___|  |_____|__,|___|
  a newsletter by |_| j. b. crawford               home archive subscribe rss

>>> 2021-08-26 a permanent solution

The strategic and tactical considerations surrounding nuclear weapons went through several major eras in a matter of a few decades. Today we view the threat of nuclear war primarily through the "triad": the capability to deliver a nuclear attack from land, sea, and air. This would happen primarily through intercontinental ballistic missiles (ICBMs), so-called because they basically launch themselves to the lower end of space before strategically falling towards their targets (ballistic reentry). ICBMs are fast, taking about 30 minutes to arrive across the globe. The result is that we generally expect to have very little warning of a nuclear first strike.

The situation in the early Cold War was quite different. ICBMs, and long-range missiles in general, are complex and took some time to develop. From the end of World War II to roughly the late '60s, the primary method of delivery for nuclear weapons was expected to be by air: bombs, delivered by long-range bombers. The travel time from the Soviet Union would be hours, allowing significant warning and a real opportunity for air defense intervention.

The problem was this: we would have to know the bombers were coming.

Many people seem to assume that the United States has the capability to detect and track all aircraft flying in US airspace. The reality is quite a bit different. The problem of detecting and tracking aircraft is a surprisingly difficult one, and even today our capabilities are limited. Nonetheless, surveillance of airspace is considered a key element of "air sovereignty," or our ability to maintain military and civil control of our airspace.

Let's take a look at the history of the United States' ability to monitor our airspace.

During the 1940s, it was becoming clear that airspace surveillance was an important problem. Although the United States did not then face attacks on the contiguous US (and never would), were the Axis forces to advance to the point of bombing missions on CONUS it would be critical to be able to detect the incoming aircraft. The military invested in a system for Aircraft Control and Warning, or AC&W. Progress was slow: long-range radar was primitive and expensive, and the construction of AC&W stations was not a high-level priority. By the end of 1948, there were only a small number of AC&W stations, they were considered basically experimental, and the ability to integrate data coming from the several stations was very limited. Efforts to expand the air surveillance system routinely failed due to lack of funding.

The Lashup project, launched in '48, made up the first major effort to build an air surveillance system. As the name suggests, Lashup was only intended to be temporary, funded by Congress as a stopgap measure until a more complete system could be designed. Over the next two years, 44 radar stations were built focused around certain strategically important areas. Lashup provided nothing near nationwide coverage, but was expected to detect bombing runs directed towards the most important military targets. Lashup included three stations surrounding my own Albuquerque, due to the importance of the Sandia and Manzano Army Bases and the Z Division of Los Alamos.

Lashup sites used sophisticated radar sets for the time, but perhaps the most important innovation of Lashup was the command and control infrastructure built around it. Lashup stations were connected to the air defense command by dedicated telephone lines, the air defense command was connected to the continental air command by another dedicated telephone line, and ultimately dedicated lines were connected all the way to the White House. This was the first system built to allow a prompt nuclear response by informing the commander in chief of an impending nuclear attack as quickly as possible.

If you've read any of my other material on the Cold War, you might understand that this is the core of my fascination with Cold War defense history: the threat of nuclear attack was, for the most part, the first thing to motivate the development of a nationwide rapid communications system. For the first half of the 20th century there simply wasn't a need to deliver a message from the west coast to the President in minutes, but in the second half of the 20th century there most definitely was.

The fear of a Soviet nuclear strike, and the resulting government funding, was perhaps the largest single motivator of progress in communications and computing technology from the 1940s to the 1980s. Most of the communications technology we now rely on was originally built to meet the threat of a first strike.

We see this clearly in the case of air defense radar. While Lashup nominally had the capability to deliver a prompt warning of nuclear attack, the entire process was rather manual and thus not very reliable. Fortunately, Lashup was temporary, and just as construction of the Lashup sites was complete, work started on its replacement: the Permanent System.

The Permanent System consisted of a large number of radar stations, ultimately over 100. More importantly, though, it consisted of a system of communications and coordination centers intended to quickly confirm and communicate a nuclear threat.

To understand this system, it helps to understand the strategic principle involved. The primary defense against a nuclear attack by bombers was a process called ground-controlled intercept, or GCI. The basic concept of GCI was that radar stations would provide up-to-date position and track information on inbound enemy aircraft, which would be used to vector interceptor aircraft directly towards the threat. The aid of ground equipment was critical to an effective response, as fighter aircraft of the time lacked sophisticated targeting radar and had no good way to search for bombers.
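GCI controllers worked this out by hand on plotting boards, but the geometry behind "vector the interceptor toward the threat" can be sketched in a few lines. What follows is a minimal Python sketch, not a historical procedure: it assumes a flat 2D plane, a bomber holding constant velocity, and an interceptor flying a straight line at constant speed. The function name `intercept` is purely illustrative. It solves for the earliest time at which the interceptor can reach the bomber's future position and reports the aim point and compass heading.

```python
import math

def intercept(target_pos, target_vel, interceptor_pos, interceptor_speed):
    """Earliest straight-line intercept of a constant-velocity target.

    Illustrative assumptions: flat 2D plane, +y is north, and all positions,
    velocities, and speeds use consistent units.
    Returns (time, aim_point, heading_degrees), or None if no intercept exists.
    """
    dx = target_pos[0] - interceptor_pos[0]
    dy = target_pos[1] - interceptor_pos[1]
    vx, vy = target_vel
    # Require |D + V*t| == s*t, i.e. (V.V - s^2) t^2 + 2 (D.V) t + D.D = 0
    a = vx * vx + vy * vy - interceptor_speed ** 2
    b = 2 * (dx * vx + dy * vy)
    c = dx * dx + dy * dy
    if c == 0:
        return 0.0, tuple(target_pos), None  # already on top of the target
    if abs(a) < 1e-12:
        # Equal speeds: the quadratic degenerates to b*t + c = 0
        if b >= 0:
            return None  # target is not closing, so an equal-speed chase never ends
        t = -c / b
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            return None  # target is too fast to catch
        root = math.sqrt(disc)
        times = [t for t in ((-b - root) / (2 * a), (-b + root) / (2 * a)) if t > 0]
        if not times:
            return None
        t = min(times)
    aim = (target_pos[0] + vx * t, target_pos[1] + vy * t)
    # Compass heading is clockwise from north, so atan2 takes (east, north)
    heading = math.degrees(math.atan2(aim[0] - interceptor_pos[0],
                                      aim[1] - interceptor_pos[1])) % 360
    return t, aim, heading
```

For a bomber at (100, 0) heading north at 10 units per minute, and an interceptor at the origin making 20, `intercept((100, 0), (0, 10), (0, 0), 20)` gives a time of about 5.77 minutes on a heading of 60 degrees. A slow interceptor facing a fast bomber flying away gets `None`, which is exactly the situation early warning was meant to avoid.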

To this end, the Permanent System included Manual Air Defense Control Centers (ADCC) (the "manual" was used to differentiate from automatic centers in the later SAGE system). The ADCCs received information on radar targets from the individual radar sites via telephone, and plotted them with wet erase marker on clear plexiglass maps (perhaps the source of the clear whiteboard trope now ubiquitous in films) in order to correlate multiple tracks. They then reported these summarized formations and tracks to the Air Defense Command, at Ent AFB in Colorado, for use in directing interceptors.

The Permanent System was extended beyond CONUS, although Alaska continued to have a distinct air defense program. The biggest OCONUS extension of the Permanent System was into Canada, with the Pinetree Line (the first of the cross-Canadian early warning radar networks) roughly integrated into the Permanent System. Perhaps most interestingly, the Permanent System also saw an early effort at extension of early warning radar into the ocean. This took the form of the Texas Towers, a set of three awkward offshore radar stations that were later abandoned due to their poor durability against rough seas [1].

Technology was advancing extremely rapidly in the mid-20th century, and by the time the Permanent System reached nearly 200 radar stations it had also become nearly obsolete. For its vast scale, the capabilities of the Permanent System were decidedly limited: it could only detect large aircraft, it performed poorly at low altitudes (often requiring mitigation through "gap filler" stations), and interpretation and correlation of radar data was a manual process, costing precious minutes in the timeline of a nuclear reprisal.

Here in Albuquerque, Kirtland Air Force Base was host to the Kirtland Manual ADCC, activated in 1951. Thirteen radar stations around New Mexico, eastern Arizona, and western Texas reported to Kirtland AFB. Each of these 13 radar stations was itself a manned Air Force Station including housing and cantonment. The Continental Divide Air Force Station, for example, consisted of some fifty people in remote McKinley County. The station included amenities like a library and gym, housing and a trailer park, and two radars: an early warning radar and a height-finding radar. Finally, a ground-air transmit-receive (GATR) radio site provided a route for communications with interceptors.

Continental Divide AFS was deactivated in 1960. You can still see the remains today, although there is little left other than roads and some foundations.

Like Continental Divide AFS, the Permanent System as a whole failed to make it even a decade. In 1960, it was as obsolete as Lashup, having been replaced not only by improved radar equipment but, more importantly, by a vastly improved communications and correlation system: the Semi-Automatic Ground Environment, or SAGE---by most measures, the first practical networked computer system.

We'll talk about SAGE later, but for now, check out a list of Permanent System sites. There might be one in your area. Pay it a visit some time; in many ways it's the beginning of the computer revolution: a manual data collection network obsoleted in just a few years by the development of the first nationwide computer network.

[1] The Texas Towers were connected to shore via troposcatter radio links, one of my favorite communications technologies and something that will surely get a full post in the future.


>>> 2021-08-16 on voting

The use of electronics to administer elections has been controversial for some time. Since the "hanging chads" of the 2000 election, there's been some degree of public awareness of the use of technology for voting and its possible impacts on the accuracy and integrity of the election. The exact nature of the controversy has been through several generations, though, reflecting both changes in election technology and changes in the political climate.

Voting is a topic of great interest to me. The administration of elections is critical to a functioning democracy, and raises a variety of interesting security and practical challenges. In particular, the introduction of automation into elections presents great opportunity for cost savings and faster reporting, but also a greater risk of intentional and accidental interference in the voting process. Back when I was in school, I focused some of my research on election administration. Today, I continue to research the topic, and have added the practical experience of being a poll worker in two states and for many elections [1].

Given my general propensity to have opinions, it will come as no surprise that this has all left me with strong opinions on the role of computer technology in election administration. But before we get to any of that, I want to talk a bit about the facts of the matter.

The thing that most frustrates me about controversies surrounding electronic voting is the generally very poor public understanding of what electronic voting is. If you follow me on Twitter, you may have seen a thread about this recently, and it's a ramble I go on often. There is a great deal of public misconception about the past, present, and future role of electronics in elections. These misunderstandings constantly taint debate about electronic voting.

In an on-and-off series of posts, I plan to provide an objective technical discussion of election technology, "electronic voting," and security concerns surrounding both. I will largely not be addressing recent "stolen election" conspiracy theories for a variety of reasons, but will undoubtedly touch on them occasionally. If nothing else, I will touch on them because I can never turn down an opportunity to talk about J. Hutton Pulitzer, an amazing wacko who has a delightful way of appearing with a huge splash, making a fool of himself, and then disappearing... to pop up again a couple years later in a completely different context.

I will restate that my goal here is to remain largely apolitical (mocking J. Hutton Pulitzer aside), and as a result I will not necessarily respond to any given election fraud or interference claim directly. But I do think anyone interested in or concerned by these theories will find the technical context that I can provide very useful.

Who runs elections?

One of the odd things about the US, compared to other countries, is the general architecture of election administration. In the US, elections are mostly administered by the county clerk, and the election process is defined by state law. Federal law imposes only minimal requirements on election administration, leaving plenty of room for variation between states.

Although election administration is directly performed by the county clerk, for state-level elections (which is basically all the big ones) the secretary of state performs many functions. It's also typical for the secretary of state to provide a great deal of support and policy for the county clerks. So, while county clerks run elections, it's common for them to do so using equipment, software, and methods provided by the state. It's ultimately the responsibility of states to pay for elections, which is probably the greatest single problem with US election integrity, because states are poor.

While it seems a little odd that, say, a presidential election is run by the county clerks, it can also be odd the other way. Entities like municipalities, school districts, higher education districts, flood control districts, all kinds of sub-county entities may also have elected offices and the authority to issue bond and tax measures. These are typically (but not always) administered by the county or counties as well, usually on a contract basis.

What is electronic voting?

Debate around electronic voting tends to focus purely on "voting machines," a broad category that I will define more later. The reality is that voting machines are only a small portion of the overall election apparatus, and are not always the most important part. So before I get into the world of election security theory, I want to talk a bit about the moving parts of an election, and where technology is used.

The general timeline of an election looks like this:

To meet these ends, election administrators use various different systems. There's a great deal of mix-and-match between these systems: many vendors offer a "complete solution," but it's still common for election administrators to use products from multiple vendors.

Each of these systems poses various integrity and security concerns. However, election systems can be roughly divided into two categories: tabulating systems and non-tabulating systems.

Tabulating systems, such as tabulators and direct recording electronic (DRE) machines, directly count votes which they record in various formats for later totalization. Tabulating systems tend to be the highest-risk element of an election because they are the key point at which the outcome of an election could be altered by, for example, changing votes.

Non-tabulating systems perform support functions such as design of ballots, registration of voters, and totalizing of tabulated votes. These systems tend to be less security critical because they produce artifacts which are relatively easy to audit after the fact. For example, a fault in ballot design will be fairly obvious and easy to check for. Similarly, totalizing of tabulated votes can fairly easily be repeated using the original output of the tabulators (and tabulators typically output their results in multiple independent formats to facilitate this verification).
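To make the totalizing point concrete, here is a minimal Python sketch of re-totalizing from tabulator output and cross-checking two independent records of the same tabulator. Everything in it is hypothetical: the function names, the idea of representing each tabulator's results as a plain dict of choice to count, and the assumption that the two record sets correspond to, say, an electronic results file and a transcribed printout. Real election management systems use their own formats.

```python
from collections import Counter

def totalize(tabulator_results):
    """Sum per-choice vote counts across every tabulator's results."""
    total = Counter()
    for result in tabulator_results:
        total.update(result)  # adds counts rather than replacing them
    return total

def cross_check(primary, secondary):
    """Compare two independent copies of each tabulator's results.

    Returns the indices of tabulators whose two records disagree.
    """
    if len(primary) != len(secondary):
        raise ValueError("record sets cover different tabulators")
    return [i for i, (a, b) in enumerate(zip(primary, secondary)) if a != b]
```

The design point is that both steps are cheap and deterministic: anyone holding the tabulator outputs can repeat the addition and get the same answer, which is why totalization is far easier to audit than the tabulation that produced the inputs.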

This is not to say that tabulating systems are not subject to audit. When a paper form of the voter's selections exists (a ballot or paper audit trail), it's possible to manually recount the paper form in order to verify the correctness of the tabulation. However, this is a much more labor intensive and costly operation than auditing the results of other systems. In the case of DRE systems with no paper audit trail, an audit may not be possible.

We will be discussing all of these systems in more detail in the future.

Why electronic voting

There is one fundamental question about electronic voting that I want to address up front, in this overview. That is: why electronic voting at all?

Most of the fervor around electronic voting has centered around direct recording electronic (DRE) machines that lack a voter verifiable paper audit trail (VVPAT) [2]. These machines, typically touchscreens, record the voter's choices directly to digital media without producing any paper form. As a result, there is typically no acceptable way to audit the tabulation performed by these machines. Software bugs or malicious tampering could result in an incorrect tabulation that could not be readily detected or corrected after the fact.

It's fairly universally accepted that these machines are a bad idea. Basically no one approves of them at this point. So why are they so common?

Well, this is the first major misconception about the nature of electronic voting: DRE machines with no VVPAT are rare. Only ten states still use them, and most of those states only use them in some polling places. Year by year, the number of DRE w/o VVPAT machines in use decreases as they are generally being replaced with other solutions.

The reason is simple: they are extremely unpopular.

So why did anyone ever have DRE machines? And why do we use machines at all instead of paper ballots placed in a simple box?

The answer is the Help America Vote Act of 2002 (HAVA). The HAVA was written with a primary goal of addressing the significant problems that occurred with older mechanical voting systems in the 2000 election, including accessibility problems. Accessibility is its biggest enduring impact: the HAVA requires that all elections offer a voting mechanism which is accessible to individuals with various disabilities including impaired or no vision.

In 2002, there were few options that met this requirement.

The other key ingredient is, as we discussed earlier, the nature of election administration in the US. Elections are not just administered but funded at the state and county level. State budgets for elections have typically been very slim, and in 2002, most states suddenly faced a requirement that they replace their voting systems.

The result was that, in the years shortly after 2002, basically the entire United States replaced its voting systems on a shoestring budget. Many states were forced to go for the cheapest possible option. Because paper handling adds an appreciable amount of complexity, the cheapest option was to do it in software: "paperless," or non-auditable, DRE machines.

To the extent that DRE w/o VVPAT machines are still in use in 2021, we are still struggling with the legacy of the HAVA's good intentions combined with the US's decentralized and tiny budget for the fundamental administration of democracy.

We don't have non-auditable voting systems because someone likes them. We have them because they were all we could afford in 2003, and because we haven't since been able to afford to replace them.

Basically the entire electronic voting landscape revolves around this single issue: there is enormous pressure in the US to perform elections as cheaply as possible, while still meeting sometimes stringent but often lax standards. The driver on selection of election technology is almost never integrity, and seldom speed or efficiency. It is nearly always price.

In upcoming posts, I will be expanding on this with (at least!) the following topics:

[1] I highly recommend that anyone with an interest in election administration step up as a poll worker. You will learn more than you could imagine about the practical considerations around elections.

[2] We will talk more about VVPAT and how it compares to a paper ballot in the future.


>>> 2021-08-03 key systems

programming note update: the ongoing reliability problems with computer.rip have been tracked down to a piece of hardware which is Not My Problem, and so I anxiously await the DC installing a replacement. Hopefully the problem will be resolved shortly.

And now for more about telephones, because I am on vacation in Guadalajara and telephones are decidedly a recreational topic. If you follow me on twitter I am probably about to provide an over-length thread on some Mexican telephone trivia.

Back when I was talking about turrets, I mentioned their relationship to key systems. While largely forgotten today, key systems were an important step in the evolution of business telephone systems and remain influential on business telephony today. Let's talk a bit about key systems, including some particularly notable ones.

But first, it would be helpful to understand the landscape of business telephony systems. I'm writing this from the perspective of today, but I think this overview will be helpful in understanding the context in which the key system was invented and became popular.

Most businesses have a simple problem: they have, say, ten employees, each with a phone, but they do not want to pay the considerable expense of having ten telephone lines in service. It would be much better to have, say, two telephone lines, which would be shared among the employees. The first and most obvious solution was the private branch exchange, often abbreviated PBX. In a classic PBX arrangement, one or more outside lines terminate at a small manual exchange (the type with operators that insert plugs to connect lines). The PBX can provide the same services as a telco exchange, including answering incoming calls and directing them to inside lines, but comes at the significant disadvantage of requiring an operator.

Today, it's not unusual for a front-desk receptionist or other similar employee to serve as the de facto telephone operator (usually today called an "attendant" to differentiate from the older position of a dedicated operator), answering incoming calls and directing them appropriately. The design of a manual telephone exchange made this impractical, though, as even small manual exchanges were pretty large and nearly required wearing a headset... wearing a headset and sitting behind a plugboard was not amenable to greeting guests or other typical receptionist tasks, so a dedicated, full-time telephone operator was basically required. This made PBXs very expensive to operate, in addition to the considerable expense of purchasing one.

The solution here seems obvious: the Private Automatic Branch Exchange, or PABX. A PABX uses automatic switching rather than manual. Outbound calls can be made by dialing, while inbound calls can be managed by various techniques like DID or an automated attendant. In the case of DID, Direct Inward Dialing, the telephone company assigns a unique telephone number to each employee of a company even though the company does not have that many lines (for practical reasons related to how mechanical switches hunted for available lines, in early cases these numbers usually had to be sequential). When the telco connected a call to the PABX, it used some technique to indicate the number the call had been dialed to originally---early on this was often the delightfully named Revertive Pulsing, where once the PABX "answered" the line the exchange pulse-dialed back to the PABX, often with the last n digits of the called number.
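As a rough illustration of what the PABX does with those last n digits, here is a minimal Python sketch. It deliberately simplifies: it models the digits as ordinary dial-pulse trains (1 through 9 pulses for digits 1 through 9, ten pulses for 0) and ignores the electrical details of how revertive pulsing was actually signaled. The function names, the digits-only number format, and the three-digit extension plan are all hypothetical.

```python
def exchange_send(called_number, n):
    """Exchange side: pulse out the last n digits of the dialed number.

    Assumes called_number is digits only. In dial pulsing, digits 1-9 are
    that many pulses and 0 is ten pulses.
    """
    return [10 if d == "0" else int(d) for d in called_number[-n:]]

def pabx_receive(pulse_trains, extensions):
    """PABX side: count pulses back into digits and look up the extension."""
    for count in pulse_trains:
        if not 1 <= count <= 10:
            raise ValueError(f"invalid pulse count: {count}")
    digits = "".join("0" if count == 10 else str(count) for count in pulse_trains)
    return digits, extensions.get(digits)
```

With sequential DID numbers like 555-0200 through 555-0209, the exchange only needs to send the last three digits: `exchange_send("5550204", 3)` yields `[2, 10, 4]`, which the PABX decodes back to `"204"` and routes to whatever phone that extension maps to.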

In the case of an automated attendant (AA), the PABX answers and plays an audio recording prompting the caller to enter an extension. It then connects the call appropriately. The AA may optionally provide a menu of usually single-digit options, although this is a bit more complicated to implement and was not as common on early PABXs.
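The routing decision an AA makes after its greeting can be sketched as follows. This is a minimal Python illustration under stated assumptions: the caller's keypresses arrive as a string of digits, single-digit menu options never share a leading digit with an extension (for example, extensions in the 200s and menu options 1 and 9), and silence falls through to a human attendant. The function name, the menu, and the "operator" fallback are all hypothetical.

```python
def auto_attendant(keypresses, extensions, menu, ext_length=3):
    """Route a caller based on the digits they key in after the greeting.

    A leading digit found in `menu` is treated as a menu choice; otherwise
    digits are collected until a full extension has been entered.
    """
    if not keypresses:
        return "operator"  # hypothetical fallback when the caller enters nothing
    first = keypresses[0]
    if first in menu:
        return menu[first]
    extension = keypresses[:ext_length]
    return extensions.get(extension, "invalid extension")
```

The menu branch is the "bit more complicated" part: the AA has to decide after a single digit whether it is looking at a menu choice or the start of an extension, which is why the sketch assumes the two never overlap.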

DID and AA are both ubiquitous today. The use of telephone extensions inside of businesses has generally decreased over the years as DID has become easier and cheaper to implement, but AAs remain common for telephone menus, which may straddle the line between a "mere" AA and the more complicated interactive voice response (IVR) system.

Here's the problem, though: in the early days of business telephony, DID and AAs were both very complex to implement. Early PABXs were mechanical, even Strowger (also called step-by-step or SXS), and the introduction of DID significantly complicated the switching matrix. The lack of good, reliable audio playback devices and the lack of universal DTMF signaling made AAs impractical for quite some time.

The upshot: for smaller organizations, which could not justify the expense of employing a telephone operator during business hours, there were few practical options. PABXs were too expensive and too limited, often still requiring a full-time operator to handle incoming calls [1].

The key system was introduced as a compromise. Like a PABX, it does not require an operator. But, a key system is substantially less complex and expensive than a PABX. What's the trick? A key system makes everyone act as the operator.

When I previously mentioned key systems I put it like this: a PABX connects many users to each line. A key system connects many lines to each user.

Let's say again you are a small organization with about ten employees and you want to pay for two lines. When you install a key system, you connect the two outside lines to a Key Service Unit (KSU). The KSU is then connected to each of the ten telephones by a large, multi-pair cable, often terminated with a 25-pair Amphenol type connector. Superficially, it may look like a PABX, but the use of the multi-pair cable is a big hint to what's going on: the KSU only provides very minimal electrical conversions and mostly just acts as a jumper matrix. All of the actual logic is in the telephones, each of which has all of the outside lines connected directly to it.

The "key" in "key system" refers to the "line keys" on each phone. In our notional two-line system, each phone has two buttons labeled "line 1" and "line 2." Whenever a line is in use, the button lights. When a line is ringing, the light flashes and the phone may ring depending on configuration (ringing can usually be enabled/disabled per line to provide a simple concept of "call groups" if the outside lines have different numbers).

To place a call, a user presses a line key that is not lit, which connects their phone directly to that outside line. They then dial normally. To answer a call, the user presses the flashing line key and then picks up the phone. All they really have is a phone that is connected to all of the outside lines; the key system just makes it possible to have many phones connected this way at once.
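A toy model makes the "many lines to each user" idea concrete. The following Python sketch is an illustration only, with hypothetical class and method names: every phone can reach every outside line, each line key's lamp is derived from the line's state (dark, lit, or flashing), and pressing a key simply bridges the phone onto that line. Nothing in this basic model stops a second phone from pressing a lit key and joining a call in progress.

```python
from dataclasses import dataclass, field

@dataclass
class Line:
    name: str
    ringing: bool = False
    connected: set = field(default_factory=set)  # phones bridged onto this line

    def lamp(self):
        if self.ringing:
            return "flashing"
        return "lit" if self.connected else "dark"

class KeySystem:
    """Every phone is wired to every outside line; the KSU just fans them out."""

    def __init__(self, line_names):
        self.lines = {name: Line(name) for name in line_names}

    def ring(self, line_name):
        self.lines[line_name].ringing = True  # incoming call from the telco

    def press(self, phone, line_name):
        line = self.lines[line_name]
        line.ringing = False       # pressing a flashing key answers the call
        line.connected.add(phone)  # no privacy: a lit key bridges onto the call

    def hang_up(self, phone, line_name):
        self.lines[line_name].connected.discard(phone)

    def lamps(self):
        return {name: line.lamp() for name, line in self.lines.items()}
```

In a two-line setup, ringing line 2 makes its lamp flash on every phone; after reception presses it the lamp is lit everywhere, and if sales presses the same key both phones end up on the call.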

Of course, early on key systems sprouted additional features. Even the earliest key systems started to offer an "intercom" feature, in which one or more pairs on each phone were connected to an "intercom bridge" in the KSU. This provided a feature that is superficially like a PABX's inside calling: a user can press an intercom key and then dial a number, which causes another phone on the system to ring. When that person answers, they can have a conversation. Of course the simple design of the feature imposes a lot of limitations, and generally only one intercom call can be made for each assigned intercom bridge on the system. This was often only one or two.

You can also see that key systems pose a significant risk of "collisions." Later key systems often included a "privacy" feature that locked out phones from connecting to a line when it was currently in use, so that other users could not eavesdrop on your calls. The feature could similarly prevent someone trying to make an intercom call from suddenly being placed into an existing call. Of course these features meant that if all outside lines or all intercom bridges were in use, it was simply not possible to make a call. The line key lights served an important purpose in showing users when a line was available for their use.
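The privacy lockout and the limited intercom bridges can be sketched as simple resource locking. This minimal Python illustration uses hypothetical names and assumes one holder per line and a fixed number of intercom bridges; it is not modeled on any particular product. When everything is busy, the request is just refused, matching the behavior described above.

```python
class PrivacyKeySystem:
    """Outside lines and intercom bridges that lock out other phones while in use."""

    def __init__(self, lines, intercom_bridges=1):
        self.in_use = {name: None for name in lines}  # line -> phone holding it
        self.bridges = [None] * intercom_bridges       # bridge -> phone using it

    def seize_line(self, phone, line):
        if self.in_use[line] is not None:
            return False  # privacy lockout: someone else already has this line
        self.in_use[line] = phone
        return True

    def release_line(self, line):
        self.in_use[line] = None

    def intercom_call(self, phone):
        for i, holder in enumerate(self.bridges):
            if holder is None:
                self.bridges[i] = phone
                return i  # index of the intercom bridge now in use
        return None  # every bridge is busy, so the call cannot be made

    def end_intercom(self, bridge):
        self.bridges[bridge] = None
```

With one intercom bridge, a second intercom call fails outright until the first one ends, which is the "only one or two" limitation in practice.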

Perhaps the quintessential key system is the Western Electric 1A and descendants, which were in widespread use for decades around the mid century. Later revisions of the 1A such as the 1A2 supported as many as 29 lines to each phone (this required multiple 25-pair cables per phone!) and advanced (for the time) features such as attended transfer and music on hold.

Key systems were often designed flexibly to reduce cost of installation. For example, outside lines might be allocated to different departments. Most phones would only need to be connected to the lines for their department, but a receptionist might have a "call director" phone that presented all lines so that they could answer calls for multiple departments [2].

My favorite key system, though, is the AT&T Merlin. The Merlin was a late digital key system, introduced in 1983, and so began to blur the line between key system and PABX. Most importantly, though, the Merlin telephone instruments were beautiful. Seriously, look at them. An advertising campaign including product placement in films and television reinforced the aesthetic cachet of the Merlin. The campaign is said to have been so successful that the Merlin instruments became something of a status symbol, and client-contact organizations like law firms would upgrade from 1A2 to Merlin just for the desk decorations. I recall having read once that the Merlin was a key inspiration for the design of the NeXT Cube under Steve Jobs, but I cannot find a source on this now so perhaps I just made it up. I certainly hope it's true!

It might seem that key systems would be an artifact of history today, entirely outmoded by the availability of inexpensive PABX systems. There were a lot of disadvantages to key systems. Besides the issue of users having to manually select lines, and limited logic on ring groups, the large multi-pair cables required to telephone instruments made key systems expensive to install and not amenable to reuse of existing phone cabling in a building.

The funny thing is that sort of the opposite happened. The low-cost PABXs that became readily available in the 1990s were actually more descended from key systems than the earlier electromechanical PABXs. The small business PABX I have in my house, for example, the Comdial DX80, is basically an overgrown key system. Yet it has many of the advantages of an earlier PABX!

Here's the trick: the availability of computer-controlled digital switching and communications allowed for implementing a "key system" using a standard two-pair line to each telephone. Small businesses were usually upgrading from key systems and expected similar behavior. So it just made sense to take a suite of PABX features and shove them into a key system, using digital signaling to simplify the installation of the system.

So the DX80 for example works like this: the KSU communicates with the phones using a digital protocol over a single-pair telephone line. Each telephone instrument can be equipped with a full set of line keys for the KSU's up-to-16 outside lines, but the KSU is also capable of automatically selecting outside lines and automated incoming call routing based on DID or an auto-attendant. Internal calling between phones is managed digitally and is not limited to one or two intercom lines. All this adds up to flexibility: you can use the DX80 as either a key system or a PABX, depending on how you configure it. You can leave automated line selection un-configured and present line keys on the phones, or you can remove the line keys from phones (reallocating them to other uses) and set up fully automatic call handling.
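The dual personality can be illustrated with a small model in which the same pool of outside lines is reachable either through a specific line key or through automatic selection of the first idle line. This is a hypothetical Python sketch, not the DX80's actual logic: the class name, the 1-based line numbering, and the "first idle line wins" policy are all assumptions (automatic selection is what a user gets from something like a "dial 9 for an outside line" access code, another assumption here).

```python
class HybridKSU:
    """Outside lines reachable either by line key or by automatic selection."""

    def __init__(self, line_count):
        self.busy = [False] * line_count

    def line_key(self, n):
        """Key-system style: the user picks a specific line (1-based)."""
        if self.busy[n - 1]:
            return None  # lamp is lit; the line is already in use
        self.busy[n - 1] = True
        return n

    def auto_select(self):
        """PABX style: the system grabs the first idle outside line."""
        for i, busy in enumerate(self.busy):
            if not busy:
                self.busy[i] = True
                return i + 1
        return None  # all outside lines busy

    def release(self, n):
        self.busy[n - 1] = False
```

Because both paths draw on the same `busy` state, one office can mix the two styles: a receptionist with line keys and everyone else relying on automatic selection, which is roughly the "both" configuration many organizations ended up with.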

Many organizations ended up doing both!

A lot of '90s to '00s PABXs were like this. They had sort of an identity crisis between key system and PABX where they wanted to present the convenience of a PABX without removing the familiar line keys for direct access to outside lines. Those line keys could be important, after all, as not all businesses had a DID arrangement (or even disconnect supervision) from their telco, so the use of the line keys allowed for connecting the PABX directly to a "normal" telephone line without needing to get the telco to enable additional features.

Today, most business telephone systems are being converted to VoIP, which can provide additional flexibility and features and basically obsoletes the concept of a key system, since the "number of lines" on a VoIP trunk is a largely synthetic concept. Nonetheless, most VoIP systems can be configured for key-system-like behavior if you really want it.

[1] I have omitted from this discussion Centrex and other forms of telco-operated PABXs. I will probably do a full post on these in the future. For a short time I worked for a large organization which owned a formerly AT&T-operated 5ESS as their PABX and had the pleasure of getting an extensive tour of the system from one of its few remaining on-site technicians. It has since been decommissioned. As a basic hint, when an organization is large enough to have one or more exchange codes to itself (often seen with universities and older large corporations), it's likely that they had an on-site PABX provided by the telco. If an organization had a set of sequential numbers but no on-site switch, they probably used Centrex, which was basically the same arrangement except that the switch was located in a telco office (and often "virtualized" on an existing ESS). Centrex was also popular with organizations that were very large but had multiple facilities, like school districts, since the existing telco exchange office was as convenient of a central location as anywhere else. That said, the nature of their close relationship to government meant that school districts often found it convenient to run their own private trunk lines between buildings, and so they may have still used an on-site switch.

[2] The term "call director" is still sometimes used today to refer to phones with an unusually large number of line buttons, often on a device like a "receptionist sidecar". The terminology is confused by "Call Director" also being the name of various PABX products and features.


>>> 2021-07-26 rip those bits to shreds

Programming note: you may have noticed that computer.rip has been up and down lately. My sincere apologies. One of the downsides of having a neo-luddite aversion to the same cloud services you work with professionally all day is that sometimes your "platform as a physical object" (PaaPO) starts exhibiting hardware problems that are tricky to diagnose, and you are not paid to do this, so you are averse to spending a lot of your weekend on it. Some messing around and a few remote hands tickets later, the situation seems to have stabilized, and this irritation has given me the impetus to get started on my plans to move this infrastructure back to Albuquerque.

Let's talk a bit about something practical. Since my academic background is in computer security, it's ego-inflating to act like some kind of expert from time to time. Although I have always focused primarily on networking, I also have a strong interest in the security and forensic concerns surrounding file systems and storage devices. Today, we're going to look at storage devices.

It's well known among computing professionals that hard disk drives pose a substantial risk of accidental data exposure. A common scenario is that a workstation or laptop is used by a person to process sensitive information and then discarded as surplus. Later, someone buys it at auction, intercepts it at a recycler, or similar and searches the drive for social security numbers. This kind of thing happens surprisingly frequently, perhaps mostly because the risk is not actually as common knowledge as you would think. I have a side hustle, hobby, and/or addiction of purchasing and refurbishing IT equipment at auction. I routinely purchase items that turn out to have intact storage, including from government agencies.

So, to give some obvious advice: pay attention to old devices. If your organization does not have a policy around device sanitization, it should. Unfortunately the issue is not always simple, and even organizations which require sanitization of all storage devices routinely screw it up. A prominent example is photocopiers: for years, organizations with otherwise good practices were sending photocopiers back to leasing companies or to auction without realizing that most photocopiers these days have nonvolatile storage to which they cache documents. So having a policy isn't really good enough on its own: you need to back it up with someone doing actual research on the devices in question. I have heard of a situation in which a server was "sanitized" and then surplussed with multiple disk drives intact because the person sanitizing it didn't realize that the manufacturer had made the eccentric decision to put additional drive bays on the rear of the chassis!

But that's all sort of beside the point. We all agree that storage devices need to be sanitized before they leave your control... but how?

Opinions on data sanitization tend to fall into two camps. Roughly, those are "an overwrite is good enough" and "the only way to be sure is to nuke it from orbit." Neither of these positions is quite correct, and I will present an unusually academic review here of the current state of storage sanitization, along with my opinionated advice.

The black-marker overwrite

The most obvious way to sanitize a storage device, perhaps after burying it in a hole, is to overwrite the data with something else. It could be ones, it could be zeroes, it could be random data or some kind of systematic pattern. The general concept of overwriting data to destroy it presumably dates back to the genesis of magnetic storage, but for a long time it's been common knowledge that merely overwriting data is not sufficient to prevent recovery.
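For reference, a single-pass overwrite is about as simple as code gets. This is a minimal sketch, assuming `path` is something you can open read-write: a regular file, or (with root) a whole-disk device node. The function name is my own. Keep in mind that overwriting a file through a journaling or copy-on-write file system, or overwriting anything on an SSD, does not guarantee the old physical blocks are touched, which is exactly the problem discussed further down.

```python
import os
from typing import Optional

CHUNK = 1 << 20  # write in 1 MiB pieces


def overwrite_once(path: str, fill: Optional[int] = 0) -> int:
    """Overwrite every byte of `path` once.

    fill is a byte value (0-255), or None to write random data instead.
    Returns the number of bytes written.
    """
    with open(path, "r+b") as f:
        size = f.seek(0, os.SEEK_END)  # seeking to the end reports the size
        f.seek(0)
        written = 0
        while written < size:
            n = min(CHUNK, size - written)
            f.write(os.urandom(n) if fill is None else bytes([fill]) * n)
            written += n
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data out to the device
    return written
```

Seeking to the end rather than calling `os.path.getsize` is deliberate: on Linux the latter reports 0 for a block device, while seeking works for both files and disks.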

A useful early illustration of the topic is Venugopal V. Veeravalli's 1987 master's thesis, "Detection of Digital Information from Erased Magnetic Disks." It's exactly what it says on the tin. The paper is mostly formulae by mass, but the key takeaway is that Veeravalli connected a spectrum analyzer to a magnetic read head and showed that the data from the spectrum analyzer, once subjected to a great deal of math, could be used to reconstruct the original contents of an erased disk to a certain degree of confidence.

This is pretty much exactly the thing everyone was worried about, and various demonstrations of this potential led to Peter Gutmann's influential 1996 paper "Secure Deletion of Data from Magnetic and Solid-State Memory." Gutmann looks at a lot of practical issues in the way storage devices work and, based on the specific patterns that could remain under different physical encodings of data on the disk, proposes the perfect method of data erasure. The Gutmann Method, as it's sometimes called, is a 35-pass scheme of overwriting with both random data and fixed patterns.

The reason for the large number of passes is partially Just To Be Sure, but the fixed pattern overwrites are targeted at specific forms of encoding. The process is longer than strictly needed because Gutmann believed that a general approach to the problem requires use of multiple erasure methods, one of which ought to be appropriate for the specific device in question. This is to say that Gutmann never really thought 35 passes were necessary. Rather, to put it pithily, he figured eight random passes would do and then combined the patterns for all the encoding schemes to get 27 more passes that ought to even out the encoding-related patterns on the drives of the time.
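The arithmetic is easiest to see by writing out the schedule from Gutmann's table: four random passes, 27 fixed patterns, four more random passes. This sketch reproduces the pattern list as I understand it from the 1996 paper (patterns given as the repeating byte sequence, `None` meaning random data); the function name is my own.

```python
from typing import List, Optional

# The three rotations of each 3-byte repeating pattern in the table.
ROT_924 = [bytes([0x92, 0x49, 0x24]), bytes([0x49, 0x24, 0x92]), bytes([0x24, 0x92, 0x49])]
ROT_6DB = [bytes([0x6D, 0xB6, 0xDB]), bytes([0xB6, 0xDB, 0x6D]), bytes([0xDB, 0x6D, 0xB6])]


def gutmann_passes() -> List[Optional[bytes]]:
    """Return the 35 passes in table order; None marks a random-data pass."""
    random_passes: List[Optional[bytes]] = [None] * 4
    fixed = (
        [b"\x55", b"\xaa"]                            # 0101..., 1010...
        + ROT_924
        + [bytes([n * 0x11]) for n in range(16)]      # 0x00, 0x11, ..., 0xFF
        + ROT_924
        + ROT_6DB
    )
    return random_passes + fixed + random_passes
```

Counting it up gives 4 + 27 + 4 = 35, i.e. the "eight random passes" plus 27 encoding-specific ones.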

Another way to make my point is this: Gutmann's paper is actually rather specific to the storage technology of the time, and the time was 1996. So there's no reason to work off of his conclusions today. Fortunately few people do, because a Gutmann wipe takes truly forever.

Another influential "standard" for overwriting for erasure is the "DoD wipe," which refers to 5220.22-M, also known as the National Industrial Security Program Operating Manual, also known as the NISPOM. I can say with a good degree of confidence that every single person who has ever invoked this standard has misunderstood it. It is not a standard, it is not applicable to you, and since 2006 it no longer makes any mention of a 3-pass wipe.

Practical data remanence

The concept of multi-pass overwrites for data sanitization is largely an obsolete one. This is true for several different reasons. Most prominently, the nature of storage devices has changed appreciably. The physical density of data recording has increased significantly. Drive heads are now positioned by voice coils and track the data dynamically rather than relying on fixed absolute positioning (reducing tracking error). And there are of course today many solid-state drives, which repeatedly overwrite data as a matter of normal operating procedure (but at the same time may leave a great deal of data available).

You don't need to take my word on this! Also in 2006, for example, the NIST issued new recommendations on sanitization stating that a single overwrite was sufficient. This may have been closely related to the 2006 NISPOM change. Gutmann himself published a note in 2011 that he no longer believes his famous method to be relevant and assumes a single overwrite to be sufficient.

Much of the discussion of recovery of overwritten data from magnetic media has long concentrated around various types of magnetic microscopes. Much like your elementary school friend whose uncle works for Nintendo, the matter is frequently discussed but seldom demonstrated. Without wanting to go too deep into review of literature and argumentative blog posts, I think it is a fairly safe assertion that recovery of data by means of electron microscopy, force microscopy, magnetic probe microscopy, etc. is infeasible for any meaningful quantity of data without enormous resources.

The academic work that has demonstrated recovery of once-overwritten data by these techniques has generally consisted of extensive effort to recover a single bit at a low level of confidence. The error rate makes recovery of even a byte impractical. A useful discussion of this is in the ICISS 2008 conference paper "Overwriting Hard Drive Data: The Great Wiping Controversy," amusingly written in part by a man who would go on to claim (almost certainly falsely) to have invented Bitcoin. It's a strange world out there.
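The reason a per-bit success rate kills byte-level recovery is just compounding. Assuming (as a simplification) that each bit is recovered independently with probability `p_bit`, the chance of getting n bits all right is p_bit to the n-th power. The 0.9 used below is a hypothetical and generous figure for illustration, not a number from the paper.

```python
import math


def p_all_correct(p_bit: float, n_bits: int) -> float:
    """Probability that n independently recovered bits are all correct."""
    return p_bit ** n_bits


def log10_p_all_correct(p_bit: float, n_bits: int) -> float:
    """The same quantity on a log10 scale, which stays readable for large n."""
    return n_bits * math.log10(p_bit)
```

With a hypothetical 90% per-bit accuracy, one byte comes out right only about 43% of the time (0.9 to the 8th), and a single 512-byte sector is on the order of 10^-187.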

As far as summing up the issue, I enjoy the conclusion of a document written by litigation consultant Fred Cohen:

To date I have found no example of any instance in which digital data recorded on a hard disk drive and subsequently overwritten was recovered from such a drive since 1985... Indeed, there appears to be nobody in the [forensics and security litigation] community that disputes this result with any actual basis and no example of recovery of data from overwritten areas of modern disk drives. The only claims that there might be such a capability are based on notions surrounding possible capabilities in classified environments to which the individuals asserting such claims do not assert they have actual access and about which they claim no actual knowledge.

Recovery of overwritten data by microscopy is, in practice, a scary story to tell in the dark.

The takeaway here is that, for practical purposes, a single overwrite of data on a magnetic platter seems to be quite sufficient to prevent recovery.

It's not all platters

Here's the problem: in practice, remanence on magnetic media is no longer the thing to worry about.

The obvious reason is the extensive use of SSDs and other forms of flash memory in modern workstations and portable devices. The forensic qualities of SSDs are, to put it briefly, tremendously more complicated and more poorly understood than those of HDDs. To even skim the surface of this topic would require its own post (perhaps it will get one someday), but the important thing to know is that SSDs throw out all of the concerns around HDDs and introduce a whole set of new concerns.

The second reason, though, and perhaps a more pervasive one, is that the forensic properties of the magnetic platters themselves are well understood, but those of the rest of the HDD are not.

The fundamental problem in the case of both HDDs and SSDs is that modern storage devices are increasingly complex and rely on significant onboard software in order to manage the physical storage of data. The behavior of that onboard software is not disclosed by the manufacturer and is not well understood by the forensics community. In short, when you send data to an HDD or SSD, we know that it puts that data somewhere but in most cases we really don't know where it puts it. Even in HDDs there can be significant flash caching involved (especially on "fancier" drives). Extensive internal remapping in both HDDs and SSDs means that not all portions of the drive surface (or flash matrix, etc) are even exposed to the host system. In the case of SSDs, especially, large portions of the storage are not.

So that's where we end up in the modern world: storage devices have become so complex that the recovery methods of the 1980s no longer apply. By the same token, storage devices have become so complex that we can no longer confidently make any assertions about their actual behavior with regard to erasure or overwriting. A one-pass overwrite is both good enough at the platter level and clearly not good enough at the device level, because caches, remapping, wear leveling, etc. all mean that there is no guarantee that a full overwrite actually overwrites anything important.
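Here is a toy illustration of why a full overwrite from the host can miss data once remapping and wear leveling are involved. This is a hypothetical, drastically simplified flash translation layer of my own invention, not how any particular drive's firmware works: every write goes to a fresh physical page, the old page is merely forgotten, and there are more physical pages than logical blocks.

```python
from typing import Dict, List, Optional


class ToyFTL:
    """Hypothetical, drastically simplified flash translation layer.

    Every host write lands on a fresh physical page and the old page is
    merely forgotten, not erased. There are more physical pages than
    logical blocks (over-provisioning), so the host can never address
    all of the flash, no matter how many times it overwrites the drive.
    """

    def __init__(self, logical_blocks: int, physical_pages: int) -> None:
        self.logical_blocks = logical_blocks
        self.pages: List[Optional[bytes]] = [None] * physical_pages  # raw flash
        self.map: Dict[int, int] = {}  # logical block address -> physical page
        self.next_free = 0             # naive append-only allocator, no GC

    def write(self, lba: int, data: bytes) -> None:
        if not 0 <= lba < self.logical_blocks:
            raise IndexError(f"LBA {lba} out of range")
        if self.next_free >= len(self.pages):
            raise RuntimeError("out of free pages (a real drive would garbage-collect)")
        self.pages[self.next_free] = data  # the previously mapped page keeps its data
        self.map[lba] = self.next_free
        self.next_free += 1

    def read(self, lba: int) -> Optional[bytes]:
        return self.pages[self.map[lba]]
```

After the host overwrites every logical block with zeros, every read returns zeros, yet the original data is still sitting in the raw pages. A real drive would eventually garbage-collect those stale pages, but on its own schedule and invisibly to the host.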

Recommended sanitization methods

Various authorities for technical security recommendations exist in the US, but the major two are the NIST and the NSA.

NIST SP 800-88, summarized briefly, recommends that sanitization be performed by degaussing, overwriting, physical destruction of the device, or encryption (we will return to this point later). The NIST organizes these methods into three levels (Clear, Purge, and Destroy), which are to be selected based on risk analysis, and physical destruction is the recommended method for high risk material or material where no method of reliable overwriting or degaussing is known.

NSA PM 9-12 requires sanitization by degaussing, disintegration, or incineration for "hard drives." Hard drives, in this context, are limited to devices with no non-volatile solid state memory. For any device with non-volatile solid state memory, disintegration or incineration is required. Disintegration is performed to a 2mm particle size, and incineration at 670 Celsius or better.

Degaussing, in practice, is surprisingly difficult. Effective degaussing of hard drives tends to require disassembly in order to individually degauss the platters, and so is difficult to perform at scale. Further, degaussing methods tend to be pretty sensitive to the exact way the degaussing is performed, making them hard to verify. The issue is big enough that the NSA requires that degaussing be followed by physical destruction of the drive, but to a lower standard than for disintegration (simple crushing is acceptable). For that reason, disintegration and incineration tend to be more common in government contexts.

It's sort of funny that I tell you all about how multiple overwrite passes are unnecessary but then tell you that accepted standards require that you blend the drive until it resembles a coarse glitter. "Data sanitization is easy," I say, chucking drives into a specialized machine with a 5-figure price tag.

The core of the issue is that the focus on magnetic remanence is missing the point. While research indicates that magnetic remanence is nowhere near the problem it is widely thought to be, in practice remanence is not the way that data is sneaking out. The problem is not the physics of the platters, it's the complexity of the devices and the lack of reliable host access to the entire storage capacity.

ATA secure erase and self-encryption and who knows what else

The ATA command set, or rather some version of it, provides a low-level secure erase command that, in theory, causes the drive's own firmware to initiate an overwrite of the entire storage surface. This is far preferable to overwriting from the host system, because the drive firmware is aware of the actual physical storage topology and can overwrite parts of the storage that are not normally accessible to the host.

The problem is that drive manufacturers have been found to implement ATA secure erase sloppily, or not at all. There is basically no external means of auditing that a secure erase was performed effectively. For that simple reason, ATA secure erase should not be relied upon.
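If you want to use it anyway, say as a belt-and-braces step before physical destruction, on Linux the usual route is `hdparm`. This sketch only builds the command lines rather than running them, because pointing these at the wrong device node is unrecoverable; the function name and the throwaway password are my own choices. Nothing here fixes the auditing problem: the drive reports success, and you take its word for it.

```python
from typing import List


def ata_secure_erase_commands(device: str, password: str = "p",
                              enhanced: bool = False) -> List[List[str]]:
    """Build, but do not run, the hdparm invocations for an ATA secure erase."""
    erase_flag = "--security-erase-enhanced" if enhanced else "--security-erase"
    return [
        # 1. Inspect: the Security section should say "supported" and "not frozen".
        ["hdparm", "-I", device],
        # 2. Set a temporary user password; the erase command requires one.
        ["hdparm", "--user-master", "u", "--security-set-pass", password, device],
        # 3. Ask the drive's own firmware to erase the entire drive.
        ["hdparm", "--user-master", "u", erase_flag, password, device],
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)` once you have triple-checked the device name.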

Another approach is the self-encrypting drive or SED, which transparently encrypts data as it is written. These devices are convenient since simply commanding the drive to throw away the key is sufficient. SED features tend to be better implemented than ATA secure erase because they generally appear only on high-end drives that are priced for the extra feature. That said, the external auditing problem still very much exists.

Another option is to encrypt at the host level, and then throw away the key at the host level. This is basically the same as the SED method but since the encryption is performed externally to the drive, the whole thing can be audited externally for assurance. In all reality this is a fine approach to data sanitization and should be implemented whenever possible. If you have ever been on the fence about whether or not to encrypt storage, consider this: if you are effective about encrypting your storage, you won't need to sanitize it later! The mere absence of the key is effective sanitization, as recognized by the NIST.
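The principle behind "throw away the key" fits in a few lines. This is a toy sketch only: the keystream is SHA-256 in counter mode, which I chose because it needs nothing beyond the standard library, and it has none of the properties (per-sector tweaks, nonces, review) of the AES-XTS that real host disk encryption such as LUKS or BitLocker uses. The function names are hypothetical.

```python
import hashlib
import os


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key || counter. Not a vetted cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_crypt(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt; XOR with the same keystream undoes itself."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))
```

In these terms, "sanitizing" the drive means destroying every copy of `key`. What remains on the platters or flash is ciphertext that no overwrite ever needs to reach.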

The problem is that disk encryption features in real devices are inconsistent. Drive encryption may not be available at all, or it may only be partial. This makes encryption difficult to rely on in most practical scenarios.

The bottom line

When you dispose of old electronics, you should perform due diligence to identify all non-volatile storage devices. These storage devices should be physically destroyed prior to disposal.

DIY methods like drilling through platters and hitting things with hammers are not ideal, but should be perfectly sufficient for real scenarios. Recovering data from partially damaged hard drives and SSDs is possible but not easy, and the number of facilities that perform that type of recovery is small. There are lots of ways to achieve this type of significant damage, from low-cost hand-cranked crushing devices to the New Mexican tradition of taking things out to the desert and shooting at them. Await my academic work on the merits of FMJ vs hollow-point for data sanitization. My assumption is that FMJ will be more effective due to penetration in multi-platter drives, but I might be overestimating the hardness of the media, or underestimating the number of rounds I will feel like putting into it.

Ideally, storage devices should be disintegrated, shredded, or incinerated. Unless you are looking forward to making a large quantity of thermite, these methods are difficult without expensive specialized equipment. However, there are plenty of vendors that offer certified storage destruction as a service. Ask your local shredding truck company about their rates for storage devices.

Most conveniently, do what I do: chuck all your old storage devices in a drawer, tell yourself you'll use them for something later, and forget about them. We'll call it long-term retention, or the geologic repository junk drawer.


>>> 2021-07-21 the desqtop

I believe I have mentioned before that the history of early GUI environments for PCs is sufficiently complex and obscure that it's very common to run into incorrect information. This is markedly true of the Wikipedia article on DESQview, which "incorrects" a misconception by stating another incorrect fact. Since it's Wikipedia, the free encyclopedia that anyone can edit, I assume that if I correct it the change will be reverted by bot within seconds.

False claims about TopView aside, the Wikipedia article on DESQview makes most of the salient points about its history. That said, I would like to talk about it a bit because DESQview is a neat example of an argument I've made, and it happens to dovetail into another corner of GUI history that I'll bring up here and there.

DESQview was a multitasking GUI built by a company called Quarterdeck. It was released for DOS in 1985, so about a year and a half after Visi On, and right in the thicket of most of the DOS GUIs. DESQview is a GUI, though, only in the sense of the logical paradigm of user interactions. It actually runs in textmode, using the box-drawing characters of the PC's extended ASCII character set (code page 437) to create windows and menus, and using letters and symbols as buttons. It's similar in this regard to the relatively modern Twin (Terminal Windows), and could be viewed as a souped-up terminal multiplexer like tmux.
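Here is roughly how a textmode window gets drawn. This sketch uses the Unicode equivalents of the single-line box-drawing characters, and Python's built-in `cp437` codec shows the actual byte values a PC would hold in text-mode video memory (ignoring the interleaved attribute bytes). The helper names are my own.

```python
from typing import List

# Single-line box-drawing characters, written here as Unicode.
TL, TR, BL, BR, HZ, VT = "┌", "┐", "└", "┘", "─", "│"


def draw_window(title: str, width: int, height: int) -> List[str]:
    """Return the rows of a text-mode window: a frame with a centered title."""
    inner = width - 2
    top = TL + title.center(inner, HZ)[:inner] + TR
    middle = [VT + " " * inner + VT for _ in range(height - 2)]
    bottom = BL + HZ * inner + BR
    return [top] + middle + [bottom]


def to_video_bytes(rows: List[str]) -> List[bytes]:
    """Encode rows in code page 437, the PC's native text-mode character set."""
    return [row.encode("cp437") for row in rows]
```

For example, `draw_window(" Notes ", 12, 4)` yields a 12-by-4 frame, and in CP437 its bottom-left corner is byte 0xC0 and the horizontal rule is 0xC4.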

Despite running in textmode, DESQview has basically all of the WIMP (Windows, Icons, Menus, Pointer) behavior that we consider typical of a GUI. To be fair, by virtue of running in textmode it fundamentally lacks icons, but so did a number of other early GUIs that ran in graphics mode. Any one of us could sit down in front of a machine running DESQview and figure out the basic interactions without much trouble, something that can't be said of most terminal multiplexers. Here is an example of the philosophical divide between TUI and GUI, or more specifically between unguided and guided: terminal multiplexers like screen and tmux are unguided interfaces that expect the user to read the manual. More typical of the GUI, DESQview attempts to make most functionality fairly discoverable to the user.

So in that light, consider this sentence from the Wikipedia article: "DESQview is not a GUI (Graphical User Interface) operating system. Rather, it is a non-graphical, windowed shell that runs in real mode on top of DOS, although it can run on any Intel 8086- or Intel 80286-based PC."

It's not a GUI, it's a non-graphical windowed shell. It runs in real mode on top of DOS, which is true of basically all '80s GUIs including Windows. It has windows, and it's a shell, but it's not a GUI because it's non-graphical. To me, at least, this whole thing is a bit farcical. The desire here to exclude DESQview from the category of GUIs only serves to reinforce that the interaction concept that we refer to as the "GUI" is actually quite divorced from the difference between text and raster displays. You can always employ ASCII art to pretend you have a graphical display, after all.

Another interesting component of DESQview to discuss is its support for DOS applications. We saw with Visi On that there is sort of a basic conflict involved in developing a DOS GUI: if it runs on top of DOS, users will want to be able to run their existing DOS software. But DOS software assumes full control of the machine and does not play well with multitasking. Visi On went the route of throwing DOS out the window and requiring that software be written specifically for Visi On [1]. DESQview went the opposite, more consumer-friendly route, of bending over backwards to work with the existing DOS stable.

DESQview had a significant leg up on this venture because its developer, Quarterdeck, had previously sold a DOS task-switcher called Desq. Task switchers are not really a familiar part of the modern computing landscape because of the ubiquity of multitasking operating systems. Back in the '80s, though, most microcomputer operating systems were single-task and so the ability to run multiple programs at the same time could only be simulated. A task switcher created something like multitasking by doing exactly what it sounds like: switching out the tasks.

Specifically, Desq acted as a DOS TSR, or Terminate and Stay Resident. When launched, Desq installed an interrupt handler and then terminated. The interrupt handler fired when keyboard keys were pressed (remember at this point the keyboard on PCs was connected via the 8042 keyboard controller, which generated interrupts on each keypress). The interrupt handler could basically inspect each keyboard event and decide whether to act on it. In effect, a TSR could implement a "global hotkey."
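The "global hotkey" trick is really just interrupt handler chaining. Here is a simulation in which Python functions stand in for 8086 interrupt handlers and a dictionary stands in for the interrupt vector table. The F10 hotkey is a hypothetical choice. A real TSR would read the scancode from I/O port 0x60, use INT 21h functions 35h and 25h to read and replace the vector, and then exit with INT 21h function 31h so its code stays resident.

```python
from typing import Callable, Dict, List

# A toy interrupt vector table: vector number -> handler function.
IVT: Dict[int, Callable[[int], None]] = {}
KEYBOARD_VECTOR = 0x09   # INT 09h: IRQ 1, the keyboard interrupt on the PC
HOTKEY = 0x44            # hypothetical choice: the scancode for F10

bios_buffer: List[int] = []   # stands in for the BIOS keyboard buffer
events: List[str] = []


def bios_keyboard_handler(scancode: int) -> None:
    bios_buffer.append(scancode)   # normal path: the key reaches the running program


def install_tsr() -> None:
    previous = IVT[KEYBOARD_VECTOR]   # remember whoever handled the vector before us

    def tsr_handler(scancode: int) -> None:
        if scancode == HOTKEY:
            events.append("switcher activated")   # seize control of the machine
        else:
            previous(scancode)                    # chain: pass everything else along

    IVT[KEYBOARD_VECTOR] = tsr_handler


IVT[KEYBOARD_VECTOR] = bios_keyboard_handler
install_tsr()
```

Every keypress still goes through the TSR's handler, but only the hotkey is swallowed; the running program never sees it and never notices the TSR otherwise.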

In the case of Desq, the hotkey resulted in Desq seizing control of the machine and stashing the contents of memory. It then presented a utility that allowed the user to select another task, which would be copied into memory and then jumped to. The effect was somewhat like switching windows, but you could only have one program visible at a time.

You might be wondering where that memory was stashed. This gets into the peculiarities of x86 memory. By the time these task switcher utilities hit the scene, "extended memory" beyond the 1 MB real mode limit was fairly common on PCs. But real-mode applications were unable to access this extended memory without putting in extra effort [2]. In practice, most DOS applications only ever used the real-mode-addressable memory, so task switchers could somewhat safely swap the first megabyte "basic memory" into the extended memory without the next application messing with it. Of course there was no guarantee: some applications did implement extended memory support, and this generally made a program "incompatible" with task switching.
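The swap itself is conceptually simple. In this toy model, a small `bytearray` stands in for the real-mode memory region (64 bytes instead of the real thing) and a dictionary stands in for extended memory; the class and method names are hypothetical. The incompatibility mentioned above corresponds to a program writing into that dictionary behind the switcher's back.

```python
BASIC_SIZE = 64  # tiny stand-in for the real-mode "basic memory" region


class ToyTaskSwitcher:
    """Hypothetical model of a Desq-style task switcher swapping memory images."""

    def __init__(self) -> None:
        self.basic = bytearray(BASIC_SIZE)   # the memory real-mode programs use
        self.extended = {}                   # task name -> stashed memory image
        self.active = None                   # name of the task currently in memory

    def switch_to(self, name: str, initial: bytes = b"") -> None:
        # Stash the running task's image where real-mode programs won't look.
        if self.active is not None:
            self.extended[self.active] = bytes(self.basic)
        image = self.extended.pop(name, None)
        if image is None:
            # First launch: start from the program's initial image, zero-padded.
            image = initial[:BASIC_SIZE].ljust(BASIC_SIZE, b"\x00")
        self.basic[:] = image    # copy the next task into place...
        self.active = name       # ...and (in real life) jump to it
```

Only one image occupies basic memory at a time, which is why the user sees one program at a time: switching is a full copy out and a full copy in.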

For Quarterdeck, DESQview was basically an extension of Desq, so it was natural to continue to support switching between conventional DOS applications. DESQview did much the same thing, loading and unloading DOS applications, but also using driver tricks to cause applications to "draw" text to their own windows. Like Desq, DESQview could "multitask" only in the sense that it could react to interrupts, so the user was effectively "locked in" to the active window until they triggered DESQview to seize control with a keyboard shortcut.

DESQview is an important example of a GUI system that is very much transitional between text and raster, and between TUI and GUI. Other similar examples include TopView, DOS Shell, and Norton Commander, the latter two of which were ostensibly file managers but grew to include a number of GUI features. Interestingly, though, DESQview appeared on the scene after the first text mode competitors. While raster mode has obvious advantages today for GUI software, there were huge additional challenges involved in using raster mode at this point in time. For one, it made compatibility with existing software extremely difficult.

Perhaps more importantly, though, the entire business computing world was on text-based machines, and text was mostly viewed as being perfectly sufficient. There just wasn't a lot of pressure to provide raster operating systems, because people hadn't really seen raster mode put to good use yet.

There are a couple of places to go from here, and you know that I will go to both of them: first, we will eventually need to get to the topic of Windows. I will probably discuss early Windows and TopView somewhat in parallel, because the comparison is interesting and because the competition of Windows and TopView represents yet another twist in the tumultuous partnership between Microsoft and IBM. In more of a fork, though, I will also start into a topic closely related to GUI history: network delivery of GUIs.

I said that DESQview dovetailed into another interesting topic, and it's network GUIs. DESQview was followed by DESQview/X... an X server. While this partially enabled the porting of X applications to DOS, it more importantly contributed to the first wave of thin client GUI systems.

[1] This isn't quite true, it actually is possible to run DOS applications under Visi On but with significant limitations that mostly prevented actually using the feature.

[2] If this sounds a bit amusing, keep in mind that we had basically the exact same problem years later with the 3-ish GB 32-bit limit. Memory beyond the first 3-ish gigabytes on a 32-bit machine could be used only if the application put in extra effort to support it (in that case by implementing PAE rather than XMS, the DOS extended memory API).
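Both limits fall out of simple address arithmetic. In real mode the 8086 forms a 20-bit physical address as segment times 16 plus offset; 32 address bits give 4 GiB, part of which is reserved for memory-mapped devices (hence "3-ish"); and the original PAE widened physical addresses to 36 bits. The helper names are my own.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """8086 real mode: physical address = segment * 16 + offset."""
    return (segment << 4) + offset


def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses reachable with n address bits."""
    return 1 << address_bits
```

For example, `real_mode_address(0xB800, 0)` is 0xB8000, the color text-mode video buffer, while the highest segment:offset pair, FFFF:FFFF, works out to 0x10FFEF. That is just past 1 MB: an 8086 wraps it around to low memory, and a 286 or later with the A20 line enabled reaches the "high memory area" instead.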
