   _____                   _                  _____            _____       _ 
  |     |___ _____ ___ _ _| |_ ___ ___ ___   |  _  |___ ___   | __  |___ _| |
  |   --| . |     | . | | |  _| -_|  _|_ -|  |     |  _| -_|  | __ -| .'| . |
  |_____|___|_|_|_|  _|___|_| |___|_| |___|  |__|__|_| |___|  |_____|__,|___|
  a newsletter by |_| j. b. crawford

>>> 2021-06-02 a history of powerpoint

A brief interlude from the topic of GUIs to talk about perhaps one of the most infamous of all GUI programs, Microsoft PowerPoint.

PowerPoint is ubiquitous but often criticized in most industries, yet I have never seen more complete use and abuse of PowerPoint than in the military. I was repeatedly astounded by how military programs invested more effort in preparing elaborately illustrated slides than actually, well, putting content in them. And that, in a nutshell, is the common criticism of PowerPoint: that it allows people to avoid actual effective communication by investing their effort in slides.

Nonetheless, the basic idea of using visual aids in presentations is obviously a good one. The problem seems to be one of degrees. When I competed in expository speech back in high school my "slides" were printed on a plotter and mounted on foam core. More so than the actual rules of the event, this imposed an economy in my use of visual aids. Perhaps the problem with PowerPoint is simply that it makes slides too easy. When all you need to do is click "new slide" and fill in some bullet points, there's nothing to stop the type of presenter who has more slides than ideas.

Of course that doesn't stop the military from hiring graphic designers to prepare their flowcharts, but still, I think the basic concept stands...

As my foam core example suggests, the basic idea of presenting to slides is much older than PowerPoint. I've quipped before that Corporate Culture is what people call their PowerPoint presentations. Most of the large, old organizations I've worked for, private and government, had some sort of "in-group" term for a presentation. For example, at GE, one presents a "deck." Many of these terms are anachronistic, frozen references to whichever presentation technology the organization first adopted.

Visual aids for presentations could be said to have gone through a few generations: large format printed materials, transparent slides, and digital projection. Essentially all methods other than projection have died out today, but for a time these all coexisted.

Printed materials can obviously be prepared by hand, e.g. by a sign painter, and this was the first common method of presenting to slides. Automation started from this point, with the use of plotters. As I have perhaps mentioned before, the term "plotter" is a bit overloaded and today is often used to refer to large-format raster printers, but historically "plotter" referred to a device that moved a tool along vectors, and it's still used for this purpose as well.

Some of the first devices to create print materials from a computer were pen plotters, which worked by moving a pen around over the paper. HP and Roland were both major manufacturers of these devices (Roland is still in the traditional plotter business today, but for vinyl cutting). And it turns out that presentations were a popular application. The lettering produced by these devices was basic and often worse than what a sign painter could offer (but requiring less skill). What really sold pen plotters was the ability to produce precise graphs and charts directly from data packages like VisiCalc.

The particularly popular HP plotters, the 75 series, had a built-in demo program that sold this capability by ponderously outlining a pie chart along with a jagged but steeply rising line labeled "Sales." Business!

These sorts of visual aids remained relatively costly to produce, though, until projection became available... large-format plotters, board to make things rigid, etc. are not cheap. Once you buy a single projector for a conference room, though, projection becomes a fairly cheap technology, even with the methods of producing slides.

The basic concept of projection slide technology is to produce graphics using a computer and then print them onto a transparent material which serves as film for a projector. There are a lot of variations on how to achieve this. Likely the oldest method is to produce a document using a device like a plotter (or manual illustration, or a combination) and then photographically expose it on film using a device that could be described as an enlarger set to suck rather than blow. Or a camera on a weird mount, your choice.

In fact this remained a very common process for duplication for a very long time, as once a document was exposed on film photochemical methods can be used to produce printing plates or screens or all kinds of things. There is a terminological legacy of this method at least in the sciences, where many journals and conferences refer to the final to-be-printed draft of a paper as the "camera-ready" version. In the past, you would actually mail this copy to them and they (or more likely their printing house) would photograph it using a document camera and use the film to create the plates for the printed journal or proceedings.

If you've seen older technical books or journals, you may have seen charts and math notation that were hand-written onto the paper after it was typewritten (with blank spaces left for the figures and formulas). That's the magic of "reprographics," a term which historically referred mostly to this paper to film to paper process but nowadays gets used for all kinds of commercial printing. This is closely related to the term "pasting up" for final document layout, since a final step before reprographic printing was usually to combine text blocks, figures, etc produced by various means into a single layout. Using paste.

For presentations, there are a few options. The film directly off the document camera may be developed and then mounted in a paper or plastic slide to be placed in a projector. If you are familiar with film photography, that might seem a little off to you because developed film is in negative... in fact, for around a hundred years "reversal films" have been available that develop to positive color, and they were typically used to photograph for slides in order to avoid the need for an extra development process. Kodachrome is a prominent example. Reversal films are also sometimes used for typical photography and cinematography but tended to be more complex to develop and thus more expensive, so most of us kept our terrible 35mm photography on negatives.

This approach had the downside that the slide would be very small (e.g. from a 35mm camera), which required specialized projection equipment (a slide projector). The overhead projector was much more flexible because the "film frame," called the platen, was large enough for a person to hand-write on. It served as a whiteboard as well as a projector. So more conference rooms featured overhead projectors than slide projectors, and there was a desire to be able to project prepared presentations on these devices.

This concept, of putting prepared (usually computer-generated) material on a transparent sheet to be placed on an overhead projector, is usually referred to as a "viewgraph." Viewgraphs were especially popular in engineering and defense fields, and there are people in the military who refer to their PowerPoint presentations as viewgraphs to this day. There are multiple ways to produce viewgraphs but the simplest and later on most common was the use of plastic sheets that accepted fused toner much like paper, so viewgraphs could either be printed on a laser printer or made by photocopying a paper version. When I worked for my undergraduate computer center around a decade ago we still had one laser printer that was kept stocked with transparency sheets, but people only ever printed to it by accident.

In fact, these "direct-print" transparencies were a major technical advancement. Before the special materials were developed to make them possible, overhead transparencies were also produced by photochemical means and use of a document camera and enlarger. But most large institutions had an in-house shop that could produce these with a quick turnaround, and they were still popular even before easy laser printing.

Not all projection slides were produced by photographing or copying a paper document, and in fact this method was somewhat limited and tended not to work well for color. By the '70s photosetting had become practical for the production of printing plates directly from computers, and it was also used to produce slides and transparencies. At the simplest, a photosetter is a computer display with optics that focus the emitted light onto film. In practice, many photosetters were much more complicated as they used shifting of the optics to expose small sections of film at a time, allowing for photosetting at much higher resolution than the actual display (often a CRT).

Donald Knuth originally developed TeX as a method of controlling a photosetter to produce print plates for books, and some of TeX's rougher edges date back to its origin of being closely coupled to this screen-to-film process. The photosetting process was also used to produce slides direct from digital content, and into the early '00s it was possible to send a PowerPoint presentation off to a company that would photoset it onto Kodak slides. Somewhere I have a bin of janitorial product sales presentations on slides that seem to be this recent.

The overhead projector as a device was popular and flexible, and so it was also leveraged for some of the first digital projection technology. In fact, the history of electronic projection is long and interesting, but I am constraining myself to devices often seen in corporate conference rooms, so we will leave out amazing creations like the Eidophor. The first direct computer projection method to become readily available to America's middle management was a device sometimes called a spatial light modulator (SLM).

By the 1980s these were starting to pop up. They were basically transparent LCD displays of about the right size to be placed directly onto the platen of an overhead projector. With a composite video or VGA interface they could be used as direct computer displays, although the color rendering and refresh rate tended to be abysmal. I remember seeing one used in elementary school, along with the 8mm projectors that many school districts held on to for decades.

All of these odd methods of presentation basically disappeared when the "digital projector" or "data projector" became available. Much like our modern projectors, these devices were direct computer displays that offered relatively good image quality and didn't require any of the advanced preparation that previous methods had. Digital projectors had their own evolution, though.

The first widely popular digital projectors were CRT projectors, which used a set of three unusually bright CRTs and optics. CRT projectors offered surprisingly good image quality (late-model CRT projectors are pretty comparable to modern 3LCD projectors), but were large, expensive, and not very bright. The tubes were often liquid cooled and required regular replacement at a substantial cost. As a result, they weren't common outside of large meeting rooms and theaters.

The large size, low brightness, and often high noise level of CRT projectors made them a bit more like film projectors than modern digital projectors in terms of installation and handling. They were not just screwed into the ceiling; rooms would be designed specifically for them. They could weigh several hundred pounds and required good maintenance access. All of this added up to mean that they were usually in a projection booth or in a rear-projection arrangement. Rear-projection was especially popular in institutional contexts because it allowed a person to point at the screen without shadowing.

Take a close look at any major corporate auditorium or college lecture hall built in the '70s or '80s and there will almost certainly be an awkward storage room directly behind the platform. Originally, this was actually the projection booth, and a translucent rear-projection screen was mounted in the wall in between. Well-equipped auditoriums would often have both a rear projection and front projection capability, as rear projection required mirroring the image. Anything that came in on film would often be front-projected, often onto a larger screen, because it was simpler and easier. Few things came in on film that someone would be pointing at, anyway.

You may be detecting that I enjoy the archaeological study of 1980s office buildings. We all need hobbies. Sometimes I think I should have been an electrician just so I could explain to clients why their motor-variac architectural lighting controller is mounted in the place it is, but then they'd certainly have found an excuse to make me stop talking to them by that point.

The next major digital projection technology on the scene was DLP, in which the mirrors of a tiny MEMS array flip in and out of position to turn pixels on and off. The thing is, DLP technology is basically the end of history here... DLP projectors are still commonly used today. LCD projectors, especially those with one LCD per color, tend to produce better quality. Laser projectors, which use a laser diode as a light source, offer even better brightness and lifespan than the short arc lamps used by DLP and LCD projectors. But all of these are basically just incremental improvements on the DLP projection technology, which made digital projectors small enough and affordable enough to become a major presence in conference rooms and classrooms.

The trick, of course, is that as television technology has improved these projectors are losing their audience. Because I am a huge dweeb I use a projector in my living room, but it is clear to me at this point that the next upgrade will be to a television. Televisions offer better color rendering and brightness than comparably priced projection setups, and are reaching into the same size bracket. An 85" OLED television, while fantastically expensive, is in the same price range as a similarly spec'd projector and 100" screen (assuming an ALR, or ambient light rejecting, screen here for more comparable brightness/color). And, of course, the installation is easier. But let me tell you, once you've installed an outlet and video plate in the dead center of your living room ceiling you feel a strong compulsion to use it for something. Ceiling TV?

So that's basically the story of how we get to today. Producing a "deck" for a meeting presentation used to be a fairly substantial effort that involved the use of specialized software and sending out to at least an internal print shop, if not an outside vendor, for the preparation of the actual slides. At that point in time, slides had to be "worth it," although I'm sure that didn't stop all kinds of useless slides made to impress people with stars on their shoulders.

Today, though, preparing visual aids for a presentation is so simple that it has become the default. Hiding off to the side of slides is seen as less effort than standing where people will actually look at you. And god knows that in the era of COVID the "share screen" button is basically a trick to make it so people don't just see your webcam video when you're talking. That would be terrible.

There are many little details and variations in this story that I would love to talk about but I fear it will turn into a complete ramble. For example, overhead-based projection could be remarkably sophisticated at times. You may remember the scene at the beginning of "The Hunt for Red October" (the film) in which Alec Baldwin gives an intelligence briefing while unseen military aides change out the transparencies on multiple overhead projectors behind rear-projection screens. This was a real thing that was done in important enough contexts.

Slide projectors were sometimes used in surprisingly sophisticated setups. I worked with a college lecture hall that was originally equipped with one rear projection screen for a CRT projector and two front projection screens, both with a corresponding slide projector. All three projectors could be controlled from the lectern. I suspect this setup was rarely used to its full potential and it had of course been removed, the pedestals for the front slide projectors remaining as historic artifacts much like the "No Smoking" painted on the front wall.

Various methods existed for synchronizing film and slide projectors with recorded audio. A particularly well-known example is the "film strip" sometimes used in schools as a cheaper substitute for an actual motion picture. Late film strips were cassette tapes and strips of slides; the projector advanced the slide strip when it detected a tone in the audio from the cassette tape.

Okay, see, I'm just rambling.


>>> 2021-05-24 dialogs not taken

Note: I put up a YouTube video about a minor aerial lift disaster in northern New Mexico. You can see it here: https://www.youtube.com/watch?v=1NDc760fxbY.

Note 2: I have begrudgingly started using Twitter to ramble about the things I spend my day on. It's hard to say how long this will last. https://twitter.com/jcrawfordor.

When we look back on the history of the graphical user interface, perhaps one of the most important innovations in the history of computing, we tend to think of a timeline like this: Xerox, Apple, Microsoft, whatever we're doing today.

Of course that has the correct general contours. The GUI as a concept, and the specific interaction paradigms we are familiar with today, formed in their first productized version at the Xerox Palo Alto Research Center (XPARC). Their production version, the Alto, was never offered as a commercial product but was nonetheless widely known and very influential. Apple's early machines, particularly the Lisa and Macintosh, featured a design heavily inspired by the work at XPARC. Later, Microsoft released Windows 3.11 for Workgroups, the first one that was cool, which was heavily inspired by Apple's work.

In reality, though, the history of the GUI is a tangled one full of controversy ("inspired" in the previous paragraph is a euphemism for "they all sued each other") and false starts. Most serious efforts at GUIs amounted to nothing, and the few that survive to the modern age are not necessarily the ones that were most competitive when originally introduced. Much like I have perennially said about networking, the world of GUIs has so ossified into three major branches (MacOS-like, Windows-like, and whatever the hell Gnome 3 is trying to be [1]) that it's hard to imagine other options.

Well, that's what we're about to do: imagine a different world, a world where it's around the '80s and there are multiple competing GUIs. Most importantly, there are more competing GUIs than there are operating systems, because multiple independent software vendors (ISVs) took on the development of GUIs on top of operating systems like CP/M and DOS. The complexities of a GUI, such as de facto requiring multi-tasking, required that these GUIs substantially blur the line between "operating system" and "application" in a way that only an early PC programmer could love.

And that is why I love these.

Before we embark on a scenic tour of the graveyard of abandoned GUIs, we need to talk a bit about the GUI as a concept. This is important to comprehend the precedents for the GUI of today, and thus the reason that Apple did not prevail in their lawsuit against Microsoft (and perhaps Xerox did not prevail in their lawsuit against Apple, although this is an iffier claim as the Xerox vs. Apple case did not get the same examination as Apple vs. Microsoft).

What is a GUI?

I believe that a fundamental challenge to nearly all discussions about GUIs is that the term "GUI" is actually somewhat ill defined. In an attempt to resolve this, I will present some terminology that might be a better fit for this discussion than "GUI" and "TUI" or "graphical" and "command-line." In doing so I will try to keep my terminology in line with that used in the academic study of human-computer interaction, but despite the best efforts of one of my former advisors I am not an HCI scholar so I will probably reinvent terminology at least once.

The first thing we should observe is that the distinction between "graphics" and "text" is not actually especially important. I mean, it is very important, but it actually does not fundamentally define the interface. In my experience people rarely think about it this way, but it ought to be obvious: libraries such as newt can be used to create "gui-esque" programs in text mode (think of the old Debian installer as an example), while there are graphical programs that behave very much like textmode ones (think of text editors). Emacs is a good example of software which blurs this line; emacs simultaneously has traits of "TUI" and "GUI" and is often preferred in graphical mode as a result.

To navigate this confusion, I use the terms "graphics mode" and "text mode" to refer strictly to the technical output mechanism---whether raster data or text is sent to the video adapter. Think about it like the legacy VGA modes: the selection of graphics or text mode is important to user experience and imposes constraints on interface design, but does not fundamentally determine the type of interface that will be presented.

What does? Well, that's a difficult question to answer, in part because of the panoply of approaches to GUIs. Industry and researchers in HCI tend to use certain useful classifications, though. The first, and perhaps most important, is that of a functional UI versus an object-oriented UI. Do not get too tangled in thinking of these as related to functional programming or OO programming, as the interface paradigm is not necessarily coupled to the implementation.

A functional user interface is one that primarily emphasizes, well, functions. A command interpreter, such as a shell, is a very functional interface in that the primary element of interaction is, well, functions, with data existing in the context of those functions. On the other hand, a modern word processor is an object oriented interface. The primary element of interaction is not functions but the data (i.e. objects); the available functions are presented in the context of data.
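
To make the contrast a bit more concrete, here is a toy sketch in Python. Everything in it (the `COMMANDS` table, the `Document` class, the `available_actions` method) is a hypothetical illustration of my own and is not taken from any real shell or word processor. In the functional style the user names a function first and then supplies data; in the object-oriented style the user picks the data first and the interface offers the functions that apply to it.

```python
# Toy illustration only: all names here are hypothetical.

# Functional style: the user starts from a command name and supplies data.
# Nothing tells the user which commands exist; they must already know.
COMMANDS = {
    "upper": lambda text: text.upper(),
    "count": lambda text: len(text.split()),
}


def run_command(name, text):
    """Look up a function by name and apply it to the given data."""
    return COMMANDS[name](text)


# Object-oriented style: the user starts from an object (the data), and the
# interface presents the functions available in the context of that object.
class Document:
    def __init__(self, text):
        self.text = text

    def available_actions(self):
        """What a context menu for this object might list."""
        return ["upper", "count"]

    def perform(self, action):
        """Apply one of the offered actions to this object's data."""
        return COMMANDS[action](self.text)


doc = Document("slides are not ideas")
functional_result = run_command("count", "slides are not ideas")
object_result = doc.perform("count")
offered = doc.available_actions()
```

Both paths end up calling the same underlying function. The only difference is what the user has to know up front, which is roughly the point about the paradigm not being coupled to the implementation.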

In a way, this dichotomy actually captures the "GUI vs TUI" debate better than the actual difference between graphics and text mode. Text mode applications are usually, but not always, functional, while graphical applications are usually, but not always, object oriented. If you've worked with enough special-purpose software, say in the sciences, you've likely encountered a graphical program which was actually functional rather than object oriented, and found it to be a frustrating mess.

This has a lot to do with the discovery and hiding of functionality and data. Functional interfaces tend to be either highly constrained (e.g. they are only capable of a few things) or require that the bulk of functionality be hidden, as in the case of a typical shell where users are expected to know the available functions rather than being offered them by the interface. Graphical software which attempts to offer a broad swath of functionality, in a functional paradigm, will have a tendency to overwhelm users.

Consider the case of Microsoft Word. I had previously asserted that word processors are usually an example of the object oriented interface. In practice, virtually all software actually presents a blend of the two paradigms. In the case of Word, the interface is mostly object-oriented, but there is a need to present a large set of commands. Traditionally this has been done by the means of drop-down menus, which date back nearly to the genesis of raster computer displays. This is part of the model or toolkit often called WIMP, meaning Windows, Icons, Menus, Pointer. A very large portion of graphics mode software is WIMP, and the WIMP concept today is exemplified by many GUI development toolkits which are highly WIMP-centric (Windows Forms, Tk, etc).

If you used Office 2003 or earlier, you will no doubt remember the immense volume of functionality present in the menu bar. This is an example of the feature or option overload that functional interfaces tend to present unless functionality is carefully hidden. It makes an especially good example because of Microsoft's choice in 2007 to introduce the "ribbon" interface. This was a remarkably controversial decision (for the same reason that any change to any software ever is controversial), but at its core it appears to have been an effort by Microsoft to improve discoverability of the Office interface through contextual hiding of the menus. Essentially, the ribbon extends the object oriented aspect of the interface to the upper window chrome, which had traditionally been a bastion of functional (menu-driven) design.

Menu-driven is another useful term here, although I tend to prefer "guided" as a term instead (this is a term of my own invention). Guided interfaces are those that accommodate novice or infrequent users by clearly expressing the available options. Very frequently this is by means of graphical menus, but there are numerous other options, one of which we'll talk about shortly. The most extreme form of a guided interface is the wizard, which despite being broadly lambasted for Microsoft's particularly aggressive use in earlier Windows versions has survived in a great many contexts. A much more relaxed form would be the "(Y/n)" type hints often shown in textmode applications. "Abort, Retry, Fail?," if you think about it, is a menu [2]. This guidance is obviously closely related to discoverability, basically in the sense that non-guided interfaces make little to no attempt at discoverability (e.g. the traditional shell).
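
As a minimal sketch of that most relaxed kind of guidance, here is how a "(Y/n)" hint might be handled in Python. The function names `prompt_text` and `parse_yes_no` are made up for this example, and treating the capitalized letter as the default on an empty reply is the usual convention rather than any standard.

```python
def prompt_text(question, default=True):
    """Build the prompt, capitalizing the default choice as a hint."""
    hint = "(Y/n)" if default else "(y/N)"
    return f"{question} {hint} "


def parse_yes_no(answer, default=True):
    """Interpret a reply to a "(Y/n)"-style prompt.

    An empty reply selects the default, which is the option the prompt
    shows in upper case. Returns True or False, or None if the reply is
    not recognized (the caller would normally ask again).
    """
    reply = answer.strip().lower()
    if reply == "":
        return default
    if reply in ("y", "yes"):
        return True
    if reply in ("n", "no"):
        return False
    return None
```

The entire "menu" is five characters long, but it still tells the user what the options are and which one they get by just pressing enter.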

Another useful term is the direct manipulation interface. Direct manipulation is a more generalized form of WYSIWYG (What You See Is What You Get). Direct manipulation interfaces are those that allow the user to make changes and immediately see the results. Commonly this is done by means of an interface metaphor in which the user directly manipulates the object/data in a fashion that is intuitive due to its relation to physical space [3]. For example, resizing objects using corner drag handles. Direct manipulation interfaces are not necessarily WYSIWYG. For example, a great deal of early graphics and word processing software enabled direct manipulation but did not attempt to show "final" output until so commanded (WordPerfect, for example).

This has been sort of a grab bag of terminology and has not necessarily answered the original question (what is a GUI?). This is partially a result of my innate tendency to ramble but partially a result of the real complexity of the question. Interfaces generally exist somewhere on a spectrum from functional to object-oriented, from guided to un-guided, and in a way from text mode to graphics mode (consider e.g. the use of Curses box drawing to replicate dialogs in text mode).

My underlying contention, to review, is this: when people talk about "GUI vs TUI," they are usually referring not to the video mode (raster or text) but actually to the interface paradigm, which tends to be object oriented or functional, respectively, and guided or unguided, respectively. Popular perceptions of the GUI vs. TUI dichotomy, even among technical professionals, are often more a result of the computing culture (e.g. the dominance of the Apple-esque WIMP model) than technical capabilities or limitations of the two. What I am saying is that the difference between GUI and TUI is a cultural construct.

Interface Standardization: CUA

In explaining this concept that the "GUI vs TUI" dichotomy is deeper than the actual video mode, I often reach out to a historic example that will be particularly useful here because of its importance in the history of the GUI---and especially the history of the GUIs we use today. That is IBM Common User Access, CUA.

CUA is not an especially early event in GUI history but it's a formative one, and it's useful for our purposes because it was published at a time---1987---when there were still plenty of text-only terminals in use. As a result, CUA bridges the text and raster universes.

The context is this: by the late '80s, IBM considers itself a major player in the world of personal computers, in addition to mainframes and mid/minis. Across these domains existed a variety of operating systems with a variety of software. This is true even of the PC, as at this point in time IBM is simultaneously supporting OS/2 and Windows (2). While graphical interfaces clearly existed for these systems, this was still an early era for raster displays, and for the most part IBM still felt text mode to be more important (it was the only option available on their cash cow mainframes and minis).

Across operating systems and applications there was a tremendous degree of inconsistency in basic commands and interactions. This was an issue in both graphical and textual software but was especially clear in text mode where constraints of the display meant that user guidance was usually relatively minimal. We can still clearly see this on Unix-like operating systems where many popular programs are ported from historical operating systems with varying input conventions (to the extent they had reliable conventions), and few efforts have been made to standardize. Consider the classic problem of exiting vim or emacs: each requires a completely different approach with no guidance. This used to be the case with essentially all software.

CUA aimed to solve this problem by establishing uniform keyboard commands and interface conventions across software on all IBM platforms. CUA was developed to function completely in text mode, which will be somewhat surprising considering the range of things it standardized.

The most often discussed component of CUA is its keyboard shortcuts. Through a somewhat indirect route (considering the failure of the close IBM/Microsoft collaboration), CUA has been highly influential on Windows software. Many of the well-known Windows keyboard commands come from CUA originally. For example, F1 for help, F3 for search, F5 to refresh. This is not limited just to the F keys, and the bulk of common Windows shortcuts originated with CUA. There are, of course, exceptions, with copy and paste being major ones: CUA defined Shift+Delete and Shift+Insert for cut and paste, for example. Microsoft made a decision early on to adopt the Apple shortcuts instead, and those are the Ctrl+C/Ctrl+V/Ctrl+X we are familiar with today. They have been begrudgingly adopted by almost every computing environment with the exception of terminals and Xorg (but are then re-implemented by most GUI toolkits).

The keyboard, though, is old hat for text mode applications. CUA went a great deal further by also standardizing a set of interactions which are very much GUI by modern standards. For example, the dialog box with conventional options of "OK" and "OK/Cancel" comes from CUA, along with the ubiquitous menu sequence of File first and Help last.

While being graphical by modern standards, these concepts of drop-down menus and dialog boxes were widely implemented in text mode by IBM. From a Linux perspective, this is rarely seen and would likely be a bit surprising. Why is that?

I contend that there is a significant and early differentiation between IBM and UNIX interfaces that remains highly influential today. While today the dichotomy is widely viewed as philosophical, at the time it was far more practical.

UNIX was developed inside of AT&T as a research project and then spread primarily through universities and research organizations. Because it was viewed primarily as a research operating system, UNIX was often run on whatever hardware was available. The PDP-11, for example, was very common. Early on, most of these systems were equipped with teletypewriters and not video terminals. Even as video terminals became common, there was a wide variety in use with remarkably little standardization, which made exploiting the power and flexibility of the video terminal very difficult. The result is that, for a large and important portion of its history, UNIX software was built under the assumption that the terminal was completely line-oriented... that is, no escape codes, no curses.

IBM, on the other hand, had complete control of the hardware in use. IBM operating systems and software were virtually always run on both machines and terminals that were leased from IBM as part of a package deal. There was relatively little fragmentation of hardware capabilities and software developers could safely take full advantage of whatever terminal was standard with the computer the software was built for (and it was common for software to require a particular terminal).

For this reason, IBM terminal support has always been more sophisticated than UNIX terminal support. At the root was a major difference in philosophy. IBM made extensive use of block terminals, rather than character terminals. For a block terminal, the computer would send a full "screen" to the terminal. The terminal operated independently, allowing the user to edit the screen, until the user triggered a submit action (typically by pressing enter) which caused the terminal to send the entire screen back to the computer and await a new screen to display.

This mechanism made it very easy to implement "form" interfaces that required minimal computer support, which is one of the reasons that IBM mainframes were particularly prized for the ability to support a very large number of terminals. In later block terminals such as the important 3270, the computer could inform the terminal of the location of editable fields and even specify basic form validation criteria, all as part of the screen sent to the terminal for display.
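To make the round trip concrete, here is a minimal Python sketch of the block terminal idea. It is a toy model, not a 3270 emulator: the class names, the numeric_only rule, and the transfer counter are all assumptions made up for illustration. The point is only that editing and validation happen locally, and the host sees one complete screen per submit.

```python
from dataclasses import dataclass, field


@dataclass
class Field:
    """One editable region of the screen, as described by the host."""
    name: str
    value: str = ""
    numeric_only: bool = False  # hypothetical validation rule


@dataclass
class BlockTerminal:
    """Toy block terminal: holds one screen and edits it locally."""
    fields: dict = field(default_factory=dict)
    transfers: int = 0  # counts screens moved between host and terminal

    def receive_screen(self, screen_fields):
        # The host sends a complete screen, including field definitions.
        self.fields = {f.name: f for f in screen_fields}
        self.transfers += 1

    def type_into(self, name, text):
        # Local editing and validation: the host is not involved at all.
        target = self.fields[name]
        if target.numeric_only and not text.isdigit():
            return False
        target.value = text
        return True

    def submit(self):
        # Pressing enter sends the whole screen back in one block.
        self.transfers += 1
        return {name: f.value for name, f in self.fields.items()}


terminal = BlockTerminal()
terminal.receive_screen([Field("name"), Field("qty", numeric_only=True)])
rejected = terminal.type_into("qty", "abc")   # refused locally
accepted = terminal.type_into("qty", "12")    # accepted locally
submitted = terminal.submit()
```

However many keystrokes the user makes, the host only participates twice here: once to send the screen and once to receive it.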

Ultimately, the block terminal concept is far more like the modern web browser than what we usually think of as a terminal. Although the business logic is all in the mainframe, much of the interface/interaction logic actually runs locally in the terminal. Because the entire screen was sent to the terminal each time, it was uniformly possible to update any point on the screen, which was not something which could be assumed for a large portion of UNIX's rise to dominance [4].

As a result, the IBM terminal model was much more amenable to user guidance than the UNIX model. Even when displaying a simple command shell, IBM terminals could provide user guidance at the top or bottom of the screen (and it was standard to do so, often with a key to toggle the amount of guidance displayed to gain more screen space as desired). UNIX shells do not do so, primarily for the simple reason that the shells were developed when most machines were not capable of placing text at the top or bottom of the screen while still being able to accept user input at the prompt.

Of course curses capabilities are now ubiquitous through the magic of every software terminal pretending to be a particularly popular video terminal from 1983. Newer software like tmux usually relied on this from the start, and older mainstays like vi have had support added. But the underlying concept of the line-oriented shell ossified before this happened, and "modern" shells like zsh and fish have made only relatively minor inroads in the form of much more interactive assisted tab completion.

IBM software, on the other hand, has been offering on-screen menus and guidance since before the C programming language. Well prior to CUA it was typical for IBM software to use interactive menus where the user selects an option, hierarchical/nested menus, and common commands via F keys which were listed at the bottom of the screen for the user's convenience.

While many IBM operating systems and software packages do offer a command line, it's often oriented more towards power users and typical functions were all accessible by a guided menu system. Most IBM software, especially by the '80s, provided an extensive online help facility where pressing F1 retrieved context-aware guidance on filling out a particular form or field. Indeed, the CUA concept of an interactive help system where the user presses a Help icon and then clicks on a GUI element to get a popup explanation---formerly common in Windows software---was a direct descendant of the IBM mainframe online help.

The point I intend to illustrate here is not that IBM mainframes were surprisingly sophisticated and cool, although that is true (IBM had many problems but for the most part the engineering was not one of them).

My point is that the modern dichotomy, debate, even religious war between the GUI and TUI actually predates GUIs. It is not a debate over graphical vs text display, it is a debate over more fundamental UI paradigms. It is a fight of guided but less flexible interfaces versus unguided but more powerful ones. It is a fight of functional interfaces versus object oriented ones. And perhaps most importantly, it is a competition of "code-like" interfaces versus direct manipulation.

What's more, and here is where I swerve more into the hot take lane, the victory of the text-mode, line-oriented, shell interface in computer science and engineering is not a result of some inherent elegance or power. It is an artifact of the history of computing.

Most decisions in computing, at least most meaningful ones, are not by design. They are by coincidence, simultaneously haphazard but also inevitable in consideration of the decades of work up to that point. Abstraction, it turns out, is freeing, but also confining. Freeing in that it spares the programmer thinking about the underlying work, but confining in that it pervasively, if sometimes subtly, steers all of us in a direction set in the '60s when our chosen platform's lineage began.

This is as true of the GUI as anything else, and so it should be no surprise that IBM's achievements were highly influential, but simultaneously UNIX's limitations were highly influential. For how much time is spent discussing the philosophical advantages of interfaces, I don't think it's a stretch to say that the schism in modern computing, between the terminal and everything else, is a resonating echo of IBM's decision to lease equipment and AT&T's decision to make UNIX widely available to academic users.

Some old GUIs

Now that we've established that the history of the GUI is philosophically complex and rooted in things set in motion before we were born, I'd like to take a look at some of the forgotten branches of the GUI family tree: GUI implementations that were influential, technically impressive, or just weird. I've already gone on more than enough for one evening, though, so keep an eye out for part 2 of... several.

[1] It is going to take an enormous amount of self-discipline to avoid turning all of this into one long screed about Gnome 3, perhaps the only software I have ever truly hated. Oh, that and all web browsers.

[2] Menus in text mode applications are interesting due to the surprising lack of widespread agreement on how to implement them. There are many, many variations across commonly used software from limited shells with tab completion to what I call the "CS Freshman Special," presenting a list of numbered options and prompting the user to enter the number of their choice. The inconsistency of these text mode menus gets at exactly the problem IBM was trying to solve with CUA, but then I'm spoiling the end for you.

[3] This is a somewhat more academic topic than I usually verge into, but it could be argued and often is that graphical software is intrinsically metaphorical. That is, it is always structured around an "interface metaphor" as an aid to users in understanding the possible interactions. The most basic interface metaphor might be that of the button, which traditionally had a simple 3D raised appearance as an affordance to suggest that it can be pressed down. This is all part of the puzzle of what differentiates "GUI" from "TUI": graphical applications are usually, but not always, based in metaphor. Textmode applications usually aren't, if nothing else due to the constraints of text, but it does happen.

[4] This did lead to some unusual interaction designs that would probably not be repeated today. For example, in many IBM text editors an entire line would be deleted (analogous to vim's "dd") by typing one or more "d"s over the line number in the left margin and then submitting. The screen was returned with that line removed. This was more or less a workaround for the fact that the terminal understood each line of the text document to be a form field, and so there was some jankiness around adding/removing lines. Scrolling similarly required round trips to the computer.


>>> 2021-05-19 telephone turrets

Let's talk about a bit of telephone history. Again. Normally, I am more interested in the switching equipment and carriers and not so much in the instruments---that is, the things that you plug in at the end of the line. There are a few that really catch my eye, though, and one of them is of course the phenomenon of the trading turret.

A trading turret is a specialized telephone-like device typically used by day traders. The somewhat useless Wikipedia article describes a trading turret as being a specialized key system, which is useless to most people today as key systems are no longer common and few people know what they are. Nonetheless, it is basically true. I will leave out much discussion of key systems here because I will probably talk about them in depth in the future, but a basic explanation is that a key system allows users at multiple telephone instruments to each access all outside lines. This was a popular setup for businesses that were large enough to need multiple outside lines but too small to have a dedicated telephone operator, from their introduction in the 1930s to the development of affordable small PABXs[1] in the '90s.

Key systems still occasionally appear today and the topic can become somewhat muddled because late key systems tended to have "PABX features" and many PABXs, especially in the IP world, have "key system features." But the basic difference can be explained something like this: a PABX connects multiple users to each line, while a key system connects multiple lines to each user. They were often used for similar purposes with the difference being largely one of implementation, but key systems do have their specific niches.

One of those is the item we for some reason call a turret. The term turret is used today almost exclusively to refer to the item made for the securities industry, a trading turret. These formidable tanks of phones often provide multiple handsets and speakers and are more or less identified by a touchscreen or large set of soft buttons that allow one-touch access to a large number of contacts.

These are superficially similar to a large set of line buttons such as is seen on the "receptionist sidecar" available for many business phones---an extra plug-in module that offers a big set of line buttons which can be configured as speed-dials or even one-touch unattended transfers, so that a receptionist can easily transfer calls or call up for people without having to dial extensions all the time. However, turrets are more than just phones with a lot of line buttons.

It kind of raises the question: what is a trading turret? What really differentiates one from, say, a digital PABX phone with a sidecar?

This is just the kind of thing I contemplate in my private moments, but the issue came to the front of my mind when someone provided a mailing list I am a member of with an interesting document [2]. It is the 1974 Bell System Practice (BSP, basically a Bell System standard operating procedure) for the SAC Main Operating Base Turret. BSP 981-202-100 if you are particularly interested.

The document describes a desk-wide system with ten color-coded handsets used at a Strategic Air Command base to give a communications operator quick access to primary and redundant versions of multiple communications lines. For flavor, two of these handsets were red and corresponded to primary and secondary four-wire leased line circuits used for the SAC Primary Alerting System, used to deliver emergency action messages. Here we have a real red telephone, but not to Moscow.

This makes it clear that the term "turret" is not specific to the finance industry, which was actually a bit of a surprise to me. Where, then, did we get the turret as a type of telephone instrument?

The first usage I have found is the Order Turret No. 1, introduced by the Bell system sometime in the early 1930s (exact date unclear). The No. 1 is essentially a small manual (cord-and-plug) exchange that accommodates multiple user "positions." A series of subsequent Order Turrets, up to at least the No. 4, were produced in the first half of the century.

I was initially a bit unclear on the application of these devices (I found BSPs on them, but these have a great way of describing maintenance and repair in detail without ever saying what the thing is for) until I found an article in the Bell Laboratories Record, an employee magazine, of 1938. The article describes the use of the No. 4, now a more compact design which can be scaled to an arbitrary number of operators, as it was used at Macy's. It is called an Order Turret, it turns out, because it is used to place orders.

The system looks something like this: 20 (or another number, but we'll say 20, which was the capacity of the apparently common Order Turret No. 2) outside lines are assigned sequential numbers at the telephone exchange with busy fall through such that a call to the first line, if it is in use, will connect to the next line, and so on until a free line is found. At the turret, the call "appears" on a jack in front of each attendant. Whichever attendant is not currently busy can insert a plug to answer the call. In this way, the turret system allows a pool of attendants to collectively answer a pool of incoming lines.
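The busy fall through arrangement is simple enough to sketch in a few lines of Python. This is only an illustration of the logic described above, not of how the exchange actually implemented it; the function names and the attendant names are made up.

```python
def hunt(lines_busy, first=0):
    """Return the number of the first free line at or after `first`, else None."""
    for number in range(first, len(lines_busy)):
        if not lines_busy[number]:
            return number
    return None  # every line is in use, so the caller gets a busy signal


def free_attendants(attendants_busy):
    """The call appears in front of everyone; anyone not busy may plug in."""
    return [name for name, busy in attendants_busy.items() if not busy]


# Lines 0 and 1 are in use, so a call to the first number lands on line 2.
landed_on = hunt([True, True, False, False])
can_answer = free_attendants({"alice": True, "bob": False, "carol": False})
```

Note that the exchange picks the line, but nothing picks the attendant. That second choice is left to whoever plugs in first, which is exactly the gap the No. 4 later closed.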

But there's more: these attendants are taking telephone orders in a department store, where the actual stock is out on the floor in various departments. So if a customer asks about a particular item, the attendant can insert a plug into a jack for an internal line to that department, ringing a phone on the floor so that the attendant can speak with a salesperson to confirm availability and have the item set aside. The turret is used not only to answer calls, but to simultaneously manage multiple calls between different parties.

So far as I can tell, this is the defining feature of a turret: a turret isn't just used to handle multiple lines (that's a key telephone). A turret isn't just used to have rapid access to many speed dials (that's a receptionist sidecar). A turret is used to make multiple simultaneous calls, by someone who must quickly relay information between multiple parties. Like the telephone order attendant at an old-fashioned department store, the person on communications duty at a SAC command, or an investment banker.

This explains of course why both legacy and modern turrets often feature multiple handsets (the original Order Turrets did not, but the attendant wore a headset that they would move between jacks instead). As telephone systems have become more sophisticated, turrets have as well, and modern turrets often use IP connectivity to provide a mix of features like squawk boxes (permanently open conference lines), presence information, and a feature with various names (sometimes called automatic ringdown although this is not quite accurate) that allows one trader at a turret to call another trader at a turret with no ringing---the call just connects immediately, much like an intercom. All of this can be done very quickly, because the turret provides a large set of pre-programmed buttons for all the people the user is likely to want to contact.

You can already see that the application I've described for these early turrets, of order taking, could be handled differently. An obvious enhancement is to actively distribute calls to available attendants instead of presenting calls at all attendant stations and waiting for someone to pick up. Indeed, the Order Turret No. 4 did exactly this, actively "pushing" each incoming call to an available attendant. This increase in sophistication, to actively routing calls, really blurred the line between the order turret and the PABX, which Bell was well aware of. The No. 4 was less an order turret in the sense of previous designs, and more just a feature of a PABX.

The order lines, instead of being dedicated lines going straight to turrets, were just the normal incoming lines of the business PABX. The business PABX allocated calls to attendants sitting at the No. 4 stations. This is basically how modern inward call center systems work, and it seems that over time the concept of the "order turret" faded away as these call center queue systems became just another feature of a PABX.

Turrets found a few niches in which to hold on. The SAC command turret tells a bit of a story about the close relationship between the Cold War defense complex and the Bell System. Large portions of SAC infrastructure were essentially contracted to AT&T, and so AT&T apparently drew on their background with order turrets in developing the concept for the SAC communications system, which in its totality consisted of a dizzying number of two- and four-wire leased lines and radio links unified by these eight-foot-wide turrets. They even controlled the sirens.

No doubt there were other turrets designed by Ma Bell, although I have struggled to find them. The Order Turret series seems to have died away by the mid-century, but the SAC command turret likely remained in use into the '80s at least. Can we find any others?

An obvious application for a turret-like system is in police and fire dispatch, where in many smaller communities emergency calls were taken directly by the dispatcher who then had to relay information on the radio. Indeed, various vendors have sold telephone equipment for dispatch and public safety answering points (PSAPs, where 911 is answered) described as turrets, but the terminology does not seem to have caught on as strongly in that field. I would suspect this is because radio equipment was often more important in these early dispatch centers than telephone, and indeed the complex communications consoles in public safety dispatch centers usually come from radio vendors (e.g. Motorola) rather than telephone. Radio vendors usually just call these "dispatch consoles" and they have gone through a similar evolution from electromechanical to IP.

There is one exception which stands out: in the city of Boston, the central dispatch office is apparently colloquially referred to as "the turret." The recordings of police radio traffic, sometimes used as evidence in court, are often referred to as "turret tapes" in Massachusetts. I am not certain that the terms are related but it would seem likely; I would speculate that at some point in history the concept of dispatchers using turrets turned into dispatchers working at the turret.

The funny thing here is that I've gone on for a long time without addressing my original interest: why is it called a turret? Well, after all the digging through BSPs and newspaper archives and a whole detour through court records, I still haven't quite answered that question. No one seems to have written down an etymology.

All I can offer is this theory: while turret most directly refers to a tower, through the path of gun turrets it has also come to refer to something that rotates (e.g. in the case of a turret lathe). The original Order Turrets consisted of a rectangular table around which four attendants would sit, two on each side. Perhaps they were called turrets because the attendants sat in a circle and the duty to answer the next call rotated around.

Just a guess.

[1] Today we usually just use the term PBX, for Private Branch eXchange. I specify PABX, for Private Automatic Branch eXchange, in the historical context because for decades the term "PBX" referred mostly to manual boards with dedicated operators, which used to be common in businesses, hotels, etc. A PABX is an automatically switched system, basically the PBX equivalent of the introduction of dialing.

[2] I am leaving the person and list here anonymous out of respect for the community's privacy, although it is an excellent resource if you're interested in these topics and I feel a bit bad for not giving credit. The name of the list rhymes with, uhh, OldDoorBombs.


>>> 2021-05-10 lightweight as in ldap

Programming note: I have posted two videos to my poorly-tended YouTube account. They are part two of the video about Manzano base, and a rough version of a conference presentation on security of aviation radionavigation technologies.

I've mentioned LDAP several times as of late. Most recently, when I said I would write about it. And here we are! I will not provide a complete or thorough explanation of LDAP because doing so would easily fill a book, and I'm not sure that I'm prepared to be the kind of person who has written a book on LDAP. But I will try to give you a general understanding of what LDAP is, how it works, and why it is such a monumental pain in the ass.

I've also mentioned it, though, in the context of the OSI protocols. This is because LDAP is a direct descendant of one of the great visions of the OSI project: a grand, unified directory infrastructure with global addressability and integration with the other OSI protocols. This is an example of the ambition and failure of the OSI concept: in practice, directory services have proven to be fairly special-purpose, limited to enterprise environments, and intentionally limited in scope (e.g. kept internal for security reasons). OSI contemplated a directory infrastructure which was basically the opposite in every regard. It did not survive to the modern age, except in various bits and pieces which are still widely used in... once again, crypto infrastructure. Common crypto certificate formats are ASN.1 serialized (as we mentioned last week) because they are from the OSI directory service, X.500.

Before we get into the weeds, though, let's understand the high level objectives. What even is a directory, or a directory service?

It's a digital telephone directory.

This answer is so simple and naive that it almost cannot be true, and yet it is. Remember that the whole OSI deal was in many ways a product of the telephone industry, and that the telephone industry has always favored more complex, powerful, integrated solutions over simpler, independent, but composable solutions. One thing the telephone industry knew well, and had a surprisingly sophisticated approach to, was the white pages.

If you think about it, the humble telephone directory was a surprisingly central component of the bureaucracy of the typical 1970s enterprise. Today, historians often review archived institutional and corporate telephone directories as a way to figure out the timelines of historical figures. Corporate histories often use the telephone directory as a main organizing source, since it documents both the changing staff and the changing structure of the organization (traditional corporate directories often had an org chart in the front pages to boot!).

Across the many functional areas of a business, the telephone directory was a unifying source of truth---or authority---for the structure and membership of the organization. For consumer telephone service, directories had a less complex structure but were an undertaking in their own way due to the sheer number of subscribers. Telephone providers put computers to work at the job of collecting, sorting, and printing their subscribers' directory entries very early on. The information in the published white pages was an excerpt or report from the company's subscriber rolls, and so was closely tied to other important functions like billing and service management.

Inside the industry, the directory referred to all this and more: the unified, authoritative information on the users of the system.

This concept was extended to the world of computing in the form of X.500 and its accompanying OSI network protocols for access to X.500 information. At its root, LDAP is an alternative protocol to access X.500, and so there are substantial similarities between X.500 and the X.500-like substance that we now refer to as an LDAP server. In fact, there is no such thing as an "LDAP server" in the sense that LDAP remains a protocol to access an X.500 compliant directory, but in practice LDAP is now usually used with backends that were designed specifically for LDAP and avoid much of the complexity of X.500 in the sense of the OSI model. The situation today is such that "X.500" and "LDAP" are closely related concepts which are difficult to fully untangle; X.500 is very much alive and well if you accept the caveat that it is only used in the constrained form of corporate directories accessed by alternative methods [1].

The basic structure of X.500 is called the Directory Information Tree, or DIT. The DIT is a hierarchical database which stores objects that possess attributes, which are basically key-value pairs belonging to the object. Objects can be queried for based on their attributes, using a form called the Distinguished Name. DNs are made up of a set of attributes which uniquely identify an object at each level of the hierarchy. For example, an idealized X.500 DN, in the same notation as used by LDAP/LDIF (notation for DNs varies by X.500 protocol), looks like this: cn=J. B. Crawford,ou=Blogger,o=Seventh Standard,c=us. This DN identifies an object by, from top to bottom, country, organization, organizational unit, and common name. Common name is an attribute which contains a human-readable name for the object and is, conventionally, widely used for the identification of that object.
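As a rough illustration of how a DN in the LDAP notation breaks down, here is a small Python sketch. It assumes the simplest possible case: no escaped commas and one attribute per level. The function name parse_dn is made up, and real DN parsing has more rules than this handles.

```python
def parse_dn(dn):
    """Split a simple DN into (attribute, value) pairs, most specific first.

    Assumes no escaped commas and a single attribute at each level.
    """
    pairs = []
    for component in dn.split(","):
        attribute, _, value = component.partition("=")
        pairs.append((attribute.strip().lower(), value.strip()))
    return pairs


pairs = parse_dn("cn=J. B. Crawford,ou=Blogger,o=Seventh Standard,c=us")
# Reversing the list walks the tree from the root (country) downward.
top_down = [attribute for attribute, _ in reversed(pairs)]
```

The written form lists the most specific component first, so reading the hierarchy from the top of the tree means reading the DN from right to left.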

Note some things about this concept: first, the structure is rooted in the US. How does the namespace work, exactly? Who determines organizations under countries? Originally, X.500 was intended to be operated much like DNS, as a distributed system of many servers operating a shared namespace. Space in that namespace would be managed through a registry, which would be SRI or Network Solutions or whatever [2].

Second, this whole concept of identifying objects by attributes seems like it's very subject to conventions. It is, but you must resist the urge to hear "hierarchical store of objects with attributes" and think of X.500 as being a lightly-structured, flexible data store like a modern "NoSQL." In reality it is not: X.500 is highly structured through the use of schemas.

We mostly use the term "schema" when talking about relational databases or markup languages. X.500 schemas serve the same function of describing the structure of objects in the DIT but look and feel different because they are highly object-oriented. That is, an X.500 schema is made up of classes. Classes can be inherited from other classes, in which case their attributes are merged. As a result, there is a hierarchy not only of data, but of types. Objects can be instances of multiple classes, in which case they must provide the attributes of all of those classes, which may overlap. It's seemingly simple but can get confusing very fast.
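The merging behavior is easier to see in code than in prose. The Python sketch below is a toy model only: the class names (toyBase, toyPerson, toyOrgExtras) and their attribute lists are invented for illustration and are not the real X.500 definitions. It shows inherited attributes being merged, and an entry being checked against the union of several classes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ObjectClass:
    """Toy schema class with required (must) and optional (may) attributes."""
    name: str
    must: frozenset = frozenset()
    may: frozenset = frozenset()
    parent: Optional["ObjectClass"] = None

    def all_must(self):
        # Required attributes are merged with everything inherited.
        inherited = self.parent.all_must() if self.parent else frozenset()
        return inherited | self.must

    def all_may(self):
        inherited = self.parent.all_may() if self.parent else frozenset()
        return inherited | self.may


def check_entry(attributes, classes):
    """Return (missing, unknown) attribute names for an entry of these classes."""
    required = frozenset().union(*(c.all_must() for c in classes))
    allowed = required.union(*(c.all_may() for c in classes))
    present = frozenset(attributes)
    return required - present, present - allowed


# Hypothetical classes; note the deliberate overlap on telephoneNumber.
TOY_BASE = ObjectClass("toyBase", must=frozenset({"cn"}))
TOY_PERSON = ObjectClass(
    "toyPerson",
    must=frozenset({"sn"}),
    may=frozenset({"telephoneNumber"}),
    parent=TOY_BASE,
)
TOY_ORG_EXTRAS = ObjectClass(
    "toyOrgExtras", may=frozenset({"telephoneNumber", "title"})
)

missing, unknown = check_entry({"cn", "sn", "title"}, [TOY_PERSON, TOY_ORG_EXTRAS])
```

Overlapping attributes collapse harmlessly in the union, which is why two classes can both claim telephoneNumber without conflict.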

Let's illustrate this by taking a look at a common X.500 class: organizationalPerson, which is formally identified by a numeric OID. What's up with that number? Remember the whole SNMP thing? Yes, X.500 makes use of OIDs to, among other things, identify classes. That said, we commonly (and especially in the case of LDAP) deal only with their names.

While organizationalPerson does not require any attributes, it suggests things like these:

You will notice that this list is dated, and missing obvious things like name. The former is because it is in fact very old; the latter is because organizationalPerson is an auxiliary class and so is intended to be applied to objects only in addition to other classes. Namely, organizationalPerson is usually applied to objects alongside Person, which has some basics like:

You will notice that this class overlaps with organizationalPerson on telephoneNumber, but also has some odd things like assistant that seem to be specific to an organization. Why the two different classes, then? Conway observed that the structure of systems resembles the structure of their creators; X.500 is no exception. organizationalPerson was written more as part of an effort to represent organizations than as part of an effort to represent people, and these two efforts were not as well harmonized as you would hope for.

An object has a "primary" or "core" type. This is referred to as its structural class, and the class itself must be specially marked as structural. This is important for several reasons that are mostly under the hood of the X.500 implementation, but it's useful to know that Person is a structural class... so an X.500 entry representing a human being should have a core type of Person, but in most cases will have multiple auxiliary types bolted on to provide additional information.

That's a lot about the conceptual design of X.500... or really just the core concept of the data structure, ignoring basically the entire transactional concept which is more complicated than you could ever imagine. It's enough to get more into LDAP, though.

Before we go fully into the LDAPverse, though, it's useful to understand how LDAP is really used. This swerves right from OSI to one of my other favorite topics, Network Operating Systems [3].

For a group of computers to act like a unified computing environment, they must have a central concept of a user. This is most often thought of in the context of authentication and authorization, but a user directory is also necessary to enable features like messaging. Further, the user directory itself (e.g. the ability to use the computer as a telephone directory) is considered a feature of a network computing environment in its own right.

In almost all network computing environments, this user directory is descended from X.500. This is seen in the form of Microsoft Active Directory for Windows (modern Windows does not rely solely on LDAP to interact with the AD domain controller, and also uses Microsoft's own RPC-based protocols), and LDAP for Linux and macOS (we will not discuss NIS for Linux now, but perhaps in the future).

In these systems, the directory server acts as the source of basic information on the user. Consider another important LDAP class, PosixAccount. PosixAccount adds attributes like uid, homeDirectory, and gecos that reflect the user account metadata expected by POSIX [4]. It is possible to perform authentication against LDAP as well, but it comes with limitations and security concerns that make it uncommon in practice for operating systems. Both Windows and Unix-like environments now generally use Kerberos for authentication.

Many things have changed in the transition from the grand vision of X.500 to the reality of LDAP for information on user accounts. First, the concept of a single unified X.500 namespace has been wholly abandoned. It's complex to implement, and it's not clear that it's something anyone ever wanted, anyway, as federation of directories between organizations brings significant security and compliance concerns.

Instead, modern directories usually use DNS as their root organizational hierarchy. This basically involves cramming shim objects into the DIT that reflect the DNS hierarchy. The example DN I mentioned earlier would more often be seen today as cn=J. B. Crawford,dc=computer,dc=rip. dc here is Domain Component, and domain components are represented in the same order as in DNS because LDAP uses the same confused right-to-left hierarchical representation (AD does it the correct way around).
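The mapping from a DNS name to these shim objects is mechanical. A minimal Python sketch, assuming a plain dotted domain name (the function name domain_to_base_dn is made up):

```python
def domain_to_base_dn(domain):
    """Turn a DNS name into a DN made of dc= components, in the same order."""
    labels = domain.strip(".").split(".")
    return ",".join(f"dc={label}" for label in labels)


base_dn = domain_to_base_dn("computer.rip")
user_dn = "cn=J. B. Crawford," + base_dn
```

Since both notations put the most specific component on the left, no reordering is needed.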

Another major change has been to the structure. The original intention was that the X.500 hierarchy should represent the structure of the organization. This is uncommon today, because it introduced a maintenance headache (moving objects around the directory as people changed positions) and didn't have a lot of advantages in practice. Instead LDAP objects are more commonly grouped by their high-level purpose. For example, user accounts are often placed in an OU called "accounts" or "users." All in all, this marks a more general trend that LDAP has become a system only for software consumption, and there is minimal concern today about LDAP being browseable by human users.

So let's consider some details of how LDAP works. First off, LDAP is a binary protocol that uses a representation based on ASN.1. That said, LDAP is almost always used with LDAP Data Interchange Format, or LDIF, which is a textual representation. So it's very common to talk about LDAP "data" and "objects" in LDIF format, but understand that LDIF is just a user aid and is not how LDAP data is represented "in actuality."

LDAP provides more or less the verbs you would expect: ADD, DELETE, MODIFY. These are not especially interesting. The SEARCH operation, however, is where much of the in-use complexity of LDAP resides. SEARCH is a general-purpose verb to retrieve information from an LDAP DIT, and it is built to be very flexible. At its simplest, SEARCH can be invoked with a baseObject (a DN) and a scope of BaseObject, which just causes the server to return exactly the object identified by the DN.

In a more complex application, SEARCH can be invoked with a base path representing a subtree, a scope of wholeSubtree (means what it says), and a filter. The filter is a prefix-notation conditional statement that is applied to each candidate object; objects are only returned if the filter evaluates to true.
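As a rough illustration of what "prefix-notation conditional applied to each candidate object" means, here is a toy in-memory version of a wholeSubtree search. Everything here is a simplification I made up for the sketch: filters are nested tuples rather than the real string syntax, only equality plus and/or/not are handled, values are compared case-insensitively, and subtree membership is decided by naive string suffix matching on the DN.

```python
def matches(filt, entry):
    """Evaluate a prefix-notation filter against one entry (attr -> list of values)."""
    op = filt[0]
    if op == "&":
        return all(matches(f, entry) for f in filt[1:])
    if op == "|":
        return any(matches(f, entry) for f in filt[1:])
    if op == "!":
        return not matches(filt[1], entry)
    if op == "=":
        _, attr, value = filt
        # Entries in this toy directory store attribute names in lowercase.
        return any(v.lower() == value.lower() for v in entry.get(attr.lower(), []))
    raise ValueError(f"unsupported filter operator: {op!r}")


def search_subtree(entries, base_dn, filt):
    """Return DNs at or below base_dn whose entries satisfy the filter."""
    base = base_dn.lower()
    return [
        dn
        for dn, entry in entries.items()
        if (dn.lower() == base or dn.lower().endswith("," + base)) and matches(filt, entry)
    ]


directory = {
    "ou=users,dc=computer,dc=rip": {"objectclass": ["organizationalUnit"]},
    "uid=jbc,ou=users,dc=computer,dc=rip": {"objectclass": ["PosixAccount"], "uid": ["jbc"]},
    "cn=printer,ou=devices,dc=computer,dc=rip": {"objectclass": ["device"]},
}
filt = ("&", ("=", "objectClass", "PosixAccount"), ("=", "uid", "jbc"))
print(search_subtree(directory, "dc=computer,dc=rip", filt))
# prints: ['uid=jbc,ou=users,dc=computer,dc=rip']
```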

We can put these SEARCH concepts together into a very common LDAP SEARCH application, which is locating a user in a directory. A common configuration for a piece of software using LDAP for authentication would be:

baseObject: ou=users,dc=computer,dc=rip

scope: singleLevel

filter: (&(objectClass=PosixAccount)(uid=$user))

The $user here is a substitution tag which will be replaced by the user's username. Confusingly, in the PosixAccount class, uid refers to the user name while uidNumber is the value we usually refer to as uid.
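One thing worth sketching about that $user substitution: the username comes from the person logging in, so it should be escaped before it lands in the filter, or a creative user can rewrite your query. The escapes below are the ones RFC 4515 defines for filter values; the function names are hypothetical, and a real LDAP library will usually ship its own helper for this.

```python
def escape_filter_value(value: str) -> str:
    """Escape the characters that are special inside an LDAP filter value."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\x00": r"\00",
    }
    # Replace character by character so an inserted backslash is never re-escaped.
    return "".join(replacements.get(ch, ch) for ch in value)


def user_filter(username: str) -> str:
    """Fill the $user slot of the example filter with an escaped username."""
    return f"(&(objectClass=PosixAccount)(uid={escape_filter_value(username)}))"


print(user_filter("jbc"))
print(user_filter("*)(uid=*"))
```

The second call is the classic injection attempt; after escaping, the server sees it as one odd literal username rather than extra filter terms.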

A real headache comes about with groups. In authorization applications like RBAC, you commonly want to get the list of groups a user is a member of to make authorization decisions. There are multiple norms for representing groups in LDAP. Groups can have a list of accounts which are members, or accounts can have a list of groups they are a member of. Both are in common use, generally the former for Windows and the latter for UNIX-likes. This is where the flexibility of the filter expression becomes important: whatever "direction" the LDAP server represents the relationship, it's possible to go "the other way" by querying for the object type that contains the list with a filter expression that the list must contain the thing you're looking for. Because finding all users in a group is a less common requirement than finding all groups a user is in, a lot of LDAP clients in practice make somewhat narrow assumptions about how to find users but provide a more general (but also more irritating) configuration for finding group information [5].
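Here is a small sketch of going "the other way" in each direction, expressed as the filter a client would send. The object class and attribute names are assumptions: groupOfNames/member is one common group-holds-the-list schema and memberOf is a common account-holds-the-list attribute, but your directory may well use something else, which is exactly why clients make this configurable.

```python
def _escape(value: str) -> str:
    """Escape filter-special characters in a value (per RFC 4515)."""
    table = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\x00": r"\00"}
    return "".join(table.get(ch, ch) for ch in value)


def groups_for_user_filter(user_dn, group_class="groupOfNames", member_attr="member"):
    """Groups hold the member list: search the groups for ones listing this user."""
    return f"(&(objectClass={group_class})({member_attr}={_escape(user_dn)}))"


def users_in_group_filter(group_dn, account_class="PosixAccount", backlink_attr="memberOf"):
    """Accounts hold the group list: search the accounts for ones listing this group."""
    return f"(&(objectClass={account_class})({backlink_attr}={_escape(group_dn)}))"


print(groups_for_user_filter("uid=jbc,ou=users,dc=computer,dc=rip"))
print(users_in_group_filter("cn=admins,ou=groups,dc=computer,dc=rip"))
```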

Another complexity of LDAP in practice is authentication. A last important LDAP verb is BIND, which is used to assume the identity of a user in the directory. While anonymous access to LDAP is common, modern directory servers implement access control and limit access to sensitive values like password hashes to the users they belong to, for obvious reasons. This means that the formerly common approach of anonymously querying for a user to get their password hash and then checking the password should never be seen or heard of today. Instead, user authentication is done via BIND: the LDAP client attempts to BIND to the user (as an LDAP object) using the password provided by the user. If the server allows it, the user apparently provided the correct password. If the server doesn't allow it, the user had better try again. In this way, the actual authentication method is the authentication method of the LDAP server itself [6].

There's a problem, though. Or rather, two. First, for security reasons, it's not necessarily a great idea to allow users to query for complete group information, and depending on how group membership is represented it is not necessarily practical to use access controls to allow a user to access only the group information they should know about. Second, applications often have a need to access directory information at points other than when a user is actively logging in and the application has access to their password. For obvious reasons it is not a good idea for the application to store the user's password in plaintext for this purpose.

The solution is an irritating invention usually called a "manager." The manager is a non-person account (also called a system account) that an LDAP client uses in order to BIND to the LDAP server so that it is permitted to read information that is not available for anonymous query. Most commonly this is used for getting a user's group memberships. This is a particularly common setup because a lot of applications need access to user group information fairly frequently and do not strongly abstract their user information access, so they "cache" group information and update it from the LDAP server periodically---outside of the context of an authenticating user.
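The overall flow (bind as the manager, look up the user's DN, then try to bind as that user) can be sketched against a fake directory. FakeDirectory and authenticate are entirely made up for illustration and do not correspond to any real library; a real server compares against stored hashes rather than plaintext. The one detail borrowed from real life is refusing empty passwords on the client side, since a bind with a DN and an empty password is treated as an "unauthenticated" bind in LDAP and may be accepted by some servers.

```python
class FakeDirectory:
    """Stand-in for an LDAP server: maps DN -> {'password': ..., 'uid': ...}."""

    def __init__(self, entries):
        self._entries = entries

    def bind(self, dn, password):
        # Reject empty passwords outright rather than risk an unauthenticated bind.
        if not password:
            return False
        entry = self._entries.get(dn)
        return entry is not None and entry["password"] == password

    def find_user_dn(self, uid):
        # Stands in for a SEARCH performed while bound as the manager.
        for dn, entry in self._entries.items():
            if entry.get("uid") == uid:
                return dn
        return None


def authenticate(directory, manager_dn, manager_password, uid, password):
    """Search-then-bind: returns True only if the user's own BIND succeeds."""
    if not directory.bind(manager_dn, manager_password):
        raise RuntimeError("manager bind failed; check the service account")
    dn = directory.find_user_dn(uid)
    if dn is None:
        return False
    return directory.bind(dn, password)


server = FakeDirectory({
    "cn=manager,dc=computer,dc=rip": {"password": "s3rvice"},
    "uid=jbc,ou=users,dc=computer,dc=rip": {"password": "hunter2", "uid": "jbc"},
})
print(authenticate(server, "cn=manager,dc=computer,dc=rip", "s3rvice", "jbc", "hunter2"))
# prints: True
```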

Very frequently this takes the form of periodically "synchronizing" the application's existing local user database with the LDAP server, a lazy bit of engineering that causes endless frustration for administrators but is also difficult to avoid as the reality is that the concepts of "user" and "group" simply vary far too widely between applications to completely centralize all user information in one place.

As mentioned earlier, all of the methods of authenticating against LDAP have appreciable limitations. For this reason, Kerberos is generally considered the superior authentication method and "real" LDAP authentication is not common at the OS level. That said, Kerberos configuration and clients are relatively complex, which is probably the main reason that many non-OS applications still use direct LDAP authentication.

In practice, directory servers are not usually set up as a standalone package. Usually they are one facet of a larger directory system or identity management system. Popular options are Microsoft Active Directory and Red Hat IDM (based on FreeIPA), but there are a number of other options out there. Each of these generally implements a directory service alongside a dedicated authentication service (usually Kerberos because it is powerful and well researched), a name service (DNS), and some type of policy engine. DNS might initially be surprising here, as it does not at first glance seem like a related concern. However, in practice, directory systems represent devices just as much as people. Because each host needs to have a corresponding directory entry (particularly important with Kerberos where hosts need the ability to authenticate to other network services on their own), it's already necessary to maintain host information in the directory service, which makes it a natural place to implement DNS. DHCP is also sometimes implemented as part of the directory service because there is overlap between the directory management functions and basic host management functions of DHCP, but this seems to be less common today because in enterprise orgs DHCP is more often part of an IPAM solution (e.g. Infoblox).

You might be surprised to hear that there are all of these inconsistencies and differences in LDAP implementations considering my claim that X.500 is strongly typed against schemas. The nature of this contradiction will be obvious to any DBA: for any non-trivial application, the schema will always be both too complex and not complex enough. The well-established X.500 and LDAP schemas, published for example in RFCs, don't have enough fields to express the full scope of information about users needed in any given application. Simultaneously, though, they provide so many types and attributes that there are multiple ways to solve a given problem. Any attempt to reduce one problem will inevitably make the other worse.

The long history of these systems only makes the problem more complicated, as there are multiple and sometimes conflicting historic schemas and approaches and it's hard to get rid of any of them now. For this reason identity management solutions often come with some sort of "quick ref" documentation explaining the important aspects of the LDAP schema as they use it, to be used as an aid in configuring other LDAP clients.

I'm going to call this enough on the topic of LDAP for now... but there will be a followup coming. For me, this whole discussion of complex enterprise directory solutions raises a question: can we have the advantages of a directory service, namely a unified sense of identity, in a consumer environment?

The answer is yes, through the transformation of all software into a monthly subscription, but I want to talk a bit about the history of attempts at bringing the dream of the NOS to the home. Microsoft has tried at least a half dozen times and it has never really worked.

[1] As an example of this ontological complexity, Microsoft Active Directory is sometimes referred to as being an LDAP server or LDAP implementation. This is not true, but it's also not untrue. It is perhaps more accurate to say that "Active Directory is an implementation of a modified form of X.500 which is commonly accessed using LDAP for interoperability" but that's a mouthful and probably still not quite correct.

[2] Have I written about this here before? While IANA was long operated by Jon Postel who was famously benevolent, the function of ICANN was tossed around defense contractors for a while and then handed to Network Solutions, who turned out to be so comically evil that the power had to be taken away from them. ICANN didn't turn out much better. It's a whole story.

[3] Requisite explanatory footnote about network operating systems (NOS): the term has basically changed in definition midway through computer history. Today NOS generally refers to operating systems written for network appliances, like Cisco IOS. Up to the mid-'90s, though, it more commonly referred to a general-purpose operating system that was built specifically to be used as part of a network environment, such as Novell Netware. The salient features of NOS such as centralized user directories, inter-computer messaging, and shared access to storage and printers are present in all modern operating systems (sometimes with implementations borrowed from historic NOS) and so the use of the term NOS in this sense has faded away.

[4] This whole thing gets into some weird UNIX history, particularly the gecos and the aspect of LDAP's UNIX-nerd cousin NIS. Maybe that'll be a post some time.

[5] For how closely connected the concepts of users and groups seem to be, this issue of the user->group query being irritatingly difficult is remarkably common in identity systems, even many modern "cloud" ones. Despite being a common requirement and one of the conceptually simpler options for authorization, RBAC does not generally seem to be a first-class concern to the designers of directories.

[6] It's possible to use a wide variety of network services for authentication in this way, by just passing the user's credentials on and seeing if it works. I have seen a couple of web applications offer "IMAP authentication" in that way, presumably because small organizations are more likely to have central email than LDAP.


>>> 2021-05-01 simple as in snmp

Very early on in my career as an "IT person," when my daily work consisted primarily of photocopier and laptop warranty service with a smattering of Active Directory administration (it was an, uh, weird job), I was particularly intimidated by SNMP. It always felt like one of those dark mysteries of computing that existed far beyond my mortal knowledge, like distributed algorithm optimization or modern JavaScript.

The good news is that SNMP is actually, as the name suggests, quite simple. The reason for my SNMP apprehensions is a bit silly from the perspective of computer science: SNMP makes extensive use of long, incomprehensible numbers. That is, of course, basically a description of all of computing, but SNMP exposes them to users in a way that modern software generally tries to avoid.

Today, we're going to learn about SNMP and those numbers. Surprise: they're an emanation of an arcane component of the OSI stack, like at least 50% of the things I talk about.

But let's step back and just talk about SNMP at a high level. SNMP was designed to offer a portable and simple to implement method for a manager (e.g. an appliance or administrator's workstation) to inspect the state of various devices and potentially change their configuration. It's intended to be amenable to implementation on embedded systems, and while it's most classically associated with network appliances there is a virtually unlimited number of devices and software packages which expose an SNMP interface.

SNMP often acts as a "lowest common denominator:" it's a simple and old protocol, so just about everything supports it. This makes it very handy for getting heterogeneous devices (especially in terms of vendor) into one monitoring solution, and sometimes allows for centralized configuration as well, although that gets a lot trickier.

At its core, SNMP belongs to a category of protocols which I refer to as remote memory access protocols (this is my taxonomy and does not necessarily reflect that of academic work or your employer). These are protocols which allow a remote host to read and (possibly subject to access controls) write an emulated memory address space. This does not necessarily (and often doesn't) have anything to do with the actual physical or virtual memory of the service, and the addressing scheme used for this memory space might be eccentric, but the basic idea is there: the "server" has memory addresses, and the protocol allows you to read and write them.

These remote memory access protocols, as a category, tend to be very common with embedded systems because if they do happen to align with physical memory, they are very simple to implement. A prominent example is Modbus, a common industrial automation protocol that consists of reading and writing registers, coils, etc., which are domain-specific terms for addresses in the typed memory of PLCs (historically these were physical addresses in the PLC's unusually structured memory, but today it's generally just a software construct running on a more general-purpose architecture).

Unsurprisingly, then, the basic SNMP "verbs" are get and set, and these take parameters of an address and, if setting, a value. On top of this very simple principle, SNMP adds a more sophisticated feature called a "trap," but we'll talk about that later. Let's call it an "advanced topic," although it's actually one of the most useful parts of SNMP in practical situations.
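To illustrate the "remote memory" idea, here is a toy agent that is nothing more than an address-to-value map with a write-permission list. ToyAgent is my own invention and involves no network or real SNMP library; the two addresses used are the standard MIB-2 system description and system name objects, the first of which is read-only and the second writable.

```python
SYS_DESCR = "1.3.6.1.2.1.1.1.0"  # sysDescr.0 (read-only in MIB-2)
SYS_NAME = "1.3.6.1.2.1.1.5.0"   # sysName.0 (writable in MIB-2)


class ToyAgent:
    """An emulated address space: get reads an address, set writes one."""

    def __init__(self, values, writable=()):
        self._values = dict(values)
        self._writable = set(writable)

    def get(self, oid):
        return self._values[oid]

    def set(self, oid, value):
        # Writes are subject to access control, as in the real protocol.
        if oid not in self._writable:
            raise PermissionError(f"{oid} is read-only")
        self._values[oid] = value


agent = ToyAgent({SYS_DESCR: "toy PDU firmware 1.0", SYS_NAME: "pdu-1"}, writable=[SYS_NAME])
agent.set(SYS_NAME, "rack4-pdu")
print(agent.get(SYS_NAME))
# prints: rack4-pdu
```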

What is perhaps most interesting to consider, as far as arcane details of SNMP, is the structure of the addresses. This is the scary part of SNMP: just about the first time you have to interact directly with SNMP you will encounter an address, called a variable or more properly object identifier (OID) in SNMP parlance, like . It's like an IP address, if they were substantially less user-friendly. That is to say, an IPv6 address [1].

These OIDs are in fact hierarchical addresses in a structure called the Management Information Base (MIB). The MIB is an attempt to unify, into one data structure, the many data points which could exist across devices in a network. This idea of a grand unification of the domain of knowledge of "configuration of network appliances" into one unpleasant numbered hierarchy has a powerful smell of golden era Computer Science with a capital CS, and indeed it is!

You see, from a very high level, the MIB is actually viewed as something akin to a serialization format---it is, after all, fundamentally concerned with packing the state of a device (Management Information) into a normalized, strictly structured, interoperable format. To achieve this, the MIB is described using something called SMI (e.g. RFC2578), which is best understood as a simplified (or perhaps more formally "constrained") flavor of ASN.1.

ASN.1 is the most prominent of the interface description and serialization formats developed for the OSI protocol suite. You might be tempted to call ASN.1 an example of the "presentation layer," although like most invocations of the OSI model, you would be misunderstanding the OSI model in saying so (the OSI presentation layer protocols are, as the name suggests but is often ignored, full on request-reply network protocols, not just serialization formats). Nonetheless, people say this a lot, and at least ASN.1 truly dates back to OSI, unlike a lot of things people relate to the OSI model.

You might be familiar with ASN.1 because it is widely used in cryptography, and by this I mean that cryptography applications are widely saddled with ASN.1. Most cryptographic certificates, the formats we tend to variously (and confusingly) call X.509, PKCS#12, DER, PEM, etc, are ASN.1 serialized. This is a whole lot of fun since ASN.1 is significantly divergent from modern computing conventions, including the use of length-prefixed rather than terminated strings (in some cases). I bring this up because it has led to a rather famous series of vulnerabilities in TLS implementations, because apparently not even the people implementing TLS have actually read the ASN.1 specification that closely.
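To make the length-prefixed style concrete, here is a sketch of how an OID itself is serialized under ASN.1's Basic Encoding Rules, which is the encoding SNMP uses on the wire. The layout is tag, then length, then value: the OID tag is 0x06, the first two arcs are packed into one number as 40 * first + second, and every number is written base-128 with the high bit set on all but the last byte. The function names are mine, and the sketch only handles the short (single-byte) length form.

```python
def encode_base128(n: int) -> bytes:
    """Encode a non-negative integer in base 128, high bit marking continuation."""
    chunks = [n & 0x7F]
    n >>= 7
    while n:
        chunks.append((n & 0x7F) | 0x80)
        n >>= 7
    return bytes(reversed(chunks))


def encode_oid(dotted: str) -> bytes:
    """BER-encode a dotted OID as tag (0x06), length, value."""
    arcs = [int(part) for part in dotted.strip(".").split(".")]
    if len(arcs) < 2:
        raise ValueError("an encoded OID needs at least two arcs")
    first, second = arcs[0], arcs[1]
    if first > 2 or (first < 2 and second > 39):
        raise ValueError("invalid leading arcs")
    body = encode_base128(40 * first + second)
    for arc in arcs[2:]:
        body += encode_base128(arc)
    if len(body) > 127:
        raise ValueError("long-form lengths are not handled in this sketch")
    return bytes([0x06, len(body)]) + body


print(encode_oid("1.3.6.1.4.1").hex(" "))
# prints: 06 05 2b 06 01 04 01
```

The second byte is the length prefix: a decoder reads it and knows exactly how many bytes of value follow, with no terminator involved.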

Anyway, back to SMI. Basically, SMI allows vendors of devices (or anyone really) to write, in SMI, a description of an MIB "module." A "module" is basically a list of OIDs (hierarchically structured) with their types and other metadata. This SMI source is then compiled into the binary representation actually used by SNMP clients. If you are unlucky, you may need to write SMI yourself for devices whose vendors implemented SNMP but did not provide the supporting materials. But, in most cases, device vendors provide a file (commonly called an MIB file) which is the SMI description of the MIB module(s) implemented by the device. This MIB file can then be fed to your SNMP tool to be compiled into its "whole picture" binary MIB.

Knowing that it is a result of compiling together SMI produced by various vendors, let's take a look at the structure of the MIB. Each dot-separated number identifies a subtree, which for extra fun are called "arcs" in the context of the MIB. At the very top of the OID hierarchy is a top level which identifies the standards authority. This is 0, 1, or 2, which refer to ITU, ISO, and ITU/ISO together, respectively. Of course these three parts of the tree use different internal structures so I can't generalize past this point, but I will focus on the ISO tree because it's the one most commonly used in practice.

Under the .1 ISO hierarchy are arcs for ISO standard OIDs, registry authorities (somewhat difficult to explain and also not widely used, basically a metadata space), ISO member organizations by country (e.g. ANSI in the US), and then identified organizations, which are just companies and organizations that have asked for OID space. This can be somewhat confusing because many national ISO member organizations also allocate OID space within their arcs, but major vendors (e.g. Cisco) are often found at this top level instead.

So let's take a look at a somewhat arbitrary example, an MIB for Juniper's Junos. I'm using this as an example rather than the more obvious Cisco IOS because I got mad at Cisco's website for getting MIBs which did not appear to have seen an update in a decade. In any case, the MIB starts out at .

In terms of the hierarchy this means: ISO standard, identified organization, DOD, internet, private projects, private enterprises, Juniper.

Haha, wait, that just goes against most of what I said. What's going on with the DOD thing?

The entire Internet, big-I, TCP/IP world is considered to be a subset of the DOD, for OID purposes. This . space is actually managed by IANA, and if you would like your own . number they will be happy to give you one upon application.

This is all particularly interesting historically, because unlike a lot of protocols I talk about, SNMP does not predate IP. It was designed specifically for use on IP networks, over UDP. SNMP is based on several earlier protocols also used with IP. So, where does this weird relegation of IP to a small subtree come from?

Well, it really has more to do with politics than technology. The MIB tree essentially belongs to ITU and ISO, but ITU and ISO are both organizations which are not especially known for swiftly and cheaply adopting standards proposed by vendors. It was fairly obvious from an early stage that vendors would need to produce MIB modules for their own devices fairly quickly, but ISO and ISO member organizations were not especially enthusiastic about issuing a large number of arcs to these vendors. So instead, IANA stepped in---but not quite IANA yet, instead IANA's predecessor, Jon Postel. Postel, who was the IANA for quite some time, worked on contract for DOD, and so he assigned OIDs out of their space. There's no really good reason for it to be this way, but if you work with SNMP a lot then typing . will have become basically reflex.
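For a sense of how the arcs read, here is a small sketch that labels the standard prefix leading down to the IANA-managed enterprise space, iso(1) identified-organization(3) dod(6) internet(1) private(4) enterprises(1). The table covers only a handful of well-known arcs, and the enterprise number 99999 in the example is an arbitrary value chosen purely for illustration, not a claim about who holds it.

```python
KNOWN_ARCS = {
    (0,): "itu-t",
    (1,): "iso",
    (2,): "joint-iso-itu-t",
    (1, 3): "identified-organization",
    (1, 3, 6): "dod",
    (1, 3, 6, 1): "internet",
    (1, 3, 6, 1, 2): "mgmt",
    (1, 3, 6, 1, 2, 1): "mib-2",
    (1, 3, 6, 1, 4): "private",
    (1, 3, 6, 1, 4, 1): "enterprises",
}


def describe(dotted: str) -> str:
    """Replace each arc with its well-known name where this small table has one."""
    arcs = tuple(int(part) for part in dotted.strip(".").split("."))
    names = []
    for depth in range(1, len(arcs) + 1):
        # Unknown arcs (vendor-specific territory) are left as plain numbers.
        names.append(KNOWN_ARCS.get(arcs[:depth], str(arcs[depth - 1])))
    return ".".join(names)


print(describe("1.3.6.1.4.1.99999.1"))
# prints: iso.identified-organization.dod.internet.private.enterprises.99999.1
```

A real SNMP tool does the same thing with a compiled MIB instead of a hand-written table, which is why loading the vendor's MIB file turns walls of numbers into readable names.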

Now, what is found inside of this Juniper space? Well, for example, there's . This is an integer value which provides the average power used, in watts, by whatever's plugged into a particular outlet of a managed PDU. The MIB structure allows OIDs which contain other OIDs (object identifier type OIDs) to actually contain tables of those OIDs, so . is a table of all of the outlets on the PDU, and . within it is a list of useful properties of the outlet such as name, status, and various useful electrical measurements like current and power factor.

After all of this talk of ASN.1 and MIBs and etc, these examples are actually very useful and concrete. SNMP is, after all, actually a useful protocol for real-world situations, such as centralized monitoring of your PDUs to identify problems and catch your colo customers exceeding their power budgets.

And remember, SNMP even allows writing. So ., the status of the outlet, can not only be used to determine whether the outlet is on or off but also to turn the outlet on or off, which is a fun move when your colo customer doesn't pay their bill for months.

SNMP is not limited to devices as concrete as managed PDUs. For example, RFC4113 provides an MIB for UDP. That is, it permits you to inspect a host's UDP state, such as datagram counters and listening endpoints, using SNMP, if that's a thing you really want to do. In fact, the entire concept of the MIB is far more general than SNMP, and ISO protocols and standards often use MIB OIDs for identification purposes having little to do with the application we're discussing here. For example, many MIME types have an associated OID because the OSI email equivalent, X.435, uses OIDs to identify the types of message parts. In general, OSI standards are lousy with OIDs used as identifiers and, less frequently, to describe data structures and field sets.

The fact that you can set values via SNMP, and get all kinds of potentially sensitive information, raises the concern of security. Fortunately, SNMP provides an airtight solution to this problem: "communities." A community is really just a shared password: if the SNMP manager has the same community string as the SNMP agent, then it is allowed access. Even better, many SNMP agents have well-known default community strings. Perfect. To be fair, SNMPv3 adds more rigorous authentication support, including support for different authentication methods, but there are still plenty of SNMPv2 devices out there with the community string set to "public."

One final thing to complete our discussion of SNMP is to mention the trap. More technically, I am going to conflate traps and inform requests which are actually slightly different, but everyone conflates them so I feel okay about it. A trap is an extremely useful feature of SNMP which allows you to configure an agent (e.g. device) to immediately inform a manager when certain events occur. This is essentially a basic alarm capability built in to many devices. Traps are identified by OIDs, and can bind other OIDs, so that the generated trap message includes not only which trap was triggered, but also some other related data if so configured. To be complete, an inform request is really just a trap where the manager acknowledges receipt (this is not the case with normal traps) so that the agent can resend if it is not acknowledged.

In order for traps to work, the manager first needs to listen for traps, which is usually fairly straightforward to set up. Then, various OIDs are set on the agent to enable traps and set the destination for those traps (e.g. the IP of the manager). In some cases agents also provide a web interface or other more convenient mechanisms to set these up, which is much appreciated since SNMP is unpleasant to have to think about directly.

That's about it for SNMP. Simple, right? Well, it really is pretty simple, as long as you agree to just take OIDs as magic numbers that come from wherever it is computers do and not ask too many questions. Where SNMP can become rather rough is when you run into issues with MIBs, or if you are using SNMPv3 where the authentication and configuration can be amazingly, maddeningly complex for some vendors.

As an aside, the whole reason I'm talking about SNMP is because a reader asked me to. For much the same reason, from the same reader, I'll be talking about LDAP soon. LDAP is even more an out-of-place artifact of OSI than SNMP, and it is basically impossible to describe as used in short form, but I will take a shot at illustrating the odd historical components of LDAP and the ways they matter today. It will at least serve as a teaser for my yet to be written book, "Survival Under LDAP." LDAP is survivable for as many as 70% of Americans, but you must know how to protect yourself!

[1] I continue to seriously question the merits of the complex address representation used with IPv6. If we had stuck to decimalized bytes separated by dots, we'd be doing a lot more typing, but we wouldn't be trying to remember what :: means when it's there.
