A brief interlude from the topic of GUIs to talk about perhaps one of the most
infamous of all GUI programs, Microsoft PowerPoint.
PowerPoint is ubiquitous, and widely criticized, in most industries, but I have
never seen more complete use and abuse of PowerPoint than in the military. I was
repeatedly astounded by how military programs invested more effort in preparing
elaborately illustrated slides than in actually, well, putting content in them.
And that, in a nutshell, is the common criticism of PowerPoint: that it allows
people to avoid actual effective communication by investing their effort in
slide production instead.
Nonetheless, the basic idea of using visual aids in presentations is obviously
a good one. The problem seems to be one of degrees. When I competed in
expository speech back in high school my "slides" were printed on a plotter and
mounted on foam core. More so than the actual rules of the event, this imposed
an economy in my use of visual aids. Perhaps the problem with PowerPoint is
simply that it makes slides too easy. When all you need to do is click "new
slide" and fill in some bullet points, there's nothing to stop the type of
presenter who has more slides than ideas.
Of course that doesn't stop the military from hiring graphic designers to
prepare their flowcharts, but still, I think the basic concept stands...
As my foam core example suggests, the basic idea of presenting to slides is
much older than PowerPoint. I've quipped before that Corporate Culture is what
people call their PowerPoint presentations. Most of the large, old
organizations I've worked for, private and government, had some sort of
"in-group" term for a presentation. For example, at GE, one presents a "deck."
Many of these terms are anachronistic, frozen references to whichever
presentation technology the organization first adopted.
Visual aids for presentations could be said to have gone through a few
generations: large format printed materials, transparent slides, and digital
projection. Essentially all methods other than projection have died out today,
but for a time these all coexisted.
Printed materials can obviously be prepared by hand, e.g. by a sign painter,
and this was the first common method of presenting to slides. Automation
started from this point, with the use of plotters. As I have perhaps mentioned
before, the term "plotter" is a bit overloaded and today is often used to refer
to large-format raster printers, but historically "plotter" referred to a
device that moved a tool along vectors, and it's still used for this purpose as well.
Some of the first devices to create print materials from a computer were pen
plotters, which worked by moving a pen around over the paper. HP and Roland
were both major manufacturers of these devices (Roland is still in the
traditional plotter business today, but for vinyl cutting). And it turns out
that presentations were a popular application. The lettering produced by these
devices was basic and often worse than what a sign painter could offer (but
required less skill). What really sold pen plotters was the ability to produce
precise graphs and charts directly from data in packages like VisiCalc.
The particularly popular HP plotters, the 75 series, had a built-in demo
program that sold this capability by ponderously outlining a pie chart along
with a jagged but steeply rising line labeled "Sales." Business!
These sorts of visual aids remained relatively costly to produce, though, until
projection became available... large-format plotters, board to make things
rigid, etc. are not cheap. Once you buy a single projector for a conference
room, projection becomes a fairly cheap technology, even accounting for the
cost of producing the slides themselves.
The basic concept of projection slide technology is to produce graphics using
a computer and then print them onto a transparent material which serves as film
for a projector. There are a lot of variations on how to achieve this. Likely
the oldest method is to produce a document using a device like a plotter (or
manual illustration, or a combination) and then photographically expose it on
film using a device that could be described as an enlarger set to suck rather
than blow. Or a camera on a weird mount, your choice.
In fact this remained a very common process for duplication for a very long
time, as once a document was exposed on film, photochemical methods could be used
to produce printing plates or screens or all kinds of things. There is a
terminological legacy of this method at least in the sciences, where many
journals and conferences refer to the final to-be-printed draft of a paper as
the "camera-ready" version. In the past, you would actually mail this copy to
them and they (or more likely their printing house) would photograph it using
a document camera and use the film to create the plates for the printed journal.
If you've seen older technical books or journals, you may have seen charts and
math notation that were hand-written onto the paper after it was typewritten
(with blank spaces left for the figures and formulas). That's the magic of
"reprographics," a term which historically referred mostly to this paper to
film to paper process but nowadays gets used for all kinds of commercial
printing. This is closely related to the term "pasting up" for final document
layout, since a final step before reprographic printing was usually to combine
text blocks, figures, etc. produced by various means into a single layout.
For presentations, there are a few options. The film directly off the document
camera may be developed and then mounted in a paper or plastic slide to be
placed in a projector. If you are familiar with film photography, that might
seem a little off to you because developed film is in negative... in fact, for
around a hundred years "reversal films" have been available that develop to
positive color, and they were typically used to photograph for slides in order
to avoid the need for an extra development process. Kodachrome is a prominent
example. Reversal films are also sometimes used for typical photography and
cinematography but tended to be more complex to develop and thus more
expensive, so most of us kept our terrible 35mm photography on negatives.
This approach had the downside that the slide would be very small (e.g. from a
35mm camera), which required specialized projection equipment (a slide
projector). The overhead projector was much more flexible because the "film
frame," called the platen, was large enough for a person to hand-write on. It
served as a whiteboard as well as a projector. So more conference rooms
featured overhead projectors than slide projectors, and there was a desire to
be able to project prepared presentations on these devices.
This concept, of putting prepared (usually computer-generated) material on a
transparent sheet to be placed on an overhead projector, is usually referred to
as a "viewgraph." Viewgraphs were especially popular in engineering and defense
fields, and there are people in the military who refer to their PowerPoint
presentations as viewgraphs to this day. There are multiple ways to produce
viewgraphs but the simplest and later on most common was the use of plastic
sheets that accepted fused toner much like paper, so viewgraphs could either be
printed on a laser printer or made by photocopying a paper version. When I
worked for my undergraduate computer center around a decade ago we still had
one laser printer that was kept stocked with transparency sheets, but people
only ever printed to it by accident.
In fact, these "direct-print" transparencies were a major technical advancement.
Before the special materials were developed to make them possible, overhead
transparencies were also produced by photochemical means, using a document
camera and enlarger. But most large institutions had an in-house shop that
could produce these with a quick turnaround, and they were still popular even
before easy laser printing.
Not all projection slides were produced by photographing or copying a paper
document, and in fact this method was somewhat limited and tended not to work
well for color. By the '70s photosetting had become practical for the
production of printing plates directly from computers, and it was also used to
produce slides and transparencies. At the simplest, a photosetter is a computer
display with optics that focus the emitted light onto film. In practice, many
photosetters were much more complicated as they used shifting of the optics to
expose small sections of film at a time, allowing for photosetting at much
higher resolution than the actual display (often a CRT).
Donald Knuth originally developed TeX as a method of controlling a photosetter
to produce print plates for books, and some of TeX's rougher edges date back to
its origin of being closely coupled to this screen-to-film process. The
photosetting process was also used to produce slides direct from digital
content, and into the early '00s it was possible to send a PowerPoint
presentation off to a company that would photoset it onto Kodak slides.
Somewhere I have a bin of janitorial product sales presentations on slides that
seem to be this recent.
The overhead projector as a device was popular and flexible, and so it was also
leveraged for some of the first digital projection technology. In fact, the
history of electronic projection is long and interesting, but I am constraining
myself to devices often seen in corporate conference rooms, so we will leave
out amazing creations like the Eidophor. The first direct computer projection
method to become readily available to America's middle management was a device
sometimes called a spatial light modulator (SLM).
By the 1980s these were starting to pop up. They were basically transparent LCD
displays of about the right size to be placed directly onto the platen of an
overhead projector. With a composite video or VGA interface they could be used
as direct computer displays, although the color rendering and refresh rate
tended to be abysmal. I remember seeing one used in elementary school, along
with the 8mm projectors that many school districts held on to for decades.
All of these odd methods of presentation basically disappeared when the
"digital projector" or "data projector" became available. Much like our modern
projectors, these devices were direct computer displays that offered relatively
good image quality and didn't require any of the advanced preparation that
previous methods had. Digital projectors had their own evolution, though.
The first widely popular digital projectors were CRT projectors, which used a
set of three unusually bright CRT tubes and optics. CRT projectors offered
surprisingly good image quality (late-model CRT projectors are pretty
comparable to modern 3LCD projectors), but were large, expensive, and not very
bright. The tubes were often liquid cooled and required regular replacement at
a substantial cost. As a result, they weren't common outside of large meeting
rooms and theaters.
The large size, low brightness, and often high noise level of CRT projectors
made them a bit more like film projectors than modern digital projectors in
terms of installation and handling. They were not just screwed into the
ceiling; rooms would be designed specifically for them. They could weigh
several hundred pounds and required good maintenance access. All of this added
up to mean that they were usually in a projection booth or in a rear-projection
arrangement. Rear-projection was especially popular in institutional contexts
because it allowed a person to point at the screen without shadowing.
Take a close look at any major corporate auditorium or college lecture hall
built in the '70s or '80s and there will almost certainly be an awkward storage
room directly behind the platform. Originally, this was actually the projection
booth, and a transparent rear-projection screen was mounted in the wall in
between. Well-equipped auditoriums would often have both a rear projection and
front projection capability, as rear projection required mirroring the image.
Anything that came in on film would often be front-projected, often onto a
larger screen, because it was simpler and easier. Few things came in on film
that someone would be pointing at, anyway.
You may be detecting that I enjoy the archaeological study of 1980s office
buildings. We all need hobbies. Sometimes I think I should have been an
electrician just so I could explain to clients why their motor-variac
architectural lighting controller is mounted in the place it is, but then
they'd certainly have found an excuse to make me stop talking to them by that point.
The next major digital projection technology on the scene was DLP, in which a
tiny MEMS array of mirrors flip in and out of position to turn pixels on and
off. The thing is, DLP technology is basically the end of history here... DLP
projectors are still commonly used today. LCD projectors, especially those with
one LCD per color, tend to produce better quality. Laser projectors, which use
a laser diode as a light source, offer even better brightness and lifespan than
the short arc lamps used by DLP and LCD projectors. But all of these are
basically just incremental improvements on the DLP projection technology, which
made digital projectors small enough and affordable enough to become a major
presence in conference rooms and classrooms.
The trick, of course, is that as television technology has improved these
projectors are losing their audience. Because I am a huge dweeb I use a
projector in my living room, but it is clear to me at this point that the next
upgrade will be to a television. Televisions offer better color rendering and
brightness than comparably priced projection setups, and are reaching into the
same size bracket. An 85" OLED television, while fantastically expensive, is in
the same price range as a similarly spec'd projector and 100" screen (assuming
an ALR screen here for more comparable brightness/color). And, of course, the
installation is easier. But let me tell you, once you've installed an outlet
and video plate in the dead center of your living room ceiling you feel a
strong compulsion to use it for something. Ceiling TV?
So that's basically the story of how we get to today. Producing a "deck" for a
meeting presentation used to be a fairly substantial effort that involved the
use of specialized software and sending out to at least an internal print shop,
if not an outside vendor, for the preparation of the actual slides. At that
point in time, slides had to be "worth it," although I'm sure that didn't stop
the production of all kinds of useless slides meant to impress people with stars on their shoulders.
Today, though, preparing visual aids for a presentation is so simple that it
has become the default. Hiding off to the side of slides is seen as less effort
than standing where people will actually look at you. And god knows that in the
era of COVID the "share screen" button is basically a trick to make it so
people don't just see your webcam video when you're talking. That would be too personal.
There are many little details and variations in this story that I would love to
talk about but I fear it will turn into a complete ramble. For example,
overhead-based projection could be remarkably sophisticated at times. You may
remember the scene at the beginning of "The Hunt for Red October" (the film) in
which Alec Baldwin gives an intelligence briefing while unseen military aides
change out the transparencies on multiple overhead projectors behind
rear-projection screens. This was a real thing that was done in important briefing rooms.
Slide projectors were sometimes used in surprisingly sophisticated setups. I
worked with a college lecture hall that was originally equipped with one rear
projection screen for a CRT projector and two front projection screens, both
with a corresponding slide projector. All three projectors could be controlled
from the lectern. I suspect this setup was rarely used to its full potential
and it had of course been removed, the pedestals for the front slide projectors
remaining as historic artifacts much like the "No Smoking" painted on the front wall.
Various methods existed for synchronizing film and slide projectors with
recorded audio. A particularly well-known example is the "film strip" sometimes
used in schools as a cheaper substitute for an actual motion picture. Late
film strips were cassette tapes paired with strips of slides; the projector advanced
the slide strip when it detected a tone in the audio from the cassette tape.
Note 2: I have begrudgingly started using Twitter to ramble about the things
I spend my day on. It's hard to say how long this will last. https://twitter.com/jcrawfordor.
When we look back on the history of the graphical user interface, perhaps one
of the most important innovations in the history of computing, we tend to think
of a timeline like this: Xerox, Apple, Microsoft, whatever we're doing today.
Of course that has the correct general contours. The GUI as a concept, and the
specific interaction paradigms we are familiar with today, formed in their
first productized version at the Xerox Palo Alto Research Center (XPARC).
Their production version, the Alto, was never offered as a commercial product
but was nonetheless widely known and very influential. Apple's early machines,
particularly the Lisa and Macintosh, featured a design heavily inspired by the
work at XPARC. Later, Microsoft released Windows for Workgroups 3.11, the first
one that was cool, which was heavily inspired by Apple's work.
In reality, though, the history of the GUI is a tangled one full of controversy
("inspired" in the previous paragraph is a euphemism for "they all sued each
other") and false starts. Most serious efforts at GUIs amounted to nothing, and
the few that survive to the modern age are not necessarily the ones that were
most competitive when originally introduced. Much like I have perennially said
about networking, the world of GUIs has so ossified into three major branches
(MacOS-like, Windows-like, and whatever the hell Gnome 3 is trying to be)
that it's hard to imagine other options.
Well, that's what we're about to do: imagine a different world, a world where
it's around the '80s and there are multiple competing GUIs. Most importantly,
there are more competing GUIs than there are operating systems, because
multiple independent software vendors (ISVs) took on the development of GUIs
on top of operating systems like CP/M and DOS. The complexities of a GUI, such
as de facto requiring multi-tasking, required that these GUIs substantially
blur the line between "operating system" and "application" in a way that only
an early PC programmer could love.
And that is why I love these.
Before we embark on a scenic tour of the graveyard of abandoned GUIs, we need
to talk a bit about the GUI as a concept. This is important to comprehend the
precedents for the GUI of today, and thus the reason that Apple did not prevail
in their lawsuit against Microsoft (and perhaps Xerox did not prevail in their
lawsuit against Apple, although this is an iffier claim as the Xerox vs. Apple
case did not get the same examination as Apple vs. Microsoft).
What is a GUI?
I believe that a fundamental challenge to nearly all discussions about GUIs is
that the term "GUI" is actually somewhat ill defined. In an attempt to resolve
this, I will present some terminology that might be a better fit for this
discussion than "GUI" and "TUI" or "graphical" and "command-line." In doing so
I will try to keep my terminology in line with that used in the academic study
of human-computer interaction, but despite the best efforts of one of my former
advisors I am not an HCI scholar, so I will probably reinvent terminology at times.
The first thing we should observe is that the distinction between "graphics"
and "text" is not actually especially important. I mean, it is very important,
but it actually does not fundamentally define the interface. In my experience
people rarely think about it this way, but it ought to be obvious: libraries
such as newt can be used to create "gui-esque" programs in text mode (think of
the old Debian installer as an example), while there are graphical programs
that behave very much like textmode ones (think of text editors). Emacs is a
good example of software which blurs this line; emacs simultaneously has traits
of "TUI" and "GUI" and is often preferred in graphical mode as a result.
To navigate this confusion, I use the terms "graphics mode" and "text mode" to
refer strictly to the technical output mechanism---whether raster data or text
is sent to the video adapter. Think about it like the legacy VGA modes: the
selection of graphics or text mode is important to user experience and imposes
constraints on interface design, but does not fundamentally determine the type
of interface that will be presented.
What does? Well, that's a difficult question to answer, in part because of the
panoply of approaches to GUIs. Industry and researchers in HCI tend to use
certain useful classifications, though. The first, and perhaps most important,
is that of a functional UI versus an object-oriented UI. Do not get too tangled
in thinking of these as related to functional programming or OO programming,
as the interface paradigm is not necessarily coupled to the implementation.
A functional user interface is one that primarily emphasizes, well, functions.
A command interpreter, such as a shell, is a very functional interface in that
the primary element of interaction is the function, with data existing in
the context of those functions. On the other hand, a modern word processor is an
object oriented interface. The primary element of interaction is not functions
but the data (i.e. objects); the available functions are presented in the
context of that data.
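To make the dichotomy concrete, here is a toy Python sketch (all names are
invented; this is not any real program's interface code). It shows the same
rename operation exposed first through a function-centric command loop, then
through an object-oriented style in which the user selects an object and is
offered the operations that apply to it.

    # Toy illustration of the functional vs. object-oriented interface styles.
    files = {"notes.txt": "some text"}

    # Functional style: the user invokes a command and names the data.
    def command_loop():
        cmd = input("> ")                  # e.g. "rename notes.txt todo.txt"
        verb, *args = cmd.split()
        if verb == "rename":
            old, new = args
            files[new] = files.pop(old)

    # Object-oriented style: the user first selects an object, then is offered
    # the operations that make sense for it.
    class FileObject:
        def __init__(self, name):
            self.name = name
        def available_actions(self):
            return ["rename", "delete"]    # what a context menu would display
        def rename(self, new):
            files[new] = files.pop(self.name)
            self.name = new

    selected = FileObject("notes.txt")     # "clicking" the object
    print(selected.available_actions())    # functions offered in the data's context
    selected.rename("todo.txt")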
In a way, this dichotomy actually captures the "GUI vs TUI" debate better than
the actual difference between graphics and text mode. Text mode applications
are usually, but not always, functional, while graphical applications are
usually, but not always, object oriented. If you've worked with enough
special-purpose software, say in the sciences, you've likely encountered a
graphical program which was actually functional rather than object oriented,
and found it to be a frustrating mess.
This has a lot to do with the discovery and hiding of functionality and data.
Functional interfaces tend to be either highly constrained (e.g. they are only
capable of a few things) or require that the bulk of functionality be hidden,
as in the case of a typical shell where users are expected to know the
available functions rather than being offered them by the interface. Graphical
software which attempts to offer a broad swath of functionality, in a
functional paradigm, will have a tendency to overwhelm users.
Consider the case of Microsoft Word. I had previously asserted that word
processors are usually an example of the object oriented interface. In
practice, virtually all software actually presents a blend of the two
paradigms. In the case of Word, the interface is mostly object-oriented, but
there is a need to present a large set of commands. Traditionally this has been
done by the means of drop-down menus, which date back nearly to the genesis of
raster computer displays. This is part of the model or toolkit often called
WIMP, meaning Windows, Icons, Menus, Pointer. A very large portion of graphics
mode software is WIMP, and the WIMP concept today is exemplified by many GUI
development toolkits which are highly WIMP-centric (Windows Forms, Tk, etc).
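To make the WIMP point concrete, here is a minimal sketch using tkinter, the
Tk binding that ships with Python (Tk being one of the toolkits named above);
the window title and menu entries are placeholders rather than any particular
program's menus.

    # Minimal WIMP-style window with a conventional menu bar, built in tkinter.
    import tkinter as tk

    root = tk.Tk()
    root.title("WIMP sketch")

    menubar = tk.Menu(root)

    file_menu = tk.Menu(menubar, tearoff=0)
    file_menu.add_command(label="Open...")
    file_menu.add_separator()
    file_menu.add_command(label="Exit", command=root.destroy)
    menubar.add_cascade(label="File", menu=file_menu)   # File conventionally first

    help_menu = tk.Menu(menubar, tearoff=0)
    help_menu.add_command(label="About")
    menubar.add_cascade(label="Help", menu=help_menu)   # Help conventionally last

    root.config(menu=menubar)
    root.mainloop()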
If you used Office 2003 or earlier, you will no doubt remember the immense
volume of functionality present in the menu bar. This is an example of the
feature or option overload that functional interfaces tend to present unless
functionality is carefully hidden. It makes an especially good example because
of Microsoft's choice in 2007 to introduce the "ribbon" interface. This was a
remarkably controversial decision (for the same reason that any change to any
software ever is controversial), but at its core it appears to have been an
effort by Microsoft to improve discoverability of the Office interface through
contextual hiding of the menus. Essentially, the ribbon extends the object
oriented aspect of the interface to the upper window chrome, which had
traditionally been a bastion of functional (menu-driven) design.
Menu-driven is another useful term here, although I tend to prefer "guided" as
a term instead (this is a term of my own invention). Guided interfaces are
those that accommodate novice or infrequent users by clearly expressing the
available options. Very frequently this is by means of graphical menus, but
there are numerous other options, one of which we'll talk about shortly. The
most extreme form of a guided interface is the wizard, which despite being
broadly lambasted for Microsoft's particularly aggressive use in earlier
Windows versions has survived in a great many contexts. A much more relaxed
form would be the "(Y/n)" type hints often shown in textmode applications.
"Abort, Retry, Fail?," if you think about it, is a menu . This guidance is
obviously closely related to discoverability, basically in the sense that
non-guided interfaces make little to no attempt at discoverability (e.g. the
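As a toy illustration of guidance in text mode, here is a short Python sketch
in the spirit of numbered menus and "(Y/n)" hints; the options themselves are
invented. The interface lists what it can do, so the user needs no prior
knowledge of the available commands.

    # Toy guided text-mode prompt: available options are displayed, and a
    # default choice is hinted, rather than assumed to be known.
    options = ["Retry the operation", "Skip this file", "Abort"]

    for number, text in enumerate(options, start=1):
        print(f"{number}. {text}")

    choice = input("Select an option [1]: ").strip() or "1"
    print("You chose:", options[int(choice) - 1])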
Another useful term is the direct manipulation interface. Direct manipulation
is a more generalized form of WYSIWYG (What You See Is What You Get). Direct
manipulation interfaces are those that allow the user to make changes and
immediately see the results. Commonly this is done by means of an interface
metaphor in which the user directly manipulates the object/data in a fashion
that is intuitive due to its relation to physical space. For example,
resizing objects using corner drag handles. Direct manipulation interfaces are
not necessarily WYSIWYG. For example, a great deal of early graphics and word
processing software enabled direct manipulation but did not attempt to show
"final" output until so commanded (WordPerfect, for example).
This has been sort of a grab basket of terminology and has not necessarily
answered the original question (what is a GUI?). This is partially a result
of my innate tendency to ramble but partially a result of the real complexity
of the question. Interfaces generally exist somewhere on a spectrum from
functional to object-oriented, from guided to un-guided, and in a way from
text mode to graphics mode (consider e.g. the use of Curses box drawing to
replicate dialogs in text mode).
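That curses point can be made literal: a few lines of Python using the
standard-library curses module (so Unix-like terminals only) are enough to draw
a box-drawn dialog in text mode. The message text here is arbitrary.

    # Draw a bordered "dialog box" in text mode using curses box drawing,
    # then wait for a keypress to dismiss it.
    import curses

    def main(stdscr):
        curses.curs_set(0)
        height, width = 7, 40
        y = (curses.LINES - height) // 2
        x = (curses.COLS - width) // 2
        win = curses.newwin(height, width, y, x)
        win.box()                                   # the box-drawn "chrome"
        win.addstr(2, 2, "Save changes before quitting?")
        win.addstr(4, 2, "[ OK ]    [ Cancel ]")
        win.refresh()
        win.getch()

    curses.wrapper(main)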
My underlying contention, to review, is this: when people talk about "GUI vs
TUI," they are usually referring not to the video mode (raster or text) but
actually to the interface paradigm, which tends to be functional or object
oriented, respectively, and unguided or guided, respectively. Popular
perceptions of the GUI vs. TUI dichotomy, even among technical professionals,
are often more a result of the computing culture (e.g. the dominance of the
Apple-esque WIMP model) than technical capabilities or limitations of the two.
What I am saying is that the difference between GUI and TUI is a cultural one.
Interface Standardization: CUA
In explaining this concept that the "GUI vs TUI" dichotomy is deeper than the
actual video mode, I often reach out to a historic example that will be
particularly useful here because of its importance in the history of the
GUI---and especially the history of the GUIs we use today. That is IBM
Common User Access, CUA.
CUA is not an especially early event in GUI history but it's a formative one,
and it's useful for our purposes because it was published at a
time---1987---when there were still plenty of text-only terminals in use. As
a result, CUA bridges the text and raster universes.
The context is this: by the late '80s, IBM considers itself a major player
in the world of personal computers, in addition to mainframes and mid/minis.
Across these domains existed a variety of operating systems with a variety of
software. This is true even of the PC, as at this point in time IBM is
simultaneously supporting OS/2 and Windows (2). While graphical interfaces
clearly existed for these systems, this was still an early era for raster
displays, and for the most part IBM still felt text mode to be more important
(it was the only option available on their cash cow mainframes and minis).
Across operating systems and applications there was a tremendous degree of
inconsistency in basic commands and interactions. This was an issue in both
graphical and textual software but was especially clear in text mode where
constraints of the display meant that user guidance was usually relatively
minimal. We can still clearly see this on Unix-like operating systems where
many popular programs are ported from historical operating systems with
varying input conventions (to the extent they had reliable conventions),
and few efforts have been made to standardize. Consider the classic problem
of exiting vim or emacs: each requires a completely different approach with
no guidance. This used to be the case with essentially all software.
CUA aimed to solve this problem by establishing uniform keyboard commands and
interface conventions across software on all IBM platforms. CUA was developed
to function completely in text mode, which may be somewhat surprising
considering the range of things it standardized.
The most often discussed component of CUA is its keyboard shortcuts. Through a
somewhat indirect route (considering the failure of the close IBM/Microsoft
collaboration), CUA has been highly influential on Windows software. Many of
the well-known Windows keyboard commands come from CUA originally. For example,
F1 for help, F3 for search, F5 to refresh. This is not limited just to the F
keys, and the bulk of common Windows shortcuts originated with CUA. There are,
of course, exceptions, with copy and paste being major ones: CUA defined
Shift+Delete and Shift+Insert for cut and paste, for example. Microsoft made a
decision early on to adopt the Apple shortcuts instead, and those are the
Ctrl+C/Ctrl+V/Ctrl+X we are familiar with today. They have been begrudgingly
adopted by almost every computing environment with the exception of terminals
and Xorg (but are then re-implemented by most GUI toolkits).
The keyboard, though, is old hat for text mode applications. CUA went a great
deal further by also standardizing a set of interactions which are very much
GUI by modern standards. For example, the dialog box with conventional options
of "OK" and "OK/Cancel" come from CUA, along with the ubiquitous menu sequence
of File first and Help last.
While graphical by modern standards, these concepts of drop-down menus and
dialog boxes were widely implemented in text mode by IBM. From a Linux
perspective, this is rarely seen and would likely be a bit surprising. Why is that?
I contend that there is a significant and early differentiation between IBM
and UNIX interfaces that remains highly influential today. While today the
dichotomy is widely viewed as philosophical, at the time it was far more practical.
UNIX was developed inside of AT&T as a research project and then spread
primarily through universities and research organizations. Because it was
viewed primarily as a research operating system, UNIX was often run on whatever
hardware was available. The PDP-11, for example, was very common. Early on,
most of these systems were equipped with teletypewriters and not video
terminals. Even as video terminals became common, there were a wide variety
in use with remarkably little standardization, which made exploiting the
power and flexibility of the video terminal very difficult. The result is that,
for a large and important portion of its history, UNIX software was built
under the assumption that the terminal was completely line-oriented... that
is, no escape codes, no curses.
IBM, on the other hand, had complete control of the hardware in use. IBM
operating systems and software were virtually always run on both machines and
terminals that were leased from IBM as part of a package deal. There was
relatively little fragmentation of hardware capabilities and software
developers could safely take full advantage of whatever terminal was standard
with the computer the software was built for (and it was common for software to
require a particular terminal).
For this reason, IBM terminal support has always been more sophisticated than
UNIX terminal support. At the root was a major difference in philosophy. IBM
made extensive use of block terminals, rather than character terminals.
For a block terminal, the computer would send a full "screen" to the terminal.
The terminal operated independently, allowing the user to edit the screen,
until the user triggered a submit action (typically by pressing enter) which
caused the terminal to send the entire screen back to the computer and await
a new screen to display.
This mechanism made it very easy to implement "form" interfaces that required
minimal computer support, which is one of the reasons that IBM mainframes
were particularly prized for the ability to support a very large number of
terminals. In later block terminals such as the important 3270, the computer
could inform the terminal of the location of editable fields and even specify
basic form validation criteria, all as part of the screen sent to the terminal.
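For a rough sense of the interaction model, here is a conceptual toy in Python
(emphatically not the real 3270 data stream): the host composes an entire
screen with named fields, the "terminal" lets the user fill them in with no
host involvement, and only a submit sends the whole screen back.

    # Conceptual toy model of a block terminal round trip.
    def host_build_screen():
        # The host describes a whole screen, including the editable fields.
        return {"title": "CUSTOMER INQUIRY",
                "fields": {"account": "", "name": ""}}

    def terminal_session(screen):
        # The terminal works alone: the user edits fields with no host traffic.
        print(screen["title"])
        for field in screen["fields"]:
            screen["fields"][field] = input(f"{field}: ")
        return screen                      # pressing Enter sends it all back

    def host_process(screen):
        # Only now does the host see anything; it would reply with a new screen.
        print("host received:", screen["fields"])

    host_process(terminal_session(host_build_screen()))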
Ultimately, the block terminal concept is far more like the modern web browser
than what we usually think of as a terminal. Although the business logic is all
in the mainframe, much of the interface/interaction logic actually runs locally
in the terminal. Because the entire screen was sent to the terminal each time,
it was uniformly possible to update any point on the screen, which was not
something which could be assumed for a large portion of UNIX's rise to prominence.
As a result, the IBM terminal model was much more amenable to user guidance
than the UNIX model. Even when displaying a simple command shell, IBM terminals
could provide user guidance at the top or bottom of the screen (and it was
standard to do so, often with a key to toggle the amount of guidance displayed
to gain more screen space as desired). UNIX shells do not do so, primarily for
the simple reason that the shells were developed when most machines were not
capable of placing text at the top or bottom of the screen while still being
able to accept user input at the prompt.
Of course curses capabilities are now ubiquitous through the magic of every
software terminal pretending to be a particularly popular video terminal from
1983. Newer software like tmux usually relied on this from the start, and older
mainstays like vi have had support added. But the underlying concept of the
line-oriented shell ossified before this happened, and "modern" shells like
zsh and fish have made only relatively minor inroads in the form of much more
interactive assisted tab completion.
IBM software, on the other hand, has been offering on-screen menus and guidance
since before the C programming language. Well prior to CUA it was typical for
IBM software to use interactive menus where the user selects an option,
hierarchical/nested menus, and common commands via F keys which were listed
at the bottom of the screen for the user's convenience.
While many IBM operating systems and software packages do offer a command line,
it's often oriented more towards power users and typical functions were all
accessible by a guided menu system. Most IBM software, especially by the '80s,
provided an extensive online help facility where pressing F1 retrieved
context-aware guidance on filling out a particular form or field. Indeed, the
CUA concept of an interactive help system where the user presses a Help icon
and then clicks on a GUI element to get a popup explanation---formerly common
in Windows software---was a direct descendant of the IBM mainframe online help.
The point I intend to illustrate here is not that IBM mainframes were
surprisingly sophisticated and cool, although that is true (IBM had many
problems but for the most part the engineering was not one of them).
My point is that the modern dichotomy, debate, even religious war between the
GUI and TUI actually predates GUIs. It is not a debate over graphical vs text
display, it is a debate over more fundamental UI paradigms. It is a fight of
guided but less flexible interfaces versus unguided but more powerful ones.
It is a fight of functional interfaces versus object oriented ones. And perhaps
most importantly, it is a competition of "code-like" interfaces versus direct manipulation ones.
What's more, and here is where I swerve more into the hot take lane, the
victory of the text-mode, line-oriented, shell interface in computer science
and engineering is not a result of some inherent elegance or power. It is an
artifact of the history of computing.
Most decisions in computing, at least most meaningful ones, are not by design.
They are by coincidence, simultaneously haphazard but also inevitable in
consideration of the decades of work up to that point. Abstraction, it turns
out, is freeing, but also confining. Freeing in that it spares the programmer
thinking about the underlying work, but confining in that it pervasively, if
sometimes subtly, steers all of us in a direction set in the '60s when our
chosen platform's lineage began.
This is as true of the GUI as anything else, and so it should be no surprise
that IBM's achievements were highly influential, but simultaneously UNIX's
limitations were highly influential. For how much time is spent discussing the
philosophical advantages of interfaces, I don't think it's a stretch to say
that the schism in modern computing, between the terminal and everything else,
is a resonating echo of IBM's decision to lease equipment and AT&T's decision
to make UNIX widely available to academic users.
Some old GUIs
Now that we've established that the history of the GUI is philosophically
complex and rooted in things set in motion before we were born, I'd like to
take a look at some of the forgotten branches of the GUI family tree: GUI
implementations that were influential, technically impressive, or just weird.
I've already gone on more than enough for one evening, though, so keep an eye
out for part 2 of... several.
 It is going to take an enormous amount of self-discipline to avoid turning
all of this into one long screed about Gnome 3, perhaps the only software I
have ever truly hated. Oh, that and all web browsers.
 Menus in text mode applications are interesting due to the surprising lack
of widespread agreement on how to implement them. There are many, many
variations across commonly used software from limited shells with tab
completion to what I call the "CS Freshman Special," presenting a list of
numbered options and prompting the user to enter the number of their choice.
The inconsistency of these text mode menus gets at exactly the problem IBM was
trying to solve with CUA, but then I'm spoiling the end for you.
 This is a somewhat more academic topic than I usually verge into, but it
could be argued and often is that graphical software is intrinsically
metaphorical. That is, it is always structured around an "interface metaphor"
as an aid to users in understanding the possible interactions. The most basic
interface metaphor might be that of the button, which traditionally had a
simple 3D raised appearance as an affordance to suggest that it can be pressed
down. This is all part of the puzzle of what differentiates "GUI" from "TUI":
graphical applications are usually, but not always, based in metaphor. Textmode
applications usually aren't, if nothing else due to the constraints of text,
but it does happen.
 This did lead to some unusual interaction designs that would probably not
be repeated today. For example, in many IBM text editors an entire line would
be deleted (analogous to vim's "dd") by typing one or more "d"s over the line
number in the left margin and then submitting. The screen was returned with
that line removed. This was more or less a workaround for the fact that the
terminal understood each line of the text document to be a form field, and so
there was some jankiness around adding/removing lines. Scrolling similarly
required round trips to the computer.
Let's talk about a bit of telephone history. Again. Normally, I am more
interested in the switching equipment and carriers and not so much in the
instruments---that is, the things that you plug in at the end of the line.
There are a few that really catch my eye, though, and one of them is of course
the phenomenon of the trading turret.
A trading turret is a
specialized telephone-like device typically used by day traders. The somewhat
useless Wikipedia article describes a trading turret as being a specialized
key system, which is useless to most people today as key systems are no
longer common and few people know what they are. Nonetheless, it is basically
true. I will leave out much discussion of key systems here because I will
probably talk about them in depth in the future, but a basic explanation is
that a key system allows users at multiple telephone instruments to each access
all outside lines. This was a popular setup for businesses that were large
enough to need multiple outside lines but too small to have a dedicated telephone
operator, from their introduction in the 1930s to the development of affordable
small PABXs in the '90s.
Key systems still occasionally appear today and the topic can become somewhat
muddled because late key systems tended to have "PABX features" and many PABXs,
especially in the IP world, have "key system features." But the basic
difference can be explained something like this: a PABX connects multiple users
to each line, while a key system connects multiple lines to each user. They
were often used for similar purposes with the difference being largely one of
implementation, but key systems do have their specific niches.
One of those is the item we for some reason call a turret. The term turret
is used today almost exclusively to refer to the item made for the securities
industry, a trading turret. These formidable tanks of phones often provide
multiple handsets and speakers and are more or less identified by a touchscreen
or large set of soft buttons that allow one-touch access to a large number of lines and contacts.
These are superficially similar to a large set of line buttons such as is seen
on the "receptionist sidecar" available for many business phones---an extra
plug-in module that offers a big set of line buttons which can be configured as
speed-dials or even one-touch unattended transfers, so that a receptionist can
easily transfer calls or call up for people without having to dial extensions
all the time. However, turrets are more than just phones with a lot of line buttons.
It kind of raises the question: what is a trading turret? What really
differentiates one from, say, a digital PABX phone with a sidecar?
This is just the kind of thing I contemplate in my private moments, but the
issue came to the front of my mind when someone provided a mailing list I am a
member of with an interesting document. It is the 1974 Bell System Practice
(BSP, basically a Bell System standard operating procedure) for the SAC Main
Operating Base Turret. BSP 981-202-100 if you are particularly interested.
The document describes a desk-wide system with ten color-coded handsets used at
a Strategic Air Command base to give a communications operator quick access to
primary and redundant versions of multiple communications lines. For flavor,
two of these handsets were red and corresponded to primary and secondary
four-wire leased line circuits used for the SAC Primary Alerting System, used
to deliver emergency action messages. Here we have a real red telephone, but
not to Moscow.
This makes it clear that the term "turret" is not specific to the finance
industry, which was actually a bit of a surprise to me. Where, then, did we get
the turret as a type of telephone instrument?
The first usage I have found is the Order Turret No. 1, introduced by the Bell
system sometime in the early 1930s (exact date unclear). The No. 1 is
essentially a small manual (cord-and-plug) exchange that accommodates multiple
user "positions." A series of subsequent Order Turrets, up to at least the No.
4, were produced in the first half of the century.
I was initially a bit unclear on the application of these devices (I found BSPs
on them, but these have a great way of describing maintenance and repair in
detail without ever saying what the thing is for) until I found an article in
the Bell Laboratories Record, an employee magazine, from 1938. The article
describes the use of the No. 4, now a more compact design which can be scaled
to an arbitrary number of operators, as it was used at Macy's. It is called
an Order Turret, it turns out, because it is used to place orders.
The system looks something like this: 20 (or another number, but we'll say 20,
which was the capacity of the apparently common Order Turret No. 2) outside
lines are assigned sequential numbers at the telephone exchange with busy
fall through such that a call to the first line, if it is in use, will connect
to the next line, and so on until a free line is found. At the turret, the call
"appears" on a jack in front of each attendant. Whichever attendant is not
currently busy can insert a plug to answer the call. In this way, the turret
system allows a pool of attendants to collectively answer a pool of incoming calls.
But there's more: these attendants are taking telephone orders in a department
store, where the actual stock is out on the floor in various departments. So if
a customer asks about a particular item, the attendant can insert a plug into a
jack for an internal line to that department, ringing a phone on the floor so
that the attendant can speak with a salesperson to confirm availability and
have the item set aside. The turret is used not only to answer calls, but to
simultaneously manage multiple calls between different parties.
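Mechanically, the busy fall-through arrangement is what we would now call a
hunt group. As a toy sketch in Python (line counts invented), an incoming call
simply lands on the first idle line in the sequence:

    # Toy model of busy fall-through ("hunting") across sequential order lines.
    lines_in_use = [True, True, False, False]    # four order lines, two busy

    def place_call(lines):
        for number, busy in enumerate(lines, start=1):
            if not busy:
                lines[number - 1] = True
                return number              # the line the exchange connected
        return None                        # all lines busy: caller hears busy tone

    print("call connected on line", place_call(lines_in_use))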
So far as I can tell, this is the defining feature of a turret: a turret
isn't just used to handle multiple lines (that's a key telephone). A turret
isn't just used to have rapid access to many speed dials (that's a receptionist
sidecar). A turret is used to make multiple simultaneous calls, by someone
who must quickly relay information between multiple parties. Like the telephone
order attendant at an old-fashioned department store, the person on
communications duty at a SAC command, or an investment banker.
This explains of course why both legacy and modern turrets often feature
multiple handsets (the original Order Turrets did not, but the attendant wore a
headset that they would move between jacks instead). As telephone systems have
become more sophisticated, turrets have as well, and modern turrets often use
IP connectivity to provide a mix of features like squawk boxes (permanently
open conference lines), presence information, and a feature with various names
(sometimes called automatic ringdown although this is not quite accurate) that
allows one trader at a turret to call another trader at a turret with no
ringing---the call just connects immediately, much like an intercom. All of
this can be done very quickly, because the turret provides a large set of
pre-programmed buttons for all the people the user is likely to want to contact.
You can already see that the application I've described for these early
turrets, of order taking, could be handled differently. An obvious enhancement
is to actively distribute calls to available attendants instead of presenting
calls at all attendant stations and waiting for someone to pick up. Indeed, the
Order Turret No. 4 did exactly this, actively "pushing" each incoming call to
an available attendant. This increase in sophistication, to actively routing
calls, really blurred the line between the order turret and the PABX, which
Bell was well aware of. The No. 4 was less an order turret in the sense of
previous designs, and more just a feature of a PABX.
The order lines, instead of being dedicated lines going straight to turrets,
were just the normal incoming lines of the business PABX. The business PABX
allocated calls to attendants sitting at the No. 4 stations. This is basically
how modern inward call center systems work, and it seems that over time the
concept of the "order turret" faded away as these call center queue systems
became just another feature of a PABX.
Turrets found few niches in which to hold on. The SAC command turret tells a
bit of a story about the close relationship between the Cold War defense
complex and the Bell System. Large portions of SAC infrastructure were
essentially contracted to AT&T, and so AT&T apparently drew on their background
with order turrets in developing the concept for the SAC communications system,
which in its totality consisted of a dizzying number of two- and four-wire
leased lines and radio links unified by these eight-foot-wide turrets. They
even controlled the sirens.
No doubt there were other turrets designed by Ma Bell, although I have
struggled to find them. The Order Turret series seems to have died away by the
mid-century, but the SAC command turret likely remained in use into the '80s at
least. Can we find any others?
An obvious application for a turret-like system is in police and fire dispatch,
where in many smaller communities emergency calls were taken directly by the
dispatcher who then had to relay information on the radio. Indeed, various
vendors have sold telephone equipment for dispatch and public safety answering
points (PSAPs, where 911 is answered) described as turrets, but the terminology
does not seem to have caught on as strongly in that field. I would suspect this
is because radio equipment was often more important in these early dispatch
centers than telephone, and indeed the complex communications consoles in
public safety dispatch centers usually come from radio vendors (e.g. Motorola)
rather than telephone. Radio vendors usually just call these "dispatch
consoles" and they have gone through a similar evolution from electromechanical
There is one exception which stands out: in the city of Boston, the central
dispatch office is apparently colloquially referred to as "the turret." The
recordings of police radio traffic, sometimes used as evidence in court, are
often referred to as "turret tapes" in Massachusetts. I am not certain that the
terms are related but it would seem likely; I would speculate that at some
point in history the concept of dispatchers using turrets turned into
dispatchers working at the turret.
The funny thing here is that I've gone on for a long time without addressing
my original interest: why is it called a turret? Well, after all the digging
through BSPs and newspaper archives and a whole detour through court records,
I still haven't quite answered that question. No one seems to have written
down an etymology.
All I can offer is this theory: while turret most directly refers to a tower,
through the path of gun turrets it has also come to refer to something that
rotates (e.g. in the case of a turret lathe). The original Order Turrets
consisted of a rectangular table around which four attendants would sit, two on
each side. Perhaps they were called turrets because the attendants sat in a
circle and the duty to answer the next call rotated around.
Just a guess.
 Today we usually just use the term PBX, for Private Branch eXchange. I
specify PABX, for Private Automatic Branch eXchange, in the historical
context because for decades the term "PBX" referred mostly to manual boards
with dedicated operators, which used to be common in businesses, hotels, etc.
A PABX is an automatically switched system, basically the PBX equivalent of
the introduction of dialing.
 I am leaving the person and list here anonymous out of respect for the
community's privacy, although it is an excellent resource if you're interested
in these topics and I feel a bit bad for not giving credit. The name of the
list rhymes with, uhh, OldDoorBombs.
I've mentioned LDAP several times as of late. Most recently, when I said I
would write about it. And here we are! I will not provide a complete or
thorough explanation of LDAP because doing so would easily fill a book, and
I'm not sure that I'm prepared to be the kind of person who has written a
book on LDAP. But I will try to give you a general understanding of what
LDAP is, how it works, and why it is such a monumental pain in the ass.
I've also mentioned it, though, in the context of the OSI protocols. This is
because LDAP is a direct descendant of one of the great visions of the OSI
project: a grand, unified directory infrastructure with global addressability
and integration with the other OSI protocols. This is an example of the
ambition and failure of the OSI concept: in practice, directory services have
proven to be fairly special-purpose, limited to enterprise environments, and
intentionally limited in scope (e.g. kept internal for security reasons). OSI
contemplated a directory infrastructure which was basically the opposite in
every regard. It did not survive to the modern age, except in various bits
and pieces which are still widely used in... once again, crypto infrastructure.
Common crypto certificate formats are ASN.1 serialized (as we mentioned last
week) because they are from the OSI directory service, X.500.
Before we get into the weeds, though, let's understand the high level
objectives. What even is a directory, or a directory service?
It's a digital telephone directory.
This answer is so simple and naive that it almost cannot be true, and yet it
is. Remember that the whole OSI deal was in many ways a product of the
telephone industry, and that the telephone industry has always favored more
complex, powerful, integrated solutions over simpler, independent, but
composable solutions. One thing the telephone industry knew well, and had
a surprisingly sophisticated approach to, was the white pages.
If you think about it, the humble telephone directory was a surprisingly
central component of the bureaucracy of the typical 1970s enterprise. Today,
historians often review archived institutional and corporate telephone
directories as a way to figure out the timelines of historical figures.
Corporate histories often use the telephone directory as a main organizing
source, since it documents both the changing staff and the changing structure
of the organization (traditional corporate directories often had an org chart
in the front pages to boot!).
Across the many functional areas of a business, the telephone directory was a
unifying source of truth---or authority---for the structure and membership of
the organization. For consumer telephone service, directories had a less
complex structure but were an undertaking in their own way due to the sheer
number of subscribers. Telephone providers put computers to work at the job of
collecting, sorting, and printing their subscribers' directory entries very
early on. The information in the published white pages was an excerpt or
report from the company's subscriber rolls, and so was closely tied to other
important functions like billing and service management.
Inside the industry, the directory referred to all this and more: the unified,
authoritative information on the users of the system.
This concept was extended to the world of computing in the form of X.500 and
its accompanying OSI network protocols for access to X.500 information. At its
root, LDAP is an alternative protocol to access X.500, and so there are
substantial similarities between X.500 and the X.500-like substance that we now
refer to as an LDAP server. In fact, there is no such thing as an "LDAP server"
in the sense that LDAP remains a protocol to access an X.500 compliant
directory, but in practice LDAP is now usually used with backends that were
designed specifically for LDAP and avoid much of the complexity of X.500 in the
sense of the OSI model. The situation today is such that "X.500" and "LDAP" are
closely related concepts which are difficult to fully untangle; X.500 is very
much alive and well if you accept the caveat that it is only used in the
constrained form of corporate directories accessed by alternative methods.
The basic structure of X.500 is called the Directory Information Tree, or DIT.
The DIT is a hierarchical database which stores objects that possess
attributes, which are basically key-value pairs belonging to the object.
Objects can be queried for based on their attributes, using a form called the
Distinguished Name. DNs are made up of a set of attributes which uniquely
identify an object at each level of the hierarchy. For example, an idealized
X.500 DN, in the same notation as used by LDAP/LDIF (notation for DNs varies by
X.500 protocol), looks like this: cn=J. B. Crawford,ou=Blogger,o=Seventh
Standard,c=us. This DN identifies an object by, from top to bottom, country,
organization, organizational unit, and common name. Common name is an attribute
which contains a human-readable name for the object and is, conventionally,
widely used for the identification of that object.
Note some things about this concept: first, the structure is rooted in the
US. How does the namespace work, exactly? Who determines organizations under
countries? Originally, X.500 was intended to be operated much like DNS, as a
distributed system of many servers operating a shared namespace. Space in that
namespace would be managed through a registry, which would be SRI or Network
Solutions or whatever.
Second, this whole concept of identifying objects by attributes seems like it's
very subject to conventions. It is, but you must resist the urge to hear
"hierarchical store of objects with attributes" and think of X.500 as being a
lightly-structured, flexible data store like a modern "NoSQL." In reality it is
not: X.500 is highly structured through the use of schemas.
We mostly use the term "schema" when talking about relational databases or
markup languages. X.500 schemas serve the same function of describing the
structure of objects in the DIT but look and feel different because they are
highly object-oriented. That is, an X.500 schema is made up of classes. Classes
can be inherited from other classes, in which case their attributes are merged.
As a result, there is not only a hierarchy of data but also a hierarchy of
types. Objects can
be instances of multiple classes, in which case they must provide the
attributes of all of those classes, which may overlap. It's seemingly simple
but can get confusing very fast.
Let's illustrate this by taking a look at a common X.500 class:
organizationalPerson, or 2.5.6.7. What's up with that number? Remember the
whole SNMP thing? Yes, X.500 makes use of OIDs to, among other things, identify
classes. That said, we commonly (and especially in the case of LDAP) deal only
with their names.
While organizationalPerson does not require any attributes, it suggests things
like title, telephoneNumber, telexNumber, and postalAddress.
You will notice that this list is dated, and missing obvious things like name.
The former is because it is in fact very old; the latter is because
organizationalPerson is a subclass of Person and so is intended to be applied
to objects only in addition to that more basic class. Namely,
organizationalPerson is usually applied to objects alongside Person, which has
some basics like:
cn (common name, required)
sn (surname, required)
telephoneNumber
assistant (as in, reference to this person's assistant)
You will notice that this class both overlaps with organizationalPerson on
telephoneNumber, but also has some odd things like assistant that seem to be
specific to an organization. Why the two different classes, then? Conway
observed that the structure of systems resembles the structure of their
creators; X.500 is no exception. organizationalPerson was written more as part
of an effort to represent organizations than as part of an effort to represent
people, and the two efforts were not as well harmonized as you would hope.
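To make this concrete, an entry using these two classes might look something
like this in the LDIF notation (the values are invented for illustration, and a
real server's schema may demand more):

    dn: cn=J. B. Crawford,ou=Blogger,o=Seventh Standard,c=us
    objectClass: person
    objectClass: organizationalPerson
    cn: J. B. Crawford
    sn: Crawford
    title: Blogger
    telephoneNumber: +1 505 555 0100

Note that the entry simply lists every class it belongs to and carries the
attributes those classes require or suggest.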
An object has a "primary" or "core" type. This is referred to as its structural
class, and the class itself must be specially marked as structural. This is
important for several reasons that are mostly under the hood of the X.500
implementation, but it's useful to know that Person is a structural class... so
an X.500 entry representing a human being should have a core type of Person,
but in most cases will have multiple auxiliary types bolted on to provide
additional attributes.
That's a lot about the conceptual design of X.500... or really just the core
concept of the data structure, ignoring basically the entire transactional
concept which is more complicated than you could ever imagine. It's enough to
get more into LDAP, though.
Before we go fully into the LDAPverse, though, it's useful to understand how
LDAP is really used. This swerves right from OSI to one of my other favorite
topics, Network Operating Systems.
For a group of computers to act like a unified computing environment, they must
have a central concept of a user. This is most often thought of in the context
of authentication and authorization, but a user directory is also necessary to
enable features like messaging. Further, the user directory itself (e.g. the
ability to use the computer as a telephone directory) is considered a feature
of a network computing environment in its own right.
In almost all network computing environments, this user directory is descended
from X.500. This is seen in the form of Microsoft Active Directory for Windows
(Windows domain members actually talk to the AD domain controller through a mix
of Microsoft protocols, with authentication handled by Kerberos or the older NT
LAN Manager, NTLM, rather than by LDAP itself), and LDAP for Linux and MacOS (we
will not discuss NIS for Linux now, but perhaps in the future).
In these systems, the directory server acts as the source of basic information
on the user. Consider another important LDAP class, PosixAccount. PosixAccount
adds attributes like uid, homeDirectory, and gecos that reflect the user
account metadata expected by POSIX. It is possible to perform
authentication against LDAP as well, but it comes with limitations and security
concerns that make it uncommon in practice for operating systems. Both Windows
and Unix-like environments now generally use Kerberos for authentication.
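As an illustration, a user entry carrying PosixAccount might look roughly like
this (LDIF again, all values invented):

    dn: uid=jdoe,ou=users,dc=example,dc=com
    objectClass: person
    objectClass: posixAccount
    cn: Jane Doe
    sn: Doe
    uid: jdoe
    uidNumber: 10001
    gidNumber: 10001
    homeDirectory: /home/jdoe
    loginShell: /bin/bash
    gecos: Jane Doe

An operating system configured to use LDAP for user information can resolve
"jdoe" to uid 10001 and a home directory from an entry like this one.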
Many things have changed in the transition from the grand vision of X.500 to
the reality of LDAP for information on user accounts. First, the concept of a
single unified X.500 namespace has been wholly abandoned. It's complex to
implement, and it's not clear that it's something anyone ever wanted, anyway,
as federation of directories between organizations brings significant security
and compliance concerns.
Instead, modern directories usually use DNS as their root organizational
hierarchy. This basically involves cramming shim objects into the DIT that
reflect the DNS hierarchy. The example DN I mentioned earlier would more often
be seen today as cn=J. B. Crawford,dc=computer,dc=rip. dc here is Domain
Component, and domain components are represented in the same order as in DNS
because LDAP uses the same confused right-to-left hierarchical representation
(AD does it the correct way around).
Another major change has been to the structure. The original intention was that
the X.500 hierarchy should represent the structure of the organization. This is
uncommon today, because it introduced a maintenance headache (moving objects
around the directory as people changed positions) and didn't have a lot of
advantages in practice. Instead LDAP objects are more commonly grouped by their
high-level purpose. For example, user accounts are often placed in an OU called
"accounts" or "users." All in all, this marks a more general trend that LDAP
has become a system only for software consumption, and there is minimal concern
today about LDAP being browseable by human users.
So let's consider some details of how LDAP works. First off, LDAP is a binary
protocol that uses a representation based on ASN.1. That said, LDAP is almost
always used with LDAP Data Interchange Format, or LDIF, which is a textual
representation. So it's very common to talk about LDAP "data" and "objects"
in LDIF format, but understand that LDIF is just a user aid and is not how
LDAP data is represented "in actuality."
LDAP provides more or less the verbs you would expect: ADD, DELETE, MODIFY.
These are not especially interesting. The SEARCH operation, however, is where
much of the in-use complexity of LDAP resides. SEARCH is a general-purpose
verb to retrieve information from an LDAP DIT, and it is built to be very
flexible. At its simplest, SEARCH can be invoked with a baseObject (a DN)
and a scope of BaseObject, which just causes the server to return exactly the
object identified by the DN.
In a more complex application, SEARCH can be invoked with a base path
representing a subtree, a scope of wholeSubtree (means what it says), and a
filter. The filter is a prefix-notation conditional statement that is applied
to each candidate object; objects are only returned if the filter evaluates to
true.
We can put these SEARCH concepts together into a very common LDAP SEARCH
application, which is locating a user in a directory. A common configuration
for a piece of software using LDAP for authentication would be:
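(The exact option names vary from one piece of software to the next; the
following is just a sketch, with an invented base DN.)

    base DN: ou=users,dc=example,dc=com
    scope:   subtree
    filter:  (&(objectClass=posixAccount)(uid=$user))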
The $user here is a substitution tag which will be replaced by the user's
username. Confusingly, in the PosixAccount class, uid refers to the user name
while uidNumber is the value we usually refer to as uid.
A real headache comes about with groups. In authorization applications like
RBAC, you commonly want to get the list of groups a user is a member of to make
authorization decisions. There are multiple norms for representing groups in
LDAP. Groups can have a list of accounts which are members, or accounts can
have a list of groups they are a member of. Both are in common use, generally
the former for Windows and the latter for UNIX-likes. This is where the
flexibility of the filter expression becomes important: whatever "direction"
the LDAP server represents the relationship, it's possible to go "the other
way" by querying for the object type that contains the list with a filter
expression that the list must contain the thing you're looking for. Because
finding all users in a group is a less common requirement than finding all
groups a user is in, a lot of LDAP clients in practice make somewhat narrow
assumptions about how to find users but provide a more general (but also more
irritating) configuration for finding group information.
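As a sketch of the filter trick: if the directory stores membership on the
group objects, you can still answer "what groups is jdoe in?" with a subtree
search over the groups, using a filter like this (class and attribute names
here follow common conventions, but your schema may differ):

    (&(objectClass=groupOfNames)(member=uid=jdoe,ou=users,dc=example,dc=com))

or, for POSIX-style groups that list member usernames:

    (&(objectClass=posixGroup)(memberUid=jdoe))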
Another complexity of LDAP in practice is authentication. A last important
LDAP verb is BIND, which is used to assume the identity of a user in the
directory. While anonymous access to LDAP is common, modern directory servers
implement access control and limit access to sensitive values like password
hashes to the users they belong to, for obvious reasons. This means that the
formerly common approach of anonymously querying for a user to get their
password hash and then checking the password should never be seen or heard of
today. Instead, user authentication is done via BIND: the LDAP client attempts
to BIND to the user (as an LDAP object) using the password provided by the
user. If the server allows it, the user apparently provided the correct
password. If the server doesn't allow it, the user better try again. In this
way, the actual authentication method is the authentication method of the LDAP
server itself.
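Here's a rough sketch of the BIND-to-authenticate pattern in Python, using the
third-party ldap3 library (the server address, DN layout, and the library
choice are all assumptions for the sake of illustration):

    from ldap3 import Server, Connection
    from ldap3.core.exceptions import LDAPBindError

    def check_password(username: str, password: str) -> bool:
        # Build the user's DN; real software usually SEARCHes for it first.
        user_dn = f"uid={username},ou=users,dc=example,dc=com"
        server = Server("ldap.example.com", use_ssl=True)
        try:
            # Attempt to BIND as the user. If the server accepts the bind,
            # the password was (apparently) correct.
            conn = Connection(server, user=user_dn, password=password,
                              auto_bind=True)
            conn.unbind()
            return True
        except LDAPBindError:
            return False

Note that this proves the password only to the extent that the LDAP server's
own authentication does; the application never sees a password hash.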
There's a problem, though. Or rather, two. First, for security reasons, it's
not necessarily a great idea to allow users to query for complete group
information, and depending on how group membership is represented it is not
necessarily practical to use access controls to allow a user to access only
the group information they should know about. Second, applications often have
a need to access directory information at points other than when a user is
actively logging in and the application has access to their password. For
obvious reasons it is not a good idea for the application to store the user's
password in plaintext for this purpose.
The solution is an irritating invention usually called a "manager." The manager
is a non-person account (also called a system account) that an LDAP client uses
in order to BIND to the LDAP server so that it is permitted to read information
that is not available for anonymous query. Most commonly this is used for
getting a user's group memberships. This is a particularly common setup because
a lot of applications need access to user group information fairly frequently
and do not strongly abstract their user information access, so they "cache"
group information and update it from the LDAP server periodically---outside of
the context of an authenticating user.
Very frequently this takes the form of periodically "synchronizing" the
application's existing local user database with the LDAP server, a lazy bit
of engineering that causes endless frustration for administrators but is also
difficult to avoid as the reality is that the concepts of "user" and "group"
simply vary far too widely between applications to completely centralize all
user information in one place.
As mentioned earlier, all of the methods of authenticating against LDAP have
appreciable limitations. For this reason, Kerberos is generally considered the
superior authentication method and "real" LDAP authentication is not common at
the OS level. That said, Kerberos configuration and clients are relatively
complex, which is probably the main reason that many non-OS applications still
use direct LDAP authentication.
In practice, directory servers are not usually set up as a standalone package.
Usually they are one facet of a larger directory system or identity management
system. Popular options are Microsoft Active Directory and Red Hat IDM (based
on FreeIPA), but there are a number of other options out there. Each of these
generally implements a directory service alongside a dedicated authentication
service (usually Kerberos because it is powerful and well researched), a name
service (DNS), and some type of policy engine. DNS might initially be
surprising here, as it does not at first glance seem like a related concern.
However, in practice, directory systems represent devices just as much as
people. Because each host needs to have a corresponding directory entry
(particularly important with Kerberos where hosts need the ability to
authenticate to other network services on their own), it's already necessary to
maintain host information in the directory service which makes it a natural
place to implement DNS. DHCP is also sometimes implemented as part of the
directory service because there is overlap between the directory management
functions and basic host management functions of DHCP, but this seems to be
less common today because in enterprise orgs DHCP is more often part of an IPAM
solution (e.g. Infoblox).
You might be surprised to hear that there are all of these inconsistencies and
differences in LDAP implementations considering my claim that X.500 is strongly
typed against schemas. The nature of this contradiction will be obvious to any
DBA: for any non-trivial application, the schema will always be both too
complex and not complex enough. The well-established X.500 and LDAP schemas,
published for example in RFCs, don't have enough fields to express the full
scope of information about users needed in any given application.
Simultaneously, though, they provide so many types and attributes that there
are multiple ways to solve a given problem. Any attempt to reduce one problem
will inevitably make the other worse.
The long history of these systems only makes the problem more complicated, as
there are multiple and sometimes conflicting historic schemas and approaches
and it's hard to get rid of any of them now. For this reason identity
management solutions often come with some sort of "quick ref" documentation
explaining the important aspects of the LDAP schema as they use it, to be used
as an aid in configuring other LDAP clients.
I'm going to call this enough on the topic of LDAP for now... but there will be
a followup coming. For me, this whole discussion of complex enterprise
directory solutions raises a question: can we have the advantages of a directory
service, namely a unified sense of identity, in a consumer environment?
The answer is yes, through the transformation of all software into a monthly
subscription, but I want to talk a bit about the history of attempts at
bringing the dream of the NOS to the home. Microsoft has tried at least a
half dozen times and it has never really worked.
 As an example of this ontological complexity, Microsoft Active Directory is
sometimes referred to as being an LDAP server or LDAP implementation. This is
not true, but it's also not untrue. It is perhaps more accurate to say that
"Active Directory is an implementation of a modified form of X.500 which is
commonly accessed using LDAP for interoperability" but that's a mouthful and
probably still not quite correct.
 Have I written about this here before? While IANA was long operated by Jon
Postel who was famously benevolent, the function of ICANN was tossed around
defense contractors for a while and then handed to Network Solutions, who
turned out to be so comically evil that the power had to be taken away from
them. ICANN didn't turn out much better. It's a whole story.
 Requisite explanatory footnote about network operating systems (NOS): the
term has basically changed in definition midway through computer history.
Today NOS generally refers to operating systems written for network appliances,
like Cisco IOS. Up to the mid-'90s, though, it more commonly referred to a
general-purpose operating system that was built specifically to be used as part
of a network environment, such as Novell Netware. The salient features of NOS
such as centralized user directories, inter-computer messaging, and shared
access to storage and printers are present in all modern operating systems
(sometimes with implementations borrowed from historic NOS) and so the use of
the term NOS in this sense has faded away.
 This whole thing gets into some weird UNIX history, particularly the gecos
field and LDAP's UNIX-nerd cousin NIS. Maybe that'll be a post some day.
 For how closely connected the concept of users and groups seems to be, this
issue of the user->group query being irritatingly difficult is remarkably
common in identity systems, even many modern "cloud" ones. Despite being a
common requirement and one of the conceptually simpler options for
authorization, RBAC does not generally seem to be a first-class concern to the
designers of directories.
 It's possible to use a wide variety of network services for authentication
in this way, by just passing the user's credentials on and seeing if it works.
I have seen a couple of web applications offer "IMAP authentication" in that
way, presumably because small organizations are more likely to have central
email than LDAP.
Very early on in my career as an "IT person," when my daily work consisted
primarily of photocopier and laptop warranty service with a smattering of
Active Directory administration (it was an, uh, weird job), I was particularly
intimidated by SNMP. It always felt like one of those dark mysteries of
computing that existed far beyond my mortal knowledge, like distributed systems.
The good news is that SNMP is actually, as the name suggests, quite simple.
The reason for my SNMP apprehensions is a bit silly from the perspective of
computer science: SNMP makes extensive use of long, incomprehensible numbers.
That is, of course, basically a description of all of computing, but SNMP
exposes them to users in a way that modern software generally tries to avoid.
Today, we're going to learn about SNMP and those numbers. Surprise: they're
an emanation of an arcane component of the OSI stack, like at least 50% of
the things I talk about.
But let's step back and just talk about SNMP at a high level. SNMP was designed
to offer a portable and simple to implement method for a manager (e.g. an
appliance or administrator's workstation) to inspect the state of various
devices and potentially change their configuration. It's intended to be
amenable to implementation on embedded systems, and while it's most classically
associated with network appliances there is a virtually unlimited number of
devices and software packages which expose an SNMP interface.
SNMP often acts as a "lowest common denominator:" it's a simple and old
protocol, so just about everything supports it. This makes it very handy for
getting heterogeneous devices (especially in terms of vendor) into one
monitoring solution, and sometimes allows for centralized configuration as
well, although that gets a lot trickier.
At its core, SNMP belongs to a category of protocols which I refer to as remote
memory access protocols (this is my taxonomy and does not necessarily reflect
that of academic work or your employer). These are protocols which allow a
remote host to read and (possibly subject to access controls) write an emulated
memory address space. This does not necessarily (and often doesn't) have
anything to do with the actual physical or virtual memory of the service, and
the addressing scheme used for this memory space might be eccentric, but the
basic idea is there: the "server" has memory addresses, and the protocol allows
you to read and write them.
These remote memory access protocols, as a category, tend to be very common
with embedded systems because if they do happen to align with physical
memory, they are very simple to implement. A prominent example is Modbus, a
common industrial automation protocol that consists of reading and writing
registers, coils, etc., which are domain-specific terms for addresses in the
typed memory of PLCs (historically these were physical addresses in the PLC's
unusually structured memory, but today it's generally just a software construct
running on a more general-purpose architecture).
Unsurprisingly, then, the basic SNMP "verbs" are get and set, and these take
parameters of an address and, if setting, a value. On top of this very simple
principle, SNMP adds a more sophisticated feature called a "trap," but we'll
talk about that later. Let's call it an "advanced topic," although it's
actually one of the most useful parts of SNMP in practical situations.
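To make the basic get concrete, here is a minimal sketch that shells out to the
net-snmp command line tools from Python (it assumes snmpget is installed, an
SNMPv2c agent at the invented address 192.0.2.10, and the community string
"public"):

    import subprocess

    # Read sysUpTime.0, a standard MIB-2 value that nearly every agent exposes.
    result = subprocess.run(
        ["snmpget", "-v2c", "-c", "public", "192.0.2.10", "1.3.6.1.2.1.1.3.0"],
        capture_output=True, text=True, check=True,
    )
    print(result.stdout)  # the OID and its current value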
What is perhaps most interesting to consider, as far as arcane details of SNMP,
is the structure of the addresses. This is the scary part of SNMP: just about
the first time you have to interact directly with SNMP you will encounter an
address, called a variable or more properly object identifier (OID) in SNMP
parlance, like .1.3.6.1.2.1.2.2.1.10.1. It's like an IP address, if they were
substantially less user-friendly. That is to say, an IPv6 address.
These OIDs are in fact hierarchical addresses in a structure called the
Management Information Base (MIB). The MIB is an attempt to unify, into one
data structure, the many data points which could exist across devices in
a network. This idea of a grand unification of the domain of knowledge of
"configuration of network appliances" into one unpleasant numbered hierarchy
has a powerful smell of golden era Computer Science with a capital CS, and
indeed it is!
You see, from a very high level, the MIB is actually viewed as something akin
to a serialization format---it is, after all, fundamentally concerned with
packing the state of a device (Management Information) into a normalized,
strictly structured, interoperable format. To achieve this, the MIB is
described using something called SMI (e.g. RFC2578), which is best understood
as a simplified (or perhaps more formally "constrained") flavor of ASN.1.
ASN.1 is the most prominent of the interface description and serialization
formats developed for the OSI protocol suite. You might be tempted to call
ASN.1 an example of the "presentation layer," although like most invocations of
the OSI model, you would be misunderstanding the OSI model in saying so (the
OSI presentation layer protocols are, as the name suggests but is often
ignored, full on request-reply network protocols, not just serialization
formats). Nonetheless, people say this a lot, and at least ASN.1 truly dates
back to OSI, unlike a lot of things people relate to the OSI model.
You might be familiar with ASN.1 because it is widely used in cryptography, and
by this I mean that cryptography applications are widely saddled with ASN.1.
Most cryptographic certificates, the formats we tend to variously (and
confusingly) call X.509, PKCS#12, DER, PEM, etc., are ASN.1 serialized. This is
a whole lot of fun since ASN.1 is significantly divergent from modern computing
conventions, including the use of length-prefixed rather than terminated
strings (in some cases). I bring this up because it has led to a rather famous
series of vulnerabilities in TLS implementations, because apparently not even
the people implementing TLS have actually read the ASN.1 specification that
they depend on.
Anyway, back to SMI. Basically, SMI allows vendors of devices (or anyone
really) to write, in SMI, a description of an MIB "module." A "module" is
basically a list of OIDs (hierarchically structured) with their types and other
metadata. This SMI source is then compiled into the binary representation
actually used by SNMP clients. If you are unlucky, you may need to write SMI
yourself for devices whose vendors implemented SNMP but did not provide the
supporting materials. But, in most cases, device vendors provide a file
(commonly called an MIB file) which is the SMI description of the MIB module(s)
implemented by the device. This MIB file can then be fed to your SNMP tool to
be compiled into its "whole picture" binary MIB.
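For flavor, a single OBJECT-TYPE definition inside an SMI module looks roughly
like this (a made-up fragment, not taken from any real vendor MIB; a complete
module would also need IMPORTS and a MODULE-IDENTITY):

    exampleOutletPower OBJECT-TYPE
        SYNTAX      Integer32
        MAX-ACCESS  read-only
        STATUS      current
        DESCRIPTION "Average power draw of the outlet, in watts."
        ::= { exampleOutletEntry 5 }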
Knowing that it is a result of compiling together SMI produced by various
vendors, let's take a look at the structure of the MIB. Each dot-separated
number identifies a subtree, which for extra fun are called "arcs" in the
context of the MIB. At the very top of the OID hierarchy is a top level which
identifies the standards authority. This is 0, 1, or 2, which refer to ITU,
ISO, and ITU/ISO together, respectively. Of course these three parts of the
tree use different internal structures so I can't generalize past this point,
but I will focus on the ISO tree because it's the one most commonly used in
practice.
Under the .1 ISO hierarchy are arcs for ISO standard OIDs, registry authorities
(somewhat difficult to explain and also not widely used, basically a metadata
space), ISO member organizations by country (e.g. ANSI in the US), and then
identified organizations, which are just companies and organizations that have
asked for OID space. This can be somewhat confusing because many national ISO
member organizations also allocate OID space within their arcs, but major
vendors (e.g. Cisco) are often found at this top level instead.
So let's take a look at a somewhat arbitrary example, an MIB for Juniper's
Junos. I'm using this as an example rather than the more obvious Cisco IOS
because I got mad at Cisco's website while trying to get MIBs, which did not
appear to have seen an update in a decade. In any case, the MIB starts out at
.1.3.6.1.4.1.2636.
In terms of the hierarchy this means: ISO standard, identified organization,
DOD, internet, private projects, private enterprises, Juniper.
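Spelled out arc by arc (the final enterprise number being the one IANA assigned
to Juniper):

    1     iso
    3     identified-organization
    6     dod
    1     internet
    4     private
    1     enterprises
    2636  juniper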
Haha, wait, that just goes against most of what I said. What's going on with
the DOD thing?
The entire Internet, big-I, TCP/IP world is considered to be a subset of the
DOD, for OID purposes. This .1.3.6.1.4.1 space is actually managed by IANA, and
if you would like your own .1.3.6.1.4.1 number they will be happy to give you
one upon application.
This is all particularly interesting historically, because unlike a lot of
protocols I talk about SNMP does not predate IP. It was designed specifically
for use on IP networks, over UDP. SNMP is based on several earlier protocols
also used with IP. So, where does this weird relegation of IP to a small subset
of the DOD's OID space come from?
Well, it really has more to do with politics than technology. The MIB tree
essentially belongs to ITU and ISO, but ITU and ISO are both organizations
which are not especially known for swiftly and cheaply adopting standards
proposed by vendors. It was fairly obvious from an early stage that vendors
would need to produce MIB modules for their own devices fairly quickly, but ISO
and ISO member organizations were not especially enthusiastic about issuing a
large number of arcs to these vendors. So instead, IANA stepped in---but not
quite IANA yet, instead IANA's predecessor, Jon Postel. Postel, who was the
IANA for quite some time, worked on contract for DOD, and so he assigned OIDs
out of their space. There's no really good reason for it to be this way, but
if you work with SNMP a lot then typing .1.3.6.1.4.1 will have become muscle
memory.
Now, what is found inside of this Juniper space? Well, for example, there's an
OID whose integer value provides the average power used, in watts, by
whatever's plugged into a particular outlet of a managed PDU. The MIB structure
allows OIDs which contain other OIDs (object identifier type OIDs) to actually
contain tables of those OIDs, so there is a table OID covering all of the
outlets on the PDU, and within it a list of useful properties of each outlet
such as name, status, and various useful electrical measurements like current
and power factor.
After all of this talk of ASN.1 and MIBs and so on, these examples are actually
very useful and concrete. SNMP is, after all, actually a useful protocol for
real-world situations, such as centralized monitoring of your PDUs to identify
problems and catch your colo customers exceeding their power budgets.
And remember, SNMP even allows writing. So the OID giving the status of the
outlet can not only be used to determine whether the outlet
is on or off but also to turn the outlet on or off, which is a fun move when
your colo customer doesn't pay their bill for months.
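A sketch of that move, again shelling out to net-snmp (the OID here is a
hypothetical placeholder, not a real Juniper OID; the actual outlet-status OID
and its permitted values come from the vendor's MIB file):

    import subprocess

    # Set the (hypothetical) status of outlet 4 to 2, standing in for "off".
    subprocess.run(
        ["snmpset", "-v2c", "-c", "private", "pdu.example.com",
         "1.3.6.1.4.1.2636.999.1.4", "i", "2"],
        check=True,
    )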
SNMP is not limited to as concrete of devices as managed PDUs. For example,
RFC4113 provides an MIB for UDP. That is, it permits you to inspect a host's
UDP listeners and traffic statistics using SNMP, if that's a thing you really
want to do. In fact, the
entire concept of the MIB is far more general than SNMP, and ISO protocols and
standards often use MIB OIDs for identification purposes having little to do
with the application we're discussing here. For example, many MIME types have
an associated OID because the OSI email equivalent, X.435, uses OIDs to
identify the types of message parts. In general, OSI standards are lousy with
OIDs used as identifiers and, less frequently, to describe data structures
and formats.
The fact that you can set via SNMP, and get answers to all kinds of potentially
sensitive questions, raises the concern of security. Fortunately, SNMP provides an
airtight solution to this problem: "communities." A community is really just a
shared password, if the SNMP manager has the same community string as the SNMP
agent then it is allowed access. Even better, many SNMP agents have well-known
default community strings. Perfect. To be fair, SNMPv3 adds more rigorous
authentication support, including multiple authentication methods,
but there are still plenty of SNMPv2 devices out there with community string
set to "public."
One final thing to complete our discussion of SNMP is to mention the trap. More
technically, I am going to conflate traps and inform requests which are
actually slightly different, but everyone conflates them so I feel okay about
it. A trap is an extremely useful feature of SNMP which allows you to configure
an agent (e.g. device) to immediately inform a manager when certain events
occur. This is essentially a basic alarm capability built in to many devices.
Traps are identified by OIDs, and can bind other OIDs, so that the generated
trap message includes not only which trap was triggered, but also some other
related data if so configured. To be complete, an inform request is really just
a trap that the receiving manager acknowledges (this is not the case with
normal traps) so that the agent can resend it if it is not acknowledged.
In order for traps to work, the manager first needs to listen for traps, which
is usually fairly straightforward to set up. Then, various OIDs are set on the
agent to enable traps and set the destination for those traps (e.g. the IP of
the manager). In some cases agents also provide a web interface or other more
convenient mechanisms to set these up, which is much appreciated since SNMP
is unpleasant to have to think about directly.
That's about it for SNMP. Simple, right? Well, it really is pretty simple, as
long as you agree to just take OIDs as magic numbers that come from wherever it
is computers do and not ask too many questions. Where SNMP can become rather
rough is when you run into issues with MIBs, or if you are using SNMPv3 where
the authentication and configuration can be amazingly, maddeningly complex for
such a simple protocol.
As an aside, the whole reason I'm talking about SNMP is because a reader asked
me to. For much the same reason, from the same reader, I'll be talking about
LDAP soon. LDAP is even more an out-of-place artifact of OSI than SNMP, and it
is basically impossible to describe as used in short form, but I will take a
shot at illustrating the odd historical components of LDAP and the ways they
matter today. It will at least serve as a teaser for my yet to be written book,
"Survival Under LDAP." LDAP is survivable for as many as 70% of Americans,
but you must know how to protect yourself!
 I continue to seriously question the merits of the complex address
representation used with IPv6. If we had stuck to decimalized bytes separated by
dots, we'd be doing a lot more typing, but we wouldn't be trying to remember
what :: means when it's there.