
>>> 2021-04-03 use computers to store data

The New York Times once described a software package as "among the first of an emerging generation of software making extensive use of artificial intelligence techniques." What is this deep learning, data science, artificial intelligence they speak of? Ansa Paradox, of 1985 [1].

Obviously the contours of "artificial intelligence" have shifted a great deal over the years. At the same time, though, basic expectations of computers have shifted---often backwards.

One of the most obvious applications of computer technology is the storage of data.

Well, that's both obvious and general, but what I mean specifically here is data which had previously or conventionally been stored on hardcopy. Business records, basically: customer accounts, project management reports, accounts payable, etc etc. The examples are numerous, practically infinite.

I intend to show that, counter-intuitively, computers have in many ways gotten worse at these functions over time. The reasons are partially technical, but for the most part economic. In short, capitalism ruins computing once again.

To get there, though, we need to start a ways back, with the genesis of business computing.

Early computers were generally not applied to "data storage" tasks. A simple explanation is that storage technology developed somewhat behind computing technologies; early computers, over a reasonable period of time, could process more data than they could store. This is where much of the concept of a "computer operator" comes from: the need to more or less continuously feed new data to the computer, retrieved from paper files or prepared (e.g. on punched cards) on demand.

As storage technology developed, the available devices came to include simple, low-capacity memory such as magnetic core, and higher-capacity media such as paper or magnetic tape. Core memory was random access, but very expensive. Tape was relatively inexpensive on a capacity basis, but extremely inefficient to access in a nonlinear (e.g. random) fashion. This is essentially the origin of mainframe computing's heavy orientation toward batch processing: for efficiency purposes, data needed to be processed in large volumes, in fixed order, simply to facilitate the loading of tapes.

The ability to efficiently use a "database" as we think of them today effectively required a random-access storage device of fairly high capacity, say, a multi-MB hard drive (or more eccentrically, and more briefly, something like a film or tape magazine).

Reasonably large capacity hard disk drives were available by the '60s, but were enormously expensive and just, well, enormous. Still, these storage devices basically created the modern concept of a "database:" a set of data which could be retrieved not just in linear order but also arbitrarily based on various selection criteria.

As a direct result of these devices, IBM researcher E. F. Codd published a paper in 1970 describing a formalized approach to the storage and retrieval of complex-structured data on large "shared data banks." Codd called his system "relational," and described the primary features seen in most modern databases. Although it was somewhat poorly received at the time (likely primarily due to the difficulty of implementing it on existing hardware), by the '90s the concept of a relational database had become so popular that it was essentially assumed that any "database" was relational in nature, and could be queried by SQL or something similar to it.
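To make the relational idea concrete, here is a minimal sketch using Python's built-in sqlite3 module; the table and rows are invented purely for illustration, but the point is that retrieval happens by arbitrary selection criteria rather than by storage order:

    import sqlite3

    # An in-memory relational database, purely for illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT, balance REAL)")
    conn.executemany(
        "INSERT INTO customers (name, city, balance) VALUES (?, ?, ?)",
        [("Acme Fence", "Albuquerque", 1200.00),
         ("Bluebird Dairy", "Santa Fe", 0.00),
         ("Cactus Supply", "Albuquerque", 450.75)],
    )

    # Rows come back according to whatever criteria we name, not in the order they were stored.
    for name, balance in conn.execute(
            "SELECT name, balance FROM customers WHERE city = ? AND balance > 0",
            ("Albuquerque",)):
        print(name, balance)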

A major factor in the rising popularity of databases was the decreasing cost of storage, which encouraged uses of computers that required this kind of flexible, structured data storage. By the end of the 1980s, hard disk drives had become a common option on PCs, introducing the ingredients for a database system to the consumer market.

This represented, to a degree which I do not wish to understate, a democratization of the database. Nearly as soon as the computers and storage became available, it was a widespread assumption that computer users of all types would have a use for databases, from the home to the large enterprise. Because most computer users did not have the desire to learn a programming language and environment in depth, this created a market for a software genre almost forgotten today: the desktop database.

I hesitate to make any claims of "first," but an early and very prominent desktop database solution was dBase II (they called the first version II, a particularly strong form of the Xbox 360 move) from Ashton-Tate. dBase was released around 1980, and within a few years the field had proliferated. FoxPro (actually a variant of dBase) and Paradox were other major entrants from the same period that may be familiar to older readers.

dBase was initially offered on CP/M, which was a popular platform at the time (and one that was influential on the design of DOS), but was ported to DOS (of both Microsoft and IBM variants) and Apple II, the other significant platforms of the era.

Let's consider the features of dBase, which was typical of these early desktop database products. dBase was text-mode software, and while it provided tabular (or loosely "graphical") views, it was primarily what we would now call a REPL for the dBase programming language. The dBase language was fairly flexible but also intended to be simple enough for end users to learn, so that they could write and modify their own dBase programs---this was the entire point of the software: to make custom databases accessible to non-engineers.

The dBase language was similar to SQL but added additional interactive prompting and operations for ease of use. Ted Leath provides a reasonably complex example dBase program on his website.

It wasn't necessarily expected, though, that the dBase language and shell would be used on an ongoing basis. Instead, dBase shipped with tools called ASSIST and APPGEN. The purpose of these tools was to offer a more user-friendly interface to a dBase database. ASSIST was a sort of general-purpose client to the database for querying and data management, while APPGEN allowed for the creation of forms, queries, and reports linked by a menu system---basically the creation of a CRUD app.

The combination of dBase and APPGEN was thus a way to create common CRUD applications without the need for "programming" in its form at the time. This capability is referred to as Rapid Application Development (RAD), and RAD and desktop databases are two peas in a pod. The line between the two has become significantly blurred, and all desktop databases offer at least basic RAD capabilities. More sophisticated options were capable of generating standalone client applications which could operate over the network.
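To give a rough flavor of what such a generated application looked like in use, here is a deliberately tiny sketch in Python rather than dBase; the menu, table, and fields are invented for illustration, standing in for the forms, queries, and reports an APPGEN-style tool would wire together:

    import sqlite3

    # A toy stand-in for the kind of menu-driven CRUD app a RAD tool generated.
    db = sqlite3.connect("inventory.db")
    db.execute("CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, name TEXT, qty INTEGER)")

    MENU = "1) Add item  2) List items  3) Find item  4) Quit"

    while True:
        choice = input(MENU + "\n> ").strip()
        if choice == "1":        # the "form": prompt for each field
            name = input("Item name: ")
            qty = int(input("Quantity: "))
            db.execute("INSERT INTO items (name, qty) VALUES (?, ?)", (name, qty))
            db.commit()
        elif choice == "2":      # the "report": dump the table
            for row in db.execute("SELECT id, name, qty FROM items ORDER BY name"):
                print(row)
        elif choice == "3":      # the "query": search by criteria
            term = input("Name contains: ")
            for row in db.execute("SELECT id, name, qty FROM items WHERE name LIKE ?", (f"%{term}%",)):
                print(row)
        else:
            break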

As I mentioned, there are many of these. A brief listing that I assembled, based mostly on Wikipedia with some other sources, includes: DataEase, Paradox, dBase, FoxPro, Kexi, Approach, Access, R:Base, OMNIS, StarOffice/OpenOffice/LibreOffice/NeoOffice Base (delightfully also called Starbase), PowerBuilder, FileMaker, and I'm sure at least a dozen more. These include some entrants from major brands recognizable today, such as Access developed by Microsoft, FileMaker acquired by Apple, and Approach acquired by IBM.

These products were highly successful in their time. dBase propelled Ashton-Tate to the top tier of the software industry in the 1980s, alongside Microsoft and Lotus. FileMaker has been hugely influential in Apple business circles. Access was the core of many small businesses for over a decade. It's easy to see why: desktop databases, and their companion, RAD, truly made the (record-keeping) power of computers available to the masses by empowering users to develop their own applications.

You didn't buy an inventory, invoicing, customer management, or other solution and then conform your business practices to it. Instead, you developed your own custom application that fit your needs exactly. The development of these database applications required some skill, but it was easier to acquire than general-purpose programming, especially in the '90s and '00s as desktop databases made the transition to GUI programs with extensive user assistance. The expectation, and in many cases the reality, was that a business clerk could implement a desktop database solution for their record-keeping use case with only a fairly brief study of the product's manual... no coding bootcamp required.

Nonetheless, a professional industry flowered around these products with many third-party consultants, integrators, user groups, and conferences. Many of these products became so deeply integrated into their use-cases that they survive today, now the critical core of a legacy system. Paradox, for example, has become part of the WordPerfect suite and remains in heavy use in WordPerfect holdout industries such as law and legislation.

And yet... desktop databases are all but gone today. Many of these products are still maintained, particularly the more recent entrants such as Kexi, and there is a small set of modern RAD solutions such as Zoho Creator. All in all, though, the desktop database industry has entirely collapsed since the early '00s. Desktop databases are typically viewed today as legacy artifacts, a sign of poor engineering and extensive technical debt. Far from democratizing, they are seen as constraining.

What changed?

I posit that the decline of desktop databases reflects a larger shift in the software industry: broadly speaking, an increase in profit motive, and a decrease in ambition.

In the early days of computing, and extending well into the '90s in the right niches, there was a view that computers would solve problems in the most general case. From Rear Admiral Hopper's era of "automatic programming" to the "no-code" solutions of the '00s, there was a strong sense that the field of software engineering existed only as a stopgap measure until "artificial intelligence" was developed to such a degree that users were fully empowered to create their own solutions to their own problems. Computers were infinitely flexible, and the skill required to bend them to any function seemed to decrease every day.

Today, computers are not general-purpose problem-solving machines ready for the whims of any user. They are merely a platform to deliver "apps," "SaaS," and in general special-purpose solutions delivered on a subscription model.

The first shift is economic: the reality of desktop databases is that they were difficult to monetize by modern standards. After a one-time purchase of the software, users could develop an unlimited number of solutions without any added cost. In a way, the vendors of desktop databases sealed their own fate by selling, for a fixed fee, the ability to not be dependent on the software industry going forward. That independence was never fully achieved, but it was at least the ideal.

The second shift is cultural: the mid-century to the '90s was a heady time in computer science when the goal was flexibility and generality. To be somewhat cynical (not that that is new), the goal of the '10s and '20s is monetization and engagement. Successful software today must be prescriptive, rather than general, in order to direct users to the behaviors which are most readily converted into a commercial advantage for the developer.

Perhaps more deeply though, software engineers have given up.

The reality is that generality is hard. I am, hopefully obviously, presenting a very rosy view of the desktop database. In practice, while these solutions were powerful and flexible, they were perhaps too flexible, and often led to messy applications which were unreliable and difficult to maintain. Part of this was due to limitations in the applications; part of it was due to the inherent challenge of untrained users effectively developing software without any practical or academic background in doing so (although one could argue that this sentence describes many software engineers today...).

One might think that this is one of the most important challenges that a computer scientist, software engineer, coder, etc. could take on. What needs to be done, what needs to be changed to make computers truly the tools of their owners? Truly a flexible, general device able to take on any challenge, as IBM marketing promised in the '50s?

But, alas, these problems are hard, and they are hard in a way that is not especially profitable. We are, after all, talking about engineering software vendors entirely out of the problem.

The result is that the few RAD solutions under active development today are subscription-based and usage-priced, effectively cloud platforms. Even so, they are generally unsuccessful. Yet the desire for a generalized desktop database remains an especially strong one among business computer users. Virtually everyone who has worked in IT or software in an established business environment has seen the "Excel monstrosity," a tabular data file prepared in spreadsheet software which is trying so very hard to be a generalized RDBMS in a tool not originally intended for it.

As professionals, we often mock these fallen creations of a sadistic mind as evidence of users run amok, of the evils of an untrained person empowered by a keyboard. We've all done it, certainly I have: making fun of a person who has created a twenty-sheet, instruction-laden Excel workbook to solve a problem that clearly should have been solved with software developed by someone with a computer science degree, or at least a certificate from a four-week fly-by-night NodeJS bootcamp.

And yet, when we do this, we are mocking users for employing computers as they were once intended: general-purpose.

I hesitate to sound like RMS, particularly considering what I wrote a few messages ago. But, as I said, he is worthy of respect in some regards. Despite his inconsistency, perhaps we can learn something from his view of software as user-empowering versus user-subjugating. Desktop databases empowered users. Do applications today empower users?

The software industry, I contend, has fallen from grace. It is hard to place when this change occurred, because it happened slowly and by degrees, but it seems to me like sometime during the late '90s to early '00s the software industry fundamentally gave up. Interest in solving problems was abandoned and replaced by a drive to engage users, a vague term that is nearly always interpreted in a way that raises fundamental ethical concerns. Computing is no longer a lofty field engaged in the salvation of mankind; it is a field of mechanical labor engaged in the conversion of people into money.

In short, capitalism ruins computing once again.

Epilogue

If I have a manifesto at the moment, this is it. I don't mean to entirely denigrate the modern software industry---I mean, I work in it. Certainly there are many people today working on software that solves generalized problems for any user. But if you really think about it, on the whole, do you feel that the modern software industry is oriented towards the enablement of all computer users, or towards the exploitation of those users?

There are many ways in which this change has occurred, and here I have focused on just one minute corner of the shift in the software industry. But we can see the same trend in many other places: from a distributed to centralized internet, from open to closed platforms, from up-front to subscription, from general-purpose to "app store." And yet, after it all, there is still "dBase 2019... for optimized productivity!"

[1] I found this amazing quote courtesy of some Wikipedia editor, but just searching a newspaper archive for "artificial intelligence" in the 1970-1990 timeframe is a ton of fun and will probably lead to a post one day.