RFC-219

by Darius Kazemi, August 7 2019

In 2019 I'm reading one RFC a day in chronological order starting from the very first one. More on this project here. There is a table of contents for all my RFC posts.

The datacomputer

RFC-219 is titled “User's View of the Datacomputer”. It's authored by Richard Winter of the Computer Corporation of America, and dated September 3, 1971.

The technical content

This RFC is about the “datacomputer”, aka the trillion-bit store last mentioned in RFC-164. The idea is that this is, basically, a big shared drive for files. Some files are limited to only certain users, but other files are meant to be accessed by the entire user population of the ARPANET.

The author says that the datacomputer should be conceptualized as a black box, and that the way to interface with the black box is via a “data language”. Generally speaking, CCA wants users not to need to know anything about the internals of the datacomputer. Instead, the user should only be concerned with accessing records via their correct identifiers.

A priori, the datacomputer only knows about one file: the file that contains the information about where all other files are. So that file is the datacomputer's entry to and view on its universe. Files contain uniquely numbered records, and records contain named fields. Named fields are not necessarily unique. Fields correspond to elementary data, so it seems like what we have here is a sort of key-value store.
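To make the file/record/field hierarchy a bit more concrete, here is a minimal sketch in Python. None of these names (Record, DataFile, DIRECTORY, CENSUS, PHONE) come from the RFC; they are made up for illustration, and the RFC does not say how any of this is actually laid out inside the datacomputer.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A record with a unique number and a list of named fields."""
    record_id: int
    # Stored as (name, value) pairs because field names may repeat.
    fields: list[tuple[str, str]] = field(default_factory=list)

    def values(self, name: str) -> list[str]:
        """Return every value stored under the given field name."""
        return [value for field_name, value in self.fields if field_name == name]


@dataclass
class DataFile:
    """A named file holding uniquely numbered records."""
    name: str
    records: dict[int, Record] = field(default_factory=dict)

    def add(self, record: Record) -> None:
        if record.record_id in self.records:
            raise ValueError(f"record {record.record_id} already exists")
        self.records[record.record_id] = record


# Hypothetical stand-in for the one file the datacomputer knows about
# in advance: each record describes another file.
directory = DataFile("DIRECTORY")
directory.add(Record(1, [("FILE", "CENSUS")]))

census = DataFile("CENSUS")
census.add(Record(1, [
    ("STATE", "MASSACHUSETTS"),
    ("PHONE", "555-0100"),
    ("PHONE", "555-0101"),  # same field name used twice in one record
]))
```

Storing fields as a list of pairs rather than a dictionary is a deliberate choice here, since the post notes that named fields are not necessarily unique.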

Files can also contain an index with pointers to records. The example they give is an index that points to all records where the named field “STATE” has the value “MASSACHUSETTS”. (This seems a lot like a database index to me, in that it's kind of a pre-baked relationship that lets you access things faster than usual, but I'm even less of an expert in databases than I am in computing history.)
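Here is a rough sketch of what such an index might look like, assuming it is just a mapping from a field value to the set of record numbers that hold that value. The build_index function and the sample records are hypothetical, and for simplicity this sketch assumes each field name appears at most once per record.

```python
from collections import defaultdict

# Sample records keyed by record number (made-up data).
records = {
    1: {"STATE": "MASSACHUSETTS", "MONTH": "AUGUST"},
    2: {"STATE": "OHIO", "MONTH": "AUGUST"},
    3: {"STATE": "MASSACHUSETTS", "MONTH": "MAY"},
}


def build_index(records, field_name):
    """Map each value of field_name to the set of record numbers holding it."""
    index = defaultdict(set)
    for record_id, fields in records.items():
        if field_name in fields:
            index[fields[field_name]].add(record_id)
    return index


state_index = build_index(records, "STATE")
massachusetts_ids = state_index["MASSACHUSETTS"]
```

With the index built ahead of time, finding the Massachusetts records is a single lookup instead of a scan over every record in the file.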

The datacomputer does periodic garbage collection, because as records are inserted into the system, sequentially ordered files might not end up physically sequentially ordered, only logically sequentially ordered.
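A toy illustration of the logical-versus-physical distinction, under the assumption that new records are simply appended wherever there is room. The RFC does not describe the actual storage layout or the garbage collection algorithm, so the slot list and the compact function below are purely hypothetical.

```python
def is_physically_sequential(slots):
    """True when the physical order of the slots matches record-number order."""
    ids = [record_id for record_id, _ in slots]
    return ids == sorted(ids)


def compact(slots):
    """Rewrite the slots so physical order matches logical (record-number) order."""
    return sorted(slots, key=lambda slot: slot[0])


# Physical storage as a list of (record number, data) slots.
slots = [(1, "first"), (2, "second"), (4, "fourth")]
slots.append((3, "third"))  # a late insert lands at the physical end

needs_cleanup = not is_physically_sequential(slots)
tidy = compact(slots)
```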

The datacomputer communicates in “streams”, which are sequences of records passed between the datacomputer and a using program.

Apparently, accessing a full file at a time is not the primary use of the datacomputer. Rather, they expect users to request subsets of files, specified as lists of record ID numbers. You can also do simple boolean queries on file indices, like “return all records where the named field STATE is MASSACHUSETTS and the named field MONTH is AUGUST”.
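If each index maps a field value to a set of record numbers, a boolean query like that one reduces to set operations. This is only a sketch of the idea in Python, not the RFC's data language. The index contents and the lookup helper are invented for the example.

```python
# Hypothetical indices: field value -> set of record numbers.
state_index = {"MASSACHUSETTS": {1, 3, 4}, "OHIO": {2}}
month_index = {"AUGUST": {1, 2, 4}, "MAY": {3}}


def lookup(index, value):
    """Return the record numbers indexed under value (empty set if none)."""
    return index.get(value, set())


# STATE is MASSACHUSETTS and MONTH is AUGUST: set intersection.
both = lookup(state_index, "MASSACHUSETTS") & lookup(month_index, "AUGUST")

# STATE is MASSACHUSETTS or MONTH is AUGUST: set union.
either = lookup(state_index, "MASSACHUSETTS") | lookup(month_index, "AUGUST")
```

The result is a list of record ID numbers, which is the same shape as the subset requests described above.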

The primitives for writing files are:

add a field/record
delete a field/record
replace a field/record

The author also notes that when the contents of a record are changed, any indices that point to the record must be recalculated.
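Here is a small sketch of how the three write primitives and the index upkeep might fit together, at the record level only. The IndexedFile class and its method names are hypothetical, and it assumes one index on one field, with that field appearing at most once per record.

```python
class IndexedFile:
    """Records keyed by number, plus one index on a chosen field."""

    def __init__(self, indexed_field):
        self.indexed_field = indexed_field
        self.records = {}  # record number -> dict of fields
        self.index = {}    # field value -> set of record numbers

    def _unindex(self, record_id):
        """Remove the record's entry from the index, if it has one."""
        value = self.records[record_id].get(self.indexed_field)
        if value is not None:
            ids = self.index[value]
            ids.discard(record_id)
            if not ids:
                del self.index[value]

    def _reindex(self, record_id):
        """Add the record's current field value to the index."""
        value = self.records[record_id].get(self.indexed_field)
        if value is not None:
            self.index.setdefault(value, set()).add(record_id)

    def add(self, record_id, fields):
        if record_id in self.records:
            raise KeyError(f"record {record_id} already exists")
        self.records[record_id] = dict(fields)
        self._reindex(record_id)

    def delete(self, record_id):
        self._unindex(record_id)
        del self.records[record_id]

    def replace(self, record_id, fields):
        # Drop the old index entry before the contents change,
        # then index the new contents.
        self._unindex(record_id)
        self.records[record_id] = dict(fields)
        self._reindex(record_id)


f = IndexedFile("STATE")
f.add(1, {"STATE": "MASSACHUSETTS"})
f.add(2, {"STATE": "OHIO"})
f.replace(2, {"STATE": "MASSACHUSETTS"})
f.delete(1)
```

Every write touches the index as well as the record, which is the cost the author is pointing at.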

The document ends with a list of potential uses for the datacomputer, including:

storing and retrieving giant blobs of data
a replacement for local tape drive storage
text storage and retrieval
storage and retrieval of a large database, like census data

Analysis

While modern programmers are probably really familiar with the concept of streams, a programmer in 1971 may not have been. I love this very simple description of the ephemerality of a stream:

There is no concept of permanent storage for streams.  The records move past the datacomputer one at a time, as though they were on a conveyor belt.

One record, the current record, is available to the datacomputer in each stream.
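A Python generator happens to behave a lot like that conveyor belt, so here is a minimal sketch of the idea. The stream and count_matching functions are made up for illustration and are not the datacomputer's actual interface.

```python
from typing import Iterable, Iterator


def stream(records: Iterable[dict]) -> Iterator[dict]:
    """Pass records along one at a time; nothing is kept once a record goes by."""
    for record in records:
        yield record


def count_matching(record_stream, name, value):
    """Consume a stream, looking only at the current record each step."""
    count = 0
    for current in record_stream:
        if current.get(name) == value:
            count += 1
    return count


source = [
    {"STATE": "MASSACHUSETTS"},
    {"STATE": "OHIO"},
    {"STATE": "MASSACHUSETTS"},
]

s = stream(source)
first_pass = count_matching(s, "STATE", "MASSACHUSETTS")
# The stream is used up after one pass, so a second pass sees no records.
second_pass = count_matching(s, "STATE", "MASSACHUSETTS")
```

The second pass coming up empty is the point: once the records have moved past, the stream itself holds nothing.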

How to follow this blog

You can subscribe to this blog's RSS feed or if you're on a federated ActivityPub social network like Mastodon or Pleroma you can search for the user “@365-rfcs@write.as” and follow it there.

About me

I'm Darius Kazemi. I'm an independent technologist and artist. I do a lot of work on the decentralized web with ActivityPub, including a Node.js reference implementation, an RSS-to-ActivityPub converter, and a fork of Mastodon, called Hometown. You can support my work via my Patreon.