by Darius Kazemi, May 2 2019
RFC-122 is titled “Network Specifications for UCSB's Simple-Minded File System”. It was written by James E. White of UC Santa Barbara and is dated April 26th, 1971.
The technical content
UCSB offers storage for users who want it via something called their Simple-Minded File System (SMFS). Storage is provided on IBM 2316 disk packs, each of which is essentially a hard drive that offers about 29 MB of storage space. There are 16 of these in two IBM 2314 direct access storage facilities, which allow external users to address and access data on an array of up to 8 of the drives. There are daily tape backups.
For now only one drive is available for Network use, with UCSB planning to provide up to four drives for the next year.
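To put those numbers side by side, here is a quick back-of-the-envelope calculation. It only uses the figures quoted above (about 29 MB per pack, 16 packs, one pack on the Network now, four planned); the variable names are mine, and the results are as approximate as the "about 29 MB" figure they start from.

```python
# Rough capacity arithmetic from the figures quoted in the post.
PACK_MB = 29          # approximate capacity of one IBM 2316 disk pack
TOTAL_PACKS = 16      # packs across the two IBM 2314 facilities

total_mb = PACK_MB * TOTAL_PACKS      # everything UCSB has on these units
network_now_mb = PACK_MB * 1          # the single pack open to the Network
network_planned_mb = PACK_MB * 4      # the four packs planned for the next year

print(total_mb, network_now_mb, network_planned_mb)
```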
UCSB makes no claims to have a ton of file storage; even by 1971 standards this isn't a ton of storage space. But in the spirit of sharing resources they are happy to provide what they have.
SMFS can support up to ten simultaneous Network users. It is still operating under an older Host-Host standard so programmers will need to use that version of the protocol for now.
To connect to SMFS you send a connection request to Site 3, socket x'401' (that's hex, it would be socket 1025 in decimal). SMFS is always listening there for incoming requests using the Initial Connection Protocol (referenced in RFC-66 and RFC-93). It then shunts the user to a new socket for their connection, so that it can keep x'401' open for new connections from other users. As with other protocols, there is a receive socket 'n' and a send socket 'n+1'. This particular pattern of socket allotment really seems to have taken ARPANET by storm.
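Here is a minimal sketch of that "listen on a well-known socket, then hand out a fresh pair" idea. The `Listener` class, its `accept` method, and the starting socket number `0x500` are all my own inventions for illustration; the RFC does not say which socket numbers SMFS actually hands out. The only facts taken from the text are the well-known socket x'401' and the n / n+1 pairing.

```python
from itertools import count

WELL_KNOWN_SOCKET = 0x401  # x'401' in the RFC's notation, 1025 in decimal


class Listener:
    """Hands each new user a (receive, send) socket pair of the form (n, n+1)."""

    def __init__(self, first_free: int = 0x500):  # 0x500 is a made-up starting point
        self._next = count(first_free, 2)  # step by 2 so pairs never overlap

    def accept(self) -> tuple:
        # The well-known socket itself is never handed out, so it stays
        # free for the next incoming request.
        n = next(self._next)
        return n, n + 1


listener = Listener()
print(WELL_KNOWN_SOCKET, listener.accept())
```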
SMFS allows for storage of binary files between 1 bit and 25 million bits in length. Each file has a filename of 1 to 36 characters in length. Supported characters are A to Z (upper case only), the digits 0 through 9, and a blank space. Lower case letters are essentially converted to upper case, that is to say, the filenames are case insensitive. Curiously, the file system supports both ASCII and EBCDIC naming. I suppose this makes sense, since UCSB is using IBM 360 systems, which use the slightly less popular EBCDIC, and they understand that to reach the rest of the network, ASCII compatibility for file names will be very helpful.
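The filename rules are simple enough to write down as a small check. This is a sketch of the rules as I've summarized them (1 to 36 characters; letters, digits, and blanks; lower case folded to upper case), not SMFS's actual logic. The function name `normalize_filename` is hypothetical, and I'm ignoring the EBCDIC side entirely.

```python
import string

ALLOWED = set(string.ascii_uppercase + string.digits + " ")
MAX_NAME_LENGTH = 36


def normalize_filename(name: str) -> str:
    """Fold a name to upper case and check it against the rules summarized above."""
    if not name.isascii():
        raise ValueError("only ASCII names are handled in this sketch")
    folded = name.upper()  # names are case insensitive
    if not 1 <= len(folded) <= MAX_NAME_LENGTH:
        raise ValueError("filename must be 1 to 36 characters long")
    bad = set(folded) - ALLOWED
    if bad:
        raise ValueError(f"unsupported characters: {sorted(bad)}")
    return folded


print(normalize_filename("my file 1"))
```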
Incredibly, the network file store offers three access levels: fully public, read-only, and read/edit. This is essentially identical to what is offered on network documents today (like a Google Spreadsheet). The way this is accomplished is through a password system. Public documents require no special permissions. You can lock down a file and provide an “access password” to users you want to have read-only access. Or you can provide a “modification password”, which allows users to edit the file. The document even suggests that “a group of users can be given access to a file while a single individual retains the exclusive right to modify it.”
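To make the two-password scheme concrete, here is a small sketch. I am assuming that the access password and the modification password are checked independently, and that a missing password means "no restriction"; that is my reading, not something I have confirmed against the bit-level spec. All the names here (`FilePasswords`, `can_read`, `can_modify`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FilePasswords:
    access: Optional[str] = None        # None means anyone may read
    modification: Optional[str] = None  # None means anyone may modify


def can_read(passwords: FilePasswords, supplied: Optional[str] = None) -> bool:
    return passwords.access is None or supplied == passwords.access


def can_modify(passwords: FilePasswords, supplied: Optional[str] = None) -> bool:
    return passwords.modification is None or supplied == passwords.modification


# A group shares the access password; one person keeps the modification password.
shared = FilePasswords(access="READERS", modification="OWNER")
print(can_read(shared, "READERS"), can_modify(shared, "READERS"))
```

With this setup the group can read but not edit, which is the "single individual retains the exclusive right to modify it" case from the quote.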
SMFS has internal mechanisms that handle access collisions — when two users want to access or modify the same file at the same time. It accomplishes this by simply “locking” the file to the first user who asked for access and delaying access to the second user until the first is done with their work.
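This is the same idea as a mutex in modern code. A rough sketch using Python threads follows: whoever gets the lock first does all of their work, and the other user waits. The names are mine, and this says nothing about how SMFS implements it internally or how it orders more than two waiting users.

```python
import threading
import time

file_lock = threading.Lock()  # one lock per file in this sketch
events = []


def use_file(user: str, work_seconds: float) -> None:
    with file_lock:  # a second user blocks here until the first is done
        events.append(f"{user} start")
        time.sleep(work_seconds)  # stand-in for reading or writing the file
        events.append(f"{user} done")


def run_two_users() -> list:
    events.clear()
    a = threading.Thread(target=use_file, args=("A", 0.05))
    b = threading.Thread(target=use_file, args=("B", 0.0))
    a.start()
    b.start()
    a.join()
    b.join()
    return list(events)


print(run_two_users())
```

Whichever user wins the race, their "start" and "done" entries are always adjacent: the two sessions never interleave.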
The first step in creating a file on SMFS is asking for a filename, specifying the amount of space that is to be allocated to it, and any passwords associated. If the name isn't already taken and there is space, then the space and the filename are reserved. SMFS adds a little bit of overhead to each file space allocated as insurance (overhead varies depending on how the user connects; there are interesting details in section V.A. of the document that I won't go into here).
Next the system offers two ways of writing a file — one essentially concatenates new data, while the other replaces any data that is there.
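Here is a toy in-memory model of allocate, then append or replace. It is only meant to show the difference between the two write modes. The class and method names are hypothetical, I ignore the per-file overhead and passwords, and raising an error when a write exceeds the allocation is my assumption rather than documented SMFS behavior.

```python
class TinyStore:
    """Toy model: reserve space for a name, then append to or replace its bits."""

    def __init__(self, capacity_bits: int):
        self.free = capacity_bits
        self.files = {}  # name -> {"size": allocated bits, "bits": stored bits}

    def allocate(self, name: str, size_bits: int) -> None:
        if name in self.files:
            raise ValueError("name already taken")
        if size_bits > self.free:
            raise ValueError("not enough space")
        self.free -= size_bits
        self.files[name] = {"size": size_bits, "bits": []}

    def append(self, name: str, bits: list) -> None:
        f = self.files[name]
        if len(f["bits"]) + len(bits) > f["size"]:
            raise ValueError("write exceeds allocation")
        f["bits"].extend(bits)  # concatenate onto what is already there

    def replace(self, name: str, bits: list) -> None:
        f = self.files[name]
        if len(bits) > f["size"]:
            raise ValueError("write exceeds allocation")
        f["bits"] = list(bits)  # discard the old contents


store = TinyStore(capacity_bits=100)
store.allocate("MYFILE", 8)
store.append("MYFILE", [1, 0])
store.append("MYFILE", [1])
print(store.files["MYFILE"]["bits"])
```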
There is a function for retrieving files where you can ask for a filename and the number of bits you'd like to retrieve from it. What's interesting to me about this is you can just repeat the request successively asking for, say, 100 bits at a time, and it assumes that you want the next 100 bits in the file with each request. Once you reach the end, it terminates the connection. So not only do you not need to know how big a file is to retrieve it, you don't even have to ask for the metadata about the file size: you can just whack it with retrieve requests until you have the whole file. (There is an option to have SMFS “format” your file, which then does cause SMFS to supply the length metadata.)
Since SMFS is sequential-access rather than random-access, you can't just say “give me bits 25 through 28 of this 100 bit long file.” You have to say “give me the first 24 bits”, throw away that data, then say “give me the next 25 bits” and throw away everything but the first 4 bits, then close your connection. SMFS offers a way to say “give me the first 24 bits but don't actually give them to me, I am only asking you to do this to move your counter to position 24”. This is provided so as not to clog up the network with unnecessary data transfers and is called “spacing a file”.
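Both behaviors, repeated retrieval and spacing, fall out of a single read position that only moves forward. Here is a sketch of that. The `SequentialFile` class and its method names are hypothetical, and I model "terminates the connection at the end of the file" as simply getting back fewer bits than requested. I count bits from 1, so reaching bit 25 means skipping 24 bits.

```python
class SequentialFile:
    """A file with one forward-only read position, as in sequential access."""

    def __init__(self, bits):
        self._bits = list(bits)
        self._pos = 0

    def retrieve(self, count: int) -> list:
        """Return the next `count` bits (fewer if the file runs out)."""
        chunk = self._bits[self._pos:self._pos + count]
        self._pos += len(chunk)
        return chunk

    def space(self, count: int) -> None:
        """Advance the position without sending any data back."""
        self._pos = min(self._pos + count, len(self._bits))


def read_all(f: SequentialFile, chunk_size: int = 100) -> list:
    """Keep asking for chunks until one comes back short; no length needed."""
    out = []
    while True:
        chunk = f.retrieve(chunk_size)
        out.extend(chunk)
        if len(chunk) < chunk_size:
            return out


# Stand-in "bits" labeled by their 1-based position so the result is easy to read.
demo = SequentialFile(range(1, 101))
demo.space(24)           # skip bits 1 through 24 without transferring them
print(demo.retrieve(4))  # bits 25 through 28
```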
Deletion and file renaming operations are also provided. Interestingly, they recommend that if you're going to rewrite a whole file, it might make sense to do a kind of remote swap so that if the connection fails you don't end up with a corrupt file. This uses both deletion and renaming. Their suggestion, if you want to overwrite a file called, say, MYFILE:
- upload the new contents of MYFILE to a file called by some temporary name
- confirm that the temporary file uploaded correctly
- delete MYFILE and rename the temporary file to MYFILE
This adds the extra confirmation step, and then at that point you're just reassigning metadata so there's no need to worry about dropped network connections corrupting your data.
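The same write-to-a-temporary-name-then-rename trick is still how people do safe file replacement today. Here is a sketch against a plain dictionary standing in for the remote store. The function name and the temporary name `"TEMP COPY"` are made up for illustration, and the "confirm" step here is just a comparison, where a real client would check SMFS's status reply.

```python
def safe_overwrite(store: dict, name: str, new_bits: list,
                   temp_name: str = "TEMP COPY") -> None:
    """Replace store[name] via upload-to-temp, confirm, delete, rename."""
    # 1. Upload the new contents under a temporary name.
    store[temp_name] = list(new_bits)
    # 2. Confirm the upload before touching the original.
    if store[temp_name] != list(new_bits):
        del store[temp_name]
        raise IOError("upload did not complete; original left untouched")
    # 3. Delete the old file, then 4. rename the temporary one.
    #    Both steps only change metadata; no file data crosses the network.
    store.pop(name, None)
    store[name] = store.pop(temp_name)


files = {"MYFILE": [0, 0]}
safe_overwrite(files, "MYFILE", [1, 1, 1])
print(files)
```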
The remainder of the document discusses the actual bit fields and headers and status messages and error messages that programmers need to know if they're going to communicate with SMFS.
I just want to point out that “send a host-host connection request to socket 1025 at site 3” is really similar to what happens on the internet today. “Site 3” is designated with that number because it's a number that everyone has agreed represents UC Santa Barbara, and as such it's hard-wired into the network of IMPs so that when you ask it to send information to site 3, it starts moving it towards a computer in Santa Barbara, California. In the modern internet instead of a site number, this is a domain name or, ultimately, an IP address. There is a complex network of things that are not so unlike the old IMPs that route your message to its destination when you request a specific IP. When you make a request via, say, a web browser, you assume that the device at the IP you're talking to is listening on port 80 (if it's an HTTP request), and then you assume that it speaks the language of HTTP so you can do your initial connection. So “send an HTTP request to 18.104.22.168:80” is the modern equivalent of “send a Host-Host request to 3:1025”.
At one point the author states that “there is no restriction on the contents” of files hosted at UCSB. I am pretty sure that was a file-format policy, as in arbitrary bits can be stored regardless of formatting, and not some kind of commentary on free speech.
This is all eerily similar to work I very occasionally do in the Dat ecosystem. Though it differs in all sorts of ways, Dat is, ultimately, a network file storage protocol like this one.