by Darius Kazemi, April 24 2019
The technical content
This document proposes “a file transfer protocol” which on a technicality remains unnamed throughout the entire document. This file transfer protocol would eventually become known as the File Transfer Protocol, a protocol that, like Telnet, is still in use today (more on that in the Analysis section below).
This is a long RFC so I'm going to break it up into the same subsections used in the RFC itself.
The RFC begins by drawing a line between “direct” and “indirect” usage. Direct usage is when a user logs in to an actual account on the remote computer and uses it as though they were sitting there in front of it, typing all the same commands that a local user would. Indirect usage is where you don't get anything approaching operating system access and rather some kind of mediating system is involved that interacts with the operating system on your behalf.
The protocol is already in its debugging stages, having been deployed at MIT in at least a testing capacity. Bhushan points out that since they are already connecting two computers with extremely different file systems, he is confident that the protocol should work on any combination of disparate file systems.
I mentioned this in my post about RFC-91, but file transfer was a very hard problem that the NWG had to tackle. This 2014 interview between Bob Braden (UCLA's Manager of Programming at this time) and Jeffrey R. Yost is illustrative.
Yost: Do you recall some of the major issues and challenges that were discussed in those early meetings among the ARPANET working group?
Braden: The first task was to develop a file transfer protocol, FTP. Again, I was the odd man out because the IBM had a truly baroque file system, and we wanted to let customers have access to some of the features of it, whereas all the other systems were UNIX systems with a very simple file structure. So if you look at the FTP spec there’s some weirdness, complexity that’s there because I asked for it.
Some terms are defined. A file is “an ordered set” of binary bits, and can be any length. A named file is a file with a unique identifier that is a directory name plus a file name. (So today that would be something like
C:\stuff\readme.txt in Windows or
/stuff/readme.txt in UNIX.)
A principal objective of the protocol is to promote the indirect use of computers on the network. Therefore, the user or his program should have a simple and uniform interface to the file systems on the network and be shielded from the variations in file and storage systems of different host computers.
So in a sense, FTP is the original HTTP, in that it's supposed to make it so you don't have to worry about which of dozens of operating systems and hardware another computer uses. You can just upload or download files and be done with it.
Bhushan says that he wants this protocol to be efficient, extensible, and adaptable, with error recovery built in. He's proposing the first real “user friendly” internet protocol. (Early versions of Telnet, which were in use but had not been formalized by this time in 1971, were not what I'd call user-friendly.)
FTP is built with 7-bit ASCII and “binary” data in mind, and provides support for non-ASCII text like EBCDIC through conversion mechanisms.
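The RFC itself predates all of today's tooling, but as a rough illustration of the kind of ASCII-to-EBCDIC conversion an FTP server would perform, here's a sketch using Python's built-in EBCDIC codec (code page "cp500" — my choice of codec, not anything specified in the RFC):

```python
# Illustration only: the RFC predates these tools. Python's built-in
# EBCDIC codec ("cp500") demonstrates the kind of conversion an FTP
# server might perform when exchanging text with an EBCDIC host.
ascii_text = "HELLO, ARPANET"

ebcdic_bytes = ascii_text.encode("cp500")   # text -> EBCDIC bytes
round_trip = ebcdic_bytes.decode("cp500")   # EBCDIC bytes -> text again

# The EBCDIC byte values differ from the ASCII ones, but the round
# trip is lossless for ordinary text.
assert round_trip == ascii_text
assert ebcdic_bytes != ascii_text.encode("ascii")
```

The point of putting conversion in the protocol is exactly what the round trip shows: the two hosts can keep their native encodings, and neither side's users have to care.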
This protocol operates on a file level. Stuff like the Host-Host protocol emphasizes messages (what we might think of as packets), but Bhushan is very specifically architecting this around the higher-level concept of the file as a logical record, a la Mealy in RFC-91.
Here's Mealy as quoted in that RFC:
Our point of view is that a link is a carrier of information. Information is carried in segments of a fixed maximum length called messages. That this is so is an accident, from the user's point of view; when he wishes to transmit a contiguous stream of data, he will in general, segment it in a different (from the IMP-IMP or HOST-HOST protocol view) manner — we will call his segment a record.
The author suggests identifying people by username and host site name, and warns against using passwords. The idea is that the FTP host server should be the one providing the security, and that the server “can prevent unprivileged access by users from other host sites”, though I don't know how and he doesn't explain.
Things on FTP happen in what are called transactions. A transaction message consists of:
- a 72-bit header
- filler
- the actual data
Filler seems to serve the same purpose as “padding” does in the Host-Host protocol.
The header consists of:
- a transaction type byte
- a data type byte
- an extension of data type byte
- a byte to tell you how long the filler field is
- a 24-bit number that tells you how many bits long the data field is
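The header fields listed above can be sketched as a byte-packing routine. Note the hedge: the four single-byte fields plus the 24-bit length sum to 56 bits, while the RFC describes a 72-bit header, so this is a sketch of the listed fields only, not a wire-accurate encoding (the field names are mine):

```python
import struct

def pack_header(txn_type: int, data_type: int, data_type_ext: int,
                filler_len: int, data_len_bits: int) -> bytes:
    """Pack the fields listed above: four single bytes plus a 24-bit
    data length. Sketch only -- these fields sum to 56 bits, while the
    RFC's full header is 72 bits, so this is not wire-accurate."""
    if not 0 <= data_len_bits < 2 ** 24:
        raise ValueError("data length must fit in 24 bits")
    # Four 8-bit fields, then the 24-bit length as 3 big-endian bytes.
    return (struct.pack(">BBBB", txn_type, data_type,
                        data_type_ext, filler_len)
            + data_len_bits.to_bytes(3, "big"))

# Hypothetical transaction: type 1, ASCII data type, 112 data bits.
header = pack_header(txn_type=1, data_type=1, data_type_ext=0,
                     filler_len=0, data_len_bits=112)
assert len(header) == 7  # 56 bits of the listed fields
```

Counting the data field in bits rather than bytes is a period detail: hosts on the ARPANET didn't agree on byte sizes, so bit counts were the only portable unit.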
First of all, this is now an example of a third-level protocol implementing data types. Data types were specifically dropped from the Host-Host protocol the prior month, and Crocker et al encouraged third-level protocols to implement them if they wanted to. In this case the data type byte and its extension say things like “this is ASCII text” or “this is EBCDIC encoded” and so on. Bhushan recommends that all hosts support ASCII and binary, with other encodings being optional.
Thirty different potential data types are named! That's a lot, and I can't imagine any site supporting all thirty. The identification of ASCII and binary as the lowest common denominator makes practical sense.
Transaction types fall into these broad categories:
- request (messages that come from the user: login request, tell me what files you have, delete this file, etc)
- response (messages that come from the remote server: usually a kind of acknowledgement message)
- transfer (can be sent by the user or the remote server: this is the actual transfer of files back and forth)
- terminate (always comes from the server: the server letting the user know whether an action was successful or unsuccessful along with relevant error codes)
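The four categories above, and who is allowed to originate each, can be modeled as a toy table (the names and the `may_send` helper are my invention, purely to restate the list in code):

```python
from enum import Enum, auto

class Txn(Enum):
    REQUEST = auto()    # user -> server: login, list files, delete, etc.
    RESPONSE = auto()   # server -> user: acknowledgements
    TRANSFER = auto()   # either direction: the file data itself
    TERMINATE = auto()  # server -> user: success/failure + error codes

# Who may originate each category, per the list above.
ORIGINATORS = {
    Txn.REQUEST: {"user"},
    Txn.RESPONSE: {"server"},
    Txn.TRANSFER: {"user", "server"},
    Txn.TERMINATE: {"server"},
}

def may_send(sender: str, txn: Txn) -> bool:
    """Toy check: can this party originate this transaction type?"""
    return sender in ORIGINATORS[txn]

assert may_send("user", Txn.REQUEST)
assert not may_send("user", Txn.TERMINATE)  # terminate is server-only
```

The asymmetry is the interesting part: only transfer is bidirectional, which reflects the basic client/server shape FTP still has today.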
FTP from the very start comes with an “execute” request that allows the user to execute a file remotely on a server. According to Bhushan, the “execute request is intended to facilitate the indirect execution of programs and subroutines.”
Extensions to protocol
The protocol is open to extensions — namely, people are encouraged to add functionality to it as long as they don't remove or modify existing functionality. The author suggests adding new kinds of requests from users, new kinds of error messages, and new kinds of data types.
User access control is discussed at length as something that could be extended. Password support is proposed as a possible extension to the protocol that would aid in this.
Bhushan also suggests that the “execute” request could be further mediated. Right now the user needs to know the exact parameters that a remote program wants in order to execute it. He imagines a “program mediation protocol” that lets a user execute a program generically without worrying about the specifics of the program.
This RFC is a pretty historically important one. It lays the foundation for FTP, a file transfer protocol that is still in use today. While a lot of people today will beg you not to use it because it's totally insecure, the fact remains that people still use FTP every single day in 2019.
Bhushan's splitting of internet traffic into “direct” and “indirect” is really interesting to me. By this definition, almost all internet traffic in 2019 is “indirect”. For example, when I use HTTP to talk to a website, I don't log in as a user on their server. I just say “I would like index.html please” and the server says “okay, here is index.html”. In 1971, at least from my reading of how people were using it, most ARPANET traffic was “direct”.
I love that this proposal isn't like the proposal for DEL — it's a description of a protocol that is already in use at MIT, at least in its debugging stages. Bhushan says in an endnote that
The interim version of the protocol, limited to transfer of ASCII files, was developed by Chander Ramchandani and Howard Brodie of Project MAC. The ideas of transactions, descriptors, error recovery, aborts, file headings and attributes, execution of programs, and use of data types, pathnames, and default mechanisms are new here. Howard Brodie and Neal Ryan have coded the interim protocol in the PDP-10 and the 645, respectively.
The Mapping the Journey podcast has a 23-minute interview with Abhay Bhushan, and there's a full text transcript there too. It's well worth a listen/look. Compared to other authors of major early RFCs, especially for a protocol as important as FTP, there is surprisingly little interview material about Bhushan.
Above I link to a 2014 interview between Bob Braden and Jeffrey R. Yost, which mentions some FTP history.
Bhushan mentions several times that Metcalfe intends to write a paper on the “program mediation” concept for remote execution, but I can't find anything like it in his bibliography.