by Darius Kazemi, Mar 5 2019
Marking considered harmful
The technical content
This document is about how something called “marking” is undesirable and needs to be eliminated from the protocol. So, a refresher: what is marking?
Marking is a solution proposed in RFC-33 by Postel and Crocker. It was not invented by them, but they proposed its use to solve a problem. Quoting RFC-33, that problem is when
the sending Host wants to send an irregular length message and its hardware is only capable of sending word-multiple messages, some additional convention is needed.
Maybe a little explanation is in order. A “word” is the smallest unit of data that can be passed around by a given computer. For example, an 8-bit computer sends data around in 8-bit bundles called “words” (that's 8 ones and zeroes), and a 16-bit computer does the same but in 16-bit bundles. If an 8-bit computer wanted to store or send the binary message 11, which is two bits long, it would need to store or send something like 00000011. So an “irregular length message” is one that doesn't fit exactly as an even multiple of whatever word size a computer is using. Back in the ARPANET era there were 8-bit, 12-bit, 16-bit, and 32-bit computers. I'm sure there were other computers that used more esoteric word lengths too. But anyway, if we wanted to send a 17-bit message and we were on a 16-bit computer, we would pad out the message until it was 32 bits long, because that is a multiple of 16 that we can work with.
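The arithmetic here is simple enough to sketch in a few lines of Python (the function name and the bit-string representation are my own illustration, not anything from the RFC):

```python
def pad_to_word(bits: str, word_size: int) -> str:
    """Left-pad a bit string with 0s so its length is a multiple of word_size."""
    remainder = len(bits) % word_size
    if remainder:
        bits = "0" * (word_size - remainder) + bits
    return bits

print(pad_to_word("11", 8))             # an 8-bit machine stores "00000011"
print(len(pad_to_word("1" * 17, 16)))   # a 17-bit message on a 16-bit machine pads to 32 bits
```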
Side note: you might say, “What, a 32-bit computer back in that day?” Well, remember these were not mass produced consumer microcomputers. They were expensive mainframes! The MIT Whirlwind, which fully launched in 1951, was a 16-bit computer. The SDS Sigma 7 was a 32-bit computer launched in 1966. So while 16-bit computing didn't reach mass markets until the early 1980s, and 32-bit computing later than that, it was always available to anyone who wanted to shell out a few million dollars.
Anyway, the general proposal was that the sending Host would apply marking, which is a bunch of 0s followed by a 1, until the message text began on a word boundary, and the IMP would apply padding, which is a 1 followed by a bunch of 0s, so that the ultimate length of the entire message would end on a word boundary.
All of this is just so that the receiving machine knows where in the data a message starts, where it ends, and how to get there quickly and efficiently.
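Here's a toy sketch of both conventions, assuming a 16-bit word; this is my own simplified illustration of the idea, not the protocol's actual bit layout:

```python
WORD = 16  # assumed word size of the machine (illustrative)

def add_marking(header_bits: str) -> str:
    """Sender's marking: append 0s and then a single 1 so the
    message text that follows begins on a word boundary."""
    zeros = (WORD - (len(header_bits) + 1) % WORD) % WORD
    return header_bits + "0" * zeros + "1"

def add_padding(message_bits: str) -> str:
    """IMP's padding: append a 1 and then 0s so the whole
    message ends on a word boundary."""
    padded = message_bits + "1"
    padded += "0" * ((WORD - len(padded) % WORD) % WORD)
    return padded

header = "0" * 10            # a hypothetical 10-bit header
marked = add_marking(header) # marking brings it to 16 bits; text starts on a boundary
message = add_padding(marked + "1" * 17)  # a 17-bit text, padded out by the IMP
print(len(marked), len(message))          # both are multiples of 16
```

The 1 bits in the marking and padding act as delimiters: the receiver can strip trailing 0s back to the final 1 to recover exactly where the real data ends.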
One problem that this RFC brings up is when the sending Host has a different word length than the receiving Host. The RFC author shows a simple case where you could easily end up sending “112 bits to carry 8 bits of information”, which is very inefficient and costly in terms of time to transmit.
The solution proposed by Elie in this RFC involves embedding counters in the sending/receiving process and sending a “word complete” message on counter overflow. This is rather complex and I'm not even sure I fully understand it.
There is more to come on marking. Several more RFCs will address this problem. It's interesting to me how a relatively minor feature of the protocol is taking up most of the discussion time in the summer of 1970, at least in the RFC series.