Please review new control-spec.txt

Mon Jun 27 22:11:03 UTC 2005

(I finally got round to writing this down...)

--- Nick Mathewson <nickm at freehaven.net> wrote:

> At the recommendations of a Very Experienced Internet Person, I've
> taken a hard look at the control protocol, and come to agree with the
> V.E.I.P that the protocol that shipped with 0.1.0.10 is a pain in the
> neck.  It is just binary enough to be hard to use, with no real
> benefit.  The solution is to go with a nice text-based protocol like
> SMTP and HTTP and everybody else use.

Sorry to cut in here already, but after taking a look
at the new protocol, I'm left wondering: In which way
is it really simpler than the old one?

Message building and parsing: With the old one,
all you needed was a decent stream
implementation that lets you write standard
bytes/words/int32s, plus a way to convert a string to
bytes and back. The only "nasty" thing on the sender
side is that you need to store the message data in a
buffer first, get its length and send it before the
actual data. Parsing is equally straightforward. A
nice feature is that you can completely separate the
receiving part from the parsing, because you know the
length of the data block that constitutes a message.
In other words, you don't have to look at the content
of the messages to determine the message boundaries.
(Even fragmented messages can be hidden from the
parsing code easily.)
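
To make the point concrete, here is a minimal sketch of
length-prefixed framing. The header layout (a 2-byte
big-endian body length followed by a 2-byte type) and the
names pack_message and Framer are assumptions made for
illustration only; they are not taken from the spec text
quoted in this message.

```python
import struct

# Assumed header layout: body length, then message type (both big-endian).
HEADER = struct.Struct(">HH")


def pack_message(msg_type: int, body: bytes) -> bytes:
    """Buffer the body first, then prepend its length and the type."""
    return HEADER.pack(len(body), msg_type) + body


class Framer:
    """Splits a byte stream into (type, body) pairs without reading bodies."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> list:
        """Add received bytes and return every message that is now complete."""
        self._buf += chunk
        messages = []
        while len(self._buf) >= HEADER.size:
            length, msg_type = HEADER.unpack_from(self._buf)
            end = HEADER.size + length
            if len(self._buf) < end:
                break  # body not complete yet; wait for more bytes
            messages.append((msg_type, bytes(self._buf[HEADER.size:end])))
            del self._buf[:end]
        return messages
```

The receiving side only ever looks at the header, so the
code that interprets a body can live somewhere else
entirely and never sees a partial message.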

Now what would be easier with the new protocol? One
positive aspect is that there is no need for the
sender to capture the message length in advance. The
*abstract* message composing is just concatenate and
send. However, I see a lot of "little stuff" that is
introduced which, IMO, offsets this gain. For example,
writing or receiving a string now requires
escaping/unescaping. Byte sequences that were trivial
to send and read before now must be hex-encoded and
parsed back. To make the parsing code robust, possible
encoding errors must be caught that could not appear
before (for example, receiving "$ZK" as an encoded
byte). While none of these things is difficult, I fail
to see the "nice" part in it.
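
As an illustration of the extra checks, a hex field
decoder might look roughly like the sketch below. The
function name decode_hex_field is hypothetical, and the
sketch assumes the field arrives as bare hex digits with
any prefix character already stripped.

```python
import string

_HEX_DIGITS = set(string.hexdigits)


def decode_hex_field(text: str) -> bytes:
    """Decode a hex-encoded field, rejecting malformed input explicitly."""
    if len(text) % 2 != 0:
        raise ValueError(f"odd number of hex digits: {text!r}")
    if not set(text) <= _HEX_DIGITS:
        raise ValueError(f"non-hex character in field: {text!r}")
    return bytes.fromhex(text)
```

With the binary protocol neither error path exists,
because every byte value is valid payload.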

The only positive thing that I could see about a
text-based protocol is that it could be used directly
via a terminal session. As I don't know if this use is
intended, I haven't looked into this further.

(Note, however, that I'm not against a switch to a
text protocol; I'm just not convinced that the old one
is bad or more complicated.)

Ok, after these general observations, I'll comment on
some details of the new protocol itself. By the way:
Sorry if the general tone of the comments appears
negative - I hope it comes across as the constructive
criticism it is.

---SPEC---
3.2 GETCONF
...
  If all of the listed keywords exist in the Tor configuration, Tor replies
  with a series of reply lines of the form:
      250 keyword=value
---END---

I don't think it's a good idea to use the general 250
code for this, because it means the receiver has to
remember which command was sent before in order to
interpret the reply. Better to define a response code
that *always* means "this is the current state of this
CONF value" (e.g. "259 key=value"). Parsing the text of
a 250 reply should always be optional, IMO.

(Note: All of this applies to GETINFO as well; it
should, of course, get a different response code than
GETCONF.)
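
A rough sketch of what the receiver gains: with dedicated
codes, each reply line can be interpreted on its own. The
codes 259 (config value) and 258 (info value) are only the
ones suggested in this message, not part of the spec, and
the function names are made up for the example.

```python
def parse_reply_line(line: str):
    """Split 'NNN text' into an integer code and the remaining text."""
    code, _, text = line.partition(" ")
    if len(code) != 3 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"malformed reply line: {line!r}")
    return int(code), text


def handle_replies(lines):
    """Sort reply lines by code alone, without knowing the command sent."""
    conf, info = {}, {}
    for line in lines:
        code, text = parse_reply_line(line)
        key, _, value = text.partition("=")
        if code == 259:    # suggested code: current value of a CONF key
            conf[key] = value
        elif code == 258:  # suggested code: value of an INFO key
            info[key] = value
        # 250 and any other code: the text is not parsed here
    return conf, info
```

The handler never needs to know whether GETCONF or
GETINFO was the command that triggered the reply.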

---SPEC---
3.2 GETCONF
...
  If any option is set to a 'default' value semantically different from an
  empty string, Tor may reply with a reply line of the form:
      250 keyword
---END---

I'm not sure I understand this?

> Commands that take extra data start with "+";
...
>     C: +POSTDESCRIPTOR
>     C: router foobar 1.2.3.4 9001 0 9030
>     C: [... a server descriptor goes here ...]
>     C: .
>     S: 250 OK

Hmm. I'm not a fan of the "it's data until THIS line
appears" approach; it is just one more exception that
must be dealt with on both sides (because the
terminator line must be "escaped" when it appears in
the data).
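
For reference, the special-casing I mean looks roughly
like this on each side. The sketch assumes SMTP-style
dot-stuffing (a data line that starts with "." gets one
extra leading "."); the function names are hypothetical.

```python
def encode_data_lines(lines):
    """Sender side: escape leading dots, then append the terminator line."""
    wire = ["." + line if line.startswith(".") else line for line in lines]
    wire.append(".")
    return wire


def decode_data_lines(wire_lines):
    """Receiver side: read up to the lone '.', undoing the escaping."""
    data = []
    for line in wire_lines:
        if line == ".":
            return data
        # Any other line with a leading dot carries one escape dot.
        data.append(line[1:] if line.startswith(".") else line)
    raise ValueError("data block ended without a terminator line")
```

Both directions need the escape rule, and the receiver
has to inspect every data line to find the end of the
message.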

I pondered some different approaches, and I came up
with another solution. It's a common convention in
configuration files etc. to "continue" a line by ending
it with a backslash. How about ending every line that
has an associated follow-up line this way? This could
equally apply to command and response messages, giving
more consistency to the protocol. (A good way to think
about this might be an "escaped newline" - it
separates individual lines, but does not terminate the
message.)

The above example, rewritten (with extra data lines):

     C: POSTDESCRIPTOR \
     C: router foobar 1.2.3.4 9001 0 9030 \
     C: more descriptor data \
     C: even more descriptor data \
     C: last line of descriptor
     S: 250 OK

Rewriting this example of a server reply:

>     C: GETINFO version addr-mappings/cache
>     S: 250-version=Tor 0.1.1.0-alpha-cvs
>     S: 250+addr-mappings/cache
>     S: tor.eff.org=209.237.230.66
>     S: tor2.eff.org=209.237.230.67
>     S: .
>     S: 250 OK

gives, when combined with my above suggestion to
replace the 250 code:

     C: GETINFO version addr-mappings/cache
     S: 258 version=Tor 0.1.1.0-alpha-cvs
     S: 258 addr-mappings/cache \
     S: tor.eff.org=209.237.230.66 \
     S: tor2.eff.org=209.237.230.67
     S: 250 OK

The main benefit that I see is the consistent message
structure. Each logical unit starts with a message
code, followed by a space; and it consists of all
lines up to the first that does not end with a
backslash. In case of a client command, the message
code is always a string (like "GETINFO"); in case of a
reply, it's always a numeric code. Unless I missed
something, this approach requires no exceptions at
all.
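
A minimal sketch of a parser for this scheme, to show how
little it needs. The function names are made up, and the
sketch does not cover how a data line that itself ends in
a backslash would be represented, since I have not
defined that here.

```python
def split_messages(lines):
    """Group lines into messages: a trailing backslash continues a message."""
    messages, current = [], []
    for line in lines:
        if line.endswith("\\"):
            current.append(line[:-1].rstrip())  # drop the continuation mark
        else:
            current.append(line)
            messages.append(current)
            current = []
    if current:
        raise ValueError("input ended inside a continued message")
    return messages


def parse_message(msg_lines):
    """Return (code, rest of first line, follow-up lines) for one message."""
    code, _, first = msg_lines[0].partition(" ")
    return code, first, msg_lines[1:]
```

The same two functions serve commands and replies; only
the meaning of the code (a keyword or a number) differs.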

> If you have the time to see what I've gotten wrong in *this* version
> of the specification, that would be much appreciated.
> (Even telling me what is in bad taste would be helpful.)

I did my best :)

Regards,

Robert
