Proposal: GETINFO controller option for connection information

Fri Jun 4 04:10:31 UTC 2010

Hi Nick, thanks for the feedback!

> I'm fine with most of these, but the names are no good.  *Everything*
> that's returned from GETINFO is "info" after all
>

Heh, good point. Changed.

> The only one I wouldn't want to add as-is the ns/authorities option,
> since it says that it uses the v2 directory format.
>

I chose the v2 directory format since that's what all the other "ns" entries
use and it seems weird to have this be a black sheep. If this is really a
problem I'm happy to change it, though conformity to the rest of the spec
struck me as being important.

> Three reasons off the top of my head.  First: That's the format that
> RFCs use.  Second: we like to be able to use diff to compare different
> versions of a spec, and making every paragraph a single line makes
> diff's output much less useful.  Third: we like to version-control our
> specs, and it's a lot easier to resolve conflicts when every paragraph
> is not a single line.
>

I don't want to dwell on this too much since it's a pretty trivial issue,
but I disagree with points two and three. If you add or remove words, the
rewrapping throws off the diff and version control resolution for any text
following it anyway. That said, "conformity with the other specs, and since
most people here prefer it" strike me as perfectly fine reasons. ;)

> Do you mean their IDs, or their entries in the format specified above, or
> what?
>

I was tempted to go with just the IDs, but the other "/all" getinfo options
seem to provide all the content, so I'm going with that to fit in.

> Probably a "process-owner" notion is closer to what you want here.
> Also remember that it needs to work on Windows. ;)
>

Yup, that's why the user entry states that an empty string is provided if
none exists. I'm not yet sold on the usefulness of the uid, euid, gid, and
egid but happy to include them if others think it would be handy.

> At this point we're probably ready for another proposal revision
>

Your wish is my command! I've split it up into two proposals (see attached),
one concerning the circ options and the other listing the expansion of
relay/process getinfo options. Other changes include:
- added "relay/flags" and "desc/time" getinfo options
- changed the bandwidth entries to be both inbound and outbound (not sure
  how it would make sense otherwise...)
- changed the circ results to have a unix timestamp for when the circuit was
  created rather than the uptime (more generally useful and avoids the
  question of when the results were last updated)

Cheers! -Damian

On Tue, Jun 1, 2010 at 8:47 AM, Nick Mathewson <nickm at freehaven.net> wrote:

> I'll try to follow up to this whole thread and push things forward even
> more.
>
> On April 14, Damian wrote:
> [...]
> >Also, could we move forward on the other (less controversial) items? For
> >instance, bandwidth totals tend to be a very highly requested piece of
> >information and pipe's already provided a nice patch to get it (
> >http://www.mail-archive.com/or-talk@freehaven.net/msg13085.html). For
> >reference, here's the not-so-controversial GETINFO options I proposed:
>
> I'm fine with most of these, but the names are no good.  *Everything*
> that's returned from GETINFO is "info" after all, so prefixing all of
> these with "info" is redundant; they need to be put into a grouping
> that actually says what they mean.
>
> The only one I wouldn't want to add as-is the ns/authorities option,
> since it says that it uses the v2 directory format.  That format is
> obsolescent; we shouldn't be adding new things that use it.  A
> non-deprecated format would be fine.
>
>  [...]
> >I'm not planning on converting the following to the customary 80-character
> >width until it's at least past being a first draft for a couple reasons:
> >1. I find editing fixed-width documents to be a time consuming pain in the
> >ass.
>
> Maybe use a text editor that will re-wrap paragraphs for you?  There
> are dozens of them.
>
> > 2. I've yet to hear why we do this. Is it just to cater to mail clients
> > too dumb to know how to line wrap?
>
> Three reasons off the top of my head.  First: That's the format that
> RFCs use.  Second: we like to be able to use diff to compare different
> versions of a spec, and making every paragraph a single line makes
> diff's output much less useful.  Third: we like to version-control our
> specs, and it's a lot easier to resolve conflicts when every paragraph
> is not a single line.
>
> There may be more benefits too.
>
>
> On Fri, Apr 16, 2010 at 3:17 PM, Jacob Appelbaum <jacob at appelbaum.net>
> wrote:
> > Damian Johnson wrote:
> >> Yesterday Jake met with me to discuss this proposal, making the very
> >> good points that both:
> >>   1. It's completely ineffectual for the auditing purposes I've
> >> mentioned since either (a) these results can be fetched from netstat
> >> already or (b) the information would only be provided via tor and
> >> can't be validated.
> >>   2. The things I'm really interested in can be fetched with much less
> >> (and safer) information.
> >
> > I still think that anything that can be used to track circuits (and the
> > clients associated with them) is not a good idea - in Tor or using arm.
> > We shouldn't encourage people to log, look or otherwise track Tor.
> >
> >>
> >> In particular we discussed making the proposal circuit based rather
> >> than connection based, being something like the following:
> >>
> >>   "circ/<Circuit identity>" -- Provides entry for the associated circuit,
> >>     formatted as:
> >>       CIRC_ID IN_TYPE OUT_TYPE READ WRITE UPTIME
> >>
> >>     none of the parameters contain whitespace, and additional results must be
> >>     ignored to allow for future expansion. Parameters are defined as follows:
> >>       CIRC_ID - Unique identifier for the circuit this belongs to.
> >>       IN_TYPE/OUT_TYPE - Single character flags indicating the purpose of the
> >>         inbound or outbound connection. If no connection is established then
> >>         this provides an empty string. Otherwise, it consists of one from each
> >>         of the following categories (this may become longer in future
> >>         expansion):
> >>           Usage Type:
> >>             C: client traffic, R: relaying traffic,
> >>             X: control, H: hidden service, D: directory
> >>           Destination:
> >>             I: inter-tor connection, O: outside the tor network, L: localhost
> >>         For instance, "RO" would indicate that this was an established
> >>         1st-hop (or bridged) relay connection.
> >>       READ/WRITE - Total bytes read/written over the life of this connection.
> >>       UPTIME - Time the connection's been established in seconds.
>
> This looks a lot better; I don't see a good way to cause problems with
> this.
>
> >>   "circ/all" -- Newline separated listing of all current circuits.
>
> Do you mean their IDs, or their entries in the format specified above, or
> what?
>
>  [...]
> >>   SafeControlPort 0|1
> >>     Restricts access of the control port to only include read-only operations.
> >>     (Default: 0)
> >>
> >> Making this the default would be a no-go due to vidalia (though still
> >> a nice option to have...). If this is implemented its setting should
> >> be part of the PROTOCOLINFO response.
>
> I agree with Jake that this probably wants to be another proposal of
> its own, and get implemented independently.
>
> >> Finally, the other proposed GETINFO options still seem useful (with
> >> the possible exception of "info/uptime-reset"), and could be improved
> >> with the addition of:
> >>
> >>   "info/user" -- User under which the tor process is running, providing an
> >>     empty string if none exists.
> >>
> >
> > You may also want something like the following:
> >
> > "info/uid"
> > "info/euid"
> > "info/gid"
> > "info/egid"
>
> Probably a "process-owner" notion is closer to what you want here.
> Also remember that it needs to work on Windows. ;)
>
> Also see above caveat on the "info/" prefix.
>
> >>   "info/pid" -- Process id belonging to the tor process, -1 if none exists
> >>     for the platform.
> >>
> >> * this one is both useful and surprisingly difficult for me to
> >> retrieve at present (arm attempts to get it from pidof, ps, and
> >> netstat yet still fails on some systems...)
> >
> > The good news is that it's pretty easy to do in C:
> >
> >    pid_t pid;
> >    pid = getpid(); // see also getppid();
> >    printf("PID is: %d\n", pid);
>
> Fine by me, modulo calling it "info".
>
> At this point we're probably ready for another proposal revision, and
> a draft patch to implement all of the above. :)
>
> --
> Nick
>
-------------- next part --------------
Filename: xxx-circ-getinfo-option.txt
Title: GETINFO controller option for circuit information
Author: Damian Johnson
Created: 03-June-2010
Status: Draft

Overview:

    This details an additional GETINFO option that would provide information
    concerning a relay's current circuits.

Motivation:

    The original proposal was for connection related information, but Jake
    made two excellent points:

      1. Any information retrieved from the control port is completely
      ineffectual for auditing purposes since either (a) these results can be
      fetched from netstat already or (b) the information would only be
      provided via tor and can't be validated.

      2. The more useful applications of connection information can be
      achieved with much less (and safer) information.

    Hence the proposal is now for circuit based rather than connection based
    information. This would strip the most controversial and sensitive data
    entirely (ip addresses, ports, and connection based bandwidth breakdowns)
    while still being useful for the following purposes:

    - Basic Relay Usage Questions
    How is the bandwidth I'm contributing broken down? Is it being evenly
    distributed or is someone hogging most of it? Do these circuits belong to
    the hidden service I'm running or something else? Now that I'm using exit
    policy X am I desirable as an exit, or are most people just using me as a
    relay?

    - Debugging
    Say a relay operator has a restrictive firewall policy for outbound
    connections, with the ORPort whitelisted, but doesn't realize that tor
    needs random high ports. Tor would report success ("your orport is
    reachable - excellent") yet the relay would be nonfunctional. This
    proposed information would reveal numerous RELAY -> YOU -> UNESTABLISHED
    circuits, giving a good indicator of what's wrong.

    - Visualization
    A nice benefit of visualizing tor's behavior is that it becomes a helpful
    tool in puzzling out how tor works. For instance, tor spawns numerous
    client connections at startup (even if unused as a client). As a newcomer
    to tor these asymmetric (outbound only) connections mystified me for quite
    a while until Roger explained their use to me. The proposed IN_TYPE and
    OUT_TYPE flags would let controllers clearly label them as being client
    related, making their purpose a bit clearer.
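
    As a rough illustration of the usage and debugging questions above, a
    controller could tally the proposed entries along these lines. This is
    only a sketch under assumptions: summarize_circuits is a hypothetical
    helper, the sample lines are made up, fields are assumed to be separated
    by exactly one space (so an empty type field survives), and "circ/all" is
    assumed to return one full entry per line in the format given in the
    Specification section.

```python
from collections import defaultdict


def summarize_circuits(circ_all_reply):
    """Tally a "circ/all" style reply (hypothetical helper, not part of tor).

    Returns a pair: inbound bytes per usage flag, and the number of circuits
    that have an inbound connection but no established outbound one.
    """
    bytes_by_usage = defaultdict(int)
    unestablished = 0
    for line in circ_all_reply.splitlines():
        if not line:
            continue
        # Assumption: exactly one space between fields, so an empty
        # IN_TYPE/OUT_TYPE shows up as an empty string after the split.
        fields = line.split(" ")
        in_type, in_read, in_write = fields[2], int(fields[3]), int(fields[4])
        out_type = fields[5]
        usage = in_type[0] if in_type else "?"  # first flag is the usage type
        bytes_by_usage[usage] += in_read + in_write
        if in_type and not out_type:
            unestablished += 1
    return dict(bytes_by_usage), unestablished


# Made-up sample: one relayed circuit, one with no outbound side, one client.
sample = (
    "5 1275538231 RI 2048 512 RO 512 2048\n"
    "7 1275538231 RI 10 0  0 0\n"
    "9 1275538300 CL 100 100 CI 100 100\n"
)
usage_totals, unestablished_count = summarize_circuits(sample)
```

    A large unestablished count relative to the total would be the kind of
    hint the firewall scenario above describes.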

    At the moment connection data can only be retrieved via commands like
    netstat, ss, and lsof. However, providing an alternative via the control
    port provides several advantages:

      - scrubbing for private data
          Raw connection data has no notion of what's sensitive and what is
          not. The relay's flags and cached consensus can be used to take
          educated guesses concerning which connections could possibly belong
          to client or exit traffic, but this is both difficult and inaccurate.
          Anything provided via the control port can be scrubbed to make sure we
          aren't providing anything we think relay operators should not see.

      - additional information
          All connection querying commands strictly provide the ip address and
          port of connections, and nothing else. However, for the uses listed
          above the far more interesting attributes are the circuit's type,
          bandwidth usage and uptime.

      - improved performance
          Querying connection data is an expensive activity, especially for
          busy relays or low end processors (such as mobile devices). Tor
          already internally knows its circuits, allowing for vastly quicker
          lookups.

      - cross platform capability
          The connection querying utilities mentioned above not only aren't
          available under Windows, but differ widely among different *nix
          platforms. FreeBSD in particular takes a very unique approach,
          dropping important options from netstat and assigning ss to a
          spreadsheet application instead. A controller interface, however,
          would provide a uniform means of retrieving this information.

Security Implications:

    This is an open question. This proposal lacks the most controversial pieces
    of information (ip addresses and ports), and insight into potential threats
    it would pose would be very welcome!

Specification:

   The following addition would be made to the control-spec's GETINFO section:

  "circ/<Circuit identity>" -- Provides entry for the associated circuit,
    formatted as:
      CIRC_ID CREATED IN_TYPE IN_READ IN_WRITE OUT_TYPE OUT_READ OUT_WRITE

    none of the parameters contain whitespace, and additional results must be
    ignored to allow for future expansion. Parameters are defined as follows:
      CIRC_ID - Unique identifier for the circuit this belongs to.
      CREATED - Unix timestamp for when the circuit was created.
      IN_TYPE/OUT_TYPE - Single character flags indicating the purpose of the
        inbound or outbound connection. If no connection is established then
        this provides an empty string. Otherwise, it consists of one from each
        of the following categories (this may become longer in future
        expansion):
          Usage Type:
            C: client traffic, R: relaying traffic,
            X: control, H: hidden service, D: directory
          Destination:
            I: inter-tor connection, O: outside the tor network, L: localhost
        For instance, "RO" would indicate that this was an established
        1st-hop (or bridged) relay connection.
      IN_READ/IN_WRITE, OUT_READ/OUT_WRITE - Total bytes read/written over the
        inbound or outbound connection during the life of this circuit.

  "circ/all" -- Newline separated listing of the entries (in the format
    above) for all current circuits.
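
    A minimal sketch of how a controller might parse these entries follows.
    It is not part of the proposal: CircEntry, parse_circ_entry and
    parse_circ_all are hypothetical names, and two things are assumed rather
    than specified. First, fields are separated by exactly one space, which
    is what lets an empty IN_TYPE or OUT_TYPE be told apart from a missing
    field. Second, "circ/all" returns one full entry per line.

```python
import time
from dataclasses import dataclass


@dataclass
class CircEntry:
    circ_id: str
    created: int    # unix timestamp of circuit creation
    in_type: str    # e.g. "RI", or "" if no inbound connection
    in_read: int
    in_write: int
    out_type: str   # e.g. "RO", or "" if no outbound connection
    out_read: int
    out_write: int

    def age(self, now=None):
        """Seconds since creation (replaces the earlier UPTIME field)."""
        if now is None:
            now = time.time()
        return int(now) - self.created


def parse_circ_entry(line):
    # Assumption: single-space separators, so empty type fields become "".
    fields = line.rstrip("\r\n").split(" ")
    if len(fields) < 8:
        raise ValueError("expected at least 8 fields, got %d" % len(fields))
    # Extra trailing fields are ignored to allow for future expansion.
    (circ_id, created, in_type, in_read, in_write,
     out_type, out_read, out_write) = fields[:8]
    return CircEntry(circ_id, int(created), in_type, int(in_read),
                     int(in_write), out_type, int(out_read), int(out_write))


def parse_circ_all(reply):
    # Assumption: one full entry per line.
    return [parse_circ_entry(line) for line in reply.splitlines() if line]
```

    For example, parse_circ_entry("7 1275538231 RI 10 0  0 0") would yield an
    entry whose out_type is the empty string, since the double space marks an
    empty OUT_TYPE field.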
-------------- next part --------------
Filename: xxx-getinfo-option-expansion.txt
Title: GETINFO Option Expansion
Author: Damian Johnson
Created: 02-June-2010
Status: Draft

Overview:

    Over the course of developing arm there have been numerous hacks and
    workarounds to glean pieces of basic, desirable information about the tor
    process. As per Roger's request I've compiled a list of these pain points
    to try and improve the control protocol interface.

Motivation:

    The purpose of this proposal is to expose additional process and relay
    related information that is currently unavailable in a convenient,
    dependable, and/or platform independent way. Examples of this are...

      - The relay's total contributed bandwidth. This is a highly requested
        piece of information and, based on the following patch from pipe, looks
        trivial to include.
        http://www.mail-archive.com/or-talk@freehaven.net/msg13085.html

      - The process ID of the tor process. There is a high degree of guess work
        in obtaining this. Arm for instance uses pidof, netstat, and ps yet
        still fails on some platforms, and Orbot recently got a ticket about
        its own attempt to fetch it with ps:
        https://trac.torproject.org/projects/tor/ticket/1388

    This just includes the pieces of missing information I've noticed
    (suggestions or questions of their usefulness are welcome!).

Security Implications:

    None that I'm aware of. From a security standpoint this seems decently
    innocuous.

Specification:

    The following additions would be made to the control-spec's GETINFO section:

    "relay/bw-limit" -- Effective relayed bandwidth limit (currently
    RelayBandwidthRate if set, otherwise BandwidthRate).

    "relay/burst-limit" -- Effective relayed burst limit.

    "relay/read-total" -- Total bytes relayed (download).

    "relay/write-total" -- Total bytes relayed (upload).

    "relay/flags" -- Flags currently held by the relay.

    "desc/time" -- Unix timestamp for when the latest server descriptor was
    fetched.

    "process/user" -- User under which the tor process is running, providing an
    empty string if none exists.

    "process/pid" -- Process id belonging to the tor process, -1 if none exists
    for the platform.

    "process/uptime" -- Total uptime of the tor process (in seconds).

    "process/uptime-reset" -- Time since last reset (startup or sighup signal,
    in seconds).

    "process/descriptors-used" -- Count of file descriptors used.

    "process/descriptor-limit" -- File descriptor limit (getrlimit results).

    "ns/authority" -- Router status info (v2 directory style) for all
    recognized directory authorities, joined by newlines.
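
    To illustrate how a controller might consume these options, here is a
    small sketch that parses a GETINFO reply in the control protocol's usual
    "250-key=value" / "250+key=" framing. The helper names
    (parse_getinfo_reply, pid_or_none) and the sample reply are hypothetical,
    and the option names are the ones proposed above, which tor may not
    support as written.

```python
def parse_getinfo_reply(reply):
    """Turn a raw GETINFO reply into a {key: value} dict (hypothetical helper)."""
    results = {}
    lines = reply.replace("\r\n", "\n").split("\n")
    i = 0
    while i < len(lines):
        line = lines[i]
        i += 1
        if line.startswith("250-"):
            # Single-line value: 250-key=value
            key, _, value = line[4:].partition("=")
            results[key] = value
        elif line.startswith("250+"):
            # Multi-line value: data lines follow until a lone "."
            key = line[4:].partition("=")[0]
            body = []
            while i < len(lines) and lines[i] != ".":
                data = lines[i]
                # Undo dot-stuffing of data lines that begin with a period.
                body.append(data[1:] if data.startswith("..") else data)
                i += 1
            i += 1  # skip the terminating "."
            results[key] = "\n".join(body)
        elif line.startswith("250 "):
            break  # final "250 OK"
        elif line:
            raise ValueError("unexpected reply line: %r" % line)
    return results


def pid_or_none(info):
    # The proposal uses -1 for platforms without a process id.
    pid = int(info["process/pid"])
    return None if pid == -1 else pid


# Made-up reply for illustration only.
sample_reply = (
    "250-process/pid=1234\r\n"
    "250-process/user=\r\n"
    "250+ns/authority=\r\n"
    "line one\r\n"
    "line two\r\n"
    ".\r\n"
    "250 OK\r\n"
)
info = parse_getinfo_reply(sample_reply)
```

    Mapping the -1 and empty-string sentinels to None in one place keeps the
    platform differences (such as Windows lacking a user) out of the rest of
    the controller.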