[tor-dev] Brainstorming ideas for controller features for improved testing; want feedback

Nick Mathewson nickm at torproject.org
Thu Mar 26 17:08:44 UTC 2015


On Fri, Mar 20, 2015 at 11:55 AM, Nick Mathewson <nickm at torproject.org> wrote:
> Hi!  I've got an end-of-month deliverable to flesh out as many good
> ideas here as I can, and I'd appreciate feedback on what kind of
> features it would be good to add to the controller protocol in order
> to better support testing.
>
> More ideas would be most welcome.

Thanks for the comments, all!

Here's the latest version, with more things folded in, and some syntax sketches.

All syntaxes below are examples meant to focus discussion around
what might appear in a spec, not a final (ha!) formal specification.

IDEAS
=====

1. Step-by-step hidden service connections

   Add the ability to create connections to hidden services step by
   step, to best

   What's necessary here is commands to:
      * Establish a rendezvous point on a given circuit.
      * Construct and send an introduce1 cell on a given circuit.
      * Realize that a rendezvous circuit has been constructed.

   ESTABLISH_REND {circuit_num} {rend_token}

   SEND_INTRO1 {circuit_num} intro_key={key} authinfo={info}

   650 RECEIVED_REND_OK {circuit_num}
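   A controller driving these steps would format each command and then
   wait for the matching event. Below is a minimal Python sketch. It
   assumes the hypothetical syntax above, a rend_token sent as hex (20
   bytes, the length of a v2 rendezvous cookie), and opaque strings for
   intro_key and authinfo. None of these commands exist in Tor today.

```python
def establish_rend_cmd(circuit_num: int, rend_token: bytes) -> str:
    """Format the hypothetical ESTABLISH_REND command (token sent as hex)."""
    return f"ESTABLISH_REND {circuit_num} {rend_token.hex()}\r\n"


def send_intro1_cmd(circuit_num: int, intro_key: str, authinfo: str) -> str:
    """Format the hypothetical SEND_INTRO1 command; both values are opaque here."""
    return f"SEND_INTRO1 {circuit_num} intro_key={intro_key} authinfo={authinfo}\r\n"


def is_rend_ok(event_line: str, circuit_num: int) -> bool:
    """True if the line is the RECEIVED_REND_OK event for our rendezvous circuit."""
    return event_line.split() == ["650", "RECEIVED_REND_OK", str(circuit_num)]
```

   A test would send establish_rend_cmd() on one circuit and
   send_intro1_cmd() on another. It would then block until is_rend_ok()
   matches an incoming event line.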

2. Send a single cell on a circuit

   (TESTING ONLY)

   For fuzzing and low-level testing purposes, it would be handy to be
   able to send a single cell on a tor circuit.

   This might be better to expose via a low-level modular API than via
   the control port.

   SEND_RAW_RELAY_CELL {circuit_num} LAYER={cpath layer} DATA={hex}
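   Suppose DATA carried a complete, unencrypted relay cell body. A fuzzer
   could then build it from the relay header layout in tor-spec.txt:
   command (1 byte), 'recognized' (2), stream ID (2), digest (4), length
   (2), then data, 509 bytes in all. Two questions are still open: whether
   DATA should be the whole body, and who computes the digest. This sketch
   assumes the whole body, with a zeroed digest that Tor would fill in.

```python
import struct

PAYLOAD_LEN = 509      # cell payload size from tor-spec.txt
RELAY_HEADER_LEN = 11  # command(1) recognized(2) stream_id(2) digest(4) length(2)


def relay_body(relay_cmd: int, stream_id: int, data: bytes) -> bytes:
    """Build an unencrypted relay cell body, zero-padded to PAYLOAD_LEN bytes."""
    if len(data) > PAYLOAD_LEN - RELAY_HEADER_LEN:
        raise ValueError("relay data too long for one cell")
    # recognized=0 and digest=0: assumption that Tor computes the digest itself.
    header = struct.pack(">BHHIH", relay_cmd, 0, stream_id, 0, len(data))
    return (header + data).ljust(PAYLOAD_LEN, b"\x00")


def send_raw_relay_cell_cmd(circuit_num: int, layer: int, body: bytes) -> str:
    """Format the hypothetical SEND_RAW_RELAY_CELL command."""
    return f"SEND_RAW_RELAY_CELL {circuit_num} LAYER={layer} DATA={body.hex()}\r\n"
```

   For example, relay_body(2, 1, b"hello") builds a RELAY_DATA (command 2)
   body for stream 1.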

3. Intercept cell by cell on a circuit

   (TESTING ONLY)

   For fuzzing, testing, and debugging purposes, it might be handy for
   a controller to be able to observe data cell by cell on a circuit of
   interest.

   This might be better to expose via a low-level modular API than via
   the control port.

   650 RELAY_CELL {circuit_num} LAYER={cpath layer} DATA={hex}
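   On the receiving side, a test harness would decode the event back into
   relay header fields. As with SEND_RAW_RELAY_CELL, this sketch assumes
   that DATA is the decrypted relay body at the given LAYER.

```python
import struct
from dataclasses import dataclass


@dataclass
class RelayCellEvent:
    circuit_num: int
    layer: int
    relay_cmd: int
    stream_id: int
    data: bytes


def parse_relay_cell_event(line: str) -> RelayCellEvent:
    """Parse '650 RELAY_CELL <circ> LAYER=<n> DATA=<hex>' (hypothetical event)."""
    parts = line.split()
    if parts[:2] != ["650", "RELAY_CELL"]:
        raise ValueError("not a RELAY_CELL event")
    kv = dict(p.split("=", 1) for p in parts[3:])
    body = bytes.fromhex(kv["DATA"])
    # Header layout from tor-spec.txt; recognized and digest are ignored here.
    cmd, _recognized, stream_id, _digest, length = struct.unpack(">BHHIH", body[:11])
    return RelayCellEvent(int(parts[2]), int(kv["LAYER"]), cmd, stream_id,
                          body[11:11 + length])
```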

4. Send a single cell on a connection.

   (TESTING ONLY)

   As 2, but for connections.  Note that we might even, for testing,
   expose this at a sub-cell level.

   SEND_RAW_CELL {conn id} DATA={hex}

5. Intercept all cells on a connection

   (TESTING ONLY)

   As 3, but for connections.

   650 CONN_CELL {conn id} DATA={hex}
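   A connection carries a byte stream, not neatly framed relay cells. So a
   consumer of CONN_CELL (or a producer of SEND_RAW_CELL data) needs the
   link-level framing from tor-spec.txt:
      * a circuit ID: 4 bytes on link protocol 4 and later, 2 before;
      * a command byte;
      * then either a fixed 509-byte payload, or, for VERSIONS (7) and
        commands 128 and above, a 2-byte length and a variable payload.

   The sketch below assumes DATA holds those raw bytes. The caller must
   still handle the VERSIONS cell, which always uses a 2-byte circuit ID.

```python
CELL_PAYLOAD_LEN = 509


def is_var_cell(command: int) -> bool:
    # tor-spec.txt: VERSIONS (7) and commands 128 and above are variable-length.
    return command == 7 or command >= 128


def split_cells(data: bytes, circ_id_len: int = 4):
    """Split raw link-layer bytes into (circ_id, command, payload) tuples.

    Returns the complete cells plus any trailing partial cell, so a caller
    can prepend the leftover bytes to the next DATA chunk.
    """
    cells, pos = [], 0
    while True:
        hdr_end = pos + circ_id_len + 1
        if hdr_end > len(data):
            break
        circ_id = int.from_bytes(data[pos:pos + circ_id_len], "big")
        command = data[hdr_end - 1]
        if is_var_cell(command):
            if hdr_end + 2 > len(data):
                break
            length = int.from_bytes(data[hdr_end:hdr_end + 2], "big")
            start = hdr_end + 2
        else:
            length, start = CELL_PAYLOAD_LEN, hdr_end
        if start + length > len(data):
            break
        cells.append((circ_id, command, data[start:start + length]))
        pos = start + length
    return cells, data[pos:]
```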

6. Plug-in to handle a relay or other command.

   Right now, all Tor's features need to be baked into Tor; it's not
   easy to write extensions.  We could change that by having the
   controller able to intercept particular relay or extension commands
   and act accordingly.  This could be used for prototyping new
   features, etc.

   HANDLE_EXTENSION {conn=id|circ=id|conn=*|circ=*} {command_num}

   650 EXTENSION {conn=id|circ=id} {command_num} DATA
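   On the controller side, a prototype extension could be a table of
   handlers keyed by command number. This sketch assumes the
   HANDLE_EXTENSION/EXTENSION syntax above and hex-encoded DATA. The class
   name is made up for illustration.

```python
from typing import Callable, Dict

Handler = Callable[[str, bytes], None]


class ExtensionDispatcher:
    """Route hypothetical '650 EXTENSION' events to per-command handlers."""

    def __init__(self) -> None:
        self.handlers: Dict[int, Handler] = {}

    def register(self, command_num: int, handler: Handler) -> str:
        """Store the handler; return the command that claims this number."""
        self.handlers[command_num] = handler
        return f"HANDLE_EXTENSION circ=* {command_num}\r\n"

    def on_event(self, line: str) -> bool:
        """Dispatch one event line; return True if a handler consumed it."""
        parts = line.split()
        if parts[:2] != ["650", "EXTENSION"] or len(parts) < 4:
            return False
        target, command_num = parts[2], int(parts[3])
        data = bytes.fromhex(parts[4]) if len(parts) > 4 else b""  # assumes hex DATA
        handler = self.handlers.get(command_num)
        if handler is None:
            return False
        handler(target, data)
        return True
```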

7. Force a given protocol on a given connection

   We could add a feature to restrict what protocols can be negotiated
   on a given connection we create.  This could help us better test our
   protocols for interoperability.

   OPEN_CONN {addr=address...} {protos=1,2,3...} {KEYID1=...}

8. Examine fine-grained connection detail.

   There are many data available for a given connection (such as
   fine-grained TLS information) that are not currently exposed on the
   GETINFO interface.  We could make most of this available for testing,
   pending security analysis.

   GETINFO orconn/<id>/all
   GETINFO orconn/<id>/item
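   New keys such as orconn/<id>/all would presumably reuse the existing
   GETINFO reply format from control-spec.txt:
      * '250-key=value' lines;
      * '250+key=' followed by dot-terminated data for multi-line values;
      * a final '250 OK'.

   Here is a client-side parser for that format. It assumes the new keys
   follow the format unchanged.

```python
def parse_getinfo_reply(lines):
    """Parse GETINFO reply lines (CRLF already stripped) into a dict.

    Handles single-line '250-key=value' entries and multi-line '250+key='
    entries terminated by a lone '.', undoing the leading-dot escaping.
    """
    result, i = {}, 0
    while i < len(lines):
        line = lines[i]
        code, sep, rest = line[:3], line[3:4], line[4:]
        if code != "250":
            raise ValueError(f"error reply: {line}")
        if sep == " ":  # the final '250 OK'
            break
        key, _, value = rest.partition("=")
        if sep == "-":
            result[key] = value
            i += 1
        elif sep == "+":
            body, i = [], i + 1
            while lines[i] != ".":
                body.append(lines[i][1:] if lines[i].startswith(".") else lines[i])
                i += 1
            result[key] = "\n".join(body)
            i += 1  # skip the terminating '.'
        else:
            raise ValueError(f"malformed line: {line}")
    return result
```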

9. Examine cache in detail

   In the past we've seen crazy issues with our descriptor caching
   code.  It might be good to expose, for testing, information about
   where exactly descriptors are stored, what attributes are set on
   them, and so on.  We could also expose events for cache compaction
   and for discarding expired descriptors.

   This might also (TESTING ONLY) extend to events for access of cache
   documents, information about HS Descs, etc, though that's a bit worrisome.

   GETINFO cache/<itemtype>/keys
   GETINFO cache/<itemtype>/by-key/<key>/ ...
            raw
            annotations
            location

   650 CACHE_COMPACTED (bytes_saved)
   650 CACHE_DISCARDED (item)

10. Fetch literal documents

   Currently there's no way for a controller to ask Tor to download a
   given descriptor or microdescriptor or networkstatus.  That could
   change.

   DOWNLOAD <itemtype> <keys> [{circ=circ_id} {anon=0|1} {cache=router_id}]

11. OOM stats

   To resist out-of-memory attacks, we track our memory usage and kill
   off circuits as needed when memory gets low.  We could expose the
   memory thresholds and current sizes via one or more controller
   commands.

   GETINFO mem-usage/all
   GETINFO mem-usage/<type>

   650 MEM_USAGE k=v k=v k=v
   650 MEM_FREED k=v k=v k=v
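   A test harness can consume 'k=v k=v' events easily if the values are
   integers. This sketch assumes byte counts. It uses hypothetical keys
   'total' and 'limit'; 'limit' stands in for the MaxMemInQueues
   threshold.

```python
def parse_kv_event(line: str):
    """Split '650 NAME k=v k=v ...' into (NAME, {k: int(v)})."""
    parts = line.split()
    if len(parts) < 2 or parts[0] != "650":
        raise ValueError("not an asynchronous event")
    fields = {}
    for item in parts[2:]:
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected k=v, got {item!r}")
        fields[key] = int(value)  # assumption: every value is a byte count
    return parts[1], fields


def usage_ratio(fields: dict) -> float:
    """Fraction of the OOM threshold in use (hypothetical 'total'/'limit' keys)."""
    return fields["total"] / fields["limit"]
```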

12. Timeout values

   Tor has a truly huge variety of internal timers to ensure that given
   periodic events happen often enough; we could expose those, and (TESTING
   ONLY) allow controllers to adjust them or trigger their corresponding
   events.

   GETINFO timer/list
   GETINFO timer/expires/<timer-name>

   ADJTIMER <timer-name> <seconds>.<msec>

   650 TIMER <timer-name> next=<seconds.msec>
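   The '<seconds>.<msec>' format is easiest to get right by carrying
   milliseconds as integers, so no float rounding creeps into a test's
   timer adjustments. Here is a sketch with helpers for ADJTIMER and the
   TIMER event. The timer names in the tests are hypothetical.

```python
def format_timer_value(ms: int) -> str:
    """Render a millisecond count as '<seconds>.<msec>' with three msec digits."""
    if ms < 0:
        raise ValueError("negative timer value")
    return f"{ms // 1000}.{ms % 1000:03d}"


def parse_timer_value(text: str) -> int:
    """Inverse of format_timer_value; '1.5' means 1.5 s, i.e. 1500 ms."""
    secs, _, msec = text.partition(".")
    return int(secs) * 1000 + int(msec.ljust(3, "0")[:3])


def adjtimer_cmd(name: str, ms: int) -> str:
    """Format the hypothetical ADJTIMER command."""
    return f"ADJTIMER {name} {format_timer_value(ms)}\r\n"


def parse_timer_event(line: str):
    """Parse '650 TIMER <timer-name> next=<seconds.msec>' into (name, ms)."""
    _, event, name, field = line.split()
    key, _, value = field.partition("=")
    if event != "TIMER" or key != "next":
        raise ValueError("not a TIMER event")
    return name, parse_timer_value(value)
```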

13. Detailed connection debugging info

   Current connection events expose only large-scale state changes in
   connections; we could instead expose every state transition at the
   cell-by-cell handshake level.

14. Detailed circuit debugging info

   As 13 but for circuits.

15. Halt main loop except for control layer.

   (TESTING ONLY)

   For inspection/debugging purposes, it might be clever to have Tor be
   able to freeze itself, except for the control layer, and let the
   controller inquire about information.

   This presents implementation challenges, and is probably not a great
   idea to do before a _big_ refactoring.

   FREEZE {LOOP|CONN=id|CIRCUIT=id}
   UNFREEZE {LOOP|CONN=id|CIRCUIT=id}
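   Whatever the implementation, a test harness should make sure a frozen
   Tor is always unfrozen, even when the inspection code throws. This
   sketch uses a Python context manager. `send` is a hypothetical
   callable that writes one command and returns the reply line.

```python
from contextlib import contextmanager
from typing import Callable


@contextmanager
def frozen(send: Callable[[str], str], target: str = "LOOP"):
    """Send FREEZE, yield, and always send UNFREEZE afterwards.

    `target` is LOOP, CONN=<id> or CIRCUIT=<id>, as in the sketch above.
    """
    reply = send(f"FREEZE {target}\r\n")
    if not reply.startswith("250"):
        raise RuntimeError(f"FREEZE failed: {reply}")
    try:
        yield
    finally:
        send(f"UNFREEZE {target}\r\n")
```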

16. Service a single connection

   (TESTING ONLY)

   Currently controllers can disable circuit construction or stream
   attachment, and do them manually.  We might also do this for
   connections, allowing a testing controller to trace what Tor does
   cell by cell on a single connection.

   STEP {LOOP|CONN=id|CIRCUIT=id}

17. All rephist data

    There are many data about history and usage in rephist.c (which
    stands for 'reputation and history!').  We could expose them, to let
    us better test them.  Some of this might be useful for Seth
    (previously arm) users.

    Spec: This would use GETINFO extensions, and probably some new
    events.

18. All ratelim data

    Sometimes our rate-limiting code can get wonky.  It would be great
    to expose it to Tor controllers in order to help ensure it's
    behaving correctly.  This would include send/receive windows and bw
    stuff.

    Spec: This would use GETINFO extensions, and probably some new
    events.

19. All accounting data

    As 17, but for hibernate.c, which performs bandwidth accounting.

20. All guard transitions

    Our guard node state logic is very complicated, and much in need of
    testing and refactoring.  Exposing more state transitions and guard
    selection transitions to the controller might help.  (We have a
    "GUARD" event now, but it is a bit out of sync with the main
    implementation.)

21. All key transitions

    We could generate events every time we change keys, and (TESTING
    ONLY) allow a controller to time-out a key early.

22. Examine mux settings

    (TESTING ONLY)

    The circuitmux code that we use to decide which cell to send next is
    very complex; it would be good to expose its thinking and
    decisionmaking to a testing controller for better observation.

23. Examine pathbias settings

    Our pathbias code is also complex, and a bit flakier than the
    circuitmux code.  We could do for it as with 22 above.

24. Examine cpuworker queues

    As of 0.2.6, we have a new cpuworker infrastructure that is better at
    handing work to worker threads, but we have little visibility into how
    well it's working.  Exposing some information about this to the
    controller could help us tune better.

    650 QUEUED_WORK TYPE=type ID=id RAW=hex NUM_QUEUED=n
    650 FINISHED_WORK TYPE=type ID=id STATUS=status RAW=hex

    GETINFO cpuworker/all/n_queued
    GETINFO cpuworker/all/rate
    GETINFO cpuworker/<num>/...
    GETINFO cpuworker/<type>/...
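    With these events, a controller could measure per-type queue latency
    itself. The sketch below assumes the event syntax above and pairs
    QUEUED_WORK with FINISHED_WORK by TYPE and ID. The proposed events
    carry no time field, so the caller timestamps each event on arrival.

```python
from typing import Dict, List


class WorkTracker:
    """Track cpuworker job latency from the hypothetical QUEUED/FINISHED events."""

    def __init__(self) -> None:
        self.pending: Dict[str, float] = {}
        self.latencies: Dict[str, List[float]] = {}

    def on_event(self, line: str, now: float) -> None:
        """Feed one event line with its arrival time (e.g. time.monotonic())."""
        parts = line.split()
        if len(parts) < 2 or parts[0] != "650":
            return
        kv = dict(p.split("=", 1) for p in parts[2:] if "=" in p)
        work_type = kv.get("TYPE", "?")
        key = work_type + "/" + kv.get("ID", "?")
        if parts[1] == "QUEUED_WORK":
            self.pending[key] = now
        elif parts[1] == "FINISHED_WORK" and key in self.pending:
            started = self.pending.pop(key)
            self.latencies.setdefault(work_type, []).append(now - started)

    def mean_latency(self, work_type: str) -> float:
        samples = self.latencies.get(work_type, [])
        return sum(samples) / len(samples) if samples else 0.0
```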

25. All geoip data

    As 17, but for any geoip information we aren't currently exporting.

26. Replay detection

    For hidden service security (and maybe eventually for secure ntor Y
    reuse) we have to keep replay caches to prevent us from being
    tricked into handling the same value twice.  We don't expose the
    load on these caches to the controller, however.  We could, to help
    us tune them toward a good memory/error-rate trade-off.

    GETINFO replaycache/<type>/load
    GETINFO replaycache/<type>/size

27. Hidden service intropoint changes, desc changes, uploads  (+2)

    Many hidden service transitions currently generate no events.  We
    could at minimum generate events for changed introduction points,
    changed hidden service descriptors, and uploads of our own HS descriptor.


    650 HSSERV <id> <eventtype> k=v k=v k=v

28. Descriptor uploads.

    We have an event for when our descriptor has changed, but not for
    when it has been successfully uploaded.  We could fix that.

    650 UPLOADED <type> <authority> <status>

29. Path generation logic -- expose, allow.

    Currently a controller's only visibility into path selection logic
    is in its outputs, and in the opportunity to replace path selection
    logic entirely.  We could expose more details about the algorithm's
    operation in order to help better test our path selection.

    GEN_PATH <k=v> <k=v>

    650 WANT_PATH id <k=v>
    USE_PATH id node,node,node
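    A controller answering WANT_PATH might validate its answer before
    sending it. This sketch assumes each node is named by '$' plus a
    40-hex-digit identity fingerprint, as elsewhere in the control
    protocol. The duplicate-relay check is this sketch's own choice, not a
    stated requirement.

```python
import re

FINGERPRINT = re.compile(r"\$[0-9A-Fa-f]{40}")


def parse_want_path(line: str):
    """Split '650 WANT_PATH <id> k=v ...' into (id, {k: v})."""
    parts = line.split()
    if parts[:2] != ["650", "WANT_PATH"] or len(parts) < 3:
        raise ValueError("not a WANT_PATH event")
    return parts[2], dict(p.split("=", 1) for p in parts[3:])


def use_path_cmd(request_id: str, nodes) -> str:
    """Answer a hypothetical WANT_PATH request with a USE_PATH command."""
    nodes = list(nodes)
    if not nodes:
        raise ValueError("empty path")
    for node in nodes:
        if not FINGERPRINT.fullmatch(node):
            raise ValueError(f"bad fingerprint: {node}")
    if len({n.upper() for n in nodes}) != len(nodes):
        raise ValueError("relay appears twice in path")
    return f"USE_PATH {request_id} {','.join(nodes)}\r\n"
```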

30. All PT status information.

    Pluggable transport feedback is, at present, very coarse-grained.
    For testing we might expose more.

31. Crypto operation counts.

    We ought to keep count of our various cryptographic operations, and
    expose them to the controller.  This would help us know where to
    spend our optimization efforts.

    GETINFO crypto/{<op>|all}/{<use>|all}/count
    GETINFO crypto/{<op>|all}/{<use>|all}/speed
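    Counts become useful for optimization once they are turned into rates.
    This sketch diffs two snapshots of hypothetical
    GETINFO crypto/<op>/all/count results, taken some seconds apart. The
    operation names in the tests are made up.

```python
def op_rates(before: dict, after: dict, elapsed_s: float) -> dict:
    """Per-second operation rates between two count snapshots."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return {op: (after[op] - before.get(op, 0)) / elapsed_s for op in after}


def busiest(rates: dict, n: int = 3):
    """The n operations with the highest rate, highest first."""
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)[:n]
```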

32. Forget cached information  (+1)

    To better test our download logic, it would be helpful to have a way
    to drop items from our caches.

    DROPCACHE {cachetype}

33. Expose number of bytes sent-received per circuit

    (TESTING ONLY)

    This could help us improve efficiency by investigating slow-circuit
    issues.

    650 CIRCBYTES sent={n} rcvd={n}

34. Modify state of existing connections and circuits

    (TESTING ONLY)

    This could help with fault injection to better validate error-handling
    code.

IMPLEMENTATION NOTES:
=====================

Not lightly does one list 34 controller improvement areas.  If we're
hoping to do these without too much programmer time and , we need to
take a much more principled approach to implementing controller
commands.  For #8351 I worked on a branch called 'ticket8351' that has
some code we could use here.

COMPATIBILITY NOTES:
====================

Many of the features here are ones that we might not want to promise to
support indefinitely; we should gate them behind a USEFEATURE command,
and maybe place them in an annex of the control spec.
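If these features are gated behind USEFEATURE, a test controller should
fail loudly when the Tor it is talking to lacks them. Tor already answers
USEFEATURE with '250 OK', or with a '552 Unrecognized feature' error.
This sketch enables features one at a time so the error names the
missing one. `send` is a hypothetical callable, and TESTING_CONTROLS is
a made-up feature name.

```python
from typing import Callable, Iterable


def enable_features(send: Callable[[str], str], features: Iterable[str]) -> None:
    """Enable each feature with USEFEATURE, raising if Tor does not support it.

    `send` writes one command and returns Tor's single-line reply.
    """
    for name in features:
        reply = send(f"USEFEATURE {name}\r\n").strip()
        if reply.startswith("552"):
            raise RuntimeError(f"Tor does not support feature {name}: {reply}")
        if not reply.startswith("250"):
            raise RuntimeError(f"unexpected reply to USEFEATURE {name}: {reply}")
```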

