[tor-dev] Draft report on where to focus testing efforts and modularization efforts in Tor

Nick Mathewson nickm at freehaven.net
Thu Mar 26 15:01:24 UTC 2015


0. Summary

   I've got some end-of-month deliverables to try to figure out how Tor
   fits together to identify the most important areas to test, the
   hardest areas to test, and the best ways to modularize Tor in the
   future.  Here's my current draft.  Some of these ideas are
   half-baked; many will need more rounds of revision before they
   can be acted on.

   For more information see tickets #15236 and #15234.


1. Testing Tor: Where we should focus

1.1. Let's focus on complicated, volatile things.

   These functions are the highest in length and in complexity:

    circuit_expire_building
    parse_port_config
    run_scheduled_events
    connection_ap_handshake_rewrite_and_attach
    options_act
    networkstatus_parse_vote_from_string
    connection_dir_client_reached_eof
    directory_handle_command_get
    networkstatus_compute_consensus
    options_validate

   If we focus on the number of commits that have code still live in
   a function, we have:

     circuit_expire_building
     parse_port_config
     run_scheduled_events
     connection_ap_handshake_rewrite_and_attach
     options_act
     networkstatus_parse_vote_from_string
     connection_dir_client_reached_eof
     directory_handle_command_get
     networkstatus_compute_consensus
     options_validate

   And if we want to focus on average commits per line of code in a
   function, weighted in various ways, these ones stand out:

     add_stream_log_impl
     connection_close_immediate
     connection_connect_sockaddr
     connection_dir_client_reached_eof
     connection_edge_destroy
     connection_exit_begin_conn
     connection_handle_write_impl
     consider_testing_reachability
     directory_info_has_arrived
     init_keys
     options_act
     options_validate
     router_add_to_routerlist
     router_dump_router_to_string
     tor_free_all
     router_rebuild_descriptor
     run_scheduled_events

   If we focus on how many changes have *ever* been made to a
   function, we get these as the top offenders:

      router_rebuild_descriptor
      onion_extend_cpath
      directory_handle_command
      dns_resolve
      assert_connection_ok
      connection_handle_write
      connection_handle_listener_read
      dns_found_answer
      circuit_extend
      main
      connection_edge_process_inbuf
      connection_ap_handshake_attach_circuit
      config_assign
      init_keys
      circuit_send_next_onion_skin
      connection_dir_process_inbuf
      prepare_for_poll
      connection_exit_begin_conn
      connection_ap_handshake_process_socks
      router_dump_router_to_string
      fetch_from_buf_socks
      getconfig
      run_scheduled_events
      connection_edge_process_relay_cell
      do_main_loop

  No single list above is uniquely the right one to focus on;
  instead, we should probably use them in tandem to identify areas
  to work on.

1.2. What will be hardest to test?

  The entire "blob" described in section 2.1 below should be
  considered hard-to-test because of the number of functions
  reachable from one another.  Removing some of this callgraph
  complexity will help us test, but it's not completely trivial.
  See Appendix A for a list of functions of this kind.

2. Untangling Tor: Modules and You

  If you've done a software engineering class, you probably got
  taught how to diagram the modules or layers of a software project.
  I tried this with Tor back in 2005 and my head spun; things have
  not gotten easier since then.

  Looking at Tor's call graphs tell us a lot about our issues with
  modularizing Tor.  In the abstract, we'd like to divide Tor into
  layers and modules, with most modules only calling a few others,
  and the "central" functions limited to as little code as possible.
  This will make modules more independent and help us test them in
  isolation.  This will also help us with future design ideas like
  splitting Tor into separate processes for testing or security or
  robustness purposes.

2.1. Fun with callgraphs

  I did some analysis of Tor's callgraph to figure out where we
  stand today.  About 27% of the functions that tor calls
  (internally and externally) can reach only 10 or fewer other
  functions that Tor knows about.  About 82% can reach a smaller
  portion of Tor.

  Then there comes a block of functions that can reach _nearly every
  other function_ in the block, or in Tor.  They are all reachable
  from one another.  There are about 450 of these, or 11% of the
  functions in Tor.  I'll call them "The Blob."

  Finally, there are functions that can reach the Blob, and all the
  functions that the Blob calls, but which are not themselves part
  of the Blob.  These are about 100 functions, or 2% of the
  functions in Tor.

  Breaking up the Blob is a high priority for making Tor more
  modular.

  So, how can we do that?

  * Let's start by looking at which functions, if they were made
    unnecessary, or if they stopped calling into the Blob, would
    remove the most other functions from the Blob.

  * Let's also start by hand-annotating the Blob with a list of
    functions that logically shouldn't be part of the Blob, and then
    seeing what they call that pulls them into the Blob.

  Here are some preliminary results for the best functions to try to
  remove from the Blob, and some antipatterns to avoid:

  * The following functions pull many, many other functions into the
      Blob: control_event_* router_rebuild_descriptor_
      directory_fetches_from_authorities *mark_for_close* *free
      *free_

    If we could stop them calling back into the blob, we would
    shrink the blob's size down to 254 functions, and reduce the
    number of outside functions calling into the blob to 74.

    Also worth investigating is seeing whether to remove the call
    from 'connection_write_to_buf_impl_' to
    'connection_handle_write', and the call from
    'conection_handle_write_impl' to 'connection_handle_read'.
    Either of these changes reduces the number of functions
    reachable from blob functions severely, and shifts many
    functions from Blob-members to Blob-callers.

    Finally, these might be more work to unblob:

     'connection_or_change_state' 'tor_cleanup' 'init_keys'
     'connection_or_close_for_error' But doing so would take the
     blob down to 60 functions, with the blob-callers to 118.

  * Many functions that are meant to simply note that an event has
    occurred, so that some work must happen. We could refactor most
    of these to instead use a deferred callback type, rather than
    invoking the functions that call the work directly.  This would
    also help us decouple Blob functions from one another.

  * I have the blob functions listed in Appendix A.1, with the ones
    that would still in the blob after the changes above listed in
    Appendix A.2.

2.2. Separating modules into processes and libraries; module-level APIs.

  Let's imagine this architecture: Tor runs as a series of
  conceptually separate processes that communicate over a reasonable
  IPC mechanism.  These processes correspond to module boundaries;
  their APIs need to be carefully designed.  There is a central
  communications process that does the work of transferring commands
  and replies from A to B.

  APIs would be divided into publish/subscribe types and
  remote-procedure call types.  Modules would be divided into
  stateful/non-stateful; non-stateful ones could be shared
  libraries; or if their processing is sensitive, they could be kept
  in very sandboxed processes.

  How do we get there?

  To make the proposal concrete, I suggest for a 0th-order draft
  that we imagine the following simplified version:

    * That all src/common modules become shared libraries with no
      state.

    * That we divide (or imagine how to divide) the contents of
      src/or into separate modules, with more clear connections
      between them.  We imagine doing this one part at a time,
      starting by adding a nice simple publish/subscribe mechanism.

    * That the controller API be reimagined as a a layer on top of
      the rest of this.

    * That any multiprocess version of this be imagined, for
      concreteness, as ProtocolBufs over nanomsg.  (I'm not tied to
      this particular format/transport combination; just proposing
      it for concreteness.

  We would need a transition plan here.  I'd suggest that some of
  the easiest modules to extract first would be voting, rephist,
  node information management, path selection, directory data, and
  hidden services.  Most of these have the virtue of being more
  easily testable once they are extracted.

  Once we have some significant pieces split into libraries, and
  submodules, we could start doing process-level refactoring, moving
  towards an architecture built out of processes and parts of these kinds:

    * Network: Has libevent-driven main loop.  Accepts commands in a
      pool, maybe from a subthread.

    * Utility: Is just event-driven.  Answers requests, gives
      responses.

    * Manager: starts up processes as needed and connects them to
      one another.

    * Dispatcher: relays messages from where they come from to where
      they go.

  Our best next steps in this area, IMO, seem to be to identify one
  or two key areas and abstractions, and begin isolating them in
  practice so as to better get a feel for how this kind of
  transition works, so we can iterate our design.

[Appendices in attachment so as to keep this to a reasonable size.]
-------------- next part --------------
A non-text attachment was scrubbed...
Name: modularization-ideas-appendix.txt.bz2
Type: application/x-bzip2
Size: 4060 bytes
Desc: not available
URL: <http://lists.torproject.org/pipermail/tor-dev/attachments/20150326/4a05ae3a/attachment.bin>


More information about the tor-dev mailing list