[tor-dev] Chutney code refactoring

Thu Apr 16 03:35:19 UTC 2020

I was going to mention this on my pull request
<https://github.com/torproject/chutney/pull/67> but I figured it would
be much more fitting to have a more-general thread here. In the PR I
mentioned how I believe that, if it quacks like a dict, it should be
implemented as a dict. That way we (hopefully) run into fewer gotchas
down the line, trying to reinvent the dict ourselves, and we have
clearer self-docuemtnation ("oh, this class inherits dict, now I know
kind of what to expect out of it").

I'm still trying to reason through chutney's code for how exactly
inheritance works and how this inheritance can be expressed more
clearly, for example it looks like the `Node` class is for a type of
relay (authority, client, bridge, et cetera) and its `getN` method is
to generate a number of those nodes. That is poorly documented and I
only figured this out by seeing how `Node.getN` was being called in the
tests themselves. To me, it makes more sense not to "overload" what a
`Node` is; if we are using an object as a factory to create more
`Node`s, then we should make that evident. The factory can then make
identical nodes with similar properties, rather than one parent `Node`
spawning children that refer back to their parent.

For me this issue is mostly "it feels like a lot of mental overhead,
and I'm sure it can be done better than it already is." But of course
it's difficult to fix something without first understanding how it
should behave. (Perhaps a test suite for the test suite? ;) But since
we're dealing with a lot of inheritance in chutney, and since this
inheritance is (in my opinion) expressed poorly, it can be quick to
find oneself caught in a web of code with no clear way out.

Here is what I have been considering lately, with the code overall. I
hope that laying it out in points will help both myself and others try
to divy up what could be improved about the code, work on it piecewise,
and make sure nothing is reimplemented incorrectly. I know I'm new to
the code and thus have not looked into it thoroughly enough to
understand everything about it, but again, something in the back of my
head is telling me that if it's this difficult to understand now, then
it can stand to be changed. So please take my criticism lightly and
only where it applies. I could be wrong about some things as well. But,
as a testing suite, this is the place we absolutely need to be sure we
get right, because with a hard-to-reason testing suite, we open
ourselves up to bugs within the tests themselves.

- Documentation especially for methods that the tests are supposed to
  call. In `Node` class, there is a comment "Users are expected to call
  these" with `__init__` and three other functions declared. `__init__`
  is fairly self-explanatory but the other three could stand to have
  the same docstrings that we are giving other functions. I think this
  should be my first task, before trying to mess with anything that
  moves. That way, we have a clearer overview of the testsuite's API,
  and then we can work the implementation around that.
- Once it's clear to all of us (meaning, not only the core Chutney
  devs, but also outsiders such as myself) what the current interface
  does, we can see the strengths of that interface, maybe some design
  flaws if we deem any to exist, and some improvements we can make with
  the interface. What I mean is: here is the place to make any
  necessary adjustments to the API before we dive into implementation.
- Now we visit the implementation. By this point I (and other non-core
  developers) should have a better understanding on what the tests
  *should* be doing, that we can then proceed to verify the correctness
  of the tests' implementations themselves. Here's where I would like
  to revisit cleaning up the code so that we make use of functions to
  abstract and "atomize" certain aspects of the code (like the
  currently-unused function isBoostrapped), where we should introduce
  more self-documenting names (as suggested, replace getAttribute names
  with isAttribute names for when we want to return bool values), et
  cetera.

I'm outlining what *I* think the approach should be, and in hopes that
everything else will line up and make more sense for us all. I do want
to make sure I know what the code should be doing, because while I
could inspect the code line by line and go through the logic branches
myself, I'm going to be perfectly honest and say it gives me a
headache :) I think it's much easier to reason about with
compartmentalised, self-documenting (and docstringed) components, than
to try to step through the code as it runs and take it in as one giant
piece.

Because, we're testing a complex piece of software (Tor). Tor itself
can do a lot of things that can throw us off especially if we're less
familiar with its implementation, yet. And that throws off our
understanding of how Chutney is supposed to work, as well. A lot of the
tests currently seem to be very conditional -- they pass or fail with
little rhyme or reason, which is of course the best type of issues to
debug :P -- so I don't see the test results helping me as well as
refactoring the testsuite first.

Again, let me know if I have anything wrong, made any unjust
assumptions, showed my ignorance, et cetera. I'm going into this boldly
and I hope that's the right call.

Caitlin