[tor-dev] Metrics Plans

Tue Jun 11 15:02:54 UTC 2013

> I can try experimenting with this later on (when we have the full / needed
> importer working, e.g.), but it might be difficult to scale indeed (not
> sure, of course). Do you have any specific use cases in mind? (actually
> curious, could be interesting to hear.)

The advantages of being able to reconstruct Descriptor instances is
simpler usage (and hence more maintainable code). Ie, usage could be
as simple as...

========================================

from tor.metrics import descriptor_db

# Fetches all of the server descriptors for a given date. These are provided as
# instances of...
#
#   stem.descriptor.server_descriptor.RelayDescriptor

for desc in descriptor_db.get_server_descriptors(2013, 1, 1):
  # print the addresses of only the exits

  if desc.exit_policy.is_exiting_allowed():
    print desc.address

========================================

Obviously we'd still want to do raw SQL queries for high traffic
applications. However, for applications where maintainability trumps
speed this could be a nice feature to have.

>> * After making the schema update the importer could then run over this
>> raw data table, constructing Descriptor instances from it and
>> performing updates for any missing attributes.
>
> I can't say I can easily see the specifics of how all this would work, but
> if we had an always-up-to-date data model (mediated by Stem Relay Descriptor
> class, but not necessarily), this might work.. (The ORM <-> Stem Descriptor
> object mapping itself is trivial, so all is well in that regard.)

I'm not sure if I entirely follow. As I understand it the importer...

* Reads raw rsynced descriptor data.
* Uses it to construct stem Descriptor instances.
* Persists those to the database.

My suggestion is that for the first step it could read the rsynced
descriptors *or* the raw descriptor content from the database itself.
This means that the importer could be used to not only populate new
descriptors, but also back-fill after a schema update.

That is to say, adding a new column would simply be...

* Perform the schema update.
* Run the importer, which...
  * Reads raw descriptor data from the database.
  * Uses it to construct stem Descriptor instances.
  * Performs an UPDATE for anything that's out of sync or missing from
the database.

Cheers! -Damian