[tor-bugs] #24218 [Metrics/Statistics]: Implement new metrics-web module for IPv6 relay statistics

Fri Dec 8 11:48:48 UTC 2017

#24218: Implement new metrics-web module for IPv6 relay statistics
--------------------------------+--------------------------------
 Reporter:  karsten             |          Owner:  metrics-team
     Type:  enhancement         |         Status:  needs_revision
 Priority:  Medium              |      Milestone:
Component:  Metrics/Statistics  |        Version:
 Severity:  Normal              |     Resolution:
 Keywords:                      |  Actual Points:
Parent ID:                      |         Points:
 Reviewer:  iwakeh              |        Sponsor:
--------------------------------+--------------------------------
Changes (by iwakeh):

 * status:  needs_review => needs_revision

Comment:

 Starting with the SQL as this is most important.
 I'll add a ticket later (unless I find an existing one) for (finally)
 writing up sql coding guide lines and adapting all existing scripts.
 From reading intensely through the older sql scripts (for the tech report)
 I know that the naming in init-servers-ipv6.sql is consistent with the old
 naming, but there are parts that are really hard to read and often only
 understandable when reading the Java code in addition.

 Here some topics that should go into the guide doc (b/c of the upcoming
 guide doc, I'm trying to be verbose):
 * Use names (not numbers) for group-by and order-by, which is fine in this
 script.
 * types and function names etc. shouldn't be used as column or field
 identifiers, for example `count`, `date`, `timestamp`, and `server` (the
 enum type defined in the script).  For the server enum I'd suggest using
 `server_enum`, so the column (e.g. in table statuses) can be defined as
 `server server_enum NOT NULL,`, which makes immediately clear, what that
 column will contain.  Below I add comments to the various table
 definitions.
 * Names should be self-explanatory as much as reasonably possible.  Some
 names only make sense after reading the table's comment and for others
 only from reading the java code in addition.  For example, `count` in
 table aggregated first of all shouldn't be used because of the function
 `count()` and secondly it's meaning can only be derived from the table's
 comment.  Changing `count` to `server_count'  would make the meaning
 obvious.  Similarly, `advertised_bandwidth` to
 `advertised_bandwidth_bytes`.
 * For multi-line comments C-style `/* ...*/` could and maybe should be
 used and `-- ...` only for one-liners.  (That's of minor importance,
 though.)
 * Indentation of code in functions for readability.

 Detailed comments (suggesting name changes, asking questions):
 * server_descriptors:
 {{{
 #!sql
 CREATE TABLE server_descriptors (
   digest BYTEA PRIMARY KEY,  -- digest of what?  Maybe: sha1_desc_digest
   advertised_bandwidth INTEGER, -- in bytes?  Maybe adv_bandwidth_bytes?
   announced BOOLEAN NOT NULL, -- announced_ipv6
   exiting BOOLEAN  -- exit_flag or exit_relay (making obvious that this
 will be null for bridges)
 );

 }}}

 * server enum:
 {{{
 #!sql
 CREATE TYPE server_enum AS ENUM ('relay', 'bridge');
 }}}

 * statuses
 {{{
 #!sql
 CREATE TABLE statuses (
   status_id SERIAL PRIMARY KEY,
   server server NOT NULL,  -- rather: server server_enum NOT NULL,
   timestamp TIMESTAMP WITHOUT TIME ZONE NOT NULL,  -- valid_after
   running INTEGER NOT NULL, -- running_count (otherwise I'd assume this to
 be a boolean)
   UNIQUE (server, timestamp)
 );
 }}}

 * status_entries
 {{{
 #!sql
 CREATE TABLE status_entries (
   status_id INTEGER REFERENCES statuses (status_id) NOT NULL,
   digest BYTEA NOT NULL,  -- as above in server_descriptors
   guard BOOLEAN,  -- guard_relay
   exit BOOLEAN,  -- exit_relay
   confirmed BOOLEAN, -- confirmed_ipv6_relay
   UNIQUE (status_id, digest)
 );
 }}}

 * aggregated
 {{{
 #!sql
 CREATE TABLE aggregated (  -- aggregated_flags_ipv6
   status_id INTEGER REFERENCES statuses (status_id) NOT NULL,
   guard BOOLEAN,  -- cf. above
   exit BOOLEAN,  -- cf. above
   announced BOOLEAN,  -- cf. above
   exiting BOOLEAN,  -- cf. above
   confirmed BOOLEAN,  -- cf. above
   count INTEGER NOT NULL,  -- server_count or server_count_sum
   advertised_bandwidth BIGINT, -- adv_bandwidth_bytes or
 adv_bandwidth_bytes_sum
   CONSTRAINT aggregated_unique
     UNIQUE (status_id, guard, exit, announced, exiting, confirmed)
 );
 }}}

 * function aggregate:
 Maybe, call it `aggregate_flags_ipv6`?
 For the aggregate function I got the following error running the script:
 {{{
 psql --dbname=reviewipv6 -f modules/servers-ipv6/src/main/resources/init-
 servers-ipv6.sql
 CREATE TABLE
 CREATE TYPE
 CREATE TABLE
 CREATE TABLE
 CREATE TABLE
 psql:modules/servers-ipv6/src/main/resources/init-servers-ipv6.sql:75:
 ERROR:  syntax error at or near "ON"
 LINE 9: ON CONFLICT ON CONSTRAINT aggregated_unique
         ^
 CREATE VIEW
 }}}

 * servers_ipv6: Maybe: `ipv6servers`?  And, the `date` column need to be
 renamed.

 Some java related questions (from only skimming the code):
 The package name as mentioned at the top.  Why are the classes not
 declared public?  Couldn't the 'ParsedNetwork' of the class names
 ParsedNetworkStatus and ParsedNetworkDescriptor be ommitted here?

 General naming:
 Module name should be same as java package (especially without dashes) and
 maybe change java package to `ipv6servers`?  It seems more readable (to
 me) than `serversipv6`.

 I continue with an in-depth Java review when the SQL is settled.

 The structure looks fine regarding the upcoming re-factoring of metrics-
 web.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/24218#comment:8>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online