[tor-bugs] #24218 [Metrics/Statistics]: Implement new metrics-web module for IPv6 relay statistics
Tor Bug Tracker & Wiki
blackhole at torproject.org
Fri Dec 8 11:48:48 UTC 2017
#24218: Implement new metrics-web module for IPv6 relay statistics
--------------------------------+--------------------------------
Reporter: karsten | Owner: metrics-team
Type: enhancement | Status: needs_revision
Priority: Medium | Milestone:
Component: Metrics/Statistics | Version:
Severity: Normal | Resolution:
Keywords: | Actual Points:
Parent ID: | Points:
Reviewer: iwakeh | Sponsor:
--------------------------------+--------------------------------
Changes (by iwakeh):
* status: needs_review => needs_revision
Comment:
Starting with the SQL as this is most important.
I'll add a ticket later (unless I find an existing one) for (finally)
writing up sql coding guide lines and adapting all existing scripts.
From reading intensely through the older sql scripts (for the tech report)
I know that the naming in init-servers-ipv6.sql is consistent with the old
naming, but there are parts that are really hard to read and often only
understandable when reading the Java code in addition.
Here some topics that should go into the guide doc (b/c of the upcoming
guide doc, I'm trying to be verbose):
* Use names (not numbers) for group-by and order-by, which is fine in this
script.
* types and function names etc. shouldn't be used as column or field
identifiers, for example `count`, `date`, `timestamp`, and `server` (the
enum type defined in the script). For the server enum I'd suggest using
`server_enum`, so the column (e.g. in table statuses) can be defined as
`server server_enum NOT NULL,`, which makes immediately clear, what that
column will contain. Below I add comments to the various table
definitions.
* Names should be self-explanatory as much as reasonably possible. Some
names only make sense after reading the table's comment and for others
only from reading the java code in addition. For example, `count` in
table aggregated first of all shouldn't be used because of the function
`count()` and secondly it's meaning can only be derived from the table's
comment. Changing `count` to `server_count' would make the meaning
obvious. Similarly, `advertised_bandwidth` to
`advertised_bandwidth_bytes`.
* For multi-line comments C-style `/* ...*/` could and maybe should be
used and `-- ...` only for one-liners. (That's of minor importance,
though.)
* Indentation of code in functions for readability.
Detailed comments (suggesting name changes, asking questions):
* server_descriptors:
{{{
#!sql
CREATE TABLE server_descriptors (
digest BYTEA PRIMARY KEY, -- digest of what? Maybe: sha1_desc_digest
advertised_bandwidth INTEGER, -- in bytes? Maybe adv_bandwidth_bytes?
announced BOOLEAN NOT NULL, -- announced_ipv6
exiting BOOLEAN -- exit_flag or exit_relay (making obvious that this
will be null for bridges)
);
}}}
* server enum:
{{{
#!sql
CREATE TYPE server_enum AS ENUM ('relay', 'bridge');
}}}
* statuses
{{{
#!sql
CREATE TABLE statuses (
status_id SERIAL PRIMARY KEY,
server server NOT NULL, -- rather: server server_enum NOT NULL,
timestamp TIMESTAMP WITHOUT TIME ZONE NOT NULL, -- valid_after
running INTEGER NOT NULL, -- running_count (otherwise I'd assume this to
be a boolean)
UNIQUE (server, timestamp)
);
}}}
* status_entries
{{{
#!sql
CREATE TABLE status_entries (
status_id INTEGER REFERENCES statuses (status_id) NOT NULL,
digest BYTEA NOT NULL, -- as above in server_descriptors
guard BOOLEAN, -- guard_relay
exit BOOLEAN, -- exit_relay
confirmed BOOLEAN, -- confirmed_ipv6_relay
UNIQUE (status_id, digest)
);
}}}
* aggregated
{{{
#!sql
CREATE TABLE aggregated ( -- aggregated_flags_ipv6
status_id INTEGER REFERENCES statuses (status_id) NOT NULL,
guard BOOLEAN, -- cf. above
exit BOOLEAN, -- cf. above
announced BOOLEAN, -- cf. above
exiting BOOLEAN, -- cf. above
confirmed BOOLEAN, -- cf. above
count INTEGER NOT NULL, -- server_count or server_count_sum
advertised_bandwidth BIGINT, -- adv_bandwidth_bytes or
adv_bandwidth_bytes_sum
CONSTRAINT aggregated_unique
UNIQUE (status_id, guard, exit, announced, exiting, confirmed)
);
}}}
* function aggregate:
Maybe, call it `aggregate_flags_ipv6`?
For the aggregate function I got the following error running the script:
{{{
psql --dbname=reviewipv6 -f modules/servers-ipv6/src/main/resources/init-
servers-ipv6.sql
CREATE TABLE
CREATE TYPE
CREATE TABLE
CREATE TABLE
CREATE TABLE
psql:modules/servers-ipv6/src/main/resources/init-servers-ipv6.sql:75:
ERROR: syntax error at or near "ON"
LINE 9: ON CONFLICT ON CONSTRAINT aggregated_unique
^
CREATE VIEW
}}}
* servers_ipv6: Maybe: `ipv6servers`? And, the `date` column need to be
renamed.
Some java related questions (from only skimming the code):
The package name as mentioned at the top. Why are the classes not
declared public? Couldn't the 'ParsedNetwork' of the class names
ParsedNetworkStatus and ParsedNetworkDescriptor be ommitted here?
General naming:
Module name should be same as java package (especially without dashes) and
maybe change java package to `ipv6servers`? It seems more readable (to
me) than `serversipv6`.
I continue with an in-depth Java review when the SQL is settled.
The structure looks fine regarding the upcoming re-factoring of metrics-
web.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/24218#comment:8>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
More information about the tor-bugs
mailing list