[metrics-bugs] #25525 [Metrics/CollecTor]: Fix either spec or code regarding full path of sanitized webstats files
Tor Bug Tracker & Wiki
blackhole at torproject.org
Fri Mar 16 16:30:47 UTC 2018
#25525: Fix either spec or code regarding full path of sanitized webstats files
-----------------------------------+--------------------------
Reporter: karsten | Owner: metrics-team
Type: defect | Status: new
Priority: High | Milestone:
Component: Metrics/CollecTor | Version:
Severity: Normal | Keywords:
Actual Points: | Parent ID:
Points: | Reviewer:
Sponsor: |
-----------------------------------+--------------------------
This issue came up when discussing the webstats tarballs that I created
the other day: what file structure should these tarballs have internally?
Turns out we already specified this file structure in
[https://gitweb.torproject.org/collector.git/tree/src/main/resources/docs/PROTOCOL#n388
Section 5.4 of the Protocol of CollecTor's File Structure]:
''"'webstats' contains compressed log files structured and named
according to the 'Tor web server logs' specification, section 4.3 [0]."''
And Section 4.3 of the referenced specification says:
''"Sanitized log files may additionally be sorted into directories by
virtual host and date as in:
<virtual-host>/YYYY/MM/<virtual-host>_<physical-host>_access.log_YYYYMMDD[.xz]"''
So, I'd say this is sufficiently specified.
However, the current structure of CollecTor's `out/` directory is
different, as
[https://gitweb.torproject.org/collector.git/tree/src/main/java/org/torproject/collector/persist/WebServerAccessLogPersistence.java#n42
implemented here]:
{{{
this.storagePath = Paths.get(
    WEBSTATS,
    this.desc.getVirtualHost(),
    this.desc.getLogDate().format(yearPattern), // year
    this.desc.getLogDate().format(monthPattern), // month
    this.desc.getLogDate().format(dayPattern), // day
    name).toString();
}}}
Note the ''day'' part which does not exist in the specification.
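
To make the difference concrete, here is a minimal, self-contained sketch that builds both layouts side by side. It is not CollecTor code: the class and method names, the example host names, the date patterns, and the assumption that the `WEBSTATS` constant resolves to the directory name `webstats` are all hypothetical and only serve as illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch, not part of CollecTor.
class WebstatsPathSketch {

  // Assumed patterns; the real yearPattern/monthPattern/dayPattern may differ.
  static final DateTimeFormatter YEAR = DateTimeFormatter.ofPattern("yyyy");
  static final DateTimeFormatter MONTH = DateTimeFormatter.ofPattern("MM");
  static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("dd");
  static final DateTimeFormatter FILE_DATE =
      DateTimeFormatter.ofPattern("yyyyMMdd");

  // Assumption: the WEBSTATS constant is the directory name "webstats".
  static final String WEBSTATS = "webstats";

  /** <virtual-host>_<physical-host>_access.log_YYYYMMDD.xz, per Section 4.3. */
  static String fileName(String virtualHost, String physicalHost,
      LocalDate date) {
    return virtualHost + "_" + physicalHost + "_access.log_"
        + date.format(FILE_DATE) + ".xz";
  }

  /** Layout as specified: <virtual-host>/YYYY/MM/<file name>. */
  static Path specPath(String virtualHost, String physicalHost,
      LocalDate date) {
    return Paths.get(WEBSTATS, virtualHost,
        date.format(YEAR), date.format(MONTH),
        fileName(virtualHost, physicalHost, date));
  }

  /** Layout as currently implemented: an extra DD directory level. */
  static Path currentCodePath(String virtualHost, String physicalHost,
      LocalDate date) {
    return Paths.get(WEBSTATS, virtualHost,
        date.format(YEAR), date.format(MONTH), date.format(DAY),
        fileName(virtualHost, physicalHost, date));
  }

  public static void main(String[] args) {
    // Example values are made up for illustration.
    LocalDate date = LocalDate.of(2018, 3, 16);
    System.out.println(specPath("www.example.org", "host1", date));
    System.out.println(currentCodePath("www.example.org", "host1", date));
  }
}
```

The two results differ only in the additional day directory between the month directory and the file name, which is the part we'd have to either add to the specification or remove from the code.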
So, we'll have to fix either the specification or the code. I don't feel
strongly about which one we change. But let's make a decision really soon,
before I start reprocessing archives due to #25522. Therefore setting
priority to High.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/25525>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online