[metrics-bugs] #25525 [Metrics/CollecTor]: Fix either spec or code regarding full path of sanitized webstats files

Tor Bug Tracker & Wiki blackhole at torproject.org
Fri Mar 16 16:30:47 UTC 2018


#25525: Fix either spec or code regarding full path of sanitized webstats files
-----------------------------------+--------------------------
     Reporter:  karsten            |      Owner:  metrics-team
         Type:  defect             |     Status:  new
     Priority:  High               |  Milestone:
    Component:  Metrics/CollecTor  |    Version:
     Severity:  Normal             |   Keywords:
Actual Points:                     |  Parent ID:
       Points:                     |   Reviewer:
      Sponsor:                     |
-----------------------------------+--------------------------
 This issue came up when discussing the webstats tarballs that I created
 the other day: what file structure should these tarballs have internally?

 Turns out we already specified this file structure in
 [https://gitweb.torproject.org/collector.git/tree/src/main/resources/docs/PROTOCOL#n388
 Section 5.4 of the Protocol of CollecTor's File Structure]:

   ''"'webstats' contains compressed log files structured and named
 according to the 'Tor web server logs' specification, section 4.3 [0]."''

 And Section 4.3 of the referenced specification says:

   ''"Sanitized log files may additionally be sorted into directories by
 virtual host and date as in:
   <virtual-host>/YYYY/MM/<virtual-host>_<physical-host>_access.log_YYYYMMDD[.xz]"''

 So, I'd say this is sufficiently specified.
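
 To make the quoted pattern concrete, here is a rough sketch of a check
 that tests whether a path relative to the `webstats` directory follows
 it. The class name, the method name, and the assumption that host names
 contain neither `/` nor `_` are mine and are not taken from the
 specification or from CollecTor.

```java
import java.util.regex.Pattern;

public class SpecPathCheck {

  // <virtual-host>/YYYY/MM/<virtual-host>_<physical-host>_access.log_YYYYMMDD[.xz]
  // Backreferences require that the virtual host, year, and month in the
  // file name repeat the directory components.
  // Assumption (hypothetical): host names contain neither '/' nor '_'.
  private static final Pattern SPEC_PATH = Pattern.compile(
      "^([^/_]+)/(\\d{4})/(\\d{2})/"
      + "\\1_([^/_]+)_access\\.log_\\2\\3\\d{2}(\\.xz)?$");

  /** Returns true if the relative path matches the quoted pattern. */
  public static boolean matchesSpec(String relativePath) {
    return SPEC_PATH.matcher(relativePath).matches();
  }

  public static void main(String[] args) {
    // Hypothetical host names, for illustration only.
    System.out.println(matchesSpec(
        "www.example.org/2018/03/www.example.org_host1_access.log_20180316.xz"));
    System.out.println(matchesSpec(
        "www.example.org/2018/03/16/www.example.org_host1_access.log_20180316.xz"));
  }
}
```

 Under these assumptions the first path is accepted, and the second one,
 which has an extra day directory, is rejected.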

 However, the current structure of CollecTor's `out/` directory is
 different, as
 [https://gitweb.torproject.org/collector.git/tree/src/main/java/org/torproject/collector/persist/WebServerAccessLogPersistence.java#n42
 implemented here]:

 {{{
     this.storagePath = Paths.get(
         WEBSTATS,
         this.desc.getVirtualHost(),
         this.desc.getLogDate().format(yearPattern), // year
         this.desc.getLogDate().format(monthPattern), // month
         this.desc.getLogDate().format(dayPattern), // day
         name).toString();
 }}}

 Note the ''day'' part which does not exist in the specification.
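
 To illustrate the difference, here is a minimal, self-contained sketch
 that builds both variants of the path side by side. It is not the
 CollecTor code: the class and method names, the host names, and the
 value `"webstats"` for the `WEBSTATS` constant are assumptions made for
 this example.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class WebstatsPathSketch {

  // Assumed value of the WEBSTATS constant (hypothetical).
  private static final String WEBSTATS = "webstats";

  private static final DateTimeFormatter YEAR = DateTimeFormatter.ofPattern("yyyy");
  private static final DateTimeFormatter MONTH = DateTimeFormatter.ofPattern("MM");
  private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("dd");
  private static final DateTimeFormatter COMPACT = DateTimeFormatter.ofPattern("yyyyMMdd");

  /** File name as in the spec: virtual host, physical host, date, ".xz". */
  public static String fileName(String virtualHost, String physicalHost, LocalDate date) {
    return virtualHost + "_" + physicalHost + "_access.log_"
        + date.format(COMPACT) + ".xz";
  }

  /** Path as specified: webstats/virtual-host/YYYY/MM/file. */
  public static Path specPath(String virtualHost, String physicalHost, LocalDate date) {
    return Paths.get(WEBSTATS, virtualHost,
        date.format(YEAR), date.format(MONTH),
        fileName(virtualHost, physicalHost, date));
  }

  /** Path as currently implemented: same as above plus a day directory. */
  public static Path currentPath(String virtualHost, String physicalHost, LocalDate date) {
    return Paths.get(WEBSTATS, virtualHost,
        date.format(YEAR), date.format(MONTH), date.format(DAY),
        fileName(virtualHost, physicalHost, date));
  }

  public static void main(String[] args) {
    // Hypothetical host names, for illustration only.
    LocalDate date = LocalDate.of(2018, 3, 16);
    System.out.println(specPath("www.example.org", "host1", date));
    System.out.println(currentPath("www.example.org", "host1", date));
  }
}
```

 The two results differ only in the additional `16` directory between the
 month and the file name, so whichever side we change, the fix amounts to
 adding or dropping that one path component.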

 So, we'll have to fix either the specification or the code. I don't feel
 strongly about which one we change. But let's make a decision really soon,
 before I start reprocessing archives due to #25522. Therefore I'm setting
 the priority to High.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/25525>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

