[metrics-bugs] #25523 [Metrics/Library]: Add support for webstats tarballs

Tor Bug Tracker & Wiki blackhole at torproject.org
Fri Mar 16 16:24:33 UTC 2018


#25523: Add support for webstats tarballs
-----------------------------+--------------------------
 Reporter:  karsten          |          Owner:  iwakeh
     Type:  defect           |         Status:  accepted
 Priority:  Medium           |      Milestone:
Component:  Metrics/Library  |        Version:
 Severity:  Normal           |     Resolution:
 Keywords:                   |  Actual Points:
Parent ID:                   |         Points:
 Reviewer:                   |        Sponsor:
-----------------------------+--------------------------
Changes (by iwakeh):

 * status:  assigned => accepted


New description:

 I started creating tarballs containing `.xz`-compressed webstats files.
 When I attempt to feed them into `DescriptorReader`, it fails with an
 exception like the following:

 {{{
 Cannot parse descriptor file ’in/webstats-2016-01.tar’.
 ��s",�����k)�nnq����w؆jG�I�[1��eѰCx%��'.
         at org.torproject.descriptor.impl.DescriptorParserImpl.detectTypeAndParseDescriptors(DescriptorParserImpl.java:136)
         at org.torproject.descriptor.impl.DescriptorParserImpl.parseDescriptors(DescriptorParserImpl.java:33)
         at org.torproject.descriptor.impl.DescriptorReaderImpl$DescriptorReaderRunnable.readTarball(DescriptorReaderImpl.java:325)
         at org.torproject.descriptor.impl.DescriptorReaderImpl$DescriptorReaderRunnable.readTarballs(DescriptorReaderImpl.java:276)
         at org.torproject.descriptor.impl.DescriptorReaderImpl$DescriptorReaderRunnable.run(DescriptorReaderImpl.java:162)
         at java.lang.Thread.run(Thread.java:745)
 }}}

 The tarballs I created contain files as follows:

 {{{
 $ tar tf webstats-2016-01.tar
 [...]
 webstats-2016-01/torproject.org/2016/01/25/torproject.org_aroides.torproject.org_access.log_20160125.xz
 webstats-2016-01/torproject.org/2016/01/25/torproject.org_archeotrichon.torproject.org_access.log_20160125.xz
 }}}

 When I extract the tarball first and then read the extracted files with
 `DescriptorReader`, this works just fine.

 I ''think'' that the issue is that
 `DescriptorParserImpl#detectTypeAndParseDescriptors()` looks at
 `descriptorFile` rather than `fileName` to obtain the file name. The
 effect is that it learns the ''tarball'' file name, rather than the file
 name of the contained log file:

 {{{
 -    if (descriptorFile.getName().contains(LogDescriptorImpl.MARKER)
 +    if (fileName.contains(LogDescriptorImpl.MARKER)
 }}}
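
 To illustrate the suspected mismatch, here is a minimal standalone sketch.
 It is not the metrics-lib code: the class `MarkerCheckSketch`, the method
 `looksLikeLog`, and the marker value `"_access.log_"` (standing in for
 `LogDescriptorImpl.MARKER`) are assumptions made for the example. It only
 shows that a substring check on the tarball name misses, while the same
 check on the contained entry name matches.

```java
import java.io.File;

public class MarkerCheckSketch {

  // Assumed value for illustration; stands in for LogDescriptorImpl.MARKER.
  public static final String MARKER = "_access.log_";

  /** Returns true if the given name contains the log file marker. */
  public static boolean looksLikeLog(String name) {
    return name.contains(MARKER);
  }

  public static void main(String[] args) {
    // The tarball handed to the reader.
    File descriptorFile = new File("in/webstats-2016-01.tar");
    // The name of one entry inside that tarball, taken from the listing.
    String fileName = "webstats-2016-01/torproject.org/2016/01/25/"
        + "torproject.org_aroides.torproject.org_access.log_20160125.xz";

    // Checking the tarball name: the marker is not found.
    System.out.println(looksLikeLog(descriptorFile.getName())); // false
    // Checking the entry name: the marker is found.
    System.out.println(looksLikeLog(fileName)); // true
  }
}
```

 This only covers the name check. Whether the `.xz` content of each entry
 is then decompressed and parsed correctly is a separate question.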

 The above is untested and probably insufficient. It's just supposed to
 start the bug hunting. Priority is medium, because we can just extract
 tarballs for now. But it's a bug, and it may confuse users as soon as we
 provide these tarballs and no working code to process them.

 This is also related to #22695.

 Assigning to iwakeh who said they'd like to grab it.

--

Comment:

 Taken.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/25523#comment:1>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

