[metrics-bugs] #25523 [Metrics/Library]: Add support for webstats tarballs
Tor Bug Tracker & Wiki
blackhole at torproject.org
Fri Mar 16 16:24:33 UTC 2018
#25523: Add support for webstats tarballs
-----------------------------+--------------------------
Reporter: karsten | Owner: iwakeh
Type: defect | Status: accepted
Priority: Medium | Milestone:
Component: Metrics/Library | Version:
Severity: Normal | Resolution:
Keywords: | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor:
-----------------------------+--------------------------
Changes (by iwakeh):
* status: assigned => accepted
Old description:
> I started creating tarballs containing `.xz`-compressed webstats files.
> When I attempt to feed them into `DescriptorReader`, it fails with an
> exception like the following:
>
> {{{
> Cannot parse descriptor file 'in/webstats-2016-01.tar'.
> [unprintable binary data omitted]
>   at org.torproject.descriptor.impl.DescriptorParserImpl.detectTypeAndParseDescriptors(DescriptorParserImpl.java:136)
>   at org.torproject.descriptor.impl.DescriptorParserImpl.parseDescriptors(DescriptorParserImpl.java:33)
>   at org.torproject.descriptor.impl.DescriptorReaderImpl$DescriptorReaderRunnable.readTarball(DescriptorReaderImpl.java:325)
>   at org.torproject.descriptor.impl.DescriptorReaderImpl$DescriptorReaderRunnable.readTarballs(DescriptorReaderImpl.java:276)
>   at org.torproject.descriptor.impl.DescriptorReaderImpl$DescriptorReaderRunnable.run(DescriptorReaderImpl.java:162)
>   at java.lang.Thread.run(Thread.java:745)
> }}}
>
> The tarballs I created contain files as follows:
>
> {{{
> $ tar tf webstats-2016-01.tar
> [...]
> webstats-2016-01/torproject.org/2016/01/25/torproject.org_aroides.torproject.org_access.log_20160125.xz
> webstats-2016-01/torproject.org/2016/01/25/torproject.org_archeotrichon.torproject.org_access.log_20160125.xz
> }}}
>
> When I extract the tarballs first and then read the extracted files with
> `DescriptorReader`, this works just fine.
>
> I ''think'' that the issue is that
> `DescriptorParserImpl#detectTypeAndParseDescriptors()` looks at
> `descriptorFile` rather than `fileName` to obtain the file name. The
> effect is that it learns the ''tarball'' file name, rather than the file
> name of the contained log file:
>
> {{{
> - if (descriptorFile.getName().contains(LogDescriptorImpl.MARKER)
> + if (fileName.contains(LogDescriptorImpl.MARKER)
> }}}
>
> The above is untested and probably insufficient. It's just supposed to
> start the bug hunting. Priority is medium, because we can just extract
> tarballs for now. But it's a bug, and it may confuse users as soon as we
> provide these tarballs and no working code to process them.
>
> This is also related to #22695.
>
> Assigning to iwakeh who said they'd like to grab it.
New description:
I started creating tarballs containing `.xz`-compressed webstats files.
When I attempt to feed them into `DescriptorReader`, it fails with an
exception like the following:
{{{
Cannot parse descriptor file 'in/webstats-2016-01.tar'.
[unprintable binary data omitted]
  at org.torproject.descriptor.impl.DescriptorParserImpl.detectTypeAndParseDescriptors(DescriptorParserImpl.java:136)
  at org.torproject.descriptor.impl.DescriptorParserImpl.parseDescriptors(DescriptorParserImpl.java:33)
  at org.torproject.descriptor.impl.DescriptorReaderImpl$DescriptorReaderRunnable.readTarball(DescriptorReaderImpl.java:325)
  at org.torproject.descriptor.impl.DescriptorReaderImpl$DescriptorReaderRunnable.readTarballs(DescriptorReaderImpl.java:276)
  at org.torproject.descriptor.impl.DescriptorReaderImpl$DescriptorReaderRunnable.run(DescriptorReaderImpl.java:162)
  at java.lang.Thread.run(Thread.java:745)
}}}
The tarballs I created contain files as follows:
{{{
$ tar tf webstats-2016-01.tar
[...]
webstats-2016-01/torproject.org/2016/01/25/torproject.org_aroides.torproject.org_access.log_20160125.xz
webstats-2016-01/torproject.org/2016/01/25/torproject.org_archeotrichon.torproject.org_access.log_20160125.xz
}}}
When I extract the tarballs first and then read the extracted files with
`DescriptorReader`, this works just fine.
I ''think'' that the issue is that
`DescriptorParserImpl#detectTypeAndParseDescriptors()` looks at
`descriptorFile` rather than `fileName` to obtain the file name. The
effect is that it learns the ''tarball'' file name, rather than the file
name of the contained log file:
{{{
- if (descriptorFile.getName().contains(LogDescriptorImpl.MARKER)
+ if (fileName.contains(LogDescriptorImpl.MARKER)
}}}
The above is untested and probably insufficient. It's just supposed to
start the bug hunting. Priority is medium, because we can just extract
tarballs for now. But it's a bug, and it may confuse users as soon as we
provide these tarballs and no working code to process them.
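To illustrate the suspected cause, here is a minimal standalone sketch, not the actual metrics-lib code. It assumes the marker is the string `_access.log_` (the real value of `LogDescriptorImpl.MARKER` is not shown in this ticket), and `MarkerCheckSketch` and `looksLikeLog` are hypothetical names. It only shows that a marker check against the tarball's own name fails, while the same check against the contained entry's name succeeds:

```java
import java.io.File;

class MarkerCheckSketch {

  // Assumed stand-in for LogDescriptorImpl.MARKER; the real value is not
  // given in this ticket.
  static final String MARKER = "_access.log_";

  /** Returns true if the given name contains the assumed log marker. */
  static boolean looksLikeLog(String name) {
    return name.contains(MARKER);
  }

  public static void main(String[] args) {
    // The tarball handed to the reader.
    File descriptorFile = new File("in/webstats-2016-01.tar");
    // The name of one entry inside the tarball, taken from the listing.
    String fileName = "webstats-2016-01/torproject.org/2016/01/25/"
        + "torproject.org_aroides.torproject.org_access.log_20160125.xz";

    // Checking the tarball's own name does not find the marker.
    System.out.println(looksLikeLog(descriptorFile.getName()));
    // Checking the contained entry's name does find it.
    System.out.println(looksLikeLog(fileName));
  }
}
```

If the hypothesis holds, the first check is what currently happens for tarball entries, so the log-specific parsing path is never selected and the compressed bytes get parsed as an ordinary descriptor.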
This is also related to #22695.
Assigning to iwakeh who said they'd like to grab it.
--
Comment:
Taken.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/25523#comment:1>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online