[metrics-bugs] #25523 [Metrics/Library]: Add support for webstats tarballs
Tor Bug Tracker & Wiki
blackhole at torproject.org
Fri Mar 16 16:04:27 UTC 2018
#25523: Add support for webstats tarballs
---------------------------------+----------------------
Reporter: karsten                | Owner: iwakeh
Type: defect                     | Status: assigned
Priority: Medium                 | Milestone:
Component: Metrics/Library       | Version:
Severity: Normal                 | Keywords:
Actual Points:                   | Parent ID:
Points:                          | Reviewer:
Sponsor:                         |
---------------------------------+----------------------
I started creating tarballs containing `.xz`-compressed webstats files.
When I attempt to feed them into `DescriptorReader`, it fails with an
exception like the following:
{{{
Cannot parse descriptor file 'in/webstats-2016-01.tar'.
[... unprintable binary data ...]
    at org.torproject.descriptor.impl.DescriptorParserImpl.detectTypeAndParseDescriptors(DescriptorParserImpl.java:136)
    at org.torproject.descriptor.impl.DescriptorParserImpl.parseDescriptors(DescriptorParserImpl.java:33)
    at org.torproject.descriptor.impl.DescriptorReaderImpl$DescriptorReaderRunnable.readTarball(DescriptorReaderImpl.java:325)
    at org.torproject.descriptor.impl.DescriptorReaderImpl$DescriptorReaderRunnable.readTarballs(DescriptorReaderImpl.java:276)
    at org.torproject.descriptor.impl.DescriptorReaderImpl$DescriptorReaderRunnable.run(DescriptorReaderImpl.java:162)
    at java.lang.Thread.run(Thread.java:745)
}}}
The tarballs I created contain files as follows:
{{{
$ tar tf webstats-2016-01.tar
[...]
webstats-2016-01/torproject.org/2016/01/25/torproject.org_aroides.torproject.org_access.log_20160125.xz
webstats-2016-01/torproject.org/2016/01/25/torproject.org_archeotrichon.torproject.org_access.log_20160125.xz
}}}
When I extract the tarballs first and read the extracted files with
`DescriptorReader`, everything works fine.
I ''think'' that the issue is that
`DescriptorParserImpl#detectTypeAndParseDescriptors()` looks at
`descriptorFile` rather than `fileName` to obtain the file name. The
effect is that it learns the ''tarball'' file name, rather than the file
name of the contained log file:
{{{
- if (descriptorFile.getName().contains(LogDescriptorImpl.MARKER)
+ if (fileName.contains(LogDescriptorImpl.MARKER)
}}}
The above is untested and probably insufficient. It's just supposed to
start the bug hunting. Priority is medium, because we can just extract
tarballs for now. But it's a bug, and it may confuse users as soon as we
provide these tarballs and no working code to process them.
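To illustrate the suspected cause, here is a minimal, self-contained sketch. It is not the metrics-lib code: the constant `MARKER` is a stand-in for `LogDescriptorImpl.MARKER` and its value `_access.log_` is an assumption, and `looksLikeWebstatsLog` is a hypothetical helper. It only shows why a check against the tarball's file name would miss a log file that a check against the contained entry's name would catch.

```java
import java.io.File;

final class MarkerCheckSketch {

  // Assumption: stand-in for LogDescriptorImpl.MARKER; the real value may differ.
  static final String MARKER = "_access.log_";

  // Hypothetical helper: does this name look like a webstats log file?
  static boolean looksLikeWebstatsLog(String name) {
    return name.contains(MARKER);
  }

  public static void main(String[] args) {
    // The file handed to the reader is the tarball itself ...
    File descriptorFile = new File("in/webstats-2016-01.tar");
    // ... while the log file name only shows up in the tar entry name.
    String fileName = "webstats-2016-01/torproject.org/2016/01/25/"
        + "torproject.org_aroides.torproject.org_access.log_20160125.xz";

    // Checking the tarball name never finds the marker.
    System.out.println(looksLikeWebstatsLog(descriptorFile.getName())); // false
    // Checking the entry name does.
    System.out.println(looksLikeWebstatsLog(fileName)); // true
  }
}
```

If this matches what happens in `detectTypeAndParseDescriptors()`, the contained `.xz` bytes would fall through to the generic descriptor parsing path, which would be consistent with the binary data in the exception message.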
This is also related to #22695.
Assigning to iwakeh who said they'd like to grab it.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/25523>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online