[metrics-bugs] #25317 [Metrics/CollecTor]: Enable webstats to process large (> 2G) logfiles
Tor Bug Tracker & Wiki
blackhole at torproject.org
Wed Feb 21 09:01:53 UTC 2018
#25317: Enable webstats to process large (> 2G) logfiles
-----------------------------------+----------------------
Reporter: iwakeh | Owner: iwakeh
Type: defect | Status: assigned
Priority: High | Milestone:
Component: Metrics/CollecTor | Version:
Severity: Normal | Keywords:
Actual Points: | Parent ID:
Points: | Reviewer:
Sponsor: |
-----------------------------------+----------------------
Quote from #25161, comment 12:
Looking at the stack trace and the input log files, I noticed that two
log files are larger than 2G when decompressed:
{{{
3.2G in/webstats/archeotrichon.torproject.org/dist.torproject.org-
access.log-20160531
584K in/webstats/archeotrichon.torproject.org/dist.torproject.org-
access.log-20160531.xz
2.1G in/webstats/archeotrichon.torproject.org/dist.torproject.org-
access.log-20160601
404K in/webstats/archeotrichon.torproject.org/dist.torproject.org-
access.log-20160601.xz
}}}
I just ran another bulk import with just those two files as import and
ran into the same exception.
It seems like we shouldn't attempt to decompress these files into a
single `byte[]` in `FileType.decompress`, because Java arrays are
indexed by a signed 32-bit `int` and can therefore hold at most
2^31 - 1 (roughly 2.1 billion) elements:
https://en.wikipedia.org/wiki/Criticism_of_Java#Large_arrays . Maybe we
should work with streams there instead of `byte[]`.
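To illustrate the stream-based approach, here is a minimal sketch (not the actual `FileType.decompress` code) that reads a compressed log line by line in constant memory, so the 2G array limit never comes into play. The class and method names are hypothetical. The files above are xz-compressed, which would need `XZInputStream` from the XZ for Java library (org.tukaani.xz); the JDK's `GZIPInputStream` is used here only to keep the sketch self-contained, and the pattern is the same:
{{{
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;

/** Hypothetical sketch: process a compressed log without ever
 *  materializing the whole decompressed content as a byte[]. */
public class StreamingDecompress {

  /** Counts lines in a gzip-compressed file using constant memory.
   *  For .xz input, replace GZIPInputStream with
   *  org.tukaani.xz.XZInputStream; the rest stays identical. */
  static long countLines(Path compressed) throws IOException {
    long lines = 0;
    try (InputStream in = Files.newInputStream(compressed);
         GZIPInputStream gz = new GZIPInputStream(in);
         BufferedReader reader = new BufferedReader(
             new InputStreamReader(gz, StandardCharsets.UTF_8))) {
      // Each line is processed and discarded; heap usage stays
      // bounded by the buffer size, not by the file size.
      while (reader.readLine() != null) {
        lines++;
      }
    }
    return lines;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(countLines(Paths.get(args[0])));
  }
}
}}}
The key point is that nothing in the loop depends on the total decompressed size, so a 3.2G file costs no more heap than a 584K one.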
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/25317>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online