[metrics-bugs] #25100 [Metrics/CollecTor]: Make CollecTor's webstats module use less RAM and CPU time
Tor Bug Tracker & Wiki
blackhole at torproject.org
Wed Jan 31 14:53:37 UTC 2018
#25100: Make CollecTor's webstats module use less RAM and CPU time
-----------------------------------+--------------------------
Reporter: karsten | Owner: metrics-team
Type: enhancement | Status: new
Priority: High | Milestone:
Component: Metrics/CollecTor | Version:
Severity: Normal | Keywords:
Actual Points: | Parent ID:
Points: | Reviewer:
Sponsor: |
-----------------------------------+--------------------------
I re-processed 516 xz-compressed log files downloaded from webstats.tp.o
with a total size of 4.4M. Here's what I found:
My first attempt to process these log files with 4G RAM failed after 10
minutes with a `java.lang.OutOfMemoryError`:
{{{
2018-01-31 13:42:18,189 INFO o.t.c.cron.Scheduler:73 Prepare single run
for org.torproject.collector.webstats.SanitizeWeblogs.
2018-01-31 13:42:18,193 INFO o.t.c.cron.Scheduler:150 New Thread created:
CollecTor-Scheduled-Thread-1
2018-01-31 13:42:18,194 INFO o.t.c.c.CollecTorMain:66 Starting webstats
module of CollecTor.
2018-01-31 13:42:18,302 INFO o.t.c.w.SanitizeWeblogs:98 Found log files
for 1 virtual hosts.
2018-01-31 13:42:18,302 INFO o.t.c.w.SanitizeWeblogs:105 Processing logs
for metrics.torproject.org on meronense.torproject.org.
2018-01-31 13:53:02,368 ERROR o.t.c.c.CollecTorMain:71 The webstats module
failed: null
java.lang.OutOfMemoryError: null
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at
java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:598)
at
java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:677)
at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:735)
at
java.util.stream.ReduceOps$ReduceOp.evaluateParallel(ReduceOps.java:714)
at
java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
at
java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at
org.torproject.collector.webstats.SanitizeWeblogs.findCleanWrite(SanitizeWeblogs.java:110)
at
org.torproject.collector.webstats.SanitizeWeblogs.startProcessing(SanitizeWeblogs.java:87)
at
org.torproject.collector.cron.CollecTorMain.run(CollecTorMain.java:67)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOf(Arrays.java:3236)
at sun.misc.IOUtils.readFully(IOUtils.java:60)
at java.util.jar.JarFile.getBytes(JarFile.java:425)
at
java.util.jar.JarFile.getManifestFromReference(JarFile.java:193)
at java.util.jar.JarFile.getManifest(JarFile.java:180)
at
sun.misc.URLClassPath$JarLoader$2.getManifest(URLClassPath.java:981)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:450)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at
ch.qos.logback.classic.spi.LoggingEvent.<init>(LoggingEvent.java:119)
at
ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:419)
at ch.qos.logback.classic.Logger.filterAndLog_2(Logger.java:414)
at ch.qos.logback.classic.Logger.debug(Logger.java:490)
at
org.torproject.descriptor.log.WebServerAccessLogLine.makeLine(WebServerAccessLogLine.java:129)
at
org.torproject.collector.webstats.SanitizeWeblogs.lambda$lineStream$7(SanitizeWeblogs.java:189)
at
org.torproject.collector.webstats.SanitizeWeblogs$$Lambda$24/1590539684.apply(Unknown
Source)
at
java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at
java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at
java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at
java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at
java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at
java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at
org.torproject.collector.webstats.SanitizeWeblogs.lineStream(SanitizeWeblogs.java:190)
at
org.torproject.collector.webstats.SanitizeWeblogs.lambda$findCleanWrite$1(SanitizeWeblogs.java:109)
2018-01-31 13:53:02,376 INFO o.t.c.c.ShutdownHook:23 Shutdown in progress
...
2018-01-31 13:53:02,377 INFO o.t.c.cron.Scheduler:127 Waiting at most 10
minutes for termination of running tasks ...
2018-01-31 13:53:02,377 INFO o.t.c.cron.Scheduler:132 Shutdown of all
scheduled tasks completed successfully.
2018-01-31 13:53:02,378 INFO o.t.c.c.ShutdownHook:25 Shutdown finished.
Exiting.
}}}
My second attempt with 8G RAM succeeded after 20 minutes and produced
correct log files.
We need to make this code run much, much faster and consume much, much
less RAM. We have roughly 1G of logs on webstats.tp.o, so we're talking
about 0.5% of logs here.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/25100>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
More information about the metrics-bugs
mailing list