[metrics-bugs] #22602 [Metrics/CollecTor]: CollecTor's relaydescs module freezes while downloading from directory authorities

Tor Bug Tracker & Wiki blackhole at torproject.org
Wed Jun 14 09:30:49 UTC 2017


#22602: CollecTor's relaydescs module freezes while downloading from directory
authorities
-----------------------------------+--------------------------
     Reporter:  karsten            |      Owner:  metrics-team
         Type:  defect             |     Status:  new
     Priority:  High               |  Milestone:
    Component:  Metrics/CollecTor  |    Version:
     Severity:  Normal             |   Keywords:
Actual Points:                     |  Parent ID:
       Points:                     |   Reviewer:
      Sponsor:                     |
-----------------------------------+--------------------------
 This morning, 2017-06-14 ~07:00, I noticed that the latest consensus
 retrieved by CollecTor was valid after 2017-06-13 17:00.

 The last log lines from the relaydescs module were:

 {{{
 2017-06-13 17:05:00,001 INFO o.t.c.c.CollecTorMain:66 Starting relaydescs module of CollecTor.
 2017-06-13 17:05:26,184 INFO o.t.c.r.CachedRelayDescriptorReader:255 Finished importing relay descriptors from local Tor data directories:
 cached-consensus: 2017-06-13 17:00:00
 cached-descriptors: parsed 0, skipped 24560 server descriptors
 cached-descriptors.new: parsed 608, skipped 8585 server descriptors
 cached-extrainfo: parsed 0, skipped 24543 extra-info descriptors
 cached-extrainfo.new: parsed 607, skipped 8239 extra-info descriptors
 v3-status-votes: parsed 8, skipped 0 votes
 }}}

 All other modules continued as usual.

 Here's a stack trace obtained using `jcmd`:

 {{{
 "CollecTor-Scheduled-Thread-8" daemon prio=10 tid=0x00007fedd8006800
 nid=0x6411 runnable [0x00007fee023fd000]
    java.lang.Thread.State: RUNNABLE
         at java.net.SocketInputStream.socketRead0(Native Method)
         at java.net.SocketInputStream.read(SocketInputStream.java:153)
         at java.net.SocketInputStream.read(SocketInputStream.java:122)
         at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
         at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
         at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
         - locked <0x000000078fd3b3d8> (a java.io.BufferedInputStream)
         at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:707)
         at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:650)
         at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1371)
         - locked <0x000000078fd3b418> (a sun.net.www.protocol.http.HttpURLConnection)
         at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
         at org.torproject.collector.relaydescs.RelayDescriptorDownloader.downloadResourceFromAuthority(RelayDescriptorDownloader.java:869)
         at org.torproject.collector.relaydescs.RelayDescriptorDownloader.downloadDescriptors(RelayDescriptorDownloader.java:817)
         at org.torproject.collector.relaydescs.ArchiveWriter.startProcessing(ArchiveWriter.java:176)
         at org.torproject.collector.cron.CollecTorMain.run(CollecTorMain.java:67)
         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:473)
         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
         at java.lang.Thread.run(Thread.java:745)
 }}}
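
 The `FutureTask.runAndReset` frame suggests that the module runs as a
 periodic task of a `ScheduledThreadPoolExecutor`.  Such a task never runs
 concurrently with itself, so one hung execution would prevent every later
 run of that module from starting, which would match the missing log lines.
 A minimal sketch of that behavior, assuming `scheduleAtFixedRate` is used
 (the class and method names below are hypothetical and not taken from
 CollecTor):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class StalledPeriodicTaskSketch {

  /** Schedules a periodic task whose first run never returns and reports
   * how many runs were started during the observation window. */
  static int countStartsWhileFirstRunHangs(long periodMillis,
      long observeMillis) throws InterruptedException {
    // Two threads, to show that spare threads do not help: a periodic
    // task is only rescheduled after its previous run has returned.
    ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2);
    AtomicInteger starts = new AtomicInteger();
    CountDownLatch neverReleased = new CountDownLatch(1);
    try {
      scheduler.scheduleAtFixedRate(() -> {
        starts.incrementAndGet();
        try {
          // Stands in for the socket read that never completes.
          neverReleased.await();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }, 0, periodMillis, TimeUnit.MILLISECONDS);
      Thread.sleep(observeMillis);
      return starts.get();
    } finally {
      // Interrupts the hung run so that the pool threads can exit.
      scheduler.shutdownNow();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    // Ten periods elapse, but only the first run is ever started.
    System.out.println("runs started: "
        + countStartsWhileFirstRunHangs(50, 500));
  }
}
```

 If that assumption holds, restarting the process is the only way out of
 such a state unless the blocking call itself can time out.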

 I stopped and restarted CollecTor and am now working on filling the gap of
 relay descriptors published in these ~16 hours by syncing from the backup
 instance.

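 I guess the fix is to start using a timeout, most likely on the HTTP
 connection in `RelayDescriptorDownloader.downloadResourceFromAuthority`,
 which is where the thread is blocked in a socket read.  It's just odd that
 we didn't run into this case before.  We didn't change anything there
 recently, did we?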
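
 A minimal sketch of what such a timeout could look like with
 `HttpURLConnection`, which is the class that shows up in the stack trace.
 The helper `fetchResponseCode`, the class name, and the timeout values are
 hypothetical and only illustrate the idea; this is not the actual CollecTor
 code.  The local `ServerSocket` stands in for an authority that accepts the
 connection but never answers:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.net.URI;
import java.net.URL;

class DownloadTimeoutSketch {

  /** Returns the HTTP status code, or throws SocketTimeoutException if
   * connecting or waiting for the response takes longer than allowed. */
  static int fetchResponseCode(URL url, int connectTimeoutMillis,
      int readTimeoutMillis) throws IOException {
    HttpURLConnection huc = (HttpURLConnection) url.openConnection();
    // Both default to 0, which means "wait forever".
    huc.setConnectTimeout(connectTimeoutMillis);
    huc.setReadTimeout(readTimeoutMillis);
    huc.setRequestMethod("GET");
    try {
      return huc.getResponseCode();
    } finally {
      huc.disconnect();
    }
  }

  public static void main(String[] args) throws IOException {
    // Bound but never accept()ed: the OS completes the TCP handshake,
    // yet no HTTP response is ever sent back.
    try (ServerSocket silent = new ServerSocket(0)) {
      URL url = URI.create(
          "http://127.0.0.1:" + silent.getLocalPort() + "/").toURL();
      try {
        fetchResponseCode(url, 1000, 500);
        System.out.println("unexpected: got a response");
      } catch (SocketTimeoutException e) {
        // Without setReadTimeout this call would block indefinitely.
        System.out.println("gave up waiting: " + e.getMessage());
      }
    }
  }
}
```

 With a read timeout in place, a stalled download would fail with an
 exception that the caller can log and recover from, rather than blocking
 the module.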

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/22602>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online