[tor-bugs] #13664 [Tor]: Potential issue with rend cache object when intro points falls to 0.

Tue Nov 4 21:26:39 UTC 2014

#13664: Potential issue with rend cache object when intro points falls to 0.
---------------------+---------------------
 Reporter:  dgoulet  |          Owner:
     Type:  defect   |         Status:  new
 Priority:  normal   |      Milestone:
Component:  Tor      |        Version:
 Keywords:  tor-hs   |  Actual Points:
Parent ID:           |         Points:
---------------------+---------------------
 (Reproduced on Tor v0.2.6.1-alpha-dev (git-a142fc29aff4b476))

 Here is the use case I was testing. I set up an HS on a remote server for
 perf analysis. On my client, I made a small script that opens 10
 connections through torsocks, each on a different circuit, to that HS
 (assuming that SOCKS5 user/pass == unique circuit works).

 With the above, one time out of 10, I get all 10 connections to
 successfully connect and work. The rest of the time I get an arbitrary
 number of connections failing with "Host unreachable". I feel this is a
 combo of sometimes luck and sometimes the real issue.

 I analyzed this and my understanding is that the rend cache contains a v2
 descriptor with stored intro points ("intro_nodes" variable). However,
 through the cycle of trying to connect, some intro points may be
 unreachable and are thus removed from that list. It also appears that we
 can remove nodes from that list when closing circuits that were built in
 "parallel":

 {{{
 Nov 04 15:36:08.000 [info] rend_client_close_other_intros(): Closing
 introduction circuit 25 that we built in parallel (Purpose 7).
 Nov 04 15:36:08.000 [debug] circuit_get_by_circid_channel_impl():
 circuit_get_by_circid_channel_impl() returning circuit 0x7f6f1a171190 for
 circ_id 2434373038, channel ID 0 (0x7f6f1a0425e0)
 Nov 04 15:36:08.000 [info] circuit_mark_for_close_(): Failed intro circ
 rejxmpqgho5vqdl4 to $EBE718E1A49EE229071702964F8DB1F318075FF8 (awaiting
 ack). Removing from descriptor.
 }}}

 circuit_mark_for_close_() triggers an INTRO_POINT_FAILURE_GENERIC failure
 that removes the intro point from the list. I might be wrongly
 interpreting the "we built in parallel" feature, but what I can observe is
 that the intro node list becomes empty at some point, which triggers a
 "let's refetch that v2 descriptor!" behaviour.

 {{{
 Nov 04 15:36:08.000 [info] rend_client_report_intro_point_failure():
 Unknown service "rejxmpqgho5vqdl4". Re-fetching descriptor.
 }}}
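
 To make the sequence concrete, here is a minimal sketch of the behaviour
 described above: each failure removes one intro point, and the refetch is
 triggered once the list is empty. This is not Tor's actual code; the
 struct and function names (sketch_cache_entry_t,
 sketch_report_intro_failure) are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_INTROS 10

/* Hypothetical stand-in for a cached v2 descriptor; not Tor's real struct. */
typedef struct {
  const char *intro_nodes[SKETCH_MAX_INTROS]; /* usable intro points */
  size_t n_intro_nodes;                       /* how many are left */
} sketch_cache_entry_t;

/* Remove one failed intro point from the entry. Returns true when the
 * list is now empty, i.e. the condition that leads to a refetch. */
static bool
sketch_report_intro_failure(sketch_cache_entry_t *e, const char *failed)
{
  for (size_t i = 0; i < e->n_intro_nodes; i++) {
    if (strcmp(e->intro_nodes[i], failed) == 0) {
      /* Swap-remove: the order of intro points does not matter here. */
      e->intro_nodes[i] = e->intro_nodes[e->n_intro_nodes - 1];
      e->n_intro_nodes--;
      break;
    }
  }
  return e->n_intro_nodes == 0;
}
```

 Note that in this sketch the entry itself stays in place after the last
 removal; only its intro list is drained.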

 However, the rend cache is not cleared of the old entry before fetching
 that new descriptor. So once the v2 descriptor is received, we store it in
 the cache using "rend_cache_store_v2_desc_as_client()", which prints this:

 {{{
 Nov 04 15:36:09.000 [info] rend_cache_store_v2_desc_as_client(): We
 already have this service descriptor rejxmpqgho5vqdl4. [rendezvous-
 service-descriptor i7hkcux5dghqv6ahstewyccltr6aud2x
 }}}

 So since we "have it" in the cache, we call "rend_client_desc_trynow()",
 which completely fails because all intro points in the cache object are
 gone, and this closes all pending connections.

 Now, I think this happens because the heuristic for telling if "We already
 have the cache object" is just a comparison of the "desc" string, here in
 rendcommon.c +1156

 {{{
   /* Do we already have this descriptor? */
   if (e && !strcmp(desc, e->desc)) {
     log_info(LD_REND,"We already have this service descriptor %s. [%s]",
              safe_str_client(service_id), desc);
     e->received = time(NULL);
     goto okay;
   }
 }}}
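
 A small model of why a string-only comparison is not enough: the encoded
 descriptor text is unchanged, but the parsed intro list attached to the
 cached entry has been drained in the meantime. The types and function
 names below are hypothetical, and the sketch assumes that a freshly
 parsed descriptor would come with usable intro points.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified cache entry; not Tor's real struct. */
typedef struct {
  const char *desc;     /* encoded descriptor text as received */
  size_t n_intro_nodes; /* intro points still considered usable */
} sketch_entry_t;

/* Mirrors the string-only check: true means "we already have it". */
static bool
sketch_already_have(const sketch_entry_t *e, const char *desc)
{
  return e != NULL && strcmp(desc, e->desc) == 0;
}

/* Can the client connect after the refetched descriptor is stored? */
static bool
sketch_can_connect_after_refetch(const sketch_entry_t *e, const char *desc)
{
  if (sketch_already_have(e, desc)) {
    /* The old entry is kept as is, including its drained intro list. */
    return e->n_intro_nodes > 0;
  }
  /* Assumption: a replaced entry is freshly parsed and has intro points. */
  return true;
}
```

 With an entry whose desc matches but whose n_intro_nodes is 0, the second
 function returns false, which corresponds to the failure seen above.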

 I think when the intro point list drops to 0 nodes, we should remove the
 entry from the cache and trigger the "fetch it again".
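
 A rough sketch of that idea, under the assumption that eviction can
 happen at the point where the last intro point is removed. The slot type
 and function name are hypothetical, and a real patch would have to go
 through Tor's actual cache API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cache slot for one service; not Tor's real struct. */
typedef struct {
  bool present;         /* is there a cached entry for this service? */
  size_t n_intro_nodes; /* intro points still considered usable */
} sketch_slot_t;

/* To be called after an intro point was removed. Evicts the entry when
 * no intro points remain and returns true if a refetch should start. */
static bool
sketch_evict_if_drained(sketch_slot_t *slot)
{
  if (slot->present && slot->n_intro_nodes == 0) {
    slot->present = false; /* drop the stale entry before refetching */
    return true;
  }
  return false;
}
```

 With the stale entry gone, the refetched descriptor can no longer be
 short-circuited as already known.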

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/13664>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online