[tor-bugs] #23817 [Core Tor/Tor]: Tor re-tries directory mirrors that it knows are missing microdescriptors

Tue Oct 31 13:17:38 UTC 2017

#23817: Tor re-tries directory mirrors that it knows are missing microdescriptors
-------------------------------------------------+-------------------------
 Reporter:  teor                                 |          Owner:  (none)
     Type:  defect                               |         Status:  new
 Priority:  Medium                               |      Milestone:  Tor:
                                                 |  0.3.3.x-final
Component:  Core Tor/Tor                         |        Version:
 Severity:  Normal                               |     Resolution:
 Keywords:  tor-guard, tor-hs, prop224,          |  Actual Points:
  032-backport? 031-backport?                    |
Parent ID:  #21969                               |         Points:
 Reviewer:                                       |        Sponsor:
-------------------------------------------------+-------------------------

Comment (by teor):

 Replying to [comment:9 asn]:
 > Replying to [comment:7 teor]:
 > > Replying to [comment:6 asn]:
 > > > Here is an implementation plan of the failure cache idea from
 comment:4 .
 > > >
 > > > First of all, the interface of the failure cache:
 > > >
 > > >   We introduce a `digest256map_t *md_fetch_fail_cache` which maps
 the 256-bit md hash to a smartlist of dirguards thru which we failed to
 fetch the md.
 > > >
 > > > Now the code logic:
 > > >
 > > > 1) We populate `md_fetch_fail_cache` with dirguards in
 `dir_microdesc_download_failed()`.  We remove them from the failure cache
 in `microdescs_add_to_cache()` when we successfully add an md to the cache.
 > >
 > > Successfully add *that* md to the cache?
 > > Or any md from that dirguard?
 > >
 >
 > I meant, we remove *that* md from the `md_fetch_fail_cache` if we manage
 to fetch *that* md from any dir.
 >
 > > I think this is ok, as long as we ask for mds in large enough batches.
 > >
 > > > 2) We add another `entry_guard_restriction_t` restriction type in
 `guards_choose_dirguard()`. We currently have one restriction type which
 is designed to restrict guard nodes based on the exit node choice and its
 family. We want another type which uses a smartlist and restricts
 dirguards based on whether we have failed to fetch an md from that
 dirguard. We are gonna use this in step 3.
 > > >
 > > > 3) In `directory_get_from_dirserver()` we query the md failure cache
 and pass any results to `directory_pick_generic_dirserver()` and then to
 `guards_choose_dirguard()` which uses the new restriction type to block
 previously failed dirguards from being selected.
 > >
 > > Do we block dirguards that have failed to deliver an md from downloads
 of that md?
 > > Or do we block dirguards that have failed to deliver any mds from
 downloads of any md?
 > >
 >
 > Yes, that's a good question that I forgot to address in this proposal.
 >
 > I guess my design above was suggesting that we block dirguards "that
 have failed to deliver any mds from downloads of any md", until those mds
 get fetched from another dirserver and get removed from the failure cache.

 I think this is the behaviour we want: trying each dir server for each
 specific md will mean that we time out, because there are more mds than
 there are dir servers.
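
 A minimal sketch in C of the dirguard-level blocking discussed above, under
 stated assumptions: a dirguard stays blocked while any md it failed to
 deliver is still outstanding, and it is unblocked once those mds have been
 fetched from some other dirserver. The fixed-size table, the integer
 `guard_id`, and all `fail_cache_*` names are hypothetical stand-ins for
 illustration; they are not the proposed `digest256map_t` interface or any
 existing tor function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DIGEST256_LEN 32
#define MAX_FAILURES 64   /* hypothetical fixed capacity for this sketch */

/* One recorded failure: dirguard `guard_id` failed to serve md `md_digest`. */
typedef struct {
  unsigned char md_digest[DIGEST256_LEN];
  int guard_id;           /* hypothetical stand-in for a guard identity */
} md_failure_t;

typedef struct {
  md_failure_t entries[MAX_FAILURES];
  size_t n;
} md_fail_cache_t;

/* Record a failure; returns false if the sketch's fixed table is full. */
static bool
fail_cache_note_failure(md_fail_cache_t *c, const unsigned char *digest,
                        int guard_id)
{
  assert(c && digest);
  if (c->n == MAX_FAILURES)
    return false;
  memcpy(c->entries[c->n].md_digest, digest, DIGEST256_LEN);
  c->entries[c->n].guard_id = guard_id;
  c->n++;
  return true;
}

/* The md was fetched from some dirserver: drop every failure recorded
 * for that md, whichever dirguard the failure was recorded against. */
static void
fail_cache_note_success(md_fail_cache_t *c, const unsigned char *digest)
{
  size_t kept = 0;
  for (size_t i = 0; i < c->n; i++) {
    if (memcmp(c->entries[i].md_digest, digest, DIGEST256_LEN) != 0)
      c->entries[kept++] = c->entries[i];
  }
  c->n = kept;
}

/* Dirguard-level policy: a guard is blocked while any outstanding
 * md failure names it, regardless of which md we want next. */
static bool
fail_cache_guard_is_blocked(const md_fail_cache_t *c, int guard_id)
{
  for (size_t i = 0; i < c->n; i++)
    if (c->entries[i].guard_id == guard_id)
      return true;
  return false;
}
```

 With this policy the lookup does not depend on which md is being requested,
 so one pass over the cache is enough to build the restriction list for a
 whole batch of mds.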

 There are two cases when this will cause us to fall back to the
 authorities:
 * we download a new consensus from the authorities, and they are the only
 ones with some of the mds in it
 * for some reason, an md referenced in the consensus is not mirrored
 correctly by relays or authorities (this would be a serious bug in tor)

 To avoid this happening when it isn't necessary, we should expire failure
 cache entries after a random time. Maybe it should be the time when we
 expect dir servers to fetch a new consensus and new mds. I think this is
 1-2 hours, but check dir-spec for the exact details. Or we could expire md
 failure caches each time we get a new consensus. That would be easier.
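
 The two expiry options above could look roughly like this in C. This is
 only a sketch: the entry layout, the integer `guard_id`, the caller-supplied
 random value `rnd`, and every function name are assumptions made for
 illustration, and the interval bounds passed to `pick_expiry()` would have
 to come from dir-spec rather than from this example.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define MAX_ENTRIES 64          /* hypothetical capacity for this sketch */

/* A failure record reduced to what expiry needs. */
typedef struct {
  int guard_id;                 /* hypothetical stand-in for a guard identity */
  time_t expires_at;            /* entry is dropped once this time is reached */
} fail_entry_t;

typedef struct {
  fail_entry_t entries[MAX_ENTRIES];
  size_t n;
} fail_cache_t;

/* Option 1: pick an expiry in [now + min_s, now + max_s].
 * `rnd` is any caller-supplied random number; the modulo introduces a
 * slight bias, which is acceptable for a sketch. Assumes that
 * max_s - min_s + 1 does not overflow. */
static time_t
pick_expiry(time_t now, unsigned min_s, unsigned max_s, unsigned rnd)
{
  assert(min_s <= max_s);
  return now + (time_t)(min_s + rnd % (max_s - min_s + 1));
}

/* Drop entries whose expiry time has passed; returns how many were dropped. */
static size_t
fail_cache_expire(fail_cache_t *c, time_t now)
{
  size_t kept = 0, before = c->n;
  for (size_t i = 0; i < c->n; i++)
    if (c->entries[i].expires_at > now)
      c->entries[kept++] = c->entries[i];
  c->n = kept;
  return before - kept;
}

/* Option 2: forget every failure when a new consensus arrives. */
static void
fail_cache_on_new_consensus(fail_cache_t *c)
{
  c->n = 0;
}
```

 Option 2 needs no per-entry timestamp at all, which is why it is the easier
 of the two to implement.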

 > That kinda penalizes dirservers that don't have a totally up-to-date md
 cache, which perhaps kinda makes sense. But perhaps before we become so
 strict, we should check whether we can improve the dirserver logic of
 fetching new mds so that they are almost always up-to-date if possible? Or
 we can do this last part after we implement the failure cache proposal?

 No, it's not possible to improve this behaviour without major changes to
 tor. A directory server can't fetch new mds until it fetches a new
 consensus that references them. And these fetches are done at random times
 to avoid flooding authorities with queries.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/23817#comment:10>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online