[tor-bugs] #9385 [BridgeDB]: bridgedb's email responder should fuzzy match email addresses within time periods

Sat Aug 3 02:26:23 UTC 2013

#9385: bridgedb's email responder should fuzzy match email addresses within time
periods
-----------------------------------------+----------------------------------
 Reporter:  isis                         |          Owner:  isis
     Type:  defect                       |         Status:  new 
 Priority:  normal                       |      Milestone:      
Component:  BridgeDB                     |        Version:      
 Keywords:  email,distributor,spam,bots  |         Parent:      
   Points:                               |   Actualpoints:      
-----------------------------------------+----------------------------------
 tl;dr: We're getting trolled hardcore. We should have some sort of fuzzy
 matching on email addresses within a time limit.

 While looking into #9277, in the directory in which BridgeDB stores its
 logfiles, I noticed several problems.

 One of them is that BridgeDB's email response distributor is incredibly
 naive and susceptible to massive trolling. Setting aside the fact that
 there are five days' worth of logfiles which include the *full* *text* of
 the response emails, *including* *the* *client* *email* *addresses*, it is
 actually lucky that I saw these email addresses, because there is a
 definite pattern to them.

 There were 200 occurrences of 'gmail.com':
 {{{
 $ grep -Er '@gmail\.com' | awk -Pe  '{"From "} ; { print $2 }' |
     grep gmail\.com | wc -l
 200
 }}}

 120 of which were unique:
 {{{
 $ grep -Er '@gmail\.com' | awk -Pe  '{"From "} ; { print $2 }' |
     grep gmail\.com | sort | uniq | wc -l
 120
 }}}

 The problem is that multiple addresses making requests in a row are not
 only quite clearly related (i.e.
 <static_username>+<incremental_integer>@gmail.com, or
 <base32_80bit_hash>@gmail.com) but are also quite obviously snark/trolling
 from various adversaries.

 For example, one of the usernames with incremental integers was
 'feidanchaoren', and I saw it incremented 34 times, i.e.
 {{{
 feidanchaoren00001@
 feidanchaoren00002@
 [...]
 feidanchaoren00034@
 }}}
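
 A minimal sketch of how the incrementing pattern could be surfaced
 mechanically, assuming the addresses have already been pulled out of the
 logs into a list. The names canonical_key and group_related are
 hypothetical and not part of BridgeDB, and example.com stands in for the
 real domain. It only handles the '+<integer>' and trailing-digit cases, not
 the hash-like usernames.

```python
import re
from collections import Counter


def canonical_key(address):
    """Collapse a '+tag' suffix and trailing digits in the local part.

    Hypothetical helper: two addresses with the same key are treated as
    probably related, which is a heuristic and not proof.
    """
    local, _, domain = address.lower().rpartition("@")
    local = local.split("+", 1)[0]      # drop '+<anything>'
    local = re.sub(r"\d+$", "", local)  # drop a trailing counter
    return local + "@" + domain


def group_related(addresses):
    """Count how many distinct addresses share each canonical key."""
    return Counter(canonical_key(a) for a in set(addresses))


# Made-up sample data following the patterns described in this ticket.
sample = [
    "feidanchaoren00001@example.com",
    "feidanchaoren00002@example.com",
    "feidanchaoren00034@example.com",
    "someone+7@example.com",
    "someone+8@example.com",
]
groups = group_related(sample)
```

 Keys with a count well above one are candidates for a closer look. An
 honest user whose address happens to end in digits collapses to the same
 kind of key, so this is better suited to reporting than to blocking.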

 There were multiple requests (though at minimum 30 minutes apart) from
 precisely the same username+integer.

 Also, 'fei dan' is pinyin for 飞蛋, which means 'flying egg' in English. It
 is from a Confucian parable which, if I understood it correctly (and I am
 well-versed in neither Traditional Chinese nor Confucianism), is about a
 man who pays so much attention to a bunch of eggs trying to ensure that
 they hatch, that he does not pay any attention to what to do afterwards.
 The eggs hatch, and the chickens fly away. Roughly, it means: "if you pay
 too much attention to details and not enough to the bigger picture, you
 are made of #fail". And 'chao ren' (超人) is 'superman' in English but more
 accurately Nietzsche's 'Übermensch' in German. I would assume we're being
 trolled pretty hard.

 One way to fix this might be to take the time period which we currently
 wait between responses and, in addition to rejecting emails from precisely
 the same username, also reject anything which fuzzy matches a username
 seen within that period. However, going down the path of finding clever
 regexes to match things like the email addresses which look like fake
 .onion addresses, in addition to all the other things which are clearly
 patterns to a human, sounds like a good way to either write unreadable
 code or accidentally block honest users.
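
 As a rough illustration of the idea without hand-written regexes, the
 sketch below uses a generic string-similarity ratio from Python's standard
 difflib module. Everything in it is an assumption for discussion:
 FuzzyRateLimiter is a hypothetical class, not existing BridgeDB code, and
 the period and threshold defaults are placeholders that would need tuning
 against real traffic.

```python
import time
from difflib import SequenceMatcher


class FuzzyRateLimiter:
    """Reject a request whose username resembles one seen recently.

    Hypothetical sketch: 'period' (seconds) and 'threshold' (0.0 to 1.0)
    are placeholder values, not BridgeDB's actual settings.
    """

    def __init__(self, period=3600, threshold=0.85):
        self.period = period
        self.threshold = threshold
        self.recent = []  # (timestamp, local_part, domain) of answered mail

    def allow(self, address, now=None):
        now = time.time() if now is None else now
        local, _, domain = address.lower().rpartition("@")
        # Forget requests that fall outside the time period.
        self.recent = [e for e in self.recent if now - e[0] < self.period]
        for _, seen_local, seen_domain in self.recent:
            if seen_domain != domain:
                continue
            ratio = SequenceMatcher(None, local, seen_local).ratio()
            if ratio >= self.threshold:
                return False  # exact or fuzzy match within the period
        self.recent.append((now, local, domain))
        return True


# Made-up addresses; timestamps are passed explicitly for repeatability.
limiter = FuzzyRateLimiter(period=3600, threshold=0.85)
first = limiter.allow("feidanchaoren00001@example.com", now=0)
second = limiter.allow("feidanchaoren00002@example.com", now=60)
other = limiter.allow("alice@example.com", now=120)
later = limiter.allow("feidanchaoren00002@example.com", now=4000)
```

 Here the second request is refused because it differs from the first by
 one character, the unrelated username is answered, and the refused address
 is answered again once the period has passed. A similarity ratio has the
 same weakness as the regexes: short usernames which honestly differ by a
 character or two can score high, so the threshold trades missed trolls
 against blocked users. The linear scan is also only reasonable for a small
 number of recent requests.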

-- 
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/9385>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online