[tor-bugs] #9385 [BridgeDB]: bridgedb's email responder should fuzzy match email addresses within time periods
Tor Bug Tracker & Wiki
blackhole at torproject.org
Sat Aug 3 02:26:23 UTC 2013
#9385: bridgedb's email responder should fuzzy match email addresses within time
periods
-----------------------------------------+----------------------------------
Reporter: isis | Owner: isis
Type: defect | Status: new
Priority: normal | Milestone:
Component: BridgeDB | Version:
Keywords: email,distributor,spam,bots | Parent:
Points: | Actualpoints:
-----------------------------------------+----------------------------------
tl;dr: We're getting trolled hardcore. We should have some sort of fuzzy
matching on email addresses within a time limit.
While looking into #9277, in the directory where BridgeDB stores its
logfiles, I noticed several problems.
One of them is that BridgeDB's email response distributor is incredibly
naive and susceptible to massive trolling. Setting aside the fact that
there are five days' worth of logfiles which include the *full* *text* of the
response emails, *including* *the* *client* *email* *addresses*, it is
actually lucky that I saw these email addresses, because there is a
definite pattern to them.
There were 200 occurrences of 'gmail.com':
{{{
$ grep -Er '@gmail\.com' | awk -Pe '{"From "} ; { print $2 }' | grep gmail\.com | wc -l
200
}}}
120 of which were unique:
{{{
$ grep -Er '@gmail\.com' | awk -Pe '{"From "} ; { print $2 }' | grep gmail\.com | sort | uniq | wc -l
120
}}}
The problem is that there are multiple addresses making requests in a row
which are not only quite clearly related (i.e.
<static_username>+<incremental_integer>@gmail.com, or
<base32_80bit_hash>@gmail.com) but also are quite obviously snark/trolling
from various adversaries.
For example, one of the usernames with incremental integers was
'feidanchaoren', and I saw it incremented 34 times, i.e.
{{{
feidanchaoren00001@
feidanchaoren00002@
[...]
feidanchaoren00034@
}}}
There were multiple requests (though at minimum 30 minutes apart) from
precisely the same username+integer.
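The incrementing-suffix pattern above could be collapsed to a single key
before any rate limiting is applied. The following is only a minimal sketch
in Python: `canonical_address` is a hypothetical helper, not an existing
BridgeDB function, and the choice to strip '+tag' suffixes and trailing
digits is an assumption based on the two patterns seen in these logs.

```python
import re


def canonical_address(address):
    """Hypothetical helper (not BridgeDB code): map an email address to a
    key shared by its '+tag' and trailing-integer variants."""
    if "@" not in address:
        raise ValueError("not an email address: %r" % (address,))
    local, _, domain = address.lower().rpartition("@")
    # Drop anything after the first '+', e.g. 'user+7' -> 'user'.
    local = local.split("+", 1)[0]
    # Drop trailing digits, e.g. 'feidanchaoren00001' -> 'feidanchaoren'.
    # If the local part is all digits, keep it rather than emptying it.
    stripped = re.sub(r"\d+$", "", local)
    local = stripped or local
    return local + "@" + domain
```

With this, 'feidanchaoren00001@gmail.com' through
'feidanchaoren00034@gmail.com' would all map to the same key, so they could
be counted as one requester. It would not catch the random-looking base32
usernames.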
Also, 'fei dan' is pinyin for 飞蛋, which means 'flying egg' in English. It
is from a Confucian parable which, if I understood it correctly (and I am
well-versed in neither Traditional Chinese nor Confucianism), is about a
man who pays so much attention to a bunch of eggs trying to ensure that
they hatch, that he does not pay any attention to what to do afterwards.
The eggs hatch, and the chickens fly away. Roughly, it means: "if you pay
too much attention to details and not enough to the bigger picture, you
are made of #fail". And 'cha oren' (超人) is 'superman' in English but more
accurately Nietzsche's 'übermensch' in German. I would assume we're being
trolled pretty hard.
One way to fix this might be to take the time period which we currently
wait between responses and, in addition to rejecting emails from precisely
the same username, also reject anything which fuzzy matches a recent
requester. However, going down the path of finding clever regexes to match
things like the email addresses which look like fake .onion addresses, in
addition to all the other things which are clearly patterns to a human,
sounds like a good way to either write unreadable code or accidentally
block honest users.
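As an alternative to hand-written regexes, a generic string-similarity
ratio could be checked against recently answered addresses. This is a rough
sketch only, using `difflib.SequenceMatcher` from the Python standard
library. The `FuzzyRateLimiter` class, the 1800-second window and the 0.85
threshold are all assumptions made for illustration; none of them are taken
from BridgeDB.

```python
import difflib
import time


class FuzzyRateLimiter(object):
    """Hypothetical sketch (not BridgeDB code): refuse a request if a
    similar-looking address was answered within the time window."""

    def __init__(self, window_seconds=1800, threshold=0.85):
        self.window_seconds = window_seconds  # assumed wait period
        self.threshold = threshold            # assumed similarity cutoff
        self.recent = []                      # list of (timestamp, address)

    def allow(self, address, now=None):
        now = time.time() if now is None else now
        address = address.lower()
        # Forget addresses which were answered outside the window.
        self.recent = [(t, a) for (t, a) in self.recent
                       if now - t < self.window_seconds]
        for _, seen in self.recent:
            ratio = difflib.SequenceMatcher(None, address, seen).ratio()
            if ratio >= self.threshold:
                return False  # too similar to a recent requester
        # Only answered requests are remembered.
        self.recent.append((now, address))
        return True


limiter = FuzzyRateLimiter()
first = limiter.allow("feidanchaoren00001@gmail.com", now=0)
second = limiter.allow("feidanchaoren00002@gmail.com", now=60)
```

Here `first` is allowed and `second` is refused, since the two addresses
differ by one character. The same property is the risk: short, unrelated
addresses on the same domain (e.g. 'bob@' and 'rob@') also score above this
threshold, so any cutoff would need tuning against real traffic to avoid
blocking honest users.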
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/9385>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online