Fwd: [Wikitech-l] Planning to tighten TorBlock settings

Sat Apr 4 01:08:10 UTC 2009

On Fri, Apr 03, 2009 at 03:29:37PM -0400, Gregory Maxwell wrote:
> On Fri, Apr 3, 2009 at 12:34 PM, Paul Syverson
> <syverson at itd.nrl.navy.mil> wrote:
> > On Fri, Apr 03, 2009 at 12:03:53PM -0400, Gregory Maxwell wrote:
> >
> >> To solve this issue I believe that TOR needs a strong pseudo-anonymous
> >> system built in and available to users.  Something where Wikipedia can
> >> block a single misbehaving user just as easily as they can without Tor
> >> and that user can't simply mint 100 more accounts in a few minutes.
> >> There have been various proposals for systems to accomplish this in
> >> the past, but unless one is integrated and easily supported by
> >> websites it will do no good.
> >
> > Building this into Tor would be pointless because, as you observed, it
> > would not block misbehavior via the many other ways to be anonymous
> > widely used by abusive people. Systems that require protection against
> > abuse need some sort of authentication (possibly anonymous or
> > pseudonymous) required for any access. Otherwise people can just
> > manufacture identities at will and use them over any other path than
> > Tor.  More accurately I should have said that it would be pointless
> > to reduce overall abuse; it would just give one warm fuzzies that
> > abuse via Tor was moved to the other mechanisms not designed to be
> > as visible.
> 
> So I need to ask you to take a leap of faith here: I need you to simply accept
> that:
> 
> * There are people involved in the Wikimedia project who care deeply
>   about TOR relevant issues like the freedom from censorship.
> * That these people generally understand the issues, the technical
>   factors of how Tor works, and have spent a considerable amount of
>   time thinking about these issues, conducted measurements, etc.
> * That these people understand the issues facing Wikipedia better
>   than you do.

It's no leap of faith. I would take the first and last points as given
and am happy to know the points in the middle.  I apologize again if
my tone was off (even more than usual).

> With that out of the way...
> 
> I'm quite confident that it would not be pointless.
> 
> At any given moment the English Wikipedia is blocking write access
> from a considerable portion of the total IPv4 address space.
> Persistently write-blocked access includes almost every widely known
> anonymization service (including the commercial offerings), the entire
> address blocks of many of the lower priced colocation/shell providers,
> many tens of thousands of open or partially open HTTP proxies, a large
> number of ISP transparent proxies which have not been registered as
> trusted XFF header sources. Frequently short term blocked ranges
> include popular broadband providers that allocate from large
> non-geographically specific pools, as well as entire universities and
> branches of government.  Last I looked there was a good number of /16s
> blocked.
> 
> The site administrators can selectively grant block bypasses to users,
> so if a regular good contributor is impacted when her entire
> university gets blocked she can be exempted by simply requesting an
> exemption.
> 
> Because of the aggressive use of both long and short term write access
> blocking made possible by a general community indifference to
> collateral damage against users who are not regular contributors (who
> can be exempted) the existing tools are *highly effective* at halting
> repeated aggressively disruptive behaviour.
> 
> We must distinguish casually disruptive behaviour and aggressively
> disruptive behaviour. For the former, someone scribbling nonsense into
> an article, TOR is simply not a factor... these people are not going to
> use Tor. If they had enough brains to get Tor installed they'd
> probably come up with a more productive use of their time.  Few people
> involved with Wikipedia think they are fighting that problem by
> limiting Tor.  The latter type includes aggressive fairly intelligent
> people who are willing to put in considerable time and effort. These
> sorts of people will write their own attack software, and yes, they'll
> use Tor.  These people produce damage in great disproportion to their
> numbers.
> 
> Without Tor these people will have access to some number of yet
> unblocked anonymization services, internet cafes, broadband provider
> subnets, dialup accounts, etc.  They crop up, cause some damage, and the
> avenue is closed with a (frequently permanent) write access block.
> Once they have exhausted their available means, finding new proxies is
> time consuming and costly, especially since English Wikipedia has so
> much exposure that all the low hanging fruit is blocked. The attacker
> will usually give up, or at least be greatly slowed.
> 

I have trouble believing that, if they are indeed as tech-savvy as you
say, because I'm thinking that there is a virtually indefinite
supply of fruit that hangs pretty low for anyone willing to do the
things you outlined. I suppose this is the source of our divergence
on my "pointless" statement above.

But, if it is really working for you, and they haven't just learned to
slip below the radar, good for you (relatively speaking).

> With Tor permitted they can simply continue to make their trouble
> continually and without abatement for as long as they like... and can
> do so at high rates of speed relative to the overall editing activity
> on the site through massively parallel automated operation.
> 
> Prior to the existence of Tor awareness in mediawiki Tor exits were
> blocked randomly as troublemakers cropped up on them. At any given
> time only about 300 Wikipedia-reaching exits were blocked, which was
> actually sufficient because most attackers were not quite smart enough
> to figure out how to visit Wikipedia with named exits (which is a serious
> usability problem with Tor... because it's non-trivial to get exit
> syntax working with vhosted sites).  ... but this also meant that
> legit Tor users also couldn't edit either, as they were even less
> likely to find and use working exits than the attackers.
> 
> > It is also not necessary to have a technical solution. Simply escrowing
> > all edits coming via Tor until some editor is willing to check them
> > would prevent abuse from ever getting to Wikipedia and would allow
> > edits only at whatever rate Wikimedia chooses to devote resources.
> 
> This presumes a fairly narrow view of abuse, and I think a greatly
> underestimated view of the total cost of escrowing.
> 
> Common forms of abuse include things like flooding history and logs
> with abusive messages and the repeated posting of private information
> which Wikipedia chooses not to disseminate for legal or ethical
> reasons.  In fact these kinds of attacks probably now constitute a
> majority of the activity from the aggressive attackers, as their goal
> is usually to piss off the Wikipedia community, and these methods are
> actually more effective to that end than screwing with the content. (A
> particular perversion of the Wikipedia community, perhaps, but it is
> what it is...)

I don't think I had a narrow view of abuse. Rather a broad view of edit.
All of these imply something more than read access to public pages,
which is what I was trying to capture succinctly by "edit".

> 
> Escrowing can't fix flooding, nor does escrowing make sense for
> non-content materials.
> 
> Even for content escrowing isn't simple. When the most recent edit is
> escrowed, hitting edit does what? Does it fork the non-escrowed text?
> Does it subject the editor to the (possibly offensive or browser
> crashing) escrowed material? If it was forked, now an approver must
> not just approve but now must manually merge the text.  It becomes
> messy quickly, and the Wikipedia community has already clearly sent
> the message that however much it cares about Tor it does not care so
> much as to do a lot of additional work to support it.
> 

Right. My point is that you set up however much resource you care to
for escrowed material. I don't mean to imply a single resource if it
is more convenient to set up a content escrow and any other escrows
for things that can be reached and affected through Tor. However much all
of that amounts to is up to you. Once the escrow repository(ies) is
full, you just drop any more input via Tor until you have had a chance
to look at it. Similarly for the available time of human editors.
Assuming it is anything above zero, it is useful. They will only put
in as much effort as they care to. Until they get around to it, no
input via Tor. There is no need to put "a lot of additional work". The
point is that abuse via this vector now comes to Wikipedia only at the
rate the community wants to take it. Abuse does not overload the
Wikipedia resources or community, it just prevents honest inputs via
Tor. _But_ because it _never_ succeeds in successful attacks on
Wikipedia, the incentive to persist doing it goes down over time
(assuming that preventing honest edits via Tor is not itself
enticing).
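
A minimal sketch of the bounded-escrow idea described above, in Python
and for illustration only. The class, its methods, and the capacity
value are hypothetical names chosen here; none of this reflects actual
MediaWiki code. It assumes a single holding area whose size the
community picks: submissions arriving via Tor are held until reviewed,
and anything arriving while the holding area is full is simply dropped.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Escrow:
    """Bounded holding area for submissions arriving via Tor (hypothetical)."""
    capacity: int
    pending: deque = field(default_factory=deque)

    def submit(self, edit: str) -> bool:
        """Hold the edit for review, or drop it if the escrow is full."""
        if len(self.pending) >= self.capacity:
            return False  # dropped: costs reviewers nothing
        self.pending.append(edit)
        return True

    def review(self, approve) -> list:
        """Drain the escrow whenever a reviewer has time; return approved edits."""
        approved = [e for e in self.pending if approve(e)]
        self.pending.clear()
        return approved


escrow = Escrow(capacity=2)
results = [escrow.submit(e) for e in ["fix typo", "SPAM", "add citation"]]
published = escrow.review(lambda e: e != "SPAM")
accepted_after_review = escrow.submit("add citation")
```

The capacity and the review rate are the only knobs: abuse can fill the
holding area and crowd out honest input via Tor, but it cannot consume
more reviewer effort than the community chose to offer.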

> > Because of the incentives from failing ever to succeed via this
> > path would mean that abusive submissions should be minimal; although
> > that is moot from the perspective of Wikimedia resources devoted to
> > it. I hashed this out with Jimbo several years ago, and he entirely
> > agreed with me in the end.
> 
> That was then, this is now.  In any case, Jimmy touches on a lot of
> areas but isn't deeply involved with dealing with these problems on a
> daily basis; the people who are have significantly different
> positions.
> 

Please let me know how what I said above is incompatible with
any new positions.

> 
> > All that aside, my understanding is that precisely what you suggest
> > has been designed, built, and proposed for integration to wikimedia
> > more than once and was basically ignored: first Jason Holt's nym and
> > then later the more fully featureful and developed nymble from some
> > folks at Dartmouth.
> 
> s/wikimedia/mediawiki/
> 
> I'm aware of the nymble system. But it is completely integrated on the
> Tor side. To make use of it you have to conduct some complicated dance
> of copying around cryptographic blobs between websites, running
> javascript cryptographic engines, etc.
> 
> Frankly, it was a dance that I'd never bother going through. It's
> something which we could only reasonably expect troublemakers to use.
> (And... perhaps a few Chinese dissidents... except their needs are
> frequently met already by friends in other countries running closed
> access https proxies for them)
> 
> (I believe it also, or at least one of them, required the whole site to
> be via HTTPS, which is pretty much a non-starter for economic reasons;
> but I could be misremembering)
> 
> Neither of these systems is or can ever be a panacea: In the best
> case they are at least a 2x force multiplier for the attacker, though in
> practice even more since they also inhibit Wikipedia from
> utilizing range blocks.  So allowing these systems would be a
> compromise... once you factor in a decent amount of additional code to
> maintain on the WP side, and that these systems were too complicated
> for almost anyone except an attacker to use due to a lack of Tor
> integration...  of course there hasn't been much interest in deploying
> them.
> 
> If, instead, there was some magic torbutton nymthing that interfaced
> to websites via a standard remote authentication mechanism like OpenID
> which someone is actually going to maintain,  then perhaps you'd see
> people actually using it. (and perhaps still not Wikimedia sites, but
> at least there would be a greater chance).  It's certainly bound to be
> more effective than some protest of blocking read access.

Umm. Been a while for me too, but I think you just described nymble,
or at least part of it. I also don't see how it would affect range
blocks because they're just orthogonal to each other.

You seem to be saying that you are actually managing against abuse
roughly OK with completely open unauthenticated access via various ad
hoc techniques such as range blocking, except for sophisticated
attackers who tend to be the ones that come via Tor and a couple
of other vectors that you know how to dry up. Is that correct?
That would be surprising but good to know.

2AM here. G'night,
Paul