[tor-bugs] #24351 [Applications/Tor Browser]: Block Global Active Adversary Cloudflare
Tor Bug Tracker & Wiki
blackhole at torproject.org
Thu Oct 24 23:51:24 UTC 2019
#24351: Block Global Active Adversary Cloudflare
-------------------------------------------------+-------------------------
Reporter: nullius | Owner: cypherpunks
Type: enhancement | Status: assigned
Priority: High | Milestone:
Component: Applications/Tor Browser | Version:
Severity: Major | Resolution:
Keywords: security, privacy, anonymity, mitm, cloudflare | Actual Points:
Parent ID: #18361 | Points: 1000
Reviewer: | Sponsor:
-------------------------------------------------+-------------------------
Comment (by hlandau):
Cloudflare considered harmful
Websites should avoid using Cloudflare.
Cloudflare's HTTP fronting service incorporates some seriously
questionable practices. As things stand, the sheer number of websites
which have come to use Cloudflare, multiplied by its issues, poses a
hazard to the state of the web.
Cloudflare, as advertised to site operators, does not mention these
issues, and operators may not even be aware of them.
The CAPTCHA absurdity
If a website uses Cloudflare, then most likely, and by default, the
website will be rendered stochastically defective.
A website is an HTTP request processing service. Adoption of Cloudflare
results in such services becoming unreliable, and causes denial-of-service
conditions to occur for users, in an essentially random and unaccountable
fashion.
Where such denial-of-service conditions occur, Cloudflare provides a
bizarre “one more step” page inviting the visitor to complete a reCAPTCHA
to access the site. Cloudflare claims that this is based on IP reputation,
which is a fallacious conflation of IP addresses with users, and which has
been found to be highly detrimental to Tor users in terms of the
browsability of the web.
This page doesn't even work if the user has cookies disabled, or if the
user uses a browser which doesn't support iframes, such as Lynx.
The HTTP status code for this page is 403 Forbidden. Essentially,
Cloudflare by design randomly perpetrates denial-of-service attacks on
users, yet at the same time Cloudflare paradoxically advertises itself as
a service to mitigate DoS attacks.
Cloudflare claims that these measures are necessary to counter abuse.
This claim is dubious because it is a model of operation for a CDN which
appears to have few if any imitators. In other words, other CDNs do not
appear to have any issue implementing HTTP properly while providing anti-
DDoS measures, without resorting to such practices as random demands that
users complete CAPTCHAs.
Cloudflare's randomly occurring demand that users complete CAPTCHAs
discriminates, by design, against users that are not human. This
constitutes a hazard to the crawlability of the web. It is unclear what
recourse is available to organizations spidering the web which find
themselves impaired by Cloudflare's actions. Even if Cloudflare were to
whitelist these organizations, this essentially makes Cloudflare an
authority on the legitimacy of a search engine; which, given the magnitude
of Cloudflare's user base, is deeply concerning.
There is no basis in the HTTP standard for demanding that users or their
user agents complete CAPTCHAs in order to load a page, and the CAPTCHA
demands issued by Cloudflare are not communicated in any standard way. The
'403 Forbidden' status code used for these pages could express arbitrary
policy prohibitions, such as non-negotiable denial of access. Cloudflare
not only demands a human user (and not consistently, only when it randomly
decides to do so), but also fails to communicate unambiguously, in a
machine-readable manner, when this is the case, which is discriminatory to
robots, many of which are legitimate.
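For contrast, HTTP does have machine-readable ways for a server to tell a client to slow down: 429 Too Many Requests (RFC 6585) and 503 Service Unavailable, either of which may carry a Retry-After header. The following minimal Python sketch shows how a hypothetical well-behaved crawler might act on status codes alone; the `Response` type and `crawler_action` function are illustrative assumptions, not any real crawler's code. Under these rules a challenge page served as 403 is indistinguishable from a permanent refusal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Response:
    """Hypothetical, minimal view of an HTTP response."""
    status: int
    retry_after: Optional[int] = None  # seconds, if a Retry-After header was sent


def crawler_action(resp: Response) -> str:
    """Decide what a hypothetical crawler does, using only standard HTTP signals."""
    if 200 <= resp.status < 300:
        return "index"
    if resp.status in (429, 503):
        # Standard "back off and try later" signals; honour Retry-After if present.
        delay = 60 if resp.retry_after is None else resp.retry_after
        return f"retry after {delay}s"
    if resp.status == 403:
        # 403 Forbidden means the server refuses to fulfil the request. Nothing
        # in the status tells the client that completing a CAPTCHA would help.
        return "give up"
    return "skip"


print(crawler_action(Response(429, retry_after=30)))  # retry after 30s
print(crawler_action(Response(403)))                  # give up
```

The 60-second default delay is an arbitrary choice for the sketch.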
Rather absurdly, if you want to provide an API over Cloudflare, you have
to exempt your API endpoints so that this doesn't happen. That raises the
question of what the point of the measure is if you have to punch holes in
it, and basically proves the point.
Cloudflare's inexplicable inability to implement HTTP in a sane,
transparent manner, when every other CDN service in existence seemingly
manages to do so, became even more ridiculous when Cloudflare reached out
to the Tor project to request that they make changes to Tor to accommodate
Cloudflare's own problematic practices.
In its liaison with Tor, Cloudflare states that its reasoning for its
CAPTCHAs is not DDoS mitigation, but the following:
They want to mitigate comment spam.
They think it's better UI to verify the user in a GET request than in
a POST request when they're actually making the comment.
Moreover, if the comment system uses AJAX, intercepting the POST
request won't work properly.
You would think that this last point might make Cloudflare realise that
trying to stop comment spam at the CDN level is a futile idea and can only
result in breaking HTTP, but no. What Cloudflare is trying to do here is a
fundamentally broken practice (in fact, the whole premise of “web
application firewalls” is fundamentally broken, see below) because
Cloudflare is not in a position to understand the semantic meaning of HTTP
traffic and is definitely not in a position to rearchitect a site
operator's web application so that it understands why its own AJAX
requests are randomly being denied.
In other words, since CAPTCHAs are discriminatory to robots as discussed
above, Cloudflare's service is unwittingly discriminatory to the
JavaScript of the very websites it serves, breaking them. Cloudflare's
response to this appears to be to CAPTCHA the entire website up-front on
the off chance someone might want to post a comment, though even this
doesn't always work; I have definitely encountered websites which didn't
do so and which had broken AJAX functionality due to the subsequent AJAX-
triggered requests being denied by Cloudflare, a condition the JS code was
not designed to handle (nor would there be any sane way for it to handle
it anyway).
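To make the failure mode concrete, here is a minimal Python sketch standing in for a page's AJAX code; `handle_comment_reply` and its inputs are hypothetical. It was written against the site's own API, which always answers with JSON. When an intermediary answers instead with an HTML challenge page and a 403, the only thing the code can do is fail, because a background request has nowhere to show the user a CAPTCHA.

```python
import json


def handle_comment_reply(status: int, content_type: str, body: str) -> dict:
    """Hypothetical client-side handler for a background comment submission."""
    if status == 200 and content_type.startswith("application/json"):
        return json.loads(body)
    # Anything else, such as an HTML interstitial served with 403 Forbidden,
    # was never part of the site's API contract. The handler can report an
    # error, but it cannot render or solve a CAPTCHA from a background request.
    raise RuntimeError(f"unexpected reply: HTTP {status} ({content_type})")


print(handle_comment_reply(200, "application/json", '{"posted": true}'))  # {'posted': True}
```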
“Abuse”, and the Web Application Firewall fallacy
Being a comment spam filtering service isn't the only thing Cloudflare is
trying to do besides being a CDN. They also claim to use their CAPTCHAs to
mitigate other “abusive” traffic, like “harvesting email addresses”.
lunar: [...] Could you tell use [sic] what qualifies as abuse?
jgrahamc: Abuse: comment spamming, harvesting email addresses,
attacking web applications (e.g. SQL injection), HTTP DoS (exploiting slow
web servers/applications to knock them offline). I'm not interested in
L3/L4 DoS and Tor as that's non-existent (unless then [sic] exit node is
separately part of a botnet).
This is a fundamentally broken practice. Attempting to filter for SQL
injection at the CDN level is an exercise in futility and security
theatre. This is the “Web Application Firewall” idea, the absurd idea that
grepping requests/responses for known-to-be-naughty patterns is an
adequate cure for vulnerable web applications. In reality it isn't
trustworthy or accurate because it can't be.
If I try to log in to a website with the username ' OR 0=0 --, Cloudflare
has no way of knowing whether this is a SQL injection attack or just a
peculiar username which the website has decided to legitimately issue.
Cloudflare has no way of knowing if the website even uses SQL for data
storage.
If I post ' OR 0=0 -- in a comment, Cloudflare has no way of knowing
whether this is an SQL injection attack, or whether it will actually work,
or whether I'm actually posting a comment discussing SQL injection and
including examples (at which point this actually becomes a form of
censorship).
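The ambiguity can be demonstrated with a short Python sketch. `SQLI_SIGNATURE` is a hypothetical WAF-style rule (not Cloudflare's actual ruleset, which is not given here), and the `users` table is invented for illustration. The rule fires on both the peculiar username and the comment about SQL injection, yet an application that uses parameterised queries treats the same string as plain data and finds exactly one matching user. The proxy has no way to tell which kind of application sits behind it.

```python
import re
import sqlite3

# Hypothetical WAF-style signature for the classic "' OR 0=0" tautology.
SQLI_SIGNATURE = re.compile(r"'\s*OR\s+\d+\s*=\s*\d+", re.IGNORECASE)


def waf_blocks(value: str) -> bool:
    """Return True if a proxy applying the signature would reject the value.

    The function sees only the string: not the application, not whether it
    uses SQL at all, and not whether its queries are parameterised.
    """
    return SQLI_SIGNATURE.search(value) is not None


def find_user(db: sqlite3.Connection, username: str) -> list:
    # Parameterised query: the value is bound as data and never parsed as SQL.
    return db.execute("SELECT id FROM users WHERE name = ?", (username,)).fetchall()


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("' OR 0=0 --",)])

peculiar = "' OR 0=0 --"
comment = "Never concatenate queries: input like ' OR 0=0 -- bypasses a login check."
print(waf_blocks(peculiar), waf_blocks(comment))  # True True
print(find_user(db, peculiar))                    # [(2,)]
```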
What using Cloudflare does mean is that Cloudflare will randomly cause DoS
to users if it thinks they're trying to use a pattern of text to which
Cloudflare is by design allergic. The circumstances in which these denials
of service occur are, of course, ill defined and in no way exhaustively
enumerated, so using Cloudflare presents an intense and unaccounted
liability in terms of availability and content neutrality for any website.
It is essentially a way to make your website unreliable and fail randomly.
The “web application firewall” concept is fundamentally flawed in all
instances, because it falsely presupposes that a blind intermediate proxy
can reliably assess the semantic meaning of data transmitted, which is in
actual fact impossible. Since this kind of “service” is part of the
Cloudflare value proposition, an attempt at a profit-making value-add,
Cloudflare has essentially built its entire business on doing something
which is a bad idea and which cannot be reliably implemented.
Arbitrary and poorly defined content mangulation
Continuing with the flawed “web application firewall” theme of an
unknowing proxy trying to guess the semantic meaning of content
transmitted through it, Cloudflare insists on being a CDN which does un-
CDN-like things in yet other ways. Rather than being a neutral proxy of
traffic, even when Cloudflare isn't stochastically DoSing its customers'
websites, Cloudflare insists on doing interesting things with response
bodies.
For example, it mangles email addresses and replaces them with some
JavaScript convolution intended to complicate harvesting. Except that it
doesn't mangle email addresses... it mangles anything which looks
vaguely like an email address, even if it isn't.
This
# Welcome to example.com. To access the foobar API, use curl:
curl 'https://foo@example.com/foobar'
becomes
# Welcome to example.com. To access the foobar API, use curl:
curl 'https://[email protected]/foobar'
XMPP address? Filtered. SIP address? Filtered. OpenSSH algorithm
identifier? Filtered. Kerberos principal? Filtered. Because this filtering
is necessarily done without regard to the context, it suffers from the
same issues inherent in trying to prevent SQL injection, and is a potent
demonstration of how “Web Application Firewalls” are a fundamentally
stupid idea. Cloudflare can't actually have the slightest clue whether
something is an email address or not, but filter away it will.
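The kind of context-blind substitution described above can be sketched in a few lines of Python. The regular expression is an illustrative assumption, not Cloudflare's actual rule; any pattern that looks only at characters will misfire on the same kinds of input listed in the text (a URL with userinfo, XMPP and SIP addresses, an OpenSSH algorithm identifier, a Kerberos principal).

```python
import re

# Illustrative pattern only: Cloudflare's actual matching rules are not given here.
# Like any context-free filter, it looks at characters, not meaning.
EMAIL_LIKE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def obfuscate(text: str) -> str:
    """Replace everything that looks like an email address with a placeholder."""
    return EMAIL_LIKE.sub("[email protected]", text)


samples = [
    "curl 'https://foo@example.com/foobar'",  # URL with userinfo
    "xmpp:alice@jabber.example.org",          # XMPP address
    "sip:alice@example.com",                  # SIP address
    "Ciphers chacha20-poly1305@openssh.com",  # OpenSSH algorithm identifier
    "kinit alice@EXAMPLE.COM",                # Kerberos principal
]
for line in samples:
    print(obfuscate(line))
```

The first sample reproduces the curl example from the text exactly: it becomes `curl 'https://[email protected]/foobar'`.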
Since I browse the web with JavaScript disabled by default, it's a running
facepalm for me to find things on websites which aren't even email
addresses, even parts of source code listings, replaced with
[email protected].
Of course, this practice also discriminates against users with JavaScript
disabled and against browsers that don't support JavaScript, preventing
them from viewing email addresses (or anything that looks like one).
Cloudflare also takes other liberties. It rejiggers a webpage's JavaScript
to optimize it. Essentially Cloudflare modifies responses to apply a
variety of poorly-defined transformations it thinks appropriate. From a
website operator's perspective, this should be seen as a liability.
It is in actual fact an intelligence agency
No other CDN service offers a free service comparable to that of
Cloudflare. Why does Cloudflare offer service for free?
It's because Cloudflare isn't a CDN, it's an intelligence project. Its
entire purpose is to collect data. This isn't my inference; the founders
of Cloudflare have happily gone on record saying so:
Back in 2003, Lee Holloway and I started Project Honey Pot as an open-
source project to track online fraud and abuse. The Project allowed anyone
with a website to install a piece of code and track hackers and spammers.
We ran it as a hobby and didn't think much about it until, in 2008, the
Department of Homeland Security called and said, 'Do you have any idea how
valuable the data you have is?' That started us thinking about how we
could effectively deploy the data from Project Honey Pot, as well as other
sources, in order to protect websites online. That turned into the initial
impetus for CloudFlare.
Yes, Cloudflare was founded by the Project Honey Pot people.
Cloudflare also has an extremely generous free tier, and probably most of
the websites which use it do not pay. But as we've come to understand in
this era of surveillance capitalism, if you aren't paying, you aren't the
customer — you're the product.
Threat to anonymity of Tor users. Cloudflare doesn't just pointlessly
inconvenience Tor users by making them solve CAPTCHAs to view websites; it
also provides a vehicle for the deanonymisation of Tor users. Cloudflare is
basically an ideal platform for attacking Tor because it is the closest
anyone has ever come to building a Global Active Adversary (GAA) — an
entity which can observe and modify traffic anywhere in the world. Compare
this with the lesser category of the Global Passive Adversary (GPA), which
can observe but not modify traffic anywhere in the world. Tor is not
designed to offer effective security against either of these.
To put this in perspective, in 2013 the NSA had given up on ever achieving
GPA status (and therefore on ever being able to reliably deanonymise Tor
traffic), let alone GAA. Cloudflare is effectively inviting people to help
it become a GAA.
Tracking cookies. By the way, Cloudflare delivers a tracking cookie for
any website which uses it. Even if the website is completely static and
stateless, you still get a tracking cookie. (Since Cloudflare definitely
has assets in the EU — it has to, it's a CDN — it's also pretty
egregiously violating EU law here.)
It is probably a US Government-attached intelligence agency
Cloudflare is known for providing its services to a variety of websites,
including notorious piracy websites such as The Pirate Bay. It is also a
US company.
Since the US is known for taking down even companies that appear to be
legal on paper, such as Megaupload, when they are associated with
copyright infringement, this situation is peculiar.
Probable liability. 17 USC 512 provides for exemptions from liability for
copyright infringement for various types of entity. 17 USC 512(c) provides
for the takedown of “information residing on systems or networks at
direction of users”; this is the well-known “DMCA notice” provision.
However, it also contains a provision, 17 USC 512(b), which relates to
caching proxies (i.e., Cloudflare).
This clause provides several conditions for this exemption from liability
being valid:
that the material is transmitted by the caching proxy without
modification; and
that the proxy handle takedown notices for material on a site if a
court has ordered that the material be removed from the original site;
amongst other things.
In other words, if a US court were to order that The Pirate Bay take down
certain pages of their site, Cloudflare would be obliged to comply with
notices asking them to give effect to that takedown in the absence of
compliance by The Pirate Bay itself — and in any case, it seems likely
that a US court could also just order Cloudflare directly to disable
access to it.
But even this is moot because Cloudflare modifies the material it passes.
Therefore, it cannot claim 17 USC 512(b) exemption at all. Moreover, since
47 USC 230 (the Communications Decency Act) explicitly exempts copyright
from the immunity from liability it grants intermediaries, without an
applicable 17 USC 512 exemption it is likely to be liable. Despite this,
the US has not so much as attempted action against Cloudflare for
providing services to notorious piracy websites.
My conclusions. It appears likely that the US government could adversely
affect Cloudflare's business via legal action if they wished. The fact
that they have not is therefore unusual. However, it is well known that US
federal law enforcement is happy to avoid shutting down illegal activities
if they believe that they can obtain more intelligence by not doing so.
From previous programmes like PRISM, it is well demonstrated that most
large tech companies are perfectly happy to comply with requests for total
access by intelligence and law enforcement agencies. Moreover, given that
Cloudflare is now used by an absurdly large number of websites, this means
that Cloudflare is essentially the world's premier global MitM agency.
This is a level of access and visibility that signals intelligence
agencies could ordinarily only dream of, especially since it breaks TLS.
To put it frankly, the intercept data available to Cloudflare is so
tantalising to intelligence agencies, it seems almost beyond plausibility
that they haven't gone after it, especially when considering the
possibility of effectively blackmailing Cloudflare over the legality of
their activities and, for that matter, the known historical contact of
their founders with DHS. (Though even this assumes that tech companies are
by default reluctant to assist extrajudicial surveillance, which is
frequently not the case; indifference or outright enthusiasm is known to
be common.)
In other words, on the balance of probabilities, I believe that the US
government's continued lack of aggression towards Cloudflare, together
with simple consideration of the standard MO of both intelligence agencies
and tech companies, makes it overwhelmingly likely that Cloudflare is an
effective element of US signals intelligence.
This is not, of course, evidence or proof. However, the probability is high
enough that assuming it is not the case would be, frankly, imprudent.
Although the use of Cloudflare as a wiretap alone would be alarming in
itself, the prospect of the fusion of Cloudflare's partial GAA and the
NSA's partial GPA capabilities would be formidable in the extreme. An
abundance of caution — and a distrust of companies essentially trying to
put themselves in a position to MitM all web traffic, even if they claim
benevolence — is wholly advisable.
After all, the mere possibility of this threatens to undermine the “crypto
renaissance”, to undermine everything which the public cryptographic
community has worked for since 2013 — and cryptography is all about
mitigating mere possibilities in the first place[1].
Conclusions
Cloudflare's product is based on fundamentally flawed ideas such as
“web application firewalling” which simply cannot work properly.
Using Cloudflare is a way of stochastically DoSing and subtly breaking
your own website.
Using Cloudflare discriminates against Tor users, and for that matter
some non-Tor users.
Cloudflare is the world's leading global MitM agency, rivalling the
power of any signals intelligence agency. It is in a position to
monitor, surveil, deanonymise and modify an alarmingly large and growing
percentage of web traffic because of its widespread usage and the fact
that it terminates TLS sessions[2]. Even if you were to trust Cloudflare,
putting this level of trust in one entity is extremely unwise.
It is extremely hard to imagine this intercept data not ending up in
the hands of intelligence agencies.
Hugo Landau
https://github.com/hlandau
https://www.devever.net/~hl/cloudflare
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/24351#comment:141>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online