[tor-bugs] #25985 [Obfuscation/Snowflake]: Add AMP cache as another domain fronting option with Google
Tor Bug Tracker & Wiki
blackhole at torproject.org
Thu May 3 17:25:39 UTC 2018
#25985: Add AMP cache as another domain fronting option with Google
-----------------------------------+------------------------
Reporter: twim | Owner: (none)
Type: project | Status: new
Priority: Medium | Milestone:
Component: Obfuscation/Snowflake | Version:
Severity: Normal | Resolution:
Keywords: | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor:
-----------------------------------+------------------------
Comment (by dcf):
Replying to [comment:5 twim]:
> > I presume you at least need a Google account; is it something you set
up in the Google Cloud Platform? Is there a fee?
>
> Curiously enough you don't need a Google account for that because the
AMP project itself isn't solely a Google thing. It is just a special HTML
markup that can be accelerated by any party incl. Google. You just set up
an AMP version of your pages at your host and it just works. No GCP
involved. There is no fee at the moment for page loading, there will only
be on API calls (not our case). As IANAL, I am not aware whether this
usage violates ToS. I couldn't find any.
Thanks for this great info. It's a lot easier than I imagined. The Google
AMP cache will issue GET requests to arbitrary URLs on your behalf (going
back to the [https://www.bamsoftware.com/papers/oss.pdf OSS] idea). I
tried it with my web server, which I haven't done anything to set up for
AMP:
!https://www-bamsoftware-com.cdn.ampproject.org/c/s/www.bamsoftware.com/amptest
This resulted in an HTTP request to my server:
{{{
64.233.172.149 - - [03/May/2018:10:59:20 -0600] "GET /amptest HTTP/1.1"
404 3726 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P)
AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile
Safari/537.36 (compatible; Google-AMPHTML)"
}}}
Probably because the page doesn't pass AMP validation (i.e., doesn't
exist), the AMP cache's response was a status-200 meta/JavaScript redirect
to the original URL:
{{{
HTTP/1.1 200 OK
Location: https://www.bamsoftware.com/amptest
Cache-Control: private
X-Content-Type-Options: nosniff
Date: Thu, 03 May 2018 17:05:10 GMT
Content-Type: text/html; charset=UTF-8
Server: sffe
Content-Length: 361
X-XSS-Protection: 1; mode=block
Alt-Svc: hq=":443"; ma=2592000; quic=51303433; quic=51303432; quic=51303431; quic=51303339; quic=51303335,quic=":443"; ma=2592000; v="43,42,41,39,35"

<HTML><HEAD>
<meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>Redirecting</TITLE>
<META HTTP-EQUIV="refresh" content="1;
url=https://www.bamsoftware.com/amptest">
</HEAD>
<BODY
onLoad="location.replace('https://www.bamsoftware.com/amptest'+document.location.hash)">
Redirecting you to https://www.bamsoftware.com/amptest</BODY></HTML>
}}}
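A client talking to the AMP cache would have to notice this fallback, since the status code alone (200) does not reveal it. As a rough sketch only, here is one way to pull the redirect target out of such a body using the Python standard library. The names `MetaRefreshParser` and `meta_refresh_target` are made up for illustration, and the sketch assumes the fallback page always carries a meta refresh tag like the one above.

```python
import re
from html.parser import HTMLParser


class MetaRefreshParser(HTMLParser):
    """Collects the target URL of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling us.
        if tag != "meta" or self.target is not None:
            return
        attrs = dict(attrs)
        if (attrs.get("http-equiv") or "").lower() != "refresh":
            return
        # The content attribute looks like "1; url=https://example.com/page".
        m = re.match(r"\s*\d+\s*;\s*url=(.+)", attrs.get("content") or "",
                     re.I | re.S)
        if m:
            self.target = m.group(1).strip()


def meta_refresh_target(html_text):
    """Return the meta-refresh URL found in html_text, or None."""
    parser = MetaRefreshParser()
    parser.feed(html_text)
    return parser.target
```

A `None` result would suggest the cache returned real content rather than the redirect page.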
> > I've seen different kinds of AMP URLs...
> > Do you know what the difference between all these URL styles is? Are
they basically interchangeable? The first one looks like the best, if we
can use it.
>
> I haven't managed to make URLs like
https://www.google.com/amp/s/amp.reddit.com/blablabla not redirect to
the full article. I probably just do not understand how this kind of
link differs from the others.
The trick with these is that you have to use a mobile User-Agent. Press
Ctrl+Shift+I to open the browser developer tools, click the "Responsive
Design Mode" button, and choose a phone from the menu.
> > https://amp-reddit-com.cdn.ampproject.org/
>
> This is the kind of links I am using in amper. I guess that in theory
*.cdn.ampproject.org can resolve to non-Google IPs as well. These hosts
can be fronted by typical Google server names.
Okay yeah, I found these guides to the URL format. The `c` means content
(can also be `r` for resource or `i` for image) and the `s` means use TLS.
https://developers.google.com/amp/cache/overview#amp-cache-url-format
https://www.ampbyexample.com/advanced/using_the_google_amp_cache/#amp-cache-url-format
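To make the path format concrete, here is a minimal sketch that takes an AMP Cache URL apart into the type letter, the TLS flag, and the origin URL. It only encodes the `c`/`r`/`i` and `s` meanings described above. The function name `parse_amp_cache_url` is hypothetical, and the sketch does not try to cover every case in the linked guides.

```python
from urllib.parse import urlsplit

# Type letters as described above: c = content, r = resource, i = image.
KINDS = {"c": "content", "r": "resource", "i": "image"}


def parse_amp_cache_url(url):
    """Split an AMP Cache URL into (kind, uses_tls, origin_url)."""
    parts = urlsplit(url)
    host = parts.hostname
    if not host or not host.endswith(".cdn.ampproject.org"):
        raise ValueError("not an AMP Cache URL: %r" % url)
    segments = parts.path.lstrip("/").split("/")
    kind = KINDS.get(segments[0])
    if kind is None:
        raise ValueError("unknown type letter: %r" % segments[0])
    rest = segments[1:]
    # An "s" segment after the type letter means the origin is fetched over TLS.
    uses_tls = bool(rest) and rest[0] == "s"
    if uses_tls:
        rest = rest[1:]
    if not rest or not rest[0]:
        raise ValueError("missing origin host: %r" % url)
    origin = ("https" if uses_tls else "http") + "://" + "/".join(rest)
    if parts.query:
        origin += "?" + parts.query
    return kind, uses_tls, origin
```

For the test URL from earlier in this comment, this gives back `https://www.bamsoftware.com/amptest` as the origin, with kind "content" and TLS on.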
> > https://amp.reddit.com/
>
> This is the host from which one is serving their AMP pages.
I see; so "amp" in the name here is just a convention, not a requirement.
I found this description of the three kinds of URLs:
https://www.ampproject.org/latest/blog/whats-in-an-amp-url/
The "*.cdn.ampproject.org" ones they call "AMP Cache" URLs and the
"google.com/amp" ones they call "AMP Viewer" URLs. It seems like the "AMP
Viewer" URLs are only produced automatically by Google in search result
pages. But yeah, in any case, you can domain-front the "AMP Cache" URLs.
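For reference, domain fronting an AMP Cache URL amounts to something like the sketch below: the TCP/TLS connection (and therefore DNS and SNI) names a front domain, and only the Host header inside the encrypted channel names the `*.cdn.ampproject.org` host. Using `www.google.com` as the front is an assumption for illustration, as are the function names. The sketch does not establish whether Google actually routes such requests.

```python
import http.client

# Assumption: www.google.com is the front name seen in DNS and the TLS SNI.
FRONT_HOST = "www.google.com"


def fronted_request_parts(amp_cache_host, path):
    """Return (connect_host, path, headers) for a domain-fronted GET."""
    # Only the Host header, sent inside the TLS channel, names the AMP cache.
    return FRONT_HOST, path, {"Host": amp_cache_host}


def fronted_get(amp_cache_host, path, timeout=10):
    """Perform the fronted GET and return (status, body). Needs network."""
    connect_host, path, headers = fronted_request_parts(amp_cache_host, path)
    conn = http.client.HTTPSConnection(connect_host, timeout=timeout)
    try:
        # http.client omits its automatic Host header when one is supplied.
        conn.request("GET", path, headers=headers)
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()
```

A call such as `fronted_get("www-bamsoftware-com.cdn.ampproject.org", "/c/s/www.bamsoftware.com/amptest")` would then be the fronted equivalent of the direct request shown earlier.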
Here is how these all link together. You start with the original non-AMP page:
https://www.reddit.com/r/OutOfTheLoop/comments/56euau/whats_with_google_amp_quite_annoyingly_being_used/
In the source code, there is a `<link rel="amphtml" />` that points to the
AMP version:
https://amp.reddit.com/r/OutOfTheLoop/comments/56euau/whats_with_google_amp_quite_annoyingly_being_used/
The AMP URL (or any URL) can be mechanically converted to an AMP Cache
URL:
https://amp-reddit-com.cdn.ampproject.org/c/s/amp.reddit.com/r/OutOfTheLoop/comments/56euau/whats_with_google_amp_quite_annoyingly_being_used/
In some contexts, an AMP page may instead appear under an AMP Viewer URL
(which adds a header to the page):
https://www.google.com/amp/s/amp.reddit.com/r/funny/comments/8gpwtd/shut_up_and_take_my_money/
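The mechanical conversion from an origin URL to an AMP Cache URL can be sketched as follows. This is a simplified reading of the URL-format guides linked above: existing hyphens in the hostname are doubled, then dots become hyphens. It deliberately ignores internationalized domain names, ports, and any special handling of hostnames that are too long for a DNS label. The function name `amp_cache_url` is made up for illustration.

```python
from urllib.parse import urlsplit


def amp_cache_url(url, kind="c"):
    """Convert an http(s) URL into a *.cdn.ampproject.org AMP Cache URL.

    Simplified sketch: no IDN, port, or long-hostname handling.
    """
    parts = urlsplit(url)
    host = parts.hostname
    if parts.scheme not in ("http", "https") or not host:
        raise ValueError("expected an absolute http(s) URL: %r" % url)
    # Escape existing hyphens first, then turn dots into hyphens.
    subdomain = host.replace("-", "--").replace(".", "-")
    # "/s/" after the type letter tells the cache to fetch the origin over TLS.
    prefix = "/" + kind + ("/s/" if parts.scheme == "https" else "/")
    out = "https://" + subdomain + ".cdn.ampproject.org" + prefix + host + parts.path
    if parts.query:
        out += "?" + parts.query
    return out
```

Applied to the amp.reddit.com URL above, this reproduces the `amp-reddit-com.cdn.ampproject.org/c/s/...` form.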
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/25985#comment:7>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online