How to Replace Google’s AMP Without Slowing It Down
IFrames cause all of AMP’s problems, but they provide unbeatable performance. Fixing this is hard, but possible.
Dan Fabulich is a Principal Engineer at Redfin. (We’re hiring!)
Discuss on Hacker News
Discuss on Reddit
A lot of people hate AMP, and now that Google has announced AMP for email, the AMP haters are out in force again, and understandably so.
I’m aware of roughly two categories of complaints about AMP on the web:
1) The URL of AMP pages is wrong. It points to google.com instead of the original web site. This is bad, because it puts Google in control of your website. (In a more extreme form: AMP is Google’s strategy to take over the web, an attack on the open internet, etc.)
2) AMP UI sucks, especially on iOS. John Gruber’s comments summarize the main issues:
But other than loading fast, AMP sucks. It implements its own scrolling behavior on iOS, which feels unnatural, and even worse, it breaks the decade-old system-wide iOS behavior of being able to tap the status bar to scroll to the top of any scrollable view. AMP also completely breaks Safari’s ability to search for text on a page (via the “Find on Page” action in the sharing sheet).
[…]
And I forgot to mention back in May that Mobile Safari doesn’t automatically show/hide its browser chrome as you scroll, like it does for any normal web page. AMP pages are also incompatible with Safari Reader mode, making them harder to read for some people, and impossible to read for others.
Even AMP opponents agree that AMP has one noticeable advantage: opening an AMP page from a Google search results page is fast, as fast or faster than the fastest pages on the web.
AMP pages load on Google.com faster than pages with no JavaScript. That’s pretty surprising, especially when you consider that AMP pages use a lot of JavaScript — hundreds of kilobytes of compressed JavaScript, which is normally slower than a snail on a turtle eating molasses in January. How is this possible?
AMP Loads Fast Because of Preloading and Prerendering
Google’s stated goal for AMP is to preload and prerender the web pages in search results. That way, when you click on a link, the page and its images, JavaScript, and CSS are already downloaded (preloading), and the page has finished painting and executing JavaScript (prerendering). Displaying a prerendered page is as fast as switching tabs in your browser.
The performance benefit of prerendering is enormous, so enormous that prerendered pages can load hundreds of KB of JavaScript and still display “instantly” by the time the user taps on a link.
A common misconception is that anyone’s pages could be as fast or faster than AMP by following web performance best practices and removing JavaScript (preferably all JavaScript). Gruber again:
Yes, AMP pages load fast, but you don’t need AMP for fast-loading web pages. If you are a publisher and your web pages don’t load fast, the sane solution is to fix your [fricking] website so that pages load fast, not to throw your hands up in the air and implement AMP.
But even an optimized [everloving] static site can’t outperform prerendering. As a result, calling for Google to “just shut AMP down” would mean losing AMP’s performance advantages.
A Naive Approach to Prerendering Would Violate Your Privacy
The simplest approach that could possibly work would be to download all of the web pages from all 10 Google search results just as soon as the results page loads. That would work, and it would be fast enough, but there’s a big privacy problem with that approach.
Each time your browser visits a web site, that site gets to try to “fingerprint” you, to identify who you are and recognize you for next time.
The easy way to do this is with cookies. A web site can assign your browser a random ID number, and your browser will use that ID number in all subsequent visits to that web site. But cookies are just one way to fingerprint you. There are dozens, if not hundreds, of tricky ways to recognize a browser, including your IP address, your browser’s cache, the exact list of fonts your computer has installed, and more.
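To make the cookie-free kind of fingerprinting concrete, here is a minimal sketch. The function name, the choice of signals, and the hashing scheme are illustrative assumptions, not a description of how any real tracker works; the point is only that a few stable signals can be combined into an ID without ever setting a cookie.

```python
import hashlib
from typing import Sequence


def fingerprint(ip_address: str, user_agent: str, fonts: Sequence[str]) -> str:
    """Combine a few browser signals into one stable identifier.

    Hypothetical helper: real trackers draw on many more signals.
    """
    # Sort the fonts so the same set always produces the same ID.
    signals = [ip_address, user_agent, ",".join(sorted(fonts))]
    material = "\n".join(signals).encode("utf-8")
    return hashlib.sha256(material).hexdigest()[:16]


# Two visits from the same browser, with no cookie involved.
first_visit = fingerprint("203.0.113.7", "ExampleBrowser/1.0", ["Arial", "Helvetica"])
second_visit = fingerprint("203.0.113.7", "ExampleBrowser/1.0", ["Helvetica", "Arial"])
print(first_visit == second_visit)  # prints True
```

Any site that can compute the same ID can recognize the browser on its next visit, which is why simply clearing cookies is not enough to avoid being tracked.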
When web sites fingerprint you, they can collude with other web sites (“tracking providers”) to build a record of what you do online.
Sites can only fingerprint you if you visit them, so you have at least some control over who gets your fingerprints, by deciding which sites to visit.
If Google prerendered all sites in search results, your browser would automatically visit 10 Google-selected sites, some of which might be sites that you’d never want to visit. All of them would be able to track you.
And that’s not all! When you visit a site from search results, it’s often easy to guess what you were searching for. For example, if you visit a top-ranked page about coping with AIDS, you probably searched for something like “coping with AIDS”. It’s bad enough that the site you visit gets to know that, but it would be even worse for all 10 sites in your search results to know that; at least one of them could share that information with a tracking provider, and then every site on the internet would know that you or someone you care about is probably coping with AIDS.
There’s no way to work around this privacy problem while allowing your browser to visit 10 random sites. The only way to prerender while preserving user privacy is for Google to visit those sites on your behalf, and show you the prerendered results.
In AMP, Google Acts As Your Proxy, Which Causes All the Problems
If you want to show someone else’s website on your website, the only way to do it is to use an <iframe>, putting their site in a little rectangle on your page. That <iframe> is the crux of all AMP’s usability problems.
IFrames Don’t Scroll Correctly
First, the <iframe> messes up scrolling. The browser can’t know whether you’re scrolling the page or the little rectangle. When you’re at the “top” of the page, is that the top of the page, or the top of the rectangle?
In Mobile Safari, the browser chrome doesn’t dismiss when you scroll AMP pages because you’re scrolling the <iframe> rectangle, not the page itself. Reader mode doesn’t work because it only shows the text in the outer page, not the text in the rectangle.
The URL Bar Shows the Proxy Site’s URL, Not the Original URL
To prevent sites from impersonating one another, browsers always show the domain name of the site you’re visiting in the URL bar. Even when Google has the original site’s permission to proxy AMP pages, the browser has no way of knowing that, so it shows the URL as google.com instead of nytimes.com.
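As a rough illustration, the host a browser displays comes from the URL it actually loaded, not from whatever publisher content is embedded in the page. The sketch below assumes a Google AMP viewer URL that embeds the publisher's address in its path; both URLs and the helper name are made-up examples.

```python
from urllib.parse import urlsplit


def displayed_host(url: str) -> str:
    """Return the host a URL bar would show for this address."""
    return urlsplit(url).hostname or ""


# Hypothetical example URLs for the same article.
original_url = "https://www.nytimes.com/2018/02/14/example-story.html"
amp_viewer_url = (
    "https://www.google.com/amp/s/www.nytimes.com/2018/02/14/example-story.amp.html"
)

print(displayed_host(original_url))    # prints www.nytimes.com
print(displayed_host(amp_viewer_url))  # prints www.google.com
```

The publisher's name is still visible in the path of the second URL, but the browser treats a path as untrusted text; only the host identifies who served the page.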
Aside: Google Is Bad at Talking about AMP’s Very Real Problems
Google has really struggled to speak clearly about AMP, and has especially struggled to admit clearly what AMP’s problems are and how it intends to fix them.
The most talked about problem with AMP is its URLs: Google AMP URLs point to google.com, and not the original publisher’s website. People argue that this represents Google “taking over the web,” directly attacking the open web.
Serious charges! Paul Bakaus on the AMP team responds with baffling jargon.
The Basics: Analytics attribution and link sharing
Even though the AMP Cache model doesn’t follow the origin model (serving your page from your own domain), we attribute all traffic to you, the publisher. Through the <amp-analytics> tag, AMP supports a large number of analytics providers (26 to date and growing!), to make sure you can measure your success and the traffic is correctly attributed.
When I ask users and developers about why they want to “click-through” to the canonical page from a cached AMP result, the answer is often about link sharing. And granted, it’s annoying to copy a google.com URL instead of the canonical URL. However, the issue isn’t as large of a problem as you’d think: Google amends its cached AMP search results with Schema.org and OpenGraph metadata, so posting the link to any platform that honors these should result in the canonical URL being shared. That being said, there are more opportunities to improve the sharing flow. In native web-views, one could share the canonical directly if the app supports it, and, based on users’ feedback, the team at Google is working on enabling easy access to the canonical URL on all its surfaces.
This explanation isn’t clear at all. The only part in clear language is dismissive: “However, the issue isn’t as large of a problem as you’d think.”
You have to know a lot about AMP in order to understand why he thinks that. (What’s “the origin model?” What’s <amp-analytics>? What does analytics have to do with this? What do you mean by “canonical”? What metadata is being set? Which platforms honor the metadata, and which ones don’t? What does “easy access to the canonical URL” mean, exactly?)
In brief, his answer is: AMP has an analytics feature that lets you count “your” visitors even though they’re not really visiting your own site. And some social sites/apps can translate Google AMP URLs back to their original “canonical” web site, so it shouldn’t be that big a deal. (Unfortunately, very few sites/apps actually do this translation. Facebook doesn’t.)
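The “translation” itself is mechanically simple. AMP pages carry a <link rel="canonical"> tag pointing at the original page, so a sharing app that wanted to could read it. Here is a minimal sketch using Python’s standard HTML parser; the class name, sample markup, and URL are invented for illustration, and a real app would also have to fetch the page first and might prefer the OpenGraph metadata instead.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> tag seen."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # rel may hold several space-separated tokens, or be missing.
        rel = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and self.canonical is None:
            self.canonical = attributes.get("href")


def canonical_url(html: str) -> Optional[str]:
    """Return the canonical URL declared in the page, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Invented markup standing in for a cached AMP page.
sample_page = """
<html amp>
  <head>
    <link rel="canonical" href="https://www.example.com/story.html">
  </head>
  <body>Story text</body>
</html>
"""
print(canonical_url(sample_page))  # prints https://www.example.com/story.html
```

The hard part is not the parsing but the adoption: every app that handles shared links has to bother doing this.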
I wish Google had made a clearer statement, one like this:
AMP has some real problems, but for some sites, the performance benefits outweigh the costs. On AMP pages, the URL isn’t right, and the scrolling feels unnatural. But our #1 goal for AMP is performance, and there’s no way for us to fix AMP’s major problems today without compromising on performance or on your privacy.
We’re working on web standards that will fix these issues, but new standards will take years to arrive, if they arrive at all. We’ll do everything we can in the short term to make AMP sites as pleasant to use as possible, and leave it up to publishers to decide whether AMP’s problems are worse than the performance benefits.
The Fix: Allow Any Site to Securely Proxy Other Sites
Today, if your website wants to save a cached copy of another website and show it to your users, you’d have to use an <iframe>, too, and all of your URLs would be wrong, too, and your scrolling would be messed up, too.
Google has a proposal that would allow any site to proxy any other site. It’s called the Web Packaging standard. (They’re talking about renaming it because it sounds too much like webpack.)
Under the Web Packaging standard, users could download a compressed file from Google.com that was cryptographically signed by NYTimes.com. When the browser opened that file, it could display the original NYTimes.com content and display nytimes.com in the URL bar. No <iframe> would be required, so scrolling would work normally.
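The idea that makes this safe is an ordinary public-key signature: the publisher signs the content together with its URL, and the browser checks that signature no matter who delivered the file. The sketch below shows only that idea, using textbook RSA with a deliberately tiny, unpadded toy key. It is not the Web Packaging format, and every name in it is hypothetical; real packages would rely on the publisher’s certificate and a vetted cryptography library.

```python
import hashlib

# Toy textbook RSA key built from two known Mersenne primes.
# Far too small and unpadded for real use; illustration only.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def digest(url: str, body: bytes) -> int:
    """Hash the claimed URL together with the content, reduced mod N."""
    h = hashlib.sha256(url.encode("utf-8") + b"\n" + body).digest()
    return int.from_bytes(h, "big") % N


def sign(url: str, body: bytes) -> int:
    """Publisher side: sign with the private exponent."""
    return pow(digest(url, body), D, N)


def verify(url: str, body: bytes, signature: int) -> bool:
    """Browser side: check using only the public key (N, E)."""
    return pow(signature, E, N) == digest(url, body)


# The publisher signs once; anyone (a search engine, say) may then serve the copy.
url = "https://www.nytimes.com/example-story"
body = b"<p>Original story text</p>"
signature = sign(url, body)

print(verify(url, body, signature))                      # prints True
print(verify(url, b"<p>Altered text</p>", signature))    # prints False
print(verify("https://evil.example/", body, signature))  # prints False
```

Because the URL is part of what gets signed, a distributor can neither alter the content nor pass it off under a different address, which is what would let the browser show the publisher’s domain in the URL bar.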
Some commentators had assumed that Web Packaging would only fix the URL problem and not the <iframe> scrolling problem, but, in fact, the only way to fix the URL problem is to stop using <iframe>s, which would naturally fix the scrolling problem, too. This is Google’s fault again for poor messaging. Google’s jargon-filled announcement mentioned that it would fix the URL issue (“we’re a big fan of meaningful URLs”) but said nothing about the scrolling/usability issues.
If other browsers accepted the Web Packaging standard, the web might look rather different in the future, since basically any site that links to a lot of external sites (Reddit? Twitter? Facebook?) could start linking to prerendered Web Packages, rather than the original site. Those sites would appear to just load faster. Web-Packaged pages could one day eliminate the Reddit “hug of death,” where Reddit’s overenthusiastic visitors overwhelm sites hosting original content.
Despite cries that Google is trying to subvert the open web, the result could be a more open web, a web open to copying, sharing, and archiving web sites.
My suggestion to the AMP team: when Web Packaging becomes generally available in mobile browsers, don’t call the new thing “AMP.” The AMP brand is toxic now; just look at how people responded to “AMP for Email.” A proxy system built on Web Packaging will be so different from AMP v1 that it’ll be worthwhile to call it something else.
May I humbly suggest a new name? Secure Nimble Accelerated Packages. Oh, yes. SNAP.
P.S. Redfin is hiring.