Cross-site leaks

Cross-site leaks are a diverse form of attack, and there is no consistent classification of such attacks. Multiple sources classify cross-site leaks by the technique used to leak information. Among the well-known cross-site leaks are timing attacks, which depend on timing events within the web browser. Error events constitute another category, using the presence or absence of events to disclose data. Additionally, [[Cache timing attack|cache-timing attacks]] rely on the web cache to unveil information. Since 2023, newer attacks that use operating system and web browser limits to leak information have also been found.
 
Before 2017, defending against cross-site leaks was considered to be difficult. This was because many of the information leakage issues exploited by cross-site leak attacks were inherent to the way websites worked. Most defences against this class of attacks have been introduced after 2017 in the form of extensions to the [[HTTP|hypertext transfer protocol]] (HTTP). These extensions allow websites to instruct the browser to disallow or annotate certain kinds of [[State (computer science)|stateful]] requests coming from other websites. One of the most successful approaches browsers have implemented is [[SameSite cookie|SameSite]] cookies. SameSite cookies allow websites to set a directive that prevents other websites from accessing and sending sensitive cookies. Another defence involves using [[List of HTTP header fields|HTTP headers]] to restrict which websites can embed a particular site. Cache partitioning also serves as a defence against cross-site leaks, preventing other websites from using the web cache to exfiltrate data.
 
== Background ==
{{Further information|Same-origin policy|Cross-origin resource sharing}}
 
[[Web application|Web applications]] (web apps) have two primary components: a web browser and one or more [[Web server|web servers]]. The browser typically interacts with the servers via [[HTTP|hypertext transfer protocol]] (HTTP) and [[WebSocket]] connections to deliver a web app.{{refn|While there are other possible ways for interactions between web browsers and web servers to occur (such as the [[WebRTC|WebRTC protocol]]), in the context of cross-site leaks, only the HTTP interactions and WebSocket connections are considered important.{{sfn|Knittel|Mainka|Niemietz|Noß|2021|pp=1773,1776}} The rest of the article will assume the HTTP interactions and WebSocket connections are the only two ways for web browsers to interact with web servers.|group=note}} To make the web app interactive, the browser also renders [[HTML]] and [[CSS]], and executes [[JavaScript]] code provided by the web app. These elements allow the web app to react to user inputs and run client-side logic.<ref>{{Cite web |date=2023-07-24 |title=How the web works – Learn web development {{!}} MDN |url=https://developer.mozilla.org/en-US/docs/Learn/Getting_started_with_the_web/How_the_Web_works |url-status=live |archive-url=https://web.archive.org/web/20230924191546/https://developer.mozilla.org/en-US/docs/Learn/Getting_started_with_the_web/How_the_Web_works |archive-date=2023-09-24 |access-date=2023-10-01 |website=[[MDN Web Docs]] |language=en-US}}</ref> Often, users interact with the web app over long periods of time, making multiple requests to the server. To keep track of such requests, web apps often use a persistent identifier tied to a specific user through their current session or user account.<ref>{{Cite web |last1=Wagner |first1=David |last2=Weaver |first2=Nicholas |last3=Kao |first3=Peyrin |last4=Shakir |first4=Fuzail |last5=Law |first5=Andrew |last6=Ngai |first6=Nicholas |title=Cookies and Session Management |url=https://textbook.cs161.org/web/cookies.html |access-date=2024-03-24 |website=[[UC Berkeley]] CS-161 Computer Security Textbook |language=en-US}}</ref> This identifier can include details like age or access level, which reflect the user's previous interactions with the web app. If revealed to other websites, these identifiable attributes might [[Data re-identification|deanonymize]] the user.{{sfn|Sudhodanan|Khodayari|Caballero|2020|pp=2-3}}
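
A minimal sketch of how such a persistent identifier typically behaves is shown below; the cookie value and the <code>/inbox</code> endpoint are hypothetical and only illustrate the general mechanism.
<syntaxhighlight lang="javascript">
// After a user logs in, the server usually sets a session cookie, conceptually:
//   Set-Cookie: session=abc123; Secure; HttpOnly; SameSite=Lax
// The browser then attaches that cookie to later same-site requests automatically,
// letting the server associate each request with the user's account and attributes.
fetch("/inbox", { credentials: "same-origin" })
  .then((response) => response.text())
  .then((html) => console.log(html.length, "bytes of user-specific content"));
</syntaxhighlight>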
 
Ideally, each web app should operate independently without interfering with others. However, due to various design choices made during the early years of the web, web apps can regularly interact with each other.{{sfn|Zalewski|2011|p=15}} To prevent the abuse of this behavior, web browsers enforce a set of rules called the [[same-origin policy]] that limits direct interactions between web applications from different sources.{{sfn|Schwenk|Niemietz|Mainka|2017|p=713}}{{sfn|Zalewski|2011|p=16}} Despite these restrictions, web apps often need to load content from external sources, such as instructions for displaying elements on a page, design layouts, and videos or images. These types of interactions, called cross-origin requests, are exceptions to the same-origin policy.{{sfn|Somé|2018|pp=13-14}} They are governed by a set of strict rules known as the [[cross-origin resource sharing]] (CORS) framework. CORS ensures that such interactions occur under controlled conditions by preventing unauthorized access to data that a web app is not allowed to see. This is achieved by requiring explicit permission before other websites can access the contents of these requests.<ref>{{Cite web |date=2023-12-20 |title=Same-origin policy - Security on the web {{!}} MDN |url=https://developer.mozilla.org/en-US/docs/Web/Security/Same-origin_policy |access-date=2024-03-24 |website=[[MDN Web Docs]] |language=en-US}}</ref>
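
The sketch below shows how these rules appear to client-side code; the two site names are hypothetical. A cross-origin <code>fetch</code> is only readable if the responding server explicitly opts in with CORS response headers such as <code>Access-Control-Allow-Origin</code>.
<syntaxhighlight lang="javascript">
// Code running on https://site-a.example requesting a resource from https://site-b.example.
// The browser performs the request, but JavaScript may read the response body only if
// site-b.example replies with permissive CORS headers (e.g. Access-Control-Allow-Origin).
fetch("https://site-b.example/api/profile", { credentials: "include" })
  .then((response) => response.json())
  .then((data) => console.log("readable because CORS allowed it:", data))
  .catch(() => console.log("blocked: the same-origin policy hid the response"));
</syntaxhighlight>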
 
Cross-site leaks allow attackers to circumvent the restrictions imposed by the same-origin policy and the CORS framework. They leverage information-leakage issues ([[side channels]]) that have historically been present in browsers. Using these side channels, an attacker can execute code that infers details about data that the same-origin policy would have shielded.{{sfn|Knittel|Mainka|Niemietz|Noß|2021|p=1774}} This data can then be used to reveal information about a user's previous interactions with a web app.{{sfn|Van Goethem|Franken|Sanchez-Rola|Dworken|2021|p=1}}
 
== Mechanism ==
<gallery>
Cross-site leak attacks part-2-phish.svg|An attacker identifies a vulnerable [[URL]] and [[Phishing|phishes]] the user to their website using an email. When the user goes to the attacker's website, the attacker can make malicious requests to the web server using the vulnerable URL.
Cross-site leak attacks part-3-exfil.svg|The attacker is prevented from reading the web server's response. However, other factors like the response time or size can be measured by the attacker, leaking information about the response – a [[side-channel attack]].
</gallery>To carry out a cross-site leak attack, an attacker must first study how a website interacts with users. They need to identify a specific [[URL]] that produces different [[HTTP|Hyper Text Transfer Protocol]] (HTTP) responses based on the user's past actions on the site.{{sfn|Rautenstrauch|Pellegrino|Stock|2023|p=2747}}{{sfn|Van Goethem|Franken|Sanchez-Rola|Dworken|2022|p=787}} For instance, if the attacker is trying to attack [[Gmail]], they could try to find a search URL that returns a different HTTP response based on how many search results are found for a specific search term in a user's emails.{{sfn|Gelernter|Herzberg|2015|pp=1399–1402}} Once an attacker finds a specific URL, they can then host a website and [[Phishing|phish]] or otherwise lure unsuspecting users to the website. Once the victim is on the attacker's website, the attacker can use various embedding techniques to initiate cross-origin HTTP requests to the URL identified by the attacker.{{sfn|Sudhodanan|Khodayari|Caballero|2020|p=1}} However, since the attacker is on a different website, the [[same-origin policy]] imposed by the web browser will prevent the attacker from directly reading any part of the response sent by the vulnerable website.{{refn|group=note|This includes metadata associated with the response like status codes and HTTP headers.{{sfn|Van Goethem|Vanhoef|Piessens|Joosen|2016|p=448}}}}{{sfn|Van Goethem|Vanhoef|Piessens|Joosen|2016|p=448}}
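
The fragment below sketches one common embedding technique, assuming a hypothetical leaky search URL; it only initiates the cross-origin request, and the attacker still cannot read the response directly.
<syntaxhighlight lang="javascript">
// On the attacker's page: embed the vulnerable URL so that the browser sends a request
// carrying the victim's cookies. The response itself stays off-limits to this script;
// only coarse side effects (load/error events, timing, caching) are observable.
const frame = document.createElement("iframe");
frame.src = "https://mail.example/search?q=project+x"; // hypothetical vulnerable URL
frame.style.display = "none";                          // keep the probe invisible to the victim
document.body.appendChild(frame);
</syntaxhighlight>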
 
To circumvent this security barrier, the attacker can use browser-leak methods to distinguish subtle differences between responses. Browser-leak methods are [[JavaScript]], [[CSS]] or [[HTML]] snippets that leverage long-standing [[information leakage]] issues ([[Side-channel attack|side channels]]) in the web browser to reveal specific characteristics about an HTTP response.{{sfn|Rautenstrauch|Pellegrino|Stock|2023|p=2747}}{{sfn|Van Goethem|Franken|Sanchez-Rola|Dworken|2022|p=787}} In the case of Gmail, the attacker could use JavaScript to time how long the browser took to [[Parsing|parse]] the HTTP response returned by the search result. If the time taken to parse the response returned by the endpoint was low, the attacker could infer that there were no search results for their query; if the site took longer, the attacker could infer that multiple search results were returned.{{sfn|Gelernter|Herzberg|2015|pp=1399–1402}} The attacker can subsequently use the information gained through these leaks to exfiltrate sensitive information, which can be used to track and [[Data re-identification|deanonymize]] the victim.{{sfn|Sudhodanan|Khodayari|Caballero|2020|p=1}} In the case of Gmail, the attacker could make a request to the search endpoint with a query and measure how long the response took to be processed, revealing whether or not the user had any emails containing a specific query string.{{refn|group=note|An example of such a query could be the name of a well-known bank, or the contact information of a person or organization that the user is expected to have interacted with.{{sfn|Gelernter|Herzberg|2015|p=1400}}}} If a response takes very little time to be processed, the attacker can assume that no search results were returned; conversely, if a response takes a long time to be processed, the attacker can infer that many search results were returned. By making multiple requests, an attacker could gain significant insight into the current state of the victim application, potentially revealing a user's private information and helping launch sophisticated spamming and phishing attacks.{{sfn|Gelernter|Herzberg|2015|p=1400}}
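
A browser-leak method of this kind might look like the following sketch, which assumes a hypothetical leaky endpoint and an illustrative timing threshold rather than describing a real attack on Gmail.
<syntaxhighlight lang="javascript">
// Time how long a credentialed cross-origin request takes to complete. "no-cors" mode
// keeps the response opaque (unreadable), but the elapsed time still leaks information:
// large responses (many search results) generally take longer than empty ones.
async function probe(query) {
  const url = "https://mail.example/search?q=" + encodeURIComponent(query);
  const start = performance.now();
  await fetch(url, { mode: "no-cors", credentials: "include" });
  return performance.now() - start;
}

probe("Example Bank").then((elapsed) => {
  // The 150 ms threshold is purely illustrative; a real attack calibrates it per target.
  console.log(elapsed > 150 ? "user likely has matching emails" : "probably no matching emails");
});
</syntaxhighlight>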
 
== History ==
Cross-site leaks have been known about since 2000;{{sfn|Rautenstrauch|Pellegrino|Stock|2023|p=2754}} research papers dating from that year from [[Purdue University]] describe a theoretical attack that uses the HTTP cache to compromise the privacy of a user's browsing habits.{{sfn|Felten|Schneider|2000|pp=25,26,27,31}} In 2007, Andrew Bortz and [[Dan Boneh]] from [[Stanford University]] published a white paper detailing an attack that made use of timing information to determine the size of cross-site responses.{{sfn|Bortz|Boneh|2007|pp=623–625}} In 2015, researchers from [[Bar-Ilan University]] described a cross-site search attack that used similar leaking methods. The attack employed a technique in which the input was crafted to grow the size of the responses, leading to a proportional growth in the time taken to generate the responses, thus increasing the attack's accuracy.{{sfn|Gelernter|Herzberg|2015|pp=1394–1397}}
 
Independent security researchers have published blog posts describing cross-site leak attacks against real-world applications. In 2009, Chris Evans described an attack against [[Yahoo! Mail]] via which a malicious site could search a user's inbox for sensitive information.<ref name="PortSwigger">{{Cite web |last=Walker |first=James |date=2019-03-21 |title=New XS-Leak techniques reveal fresh ways to expose user information |url=https://portswigger.net/daily-swig/new-xs-leak-techniques-reveal-fresh-ways-to-expose-user-information |url-status=live |archive-url=https://web.archive.org/web/20231029162650/https://portswigger.net/daily-swig/new-xs-leak-techniques-reveal-fresh-ways-to-expose-user-information |archive-date=2023-10-29 |access-date=2023-10-29 |website=The Daily Swig |language=en}}</ref> In 2018, Luan Herrera found a cross-site leak [[Vulnerability (computing)|vulnerability]] in Google's Monorail bug tracker, which is actively used by [[Open source|open-source]] projects like [[Chromium (web browser)|Chromium]], Angle, and [[Skia Graphics Engine]]. This exploit allowed Herrera to exfiltrate data about sensitive security issues by abusing the search endpoint of the bug tracker.{{sfn|Van Goethem|Franken|Sanchez-Rola|Dworken|2021|pp=1,6}}<ref>{{Cite web |last=Herrera |first=Luan |date=2019-03-31 |title=XS-Searching Google's bug tracker to find out vulnerable source code |url=https://medium.com/@luanherrera/xs-searching-googles-bug-tracker-to-find-out-vulnerable-source-code-50d8135b7549 |url-status=live |archive-url=https://web.archive.org/web/20231029162653/https://medium.com/@luanherrera/xs-searching-googles-bug-tracker-to-find-out-vulnerable-source-code-50d8135b7549 |archive-date=2023-10-29 |access-date=2023-10-29 |website=Medium |language=en}}</ref> In 2019, Terjanq, a Polish security researcher, published a blog post describing a cross-site search attack that allowed them to exfiltrate sensitive user information across high-profile Google products.{{sfn|Knittel|Mainka|Niemietz|Noß|2021|p=1772}}<ref name=":3">{{Cite web |author=Terjanq |title=Mass XS-Search using Cache Attack – HackMD |url=https://terjanq.github.io/Bug-Bounty/Google/cache-attack-06jd2d2mz2r0/index.html |url-status=live |archive-url=https://web.archive.org/web/20231029162649/https://terjanq.github.io/Bug-Bounty/Google/cache-attack-06jd2d2mz2r0/index.html |archive-date=2023-10-29 |access-date=2023-10-29 |publisher=[[GitHub]]}}</ref>
 
As part of its increased focus on dealing with security issues that depend on misusing long-standing [[Web platform|web-platform]] features, Google launched XSLeaks Wiki in 2020. The initiative aimed to create an open-knowledge database, analysing and compiling information about cross-site leak attacks.<ref name="PortSwigger" />{{sfn|Van Goethem|Franken|Sanchez-Rola|Dworken|2021|p=10}}{{sfn|Rautenstrauch|Pellegrino|Stock|2023|p=2756}}
 
Since 2020, there has been some interest among the academic security community in standardizing the classification of these attacks. In 2020, Sudhodanan et al. were among the first to systematically summarize previous work on cross-site leaks, and developed a tool called BASTA-COSI that could be used to detect leaky URLs.{{sfn|Rautenstrauch|Pellegrino|Stock|2023|p=2756}}{{sfn|Sudhodanan|Khodayari|Caballero|2020|p=2}} In 2021, Knittel et al. proposed a new formal model to evaluate and characterize cross-site leaks, allowing the researchers to find new leaks affecting several browsers.{{sfn|Rautenstrauch|Pellegrino|Stock|2023|p=2756}}{{sfn|Knittel|Mainka|Niemietz|Noß|2021|p=1773}} In 2022, Van Goethem et al. evaluated currently available defences against these attacks and extended the existing model to consider the state of browser components.{{sfn|Rautenstrauch|Pellegrino|Stock|2023|p=2756}}{{sfn|Van Goethem|Franken|Sanchez-Rola|Dworken|2022|p=787}} In 2023, a paper published by Rautenstrauch et al. systemizing previous research into cross-site leaks was awarded the Distinguished Paper Award at the [[IEEE Symposium on Security and Privacy]].<ref>{{Cite web |title=IEEE Symposium on Security and Privacy 2023 |url=https://sp2023.ieee-security.org/program-awards.html |url-status=live |archive-url=https://web.archive.org/web/20231029162649/https://sp2023.ieee-security.org/program-awards.html |archive-date=2023-10-29 |access-date=2023-10-29 |website=sp2023.ieee-security.org}}</ref>
== Threat model ==
The [[threat model]] of a cross-site leak relies on the attacker being able to direct the victim to a malicious website that is at least partially under the attacker's control. The attacker can accomplish this by compromising a web page, by phishing the user to a web page and loading arbitrary code, or by using a malicious advertisement on an otherwise-safe web page.{{sfn|Van Goethem|Franken|Sanchez-Rola|Dworken|2022|p=786}}{{sfn|Sudhodanan|Khodayari|Caballero|2020|p=11}}

While initially used only to differentiate between the time it took for an HTTP request to resolve a response,{{sfn|Bortz|Boneh|2007|pp=623–625}} research performed after 2007 has demonstrated the use of this leak technique to detect other differences across web-app states. In 2017, Vila et al. showed timing attacks could infer cross-origin execution times across embedded contexts. This was made possible by a lack of [[site isolation]] features in contemporaneous browsers, which allowed an attacking website to slow down and amplify timing differences caused by differences in the amount of JavaScript being executed when events were sent to a victim web app.{{sfn|Vila|Köpf|2017|pp=851–853}}{{sfn|Van Goethem|Franken|Sanchez-Rola|Dworken|2022|p=796}}
 
In 2021, Knittel et al. showed the Performance API{{refn|group=note|The Performance API is a set of JavaScript functions that allow websites to retrieve various [[Web performance#Metrics|metrics associated with web performance]].<ref>{{Cite web |date=2023-02-19 |title=Performance - Web APIs {{!}} MDN |url=https://developer.mozilla.org/en-US/docs/Web/API/Performance |access-date=2024-03-11 |website=[[MDN Web Docs]] |language=en-US}}</ref>}} could leak the presence or absence of redirects in responses. This was possible due to a bug in the Performance API that allowed the amount of time shown to the user to be negative when a redirect occurred. [[Google Chrome]] subsequently fixed this bug.{{sfn|Knittel|Mainka|Niemietz|Noß|2021|p=1778}} In 2023, Snyder et al. showed timing attacks could be used to perform pool-party attacks, in which websites block shared resources by exhausting their global quota. By making the victim web app execute JavaScript that used these shared resources and then timing how long these executions took, the researchers were able to reveal information about the state of a web app.{{sfn|Snyder|Karami|Edelstein|Livshits|2023|p=7095}}
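
As an illustration of the interface involved, the sketch below reads the resource-timing entry that the Performance API records for a cross-origin request; the URL is hypothetical, and the negative-duration behaviour exploited by Knittel et al. has since been fixed in Chrome.
<syntaxhighlight lang="javascript">
// Load a cross-origin resource and then look up its resource-timing entry.
// Cross-origin entries expose only limited fields (such as duration) unless the
// server opts in with a Timing-Allow-Origin header.
const img = new Image();
img.src = "https://mail.example/avatar?user=victim"; // hypothetical cross-origin URL
img.onload = img.onerror = () => {
  const [entry] = performance.getEntriesByName(img.src, "resource");
  if (entry) {
    console.log("duration recorded by the Performance API:", entry.duration, "ms");
  }
};
</syntaxhighlight>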
 
=== Error events ===
== Defences ==
 
Despite cross-site leaks being known about since 2000, most defences were introduced only after 2017.{{sfn|Van Goethem|Franken|Sanchez-Rola|Dworken|2021|p=16}} Before the introduction of these defences, websites had two main ways of protecting themselves. The first was to ensure the same response was returned for all application states, thwarting the attacker's ability to differentiate requests; this was infeasible for any non-trivial website. The second was to create session-specific URLs that would not work outside a user's session; this limited link sharing and was impractical.{{sfn|Rautenstrauch|Pellegrino|Stock|2023|p=2754}}{{sfn|Zaheri|Curtmola|2021|p=160}}
 
Most modern defences are extensions to the HTTP protocol that either prevent state changes, make cross-origin requests [[Stateless protocol|stateless]], or completely isolate shared resources across multiple origins.{{sfn|Knittel|Mainka|Niemietz|Noß|2021|p=1780}}
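
The sketch below shows, in framework-free Node.js with illustrative header values, how a website might deploy some of these HTTP extensions; it is not a complete configuration.
<syntaxhighlight lang="javascript">
// Illustrative Node.js server setting some of the defensive headers described above.
const http = require("http");

http.createServer((req, res) => {
  // SameSite stops the browser attaching this cookie to most cross-site requests,
  // making an attacker's embedded requests effectively stateless.
  res.setHeader("Set-Cookie", "session=abc123; Secure; HttpOnly; SameSite=Lax");
  // Controls which sites may embed this page in a frame.
  res.setHeader("X-Frame-Options", "SAMEORIGIN");
  // Asks the browser not to expose this response to cross-origin no-cors requests.
  res.setHeader("Cross-Origin-Resource-Policy", "same-origin");
  res.end("response body");
}).listen(8080);
</syntaxhighlight>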
* {{Cite book |last1=Van Goethem |first1=Tom |title=Proceedings of the 2022 ACM on Asia Conference on Computer and Communications Security |last2=Franken |first2=Gertjan |last3=Sanchez-Rola |first3=Iskander |last4=Dworken |first4=David |last5=Joosen |first5=Wouter |date=2022-05-30 |publisher=Association for Computing Machinery |isbn=978-1-4503-9140-5 |pages=784–798 |language=en |chapter=SoK: Exploring Current and Future Research Directions on XS-Leaks through an Extended Formal Model |doi=10.1145/3488932.3517416 |doi-access=free |s2cid=248990284}}{{Creative Commons text attribution notice|cc=by4|author(s)=Tom Van Goethem, Gertjan Franken, Iskander Sanchez-Rola, David Dworken and Wouter Joosen}}
* {{Cite book |last1=Van Goethem |first1=Tom |title=2023 IEEE Security and Privacy Workshops (SPW) |last2=Sanchez-Rola |first2=Iskander |last3=Joosen |first3=Wouter |date=2023 |publisher=IEEE |isbn=979-8-3503-1236-2 |pages=371–383 |language=en-US |chapter=Scripted Henchmen: Leveraging XS-Leaks for Cross-Site Vulnerability Detection |doi=10.1109/SPW59333.2023.00038 |access-date=2023-11-07 |chapter-url=https://ieeexplore.ieee.org/document/10188656 |s2cid=259267534 |s2cid-access=free}}
* {{Cite book |last1=Van Goethem |first1=Tom |last2=Vanhoef |first2=Mathy |last3=Piessens |first3=Frank |last4=Joosen |first4=Wouter |date=2016 |title=Request and Conquer: Exposing Cross-Origin Resource Size |url=https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/vangoethem |language=en |pages=447–462 |isbn=978-1-931971-32-4}}
* {{Cite book |last1=Van Goethem |first1=Tom |last2=Joosen |first2=Wouter |last3=Nikiforakis |first3=Nick |chapter=The Clock is Still Ticking: Timing Attacks in the Modern Web |date=2015-10-12 |title=Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security |chapter-url=https://doi.org/10.1145/2810103.2813632 |series=CCS '15 |publisher=Association for Computing Machinery |pages=1382–1393 |doi=10.1145/2810103.2813632 |isbn=978-1-4503-3832-5|s2cid=17705638|s2cid-access=free }}
* {{Cite journal |last1=Vila |first1=Pepe |last2=Köpf |first2=Boris |date=2017 |title=Loophole: Timing Attacks on Shared Event Loops in Chrome |url=https://www.usenix.org/conference/usenixsecurity17/technical-sessions/presentation/vila |journal=SEC'17: Proceedings of the 26th USENIX Conference on Security Symposium |language=en |pages=849–864 |arxiv=1702.06764 |isbn=978-1-931971-40-9}}
* {{Cite book |last1=Zaheri |first1=Mojtaba |title=Security and Privacy in Communication Networks |last2=Curtmola |first2=Reza |date=2021 |publisher=Springer International Publishing |isbn=978-3-030-90022-9 |editor-last=Garcia-Alfaro |editor-first=Joaquin |series=Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering |volume=399 |pages=143–163 |language=en |chapter=Leakuidator: Leaky Resource Attacks and Countermeasures |doi=10.1007/978-3-030-90022-9_8|doi-access=free |editor2-last=Li |editor2-first=Shujun |editor3-last=Poovendran |editor3-first=Radha |editor4-last=Debar |editor4-first=Hervé |editor5-last=Yung |editor5-first=Moti |s2cid=237476137}}
* {{Cite journal |last1=Zaheri |first1=Mojtaba |last2=Oren |first2=Yossi |last3=Curtmola |first3=Reza |date=2022 |title=Targeted Deanonymization via the Cache Side Channel: Attacks and Defenses |url=https://www.usenix.org/conference/usenixsecurity22/presentation/zaheri |language=en |pages=1505–1523 |journal=Proceedings of the 31st USENIX Conference on Security Symposium |series=SEC '22 |isbn=978-1-939133-31-1 |s2cid=251092191 |s2cid-access=free}}
* {{Cite book |last=Zalewski |first=Michal |url=https://books.google.com/books?id=NU3wOk2jzWsC&newbks=0&hl=en |title=The Tangled Web: A Guide to Securing Modern Web Applications |date=2011-11-15 |publisher=No Starch Press |isbn=978-1-59327-388-0 |language=en}}
{{refend}}