DevOps and Build Automation with Python
4th Course in Python Scripting for DevOps Specialization
Browser Automation
In this module, we look at how to utilize pyppeteer in Python
to automate browser interaction
2 © LearnQuest 2021
Learning Objectives
Browser Automation
Upon completion of this module, learners will be able to:
• Describe the use case for headless browsing
• Develop scripts that utilize headless browsing to
visit a webpage
• Develop scripts that utilize headless browsing to extract
elements from a web page
Lesson 1
Headless Browsing
In this lesson we extend our Python toolset to script web browser automation
Headless Browsing
• Headless browsers provide automated
control of a web page in an environment
similar to popular web browsers
• Puppeteer is a very popular JavaScript
headless browsing framework -
https://github.com/puppeteer/puppeteer
• Puppeteer automates a headless
Chrome or Chromium browser
• Pyppeteer is an unofficial port of Puppeteer to Python
Headless Browsing Use Cases
• Test automation in modern web applications
• Taking screenshots of web pages
• Running automated tests for JavaScript libraries
• Scraping web sites for data
• Automating interaction of web pages
Malicious Headless Browsing Use Cases
• Perform DDoS attacks on web sites
• Artificially inflate advertisement impressions
• Automate web sites in unintended ways (for example, credential stuffing)
Installing Pyppeteer
pip install pyppeteer
Lesson 1
Review
• Headless browsing can be used to test web applications for performance
• Headless browsing can be used to test web applications for errors
• Headless browsing can be used to take screenshots of webpages
Lesson 2
Writing Scripts to Visit a Web Page
In this lesson we look at how to develop scripts that utilize headless
browsing to visit a webpage
Coroutines
• Coroutines are computer program
components that generalize
subroutines for non-preemptive
multitasking, by allowing multiple
entry points for suspending and
resuming execution at certain
locations
• The asyncio module provides the event
loop that runs asynchronous coroutines
written with the async/await syntax
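The idea of suspending and resuming at certain locations can be sketched with two coroutines that take turns. This is a minimal illustration, not course material: the names worker, main, and log are made up for the example, and asyncio.sleep(0) stands in for real I/O as the suspension point.

```python
import asyncio


async def worker(name, steps, log):
    """Record each step, then hand control back to the event loop."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        # Suspension point: the coroutine pauses here and resumes later,
        # which gives the other coroutine a chance to run.
        await asyncio.sleep(0)


async def main():
    log = []
    # Run both workers concurrently on a single thread.
    await asyncio.gather(worker("a", 2, log), worker("b", 2, log))
    return log


result = asyncio.run(main())
print(result)
```

The printed log shows steps from the two workers interleaved, even though nothing runs in parallel: each worker only gives up control at its await.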
Async & Await
Async
• The syntax async def introduces either a native coroutine or an
asynchronous generator
Await
• The keyword await passes function control back to the event loop. (It
suspends the execution of the surrounding coroutine.)
• await f() suspends the surrounding coroutine until f() finishes, then
resumes it with the result
To call a coroutine function, you must await it to get its results.
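A small sketch of why the await is needed, assuming nothing beyond the standard library. The functions f and caller are illustrative names. Calling f() on its own only creates a coroutine object; the body runs, and the result becomes available, when that object is awaited.

```python
import asyncio
import inspect


async def f():
    await asyncio.sleep(0)  # give control back to the event loop once
    return 42


async def caller():
    pending = f()  # nothing has run yet; this is only a coroutine object
    is_coro = inspect.iscoroutine(pending)
    value = await pending  # caller() is suspended here until f() finishes
    return is_coro, value


is_coro, value = asyncio.run(caller())
print(is_coro, value)
```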
Example Headless call
import asyncio
from pyppeteer import launch

async def main():
    browser = await launch()                          # start a headless browser
    page = await browser.newPage()                    # open a new tab
    await page.goto('https://example.com')            # load the page
    await page.screenshot({'path': 'example.png'})    # save it as an image
    await browser.close()                             # shut the browser down

asyncio.get_event_loop().run_until_complete(main())
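One possible variation on the example above, offered as a sketch rather than the course's own code. It assumes Python 3.7 or later for asyncio.run; the names capture and take_screenshot and their parameters are hypothetical. The try/finally closes the browser even if loading the page fails.

```python
import asyncio


async def capture(url, path):
    """Load url in a headless browser, save a screenshot, return the title."""
    # Imported here so these definitions load on machines without pyppeteer;
    # a top-level import is the more usual style.
    from pyppeteer import launch

    browser = await launch()
    try:
        page = await browser.newPage()
        await page.goto(url)
        await page.screenshot({'path': path})
        return await page.title()
    finally:
        # Always release the browser process, even after an error.
        await browser.close()


def take_screenshot(url, path):
    """Synchronous wrapper (hypothetical helper) for use from plain scripts."""
    return asyncio.run(capture(url, path))
```

Calling take_screenshot('https://example.com', 'example.png') would run the whole sequence; it requires pyppeteer and the browser it downloads.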
Lesson 2
Review
• async def introduces a native coroutine
• await passes function control back to the event loop
• To call a coroutine function, you must await it
Lesson 3
Extracting HTML Elements
In this lesson we look at how we can develop scripts that utilize
headless browsing to extract elements from a web page
Page Class
• This class provides methods to interact with a single tab of Chrome
• goto() loads a URL
• screenshot() saves the page as an image
• content() returns the page's HTML as a string
• metrics() returns a dictionary of page metrics as key-value pairs
• emulate() sets the viewport and user agent
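The Page methods listed above can be combined roughly as follows. This is a sketch under assumptions: inspect_page and summarize are hypothetical helper names, and the viewport size and user agent string are arbitrary values chosen for illustration.

```python
import asyncio


def summarize(html, metrics):
    """Reduce the raw page data to a small report (illustrative helper)."""
    return {'html_length': len(html), 'metric_names': sorted(metrics)}


async def inspect_page(url):
    # Imported here so these definitions load on machines without pyppeteer.
    from pyppeteer import launch

    browser = await launch()
    try:
        page = await browser.newPage()
        # emulate() sets the viewport and user agent before navigation.
        await page.emulate({
            'viewport': {'width': 375, 'height': 667},
            'userAgent': 'ExampleAgent/1.0',  # made-up value
        })
        await page.goto(url)
        html = await page.content()      # full HTML as a string
        metrics = await page.metrics()   # dictionary of page metrics
        return summarize(html, metrics)
    finally:
        await browser.close()


def run(url):
    return asyncio.run(inspect_page(url))
```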
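Selecting DOM Elements
Select elements in order to get their inner text
• Page.querySelector() returns the first element matching a CSS selector
• Page.querySelectorAll() returns all elements matching a CSS selector
• Page.xpath() returns the elements matching an XPath expression
Example:
element = await page.querySelector('h1')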
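The selector methods return element handles rather than text, so reading the text takes one more step. A minimal sketch, assuming the target page has h1, p, and a elements; extract_text and run are hypothetical names, and the text is read by evaluating a small JavaScript function against each handle.

```python
import asyncio


async def extract_text(url):
    # Imported here so these definitions load on machines without pyppeteer.
    from pyppeteer import launch

    browser = await launch()
    try:
        page = await browser.newPage()
        await page.goto(url)

        # querySelector returns one element handle, or None if nothing matches.
        heading = await page.querySelector('h1')
        heading_text = None
        if heading is not None:
            heading_text = await page.evaluate('(el) => el.textContent', heading)

        # querySelectorAll returns a list of element handles.
        paragraphs = await page.querySelectorAll('p')
        paragraph_texts = [
            await page.evaluate('(el) => el.textContent', p) for p in paragraphs
        ]

        # xpath also returns a list of element handles.
        links = await page.xpath('//a')
        hrefs = [await page.evaluate('(el) => el.href', a) for a in links]

        return {'heading': heading_text, 'paragraphs': paragraph_texts, 'links': hrefs}
    finally:
        await browser.close()


def run(url):
    return asyncio.run(extract_text(url))
```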
Running JavaScript
• A JavaScript string can be either a function or an expression
• Pyppeteer tries to automatically detect whether the string is a function or an expression
• The force_expr=True option forces pyppeteer to treat the string as an expression
Example:
dimensions = await page.evaluate('''() => {
    return {
        width: document.documentElement.clientWidth,
        height: document.documentElement.clientHeight,
        deviceScaleFactor: window.devicePixelRatio,
    }
}''')
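The difference between the two string forms can be sketched as below. The function names evaluate_examples and run are hypothetical; the first call passes a bare expression with force_expr=True, and the second passes an arrow function together with arguments from Python.

```python
import asyncio


async def evaluate_examples(url):
    # Imported here so these definitions load on machines without pyppeteer.
    from pyppeteer import launch

    browser = await launch()
    try:
        page = await browser.newPage()
        await page.goto(url)

        # Expression form: force_expr=True skips the automatic detection.
        title = await page.evaluate('document.title', force_expr=True)

        # Function form: extra positional arguments are passed to the function.
        total = await page.evaluate('(a, b) => a + b', 2, 3)

        return {'title': title, 'total': total}
    finally:
        await browser.close()


def run(url):
    return asyncio.run(evaluate_examples(url))
```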
Lesson 3
Review
• XPath can be used to navigate through elements in a page
• evaluate() can execute JavaScript in the context of the page
• Arrow functions are a common way to write the JavaScript passed to evaluate()