Description
I have been using Playwright with the Scrapy web scraping framework through the scrapy-playwright plugin: https://github.com/scrapy-plugins/scrapy-playwright
Scrapy is designed to shut down cleanly on SIGINT, saving its crawl state so that the crawl can be resumed. When it is used with Playwright, the SIGINT immediately closes the browsers (with handle_sigint set to either False or True). As a result, the pages currently being processed crash and the shutdown fails without the state being saved. The ideal fix is to prevent the SIGINT from being passed to the browsers, so that scrapy-playwright can shut them down cleanly as active pages finish being processed.
I have successfully made this work on POSIX by monkey patching Playwright, based on an answer from Stack Overflow, to include a preexec_fn:
# Monkey patch Playwright
import asyncio
import os
import subprocess
import sys

from playwright._impl._transport import PipeTransport, _get_stderr_fileno


def new_pipe_transport_connect_preexec():
    # Don't forward signals: put the driver in its own process group (POSIX only).
    os.setpgrp()


async def new_pipe_transport_connect(self):
    self._stopped_future: asyncio.Future = asyncio.Future()
    # Hide the command-line window on Windows when using Pythonw.exe
    creationflags = 0
    if sys.platform == "win32" and sys.stdout is None:
        creationflags = subprocess.CREATE_NO_WINDOW

    try:
        # For pyinstaller
        env = os.environ.copy()
        if getattr(sys, "frozen", False):
            env.setdefault("PLAYWRIGHT_BROWSERS_PATH", "0")

        self._proc = await asyncio.create_subprocess_exec(
            str(self._driver_executable),
            "run-driver",
            stdin=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE,
            stderr=_get_stderr_fileno(),
            limit=32768,
            creationflags=creationflags,
            env=env,
            preexec_fn=new_pipe_transport_connect_preexec,  # My new line!!
        )
    except Exception as exc:
        self.on_error_future.set_exception(exc)
        raise exc

    self._output = self._proc.stdin


# Replace the original connect method with the patched version.
PipeTransport.connect = new_pipe_transport_connect
What would be ideal is either a flag that could be set to enable this (for both POSIX and Windows) or a way to pass a custom preexec_fn when starting a browser.
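To illustrate what such a flag might do under the hood, here is a minimal sketch using only the standard library. It is not Playwright API, and the helper name child_process_group is hypothetical. It relies on subprocess's start_new_session=True, which makes the child call setsid() on POSIX and so has a similar isolating effect to the os.setpgrp() preexec_fn above. The sketch only checks that the child ends up outside the parent's process group, which is what keeps a terminal Ctrl+C from reaching it.

```python
import os
import subprocess
import sys


def child_process_group(detach: bool) -> int:
    """Spawn a short-lived child and return the process group ID it reports.

    Hypothetical helper for illustration only (POSIX).
    """
    proc = subprocess.run(
        [sys.executable, "-c", "import os; print(os.getpgrp())"],
        capture_output=True,
        text=True,
        check=True,
        # When True, the child calls setsid() and leaves the parent's
        # session and process group, so a terminal SIGINT is not sent to it.
        start_new_session=detach,
    )
    return int(proc.stdout.strip())


if __name__ == "__main__":
    print("parent group:  ", os.getpgrp())
    print("inherited:     ", child_process_group(detach=False))
    print("new session:   ", child_process_group(detach=True))
```

On Windows, my assumption is that the closest counterpart would be passing subprocess.CREATE_NEW_PROCESS_GROUP in creationflags, but I have not tested that with the Playwright driver.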
See the related bug on scrapy-playwright: scrapy-plugins/scrapy-playwright#62