Skip to content

extraction-magnus-revert-with-main-with-extension #2464

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 40 commits into
base: main
Choose a base branch
from

Conversation

MagMueller
Copy link
Collaborator

@MagMueller MagMueller commented Jul 16, 2025

Auto-generated PR for branch: extraction-magnus-revert-with-main-with-extension


Summary by cubic

Rebuilt the DOM extraction and serialization system to use Chrome DevTools Protocol (CDP) for more accurate detection of interactive elements, improved accessibility data, and better performance.

  • New Features

    • Added enhanced DOM tree and snapshot extraction using CDP, including visibility, interactivity, and accessibility properties.
    • Introduced new serializers for LLM-friendly DOM representations and selector maps.
    • Enabled default browser extensions for ad blocking and cookie handling.
  • Refactors

    • Removed legacy DOM processing modules and replaced them with new CDP-based implementations.
    • Updated tests and examples to use the new DOM state and element types.

Copy link

delve-auditor bot commented Jul 16, 2025

No security or compliance issues detected. Reviewed everything up to 99f285e.

Security Overview
  • 🔎 Scanned files: 31 changed file(s)
Detected Code Changes

The diff is too large to display a summary of code changes.

Reply to this PR with @delve-auditor followed by a description of what change you want and we'll auto-submit a change to this PR to implement it.

Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cubic found 8 issues across 24 files. Review them in cubic.dev

React with 👍 or 👎 to teach cubic. Tag @cubic-dev-ai to give specific feedback.

import urllib.request

try:
with urllib.request.urlopen(url) as response:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

urllib.request.urlopen is invoked without a timeout, so a slow or unresponsive server can block the entire process indefinitely. Provide an explicit timeout to make the download operation fail fast.

Suggested change
with urllib.request.urlopen(url) as response:
with urllib.request.urlopen(url, timeout=30) as response:


# Get attributes hash
attributes_string = ''.join(f'{key}={value}' for key, value in self.attributes.items())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hash computation depends on the iteration order of a dict, but dict order is not considered in equality. Two nodes that are considered equal (same key-value pairs in different insertion order) can generate different hashes, breaking the requirement that equal objects have identical hash values.

Suggested change
attributes_string = ''.join(f'{key}={value}' for key, value in self.attributes.items())
attributes_string = ''.join(f'{key}={self.attributes[key]}' for key in sorted(self.attributes))


# Extract stacking contexts if available
if layout_idx < len(layout.get('stackingContexts', [])):
stacking_contexts = layout.get('stackingContexts', {}).get('index', [])[layout_idx]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stackingContexts is a RareBooleanData object; directly indexing into get('index', [])[layout_idx] risks IndexError and ignores the helper used elsewhere for rare boolean parsing.

]


def _parse_rare_boolean_data(rare_data: RareBooleanData, index: int) -> bool | None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation always returns a boolean, never None, so the return type annotation is misleading and could confuse type-checkers or readers.

Suggested change
def _parse_rare_boolean_data(rare_data: RareBooleanData, index: int) -> bool | None:
def _parse_rare_boolean_data(rare_data: RareBooleanData, index: int) -> bool:

# Show startup info
print('\n🌐 BROWSER-USE DOM EXTRACTION TESTER')
print(f'📊 {len(websites)} websites total: {len(sample_websites)} standard + {len(difficult_websites)} complex')
print('🔧 Controls: Type 1-15 to jump | Enter to re-run | "n" next | "q" quit')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The control hint hard-codes the range 1-15 even though the actual website list length is dynamic (currently 17). This can mislead users and cause unnecessary input errors.

Suggested change
print('🔧 Controls: Type 1-15 to jump | Enter to re-run | "n" next | "q" quit')
print(f'🔧 Controls: Type 1-{len(websites)} to jump | Enter to re-run | "n" next | "q" quit')


def __init__(self, page: 'Page', logger: logging.Logger | None = None):
Either browser or page must be provided.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring states that either browser or page must be provided, but the constructor requires both parameters. This mismatch can confuse users and maintainers.

# cache the session id for this playwright page
# self.playwright_page_to_session_id_store[page_guid] = target['targetId']

session = await cdp_client.send.Target.attachToTarget(params={'targetId': target['targetId'], 'flatten': True})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A new Target.attachToTarget is issued every time _get_current_page_session_id runs, but the returned session is never detached or reused. This can leak DevTools sessions and exhaust browser resources over time.

interactive_count,
total_nodes,
# processed_nodes,
print(f'⚠️ Viewport size detection failed: {e}')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Direct print statements inside library code bypass the project's logging system, making it harder to control verbosity and aggregate logs. Use the standard logger instead.

Copy link

github-actions bot commented Jul 16, 2025

Agent Task Evaluation Results: 2/3 (67%)

View detailed results
Task Result Reason
captcha_cloudflare ❌ Fail The agent reported that it attempted to solve the Cloudflare Turnstile captcha as instructed, but the captcha was not solved successfully. Consequently, the 'hostname' value under the success message was not found or extracted. Since the agent did not solve the captcha and did not extract the hostname 'example.com', the task was not completed successfully.
amazon_laptop ✅ Pass The agent successfully navigated to amazon.com, performed a search for 'laptop', and returned the name and details of the first laptop result. Therefore, it fulfilled all the criteria specified for the task.
browser_use_pip ✅ Pass The agent successfully provided the pip installation command 'pip install browser-use' as requested, meeting the success criteria.

Check the evaluate-tasks job for detailed task execution logs.

cursor[bot]

This comment was marked as outdated.

@MagMueller MagMueller marked this pull request as draft July 16, 2025 19:51
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Click Logic Fails for Larger Elements

The fallback coordinate clicking logic incorrectly uses the top-left corner (element_node.snapshot_node.bounds.x/y) instead of the element's center. This change from the previous element_node.viewport_coordinates.center.x/y causes clicks to miss their intended targets, especially for larger elements. The correct coordinates should be calculated as (bounds.x + bounds.width/2, bounds.y + bounds.height/2).

browser_use/browser/session.py#L2203-L2213

# Final fallback - try clicking by coordinates if available
if element_node.snapshot_node and element_node.snapshot_node.bounds:
try:
# TODO: instead of using the cached center, we should use the actual center of the element (easy, just get it by nodeBackendId)
self.logger.warning(
f'⚠️ Element click failed, falling back to coordinate click at ({element_node.snapshot_node.bounds.x}, {element_node.snapshot_node.bounds.y})'
)
await page.mouse.click(
element_node.snapshot_node.bounds.x,
element_node.snapshot_node.bounds.y,
)

Fix in CursorFix in Web


Bug: Async Context Manager Scope Issue

The finally block attempts to access the dom_service variable, which is defined within an async with context manager. This causes a NameError because dom_service is out of scope when the finally block executes.

browser_use/browser/session.py#L3323-L3325

finally:
await self.remove_highlights(dom_service)

Fix in CursorFix in Web


Was this report helpful? Give feedback by reacting with 👍 or 👎

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants