Skip to content

new-extraction-layer-with-main #2439

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 63 commits into
base: main
Choose a base branch
from
Draft

Conversation

MagMueller
Copy link
Collaborator

@MagMueller MagMueller commented Jul 14, 2025

Auto-generated PR for branch: new-extraction-layer-with-main


Summary by cubic

Rebuilt the DOM extraction layer to use pure Chrome DevTools Protocol (CDP), enabling faster and more accurate DOM snapshots, improved element interactivity detection, and better support for iframes and accessibility data.

  • New Features

    • Added enhanced DOM serializers and snapshot processing for richer LLM input.
    • Introduced native element highlighting and paint order calculation.
    • Improved timing logs and performance for large pages.
  • Refactors

    • Removed legacy DOM tree processor and JavaScript-based extraction.
    • Updated all core services and tests to use the new CDP-based DOM state and element types.

…port for identifying iframe content in the DOM tree and improved the logic for analyzing element interactivity based on iframe context. Updated highlighting script to ensure accurate positioning and visibility of iframe elements.
…ssary DOM service call for timing data and streamlined the process by utilizing existing state summary timing info.
…alysis. Added file writing for user messages and timing data, while cleaning up commented-out code and enhancing cache performance reporting.
…enhance iframe handling logic

This commit introduces detailed timing logs for key operations within the BrowserSession class, including page retrieval, DOM processing, and PDF auto-download checks. Additionally, it improves iframe handling by establishing proper parent relationships for iframe content documents, ensuring better traceability of elements within iframes.
@MagMueller MagMueller marked this pull request as draft July 14, 2025 12:00
Copy link

delve-auditor bot commented Jul 14, 2025

No security or compliance issues detected. Reviewed everything up to e79c5d8.

Security Overview
  • 🔎 Scanned files: 30 changed file(s)
Detected Code Changes

The diff is too large to display a summary of code changes.

Reply to this PR with @delve-auditor followed by a description of what change you want and we'll auto-submit a change to this PR to implement it.

Copy link

github-actions bot commented Jul 14, 2025

Agent Task Evaluation Results: 1/3 (33%)

View detailed results
Task Result Reason
captcha_cloudflare ❌ Fail The agent failed to solve the captcha successfully as indicated by the message 'Cloudflare Turnstile solved incorrectly, please try again.' Consequently, the agent could not extract the 'hostname' value, which must be 'example.com' upon a successful solve. Therefore, the task requirements were not met.
amazon_laptop ✅ Pass The agent navigated to amazon.com, performed a search for 'laptop', and returned the name and relevant details of the first laptop result. All criteria for success were met.
browser_use_pip ❌ Fail The agent did not provide the exact pip installation command 'pip install browser-use' as required. Although it located the repository and described it, it was unable to extract or confirm the precise installation command before the step limit was reached. Therefore, the task is incomplete and does not meet the success criteria.

Check the evaluate-tasks job for detailed task execution logs.

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Async Context Manager Scope Issue

The dom_service variable, defined within an async with context manager in _get_updated_state, is out of scope when the finally block attempts to use it for cleanup. This will cause a NameError at runtime. The variable may also be undefined if an exception occurs before the context manager is entered.

browser_use/browser/session.py#L3288-L3294

raise
finally:
start_final_cleanup = time.time()
await self.remove_highlights(dom_service)
end_final_cleanup = time.time()
print(f'⏱️ Final cleanup (remove_highlights) took {end_final_cleanup - start_final_cleanup:.3f} seconds')

Fix in CursorFix in Web


Bug: Rect Validation Fails Silently

The Rect dataclass's __post_init__ method incorrectly returns False when validating rectangle coordinates. As __post_init__ does not process return values, this allows invalid Rect objects (e.g., where x1 > x2 or y1 > y2) to be silently created. It should raise an exception (e.g., ValueError) to prevent the creation of invalid objects.

browser_use/dom/serializer/paint_order.py#L19-L22

def __post_init__(self):
if not (self.x1 <= self.x2 and self.y1 <= self.y2):
return False

browser_use/dom/enhanced_snapshot.py#L19-L22

# Only the ESSENTIAL computed styles for interactivity and visibility detection
REQUIRED_COMPUTED_STYLES = [
# Essential for visibility
'display',

Fix in CursorFix in Web


BugBot free trial expires on July 22, 2025
Learn more in the Cursor dashboard.

Was this report helpful? Give feedback by reacting with 👍 or 👎

Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cubic found 5 issues across 22 files. Review them in cubic.dev

React with 👍 or 👎 to teach cubic. Tag @cubic-dev-ai to give specific feedback.

url = url + '/json/version'
async with httpx.AsyncClient() as client:
version_info = await client.get(url)
ws_url = version_info.json()['webSocketDebuggerUrl']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assumes the JSON response always contains webSocketDebuggerUrl; a missing key will raise KeyError and crash the service.

start_get_targets = time.time()
targets = await cdp_client.send.Target.getTargets()
end_get_targets = time.time()
print(f'⏱️ Target.getTargets() took {end_get_targets - start_get_targets:.3f} seconds')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uses print for diagnostic output instead of the project’s logging system, making log handling and log-level control impossible in production.

# print('saved snapshot to tmp/snapshot.json')
# print('saved ax tree to tmp/ax_tree.json')

input('Done. Press Enter to continue...')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

input() performs a blocking read and will block the entire event loop when called inside an async function, defeating the benefits of asyncio.

end = time.time()
print(f'Time taken: {end - start} seconds')

async with aiofiles.open('tmp/enhanced_dom_tree.json', 'w') as f:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assumes the "tmp" directory already exists; aiofiles.open will raise FileNotFoundError if it doesn't, causing the script to crash on first run.


def __post_init__(self):
if not (self.x1 <= self.x2 and self.y1 <= self.y2):
return False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning a value from post_init silently ignores invalid rectangle coordinates instead of raising an error, so invalid Rect instances can still be constructed

Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cubic found 7 issues across 24 files. Review them in cubic.dev

React with 👍 or 👎 to teach cubic. Tag @cubic-dev-ai to give specific feedback.

start_get_targets = time.time()
targets = await cdp_client.send.Target.getTargets()
end_get_targets = time.time()
print(f'⏱️ Target.getTargets() took {end_get_targets - start_get_targets:.3f} seconds')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uses print for diagnostic output instead of the project’s logging system, making log handling and log-level control impossible in production.

url = url + '/json/version'
async with httpx.AsyncClient() as client:
version_info = await client.get(url)
ws_url = version_info.json()['webSocketDebuggerUrl']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assumes the JSON response always contains webSocketDebuggerUrl; a missing key will raise KeyError and crash the service.

end = time.time()
print(f'Time taken: {end - start} seconds')

async with aiofiles.open('tmp/enhanced_dom_tree.json', 'w') as f:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assumes the "tmp" directory already exists; aiofiles.open will raise FileNotFoundError if it doesn't, causing the script to crash on first run.

# print('saved snapshot to tmp/snapshot.json')
# print('saved ax tree to tmp/ax_tree.json')

input('Done. Press Enter to continue...')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

input() performs a blocking read and will block the entire event loop when called inside an async function, defeating the benefits of asyncio.


def __post_init__(self):
if not (self.x1 <= self.x2 and self.y1 <= self.y2):
return False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning a value from post_init silently ignores invalid rectangle coordinates instead of raising an error, so invalid Rect instances can still be constructed

SelectorMap = dict[int, DOMElementNode]
# Combine both for final hash
combined_string = f'{parent_branch_path_string}|{attributes_string}'
element_hash = hashlib.sha256(combined_string.encode()).hexdigest()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Computing a full SHA-256 digest every time hash is called is unnecessarily expensive and defeats Python’s built-in hash caching; this can degrade performance when EnhancedDOMTreeNode objects are used as dictionary keys or set members.

'is_scrollable': self.is_scrollable,
'frame_id': self.frame_id,
'content_document': self.content_document.__json__() if self.content_document else None,
'shadow_root_type': self.shadow_root_type,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The raw Enum instance is stored directly in the JSON payload; json.dumps will fail because the default encoder cannot serialize Enum objects.

Suggested change
'shadow_root_type': self.shadow_root_type,
'shadow_root_type': self.shadow_root_type.name if self.shadow_root_type else None,

MagMueller and others added 7 commits July 14, 2025 14:44
- Integrated @observe_debug and @time_execution_sync decorators in the DomService class for get_dom_tree and get_serialized_dom_tree methods to enhance debugging and performance tracking.
- Updated PaintOrderRemover class with @observe_debug and @time_execution_sync in calculate_paint_order method.
- Enhanced DOMTreeSerializer class with multiple @time_execution_sync decorators for various methods to improve execution time tracking and debugging capabilities.
- Introduced the @observe_debug decorator to the serialize_accessible_elements method to enhance debugging capabilities by ignoring input and output.
- Removed @time_execution_sync decorator from _assign_interactive_indices_and_mark_new_nodes method to streamline observability in the DOMTreeSerializer class.
@shivamkhatri
Copy link
Contributor

shivamkhatri commented Jul 23, 2025

@MagMueller , @gregpr07 will this PR help in solving below issue?:
#2519

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants