-
Notifications
You must be signed in to change notification settings - Fork 7.8k
new-extraction-layer-with-main #2439
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…to new-extraction-layer-magnus
…port for identifying iframe content in the DOM tree and improved the logic for analyzing element interactivity based on iframe context. Updated highlighting script to ensure accurate positioning and visibility of iframe elements.
…ssary DOM service call for timing data and streamlined the process by utilizing existing state summary timing info.
…alysis. Added file writing for user messages and timing data, while cleaning up commented-out code and enhancing cache performance reporting.
…enhance iframe handling logic This commit introduces detailed timing logs for key operations within the BrowserSession class, including page retrieval, DOM processing, and PDF auto-download checks. Additionally, it improves iframe handling by establishing proper parent relationships for iframe content documents, ensuring better traceability of elements within iframes.
✅ No security or compliance issues detected. Reviewed everything up to e79c5d8. Security Overview
Detected Code ChangesThe diff is too large to display a summary of code changes. Reply to this PR with |
Agent Task Evaluation Results: 1/3 (33%)View detailed results
Check the evaluate-tasks job for detailed task execution logs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Async Context Manager Scope Issue
The dom_service
variable, defined within an async with
context manager in _get_updated_state
, is out of scope when the finally
block attempts to use it for cleanup. This will cause a NameError
at runtime. The variable may also be undefined if an exception occurs before the context manager is entered.
browser_use/browser/session.py#L3288-L3294
browser-use/browser_use/browser/session.py
Lines 3288 to 3294 in a484a3a
raise | |
finally: | |
start_final_cleanup = time.time() | |
await self.remove_highlights(dom_service) | |
end_final_cleanup = time.time() | |
print(f'⏱️ Final cleanup (remove_highlights) took {end_final_cleanup - start_final_cleanup:.3f} seconds') | |
Bug: Rect Validation Fails Silently
The Rect
dataclass's __post_init__
method incorrectly returns False
when validating rectangle coordinates. As __post_init__
does not process return values, this allows invalid Rect
objects (e.g., where x1 > x2
or y1 > y2
) to be silently created. It should raise an exception (e.g., ValueError
) to prevent the creation of invalid objects.
browser_use/dom/serializer/paint_order.py#L19-L22
browser-use/browser_use/dom/serializer/paint_order.py
Lines 19 to 22 in a484a3a
def __post_init__(self): | |
if not (self.x1 <= self.x2 and self.y1 <= self.y2): | |
return False |
browser_use/dom/enhanced_snapshot.py#L19-L22
browser-use/browser_use/dom/enhanced_snapshot.py
Lines 19 to 22 in a484a3a
# Only the ESSENTIAL computed styles for interactivity and visibility detection | |
REQUIRED_COMPUTED_STYLES = [ | |
# Essential for visibility | |
'display', |
BugBot free trial expires on July 22, 2025
Learn more in the Cursor dashboard.
Was this report helpful? Give feedback by reacting with 👍 or 👎
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cubic found 5 issues across 22 files. Review them in cubic.dev
React with 👍 or 👎 to teach cubic. Tag @cubic-dev-ai
to give specific feedback.
browser_use/dom/service.py
Outdated
url = url + '/json/version' | ||
async with httpx.AsyncClient() as client: | ||
version_info = await client.get(url) | ||
ws_url = version_info.json()['webSocketDebuggerUrl'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assumes the JSON response always contains webSocketDebuggerUrl; a missing key will raise KeyError and crash the service.
browser_use/dom/service.py
Outdated
start_get_targets = time.time() | ||
targets = await cdp_client.send.Target.getTargets() | ||
end_get_targets = time.time() | ||
print(f'⏱️ Target.getTargets() took {end_get_targets - start_get_targets:.3f} seconds') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uses print for diagnostic output instead of the project’s logging system, making log handling and log-level control impossible in production.
# print('saved snapshot to tmp/snapshot.json') | ||
# print('saved ax tree to tmp/ax_tree.json') | ||
|
||
input('Done. Press Enter to continue...') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
input() performs a blocking read and will block the entire event loop when called inside an async function, defeating the benefits of asyncio.
end = time.time() | ||
print(f'Time taken: {end - start} seconds') | ||
|
||
async with aiofiles.open('tmp/enhanced_dom_tree.json', 'w') as f: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assumes the "tmp" directory already exists; aiofiles.open will raise FileNotFoundError if it doesn't, causing the script to crash on first run.
|
||
def __post_init__(self): | ||
if not (self.x1 <= self.x2 and self.y1 <= self.y2): | ||
return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning a value from post_init silently ignores invalid rectangle coordinates instead of raising an error, so invalid Rect instances can still be constructed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cubic found 7 issues across 24 files. Review them in cubic.dev
React with 👍 or 👎 to teach cubic. Tag @cubic-dev-ai
to give specific feedback.
browser_use/dom/service.py
Outdated
start_get_targets = time.time() | ||
targets = await cdp_client.send.Target.getTargets() | ||
end_get_targets = time.time() | ||
print(f'⏱️ Target.getTargets() took {end_get_targets - start_get_targets:.3f} seconds') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uses print for diagnostic output instead of the project’s logging system, making log handling and log-level control impossible in production.
browser_use/dom/service.py
Outdated
url = url + '/json/version' | ||
async with httpx.AsyncClient() as client: | ||
version_info = await client.get(url) | ||
ws_url = version_info.json()['webSocketDebuggerUrl'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assumes the JSON response always contains webSocketDebuggerUrl; a missing key will raise KeyError and crash the service.
end = time.time() | ||
print(f'Time taken: {end - start} seconds') | ||
|
||
async with aiofiles.open('tmp/enhanced_dom_tree.json', 'w') as f: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assumes the "tmp" directory already exists; aiofiles.open will raise FileNotFoundError if it doesn't, causing the script to crash on first run.
# print('saved snapshot to tmp/snapshot.json') | ||
# print('saved ax tree to tmp/ax_tree.json') | ||
|
||
input('Done. Press Enter to continue...') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
input() performs a blocking read and will block the entire event loop when called inside an async function, defeating the benefits of asyncio.
|
||
def __post_init__(self): | ||
if not (self.x1 <= self.x2 and self.y1 <= self.y2): | ||
return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning a value from post_init silently ignores invalid rectangle coordinates instead of raising an error, so invalid Rect instances can still be constructed
SelectorMap = dict[int, DOMElementNode] | ||
# Combine both for final hash | ||
combined_string = f'{parent_branch_path_string}|{attributes_string}' | ||
element_hash = hashlib.sha256(combined_string.encode()).hexdigest() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Computing a full SHA-256 digest every time hash is called is unnecessarily expensive and defeats Python’s built-in hash caching; this can degrade performance when EnhancedDOMTreeNode objects are used as dictionary keys or set members.
'is_scrollable': self.is_scrollable, | ||
'frame_id': self.frame_id, | ||
'content_document': self.content_document.__json__() if self.content_document else None, | ||
'shadow_root_type': self.shadow_root_type, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The raw Enum instance is stored directly in the JSON payload; json.dumps will fail because the default encoder cannot serialize Enum objects.
'shadow_root_type': self.shadow_root_type, | |
'shadow_root_type': self.shadow_root_type.name if self.shadow_root_type else None, |
- Integrated @observe_debug and @time_execution_sync decorators in the DomService class for get_dom_tree and get_serialized_dom_tree methods to enhance debugging and performance tracking. - Updated PaintOrderRemover class with @observe_debug and @time_execution_sync in calculate_paint_order method. - Enhanced DOMTreeSerializer class with multiple @time_execution_sync decorators for various methods to improve execution time tracking and debugging capabilities.
- Introduced the @observe_debug decorator to the serialize_accessible_elements method to enhance debugging capabilities by ignoring input and output. - Removed @time_execution_sync decorator from _assign_interactive_indices_and_mark_new_nodes method to streamline observability in the DOMTreeSerializer class.
…ng-332-switch-actions-to-pure-cdp
…action-layer-with-main
@MagMueller , @gregpr07 will this PR help in solving below issue?: |
Auto-generated PR for branch: new-extraction-layer-with-main
Summary by cubic
Rebuilt the DOM extraction layer to use pure Chrome DevTools Protocol (CDP), enabling faster and more accurate DOM snapshots, improved element interactivity detection, and better support for iframes and accessibility data.
New Features
Refactors