feat: speed optimization for extract_structured_data #2443

sauravpanda · 2025-07-14T21:33:17Z

Summary by cubic

Improved the speed of extract_structured_data by limiting iframe processing, reducing content size, and shortening timeouts.

Performance
- Only processes up to 3 iframes and only if the query mentions "iframe" or "frame".
- Strips more HTML elements before markdown conversion.
- Reduces content length from 30,000 to 20,000 characters with smarter truncation.
- Cuts LLM and iframe timeouts for faster responses.
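
A minimal sketch of the gating and truncation behavior described above, not the PR's actual code. The limits of 3 iframes and 20,000 characters come from the summary. The function names and the break-point rule standing in for "smarter truncation" are assumptions made for illustration.

```python
MAX_IFRAMES = 3             # cap stated in the summary
MAX_CONTENT_CHARS = 20_000  # new content limit stated in the summary


def should_process_iframes(query: str) -> bool:
    """Gate iframe processing on the query mentioning 'iframe' or 'frame'."""
    # 'iframe' contains 'frame', so a single substring check covers both.
    return "frame" in query.lower()


def truncate_content(text: str, limit: int = MAX_CONTENT_CHARS) -> str:
    """Truncate to `limit` characters, preferring a natural break point.

    Assumed rule: cut at the last paragraph break, line break, or space
    before the limit, provided that keeps more than half of the budget.
    """
    if len(text) <= limit:
        return text
    head = text[:limit]
    for sep in ("\n\n", "\n", " "):
        cut = head.rfind(sep)
        if cut > limit // 2:
            return head[:cut].rstrip()
    return head  # no usable break point: fall back to a hard cut
```

A caller would check `should_process_iframes(query)` first, read at most `MAX_IFRAMES` frames, and pass the combined markdown through `truncate_content` before sending it to the LLM.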

delve-auditor · 2025-07-14T21:35:40Z

✅ No security or compliance issues detected. Reviewed everything up to 5f99600.

Security Overview

🔎 Scanned files: 1 changed file(s)

Detected Code Changes

Change Type	Relevant files
Enhancement	► browser_use/controller/service.py Optimize iframe processing and content truncation ► browser_use/llm/groq/chat.py Simplify Groq chat implementation ► browser_use/mcp/server.py Streamline logging configuration
Refactor	► .github/workflows/claude.yml Simplify workflow configuration ► browser_use/mcp/init.py Direct import of BrowserUseServer
Configuration changes	► pyproject.toml Update version to 0.5.4

Reply to this PR with @delve-auditor followed by a description of what change you want and we'll auto-submit a change to this PR to implement it.

github-actions · 2025-07-14T21:35:41Z

Agent Task Evaluation Results: 2/3 (67%)

Task	Result	Reason
captcha_cloudflare	❌ Fail	The agent failed to complete the captcha solving task successfully. Although it attempted multiple times to interact with the captcha and click the 'Check' button, it was unable to solve the captcha correctly. As a result, the success message with the 'hostname' value was never displayed or extracted, and thus the required hostname 'example.com' was not obtained. Therefore, the task criteria were not met.
amazon_laptop	✅ Pass	The agent successfully navigated to amazon.com, searched for 'laptop', and returned the name of the first laptop result along with relevant details. The output meets all the criteria specified in the task.
browser_use_pip	✅ Pass	The agent explicitly provided the command 'pip install browser-use' as requested, fulfilling the task criteria. Additional relevant commands and information were also included, which enhance the user's understanding and potential usage but do not detract from the success of meeting the main requirement.

Check the evaluate-tasks job for detailed task execution logs.

delve-auditor · 2025-07-14T21:35:46Z

✅ No security or compliance issues detected. Reviewed everything up to 404336c.

Security Overview

🔎 Scanned files: 1 changed file(s)

Detected Code Changes

Change Type	Relevant files
Enhancement	► browser-use-rules.mdc Remove pre-commit formatting requirement ► .env.example Update Azure OpenAI key variable name ► prompts.py Add PlannerPrompt class ► service.py Speed optimization for extract_structured_data
Refactor	► observability.py Simplify debug observation decorators ► message_manager/service.py Optimize debug observation parameters ► controller/service.py Update wait action behavior
Other	► examples/custom-functions/cua.py Remove custom-functions example

cubic-dev-ai

cubic reviewed 1 file and found no issues.

pirate · 2025-07-15T23:12:01Z

We could also hardcode the most common iframe tracking domains from the Alexa top 100 sites so we never bother processing those.
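
One way the blocklist idea could look, as a sketch only. The three domains are illustrative examples of well-known ad and tag hosts, not a list measured from any top-sites ranking, and `is_blocked_iframe` and `filter_iframes` are hypothetical names rather than browser-use functions.

```python
from urllib.parse import urlparse

# Illustrative entries only; a real list would need to be curated and measured.
BLOCKED_IFRAME_DOMAINS = frozenset({
    "doubleclick.net",
    "googletagmanager.com",
    "googlesyndication.com",
})


def is_blocked_iframe(url: str) -> bool:
    """Return True when the URL's host is a blocked domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in BLOCKED_IFRAME_DOMAINS)


def filter_iframes(urls: list[str]) -> list[str]:
    """Drop iframe URLs that point at blocked tracking domains."""
    return [u for u in urls if not is_blocked_iframe(u)]
```

Matching on the parsed hostname with a leading-dot suffix check avoids false positives from lookalike hosts such as `notdoubleclick.net`.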

Parva101

Must‑Fix Before Merge

Comment / code timeout drift.
page.content() uses timeout=10.0, but the comment and error message say “5 seconds.” The iframe timeouts drift the same way (2s in the comment, 1.0 in the code).
Fix: centralize the constants and align the error messages.

asyncio.get_event_loop() is on a deprecation path.
As of Python 3.12 it emits a DeprecationWarning when there is no current event loop.
Fix: use asyncio.get_running_loop() (safe inside coroutines and callbacks, where a loop is already running).

Iframe gating heuristic may drop critical data.
Iframes are now processed only when the query text contains “iframe” or “frame.” Many sites load their actual content (docs, auth, dashboards, embedded tables) in cross‑origin iframes that users won’t name, so this is a data‑loss regression.
Fix ideas:
- Add a config flag process_iframes=True (default) with MAX_IFRAME_COUNT.
- Always include the top N non‑ad iframes (URL heuristic, size check) unless disabled.
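
The three fixes above could be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: the timeout values are the ones the review reports seeing in the code, `process_iframes` and `MAX_IFRAME_COUNT` are the names the review proposes, and `with_timeout`, `run_blocking`, and `select_iframes` are hypothetical helpers.

```python
import asyncio

# Single source of truth for timeouts, so comments and messages cannot drift.
PAGE_CONTENT_TIMEOUT_S = 10.0  # value the review reports for page.content()
IFRAME_TIMEOUT_S = 1.0         # value the review reports for iframe reads
MAX_IFRAME_COUNT = 3


async def with_timeout(awaitable, seconds: float, what: str):
    """Await with a timeout; the error text is built from the same number."""
    try:
        return await asyncio.wait_for(awaitable, timeout=seconds)
    except asyncio.TimeoutError as exc:
        raise RuntimeError(f"{what} timed out after {seconds:g} seconds") from exc


async def run_blocking(func, *args):
    """Offload a blocking call using the loop that is currently running."""
    loop = asyncio.get_running_loop()  # instead of asyncio.get_event_loop()
    return await loop.run_in_executor(None, func, *args)


def select_iframes(urls, process_iframes: bool = True,
                   max_count: int = MAX_IFRAME_COUNT):
    """Keep the first `max_count` iframes unless processing is disabled."""
    return list(urls[:max_count]) if process_iframes else []
```

Because the message is formatted from the `seconds` argument, changing a constant updates the timeout and its error text together.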

mertunsall · 2025-08-03T13:11:04Z

@sauravpanda what's the state of this PR?

sauravpanda added 2 commits July 14, 2025 14:32

feat: speed optimization for extract_structured_data

404336c

Merge branch 'main' into saurav/fix-extract-structured-data-speed

74e2303

cubic-dev-ai bot reviewed Jul 14, 2025

Merge branch 'main' into saurav/fix-extract-structured-data-speed

ce4f62e

Parva101 suggested changes Jul 16, 2025

sauravpanda added 2 commits July 17, 2025 00:16

Added sync docs file

363a008

Merge branch 'saurav/fix-extract-structured-data-speed' of https://gi…

7ef5368

…thub.com/browser-use/browser-use into saurav/fix-extract-structured-data-speed

github-advanced-security bot found potential problems Jul 17, 2025

.github/workflows/sync-docs.yml: Fixed

sauravpanda marked this pull request as draft July 17, 2025 07:17

remove irrelevant file commit

5f99600
