[FEATURE] structured_output retries

### Problem Statement

Many model providers that support tool calling do not guarantee schema-conformant structured output via constrained decoding, so `structured_output` calls can fail validation. It would be great to have built-in retry logic for these failures.

### Proposed Solution

A few types of retry logic would be beneficial:

1. Naive retry: re-issue the same request up to some `retry_limit`
2. Intelligent retry: construct a retry prompt that includes the Pydantic validation error(s) and asks the model to avoid them on the next attempt
3. Patching mistakes: similar to [trustcall](https://github.com/hinthornw/trustcall), ask the LLM for a JSON patch that fixes the invalid output rather than rewriting it from scratch
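
To make options 1 and 2 concrete, here is a minimal sketch of what the retry loop could look like. It is not an existing strands API: `structured_output_with_retry`, `generate`, `validate`, and `feedback` are hypothetical names, and the model call is replaced by a stub so the example runs on its own. It assumes validation failures surface as `ValueError` (Pydantic's `ValidationError` is a subclass), and that a failed attempt can be fed back as a plain assistant/user text exchange, which simplifies how a real tool-calling conversation would look.

```python
import json
from typing import Any, Callable

Message = dict[str, Any]


def structured_output_with_retry(
    generate: Callable[[list[Message]], Any],
    validate: Callable[[Any], Any],
    messages: list[Message],
    retry_limit: int = 3,
    feedback: bool = True,
) -> Any:
    """Call `generate` until `validate` accepts its output or retries run out.

    Both callables are hypothetical stand-ins: `generate` wraps the model call,
    and `validate` could be e.g. `WorkoutSchedule.model_validate`.
    """
    conversation = list(messages)  # copy so the caller's list is not mutated
    last_error = None
    for _ in range(1 + retry_limit):  # one initial attempt plus the retries
        raw = generate(conversation)
        try:
            return validate(raw)
        except ValueError as err:
            last_error = err
            if feedback:
                # Intelligent retry: show the model its output and the errors.
                conversation = conversation + [
                    {"role": "assistant",
                     "content": [{"text": json.dumps(raw, default=str)}]},
                    {"role": "user",
                     "content": [{"text": "The previous output failed validation:\n"
                                          f"{err}\nReturn a corrected output."}]},
                ]
            # Naive retry (feedback=False): resend the conversation unchanged.
    raise RuntimeError(
        f"no valid output after {1 + retry_limit} attempts"
    ) from last_error


# --- Demo with a stubbed model; no network access needed ---
def check_sets(data: dict) -> dict:
    """Stand-in validator mirroring the `sets` constraint (ge=1, le=10)."""
    if not 1 <= data["sets"] <= 10:
        raise ValueError(f"sets must be between 1 and 10, got {data['sets']}")
    return data


calls = []  # records the conversation length seen on each attempt


def fake_generate(conversation: list[Message]) -> dict:
    """Stub model: invalid output on the first call, valid on the second."""
    calls.append(len(conversation))
    sets = 12 if len(calls) == 1 else 4
    return {"name": "Squat", "sets": sets}


prompt = [{"role": "user", "content": [{"text": "Plan one exercise"}]}]
result = structured_output_with_retry(fake_generate, check_sets, prompt)
print(result, calls)
```

With `feedback=False` the same function degrades to the naive retry from option 1. Option 3 (patching) would replace the "return a corrected output" request with a request for a JSON patch that is applied to the previous output before re-validating.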

### Use Case

Here's an example of a nested Pydantic schema for which `structured_output` sometimes succeeds and sometimes fails:

```python
import asyncio
from typing import List

from botocore.config import Config
from pydantic import BaseModel, Field
from strands.models import BedrockModel

config = Config(
    region_name="us-east-1",
    connect_timeout=10,
    read_timeout=120,
)

async def test_deep_nesting():
    model = BedrockModel(model_id="amazon.nova-lite-v1:0", max_tokens=8192, config=config)

    class Exercise(BaseModel):
        name: str = Field(description="Name of the exercise")
        sets: int = Field(ge=1, le=10, description="Number of sets to perform")
        reps: int = Field(ge=1, le=50, description="Number of repetitions per set")

    class Workout(BaseModel):
        name: str = Field(description="Name of the workout")
        exercises: List[Exercise] = Field(
            min_length=2, 
            max_length=4, 
            description="List of exercises in this workout (2-4 exercises)"
        )

    class DaySchedule(BaseModel):
        is_rest_day: bool = Field(description="True if this is a rest day, False if it's a workout day")
        workouts: List[Workout] = Field(
            default=[],
            max_length=2,
            description="List of workouts for this day. Empty list if is_rest_day is True, 1-2 workouts if False"
        )

    class Week(BaseModel):
        week_number: int = Field(ge=1, le=4, description="Week number in the program")
        monday: DaySchedule
        tuesday: DaySchedule
        wednesday: DaySchedule
        thursday: DaySchedule
        friday: DaySchedule
        saturday: DaySchedule
        sunday: DaySchedule

    class WorkoutSchedule(BaseModel):
        title: str = Field(description="Title of the workout program")
        weeks: List[Week] = Field(
            min_length=2, 
            max_length=2, 
            description="Exactly 2 weeks of programming"
        )

    messages = [{
        "role": "user", 
        "content": [{"text": """
        Create a complete 2-week workout schedule called "Strength & Conditioning Program".
        
        Requirements:
        - Each week should have 4-5 workout days and 2-3 rest days
        - For rest days: set is_rest_day=true and workouts=[] (empty list)
        - For workout days: set is_rest_day=false and include 1-2 workouts in the workouts list
        - Each workout should have 2-4 exercises with realistic rep/set counts (reps 5-20, sets 1-5)
        - Make it progressive - week 2 should build on week 1
        - Vary the workouts throughout each week (upper body, lower body, cardio, etc.)
        """}]
    }]

    async for event in model.structured_output(WorkoutSchedule, messages):
        if "output" in event:
            return event["output"]

async def run_tests():
    successes = 0
    failures = 0
    
    for i in range(10):
        try:
            await test_deep_nesting()
            successes += 1
            print("✅", end="")
        except Exception:
            failures += 1
            print("❌", end="")
    
    print(f"\nResults: {successes}/10 successes ({successes*10}%)")

asyncio.run(run_tests())
```

My result:

```
❌✅✅✅✅✅❌✅❌✅
Results: 7/10 successes (70%)
```

### Alternative Solutions

_No response_

### Additional Context

OpenAI's [structured outputs feature](https://openai.com/index/introducing-structured-outputs-in-the-api/) guarantees that responses adhere to the supplied schema by using constrained decoding. Providers and models such as Anthropic and Nova do not offer this feature, so they often fail to adhere to complex nested schemas.

[FEATURE] structured_output retries #348
