
Commit 0bcc636

Gabriel Bayomi Tinoco Kalejaiye authored and gustavocidornelas committed
feat: add guardrails system with PII protection
- Add base guardrail architecture with GuardrailAction, BlockStrategy enums
- Implement GuardrailResult dataclass for structured guardrail responses
- Create BaseGuardrail abstract class for extensible guardrail implementations
- Add PIIGuardrail using Microsoft Presidio for PII detection and redaction
- Support multiple block strategies: raise exception, return empty, return error message, skip function
- Include GuardrailRegistry for managing guardrail instances
- Add comprehensive error handling and logging

This foundational system enables flexible content filtering and protection for AI/LLM applications with configurable actions and strategies.
1 parent b603847 commit 0bcc636

File tree: 4 files changed, +783 -0 lines changed

Lines changed: 295 additions & 0 deletions
@@ -0,0 +1,295 @@
# Openlayer Guardrails System

The Openlayer Guardrails system provides a flexible framework for protecting against security risks, PII leakage, and other concerns in traced functions. Guardrails can intercept function inputs and outputs, taking actions like allowing, blocking, or modifying data based on configurable rules.

## Overview

Guardrails integrate seamlessly with Openlayer's tracing system, automatically adding metadata about their actions to trace steps. This provides visibility into when and how guardrails are protecting your applications.

### Key Features

- **Flexible Actions**: Allow, block, or modify data based on detection results
- **Input & Output Protection**: Guardrails can protect both function inputs and outputs
- **Extensible Architecture**: Easy to add new guardrail types and detection methods
- **Trace Integration**: Automatic metadata logging to Openlayer traces
- **Multiple Guardrails**: Support for applying multiple guardrails to a single function
- **Configurable Thresholds**: Adjustable confidence levels and detection rules
## Quick Start

### Basic Usage

```python
from openlayer.lib.tracing import tracer
from openlayer.lib.guardrails import PIIGuardrail

# Create a PII guardrail
pii_guardrail = PIIGuardrail(
    name="PII Protection",
    block_entities={"US_SSN", "CREDIT_CARD"},           # Block high-risk PII
    redact_entities={"PHONE_NUMBER", "EMAIL_ADDRESS"}    # Redact medium-risk PII
)

# Apply to traced functions
@tracer.trace(guardrails=[pii_guardrail])
def process_user_input(user_query: str) -> str:
    return f"Processing: {user_query}"

# Usage examples:
process_user_input("tell me about turtles")   # ✅ Allowed
process_user_input("my SSN is 123-45-6789")   # 🚫 Blocked
process_user_input("call me at 555-1234")     # ✏️ Phone number redacted
```

### Installation Requirements

The PII guardrail requires Microsoft Presidio:

```bash
pip install presidio-analyzer presidio-anonymizer
```
## Guardrail Actions

Guardrails can take three types of actions:

### 1. ALLOW
- **When**: No sensitive data detected or data is considered safe
- **Result**: Function executes normally with original data
- **Metadata**: Records that no action was taken

### 2. BLOCK
- **When**: High-risk sensitive data is detected (e.g., SSN, credit cards)
- **Result**: Raises `GuardrailBlockedException`, preventing function execution
- **Metadata**: Records what was blocked and why

### 3. MODIFY
- **When**: Medium-risk sensitive data is detected (e.g., phone numbers, emails)
- **Result**: Function executes with redacted/modified data
- **Metadata**: Records what was modified and how
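As a rough sketch, a guardrail's check methods communicate these actions by returning a `GuardrailResult`. The ALLOW and BLOCK forms below mirror the custom-guardrail example later in this document; `GuardrailAction.MODIFY` and the `modified_data` field are assumed names used for illustration, not confirmed parts of the API.

```python
from openlayer.lib.guardrails.base import GuardrailAction, GuardrailResult

# ALLOW: no sensitive data found, run the function with the original data.
allow_result = GuardrailResult(action=GuardrailAction.ALLOW)

# BLOCK: high-risk data found, prevent execution (GuardrailBlockedException is raised).
block_result = GuardrailResult(
    action=GuardrailAction.BLOCK,
    reason="US_SSN detected in input",
)

# MODIFY: medium-risk data found, run the function with a redacted payload.
# NOTE: `modified_data` is an assumed field name, shown only for illustration.
modify_result = GuardrailResult(
    action=GuardrailAction.MODIFY,
    reason="Redacted PII entities: PHONE_NUMBER",
    modified_data={"user_query": "call me at <PHONE_NUMBER>"},
)
```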
## Built-in Guardrails

### PIIGuardrail

Protects against Personally Identifiable Information using Microsoft Presidio.

```python
from openlayer.lib.guardrails import PIIGuardrail

pii_guardrail = PIIGuardrail(
    name="PII Protection",
    block_entities={"US_SSN", "CREDIT_CARD", "US_PASSPORT"},
    redact_entities={"PHONE_NUMBER", "EMAIL_ADDRESS", "PERSON", "LOCATION"},
    confidence_threshold=0.8,  # Minimum confidence to trigger (0.0-1.0)
    language="en"              # Language for analysis
)
```

**Supported Entity Types:**
- **High-risk (typically blocked)**: `US_SSN`, `CREDIT_CARD`, `CRYPTO`, `IBAN_CODE`, `US_BANK_NUMBER`, `US_DRIVER_LICENSE`, `US_PASSPORT`
- **Medium-risk (typically redacted)**: `PHONE_NUMBER`, `EMAIL_ADDRESS`, `PERSON`, `LOCATION`, `DATE_TIME`, `NRP`, `MEDICAL_LICENSE`, `URL`
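Applying this configuration to a traced function behaves like the Quick Start example: block entities stop execution, redact entities are rewritten before the function runs. A minimal usage sketch follows; the exact redaction token the function receives depends on how the anonymizer is configured, so `<PERSON>` and `<PHONE_NUMBER>` below are assumptions.

```python
from openlayer.lib.tracing import tracer

@tracer.trace(guardrails=[pii_guardrail])
def summarize(profile: str) -> str:
    return f"Summary: {profile}"

# Raises GuardrailBlockedException: US_SSN is in block_entities.
# summarize("SSN 123-45-6789, frequent flyer")

# Runs with redacted input: PERSON and PHONE_NUMBER are in redact_entities,
# so the function receives something like "Reach <PERSON> at <PHONE_NUMBER>".
summarize("Reach John Smith at 555-123-4567")
```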
## Creating Custom Guardrails

### Basic Custom Guardrail

```python
from typing import Any, Dict

from openlayer.lib.guardrails.base import BaseGuardrail, GuardrailAction, GuardrailResult

class ToxicityGuardrail(BaseGuardrail):
    def __init__(self, name: str = "Toxicity Filter", **config):
        super().__init__(name=name, **config)
        self.toxic_words = config.get("toxic_words", ["badword1", "badword2"])

    def check_input(self, inputs: Dict[str, Any]) -> GuardrailResult:
        # Check inputs for toxic content
        text_content = str(inputs)
        for word in self.toxic_words:
            if word.lower() in text_content.lower():
                return GuardrailResult(
                    action=GuardrailAction.BLOCK,
                    reason=f"Toxic content detected: {word}"
                )
        return GuardrailResult(action=GuardrailAction.ALLOW)

    def check_output(self, output: Any, inputs: Dict[str, Any]) -> GuardrailResult:
        # Similar logic for outputs
        return GuardrailResult(action=GuardrailAction.ALLOW)

# Register and use
from openlayer.lib.guardrails.base import register_guardrail
register_guardrail("toxicity", ToxicityGuardrail)

# Create instance
toxicity_guard = ToxicityGuardrail(toxic_words=["spam", "scam"])
```
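The custom guardrail plugs into tracing the same way as the built-in one; a minimal usage sketch, reusing the `toxicity_guard` instance from above:

```python
from openlayer.lib.tracing import tracer

@tracer.trace(guardrails=[toxicity_guard])
def answer_question(question: str) -> str:
    return f"Answer to: {question}"

answer_question("Tell me about turtles")   # ✅ Allowed

# Raises GuardrailBlockedException: "scam" matches toxic_words.
# answer_question("Is this offer a scam?")
```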
## Advanced Usage

### Multiple Guardrails

```python
# Apply multiple guardrails in sequence
@tracer.trace(guardrails=[pii_guardrail, toxicity_guardrail, custom_guardrail])
def secure_function(user_input: str) -> str:
    return process_input(user_input)
```

### Configuration Options

```python
# Highly customized PII guardrail
strict_pii = PIIGuardrail(
    name="Strict PII Filter",
    enabled=True,
    block_entities={"US_SSN", "CREDIT_CARD", "US_BANK_NUMBER"},
    redact_entities={"PHONE_NUMBER", "EMAIL_ADDRESS", "PERSON"},
    confidence_threshold=0.9,  # Very strict
    language="en"
)

# Lenient PII guardrail for development
dev_pii = PIIGuardrail(
    name="Development PII Filter",
    enabled=False,            # Disabled for development
    confidence_threshold=0.5  # Lower threshold
)
```

### Conditional Guardrails

```python
# Enable/disable based on environment
import os
production_mode = os.getenv("ENVIRONMENT") == "production"

pii_guardrail = PIIGuardrail(
    name="Production PII Filter",
    enabled=production_mode,
    confidence_threshold=0.9 if production_mode else 0.5
)
```
## Metadata and Observability

Guardrails automatically add metadata to trace steps, providing visibility into their actions:

```json
{
  "guardrails": {
    "input_pii_protection": {
      "action": "redacted",
      "reason": "Redacted PII entities: PHONE_NUMBER",
      "metadata": {
        "detected_entities": ["PHONE_NUMBER"],
        "redacted_entities": ["PHONE_NUMBER"],
        "confidence_threshold": 0.7
      }
    },
    "output_pii_protection": {
      "action": "allow",
      "reason": "no_pii_detected",
      "metadata": {
        "detected_entities": [],
        "confidence_threshold": 0.7
      }
    }
  }
}
```
## Error Handling

### GuardrailBlockedException

When a guardrail blocks execution, it raises a `GuardrailBlockedException`:

```python
from openlayer.lib.guardrails import GuardrailBlockedException

try:
    result = secure_function("my SSN is 123-45-6789")
except GuardrailBlockedException as e:
    print(f"Blocked by {e.guardrail_name}: {e.reason}")
    print(f"Metadata: {e.metadata}")
```

### Graceful Degradation

Guardrails are designed to fail gracefully:
- If a guardrail encounters an error, it logs the error but doesn't break the trace
- The error is recorded in the trace metadata
- Function execution continues normally (a fail-open sketch for custom guardrails follows this list)
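If you want the same fail-open behavior inside a custom guardrail's own detection logic, a minimal sketch is shown below. The `FailOpenGuardrail` class, its `_detect` helper, and the `guardrail_error` reason string are illustrative assumptions, not part of the library.

```python
import logging
from typing import Any, Dict

from openlayer.lib.guardrails.base import BaseGuardrail, GuardrailAction, GuardrailResult

logger = logging.getLogger("openlayer.lib.guardrails")


class FailOpenGuardrail(BaseGuardrail):
    """Hypothetical guardrail that allows execution if its detector errors out."""

    def check_input(self, inputs: Dict[str, Any]) -> GuardrailResult:
        try:
            flagged = self._detect(inputs)
        except Exception as exc:
            # Fail open: log the problem and let the function run.
            logger.error("Guardrail check failed, allowing request: %s", exc)
            return GuardrailResult(
                action=GuardrailAction.ALLOW,
                reason=f"guardrail_error: {exc}",
            )
        if flagged:
            return GuardrailResult(action=GuardrailAction.BLOCK, reason="flagged by detector")
        return GuardrailResult(action=GuardrailAction.ALLOW)

    def check_output(self, output: Any, inputs: Dict[str, Any]) -> GuardrailResult:
        return GuardrailResult(action=GuardrailAction.ALLOW)

    def _detect(self, inputs: Dict[str, Any]) -> bool:
        # Placeholder: replace with a call to your real detection engine.
        return False
```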
## Performance Considerations

- Guardrails add latency to function execution
- PII detection using Presidio can be CPU-intensive for large text
- Consider caching guardrail results for repeated content (see the sketch after this list)
- Use appropriate confidence thresholds to balance accuracy and performance
- Disable guardrails in development/testing environments if needed
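One way to implement the caching suggestion, sketched as a hypothetical `CachedGuardrail` wrapper (not part of the library) under the assumption that identical inputs always produce the same guardrail decision:

```python
from typing import Any, Dict

from openlayer.lib.guardrails.base import BaseGuardrail, GuardrailResult


class CachedGuardrail(BaseGuardrail):
    """Hypothetical wrapper that memoizes input checks of an inner guardrail."""

    def __init__(self, inner: BaseGuardrail, name: str = "Cached Guardrail", **config):
        super().__init__(name=name, **config)
        self.inner = inner
        self._cache: Dict[str, GuardrailResult] = {}

    def check_input(self, inputs: Dict[str, Any]) -> GuardrailResult:
        key = repr(sorted(inputs.items()))  # stable key for identical inputs
        if key not in self._cache:
            self._cache[key] = self.inner.check_input(inputs)
        return self._cache[key]

    def check_output(self, output: Any, inputs: Dict[str, Any]) -> GuardrailResult:
        # Outputs rarely repeat verbatim, so delegate without caching.
        return self.inner.check_output(output, inputs)
```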
## Integration with Other Systems

### LLM Guard Integration (Future)

The guardrails system is designed to support multiple detection backends:

```python
# Future: LLM Guard integration
from openlayer.lib.guardrails import LLMGuardGuardrail

llm_guard = LLMGuardGuardrail(
    scanners=["Toxicity", "BanSubstrings", "PromptInjection"]
)
```

### Custom Detection Engines

Implement the `BaseGuardrail` interface to integrate any detection system:

```python
from typing import Any, Dict

from openlayer.lib.guardrails.base import BaseGuardrail, GuardrailResult

class CustomDetectionGuardrail(BaseGuardrail):
    def __init__(self, **config):
        super().__init__(**config)
        # Initialize your detection engine (placeholder name)
        self.detector = YourDetectionEngine(**config)

    def check_input(self, inputs: Dict[str, Any]) -> GuardrailResult:
        results = self.detector.analyze(inputs)
        # Convert the engine's results into a GuardrailResult (placeholder helper)
        return self._convert_results(results)

    def check_output(self, output: Any, inputs: Dict[str, Any]) -> GuardrailResult:
        results = self.detector.analyze(output)
        return self._convert_results(results)
```
## Best Practices

1. **Layer Guardrails**: Use multiple guardrails for defense in depth
2. **Environment-Specific Config**: Use different settings for dev/staging/production
3. **Monitor Performance**: Track guardrail latency and effectiveness
4. **Regular Updates**: Keep detection rules and models updated
5. **Test Thoroughly**: Verify guardrails work with your specific data patterns
6. **Document Policies**: Clearly document what gets blocked or modified
7. **Audit Logs**: Review guardrail actions regularly for tuning
## Troubleshooting

### Common Issues

1. **High False Positives**: Raise the confidence threshold or adjust the entity types
2. **Performance Issues**: Optimize text preprocessing, use caching
3. **Missing Detections**: Lower the confidence threshold, add custom patterns
4. **Import Errors**: Ensure required dependencies (presidio-analyzer, presidio-anonymizer) are installed

### Debugging

Enable debug logging to see guardrail decisions:

```python
import logging
logging.getLogger("openlayer.lib.guardrails").setLevel(logging.DEBUG)
```
## Examples

See the `examples/tracing/` directory for complete working examples:
- `guardrails_example.py` - Comprehensive examples with Presidio
- `simple_guardrails_test.py` - Basic functionality test without dependencies
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
"""Guardrails module for Openlayer tracing."""

from .base import (
    GuardrailAction,
    BlockStrategy,
    GuardrailResult,
    BaseGuardrail,
    GuardrailBlockedException,
    GuardrailRegistry,
)
from .pii import PIIGuardrail

__all__ = [
    "GuardrailAction",
    "BlockStrategy",
    "GuardrailResult",
    "BaseGuardrail",
    "GuardrailBlockedException",
    "GuardrailRegistry",
    "PIIGuardrail",
]
