| 1 | +# Openlayer Guardrails System |
| 2 | + |
| 3 | +The Openlayer Guardrails system provides a flexible framework for protecting against security risks, PII leakage, and other concerns in traced functions. Guardrails can intercept function inputs and outputs, taking actions like allowing, blocking, or modifying data based on configurable rules. |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +Guardrails integrate seamlessly with Openlayer's tracing system, automatically adding metadata about their actions to trace steps. This provides visibility into when and how guardrails are protecting your applications. |
| 8 | + |
| 9 | +### Key Features |
| 10 | + |
| 11 | +- **Flexible Actions**: Allow, block, or modify data based on detection results |
| 12 | +- **Input & Output Protection**: Guardrails can protect both function inputs and outputs |
| 13 | +- **Extensible Architecture**: Easy to add new guardrail types and detection methods |
| 14 | +- **Trace Integration**: Automatic metadata logging to Openlayer traces |
| 15 | +- **Multiple Guardrails**: Support for applying multiple guardrails to a single function |
| 16 | +- **Configurable Thresholds**: Adjustable confidence levels and detection rules |
| 17 | + |
| 18 | +## Quick Start |
| 19 | + |
| 20 | +### Basic Usage |
| 21 | + |
| 22 | +```python |
| 23 | +from openlayer.lib.tracing import tracer |
| 24 | +from openlayer.lib.guardrails import PIIGuardrail |
| 25 | + |
| 26 | +# Create a PII guardrail |
| 27 | +pii_guardrail = PIIGuardrail( |
| 28 | +    name="PII Protection", |
| 29 | +    block_entities={"US_SSN", "CREDIT_CARD"},  # Block high-risk PII |
| 30 | +    redact_entities={"PHONE_NUMBER", "EMAIL_ADDRESS"},  # Redact medium-risk PII |
| 31 | +) |
| 32 | + |
| 33 | +# Apply to traced functions |
| 34 | +@tracer.trace(guardrails=[pii_guardrail]) |
| 35 | +def process_user_input(user_query: str) -> str: |
| 36 | +    return f"Processing: {user_query}" |
| 37 | + |
| 38 | +# Usage examples: |
| 39 | +process_user_input("tell me about turtles") # ✅ Allowed |
| 40 | +process_user_input("my SSN is 123-45-6789") # 🚫 Blocked |
| 41 | +process_user_input("call me at 555-1234") # ✏️ Phone number redacted |
| 42 | +``` |
| 43 | + |
| 44 | +### Installation Requirements |
| 45 | + |
| 46 | +The PII guardrail requires Microsoft Presidio: |
| 47 | + |
| 48 | +```bash |
| 49 | +pip install presidio-analyzer presidio-anonymizer |
| 50 | +``` |
| 51 | + |
| 52 | +## Guardrail Actions |
| 53 | + |
| 54 | +Guardrails can take three types of actions: |
| 55 | + |
| 56 | +### 1. ALLOW |
| 57 | +- **When**: No sensitive data detected or data is considered safe |
| 58 | +- **Result**: Function executes normally with original data |
| 59 | +- **Metadata**: Records that no action was taken |
| 60 | + |
| 61 | +### 2. BLOCK |
| 62 | +- **When**: High-risk sensitive data is detected (e.g., SSN, credit cards) |
| 63 | +- **Result**: Raises `GuardrailBlockedException`, preventing function execution |
| 64 | +- **Metadata**: Records what was blocked and why |
| 65 | + |
| 66 | +### 3. MODIFY |
| 67 | +- **When**: Medium-risk sensitive data is detected (e.g., phone numbers, emails) |
| 68 | +- **Result**: Function executes with redacted/modified data |
| 69 | +- **Metadata**: Records what was modified and how |
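The three actions above can be sketched as a small dispatcher. This is a simplified illustration, not the library's implementation: `Action`, `Result`, `BlockedError`, the `modified_data` field, and `apply_input_check` are hypothetical stand-ins for `GuardrailAction`, `GuardrailResult`, and `GuardrailBlockedException`, defined locally so the snippet runs without Openlayer installed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Dict, Optional


class Action(Enum):  # hypothetical stand-in for GuardrailAction
    ALLOW = "allow"
    BLOCK = "block"
    MODIFY = "modify"


@dataclass
class Result:  # hypothetical stand-in for GuardrailResult
    action: Action
    reason: str = ""
    modified_data: Optional[Dict[str, Any]] = None  # assumed field name


class BlockedError(Exception):  # hypothetical stand-in for GuardrailBlockedException
    pass


def apply_input_check(
    func: Callable[..., Any], inputs: Dict[str, Any], result: Result
) -> Any:
    """Run func according to the guardrail decision in result."""
    if result.action is Action.BLOCK:
        # BLOCK: the function never executes
        raise BlockedError(result.reason)
    if result.action is Action.MODIFY and result.modified_data is not None:
        # MODIFY: the function executes with the redacted inputs
        inputs = result.modified_data
    # ALLOW (or MODIFY without replacement data): original inputs pass through
    return func(**inputs)
```

In this sketch a blocked call raises before `func` runs, which matches the behavior described for BLOCK; how the real tracer wires this up may differ.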
| 70 | + |
| 71 | +## Built-in Guardrails |
| 72 | + |
| 73 | +### PIIGuardrail |
| 74 | + |
| 75 | +Protects against Personally Identifiable Information using Microsoft Presidio. |
| 76 | + |
| 77 | +```python |
| 78 | +from openlayer.lib.guardrails import PIIGuardrail |
| 79 | + |
| 80 | +pii_guardrail = PIIGuardrail( |
| 81 | +    name="PII Protection", |
| 82 | +    block_entities={"US_SSN", "CREDIT_CARD", "US_PASSPORT"}, |
| 83 | +    redact_entities={"PHONE_NUMBER", "EMAIL_ADDRESS", "PERSON", "LOCATION"}, |
| 84 | +    confidence_threshold=0.8,  # Minimum confidence to trigger (0.0-1.0) |
| 85 | +    language="en",  # Language for analysis |
| 86 | +) |
| 87 | +``` |
| 88 | + |
| 89 | +**Supported Entity Types:** |
| 90 | +- **High-risk (typically blocked)**: `US_SSN`, `CREDIT_CARD`, `CRYPTO`, `IBAN_CODE`, `US_BANK_NUMBER`, `US_DRIVER_LICENSE`, `US_PASSPORT` |
| 91 | +- **Medium-risk (typically redacted)**: `PHONE_NUMBER`, `EMAIL_ADDRESS`, `PERSON`, `LOCATION`, `DATE_TIME`, `NRP`, `MEDICAL_LICENSE`, `URL` |
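To make the interplay between `block_entities`, `redact_entities`, and `confidence_threshold` concrete, here is a minimal sketch of one plausible decision rule. It is an assumption about how such a guardrail could combine these settings, not the actual `PIIGuardrail` logic; `Detection` and `decide_action` are hypothetical names, and whether the real comparison is `>=` or `>` is not stated in this document.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class Detection:  # hypothetical: one finding reported by a PII detector
    entity_type: str
    score: float


def decide_action(
    detections: Iterable[Detection],
    block_entities: Set[str],
    redact_entities: Set[str],
    confidence_threshold: float = 0.8,
) -> Tuple[str, Set[str]]:
    """Return (action, entity types that caused it)."""
    # Ignore findings below the confidence threshold (assumed inclusive)
    confident = {
        d.entity_type for d in detections if d.score >= confidence_threshold
    }
    blocked = confident & block_entities
    if blocked:
        # Blocking takes priority over redaction
        return "block", blocked
    redacted = confident & redact_entities
    if redacted:
        return "modify", redacted
    return "allow", set()
```

Under this rule, raising the threshold means fewer detections trigger an action, and lowering it means more do.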
| 92 | + |
| 93 | +## Creating Custom Guardrails |
| 94 | + |
| 95 | +### Basic Custom Guardrail |
| 96 | + |
| 97 | +```python |
| 98 | +from typing import Any, Dict |
| 99 | +from openlayer.lib.guardrails.base import BaseGuardrail, GuardrailAction, GuardrailResult, register_guardrail |
| 100 | + |
| 101 | +class ToxicityGuardrail(BaseGuardrail): |
| 102 | +    def __init__(self, name: str = "Toxicity Filter", **config): |
| 103 | +        super().__init__(name=name, **config) |
| 104 | +        self.toxic_words = config.get("toxic_words", ["badword1", "badword2"]) |
| 105 | + |
| 106 | +    def check_input(self, inputs: Dict[str, Any]) -> GuardrailResult: |
| 107 | +        # Block if any configured word appears in the stringified inputs |
| 108 | +        text_content = str(inputs).lower() |
| 109 | +        for word in self.toxic_words: |
| 110 | +            if word.lower() in text_content: |
| 111 | +                return GuardrailResult( |
| 112 | +                    action=GuardrailAction.BLOCK, |
| 113 | +                    reason=f"Toxic content detected: {word}", |
| 114 | +                ) |
| 115 | +        return GuardrailResult(action=GuardrailAction.ALLOW) |
| 116 | + |
| 117 | +    def check_output(self, output: Any, inputs: Dict[str, Any]) -> GuardrailResult: |
| 118 | +        # Similar logic for outputs |
| 119 | +        return GuardrailResult(action=GuardrailAction.ALLOW) |
| 120 | + |
| 121 | +# Register and use |
| 122 | +register_guardrail("toxicity", ToxicityGuardrail) |
| 123 | + |
| 124 | +# Create instance |
| 125 | +toxicity_guard = ToxicityGuardrail(toxic_words=["spam", "scam"]) |
| 126 | +``` |
| 127 | + |
| 128 | +## Advanced Usage |
| 129 | + |
| 130 | +### Multiple Guardrails |
| 131 | + |
| 132 | +```python |
| 133 | +# Apply multiple guardrails in sequence (both instances are defined in the earlier examples) |
| 134 | +@tracer.trace(guardrails=[pii_guardrail, toxicity_guard]) |
| 135 | +def secure_function(user_input: str) -> str: |
| 136 | +    return f"Processed: {user_input}" |
| 137 | +``` |
| 138 | + |
| 139 | +### Configuration Options |
| 140 | + |
| 141 | +```python |
| 142 | +# Highly customized PII guardrail |
| 143 | +strict_pii = PIIGuardrail( |
| 144 | +    name="Strict PII Filter", |
| 145 | +    enabled=True, |
| 146 | +    block_entities={"US_SSN", "CREDIT_CARD", "US_BANK_NUMBER"}, |
| 147 | +    redact_entities={"PHONE_NUMBER", "EMAIL_ADDRESS", "PERSON"}, |
| 148 | +    confidence_threshold=0.9,  # Act only on high-confidence detections |
| 149 | +    language="en", |
| 150 | +) |
| 151 | + |
| 152 | +# PII guardrail for development |
| 153 | +dev_pii = PIIGuardrail( |
| 154 | +    name="Development PII Filter", |
| 155 | +    enabled=False,  # Disabled for development |
| 156 | +    confidence_threshold=0.5,  # Lower threshold: lower-confidence detections also trigger |
| 157 | +) |
| 158 | +``` |
| 159 | + |
| 160 | +### Conditional Guardrails |
| 161 | + |
| 162 | +```python |
| 163 | +# Enable/disable based on environment |
| 164 | +import os |
| 165 | +production_mode = os.getenv("ENVIRONMENT") == "production" |
| 166 | + |
| 167 | +pii_guardrail = PIIGuardrail( |
| 168 | +    name="Production PII Filter", |
| 169 | +    enabled=production_mode, |
| 170 | +    confidence_threshold=0.9 if production_mode else 0.5, |
| 171 | +) |
| 172 | +``` |
| 173 | + |
| 174 | +## Metadata and Observability |
| 175 | + |
| 176 | +Guardrails automatically add metadata to trace steps, providing visibility into their actions: |
| 177 | + |
| 178 | +```json |
| 179 | +{ |
| 180 | +  "guardrails": { |
| 181 | +    "input_pii_protection": { |
| 182 | +      "action": "redacted", |
| 183 | +      "reason": "Redacted PII entities: PHONE_NUMBER", |
| 184 | +      "metadata": { |
| 185 | +        "detected_entities": ["PHONE_NUMBER"], |
| 186 | +        "redacted_entities": ["PHONE_NUMBER"], |
| 187 | +        "confidence_threshold": 0.7 |
| 188 | +      } |
| 189 | +    }, |
| 190 | +    "output_pii_protection": { |
| 191 | +      "action": "allow", |
| 192 | +      "reason": "no_pii_detected", |
| 193 | +      "metadata": { |
| 194 | +        "detected_entities": [], |
| 195 | +        "confidence_threshold": 0.7 |
| 196 | +      } |
| 197 | +    } |
| 198 | +  } |
| 199 | +} |
| 199 | +} |
| 200 | +``` |
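For auditing, metadata shaped like the sample above can be scanned for guardrails that did something other than allow. The sketch below assumes the step metadata is available as a plain dictionary with exactly the keys shown in the sample; `summarize_guardrails` is a hypothetical helper, not part of the Openlayer API.

```python
from typing import Any, Dict, List

# Mirrors the sample trace-step metadata shown above
SAMPLE_STEP_METADATA: Dict[str, Any] = {
    "guardrails": {
        "input_pii_protection": {
            "action": "redacted",
            "reason": "Redacted PII entities: PHONE_NUMBER",
            "metadata": {"detected_entities": ["PHONE_NUMBER"]},
        },
        "output_pii_protection": {
            "action": "allow",
            "reason": "no_pii_detected",
            "metadata": {"detected_entities": []},
        },
    }
}


def summarize_guardrails(step_metadata: Dict[str, Any]) -> List[str]:
    """List every guardrail entry whose action is not 'allow'."""
    summary = []
    for name, entry in step_metadata.get("guardrails", {}).items():
        if entry.get("action") != "allow":
            summary.append(f"{name}: {entry.get('action')} ({entry.get('reason')})")
    return summary
```

Calling `summarize_guardrails(SAMPLE_STEP_METADATA)` on the sample yields a single entry for `input_pii_protection`.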
| 201 | + |
| 202 | +## Error Handling |
| 203 | + |
| 204 | +### GuardrailBlockedException |
| 205 | + |
| 206 | +When a guardrail blocks execution, it raises a `GuardrailBlockedException`: |
| 207 | + |
| 208 | +```python |
| 209 | +try: |
| 210 | +    result = secure_function("my SSN is 123-45-6789") |
| 211 | +except GuardrailBlockedException as e: |
| 212 | +    print(f"Blocked by {e.guardrail_name}: {e.reason}") |
| 213 | +    print(f"Metadata: {e.metadata}") |
| 214 | +``` |
| 215 | + |
| 216 | +### Graceful Degradation |
| 217 | + |
| 218 | +Guardrails are designed to fail gracefully: |
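| 219 | +- If a guardrail itself encounters an unexpected error (as opposed to returning a block decision), it logs the error but doesn't break the trace |
| 220 | +- The error is recorded in the trace metadata |
| 221 | +- Function execution continues normally, so a failing guardrail does not protect that call |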
| 222 | + |
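The fail-open behavior described above can be sketched as a wrapper around a guardrail check. This is a simplified illustration under stated assumptions: `check_fail_open`, the dictionary result shape, and the `"guardrail_error"` reason string are hypothetical, and the check here returns its decision instead of raising, so an intentional block is never swallowed.

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger("guardrails_sketch")  # hypothetical logger name


def check_fail_open(
    check: Callable[[Dict[str, Any]], Dict[str, Any]],
    inputs: Dict[str, Any],
) -> Dict[str, Any]:
    """Run a guardrail check; on an unexpected error, log it and allow."""
    try:
        return check(inputs)
    except Exception as exc:
        # The guardrail itself failed: record the error and let the call proceed
        logger.exception("Guardrail check failed; allowing execution")
        return {
            "action": "allow",
            "reason": "guardrail_error",  # assumed label, not from the library
            "metadata": {"error": str(exc)},
        }
```

Failing open keeps the traced function available when a detector breaks, at the cost of leaving that call unprotected; the recorded error is what makes such gaps visible afterwards.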
| 223 | +## Performance Considerations |
| 224 | + |
| 225 | +- Guardrails add latency to function execution |
| 226 | +- PII detection using Presidio can be CPU-intensive for large text |
| 227 | +- Consider caching guardrail results for repeated content |
| 228 | +- Choose confidence thresholds deliberately: they trade false positives against missed detections |
| 229 | +- Disable guardrails in development/testing environments if needed |
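One way to apply the caching suggestion is to memoize the detection step on the input text. The sketch below uses the standard library's `functools.lru_cache`; `expensive_scan` is a toy stand-in for a real detector call (it only flags all-digit tokens), and `CALLS` exists purely to show how often the detector actually runs.

```python
from functools import lru_cache
from typing import Tuple

CALLS = {"count": 0}  # counts real detector invocations, for illustration only


def expensive_scan(text: str) -> Tuple[str, ...]:
    """Toy stand-in for a slow detector: returns all-digit tokens."""
    CALLS["count"] += 1
    return tuple(token for token in text.split() if token.isdigit())


@lru_cache(maxsize=1024)
def cached_scan(text: str) -> Tuple[str, ...]:
    # Identical text is scanned once; repeats are served from the cache
    return expensive_scan(text)
```

Note that the cache holds the raw input strings in memory, which may themselves contain PII; keying on a hash of the text and bounding the cache size are options worth considering.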
| 230 | + |
| 231 | +## Integration with Other Systems |
| 232 | + |
| 233 | +### LLM Guard Integration (Future) |
| 234 | + |
| 235 | +The guardrails system is designed to support multiple detection backends: |
| 236 | + |
| 237 | +```python |
| 238 | +# Future: LLM Guard integration |
| 239 | +from openlayer.lib.guardrails import LLMGuardGuardrail |
| 240 | + |
| 241 | +llm_guard = LLMGuardGuardrail( |
| 242 | +    scanners=["Toxicity", "BanSubstrings", "PromptInjection"] |
| 243 | +) |
| 244 | +``` |
| 245 | + |
| 246 | +### Custom Detection Engines |
| 247 | + |
| 248 | +Implement the `BaseGuardrail` interface to integrate any detection system: |
| 249 | + |
| 250 | +```python |
| 251 | +class CustomDetectionGuardrail(BaseGuardrail): |
| 252 | +    def __init__(self, **config): |
| 253 | +        super().__init__(**config) |
| 254 | +        # Initialize your detection engine |
| 255 | +        self.detector = YourDetectionEngine(**config) |
| 256 | + |
| 257 | +    def check_input(self, inputs): |
| 258 | +        results = self.detector.analyze(inputs) |
| 259 | +        # Convert to GuardrailResult |
| 260 | +        return self._convert_results(results) |
| 261 | +``` |
| 262 | + |
| 263 | +## Best Practices |
| 264 | + |
| 265 | +1. **Layer Guardrails**: Use multiple guardrails for defense in depth |
| 266 | +2. **Environment-Specific Config**: Different settings for dev/staging/production |
| 267 | +3. **Monitor Performance**: Track guardrail latency and effectiveness |
| 268 | +4. **Regular Updates**: Keep detection rules and models updated |
| 269 | +5. **Test Thoroughly**: Verify guardrails work with your specific data patterns |
| 270 | +6. **Document Policies**: Clear documentation of what gets blocked/modified |
| 271 | +7. **Audit Logs**: Review guardrail actions regularly for tuning |
| 272 | + |
| 273 | +## Troubleshooting |
| 274 | + |
| 275 | +### Common Issues |
| 276 | + |
| 277 | +1. **High False Positives**: Raise the confidence threshold or narrow the entity types |
| 278 | +2. **Performance Issues**: Optimize text preprocessing, use caching |
| 279 | +3. **Missing Detections**: Lower the confidence threshold, add custom patterns |
| 280 | +4. **Import Errors**: Ensure required dependencies (presidio) are installed |
| 281 | + |
| 282 | +### Debugging |
| 283 | + |
| 284 | +Enable debug logging to see guardrail decisions: |
| 285 | + |
| 286 | +```python |
| 287 | +import logging |
| 288 | +logging.getLogger("openlayer.lib.guardrails").setLevel(logging.DEBUG) |
| 289 | +``` |
| 290 | + |
| 291 | +## Examples |
| 292 | + |
| 293 | +See the `examples/tracing/` directory for complete working examples: |
| 294 | +- `guardrails_example.py` - Comprehensive examples with Presidio |
| 295 | +- `simple_guardrails_test.py` - Basic functionality test without dependencies |