A microservice system for processing, analyzing, and searching documents using vector embeddings and AI-powered search.
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│     Gateway     │    │    Discovery    │    │  Config Server  │
│   (Port 8080)   │    │     Service     │    │   (Port 8888)   │
│                 │    │   (Port 8761)   │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         └──────────────────────┼──────────────────────┘
                                │
         ┌──────────────────────┼──────────────────────┐
         │                      │                      │
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  User Service   │    │    Metadata     │    │     Storage     │
│   (Port 8081)   │    │     Service     │    │     Service     │
│                 │    │   (Port 8082)   │    │   (Port 8084)   │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         └──────────────────────┼──────────────────────┘
                                │
         ┌──────────────────────┼──────────────────────┐
         │                      │                      │
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  Notification   │    │    Embedding    │    │  Audit Service  │
│     Service     │    │     Service     │    │                 │
│   (Port 8085)   │    │   (Port 8083)   │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │
         └──────────┬───────────┘
                    │
           ┌─────────────────┐
           │    External     │
           │  Dependencies   │
           │                 │
           │  • Kafka        │
           │  • Zilliz Cloud │
           │  • PostgreSQL   │
           │  • Redis        │
           └─────────────────┘
Service | Port | Description | Technology |
---|---|---|---|
Gateway | 8080 | API Gateway & Load Balancer | Spring Cloud Gateway |
Discovery | 8761 | Service Registry | Eureka Server |
Config Server | 8888 | Centralized Configuration | Spring Cloud Config |
User Service | 8081 | User Management & Authentication | Spring Boot + JWT |
Metadata Service | 8082 | Document Metadata & Search | Spring Boot + JPA |
Storage Service | 8084 | File Storage & Management | Spring Boot + MinIO |
Notification Service | 8085 | Email & Push Notifications | Spring Boot |
Audit Service | - | Activity Logging & Monitoring | Spring Boot |
Embedding Service | 8083 | Vector Embeddings & Similarity Search | Python + FastAPI + Zilliz Cloud |
OCR Service | - | Document Text Extraction | Python + PaddleOCR + VietOCR |
- Java 17+
- Maven 3.8+
- Python 3.9+
- Docker & Docker Compose
- PostgreSQL
- Redis
- Kafka
git clone <repository-url>
cd PDFMiner-microservice
# Start PostgreSQL, Redis, Kafka
docker-compose -f infrastructure/docker-compose.yml up -d
# Build all Java services
mvn clean compile
# Start Config Server first and wait for it to come up (port 8888)
cd config-server && mvn spring-boot:run &
# Start Discovery Service next (port 8761)
cd discovery-service && mvn spring-boot:run &
# Start the remaining services
cd gateway && mvn spring-boot:run &
cd user-service && mvn spring-boot:run &
cd metadata-service && mvn spring-boot:run &
cd storage-service && mvn spring-boot:run &
cd notification-service && mvn spring-boot:run &
# Embedding Service (run from the repository root)
cd embedding-service
chmod +x start.sh
./start.sh
# OCR Service (run from the repository root, in a separate terminal)
cd ocr-service
python server.py
- API Gateway: http://localhost:8080
- Discovery Dashboard: http://localhost:8761
- Embedding Service: http://localhost:8083
- Health Checks: http://localhost:8080/actuator/health
Create .env files in each service directory:
# Database
DB_HOST=localhost
DB_PORT=5432
DB_NAME=pdfminer
DB_USERNAME=postgres
DB_PASSWORD=password
# Redis
REDIS_HOST=localhost
REDIS_PORT=6379
# Kafka
KAFKA_BOOTSTRAP_SERVERS=localhost:9092
# Zilliz Cloud (for Embedding Service)
ZILLIZ_CLOUD_URI=your_zilliz_uri
ZILLIZ_CLOUD_TOKEN=your_zilliz_token
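As a minimal sketch of how a Python service such as the Embedding Service might consume these variables, the snippet below reads them from the process environment and falls back to the local defaults listed above. The `Settings` class and `load_settings` function are hypothetical names, not part of the project, and the snippet assumes the `.env` values have already been exported into the environment (it does not parse the file itself).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables listed in the .env file."""
    db_host: str
    db_port: int
    db_name: str
    redis_host: str
    redis_port: int
    kafka_bootstrap_servers: str
    zilliz_cloud_uri: Optional[str]
    zilliz_cloud_token: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from `env`, falling back to the local defaults above."""
    return Settings(
        db_host=env.get("DB_HOST", "localhost"),
        db_port=int(env.get("DB_PORT", "5432")),
        db_name=env.get("DB_NAME", "pdfminer"),
        redis_host=env.get("REDIS_HOST", "localhost"),
        redis_port=int(env.get("REDIS_PORT", "6379")),
        kafka_bootstrap_servers=env.get("KAFKA_BOOTSTRAP_SERVERS", "localhost:9092"),
        # Credentials have no safe default, so they stay None when unset.
        zilliz_cloud_uri=env.get("ZILLIZ_CLOUD_URI"),
        zilliz_cloud_token=env.get("ZILLIZ_CLOUD_TOKEN"),
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests without touching the real environment.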
Services register with Eureka Server automatically:
eureka:
  client:
    service-url:
      defaultZone: http://localhost:8761/eureka/
# User Management
POST /api/users/register
POST /api/users/login
GET /api/users/profile
# Document Management
POST /api/documents/upload
GET /api/documents/{id}
POST /api/documents/search
# Vector Search
POST /api/embeddings/search
POST /api/embeddings/similar
# File Storage
POST /api/storage/upload
GET /api/storage/download/{id}
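The endpoints above are reached through the API Gateway on port 8080. The following sketch builds (but does not send) two such requests using only the Python standard library. The request body fields (`username`, `password`, `query`, `top_k`) and the use of a Bearer token header are assumptions for illustration, since the payload schemas are not documented here; the Swagger UI is the place to confirm them.

```python
import json
import urllib.request
from typing import Optional

GATEWAY_URL = "http://localhost:8080"  # API Gateway address from this README


def build_request(method: str, path: str, payload: Optional[dict] = None,
                  token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON request against the gateway."""
    headers = {"Accept": "application/json"}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if token is not None:
        # Assumption: the JWT issued at login is passed as a Bearer token.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(GATEWAY_URL + path, data=data,
                                  headers=headers, method=method)


# Field names below are hypothetical; check Swagger UI for the real schema.
login = build_request("POST", "/api/users/login",
                      {"username": "alice", "password": "secret"})
search = build_request("POST", "/api/embeddings/search",
                       {"query": "invoice totals", "top_k": 5}, token="<jwt>")

# To send a request once the services are running:
#   with urllib.request.urlopen(login) as resp:
#       print(json.load(resp))
```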
- Swagger UI: http://localhost:{port}/swagger-ui.html
- OpenAPI: http://localhost:{port}/v3/api-docs
# Overall system health
curl http://localhost:8080/actuator/health
# Individual services
curl http://localhost:8081/actuator/health # User Service
curl http://localhost:8082/actuator/health # Metadata Service
curl http://localhost:8083/health # Embedding Service
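The same checks can be scripted. The sketch below polls the four endpoints listed above with the standard library. Spring Boot Actuator reports `{"status": "UP"}` for a healthy service; the response format of the Embedding Service's `/health` endpoint is not documented here, so the sketch assumes any HTTP 200 from it means healthy. The function names are illustrative.

```python
import json
import urllib.error
import urllib.request

# Health endpoints listed above.
HEALTH_URLS = {
    "gateway": "http://localhost:8080/actuator/health",
    "user-service": "http://localhost:8081/actuator/health",
    "metadata-service": "http://localhost:8082/actuator/health",
    "embedding-service": "http://localhost:8083/health",
}


def is_healthy(url: str, status_code: int, body: bytes) -> bool:
    """Interpret a health response for the given endpoint."""
    if status_code != 200:
        return False
    if not url.endswith("/actuator/health"):
        # Assumption: for non-Actuator endpoints, HTTP 200 alone means healthy.
        return True
    try:
        payload = json.loads(body)
    except ValueError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "UP"


def check(url: str, timeout: float = 3.0) -> bool:
    """Return True if the service at `url` answers and looks healthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return is_healthy(url, resp.status, resp.read())
    except (urllib.error.URLError, OSError):
        # Connection failures and non-2xx responses both count as unhealthy.
        return False


# Usage, once the services are running:
#   for name, url in HEALTH_URLS.items():
#       print(f"{name}: {'UP' if check(url) else 'DOWN'}")
```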
- Prometheus: http://localhost:9090
- Grafana: http://localhost:3000
- Centralized Logging: ELK Stack
- Distributed Tracing: Sleuth + Zipkin
- Create new Maven module:
mkdir new-service
cd new-service
# Copy from template service
- Update root pom.xml:
<modules>
  <!-- existing modules -->
  <module>new-service</module>
</modules>
- Configure service discovery and config
# Unit tests
mvn test
# Integration tests (verify runs the integration-test phase and then checks the results)
mvn verify
# End-to-end tests
cd tests && python -m pytest
- JWT Authentication with Spring Security
- Rate Limiting in the Gateway
- Input Validation and sanitization
- HTTPS/TLS for production
- API Key management for external services
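To illustrate the rate-limiting item, here is a small token bucket sketch. Spring Cloud Gateway's Redis-backed `RequestRateLimiter` filter is configured with `replenishRate` and `burstCapacity`, which the parameter names below mirror. This is an in-memory illustration of the algorithm only, not the gateway's implementation or this project's configuration, and the class name is hypothetical.

```python
import time
from typing import Callable


class TokenBucket:
    """Illustrative in-memory token bucket (not the gateway's implementation)."""

    def __init__(self, replenish_rate: float, burst_capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.replenish_rate = replenish_rate    # tokens added per second
        self.burst_capacity = burst_capacity    # maximum tokens the bucket holds
        self._clock = clock
        self._tokens = float(burst_capacity)    # start full to allow an initial burst
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Consume `cost` tokens if available; return whether the request passes."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill in proportion to elapsed time, capped at the burst capacity.
        self._tokens = min(self.burst_capacity,
                           self._tokens + elapsed * self.replenish_rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

The injectable `clock` exists so the behavior can be tested deterministically; a real deployment would keep one bucket per client key in shared storage rather than in process memory.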
- Gateway Throughput: 10,000 RPS
- Vector Search: <100ms response time
- Document Processing: 50 documents/minute
- OCR Processing: 2-5 pages/minute
- Horizontal Scaling: Multiple instances with load balancing
- Database Sharding: Partitioned by tenant/user
- Caching Strategy: Redis for frequently accessed data
- CDN: Static files and images
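One common way to partition by tenant or user is to hash the ID and take it modulo the number of shards. The sketch below shows that idea; the function name and the shard count are hypothetical, and the project's actual partitioning scheme is not described here.

```python
import hashlib


def shard_for(user_id: str, num_shards: int) -> int:
    """Map a user (or tenant) ID to a shard index in [0, num_shards)."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # Use a stable hash: Python's built-in hash() is randomized per process for str.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Plain modulo hashing remaps most keys whenever `num_shards` changes, so a scheme that must grow over time would more likely use a lookup table or consistent hashing.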
# Build all services
docker-compose build
# Deploy to staging
docker-compose -f docker-compose.staging.yml up -d
# Deploy to production
docker-compose -f docker-compose.prod.yml up -d
# Apply configurations
kubectl apply -f k8s/
# Check deployment status
kubectl get pods
kubectl get services
- AWS: EKS + RDS + ElastiCache + S3
- GCP: GKE + Cloud SQL + Cloud Storage
- Azure: AKS + Azure Database + Blob Storage
- Fork the repository
- Create a feature branch: git checkout -b feature/new-feature
- Commit changes: git commit -m 'Add new feature'
- Push the branch: git push origin feature/new-feature
- Create a Pull Request
This project is licensed under the MIT License; see the LICENSE file.
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Wiki: Project Wiki
- Email: support@pdfminer.com
Built with ❤️ using Spring Boot, FastAPI, and modern microservice patterns