
BuiDoKhoiNguyen/PDFMiner

PDFMiner Microservice Architecture

A microservice system for processing, analyzing, and searching documents with vector embeddings and AI-powered search.

🏗️ Architecture Overview

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Gateway       │    │ Discovery       │    │ Config Server   │
│   (Port 8080)   │    │ Service         │    │ (Port 8888)     │
│                 │    │ (Port 8761)     │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         └───────────────────────┼───────────────────────┘
                                 │
    ┌────────────────────────────┼────────────────────────────┐
    │                            │                            │
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│ User Service    │    │ Metadata        │    │ Storage         │
│ (Port 8081)     │    │ Service         │    │ Service         │
│                 │    │ (Port 8082)     │    │ (Port 8084)     │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         └───────────────────────┼───────────────────────┘
                                 │
    ┌────────────────────────────┼────────────────────────────┐
    │                            │                            │
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│ Notification    │    │ Embedding       │    │ Audit Service   │
│ Service         │    │ Service         │    │                 │
│ (Port 8085)     │    │ (Port 8083)     │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │
         └───────────────────────┘
                 │
        ┌─────────────────┐
        │ External        │
        │ Dependencies    │
        │                 │
        │ • Kafka         │
        │ • Zilliz Cloud  │
        │ • PostgreSQL    │
        │ • Redis         │
        └─────────────────┘

📦 Services

Core Services (Java/Spring Boot)

| Service | Port | Description | Technology |
|---|---|---|---|
| Gateway | 8080 | API Gateway & Load Balancer | Spring Cloud Gateway |
| Discovery | 8761 | Service Registry | Eureka Server |
| Config Server | 8888 | Centralized Configuration | Spring Cloud Config |
| User Service | 8081 | User Management & Authentication | Spring Boot + JWT |
| Metadata Service | 8082 | Document Metadata & Search | Spring Boot + JPA |
| Storage Service | 8084 | File Storage & Management | Spring Boot + MinIO |
| Notification Service | 8085 | Email & Push Notifications | Spring Boot |
| Audit Service | - | Activity Logging & Monitoring | Spring Boot |

AI/ML Services (Python)

| Service | Port | Description | Technology |
|---|---|---|---|
| Embedding Service | 8083 | Vector Embeddings & Similarity Search | Python + FastAPI + Zilliz Cloud |
| OCR Service | - | Document Text Extraction | Python + PaddleOCR + VietOCR |
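The Embedding Service hands similarity search off to Zilliz Cloud. To show what a similarity query computes, here is a minimal brute-force cosine-similarity top-k search in plain Python. It assumes embeddings are plain lists of floats, and `cosine_similarity` and `top_k` are hypothetical names, not the service's API. A vector database answers the same kind of query with approximate indexes instead of scanning every vector.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat as unrelated
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float],
          corpus: Dict[str, Sequence[float]],
          k: int = 3) -> List[Tuple[str, float]]:
    """Return the k (doc_id, score) pairs most similar to `query`, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings"; real embedding models produce far more dimensions.
    docs = {
        "invoice.pdf": [0.9, 0.1, 0.0],
        "contract.pdf": [0.1, 0.9, 0.1],
        "receipt.pdf": [0.8, 0.2, 0.1],
    }
    print(top_k([1.0, 0.0, 0.0], docs, k=2))
```

A full scan is O(n) per query, which is fine for a demo but is exactly the cost a vector index avoids at scale.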

🚀 Quick Start

Prerequisites

  • Java 17+
  • Maven 3.8+
  • Python 3.9+
  • Docker & Docker Compose
  • PostgreSQL
  • Redis
  • Kafka

1. Clone Repository

git clone <repository-url>
cd PDFMiner-microservice

2. Start Infrastructure Services

# Start PostgreSQL, Redis, Kafka
docker-compose -f infrastructure/docker-compose.yml up -d

3. Start Core Services

# Build all Java services
mvn clean compile

# Start Config Server (first)
cd config-server && mvn spring-boot:run &

# Start Discovery Service
cd discovery-service && mvn spring-boot:run &

# Start other services
cd gateway && mvn spring-boot:run &
cd user-service && mvn spring-boot:run &
cd metadata-service && mvn spring-boot:run &
cd storage-service && mvn spring-boot:run &
cd notification-service && mvn spring-boot:run &
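Because each trailing `&` sends a service to the background immediately, later services can start before the Config Server (port 8888) and Discovery Service (port 8761) accept connections. A minimal sketch of a helper that blocks until a TCP port opens, assuming the services listen on localhost at the ports above; `wait_for_port` is a hypothetical helper, not part of the repository. An open port only shows that the process is listening, not that the application has finished starting.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 120.0, interval: float = 1.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError (e.g. connection refused) while nothing listens.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

For example, call `wait_for_port("localhost", 8888)` after launching the Config Server and `wait_for_port("localhost", 8761)` before starting the remaining services.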

4. Start AI/ML Services

# Embedding Service
cd embedding-service
chmod +x start.sh
./start.sh

# OCR Service (from the repository root, in a separate terminal)
cd ocr-service
python server.py

5. Access Services

🔧 Configuration

Environment Variables

Create .env files in each service directory:

# Database
DB_HOST=localhost
DB_PORT=5432
DB_NAME=pdfminer
DB_USERNAME=postgres
DB_PASSWORD=password

# Redis
REDIS_HOST=localhost
REDIS_PORT=6379

# Kafka
KAFKA_BOOTSTRAP_SERVERS=localhost:9092

# Zilliz Cloud (for Embedding Service)
ZILLIZ_CLOUD_URI=your_zilliz_uri
ZILLIZ_CLOUD_TOKEN=your_zilliz_token
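For the Python services, a minimal sketch of reading such a file, assuming simple `KEY=value` lines with `#` comments. `parse_env` and `load_env_file` are hypothetical helpers; many projects use the `python-dotenv` package instead. As a design choice here, variables already set in the process environment override values from the file.

```python
import os
from pathlib import Path
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=value lines; blank lines and lines starting with '#' are skipped."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values like host:port=x stay intact.
        key, sep, value = line.partition("=")
        if not sep:
            continue  # ignore malformed lines without '='
        value = value.strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Read a .env file (if present); real environment variables take precedence."""
    env_path = Path(path)
    file_values = parse_env(env_path.read_text(encoding="utf-8")) if env_path.exists() else {}
    overrides = {key: os.environ[key] for key in file_values if key in os.environ}
    return {**file_values, **overrides}
```

Usage: `cfg = load_env_file(); db_port = int(cfg.get("DB_PORT", "5432"))`.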

Service Discovery

Services automatically register with the Eureka Server:

eureka:
  client:
    service-url:
      defaultZone: http://localhost:8761/eureka/
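To check which services have registered, you can query the Eureka server over HTTP. A minimal sketch, assuming the Spring Cloud Netflix Eureka server's `GET /eureka/apps` endpoint returns JSON when sent `Accept: application/json`, shaped like `{"applications": {"application": [{"name": ..., "instance": [...]}]}}`; verify this against your server's actual response. `registered_apps` and `fetch_registry` are hypothetical names.

```python
import json
import urllib.request
from typing import Any, Dict, List


def as_list(value: Any) -> List[Any]:
    """Some Eureka serializers emit a single object instead of a one-element list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def registered_apps(payload: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each registered application name to the status of each of its instances."""
    apps = payload.get("applications", {}).get("application")
    return {
        app["name"]: [inst.get("status", "UNKNOWN") for inst in as_list(app.get("instance"))]
        for app in as_list(apps)
    }


def fetch_registry(base_url: str = "http://localhost:8761/eureka/") -> Dict[str, List[str]]:
    """GET <base_url>apps as JSON and summarize it (requires a running Eureka server)."""
    request = urllib.request.Request(base_url.rstrip("/") + "/apps",
                                     headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=5) as response:
        return registered_apps(json.load(response))
```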

📋 API Documentation

Gateway Endpoints

# User Management
POST /api/users/register
POST /api/users/login
GET  /api/users/profile

# Document Management
POST /api/documents/upload
GET  /api/documents/{id}
POST /api/documents/search

# Vector Search
POST /api/embeddings/search
POST /api/embeddings/similar

# File Storage
POST /api/storage/upload
GET  /api/storage/download/{id}
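The routes above do not document request or response bodies. A minimal sketch of building gateway calls with the standard library, assuming JSON bodies and that protected routes expect the JWT returned by `/api/users/login` in an `Authorization: Bearer` header. The field names `username`, `password`, and `query` and the helpers `build_request` and `call` are hypothetical placeholders; check the User and Metadata services for the real schema.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

GATEWAY = "http://localhost:8080"


def build_request(method: str, path: str, body: Optional[Dict[str, Any]] = None,
                  token: Optional[str] = None) -> urllib.request.Request:
    """Create a JSON request against the gateway, optionally with a bearer token."""
    headers = {"Accept": "application/json"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if token is not None:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(GATEWAY + path, data=data, headers=headers, method=method)


def call(request: urllib.request.Request) -> Any:
    """Send the request and decode the JSON response (needs the gateway running)."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


# Hypothetical payloads; the real DTO field names are not documented in this README.
login = build_request("POST", "/api/users/login", {"username": "alice", "password": "secret"})
search = build_request("POST", "/api/documents/search", {"query": "invoice"}, token="<jwt>")
```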

Direct Service APIs

🔍 Monitoring & Observability

Health Checks

# Overall system health
curl http://localhost:8080/actuator/health

# Individual services
curl http://localhost:8081/actuator/health  # User Service
curl http://localhost:8082/actuator/health  # Metadata Service
curl http://localhost:8083/health           # Embedding Service
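To check all services in one pass, a minimal sketch that polls the URLs above. It assumes Spring Boot Actuator's JSON body `{"status": "UP"}` and that the Embedding Service's `/health` also returns a JSON object with a `status` field; that shape is not documented here, so anything else is reported as `UNKNOWN`. `SERVICES`, `parse_status`, `check`, and `check_all` are hypothetical names.

```python
import json
import urllib.error
import urllib.request
from typing import Dict

SERVICES = {
    "gateway": "http://localhost:8080/actuator/health",
    "user-service": "http://localhost:8081/actuator/health",
    "metadata-service": "http://localhost:8082/actuator/health",
    "embedding-service": "http://localhost:8083/health",
}


def parse_status(body: bytes) -> str:
    """Extract the 'status' field from a health response, or 'UNKNOWN'."""
    try:
        data = json.loads(body)
    except ValueError:
        return "UNKNOWN"
    if isinstance(data, dict) and isinstance(data.get("status"), str):
        return data["status"].upper()
    return "UNKNOWN"


def check(url: str, timeout: float = 3.0) -> str:
    """Return the reported status, or 'UNREACHABLE' if no connection could be made."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return parse_status(response.read())
    except urllib.error.HTTPError as err:
        # By default Actuator answers HTTP 503 with {"status": "DOWN"} when unhealthy.
        return parse_status(err.read())
    except (urllib.error.URLError, OSError):
        return "UNREACHABLE"


def check_all(services: Dict[str, str] = SERVICES) -> Dict[str, str]:
    return {name: check(url) for name, url in services.items()}
```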

Metrics & Logging

🛠️ Development

Adding New Service

  1. Create new Maven module:
mkdir new-service
cd new-service
# Copy from template service
  2. Update root pom.xml:
<modules>
    <!-- existing modules -->
    <module>new-service</module>
</modules>
  3. Configure service discovery and config
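The `pom.xml` update in step 2 can be scripted. A minimal sketch that inserts a `<module>` entry just before the closing `</modules>` tag using plain string handling, so the file's existing formatting and comments are preserved. It assumes the root POM has one `<modules>` block with `</modules>` on its own line; `add_module` and `register_module` are hypothetical helpers.

```python
import re
from pathlib import Path


def add_module(pom_text: str, module: str) -> str:
    """Return pom_text with <module>module</module> added to <modules>, if missing."""
    entry = f"<module>{module}</module>"
    if entry in pom_text:
        return pom_text  # already registered; leave the file unchanged
    closing = re.search(r"^([ \t]*)</modules>", pom_text, flags=re.MULTILINE)
    if closing is None:
        raise ValueError("could not find a </modules> line in pom.xml")
    # Reuse the indentation of existing <module> lines, else indent past </modules>.
    existing = re.findall(r"^([ \t]*)<module>", pom_text, flags=re.MULTILINE)
    indent = existing[-1] if existing else closing.group(1) + "    "
    return pom_text[:closing.start()] + f"{indent}{entry}\n" + pom_text[closing.start():]


def register_module(pom_path: str, module: str) -> None:
    """Rewrite the root pom.xml in place with the new module registered."""
    path = Path(pom_path)
    path.write_text(add_module(path.read_text(encoding="utf-8"), module), encoding="utf-8")
```

Usage: `register_module("pom.xml", "new-service")`. Running it twice is harmless because an existing entry is left untouched.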

Testing

# Unit tests
mvn test

# Integration tests
mvn integration-test

# End-to-end tests
cd tests && python -m pytest

🔒 Security

  • JWT Authentication with Spring Security
  • Rate Limiting in the Gateway
  • Input Validation and sanitization
  • HTTPS/TLS for production
  • API Key management cho external services
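The README does not say which signing algorithm the User Service uses for its JWTs. Assuming HS256 with a shared secret (common with Spring Security, but unconfirmed here), this standard-library sketch shows what verification involves: recompute the HMAC-SHA256 signature over `header.payload`, compare it in constant time, then check the `exp` claim. `make_token` and `verify_token` are hypothetical helpers for illustration; production code should rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Any, Dict, Optional


def b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    # JWT strips '=' padding; restore it before decoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def make_token(claims: Dict[str, Any], secret: bytes) -> str:
    header = b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url_encode(json.dumps(claims).encode())
    signature = hmac.new(secret, f"{header}.{payload}".encode("ascii"), hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url_encode(signature)}"


def verify_token(token: str, secret: bytes, now: Optional[float] = None) -> Optional[Dict[str, Any]]:
    """Return the claims if the signature is valid and the token is unexpired, else None."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        header = json.loads(b64url_decode(header_b64))
        if not isinstance(header, dict) or header.get("alg") != "HS256":
            return None  # refuse unexpected algorithms such as "none"
        signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
        expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
            return None  # signature mismatch: wrong secret or tampered token
        claims = json.loads(b64url_decode(payload_b64))
        if not isinstance(claims, dict):
            return None
        current = time.time() if now is None else now
        if "exp" in claims and current >= float(claims["exp"]):
            return None  # token has expired
        return claims
    except (ValueError, TypeError):
        return None  # malformed token
```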

📊 Performance

Benchmarks

  • Gateway Throughput: 10,000 RPS
  • Vector Search: <100ms response time
  • Document Processing: 50 documents/minute
  • OCR Processing: 2-5 pages/minute

Scaling

  • Horizontal Scaling: Multiple instances with load balancing
  • Database Sharding: Partitioned by tenant/user
  • Caching Strategy: Redis for frequently accessed data
  • CDN: Static files and images
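The README does not say how the Redis cache is used. One common pattern is cache-aside: read from the cache, and on a miss load from the database and store the result with a TTL. A minimal sketch, assuming JSON-serializable values. `InMemoryCache` is a hypothetical stand-in that mimics the positional `get(name)` and `setex(name, time, value)` calls of a redis-py `redis.Redis` client, so a real client could be passed to `cached` instead; the loader function stands in for a database query.

```python
import json
import time
from typing import Any, Callable, Dict, Optional, Tuple


class InMemoryCache:
    """Tiny stand-in for redis.Redis supporting get() and setex() with expiry."""

    def __init__(self) -> None:
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, name: str) -> Optional[str]:
        entry = self._store.get(name)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[name]  # lazily evict expired keys on read
            return None
        return value

    def setex(self, name: str, ttl_seconds: int, value: str) -> None:
        self._store[name] = (time.monotonic() + ttl_seconds, value)


def cached(cache: Any, key: str, ttl_seconds: int, loader: Callable[[], Any]) -> Any:
    """Cache-aside read: return the cached value, or load, store, and return it."""
    hit = cache.get(key)  # redis-py returns bytes; json.loads accepts str or bytes
    if hit is not None:
        return json.loads(hit)
    value = loader()
    cache.setex(key, ttl_seconds, json.dumps(value))
    return value
```

The TTL bounds how stale a cached document can get; updates should also delete the key so readers do not wait out the full TTL.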

🚢 Deployment

Docker

# Build all services
docker-compose build

# Deploy to staging
docker-compose -f docker-compose.staging.yml up -d

# Deploy to production
docker-compose -f docker-compose.prod.yml up -d

Kubernetes

# Apply configurations
kubectl apply -f k8s/

# Check deployment status
kubectl get pods
kubectl get services

Cloud Deployment

  • AWS: EKS + RDS + ElastiCache + S3
  • GCP: GKE + Cloud SQL + Cloud Storage
  • Azure: AKS + Azure Database + Blob Storage

📚 Documentation

🤝 Contributing

  1. Fork repository
  2. Create feature branch: git checkout -b feature/new-feature
  3. Commit changes: git commit -m 'Add new feature'
  4. Push branch: git push origin feature/new-feature
  5. Create Pull Request

📄 License

This project is licensed under the MIT License - see LICENSE file.

🆘 Support


Built with ❤️ using Spring Boot, FastAPI, and modern microservice patterns
