1. Distributed Database Concepts
A distributed database consists of multiple interlinked databases that are spread across various
locations but appear to users as a single system. The key reasons for their use include better
performance, high fault tolerance, and scalability. Distributed databases hide the details of
distribution from users through the following properties:
- Location Transparency: Users access data without knowing its physical location.
- Replication Transparency: The system manages duplicated data across sites, so users do not need to
know how many copies exist or where they are stored.
- Scalability: New nodes can be added without changing how users access the data.
Example: A global e-commerce company uses distributed databases to manage inventory and
customer data across continents, ensuring faster access and reliability.
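To make location and replication transparency concrete, here is a minimal in-memory sketch. The `DistributedCatalog` class, the site names, and the `down` parameter are all hypothetical and exist only for illustration; a real system would route requests over a network rather than between dictionaries.

```python
class DistributedCatalog:
    """Hypothetical catalog that maps each key to the sites holding a copy."""

    def __init__(self):
        self.sites = {}      # site name -> {key: value}
        self.placement = {}  # key -> list of site names holding a replica

    def add_site(self, name):
        self.sites[name] = {}

    def put(self, key, value, replicas):
        # Placement is an administrative decision; the catalog then writes
        # every replica on the caller's behalf.
        for site in replicas:
            self.sites[site][key] = value
        self.placement[key] = list(replicas)

    def get(self, key, down=()):
        # Location transparency: the caller supplies only the key.
        # Replication transparency: any reachable replica can answer.
        for site in self.placement[key]:
            if site not in down:
                return self.sites[site][key]
        raise LookupError(f"no reachable replica for {key!r}")


db = DistributedCatalog()
for name in ("eu", "us", "asia"):
    db.add_site(name)
db.put("sku-42", {"stock": 7}, replicas=["eu", "us"])

print(db.get("sku-42"))               # answered by the "eu" replica
print(db.get("sku-42", down={"eu"}))  # "eu" unreachable, "us" answers instead
```

The reader never names a site in `get`, which is the point of both transparency properties: the same call keeps working when a replica becomes unreachable.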
2. Data Fragmentation, Replication, and Allocation Techniques
Efficient design of distributed databases relies on the following techniques:
1. **Data Fragmentation**:
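- Horizontal Fragmentation: A table is split into subsets of rows based on a condition. For example,
customer records are stored at the site serving each customer's region.
- Vertical Fragmentation: A table is split into subsets of columns, with each fragment keeping the
primary key so that rows can be rejoined, e.g., separating personal information from financial data.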
- Hybrid Fragmentation: Combines both horizontal and vertical techniques.
2. **Data Replication**:
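- Full Replication: The entire database is duplicated across all sites, which enhances data
availability and read performance but makes updates more expensive to propagate.
- Partial Replication: Only selected parts of the data, typically the most critical ones, are replicated.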
Example: Banking systems replicate transactional data for faster query response.
3. **Data Allocation**:
- Centralized Allocation: All data is stored at one location.
- Partitioned Allocation: Data is divided into fragments that are stored at multiple locations.
- Hybrid Allocation: A combination of centralized and partitioned methods.
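The fragmentation techniques above can be sketched on a small in-memory table. This is an illustrative example only: the `customers` rows, the column names, and the helper functions are assumptions, not part of any particular DBMS. It also shows that both kinds of fragments can be recombined into the original table, by union for horizontal fragments and by a join on the key for vertical ones.

```python
# Hypothetical customer table; every row carries the primary key "id".
customers = [
    {"id": 1, "name": "Ana", "region": "EU", "card_last4": "4111"},
    {"id": 2, "name": "Bob", "region": "US", "card_last4": "5500"},
    {"id": 3, "name": "Chen", "region": "EU", "card_last4": "3400"},
]


def horizontal_fragment(rows, predicate):
    """Keep whole rows that satisfy the predicate."""
    return [dict(r) for r in rows if predicate(r)]


def vertical_fragment(rows, columns, key="id"):
    """Keep only the given columns, plus the key needed to rejoin later."""
    return [{c: r[c] for c in (key, *columns)} for r in rows]


def reconstruct_horizontal(*fragments, key="id"):
    """Union of row fragments, ordered by key for easy comparison."""
    rows = [r for frag in fragments for r in frag]
    return sorted(rows, key=lambda r: r[key])


def reconstruct_vertical(left, right, key="id"):
    """Join two column fragments on the shared key."""
    right_by_key = {r[key]: r for r in right}
    return [{**l, **right_by_key[l[key]]} for l in left]


# Horizontal: rows are stored regionally.
eu_rows = horizontal_fragment(customers, lambda r: r["region"] == "EU")
other_rows = horizontal_fragment(customers, lambda r: r["region"] != "EU")

# Vertical: personal information is separated from financial data.
personal = vertical_fragment(customers, ["name", "region"])
financial = vertical_fragment(customers, ["card_last4"])

print(reconstruct_horizontal(eu_rows, other_rows) == customers)
print(reconstruct_vertical(personal, financial) == customers)
```

Hybrid fragmentation would apply `vertical_fragment` to the output of `horizontal_fragment` (or the reverse), and allocation then decides which site stores each resulting fragment.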
3. Types of Distributed Database Systems
Distributed databases are classified into the following types:
- **Homogeneous Distributed Databases**: Use the same database management system (DBMS)
across all sites, ensuring compatibility. Example: A retail chain using Oracle DBMS across
branches.
- **Heterogeneous Distributed Databases**: Use different DBMSs, requiring middleware for
interaction. Example: Combining MySQL and MongoDB in an e-commerce system.
- **Federated Databases**: Independent databases collaborate through a unified interface.
Example: Universities sharing research data.
- **Client-Server Systems**: A central server handles requests from clients. Example: Online gaming
platforms.
- **Peer-to-Peer Systems**: Each node acts as both client and server, such as in blockchain
networks.
4. Query Processing in Distributed Databases
Query processing in distributed databases aims to ensure efficient execution while overcoming
challenges such as data distribution and communication delays. Key aspects include:
1. **Challenges**:
- Data fragmentation across multiple locations.
- Communication overhead during data transfer.
2. **Optimization Techniques**:
- Query Decomposition: Splitting a complex query into smaller subqueries.
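- Cost-Based Optimization: Estimating the cost of alternative query plans, including the cost of
moving data between sites, and choosing the cheapest one.
- Join Strategies: Using semi-joins to reduce data transfer between nodes. Only the join-column
values are sent to the remote site first, and only the rows that match are sent back.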
3. **Distributed Query Algorithms**:
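Queries that update data at several sites run as distributed transactions. Protocols like two-phase
commit keep the sites consistent by making sure such a transaction either commits at every
participating site or aborts at all of them. Example: A bank transferring money between accounts
located in different branches.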
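The bank-transfer example can be sketched as a simplified, single-process version of two-phase commit. The `Participant` class and `two_phase_commit` function are hypothetical names, and the sketch deliberately leaves out what a real protocol needs: durable logging, timeouts, and recovery after a coordinator or participant crash.

```python
class Participant:
    """Hypothetical branch that holds one account balance."""

    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        self.pending = None  # change staged during the prepare phase

    def prepare(self, delta):
        # Phase 1: vote "yes" only if the change can be applied safely.
        if self.balance + delta < 0:
            return False
        self.pending = delta
        return True

    def commit(self):
        # Phase 2 (all voted yes): apply the staged change.
        if self.pending is not None:
            self.balance += self.pending
            self.pending = None

    def abort(self):
        # Phase 2 (someone voted no): discard the staged change.
        self.pending = None


def two_phase_commit(changes):
    """changes is a list of (participant, delta) pairs; returns True on commit."""
    votes = [p.prepare(delta) for p, delta in changes]
    if all(votes):
        for p, _ in changes:
            p.commit()
        return True
    for p, _ in changes:
        p.abort()
    return False


branch_a = Participant("branch_a", 100)
branch_b = Participant("branch_b", 50)

print(two_phase_commit([(branch_a, -30), (branch_b, +30)]))    # both vote yes
print(two_phase_commit([(branch_a, -500), (branch_b, +500)]))  # branch_a votes no
print(branch_a.balance, branch_b.balance)
```

The second transfer leaves both balances untouched even though `branch_b` was willing to accept the deposit, which is the all-or-nothing behavior the protocol is meant to provide.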
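The semi-join strategy listed under the optimization techniques can be sketched in the same way. The function name, the two "sites", and the `shipped` counter are illustrative assumptions; the counter is only a crude tally of items sent between sites, not a real cost model.

```python
def semi_join(orders_at_a, customers_at_b, key="customer_id"):
    """Join orders (site A) with customers (site B) using a semi-join."""
    # Step 1: site A sends only the distinct join-key values to site B.
    keys = {o[key] for o in orders_at_a}

    # Step 2: site B keeps only the matching rows and sends those back.
    reduced = [c for c in customers_at_b if c[key] in keys]

    # Step 3: site A completes the join locally.
    by_key = {c[key]: c for c in reduced}
    joined = [{**o, **by_key[o[key]]} for o in orders_at_a if o[key] in by_key]

    shipped = len(keys) + len(reduced)  # key values sent + rows returned
    return joined, shipped


orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 2},
    {"order_id": 12, "customer_id": 1},
]
customers = [{"customer_id": i, "name": f"c{i}"} for i in range(1, 1001)]

joined, shipped = semi_join(orders, customers)
print(len(joined), "joined rows;", shipped, "items shipped")
print("shipping the whole customer table would move", len(customers), "rows")
```

The saving depends on selectivity: when most rows at the remote site match, the extra round trip for the keys can cost more than it saves, which is why the choice is normally left to a cost-based optimizer.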