Skip to content

Document and announce the proposed architectural redesign #835

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 9, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
34 changes: 34 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@
[![Build Status](https://github.com/ruby-git/ruby-git/workflows/CI/badge.svg?branch=main)](https://github.com/ruby-git/ruby-git/actions?query=workflow%3ACI)
[![Conventional Commits](https://img.shields.io/badge/Conventional%20Commits-1.0.0-%23FE5196?logo=conventionalcommits&logoColor=white)](https://conventionalcommits.org)

- [📢 Architectural Redesign 📢](#-architectural-redesign-)
- [📢 We Now Use RuboCop 📢](#-we-now-use-rubocop-)
- [📢 Default Branch Rename 📢](#-default-branch-rename-)
- [📢 We've Switched to Conventional Commits 📢](#-weve-switched-to-conventional-commits-)
Expand All @@ -24,6 +25,39 @@
- [Ruby version support policy](#ruby-version-support-policy)
- [License](#license)

## 📢 Architectural Redesign 📢

The git gem is undergoing a significant architectural redesign for the upcoming
v5.0.0 release. The current architecture has several design challenges that make it
difficult to maintain and evolve. This redesign aims to address these issues by
introducing a clearer, more robust, and more testable structure.

We have prepared detailed documents outlining the analysis of the current
architecture and the proposed changes. We encourage our community and contributors to
review them:

1. [Analysis of the Current Architecture](redesign/1_architecture_existing.md): A
breakdown of the existing design and its challenges.
2. [The Proposed Redesign](redesign/2_architecture_redesign.md): An overview of the
new three-layered architecture.
3. [Implementation Plan](redesign/3_architecture_implementation.md): The step-by-step
plan for implementing the redesign.

Your feedback is welcome! Please feel free to open an issue to discuss the proposed
changes.

> **DON'T PANIC!**
>
> While this is a major internal refactoring, our goal is to keep the primary public
API on the main repository object as stable as possible. Most users who rely on
documented methods like `g.commit`, `g.add`, and `g.status` should find the
transition to v5.0.0 straightforward.
>
> The breaking changes will primarily affect users who have been relying on the
internal g.lib accessor, which will be removed as part of this cleanup. For more
details, please see the "Impact on Users" section in [the redesign
document](redesign/2_architecture_redesign.md).

## 📢 We Now Use RuboCop 📢

To improve code consistency and maintainability, the `ruby-git` project has now
Expand Down
66 changes: 66 additions & 0 deletions redesign/1_architecture_existing.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,66 @@
# Analysis of the Current Git Gem Architecture and Its Challenges

This document provides an in-depth look at the current architecture of the `git` gem, outlining its primary components and the design challenges that have emerged over time. Understanding these challenges is the key motivation for the proposed architectural redesign.

- [1. Overview of the Current Architecture](#1-overview-of-the-current-architecture)
- [2. Key Architectural Challenges](#2-key-architectural-challenges)
- [A. Unclear Separation of Concerns](#a-unclear-separation-of-concerns)
- [B. Circular Dependency](#b-circular-dependency)
- [C. Undefined Public API Boundary](#c-undefined-public-api-boundary)
- [D. Slow and Brittle Test Suite](#d-slow-and-brittle-test-suite)

## 1. Overview of the Current Architecture

The gem's current design is centered around three main classes: `Git`, `Git::Base`, and `Git::Lib`.

- **`Git` (Top-Level Module)**: This module serves as the primary public entry point for creating repository objects. It contains class-level factory methods like `Git.open`, `Git.clone`, and `Git.init`. It also provides an interface for accessing global git configuration settings.

**`Git::Base`**: This is the main object that users interact with after creating or opening a repository. It holds the high-level public API for most git operations (e.g., `g.commit`, `g.add`, `g.status`). It is responsible for managing the repository's state, such as the paths to the working directory and the `.git` directory.

**`Git::Lib`**: This class is intended to be the low-level wrapper around the `git` command-line tool. It contains the methods that build the specific command-line arguments and execute the `git` binary. In practice, it also contains a significant amount of logic for parsing the output of these commands.

## 2. Key Architectural Challenges

While this structure has been functional, several significant design challenges make the codebase difficult to maintain, test, and evolve.

### A. Unclear Separation of Concerns

The responsibilities between Git::Base and Git::Lib are "muddy" and overlap significantly.

- `Git::Base` sometimes contains logic that feels like it should be lower-level.

- `Git::Lib`, which should ideally only be concerned with command execution, is filled with high-level logic for parsing command output into specific Ruby objects (e.g., parsing log output, diff stats, and branch lists).

This blending of responsibilities makes it hard to determine where a specific piece of logic should reside, leading to an inconsistent and confusing internal structure.

### B. Circular Dependency

This is the most critical architectural flaw in the current design.

- A `Git::Base` instance is created.

- The first time a command is run, `Git::Base` lazily initializes a `Git::Lib` instance via its `.lib` accessor method.

- The `Git::Lib` constructor is passed the `Git::Base` instance (`self`) so that it can read the repository's path configuration back from the object that is creating it.

This creates a tight, circular coupling: `Git::Base` depends on `Git::Lib` to execute commands, but `Git::Lib` depends on `Git::Base` for its own configuration. This pattern makes the classes difficula to instantiate or test in isolation and creates a fragile system where changes in one class can have unexpected side effects in the other.

### C. Undefined Public API Boundary

The gem lacks a formally defined public interface. Because `Git::Base` exposes its internal `Git::Lib` instance via the public `g.lib` accessor, many users have come to rely on `Git::Lib` and its methods as if they were part of the public API.

This has two negative consequences:

1. It prevents the gem's maintainers from refactoring or changing the internal implementation of `Git::Lib` without causing breaking changes for users.

2. It exposes complex, internal methods to users, creating a confusing and inconsistent user experience.

### D. Slow and Brittle Test Suite

The current testing strategy, built on `TestUnit`, suffers from two major issues:

- **Over-reliance on Fixtures**: Most tests depend on having a complete, physical git repository on the filesystem to run against. Managing these fixtures is cumbersome.

- **Excessive Shelling Out**: Because the logic for command execution and output parsing are tightly coupled, nearly every test must shell out to the actual `git` command-line tool.

This makes the test suite extremely slow, especially on non-UNIX platforms like Windows where process creation is more expensive. The slow feedback loop discourages frequent testing and makes development more difficult. The brittleness of filesystem-dependent tests also leads to flickering or unreliable test runs.
130 changes: 130 additions & 0 deletions redesign/2_architecture_redesign.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,130 @@
# Proposed Redesigned Architecture for the Git Gem

This issue outlines a proposal for a major redesign of the git gem, targeted for version 5.0.0. The goal of this redesign is to modernize the gem's architecture, making it more robust, maintainable, testable, and easier for new contributors to understand.

- [1. Motivation](#1-motivation)
- [2. The New Architecture: A Three-Layered Approach](#2-the-new-architecture-a-three-layered-approach)
- [3. Key Design Principles](#3-key-design-principles)
- [A. Clear Public vs. Private API](#a-clear-public-vs-private-api)
- [B. Dependency Injection](#b-dependency-injection)
- [C. Immutable Return Values](#c-immutable-return-values)
- [D. Clear Naming for Path Objects](#d-clear-naming-for-path-objects)
- [4. Testing Strategy Overhaul](#4-testing-strategy-overhaul)
- [5. Impact on Users: Breaking Changes for v5.0.0](#5-impact-on-users-breaking-changes-for-v500)

## 1. Motivation

The current architecture, while functional, has several design issues that have accrued over time, making it difficult to extend and maintain.

- **Unclear Separation of Concerns**: The responsibilities of the `Git`, `Git::Base`, and `Git::Lib` classes are "muddy." `Git::Base` acts as both a high-level API and a factory, while `Git::Lib` contains a mix of low-level command execution and high-level output parsing.

- **Circular Dependency**: A key architectural flaw is the circular dependency between `Git::Base` and `Git::Lib`. `Git::Base` creates and depends on `Git::Lib`, but `Git::Lib`'s constructor requires an instance of Git::Base to access configuration. This tight coupling makes the classes difficult to reason about and test in isolation.

- **Undefined Public API**: The boundary between the gem's public API and its internal implementation is not clearly defined. This has led some users to rely on internal classes like `Git::Lib`, making it difficult to refactor the internals without introducing breaking changes.

- **Slow and Brittle Test Suite**: The current tests rely heavily on filesystem fixtures and shelling out to the git command line for almost every test case. This makes the test suite slow and difficult to maintain, especially on non-UNIX platforms.

## 2. The New Architecture: A Three-Layered Approach

The new design is built on a clear separation of concerns, dividing responsibilities into three distinct layers: a Facade, an Execution Context, and Command Objects.

1. The Facade Layer: Git::Repository

This is the primary public interface that users will interact with.

**Renaming**: `Git::Base` will be renamed to `Git::Repository`. This name is more descriptive and intuitive.

**Responsibility**: It will serve as a clean, high-level facade for all common git operations. Its methods will be simple, one-line calls that delegate the actual work to an appropriate command object.

**Scalability**: To prevent this class from growing too large, its methods will be organized into logical modules (e.g., `Git::Repository::Branching`, `Git::Repository::History`) which are then included in the main class. This keeps the core class definition small and the features well-organized. These categories will be inspired by (but not slavishly follow) the git command line reference in [this page](https://git-scm.com/docs).

2. The Execution Layer: Git::ExecutionContext

This is the low-level, private engine for running commands.

**Renaming**: `Git::Lib` will be renamed to `Git::ExecutionContext`.

**Responsibility**: Its sole purpose is to execute raw git commands. It will manage the repository's environment (working directory, .git path, logger) and use the existing `Git::CommandLine` class to interact with the system's git binary. It will have no knowledge of any specific git command's arguments or output.

3. The Logic Layer: Git::Commands

This is where all the command-specific implementation details will live.

**New Classes**: For each git operation, a new command class will be created within the `Git::Commands` namespace (e.g., `Git::Commands::Commit`, `Git::Commands::Diff`).

**Dual Responsibility**: Each command class will be responsible for:

1. **Building Arguments**: Translating high-level Ruby options into the specific command-line flags and arguments that git expects.

2. **Parsing Output**: Taking the raw string output from the ExecutionContext and converting it into rich, structured Ruby objects.

**Handling Complexity**: For commands with multiple behaviors (like git diff), we can use specialized subclasses (e.g., Git::Commands::Diff::NameStatus, Git::Commands::Diff::Stats) to keep each class focused on a single responsibility.

## 3. Key Design Principles

The new architecture will be guided by the following modern design principles.

### A. Clear Public vs. Private API

A primary goal of this redesign is to establish a crisp boundary between the public API and internal implementation details.

- **Public Interface**: The public API will consist of the `Git` module (for factories), the `Git::Repository` class, and the specialized data/query objects it returns (e.g., `Git::Log`, `Git::Status`, `Git::Object::Commit`).

- **Private Implementation**: All other components, including `Git::ExecutionContext` and all classes within the `Git::Commands` namespace, will be considered internal. They will be explicitly marked with the `@api private` YARD tag to discourage external use.

### B. Dependency Injection

The circular dependency will be resolved by implementing a clear, one-way dependency flow.

1. The factory methods (`Git.open`, `Git.clone`) will create and configure an instance of `Git::ExecutionContext`.

2. This `ExecutionContext` instance will then be injected into the constructor of the `Git::Repository` object.

This decouples the `Repository` from its execution environment, making the system more modular and easier to test.

### C. Immutable Return Values

To create a more predictable and robust API, methods will return structured, immutable data objects instead of raw strings or hashes.

This will be implemented using `Data.define` or simple, frozen `Struct`s.

For example, instead of returning a raw string, `repo.config('user.name')` will return a `Git::Config::Value` object containing the key, value, scope, and source path.

### D. Clear Naming for Path Objects

To improve clarity, all classes that represent filesystem paths will be renamed to follow a consistent `...Path` suffix convention.

- `Git::WorkingDirectory` -> `Git::WorkingTreePath`

- `Git::Index` -> `Git::IndexPath`

- The old `Git::Repository` (representing the .git directory/file) -> `Git::RepositoryPath`

## 4. Testing Strategy Overhaul

The test suite will be modernized to be faster, more reliable, and easier to work with.

- **Migration to RSpec**: The entire test suite will be migrated from TestUnit to RSpec to leverage its modern tooling and expressive DSL.

- **Layered Testing**: A three-layered testing strategy will be adopted:

1. **Unit Tests**: The majority of tests will be fast, isolated unit tests for the `Command` classes, using mock `ExecutionContexts`.

2. **Integration Tests**: A small number of integration tests will verify that `ExecutionContext` correctly interacts with the system's `git` binary.

3. **Feature Tests**: A minimal set of high-level tests will ensure the public facade on `Git::Repository` works end-to-end.

- **Reduced Filesystem Dependency**: This new structure will dramatically reduce the suite's reliance on slow and brittle filesystem fixtures.

## 5. Impact on Users: Breaking Changes for v5.0.0

This redesign is a significant undertaking and will be released as version 5.0.0. It includes several breaking changes that users will need to be aware of when upgrading.

- **`Git::Lib` is Removed**: Any code directly referencing `Git::Lib` will break.

- **g.lib Accessor is Removed**: The `.lib` accessor on repository objects will be removed.

- **Internal Methods Relocated**: Methods that were previously accessible via g.lib will now be private implementation details of the new command classes and will not be directly reachable.

Users should only rely on the newly defined public interface.

Loading