Skip to content

refactor(mv): Refactor HardlinkError for improved error handling #8408

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

amirhosseinghanipour
Copy link
Contributor

This pull request refactors the HardlinkError enum within the mv utility to provide more structured and descriptive error handling to significantly improve diagnostics for developers and end-users.

Motivation

The previous implementation of HardlinkError included a generic variant for scan-related failures:

#[derive(Debug)]
pub enum HardlinkError {
    Io(io::Error),
    Scan(String),
    Preservation { source: PathBuf, target: PathBuf },
    Metadata { path: PathBuf, error: io::Error },
}

While functional, the Scan(String) variant was a significant limitation. When a hardlink scan failed (for example, during a recursive directory traversal), the only information captured was a simple string message. This approach made it difficult to programmatically handle the error or to provide a user with actionable feedback. The underlying cause of the failure—such as a permissions issue, a non-existent directory, or another I/O problem—was obscured.

For a core utility, where robustness and clear diagnostics are paramount, a more precise error-handling mechanism was necessary.

Implementation

To address this, I decided to replace the opaque String with a structured error variant that captures the full context of the failure.

The Scan variant was removed and replaced with the following:

#[derive(Debug)]
pub enum HardlinkError {
    Io(io::Error),
    Preservation { source: PathBuf, target: PathBuf },
    Metadata { path: PathBuf, error: io::Error },
    ScanReadDir { path: PathBuf, error: io::Error },
}

This new ScanReadDir variant is substantially more useful for several key reasons:

  1. Contextual Path Information: It explicitly captures the path of the directory that could not be read. This immediately tells the user or developer where the error occurred.
  2. Underlying I/O Error: It preserves the original io::Error. This is crucial because it contains the low-level details of why the operation failed (e.g., PermissionDenied, NotFound, etc.).
  3. Improved Diagnostics: With this structured information, we can generate far more informative error messages. The Display implementation was updated to format a message like: Failed to read directory during scan /path/to/problem/dir: Permission denied.

This refactoring required updating all call sites that previously used HardlinkError::Scan. Therefore, the new error information is propagated correctly throughout the hardlink module.

Merging this change enhances the validity of the mv command's hardlink preservation feature. Debugging failures related to directory scanning is now significantly more straightforward.

@amirhosseinghanipour amirhosseinghanipour force-pushed the refactor/hardlink-error-enum branch 2 times, most recently from c32657c to e4ebc98 Compare July 29, 2025 03:54
Copy link

GNU testsuite comparison:

Skipping an intermittent issue tests/misc/tee (passes in this run but fails in the 'main' branch)

@RenjiSann
Copy link
Collaborator

Please run cargo fmt.

I'm not sure this addresses any known issue.
Is this needed for GNU compatibility, or is it pure refactoring ?

Also, tbh this PR looks like it is 100% AI-generated, but I'm not sure what's our stance regarding AI-generated code.

@amirhosseinghanipour
Copy link
Contributor Author

Indeed, this is a pure refactoring and doesn't address a pre-existing issue or GNU compatibility. My motivation was to make the error handling for hardlink scanning more specific. I felt that the old Scan(String) error was too generic. When a failure occurred, it wasn't clear what part of the scan had failed. I renamed it to ScanReadDir specifically because the error happens when the code fails to read a directory's contents during the recursive scan, not some other part of the scanning logic. However, I completely understand if the project prefers to avoid refactoring for its own sake.

Regarding the AI, I apologize if the style came across as impersonal. As I'm new to this and mentioned that in Sylvestre's comment, I've been using AI as a tool to help me structure my thoughts and make sure my code suggestions are idiomatic. I'm driving the changes and doing my best to understand and test everything, but I can see how it made the description feel less authentic. I'll be more mindful of that and write in a more direct style going forward.

I'll run cargo fmt right away and push the changes. Thanks again for the guidance! That means a lot!

@amirhosseinghanipour amirhosseinghanipour force-pushed the refactor/hardlink-error-enum branch from e4ebc98 to b2a7ce7 Compare July 29, 2025 12:12
@amirhosseinghanipour
Copy link
Contributor Author

By the way, I do have a follow-up question to better understand the project's contribution philosophy. I'd like to know what the general stance is on pure refactoring PRs like this one. I ask because in my experience, targeted refactoring can help a lot with long-term maintenance.

I do respect that every project has its own priorities. Knowing if you're generally open to these kinds of PRs will help me focus my future efforts on contributions that are most valuable to the team.

@amirhosseinghanipour amirhosseinghanipour force-pushed the refactor/hardlink-error-enum branch from b2a7ce7 to 9102fa8 Compare July 29, 2025 12:27
@sylvestre
Copy link
Contributor

our priorities are:

  • gnu compatibility
  • performance improvements
  • improvement on code quality (less code, easier to read, remove duplication)

in this PR, please explain where you saw the previous error and how to trigger the new one.

@amirhosseinghanipour
Copy link
Contributor Author

I've also fixed the redundant_closure error that Clippy caught and force-pushed the amended commit.

Copy link

GNU testsuite comparison:

Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)

@amirhosseinghanipour
Copy link
Contributor Author

in this PR, please explain where you saw the previous error and how to trigger the new one.

Previously, if you tried to move a directory without read permissions, the command would succeed silently. This happened because the hardlink scan was designed to gracefully ignore read errors, and the subsequent fs::rename call doesn't require read permissions to succeed on the same filesystem. You can trigger it on main with:

mkdir -p /tmp/mv_test/{src/unreadable_dir,dest} && \
chmod a-r /tmp/mv_test/src/unreadable_dir && \
./target/debug/mv /tmp/mv_test/src/unreadable_dir /tmp/mv_test/dest && \
rm -rf /tmp/mv_test

So the command does the job and exits successfully with no error message.

The ultimate goal of this PR is to make mv fail immediately when it encounters a permission error to provide a clear and descriptive message to the user. This is a pure refactoring of the error type as discussed. My plan was to follow this up with a second PR that modifies mv.rs to use this new, more specific error. That second change would fully implement the new behavior" described above by stopping the silent failure. I kept them separate to make the reviews cleaner, but I wanted to provide the full context for this change. This will make it easier for users to diagnose file system permission issues, which I personally believe is crucial, even contrary to Unix.

To keep it short, now, when the hardlink scan fails to read a directory due to a permission error, that error is immediately reported to the user, and the program exits. This aligns with the principle of failing fast and makes the tool's behavior less surprising.

If this foundational PR is accepted, my very next PR will modify mv.rs to correctly deliver the error. The result will be that the mv command will no longer fail silently. Instead, it will produce a clear error.

<git:(refactor/hardlink-error-enum*)> ./target/debug/mv /tmp/mv_test/src/unreadable_dir /tmp/mv_test/dest

./target/debug/mv: I/O error during hardlink operation: Failed to read directory during scan /tmp/mv_src/unreadable_dir: Permission denied (os error 13)

Please let me know if you'd like me to proceed with this two-step plan. If you agree, I'll await the review of this current PR. If not, I completely understand, and I will close this PR.

@sylvestre
Copy link
Contributor

in that case, please write a test then :)

@amirhosseinghanipour
Copy link
Contributor Author

Oh yeah wonderful! :D I've implemented the changes we discussed. I've specifically focused on addressing the refactoring and I've now added a test case that explicitly demonstrates how the new error handling works. Following up on my previous message, I also wanted to clarify that this refactoring wasn't isolated. To correctly implement the new error structure and ensure that existing functionalities continued to work (and their tests passed!), I had to make targeted edits in other parts of the mv utility. These changes were essential because the program needs to identify and react to specific failure conditions rather than go silently. The tests for these updated flows now pass and confirm the correct behavior. I'm done. :)

Copy link

GNU testsuite comparison:

Skipping an intermittent issue tests/misc/tee (passes in this run but fails in the 'main' branch)

@amirhosseinghanipour amirhosseinghanipour force-pushed the refactor/hardlink-error-enum branch from 7c8b8e0 to 5364cf4 Compare July 29, 2025 21:51
@amirhosseinghanipour
Copy link
Contributor Author

Added #[cfg(unix)] to prevent test_mv_hardlink_permission_error compilation on Windows due to reliance on the Unix file permission API than Windows ACLs.

Copy link

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/misc/tee (passes in this run but fails in the 'main' branch)

@amirhosseinghanipour amirhosseinghanipour force-pushed the refactor/hardlink-error-enum branch from 5364cf4 to 464972a Compare July 29, 2025 22:11
@amirhosseinghanipour
Copy link
Contributor Author

I've successfully verified changes by running tests on Linux. The tests pass on Linux platforms as expected. Since I don't have direct access to a Windows environment for testing, I'd appreciate it if a reviewer could verify the behavior and test outcomes on Windows. The problematic line was test test_mv_dir_into_file_where_both_are_files. However, I'm looking forward to seeing the fix.

Copy link

GNU testsuite comparison:

Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)

// For non-verbose mode, silently continue for missing files
// This provides graceful degradation - we'll lose hardlink info for this file
// but can still preserve hardlinks for other files
continue;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not too familiar with this code and how to reproduce this, but this comment seems to make a lot of sense, and your modification just prevents graceful degradation. I don't think that's an improvement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is here. It always continues silently on non-verbose and will further continue the move operation regardless of the scan failure. So, I strongly believe resolving a scan failure even contrary to GNU is a massive deal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I know my PR slightly deviates from the established GNU behavior and tried to point it out through my follow-ups. My core argument is that a tool should not succeed silengtly when the user's intent can't be fully/correctly carried out. The previous behavior hides underlying problems and lead to surprising outcomes for the user while being technically possible. And I also acknowledge the graceful degradation, but I think we can decide on something together and that is the user experience argument. I agree syscall itself doesn't require read permissions on the dir being moved. That's why GNU can move an unreadable dir, BUT from a user's perspective, this is deeply counter-intuitive and problematic. A user's mental model for file operations is built on consistency, and if they run ls /tmp/unreadable_dir and get Permission denied, they're absolutely surprised when mv /tmp/unreadable_dir /tmp/dest succeeds without a warning (I'm considering a very normal end user). This inconsistency makes the tool feel unpredictable and failing with a clear error message is more preditable and informational. <-- This is exactly where and why this PR began.

Another issue is that successfully moving an unreadable dir masks a permission problem. The user might think the operation was fully successful, but later they try to access content and it fails. As I'm trying to shed light on this matter, I believe you can now see it's far better to fail fast at the time of the move operation. This way, the user immediately gets aware of the permissions issue so they can fix it. Another point is we're scanning for hardlinks here even though the simplest move is just a rename. The previous behavior created a contradictory state that the hardlink scan would fail silently (this is the key word for me lol) due to permissions, but the rename would succeed. My change makes the behavoir consistent. And it doesn't matter if you're not into its code, it basically means it if a required scan fails if and only if due to permissions, the entire operation MUST fail. That's what I'm proposing in here.

));
set_exit_code(1);
continue;
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please explain why it's ok to not print an error (and call continue) if neither of the 2 tests above succeed.

(found by another AI...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My intention was to follow the existing error-handling patterns, and I noticed that the loop already uses continue in several places to skip a single problematic file and proceed with the rest of the operation. I was also trying to align my new error handling with the skip and continue pattern, that's why I kept the original continue. Speaking of set_exit_code(1), my goal was to handle specific, non-critical failures where we don't need to print a new error message because it can handle with the interactive flag for the sake of brevity. If we should print an error, please explain why and what. I'd love to know your suggestions.

.stderr_contains("Permission denied");

at.set_mode(unreadable_dir, 0o755);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is incorrect, and tests something that GNU coreutils (and uutils) is perfectly able to do before your change.

rm -rf tmp
mkdir -p tmp/src/unreadable_dir
chmod 0333 tmp/src/unreadable_dir
mkdir -p tmp/dst
mv tmp/src/unreadable_dir tmp/dst

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know the GNU way to successfully move a dir with 0333 permissions, that's not the problem, and my test asserts the opposite and is incorrect from a strict GNU compatibility view. I don't dispute this fact. Please read the whole PR again, I tried a lot to bring this change into the ecosystem. My argument is that this is a prime example of a legacy behavior that uutils has an opportunity to improve for the sake of a safer and more intuitive user experience. We all know that the move command performs two distinct ops. It attempts to scan the source dir and it children to preserve hardlink rels. This operation requires read permissions. The other one is a rename(2) syscall that requires write/execute permissions on the parent dirs, not on the dir being moved. So, GNU prioritizes the second step and ignores the failure of the first. This leads to two significant problems for the user. It fails silently because the user expects it to preserve hardlinks. When it encounters an unreadable dir, it can't perform the hardlink scan, meaning it can't guarantee the presentation of links within the dir. GNU proceeds with the move anyway and silently fails. The user is left with a potentially broken file structure and no warning that the operation was not fully successful. Also, a user who can't ls a dir would be very surprised that they can mv it lol. This success hides a very important permission issue that will almost certain cause problems later. My change surfaces this issue immediately, so I ask very nicely from user to do a fancy chmod. This is because I want to make sure the move command correctly and preditably completes. So the test case isn't wrong if you think this way. It imposes this stricter and in a safer contract. The test assrts that if mv can't properly inspect a source dir to fulfill hardlink preseveration, it should fail with a clear error rather than succeding partically and silently again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants