-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
refactor(mv): Refactor HardlinkError for improved error handling #8408
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
refactor(mv): Refactor HardlinkError for improved error handling #8408
Conversation
c32657c
to
e4ebc98
Compare
GNU testsuite comparison:
|
Please run I'm not sure this addresses any known issue. Also, tbh this PR looks like it is 100% AI-generated, but I'm not sure what's our stance regarding AI-generated code. |
Indeed, this is a pure refactoring and doesn't address a pre-existing issue or GNU compatibility. My motivation was to make the error handling for hardlink scanning more specific. I felt that the old Regarding the AI, I apologize if the style came across as impersonal. As I'm new to this and mentioned that in Sylvestre's comment, I've been using AI as a tool to help me structure my thoughts and make sure my code suggestions are idiomatic. I'm driving the changes and doing my best to understand and test everything, but I can see how it made the description feel less authentic. I'll be more mindful of that and write in a more direct style going forward. I'll run |
e4ebc98
to
b2a7ce7
Compare
By the way, I do have a follow-up question to better understand the project's contribution philosophy. I'd like to know what the general stance is on pure refactoring PRs like this one. I ask because in my experience, targeted refactoring can help a lot with long-term maintenance. I do respect that every project has its own priorities. Knowing if you're generally open to these kinds of PRs will help me focus my future efforts on contributions that are most valuable to the team. |
b2a7ce7
to
9102fa8
Compare
our priorities are:
in this PR, please explain where you saw the previous error and how to trigger the new one. |
I've also fixed the |
GNU testsuite comparison:
|
Previously, if you tried to move a directory without read permissions, the command would succeed silently. This happened because the hardlink scan was designed to gracefully ignore read errors, and the subsequent mkdir -p /tmp/mv_test/{src/unreadable_dir,dest} && \
chmod a-r /tmp/mv_test/src/unreadable_dir && \
./target/debug/mv /tmp/mv_test/src/unreadable_dir /tmp/mv_test/dest && \
rm -rf /tmp/mv_test So the command does the job and exits successfully with no error message. The ultimate goal of this PR is to make To keep it short, now, when the hardlink scan fails to read a directory due to a permission error, that error is immediately reported to the user, and the program exits. This aligns with the principle of failing fast and makes the tool's behavior less surprising. If this foundational PR is accepted, my very next PR will modify <git:(refactor/hardlink-error-enum*)> ./target/debug/mv /tmp/mv_test/src/unreadable_dir /tmp/mv_test/dest
./target/debug/mv: I/O error during hardlink operation: Failed to read directory during scan /tmp/mv_src/unreadable_dir: Permission denied (os error 13) Please let me know if you'd like me to proceed with this two-step plan. If you agree, I'll await the review of this current PR. If not, I completely understand, and I will close this PR. |
in that case, please write a test then :) |
Oh yeah wonderful! :D I've implemented the changes we discussed. I've specifically focused on addressing the refactoring and I've now added a test case that explicitly demonstrates how the new error handling works. Following up on my previous message, I also wanted to clarify that this refactoring wasn't isolated. To correctly implement the new error structure and ensure that existing functionalities continued to work (and their tests passed!), I had to make targeted edits in other parts of the |
GNU testsuite comparison:
|
7c8b8e0
to
5364cf4
Compare
Added |
GNU testsuite comparison:
|
5364cf4
to
464972a
Compare
I've successfully verified changes by running tests on Linux. The tests pass on Linux platforms as expected. Since I don't have direct access to a Windows environment for testing, I'd appreciate it if a reviewer could verify the behavior and test outcomes on Windows. The problematic line was test |
GNU testsuite comparison:
|
// For non-verbose mode, silently continue for missing files | ||
// This provides graceful degradation - we'll lose hardlink info for this file | ||
// but can still preserve hardlinks for other files | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not too familiar with this code and how to reproduce this, but this comment seems to make a lot of sense, and your modification just prevents graceful degradation. I don't think that's an improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is here. It always continues silently on non-verbose and will further continue the move operation regardless of the scan failure. So, I strongly believe resolving a scan failure even contrary to GNU is a massive deal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I know my PR slightly deviates from the established GNU behavior and tried to point it out through my follow-ups. My core argument is that a tool should not succeed silengtly when the user's intent can't be fully/correctly carried out. The previous behavior hides underlying problems and lead to surprising outcomes for the user while being technically possible. And I also acknowledge the graceful degradation, but I think we can decide on something together and that is the user experience argument. I agree syscall itself doesn't require read permissions on the dir being moved. That's why GNU can move an unreadable dir, BUT from a user's perspective, this is deeply counter-intuitive and problematic. A user's mental model for file operations is built on consistency, and if they run ls /tmp/unreadable_dir
and get Permission denied
, they're absolutely surprised when mv /tmp/unreadable_dir /tmp/dest
succeeds without a warning (I'm considering a very normal end user). This inconsistency makes the tool feel unpredictable and failing with a clear error message is more preditable and informational. <-- This is exactly where and why this PR began.
Another issue is that successfully moving an unreadable dir masks a permission problem. The user might think the operation was fully successful, but later they try to access content and it fails. As I'm trying to shed light on this matter, I believe you can now see it's far better to fail fast at the time of the move operation. This way, the user immediately gets aware of the permissions issue so they can fix it. Another point is we're scanning for hardlinks here even though the simplest move is just a rename. The previous behavior created a contradictory state that the hardlink scan would fail silently (this is the key word for me lol) due to permissions, but the rename would succeed. My change makes the behavoir consistent. And it doesn't matter if you're not into its code, it basically means it if a required scan fails if and only if due to permissions, the entire operation MUST fail. That's what I'm proposing in here.
)); | ||
set_exit_code(1); | ||
continue; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please explain why it's ok to not print an error (and call continue
) if neither of the 2 tests above succeed.
(found by another AI...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My intention was to follow the existing error-handling patterns, and I noticed that the loop already uses continue
in several places to skip a single problematic file and proceed with the rest of the operation. I was also trying to align my new error handling with the skip and continue pattern, that's why I kept the original continue
. Speaking of set_exit_code(1)
, my goal was to handle specific, non-critical failures where we don't need to print a new error message because it can handle with the interactive flag for the sake of brevity. If we should print an error, please explain why and what. I'd love to know your suggestions.
.stderr_contains("Permission denied"); | ||
|
||
at.set_mode(unreadable_dir, 0o755); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is incorrect, and tests something that GNU coreutils (and uutils) is perfectly able to do before your change.
rm -rf tmp
mkdir -p tmp/src/unreadable_dir
chmod 0333 tmp/src/unreadable_dir
mkdir -p tmp/dst
mv tmp/src/unreadable_dir tmp/dst
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know the GNU way to successfully move a dir with 0333 permissions, that's not the problem, and my test asserts the opposite and is incorrect from a strict GNU compatibility view. I don't dispute this fact. Please read the whole PR again, I tried a lot to bring this change into the ecosystem. My argument is that this is a prime example of a legacy behavior that uutils has an opportunity to improve for the sake of a safer and more intuitive user experience. We all know that the move command performs two distinct ops. It attempts to scan the source dir and it children to preserve hardlink rels. This operation requires read permissions. The other one is a rename(2) syscall that requires write/execute permissions on the parent dirs, not on the dir being moved. So, GNU prioritizes the second step and ignores the failure of the first. This leads to two significant problems for the user. It fails silently because the user expects it to preserve hardlinks. When it encounters an unreadable dir, it can't perform the hardlink scan, meaning it can't guarantee the presentation of links within the dir. GNU proceeds with the move anyway and silently fails. The user is left with a potentially broken file structure and no warning that the operation was not fully successful. Also, a user who can't ls
a dir would be very surprised that they can mv
it lol. This success hides a very important permission issue that will almost certain cause problems later. My change surfaces this issue immediately, so I ask very nicely from user to do a fancy chmod
. This is because I want to make sure the move command correctly and preditably completes. So the test case isn't wrong if you think this way. It imposes this stricter and in a safer contract. The test assrts that if mv
can't properly inspect a source dir to fulfill hardlink preseveration, it should fail with a clear error rather than succeding partically and silently again.
This pull request refactors the
HardlinkError
enum within themv
utility to provide more structured and descriptive error handling to significantly improve diagnostics for developers and end-users.Motivation
The previous implementation of
HardlinkError
included a generic variant for scan-related failures:While functional, the
Scan(String)
variant was a significant limitation. When a hardlink scan failed (for example, during a recursive directory traversal), the only information captured was a simple string message. This approach made it difficult to programmatically handle the error or to provide a user with actionable feedback. The underlying cause of the failure—such as a permissions issue, a non-existent directory, or another I/O problem—was obscured.For a core utility, where robustness and clear diagnostics are paramount, a more precise error-handling mechanism was necessary.
Implementation
To address this, I decided to replace the opaque
String
with a structured error variant that captures the full context of the failure.The
Scan
variant was removed and replaced with the following:This new
ScanReadDir
variant is substantially more useful for several key reasons:path
of the directory that could not be read. This immediately tells the user or developer where the error occurred.io::Error
. This is crucial because it contains the low-level details of why the operation failed (e.g.,PermissionDenied
,NotFound
, etc.).Display
implementation was updated to format a message like:Failed to read directory during scan /path/to/problem/dir: Permission denied
.This refactoring required updating all call sites that previously used
HardlinkError::Scan
. Therefore, the new error information is propagated correctly throughout the hardlink module.Merging this change enhances the validity of the
mv
command's hardlink preservation feature. Debugging failures related to directory scanning is now significantly more straightforward.