-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
split: pass GNU tests/b-chunk.sh #5475
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
GNU testsuite comparison:
|
@tertsdiepraam would you mind reviewing this one ? |
setting back to draft - even though it passes the GNU test, the implementation for ---io-blksize option is incomplete, needs more work |
ed7c9ad
to
ad29e42
Compare
GNU testsuite comparison:
|
It seems to be unnecessary since we have already made the path relative using `construct_dest_path`.
rustix & linux-raw-sys
* uucore support for illumos and solaris * use macro to consolidate illumos and solaris signals * fixing some CI issues * replaced macro with better cfg usage
GNU testsuite comparison:
|
GNU testsuite comparison:
|
@tertsdiepraam @sylvestre this one is ready for review |
} | ||
} | ||
None => b'\n', | ||
}; | ||
|
||
let io_blksize: Option<usize> = if let Some(s) = matches.get_one::<String>(OPT_IO_BLKSIZE) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about ?
let io_blksize: Option<usize> = matches.get_one::<String>(OPT_IO_BLKSIZE).map(|s| {
parse_size_u64(s)
.ok()
.and_then(|n| {
let n = n.try_into().map_err(|_| SettingsError::InvalidIOBlockSize(s.to_string()))?;
if n > OPT_IO_BLKSIZE_MAX {
Err(SettingsError::InvalidIOBlockSize(s.to_string()))
} else {
Ok(n)
}
})
}).transpose()?;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in this case it would "eat" possible error returned by parse_size_u64(), so it cannot be bubbled up to from()->Result. Also, since ok() converts it to Option and the and_then() expects an option to be returned from the closure, it is not straitforward to return the SettingsError in either places within the closure inside closure - i.e. doing return Err(SettingsError::InvalidIOBlockSize(s.to_string()))
or Some(Err(SettingsError::InvalidIOBlockSize(s.to_string())))
would not work : as return is not for from()->Result function but closure, and wrapping in Some() to satisfy and_then() signature we will end up with Option<Option<Result<>>>
as a retuning value from closure from map() instead of Option<Result<>>
... could unwrap(), but then still missing/eating error from parse_size_u64() ...
I could be missing something though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please ignore my comment then :(
src/uu/split/src/split.rs
Outdated
// empty STDIN stream, | ||
// and files with true file size 0 | ||
// will also fit here | ||
input_size = num_bytes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about an early exit ?
input_size = num_bytes; | |
Ok(num_bytes) |
src/uu/split/src/split.rs
Outdated
let mut tmp_fd = File::open(Path::new(input))?; | ||
let end = tmp_fd.seek(SeekFrom::End(0))?; | ||
if end > 0 { | ||
input_size = end; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same:
input_size = end; | |
Ok(end) |
src/uu/split/src/split.rs
Outdated
let read_limit: u64 = if let Some(n) = io_blksize { | ||
*n | ||
} else { | ||
OPT_IO_BLKSIZE_MAX | ||
} | ||
.try_into() | ||
.unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let read_limit: u64 = if let Some(n) = io_blksize { | |
*n | |
} else { | |
OPT_IO_BLKSIZE_MAX | |
} | |
.try_into() | |
.unwrap(); | |
let read_limit = io_blksize.unwrap_or(OPT_IO_BLKSIZE_MAX) as u64; |
src/uu/split/src/split.rs
Outdated
// could report incorrect file size via `metadata.len()` | ||
// either `0` while there is content | ||
// or a size larger than actual content | ||
input_size = get_irregular_input_size(input, reader, buf, io_blksize)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same, early return
input_size = get_irregular_input_size(input, reader, buf, io_blksize)?; | |
get_irregular_input_size(input, reader, buf, io_blksize) |
src/uu/split/src/split.rs
Outdated
// Regular file | ||
let metadata = metadata(input)?; | ||
input_size = metadata.len(); | ||
// Double check the size if `metadata.len()` reports `0` | ||
if input_size == 0 { | ||
input_size = get_irregular_input_size(input, reader, buf, io_blksize)?; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Regular file | |
let metadata = metadata(input)?; | |
input_size = metadata.len(); | |
// Double check the size if `metadata.len()` reports `0` | |
if input_size == 0 { | |
input_size = get_irregular_input_size(input, reader, buf, io_blksize)?; | |
} | |
let metadata = metadata(input)?; | |
let size = metadata.len(); | |
// If metadata reports size 0, use get_irregular_input_size to double-check | |
if size == 0 { | |
get_irregular_input_size(input, reader, buf, io_blksize) | |
} else { | |
Ok(size) | |
} |
GNU testsuite comparison:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Impressive work! Seeing that GNU test pass is great!
src/uu/split/src/split.rs
Outdated
// The ---io parameter is consumed and ignored. | ||
// The parameter is included to make GNU coreutils tests pass. | ||
static OPT_IO: &str = "-io"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's an example from GNU:
❯ split ---
split: option '---io-blksize' requires an argument
Try 'split --help' for more information.
❯ split ---io-bl
split: option '---io-blksize' requires an argument
Try 'split --help' for more information.
This is not a separate option, but one of those things where a long argument can be shortened to the first few letters if its unambiguous. clap
supports this too and I think it's turned on (it's infer_long_args(true)
) so do we need this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, looks like it is the case. will remove
src/uu/split/src/split.rs
Outdated
// of bytes per chunk. | ||
// | ||
// Get the size of the input in bytes | ||
let initial_buf: &mut Vec<u8> = &mut vec![]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let initial_buf: &mut Vec<u8> = &mut vec![]; | |
let initial_buf = Vec::new(); |
src/uu/split/src/split.rs
Outdated
if usize::BITS < 64 { | ||
let _: usize = num_chunks | ||
.try_into() | ||
.map_err(|_| USimpleError::new(1, "Number of chunks too big"))?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change? The original code makes sense to me. If we need it to fit in a usize
why wouldn't we make it a usize
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it made sense for the original code because num_chunks was later used in method calls that require usize like writers.iter_mut().take(num_chunks - 1)
, and there was a test to test for that error - looks like only for code coverage %. I left it in to keep the test, but looking at it more closely - it does not need to be usize, so I will remove it along with the test that checked for it.
src/uu/split/src/split.rs
Outdated
if input == "-" | ||
|| input.starts_with("/dev/") | ||
|| input.starts_with("/proc/") | ||
|| input.starts_with("/sys/") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels inherently brittle and I'm wondering whether there's a better way to do it. For example, we could have links to all of these filesystems and that should still work. Based on the syscalls they make, I don't think GNU has exceptions like, though I'm not entirely sure what they are doing instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think they might just try to get the length first and try this other method if that returns 0
? I'm not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in GNU they first go straight to reading it into a buffer and then compare that to what metadata/filesystem is reporting + do seek on the file if those disagree (as well as trying to copy things arround into temp file/buffer, etc) and some other edge cases. I tried to avoid reading file into a buffer first for everything, but could re-implement to closer follow GNU approach
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think doing what GNU does is the right call! Nice investigative work! (As a precaution: remember not to look at the GNU source code)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be ready to go
src/uu/split/src/split.rs
Outdated
loop { | ||
let chunk_size = chunk_size_base + (chunk_size_reminder > i) as u64; | ||
let buf: &mut Vec<u8> = &mut vec![]; | ||
i += 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this work?
for i in 1.. {
...
}
GNU testsuite comparison:
|
GNU testsuite comparison:
|
@tertsdiepraam @sylvestre |
GNU testsuite comparison:
|
@tertsdiepraam @sylvestre I have another set of changes ready that pass next Gnu test for split 'l-chunk', but they are based on code included in this PR. Should I add them into this PR or wait until this one is resolved and then submit those in a new one? |
--------- Co-authored-by: Terts Diepraam <terts.diepraam@gmail.com> Co-authored-by: Daniel Hofstetter <daniel.hofstetter@42dh.com> Co-authored-by: Brandon Elam Barker <brandon.barker@gmail.com> Co-authored-by: Kostiantyn Hryshchuk <statheres@gmail.com> Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
This PR adds
---io-blksize
option--number
strategies0
for file size, while actually having some content OR report size greater than actual contentpasses GNU tests/b-chunk.sh