Commit fa02a7c

feat!: hardlink deduplications (#291)
* feat!: hardlink deduplications (wip)
* test: multiple hardlinks to a single file
* fix: algorithm
* test: children
* refactor: allow customization
* feat(api): re-export `dashmap`
* feat: report deduplication results
* refactor: use `0.into()` instead
* refactor: just use `default`
* feat(api): ability to cause errors
* feat!: report hardlinks to progress
* feat(api): stop exposing `dashmap`
* feat(api): obscure container type
* docs: fix
* feat(api): newtype for inode numbers
* feat(api): make `Event` `non_exhaustive`
* feat(api): rename a struct
* feat: show shared links before errors
* feat(api): `ProgressReport::shared` should be `Size`
* perf: reduce string resizing
* feat(api): rename some structs
* feat(api): move `HardlinkList` to `hardlink`
* docs: fix
* feat(api): reflection types for hardlink types
* fix(windows): dependencies
* feat(json)!: add optional `shared-inodes`
* feat(api): add some aliases
* docs: correction
* docs: correction
* feat(api): `into_reflection`
* feat(api): `HardlinkList::{len,is_empty}`
* feat(json): output the hardlink record
* chore(git): revert a wrong change
  This reverts commit 4efa9b9.
* fix(json): move `"shared-inodes"` out of `"tree"`
* feat(api)!: rename `UnitAndTree` to `JsonDataBody`
* feat(windows): error on unsupported feature
* fix(json): `--json-input` with `--shared-inodes`
* feat(json): rename `"shared-inodes"` to `"shared"`
* refactor: this looks better
* feat(json): always sort `shared` by `ino`
* fix: git merge artifact
* fix: `TEXT_MAX_LEN`
* feat(api)!: remove `hook`
* feat(api): add some aliases
* feat(api): re-export `RecordHardlinksArgument`
* feat(api): create some aliases
* feat(api): expose `aware` and `ignorant`
* feat(api): simplify, eliminate `.clone()`
* refactor: shorten, consistency
* refactor: rename a trait to match its sole method
* refactor: shorten a name
* docs: `HardlinkAware::new`
* refactor: use `SmartDefault`
* feat(api): `DeduplicateSharedSize`
* refactor: move POSIX-exclusive code to its own module
* feat(api): `Reflection::{new,len,is_empty}`
* feat(api): `Reflection::iter`
* feat(api): summarize `HardlinkList`
* fix: compilation conditionals
* feat(api): `Reflection::{len,is_empty}`
* feat(json): `.shared.{details,summary}`
* feat(api): `Default` for `JsonShared`
* fix: report hardlinks
* feat(api): shorten a method name
* test: fix
* feat(json): print hardlinks summary
* fix(json): missing `shared`
* fix: clippy
* feat(json): omit `.shared.{details,summary}`
* feat(cli): forbid `--json-input` + `--deduplicate-hardlinks`
* feat(cli): define some aliases
* chore: generate completions
* refactor: prefer arrays
* docs(cli): make some aliases visible
* refactor: move iterators to its own module
* refactor: move iterator to its own module
* feat(api)!: pass params into `Sub::json_output`
* feat(api): convert `Infallible` to `RuntimeError`
* docs: add some text
* feat(api): make `RecordHardlinks` fallible
* feat(api): move instead of mutate
* feat: output json despite deduplication failures
* refactor: replace qualified with use
* feat: report the number of unrecorded links (wip)
* feat: report the number of unrecorded links
* fix: message
* fix: grammar
* feat: remove extra trailing newline
* feat(api): stop relying on `Into`
* feat(api): remove unneeded impls
* feat(api): better names
* docs: be more specific
* feat(api): revert back to mutations
  This reverts commit 8f79037.
* docs: real solutions
* docs: I have decided that I don't care
  There's just too many ways to create invalid tree (such as implementing a custom trait incorrectly for example) that it's not worth the efforts
* fix: windows
* test: fix windows
* fix: windows
* fix: windows
* test: fix windows
* test: fix windows
* test: fix windows
* test: fix windows
* test: fix windows
* chore: remove a lint silencer
* docs: resolve a todo
* feat(api): summary on no hardlinks
* docs: resolve a todo
* feat(api): remove eq traits from `LinkPathList`
  The order of items is non-deterministic
* docs: more details
* docs: correction
* test: reflections' equality
* feat: assert links equal
* test: change detection
* fix: windows
* ci(benchmark): add `deduplicate-hardlinks`
* docs(cli): add examples
* docs(cli): consistent wording
* docs(cli): more accurate description
* chore: update completions
* docs(readme): update regarding hardlinks
* fix: windows
* refactor: longer dot chain
* test: prepare some functions
* refactor: word wrap consistently
* test: less confusing names
* feat(api): add missing `Mul<usize>`
* fix(test): create the files
* docs: fix grammar
* test: complex tree
* style: add an empty line
* refactor: rearrange
* test: hardlinks summary
* refactor: use `.into_sorted_by`
* test: hardlinks details
* docs: plan
* refactor: rename a function
* refactor: add `.as_ref`
* test: add some (wip)
* test: sorted
* test: unique inodes
* test: combine
* test: correct setup function
* test: more even number
* test: summary
* test: use `assert_eq`
* test: without deduplication
* test: consistent convention
* test: create fewer files
* refactor: better formula
* docs: remove irrelevant comment
* test: `into_sorted_by_key`
* test: some hardlinks with deduplication
* refactor: remove unnecessary `vec!`
* test: hardlinks summary text
* refactor: shorten the code
* test: some hardlinks without deduplication
* refactor: rearrange
* refactor: split a test file
* refactor: rename a test file
* refactor: remove repeating suffices from test names
* feat(api): immutable setter methods
* chore(deps): add `derive_setters`
* feat(api): immutable setters for `Summary`
* test: exclusive hardlinks only
* test: mixing exclusive-only and external-only
* test: add a missing assertion
* test: external hardlinks only
* refactor: consistent naming
* test: simple tree with some hardlinks
* test: visualization
* style: remove some empty lines
* refactor: remove unnecessary cast
* feat: multiple arguments hardlinks deduplication
* test: multiple arguments hardlinks deduplication
* refactor: rename a test file
* feat: deduplicate arguments
* test: `remove_items_from_vec_by_indices`
* docs: quirk
* test: posix only
* refactor: use mockable trait
* test: add some tests
* test: correct `MockedApi::canonicalize`
* feat: spare symlinks from deduplication
* test: forgot to normalize
* test: fix `MockedApi::is_real_dir`
* test: fix a typo
* test: clear assertion message
* test: fix `resolve_symlink`
* test: add some tests
* style: missing trailing comma
* refactor: move tests to their own files
* test: deduplicate arguments
* chore: apply some suggestions
* style: apply copilot suggestion
* docs(cli): improve flag description
* refactor: use better terms
1 parent 155c4f1 commit fa02a7c

49 files changed, +4895 -92 lines changed

Cargo.lock

Lines changed: 85 additions & 1 deletion

Cargo.toml

Lines changed: 2 additions & 0 deletions
@@ -49,6 +49,7 @@ cli = ["clap/derive", "clap_complete", "clap-utilities", "json"]
 cli-completions = ["cli"]
 
 [dependencies]
+dashmap = "^6.1.0"
 pipe-trait = "^0.4.0"
 smart-default = "^0.7.1"
 derive_more = { version = "^2.0.1", features = ["full"] }
@@ -72,5 +73,6 @@ sysinfo = "^0.35.2"
 build-fs-tree = "^0.7.1"
 command-extra = "^1.0.0"
 maplit = "^1.0.2"
+normalize-path = "^0.2.1"
 pretty_assertions = "^1.4.1"
 rand = "^0.9.1"

README.md

Lines changed: 2 additions & 1 deletion
@@ -50,13 +50,14 @@ The benchmark was generated by [a GitHub Workflow](https://github.com/KSXGitHub/
 * Very fast.
 * Relative comparison of separate files.
 * Extensible via the library crate or JSON interface.
+* Unbiased regarding hardlinks: All hardlinks are treated as equally real.
+* Optional hardlink detection and deduplication (would make `pdu` proportionally slower).
 * Optional progress report (would make `pdu` slightly slower).
 * Customize tree depth.
 * Customize chart size.
 
 ## Limitations
 
-* Ignorant of hard links: All hard links are counted as real files.
 * Do not follow symbolic links.
 * Do not differentiate filesystem: Mounted folders are counted as normal folders.
 * The runtime is optimized at the expense of binary size.
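
The README change above advertises optional hardlink detection and deduplication. As a rough illustration of the idea, and not the code from this commit, the sketch below assumes that on POSIX systems a hardlinked file can be recognized by its (device, inode) pair. The function name `deduplicated_size` is hypothetical, and directory entries themselves are not counted.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt; // POSIX only: provides dev(), ino(), nlink()
use std::path::Path;

/// Sum the sizes of all files under `root`, counting every
/// (device, inode) pair with more than one link only once.
/// Hypothetical helper for illustration; not part of pdu's API.
fn deduplicated_size(root: &Path, seen: &mut HashSet<(u64, u64)>) -> io::Result<u64> {
    // symlink_metadata does not follow symbolic links.
    let meta = fs::symlink_metadata(root)?;
    if meta.is_dir() {
        let mut total = 0;
        for entry in fs::read_dir(root)? {
            total += deduplicated_size(&entry?.path(), seen)?;
        }
        return Ok(total);
    }
    // `insert` returns false when the pair was already recorded,
    // which means this path is another link to a file counted earlier.
    if meta.nlink() > 1 && !seen.insert((meta.dev(), meta.ino())) {
        return Ok(0);
    }
    Ok(meta.len())
}
```

Checking `nlink() > 1` first keeps the set small, since files with a single link can never be duplicates.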

ci/github-actions/benchmark/matrix.ts

Lines changed: 11 additions & 0 deletions
@@ -111,6 +111,17 @@ export const COMPETING_BENCHMARK_MATRIX: readonly CompetingBenchmarkCategory[] =
       ['du', '--count-links'],
     ],
   },
+  {
+    id: 'deduplicate-hardlinks',
+    pduCliArgs: ['--deduplicate-hardlinks'],
+    competitors: [
+      ['dust', '--no-progress'],
+      ['dua'],
+      ['ncdu', '-o', '/dev/stdout', '-0'],
+      ['gdu', '--non-interactive', '--no-progress'],
+      ['du'],
+    ],
+  },
   {
     id: 'top-down',
     pduCliArgs: ['--top-down'],

exports/completion.bash

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ _pdu() {
 
     case "${cmd}" in
         pdu)
-            opts="-h -V --json-input --json-output --bytes-format --top-down --align-right --quantity --depth --max-depth --width --total-width --column-width --min-ratio --no-sort --no-errors --silent-errors --progress --threads --help --version [FILES]..."
+            opts="-h -V --json-input --json-output --bytes-format --detect-links --dedupe-links --deduplicate-hardlinks --top-down --align-right --quantity --depth --max-depth --width --total-width --column-width --min-ratio --no-sort --no-errors --silent-errors --progress --threads --omit-json-shared-details --omit-json-shared-summary --help --version [FILES]..."
             if [[ ${cur} == -* || ${COMP_CWORD} -eq 1 ]] ; then
                 COMPREPLY=( $(compgen -W "${opts}" -- "${cur}") )
                 return 0

exports/completion.elv

Lines changed: 5 additions & 0 deletions
@@ -29,12 +29,17 @@ set edit:completion:arg-completer[pdu] = {|@words|
             cand --threads 'Set the maximum number of threads to spawn. Could be either "auto", "max", or a number'
             cand --json-input 'Read JSON data from stdin'
             cand --json-output 'Print JSON data instead of an ASCII chart'
+            cand --deduplicate-hardlinks 'Detect and subtract the sizes of hardlinks from their parent directory totals'
+            cand --detect-links 'Detect and subtract the sizes of hardlinks from their parent directory totals'
+            cand --dedupe-links 'Detect and subtract the sizes of hardlinks from their parent directory totals'
             cand --top-down 'Print the tree top-down instead of bottom-up'
             cand --align-right 'Set the root of the bars to the right'
             cand --no-sort 'Do not sort the branches in the tree'
             cand --silent-errors 'Prevent filesystem error messages from appearing in stderr'
             cand --no-errors 'Prevent filesystem error messages from appearing in stderr'
             cand --progress 'Report progress being made at the expense of performance'
+            cand --omit-json-shared-details 'Do not output `.shared.details` in the JSON output'
+            cand --omit-json-shared-summary 'Do not output `.shared.summary` in the JSON output'
             cand -h 'Print help (see more with ''--help'')'
             cand --help 'Print help (see more with ''--help'')'
             cand -V 'Print version'
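
The flag description above speaks of subtracting hardlink sizes "from their parent directory totals". One way to read that, sketched here under assumptions and not taken from the pdu source: a directory that contains k links to the same inode has counted its size k times, so k - 1 copies are subtracted from that directory. The function name `shared_size_corrections` and its signature are hypothetical.

```rust
use std::collections::BTreeMap;
use std::path::{Path, PathBuf};

/// For one inode of `size` bytes reachable through `links`, compute how many
/// bytes to subtract from each ancestor directory that contains several links.
/// Hypothetical helper for illustration; not part of pdu's API.
fn shared_size_corrections(size: u64, links: &[&Path]) -> BTreeMap<PathBuf, u64> {
    // Count how many links to the inode live under each ancestor directory.
    let mut link_counts: BTreeMap<PathBuf, u64> = BTreeMap::new();
    for link in links {
        // skip(1) drops the link path itself; relative paths end with an
        // empty ancestor, which is filtered out.
        for dir in link.ancestors().skip(1) {
            if dir.as_os_str().is_empty() {
                continue;
            }
            *link_counts.entry(dir.to_path_buf()).or_insert(0) += 1;
        }
    }
    // A directory holding k links counted the inode k times; keep one copy.
    link_counts
        .into_iter()
        .filter(|(_, count)| *count > 1)
        .map(|(dir, count)| (dir, (count - 1) * size))
        .collect()
}
```

With links `root/a/x`, `root/a/y`, and `root/b/z` to a 10-byte inode, this yields a correction of 10 for `root/a` and 20 for `root`, while `root/b` is left untouched because it holds only one link.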

exports/completion.fish

Lines changed: 3 additions & 0 deletions
@@ -11,10 +11,13 @@ complete -c pdu -l min-ratio -d 'Minimal size proportion required to appear' -r
 complete -c pdu -l threads -d 'Set the maximum number of threads to spawn. Could be either "auto", "max", or a number' -r
 complete -c pdu -l json-input -d 'Read JSON data from stdin'
 complete -c pdu -l json-output -d 'Print JSON data instead of an ASCII chart'
+complete -c pdu -l deduplicate-hardlinks -l detect-links -l dedupe-links -d 'Detect and subtract the sizes of hardlinks from their parent directory totals'
 complete -c pdu -l top-down -d 'Print the tree top-down instead of bottom-up'
 complete -c pdu -l align-right -d 'Set the root of the bars to the right'
 complete -c pdu -l no-sort -d 'Do not sort the branches in the tree'
 complete -c pdu -l silent-errors -l no-errors -d 'Prevent filesystem error messages from appearing in stderr'
 complete -c pdu -l progress -d 'Report progress being made at the expense of performance'
+complete -c pdu -l omit-json-shared-details -d 'Do not output `.shared.details` in the JSON output'
+complete -c pdu -l omit-json-shared-summary -d 'Do not output `.shared.summary` in the JSON output'
 complete -c pdu -s h -l help -d 'Print help (see more with \'--help\')'
 complete -c pdu -s V -l version -d 'Print version'

exports/completion.ps1

Lines changed: 5 additions & 0 deletions
@@ -32,12 +32,17 @@ Register-ArgumentCompleter -Native -CommandName 'pdu' -ScriptBlock {
             [CompletionResult]::new('--threads', '--threads', [CompletionResultType]::ParameterName, 'Set the maximum number of threads to spawn. Could be either "auto", "max", or a number')
             [CompletionResult]::new('--json-input', '--json-input', [CompletionResultType]::ParameterName, 'Read JSON data from stdin')
             [CompletionResult]::new('--json-output', '--json-output', [CompletionResultType]::ParameterName, 'Print JSON data instead of an ASCII chart')
+            [CompletionResult]::new('--deduplicate-hardlinks', '--deduplicate-hardlinks', [CompletionResultType]::ParameterName, 'Detect and subtract the sizes of hardlinks from their parent directory totals')
+            [CompletionResult]::new('--detect-links', '--detect-links', [CompletionResultType]::ParameterName, 'Detect and subtract the sizes of hardlinks from their parent directory totals')
+            [CompletionResult]::new('--dedupe-links', '--dedupe-links', [CompletionResultType]::ParameterName, 'Detect and subtract the sizes of hardlinks from their parent directory totals')
             [CompletionResult]::new('--top-down', '--top-down', [CompletionResultType]::ParameterName, 'Print the tree top-down instead of bottom-up')
             [CompletionResult]::new('--align-right', '--align-right', [CompletionResultType]::ParameterName, 'Set the root of the bars to the right')
             [CompletionResult]::new('--no-sort', '--no-sort', [CompletionResultType]::ParameterName, 'Do not sort the branches in the tree')
             [CompletionResult]::new('--silent-errors', '--silent-errors', [CompletionResultType]::ParameterName, 'Prevent filesystem error messages from appearing in stderr')
             [CompletionResult]::new('--no-errors', '--no-errors', [CompletionResultType]::ParameterName, 'Prevent filesystem error messages from appearing in stderr')
             [CompletionResult]::new('--progress', '--progress', [CompletionResultType]::ParameterName, 'Report progress being made at the expense of performance')
+            [CompletionResult]::new('--omit-json-shared-details', '--omit-json-shared-details', [CompletionResultType]::ParameterName, 'Do not output `.shared.details` in the JSON output')
+            [CompletionResult]::new('--omit-json-shared-summary', '--omit-json-shared-summary', [CompletionResultType]::ParameterName, 'Do not output `.shared.summary` in the JSON output')
             [CompletionResult]::new('-h', '-h', [CompletionResultType]::ParameterName, 'Print help (see more with ''--help'')')
             [CompletionResult]::new('--help', '--help', [CompletionResultType]::ParameterName, 'Print help (see more with ''--help'')')
             [CompletionResult]::new('-V', '-V ', [CompletionResultType]::ParameterName, 'Print version')

exports/completion.zsh

Lines changed: 6 additions & 1 deletion
@@ -28,14 +28,19 @@ block-count\:"Count numbers of blocks"))' \
 '*--column-width=[Maximum widths of the tree column and width of the bar column]:TREE_WIDTH:_default:TREE_WIDTH:_default' \
 '--min-ratio=[Minimal size proportion required to appear]:MIN_RATIO:_default' \
 '--threads=[Set the maximum number of threads to spawn. Could be either "auto", "max", or a number]:THREADS:_default' \
-'(--quantity)--json-input[Read JSON data from stdin]' \
+'(--quantity --deduplicate-hardlinks)--json-input[Read JSON data from stdin]' \
 '--json-output[Print JSON data instead of an ASCII chart]' \
+'--deduplicate-hardlinks[Detect and subtract the sizes of hardlinks from their parent directory totals]' \
+'--detect-links[Detect and subtract the sizes of hardlinks from their parent directory totals]' \
+'--dedupe-links[Detect and subtract the sizes of hardlinks from their parent directory totals]' \
 '--top-down[Print the tree top-down instead of bottom-up]' \
 '--align-right[Set the root of the bars to the right]' \
 '--no-sort[Do not sort the branches in the tree]' \
 '--silent-errors[Prevent filesystem error messages from appearing in stderr]' \
 '--no-errors[Prevent filesystem error messages from appearing in stderr]' \
 '--progress[Report progress being made at the expense of performance]' \
+'--omit-json-shared-details[Do not output \`.shared.details\` in the JSON output]' \
+'--omit-json-shared-summary[Do not output \`.shared.summary\` in the JSON output]' \
 '-h[Print help (see more with '\''--help'\'')]' \
 '--help[Print help (see more with '\''--help'\'')]' \
 '-V[Print version]' \
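
The zsh completion above places `--deduplicate-hardlinks` in the exclusion group of `--json-input`, matching the commit message entry that forbids combining the two. A minimal sketch of such a check follows. It is written against the standard library only, and the `Flags` struct, the `validate` function, and the error text are all hypothetical rather than taken from the pdu source.

```rust
/// Hypothetical subset of the parsed command-line flags.
#[derive(Debug, Default)]
struct Flags {
    json_input: bool,
    deduplicate_hardlinks: bool,
}

/// Reject the flag combination that the completion marks as exclusive.
/// The error message is made up for illustration.
fn validate(flags: &Flags) -> Result<(), String> {
    if flags.json_input && flags.deduplicate_hardlinks {
        return Err("--json-input cannot be combined with --deduplicate-hardlinks".to_string());
    }
    Ok(())
}
```

A project that parses its arguments with clap would more likely declare the conflict on the argument itself, so that the parser reports it before any application code runs.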
