Ignore Whitespaces Only Partially Works #66

Closed
@zoaked

Description

Preface
I'm not sure whether this is a bug or whether I am misusing the tool or configuring it incorrectly. If it is the latter, please treat this as a question; I would appreciate any information you can provide.

Thanks!

Describe the bug
When checking for equality, whitespace is correctly ignored. When generating inline differences, however, some whitespace is still compared while other whitespace is dropped from the result.

To Reproduce
Steps to reproduce the behavior:

DiffRowGenerator generator = DiffRowGenerator.create()
		.showInlineDiffs(true)
		.inlineDiffByWord(true) 
		.ignoreWhiteSpaces(true)
		.oldTag(f -> "~")      //introduce markdown style for strikethrough
		.newTag(f -> "**")     //introduce markdown style for bold
		.build();

//compute the differences for two test texts.
List<DiffRow> rows1 = generator.generateDiffRows(
		Arrays.asList("This\nis\na\ntest."),
		Arrays.asList("This is a test"));

or a more basic example using tabs instead of newlines...

//compute the differences for two test texts.
List<DiffRow> rows2 = generator.generateDiffRows(
		Arrays.asList("This\tis\ta\ttest."),
		Arrays.asList("This is a test"));

or an even more basic example that just changes the number of spaces...

//compute the differences for two test texts.
List<DiffRow> rows3 = generator.generateDiffRows(
		Arrays.asList("This  is  a  test."),
		Arrays.asList("This is a test"));

Actual Result

  • rows1:
    (period is correctly identified as an "old" tag while newlines are gone)
`Thisisatest~.~`

(spaces are considered to be "new" tags when we are asking for them to be ignored)

`This** **is** **a** **test`
  • rows2:
    (period is correctly identified as an "old" tag while tabs are also treated as "old" tags)
`This~    ~is~    ~a~    ~test~.~`

(spaces are considered to be "new" tags when we are asking for them to be ignored)

`This** **is** **a** **test`
  • rows3:
    (period is correctly identified as an "old" tag while the spaces are also treated as "old" tags)
`This~  ~is~  ~a~  ~test~.~`

(spaces are considered to be "new" tags when we are asking for them to be ignored)

`This** **is** **a** **test`

Expected Behavior

  • rows1:
This\nis\na\ntest~.~
This is a test
  • rows2:
This\tis\ta\ttest~.~
This is a test
  • rows3:
This  is  a  test~.~
This is a test

Notes
If the second string in any of the examples above also ends with a period, this does not happen: the entire row is identified as matching, as expected.
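To illustrate the note above, here is a minimal, library-free sketch of a whitespace-insensitive line comparison. The class name `WhitespaceEquality`, the `normalize` helper, and the `IGNORE_WHITESPACE` predicate are hypothetical names chosen for this example. They only approximate what a whitespace-ignoring equalizer might do and are not taken from the Diff Utils source.

```java
import java.util.function.BiPredicate;

public class WhitespaceEquality {
    // Collapse every run of whitespace (spaces, tabs, newlines) into a single space.
    static String normalize(String raw) {
        return raw.trim().replaceAll("\\s+", " ");
    }

    // Hypothetical stand-in for a whitespace-ignoring line equalizer (an assumption).
    static final BiPredicate<String, String> IGNORE_WHITESPACE =
            (a, b) -> normalize(a).equals(normalize(b));

    public static void main(String[] args) {
        // Both strings end with a period: the lines compare as equal.
        System.out.println(IGNORE_WHITESPACE.test("This\tis\ta\ttest.", "This is a test.")); // true
        // The period is missing on one side: the lines differ, so an inline diff is needed.
        System.out.println(IGNORE_WHITESPACE.test("This\tis\ta\ttest.", "This is a test"));  // false
    }
}
```

Under this assumption, the whole-line check succeeds whenever only whitespace differs, which would explain why the problem only shows up once some other difference (here, the period) forces an inline comparison.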

Suggested Fix

  1. Pass DiffRowGenerator.equalizer down to DiffUtils.diff when DiffRowGenerator.generateInlineDiffs is called.
  2. Use an internal identifier for merging/splitting instead of something that could be contained within the comparison.
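The first suggestion can be sketched without the library. The example below splits each line into word and delimiter tokens, then compares the tokens with a whitespace-insensitive predicate. Everything here is an assumption made for illustration: the class `InlineTokenSketch`, the simplified delimiter pattern, the `TOKEN_EQ` predicate, and the common-prefix walk, which stands in for a real diff algorithm and is not one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InlineTokenSketch {
    // Simplified delimiter set (assumption): a run of whitespace or a single period.
    private static final Pattern DELIM = Pattern.compile("\\s+|\\.");

    // Split text into words and delimiters, keeping the delimiters as tokens.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = DELIM.matcher(text);
        int pos = 0;
        while (m.find()) {
            if (m.start() > pos) tokens.add(text.substring(pos, m.start()));
            tokens.add(m.group());
            pos = m.end();
        }
        if (pos < text.length()) tokens.add(text.substring(pos));
        return tokens;
    }

    // Hypothetical token equalizer: any two whitespace-only tokens count as equal.
    static final BiPredicate<String, String> TOKEN_EQ =
            (a, b) -> a.equals(b) || (a.trim().isEmpty() && b.trim().isEmpty());

    // Count leading tokens that match under the given equalizer.
    // This is a stand-in for a real diff, used only to show the equalizer's effect.
    static int commonPrefix(List<String> a, List<String> b, BiPredicate<String, String> eq) {
        int i = 0;
        while (i < a.size() && i < b.size() && eq.test(a.get(i), b.get(i))) i++;
        return i;
    }

    public static void main(String[] args) {
        List<String> oldTokens = tokenize("This\tis\ta\ttest.");
        List<String> newTokens = tokenize("This is a test");
        // Strict equality stops at the first tab-versus-space token.
        System.out.println(commonPrefix(oldTokens, newTokens, String::equals)); // 1
        // The whitespace-insensitive equalizer matches every token except the final period.
        System.out.println(commonPrefix(oldTokens, newTokens, TOKEN_EQ));       // 7
    }
}
```

With the equalizer applied at the token level, only the trailing `.` remains unmatched in this sketch, which corresponds to the expected behavior described above.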

System

  • Java version: 8 (1.8.0_151)
  • Diff Utils Version: 4.5
