`split': invalid byte sequence in UTF-8 (ArgumentError) #188

@jatinganhotra

Hi,

For a project, I'm storing the diff for each commit as difffiles (one diff per file) and patches, including the stats for each patch.

My script starts like this:

require 'rubygems'
require 'logger'
require 'rugged'
require 'git'

# Include diff, difffile classes
load 'diff.rb'
load 'difffile.rb'
load 'helper.rb'

# Global array to store all the diffs
diffs_array = []

# Initialize both libraries
working_dir = `pwd`.chomp
rubygit_gem_repo = Git.open(working_dir, :log => Logger.new(STDOUT))
rugged_repo = Rugged::Repository.new(working_dir)

# Get all the commits for the project
commit_list = rubygit_gem_repo.log(nil)

# Get the initial empty tree state
empty_tree=`git hash-object -w -t tree /dev/null`
empty_state = rugged_repo.lookup("#{empty_tree.chomp}")

commit_list_array = commit_list.to_a

For 2 commits, I calculate the diff as follows:

  diff_bw_commits = rubygit_gem_repo.diff(prev_sha, next_sha)
  diff = Diff.new(prev_sha, next_sha, diff_bw_commits)
  diff.generate_difffiles_and_stats

In the generate_difffiles_and_stats function, I'm doing the following:

@difffiles = []
# diff.class => Git::Diff
# Get the stats for the diff, before extracting individual difffiles
@stats = @diff.stats
diff = @diff.to_a

self.generate_stats
@num_difffiles = diff.size
@num_difffiles.times do |i|
  difffile = DiffFile.new(diff[i])
  @difffiles << difffile
end

My script runs fine on simple commit histories that I created myself, but when I run it on the JSHint project, I get the following error (the output just before the traceback is the diff of a PDF file in that repository):

</CreationDate(D:20120619174250-04'00')/Creator(Adobe Illustrator CS5.1)/ModDat0000000000 65535 f-04'00')/Producer(Adobe PDF library 9.90)/Title(jshint)>>
+0000000016 00000 n
+0000000144 00000 n
+0001070928 00000 n
<</Size 32/Root 1 0 R/Info 31 0 R/ID[<6BDD672972174366B9A561E955D8F759><CE113017%%EOF00efBA27BDB89A4EB25>]>>
\ No newline at end of file
/Users/jatinganhotra/.rvm/gems/ruby-2.1.3@527project/gems/git-1.2.8/lib/git/diff.rb:121:in `split': invalid byte sequence in UTF-8 (ArgumentError)
    from /Users/jatinganhotra/.rvm/gems/ruby-2.1.3@527project/gems/git-1.2.8/lib/git/diff.rb:121:in `process_full_diff'
    from /Users/jatinganhotra/.rvm/gems/ruby-2.1.3@527project/gems/git-1.2.8/lib/git/diff.rb:107:in `process_full'
    from /Users/jatinganhotra/.rvm/gems/ruby-2.1.3@527project/gems/git-1.2.8/lib/git/diff.rb:64:in `each'
    from diff.rb:57:in `to_a'
    from diff.rb:57:in `generate_difffiles_and_stats'
    from script.rb:48:in `block in <main>'
    from script.rb:32:in `each'
    from script.rb:32:in `<main>'

I researched the error and found that it can be fixed with the approach described in this StackOverflow answer.
Am I doing something wrong?
Please let me know if you need any more information.
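
For reference, here is a minimal sketch of what I believe is happening, reproduced outside the gem. The `raw` string is a made-up stand-in for diff output that contains binary bytes (such as the PDF above); it is not taken from the gem's code. Calling `split` on a string tagged as UTF-8 but containing invalid bytes typically raises the same `ArgumentError`. Scrubbing the string (`String#scrub`, available since Ruby 2.1) or treating it as raw bytes avoids that.

```ruby
# Hypothetical diff text containing one byte (\xFF) that is not valid UTF-8.
raw = "diff --git a/logo.pdf b/logo.pdf\n+\xFF binary\n"

# The literal is tagged UTF-8, but its content is not valid UTF-8.
puts raw.valid_encoding?

# Splitting the invalid string is expected to fail like the gem does.
error_message = nil
begin
  raw.split("\n")
rescue ArgumentError => e
  error_message = e.message
end
puts error_message.inspect

# Workaround 1: replace invalid bytes with a placeholder, then split.
scrubbed_lines = raw.scrub("?").split("\n")

# Workaround 2: keep every byte, but treat the text as binary data.
binary_lines = raw.dup.force_encoding(Encoding::BINARY).split("\n")

puts scrubbed_lines.inspect
puts binary_lines.length
```

Scrubbing loses the original bytes, which seems acceptable for a binary file's diff; the binary variant keeps them but yields ASCII-8BIT strings.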

P.S. I know that this gem is not under active development, but it very nicely breaks down a Diff into Difffiles and Patches, and I can easily access the stats for each commit and for each file. That is why I keep using it for diffs. I looked at the Rugged gem, but couldn't find equivalent functionality. So, I just love this gem for this :)
