IOError when scanning file with odd chars #98

Open
@rob99

I inadvertently pasted some text from Word into a Ruby comment, and that text contained Word-style inverted commas (curly quotation marks). When I ran CodeRay.scan_file on that file, it failed with:

IOError (Cannot run program "file" (in directory "C:\tb\port_compare"): CreateProcess error=2, The system cannot find the file specified)

...which was thrown at lib/coderay/scanner.rb:120 (method guess_encoding). Further up the stack in normalize I could see where it was branching to encode_with_encoding (as opposed to to_unix) so I commented that out to force it to use to_unix.

Then I retried and received this error:

CodeRay::Scanners::Scanner::ScanError (

***ERROR in scanner.rb:200:in `tokenize': invalid byte sequence in UTF-8 (after 0 tokens)

tokens:


current line: 55  column: 89  pos: 1673
matched: "# WTF? AND data_srce_sys_cde / id_prod_cmpnt_cde_1 are in \x93Interest Only\x94 list"  state: "Error in CodeRay::Scanners::Ruby#scan_tokens, initial state was: :initial"
bol? = false,  eos? = false

surrounding code:
"_1 are in \u0093Interest Only\u0094 list"  ~~  "\n          return :bullet_inte"


***ERROR***

...which helped me diagnose the root problem.
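For anyone hitting the same thing: the \x93 and \x94 bytes in the error output are what Windows-1252 (the "ANSI" code page) uses for curly double quotes, and on their own they are not valid UTF-8. Here is a minimal sketch that reproduces the problem outside CodeRay and shows one way to transcode the text. The sample string is just the fragment from the error message; nothing here is CodeRay's own code.

```ruby
# Bytes as they would sit in an ANSI (Windows-1252) file; .b gives a binary copy.
raw = "in \x93Interest Only\x94 list".b

# Labelled as UTF-8, the bytes are invalid, which is what the scanner trips over.
as_utf8 = raw.dup.force_encoding(Encoding::UTF_8)
puts as_utf8.valid_encoding?

# Labelling the bytes as Windows-1252 and transcoding yields valid UTF-8
# with real curly quotes (U+201C and U+201D).
fixed = raw.dup.force_encoding(Encoding::Windows_1252).encode(Encoding::UTF_8)
puts fixed.valid_encoding?
puts fixed
```

Re-saving the file as UTF-8 in the editor, or replacing the curly quotes with plain ones, avoids the problem as well.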

It would be good if there were some error handling around the IO.popen call to help diagnose this, or if the call to guess_encoding were stricter (assuming it was called in error). I'm not sure how to do this, but I thought I'd log it here anyway in case someone else hits the same error.
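To illustrate the kind of error handling I mean, here is a rough sketch. It is not CodeRay's actual guess_encoding: the helper name `guess_encoding_with_file`, its `command:` parameter, and the choice to return nil on failure are all my own assumptions. It shells out to the external `file` utility and treats a missing program (as on a stock Windows install) as "encoding unknown" instead of letting the exception escape.

```ruby
# Hypothetical helper, not part of CodeRay. Asks the external `file` utility
# for the encoding of +path+ and returns an Encoding, or nil if it cannot tell.
def guess_encoding_with_file(path, command: 'file')
  # Array form avoids the shell; stderr is discarded so only the answer is read.
  output = IO.popen([command, '-b', '--mime-encoding', path], err: File::NULL, &:read)
  name = output.to_s.strip
  return nil if name.empty? || name == 'binary'

  # Raises ArgumentError for names Ruby does not know (e.g. "unknown-8bit").
  Encoding.find(name)
rescue SystemCallError, IOError, ArgumentError => e
  # SystemCallError covers Errno::ENOENT when the program is missing;
  # IOError is what was reported in the stack trace above.
  warn "encoding detection skipped (#{e.class}: #{e.message})"
  nil
end
```

A caller could then fall back to a default encoding when the result is nil, for example `guess_encoding_with_file(path) || Encoding::UTF_8`, instead of failing with an IOError that hides the real cause.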

Environment: Windows XP, Notepad++, ANSI-encoded file
