Incorrect ASCII detection  #9

@cbourgeois

Hi,

I think the test set for this package is too limited: the results for very simple strings, using the default settings, are wrong:

$ echo $LANG
en_US.UTF-8

>>> charamel.Detector().probe('abc')
[(<Encoding.CP_1006: 'cp1006'>, 0.9521461826551444), (<Encoding.CP_864: 'cp864'>, 0.9462450387005286), (<Encoding.UTF_7: 'utf_7'>, 0.9452766125829656)]
>>> charamel.Detector().probe('Param1234567890*ą_')
[(<Encoding.CP_1006: 'cp1006'>, 0.9521461826551444), (<Encoding.CP_864: 'cp864'>, 0.9462450387005286), (<Encoding.UTF_7: 'utf_7'>, 0.9452766125829656)]

The first call should return ASCII and the second UTF-8.
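
To make the expectation concrete, here is a minimal sketch that uses only the Python standard library. It is not how charamel works internally. The helper name `expected_encoding` is hypothetical, and the sketch assumes the strings are encoded to bytes as UTF-8 (matching the locale above) before detection. It only reports the narrowest of ASCII and UTF-8 that decodes the input cleanly.

```python
def expected_encoding(data: bytes) -> str:
    """Return 'ascii' or 'utf_8', whichever is the narrowest codec that
    decodes `data` without errors, or 'unknown' if neither does.

    Hypothetical helper for illustration only; not part of charamel.
    """
    for name in ("ascii", "utf_8"):
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # this codec rejects the input, try the next one
        return name
    return "unknown"


# The two strings from the report, encoded as UTF-8 bytes (assumption).
print(expected_encoding("abc".encode("utf-8")))
print(expected_encoding("Param1234567890*ą_".encode("utf-8")))
```

With this check, the first string is pure ASCII, while the second contains `ą`, which falls outside ASCII but is valid UTF-8.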

Thanks in advance for looking into this.

Labels: bug