Skip to content

Heuristically filter out minified files #114

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 11, 2016

Conversation

ABaldwinHunter
Copy link
Contributor

By default we filter out minified files that are identifiable as minified from
their filename (e.g. *.min.js), but many projects have minified files that
don't match those patterns, and these files can cause ESLint to segfault in a
container environment (possibly due to memory pressure).

This adds a new heuristic filter based on line length that filters out files
that appear to be minified to prevent them being analyzed. For any matching
files, a log message is sent to stderr notifying that the file is being
skipped.

@codeclimate/review @wfleming

Note: tested locally on 8 repos and saw minimal performance loss. There are some minified files that we currently analyze successfully and this change would mean ignoring them instead. We think that is okay.

By default we filter out minified files that are identifiable as minified from
their filename (e.g. *.min.js), but many projects have minified files that
don't match those patterns, and these files can cause ESLint to segfault in a
container environment (possibly due to memory pressure).

This adds a new heuristic filter based on line length that filters out files
that appear to be minified to prevent them being analyzed. For any matching
files, a log message is sent to stderr notifying that the file is being
skipped.
BatchSanitizer.prototype.sanitizedFiles = function() {
var sanitizedFiles = [];

for(var i = 0; i < this.files.length; i++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason you did this with a for loop and not something like forEach?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no reason. Want me to switch to forEach instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meh - don't feel strongly. This is fine!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was because I honestly forgot Node had forEach.

@gdiggs
Copy link
Contributor

gdiggs commented Aug 11, 2016

tested locally on 8 repos

Which repos? Were any of them particularly large (lots of files, large files, etc)?

@ABaldwinHunter
Copy link
Contributor Author

@gordondiggs tested on a variety of sizes - shortest run took 4 seconds, longest 16 minutes (mocha).

Public repos from sample:
codeclimate/codeclimate-eslint
codeclimate/fixme
chaijs/chai
mochajs/mocha
mewo2/terrain
sachinchoolur/lightgallery.js

@gdiggs
Copy link
Contributor

gdiggs commented Aug 11, 2016

@ABaldwinHunter We've used relax/relax as a sample large repo in the past - can you test there to be sure?

@gdiggs
Copy link
Contributor

gdiggs commented Aug 11, 2016

Code LGTM - curious how it affects relax

@ABaldwinHunter
Copy link
Contributor Author

@gordondiggs Thanks! Relax looks good

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants