Skip to content

perf: use io.StringIO instead tempdir/tempfile #94

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 26, 2024

Conversation

jnooree
Copy link
Contributor

@jnooree jnooree commented Mar 23, 2024

This is a follow-up PR of #92.

In the previous PR review #92 (comment) @2bndy5 suggested:

Can we avoid caching to a temp file? I raised some concern about this approach in cpp-linter#73 as well (about cost in speed).
Could we use io.StringIO stream instead?

In that version, multiprocessing.Pool.imap() was used for parallel processing, and it blocks until all of the results are made available (if I'm not misunderstood). The behavior might cause memory problems on repositories with many files and warnings.

After, I decided to switch to the async version in 4e029a3 by using concurrent.futures.as_completed, which returns the logs as each work is completed (hence its name). Now that the logs are immediately printed after the subprocess is done processing a file and logs for a single file would not overflow memory in most projects, I think it is safe to use io.StringIO for better performance (as with the original suggestion).

Copy link

codecov bot commented Mar 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (3970d38) to head (df1b92c).

Additional details and impacted files
@@            Coverage Diff            @@
##              main       #94   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           20        20           
  Lines         1437      1436    -1     
=========================================
- Hits          1437      1436    -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@2bndy5 2bndy5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delayed response. I got lost exploring the multiprocessing docs and the concurrent.futures docs...

@2bndy5
Copy link
Contributor

2bndy5 commented Mar 26, 2024

Initial benchmark indeed looks a bit faster:

hyperfine "cpp-linter -l false -a false -j 1" "cpp-linter -l false -a false -j 0"
Benchmark 1: cpp-linter -l false -a false -j 1
  Time (mean ± σ):     54.647 s ±  2.467 s    [User: 20.625 s, System: 10.809 s]
  Range (min … max):   52.247 s … 57.656 s    5 runs

Benchmark 2: cpp-linter -l false -a false -j 0
  Time (mean ± σ):     27.898 s ±  0.882 s    [User: 24.238 s, System: 10.881 s]
  Range (min … max):   26.847 s … 28.713 s    5 runs

Summary
  cpp-linter -l false -a false -j 0 ran
    1.96 ± 0.11 times faster than cpp-linter -l false -a false -j 1

The previous PR averaged about 1.95 times faster with -j 0 🎉

@2bndy5 2bndy5 merged commit 07f63c2 into cpp-linter:main Mar 26, 2024
@2bndy5
Copy link
Contributor

2bndy5 commented Mar 26, 2024

The behavior might cause memory problems on repositories with many files and warnings.

Its good that this was done in a separate PR. If we do get problems with memory saturation, then we can simply revert the merge commit and rollback to the temp file approach.

@2bndy5 2bndy5 added the enhancement New feature or request label Mar 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants