
⚡️ Speed up MetricsAggregator._flushable_buckets() by 26% in sentry_sdk/metrics.py #9

Open
wants to merge 1 commit into master

Conversation

@codeflash-ai codeflash-ai bot commented Jun 15, 2024

📄 MetricsAggregator._flushable_buckets() in sentry_sdk/metrics.py

📈 Performance improved by 26% (1.26× speedup)

⏱️ Runtime went down from 14.8 microseconds to 11.7 microseconds

Explanation and details

To make the program run faster, this optimization focuses on the following:

  1. Minimize lock contention by reducing the scope and frequency of lock acquisition.
  2. Replace the use of unnecessary temporary variables.
  3. Optimize the iteration and condition checks.

Here's the optimized version of the program (the full code is in the PR diff).

Explanation:

  1. **Moved `cutoff` and `flushable_buckets` initialization outside the lock:** This reduces the time spent holding the lock.
  2. **Replaced temporary lists with direct list comprehensions:** This minimizes unnecessary variable creation and memory usage.
  3. **Removed redundant checks and clear logic:** A direct comprehension over the keys to delete avoids repeated condition checking during iteration.
  4. **Used `buckets.clear()` and `del` more effectively:** Deletion now happens atomically within the lock's scope.
Correctness verification

The new optimized code was tested for correctness. The results are listed below.

🔘 (none found) − ⚙️ Existing Unit Tests

✅ 19 Passed − 🌀 Generated Regression Tests

Generated regression tests:
# imports
import random
import threading
import time

import pytest  # used for our unit tests

# function to test
from sentry_sdk.metrics import MetricsAggregator


# minimal metric stub used by the tests
class Metric:
    def __init__(self, weight):
        self.weight = weight

# unit tests

@pytest.fixture
def aggregator():
    return MetricsAggregator(capture_func=lambda x: x)

def test_no_buckets_present(aggregator):
    # No buckets present
    aggregator.buckets = {}
    flushable = aggregator._flushable_buckets()
    assert flushable == []

def test_single_bucket_not_flushable(aggregator):
    # Single bucket with a timestamp newer than the cutoff
    aggregator.buckets = {time.time(): {"metric1": Metric(weight=10)}}
    flushable = aggregator._flushable_buckets()
    assert flushable == []

def test_single_bucket_flushable(aggregator):
    # Single bucket with a timestamp older than the cutoff
    timestamp = time.time() - 2 * aggregator.ROLLUP_IN_SECONDS
    bucket = {"metric1": Metric(weight=10)}  # keep a reference: Metric has no __eq__
    aggregator.buckets = {timestamp: bucket}
    flushable = aggregator._flushable_buckets()
    assert flushable == [(timestamp, bucket)]

def test_all_buckets_flushable(aggregator):
    # All buckets are flushable
    timestamp1 = time.time() - 2 * aggregator.ROLLUP_IN_SECONDS
    timestamp2 = time.time() - 3 * aggregator.ROLLUP_IN_SECONDS
    aggregator.buckets = {
        timestamp1: {"metric1": Metric(weight=10)},
        timestamp2: {"metric2": Metric(weight=20)}
    }
    flushable = aggregator._flushable_buckets()
    assert len(flushable) == 2

def test_no_buckets_flushable(aggregator):
    # No buckets are flushable
    timestamp1 = time.time() - 0.5 * aggregator.ROLLUP_IN_SECONDS
    timestamp2 = time.time() - 0.8 * aggregator.ROLLUP_IN_SECONDS
    aggregator.buckets = {
        timestamp1: {"metric1": Metric(weight=10)},
        timestamp2: {"metric2": Metric(weight=20)}
    }
    flushable = aggregator._flushable_buckets()
    assert flushable == []

def test_some_buckets_flushable(aggregator):
    # Some buckets are flushable
    timestamp1 = time.time() - 2 * aggregator.ROLLUP_IN_SECONDS
    timestamp2 = time.time() - 0.5 * aggregator.ROLLUP_IN_SECONDS
    aggregator.buckets = {
        timestamp1: {"metric1": Metric(weight=10)},
        timestamp2: {"metric2": Metric(weight=20)}
    }
    flushable = aggregator._flushable_buckets()
    assert len(flushable) == 1

def test_force_flush_no_buckets(aggregator):
    # Force flush with no buckets
    aggregator._force_flush = True
    aggregator.buckets = {}
    flushable = aggregator._flushable_buckets()
    assert flushable == []
    assert aggregator._force_flush is False

def test_force_flush_with_buckets(aggregator):
    # Force flush with buckets
    aggregator._force_flush = True
    timestamp1 = time.time() - 2 * aggregator.ROLLUP_IN_SECONDS
    timestamp2 = time.time() - 0.5 * aggregator.ROLLUP_IN_SECONDS
    aggregator.buckets = {
        timestamp1: {"metric1": Metric(weight=10)},
        timestamp2: {"metric2": Metric(weight=20)}
    }
    flushable = aggregator._flushable_buckets()
    assert len(flushable) == 2
    assert aggregator._force_flush is False

def test_weight_calculation_with_flushable_buckets(aggregator):
    # Weight calculation with flushable buckets
    timestamp1 = time.time() - 2 * aggregator.ROLLUP_IN_SECONDS
    timestamp2 = time.time() - 3 * aggregator.ROLLUP_IN_SECONDS
    aggregator.buckets = {
        timestamp1: {"metric1": Metric(weight=10)},
        timestamp2: {"metric2": Metric(weight=20)}
    }
    aggregator._buckets_total_weight = 30
    flushable = aggregator._flushable_buckets()
    assert aggregator._buckets_total_weight == 0

def test_weight_calculation_no_flushable_buckets(aggregator):
    # Weight calculation with no flushable buckets
    timestamp1 = time.time() - 0.5 * aggregator.ROLLUP_IN_SECONDS
    timestamp2 = time.time() - 0.8 * aggregator.ROLLUP_IN_SECONDS
    aggregator.buckets = {
        timestamp1: {"metric1": Metric(weight=10)},
        timestamp2: {"metric2": Metric(weight=20)}
    }
    aggregator._buckets_total_weight = 30
    flushable = aggregator._flushable_buckets()
    assert aggregator._buckets_total_weight == 30

def test_empty_bucket_list(aggregator):
    # Empty bucket list
    timestamp = time.time() - 2 * aggregator.ROLLUP_IN_SECONDS
    aggregator.buckets = {timestamp: {}}
    flushable = aggregator._flushable_buckets()
    assert flushable == [(timestamp, {})]

def test_boundary_timestamps(aggregator):
    # Boundary timestamps
    timestamp = time.time() - aggregator.ROLLUP_IN_SECONDS - aggregator._flush_shift
    aggregator.buckets = {timestamp: {"metric1": Metric(weight=10)}}
    flushable = aggregator._flushable_buckets()
    assert len(flushable) == 1

def test_negative_timestamps(aggregator):
    # Negative timestamps are always older than the cutoff
    bucket = {"metric1": Metric(weight=10)}  # keep a reference: Metric has no __eq__
    aggregator.buckets = {-1: bucket}
    flushable = aggregator._flushable_buckets()
    assert flushable == [(-1, bucket)]

def test_simultaneous_access(aggregator):
    # Simultaneous access
    def worker(aggregator):
        aggregator._flushable_buckets()

    timestamp = time.time() - 2 * aggregator.ROLLUP_IN_SECONDS
    aggregator.buckets = {timestamp: {"metric1": Metric(weight=10)}}
    threads = [threading.Thread(target=worker, args=(aggregator,)) for _ in range(10)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

def test_large_number_of_buckets(aggregator):
    # Large number of buckets
    aggregator.buckets = {time.time() - i * aggregator.ROLLUP_IN_SECONDS: {"metric": Metric(weight=1)} for i in range(1000)}
    flushable = aggregator._flushable_buckets()
    assert len(flushable) == 1000

def test_large_metrics_within_buckets(aggregator):
    # Large metrics within buckets
    timestamp = time.time() - 2 * aggregator.ROLLUP_IN_SECONDS
    aggregator.buckets = {timestamp: {f"metric{i}": Metric(weight=1) for i in range(1000)}}
    flushable = aggregator._flushable_buckets()
    assert len(flushable) == 1

def test_random_flush_shift(aggregator):
    # Random flush shift
    aggregator._flush_shift = random.random() * aggregator.ROLLUP_IN_SECONDS
    timestamp = time.time() - 2 * aggregator.ROLLUP_IN_SECONDS
    aggregator.buckets = {timestamp: {"metric1": Metric(weight=10)}}
    flushable = aggregator._flushable_buckets()
    assert len(flushable) == 1

def test_state_reset_after_force_flush(aggregator):
    # State reset after force flush
    aggregator._force_flush = True
    timestamp = time.time() - 2 * aggregator.ROLLUP_IN_SECONDS
    aggregator.buckets = {timestamp: {"metric1": Metric(weight=10)}}
    flushable = aggregator._flushable_buckets()
    assert aggregator._force_flush is False

def test_state_reset_after_regular_flush(aggregator):
    # State reset after regular flush
    timestamp = time.time() - 2 * aggregator.ROLLUP_IN_SECONDS
    aggregator.buckets = {timestamp: {"metric1": Metric(weight=10)}}
    flushable = aggregator._flushable_buckets()
    assert aggregator._force_flush is False

def test_invalid_bucket_data(aggregator):
    # Invalid bucket data
    timestamp = time.time() - 2 * aggregator.ROLLUP_IN_SECONDS
    aggregator.buckets = {timestamp: "invalid"}
    with pytest.raises(TypeError):
        aggregator._flushable_buckets()

def test_invalid_metric_data(aggregator):
    # Invalid metric data
    timestamp = time.time() - 2 * aggregator.ROLLUP_IN_SECONDS
    aggregator.buckets = {timestamp: {"metric1": {}}}
    with pytest.raises(AttributeError):
        aggregator._flushable_buckets()

🔘 (none found) − ⏪ Replay Tests

@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Jun 15, 2024
@codeflash-ai codeflash-ai bot requested a review from misrasaurabh1 June 15, 2024 01:05
Labels
⚡️ codeflash Optimization PR opened by Codeflash AI