
[Question] Why isn't MultiProcessCollector a subclass of Collector? #1105

Open
Apich238 opened this issue Apr 30, 2025 · 0 comments

I am studying how to use Prometheus in Python code that uses multiprocessing (without Gunicorn).

So far I have arrived at the following simple example with two child processes:

from multiprocessing import Process
import shutil
import time, os

os.environ["PROMETHEUS_MULTIPROC_DIR"] = './PMDir'  # This environment variable must be set BEFORE the first import 
                                                    # from prometheus_client to ensure the library uses MultiProcessValue 
                                                    # (metric value class with mmap support). Otherwise, 
                                                    # MutexValue (non-mmap) will be used, and files won't be created.

from prometheus_client import start_http_server, multiprocess, CollectorRegistry, Counter

# os.environ["PROMETHEUS_MULTIPROC_DIR"] = './PMDir'  # Setting it only here, AFTER the import, would be too late:
#                                                     # the library would already have chosen MutexValue as its
#                                                     # value class, no mmap files would be created, and
#                                                     # MultiProcessCollector would have nothing to work with.
#                                                     # This matches the documentation's recommendation to set it
#                                                     # before app startup.

COUNTER1 = None
COUNTER2 = None
COUNTER3 = None
COUNTER4 = None


def init_counters(registry):
    """
    Initialize counters (or other metrics) and register them in the specified registry.
    According to MetricWrapperBase's code, if registry=None, metrics won't be registered in any registry.
    """
    global COUNTER1, COUNTER2, COUNTER3, COUNTER4

    COUNTER1 = Counter('counter1', 'Incremented by the first child process', registry=registry)
    COUNTER2 = Counter('counter2', 'Incremented by the second child process', registry=registry)
    COUNTER3 = Counter('counter3', 'Incremented by all processes', registry=registry)
    COUNTER4 = Counter('counter4', 'Incremented by main process', registry=registry)

# Child processes do not have to create a registry object: both f1 and f2 work as process targets.
# Writing the mmap files is handled at the metric-object level, not at the collector level.
# Variation 1: creating a registry or not creating one both work as I expect.
def f1():
    """First child process body. Works without manual registry creation."""
    init_counters(None)
    while True:
        time.sleep(1)
        print("Child process 1", os.getpid())
        COUNTER1.inc()
        COUNTER3.inc()
    

def f2():
    """Second child process body. Works with manual registry creation."""
    registry = CollectorRegistry()
    init_counters(registry)
    while True:
        time.sleep(2)
        print("Child process 2", os.getpid())
        COUNTER2.inc()
        COUNTER3.inc()


if __name__ == '__main__':
    # Ensure the multiprocess directory exists and is empty
    prome_stats = os.environ["PROMETHEUS_MULTIPROC_DIR"]
    if os.path.exists(prome_stats):
        shutil.rmtree(prome_stats)
    os.mkdir(prome_stats)

    # Variation 2: when MultiProcessCollector is passed to the HTTP server directly (see Variation 4), creating a registry is optional
    # registry = CollectorRegistry()  # Create a registry for the HTTP server
    registry = None  # Works without a registry between mpc and the HTTP server

    # Create the MultiProcessCollector object. It reads and aggregates the mmap files in PROMETHEUS_MULTIPROC_DIR.
    # Registering it in our registry means that registry.collect() calls mpc.collect(), so the registry exports
    # the metrics that mpc has aggregated.
    # MultiProcessCollector ONLY reads the (mmap) files; writing values is handled by MultiProcessValue inside the metric objects.
    mpc = multiprocess.MultiProcessCollector(registry)

    # Variation 3: if the main process has to report its own metrics, it can use either a separate CollectorRegistry or None.
    # init_counters(CollectorRegistry())  # Use a separate registry for the main process's metrics.
    init_counters(None)  # Works without registering in any registry
    # init_counters(registry)  # Metrics will be duplicated if we use the same registry that mpc is registered in.
    #                          # More precisely, the registry will then export both mpc's metrics and the
    #                          # main process's own metrics, including the overlapping ones.

    # Variation 4: the HTTP server can be given either the registry or mpc
    # start_http_server(8000, registry=registry)  # Standard approach
    start_http_server(8000, registry=mpc)  # Works even though mpc is not a Collector subclass (it does implement collect())

    p1 = Process(target=f1, args=())
    p1.start()
    p2 = Process(target=f2, args=())
    p2.start()

    print("collect")

    try:
        while True:
            print('main process   ', os.getpid())
            time.sleep(1)
            COUNTER3.inc()
            COUNTER4.inc()
    except KeyboardInterrupt:
        p1.terminate()
        p2.terminate()
        p1.join()  # wait for the children to exit before deleting their files
        p2.join()
        shutil.rmtree(prome_stats)
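The import-order comments at the top of the script rest on ordinary Python module caching: a module's top-level code runs once, on first import, and later imports return the cached module object. The sketch below reproduces that with a throwaway stand-in module (`fake_values` and `value_class_after_late_env_var` are hypothetical, not part of prometheus_client) that, like the library behaviour the comments describe, picks its value class at import time.

```python
import importlib
import os
import sys
import tempfile

ENV = "PROMETHEUS_MULTIPROC_DIR"

# Hypothetical stand-in for a library module that chooses its value class
# once, when it is first imported (the behaviour the script's comments describe).
STAND_IN_SOURCE = (
    "import os\n"
    "VALUE_CLASS = 'MultiProcessValue' if 'PROMETHEUS_MULTIPROC_DIR' in os.environ else 'MutexValue'\n"
)


def value_class_after_late_env_var():
    """Import the stand-in, set the env var afterwards, import again; return both choices."""
    saved = os.environ.pop(ENV, None)
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "fake_values.py"), "w") as fh:
            fh.write(STAND_IN_SOURCE)
        importlib.invalidate_caches()
        sys.path.insert(0, tmp)
        try:
            before = importlib.import_module("fake_values").VALUE_CLASS
            os.environ[ENV] = "./PMDir"  # set AFTER the first import
            # The second import is served from sys.modules; the module body does not run again.
            after = importlib.import_module("fake_values").VALUE_CLASS
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("fake_values", None)
            os.environ.pop(ENV, None)
            if saved is not None:
                os.environ[ENV] = saved
    return before, after


if __name__ == "__main__":
    print(value_class_after_late_env_var())  # ('MutexValue', 'MutexValue')
```

Setting the variable late changes `os.environ`, but not a decision the module already made, which is why only the assignment placed before the first import has any effect.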

(I left my findings as comments because the documentation didn't answer my questions, and I may have made mistakes in my assumptions.)

I noticed that MultiProcessCollector implements collect() but doesn't inherit from the Collector class.
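To make the observation concrete, here is a stdlib-only sketch of the same shape. `Collector` is a stand-in ABC with a single abstract `collect()`, and `FileCollector` and `TinyRegistry` are hypothetical classes, not prometheus_client code. Because the registry only ever calls `collect()`, the non-subclass works through duck typing, yet `isinstance` reports that it is not a `Collector`.

```python
from abc import ABC, abstractmethod


class Collector(ABC):
    """Stand-in for the documented base class: one abstract collect()."""

    @abstractmethod
    def collect(self):
        ...


class FileCollector:
    """Hypothetical analogue of MultiProcessCollector: has collect(), no base class."""

    def __init__(self, registry=None):
        if registry is not None:
            registry.register(self)  # registers itself when given a registry

    def collect(self):
        yield ("counter3_total", 42.0)


class TinyRegistry:
    """Hypothetical registry that relies only on duck typing."""

    def __init__(self):
        self._collectors = []

    def register(self, collector):
        self._collectors.append(collector)

    def collect(self):
        for collector in self._collectors:
            yield from collector.collect()


registry = TinyRegistry()
fc = FileCollector(registry)
print(list(registry.collect()))   # [('counter3_total', 42.0)]
print(isinstance(fc, Collector))  # False
print(hasattr(fc, "collect"))     # True
```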

The Custom collectors documentation suggests that custom collectors should implement the Collector interface. However, MultiProcessCollector is defined as:

class MultiProcessCollector:
    """Collector for files for multi-process mode."""

    def __init__(self, registry, path=None):
        ...

This seems contradictory since:

  • The class name contains "Collector"
  • It implements the core collect() method
  • It's designed to be registered in a CollectorRegistry
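For comparison, a sketch of what inheriting from an ABC adds, again using a stand-in `Collector` rather than the library's own class (`GoodCollector`, `ForgetfulCollector`, `LookAlike` and `try_instantiate` are hypothetical): Python refuses to instantiate a subclass that forgets `collect()`, and `isinstance` checks pass for proper subclasses. A class that merely happens to define `collect()` gets neither the check nor the `isinstance` result.

```python
from abc import ABC, abstractmethod


class Collector(ABC):
    """Stand-in for the documented base class (not the library's own)."""

    @abstractmethod
    def collect(self):
        ...


class GoodCollector(Collector):
    def collect(self):
        return iter(())  # yields no metrics, but the contract is met


class ForgetfulCollector(Collector):
    pass  # collect() deliberately left out


class LookAlike:
    def collect(self):  # same method, no base class
        return iter(())


def try_instantiate(cls):
    """Return an instance, or the TypeError message if the ABC refuses to build one."""
    try:
        return cls()
    except TypeError as exc:
        return str(exc)


print(isinstance(try_instantiate(GoodCollector), Collector))  # True
print(try_instantiate(ForgetfulCollector))  # error message names the missing 'collect'
print(isinstance(LookAlike(), Collector))  # False
```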

Could you clarify:

  1. Is this an intentional design choice? If so, what is the rationale for not inheriting from Collector?
  2. Are there any potential compatibility risks in not following the Collector interface contract?
  3. What are the potential risks of passing the MultiProcessCollector object to start_http_server directly, without a registry = CollectorRegistry() in between?
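Question 3 comes down to what the exposition side calls on the object it is given. As a hedged illustration only, not a description of what start_http_server actually does internally, the sketch below uses a hypothetical `render` function standing in for an HTTP handler. As long as it calls only `collect()`, a bare collector and a registry are interchangeable; as soon as it calls something only a registry provides (here a hypothetical `filtered()` method), the bare collector fails with `AttributeError`.

```python
class BareCollector:
    """Hypothetical collector: collect() only, like the object passed as registry=mpc."""

    def collect(self):
        yield ("counter4_total", 7.0)


class WrappingRegistry:
    """Hypothetical registry: collect() plus an extra registry-only helper."""

    def __init__(self, *collectors):
        self._collectors = list(collectors)

    def collect(self):
        for c in self._collectors:
            yield from c.collect()

    def filtered(self, names):
        # Hypothetical registry-only feature, e.g. serving a subset of metrics.
        return [m for m in self.collect() if m[0] in names]


def render(source, names=None):
    """Hypothetical exposition function; uses the extra helper only when asked to filter."""
    samples = source.filtered(names) if names else list(source.collect())
    return "\n".join(f"{name} {value}" for name, value in samples)


bare = BareCollector()
print(render(bare))                    # counter4_total 7.0
print(render(WrappingRegistry(bare)))  # counter4_total 7.0
try:
    render(bare, names={"counter4_total"})
except AttributeError as exc:
    print("bare collector:", exc)
```

The risk in this toy model is not in the common path but in whichever optional code path expects the richer registry object.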

I want to ensure I understand this correctly for proper integration. Thanks for your insights!
