Inference extension does not correctly scrape pod metrics #365

@Kuromesi

Description

What happened:
Currently the inference extension builds the metrics request URL as url := fmt.Sprintf("http://%s/metrics", existing.Address). However, in the latest version the pod address is just the pod IP, and no port is appended to the URL. As a result, the inference extension cannot scrape pod metrics correctly, because it always sends requests to port 80.

new := &PodMetrics{
	Pod: Pod{
		NamespacedName: types.NamespacedName{
			Name:      pod.Name,
			Namespace: pod.Namespace,
		},
		// Address is set to the bare pod IP; no port is included.
		Address: pod.Status.PodIP,
	},
	Metrics: Metrics{
		ActiveModels: make(map[string]int),
	},
}
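To illustrate the problem described above, here is a minimal sketch of building a scrape URL that carries an explicit port. The helper name metricsURL, the port value 8000, and the sample IPs are assumptions made for illustration; they are not taken from the inference extension code, and the real port would have to come from wherever the project configures the target port.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// metricsURL is a hypothetical helper (not part of the project).
// It joins the pod IP with an explicit port before formatting the URL.
// net.JoinHostPort also wraps IPv6 addresses in brackets.
func metricsURL(podIP string, port int) string {
	hostPort := net.JoinHostPort(podIP, strconv.Itoa(port))
	return fmt.Sprintf("http://%s/metrics", hostPort)
}

func main() {
	// Without a port, "http://10.0.0.12/metrics" targets port 80.
	fmt.Println(metricsURL("10.0.0.12", 8000)) // assumed port
	fmt.Println(metricsURL("fd00::12", 8000))  // IPv6 example
}
```

With the port included, the request no longer falls back to the HTTP default port.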

In addition, the scrape interval defaults to 50 ms: refreshMetricsInterval = flag.Duration("refreshMetricsInterval", 50*time.Millisecond, "interval to refresh metrics via polling pods"). So once scraping fails, the failure repeats on every poll and the resulting logs can become quite extensive.

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
  • Inference extension version (use git describe --tags --dirty --always):
  • Cloud provider or hardware configuration:
  • Install tools:
  • Others:

Metadata

Assignees

No one assigned

    Labels

    kind/bug: Categorizes issue or PR as related to a bug.