Add summary statistics for metrics views via MCP #7602

grahamplata · 2025-07-09T18:52:24Z

Add summary statistics for metrics views via MCP server tool

query_metrics_view_summary mcp tool

request

{
  "metrics_view": "product_metrics_pg_snapshot_date_2025_03_21_orgs_metrics"
}

response

    [{
        "max": "2025-03-21T15:06:06.104879Z",
        "min": "2023-05-04T12:24:25.106518Z",
        "watermark": "2025-03-21T15:06:06.104879Z",
        "summary": {
            "_all_usergroup_id": {
                "DimensionName": "_all_usergroup_id",
                "DataType": "VARCHAR",
                "SampleValue": "5bd92381-6a5b-4858-a9ec-7cc056ec82f3"
            },
            "_billing_customer_id": {
                "DimensionName": "_billing_customer_id",
                "DataType": "VARCHAR",
                "SampleValue": "24d49175-8de2-428e-9ea3-0b01db4724a5"
            },
            "_created_by_user_id": {
                "DimensionName": "_created_by_user_id",
                "DataType": "VARCHAR",
                "SampleValue": "d991b5f5-fde2-4c15-b30d-169f030b27fa"
            },
            "_id": {
                "DimensionName": "_id",
                "DataType": "VARCHAR",
                "SampleValue": "24d49175-8de2-428e-9ea3-0b01db4724a5"
            },
            "_payment_customer_id": {
                "DimensionName": "_payment_customer_id",
                "DataType": "VARCHAR",
                "SampleValue": "cus_RIEnoGvpb0AUlS"
            },
            "billing_customer_id": {
                "DimensionName": "billing_customer_id",
                "DataType": "VARCHAR",
                "SampleValue": "24d49175-8de2-428e-9ea3-0b01db4724a5"
            },
// ...snip

Sample execution from Claude Desktop

Checklist:

Covered by tests
Ran it and it works as intended
Reviewed the diff before requesting a review
Checked for unhandled edge cases
Linked the issues it closes
Checked if the docs need to be updated. If so, create a separate Linear DOCS issue
Intend to cherry-pick into the release branch
I'm proud of this work!

…ls test

avaitla · 2025-07-09T21:14:01Z

One thing to consider later is if SampleValue can be [SampleValue], with top five sample values, I suspect this will get better results (low cardinality columns will show their full set of values) - then again on the other hand might be too much context. A human generally sees this immediately in the ui and benefits from it.

grahamplata · 2025-07-10T13:54:07Z

One thing to consider later is if SampleValue can be [SampleValue], with top five sample values, I suspect this will get better results (low cardinality columns will show their full set of values) - then again on the other hand might be too much context. A human generally sees this immediately in the ui and benefits from it.

Done and done, added more core sample

[
    {
        "max": "2025-03-21T15:06:06.104879Z",
        "min": "2023-05-04T12:24:25.106518Z",
        "watermark": "2025-03-21T15:06:06.104879Z",
        "summary": {
            "custom_domain": {
                "DimensionName": "custom_domain",
                "DataType": "CODE_STRING",
                "Values": [
                    "reports.gdb.net",
                    "",
                    "rill.romen.ai",
                    "lens.digiadglobal.com",
                    "nebula.boldcollective.co"
                ]
            },
            "description": {
                "DimensionName": "description",
                "DataType": "CODE_STRING",
                "Values": [
                    "A sandbox for me",
                    "Audience-driven revenue data provided by Auth",
                    "BlueCargo",
                    "\"\"",
                    ""
                ]
            },
            "display_name": {
                "DimensionName": "display_name",
                "DataType": "CODE_STRING",
                "Values": [
                    "iansilva2305",
                    "audiohook",
                    "emodo",
                    "djkim",
                    "networkcraze"
                ]
            },
            "favicon_asset_id": {
                "DimensionName": "favicon_asset_id",
                "DataType": "CODE_UUID",
                "Values": [
                    null,
                    "60ec1638-dc36-4c2f-b4fb-56074f2aab57"
                ]
            },
            "id": {
                "DimensionName": "id",
                "DataType": "CODE_UUID",
                "Values": [
                    "78599ac2-c458-453c-86bd-1cd4aa1711c7",
                    "2ca72c5b-e5d7-48ff-8ef8-d235e493f47e",
                    "61cfb1bb-8426-468e-b023-94685cd015ae",
                    "ed5226e0-4b61-4c43-b315-db5ec6b464af",
                    "54c47392-b6ec-4dac-a9f6-db436bbf51da"
                ]
            },
            "logo_asset_id": {
                "DimensionName": "logo_asset_id",
                "DataType": "CODE_UUID",
                "Values": [
                    "54652be3-1ef5-4009-b4ff-e06f3aec2945",
                    "d9526a4f-176a-482b-ae5f-ecfe881e9280",
                    "106d3b6f-f66f-42a0-96a7-6feb3dda97db",
                    "3ec454c6-2fec-40b0-bfe8-57f37425abaf",
                    "70fa445b-596f-49a5-b6a4-41dd499ef2ae"
                ]
            },
            "name": {
                "DimensionName": "name",
                "DataType": "CODE_STRING",
                "Values": [
                    "kiwialec",
                    "syft_data",
                    "BillingTest",
                    "TTI",
                    "kensuke-b-sasaki"
                ]
            },
            "payment_customer_id": {
                "DimensionName": "payment_customer_id",
                "DataType": "CODE_STRING",
                "Values": [
                    "",
                    "cus_RnO55p6OIzqR7u",
                    "cus_RIEnwjUggkNuuB",
                    "cus_RIEnaX6gP5KpAl",
                    "cus_RIEnPcDZ10wDMq"
                ]
            },
            "updated_on": {
                "DimensionName": "updated_on",
                "DataType": "CODE_TIMESTAMP",
                "Values": [
                    "2025-01-06 01:16:48.392374+00",
                    "2025-02-25 15:32:28.471094+00",
                    "2024-12-09 06:55:56.456532+00",
                    "2025-03-17 01:15:48.121846+00",
                    "2025-01-06 01:18:41.720782+00"
                ]
            }
        }
    }
]

avaitla · 2025-07-10T14:30:34Z

Nice. I think Examples or Samples is still a better name than just Values as the latter indicates it's the only set of values but the former indicates it's just an example subset. You can also ask gpt itself what it thinks the best naming is for it to best use it in context window for the use case of an AI Analyst trying to better understand the metrics view of interest.

ericpgreen2 · 2025-07-10T22:38:23Z

Tagged @k-anshul for the backend code review, given Benjamin's OOO.

Some notes about the data structure:

The current summary structure has redundancy: dimension names serve as both object keys and object properties.
It's unclear how to extend this pattern to include measure summaries.

What about something like this (doesn't all need implementing now, but demonstrates a cleaner, more extensible structure):

{
  "time_range": {
    "min": "2023-05-04T12:24:25.106518Z",
    "max": "2025-03-21T15:06:06.104879Z",
    "watermark": "2025-03-21T15:06:06.104879Z"
  },
  "dimensions": [
    {
      "name": "custom_domain",
      "data_type": "CODE_STRING",
      "sample_values": ["reports.foodie.com", "", "rill.groupie.ai", ...],
      "null_count": 42,
      "distinct_count": 5
    },
    {
      "name": "description", 
      "data_type": "CODE_STRING",
      "sample_values": ["A sandbox for me", "Audience-driven...", ...],
      "null_count": 15,
      "distinct_count": 100
    }
  ],
  "measures": [
    {
      "name": "revenue",
      "data_type": "CODE_FLOAT64",
      "min": 0.0,
      "max": 1000000.0,
      "avg": 50000.0,
      "null_count": 0,
      "zero_count": 150
    }
  ],
  "metadata": {
    "total_rows": 1000000,
    "dimensions_count": 9,
    "measures_count": 3,
    "last_refreshed": "2025-03-21T15:06:06.104879Z"
  }
}

avaitla · 2025-07-11T01:59:59Z

Curious if you run both in your own set of basic eval queries on your demo dashboard which performs better. I think the suggestion from Eric is good, not having dim name as the key for the dict but rather name: user_id, the null and total counts are also good signals (but it's worthwhile to build eval queries to avoid regression)

grahamplata · 2025-07-11T18:15:15Z