-
Notifications
You must be signed in to change notification settings - Fork 144
Add summary statistics for metrics views via MCP #7602
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
One thing to consider later is if SampleValue can be [SampleValue], with top five sample values, I suspect this will get better results (low cardinality columns will show their full set of values) - then again on the other hand might be too much context. A human generally sees this immediately in the ui and benefits from it. |
Done and done, added more core sample [
{
"max": "2025-03-21T15:06:06.104879Z",
"min": "2023-05-04T12:24:25.106518Z",
"watermark": "2025-03-21T15:06:06.104879Z",
"summary": {
"custom_domain": {
"DimensionName": "custom_domain",
"DataType": "CODE_STRING",
"Values": [
"reports.gdb.net",
"",
"rill.romen.ai",
"lens.digiadglobal.com",
"nebula.boldcollective.co"
]
},
"description": {
"DimensionName": "description",
"DataType": "CODE_STRING",
"Values": [
"A sandbox for me",
"Audience-driven revenue data provided by Auth",
"BlueCargo",
"\"\"",
""
]
},
"display_name": {
"DimensionName": "display_name",
"DataType": "CODE_STRING",
"Values": [
"iansilva2305",
"audiohook",
"emodo",
"djkim",
"networkcraze"
]
},
"favicon_asset_id": {
"DimensionName": "favicon_asset_id",
"DataType": "CODE_UUID",
"Values": [
null,
"60ec1638-dc36-4c2f-b4fb-56074f2aab57"
]
},
"id": {
"DimensionName": "id",
"DataType": "CODE_UUID",
"Values": [
"78599ac2-c458-453c-86bd-1cd4aa1711c7",
"2ca72c5b-e5d7-48ff-8ef8-d235e493f47e",
"61cfb1bb-8426-468e-b023-94685cd015ae",
"ed5226e0-4b61-4c43-b315-db5ec6b464af",
"54c47392-b6ec-4dac-a9f6-db436bbf51da"
]
},
"logo_asset_id": {
"DimensionName": "logo_asset_id",
"DataType": "CODE_UUID",
"Values": [
"54652be3-1ef5-4009-b4ff-e06f3aec2945",
"d9526a4f-176a-482b-ae5f-ecfe881e9280",
"106d3b6f-f66f-42a0-96a7-6feb3dda97db",
"3ec454c6-2fec-40b0-bfe8-57f37425abaf",
"70fa445b-596f-49a5-b6a4-41dd499ef2ae"
]
},
"name": {
"DimensionName": "name",
"DataType": "CODE_STRING",
"Values": [
"kiwialec",
"syft_data",
"BillingTest",
"TTI",
"kensuke-b-sasaki"
]
},
"payment_customer_id": {
"DimensionName": "payment_customer_id",
"DataType": "CODE_STRING",
"Values": [
"",
"cus_RnO55p6OIzqR7u",
"cus_RIEnwjUggkNuuB",
"cus_RIEnaX6gP5KpAl",
"cus_RIEnPcDZ10wDMq"
]
},
"updated_on": {
"DimensionName": "updated_on",
"DataType": "CODE_TIMESTAMP",
"Values": [
"2025-01-06 01:16:48.392374+00",
"2025-02-25 15:32:28.471094+00",
"2024-12-09 06:55:56.456532+00",
"2025-03-17 01:15:48.121846+00",
"2025-01-06 01:18:41.720782+00"
]
}
}
}
] |
Nice. I think Examples or Samples is still a better name than just Values as the latter indicates it's the only set of values but the former indicates it's just an example subset. You can also ask gpt itself what it thinks the best naming is for it to best use it in context window for the use case of an AI Analyst trying to better understand the metrics view of interest. |
Tagged @k-anshul for the backend code review, given Benjamin's OOO. Some notes about the data structure:
What about something like this (doesn't all need implementing now, but demonstrates a cleaner, more extensible structure): {
"time_range": {
"min": "2023-05-04T12:24:25.106518Z",
"max": "2025-03-21T15:06:06.104879Z",
"watermark": "2025-03-21T15:06:06.104879Z"
},
"dimensions": [
{
"name": "custom_domain",
"data_type": "CODE_STRING",
"sample_values": ["reports.foodie.com", "", "rill.groupie.ai", ...],
"null_count": 42,
"distinct_count": 5
},
{
"name": "description",
"data_type": "CODE_STRING",
"sample_values": ["A sandbox for me", "Audience-driven...", ...],
"null_count": 15,
"distinct_count": 100
}
],
"measures": [
{
"name": "revenue",
"data_type": "CODE_FLOAT64",
"min": 0.0,
"max": 1000000.0,
"avg": 50000.0,
"null_count": 0,
"zero_count": 150
}
],
"metadata": {
"total_rows": 1000000,
"dimensions_count": 9,
"measures_count": 3,
"last_refreshed": "2025-03-21T15:06:06.104879Z"
}
} |
Curious if you run both in your own set of basic eval queries on your demo dashboard which performs better. I think the suggestion from Eric is good, not having dim name as the key for the dict but rather name: user_id, the null and total counts are also good signals (but it's worthwhile to build eval queries to avoid regression) |
initial pass remove unused save save
7769c98
to
f6e14ca
Compare
golangci-lint
9643718
to
6b4b471
Compare
// Validate time range | ||
if defaultTimeRange.Max.IsZero() { | ||
// If we have no time data, still process time dimensions we found | ||
for _, dim := range timeDimensions { | ||
var dataType string | ||
if dim.DataType != nil { | ||
dataType = dim.DataType.Code.String() | ||
} | ||
|
||
timeDimensionSummaries = append(timeDimensionSummaries, DimensionSummary{ | ||
Name: dim.Name, | ||
DataType: dataType, | ||
}) | ||
} | ||
|
||
// Include default time dimension if it wasn't found in explicit dimensions | ||
if !defaultTimeDimFound && defaultTimeDimName != "" && e.security.CanAccessField(defaultTimeDimName) { | ||
timeDimensionSummaries = append(timeDimensionSummaries, DimensionSummary{ | ||
Name: defaultTimeDimName, | ||
DataType: "TIMESTAMP", // Assume timestamp for default time dimension | ||
}) | ||
} | ||
|
||
return &SummaryResult{ | ||
Dimensions: timeDimensionSummaries, | ||
TimeRange: defaultTimeRange, | ||
}, nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case seems a little weird. If the range is empty, I think it can just keep it empty and keep going (not return early).
(And not add a time filter in the dimension profiling.)
Sample [
{
"dimensions": [
{
"name": "__time",
"data_type": "TIMESTAMP",
"min_value": "2025-07-15T18:24:27.8Z",
"max_value": "2025-07-22T18:24:27.8Z"
},
{
"name": "adomain",
"data_type": "CODE_STRING",
"example_value": "clorox.com",
"min_value": "1800petmeds.com",
"max_value": "zyp.test"
},
{
"name": "advertiser_name",
"data_type": "CODE_STRING",
"example_value": "Clorox",
"min_value": "6sense",
"max_value": "vertex"
},
{
"name": "app_or_site",
"data_type": "CODE_STRING",
"example_value": "App",
"min_value": "App",
"max_value": "Site"
}
],
"time_range": {
"name": "__time",
"data_type": "TIMESTAMP",
"min_value": "2025-07-15T18:24:27.8Z",
"max_value": "2025-07-22T18:24:27.8Z"
}
}
] |
Add summary statistics for metrics views via MCP server tool
query_metrics_view_summary
mcp toolSample execution from Claude Desktop
Checklist: