
Produce ECS formatted logstash logs to Event Platform, allowing them to be queried in the WMF Data Lake with SQL
Open, Medium · Public

Description

T234565: Standardize the logging format aims to standardize the software logging format on the Elastic Common Schema (ECS). If we can produce these ECS logs with Event Platform, they will be automatically ingested into the WMF Data Lake.

This will allow people to query the logs with SQL via Spark SQL (e.g. spark3-sql or pyspark) or Presto, and to build dashboards with Superset.

This would be particularly useful if we successfully migrate the MediaWiki logging format to ECS, since MediaWiki software logs could then be joined with other MediaWiki data in Hive.
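As a sketch of what such a join could look like (the event.mediawiki_ecs_log table and its ECS field layout are hypothetical; event.mediawiki_api_request exists today, and event tables in the Data Lake are partitioned by year/month/day):

spark3-sql -e "
  -- Hypothetical: error counts per wiki, joining ECS logs to API request events on request id.
  SELECT r.database AS wiki, COUNT(*) AS error_count
  FROM event.mediawiki_ecs_log e
  JOIN event.mediawiki_api_request r
    ON e.trace.id = r.meta.request_id
  WHERE e.log.level = 'ERROR'
    AND e.year = 2024 AND e.month = 10 AND e.day = 1
    AND r.year = 2024 AND r.month = 10 AND r.day = 1
  GROUP BY r.database
  ORDER BY error_count DESC
  LIMIT 20;
"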

Since the logs would be in Kafka with a well defined schema, they would also be consumable and reusable for other purposes, e.g. stream processing, anomaly detection and alerting, or ingestion into different data stores.
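For example, such a stream could be tailed directly from Kafka (the broker host and topic name below are illustrative only; no ECS log topic exists yet):

# Consume new events from a hypothetical ECS log topic and filter for errors.
kafkacat -C \
  -b kafka-jumbo1001.eqiad.wmnet:9092 \
  -t eqiad.mediawiki.ecs_log \
  -o end \
  | jq 'select(.log.level == "ERROR")'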


In 2021, the Data Engineering and Observability teams met to discuss this idea. To accomplish this, we'd need:

Event Timeline

Restricted Application added a subscriber: Aklapper.
Marostegui triaged this task as Medium priority. · Sep 24 2021, 4:29 AM

@colewhite, in https://phabricator.wikimedia.org/T288851#7456931 you said:

topics prefixed by rsyslog- will be automatically picked up by Logstash.

We've found using topic naming conventions for ingestion jobs to be brittle. We're moving towards using EventStreamConfig to automate configuring things like this. See: https://wikitech.wikimedia.org/wiki/Event_Platform/Stream_Configuration#consumers_and_producers

Example:

curl -s 'https://meta.wikimedia.org/w/api.php?action=streamconfigs&format=json&all_settings=1&streams=mediawiki.api-request' | jq .
{
  "streams": {
    "mediawiki.api-request": {
      "topics": [
        "eqiad.mediawiki.api-request",
        "codfw.mediawiki.api-request"
      ],
      "stream": "mediawiki.api-request",
      "consumers": {
        "analytics_hadoop_ingestion": {
          "enabled": true,
          "job_name": "event_default"
        }
      },
      "canary_events_enabled": true,
      "topic_prefixes": [
        "eqiad.",
        "codfw."
      ],
      "destination_event_service": "eventgate-analytics",
      "schema_title": "mediawiki/api/request"
    }
  }
}

Here, we are declaring a consumer called 'analytics_hadoop_ingestion'. The settings for that consumer are arbitrary and specific to the consumer job. When that job runs, it requests all streams that have consumers.analytics_hadoop_ingestion declared, and uses those settings to import the data.
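Concretely, a consumer job can discover its streams with a single EventStreamConfig request. A minimal sketch, filtering client-side with jq (whether a real job filters server-side or like this is an implementation detail):

# List all streams that declare the analytics_hadoop_ingestion consumer.
curl -s 'https://meta.wikimedia.org/w/api.php?action=streamconfigs&format=json&all_settings=1' \
  | jq '.streams | to_entries
      | map(select(.value.consumers.analytics_hadoop_ingestion.enabled == true))
      | map(.key)'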

Logstash ingestion could probably do something similar, if the logging streams to import were declared in EventStreamConfig.
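Hypothetically, such a declaration might look like the fragment below. The stream name, consumer name, and settings are all made up for illustration; nothing like this exists in EventStreamConfig today:

  "rsyslog.ecs-log": {
    "schema_title": "ecs/log",
    "topic_prefixes": ["eqiad.", "codfw."],
    "consumers": {
      "logstash": {
        "enabled": true,
        "index_prefix": "ecs"
      }
    }
  }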

This would have been useful to debug T374662. Aggregating the timings out of Elasticsearch is hard, since it would have to aggregate 50M requests through a single core (estimated time: multiple days, due to repeated work for each pagination). Being able to throw Hadoop at the problem would solve it in a few tens of minutes with an easy query.

This would be very useful for understanding whether known problematic reusers (see: https://phabricator.wikimedia.org/T317001) are similarly saturating other data streams, to aid in correcting the problematic behavior.

This would also help with some analysis in T375146.

I've been told that this project would let me process Logstash data with SQL queries, and I would like that very much.

Ottomata renamed this task from Integrate Event Platform and ECS logs to Produce ECS formatted logstash logs to Event Platform, allowing them to be queried in the WMF Data Lake with SQL. · Oct 10 2024, 1:16 PM
Ottomata updated the task description.
Ottomata updated the task description.