Conversation

gusai-dd
Contributor

@gusai-dd gusai-dd commented Jan 7, 2020

What does this PR do?

Adds Attribute-based exclusion filter documentation and updates screenshots.

Motivation

Align doc with deployed feature.

Preview link

https://docs-staging.datadoghq.com/gusai-dd-patch-1/logs/indexes/#exclusion-filters

@gusai-dd gusai-dd added the Logs label Jan 7, 2020
@gusai-dd gusai-dd requested a review from pcarioufr January 7, 2020 12:06
@gusai-dd gusai-dd requested a review from a team as a code owner January 7, 2020 12:06
@gusai-dd
Contributor Author

gusai-dd commented Jan 7, 2020

Hello @pcarioufr,
the changes are implemented.
Thanks for your review!

@NBParis
NBParis commented Jan 8, 2020

The preview does not work for me; is that expected?

Comment on lines 99 to 107
#### Sample all logs of a trace

In order to guarantee the sampling of all logs related to a given trace, define the log sampling on `Trace Id` attribute.
Refer to [Trace Remapper][10] to get more information on how to connect logs and traces.

For example, setting exclude 99% of `Trace Id`, you end up with:
- 1% of traces for which we keep all logs
- 99% of traces for which we keep no logs

Not sure why we make a specific section out of it. This could be an example in the previous section, as it's an attribute like the others, even if it is automatically added by Datadog.

Contributor Author

IMO, this is a specific use case related to the Datadog-specific traceId, which is why it's in a subsection.

Contributor Author

@gusai-dd gusai-dd left a comment

Review implemented.

Comment on lines 83 to 96
In order to ensure complete log information related to a given subset of transactions, sample on a specific attribute holding an ID, such as User, Session, Request Identifiers.
By default, the log-based sampling is applied (exclusion percentage set on `all logs`).

This feature operates on attribute's values rather than the log itself, for example:
* given an attribute `id` holding 3 distinct values (`1`, `2`, `3`),
* defining a 66% exclusion filter on it,
* results in indexing all logs holding 1 specific attribute's value (`3`), all the other logs won't be indexed.

{{< img src="logs/indexes/index_exclusion_attribute_groups.png" alt="index attribute-based exclusion filter groups" style="width:70%;">}}

For example, exclude `nginx` logs for 99% of `http.request_id`:

{{< img src="logs/indexes/index_exclusion_on_attribute.png" alt="index attribute-based exclusion filter" style="width:70%;">}}
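The value-based behaviour described in the quoted lines can be illustrated with a minimal sketch. It assumes a deterministic hash of the attribute value decides whether that value is kept; this is a common bucketing technique and is not confirmed by this PR to be how Datadog implements the filter. The names `keep_for_value` and `index_logs` are hypothetical, and the sketch assumes every log carries the attribute.

```python
import hashlib


def keep_for_value(value: str, exclusion_pct: float) -> bool:
    """Decide once per attribute value (hypothetical hash-based bucketing)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # bucket in 0..9999
    # Values whose bucket falls below the exclusion threshold are excluded.
    return bucket >= exclusion_pct * 100


def index_logs(logs, attribute, exclusion_pct):
    """Return the logs that would be indexed under an attribute-based filter.

    Assumption: every log has `attribute`; the PR does not say what happens
    to logs that lack it.
    """
    return [
        log for log in logs
        if keep_for_value(str(log[attribute]), exclusion_pct)
    ]


# 30 hypothetical logs spread over three `id` values: "1", "2" and "3".
logs = [{"id": str(i % 3 + 1), "message": f"event {i}"} for i in range(30)]
indexed = index_logs(logs, "id", exclusion_pct=66)
print(sorted({log["id"] for log in indexed}))  # the ids whose logs are all kept
```

Because the decision depends only on the value, every log sharing an `id` is either indexed or excluded together. With only three distinct values, the share of values actually kept can differ noticeably from the configured percentage; the ratio is only expected to approach it over many distinct values.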

As discussed, let's focus on the http.request_id example and the trace one and put the emphasis on the fact that the kept traces are complete (which is the key functionality of this new sampling mechanism).

Contributor Author

Yes, session_id and trace_id examples only.
I prefer session instead of request as the latter has no concept of transaction unless we define an exclusion filter with a * query (on every log).


#### Sample all logs of a trace

In order to guarantee the sampling of all logs related to a given trace, define the log sampling on `Trace Id` attribute.

We could be clearer here, as I'm not sure it's crystal clear what we are trying to say.

Something like the following:

Once [logs and traces are connected], a trace_id attribute is automatically added in the logs. In order to be able to sample those logs but make sure that for the chosen trace ids the full set of logs is kept, an attribute-based exclusion filter should be set on the trace_id attribute.

For example, setting exclude 60% of Trace Id, results in:

  • 40% of all traces get all their related logs indexed.
  • 60% of all traces have their logs sampled (still available in livetail and archives)

Contributor Author

Indeed, your wording is better, thanks.

Contributor Author

Actually the last bullet is not correct: for the 60% of traces we don't get any logs indexed.
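To make the corrected behaviour concrete, here is a small simulation of a 60% exclusion on `trace_id`. It is only a sketch under stated assumptions: one seeded random draw per trace stands in for whatever decision mechanism the product really uses, and `sample_by_trace` and the generated logs are hypothetical.

```python
import random
from collections import Counter


def sample_by_trace(logs, exclusion_pct, seed=0):
    """Keep or drop logs trace by trace (hypothetical sketch)."""
    rng = random.Random(seed)
    decisions = {}  # trace_id -> True if that trace's logs are indexed
    indexed = []
    for log in logs:
        trace_id = log["trace_id"]
        if trace_id not in decisions:
            # One draw per trace, so all logs of a trace share the outcome.
            decisions[trace_id] = rng.random() >= exclusion_pct / 100
        if decisions[trace_id]:
            indexed.append(log)
    return indexed, decisions


# 1,000 hypothetical traces with 5 logs each.
logs = [
    {"trace_id": f"t{t}", "message": f"step {s}"}
    for t in range(1000)
    for s in range(5)
]
indexed, decisions = sample_by_trace(logs, exclusion_pct=60)

per_trace = Counter(log["trace_id"] for log in indexed)
kept_traces = sum(decisions.values())
print(f"traces with all logs indexed: {kept_traces / len(decisions):.0%}")
print(f"traces with no logs indexed:  {1 - kept_traces / len(decisions):.0%}")
print(f"partially indexed traces:     {sum(1 for n in per_trace.values() if n != 5)}")
```

Roughly 40% of traces keep their full set of logs and the rest have none indexed; no trace ends up partially indexed, which is the point of sampling on the trace ID rather than on individual logs.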

@pcarioufr pcarioufr added the WORK IN PROGRESS No review needed, it's a wip ;) label Jan 15, 2020
@ruthnaebeck
Contributor

Replacement PR - #6413

@ruthnaebeck ruthnaebeck deleted the gusai-dd-patch-1 branch January 15, 2020 21:21
6 participants