Skip to content

Anonymous users in logs + audit logs #6054

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Tracked by #4726
Emyrk opened this issue Feb 6, 2023 · 7 comments
Closed
Tracked by #4726

Anonymous users in logs + audit logs #6054

Emyrk opened this issue Feb 6, 2023 · 7 comments
Assignees
Labels
design needed Request for more beauty

Comments

@Emyrk
Copy link
Member

Emyrk commented Feb 6, 2023

Problem

We track failed attempts in audit logs and logs, and to generate diffs we are using random uuids's for anonymous users (see #5925).

The issue for this is that the uuids are not obviously anonymous at first glance, and might cause some misunderstandings if logs are consumed programatically. It would be hard to tell if a uuid was random, or maybe some user that was deleted (hard delete).

Tracking the same anonymous user

The other issue with this is that it does not track the same anonymous user. IP address in logs does do this to an extent, and might be sufficient. We could set a cookie on the user's browser some identifying token that does not expire so all unauthenticated requests have the same tracking ID. That way in the audit logs, it is clear the same browser is making many failed requests. Especially useful if there is a group of users behind a NAT since the IP tracking would not distinguish individual users.

Idea

One idea to ensure that it is clear which UUIDs track anonymous users and have no relations to any data in the database, we could:

Static UUID

We could use a static id (eg 11111111-1111-1111-1111-111111111111). This is obvious it is not a real UUID.

The downside is all anonymous users are the same.

Not completely random uuid

We could manipulate a random UUID. UUID's have various parts to them. The RFC states:

https://www.rfc-editor.org/rfc/rfc4122#section-4.4

[4.4](https://www.rfc-editor.org/rfc/rfc4122#section-4.4).  Algorithms for Creating a UUID from Truly Random or
      Pseudo-Random Numbers

   The version 4 UUID is meant for generating UUIDs from truly-random or
   pseudo-random numbers.

   The algorithm is as follows:

   o  Set the two most significant bits (bits 6 and 7) of the
      clock_seq_hi_and_reserved to zero and one, respectively.

   o  Set the four most significant bits (bits 12 through 15) of the
      time_hi_and_version field to the 4-bit version number from
      [Section 4.1.3](https://www.rfc-editor.org/rfc/rfc4122#section-4.1.3).

   o  Set all the other bits to randomly (or pseudo-randomly) chosen
      values.

Standard parts of a uuid (remember that v4 is the one we use and most of this is completely random)

386f74ad-340c-41f9-87ac-6c7c840cd459_What+is+a+UUID_+(1)

We could use a different version uuid, as it has the same format. I don't know what version exactly, but we could use a version (like v1) that uses time for many of the bits, and something static for the Node part. Meaning any "anonymous user" is essentially tracked to the time they were identified as an anonymous user.

Example:
(decode the uuid here to see the time: https://www.uuidtools.com/decode)

3cb47c7a-a637-11ed-afa1-aa0000000000

Some combination of static and random

Same as above, but more random parts instead of using time.

@Emyrk Emyrk added feature design needed Request for more beauty labels Feb 6, 2023
@spikecurtis
Copy link
Contributor

We should not be putting random garbage in audit logs.

It is not a requirement at this time to track anonymous failed login attempts beyond IP address. If the audit code won't correctly drop logs without a user UUID, we should fix the audit code so that it will.

@Kira-Pilot Kira-Pilot mentioned this issue Feb 7, 2023
41 tasks
@Kira-Pilot
Copy link
Member

Related to #6108

@bpmct
Copy link
Member

bpmct commented Feb 9, 2023

Failed log in attempts are a pretty common thing for enterprise products to capture in audit logs. This is something we did in v1: https://www.enterpriseready.io/features/audit-log/

@bpmct
Copy link
Member

bpmct commented Feb 9, 2023

However, it looks like some platforms do not have it: https://gitlab.com/gitlab-org/gitlab/-/issues/233377

@spikecurtis
Copy link
Contributor

@bpmct my position is that we should drop audit logs on failed login attempts, with IP address, but we should not be assigning random UUIDs to these attempts as that can confuse auditors into thinking the UUIDs represent actual coder users. We don't want customers calling us up saying, "why didn't I get any audit logs for this user other than just a failed login attempt?"

@Kira-Pilot
Copy link
Member

@spikecurtis we're not doing that! We adjusted the audit code as you suggested. We still need the concept of an anonymous user (with UUIDs we do track) so we can capture 1) repeated failed logins by complete strangers and 2) users who are joining for the first time and don't have an API Key yet.

@Kira-Pilot
Copy link
Member

After investigating, we have decided to move forward with the following solution:

  1. We now audit users signing in for the first time with OIDC or Github. These will not show as 'unknown user' in the UI, as they were previously. Instead, we grab all their linked user information. You can review this fix here.
  2. We also talked about tracking failed logins by complete strangers, and we have decided not to track these logins at all. This decision was largely inspired by a comment in Gitlab's issue for the same feature request (which they decided not to implement):
...this feature has the potential to increase the severity of a DoS attack.  Actually, that could be why it wasn't implemented in the first place...
Even outside the context of deliberate DoS attacks, this feature will result in publicly-available instances of GitLab logging a lot of junk over time.  There are constantly bots roaming the web trying to log into applications, it's just a fact of life.  Those entries are only interesting when they reach a volume where the IP is worth blocking, and in that case the information should be recorded as an IP block, not as a huge quantity of similar login attempts.
I wish I had recorded where this proposal originated from, because I can't remember now.  Looking at it today, I'm not sure this proposal would produce more value than cost in the big picture.
As for this particular customer's problem, I wonder if we could modify the rate limiting logic to record the username if it's accessible.  (In the context of login it would be accessible as an input parameter.)  How does that sound?

We will continue to audit failed login attempts by known users.

Closing this ticket. The original ticket for the Github/OIDC fix can be found here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
design needed Request for more beauty
Projects
None yet
Development

No branches or pull requests

4 participants