Skip to content

chore: enforce unique linked_ids #12815

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Apr 3, 2024
Merged

chore: enforce unique linked_ids #12815

merged 4 commits into from
Apr 3, 2024

Conversation

Emyrk
Copy link
Member

@Emyrk Emyrk commented Mar 28, 2024

Closes #12116

Follow up of: #12117

What this does

linked_id, login_type should be unique. In the past, deleted users shared the same linked_id, causing conflicts and issue for new users.

This db constraint should prevent that mistake in the future.

Caution

I have a bit of worry running this db migration that can remove linked_ids. linked_id is only populated when a user is created, so this migration cannot have a mistake in it.

If 2 users currently share the same linked_id, that means a single github user has 2 coder accounts using the same github account. By doing linked_id = '', we just make the user login code match on email instead. This isn't great, but the prior situation is also completely ambiguous.

The concern comes from legacy deployments that might have a mistake from previous code that handled linked_ids with less strictness.

Need some serious SQL query review...

Copy link
Member

@mafredri mafredri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Query matches motivation, so I'm approving based on that.

) > 1 THEN '' ELSE out.linked_id END;

-- Enforce unique linked_id constraint on non-empty linked_id
CREATE UNIQUE INDEX idx_user_link_linked_id ON user_links USING btree (linked_id, login_type) WHERE (linked_id != '');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Postgres typically names these with idx as suffix, this would be named ~ user_links_linked_id_login_type_idx.

Also, what's the purpose for allowing this index to be partial, could we not delete the conflicting user_links entries instead?

Copy link
Member Author

@Emyrk Emyrk Apr 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately no. Legacy accounts do not have a linked_id, and at present, we make no attempt to update those values when they log in again.

I debated implementing some code to migrate users to the modern as they log in, but that changes the match behavior. At present, if your linked_id is empty, we use your email to match.

You could fathom a situation where someone has more than 1 github email, and more than 1 coder account with said emails. If I attempted to fix this, the github account would get bound to the next account logged in, which is not deterministic.

So I'd rather just leave the legacy accounts as they were and keep them empty.

-- linked_value, there is no way to determine correctly which user should
-- be updated. Since the linked_id is empty, this value will be linked
-- by email.
UPDATE ONLY user_links out
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
UPDATE ONLY user_links out
UPDATE ONLY user_links AS out

Not necessary, but perhaps more clear.

PS. Is ONLY actually necessary here, is there inheritance involved?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, only because of inheritance.

-- When the count of linked_id is greater than 1, set the linked_id to empty
SELECT COUNT(*)
FROM user_links inn
WHERE out.linked_id = inn.linked_id AND out.login_type = inn.login_type
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to be really explicit about only matching this specific case, should we also check out.user_id != inn.user_id? This obviously has the chance of failing on index creation if this constraint isn't met.

What about checking oauth expiry? Only zero the one with an older expiry? Don't know if that's what we want, just putting ideas out there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The index is only on login_type and linked_id. The issue at present is that 2 users can share the same linked_id. If I add a user_id check, then the counts would be per user, and I could still have a constraint error.

What about checking oauth expiry? Only zero the one with an older expiry? Don't know if that's what we want, just putting ideas out there.

Nah. We should only have 1 user_link per user. Although maybe that is a better migration to apply first. I have that in an issue here: #12116

Fixing that is a more challenging problem though, because you'd have to decide which login is the correct one.

Emyrk added 4 commits April 3, 2024 11:10
Duplicate linked_ids make no sense. 2 users cannot share the same
source user from a provider
@Emyrk Emyrk force-pushed the stevenmasley/unique_linked_id branch from 592312f to 1d54a52 Compare April 3, 2024 18:04
@Emyrk Emyrk merged commit a3187dc into main Apr 3, 2024
@Emyrk Emyrk deleted the stevenmasley/unique_linked_id branch April 3, 2024 18:17
@github-actions github-actions bot locked and limited conversation to collaborators Apr 3, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add db constraints on table user_links
2 participants