Skip to content

feat(coderd): add dbcrypt package #9421

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

feat(coderd): add dbcrypt package #9421

wants to merge 1 commit into from

Conversation

johnstcn
Copy link
Member

@johnstcn johnstcn commented Aug 30, 2023

See #9522 instead.

This commit builds upon the previous work in #7959:

  • Moved dbcrypt package to enterprise/dbcrypt
  • Modified original dbcrypt behaviour to not delete un-decryptable rows.
  • Added a table dbcrypt_sentinel used to determine database encryption status.
  • Added support for multiple encryption keys in dbcrypt.

NOTE: This is part 1 of a 2-part PR. This PR focuses mainly on the dbcrypt and database packages. #9433 adds the required plumbing to integrate this into enterprise/coderd properly.

The original working branch cj/dbcrypt can be viewed here: #9339

This commit builds upon the previous work in #7959:
- Moved dbcrypt package to enterprise/dbcrypt
- Modified original dbcrypt behaviour to not delete
  un-decryptable rows.
- Added a table dbcrypt_sentinel used to determine database
  encryption status.
- Added support for multiple encryption keys in
  dbcrypt.

NOTE: This is part 1 of a 2-part PR. This PR focuses
mainly on the dbcrypt and database packages. A separate
PR will add the required plumbing to integrate this into
enterprise/coderd properly.

Co-authored-by: Kyle Carberry <kyle@coder.com>
}

func (*DecryptFailedError) Unwrap() error {
return sql.ErrNoRows
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the logic behind making failed decryption look like sql.ErrNoRows?

I feel like it will mask operational issues (like having the wrong key), making it take longer for ops to find and fix the problem.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a good answer for that apart from "it felt like the right thing to do"?
But you're correct, making it unwrap to ErrNoRows does mask the case where we can't decrypt a value. In the case of an oauth token, it just means that the user has to re-authenticate. But for other data, we could end up in a weird situation where we select, allegedly get no rows, insert, and hit a duplicate key constraint -- which would be a really annoying problem to figure out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this initially, and it's a dev-ux problem. I was actually considering changing the way we do this in entirety - not making it an abstraction layer of database.Store.

It's always problematic in our code to get ErrNoRows then hit a constraint on insert. We'd have to make everything an upsert, which sounds like a weird constraint.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kylecarbs what were you considering instead? Adding the Cipher as a field accessible in the enterprise coderd API struct, and calling it at the handler level? Or something else?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed with Kyle -- while I agree in principle that this could reduce the overall cognitive overhead, I think this might be better done as a refactor in a follow-up PR instead.

@spikecurtis
Copy link
Contributor

spikecurtis commented Sep 1, 2023

This whole thing feels a little fragile, partially on account of attempting to keep it minimally invasive on the DB structure. In particular, it's not easy to tell that an encryption key is definitely no longer in use. This is very important to security people, especially when you are rotating a key that you think might have been compromised.

The Sentinel is a nice idea, but its current powers are limited. It just makes sure that you have the old key before starting to encrypt with a new key, and it only affects things on startup.

So, if you have multiple coderds working, and update some but not all of them with a new key, you might not immediately notice problems. The old coderds keep working, and when they encounter data they can't decrypt, they make the user re-authenticate and encrypt with the old key. The new coderds don't notice this at all --- they can decrypt data from the old key.

Then, you go and run coder dbcrypt-rotate and encrypt all the data with the new key, and then restart without the old key. In this thought experiment, you miss the same coderds for the same reason, so there are still some coderds out there happily encrypting with the old key you thought you got rid of.


Instead of a singleton sentinel, consider a table of key identities

CREATE TABLE dbcrypt_keys (
    number int NOT NULL PRIMARY KEY,
    active_key_id text,
    revoked_key_id text,
    test text
);

Active keys have the active_key_id set, and revoked keys have the revoked_key_id set (for posterity). test is like the current sentinel, it contains the string coder encrypted with the key, so you can verify that you have the matching key by trying to decrypt it.

For encrypted colums, we break them into two columns: a xxxxxxx_key_id column that has a foreign key relationship to dbcrypt_keys.active_key_id and the regular value column that contains unencrypted data (if key_id is NULL) or encrypted data. The foreign key relationship prevents revoking the key if there is still data encrypted with it, and prevents new data from being inserted once it is revoked.

To rotate a key, you do a transaction that reads the dbcrypt_keys table, verifies you have all the active keys, and then inserts a new key with the next number. The number is a primary key for the table, so multiple coderds can't insert different successor keys.

Then, you migrate all the data off they key you want to revoke, and update the row to NULL active_key_id and set revoked_key_id. Once revoked, coderds can be safely restarted without the key, as they just have to check that they have all active keys.

@johnstcn
Copy link
Member Author

johnstcn commented Sep 1, 2023

Instead of a singleton sentinel, consider a table of key identities

CREATE TABLE dbcrypt_keys (
    number int NOT NULL PRIMARY KEY,
    active_key_id text,
    revoked_key_id text,
    test text
);

Active keys have the active_key_id set, and revoked keys have the revoked_key_id set (for posterity). test is like the current sentinel, it contains the string coder encrypted with the key, so you can verify that you have the matching key by trying to decrypt it.

For encrypted colums, we break them into two columns: a xxxxxxx_key_id column that has a foreign key relationship to dbcrypt_keys.active_key_id and the regular value column that contains unencrypted data (if key_id is NULL) or encrypted data. The foreign key relationship prevents revoking the key if there is still data encrypted with it, and prevents new data from being inserted once it is revoked.

To rotate a key, you do a transaction that reads the dbcrypt_keys table, verifies you have all the active keys, and then inserts a new key with the next number. The number is a primary key for the table, so multiple coderds can't insert different successor keys.

Then, you migrate all the data off they key you want to revoke, and update the row to NULL active_key_id and set revoked_key_id. Once revoked, coderds can be safely restarted without the key, as they just have to check that they have all active keys.

I was initially avoiding making too many schema changes, but this has a number of nice properties:

  1. We can drop the dbcrypt- magic prefix as the null-ness of xxx_key_id determines if the data is encrypted or not,
  2. We can additionally stop prefixing the encrypted value with the key's checksum, as (I assume at least) we would be storing the key checksum in the xxx_key_id column.
  3. We have schema-level referential integrity of encrypted data (cannot overstate how huge this is)

It also has some cons:

  1. You get a proliferation of xxx_key_id columns everywhere that the the AGPL code needs to be at least cursorily aware of even though they should never be set in AGPL code. This is fine from a licensing perspective though as the structs, schema etc. are all AGPL.
  2. Adding new encrypted fields becomes a database migration (although arguably this is also a plus)

Overall it definitely seems like a useful and worthwhile change 👍

@johnstcn
Copy link
Member Author

johnstcn commented Sep 5, 2023

Closing in favour of #9522

@johnstcn johnstcn closed this Sep 5, 2023
@github-actions github-actions bot locked and limited conversation to collaborators Sep 5, 2023
@johnstcn johnstcn deleted the cj/dbcrypt_1 branch October 13, 2023 12:19
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants