Skip to content

feat: Add database data generator to make fakedbs easier to populate #5922

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Jan 31, 2023

Conversation

Emyrk
Copy link
Member

@Emyrk Emyrk commented Jan 30, 2023

What this does

Supports an easy way to populate random data into a fake database for unit testing.

How this works

You can create a generator around a fake database.

gen := databasefake.NewGenerator(t, db)

And generate whatever objects you want. If you omit a field, that field is randomly generated.

user := gen.User(ctx, database.User{})
template := gen.Template(ctx, database.Template{})
workspace := gen.Workspace(ctx, database.Workspace{
	OwnerID:    user.ID,
	TemplateID: template.ID,
})

Why?

I think there is a good reason to use fake db, and not need to use Postgres for all tests. Postgres tests are much slower, and sometimes using postgres does not provide any additional value.

We often make setup functions for these tests to pre-populate data. This is ok, but does constrain each test case to use the same setup. The setup also has to write code like this:

orgID := uuid.New()
organization, err := db.InsertOrganization(ctx, database.InsertOrganizationParams{
ID: orgID,
Name: "banana",
Description: "wowie",
CreatedAt: database.Now(),
UpdatedAt: database.Now(),
})
require.NoError(t, err)
group, err := db.InsertGroup(ctx, database.InsertGroupParams{
ID: uuid.New(),
Name: "yeww",
OrganizationID: organization.ID,
})
require.NoError(t, err)

This code is repetitive and kinda annoying to write. The goal of this PR to is make pre-populating data easy. That previous code now looks like this:

db := databasefake.New()
gen := databasefake.NewGenerator(t, db)
group := gen.Group(ctx, "group", database.Group{})

This PR also supports some more complex syntax and out-of-order objects.

More complex API

A more complex api also exists. This is to support data sets as a single object to be reused, or easily stored in a field.

Example:

// This creates a user, a template, a template version, a workspace, a build, a file, and
// a provisioner job. All related.
data := gen.Populate(ctx, map[string]interface{}{
	"u-one": database.User{},
	"t-one": database.Template{},
	"tv-one": database.TemplateVersion{
		TemplateID: uuid.NullUUID{
			UUID:  gen.Lookup("t-one"),
			Valid: true,
		},
	},
	"w-one": database.Workspace{
		Name:       "peter-pan",
		OwnerID:    gen.Lookup("u-one"),
		TemplateID: gen.Lookup("t-one"),
	},
	"b-one": database.WorkspaceBuild{
		JobID:       gen.Lookup("j-one"),
		WorkspaceID: gen.Lookup("w-one"),
	},
	"f-one": database.File{},
	"j-one": database.ProvisionerJob{
		InitiatorID:   gen.Lookup("u-one"),
		Provisioner:   database.ProvisionerTypeEcho,
		StorageMethod: database.ProvisionerStorageMethodFile,
		FileID:        gen.Lookup("f-one"),
		Type:          database.ProvisionerJobTypeWorkspaceBuild,
		Input: must(json.Marshal(provisionerdserver.WorkspaceProvisionJob{
			WorkspaceBuildID: gen.Lookup("b-one"),
		})),
	},
})

One example shown in groupparm_test.go. More examples coming
Comment on lines 27 to 28
gen := databasefake.NewGenerator(t, db)
group := gen.Group(ctx, "group", database.Group{})
Copy link
Member Author

@Emyrk Emyrk Jan 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do ya'll think about this syntax?

Also supported:

org := gen.Organization(ctx, "group-org", database.Organization{})
group := gen.Group(ctx, "group", database.Group{
	OrganizationID: gen.Lookup("group-org"),
	// OR
	// OrganizationID: org.ID,
})

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also go out of order with lookup:

group := gen.Group(ctx, "group", database.Group{
	OrganizationID: gen.Lookup("group-org"),
})
org := gen.Organization(ctx, "group-org", database.Organization{})

@Emyrk Emyrk marked this pull request as ready for review January 30, 2023 21:27
@Emyrk
Copy link
Member Author

Emyrk commented Jan 30, 2023

I refactored a lot of examples. I can do more.


// PrimaryOrg is to keep all resources in the same default org if not
// specified.
func (g *Generator) PrimaryOrg(ctx context.Context) database.Organization {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Member

@mafredri mafredri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty cool, nice work!


func must[T any](value T, err error) T {
if err != nil {
panic(err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should avoid panicing in tests, maybe pass t and do t.Fatal?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you look at where I do it, it is used for json.Marshal on a standard struct. This should absolutely never fail. I cannot add another arg, because you cannot do:

must(t, json.Marshal())

You can't use multiple returns and an argument like that 😢. The must is a nice way to keep the context in the struct body.

@Emyrk Emyrk requested review from mafredri and removed request for kylecarbs January 31, 2023 16:16
Copy link
Member

@kylecarbs kylecarbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suppose the intent of this is to reduce code duplication in favor of creating a common abstraction for populating the database. This PR does the opposite in that case because it nets positive on lines of code.

I'm not sure we should add this level of abstraction right now. I like the experiment, but it's unclear why I'd use this when the lines aren't dramatically reduced, and it becomes more difficult to understand what gets inserted.

Just my 2c, but I'm open to others' thoughts!

@Emyrk
Copy link
Member Author

Emyrk commented Jan 31, 2023

I had something similar before. I had a custom case for every enum (because reflect can't get the consts that are valid). I was having trouble with supporting nested structs, which I guess we do not need to support.

What do you do for things like ExpiredAt and CreatedAt though? Both time.Time{}, but one should be in the future, one should be now or in the past. So you either need struct tags, or do something with the field names.

Other fields which need custom handling off the top of my head:

  • All enums
  • RbacRoles is []string{}
  • Name is usually regex limited. We could just make all string alpha only.
  • AvatarURL is a string, should be a valid url
  • []byte is sometimes json, sometimes a hash
  • IPAddress probably handle custom

I also don't want truly random data. If you don not specify RBACRoles, we should default to a member. Provisioner should be echo for tests. So the faker should accept some defaults instead of random in many cases no?

@kylecarbs
Copy link
Member

I guess in my mind we shouldn't handle these bespoke situations with a generator anyways, because they are bespoke, so why generalize it?

@Emyrk
Copy link
Member Author

Emyrk commented Jan 31, 2023

Pretty much all fakers allow domain knowledge in the generator.

The problem with truly random data, is that is usually isn't useful. If we go that route, it's more like fuzzing than faking. If that is what we want, I can support the reflect path better.


Just to put the differences in the switch statement.

	case database.Workspace:
		workspace, err := db.InsertWorkspace(ctx, database.InsertWorkspaceParams{
			ID:                takeFirst(orig.ID, uuid.New()),
			OwnerID:           takeFirst(orig.OwnerID, uuid.New()),
			CreatedAt:         takeFirst(orig.CreatedAt, time.Now()),
			UpdatedAt:         takeFirst(orig.UpdatedAt, time.Now()),
			OrganizationID:    takeFirst(orig.OrganizationID, uuid.New()),
			TemplateID:        takeFirst(orig.TemplateID, uuid.New()),
			Name:              takeFirst(orig.Name, namesgenerator.GetRandomName(1)),
			AutostartSchedule: orig.AutostartSchedule,
			Ttl:               orig.Ttl,
		})
		require.NoError(t, err, "insert workspace")
		return workspace

vs

	case database.Workspace:
        // Reflect based random data seeded with the original data.
		value := random(orig)
		workspace, err :=  db.InsertWorkspace(ctx, value)
		require.NoError(t, err, "insert workspace")
		return workspace

@Emyrk
Copy link
Member Author

Emyrk commented Jan 31, 2023

I think too the reflect based method would be inherently more challenging to maintain (unless we use a library). I think the current code is pretty straight forward for a new person to look at.

@kylecarbs
Copy link
Member

Hmm, I see what you mean. I think what I'd prefer here is:

package databasegen

func New(db database.Store) *Generator {
  return &Generator{db}
}

type Generator struct {
  db database.Store
}

func (g *Generator) Workspace(workspace database.Workspace) database.Workspace {
  if workspace.RequiredField == nil {
    panic("asdf")
  } 

}

func (g *Generator) APIKey() (database.APIKey, string) {

}

Kinda a combo of what you have now and something new. I don't think it should go in databasefake, because it's not specific to that abstraction (if I understand correctly), and I want PSQL tests to benefit from this as well.

@Emyrk
Copy link
Member Author

Emyrk commented Jan 31, 2023

I liked the idea of removing state, so I updated the code to look like this: https://github.com/coder/coder/blob/stevenmasley/fake_data/coderd/database/databasefake/generator.go

You are right about not storing state in the generator imo. This change does remove the need for the switch, so I could do:

func Template(t *testing.T, db database.Store, seed database.Template) database.Template {
	template, err := db.InsertTemplate(context.Background(), database.InsertTemplateParams{
		ID:                           takeFirst(seed.ID, uuid.New()),
		CreatedAt:                    takeFirst(seed.CreatedAt, time.Now()),
		UpdatedAt:                    takeFirst(seed.UpdatedAt, time.Now()),
		OrganizationID:               takeFirst(seed.OrganizationID, uuid.New()),
		Name:                         takeFirst(seed.Name, namesgenerator.GetRandomName(1)),
		Provisioner:                  takeFirst(seed.Provisioner, database.ProvisionerTypeEcho),
		ActiveVersionID:              takeFirst(seed.ActiveVersionID, uuid.New()),
		Description:                  takeFirst(seed.Description, namesgenerator.GetRandomName(1)),
		DefaultTTL:                   takeFirst(seed.DefaultTTL, 3600),
		CreatedBy:                    takeFirst(seed.CreatedBy, uuid.New()),
		Icon:                         takeFirst(seed.Icon, namesgenerator.GetRandomName(1)),
		UserACL:                      seed.UserACL,
		GroupACL:                     seed.GroupACL,
		DisplayName:                  takeFirst(seed.DisplayName, namesgenerator.GetRandomName(1)),
		AllowUserCancelWorkspaceJobs: takeFirst(seed.AllowUserCancelWorkspaceJobs, true),
	})
	require.NoError(t, err, "insert template")
	return template
}

We can add required fields, but at present no objects have any required fields. I put it in the databasefake package originally because for postgres you have the problem of foreign key constraints. I can let people use it for postgres though, just know the generator does not have the intelligence to know of dependencies.

@Emyrk
Copy link
Member Author

Emyrk commented Jan 31, 2023

Example of the new package:

func Template(t *testing.T, db database.Store, seed database.Template) database.Template {
template, err := db.InsertTemplate(context.Background(), database.InsertTemplateParams{
ID: takeFirst(seed.ID, uuid.New()),
CreatedAt: takeFirst(seed.CreatedAt, time.Now()),
UpdatedAt: takeFirst(seed.UpdatedAt, time.Now()),
OrganizationID: takeFirst(seed.OrganizationID, uuid.New()),
Name: takeFirst(seed.Name, namesgenerator.GetRandomName(1)),
Provisioner: takeFirst(seed.Provisioner, database.ProvisionerTypeEcho),
ActiveVersionID: takeFirst(seed.ActiveVersionID, uuid.New()),
Description: takeFirst(seed.Description, namesgenerator.GetRandomName(1)),
DefaultTTL: takeFirst(seed.DefaultTTL, 3600),
CreatedBy: takeFirst(seed.CreatedBy, uuid.New()),
Icon: takeFirst(seed.Icon, namesgenerator.GetRandomName(1)),
UserACL: seed.UserACL,
GroupACL: seed.GroupACL,
DisplayName: takeFirst(seed.DisplayName, namesgenerator.GetRandomName(1)),
AllowUserCancelWorkspaceJobs: takeFirst(seed.AllowUserCancelWorkspaceJobs, true),
})
require.NoError(t, err, "insert template")
return template
}
func APIKey(t *testing.T, db database.Store, seed database.APIKey) (key database.APIKey, secret string) {
id, _ := cryptorand.String(10)
secret, _ = cryptorand.String(22)
hashed := sha256.Sum256([]byte(secret))
key, err := db.InsertAPIKey(context.Background(), database.InsertAPIKeyParams{
ID: takeFirst(seed.ID, id),
// 0 defaults to 86400 at the db layer
LifetimeSeconds: takeFirst(seed.LifetimeSeconds, 0),
HashedSecret: takeFirstBytes(seed.HashedSecret, hashed[:]),
IPAddress: pqtype.Inet{},
UserID: takeFirst(seed.UserID, uuid.New()),
LastUsed: takeFirst(seed.LastUsed, time.Now()),
ExpiresAt: takeFirst(seed.ExpiresAt, time.Now().Add(time.Hour)),
CreatedAt: takeFirst(seed.CreatedAt, time.Now()),
UpdatedAt: takeFirst(seed.UpdatedAt, time.Now()),
LoginType: takeFirst(seed.LoginType, database.LoginTypePassword),
Scope: takeFirst(seed.Scope, database.APIKeyScopeAll),
})
require.NoError(t, err, "insert api key")
return key, secret
}
func Workspace(t *testing.T, db database.Store, orig database.Workspace) database.Workspace {
workspace, err := db.InsertWorkspace(context.Background(), database.InsertWorkspaceParams{
ID: takeFirst(orig.ID, uuid.New()),
OwnerID: takeFirst(orig.OwnerID, uuid.New()),
CreatedAt: takeFirst(orig.CreatedAt, time.Now()),
UpdatedAt: takeFirst(orig.UpdatedAt, time.Now()),
OrganizationID: takeFirst(orig.OrganizationID, uuid.New()),
TemplateID: takeFirst(orig.TemplateID, uuid.New()),
Name: takeFirst(orig.Name, namesgenerator.GetRandomName(1)),
AutostartSchedule: orig.AutostartSchedule,
Ttl: orig.Ttl,
})
require.NoError(t, err, "insert workspace")
return workspace
}
func WorkspaceBuild(t *testing.T, db database.Store, orig database.WorkspaceBuild) database.WorkspaceBuild {
build, err := db.InsertWorkspaceBuild(context.Background(), database.InsertWorkspaceBuildParams{
ID: takeFirst(orig.ID, uuid.New()),
CreatedAt: takeFirst(orig.CreatedAt, time.Now()),
UpdatedAt: takeFirst(orig.UpdatedAt, time.Now()),
WorkspaceID: takeFirst(orig.WorkspaceID, uuid.New()),
TemplateVersionID: takeFirst(orig.TemplateVersionID, uuid.New()),
BuildNumber: takeFirst(orig.BuildNumber, 0),
Transition: takeFirst(orig.Transition, database.WorkspaceTransitionStart),
InitiatorID: takeFirst(orig.InitiatorID, uuid.New()),
JobID: takeFirst(orig.JobID, uuid.New()),
ProvisionerState: takeFirstBytes(orig.ProvisionerState, []byte{}),
Deadline: takeFirst(orig.Deadline, time.Now().Add(time.Hour)),
Reason: takeFirst(orig.Reason, database.BuildReasonInitiator),
})
require.NoError(t, err, "insert workspace build")
return build
}
func User(t *testing.T, db database.Store, orig database.User) database.User {
user, err := db.InsertUser(context.Background(), database.InsertUserParams{
ID: takeFirst(orig.ID, uuid.New()),
Email: takeFirst(orig.Email, namesgenerator.GetRandomName(1)),
Username: takeFirst(orig.Username, namesgenerator.GetRandomName(1)),
HashedPassword: takeFirstBytes(orig.HashedPassword, []byte{}),
CreatedAt: takeFirst(orig.CreatedAt, time.Now()),
UpdatedAt: takeFirst(orig.UpdatedAt, time.Now()),
RBACRoles: []string{},
LoginType: takeFirst(orig.LoginType, database.LoginTypePassword),
})
require.NoError(t, err, "insert user")
return user
}
func Organization(t *testing.T, db database.Store, ctx context.Context, orig database.Organization) database.Organization {
org, err := db.InsertOrganization(context.Background(), database.InsertOrganizationParams{
ID: takeFirst(orig.ID, uuid.New()),
Name: takeFirst(orig.Name, namesgenerator.GetRandomName(1)),
Description: takeFirst(orig.Description, namesgenerator.GetRandomName(1)),
CreatedAt: takeFirst(orig.CreatedAt, time.Now()),
UpdatedAt: takeFirst(orig.UpdatedAt, time.Now()),
})
require.NoError(t, err, "insert organization")
return org
}
func Group(t *testing.T, db database.Store, ctx context.Context, orig database.Group) database.Group {
group, err := db.InsertGroup(context.Background(), database.InsertGroupParams{
ID: takeFirst(orig.ID, uuid.New()),
Name: takeFirst(orig.Name, namesgenerator.GetRandomName(1)),
OrganizationID: takeFirst(orig.OrganizationID, uuid.New()),
AvatarURL: takeFirst(orig.AvatarURL, "https://logo.example.com"),
QuotaAllowance: takeFirst(orig.QuotaAllowance, 0),
})
require.NoError(t, err, "insert group")
return group
}
func ProvisionerJob(t *testing.T, db database.Store, ctx context.Context, orig database.ProvisionerJob) database.ProvisionerJob {
job, err := db.InsertProvisionerJob(context.Background(), database.InsertProvisionerJobParams{
ID: takeFirst(orig.ID, uuid.New()),
CreatedAt: takeFirst(orig.CreatedAt, time.Now()),
UpdatedAt: takeFirst(orig.UpdatedAt, time.Now()),
OrganizationID: takeFirst(orig.OrganizationID, uuid.New()),
InitiatorID: takeFirst(orig.InitiatorID, uuid.New()),
Provisioner: takeFirst(orig.Provisioner, database.ProvisionerTypeEcho),
StorageMethod: takeFirst(orig.StorageMethod, database.ProvisionerStorageMethodFile),
FileID: takeFirst(orig.FileID, uuid.New()),
Type: takeFirst(orig.Type, database.ProvisionerJobTypeWorkspaceBuild),
Input: takeFirstBytes(orig.Input, []byte("{}")),
Tags: orig.Tags,
})
require.NoError(t, err, "insert job")
return job
}
func WorkspaceResource(t *testing.T, db database.Store, ctx context.Context, orig database.WorkspaceResource) database.WorkspaceResource {
resource, err := db.InsertWorkspaceResource(context.Background(), database.InsertWorkspaceResourceParams{
ID: takeFirst(orig.ID, uuid.New()),
CreatedAt: takeFirst(orig.CreatedAt, time.Now()),
JobID: takeFirst(orig.JobID, uuid.New()),
Transition: takeFirst(orig.Transition, database.WorkspaceTransitionStart),
Type: takeFirst(orig.Type, "fake_resource"),
Name: takeFirst(orig.Name, namesgenerator.GetRandomName(1)),
Hide: takeFirst(orig.Hide, false),
Icon: takeFirst(orig.Icon, ""),
InstanceType: sql.NullString{
String: takeFirst(orig.InstanceType.String, ""),
Valid: takeFirst(orig.InstanceType.Valid, false),
},
DailyCost: takeFirst(orig.DailyCost, 0),
})
require.NoError(t, err, "insert resource")
return resource
}
func File(t *testing.T, db database.Store, ctx context.Context, orig database.File) database.File {
file, err := db.InsertFile(context.Background(), database.InsertFileParams{
ID: takeFirst(orig.ID, uuid.New()),
Hash: takeFirst(orig.Hash, hex.EncodeToString(make([]byte, 32))),
CreatedAt: takeFirst(orig.CreatedAt, time.Now()),
CreatedBy: takeFirst(orig.CreatedBy, uuid.New()),
Mimetype: takeFirst(orig.Mimetype, "application/x-tar"),
Data: takeFirstBytes(orig.Data, []byte{}),
})
require.NoError(t, err, "insert file")
return file
}
func UserLink(t *testing.T, db database.Store, ctx context.Context, orig database.UserLink) database.UserLink {
link, err := db.InsertUserLink(context.Background(), database.InsertUserLinkParams{
UserID: takeFirst(orig.UserID, uuid.New()),
LoginType: takeFirst(orig.LoginType, database.LoginTypeGithub),
LinkedID: takeFirst(orig.LinkedID),
OAuthAccessToken: takeFirst(orig.OAuthAccessToken, uuid.NewString()),
OAuthRefreshToken: takeFirst(orig.OAuthAccessToken, uuid.NewString()),
OAuthExpiry: takeFirst(orig.OAuthExpiry, time.Now().Add(time.Hour*24)),
})
require.NoError(t, err, "insert link")
return link
}
func TemplateVersion(t *testing.T, db database.Store, orig database.TemplateVersion) database.TemplateVersion {
version, err := db.InsertTemplateVersion(context.Background(), database.InsertTemplateVersionParams{
ID: takeFirst(orig.ID, uuid.New()),
TemplateID: uuid.NullUUID{
UUID: takeFirst(orig.TemplateID.UUID, uuid.New()),
Valid: takeFirst(orig.TemplateID.Valid, true),
},
OrganizationID: takeFirst(orig.OrganizationID, uuid.New()),
CreatedAt: takeFirst(orig.CreatedAt, time.Now()),
UpdatedAt: takeFirst(orig.UpdatedAt, time.Now()),
Name: takeFirst(orig.Name, namesgenerator.GetRandomName(1)),
Readme: takeFirst(orig.Readme, namesgenerator.GetRandomName(1)),
JobID: takeFirst(orig.JobID, uuid.New()),
CreatedBy: takeFirst(orig.CreatedBy, uuid.New()),
})
require.NoError(t, err, "insert template version")
return version

No generics (except for take), no reflect. Actually pretty clean and similar to our current helper functions we write.

@kylecarbs
Copy link
Member

Agreed. I like that @Emyrk

@kylecarbs
Copy link
Member

We should move it out of databasefake tho. I think it can be in coderd/database/databasegen

@Emyrk
Copy link
Member Author

Emyrk commented Jan 31, 2023

Agreed. This came out way cleaner. Will refactor the tests I changed to use this.

@Emyrk
Copy link
Member Author

Emyrk commented Jan 31, 2023

It also will work with postgres, the caller just needs to handle the constraints on their own.

Copy link
Member

@kylecarbs kylecarbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I like the simplicity!

Thanks for iterating here @Emyrk. Your explanations helped drive a better UX for all!

Copy link
Member

@kylecarbs kylecarbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might wanna add a package comment to generator.go that denotes that fields will be generated if not provided. It's pretty implicit, but might be helpful to make it explicit.

Copy link
Member

@johnstcn johnstcn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is nice and clean. 👍
Thanks for the input @kylecarbs!
Just a few import ordering nits, and potential bikeshedding over the name dbgen instead of databasegen.

@Emyrk
Copy link
Member Author

Emyrk commented Jan 31, 2023

I think you are right, dbgen it is, then I'll merge

@Emyrk Emyrk merged commit 4a6fc40 into main Jan 31, 2023
@Emyrk Emyrk deleted the stevenmasley/fake_data branch January 31, 2023 21:10
@github-actions github-actions bot locked and limited conversation to collaborators Jan 31, 2023
@kylecarbs
Copy link
Member

We should probably rename databasefake to dbfake then. Consistency is important, so I didn't want to break it just for this package.

@kylecarbs
Copy link
Member

@Emyrk thoughts on the above?

@Emyrk
Copy link
Member Author

Emyrk commented Feb 2, 2023

I like dbfake.

We havedbtestutil and dbtype already.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants