[Feature Store] Feature group creation: provide a DataCatalogConfig while enabling glue table creation #2916

Closed
@simonvdk

Use case
Create a feature group with automatic Glue table creation for the offline store metadata, while also configuring the Glue data catalog database and table names.

Issue encountered
It seems that providing a DataCatalogConfig and setting disable_glue_table_creation to false are mutually exclusive:

  • I can either not configure the glue database and table names and enable the glue table creation, so that the glue table with the default name and database is created upon feature group creation
  • OR I can provide a DataCatalogConfig but then I have to disable the glue table creation, so that the requested glue table is not created upon feature group creation

But I cannot provide a DataCatalogConfig and enable the glue table creation. Error encountered:

An error occurred (ValidationException) when calling the CreateFeatureGroup operation: Validation Error: DataCatalogConfig is not permitted in the request unless AutoCreateGlueTable is turned off. Please either set AutoCreateGlueTable to false or remove DataCatalogConfig from the request.

Why this seems to be an issue:

  • This mutual exclusivity is not mentioned in the documentation. The documentation also gives no further explanation or example of how to configure the offline store data catalog.
  • Given the current state of the documentation, a user may reasonably want to set the name of the Glue database and table where the offline store metadata will be stored, while still benefiting from automatic Glue table creation upon feature group creation (with the schema, storage descriptor, etc. derived from the feature group information).
  • This extract from the Java SDK documentation seems to indicate that DataCatalogConfig should not be mutually exclusive with automatic table creation.

Ways to reproduce issue
Reproduced with AWS SDK (2.50.0) and AWS CLI.
Providing an OfflineStoreConfig with both DisableGlueTableCreation=False and a DataCatalogConfig that names an existing Glue database and a Glue table that does not yet exist raises the above error. Providing the same DataCatalogConfig with DisableGlueTableCreation=True does not raise an error, but the Glue table is not created either.

Example with AWS CLI:

aws sagemaker create-feature-group --cli-input-json '{"EventTimeFeatureName": "timestamp", "Description": "", "RecordIdentifierFeatureName": "record_id", "FeatureDefinitions": [{"FeatureName": "record_id", "FeatureType": "Integral"}, {"FeatureName": "timestamp", "FeatureType": "String"}], "OfflineStoreConfig": {"S3StorageConfig": {"S3Uri": "s3://my_bucket/my_prefix", "KmsKeyId": "arn:aws:kms:region:account_id:key/key_id"}, "DataCatalogConfig": {"TableName": "my_table", "Catalog": "account_id", "Database": "my_db"}, "DisableGlueTableCreation": false}, "FeatureGroupName": "my-feature-group"}'
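
For readers who build the request programmatically, here is a minimal Python sketch of the same OfflineStoreConfig payload. The helper name build_offline_store_config is hypothetical, and its local check only mirrors the validation behaviour observed in this report; it is not a documented API contract and may not match what the service does in other versions. The resulting dict is what would be passed as the OfflineStoreConfig argument of a CreateFeatureGroup call; no AWS call is made here.

```python
import json


def build_offline_store_config(s3_uri, kms_key_id=None, data_catalog=None,
                               disable_glue_table_creation=False):
    """Build an OfflineStoreConfig dict (hypothetical helper, not part of any SDK).

    Raises ValueError locally for the combination that the service rejected
    in this report: a DataCatalogConfig together with automatic Glue table
    creation (DisableGlueTableCreation=False).
    """
    if data_catalog is not None and not disable_glue_table_creation:
        raise ValueError(
            "DataCatalogConfig was rejected unless DisableGlueTableCreation "
            "is True (behaviour observed in this issue)"
        )

    s3_config = {"S3Uri": s3_uri}
    if kms_key_id is not None:
        s3_config["KmsKeyId"] = kms_key_id

    config = {
        "S3StorageConfig": s3_config,
        "DisableGlueTableCreation": disable_glue_table_creation,
    }
    if data_catalog is not None:
        config["DataCatalogConfig"] = dict(data_catalog)
    return config


# Placeholder values copied from the CLI example above.
catalog = {"TableName": "my_table", "Catalog": "account_id", "Database": "my_db"}

# Case 1: default Glue database/table names, table created automatically.
default_names = build_offline_store_config("s3://my_bucket/my_prefix")

# Case 2: custom names, but automatic table creation must be disabled.
custom_names = build_offline_store_config(
    "s3://my_bucket/my_prefix",
    data_catalog=catalog,
    disable_glue_table_creation=True,
)

# Case 3: custom names with automatic creation, the combination reported here.
try:
    build_offline_store_config("s3://my_bucket/my_prefix", data_catalog=catalog)
    rejected = False
except ValueError:
    rejected = True

print(json.dumps(custom_names, indent=2))
```

Failing early on the client side only saves a round trip; it does not change the underlying limitation described above.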

Expected output
Clearer documentation on how to configure the offline store data catalog (e.g. with an example notebook) and, if possible, the ability to configure the data catalog while still benefiting from automatic Glue table creation.

NB: A similar issue has been opened on the aws-cli repository
