Description
Use case
Create a feature group whose offline store Glue table is created automatically, while also choosing the Glue Data Catalog database and table names used for the offline store metadata.
Issue encountered
It seems that providing a DataCatalogConfig and setting disable_glue_table_creation to false are mutually exclusive:
- Either I leave the Glue database and table names unconfigured and enable Glue table creation, in which case a Glue table with the default name is created in the default database when the feature group is created
- OR I provide a DataCatalogConfig, in which case I have to disable Glue table creation, and the requested Glue table is not created when the feature group is created
I cannot both provide a DataCatalogConfig and enable Glue table creation. Error encountered:
An error occurred (ValidationException) when calling the CreateFeatureGroup operation: Validation Error: DataCatalogConfig is not permitted in the request unless AutoCreateGlueTable is turned off. Please either set AutoCreateGlueTable to false or remove DataCatalogConfig from the request.
Why this seems to be an issue:
- This mutually exclusive behaviour is not mentioned in the documentation. The documentation also has no further mention or example of how to configure the offline store data catalog
- Given the current state of the documentation, a user may reasonably want to choose the name of the Glue database and table where the offline store metadata will be stored, while still benefiting from Glue table creation upon feature group creation (with all the configuration, such as schema and storage descriptor, derived from the feature group information)
- This extract from the Java SDK documentation seems to indicate that the DataCatalogConfig should not be mutually exclusive with automatic table creation
Ways to reproduce issue
Reproduced with AWS SDK (2.50.0) and AWS CLI.
Providing an OfflineStoreConfig with both DisableGlueTableCreation=False and a DataCatalogConfig that names a Glue database (already created) and a Glue table (that does not yet exist) raises the above error. Providing the DataCatalogConfig with DisableGlueTableCreation=True does not raise, but the Glue table is not created either.
Example with AWS CLI:
aws sagemaker create-feature-group --cli-input-json '{
  "FeatureGroupName": "my-feature-group",
  "Description": "",
  "RecordIdentifierFeatureName": "record_id",
  "EventTimeFeatureName": "timestamp",
  "FeatureDefinitions": [
    {"FeatureName": "record_id", "FeatureType": "Integral"},
    {"FeatureName": "timestamp", "FeatureType": "String"}
  ],
  "OfflineStoreConfig": {
    "S3StorageConfig": {
      "S3Uri": "s3://my_bucket/my_prefix",
      "KmsKeyId": "arn:aws:kms:region:account_id:key/key_id"
    },
    "DataCatalogConfig": {"TableName": "my_table", "Catalog": "account_id", "Database": "my_db"},
    "DisableGlueTableCreation": false
  }
}'
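For reference, the same request can be sketched in Python with boto3. This is only an illustration of the reproduction, not necessarily the exact SDK code path I used: the helper names build_request and try_create are hypothetical, all values are the same placeholders as in the CLI example, and the optional Description field is left out. The parameter names are those of the CreateFeatureGroup API.

```python
def build_request(disable_glue_table_creation: bool) -> dict:
    """Build CreateFeatureGroup parameters mirroring the CLI example.

    Hypothetical helper; bucket, key, account and table names are placeholders.
    """
    return {
        "FeatureGroupName": "my-feature-group",
        "RecordIdentifierFeatureName": "record_id",
        "EventTimeFeatureName": "timestamp",
        "FeatureDefinitions": [
            {"FeatureName": "record_id", "FeatureType": "Integral"},
            {"FeatureName": "timestamp", "FeatureType": "String"},
        ],
        "OfflineStoreConfig": {
            "S3StorageConfig": {
                "S3Uri": "s3://my_bucket/my_prefix",
                "KmsKeyId": "arn:aws:kms:region:account_id:key/key_id",
            },
            # Custom Glue database/table names for the offline store metadata.
            "DataCatalogConfig": {
                "TableName": "my_table",
                "Catalog": "account_id",
                "Database": "my_db",
            },
            "DisableGlueTableCreation": disable_glue_table_creation,
        },
    }


def try_create(request: dict) -> str:
    """Send the request and return "created" or the service error code.

    Hypothetical helper; needs boto3 and valid AWS credentials to run.
    """
    import boto3
    from botocore.exceptions import ClientError

    client = boto3.client("sagemaker")
    try:
        client.create_feature_group(**request)
        return "created"
    except ClientError as err:
        return err.response["Error"]["Code"]


# Usage (requires AWS access, so left commented out):
# try_create(build_request(disable_glue_table_creation=False))
# try_create(build_request(disable_glue_table_creation=True))
```

Based on the behaviour described above, I would expect the first call to return the ValidationException code, and the second to succeed without creating the Glue table.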
Expected output
Clearer documentation on how to configure the offline store data catalog (e.g. with an example in a notebook), and ideally the ability to configure the data catalog while still benefiting from automatic Glue table creation
NB: A similar issue has been opened on the aws-cli repository