Description
Describe the feature you'd like
In non-local mode, I can pass a role to each step of my SageMaker Pipeline, and this role is assumed by that step. This helps scope permissions narrowly, and it works around the IAM limits of 10 managed policies per role and 10,240 characters for inline policies.
```python
processor_select_delivery_periods = sagemaker.processing.Processor(
    entrypoint=["bash", "processing.sh"],
    role=role_arn,  # <- pass role for this specific step
    instance_type="ml.m5.large",
    instance_count=1,
    sagemaker_session=signals.sagemaker.pipeline_session.cache,
)
step_args = processor_select_delivery_periods.run(
    arguments=...,
)
```
However, this argument does not appear to be respected in local mode. Worse, not even the IAM role defined at the pipeline level seems to be respected:
```python
upsert_response = pipeline.upsert(
    role_arn='arn:/xxx/yzzc',
    description=description or "",
    tags=tags,
)
```
The credentials seem to be generated purely from the `sagemaker_session` passed to `Pipeline`:
```python
sagemaker.workflow.pipeline.Pipeline(
    name=str(name),
    parameters=list(parameters),
    steps=[step_a, step_b],
    sagemaker_session=sagemaker_session,
)
```
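For context, this is roughly how the pipeline is executed locally. A minimal sketch, assuming `LocalPipelineSession` (available in recent SDK versions), the `step_a`/`step_b` placeholders from above, and a made-up pipeline name and role ARN:

```python
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.pipeline_context import LocalPipelineSession

local_session = LocalPipelineSession()

pipeline = Pipeline(
    name="my-pipeline",
    steps=[step_a, step_b],  # placeholder steps from above
    sagemaker_session=local_session,
)
pipeline.create(role_arn="arn:aws:iam::123456789012:role/pipeline-role")
execution = pipeline.start()  # each step runs in a local docker container
```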
In my understanding, local mode's docker-compose file (one per step) passes credentials to the container via environment variables (`AWS_*`). Before generating these credentials, local mode would need to use the `sagemaker_session` to assume the role passed to the step and then derive the credentials from that assumed role instead of directly from the session, as sketched below.
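A hypothetical sketch of that per-step credential resolution. This is not current SDK behavior; `credentials_for_step` and the session name are made up for illustration, and only the `boto3` STS call is standard:

```python
import boto3

def credentials_for_step(step_role_arn: str, boto_session: boto3.Session) -> dict:
    """Assume the step-specific role and return AWS_* variables that local
    mode could place into the step's docker-compose service environment."""
    sts = boto_session.client("sts")
    response = sts.assume_role(
        RoleArn=step_role_arn,
        RoleSessionName="sagemaker-local-step",  # arbitrary session name
    )
    creds = response["Credentials"]
    return {
        "AWS_ACCESS_KEY_ID": creds["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": creds["SecretAccessKey"],
        "AWS_SESSION_TOKEN": creds["SessionToken"],
    }
```

Falling back to the session's own credentials when a step defines no role would preserve today's behavior for pipelines without step-specific roles.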
How would this feature be used? Please describe.
Pipelines that use step-specific roles currently cannot be tested in local mode; this inconsistency between local and remote execution breaks an otherwise highly efficient local development workflow.
Describe alternatives you've considered
- Refrain from using step-specific roles, i.e. use one role for all steps. Drawback: the IAM policy limits mentioned above.
- Split the pipeline into multiple pipelines, since it contains many steps that are only loosely related. Drawback: requires another orchestration tool on top.
Additional context
I opened an AWS Support case: 174841854500356.