222 add deployment of example get started experiments model #233
Conversation
example-get-started-experiments/code/.github/workflows/deploy-model.yml
Added some change suggestions for the deploy GH action. Will add related changes to the deploy-model file.
example-get-started-experiments/code/.github/workflows/deploy-model.yml
example-get-started-experiments/code/.github/workflows/deploy-model.yml
example-get-started-experiments/code/.github/workflows/deploy-model.yml
Added some suggestions for creating different configs and endpoints per stage, and creating a serverless endpoint.
composed_name = re.sub(
    r"[^a-zA-Z0-9\-]", "-", f"{name}-{version}-{stage}")
Do you know what possible characters can appear in f"{name}-{version}-{stage}" that would need to be replaced here? I guess it's related to iterative/dvc#9821?
I guess it's related to iterative/dvc#9821?
This regex is only relevant for SageMaker, and it is enforced by their API. Off the top of my head, the composed name includes / and : from the name, and . from the version.
I think the rationale for the naming restrictions in GTO was to make it more likely they would be compatible with systems like SageMaker. Not sure it really helps, though. We still hit these restrictions, and users don't know why they can't use these characters.
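The sanitization being discussed can be illustrated with a small self-contained sketch (the sample name/version/stage values below are hypothetical, chosen to show the characters mentioned above):

```python
import re

# GTO-style artifact names can contain ":" and "/", and semver versions
# contain "." -- SageMaker resource names only allow [a-zA-Z0-9-], so
# every other character gets replaced with "-".
name, version, stage = "results:train/pool-segmentation", "v0.1.0", "dev"
composed_name = re.sub(r"[^a-zA-Z0-9\-]", "-", f"{name}-{version}-{stage}")
print(composed_name)  # results-train-pool-segmentation-v0-1-0-dev
```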
python src/train.py

dvc stage add -n evaluate \
  -p base,evaluate \
  -d src/evaluate.py -d models/model.pkl -d data/test_data \
  python src/evaluate.py

dvc stage add -n sagemaker \
  -d models/model.pth -o model.tar.gz \
  'cp models/model.pth sagemaker/code/model.pth && cd sagemaker && tar -cpzf model.tar.gz code/ && cd .. && mv sagemaker/model.tar.gz . && rm sagemaker/code/model.pth'
Some minor thoughts to try to clean this up a bit. I haven't tested this, but I wonder if we can simplify it with something like this?

- 'cp models/model.pth sagemaker/code/model.pth && cd sagemaker && tar -cpzf model.tar.gz code/ && cd .. && mv sagemaker/model.tar.gz . && rm sagemaker/code/model.pth'
+ 'cp models/model.pth sagemaker/out && cp sagemaker/code sagemaker/out && tar -cpzf model.tar.gz sagemaker/out'

Also, I wonder if it would be better to append directly to dvc.yaml and use list syntax for cmd instead of &&?
Also, I wonder if it would be better to append directly to dvc.yaml and use list syntax for cmd instead of &&?
I didn't manage to use a list cmd with dvc stage add (which we use in the generate.sh script), but I might be missing something.
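For illustration, an appended dvc.yaml entry using list syntax for cmd might look like this (hand-written sketch, untested; DVC runs each list item as a separate command from the repo root):

```yaml
stages:
  sagemaker:
    cmd:
      - cp models/model.pth sagemaker/code/model.pth
      - cd sagemaker && tar -cpzf model.tar.gz code/
      - mv sagemaker/model.tar.gz .
      - rm sagemaker/code/model.pth
    deps:
      - models/model.pth
    outs:
      - model.tar.gz
```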
It would be nice to address my comments from this PR, but I don't see any actual blockers here. Nice work @daavoo!
Can you try with the Sandbox account?
I assume we don't want to make the actual endpoint public, but rather a simple UI that queries the endpoint.
example-get-started-experiments/code/.github/workflows/deploy-model.yml
@@ -90,7 +90,7 @@ jobs:
       - uses: aws-actions/configure-aws-credentials@v1
         with:
           aws-region: us-east-2
-          role-to-assume: arn:aws:iam::342840881361:role/SandboxUser
+          role-to-assume: ${{ secrets.AWS_ROLE_TO_ASSUME }}
Do we have to generalize it in this way? (It's an extra step for us to take care of.)
It was suggested by @jesper7 and @tapadipti to use the role as a secret. I assume it has some security implications, although I am not sure what the danger of leaking a role name is.
@daavoo How did you set it up for https://github.com/iterative/example-get-started-experiments? Should we add it to the readme here?
Note that roles can be public.
Looks really great! Looks simple and clean.
@daavoo Looks like this PR is close to getting merged. Since this uses one of our official demo repos, we could use this in the blog post instead of the demo-fashion-mnist that I have currently used. wdyt? I can try to replace the example snippets in the blog post to use your snippets. And you might wanna rewrite some of the text. We'll not have a web UI, but that should be ok.

Makes sense to me. I would perhaps also use the opportunity to cut the scope of the post a little by dropping DVC details in favor of pointers to the dvc get-started pages.

Ok. I'll share an updated version of the blog post tomorrow.

Merging as the endpoint is now working. Don't hesitate to open followups.
- run: dvc remote add -d --local storage s3://dvc-public/remote/get-started-pools

- run: |
    MODEL_DATA=$(dvc get --show-url . model.tar.gz)
@daavoo Doesn't this get the model data for the latest commit instead of the specific version we are trying to deploy? From the DVC docs, when --rev is not specified: "The latest commit (in the default branch) is used by default when this option is not specified."
I assume that the phrase in the docs is meant for the scenario where you pass a remote repo URL as the first argument. I think the behavior here is correct, but I am going to double-check. I think it works because we are using the local repo (.), so the current status of the workspace will be used. The workflow uses actions/checkout and runs on Git tag creation, so the workspace will be at the revision where the tag was created.
composed_name = re.sub(
    r"[^a-zA-Z0-9\-]", "-", f"{name}-{version}-{stage}")

model = PyTorchModel(
@daavoo If I created a SageMaker model for a given model version before deploying it in one environment (e.g., dev), does this code recreate the same model before deploying it in another environment (e.g., prod)? Or does it skip recreating it and just return a reference to the old model?
It currently creates a new model. If we want the latter, we should only add stage to the composed name after the model has been created.
Referring to this msg: you said that the existing model would be re-used, right? Would removing stage from the model name make this happen?
Would removing stage from the model name make this happen?
Yes
OK, I'll create a PR to remove stage from the model name.
)

return model.deploy(
@daavoo If I am trying to deploy a new model version to an existing stage, does this code create a new endpoint or does it update the existing endpoint? (I need to confirm this for the blog post)
@daavoo If I am trying to deploy a new model version to an existing stage, does this code create a new endpoint or does it update the existing endpoint? (I need to confirm this for the blog post)
Not sure I understand the first part, but a new endpoint is created by combining the name, version, and stage, so any change to those three things results in a new endpoint.
So if I deploy v1 to prod today, and v2 to prod tomorrow, will I have 2 different endpoints with 2 different endpoint names? Shouldn't the prod endpoint always be the same? (just like studio prod version is always studio.iterative.ai) So that any clients running inference against the prod endpoint don't have to update their endpoint addresses each time there's a new deployment.
The version shouldn't be part of the endpoint name, right?
Shouldn't the prod endpoint always be the same?
I think it is a matter of opinion and depends on the pattern we expect/want to showcase:
A) The endpoint is directly queried by external consumers
B) There is some app&url that uses the right endpoint internally without exposing it to external consumers.
(just like studio prod version is always studio.iterative.ai) So that any clients running inference against the prod endpoint don't have to update their endpoint addresses each time there's a new deployment.
That is the user-facing app&url, and it only "points" to the correct software because there is a GitOps process bumping the internal Studio version (i.e. https://github.com/iterative/itops/pull/2238).
So then we should also create a service to delete stale endpoints, right? Else, we will end up with a bunch of endpoints over time.
So then we should also create a service to delete stale endpoints, right? Else, we will end up with a bunch of endpoints over time.
I would expect that to be configured/handled on the AWS side. Anyhow, as per your suggestions, I will send a new PR creating/updating a single endpoint per stage.
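A minimal sketch of what that follow-up could look like (hypothetical helper: only the stage, not the version, goes into the endpoint name, so clients keep a stable address across deployments):

```python
import re

def endpoint_name(name: str, stage: str) -> str:
    # Hypothetical helper: one endpoint per stage. SageMaker endpoint names
    # only allow [a-zA-Z0-9-], so any other character is replaced with "-".
    return re.sub(r"[^a-zA-Z0-9\-]", "-", f"{name}-{stage}")

print(endpoint_name("pool-segmentation", "prod"))  # pool-segmentation-prod
```

Deploying v2 after v1 would then target the same per-stage endpoint name, so the deployment can update it in place instead of accumulating stale endpoints.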
@@ -68,6 +68,7 @@ def train():
     models_dir = Path("models")
     models_dir.mkdir(exist_ok=True)
     learn.export(fname=(models_dir / "model.pkl").absolute())
+    torch.save(learn.model, (models_dir / "model.pth").absolute())
     live.log_artifact(
@daavoo If we logged the tar file of the model, then we wouldn't need to specify the file name model.tar.gz in the deployment script, right? The deployment script would be able to get the file name from the Git tag itself, which means that a single deployment script could be used for deploying several models with different names. Right?
Right now, in the blog post, I show that there's an artifact entry for model.pkl (so the model registry shows model.pkl). But the file that is getting deployed is model.tar.gz. We know that they are the same model, but it still looks like a disconnect between the model registry and the deployment script.
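For context, the packaging step behind that model.tar.gz can be sketched in plain Python (hypothetical helper and paths; the actual pipeline does this with cp/tar in the dvc stage above):

```python
import tarfile
from pathlib import Path

def package_model(model_path: str, code_dir: str,
                  out: str = "model.tar.gz") -> str:
    # Hypothetical sketch of the packaging step: SageMaker's PyTorch
    # container expects a model.tar.gz whose code/ directory holds the
    # weights plus the inference code.
    with tarfile.open(out, "w:gz") as tar:
        tar.add(model_path, arcname=f"code/{Path(model_path).name}")
        for f in sorted(Path(code_dir).iterdir()):
            tar.add(f, arcname=f"code/{f.name}")
    return out
```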
As discussed in #233 (comment)
Agree with @tapadipti that it makes sense to have one endpoint per stage or per version. Otherwise, I think we kind of miss the point of the registry (you can deploy every update to a new endpoint without it). IMO one endpoint per stage makes the most sense to drive home the value of that field, and I think we should focus on this being a self-contained deployment (you can do deployment without needing a separate engineering team to pick up the new model endpoint).
Add Sagemaker deployment.
https://github.com/iterative/example-get-started-experiments/actions/workflows/deploy-model.yml
https://us-east-2.console.aws.amazon.com/sagemaker/home?region=us-east-2#/endpoints/results-train-pool-segmentation-v0-1-0-dev
export AWS_DEFAULT_REGION=us-east-2
python src/endpoint_predict.py \
  --img_path data/test_data/REGION_1-24_0_1024_0_1024.jpg \
  --endpoint_name results-train-pool-segmentation-v0-1-0-dev