Update to yolo #232
Conversation
git rm TrainSegModel.ipynb
We have a git tag for the notebook state. Feels more natural to drop it once it's converted to a DVC pipeline.
@@ -107,47 +103,3 @@ This tag also contains a GitHub Actions workflow that reruns the pipeline if any
changes are introduced to the pipeline-related files.
[CML](https://cml.dev/) is used in this workflow to provision a cloud-based GPU
machine as well as report model performance results in Pull Requests.

## Deploying the model
Removing this section for now.
Will properly close #230 in a subsequent PR adding the SageMaker code.
Try to improve the look of images painted by YOLO.
Thoughts on YOLO:
A couple of small Studio UI issues:
fire
Minor, but I vote to drop fire and use argparse. The scripts will still be plenty short and I think they will be more transparent.
will do
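For illustration, a minimal sketch of what an argparse-based entry point could look like (the function signature, defaults, and model name here are assumptions, not the actual script):

```python
import argparse

from ultralytics import YOLO


def train(data: str, epochs: int, imgsz: int) -> None:
    # Assumed wrapper around YOLO training, mirroring the snippets in this PR.
    yolo = YOLO("yolov8n-seg.pt")
    yolo.train(data=data, epochs=epochs, imgsz=imgsz)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Train the pool segmentation model.")
    parser.add_argument("--data", default="datasets/yolo_dataset.yaml")
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--imgsz", type=int, default=640)
    args = parser.parse_args()
    train(args.data, args.epochs, args.imgsz)
```

Compared to fire, the accepted flags are spelled out explicitly, which is what would make the script more transparent.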
desc="This is a Computer Vision (CV) model that's segmenting out swimming pools from satellite images.", | ||
labels=["cv", "segmentation", "satellite-images", params.train.arch], | ||
) | ||
yolo.train(data="datasets/yolo_dataset.yaml", epochs=epochs, imgsz=imgsz, **kwargs) |
Why not track `datasets/yolo_dataset.yaml` in `dvc.yaml`?
No reason. Can totally do it.
It is just a description of where the paths are; YOLO wants it to be in a file, so it felt not really relevant.
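A hedged sketch of what tracking it could look like in `dvc.yaml` (the stage name, params section, and paths are assumed from the snippets in this PR):

```yaml
stages:
  train:
    cmd: python src/train.py ${train}
    params:
      - train
    deps:
      - src/train.py
      - datasets/yolo_dataset.yaml   # the dataset description file discussed here
      - datasets/yolo_dataset/train
      - datasets/yolo_dataset/val
```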
yolo settings datasets_dir="/workspaces/example-repos-dev/example-get-started-experiments/build/example-get-started-experiments/datasets/"
yolo settings runs_dir="/workspaces/example-repos-dev/example-get-started-experiments/build/example-get-started-experiments/runs/"
yolo settings weights_dir="/workspaces/example-repos-dev/example-get-started-experiments/build/example-get-started-experiments/weights/"
jupyter nbconvert --execute 'TrainSegModel.ipynb' --inplace
git add .
git commit -m "Run notebook and apply best experiment"
Is it still applying the best experiment?
No, do you think we should keep that logic? I am not even sure people realize what was going on without checking this script.
No, I just meant we should change the commit message
-d src/train.py -d datasets/yolo_dataset/train -d datasets/yolo_dataset/val \
"python src/train.py \${train}"
Feels odd to me to not have the model as an output of the stage here.
Also, why do we have train and val datasets as separate deps/outs instead of tracking the whole dataset as one?
> Feels odd to me to not have the model as an output of the stage here.

I can make the changes for that. I know it's odd but couldn't find a reason why it matters (maybe I should add an additional stage).

> Also, why do we have train and val datasets as separate deps/outs instead of tracking the whole dataset as one?

So in Studio you can use Data columns to show the number of files separately.
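For reference, a sketch of the same stage with the model listed as an output (the checkpoint path is an assumption; the actual location depends on the `yolo settings` directories configured above):

```yaml
stages:
  train:
    cmd: python src/train.py ${train}
    deps:
      - src/train.py
      - datasets/yolo_dataset/train
      - datasets/yolo_dataset/val
    outs:
      - models/best.pt   # assumed path for the best checkpoint produced by training
```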
> I can make the changes for that. I know it's odd but couldn't find a reason why it matters (maybe I should add an additional stage).

Does iterative/dvclive#631 break anything here since it skips caching if `_inside_dvc_exp=True`?
> Does iterative/dvclive#631 break anything here since it skips caching if `_inside_dvc_exp=True`?

Because the .dvc file is created at the notebook stage, I believe there is no difference: DVC will `dvc commit` the changes at the end.
Overall, I think it's great and shows off how powerful and simple the tools can be, but it feels more like a showcase of functionality than a way to teach about DVC experiments. I would like to at least take some steps on iterative/dvclive#603 if we make this our default example project. Although it's hard to maintain them all as auto-generated repos, I think we need many examples of different frameworks, so I would urge that we try to keep both projects in some form instead of replacing one with the other.
Not sure how/what you are reading. I could talk for a while about metrics for these tasks 😅 The metric currently selected is an instance segmentation one (pmAP 0.5-0.95) and it doesn't hold the same meaning as accuracy or similar (like, I expect
The point is not (only?) to modularize but to use
It is automatically added as a callback if dvclive is installed: https://github.com/ultralytics/ultralytics/blob/main/ultralytics/utils/callbacks/dvc.py This is how YOLO currently handles its integrations.
So, I am assuming you did not have that worry with the previous fast.ai code because the callback was passed explicitly?
Yes, feel free to add/show whatever.
Same as above: it's just the view I added and doesn't have to be what we show. It is what I prefer, but I don't know if that is the best way to show it as an example.
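To illustrate the callback point above, a sketch of the difference (the fast.ai lines assume an existing `learn` Learner; the model and dataset paths are placeholders):

```python
# ultralytics: the DVC callback is registered automatically when dvclive is
# installed, so the user never passes a callback themselves.
from ultralytics import YOLO

model = YOLO("yolov8n-seg.pt")  # placeholder checkpoint
model.train(data="datasets/yolo_dataset.yaml", epochs=10)  # DVCLive logging happens implicitly

# fast.ai: the callback is passed explicitly (assumes an existing `learn` Learner):
# from dvclive.fastai import DVCLiveCallback
# learn.fit_one_cycle(10, cbs=[DVCLiveCallback()])
```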
Sorry, maybe it was already implicit in the previous comments, but could you summarize the points that make the case for this example vs. the current one? What I got as the main differences:
Is there something else?
Sorry, I was reading it wrong 😄. I have used mAP, but here I was looking at the image masks. I think it is just harder to see whether the red bounding boxes overlay an actual pool. Anyway, disregard that comment.
I don't have that strong an opinion on which one should be the "default." I think we need to keep both (and in that case, I'm fine with keeping the deployment in this one) because they showcase different features and frameworks and we need more examples. Reasons why I wouldn't push to make this one the "default":
Yes, the default built-in visualization is ... not great.
Makes sense.
Will make it another repo; converting to draft for now.
Deployed:
https://github.com/daavoo/example-get-started-experiments
https://studio.iterative.ai/team/Iterative/projects/example-get-started-experiments-bhm5eg8le6
Made the decision because:
- Not only in general, but in our particular case, more than one customer/prospect has explicitly reported using YOLO, while (afaik) we did not hear that from fast.ai.
- See train.py vs the previous fast.ai version.
- No need for a separate evaluate.py stage because YOLO loads the best model and evaluates it at the end of training.
TODO:
- Log `trainer.best` as an artifact with `type=model` (see ultralytics/ultralytics#4143).
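A hedged sketch of what that TODO could look like with DVCLive's `log_artifact` (the checkpoint path and artifact name are placeholders standing in for `trainer.best`):

```python
from pathlib import Path

from dvclive import Live

# Placeholder for trainer.best, the path ultralytics keeps to the best weights.
best_checkpoint = Path("runs/segment/train/weights/best.pt")

with Live() as live:
    # Register the checkpoint as a model artifact so it shows up as a model
    # in DVC Studio's model registry.
    live.log_artifact(str(best_checkpoint), type="model", name="pool-segmentation")
```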