New features, including COW and warmup options, are added and configurable #252
Conversation
@@ -75,6 +75,8 @@ type FeaturesConfig struct {
 	Import_cache string `json:"import_cache"`
 	Downsize_paused_mem bool `json:"downsize_paused_mem"`
 	Enable_seccomp bool `json:"enable_seccomp"`
+	Warmup bool `json:"warmup"`
+	COW bool `json:"COW"`
What does COW do? At a glance, it seems to determine whether Zygotes are used to create new sandboxes. If you don't want to use Zygotes, you can simply choose not to have a Zygote tree by configuring the "Import_cache" setting ahead of time. So I'm wondering whether a second config option is necessary?
With COW disabled we still have every Zygote in the tree, but they no longer share memory. Say we have a tree root->A->[A,B]: if COW is disabled, it will create one sandbox importing A and another sandbox importing A.B, each separately.
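A minimal sketch of the distinction described above (the `node` type, `createSandbox`, and the strings they produce are illustrative, not OpenLambda's actual API): with COW enabled, a leaf sandbox is forked from its parent Zygote and shares its memory pages copy-on-write, so only the leaf package is imported anew; with COW disabled, every sandbox imports its full package chain independently.

```go
package main

import "fmt"

// node is a toy stand-in for an import-cache tree node.
type node struct {
	pkg    string
	parent *node
}

// importChain walks up to the root and returns the full package chain.
func importChain(n *node) []string {
	if n == nil {
		return nil
	}
	return append(importChain(n.parent), n.pkg)
}

// createSandbox shows the branch: fork-from-Zygote (COW) vs. fresh import.
func createSandbox(n *node, cow bool) string {
	if cow {
		// fork from the parent Zygote: pages are shared copy-on-write
		return fmt.Sprintf("fork Zygote %s, then import %s", n.parent.pkg, n.pkg)
	}
	// no sharing: the sandbox re-imports the whole chain from scratch
	return fmt.Sprintf("fresh sandbox, import %v", importChain(n))
}

func main() {
	root := &node{pkg: "root"}
	a := &node{pkg: "A", parent: root}
	ab := &node{pkg: "A.B", parent: a}
	fmt.Println(createSandbox(ab, true))  // fork Zygote A, then import A.B
	fmt.Println(createSandbox(ab, false)) // fresh sandbox, import [root A A.B]
}
```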
@@ -75,6 +75,8 @@ type FeaturesConfig struct {
 	Import_cache string `json:"import_cache"`
 	Downsize_paused_mem bool `json:"downsize_paused_mem"`
 	Enable_seccomp bool `json:"enable_seccomp"`
+	Warmup bool `json:"warmup"`
Warmup could mean a lot of things -- probably use a longer name referencing "import cache" or "zygotes" so folks know what this is about.
On that note, should we be pulling all zygote-related options together into the same struct? It's starting to get a little scattered with Import_cache_tree, Import_cache, Warmup...
@@ -9,5 +9,6 @@ type ZygoteProvider interface {
 	Create(childSandboxPool sandbox.SandboxPool, isLeaf bool,
 		codeDir, scratchDir string, meta *sandbox.SandboxMeta,
 		rt_type common.RuntimeType) (sandbox.Sandbox, error)
+	Warmup() error
Do we need a separate call for this? The Warmup option you added is globally visible. Why not just check whether this is enabled in NewImportCache, and do the warmup there if so?
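A sketch of the pattern the reviewer suggests, with stub types standing in for OpenLambda's real ones (the constructor signature, the `warmupEnabled` flag, and the field names are illustrative): consult the global warmup setting inside the constructor instead of exposing `Warmup()` on the interface.

```go
package main

import "fmt"

// Stubs standing in for the real import-cache types.
type ImportCacheNode struct{}

type ImportCache struct {
	root     *ImportCacheNode
	warmedUp bool
}

// Stands in for the globally visible config flag (e.g. a Features.Warmup field).
var warmupEnabled = true

func (c *ImportCache) warmup() error {
	// real code would pre-create sandboxes down the tree here
	c.warmedUp = true
	return nil
}

// NewImportCache does the warmup itself when the option is enabled,
// so callers and the ZygoteProvider interface never see a Warmup method.
func NewImportCache(tree *ImportCacheNode) (*ImportCache, error) {
	cache := &ImportCache{root: tree}
	if warmupEnabled {
		if err := cache.warmup(); err != nil {
			return nil, fmt.Errorf("warmup failed: %v", err)
		}
	}
	return cache, nil
}

func main() {
	cache, err := NewImportCache(&ImportCacheNode{})
	fmt.Println(cache.warmedUp, err) // true <nil>
}
```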
 	Packages []string `json:"packages"`
 	Children []*ImportCacheNode `json:"children"`
+	SplitGeneration int `json:"split_generation"`
Not sure I see the value in including this in the deployment of the tree. Isn't it an artifact of how you currently happen to generate a tree?
"packages" and "children" exist in the original struct. "SplitGeneration" exists to print log messages during warmup.
@@ -233,15 +249,20 @@ func (cache *ImportCache) getSandboxInNode(node *ImportCacheNode, forceNew bool,
 		}
 	}
 	node.sbRefCount += 1
+	fmt.Printf("node.sb != nil: node %d, getSandboxInNode with ref count %d\n", node.SplitGeneration, node.sbRefCount)
We ought to be making log.Printf calls so we get timestamps too. fmt.Printf would only be for commands a user is executing (not the long-running server).
@@ -14,6 +14,10 @@ type MultiTree struct {
 	trees []*ImportCache
 }
 
+func (mt *MultiTree) Warmup() error {
+	panic("multi-tree warmup not implemented!")
I generally like to avoid panics because we won't do cleanup then (e.g., maybe something we mounted never gets unmounted). Let's just return an error.
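A sketch of the change the reviewer asks for (the `MultiTree` stub here is minimal): return an error so the caller can log it and run cleanup (e.g., unmount, release sandboxes) instead of unwinding the stack via panic.

```go
package main

import (
	"errors"
	"fmt"
)

// Minimal stand-in for the real MultiTree.
type MultiTree struct{}

// Warmup reports "not implemented" as an error rather than panicking,
// so callers keep control and can still clean up resources.
func (mt *MultiTree) Warmup() error {
	return errors.New("multi-tree warmup not implemented")
}

func main() {
	mt := &MultiTree{}
	if err := mt.Warmup(); err != nil {
		fmt.Println("warmup skipped:", err) // caller decides how to proceed
	}
}
```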
 		return "", nil
 	}
 	code := fmt.Sprintf(`
+os.environ['OPENBLAS_NUM_THREADS'] = '2'
Let's not hardcode stuff specific to one pkg (numpy?) here.
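One way to address this (a sketch; `envPreamble` and the idea of sourcing the map from config are assumptions, not an existing OpenLambda mechanism): build the `os.environ` lines of the generated Python from a configurable map instead of hardcoding the OpenBLAS setting.

```go
package main

import (
	"fmt"
	"sort"
)

// envPreamble renders a map of environment overrides as Python
// os.environ assignments, suitable for splicing into generated code.
func envPreamble(env map[string]string) string {
	keys := make([]string, 0, len(env))
	for k := range env {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic output regardless of map order
	code := ""
	for _, k := range keys {
		// %q produces double-quoted strings, which Python accepts too
		code += fmt.Sprintf("os.environ[%q] = %q\n", k, env[k])
	}
	return code
}

func main() {
	fmt.Print(envPreamble(map[string]string{"OPENBLAS_NUM_THREADS": "2"}))
	// os.environ["OPENBLAS_NUM_THREADS"] = "2"
}
```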
The default value of warmup is false, and the default for cow is true, to stay consistent with the original OL behavior.
Notice: when memory is limited, warmup might fail. Warmup is still an experimental feature, and some errors are not carefully handled. Set it to false or enlarge the memory pool if you run into bugs.
I have verified that when downsizing the pip-install container to 300MB, warmup works with 9GB of memory on the current default cache tree. The reason to downsize the pip-install container: when the packages in the cache tree are not yet installed, the init containers that install them consume a lot of memory. If all packages are already installed, warming a larger cache tree is also possible.
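For reference, a minimal sketch of how these flags might be set in config.json (the nesting under "features" is assumed from the struct name in the diff, not verified against the repo; other fields are elided):

```json
{
    "features": {
        "warmup": true,
        "COW": true
    }
}
```

Per the summary above, omitting these keys leaves warmup off and cow on.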