Skip to content

Integrate Capabilities in Open-Lambda #301

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

m0mosenpai
Copy link

@m0mosenpai m0mosenpai commented May 15, 2025

Upon looking at the code (and the workshop slides), I found 3 possible places where we might drop privileges:

  1. Inside freshProc() [src/worker/sandbox/sock.go] right before exec-ing the server.py.
  2. Inside fork_server() [min-image/runtimes/server.py] right before the while loop that forks a new Zygote
  3. Inside start_container() [ min-image/runtimes/server.py] right before forking a Leaf to run the user code.

Some general questions I have:

  1. When and on what basis is either of web_server.py or fork_server.py invoked? (EDIT: I meant web_server() and fork_server() functions in server.py my bad)
  2. I also noticed that the base directory for OL gets initialized with 700 permissions and since we run everything as root, it makes it so that I have to su before I can navigate to the directory and create a lambda function with my code. Is it possible that none of this is run as root right from the beginning itself so this issue doesn't occur either? Or maybe the base directory can be initialized with more relaxed permissions?
  3. Which capabilities do we really need at each step? One plan for finding that out was that maybe we can drop all privileges at each step and strace the system calls (or if we know ones that are needed already) to see which capabilities they need and then iteratively add and test them to see if all things work as expected. Which also brings me to another question
  4. Is there a good way I can test all of my changes? Test cases or things I should look out for?

@tylerharter
Copy link
Member

Great progress! Let's do one case at a time, each in different PRs:

How about this first? "right before forking a Leaf to run the user code". But is it before, or after forking?

For you four questions:

  1. I'm a little confused, I don't think we have files named web_server.py or fork_server.py
  2. OL needs root sometimes to manipulate cgroups and namespaces. Maybe there's something to improve here, but probably as a separate effort (after capability dropping)
  3. Unfortunately, creating namespaces and cgroups requires CAP_SYS_ADMIN (https://man.archlinux.org/man/clone3.2.en). CAP_SYS_ADMIN is very broad and is sometimes called the "new root", so we cannot precisely drop to the exact privileges we need. But something is better than nothing, and this is a great step if CAP_SYS_ADMIN gets broken into finer-grained permissions in the future.
  4. You can see some of the CI tests we run (https://github.com/open-lambda/open-lambda/blob/main/.github/workflows/ci.yml#L64) and do those locally. You should eventually add some tests that try to use capabilities that are not allowed (we should see the invocation fail).

Copy link
Member

@tylerharter tylerharter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great start!


capList[0] = cap;
if (cap_set_flag(caps, CAP_EFFECTIVE, 1, capList, setting) == -1) {
cap_free(caps);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the nice things about a goto is it gives you a single place to clean things up, like this:

if (caps != NULL)
cap_free(caps);

Copy link
Author

@m0mosenpai m0mosenpai Jul 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tylerharter I actually didn't get what you mean here. I tried to use the goto statement already. Is there an even better way to do it?

@@ -8,6 +9,71 @@
#include <sys/types.h>
#include <sys/wait.h>
#include <seccomp.h>
#include <sys/capability.h>

static int modify_cap_impl(int cap, int setting) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To what extent can we just create Python wrappers around the cap API (can_get_proc, cap_set_flag, etc), and then have the logic using it in Python code only?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tylerharter I couldn't actually find a native way to do this in Python (atleast at a granular level) and then came across the seccomp code in this file. It feels like we should be able to do a lot with the wrappers - like we have good amount of control over how we want to expose these functions to Python. Is there a concern around doing it in this way?

We may have to do it in a different way (or use an external module?) for dropping privileges in sock.go since these would only work in Python.

@@ -208,6 +208,7 @@ def main():
file.write(pid)
print(f'server.py: joined cgroup, close FD {fd_id}')

# drop privileges here
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make capability dropping optional, via config, like it is for seccomp? https://github.com/open-lambda/open-lambda/blob/main/min-image/runtimes/python/server.py#L189

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tylerharter I added the command line argument for enabling capabilities as well. However, we'll need to pass the flag on to everywhere we drop capabilities - to check it and drop only if it's true. Wondering what would be the best way to do that.

@m0mosenpai
Copy link
Author

m0mosenpai commented Jun 10, 2025

@tylerharter My bad, I meant to say web_server and fork_server functions in server.py. I edited the original comment. I couldn't understand which one was being invoked when.

Okay sure, that makes sense. I'll start with CAP_SYS_ADMIN then. Thanks for pointing me to the tests.

@m0mosenpai m0mosenpai closed this Jun 10, 2025
@m0mosenpai m0mosenpai reopened this Jun 10, 2025
@m0mosenpai
Copy link
Author

@tylerharter "How about this first? "right before forking a Leaf to run the user code". But is it before, or after forking?"

Taking this scenario first, I'm not sure exactly which one would be better or what the difference in the impact would be - dropping capabilities before vs after?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants