[RFC] Support embedding and sandboxing untrusted code #4210

stephanemagnenat · 2022-10-11T09:57:02Z

Summary

As far as I have seen, RustPython is not yet suitable for "safe" embedding: executed Python code can block or harm the calling code, because:

execution can loop forever,
memory allocation is not bounded,
import cannot be disabled.

Detailed Explanation

I wish to use RustPython as a scripting language within a game engine, running third-party user code. A requirement for me is that this code runs safely. As far as I have seen (though I might have missed some elements), this is currently not the case:

It is not possible to limit code execution to a maximum number of instructions. Looking into RustPython's source, if I'm not mistaken, this feature would need to be added to the execution loop of ExecutingFrame. Maybe the ExecutionResult type could be extended with an InstructionBudgetExceeded variant or similar (which could later be expanded to support step-by-step interactive debugging).
I couldn't find a way to set limits on heap allocation or stack size, which would also be needed.
My understanding is that importlib is always enabled, but that the os module is currently disabled on Wasm32 or WASI; is that correct? If so, there should be a way to have finer control over importlib, or even to forbid user-defined import statements altogether. Also, I see no reason to tie the availability of the os module to WASM; a feature flag might be a good addition. Similarly, it should be possible to leave out some Windows-specific code and to fully disable or control IO (including network) and side-effecting functions (such as delay), regardless of the target platform.
Control over garbage collection would be welcome (see #4180, "(READY FOR REVIEW) Garbage collect: A stop-the-world cycle collector"), although it is not critical, because one possible design is that the RustPython context does not outlive a display frame.
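To make the instruction-budget idea concrete, here is a minimal sketch of a bytecode loop that counts executed instructions and stops with an InstructionBudgetExceeded variant. It runs a toy stack machine: Instr, Frame, and this ExecutionResult are hypothetical stand-ins, not RustPython's actual ExecutingFrame or ExecutionResult types.

```rust
/// Toy instruction set for illustration only (not RustPython's bytecode).
#[derive(Clone, Copy, Debug)]
pub enum Instr {
    Push(i64),
    Add,
    /// Unconditional jump to an instruction index.
    Jump(usize),
    Return,
}

/// Hypothetical result type with the budget variant suggested in the RFC.
#[derive(Debug, PartialEq)]
pub enum ExecutionResult {
    Return(i64),
    InstructionBudgetExceeded { executed: u64 },
}

/// A frame keeps its program counter and value stack between calls,
/// so the host can resume it after a budget interruption.
pub struct Frame<'a> {
    code: &'a [Instr],
    pc: usize,
    stack: Vec<i64>,
}

impl<'a> Frame<'a> {
    pub fn new(code: &'a [Instr]) -> Self {
        Frame { code, pc: 0, stack: Vec::new() }
    }

    /// Executes at most `budget` instructions. Assumes well-formed code
    /// (no stack underflow, every path ends in `Return`).
    pub fn run(&mut self, budget: u64) -> ExecutionResult {
        let mut executed = 0;
        loop {
            // Check the budget before fetching, so no instruction is left half-executed.
            if executed == budget {
                return ExecutionResult::InstructionBudgetExceeded { executed };
            }
            let instr = self.code[self.pc];
            self.pc += 1;
            executed += 1;
            match instr {
                Instr::Push(v) => self.stack.push(v),
                Instr::Add => {
                    let b = self.stack.pop().expect("stack underflow");
                    let a = self.stack.pop().expect("stack underflow");
                    self.stack.push(a.wrapping_add(b));
                }
                Instr::Jump(target) => self.pc = target,
                Instr::Return => {
                    return ExecutionResult::Return(self.stack.pop().expect("empty stack"));
                }
            }
        }
    }
}
```

Because the frame keeps its state, a host can call `run` repeatedly with small budgets (for example, once per display frame) and thus also get the step-by-step behaviour mentioned above. A real implementation would likely also need to bound native (Rust) functions that run for a long time without executing any bytecode.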

Drawbacks, Rationale, and Alternatives

The rationale is to use RustPython as an embedded language within larger software, such as game engines, where the host software must fully control the limits of the scripting environment.

The main drawback is increased code complexity within RustPython, but I believe it can be done cleanly, with some work of course. The split of the standard library (#3102) was already a step in the direction of embedding.

The alternatives are to not implement this feature, or to implement it in a fork. A similar issue exists (#3090), but it is more of a question, so I thought a new RFC-style issue would be better.

Unresolved Questions

There are obviously quite a few design questions, but I guess one should first agree on whether this overall feature makes sense for the project; then the design can be worked out. A unified way to control embedding would probably be elegant.


youknowone · 2022-10-11T12:47:08Z

What are your requirements for embedding? It sounds more like sandboxing to run arbitrary third-party code than simple embedding to run first- or second-party code. For example, limiting running time is not a common requirement of embedding. Sharing more details about your usage would give us a clearer picture of your requirements.

I agree that having sandboxing support would be a good way to leverage our wasm support. Here are a few more answers to your questions.

import cannot be disabled, but importlib can. I recently added an importlib feature to the main crate, and you can test it with --no-default-features (please don't forget to re-enable the other features). But disabling importlib doesn't mean you cannot import anything; it means you cannot import "complex" libraries. People will still have _imp, which can serve as the basis for a full importlib implementation. We will need more work to prevent import itself.

You seem to want a memory quota. This is not easy in Python. Detecting that the quota has been exceeded is comparatively easy, but preventing it from being exceeded is not. Suppose you try 100000000 ** 100000000: this is a single expression in Python, but it can blow past any practical amount of memory.

stephanemagnenat · 2022-10-11T14:31:26Z

Thank you for your quick answer! Yes, our need is indeed more about sandboxing untrusted code beyond simple embedding.

Regarding our use case, we are creating an educational game-creation platform (https://cand.li) that currently has its own custom visual language and is written in a mix of TypeScript and Rust (compiled to WASM). We are progressively migrating more and more of our code base to Rust, and in the medium term our customers (teachers and students) are asking for Python support: the idea is that they can start with the visual language, then write their own advanced blocks in Python, and even share them with less advanced students. However, sandboxing is critical:

As games can be shared, arbitrary user-written code will be executed on the target machine.
As users are typically students, they will likely sometimes write infinite loops and memory-exploding algorithms, even during development. To provide a proper educational experience, these bugs must be detected and reported (some advanced static code analysis could help, but it is likely not enough).

Regarding the memory quota, I imagine it would be possible for all internal Python data to pass through a gate that checks the quota when the data is created, and to fail execution if the quota is reached (for us, that would be enough). Of course, I guess it would be an optional feature. I'm under the impression that this touches on problems similar to garbage collection, and maybe the two can be designed at the same time. Regarding intrinsics that can allocate an arbitrary amount of memory, like 100000000 ** 100000000, I guess they would need to take a memory quota (I imagine the delta between the current usage and the limit) and return a failure if they would allocate significantly more than that. It is probably annoying to write, but it might be useful even beyond sandboxing: it would avoid exhausting the machine's memory and filling the swap before the process is killed by the OOM killer, in the case of unrealistic expressions like the one above.
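As an illustration of an intrinsic that takes a quota, here is a minimal sketch of a pre-check for `base ** exp`. It assumes the intrinsic is handed the remaining quota in bytes; the names `pow_result_bytes_upper_bound`, `check_pow_quota`, and `QuotaExceeded` are hypothetical, and `u64` operands stand in for Python's arbitrary-precision integers to keep the sketch self-contained.

```rust
/// Error returned when an operation would exceed the remaining memory quota.
#[derive(Debug, PartialEq)]
pub struct QuotaExceeded {
    pub needed_bytes: u64,
    pub remaining_bytes: u64,
}

/// Upper bound on the bytes needed to store `base ** exp` as an unsigned integer.
/// If `base` has `b` significant bits, then `base < 2^b`, so `base^exp < 2^(b * exp)`
/// and the result fits in `b * exp` bits. Returns `None` if that product overflows.
pub fn pow_result_bytes_upper_bound(base: u64, exp: u64) -> Option<u64> {
    if base <= 1 || exp == 0 {
        return Some(1); // the result is 0 or 1
    }
    let base_bits = u64::from(64 - base.leading_zeros());
    let bits = base_bits.checked_mul(exp)?;
    // Round up to whole bytes without risking overflow.
    Some(bits / 8 + u64::from(bits % 8 != 0))
}

/// Hypothetical pre-check an intrinsic could run before computing `base ** exp`.
/// `remaining_bytes` is the delta between the quota and the current usage.
pub fn check_pow_quota(base: u64, exp: u64, remaining_bytes: u64) -> Result<(), QuotaExceeded> {
    // An overflowing bound is treated as "needs more than any quota".
    let needed_bytes = pow_result_bytes_upper_bound(base, exp).unwrap_or(u64::MAX);
    if needed_bytes > remaining_bytes {
        Err(QuotaExceeded { needed_bytes, remaining_bytes })
    } else {
        Ok(())
    }
}
```

The bound costs a few integer operations and overestimates the true size by at most a factor of two, so it can reject a hopeless computation before any memory is touched. For 100000000 ** 100000000 it gives 27 × 10^8 bits, i.e. 337,500,000 bytes. The same reasoning applies to a big integer by using its bit length instead of `leading_zeros`.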

Regarding importing, allowing imports from a restricted set of pure-Python or Rust-based modules defined outside the sandbox would be very good; at the very least, being able to prevent access to the computer's filesystem would be critical.
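A restricted import set could be expressed as an allowlist that the host consults before resolving any module. The sketch below shows only that decision logic; `ImportPolicy` and `ImportDecision` are hypothetical names, and where such a check would hook into RustPython is an open question (as noted earlier in the thread, `_imp` remains available even without importlib, so the check would have to cover every import path).

```rust
use std::collections::HashSet;

/// Outcome of checking an import against the sandbox policy.
#[derive(Debug, PartialEq)]
pub enum ImportDecision {
    Allow,
    Deny(String),
}

/// Hypothetical allowlist consulted before any module is resolved.
pub struct ImportPolicy {
    allowed: HashSet<String>,
}

impl ImportPolicy {
    pub fn new<I, S>(allowed: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        ImportPolicy {
            allowed: allowed.into_iter().map(Into::into).collect(),
        }
    }

    /// Checks a dotted module name. Only the top-level package is compared,
    /// so allowlisting "game" also permits "game.sprites". Relative imports
    /// such as ".helpers" have an empty first segment and are denied.
    pub fn check(&self, name: &str) -> ImportDecision {
        let top = name.split('.').next().unwrap_or("");
        if !top.is_empty() && self.allowed.contains(top) {
            ImportDecision::Allow
        } else {
            ImportDecision::Deny(format!("import of '{name}' is not allowed in this sandbox"))
        }
    }
}
```

With a policy such as `ImportPolicy::new(["math", "game"])`, filesystem-touching modules like `os` are denied simply by not being listed, which matches the "at the very least" requirement above.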

youknowone · 2022-10-18T07:19:22Z

The IO parts seem to be natively blocked by the wasm environment.
Yes, we are basically interested in sandboxing, but there hasn't been much progress yet.

stephanemagnenat · 2022-10-18T13:40:36Z

The IO parts seem to be natively blocked by the wasm environment.

Yes, but at least in our case, we would like to have it on native targets as well, as we plan to make a native app at some point. I imagine that other game projects would have a similar need.

Yes, we are basically interested in sandboxing, but there hasn't been much progress yet.

Good to hear that you are interested! I'm happy to provide input from the use-case side whenever it is helpful!

BjornTheProgrammer · 2024-05-19T03:48:47Z

Hey, I was just wondering if this is possible in the latest version of RustPython. I wish to run a native app that can run user-generated Python code. Ideally, it would be as feature-rich as possible but without access to IO of any type. An alternative I have thought of is to use a wasm build and call the interpreter through that to do sandboxing.

FlippingBinary · 2024-09-19T14:28:03Z

I'm very interested in using this library in my projects because Python is such a popular language in data science, but stability in the face of arbitrary code is very important to me. I wouldn't want a user to be able to accidentally crash their browser or lock up their machine.

To limit the amount of memory (or even the rate at which memory is allocated), perhaps the library could support custom allocators? Rust uses the Global allocator by default, but I think RustPython could add support for custom allocators without disrupting any existing code that depends on it. The user could then create a custom allocator that keeps track of whatever matters to them and denies allocation (pretending to be out of memory) when a resource limit has been reached. The simplest approach for RustPython would be to expose the ability to override the allocator used for the memory the VM allocates for Python code; another option would be to provide one or more custom allocators with a convenient API for setting memory limits.
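On stable Rust, per-collection allocators (the `Allocator` trait) are still unstable, so the option available today is a process-wide `#[global_allocator]`. The following is a minimal sketch of a quota-enforcing wrapper around `System`; `LimitedAlloc`, `used`, and `set_limit` are hypothetical names, and this counts every allocation in the process, not just the VM's.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical quota-enforcing allocator: forwards to the system allocator,
/// but refuses any allocation that would push live bytes above `limit`.
pub struct LimitedAlloc {
    used: AtomicUsize,
    limit: AtomicUsize,
}

impl LimitedAlloc {
    pub const fn new(limit: usize) -> Self {
        LimitedAlloc {
            used: AtomicUsize::new(0),
            limit: AtomicUsize::new(limit),
        }
    }

    /// Bytes currently allocated through this allocator.
    pub fn used(&self) -> usize {
        self.used.load(Ordering::Relaxed)
    }

    /// Changes the quota; existing allocations are not affected.
    pub fn set_limit(&self, limit: usize) {
        self.limit.store(limit, Ordering::Relaxed);
    }
}

unsafe impl GlobalAlloc for LimitedAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let size = layout.size();
        // Reserve the bytes up front so concurrent callers cannot both slip under the limit.
        let before = self.used.fetch_add(size, Ordering::Relaxed);
        if before.saturating_add(size) > self.limit.load(Ordering::Relaxed) {
            self.used.fetch_sub(size, Ordering::Relaxed);
            return std::ptr::null_mut(); // report "out of memory"
        }
        let ptr = unsafe { System.alloc(layout) };
        if ptr.is_null() {
            self.used.fetch_sub(size, Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        self.used.fetch_sub(layout.size(), Ordering::Relaxed);
    }
    // `realloc` uses the trait's default (alloc + copy + dealloc), so it is counted too.
}

// Registering it routes every allocation in the process through the quota.
// The limit starts unbounded; a host would call `GLOBAL.set_limit(..)` before running scripts.
#[global_allocator]
static GLOBAL: LimitedAlloc = LimitedAlloc::new(usize::MAX);
```

One caveat: when the allocator refuses a request, infallible operations such as `Box::new` or `Vec::push` call `handle_alloc_error`, which aborts the process by default. Only fallible APIs such as `Vec::try_reserve` turn a refusal into a recoverable error, so a graceful "out of memory" report inside the VM would still need RustPython to allocate fallibly or to consult such a counter itself.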

For execution duration, a multi-threaded application could obviously just use a dedicated thread for the VM and terminate it whenever it wants, but a single-threaded application won't have that luxury. Is there a way to get an iterator from the VM instead of trusting that run will eventually return? Such an iterator could be stepped for as long as the calling code wants, until some number of iterations has been reached or the wall clock passes some duration. If the iterator could also report the size of the stack (or at least its depth), the calling code could enforce a limit on how deep the stack can grow, breaking out of iteration when it suspects infinite recursion.
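A stepping interface also helps multi-threaded hosts, because Rust's `std::thread` offers no way to forcibly terminate a thread: a runaway script can only be abandoned or the whole process exited. Assuming the VM could expose an iterator that yields the call-stack depth after each executed instruction (a hypothetical interface, not an existing RustPython API), the host-side driver might look like this minimal sketch.

```rust
use std::time::Instant;

/// Why the host stopped driving the script.
#[derive(Debug, PartialEq)]
pub enum StopReason {
    Finished,
    StepLimit,
    Deadline,
    StackTooDeep(usize),
}

/// Drives a hypothetical stepping VM. Each item yielded by `steps` stands for one
/// executed instruction and carries the call-stack depth after it; the iterator
/// ending means the script finished.
pub fn drive<I>(steps: I, max_steps: u64, deadline: Instant, max_depth: usize) -> StopReason
where
    I: IntoIterator<Item = usize>,
{
    let mut steps = steps.into_iter();
    let mut executed: u64 = 0;
    loop {
        if executed == max_steps {
            return StopReason::StepLimit;
        }
        // Reading the clock every step is comparatively expensive, so check every 1024 steps.
        if executed % 1024 == 0 && Instant::now() >= deadline {
            return StopReason::Deadline;
        }
        match steps.next() {
            None => return StopReason::Finished,
            Some(depth) if depth > max_depth => return StopReason::StackTooDeep(depth),
            Some(_) => executed += 1,
        }
    }
}
```

Keeping the limits in the host rather than in the VM means the same stepping interface can serve step counts, wall-clock deadlines, and recursion limits, and a single-threaded or wasm host can interleave script steps with its own work.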

NoelJacob · 2024-11-23T19:31:04Z

Ideally, it would be as feature-rich as possible but without access to IO of any type.

Did you find any way to do it?

An alternative I have thought of is to use a wasm build and call the interpreter through that to do sandboxing.

Did you try that?

BjornTheProgrammer · 2024-11-24T03:59:08Z

@NoelJacob I did try it, and it worked; however, it is complicated. I cannot disclose all the details, since it was for work.

stephanemagnenat added the RFC Request for comments label Oct 11, 2022

stephanemagnenat changed the title from "[RFC] Support safe embedding" to "[RFC] Support safe embedding/sandboxing" Oct 11, 2022

stephanemagnenat changed the title from "[RFC] Support safe embedding/sandboxing" to "[RFC] Support embedding and sandboxing untrusted code" Oct 11, 2022
