Skip to content

Add a better injectModule method to WASM #1744

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Apr 3, 2020

Conversation

coolreader18
Copy link
Member

This allows for strict privacy for WASM modules, which is important in an environment like the browser, where most of the time you're running code on someone else's computer. The idea for this method would be to write a wrapper module around some possibly security-sensitive functions that you pass from js, and then allow arbitrary code to access that module. For robot rumble (the successor to codingworkshops, the code learning platform), it might be used like:

// vm.injectModule(name, source, imports?, strict_private?)
vm.injectModule(
    "robots",
    `
__all__ = ["export"]
def export(...): runRobot(...)
# robot-rumble "stdlib" source code
`,
    { runRobot: () => { ... } },
    true,
);
// later
vm.exec(userCode);

We don't want user code arbitrarily calling the runRobot function, so we wrap it in a module that makes sure that it can't, and also (ideally) provides a nicer, more pythonic interface. The strict_private parameter enables that; it runs the source code provided in one namespace with the JS imports, and copies only the items exported in __all__ to a new namespace that is injected into sys.modules.

@coolreader18 coolreader18 force-pushed the coolreader18/wasm-inject-module branch from 03df1ea to 997927a Compare February 4, 2020 05:07
@coolreader18
Copy link
Member Author

coolreader18 commented Feb 4, 2020

Hmm, just realized that it would be possible to access the globals by doing e.g. export.__globals__['runRobot']. I might look into preventing that, though I don't know if it's that necessary; it's pretty obscure.

@coolreader18
Copy link
Member Author

Alright, I added the concept of "incognito" mode (though it's not an actual compile::Mode), which means that functions defined in it cannot have their __globals__ accessed, and frames that originate from an incognito context disallow accessing f_globals and f_locals. There's probably a few things like that that I missed, but I think the basic concept is decent.

@youknowone
Copy link
Member

Because I didn't look into non-vm part of this project that much, I am confused about this a little. To make it clear, this is a extension feature (mostly) for wasm, right?

Basically, I don't like the idea that non-standard python feature slow down the standard features.
It seems this feature is only enabled when inject_modlue is called. So please review if this idea will work:

  1. Define new objects SecureModule, SecureFrame and SecureCodeObject (or Incognito*)
  2. Define traits of module, frame and code object. Because the mode is given to compiler, a single module will have only CodeObject or SecureCodeObject. So they probably perfectly exclude each other.
  3. Turn some functions and struct into generic. Probably vm.run_code, PyFunction and module related things?
  4. These things might be placed in wasm or vm by whether this feature is belong to rustpython or rustpython-wasm.

Note: Here "standard Python" means documented or CPython compatible.

@coolreader18
Copy link
Member Author

coolreader18 commented Feb 7, 2020

I think that would be a better way of separating the concerns, but it might also be cumbersome to have to define CodeObject, Frame, and PyFunction twice. Making core functions/frame code generic might also increase the code size a ton. I think this is sort of about whether or not we should implement features that CPython doesn't have, so here's my take on that: as an alternative Python interpreter, we are in a unique situation where we can experiment with non-standard features that we consider useful, and see how they work/interact with the rest of the language. At least in this situation (e.g. arbitrary Python code running in the browser), a feature like this is definitely necessary, but CPython doesn't usually run in the browser. However, I'm sure there are other environments where Python code does need security like the security that this feature provides, so I think it would be interesting to try this and see where else it could be used, or where else it might lead.

@coolreader18
Copy link
Member Author

coolreader18 commented Feb 7, 2020

Also, w.r.t. "only used from the rustpython-wasm module", I was thinking that it could also be enabled from a Python module by something like from __future__ import incognito or __incognito__ = true.

@coolreader18 coolreader18 force-pushed the coolreader18/wasm-inject-module branch from a023acc to a651729 Compare February 8, 2020 07:07
@palaviv
Copy link
Contributor

palaviv commented Feb 8, 2020

Basically, I don't like the idea that non-standard python feature slow down the standard features.

@coolreader18 can you give us some numbers on the performance impact this change has?

@coolreader18
Copy link
Member Author

Alright, I had some trouble getting the benchmarks to work, but here's what I got:

On master:

running 3 tests
test bench_rustpy_mandelbrot    ... bench: 1,017,088,271 ns/iter (+/- 21,787,720)
test bench_rustpy_nbody         ... bench: 306,066,689 ns/iter (+/- 13,873,395)
test bench_rustpy_parse_to_ast  ... bench:  10,004,214 ns/iter (+/- 1,026,285) = 6 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 3 measured; 4 filtered out


RustPython on  master [!] is 📦 v0.1.1 via 🐍 v3.8.1 via 🦀 v1.41.0 took 6m42s 
❯ 

On this branch:


running 3 tests
test bench_rustpy_mandelbrot    ... bench: 1,028,623,787 ns/iter (+/- 51,170,076)
test bench_rustpy_nbody         ... bench: 305,318,476 ns/iter (+/- 30,826,374)
test bench_rustpy_parse_to_ast  ... bench:  10,042,795 ns/iter (+/- 689,063) = 6 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 3 measured; 4 filtered out


RustPython on  coolreader18/wasm-inject-module [!] is 📦 v0.1.1 via 🐍 v3.8.1 via 🦀 v1.41.0 took 6m50s 
❯ 

Basically very little difference; the difference between 2 benchmarks are within both the standard deviations given by the bencher.

@palaviv
Copy link
Contributor

palaviv commented Feb 9, 2020

I agree that this is a very small decrease in performance. What do you think @youknowone?
We should document features that are not part of CPython. Where do you think this should be documented? I mentioned it in #1192 but we never decided where.

@youknowone
Copy link
Member

youknowone commented Feb 11, 2020

To make things clear, I am not asking changes or trying to decline this PR by these comments. This is just my general preferences.

Before talking about performance, the performance itself doesn't justify the design. So this is a kind of negotiation of trade-off between performance and the implementation cost of the new feature. It seems having multiple PyFunction is not a big deal. If generic has too much complexity for core python features, then still secure objects can take over the complexity part as an optional feature. I think struct SecureCodeObject(CodeObject) and similar secure frame is another option. it seems we don't need 2 versions of frame - we just need 2 versions of FrameRef.

whether incognito mode is included in rustpython core or wasm is a subtle issue for me. I think the feature itself need to live out of the core language. How about this? wasm support is one of the major feature of RustPython but perfectly living out of rustpython core world. So no matter whatever things happen in there. I think any non-standard feature can be added in this way of extension. We just can keep the core for the best compatibility.

About the benchmark, the measured performance doesn't look meaningful. The benchmark should measure the cost of f_globals and f_locals. The benchmark code use them enough? I am doubtful in general sense of Python though I didn't look in the benchmark test code.

Still, if you guys think adding this feature in this way is better than other way, I think it is acceptable. Because usually I was concentrating on core features, I don't have any perspective of the big plans of the project. So I don't want to disturb to decide things we do out of the core language.

@coolreader18
Copy link
Member Author

That makes a lot of sense; I think your idea of separate FrameRef payoad types is much more elegant than my solution, I just don't know if it's feasible with how the PyRef type works at the moment. Eventually I'd like to allow some sort of upcast operation on a PyObject that returns a reference to the payload type of its base class, but I'm still trying to figure out the best way to do that. That would be necessary to implement your solution, though, as FrameRefs stored in e.g. the vm.frames frame stack are just PyRefs and can be accessed from Python, so it wouldn't be possible to store multiple different Frame payloads in the frame stack without that upcast feature.

@coolreader18
Copy link
Member Author

@youknowone would you be okay with this getting merged?

@youknowone
Copy link
Member

youknowone commented Feb 17, 2020

if everybody think this is the best for now, yes, I am okay

@coolreader18
Copy link
Member Author

@palaviv what do you think of this?

Copy link
Contributor

@palaviv palaviv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. We should document this

@windelbouwman
Copy link
Contributor

Hi all,

I looked into this change a bit, and I cannot say I agree with it. Python is insecure by design, you can modify all kind of things. In my opinion, adding an incognito mode adds extra code which tries to protect global variables, but as you mentioned elsewhere, you could still circumvent this by accessing the globals in another obscure way. I think it would be best to stick to CPython behavior, which is insecure by design. Making python properly secure can probably be harder than you imagine at first.

I like the __future__ compilation part of the change, that would be good to merge, but I'm reluctant to merge the incognito mode. Also, I would like to be able to use the rustpython code as a library on embedded systems, and then I want the library to be as simple as possible, and not too tailered towards WASM / javascript.

Please do not be offended by this feedback, but this is my view on the project.

@coolreader18
Copy link
Member Author

coolreader18 commented Feb 21, 2020

Would you be opposed to any attempt to sandbox RustPython? What about a wasm-only builtin module that has a function to wrap a callable into an opaque object?

@opaque_callable
def a(): pass

I think we do need some solution for sandboxing, since a lot of use cases for embedding a scripting language into a program involves running arbitrary code, and I think RustPython's self-contained nature lends itself well to embedding.

@windelbouwman
Copy link
Contributor

I'm not opposed to sandboxing rustpython as a whole. This is already possible by compiling rustpython as a wasm module, and by using the wasm import / export machinery to very tightly control what can be accessed by rustpython. In that sense, the sandboxing would take place by limiting the exposed modules, such as sys, os and the file and socket API's. Maybe we could add feature flags to choose which parts of the standard library you want to use?

My rationale is that we do not require any specific sandboxing features inside rustpython itself. Just treat all python scripts executing in rustpython as one big interconnected mass. If you want to isolate some of your python code, since it is safety related, the I would propose to fire up two rustpython VM's and link them using queues, or another interface. In this sense, you would have two isolated rustpython VM's. One with a few privileges, and a limited API to the second VM. The second VM would than contain the secure code, and have more access.

On top off that I think python is insecure by it's nature, which is also what makes it a great tinkering tool, so I think we should stick to that, and not try to build protection features in. Another reason for not wanting the protection features, is that it might sound like it's secure, but still there are loopholes to bypass this security. In this case it is better to have no security than apparent security.

I agree that rustpython lends itself very well for embedding, and I would like to keep this feature.

@coolreader18 coolreader18 force-pushed the coolreader18/wasm-inject-module branch 3 times, most recently from c065929 to 064187f Compare February 29, 2020 04:51
@coolreader18 coolreader18 force-pushed the coolreader18/wasm-inject-module branch from 064187f to ba50927 Compare March 14, 2020 01:53
@coolreader18 coolreader18 force-pushed the coolreader18/wasm-inject-module branch from ba50927 to 6afdfdc Compare April 3, 2020 17:20
@coolreader18 coolreader18 force-pushed the coolreader18/wasm-inject-module branch from 6afdfdc to a8027ab Compare April 3, 2020 17:34
@coolreader18
Copy link
Member Author

I'm merging this now, since it's basically just the __future__ syntax and the injectModule method, no big changes like incognito.

@coolreader18 coolreader18 merged commit 71af10b into master Apr 3, 2020
@coolreader18 coolreader18 deleted the coolreader18/wasm-inject-module branch April 5, 2020 04:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants