[RFC] Threading #1831


Closed
palaviv opened this issue Mar 28, 2020 · 32 comments
Labels
E-help-wanted Extra attention is needed RFC Request for comments

Comments

@palaviv
Contributor

palaviv commented Mar 28, 2020

Summary

I think we have reached the stage where we should support threading in RustPython.

Detailed Explanation

There was already some discussion on threading, but I wanted to start a dedicated issue as it is a big decision. I will try to break the issue into parts to keep things organized:

Definition of done

We need a user to be able to create a new thread in Python and execute code on the same Python object from multiple threads:

import threading

def worker(num):
    """thread worker function"""
    print('Worker: %s' % num)
    return

threads = []
for i in range(5):
    t = threading.Thread(target=worker, args=(i,))
    threads.append(t)
    t.start()

GIL

I think that one of the biggest discussions is the GIL. Should we add a GIL to RustPython? Should we allow only one thread to use each object at a time?

Suggested approach

I suggest the following changes in order to reach the ability to spawn a new thread:

  • I suggest we create a new VirtualMachine for each thread, making VirtualMachine !Send and !Sync.
  • Use Arc instead of Rc in PyObjectRef. pub type PyObjectRef = Arc<PyObject<dyn PyObjectPayload>>
  • The PyValue and PyObjectPayload traits will require Send and Sync, forcing us to make all Py* structs thread-safe as well.
  • We will need to convert most uses of Rc, Cell, and RefCell in the code to thread-safe alternatives, for example Arc, Mutex, and Atomic*.
  • When accessing the data of an object wrapped in a Mutex, we will lock the mutex for the duration of the access. This will require careful handling when objects are used across threads, to avoid deadlocks.

A simple example of the start_new_thread method in _thread will look something like:

fn start_new_thread(func: PyFunctionRef, args: PyFuncArgs, vm: &VirtualMachine) -> u64 {
    let handle = thread::spawn(move || {
        let thread_vm = VirtualMachine::new(PySettings::default()); // Should get some params from original VM
        thread_vm.invoke(func.as_object(), args).unwrap();
    });
    get_id(handle.thread())
}

Drawbacks, Rationale, and Alternatives

  • I suggest that we ignore the GIL and allow multiple threads to execute Python code simultaneously. This could be a big advantage of RustPython over CPython. Users will be expected to make their own code thread-safe. This might prove problematic for code that relies on the GIL.

  • What about third-party libraries? We do not yet have an API, but would we force them to implement Sync and Send as well?

  • There was a lot of talk about alternatives and I cannot find all of it, so please feel free to add more in the comments. One alternative that was suggested is Crossbeam.

Unresolved Questions

  • How can we do this change in small steps?
  • How to test for deadlocks?
  • Is this the right time to do this?
@palaviv palaviv added the RFC Request for comments label Mar 28, 2020
@coolreader18
Member

I think your idea of making a new vm per-thread is a good idea; we could probably clone the PyContext, builtins, and sysmodule (and some other stuff) into the new vm but create a new exception stack, for example. Maybe we could split off some of the fields that are RefCells into a separate VMState struct in order to make it easier to clone them/wrap them in Mutexes.

@coolreader18
Member

This is pretty interesting/exciting, I think I might work on preparing the vm for this.

@coolreader18
Member

Well, just with this change:

diff --git a/vm/src/pyobject.rs b/vm/src/pyobject.rs
index f0265d1c..929ffab6 100644
--- a/vm/src/pyobject.rs
+++ b/vm/src/pyobject.rs
@@ -1177,11 +1177,11 @@ pub trait PyValue: fmt::Debug + Sized + 'static {
     }
 }
 
-pub trait PyObjectPayload: Any + fmt::Debug + 'static {
+pub trait PyObjectPayload: Send + Sync + Any + fmt::Debug + 'static {
     fn as_any(&self) -> &dyn Any;
 }
 
-impl<T: PyValue + 'static> PyObjectPayload for T {
+impl<T: Send + Sync + PyValue + 'static> PyObjectPayload for T {
     #[inline]
     fn as_any(&self) -> &dyn Any {
         self

cargo gives 716 errors in the codebase, so I think this will have to be a gradual process of converting our types to be Send + Sync, similar to migrating from enum PyObjectPayload -> trait PyObjectPayload a year ago.

@palaviv
Contributor Author

palaviv commented Apr 4, 2020

Yes. This is going to take some time, converting all objects to be Send + Sync. To help track our progress and make sure objects that have already been converted do not regress, I suggest adding a new temporary trait:

// Temporary trait to follow the progress of threading conversion
pub trait ThreadSafe: Send + Sync {}

And we will implement the ThreadSafe trait for each object that is thread-safe. Once all objects are ThreadSafe, we will remove this trait and mark PyObjectPayload with Send + Sync.

@coolreader18
Member

Unfortunately, that won't necessarily work, since a lot of the built-in types store other PyObjectRefs inside, and until PyObjectRef is made Send + Sync itself, those can't be marked ThreadSafe.

@palaviv
Contributor Author

palaviv commented Apr 9, 2020

The problem is that this is a circular dependency. I would like to be able to work on this in small steps. We can go in two ways here:

  1. Use a Mutex in every object that contains a PyObjectRef. Once PyObjectRef is Send + Sync, remove the Mutex or change it to an RwLock.
  2. Make PyObjectRef Send + Sync by wrapping everything inside it in a Mutex. After making all objects ThreadSafe, remove the Mutex.

What do you think? Any other suggestions?

@palaviv
Contributor Author

palaviv commented Apr 11, 2020

> The problem is that this is a circular dependency. I would like to be able to work on this in small steps. […] What do you think? Any other suggestions?

After looking more into this, my suggestion would not actually work, as Mutex<T> implements Sync only when T implements Send.
Another way to go is to temporarily declare:

unsafe impl<T: PyValue> Send for PyObject<T> {}
unsafe impl<T: PyValue> Sync for PyObject<T> {}

This should not affect the current implementation, as we do not support threading yet, but it will allow us to change most objects to ThreadSafe.
@coolreader18 what do you think?

@coolreader18
Member

coolreader18 commented Apr 11, 2020

Technically that would be unsound, but as it's only temporary and we wouldn't actually be using it, I think it might be fine. Also, have you tried adding just a Send bound to PyObjectPayload? I'd think there'd be a lot more payload types that already implement that.

@coolreader18
Member

Dang, just ran into a problem with using RwLocks for bytearray: https://github.com/RustPython/RustPython/pull/1821/checks?check_run_id=579689378. According to the standard library documentation for RwLock, calling read() might (read: will) panic if the lock is already held by the current thread, which doesn't really fit with our using them to replace RefCells, which are perfectly fine with that usage. I think we either have to use some other RwLock implementation, like parking_lot's, which allows for recursive/reentrant reads, or figure out some way to structure our code so that we don't try to read() when a lock is already held, e.g. don't call into python code while the lock is held.

@palaviv
Contributor Author

palaviv commented Apr 12, 2020

Wow that was quick 😅
From looking into the error, it seems to happen only when running with trace. I am not sure which option is better. I don't see any problem with using a reentrant lock, but I do think we should try to avoid calling python code while holding locks.
I will try to look more into the error and update if I understand where the problem is.

@redradist

Guys, I would like to help support threading in RustPython without a GIL!!
Is there a pull request or branch you are working on?

@palaviv palaviv added the E-help-wanted Extra attention is needed label Apr 14, 2020
@palaviv
Contributor Author

palaviv commented Apr 14, 2020

Hi @redradist, That would be great. We have a lot of work to do!!!
You can start converting objects to be thread-safe. You can look at what I did in #1848 and #1860. I would recommend starting with objects that don't contain a PyObjectRef until we make it Send + Sync, e.g. pystruct, io, os; just look for anything that implements PyValue.


@coolreader18
Copy link
Member

@redradist if you have any more problems with running on windows, could you open a separate issue for it?

@redradist

redradist commented May 12, 2020

@coolreader18 @palaviv
Guys, is there anything left to mark ThreadSafe?
I searched the repo and it seems like all classes are already marked as ThreadSafe ... ?!

@palaviv
Contributor Author

palaviv commented May 16, 2020

Sorry @redradist, I have already finished that. There are many other things we need help with. Please look at the contribution guide.

@palaviv
Contributor Author

palaviv commented May 29, 2020

I think we can close this issue. RustPython supports threads without the GIL. You can test this with these examples:

# single_threaded.py
import time
from threading import Thread

COUNT = 500000

def countdown(n):
    while n>0:
        n -= 1

start = time.time()
countdown(COUNT)
end = time.time()

print('Time taken in seconds -', end - start)
# multi_threaded.py
import time
from threading import Thread

COUNT = 500000

def countdown(n):
    while n>0:
        n -= 1

t1 = Thread(target=countdown, args=(COUNT//2,))
t2 = Thread(target=countdown, args=(COUNT//2,))

start = time.time()
t1.start()
t2.start()
t1.join()
t2.join()
end = time.time()

print('Time taken in seconds -', end - start)

When running with CPython I get:

> python3 single_threaded.py
Time taken in seconds - 0.023739099502563477
> python3  multi_threaded.py
Time taken in seconds - 0.023739099502563477

While in RustPython:

> target/release/rustpython single_threaded.py
Time taken in seconds - 2.0910146236419678
> target/release/rustpython  multi_threaded.py
Time taken in seconds - 1.4379611015319824

As you can see we are slow but we are slow on multiple threads concurrently 😄

@palaviv palaviv closed this as completed May 29, 2020
@redradist

redradist commented May 29, 2020

> I think we can close this issue. RustPython supports threads without the GIL. […] As you can see we are slow but we are slow on multiple threads concurrently 😄

@palaviv
Are there benchmarks of RustPython with Rc instead of Arc in PyObject, for the single-threaded implementation?

@palaviv
Contributor Author

palaviv commented May 29, 2020

@redradist we currently don't support this but that sounds like a nice feature if you want to try and take it.

@redradist

@palaviv

@redradist we currently don't support this but that sounds like a nice feature if you want to try and take it.

No, I'd just like to see benchmarks before and after ... )
