2. Agenda
• Talk about web services in a really dumb
(“abstract”?) way
• Explain when we need async web servers
• Why is async hard?
• What is Tornado and how does it work?
• Why am I writing a new PyMongo wrapper to
work with Tornado?
• How does my wrapper work?
3. CPU-bound web service
[Diagram: Client and Server connected by a socket]
• No need for async
• Just spawn one process per core
4. Normal web service
[Diagram: Client and Server connected by a socket; Server and Backend (DB, web service, SAN, …) connected by a second socket]
• Assume backend is unbounded
• Service is bound by:
  – Context-switching overhead
  – Memory!
5. What’s async for?
• Minimize resources per connection
• I.e., wait for backend as cheaply as possible
7. HTTP long-polling (“COMET”)
• E.g., chat server
• Async’s killer app
• Short-polling is CPU-bound: tradeoff between
latency and load
• Long-polling is memory bound
• “C10K problem”: kegel.com/c10k.html
• Tornado was invented for this
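A minimal sketch of why long-polling is memory-bound rather than CPU-bound: each waiting client costs one parked callback (plus its open socket) and no work until a message arrives. The `ChatRoom` class and its method names are hypothetical, not part of Tornado.

```python
class ChatRoom:
    """Holds one callback per waiting long-poll request (hypothetical)."""

    def __init__(self):
        self.waiters = []

    def wait_for_message(self, callback):
        # A long-poll request arrives: park its callback and return at once.
        # Nothing runs for this client until a message is posted.
        self.waiters.append(callback)

    def post_message(self, text):
        # A message arrives: answer every parked request, then forget them.
        waiters, self.waiters = self.waiters, []
        for callback in waiters:
            callback(text)
        return len(waiters)
```

With 10,000 idle clients the server holds 10,000 entries in `waiters` and burns no CPU, which is the regime the C10K discussion is about.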
8. Why is async hard to code?
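[Sequence diagram over time: Client sends a request to Server; Server sends a request to Backend and must store state while it waits; Backend sends a response to Server; Server sends a response to Client]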
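A small sketch of the "store state" problem, assuming a callback-style backend. `backend_fetch`, `handle_request`, and the `pending` queue are hypothetical stand-ins for a real backend client and event loop. The handler returns before the backend replies, so anything it needs later has to be captured explicitly instead of sitting in local variables on the stack.

```python
from collections import deque

pending = deque()  # stands in for the event loop's queue of ready callbacks


def backend_fetch(key, callback):
    """Hypothetical async backend: the reply is delivered later, via callback."""
    pending.append(lambda: callback("value-for-" + key))


def handle_request(key, respond):
    # This function returns before the backend replies, so state needed
    # afterwards (here: prefix) must be captured in the closure below.
    prefix = "response: "

    def on_backend_reply(value):
        respond(prefix + value)

    backend_fetch(key, on_backend_reply)
    # No response has been sent yet when we return.


def run_pending():
    """Deliver queued backend replies, as an event loop would."""
    while pending:
        pending.popleft()()
```

The logic that would be three sequential lines in blocking code is now split across two functions, and that split is what makes async code harder to write and read.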
9. Ways to store state
this slide is in beta
[Chart: memory per connection plotted against coding difficulty, comparing multithreading, greenlets / Gevent, and Tornado / Node.js]
10. What’s a greenlet?
• A.K.A. “green threads”
• A feature of Stackless Python, packaged as a
module for standard Python
• Greenlet stacks are stored on heap, copied to
/ from OS stack on resume / pause
• Cooperative
• Memory-efficient
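To illustrate "cooperative" without requiring the greenlet module, here is a loose analogy using standard-library generators. This is not greenlet's API: a greenlet can pause from inside nested function calls, which a plain generator cannot. What the two share is that a task runs until it chooses to pause, and its local state survives off the OS stack until it is resumed. `worker` and `run_round_robin` are hypothetical names.

```python
from collections import deque


def worker(name, steps, log):
    """A cooperative task: runs until it chooses to pause with `yield`."""
    for i in range(steps):
        log.append((name, i))
        yield  # pause here; name and i are preserved until resumed


def run_round_robin(tasks):
    """Resume each task in turn until all have finished."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # task finished; drop it
        queue.append(task)
```

No task is ever preempted: if `worker` never yielded, the other tasks would never run.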
11. Threads:
State stored on OS stacks
# pseudo-Python
sock = listen()
request = parse_http(sock.recv())
mongo_data = db.collection.find()
response = format_response(mongo_data)
sock.sendall(response)
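A runnable sketch of the thread-per-connection style, assuming a trivial echo "protocol" in place of HTTP and MongoDB. `handle` and `serve_one` are hypothetical names. Each blocking call suspends only its own thread, and the handler's local variables stay on that thread's OS stack while it waits.

```python
import socket
import threading


def handle(conn):
    # Blocking calls: this thread sleeps in recv() until data arrives.
    # `request` lives on this thread's stack for the whole exchange.
    request = conn.recv(1024)
    conn.sendall(b"echo: " + request)
    conn.close()


def serve_one(conn):
    """Dedicate one thread to one connection and return the thread."""
    thread = threading.Thread(target=handle, args=(conn,))
    thread.start()
    return thread
```

The code reads top to bottom like the pseudo-Python above; the price is one full thread stack per open connection.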
15. Tornado IOLoop
class IOLoop(object):
    def add_handler(self, fd, handler, events):
        self._handlers[fd] = handler
        # _impl is epoll or kqueue or ...
        self._impl.register(fd, events)
    def start(self):
        while True:
            event_pairs = self._impl.poll()
            for fd, events in event_pairs:
                self._handlers[fd](fd, events)
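The same idea as a runnable toy, using the standard library's `selectors` module in place of `_impl`. This is a sketch in the spirit of the slide, not Tornado's actual code; `MiniIOLoop` is a hypothetical name, and `remove_handler` and `stop` are added here only so the loop can exit. File descriptors are assumed to be integers (use `sock.fileno()`).

```python
import selectors
import socket


class MiniIOLoop:
    """Toy event loop: one handler per file descriptor."""

    def __init__(self):
        self._selector = selectors.DefaultSelector()  # epoll, kqueue, ...
        self._handlers = {}
        self._running = False

    def add_handler(self, fd, handler, events):
        self._handlers[fd] = handler
        self._selector.register(fd, events)

    def remove_handler(self, fd):
        self._handlers.pop(fd)
        self._selector.unregister(fd)

    def stop(self):
        self._running = False

    def start(self):
        self._running = True
        while self._running:
            # Block until at least one registered fd is ready.
            for key, events in self._selector.select():
                self._handlers[key.fd](key.fd, events)
```

A handler is called only when its socket is ready, so a single thread can watch many idle connections while paying nothing for them between events.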
16. Python, MongoDB, & concurrency
• Threads work great with pymongo
• Gevent works great with pymongo
  – monkey.patch_socket(); monkey.patch_thread()
• Tornado works so-so
  – asyncmongo
    • No replica sets, only first batch, no SON manipulators, no document classes, …
  – pymongo
    • OK if all your queries are fast
    • Use extra Tornado processes
17. Introducing: “Motor”
• Mongo + Tornado
• Experimental
• Might be official in a few months
• Uses Tornado IOLoop and IOStream
• Presents standard Tornado callback API
• Stores state internally with greenlets
• github.com/ajdavis/mongo-python-driver/tree/tornado_async
18. Motor
class MainHandler(tornado.web.RequestHandler):
    def __init__(self):
        self.c = MotorConnection()
    @tornado.web.asynchronous
    def post(self):
        # No-op if already open
        self.c.open(callback=self.connected)
    def connected(self, c, error):
        self.c.collection.insert(
            {'x': 1},
            callback=self.inserted)
    def inserted(self, result, error):
        self.write('OK')
        self.finish()
20. Motor internals: wrapper
class MotorCollection(object):
    def insert(self, *args, **kwargs):
        callback = kwargs['callback']  # (1)
        del kwargs['callback']
        kwargs['safe'] = True
        def call_insert():
            # Runs on child greenlet
            result, error = None, None
            try:
                sync_insert = self.sync_collection.insert
                result = sync_insert(*args, **kwargs)  # (3)
            except Exception as e:
                error = e
            # Schedule the callback to be run on the main greenlet
            tornado.ioloop.IOLoop.instance().add_callback(
                lambda: callback(result, error)  # (8)
            )
        # Start child greenlet
        greenlet.greenlet(call_insert).switch()  # (2)
        return  # (6)
21. Motor internals: fake socket
class MotorSocket(object):
    def __init__(self, socket):
        # Makes socket non-blocking
        self.stream = tornado.iostream.IOStream(socket)
    def sendall(self, data):
        child_gr = greenlet.getcurrent()
        # This is run by IOLoop on the main greenlet
        # when data has been sent;
        # switch back to child to continue processing
        def sendall_callback():
            child_gr.switch()  # (7)
        self.stream.write(data, callback=sendall_callback)  # (4)
        # Resume main greenlet
        child_gr.parent.switch()  # (5)
22. Motor
• Shows a general method for asynchronizing
synchronous network APIs in Python
• Who wants to try it with MySQL? Thrift?
• (Bonus round: resynchronizing Motor for
testing)
23. Questions?
A. Jesse Jiryu Davis
jesse@10gen.com
emptysquare.net
(10gen is hiring, of course:
10gen.com/careers)