Replacing CPython with PyPy at Magnetic

20-08-2014

Magnetic is the leader in online search retargeting, with a large, high-volume, performance-critical platform written in Python. With the help of Maciej, the Magnetic bidders were ported from CPython to PyPy, yielding an overall 30% performance gain.

Areas of work included identifying porting issues such as CPython- and ctypes-based Python modules. Maciej helped us to find pure Python replacements and to port proprietary C extension modules to use cffi.

Modules replaced with pure Python alternatives included:
geoip -> pygeoip
MySQLdb -> pymysql
ujson -> the standard library json module, which was suitably fast on PyPy
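
Whether the standard library json module is "suitably fast" depends on the workload, so it is worth measuring on your own interpreter. The following is a minimal sketch of such a measurement, not the benchmark used at Magnetic; the SAMPLE payload and both function names are hypothetical and should be replaced with a representative message from your own system.

```python
import json
import timeit

# Hypothetical payload; substitute a representative message of your own.
SAMPLE = {"id": 12345, "keywords": ["pypy", "cffi", "uwsgi"], "price": 0.42}


def roundtrip(payload):
    """Serialize and parse the payload once with the standard library."""
    return json.loads(json.dumps(payload))


def seconds_per_roundtrip(payload, number=10000):
    """Average wall-clock seconds for one dumps/loads round trip."""
    total = timeit.timeit(lambda: roundtrip(payload), number=number)
    return total / number
```

Running seconds_per_roundtrip(SAMPLE) under both CPython and PyPy gives a rough per-message cost to compare. On PyPy, a larger number of iterations helps because the JIT needs time to warm up.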

Modules that were ported included a proprietary CPython C extension module, which was replaced with a cffi binding to a .so library.
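
Since the proprietary library is not public, here is a hedged sketch of the general shape of a cffi binding, using strlen from the C standard library as a stand-in. The function name load_libc_strlen is hypothetical. The sketch uses cffi's ABI mode (ffi.cdef plus ffi.dlopen) and assumes a POSIX system, where ffi.dlopen(None) loads the standard C library.

```python
def load_libc_strlen():
    """Return a Python callable that wraps C strlen through cffi (ABI mode)."""
    # Imported here so the module still loads where cffi is not installed;
    # cffi is bundled with PyPy and a third-party package on CPython.
    from cffi import FFI

    ffi = FFI()
    # Declare the C signature to call, as it appears in the header file.
    ffi.cdef("size_t strlen(const char *s);")
    # dlopen(None) opens the standard C library on POSIX systems. A real
    # binding would pass the path of the shared library (.so) instead.
    lib = ffi.dlopen(None)

    def strlen(data):
        # cffi passes a bytes object as a const char * argument.
        return lib.strlen(data)

    return strlen
```

Calling load_libc_strlen()(b"pypy") then invokes the C function directly, with no CPython C API involved, which is what makes this style of binding work well on PyPy.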

Protobuf-generated Python modules that used the protobuf Python module were also replaced with cffi bindings to protobuf-c generated .so libraries, after profiling showed that the generated Python code performed poorly.
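
The post does not say which profiler was used. As one possible approach, the standard library's cProfile can show how much time is spent in serialization code. In this sketch, encode_record is a hypothetical stand-in for a generated serializer and is not part of the protobuf API.

```python
import cProfile
import io
import pstats


def encode_record(record):
    # Hypothetical stand-in for a generated serializer; not protobuf's API.
    return b"|".join(str(value).encode("ascii") for value in record.values())


def profile_encoding(records):
    """Profile encode_record over the records and return a text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    for record in records:
        encode_record(record)
    profiler.disable()

    report = io.StringIO()
    stats = pstats.Stats(profiler, stream=report)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.sort_stats("cumulative").print_stats(10)
    return report.getvalue()
```

Printing the result of profile_encoding on a realistic batch of records lists the functions with the highest cumulative time, which is the kind of evidence that can justify replacing a pure Python code path with a C library binding.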

One of our concerns was increased memory consumption by PyPy processes (we run many of them per CPU). Memory consumption increased by 30-50% initially, causing slowdowns because of swapping on our 16GB RAM machines. These issues were resolved by supplying the machines with an additional 8GB of memory, which stabilized their memory usage.
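
As a rough sanity check on those figures, the sketch below works through the arithmetic. It assumes that the CPython processes already used close to all 16GB, which the post does not state; the names are hypothetical.

```python
def projected_gb(baseline_gb, increase_pct):
    """Memory needed after a percentage increase over the baseline."""
    return baseline_gb * (100 + increase_pct) / 100


BASELINE_GB = 16      # assumption: CPython processes filled the original RAM
UPGRADED_GB = 16 + 8  # RAM per machine after the upgrade

low = projected_gb(BASELINE_GB, 30)   # 30% increase
high = projected_gb(BASELINE_GB, 50)  # 50% increase
fits_after_upgrade = high <= UPGRADED_GB
```

Under that assumption, the projected range is 20.8GB to 24GB, so the 24GB available after the upgrade just covers the upper bound.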

All work agreed within the scope of the project was delivered by Baroque Software in two weeks.

Thanks
Julian Berman
magnetic.com

Embedding PyPy in uWSGI - interview with Roberto De Ioris

10-03-2014

Hello everyone.

To inaugurate this blog, we would like to present an interview with Roberto De Ioris, who is the lead developer of the uWSGI project as well as a co-founder of Unbit. We worked together with Roberto on providing an embedding interface for PyPy. Unbit is using PyPy in production for its customers and has seen performance improvements ranging from 8% to 120% and more.

Maciej Fijalkowski: What does your company do?

Roberto De Ioris: Unbit was born in 2005 as a hosting provider for developers in Italy. We were the first (in Italy) to support "bleeding edge" technologies like Ruby on Rails and Django. In 2008 we started releasing a lot of the source code we wrote in the first 3 years of the company. In 2009 we started working on the uWSGI project to have a single codebase for hosting various customers' applications. Since then the project has evolved a lot, and now all of our infrastructure is based on it. Currently our main business has moved from hosting to consulting for companies that want to enter the hosting/PaaS market or for web-based agencies with scalability and availability problems.


Maciej Fijalkowski: How did you hear about PyPy?

Roberto De Ioris: The first time was in 2008 at the Italian PyCon, but to my ears it sounded pretty "freaky", as it was sold as running Python over Python (there was no JIT at that time). The first time I really got interested was in 2011 at EuroPython. I saw an Armin Rigo talk and started investigating. Starting from 2012 we got blasted by requests to support PyPy on uWSGI. The demos Armin Rigo (and Antonio Cuni) showed were astonishing, so our customers wanted to try it. Unfortunately we were bound to the CPython C API, so supporting PyPy at that time seemed impossible. I started studying PyPy internals; then I tried working on cpyext, PyPy's CPython compatibility layer, and I released a first "almost-working" plugin. It was really a hack and afaik no one took it seriously, especially because performance was worse than CPython :) Then you contacted me about the improvements in the cffi area and proposed an interesting approach: let's write the uWSGI plugin using cffi instead of cpyext. The first attempt was already promising; after a couple of weeks we released the first working implementation, and in summer 2013 we had the first customer using it in production. Soon after, we started investigating whether we could start using it for some of our work. Now we have 3 apps (and a fourth is being tested) in production using PyPy, 2 based on Django and 1 (a REST API server) in pure WSGI. Currently we have improvements in raw performance (read: response times) that span from 8% to a pretty interesting 40%, but we have peaks of an astonishing 100-120% and even more. Take into account that most of our apps are simple "blocking-on-db" ones, so a 2x increase is literally money. The best thing for now is that we started adding more threads, as we experienced (even with the first plugin incarnation) improved thread management.


Maciej Fijalkowski: We're working on improved GIL handling by the way

Roberto De Ioris: Well, personally I find cffi a silver bullet because it allows me to write a lot of things I would have written in C directly in Python, but it is a personal need so I do not know how useful it is. :)


Maciej Fijalkowski: So tell me a few things about uWSGI

Roberto De Ioris: I have to say that it is a very particular project; the hosting market is pretty "frustrating" from various points of view. We decided to invest in an application server after 4 years of infinite problems reported by our customers with the various app servers that were available at that time. Most of those problems were related to bad practices, but I can assure you that telling a developer his code sucks is not easy, especially when he pays you :) So the initial spirit of uWSGI was "bypassing" bad practices, constantly monitoring the app to spot problems, and so on. It may look like a winning approach, but unfortunately a bad app is a bad app and there is nothing you can do to avoid it, so we decided to change direction and tried to implement a common solution for various hosting needs. In a couple of years we were able to support different languages and technologies on top of the same code. In addition to this, our main (almost secret) objective was reducing the resource usage of the application servers, which directly translates to being able to host more customers on the same server. This resulted in a product with really good performance and really low resource usage combined with an incredible number of features (most of them targeted at sysadmins), which allowed us to move our business into other areas. Companies like booking.com (Perl) now use it, as well as other companies in the hosting market (like PythonAnywhere).


Maciej Fijalkowski: I think it is the most popular way to deploy Python if you have performance needs

Roberto De Ioris: I suppose it is, but most of our customers choose it for features like the Emperor, or the zerg mode that simplifies management.


Maciej Fijalkowski: Cool, thanks! Anything you would like to add?

Roberto De Ioris: Well, personally I am one of those people who like understanding the internals of computing, so projects like PyPy are an "amusement park". Yesterday I spotted a chat on IRC where people discussed reducing assembler instructions from 9 to 8. A lot of people consider this kind of thing annoying/boring/useless, but I really love it. It reminds me of the first year of my "computing" experience.


Maciej Fijalkowski: Thank you for your time!