Subinterpreters for Python (lwn.net)
157 points by lukastyrychtr on May 13, 2020 | 139 comments


Subinterpreters existed from the very early days in the C API and were key to the implementation of mod_python (which I wrote). So if you used mod_python, you used subinterpreters without realizing it.

http://modpython.org/live/current/doc-html/pythonapi.html#mu...

EDIT: And it looks like I had subinterpreters in the first released version in May 2000, so the initial git (formerly SVN) commit already had them https://github.com/grisha/mod_python/blob/9b211b7e8a65f1af4b...

EDIT2: Just noticed this comment:

  * Nov 1998 - support for multiple interpreters introduced.


How did you deal with C-extensions, since apparently most don't support it at all (which is a shame, apparently we messed up culturally here).


I didn't :)


From the PEP: https://www.python.org/dev/peps/pep-0554/

> A common misconception is that this PEP also includes a promise that subinterpreters will no longer share the GIL. When that is clarified, the next question is "what is the point?". This is already answered at length in this PEP. Just to be clear, the value lies in:

    * increase exposure of the existing feature, which helps improve
      the code health of the entire CPython runtime
    * expose the (mostly) isolated execution of subinterpreters
    * preparation for per-interpreter GIL
    * encourage experimentation
I think I'll ask the followup question - what is the point of those? Why should we increase exposure of an existing feature we know is not fully baked and we know will cause problems with NumPy/SciPy? How will the exposure improve the code health of CPython and who will do the improvement? What is the advantage in exposing isolated execution of subinterpreters? In what way does exposing this feature help prepare for a per-interpreter GIL? What experiments are being encouraged specifically?


I think the Rationale section tries to explain the benefits? https://www.python.org/dev/peps/pep-0554/#rationale

What I don't get is how they expect C extensions to adapt. I don't expect widespread support for this for like... a decade?


The Nick Coghlan quote seems a bit out of context - sure, there are all these things it does that multiprocessing doesn't do, but multiprocessing allows concurrency at all and this doesn't. The comparison makes sense if it's talking about no-shared-state concurrency, but this PEP is quite clear that it's only proposing no-shared-state ... single-threaded operation.

The rest of the section is sparse on what you actually do with it. It says that it "has the potential to be a powerful tool" but not what you do with the power. It says it's about "enabling the fundamental capability of multiple isolated interpreters" but it's not clear what the capability brings you. The only thing with some details is the first sentence - "Running code in multiple interpreters provides a useful level of isolation within the same process." But what is the level? It sounds like it gives you isolation for most things but not all, which is about as useful as a face mask with a breathing hole cut out in it.

It'd help to see a concrete answer of something you can build with this that you can't build without it (or perhaps not as easily/performantly/reliably/etc.). The Ceph PR, where they do something similar themselves (https://github.com/ceph/ceph/pull/14971), gives a good answer: "Notably, with this change, it's possible to have more than one mgr module use cherrypy." (But it's still not clear to me why the mgr can't just use multiple Python interpreters... ceph-mgr is kind of an amalgam of various useful services for your Ceph cluster, and it's never been clear to me why it has to be "the" manager with some modules instead of various independent services you can turn off/on as you need.)


> multiprocessing allows concurrency at all and this doesn't

I do not think multiprocessing will meet the concurrency needs of many potential applications.

For one thing, I've looked at the multiprocessing source code, and to me it looks like a hairball of bugs just waiting to happen. To run Python code in a separate process with some set of initial data, it basically pickles it in the parent and then unpickles it in the child. That seems awfully kludgey to me.
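
To make that concrete, here's a minimal sketch of what that pickling boundary looks like with stock multiprocessing (illustrative only; the child gets copies, never shared objects):

    from multiprocessing import Process, Queue

    def child(data, results):
        # `data` arrives here by being pickled in the parent and unpickled
        # in the child process; nothing is shared in memory.
        results.put([x * 2 for x in data])

    if __name__ == "__main__":
        results = Queue()
        p = Process(target=child, args=([1, 2, 3], results))
        p.start()
        print(results.get())   # [2, 4, 6]
        p.join()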

For another, if you have multiple concurrent tasks that have to share data, doing it within a single process between multiple threads is a lot easier than doing it between multiple processes. Every other major language besides Python allows the former; only Python has the GIL which makes the former basically useless.


> To run Python code in a separate process with some set of initial data, it basically pickles it in the parent and then unpickles it in the child.

FWIW, the subinterpreter threading scheme will also use a similar message passing construct to pass values between interpreters. As the sibling comment mentions, it's just a message passing scheme.
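
For reference, the PEP sketches an API roughly along these lines (names and signatures were still being debated at the time, so treat this as illustrative):

    import interpreters   # module proposed by PEP 554; not in the stdlib yet

    interp = interpreters.create()

    # Code is shipped to the other interpreter as a string and runs in its
    # own namespace; objects are not shared directly. The PEP also proposes
    # channels (roughly: recv, send = interpreters.create_channel()) as the
    # message-passing construct mentioned above.
    interp.run('print("hello from a subinterpreter")')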

That said,

> I've looked at the multiprocessing source code, and to me it looks like a hairball of bugs just waiting to happen.

I have as well, and completely agree.


Came here just to agree.


This is nothing but message passing. In the high performance computing world, MPI is the standard scheme for running massively parallel calculations. While Python's pickle may not be the highest performance serialization scheme, it is pretty fast and transparently handles most objects you can throw at it. It certainly does not cover all the same use cases as threads, and as you point out, for shared data it is much easier to use threads.


I ended up having to switch to dill instead of the default pickle a while back in one of my projects, because pickling lambdas is apparently not possible by default.

But in general it's pretty useful.
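
Roughly the failure mode and the workaround, assuming dill is installed (illustrative only):

    import pickle
    import dill  # third-party: pip install dill

    f = lambda x: x + 1

    try:
        pickle.dumps(f)            # stdlib pickle can't serialize lambdas
    except pickle.PicklingError as e:
        print("pickle failed:", e)

    g = dill.loads(dill.dumps(f))  # dill serializes the code object instead
    print(g(41))                   # 42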


Same here.

Everything seemed to be working fine until it didn't.


> Every other major language besides Python allows the former

As noted in a reply to your other comment, this is not true -- Ruby has the equivalent of Python's GIL.


Only CRuby has a GIL. JRuby is totally viable for production use and doesn’t have a GIL.


And python also has Jython and other implementations that don't have the GIL. I don't understand what point is being made with that line of thinking?


Does anyone use Jython though? In my experience, only CPython and PyPy are commonly used, both of which have a GIL.


Yes, that's why the work has to start ASAP.

There is currently a project to abstract the C-API with HPy: https://speakerdeck.com/antocuni/hpy-a-future-proof-way-of-e...

The idea is to (very slowly) remove access to direct CPython internals and give a handle to important low-level behavior through a standardized API hiding implementation details.

It's a lot of work, and going to affect a huge part of the ecosystem, but I'm hopeful this is something extension authors will actually want to be part of because it promises to solve a lot of problems:

- in the long run, this will make maintaining a C extension much easier: more stability no matter how upstream evolves, more certainty.

- there will be a clear way to do things, which means better documentation and conventions, potentially libs to do common things. Right now, creating extensions is more a set of recipes people have found to work.

- because of this, scrutiny will be given to common patterns, which will be abstracted and optimized, and given the security of an API contract

- extensions using this API will work out of the box, or faster if they already did, on alternative python implementations like pypy

Of course, the end goal is to free the Python core devs to experiment with breaking changes at the C level, such as changing the memory management model, and eventually maybe removing the GIL or allowing for a JIT.


My take on this is that subinterpreters by themselves don't excite me that much, but the possibility of having subinterpreters running each in their own thread that do not have to share a single GIL is a huge potential win for Python. No other popular language has the huge restriction on multi-threaded code that Python has with the GIL. I have been hoping for years that the Python dev team would recognize that as an issue and try to do something about it.


> No other popular language has the huge restriction on multi-threaded code that Python has with the GIL.

Is Ruby’s (MRI’s) GVL equivalent to Python’s GIL?


It is.


I feel like Lua would have gotten more traction in the early days, and maybe eclipsed Python, if it wasn't so hyper-focused on being a tiny embed-able language/runtime. It was so well-designed from the very beginning.

Every new release of Python seems like it gets bigger and bigger and more and more incomprehensible. The Python 2 -> 3 transition was a (necessary?) disaster. And now we're trying, in earnest, to figure out how to get rid of the GIL. The async/await syntax is a whole other fiasco. Now we have colored functions all over the place. Python code doesn't even work with Python code. It's just an absolute mess.


I tried compiling a machine learning project recently that depended on lua and I had lua version problems (5.2 vs 5.4)

I agree with the 2->3 problem. I also wish there were no GIL.

That said, python is my favorite language of them all.

I wonder if the way forward might just be to expect different incompatible versions from time to time and create robust facilities to handle it.


> if it wasn't so hyper-focused on being a tiny embed-able language/runtime

What do you think Lua's developers should have done differently?

What stops Lua from being used as a general-purpose scripting language?


I think Lua is most useful because it's a hyper-focused embeddable scripting language.

Probably the biggest errors were the historical either/or integer/floating point numbers, and the weird PHP-like hash-tables-are-the-only-first-class-datastructure.


> Python code doesn't even work with Python code

async / await are just primitives; they are in no way incompatible with the rest of Python code.

At worst, if you mix blocking and non-blocking, you lose the performance benefit, that's all. It all works. You are just back to non-async execution.

You can use async functions in sync code and vice-versa.

In fact, you can use async / await for asyncio, or another lib, or no lib at all; it's a completely lib-agnostic concept. It doesn't even need any I/O. async / await is just a "yield from" which understands __await__.

It's an interface. A contract. Like generators.
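
To make that concrete, here's a minimal sketch of a custom awaitable driven by hand, with no asyncio and no I/O at all (names are made up):

    class Ready:
        """A minimal awaitable: __await__ just has to return an iterator."""
        def __init__(self, value):
            self.value = value

        def __await__(self):
            return self.value
            yield  # unreachable, but makes __await__ a generator function

    async def main():
        print(await Ready(42))

    # Drive the coroutine by hand, exactly like priming a generator.
    try:
        main().send(None)
    except StopIteration:
        pass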

What's more, the python community is now following the best practice of creating Sans I/O libs (https://sans-io.readthedocs.io/), which means you can plug in whatever I/O lib you want.

There is a lot of confusion around asyncio, partly because of the communication and doc around it, partly because it sucked up until python 3.7, and partly because a lot of python devs have absolutely no need for it and see it from the point of view of their day to day task.

But it's definitely not a fiasco; it's quite a handy tool, and the generic interface of async / await is nifty and allows cool designs.


Asyncio rocks. It’s so easy to create performant little apps that can get put up without a web server in front of them.

Type hinting rocks. Use it properly everywhere and you gain a lot of confidence and speed when refactoring.

I use attrs (I know there are alternatives) to get legit type checking on object fields. Yes it is runtime type checking but if you write tests, those errors will pop up before you ever run the app.

Honestly I think the priority with Python should just be around digging deep into the implementations for performance gains. I’m very happy with how the ergonomics of the language are now.


Can you summarize how Python async improved in 3.7?


Not async in general, but asyncio in particular.

Before 3.7, you had to be able to fetch the loop, handle its reference, pass it around, manage its life cycle, etc.

Starting from 3.7, only low-level code (e.g. framework authors) would need to; the API is organized so that by default, you always perform something in the current running loop.

It's a big deal, and removes a ton of complexity. Maintenance is easier, the learning curve is flatter.

The best example is that in 3.6 and lower, you had to do "loop = asyncio.get_event_loop()", then call "loop.run_until_complete()". In 3.7, you use asyncio.run(). It seems trivial; it's really not:

https://github.com/python/asyncio/blob/db2fe1d1e1d0f7352167a....
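
A minimal sketch of the difference (illustrative only):

    import asyncio

    async def main():
        await asyncio.sleep(0.1)
        return "done"

    # 3.6 and earlier: you fetch and manage the loop object yourself
    loop = asyncio.get_event_loop()
    print(loop.run_until_complete(main()))
    loop.close()

    # 3.7+: the loop becomes an implementation detail
    print(asyncio.run(main()))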

This has consequences for the whole design, which is now full of things like that (run_in_executor, create_task, run_forever, etc.), completely changing the way to write correct code.

Another thing is that in 3.7, asyncio itself uses async / await and not yield from. Because of this dogfooding, the whole lib improved.

3.7 has a LOT of patches just for asyncio: https://docs.python.org/3.7/whatsnew/changelog.html#python-3...

Speed (Future has been rewritten in C, and many other tweaks), fixes, _way_ better doc...

Obvious shortcomings were resolved, like asyncio.wait_for() now waiting until the cancellation is complete, or asyncio.all_tasks() returning only pending tasks. A lot of little things that add up.

The whole ergonomics are better, up to the command line experience with "python -X dev", which automatically adds warnings and sets asyncio to debug mode; frankly, you can't cleanly code async without it.

The final nail in the coffin is context vars (https://docs.python.org/3/library/contextvars.html). It's the only sane way to deal with a global registry that is dependent on the current running task.

It's almost always easier to upgrade to 3.7 than trying to use asyncio before it.


It's the natural progression of tiny programming languages that become popular.

The idioms and programming styles change over time. Languages age. You add more syntax and semantics until some people start preferring simpler solutions, and everything starts again.

C has stayed clean, but C++ can be seen as a fork of C into never-ending complexity.

Java started as a small language intended to be used like JavaScript.

JavaScript started small; now it's becoming Java.


I think that the beauty of Python is that it's still Python.

I mostly still use the same features I used 10 years ago. And so do most of my clients.

But occasionally, I have the option to go the extra mile, and use asyncio for performance or type hints for robustness.

The language grew big, but you don't feel it on the day to day coding.


What's wrong with the async/await syntax?


It should have been implemented as functions or attributes instead of language syntax sugar.

Like Lua did. Like Rust did.

Imagine this:

   def foo():
       return "bar"

   async def foo():
       return "bar"
How do you know how to call that function? It's called the same thing. It has the same signature. It returns the same thing. One is sync, the other is async. This is "colored functions".

You have to call the 2nd one like:

    await foo()

Why do you have to care about that? It's either a coroutine or it's not. All functions in Python should just be callable like normal functions. But the Computer Scientists went and messed that up and made colored functions.

It's absurd.


I don't mind the "await foo()" so much; as others have noted, it makes explicit that your code is yielding control at that point.

What I mind is having to do "async def", "async for", "async with" all over the place for no good reason. Python didn't do that with generators; you defined an ordinary function, and if it had a "yield" somewhere in its body, it returned a generator. I don't have to do "gen def", "gen with", "gen for", etc. to remind the interpreter that it's a generator.

The obvious way to handle async coroutines is the same: you define an ordinary function, and if it has "await" somewhere in its body, it's a coroutine. I shouldn't have to be reminding the interpreter everywhere that it's a coroutine.

(The reason "await" is a special case is that, with a generator, it's just an ordinary iterable, so it's not yielding control if I just use it like any other iterable, say in a for statement. If I want to do something with the generator that might yield control (as in the "using generators as coroutines" paradigm that came before async/await), I have to call its "send()" method, or otherwise do something special that will stand out. So there's no need to add a keyword for it. But with a coroutine, ordinary call sites that wouldn't yield control with any other kind of function can yield control, so it makes sense to add a keyword to make that explicit.)


I have the opposite opinion. I think generators should have had an explicit keyword for defining them, like:

gen def stuff()

Because it doesn't, people are very confused about generators, and think they are functions, which they are not. They are completely different objects.

Plus there is no way to tell if a definition is a generator without reading all of it.

Explicit is better than implicit and all that.


> What I mind is having to do "async def", "async for", "async with" all over the place for no good reason.

IMO `async for` and `async with` are misnomers - they are statements that call asynchronous functions (`__anext__`, `__aenter__` and `__aexit__`) so would have been better named something like `for await` and `with await`.

`async def`, on the other hand, was just copied cargo-cult-style from C#. I don't think it's necessarily bad, but you're right to say that it's inconsistent with Python's existing generators.
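
For what it's worth, that's exactly what the dunder protocol looks like; a minimal sketch with made-up names:

    import asyncio

    class Resource:
        async def __aenter__(self):
            await asyncio.sleep(0)   # stand-in for asynchronously acquiring something
            return self

        async def __aexit__(self, exc_type, exc, tb):
            await asyncio.sleep(0)   # stand-in for asynchronously releasing it

    async def main():
        # "async with" awaits __aenter__ and __aexit__ for us
        async with Resource() as r:
            print("inside", r)

    asyncio.run(main())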


If all you're ever going to do with the result of an async function is await it immediately after calling, then you don't need async functions because you're just doing synchronous programming with extra syntax.

The reason there's a difference between foo() and await foo() is that the former gives you a future (aka Promise in JavaScript), while the latter gives you the result. You can then wait on multiple futures, so that things can happen concurrently.
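
A minimal sketch of that distinction (function names are made up):

    import asyncio

    async def fetch(name, delay):
        await asyncio.sleep(delay)       # stand-in for real I/O
        return f"{name} result"

    async def main():
        fut = fetch("a", 0.1)            # an awaitable, not a result yet
        a = await fut                    # now we have the value
        b, c = await asyncio.gather(     # run two awaitables concurrently
            fetch("b", 0.1), fetch("c", 0.1))
        print(a, b, c)

    asyncio.run(main())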

If you add type annotations to your code and use mypy (which you definitely should), it'll pick up the distinction between the two types - Awaitable[str] vs str - and give you an error if you try to pass the future object to a function that expects a string.

The only case where I've run into problems is where I've called an async function that returns None, but have forgotten to await the result. In that case the function never actually begins execution.


Writing multi-threaded programs is much, much harder than single-threaded code. I think part of the attraction of async/await is that you can write mostly non-concurrent code. But then it gets extremely important that you're aware when you might have concurrency. For instance:

    x = get_something_from_global_state()
    y = foo()
    insert_into_global_state(x, y)
If the program is suddenly possibly concurrent when calling foo(), then perhaps x is no longer valid. Good luck debugging that. await foo() makes it much more obvious.

Now, of course, in scenarios where you don't really care about this property, e.g. if you're dealing with a database in another process where it is obvious that stuff may happen behind your back, or if you don't actually have any concurrency, I get that await is probably pretty annoying.


This was already the case with generators, wasn't it? Worse, in that case it depends on the yield statement, which cannot be seen from the function definition.

In fact I'm pretty sure generators and async/await are almost the same feature, with possibly just some additional logic to allow await to suspend computation (so you don't have to write a loop `while not done(): sleep()` manually).


Yes, `await` is just a modified form of `yield from`.


> How do you know how to call that function?

async def foo() is not a function and it doesn't have the same signature: the async is very explicit.

It's a different primitive, and it's not used AT ALL for the same thing as a function.

It's like writing:

    def foo():
        yield "bar"
and saying: 'How do you know how to call that function? It's called the same thing. It has the same signature. It returns the same thing. One has one exit point, the other has several. This is "colored functions".

You have to call the 2nd one like:

    next(iter(foo()))
It's absurd'

Well, no, it's not. It's a different tool. It's not a function, it's a generator.

You don't have to care about async/await. In fact, you can live your entire life without using them.

But it's a handy tool that allows you do:

- delegate asynchronous behavior, like I/O, outside of your callable

- match the syntactic order with the execution order despite said delegation

- make it clear and explicit where you chose to do so, identifying the context switching of your program

Does it seem complicated? That's because it is, and you probably don't need it.

But if you need it, it's easier than the alternative of using callbacks, and less error-prone than implicit asynchronicity.


Isn't your example exactly how Rust works too?


Rust is statically typed, so it’s not quite the same effect


The argument for requiring `await` is that it makes explicit which calls might yield control back to the event loop.

With Lua’s stackful coroutines, this information would either be hidden or just a naming convention, e.g. read_file vs read_file_async.


I'll bite. Opinion follows, no need to downvote.

async def

is terrible, of the syntactic/semantic options it is the worst.

Best would have been to just allow creation of a name for a coroutine-usable version of a function:

    def f():  # normal f
        ...

    a = async(f)

That allows existing functions to be turned into coroutines while not introducing the new syntactic pattern of a double word declaration.


You can write a trivial decorator to make any normal blocking function async-usable (i.e. your suggestion is possible today and only takes a few lines of code — not even using deprecated functionality, I mean), but that doesn’t magically make it async-capable if it’s not designed for async in the first place.
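
For reference, a sketch of such a decorator, assuming all you want is to push the blocking call onto the default executor (it doesn't make the function itself async-capable):

    import asyncio
    import functools

    def awaitable(fn):
        """Wrap a normal blocking function so it can be awaited.

        The body still blocks; run_in_executor just moves it to a thread
        pool so the event loop isn't stalled while it runs.
        """
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            loop = asyncio.get_running_loop()
            return await loop.run_in_executor(
                None, functools.partial(fn, *args, **kwargs))
        return wrapper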

If you’re arguing for generator-based coroutines, then (1) that’s not just any normal function (check inspect.isgeneratorfunction and inspect.iscoroutinefunction); (2) it was already explored and deprecated. Guess we can have another talk on why if you wish.


Sure, there are a few different issues here, and my response above was mostly off the cuff.

I have had the pleasure of engaging with cooperative vs scheduled concurrency since the 1980s, in the early days of the Mac, and asynchronous IO since it started to enter the mainstream in the 1990s. I don't write enough python any more that this impacts me much; more that I remember finding out about it and being astonished.

That's not to say the work is bad: the async PEP is well written, and one gets the mental model the author is promoting and the problems the PEP is trying to solve. I used tornado a bunch in its early days and it was, and I'm sure still is, a lovely piece of engineering.

Nevertheless, among the reasons for my astonishment were:

* there is a difference between cooperative concurrency and largely IO-scheduling driven asynchrony. There are a range of protocols that callers and callees engage in, a level of greater or lesser interest that the caller has on what and when the callee does it, and a variety of ways the callee can make what it did more or less eventually known to the caller. Cooperation and async capture two distinct common use cases in this rich space, yet the python implementation seems to improperly lump the two together. And the added keywords are crude and only provide partial expression of the range of options. For instance, deadlines are a top level consideration both in cooperation and in asynchrony, yet there is no syntax for specifying them.

* there is a level of sophistication required for effective cooperative or asynchronous programming that, despite the rise of node and so forth, is simply not met by the bulk of programmers employing it, nor necessary in the programs they are writing. Elevating the syntax to first class as though it is something for most python programmers to consider- well, you can have a synchronous function, or an asynchronous function, pick one- is putting power tools in the hands of people who mostly don't know how to use them working on problems that entirely don't need them. Especially for python, this seems like the wrong thing to have done.

* the doubled keywords pattern is ugly. Yes, if you are going to allow for cooperative behavior, decorating the affected syntactic entities is necessary. If it had to get added to the language I would much rather have seen new single tokens- adef and afor- for this purpose, maybe added to an "expert" module to be "imported" before use. With a whole new declarative keyword, one asks- what else can this be applied to? Why not have async variables or instance members? The syntax suggests itself and there are potential semantics.

Forgive me, but I was shocked when I saw this and still think it was an immature and unnecessary addition to the language.


These are reasonable arguments but I largely disagree as someone who might not meet your threshold of sophistication.

* Apparently the "crude" syntax is supplemented by an asyncio library providing more fine grained controls without further complicating Python syntax.

* I don't think gatekeeping is a very good argument against a language feature. Obviously Node is special because it forces you to use async even if you just want to write boring old blocking code, but in Python this is strictly optional, and having it as a language feature doesn't mean "most Python programmers" are suddenly going to consider it, just like most Python programmers don't consider metaclasses. I'm certain the vast majority of Python programmers aren't having this inner debate of "you can have a synchronous function, or an asynchronous function, pick one"; it's been five years and four major Python versions and I've never heard anyone complaining about this choice when it's not relevant.

* Meanwhile, I've personally used native asyncio for both of your distinct common use cases (replaced threading-based solutions, saw saner code and some performance improvements), so I consider it a success.

* Ugliness is subjective, I'd say adef/afor/awith/etc. are more of a wtf if I'm new to the language, and you end up with more keywords. The async keyword also leaves the door open; async def/with were introduced in py35 while async for was introduced in py36, so it is conceivable that the async keyword could be expanded to more use cases in the future.


Thanks, fair points, no offense intended.


None taken, I honestly don’t think I’m very sophisticated when it comes to async and concurrency (but I do benefit from being served).


How would you implement async() under the covers? If you wanted something like go (or loom) then wouldn't you need stack swapping? And if you have those then you muddy Python's #1 best feature: the almost free C ffi.


You can still do that using the @asyncio.coroutine decorator,

for example

    import asyncio

    @asyncio.coroutine
    def hello():
        return "foo"
Although not for long: it was deprecated in 3.8 and will be removed in 3.10. I wish they would leave it, since I was able to use it to write an API that could be synchronous or not depending on which constructor was invoked. I'm wondering if there are better ways to do it.


I usually see folks cite that async/await exposes, to the programmer, implementation details that one wishes were hidden by the compiler/interpreter.


Everybody wishes asynchronous functionality was magically hidden by the interpreter until unexpected shit starts happening.


That is python's MO, and is why things like the double underscore functions exist. Being able to declare a coroutine (not just use one) is not a bad idea, but IMHO it was done badly.


Python 2 -> 3 was such a waste of everyone's time and has done irreparable damage to its ecosystem.


I think the Haskell saying "avoid success at all costs" is relevant here. Lua might have become a more popular language if it was not so focused on being embed-able. However, it would likely have failed at its goal of being an embed-able language.


Sorry, can you ELI5? (I have 3ish years of python experience but rarely get this far into the weeds.)

What is an example of being more incomprehensible?

>Python code doesn't even work with Python code

Is this statement a function of the 2 -> 3 transition?


Probably additions like the walrus operator that "complicate" the syntax and negate the "Zen" of Python's one obvious way of doing something.

Migrating away from the pseudocodeness of Python.
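
(For anyone unfamiliar, the walrus operator mentioned above just lets you bind a name inside an expression, e.g.:)

    # Python 3.8+: assignment as an expression
    if (n := len("hello world")) > 10:
        print(f"too long: {n} characters")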

Just a guess though I have about the same experience as you.


> What is an example of being more incomprehensible?

Python's type hints, in particular, have added a huge amount of complexity to the last few versions of the language.


Type annotations are completely optional though. It's possible to write Python that targets 3.5 and 3.8 at the same time.


The core developers made data classes unnecessarily dependent on type annotations, which made me worry that annotations would become less-and-less optional over time. Fortunately, that doesn't seem to be happening.
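
For illustration, this is the dependency being referred to: dataclasses discover their fields through the class's type annotations.

    from dataclasses import dataclass

    @dataclass
    class Point:
        # Fields are declared via annotations; without the ": int" parts,
        # dataclass would not treat these names as fields at all.
        x: int
        y: int = 0

    print(Point(3))   # Point(x=3, y=0)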

However, the numerous (very long) typing-related PEPs are definitely making the language larger and more complex.

Also, any code that uses annotations is more verbose and less like what I want from Python - executable pseudocode.


> Fortunately [annotations becoming less-and-less optional over time] doesn't seem to be happening.

On the other hand, Guido himself recently said "new stdlib code should be written with annotations inline" [0] whereas the core developers original position was "it will be the choice of the author of new stdlib modules whether and how to use type hints" [1], or even "We will not be putting type annotations anywhere in the stdlib" [2].

[0] https://pyfound.blogspot.com/2020/04/the-path-forward-for-ty...

[1] https://mail.python.org/archives/list/python-dev@python.org/...

[2] https://mail.python.org/archives/list/python-dev@python.org/...


> NumPy core developer Sebastian Berg chimed in as well. He suggested that it could take up to a solid year of work to support subinterpreters in NumPy.

Whoa, that's a shame. I actually found it really easy to patch NumPy to make it PyParallel-compatible -- simply had to tweak the memory allocator stuff: https://github.com/pyparallel/numpy/commit/046311ac1d66cec78...

Example that loaded a 12GB NumPy array and serviced requests in parallel: https://github.com/pyparallel/pyparallel/blob/branches/3.3-p....

I really wish the PyParallel approach gained more traction. Having the solution Windows-only reaaaaally didn't help with having other core developers experiment with the approach.

[*]: https://pyparallel.org


This seems great!!

Obviously, being Windows-only is a big drawback - is there any hope it could be implemented for other platforms some day? (I mean in terms of technical feasibility, rather than effort, money etc)

I know literally nothing about the OS side of things relevant to this topic, but googling for "async io linux" turns up io_uring which seems to date from 2019ish and is maybe addressing some of what's lacking there?


If you're looking for a bit more info on the Windows-only aspect, there is a lot of detail here: https://speakerdeck.com/trent/pyparallel-how-we-removed-the-....

I think the TL;DR though is that PyParallel was a successful proof-of-concept (but seems to have failed at moving the bar anywhere) because the threadpool and async I/O primitives on Windows are so much more sophisticated than what's available on any other platform.

On Linux/Mac, I'd have to write so much scaffolding to get the same baseline functionality offered by Windows, and many parts wouldn't even be possible to replicate. It'd be a huge engineering effort that would take a team of people. (Just like all the Vista+ threadpool stuff took a team of kernel engineers working for years at Microsoft.)

That being said, I've been looking at stuff like the Chromium cross-platform threadpool stuff recently and that could potentially be used as a substitute (I believe it maps 1:1 with native threadpool APIs on Windows, and mimics the best it can on Linux/Mac). But that's an unwieldy 3rd party package for Python to suddenly depend on.

I also disagree with the sentiment that the GIL, parallel computing and async I/O are all separate, orthogonal pieces. The reason PyParallel was so performant was the fact that I treated all three as very intertwined concepts that had to be addressed all at once.


As usual, when someone makes a move towards the Coveted Feature, people promptly insist that gradual progress is impossible and the only way is with every problem being solved at once in the hundred million modules—each somehow taking a year to undo the use of global variables. I.e. same as it was for the past twenty years when gradual progress could've been made. Deference is again made towards Numpy, the de-facto implementation of Python. No trace of choice is offered to users who would perhaps use a handful of modules from those that happen to be converted at a point in time, or use Python in ways that aren't practical now.

The actual problem, of course, is that multi-threading Python would still be slow, and those who see it as just a shell to run C modules, do have a point.


This seems like such a significant problem for the language and has been around since the beginning of time. Considering the amount of value that Python adds to companies and individuals around the world, is there a reason that institutions or someone with the means hasn't funded a project to "solve" the GIL problem?


PyPy put up a prototype and proposal to deliver a GILless python for $100,000. No one took us up on the offer. It is still open.

https://morepypy.blogspot.com/2017/08/lets-remove-global-int...


I find it really sad that no company will fund this. One of the huge python deployments would recoup this in hosting quickly.

Instead we’re all just burning more money on AWS.


It's far more common to have Python programs be I/O bound, and when they are CPU bound it's often not due to GIL contention (remember that C extensions can drop the GIL before doing something lengthy). It would be nice if the GIL was gone but a fair fraction of Python developers would not notice much difference.


> It's far more common to have Python programs be I/O bound

But part of the reason for that is that Python programmers know there's no point in trying to run multiple CPU-bound tasks in the same Python process, so they don't try.

> C extensions can drop the GIL before doing something lengthy

Yes, but they're still limited in what they can do--as soon as they call back into a Python bytecode they're GIL-bound again. And if you depend on C extensions any time you need CPU bound concurrent tasks, you're giving up a lot of the advantages of using Python in the first place.


It’s part of the reason but not all or, I suspect, even most: the number of things which people need to compute with multi threading but without much I/O is not an especially large fraction of what people use pure Python for. If you need raw CPU speed for arbitrary code, it’s not the language most people would pick.

The exceptions also tend to have existing high-quality extensions (crypto, compression, image processing, etc.) so while it’s technically true that you’re giving up Python most people aren’t doing that personally - they’re just calling Pillow or numpy - or they’re using it for a tiny fraction of the total program.

Frequently this ends up being the same speed or even faster than using other languages because most people are either using the same C libraries or learning just how many optimizations their simple implementation lacked.

Again, it’s not hard to come up with things where the GIL is inarguably a bottleneck but it comes up a lot more in debates than real-life in my experience.


I'm one of those with embarrassingly-parallel cpu-bound workloads. The multiprocessing module works, but the extra bookkeeping and plumbing over an actually-parallel-multithreading implementation is a pain in the butt. That said, the multithreading speedup is both faster and easier than porting to another language.


Definitely - I used to support a computational lab and have worked on enough other CPU-bound problems to have plenty of things which I wouldn’t recommend Python for. I just think that as a field we’re predisposed to focus on that as the kind of work Real Programmers™️ do when most groups are limited by their ability to implement and maintain business logic long before they hit the wall on what Python can do.


Does I/O release the GIL in Python?


Yes, generally.


I think it's just not that big of a problem / can be worked around. I had a python product using multiple terabytes of RAM across 1000s of cores, and we just used one process per core.


Google threw money at the problem and that didn't help. Look up the history behind the "Unladen Swallow" project.


> Google threw money at the problem

Not really, especially compared to the amount of money they threw at JavaScript. IIRC Unladen Swallow was just a couple of interns.


Unladen Swallow wasn’t actually a serious attempt like V8 or anything and a large part of why it failed was that LLVM wasn’t ready and even when it was, it was ill suited to compiling a highly dynamic language like Python.


I think the big problem is all the code out there that depends on the GIL. A hypothetical python4 that removed the GIL would probably cause issues of the same general nature as the 2 -> 3 migration, though at a reduced magnitude.


I recall digging into Python subinterpreters a decade ago. Abandoned it because the real killer wasn’t shared GIL, it was shared modules. If one subinterpreter imports a module and modifies that module’s state, then every other subinterpreter that uses that module is impacted too.

Article says nothing about that, which makes me very cautious. The GIL only impacts performance; module sharing wrecks robustness.

Even if each subinterpreter does now keep its own module cache, there’s still the challenge of working safely with common C-level resources such as file handles. (Other than telling users “don’t do that”—and GLWT.)

I’ve used Python now for 17 years and it’s been a very productive tool for me. But it definitely has its baked-in limitations and fighting those is an exercise in rapidly diminishing returns.

Something as fundamental as parallelism can't just be slopped on top of a language as an afterthought; it needs to be designed in from the start. So rather than trying to retrofit bad parallelism onto Python, perhaps it'd make more sense to bring the positive parts of Python over to something that already does parallelism right, such as Erlang, and build from there.


I don't see the benefit of a subinterpreter compared to a subprocess. I wish python didn't have a GIL, but I don't see what problems this would solve for me that multiprocessing doesn't.


Getting to a go-like lightweight concurrency model in the language. Different abstraction than multi process.

Py subinterpreters will never be as cheap as a goroutine stack, so you won't be able to run millions of pyroutines. But thousands is a reasonable and useful goal.


It's interesting to compare with Go, which uses goroutines for all concurrency. Python has threading, multiprocessing, async-await and now subinterpreters.

Remind me which language is designed to have "one obvious way to do it"...


Yeah, in his later years I think Guido stopped having the energy to keep saying no.

Not a criticism. He deserves nothing but plaudits. One of best earned retirements in all of CS history, IMO.


> Yeah, in his later years I think Guido stopped having the energy to keep saying no.

https://twitter.com/gvanrossum/status/773593466609610753


Python was first released in 1991. It's older than Java. It existed before the internet or multi-core was common.

You have to balance "one obvious way to do it", "staying relevant" and "backward compatibility".

We broke compat once in 25 years to stay relevant with I/O, and gave 13 years to migrate.

And still, it was an outrage.

Let's see what Go looks like in 2034.


and greenlet, and stackless, and pypy's stm..

python's never really been about dogma. part of why its package management is such a mess is because it's been so successful at integrating with everything under the sun and still needs to support it all.


Honestly for python there's still one obvious way given just about all use-cases. Use multiprocessing for now.

In the future a lot of stuff is going to shift over to async-await. Until enough of your dependencies do, it's fine to ignore it for now.

If you need any of the other options you'll probably already know and don't need to ask.

Threading is good if you're doing a lot of waiting, but this use case is more or less replaced by async await.

Multiprocessing is good multi-purpose. It works around the GIL and so works for most use cases.

Async-await is like threading with a nicer programmer interface and works especially well for networking code and the like. The nice thing about async-await is that it's trivially interoperable with all the previous options. When using a framework or library that has an async-await API it'll be doing whichever form works best on the backside. E.g. awaiting a database query might spawn a thread or a process and you don't need to care.

Subinterpreters are only relevant if you're writing C-code that embeds the python interpreter. So e.g. the mod_wsgi developers might care. A python developer never uses these directly.

So unless you're a library author or doing fairly low-level things you won't need to care about any of the options but async-await pretty soon. The only awkward thing is that async-await isn't widely used/supported yet leading to there currently being two obvious ways to do it depending on what your libraries support.

tl;dr: Use multiprocessing if you must, use async-await if you can. Let library developers worry about the tricky bits. Pretty soon, just use async-await always.


> Subinterpreters are only relevant if you're writing C-code that embeds the python interpreter. [...] A python developer never uses these directly.

The linked article is about making subinterpreters available to pure Python code.

> Pretty soon just use async-await always.

I can see async/await supplanting threading for I/O-bound concurrency, but it doesn't provide parallelism. For that, maybe subinterpreters will eventually supplant multiprocessing?


async / await is just a protocol to delegate async behavior outside of your control flow.

You can provide the async behavior using any solution you want: threads, an I/O event loop, subinterpreters, subprocesses...

It can all be called from async / await.

In fact, if you use asyncio, the lib, it provides mechanisms to await from threads, subprocess.Popen, and multiprocessing pools.
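
A minimal, Unix-flavoured sketch of awaiting a process pool and a subprocess from the same coroutine (illustrative only):

    import asyncio
    from concurrent.futures import ProcessPoolExecutor

    def cpu_bound(n):
        return sum(i * i for i in range(n))

    async def main():
        loop = asyncio.get_running_loop()
        # Await work running in a separate process (a pool of workers)
        result = await loop.run_in_executor(ProcessPoolExecutor(), cpu_bound, 100_000)

        # Await an external subprocess
        proc = await asyncio.create_subprocess_exec(
            "echo", "hello", stdout=asyncio.subprocess.PIPE)
        out, _ = await proc.communicate()
        print(result, out.decode().strip())

    if __name__ == "__main__":
        asyncio.run(main())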


Those are all different kinds of “it” with differences which affect your application design.

If you want a stronger criticism, pick multiprocessing/threading and concurrent, and the long migration trajectory for deprecating the older ones.


> Those are all different kinds of “it” with differences which affect your application design.

"It" here is concurrency, and Go proves that its possible for a language to have "one way to do it". Goroutines are often seen as an alternative to async/await, but they're also a better parallelism solution than all of Python's threading, multiprocessing and subinterpreters.

> If you want a stronger criticism, pick multiprocessing/threading and concurrent, and the long migration trajectory for deprecating the older ones.

I haven't heard anything about deprecating multiprocessing/threading. Is that the long-term plan for subinterpreters?


I never understood why Stackless Python didn't become the de-facto implementation.


Performance, I suppose? Spawning a process is expensive.


I understood that on Linux using a process was similar in speed to a thread, though they're much slower on Windows. Has that changed?

For my python multi core code I like using fork with sending objects over a socket with pickle. It gives more control than multiprocessing.

It works pretty well. One downside of fork with python, however, is that the reference counters are scattered about, leading to big COW memory churn. Big numpy arrays should still be shared, though.
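
A minimal Unix-only sketch of that pattern (illustrative, no error handling):

    import os, pickle, socket

    parent_sock, child_sock = socket.socketpair()

    if os.fork() == 0:                    # child: inherits memory via fork
        parent_sock.close()
        data = pickle.loads(child_sock.recv(4096))
        child_sock.sendall(pickle.dumps([x * 2 for x in data]))
        os._exit(0)
    else:                                 # parent
        child_sock.close()
        parent_sock.sendall(pickle.dumps([1, 2, 3]))
        print(pickle.loads(parent_sock.recv(4096)))   # [2, 4, 6]
        os.wait()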


Spawning the process is not the only problem; communicating between processes is also expensive.


This makes sense. Combine it with asyncio and you have a full Node-like runtime without relying on Uvicorn and the like (even though clustering is still good for scaling).


If you're using asyncio, I don't see what having a subinterpreter gains you. Asyncio by itself is already a Node-like runtime without relying on third-party libraries.


True true, my comment was not clear. With asyncio it's still a single threaded event loop (that can be clustered like Node or any Python processes as has always been done).

With sub-interpreters, I wonder how different it is from normal multithreading like in JVM. Need to read up on this idea. I can't see how it wouldn't require the same locking and mutexes or message passing if multiple interpreters are to work on logically related data.

So my immediate thoughts were about leveraging it as a replacement for multithreading and event loop clustering by treating the interpreters as lightweight processes and having them communicate over some kind of protocol. Like how the BEAM does built-in process supervision.

The "supervisor" process itself is async in the way it coordinates the tree.


Aaand I finally read the article and PEP and there's no escape from the GIL :( Damn wishful thinking.


Yeah... super confusing.


Tcl has had those for...20 years or so? They're pretty handy for some things.


Absolute sandboxing, with the option to engage a "safe" interp that also (tuneably) segregates it from OS facilities like the network and filesystem.

/s Will be looking forward to Pythonistas to be extolling the virtues of multiple interps as much as hearing node folk go on about event-oriented programming ;)

Nb: Much love to both node and python. Looking forward to seeing python exercise this.


Yes, and have you heard about the new hotness in Python, f-strings? Special strings where you can actually embed variables and executable commands WITHIN the string!!!


This is pretty common for languages right? TCL and the JVM come to mind. Nice for code running in a sandboxed environment, especially if you have full control. I think python is getting more interesting lately, always found it a bit of a boring language.


>In particular, giving each subinterpreter its own global interpreter lock (GIL) is not (yet) on the table.

What's the use then?



So, basically, no real reason besides "preparation for per-interpreter GIL"...

I, for one, wouldn't be encouraged to use it in this state, where it doesn't really bring any benefits yet!


Next release iirc.


I suspect at this point it will require a new interpreter to fully address the deeply-embedded assumptions about memory access and bytecode execution in CPython. This has been tried many times, though, with only PyPy really seeing much traction. CPython being the de-facto interpreter, and the huge ecosystem of widely-used C extensions, makes any change likely to be slow... and the upside would need to be very compelling.

I think what we see instead is people moving to different runtimes altogether -- golang, jvm, v8, whatever.

I would suggest that Python has gone off the rails trying to mimic other languages too much recently. At some point the identity crisis hopefully ends and Python will return to its strengths -- which are best expressed by the Zen of Python. Until then, get ready for more features.



My first thought was, like many others, "if they still share a GIL then why bother?"

But if this is a first step along the road to that, then it's all good and why not.

Another thought is: could this be a "thread-safe alternative to gevent" in some use cases?

I'm thinking particularly of web app hosting situation where you have something like uWSGI+gevent+your WSGI app

In that case, apart from the required gevent.monkeypatch (which presumably would not be needed with subinterpreters) your web app code does not explicitly do any stuff with gevent. But it is running in a gevent thread, so your web app code now has a non-obvious requirement to be thread-safe. This has bitten me with subtle bugs in the past.

I would be interested to explore the characteristics of a uWSGI+subinterpreter+WSGI app set up. It should avoid the thread-safety issue. Like gevent threads, the subinterpreters would share a GIL. I guess it might use more memory?


I really enjoy this style of article. Looks like the author based most of this off of mailing-list interactions. There's clearly fascinating information within those threads, despite how difficult they are to follow retroactively.


Absolutely. LWN has quite a lot of those; it's one of the main reasons I have a subscription there.


My 2 cents would be to make the necessary changes in the standard library such that an external library (on PyPI) can enable the feature. That lets us start working with them, but puts no pressure on C extension developers to support it, because subinterpreters "aren't standard" but just yet another PyPI library which might fail in combination with others. Then in a couple of years, if the ecosystem adopts and supports them, it can be moved to standard.


Is this like V8 Isolates for Python?


Oh how I wish Python had just sucked it up at 3.0 and eaten the performance hit for removing the GIL.

By now, everybody would have optimized it back to normal (or better!).


There was no possible way to do that given the things python lets you get away with "atomically". Requiring explicit synchronization to avoid data races would have been an even bigger language break than the other breaking changes in 3. Python is not a language for people who want to do fine-grained explicit concurrency.

And doing it implicitly would require stupidly fine-grained locks on all objects, destroying any performance gains; there's no way to "optimize" that to GIL performance.


There's still plenty of programs where you'd want to spawn a thread that basically doesn't touch any of the data of the parent thread. It'd be nice if there was a way around the GIL for those cases.

As an example, take the multiprocessing.pool.ThreadPool.map function. Most of the use cases of that function only read from the parent thread's memory once, to pass the function arguments. After that, the thread may very well spend a lot of time only reading memory it reserved itself. It's rather wasteful to have that thread wait on the GIL.
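
For example (the work function here is just a stand-in):

    from multiprocessing.pool import ThreadPool

    def work(n):
        # Pure-Python CPU work: under the GIL these threads never actually run
        # in parallel, even though each one only touches data it created itself.
        return sum(i * i for i in range(n))

    with ThreadPool(4) as pool:
        print(pool.map(work, [10_000, 20_000, 30_000, 40_000]))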

Of course I haven't got an obvious solution either but it seems to me that for at least a subset of the uses of threading you could work around the GIL without breaking python too badly.

Perhaps split the GIL into a per-thread Thread-interpreter-lock. That way each object can simply be annotated with which lock it belongs to. That way you still get proper atomicity like you're used to in python, but a thread might also actually run concurrently a lot of the time if it only touches objects it created itself.


> There's still plenty of programs where you'd want to spawn a thread that basically doesn't touch any of the data of the parent thread. It'd be nice if there was a way around the GIL for those cases.

Subinterpreters gets you that (eventually; not in v1 apparently).

> Perhaps split the GIL into a per-thread Thread-interpreter-lock. That way each object can simply be annotated with which lock it belongs to. That way you still get proper atomicity like you're used to in python, but a thread might also actually run concurrently a lot of the time if it only touches objects it created itself.

I don't think this scheme would work especially well (or not better than subinterpreters). Message passing still requires copying, or some way to quiesce other threads and recursively change lock ownership of an object graph. To avoid deadlocks, at least one thread will need to drop its own lock to acquire the thread lock of objects owned by a 2nd thread. I'm not sure you could do that and preserve the legacy global atomicity behavior, and it would by definition be slower and bloatier on any individual thread than the existing GIL behavior.


Does the CPython GIL give you a stronger effective guarantee than the one in CRuby? Because the one in CRuby will absolutely not save you from data races.


There really was no hope of this without ditching reference counting, and consequently a much more destabilizing change to the C API, which was all but entirely unharmed during the 3.x transition.

There have been several attempts to bring new science to bear on solving the refcounting problem, I don't think any of them ever got far enough to be taken seriously

A significant chunk of Python performance is derived from optimizations made possible by the GIL - for example, so long as it is held, no additional locks are necessary to allocate a small object, unless that entails asking the system allocator for more memory


> A significant chunk of Python performance is derived from optimizations made possible by the GIL

Considering that Python has some of the worst performance around, does it really matter? Other languages don't have this problem, and they're orders of magnitude faster.


If performance doesn't matter, why bother attempt such a complex change at all?


The big problem with the 2->3 transition was that the cost exceeded the perceived benefit. Removing the GIL would have increased the cost, but also the benefit.

I think the core team over-estimated the benefits of the change, hence people's reluctance to pay the cost. I suspect that "threading like Go" might have been a benefit more worth paying the cost.


> The big problem with the 2->3 transition was that the cost exceeded the perceived benefit.

I believe it's generally agreed that the cost of the transition was surprisingly high. Hindsight is 20/20.

> Removing the GIL would have increased the cost, but also the benefit.

The only attempts I'm aware of resulted in a significant (~50%) slowdown in single-threaded code execution due to the requirement of adding in so many more locks elsewhere (removing the GIL doesn't remove the need for the locks around the various critical sections). Sure maybe it would result in better performance for some programs, but I'm not even sure it would result in something better than a cleaner design. In my experience, the portions of code that could make use of the threading could just be moved to a C/C++-extension and make use of it there (though in 99% of cases I wouldn't go that far and just stick to e.g. numpy). The examples of the code that really should stay in python, but also should make use of concurrency seem to usually sit quite nicely in the pypy framework.

That's not to say that removing the GIL wouldn't be nice, but I think the need for it is often overstated.


I think it was Greg Stein who made a significant attempt on removing the GIL back in the 1.x timeframe? IIRC, the penalty was much less than 50%, but it was still a significant cost.

I think that as time has passed, the inability to simply utilize more than one core becomes more of a problem. The transition to supporting async serves mostly to highlight its inability to use a thread pool like Go, as an example.

While I agree that C extensions (and especially, the standard extensions, like eg. socket) really help a lot to limit the impact of of the GIL, I think the usual advice to "just move it to a C extension" or "use multiple processes" are kinda ridiculous to anyone who's not used to the situation.

Unfortunately, fixing it will almost certainly require breaking the C API, and we've only just about recovered from the 2->3 breakage: I don't think the language can afford to do it again, perhaps ever.

Which is why I think it's a shame that the last transition didn't actually achieve more ...


Oh how I wish Python 3 was named Monty or something. Kill the GIL! Ruthlessly abandon backwards compatibility in the C API! It's a new language, you can do anything!

Folks who delayed adoption until this year had a long and glorious period of stability, and are now waking up to a shitshow of incompatibility and half-adopted features.


> a shitshow of incompatibility and half-adopted features.

To the extent that this isn’t wrong, it’s because they’re years behind on testing and upgrading from unsupported libraries. The switch isn’t hard on a relatively clean project.


I did preface that I was talking about people who delayed adoption until the very last minute. The stability was seductive, and IMO the forced transition was long overdue. In hindsight, I wish that had happened sometime around Py36 -- but at this point, I'd rather a new language unfettered by the old C API.

But what I said about half-adopted features is due to my experience with rather large bodies of code -- they're not "relatively clean" in any sense, and I often find myself doing long-overdue modernization that hinders my productivity


Again, is the problem the fraction of a Python 3 upgrade which can’t be done automatically or needing to update old third-party libraries or not having sufficient test coverage to be able to make any changes with confidence?


I'm not sure what the point of your questions is. Breaking backwards-compatibility makes headaches for maintainers for all of the reasons you list and more. I'd love to see Monty take an even more radical approach, with a "thin enough" compatibility layer. It'd be a great language with all sorts of improvements and mostly-familiar syntax, and old, stable Python libraries would stay stable -- even through a compatibility layer.


There are plenty of things that could have been done, but the Python core dev team is small and very resource limited.


Why does Python move at such a glacially slow pace? The language itself is unfortunate, the implementation even worse. Why can none of these problems be fixed?

Though, maybe it doesn't matter. Julia is better in every way and supports easy Python interop. It's already a questionable decision to start a new Python project today and it probably will only become more questionable in the future.

The language no longer serves any niche nor has any purpose. Let's just all move on.


Within the past few years python got a lot of typing improvements, nanosecond precision time, async/await, syntax that is very similar to ES6, lots of speedups, string templating and assignment expressions. How is that slow?


Read the other comments: people say it moves too fast.

People are never happy.


And beazley wouldn't care, right?



