Why Python 4.0 won't be like Python 3.0



Newcomers to python-ideas occasionally make reference to the idea of "Python 4000" when proposing backwards incompatible changes that don't offer a clear migration path from currently legal Python 3 code. After all, we allowed that kind of change for Python 3.0, so why wouldn't we allow it for Python 4.0?

I've heard that question enough times now (including the more concerned phrasing "You made a big backwards compatibility break once, how do I know you won't do it again?") that I figured I'd record my answer here, so I'd be able to refer people back to it in the future.

What are the current expectations for Python 4.0?

My current expectation is that Python 4.0 will merely be "the release that comes after Python 3.9". That's it. No profound changes to the language, no major backwards compatibility breaks - going from Python 3.9 to 4.0 should be as uneventful as going from Python 3.3 to 3.4 (or from 2.6 to 2.7). I even expect the stable Application Binary Interface (as first defined in PEP 384) to be preserved across the boundary.

At the current rate of language feature releases (roughly every 18 months), that means we would likely see Python 4.0 some time in 2023, rather than seeing Python 3.10.
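The 2023 estimate is simple arithmetic. The following is a minimal sketch, assuming a steady 18-month cadence measured from Python 3.4's release in March 2014. Real release dates drift, so treat the output as a projection only.

```python
# Assumption: a steady 18-month cadence measured from Python 3.4 (March 2014).
CADENCE_MONTHS = 18
START_YEAR, START_MONTH = 2014, 3

RELEASES = ["3.4", "3.5", "3.6", "3.7", "3.8", "3.9", "4.0"]


def projected_release(steps: int) -> tuple[int, int]:
    """Return (year, month) for the release `steps` cadences after Python 3.4."""
    # Count in whole months to avoid fractional-year rounding
    total = START_YEAR * 12 + (START_MONTH - 1) + steps * CADENCE_MONTHS
    return total // 12, total % 12 + 1


for steps, version in enumerate(RELEASES):
    year, month = projected_release(steps)
    print(f"Python {version}: ~{year}-{month:02d}")
```

Under that assumption, the release after 3.9 lands around March 2023. The 3.5 projection (September 2015) happens to match the actual 3.5.0 release date.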

So how will Python continue to evolve?

First and foremost, nothing has changed about the Python Enhancement Proposal process - backwards compatible changes are still proposed all the time, with new modules (like asyncio) and language features (like yield from) being added to enhance the capabilities available to Python applications. As time goes by, Python 3 will continue to pull further ahead of Python 2 in terms of the capabilities it offers by default, even if Python 2 users have access to equivalent capabilities through third-party modules or backports from Python 3.

Competing interpreter implementations and extensions will also continue to explore different ways of enhancing Python, including PyPy's exploration of JIT-compiler generation and software transactional memory, and the scientific and data analysis community's exploration of array oriented programming that takes full advantage of the vectorisation capabilities offered by modern CPUs and GPUs. Integration with other virtual machine runtimes (like the JVM and CLR) is also expected to improve with time, especially as the inroads Python is making in the education sector are likely to make it ever more popular as an embedded scripting language in larger applications running in those environments.

For backwards incompatible changes, PEP 387 provides a reasonable overview of the approach that was used for years in the Python 2 series, and still applies today: if a feature is identified as being excessively problematic, then it may be deprecated and eventually removed.
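In code, PEP 387's "deprecate, then eventually remove" cycle typically shows up as a DeprecationWarning emitted by the old API for some period before it is deleted. The following is a minimal sketch. `old_parse` and `parse` are hypothetical names, not part of any real library.

```python
import warnings


def parse(text: str) -> list[str]:
    """The preferred replacement API (hypothetical)."""
    return text.split(",")


def old_parse(text: str) -> list[str]:
    """Deprecated alias (hypothetical), scheduled for eventual removal."""
    # stacklevel=2 attributes the warning to the caller's line, not this one
    warnings.warn(
        "old_parse() is deprecated; use parse() instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return parse(text)


if __name__ == "__main__":
    # DeprecationWarning is filtered out in many contexts by default,
    # so enable it explicitly to make sure it is displayed
    warnings.simplefilter("always", DeprecationWarning)
    print(old_parse("a,b,c"))
```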

However, a number of other changes have been made to the development and release process that make it less likely that such deprecations will be needed within the Python 3 series:

  • the greater emphasis on the Python Package Index, as indicated by the collaboration between the CPython core development team and the Python Packaging Authority, as well as the bundling of the pip installer with Python 3.4+, reduces the pressure to add modules to the standard library before they're sufficiently stable to accommodate the relatively slow language update cycle
  • the "provisional API" concept (introduced in PEP 411) makes it possible to apply a "settling in" period to libraries and APIs that are judged likely to benefit from broader feedback before offering the standard backwards compatibility guarantees
  • a lot of accumulated legacy behaviour really was cleared out in the Python 3 transition, and the requirements for new additions to Python and the standard library are much stricter now than they were in the Python 1.x and Python 2.x days
  • the widespread development of "single source" Python 2/3 libraries and frameworks strongly encourages the use of "documented deprecation" in Python 3, even when features are replaced with newer, preferred alternatives. In these cases, a deprecation notice is placed in the documentation, suggesting the approach that is preferred for new code, but no programmatic deprecation warning is added. This allows existing code, including code supporting both Python 2 and Python 3, to be left unchanged (at the expense of new users potentially having slightly more to learn when tasked with maintaining existing code bases).
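By contrast, under "documented deprecation" the old spelling keeps working silently, and only its documentation points new code elsewhere. The following is a minimal sketch with hypothetical function names. It confirms that no warning is raised, even with all warnings enabled.

```python
import warnings


def fetch_all(items):
    """Preferred API for new code (hypothetical)."""
    return list(items)


def get_items(items):
    """Return the items as a list.

    Deprecated (documentation only): prefer fetch_all() in new code.
    No DeprecationWarning is emitted, so single-source 2/3 code that
    still calls this name keeps running cleanly.
    """
    return fetch_all(items)


# Even with every warning enabled, calling the old name stays silent
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    get_items((1, 2, 3))
print(len(caught))  # 0
```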

From (mostly) English to all written languages

It's also worth noting that Python 3 wasn't expected to be as disruptive as it turned out to be. Of all the backwards incompatible changes in Python 3, many of the serious barriers to migration can be laid at the feet of one little bullet point in PEP 3100:

  • Make all strings be Unicode, and have a separate bytes() type. The new string type will be called 'str'.

PEP 3100 was the home for Python 3 changes that were considered sufficiently non-controversial that no separate PEP was considered necessary. This particular change was considered non-controversial because our experience with Python 2 had shown that the authors of web and GUI frameworks were right: dealing sensibly with Unicode as an application developer means ensuring all text data is converted from binary as close to the system boundary as possible, manipulated as text, and then converted back to binary for output purposes.
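The "convert at the boundary" model described above (often called the "Unicode sandwich") can be sketched in a few lines. For illustration only, this example assumes UTF-8 on both sides. Real code should take the encoding from the protocol, file format or configuration.

```python
def process(raw: bytes, encoding: str = "utf-8") -> bytes:
    # 1. Binary -> text at the input boundary
    text = raw.decode(encoding)
    # 2. All application logic works on str
    shouted = text.upper()
    # 3. Text -> binary at the output boundary
    return shouted.encode(encoding)


incoming = "naïve café".encode("utf-8")  # bytes as they might arrive from a socket
print(process(incoming))  # b'NA\xc3\x8fVE CAF\xc3\x89'
```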

Unfortunately, Python 2 doesn't encourage developers to write programs that way - it blurs the boundaries between binary data and text extensively, and makes it difficult for developers to keep the two separate in their heads, let alone in their code. So web and GUI framework authors have to tell their Python 2 users "always use Unicode text. If you don't, you may suffer from obscure and hard to track down bugs when dealing with Unicode input".

Python 3 is different: it imposes a much greater separation between the "binary domain" and the "text domain", making it easier to write normal application code, while making it a bit harder to write code that works with system boundaries where the distinction between binary and text data can be substantially less clear. I've written in more detail elsewhere regarding what actually changed in the text model between Python 2 and Python 3.
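One concrete consequence of the stricter separation is that Python 3 refuses to combine bytes and str implicitly. Python 2 would instead silently coerce them using the default ASCII codec, and fail only once non-ASCII data showed up. The following is a small sketch:

```python
header = b"Content-Type: "
value = "text/plain"

try:
    header + value  # mixing binary and text
except TypeError as exc:
    print("TypeError:", exc)

# The conversion has to be spelled out, in one direction or the other
print(header + value.encode("ascii"))  # b'Content-Type: text/plain'
print(header.decode("ascii") + value)  # Content-Type: text/plain
```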

This revolution in Python's Unicode support is taking place against a larger background migration of computational text manipulation from the English-only ASCII (officially defined in 1963), through the complexity of the "binary data + encoding declaration" model (including the C/POSIX locale and Windows code page systems introduced in the late 1980s) and the initial 16-bit only version of the Unicode standard (released in 1991) to the relatively comprehensive modern Unicode code point system (first defined in 1996, with new major updates released every few years).
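Some of that history is visible directly from Python 3. The same text produces different bytes under different encodings, or cannot be encoded at all. Characters beyond the original 16-bit range also need a surrogate pair in UTF-16. The following is a short illustration:

```python
text = "café"

for codec in ("ascii", "latin-1", "cp1252", "utf-8", "utf-16-le"):
    try:
        print(f"{codec:>9}: {text.encode(codec)!r}")
    except UnicodeEncodeError as exc:
        print(f"{codec:>9}: cannot encode ({exc.reason})")

# U+1F600 lies beyond the 65,536 code points of the original 16-bit design
emoji = "\U0001F600"
print(hex(ord(emoji)))                 # 0x1f600
print(len(emoji.encode("utf-16-le")))  # 4 bytes: a surrogate pair
print(len(emoji.encode("utf-8")))      # 4
```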

Why mention this point? Because this switch to "Unicode by default" is the most disruptive of the backwards incompatible changes in Python 3, and, unlike the others (which were more language specific), it is one small part of a much larger industry-wide change in how text data is represented and manipulated. The language-specific issues were cleared out by the Python 3 transition, the barrier to entry for new language features is much higher than it was in the early days of Python, and no other industry-wide migration on the scale of the switch from "binary data with an encoding" to Unicode for text modelling is currently in progress. So I can't see any kind of change coming up that would require a Python 3 style backwards compatibility break and parallel support period. Instead, I expect we'll be able to accommodate any future language evolution within the normal change management processes, and any proposal that can't be handled that way will just get rejected as imposing an unacceptably high cost on the community and the core development team.

Originally published on Curious Efficiency and also available at the Red Hat Developer Blog. Republished under Creative Commons. For more on the evolution of Python, you might be interested in The transition to multilingual programming with Python, also by Nick Coghlan.

Nick is a CPython core developer and a member of the Board of Directors for the Python Software Foundation.


'My current expectation is that Python 4.0 will merely be "the release that comes after Python 3.9"'

That's not a decimal point. What comes after 3.9 is 3.10.
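The numbering question has a practical side. Version checks should compare tuples of integers rather than strings, and checks that pin the major version to 3 would misfire on a release numbered 4.0. The following is a small sketch; the helper names are hypothetical.

```python
import sys


def fragile_py3_check(version_info=sys.version_info) -> bool:
    # Fragile: pins the major version, so it would be False on a Python 4.0
    return version_info[0] == 3


def at_least(version_info=sys.version_info, minimum=(3, 5)) -> bool:
    # Robust: tuples of ints compare element-wise and numerically
    return tuple(version_info[:2]) >= minimum


print("3.10" > "3.9")    # False: strings compare character by character
print((3, 10) > (3, 9))  # True
print(fragile_py3_check((4, 0)), at_least((4, 0)))  # False True
```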

Yeah, but Nick is one of the core developers of Python, so if they say "after Python 3.9 comes 4.0" then 4.0 it is :)

In reply to by WorMzy (not verified)

...and this is the attitude that makes Python 3 one of the worst-adopted language updates since Perl 6. Why don't you just roll all the good features of Python 3 into Python 2.7 and call it Python 4 (or Python 2.8)? Then the vast majority of code bases would get compatibility with the newest Python version along with all of the newest features, and we could just pretend Py3K never happened.

In reply to by Ricardo J. Barberis (not verified)

Sorry, but that is something only an uninformed person would say. "Why don't you..." - seriously, are you a software developer? :/ If they had done so, Python wouldn't be as neat as it is; it'd be a pile of garbage, literally.

In reply to by dhj (not verified)

Ugh, that's a horrible semantic versioning violation. I hope he was joking.

If Python means never to make backwards-incompatible changes, then there should never be a Python 4.0.

In reply to by Ricardo J. Barberis (not verified)

"embedded scripting language"

Sorry, Lua already owns that market. Python has absolutely no chance, as the VM is bloated and very poorly suited to embedding.

This is why I think Python 4 should involve a complete performance rewrite of the interpreter. Embrace the extensive use of Python in numerical computing and expand its utility to areas such as game development. Look to PyPy and Cython for inspiration. Both could be integrated into core, but it probably would be easier to start from scratch.

In reply to by Clive (not verified)

Lua is an embedded scripting language; it can't function on its own.

In reply to by Clive (not verified)

Sorry for the delayed reply, but you're right that there are some serious technical hurdles to embedding CPython in its current form. Folks will do that work if they consider the end result sufficiently valuable (e.g. Blender), but it's complex and finicky enough that it isn't going to be people's first choice, especially when the primary focus is on scripting the components of the embedding application, and you don't want or need access to things like the Scientific Python computational stack for matrix manipulation and machine learning algorithms.

PEP 432 (https://www.python.org/dev/peps/pep-0432/) is a currently unfunded proposal to change that situation by redesigning the way the CPython runtime gets configured. We're late enough in the 3.5 development cycle now that it's unlikely to be implemented this year, so I'll likely take another look at moving that forward in the Python 3.6 time frame.

This is an area that becomes potentially even more interesting as changes in digital education policies begin to have an impact, and we start to see high school students (at least in Australia and the UK, but potentially in other regions as well) graduating having already been exposed to Python as their introduction to text based programming languages.

In reply to by Clive (not verified)

Python 4 should become the Java and C# killer if it wants to keep its title. What we are seeing at the moment with Python is that it keeps jumping between interim versions; for example, the Linux user base still has not even embraced 3.0. If it wants to be king of the hill, it needs to listen to its user base.

Exactly. On the Linux front, a number of us have been working with Linux vendors and major projects like OpenStack on their porting issues, and tackling those wherever it makes the most sense to do so (whether that's adding the "--py3k" mode to pylint, working to get Ubuntu and Fedora to use Python 3 as the primary system Python, or pushing for changes like making the C.UTF-8 locale a standard feature of upstream glibc). (Red Hat and Canonical actually each employ a number of CPython core developers, as do other major Python-on-Linux consumers like HP, Google and Rackspace.)

As various technical barriers to porting are identified, we've either looked at ways to enhance Python 3 itself to address them (such as restoring full support for the binary transform codecs and reinstating binary interpolation support), or else looked at how to make porting tools like six, modernize and future easier to access (this is a key reason that the pip package installer now ships by default with both Python 2 and Python 3).
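Both of the restorations mentioned here can be tried directly in current Python 3 releases. The bytes-to-bytes transform codecs are reachable through the codecs module, and %-interpolation works on bytes again (PEP 461, Python 3.5+). The following is a brief sketch:

```python
import codecs

payload = b"hello"

# Binary transform codecs: bytes in, bytes out, via codecs.encode/decode
print(codecs.encode(payload, "hex"))     # b'68656c6c6f'
print(codecs.encode(payload, "base64"))  # b'aGVsbG8=\n'
round_trip = codecs.decode(codecs.encode(payload, "zlib"), "zlib")
print(round_trip == payload)             # True

# Binary interpolation (PEP 461, Python 3.5+): %-formatting on bytes
line = b"%s: %d\r\n" % (b"Content-Length", len(payload))
print(line)                              # b'Content-Length: 5\r\n'
```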

On the user front, the primary and secondary education sector have by and large switched to Python 3 already, and the inclusion of a dedicated matrix multiplication operator in Python 3.5 was specifically in response to feedback (and contributions) from the analytical Python community.
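The operator in question is `@` (PEP 465, Python 3.5+), which Python dispatches to a `__matmul__` method. NumPy arrays implement it. The sketch below instead uses a hypothetical, deliberately minimal pure-Python `Matrix` class, so it runs without third-party packages.

```python
class Matrix:
    """A tiny illustrative matrix type (hypothetical); rows stored as lists."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def __matmul__(self, other):
        # (n x m) @ (m x p) -> (n x p)
        if len(self.rows[0]) != len(other.rows):
            raise ValueError("inner dimensions do not match")
        cols = list(zip(*other.rows))
        return Matrix(
            [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in self.rows]
        )

    def __eq__(self, other):
        return isinstance(other, Matrix) and self.rows == other.rows

    def __repr__(self):
        return f"Matrix({self.rows})"


a = Matrix([[1, 2], [3, 4]])
b = Matrix([[5, 6], [7, 8]])
print(a @ b)  # Matrix([[19, 22], [43, 50]])
```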

Aside from possible interest in the matrix multiplication operator for cleaner expression of matrix and vector operations, there unfortunately isn't a great Python 3 carrot for organisations like Autodesk as yet (which is likely to be the tipping point for those parts of the 3D animation community that don't use Blender). However, as noted in my reply to Clive, I'd like to tackle the current complexity of embedding at some point, which would then become attractive for anyone who currently needs to resort to complex hacks to get Python 2 to behave the way they would like it to.

As far as challenging the JVM and CLR goes, I agree that's a very interesting space, and I actually believe it's the PyPy community that are best placed to do it. Not only does PyPy embody an innovative approach to designing JIT compilers, but Armin Rigo's research into practical applications of Software Transactional Memory is genuinely groundbreaking, and offers a realistic prospect of being able to program PyPy hosted applications in an event driven model while still making use of all available cores, and without halting the entire application if you happen to make a blocking call from an event handler.

In reply to by Zed (not verified)

It is awesome what Python has done, and I agree with what you are saying. I hope Ubuntu, Red Hat, etc. will soon accept Python 3 as their standard Python version, as it has been steadily pulling ahead of the Python 2.7 feature set.

I am also watching, in rapt fascination, what the PyPy community has done on so many fronts. I can see that, sometime in the future, PyPy could make having separate Python implementations for different back ends (CPython, Jython, IronPython) a thing of the past. If they can match Jython's and IronPython's integration with the JVM and CLR APIs, get as close as possible to compatibility with CPython code (without sacrificing their amazing work on various fronts), and integrate well with C/C++ extensions, I can see PyPy becoming the standard Python implementation. However, they are still a ways off from all of that.

Python's push into the scientific/secondary education sector has also been nothing short of astounding. An old friend of mine has been using Python for his doctoral work (after some prodding by me, and after realizing that it is less expensive and richer than MATLAB and other tools), and has also gotten his research lab to adopt it as the de facto standard for their physics research.

In reply to by ncoghlan

Python is for algorithms and script-like stuff, Java/C# is for binary development and standalone apps, and Lua is for embedding into a Java/C# app. And C++ is for practically everything. So why would Python go embedded (aside from Python-Fu in GIMP, which already has)?

This work is licensed under a CC0 1.0 Universal (CC0 1.0) Public Domain Dedication license.