How I patched Python to include this great Ruby feature

What I learned from adding "else-less" functionality to Python, as inspired by Ruby.

Ruby, unlike Python, makes lots of things implicit, and there's a special kind of if expression that demonstrates this well. It's often referred to as an "inline-if" or "conditional modifier", and this special syntax is able to return one value when a condition is true, but another value (nil, specifically) when a condition is false. Here's an example:

$ irb
irb(main):> RUBY_VERSION
=> "2.7.1"
irb(main):> a = 42 if true
=> 42
irb(main):> b = 21 if false
=> nil
irb(main):> b
=> nil
irb(main):> a
=> 42

In Python, you can't do that without explicitly adding an else clause to the expression. In fact, as of this PR, the interpreter tells you immediately that else is mandatory:

$ python
Python 3.11.0a0
>>> a = 42 if True
  File "<stdin>", line 1
    a = 42 if True
    ^^^^^^^^^^
SyntaxError: expected 'else' after 'if' expression
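The same check can be reproduced from a script on an unmodified interpreter. Here's a minimal sketch that uses the built-in compile() to see whether a snippet parses. The helper name compiles is mine, purely for illustration:

```python
def compiles(source: str) -> bool:
    """Return True if the source parses and compiles, False on SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


# With an explicit else, stock Python is happy.
print(compiles("a = 42 if True else None"))

# Without it, an unpatched interpreter rejects the code at parse time.
print(compiles("a = 42 if True"))
```

On a stock Python the first call should succeed and the second should fail. On my patched build both would succeed.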

However, I find Ruby's if actually very convenient.

This convenience became more evident when I had to go back to Python and write things like this:

>>> my_var = 42 if some_cond else None

So I thought to myself, what would it be like if Python had a similar feature? Could I do it myself? How hard would that be?

Looking into Python's source code

Digging into CPython's code and changing the language's syntax didn't sound trivial to me. Luckily, during the same week, I found out on Twitter that Anthony Shaw had just written a book on CPython Internals and that it was available as a pre-release. I didn't think twice and bought the book. I've got to be honest, I'm the kind of person who buys things and doesn't use them immediately. As I had other plans in mind, I let it "gather dust" in my home folder until I had to work with that Ruby service again. That reminded me of the CPython Internals book and of how challenging hacking the guts of Python would be.

The first thing was to go through the book from the very start and try to follow each step. The book focuses on Python 3.9, so in order to follow it, one needs to check out the 3.9 tag, and that's what I did. I learned about how the code is structured and then how to compile it. The next chapters show how to extend the grammar and add new things, such as a new operator.

As I got familiar with the code base and how to tweak the grammar, I decided to give it a spin and make my own changes to it.

The first (failed) attempt

As I started finding my way around CPython's code from the latest main branch, I noticed that lots of things had changed since Python 3.9, yet some fundamental concepts didn't.

My first attempt was to dig into the grammar definition and find the if expression rule. The file is currently named Grammar/python.gram. Locating it was not difficult. An ordinary CTRL+F for the else keyword was enough.

file: Grammar/python.gram
...
expression[expr_ty] (memo):
   | invalid_expression
   | a=disjunction 'if' b=disjunction 'else' c=expression { _PyAST_IfExp(b, a, c, EXTRA) }
   | disjunction
   | lambdef
....

Now with the rule in hand, my idea was to add one more option to the current if expression where it would match a=disjunction 'if' b=disjunction and the c expression would be NULL.

This new rule should be placed immediately after the complete one. Otherwise, the parser would always match a=disjunction 'if' b=disjunction first and then raise a SyntaxError on the leftover else.

...
expression[expr_ty] (memo):
   | invalid_expression
   | a=disjunction 'if' b=disjunction 'else' c=expression { _PyAST_IfExp(b, a, c, EXTRA) }
   | a=disjunction 'if' b=disjunction { _PyAST_IfExp(b, a, NULL, EXTRA) }
   | disjunction
   | lambdef
....
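To see why the order matters, here's a toy model of PEG ordered choice in Python. It is a deliberately simplified sketch, not how CPython's generated parser works: patterns are flat token lists, None stands for "any single token", and the names IF_ELSE, ELSE_LESS, matches, and parse are made up for this illustration.

```python
# Toy alternatives: None is a placeholder that matches any single token.
IF_ELSE = ("if-else", [None, "if", None, "else", None])
ELSE_LESS = ("else-less", [None, "if", None])


def matches(pattern, tokens):
    """Return True if the pattern matches a prefix of the token list."""
    if len(tokens) < len(pattern):
        return False
    return all(p is None or p == t for p, t in zip(pattern, tokens))


def parse(tokens, alternatives):
    """Try alternatives in order; the first one that matches wins."""
    for name, pattern in alternatives:
        if matches(pattern, tokens):
            leftover = tokens[len(pattern):]
            if leftover:
                # The winning alternative did not consume the whole input.
                raise SyntaxError(f"unexpected token {leftover[0]!r}")
            return name
    raise SyntaxError("no alternative matched")


full = "a if b else c".split()
short = "a if b".split()

print(parse(full, [IF_ELSE, ELSE_LESS]))   # if-else
print(parse(short, [IF_ELSE, ELSE_LESS]))  # else-less

try:
    parse(full, [ELSE_LESS, IF_ELSE])
except SyntaxError as exc:
    print("wrong order:", exc)  # wrong order: unexpected token 'else'
```

With the complete rule first, both inputs are recognized. With the else-less rule first, the shorter alternative wins too early and the trailing else has nowhere to go.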

Regenerating the parser and compiling Python from source

CPython comes with a Makefile containing lots of useful targets. One of them is regen-pegen, which converts Grammar/python.gram into Parser/parser.c.

Besides changing the grammar, I had to modify the AST for the if expression. AST stands for Abstract Syntax Tree, and it is a way of representing the syntactic structure of the grammar as a tree. For more information about ASTs, I highly recommend the Crafting Interpreters book by Robert Nystrom.

Moving on, if you observe the rule for the if expression, it goes like this:

   | a=disjunction 'if' b=disjunction 'else' c=expression { _PyAST_IfExp(b, a, c, EXTRA) }

This means that when the parser matches this rule, it calls _PyAST_IfExp, which gives back an expr_ty data structure. So this gave me a clue that to implement the new rule's behavior, I'd need to change _PyAST_IfExp.
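The argument order in the rule (b, a, c) can look odd at first. A quick way to see what it maps to is the ast module on a stock interpreter. This is just a small sketch; the variable names tree and node are mine:

```python
import ast

# Parse a complete conditional expression in "eval" mode so that
# tree.body is the expression node itself.
tree = ast.parse("a if b else c", mode="eval")
node = tree.body

print(type(node).__name__)                          # IfExp
print(node.test.id, node.body.id, node.orelse.id)   # b a c
```

The node stores the condition as test, the value for the true case as body, and the fallback as orelse, which is the same order _PyAST_IfExp takes its first three arguments in.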

To find where it is located, I used my rip-grep skills and searched for it inside the source root:

$ rg _PyAST_IfExp -C2 .

[OMITTED]
Python/Python-ast.c
2686-
2687-expr_ty
2688:_PyAST_IfExp(expr_ty test, expr_ty body, expr_ty orelse, int lineno, int
2689- col_offset, int end_lineno, int end_col_offset, PyArena *arena)
2690-{
[OMITTED]

The implementation goes like this:

expr_ty
_PyAST_IfExp(expr_ty test, expr_ty body, expr_ty orelse, int lineno, int
             col_offset, int end_lineno, int end_col_offset, PyArena *arena)
{
   expr_ty p;
   if (!test) {
        PyErr_SetString(PyExc_ValueError,
                        "field 'test' is required for IfExp");
       return NULL;
   }
   if (!body) {
        PyErr_SetString(PyExc_ValueError,
                        "field 'body' is required for IfExp");
        return NULL;
   }
   if (!orelse) {
        PyErr_SetString(PyExc_ValueError,
                        "field 'orelse' is required for IfExp");
        return NULL;
   }
   p = (expr_ty)_PyArena_Malloc(arena, sizeof(*p));
   if (!p)
        return NULL;
   p->kind = IfExp_kind;
   p->v.IfExp.test = test;
   p->v.IfExp.body = body;
   p->v.IfExp.orelse = orelse;
   p->lineno = lineno;
   p->col_offset = col_offset;
   p->end_lineno = end_lineno;
   p->end_col_offset = end_col_offset;
   return p;
}

Since I passed orelse as NULL, I thought it was just a matter of changing the body of if (!orelse) to assign None to orelse. It would look like this:

   if (!orelse) {
-        PyErr_SetString(PyExc_ValueError,
-                        "field 'orelse' is required for IfExp");
-        return NULL;
+        orelse = Py_None;
   }

Now it was time to test it. I compiled the code with make -j8 -s and fired up the interpreter:

$ make -j8 -s

Python/Python-ast.c: In function ‘_PyAST_IfExp’:
Python/Python-ast.c:2703:16: warning: assignment from incompatible pointer type [-Wincompatible-pointer-types]
         orelse = Py_None;

Despite the glaringly obvious warning, I decided to ignore it just to see what would happen.

$ ./python
Python 3.11.0a0 (heads/ruby-if-new-dirty:f92b9133ef, Aug 2 2021, 09:13:02) [GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> a = 42 if True
>>> a
42
>>> b = 21 if False
[1] 16805 segmentation fault (core dumped) ./python

Ouch! It worked for the if True case, but assigning Py_None to expr_ty orelse caused a segfault.

Time to go back to see what went wrong.

The second attempt

It wasn't too difficult to figure out where I messed up. orelse is an expr_ty, and I assigned to it a Py_None which is a PyObject *. Again, thanks to rip-grep, I found its definition:

$ rg constant -tc -C2

Include/internal/pycore_asdl.h
14-typedef PyObject * string;
15-typedef PyObject * object;
16:typedef PyObject * constant;

Now, how did I find out Py_None was a constant?

While reviewing the Grammar/python.gram file, I found that one of the rules for the new pattern matching syntax is defined like this:

# Literal patterns are used for equality and identity constraints
literal_pattern[pattern_ty]:
   | value=signed_number !('+' | '-') { _PyAST_MatchValue(value, EXTRA) }
   | value=complex_number { _PyAST_MatchValue(value, EXTRA) }
   | value=strings { _PyAST_MatchValue(value, EXTRA) }
   | 'None' { _PyAST_MatchSingleton(Py_None, EXTRA) }

However, this rule is a pattern_ty, not an expr_ty. But that's fine. What really matters is to understand what _PyAST_MatchSingleton actually is. Then, I searched for it in Python/Python-ast.c:

file: Python/Python-ast.c
...
pattern_ty
_PyAST_MatchSingleton(constant value, int lineno, int col_offset, int
                        end_lineno, int end_col_offset, PyArena *arena)
...

I looked for the definition of a None node in the grammar. To my great relief, I found it!

atom[expr_ty]:
   | NAME
   | 'True' { _PyAST_Constant(Py_True, NULL, EXTRA) }
   | 'False' { _PyAST_Constant(Py_False, NULL, EXTRA) }
   | 'None' { _PyAST_Constant(Py_None, NULL, EXTRA) }
....

At this point, I had all the information I needed. To return an expr_ty representing None, I needed to create a node in the AST which is constant by using the _PyAST_Constant function.
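The same idea can be tried out from Python without touching the grammar. The sketch below builds an IfExp by hand with a Constant(None) in the orelse slot, which is roughly what the new grammar action asks the C helpers to do. The names expr and code, and the cond variable, are only for this example:

```python
import ast

# The literal None parses to a Constant node, just like in the atom rule.
none_node = ast.parse("None", mode="eval").body
print(type(none_node).__name__, none_node.value)  # Constant None

# Build "21 if cond else None" by hand, with an explicit Constant for orelse.
expr = ast.Expression(
    body=ast.IfExp(
        test=ast.Name(id="cond", ctx=ast.Load()),
        body=ast.Constant(value=21),
        orelse=ast.Constant(value=None),
    )
)
ast.fix_missing_locations(expr)  # hand-built nodes need line/column info
code = compile(expr, "<ast>", "eval")

print(eval(code, {"cond": True}))   # 21
print(eval(code, {"cond": False}))  # None
```

Because orelse is a real expression node rather than a raw PyObject, the compiler can process it like any other constant.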

   | a=disjunction 'if' b=disjunction 'else' c=expression { _PyAST_IfExp(b, a, c, EXTRA) }
-  | a=disjunction 'if' b=disjunction { _PyAST_IfExp(b, a, NULL, EXTRA) }
+  | a=disjunction 'if' b=disjunction { _PyAST_IfExp(b, a, _PyAST_Constant(Py_None, NULL, EXTRA), EXTRA) }
   | disjunction

Next, I had to revert Python/Python-ast.c as well. Since I was now feeding it a valid expr_ty, orelse would never be NULL.

file: Python/Python-ast.c
...
   if (!orelse) {
-        orelse = Py_None;
+        PyErr_SetString(PyExc_ValueError,
+                        "field 'orelse' is required for IfExp");
+        return NULL;
   }
...

I compiled it again:

$ make -j8 -s && ./python
Python 3.11.0a0 (heads/ruby-if-new-dirty:25c439ebef, Aug 2 2021, 09:25:18) [GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> c = 42 if True
>>> c
42
>>> b = 21 if False
>>> type(b)
<class 'NoneType'>
>>>

It works!

Now, I needed to do one more test. Ruby functions allow returning a value if a condition matches, and if not, the rest of the function body gets executed. Like this:

> irb
irb(main):> def f(test)
irb(main):>   return 42 if test
irb(main):>   puts 'missed return'
irb(main):>   return 21
irb(main):> end
=> :f
irb(main):> f(false)
missed return
=> 21
irb(main):> f(true)
=> 42

At this point, I wondered if that would work with my modified Python. I rushed to the interpreter again and wrote the same function:

>>> def f(test):
...     return 42 if test
...     print('missed return')
...     return 21
...
>>> f(False)
>>> f(True)
42
>>>

The function returns None if test is False... To help me debug this, I summoned the ast module. The official docs define it like so:

The ast module helps Python applications to process trees of the Python abstract syntax grammar. The abstract syntax itself might change with each Python release; this module helps to find out programmatically what the current grammar looks like.

I printed the AST for this function:

>>> import ast
>>> fc = '''
... def f(test):
...     return 42 if test
...     print('missed return')
...     return 21
... '''
>>> print(ast.dump(ast.parse(fc), indent=4))
Module(
    body=[
        FunctionDef(
            name='f',
            args=arguments(
                posonlyargs=[],
                args=[
                    arg(arg='test')],
                kwonlyargs=[],
                kw_defaults=[],
                defaults=[]),
            body=[
                Return(
                    value=IfExp(
                        test=Name(id='test', ctx=Load()),
                        body=Constant(value=42),
                        orelse=Constant(value=None))),
                Expr(
                    value=Call(
                        func=Name(id='print', ctx=Load()),
                        args=[
                            Constant(value='missed return')],
                        keywords=[])),
                Return(
                    value=Constant(value=21))],
            decorator_list=[])],
    type_ignores=[])

Now things made more sense. My change to the grammar was just "syntax sugar". It turns an expression like this: a if b into this: a if b else None. The problem here is that Python returns no matter what, so the rest of the function is ignored.
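Written out with the explicit else, the behavior is easy to reproduce on an unmodified interpreter. This is the desugared equivalent of what my patched parser produced at this stage:

```python
def f(test):
    # The conditional expression is evaluated first, then returned,
    # whatever its value is.
    return 42 if test else None
    print("missed return")  # never reached
    return 21


print(f(True))   # 42
print(f(False))  # None
```

The return statement always executes; the condition only chooses which value it returns.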

You can look at the bytecode generated to understand what exactly is executed by the interpreter. And for that, you can use the dis module. According to the docs:

The dis module supports the analysis of CPython bytecode by disassembling it.

>>> import dis
>>> dis.dis(f)
  2 0 LOAD_FAST 0 (test)
              2 POP_JUMP_IF_FALSE 4 (to 8)
              4 LOAD_CONST 1 (42)
              6 RETURN_VALUE
        >> 8 LOAD_CONST 0 (None)
            10 RETURN_VALUE

What this basically means is that in case the test is false, the execution jumps to 8, which loads the None into the top of the stack and returns it.
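If you want to inspect the instructions programmatically rather than read the printed listing, dis.get_instructions yields one object per instruction. The sketch below uses the desugared function on a stock interpreter. Opcode names and offsets differ between Python versions, so I only look at the instruction names and make no claim about the exact listing:

```python
import dis


def f(test):
    return 42 if test else None


# Print offset, opcode name, and a readable argument for each instruction.
for instruction in dis.get_instructions(f):
    print(instruction.offset, instruction.opname, instruction.argrepr)

# Collect the instructions that leave the function.
return_ops = [
    instruction.opname
    for instruction in dis.get_instructions(f)
    if instruction.opname.startswith("RETURN")
]
print(return_ops)
```

On the versions I'm aware of, every path through the function ends in a return-style opcode, which matches the listing above.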

Supporting "return-if"

To support the same Ruby feature, I needed to turn the expression return 42 if test into a regular if statement that returns if test is true.

To do that, I needed to add one more rule. This time, it would be a rule that matches the return <value> if <test> piece of code. Not only that, I needed a _PyAST_ function that creates the node for me. I called it _PyAST_ReturnIfExpr:

file: Grammar/python.gram

return_stmt[stmt_ty]:
+ | 'return' a=star_expressions 'if' b=disjunction { _PyAST_ReturnIfExpr(a, b, EXTRA) }
   | 'return' a=[star_expressions] { _PyAST_Return(a, EXTRA) }

As mentioned previously, the implementation for all these functions resides in Python/Python-ast.c, and their definition is in Include/internal/pycore_ast.h, so I put _PyAST_ReturnIfExpr there:

file: Include/internal/pycore_ast.h

 stmt_ty _PyAST_Return(expr_ty value, int lineno, int col_offset, int
                      end_lineno, int end_col_offset, PyArena *arena);
+stmt_ty _PyAST_ReturnIfExpr(expr_ty value, expr_ty test, int lineno, int
+                            col_offset, int end_lineno, int end_col_offset,
+                            PyArena *arena);
 stmt_ty _PyAST_Delete(asdl_expr_seq * targets, int lineno, int col_offset, int
                      end_lineno, int end_col_offset, PyArena *arena);

file: Python/Python-ast.c

+stmt_ty
+_PyAST_ReturnIfExpr(expr_ty value, expr_ty test, int lineno, int col_offset,
+                    int end_lineno, int end_col_offset, PyArena *arena)
+{
+    stmt_ty ret, p;
+    ret = _PyAST_Return(value, lineno, col_offset, end_lineno, end_col_offset, arena);
+
+    asdl_stmt_seq *body;
+    body = _Py_asdl_stmt_seq_new(1, arena);
+    asdl_seq_SET(body, 0, ret);
+
+    p = _PyAST_If(test, body, NULL, lineno, col_offset, end_lineno, end_col_offset, arena);
+
+    return p;
+}
+
 stmt_ty

Let me walk through the implementation of _PyAST_ReturnIfExpr. I wanted to turn return <value> if <test> into if <test>: return <value>.

Both return and the regular if are statements, so in CPython, they're represented as stmt_ty. _PyAST_If expects an expr_ty test and a body, which is a sequence of statements. In this case, the body is asdl_stmt_seq *body.

As a result, what I really wanted here was an if statement with a body where the only statement is a return <value> one.

CPython provides some convenient functions to build an asdl_stmt_seq *, and one of them is _Py_asdl_stmt_seq_new. So I used it to create the body and added the return statement I created a few lines before with _PyAST_Return.

Once that was done, the last step was to pass the test as well as the body to _PyAST_If.
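The same transformation can be sketched in pure Python with the ast module, which helps to check the idea without recompiling anything. This is only an analogy to the C helper: make_return_if is a hypothetical name, and I splice the node into a function by replacing a placeholder pass statement.

```python
import ast


def make_return_if(value: ast.expr, test: ast.expr) -> ast.If:
    """Build the node for `if <test>: return <value>` (no else branch)."""
    return ast.If(test=test, body=[ast.Return(value=value)], orelse=[])


source = """
def f(test):
    pass
    print('missed return')
    return 21
"""
module = ast.parse(source)
func = module.body[0]

# Replace the placeholder `pass` with the hand-built return-if node.
func.body[0] = make_return_if(
    ast.Constant(value=42),
    ast.Name(id="test", ctx=ast.Load()),
)
ast.fix_missing_locations(module)  # new nodes need line/column info

namespace = {}
exec(compile(module, "<ast>", "exec"), namespace)
f = namespace["f"]

print(f(True))   # 42
print(f(False))  # prints 'missed return', then 21
```

An If node whose body holds a single Return is all that is needed for the rest of the function to keep running when the test is false.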

And before I forget, you may be wondering what on earth PyArena *arena is. An arena is a CPython abstraction for memory allocation used by the parser and compiler. AST nodes are allocated from larger blocks of memory owned by the arena and are all released together when the arena is freed, instead of one by one.

Time to regenerate the parser and test it one more time:

>>> def f(test):
...     return 42 if test
...     print('missed return')
...     return 21
...
>>> import dis
>>> f(False)
>>> f(True)
42

It still didn't work. I checked the bytecode:

>>> dis.dis(f)
  2 0 LOAD_FAST 0 (test)
            2 POP_JUMP_IF_FALSE 4 (to 8)
            4 LOAD_CONST 1 (42)
            6 RETURN_VALUE
        >> 8 LOAD_CONST 0 (None)
        10 RETURN_VALUE
>>>

It was the same bytecode again!

Going back to the compilers class

At that point, I was clueless. I had no idea what was going on until I decided to go down the rabbit hole of expanding the grammar rules.

The new rule I added went like this: 'return' a=star_expressions 'if' b=disjunction { _PyAST_ReturnIfExpr(a, b, EXTRA) }.

My only hypothesis was that a=star_expressions 'if' b=disjunction was being resolved to the else-less rule I added in the beginning.

By going over the grammar one more time, I figured that my theory held. star_expressions would match a=disjunction 'if' b=disjunction { _PyAST_IfExp(b, a, NULL, EXTRA) }.

The only way to fix this was by getting rid of the star_expressions. So I changed the rule to:

 return_stmt[stmt_ty]:
- | 'return' a=star_expressions 'if' b=disjunction { _PyAST_ReturnIfExpr(a, b, EXTRA) }
+ | 'return' a=disjunction guard=guard !'else' { _PyAST_ReturnIfExpr(a, guard, EXTRA) }
  | 'return' a=[star_expressions] { _PyAST_Return(a, EXTRA) }

You might be wondering, what are guard, !'else', and star_expressions?

This guard is a rule that is part of the pattern matching rules. The new pattern matching feature added in Python 3.10 allows things like this:

match point:
    case Point(x, y) if x == y:
        print(f"Y=X at {x}")
    case Point(x, y):
        print(f"Not on the diagonal")

And the rule looks like this:

guard[expr_ty]: 'if' guard=named_expression { guard }

With that, I added one more check. To avoid it failing with SyntaxError, I needed to make sure the rule matched only code like this: return value if cond. Thus, to prevent code such as return a if cond else b from being matched prematurely, I added a !'else' negative lookahead to the rule.
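A negative lookahead peeks at the next token without consuming it and makes the rule fail if that token is present. Here's a toy Python sketch of that check. It only handles single-token values and conditions, and is_return_if is a made-up name, so treat it as an illustration of the idea rather than of the real parser:

```python
def is_return_if(tokens: list) -> bool:
    """Toy matcher for: 'return' VALUE 'if' COND !'else'."""
    if len(tokens) < 4 or tokens[0] != "return" or tokens[2] != "if":
        return False
    # Negative lookahead: peek at the token after COND, consume nothing,
    # and reject the match if it is 'else'.
    next_token = tokens[4] if len(tokens) > 4 else None
    return next_token != "else"


print(is_return_if("return a if cond".split()))         # True
print(is_return_if("return a if cond else b".split()))  # False
print(is_return_if("return a".split()))                 # False
```

When the lookahead rejects the match, the parser moves on to the next alternative, which is the regular return rule with a full conditional expression.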

Last but not least, the star_expressions allow me to return destructured iterables. For example:

>>> def f():
...     a = [1, 2]
...     return 0, *a
...

>>> f()
(0, 1, 2)

In this case, 0, *a is a tuple, which falls under the category of star_expressions. The regular if-expression doesn't allow using star_expressions with it, AFAIK, so changing the new return rule won't be an issue.

Does it work yet?

After fixing the return rule, I regenerated the grammar one more time and compiled it:

>>> def f(test):
...     return 42 if test
...     print('missed return')
...     return 21
...
>>> f(False)
missed return
21
>>> f(True)
42

It works!

Looking at the bytecode:

>>> import dis
>>> dis.dis(f)
  2 0 LOAD_FAST 0 (test)
            2 POP_JUMP_IF_FALSE 4 (to 8)
            4 LOAD_CONST 1 (42)
            6 RETURN_VALUE

  3 >> 8 LOAD_GLOBAL 0 (print)
            10 LOAD_CONST 2 ('missed return')
            12 CALL_FUNCTION 1
            14 POP_TOP

  4 16 LOAD_CONST 3 (21)
            18 RETURN_VALUE
>>>

That's precisely what I wanted. Is the AST the same as the one with a regular if?

>>> import ast
>>> print(ast.dump(ast.parse(fc), indent=4))
Module(
    body=[
        FunctionDef(
            name='f',
            args=arguments(
                posonlyargs=[],
                args=[
                    arg(arg='test')],
                kwonlyargs=[],
                kw_defaults=[],
                defaults=[]),
            body=[
                If(
                    test=Name(id='test', ctx=Load()),
                    body=[
                        Return(
                            value=Constant(value=42))],
                    orelse=[]),
                Expr(
                    value=Call(
                        func=Name(id='print', ctx=Load()),
                        args=[
                            Constant(value='missed return')],
                        keywords=[])),
                Return(
                    value=Constant(value=21))],
            decorator_list=[])],
    type_ignores=[])
>>>

Indeed it is!

If(
   test=Name(id='test', ctx=Load()),
   body=[
        Return(
            value=Constant(value=42))],
   orelse=[]),

This node is the same as the one that would be generated by:

if test: return 42
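You can confirm this on an unmodified interpreter by parsing the regular form and inspecting the node. ast.parse only parses, so the return outside a function is not a problem here. The variable name node is mine:

```python
import ast

# Parse the regular one-line if statement and grab its node.
node = ast.parse("if test: return 42").body[0]

print(type(node).__name__)          # If
print(type(node.body[0]).__name__)  # Return
print(node.body[0].value.value)     # 42
print(node.orelse)                  # []
```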

If it's not tested, it's broken?

To conclude this journey, I thought it'd be a good idea to add some unit tests as well. Before writing anything new, I wanted to get an idea of what I had broken.

With the code tested manually, I ran all tests using the test module python -m test -j8. The -j8 means it uses eight processes to run the tests in parallel:

$ ./python -m test -j8

To my surprise, only one test failed!

== Tests result: FAILURE ==
406 tests OK.
1 test failed:
   test_grammar

Because I ran all tests, it was hard to navigate the output, so I ran only this one again in isolation:

======================================================================
FAIL: test_listcomps (test.test_grammar.GrammarTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/miguel/projects/cpython/Lib/test/test_grammar.py", line 1732, in test_listcomps
   check_syntax_error(self, "[x if y]")
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miguel/projects/cpython/Lib/test/support/__init__.py", line 497, in check_syntax_error
   with testcase.assertRaisesRegex(SyntaxError, errtext) as cm:
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: SyntaxError not raised
----------------------------------------------------------------------

Ran 76 tests in 0.038s

FAILED (failures=1)
test test_grammar failed
test_grammar failed (1 failure)

== Tests result: FAILURE ==

1 test failed:
   test_grammar

1 re-run test:
   test_grammar

Total duration: 82 ms
Tests result: FAILURE

And there it is! The test expected a syntax error when parsing the [x if y] expression. Since that syntax is now valid, I could safely remove the check and re-run the tests:

== Tests result: SUCCESS ==

1 test OK.

Total duration: 112 ms
Tests result: SUCCESS

Now that everything is OK, it's time to add a few more tests. It's important to test not only the new "else-less if" but also the new return statement.

Navigating through the test_grammar.py file, I found a test for pretty much every grammar rule. The first one I looked for was test_if_else_expr. This test didn't fail, which means it only covers the happy path. To make it more robust, I added two new assertions that cover the else-less form with a false and a true condition:

        self.assertEqual((6 < 4 if 0), None)
        self.assertEqual((6 < 4 if 1), False)

I ran everything again, and all tests passed this time.

Note: bool in Python is a subclass of int, so you can use 1 to denote True and 0 for False.
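To make the note concrete, here's what those two assertions boil down to on a stock interpreter once the implicit else None is spelled out. The names falsy_case and truthy_case are just for this sketch:

```python
# bool is a subclass of int, so 1 and 0 behave like True and False here.
print(issubclass(bool, int))  # True
print(True == 1, False == 0)  # True True

# Desugared versions of the two new assertions.
falsy_case = (6 < 4 if 0 else None)   # condition is false, so None
truthy_case = (6 < 4 if 1 else None)  # condition is true, so 6 < 4, i.e. False

print(falsy_case)   # None
print(truthy_case)  # False
```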

Ran 76 tests in 0.087s

OK

== Tests result: SUCCESS ==

1 test OK.

Total duration: 174 ms
Tests result: SUCCESS

Lastly, I needed the tests for the return rule. They're defined in the test_return test. Just like the if expression one, this test passed with no modification.

To test this new use case, I created a function that receives a bool argument and returns if the argument is true. When it's false, it skips the return, just like the manual tests I had been doing up to this point:

        def g4(test):
            a = 1
            return a if test
            a += 1
            return a

        self.assertEqual(g4(False), 2)
        self.assertEqual(g4(True), 1)
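For comparison, here's a self-contained version of the same test that runs on an unmodified interpreter, with the return-if written as the regular if statement it desugars to. The class and method names are hypothetical and not part of CPython's test suite:

```python
import unittest


class ReturnIfEquivalentTest(unittest.TestCase):
    def test_return_if(self):
        def g4(test):
            a = 1
            if test:  # stock-Python spelling of `return a if test`
                return a
            a += 1
            return a

        self.assertEqual(g4(False), 2)
        self.assertEqual(g4(True), 1)


# Run the test case programmatically and report the outcome.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ReturnIfEquivalentTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```

If both versions pass with the same expectations, the new syntax behaves like the statement it is meant to replace.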

I saved the file and re-ran test_grammar one more time:

----------------------------------------------------------------------

Ran 76 tests in 0.087s

OK

== Tests result: SUCCESS ==

1 test OK.

Total duration: 174 ms
Tests result: SUCCESS

Looks good! The test_grammar test passed. Just in case, I re-ran the full test suite:

$ ./python -m test -j8

After a while, all tests passed, and I'm very happy with the result.

Limitations

If you know Ruby well, by this point, you've probably noticed that what I did here was not 100% the same as a conditional modifier. For example, in Ruby, you can run actual expressions in these modifiers:

irb(main):002:0> a = 42
irb(main):003:0> a += 1 if false
=> nil
irb(main):004:0> a
=> 42
irb(main):005:0> a += 1 if true
=> 43

I cannot do the same with my implementation:

>>> a = 42
>>> a += 1 if False
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +=: 'int' and 'NoneType'
>>> a += 1 if True
>>> a
43

What this reveals is that the return rule I created is just a workaround. If I want to make it as close as possible to Ruby's conditional modifier, I'll need to make it work with other statements as well, not just return.
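The TypeError follows directly from the desugaring. On a stock interpreter, the sketch below shows what my patched build effectively runs for a += 1 if False, next to what Ruby's modifier actually means. The error variable is only there to capture the message:

```python
a = 42

# What `a += 1 if False` turns into with my change: the right-hand side
# becomes None, and int += None is not defined.
try:
    a += 1 if False else None
except TypeError as exc:
    error = str(exc)
    print(error)

# What Ruby's conditional modifier means: guard the whole statement.
if False:
    a += 1

print(a)  # 42
```

Supporting this properly would mean wrapping the entire augmented assignment in an if statement, the same trick I used for return.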

Nevertheless, this is fine. My goal with this experiment was just to learn more about Python internals and see how I would navigate a little-known code base written in C and make the appropriate changes to it. And I have to admit that I'm pretty happy with the results!

Conclusion

Adding new syntax inspired by Ruby is a really nice exercise for learning more about the internals of Python. Of course, if I were to turn this into a PR, the core developers would probably find a few shortcomings, as I have already described in the previous section. However, since I did this just for fun, I'm very happy with the results.

The source code with all my changes is on my CPython fork under the branch ruby-if-new.

This article was originally published on the author's personal blog and has been adapted with permission.