Hacking Python bytecode at runtime: custom 'return(...)'
Python flow is controlled by a number of statements, such as for, if, while, raise, try, and return.
These are part of the Python language, and you are not allowed to (re-)implement them yourself.
Or are you?
Assumptions
Let’s write a function _return(something) that makes its caller return something, as if a return statement had been executed in the stack frame of the function that called it.
That is, the following function
def f():
    _return("hacked")
    return 42
should return 'hacked' rather than 42.
The idea
The Python interpreter executes well-organized bytecode.
To achieve such non-standard behavior, we have to patch the bytecode belonging to the function f.
This is a relatively well-documented and straightforward task that takes advantage of the standard inspect and dis modules.
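As a rough illustration of what dis offers, the sketch below lists the compiled instructions of f with their byte offsets. The exact opcode names and offsets depend on the CPython version, so no particular output is assumed here; _return is only compiled, never called.

```python
import dis


def f():
    _return("hacked")  # only compiled here, never executed
    return 42


# Walk the compiled instructions of f: byte offset, opcode name, argument.
instructions = list(dis.get_instructions(f))
for ins in instructions:
    print(f"{ins.offset:>3}  {ins.opname:<16} {ins.argrepr}")
```

The offsets printed in the first column are the same kind of positions that the patch later addresses as pos and pos + 2.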
The catch is that we have to patch code that is already running.
This is not possible with standard Python: a code object exposes its bytecode as an immutable bytes object.
But we can still break in with our memory editor.
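To make the limitation concrete, here is a minimal sketch of what standard Python does allow. It uses a stand-in function that returns a string (an assumption made for the demo, not the f from this post). In-place edits of co_code are rejected, while code.replace() (available since Python 3.8) builds a modified copy that can be rebound to the function.

```python
def f():
    return "original"


code = f.__code__

# co_code is an immutable bytes object, so in-place edits are rejected.
try:
    code.co_code[0] = 0
    in_place_edit_allowed = True
except TypeError:
    in_place_edit_allowed = False

# The supported route: build a modified copy and rebind the function to it.
new_consts = tuple("hacked" if c == "original" else c for c in code.co_consts)
f.__code__ = code.replace(co_consts=new_consts)

result = f()
print(in_place_edit_allowed, result)
```

Rebinding __code__ only affects subsequent calls. A frame that is already executing keeps a reference to the old code object, which is why this route does not help a function that is in the middle of a call.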
The implementation
The _return code is tiny.
First, we retrieve the parent frame inside _return.
Second, we put the RETURN_VALUE opcode (0x53) right after the instruction that called _return.
Finally, we return from the _return function with the desired value.
It will be put on top of the caller's stack and automatically picked up by the following (new) RETURN_VALUE opcode.
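The first step can be tried on its own without any patching. The sketch below uses hypothetical helper names (calling_instruction and demo are made up for illustration) to report which instruction the caller is currently executing. On CPython 3.9 one would expect this to be CALL_FUNCTION; newer interpreters use different opcodes and may point f_lasti elsewhere, so treat the result as version-specific.

```python
import dis
import inspect


def calling_instruction():
    """Return (offset, opcode name) of the instruction the caller is executing."""
    parent_frame = inspect.currentframe().f_back
    pos = parent_frame.f_lasti                  # byte offset into the caller's bytecode
    opcode = parent_frame.f_code.co_code[pos]   # raw opcode number at that offset
    return pos, dis.opname[opcode]


def demo():
    return calling_instruction()


pos, name = demo()
print(pos, name)
```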
The minimal working example looks like this:
import inspect
from mem_view import Mem


def _return(what):
    # Get parent frame, its opcodes and the position of the interpreter
    parent_frame = inspect.currentframe().f_back
    parent_code = parent_frame.f_code.co_code
    pos = parent_frame.f_lasti

    # Make sure that the last opcode is calling this function
    assert parent_code[pos] == 0x83  # CALL_FUNCTION

    # Make sure there is enough space for the patch
    assert len(parent_code) >= pos + 4

    # Place the RETURN_VALUE opcode right after the CALL_FUNCTION opcode
    mem = Mem.view(parent_code)
    mem[pos + 2:pos + 4] = b'\x53\x00'  # RETURN_VALUE

    # Now, return the argument to put it on top of the parent stack
    return what
I tested it with CPython 3.9 and it works! But one point can still be improved. After patching, the opcodes look like this:
pos - 4: LOAD_GLOBAL _return
pos - 2: LOAD_... what
pos: CALL_FUNCTION
pos + 2: RETURN_VALUE
But, after patching, there is no need to call the _return function at all, because the desired return value will always be on top of the stack by the time CALL_FUNCTION is reached!
So, we can replace CALL_FUNCTION itself with another RETURN_VALUE to avoid additional function calls:
pos - 4: LOAD_GLOBAL _return
pos - 2: LOAD_... what <- this object will be returned
pos: RETURN_VALUE <- this will be triggered every call of `f` ...
pos + 2: RETURN_VALUE <- ... except the very first call which will return from here
The required change:
# mem[pos + 2:pos + 4] = b'\x53\x00' -- old
mem[pos:pos + 4] = b'\x53\x00' * 2 # RETURN_VALUE and RETURN_VALUE
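The b'\x53\x00' literal works because, since CPython 3.6, every instruction occupies two bytes: an opcode followed by a one-byte argument. The sketch below decodes the raw bytecode of a throwaway function (identity is a hypothetical example, not part of the patch) and builds the same two-byte RETURN_VALUE pattern from the dis opcode table instead of hard-coding it. Opcode numbers differ between CPython versions, so the table lookup is the safer choice when experimenting outside 3.9.

```python
import dis


def identity(x):
    return x


raw = identity.__code__.co_code

# Split the raw bytecode into (offset, opcode name, argument byte) triples.
units = [(i, dis.opname[raw[i]], raw[i + 1]) for i in range(0, len(raw), 2)]
for offset, name, arg in units:
    print(offset, name, arg)

# Build the two-byte patch from the opcode table instead of hard-coding it.
patch = bytes([dis.opmap["RETURN_VALUE"], 0])
print(patch)
```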