Hacking Python bytecode at runtime: a custom 'return(...)'
Control flow in Python is governed by a number of statements, such as return.
These are part of the Python language, and you are not allowed to (re-)implement them yourself.
Or are you?
Let’s make a function _return(something) that returns something from the stack frame of the function that called it, as if the caller itself had executed a return statement.
In other words, with the following code

    def f():
        _return("hacked")
        return 42

calling f() returns 'hacked' rather than 42.
The Python interpreter executes well-organized Python bytecode.
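To get a feeling for what that bytecode looks like, here is a small sketch using the standard dis module. The function f below is just a placeholder for illustration, and the exact instruction names printed depend on the CPython version.

```python
import dis

def f():
    return "plain"

# Each instruction has a byte offset into co_code, an opcode name and an argument.
for ins in dis.get_instructions(f):
    print(ins.offset, ins.opname, ins.argrepr)

# The raw bytecode itself is stored on the code object as a bytes object.
raw = f.__code__.co_code
print(type(raw).__name__, len(raw))
```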
To achieve such non-standard behavior we will have to patch the bytecode belonging to the function that calls _return.
This is a relatively well-documented and straightforward task where you take advantage of standard
The issue is that we have to patch code that is already running.
This is not possible with standard Python: the bytecode of a code object is exposed as an immutable bytes object.
But we can still break in with our memory editor.
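A minimal sketch of where standard Python stops, assuming a placeholder function f: in-place edits of co_code are rejected, while the supported route (code.replace, available since Python 3.8) produces a new code object that has to be swapped in.

```python
def f():
    return "plain"

code = f.__code__

# 1. In-place edits are rejected: co_code is an immutable bytes object.
try:
    code.co_code[0] = 0
    patched_in_place = True
except TypeError:
    patched_in_place = False

# 2. The supported route builds a *new* code object and swaps it in.
#    Only a constant is changed here, to keep the sketch independent
#    of the interpreter's opcode numbering.
new_consts = tuple("patched" if c == "plain" else c for c in code.co_consts)
f.__code__ = code.replace(co_consts=new_consts)

print(patched_in_place, f())
```

The swap only affects calls made after it. A frame that is already executing keeps the code object it started with, which is why this route does not help from inside _return.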
The _return code is tiny.
First, we retrieve the parent frame inside _return.
Second, we put the RETURN_VALUE opcode 0x53 right after the CALL_FUNCTION opcode through which _return was called.
Finally, we return from the _return function with the desired value.
It will be put on top of the stack of the caller's frame and automatically picked up by the following (new) RETURN_VALUE.
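The first step can be tried in isolation. The sketch below only reads the parent frame and reports which instruction it is currently executing; the helper names caller_instruction and demo are made up for illustration. Instead of indexing co_code directly, it maps f_lasti to an instruction through dis, which should also cope with interpreters that store extra cache entries after an opcode.

```python
import dis
import inspect

def caller_instruction():
    """Return (opname, offset) of the instruction the calling frame is executing."""
    parent = inspect.currentframe().f_back
    pos = parent.f_lasti  # byte offset into the parent's bytecode
    current = None
    # Pick the last instruction that starts at or before pos.
    for ins in dis.get_instructions(parent.f_code):
        if ins.offset > pos:
            break
        current = ins
    return current.opname, current.offset

def demo():
    return caller_instruction()

# Prints a CALL-family opcode; the exact name depends on the CPython version.
print(demo())
```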
The minimal working example looks like this:
    import inspect
    from mem_view import Mem


    def _return(what):
        # Get parent frame, its opcodes and the position of the interpreter
        parent_frame = inspect.currentframe().f_back
        parent_code = parent_frame.f_code.co_code
        pos = parent_frame.f_lasti

        # Make sure that the last opcode is calling this function
        assert parent_code[pos] == 0x83  # CALL_FUNCTION

        # Make sure there is enough space for the patch
        assert len(parent_code) >= pos + 4

        # Place the RETURN_VALUE opcode right after the CALL_FUNCTION opcode
        mem = Mem.view(parent_code)
        mem[pos + 2:pos + 4] = b'\x53\x00'  # RETURN_VALUE

        # Now, return the argument to put it on top of the parent stack
        return what
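The constants 0x83 and 0x53 in the listing above are opcode numbers, and those are not guaranteed to stay the same between CPython versions. As a hedged alternative, they can be looked up by name through the standard dis module; the variable names below are only for illustration.

```python
import dis

# RETURN_VALUE is looked up by name instead of being hard-coded.
RETURN_VALUE = dis.opmap["RETURN_VALUE"]

# CALL_FUNCTION is not present on every interpreter version, hence .get().
CALL_FUNCTION = dis.opmap.get("CALL_FUNCTION")

# Two-byte instruction: opcode followed by a zero argument byte.
patch = bytes([RETURN_VALUE, 0])

print(hex(RETURN_VALUE), CALL_FUNCTION, patch)
```

If CALL_FUNCTION comes back as None, the call site is encoded differently on that interpreter and the patch as written would not apply.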
I tested it with CPython 3.9 and it works! But one point can still be improved. After patching, the opcodes look like this:
    pos - 4: LOAD_GLOBAL _return
    pos - 2: LOAD_... what
    pos:     CALL_FUNCTION
    pos + 2: RETURN_VALUE
But, after patching, there is no need to call the _return function because the desired return value will always be on top of the stack!
So, we can replace CALL_FUNCTION itself with another RETURN_VALUE to avoid additional function calls:
    pos - 4: LOAD_GLOBAL _return
    pos - 2: LOAD_... what        <- this object will be returned
    pos:     RETURN_VALUE         <- this will be triggered every call of `f` ...
    pos + 2: RETURN_VALUE         <- ... except the very first call which will return from here
The required change:
    # mem[pos + 2:pos + 4] = b'\x53\x00' -- old
    mem[pos:pos + 4] = b'\x53\x00' * 2  # RETURN_VALUE and RETURN_VALUE
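To see what the two variants of the patch do to the bytes, here is a dry run on an ordinary bytearray instead of live interpreter memory. The call-site bytes are hand-written for illustration and assume the CPython 3.9 opcode numbers (LOAD_GLOBAL 0x74, LOAD_CONST 0x64, CALL_FUNCTION 0x83, POP_TOP 0x01, RETURN_VALUE 0x53).

```python
# Hypothetical call site for `_return("hacked")`, CPython 3.9 encoding assumed:
# LOAD_GLOBAL 0, LOAD_CONST 1, CALL_FUNCTION 1, POP_TOP
call_site = bytes([0x74, 0x00, 0x64, 0x01, 0x83, 0x01, 0x01, 0x00])
pos = 4  # offset of CALL_FUNCTION, i.e. what f_lasti would report

# First version: only the instruction after the call becomes RETURN_VALUE.
first = bytearray(call_site)
first[pos + 2:pos + 4] = b'\x53\x00'

# Improved version: the call itself is replaced as well.
second = bytearray(call_site)
second[pos:pos + 4] = b'\x53\x00' * 2

print(first.hex(' ', 2))
print(second.hex(' ', 2))
```

In both cases the length of the bytecode is unchanged, so offsets of all other instructions stay valid.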