Modifying immutable objects in Python
In Python, strings and bytes are immutable objects that you cannot modify. But what if you absolutely want to? The nastiest way is to overwrite the corresponding memory block. Here is how to do it in pure Python.
Assumptions
For the moment, let’s consider a bytes object b'xyz' which we would like to replace with b'abc'.
It has a single reference x and nothing else cares about it, so we are relatively safe changing it.
>>> x = b'xyz'
>>> awful_hack(x)
>>> x
b'abc'
The idea
Of course, we may do the following trick but it will create a new object:
def awful_hack(_):
    globals()['x'] = b'abc'
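To see that this really is a rebinding rather than an in-place change, here is a minimal sketch. It uses a plain assignment instead of globals() (the effect on the name is the same), and the name alias is made up for the illustration.

```python
x = b'xyz'
alias = x          # hypothetical second reference to the same object
old_id = id(x)

x = b'abc'         # rebinding: same effect as globals()['x'] = b'abc'

rebound = id(x) != old_id   # x now names a different object
untouched = alias           # the original object still holds its old contents
print(rebound, untouched)
```

The old object is still there, unchanged, for anyone who kept a reference to it.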
Can we make this change in-place by literally overwriting 3 bytes with another 3 bytes?
The answer is yes.
We may, for example, compile a small library written in C which does the trick.
But we can also do it in pure Python.
Generally speaking, we are looking for some sort of memcpy or memmove exposed by the existing Python libraries.
The easiest option is ctypes.memmove(destination, source, length), which writes arbitrary data to an arbitrary address.
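Before pointing memmove at anything dangerous, it may help to see it on memory that is meant to be writable. This is a small sketch using ctypes.create_string_buffer; the buffer names are made up for the example.

```python
import ctypes

src = ctypes.create_string_buffer(b'abc')   # mutable 4-byte buffer: b'abc\x00'
dst = ctypes.create_string_buffer(b'xyz')   # mutable 4-byte buffer: b'xyz\x00'

# memmove accepts plain integer addresses, which is what we rely on later.
ctypes.memmove(ctypes.addressof(dst), ctypes.addressof(src), 3)

print(dst.value)  # prints b'abc'
```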
The implementation
The basic usage of memmove would look like this:
from ctypes import memmove

def awful_hack(x):
    new_data = b'abc'
    src = id(new_data)
    dst = id(x)
    memmove(dst, src, len(new_data))
where id(...) returns the object's address (in CPython).
But it won’t work: a bytes object is not just its contents but also its type, length, and other bookkeeping data.
In fact, the contents of a bytes object start exactly 0x20 bytes after the pointer (64-bit CPython 3.9).
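The 0x20 offset does not have to be hard-coded. As a hedged sketch: in CPython, bytes.__basicsize__ counts the object header plus one byte for the trailing NUL, so subtracting 1 should give the offset of the first content byte. This relies on CPython internals and is an assumption, not a documented API; the check below only reads memory.

```python
import ctypes

# Assumption: CPython lays out bytes as <header><contents><NUL>, and
# bytes.__basicsize__ == header size + 1 (the NUL terminator).
offset = bytes.__basicsize__ - 1

data = b'xyz'
# Read len(data) bytes starting at the presumed content address.
peek = ctypes.string_at(id(data) + offset, len(data))
print(hex(offset), peek)
```

On a 64-bit CPython 3.9 this should give 0x20, matching the number above.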
Let’s take this into account.
from ctypes import memmove

def awful_hack(x):
    new_data = b'abc'
    src = id(new_data) + 0x20
    dst = id(x) + 0x20
    memmove(dst, src, len(new_data))

x = b'xyz'
awful_hack(x)
print(x) # prints b'abc'
Now it works!
Well, kinda.
Even though the statement below never touches x, the interpreter is now clearly broken.
print(b'xyz') # prints b'abc'
Update: I was not able to reproduce this behavior reliably.
The reason is that CPython tries to conserve memory: instead of creating a new immutable built-in object, it often refers to an identical existing one.
print(...) simply receives a pointer to the memory block we just changed, reads from it and prints b'abc'.
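A minimal sketch of that sharing, assuming CPython's behavior of merging equal constants within one code object (an implementation detail, not a language guarantee); the function name is made up.

```python
def same_constant():
    a = b'xyz'
    b = b'xyz'
    # Both names are loaded from the same entry in the code object's constants.
    return a is b

shared = same_constant()
count = same_constant.__code__.co_consts.count(b'xyz')
print(shared, count)
```

If both literals resolve to one object, overwriting its memory changes what every later b'xyz' in that code evaluates to.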
Bonus
A class providing a read-write view of an arbitrary memory block. As noted above, use with care.
from ctypes import memmove, string_at

def ptr(data):
    # Address of the first content byte of a bytes object (CPython 3.9 layout).
    return id(data) + 0x20

class Mem:
    def __init__(self, addr, length):
        self.addr = addr
        self.length = length

    @property
    def _bytes(self):
        # Snapshot (copy) of the current memory contents.
        return string_at(self.addr, self.length)

    def _w(self, offset, buffer):
        buffer = bytes(buffer)
        memmove(self.addr + offset, ptr(buffer), len(buffer))

    def __getitem__(self, item):
        return self._bytes[item]

    def __setitem__(self, item, value):
        if isinstance(value, int):
            value = bytes([value & 0xFF])
        else:
            value = bytes(value)
        if isinstance(item, int):
            assert len(value) == 1
            # Normalize negative indices and refuse out-of-range writes.
            if item < 0:
                item += self.length
            if not 0 <= item < self.length:
                raise IndexError("Mem index out of range")
            self._w(item, value)
        elif isinstance(item, slice):
            start, stop, step = item.indices(self.length)
            assert step == 1
            assert len(value) == stop - start
            self._w(start, value)
        else:
            raise NotImplementedError

    def __len__(self):
        return self.length

    def __str__(self):
        return f"Mem({self._bytes})"

    @staticmethod
    def view(a):
        if isinstance(a, bytes):
            return Mem(ptr(a), len(a))
        else:
            raise NotImplementedError

x = b"xyz"
v = Mem.view(x)
print(x, v)
v[:] = b'abc'
print(x, v)
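For comparison, ctypes can build a similar read-write view on its own via from_address. The sketch below is an alternative to the Mem class, not a replacement for it, and it deliberately targets a buffer from create_string_buffer (a made-up stand-in) so that nothing immutable gets damaged.

```python
import ctypes

backing = ctypes.create_string_buffer(b'xyz')    # writable memory to play with
addr = ctypes.addressof(backing)

# A 3-byte char array that shares memory with whatever lives at addr.
view = (ctypes.c_char * 3).from_address(addr)

before = view.raw
view[0:3] = b'abc'        # writes straight through to the underlying memory
after = backing.value
print(before, after)
```

Pointing addr at ptr(x) instead would give the same in-place effect as Mem.view(x), with the same caveats.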