Adversarial machine learning is one of my particular interests in the field. This short post advertises a notebook with some code examples, but also speculates on the topic. I focus specifically on computer vision tasks, taking handwritten digit classification as a textbook example. The notebook is available via this link.

Here I do not discuss technical details (please visit the above link) but rather briefly present my personal take on the topic and why I think it is important.

Adversarial machine learning is the study of vulnerabilities in machine learning applications. Though it may sound like hacking into someone else’s computer, I see it more as a fundamental study of why machine learning fails in theory and how to demonstrate this in practice.

From my point of view, adversarial ML challenges one of the pillars of ML as a whole: the ability to generalize. When I was first introduced to ML I kept asking myself: is there really anything new here that I did not know before? If you, too, have solid training in linear algebra, analysis and statistics, reading ML papers will mostly feel like revisiting familiar ideas written in a new language. But you may have a hard time finding a good analogy for the term “generalization”.

For myself, I have built the following simple picture to incorporate generalization into what I already know. Say you have data points and you fit them in order to predict new ones. A good fit interpolates: this happens when there are many data points, few dimensions and a simple model describing the data. A bad fit extrapolates: few data points, and/or many dimensions, and/or many parameters in the model. Generalization is a kind of “good extrapolation”: by the numbers you would expect a bad fit for your specific problem, yet somehow it works much better than expected.
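To make this picture concrete, here is a minimal sketch of the same fit behaving well when it interpolates and poorly when it extrapolates (numpy only; the cubic degree and the sine target are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Many data points, one dimension, a simple model: a cubic fit to noisy sin(x).
x_train = np.linspace(0, 2 * np.pi, 100)
y_train = np.sin(x_train) + 0.05 * rng.normal(size=x_train.size)
coeffs = np.polyfit(x_train, y_train, deg=3)

# Interpolation: a query inside the training range is reproduced reasonably well.
x_inside = np.pi / 3
print(np.polyval(coeffs, x_inside), np.sin(x_inside))

# Extrapolation: the same fit quickly diverges outside the training range.
x_outside = 3 * np.pi
print(np.polyval(coeffs, x_outside), np.sin(x_outside))
```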

Extrapolating but still generalizing is, from my point of view, the core of applied ML. To see it, imagine a typical ML task that I also consider in the notebook: classifying handwritten digits. Each sample is a 28x28 px greyscale image: a vector of 784 numbers in total. To properly sample such a space we would need an exponential number of images: n ** 784 for n intensity levels per pixel. Even for n = 2 this is a very large number: len(str(2 ** 784)) == 237, i.e. a 237-digit number, far beyond a googol. Preparing this many images is, of course, impossible in practice. Instead, we use 50,000 training samples and still manage to train a model that successfully recognizes most digits from the test set.
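The counting itself fits in a few lines (two intensity levels per pixel is the most optimistic assumption; real greyscale images only make the gap worse):

```python
# Number of distinct 28x28 binary images: already a 237-digit number,
# far beyond a googol (10 ** 100), while the training set has only 50,000 samples.
n_pixels = 28 * 28                    # 784 dimensions
n_binary_images = 2 ** n_pixels
print(len(str(n_binary_images)))      # 237
print(n_binary_images > 10 ** 100)    # True
print(50_000 / n_binary_images)       # effectively zero coverage of the space
```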

Leaving aside the discussion of why ML may work in practice, let’s focus on why it does not work in theory. The answer is: because it extrapolates. Indeed, it is practically impossible to sample a 784-dimensional space. What happens with 50,000 training points is that they form a tiny convex hull where every point lies on the boundary, and it is almost impossible to land a new point inside this cluster (see arXiv:2110.09485 and arXiv:2101.09849 for details). Even if, for whatever reason, the model generalizes to the new point, it is trivial to find another point where it does not.
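A quick way to convince yourself of the convex-hull argument is a small feasibility check with a linear program (a sketch with numpy and scipy; the dimensions and sample counts are scaled down from 784 and 50,000 to keep it fast, but the effect is the same):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
d, n = 100, 500                     # 100 dimensions, 500 "training" points
points = rng.normal(size=(n, d))
query = rng.normal(size=d)          # a new "test" point from the same distribution

# query lies inside the convex hull iff there exist weights w >= 0, sum(w) = 1,
# with points.T @ w == query; feasibility of this LP decides membership.
A_eq = np.vstack([points.T, np.ones(n)])
b_eq = np.concatenate([query, [1.0]])
res = linprog(c=np.zeros(n), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
print("inside the hull:", res.success)   # almost always False in high dimension
```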

This is where adversarial machine learning comes into play: it develops methods to find such points, but also to deliver adversarial examples into the physical world, where they trick machine learning setups in production. The core idea behind the majority of the methods is very simple: you compute gradients of the ML setup and simply follow them. Thus, a major part of current research is about making machine learning setups robust against adversarial gradient attacks. This is where things get interesting.
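Here is a minimal sketch of this gradient-following idea in the spirit of the fast gradient sign method (the PyTorch setting, the [0, 1] pixel range and the eps value are illustrative assumptions, not the exact setup of the notebook):

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps=0.1):
    """One step along the sign of the input gradient (FGSM-style attack)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Step *up* the loss surface: the prediction is pushed away from the true label.
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()   # keep valid pixel intensities
```

Iterating this step with a smaller step size (projected gradient descent) typically produces stronger adversarial examples at the cost of more gradient queries.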

You could, for example, “flatten” your ML setup gradient-wise. This usually amounts to strong regularization during training. Such models have higher training and test set errors and are also less expressive (this is easy to see in the linear regression example).
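One common way to implement this flattening is to penalize the input gradients directly during training; here is a sketch (the PyTorch setting and the regularizer weight lambda_reg are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def regularized_loss(model, x, y, lambda_reg=1.0):
    """Cross-entropy plus a penalty on the input-gradient norm."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    # Gradient of the loss with respect to the *input*, kept in the graph
    # so that the penalty itself is differentiable w.r.t. the model weights.
    (grad_x,) = torch.autograd.grad(loss, x, create_graph=True)
    penalty = grad_x.pow(2).sum(dim=tuple(range(1, grad_x.dim()))).mean()
    return loss + lambda_reg * penalty
```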

Another approach is to use a better-educated guess for the model, with fewer parameters to optimize. Here you face another dilemma: if you are smart enough to propose such a model, you might not need machine learning at all.

Finally, you could make the machine learning function non-analytic. While this may seem promising at first glance, non-analytic functions are very difficult to optimize. Typically, the machine learning function is kept analytic during training and some sort of noise layer is added afterwards. This is an ad-hoc solution to the problem: the gradients can be averaged over multiple queries and then exploited.
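The averaging trick is equally simple to sketch (noisy_model here is a hypothetical wrapper that injects a fresh noise realization on every forward pass; the PyTorch setting and the number of queries are assumptions):

```python
import torch
import torch.nn.functional as F

def averaged_gradient(noisy_model, x, y, n_queries=32):
    """Estimate a usable input gradient by averaging over many noisy queries."""
    x = x.clone().detach().requires_grad_(True)
    total = 0.0
    for _ in range(n_queries):
        # Each query sees a different noise realization; the noise averages out.
        total = total + F.cross_entropy(noisy_model(x), y)
    (total / n_queries).backward()
    return x.grad.detach()
```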

Thus, currently, the possibility of adversarial attacks seems to be an intrinsic property of machine learning: at least as long as the “extrapolates but generalizes” situation defines the field.