How MIT Students Fooled Google's AI Image-Recognition Tech
You see a nice picture of a dog. But Google’s neural network sees guacamole. The trick behind this new way of deceiving AI is more important than you think.
Machine Learning algorithms, which crunch large amounts of data to power everything from email to language translation, are touted as the next big thing in technology. The only problem is that they are vulnerable.
In recent years, researchers have shown that a type of Machine Learning algorithm called an image classifier – think of it as a program you can show a picture of your pet, and it will tell you whether it is a dog or a cat – is vulnerable, and in a surprising way. These programs are susceptible to attacks using so-called “adversarial examples”. An adversarial example occurs when you show the algorithm what is clearly an image of a dog, but a detail that human eyes cannot detect makes the classifier see guacamole instead.
Image credit: LabSix
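To make the idea concrete, here is a toy sketch of what an adversarial example is: a perturbation small enough to be hard to notice, yet large enough to flip the label a classifier assigns. This is not the attack described in this article – the linear “classifier”, the 8x8 image and the step sizes are all invented for illustration.

```python
import numpy as np

# Toy illustration only: a linear two-class "classifier" on a flattened
# 8x8 grayscale image. Real image classifiers are deep neural networks,
# but the same phenomenon applies.
rng = np.random.default_rng(0)
weights = {"dog": rng.normal(size=64), "guacamole": rng.normal(size=64)}

def classify(image):
    # Pick whichever class scores higher on this image.
    return max(weights, key=lambda label: weights[label] @ image)

image = rng.uniform(0.0, 1.0, size=64)
clean_label = classify(image)
target = "guacamole" if clean_label == "dog" else "dog"

# Nudge every pixel by a small amount in the direction that favours the
# other class. The change is hard for a human to notice, yet it can be
# enough to flip the classifier's answer.
direction = np.sign(weights[target] - weights[clean_label])
for eps in (0.01, 0.02, 0.05, 0.1):
    adversarial = np.clip(image + eps * direction, 0.0, 1.0)
    if classify(adversarial) != clean_label:
        print(f"{clean_label} -> {classify(adversarial)} "
              f"with a max pixel change of only {eps}")
        break
else:
    print("no flip at these small step sizes for this toy example")
```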
The researchers initially thought these attacks were largely theoretical – more a demonstration than a cause for concern. That was until earlier this year, when a group of MIT students from the LabSix student organization showed they could create three-dimensional objects that the algorithms misclassify, proving that adversarial examples are a threat in the real world. The students' work was limited in one key way, though: they still needed access to the internal mechanisms of the algorithm to create their adversarial examples.
Today, these same students announced that they have overcome that limitation – a disturbing glimpse of the AI vulnerabilities already at work in our world.
In a new paper, the authors describe their ability to create adversarial examples while knowing very little about the algorithm they are attacking (they were also able to complete the attack much faster than any method used to date). To demonstrate the effectiveness of their technique, they successfully attacked and fooled the Google Cloud Vision API, a standard commercial image-classification service used across the Internet. All they knew about Cloud Vision was what it produced when it looked at an image – for example, its top few guesses for what the image shows and the confidence it had in each option.
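For a sense of how little the attacker has to work with, here is a short sketch of querying the label-detection endpoint with the google-cloud-vision Python client. The client calls and the file name are assumptions made here for illustration; the article itself only says the API returns top labels and confidence scores.

```python
# Requires the google-cloud-vision package and Google Cloud credentials.
from google.cloud import vision

client = vision.ImageAnnotatorClient()

with open("dog.jpg", "rb") as f:          # hypothetical input image
    image = vision.Image(content=f.read())

response = client.label_detection(image=image)
for label in response.label_annotations:
    # Prints something like "Dog 0.98", "Snout 0.91" -- a handful of labels
    # and confidences, with no probability for an arbitrary class such as
    # "guacamole". That is all the attacker gets back per query.
    print(label.description, round(label.score, 2))
```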
“Not having basic information about the neural network made creating an adversarial example to fool it a huge challenge,” says Andrew Ilyas, a LabSix student. “Normally, when you construct these adversarial examples, you start with an image of a dog that you want to turn into guacamole,” says Ilyas. “It’s important, traditionally, that I have access to the probability at all times that this picture is guacamole. But with Google Cloud Vision, it’s not going to tell you anything about how likely that dog is going to be guacamole. It’s only going to tell me how confident it is that it’s a dog.”
Image credit: LabSix
To work around this problem, the team borrowed a method from another area of computer science to estimate how much each pixel in the dog image needed to change for the algorithm to see guacamole. They then used a pair of algorithms working together to slowly shift the pixels. The process works by submitting the image thousands, even millions of times to the Cloud Vision API while the algorithms gradually nudge the dog toward guacamole. Normally this can take up to 5 million queries, but the method Ilyas and his team developed is much faster: it took only about 1 million queries to create an adversarial example for the Google Cloud Vision image classifier – guacamole that human eyes would never see.
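The paper's exact pipeline isn't reproduced in this article, but the general shape of a query-based attack like this can be sketched as follows. Everything below – the toy scoring function standing in for an API query, the sampling trick used to estimate the gradient, the step sizes and query budget – is an illustrative stand-in, not LabSix's code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for one API query: "how guacamole-like does the classifier
# find this image?" In the real attack each call would be a query to the
# Cloud Vision API; here it is just a fixed linear score so the sketch runs.
secret_direction = rng.normal(size=(8, 8))

def target_score(image):
    return float((secret_direction * image).sum())

def estimate_gradient(image, score_fn, samples=50, sigma=0.001):
    """Estimate which way to nudge each pixel using queries alone, by
    probing the classifier with small random perturbations (an
    evolution-strategies-style estimator)."""
    grad = np.zeros_like(image)
    for _ in range(samples):
        noise = rng.normal(size=image.shape)
        grad += noise * (score_fn(image + sigma * noise)
                         - score_fn(image - sigma * noise))   # 2 queries
    return grad / (2 * samples * sigma)

def attack(image, score_fn, steps=200, step_size=0.01, eps=0.05):
    """Repeatedly step along the estimated gradient, keeping every pixel
    within `eps` of the original so the change stays hard to see."""
    original = image.copy()
    for _ in range(steps):
        grad = estimate_gradient(image, score_fn)
        image = image + step_size * np.sign(grad)
        image = np.clip(image, original - eps, original + eps)
        image = np.clip(image, 0.0, 1.0)
    return image

dog = rng.uniform(0.0, 1.0, size=(8, 8))        # the clean "dog" image
adv = attack(dog, target_score)
print("target score before:", round(target_score(dog), 3))
print("target score after: ", round(target_score(adv), 3))
print("max pixel change:   ", round(float(np.abs(adv - dog).max()), 3))
```

The key point is that the attacker never looks inside the model: every piece of information comes from queries, which is why the total query count – and any trick that reduces it – matters so much.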
This is a much more effective mode of attack, and it could make it easier for malicious actors to deceive any number of commercial image classifiers used online. The LabSix team points out that they did not single out Google for any particular reason – several other companies, including Amazon and Microsoft, offer this type of algorithm. For example, the commenting platform Disqus uses an image classifier called Clarifai to filter inappropriate images out of website comment sections.
The implications are broader still. Defense companies and criminal investigators, for example, also use cloud-based Machine Learning systems to sort through huge piles of images. A skilled coder could create an image that looks harmless to the human eye but reads as dangerous to the machine – and vice versa.
“This is yet another result showing that real-world systems are at risk, and we’re making progress toward breaking practical systems,” says Anish Athalye, another LabSix student. “This is a system people hadn’t attacked before. Even if things are commercial, closed, proprietary systems, they are easy to break.”
As adversarial examples move into the real world, researchers still have not found a solid way to guard against them, which could have devastating consequences as these algorithms continue to colonize our online and offline worlds. But Ilyas and Athalye hope that if researchers can find the vulnerabilities before these technologies become widespread, they will have a chance to close the gaps in the algorithms – before people with bad intentions exploit them.