Welcome back to our blog series Coffee Break with AI! The previous blog went over chess, Jeopardy! and poker, and showed that they were all solved by AI models that don’t use neural networks. In this chapter, we highlight three milestones where AI did use neural networks. Let’s dive in!
DeepMind beats an Atari game
2012 was a breakthrough year for neural networks and deep learning. Multiple articles were released showing how neural networks outperformed state-of-the-art algorithms.
DeepMind, a company founded in 2010 and bought by Google in 2014, jumped on this technique by applying deep learning to reinforcement learning. Think of reinforcement learning as a computer model trying out different things, getting a reward when it does well, getting a penalty when it does badly, and learning from this interaction. Using this, DeepMind was the first to create a model that beat the Atari game Breakout in 2013. They released a paper in 2013 and a short video in 2015 showing the AI playing.
Why Atari Breakout?
Atari Breakout is a minimal, contained application with few variables. The model only needed to take into account the game screen and the score, nothing more. It’s a well-scoped first step in applying neural networks in combination with reinforcement learning.
In addition, when humans play Atari games, they play them on a screen with a controller. When teaching a model how to play these games, there has to be a way to capture what is happening on the screen, so the model can use that information to learn. Therefore, using Atari Breakout in this research also provides insight into how to process visual information using machine learning.
How did it work?
The DeepMind team captured the Atari Breakout screen while playing and then used the graphical information as input for a convolutional neural network (a type of model that can take both images and video as input). Almost all machine learning models need numerical values as input, but a screen capture is just a bunch of pixels. So the convolutional neural network converted the pixels of the game into a numerical representation that the reinforcement learning model could understand.
Then the reinforcement learning model was trained on this data, with the goal of maximizing the score shown on the screen. Over many iterations, it learned that the longer the ball stays in play, the more blocks are hit and the higher the score gets. It even learned that digging a tunnel through the side of the level, so the ball keeps bouncing on top of the blocks, is a good strategy.
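To make this a bit more concrete, here is a minimal Python sketch of the idea, not DeepMind’s actual code: a simple linear model stands in for the convolutional network, a made-up environment function stands in for the Atari emulator, and a plain Q-learning update nudges the score estimate of each action towards the reward the model actually received.

```python
import numpy as np

rng = np.random.default_rng(0)

SCREEN_PIXELS = 84 * 84      # a downscaled grayscale frame, flattened to a vector
ACTIONS = 3                  # stay, move paddle left, move paddle right

# One weight vector per action: a crude stand-in for the convolutional network.
weights = np.zeros((ACTIONS, SCREEN_PIXELS))

def q_values(frame):
    """Estimate the expected future score of each action from raw pixels."""
    return weights @ frame

def fake_environment_step(frame, action):
    """Hypothetical placeholder for the Atari emulator: returns the next
    frame and a reward (+1 when a block is hit, 0 otherwise)."""
    next_frame = rng.random(SCREEN_PIXELS)
    reward = float(rng.random() < 0.1)
    return next_frame, reward

frame = rng.random(SCREEN_PIXELS)
epsilon, alpha, gamma = 0.1, 0.01, 0.99   # exploration, learning rate, discount

for step in range(1000):
    # Mostly pick the action the model currently rates highest,
    # but sometimes try a random one to keep exploring.
    if rng.random() < epsilon:
        action = int(rng.integers(ACTIONS))
    else:
        action = int(np.argmax(q_values(frame)))

    next_frame, reward = fake_environment_step(frame, action)

    # Q-learning update: move the estimate towards reward + discounted future value.
    target = reward + gamma * np.max(q_values(next_frame))
    error = target - q_values(frame)[action]
    weights[action] += alpha * error * frame

    frame = next_frame
```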
What are the benefits?
Applying AI to games is beneficial because it gives access to an environment that is easily controllable and understandable by the people implementing the model. These kinds of models can figure out the rules of the application on their own. The developers did not tell the model how Atari Breakout works; it had to learn the mechanics of the game from visual information alone. It needed to understand the rules, figure out what the controls did, learn how to score points and develop a strategy, without any hardcoded information about the game itself.
Applying this kind of research is not only fun, but it is also a step towards autonomous learning in real-world environments!
DeepMind wins at Go
The general expectation was that AI wouldn’t be able to beat a human player at Go for a long time, but thanks to neural networks, DeepMind managed it as early as 2016! They beat the world champion of Go, Lee Sedol. For more details, check out the (very entertaining!) movie on YouTube about this match and the events that led up to it.
Why Go?
After chess, Go was the next big logical game for the AI community. If we compare chess with Go, chess might have more rules, but Go has far more unique board configurations. Where chess has around 10^120, Go has around 10^174. Think of it like this: multiply the number of chessboard configurations by the number of stars in the universe, then multiply the result by the number of stars in the universe again, and that is still orders of magnitude lower than the number of board configurations for Go. It’s way harder and, therefore, a new challenge for AI to overcome.
How did it work?
They made a large neural network model that only knew the rules of Go. The DeepMind team didn’t give it any strategy. First, it trained on millions of moves from recorded human games. Taking those models as a baseline, they started pitting small variations of the models against each other in a tournament-like fashion. In this process, the models that win go on to the next round, so that only the best-performing models remain. This whole training procedure can take weeks; training the model that eventually beat Lee Sedol took about six weeks.
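To picture the self-play stage, here is a toy Python sketch under heavily simplified assumptions: a model is reduced to a single “strength” number and matches are decided by a probabilistic stand-in rather than actual games of Go. It only illustrates the winners-advance selection loop, not DeepMind’s real training pipeline.

```python
import random

random.seed(42)

def make_variant(model):
    """Stand-in for copying a neural network and slightly perturbing its
    weights; here a model is just a single 'strength' number."""
    return model + random.gauss(0.0, 0.1)

def play_match(model_a, model_b, games=50):
    """Stand-in for playing full games of Go: the stronger model is more
    likely to win each individual game."""
    wins_a = sum(
        random.random() < 1.0 / (1.0 + 10 ** (model_b - model_a))
        for _ in range(games)
    )
    return wins_a > games - wins_a

best_model = 0.0   # would be the network pre-trained on human games

for generation in range(100):
    candidates = [make_variant(best_model) for _ in range(8)]
    # Single-elimination rounds: winners advance until one candidate remains.
    while len(candidates) > 1:
        candidates = [
            a if play_match(a, b) else b
            for a, b in zip(candidates[::2], candidates[1::2])
        ]
    challenger = candidates[0]
    # The challenger only replaces the current best model if it beats it.
    if play_match(challenger, best_model):
        best_model = challenger

print(f"final model strength (toy units): {best_model:.2f}")
```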
What are the benefits?
Similar to chess, the benefit here is showing how AI can solve an “intelligent” task. It also shows how powerful neural networks can be when there is enough data and time to train them.
The Go challenge can be seen as the chess challenge on steroids. Both games have long been accepted as a proof of intelligence for their players, and having an AI model master them and win against top human players should not be underestimated. But Go goes even further: its unimaginably large number of board configurations compared to chess makes a brute-force approach impossible. Unofficially, we can consider Go to have been the last man standing against AI in the collection of classic board games.
AlphaStar wins at StarCraft 2
AlphaStar is yet another DeepMind invention: a model that could play the video game StarCraft 2 at the game’s highest possible level, even beating top-ranked players.
Why StarCraft 2?
StarCraft 2 is what is called a real-time strategy (RTS) game. This means that the game is continuous in time. If a game is not real-time, it is usually turn-based, like Go, which pauses on every turn.
The premise of StarCraft 2 is to build up a base and then build units to try to defeat the opponent’s base and units. The game is also continuous in space, which means there is no fixed grid that units move on; they can move anywhere on the map.
Additionally, in these types of games, the map is not completely visible until units are commanded to explore it and gain that visibility. So unlike chess or Go, the complete state of the game is not known at all times; in other words, it is an imperfect-information scenario.
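As a tiny illustration of what imperfect information means here (the map size, sight range and positions below are made up for the example), the full map exists inside the game, but a player only “sees” the tiles close to their own units:

```python
GRID = 8          # hypothetical 8x8 map
SIGHT_RANGE = 1   # units can see one tile in every direction

unit_positions = [(1, 1), (6, 5)]   # where our own units currently are
enemy_base = (7, 7)                 # exists on the map, but do we see it?

# Collect every tile that lies within sight range of one of our units.
visible = {
    (x + dx, y + dy)
    for (x, y) in unit_positions
    for dx in range(-SIGHT_RANGE, SIGHT_RANGE + 1)
    for dy in range(-SIGHT_RANGE, SIGHT_RANGE + 1)
    if 0 <= x + dx < GRID and 0 <= y + dy < GRID
}

print("enemy base visible?", enemy_base in visible)   # False until we scout
```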
All of these points together make it a very challenging game to take on autonomously.
How did it work?
AlphaStar took the same approach as AlphaGo: first training on hundreds of hours of play data from human players, and then using that model to play against itself numerous times. In each iteration, a variable was changed, and the version that won would be used for the next iteration. In the end, the model was so good it could beat 99.8% of the StarCraft 2 player base.
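Here is a rough Python sketch of that idea, again under toy assumptions rather than DeepMind’s actual pipeline: agents are reduced to single “strength” numbers, match outcomes are simulated probabilistically, and a tweaked candidate is only kept if it wins the majority of its games against every earlier version in a growing league.

```python
import random

random.seed(7)

def win_probability(strength_a, strength_b):
    """Stand-in for playing a StarCraft 2 match; stronger agents win more often."""
    return 1.0 / (1.0 + 10 ** (strength_b - strength_a))

league = [0.0]   # starts from the model trained on human play data (toy 'strength')

for iteration in range(200):
    candidate = league[-1] + random.gauss(0.0, 0.05)   # tweak one 'variable'
    # Keep the candidate only if it wins most of its games
    # against every version already in the league.
    beats_everyone = all(
        sum(random.random() < win_probability(candidate, old) for _ in range(20)) > 10
        for old in league
    )
    if beats_everyone:
        league.append(candidate)

print(f"league size: {len(league)}, strongest agent (toy units): {league[-1]:.2f}")
```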
What is the benefit?
Creating a model that works in an environment that is continuous in time and space and has imperfect information is a step towards making models that could work in the physical world. The real world has all kinds of unknown variables and unexpected changes. If a model can hold its own in a simulated environment like StarCraft and make predictions over long sequences of actions, the same techniques could help with real-world challenges, such as autonomous rescue robots searching for trapped or endangered people.
Conclusion
After these examples, we hope to have given you a good understanding of how and why it can be important for AI research to utilize games, whether physical or digital. As we saw, it goes beyond the games themselves: games represent, in many aspects, situations that an autonomous agent might find in the real world. Poker has elements in common with diplomacy and negotiation; StarCraft reflects, in its micro-universe, navigation through unknown territory; Jeopardy! requires strong language understanding skills; and so on. We hope – and we’re already seeing it happen – that the lessons learned through games will contribute to the development of practical, real-world AI applications.
Learn more about the exciting work of IFS Labs here.
Coffee Break with AI is brought to you by Elisio Quintino and Martijn Loos.
Do you have questions or comments?
We’d love to hear them so please leave us a message below.