hammertoe

A coffee-loving software developer.

I mentioned last week that I am about to start 'weeknoting'. Weeknotes are a return to the original essence of 'blogging', a term that was a portmanteau of 'web log'... literally a log of what is going on. For a more in-depth view of weeknotes and their resurgence through the GovCamp movement see here.

So I've just had my first full week in my new role as a Developer Advocate for IBM. How has it gone? Well, it has certainly been a pretty eventful week. Last Monday I went up to London to meet a few of the team. The IBM Developer Advocate London City Team are based in a big glass seven-storey WeWork building in Moorgate. The trip up from Bristol was delightfully uneventful. There is now a new direct train from Bristol Parkway to London Paddington, then about half a dozen stops on the tube to Moorgate, and the office is just next to the tube station.

I met some of the team. A great bunch of people I'm really looking forward to working with. One of the projects they are working on is a series of 'What is?' videos. These will be short 90-second videos explaining various topics and terms, such as containers, blockchain, etc. They plan to film a few of these each week, and I'm looking forward to getting stuck in, as this very much aligns with the longer-form videos I was doing on cryptocurrencies last year.

I've heard of WeWork recently (as many have) due to the financial issues it has been having. I've worked in co-working spaces before and never thought much of them, or what all the fuss about WeWork was. But this place is something else... it has an amazing buzz about it, partly due to the fact that all the offices are glass, so you can see from one side of the building right through about half a dozen offices and out the other side. It gives the place a very open feel, and you can see all the activity going on.

I actually nearly got recruited by someone else before even making it to the office! I was sat in the foyer of WeWork as I didn't have an access pass yet, waiting for someone from the team to come down and fetch me. A gentleman I didn't know (but I didn't really know any of the team yet) came up to me and offered out his hand:

“Matt”

“Yes” I replied

“Steve” he said.

I shook his hand and he ushered me towards a cafe-style table in the foyer.

“Coffee?”

“No, I'm OK thanks”

“Had an OK trip up?”

“Yeah, came up on the train, all went smoothly”

We chatted for a bit, then he said:

“So, you going to work after this?”

I looked at him, puzzled, then it hit me... I was not who he thought I was, and he wasn't one of the team I was due to be meeting. Turns out his name is Matt. And he was expecting to meet someone called Steve.

So... yeah... it turns out he was a recruitment agent doing interviews that day and thought I was a prospective candidate.

I'm sat on the train again now heading up. This has certainly been a test of the process this morning, as my wife's knee dislocated last night. She has an ongoing degenerative issue with it, and last night, just getting up from the dinner table, it went and she ended up on the floor. Also my youngest daughter is home from school with a cough. So this morning was spent racing about sorting out thermos flasks of tea and food for my wife, as she is bed-bound. My eldest daughter is home, so she can keep an eye on her. But between dealing with a very tired and upset 9 year old, having to take the dog out for a walk, and getting everybody ready before leaving the house, it was a bit of a push. But after a mad cycle to the station I made it in the end with 10 minutes to spare.

Most of the team should be in London today, and there is another new starter on the team, so we are all going out for lunch and I didn't want to miss it. I am also on-call for my old job this week as part of the transition process, so I'm dealing with incoming pager alerts too. It will be a busy day.

Also last week I gave a remote lecture on machine learning and AI to a class of Computer Science students at Samford University in Alabama.

Then later that evening I joined one of my new colleagues, Sean, to do a workshop in Bristol on serving machine learning models in the cloud. This was part of a series of IBM Developer events we run in cities around the country (and world), and I hope to be running more of these in Bristol in the coming year.

At the end of the workshop, Sean invited me up to do part of the talk I'd given to the students earlier. Both our talks touched on some common topics, and it was great to hear the same thing explained in different ways.

One of the main parts of my new role will be going to conferences and meetups and giving talks. However, the current coronavirus pandemic may put a bit of a dent in that. So far most of the big tech companies (IBM included) have cancelled their annual developer events. I am also due to give a talk at the DragonPy conference in Slovenia next month. The organisers are trying to decide if it will still go ahead. The Slovenian government has just banned gatherings of over 500 people. This conference is going to be smaller than that as far as I know, but still... not an easy decision for the organisers.

The rest of this past week, I've been settling in, doing some housekeeping on my laptop, tidying up / archiving stuff from my old role, and starting to look at various learning topics for my new role.

One of the things that interests me is IBM's LinuxONE offering. This is Linux running on IBM zSeries mainframes. IBM put this under their 'HyperProtect' offering. One of the key things that zSeries offers is encryption of data at all levels. So I spun up a HyperProtect VM on IBM Cloud and had a go at compiling and installing some stuff to see what it is like. There are a lot of packages already compiled for it, but I was interested in installing the latest Boost C++ package, which means compiling from source as the current Ubuntu package is a bit old.

I couldn't quite get Boost working. Well, it compiled OK, but there is a particular set of functions missing. I see in the official apt packages that they have some patches related to it, so I'm going to try and install the latest source package and see how I get on.

I was invited to give a guest lecture (remotely) to a class of Computer Science students at Samford University, Alabama on machine learning.

The lecture covered some of the posts I've written previously on this blog about machine learning experiments I've been doing, and also some practical examples of some of the algorithms they have learned / will be learning, such as Generative Adversarial Networks (GANs) and Convolutional Neural Networks (CNNs).

The video of the talk is available on Cinnamon or YouTube:

And the slides are available on Slideshare:

Thanks very much to Asst. Professor Brian Toone for inviting me.

If you wish to use any of the tools mentioned at the end of the video, such as Watson Data Studio, you can sign up for IBM Cloud.

So, I'm currently sat on the train from Bristol to London to start my first day as a Developer Advocate for IBM. I thought I'd note down some of my initial thoughts before I have my first day, so that I can look back to see how things compare.

One of my goals will be to start writing weekly 'weeknotes' as I go along, as this is a job in which I feel I can really expand my own personal development. I'm going to be making a weekly trip to London, so my plan is to use the train journey as a chance to reflect on each week and look at what went well and what didn't. What was frustrating, what did I learn, what excited me, etc.

I've been quite quiet on Coil in the past few weeks as I've been getting things ready to start this new role, but I will be back to writing about neural networks and AI... and coffee... again soon, I promise! In fact, this will kind of be my actual job now!

So, what is a “Developer Advocate”? It is a role which helps developers get the best out of a platform or tool. It is not a sales or marketing role, but an education role. Whilst ultimately the goal for my employers is to get more people using their products, it is about empowering developers to make informed choices and to get the best out of the tools and products they have available.

The role is also sometimes known as “Developer Evangelist” or “Developer Relations”, and this is something I want to ask my new colleagues about: do they see these terms as interchangeable, or do they see differences between them?

As some of you may know, I love doing talks, and writing about technologies. I have an innate desire to learn myself, and my main mantra is that if I can't explain how something works to someone else then I don't really understand it myself.

In the past, I have spoken at numerous conferences, workshops, meetups, etc. and organised several large developer conferences. Most recently I gave talks at PHPSW and SW Mobile on an Introduction to Machine Learning and Artificial Intelligence. Tomorrow I will be attending an IBM Code Bristol event on getting machine learning models ready for the cloud. I am also doing a remote lecture tomorrow for Computer Science students at Samford University. I've also been blogging about machine learning, and was doing a weekly talk on explaining cryptocurrencies. Next month I will be doing a talk on machine learning in Ljubljana at the DragonPy conference.

So this new job is right up my alley. I will be paid to educate people and play around with new technologies and services that IBM and the wider world create. This is kind of a role I've done in the past, but never in a paid capacity (so yes, cryptocurrency folk, I guess you can now call me a 'paid shill'). It has always been an incidental role. I ran a web development agency for 16 years and it was one of my main roles as Technical Director, but it always seemed a hard role to justify in a small team. It is something that is very hard to quantify. I'll be interested to see how the IBM Developer Advocacy team measure their success, as I think they likely have a better means of tracking metrics.

I don't yet know what my day-to-day role will involve exactly, but I think my first tasks will be to start scouting for events and conferences to attend and to learn more specifically about some of IBM's offerings around machine learning, AI, containers, and the like. I've met (virtually) a few of my new colleagues already, and I'm super excited to be working with them, as they seem like a very talented and enthusiastic bunch of people.

So, if you've not had Flammkuchen, think of it a bit like a pizza, but with no cheese. This is ideal as my wife Vik hates cheese. But until now, we'd only ever bought them ready-made. There is a brand we often get (I can't remember the name offhand) in a white box... it sounds more Scandinavian than German. They live in the freezer and can be cooked at a moment's notice from frozen.

But this evening my wife saw a recipe for making them from scratch. We had no idea they were so easy! Easier than making pizzas, even, as the dough doesn't need to be left to prove at all.

And... oh. my. god. It was amazing! So quick to make, and the base came out so much better than the pre-made ones. So here we go:

Ingredients

  • 250g (2 cups) plain white flour
  • 30ml (2 tbsp) olive oil
  • 125ml (½ cup) of water
  • 2.5ml (½ tsp) salt
  • half a white onion – finely sliced
  • 100g (3 ½ ounces) of smoked bacon lardons (or chopped bacon)
  • 30ml (2 tbsp) crème fraîche
  • 30ml (2 tbsp) sour cream

Preparation

  1. Heat the oven to as hot as it will go (about 275°C / 527°F)
  2. Put the flour, oil, water, and salt in a bowl, mix it together with a spoon, and then knead it for a few minutes with your hands until smooth and elastic
  3. Line a baking tray with greaseproof paper and roll the dough out on the tray into a rectangle as thin as possible
  4. Mix the crème fraîche and sour cream together and then spread all over the base with the back of a spoon, leaving a 1cm (½ inch) gap at the edge
  5. Scatter the sliced onion and bacon lardons over the cream

Cook

  1. Put in the oven on the top shelf for 10 minutes (keep an eye on it as the top can burn quickly)
  2. Cut into slices
  3. Eat!

Writing simple trading bots, or algorithms, to trade a cryptocurrency, commodity, or stock is pretty simple, right? You just need to buy low and sell high... easy, right?

Well, as anyone who has attempted to do this will tell you, it's not that simple, as there are a myriad of complexities. In this post I'll talk about just one aspect: 'hyperparameter tuning'. I'm going to slightly abuse the term hyperparameter here for this example. Typically, a hyperparameter is a term used in machine learning to describe a 'meta parameter'. That is, not the parameters that the machine learning algorithm itself is learning, but the parameters about the learning process as a whole.

An analogy: what books I decide to read at university in order to learn might be a parameter, but what university I go to in order to do that might be a hyperparameter.

In this post I'm talking about using a Python library called hyperopt to tune the parameters of a simple trading algorithm. So we are actually tuning parameters, not hyperparameters, but the library doesn't care. Hopefully this will become clear below. Just think of it as trying to tune some parameters.

Let's say we devise a very simple trading strategy to instruct some software to automatically trade a currency (or cryptocurrency, stock, commodity, etc.):

Buy when the price goes above the 21-period moving average, sell when it goes below it.

Pretty simple, huh? The aim is to try and detect some kind of trend in the movement.

In that algorithm, “21” is a (hyper)parameter. It could be 18, or 53, or 1, or 10,000. How do we choose the best one to ensure that our algorithm is profitable? Too small and the algorithm will trade too much and likely lose money in fees. Too big and we'll only trade once in a blue moon and by then the price will have already moved quite a lot.
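To make that rule concrete, here is a minimal sketch of how it might be expressed in Python with pandas. The prices are made up and the helper name is my own; it just illustrates turning a moving-average crossover into buy/sell signals, not the exact code behind the charts below.

```python
import pandas as pd

def crossover_signals(prices: pd.Series, period: int = 21) -> pd.Series:
    """+1 where the price crosses above its moving average (buy),
    -1 where it crosses below (sell), 0 otherwise."""
    ma = prices.rolling(period).mean()
    above = prices > ma
    return above.astype(int).diff().fillna(0).astype(int)

# Made-up example prices
prices = pd.Series([100, 101, 103, 102, 99, 98, 101, 104, 105, 103])
print(crossover_signals(prices, period=3))
```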

What does the problem look like?

Let's take some guesses, plot them out, and see what they look like. In the charts below I'm plotting the price of USD/JPY and a 5-period moving average. The green triangles indicate where the price rises above the moving average and we should buy. The red ones indicate where it falls below and we should sell.

Quite noisy. Lots of trades going on. And we will be charged a commission on each trade, so we will lose a small amount each time. Let's look at a few more:

As you might be able to see, we do really badly on some of them. Look at the last one. We buy in mid-February at around 110.2 Yen and then sell in mid-June at a lower price of around 107.5 Yen... not what we want to do!

What can we do?

So what is the best number to use? What is the best period for the moving average? That is where the parameter tuning comes in.

We could just try every number. Computers are fast, right? Try every number between, say, 4 and 200 and see what works best. That is only 196 possible values. This is known as a 'brute force' approach, and it will take a computer less than a second to work out.
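As a sketch, a brute-force sweep over the period might look something like this. The toy backtest (go long above the moving average, flat below it, with a small fee whenever the position changes) and the random-walk prices are stand-ins I've made up for illustration, not a real simulation.

```python
import numpy as np
import pandas as pd

def backtest(prices: pd.Series, period: int, fee: float = 0.001) -> float:
    """Toy backtest: long above the moving average, flat below it,
    paying a small fee every time the position changes."""
    ma = prices.rolling(period).mean()
    position = (prices > ma).astype(int)            # 1 = long, 0 = flat
    returns = prices.pct_change().fillna(0)
    trades = position.diff().abs().fillna(0)
    strategy = position.shift(1).fillna(0) * returns - trades * fee
    return float(strategy.sum())

rng = np.random.default_rng(0)
prices = pd.Series(100 + rng.standard_normal(500).cumsum())

# Brute force: try every period and keep the most profitable one
results = {p: backtest(prices, p) for p in range(4, 200)}
best = max(results, key=results.get)
print(best, results[best])
```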

But what if our strategy is more complex? What if we have several parameters we need to tune simultaneously? Rate of upward trend, rate of downward trend, stop loss position, etc.? We could quickly end up with hundreds of thousands or even millions of combinations to try out. And what if we want to test on several years of data? Maybe using 5-minute intervals, not daily intervals? What if each attempt takes longer because we are testing out our SuperFancyUltimateMoneyMaker2000 strategy?

Brute force won't cut it. We could be waiting for hours, days or weeks for a computer to try all possible combinations.

So what if we could do something more clever? What if we could try some random combinations, look at whether they give good results or not, and if they do, then try other values 'near' those good ones?

This is what Hyperopt lets us do. It uses an algorithm called Tree-structured Parzen Estimator (TPE) to more intelligently 'search' the space of all possible combinations and find the best ones. The end result is that a search which could take a whole day with the brute force approach can now take mere minutes.

And what is great about Hyperopt is that it is really simple to use. You just need to define two things:

  1. The function you want it to run. In this case, a function that simulates trading as detailed above and takes one or more parameters you want to optimise.
  2. The parameter space.
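
As a rough sketch (reusing the toy backtest() and prices from the brute-force example above), the skeleton looks something like this; the parameter space and number of evaluations here are illustrative.

```python
from hyperopt import fmin, tpe, hp, Trials

def objective(params):
    period = int(params["period"])
    # hyperopt minimises, so return the negative profit as the loss
    return -backtest(prices, period)

# The parameter space: the moving-average period, anywhere from 4 to 200
space = {"period": hp.quniform("period", 4, 200, 1)}

trials = Trials()
best = fmin(fn=objective, space=space, algo=tpe.suggest,
            max_evals=100, trials=trials)
print(best)   # e.g. {'period': 21.0}
```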

Below I'll dive into the actual code and the results of the optimisation we did. If you are not a Coil subscriber, now will be a good time to subscribe ;)

Header photo by Mikael Kristenson on Unsplash

Read more...

So following on from my last post about trying to generate cartoon ducks using AI (and accidentally producing something quite Warhol-ish), I decided to try and generate a new profile pic for myself on our intranet at work. A colleague of mine said that my current pic makes me look like Mr Noodle from Sesame Street. No, you can't see the pic. But maybe my beard was a bit too unruly at the time, and maybe the wall behind me was a bit too bright and primary-colourish.

This is a post in my series on machine learning and artificial intelligence. You can find more posts on this topic at the main index.

[Blog: Machine Learning and Artificial Intelligence](https://coil.com/p/hammertoe/Blog-Machine-Learning-and-Artificial-Intelligence/1xVFVh1yj)

So could I use the same technique of Generative Adversarial Networks (GANs) to produce a new image of “me”?

Let's recap how these networks work with a little analogy:

“So Sir, can you describe to us the person who robbed you of your wallet?”

“Yes officer, he was male, early forties, caucasian, 5'9”, short brown hair, glasses, a beard and moustache”

[sketch artist works furiously]

“Like this?”

“No, the glasses were thinner, wire-framed type”

[sketch artist draws a new drawing with different glasses]

“Like this?”

“Yeah... maybe smaller nose”

[sketch artist draws a new drawing with smaller nose]

We have two neural networks: one, the generator (the sketch artist), and a second, the discriminator (me). The first one is creating new images and the second one is trying to critique them. If the critic can't tell the difference between a 'fake' and a 'real' image, then the generator (sketch artist) has learned how to produce good likenesses of the subject.

So first, I needed a whole load of real images to feed to the discriminator in amongst the generator's 'fake' ones, to see if it could tell the difference.

Luckily, Apple iPhones already have some machine learning in them to identify and categorise people. So I could easily copy 300 pictures of myself from the last 5 years from the phone to my desktop computer for processing.

I then opened them all up in Preview and very quickly and roughly cropped them to just my face. I discarded those in which I was wearing sunglasses or was at a very odd angle to the camera.

I then fed those images into the GAN from before. And out of the Gaussian noise, a somewhat recognisable me started to emerge...

Further...

So there are definitely some likenesses there, but in most of them I still look like some apparition from a horror film.

I realised that the images all being slightly different crops and orientations was giving the GAN a hard time, bearing in mind this is a fairly simple network and we are far from the state of the art.

So I realised that I could use another bit of machine learning to pre-process the images. I could use a computer vision library to detect my facial features and then rotate, shift and scale the image such that at least my eyes were in the same place in each photo.

If you want to know the full technical details and code on this, I've written a separate (subscribers only) post on Using OpenCV2 to Align Images for DCGAN.
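That post has the full details, but as a rough illustration of the idea, here is a sketch using OpenCV's bundled Haar cascade for eye detection. The detector, output size, and target eye positions are assumptions I've made for the example rather than the exact settings from that post.

```python
import cv2
import numpy as np

# OpenCV's bundled Haar cascade for eyes (an illustrative stand-in; a facial
# landmark detector would be more robust)
eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye.xml")

def align_face(img, size=128, eye_y=0.4, eye_dist=0.3):
    """Rotate, scale, and shift img so both eyes land at fixed positions."""
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    eyes = eye_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(eyes) < 2:
        return None                                   # need both eyes
    # keep the two largest detections and order them left-to-right
    eyes = sorted(sorted(eyes, key=lambda e: -e[2] * e[3])[:2], key=lambda e: e[0])
    (lx, ly, lw, lh), (rx, ry, rw, rh) = eyes
    left = (float(lx + lw / 2), float(ly + lh / 2))
    right = (float(rx + rw / 2), float(ry + rh / 2))
    # rotation and scale that make the eye line horizontal and a fixed
    # fraction of the output width long
    dx, dy = right[0] - left[0], right[1] - left[1]
    angle = float(np.degrees(np.arctan2(dy, dx)))
    scale = (eye_dist * size) / float(np.hypot(dx, dy))
    centre = ((left[0] + right[0]) / 2, (left[1] + right[1]) / 2)
    m = cv2.getRotationMatrix2D(centre, angle, scale)
    # shift so the midpoint between the eyes is in the same place every time
    m[0, 2] += size / 2 - centre[0]
    m[1, 2] += eye_y * size - centre[1]
    return cv2.warpAffine(img, m, (size, size))

aligned = align_face(cv2.imread("me.jpg"))            # "me.jpg" is a hypothetical filename
if aligned is not None:
    cv2.imwrite("me_aligned.jpg", aligned)
```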

But the end result was pretty cool... and quite spooky. So here is an animated gif showing a few of the input images:

Notice how my face is a slightly different size, and my eyes move about in different locations? Compare that to the aligned images:

Pretty eerie, right? Those are the same images as above, but the pre-processing AI has calculated the location of my eyes, and transformed the image such that my eyes are in the same location in each image.

So let's feed that back into the GAN and see how we do...

Not bad! There are certainly some where it looks like I've been hit in the face with a shovel a few times... but overall it has actually done pretty well.

Again, what is pretty amazing is that none of these images ever existed. They are not distorted photos, but they are the result of a machine learning algorithm learning what my facial features look like and creating an entirely new and unique image of me.

So to recap, in the end there were three entirely separate machine learning / AI algorithms used in this process:

  1. The machine learning on my iPhone that had analysed all my photos and clustered the ones of the same people together, such that I could easily find photos of me
  2. The OpenCV script used to pre-process the images, which located my facial features in the photos and then rotated, scaled, and shifted each image such that my eyes were in the same place in every image
  3. The Generative Adversarial Network (GAN) that was used to generate entirely new images from the existing ones.

For Coil subscribers, there is another animation showing the learning as it progressed.

Read more...

So, firstly, thanks to Alloy for the Magritte reference for the title of this post.

[Blog: Machine Learning and Artificial Intelligence](https://coil.com/p/hammertoe/Blog-Machine-Learning-and-Artificial-Intelligence/1xVFVh1yj)

This is a post in my series on machine learning and artificial intelligence. You can find more posts on this topic at the main index.

Yesterday, a friend on Twitter, Munch, put out an art challenge:

Well, I'm rubbish at art. But for a bit of fun, decided that rather than showcase my rubbish art, could I instead showcase a computer's attempt at art? What could go wrong? Can I teach a computer to draw a duck and fill in the blank template above?

So, introducing to you the concept of Generative Adversarial Networks... or GANs for short. Gen...what? These are a type of machine learning neural network that attempts to generate new data (generative) by learning from a series of samples and trying to outwit a second neural network (adversarial).

So how does it work? Let's imagine you and I play a game. Your job is to show me a series of James Bond films some of which are real, and some of which are ones that you have produced yourself. My job is to try and guess which of the films is real, and which is fake. Got it so far?

So firstly, you watch all 26 James Bond films and try to distil down the essence of a James Bond film. What features make it a James Bond film if you had to describe it to someone?

  • Athletic male lead, who is a British secret service agent
  • Lots of gadgets like X-ray glasses, and camera tape recorders
  • Car chases... cars with guns in them etc
  • Esoteric villains
  • A sexy adversary
  • Great music
  • etc

Using that list of 'features' of a Bond film, you now produce your own Bond film... you might be good at it, you might be bad at it. You are probably going to be pretty lousy at it on your first attempt.

My job is to look at the film you produce and decide if it is a real Bond film or fake. If it is very easy for me to guess a real one from a fake and I get them all right, then that means you are doing a poor job at making fakes.

The goal here is for you to keep trying to make fakes to fool me. And my job is to keep feeding back to you how well I can tell the real from the fake. If I can't tell the difference between a real Bond film and a fake one, then you have done a very good job of producing a new, unseen, fake Bond film.

In terms of a machine learning architecture, it looks like this:

source: https://www.slideshare.net/xavigiro/deep-learning-for-computer-vision-generative-models-and-adversarial-training-upc-2016

You have one branch at the top (yellow) feeding in real Bond films, and you have a generator at the bottom (blue) creating fake ones (your job). And you have me, the discriminator (red), trying to tell if they are real or fake. My error rate in judging them, the loss, is fed back to the start for the next iteration. Based on that loss, you can determine which features were important or not, i.e. if you put a dragon in your Bond film and I feed back that clearly this is not a Bond film, you can infer that dragons are not a legitimate element of a Bond film.
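To make the architecture a bit more concrete, here is a minimal GAN training loop in Keras. It uses MNIST digits rather than duck drawings, and the layer sizes and hyperparameters are only illustrative, but the shape of it (train the discriminator on a half-real, half-fake batch, then train the generator through a frozen discriminator) matches the diagram above.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers

latent_dim = 100

# The generator turns a vector of random noise into a 28x28 image
generator = models.Sequential([
    layers.Input(shape=(latent_dim,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(28 * 28, activation="tanh"),
    layers.Reshape((28, 28, 1)),
])

# The discriminator tries to tell real images from generated ones
discriminator = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
discriminator.compile(optimizer=optimizers.Adam(learning_rate=2e-4),
                      loss="binary_crossentropy")

# Combined model: with the discriminator frozen, training this teaches the
# generator to produce images the discriminator labels as "real"
discriminator.trainable = False
gan = models.Sequential([generator, discriminator])
gan.compile(optimizer=optimizers.Adam(learning_rate=2e-4),
            loss="binary_crossentropy")

(x_train, _), _ = tf.keras.datasets.mnist.load_data()
x_train = (x_train.astype("float32") / 127.5 - 1.0)[..., np.newaxis]  # scale to [-1, 1]

batch = 64
for step in range(2000):
    # 1. train the discriminator on a half-real, half-fake batch
    real = x_train[np.random.randint(0, len(x_train), batch)]
    noise = np.random.normal(size=(batch, latent_dim))
    fake = generator.predict(noise, verbose=0)
    d_loss_real = discriminator.train_on_batch(real, np.ones((batch, 1)))
    d_loss_fake = discriminator.train_on_batch(fake, np.zeros((batch, 1)))

    # 2. train the generator (via the frozen discriminator) to fool it
    g_loss = gan.train_on_batch(noise, np.ones((batch, 1)))

    if step % 200 == 0:
        print(step, d_loss_real + d_loss_fake, g_loss)
```

The trick is that the discriminator is compiled on its own first, so it still learns when trained directly, while the combined model treats it as frozen so that only the generator's weights change when we try to fool it.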

Show me the ducks!

So, I needed:

  • a working GAN algorithm
  • some ducks

For the algorithm, I found one on GitHub (a code sharing/collaboration site) that I could use for this task. This one had been used originally to try and generate new Simpsons characters, so it might be good, as it is already tuned to cartoons:

I loaded the code into Google Colab, which is another collaborative tool that allows you to run Python code on a cluster of computers run by Google. This is helpful as GANs take quite a lot of computing power, and it means I could run it faster than I could on my computer at home.

I also needed a bunch of 'real' images of duck drawings for it to use. I searched Google for “draw a duck share your art”, which is the meme this was from, and downloaded a whole bunch of them:

I uploaded them to where my code could load them and then set the algorithm to run. My first attempt was for 300 'epochs'; that is, there were 300 iterations of trying to generate duck images and trying to fool the discriminator.

In the first attempts you can't see anything duck-like at all, as the generator has started with random data and still hasn't worked out what 'features' the output images should have. Here are 5 samples of 'ducks' from epoch 24:

Not very exciting, huh? Just grey blobs...

As it progresses, it starts to try various colours... and we end up with a nice tweed sort of colour. Great for cushions perhaps, but still not a duck. Epoch 100:

By the time we get to epoch 250, we are getting some slight duck-like features showing in the images. You can spot the yellow feet and beak emerging:

But by the time we get to epoch 300, it's all gone wrong again:

What happened?! Well, it turns out the algorithm kind of overshot the mark. There is actually a graph produced of the learning losses of both the generator and the discriminator:

You can see that they were converging until around epoch 250, then the generator kinda went a bit nuts.

So, I decided to run it again, reducing the 'learning rate' of the algorithm. This means it will learn a bit slower, but hopefully won't overshoot as much. I left it running overnight for 600 epochs to see how it got on. The graph of the learning looks a lot more positive this time. As you can see by the time it got to the end it was still pretty stable. It looks like it still has a bit of a 'wobble' around 300, but then seems to stabilise again.
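For what it's worth, 'reducing the learning rate' just means telling the optimiser to take smaller steps on each update; something like the line below, where the values are purely illustrative rather than the ones used in this run.

```python
from tensorflow.keras.optimizers import Adam

# e.g. halving a typical DCGAN learning rate of 2e-4: smaller steps make the
# generator less likely to overshoot, at the cost of learning more slowly
optimizer = Adam(learning_rate=1e-4, beta_1=0.5)
```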

So... the moment of truth... let's see what our ducks looked like!

Here is a sample of them, from near the end at iteration 589:

Whilst I don't think I will be winning any awards with these ducks, I think it has certainly managed to get on the right track!

I think due to the high variance of input duck images, it had a hard time trying to work out what a duck should look like. With more samples, and perhaps more iterations it might do better. There are also a number of parameters to the algorithm that can be tuned to try and get better results. But for a quick un-tuned attempt, I'm pretty impressed.

And for Coil subscribers below you can see a really cool animation of the 'evolution' of the algorithm learning what a duck is :)

Part II of this adventure continues with: Ceci n'est pas un Matt – Machine Learning and Generative Adversarial Networks – Part II

Read more...

This is a blog about machine learning and artificial intelligence. My aim is to introduce people to the concepts and to explain some of the ideas behind it. Some posts will be aimed at trying to explain the concepts to less technical audiences, some will go into more depth on machine learning with code samples in Python.

Below is a list of the posts I've written or intend to write. This will be a living document and updated over time serving as a master index. If there is a specific topic you want covered, then let me know on Twitter.

Posts that are subscriber only are denoted with a padlock symbol 🔒

Machine Learning Concepts

Random fun stuff

More Technical Posts

[header photo credit Josh Riemer]

In this series I am going to be talking about machine learning and artificial intelligence. I will be covering three main branches of machine learning: supervised learning, unsupervised learning, and reinforcement learning. In this post I'll be talking about reinforcement learning, as it is one of the ones I find the most fun.

This is a post in my series on machine learning and artificial intelligence. You can find more posts on this topic at the main index.

If you haven't already, and need a primer, go and read the introduction to machine learning that I wrote previously: What is Machine Learning / Artificial Intelligence?

So, here we go. Who has one of these at home?

No, not an Ewok... it is in fact a dog, Violet, my pug/bichon cross. If you have a dog or are a parent, you probably have already come across reinforcement learning. It is the act of learning via a reward. Reinforcing good behaviour with a treat. Perhaps discouraging unwanted behaviour via a punishment.

You tell your dog to sit. It has no idea what the word 'sit' means. It looks at you. You give it a hint, maybe holding a treat above its head so it sits down to look up. And you give it the treat. You repeat this over and over. And the dog learns to associate the command 'sit' with the desired outcome.

Reinforcement learning in computers is pretty much exactly the same. It is an iterative process by which an algorithm 'learns' desired behaviour by way of being given a 'reward'.

This process is shown above as a state diagram. We have an agent (the dog), that performs an action (sits), in an environment (your house), which results in a new state (a sitting dog) and a reward (tasty treat).

What is so great about reinforcement learning is that you don't describe how the result should be achieved, you just reward the desired outcome. I.e. you don't tell your dog to bend its rear legs to sit its backside on the ground. You just reward it when it sits down.

Let's move away from pets, and show this in the context of a computer learning to play a simple game:

Do you remember the Atari game “Breakout”? In the game you control a small horizontal “paddle” at the bottom of the screen. You have a joystick or controller and can press “left” or “right” to move the paddle. A ball bounces around the screen and your job is to stop the ball falling out of the bottom of the screen. A bit like pinball. You have to bounce it off the coloured blocks at the top and try to destroy them all. You get a score for each block you destroy.

We can use reinforcement learning to teach a computer how to play this game. We have two possible actions: left and right, we have a state which is the grid of coloured pixels that make up the screen, and we have a reward, the high score. So the job of the algorithm is to look at the pixels (the image on the screen), move the paddle left or right, and observe the score as a reward.
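In code, that agent-environment loop looks something like the sketch below, with a random policy standing in for the learning algorithm. It assumes the gymnasium package and its Atari environments are installed, which is my own assumption for the example rather than the setup behind the results discussed here.

```python
# Random-policy sketch of the agent-environment loop (not a learning agent yet)
import gymnasium as gym

env = gym.make("ALE/Breakout-v5")
obs, info = env.reset()

total_reward = 0.0
done = False
while not done:
    # state: the grid of coloured pixels (a 210x160x3 array)
    # action: an integer index into [NOOP, FIRE, RIGHT, LEFT]
    action = env.action_space.sample()   # a trained agent would choose based on obs
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward               # the game score is the reward signal
    done = terminated or truncated

print("episode score:", total_reward)
env.close()
```

A reinforcement learning algorithm replaces that random choice with a policy that is updated, episode after episode, to favour the actions that led to higher scores.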

At first the algorithm will do very badly. It has no idea what it is doing (just like a human player in their first few seconds of playing). It will randomly move the paddle left and right. It has no idea what the coloured pixels on the screen represent, or even what the ball is.

But with each successive iteration it learns a little bit. In most cases RL algorithms use a neural network to process the images. I'll describe these in a later post, but they are loosely modelled on the human brain and the neurons in it.

Above is a graph showing the score after each game of Breakout played, for various RL algorithms. Each coloured line is a different algorithm, and each one learns in a slightly different way. The axis along the bottom is in millions of steps, so this graph covers two million steps (iterations) in total. As you can see, at the start they all get a pretty low score. Then, over time, they learn. Some faster than others. For example, PPO doesn't get a very high score overall (around 150). Others, e.g. C51, got scores over 500.

At the end, the algorithm has learned that it needs to position the paddle to where the ball is heading in order to bounce the ball back up to the blocks. It has learned to play the game.

What else can we use reinforcement learning algorithms for?

Read more...