I now want to teach you an alternative method for planning. This method has a number of advantages and a number of disadvantages. It's called dynamic programming, and just like A-star, it's going to find you the shortest path. You give it a map of the environment, as in A-star, and one or more goal positions; let's assume just one goal position. What it outputs is the best path from any possible starting location. This planning technique is not limited to a single start location; it works for any start location.

Why would we worry about this? Let me give you an example. Suppose you are the Google self-driving car in an environment just like this. You're in this little street over here, and you're asked to turn right, but your goal is right over here. As before, there are two different lanes over here: a left-turn lane and a straight lane. If you reach the straight lane, the only way to get to the goal is to go around the block over here and proceed in this direction. You've seen this example before. Now, the point I want to make is a different one: your attempt to do a lane shift over here might fail. Why would it fail? Well, it could be that there's a big truck in this lane over here, and as you wait in the right lane for the truck to pass, the people behind you honk their horns. You really don't want to wait for the truck to disappear. That means the environment is stochastic: the outcomes of actions are non-deterministic. In our planning so far we ignored this, but in reality that's the case. In reality, you might find yourself thinking, "Wow, I'm over here. How did that happen?" Well, it happened because the world is stochastic, and this truck over here, this stupid truck, didn't let you in. What that means is you need a plan not just for the most likely position, but for other positions as well. What dynamic programming gives you is a plan for every position.
If we redraw this environment as a grid with a goal location and certain obstacles, then dynamic programming gives you an optimal action to take at every single grid cell. As you can see, each grid cell now has a label. That label is often called a policy, and a policy is a function that maps each grid cell to an action, where the action in this case is move left, move down, move right, or move up. Now, we will compute a policy using dynamic programming. That is, given a grid world like this and a goal state like that, we will write software that outputs, for each of the grid cells, the best thing to do should the robot find itself there. That requires a different algorithm than A-star. It happens to be a more computationally involved algorithm. As I said before, it's called dynamic programming for robot path planning.
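To make the idea concrete, here is a minimal sketch of dynamic programming on a grid. It assumes a deterministic world with a unit cost per move; the grid layout, goal position, and all names below are illustrative, not the lecture's exact code. Each cell's value is the cheapest neighbor's value plus one move, and the policy records which move achieved it.

```python
# 0 = free cell, 1 = obstacle (example layout, chosen for illustration)
grid = [[0, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 0]]
goal = (0, 3)
cost = 1

delta = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
delta_name = ['^', 'v', '<', '>']

def compute_policy(grid, goal, cost):
    rows, cols = len(grid), len(grid[0])
    big = 99                                  # stands in for "unreachable"
    value = [[big] * cols for _ in range(rows)]
    policy = [[' '] * cols for _ in range(rows)]
    value[goal[0]][goal[1]] = 0
    policy[goal[0]][goal[1]] = '*'

    # Sweep until no cell's value improves: each free cell takes the
    # cheapest neighbor's value plus the cost of one move, and the
    # policy remembers which move that was.
    changed = True
    while changed:
        changed = False
        for r in range(rows):
            for c in range(cols):
                if grid[r][c] == 1 or (r, c) == goal:
                    continue
                for (dr, dc), name in zip(delta, delta_name):
                    r2, c2 = r + dr, c + dc
                    if 0 <= r2 < rows and 0 <= c2 < cols and grid[r2][c2] == 0:
                        v = value[r2][c2] + cost
                        if v < value[r][c]:
                            value[r][c] = v
                            policy[r][c] = name
                            changed = True
    return value, policy

value, policy = compute_policy(grid, goal, cost)
for row in policy:
    print(' '.join(row))
```

Note how this differs from A-star: nothing here depends on a start location. One run fills in the value and policy for every reachable cell, so the robot can look up the best action wherever it finds itself, which is exactly what you want when stochastic outcomes can land you somewhere you didn't plan to be.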