3.3 Pathfinding Algorithms
A pathfinding algorithm is a computer program that finds the shortest route/path between two points. This means using an algorithm to find the best way through a maze. It is mainly based on graphs and graph theory, which is why it can be really slow.
The most common pathfinding algorithms include:
- A* (Best Overall very flexible and great use cases)
- Breadth First Search (decent, starts from the root and explores children of the root children)
- Depth First Search (decent, starts from the root and sees one of it’s children’s sub tree)
- Greedy (decent, makes educated guesses, the node with the least estimated distance from the current node to the end node will be used next)
We will discuss the first three.
DFS (Depth First Search)
DFS is one of the easiest search algorithms. Since it only goes in one direction until it arrives at a stop, it is the easiest to implement. The way DFS works is out of its current neighbors, it picks the first one it sees and visits it, then it does the same thing. Again and again and again, until it arrives at a stop. Then it goes back one level and then picks the other neighbor. This process keeps on going until we finally reach the endpoint.
DFS can also be implemented in 2 unique ways, recursively and iteratively.
Recursively:
In this case, this is our logic:
- Check if we have visited the node we are on.
- If we have, don’t do anything.
- If we haven’t, visit this node.
- Once we visit this node add this to our visited set and then visit all the other neighbors using this function.
Here’s what that would look like in code:
def dfs(visited, graph, node): if node not in visited: print(node) visited.add(node) for neighbour in graph[node]: dfs(visited, graph, neighbour)
View code on GitHub.
Iteratively:
In this case we use a stack. Visualize a stack of disks. When you put a disk on top, you can only remove the disk on top. If you remove a disk on the bottom everything will get destroyed. This is similar to what a stack does. You can only remove from the top and you can only add to the top. First in, last out. So, as we keep on adding new neighbors to the stack, whenever we use the next neighbor, that will always be the most recent one. That means when the newest ones are being used first, we are going down the same path. If you have neighbors A, B, C. These are in a stack, so we can get the most recent added, C. Next, you will find 3 more neighbors, D, E, F. Now these also get added to the stack. So, our stack is (from bottom to top), A, B, (removed), D, E, F. Now we take the most recent one from the stack again, F. Notice how we just left A and B alone, but they are still in the stack. If we ever end up with 0 neighbors, we will start using the neighbors that we previously left behind in the stack since no new ones got added. So the old ones are the most recent ones. Using this step by step approach, we will go through all the nodes.
def dfs(graph, start): stack, path = [start], [] while stack: node = stack.pop() if node in path: continue path.append(node) for neighbor in graph[node]: stack.append(neighbor) return path
View code on GitHub.
BFS (Breadth First Search)
BFS is a search algorithm that searches a tree or graph, depth by depth. This means that it will finish looking through all the nodes in the current depth before moving on to all the nodes that the current nodes are linked to. Let’s run through how to searches a tree first:
Given tree:
First step, visit the start of the tree and identify neighbors:
Second step, visit neighbor one:
Third step, visit neighbor two:
Then keep repeating from there:
Now let’s go over each part more specifically. A couple of terms that you should familiarize yourself with, are the following:
- Nodes: Nodes in a tree are the connecting points in the tree. In the given example, the nodes are the numbers.
- Visiting: When you “visit” a node, you check it/that’s your current node.
- Neighbor: The surrounding nodes which the current node is connected to.
- Depth: The number of connections getting to a node takes. In the given example, 1 and 2 are at depth 1, 3 and 4 are at depth 2, 5 is at depth 3, etc.
In order to learn how to implement a BFS, we need to know how our tree is stored. Let’s say this is how we define our nodes:
class Node: val = 0 # value of the current node* neighbors = [] # connecting nodes of the current node*
In this code, our neighbors are more nodes. So, we can do something like
current_node.neighbors[0].neighbors[1].val
To get a value of 3, assuming we are starting at the left-most node and our neighbors are sorted top to bottom (according to the graph),
This is what our logic will look like while implementing BFS.
Now let’s write it:
def BFS(start_node): visited = [start_node] current_depth_nodes = [start_node] while len(current_depth_nodes) != 0: # this depth is not empty current_node = current_depth_nodes[0] current_depth_nodes.pop(0) # remove # add each neighbor to this depth for neighbor in current_node.neighbors: if neighbor not in visited: visited.append(neighbor) current_depth_nodes.append(neighbor)
Now when we run this it is not very efficient. This is mostly due to the data types that we are using. Right now, we only use lists, but if we use sets and queues, we could improve our performance.
from queue import Queue def BFS(start_node): visited = set(start_node) current_depth_nodes = Queue() current_depth_nodes.put(start_node) while len(current_depth_nodes) != 0: # this depth is not empty # returns and deletes the first element current_node = current_depth_nodes.get() # add each neighbor to this depth for neighbor in current_node.neighbors: if neighbor not in visited: # adds element to end visited.add(neighbor) # adds element to end current_depth_nodes.put(neighbor)
View code on GitHub.
This increases the performance of our code because the data types that we use now are specifically meant for this job and therefore deliver better and quicker results.
A*
A* is a really efficient search which expands on BFS by using a simple idea. If you watch a BFS algorithm searching, you are going to notice that it checks for the endpoint in all directions. If you watch an A* algorithm searching, you are going to notice it search towards the endpoint. This helps you find your endpoint quicker compared to BFS because it has to search less points as a total.
Take a look at the following images, and you’ll get a picture of why A* is more efficient. Note: the green represents the starting location, the yellow represents searched areas, and the red represents the end goal.
BFS (searches in all directions around the start)
A* (searches towards the end)
In A* we are going to search through all the neighbors, like we do in BFS, but we are going to assign each neighbor a score. That score is calculated by taking the distance it takes to get there, and the estimated distance it takes to get from the neighbor to the endpoint. Once we add these 2 values up, we can approximate the best node to go to from our current situation.
For our example: Since we are showing an example of a grid and using the A* search to find the shortest path from start to finish, all the cells in the grid are analogous to nodes. Also, our 2 functions that make up our score will be called h and g. The g score is going to be the distance it takes to get to the cell from start to cell, the h score (also called heuristic) is going to be the estimated distance from a node to the finish. Our final score, let’s call if f score, is going to be the sum of the distance it takes to get there(g score), and the estimated distance it takes to get to the finish (h score).
We are going to be using an algorithm similar to BFS, but we are going to incorporate the score of the cell in order to improve our search. Let’s get started with A*.
An important data structure that A* uses is the PriorityQueue. The PriorityQueue holds all the data that you give it. But when you ask for data back, it gives you the data with the lowest priority. For example, if we put in 2 tuples, (5, None) and (12, None), no matter what order we put them in, PriorityQueue will give them back in the same order, which is, (5, None) and then (12, None). You can think of it as it sorts them in the background, and then when asked for, returns the lowest elements. When we use this data structure in our algorithm, the f scores are going to be the “priority level” (similar to the number in the tuples example). When we ask for them back, we are going to get the lowest f scores back, and the lowest f scores are the paths which have the smallest distance. We will call our PriorityQueue the open set, or open for short. It is going to contain all of our neighbors and their f scores for sorting/priority.
We also need some way of getting our f scores. We know the formula (g score + h score), but how do we get it? We are going to store all the f and g scores in a python dictionary. H score is going to be calculated by function which calculates the distance between 2 cells. It is just going to add up the difference between the x and y values between the 2 cells. Since h score is just an approximation, that will work fine. G score is going to be tracked as we loop through all the neighbors. Since the distance between neighbors and current cells is always going to be 1, we can add 1 to our current g scores to get the g scores for our neighbors. Since it will take 1 more distance to get to them. The f score, as mentioned before, is just the sum of the g and h scores.
We will also be keeping track of which cell got us to the next cell, so we can backtrack and complete our path, once we reach our endpoint. We will know when we reach the endpoint, if the endpoint is one of our neighbors.
Now, let’s see the code:
from queue import PriorityQueue def heuristic(cell, end): # assuming (100, 100) is end x1, y1 = cell x2, y2 = end return abs(x2 - x1) + abs(y2 - y1) def reconstruct(path, end): final = [] curr = end while curr in path: final.append(curr) curr = path[curr] return reversed(final) def get_neighbors(cell, start=(0, 0), end=(100, 100)): def between(a, b, c): return (b <= a and a <= c) or (b >= a and a >= c) x1, y1 = cell return [ (x + x1, y + y1) for x in range(-1, 2) for y in range(-1, 2) if ( between(x + x1, start[0], end[0]) and between(y + y1, start[1], end[1]) ) ] def a_star(end: tuple = (100, 100), start: tuple = (0, 0)): count = 0 open = PriorityQueue() open.put((0, count, start)) path = {} g_score = { (x, y): float("inf") for x in range(end[0] + 1) for y in range(end[1] + 1) } g_score[start] = 0 f_score = { (x, y): float("inf") for x in range(end[0]) for y in range(end[1]) } f_score[0] = heuristic(start, end) while not open.empty(): curr = open.get()[2] temp_g = g_score[curr] + 1 for n in get_neighbors(curr, start, end): if n == end: path[n] = curr return reconstruct(path, end) if temp_g < g_score[n]: path[n] = curr g_score[n] = temp_g f_score[n] = temp_g + heuristic(n, end) if not any(n == item[2] for item in open.queue): count += 1 open.put((f_score[n], count, n)) # path not found return None path = [coord for coord in a_star((50, 10))] path.insert(0, (0, 0)) for y in range(0, 11): lst = [] for x in range(0, 51): if (x, y) not in path: lst.append("x") else: lst.append("o") print(lst)
View code on GitHub.
Practice
BFS and DFS
Implement a BFS and recursive DFS algorithm, that prints the nodes you visit as you go.
Template code is provided for you.
A* Practice
In this practice problem, you get to fill in some a_star code as well as see the effects of using different heuristics on a_star's execution time. Heuristics and helper functions are given. Your job is to fill in sections of the A* algorithm where it says “your code here”.
Previous Session
3.2 Sorting AlgorithmsCopyright © 2021 Code 4 Tomorrow. All rights reserved.
The code in this course is licensed under the MIT License.
If you would like to use content from any of our courses, you must obtain our explicit written permission and provide credit. Please contact classes@code4tomorrow.org for inquiries.