Backyard Bird Classification with Fast AI

Saahil Barai
Published in Analytics Vidhya
11 min read · May 31, 2021

Two Mourning doves out for breakfast

Every morning, at my home, we put food in the bird houses we have in our backyard. Over the last couple of weeks, I have noticed a lot of new birds visiting to grab a bite to eat and thought it would be a fun project to see exactly which types of birds are visiting.

To do so, I trained a neural network using Fast AI’s libraries. This was my first time experimenting with neural networks and I had a great time. I would highly encourage anyone interested to try it out. The ease with which we can now train neural networks to accomplish fine-grained classification is astounding. I will detail the entire process I used and my findings below for anyone who would like to try this themselves or adapt it to their own use case.

Fast AI

Before diving into the code and process, I would like to go over the main tools we will be using throughout this project. The first is Fast AI, a deep learning library that simplifies the training of fast and accurate neural networks. Throughout this post there are many instances where the library shone in its ability to speed up the process of training a highly accurate neural network. At every step in the pipeline of building a model, the developers seem to have thought extensively about the user’s experience. As a result, it was a joy to use Fast AI.

Residual Neural Networks

Residual neural networks represent one specific type of neural network introduced in 2015 within the paper Deep Residual Learning for Image Recognition by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun.

Residual neural networks were created to make very deep networks trainable, a goal that is often hampered by vanishing and exploding gradients. The background is as follows. Neural networks are composed of layers, and the number of layers is the depth of the network. Depth is therefore an important parameter affecting performance. In the quest for better networks, a question arose: is the solution simply to add more layers and thus more depth?

When this idea was tested, the deeper networks would often reach a point of stagnation, where the gradients shrink toward zero on their way back through the layers and the early layers stop learning (vanishing gradient), or a point of instability, where the gradients grow so large that the model is unable to converge to an optimal solution (exploding gradient).

To address this, the paper describes an architecture in which skip connections are placed between layers. A skip connection adds the input of a block of layers directly to that block’s output, so the layers inside the block only have to learn a correction (the “residual”) to their input. If it is beneficial to leave the input unchanged, the layers can learn a correction close to zero and the block as a whole then behaves like the identity function, which returns the same value it received as input. Deeper layers can therefore make minimal to no change when that is the best option, and the skip connections give the gradient a direct path back to earlier layers, which eases the problem of vanishing and exploding gradients. An in-depth explanation of this architecture is very well described in a video by Yannic Kilcher. Overall, the skip connections allow for the training of a deeper network by not forcing the later layers to keep learning a new transformation.
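To make the skip connection concrete, here is a minimal sketch in plain Python rather than in a deep learning framework. The function names, the three-element input, and the weights are all made up for illustration; real residual blocks use convolutions and learned weights. It only shows that a block computing F(x) + x collapses to the identity when the learned part F contributes nothing.

```python
# Toy residual block on plain Python lists (no deep learning framework).
# The weights and sizes are invented for illustration only.

def relu(values):
    """Element-wise ReLU."""
    return [max(0.0, v) for v in values]

def linear(weights, values):
    """Multiply a weight matrix (a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]

def residual_block(x, w1, w2):
    """Return F(x) + x, where F is two small linear layers with a ReLU between."""
    residual = linear(w2, relu(linear(w1, x)))     # F(x): the part the block learns
    return [r + xi for r, xi in zip(residual, x)]  # skip connection adds x back

x = [1.0, -2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]

# With all-zero weights F(x) is zero, so the block passes x through unchanged.
print(residual_block(x, zeros, zeros))  # [1.0, -2.0, 3.0]
```

A plain stack of layers would have to learn weights that reproduce its input exactly to achieve the same effect, which is a much harder target than "output zero".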

Transfer Learning

The speed at which the neural networks provided by Fast AI train is in part due to the concept of transfer learning. In transfer learning, a pre-trained model is leveraged to solve a new but related problem.

The traditional mode of learning goes as follows. We have features, which may be (but are not limited to) the pixels of an image, numerical data, or categorical data. We also have a target, corresponding to those features, that we are trying to predict. For example, the features could be images of cats and dogs, and the target the animal displayed in each image. Using the features and targets, a model can be trained from scratch for the task at hand.

Transfer learning allows a neural network that has already been trained to serve as the starting point for a new task. Taking the dog and cat example above, suppose there is a model that was trained on images of many kinds of animals, whether or not dogs and cats were among them. We would then be able to take this pre-trained model and retrain some of its last layers to accomplish the new task of identifying dogs and cats.

Transfer learning is especially helpful when there is not enough labeled data to train a model for a specific task and when models need to be trained quickly. Taking this into consideration, it is extraordinary that state-of-the-art results can be achieved using transfer learning.
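The idea of keeping the pre-trained layers fixed while retraining only the last ones can be sketched without any framework. In the toy below each "layer" is a single number, and the layer names, values, and gradients are invented for illustration; it is not how Fast AI stores its parameters, only a picture of which parts move during an update.

```python
# Framework-free toy: parameters marked as not trainable are skipped by the update.
# Layer names, values, and gradients are invented for illustration.

def sgd_step(params, grads, trainable, lr=0.1):
    """Apply one gradient step, but only to layers marked as trainable."""
    return {
        name: (value - lr * grads[name]) if trainable[name] else value
        for name, value in params.items()
    }

params = {"early_layers": 0.5, "late_layers": -0.2, "new_head": 0.0}
grads = {"early_layers": 1.0, "late_layers": 1.0, "new_head": 1.0}

# Keep the pre-trained body as it is and train only the newly added head.
head_only = {"early_layers": False, "late_layers": False, "new_head": True}
after = sgd_step(params, grads, head_only)
print(after)  # {'early_layers': 0.5, 'late_layers': -0.2, 'new_head': -0.1}
```

Because only a small part of the model is updated, far fewer labeled examples are needed than when every layer starts from random values.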

Building a Dataset

Prior to training my neural network, I first had to build a dataset so that the neural network can learn about the different species of birds. This dataset will be composed of two main components: an image of a bird and the text label of the species of the bird in the image.

To decide which species I should make my neural network aware of, I first had to learn about the local birds of the city I live in. To do so, I went to a couple of local bird watching websites and started to compile a list of bird species that were common in my city. In this way, I was able to narrow down the list of bird species and choose which ones to train my model on.

Now that I had my species list, I was ready to begin collecting some images. A quick and easy way to create an image-based dataset is through the use of Google Images. There are many ways to download images from Google Images: Python scripts, the browser JavaScript console, and browser extensions. The process I used to collect images is as follows:

  1. Download an extension for your browser that allows you to download images from Google Images.
  2. Search for the particular topic you are intending to find images on. In our case this would be a particular species such as “blue jays”.
  3. Scroll to a good section of images on your screen. It is important to note that the better the images you choose, the fewer you must manually delete afterwards.
  4. Use the extension to download the images. I downloaded 25 to 30 images per species but depending on your application and results you may want to increase that number. My first attempt with 10 to 15 images did not perform as well. The increase in images helped my model significantly.
  5. Sort the images by topic. In this case I created a folder for the dataset, and subfolders for the bird species.
  6. Repeat this process for each topic. In my case I repeated this for all my species.
  7. Use the verify_images function within Fast AI to ensure each image can be opened.

By the end of this process, the dataset should resemble the structure in the image below.

View inside the main dataset folder: sorted by species, with each subfolder containing images of that species
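Steps 5 through 7 can be checked with a short script. The sketch below is one possible way to do it, not the exact code I ran: the helper names and the list of image extensions are my own assumptions, and "birds" in the usage note is a placeholder folder name. The second helper relies on Fast AI's get_image_files and verify_images, and deletes any file that cannot be opened, so run it on a copy of the dataset if in doubt.

```python
from collections import Counter
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed extensions, adjust as needed

def count_images_per_species(dataset_dir):
    """Count the image files in each species subfolder of dataset_dir."""
    counts = Counter()
    for image_path in Path(dataset_dir).glob("*/*"):
        if image_path.is_file() and image_path.suffix.lower() in IMAGE_SUFFIXES:
            counts[image_path.parent.name] += 1
    return counts

def remove_unreadable_images(dataset_dir):
    """Delete the files Fast AI cannot open and return their paths."""
    # Imported here so the counting helper works without fastai installed.
    from fastai.vision.all import get_image_files, verify_images
    failed = verify_images(get_image_files(dataset_dir))
    failed.map(Path.unlink)
    return failed
```

Calling count_images_per_species("birds") gives a quick view of which species folders are short on images before any training starts.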

Training a Model

With the dataset built, it is time to train a model. We will start by reading in the dataset we created above.

The ImageDataLoaders.from_folder method from Fast AI reads in the dataset, using each subfolder name as the label. Since the dataset was organized by species rather than split into training and test sets, I provided a validation percentage. This percentage is the portion of the data that should be set aside for checking the model’s performance. The item_tfms argument resizes each image so that all images have a consistent size, and the batch_tfms argument applies data augmentation and a final resize to 224 pixels. Lastly, the image data is normalized using the ImageNet statistics.

# Assumes: from fastai.vision.all import *
data = ImageDataLoaders.from_folder(
    path, valid_pct=0.2,  # hold out 20% for validation
    item_tfms=[Resize(460)],
    batch_tfms=[*aug_transforms(size=224), Normalize.from_stats(*imagenet_stats)],
)

To make sure our dataset was read in properly, the show_batch command can be used to display a portion of the labeled dataset.

data.show_batch()

Once the dataset looks good, it is time to initialize and train a model. We will be using a ResNet34, which is a residual neural network composed of 34 layers. The first line of code below initializes our model by taking in the data object we created above, the model type, and a metric to assess how well the model is performing. The second line of code trains our model and takes in the parameter n_epoch, which tells the model how many epochs to train for, where one epoch is one complete pass over the training data. The third line of code simply saves our model so that if we would like to revert to or recall this particular model further down in the code, we can do so.

learn = cnn_learner(data, models.resnet34, metrics=error_rate)
learn.fit_one_cycle(n_epoch=4)
learn.save('stage1')

Results from the first round of training

The results of our first 4 epochs are shown above. I was able to get down to an error rate of ~0.29 after 4 epochs. Although the model improved significantly over the 4 epochs, the error rate was still quite high. To get a good sense of where my model stood, I looked at some image classification projects in the Fast AI community that were similar to mine.
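For readers new to the metric, the error rate is simply the fraction of validation images the model labels incorrectly, so accuracy is one minus the error rate. The snippet below is a plain-Python sketch of that definition with made-up labels; it is not Fast AI's implementation, which works on tensors.

```python
def error_rate(predicted, actual):
    """Fraction of predictions that do not match the true labels."""
    wrong = sum(p != a for p, a in zip(predicted, actual))
    return wrong / len(actual)

# Hypothetical validation labels, invented for illustration.
actual = ["blue jay", "cardinal", "robin", "blue jay"]
predicted = ["blue jay", "cardinal", "robin", "indigo bunting"]

rate = error_rate(predicted, actual)
print(rate, 1 - rate)  # 0.25 0.75
```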

To reduce the error rate, I tried to optimize the learning rate. Having too high or too low a learning rate can harm the model. A large learning rate can prevent the model from converging to an optimal solution, while a small learning rate can make training so slow that it stalls before reaching a good solution. Consequently, the way to find a good learning rate is often to try a variety of rates. Fortunately, Fast AI has a function that does just that, as shown by the code below.
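The effect of the learning rate is easy to see on a toy problem. The sketch below runs plain gradient descent on f(x) = x², which has its minimum at 0; the function, the starting point, and the three rates are chosen only for illustration and have nothing to do with the bird model itself.

```python
def gradient_descent(lr, steps=20, start=1.0):
    """Minimize f(x) = x**2 (gradient 2*x) and return the final x."""
    x = start
    for _ in range(steps):
        x -= lr * 2 * x
    return x

for lr in (1e-4, 0.1, 1.1):
    print(f"lr={lr}: x after 20 steps = {gradient_descent(lr):.4g}")
```

With the smallest rate, x has barely moved from its starting value of 1 after 20 steps. With the middle rate it gets close to 0. With the largest rate each step overshoots the minimum by more than the last, so x grows instead of shrinking.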

learn.unfreeze()
learn.lr_find()

The first line of code “unfreezes” the initialized model. When a model is “frozen”, only the newly added final layers of the neural network are trained and modifications to the pre-trained layers before them are prohibited. The motivation behind this ties back to transfer learning as mentioned above: to take advantage of the pre-trained model, we do not want to tamper with what the early layers have learned while the new final layers are still untrained. Once those final layers have been trained, as they were in the first round, unfreezing allows every layer to be fine-tuned. The second line of code runs a short trial training that starts with a very low learning rate. The learning rate then increases after each mini-batch until it is very high. At each step the loss is recorded and plotted on the resultant graph below. The intuition behind this graph is to take some of the guesswork out of finding a good learning rate. When looking at the graph, it is important to find a zone in which there is a steep downward slope. If the graph has more than one such zone, trying each of them can be helpful.

Loss versus Learning Rate graph to find optimal slicing of learning rates
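The sweep behind this graph can be sketched in a few lines. The code below is a simplified illustration, not Fast AI's implementation: the function names are hypothetical, the default start, end, and number of steps reflect my understanding of the lr_find defaults and should be treated as assumptions, and the loss values in the example are invented.

```python
def lr_schedule(start_lr=1e-7, end_lr=10.0, num_it=100):
    """Learning rates growing exponentially from start_lr to end_lr (num_it >= 2)."""
    ratio = end_lr / start_lr
    return [start_lr * ratio ** (i / (num_it - 1)) for i in range(num_it)]

def steepest_drop(lrs, losses):
    """Return the learning rate at which the loss falls fastest to the next step."""
    drops = [losses[i + 1] - losses[i] for i in range(len(losses) - 1)]
    return lrs[drops.index(min(drops))]

# Invented losses for a four-step sweep from 1e-4 to 1e-1.
lrs = lr_schedule(start_lr=1e-4, end_lr=1e-1, num_it=4)
losses = [2.0, 1.9, 1.2, 3.0]
print(f"{steepest_drop(lrs, losses):.0e}")  # 1e-03
```

Because the rates are evenly spaced on a logarithmic axis, comparing the loss between neighboring steps is a rough stand-in for reading the slope off the plot.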

The graph for my particular model, as shown above, had one clear region of steep downward slope between 1e-4 and 1e-3. Using this range, I trained my model again, this time specifying a maximum learning rate. Passing a slice lets the earliest layers train at 1e-4 and the final layers at 1e-3. Fast AI applies these rates per group of layers rather than per individual layer, and the groups in the middle receive rates spread multiplicatively across the range provided.
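To make the spacing concrete before the training call, here is a small sketch in plain Python that does not call Fast AI. It assumes the model's layers are split into three groups and that the rates between the two ends of the slice are separated by a constant multiplier. Both are my assumptions about the library's behavior, and the function name is hypothetical.

```python
def spread_learning_rates(lowest, highest, n_groups):
    """Return n_groups learning rates separated by a constant multiplier."""
    if n_groups == 1:
        return [highest]
    step = (highest / lowest) ** (1 / (n_groups - 1))
    return [lowest * step ** i for i in range(n_groups)]

rates = spread_learning_rates(1e-4, 1e-3, n_groups=3)
print([f"{r:.2e}" for r in rates])  # ['1.00e-04', '3.16e-04', '1.00e-03']
```

The earliest layers, which hold the most general pre-trained features, therefore change the least, while the layers closest to the output are allowed to adapt the most.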

learn.fit_one_cycle(4, lr_max=slice(1e-4, 1e-3))
learn.save('stage2')

Results from the second round of training

Optimizing the learning rate had a dramatic effect on the error rate over these 4 epochs. I was able to bring the error rate down from ~0.29 earlier to ~0.04. Equally amazing is how quickly we were able to train this model. In a matter of minutes, we were able to create a model with ~0.95 accuracy.

Insights

Two tools helped me improve my model further: the confusion matrix and the top losses.

The confusion matrix is a good way to see how the model is performing and where it may be consistently misclassifying. In some cases, the confusion matrix can also show errors that are tolerable. If two birds are extremely similar, it can reveal that the source of error in the model is not a larger flaw. In my case, the blue jay and the indigo bunting looked very similar from some angles, so I could understand why the model was making that error. In this way, the matrix allowed me to break the overall error rate down into the specific pairs of species behind it.

We can use the following set of commands to output a confusion matrix.

interp = ClassificationInterpretation.from_learner(learn)
fig = interp.plot_confusion_matrix(figsize=(10,15))

Confusion Matrix of Final Model
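When there are many species, the off-diagonal cells of the matrix can be easier to read as a ranked list. The sketch below builds such a list in plain Python from made-up labels; the function name and data are hypothetical, and Fast AI's interpretation object offers a comparable most_confused method that works directly on the trained model.

```python
from collections import Counter

def most_confused(actual, predicted, min_count=1):
    """List (actual, predicted, count) for misclassified pairs, most frequent first."""
    pairs = Counter((a, p) for a, p in zip(actual, predicted) if a != p)
    return [(a, p, n) for (a, p), n in pairs.most_common() if n >= min_count]

# Hypothetical validation results, invented for illustration.
actual = ["blue jay", "blue jay", "indigo bunting", "cardinal", "blue jay"]
predicted = ["indigo bunting", "indigo bunting", "blue jay", "cardinal", "blue jay"]

for true_label, guess, count in most_confused(actual, predicted):
    print(f"{true_label} mistaken for {guess}: {count}")
```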

To get a more explicit picture we can show the model’s top losses. The model’s top losses represent where the model was the most confident and simultaneously wrong.

interp.plot_top_losses(9, figsize=(15,11))
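The ranking behind this plot can be sketched as follows. I am assuming the usual cross-entropy loss for classification, where the loss for an image is the negative log of the probability the model gave to the correct species, so a confident wrong answer is penalized heavily. The function name and the probabilities are invented for illustration.

```python
import math

def top_losses(probabilities, actual, k=3):
    """Return (loss, index) for the k samples with the highest cross-entropy loss."""
    losses = [
        (-math.log(probs[label]), i)
        for i, (probs, label) in enumerate(zip(probabilities, actual))
    ]
    return sorted(losses, reverse=True)[:k]

# Hypothetical model outputs for three images that are all blue jays.
probabilities = [
    {"blue jay": 0.90, "indigo bunting": 0.10},  # confident and right
    {"blue jay": 0.05, "indigo bunting": 0.95},  # confident and wrong
    {"blue jay": 0.60, "indigo bunting": 0.40},  # unsure but right
]
actual = ["blue jay", "blue jay", "blue jay"]

for loss, index in top_losses(probabilities, actual, k=2):
    print(f"image {index}: loss {loss:.2f}")
```

The confident mistake (image 1) comes first, followed by the unsure prediction (image 2), which mirrors the order in which the plot presents images.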

In my case, looking at the top losses showed me that some of the model’s most glaring errors were due to the images I had given it. For example, I had included a picture that contained two different species of birds, which led my model to identify the wrong bird in the picture. I went ahead and deleted these images from the dataset.

Bird Watching

Now for the fun part! I spent the morning taking pictures of all the birds who were eating breakfast and was able to identify what species each one was with high confidence.

To make predictions on pictures that I took, I first had to export the model. Exporting the model creates a pickle file that can be deployed elsewhere or loaded back at a later date. I loaded the exported model back in since I was still working in my notebook. This allowed me to call the predict method and feed the model the path to an image for which to predict a label. The second block of code below simply prints the predicted label and displays the picture that was fed into the model.

learn.export()
model_inf = load_learner('path_to_export/export.pkl')
pred_class = model_inf.predict('path_to_img/img_name.JPG')

testimg1 = Image.open('path_to_img/img_name.JPG')
print(pred_class[0])
plt.imshow(testimg1)
plt.axis('off')
plt.show()

Predicting species of birds found in my backyard using the trained model
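The predict method returns more than the label: the result is a tuple whose first element is the predicted class and whose last element holds one probability per class. A small helper like the one below, which is my own addition and not part of Fast AI, can turn those probabilities into a ranked list so that close calls are visible. The class names and numbers in the example are made up.

```python
def top_predictions(class_names, probabilities, k=3):
    """Pair each class name with its probability and keep the k most likely."""
    ranked = sorted(zip(class_names, probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Invented example values, not output from my model.
names = ["blue jay", "cardinal", "robin"]
probs = [0.1, 0.7, 0.2]
print(top_predictions(names, probs, k=2))  # [('cardinal', 0.7), ('robin', 0.2)]
```

With the loaded model, the class names should be available as model_inf.dls.vocab and the probabilities as pred_class[2].tolist(), assuming the tuple layout described above.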
