
Teaching Computer Vision for Ecology

by Elijah Cole, et al.

Computer vision can accelerate ecology research by automating the analysis of raw imagery from sensors like camera traps, drones, and satellites. However, computer vision is an emerging discipline that is rarely taught to ecologists. This work discusses our experience teaching a diverse group of ecologists to prototype and evaluate computer vision systems in the context of an intensive hands-on summer workshop. We explain the workshop structure, discuss common challenges, and propose best practices. This document is intended for computer scientists who teach computer vision across disciplines, but it may also be useful to ecologists or other domain experts who are learning to use computer vision themselves.



1 Introduction

Extracting important information from images and videos normally requires painstaking manual effort from human annotators. Computer vision algorithms can automate this process. This is especially important when manually reviewing the data is not feasible, either because the amount of data is too large (e.g. the 100TB of satellite imagery collected daily) or the number of annotators is too small (e.g. when expertise is required to identify a species in an image). Both of these challenges are common in ecology.

Ecology presents a particularly compelling use case for computer vision. Due to the effects of climate change, we need to monitor animal populations, vegetation properties, and other indicators of ecosystem health at a large scale [41]. Ecologists are collecting vast amounts of raw data with camera traps, drones, and satellites, but there are not enough experts to annotate the data. Computer vision algorithms can accelerate the pace of research in ecology by efficiently transforming this raw data into useful knowledge. Encouraging progress is already being made in areas like animal detection [18, 29, 25], fine-grained species recognition [42], individual re-identification [38], species distribution modeling [19, 21], and land cover mapping [32]. These efforts can be viewed in the broader context of computational sustainability [23] and efforts to use machine learning to combat the effects of climate change [33].


Figure 1: A simplified dependency graph depicting some of the skills required to develop a computer vision system. These software engineering and machine learning topics are rarely included in ecology training. See Appendix B for a catalog of key topics and their significance.

To build on this progress, we must equip ecologists with the skills they need to understand and apply computer vision methods in their research. While ecologists often have training in statistics and programming, they are rarely exposed to the interconnected web of software engineering and machine learning topics necessary for computer vision. We illustrate a few of these topics in Figure 1.

In this work, we discuss the process of teaching computer vision to ecologists in the context of the Resnick Sustainability Institute Summer Workshop on Computer Vision Methods for Ecology (CV4E Workshop), an intensive 3-week workshop held at Caltech in 2022 [4]. We review related work in Section 2 before describing the workshop in Section 3, discussing key take-aways in Section 4, and outlining educational techniques we found useful in Section 5.

2 Related Work

There is an emerging literature devoted to teaching machine learning [37, 20, 34], deep learning [30], and computer vision [31, 24, 26, 36]. [35] more narrowly focuses on common errors in machine learning course projects. However, most of these works concern efforts to teach students from computer science or related disciplines. There is prior work discussing the specific challenge of teaching machine learning to cross-disciplinary audiences such as non-CS undergraduates [39], business students [43], artists [22], materials scientists [40], and biologists [28]. Our work is complementary, focusing on the process of teaching computer vision to ecologists (mostly Ph.D. students and postdocs – see Figure 2) who have background knowledge in statistics and programming but little prior experience in machine learning. In addition, we consider an immersive workshop in which researchers build prototypes using their own research data, not a traditional classroom environment.

3 The CV4E Workshop

The inaugural CV4E Workshop was held at Caltech from August 1–19, 2022. The program was designed to train ecologists to use computer vision in their own research. Here we outline the stages of the workshop.

Application. The application had five components: (i) a one-page project proposal, (ii) a one-page personal statement, (iii) a programming example, (iv) one letter of reference, and (v) a CV. The most important element was the project proposal, in which participants described the problem they wanted to solve with computer vision, the potential impact of a working solution, and the available data and labels.

Selection process. The CV4E staff recruited application reviewers from the machine learning and ecology communities. Each application received two reviews. Final decisions were made by the CV4E staff. The primary criteria were: (i) goal clarity, (ii) project feasibility, (iii) potential impact, and (iv) candidate preparation. Details about the 2022 cohort can be found in Figure 2 and Appendix A. To maximize accessibility, all participants were funded for travel, room, and board for the duration of the program.

Figure 2: Summary of the 2022 CV4E Workshop participant backgrounds (top) and project categories (bottom). Full details can be found in Appendix A.

Pre-workshop preparation. All participants were added to a Slack workspace which served as the primary communication channel for the workshop. Each participant was assigned to a working group overseen by a CV4E instructor. During the 6 months between participant selection and the beginning of the workshop, participants met with their instructors to finalize project plans and address any data or label issues. Participants were also expected to learn Python during this period. Instructors assisted by providing Python resources and holding biweekly office hours.

In-person workshop. Figure 3 gives a representative weekly schedule for the CV4E Workshop. Participants received classroom instruction from Lectures and Invited Speakers. Each participant joined a Reading Group on a topic of their choice (see Appendix D), which met twice weekly for a guided discussion of research papers. During the Work Time, participants worked on their projects independently, with CV4E staff and working group peers available for questions. Each working group discussed their progress and obstacles during the Group Updates.

Outcomes. All 18 of our participants had trained models for their projects by the end of the workshop. Some of these models were already achieving high performance, while others needed more investigation. In addition, the participants and staff formed a community that has endured beyond the workshop through the Slack workspace and ongoing projects.

Figure 3: The weekly schedule for the 2022 CV4E Workshop was roughly evenly split between instructional time (Lectures, Invited Speakers, Reading Groups, Group Updates) and instructor-supervised working time (Work Time). In practice, portions of the Group Updates, Lunch, Break, and Dinner slots were often used by participants as extra work time.

4 Lessons Learned

Enforce structured Python preparation. The primary obstacle for most participants was insufficient Python preparation. While participants were not required to know Python before applying, they were asked to learn Python before arriving. To facilitate this process, the staff provided resources for learning Python and hosted office hours in the months leading up to the CV4E Workshop. However, many participants (even capable R programmers) still struggled with Python issues throughout the workshop. In hindsight, we overestimated the extent to which R experience is helpful for quickly learning Python. In the future we will enforce more structured Python preparation before the workshop.

Start simple. It is challenging to build a working computer vision system from scratch in 3 weeks. To maximize the probability of success, it is important to start simple. When appropriate, we encouraged participants to use standard, well-understood pipelines, e.g. fine-tuning an ImageNet-pretrained ResNet-50.

Work in long blocks. Participants made much more progress during long blocks of work time (3 hours) than during shorter work blocks (1-2 hours).

Collect similar projects in working groups. Participants were often eager to help each other, especially when they were deploying similar techniques. Working groups should be constructed to maximize opportunities for such collaborations.

Mix experience levels in working groups. Some participants had significant experience with machine learning or programming, enabling them to make swift progress on their projects with minimal assistance from instructors. Experienced participants routinely volunteered to assist less experienced participants, which seemed mutually beneficial. In the future, we plan to ensure that each working group has a mix of experienced and inexperienced participants.

Make unambiguous infrastructure recommendations. There are many reasonable ways to set up the infrastructure necessary for computer vision work. For instance, consider the problem of developing code which is meant to be executed on a VM. One approach is to edit the code locally in a text editor and move it to the VM using rsync, handling revision control locally. Another approach is to use a tool like VSCode [17] which allows code on the VM to be edited directly via SSH. In this case, revision control would be handled on the VM. A third approach is to edit code locally, push the code to GitHub, and pull the code to the VM. Revision control is “built in” for this workflow. The instructors had different preferences, and no workflow was clearly superior. Participants did not benefit from being asked to make their own choice about which workflow to use. In the future we will provide unambiguous and unified recommendations for development infrastructure.

Avoid deep learning library wrappers. There are many “wrappers” for deep learning libraries which are meant to make deep learning tools easier to use. Some are general-purpose (e.g. PyTorch Lightning [13]) while others are domain-specific (e.g. OpenSoundscape [11], DeepForest [5], TorchGeo [16]). While these wrappers are undoubtedly useful, they were not ideal for our participants for two reasons. First, they conceal too much complexity, which hinders the process of learning about e.g. training loops and data flow. Second, they were more difficult to customize and debug, even with instructor assistance. In the future, we will encourage all participants to work directly with deep learning libraries.
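To make concrete what a wrapper typically hides, the following is a minimal sketch of an explicit training loop. It is written in plain Python rather than a deep learning library, purely as an assumption to keep it dependency-free; the function name `train_linear_model`, the synthetic data, and the hyperparameters are all hypothetical. The structure (epochs, shuffled mini-batches, forward pass, gradient computation, parameter update) is the part that carries over to deep learning libraries.

```python
import random

def train_linear_model(xs, ys, epochs=500, batch_size=4, lr=0.05, seed=0):
    """Fit y ~ w * x + b with mini-batch gradient descent on mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for epoch in range(epochs):
        rng.shuffle(indices)  # visit the data in a new order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            grad_w, grad_b = 0.0, 0.0
            for i in batch:
                error = (w * xs[i] + b) - ys[i]  # forward pass and residual
                grad_w += 2 * error * xs[i] / len(batch)  # d(MSE)/dw
                grad_b += 2 * error / len(batch)          # d(MSE)/db
            w -= lr * grad_w  # parameter update
            b -= lr * grad_b
    return w, b

# Synthetic, noise-free data generated from y = 2x + 1 (illustrative only).
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
ys = [2.0 * x + 1.0 for x in xs]
w, b = train_linear_model(xs, ys)
print(round(w, 3), round(b, 3))
```

Writing the loop by hand makes each stage available for inspection in a debugger, which is harder when the loop lives inside a wrapper.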

Avoid Jupyter Notebooks. Jupyter Notebooks provide capabilities familiar to experienced R users, such as the ability to run sections of code interactively. However, participants who relied on Jupyter Notebooks while learning Python often struggled to transition to more traditional command line workflows when developing their computer vision systems. We now believe that learning to work with Python through the command line provides a better foundation for understanding machine learning workflows.

Make sure GPUs are available. Cloud computing services like AWS and Azure often provide free credits for education and research. However, GPUs may not be available depending on customer demand. It is important to confirm with cloud providers that GPUs will be made available. Alternatively, consider using local computing resources or university clusters.

5 Educational Techniques

In this section we describe a few educational techniques we found helpful for the CV4E Workshop.

Guided troubleshooting. Troubleshooting and debugging are vital skills in machine learning, and it was important to provide participants with opportunities to hone these abilities. However, due to the tight schedule of the CV4E Workshop, we did not want participants to be stuck on any one problem for too long. To balance these objectives, instructors tried to walk participants through the troubleshooting process by asking leading questions about the problems they were encountering. For unusual problems of limited educational value (e.g. complex configuration or installation issues), instructors intervened to resolve the issue as quickly as possible.

Pair pseudocoding. Most of our participants were not comfortable writing Python code at the beginning of the CV4E Workshop, so we wanted to provide frequent opportunities for hands-on coding. Whenever possible, instructors avoided writing code for the participants. To prevent participants from getting stuck on code design issues, we used pair pseudocoding:

  1. The instructor asks the participant to explain what they would like to accomplish, discussing until the goal is clear to both parties.

  2. The instructor writes pseudocode that solves the problem and walks through it with the participant to help them understand the logic of the solution. The pseudocode can be more specific or vague depending on the participant’s needs.

  3. The participant writes Python code to solve the problem, while the instructor remains available to answer questions as they arise.

Goal statements. During the initial stages of the project, some of our participants felt like progress was not being made because the code didn’t “work” yet. To make their progress more salient, some instructors asked participants to make a goal statement at the beginning of each work session, and to check progress towards that goal at the end of each work session. This strategy helped participants to maintain motivation until more tangible results were obtained.

Contextualized lectures. Maintaining interest during lectures was not a significant problem for the CV4E Workshop due to the enthusiasm of the participants. However, it is easy for lectures on machine learning topics to become too abstract. We tried to ensure that the lectures remained grounded in applications and examples. Since each participant had their own applied problem in mind, we often paused lectures to ask participants to reflect on how the lecture topic applied to their individual projects. Participants shared their answers with the class, providing concrete examples that illustrated the lecture topic.

6 Conclusion

We have described our experience at the inaugural Resnick Sustainability Institute Summer Workshop on Computer Vision Methods for Ecology. We consider the format to be a success, as all of our participants trained models for their projects by the end of the workshop. However, we have also discussed some challenges we encountered and identified opportunities to improve the CV4E Workshop. We hope these observations will be useful for others who teach computer vision across disciplines.

7 Acknowledgements

We would like to thank the Resnick Sustainability Institute, Caroline Murphy, Xenia Amashukeli, and Pietro Perona for making the CV4E Workshop possible. Computing credits were provided by Amazon AWS and Microsoft Azure. We also thank the inaugural cohort of the CV4E Workshop: Antón Álvarez, Carly Batist, Peggy Bevan, Catherine Breen, Anna Boser, Tiziana Gelmi Candusso, Melanie Clapham, Rowan Converse, Roni Goldshmid, Natalie Imirzian, Brian Lee, Francesca Ponce, Alixandra Prybyla, Rachel Renne, Felix Rustemeyer, Taiki Sakai, Ethan Shafron, and Casey Youngflesh.


  • [1] Amazon Web Services (AWS).
  • [2] Audacity.
  • [3] Computer Vision Annotation Tool (CVAT).
  • [4] CV4E Summer Workshop.
  • [5] DeepForest.
  • [6] FFMPEG.
  • [7] ImageMagick.
  • [8] ImgLab.
  • [9] Microsoft Azure.
  • [10] OpenCV.
  • [11] OpenSoundscape.
  • [12] PyTorch.
  • [13] PyTorch Lightning.
  • [14] TensorBoard.
  • [15] TensorFlow.
  • [16] TorchGeo.
  • [17] VSCode.
  • [18] Sara Beery, Dan Morris, and Siyu Yang. Efficient pipeline for camera trap image review. arXiv preprint arXiv:1907.06772, 2019.
  • [19] Elijah Cole, Benjamin Deneu, Titouan Lorieul, Maximilien Servajean, Christophe Botella, Dan Morris, Nebojsa Jojic, Pierre Bonnet, and Alexis Joly. The geolifeclef 2020 dataset. arXiv preprint arXiv:2004.04192, 2020.
  • [20] Adrian A de Freitas and Troy B Weingart. I’m going to learn what?!? teaching artificial intelligence to freshmen in an introductory computer science course. In Proceedings of the 52nd ACM technical symposium on computer science education, pages 198–204, 2021.
  • [21] Benjamin Deneu, Alexis Joly, Pierre Bonnet, Maximilien Servajean, and François Munoz. Very high resolution species distribution modeling based on remote sensing imagery: How to capture fine-grained and large-scale vegetation ecology with convolutional neural networks? Frontiers in plant science, 13:839279, 2022.
  • [22] Rebecca Fiebrink. Machine learning education for artists, musicians, and other creative practitioners. ACM Transactions on Computing Education (TOCE), 19(4):1–32, 2019.
  • [23] Carla Gomes, Thomas Dietterich, Christopher Barrett, Jon Conrad, Bistra Dilkina, Stefano Ermon, Fei Fang, Andrew Farnsworth, Alan Fern, Xiaoli Fern, et al. Computational sustainability: Computing for a better world and a sustainable future. Communications of the ACM, 62(9):56–65, 2019.
  • [24] Tal Hassner and Itzik Bayaz. Teaching computer vision: Bringing research benchmarks to the classroom. ACM Transactions on Computing Education (TOCE), 14(4):1–17, 2015.
  • [25] Benjamin Kellenberger, Diego Marcos, and Devis Tuia. Detecting mammals in uav images: Best practices to address a substantially imbalanced dataset with deep learning. Remote sensing of environment, 216:139–153, 2018.
  • [26] Sami Khorbotly. A project-based learning approach to teaching computer vision at the undergraduate level. In 2015 ASEE Annual Conference & Exposition, pages 26–91, 2015.
  • [27] Zachary C Lipton and Jacob Steinhardt. Troubling trends in machine learning scholarship. arXiv preprint arXiv:1807.03341, 2018.
  • [28] Chris S Magnano, Fangzhou Mu, Rosemary S Russ, Milica Cvetkovic, Debora Treu, and Anthony Gitter. An approachable, flexible, and practical machine learning workshop for biologists. bioRxiv, 2022.
  • [29] Jason Parham, Charles Stewart, Jonathan Crall, Daniel Rubenstein, Jason Holmberg, and Tanya Berger-Wolf. An animal detection pipeline for identification. In 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 1075–1083, 2018.
  • [30] Simon J.D. Prince. Understanding Deep Learning. MIT Press, 2022.
  • [31] S.J.D. Prince. Computer Vision: Models Learning and Inference. Cambridge University Press, 2012.
  • [32] Caleb Robinson, Le Hou, Kolya Malkin, Rachel Soobitsky, Jacob Czawlytko, Bistra Dilkina, and Nebojsa Jojic. Large scale high-resolution land cover mapping with multi-resolution data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12726–12735, 2019.
  • [33] David Rolnick, Priya L Donti, Lynn H Kaack, Kelly Kochanski, Alexandre Lacoste, Kris Sankaran, Andrew Slavin Ross, Nikola Milojevic-Dupont, Natasha Jaques, Anna Waldman-Brown, et al. Tackling climate change with machine learning. ACM Computing Surveys (CSUR), 55(2):1–96, 2022.
  • [34] Omar Shouman, Simon Fuchs, and Holger Wittges. Experiences from teaching practical machine learning courses to master’s students with mixed backgrounds. In Proceedings of the Second Teaching Machine Learning and Artificial Intelligence Workshop, pages 62–67. PMLR, 2022.
  • [35] James Skripchuk, Yang Shi, and Thomas Price. Identifying common errors in open-ended machine learning projects. In Proceedings of the 53rd ACM Technical Symposium on Computer Science Education V. 1, pages 216–222, 2022.
  • [36] Scott Spurlock and Shannon Duvall. Making computer vision accessible for undergraduates. Journal of Computing Sciences in Colleges, 33(2):215–221, 2017.
  • [37] Peter Steinbach, Heidi Seibold, and Oliver Guhr. Teaching machine learning in 2020. In European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, pages 1–6. PMLR, 2021.
  • [38] Charles V. Stewart, Jason R. Parham, Jason Holmberg, and Tanya Y. Berger-Wolf. The animal id problem: Continual curation. arXiv preprint arXiv:2106.10377, 2021.
  • [39] Elisabeth Sulmont, Elizabeth Patitsas, and Jeremy R Cooperstock. What is hard about teaching machine learning to non-majors? insights from classifying instructors’ learning goals. ACM Transactions on Computing Education (TOCE), 19(4):1–16, 2019.
  • [40] Shijing Sun, Keith Brown, and A Gilad Kusne. Teaching machine learning to materials scientists: Lessons from hosting tutorials and competitions. Matter, 5(6):1620–1622, 2022.
  • [41] Devis Tuia, Benjamin Kellenberger, Sara Beery, Blair R Costelloe, Silvia Zuffi, Benjamin Risse, Alexander Mathis, Mackenzie W Mathis, Frank van Langevelde, Tilo Burghardt, et al. Perspectives in machine learning for wildlife conservation. Nature communications, 13(1):1–15, 2022.
  • [42] Grant Van Horn, Oisin Mac Aodha, Yang Song, Yin Cui, Chen Sun, Alex Shepard, Hartwig Adam, Pietro Perona, and Serge Belongie. The inaturalist species classification and detection dataset. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 8769–8778, 2018.
  • [43] Linus Wunderlich, Allen Higgins, and Yossi Lichtenstein. Machine learning for business students: An experiential learning approach. In Proceedings of the 26th ACM Conference on Innovation and Technology in Computer Science Education V. 1, pages 512–518, 2021.

Appendix A 2022 Cohort

A.1 Participant Backgrounds

The inaugural 2022 CV4E Workshop had 18 participants. Broken down by current occupation, our cohort consisted of:

  • 1 Master’s student;

  • 10 Ph.D. students;

  • 5 post-doctoral researchers; and

  • 2 researchers from government agencies or non-governmental organizations.

Broken down geographically, our cohort consisted of:

  • 11 participants from 7 different states in the U.S.;

  • 5 participants from European countries; and

  • 2 participants from Canada.

Participants came from diverse academic backgrounds, including conservation biology, biological anthropology, geography, mechanical engineering, civil engineering, neuroscience, and ecology.

A.2 Participant Projects

Projects fell into seven main categories.

  1. Individual Re-Identification: Associating images of the same animal taken from various cameras, locations, and times. The two relevant projects were: (1) re-identifying bears, (2) re-identifying Iberian Lynx.

  2. Regression: Assigning a continuous number to an image or video. The one relevant project was: (1) analyzing wind speed from overhead drone video of trees.

  3. Classification: Categorizing or labeling images or parts of images from a fixed collection of categories. The six relevant projects were: (1) determining presence or absence of lemur vocalizations, (2) beaked whale species classification from echolocation clicks, (3) bumblebee species and caste classification from flight sounds, (4) assigning ants to size categories, (5) identifying weather conditions from camera trap images, and (6) species identification in urban camera traps. Note that projects (1), (2), and (3) used computer vision techniques to classify images (spectrograms) that represent audio signals.

  4. Object Detection: Locating instances of objects in images or videos. The four relevant projects were detecting: (1) piospheres, (2) woodland draws, (3) flies, and (4) waterfowl. Projects (3) and (4) use detection as an intermediate step towards counting.

  5. Segmentation: Classifying pixels based on their semantic characteristics. The three relevant projects were segmenting: (1) walrus groups in the Arctic, (2) permafrost, and (3) trees. All projects were based on remote sensing imagery.

  6. Clustering: Grouping objects together according to some notion of similarity. The one relevant project was: (1) determining the species richness of an area using the number of clusters in a collection of camera trap imagery.

  7. Super-resolution: Increasing the resolution of an image. The one relevant project was: (1) increasing the resolution of land surface temperature data using satellite imagery.

Appendix B Key Topics

In this section we catalog topics that many of our participants learned during the workshop, either through formal instruction or on their own. We emphasize tools and concepts that were initially unfamiliar to most participants. For each topic, we describe the content and explain why it was important for our participants. See also the list of lectures in Appendix C.

B.1 Tools

B.1.1 Annotation Tools

Content: Using annotation tools to label image [8, 3] or audio [2] data.


Motivation: Labeled data is essential for training and evaluating computer vision algorithms. Since CV4E Workshop participants were using their own data, many of them needed to learn to use some sort of annotation tool. Furthermore, many of these tools can export labels in the standard formats expected by open-source computer vision libraries.

B.1.2 Unix Commands

Content: Common Unix commands like ls, pwd, rm, mkdir, rmdir, mv, cat, head, tail etc. Occasionally, less common commands like chmod or grep.

Motivation: Facility with Unix commands is crucial for installing packages, working with virtual machines, and using revision control. Understanding Unix commands also helps to build intuition for core concepts like absolute vs. relative paths.

B.1.3 Terminal-Based Text Editing

Content: Tools like nano for editing text that is stored on a server from the command line.

Motivation: When configuring SSH authentication it is often necessary to edit text files on the VM (e.g. ~/.ssh/config).

B.1.4 Terminal Multiplexing

Content: Tools like tmux or screen for managing terminal sessions.

Motivation: Long-running code (e.g. model training in PyTorch) should be executed in a terminal session that is decoupled from the SSH connection to avoid being terminated when a laptop is closed or internet connection is lost.

B.1.5 SSH

Content: The ssh command and SSH keys. Occasionally, SSH tunneling.

Motivation: The ssh command is used to create a terminal session connected to a VM. Related topics like SSH keys are also important for e.g. authenticating terminal-based file transfers and enabling GitHub access. SSH tunneling can be necessary for setting up tools like TensorBoard [14].

B.1.6 Terminal-Based File Transfers

Content: Tools like scp or rsync for transferring files.

Motivation: Command line tools are the most reliable way to move large amounts of data from one place to another. This is useful for local transfers (e.g. from one hard drive to another) and remote transfers (e.g. from a local hard drive to a storage volume attached to a virtual machine).

B.1.7 Revision Control

Content: Using GitHub for tracking changes made to code.

Motivation: Code for computer vision projects tends to quickly grow in complexity, and it is easy to forget what has changed since the last working version. Tools like GitHub allow earlier versions of the code to be revisited easily if a bug was introduced by some change. In addition, GitHub can be used to move code from a local machine (git push) to a virtual machine (git pull) along with allowing users to download (git clone) open-sourced computer vision repositories.

B.1.8 Cloud Computing

Content: Interacting with the web interfaces of cloud computing providers. Creating a virtual machine with appropriate resources e.g. GPUs, storage. Estimating and managing cost.

Motivation: One of the most common ways to access GPU resources for computer vision work is to use a VM from a cloud computing provider like Amazon Web Services (AWS) [1] or Microsoft Azure [9]. It is important to understand the benefits (scalability, reliability) and drawbacks (cost) of cloud computing.

B.1.9 Virtual Environments

Content: Creating and managing virtual environments.

Motivation: Computer vision projects typically rely on large pre-existing codebases, which may require particular versions of certain packages to be installed. While the user could change their base installations, a better solution is to create a virtual environment (through e.g. conda) in which the dependencies of the codebase can be installed. Virtual environments are also useful if a “clean reinstall” becomes necessary, because they are easy to create and delete.

B.1.10 Python

Content: Basic syntax, conditionals, loops, string parsing, file I/O, functions, classes and data structures.

Motivation: Facility with Python is crucial for efficiently working with Python-based deep learning libraries, which the computer vision community uses almost exclusively.

B.1.11 Python Libraries

Content: Common libraries like numpy, pandas, ipdb, sklearn, and matplotlib.

Motivation: Python has many stable, high-quality libraries for numerical computing and data analysis. Libraries like ipdb allow for in-line debugging.

B.1.12 Deep Learning Libraries

Content: Preferably PyTorch [12] and alternatively TensorFlow [15] for building deep learning systems.

Motivation: Modern deep learning libraries are indispensable for developing and training computer vision systems.

B.1.13 Image Processing Libraries

Content: Libraries and command-line tools like OpenCV [10], ImageMagick [7], and FFmpeg [6].

Motivation: These tools are often used for efficient data augmentation and visualization.

B.2 Computer Science Concepts

There are a few core concepts from computer science that came up frequently throughout the program.

B.2.1 Object Oriented Programming

Content: Classes and objects. Inheritance, encapsulation, polymorphism.

Motivation: Many important libraries assume an understanding of object oriented programming concepts. For instance, one common point of confusion for our participants was the difference between the PyTorch dataset class and a dataset object from that class. Understanding object oriented programming also makes it easier to understand data structures.
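The class-versus-object distinction can be illustrated with a minimal sketch. The class below only mimics the `__len__` and `__getitem__` interface that PyTorch-style datasets expose; it does not depend on PyTorch, and the name `ImageLabelDataset` and the file names are hypothetical.

```python
class ImageLabelDataset:
    """A hypothetical dataset class exposing __len__ and __getitem__."""

    def __init__(self, image_paths, labels):
        if len(image_paths) != len(labels):
            raise ValueError("image_paths and labels must have the same length")
        self.image_paths = list(image_paths)
        self.labels = list(labels)

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, index):
        # A real dataset would load and transform the image here.
        return self.image_paths[index], self.labels[index]

# The class is a blueprint; each call to it creates a separate object
# that holds its own data.
train_set = ImageLabelDataset(["a.jpg", "b.jpg", "c.jpg"], ["deer", "fox", "deer"])
val_set = ImageLabelDataset(["d.jpg"], ["fox"])

print(len(train_set), len(val_set))
print(train_set[1])
```

Here `ImageLabelDataset` is the class, while `train_set` and `val_set` are two independent objects created from it.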

B.2.2 Data Structures

Content: Common data structures (e.g. list, tuple, dictionary, NumPy array, PyTorch tensor) and their methods, casting from one data type to another, checking data structures.

Motivation: Unexpected behavior differences between e.g. Python lists, NumPy arrays, and PyTorch tensors can cause significant frustration if data structures are not well understood.
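One behavior difference of this kind can be shown in a short sketch, assuming NumPy is installed. The same operators mean different things for a Python list and a NumPy array; the variable names are illustrative.

```python
import numpy as np

values_list = [1, 2, 3]
values_array = np.array([1, 2, 3])

doubled_list = values_list * 2      # repetition: [1, 2, 3, 1, 2, 3]
doubled_array = values_array * 2    # elementwise multiplication: [2, 4, 6]

joined_list = values_list + [10]    # concatenation: [1, 2, 3, 10]
shifted_array = values_array + 10   # scalar broadcast: [11, 12, 13]

# Converting between the two structures is explicit.
back_to_list = doubled_array.tolist()

print(doubled_list, doubled_array, joined_list, shifted_array, back_to_list)
```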

B.2.3 Data Types

Content: Common data types e.g. int, float, double, string, and bool.

Motivation: Understanding data types makes it easier to interpret how values are stored and handled, and the choice of data type can significantly affect data storage size.
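The effect on storage size can be sketched with the standard-library `array` module, which stores values using a fixed C type. The element count and values are arbitrary, and the byte sizes in the comments are those of typical platforms.

```python
from array import array

n = 1000
as_float32 = array("f", [0.5] * n)  # C float, typically 4 bytes per element
as_float64 = array("d", [0.5] * n)  # C double, typically 8 bytes per element
as_uint8 = array("B", [128] * n)    # unsigned char, 1 byte per element

# Total payload size in bytes for the same number of elements.
sizes = {
    "float32": as_float32.itemsize * len(as_float32),
    "float64": as_float64.itemsize * len(as_float64),
    "uint8": as_uint8.itemsize * len(as_uint8),
}
print(sizes)
```

The same thousand values take eight times as much space as doubles as they do as single bytes, which matters when a dataset contains millions of pixels.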

B.2.4 Namespaces

Content: The built-in, global, and local namespaces.

Motivation: Namespaces are the answer to many common questions e.g. why variables defined inside a function are not accessible outside the function.
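The function-scope question mentioned above can be reproduced in a few lines. This is an illustrative sketch with made-up names (`threshold`, `count_detections`).

```python
threshold = 0.5  # defined in the global namespace


def count_detections(scores):
    # num_kept lives in this function's local namespace.
    # The global name "threshold" can still be read from here.
    num_kept = sum(1 for score in scores if score >= threshold)
    return num_kept


result = count_detections([0.2, 0.7, 0.9])
print(result)

# Outside the function, the local name num_kept does not exist.
try:
    num_kept
    local_name_visible = True
except NameError:
    local_name_visible = False

print(local_name_visible)
```

The value computed inside the function is only available outside it because it is returned and bound to the new name `result`.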

b.2.5 Mutability

Content: Mutable and immutable objects. In-place operations.

Motivation: Mutable objects can be changed in-place while immutable objects cannot. This is the basis for understanding whether changes made to an object inside a function will affect the object outside of the function.
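A short sketch of the function-argument behavior described above, using a list (mutable) and an int (immutable). The function names are hypothetical.

```python
def append_in_place(labels):
    # list.append modifies the caller's list object in place.
    labels.append("fox")


def rebind_locally(count):
    # Integers are immutable: this creates a new int and rebinds the
    # local name only, so the caller's variable is unaffected.
    count = count + 1
    return count


species = ["deer"]
append_in_place(species)

total = 1
rebind_locally(total)

print(species)  # ['deer', 'fox']
print(total)    # 1
```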

b.3 Machine Learning Concepts

Participants learned different practical and conceptual aspects of computer vision and machine learning depending on their project. However, all participants had to engage with a few core concepts.

b.3.1 Generalization

Content: The concept of generalization, different types of generalization, identifying a type of generalization that reflects the goals of a project.

Motivation: In ecology there are many different notions of generalization, and it is important to choose one that reflects the goals of a project. For instance, in camera trap image classification it might be important to generalize to new locations or to future data from the same locations. These different notions of generalization need to be measured in different ways.

b.3.2 Data Splits

Content: The role of training, validation, and testing data. Designing appropriate splits to measure the chosen type of generalization.


Motivation: Training, validation, and testing splits should be designed to capture an appropriate problem-specific notion of generalization. These splits must then be handled appropriately (e.g. no hyperparameter tuning on the test split) to ensure that performance measurements reflect generalization.
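As one possible illustration, if the goal is generalization to new camera trap locations, whole locations can be assigned to a single split so that no location appears in more than one. The sketch below is an assumption-laden example rather than a prescribed procedure: the `split_by_location` helper, the record format, and the split fractions are all hypothetical. Measuring generalization to future data from the same locations would instead call for splitting by time.

```python
import random


def split_by_location(records, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Assign whole locations to train/val/test so no location spans two splits."""
    locations = sorted({record["location"] for record in records})
    random.Random(seed).shuffle(locations)

    num_test = max(1, round(test_fraction * len(locations)))
    num_val = max(1, round(val_fraction * len(locations)))
    test_locations = set(locations[:num_test])
    val_locations = set(locations[num_test:num_test + num_val])

    splits = {"train": [], "val": [], "test": []}
    for record in records:
        if record["location"] in test_locations:
            splits["test"].append(record)
        elif record["location"] in val_locations:
            splits["val"].append(record)
        else:
            splits["train"].append(record)
    return splits


# Hypothetical records: 10 camera locations with 5 images each.
records = [
    {"image": f"loc{loc}_img{i}.jpg", "location": f"loc{loc}"}
    for loc in range(10)
    for i in range(5)
]
splits = split_by_location(records)

for name, items in splits.items():
    print(name, len(items), sorted({item["location"] for item in items}))
```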

b.3.3 Overfitting

Content: Defining and recognizing overfitting. Mitigating overfitting using regularization techniques.

Motivation: All participants were working with deep learning, for which overfitting is always a significant concern.

b.3.4 Evaluation Metrics


Content: Common evaluation metrics for different tasks, their strengths and limitations, choosing metrics that reflect high-level goals.

Motivation: Appropriate metrics are vital for determining which approaches work best and deciding if a computer vision system is “good enough” to be used for a real application.
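To illustrate why the choice of metric matters, the sketch below compares accuracy with precision and recall on an imbalanced, made-up example: 95 empty images and 5 images containing an animal, with a trivial "model" that always predicts "empty". The `binary_metrics` helper is hypothetical, and reporting precision as 0.0 when there are no positive predictions is a convention chosen here.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for binary labels (1 = animal present)."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)
    false_neg = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Convention used here: report 0.0 when the denominator is zero.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical imbalanced data: 95 empty images, 5 with an animal.
y_true = [0] * 95 + [1] * 5
always_empty = [0] * 100

metrics = binary_metrics(y_true, always_empty)
print(metrics)
```

Accuracy is 0.95 even though the model never finds an animal, which the recall of 0.0 makes visible.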

b.3.5 Deep Learning

Content: Neural networks, loss functions, minibatch gradient descent.

Motivation: All modern computer vision methods rely on deep learning. Since our participants were building and troubleshooting computer vision systems, they needed to understand deep learning basics as well. Loss functions were a particular focus, since changing the loss is one of the primary ways of adapting an existing method to a new problem.
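The interaction between a loss function and minibatch gradient descent can be shown on a toy problem without any deep learning library. The sketch below fits a one-parameter model `y = w * x` with a squared-error loss. The data, learning rate, and batch size are arbitrary choices for illustration, and the gradient is written by hand, whereas libraries such as PyTorch compute it by automatic differentiation.

```python
import random


def squared_error_loss(w, batch):
    """Mean squared error of the model y = w * x on a batch of (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in batch) / len(batch)


def loss_gradient(w, batch):
    """Derivative of the mean squared error with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


# Hypothetical noise-free data generated by y = 3 * x.
data = [(i / 32 - 1, 3 * (i / 32 - 1)) for i in range(64)]

rng = random.Random(0)
w = 0.0
learning_rate = 0.1
batch_size = 8
initial_loss = squared_error_loss(w, data)

for epoch in range(50):
    rng.shuffle(data)  # visit the minibatches in a new order each epoch
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        w -= learning_rate * loss_gradient(w, batch)

final_loss = squared_error_loss(w, data)
print(round(w, 4), final_loss < initial_loss)
```

Adapting this loop to a different objective would mean replacing `squared_error_loss` and its gradient while leaving the update loop unchanged.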

b.3.6 Representations


Content: Image embeddings, distances in embedding space, pretraining, transfer learning.

Motivation: ImageNet pretraining is ubiquitous in modern computer vision, but many of our participants work in specialized domains for which ImageNet pretraining may not be appropriate. Domain-specific pretraining requires an understanding of representation learning. The concept of image embeddings is also useful for understanding many common computer vision algorithms (e.g. metric learning) and visualization techniques (e.g. t-SNE).
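Distances in embedding space can be sketched with a few hand-written vectors. The embeddings below are invented three-dimensional values with hypothetical names. In practice embeddings come from a trained network and usually have far more dimensions, and cosine distance is only one of several common choices.

```python
import math


def cosine_distance(a, b):
    """1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical embeddings of labeled images.
gallery = {
    "zebra_01": [0.9, 0.1, 0.0],
    "zebra_02": [0.8, 0.2, 0.1],
    "giraffe_01": [0.1, 0.9, 0.2],
}
query = [0.85, 0.15, 0.05]  # hypothetical embedding of an unlabeled image

distances = {name: cosine_distance(query, emb) for name, emb in gallery.items()}
nearest = min(distances, key=distances.get)

for name, distance in sorted(distances.items(), key=lambda item: item[1]):
    print(f"{name}: {distance:.4f}")
print("nearest:", nearest)
```

Images whose embeddings lie close together are treated as similar, which is the idea underlying both metric learning and embedding visualizations.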

b.4 Other Skills

b.4.1 Critically Reading Machine Learning Papers

Content: Understanding machine learning terminology and paper structure, critically interpreting claims, evaluating complexity vs. performance trade-offs.

Motivation: Exploring the literature in a new field is always daunting. This is particularly challenging in machine learning where papers may be over-enthusiastically written, necessitating extra vigilance from the reader to clearly understand the drawbacks and benefits of a method [27].

b.4.2 Selecting “Good” Open Source Libraries

Content: Recognizing markers of quality in open source code.

Motivation: There is plenty of open-source computer vision code, but not all of it is reliable or well-maintained. Participants must learn to check indicators of code quality e.g. how many users a library has or how often the developers fix bugs.

b.4.3 Digging into Libraries

Content: Reading documentation, finding the code that handles a certain task, understanding how components of a codebase interact.

Motivation: Computer vision projects depend on numerous complex but (generally) well-documented libraries. It is important to be able to understand the documentation. Sometimes it also becomes necessary to locate and inspect the piece of code being documented (e.g. a function from some library) to understand how it works in detail.

b.4.4 General Troubleshooting

Content: Errors vs. warnings, searching for more information about error messages.

Motivation: Errors and warnings are common when e.g. installing packages or testing new code. One of the most important skills in any programming activity is the ability to use a search engine to understand an error message. This involves locating the appropriate part of an error message to use as a search term, reading through the results, and choosing an appropriate next step.
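The difference between an error and a warning can be demonstrated with Python's standard `warnings` module. In this sketch the `load_scores` function and its messages are hypothetical. The warning lets execution continue, while the raised exception stops it.

```python
import warnings


def load_scores(values):
    if not values:
        # Error: execution of this function stops here.
        raise ValueError("no scores provided")
    if any(value > 1.0 for value in values):
        # Warning: a message is issued, but execution continues.
        warnings.warn("scores above 1.0 found; clipping to 1.0")
    return [min(value, 1.0) for value in values]


# Record warnings so they can be inspected instead of only printed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    clipped = load_scores([0.4, 1.3])

print(clipped)
print(len(caught), "warning(s):", str(caught[0].message))

try:
    load_scores([])
except ValueError as error:
    error_message = str(error)
    print("error:", error_message)
```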

b.4.5 Debugging Python Code

Content: Types of errors, finding the source of an error, print statement debugging.

Motivation: For a given line of code, any number of errors could arise. Understanding the different types of Python errors is helpful for pinpointing the root cause. Print statement debugging is also extremely useful for troubleshooting code running on a remote machine.
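The sketch below shows how one short function can fail with several different Python error types depending on its input, and how a print statement placed before the failing line exposes the offending value. The `mean_box_area` and `describe_failure` helpers and the bounding-box format are hypothetical.

```python
def mean_box_area(boxes):
    """Mean area of boxes given as (x_min, y_min, x_max, y_max) tuples."""
    total = 0.0
    for index, box in enumerate(boxes):
        # Print-statement debugging: show each value and its type before use.
        print(f"[debug] box {index}: {box!r} ({type(box).__name__})")
        x_min, y_min, x_max, y_max = box
        total += (x_max - x_min) * (y_max - y_min)
    return total / len(boxes)


def describe_failure(boxes):
    """Return the name of the exception raised, or 'ok' if none was raised."""
    try:
        mean_box_area(boxes)
    except (TypeError, ValueError, ZeroDivisionError) as error:
        return type(error).__name__
    return "ok"


print(describe_failure([(0, 0, 2, 3)]))    # ok
print(describe_failure([]))                # ZeroDivisionError (empty list)
print(describe_failure([(0, 0, 2)]))       # ValueError (too few values to unpack)
print(describe_failure([(0, 0, "2", 3)]))  # TypeError (string minus int)
```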

Appendix C List of Lectures

  1. Intro and Logistics (Sara Beery)

  2. Dataset Prototyping and Visualization (Jason Parham)

  3. Working on the Cloud (Suzanne Stathatos)

  4. Data Splitting and Avoiding Data Poisoning (Sara Beery)

  5. Training your Model: Deciding on Configurations, Launching, Monitoring, Checkpointing, and Keeping Runs Organized (Benjamin Kellenberger)

  6. Working with Open-Source CV Codebases: Choosing a Baseline Model and Custom Data Loading (Sara Beery)

  7. Evaluation Metrics (Elijah Cole)

  8. Offline Evaluation and Analysis (Sara Beery)

  9. What’s next? Rules of Thumb to Improve Results (Benjamin Kellenberger)

  10. Data Augmentation (Björn Lütjens)

  11. Expanding and Improving Training Datasets with Models: Weak Supervision, Self Supervision, Targeted Relabeling, and Anomaly Detection (Tarun Sharma)

  12. Fair Comparisons and Ablation Studies: Understanding What is Important (Elijah Cole)

  13. Efficient Models: Speed vs. Accuracy (Justin Kay)

  14. Serving, Hosting, and Deploying Models and Quality Control (Jason Parham)

Appendix D List of Reading Groups

  1. Time Series, Spectral Transforms, and Remote Sensing

  2. Data Imbalance & Long Tail Distributions

  3. Weak Supervision, Unsupervised Learning, Fine-tuning & Transfer Learning

  4. Bias & Domain Shift and Generalization