At Vulcan, we are committed to continuing Paul Allen’s legacy of championing African wildlife conservation. So far, we have been investigating the use of ML on aerial imagery to spot, from the sky, activities that are illegal in protected African parks, such as poaching, logging, and cattle grazing. We hope that our ML system will help a park’s rangers survey the massive expanses of land they are charged with protecting and deploy their resources more efficiently. We are hopeful that having an “eye in the sky” will help these teams stop the people who are trying to kill animals like elephants, rhinos, and giraffes, or otherwise do any damage to these special places.
The Role of ML
In our ideal deployment scenario, aerial imagery would be streamed back to a ground station as it is captured. There, a human would watch the footage and look for suspicious activity in the park. Since a flight can last many hours, fatigue is bound to set in. Probably 99% of the time there will be nothing of interest in the video feed, so you can imagine how hard it would be to stay alert. Then, when something of interest does come into view, it may only be on screen for a brief moment as the aircraft flies over. That object could also be a person trying very hard to blend in with their surroundings so that they don’t get caught, or a rhino hiding in the shade of a tree. Any one of these factors makes the job difficult, but combined, they can make spotting these objects reliably almost impossible. That’s where machine learning can help. I am developing and training ML models to make this person’s life easier by automatically detecting and flagging objects of interest in the video feed. The humans would then get to decide whether or not to send out rangers to investigate further.
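To make the flagging idea concrete, here is a minimal sketch of that loop. Everything below is an illustrative assumption, not Vulcan’s actual code: the `Detection` schema, the detector interface, and the confidence threshold are all invented for this example, and a toy stand-in plays the role of the real model.

```python
# Hypothetical sketch of flagging frames in a video feed.
# The detector interface and threshold are assumptions for illustration.
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Detection:
    label: str          # e.g. "human", "elephant", "vehicle"
    confidence: float   # model score in [0, 1]
    box: Tuple[int, int, int, int]  # (x, y, width, height) in pixels

def flag_frames(frames, detect: Callable[[object], List[Detection]],
                threshold: float = 0.5):
    """Run the detector on each frame; keep frames with at least one
    detection above the confidence threshold for human review."""
    flagged = []
    for i, frame in enumerate(frames):
        hits = [d for d in detect(frame) if d.confidence >= threshold]
        if hits:
            flagged.append((i, hits))
    return flagged

# Toy stand-in detector: each "frame" here is just a pre-baked detection list.
def toy_detector(frame):
    return frame

frames = [
    [],                                                   # nothing of interest
    [Detection("termite_mound", 0.3, (10, 10, 5, 5))],    # below threshold
    [Detection("human", 0.9, (40, 22, 8, 16))],           # flagged for review
]
print(flag_frames(frames, toy_detector))
```

In practice the per-frame detection would come from a trained object detector; the point here is only that low-confidence detections are suppressed and the rest are surfaced to the human operator.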
Currently, our ML system supports detection in infrared (IR) and visual spectrum images. Most poaching happens at night, when it’s easier to move around undetected (it’s also easier to be caught unaware by a charging water buffalo, but that’s a risk poachers seem willing to take). IR allows us to provide an eye in the sky even at night. This will hopefully give rangers more visual context before they decide to send people into a potentially dangerous situation in complete darkness.
An IR image containing humans, a vehicle and a campfire.
Feedback to Improve Learning
One of the most time-consuming tasks in the supervised ML world is annotating data. We spent months collecting and sending images to Samasource for annotation. (If you haven’t heard of Samasource, they are worth checking out. They are an impact sourcing non-profit that provides job training and experience for unemployed people in some of the most impoverished areas of the world by outsourcing digital work.) Even once we have our images annotated, we continue to encounter many challenges regarding how representative our training data is of what will be seen during missions. While we’ve given our models a good baseline idea of what people, animals, vehicles, etc. look like in IR and visual spectrum from an aircraft, if we were to start collecting imagery more regularly in the parks, we could begin to supplement our training with data that is real and specific to the parks and the behaviors taking place there.
Even more ideal would be to leverage our expert-in-the-loop to avoid the original bottleneck of third-party annotation. The rangers operating the cameras from the ground station will be an invaluable asset in improving the ML system because they will have the ability to correct the model’s annotations. If someone will be sitting there watching the screen anyway in the beginning (stay tuned for a future blog post on how we’re staging some of our R&D plans to not rely on a human in the loop), we can put their human brain power to use. As the video footage plays for the user, the model will pop up bounding boxes around objects it thinks are of interest. Inevitably, there will be mistakes. A ranger will certainly know, for instance, that even though the warm termite mound buzzing with activity is vaguely human-shaped, it is not, in fact, a human, and they will be able to give that feedback to the system.
These images show some ML system misclassifications. From left to right: 1) A person correctly detected, but their boat was missed; 2) a tree in IR incorrectly labeled as a human; 3) three inner tubes incorrectly labeled as elephants because the model has never seen inner tubes before; 4) one adult cow that has been correctly labeled but the baby was missed. Our experts-in-the-loop would be able to give feedback to correct these.
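One way to record that kind of feedback is as a small set of correction types covering the failure modes above: confirming a correct box, rejecting a false positive, relabeling a box, or adding a box the model missed. The schema below is a hedged sketch; the verdict names and data layout are my own assumptions for illustration, not the system’s actual format.

```python
# Illustrative sketch of recording ranger corrections; the schema is assumed.
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple

class Verdict(Enum):
    CONFIRM = "confirm"   # model was right
    REJECT = "reject"     # false positive, e.g. a warm termite mound
    RELABEL = "relabel"   # right box, wrong class (inner tube, not elephant)
    ADD = "add"           # model missed it; ranger draws a new box

@dataclass
class Correction:
    frame_id: int
    box: Tuple[int, int, int, int]   # (x, y, w, h)
    verdict: Verdict
    label: Optional[str] = None      # corrected or added class, if any

def to_training_examples(corrections):
    """Turn a correction session into annotations for retraining,
    dropping rejected boxes so false positives don't become ground truth."""
    keep = []
    for c in corrections:
        if c.verdict == Verdict.REJECT:
            continue
        keep.append((c.frame_id, c.box, c.label))
    return keep

session = [
    Correction(17, (40, 22, 8, 16), Verdict.CONFIRM, "human"),
    Correction(17, (60, 30, 6, 6), Verdict.REJECT),           # termite mound
    Correction(18, (12, 40, 20, 9), Verdict.ADD, "boat"),     # missed boat
]
print(to_training_examples(session))
```

The design choice worth noting is that rejections are just as valuable as confirmations: they could also be kept separately as hard negatives, rather than simply dropped as they are in this sketch.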
After they watch and simultaneously correct a real mission flight, a new set of annotated data would be ready to use for training! This real-time, human/computer collaborative annotation will allow for a faster turnaround in building versions of our generic model that are fine-tuned for each specific location where we want to deploy the ML system.
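The per-park fine-tuning step might be organized along these lines. This is a sketch only: `fine_tune` is a stub standing in for real training, and the park names, tuple layout, and function names are invented for the example; the only idea taken from the text is grouping mission annotations by location and fine-tuning a copy of the generic model on each group.

```python
# Hedged sketch: per-park fine-tuning of a generic model. "fine_tune" is a
# stub; a real version would continue training on the park-specific data.
from collections import defaultdict

def group_by_park(examples):
    """examples: (park_name, frame_id, box, label) tuples from missions."""
    parks = defaultdict(list)
    for park, *annotation in examples:
        parks[park].append(tuple(annotation))
    return parks

def fine_tune(generic_model, park_examples):
    """Stub standing in for a training run on one park's annotations."""
    return {"base": generic_model, "num_examples": len(park_examples)}

mission_data = [
    ("park_a", 17, (40, 22, 8, 16), "human"),
    ("park_a", 18, (12, 40, 20, 9), "boat"),
    ("park_b", 3, (5, 5, 10, 10), "elephant"),
]

models = {park: fine_tune("generic-v1", examples)
          for park, examples in group_by_park(mission_data).items()}
print(models["park_a"]["num_examples"])  # 2
```

Keeping one generic base model and branching a fine-tuned copy per park matches the deployment picture above: each location gets a model adapted to its own terrain, animals, and behaviors.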
The rangers will be able to see their corrections reflected in the later iterations of the model. Their trust in the technology will hopefully increase as we continue to listen and learn from them. While we have a ways to go before this is a complete solution (for example, we don’t want loud aircraft to let bad actors know that we are coming or to disturb the animals), we’re excited to continue to explore ways to progress toward our common goal of more effective wildlife conservation.