Machine Learning for Protecting the Oceans
January 14, 2019
Our oceans help feed our planet, support millions of jobs around the world, generate trillions of dollars in economic activity every year, and support most of the world’s life. But the health of our oceans is under threat from unsustainable practices like illegal fishing in protected areas, while human, arms, resource and drug smuggling on the ocean threatens the safety and health of communities across the globe.
 
To protect our oceans and communities against these threats, Vulcan developed Skylight: a platform to monitor maritime domain activity on a global scale. Skylight combines signals from vessel transponders, satellite imagery, radar and other sensors to find patterns across these data sources, detect unusual activity in those patterns, and send real-time alerts to enforcement agencies so they can take action against bad actors.
 
Machine Learning (ML) enables the Skylight platform to process these signals and convert them into actionable alerts. We are building and deploying ML models throughout our platform to tackle a wide range of tasks: detecting and classifying vessels in imagery, matching vessel data across different data sources, detecting vessel activities in real time, detecting likely vessel transshipment events, assigning transshipment confidence scores to vessel encounters, and using transponder data to decide when and where to collect additional satellite imagery.
 

Machine Learning on Satellite Imagery


Satellite imagery is expensive to collect and may be hampered by orbital coverage or bad weather - so why use this data source in the first place?
 
An Automatic Identification System (AIS) transponder is a location-publishing device installed on a vessel that automatically reports the vessel's position. Originally deployed for ship-to-ship collision avoidance, the system has since been adopted as a way to track vessel movements via ground stations and orbiting satellites. However, many AIS transponders are user-configured and easy to tamper with: users may spoof vessel identification numbers, disable the device, or use GPS interference devices to report false locations. And while the International Maritime Organization (IMO) requires AIS on ships of 300 gross tonnage and above on international voyages, that requirement leaves out many smaller vessels that may be of interest to maritime enforcement authorities.

 

Figure 1: Small vessels like these Philippine bancas do not carry AIS transponders but can be detected by satellite imagery.
 
Therefore, a core feature of Skylight is to augment AIS transponder data with visible and radar satellite imagery to find vessels that do not have an AIS transponder or have disabled it. But a single satellite image of a patch of ocean may cover hundreds of square kilometers, most of it empty water, which makes it too large and tedious for a person to search and annotate by hand.
 
We have built a satellite image processing pipeline that does this job and integrates with the Skylight platform. The first stage of the pipeline is a vessel detection ML model which finds all the vessels in an image, extracts their positions, and extracts additional features like vessel length and width. The second stage runs each detection through a vessel type classifier ML model which assigns a probability to each possible type for the detected vessel (e.g. fishing, passenger, cargo). The third stage uses that information to find a likely matching vessel in our vessel transponder database. The results are then passed to the Skylight user interface, where maritime analysts can browse the satellite imagery and the vessels detected and identified in each image.
 

Figure 2: Vessel Detection and Classification pipeline.
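
To make the third stage more concrete, here is a minimal sketch of one way a detection could be paired with a transponder report: pick the nearest AIS position within a distance cutoff. This is an illustration only, not Skylight's actual correlation logic. The function names, the tuple layout, and the 1 km cutoff are all assumptions, and a real matcher would also need to account for the time difference between the image and the AIS report, as well as vessel size and type.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def match_detection(detection, ais_positions, max_km=1.0):
    """Return the id of the nearest AIS position within max_km, or None.

    detection:      (lat, lon) of a vessel found in the image (hypothetical layout)
    ais_positions:  iterable of (vessel_id, lat, lon) reported near image time
    max_km:         assumed cutoff; beyond it the detection is treated as unmatched
    """
    best_id, best_km = None, max_km
    for vessel_id, lat, lon in ais_positions:
        d = haversine_km(detection[0], detection[1], lat, lon)
        if d <= best_km:
            best_id, best_km = vessel_id, d
    return best_id


reports = [("A", 10.001, 120.001), ("B", 10.5, 120.5)]
print(match_detection((10.0, 120.0), reports))  # nearest report within the cutoff
print(match_detection((0.0, 0.0), reports))     # no report nearby
```

A detection that comes back unmatched is the interesting case here: it is a candidate for a vessel with no transponder or with its transponder switched off.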
 
Training the models for vessel detection, classification and correlation has presented several challenges. Gathering training data was the first: until recently most commercially available satellite imagery has been focused over land or at the shoreline, so we had to task our own satellite imagery collections to gather enough training data over the open ocean. Cloud cover is a perennial problem when collecting imagery over the ocean, and it complicates training the detector as well. Additionally, different satellites offer different ground sample resolutions, spectral bands, and degrees of post-processing, which makes it harder to train a detector that must find vessels ranging from a few meters to several hundred meters in length.
 
In the process we have also had a few surprising false positives identified by our detector - including whales in the Pacific Ocean, oil patches in the Gulf of Mexico, and even airplanes in flight over the open ocean. Whitecaps, wakes and imagery that contains land are additional challenges for training the detector. Segmentation of multiple vessels tied together is required for many images. Preprocessing the images and managing varying image contrasts due to different times of day, different satellites, or extreme latitudes are all challenges for building models that are robust against false detections.
 
Figure 3: Among sources of false positives are airplanes in flight over the ocean.
 

Machine Learning on Time Series Data


Our team is also training and deploying models that operate on time series data. In this case the models are primarily trained on the AIS transponder data to do tasks like identifying vessel type and vessel activity. Whereas the satellite imagery training data can be both scarce and expensive, here the challenge is often dealing with the large amount of unlabeled data. The image below shows an example of vessel tracks over a small section of ocean for a collection of vessels and buoys over a few days.
 
Figure 4: Vessel and buoy tracks from a few days over a small area of the ocean.
 
While the image above looks like a spaghetti accident, the movements of individual vessels tend to have regularities which lend themselves to building ML models that can identify either the current activity of a vessel or the type of vessel based on the movement of the vessel over time. In the image below even a novice can likely identify sections of the track which correspond to transits versus areas where this vessel is fishing.
 

Figure 5: The track of a fishing vessel clearly shows fishing versus transiting activity. We can use ML models to compute how much fishing activity is occurring in an area or send real-time alerts to enforcement authorities about suspicious activity in protected areas.
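
The regularity a novice can spot by eye is largely a difference in speed and straightness. As a rough illustration, the sketch below labels each leg of a track as "transit" or "slow" from the speed implied by consecutive position reports. It is a simple threshold heuristic, not the ML models described in this post. The function names, the track layout, and the 5 knot threshold are assumptions chosen for the example.

```python
import math

EARTH_RADIUS_NM = 3440.065  # mean Earth radius in nautical miles


def distance_nm(lat1, lon1, lat2, lon2):
    """Great-circle distance in nautical miles between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(a))


def label_segments(track, transit_knots=5.0):
    """Label each leg between consecutive fixes by its implied speed.

    track: list of (unix_seconds, lat, lon), sorted by time (hypothetical layout)
    transit_knots: assumed threshold separating transit from slow movement
    Returns one label per consecutive pair of fixes.
    """
    labels = []
    for (t0, lat0, lon0), (t1, lat1, lon1) in zip(track, track[1:]):
        hours = (t1 - t0) / 3600.0
        if hours <= 0:
            # Duplicate or out-of-order timestamps give no usable speed.
            labels.append("unknown")
            continue
        knots = distance_nm(lat0, lon0, lat1, lon1) / hours
        labels.append("transit" if knots >= transit_knots else "slow")
    return labels


track = [(0, 0.0, 0.0), (3600, 0.0, 0.2), (7200, 0.0, 0.22)]
print(label_segments(track))
```

A "slow" leg is only a candidate for fishing: drifting, anchoring and waiting outside a port look the same at this level, which is part of why a learned model over richer features is needed.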
 
Some of the challenges we face in this domain are: building tools to help us view, annotate and inspect ML-generated labels of vessel tracks, working with the uneven and sparse reporting frequencies of vessel transponders, and choosing the appropriate features and representation of tracks to use as input to our models. Compared to the image-based ML models, building ML models for the time series data requires more time spent on feature engineering. Another challenge is determining the ground truth of an activity when there is sparse data for a vessel. Often it is unclear even to maritime domain experts what a vessel is doing based on the transponder data alone, which makes it harder to train models to handle those uncertainties.
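
One common way to cope with uneven reporting intervals is to resample a track onto a regular time grid before computing features. The sketch below does this with plain linear interpolation of a single value (for example latitude, or speed). It is an assumption about how such preprocessing might look, not a description of Skylight's feature pipeline, and the function name and arguments are hypothetical.

```python
def resample(times, values, step):
    """Linearly interpolate values onto a regular grid.

    times:  strictly increasing timestamps in seconds
    values: one measurement per timestamp
    step:   grid spacing in seconds
    Returns a list of (time, value) pairs from times[0] up to times[-1].
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two matching samples")
    if step <= 0:
        raise ValueError("step must be positive")

    out = []
    i = 0  # index of the sample at or before the current grid time
    t = times[0]
    while t <= times[-1]:
        # Advance to the pair of samples that brackets t.
        while times[i + 1] < t:
            i += 1
        t0, t1 = times[i], times[i + 1]
        frac = (t - t0) / (t1 - t0)
        out.append((t, values[i] + frac * (values[i + 1] - values[i])))
        t += step
    return out


# Reports at 0 s, 10 s and 40 s, resampled every 10 s.
print(resample([0, 10, 40], [0.0, 1.0, 4.0], 10))
```

Interpolation fills long silences with invented points, so in practice it makes sense to resample only within stretches of reasonably dense reporting. Linear interpolation of longitude also breaks for tracks that cross the antimeridian.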
 
We are also working on more advanced models in this area: identifying anomalous versus non-anomalous "gaps" in transponder data, predicting future vessel activity, and identifying coordinated activity based on the behavior of a single observable vessel. Another area of research is incorporating multi-modal data, graph and network-based features into our models.
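
Before a model can judge whether a gap is anomalous, the gaps themselves have to be extracted from the report stream. A minimal sketch of that first step follows. The function name and the one-hour silence threshold are assumptions for illustration, and deciding which gaps are suspicious (as opposed to, say, poor satellite coverage) is the part left to the models.

```python
def find_gaps(timestamps, max_silence_s=3600):
    """Return (start, end, duration) for each silence longer than max_silence_s.

    timestamps:    report times in seconds, sorted ascending
    max_silence_s: assumed threshold above which a silence counts as a gap
    """
    gaps = []
    for t0, t1 in zip(timestamps, timestamps[1:]):
        if t1 - t0 > max_silence_s:
            gaps.append((t0, t1, t1 - t0))
    return gaps


# Reports every 10 minutes, with one long silence in the middle.
print(find_gaps([0, 600, 1200, 9000, 9600]))
```

Each gap's start and end positions, duration, and location relative to protected areas are the kind of features a downstream classifier could then use.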
 

It's Only the Beginning


This is only a taste of some of the ways we are using machine learning to protect the oceans and the people that depend on them. We are constantly improving the models mentioned above and developing new ones. Through the power of machine learning and integration with the Skylight platform we can build a more complete picture of activity on the ocean and help drive enforcement actions to the most critical threats.
 
 
About the Author
Dave M.
Principal Software Engineer
Dave is a Principal Software Engineer at Vulcan. He led the design of the Skylight architecture and the integration of ML models into the Skylight platform. Prior to joining Vulcan, Dave worked for 11 years in the e-commerce and computational marketing space, focusing on distributed systems and data analysis. He is thrilled to put those skills to use in the conservation and philanthropy space. He has a Master's Degree in Computer Science from Georgia Institute of Technology.

Category Tags
Machine Learning
ML4Good
Ocean Health
Remote Sensing