Today, we’ve changed the Seen Nearby label on suggestions to Expected Nearby. The label comes from predictions made by the iNaturalist Geomodel that we’re introducing for the first time.
What is the Geomodel?
Most of you are familiar with the iNaturalist Computer Vision Model, which takes an image as input and returns the most likely species based on visual similarity as output. We train that model on a set of about 80 thousand species with enough data and update it monthly (we released version 2.7 today).
The iNaturalist Geomodel takes a location as input and returns the most likely species at that location as output. Like the Computer Vision Model, it is a Deep Learning model trained on the same set of taxa and updated on the same monthly schedule. We developed and published the Geomodel in collaboration with the same Visipedia team that assisted with the iNaturalist Computer Vision Model. The map below shows Geomodel predictions for American Pika. The Geomodel is trained only on iNaturalist observations and an elevation map.
From Gridded Observations to Geomodel Predictions
iNaturalist has been using the Geomodel to weight Computer Vision suggestions since June of this year. We started using the Geomodel to apply the Expected Nearby label today.
Previously, we used a gridded version of the raw observations to weight Computer Vision suggestions and apply the Nearby label. We tallied the relative number of observations of each species on a 1-degree grid. If there were any observations of the taxon in the surrounding 9 grid cells, we applied the Seen Nearby label to suggestions. We used the relative number of observations in the grid cells to weight the Computer Vision suggestions. Note the grid cell for Mexican Treehopper in southern Brazil, which is likely due to a misidentified observation.
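The gridded approach described above can be sketched in a few lines of Python. This is a minimal illustration, not the production implementation: the function names are hypothetical, and it assumes that "the surrounding 9 grid cells" means the cell containing the location plus its 8 neighbors.

```python
import math
from collections import Counter

def grid_cell(lat, lon):
    """Return the 1-degree cell (row, col) containing a point.
    Longitude is normalized so that columns fall in [-180, 179]."""
    return (math.floor(lat), (math.floor(lon) + 180) % 360 - 180)

def build_grid(observations):
    """Count observations per (taxon, cell).
    observations: iterable of (taxon, lat, lon) tuples."""
    counts = Counter()
    for taxon, lat, lon in observations:
        counts[(taxon, grid_cell(lat, lon))] += 1
    return counts

def neighborhood(cell):
    """The cell plus its 8 neighbors (assumed meaning of '9 grid cells'),
    wrapping around the antimeridian."""
    row, col = cell
    return [(row + dr, (col + dc + 180) % 360 - 180)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)]

def seen_nearby(counts, taxon, lat, lon):
    """True if any observation of the taxon falls in the 9-cell neighborhood."""
    return any(counts[(taxon, c)] > 0 for c in neighborhood(grid_cell(lat, lon)))
```

A single misidentified observation is enough to switch on a whole neighborhood of cells here, which is the weakness the Mexican Treehopper example illustrates.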
We’re now using the Expected Nearby Map predicted by the Geomodel to apply the Nearby label. You can think of the Expected Nearby Map as an estimate of whether the species is present near the location.
The change in name from Seen Nearby to Expected Nearby is intended to make it clear that the label comes from a model prediction rather than a grid of observations. Note that these predictions aren’t perfect. For example, Mexican Treehopper probably doesn’t occur in the Galapagos or Cuba despite the predictions. For some species the Geomodel performs remarkably well, while for others predictions have very high error. Work to better understand these and experiment with improvements is ongoing. But as we show below, on average the Geomodel improves upon the 1-degree grid approach it replaces and we expect continued improvements with future Geomodel versions.
We use an Unthresholded version of the Expected Nearby Map to weight Computer Vision suggestions. You can think of the Unthresholded Map as the relative probability that a species occurs at a location.
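To make the idea of weighting concrete, here is a minimal sketch. The exact way the scores are combined is not described in this post, so the multiply-and-renormalize rule, the `floor` parameter, and the function name are all assumptions made for illustration.

```python
def weight_scores(vision_scores, geo_scores, floor=1e-6):
    """Combine per-taxon vision scores with per-taxon geo scores.

    vision_scores: dict of taxon -> Computer Vision score
    geo_scores:    dict of taxon -> Unthresholded Map value at the location
    floor:         hypothetical minimum geo weight, so that a taxon is
                   down-weighted rather than eliminated outright
    """
    combined = {
        taxon: score * max(geo_scores.get(taxon, 0.0), floor)
        for taxon, score in vision_scores.items()
    }
    total = sum(combined.values())
    if total == 0:
        # Nothing to renormalize; fall back to the raw vision scores.
        return dict(vision_scores)
    return {taxon: value / total for taxon, value in combined.items()}
```

In this sketch, a taxon that looks slightly less likely from the image alone can still become the top suggestion if it is much more likely at the location.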
You can explore the Expected Nearby Maps and the Unthresholded Maps we use to weight Computer Vision suggestions on new Geomodel prediction pages, which we’ve linked from the taxon pages of all of the approximately 80,000 included species.
Why the Geomodel and Next Steps
We transitioned from the 1-degree gridded data to the Geomodel for four main reasons:
1. Improvements to Computer Vision suggestions
As detailed in the Evaluating the Geomodel section below, the Geomodel improves the accuracy of Computer Vision suggestions compared to the 1-degree grid approach. Version 2.7, released today, is about a 4% improvement over the 1-degree grid approach for Top 1 suggestion accuracy, and we anticipate more accuracy gains with future Geomodel versions as we refine the modeling approach and as more observations are uploaded.
2. Future direction: Fast/offline geospatial information
The number of Geomodel parameters is less than 2% of the size of the 1-degree grid cell data. This means the Geomodel is small and fast enough to run on a mobile device, as the Computer Vision Model does in Seek. This opens up the potential for including geospatial information in features such as the Seek in-camera suggestions and for displaying taxon maps on mobile devices offline. We haven’t built these features yet, but the Geomodel will make them possible.
3. Future direction: Surfacing unusual observations
As iNaturalist grows, the community needs better tools to surface unusual observations that may represent misidentifications or important discoveries such as a range extension or the early detection of an invasive species.
The figure below shows 2.1 million dragonfly observations ranked by their geographic unusualness as predicted by the Geomodel. The right side of the histogram shows the most unusual 0.01% of observations. We sent these 223 unusual observations to dragonfly expert @dennispaulson to vet. 197 of them (88%) were misidentifications (red bars), such as this Rainpool Spreadwing misidentified as a Slender Spreadwing. The remaining 26 included some legitimately unusual records (white bars), such as this Slaty Skimmer range extension from Colorado.
Some observations in the white bars were unusual to our model but not to @dennispaulson, such as this Highland Meadowhawk from Haiti that the Geomodel thinks is unusual. With more observations and identifications from poorly sampled regions, the accuracy of the Geomodel will improve over time.
Fly expert @zdanko helped with a similar experiment on 500,000 hoverfly observations. As with dragonflies, most of the flagged records were errors: of the 365 most unusual observations, 267 (73%) were misidentifications.
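The ranking step in these experiments can be sketched as follows. This is an illustration only: it assumes each observation already has a Geomodel score for its taxon at its location and that a lower score means a more unusual record. The function name and the input format are hypothetical.

```python
import math

def most_unusual(scored_observations, fraction):
    """Return the most geographically unusual observations.

    scored_observations: list of (observation_id, geo_score) pairs, where
                         geo_score is the assumed Geomodel score for the
                         observed taxon at the observed location
    fraction:            share of observations to flag, e.g. 0.0001 for 0.01%
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    k = math.ceil(len(scored_observations) * fraction)
    # Lowest score first: least expected at that location.
    ranked = sorted(scored_observations, key=lambda item: item[1])
    return ranked[:k]
```

The flagged set is deliberately small so that an expert can review every record in it by hand.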
We’re excited about the potential to build tools around the Geomodel to help more quickly surface these unusual observations for more attention from experts so that misidentifications can be fixed and important discoveries like species range extensions aren’t missed.
4. Future direction: Context about range size
One of the most important characteristics of a species from a conservation perspective is its geographic range size. All other things being equal, small-range species tend to be at much greater risk of extinction than species that are widely distributed. In order to prioritize scarce conservation resources and attention, land managers need tools to distinguish small-ranged local endemics (species that occur nowhere else in the world) from more widely distributed species.
As described in the Evaluating the Geomodel section below, Geomodel predictions of range area are well correlated with the areas of range maps from external sources, such as the Taxon Ranges that appear on some taxon pages.
The figure below shows Geomodel predictions of range area for 10 small-ranged birds from around the world. We hope to build tools around the Geomodel to make it easier to determine which observations belong to small-ranged endemic species in order to help the land management community prioritize these conservation targets.
The Expected Nearby Maps are being rendered on the Geomodel prediction pages at a coarse 1.8 thousand square-kilometer resolution and therefore are not publicly revealing precise information about sensitive species. We continue to improve iNaturalist channels that securely mobilize sensitive species data and precise predictions for conservation purposes.
Evaluating the Geomodel
We have evaluated the Geomodel by measuring:
- Improvements to suggestion accuracy
- Retaining the correct suggestion in the Expected Nearby subset
- Overlap between Expected Nearby maps and Taxon Ranges
1. Improvements to suggestion accuracy
On average, Top 1 suggestion accuracy improved from 75% to 83% (+8%) by weighting the raw Computer Vision scores with the 1-degree grid. Weighting with the Geomodel instead improved Top 1 suggestion accuracy to 87% (+12%). We repeated this analysis within geographic and taxonomic groupings and in all cases the Geomodel outperformed the 1-degree data.
2. Retaining the correct suggestion in the Expected Nearby subset
By default, we only show the subset of Nearby suggestions. This has the advantage of removing suggestions that are unlikely based on location, but there’s also a risk of removing the correct suggestion. We calculated Recall statistics measuring how often the correct suggestion was retained in the Nearby subsets derived from the Geomodel and the 1-degree grid. On average, both approaches yielded the same Recall of 0.94, meaning that for every 100 observations the correct result was included in the Nearby subset 94 times.
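The Recall statistic used here is simple to compute. The sketch below assumes each test case pairs the correct taxon with the set of taxa labeled Nearby at that location; the function name and input format are hypothetical.

```python
def nearby_recall(cases):
    """Fraction of cases where the correct taxon survives the Nearby filter.

    cases: iterable of (correct_taxon, nearby_taxa) pairs, where nearby_taxa
           is the set of taxa labeled Nearby at the observation's location
    """
    cases = list(cases)
    if not cases:
        raise ValueError("at least one case is required")
    hits = sum(1 for correct, nearby in cases if correct in nearby)
    return hits / len(cases)
```

A Recall of 0.94 corresponds to 94 hits in 100 cases under this definition.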
3. Overlap between Expected Nearby maps and Taxon Ranges
To measure how well the Expected Nearby maps match the Taxon Ranges displayed on the iNaturalist taxon pages, we compared the two and calculated Precision and Recall statistics. The Taxon Ranges aren’t perfectly accurate either, so for evaluation purposes we used the subset of around 5,000 Taxon Ranges that contained at least 90% of the observations for the taxon.
We repeated this analysis comparing the 1-degree grids and Geomodel to the Taxon Ranges. The Geomodel predictions improved the average of Precision and Recall. The F1 statistic (the harmonic mean of Precision and Recall) improved by 9% for the Geomodel compared to the 1-degree grid.
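For readers less familiar with these statistics, here is a minimal sketch of how Precision, Recall, and F1 can be computed for one taxon. It assumes both the predicted map and the Taxon Range have been converted to sets of cell identifiers on a shared grid; that representation and the function name are assumptions for illustration.

```python
def precision_recall_f1(predicted_cells, range_cells):
    """Compare a predicted map with a reference range, both given as
    collections of cell identifiers on the same (assumed) grid."""
    predicted, reference = set(predicted_cells), set(range_cells)
    overlap = len(predicted & reference)
    # Precision: share of predicted cells that fall inside the range.
    precision = overlap / len(predicted) if predicted else 0.0
    # Recall: share of the range that the prediction covers.
    recall = overlap / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1: harmonic mean of Precision and Recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Because F1 is a harmonic mean, a map only scores well if it both avoids predicting cells outside the range and covers most of the range.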
The Geomodel also does a better job of matching Taxon Range area than the 1-degree grids as measured by Mean Logarithmic Squared Error (MLSE).
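The exact formula behind this statistic is not given in this post. One common reading, assumed here, is the mean of the squared differences between the natural logarithms of the predicted and reference areas. The function name is hypothetical.

```python
import math

def mean_log_squared_error(predicted_areas, reference_areas):
    """Mean of (ln(predicted) - ln(reference))^2 over paired range areas.

    Assumes all areas are positive and expressed in the same units.
    """
    if len(predicted_areas) != len(reference_areas) or not predicted_areas:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [
        (math.log(p) - math.log(r)) ** 2
        for p, r in zip(predicted_areas, reference_areas)
    ]
    return sum(squared) / len(squared)
```

Working on a log scale means that doubling the area of a small-ranged species is penalized as much as doubling the area of a widespread one, which suits a comparison across taxa whose ranges differ by orders of magnitude.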
Thank you
We want to extend special thanks to our research collaborators, including Oisin Mac Aodha (University of Edinburgh), Elijah Cole (Caltech), Grant Van Horn (UMass Amherst), Christian Lange (University of Edinburgh), Pietro Perona (Caltech), and @tbrooks (IUCN), as well as the generous support from a Climate Change AI 2021-2022 Innovation Grant that helped make this work possible.
We’re excited about the gains in suggestion accuracy the Geomodel is making possible today and the potential for future directions that it opens up for us to pursue in the coming months. Thank you to the entire iNaturalist community for generating all of the observations and identifications that make training powerful models like the Geomodel possible!