Occurrence records in the ALA can be filtered using a flag called geospatial_kosher. We use this term to refer to records that we consider spatially valid. The flag summarises a collection of basic tests applied to the spatial data components of the record.
The flag is stored with the record so that spatial applications, such as the Spatial Portal, can quickly eliminate records that are likely to have severely inaccurate spatial information (e.g. incorrect latitude or longitude).
What makes a record spatially valid?
There's a list of tests. If the record fails any one of these, it's not considered to be spatially valid (geospatially kosher):
- Geospatial issue. If there is a user-annotated geospatial issue attached to the record, then the record fails. This status can change if the issue is resolved and removed. Example: "Geospatial issue" (note that not all examples actually find something in the ALA worth reporting)
- Taxonomic issue. If there is a user-annotated taxonomic issue attached to the record, then the record fails. This test is included because users flag what may be a geospatial issue as a taxonomic issue, depending on what they see in the record. "This is a kangaroo in the middle of the ocean, well it can't be a kangaroo." Example
- Supplied coordinates are zero. If both the supplied latitude and longitude are zero, the record fails. Example
- Coordinates are out of range for species. If the latitude or longitude is out of range (-90 to 90 for latitude, -180 to 180 for longitude), the record fails. The "for species" part of the name is a red herring in this case: the test checks only the global coordinate range. Example
- Decimal latitude/longitude conversion failed. If we cannot convert the supplied decimal latitude and longitude to the WGS84 datum (EPSG 4326) that the ALA uses, the record fails. This may be because the supplied position cannot be unambiguously translated to WGS84 or because the supplied position is invalid. Example
- Unparseable verbatim coordinates. If there isn't a decimal latitude and longitude supplied and if we cannot convert the text (verbatim) latitude and longitude into a decimal form, then the record fails. Example
- Unable to convert UTM coordinates. If we only have an easting and northing and cannot convert these to a latitude and longitude, the record fails. Example
Most of these tests rely on the record having some sort of coordinate location - a latitude and longitude or a grid reference - something that would allow you to find the record on a map. If a record doesn't have any coordinate-based location, then the record is neither spatially valid nor spatially invalid. Examples of this case include records where the location is just a town name, where there is only a text description of the location, or where the record is for a drawing showing the characteristics of a species.
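The simpler tests in the list above can be sketched in a few lines of Python. This is a minimal illustration only, not the ALA's implementation: the function name `spatial_validity` and its parameters are hypothetical, the conversion and parsing tests are left out, and treating a record without coordinates as "not present" before looking at user annotations is an assumption.

```python
from typing import Optional


def spatial_validity(lat: Optional[float], lon: Optional[float],
                     has_user_issue: bool = False) -> Optional[bool]:
    """Hypothetical sketch of the three-valued flag: True, False or None."""
    # No coordinate-based location: neither valid nor invalid (assumed order).
    if lat is None or lon is None:
        return None
    # A user-annotated geospatial or taxonomic issue fails the record.
    if has_user_issue:
        return False
    # Supplied coordinates are zero: both latitude and longitude are zero.
    if lat == 0 and lon == 0:
        return False
    # Coordinates out of range: -90..90 for latitude, -180..180 for longitude.
    if not (-90 <= lat <= 90) or not (-180 <= lon <= 180):
        return False
    return True


print(spatial_validity(-23.7, 133.87))  # passes every test in this sketch
print(spatial_validity(0.0, 0.0))       # fails the zero-coordinates test
print(spatial_validity(None, None))     # no coordinates, so no flag value
```

Returning `None` rather than `False` for records without coordinates keeps the "not present" case distinct, which matters when the flag is used in filters.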
What else do we check for?
There are a number of flags of lesser severity that relate to spatial validity.
To use these checks, you need to enable the Assertions facet in the filter menu. Click on Customise Filters near the top left of the page and ensure that Assertions > Record Issues has been ticked. Then, once you have run an initial query, you will be presented with a list of issues in the Narrow Your Results panel. Most example searches are for the genus Acacia, although there are some where we have to try different queries.
- Coordinates centre of country. The coordinates are exactly at the centre of the country the record is in. Occasionally a legitimate record may trigger this test but, for the most part, it indicates a record where some upstream processor has put in assumed coordinates when only the country was supplied as a location. In the ALA, for example, there are some very lost looking creatures near Alice Springs. The "centre of the country" is defined in the ALA as the centre point of the bounding box of the country; this may not match the definition that the upstream processor is using. Example
- Supplied coordinates centre of state. The coordinates are exactly at the centre of the state or province the record is in. This test follows a similar pattern to the country test, above. Example
- Suspected outlier. The record lies outside the expected environment of the observed species (taxon). The ALA runs the reverse jackknife algorithm to detect records that are in locations where the environment is very different from most other records of that species. If the record exceeds a threshold on more than one (check) of the five environmental factors, then it is marked as a potential outlier. Example (this doesn't return any records at present)
- Outside expert range for species. If we have an expert-compiled species spatial range and the record location is outside that range, the record is flagged. Expert-defined ranges are currently available for a limited number of species such as birds and fish. Example (again, this doesn't return any records at the moment)
- Zero latitude. The latitude is zero. It's possible, but unlikely, that the observation is exactly on the equator. Example
- Zero longitude. The longitude is zero, for the same reason as zero latitude. Example
- Habitat mismatch. The location of the observation does not match the habitat of the species (terrestrial or marine). Currently this test uses IRMNG to detect whether the species is marine, but limits on the accuracy of the observation location and of the coastline (which is used without a spatial buffer) mean occasional false positives: a marine species on land or vice versa. Example
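As a rough illustration of the simpler checks in this list, the sketch below flags zero latitude, zero longitude and coordinates that sit exactly on the centre point of a country's bounding box. The function names, the flag labels and the bounding box values are all hypothetical and chosen only for the example; the sketch also assumes the bounding box does not cross the antimeridian.

```python
from typing import List, Tuple

# (min_lat, min_lon, max_lat, max_lon); illustrative values, not the ALA's.
BoundingBox = Tuple[float, float, float, float]


def bbox_centre(bbox: BoundingBox) -> Tuple[float, float]:
    """Centre point of a bounding box, as (latitude, longitude)."""
    min_lat, min_lon, max_lat, max_lon = bbox
    return ((min_lat + max_lat) / 2, (min_lon + max_lon) / 2)


def lesser_flags(lat: float, lon: float, country_bbox: BoundingBox) -> List[str]:
    """Return the lower-severity flags (hypothetical labels) a record raises."""
    flags = []
    if lat == 0:
        flags.append("zero latitude")
    if lon == 0:
        flags.append("zero longitude")
    # Exact match only: the coordinates must sit precisely on the centre point.
    if (lat, lon) == bbox_centre(country_bbox):
        flags.append("coordinates centre of country")
    return flags


example_box = (-44.0, 112.0, -10.0, 154.0)  # made-up box for the example
print(lesser_flags(-27.0, 133.0, example_box))
print(lesser_flags(0.0, 133.0, example_box))
```

Unlike the spatial validity tests, these flags are collected rather than used to fail the record outright, which reflects their lesser severity.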
How is the flag used?
There are several ways of using the flag. Keep in mind that the flag has three possible values: true, false or not present. True is not strictly the opposite of false. In some applications, a spatially valid test is automatically added to any query, since it makes no sense to try and display spatially unusable records. The examples at the end of each entry are all searches for the genus Acacia, so that you can see how the filters work.
geospatial_kosher:true - shown in the biocache as Spatial validity: Spatially valid - is the general test used by spatial services, such as the spatial portal, to only include records that have usable latitude/longitude coordinates. Example
geospatial_kosher:false - shown in the biocache as Spatial validity: Spatially invalid - shows records that have been supplied with coordinates but where there is something seriously wrong with the information provided. Example
-geospatial_kosher:true - shown in the biocache as excluded Spatial validity: Spatially valid - shows records that are not explicitly spatially valid, both those which have failed coordinate checks and those without coordinate locations. Example (This query contains a lot more records than the previous example.)
-geospatial_kosher:false - shown in the biocache as excluded Spatial validity: Spatially invalid - shows records that haven't failed their coordinate checks; this filter may be useful if you want all records except those with something obviously wrong with them. Example
geospatial_kosher:* gives you everything that has a testable set of coordinates, whether they have passed or failed. Example
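Because the flag has three possible values, the five filters select different subsets of records. The following sketch simulates that behaviour in memory; it does not call the biocache, and the record names and the `matching` helper are hypothetical. `None` stands for "not present".

```python
from typing import Callable, Dict, List, Optional

# Hypothetical records: flag value is True, False or None (not present).
records: Dict[str, Optional[bool]] = {
    "valid": True,
    "invalid": False,
    "no_coordinates": None,
}

# Each filter from the text, expressed as a predicate on the flag value.
filters: Dict[str, Callable[[Optional[bool]], bool]] = {
    "geospatial_kosher:true": lambda v: v is True,
    "geospatial_kosher:false": lambda v: v is False,
    "-geospatial_kosher:true": lambda v: v is not True,
    "-geospatial_kosher:false": lambda v: v is not False,
    "geospatial_kosher:*": lambda v: v is not None,
}


def matching(filter_text: str) -> List[str]:
    """Names of the records selected by the given filter, sorted."""
    predicate = filters[filter_text]
    return sorted(name for name, value in records.items() if predicate(value))


for filter_text in filters:
    print(filter_text, matching(filter_text))
```

The excluded forms pick up the record without coordinates, which is why -geospatial_kosher:true returns more records than geospatial_kosher:false.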
How do I get the old style spatially valid filter?
The older spatially valid filter included the tests mentioned above in "What makes a record spatially valid?" and all the tests in "What else do we check for?" These tests have been relaxed, since they were eliminating records where validity was more a matter of opinion than a clear spatial issue.
To get the old style of filter back, you can use the Record Issues filter and exclude any records that fail one of the checks.
You can always add these tests to a query by copying and pasting the following filter onto the query URL, before the #mapView bit.
This is a bit long but tells the biocache to exclude anything that has failed any one of the tests. Here is a complete example. Here is another example that finds every record in the ALA that fails at least one of the tests; if you have a look at the Record Issues list, you'll see that these tests aren't the only problems these records have.
Is geospatial_kosher a good name?
No, it's not, for two reasons:
- It's hard to understand what it actually means, since "kosher" in this sense is an informal term meaning genuine or authentic. Unless you have a wide vocabulary, it's likely to be misunderstood.
- Problems and issues should generally be marked with a positive flag, along the lines of spatially_suspect=true. Using this convention, records without a value can be assumed to pass.
However, a large array of the ALA's software assumes the presence of this field, so we're stuck with it for now. As we develop better data quality metrics, we are likely to start using terms like