ADR: Index AsIngested Coordinates
@chad @gehrmann @Keith_Wall @LFlakes @josh.townsend @lifeiliu @Java1Guy @srabanaguha
- ADR: Index AsIngested Coordinates
- Status
- Background
- Context & Scope
- Tradeoff Analysis
- Proposed solution (to be analyzed and implemented by Shell developers)
- Change Management
- Decision
- Consequences
Status
- Proposed
- Trialing
- Under review
- Approved
- Retired
Background
- Discussed in the OSDU Geomatics Integration workstream and supported by Shell, BP, Exxon and Equinor geomatics representatives.
- Discussed during AAF 2023-06-07 (Josh Townsend has a recording and limited notes).
- Further discussion in issue !95 (closed) (this issue), which refers to related issues:
  - #62 (closed) (1 year ago; reported in M12 that AsIngestedCoordinates are not returned; kept open at the time, with the answer that a Storage GET can be used to retrieve the original record.)
  - Issue 70 on the Geomatics board (only a placeholder pointing to issue 95 for monitoring). It has some interesting comments that really belong here:
    - "The Architectural Advice Forum did not endorse indexing the AsIngested Coordinates as spatial objects that would permit spatial search, but that is not needed or requested. As we discussed, there are at least two options that would allow return of the coordinates from search: (1) Index AsIngested as an ordinary array, or (2) Add data needed for search as extended properties."
    - "Thomas @gehrmann and I discussed and agreed the most robust solution is to index the AsIngested coordinates and CRS as a simple array, not a spatial object."
This ADR write-up by Bert Kampes, at the request of Chad Leong, is intended to give Shell developers a clear picture of the proposed changes and specification, distilled from the sources above. The solution ("way forward") is agreed, but the ADR will not be marked "Approved" until comments on this specification have been received.
Context & Scope
AsIngestedCoordinates are currently not returned by Search; only the Wgs84Coordinates are (produced by normalizing ingested data that has an AbstractSpatialLocation). These Wgs84Coordinates are in a GeoJSON structure and can potentially contain a geometry with many vertices. At some point in the past, OSDU architecture determined that returning AsIngestedCoordinates was not necessary. It is true that Wgs84Coordinates are normalized and used for search. However, AsIngestedCoordinates and their CRS are important properties to have available in Search results, for example for a list of wells.
The Geomatics Workstream and others have commented that AsIngestedCoordinates are not returned, contrary to their expectations.
We learned that AsIngestedCoordinates were omitted by design, out of concern for performance degradation and because these coordinate values are not used for search in most use cases. (However, they are used for discovery and QC across records, and existing solutions typically do allow searching them with logical operators.)
One use case is Well records. A developer may want to show a user all the wells from a platform in a table, where one of the columns is the original coordinates and CRS. Currently this is only possible by retrieving each record through Storage; it would be more efficient to have these values returned by Search. Well master data does not have an associated data file, as a Wellbore might have in the form of a path in WITSML.
Another use case is ingesting data without a BoundCRS, i.e., data that cannot be normalized to WGS 84. In that case it is useful to have the original coordinates indexed, so users can see that the record has coordinates even though no Wgs84Coordinates were produced.
See also attached pptx from AAF and description and comments on issue !95 (closed).
AbstractSpatialLocation
AbstractSpatialLocation has:
- Quality metadata.
- AsIngestedCoordinates, of type AbstractAnyCrsFeatureCollection: a schema like a GeoJSON FeatureCollection but with a non-WGS 84 CRS context, based on https://geojson.org/schema/FeatureCollection.json. Attention: the coordinate order is fixed: Longitude/Easting/Westing/X first, followed by Latitude/Northing/Southing/Y, optionally height as the third coordinate. It has:
  - features[].geometry.type: Point, MultiPoint, LineString, MultiLineString, Polygon or MultiPolygon.
  - features[].geometry.coordinates (array).
  - Properties:
    - CoordinateReferenceSystemID
    - persistableReferenceCrs
    - VerticalCoordinateReferenceSystemID
    - persistableReferenceVerticalCrs
    - VerticalUnitID
    - persistableReferenceUnitZ
Requirements
In addition to the simplified Elasticsearch GeoJSON derived from Wgs84Coordinates, which is already returned today (i.e., no change to Wgs84Coordinates), the following is required:
- An (efficient) method to see the first point of the AsIngested coordinates, with its horizontal (and possibly vertical) CRS(s), and specific metadata on location quality (which is part of the AbstractSpatialLocation entity).
- The string properties are expected to be:
  - usable in queries, and
  - returned in search query responses if desired.
- The coordinates of the first point are:
  - numbers (in JSON terms, floating-point numbers): AsIngestedCoordinates.FirstPoint.X, AsIngestedCoordinates.FirstPoint.Y, AsIngestedCoordinates.FirstPoint.Z.
  - It is expected that the numbers can be used in simple box queries, provided AsIngestedCoordinates.CoordinateReferenceSystemID (and AsIngestedCoordinates.VerticalCoordinateReferenceSystemID for 3D) is part of the query condition.
  - It is expected that the first point coordinates are returned in search query responses if desired.
Tradeoff Analysis
Discussion concluded that returning AsIngestedCoordinates as properties in the Search query response, only for the first point and together with some other SpatialLocation metadata, is the correct tradeoff: it satisfies the Geomatics use cases without burdening indexer performance or memory.
Proposed solution (to be analyzed and implemented by Shell developers)
- The following approach is proposed. It is labeled "proposed" because I am not intimately familiar with the code or with all the gotchas you may run into during development. It mainly describes, from an end-user perspective, what needs to be returned.
Consider a record being ingested, for example a Well with the following AsIngestedCoordinates:
```json
{
  "data": {
    "SomeLocation": {
      "SpatialLocationCoordinatesDate": "2023-02-19",
      "QuantitativeAccuracyBandID": "<1 m",
      "QualitativeSpatialAccuracyTypeID": "Checked: Approved",
      "CoordinateQualityCheckPerformedBy": "Bert",
      "CoordinateQualityCheckDateTime": "2023-01-19",
      "CoordinateQualityCheckRemarks": [
        "good",
        "really",
        "vertical is good too"
      ],
      "AppliedOperations": [
        "conversion from ED_1950_UTM_Zone_31N to GCS_European_1950; 1 points converted",
        "transformation GCS_European_1950 to GCS_WGS_1984 using ED_1950_To_WGS_1984_24; 1 points successfully transformed"
      ],
      "SpatialParameterTypeID": "Outline",
      "SpatialGeometryTypeID": "Point",
      "AsIngestedCoordinates": {
        "CoordinateReferenceSystemID": "osdu:reference-data--CoordinateReferenceSystem:BoundProjected:EPSG::32021_EPSG::15851:",
        "VerticalCoordinateReferenceSystemID": "osdu:reference-data--CoordinateReferenceSystem:Vertical:EPSG::5714:",
        "VerticalUnitID": "osdu:reference-data--UnitOfMeasure:m:",
        "persistableReferenceCrs": "{\"authCode\":{\"auth\":\"OSDU\",\"code\":\"32021079\"},\"lateBoundCRS\":{\"authCode\":{\"auth\":\"EPSG\",\"code\":\"32021\"},\"name\":\"NAD_1927_StatePlane_North_Dakota_South_FIPS_3302\",\"type\":\"LBC\",\"ver\":\"PE_10_9_1\",\"wkt\":\"PROJCS[\\\"NAD_1927_StatePlane_North_Dakota_South_FIPS_3302\\\",GEOGCS[\\\"GCS_North_American_1927\\\",DATUM[\\\"D_North_American_1927\\\",SPHEROID[\\\"Clarke_1866\\\",6378206.4,294.9786982]],PRIMEM[\\\"Greenwich\\\",0.0],UNIT[\\\"Degree\\\",0.0174532925199433]],PROJECTION[\\\"Lambert_Conformal_Conic\\\"],PARAMETER[\\\"False_Easting\\\",2000000.0],PARAMETER[\\\"False_Northing\\\",0.0],PARAMETER[\\\"Central_Meridian\\\",-100.5],PARAMETER[\\\"Standard_Parallel_1\\\",46.18333333333333],PARAMETER[\\\"Standard_Parallel_2\\\",47.48333333333333],PARAMETER[\\\"Latitude_Of_Origin\\\",45.66666666666666],UNIT[\\\"Foot_US\\\",0.3048006096012192],AUTHORITY[\\\"EPSG\\\",32021]]\"},\"name\":\"NAD27 * OGP-Usa Conus / North Dakota CS27 South zone [32021,15851]\",\"singleCT\":{\"authCode\":{\"auth\":\"EPSG\",\"code\":\"15851\"},\"name\":\"NAD_1927_To_WGS_1984_79_CONUS\",\"type\":\"ST\",\"ver\":\"PE_10_9_1\",\"wkt\":\"GEOGTRAN[\\\"NAD_1927_To_WGS_1984_79_CONUS\\\",GEOGCS[\\\"GCS_North_American_1927\\\",DATUM[\\\"D_North_American_1927\\\",SPHEROID[\\\"Clarke_1866\\\",6378206.4,294.9786982]],PRIMEM[\\\"Greenwich\\\",0.0],UNIT[\\\"Degree\\\",0.0174532925199433]],GEOGCS[\\\"GCS_WGS_1984\\\",DATUM[\\\"D_WGS_1984\\\",SPHEROID[\\\"WGS_1984\\\",6378137.0,298.257223563]],PRIMEM[\\\"Greenwich\\\",0.0],UNIT[\\\"Degree\\\",0.0174532925199433]],METHOD[\\\"NADCON\\\"],PARAMETER[\\\"Dataset_conus\\\",0.0],OPERATIONACCURACY[5.0],AUTHORITY[\\\"EPSG\\\",15851]]\"},\"type\":\"EBC\",\"ver\":\"PE_10_9_1\"}",
        "persistableReferenceVerticalCrs": "{\"authCode\":{\"auth\":\"EPSG\",\"code\":\"5714\"},\"name\":\"MSL_Height\",\"type\":\"LBC\",\"ver\":\"PE_10_9_1\",\"wkt\":\"VERTCS[\\\"MSL_Height\\\",VDATUM[\\\"Mean_Sea_Level\\\"],PARAMETER[\\\"Vertical_Shift\\\",0.0],PARAMETER[\\\"Direction\\\",1.0],UNIT[\\\"Meter\\\",1.0],AUTHORITY[\\\"EPSG\\\",5714]]\"}",
        "persistableReferenceUnitZ": "{\"scaleOffset\":{\"scale\":1.0,\"offset\":0.0},\"symbol\":\"m\",\"baseMeasurement\":{\"ancestry\":\"Length\",\"type\":\"UM\"},\"type\":\"USO\"}",
        "features": [
          {
            "type": "AnyCrsFeature",
            "geometry": {
              "type": "AnyCrsPoint",
              "coordinates": [1500000.0, 12345678.0, 100.0]
            }
          },
          {
            "type": "AnyCrsFeature",
            "geometry": {
              "type": "AnyCrsLineString",
              "coordinates": [[1400000.0, 12345666.0, 99.0], [1600000.0, 12345777.0, 101.0]]
            }
          }
        ]
      }
    }
  }
}
```

The values are illustrative; feel free to replace them with a real example. A Well normally has only a single AnyCrsPoint for the surface location, often 2D rather than 3D (and then without a vertical CRS, etc.); the 3D values and the additional AnyCrsLineString are included only to make clear what to do in that case. Wgs84Coordinates, which normalization adds to the same AbstractSpatialLocation, is omitted because it is not relevant here.
The desired result of a search query response would include the following properties, copied directly from the input record's AbstractSpatialLocation fragment (values from the example above; the persistable reference strings are abbreviated):
```json
{
  "data": {
    "AsIngestedCoordinates.FirstPoint.X": 1500000.0,
    "AsIngestedCoordinates.FirstPoint.Y": 12345678.0,
    "AsIngestedCoordinates.FirstPoint.Z": 100.0,
    "AsIngestedCoordinates.CoordinateReferenceSystemID": "osdu:reference-data--CoordinateReferenceSystem:BoundProjected:EPSG::32021_EPSG::15851:",
    "AsIngestedCoordinates.VerticalCoordinateReferenceSystemID": "osdu:reference-data--CoordinateReferenceSystem:Vertical:EPSG::5714:",
    "AsIngestedCoordinates.persistableReferenceCrs": "<persistableReferenceCrs string as ingested>",
    "AsIngestedCoordinates.persistableReferenceVerticalCrs": "<persistableReferenceVerticalCrs string as ingested>",
    "AsIngestedCoordinates.persistableReferenceUnitZ": "<persistableReferenceUnitZ string as ingested>",
    "AsIngestedCoordinates.QuantitativeAccuracyBandID": "<1 m",
    "AsIngestedCoordinates.QualitativeSpatialAccuracyTypeID": "Checked: Approved",
    "AsIngestedCoordinates.CoordinateQualityCheckPerformedBy": "Bert",
    "AsIngestedCoordinates.CoordinateQualityCheckDateTime": "2023-01-19",
    "AsIngestedCoordinates.CoordinateQualityCheckRemarks": ["good", "really", "vertical is good too"],
    "AsIngestedCoordinates.AppliedOperations": [
      "conversion from ED_1950_UTM_Zone_31N to GCS_European_1950; 1 points converted",
      "transformation GCS_European_1950 to GCS_WGS_1984 using ED_1950_To_WGS_1984_24; 1 points successfully transformed"
    ]
  }
}
```
Notes:
- FirstPoint.X and FirstPoint.Y are numbers (floating point) when given on ingest. FirstPoint.Z is null unless the input has a Z value, and VerticalCoordinateReferenceSystemID is only present for 3D input.
- OSDU allows ingesting data with only a persistable reference (PR) and no reference to a CRS record id. What to return in CoordinateReferenceSystemID in that case is an open question (see Accepted Limitations below).
- AsIngestedCoordinates.FirstPoint.Type is not needed because Wgs84Coordinates has the original geometry type. It may still be useful, though, to know that the first point came from, for example, an AnyCrsMultiPoint, or when no Wgs84Coordinates were produced.
- AsIngestedCoordinates.SpatialLocationCoordinatesDate is not needed: the QC date/time is already included, and this date matters mainly for plate-motion corrections, which do not seem needed at the moment. It could be added, though.
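The copy from the input fragment to these flat properties could be sketched as follows. This is a minimal Python illustration, not the Indexer service code; `first_position` and `flatten_as_ingested` are hypothetical names. It assumes, per the AbstractSpatialLocation schema, that AsIngestedCoordinates is nested in the same object as the quality metadata, and it simply takes the first position of the first feature that has coordinates (the selection rule for mixed geometries is still open under Accepted Limitations).

```python
from typing import Any, Optional

# Quality metadata copied from the AbstractSpatialLocation level.
QUALITY_FIELDS = (
    "QuantitativeAccuracyBandID",
    "QualitativeSpatialAccuracyTypeID",
    "CoordinateQualityCheckPerformedBy",
    "CoordinateQualityCheckDateTime",
    "CoordinateQualityCheckRemarks",
    "AppliedOperations",
)

# CRS and unit properties copied from the AsIngestedCoordinates collection.
CRS_FIELDS = (
    "CoordinateReferenceSystemID",
    "VerticalCoordinateReferenceSystemID",
    "persistableReferenceCrs",
    "persistableReferenceVerticalCrs",
    "persistableReferenceUnitZ",
)

PREFIX = "AsIngestedCoordinates."


def first_position(coordinates: Any) -> Optional[list]:
    """Descend nested GeoJSON-style coordinate arrays to the first position."""
    c = coordinates
    while isinstance(c, list) and c and isinstance(c[0], list):
        c = c[0]
    # A position is [x, y] or [x, y, z]; anything shorter is unusable.
    if isinstance(c, list) and len(c) >= 2:
        return c
    return None


def flatten_as_ingested(location: dict) -> dict:
    """Build the flat, dotted search properties for one AbstractSpatialLocation."""
    out: dict = {}
    as_ingested = location.get("AsIngestedCoordinates") or {}
    for name in CRS_FIELDS:
        if name in as_ingested:
            out[PREFIX + name] = as_ingested[name]
    for name in QUALITY_FIELDS:
        if name in location:
            out[PREFIX + name] = location[name]
    for feature in as_ingested.get("features", []):
        pos = first_position((feature.get("geometry") or {}).get("coordinates"))
        if pos is not None:
            out[PREFIX + "FirstPoint.X"] = float(pos[0])
            out[PREFIX + "FirstPoint.Y"] = float(pos[1])
            # Z stays null (None) for 2D input, as in the desired output.
            out[PREFIX + "FirstPoint.Z"] = float(pos[2]) if len(pos) > 2 else None
            break
    return out
```

Applied to the SomeLocation object of the example record, this yields FirstPoint values 1500000.0, 12345678.0 and 100.0 from the AnyCrsPoint feature. Properties absent from the input are simply left out rather than emitted as empty strings.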
Accepted Limitations / Things to Work Out
The following are accepted limitations of the proposed solution, e.g., that we agree to index only the first point, as flat properties rather than as a geometry, for performance reasons. There are also some questions that the developers will have to consider and propose a solution for (which may be that there is no solution).
- Only the first point of the AsIngested geometry is indexed if the geometry contains more than one point.
- If a switch or flag on Search would be useful, so that the user can decide whether to include geometry in the response (I would argue both Wgs84 and AsIngested), it is fine for the geometry to be returned by default as long as it can be omitted. I expect this is already possible using ReturnedFields.
  - In itself, omitting the geometry by default does not seem a bad option, because it can be large for 2D lines and similar shapes. But that is not the intention of this issue.
- What to do if the ingested geometry is complex?
  - It is not relevant to the implementation, but please clarify whether an AnyCrsFeatureCollection can indeed contain both Points and LineStrings (for example), or has to contain only a single feature. The name "collection" suggests it can be a complex combination of types.
  - If the AsIngested geometry contains multiple types or OneOff, then index the Point if it exists, else the first point of a MultiPoint, else the first point of a LineString, else of a MultiLineString, else of a Polygon, else of the MultiPolygon (else nothing: there is no geometry!).
- What to do if there is a PR but no CRS id on input?
  - Option 1: return neither the CRS nor the coordinates. This is not satisfactory.
  - Option 2: return the coordinates but not the CRS.
  - Option 3: return the PR in the CRS ID field.
  - Option 4: return the PR as PR (preferred).
  - Option 5: look up the id for the PR. In a way this is ideal, but there is no function for it and it would take time; we also expect people to ingest data with a (bound CRS) record id.
- Can the CRS name (horizontal and vertical) somehow be returned?
  - Option 1: no. I think we have to accept this, because the name is not part of the input.
  - Option 2: yes, because the normalizer writes the CRS name into AppliedOperations (at least for the horizontal CRS, which is most important).
  - Option 3: look up the CRS by id and retrieve some parameters, for example the PR, to augment the stored and indexed record with the numerical definition used at the time of normalization. This would be a permanent record, frozen in time, of what was applied at ingestion. That was the original requirement in 2021 (look up the PR for ingested data and store it with the data), but this was said not to be possible.
- Can the AppliedOperations somehow be returned, or is that not useful enough to bother?
  - Option 1: yes.
  - Option 2: no.
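The fallback order proposed above for complex geometries could look like this. It is a minimal Python sketch, not a decided implementation; `select_first_point` and `base_type` are hypothetical names. It assumes geometry type names appear either plain (as listed under AbstractSpatialLocation) or with the "AnyCrs" prefix (as in the example record), and any other type, such as a geometry collection, is skipped, which yields "nothing".

```python
from typing import Any, Optional

# Priority order from the proposal above.
PRIORITY = ("Point", "MultiPoint", "LineString",
            "MultiLineString", "Polygon", "MultiPolygon")


def base_type(geometry_type: str) -> str:
    """Strip the 'AnyCrs' prefix so 'AnyCrsPoint' and 'Point' compare equal."""
    prefix = "AnyCrs"
    return geometry_type[len(prefix):] if geometry_type.startswith(prefix) else geometry_type


def first_position(coordinates: Any) -> Optional[list]:
    """Descend nested coordinate arrays to the first [x, y(, z)] position."""
    c = coordinates
    while isinstance(c, list) and c and isinstance(c[0], list):
        c = c[0]
    return c if isinstance(c, list) and len(c) >= 2 else None


def select_first_point(features: list) -> Optional[list]:
    """Return the first position by type priority across all features, else None."""
    for wanted in PRIORITY:
        for feature in features:
            geometry = feature.get("geometry") or {}
            if base_type(geometry.get("type", "")) != wanted:
                continue
            pos = first_position(geometry.get("coordinates"))
            if pos is not None:
                return pos
    return None  # no usable geometry
```

Scanning by priority first and by feature order second means a Point anywhere in the collection wins over a LineString that appears earlier, which matches the "index the Point if it exists" wording.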
Change Management
- Operators may need to re-ingest data or update the index. Is it possible to "patch" data to re-run the indexer on data already ingested?
Decision
- To be implemented by Shell developers working on the Search Service.
Consequences
- The indexer code changes should have no noticeable impact on the system or applications (only additional properties are returned).