Open Subsurface Data Universe Software / Platform / System / Storage, issue #26 (Closed)
Issue created Aug 10, 2020 by Thomas Gehrmann [slb] (@gehrmann, Developer)

[System/Storage] Relax id validation to support OSDU relationship definitions/constraints

OSDU defines entity-types as a compound reference <group-type>/<individual-type>. These OSDU entity-type specifications are used to constrain relationships, e.g. identify a relationship target type via a pattern.

The latest conclusion is the Summary January 26, 2021 at the end of this issue.

The Storage service constrains the id using this regular expression in ValidationDoc.java:

  • "[\\w-\\.]+:[\\w-\\.]+:[\\w-\\.]+" describing the following parts:
  • <data-partition-id>:<entity-type>:<unique-instance-id> where entity-type means group-type/individual-type.

The corresponding JSON schema pattern, in ECMAScript regex syntax, is

  • "^[\\w-\\.]+:[\\w-\\.]+:[\\w-\\.]+$"

it should be changed to at least

  • "^[\\w-\\.]+:[\\w-\\.\\/]+:[\\w-\\.]+$" -- see revision below.

to support <data-partition-id>:<group-type>/<individual-type>:<unique-instance-id>.
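A quick way to check the proposal is to compile both patterns with `java.util.regex` and try an id whose entity type contains a slash. This is a minimal sketch: the class name and the example id `opendes:master-data/Wellbore:1234` are hypothetical; only the two patterns come from the text above.

```java
import java.util.regex.Pattern;

// Hypothetical check class; the two patterns are the ones quoted in this issue.
class IdPatternCheck {
    // Current id pattern (JSON schema form, anchored).
    static final Pattern CURRENT = Pattern.compile("^[\\w-\\.]+:[\\w-\\.]+:[\\w-\\.]+$");
    // Proposed relaxation: the entity-type part may also contain '/'.
    static final Pattern RELAXED = Pattern.compile("^[\\w-\\.]+:[\\w-\\.\\/]+:[\\w-\\.]+$");

    public static void main(String[] args) {
        // Hypothetical id of the form <data-partition-id>:<group-type>/<individual-type>:<unique-instance-id>
        String id = "opendes:master-data/Wellbore:1234";
        System.out.println("current: " + CURRENT.matcher(id).matches()); // current: false
        System.out.println("relaxed: " + RELAXED.matcher(id).matches()); // relaxed: true
    }
}
```

The current pattern fails at the `/`, because none of its three character classes admit it, while the relaxed pattern accepts the compound entity type.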

Furthermore, it should be decided which other characters to allow in the <unique-instance-id>. My suggestion is to relax this to support GUIDs (already supported) and URL-encoded strings. There are a number of use cases for deterministic <unique-instance-id> values for reference data.

Decision as of November 3rd

The regex expression for id will change to:

  • "^[\\w-\\.]+:[\\w-\\.\\/]+:.+$"

The actual validation regex must be published with the Storage service. In turn, OSDU data definitions must adopt these constraints in their schema definitions. At the moment, the validation patterns for id are entirely unconstrained except for :, i.e. [^:]+ for each of the id parts.
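The gap between the decided service pattern and the current schema patterns can be made concrete. The sketch below assumes the schema side combines `[^:]+` for each of the three parts into `^[^:]+:[^:]+:[^:]+$` (an assumption about how the parts are assembled); the class name and sample ids are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical comparison of the November 3rd service pattern with a
// schema pattern that only excludes ':' in each of the three id parts.
class Nov3IdCheck {
    static final Pattern SERVICE = Pattern.compile("^[\\w-\\.]+:[\\w-\\.\\/]+:.+$");
    // Assumed composition of [^:]+ per part; not taken from any published schema.
    static final Pattern SCHEMA_ANY_BUT_COLON = Pattern.compile("^[^:]+:[^:]+:[^:]+$");

    static String compare(String id) {
        return id + " -> service=" + SERVICE.matcher(id).matches()
                + ", schema=" + SCHEMA_ANY_BUT_COLON.matcher(id).matches();
    }

    public static void main(String[] args) {
        // Colon inside the last part: accepted by '.+', rejected by '[^:]+'.
        System.out.println(compare("opendes:reference-data/UnitOfMeasure:ft:US"));
        // Space in the partition id: rejected by the service, accepted by '[^:]+'.
        System.out.println(compare("open des:wellbore:1234"));
    }
}
```

Ids that pass one check but not the other are exactly the cases where the published Storage pattern and the schema definitions would disagree.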

Addition December 6th:

The regex for the kind in ValidationDoc.java line 27 seems to be incorrect as well. It lacks the ^ and $ anchors at the beginning and end; without them, arbitrary invalid characters can precede or follow an otherwise valid kind. The condition for the semantic version number also doesn't filter invalid separators. Instead, this expression should work:

^[\w\-\.]+:[\w\-\.]+:[\w\-\.]+:[0-9]+.[0-9]+.[0-9]+$

or as a Java string:
"^[\\w\\-\\.]+:[\\w\\-\\.]+:[\\w\\-\\.]+:[0-9]+.[0-9]+.[0-9]+$"

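Two effects are worth showing for the kind pattern. A JSON Schema `pattern` matches anywhere in the string (it is not implicitly anchored), which `Matcher.find()` mimics, so the `^`/`$` anchors decide whether padded input is rejected; Java's `Matcher.matches()` and `String.matches()`, by contrast, always require the whole input to match. Also, the `.` between the version numbers in the expression above is unescaped and therefore matches any character, so the sketch adds a hypothetical variant with `\\.` for comparison. The unanchored string is a reconstruction for illustration, not a copy of ValidationDoc.java line 27; the class name and sample kinds are hypothetical.

```java
import java.util.regex.Pattern;

class KindPatternCheck {
    // Hypothetical unanchored form, for illustration only.
    static final String UNANCHORED = "[\\w\\-\\.]+:[\\w\\-\\.]+:[\\w\\-\\.]+:[0-9]+.[0-9]+.[0-9]+";
    // The anchored expression proposed on December 6th.
    static final String PROPOSED = "^" + UNANCHORED + "$";
    // Hypothetical variant: escaped dots accept only '.' between version numbers.
    static final String STRICT_DOTS = "^[\\w\\-\\.]+:[\\w\\-\\.]+:[\\w\\-\\.]+:[0-9]+\\.[0-9]+\\.[0-9]+$";

    // Search semantics, like JSON Schema "pattern": a match anywhere counts.
    static boolean found(String regex, String kind) {
        return Pattern.compile(regex).matcher(kind).find();
    }

    public static void main(String[] args) {
        String padded = "!!osdu:wks:wellbore:1.0.0!!";
        System.out.println(found(UNANCHORED, padded));                      // true
        System.out.println(found(PROPOSED, padded));                        // false
        System.out.println(found(PROPOSED, "osdu:wks:wellbore:1x0x0"));     // true
        System.out.println(found(STRICT_DOTS, "osdu:wks:wellbore:1x0x0"));  // false
    }
}
```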
Summary January 6, 2021

The following regular expressions have been tested at https://regex101.com/ using the ECMAScript flavor (the regex dialect JSON Schema relies on):

RECORD_ID_REGEX              = "^[\\w\\-\\.]+:[\\w-\\.\\/]+:.+$"
as used in regex101:            ^[\w\-\.]+:[\w-\.\/]+:.+$

RECORD_ID_WITH_VERSION_REGEX = "^[\\w\\-\\.]+:[\\w-\\.\\/]+:.+:[0-9]+$"
as used in regex101:            ^[\w\-\.]+:[\w-\.\/]+:.+:[0-9]+$

KIND_REGEX                   = "^[\\w\\-\\.]+:[\\w\\-\\.]+:[\\w\\-\\.\\/]+:[0-9]+.[0-9]+.[0-9]+$"
as used in regex101:            ^[\w\-\.]+:[\w\-\.]+:[\w\-\.\/]+:[0-9]+.[0-9]+.[0-9]+$

If we eventually support 'optionally versioned' id references in the Storage API, another regex is required:

RECORD_ID_WITH_OPTIONAL_VERSION_REGEX = "^[\\w\\-\\.]+:[\\w-\\.\\/]+:.+:[0-9]*$"
as used in regex101:                     ^[\w\-\.]+:[\w-\.\/]+:.+:[0-9]*$

It turned out that all these 'wishes' were made without seriously checking the implementations: / is a reserved character in at least one implementation. Therefore, the plans changed again.
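The issue does not say which implementation reserves `/`. One plausible way it bites, assuming an API that places the record id in a URL path segment (the base URL and layout below are hypothetical), is that an unencoded `/` splits the id across two segments, since `/` is the path delimiter in URIs. The class and helper names are also hypothetical.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;

class SlashInPathCheck {
    // Hypothetical endpoint layout: the record id is appended as one path segment.
    static List<String> segments(String base, String recordId) {
        URI uri = URI.create(base + "/" + recordId);
        // Drop the leading '/' and split the path into its segments.
        return Arrays.asList(uri.getPath().substring(1).split("/"));
    }

    public static void main(String[] args) {
        String base = "https://example.com/records";
        // [records, opendes:wellbore:1234]
        System.out.println(segments(base, "opendes:wellbore:1234"));
        // [records, opendes:master-data, Wellbore:1234]
        System.out.println(segments(base, "opendes:master-data/Wellbore:1234"));
    }
}
```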

Summary January 26, 2021

To preserve 'business' IDs, like unit symbols, the desired IDs must be URL-encoded, e.g. in reference data. This keeps the otherwise reserved characters out of the id. The colon (:) is already used as a separator in kind and id, but it is also a desired symbol in certain business IDs. This means the last part of the id should use the regex [\w\-\.\:\%]+, i.e. alphanumeric characters, underscore, dash, dot, colon and percent.
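A minimal sketch of the URL-encoding step for business IDs, assuming UTF-8 percent-encoding of every byte outside `[A-Za-z0-9_.:-]`; the colon is kept literally because it is desired in business IDs. The class name and the `encode` helper are hypothetical; the last-part pattern is the one given above.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

class BusinessIdEncoder {
    // Last part of the id as decided on January 26.
    static final Pattern LAST_PART = Pattern.compile("^[\\w\\-\\.\\:\\%]+$");

    // Hypothetical helper: percent-encode each UTF-8 byte outside [A-Za-z0-9_.:-].
    // A literal '%' is encoded as %25, so encoded ids stay unambiguous.
    static String encode(String businessId) {
        StringBuilder out = new StringBuilder();
        for (byte b : businessId.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF; // treat the byte as unsigned
            boolean keep = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9')
                    || c == '_' || c == '-' || c == '.' || c == ':';
            if (keep) {
                out.append((char) c);
            } else {
                out.append('%').append(String.format("%02X", c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical unit symbols: "m/s" -> m%2Fs, "ft[US]" -> ft%5BUS%5D, "°C" -> %C2%B0C
        for (String unit : new String[] {"m/s", "ft[US]", "degC", "\u00B0C"}) {
            String encoded = encode(unit);
            System.out.println(unit + " -> " + encoded
                    + " valid=" + LAST_PART.matcher(encoded).matches());
        }
    }
}
```

`java.net.URLEncoder` is not a drop-in replacement here: it produces form encoding, turning a space into `+` and leaving `*` unencoded, and neither character is allowed by the last-part pattern.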

RECORD_ID_REGEX              = "^[\\w\\-\\.]+:[\\w-\\.]+:[\\w\\-\\.\\:\\%]+$"
as used in regex101:            ^[\w\-\.]+:[\w-\.]+:[\w\-\.\:\%]+$

RECORD_ID_WITH_VERSION_REGEX = "^[\\w\\-\\.]+:[\\w-\\.]+:[\\w\\-\\.\\:\\%]+:[0-9]+$"
as used in regex101:            ^[\w\-\.]+:[\w-\.]+:[\w\-\.\:\%]+:[0-9]+$

KIND_REGEX                   = "^[\\w\\-\\.]+:[\\w\\-\\.]+:[\\w\\-\\.]+:[0-9]+.[0-9]+.[0-9]+$"
as used in regex101:            ^[\w\-\.]+:[\w\-\.]+:[\w\-\.]+:[0-9]+.[0-9]+.[0-9]+$

If we eventually support 'optionally versioned' id references in the Storage API, another regex is required:

RECORD_ID_WITH_OPTIONAL_VERSION_REGEX = "^[\\w\\-\\.]+:[\\w-\\.]+:[\\w\\-\\.\\:\\%]+:[0-9]*$"
as used in regex101:                     ^[\w\-\.]+:[\w-\.]+:[\w\-\.\:\%]+:[0-9]*$
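The January 26 patterns could be published as constants alongside the Storage code. The sketch below (class name and sample values are hypothetical; the pattern strings are copied from this summary) checks a plain id, a versioned id and an id with an empty version. Because the last part admits `:`, RECORD_ID_REGEX also accepts the versioned and empty-version forms, so an id whose last part ends in `:<digits>` cannot be classified by these patterns alone.

```java
import java.util.regex.Pattern;

// Hypothetical holder for the January 26 patterns.
class OsduIdPatterns {
    static final String RECORD_ID_REGEX =
            "^[\\w\\-\\.]+:[\\w-\\.]+:[\\w\\-\\.\\:\\%]+$";
    static final String RECORD_ID_WITH_VERSION_REGEX =
            "^[\\w\\-\\.]+:[\\w-\\.]+:[\\w\\-\\.\\:\\%]+:[0-9]+$";
    static final String KIND_REGEX =
            "^[\\w\\-\\.]+:[\\w\\-\\.]+:[\\w\\-\\.]+:[0-9]+.[0-9]+.[0-9]+$";
    static final String RECORD_ID_WITH_OPTIONAL_VERSION_REGEX =
            "^[\\w\\-\\.]+:[\\w-\\.]+:[\\w\\-\\.\\:\\%]+:[0-9]*$";

    // Whole-string match, equivalent to anchoring with ^ and $.
    static boolean matches(String regex, String value) {
        return Pattern.compile(regex).matcher(value).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "opendes:wellbore:1234",            // plain id
            "opendes:wellbore:1234:1611234567", // id with version
            "opendes:wellbore:1234:"            // id with empty version
        };
        for (String s : samples) {
            System.out.printf("%-34s id=%b versioned=%b optional=%b%n", s,
                    matches(RECORD_ID_REGEX, s),
                    matches(RECORD_ID_WITH_VERSION_REGEX, s),
                    matches(RECORD_ID_WITH_OPTIONAL_VERSION_REGEX, s));
        }
        System.out.println("kind: " + matches(KIND_REGEX, "osdu:wks:wellbore:1.0.0"));
    }
}
```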
Edited Jan 27, 2021 by Thomas Gehrmann [slb]