ADR: Array of Objects support by Indexer

@rajk, @gehrmann, @wladmirf, @dkodeih, @danielscholl, @Nieten, @ethiraj - please review ADR

tagging @madhurtanwani and @mosiddi as well.

CC: @ChrisZhang, @ashams_s, @meenarathinavel

@Dmitriy_Rudko thanks for raising this ADR. I fully support externalizing this concern as there are key pros and cons of each type choice in elastic.

However the DD team may not be the (only) right team to weigh in on the config flags. We need a collaborative engagement here to annotate the instances properly.

On the code side, can we consider an option to rebuild the index schema and trigger reindexing if the operator decides to change a previously object-typed attribute to a nested type for search and retrieval fidelity? Can we please include in the ADR this option to easily promote or demote the index classification at deployment?

Thanks

@rajk Regarding the rebuild of the index, I think we should solve this problem not just for the nested/object/flattened type cases but for index changes in general. We are already facing cases of schema mutation, where a schema was updated in the Schema service without a version increment. These will mostly occur during development.

From this perspective, we can generate an event saying that the schema was updated, and this event should trigger a full reindex for that particular kind.
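A minimal sketch of that idea, assuming the kind string carries the version, so that a changed schema under an unchanged kind is exactly the "mutation without version increment" case. The function names, the event shape, and the in-memory fingerprint store are hypothetical and are not part of the Schema or Indexer services:

```python
import hashlib
import json


def schema_fingerprint(schema: dict) -> str:
    """Hash a canonical JSON form so that key order does not matter."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def on_schema_saved(kind: str, schema: dict, known_fingerprints: dict):
    """Return a reindex event if an existing kind changed in place, else None."""
    new = schema_fingerprint(schema)
    old = known_fingerprints.get(kind)
    known_fingerprints[kind] = new
    if old is not None and old != new:
        # Same kind (and therefore same version), different content: reindex.
        return {"type": "schema-updated", "kind": kind, "action": "full-reindex"}
    return None
```

The first save of a kind, and a save with identical content, produce no event; only an in-place change does.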

Thanks Dmitriy for summarizing this. I'm tagging Alex from Elastic as well. @alex.close

@rajk, @Dmitriy_Rudko, Data Definitions can contribute here. There is already a custom tag, x-osdu-skip-indexing, but you likely need more than that, plus a collective agreement on what the maximum number of elements to index is.

And the latest schema bug fixes to support the dataset service are still in the merge queue - Any agility on my side won't be visible downstream.

Correct, @gehrmann, and thanks for the help here. The flag as it is today gives a boolean discrimination: on or off. Dmitriy is proposing that we mimic the Elastic behavior and have an option that distinguishes three cases: off, flattened for search, or fully on for search with hierarchy and fidelity of return.

Maybe we can think of a mapping like this:

x-osdu-skip-indexing = on - treat as object
x-osdu-skip-indexing = off
- x-osdu-flatten-index = off - treat as nested object (expensive)
- x-osdu-flatten-index = on - treat as flattened type (limitation but less expensive)
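The mapping above can be sketched as a small decision function. This is only an illustration: x-osdu-flatten-index is the flag proposed in this comment, not an existing tag, and the function name is hypothetical.

```python
def resolve_index_type(skip_indexing: bool, flatten_index: bool = False) -> str:
    """Map the two schema flags to an Elasticsearch type for an array of objects."""
    if skip_indexing:
        # x-osdu-skip-indexing = on: treat as a plain object
        return "object"
    # x-osdu-skip-indexing = off: pick one of the two searchable types
    return "flattened" if flatten_index else "nested"


for skip, flatten in [(True, False), (False, False), (False, True)]:
    print(skip, flatten, resolve_index_type(skip, flatten))
```

Note that with this mapping the flatten flag is ignored whenever skip-indexing is on, and the expensive nested type is what you get when only skip-indexing is turned off.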

@Dmitriy_Rudko what do you think?

@rajk I'm a little worried about performance in this case. The default behavior will be the 'nested' type, and I'm pretty sure that people will just forget about the other options.

With explicit logic, we force the Data Modeler to think this through and explicitly choose the type that works best in each particular case.

@Dmitriy_Rudko can you please give some examples where flattening will be applicable instead of a nested object?

Fully agree. Can you include this in the ADR scope so we can take care of it during execution? Thanks.

Information is already in ADR body.

As a summary:

object - a good default choice, as it will not blow up the index, but it will not work for cases where we need to query individual objects:

List well logs that have curves with (mnemonic GR and quality X) and/or (mnemonic RHOB and quality Z)
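To illustrate why the object type fails for this kind of query, here is a small pure-Python simulation, not Elasticsearch code. It assumes the usual behavior that an array of objects mapped as object is stored as one multi-valued list per field, which loses the pairing between values of the same element. The field names Mnemonic and Quality are assumptions made for the example.

```python
def index_as_object(curves):
    """Collapse an array of objects into one multi-valued set per field."""
    fields = {}
    for curve in curves:
        for key, value in curve.items():
            fields.setdefault(key, set()).add(value)
    return fields


def match_object(curves, mnemonic, quality):
    """Each condition is checked against the collapsed fields independently."""
    fields = index_as_object(curves)
    return (mnemonic in fields.get("Mnemonic", set())
            and quality in fields.get("Quality", set()))


def match_nested(curves, mnemonic, quality):
    """Both conditions must hold within the same array element."""
    return any(c.get("Mnemonic") == mnemonic and c.get("Quality") == quality
               for c in curves)


well_log = [
    {"Mnemonic": "GR", "Quality": "Z"},
    {"Mnemonic": "RHOB", "Quality": "X"},
]

print(match_object(well_log, "GR", "X"))  # True: a false positive
print(match_nested(well_log, "GR", "X"))  # False: no single curve is GR with quality X
```

The well log has no GR curve with quality X, yet the object-style match reports a hit because GR and X each appear somewhere in the array.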

mentioned in commit 2a533b3f

mentioned in merge request !123 (merged)

mentioned in commit e0e84f13

added M5 label

mentioned in commit 5824b1ae

mentioned in commit ebeb1a59
