|
@@ -0,0 +1,297 @@
|
|
|
+Versioning Elasticsearch
|
|
|
+========================
|
|
|
+
|
|
|
+Elasticsearch is a complicated product, and is run in many different scenarios.
|
|
|
+A single version number is not sufficient to cover the whole of the product;
|
|
|
+instead we need different concepts to provide versioning capabilities
|
|
|
+for different aspects of Elasticsearch, depending on their scope, updatability,
|
|
|
+responsiveness, and maintenance.
|
|
|
+
|
|
|
+## Release version
|
|
|
+
|
|
|
+This is the version number used for published releases of Elasticsearch,
|
|
|
+and the Elastic stack. This takes the form _major.minor.patch_,
|
|
|
+with a corresponding version id.
|
|
|
+
|
|
|
+Use of this version number should be avoided, as it does not apply to
|
|
|
+some scenarios, and relying on it there will break Elasticsearch nodes.
|
|
|
+
|
|
|
+The release version is accessible in code through `Build.current().version()`,
|
|
|
+but it **should not** be assumed that this is a semantic version number;
|
|
|
+it could be any arbitrary string.
|
|
|
+
|
|
|
+## Transport protocol
|
|
|
+
|
|
|
+The transport protocol is used to send binary data between Elasticsearch nodes;
|
|
|
+`TransportVersion` is the version number used for this protocol.
|
|
|
+This version number is negotiated between each pair of nodes in the cluster
|
|
|
+on first connection, and is set as the lower of the highest transport version
|
|
|
+understood by each node.
|
|
|
+This version is then accessible through the `getTransportVersion` method
|
|
|
+on `StreamInput` and `StreamOutput`, so serialization code can read/write
|
|
|
+objects in a form that will be understood by the other node.
|
|
|
+
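The negotiation rule above can be sketched in a few lines. This is only an illustration: plain `int` values stand in for `TransportVersion` ids, and the class and method names (`HandshakeSketch`, `negotiate`) are hypothetical, not part of Elasticsearch.

```java
// Illustrative sketch only: ints stand in for TransportVersion ids,
// and HandshakeSketch/negotiate are hypothetical names.
final class HandshakeSketch {

    /**
     * Returns the version a pair of nodes would use: the lower of the
     * highest transport versions understood by each side.
     */
    static int negotiate(int localHighest, int remoteHighest) {
        return Math.min(localHighest, remoteHighest);
    }

    public static void main(String[] args) {
        int olderNode = 8_413_00_1; // hypothetical id
        int newerNode = 8_414_00_0; // hypothetical id
        // Both directions agree on the same version.
        System.out.println(negotiate(olderNode, newerNode));
        System.out.println(negotiate(newerNode, olderNode));
    }
}
```

Because the result is symmetric, both nodes arrive at the same version regardless of which one initiated the connection.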
|
|
|
+Every change to the transport protocol is represented by a new transport version,
|
|
|
+higher than all previous transport versions, which then becomes the highest version
|
|
|
+recognized by that build of Elasticsearch. The version ids are stored
|
|
|
+as constants in the `TransportVersions` class.
|
|
|
+Each id has a standard pattern `M_NNN_SS_P`, where:
|
|
|
+* `M` is the major version
|
|
|
+* `NNN` is an incrementing id
|
|
|
+* `SS` is used in subsidiary repos amending the default transport protocol
|
|
|
+* `P` is used for patches and backports
|
|
|
+
|
|
|
+When you make a change to the serialization form of any object,
|
|
|
+you need to create a new sequential constant in `TransportVersions`,
|
|
|
+introduced in the same PR that adds the change, that increments
|
|
|
+the `NNN` component from the previous highest version,
|
|
|
+with other components set to zero.
|
|
|
+For example, if the previous version number is `8_413_00_1`,
|
|
|
+the next version number should be `8_414_00_0`.
|
|
|
+
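To make the id arithmetic concrete, here is a minimal sketch that splits an id into its `M`, `NNN`, `SS`, and `P` components and computes the next sequential id. It assumes ids are plain decimal integers laid out exactly as the pattern suggests; the class `VersionIdSketch` and its methods are hypothetical helpers, not Elasticsearch APIs, and the sketch does not handle `NNN` running past 999.

```java
// Illustrative sketch only: VersionIdSketch is a hypothetical helper.
// Assumes an id is the decimal integer M_NNN_SS_P.
final class VersionIdSketch {

    static int major(int id) {
        return id / 1_000_000;          // M
    }

    static int increment(int id) {
        return (id / 1_000) % 1_000;    // NNN
    }

    static int subsidiary(int id) {
        return (id / 10) % 100;         // SS
    }

    static int patch(int id) {
        return id % 10;                 // P
    }

    /**
     * Next sequential id: bump NNN by one and zero the SS and P components.
     * Overflow of NNN past 999 is deliberately not handled here.
     */
    static int next(int previousHighest) {
        return (previousHighest / 1_000 + 1) * 1_000;
    }
}
```

With the example from the text, `next(8_413_00_1)` yields `8_414_00_0`: the patch digit of the previous highest id is dropped rather than carried forward.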
|
|
|
+Once you have defined your constant, you then need to use it
|
|
|
+in serialization code. If the transport version is at or above the new id,
|
|
|
+the modified protocol should be used:
|
|
|
+
|
|
|
+    str = in.readString();
+    bool = in.readBoolean();
+    // Only read the new field if the negotiated version includes the change
+    if (in.getTransportVersion().onOrAfter(TransportVersions.NEW_CONSTANT)) {
+        num = in.readVInt();
+    }
|
|
|
+
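The write side needs the matching check, otherwise the two nodes disagree about the bytes on the wire. The following self-contained sketch shows both directions using `java.io` data streams as stand-ins for `StreamOutput` and `StreamInput`, and an `int` as a stand-in for the transport version. All names here (`GatedSerializationSketch`, `NEW_CONSTANT`'s value, `write`, `readNum`) are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Illustrative sketch only: java.io streams stand in for StreamOutput/StreamInput,
// and an int stands in for the negotiated transport version.
final class GatedSerializationSketch {

    // Hypothetical id of the version that added the 'num' field.
    static final int NEW_CONSTANT = 8_414_00_0;

    /** Writes the object in the form understood at streamVersion. */
    static byte[] write(int streamVersion, String str, boolean bool, int num) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(str);
            out.writeBoolean(bool);
            // The new field is only written when the receiver understands it.
            if (streamVersion >= NEW_CONSTANT) {
                out.writeInt(num);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads 'num' back, falling back to defaultNum for older stream versions. */
    static int readNum(int streamVersion, byte[] data, int defaultNum) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            in.readUTF();
            in.readBoolean();
            // Must mirror the condition used in write().
            return streamVersion >= NEW_CONSTANT ? in.readInt() : defaultNum;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The key point is that the reader and the writer gate on the same constant with the same comparison, so a message written at a given version is always read with the same layout.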
|
|
|
+If a transport version change needs to be reverted, a **new** version constant
|
|
|
+should be added representing the revert, and the version id checks
|
|
|
+adjusted appropriately to only use the modified protocol between the version id
|
|
|
+the change was added, and the new version id used for the revert (exclusive).
|
|
|
+The `between` method can be used for this.
|
|
|
+
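A revert therefore turns the single lower bound into a window. The sketch below models that window with plain `int` ids; the two constants and the class name are hypothetical, and `between` here is a local helper that mirrors the inclusive-lower, exclusive-upper semantics described above rather than the real method.

```java
// Illustrative sketch only: ints stand in for TransportVersion ids,
// and both constants are hypothetical.
final class RevertWindowSketch {

    static final int CHANGE_ADDED = 8_414_00_0;    // version that introduced the change
    static final int CHANGE_REVERTED = 8_420_00_0; // new version representing the revert

    /** Lower bound is inclusive, upper bound is exclusive. */
    static boolean between(int version, int lowerInclusive, int upperExclusive) {
        return version >= lowerInclusive && version < upperExclusive;
    }

    /** The modified protocol applies only inside the window. */
    static boolean usesModifiedProtocol(int streamVersion) {
        return between(streamVersion, CHANGE_ADDED, CHANGE_REVERTED);
    }
}
```

Nodes on any version inside the window keep speaking the modified protocol to each other, while nodes at or after the revert id go back to the original form.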
|
|
|
+Once a transport change with a new version has been merged into main or a release branch,
|
|
|
+it **must not** be modified - this is so the meaning of that specific
|
|
|
+transport version does not change.
|
|
|
+
|
|
|
+_Elastic developers_ - please see corresponding documentation for Serverless
|
|
|
+on creating transport versions for Serverless changes.
|
|
|
+
|
|
|
+### Collapsing transport versions
|
|
|
+
|
|
|
+As each change adds a new constant, the list of constants in `TransportVersions`
|
|
|
+will keep growing. However, once there has been an official release of Elasticsearch
|
|
|
+that includes that change, that specific transport version is no longer needed,
|
|
|
+apart from constants that happen to be used for release builds.
|
|
|
+As part of managing transport versions, consecutive transport versions can be
|
|
|
+periodically collapsed together into those that are only used for release builds.
|
|
|
+This task is normally performed by Core/Infra on a semi-regular basis,
|
|
|
+usually after each new minor release, to collapse the transport versions
|
|
|
+for the previous minor release. An example of such an operation can be found
|
|
|
+[here](https://github.com/elastic/elasticsearch/pull/104937).
|
|
|
+
|
|
|
+### Minimum compatibility versions
|
|
|
+
|
|
|
+The transport version used between two nodes is determined by the initial handshake
|
|
|
+(see `TransportHandshaker`, where the two nodes swap their highest known transport version).
|
|
|
+The lowest transport version that is compatible with the current node
|
|
|
+is determined by `TransportVersions.MINIMUM_COMPATIBLE`,
|
|
|
+and the node is prevented from joining the cluster if it is below that version.
|
|
|
+This constant should be updated manually on a major release.
|
|
|
+
|
|
|
+The minimum version that can be used for CCS is determined by
|
|
|
+`TransportVersions.MINIMUM_CCS_VERSION`, but this is not actively checked
|
|
|
+before queries are performed. Only if a query cannot be serialized at that
|
|
|
+version is an action rejected. This constant is updated automatically
|
|
|
+as part of performing a release.
|
|
|
+
|
|
|
+### Mapping to release versions
|
|
|
+
|
|
|
+For releases that do use a version number, it can be confusing to encounter
|
|
|
+a log or exception message that references an arbitrary transport version,
|
|
|
+where you don't know which release version that corresponds to. This is where
|
|
|
+the `.toReleaseVersion()` method comes in. It uses metadata stored in a csv file
|
|
|
+(`TransportVersions.csv`) to map from the transport version id to the corresponding
|
|
|
+release version. For any transport versions it encounters without a direct map,
|
|
|
+it performs a best guess based on the information it has. The csv file
|
|
|
+is updated automatically as part of performing a release.
|
|
|
+
|
|
|
+In releases that do not have a release version number, that method becomes
|
|
|
+a no-op.
|
|
|
+
|
|
|
+### Managing patches and backports
|
|
|
+
|
|
|
+Backporting transport version changes to previous releases
|
|
|
+should only be done if absolutely necessary, as it is very easy to get wrong
|
|
|
+and break the release in a way that is very hard to recover from.
|
|
|
+
|
|
|
+If we consider the version number as an incrementing line, what we are doing is
|
|
|
+grafting a change that takes effect at a certain point in the line,
|
|
|
+to additionally take effect in a fixed window earlier in the line.
|
|
|
+
|
|
|
+To take an example, using indicative version numbers, when the latest
|
|
|
+transport version is 52, we decide we need to backport a change done in
|
|
|
+transport version 50 to transport version 45. We use the `P` version id component
|
|
|
+to create version 45.1 with the backported change.
|
|
|
+This change will apply for version ids 45.1 to 45.9 (should they exist in the future).
|
|
|
+
|
|
|
+The serialization code in the backport needs to use the backported protocol
|
|
|
+for all version numbers 45.1 to 45.9. The `TransportVersion.isPatchFrom` method
|
|
|
+can be used to easily determine if this is the case: `streamVersion.isPatchFrom(45.1)`.
|
|
|
+However, the `onOrAfter` method also does what is needed on patch branches.
|
|
|
+
|
|
|
+The serialization code in version 53 then needs to additionally check
|
|
|
+version numbers 45.1-45.9 to use the backported protocol, also using the `isPatchFrom` method.
|
|
|
+
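The "fixed window earlier in the line" can be sketched as follows. This models ids as integers in the `M_NNN_SS_P` layout and treats a patch window as every id that shares the same leading digits and has a `P` component at or above the backport's. The class, the constants, and this reading of `isPatchFrom` are assumptions for illustration, not the actual implementation.

```java
// Illustrative sketch only: ints stand in for TransportVersion ids.
// The constants map the "45.1" and "50" of the example onto hypothetical ids.
final class PatchCheckSketch {

    static final int BACKPORT_45_1 = 8_045_00_1; // "45.1": change backported here
    static final int MAIN_50 = 8_050_00_0;       // "50": change originally added here

    static boolean onOrAfter(int version, int other) {
        return version >= other;
    }

    /**
     * True if version lies in the patch window that starts at patchId:
     * same M_NNN_SS digits, with P at or above patchId's P.
     */
    static boolean isPatchFrom(int version, int patchId) {
        int nextBase = patchId - (patchId % 10) + 10; // first id after the window
        return version >= patchId && version < nextBase;
    }

    /** The check that main needs once the backport exists. */
    static boolean usesChange(int streamVersion) {
        return isPatchFrom(streamVersion, BACKPORT_45_1) || onOrAfter(streamVersion, MAIN_50);
    }
}
```

Versions 46 to 49 fall between the two conditions and so keep the old protocol, which is exactly why a plain `onOrAfter` on the backport id is not enough outside the patch branch.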
|
|
|
+As an example, [this transport change](https://github.com/elastic/elasticsearch/pull/107862)
|
|
|
+was backported from 8.15 to [8.14.0](https://github.com/elastic/elasticsearch/pull/108251)
|
|
|
+and [8.13.4](https://github.com/elastic/elasticsearch/pull/108250) at the same time
|
|
|
+(8.14 was a build candidate at the time).
|
|
|
+
|
|
|
+The 8.13 PR has:
|
|
|
+
|
|
|
+    if (transportVersion.onOrAfter(8.13_backport_id))
|
|
|
+
|
|
|
+The 8.14 PR has:
|
|
|
+
|
|
|
+    if (transportVersion.isPatchFrom(8.13_backport_id)
+        || transportVersion.onOrAfter(8.14_backport_id))
|
|
|
+
|
|
|
+The 8.15 PR has:
|
|
|
+
|
|
|
+    if (transportVersion.isPatchFrom(8.13_backport_id)
+        || transportVersion.isPatchFrom(8.14_backport_id)
+        || transportVersion.onOrAfter(8.15_transport_id))
|
|
|
+
|
|
|
+In particular, if you are backporting a change to a patch release,
|
|
|
+you also need to make sure that any subsequent released version on any branch
|
|
|
+also has that change, and knows about the patch backport ids and what they mean.
|
|
|
+
|
|
|
+## Index version
|
|
|
+
|
|
|
+Index version is a single incrementing version number for the index data format,
|
|
|
+metadata, and associated mappings. It is declared the same way as the
|
|
|
+transport version - with the pattern `M_NNN_SS_P`, for the major version, version id,
|
|
|
+subsidiary version id, and patch number respectively.
|
|
|
+
|
|
|
+Index version is stored in index metadata when an index is created,
|
|
|
+and it is used to determine the storage format and what functionality that index supports.
|
|
|
+The index version does not change once an index is created.
|
|
|
+
|
|
|
+In the same way as transport versions, when a change is needed to the index
|
|
|
+data format or metadata, or new mapping types are added, create a new version constant
|
|
|
+below the last one, incrementing the `NNN` version component.
|
|
|
+
|
|
|
+Unlike transport version, version constants cannot be collapsed together,
|
|
|
+as an index keeps its creation version id once it is created.
|
|
|
+Fortunately, new index versions are only created once a month or so,
|
|
|
+so we don’t have a large list of index versions that need managing.
|
|
|
+
|
|
|
+Similar to transport version, index version has a `toReleaseVersion` to map
|
|
|
+onto release versions, in appropriate situations.
|
|
|
+
|
|
|
+## Cluster Features
|
|
|
+
|
|
|
+Cluster features are identifiers, published by a node in cluster state,
|
|
|
+indicating that the node supports a particular top-level operation or set of functionality.
|
|
|
+They are used for internal checks within Elasticsearch, and for gating tests
|
|
|
+on certain functionality. For example, they can be used to check that all nodes have upgraded
|
|
|
+to a certain point before running a large migration operation to a new data format.
|
|
|
+Cluster features should not be referenced by anything outside the Elasticsearch codebase.
|
|
|
+
|
|
|
+Cluster features are indicative of top-level functionality introduced to
|
|
|
+Elasticsearch - e.g. a new transport endpoint, or new operations.
|
|
|
+
|
|
|
+They are also used to check whether nodes can join a cluster - once all nodes in a cluster
|
|
|
+support a particular feature, no nodes can then join the cluster that do not
|
|
|
+support that feature. This is to ensure that once a feature is supported
|
|
|
+by a cluster, it will then always be supported in the future.
|
|
|
+
|
|
|
+To declare a new cluster feature, add an implementation of the `FeatureSpecification` SPI,
|
|
|
+suitably registered (or use an existing one for your code area), and add the feature
|
|
|
+as a constant to be returned by `getFeatures`. To then check whether all nodes
|
|
|
+in the cluster support that feature, use the method `clusterHasFeature` on `FeatureService`.
|
|
|
+It is only possible to check whether all nodes in the cluster have a feature;
|
|
|
+individual node checks should not be done.
|
|
|
+
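The all-nodes semantics and the join rule can be modelled with ordinary sets. This is a simplified sketch under stated assumptions: each node is represented only by the set of feature ids it publishes, and `ClusterFeatureSketch` with its methods is a hypothetical stand-in, not the `FeatureService` implementation.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Illustrative sketch only: each node is reduced to the set of feature ids it publishes.
final class ClusterFeatureSketch {

    /** Features published by every node currently in the cluster. */
    static Set<String> clusterWideFeatures(Collection<Set<String>> featuresByNode) {
        Iterator<Set<String>> nodes = featuresByNode.iterator();
        if (!nodes.hasNext()) {
            return Set.of();
        }
        Set<String> common = new HashSet<>(nodes.next());
        while (nodes.hasNext()) {
            common.retainAll(nodes.next());
        }
        return common;
    }

    /** A feature counts only when all nodes support it; there is no per-node check. */
    static boolean clusterHasFeature(Collection<Set<String>> featuresByNode, String feature) {
        return clusterWideFeatures(featuresByNode).contains(feature);
    }

    /** A joining node must support everything the whole cluster already supports. */
    static boolean canJoin(Collection<Set<String>> featuresByNode, Set<String> joiningNode) {
        return joiningNode.containsAll(clusterWideFeatures(featuresByNode));
    }
}
```

The join rule is what makes a cluster-wide feature permanent: once every node publishes it, no node lacking it can be admitted, so the intersection can never lose that feature.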
|
|
|
+Once a cluster feature is declared and deployed, it cannot be modified or removed,
|
|
|
+else new nodes will not be able to join existing clusters.
|
|
|
+If functionality represented by a cluster feature needs to be removed,
|
|
|
+a new cluster feature should be added indicating that functionality is no longer
|
|
|
+supported, and the code modified accordingly (bearing in mind additional BwC constraints).
|
|
|
+
|
|
|
+The cluster features infrastructure is only designed to support a few hundred features
|
|
|
+per major release, and once features are added to a cluster they cannot be removed.
|
|
|
+Cluster features should therefore be used sparingly.
|
|
|
+Adding too many cluster features risks increasing cluster instability.
|
|
|
+
|
|
|
+When we release a new major version N, we limit our backwards compatibility
|
|
|
+to the highest minor of the previous major N-1. Therefore, any cluster formed
|
|
|
+with the new major version is guaranteed to have all features introduced during
|
|
|
+releases of major N-1. All such features can be deemed to be met by the cluster,
|
|
|
+and the features themselves can be removed from cluster state over time,
|
|
|
+and the feature checks removed from the code of major version N.
|
|
|
+
|
|
|
+### Testing
|
|
|
+
|
|
|
+Tests often want to check if a certain feature is implemented / available on all nodes,
|
|
|
+particularly in BwC or mixed-cluster tests.
|
|
|
+
|
|
|
+Rather than introducing a production feature just for a test condition,
|
|
|
+this can be done by adding a _test feature_ in an implementation of
|
|
|
+`FeatureSpecification.getTestFeatures`. These features will only be set
|
|
|
+on clusters running as part of an integration test. Even so, cluster features
|
|
|
+should be used sparingly if possible; Capabilities is generally a better
|
|
|
+option for test conditions.
|
|
|
+
|
|
|
+In Java Rest tests, checking cluster features can be done using
|
|
|
+`ESRestTestCase.clusterHasFeature(feature)`.
|
|
|
+
|
|
|
+In YAML Rest tests, conditions can be defined in the `requires` or `skip` sections
|
|
|
+that use cluster features; see [here](https://github.com/elastic/elasticsearch/blob/main/rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/README.asciidoc#skipping-tests) for more information.
|
|
|
+
|
|
|
+To aid with backwards compatibility tests, the test framework adds synthetic features
|
|
|
+for each previously released Elasticsearch version, of the form `gte_v{VERSION}`
|
|
|
+(for example `gte_v8.14.2`).
|
|
|
+This can be used to add conditions based on previous releases. It _cannot_ be used
|
|
|
+to check the current snapshot version; real features or capabilities should be
|
|
|
+used instead.
|
|
|
+
|
|
|
+## Capabilities
|
|
|
+
|
|
|
+The Capabilities API is a REST API for external clients to check the capabilities
|
|
|
+of an Elasticsearch cluster. As it is dynamically calculated for every query,
|
|
|
+it is not limited in size or usage.
|
|
|
+
|
|
|
+A capabilities query can be used to check three things:
|
|
|
+* Is this endpoint supported for this HTTP method?
|
|
|
+* Are these parameters of this endpoint supported?
|
|
|
+* Are these capabilities (arbitrary string ids) of this endpoint supported?
|
|
|
+
|
|
|
+The API will return with a simple true/false, indicating if all specified aspects
|
|
|
+of the endpoint are supported by all nodes in the cluster.
|
|
|
+If any aspect is not supported by any one node, the API returns `false`.
|
|
|
+
|
|
|
+The API can also return `supported: null` (indicating unknown)
|
|
|
+if there was a problem communicating with one or more nodes in the cluster.
|
|
|
+
|
|
|
+All registered endpoints automatically work with the endpoint existence check.
|
|
|
+To add support for parameter and feature capability queries to your REST endpoint,
|
|
|
+implement the `supportedQueryParameters` and `supportedCapabilities` methods in your rest handler.
|
|
|
+
|
|
|
+To perform a capability query, perform a REST call to the `_capabilities` API,
|
|
|
+with parameters `method`, `path`, `parameters`, `capabilities`.
|
|
|
+The call will query every node in the cluster, and return `{supported: true}`
|
|
|
+if all nodes support that specific combination of method, path, query parameters,
|
|
|
+and endpoint capabilities. If any single aspect is not supported,
|
|
|
+the query will return `{supported: false}`. If there are any problems
|
|
|
+communicating with nodes in the cluster, the response will be `{supported: null}`
|
|
|
+indicating support or lack thereof cannot currently be determined.
|
|
|
+In Java REST tests, capabilities can be checked using the `clusterHasCapability` method in `ESRestTestCase`.
|
|
|
+
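As a rough illustration of what such a call looks like on the wire, the sketch below assembles the request path and query string for a capability check. The parameter names come from the text above; joining list values with commas is an assumption about how multi-valued parameters are passed, and `CapabilitiesQuerySketch` is a hypothetical helper, not a client API.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch only: builds the request line for a capability query.
// Comma-joining list values is an assumption, not confirmed by the text above.
final class CapabilitiesQuerySketch {

    static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    /** Encodes each value separately so the separating commas stay literal. */
    static String joinEncoded(List<String> values) {
        return values.stream().map(CapabilitiesQuerySketch::encode).collect(Collectors.joining(","));
    }

    static String requestPath(String method, String path, List<String> parameters, List<String> capabilities) {
        return "/_capabilities?method=" + encode(method)
            + "&path=" + encode(path)
            + "&parameters=" + joinEncoded(parameters)
            + "&capabilities=" + joinEncoded(capabilities);
    }
}
```

A caller would issue a `GET` for the resulting path and then inspect the `supported` field of the response, treating `null` as "unknown" rather than as `false`.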
|
|
|
+Similar to cluster features, YAML tests can have `skip` and `requires` conditions
|
|
|
+specified with capabilities like the following:
|
|
|
+
|
|
|
+    - requires:
+        capabilities:
+          - method: GET
+            path: /_endpoint
+            parameters: [param1, param2]
+            capabilities: [cap1, cap2]
|
|
|
+
|
|
|
+`method: GET` is the default, and does not need to be explicitly specified.
|