Add some information on locale database to the ES docs (#113587)

Simon Cooper 1 year ago
parent
commit
53d9c3cc6a

+ 8 - 0
docs/reference/ingest/processors/date.asciidoc

@@ -67,3 +67,11 @@ the timezone and locale values.
 }
 --------------------------------------------------
 // NOTCONSOLE
+
+[WARNING]
+====
+// tag::locale-warning[]
+The text strings accepted by textual date formats, and calculations for week-dates, depend on the JDK version
+that Elasticsearch is running on. For more information see <<custom-date-format-locales,custom date formats>>.
+// end::locale-warning[]
+====
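
As a rough illustration of the warning above, the following sketch contrasts a numeric-only pattern with a textual one using plain `java.time`. The class and method names are hypothetical and are not part of Elasticsearch. The textual output is deliberately not shown, since the strings for `EEE` and `MMM` come from the JDK's locale database and may differ between JDK versions.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class TextualVsNumericFormat {
    // Numeric-only pattern: the output does not depend on locale text data.
    static final DateTimeFormatter NUMERIC =
        DateTimeFormatter.ofPattern("uuuu-MM-dd", Locale.ROOT);

    // Textual pattern: the strings for 'EEE' and 'MMM' are looked up in the
    // locale database of the running JDK.
    static final DateTimeFormatter TEXTUAL =
        DateTimeFormatter.ofPattern("EEE, d MMM uuuu", Locale.ENGLISH);

    static String numeric(LocalDate date) {
        return NUMERIC.format(date);
    }

    static String textual(LocalDate date) {
        return TEXTUAL.format(date);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2024, 9, 15);
        System.out.println(numeric(date)); // prints "2024-09-15"
        System.out.println(textual(date)); // text depends on the JDK's locale data
    }
}
```

A string produced by the textual formatter parses back with the same formatter on the same JVM. The risk described in the warning arises when the producer and the consumer of that string use different locale data.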

+ 44 - 8
docs/reference/mapping/params/format.asciidoc

@@ -31,8 +31,38 @@ down to the nearest day.
 [[custom-date-formats]]
 ==== Custom date formats
 
-Completely customizable date formats are supported. The syntax for these is explained
-https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html[DateTimeFormatter docs].
+Completely customizable date formats are supported. The syntax for these is explained in
+https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/time/format/DateTimeFormatter.html[DateTimeFormatter docs].
+
+[[custom-date-format-locales]]
+===== Differences in locale information between JDK versions
+
+There can be some differences in date formats between JDK versions and different locales. In particular,
+there can be differences in text strings used for textual date formats, and there can be differences
+in the results of week-date calculations.
+
+There can be differences in text strings used by the following field specifiers:
+
+* `B`, `E`, `G`, `O`, `a`, `v`, `z` of any length
+* `L`, `M`, `Q`, `q`, `c`, `e` of length 3 or greater
+* `Z` of length 4
+
+If the text format changes between Elasticsearch or JDK versions, it can cause significant problems
+with ingest, output, and re-indexing. It is recommended to always use numerical fields in custom date formats,
+which are not affected by locale information.
+
+There can also be differences in week-date calculations using the `Y`, `W`, and `w` field specifiers.
+The underlying data used to calculate week-dates can vary depending on the JDK version and locale;
+this can cause differences in the calculated week-date for the same calendar dates.
+It is recommended that the built-in week-date formats are used, which will always use ISO rules
+for calculating week-dates.
+
+In particular, there is a significant change in locale information between JDK releases 22 and 23.
+Elasticsearch will use the _COMPAT_ locale database when run on JDK 22 and before,
+and will use the _CLDR_ locale database when run on JDK 23 and above. This change can cause significant differences
+in the textual date formats accepted by Elasticsearch, and in calculated week-dates. If you are using
+affected specifiers, you may need to modify your ingest or output integration code to account
+for the differences between these two JDK versions.
 
 [[built-in-date-formats]]
 ==== Built In Formats
@@ -256,31 +286,37 @@ The following tables lists all the defaults ISO formats supported:
 `week_date` or `strict_week_date`::
 
     A formatter for a full date as four digit weekyear, two digit week of
-    weekyear, and one digit day of week: `xxxx-'W'ww-e`.
+    weekyear, and one digit day of week: `YYYY-'W'ww-e`.
+    This uses the ISO week-date definition.
 
 `week_date_time` or `strict_week_date_time`::
 
     A formatter that combines a full weekyear date and time, separated by a
-    'T': `xxxx-'W'ww-e'T'HH:mm:ss.SSSZ`.
+    'T': `YYYY-'W'ww-e'T'HH:mm:ss.SSSZ`.
+    This uses the ISO week-date definition.
 
 `week_date_time_no_millis` or `strict_week_date_time_no_millis`::
 
     A formatter that combines a full weekyear date and time without millis,
-    separated by a 'T': `xxxx-'W'ww-e'T'HH:mm:ssZ`.
+    separated by a 'T': `YYYY-'W'ww-e'T'HH:mm:ssZ`.
+    This uses the ISO week-date definition.
 
 `weekyear` or `strict_weekyear`::
 
-    A formatter for a four digit weekyear: `xxxx`.
+    A formatter for a four digit weekyear: `YYYY`.
+    This uses the ISO week-date definition.
 
 `weekyear_week` or `strict_weekyear_week`::
 
     A formatter for a four digit weekyear and two digit week of weekyear:
-    `xxxx-'W'ww`.
+    `YYYY-'W'ww`.
+    This uses the ISO week-date definition.
 
 `weekyear_week_day` or `strict_weekyear_week_day`::
 
     A formatter for a four digit weekyear, two digit week of weekyear, and one
-    digit day of week: `xxxx-'W'ww-e`.
+    digit day of week: `YYYY-'W'ww-e`.
+    This uses the ISO week-date definition.
 
 `year` or `strict_year`::
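
To make the week-date point concrete, here is a minimal sketch of how the same calendar date maps to different week-dates under different week rules. It uses `java.time.temporal.WeekFields` directly. The class name `WeekDateRules` and the helper `weekDate` are hypothetical, and the Sunday-start rule is set explicitly as a stand-in for rules that a locale database might supply, so the result does not depend on the JDK's locale data.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.WeekFields;
import java.util.Locale;

public class WeekDateRules {
    // Formats a date as weekyear-'W'week-day under the given week rules.
    static String weekDate(LocalDate date, WeekFields rules) {
        int weekYear = date.get(rules.weekBasedYear());
        int week = date.get(rules.weekOfWeekBasedYear());
        int day = date.get(rules.dayOfWeek());
        return String.format(Locale.ROOT, "%04d-W%02d-%d", weekYear, week, day);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2021, 1, 1); // a Friday

        // ISO rules: weeks start on Monday, week 1 has at least 4 days.
        System.out.println(weekDate(date, WeekFields.ISO)); // prints "2020-W53-5"

        // Illustrative non-ISO rules: weeks start on Sunday, week 1 needs 1 day.
        WeekFields sundayStart = WeekFields.of(DayOfWeek.SUNDAY, 1);
        System.out.println(weekDate(date, sundayStart)); // prints "2021-W01-6"
    }
}
```

The week-year, the week number, and the day number all differ for the same calendar date, which is why week-dates computed from locale-dependent rules can change when the locale data changes.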
 

+ 9 - 1
docs/reference/mapping/types/date.asciidoc

@@ -81,6 +81,14 @@ on those dates so they should be avoided.
 // end::decimal-warning[]
 ====
 
+[WARNING]
+====
+// tag::locale-warning[]
+The text strings accepted by textual date formats, and calculations for week-dates, depend on the JDK version
+that Elasticsearch is running on. For more information see <<custom-date-format-locales,custom date formats>>.
+// end::locale-warning[]
+====
+
 [[multiple-date-formats]]
 ==== Multiple date formats
 
@@ -126,7 +134,7 @@ The following parameters are accepted by `date` fields:
 
     The locale to use when parsing dates since months do not have the same names
     and/or abbreviations in all languages. The default is the
-    https://docs.oracle.com/javase/8/docs/api/java/util/Locale.html#ROOT[`ROOT` locale],
+    https://docs.oracle.com/javase/8/docs/api/java/util/Locale.html#ROOT[`ROOT` locale].
 
 <<ignore-malformed,`ignore_malformed`>>::
 

+ 18 - 1
docs/reference/migration/migrate_8_16.asciidoc

@@ -16,5 +16,22 @@ coming::[8.16.0]
 [[breaking-changes-8.16]]
 === Breaking changes
 
-There are no breaking changes in {es} 8.16.
+The following changes in {es} 8.16 might affect your applications
+and prevent them from operating normally.
+Before upgrading to 8.16, review these changes and take the described steps
+to mitigate the impact.
 
+[discrete]
+[[breaking_816_locale_change]]
+==== JDK locale database change
+
+{es} 8.16 changes the version of the bundled JDK from version 22 to version 23. This changes
+the locale database that is used by Elasticsearch from the _COMPAT_ database to the _CLDR_ database.
+This can result in significant changes to custom textual date field formats,
+and to calculations for custom week-date fields.
+
+For more information see <<custom-date-format-locales,custom date formats>>.
+
+If you run {es} 8.16 on JDK version 22 or below, it will use the _COMPAT_ locale database
+to match the behavior of 8.15. However, starting with {es} 9.0,
+{es} will use the _CLDR_ database regardless of the JDK version it is run on.

+ 16 - 3
docs/reference/setup/install.asciidoc

@@ -5,8 +5,8 @@
 [[hosted-elasticsearch-service]]
 === Hosted Elasticsearch Service
 
-{ecloud} offers all of the features of {es}, {kib}, and  Elastic’s {observability}, {ents}, and {elastic-sec} solutions as a hosted service 
-available on AWS, GCP, and Azure. 
+{ecloud} offers all of the features of {es}, {kib}, and  Elastic’s {observability}, {ents}, and {elastic-sec} solutions as a hosted service
+available on AWS, GCP, and Azure.
 
 To set up Elasticsearch in {ecloud}, sign up for a {ess-trial}[free {ecloud} trial].
 
@@ -17,7 +17,7 @@ To set up Elasticsearch in {ecloud}, sign up for a {ess-trial}[free {ecloud} tri
 If you want to install and manage {es} yourself, you can:
 
 * Run {es} using a <<elasticsearch-install-packages,Linux, MacOS, or Windows install package>>.
-* Run {es} in a <<elasticsearch-docker-images,Docker container>>. 
+* Run {es} in a <<elasticsearch-docker-images,Docker container>>.
 * Set up and manage {es}, {kib}, {agent}, and the rest of the Elastic Stack on Kubernetes with {eck-ref}[{eck}].
 
 TIP: To try out Elasticsearch on your own machine, we recommend using Docker and running both Elasticsearch and Kibana. For more information, see <<run-elasticsearch-locally,Run Elasticsearch locally>>. Please note that this setup is *not suitable for production use*.
@@ -98,6 +98,19 @@ the bundled JVM are treated as if they were within {es} itself.
 The bundled JVM is located within the `jdk` subdirectory of the {es} home
 directory. You may remove this directory if using your own JVM.
 
+[discrete]
+[[jdk-locale]]
+=== JDK locale database
+
+The locale database that {es} uses to map from various date formats to
+the underlying date storage format depends on the version of the JDK
+that {es} is running on. On JDK version 23 and above, {es} will use the
+_CLDR_ database. On JDK version 22 and below, {es} will use the _COMPAT_
+database. As a result, the strings used for textual date formats,
+and the output of custom week-date formats, may change when moving from
+a previous JDK version to JDK 23 or above. For more information, see
+<<custom-date-format-locales,custom date formats>>.
+
 [discrete]
 [[jvm-agents]]
 === JVM and Java agents

+ 1 - 0
server/src/main/java/org/elasticsearch/common/ReferenceDocs.java

@@ -84,6 +84,7 @@ public enum ReferenceDocs {
     FLOOD_STAGE_WATERMARK,
     X_OPAQUE_ID,
     FORMING_SINGLE_NODE_CLUSTERS,
+    JDK_LOCALE_DIFFERENCES,
     // this comment keeps the ';' on the next line so every entry above has a trailing ',' which makes the diff for adding new links cleaner
     ;
 

+ 7 - 6
server/src/main/java/org/elasticsearch/common/time/DateUtils.java

@@ -9,6 +9,7 @@
 
 package org.elasticsearch.common.time;
 
+import org.elasticsearch.common.ReferenceDocs;
 import org.elasticsearch.common.logging.DeprecationCategory;
 import org.elasticsearch.common.logging.DeprecationLogger;
 import org.elasticsearch.core.Predicates;
@@ -405,18 +406,18 @@ public class DateUtils {
             deprecationLogger.warn(
                 DeprecationCategory.PARSING,
                 "cldr_date_formats_" + format,
-                "Date format [{}] contains textual field specifiers that could change in JDK 23."
-                    + " For more information, see https://ela.st/jdk-23-locales",
-                format
+                "Date format [{}] contains textual field specifiers that could change in JDK 23. See [{}] for more information.",
+                format,
+                ReferenceDocs.JDK_LOCALE_DIFFERENCES
             );
         }
         if (CONTAINS_WEEK_DATE_SPECIFIERS.test(format)) {
             deprecationLogger.warn(
                 DeprecationCategory.PARSING,
                 "cldr_week_dates_" + format,
-                "Date format [{}] contains week-date field specifiers that are changing in JDK 23."
-                    + " For more information, see https://ela.st/jdk-23-locales",
-                format
+                "Date format [{}] contains week-date field specifiers that are changing in JDK 23. See [{}] for more information.",
+                format,
+                ReferenceDocs.JDK_LOCALE_DIFFERENCES
             );
         }
     }
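
The diff does not show how the predicates behind these warnings are implemented. As a hedged sketch only, the specifier rules listed in `format.asciidoc` could be checked with a small scanner like the one below. The class `SpecifierScan` and its methods are hypothetical and are not the Elasticsearch implementation. The scanner skips text inside single quotes, so literals such as `'W'` are not treated as specifiers.

```java
public class SpecifierScan {
    // True if the pattern contains a specifier whose text comes from locale data:
    // B, E, G, O, a, v, z of any length; L, M, Q, q, c, e of length 3 or more;
    // Z of length 4.
    static boolean hasTextualSpecifier(String pattern) {
        return scan(pattern, false);
    }

    // True if the pattern contains a week-date specifier: Y, W, or w.
    static boolean hasWeekDateSpecifier(String pattern) {
        return scan(pattern, true);
    }

    private static boolean scan(String pattern, boolean weekDates) {
        boolean inQuote = false;
        int i = 0;
        while (i < pattern.length()) {
            char c = pattern.charAt(i);
            if (c == '\'') {
                // A doubled quote toggles twice, so the quoting state is unchanged.
                inQuote = !inQuote;
                i++;
                continue;
            }
            if (inQuote || !Character.isLetter(c)) {
                i++;
                continue;
            }
            // Measure the run of identical pattern letters, e.g. "MMM" has length 3.
            int start = i;
            while (i < pattern.length() && pattern.charAt(i) == c) {
                i++;
            }
            int length = i - start;
            if (weekDates) {
                if ("YWw".indexOf(c) >= 0) return true;
            } else {
                if ("BEGOavz".indexOf(c) >= 0) return true;
                if ("LMQqce".indexOf(c) >= 0 && length >= 3) return true;
                if (c == 'Z' && length == 4) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasTextualSpecifier("uuuu-MM-dd"));     // prints "false"
        System.out.println(hasTextualSpecifier("dd MMM uuuu"));    // prints "true"
        System.out.println(hasWeekDateSpecifier("YYYY-'W'ww-e"));  // prints "true"
    }
}
```

A check along these lines flags `dd MMM uuuu` as textual while leaving `uuuu-MM-dd` alone, which matches the recommendation to prefer numeric fields in custom formats.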

+ 2 - 1
server/src/main/resources/org/elasticsearch/common/reference-docs-links.json

@@ -43,5 +43,6 @@
   "MAX_SHARDS_PER_NODE": "size-your-shards.html#troubleshooting-max-shards-open",
   "FLOOD_STAGE_WATERMARK": "fix-watermark-errors.html",
   "X_OPAQUE_ID": "api-conventions.html#x-opaque-id",
-  "FORMING_SINGLE_NODE_CLUSTERS": "modules-discovery-bootstrap-cluster.html#modules-discovery-bootstrap-cluster-joining"
+  "FORMING_SINGLE_NODE_CLUSTERS": "modules-discovery-bootstrap-cluster.html#modules-discovery-bootstrap-cluster-joining",
+  "JDK_LOCALE_DIFFERENCES": "mapping-date-format.html#custom-date-format-locales"
 }