Jelajahi Sumber

ESQL: Document warnings behavior in CsvTests (#125441)

The `CsvTests` has a slight difference regarding warnings from real
Elasticsearch indices and this is worth documenting. I've also added an
explanation to `SingleValueMatchQuery` that explains *exactly* when it
makes a warning because it's not *exactly* the same as when the compute
engine would make a warning. The resulting documents are the same - but
the warnings are not.
Nik Everett 6 bulan lalu
induk
melakukan
f83ca0c6b7

+ 11 - 1
x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/querydsl/query/SingleValueMatchQuery.java

@@ -37,7 +37,17 @@ import java.io.IOException;
 import java.util.Objects;
 
 /**
- * Finds all fields with a single-value. If a field has a multi-value, it emits a {@link Warnings}.
+ * Finds all fields with a single-value. If a field has a multi-value, it emits
+ * a {@link Warnings warning}.
+ * <p>
+ *     Warnings are only emitted if the {@link TwoPhaseIterator#matches}. Meaning that,
+ *     if the other query skips the doc either because the index doesn't match or because it's
+ *     {@link TwoPhaseIterator#matches} doesn't match, then we won't log warnings. So it's
+ *     most safe to say that this will emit a warning if the document would have
+ *     matched but for having a multivalued field. If the document doesn't match but
+ *     "almost" matches in some fairly lucene-specific ways then it *might* emit
+ *     a warning.
+ * </p>
  */
 public final class SingleValueMatchQuery extends Query {
 

+ 12 - 0
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/CsvTests.java

@@ -30,6 +30,7 @@ import org.elasticsearch.compute.operator.Driver;
 import org.elasticsearch.compute.operator.DriverRunner;
 import org.elasticsearch.compute.operator.exchange.ExchangeSinkHandler;
 import org.elasticsearch.compute.operator.exchange.ExchangeSourceHandler;
+import org.elasticsearch.compute.querydsl.query.SingleValueMatchQuery;
 import org.elasticsearch.core.Releasables;
 import org.elasticsearch.core.Tuple;
 import org.elasticsearch.index.IndexMode;
@@ -153,6 +154,17 @@ import static org.hamcrest.Matchers.notNullValue;
  * it’s creating its own Source physical operator, aggregation operator (just a tiny bit of it) and field extract operator.
  * <p>
  * To log the results logResults() should return "true".
+ * <p>
+ * This test never pushes to Lucene because there isn't a Lucene index to push to. It always runs everything in
+ * the compute engine. This yields the same results modulo a few things:
+ * <ul>
+ *     <li>Warnings for multivalued fields: See {@link SingleValueMatchQuery} for an in depth discussion, but the
+ *         short version is this class will always emit warnings on multivalued fields but tests that run against
+ *         a real index are only guaranteed to emit a warning if the document would match all filters <strong>except</strong>
+ *         it has a multivalue field.</li>
+ *     <li>Sorting: This class emits values in the order they appear in the {@code .csv} files that power it. A real
+ *         index emits documents a fair random order. Multi-shard and multi-node tests doubly so.</li>
+ * </ul>
  */
 // @TestLogging(value = "org.elasticsearch.xpack.esql:TRACE,org.elasticsearch.compute:TRACE", reason = "debug")
 public class CsvTests extends ESTestCase {