[[query-dsl-mlt-query]]
=== More Like This Query

The More Like This query finds documents that are "like" the provided text by
running it against one or more fields.

[source,js]
--------------------------------------------------
{
    "more_like_this" : {
        "fields" : ["name.first", "name.last"],
        "like" : "text like this one",
        "min_term_freq" : 1,
        "max_query_terms" : 12
    }
}
--------------------------------------------------

More Like This can also find documents that are "like" a set of
chosen documents. The syntax to specify one or more documents is similar to
the <<docs-multi-get,Multi GET API>>.
If only one document is specified, the query behaves the same as the
<<search-more-like-this,More Like This API>>.

[source,js]
--------------------------------------------------
{
    "more_like_this" : {
        "fields" : ["name.first", "name.last"],
        "like" : [
        {
            "_index" : "test",
            "_type" : "type",
            "_id" : "1"
        },
        {
            "_index" : "test",
            "_type" : "type",
            "_id" : "2"
        },
        "and also some text like this one!"
        ],
        "min_term_freq" : 1,
        "max_query_terms" : 12
    }
}
--------------------------------------------------

<<docs-termvectors-artificial-doc,Artificial documents>> are also supported,
which is useful for specifying one or more documents that are not present in the index.

[source,js]
--------------------------------------------------
{
    "more_like_this" : {
        "fields" : ["name.first", "name.last"],
        "like" : [
        {
            "_index" : "test",
            "_type" : "type",
            "doc" : {
                "name": {
                    "first": "Ben",
                    "last": "Grimm"
                },
                "tweet": "You got no idea what I'd... what I'd give to be invisible."
            }
        },
        {
            "_index" : "test",
            "_type" : "type",
            "_id" : "2"
        }
        ],
        "min_term_freq" : 1,
        "max_query_terms" : 12
    }
}
--------------------------------------------------

`more_like_this` can be shortened to `mlt`.

Under the hood, `more_like_this` simply creates multiple `should` clauses in a `bool` query from
interesting terms extracted from the provided text. The interesting terms are
selected with respect to their tf-idf scores, and the selection is controlled by
`min_term_freq`, `min_doc_freq`, and `max_doc_freq`. The number of interesting
terms is controlled by `max_query_terms`, while the minimum number of clauses
that must be satisfied is controlled by `minimum_should_match`. The terms
are extracted from the text in `like` and analyzed by the analyzer associated
with the field, unless overridden by `analyzer`. Other parameters,
such as `min_word_length`, `max_word_length` or `stop_words`, control which
terms are considered interesting.
In order to give more weight to
more interesting terms, each boolean clause associated with a term can be
boosted by the term's tf-idf score times a boosting factor, `boost_terms`.
When a search for multiple documents is issued, More Like This generates a
`more_like_this` query per document field in `fields`. These `fields` are
specified as a top-level parameter or within each document request.

IMPORTANT: The fields must be indexed and of type `string`. Additionally, when
using `like` with documents, the fields must either be `stored` or store `term_vector`,
or `_source` must be enabled.

The `more_like_this` top-level parameters include:

[cols="<,<",options="header",]
|=======================================================================
|Parameter |Description
|`fields` |A list of the fields to run the more like this query against.
Defaults to the `_all` field for text and to all possible fields
for documents.

|`like`|coming[2.0]
Can be some text, some documents, or a combination of both, *required*.
A document request follows the same syntax as the
<<docs-multi-get,Multi Get API>> or <<docs-multi-termvectors,Multi Term Vectors API>>.
In this case, the text is fetched from `fields` unless specified otherwise in each document request.
The text is analyzed by the default analyzer at the field, unless overridden by the
`per_field_analyzer` parameter of the <<docs-termvectors-per-field-analyzer,Term Vectors API>>.

|`like_text` |deprecated[2.0,Replaced by `like`]
The text to find similar documents for, *required* if `ids` or `docs` are
not specified.

|`ids` or `docs` |deprecated[2.0,Replaced by `like`]
A list of documents following the same syntax as the
<<docs-multi-get,Multi GET API>> or <<docs-multi-termvectors,Multi Term Vectors API>>.
The text is fetched from `fields` unless specified otherwise in each `doc`.
The text is analyzed by the default analyzer at the field, unless overridden by the
`per_field_analyzer` parameter of the <<docs-termvectors-per-field-analyzer,Term Vectors
API>>.

|`ignore_like`|coming[2.0] The `ignore_like` parameter is used to skip terms
from the documents specified by `like`. For example, we could ask for
documents `like: "Apple"`, but `ignore_like: "cake crumble tree"`. Follows the
same syntax as `like`.

|`include` |When using `like` with document requests, specifies whether the input documents
should be included in the search results. Defaults to `false`.

|`minimum_should_match`|The number of terms from the generated query that
must match, following the <<query-dsl-minimum-should-match,minimum should
match syntax>>. Defaults to `"30%"`.

|`min_term_freq` |The frequency below which terms will be ignored in the
source doc. Defaults to `2`.

|`max_query_terms` |The maximum number of query terms that will be
included in any generated query. Defaults to `25`.

|`stop_words` |An array of stop words. Any word in this set is
considered "uninteresting" and ignored. Even if your analyzer allows
stop words, you might want to tell the MoreLikeThis code to ignore them,
as for the purposes of document similarity it seems reasonable to assume
that "a stop word is never interesting".

|`min_doc_freq` |The minimum document frequency. Words that do
not occur in at least this many docs will be ignored. Defaults to `5`.

|`max_doc_freq` |The maximum document frequency at which words may still appear.
Words that appear in more than this many docs will be ignored. Defaults
to unbounded.

|`min_word_length` |The minimum word length below which words will be
ignored. Defaults to `0`. (Old name "min_word_len" is deprecated)

|`max_word_length` |The maximum word length above which words will be
ignored. Defaults to unbounded (`0`). (Old name "max_word_len" is deprecated)

|`boost_terms` |Sets the boost factor to use when boosting terms.
Defaults to deactivated (`0`). Any other value activates boosting with the given
boost factor.

|`boost` |Sets the boost value of the query.
Defaults to `1.0`.

|`analyzer` |The analyzer that will be used to analyze the text in `like`.
Defaults to the analyzer associated with the first field in `fields`.
|=======================================================================
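
To make the term selection described under "Under the hood" more concrete, here is a minimal JavaScript sketch. It is not the actual Lucene or Elasticsearch implementation: the function names `selectInterestingTerms` and `toBoolQuery`, the input shapes (`termFreqs`, `docFreqs`, `numDocs`), and the idf formula are assumptions made for illustration. Only the parameter names and their defaults are taken from the table above.

```javascript
// Illustrative sketch only: the real selection happens inside Lucene's
// MoreLikeThis code. Function names, input shapes and the idf formula
// are assumptions made for this example.

// termFreqs: { term: frequency of the term in the "like" text }
// docFreqs:  { term: number of indexed documents containing the term }
// numDocs:   total number of documents in the index
function selectInterestingTerms(termFreqs, docFreqs, numDocs, options = {}) {
  const {
    minTermFreq = 2,        // min_term_freq
    minDocFreq = 5,         // min_doc_freq
    maxDocFreq = Infinity,  // max_doc_freq (unbounded by default)
    minWordLength = 0,      // min_word_length
    maxWordLength = 0,      // max_word_length (0 means unbounded)
    stopWords = [],         // stop_words
    maxQueryTerms = 25,     // max_query_terms
  } = options;

  const stop = new Set(stopWords);
  const scored = [];
  for (const [term, tf] of Object.entries(termFreqs)) {
    const df = docFreqs[term] || 0;
    if (stop.has(term)) continue;
    if (tf < minTermFreq) continue;
    if (df < minDocFreq || df > maxDocFreq) continue;
    if (term.length < minWordLength) continue;
    if (maxWordLength > 0 && term.length > maxWordLength) continue;
    // Assumed tf-idf variant; the production formula may differ.
    const idf = Math.log(numDocs / (df + 1)) + 1;
    scored.push({ term, score: tf * idf });
  }
  // Highest-scoring terms first, capped at maxQueryTerms.
  scored.sort((a, b) => b.score - a.score);
  return scored.slice(0, maxQueryTerms);
}

// Wraps the selected terms in a bool query with one should clause per term.
// When boostTerms is non-zero, each clause is boosted by score * boostTerms.
function toBoolQuery(field, terms, { minimumShouldMatch = "30%", boostTerms = 0 } = {}) {
  const should = terms.map(({ term, score }) => {
    const clause = { value: term };
    if (boostTerms !== 0) clause.boost = score * boostTerms;
    return { term: { [field]: clause } };
  });
  return { bool: { should, minimum_should_match: minimumShouldMatch } };
}
```

In this sketch, a term is dropped as soon as it fails any one filter, and `max_query_terms` is applied only after scoring, so the cap keeps the highest-scoring survivors rather than the first ones encountered.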
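
The `like` parameter accepts free text, document references, and artificial documents in the same array. As a small sketch of how a client might assemble such a request body, the helper `buildMoreLikeThis` below is hypothetical (it is not part of any Elasticsearch client library) and simply produces a plain object in the shape used by the JSON examples on this page.

```javascript
// Hypothetical helper, not part of any Elasticsearch client library.
// It only assembles a plain object matching the query syntax shown above.
function buildMoreLikeThis(fields, like, params = {}) {
  if (like === undefined || (Array.isArray(like) && like.length === 0)) {
    throw new Error("`like` is required");
  }
  return { more_like_this: { fields, like, ...params } };
}

const body = buildMoreLikeThis(
  ["name.first", "name.last"],
  [
    // A document that already exists in the index.
    { _index: "test", _type: "type", _id: "1" },
    // An artificial document that is not present in the index.
    { _index: "test", _type: "type", doc: { name: { first: "Ben", last: "Grimm" } } },
    // Free text.
    "and also some text like this one!",
  ],
  { min_term_freq: 1, max_query_terms: 12 }
);

// Serialize the object as the JSON request body.
console.log(JSON.stringify(body, null, 4));
```

Keeping the three kinds of input in one array mirrors the query syntax directly, so the helper needs no special cases beyond checking that `like` is present.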