
// tag::cohere[]
[source,console]
--------------------------------------------------
PUT cohere-embeddings
{
  "mappings": {
    "properties": {
      "content_embedding": { <1>
        "type": "dense_vector", <2>
        "dims": 1024, <3>
        "element_type": "byte"
      },
      "content": { <4>
        "type": "text" <5>
      }
    }
  }
}
--------------------------------------------------
<1> The name of the field to contain the generated embeddings. It must be
referenced in the {infer} pipeline configuration in the next step.
<2> The field to contain the embeddings is a `dense_vector` field.
<3> The output dimensions of the model. Find this value in the
https://docs.cohere.com/reference/embed[Cohere documentation] of the model you
use.
<4> The name of the field from which to create the dense vector representation.
In this example, the name of the field is `content`. It must be referenced in
the {infer} pipeline configuration in the next step.
<5> The field type, which is `text` in this example.
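The `byte` element type stores each dimension as a single signed 8-bit integer
instead of a four-byte float, cutting vector storage to roughly a quarter. As a
rough illustration of the quantization idea behind this (a hypothetical
client-side sketch only; the `quantize_to_int8` helper is made up here, and
Cohere models can return `int8` embeddings directly, so you would not normally
do this yourself):

```python
# Hypothetical sketch: scaling a float embedding into the signed 8-bit
# range [-127, 127] so each dimension fits in one byte instead of four.
# This is an illustration of the space saving, not Cohere's actual algorithm.

def quantize_to_int8(vector):
    """Scale a float vector into the signed 8-bit range [-127, 127]."""
    peak = max(abs(v) for v in vector) or 1.0  # guard against all-zero input
    return [round(v / peak * 127) for v in vector]

embedding = [0.12, -0.98, 0.45, 0.0]
print(quantize_to_int8(embedding))  # every value fits in a single byte
```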
// end::cohere[]
// tag::hugging-face[]
[source,console]
--------------------------------------------------
PUT hugging-face-embeddings
{
  "mappings": {
    "properties": {
      "content_embedding": { <1>
        "type": "dense_vector", <2>
        "dims": 768, <3>
        "element_type": "float"
      },
      "content": { <4>
        "type": "text" <5>
      }
    }
  }
}
--------------------------------------------------
<1> The name of the field to contain the generated embeddings. It must be
referenced in the {infer} pipeline configuration in the next step.
<2> The field to contain the embeddings is a `dense_vector` field.
<3> The output dimensions of the model. Find this value in the
https://huggingface.co/sentence-transformers/all-mpnet-base-v2[HuggingFace model documentation].
<4> The name of the field from which to create the dense vector representation.
In this example, the name of the field is `content`. It must be referenced in
the {infer} pipeline configuration in the next step.
<5> The field type, which is `text` in this example.
// end::hugging-face[]
// tag::openai[]
[source,console]
--------------------------------------------------
PUT openai-embeddings
{
  "mappings": {
    "properties": {
      "content_embedding": { <1>
        "type": "dense_vector", <2>
        "dims": 1536, <3>
        "element_type": "float",
        "similarity": "dot_product" <4>
      },
      "content": { <5>
        "type": "text" <6>
      }
    }
  }
}
--------------------------------------------------
<1> The name of the field to contain the generated embeddings. It must be
referenced in the {infer} pipeline configuration in the next step.
<2> The field to contain the embeddings is a `dense_vector` field.
<3> The output dimensions of the model. Find this value in the
https://platform.openai.com/docs/guides/embeddings/embedding-models[OpenAI documentation]
of the model you use.
<4> The faster `dot_product` function can be used to calculate similarity
because OpenAI embeddings are normalized to unit length. Check the
https://platform.openai.com/docs/guides/embeddings/which-distance-function-should-i-use[OpenAI docs]
on which similarity function to use.
<5> The name of the field from which to create the dense vector representation.
In this example, the name of the field is `content`. It must be referenced in
the {infer} pipeline configuration in the next step.
<6> The field type, which is `text` in this example.
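Callout <4> relies on the fact that, for vectors normalized to unit length, the
dot product equals the cosine similarity, so the cheaper `dot_product` function
gives the same scores without the extra normalization work. A small Python
sketch of that equivalence (toy two-dimensional vectors, not real embeddings):

```python
# Sketch: for unit-length vectors, dot product == cosine similarity,
# because cosine divides the dot product by the norms, which are both 1.
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

a = normalize([3.0, 4.0])
b = normalize([1.0, 2.0])
print(math.isclose(dot(a, b), cosine(a, b)))  # identical for unit vectors
```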
// end::openai[]
// tag::azure-openai[]
[source,console]
--------------------------------------------------
PUT azure-openai-embeddings
{
  "mappings": {
    "properties": {
      "content_embedding": { <1>
        "type": "dense_vector", <2>
        "dims": 1536, <3>
        "element_type": "float",
        "similarity": "dot_product" <4>
      },
      "content": { <5>
        "type": "text" <6>
      }
    }
  }
}
--------------------------------------------------
<1> The name of the field to contain the generated embeddings. It must be
referenced in the {infer} pipeline configuration in the next step.
<2> The field to contain the embeddings is a `dense_vector` field.
<3> The output dimensions of the model. Find this value in the
https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models#embeddings-models[Azure OpenAI documentation]
of the model you use.
<4> For Azure OpenAI embeddings, the `dot_product` function should be used to
calculate similarity because Azure OpenAI embeddings are normalized to unit
length. See the
https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/understand-embeddings[Azure OpenAI embeddings]
documentation for more information on the model specifications.
<5> The name of the field from which to create the dense vector representation.
In this example, the name of the field is `content`. It must be referenced in
the {infer} pipeline configuration in the next step.
<6> The field type, which is `text` in this example.
// end::azure-openai[]