| 123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141 |
- // tag::cohere[]
- [source,console]
- --------------------------------------------------
- PUT cohere-embeddings
- {
- "mappings": {
- "properties": {
- "content_embedding": { <1>
- "type": "dense_vector", <2>
- "dims": 1024, <3>
- "element_type": "byte"
- },
- "content": { <4>
- "type": "text" <5>
- }
- }
- }
- }
- --------------------------------------------------
- <1> The name of the field to contain the generated tokens. It must be refrenced
- in the {infer} pipeline configuration in the next step.
- <2> The field to contain the tokens is a `dense_vector` field.
- <3> The output dimensions of the model. Find this value in the
- https://docs.cohere.com/reference/embed[Cohere documentation] of the model you
- use.
- <4> The name of the field from which to create the dense vector representation.
- In this example, the name of the field is `content`. It must be referenced in
- the {infer} pipeline configuration in the next step.
- <5> The field type which is text in this example.
- // end::cohere[]
- // tag::hugging-face[]
- [source,console]
- --------------------------------------------------
- PUT hugging-face-embeddings
- {
- "mappings": {
- "properties": {
- "content_embedding": { <1>
- "type": "dense_vector", <2>
- "dims": 768, <3>
- "element_type": "float"
- },
- "content": { <4>
- "type": "text" <5>
- }
- }
- }
- }
- --------------------------------------------------
- <1> The name of the field to contain the generated tokens. It must be referenced
- in the {infer} pipeline configuration in the next step.
- <2> The field to contain the tokens is a `dense_vector` field.
- <3> The output dimensions of the model. Find this value in the
- https://huggingface.co/sentence-transformers/all-mpnet-base-v2[HuggingFace model documentation].
- <4> The name of the field from which to create the dense vector representation.
- In this example, the name of the field is `content`. It must be referenced in
- the {infer} pipeline configuration in the next step.
- <5> The field type which is text in this example.
- // end::hugging-face[]
- // tag::openai[]
- [source,console]
- --------------------------------------------------
- PUT openai-embeddings
- {
- "mappings": {
- "properties": {
- "content_embedding": { <1>
- "type": "dense_vector", <2>
- "dims": 1536, <3>
- "element_type": "float",
- "similarity": "dot_product" <4>
- },
- "content": { <5>
- "type": "text" <6>
- }
- }
- }
- }
- --------------------------------------------------
- <1> The name of the field to contain the generated tokens. It must be referenced
- in the {infer} pipeline configuration in the next step.
- <2> The field to contain the tokens is a `dense_vector` field.
- <3> The output dimensions of the model. Find this value in the
- https://platform.openai.com/docs/guides/embeddings/embedding-models[OpenAI documentation]
- of the model you use.
- <4> The faster` dot_product` function can be used to calculate similarity
- because OpenAI embeddings are normalised to unit length. You can check the
- https://platform.openai.com/docs/guides/embeddings/which-distance-function-should-i-use[OpenAI docs]
- about which similarity function to use.
- <5> The name of the field from which to create the dense vector representation.
- In this example, the name of the field is `content`. It must be referenced in
- the {infer} pipeline configuration in the next step.
- <6> The field type which is text in this example.
- // end::openai[]
- // tag::azure-openai[]
- [source,console]
- --------------------------------------------------
- PUT azure-openai-embeddings
- {
- "mappings": {
- "properties": {
- "content_embedding": { <1>
- "type": "dense_vector", <2>
- "dims": 1536, <3>
- "element_type": "float",
- "similarity": "dot_product" <4>
- },
- "content": { <5>
- "type": "text" <6>
- }
- }
- }
- }
- --------------------------------------------------
- <1> The name of the field to contain the generated tokens. It must be referenced
- in the {infer} pipeline configuration in the next step.
- <2> The field to contain the tokens is a `dense_vector` field.
- <3> The output dimensions of the model. Find this value in the
- https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models#embeddings-models[Azure OpenAI documentation]
- of the model you use.
- <4> For Azure OpenAI embeddings, the `dot_product` function should be used to
- calculate similarity as Azure OpenAI embeddings are normalised to unit length.
- See the
- https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/understand-embeddings[Azure OpenAI embeddings]
- documentation for more information on the model specifications.
- <5> The name of the field from which to create the dense vector representation.
- In this example, the name of the field is `content`. It must be referenced in
- the {infer} pipeline configuration in the next step.
- <6> The field type which is text in this example.
- // end::azure-openai[]
|