
// tag::cohere[]
[source,console]
--------------------------------------------------
PUT cohere-embeddings
{
  "mappings": {
    "properties": {
      "content_embedding": { <1>
        "type": "dense_vector", <2>
        "dims": 1024, <3>
        "element_type": "byte"
      },
      "content": { <4>
        "type": "text" <5>
      }
    }
  }
}
--------------------------------------------------
<1> The name of the field to contain the generated embeddings. It must be
referenced in the {infer} pipeline configuration in the next step.
<2> The field to contain the embeddings is a `dense_vector` field.
<3> The output dimensions of the model. Find this value in the
https://docs.cohere.com/reference/embed[Cohere documentation] of the model you
use.
<4> The name of the field from which to create the dense vector representation.
In this example, the name of the field is `content`. It must be referenced in
the {infer} pipeline configuration in the next step.
<5> The field type, which is `text` in this example.
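The `byte` element type stores each dimension as a single signed 8-bit integer
instead of a four-byte float, cutting vector storage to roughly a quarter. As a
rough illustration of the quantization idea behind this (a hypothetical
client-side sketch only; the `quantize_to_int8` helper is made up here, and
Cohere models can return `int8` embeddings directly, so you would not normally
do this yourself):

```python
# Hypothetical sketch: scaling a float embedding into the signed 8-bit
# range [-127, 127] so each dimension fits in one byte instead of four.
# This is an illustration of the space saving, not Cohere's actual algorithm.

def quantize_to_int8(vector):
    """Scale a float vector into the signed 8-bit range [-127, 127]."""
    peak = max(abs(v) for v in vector) or 1.0  # guard against all-zero input
    return [round(v / peak * 127) for v in vector]

embedding = [0.12, -0.98, 0.45, 0.0]
print(quantize_to_int8(embedding))  # every value fits in a single byte
```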
// end::cohere[]
// tag::hugging-face[]
[source,console]
--------------------------------------------------
PUT hugging-face-embeddings
{
  "mappings": {
    "properties": {
      "content_embedding": { <1>
        "type": "dense_vector", <2>
        "dims": 768, <3>
        "element_type": "float"
      },
      "content": { <4>
        "type": "text" <5>
      }
    }
  }
}
--------------------------------------------------
<1> The name of the field to contain the generated embeddings. It must be
referenced in the {infer} pipeline configuration in the next step.
<2> The field to contain the embeddings is a `dense_vector` field.
<3> The output dimensions of the model. Find this value in the
https://huggingface.co/sentence-transformers/all-mpnet-base-v2[HuggingFace model documentation].
<4> The name of the field from which to create the dense vector representation.
In this example, the name of the field is `content`. It must be referenced in
the {infer} pipeline configuration in the next step.
<5> The field type, which is `text` in this example.
// end::hugging-face[]
// tag::openai[]
[source,console]
--------------------------------------------------
PUT openai-embeddings
{
  "mappings": {
    "properties": {
      "content_embedding": { <1>
        "type": "dense_vector", <2>
        "dims": 1536, <3>
        "element_type": "float",
        "similarity": "dot_product" <4>
      },
      "content": { <5>
        "type": "text" <6>
      }
    }
  }
}
--------------------------------------------------
<1> The name of the field to contain the generated embeddings. It must be
referenced in the {infer} pipeline configuration in the next step.
<2> The field to contain the embeddings is a `dense_vector` field.
<3> The output dimensions of the model. Find this value in the
https://platform.openai.com/docs/guides/embeddings/embedding-models[OpenAI documentation]
of the model you use.
<4> The faster `dot_product` function can be used to calculate similarity
because OpenAI embeddings are normalized to unit length. Check the
https://platform.openai.com/docs/guides/embeddings/which-distance-function-should-i-use[OpenAI docs]
on which similarity function to use.
<5> The name of the field from which to create the dense vector representation.
In this example, the name of the field is `content`. It must be referenced in
the {infer} pipeline configuration in the next step.
<6> The field type, which is `text` in this example.
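Callout <4> relies on the fact that, for vectors normalized to unit length, the
dot product equals the cosine similarity, so the cheaper `dot_product` function
gives the same scores without the extra normalization work. A small Python
sketch of that equivalence (toy two-dimensional vectors, not real embeddings):

```python
# Sketch: for unit-length vectors, dot product == cosine similarity,
# because cosine divides the dot product by the norms, which are both 1.
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

a = normalize([3.0, 4.0])
b = normalize([1.0, 2.0])
print(math.isclose(dot(a, b), cosine(a, b)))  # identical for unit vectors
```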
// end::openai[]
// tag::azure-openai[]
[source,console]
--------------------------------------------------
PUT azure-openai-embeddings
{
  "mappings": {
    "properties": {
      "content_embedding": { <1>
        "type": "dense_vector", <2>
        "dims": 1536, <3>
        "element_type": "float",
        "similarity": "dot_product" <4>
      },
      "content": { <5>
        "type": "text" <6>
      }
    }
  }
}
--------------------------------------------------
<1> The name of the field to contain the generated embeddings. It must be
referenced in the {infer} pipeline configuration in the next step.
<2> The field to contain the embeddings is a `dense_vector` field.
<3> The output dimensions of the model. Find this value in the
https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models#embeddings-models[Azure OpenAI documentation]
of the model you use.
<4> For Azure OpenAI embeddings, the `dot_product` function should be used to
calculate similarity because Azure OpenAI embeddings are normalized to unit
length. See the
https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/understand-embeddings[Azure OpenAI embeddings]
documentation for more information on the model specifications.
<5> The name of the field from which to create the dense vector representation.
In this example, the name of the field is `content`. It must be referenced in
the {infer} pipeline configuration in the next step.
<6> The field type, which is `text` in this example.
// end::azure-openai[]