////
[source,console]
----
DELETE _ingest/pipeline/my-text-embeddings-pipeline
----
// TEST
// TEARDOWN
////
// tag::elser[]
Create an ingest pipeline that uses the ELSER model:

[source,console]
----
PUT _ingest/pipeline/my-text-embeddings-pipeline
{
  "description": "Text embedding pipeline",
  "processors": [
    {
      "inference": {
        "model_id": ".elser_model_2",
        "target_field": "my_embeddings",
        "field_map": { <1>
          "my_text_field": "text_field"
        },
        "inference_config": {
          "text_expansion": { <2>
            "results_field": "tokens"
          }
        }
      }
    }
  ]
}
----
<1> The `field_map` object maps the input document field name (which is
`my_text_field` in this example) to the name of the field that the model expects
(which is always `text_field`).
<2> The `text_expansion` inference type must be used in the inference ingest
processor.

To ingest data through the pipeline and generate tokens with ELSER, refer to the
<<reindexing-data-elser>> section of the tutorial. After you have successfully
ingested documents by using the pipeline, your index contains the tokens
generated by ELSER.
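Before reindexing, you can check the pipeline's behavior with the simulate
pipeline API, which runs documents through the pipeline without indexing them.
This request is a sketch; the sample text is illustrative, and it assumes the
ELSER model is already deployed:

[source,console]
----
POST _ingest/pipeline/my-text-embeddings-pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "my_text_field": "How is the weather in Jamaica?"
      }
    }
  ]
}
----

The response contains the simulated documents with the generated tokens under
the `my_embeddings` target field.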
// end::elser[]
// tag::dense-vector[]
Create an ingest pipeline that uses a text embedding model:

[source,console]
----
PUT _ingest/pipeline/my-text-embeddings-pipeline
{
  "description": "Text embedding pipeline",
  "processors": [
    {
      "inference": {
        "model_id": "sentence-transformers__msmarco-minilm-l-12-v3", <1>
        "target_field": "my_embeddings",
        "field_map": { <2>
          "my_text_field": "text_field"
        }
      }
    }
  ]
}
----
<1> The model ID of the text embedding model you want to use.
<2> The `field_map` object maps the input document field name (which is
`my_text_field` in this example) to the name of the field that the model expects
(which is always `text_field`).

To ingest data through the pipeline and generate text embeddings with your
chosen model, refer to the
{ml-docs}/ml-nlp-text-emb-vector-search-example.html#ex-text-emb-ingest[Add the text embedding model to an inference ingest pipeline]
section. The example shows how to create the pipeline with the inference
processor and reindex your data through the pipeline. After you have
successfully ingested documents by using the pipeline, your index contains the
text embeddings generated by the model.
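Besides reindexing, individual documents can also be routed through the
pipeline with the `pipeline` query parameter of the index API. This is a
sketch: `my-index` and the sample text are placeholders, and the index is
assumed to have a mapping suitable for the embeddings written to
`my_embeddings`:

[source,console]
----
POST my-index/_doc?pipeline=my-text-embeddings-pipeline
{
  "my_text_field": "How is the weather in Jamaica?"
}
----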
// end::dense-vector[]