Search Pipelines#

The Aryn Conversational Search Stack uses OpenSearch Search Pipelines to orchestrate interactions with LLMs using retrieval-augmented generation (RAG). Search Pipelines are an OpenSearch feature that lets you add pre- and post-processing steps to the search path, enabling server-side computation on search results. A search pipeline is made up of search ‘processors’, which represent individual processing steps in the pipeline.
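For example, a minimal search pipeline might contain a single request processor. The sketch below is illustrative (the pipeline name and the visibility field are placeholders); it uses the filter_query request processor to add a filter clause to every incoming query:

PUT /_search/pipeline/my_pipeline
{
  "description": "A minimal pipeline with a single request processor",
  "request_processors": [
    {
      "filter_query": {
        "tag": "example",
        "description": "Only return publicly visible documents",
        "query": {
          "term": {
            "visibility": "public"
          }
        }
      }
    }
  ]
}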

The Aryn stack uses both the RAG Search Processor and the Hybrid Search Processor in a single Search Pipeline per query. We describe each processor below and conclude with how to use them together. However, the pipeline is composable, and you can choose to use different search techniques with the RAG Search Processor.

Retrieval Augmented Generation#

Aryn added the Retrieval Augmented Generation Search Processor to OpenSearch v2.10 for conversational search. This is a response processor, meaning that it executes after the OpenSearch search process.

[Diagram: flow of the RAG Search Processor]

The diagram above shows the flow of the RAG Search Processor.

  1. The results from a hybrid search query are retrieved as the search context.

  2. The previous interactions from conversational memory are retrieved as the conversation context.

  3. The processor constructs a prompt for the LLM from the search context, conversation context, and a prompt template. It sends this prompt to the LLM and gets a response.

  4. The response, along with the original question and additional metadata, is saved in conversational memory as an interaction.

  5. The generative response and the list of hybrid search results are returned to the application.

If a conversation ID wasn’t supplied (see here), then the processor will not retrieve the conversation context or add an interaction to conversational memory.

To create a RAG pipeline, you must first have a remote LLM wrapper deployed in ml-commons.
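In outline, this means creating an ml-commons connector to the LLM provider and then registering and deploying a remote model that uses it. The following is a hedged sketch of the register and deploy steps (the connector must be created first with the ml-commons connector APIs, and the connector ID shown here is a placeholder):

POST /_plugins/_ml/models/_register
{
  "name": "openai-gpt-3.5-turbo",
  "function_name": "remote",
  "description": "Remote model wrapping the OpenAI chat completions API",
  "connector_id": "<connector_id>"
}

POST /_plugins/_ml/models/<remote_model_id>/_deploy

The model ID produced by registration is the <remote_model_id> used below. Then, for example, to create a RAG pipeline called rag_pipeline using OpenAI GPT-3.5-Turbo: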

PUT /_search/pipeline/rag_pipeline
{
  "description": "Retrieval Augmented Generation Pipeline",
  "response_processors": [
    {
      "retrieval_augmented_generation": {
        "tag": "openai_pipeline_demo",
        "model_id": "<remote_model_id>",
        "context_field_list": [
          "text_representation"
        ],
        "llm_model": "gpt-3.5-turbo"
      }
    }
  ]
}
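You can reference this pipeline per query with the search_pipeline query parameter (as in the example below). If you don't want to pass the pipeline name on every request, you can optionally make it the default search pipeline for an index. A brief sketch, assuming an index named <index_name>:

PUT /<index_name>/_settings
{
  "index.search.default_pipeline": "rag_pipeline"
}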

To use this processor, add the generative_qa_parameters block to your OpenSearch query and reference the pipeline:

GET <index_name>/_search?search_pipeline=rag_pipeline
{
  "query": {
    "neural": {
      "embedding": {
        "query_text": "Who wrote the book of love",
        "model_id": "<embedding model id>",
        "k": 100
      }
    }
  },
  "ext": {
    "generative_qa_parameters": {
      "llm_question": "Who wrote the book of love?"
    }
  }
}

The resulting LLM answer is returned in response.ext (under retrieval_augmented_generation).
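If you have created a conversation in conversational memory, you can make the interaction conversational by passing its ID in generative_qa_parameters. A hedged sketch (the conversation ID is a placeholder, and the parameter name follows the conversational memory feature added in OpenSearch 2.10):

GET <index_name>/_search?search_pipeline=rag_pipeline
{
  "query": {
    "neural": {
      "embedding": {
        "query_text": "Who wrote the book of love",
        "model_id": "<embedding model id>",
        "k": 100
      }
    }
  },
  "ext": {
    "generative_qa_parameters": {
      "llm_question": "Who wrote the book of love?",
      "conversation_id": "<conversation id>"
    }
  }
}

With a conversation ID present, the processor also retrieves prior interactions as conversation context and records the new question and answer as an interaction.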

Creating the full Search Pipeline#

For the Aryn Stack, we generally use a Search Pipeline that combines Hybrid Search and RAG. The quality of the generated answer depends on the quality of the search results passed to the LLM, and Hybrid Search often produces the best results. To create this Search Pipeline, create a pipeline with both search processors:

PUT /_search/pipeline/rag_hybrid_pipeline
{
  "description": "Hybrid RAG Pipeline",
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": {
          "technique": "min_max"
        },
        "combination": {
          "technique": "arithmetic_mean",
          "parameters": {
            "weights": [0.111, 0.889]
          }
        }
      }
    }
  ],
  "response_processors": [
    {
      "retrieval_augmented_generation": {
        "tag": "openai_pipeline_demo",
        "model_id": "<remote_model_id>",
        "context_field_list": [
          "text_representation"
        ],
        "llm_model": "gpt-3.5-turbo"
      }
    }
  ]
}

Querying then looks like the union of the two queries: the hybrid query provides the search results, and the generative_qa_parameters extension drives the RAG processor:

GET <index-name>/_search?search_pipeline=rag_hybrid_pipeline
{
  "query": {
    "hybrid": {
      "queries": [
        {
          "match": {
            "text_representation": "Who wrote the book of love?"
          }
        },
        {
          "neural": {
            "embedding": {
              "query_text": "Who wrote the book of love?",
              "model_id": "<embedding model id>",
              "k": 100
            }
          }
        }
      ]
    }
  },
  "ext": {
    "generative_qa_parameters": {
      "llm_question": "Who wrote the book of love?"
    }
  }
}