Translations:FACTS About Building Retrieval Augmented Generation-based Chatbots/4/pt: Difference between revisions

Latest revision as of 07:29, 20 February 2025

Message definition (FACTS About Building Retrieval Augmented Generation-based Chatbots)

Enterprise chatbots powered by generative AI are rapidly emerging as the most explored initial applications of this technology in industry, aimed at enhancing employee productivity. Retrieval Augmented Generation (RAG), Large Language Models (LLMs), and Langchain/Llamaindex-style LLM orchestration frameworks serve as the key technological components for building generative-AI-based chatbots. However, building successful enterprise chatbots is not easy: they require meticulous engineering of RAG pipelines. This includes fine-tuning semantic embeddings and LLMs, extracting relevant documents from vector databases, rephrasing queries, reranking results, designing effective prompts, honoring document access controls, providing concise responses, including pertinent references, safeguarding personal information, and building agents to orchestrate all these activities. In this paper, we present a framework for building effective RAG-based chatbots, based on our first-hand experience of building three chatbots at NVIDIA: chatbots for IT and HR benefits, company financial earnings, and general enterprise content. Our contributions are threefold. First, we introduce our FACTS framework for building enterprise-grade RAG-based chatbots that address the challenges mentioned. The FACTS mnemonic refers to the five dimensions that RAG-based chatbots must get right: content freshness (F), architectures (A), cost economics of LLMs (C), testing (T), and security (S). Second, we present fifteen control points of RAG pipelines and techniques for optimizing chatbot performance at each stage. Finally, we present empirical results from our enterprise data on the accuracy-latency tradeoffs between large and small LLMs. To the best of our knowledge, this is the first paper of its kind to provide a holistic view of both the factors and the solutions for building secure enterprise-grade chatbots.

Enterprise chatbots, driven by generative AI, are rapidly emerging as the most explored initial applications of this technology in industry, with the goal of increasing employee productivity. Retrieval Augmented Generation (RAG), Large Language Models (LLMs), and Langchain/Llamaindex-style LLM orchestration frameworks serve as key technological components in the construction of chatbots based on generative AI. However, building successful enterprise chatbots is not easy. They demand meticulous engineering of RAG pipelines. This includes fine-tuning semantic embeddings and LLMs, extracting relevant documents from vector databases, reformulating queries, reranking results, designing effective prompts, respecting document access controls, providing concise responses, including pertinent references, protecting personal information, and building agents to orchestrate all of these activities. In this paper, we present a framework for building effective RAG-based chatbots, drawing on our practical experience building three chatbots at NVIDIA: chatbots for IT and HR benefits, company financial earnings, and general enterprise content. Our contributions in this paper are threefold. First, we introduce our FACTS framework for building RAG-based enterprise chatbots that address the challenges mentioned. The FACTS mnemonic refers to the five dimensions that RAG-based chatbots must get right, namely content freshness (F), architectures (A), cost economics of LLMs (C), testing (T), and security (S). Second, we present fifteen control points of RAG pipelines and techniques for optimizing chatbot performance at each stage. Finally, we present empirical results from our enterprise data on the trade-offs between accuracy and latency for large versus small LLMs. As far as we know, this is the first paper of its kind to provide a holistic view of the factors, as well as the solutions, for building secure enterprise chatbots.
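
As a rough illustration of three of the pipeline stages the abstract names (extracting relevant documents, honoring document access controls, and including references in the prompt), the following is a minimal sketch and not the authors' implementation. Everything in it is an assumption made for illustration: the `Document` type, the `allowed_groups` field, the sample corpus, and the function names are hypothetical, the bag-of-words `embed` function stands in for a learned semantic embedding, and the in-memory list stands in for a vector database.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_groups: frozenset  # hypothetical ACL: groups allowed to read this document


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a learned semantic embedding."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, corpus: list, user_groups: set, top_k: int = 2) -> list:
    """Return up to top_k documents the user may read, ranked by similarity."""
    # Apply access control before scoring so restricted text never reaches the prompt.
    visible = [d for d in corpus if d.allowed_groups & user_groups]
    q = embed(query)
    scored = [(cosine(q, embed(d.text)), d) for d in visible]
    scored = [(s, d) for s, d in scored if s > 0]  # drop documents with no term overlap
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:top_k]]


def build_prompt(query: str, docs: list) -> str:
    """Assemble a prompt that carries document ids so the answer can cite them."""
    context = "\n".join(f"[{d.doc_id}] {d.text}" for d in docs)
    return (
        "Answer using only the context below and cite document ids.\n"
        f"{context}\n"
        f"Question: {query}"
    )


# Hypothetical sample corpus, invented for this sketch.
CORPUS = [
    Document("hr-1", "vacation policy for full time employees", frozenset({"all"})),
    Document("fin-1", "quarterly earnings draft for the finance team", frozenset({"finance"})),
    Document("it-1", "how to reset your vpn password", frozenset({"all"})),
]

if __name__ == "__main__":
    docs = retrieve("reset vpn password", CORPUS, {"all"})
    print(build_prompt("reset vpn password", docs))
```

Filtering on access rights before ranking, rather than after, is one possible design choice: it keeps restricted documents out of every later stage, at the cost of running the filter over the whole candidate set.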