Translations:FACTS About Building Retrieval Augmented Generation-based Chatbots/4/zh

    Message definition (FACTS About Building Retrieval Augmented Generation-based Chatbots)
    Enterprise chatbots, powered by generative AI, are rapidly emerging as the most explored initial applications of this technology in industry, aimed at enhancing employee productivity. Retrieval Augmented Generation (RAG), Large Language Models (LLMs), and LLM orchestration frameworks such as Langchain and Llamaindex serve as key technological components in building generative AI-based chatbots. However, building successful enterprise chatbots is not easy: they require meticulous engineering of RAG pipelines. This includes fine-tuning semantic embeddings and LLMs, extracting relevant documents from vector databases, rephrasing queries, reranking results, designing effective prompts, honoring document access controls, providing concise responses, including pertinent references, safeguarding personal information, and building agents to orchestrate all these activities. In this paper, we present a framework for building effective RAG-based chatbots based on our first-hand experience building three chatbots at NVIDIA: one each for IT and HR benefits, company financial earnings, and general enterprise content. Our contributions in this paper are threefold. First, we introduce our FACTS framework for building enterprise-grade RAG-based chatbots that address the challenges mentioned above. The FACTS mnemonic refers to the five dimensions that RAG-based chatbots must get right, namely content freshness (F), architectures (A), cost economics of LLMs (C), testing (T), and security (S). Second, we present fifteen control points of RAG pipelines and techniques for optimizing chatbot performance at each stage. Finally, we present empirical results from our enterprise data on the accuracy-latency tradeoffs between large and small LLMs. To the best of our knowledge, this is the first paper of its kind to provide a holistic view of the factors involved in building secure enterprise-grade chatbots, as well as solutions for addressing them.

    企业聊天机器人,由生成式人工智能驱动,正在迅速成为该技术在行业中最受关注的初始应用,旨在提高员工生产力。检索增强生成(RAG)、大型语言模型(LLM)、Langchain/Llamaindex类型的LLM编排框架是构建生成式AI聊天机器人的关键技术组件。然而,构建成功的企业聊天机器人并不容易。它们需要对RAG管道进行精细的工程设计。这包括微调语义嵌入和LLM,从向量数据库中提取相关文档,重述查询,重新排序结果,设计有效的提示,遵循文档访问控制,提供简明的响应,包含相关参考,保护个人信息,并构建代理来协调所有这些活动。在本文中,我们基于在NVIDIA构建三个聊天机器人的第一手经验,提出了一个构建有效RAG聊天机器人的框架:用于IT和HR福利、公司财务收益和一般企业内容的聊天机器人。我们的贡献有三方面。首先,我们介绍了我们的FACTS框架,用于构建企业级RAG聊天机器人,以解决上述挑战。FACTS助记符指的是RAG聊天机器人必须正确处理的五个维度——即内容新鲜度(F)、架构(A)、LLM的成本经济性(C)、测试(T)和安全性(S)。其次,我们提出了RAG管道的十五个控制点以及在每个阶段优化聊天机器人性能的技术。最后,我们展示了来自企业数据的实证结果,关于大型LLM与小型LLM之间的准确性-延迟权衡。据我们所知,这是第一篇提供全面视角的论文,涵盖了构建安全企业级聊天机器人的因素及解决方案。
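    The abstract above enumerates the main control points of a RAG pipeline: query rephrasing, retrieval from a vector database, access control, reranking, and prompting an LLM for a concise, referenced answer. As a rough illustration of how those stages compose, the following is a minimal, framework-agnostic Python sketch; the embed, vector_store, rerank, and llm components are hypothetical stand-ins (not APIs from the paper or from Langchain/Llamaindex), and the flow is an assumed outline rather than the pipeline the authors describe.

        # Minimal sketch of a RAG answer flow. `embed`, `vector_store`, `rerank`,
        # and `llm` are hypothetical stand-ins for whatever embedding model,
        # vector database, reranker, and LLM a deployment actually uses.
        from dataclasses import dataclass

        @dataclass
        class Doc:
            text: str
            source: str            # reference to cite in the answer
            acl_groups: set[str]   # groups allowed to read this document

        def answer(query: str, user_groups: set[str], embed, vector_store, rerank, llm) -> str:
            # 1. Query rephrasing: rewrite the user question into a standalone search query.
            rephrased = llm(f"Rewrite as a standalone search query: {query}")

            # 2. Retrieval: embed the rephrased query and fetch candidate documents.
            candidates: list[Doc] = vector_store.search(embed(rephrased), top_k=20)

            # 3. Access control: honor document ACLs before anything reaches the prompt.
            permitted = [d for d in candidates if d.acl_groups & user_groups]

            # 4. Reranking: keep only the most relevant passages to control cost and latency.
            context = rerank(rephrased, permitted)[:5]

            # 5. Prompting: ask for a concise, grounded answer that cites its sources.
            prompt = (
                "Answer concisely using only the context below and cite sources.\n\n"
                + "\n\n".join(f"[{d.source}] {d.text}" for d in context)
                + f"\n\nQuestion: {query}"
            )
            return llm(prompt)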