Translations:FACTS About Building Retrieval Augmented Generation-based Chatbots/29/zh: Difference between revisions

    From Marovi AI
    Latest revision as of 08:52, 19 February 2025

    '''To Fine-tune LLMs or not?''' A key decision is whether to fine-tune LLMs, balancing the use of foundational models against domain-specific customization. One size does not fit all when it comes to LLMs: some use cases work well with foundational models, while others require customization. When customizing, several options are available, including prompt engineering, P-tuning, parameter-efficient fine-tuning (PEFT), and full fine-tuning (FT). Fine-tuning requires significant investment in data labeling, training, and evaluation, each of which can be time-consuming and costly. Automating the testing and quality-evaluation processes becomes critical to ensuring efficiency and accuracy when customizing LLMs. Figure [[#S3.F3|3]] shows the accuracy-versus-latency tradeoff evaluations we performed comparing OpenAI's GPT-4 model with several open-source models on about 245 queries from the NVHelp bot domain. Our results show that the Llama3-70B model excels in several aspects of answer quality while maintaining acceptable latency.

    Should large language models (LLMs) be fine-tuned or not? This is a key decision that involves balancing the use of foundational models against domain-specific customization. There is no one-size-fits-all solution for LLMs. Some use cases may work well with foundational models, while others require customization. When considering customization, several options are available, including prompt engineering, P-tuning, parameter-efficient fine-tuning (PEFT), and full fine-tuning (FT). Fine-tuning requires substantial investment in data labeling, training, and evaluation, all of which can be time-consuming and costly. Automating the testing and quality-evaluation processes is critical to ensuring efficiency and accuracy when customizing LLMs. Figure 3 shows the accuracy-versus-latency tradeoff evaluations we performed on about 245 queries from the NVHelp bot domain, comparing OpenAI's GPT-4 model with several open-source models. Our results show that the Llama3-70B model delivers excellent answer quality in several aspects while maintaining acceptable latency.
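The accuracy-versus-latency evaluation described above can be sketched as a small harness that times each model call and scores its answer against a reference. This is a minimal, hedged sketch: `toy_model`, `evaluate`, and the exact-match scoring rule are illustrative assumptions, not the paper's actual pipeline (which evaluates GPT-4 and open-source models on ~245 NVHelp queries, presumably with richer quality metrics).

```python
import time
from statistics import mean

def evaluate(model_fn, eval_set):
    """Score a model on (query, reference) pairs while tracking latency.

    model_fn: callable taking a query string and returning an answer string.
    eval_set: list of (query, reference_answer) tuples.
    Returns (accuracy, mean_latency_seconds).
    Exact-match scoring is a deliberate simplification; real RAG
    pipelines typically use LLM-as-judge or semantic-similarity metrics.
    """
    correct, latencies = 0, []
    for query, reference in eval_set:
        start = time.perf_counter()
        answer = model_fn(query)          # time only the model call
        latencies.append(time.perf_counter() - start)
        if answer.strip().lower() == reference.strip().lower():
            correct += 1
    return correct / len(eval_set), mean(latencies)

# Toy stand-in model for demonstration; a real harness would call a
# hosted foundational or fine-tuned LLM endpoint here.
def toy_model(query):
    return {"what is peft?": "parameter-efficient fine-tuning"}.get(
        query.lower(), "i don't know")

eval_set = [
    ("What is PEFT?", "Parameter-efficient fine-tuning"),
    ("What is FT?", "Full fine-tuning"),
]
accuracy, latency = evaluate(toy_model, eval_set)
print(f"accuracy={accuracy:.2f}, mean latency={latency * 1000:.3f} ms")
```

Running the same `evaluate` loop over several candidate models (foundational versus PEFT- or FT-customized) yields the per-model accuracy/latency points from which a tradeoff plot like Figure 3 can be drawn.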