Translations:FACTS About Building Retrieval Augmented Generation-based Chatbots/46/zh: Difference between revisions

    From Marovi AI
    Latest revision as of 08:52, 19 February 2025

    Message definition (FACTS About Building Retrieval Augmented Generation-based Chatbots)
    '''Bigger Vs. Smaller Models''': Compared to larger, commercial LLMs, smaller open-source LLMs are increasingly becoming viable for many use cases, offering companies cost-effective alternatives. As open-source models catch up with larger, commercial models, they increasingly deliver comparable accuracy, as demonstrated in our NVHelp bot empirical evaluation in Figure [[#S3.F3|3]], and generally have better latency than larger models. Additionally, GPU optimization of inference models can further speed up processing times. Open-source models optimized with NVIDIA’s TensorRT-LLM inference libraries, for instance, have shown faster performance than non-optimized models. These strategies help balance cost-efficiency with maintaining high performance and security standards.

    '''大模型与小模型''':与大型商业LLM相比,小型开源LLM在许多用例中正变得越来越可行,从而为公司提供了具有成本效益的替代方案。随着开源模型逐渐赶上大型商业模型,它们的准确性已越来越接近,如图[[#S3.F3|3]]中我们对NVHelp机器人的实证评估所示,并且其延迟性能通常优于大型模型。此外,对推理模型进行GPU优化可以进一步加快处理速度。例如,使用NVIDIA的TensorRT-LLM推理库优化的开源模型,其性能比未优化的模型更快。这些策略有助于在保持高性能和高安全标准的同时兼顾成本效益。