Translations:FACTS About Building Retrieval Augmented Generation-based Chatbots/46/ja: Difference between revisions

Latest revision as of 07:13, 20 February 2025

Message definition (FACTS About Building Retrieval Augmented Generation-based Chatbots)

'''Bigger vs. Smaller Models''': Compared with larger commercial LLMs, smaller open-source LLMs are increasingly viable for many use cases, offering companies cost-effective alternatives. As open-source models catch up with larger commercial models, they offer closely comparable accuracy, as demonstrated in our NVHelp bot empirical evaluation in Figure [[#S3.F3|3]], and generally have lower latency than larger models. Additionally, GPU optimization of inference can further reduce processing times. Open-source models optimized with NVIDIA’s TensorRT-LLM inference library, for instance, have run faster than non-optimized models. These strategies help balance cost-efficiency against the need to maintain high performance and security standards.

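Large vs. small models: Relative to large commercial LLMs, small open-source LLMs are becoming increasingly practical for many use cases and give companies a cost-efficient alternative. Open-source models are catching up with large commercial models, and their accuracy is more and more often nearly equivalent, as shown in the empirical evaluation of our NVHelp bot (Figure 3). They also generally have better latency than large models. In addition, GPU optimization of inference can shorten processing time further. For example, open-source models optimized with NVIDIA's TensorRT-LLM inference library have run faster than non-optimized models. These strategies help reconcile the need for cost-efficiency with maintaining high performance and security standards.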
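
The trade-off described above can be illustrated with a minimal sketch: pick the cheapest model that still meets an accuracy floor and a latency budget. Everything here is an assumption made for illustration. The `ModelProfile` type, the `pick_model` helper, the model names, and all numbers are hypothetical placeholders, not results from the NVHelp evaluation or from any TensorRT-LLM benchmark.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical per-model measurements; fill in from your own evaluation."""
    name: str
    accuracy: float             # fraction of answers judged correct, 0..1
    latency_ms: float           # typical end-to-end response time
    cost_per_1k_queries: float  # any consistent currency unit


def pick_model(
    candidates: Sequence[ModelProfile],
    min_accuracy: float,
    max_latency_ms: float,
) -> Optional[ModelProfile]:
    """Return the cheapest candidate meeting both constraints, or None."""
    eligible = [
        m for m in candidates
        if m.accuracy >= min_accuracy and m.latency_ms <= max_latency_ms
    ]
    if not eligible:
        return None
    # Cheapest first; ties on cost are broken by lower latency.
    return min(eligible, key=lambda m: (m.cost_per_1k_queries, m.latency_ms))


# Placeholder values only, chosen to show the selection logic.
CANDIDATES = [
    ModelProfile("large-commercial", 0.90, 1800.0, 30.0),
    ModelProfile("small-open-source", 0.87, 900.0, 4.0),
    ModelProfile("small-open-source-optimized", 0.87, 500.0, 4.0),
]

if __name__ == "__main__":
    choice = pick_model(CANDIDATES, min_accuracy=0.85, max_latency_ms=1000.0)
    print(choice.name if choice else "no model meets the constraints")
```

With these placeholder figures, raising `min_accuracy` above what the smaller models reach leaves only the larger model eligible, which is the kind of case where the extra cost may be justified.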