Translations:FACTS About Building Retrieval Augmented Generation-based Chatbots/68/ja: Difference between revisions

Latest revision as of 07:13, 20 February 2025

Message definition (FACTS About Building Retrieval Augmented Generation-based Chatbots)

ChipNemo ([[#bib.bib10|10]]) presents evidence that a domain-adapted language model improves RAG performance on domain-specific questions. The authors fine-tuned the e5-small-unsupervised model on 3,000 auto-generated domain-specific samples. We tried fine-tuning the e5-large embedding model in Scout Bot, but our results did not show significant improvements. We are presently collecting high-quality human-annotated data to repeat the experiments; this could be an important direction for our future work. Another interesting technique, presented by Setty ''et al.'' ([[#bib.bib15|15]]), improves RAG performance using Hypothetical Document Embeddings (HyDE). HyDE uses an LLM to generate a hypothetical document in response to a query and then runs the similarity search with both the original question and the hypothetical answer. This is a promising approach, but it might make the architecture more complex.
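
The HyDE flow described above can be illustrated with a minimal sketch. It is not the implementation from Setty ''et al.'' or from Scout Bot. The function names (`embed`, `fake_llm`, `hyde_search`), the toy bag-of-words embedding, the canned LLM response, and the equal weighting of the two similarity scores are all assumptions made for illustration; a real system would call an embedding model (for example an e5 variant) and an actual LLM.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fake_llm(query):
    """Hypothetical stand-in for an LLM call; returns a canned answer."""
    return ("To reset your VPN password, open the IT self-service portal "
            "and choose reset credentials.")


def hyde_search(query, corpus, generate=fake_llm, top_k=1):
    """Rank documents using both the query and a generated hypothetical answer."""
    hypothetical = generate(query)  # step 1: the LLM drafts a hypothetical document
    q_vec, h_vec = embed(query), embed(hypothetical)
    scored = []
    for doc in corpus:
        d_vec = embed(doc)
        # step 2: score against the original question AND the hypothetical answer
        # (equal weights are an assumption of this sketch)
        score = 0.5 * cosine(q_vec, d_vec) + 0.5 * cosine(h_vec, d_vec)
        scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


corpus = [
    "Cafeteria menu for this week includes pasta and salad.",
    "Use the IT self-service portal to reset credentials for VPN access.",
    "Quarterly earnings were announced on Tuesday.",
]

print(hyde_search("How do I change my VPN login?", corpus))
```

The query shares only the term "vpn" with the relevant document, while the hypothetical answer shares several, which is the vocabulary gap HyDE is meant to bridge. The extra LLM call per query is also where the added architectural complexity and latency come from.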