Translations:FACTS About Building Retrieval Augmented Generation-based Chatbots/29/ko: Difference between revisions

Latest revision as of 07:19, 20 February 2025

Message definition (FACTS About Building Retrieval Augmented Generation-based Chatbots)

'''To Fine-tune LLMs or not?''' A key decision is whether to fine-tune LLMs, balancing the use of foundational models against domain-specific customization. One size doesn’t fit all when it comes to LLMs: some use cases work well with foundational models, while others require customization. Several customization options are available, including prompt engineering, P-tuning, parameter-efficient fine-tuning (PEFT), and full fine-tuning (FT). Fine-tuning requires significant investment in data labeling, training, and evaluation, each of which can be time-consuming and costly. Automating the testing and quality-evaluation processes therefore becomes critical to ensuring efficiency and accuracy when customizing LLMs. Figure [[#S3.F3|3]] shows our accuracy vs. latency tradeoff evaluation, which compares OpenAI’s GPT-4 model with several open-source models on about 245 queries from the NVHelp bot domain. Our results show that the Llama3-70B model excels in several aspects of answer quality while maintaining acceptable latency.

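To fine-tune LLMs or not? Whether to fine-tune an LLM is an important decision that balances the use of foundational models against domain-specific customization. With LLMs, one size does not fit all. Some use cases work well with foundational models, while others require customization. The available customization options include prompt engineering, P-tuning, parameter-efficient fine-tuning (PEFT), and full fine-tuning (FT). Fine-tuning requires significant investment in data labeling, training, and evaluation, each of which can be time-consuming and costly. When customizing LLMs, automating the testing and quality-evaluation processes is important for ensuring efficiency and accuracy. Figure 3 shows an accuracy vs. latency tradeoff evaluation comparing OpenAI's GPT-4 model with several open-source models on about 245 queries from the NVHelp bot domain. Our results show that the Llama3-70B model excels in several aspects of answer quality while maintaining acceptable latency.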
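
The automated accuracy vs. latency comparison described in this message could be sketched roughly as follows. This is a minimal illustration, not the authors' evaluation pipeline: the `evaluate` helper, the `EvalResult` record, the `exact_match` scorer, and the stub models are all hypothetical names introduced here, and a real setup would replace the stubs with calls to actual model endpoints and use a richer quality metric than exact match.

```python
import time
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Tuple


@dataclass
class EvalResult:
    model: str
    mean_score: float      # average quality score in [0, 1]
    mean_latency_s: float  # average wall-clock seconds per query


def evaluate(
    models: Dict[str, Callable[[str], str]],
    dataset: List[Tuple[str, str]],
    score: Callable[[str, str], float],
) -> List[EvalResult]:
    """Run every (query, reference) pair through every model.

    `models` maps a model name to a function that returns an answer string.
    `score` compares an answer with its reference and returns a float.
    Both are assumptions of this sketch, not part of the source text.
    """
    results = []
    for name, generate in models.items():
        scores, latencies = [], []
        for query, reference in dataset:
            start = time.perf_counter()
            answer = generate(query)
            latencies.append(time.perf_counter() - start)
            scores.append(score(answer, reference))
        results.append(EvalResult(name, mean(scores), mean(latencies)))
    return results


def exact_match(answer: str, reference: str) -> float:
    """Toy scorer: 1.0 on a case-insensitive exact match, else 0.0."""
    return 1.0 if answer.strip().lower() == reference.strip().lower() else 0.0


if __name__ == "__main__":
    # Stand-in data and models; real runs would use domain queries
    # and real model clients instead of these lambdas.
    dataset = [("What is 2 + 2?", "4"), ("Capital of France?", "Paris")]
    stub_models = {
        "stub-a": lambda q: "4" if "2 + 2" in q else "Paris",
        "stub-b": lambda q: "4",
    }
    for r in evaluate(stub_models, dataset, exact_match):
        print(f"{r.model}: score={r.mean_score:.2f}, "
              f"latency={r.mean_latency_s * 1000:.3f} ms")
```

Keeping the scorer and the model callables as plain function arguments means the same loop can be rerun unchanged whenever a model is swapped or fine-tuned, which is the kind of repeatable check the message argues for.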