Overview: This article presents StructuredRAG, a benchmark for evaluating the ability of Large Language Models (LLMs) to generate JSON response formats. The study compares two models, Gemini 1.5 Pro and Llama 3 8B-instruct, across 24 experiments covering different types of JSON output tasks. The results show high success rates on simple output types, with a marked drop on more complex tasks such as list outputs and composite objects. The article also examines the effect of different prompting strategies and the use of the OPRO optimization method to improve the success rate of generating complex JSON structures.
Intended audience: Researchers and developers, particularly those working with large language models and their applications.
Use cases and goals: Projects that need to evaluate and improve the structured-output capability of large language models, and to understand and optimize data exchange within multi-component AI systems.
Additional notes: The article provides detailed experimental results and analysis, and the source code is publicly available for further research and development.
STRUCTUREDRAG: JSON RESPONSE FORMATTING WITH LARGE LANGUAGE MODELS
Connor Shorten, Charles Pierse, Thomas Benjamin Smith, Erika Cardenas, Akanksha Sharma, John Trengrove, Bob van Luijt
Weaviate
ABSTRACT
The ability of Large Language Models (LLMs) to generate structured outputs, such as JSON, is
crucial for their use in Compound AI Systems. However, evaluating and improving this capability
remains challenging. In this work, we introduce StructuredRAG, a benchmark of six tasks designed to
assess LLMs’ proficiency in following response format instructions. We evaluate two state-of-the-art
LLMs, Gemini 1.5 Pro and Llama 3 8B-instruct with 4-bit quantization using two distinct prompting
strategies. We introduce these prompting strategies as f-String and Follow the Format (FF) prompting.
Across 24 experiments, we find an average success rate of 82.55%. We further find a high variance
in performance across tasks, models, and prompting strategies with success rates ranging from 0
to 100%. We find that Llama 3 8B-instruct often performs competitively with Gemini 1.5 Pro.
We observe that task complexity significantly influences performance, with tasks involving lists or
composite object outputs proving more challenging. Our findings highlight the need for further
research into improving the reliability and consistency of structured output generation in LLMs. We
have open-sourced our experimental code and results at github.com/weaviate/structured-rag.
1 Introduction
Large Language Models (LLMs) have become extremely effective at Zero-Shot Learning. Zero-Shot Learning describes a machine learning model's ability to perform a task without any task-specific training data provided in advance.
An emergent area of importance is not only to test LLMs on how well they can perform novel tasks, but also how
well they can structure their output in a particular format. This is a critical requirement for developing Compound AI
Systems [1, 2] that consist of multiple LLM inferences or external computational tools. For example, Multi-Hop RAG [3] is a Compound AI System where an LLM inference first predicts one or multiple search queries for an input and
then sends these queries to a search tool. Another LLM inference then aggregates these search results and the original
question to generate a response. In order for the Multi-Hop RAG system to parse the response from the query writer to
send to the search tool, it is critical that the query writer follows a particular response format such as a JSON with the
key “queries” and a list of strings as the value.
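As a minimal illustration of this parsing step, the following Python sketch is not taken from the StructuredRAG codebase; the parse_queries helper and the example strings are hypothetical, assuming the query writer was asked for a JSON object with a "queries" key mapping to a list of strings:

```python
import json


def parse_queries(llm_output: str) -> list[str]:
    """Parse the query writer's output, expecting {"queries": [<string>, ...]}.

    Raises ValueError when the output is not valid JSON or does not match the
    requested response format, i.e. the format instruction was not followed.
    """
    try:
        payload = json.loads(llm_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Output is not valid JSON: {exc}") from exc
    queries = payload.get("queries") if isinstance(payload, dict) else None
    if not isinstance(queries, list) or not all(isinstance(q, str) for q in queries):
        raise ValueError('Expected a JSON object with key "queries" mapping to a list of strings')
    return queries


# Example: a well-formatted response from the query-writing LLM inference.
raw = '{"queries": ["What is StructuredRAG?", "How is JSON output evaluated?"]}'
for query in parse_queries(raw):
    print(query)  # each query would then be sent to the search tool
```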
In this work, we seek to measure the ability of LLMs to follow JSON response format instructions with Zero-Shot
Learning. While structured decoding methods, such as DOMINO [4], have emerged as a popular solution for ensuring
correct JSON outputs in Compound AI Systems, we seek to better understand the baseline performance of Zero-Shot
Learning. Structured decoding may slow down inference throughput, complicate system integration, and interfere with
the LLM’s prior knowledge and the benefits of prompt optimization [5]. To address these concerns, we construct a novel benchmark of six RAG-inspired [6] structured output tests. These tests explore different typed JSON responses such as
string, integer, or boolean values, as well as outputting a list of strings, denoted as List[string]. Further, we illustrate the
use of composite objects containing more than one type per instance. We present the AnswerWithConfidence composite
object consisting of a string-valued answer and an integer-valued confidence. We further test the ability to output a list of AnswerWithConfidence objects, similarly denoted as List[AnswerWithConfidence]. An output from the LLM passes these tests if it is able to be parsed into the requested JSON response format. This entails that the output jointly satisfies valid JSON syntax and matches the requested keys and value types.
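As an illustration of the composite-object format and the pass criterion, here is a minimal Python sketch; the AnswerWithConfidence dataclass and parser below are a hypothetical approximation of the paper's description, not the benchmark's actual code:

```python
import json
from dataclasses import dataclass


@dataclass
class AnswerWithConfidence:
    """Composite object: a string-valued answer and an integer-valued confidence."""
    answer: str
    confidence: int


def parse_answer_with_confidence(obj: dict) -> AnswerWithConfidence:
    """Validate that one decoded JSON object matches the requested schema."""
    answer, confidence = obj.get("answer"), obj.get("confidence")
    if not isinstance(answer, str) or not isinstance(confidence, int):
        raise ValueError("Object does not match the AnswerWithConfidence format")
    return AnswerWithConfidence(answer=answer, confidence=confidence)


# A test passes only if the raw LLM output can be parsed into the requested format.
raw = '[{"answer": "Weaviate is a vector database.", "confidence": 87}]'
decoded = json.loads(raw)                                    # fails on invalid JSON
items = [parse_answer_with_confidence(o) for o in decoded]   # List[AnswerWithConfidence]
print(items)
```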