Overview: This document presents StructuredRAG, a benchmark for measuring the ability of Large Language Models (LLMs) to generate responses in JSON format. The study compares two models, Gemini 1.5 Pro and Llama 3 8B-instruct, evaluating their performance on different kinds of JSON output tasks across 24 experiments. The results show high performance on tasks with simple types, but a significant drop on complex tasks such as list outputs and composite objects. The study also examines the effect of different prompting strategies, as well as the use of the OPRO optimization method to improve the success rate of generating complex JSON structures.
Intended audience: Researchers and technical developers, especially those focused on large language models and their applications.
Use cases and goals: Suitable for projects that need to evaluate and improve the structured-output capabilities of large language models, and for understanding and optimizing data exchange in multi-component AI systems.
Additional notes: Besides detailed experimental results and analysis, the source code has been open-sourced to facilitate further research and development.
StructuredRAG: JSON Response Formatting with Large Language Models

Connor Shorten, Charles Pierse, Thomas Benjamin Smith, Erika Cardenas, Akanksha Sharma, John Trengrove, Bob van Luijt
Weaviate
ABSTRACT
The ability of Large Language Models (LLMs) to generate structured outputs, such as JSON, is
crucial for their use in Compound AI Systems. However, evaluating and improving this capability
remains challenging. In this work, we introduce StructuredRAG, a benchmark of six tasks designed to
assess LLMs’ proficiency in following response format instructions. We evaluate two state-of-the-art
LLMs, Gemini 1.5 Pro and Llama 3 8B-instruct with 4-bit quantization, using two distinct prompting strategies, which we introduce as f-String and Follow the Format (FF) prompting.
Across 24 experiments, we find an average success rate of 82.55%. We further find high variance in performance across tasks, models, and prompting strategies, with success rates ranging from 0 to 100%. We find that Llama 3 8B-instruct often performs competitively with Gemini 1.5 Pro.
We observe that task complexity significantly influences performance, with tasks involving lists or
composite object outputs proving more challenging. Our findings highlight the need for further
research into improving the reliability and consistency of structured output generation in LLMs. We
have open-sourced our experimental code and results at github.com/weaviate/structured-rag.
1 Introduction
Large Language Models (LLMs) have become extremely effective at Zero-Shot Learning, a term describing a machine learning model's ability to perform a task without being given any training data for that task in advance. An emerging area of importance is testing not only how well LLMs can perform novel tasks, but also how well they can structure their outputs in a particular format. This is a critical requirement for developing Compound AI
Systems [1, 2] that consist of multiple LLM inferences or external computational tools. For example, Multi-Hop RAG [3] is a Compound AI System where an LLM inference first predicts one or more search queries for an input and then sends these queries to a search tool. Another LLM inference then aggregates these search results and the original question to generate a response. For the Multi-Hop RAG system to parse the query writer's response and send it to the search tool, it is critical that the query writer follows a particular response format, such as a JSON object with the key “queries” and a list of strings as the value.
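To make this parsing requirement concrete, the following minimal sketch (our illustration, not code from the paper; the helper name parse_queries is hypothetical) shows how a Multi-Hop RAG system might extract the search queries, and how any deviation from the instructed format surfaces as a parsing failure:

```python
import json

def parse_queries(llm_output: str) -> list[str]:
    """Parse the query writer's response, expecting {"queries": ["...", ...]}."""
    data = json.loads(llm_output)  # raises json.JSONDecodeError on non-JSON text
    queries = data["queries"]      # raises KeyError if the model renames the key
    if not (isinstance(queries, list) and all(isinstance(q, str) for q in queries)):
        raise ValueError('"queries" must map to a list of strings')
    return queries

# A response that follows the instructed format parses cleanly:
print(parse_queries('{"queries": ["who founded Weaviate", "what is Multi-Hop RAG"]}'))
```

If the model instead wraps the JSON in conversational prose or changes the key name, the parse fails and the downstream search step never runs; this is exactly the failure mode the benchmark measures.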
In this work, we seek to measure the ability of LLMs to follow JSON response format instructions with Zero-Shot
Learning. While structured decoding methods, such as DOMINO [
4
], have emerged as a popular solution for ensuring
correct JSON outputs in Compound AI Systems, we seek to better understand the baseline performance of Zero-Shot
Learning. Structured decoding may slow down inference throughput, complicate system integration, and interfere with
the LLM’s prior knowledge and the benefits of prompt optimization [
5
]. To address these concerns, we a construct a novel
benchmark of six RAG-inspired [
6
] structured output tests. These tests explore different typed JSON responses such as
string, integer, or boolean values, as well as outputting a list of strings, denoted as List[string]. Further, we illustrate the use of composite objects containing more than one type per instance. We present the AnswerWithConfidence composite object, consisting of a string-valued answer and an integer-valued confidence. We further test the ability to output a list of AnswerWithConfidence objects, similarly denoted as List[AnswerWithConfidence]. An output from the LLM passes these tests if it can be parsed into the requested JSON response format. This entails that the output jointly satisfies two conditions: it must parse as valid JSON, and the parsed values must match the requested keys and types.
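As an illustration of this pass/fail criterion, the sketch below (our own; the open-sourced evaluation code may implement the check differently) validates the List[AnswerWithConfidence] format, requiring that JSON parsing and type checking jointly succeed:

```python
import json

def passes_answers_with_confidence_test(llm_output: str) -> bool:
    """Check whether an output parses into List[AnswerWithConfidence]:
    a JSON list of objects with a string "answer" and an integer "confidence".
    """
    try:
        items = json.loads(llm_output)
    except json.JSONDecodeError:
        return False  # not valid JSON at all
    # Parsing and type checking must jointly succeed.
    return isinstance(items, list) and all(
        isinstance(item, dict)
        and isinstance(item.get("answer"), str)
        and isinstance(item.get("confidence"), int)
        for item in items
    )

assert passes_answers_with_confidence_test('[{"answer": "Paris", "confidence": 90}]')
assert not passes_answers_with_confidence_test("The answer is Paris (confidence: 90).")
```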