StructuredRAG: JSON Response Formatting with Large Language Models
Connor Shorten, Charles Pierse, Thomas Benjamin Smith, Erika Cardenas, Akanksha Sharma, John Trengrove, Bob van Luijt
Weaviate
ABSTRACT
The ability of Large Language Models (LLMs) to generate structured outputs, such as JSON, is
crucial for their use in Compound AI Systems. However, evaluating and improving this capability
remains challenging. In this work, we introduce StructuredRAG, a benchmark of six tasks designed to
assess LLMs’ proficiency in following response format instructions. We evaluate two state-of-the-art
LLMs, Gemini 1.5 Pro and Llama 3 8B-instruct with 4-bit quantization, using two distinct prompting
strategies, which we introduce as f-String and Follow the Format (FF) prompting.
Across 24 experiments, we find an average success rate of 82.55%. We further find high variance
in performance across tasks, models, and prompting strategies, with success rates ranging from 0%
to 100%. We find that Llama 3 8B-instruct often performs competitively with Gemini 1.5 Pro.
We observe that task complexity significantly influences performance, with tasks involving lists or
composite object outputs proving more challenging. Our findings highlight the need for further
research into improving the reliability and consistency of structured output generation in LLMs. We
have open-sourced our experimental code and results at github.com/weaviate/structured-rag.
1 Introduction
Large Language Models (LLMs) have become extremely effective at Zero-Shot Learning, a term that describes a
machine learning model’s ability to perform a task without any task-specific training data provided in advance.
An emergent area of importance is not only to test LLMs on how well they can perform novel tasks, but also how
well they can structure their output in a particular format. This is a critical requirement for developing Compound AI
Systems [1, 2] that consist of multiple LLM inferences or external computational tools. For example, Multi-Hop RAG [3]
is a Compound AI System where an LLM inference first predicts one or multiple search queries for an input and
then sends these queries to a search tool. Another LLM inference then aggregates these search results and the original
question to generate a response. In order for the Multi-Hop RAG system to parse the response from the query writer to
send to the search tool, it is critical that the query writer follows a particular response format such as a JSON with the
key “queries” and a list of strings as the value.
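As a minimal illustration of this parsing requirement, the Python sketch below assumes hypothetical `llm` and `search_tool` callables (they are not part of the StructuredRAG codebase) and illustrative prompt wording; it shows how a Multi-Hop RAG controller depends on the query writer returning a parseable JSON object with a "queries" key.

```python
import json


def multi_hop_rag(question: str, llm, search_tool) -> str:
    """Illustrative Multi-Hop RAG control flow. `llm` and `search_tool`
    are hypothetical callables standing in for real model and retrieval APIs."""
    # First inference: ask the query writer for JSON with a "queries" key.
    query_prompt = (
        f"Generate search queries for the question below.\n"
        f'Respond with JSON formatted as {{"queries": ["..."]}}.\n'
        f"Question: {question}"
    )
    raw = llm(query_prompt)

    # The system can only proceed if the response parses into the requested
    # format: a JSON object whose "queries" value is a list of strings.
    parsed = json.loads(raw)
    queries = parsed["queries"]

    # Send each predicted query to the search tool and pool the results.
    results = [hit for query in queries for hit in search_tool(query)]

    # Second inference: aggregate the search results and the original question.
    answer_prompt = (
        f"Answer the question using the search results.\n"
        f"Question: {question}\nResults: {results}"
    )
    return llm(answer_prompt)
```

If the query writer instead replies with free-form text, `json.loads` raises an exception and the downstream search step cannot run, which is exactly the kind of format-following failure this benchmark is designed to measure.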
In this work, we seek to measure the ability of LLMs to follow JSON response format instructions with Zero-Shot
Learning. While structured decoding methods, such as DOMINO [4], have emerged as a popular solution for ensuring
correct JSON outputs in Compound AI Systems, we seek to better understand the baseline performance of Zero-Shot
Learning. Structured decoding may slow down inference throughput, complicate system integration, and interfere with
the LLM’s prior knowledge and the benefits of prompt optimization [5]. To address these concerns, we construct a novel
benchmark of six RAG-inspired [6] structured output tests. These tests explore different typed JSON responses such as
string, integer, or boolean values, as well as outputting a list of strings, denoted as List[string]. Further, we illustrate the
use of composite objects containing more than one type per instance. We present the AnswerWithConfidence composite
object consisting of a string valued answer and an integer valued confidence. We further test the ability to output a
list of AnswerWithConfidence objects, similarly denoted as List[AnswerWithConfidence]. An output from the LLM
passes these tests if it is able to be parsed into the requested JSON response format. This entails that the output jointly
satisfies valid JSON syntax and the expected key names and value types.
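To make this pass criterion concrete, the following is a minimal sketch of such a check for the AnswerWithConfidence task, assuming a Python harness; the function name `passes_answer_with_confidence` is hypothetical and only illustrates the kind of parsing and type validation described above.

```python
import json


def passes_answer_with_confidence(raw_output: str) -> bool:
    """Hypothetical checker: the output passes only if it parses as JSON
    and jointly provides a string "answer" and an integer "confidence"."""
    try:
        obj = json.loads(raw_output)
    except json.JSONDecodeError:
        return False  # not valid JSON at all
    if not isinstance(obj, dict):
        return False
    confidence = obj.get("confidence")
    return (
        isinstance(obj.get("answer"), str)
        and isinstance(confidence, int)
        and not isinstance(confidence, bool)  # bool is a subclass of int in Python
    )


# A List[AnswerWithConfidence] task would additionally require the top-level
# value to be a list whose every element satisfies the same check.
print(passes_answer_with_confidence('{"answer": "Weaviate", "confidence": 87}'))  # True
print(passes_answer_with_confidence('Answer: Weaviate (confidence 87)'))          # False
```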