Overview: This article presents StructuredRAG, a benchmark for evaluating the ability of Large Language Models (LLMs) to generate JSON response formats. The study compares two models, Gemini 1.5 Pro and Llama 3 8B-instruct, across 24 experiments covering different types of JSON output tasks. The results show high success rates on simple output types, with a marked drop on more complex tasks such as list outputs and composite objects. The article also examines the effect of different prompting strategies and the use of the OPRO optimization method to improve the success rate of generating complex JSON structures.
Intended audience: Researchers and developers, particularly those working with large language models and their applications.
Use cases and goals: Projects that need to evaluate and improve the structured-output capability of large language models, and to understand and optimize data exchange within multi-component AI systems.
Additional notes: The article provides detailed experimental results and analysis, and the source code is publicly available for further research and development.
STRUCTUREDRAG: JSON RESPONSE FORMATTING WITH LARGE LANGUAGE MODELS
Connor Shorten, Charles Pierse, Thomas Benjamin Smith, Erika Cardenas, Akanksha Sharma, John Trengrove, Bob van Luijt
Weaviate
ABSTRACT
The ability of Large Language Models (LLMs) to generate structured outputs, such as JSON, is
crucial for their use in Compound AI Systems. However, evaluating and improving this capability
remains challenging. In this work, we introduce StructuredRAG, a benchmark of six tasks designed to
assess LLMs’ proficiency in following response format instructions. We evaluate two state-of-the-art
LLMs, Gemini 1.5 Pro and Llama 3 8B-instruct with 4-bit quantization using two distinct prompting
strategies. We introduce these prompting strategies as f-String and Follow the Format (FF) prompting.
Across 24 experiments, we find an average success rate of 82.55%. We further find a high variance
in performance across tasks, models, and prompting strategies with success rates ranging from 0
to 100%. We find that Llama 3 8B-instruct often performs competitively with Gemini 1.5 Pro.
We observe that task complexity significantly influences performance, with tasks involving lists or
composite object outputs proving more challenging. Our findings highlight the need for further
research into improving the reliability and consistency of structured output generation in LLMs. We
have open-sourced our experimental code and results at github.com/weaviate/structured-rag.
1 Introduction
Large Language Models (LLMs) have become extremely effective at Zero-Shot Learning. Zero-Shot Learning describes a machine learning model's ability to perform a task without any task-specific training data provided in advance.
An emergent area of importance is not only to test LLMs on how well they can perform novel tasks, but also how
well they can structure their output in a particular format. This is a critical requirement for developing Compound AI
Systems [1, 2] that consist of multiple LLM inferences or external computational tools. For example, Multi-Hop RAG [3] is a Compound AI System where an LLM inference first predicts one or multiple search queries for an input and
then sends these queries to a search tool. Another LLM inference then aggregates these search results and the original
question to generate a response. In order for the Multi-Hop RAG system to parse the response from the query writer to
send to the search tool, it is critical that the query writer follows a particular response format such as a JSON with the
key “queries” and a list of strings as the value.
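As a minimal illustration of this parsing step, the following Python sketch is not taken from the StructuredRAG codebase; the parse_queries helper and the example strings are hypothetical, assuming the query writer was asked for a JSON object with a "queries" key mapping to a list of strings:

```python
import json


def parse_queries(llm_output: str) -> list[str]:
    """Parse the query writer's output, expecting {"queries": [<string>, ...]}.

    Raises ValueError when the output is not valid JSON or does not match the
    requested response format, i.e. the format instruction was not followed.
    """
    try:
        payload = json.loads(llm_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Output is not valid JSON: {exc}") from exc
    queries = payload.get("queries") if isinstance(payload, dict) else None
    if not isinstance(queries, list) or not all(isinstance(q, str) for q in queries):
        raise ValueError('Expected a JSON object with key "queries" mapping to a list of strings')
    return queries


# Example: a well-formatted response from the query-writing LLM inference.
raw = '{"queries": ["What is StructuredRAG?", "How is JSON output evaluated?"]}'
for query in parse_queries(raw):
    print(query)  # each query would then be sent to the search tool
```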
In this work, we seek to measure the ability of LLMs to follow JSON response format instructions with Zero-Shot
Learning. While structured decoding methods, such as DOMINO [4], have emerged as a popular solution for ensuring
correct JSON outputs in Compound AI Systems, we seek to better understand the baseline performance of Zero-Shot
Learning. Structured decoding may slow down inference throughput, complicate system integration, and interfere with
the LLM’s prior knowledge and the benefits of prompt optimization [5]. To address these concerns, we construct a novel benchmark of six RAG-inspired [6] structured output tests. These tests explore different typed JSON responses such as
string, integer, or boolean values, as well as outputting a list of strings, denoted as List[string]. Further, we illustrate the
use of composite objects containing more than one type per instance. We present the AnswerWithConfidence composite
object consisting of a string-valued answer and an integer-valued confidence. We further test the ability to output a list of AnswerWithConfidence objects, similarly denoted as List[AnswerWithConfidence]. An output from the LLM passes these tests if it is able to be parsed into the requested JSON response format. This entails that the output jointly satisfies valid JSON syntax and matches the requested keys and value types.
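As an illustration of the composite-object format and the pass criterion, here is a minimal Python sketch; the AnswerWithConfidence dataclass and parser below are a hypothetical approximation of the paper's description, not the benchmark's actual code:

```python
import json
from dataclasses import dataclass


@dataclass
class AnswerWithConfidence:
    """Composite object: a string-valued answer and an integer-valued confidence."""
    answer: str
    confidence: int


def parse_answer_with_confidence(obj: dict) -> AnswerWithConfidence:
    """Validate that one decoded JSON object matches the requested schema."""
    answer, confidence = obj.get("answer"), obj.get("confidence")
    if not isinstance(answer, str) or not isinstance(confidence, int):
        raise ValueError("Object does not match the AnswerWithConfidence format")
    return AnswerWithConfidence(answer=answer, confidence=confidence)


# A test passes only if the raw LLM output can be parsed into the requested format.
raw = '[{"answer": "Weaviate is a vector database.", "confidence": 87}]'
decoded = json.loads(raw)                                    # fails on invalid JSON
items = [parse_answer_with_confidence(o) for o in decoded]   # List[AnswerWithConfidence]
print(items)
```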