Merge, Ensemble, and Cooperate! A Survey on Collaborative Strategies in the Era of Large Language Models

Jinliang Lu 1,2∗, Ziliang Pang 1∗, Min Xiao 1,2∗, Yaochen Zhu, Rui Xia, Jiajun Zhang 1,2,4†

Institute of Automation, Chinese Academy of Sciences, Beijing, China

School of Artiﬁcial Intelligence, University of Chinese Academy of Sciences, Beijing, China

Nanjing University of Science and Technology, Nanjing, China

Wuhan AI Research, Wuhan, China

{lujinliang2019, ziliang.pang}@ia.ac.cn,

{yczhu, rxia}@njust.edu.cn, {jjzhang, min.xiao}@nlpr.ia.ac.cn

Abstract

The remarkable success of Large Language

Models (LLMs) has ushered natural language

processing (NLP) research into a new era. De-

spite their diverse capabilities, LLMs trained on

different corpora exhibit varying strengths and

weaknesses, leading to challenges in maximiz-

ing their overall efﬁciency and versatility. To

address these challenges, recent studies have ex-

plored collaborative strategies for LLMs. This

paper provides a comprehensive overview of

this emerging research area, highlighting the

motivation behind such collaborations. Speciﬁ-

cally, we categorize collaborative strategies into

three primary approaches: Merging, Ensemble,

and Cooperation. Merging involves integrating

multiple LLMs in the parameter space. En-

semble combines the outputs of various LLMs.

Cooperation leverages different LLMs to allow

full play to their diverse capabilities for spe-

ciﬁc tasks. We provide in-depth introductions

to these methods from different perspectives

and discuss their potential applications. Addi-

tionally, we outline future research directions,

hoping this work will catalyze further studies on LLM collaboration and pave the way for advanced NLP applications.

1 Introduction

"Many hands make light work."

-- John Heywood

Human beings have long understood the power of

collaboration. When individuals pool their diverse

skills and efforts, they can achieve far more than

they could alone. This principle of collective effort

has found new relevance in the realm of machine

learning (Dietterich, 2000; Panait and Luke, 2005;

Sagi and Rokach, 2018), signiﬁcantly boosting the

development of artiﬁcial intelligence.

∗ Equal contribution. † Corresponding author.

[Figure 1: Recently, numerous large language models have been released, each with its own unique strengths. This diversity has fueled research into collaboration between these models. Models shown: PaLM-1/2, Qwen-1/1.5/2, LLaMA-1/2/3, LLaMA-Chat, LLaMA-Guard, CodeLLaMA, CodeX, GPT-3, GPT-4, ChatGPT, WebGPT, Gemma, Gemini-1/1.5, Flan-T5, Qwen-Multilingual, Qwen-Audio, DeepSeek-Coder-1/2, DeepSeek-Chat-1/2, Baichuan, ERNIE, Mistral, Codestral.]

In recent years, large language models (LLMs) (Brown et al., 2020; Chowdhery et al., 2023) have emerged as one of the most rapidly developing and promising directions in artificial intelligence.

These models have signiﬁcantly transformed the

paradigm of natural language processing (NLP)

(Min et al., 2023a; Chang et al., 2024; Zhao et al.,

2023) and inﬂuenced other areas (Wu et al., 2023a;

Zhang et al., 2024a). This impressive revolution

has inspired numerous universities, institutes, and

companies to pre-train and release their own LLMs.

Currently, over 74,000 pre-trained models are available on the HuggingFace model hub. As shown in

Figure 1, these models, trained with diverse data,

architectures, and methodologies, possess unique

capabilities: some are proﬁcient in multilingual

tasks (Le Scao et al., 2023; Lin et al., 2022), others

specialize in domains like medicine (Yang et al.,

2024b) or ﬁnance (Wu et al., 2023b), some are

adept at processing long-context windows (Chen

et al., 2023e,f), while others are ﬁne-tuned for

better alignment with human interaction (Ouyang

et al., 2022). However, no single model consistently outperforms all others across tasks (Jiang et al., 2023a). This variability motivates research into the collaboration between various LLMs to unlock their combined potential, akin to creating a Hexagon Warrior (an all-rounder with no weak spot).

HuggingFace model hub: https://huggingface.co/models

arXiv:2407.06089v1 [cs.CL] 8 Jul 2024

Despite progress in LLM collaboration research,

the relationships and context among the proposed

methods remain unclear. This survey aims to ﬁll

that gap by categorizing collaboration techniques

into three main approaches: Merging, Ensemble,

and Cooperation. Speciﬁcally, Merging and En-

semble methods for LLMs are derived from tradi-

tional fusion techniques commonly explored in ma-

chine learning (Li et al., 2023a). These methods are

tailored to be more suitable for LLMs, effectively

leveraging the collaborative advantages of diverse

LLMs. Merging involves integrating the parame-

ters of multiple LLMs into a single, uniﬁed model,

requiring that the parameters are compatible within

a linear space. In contrast, Ensemble focuses on

combining the outputs generated by various LLMs

to produce coherent results, with less emphasis on

the parameters of the individual models. Coopera-

tion extends beyond merging and ensemble. This

survey concentrates on cooperative methods that

harness the diverse strengths of LLMs to achieve

speciﬁc objectives. In general, these techniques

expand the methodologies for model collaboration,

holding signiﬁcant research importance for LLMs.

This work is organized as follows.

We begin by providing the background of LLMs

and deﬁning collaboration techniques for LLMs

in Section 2. Next, we introduce three key cate-

gories: Merging in Section 3, Ensemble in Section

4, and Cooperation in Section 5. Each category

of methods is thoroughly classiﬁed and described

in detail, offering a clear understanding of their

respective frameworks and applications. Finally,

we offer a comprehensive discussion in Section 6,

highlighting challenges and future directions for

research.

In summary, this study aims to comprehensively

explore the strategies and methodologies for col-

laborative efforts among LLMs. We aspire for this

survey to enrich understanding of LLM collabora-

tion strategies and to inspire future research.

2 Background

2.1 Large Language Models

Language modeling has always been a cornerstone

of natural language processing (NLP). Recently,

many studies have scaled up Transformer-based language models (Vaswani et al., 2017; Radford et al., 2018) to billions of parameters or more, exemplified by models like GPT-3 (Brown et al., 2020), PaLM (Chowdhery et al., 2023; Anil et al., 2023), and LLaMA (Touvron et al., 2023a,b). These models are typically regarded as Large Language Models (LLMs) due to their massive number of parameters (Zhao et al., 2023). This

subsection discusses the architecture and scaling of

LLMs, their training objectives, and the emergent

abilities they exhibit.

Architecture and Scaling Similar to pre-trained

language models (PLMs) (Radford et al., 2018; De-

vlin et al., 2019), LLMs primarily adopt the Trans-

former architecture (Vaswani et al., 2017) as their

backbone, consisting of stacked multi-head atten-

tion and feed-forward layers. Unlike PLMs, most

currently released LLMs are built upon decoder-

only architectures for training efﬁciency and few-

shot capabilities. This approach also shows po-

tential when the number of parameters increases

(Zhang et al., 2022). Recent studies have investi-

gated the quantitative relationship between model

capacity, the amount of training data, and model

size, known as the scaling law (Kaplan et al., 2020;

Hoffmann et al., 2022).
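As one illustration of such a relationship, a parametric form commonly associated with Hoffmann et al. (2022) writes the loss as a function of the parameter count and the number of training tokens. The symbols below ($N$ for parameters, $D$ for training tokens, and the fitted constants $E$, $A$, $B$, $\alpha$, $\beta$) are notation assumed for this sketch and are not defined elsewhere in this survey.

```latex
\hat{L}(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

Under this form, increasing either $N$ or $D$ alone reduces only one of the two decaying terms, which is why model size and data are usually scaled together.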

Training Objectives In previous studies on PLMs, various language modeling tasks were proposed, for example, masked language modeling for BERT (Devlin et al., 2019) and denoising language modeling for BART (Lewis et al., 2020) and T5 (Raffel et al., 2020). However, current LLMs

typically utilize the standard causal language mod-

eling as their training objective, which aims to pre-

dict the next token based on the preceding tokens

in a sequence. This training objective is well-suited

for decoder-only architectures.
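To make the causal language modeling objective concrete, the following is a minimal sketch that computes the average negative log-likelihood of each token given its prefix. The function names and the `step_logits` input (per-position vocabulary scores from some hypothetical model) are assumptions for illustration, not part of any specific LLM codebase.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def causal_lm_loss(step_logits, token_ids):
    """Average negative log-likelihood of token_ids[1:] given their prefixes.

    step_logits[t] holds the (hypothetical) model scores over the vocabulary
    after reading token_ids[: t + 1]; it is used to predict token_ids[t + 1].
    """
    if len(step_logits) != len(token_ids) - 1:
        raise ValueError("need one logit vector per predicted token")
    nll = 0.0
    for logits, target in zip(step_logits, token_ids[1:]):
        probs = softmax(logits)
        nll -= math.log(probs[target])
    return nll / len(step_logits)
```

With uniform scores over a vocabulary of size V, this loss equals log V, which is a handy sanity check for an untrained model.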

Beyond the pre-training objective, recent studies

have aimed to model human preferences to better

align LLMs with human expectations. For exam-

ple, the well-known InstructGPT (Ouyang et al.,

2022) introduces reinforcement learning from hu-

man feedback (RLHF), which uses preference re-

wards as an additional training objective. Although

RLHF is effective at making LLMs more helpful to

users, it inevitably incurs an alignment tax, which

refers to performance degradation after RLHF. Re-

cent research has explored various techniques to

mitigate alignment tax issues (Lin et al., 2023; Lu

et al., 2024b; Fu et al., 2024b).

[Figure 2: The illustration of different collaboration strategies, with each animal in the figure representing a different LLM. Panels: (a) LLM Merging, (b) LLM Ensemble, (c) LLM Cooperation.]

Emergent Abilities The fundamental capability

of language models is text generation, where tokens

are auto-regressively generated based on preceding

tokens using greedy search or nucleus sampling

(Holtzman et al., 2020a):

$$y_i \sim p(y_i \mid y_{<i}) \tag{1}$$
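The two decoding strategies mentioned above can be sketched as follows. This is a simplified illustration, assuming a hypothetical `next_token_probs` callable that returns a probability list over the vocabulary for a given token prefix; real LLM libraries expose this differently.

```python
import random


def greedy_pick(probs):
    """Greedy search: always take the most probable token."""
    return max(range(len(probs)), key=lambda i: probs[i])


def nucleus_pick(probs, top_p=0.9, rng=random):
    """Nucleus sampling: sample from the smallest set of top tokens
    whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


def generate(next_token_probs, prefix, max_new_tokens, pick=greedy_pick):
    """Auto-regressive loop: each new token is conditioned on all previous ones."""
    tokens = list(prefix)
    for _ in range(max_new_tokens):
        tokens.append(pick(next_token_probs(tokens)))
    return tokens
```

Passing `pick=nucleus_pick` to `generate` switches from deterministic to stochastic decoding without changing the loop itself.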

Interestingly, LLMs can not only generate real-

istic text but also perform speciﬁc tasks when pro-

vided with task-speciﬁc prompts, without requiring

ﬁne-tuning on particular downstream tasks (Brown

et al., 2020). This phenomenon is one of the most

important differences between LLMs and previ-

ous PLMs. Wei et al. (2022b) deﬁne the emergent

ability as “an ability that is not present in smaller

models but is present in larger models.” Among

these emergent abilities, in-context learning (ICL)

(Brown et al., 2020; Dong et al., 2022) and instruc-

tion following are commonly used and signiﬁcantly

enhance the ability of LLMs to process various

tasks.

ICL helps LLMs understand tasks by using several task examples as demonstrations. When provided with these demonstrations as prompts, LLMs can automatically generate reasonable output for the given test example, which can be formalized as:

$$p(y \mid x) = p\big(y \mid x, \text{demonstration}(\{(x_i, y_i)\}_{i=1}^{k})\big) \tag{2}$$
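In practice, the demonstration term is usually realized by concatenating the example pairs into the prompt ahead of the test input. The sketch below shows one possible layout; the function name and the "Input:/Output:" template are assumptions chosen for illustration, and actual prompt formats vary by model and task.

```python
def build_icl_prompt(demonstrations, test_input):
    """Format (input, output) demonstration pairs followed by the test input.

    The final block leaves the output empty so the model completes it.
    """
    blocks = [f"Input: {x}\nOutput: {y}" for x, y in demonstrations]
    blocks.append(f"Input: {test_input}\nOutput:")
    return "\n\n".join(blocks)


prompt = build_icl_prompt(
    [("great movie", "positive"), ("waste of time", "negative")],
    "surprisingly good",
)
```

The resulting string is passed to the LLM unchanged; no parameters are updated, which is what distinguishes ICL from fine-tuning.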

Instruction-following ability typically emerges in LLMs that have been fine-tuned on examples formatted with instructions on multiple tasks. The generation process can be formalized as:

$$p(y \mid x) = p(y \mid x, I) \tag{3}$$

where $I$ refers to the given instruction for the current example $x$. The instruction tuning technique

(Sanh et al., 2021; Ouyang et al., 2022; Wei et al.,

2022a) can enhance the generalization capabilities

of LLMs, enabling them to perform well with in-

structions on a variety of tasks, including unseen

ones (Thoppilan et al., 2022).

2.2 Collaboration for LLMs

For previous task-dependent NLP models, collabo-

ration strategies typically aimed to improve perfor-

mance on speciﬁc tasks (Jia et al., 2023). Recently,

LLMs have revolutionized NLP by demonstrating

remarkable versatility across a wide range of tasks.

This change has also moved the focus of collaboration strategies for LLMs toward enhancing versatility and achieving more general objectives. Consequently, some recently proposed collaboration

strategies have become more ﬂexible and tailored

speciﬁcally for LLMs.

The Necessity of LLM Collaboration Although

almost all LLMs demonstrate strong versatility

across various tasks through in-context learning

and instruction following, different LLMs still have

distinct strengths and weaknesses (Jiang et al.,

2023a).

Differences in training corpora and model ar-

chitectures among various LLM families—such

as LLaMA, GLM (Zeng et al., 2023), and QWen

(Bai et al., 2023)—result in signiﬁcant variations

in their capabilities. Even within the same fam-

ily, ﬁne-tuning on speciﬁc corpora (e.g., mathemat-

ics (Azerbayev et al., 2023), code (Roziere et al.,

2023), or medical domains (Wu et al., 2024)) can

lead to noticeable performance differences. Effective collaboration among these LLMs can unlock their full potential, significantly enhancing their overall performance and versatility.

Figure 3: The primary categorization of LLM collaboration in this survey.

LLM Collaboration
  Merging (§3)
    Merging for Relatively Optimal Solution (M-ROS) (§3.1)
      Basic M-ROS Methodologies (§3.1.1): e.g. Soup (Wortsman et al., 2022), Rame et al. (2022), Wan et al. (2024b), Liu et al. (2024b)
      Adaptation to LLMs (§3.1.2): e.g. Wan et al. (2024b), Liu et al. (2024b), Kim et al. (2024), Fu et al. (2024a), Lin et al. (2023)
    Merging for Enhancing Multi-Task Capability (M-MTC) (§3.2)
      Methods based on Weighted Average (§3.2.1): e.g. Jin et al. (2022), Daheim et al. (2023), Nathan et al. (2024)
      Methods based on Task Property (§3.2.2): e.g. Ilharco et al. (2023), Yadav et al. (2023), Yang et al. (2023), Zhou et al. (2024a), Yu et al. (2023)
      Methods based on Incremental Training (§3.2.3): e.g. Tang et al. (2023), Yang et al. (2024a)
  Ensemble (§4)
    LLM Ensemble Methodology (§4.1)
      Before Inference (§4.1.1): e.g. Shnitzer et al. (2023), Lu et al. (2023), Srivatsa et al. (2024), Hu et al. (2024)
      During Inference (§4.1.2): e.g. Hoang et al. (2023), Li et al. (2024c), Xu et al. (2024b), Huang et al. (2024c)
      After Inference (§4.1.3): e.g. Chen et al. (2023d), Madaan et al. (2023), Yue et al. (2024), Jiang et al. (2023a)
    LLM Ensemble Application (§4.2): e.g. Gundabathula and Kolar (2024), Barabucci et al. (2024), Coste et al. (2024), Ahmed et al. (2024)
  Cooperation (§5)
    Efficient Computation (§5.1)
      Input Compression (§5.1.1): e.g. LLMLINGUA (Jiang et al., 2023b), Li et al. (2024b), Liu et al. (2023), Gao (2024b)
      Speculative Decoding (§5.1.2): e.g. Stern et al. (2018), Leviathan et al. (2023), Ou et al. (2024), Huang et al. (2024a)
    Knowledge Transfer (§5.2)
      Mitigating Incorrect Knowledge (§5.2.1): e.g. Li et al. (2023b), Liu et al. (2021), O'Brien and Lewis (2023), Shi et al. (2024)
      Strengthening Correct Knowledge (§5.2.2): e.g. Tu et al. (2023), Lu et al. (2024a), Deng and Raffel (2023)
      Supplying New Knowledge (§5.2.3): e.g. Ormazabal et al. (2023), Liu et al. (2024a), Zhao et al. (2024b), Zhou et al. (2024b)
    Compensatory Cooperation (§5.3)
      Detector (§5.3.1): e.g. Pan et al. (2023), Huo et al. (2023), Chen et al. (2023b), Wang et al. (2024c)
      Retriever (§5.3.2): e.g. Ma et al. (2023b), Mao et al. (2024), Li et al. (2024d), Su et al. (2024)
    Federated Cooperation (§5.4)
      Federated Training (§5.4.1): e.g. Fan et al. (2024), Ye et al. (2024), Wang et al. (2024d)
      Federated Prompt Engineering (§5.4.2): e.g. Zhang et al. (2024b), Li et al. (2024a), Guo et al. (2022)

Furthermore, LLMs inevitably suffer from com-

putational inefﬁciencies (Zhou et al., 2024c), hallu-

cinations (Rawte et al., 2023; Ji et al., 2023; Huang

et al., 2023), and privacy leaks (Fan et al., 2024). Recent studies explore collaboration strategies between LLMs, which provide potential solutions to mitigate these issues and compensate for their shortcomings.

The Category of LLM Collaboration Methods

Collaboration between LLMs refers to the pro-

cess where multiple LLMs work together, lever-

aging their individual strengths and capabilities

to achieve a shared objective. In this survey, we

categorize LLM collaboration methods into three

aspects: merging, ensemble and cooperation. As

shown in Figure 2,

• Merging involves integrating multiple LLMs into a unified, stronger one, primarily through arithmetic operations in the model parameter space.

• Ensemble combines the outputs of different models to obtain coherent results. Recent studies have proposed various ensemble methods tailored for LLMs.

• Cooperation is a relatively broad concept. This survey focuses on cooperation methods that leverage the diverse capabilities of different LLMs to accomplish specific objectives, such as efficient computation or knowledge transfer.
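As a small illustration of the ensemble idea, the sketch below combines the final answers of several models by majority vote, mirroring the example in Figure 2 where two of three models answer "Isaac Newton". Majority voting is only one simple output-level strategy chosen here for illustration; the function name is hypothetical and the survey's ensemble methods are considerably more varied.

```python
from collections import Counter


def majority_vote(answers):
    """Return the answer produced by the most models.

    Ties are broken in favor of the answer that appeared first.
    """
    if not answers:
        raise ValueError("need at least one answer")
    counts = Counter(a.strip() for a in answers)
    return counts.most_common(1)[0][0]


final_answer = majority_vote(["Isaac Newton", "Newton", "Isaac Newton"])
```

Note that this operates purely on outputs, so the participating models need not share an architecture or parameter space, unlike merging.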

It should be noted that as we move from merging

to ensemble to cooperation, the requirements for

LLMs gradually relax, making the proposed meth-

ods increasingly ﬂexible. Speciﬁcally, merging

methods are effective only when the LLMs share

a compatible parameter space, allowing seamless

integration. Ensemble methods require LLMs to

have diverse yet comparable abilities; without this

balance, the ensemble may be less effective. In

contrast, cooperation methods are more ﬂexible,

focusing on leveraging LLMs with various capabili-

ties that are specially designed to achieve particular

objectives.

For each category, we further classify speciﬁc

methods based on their focus or stages of imple-

mentation. The comprehensive categorization is

shown in Figure 3.

3 Merging

Single models have inherent limitations, such as po-

tentially missing important information (Sagi and

Rokach, 2018), and being prone to getting stuck

in local optima or lacking multi-task capabilities.

To address these limitations, researchers have ex-

plored model merging methods, which combine

multiple models in the parameter space to create a

uniﬁed, stronger model. Model merging has made

signiﬁcant progress in recent years, with various

techniques cataloged in existing surveys (Li et al.,

2023a). In the era of LLMs, model merging has

become an important solution for model collaboration, usually employing basic merging methods, which have demonstrated their effectiveness. This section focuses on the merging techniques that have proven effective for LLMs.

Current studies on model merging typically fo-

cus on two key issues: merging to approach a rel-

atively optimal solution (M-ROS) and merging to

enhance multi-task capability (M-MTC). Research

on M-ROS is based on the ﬁnding that gradient-

optimized solutions often converge near the bound-

ary of a wide ﬂat region rather than at the central

point (Izmailov et al., 2018). Model merging of-

fers a way to approach this relatively optimal point,

thereby yielding a stronger model. M-MTC, on

the other hand, aims to utilize model merging tech-

niques to enrich a single model with capabilities

across multiple tasks (Ilharco et al., 2023; Yadav

et al., 2023). In the following subsections, we introduce the techniques for each objective and their application to LLMs.

It is important to note that for both M-ROS and

M-MTC, current model merging methods are appli-

cable only to models with the same architecture and

parameters within the same space. Therefore, most

candidate models for merging should be trained

with identical initialization. For instance, the candidate models $M = \{M_1, M_2, \cdots, M_k\}$ should be fine-tuned from the same pre-trained model. This requirement ensures compatibility and coherence among the model parameters, promoting successful merging. Unfortunately, for models with incompatible parameters, such as LLaMA and QWen, current merging techniques are ineffective.

Footnote: Some advanced methods, such as merging after neuron alignment (e.g., OT Fusion (Singh and Jaggi, 2020), Re-Basin techniques (Peña et al., 2023; Ainsworth et al., 2023), and REPAIR (Jordan et al., 2023)), have not been widely explored for LLMs. We leave the implementation of these techniques on LLMs for future work.
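The compatibility requirement described in this section can be illustrated with a minimal parameter-averaging sketch. It assumes each model is represented as a dictionary mapping parameter names to flat lists of floats, a deliberate simplification of real checkpoints; `merge_parameters` is a hypothetical helper, and weighted averaging is just the most basic merging operation rather than any specific method from the literature.

```python
def merge_parameters(state_dicts, weights=None):
    """Weighted average of parameter dicts that share names and shapes.

    Raises ValueError when the models are not structurally compatible,
    which is the case merging cannot handle.
    """
    if not state_dicts:
        raise ValueError("need at least one model")
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    if len(weights) != len(state_dicts):
        raise ValueError("need exactly one weight per model")

    reference = state_dicts[0]
    for sd in state_dicts[1:]:
        if sd.keys() != reference.keys():
            raise ValueError("models do not share the same parameter names")
        for name in reference:
            if len(sd[name]) != len(reference[name]):
                raise ValueError(f"shape mismatch for parameter {name!r}")

    merged = {}
    for name in reference:
        merged[name] = [
            sum(w * sd[name][j] for w, sd in zip(weights, state_dicts))
            for j in range(len(reference[name]))
        ]
    return merged
```

Matching names and shapes is only a necessary condition: as the text notes, the candidates should also descend from the same pre-trained model, which this structural check cannot verify.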
