内容概要:本文介绍了一种名为 RouterRetriever 的信息检索模型,它利用多个领域的专家嵌入模型以及路由机制来选择最适合每个查询的专家。相比于传统的单一通用模型或多任务训练模型,RouterRetriever 能够在各种基准测试上取得更好的性能,尤其在跨域数据集上有显著优势。作者详细探讨了不同专家组合对性能的影响,发现增加新领域的专家可以显著提高系统性能,而增加同一领域内的专家则提升有限。此外,通过实验证明了参数化知识对提取嵌入向量的重要性和效率。 适合人群:从事信息检索、自然语言处理和机器学习的研究人员和技术开发者。 使用场景及目标:①适用于需要高精度跨领域信息检索的应用;②可用于改进现有的信息检索系统,特别是在特定领域表现不佳的情况下;③可以帮助研究人员探索不同领域间的关联性。 其他说明:文章还讨论了路由机制的细节,指出未来研究方向之一是如何进一步优化路由技术以提高计算效率。
ROUTERRETRIEVER: Exploring the Benefits of Routing over Multiple Expert Embedding Models

Hyunji Lee^κ*, Luca Soldaini^α, Arman Cohan^{γ,α}, Minjoon Seo^κ, Kyle Lo^α
^κ KAIST AI   ^α Allen Institute for AI   ^γ Yale University
hyunji.amy.lee@kaist.ac.kr, {lucas, kylel}@allenai.org
Abstract
Information retrieval methods often rely on a single embedding model trained on large, general-domain datasets like MSMARCO. While this approach can produce a retriever with reasonable overall performance, models trained on domain-specific data often yield better results within their respective domains. While prior work in information retrieval has tackled this through multi-task training, the topic of combining multiple domain-specific expert retrievers remains unexplored, despite its popularity in language model generation. In this work, we introduce ROUTERRETRIEVER, a retrieval model that leverages multiple domain-specific experts along with a routing mechanism to select the most appropriate expert for each query. It is lightweight and allows easy addition or removal of experts without additional training. Evaluation on the BEIR benchmark demonstrates that ROUTERRETRIEVER outperforms both MSMARCO-trained (+2.1 absolute nDCG@10) and multi-task trained (+3.2) models. This is achieved by employing our routing mechanism, which surpasses other routing techniques (+1.8 on average) commonly used in language modeling. Furthermore, the benefit generalizes well to other datasets, even in the absence of a specific expert on the dataset. To our knowledge, ROUTERRETRIEVER is the first work to demonstrate the advantages of using multiple domain-specific expert embedding models with effective routing over a single, general-purpose embedding model in retrieval tasks.¹
Introduction
While a single embedding model trained on large-scale general-domain datasets like MSMARCO (Campos et al. 2016) often performs well, research shows that models trained on domain-specific datasets, even if smaller, can achieve superior results within those domains (Izacard et al. 2021; Bonifacio et al. 2022). Moreover, finetuning on MSMARCO after pretraining with contrastive learning can sometimes degrade performance on specific datasets (Wang et al. 2023; Lee et al. 2023). To improve embedding models for domain-specific datasets, previous studies have explored approaches such as data construction (Wang et al. 2021; Ma et al. 2020) and domain adaptation methods (Xin et al. 2021; Fang et al. 2024). However, less attention has been paid to leveraging multiple expert embedding models and routing among them to select the most suitable one during inference.

* Work performed during internship at AI2.
¹ Code: https://github.com/amy-hyunji/RouterRetriever
In this work, we introduce ROUTERRETRIEVER, a retrieval model that leverages multiple domain-specific experts with a routing mechanism to select the most suitable expert for each instance. For each domain, we train an expert (gate), and during inference the model determines the most relevant expert by computing the average similarity between the query and a set of pilot embeddings representing each expert, selecting the expert with the highest similarity score. ROUTERRETRIEVER is lightweight, as it only requires training a parameter-efficient LoRA module (Hu et al. 2021) for each expert, resulting in a minimal increase in parameters. Additionally, ROUTERRETRIEVER offers significant flexibility: unlike a single model that requires retraining when domains are added or removed, ROUTERRETRIEVER simply adds or removes experts without the need for further training.
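To make the routing step concrete, here is a minimal sketch in Python/NumPy. The names (`pilot_library` mapping each gate to its list of pilot embeddings, `expert_encoders`, `base_encoder`) are illustrative assumptions of ours, not the authors' released API, and we use cosine similarity, though the exact similarity function is an implementation detail we assume here:

```python
import numpy as np

def route(query_emb: np.ndarray, pilot_library: dict) -> str:
    """Select the gate whose pilot embeddings have the highest
    average cosine similarity to the query embedding."""
    def mean_cos_sim(pilots):
        P = np.stack(pilots)                               # (n_pilots, dim)
        sims = P @ query_emb / (np.linalg.norm(P, axis=1)
                                * np.linalg.norm(query_emb))
        return sims.mean()
    return max(pilot_library, key=lambda g: mean_cos_sim(pilot_library[g]))

# Experts live in an ordinary dict, so adding or removing a domain is a
# dict update; the router itself never needs retraining.
# gate = route(base_encoder(query), pilot_library)
# final_query_emb = expert_encoders[gate](query)
```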
Evaluation on the BEIR benchmark (Thakur et al. 2021) with various combinations of experts highlights the benefits of having multiple expert embedding models with a routing mechanism compared to using a single embedding model. When keeping the total number of training datasets constant, ROUTERRETRIEVER composed only of domain-specific experts, without an MSMARCO expert, outperforms both a model trained on the same datasets in a multi-task manner and a model trained on MSMARCO. Also, adding domain-specific experts tends to improve performance even when an expert trained on a large-scale general-domain dataset like MSMARCO is already present, suggesting that, despite the capabilities of a general-domain expert, domain-specific experts provide additional benefits, underscoring their importance. Moreover, ROUTERRETRIEVER consistently improves performance as new experts are added, whereas multi-task training tends to degrade once a certain number of domains are included. This indicates the advantage of having separate experts for each domain and using a routing mechanism to select among them. Notably, the benefits of ROUTERRETRIEVER generalize not only to datasets that have corresponding experts but also to additional datasets without specific experts.
We further explore the factors behind these performance benefits. First, ROUTERRETRIEVER consistently shows improved performance with the addition of more experts (gates), suggesting that broader domain coverage by experts enhances retrieval accuracy. This trend holds even in an oracle setting, where the gate that maximizes performance is always selected. Notably, adding a new expert for a different domain yields greater performance gains than adding additional experts within the same domain. Second, we observe that parametric knowledge influences embedding extraction. This observation supports the idea that training with domain-specific knowledge improves the quality of embedding extraction for the domain. Last, the performance difference between an instance-level oracle (which routes each instance to its best expert) and a dataset-level oracle (which routes queries to the expert with the highest average performance for the dataset) suggests that queries may benefit from knowledge of other domains, supporting the effectiveness of our routing technique. Our results point to potential research opportunities in improving routing techniques among multiple expert retrievers, a direction that leads to the development of retriever systems that perform well across both general and domain-specific datasets.
Related Works
Domain-Specific Retriever  There exists substantial research on retrieval models that aim to improve performance on domain-specific tasks. One approach focuses on dataset augmentation. As domain-specific training datasets are often unavailable and can be costly to construct, researchers have developed methods that either train models in an unsupervised manner (Lee, Chang, and Toutanova 2019; Gao, Yao, and Chen 2021; Gao and Callan 2021) or fine-tune models on pseudo-queries generated for domain-specific datasets (Bonifacio et al. 2022; Ma et al. 2020; Wang et al. 2021). Another approach is developing domain-specific embeddings. A common approach is training in a multi-task manner over domain-specific datasets (Lin et al. 2023; Wang et al. 2021). Recent works have aimed to improve domain-specific retrievers by developing instruction-following retrieval models (Asai et al. 2022; Weller et al. 2024; Oh et al. 2024; Su et al. 2022; Wang et al. 2023), where the instruction carries the domain knowledge. Another example is Fang et al. (2024), which trains a soft token for domain-specific knowledge. While these methods also aim to extract good representative embeddings for the input text, they rely on a single embedding model and produce domain-specific embeddings by additionally including domain-specific knowledge (e.g., appended as instructions) in the input. ROUTERRETRIEVER differs from these prior methods by employing multiple embedding models: rather than providing the domain knowledge in the input, it is added to the model as parametric knowledge to produce domain-representative embeddings.
Routing Techniques  Various works have focused on developing domain-specific experts and routing mechanisms to improve general performance in generation tasks. One approach simultaneously trains experts (gates) and the routing mechanism (Sukhbaatar et al. 2024; Muqeeth et al. 2024). Another line of work includes post-hoc techniques that do not require additional training for routing. Some approaches use the model itself as the knowledge source by training it on domain-specific knowledge (Feng et al. 2023), incorporate domain-specific knowledge in the token space (Belofsky 2023; Shen et al. 2024), or select the most relevant source from a sampled training dataset of each domain (Ye et al. 2022; Jang et al. 2023). Routing techniques have also been investigated for improving generation quality in retrieval-augmented generation tasks; Mallen et al. (2022) explores routing to decide whether to utilize external knowledge and Jeong et al. (2024) focuses on routing to choose among different retrieval approaches. However, there has been less emphasis on applying these techniques to information retrieval tasks. In this work, we investigate the benefits of leveraging multiple domain-specific experts and routing mechanisms in information retrieval, contrasting this approach with the traditional methods of using a single embedding model trained on a general-domain dataset or multi-task training across various domains. Additionally, we find that simply adapting routing techniques from generation tasks to information retrieval does not yield high performance, underscoring the importance of developing routing techniques tailored specifically for information retrieval.

Figure 1: ROUTERRETRIEVER. (1) Given a query, we first extract its embedding using a base encoder. We then calculate the average similarity between the query embedding (black dot) and the pilot embeddings for each gate (orange dots for Gate A, red dots for Gate B, and blue dots for Gate C). The gate with the highest average similarity (Gate A in this case) is selected. (2) The final query embedding is then produced by passing the query to Expert Encoder A, which consists of the base encoder combined with Gate A, the selected expert gate (LoRA).
Router Retriever
In this section, we introduce ROUTERRETRIEVER, a retrieval model composed of a base retrieval model and multiple domain-specific experts (gates). As shown in Figure 1, for a given input query, (1) the most appropriate expert is selected using a routing mechanism; then, (2) the query embedding is generated by passing the query through the selected gate alongside the base encoder.
In the offline stage, we train the experts (gates) with domain-specific training datasets and construct a pilot embedding library. This library contains pairs of pilot embeddings for each domain along with the corresponding expert trained on that domain. Note that this process is performed only once. During inference (online), given an input query, a routing mechanism determines the appropriate expert: we calculate the similarity score between the input query embedding and the pilot embeddings in the pilot embedding library, and then choose the expert with the highest average similarity score.

Algorithm 1: Constructing the Pilot Embedding Library
Require: Domain-specific training datasets D_1, ..., D_T; gates G = {g_1, ..., g_T}
 1: Initialize empty set P = {} for the pilot embedding library
 2: for each dataset D_i in {D_1, ..., D_T} do
 3:   Initialize an empty list L_i ← []
 4:   for each instance x_j in D_i do
 5:     g_max(x_j) ← argmax_{g_l ∈ G} Perf.(g_l, x_j)  // find the gate with maximum performance for instance x_j
 6:     Add pair (x_j, g_max) to L_i
 7:   end for
 8:   for each gate g_m in G do
 9:     Group_m ← {x_j | g_max = g_m for (x_j, g_max) in L_i}  // group all instances x_j for which g_m is the maximum-performing gate
10:     if Group_m is not empty then
11:       E ← BaseEncoder(Group_m)  // extract embeddings using the base encoder
12:       c_m ← k-means(E, k = 1)  // centroid of the single cluster, which is the pilot embedding
13:       if g_m exists in P then
14:         Append c_m to the list associated with g_m in P (P[g_m])
15:       else
16:         Add a new entry {g_m : [c_m]} to P
17:       end if
18:     end if
19:   end for
20: end for
21: Output: pilot embedding library P
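The following Python sketch mirrors Algorithm 1. The callables `per_instance_performance` and `base_encoder` are assumed interfaces, not the paper's code (one plausible instantiation of the former is sketched later in this section); note that k-means with k = 1 reduces to the arithmetic mean of the embeddings:

```python
from collections import defaultdict
import numpy as np

def build_pilot_library(datasets, gates, base_encoder, per_instance_performance):
    """Construct the pilot embedding library P: gate -> list of pilot embeddings.

    datasets: list of domain-specific training sets (each a list of instances)
    gates:    gate identifiers, one expert per domain
    """
    P = defaultdict(list)
    for D_i in datasets:
        # Lines 4-7: find the maximum-performing gate for each instance.
        groups = defaultdict(list)
        for x_j in D_i:
            g_max = max(gates, key=lambda g: per_instance_performance(g, x_j))
            groups[g_max].append(x_j)
        # Lines 8-19: one pilot embedding per non-empty group.
        for g_m, members in groups.items():
            E = np.stack([base_encoder(x) for x in members])
            P[g_m].append(E.mean(axis=0))  # k-means with k=1 == centroid == mean
    return dict(P)
```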
We use Contriever (Izacard et al. 2021) as the base encoder and train a parameter-efficient LoRA module (Hu et al. 2021) for each domain as that domain's gate, keeping the model lightweight. For example, in the case of Figure 1, ROUTERRETRIEVER includes a base encoder with three gates (experts): Gate A, Gate B, and Gate C, and Expert Encoder A is composed of the base encoder with Gate A (a LoRA module trained on a dataset from domain A) added. This approach allows for the flexible addition or removal of domain-specific gates, enabling various gate combinations without requiring further training for the routing mechanism.
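As an illustration of how such a gate can be built, the sketch below attaches a LoRA adapter to Contriever using HuggingFace's transformers and peft libraries. The LoRA hyperparameters and target modules are placeholder choices of ours, not the paper's reported configuration:

```python
from transformers import AutoModel, AutoTokenizer
from peft import LoraConfig, get_peft_model

tokenizer = AutoTokenizer.from_pretrained("facebook/contriever")
base = AutoModel.from_pretrained("facebook/contriever")

# One small LoRA gate per domain; the shared base encoder stays frozen.
lora_cfg = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.1,
                      target_modules=["query", "key", "value"])  # BERT attention projections
expert_a = get_peft_model(base, lora_cfg)  # "Expert Encoder A" = base encoder + Gate A

def embed(model, texts):
    """Contriever-style mean pooling over contextual token embeddings."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = model(**batch).last_hidden_state            # (B, L, H)
    mask = batch["attention_mask"].unsqueeze(-1)         # (B, L, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # masked mean pool
```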
Experts (Gates)  For each domain D_i, where i = 1, ..., T and T is the total number of domains, we train a separate expert (gate) g_i using the corresponding domain dataset. After the training step, we have a total of T different gates, G = {g_1, g_2, ..., g_T}, with each gate g_i specialized for a specific domain.
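This excerpt does not state the training objective used for each gate; a standard choice for Contriever-style retrievers is a contrastive (InfoNCE) loss with in-batch negatives, so the sketch below should be read as an assumption. It reuses the `embed` helper from the previous sketch:

```python
import torch
import torch.nn.functional as F

def info_nce_step(model, queries, positives, optimizer, tau=0.05):
    """One contrastive update with in-batch negatives (assumed objective)."""
    q = embed(model, queries)      # (B, H) query embeddings
    d = embed(model, positives)    # (B, H) positive-passage embeddings
    logits = F.normalize(q, dim=-1) @ F.normalize(d, dim=-1).T / tau
    labels = torch.arange(q.size(0))   # each query's positive is on the diagonal
    loss = F.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```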
Pilot Embedding Library  Given a domain-specific training dataset D_i = {x_1, ..., x_k} where x_j is an instance in D_i, we perform inference using all gates G to identify which gate provides the most suitable representative embedding for each instance (lines 4-7 in Alg. 1). For each instance x_j, we select g_max, the gate that demonstrates the highest performance, defined as g_max(x_j) = argmax_{g_l ∈ G} Performance(g_l, x_j). This process produces pairs (x_j, g_max) for all instances in the dataset D_i.
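Performance(g_l, x_j) is a per-instance retrieval score; the excerpt does not name the exact metric, so the reciprocal-rank version below is only one plausible instantiation (the names `gate_encoder`, `corpus_embs`, and `gold_idx` are ours). In practice, `corpus_embs` and `gold_idx` would be bound per dataset (e.g., via functools.partial) to match the two-argument interface used in the Algorithm 1 sketch above:

```python
import numpy as np

def per_instance_performance(gate_encoder, x_j, corpus_embs, gold_idx):
    """Reciprocal rank of the gold document for query x_j under one gate
    (an assumed per-instance metric, not necessarily the paper's choice)."""
    q = gate_encoder(x_j)                                    # (H,) query embedding
    sims = corpus_embs @ q / (np.linalg.norm(corpus_embs, axis=1)
                              * np.linalg.norm(q))
    rank = 1 + int((sims > sims[gold_idx]).sum())            # 1-based rank of gold doc
    return 1.0 / rank
```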
Next, we group these pairs by g_max, constructing T groups, one for each domain. Then, for each group, we perform k-means clustering with a cluster size of 1 to get the pilot embedding (lines 8-19 in Alg. 1). Specifically, from the constructed pairs (x_j, g_max), we form Group_m by collecting the instances x_j that share the same maximum-performing gate g_m. This results in T groups, one for each domain (m = 1, ..., T). If Group_m is not empty, we first extract embeddings for all instances in the group with the base encoder (BaseEncoder). We then apply k-means clustering to these embeddings with a cluster size of one; the centroid of this cluster, c_m, is taken as the pilot embedding for the domain. This results in one pilot embedding per group, yielding a maximum of T pilot embeddings for the training dataset D_i, each associated with a different gate and representing the most suitable one for that domain. Note that when Group_m is empty, we do not extract a pilot embedding for the empty group (cluster), so the number of pilot embeddings for a training dataset can be fewer than T. By repeating this process across all domain-specific training datasets D_1, ..., D_T, we obtain up to T pilot embeddings for each gate, one from each domain-specific training dataset (repeating lines 3-19 in Alg. 1 for all training datasets D_1, ..., D_T). Consequently, the pilot embedding library contains