Delay-Optimal Dynamic Mode Selection and Resource Allocation in Device-to-Device Communications - Part I: Optimal Policy


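This paper studies how dynamic mode selection and resource allocation can achieve delay-optimal performance in device-to-device (D2D) communications. The work has two parts: Part I derives the optimal policy, and Part II develops a more practical algorithm. It considers an Orthogonal Frequency Division Multiple Access (OFDMA) cellular network and asks how to perform dynamic mode selection and subchannel allocation so as to minimize the average end-to-end delay under a dropping probability constraint. The authors formulate the optimal resource control problem as an infinite-horizon average-reward constrained Markov decision process (CMDP) and obtain the optimal control policy with a brute-force offline value iteration algorithm based on the reduced-state equivalent Bellman equation. That policy suffers from the well-known curse of dimensionality, which limits its use in realistic scenarios with multiple D2D users and cellular users.

In Part II, the authors apply linear value approximation to further reduce the state space, and use an online stochastic learning algorithm with two time scales to update the value functions and Lagrangian multipliers (LMs) from real-time observations of channel state information (CSI) and queue state information (QSI). The combined online stochastic learning solution converges almost surely to a global optimal solution under some realistic conditions. Simulation results show that the proposed online approach achieves nearly the same performance as the offline value iteration algorithm, and outperforms the conventional CSI-only scheme and the throughput-optimal scheme in the stability sense.

Key concepts covered in the paper:

1. Device-to-device (D2D) communication: a wireless technology that lets mobile devices communicate directly, without a base station acting as an intermediary. It can improve spectral efficiency, reduce energy consumption, and lower communication delay.
2. Orthogonal Frequency Division Multiple Access (OFDMA): a wireless technology that lets users share spectrum. The total bandwidth is divided into multiple subchannels, and different users can transmit simultaneously on different subchannels.
3. Average end-to-end delay: delay is the time data takes to travel from sender to receiver. End-to-end delay is the average delay over the whole communication path from source to destination. Low delay is a key performance metric in real-time communication systems.
4. Dynamic mode selection: in D2D communications, the process of adjusting the communication mode according to network conditions and device capabilities, for example choosing between direct communication and relaying through the base station based on signal strength and network congestion.
5. Resource allocation: a key task in wireless networks, concerned with allocating limited spectrum and other network resources to optimize network performance. This work focuses on allocating subchannels to minimize delay.
6. Markov decision process (MDP): a mathematical model for sequential decision problems in which the evolution of the system depends on the decision maker's current state and chosen action. It is widely used in optimal control.
7. Value iteration: a method for solving MDPs that computes the optimal policy by iteratively estimating the optimal value of each state.
8. Stochastic learning algorithm: a learning method that does not rely on complete system information and instead estimates the optimal policy or parameters from random observations of the system.
9. Lagrangian multipliers: in optimization, used to turn a constrained problem into an unconstrained one by embedding the constraints in the objective function.
10. Channel state information (CSI) and queue state information (QSI): CSI describes the wireless channel, such as signal strength, channel quality, and interference level. QSI describes the number of packets waiting in the transmission queues, i.e., the queue lengths.
11. Online and offline algorithms: an online algorithm processes data as it arrives, while an offline algorithm processes data after all of it has been collected. Online learning algorithms suit real-time systems and adapt better to changing environments.

By studying delay-optimal dynamic mode selection and resource allocation in D2D communications, the paper provides a theoretical and algorithmic framework for the challenges that arise in practice. With the online stochastic learning approach, the authors address the high-dimensional state space and improve system stability and delay performance.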

http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TVT.2015.2444791, IEEE Transactions on Vehicular Technology

Delay-Optimal Dynamic Mode Selection and Resource Allocation in Device-to-Device Communications - Part II: Practical Algorithm

Lei Lei, Member, IEEE, Yiru Kuang, Nan Cheng, Student Member, IEEE, Xuemin (Sherman) Shen, Fellow, IEEE, Zhangdui Zhong, and Chuang Lin, Senior Member, IEEE

Abstract: In Part I of the paper (“Delay-Optimal Dynamic Mode Selection and Resource Allocation in Device-to-Device Communications - Part I: Optimal Policy”), we investigated dynamic mode selection and subchannel allocation for an Orthogonal Frequency Division Multiple Access (OFDMA) cellular network with device-to-device (D2D) communications to minimize the average end-to-end delay performance under a dropping probability constraint. We formulated the optimal resource control problem as an infinite horizon average reward constrained Markov decision process (CMDP). The optimal control policy derived in Part I using the brute-force offline value iteration algorithm based on the reduced state equivalent Bellman's equation still faces the well-known curse of dimensionality, which limits its practical application in realistic scenarios with multiple D2D users and cellular users. In Part II of the paper, we use linear value approximation techniques to further reduce the state space. Moreover, an online stochastic learning algorithm with two time scales is applied to update the value functions and Lagrangian Multipliers (LMs) based on real-time observations of channel state information (CSI) and queue state information (QSI). The combined online stochastic learning solution converges almost surely to a global optimal solution under some realistic conditions. Simulation results show that the proposed approach achieves nearly the same performance as the offline value iteration algorithm, and outperforms the conventional CSI-only scheme and the throughput-optimal scheme in the stability sense.

Index Terms: Device-to-Device Communication; Mode Selection; Resource Allocation; Online Stochastic Learning

I. INTRODUCTION

In Part I of the paper [1], we introduced the problem of optimal dynamic mode selection and resource allocation to minimize the average end-to-end delay under a packet dropping probability constraint for network assisted device-to-device (D2D) communications [2]–[4] with bursty traffic. We considered an Orthogonal Frequency Division Multiple Access (OFDMA) system with one base station (BS), multiple D2D user equipment (UE) pairs, and cellular UEs with uplink or downlink transmission. Compared with the resource control problem in traditional cellular networks, resource optimization in D2D communications raises a number of unique issues: (1) route selection between the one-hop route of the D2D link (direct over-the-air link) in D2D Mode and the two-hop route of cellular links in Cellular Mode; (2) resource allocation for D2D links and cellular links with resource reuse; (3) joint uplink and downlink resource optimization for the end-to-end performance of the two-hop route when a pair of D2D UEs works in the Cellular Mode.

Permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org.

Manuscript received Jan. 12, 2015; revised March 22, 2015; accepted June 8, 2015. This work was supported by the National Natural Science Foundation of China (No. 61272168, No. U1334202, No. 61472199), the State Key Laboratory of Rail Traffic Control and Safety (No. RCS2014ZT10), Beijing Jiaotong University, and the Key Grant Project of Chinese Ministry of Education (No. 313006).

L. Lei, Y. Kuang and Z. Zhong are with the State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, China.

N. Cheng and X. Shen are with the Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, Ontario, Canada.

C. Lin is with the Department of Computer Science and Technology, Tsinghua University, Beijing, China.

In order to characterize the above issues, we first developed a queuing model whose underlying system state dynamics evolve as a controlled Markov chain, where the system state includes the joint queue state of the queues at the UEs for uplink transmission and the queues at the BS for downlink transmission, as well as the joint channel state of all the D2D links, cellular uplinks and cellular downlinks. Specifically, we introduced two important concepts to characterize the unique features of D2D communications. The first concept is the radio resource group (RRG), which defines a group of links that may reuse radio resources. Therefore, the channel state of a link is a tuple including its Adaptive Modulation and Coding (AMC) states in all the RRGs that this link belongs to. The second concept is the link constraint set of a queue, which characterizes the set of servers for the queue in different routes. Based on the queuing model, the delay-optimal resource control over a frequency-selective fading channel with an AMC scheme in the physical layer was formulated as an infinite horizon average reward constrained Markov Decision Process (CMDP) [6], [7]. In order to formulate the CMDP model, the transition kernel of the controlled Markov chain was derived, which takes into account the coupling relationship between the uplink and downlink resource allocation. Moreover, closed-form expressions for end-to-end performance metrics such as average delay and dropping probability were given as functions of the steady-state probabilities of the controlled Markov chain, based on which the cost function of the CMDP model was given. We utilized the Lagrangian approach to turn the CMDP problem into an unconstrained Markov Decision Process (MDP) problem, and established the strong duality result over the space of randomized policies. Moreover, we further proved the existence of an optimal policy, which is either a deterministic policy or a mix of two deterministic policies, equivalent to choosing independently one of two deterministic policies at each epoch by the toss of a (biased) coin. To solve the unconstrained MDP problem, we derived an equivalent Bellman's equation with reduced state space. We showed by simulations that the optimal policy derived by the brute-force offline value iteration algorithm based on the equivalent Bellman's equation achieves significant gain compared to various baselines such as the conventional CSI-only control and the throughput optimal control (MaxWeight algorithm).
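
To make the offline value iteration step concrete, the following is a minimal sketch of relative value iteration for a generic average-cost MDP. It is not the reduced-state equivalent Bellman's equation derived in Part I; the two-state transition array `P`, the cost table `g`, and the function name are hypothetical and chosen only for illustration.

```python
import numpy as np


def relative_value_iteration(P, g, ref_state=0, tol=1e-9, max_iter=10_000):
    """Relative value iteration for an average-cost MDP (illustrative sketch).

    P: array of shape (A, S, S), P[a, s, s2] = Pr(next = s2 | state = s, action = a).
    g: array of shape (S, A), per-stage cost of taking action a in state s.
    Returns (theta, h, policy): average-cost estimate, relative values, greedy policy.
    """
    num_states = P.shape[1]
    h = np.zeros(num_states)
    for _ in range(max_iter):
        # Q[s, a] = g[s, a] + sum over s2 of P[a, s, s2] * h[s2]
        Q = g + np.einsum("asj,j->sa", P, h)
        Th = Q.min(axis=1)
        theta = Th[ref_state]   # average-cost estimate at the reference state
        h_new = Th - theta      # normalise so that h[ref_state] == 0
        converged = np.max(np.abs(h_new - h)) < tol
        h = h_new
        if converged:
            break
    policy = Q.argmin(axis=1)
    return theta, h, policy


# Hypothetical toy model: state 0 = empty queue, state 1 = backlogged queue;
# action 0 = idle, action 1 = serve (costs 0.5 extra per slot).
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # transitions under action 0
    [[0.5, 0.5], [0.7, 0.3]],   # transitions under action 1
])
g = np.array([
    [0.0, 0.5],
    [1.0, 1.5],
])
theta, h, policy = relative_value_iteration(P, g)
print("average cost:", theta, "relative values:", h, "policy:", policy)
```

The table `Q` has one entry per state-action pair, so this approach is only workable while the joint state space can be enumerated.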

It is worth noting that the complexity of the brute-force offline value iteration algorithm based on the reduced state equivalent Bellman's equation still grows exponentially with the number of users in the network, limiting its application in practical scenarios. In fact, it is well known that there is no simple solution for the infinite horizon average reward MDP problem that delay-aware resource control belongs to, because brute-force value iteration or policy iteration cannot lead to any viable solution due to the curse of dimensionality [8]–[11]. Moreover, our problem for network assisted D2D communications is further complicated by the unique issues listed above. For example, the channel state transition probabilities, which are used to derive the conditional expectations of the cost function and the queue state transition probabilities in the equivalent Bellman's equation, are very difficult to obtain when more than two links are allowed to reuse the same time-frequency resource.
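
The exponential growth can be illustrated with a rough count of joint states. The sketch below assumes, purely for illustration, that every queue holds between 0 and `queue_capacity` packets and that every link has the same number of channel states; it ignores the per-RRG channel tuples and the state reduction used in Part I, and all names and numbers are hypothetical.

```python
def joint_state_count(num_queues, queue_capacity, num_links, channel_states_per_link):
    """Size of a joint (queue state, channel state) space under simplifying assumptions."""
    queue_states = (queue_capacity + 1) ** num_queues       # each queue holds 0..capacity packets
    channel_states = channel_states_per_link ** num_links   # each link is in one of K channel states
    return queue_states * channel_states


# Illustrative numbers only: one queue and one link per user, capacity 10, 4 channel states.
for users in (2, 4, 8):
    print(users, "users ->", joint_state_count(users, 10, users, 4), "joint states")
```

Doubling the number of users squares the count, which is why enumerating the joint state space stops being feasible after a handful of users.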

In Part II of the paper, we address the curse of dimensionality in solving the CMDP formulated in Part I, so that a practical algorithm with acceptable computational complexity and signaling overhead can be derived. To reduce the complexity, we obtain a delay-optimal solution using approximate dynamic programming and online stochastic learning. Specifically, we approximate the value function in the equivalent Bellman's equation by a sum of per-queue value functions. The per-queue value functions are estimated and learned using an online stochastic learning algorithm based on real-time observations of the CSI and QSI, eliminating the need to derive the channel state transition probabilities. Moreover, the Lagrangian Multipliers (LMs) for the constrained optimization problem are updated simultaneously with the value functions over different time scales. The optimal dynamic mode selection and resource allocation actions can be determined by an algorithm whose structure is similar to that of the MaxWeight algorithm in the Lyapunov stability approach, with the weights determined by the per-queue value functions instead of the queue lengths. We prove the almost-sure convergence of the proposed algorithm. We also show by simulations that our proposed scheme achieves significant gain compared to various baselines such as the conventional CSI-only control and the throughput optimal control (MaxWeight algorithm). Together with Part I, this pair of works provides a general framework for the dynamic constrained optimization of mode selection and resource allocation in D2D communications under a bursty traffic model, where the general form of the optimal policy and a practical algorithm with simple structure and near-optimal performance are given.
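
The contrast between queue-length weights and value-function weights can be sketched as follows. This is an assumption-laden illustration, not the algorithm derived in Section III: it assigns each subchannel independently to the queue with the largest weight-times-rate product, and it takes the weight of a queue to be the marginal per-queue value V_q(Q) - V_q(Q - 1), which is a hypothetical choice. The function name, the rate matrix, and the value tables are all invented for the example.

```python
import numpy as np


def maxweight_like_allocation(queue_len, rate, per_queue_value=None):
    """Assign each subchannel to one queue by maximising weight * rate.

    queue_len: (Q,) integer queue lengths.
    rate: (Q, N) achievable rate of queue q on subchannel n.
    per_queue_value: optional (Q, capacity + 1) table of per-queue values.
        If None, the classic MaxWeight weight (the queue length) is used.
        Otherwise the weight is the marginal value V_q(len) - V_q(len - 1),
        an assumed form used here only for illustration.
    Returns an array of length N with the chosen queue index per subchannel.
    """
    queue_len = np.asarray(queue_len)
    rate = np.asarray(rate, dtype=float)
    if per_queue_value is None:
        weight = queue_len.astype(float)
    else:
        values = np.asarray(per_queue_value, dtype=float)
        idx = np.arange(len(queue_len))
        prev_len = np.maximum(queue_len - 1, 0)   # an empty queue gets weight 0
        weight = values[idx, queue_len] - values[idx, prev_len]
    score = weight[:, None] * rate                # score[q, n] = weight[q] * rate[q, n]
    return score.argmax(axis=0)


queue_len = [1, 5]
rate = [[2.0, 1.0],
        [1.0, 1.0]]
# Hypothetical learned tables: queue 0 is far more delay-sensitive than queue 1.
value_tables = [[0, 10, 11, 12, 13, 14],
                [0, 1, 2, 3, 4, 5]]

by_length = maxweight_like_allocation(queue_len, rate)
by_value = maxweight_like_allocation(queue_len, rate, value_tables)
print("queue-length weights:", by_length, "value-function weights:", by_value)
```

With queue-length weights the longer queue wins both subchannels, while the value-based weights favour the short but costly queue, which is the kind of behavioural difference the per-queue value functions are meant to capture.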

The organization of the paper is as follows. In Section II, we recall the general network model for network assisted D2D communications as well as the MDP problem formulation for dynamic mode selection and resource allocation. In Section III, we derive a low complexity learning algorithm, which updates the per-queue value functions based on real-time observations of CSI and QSI, as well as a resource allocation algorithm with a structure similar to that of the MaxWeight algorithm. In Section IV, we discuss the performance simulations. Finally, we summarize the main results in Section V.

II. NETWORK MODEL AND PREVIOUS RESULTS

A. Network Model

Consider a Frequency Division Duplex (FDD) OFDMA cellular network with D2D communications capability, where there are $D$ D2D UE pairs, $C_u$ cellular UEs (CUEs) with uplink communications and $C_d$ CUEs with downlink communications in a single cell. A D2D UE pair consists of a source D2D UE (src. DUE) and a destination D2D UE (dest. DUE) within direct over-the-air communications range of each other, which is formed through the various neighbor/peer/service discovery mechanisms proposed in the literature. The whole uplink or downlink spectrum is divided into N equal size subchannels. A subchannel in the uplink (resp. downlink) spectrum shall be referred to as an uplink (resp. downlink) subchannel in the rest of the paper. Moreover, we assume that D2D links share uplink resources with cellular uplinks. Time is slotted and each time slot has an equal length.

The above OFDMA cellular network with D2D communications can be formulated as a general network model with a set $\mathcal{N}$ of nodes and a set $\mathcal{L}$ of transmission links. Define $\mathcal{N} := \{0, 1, \dots, N\}$, where node $0$ represents the base station (BS), nodes $1, \dots, 2D$ represent the DUEs, nodes $2D+1, \dots, 2D+C_u$ represent the uplink CUEs, and nodes $2D+C_u+1, \dots, N = 2D+C_u+C_d$ represent the downlink CUEs. We use $i$ or $j$ to denote the index of a node within $\mathcal{N}$ (i.e., $i, j \in \mathcal{N}$) in the rest of the paper. Each transmission link represents a communication channel for direct transmission from a given node $i$ to another node $j$, and is labeled by $(i, j)$ (where $i, j \in \mathcal{N}$). All data that enter the network are associated with a particular connection, which defines the source and destination of the data. Let $\mathcal{C}_D = \{1, \dots, D\}$, $\mathcal{C}_u = \{D+1, \dots, D+C_u\}$, and $\mathcal{C}_d = \{D+C_u+1, \dots, D+C_u+C_d\}$ represent the set of D2D connections, cellular uplink connections and cellular downlink connections, respectively. Define $\mathcal{C} := \{1, \dots, C\} = \mathcal{C}_D \cup \mathcal{C}_u \cup \mathcal{C}_d$ (with $C = D + C_u + C_d$) as the set of all connections in the network. We use $c$ to denote the index of a connection within $\mathcal{C}$ (i.e., $c \in \mathcal{C}$) in the rest of the paper.
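
The node and connection numbering described above can be sketched in a few lines. The parameter names `C_u` and `C_d` are assumed labels for the numbers of uplink and downlink CUEs, and the function name and dictionary keys are hypothetical.

```python
def build_index_sets(D, C_u, C_d):
    """Return node and connection index sets for D D2D pairs, C_u uplink CUEs, C_d downlink CUEs."""
    num_nodes = 2 * D + C_u + C_d          # highest node index; node 0 is the base station
    nodes = {
        "bs": [0],
        "due": list(range(1, 2 * D + 1)),
        "uplink_cue": list(range(2 * D + 1, 2 * D + C_u + 1)),
        "downlink_cue": list(range(2 * D + C_u + 1, num_nodes + 1)),
    }
    num_connections = D + C_u + C_d
    connections = {
        "d2d": list(range(1, D + 1)),
        "uplink": list(range(D + 1, D + C_u + 1)),
        "downlink": list(range(D + C_u + 1, num_connections + 1)),
    }
    return nodes, connections


nodes, connections = build_index_sets(D=2, C_u=1, C_d=1)
print(nodes)
print(connections)
```

Each D2D pair contributes two nodes but only one connection, which is why the node count uses 2D while the connection count uses D.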

The data from connection $c$ is transmitted hop by hop along the route(s) of the connection to its destination node. Each node $i$ along the route(s) of connection $c$, except for the destination node, maintains a queue $q_i^{(c)}$ for storing its data, since the data is considered to exit the network once it reaches the destination. Define $\Theta$ as the set of queues in the system. We assume each queue has a finite capacity of $N_Q < \infty$ (in number of bits or packets). The set of queues can be divided into two disjoint sets, i.e., uplink queues Θ
