
Contents lists available at ScienceDirect

Computers & Industrial Engineering

journal homepage: www.elsevier.com/locate/caie

A hybrid Bayesian network model for predicting delays in train operations

Javad Lessan a,c, Liping Fu a,b, Chao Wen b,c,⁎

a Department of Civil and Environmental Engineering, University of Waterloo, Waterloo N2L 3G1, Canada

b School of Transportation & Logistics, Southwest Jiaotong University, Chengdu, Sichuan 610031, China

c Railway Research Center, University of Waterloo, Waterloo N2L 3G1, Canada

ARTICLE INFO

Keywords: High-speed rail; Train operation; Punctuality; Bayesian networks; Delay prediction; Performance evaluation

ABSTRACT

We present a Bayesian network (BN)-based train delay prediction model to tackle the complexity and dependent nature of train operations. Three different BN schemes, namely heuristic hill-climbing, primitive linear, and hybrid structures, are investigated using real-world train operation data from a high-speed railway line. We first use historical data to rationalize the dependency graph of the developed structures. Each BN structure is then trained with the gold-standard k-fold cross-validation approach to avoid over-fitting and to evaluate its performance against the others. Overall, the validation results indicate that a BN-based model can be an efficient tool for capturing the superposition and interaction effects of train delays. However, a well-designed hybrid BN structure, developed based on domain knowledge and the judgment of experts and local authorities, can outperform the other models. We present a performance comparison of the predictions obtained from the hybrid BN structure against real-world benchmark data. The results show that the proposed model can, on average, achieve over 80% accuracy in predictions within a 60-min horizon, yielding low prediction errors in terms of the mean absolute error (MAE), mean error (ME) and root mean square error (RMSE) measures.

1. Introduction

A railway system comprises several subsystems, such as network infrastructure, rolling stock, control and communication, and various operational rules and policies, with the goal of providing reliable train services to transport passengers or goods. However, many uncertainties may arise from these subsystems that can disturb the planned activities and operations, resulting in unexpected delays (Wen et al., 2017). As a service complaint, train delays impose a huge cost on passengers and operators, contributing to the inefficiency of train operations (Van Oort, 2011). In the United Kingdom, for instance, 14 million train-minutes of delay were recorded during 2006–2007 on the British national rail network, costing passengers over £1 billion in lost time (Office, 2008). Consequently, reducing delays is of great importance to train operators and desirable to passengers (Marković, Milinković, Tikhonov, & Schonfeld, 2015). Specifically, the validity of all levels of railway operations planning, such as creating feasible and realizable timetables, predicting real-time traffic, predicting conflicts, and providing reliable passenger information, depends highly on the accurate estimation of train process times that are subject to delay incidents (Kecman & Goverde, 2015b, 2015a; Kecman, Corman, Peterson, & Joborn, 2015b). Therefore, delays should be predicted and compensated for in time; otherwise, the propagated delays may cause a disruption or a domino effect (Zhang, Li, & Yang, 2018). While some of the delay factors influencing train process times are predictable and controllable, most of them are neither controllable nor predictable, adding to the challenges of managing railway operations.

In real-world train operations, delay prediction relies heavily on the experience and intuition of a local dispatcher rather than a network-wide computational instrument (Martin, 2016). Given the complex structure of a railway network and interdependent train operations between a large set of origins and destinations, a local dispatcher’s estimation of delays and the subsequent decisions are strongly dependent on the state of traffic and the network, and are limited to a local geographical area. In large and dense network areas, however, the domain knowledge and expertise of local dispatchers must be supported by an advanced computational tool that can account for the interdependencies of train operations and interrelated delay factors. The creation of such an advanced tool has been hindered by two fundamental limitations. Firstly, methodologically, there has been a lack of models capable of simultaneously examining multiple components of delay incidents intertwined with stochastic operations and interaction effects. Secondly, technologically, there has been a need for the collection and incorporation of massive train operation data. Recently, the integration of graph and probability theories led to the introduction of Bayesian networks (BNs), which have enabled practitioners to overcome these limitations.

https://doi.org/10.1016/j.cie.2018.03.017

Received 3 September 2017; Received in revised form 5 March 2018; Accepted 9 March 2018

⁎ Corresponding author at: School of Transportation & Logistics, Southwest Jiaotong University, Chengdu, Sichuan 610031, China.

E-mail addresses: jlessan@uwaterloo.ca (J. Lessan), lfu@uwaterloo.ca (L. Fu), wenchao@swjtu.cn (C. Wen).


Specifically, BN methodology is a representational tool meant to capture complex structures and “organize one’s knowledge about a particular situation into a coherent whole” (Darwiche, 2009). At the same time, it allows for the incorporation of massive historical data in identifying the contingencies between multiple events and updating the state of different variables given real-time data. These features, combining different factors and fusing massive data, have given BNs an advantage over other artificial intelligence techniques.
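To make the idea of updating variable states given new data concrete, the following is a minimal sketch of a two-node discrete network, assuming a hypothetical structure DepartureDelay -> ArrivalDelay. The state names, the probability values and the function names are invented for illustration and are not taken from the paper or its data.

```python
# Hypothetical two-node Bayesian network: DepartureDelay -> ArrivalDelay.
# All probabilities below are invented for illustration only.
STATES = ("on_time", "late")

# Prior distribution P(DepartureDelay).
p_dep = {"on_time": 0.8, "late": 0.2}

# Conditional probability table P(ArrivalDelay | DepartureDelay),
# indexed as p_arr_given_dep[departure_state][arrival_state].
p_arr_given_dep = {
    "on_time": {"on_time": 0.9, "late": 0.1},
    "late": {"on_time": 0.3, "late": 0.7},
}


def arrival_marginal():
    """P(ArrivalDelay) before any observation, summing out the departure state."""
    return {
        a: sum(p_dep[d] * p_arr_given_dep[d][a] for d in STATES)
        for a in STATES
    }


def arrival_given_departure(observed_departure):
    """P(ArrivalDelay | DepartureDelay = observed_departure): a forward prediction."""
    return dict(p_arr_given_dep[observed_departure])


def departure_posterior(observed_arrival):
    """P(DepartureDelay | ArrivalDelay = observed_arrival) via Bayes' rule."""
    joint = {d: p_dep[d] * p_arr_given_dep[d][observed_arrival] for d in STATES}
    total = sum(joint.values())
    return {d: joint[d] / total for d in STATES}


if __name__ == "__main__":
    print(arrival_marginal())
    print(arrival_given_departure("late"))
    print(departure_posterior("late"))
```

Under these assumed numbers, observing a late departure raises the predicted probability of a late arrival from the marginal value to the corresponding row of the conditional table, which is the kind of update a larger network performs across many linked events.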

In this paper, we present three different BN design architectures, namely, a heuristic, a naive, and a hybrid method, to represent the relationship and superposition of interdependent variables identified in the delay chain of trains. Using information obtained from historical data, we rationalize the contingency graph of the proposed BN structures. Next, we apply the gold-standard k-fold cross-validation method to train and evaluate the proposed BNs. The hybrid BN structure, which performs better than the other models, is then tested against real-world benchmark data under different performance measures. To the best of our knowledge, this is the first hybrid BN-based delay prediction model introduced into the relevant prediction literature. The main idea behind the hybrid structure introduced here is to distinguish between the delay due to the most recently performed operation and the delay propagated from previous operations. The proposed ideas were made possible by examining the similarities and differences between the naive and heuristic structures, supported by the domain knowledge and expertise of local authorities. Our results can be generalized to similar problems in other networks in order to better support train dispatching and delay management decisions.
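One simple way to read the distinction between the delay added by the most recent operation and the delay propagated from earlier operations is sketched below. This is an illustrative assumption, not the authors' model: the function name `split_delays`, the dictionary keys, and the convention that the first event has no upstream delay are all hypothetical.

```python
def split_delays(event_delays):
    """Split each observed delay into a propagated part and a newly added part.

    event_delays: delays in minutes at consecutive events of one train
    (for example arrival, departure, arrival, ...).
    Assumption: the first event has no upstream delay, so its whole delay
    counts as newly added. A negative "added" value means time was recovered
    during the most recent operation.
    """
    parts = []
    previous = 0.0
    for delay in event_delays:
        parts.append({"propagated": previous, "added": delay - previous})
        previous = delay
    return parts


if __name__ == "__main__":
    # Hypothetical delays (minutes) at three consecutive events.
    for part in split_delays([2.0, 5.0, 4.0]):
        print(part)
```

In a BN setting, the two parts could be modelled as separate parent variables of the next event's delay; whether the paper's hybrid structure does exactly this cannot be determined from the text above.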

The remainder of this paper is structured as follows. The next section presents a brief overview of the related literature and summarizes our contributions. Section 3 provides the methodological framework and a formal description of the terms and concepts used in this study. Section 4 describes the historical data and our general assumptions, and continues with the training and validation results of the candidate BNs. Section 5 focuses on evaluating the performance of the hybrid BN model under different performance measures. Finally, conclusions and future research directions are presented in Section 6.
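The k-fold cross-validation procedure and the error measures named in the abstract (ME, MAE, RMSE) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the folds are contiguous and unshuffled, and the error is taken as predicted minus actual, a sign convention the text above does not specify.

```python
import math


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, unshuffled folds.

    Fold sizes differ by at most one when n_samples is not divisible by k.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def error_measures(actual, predicted):
    """Return ME, MAE and RMSE for paired actual and predicted delays.

    Assumption: error = predicted - actual, so a positive ME means the
    model over-predicts delay on average.
    """
    errors = [p - a for a, p in zip(actual, predicted)]
    n = len(errors)
    return {
        "ME": sum(errors) / n,
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
    }


if __name__ == "__main__":
    # Hypothetical delays in minutes, used only to exercise the functions.
    actual = [1.0, 2.0, 3.0]
    predicted = [2.0, 2.0, 1.0]
    print(error_measures(actual, predicted))
    for train_idx, test_idx in k_fold_indices(5, 2):
        print(train_idx, test_idx)
```

Each fold serves once as the held-out test set while the model is fitted on the remaining folds; averaging the error measures over the k test sets gives the cross-validated estimate.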

2. Literature review

Train timetables are traditionally scheduled using train motion equations with the input of the estimated running and dwelling times at individual stations and sections. To minimize the probability of schedule deviation in actual operations, the parameters of these equations are usually tuned or optimized based on historical train data (Kecman & Goverde, 2015b). However, these techniques are not adaptive, often failing to address the time-varying nature of train operation settings. For example, each new operational configuration would require re-optimizing the timetables, which is computationally expensive. Some of these drawbacks could be overcome by applying data-driven approaches and statistical models to estimate the process times based on various contributing factors (Kecman & Goverde, 2015b). The underlying problem is related to the practice of delay prediction, which has received considerable attention due to its vital importance to train operations management and passenger information provision (Meester & Muns, 2007).

A number of prediction models have been developed in the literature, which can be classified by their scope, model types and solution methods (Marković et al., 2015). Traditional methods such as regression models have been introduced to predict delays. However, these methods require frequent updates of train positions and rich data. Micro- and macro-level simulation tools have been applied to simulate delays at different levels of detail. The simulation models, developed based on fixed distributions, require frequent updates of train positions and real-time train data (Kecman et al., 2015b). The update requirements are mostly due to time-varying operational conditions and the interaction between different subsystems (stations, sections and trains) under the effects of infrastructure and operational rules. Yuan (2006) and Yuan, Goverde, and Hansen (2002) presented a delay prediction model that deals with the stochastic behavior and dependency of train delays and with delay propagation, in order to assess the stability and punctuality of a published timetable against primary delays. An artificial neural network model was proposed to predict the delay of passenger trains on Iranian Railways (Yaghini, Khoshraftar, & Seyedabadi, 2013). The accuracy of the proposed model was found to be superior to that of other statistical models such as decision tree and multinomial logistic regression methods. Peters, Emig, Jung, and Schmidt (2005) developed an intelligent real-time delay prediction model that predicts the delay of upstream or downstream trains based on the delays currently incurred in the network. The prediction accuracy of the proposed model was compared against a rule-based system that applies a set of predefined rules in a deterministic manner. However, these models are not flexible enough to incorporate the domain knowledge of experts and local dispatchers or the operational characteristics.

A generic statistical model for estimating the running and dwelling times was proposed by Kecman and Goverde (2015b). They presented three global predictive models based on advanced statistical learning techniques: robust linear regression, regression trees, and random forests. Moreover, based on the robust linear regression and some refinements, they calibrated local models for each particular train line, station or block section. The presented models were evaluated using an aggregated set of historical data at the level of block sections. In another effort, the real-time prediction of train delays was used to detect instabilities in the timetable and retrieve a feasible train schedule (Marković et al., 2015). Kecman (2014) also proposed a real-time delay prediction model based on historical arrival and departure data. Event graphs were used in Hansen, Goverde, and van der Meer (2010) to forecast running and arrival times. A stochastic model for delay propagation in large transportation networks was proposed by Berger, Gebhardt, Müller-Hannemann, and Ostrowski (2011) to process massive streams of real-time data. Similarly, these statistical models are not adaptive enough to incorporate the domain knowledge of local dispatchers and the characteristics of the network.

A model for real-time prediction of train delays using Bayesian reasoning can be found in Kecman, Corman, and Meng (2015a). They used two months of historical traffic realization data from the Swedish infrastructure manager in a simulated real-time environment. The computational results indicated that the predictions are reliable for horizons of up to 30 min. Their main assumption, however, is that the train orders and routes within the prediction horizon are known, which is often not the case in the real world. A Bayesian model for predicting the propagation of delays can be found in Kecman et al. (2015b), which uses real-time events based on their specific order. Martin (2016) proposed a prototype rail advisory system that applies a series of predictive reasoning and machine learning models to predict the effects of various disruptions. In addition, train movement data collected from infrastructure track occupation records, sensors in rolling stock, or mobile GPS devices were used by Flier, Graffagnino, and Nunkesser (2009) to find robust train paths. Marković et al. (2015) compared the performance of support vector regression and neural networks for analyzing passenger train arrival delays and the influence of infrastructure on arrival delays. Using numerous test instances, they showed that support vector regression outperforms the other models in predicting arrival delays. However, to date, which BN architectures are most valid and reliable for predicting train delays for each particular network structure has not been well studied.

Clearly, there is still a need for better predictive models that account for massive real-world train operation data and for the domain knowledge and expertise of local authorities. In this paper, we propose for the first time a hybrid BN-based predictive model for predicting arrival and departure delays, built upon testing different BN architectures, a wealth of train operation records, and domain-specific knowledge. The proposed model is easy to interpret and generalize while at the same time
