Foundations and Trends® in Machine Learning

Vol. 1, No. 4 (2008) 403–565

© 2009 S. Mahadevan

DOI: 10.1561/2200000003

Learning Representation and Control in Markov Decision Processes: New Frontiers

By Sridhar Mahadevan

Contents

1 Introduction
1.1 Motivation
1.2 Laplacian Operators
1.3 Dimensionality Reduction of MDPs
1.4 Roadmap to the Paper

2 Sequential Decision Problems
2.1 Markov Decision Processes
2.2 Exact Solution Methods
2.3 Simulation-Based Methods

3 Laplacian Operators and Markov Decision Processes
3.1 Laplacian Operators
3.2 Laplacian Matrices in MDPs
3.3 Generalized Inverses of the Laplacian
3.4 Positive-Semidefinite Laplacian Matrices

4 Approximating Markov Decision Processes
4.1 Linear Value Function Approximation
4.2 Least-Squares Approximation of a Fixed Policy
4.3 Approximation in Learning Control
4.4 Approximation Using Convex Optimization
4.5 Summary

5 Dimensionality Reduction Principles in MDPs
5.1 Low-Dimensional MDP Induced by a Basis
5.2 Formulating the Basis Construction Problem
5.3 Basis Construction Through Adaptive State Aggregation
5.4 Invariant Subspaces: Decomposing an Operator

6 Basis Construction: Diagonalization Methods
6.1 Diagonalization of the Laplacian of a Policy
6.2 Regularization Using Graph Laplacian Operators
6.3 Scaling to Large State Space Graphs

7 Basis Construction: Dilation Methods
7.1 Krylov Spaces
7.2 Reward Dilation Using Laplacian Operators
7.3 Reward Dilation Using Drazin Inverse of Laplacian
7.4 Schultz Expansion for Drazin and Krylov Bases
7.5 Multiscale Iterative Method to Compute Drazin Bases
7.6 Dilation and Multiscale Analysis
7.7 Diffusion Wavelets

8 Model-Based Representation Policy Iteration
8.1 Representation Policy Iteration: Drazin and Krylov Bases
8.2 Representation Policy Iteration: Diffusion Wavelets
8.3 Experimental Results

9 Basis Construction in Continuous MDPs
9.1 Continuous Markov Decision Processes
9.2 Riemannian Manifolds
9.3 Sampling Techniques
9.4 Learning Eigenfunctions Using Nyström Extension

10 Model-Free Representation Policy Iteration
10.1 Model-Free Representation Policy Iteration
10.2 Scaling to Large Discrete MDPs
10.3 Experimental Results of RPI in Continuous MDPs
10.4 Krylov-Accelerated Diffusion Wavelets

11 Related Work and Future Challenges
11.1 Related Work
11.2 Future Work
11.3 Summary

Acknowledgments

References

Sridhar Mahadevan

Department of Computer Science, University of Massachusetts — Amherst,

140 Governor’s Drive, Amherst, MA 01003, USA, mahadeva@cs.umass.edu

Abstract

This paper describes a novel machine learning framework for solving sequential decision problems called Markov decision processes (MDPs) by iteratively computing low-dimensional representations and approximately optimal policies. A unified mathematical framework for learning representation and optimal control in MDPs is presented based on a class of singular operators called Laplacians, whose matrix representations have nonpositive off-diagonal elements and zero row sums. Exact solutions of discounted and average-reward MDPs are expressed in terms of a generalized spectral inverse of the Laplacian called the Drazin inverse. A generic algorithm called representation policy iteration (RPI) is presented which interleaves computing low-dimensional representations and approximately optimal policies. Two approaches for dimensionality reduction of MDPs are described based on geometric and reward-sensitive regularization, whereby low-dimensional representations are formed by diagonalization or dilation of Laplacian operators. Model-based and model-free variants of the RPI algorithm are presented; they are also compared experimentally on discrete and continuous MDPs. Some directions for future work are finally outlined.
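The two matrix properties named in the abstract can be checked on a toy example. The sketch below is illustrative only: the 3-state transition matrix `P` is hypothetical, and the definition `L = I - P` for the Laplacian of a fixed policy is an assumption of this sketch, since the formal definitions are left to later sections.

```python
# Hypothetical 3-state transition matrix P of a fixed policy; each row sums to 1.
P = [
    [0.5, 0.5, 0.0],
    [0.25, 0.5, 0.25],
    [0.0, 0.5, 0.5],
]
n = len(P)

# Assumed definition for this sketch: L = I - P.
L = [[(1.0 if i == j else 0.0) - P[i][j] for j in range(n)] for i in range(n)]

# Property 1: every off-diagonal entry of L is nonpositive.
nonpositive_off_diagonal = all(
    L[i][j] <= 0 for i in range(n) for j in range(n) if i != j
)

# Property 2: every row of L sums to zero.
row_sums = [sum(row) for row in L]
zero_row_sums = all(abs(s) < 1e-12 for s in row_sums)

# Zero row sums mean L maps the all-ones vector to the zero vector,
# so L has a nontrivial null space and is therefore singular.
L_times_ones = [sum(L[i][j] * 1.0 for j in range(n)) for i in range(n)]

print(nonpositive_off_diagonal, zero_row_sums, L_times_ones)
```

Because such a matrix is singular, it has no ordinary inverse, which is consistent with the abstract's appeal to a generalized inverse.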

1 Introduction

In this section, we introduce the problem of representation discovery in sequential decision problems called Markov decision processes (MDPs), whereby the aim is to solve MDPs by automatically finding “low-dimensional” descriptions of “high-dimensional” functions on a state (action) space. The functions of interest include policy functions specifying the desired action to take, reward functions specifying the immediate payoff for taking a particular action, transition distributions describing the stochastic effects of doing actions, as well as value functions that represent the long-term sum of rewards of acting according to a given policy. Our aim is to illustrate the major ideas in an informal setting, leaving more precise definitions to later sections. The concept of a Laplacian operator is introduced, and its importance to MDPs is explained. The general problem of dimensionality reduction in MDPs is discussed. A roadmap to the remainder of the paper is also provided.
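To make the notion of a value function concrete, the following minimal sketch evaluates a fixed policy on a toy chain by repeatedly applying the update V ← R + γPV. It is an illustration under stated assumptions, not the method of this paper: the function name `evaluate_policy`, the 2-state chain, and the discount factor γ = 0.9 are all hypothetical choices, and discounting is only one of the reward criteria considered later.

```python
def evaluate_policy(P, R, gamma=0.9, tol=1e-10, max_iters=10_000):
    """Iteratively compute the discounted value function of a fixed policy.

    P[s][t] is the probability of moving from state s to state t under the
    policy, and R[s] is the expected immediate reward in state s.
    """
    n = len(R)
    V = [0.0] * n
    for _ in range(max_iters):
        # One backup: immediate reward plus discounted expected next value.
        V_new = [
            R[s] + gamma * sum(P[s][t] * V[t] for t in range(n))
            for s in range(n)
        ]
        if max(abs(a - b) for a, b in zip(V_new, V)) < tol:
            return V_new
        V = V_new
    return V


# Hypothetical 2-state chain: state 1 is absorbing and pays reward 1 per step.
P = [[0.5, 0.5], [0.0, 1.0]]
R = [0.0, 1.0]
V = evaluate_policy(P, R)
print(V)
```

Here each entry of `V` is one number per state, so the value function is a vector whose length equals the number of states; this is the sense in which such functions become "high-dimensional" on large state spaces.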

1.1 Motivation

A variety of problems of practical interest to researchers across a diverse range of areas, from artificial intelligence (AI) [117] to operations

