Torque Resource Manager Administrator Guide 5.1.3
Uploaded 2024-02-16 17:46:46 · PDF, 2.13 MB · 352 pages
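The guide for Torque version 5.1.3. It describes Torque installation and deployment in detail, serves as a reference for cluster users, and helps troubleshoot installation and usage problems.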
TORQUE Resource Manager
Administrator Guide 5.1.3
May 2016
© 2016 Adaptive Computing Enterprises, Inc. All rights reserved.
Distribution of this document for commercial purposes in either hard or soft copy form is strictly prohibited without prior
written consent from Adaptive Computing Enterprises, Inc.
Adaptive Computing, Cluster Resources, Moab, Moab Workload Manager, Moab Viewpoint, Moab Cluster Manager, Moab
Cluster Suite, Moab Grid Scheduler, Moab Grid Suite, Moab Access Portal, and other Adaptive Computing products are either
registered trademarks or trademarks of Adaptive Computing Enterprises, Inc. The Adaptive Computing logo and the Cluster
Resources logo are trademarks of Adaptive Computing Enterprises, Inc. All other company and product names may be
trademarks of their respective companies.
Adaptive Computing Enterprises, Inc.
1712 S. East Bay Blvd., Suite 300
Provo, UT 84606
+1 (801) 717-3700
www.adaptivecomputing.com
Welcome To TORQUE Resource Manager
TORQUE Administrator Guide Overview
Chapter 1 Introduction
Chapter 2 Installation And Configuration
TORQUE Installation Overview
TORQUE Architecture
Installing TORQUE
Compute Nodes
Enabling TORQUE As A Service
Initializing/Configuring TORQUE On The Server (pbs_server)
Specifying Compute Nodes
Configuring TORQUE On Compute Nodes
Configuring Ports
Configuring Trqauthd For Client Commands
Finalizing Configurations
Advanced Configuration
Customizing The Install
Server Configuration
Setting Up The MOM Hierarchy (Optional)
Manual Setup Of Initial Server Configuration
Server Node File Configuration
Basic Node Specification
Specifying Virtual Processor Count For A Node
Specifying GPU Count For A Node
Specifying Node Features (Node Properties)
Testing Server Configuration
TORQUE On NUMA Systems
TORQUE NUMA Configuration
Building TORQUE With NUMA Support
TORQUE Multi-MOM
Multi-MOM Configuration
Stopping Pbs_mom In Multi-MOM Mode
Chapter 3 Submitting And Managing Jobs
Job Submission
Multiple Job Submission
Managing Multi-Node Jobs
Requesting Resources
Requesting Generic Resources
Requesting Floating Resources
Requesting Other Resources
Exported Batch Environment Variables
Enabling Trusted Submit Hosts
Example Submit Scripts
Job Files
Monitoring Jobs
Canceling Jobs
Job Preemption
Keeping Completed Jobs
Job Checkpoint And Restart
Introduction To BLCR
Configuration Files And Scripts
Starting A Checkpointable Job
Checkpointing A Job
Restarting A Job
Acceptance Tests
Job Exit Status
Service Jobs
Submitting Service Jobs
Submitting Service Jobs In MCM
Managing Service Jobs
Chapter 4 Managing Nodes
Adding Nodes
Node Properties
Changing Node State
Changing Node Power States
Host Security
Linux Cpuset Support
Scheduling Cores
Geometry Request Configuration
Geometry Request Usage
Geometry Request Considerations
Scheduling Accelerator Hardware
Chapter 5 Setting Server Policies
Queue Configuration
Queue Attributes
Example Queue Configuration
Setting A Default Queue
Mapping A Queue To Subset Of Resources
Creating A Routing Queue
Server High Availability
Setting Min_threads And Max_threads
Chapter 6 Integrating Schedulers For TORQUE
Chapter 7 Configuring Data Management
SCP Setup
Generating SSH Key On Source Host
Copying Public SSH Key To Each Destination Host
Configuring The SSH Daemon On Each Destination Host
Validating Correct SSH Configuration
Enabling Bi-Directional SCP Access
Compiling TORQUE To Support SCP
Troubleshooting
NFS And Other Networked Filesystems
File Stage-in/stage-out
Chapter 8 MPI (Message Passing Interface) Support
MPICH
Open MPI
Chapter 9 Resources
Chapter 10 Accounting Records
Chapter 11 Job Logging
Job Log Location And Name
Enabling Job Logs
Chapter 12 Troubleshooting
Automatic Queue And Job Recovery
Host Resolution
Firewall Configuration
TORQUE Log Files
Using "tracejob" To Locate Job Failures
Using GDB To Locate Job Failures
Other Diagnostic Options
Stuck Jobs
Frequently Asked Questions (FAQ)
Compute Node Health Check
Configuring MOMs To Launch A Health Check
Creating The Health Check Script