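The Torque 6.1.3 Administrator Guide. It describes Torque installation and deployment in detail, serves as a reference for cluster users, and covers troubleshooting of installation and usage problems.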
Torque Resource Manager
Administrator Guide 6.1.3
February 2021
© 2018, 2021 Adaptive Computing Enterprises, Inc. All rights reserved.
This documentation and related software are provided under a license agreement containing
restrictions on use and disclosure and are protected by intellectual property laws. Except as
expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce,
translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any
part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this
software, unless required by law for interoperability, is prohibited.
This documentation and related software may provide access to or information about content,
products, and services from third parties. Adaptive Computing is not responsible for and expressly
disclaims all warranties of any kind with respect to third-party content, products, and services
unless otherwise set forth in an applicable agreement between you and Adaptive Computing.
Adaptive Computing will not be responsible for any loss, costs, or damages incurred due to your
access to or use of third-party content, products, or services, except as set forth in an applicable
agreement between you and Adaptive Computing.
Adaptive Computing, Moab®, Moab HPC Suite, Moab Viewpoint, Moab Grid, NODUS Cloud OS™, and
other Adaptive Computing products are either registered trademarks or trademarks of Adaptive
Computing Enterprises, Inc. The Adaptive Computing logo is a trademark of Adaptive Computing
Enterprises, Inc. All other company and product names may be trademarks of their respective
companies.
The information contained herein is subject to change without notice and is not warranted to be
error free. If you find any errors, please report them to us in writing.
Adaptive Computing Enterprises, Inc.
1100 5th Avenue South, Suite #201
Naples, FL 34102
+1 (239) 330-6093
www.adaptivecomputing.com
Contents
Welcome 14
Chapter 1: Introduction 15
1.1 Torque Administrator Guide Overview 15
1.2 Getting Started 17
1.2.1 What Is A Resource Manager? 17
1.2.2 What Are Batch Systems? 17
1.2.3 Basic Job Flow 18
Chapter 2: Installation And Configuration 20
2.1 Torque Installation Overview 22
2.2 Basic Server Configuration 23
2.2.1 Server Configuration File (serverdb) 23
2.2.2 ./torque.setup 23
2.2.3 pbs_server -t create 24
2.2.4 Setting Up The Environment For pbs_server And pbs_mom 24
2.3 Torque Architecture 26
2.4 Installing Torque Resource Manager 27
2.4.1 Requirements 27
2.4.2 Open Necessary Ports 28
2.4.3 Install Dependencies, Packages, Or Clients 29
2.4.4 Install Torque Server 31
2.4.5 Install Torque MOMs 33
2.4.6 Install Torque Clients 34
2.4.7 Configure Data Management 35
2.5 Compute Nodes 37
2.6 Enabling Torque As A Service 39
2.7 Initializing/Configuring Torque On The Server (pbs_server) 40
2.8 Specifying Compute Nodes 41
2.9 Configuring Torque On Compute Nodes 43
2.10 Configuring Ports 44
2.10.1 Configuring Torque Communication Ports 44
2.10.2 Changing Default Ports 48
2.11 Configuring trqauthd For Client Commands 51
2.12 Finalizing Configurations 53
2.13 Advanced Configuration 54
2.14 Customizing The Install 55
2.14.1 HAVE_WORDEXP 62
2.15 Server Configuration 63
2.15.1 Server Configuration Overview 63
2.15.2 Name Service Configuration 63
2.15.3 Configuring Job Submission Hosts 63
2.15.4 Configuring Torque On A Multi-Homed Server 65
2.15.5 Architecture Specific Notes 65
2.15.6 Specifying Non-Root Administrators 65
2.15.7 Setting Up Email 65
2.15.8 Using MUNGE Authentication 66
2.16 Setting Up The MOM Hierarchy (Optional) 68
2.16.1 MOM Hierarchy Example 68
2.16.2 Setting Up The MOM Hierarchy 70
2.16.3 Putting The MOM Hierarchy On The MOMs 70
2.17 Opening Ports In A Firewall 72
2.17.1 Red Hat 6-Based Systems 72
2.17.2 Red Hat 7-Based Systems 72
2.17.3 SUSE 11-Based Systems 73
2.17.4 SUSE 12-Based Systems 73
2.18 Port Reference 74
2.19 Manual Setup Of Initial Server Configuration 80
2.20 Server Node File Configuration 82
2.21 Basic Node Specification 83
2.22 Specifying Virtual Processor Count For A Node 84
2.23 Specifying GPU Count For A Node 85
2.24 Specifying Node Features (Node Properties) 86
2.25 Testing Server Configuration 87
2.26 Configuring Torque For NUMA Systems 89
2.27 Torque NUMA-Aware Configuration 90
2.27.1 About Cgroups 90
2.27.2 Prerequisites 90
2.27.3 Installation Instructions 91
2.27.4 Multiple Cgroup Directory Configuration 92
2.28 Torque NUMA-Support Configuration 93
2.28.1 Configure Torque For NUMA-Support 93
2.28.2 Create The mom.layout File 93
2.28.3 Configure The server_priv/nodes File 96
2.28.4 Limit Memory Resources (Optional) 97
2.29 Torque Multi-MOM 98
2.30 Multi-MOM Configuration 99
2.30.1 Configure server_priv/nodes 99
2.30.2 Edit The /etc/hosts File 99
2.30.3 Start pbs_mom With Multi-MOM Options 100
2.31 Stopping pbs_mom In Multi-MOM Mode 101
Chapter 3: Submitting And Managing Jobs 102
3.1 Job Submission 104
3.2 Multiple Job Submission 106
3.2.1 Submitting Job Arrays 106
3.2.2 Slot Limit 107
3.3 Managing Multi-Node Jobs 108
3.4 Requesting Resources 109
3.4.1 Native Torque Resources 109
3.4.2 Interpreting Resource Requests 116
3.4.3 Interpreting Node Requests 116
3.4.4 Moab Job Extensions 117
3.5 Requesting NUMA-Aware Resources 119
3.6 Requesting Generic Resources 120
3.7 Requesting Floating Resources 121
3.8 Requesting Other Resources 122
3.9 Exported Batch Environment Variables 123
3.10 Enabling Trusted Submit Hosts 125
3.11 Example Submit Scripts 126
3.12 Job Files 127
3.13 Monitoring Jobs 129
3.14 Canceling Jobs 131
3.15 Job Preemption 132
3.16 Keeping Completed Jobs 133
3.17 Job Checkpoint And Restart 134
3.18 Introduction To BLCR 135
3.19 Configuration Files And Scripts 136
3.20 Starting A Checkpointable Job 141
3.21 Checkpointing A Job 143
3.22 Restarting A Job 144
3.22.1 Restarting A Job In The Held State 144
3.22.2 Restarting A Job In The Completed State 144
3.23 Acceptance Tests 145
3.24 Job Exit Status 146
3.25 Torque Process Tracking 150
3.25.1 Default Process Tracking 150