DATA CENTER
SAN Fabric Administration
Best Practices Guide
Support Perspective
A high-level guide focusing on the tools needed to proactively
configure, monitor, and manage the Brocade Fibre Channel
Storage Area Network infrastructure.
SAN Administration BEST PRACTICES
Brocade SAN Administration Best Practices – Support Perspective 2 of 21
CONTENTS
Introduction
Audience and Scope
Brocade Tool Set
Evolution of the Enterprise Data Center
SAN Administrator Dilemma
Fabric Configuration
  Fabricwide Parameters
  Fill Word (Condor 2 8 Gbps platform only)
  Bottleneck Detection
  Edge Hold Time
  Debug Log Level Settings
  Brocade Fabric Watch
  Zoning
  Advanced Zoning Considerations
  Zoning Recommendations
  Firmware Management
  Firmware Recommendations
Routing Policies
  Port-Based Routing
  Exchange-Based Routing
  Dynamic Load Sharing
  Lossless Dynamic Load Sharing
  In-Order Delivery (IOD)
Fabric Diagnostics
  Device Latency
  Faulty Media
Data Collection for Support
Appendix A: Configuring Port Fencing
Appendix B: Terminology
Appendix C: References
  Software and Hardware Product Documentation
  Technical Briefs
  Brocade Compatibility and Support
  Brocade Scalability Guidelines
  Brocade SAN Health
  Brocade Bookshelf
  Other
INTRODUCTION
For over 15 years, Brocade has been developing and installing Fibre Channel (FC) Storage Area Networks (SANs) and
training customers on them, and over time has developed deep technical knowledge in administering SANs. This
high-level document draws on Brocade experience, products, and features to present best-practice guidelines for
configuring, monitoring, managing, and diagnosing a Brocade-based SAN infrastructure.
The guidelines in this document will not apply to every environment, but they will help guide you through the tools
you need for successful administration of SAN fabrics. Please consult your Brocade sales representative or Brocade
SE for details about the hardware and software products and features described in this document.
Note: This is a “living” document that is continuously being expanded, so be sure to check MyBrocade
(my.brocade.com) frequently for the latest update of this and other best practice documents. Future releases of this
document will cover additional topics such as best practices for routed fabrics, Dense Wavelength-Division Multiplexing
(DWDM) connections, and access gateways. Refer to documents in the reference section for further details on the
features and tools discussed in this guide. Refer to the SAN Design and Best Practices Guide for optimal design
principles.
AUDIENCE AND SCOPE
This document is intended for SAN fabric administrators (including storage and network administrators), Brocade
certified Systems Engineers, IT architects, and System Integrators who provide value-added management solutions
based on the latest product releases from Brocade.
The scope of this document is to address common issues faced by administrators in managing their SANs. The goal
is to reduce the time needed for troubleshooting and dealing with application anomalies by using available tools to
minimize fabricwide disruptions. The details outlined in this document are for 8 Gbps and 16 Gbps devices only.
Note: The features and functions covered in this document apply only to Brocade® Fabric OS®-based products. This
document is not a replacement for product-specific manuals or detailed training on Brocade Fabric OS (FOS) or
Brocade Network Advisor.
BROCADE TOOL SET
Brocade has built an extensive set of SAN administration, usability, and RAS (Reliability, Availability, and
Serviceability) features into the product line, including ASICs, Brocade FOS, Brocade Network Advisor, Brocade SAN
Health®, and the Brocade SAN Health Professional management tool.
Brocade FOS: Brocade FOS has evolved through six generations of Fibre Channel speed transitions to provide a
highly resilient platform for building next-generation Storage Area Network products. The operating system has
evolved to provide two options for deployment. For very risk-averse customers running mission-critical applications,
where stability and uptime are critical, upgrading within the minor release train with RAS improvements is the best
option. Customers who want to take advantage of Brocade innovations in new products and features can continue to
leverage the latest Brocade FOS release.
These are the key RAS features for the Target Path release:
• Credit recovery
• Bottleneck detection
• Port fencing
These are the key SAN resiliency features on the latest major Brocade FOS 7.0.x release:
• Credit loss detection and automatic recovery (Inter-Switch Link [ISL] and backend ports), including
stuck Virtual Channels (VC)
• C3 discard frame logging and viewing
• Forward Error Correction on all 16 gigabit (Gbit) and 10 Gbit links
• In-flight encryption and compression
• D_Port support
• Advanced SFP monitoring (thresholds based on SFP type)
• E_Port Top Talkers on 16 Gbps ISLs
• Access Gateway N_Port monitoring
• Duplicate World Wide Name (WWN) detection and resolution
Refer to the Brocade Fabric OS v7.0.x Release Notes, Brocade Fabric OS Administrator’s Guide, and Brocade Fabric
OS Command Reference Guide supporting Brocade Fabric OS v7.0.x for details on these new features.
Brocade Network Advisor and DCFM offer comprehensive monitoring and management support across multiple
Brocade SAN, IP, and converged network fabrics. These applications equip administrators with configuration, zoning,
visualization, analysis, and troubleshooting tools. Only Brocade Network Advisor is supported for management of
switches operating with Brocade FOS v7.0 and later firmware versions.
Brocade SAN Health provides an accurate view of the SAN environment with fabric topologies and detailed
performance metrics. Brocade SAN Health audit reports provide detailed color-coded hierarchical SAN insights from
Brocade FOS, Brocade M-EOS, the Brocade Mi10K Director, and Cisco MDS switches. The tool supports discovery
and reporting of both open systems and FICON fabrics.
EVOLUTION OF THE ENTERPRISE DATA CENTER
Fibre Channel-based SANs have evolved over the past 10 years, from SAN islands to a highly consolidated and
complex infrastructure driven by server virtualization and high capacity storage arrays. Diverse workloads and traffic
profiles going through the core network present a challenge in addressing intermittent anomalies in the fabric.
Fabric usage has also changed. There are more high-availability clusters, such as IBM HACMP, VMware, and
Microsoft Windows. Workloads have also become much more complex. Instead of simple host-to-target port pairs, you
now see hypervisors such as VMware vSphere, Windows Hyper-V, and IBM VIOS servicing large numbers of
virtualized hosts. This makes it much more difficult to isolate the cause when application performance
degrades.
Storage virtualization has created its own special I/O requirements, adding a degree of complexity to the I/O complex
previously unseen outside of very complex mainframe environments.
All this has a serious impact on storage—particularly fabric—problem determination. There are more entities to
manage such as Logical Unit Numbers (LUNs), hosts, storage, and virtual machines (VMs), and more potential
problems. Also, the operational environment is much more difficult to troubleshoot than it was even a few years ago.
Rogue or badly behaving devices have much more impact on the production environment than they did previously, and
management tools have not kept up with the changes.
There is an increase in virtualized hosts running in hypervisor clusters and accessing virtualized storage, which can
put a strain on the storage infrastructure, especially when there is a high number of virtual hosts per
physical server, all accessing the same storage infrastructure.
Many of the new behaviors induced by innovations in workload and storage infrastructures have generated a
corresponding difference in fabric traffic patterns and fabric manageability. For example, there is a significant
increase in very short frames, such as those encapsulating SCSI reserves and in-band Fibre Channel control frames
used by workload and storage virtualization products.
N_Port ID Virtualization (NPIV) hides flow information that was previously reported on individual ports.
The result of all this change is a growing number of application performance issues that seem to be
associated with storage performance in some way but that cannot be identified precisely enough for corrective action
to be taken.
SAN ADMINISTRATOR DILEMMA
When application performance problems become obvious, the SAN complex is frequently blamed. SAN
administrators usually have no metrics that might point to some other component in the infrastructure.
Frequently, the result is very long delays before the culprit or culprits are identified and measures are taken to
address the problem. The impact of such problems ranges from an inconvenience to a massive outage, where
mission-critical application availability is compromised and the enterprise is seriously affected. The experience is
never a positive one.
Brocade recognized the need for improved monitoring and problem determination aids and started a series of
initiatives to address the problem of relevant performance and problem determination metrics in the fabric.
Bottleneck detection is one of the first deliverables of this work. Bottleneck detection is designed to positively
identify bottlenecks in the fabric.
Two types of bottlenecks are detected:
• Bandwidth-based bottlenecks are determined by high link utilization. These are called congestion
bottlenecks. Congestion bottlenecks are relatively easy to detect and, in effect, can be detected by
other Brocade products such as Brocade Fabric Watch. Bottleneck detection provides an alternative
mechanism and more information about the congestion.
• Device latency-based bottlenecks, called latency bottlenecks, are much more difficult to detect. This is
the primary focus of bottleneck detection, and the focus of much of the remainder of this section.
Latency detection is frame-based and identifies buffer credit problems. One of the major strengths of Fibre Channel is
that it creates lossless connections by implementing a flow control scheme based on buffer credits. The
disadvantage of such an approach is that the number of available buffers is limited and may eventually be totally
consumed.
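As a rough illustration of the credit scheme described above, the following Python sketch models a single link on which the sender spends one credit per frame and the receiver returns one credit for each buffer it frees. The function name `simulate_link`, its parameters, and the tick-based timing are assumptions made for this example only; they do not represent Brocade FOS behavior or any Brocade interface.

```python
def simulate_link(credits, frames, send_per_tick, drain_per_tick):
    """Toy model of buffer-credit flow control on one link (illustrative only).

    credits        -- receive buffers advertised to the sender
    frames         -- total frames the sender wants to transmit
    send_per_tick  -- frames the sender could transmit per tick with enough credits
    drain_per_tick -- buffers the receiver can free (and return as credits) per tick
    """
    if min(credits, send_per_tick, drain_per_tick) < 1:
        raise ValueError("credits, send_per_tick and drain_per_tick must be >= 1")

    available = credits          # credits currently held by the sender
    buffered = 0                 # frames occupying receiver buffers
    sent = 0
    ticks = 0
    credit_limited_ticks = 0     # ticks in which credits, not the link, limited sending
    max_buffered = 0

    while sent < frames:
        ticks += 1
        # Receiver frees buffers and returns one credit per freed buffer.
        freed = min(drain_per_tick, buffered)
        buffered -= freed
        available += freed
        # Sender transmits only as many frames as it holds credits for.
        wanted = min(send_per_tick, frames - sent)
        n = min(wanted, available)
        if n < wanted:
            credit_limited_ticks += 1
        available -= n
        buffered += n
        sent += n
        max_buffered = max(max_buffered, buffered)

    return {
        "ticks": ticks,
        "credit_limited_ticks": credit_limited_ticks,
        "max_buffered": max_buffered,
    }


fast = simulate_link(credits=4, frames=20, send_per_tick=2, drain_per_tick=2)
slow = simulate_link(credits=4, frames=20, send_per_tick=2, drain_per_tick=1)
print(fast)
print(slow)
```

In this model no frame is ever sent without a free buffer waiting for it, so nothing is dropped. The cost is that a receiver that drains slowly exhausts the sender's credits and the transfer takes longer.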
The temporary unavailability of buffer credits creates a temporary bottleneck. The longer the credits are unavailable,
the more serious the bottleneck. Whereas temporary credit unavailability is expected in normal Fibre Channel
operation, the longer durations are of most concern.
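The distinction between the two bottleneck types, together with the point that the duration of credit unavailability is what matters, can be sketched as a simple classifier. This is only an illustration: the `PortSample` fields and both thresholds are hypothetical values chosen for the example, and they are not the criteria used by Brocade bottleneck detection.

```python
from dataclasses import dataclass


@dataclass
class PortSample:
    """One sampling interval for a port (all fields are illustrative)."""
    tx_utilization: float        # fraction of link bandwidth in use, 0.0 to 1.0
    zero_credit_fraction: float  # fraction of the interval spent with no transmit credits


# Hypothetical thresholds, chosen only for this example.
CONGESTION_UTILIZATION = 0.95
LATENCY_ZERO_CREDIT = 0.10


def classify(sample: PortSample) -> str:
    """Label a sample as a congestion bottleneck, a latency bottleneck, or neither."""
    # Congestion: the link itself is close to saturated.
    if sample.tx_utilization >= CONGESTION_UTILIZATION:
        return "congestion"
    # Latency: the link has spare bandwidth, but frames waited a long time for credits.
    if sample.zero_credit_fraction >= LATENCY_ZERO_CREDIT:
        return "latency"
    # Brief credit unavailability is treated as normal operation.
    return "none"


print(classify(PortSample(tx_utilization=0.97, zero_credit_fraction=0.00)))
print(classify(PortSample(tx_utilization=0.30, zero_credit_fraction=0.40)))
print(classify(PortSample(tx_utilization=0.30, zero_credit_fraction=0.01)))
```

The ordering of the checks reflects the idea that a latency bottleneck is interesting precisely when the link is not saturated: frames are waiting on credits rather than on bandwidth.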
Long periods without buffer credits are typically manifested as performance problems and are usually the result of
device latencies. Exceptional situations cause fabric back pressure that can extend all the way across the fabric and
back. Excessive back pressure can create serious problems in an operational SAN.
Chronic back pressure can exacerbate the effect of hardware failures and misbehaving devices and can also
contribute to serious operational issues, because existing bottlenecks increase the probability of a
failure.
There are several common sources of high latencies:
• Storage ports (targets) often produce latencies that can slow down applications, because they do not
deliver data at the rate expected by the host platform. Even well-architected storage array performance
can deteriorate over time. For example, LUN provisioning policies such as allocating too many LUNs
behind a given port can contribute to poor performance of the storage, if the control processor in the
array cannot deliver data from all the LUNs quickly enough to satisfy read requests. The overhead of
dealing with a very large number of LUNs may cause slow delivery.
• Hosts (initiators) may also produce significant latencies by requesting more data than they are capable
of processing in a timely manner.