About SubnetManager and MTU

What I learned is that I don't understand anything!!

About SubnetManager

http://www.mellanox.com/related-docs/prod_software/MLNX_VPI_WinOF_User_Manual_v4.95.pdf
3.2.2 OpenSM – Subnet Manager
For the details, please search the web...

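There were three things I could not figure out this time.
- Whether the MTU of the ESXi vSwitch and VMkernel needs to be changed.
- How to change the settings.
- Whether a connection fails when the SM and the individual IB ports use different MTUs, and what the default values are.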

I investigated all sorts of things,
and only ended up more confused.

To my followers:
sorry, even though you kept me company through all this...

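On Windows you will most likely be using the Mellanox build of WinOF, so I looked into that.
Opening the ordinary Windows network settings and checking the device configuration of the
Mellanox IPoIB
adapter
showed an MTU of 4092 (apparently 4 bytes are subtracted from 4096).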
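
The 4-byte difference is consistent with the 4-byte encapsulation header that IPoIB puts in front of each IP packet. A minimal sketch of that arithmetic follows. The table maps the IB MTU codes (the number shown in parentheses by tools such as ibv_devinfo) to bytes, and `ipoib_mtu` is a hypothetical helper name used only for illustration.

```python
# IB MTU codes as shown in parentheses by tools such as ibv_devinfo ("4096 (5)").
IB_MTU_BYTES = {1: 256, 2: 512, 3: 1024, 4: 2048, 5: 4096}

# IPoIB prepends a 4-byte encapsulation header to every IP packet.
IPOIB_HEADER_BYTES = 4


def ipoib_mtu(ib_mtu_code: int) -> int:
    """Return the largest IP packet that fits in one IB packet of this MTU code."""
    return IB_MTU_BYTES[ib_mtu_code] - IPOIB_HEADER_BYTES


if __name__ == "__main__":
    # Print code, IB MTU in bytes, and the resulting IPoIB MTU.
    for code in sorted(IB_MTU_BYTES):
        print(code, IB_MTU_BYTES[code], ipoib_mtu(code))
```

With code 5 (4096 bytes) this gives the 4092 shown in the adapter properties.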

With a default installation, the Windows SubnetManager is located here:
C:\Program Files\Mellanox\MLNX_VPI\IB\Tools\opensm.exe

To register it as a service:

sc create OpenSM binPath= "c:\Program Files\Mellanox\MLNX_VPI\IB\Tools\opensm.exe --service" start= auto
sc start OpenSM

Something like that.
A common mistake is writing "-service" with a single hyphen, which does not work, so be careful...
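
If the registration is scripted, the single-hyphen mistake can be caught before anything is run. The sketch below only builds and checks the argument list for the sc create command shown above and does not execute it. `build_sc_create` and `check_service_flag` are hypothetical helper names, and the path is the default install location mentioned earlier.

```python
OPENSM_EXE = r"c:\Program Files\Mellanox\MLNX_VPI\IB\Tools\opensm.exe"


def build_sc_create(service_name: str = "OpenSM", exe: str = OPENSM_EXE) -> list[str]:
    """Build the argument list for `sc create`, mirroring the command above."""
    # sc.exe takes "binPath=" and "start=" as separate tokens followed by their values.
    bin_path = f"{exe} --service"
    return ["sc", "create", service_name, "binPath=", bin_path, "start=", "auto"]


def check_service_flag(bin_path: str) -> bool:
    """True only if the binPath ends with the double-hyphen --service flag."""
    return bin_path.split()[-1] == "--service"


if __name__ == "__main__":
    args = build_sc_create()
    print(args)
    print(check_service_flag(args[4]))
```

The list could then be handed to subprocess.run on a Windows machine with administrator rights. That step is left out here on purpose.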

To delete it:

sc qc OpenSM
sc delete OpenSM

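The service disappears after a reboot. (sc qc only displays the current service configuration; sc delete is what removes it.)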

There are lots of options:

>"C:\Program Files\Mellanox\MLNX_VPI\IB\Tools\opensm.exe" -h
-------------------------------------------------
OpenSM 3.3.11 UMAD
Command Line Arguments:

------- OpenSM - Usage and options ----------------------
Usage:   opensm [options]
Options:
--version
          Prints OpenSM version and exits.

--config, -F <file-name>
          The name of the OpenSM config file. When not specified
          %ProgramFiles%\OFED\OpenSM\opensm.conf will be used (if exists).

--create-config, -c <file-name>
          OpenSM will dump its configuration to the specified file and exit.
          This is a way to generate OpenSM configuration file template.

A lot more comes out along these lines.

So, what do the default settings actually look like? I dumped them to find out.

>"C:\Program Files\Mellanox\MLNX_VPI\IB\Tools\opensm.exe" -c opensm.conf
-------------------------------------------------
OpenSM 3.3.11 UMAD
Command Line Arguments:
 Creating config file template 'opensm.conf'.
 Log File: %windir%\temp\osm.log
-------------------------------------------------

#
# DEVICE ATTRIBUTES OPTIONS
#
# The port GUID on which the OpenSM is running
guid 0x0000000000000000

# M_Key value sent to all ports qualifying all Set(PortInfo)
m_key 0x0000000000000000

# The lease period used for the M_Key on this subnet in [sec]
m_key_lease_period 0

# SM_Key value of the SM used for SM authentication
sm_key 0x0000000000000001

# SM_Key value to qualify rcv SA queries as 'trusted'
sa_key 0x0000000000000001

# Note that for both values above (sm_key and sa_key)
# OpenSM version 3.2.1 and below used the default value '1'
# in a host byte order, it is fixed now but you may need to
# change the values to interoperate with old OpenSM running
# on a little endian machine.

# Subnet prefix used on this subnet
subnet_prefix 0xfe80000000000000

# The LMC value used on this subnet
lmc 0

# lmc_esp0 determines whether LMC value used on subnet is used for
# enhanced switch port 0. If TRUE, LMC value for subnet is used for
# ESP0. Otherwise, LMC value for ESP0s is 0.
lmc_esp0 FALSE

# sm_sl determines SMSL used for SM/SA communication
sm_sl 0

# The code of maximal time a packet can live in a switch
# The actual time is 4.096usec * 2^<packet_life_time>
# The value 0x14 disables this mechanism
packet_life_time 0x12

# The number of sequential packets dropped that cause the port
# to enter the VLStalled state. The result of setting this value to
# zero is undefined.
vl_stall_count 0x07

# The number of sequential packets dropped that cause the port
# to enter the VLStalled state. This value is for switch ports
# driving a CA or router port. The result of setting this value
# to zero is undefined.
leaf_vl_stall_count 0x07

# The code of maximal time a packet can wait at the head of
# transmission queue.
# The actual time is 4.096usec * 2^<head_of_queue_lifetime>
# The value 0x14 disables this mechanism
head_of_queue_lifetime 0x12

# The maximal time a packet can wait at the head of queue on
# switch port connected to a CA or router port
leaf_head_of_queue_lifetime 0x10

# Limit the maximal operational VLs
max_op_vls 5

# Force PortInfo:LinkSpeedEnabled on switch ports
# If 0, don't modify PortInfo:LinkSpeedEnabled on switch port
# Otherwise, use value for PortInfo:LinkSpeedEnabled on switch port
# Values are (IB Spec 1.2.1, 14.2.5.6 Table 146 "PortInfo")
#    1: 2.5 Gbps
#    3: 2.5 or 5.0 Gbps
#    5: 2.5 or 10.0 Gbps
#    7: 2.5 or 5.0 or 10.0 Gbps
#    2,4,6,8-14 Reserved
#    Default 15: set to PortInfo:LinkSpeedSupported
force_link_speed 15

# Force PortInfo:LinkSpeedExtEnabled on ports
# If 0, don't modify PortInfo:LinkSpeedExtEnabled on port
# Otherwise, use value for PortInfo:LinkSpeedExtEnabled on port
# Values are (MgtWG RefID #4722)
#    1: 14.0625 Gbps
#    2: 25.78125 Gbps
#    3: 14.0625 Gbps or 25.78125 Gbps
#    30: Disable extended link speeds
#    Default 31: set to PortInfo:LinkSpeedExtSupported
force_link_speed_ext 31

# FDR10 on ports on devices that support FDR10
# Values are:
#    0: don't use fdr10 (no MLNX ExtendedPortInfo MADs)
#    Default 1: enable fdr10 when supported
#    2: disable fdr10 when supported
fdr10 1

# The subnet_timeout code that will be set for all the ports
# The actual timeout is 4.096usec * 2^<subnet_timeout>
subnet_timeout 18

# Threshold of local phy errors for sending Trap 129
local_phy_errors_threshold 0x08

# Threshold of credit overrun errors for sending Trap 130
overrun_errors_threshold 0x08

# Use SwitchInfo:MulticastFDBTop if advertised in PortInfo:CapabilityMask
use_mfttop TRUE

#
# PARTITIONING OPTIONS
#
# Partition configuration file to be used
partition_config_file %ProgramFiles%\OFED\OpenSM\partitions.conf

# Disable partition enforcement by switches
no_partition_enforcement FALSE

#
# SWEEP OPTIONS
#
# The number of seconds between subnet sweeps (0 disables it)
sweep_interval 10

# If TRUE cause all lids to be reassigned
reassign_lids FALSE

# If TRUE forces every sweep to be a heavy sweep
force_heavy_sweep FALSE

# If TRUE every trap will cause a heavy sweep.
# NOTE: successive identical traps (>10) are suppressed
sweep_on_trap TRUE

#
# ROUTING OPTIONS
#
# If TRUE count switches as link subscriptions
port_profile_switch_nodes FALSE

# Name of file with port guids to be ignored by port profiling
port_prof_ignore_file (null)

# The file holding routing weighting factors per output port
hop_weights_file (null)

# The file holding non-default port order per switch for routing
port_search_ordering_file (null)

# Routing engine
# Multiple routing engines can be specified separated by
# commas so that specific ordering of routing algorithms will
# be tried if earlier routing engines fail.
# Supported engines: minhop, updn, dnup, file, ftree, lash,
#    dor, torus-2QoS
routing_engine (null)

# Connect roots (use FALSE if unsure)
connect_roots FALSE

# Use unicast routing cache (use FALSE if unsure)
use_ucast_cache FALSE

# Lid matrix dump file name
lid_matrix_dump_file (null)

# LFTs file name
lfts_file (null)

# The file holding the root node guids (for fat-tree or Up/Down)
# One guid in each line
root_guid_file (null)

# The file holding the fat-tree compute node guids
# One guid in each line
cn_guid_file (null)

# The file holding the fat-tree I/O node guids
# One guid in each line
io_guid_file (null)

# Number of reverse hops allowed for I/O nodes 
# Used for connectivity between I/O nodes connected to Top Switches
max_reverse_hops 0

# The file holding the node ids which will be used by Up/Down algorithm instead
# of GUIDs (one guid and id in each line)
ids_guid_file (null)

# The file holding guid routing order guids (for MinHop and Up/Down)
guid_routing_order_file (null)

# Do mesh topology analysis (for LASH algorithm)
do_mesh_analysis FALSE

# Starting VL for LASH algorithm
lash_start_vl 0

# Port Shifting (use FALSE if unsure)
port_shifting FALSE

# Assign ports in a random order instead of round-robin.
# If zero disable, otherwise use the value as a random seed
scatter_ports 0

# SA database file name
sa_db_file (null)

# If TRUE causes OpenSM to dump SA database at the end of
# every light sweep, regardless of the verbosity level
sa_db_dump FALSE

# Torus-2QoS configuration file name
torus_config %ProgramFiles%\OFED\OpenSM\osm-torus-2QoS.conf

#
# HANDOVER - MULTIPLE SMs OPTIONS
#
# SM priority used for deciding who is the master
# Range goes from 0 (lowest priority) to 15 (highest).
sm_priority 0

# If TRUE other SMs on the subnet should be ignored
ignore_other_sm FALSE

# Timeout in [msec] between two polls of active master SM
sminfo_polling_timeout 10000

# Number of failing polls of remote SM that declares it dead
polling_retry_number 4

# If TRUE honor the guid2lid file when coming out of standby
# state, if such file exists and is valid
honor_guid2lid_file FALSE

#
# TIMING AND THREADING OPTIONS
#
# Maximum number of SMPs sent in parallel
max_wire_smps 4

# Maximum number of timeout based SMPs allowed to be outstanding
# A value less than or equal to max_wire_smps disables this mechanism
max_wire_smps2 4

# The timeout in [usec] used for sending SMPs above max_wire_smps limit and below max_wire_smps2 limit
max_smps_timeout 9000000

# The maximum time in [msec] allowed for a transaction to complete
transaction_timeout 3000

# The maximum number of retries allowed for a transaction to complete
transaction_retries 3

# Maximal time in [msec] a message can stay in the incoming message queue.
# If there is more than one message in the queue and the last message
# stayed in the queue more than this value, any SA request will be
# immediately be dropped but BUSY status is not currently returned.
max_msg_fifo_timeout 150000

# Use a single thread for handling SA queries
single_thread FALSE

#
# MISC OPTIONS
#
# Daemon mode
daemon FALSE

# SM Inactive
sm_inactive FALSE

# Babbling Port Policy
babbling_port_policy FALSE

# Use Optimized SLtoVLMapping programming if supported by device
use_optimized_slvl FALSE

#
# Event Plugin Options
#
# Event plugin name(s)
event_plugin_name (null)

# Options string that would be passed to the plugin(s)
event_plugin_options (null)

#
# Node name map for mapping node's to more descriptive node descriptions
# (man ibnetdiscover for more information)
#
node_name_map_name (null)

#
# DEBUG FEATURES
#
# The log flags used
log_flags 0x03

# Force flush of the log file after each log message
force_log_flush FALSE

# Log file to be used
log_file %windir%\temp\osm.log

# Limit the size of the log file in MB. If overrun, log is restarted
log_max_size 0

# If TRUE will accumulate the log over multiple OpenSM sessions
accum_log_file TRUE

# The directory to hold the file OpenSM dumps
dump_files_dir %windir%\temp\

# If TRUE enables new high risk options and hardware specific quirks
enable_quirks FALSE

# If TRUE disables client reregistration
no_clients_rereg FALSE

# If TRUE OpenSM should disable multicast support and
# no multicast routing is performed if TRUE
disable_multicast FALSE

# If TRUE opensm will exit on fatal initialization issues
exit_on_fatal TRUE

# console [off|local]
console off

# Telnet port for console (default 10000)
console_port 10000

#
# QoS OPTIONS
#
# Enable QoS setup
qos FALSE

# QoS policy file to be used
qos_policy_file %ProgramFiles%\OFED\OpenSM\qos-policy.conf

# QoS default options
qos_max_vls 0
qos_high_limit -1
qos_vlarb_high (null)
qos_vlarb_low (null)
qos_sl2vl (null)

# QoS CA options
qos_ca_max_vls 0
qos_ca_high_limit -1
qos_ca_vlarb_high (null)
qos_ca_vlarb_low (null)
qos_ca_sl2vl (null)

# QoS Switch Port 0 options
qos_sw0_max_vls 0
qos_sw0_high_limit -1
qos_sw0_vlarb_high (null)
qos_sw0_vlarb_low (null)
qos_sw0_sl2vl (null)

# QoS Switch external ports options
qos_swe_max_vls 0
qos_swe_high_limit -1
qos_swe_vlarb_high (null)
qos_swe_vlarb_low (null)
qos_swe_sl2vl (null)

# QoS Router ports options
qos_rtr_max_vls 0
qos_rtr_high_limit -1
qos_rtr_vlarb_high (null)
qos_rtr_vlarb_low (null)
qos_rtr_sl2vl (null)

# Prefix routes file name
prefix_routes_file %ProgramFiles%\OFED\OpenSM\prefix-routes.conf

#
# IPv6 Solicited Node Multicast (SNM) Options
#
consolidate_ipv6_snm_req FALSE

# Log prefix
log_prefix (null)

Honestly, I don't really understand it!!
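
For what it is worth, some of the values can be decoded straight from the comments in the dump: the timeout codes are exponents, with the real time being 4.096 usec * 2^code. Below is a minimal sketch that parses "key value" lines and applies that formula. `parse_opensm_conf` and `timeout_seconds` are hypothetical helper names, the sample lines are copied from the dump above, and the failover figure at the end is only a rough estimate based on the option comments.

```python
def parse_opensm_conf(text: str) -> dict[str, str]:
    """Parse 'key value' lines, skipping blank lines and '#' comments."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(" ")
        options[key] = value.strip()
    return options


def timeout_seconds(code: str) -> float:
    """Decode a timeout code: 4.096 usec * 2^code, as stated in the dump's comments."""
    return 4.096e-6 * 2 ** int(code, 0)  # base 0 accepts both "18" and "0x12"


SAMPLE = """\
# The code of maximal time a packet can live in a switch
packet_life_time 0x12
subnet_timeout 18
sminfo_polling_timeout 10000
polling_retry_number 4
"""

if __name__ == "__main__":
    conf = parse_opensm_conf(SAMPLE)
    print("packet_life_time [s]:", timeout_seconds(conf["packet_life_time"]))
    print("subnet_timeout   [s]:", timeout_seconds(conf["subnet_timeout"]))
    # Rough estimate: polling interval in msec times the number of failed polls.
    failover = int(conf["sminfo_polling_timeout"]) * int(conf["polling_retry_number"]) / 1000
    print("approx. time to declare the master SM dead [s]:", failover)
```

Both 0x12 and 18 are the same code, so the default packet lifetime and subnet timeout come out at roughly one second each.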

Among those settings,

# PARTITIONING OPTIONS
#
# Partition configuration file to be used
partition_config_file %ProgramFiles%\OFED\OpenSM\partitions.conf

this one caught my eye,
so I tried writing the file the same way as on Linux.
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Networking_Guide/sec-Configuring_the_Subnet_Manager.html
Following that guide, I wrote partitions.conf like this.

key0=0x7fff,ipoib,mtu=5 : ALL=full;

But whether this actually changes anything, I still don't know.
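
To make the entry a little less opaque: it defines a partition named key0 with P_Key 0x7fff, flags it for IPoIB, requests MTU code 5, and makes every port a full member. The sketch below splits such a line into its parts. It handles only this simple one-line form, not the full partitions.conf grammar, and `parse_partition` is a hypothetical helper name.

```python
# IB MTU codes used by the mtu= flag.
IB_MTU_BYTES = {1: 256, 2: 512, 3: 1024, 4: 2048, 5: 4096}


def parse_partition(line: str) -> dict:
    """Split one simple partitions.conf entry into name, pkey, flags and members."""
    definition, _, members = line.rstrip().rstrip(";").partition(":")
    fields = [f.strip() for f in definition.split(",")]
    name, _, pkey = fields[0].partition("=")
    flags = {}
    for field in fields[1:]:
        key, _, value = field.partition("=")
        # Flags without a value (such as "ipoib") are stored as True.
        flags[key.strip()] = value.strip() or True
    return {
        "name": name.strip(),
        "pkey": int(pkey, 16),
        "flags": flags,
        "members": [m.strip() for m in members.split(",") if m.strip()],
    }


if __name__ == "__main__":
    entry = parse_partition("key0=0x7fff,ipoib,mtu=5 : ALL=full;")
    print(entry)
    print("requested MTU in bytes:", IB_MTU_BYTES[int(entry["flags"]["mtu"])])
```

So mtu=5 asks for 4096 bytes. Whether the Windows build of OpenSM actually picks the file up is a separate question that this sketch does not answer.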

I then tried various things.
I set up the SubnetManager on the Windows side and tried to connect,
but the WinOF 4.80 SubnetManager and ESXi would not connect.

At that point the Windows side itself does appear to be properly connected to the SM, though...

>"C:\Program Files\Mellanox\MLNX_VPI\IB\Tools\ibv_devinfo.exe"
hca_id: ibv_device0
        fw_ver:                         2.9.8350
        node_guid:                      0002:c903:000e:793c
        sys_image_guid:                 0002:c903:000e:793f
        vendor_id:                      0x02c9
        vendor_part_id:                 26428
        hw_ver:                         0xB0
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 1
                        port_lid:               2
                        port_lmc:               0x00
                        transport:              IB

I gave up and decided to use the OpenSM on the ESXi side instead.

For installation and so on, see this post:
InfiniBand on ESXi 6.0 U1

In this case nothing in particular had to be changed on the Windows side.

>"C:\Program Files\Mellanox\MLNX_VPI\IB\Tools\ibv_devinfo.exe"
hca_id: ibv_device0
        fw_ver:                         2.9.8350
        node_guid:                      0002:c903:000e:793c
        sys_image_guid:                 0002:c903:000e:793f
        vendor_id:                      0x02c9
        vendor_part_id:                 26428
        hw_ver:                         0xB0
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 1
                        port_lid:               2
                        port_lmc:               0x00
                        transport:              IB

So it seems to be connected at 4096...?
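
When comparing several hosts, it can help to pull just the MTU lines out of the ibv_devinfo output instead of reading them by eye. The sketch below is a minimal example that assumes the output format shown above. `port_mtus` is a hypothetical helper name, and on a multi-port HCA it would keep only the values of the last port.

```python
import re


def port_mtus(devinfo_text: str) -> dict[str, int]:
    """Extract max_mtu and active_mtu (in bytes) from ibv_devinfo output."""
    pattern = r"^\s*(max_mtu|active_mtu):\s+(\d+)\s+\(\d+\)"
    result = {}
    for name, size in re.findall(pattern, devinfo_text, re.MULTILINE):
        result[name] = int(size)
    return result


SAMPLE = """\
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 1
"""

if __name__ == "__main__":
    print(port_mtus(SAMPLE))
```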

The ESXi side has so few commands available that I could hardly investigate anything...

# ./opt/opensm/bin/ibstat
CA 'mlx4_0'
        CA type: MT26428
        Number of ports: 1
        Firmware version: 2.9.8350
        Hardware version: b0
        Node GUID: 0x0002c903000e28da
        System image GUID: 0x0002c903000e28dd
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 1
                LMC: 0
                SM lid: 1
                Capability mask: 0x0251086a
                Port GUID: 0x0002c903000e28db
                Link layer: InfiniBand

I don't really get it lol

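So, here is the thing.
In this state, the OpenSM on ESXi, Windows, and ESXi can all communicate.
The open question is whether the MTU of the ESXi vSwitch and VMkernel needs to be changed.

The default is 1500.

The IB NIC itself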

# esxcfg-nics -l | grep ib0
vmnic_ib0 0000:06:00.0 ib_ipoib    Up   40000Mbps Full   00:02:c9:0e:28:db 1500   Mellanox Technologies MT26428 [ConnectX VPI - 10GigE / IB QDR, PCIe 2.0 5GT/s]

The MTU is 1500.

vSwitch

# esxcfg-vswitch -l
Switch Name      Num Ports   Used Ports  Configured Ports  MTU     Uplinks
vSwitch3         2102        3           128               1500    vmnic_ib0

  PortGroup Name        VLAN ID  Used Ports  Uplinks
  VMkernel 2            0        1           vmnic_ib0

The MTU is 1500 here as well.

When I try to change the vSwitch MTU:

# esxcfg-vswitch -m 4096 vSwitch3
Unable to set MTU to 4096 the following uplinks refused the MTU setting: vmnic_ib0

It complains...

And yet the IB side itself is set to 4096?

# esxcli system module parameters list -m=mlx4_core|grep mtu
mtu_4k                  int           1       configure 4k mtu (mtu_4k > 0)

Hmm, I don't know...
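
One possible reading of these numbers, offered only as a sketch: the 4096 reported by the IB tools is the link-level MTU, while the 1500 on vmnic_ib0 and the vSwitch is the IP-level MTU, and an IP packet only has to fit within the smallest limit along its path. Under that assumption the two values do not conflict; traffic would simply be sent in packets of at most 1500 bytes. `effective_ip_mtu` is a hypothetical helper name, and nothing here was verified on the ESXi host.

```python
# IPoIB prepends a 4-byte encapsulation header to every IP packet.
IPOIB_HEADER_BYTES = 4


def effective_ip_mtu(ib_link_mtu: int, *interface_mtus: int) -> int:
    """Smallest IP MTU along the path: the IPoIB limit and every interface MTU."""
    ipoib_limit = ib_link_mtu - IPOIB_HEADER_BYTES
    return min([ipoib_limit, *interface_mtus])


if __name__ == "__main__":
    # Windows IPoIB adapter alone: limited only by the IB link MTU.
    print(effective_ip_mtu(4096))
    # Path through vmnic_ib0 and vSwitch3, both reporting 1500.
    print(effective_ip_mtu(4096, 1500, 1500))
```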

I give up.
