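I have learned that I understand nothing!!
About the Subnet Manager
http://www.mellanox.com/related-docs/prod_software/MLNX_VPI_WinOF_User_Manual_v4.95.pdf
3.2.2 OpenSM – Subnet Manager
For the details, please search the web...
There were three things I could not figure out this time.
・Whether the MTU of the ESXi vSwitch or VMkernel needs to be changed.
・How to change the settings.
・Whether a connection fails when the SM and each IB port use different MTUs, and what the default value is.
Well, I investigated all sorts of things,
but I only ended up more confused.
To my followers,
sorry for dragging you along with me on this...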
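On Windows you would probably use the Mellanox build of WinOF, so I looked into that.
In the ordinary network settings,
when I looked at the
Mellanox IPoIB
adapter from its device configuration,
the MTU was 4092 (apparently 4 bytes are subtracted).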
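The 4-byte difference is consistent with the IPoIB encapsulation header that sits in front of every IP packet (RFC 4391). Here is a minimal sketch of that arithmetic; the function name is made up for illustration, and I have not confirmed that the Windows driver computes the value this way.

```python
# IPoIB puts a 4-byte encapsulation header in front of each IP packet
# (RFC 4391), so the usable IP MTU is the IB link MTU minus 4.
IPOIB_HEADER_BYTES = 4


def ipoib_mtu(ib_mtu_bytes):
    """Return the IP-level MTU for a given InfiniBand link MTU in bytes."""
    return ib_mtu_bytes - IPOIB_HEADER_BYTES


print(ipoib_mtu(4096))  # 4092, the value shown in the device configuration
print(ipoib_mtu(2048))  # 2044
```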
With a normal installation, the Windows Subnet Manager is located at
C:\Program Files\Mellanox\MLNX_VPI\IB\Tools\opensm.exe
To register it as a service:
sc create OpenSM binPath= "c:\Program Files\Mellanox\MLNX_VPI\IB\Tools\opensm.exe --service" start= auto
sc start OpenSM
Like this.
A common mistake is writing "-service" with a single hyphen, which does not work, so be careful...
To delete it:
sc qc OpenSM
sc delete OpenSM
Then reboot and it is gone.
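Since the single-hyphen typo is so easy to make, here is a small sketch that builds the argument list for `sc create` and refuses a flag that does not start with two hyphens. The helper name `sc_create_args` is my own invention, and the sketch only builds the list; running it (for example via `subprocess.run`) would need an elevated prompt on Windows.

```python
def sc_create_args(service, exe_path, flag="--service", start="auto"):
    """Build the argument list for `sc create` (hypothetical helper)."""
    if not flag.startswith("--"):
        # "-service" is the typo warned about above
        raise ValueError("flag must start with two hyphens: %r" % flag)
    # sc.exe takes "binPath=" and "start=" as tokens followed by their values
    return ["sc", "create", service,
            "binPath=", "%s %s" % (exe_path, flag),
            "start=", start]


args = sc_create_args(
    "OpenSM", r"c:\Program Files\Mellanox\MLNX_VPI\IB\Tools\opensm.exe")
print(args)
```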
There are lots of options:
>"C:\Program Files\Mellanox\MLNX_VPI\IB\Tools\opensm.exe" -h
-------------------------------------------------
OpenSM 3.3.11 UMAD
Command Line Arguments:
------- OpenSM - Usage and options ----------------------
Usage: opensm [options]
Options:
--version
          Prints OpenSM version and exits.
--config, -F <file-name>
          The name of the OpenSM config file. When not specified
          %ProgramFiles%\OFED\OpenSM\opensm.conf will be used (if exists).
--create-config, -c <file-name>
          OpenSM will dump its configuration to the specified file and exit.
          This is a way to generate OpenSM configuration file template.
and so on; a lot comes out.
So then. What do the default settings actually look like? I dumped them to find out.
>"C:\Program Files\Mellanox\MLNX_VPI\IB\Tools\opensm.exe" -c opensm.conf
-------------------------------------------------
OpenSM 3.3.11 UMAD
Command Line Arguments:
 Creating config file template 'opensm.conf'.
 Log File: %windir%\temp\osm.log
-------------------------------------------------
#
# DEVICE ATTRIBUTES OPTIONS
#
# The port GUID on which the OpenSM is running
guid 0x0000000000000000

# M_Key value sent to all ports qualifying all Set(PortInfo)
m_key 0x0000000000000000

# The lease period used for the M_Key on this subnet in [sec]
m_key_lease_period 0

# SM_Key value of the SM used for SM authentication
sm_key 0x0000000000000001

# SM_Key value to qualify rcv SA queries as 'trusted'
sa_key 0x0000000000000001

# Note that for both values above (sm_key and sa_key)
# OpenSM version 3.2.1 and below used the default value '1'
# in a host byte order, it is fixed now but you may need to
# change the values to interoperate with old OpenSM running
# on a little endian machine.

# Subnet prefix used on this subnet
subnet_prefix 0xfe80000000000000

# The LMC value used on this subnet
lmc 0

# lmc_esp0 determines whether LMC value used on subnet is used for
# enhanced switch port 0. If TRUE, LMC value for subnet is used for
# ESP0. Otherwise, LMC value for ESP0s is 0.
lmc_esp0 FALSE

# sm_sl determines SMSL used for SM/SA communication
sm_sl 0

# The code of maximal time a packet can live in a switch
# The actual time is 4.096usec * 2^<packet_life_time>
# The value 0x14 disables this mechanism
packet_life_time 0x12

# The number of sequential packets dropped that cause the port
# to enter the VLStalled state. The result of setting this value to
# zero is undefined.
vl_stall_count 0x07

# The number of sequential packets dropped that cause the port
# to enter the VLStalled state. This value is for switch ports
# driving a CA or router port. The result of setting this value
# to zero is undefined.
leaf_vl_stall_count 0x07

# The code of maximal time a packet can wait at the head of
# transmission queue.
# The actual time is 4.096usec * 2^<head_of_queue_lifetime>
# The value 0x14 disables this mechanism
head_of_queue_lifetime 0x12

# The maximal time a packet can wait at the head of queue on
# switch port connected to a CA or router port
leaf_head_of_queue_lifetime 0x10

# Limit the maximal operational VLs
max_op_vls 5

# Force PortInfo:LinkSpeedEnabled on switch ports
# If 0, don't modify PortInfo:LinkSpeedEnabled on switch port
# Otherwise, use value for PortInfo:LinkSpeedEnabled on switch port
# Values are (IB Spec 1.2.1, 14.2.5.6 Table 146 "PortInfo")
#    1: 2.5 Gbps
#    3: 2.5 or 5.0 Gbps
#    5: 2.5 or 10.0 Gbps
#    7: 2.5 or 5.0 or 10.0 Gbps
#    2,4,6,8-14 Reserved
#    Default 15: set to PortInfo:LinkSpeedSupported
force_link_speed 15

# Force PortInfo:LinkSpeedExtEnabled on ports
# If 0, don't modify PortInfo:LinkSpeedExtEnabled on port
# Otherwise, use value for PortInfo:LinkSpeedExtEnabled on port
# Values are (MgtWG RefID #4722)
#    1: 14.0625 Gbps
#    2: 25.78125 Gbps
#    3: 14.0625 Gbps or 25.78125 Gbps
#    30: Disable extended link speeds
#    Default 31: set to PortInfo:LinkSpeedExtSupported
force_link_speed_ext 31

# FDR10 on ports on devices that support FDR10
# Values are:
#    0: don't use fdr10 (no MLNX ExtendedPortInfo MADs)
#    Default 1: enable fdr10 when supported
#    2: disable fdr10 when supported
fdr10 1

# The subnet_timeout code that will be set for all the ports
# The actual timeout is 4.096usec * 2^<subnet_timeout>
subnet_timeout 18

# Threshold of local phy errors for sending Trap 129
local_phy_errors_threshold 0x08

# Threshold of credit overrun errors for sending Trap 130
overrun_errors_threshold 0x08

# Use SwitchInfo:MulticastFDBTop if advertised in PortInfo:CapabilityMask
use_mfttop TRUE

#
# PARTITIONING OPTIONS
#
# Partition configuration file to be used
partition_config_file %ProgramFiles%\OFED\OpenSM\partitions.conf

# Disable partition enforcement by switches
no_partition_enforcement FALSE

#
# SWEEP OPTIONS
#
# The number of seconds between subnet sweeps (0 disables it)
sweep_interval 10

# If TRUE cause all lids to be reassigned
reassign_lids FALSE

# If TRUE forces every sweep to be a heavy sweep
force_heavy_sweep FALSE

# If TRUE every trap will cause a heavy sweep.
# NOTE: successive identical traps (>10) are suppressed
sweep_on_trap TRUE

#
# ROUTING OPTIONS
#
# If TRUE count switches as link subscriptions
port_profile_switch_nodes FALSE

# Name of file with port guids to be ignored by port profiling
port_prof_ignore_file (null)

# The file holding routing weighting factors per output port
hop_weights_file (null)

# The file holding non-default port order per switch for routing
port_search_ordering_file (null)

# Routing engine
# Multiple routing engines can be specified separated by
# commas so that specific ordering of routing algorithms will
# be tried if earlier routing engines fail.
# Supported engines: minhop, updn, dnup, file, ftree, lash,
#    dor, torus-2QoS
routing_engine (null)

# Connect roots (use FALSE if unsure)
connect_roots FALSE

# Use unicast routing cache (use FALSE if unsure)
use_ucast_cache FALSE

# Lid matrix dump file name
lid_matrix_dump_file (null)

# LFTs file name
lfts_file (null)

# The file holding the root node guids (for fat-tree or Up/Down)
# One guid in each line
root_guid_file (null)

# The file holding the fat-tree compute node guids
# One guid in each line
cn_guid_file (null)

# The file holding the fat-tree I/O node guids
# One guid in each line
io_guid_file (null)

# Number of reverse hops allowed for I/O nodes
# Used for connectivity between I/O nodes connected to Top Switches
max_reverse_hops 0

# The file holding the node ids which will be used by Up/Down algorithm instead
# of GUIDs (one guid and id in each line)
ids_guid_file (null)

# The file holding guid routing order guids (for MinHop and Up/Down)
guid_routing_order_file (null)

# Do mesh topology analysis (for LASH algorithm)
do_mesh_analysis FALSE

# Starting VL for LASH algorithm
lash_start_vl 0

# Port Shifting (use FALSE if unsure)
port_shifting FALSE

# Assign ports in a random order instead of round-robin.
# If zero disable, otherwise use the value as a random seed
scatter_ports 0

# SA database file name
sa_db_file (null)

# If TRUE causes OpenSM to dump SA database at the end of
# every light sweep, regardless of the verbosity level
sa_db_dump FALSE

# Torus-2QoS configuration file name
torus_config %ProgramFiles%\OFED\OpenSM\osm-torus-2QoS.conf

#
# HANDOVER - MULTIPLE SMs OPTIONS
#
# SM priority used for deciding who is the master
# Range goes from 0 (lowest priority) to 15 (highest).
sm_priority 0

# If TRUE other SMs on the subnet should be ignored
ignore_other_sm FALSE

# Timeout in [msec] between two polls of active master SM
sminfo_polling_timeout 10000

# Number of failing polls of remote SM that declares it dead
polling_retry_number 4

# If TRUE honor the guid2lid file when coming out of standby
# state, if such file exists and is valid
honor_guid2lid_file FALSE

#
# TIMING AND THREADING OPTIONS
#
# Maximum number of SMPs sent in parallel
max_wire_smps 4

# Maximum number of timeout based SMPs allowed to be outstanding
# A value less than or equal to max_wire_smps disables this mechanism
max_wire_smps2 4

# The timeout in [usec] used for sending SMPs above max_wire_smps limit and below max_wire_smps2 limit
max_smps_timeout 9000000

# The maximum time in [msec] allowed for a transaction to complete
transaction_timeout 3000

# The maximum number of retries allowed for a transaction to complete
transaction_retries 3

# Maximal time in [msec] a message can stay in the incoming message queue.
# If there is more than one message in the queue and the last message
# stayed in the queue more than this value, any SA request will be
# immediately be dropped but BUSY status is not currently returned.
max_msg_fifo_timeout 150000

# Use a single thread for handling SA queries
single_thread FALSE

#
# MISC OPTIONS
#
# Daemon mode
daemon FALSE

# SM Inactive
sm_inactive FALSE

# Babbling Port Policy
babbling_port_policy FALSE

# Use Optimized SLtoVLMapping programming if supported by device
use_optimized_slvl FALSE

#
# Event Plugin Options
#
# Event plugin name(s)
event_plugin_name (null)

# Options string that would be passed to the plugin(s)
event_plugin_options (null)

#
# Node name map for mapping node's to more descriptive node descriptions
# (man ibnetdiscover for more information)
#
node_name_map_name (null)

#
# DEBUG FEATURES
#
# The log flags used
log_flags 0x03

# Force flush of the log file after each log message
force_log_flush FALSE

# Log file to be used
log_file %windir%\temp\osm.log

# Limit the size of the log file in MB. If overrun, log is restarted
log_max_size 0

# If TRUE will accumulate the log over multiple OpenSM sessions
accum_log_file TRUE

# The directory to hold the file OpenSM dumps
dump_files_dir %windir%\temp\

# If TRUE enables new high risk options and hardware specific quirks
enable_quirks FALSE

# If TRUE disables client reregistration
no_clients_rereg FALSE

# If TRUE OpenSM should disable multicast support and
# no multicast routing is performed if TRUE
disable_multicast FALSE

# If TRUE opensm will exit on fatal initialization issues
exit_on_fatal TRUE

# console [off|local]
console off

# Telnet port for console (default 10000)
console_port 10000

#
# QoS OPTIONS
#
# Enable QoS setup
qos FALSE

# QoS policy file to be used
qos_policy_file %ProgramFiles%\OFED\OpenSM\qos-policy.conf

# QoS default options
qos_max_vls 0
qos_high_limit -1
qos_vlarb_high (null)
qos_vlarb_low (null)
qos_sl2vl (null)

# QoS CA options
qos_ca_max_vls 0
qos_ca_high_limit -1
qos_ca_vlarb_high (null)
qos_ca_vlarb_low (null)
qos_ca_sl2vl (null)

# QoS Switch Port 0 options
qos_sw0_max_vls 0
qos_sw0_high_limit -1
qos_sw0_vlarb_high (null)
qos_sw0_vlarb_low (null)
qos_sw0_sl2vl (null)

# QoS Switch external ports options
qos_swe_max_vls 0
qos_swe_high_limit -1
qos_swe_vlarb_high (null)
qos_swe_vlarb_low (null)
qos_swe_sl2vl (null)

# QoS Router ports options
qos_rtr_max_vls 0
qos_rtr_high_limit -1
qos_rtr_vlarb_high (null)
qos_rtr_vlarb_low (null)
qos_rtr_sl2vl (null)

# Prefix routes file name
prefix_routes_file %ProgramFiles%\OFED\OpenSM\prefix-routes.conf

#
# IPv6 Solicited Node Multicast (SNM) Options
#
consolidate_ipv6_snm_req FALSE

# Log prefix
log_prefix (null)
Honestly, I don't really understand it!!
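To make the dump a bit easier to poke at, here is a rough sketch that reads an opensm.conf-style text into a dictionary and converts the timeout codes with the formula quoted in the comments (4.096usec * 2^code). The function names are my own, and the parser only assumes "comment lines start with #, everything else is `key value`", which matches the dump but is not taken from OpenSM's actual parser.

```python
def parse_opensm_conf(text):
    """Collect `key value` lines, skipping blank lines and # comments."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        options[parts[0]] = parts[1] if len(parts) > 1 else ""
    return options


def timeout_seconds(code):
    """Actual time for a timeout code: 4.096 usec * 2^code."""
    return 4.096e-6 * 2 ** code


sample = """
# The subnet_timeout code that will be set for all the ports
subnet_timeout 18
# The code of maximal time a packet can live in a switch
packet_life_time 0x12
"""
conf = parse_opensm_conf(sample)
# int(value, 0) accepts both decimal ("18") and hex ("0x12") notation
for key in ("subnet_timeout", "packet_life_time"):
    code = int(conf[key], 0)
    print(key, code, "%.3f s" % timeout_seconds(code))
```

Both defaults turn out to be the same code (18 decimal and 0x12), i.e. roughly one second.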
Among all of that,
#
# PARTITIONING OPTIONS
#
# Partition configuration file to be used
partition_config_file %ProgramFiles%\OFED\OpenSM\partitions.conf
caught my attention,
so I tried writing it the same way as on Linux.
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Networking_Guide/sec-Configuring_the_Subnet_Manager.html
I wrote partitions.conf like this:
key0=0x7fff,ipoib,mtu=5 : ALL=full;
But whether this actually changes anything, I again could not tell.
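For reference, the `mtu=5` in that line is an InfiniBand MTU code, not a byte count; code 5 corresponds to 4096 bytes. A minimal sketch that takes the line apart, assuming only the simple `name=pkey,flag,... : member=type;` shape used here (the real partitions.conf grammar has more options than this handles, and the function name is hypothetical):

```python
# InfiniBand MTU codes and their sizes in bytes
IB_MTU_CODES = {1: 256, 2: 512, 3: 1024, 4: 2048, 5: 4096}


def parse_partition(line):
    """Parse one simple partition definition line."""
    definition, _, members = line.strip().rstrip(";").partition(":")
    fields = [f.strip() for f in definition.split(",")]
    name, _, pkey = fields[0].partition("=")
    result = {"name": name, "pkey": int(pkey, 16),
              "ipoib": False, "mtu_bytes": None, "members": {}}
    for field in fields[1:]:
        if field == "ipoib":
            result["ipoib"] = True
        elif field.startswith("mtu="):
            result["mtu_bytes"] = IB_MTU_CODES[int(field[len("mtu="):])]
    for member in members.split(","):
        member = member.strip()
        if member:
            guid, _, kind = member.partition("=")
            result["members"][guid.strip()] = kind.strip()
    return result


print(parse_partition("key0=0x7fff,ipoib,mtu=5 : ALL=full;"))
```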
So I tried all sorts of things.
I set up the Subnet Manager on the Windows side and tried to connect,
but the WinOF 4.80 Subnet Manager and ESXi would not connect.
At that point the Windows side did seem to be properly connected to the SM, though...
>"C:\Program Files\Mellanox\MLNX_VPI\IB\Tools\ibv_devinfo.exe"
hca_id: ibv_device0
        fw_ver: 2.9.8350
        node_guid: 0002:c903:000e:793c
        sys_image_guid: 0002:c903:000e:793f
        vendor_id: 0x02c9
        vendor_part_id: 26428
        hw_ver: 0xB0
        phys_port_cnt: 1
                port: 1
                        state: PORT_ACTIVE (4)
                        max_mtu: 4096 (5)
                        active_mtu: 4096 (5)
                        sm_lid: 1
                        port_lid: 2
                        port_lmc: 0x00
                        transport: IB
I gave up and decided to use OpenSM on the ESXi side instead.
For installation and so on, see here:
InfiniBand on ESXi 6.0 U1
In this case nothing in particular had to be changed on the Windows side.
>"C:\Program Files\Mellanox\MLNX_VPI\IB\Tools\ibv_devinfo.exe"
hca_id: ibv_device0
        fw_ver: 2.9.8350
        node_guid: 0002:c903:000e:793c
        sys_image_guid: 0002:c903:000e:793f
        vendor_id: 0x02c9
        vendor_part_id: 26428
        hw_ver: 0xB0
        phys_port_cnt: 1
                port: 1
                        state: PORT_ACTIVE (4)
                        max_mtu: 4096 (5)
                        active_mtu: 4096 (5)
                        sm_lid: 1
                        port_lid: 2
                        port_lmc: 0x00
                        transport: IB
So apparently it is connected at 4096...?
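When comparing several runs it is less error-prone to pull the MTU fields out programmatically than to eyeball them. A small sketch using a regular expression; it assumes the `name: bytes (code)` layout seen in the ibv_devinfo output, and the function name is my own.

```python
import re


def port_mtus(devinfo_text):
    """Extract (bytes, code) pairs for max_mtu and active_mtu."""
    found = {}
    for name in ("max_mtu", "active_mtu"):
        match = re.search(name + r":\s*(\d+)\s*\((\d+)\)", devinfo_text)
        if match:
            found[name] = (int(match.group(1)), int(match.group(2)))
    return found


sample = """
port: 1
    state: PORT_ACTIVE (4)
    max_mtu: 4096 (5)
    active_mtu: 4096 (5)
    sm_lid: 1
"""
print(port_mtus(sample))
```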
The ESXi side has so few commands available that I could hardly investigate anything...
# ./opt/opensm/bin/ibstat
CA 'mlx4_0'
        CA type: MT26428
        Number of ports: 1
        Firmware version: 2.9.8350
        Hardware version: b0
        Node GUID: 0x0002c903000e28da
        System image GUID: 0x0002c903000e28dd
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 1
                LMC: 0
                SM lid: 1
                Capability mask: 0x0251086a
                Port GUID: 0x0002c903000e28db
                Link layer: InfiniBand
I don't really get it, lol.
So, here is the thing.
In this setup, the OpenSM on ESXi, Windows, and ESXi can all communicate.
The question is whether the MTU of the ESXi vSwitch or VMkernel needs to be changed.
The default is 1500.
The IB NIC itself:
# esxcfg-nics -l | grep ib0
vmnic_ib0 0000:06:00.0 ib_ipoib Up 40000Mbps Full 00:02:c9:0e:28:db 1500 Mellanox Technologies MT26428 [ConnectX VPI - 10GigE / IB QDR, PCIe 2.0 5GT/s]
The MTU is 1500.
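The MTU is the eighth whitespace-separated column in that line. A sketch that splits the row into named fields, assuming the column order Name, PCI, Driver, Link, Speed, Duplex, MAC, MTU, Description as seen in this one line; I have not checked it against other ESXi versions, and the function name is made up.

```python
def parse_nic_line(line):
    """Split one `esxcfg-nics -l` row into named fields (assumed layout)."""
    # Split on whitespace at most 8 times so the description stays whole
    fields = line.split(None, 8)
    return {
        "name": fields[0],
        "driver": fields[2],
        "link": fields[3],
        "speed": fields[4],
        "mtu": int(fields[7]),
        "description": fields[8] if len(fields) > 8 else "",
    }


row = ("vmnic_ib0 0000:06:00.0 ib_ipoib Up 40000Mbps Full "
       "00:02:c9:0e:28:db 1500 Mellanox Technologies MT26428 "
       "[ConnectX VPI - 10GigE / IB QDR, PCIe 2.0 5GT/s]")
nic = parse_nic_line(row)
print(nic["name"], nic["mtu"])
```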
vSwitch
# esxcfg-vswitch -l
Switch Name      Num Ports   Used Ports  Configured Ports  MTU     Uplinks
vSwitch3         2102        3           128               1500    vmnic_ib0

  PortGroup Name        VLAN ID  Used Ports  Uplinks
  VMkernel 2            0        1           vmnic_ib0
The MTU here is 1500 as well.
When I try to change the vSwitch MTU,
# esxcfg-vswitch -m 4096 vSwitch3
Unable to set MTU to 4096 the following uplinks refused the MTU setting: vmnic_ib0
it complains...
And yet the IB side itself is set to 4096?
# esxcli system module parameters list -m=mlx4_core|grep mtu
mtu_4k  int  1  configure 4k mtu (mtu_4k > 0)
Hmm, I just don't know...
[tegaki]I give up[/tegaki]