This is an archived page of the 2018 Intermediate LCI Workshop

Lustre Hands-On

The lab environment runs on NCSA's Nebula OpenStack cluster, which let us create your VMs in an automated fashion. To maximize your learning, the VMs are as close to a minimal CentOS install as we could make them. OpenStack installs the ssh key you provided into the centos account. You must use the centos account until you have set up ssh keys for the root account; you will use it again for some Lustre testing once you have a working filesystem. The hostname and networking for your VMs are configured by the cloud-init package. Please be careful not to lock yourself out of a VM, and do not power the VMs off, as the instructor would have to power them back on. If you do lock yourself out or need a VM reset, the instructor will help you.

Obtain your virtual machines' IP addresses

Open the vminfo folder and find username.txt

Verify you can ssh to your workstation VM as centos user

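  # On your laptop verify ssh-agent is running (this prints the agent socket path if it is)
  echo $SSH_AUTH_SOCK
  # If nothing was printed, start an agent for this shell
  eval "$(ssh-agent -s)"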
  # Add your key to the ssh-agent
  ssh-add
  # Verify your key is loaded in the agent
  ssh-add -l
  # Now test ssh
  ssh -A -l centos workstationPublicIPAddress

Using your VMs' IP addresses, set up the /etc/hosts file on workstation

Add /etc/hosts entries for all your VMs and the instructor's workstation

  [centos@cw464-ws ~]$ sudo nano /etc/hosts
  127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
  ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
  # Change the next 4 lines to your IP addresses and hostnames
  192.168.100.34 cw464-ws ws
  192.168.100.35 cw464-mds1 mds1
  192.168.100.36 cw464-oss1 oss1
  192.168.100.37 cw464-oss2 oss2
  instructorWorkstationIPAddress iws
  # The line above is the instructor's workstation; it must NOT be one of your IP addresses.
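
The four VM lines follow one pattern: IP address, prefixed hostname, short alias. As a minimal sketch, the hypothetical helper `hosts_entries` below (not part of the lab materials) prints those lines from a hostname prefix and four IP addresses so you can review them before pasting into /etc/hosts. The prefix and addresses shown are the example values from the listing above; substitute your own.

```shell
# Print /etc/hosts lines for the four lab VMs.
# Usage: hosts_entries PREFIX WS_IP MDS1_IP OSS1_IP OSS2_IP
hosts_entries() {
    local prefix="$1"
    shift
    local role
    for role in ws mds1 oss1 oss2; do
        # One line per VM: IP address, long hostname, short alias
        printf '%s %s-%s %s\n' "$1" "$prefix" "$role" "$role"
        shift
    done
}

# Example values from the listing above; replace with your own.
hosts_entries cw464 192.168.100.34 192.168.100.35 192.168.100.36 192.168.100.37
```

The iws line is left out on purpose, since its address comes from the instructor rather than from your own VM list.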

Verify you can ssh to mds1, oss1 and oss2 VMs from workstation as centos user

Accept any ssh host key warnings. Note that when you ssh from ws to mds1, oss1 and oss2, you are using the ssh key from your laptop, forwarded by the ssh-agent.

  [centos@cw464-ws ~]$ ssh cw464-mds1 uname -a
  The authenticity of host 'mds1 (192.168.100.35)' can't be established.
  ECDSA key fingerprint is SHA256:vGNBvikGukiv1DZ8WEy8o2QfRrgCcVRAJjMipCHjonY.
  ECDSA key fingerprint is MD5:30:bd:59:8c:b5:c9:be:96:17:30:e5:f3:bc:d8:38:71.
  Are you sure you want to continue connecting (yes/no)? yes
  Warning: Permanently added 'mds1,192.168.100.35' (ECDSA) to the list of known hosts.
  Linux cw464-mds1.os.ncsa.edu 3.10.0-862.el7.x86_64 #1 SMP Fri Apr 20 16:44:24 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

  [centos@cw464-ws ~]$ ssh cw464-oss1 uname -a
  The authenticity of host 'oss1 (192.168.100.36)' can't be established.
  ECDSA key fingerprint is SHA256:vGNBvikGukiv1DZ8WEy8o2QfRrgCcVRAJjMipCHjonY.
  ECDSA key fingerprint is MD5:30:bd:59:8c:b5:c9:be:96:17:30:e5:f3:bc:d8:38:71.
  Are you sure you want to continue connecting (yes/no)? yes
  Warning: Permanently added 'oss1,192.168.100.36' (ECDSA) to the list of known hosts.
  Linux cw464-oss1.os.ncsa.edu 3.10.0-862.el7.x86_64 #1 SMP Fri Apr 20 16:44:24 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

  [centos@cw464-ws ~]$ ssh cw464-oss2 uname -a
  The authenticity of host 'oss2 (192.168.100.37)' can't be established.
  ECDSA key fingerprint is SHA256:vGNBvikGukiv1DZ8WEy8o2QfRrgCcVRAJjMipCHjonY.
  ECDSA key fingerprint is MD5:30:bd:59:8c:b5:c9:be:96:17:30:e5:f3:bc:d8:38:71.
  Are you sure you want to continue connecting (yes/no)? yes
  Warning: Permanently added 'oss2,192.168.100.37' (ECDSA) to the list of known hosts.
  Linux cw464-oss2.os.ncsa.edu 3.10.0-862.el7.x86_64 #1 SMP Fri Apr 20 16:44:24 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Set up an ssh key for root (leave passphrase empty)

  [centos@cw464-ws ~]$ sudo -i
  [root@cw464-ws .ssh]# ssh-keygen 
  Generating public/private rsa key pair.
  Enter file in which to save the key (/root/.ssh/id_rsa): 
  Enter passphrase (empty for no passphrase): 
  Enter same passphrase again: 
  Your identification has been saved in /root/.ssh/id_rsa.
  Your public key has been saved in /root/.ssh/id_rsa.pub.
  The key fingerprint is:
  SHA256:0czyxkdKiwxzT6/rLmASyLSonUo9zt7yqHOLEqeXbIc root@cw464-mds1.os.ncsa.edu
  The key's randomart image is:
  +---[RSA 2048]----+
  |                 |
  |  .      +       |
  | + o  o + * .    |
  |. + .  = X =     |
  |.... .  S B o    |
  |o.+o. o  . o     |
  |.*oo.o .  .      |
  |+.E=+   .  .     |
  |.===+o   ++      |
  +----[SHA256]-----+

Set up the workstation's root account to allow the new key to be used

  [root@cw464-ws ~]# cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys 
  cp: overwrite ‘authorized_keys’? yes

Add your own ssh key to the root account as well

  [root@cw464-ws ~]# cat /home/centos/.ssh/authorized_keys  >> /root/.ssh/authorized_keys 

Distribute the ws root ssh key to the root account on all your servers via the centos account

  [root@cw464-ws ~]# mkdir /home/centos/rootjunk
  [root@cw464-ws ~]# cp -a /root/.ssh/ /home/centos/rootjunk/
  [root@cw464-ws ~]# chown -R centos /home/centos/rootjunk/

  [root@cw464-ws ~]# exit

  [centos@cw464-ws ~]$ scp -r rootjunk/ cw464-mds1:
  id_rsa.pub                            100%  407   883.7KB/s   00:00    
  id_rsa                                100% 1679     3.7MB/s   00:00    
  authorized_keys                       100%  825     1.7MB/s   00:00    
  known_hosts                           100%  181   479.8KB/s   00:00 

  [centos@cw464-ws ~]$ scp -r rootjunk/ cw464-oss1:
  id_rsa.pub                            100%  409   729.1KB/s   00:00    
  known_hosts                           100%  171   488.4KB/s   00:00    
  authorized_keys                       100%  409   729.8KB/s   00:00    
  id_rsa                                100% 1679     3.7MB/s   00:00    

  [centos@cw464-ws ~]$ scp -r rootjunk/ cw464-oss2:
  id_rsa.pub                            100%  409   637.2KB/s   00:00    
  known_hosts                           100%  171   364.9KB/s   00:00    
  authorized_keys                       100%  409   890.1KB/s   00:00    
  id_rsa                                100% 1679     3.0MB/s   00:00  

  # Log in to mds1, oss1 and oss2 as centos and copy rootjunk/.ssh to /root/.ssh

  [centos@cw464-mds1 ~]$ sudo -i
  [root@cw464-mds1 ~]# cp -r /home/centos/rootjunk/.ssh /root
  cp: overwrite ‘/root/.ssh/authorized_keys’? yes

  [centos@cw464-oss1 ~]$ sudo -i
  [root@cw464-oss1 ~]# cp -r /home/centos/rootjunk/.ssh /root
  cp: overwrite ‘/root/.ssh/authorized_keys’? yes

  [centos@cw464-oss2 ~]$ sudo -i
  [root@cw464-oss2 ~]# cp -r /home/centos/rootjunk/.ssh /root
  cp: overwrite ‘/root/.ssh/authorized_keys’? yes
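
If root logins fail after the copy, file permissions are a common cause: ssh refuses a private key that other users can read, and sshd in its default strict mode rejects an authorized_keys file or .ssh directory that others can write. A minimal sketch of a permission fix follows; the function name `fix_ssh_perms` is made up for illustration, and in this lab the directory argument would be /root/.ssh.

```shell
# Tighten permissions on an .ssh directory.
# Usage: fix_ssh_perms DIR
fix_ssh_perms() {
    local dir="$1" f
    chmod 700 "$dir"
    # Private key and authorized_keys: owner read/write only
    for f in id_rsa authorized_keys; do
        if [ -f "$dir/$f" ]; then chmod 600 "$dir/$f"; fi
    done
    # Public key and known_hosts may stay world-readable
    for f in id_rsa.pub known_hosts; do
        if [ -f "$dir/$f" ]; then chmod 644 "$dir/$f"; fi
    done
    return 0
}

# Example (as root on each VM):
#   fix_ssh_perms /root/.ssh
```

Once root logins work, consider removing /home/centos/rootjunk on every VM, since it holds a copy of root's private key owned by the centos user.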

Verify you are able to use the root account without sudoing from the centos account

  charles@x360:~$ ssh  -l root WorkstationPublicIPAddress
  [root@cw464-ws ~]# for i in mds1 oss{1..2}; do ssh $i uname -a; done
  Linux cw464-mds1.os.ncsa.edu 3.10.0-862.el7.x86_64 #1 SMP Fri Apr 20 16:44:24 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  Linux cw464-oss1.os.ncsa.edu 3.10.0-862.el7.x86_64 #1 SMP Fri Apr 20 16:44:24 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  Linux cw464-oss2.os.ncsa.edu 3.10.0-862.el7.x86_64 #1 SMP Fri Apr 20 16:44:24 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Remove an interface file that shouldn't be in the image (remnant of image build) on all VMs

The extra file makes the network service show as failed. We only need the eth0 interface file made by cloud-init.

  [root@cw464-ws ~]# rm /etc/sysconfig/network-scripts/ifcfg-enp0s3 
  rm: remove regular file ‘/etc/sysconfig/network-scripts/ifcfg-enp0s3’? yes

Disable selinux on all your VMs

  [root@cw464-ws ~]# nano /etc/sysconfig/selinux 
  [root@cw464-ws ~]# cat /etc/sysconfig/selinux 

  # This file controls the state of SELinux on the system.
  # SELINUX= can take one of these three values:
  #     enforcing - SELinux security policy is enforced.
  #     permissive - SELinux prints warnings instead of enforcing.
  #     disabled - No SELinux policy is loaded.
  SELINUX=disabled
  # SELINUXTYPE= can take one of three two values:
  #     targeted - Targeted processes are protected,
  #     minimum - Modification of targeted policy. Only selected processes are protected. 
  #     mls - Multi Level Security protection.
  SELINUXTYPE=targeted 

  [root@cw464-ws ~]# scp /etc/sysconfig/selinux mds1:/etc/sysconfig/selinux
  selinux                               100%  546   961.9KB/s   00:00   

  [root@cw464-ws ~]# scp /etc/sysconfig/selinux oss1:/etc/sysconfig/selinux
  selinux                               100%  546   961.9KB/s   00:00    

  [root@cw464-ws ~]# scp /etc/sysconfig/selinux oss2:/etc/sysconfig/selinux
  selinux                               100%  546   967.4KB/s   00:00  
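
If you prefer not to open an editor, the same change can be made with sed. This is a sketch, not the method used in the lab; the function name `disable_selinux_in` is hypothetical. On CentOS 7, /etc/sysconfig/selinux is a symlink to /etc/selinux/config, so the sketch uses GNU sed's `--follow-symlinks` option to edit the link target instead of replacing the link with a regular file.

```shell
# Set SELINUX=disabled in the given SELinux config file.
# Usage: disable_selinux_in FILE
disable_selinux_in() {
    # The pattern matches "SELINUX=..." but not "SELINUXTYPE=...",
    # because the "=" must directly follow "SELINUX".
    sed -i --follow-symlinks 's/^SELINUX=.*/SELINUX=disabled/' "$1"
}

# Example (as root):
#   disable_selinux_in /etc/sysconfig/selinux
```

As with the nano edit, the new setting only takes effect after the reboot later in the lab.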

Disable firewalld on all servers

  [root@cw464-ws ~]# systemctl disable firewalld
  Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
  Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.

  [root@cw464-ws ~]# ssh mds1 systemctl disable firewalld
  Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
  Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.

  [root@cw464-ws ~]# ssh oss1 systemctl disable firewalld
  Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
  Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.

  [root@cw464-ws ~]# ssh oss2 systemctl disable firewalld
  Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
  Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
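
Many steps in this lab repeat one command on mds1, oss1 and oss2. A small wrapper can make that less error-prone. The sketch below is an assumption about how you might structure it; `on_all`, `HOSTS` and `RUNNER` are names invented here. Setting `RUNNER=echo` previews the commands without connecting anywhere.

```shell
# Run one command on every Lustre server.
HOSTS="mds1 oss1 oss2"
# RUNNER defaults to ssh; set RUNNER=echo for a dry run.
RUNNER=${RUNNER:-ssh}

on_all() {
    local h
    for h in $HOSTS; do
        echo "== $h"
        $RUNNER "$h" "$@"
    done
}

# Example (as root on ws):
#   on_all systemctl disable firewalld
```

The workstation is not in `HOSTS`, so commands meant for all VMs still need to be run locally on ws as well.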

Distribute the /etc/hosts file to all servers

  [root@cw464-ws ~]# for i in mds1 oss{1..2}; do scp /etc/hosts $i:/etc ; done
  hosts                                           100%  279   315.4KB/s   00:00    
  hosts                                           100%  279   456.4KB/s   00:00    
  hosts                                           100%  279   504.4KB/s   00:00    

Configure yum to use the squid proxy server on iws

Add this line to /etc/yum.conf on all your VMs. This will speed up your yum downloads.

  proxy=http://iws:3128
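
To avoid adding the line twice when you repeat the step, you can append it only if no proxy setting exists yet. This is a minimal sketch with a hypothetical function name, `add_yum_proxy`. It assumes `[main]` is the only section in the file, as in a stock CentOS 7 /etc/yum.conf, so a line appended at the end still belongs to `[main]`.

```shell
# Append a proxy line to a yum config file unless one is already set.
# Usage: add_yum_proxy FILE URL
add_yum_proxy() {
    local conf="$1" url="$2"
    if grep -q '^proxy=' "$conf"; then
        echo "proxy already set in $conf"
    else
        echo "proxy=$url" >> "$conf"
    fi
}

# Example (as root on each VM):
#   add_yum_proxy /etc/yum.conf http://iws:3128
```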

Install wget on all your servers

  [root@cw464-ws ~]# for i in ws mds1 oss{1..2}; do ssh $i yum -y install wget ; done

Set up the Lustre yum repos on the ws, mds1, oss1 and oss2 servers

Note: the workstation will be a Lustre client, so it uses a different repo file

  [root@cw464-ws ~]# wget -O /etc/yum.repos.d/lustre-client.repo http://iws/repofiles/lustre-client.repo

  [root@cw464-mds1 ~]# for i in lustre-server.repo e2fsprogs-wc.repo; do wget -O /etc/yum.repos.d/$i http://iws/repofiles/$i; done

  [root@cw464-oss1 ~]# for i in lustre-server.repo e2fsprogs-wc.repo; do wget -O /etc/yum.repos.d/$i http://iws/repofiles/$i; done

  [root@cw464-oss2 ~]# for i in lustre-server.repo e2fsprogs-wc.repo; do wget -O /etc/yum.repos.d/$i http://iws/repofiles/$i; done

Exclude the e2fsprogs and kernel packages from the CentOS base and updates repos on mds1, oss1 and oss2 only

  [root@cw464-mds1 ~]# nano /etc/yum.repos.d/CentOS-Base.repo 

  [base]
  name=CentOS-$releasever - Base
  mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os&infra=$infra
  #baseurl=http://mirror.centos.org/centos/$releasever/os/$basearch/
  gpgcheck=1
  gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
  exclude = e2fsprogs* kernel*

  #released updates 
  [updates]
  name=CentOS-$releasever - Updates
  mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=updates&infra=$infra
  #baseurl=http://mirror.centos.org/centos/$releasever/updates/$basearch/
  gpgcheck=1
  gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
  exclude = e2fsprogs* kernel*
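
The exclude lines keep yum from pulling the stock CentOS e2fsprogs and kernel packages, so the versions from the Lustre repos are installed instead. If you would rather script the edit than make it in nano three times, the sketch below adds the line directly under the `[base]` and `[updates]` section headers. The function name `add_excludes` is hypothetical, and the sketch assumes the file has no exclude lines yet (it does nothing if it finds one).

```shell
# Add the exclude line under [base] and [updates] in a repo file.
# Usage: add_excludes FILE
add_excludes() {
    local repo="$1" tmp
    if grep -q '^exclude *=' "$repo"; then
        echo "exclude already present in $repo"
        return 0
    fi
    tmp=$(mktemp) || return 1
    # Copy every line; after a matching section header, emit the exclude line.
    awk '
        { print }
        /^\[(base|updates)\]$/ { print "exclude = e2fsprogs* kernel*" }
    ' "$repo" > "$tmp" && cat "$tmp" > "$repo"
    rm -f "$tmp"
}

# Example (as root on mds1, oss1 and oss2):
#   add_excludes /etc/yum.repos.d/CentOS-Base.repo
```

The position of the option inside a section does not matter to yum, so placing it right after the header is equivalent to placing it at the end of the section as shown above.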

Remove e2fsprogs that came with OS from mds1, oss1 and oss2 only, then install Lustre rpms

  yum remove e2fsprogs e2fsprogs-libs

  yum install e2fsprogs kernel lustre lustre-tests

  # Reinstall the cloud-init package, which was removed due to dependencies
  yum install cloud-init

Reboot mds1, oss1 and oss2 for kernel, selinux and firewall changes

  [root@cw464-ws ~]# for i in mds1 oss1 oss2; do ssh $i reboot ; done
  Connection to mds1 closed by remote host.
  Connection to oss1 closed by remote host.
  Connection to oss2 closed by remote host.

Verify kernel version, selinux is disabled and no firewall rules are applied

  [root@cw464-ws ~]# for i in mds1 oss1 oss2; do ssh $i uname -a ; done
  Linux cw464-mds1.os.ncsa.edu 3.10.0-862.2.3.el7_lustre.x86_64 #1 SMP Tue May 22 17:36:23 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  Linux cw464-oss1.os.ncsa.edu 3.10.0-862.2.3.el7_lustre.x86_64 #1 SMP Tue May 22 17:36:23 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  Linux cw464-oss2.os.ncsa.edu 3.10.0-862.2.3.el7_lustre.x86_64 #1 SMP Tue May 22 17:36:23 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
    
  [root@cw464-ws ~]# for i in mds1 oss1 oss2; do ssh $i sestatus ; done
  SELinux status:                 disabled
  SELinux status:                 disabled
  SELinux status:                 disabled
    
  [root@cw464-ws ~]# for i in mds1 oss1 oss2; do ssh $i iptables -nvL ; done
  Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
   pkts bytes target     prot opt in     out     source               destination         

  Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
   pkts bytes target     prot opt in     out     source               destination         

  Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
   pkts bytes target     prot opt in     out     source               destination         
  Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
   pkts bytes target     prot opt in     out     source               destination         
    
  Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
   pkts bytes target     prot opt in     out     source               destination         
    
  Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
   pkts bytes target     prot opt in     out     source               destination         
  Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
   pkts bytes target     prot opt in     out     source               destination         

  Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
   pkts bytes target     prot opt in     out     source               destination         

  Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
   pkts bytes target     prot opt in     out     source               destination      
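
The kernel check above relies on spotting `_lustre` in the release string by eye. A small test can do the same thing in a script. This is a sketch only: `is_lustre_kernel` is a made-up name, and it assumes the patched server kernel always carries the `_lustre` suffix seen in the output above.

```shell
# Succeed if the given kernel release string is a Lustre-patched kernel.
# Usage: is_lustre_kernel RELEASE
is_lustre_kernel() {
    case "$1" in
        *_lustre*) return 0 ;;
        *)         return 1 ;;
    esac
}

# Check the kernel of the machine this runs on.
if is_lustre_kernel "$(uname -r)"; then
    echo "running a Lustre server kernel"
else
    echo "NOT running a Lustre server kernel"
fi
```

Run it on each server; the workstation is expected to report that it is not running a server kernel, because the client packages use the stock kernel.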

Format the mgs and mdt disks on the mds1 server

  [root@cw464-mds1 ~]# lsblk
  NAME   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
  vda    253:0    0   4G  0 disk 
  └─vda1 253:1    0   4G  0 part /
  vdb    253:16   0  16G  0 disk 
  vdc    253:32   0   1G  0 disk 
 
  [root@cw464-mds1 ~]# mkfs.lustre --mgs /dev/vdc

     Permanent disk data:
  Target:     MGS
  Index:      unassigned
  Lustre FS:  
  Mount type: ldiskfs
  Flags:      0x64
                (MGS first_time update )
  Persistent mount opts: user_xattr,errors=remount-ro
  Parameters:

  checking for existing Lustre data: not found
  device size = 1024MB
  formatting backing filesystem ldiskfs on /dev/vdc
    target name   MGS
    4k blocks     262144
    options        -q -O uninit_bg,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F
  mkfs_cmd = mke2fs -j -b 4096 -L MGS  -q -O uninit_bg,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F /dev/vdc 262144
  Writing CONFIGS/mountdata


  [root@cw464-mds1 ~]# mkfs.lustre --fsname=lci --mgsnode=mds1@tcp --mdt --index=0 /dev/vdb

     Permanent disk data:
  Target:     lci:MDT0000
  Index:      0
  Lustre FS:  lci
  Mount type: ldiskfs
  Flags:      0x61
                (MDT first_time update )
  Persistent mount opts: user_xattr,errors=remount-ro
  Parameters: mgsnode=192.168.100.35@tcp

  checking for existing Lustre data: not found
  device size = 16384MB
  formatting backing filesystem ldiskfs on /dev/vdb
    target name   lci:MDT0000
    4k blocks     4194304
    options        -J size=655 -I 1024 -i 2560 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F
  mkfs_cmd = mke2fs -j -b 4096 -L lci:MDT0000  -J size=655 -I 1024 -i 2560 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F /dev/vdb 4194304
  Writing CONFIGS/mountdata    

Mount the mgt and mdt targets on mds1

  [root@cw464-mds1 ~]# mkdir -p /mnt/{mgt,mdt}

  [root@cw464-mds1 ~]# mount -t lustre /dev/vdc /mnt/mgt
  mount.lustre: increased /sys/block/vdc/queue/max_sectors_kb from 512 to 16384

  [root@cw464-mds1 ~]# mount -t lustre /dev/vdb /mnt/mdt
  mount.lustre: increased /sys/block/vdb/queue/max_sectors_kb from 512 to 16384

Format the two ost disks on the oss1 server

  [root@cw464-oss1 ~]# lsblk
  NAME   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
  vda    253:0    0   4G  0 disk 
  └─vda1 253:1    0   4G  0 part /
  vdb    253:16   0  16G  0 disk 
  vdc    253:32   0  16G  0 disk 

  [root@cw464-oss1 ~]# mkfs.lustre --fsname=lci --ost --mgsnode=mds1@tcp --index=0 /dev/vdb

     Permanent disk data:
  Target:     lci:OST0000
  Index:      0
  Lustre FS:  lci
  Mount type: ldiskfs
  Flags:      0x62
                (OST first_time update )
  Persistent mount opts: ,errors=remount-ro
  Parameters: mgsnode=192.168.100.35@tcp

  checking for existing Lustre data: not found
  device size = 16384MB
  formatting backing filesystem ldiskfs on /dev/vdb
  	target name   lci:OST0000
  	4k blocks     4194304
  	options        -J size=400 -I 512 -i 69905 -q -O extents,uninit_bg,dir_nlink,quota,huge_file,flex_bg -G 256 -E resize="4290772992",lazy_journal_init -F
  mkfs_cmd = mke2fs -j -b 4096 -L lci:OST0000  -J size=400 -I 512 -i 69905 -q -O extents,uninit_bg,dir_nlink,quota,huge_file,flex_bg -G 256 -E resize="4290772992",lazy_journal_init -F /dev/vdb 4194304
  Writing CONFIGS/mountdata


  [root@cw464-oss1 ~]# mkfs.lustre --fsname=lci --ost --mgsnode=mds1@tcp --index=1 /dev/vdc

     Permanent disk data:
  Target:     lci:OST0001
  Index:      1
  Lustre FS:  lci
  Mount type: ldiskfs
  Flags:      0x62
               (OST first_time update )
  Persistent mount opts: ,errors=remount-ro
  Parameters: mgsnode=192.168.100.35@tcp

  checking for existing Lustre data: not found
  device size = 16384MB
  formatting backing filesystem ldiskfs on /dev/vdc
  	target name   lci:OST0001
  	4k blocks     4194304
  	options        -J size=400 -I 512 -i 69905 -q -O extents,uninit_bg,dir_nlink,quota,huge_file,flex_bg -G 256 -E resize="4290772992",lazy_journal_init -F
  mkfs_cmd = mke2fs -j -b 4096 -L lci:OST0001  -J size=400 -I 512 -i 69905 -q -O extents,uninit_bg,dir_nlink,quota,huge_file,flex_bg -G 256 -E resize="4290772992",lazy_journal_init -F /dev/vdc 4194304
  Writing CONFIGS/mountdata

Mount the ost targets on oss1

  [root@cw464-oss1 ~]# mkdir -p /mnt/ost{0,1}

  [root@cw464-oss1 ~]# mount -t lustre /dev/vdb /mnt/ost0
  mount.lustre: increased /sys/block/vdb/queue/max_sectors_kb from 512 to 16384

  [root@cw464-oss1 ~]# mount -t lustre /dev/vdc /mnt/ost1
  mount.lustre: increased /sys/block/vdc/queue/max_sectors_kb from 512 to 16384

Repeat the steps above for the oss2 server

Use index 2 and 3, and mount the targets on /mnt/ost2 and /mnt/ost3.
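
Each OST index must be unique within the filesystem, which is why oss2 continues at 2 and 3. To double-check the commands before running them, the sketch below only prints them. The helper name `ost_commands` is hypothetical, and it assumes oss2 has the same /dev/vdb and /dev/vdc disks as oss1.

```shell
# Print (do not run) the format and mount commands for one OST.
# Usage: ost_commands INDEX DEVICE
ost_commands() {
    local index="$1" dev="$2"
    echo "mkfs.lustre --fsname=lci --ost --mgsnode=mds1@tcp --index=$index $dev"
    echo "mkdir -p /mnt/ost$index"
    echo "mount -t lustre $dev /mnt/ost$index"
}

# Commands for the two OSTs on oss2
ost_commands 2 /dev/vdb
ost_commands 3 /dev/vdc
```

Review the printed lines, then run them as root on oss2.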

Install the Lustre client software on the workstation and reboot

  [root@cw464-ws ~]# yum install kmod-lustre-client lustre-client lustre-client-tests
  [root@cw464-ws ~]# reboot ; exit

Mount the Lustre filesystem on workstation

  [root@cw464-ws ~]# mkdir -p /mnt/lci

  [root@cw464-ws ~]# mount -t lustre mds1@tcp:/lci /mnt/lci

  [root@cw464-ws ~]# mount -t lustre
  192.168.100.35@tcp:/lci on /mnt/lci type lustre (rw,lazystatfs)

  [root@cw464-ws ~]# df -h /mnt/lci/
  Filesystem               Size  Used Avail Use% Mounted on
  192.168.100.35@tcp:/lci   62G  181M   59G   1% /mnt/lci

Set up /etc/fstab entries on all systems

  On each of your VMs add /etc/fstab entries
  Example:
  [root@cw464-ws ~]# cat /etc/fstab | grep lustre
  mds1@tcp:/lci /mnt/lci lustre	defaults,_netdev 0 0 

  [root@cw464-ws ~]# ssh mds1 cat /etc/fstab | grep lustre
  /dev/vdb /mnt/mdt lustre defaults 0 0
  /dev/vdc /mnt/mgt lustre defaults 0 0

  [root@cw464-ws ~]# ssh oss1 cat /etc/fstab | grep lustre
  /dev/vdb /mnt/ost0 lustre defaults 0 0
  /dev/vdc /mnt/ost1 lustre defaults 0 0

  [root@cw464-ws ~]# ssh oss2 cat /etc/fstab | grep lustre
  /dev/vdb /mnt/ost2 lustre defaults 0 0
  /dev/vdc /mnt/ost3 lustre defaults 0 0
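
A duplicate fstab line for the same mount point is easy to add by accident when repeating this step on several VMs. The sketch below appends an entry only when the mount point is not already listed. The function name `add_fstab_entry` is made up for illustration, and the entry format follows the examples above.

```shell
# Append a Lustre entry to an fstab file unless the mount point is present.
# Usage: add_fstab_entry FSTAB DEVICE MOUNTPOINT [OPTIONS]
add_fstab_entry() {
    local fstab="$1" dev="$2" mnt="$3" opts="${4:-defaults}"
    # Field 2 of a non-comment fstab line is the mount point.
    if awk -v m="$mnt" '$1 !~ /^#/ && $2 == m { found = 1 } END { exit !found }' "$fstab"; then
        echo "$mnt already in $fstab"
    else
        printf '%s %s lustre %s 0 0\n' "$dev" "$mnt" "$opts" >> "$fstab"
    fi
}

# Examples (as root):
#   on ws:   add_fstab_entry /etc/fstab mds1@tcp:/lci /mnt/lci defaults,_netdev
#   on oss1: add_fstab_entry /etc/fstab /dev/vdb /mnt/ost0
```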

Set up Lustre modules to load on boot for all VMs

  You can manually load the modules like this:

  [root@cw464-mds1 ~]# modprobe -v lustre
  insmod /lib/modules/3.10.0-862.2.3.el7_lustre.x86_64/extra/lustre/fs/lov.ko 
  insmod /lib/modules/3.10.0-862.2.3.el7_lustre.x86_64/extra/lustre/fs/mdc.ko 
  insmod /lib/modules/3.10.0-862.2.3.el7_lustre.x86_64/extra/lustre/fs/lmv.ko 
  insmod /lib/modules/3.10.0-862.2.3.el7_lustre.x86_64/extra/lustre/fs/lustre.ko
  
  Add modprobe commands to /etc/rc.d/rc.local and make it executable.
  
  [root@cw464-mds1 ~]# chmod +x /etc/rc.d/rc.local
  [root@cw464-mds1 ~]# cat /etc/rc.d/rc.local 
  #!/bin/bash
  # THIS FILE IS ADDED FOR COMPATIBILITY PURPOSES
  #
  # It is highly advisable to create own systemd services or udev rules
  # to run scripts during boot instead of using this file.
  #
  # In contrast to previous versions due to parallel execution during boot
  # this script will NOT be run after all other services.
  #
  # Please note that you must run 'chmod +x /etc/rc.d/rc.local' to ensure
  # that this script will be executed during boot.

  touch /var/lock/subsys/local

  modprobe -v lnet
  modprobe -v lustre
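
The comments in rc.local themselves recommend a systemd mechanism over this file. One such alternative, not used in this lab, is a file under /etc/modules-load.d, which systemd reads at boot and which lists one module name per line. The sketch below writes such a file; the function name `write_modules_conf` and the file name lustre.conf are assumptions chosen for illustration.

```shell
# Write a modules-load.d file that loads lnet and lustre at boot.
# Usage: write_modules_conf [FILE]
write_modules_conf() {
    local conf="${1:-/etc/modules-load.d/lustre.conf}"
    # One kernel module name per line
    printf '%s\n' lnet lustre > "$conf"
}

# Example (as root on each VM):
#   write_modules_conf
```

Use either this file or the rc.local lines; there is no need for both.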

Set up some directories for the centos user to test with

  [root@cw464-ws lci]# mkdir -p /mnt/lci/centos

  [root@cw464-ws lci]# chown centos:centos /mnt/lci/centos/

  [root@cw464-ws lci]# su - centos

  [centos@cw464-ws ~]$ cd /mnt/lci/centos/

  [centos@cw464-ws centos]$ mkdir stripe-2osts-1oss stripe-2osts-2oss

  [centos@cw464-ws centos]$ mkdir stripe-4osts-2oss stripe-default

  [centos@cw464-ws centos]$ lfs setstripe -c 2 -i 0 stripe-2osts-1oss

  [centos@cw464-ws centos]$ lfs setstripe -c 2 -i 1 stripe-2osts-2oss

  [centos@cw464-ws centos]$ lfs setstripe -c 4 -i 0 stripe-4osts-2oss

  [centos@cw464-ws centos]$ lfs getstripe *
  stripe-2osts-1oss
  stripe_count:  2 stripe_size:   1048576 stripe_offset: 0
  stripe-2osts-2oss
  stripe_count:  2 stripe_size:   1048576 stripe_offset: 1
  stripe-4osts-2oss
  stripe_count:  4 stripe_size:   1048576 stripe_offset: 0
  stripe-default
  stripe_count:  1 stripe_size:   1048576 stripe_offset: -1
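
The directory names describe where the stripes are expected to land: `-c` sets how many OSTs a file is striped over and `-i` sets the first OST index. The sketch below works out the expected layout, assuming stripes go to consecutive OST indexes starting at the offset and wrapping around, which matches the `lfs getstripe` file output shown later in this lab. Lustre's real allocator can choose differently, so treat this as an aid for reasoning, not a guarantee. `stripe_osts` is a hypothetical helper, and the OST-to-server mapping (0-1 on oss1, 2-3 on oss2) is the one built in this lab.

```shell
# Print the OSTs (and their servers) a file would be striped over.
# Usage: stripe_osts COUNT START_INDEX TOTAL_OSTS
stripe_osts() {
    local count="$1" start="$2" total="$3"
    local i=0 ost oss out=""
    while [ "$i" -lt "$count" ]; do
        ost=$(( (start + i) % total ))
        # In this lab OST 0-1 live on oss1 and OST 2-3 on oss2
        if [ "$ost" -lt 2 ]; then oss=oss1; else oss=oss2; fi
        out="$out OST$ost($oss)"
        i=$((i + 1))
    done
    echo "${out# }"
}

stripe_osts 2 0 4   # stripe-2osts-1oss
stripe_osts 2 1 4   # stripe-2osts-2oss
stripe_osts 4 0 4   # stripe-4osts-2oss
```

With `-c 2 -i 0` both stripes stay on oss1, while `-c 2 -i 1` spreads them over oss1 and oss2, which is the difference the benchmark in the next step is meant to expose.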

Download, inspect and run the dd.sh script

  [centos@cw464-ws centos]$ wget -O /tmp/dd.sh http://iws/benchmark/dd.sh

  [centos@cw464-ws centos]$ cat /tmp/dd.sh 

  [centos@cw464-ws centos]$ /tmp/dd.sh 

Based on the output of the script, what do you think the default stripe count for the filesystem should be? You can cheat if your run looks inconsistent (we don't have an ideal benchmarking environment).

Inspect the striping of your files

  [centos@cw464-ws centos]$ lfs getstripe stripe-default/1.junk 
  stripe-default/1.junk
  lmm_stripe_count:  1
  lmm_stripe_size:   1048576
  lmm_pattern:       1
  lmm_layout_gen:    0
  lmm_stripe_offset: 0
    obdidx		 objid		 objid		 group
         0	            11	          0xb	             0

  [centos@cw464-ws centos]$ lfs getstripe stripe-2osts-1oss/1.junk 
  stripe-2osts-1oss/1.junk
  lmm_stripe_count:  2
  lmm_stripe_size:   1048576
  lmm_pattern:       1
  lmm_layout_gen:    0
  lmm_stripe_offset: 0
    obdidx		 objid		 objid		 group
         0	             5	          0x5	             0
         1	             7	          0x7	             0

  [centos@cw464-ws centos]$ lfs getstripe stripe-2osts-2oss/1.junk 
  stripe-2osts-2oss/1.junk
  lmm_stripe_count:  2
  lmm_stripe_size:   1048576
  lmm_pattern:       1
  lmm_layout_gen:    0
  lmm_stripe_offset: 1
    obdidx		 objid		 objid		 group
         1	            10	          0xa	             0
         2	             4	          0x4	             0

  [centos@cw464-ws centos]$ lfs getstripe stripe-4osts-2oss/2.junk 
  stripe-4osts-2oss/2.junk
  lmm_stripe_count:  4
  lmm_stripe_size:   1048576
  lmm_pattern:       1
  lmm_layout_gen:    0
  lmm_stripe_offset: 0
    obdidx		 objid		 objid		 group
         0	             9	          0x9	             0
         1	            14	          0xe	             0
         2	             8	          0x8	             0
         3	             4	          0x4	             0

Change the default stripe of the filesystem

  [root@cw464-ws lci]# lfs getstripe /mnt/lci
  /mnt/lci
  stripe_count:  1 stripe_size:   1048576 stripe_offset: -1
  /mnt/lci/centos
  stripe_count:  1 stripe_size:   1048576 stripe_offset: -1

  [root@cw464-ws lci]# lfs setstripe -c 4 /mnt/lci

  [root@cw464-ws lci]# lfs getstripe /mnt/lci
  /mnt/lci
  stripe_count:  4 stripe_size:   1048576 stripe_offset: -1
  /mnt/lci/centos
  stripe_count:  4 stripe_size:   1048576 stripe_offset: -1

  [root@cw464-ws lci]# su - centos
  [centos@cw464-ws ~]$ mkdir /mnt/lci/centos/test-newdefault
  [centos@cw464-ws ~]$ lfs getstripe /mnt/lci/centos/test-newdefault
  /mnt/lci/centos/test-newdefault
  stripe_count:  4 stripe_size:   1048576 stripe_offset: -1

Take a look at Lustre disk usage

Notice that the osts have different amounts of free space.

  [root@cw464-ws ~]# lfs df -h
  UUID                       bytes        Used   Available Use% Mounted on
  lci-MDT0000_UUID            8.9G       46.2M        8.0G   1% /mnt/lci[MDT:0]
  lci-OST0000_UUID           15.4G        6.4G        8.2G  44% /mnt/lci[OST:0]
  lci-OST0001_UUID           15.4G        9.3G        5.2G  64% /mnt/lci[OST:1]
  lci-OST0002_UUID           15.4G        6.4G        8.2G  44% /mnt/lci[OST:2]
  lci-OST0003_UUID           15.4G        1.5G       13.0G  10% /mnt/lci[OST:3]

  filesystem_summary:        61.5G       23.6G       34.6G  41% /mnt/lci
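
On a filesystem with many OSTs it helps to pick out the fullest one automatically. The sketch below reads `lfs df` output on standard input and reports the OST with the highest Use% value. `fullest_ost` is a hypothetical name, and the sketch assumes the column layout shown above, with Use% in the fifth column.

```shell
# Read "lfs df" output on stdin and print the fullest OST and its Use%.
fullest_ost() {
    awk '
        # Only OST lines; "64%" + 0 evaluates to the number 64 in awk
        /OST/ { pct = $5 + 0; if (pct > max) { max = pct; name = $1 } }
        END   { if (name != "") print name, max "%" }
    '
}

# Example (on the workstation):
#   lfs df -h | fullest_ost
```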

Explore troubleshooting commands

Verify that your MDT and all OSTs are up; fix any that are down.

  [root@cw464-ws modprobe.d]# lfs check servers
  lci-MDT0000-mdc-ffffa013b645f000 active.
  lci-OST0000-osc-ffffa013b645f000 active.
  lci-OST0001-osc-ffffa013b645f000 active.
  lci-OST0002-osc-ffffa013b645f000 active.
  lci-OST0003-osc-ffffa013b645f000 active.
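
To turn this check into something a script can act on, you can flag every line that does not end in `active.`. The sketch below does that on standard input. The name `check_all_active` is hypothetical, and because this lab only shows the healthy output, the sketch makes no assumption about what an unhealthy line looks like beyond it not ending in `active.`.

```shell
# Read "lfs check servers" output on stdin.
# Return 0 if every target is active, 1 otherwise.
check_all_active() {
    local bad
    # Keep lines that are neither blank nor ending in " active."
    bad=$(grep -v -e ' active\.$' -e '^[[:space:]]*$' || true)
    if [ -n "$bad" ]; then
        echo "Targets needing attention:"
        echo "$bad"
        return 1
    fi
    echo "All targets active"
}

# Example (as root on the workstation):
#   lfs check servers 2>&1 | check_all_active
```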

Investigate additional Lustre resources