Monday 4 May 2015

Linux Tuning for Ultra Low Latency

These are some of my notes from BIOS and Linux tuning for Ultra Low Latency with SubMicroTrading.

Don’t blindly copy ANY settings here; test each one for impact and pick the values that suit your system. I include them as possible points of interest.
I have spent many weeks simply doing this. It’s tedious but necessary to determine the best settings for your system.

BIOS settings

Research every option. Remember to change one item at a time and run your benchtest to ascertain the impact.

Disable hyper-threading
Disable turbo mode if overclocking
Disable all options related to power saving (eg CPU C State support)
Set SATA Configuration to ENHANCED
ACPI Power Management Features : APIC ACPI SCI IRQ and High Precision Timer ENABLED
If using a PCIe 3 network card, check that the slot holding the card HAS PCIe 3 enabled; the default may be PCIe 2
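
After a BIOS change it is worth confirming from Linux that the setting actually took effect. Here is a minimal sketch for the hyper-threading item only, assuming an x86 box whose /proc/cpuinfo exposes the "siblings" and "cpu cores" fields. The function name ht_enabled is my own.

```shell
# Report whether hyper-threading looks enabled, based on a cpuinfo-style file.
# "siblings" is the logical CPU count per package, "cpu cores" the physical
# core count per package; more siblings than cores implies HT is on.
# Assumption: both fields are present (prints "no" if they are missing).
ht_enabled() {
    awk -F: '
        /^siblings/  { s = $2 + 0 }
        /^cpu cores/ { c = $2 + 0 }
        END { if (s > c) print "yes"; else print "no" }' "${1:-/proc/cpuinfo}"
}

# Example (not executed here):
#   ht_enabled        # reads /proc/cpuinfo
```

Run it before and after the BIOS change so you know the benchtest is measuring what you think it is.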

Operating System Tuning for Ultra Low Latency

It’s over 20 years since I worked at a low level with Unix/Linux and the truth is I have forgotten more about the kernel than I now know. I am NOT an O/S specialist. For SubMicroTrading I didn’t have the luxury of paying someone to configure Linux for me so I had to do it myself. I still have my trusty Stevens UNIX Network Programming book which helped, and of course today we have Google! David Riddoch from Solarflare was also helpful in answering questions regarding Solarflare and OpenOnload tuning.

I started with Red Hat and dismissed the Realtime variant as it was slower for my benchtest. I currently recommend CentOS 5.10 (I am somewhat behind the later versions but honestly what do they have that helps with low latency?).

Read the Solarflare optimisation document; I only wish it had existed when I started! INSTALL OpenOnload!! I don’t understand why people working at microsecond-level latency still don’t use kernel bypass! OpenOnload is great as it’s non-intrusive and requires ZERO application code changes.

INSTALL SOLARFLARE (requires the Linux install to have a dev env) … note these notes are OLD so versions will be well out of date

Copy the Solarflare drivers to /INSTALL/SOLARFLARE

s1) rpmbuild --rebuild /INSTALL/SOLARFLARE/sfc-3.0.6.2199-1.src.rpm

==> CREATES  /usr/src/redhat/RPMS/x86_64/kernel-module-sfc-RHEL5-2.6.18-194.el5-3.0.6.2199-1.x86_64.rpm

s2) Install the RPM

rpm -ivh /usr/src/redhat/RPMS/x86_64/kernel-module-sfc-RHEL5-2.6.18-194.el5-3.0.6.2199-1.x86_64.rpm

==> eth2 and eth3 are now available
==> use rpm -e first if an old version is installed

s3) install OpenOnload

    tar -xvf openonload-20100923.tar
    ./scripts/onload_install
    modprobe -r sfc
    modprobe sfc

    openonload now ready to use

s4) install BIOS update tools
   
    gunzip SF-104451-LS-4_Solarstorm_Linux_and_VMware_ESX_Utilities_RPM.tgz
    tar -xvf SF-104451-LS-4_Solarstorm_Linux_and_VMware_ESX_Utilities_RPM.tar
    ==> creates ==> sfutils-3.0.8.2216-1.rpm
    rpm -ivh /INSTALL/SOLARFLARE/sfutils-3.0.8.2216-1.rpm
   
    (if you get a clash with a previous version, use rpm -e {oldRpm})
   
s5) check if a BIOS update is required and update
   
    sfupdate
    sfupdate --write
   
    onload_tool disable_cstates persist


s6) set up an env var with your required profile, eg latency :-

export PRERUN="onload --profile=latency "


s7) simply add $PRERUN to the start of your application invocation command

$PRERUN java …….
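
If the same start script has to run on machines without OpenOnload (a dev box, say), the prefix can be built conditionally. This is only a sketch; the function name build_prerun is my own and the profile defaults to latency as in s6.

```shell
# Print the launch prefix for the application.
# Prints "onload --profile=<profile> " when an onload binary is on the PATH,
# otherwise prints nothing so the application still starts (without bypass).
build_prerun() {
    profile="${1:-latency}"
    if command -v onload >/dev/null 2>&1; then
        printf 'onload --profile=%s ' "$profile"
    fi
}

# Example (not executed here):
#   PRERUN="$(build_prerun latency)"
#   $PRERUN java …….
```

Silently falling back is convenient for testing but dangerous in production, so you may prefer to make the script fail loudly when onload is missing.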


Beware O/S upgrades: Linux 6 has some extra horrid latency which after several days of tweaking I still hadn’t eradicated … I went back to CentOS 5.10

Here are my notes from CentOS / RedHat installation

Deselect virtualisation
Disable firewall
Disable SELinux
Delete SWAP partition (you don’t want swapping so ensure you have enough memory!)

Obviously disabling SELinux and the firewall is for benchtesting in a controlled environment. For colocation running you need to determine an appropriate security level for your org. If you must have a firewall between you and the exchange then use a hardware one. Benchtest without security, then with security on, so you know the impact.

Protect cores against unwanted intrusion (will be discussed when I blog on using thread affinity)

Avoid millisecond latency impact by using a discrete threading model with core affinity via the kernel param isolcpus. The O/S scheduler won’t place ordinary tasks on these cores, so you will need to use thread affinity to bind threads to the protected cores (code to follow in a later blog).

Edit the /boot/grub/grub.conf

kernel /vmlinuz-2.6.18-194.el5 ro root=LABEL=RH_ROOT    nohz=off  isolcpus=6,7,8,9,10,11    rhgb
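
After rebooting you can confirm which cores the kernel really isolated, and as a quick experiment pin a whole process onto one of them with taskset. This is a rough sketch: isolated_cpus is my own helper name and my_app is a placeholder. Note that taskset binds the entire process, whereas the per-thread affinity mentioned above has to be done inside the application.

```shell
# Print the isolcpus list found in a kernel command line string
# (prints nothing if the parameter is absent).
isolated_cpus() {
    for word in $1; do
        case "$word" in
            isolcpus=*) printf '%s\n' "${word#isolcpus=}" ;;
        esac
    done
}

# Example (not executed here):
#   isolated_cpus "$(cat /proc/cmdline)"
#   taskset -c 6 ./my_app      # my_app is a placeholder; runs it on core 6 only
```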

Kernel Params

There are many ! Here are some to look at :-

transparent_hugepage=never
intel_idle.max_cstate=0
nohz=off
nosoftlockup
idle=poll
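
A misspelt boot param is generally passed over without stopping the boot, so it pays to check what the running kernel was actually started with. Here is a small sketch that lists the params you expected but which are absent from the command line. The name missing_params is my own.

```shell
# Print each wanted parameter that does not appear in the command line.
# Usage: missing_params "<cmdline string>" param1 param2 ...
missing_params() {
    cmdline=" $1 "
    shift
    for want in "$@"; do
        case "$cmdline" in
            *" $want "*) ;;                 # present, nothing to report
            *) printf '%s\n' "$want" ;;     # absent
        esac
    done
}

# Example (not executed here):
#   missing_params "$(cat /proc/cmdline)" nohz=off nosoftlockup idle=poll
```

This only tells you the param was passed, not that your kernel version honours it, so still benchtest each one.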

Disable unwanted services

/sbin/chkconfig --list | grep "5:on"
 
chkconfig irqbalance off
chkconfig anacron off
chkconfig atd off
chkconfig avahi-daemon off
chkconfig bluetooth off
chkconfig cups off      
chkconfig hidd off      
chkconfig isdn off      
chkconfig pand off      
chkconfig rhnsd off     
chkconfig sendmail off  
chkconfig cpuspeed off  
chkconfig NetworkManager off
chkconfig iptables off
chkconfig ip6tables off
chkconfig libvirt-guests off

These are the services I disabled. Obviously you need to ensure you don’t need a service before you disable it.
IRQBalance and CpuSpeed were the main services that I wished to disable … at the risk of sounding like a broken record: disable a single service, bench test, rinse, repeat. Don’t disable ANY service without checking if YOU need it first!
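
If you script the list, a dry-run mode makes it easier to review what will be switched off before committing. Here is a minimal sketch; disable_services and the APPLY variable are my own naming.

```shell
# Print (dry run) or execute "chkconfig <service> off" for each named service.
# Set APPLY=1 to really run the commands; by default they are only printed.
disable_services() {
    for svc in "$@"; do
        if [ "${APPLY:-0}" = "1" ]; then
            chkconfig "$svc" off
        else
            printf 'chkconfig %s off\n' "$svc"
        fi
    done
}

# Example (not executed here):
#   disable_services irqbalance cpuspeed            # review the output
#   APPLY=1 disable_services irqbalance             # then apply one at a time
```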

System Scripts : rc.local

Edit /etc/rc.local

ethtool -C eth2 adaptive-rx off
ethtool -C eth2 rx-usecs 0 rx-frames 0 rx-usecs-high 0 rx-usecs-low 0 pkt-rate-low 0 pkt-rate-high 0
ethtool -C eth2 rx-usecs-irq 60
ethtool -A eth2 rx off tx off

ethtool -C eth3 adaptive-rx off
ethtool -C eth3 rx-usecs 0 rx-frames 0 rx-usecs-high 0 rx-usecs-low 0 pkt-rate-low 0 pkt-rate-high 0
ethtool -C eth3 rx-usecs-irq 60
ethtool -A eth3 rx off tx off

echo 0 > /sys/class/net/eth2/device/lro
echo 0 > /sys/class/net/eth3/device/lro

Only use rx-usecs-irq 60 IF using OpenOnload … You can experiment with this setting. I use spin reading so should never require an IRQ, but I found that if I set it lower or higher I could get nasty jitter.
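
Repeating the same block per interface is easy to get wrong when copying and pasting, so one option is to generate the commands from a list of interface names. This sketch only prints the commands; nic_tuning_commands is my own name and the settings are exactly those listed above.

```shell
# Print the coalescing, flow-control and LRO commands for each interface.
# Nothing is executed; review the output, then pipe it to "sh" to apply it.
nic_tuning_commands() {
    for nic in "$@"; do
        printf 'ethtool -C %s adaptive-rx off\n' "$nic"
        printf 'ethtool -C %s rx-usecs 0 rx-frames 0 rx-usecs-high 0 rx-usecs-low 0 pkt-rate-low 0 pkt-rate-high 0\n' "$nic"
        printf 'ethtool -C %s rx-usecs-irq 60\n' "$nic"
        printf 'ethtool -A %s rx off tx off\n' "$nic"
        printf 'echo 0 > /sys/class/net/%s/device/lro\n' "$nic"
    done
}

# Example (not executed here):
#   nic_tuning_commands eth2 eth3
#   nic_tuning_commands eth2 eth3 | sh
```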


System Scripts : sysctl.conf

Edit /etc/sysctl.conf

kernel.sysrq = 0
kernel.core_uses_pid = 1

net.ipv4.tcp_low_latency=1

# Controls the default maximum size of a message queue, in bytes
kernel.msgmnb = 65536

# Controls the maximum size of a message, in bytes
kernel.msgmax = 65536

# Controls the maximum shared segment size, in bytes
kernel.shmmax = 68719476736

# Controls the total amount of shared memory, in pages
kernel.shmall = 4294967296

# isolcpus=6,7,8,9,10,11 is a kernel boot param, not a sysctl key: set it in /boot/grub/grub.conf (see above)
kernel.vsyscall64 = 2
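
Each sysctl key maps to a file under /proc/sys (dots become slashes), which gives a cheap way to spot misspelt or unsupported keys before running sysctl -p. Here is a sketch, with helper names of my own choosing, assuming simple "key = value" lines and # comments.

```shell
# Map a sysctl key such as "kernel.shmmax" to its /proc/sys path.
sysctl_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Print every key in a sysctl.conf-style file that has no /proc/sys entry
# on the running kernel.
unknown_sysctl_keys() {
    # Drop comments and blank lines, then keep only the text before "=".
    sed -e 's/#.*//' -e '/^[[:space:]]*$/d' \
        -e 's/[[:space:]]*=.*//' -e 's/^[[:space:]]*//' "$1" |
    while read -r key; do
        [ -e "$(sysctl_path "$key")" ] || printf '%s\n' "$key"
    done
}

# Example (not executed here):
#   unknown_sysctl_keys /etc/sysctl.conf
#   sysctl -p /etc/sysctl.conf
```

Any key it prints either has a typo or does not exist on the kernel you are running.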

Some other useful CentOS "stuff"

Set Run Level

Change the default run level in /etc/inittab from 5 to 3

id:3:initdefault:

Network config

/etc/sysconfig/network-scripts/ifcfg-eth*

After changing, run

/etc/init.d/network restart

Label Root Partition

Label the root partition, eg to RH_ROOT … so it’s not confused with any other later O/S installs

SET DEVICE LABEL READY FOR GRUB  (** DON’T FORGET TO UPDATE fstab OR IT WON’T BOOT **)
e2label /dev/sda7 RH_ROOT

Edit /etc/fstab
LABEL=RH_ROOT          /                       ext3    defaults        1 1

Edit the /boot/grub/grub.conf
         kernel /vmlinuz-2.6.18-194.el5 ro root=LABEL=RH_ROOT    nohz=off  isolcpus=6,7,8,9,10,11    rhgb
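
Since a mismatch between fstab and grub.conf leaves you with a box that won’t boot, a quick consistency check before rebooting is cheap insurance. Here is a sketch with helper names of my own; it only reads the two files.

```shell
# Print the device spec (first field) of the "/" entry in an fstab-style file.
fstab_root_spec() {
    awk '$1 !~ /^#/ && $2 == "/" { print $1; exit }' "$1"
}

# Print the value of the first root= parameter found in a grub.conf-style file.
grub_root_spec() {
    sed -n 's/.*[[:space:]]root=\([^[:space:]]*\).*/\1/p' "$1" | head -n 1
}

# Example (not executed here):
#   e2label /dev/sda7                       # should print RH_ROOT
#   fstab_root_spec /etc/fstab              # should print LABEL=RH_ROOT
#   grub_root_spec /boot/grub/grub.conf     # should print LABEL=RH_ROOT
```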
         

Check Kernel CPU Params

     cat /sys/devices/system/cpu/cpuidle/current_driver
    
         intel_idle.max_cstate=0   idle=poll   transparent_hugepage=never processor.max_cstate=0