Tip: Set your damn host timeouts!

January 23rd, 2013
JK

I saw a question in the NetApp group on LinkedIn today: "what is the downtime (in seconds) for a FAS2240 high-availability machine in case one controller fails and the other takes over?"

My pre-first-cup-of-coffee answer was less about the question and more about proper upkeep of a system running on a SAN/NAS. Any SAN/NAS. Set your damn timeouts on your hosts!

A short answer to the question at hand: In the case of FC, the LUN visibility timeout due to failover is usually less than 5ms. In the case of an IP-based protocol, the failover takes longer because the IPs need to be brought up on the partner and the ARP broadcast needs to happen. The whole process can take 30-60 seconds, but in a “good failover” the app would see only a fraction of a second.

One thing to note is that the type of failure can dictate the length. How will the other node know its partner failed? It can be up to 180 seconds before an issue is deemed a failure and the partner takes over.

Will the partner take over in case of NIC failure? Well... that depends. What was cf.takeover.on_network_interface_failure set to? What about other failover situations?

The watchdog heartbeat over the backplane needs to constantly poll and monitor. You can completely kill off a controller, and the partner would need to see a few failures before it attempted a takeover. During all of this time, the applications would still be connecting to the downed controller and getting no response.

I can’t begin to count how many people I have talked to who had issues because of this: customers we told this to who just never did it, as well as people online.

Your SAN/NAS will go down. It may be due to upgrades, it may be due to failures. This is just how they work. The key is that the hosts connecting to them do not need to react badly to a temporary lapse in access.

If a server cannot access a LUN or mount point, the world starts to crumble in a blackhole abyss of SCSI or NFS errors. The martians attack, and the server cries. A Linux box may auto-switch the mount point to read-only after a single unwritten IO if it cannot write it for longer than its timeout. An NFS mount may go stale.

I know this has happened at least once to everyone reading this. Remember the whole obnoxious read-only root filesystem issue with older RedHat clones of RHEL3 and RHEL4 on VMware? Yeah, that was a pain.

Well, let’s take a quick look at what’s happening here. A host is trying to write a bit of data, so it’s sitting in the buffer. It attempts to write. What, huh, Dude, Where’s My SAN? The LUN is offline or not visible. That data is sitting in memory waiting to be written. The host checks its retry settings, retry count, and SCSI error timeouts. It hangs out and waits, then tries to write again. And again. Until the SCSI timeout is up. Once it is, the host marks that mount point as bad. Had the timeout been set longer, it would have kept trying.

In the middle of all of this, if your host is set up for MPIO, it would attempt a connection on a different path. This is where an Active/Active SAN with an IB backplane shines. Depending on the error type, the host system could keep on chugging along, sending data to the other node directly, even before the cluster failover takes place.

It is highly recommended that you set your host systems to a SCSI and NFS timeout of 180 seconds. If you have SnapDrive installed on Windows or Unix, this is usually done for you. The same goes if you have VSC on VMware.
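On Linux, NFS timeouts live in the mount options rather than in sysfs, so step one is knowing what your mounts are actually using. Here is a rough sketch that lists every NFS mount with its hard/soft mode, timeo and retrans. The function name nfs_timeouts is made up for this example, and it assumes the usual /proc/mounts layout. Keep in mind that per nfs(5), timeo is in tenths of a second, so timeo=600 means 60 seconds per attempt.

```shell
#!/bin/sh
# nfs_timeouts: hypothetical helper, not part of any NetApp or distro tooling.
# Prints "<mountpoint> <hard|soft|unknown> timeo=<value> retrans=<value>"
# for every nfs/nfs4 entry in a mounts file (defaults to /proc/mounts).
nfs_timeouts() {
    mounts="${1:-/proc/mounts}"
    awk '$3 == "nfs" || $3 == "nfs4" {
        timeo = "default"; retrans = "default"; mode = "unknown"
        # Field 4 holds the comma-separated mount options.
        n = split($4, opt, ",")
        for (i = 1; i <= n; i++) {
            if (opt[i] ~ /^timeo=/)   timeo = substr(opt[i], 7)
            if (opt[i] ~ /^retrans=/) retrans = substr(opt[i], 9)
            if (opt[i] == "hard" || opt[i] == "soft") mode = opt[i]
        }
        print $2, mode, "timeo=" timeo, "retrans=" retrans
    }' "$mounts"
}
```

Run nfs_timeouts with no arguments on a host to see its live mounts. Anything that comes back as soft with a small timeo is a mount that will give up long before a slow takeover finishes.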

By default, VMware Tools, NetApp SnapDrive, and other host utilities change these timeouts for you when installed, because you can’t always trust your end users to set them.
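Trust, but verify. A minimal sketch for checking what a Linux host actually ended up with: it reads every SCSI device timeout from sysfs and flags anything below the minimum you pass in (180 if you pass nothing). The function name check_scsi_timeouts and the SYSFS_ROOT override are inventions for this example; the override only exists so you can point it at a test directory instead of the real /sys.

```shell
#!/bin/sh
# check_scsi_timeouts: hypothetical read-only audit helper.
# Usage: check_scsi_timeouts [minimum_seconds]
# SYSFS_ROOT is an assumed override for testing; it defaults to /sys.
check_scsi_timeouts() {
    want="${1:-180}"
    root="${SYSFS_ROOT:-/sys}"
    low=0
    for f in "$root"/class/scsi_generic/*/device/timeout; do
        [ -r "$f" ] || continue      # glob matched nothing, or file unreadable
        cur=$(cat "$f")
        if [ "$cur" -lt "$want" ]; then
            echo "LOW $f $cur"
            low=$((low + 1))
        else
            echo "OK $f $cur"
        fi
    done
    [ "$low" -eq 0 ]                 # exit status 0 only when nothing is low
}
```

Because it returns non-zero when any device is too low, it drops easily into a monitoring check or a post-install script without changing anything on the host.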

If you have systems which cannot have SnapDrive installed and need to set timeouts manually, there are a few ways to do it. (Oracle RAC servers are one instance where SnapDrive is bad to have installed.)

PowerShell snippet to query a VMware server or vCenter (VC) for Windows system names, and set all Windows VMs to 190-second timeouts:

    # Requires VMware PowerCLI and an existing Connect-VIServer session.
    # The OSFullName filter (reported by VMware Tools) limits the run to Windows guests.
    $VMs = Get-VM | Where-Object { $_.PowerState -eq "PoweredOn" -and $_.Guest.OSFullName -like "*Windows*" }
    ForEach ($VM in $VMs) {
        # Open the guest's HKLM hive remotely (needs Remote Registry and admin rights on the guest)
        $reg = [Microsoft.Win32.RegistryKey]::OpenRemoteBaseKey('LocalMachine', $VM.Guest.HostName)
        $regKey = $reg.OpenSubKey("SYSTEM\CurrentControlSet\Services\Disk", $true)
        Write-Host "Registry Value Before:" $VM.Guest.HostName "-" $regKey.GetValue("TimeoutValue")
        $regKey.SetValue('TimeoutValue', 190, 'DWord')
        Write-Host "Registry Value After:" $VM.Guest.HostName "-" $regKey.GetValue("TimeoutValue")
    }

A snippet for Linux systems to put in /etc/rc.local to find all disks and set them to a higher timeout. As root:

    # Walk every SCSI device timeout in sysfs and raise it to 180 seconds.
    for i in /sys/class/scsi_generic/*/device/timeout; do
        [ -w "$i" ] || continue
        echo "Old Value $i: $(cat "$i")"
        echo 180 > "$i"
        echo "New Value $i: $(cat "$i")"
    done

If you have a Linux host with udev, and it is a VMware box, VMware Tools creates a udev rules file which sets timeouts. These may be lower than NetApp’s recommended timeout of 180 seconds. Edit this file on each host: /etc/udev/rules.d/99-vmware-scsi-udev.rules

    # Redhat systems
    ACTION=="add", BUS=="scsi", SYSFS{vendor}=="VMware, " , SYSFS{model}=="VMware Virtual S", RUN+="/bin/sh -c 'echo 180 >/sys$DEVPATH/device/timeout'"

    # Debian systems
    ACTION=="add", SUBSYSTEMS=="scsi", ATTRS{vendor}=="VMware " , ATTRS{model}=="Virtual disk ", RUN+="/bin/sh -c 'echo 180 >/sys$DEVPATH/device/timeout'"
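If you would rather check what a rules file is currently set to before editing it, something like the following might do. It is only a sketch: the function name udev_rule_timeouts is made up, and it assumes the rules write the timeout with an "echo N >...device/timeout" command like the ones in this post. Rules written some other way will simply not show up.

```shell
#!/bin/sh
# udev_rule_timeouts: hypothetical helper that prints the timeout value
# each rule in a udev rules file would write, one value per line.
# Assumes rules of the form: RUN+="/bin/sh -c 'echo N >/sys$DEVPATH/device/timeout'"
udev_rule_timeouts() {
    rules="${1:-/etc/udev/rules.d/99-vmware-scsi-udev.rules}"
    # Drop comment lines, then pull the number out of "echo N >...device/timeout".
    grep -v '^[[:space:]]*#' "$rules" |
        sed -n 's/.*echo \([0-9][0-9]*\) *>.*device\/timeout.*/\1/p'
}
```

If any line of output is less than 180, that is the rule to fix.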

I hope this helps someone. If nothing else, just remember… set your damn timeouts!

Read more from JK at: www.jk-47.com
