Chuck Lever gave us some pointers during our Linux NFS troubleshooting. Several of our Linux servers have been experiencing NFS access issues: they enter a “hung” state and are hard to recover without rebooting. We use the Linux automounter, and upgrading it to autofs 4.x did not help. We also noticed that the client hangs on a single mount point (a different one each time), which is plausible since automount on Linux processes its mounts sequentially. Being able to forcibly unmount a file system is hit-or-miss. If we are able to unmount, we can bring the system back up. Otherwise, it’s reboot time.
Did you know?
Using “umount -f” does NOT forcibly unmount the file system, as you might expect. If you come from a Solaris background, you probably ran that command and got squat. The -f option for umount in Linux still requires the file system to be idle. It is useful in situations where the NFS server is no longer available AND the file system is idle. The choice of option letter is rather poor, because in almost all other commands -f means either “file” or “force”.
To forcibly unmount the file system in Linux (the equivalent of Solaris’s umount -f), you should use “umount -l”. The option is documented as “lazy”: it detaches the file system from the hierarchy immediately, whether or not it is busy, and cleans up the remaining references once it is no longer in use.
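The escalation described above (plain umount, then -f, then -l) can be sketched as a small shell function. This is only an illustration: the function name unmount_stubborn and the overridable UMOUNT variable are made up for this sketch so the logic can be dry-run with a stub, and the messages are arbitrary.

```shell
#!/bin/sh
# Sketch: try progressively more drastic unmounts of one mount point.
# UMOUNT is a hypothetical override hook (not from the original post) so
# the logic can be exercised without touching a real mount.
UMOUNT=${UMOUNT:-umount}

unmount_stubborn() {
    mnt=$1
    # 1. Plain unmount: succeeds only if the file system is idle.
    if $UMOUNT "$mnt" 2>/dev/null; then
        echo "unmounted $mnt"
        return 0
    fi
    # 2. -f: may help when the NFS server is unreachable and the mount is idle.
    if $UMOUNT -f "$mnt" 2>/dev/null; then
        echo "force-unmounted $mnt"
        return 0
    fi
    # 3. -l: lazy unmount, detach now and clean up once no longer busy.
    if $UMOUNT -l "$mnt"; then
        echo "lazily unmounted $mnt"
        return 0
    fi
    echo "could not unmount $mnt" >&2
    return 1
}
```

Run as root, something like "unmount_stubborn /mnt" would try each step in turn. Keep in mind that a lazy unmount only hides the mount point; processes already stuck on it may stay stuck.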
According to NetApp BURT 114482 (and we use them toasters :) ), the Linux kernel RPC client picks a source port for its TCP connection that is already in use by IPMI/RMCP services on certain Intel motherboard hardware. There are at least two other blogs out there with similar info:
- Spencer Shelper’s blog got around the issue by creating a dummy daemon listening on port 623
- http://advogato.org/person/seb128/diary.html?start=3
So the workaround seems to be to convince the client to not use that port at all. Following Spencer’s example, I created two xinetd services, dummy1-stream and dummy1-dgram to listen on 623 tcp/udp & re-ran our mount/umount tests.
kreaper% cat dummy1-udp
service dummy1
{
    id = dummy1-dgram
    disable = no
    socket_type = dgram
    protocol = udp
    wait = no
    user = root
    port = 623
    server = /bin/false
}
kreaper% cat dummy1
service dummy1
{
    id = dummy1-stream
    disable = no
    socket_type = stream
    protocol = tcp
    port = 623
    wait = no
    user = root
    server = /bin/false
}
I used the following test script, called tmount.sh, to observe the mount/umount behavior.
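# cat ./tmount.sh
#!/bin/sh
# Mount and unmount the same NFS export in an endless loop,
# printing the mount return code on every pass.
INCR=1
while true
do
    mount -o nfsvers=3,tcp,rw,rsize=32768,wsize=32768,timeo=600,actimeo=120,intr nfs-server.tigr.org:/vol/scratch /mnt
    MOUNTERR=$?
    echo "$INCR - Mount return code: $MOUNTERR"
    # Only touch and unmount /mnt if the mount succeeded.
    # (-eq is the portable numeric test; == is not valid in plain sh.)
    if [ "$MOUNTERR" -eq 0 ]
    then
        ls /mnt > /dev/null
        umount /mnt
    fi
    INCR=$((INCR + 1))
done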
No cigar. The clients still hung.
# tcpdump -v host nfs_server
~snip
client-0-1-8.tigr.org.asf-rmcp > nfs-server.tigr.org.4046: UDP, length: 116
09:22:12.010195 IP (tos 0x0, ttl 64, id 2, offset 0, flags [DF], length: 144) client-0-1-8.tigr.org.asf-rmcp > nfs-server.tigr.org.4046: UDP, length: 116
09:22:15.011319 IP (tos 0x0, ttl 64, id 3, offset 0, flags [DF], length: 144) client-0-1-8.tigr.org.asf-rmcp > nfs-server.tigr.org.4046: UDP, length: 116
09:22:18.012104 IP (tos 0x0, ttl 64, id 4, offset 0, flags [DF], length: 144) client-0-1-8.tigr.org.asf-rmcp > nfs-server.tigr.org.4046: UDP, length: 116
09:22:21.013072 IP (tos 0x0, ttl 64, id 5, offset 0, flags [DF], length: 144) client-0-1-8.tigr.org.asf-rmcp > nfs-server.tigr.org.4046: UDP, length: 116
09:22:24.014043 IP (tos 0x0, ttl 64, id 6, offset 0, flags [DF], length: 144) client-0-1-8.tigr.org.asf-rmcp > nfs-server.tigr.org.4046: UDP, length: 116
09:22:27.025134 IP (tos 0x0, ttl 64, id 22623, offset 0, flags [DF], length: 60) client-0-1-8.tigr.org.asf-rmcp > nfs-server.tigr.org.sunrpc: S [tcp sum ok] 1147223182:1147223182(0) win 5840
09:22:30.025740 IP (tos 0x0, ttl 64, id 22625, offset 0, flags [DF], length: 60) client-0-1-8.tigr.org.asf-rmcp > nfs-server.tigr.org.sunrpc: S [tcp sum ok] 1147223182:1147223182(0) win 5840
09:22:36.025646 IP (tos 0x0, ttl 64, id 22627, offset 0, flags [DF], length: 60) client-0-1-8.tigr.org.asf-rmcp > nfs-server.tigr.org.sunrpc: S [tcp sum ok] 1147223182:1147223182(0) win 5840
09:22:48.025444 IP (tos 0x0, ttl 64, id 22629, offset 0, flags [DF], length: 60) client-0-1-8.tigr.org.asf-rmcp > nfs-server.tigr.org.sunrpc: S [tcp sum ok] 1147223182:1147223182(0) win 5840
09:23:12.026036 IP (tos 0x0, ttl 64, id 22631, offset 0, flags [DF], length: 60) client-0-1-8.tigr.org.asf-rmcp > nfs-server.tigr.org.sunrpc: S [tcp sum ok] 1147223182:1147223182(0) win 5840
In a separate window, while the test script was running, here’s the netstat output:
#netstat -alp | grep nfs-server |grep mount
tcp 0 1 client-0-1-8.tig:asf-rmcp nfs-server.tigr.org:sunrpc SYN_SENT 19097/mount
What?
client-0-1-8:/etc/xinetd.d # grep asf-secure-rmcp /etc/services
asf-secure-rmcp 664/tcp # ASF Secure Remote Management and Control Protocol
asf-secure-rmcp 664/udp # ASF Secure Remote Management and Control Protocol
So we now see that there is one MORE IPMI/RMCP port, 664. A related Google search turned up this.
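To avoid hunting for these ports one grep at a time, something like the following could list every RMCP-related entry in a services file in one go. It is a rough sketch: the function name rmcp_ports is made up here, and it assumes the entries of interest all have “rmcp” in the service name, as the two seen so far do.

```shell
#!/bin/sh
# Sketch: print "name port/proto" for every RMCP-related entry in a
# services file (defaults to /etc/services).
rmcp_ports() {
    file=${1:-/etc/services}
    # Skip comment lines; keep entries whose service name contains "rmcp".
    awk '$1 !~ /^#/ && $1 ~ /rmcp/ { print $1, $2 }' "$file"
}
```

On the client above, “rmcp_ports” with no argument should report at least the 664 entries shown in the grep output.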
After adding two more dummy daemons listening on port 664 tcp/udp, we got this:
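The two extra services were modeled on the port 623 ones. A small generator along these lines would produce the same kind of stanza for any port; the function name dummy_service and the service name dummy2 are placeholders invented for this sketch, and the attributes simply mirror the dummy1 files shown earlier. Depending on the xinetd setup, a service name that is not in /etc/services may also need “type = UNLISTED”; that is not in the original files and is left out here.

```shell
#!/bin/sh
# Sketch: print an xinetd placeholder service for one port and protocol.
# Usage: dummy_service NAME PORT tcp|udp
dummy_service() {
    name=$1
    port=$2
    proto=$3
    # Map the protocol to the socket type xinetd expects.
    case $proto in
        tcp) sock=stream ;;
        udp) sock=dgram ;;
        *)   echo "unknown protocol: $proto" >&2; return 1 ;;
    esac
    cat <<EOF
service $name
{
    id = $name-$sock
    disable = no
    socket_type = $sock
    protocol = $proto
    port = $port
    wait = no
    user = root
    server = /bin/false
}
EOF
}
```

For example, “dummy_service dummy2 664 tcp” prints a stream stanza for port 664 that could be redirected into a file under /etc/xinetd.d (the file name is up to you), followed by an xinetd reload.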
# ./tmount.sh
~snip
100 - Mount return code: 0
101 - Mount return code: 0
102 - Mount return code: 0
103 - Mount return code: 0
104 - Mount return code: 0
nfs bindresvport: Address already in use
105 - Mount return code: 32
nfs bindresvport: Address already in use
106 - Mount return code: 32
nfs bindresvport: Address already in use
107 - Mount return code: 32
nfs bindresvport: Address already in use
108 - Mount return code: 32
nfs bindresvport: Address already in use
109 - Mount return code: 32
nfs bindresvport: Address already in use
110 - Mount return code: 32
nfs bindresvport: Address already in use
111 - Mount return code: 32
nfs bindresvport: Address already in use
112 - Mount return code: 32
nfs bindresvport: Address already in use
113 - Mount return code: 32
A lot better! Instead of just hanging, the client calls now fail. There seems to be an issue with port selection. Here are a couple of RHEL/FC Bugzilla posts that are quite close to what we are seeing:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=146629
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=155470
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=154678
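As an aside on the “Mount return code: 32” lines in the test output: mount(8) documents its exit status as a bitmask, and 32 means “mount failure”. A tiny decoder, sketched below with a made-up function name, could be dropped into tmount.sh to make the loop output more readable. The messages are paraphrased from the mount(8) man page.

```shell
#!/bin/sh
# Sketch: turn a mount(8) exit status (a bitmask) into words.
decode_mount_rc() {
    rc=$1
    if [ "$rc" -eq 0 ]; then
        echo "success"
        return 0
    fi
    out=""
    for pair in "1:incorrect invocation or permissions" \
                "2:system error" \
                "4:internal mount bug" \
                "8:user interrupt" \
                "16:problems writing or locking /etc/mtab" \
                "32:mount failure" \
                "64:some mount succeeded"; do
        bit=${pair%%:*}   # number before the first colon
        msg=${pair#*:}    # text after the first colon
        if [ $(( rc & bit )) -ne 0 ]; then
            out="${out:+$out; }$msg"
        fi
    done
    echo "$out"
}
```

In the test loop, the echo line could then become something like: echo "$INCR - Mount return code: $MOUNTERR ($(decode_mount_rc "$MOUNTERR"))".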
In short: there seem to be two bugs, one in automount and one in /bin/mount (found in util-linux). Also note that automount still uses mount to do the actual mounting. While the automount bug seems to have been fixed in autofs-4.1.3-114, the /bin/mount bug is still out there. Based on Bug 154678 notes, it seems likely that a fix should be available soon…
Here is another interesting link on Solaris/OpenSolaris use of privileged ports.

I tore my hair out trying to figure this problem out. Only after figuring out the cause was I able to find anything on Google. As a side note, the Linux 2.6 kernel should use ports between 650 and 1023, so interference with port 623 should be a thing of the past on 2.6 kernels. Rather than creating a fake daemon, I used sysctl to adjust the kernel parameter sunrpc.min_resvport to a value > 664 instead of its default value of 650. This seems to have solved my problem.
I had this problem as well, but naturally wasn’t able to find anything relevant in Google until after I’d figured out my problem. Actually, my problem was slightly different because the ports used by Linux with 2.6 kernels range between 650 and 1023, so there’s no conflict on port 623. To avoid the use of 664, rather than creating a do-nothing daemon, I used sysctl to change the sunrpc.min_resvport from 650 to > 664 and all seems very happy now.
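For anyone wanting to try the sysctl approach from the comments above, here is a rough sketch that checks whether given ports fall inside a reserved-port range. The function names are made up for this sketch, and sunrpc.max_resvport is an assumption on my part (it is the companion setting to min_resvport, but the comments only mention the minimum).

```shell
#!/bin/sh
# Sketch: succeed if PORT lies inside the inclusive range MIN..MAX.
# Usage: port_in_range PORT MIN MAX
port_in_range() {
    [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# Report, for each port, whether the RPC client could still pick it.
# Usage: check_ports MIN MAX PORT...
check_ports() {
    min=$1
    max=$2
    shift 2
    for p in "$@"; do
        if port_in_range "$p" "$min" "$max"; then
            echo "$p: inside $min-$max, may collide"
        else
            echo "$p: outside $min-$max"
        fi
    done
}

# On a live client the range would come from the kernel, e.g.:
#   check_ports "$(sysctl -n sunrpc.min_resvport)" \
#               "$(sysctl -n sunrpc.max_resvport)" 623 664
# and the change described in the comments would look like this
# (665 is just one value > 664):
#   sysctl -w sunrpc.min_resvport=665
```

With the 650 to 1023 range quoted in the comments, port 623 is reported as outside the range and 664 as inside it, which matches what both commenters describe. A value set with “sysctl -w” does not survive a reboot, so it would also need to go into /etc/sysctl.conf or an equivalent.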