Today, I found strange messages on one Solaris 10 SPARC host:
Apr 10 05:38:30 nfsclient nfs: [ID 626546 kern.notice] NFS write error on host nfsserver: Stale NFS file handle.
Apr 10 05:38:30 nfsclient nfs: [ID 702911 kern.notice] (file handle: 90ebba38 9c597108 a00c5 0 fba3300 a0300 0 1d000000)
The task is clear: find the file represented by the file handle 90ebba38 9c597108 a00c5 0 fba3300 a0300 0 1d000000 on the Solaris NFS server (also SPARC).
I could not find any hint where to start. The guidelines I found on the internet were about UFS and described file handles in different formats. /etc/mnttab does not contain any identifier I could match either. So it was time to analyze it myself.
First of all, we need to identify the file system. After some reading of the docs, I found the command that helped me:
root@nfsserver# echo ::nfs_exptable | mdb -k
<snip>
/export/share1/rw 3015443b180
    rtvp: 301ead0e240 ref : 1 flag: 0x4000 (EX_ROOT) VROOT
    dvp : 301e396dcc0 anon: 0 logb: 0
    seci: 302dc9c1b00 nsec: 1 fsid: (0x90ebba38 0x9c597108)
    Security Flavors :
        sys ref: 1 flag: 0x24 (M_RW,M_EXP)
Bingo! Look at the fsid: it matches the first 8 bytes of the file handle. Now we need to find the inode number. I could not identify the inode number directly from the given file handle. I found several articles on the internet describing the process, but for UFS and OpenSolaris on i386. We have ZFS running on SPARC, and none of the procedures described elsewhere worked in my case. Therefore I decided to snoop the traffic while editing a known file:
root@nfsclient# vi /net/nfsserver/export/share1/rw/test.file

root@nfsserver# snoop -v -d nxge2 nfsclient|grep NFS
NFS: ----- Sun NFS -----
NFS:
NFS: Proc = 7 (Write to file)
NFS: File handle = [C0FB]
NFS: 90EBBA389C597108000AE1C700000000CBBE3300000A0300000000001D000000
NFS: Offset = 0
NFS: Size = 5
NFS: Stable = ASYNC
NFS:
RPC: Program = 100003 (NFS), version = 3, procedure = 21

root@nfsclient# ls -li /net/nfsserver/export/share1/rw/test.file
51169 -rw-r----- 1 root Server 4 Apr 10 15:25 /net/nfsserver/export/share1/rw/test.file

root@nfsserver# ls -li /export/share1/rw/test.file
51169 -rw-r----- 1 root Server 4 Apr 10 15:25 /export/share1/rw/test.file
So we have double-checked that the inode number is 51169. Let's convert it to hex:
root@nfsserver# echo "obase=16;51169"|bc
C7E1
The next step is to find where the inode number is stored in the file handle 90EBBA389C597108000AE1C700000000CBBE3300000A0300000000001D000000. The value File handle = [C0FB] is just a checksum, not the inode number. We already know that the first 8 bytes are the fsid. The remaining fields can be taken from the OpenSolaris source code of nfs.h:
typedef struct {
    fsid_t   _fh3_fsid;                   /* filesystem id */
    ushort_t _fh3_len;                    /* file number length */
    char     _fh3_data[NFS_FH3MAXDATA];   /* and data */
    ushort_t _fh3_xlen;                   /* export file number length */
    char     _fh3_xdata[NFS_FH3MAXDATA];  /* and data */
} fhandle3_t;
That means we can split the file handle into several parts:
90EBBA389C597108        filesystem id
000A                    length of the following data block (10)
E1C700000000CBBE3300    data
000A                    length of the following data block (10)
0300000000001D000000    data
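The split above can be reproduced with Python's struct module. This is a minimal sketch, assuming the fhandle3_t layout shown above with NFS_FH3MAXDATA equal to 10 (consistent with the 32-byte handle seen in snoop) and big-endian (SPARC) byte order. The function name split_fhandle3 is my own, not part of any Solaris tool.

```python
import struct

# Assumed fhandle3_t layout on SPARC (big-endian, no padding), NFS_FH3MAXDATA = 10:
#   fsid (2 x 32-bit words), len (16-bit), data[10], xlen (16-bit), xdata[10]
FHANDLE3_FMT = ">IIH10sH10s"


def split_fhandle3(handle_hex: str) -> dict:
    """Split a 32-byte NFSv3 file handle, given as a hex string, into its fields."""
    raw = bytes.fromhex(handle_hex)
    if len(raw) != struct.calcsize(FHANDLE3_FMT):
        raise ValueError(f"expected 32 bytes, got {len(raw)}")
    fsid0, fsid1, dlen, data, xlen, xdata = struct.unpack(FHANDLE3_FMT, raw)
    return {
        "fsid": (f"{fsid0:08x}", f"{fsid1:08x}"),
        "len": dlen,
        "data": data[:dlen].hex(),
        "xlen": xlen,
        "xdata": xdata[:xlen].hex(),
    }


if __name__ == "__main__":
    fh = "90EBBA389C597108000AE1C700000000CBBE3300000A0300000000001D000000"
    for name, value in split_fhandle3(fh).items():
        print(name, value)
```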
Do you see it already? Right: the inode number C7E1 is stored at the beginning of the first data block, in reversed byte order (E1C7). The last 4 bytes of the block are a "generation" number.
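The same rule, written as a small Python function. It is a sketch that treats the first 6 bytes of the 10-byte data block as the object (inode) number and the last 4 bytes as the generation, both stored least-significant byte first. As far as I can tell this matches the zfid_short_t structure in the OpenSolaris ZFS sources (zf_object[6], zf_gen[4]), but treat that layout as an assumption for your release. decode_zfs_fid is a hypothetical helper name.

```python
def decode_zfs_fid(data_hex: str) -> tuple[int, int]:
    """Decode a 10-byte ZFS short fid (hex string) into (object number, generation).

    Assumed layout: 6 bytes of object number followed by 4 bytes of
    generation, each stored least-significant byte first.
    """
    data = bytes.fromhex(data_hex)
    if len(data) != 10:
        raise ValueError("expected a 10-byte data block")
    obj = int.from_bytes(data[:6], "little")
    gen = int.from_bytes(data[6:], "little")
    return obj, gen


if __name__ == "__main__":
    inode, gen = decode_zfs_fid("E1C700000000CBBE3300")
    # prints: inode C7E1 hex = 51169, generation 33BECB
    print(f"inode {inode:X} hex = {inode}, generation {gen:X}")
```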
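Knowing this, we can now easily parse the original file handle from the log and find the inode number. The kernel prints each 32-bit word without leading zeros, so first pad every word to 8 hex digits (a00c5 becomes 000a00c5, 0 becomes 00000000):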
90ebba389c597108        filesystem id
000a
00c5000000000fba3300    => inode c500 hex
000a
0300000000001d000000
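Putting these steps together for the handle from the kernel log: a Python sketch that pulls the words out of the kern.notice message, pads each one to 8 hex digits and decodes the inode from the first data block. It assumes a SPARC (big-endian) client, the 32-byte layout above and a ZFS data block whose first 6 bytes hold the object number least-significant byte first. The regular expression and the name inode_from_log_line are my own illustration.

```python
import re


def inode_from_log_line(line: str) -> int:
    """Extract the inode number from a Solaris '(file handle: ...)' log message.

    Assumes a big-endian (SPARC) client printing eight 32-bit words and a
    ZFS data block whose first 6 bytes hold the object number, LSB first.
    """
    match = re.search(r"file handle:\s*([0-9a-fA-F ]+)\)", line)
    if not match:
        raise ValueError("no file handle found in line")
    words = match.group(1).split()
    if len(words) != 8:
        raise ValueError(f"expected 8 words, got {len(words)}")
    # Restore the leading zeros the kernel drops when printing each word.
    handle = bytes.fromhex("".join(w.zfill(8) for w in words))
    # Bytes 0-7: fsid, bytes 8-9: data length, bytes 10-19: data block.
    data_len = int.from_bytes(handle[8:10], "big")
    if data_len != 10:
        raise ValueError(f"unexpected data length {data_len}")
    return int.from_bytes(handle[10:16], "little")


if __name__ == "__main__":
    log = ("Apr 10 05:38:30 nfsclient nfs: [ID 702911 kern.notice] (file handle: "
           "90ebba38 9c597108 a00c5 0 fba3300 a0300 0 1d000000)")
    inode = inode_from_log_line(log)
    # prints: inode c500 hex = 50432
    print(f"inode {inode:x} hex = {inode}")
```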
Now, it’s just a piece of cake:
root@nfsserver# echo "ibase=16;C500"|bc
50432
root@nfsserver# find /export/share1/rw/ -inum 50432 -ls
50432 1 -rw-r--r-- 1 stat_adm stat_adm 318 Apr 10 05:38 /export/share1/rw/status/homedir.status
Gotcha!
Update
How does it look on i386 clients?
A file on the same file system was reported as file handle 38baeb90 871599c e4b60a00 39 33ba0f 3a000 0 1d on an x86 NFS client. It looks like a completely different number, right? Not really. First of all, we need to add the missing leading zeros so that each 32-bit word has 8 hex digits:
38baeb90 0871599c e4b60a00 00000039 0033ba0f 0003a000 00000000 0000001d
x86 hardware is little-endian (the opposite byte order to SPARC, which is big-endian), so we need to reverse the order of the bytes within each word:
90ebba38 9c597108 000ab6e4 39000000 0fba3300 000a0300 00000000 1d000000
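The padding and byte-swapping can be sketched in Python as follows, applied here to the first five words, which cover the fsid and the first data block. It assumes each printed word is a 32-bit value the x86 client read from little-endian memory, so padding to 8 hex digits and reversing its 4 bytes gives the SPARC-style word. swap_words is a hypothetical helper name.

```python
def swap_words(words: list[str]) -> list[str]:
    """Pad each 32-bit hex word to 8 digits and reverse its byte order."""
    swapped = []
    for word in words:
        raw = bytes.fromhex(word.zfill(8))  # e.g. "39" -> 00 00 00 39
        swapped.append(raw[::-1].hex())     # -> 39 00 00 00
    return swapped


if __name__ == "__main__":
    x86_words = "38baeb90 871599c e4b60a00 39 33ba0f".split()
    # prints: 90ebba38 9c597108 000ab6e4 39000000 0fba3300
    print(" ".join(swap_words(x86_words)))
```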
And this is the already known format, which we can split into parts:
90ebba38 9c597108       filesystem id
000a
b6e4390000000fba3300    => inode 39e4b6
000a
0300000000001d000000
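For an x86 client the whole chain fits in a few lines of Python. Rather than swapping hex strings, this sketch packs the printed values back as little-endian 32-bit integers, which recovers the raw handle bytes in one step. It assumes the same 32-byte handle from a SPARC ZFS server (big-endian length field, object number stored least-significant byte first); inode_from_x86_words is a hypothetical name.

```python
import struct


def inode_from_x86_words(words: list[str]) -> int:
    """Decode the ZFS object number from a handle printed by an x86 client.

    The client prints each 32-bit word from little-endian memory, so packing
    the values back as little-endian integers recovers the raw handle bytes.
    """
    if len(words) != 8:
        raise ValueError(f"expected 8 words, got {len(words)}")
    raw = struct.pack("<8I", *(int(w, 16) for w in words))
    # Bytes 8-9 hold the data length written by the big-endian SPARC server.
    if int.from_bytes(raw[8:10], "big") != 10:
        raise ValueError("unexpected data length")
    # Bytes 10-15: object number, least-significant byte first.
    return int.from_bytes(raw[10:16], "little")


if __name__ == "__main__":
    words = "38baeb90 871599c e4b60a00 39 33ba0f 3a000 0 1d".split()
    inode = inode_from_x86_words(words)
    # prints: inode 39e4b6 hex = 3794102
    print(f"inode {inode:x} hex = {inode}")
```

The decimal number is what goes into find -inum on the server.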
And now the simplest part:
root@nfsserver# echo "ibase=16;39E4B6"|bc
3794102
root@nfsserver# find /export/share1/rw/ -inum 3794102 -ls
3794102 3 -rw-r--r-- 1 stat_adm stat_adm 12898 Apr 11 09:43 /export/share1/rw/var/storageinfo/show_storage.20130411.tmp
Easy, isn’t it? :-)
First of all, such a nice write-up; it gives me a boost for working on my current issue.
I am getting the same error:
Oct 8 07:53:32 logas2p nfs: [ID 626546 kern.notice] NFS write error on host pfrxsap1g: Stale NFS file handle.
Oct 8 07:53:32 logas2p nfs: [ID 702911 kern.notice] (file handle: 53c6d65 2 a0000 b113 78247711 a0000 2 2d24214d 0)
But is this method applicable to NFSv4 as well? When I try to parse the file handle I am able to find an inode, but I am not sure the one I found is correct.
Please help, I am struggling with a production system.
Once you have the file or directory details, what actions can be taken to resolve the issue?
Thanks for your reply.
[…] As @user3188445 pointed out, the file handle format has likely changed. Worth reverse-engineering it to figure out the format on your release/platform, see good example at syslog.eu/stale-nfs-file-handle-sparc-zfs […]