Stale NFS file handle – SPARC and ZFS (x86 update)

Today, I found strange messages on one Solaris 10 SPARC host:

Apr 10 05:38:30 nfsclient nfs: [ID 626546 kern.notice] NFS write error on host nfsserver: Stale NFS file handle.
Apr 10 05:38:30 nfsclient nfs: [ID 702911 kern.notice] (file handle: 90ebba38 9c597108 a00c5 0 fba3300 a0300 0 1d000000)

The task is clear: find the file represented by the file handle 90ebba38 9c597108 a00c5 0 fba3300 a0300 0 1d000000 on the Solaris NFS server (also SPARC).

I could not find any hint on where to start. The guides I found on the internet covered UFS and described file handles in a different format, and /etc/mnttab does not contain any identifier I could match. So it was time to analyze it myself.

First of all, we need to identify the file system. After reading some documentation, I found a command that helped:

root@nfsserver# echo ::nfs_exptable | mdb -k

<snip>

/export/share1/rw    3015443b180     
    rtvp: 301ead0e240         ref : 1         flag: 0x4000 (EX_ROOT) VROOT
    dvp : 301e396dcc0         anon: 0         logb: 0               
    seci: 302dc9c1b00         nsec: 1         fsid: (0x90ebba38 0x9c597108)
    Security Flavors :
        sys       ref: 1        flag: 0x24 (M_RW,M_EXP)

Bingo! See the fsid: it matches the first 8 bytes of the file handle. Now we need to find the inode number, which I could not read directly from the file handle. I found several articles on the internet describing the process, but they covered UFS on OpenSolaris, and therefore i386. We have ZFS running on SPARC, and none of the procedures described elsewhere worked in my case. Therefore I decided to snoop the traffic while editing a known file:

root@nfsclient# vi /net/nfsserver/export/share1/rw/test.file

root@nfsserver# snoop -v -d nxge2 nfsclient|grep NFS

NFS: ----- Sun NFS -----
NFS:
NFS: Proc = 7 (Write to file)
NFS: File handle = [C0FB]
NFS: 90EBBA389C597108000AE1C700000000CBBE3300000A0300000000001D000000
NFS: Offset = 0
NFS: Size = 5
NFS: Stable = ASYNC
NFS:
RPC: Program = 100003 (NFS), version = 3, procedure = 21

root@nfsclient# ls -li /net/nfsserver/export/share1/rw/test.file
51169 -rw-r----- 1 root Server 4 Apr 10 15:25 /net/nfsserver/export/share1/rw/test.file

root@nfsserver# ls -li /export/share1/rw/test.file
51169 -rw-r----- 1 root Server 4 Apr 10 15:25 /export/share1/rw/test.file

Client and server agree that the inode is 51169. Let’s convert it to hex:

root@nfsserver# echo "obase=16;51169"|bc
C7E1

The next step is to find where the inode number is stored in the file handle 90EBBA389C597108000AE1C700000000CBBE3300000A0300000000001D000000. The File handle = [C0FB] value is just some kind of checksum printed by snoop, not the inode number. We already know that the first 8 bytes are the fsid. The rest of the layout can be taken from nfs.h in the OpenSolaris source code:

   typedef struct {
           fsid_t  _fh3_fsid;                      /* filesystem id */
           ushort_t _fh3_len;                      /* file number length */
           char    _fh3_data[NFS_FH3MAXDATA];      /* and data */
           ushort_t _fh3_xlen;                     /* export file number length */
           char    _fh3_xdata[NFS_FH3MAXDATA];     /* and data */
   } fhandle3_t;

That means we can split the file handle into several parts:

      90EBBA389C597108 filesystem id
                  000A following data block length (10)
  E1C700000000CBBE3300 data 
                  000A following data block length (10)
  0300000000001D000000 data

Already see it? Correct, the inode number C7E1 is stored at the beginning of the first data block with its bytes in reverse order (E1 C7). The last 4 bytes of that block are a “generation” number and are not part of the inode.
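The split and the byte reversal can be sketched in a few lines of Python. This is only an illustration of the steps above, not code from Solaris: the helper names `split_fh3` and `decode_zfs_fid` are made up here, the length fields are assumed to be big-endian as they appear in this capture from a SPARC server, and the 6-byte object number plus 4-byte generation layout is an assumption inferred from the data.

```python
def split_fh3(handle_hex):
    """Split a hex-encoded NFSv3 handle following the fhandle3_t layout."""
    raw = bytes.fromhex(handle_hex)
    fsid, pos = raw[:8], 8
    blocks = []
    for _ in range(2):  # file data block first, then export data block
        # Assumption: 2-byte length, most significant byte first (000A = 10)
        length = int.from_bytes(raw[pos:pos + 2], "big")
        blocks.append(raw[pos + 2:pos + 2 + length])
        pos += 2 + length
    return fsid, blocks[0], blocks[1]


def decode_zfs_fid(data):
    """Assumed layout: 6-byte object (inode) number, then 4-byte generation,
    both stored least significant byte first."""
    inode = int.from_bytes(data[:6], "little")
    generation = int.from_bytes(data[6:10], "little")
    return inode, generation


fsid, file_data, export_data = split_fh3(
    "90EBBA389C597108000AE1C700000000CBBE3300000A0300000000001D000000"
)
inode, generation = decode_zfs_fid(file_data)
print(fsid.hex(), inode, hex(generation))  # 90ebba389c597108 51169 0x33becb
```

The decoded value 51169 is the inode that ls -li reported for test.file, which is the check that the assumed layout fits this handle.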

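Knowing this, we can now easily parse the file handle from the original error message and find the inode number. Each space-separated group in the message is a 32-bit word printed without leading zeros, so pad every group to 8 hex digits before joining them: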

      90ebba389c597108 filesystem id
                  000a 
  00c5000000000fba3300 => inode c500 hex
                  000a
  0300000000001d000000
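For handles copied straight from the syslog message, the padding and the lookup of the inode could be scripted roughly like this. It is a sketch under the same assumptions as the manual parsing above (8-byte fsid, 2-byte length, inode in the first 6 data bytes with the least significant byte first); `syslog_words_to_hex` is a hypothetical helper name.

```python
def syslog_words_to_hex(logged):
    """Zero-pad each printed 32-bit word to 8 hex digits and join the words."""
    return "".join(word.zfill(8) for word in logged.split())


logged = "90ebba38 9c597108 a00c5 0 fba3300 a0300 0 1d000000"
handle = syslog_words_to_hex(logged)
raw = bytes.fromhex(handle)

# Skip the 8-byte fsid and the 2-byte length; the first data block follows.
data = raw[10:20]
# Assumption: the first 6 bytes are the inode, least significant byte first.
inode = int.from_bytes(data[:6], "little")

print(handle)
print(inode)  # 50432
```

The printed number can be passed directly to find -inum on the server.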

Now, it’s just a piece of cake:

root@nfsserver# echo "ibase=16;C500"|bc
50432
root@nfsserver# find /export/share1/rw/ -inum 50432 -ls
50432    1 -rw-r--r--   1 stat_adm stat_adm      318 Apr 10 05:38 /export/share1/rw/status/homedir.status

Gotcha!

Update

How does it look on i386 clients?

A file on the same file system was reported as file handle 38baeb90 871599c e4b60a00 39 33ba0f 30a00 0 1d on an x86 NFS client. It looks like a completely different number, right? Not really. First of all, we need to pad each 32-bit word with leading zeros to 8 hex digits:

38baeb90 0871599c e4b60a00 00000039 0033ba0f 00030a00 00000000 0000001d

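x86 hardware is little-endian (the opposite byte order to SPARC), and the handle is apparently printed as 32-bit words, so the bytes inside each word appear reversed. We need to swap the byte order within every word: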

90ebba38 9c597108 000ab6e4 39000000 0fba3300 000a0300 00000000 1d000000
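Doing the padding and the byte swap by hand is error-prone, so here is a small sketch of both steps in Python. The function name `swap_words` is hypothetical, and it assumes the handle was printed as 32-bit words on a little-endian client.

```python
def swap_words(logged):
    """Pad each logged word to 8 hex digits and reverse its 4 bytes."""
    swapped = []
    for word in logged.split():
        word_bytes = bytes.fromhex(word.zfill(8))
        swapped.append(word_bytes[::-1].hex())
    return " ".join(swapped)


# The sixth word is 30a00, i.e. the value that byte-swaps to 000a0300.
x86_logged = "38baeb90 871599c e4b60a00 39 33ba0f 30a00 0 1d"
print(swap_words(x86_logged))
# 90ebba38 9c597108 000ab6e4 39000000 0fba3300 000a0300 00000000 1d000000
```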

And this is the already known format, which we can split into its parts:

   90ebba38 9c597108 
                000a
b6e4390000000fba3300 => inode 39e4b6
                000a
0300000000001d000000
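The whole x86 procedure can also be condensed with Python's struct module: packing the logged words as little-endian 32-bit integers reproduces the handle bytes, and the inode is then read as before. This is a sketch only; `x86_logged_to_inode` is a made-up name, and the offsets assume the same layout as above (8-byte fsid, 2-byte length, 6-byte inode with the least significant byte first).

```python
import struct


def x86_logged_to_inode(logged):
    """Rebuild the handle bytes from words logged on a little-endian client
    and return the inode number from the first data block."""
    words = [int(word, 16) for word in logged.split()]
    # "<I" packs each word least significant byte first, which undoes the
    # reversal introduced by printing the handle as 32-bit words.
    raw = struct.pack(f"<{len(words)}I", *words)
    # Assumption: bytes 0-7 are the fsid, 8-9 the length, 10-15 the inode.
    return int.from_bytes(raw[10:16], "little")


print(x86_logged_to_inode("38baeb90 871599c e4b60a00 39 33ba0f 30a00 0 1d"))
# 3794102
```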

And now the simplest part:

root@nfsserver# echo "ibase=16;39E4B6"|bc
3794102
root@nfsserver# find /export/share1/rw/ -inum 3794102 -ls
3794102    3 -rw-r--r--   1 stat_adm stat_adm    12898 Apr 11 09:43 /export/share1/rw/var/storageinfo/show_storage.20130411.tmp

Easy, isn’t it? :-)

2 Comments

  1. First of all, such a nice write-up; it gave me a boost to work on my current issue.
    I am getting the same error:
    Oct 8 07:53:32 logas2p nfs: [ID 626546 kern.notice] NFS write error on host pfrxsap1g: Stale NFS file handle.
    Oct 8 07:53:32 logas2p nfs: [ID 702911 kern.notice] (file handle: 53c6d65 2 a0000 b113 78247711 a0000 2 2d24214d 0)

    But is this method applicable to NFS v4 as well? When I try to parse the file handle I am able to find an inode, but I am not sure it is the right one.
    Please help, I am struggling with a prod system.

    Once you have the file or directory details, what actions can be taken to resolve the issue?

    Thanks for your reply.
