NFS environment setup:
Server:
- yum install -y nfs-utils rpcbind
- systemctl start rpcbind
- systemctl start nfs-server
- echo "/tmp/test *(rw,sync,no_root_squash)" >> /etc/exports
- exportfs -rv
Client:
- yum install -y nfs-utils rpcbind
- mkdir -p nfs
- mount -t nfs serverIp:/tmp/test/son nfs
Reproducing the problem:
Server:
- dd if=/dev/zero of=img.xfs bs=1M count=100
- mkfs.xfs img.xfs
- mkdir /tmp/test
- mount -o loop img.xfs /tmp/test
- mkdir /tmp/test/son
- echo "/tmp/test *(rw,async,no_root_squash,no_all_squash,insecure,fsid=1)" > /etc/exports
- exportfs -rv
Client:
- mount -t nfs -o vers=3 serverIp:/tmp/test/son nfs # NFSv3 is required to reproduce
Server:
- echo > /etc/exports
- exportfs -rv
- umount /tmp/test # fails with EBUSY (target is busy)
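Investigation:
- The problem only occurs when the NFS client mounts a subdirectory; mounting the root of the export works fine.
- Before the export is removed on the server, fuser shows /tmp/test in use by knfsd kernel threads. After the export is removed, the kernel threads no longer hold it, but umount still fails.
- Debugging shows that the mount's reference count, mnt->mnt_count, is wrong, and that is what prevents the umount.
- When the client mounts a subdirectory, the path exp_rootfh -> exp_parent increments mnt->mnt_count by one, and this is the extra reference. At umount time the count is 3, so umount returns busy; in a normal umount the count is 2.
- Call chain: exp_rootfh->exp_parent->exp_get_by_name->sunrpc_cache_lookup_rcu->svc_export_init->path_get->mnt_get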
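The busy check can be sketched as a small user-space C model. This is a simplification under stated assumptions, not kernel code: struct toy_mount, toy_mnt_get, toy_mnt_put, toy_umount and toy_subdir_mount_scenario are hypothetical stand-ins for struct mount, mnt_get, mntput and the reference check in the non-lazy umount path, and the baseline of 2 is taken from the counts observed above rather than from the kernel source.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the kernel's struct mount; only the
 * reference count matters in this model. */
struct toy_mount {
    int mnt_count;
};

/* References umount expects on an idle mount (the count of 2 seen
 * in the normal umount case). */
#define TOY_UMOUNT_BASELINE 2

/* Models mnt_get(): take one reference on the mount. */
static void toy_mnt_get(struct toy_mount *m)
{
    m->mnt_count++;
}

/* Models mntput(): drop one reference; never go below zero. */
static void toy_mnt_put(struct toy_mount *m)
{
    assert(m->mnt_count > 0);
    m->mnt_count--;
}

/* Models the non-lazy umount check: any reference beyond the
 * baseline means the mount is still in use. */
static int toy_umount(const struct toy_mount *m)
{
    return m->mnt_count > TOY_UMOUNT_BASELINE ? -EBUSY : 0;
}

/* Replays the reproduction: a subdirectory mount creates an export
 * cache entry that pins the mount (path_get -> mnt_get). If the entry
 * is never freed, umount sees 3 references and fails with -EBUSY. */
int toy_subdir_mount_scenario(int cache_entry_freed)
{
    struct toy_mount m = { .mnt_count = TOY_UMOUNT_BASELINE };

    toy_mnt_get(&m);        /* reference held by the export cache entry */
    if (cache_entry_freed)
        toy_mnt_put(&m);    /* freeing the entry drops its path reference */

    return toy_umount(&m);
}
```

In this model, toy_subdir_mount_scenario(0) returns -EBUSY because the cache entry's reference is still held, and toy_subdir_mount_scenario(1) returns 0.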
Fix patch:
SUNRPC/cache: Allow garbage collection of invalid cache entries
https://patchwork.kernel.org/project/linux-nfs/patch/20200114165738.922961-1-trond.myklebust@hammerspace.com/
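The patch title describes letting the SUNRPC cache garbage-collect entries that are not valid. Below is a toy C model of that idea. It assumes (these notes do not confirm it) that the leaked reference belongs to an export cache entry that never became valid, so an expiry check that skips invalid entries never frees it. All names (struct toy_cache_entry, toy_is_expired_before, toy_is_expired_after, toy_cache_clean) are hypothetical, and the logic is a simplification, not the patch's actual diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified cache entry; in this model it pins one
 * mount reference until it is freed. */
struct toy_cache_entry {
    bool valid;          /* set once the entry has been filled in */
    long expiry_time;    /* time after which the entry is stale */
    bool freed;          /* already released by the cleaner */
};

/* Before: entries that never became valid are never treated as expired. */
static bool toy_is_expired_before(const struct toy_cache_entry *e, long now)
{
    if (!e->valid)
        return false;
    return e->expiry_time < now;
}

/* After: validity no longer blocks expiry, so invalid entries age out too. */
static bool toy_is_expired_after(const struct toy_cache_entry *e, long now)
{
    return e->expiry_time < now;
}

/* One garbage-collection pass over a single entry: if it has expired,
 * free it and drop the mount reference it held (standing in for
 * path_put when the export is released). Returns the new mount count. */
int toy_cache_clean(struct toy_cache_entry *e, long now,
                    bool allow_invalid_gc, int mnt_count)
{
    bool expired = allow_invalid_gc ? toy_is_expired_after(e, now)
                                    : toy_is_expired_before(e, now);

    if (expired && !e->freed) {
        e->freed = true;
        assert(mnt_count > 0);
        mnt_count--;
    }
    return mnt_count;
}
```

With allow_invalid_gc false, a stale but invalid entry survives and the count stays at 3; with it true, the entry is freed and the count drops to 2, the value a normal umount expects.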