1. Scenario
A distcp job failed with the following error log:
Caused by: java.io.IOException: Fail to get checksum, since file *** is under construction.
at org.apache.hadoop.hdfs.DFSClient.getBlockLocations(DFSClient.java:1913)
at org.apache.hadoop.hdfs.DFSClient.getFileChecksumInternal(DFSClient.java:1853)
at org.apache.hadoop.hdfs.DFSClient.getFileChecksumWithCombineMode(DFSClient.java:1883)
at org.apache.hadoop.hdfs.DistributedFileSystem$33.doCall(DistributedFileSystem.java:1880)
at org.apache.hadoop.hdfs.DistributedFileSystem$33.doCall(DistributedFileSystem.java:1877)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1889)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:138)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:115)
at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
... 11 more
2. Analysis
The copy command was:
`nohup hadoop distcp -Dmapreduce.task.timeout=1800000 -Ddistcp.dynamic.max.chunks.tolerable=30000 -update -prbugpcaxt -strategy dynamic -i -m 200 -bandwidth 30 -skipcrccheck -direct -numListstatusThreads 40 hdfs://src hdfs://dest`
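The key part of the error message is: Fail to get checksum
The command line already specifies -skipcrccheck, so no checksum lookup was expected. However, the customer needs permissions and other attributes preserved during the copy, so -p was passed with a full flag list. According to the official documentation, the c after -p stands for checksum-type, and the stack trace shows doCopy still calling getFileChecksum, which fails for a file that is under construction. After removing c from the -p flags, the job ran normally.
In addition, -append must not be used, because append requires checksum information.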
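
As a minimal sketch of the fix described above, the script below rebuilds the original command with c removed from the -p flags and without -append. The helper names strip_checksum_type and build_distcp_cmd are hypothetical and exist only for this illustration; hdfs://src and hdfs://dest are the placeholder paths from the original command. The script only prints the command and does not run distcp.

```shell
#!/bin/sh
# Hypothetical helper: remove the checksum-type flag (c) from a distcp -p argument.
strip_checksum_type() {
  # "$1" looks like "-prbugpcaxt": strip the "-p" prefix, delete every "c",
  # then put the prefix back.
  flags=$(printf '%s' "${1#-p}" | tr -d 'c')
  printf '%s\n' "-p${flags}"
}

# Hypothetical helper: print the distcp command from the note with the
# corrected -p flags. -append is deliberately left out.
build_distcp_cmd() {
  preserve=$(strip_checksum_type "-prbugpcaxt")
  printf '%s\n' "hadoop distcp -Dmapreduce.task.timeout=1800000 -Ddistcp.dynamic.max.chunks.tolerable=30000 -update ${preserve} -strategy dynamic -i -m 200 -bandwidth 30 -skipcrccheck -direct -numListstatusThreads 40 $1 $2"
}

# Print the command for review; run it (for example under nohup) once it looks right.
build_distcp_cmd "hdfs://src" "hdfs://dest"
```

The printed command carries -prbugpaxt instead of -prbugpcaxt; every other option is unchanged from the original invocation.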