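This article covers HDFS shell operations, Hadoop formatting and startup, and other common process-management commands. It is meant as a quick reference to consult while working with Hadoop.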
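I. HDFS shell
# 1. Upload a file
hdfs dfs -put ./wuguo.txt /sanguo
# 2. Append to a file
hdfs dfs -appendToFile liubei.txt /sanguo/shuguo.txt
# 3. Download a file
hdfs dfs -get /sanguo/shuguo.txt ./shuguo2.txt
# 4. Change file permissions and ownership
hdfs dfs -chmod 666 /sanguo/shuguo.txt
hdfs dfs -chown atguigu:atguigu /sanguo/shuguo.txt
# 5. Show the total size of a directory
hdfs dfs -du -s -h /jinguo
Note: in an output line such as "27  81  /jinguo", 27 is the file size, 81 is the space consumed by 3 replicas (27*3), and /jinguo is the directory being queried.
# 6. Recursive delete
hdfs dfs -rm -r /sanguo
# 7. Delete a single file
hdfs dfs -rm /sanguo/shuguo.txt
# 8. Show file sizes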
hadoop fs -du -h hdfs://nn1node:9000/file_path/test_performance_10m
6.7 G 20.0 G hdfs://path/test_performance_10m/pday=20230101
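The upload command above fails if the target directory is missing. A minimal sketch of a wrapper that creates the directory first is shown here; the function name `hdfs_upload` is hypothetical, and it only relies on the standard `hdfs dfs -test -d`, `-mkdir -p`, and `-put` options.

```shell
# Hypothetical helper (not part of Hadoop): upload a local file into an
# HDFS directory, creating the directory first if it does not exist.
hdfs_upload() {
    local src="$1" dest_dir="$2"
    # "hdfs dfs -test -d" exits 0 only when the path is an existing directory.
    if ! hdfs dfs -test -d "$dest_dir"; then
        hdfs dfs -mkdir -p "$dest_dir" || return 1
    fi
    hdfs dfs -put "$src" "$dest_dir"
}
```

Usage, with the file names from the examples above: `hdfs_upload ./wuguo.txt /sanguo`.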
II. Common YARN commands
1. Application operations
1) List all applications: yarn application -list
yarn application -list
2021-02-06 10:21:19,238 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Total number of applications (application-types: [], states: [SUBMITTED, ACCEPTED, RUNNING] and tags: []):0
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
2) List applications by state: yarn application -list -appStates
(Valid states: ALL, NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED)
yarn application -list -appStates FINISHED
2021-02-06 10:22:20,029 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Total number of applications (application-types: [], states: [FINISHED] and tags: []):1
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
application_1612577921195_0001 word count MAPREDUCE atguigu default FINISHED SUCCEEDED 100% http://hadoop102:19888/jobhistory/job/job_1612577921195_0001
3) Kill an application
yarn application -kill application_1612577921195_0001
2021-02-06 10:23:48,530 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Application application_1612577921195_0001 has already finished
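When several applications need to be inspected or killed, it can help to extract just the IDs from the list output. The sketch below assumes the tabular output format shown above, where data rows begin with `application_`; the function name `yarn_app_ids` is hypothetical.

```shell
# Hypothetical helper: print only the application IDs for a given state.
# Assumes data rows start with "application_", as in the sample output.
yarn_app_ids() {
    local state="${1:-RUNNING}"
    yarn application -list -appStates "$state" 2>/dev/null \
        | awk '$1 ~ /^application_/ {print $1}'
}
```

For example, `for id in $(yarn_app_ids RUNNING); do yarn application -kill "$id"; done` would kill every running application, so use it with care.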
2. Viewing logs
1) Query application logs:
yarn logs -applicationId <ApplicationId>
2) Query container logs: yarn logs -applicationId <ApplicationId> -containerId <ContainerId>
yarn logs -applicationId application_1612577921195_0001 -containerId container_1612577921195_0001_01_000001
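Application logs can be long, so one option is to save them to a local file and search that file afterwards. The following is only a sketch: the function name `save_app_logs` and the choice to count lines containing `ERROR` are assumptions made for illustration.

```shell
# Hypothetical helper: save an application's logs to a local file and
# print how many lines contain the string "ERROR".
save_app_logs() {
    local app_id="$1" out="${2:-$1.log}"
    yarn logs -applicationId "$app_id" > "$out" || return 1
    # grep -c prints the count; "|| true" hides its non-zero exit on 0 matches.
    grep -c 'ERROR' "$out" || true
}
```

Usage: `save_app_logs application_1612577921195_0001 ./app.log`.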
3. Viewing containers
1) List all containers: yarn container -list <ApplicationAttemptId>
yarn container -list appattempt_1612577921195_0001_000001
2021-02-06 10:28:41,396 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Total number of containers :0
Container-Id Start Time Finish Time State Host Node Http Address
2) Print container status: yarn container -status <ContainerId>
yarn container -status container_1612577921195_0001_01_000001
2021-02-06 10:29:58,554 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Container with id 'container_1612577921195_0001_01_000001' doesn't exist in RM or Timeline Server.
Note: container status is only visible while the job is still running.
4. Node status
List all nodes: yarn node -list -all
yarn node -list -all
2021-02-06 10:31:36,962 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Total Nodes:3
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
hadoop103:38168 RUNNING hadoop103:8042 0
hadoop102:42012 RUNNING hadoop102:8042 0
hadoop104:39702 RUNNING hadoop104:8042
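For a quick health check, the node list can be reduced to a count of nodes in a given state. This sketch assumes the column layout shown above, with the node state in the second column; the function name `yarn_node_count` is hypothetical.

```shell
# Hypothetical helper: count nodes reported in a given state (default RUNNING).
# Assumes the state is the second whitespace-separated column.
yarn_node_count() {
    local state="${1:-RUNNING}"
    yarn node -list -all 2>/dev/null \
        | awk -v s="$state" '$2 == s {n++} END {print n+0}'
}
```

Usage: `yarn_node_count` for running nodes, or `yarn_node_count UNHEALTHY` for another state.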
5. Refreshing queues
Reload the queue configuration: yarn rmadmin -refreshQueues
yarn rmadmin -refreshQueues
2021-02-06 10:32:03,331 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8033
III. Start and stop commands
1. Start/stop method 1
Start
sbin/start-dfs.sh
sbin/start-yarn.sh
Stop
sbin/stop-dfs.sh
sbin/stop-yarn.sh
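The start and stop scripts can be wrapped in one function so that a single command brings the cluster up or down. This is a sketch under two assumptions: `HADOOP_HOME` points at the Hadoop installation directory, and YARN is stopped before HDFS (the reverse of the start order). The function name `hadoop_cluster` is hypothetical.

```shell
# Hypothetical wrapper around the sbin scripts.
# Assumes HADOOP_HOME is the install directory (defaults to the current dir).
hadoop_cluster() {
    local home="${HADOOP_HOME:-.}"
    case "$1" in
        start)
            # HDFS first, then YARN.
            "$home/sbin/start-dfs.sh" && "$home/sbin/start-yarn.sh"
            ;;
        stop)
            # Reverse order: YARN first, then HDFS.
            "$home/sbin/stop-yarn.sh" && "$home/sbin/stop-dfs.sh"
            ;;
        *)
            echo "usage: hadoop_cluster start|stop" >&2
            return 2
            ;;
    esac
}
```

Usage: `hadoop_cluster start` or `hadoop_cluster stop`.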
2. Start/stop method 2 (one daemon at a time)
hdfs --daemon <start|stop> <journalnode|namenode|datanode|zkfc>
yarn --daemon <start|stop> <resourcemanager|nodemanager>
IV. Scenario commands (work in progress)
A collection of commands for common Hadoop scenarios.
1. Leave safe mode
hdfs dfsadmin -safemode leave
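Before forcing the NameNode out of safe mode, it may be worth checking whether it is in safe mode at all. The sketch below uses `hdfs dfsadmin -safemode get` and assumes its output contains the text `Safe mode is ON` when safe mode is active; the function name `leave_safemode_if_on` is hypothetical.

```shell
# Hypothetical helper: leave safe mode only if the NameNode reports it is ON.
# Assumes "hdfs dfsadmin -safemode get" prints "Safe mode is ON" in that case.
leave_safemode_if_on() {
    if hdfs dfsadmin -safemode get | grep -q 'Safe mode is ON'; then
        hdfs dfsadmin -safemode leave
    else
        echo "Safe mode is already OFF"
    fi
}
```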
2. Check which node is the active NameNode
hdfs haadmin -getServiceState nn1
#active
hdfs haadmin -getServiceState nn2
# standby
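Checking each NameNode by hand can be replaced with a small loop. This sketch assumes the NameNode IDs are `nn1` and `nn2`, as in the commands above, unless other IDs are passed as arguments; the function name `active_namenode` is hypothetical.

```shell
# Hypothetical helper: print the first NameNode ID whose state is "active".
# Defaults to the IDs nn1 and nn2 used above; pass other IDs as arguments.
active_namenode() {
    local id state
    [ "$#" -gt 0 ] || set -- nn1 nn2
    for id in "$@"; do
        state=$(hdfs haadmin -getServiceState "$id" 2>/dev/null)
        if [ "$state" = "active" ]; then
            echo "$id"
            return 0
        fi
    done
    # No NameNode reported "active".
    return 1
}
```

Usage: `active_namenode` or `active_namenode nn1 nn2 nn3`.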