When you run a Hadoop command such as `hadoop fs -ls /` on the command line, how does the system process it internally?
1. Bash processing
You can trace where the `hadoop` command comes from with the Linux `which` and `ll` commands:
```sh
$ which hadoop
/usr/bin/hadoop
$ ll /usr/bin/hadoop
lrwxrwxrwx 1 root root 24 11-19 18:55 /usr/bin/hadoop -> /etc/alternatives/hadoop
$ ll /etc/alternatives/hadoop
lrwxrwxrwx 1 root root 64 11-19 18:55 /etc/alternatives/hadoop -> /app/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/bin/hadoop
```
As you can see, in a CDH cluster the `hadoop` command ultimately resolves to the file /app/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/bin/hadoop. Opening that file with vim shows that it in turn delegates to /app/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/hadoop/bin/hadoop. Opening that script with vim, its core part is excerpted below:
```sh
#core commands
  *)
    # the core commands
    if [ "$COMMAND" = "fs" ] ; then
      CLASS=org.apache.hadoop.fs.FsShell
    elif [ "$COMMAND" = "version" ] ; then
      CLASS=org.apache.hadoop.util.VersionInfo
    elif [ "$COMMAND" = "jar" ] ; then
      CLASS=org.apache.hadoop.util.RunJar
    elif [ "$COMMAND" = "key" ] ; then
      CLASS=org.apache.hadoop.crypto.key.KeyShell
    elif [ "$COMMAND" = "checknative" ] ; then
      CLASS=org.apache.hadoop.util.NativeLibraryChecker
    elif [ "$COMMAND" = "distcp" ] ; then
      CLASS=org.apache.hadoop.tools.DistCp
      CLASSPATH=${CLASSPATH}:${TOOL_PATH}
    elif [ "$COMMAND" = "daemonlog" ] ; then
      CLASS=org.apache.hadoop.log.LogLevel
    elif [ "$COMMAND" = "archive" ] ; then
      CLASS=org.apache.hadoop.tools.HadoopArchives
      CLASSPATH=${CLASSPATH}:${TOOL_PATH}
    elif [ "$COMMAND" = "credential" ] ; then
      CLASS=org.apache.hadoop.security.alias.CredentialShell
    elif [ "$COMMAND" = "s3guard" ] ; then
      CLASS=org.apache.hadoop.fs.s3a.s3guard.S3GuardTool
      CLASSPATH=${CLASSPATH}:${TOOL_PATH}
    elif [ "$COMMAND" = "trace" ] ; then
      CLASS=org.apache.hadoop.tracing.TraceAdmin
    elif [ "$COMMAND" = "classpath" ] ; then
      if [ "$#" -eq 1 ]; then
        # No need to bother starting up a JVM for this simple case.
        echo $CLASSPATH
        exit
      else
        CLASS=org.apache.hadoop.util.Classpath
      fi
    elif [[ "$COMMAND" = -* ]] ; then
      # class and package names cannot begin with a -
      echo "Error: No command named \`$COMMAND' was found. Perhaps you meant \`hadoop ${COMMAND#-}'"
      exit 1
    else
      CLASS=$COMMAND
    fi
    shift
    exec "$JAVA" $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS "$@"
    ;;
```
As the script shows, each command-line keyword (e.g. `fs`, `distcp`) is assigned a different `$CLASS` value, mapping it to a different Java class, and the script finally `exec`s Java on that class, passing the remaining command-line arguments through to the application.
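The dispatch pattern the script uses (look up a main class by command word, then hand the rest of the arguments to it) can be sketched in a few lines of Java. This is an illustration only, not Hadoop code: the class names and the `resolveClassName`/`dispatch` helpers are made up, and where the script `exec`s a fresh JVM, this sketch invokes `main` reflectively inside the same JVM.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Map;

// Toy version of the shell script's dispatch: command word -> main class.
public class CommandDispatch {

    // Stand-in "command class"; the real script maps `fs` to
    // org.apache.hadoop.fs.FsShell.
    public static class FsShellStub {
        public static void main(String[] args) {
            System.out.println("FsShellStub got: " + Arrays.toString(args));
        }
    }

    private static final Map<String, String> COMMAND_TO_CLASS =
            Map.of("fs", "CommandDispatch$FsShellStub");

    // Unknown words fall through as literal class names, like the
    // script's final `else CLASS=$COMMAND` branch.
    public static String resolveClassName(String command) {
        return COMMAND_TO_CLASS.getOrDefault(command, command);
    }

    public static void dispatch(String[] argv) throws Exception {
        String className = resolveClassName(argv[0]);
        // Dropping argv[0] mirrors the script's `shift`.
        String[] rest = Arrays.copyOfRange(argv, 1, argv.length);
        Class<?> cls = Class.forName(className);
        Method main = cls.getMethod("main", String[].class);
        main.invoke(null, (Object) rest);
    }

    public static void main(String[] args) throws Exception {
        dispatch(new String[]{"fs", "-ls", "/"});  // prints "FsShellStub got: [-ls, /]"
    }
}
```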
2. Java processing
Take the `hadoop fs` command as an example:
a. `fs` maps to the `org.apache.hadoop.fs.FsShell` class, whose `main` method looks like this:
```java
public static void main(String argv[]) throws Exception {
  FsShell shell = newShellInstance();
  Configuration conf = new Configuration();
  conf.setQuietMode(false);
  shell.setConf(conf);
  int res;
  try {
    res = ToolRunner.run(shell, argv);
  } finally {
    shell.close();
  }
  System.exit(res);
}
```
The `main` method initializes a `Configuration` and executes the command through `ToolRunner.run()`.
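The `Tool`/`ToolRunner` contract driving this control flow can be sketched with self-contained stand-ins. Note these are simplified imitations, not the real Hadoop classes: the real `Tool.run` declares `throws Exception`, and the real `Configuration` is far richer than a map.

```java
import java.util.HashMap;

// Minimal stand-ins for Hadoop's Configuration / Tool / ToolRunner,
// showing the control flow of FsShell.main: build a configuration,
// inject it into the tool, run it, and propagate the exit code.
public class ToolRunnerSketch {

    static class Configuration extends HashMap<String, String> {}

    interface Tool {
        void setConf(Configuration conf);
        int run(String[] args);
    }

    static class ToolRunner {
        static int run(Tool tool, String[] args) {
            Configuration conf = new Configuration();
            tool.setConf(conf);     // the tool can now configure itself
            return tool.run(args);  // tool-specific logic returns an exit code
        }
    }

    // A toy "shell": succeeds only if it received at least one argument.
    static class EchoTool implements Tool {
        private Configuration conf;
        public void setConf(Configuration conf) { this.conf = conf; }
        public int run(String[] args) {
            System.out.println("args: " + args.length + ", conf entries: " + conf.size());
            return args.length > 0 ? 0 : -1;
        }
    }

    public static int demo(String[] argv) {
        return ToolRunner.run(new EchoTool(), argv);
    }
}
```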
b. `ToolRunner`'s `run()` method looks like this:
```java
public static int run(Configuration conf, Tool tool, String[] args)
    throws Exception {
  if (conf == null) {
    conf = new Configuration();
  }
  GenericOptionsParser parser = new GenericOptionsParser(conf, args);
  //set the configuration back, so that Tool can configure itself
  tool.setConf(conf);
  //get the args w/o generic hadoop args
  String[] toolArgs = parser.getRemainingArgs();
  return tool.run(toolArgs);
}
```
`ToolRunner.run()` instantiates a `GenericOptionsParser` to handle the generic Hadoop options: if the command line contains any of the keywords `fs`, `jt`, `conf`, `libjars`, `files`, `archives`, `D`, or `tokenCacheFile`, they are parsed at this generic-option stage and applied to the configuration.
It then retrieves the remaining arguments via `getRemainingArgs()`.
Finally, it calls `tool.run()`, where `run` is implemented by the tool class itself (here, `FsShell`).
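The split between generic options and tool arguments can be illustrated with a small parser. This is a hypothetical sketch, not `GenericOptionsParser` itself: it handles only `-D key=value` and `-fs` out of the keywords above, and the `fs.defaultFS` key is just the conventional configuration name that `-fs` sets.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy parser: pull generic options out of the argument list, apply them
// to the configuration, and keep everything else for the tool.
public class GenericOptionsSketch {
    public final Map<String, String> conf = new HashMap<>();
    private final List<String> remaining = new ArrayList<>();

    public GenericOptionsSketch(String[] args) {
        for (int i = 0; i < args.length; i++) {
            if ("-D".equals(args[i]) && i + 1 < args.length) {
                String[] kv = args[++i].split("=", 2);  // -D key=value
                conf.put(kv[0], kv.length > 1 ? kv[1] : "");
            } else if ("-fs".equals(args[i]) && i + 1 < args.length) {
                conf.put("fs.defaultFS", args[++i]);    // -fs <filesystem uri>
            } else {
                remaining.add(args[i]);                 // left for the Tool
            }
        }
    }

    // Analogue of GenericOptionsParser.getRemainingArgs().
    public String[] getRemainingArgs() {
        return remaining.toArray(new String[0]);
    }
}
```

With `-D a=b -fs hdfs://nn:8020 -ls /`, the first two options land in the configuration and only `-ls /` is handed on to the tool.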
c. `FsShell`'s `run` method looks like this:
```java
public int run(String argv[]) throws Exception {
  // initialize FsShell
  init();
  int exitCode = -1;
  ...
  try {
    exitCode = instance.run(Arrays.copyOfRange(argv, 1, argv.length));
  }
  ...
  return exitCode;
}
```
From here you can keep following the code to study the execution logic behind the `fs` command. Since this article focuses on how a command is submitted for execution, the finer details are not expanded further.
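The overall shape of that deeper logic, i.e. the first argument (such as `-ls`) selecting a command handler that receives the rest of the arguments, can be sketched with a toy dispatch table. The handler names and behaviors here are purely illustrative and are not how `FsShell` actually resolves commands internally.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy subcommand dispatch in the spirit of FsShell: argv[0] picks a
// handler, and the remaining arguments are passed to it.
public class FsShellSketch {
    private final Map<String, Function<String[], Integer>> commands = new HashMap<>();

    public FsShellSketch() {
        commands.put("-ls", args -> {
            System.out.println("listing " + Arrays.toString(args));
            return 0;
        });
        commands.put("-cat", args -> args.length > 0 ? 0 : 1);
    }

    public int run(String[] argv) {
        if (argv.length == 0 || !commands.containsKey(argv[0])) {
            return -1;  // unknown or missing command
        }
        // Same idea as Arrays.copyOfRange(argv, 1, argv.length) in FsShell.run
        String[] rest = Arrays.copyOfRange(argv, 1, argv.length);
        return commands.get(argv[0]).apply(rest);
    }
}
```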
Author: OldChicken_
Link: https://www.jianshu.com/p/a6aed378a1d2