
Calling a mapreduce job from a simple Java program

Java

手掌心 2019-11-27 14:42:41

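I have been trying to call a mapreduce job from a simple Java program in the same package. I tried to reference the mapreduce jar file in my Java program and invoke it with the runJar(String args[]) method, passing the input and output paths of the mapreduce job, but the program did not work. How do I run such a program, where I just use a main method that passes the input, output and jar paths? Is it possible to run a mapreduce job (jar) through it? I want to do this because I need to run several mapreduce jobs one after another, with my Java program calling each such job by referencing its jar file. If this is possible, I might as well use a simple servlet to make such calls and refer to the output files for graphing purposes.

/*
 * To change this template, choose Tools | Templates
 * and open the template in the editor.
 */

/**
 * @author root
 */
import org.apache.hadoop.util.RunJar;
import java.util.*;

public class callOther {

    public static void main(String args[]) throws Throwable {
        ArrayList arg = new ArrayList();
        String output = "/root/Desktp/output";
        arg.add("/root/NetBeansProjects/wordTool/dist/wordTool.jar");
        arg.add("/root/Desktop/input");
        arg.add(output);
        RunJar.main((String[]) arg.toArray(new String[0]));
    }
}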


3 answers

慕运维8079593

Contributed 1876 experience points, received more than 5 likes

Oh, please don't do it with runJar; the Java API is very good.

Here is how you can start a job from normal code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// create a configuration
Configuration conf = new Configuration();
// create a new job based on the configuration
// (on Hadoop 2+ this constructor is deprecated in favor of Job.getInstance(conf))
Job job = new Job(conf);
// put your own mapper class here
job.setMapperClass(Mapper.class);
// put your own reducer class here
job.setReducerClass(Reducer.class);
// set the jar that contains your map/reduce classes;
// it is located through the mapper class
job.setJarByClass(Mapper.class);
// key/value types of your reducer output
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
// the format of your input; this can also be TextInputFormat
job.setInputFormatClass(SequenceFileInputFormat.class);
// same for the output
job.setOutputFormatClass(TextOutputFormat.class);
// the path of your input
SequenceFileInputFormat.addInputPath(job, new Path("files/toMap/"));
// delete an existing output path, because the job fails if it already exists
FileSystem fs = FileSystem.get(conf);
Path out = new Path("files/out/processed/");
fs.delete(out, true);
// finally set the (now empty) output path
TextOutputFormat.setOutputPath(job, out);
// wait until the job completes; progress is printed to STDOUT or wherever
// your log4j properties send it
job.waitForCompletion(true);
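Since the question is about running several jobs one after another, here is a minimal sketch of a sequential driver. It relies only on the fact that job.waitForCompletion(true) returns true when the job succeeded. The class name JobChain, the method runAll and the step names are hypothetical, and the steps in main are placeholders so that the sketch runs without Hadoop on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;

public class JobChain {

    /**
     * Runs the named steps in insertion order and stops at the first step
     * that returns false or throws. Returns how many steps succeeded.
     */
    public static int runAll(Map<String, Callable<Boolean>> steps) {
        int succeeded = 0;
        for (Map.Entry<String, Callable<Boolean>> step : steps.entrySet()) {
            System.out.println("Starting " + step.getKey());
            boolean ok;
            try {
                ok = step.getValue().call();
            } catch (Exception e) {
                // A step that throws is treated the same as a failed job.
                System.out.println(step.getKey() + " threw " + e);
                ok = false;
            }
            if (!ok) {
                System.out.println(step.getKey() + " failed, skipping the remaining steps");
                break;
            }
            succeeded++;
        }
        return succeeded;
    }

    public static void main(String[] args) {
        Map<String, Callable<Boolean>> steps = new LinkedHashMap<>();
        // Placeholders (hypothetical names): a real step would configure a Job
        // as in the answer and return job.waitForCompletion(true).
        steps.put("word-count", () -> true);
        steps.put("sort-by-frequency", () -> true);
        System.out.println(runAll(steps) + " step(s) succeeded");
    }
}
```

Callable is used rather than a plain supplier because waitForCompletion declares checked exceptions, which a Callable lambda may throw. The output path of one step can be passed as the input path of the next when the steps are built.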

If you are using an external cluster, you have to put the following information into your configuration:

// this should match the value in your mapred-site.xml
conf.set("mapred.job.tracker", "jobtracker.com:50001");
// this should match the value in your hdfs-site.xml
conf.set("fs.default.name", "hdfs://namenode.com:9000");

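This should be no problem as long as hadoop-core.jar is on the classpath of your application container. But I think you should put some kind of progress indicator on your web page, because a Hadoop job may take anywhere from minutes to hours to complete ;)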
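One way to back such a progress indicator is to run the job on a background thread and let the servlet poll a shared value. The following is only a sketch under assumptions: JobProgressTracker and its Task interface are hypothetical names, and the task in main is simulated with sleeps. With Hadoop, the task body could instead call job.submit() and then poll job.mapProgress(), job.reduceProgress() and job.isComplete().

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntConsumer;

public class JobProgressTracker {

    /** A long-running task that reports its percent complete through the callback. */
    public interface Task {
        boolean run(IntConsumer reportPercent) throws Exception;
    }

    private final AtomicInteger percent = new AtomicInteger(0);
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private volatile Future<Boolean> result;

    /** Starts the task on a background thread; call once per tracker. */
    public void start(Task task) {
        Callable<Boolean> work = () -> task.run(percent::set);
        result = executor.submit(work);
        executor.shutdown(); // accept no further tasks; the submitted one still runs
    }

    /** Last percentage reported by the task; safe to call from any thread. */
    public int percentComplete() {
        return percent.get();
    }

    public boolean isDone() {
        Future<Boolean> r = result;
        return r != null && r.isDone();
    }

    /** Blocks until the task ends; false if it failed, threw, or was never started. */
    public boolean awaitSuccess() {
        Future<Boolean> r = result;
        if (r == null) {
            return false;
        }
        try {
            return r.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            return false;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        JobProgressTracker tracker = new JobProgressTracker();
        tracker.start(report -> {
            for (int p = 25; p <= 100; p += 25) {
                Thread.sleep(50); // simulated work standing in for a real job
                report.accept(p);
            }
            return true;
        });
        while (!tracker.isDone()) {
            System.out.println(tracker.percentComplete() + "%");
            Thread.sleep(20);
        }
        System.out.println("succeeded: " + tracker.awaitSuccess());
    }
}
```

A servlet would keep one tracker per submitted job and return percentComplete() on each poll, so the request thread never blocks on the job itself.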

For YARN (> Hadoop 2)

For YARN, the following configuration needs to be set.

// this should match the value in your yarn-site.xml
conf.set("yarn.resourcemanager.address", "yarn-manager.com:50001");
// the framework is now "yarn"; it should be defined like this in mapred-site.xml
conf.set("mapreduce.framework.name", "yarn");
// this should match the value in your hdfs-site.xml
// (Hadoop 2 deprecates the key fs.default.name in favor of fs.defaultFS)
conf.set("fs.default.name", "hdfs://namenode.com:9000");
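To keep the two property sets apart, they can be built as plain maps and applied to the configuration in one place. This is a small sketch, not part of the answer: ClusterSettings and its methods are hypothetical names, and the property keys and host names are the ones quoted above. The target in main only prints, so the sketch runs without Hadoop; with Hadoop on the classpath the target could be conf::set.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

public class ClusterSettings {

    /** Properties for a classic (JobTracker) cluster, as in the first snippet. */
    public static Map<String, String> classic(String jobTracker, String nameNodeUri) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("mapred.job.tracker", jobTracker);
        props.put("fs.default.name", nameNodeUri);
        return props;
    }

    /** Properties for a YARN cluster, as in the second snippet. */
    public static Map<String, String> yarn(String resourceManager, String nameNodeUri) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("yarn.resourcemanager.address", resourceManager);
        props.put("mapreduce.framework.name", "yarn");
        props.put("fs.default.name", nameNodeUri);
        return props;
    }

    /** Pushes every property into the target, for example conf::set. */
    public static void apply(Map<String, String> props, BiConsumer<String, String> target) {
        props.forEach(target);
    }

    public static void main(String[] args) {
        Map<String, String> props = yarn("yarn-manager.com:50001", "hdfs://namenode.com:9000");
        // Stand-in target that just prints each key/value pair.
        apply(props, (key, value) -> System.out.println(key + " = " + value));
    }
}
```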

2019-11-27

尚方宝剑之说

Contributed 1788 experience points, received more than 4 likes

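Because map and reduce run on different machines, all the classes and jars you reference must be shipped from machine to machine.

If you have a packaged jar and run it from your desktop, @ThomasJungblut's answer is fine. But if you run from Eclipse (right-click your class and choose Run), it does not work.

Instead of:

job.setJarByClass(Mapper.class);

Use:

job.setJar("build/libs/hdfs-javac-1.0.jar");

At the same time, your jar's manifest must include the Main-Class attribute, which names your main class.

For gradle users, these lines can be put in build.gradle: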

jar {
    manifest {
        attributes("Main-Class": mainClassName)
    }
}
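Whether a built jar really carries the Main-Class attribute can be checked with the JDK's java.util.jar classes before handing the path to job.setJar. The sketch below is an illustration only: ManifestCheck, mainClassOf, writeDemoJar and the class name demo.WordCountDriver are hypothetical, and the demo jar is generated in a temporary file so the example runs on its own.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class ManifestCheck {

    /** Returns the Main-Class value from the jar's manifest, or null if absent. */
    public static String mainClassOf(Path jar) throws IOException {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            Manifest manifest = jarFile.getManifest();
            if (manifest == null) {
                return null;
            }
            return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    /** Writes a jar that contains only a manifest naming the given main class (demo only). */
    public static void writeDemoJar(Path jar, String mainClass) throws IOException {
        Manifest manifest = new Manifest();
        // Set Manifest-Version as well, so the manifest is written and read back reliably.
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        manifest.getMainAttributes().put(Attributes.Name.MAIN_CLASS, mainClass);
        try (OutputStream out = Files.newOutputStream(jar);
             JarOutputStream jarOut = new JarOutputStream(out, manifest)) {
            // Nothing else to add: the constructor already wrote the manifest entry.
        }
    }

    public static void main(String[] args) throws IOException {
        Path jar = Files.createTempFile("manifest-check", ".jar");
        try {
            writeDemoJar(jar, "demo.WordCountDriver"); // hypothetical class name
            System.out.println("Main-Class: " + mainClassOf(jar));
        } finally {
            Files.deleteIfExists(jar);
        }
    }
}
```

For a real build, mainClassOf can be pointed at the jar path used above, build/libs/hdfs-javac-1.0.jar; a null result means the manifest has no Main-Class attribute.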

2019-11-27
