Writing MapReduce Programs

Tags: Big Data

MapReduce divides the execution of a job into two phases: the Map phase and the Reduce phase.

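The Map phase consists of a number of Map Tasks:
Parsing the input data format: InputFormat
Processing the input data: Mapper
Partitioning the data: Partitioner
The Reduce phase consists of a number of Reduce Tasks:
Copying the data and sorting it by key
Processing the data: Reducer
Output data format: OutputFormat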
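
The stages listed above can be tried out without a cluster. The following is a minimal sketch in plain Java, not Hadoop code: the class `LocalMapReduceSketch` and its methods are hypothetical names introduced for illustration, and everything runs in memory. The `partition` method uses the same hash-modulo idea as Hadoop's default hash partitioner; the rest is a simplification under the assumption that the job is a word count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free sketch of the map, partition, sort and reduce stages.
final class LocalMapReduceSketch {

    // Map stage: split one input line into words and emit (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Partition stage: pick a reduce task for a key by hashing it.
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Runs all stages in memory and returns one sorted result map per reduce task.
    static List<TreeMap<String, Integer>> run(List<String> lines, int numReduceTasks) {
        // One bucket per reduce task; TreeMap keeps the keys sorted,
        // standing in for the copy-and-sort-by-key step.
        List<TreeMap<String, List<Integer>>> buckets = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            buckets.add(new TreeMap<>());
        }
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                int p = partition(pair.getKey(), numReduceTasks);
                buckets.get(p)
                       .computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        // Reduce stage: sum the values collected for each key.
        List<TreeMap<String, Integer>> results = new ArrayList<>();
        for (TreeMap<String, List<Integer>> bucket : buckets) {
            TreeMap<String, Integer> counts = new TreeMap<>();
            for (Map.Entry<String, List<Integer>> e : bucket.entrySet()) {
                int sum = 0;
                for (int v : e.getValue()) {
                    sum += v;
                }
                counts.put(e.getKey(), sum);
            }
            results.add(counts);
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("hello world", "hello mapreduce");
        List<TreeMap<String, Integer>> parts = run(lines, 2);
        for (int i = 0; i < parts.size(); i++) {
            System.out.println("reduce task " + i + ": " + parts.get(i));
        }
    }
}
```

Each key lands in exactly one bucket, so every reduce task sees all values for the keys it owns.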

JAVA

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: splits each input line into tokens and emits (word, 1).
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer (also used as the combiner): sums the counts for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Input and output directories are hard-coded here.
        FileInputFormat.addInputPath(job, new Path("input/"));
        FileOutputFormat.setOutputPath(job, new Path("output/"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
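
The job above registers IntSumReducer as both the combiner and the reducer. That is safe here because addition is associative and commutative, so pre-summing each map task's output does not change the final totals (Hadoop may apply a combiner zero or more times, so the result must not depend on it). A small plain-Java sketch of that property follows; `CombinerSketch` and its methods are hypothetical names, and the in-memory lists merely stand in for the output of two map tasks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: why a summing reducer can double as a combiner.
final class CombinerSketch {

    // Same logic as a summing reducer: add up the counts per key.
    static TreeMap<String, Integer> sumByKey(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            sums.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return sums;
    }

    // Reduce directly over the raw output of every map task.
    static TreeMap<String, Integer> withoutCombiner(
            List<List<Map.Entry<String, Integer>>> mapOutputs) {
        List<Map.Entry<String, Integer>> all = new ArrayList<>();
        for (List<Map.Entry<String, Integer>> output : mapOutputs) {
            all.addAll(output);
        }
        return sumByKey(all);
    }

    // Pre-sum each map task's output locally, then reduce the partial sums.
    static TreeMap<String, Integer> withCombiner(
            List<List<Map.Entry<String, Integer>>> mapOutputs) {
        List<Map.Entry<String, Integer>> partials = new ArrayList<>();
        for (List<Map.Entry<String, Integer>> output : mapOutputs) {
            partials.addAll(sumByKey(output).entrySet());
        }
        return sumByKey(partials);
    }

    public static void main(String[] args) {
        List<List<Map.Entry<String, Integer>>> mapOutputs = List.of(
                List.of(Map.entry("a", 1), Map.entry("b", 1), Map.entry("a", 1)),
                List.of(Map.entry("a", 1), Map.entry("c", 1)));
        System.out.println(withoutCombiner(mapOutputs)); // {a=3, b=1, c=1}
        System.out.println(withCombiner(mapOutputs));    // {a=3, b=1, c=1}
    }
}
```

A reducer that computed, say, an average could not be reused this way, since an average of partial averages is generally not the overall average.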

C++

mapper

#include <iostream>
#include <string>
using namespace std;

int main() {
  string key;
  // Emit "word<TAB>1" for every whitespace-separated token on stdin.
  while (cin >> key) {
    cout << key << "\t" << "1" << endl;
  }
  return 0;
}

reducer

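// The data reaching reduce has already been sorted by key.
#include <iostream>
#include <string>
using namespace std;

int main() {
  string cur_key, last_key, value;
  // Nothing to output if the input is empty.
  if (!(cin >> cur_key >> value)) {
    return 0;
  }
  last_key = cur_key;
  int n = 1;
  while (cin >> cur_key) {
    cin >> value;
    if (last_key != cur_key) {
      // The key changed: output the count of the finished group.
      cout << last_key << "\t" << n << endl;
      last_key = cur_key;
      n = 1;
    } else {
      n++;
    }
  }
  // Output the final group.
  cout << last_key << "\t" << n << endl;
  return 0;
}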
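
The reducer above relies on its input being sorted by key, so that all lines for one key are adjacent and a single pass can group them. As a sketch of that grouping loop in a form that can be exercised without a cluster, here is a plain-Java version written as a function over lines. `SortedStreamReducerSketch` is a hypothetical name. Unlike the reducer above, it sums the value column instead of counting lines, and it treats a missing value as 1; both are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a streaming reducer's single-pass grouping loop.
final class SortedStreamReducerSketch {

    // Input: "key<TAB>count" lines that are already sorted by key.
    // Output: one "key<TAB>total" line per distinct key.
    static List<String> reduce(List<String> sortedLines) {
        List<String> out = new ArrayList<>();
        String lastKey = null;
        long total = 0;
        for (String line : sortedLines) {
            String[] fields = line.split("\t", 2);
            String key = fields[0];
            // Assumption: a line without a value counts as 1.
            long value = 1;
            if (fields.length > 1 && !fields[1].trim().isEmpty()) {
                value = Long.parseLong(fields[1].trim());
            }
            if (lastKey != null && !key.equals(lastKey)) {
                out.add(lastKey + "\t" + total); // key changed: flush the finished group
                total = 0;
            }
            lastKey = key;
            total += value;
        }
        if (lastKey != null) {
            out.add(lastKey + "\t" + total); // flush the final group
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> sorted = List.of("hello\t1", "hello\t1", "world\t1");
        for (String line : reduce(sorted)) {
            System.out.println(line);
        }
    }
}
```

Because only the current key and its running total are kept, memory use stays constant regardless of how many distinct keys pass through.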

shell

mapper

#! /bin/bash
while read -r LINE; do
  for word in $LINE
  do
    echo "$word 1"
  done
done

reducer

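#! /bin/bash
count=0
started=0
word=""
while read -r LINE; do
  newword=$(echo "$LINE" | cut -d ' ' -f 1)
  if [ "$word" != "$newword" ]; then
    # The word changed: print the finished group (printf expands \t, plain echo does not).
    [ $started -ne 0 ] && printf '%s\t%s\n' "$word" "$count"
    word=$newword
    count=1
    started=1
  else
    count=$(( count + 1 ))
  fi
done
# Print the final group, unless the input was empty.
if [ $started -ne 0 ]; then
  printf '%s\t%s\n' "$word" "$count"
fi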

Author: 张晓天a
Link: https://www.jianshu.com/p/a75ef8b6e7db
