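Hadoop in Practice (Part 2): MapReduce Programming

This article introduces MapReduce programming through the WordCount example, which counts how many times each word appears in a set of input files.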
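Before turning to the Hadoop code, the data flow can be sketched locally. The following is a minimal, Hadoop-free illustration in plain Java. The class `LocalWordCount`, its methods, and the sample lines are hypothetical and not part of the original project. The map step emits (word, 1) for every token, the shuffle step groups the pairs by word in sorted order, and the reduce step adds up each group. Hadoop runs these phases as distributed tasks; this sketch runs them in a single process.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical, Hadoop-free simulation of WordCount's map -> shuffle -> reduce flow.
class LocalWordCount {

    // Map: emit a (word, 1) pair for every token, splitting on whitespace
    // with StringTokenizer in the same way as WordCount.Map.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<>(tokenizer.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle: group the values by key. A TreeMap keeps the keys sorted,
    // similar to the sorted keys a Hadoop reducer receives (for ASCII words).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return groups;
    }

    // Reduce: sum the grouped values for each key, as WordCount.Reduce does.
    static Map<String, Integer> reduce(Map<String, List<Integer>> groups) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : groups.entrySet()) {
            int sum = 0;
            for (int value : group.getValue()) {
                sum += value;
            }
            counts.put(group.getKey(), sum);
        }
        return counts;
    }

    // Run all three phases over a list of input lines.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        return reduce(shuffle(pairs));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Hello World Bye World", "Hello Hadoop Goodbye Hadoop");
        System.out.println(run(lines));
    }
}
```

Running `main` prints `{Bye=1, Goodbye=1, Hadoop=2, Hello=2, World=2}`.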

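1. Set up a Java project in Eclipse. Add the JARs from Hadoop's lib directory and the JARs in the Hadoop home directory to the project's build path.

Create a new WordCount class: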

```java
package org.scf.wordcount;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WordCount {

    // Mapper: for every input line, emit (word, 1) for each whitespace-separated token.
    // Input key = byte offset of the line in the file, input value = the line itself.
    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    // Reducer (also registered as the combiner): sum all counts received for one word.
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {

        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        // Types of the final output key/value pairs.
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        // args[0] = input directory, args[1] = output directory (must not exist yet).
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
```
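The `main` method registers `Reduce` both as the combiner and as the reducer. This works because summation is associative and commutative. Summing per-split partial counts and then adding those partial counts together gives the same totals as summing all the raw 1s at once. The following minimal plain-Java sketch illustrates that argument. It uses a single summing function in both roles and compares the two paths. The class `CombinerCheck` and the two sample splits are hypothetical.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical check that reusing the summing reducer as a combiner
// does not change WordCount's final result.
class CombinerCheck {

    // Map: one (word, 1) pair per whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(split);
        while (tokenizer.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<>(tokenizer.nextToken(), 1));
        }
        return pairs;
    }

    // Sum the values per key, returned as (word, total) pairs sorted by word.
    // The same function plays both the combiner role and the reducer role.
    static List<Map.Entry<String, Integer>> sumByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            sums.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return new ArrayList<>(sums.entrySet());
    }

    // No combiner: every raw (word, 1) pair goes straight to the reducer.
    static List<Map.Entry<String, Integer>> withoutCombiner(List<String> splits) {
        List<Map.Entry<String, Integer>> all = new ArrayList<>();
        for (String split : splits) {
            all.addAll(map(split));
        }
        return sumByKey(all);
    }

    // With combiner: each split is pre-summed locally, then the partial sums are reduced.
    static List<Map.Entry<String, Integer>> withCombiner(List<String> splits) {
        List<Map.Entry<String, Integer>> partials = new ArrayList<>();
        for (String split : splits) {
            partials.addAll(sumByKey(map(split)));
        }
        return sumByKey(partials);
    }

    public static void main(String[] args) {
        List<String> splits = Arrays.asList("Hello World Bye World", "Hello Hadoop Goodbye Hadoop");
        System.out.println(withCombiner(splits).equals(withoutCombiner(splits))); // true
    }
}
```

A reducer whose operation is not associative and commutative cannot be reused as a combiner this way. Computing an average directly is one example, because an average of partial averages generally differs from the overall average.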

2. Compile and run the class

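```shell
# Compile WordCount.java (located in /home/Hadoop) into a class directory, then package it as a JAR
cd /home/Hadoop/
mkdir wordcount_classes
javac -classpath /usr/hadoop-1.0.4/hadoop-core-1.0.4.jar -d /home/Hadoop/wordcount_classes WordCount.java
jar -cvf /home/Hadoop/wordcount.jar -C /home/Hadoop/wordcount_classes/ .

# Upload two local text files into the job's HDFS input directory
hadoop dfs -put /home/Hadoop/test.txt  /user/root/wordcount/input/file2
hadoop dfs -put /home/Hadoop/test1.txt  /user/root/wordcount/input/file3

# Run the job: JAR, main class, input directory, output directory (the output directory must not already exist)
hadoop jar /home/Hadoop/wordcount.jar org.scf.wordcount.WordCount /user/root/wordcount/input /user/root/wordcount/output

# List the output directory and print the result file
hadoop dfs -ls /user/root/wordcount/output
hadoop dfs -cat /user/root/wordcount/output/part-00000
```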
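With the default `TextOutputFormat`, each line of `part-00000` contains one word, a tab character, and that word's count. You might copy the output to the local machine (for example with `hadoop dfs -get`) and then need to process it further. A small parser along the following lines would handle that. It is a sketch, and the class `PartFileParser` and the sample text are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: parse WordCount output written by TextOutputFormat,
// where each line is "word<TAB>count", into a sorted word -> count map.
class PartFileParser {

    static Map<String, Integer> parse(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : text.split("\n")) {
            if (line.trim().isEmpty()) {
                continue; // ignore blank lines
            }
            // Words never contain a tab, because StringTokenizer splits on tabs.
            int tab = line.indexOf('\t');
            if (tab < 0) {
                throw new IllegalArgumentException("no tab separator in line: " + line);
            }
            String word = line.substring(0, tab);
            int count = Integer.parseInt(line.substring(tab + 1).trim());
            counts.put(word, count);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample text in the key<TAB>value layout; not real job output.
        String sample = "Bye\t1\nGoodbye\t1\nHadoop\t2\nHello\t2\nWorld\t2\n";
        Map<String, Integer> counts = parse(sample);
        System.out.println(counts.get("Hadoop")); // 2
    }
}
```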
