📜  Computing the sum of even and odd numbers in MapReduce using Cloudera Distribution Hadoop (CDH)

📅  Last modified: 2021-10-27 06:37:38             🧑  Author: Mango

Prerequisites: Hadoop and MapReduce

Counting even and odd numbers and finding their sums is a piece of cake in any language, be it C, C++, Python, or Java. MapReduce programs are also written in Java, and once you know the syntax they are easy to write. This example covers the basics of MapReduce; you will learn how to write and run it much like a "Hello World" program in other languages. The steps below show how to write MapReduce code that computes the count and sum of even and odd numbers.

Example:

Input:

1 2 3 4 5 6 7 8 9 

Output:

EVEN    20   // sum of even numbers
EVEN    4    // count of even numbers
ODD     25   // sum of odd numbers
ODD     5    // count of odd numbers
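The expected output above can first be reproduced in plain Java, before any Hadoop machinery is introduced. Here is a minimal sketch (the class and method names are illustrative, not part of the project you will build below):

```java
public class EvenOddSketch {
    // Returns {evenSum, evenCount, oddSum, oddCount}
    static int[] tally(int[] nums) {
        int[] r = new int[4];
        for (int n : nums) {
            if (n % 2 == 0) { r[0] += n; r[1]++; }
            else            { r[2] += n; r[3]++; }
        }
        return r;
    }

    public static void main(String[] args) {
        int[] r = tally(new int[] {1, 2, 3, 4, 5, 6, 7, 8, 9});
        System.out.println("EVEN\t" + r[0] + "\nEVEN\t" + r[1]);
        System.out.println("ODD\t" + r[2] + "\nODD\t" + r[3]);
    }
}
```

The MapReduce version below splits this single loop into a map step (classify each number as EVEN or ODD) and a reduce step (sum and count the values for each key).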

Steps:

  • First open Eclipse -> then select File -> New -> Java Project -> name it EvenOdd -> then click Finish.

  • Create three Java classes inside the project. Name them EODriver (with the main function), EOMapper and EOReducer.
  • You have to include two reference libraries for this:

    Right-click the project -> then select Build Path -> click Configure Build Path.

    In the Configure Build Path dialog, you will see the Add External JARs option on the right. Click it and add the two files listed below. You can find them under /usr/lib/

    1. /usr/lib/hadoop-0.20-mapreduce/hadoop-core-2.6.0-mr1-cdh5.13.0.jar
    2. /usr/lib/hadoop/hadoop-common-2.6.0-cdh5.13.0.jar

Mapper code: copy and paste this program into the EOMapper Java class file.

// Importing libraries
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
  
public class EOMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {
  
    @Override
    // Map function
    public void map(LongWritable key, Text value,
     OutputCollector<Text, IntWritable> output, Reporter rep)
  
    throws IOException
    {
        // Split the line on spaces
        String[] data = value.toString().split(" ");
  
        for (String num : data) 
        {
  
            int number = Integer.parseInt(num);
  
            if (number % 2 == 1) 
            {
                // For Odd Numbers
                output.collect(new Text("ODD"), new IntWritable(number));
            }
  
            else 
            {
                // For Even Numbers
                output.collect(new Text("EVEN"), 
                       new IntWritable(number));
            }
        }
    }
}
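The mapper's classification logic can be checked without a cluster. This plain-Java sketch (class name and return type are illustrative; in Hadoop the pairs are delivered through OutputCollector instead) shows what EOMapper emits for one input line:

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class MapSketch {
    // Same classification as EOMapper: one (EVEN/ODD, number)
    // pair is emitted for every space-separated token in the line
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String tok : line.split(" ")) {
            int n = Integer.parseInt(tok);
            String key = (n % 2 == 1) ? "ODD" : "EVEN";
            out.add(new AbstractMap.SimpleEntry<>(key, n));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(map("1 2 3 4"));  // [ODD=1, EVEN=2, ODD=3, EVEN=4]
    }
}
```

Note that the mapper does no summing at all; it only tags each number, and the framework groups all values with the same key before the reducer runs.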

Reducer code: copy and paste this program into the EOReducer Java class file.

// Importing libraries
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
  
public class EOReducer extends MapReduceBase
    implements Reducer<Text, IntWritable, Text, IntWritable> {
  
    @Override
    // Reduce function
    public void reduce(Text key, Iterator<IntWritable> value,
     OutputCollector<Text, IntWritable> output, Reporter rep)
  
    throws IOException
    {
  
        // The sum/count logic is identical for both keys,
        // so separate variables are not needed
        int sum = 0, count = 0;
        if (key.toString().equals("ODD")) 
        {
            while (value.hasNext())
            {
                IntWritable i = value.next();
  
                // Finding sum and count of ODD Numbers
                sum += i.get();
                count++;
            }
        }
  
        else 
        {
            while (value.hasNext()) 
            {
                IntWritable i = value.next();
  
                // Finding sum and count of EVEN Numbers
                sum += i.get();
                count++;
            }
        }
  
        // First sum then count is printed
        output.collect(key, new IntWritable(sum));
        output.collect(key, new IntWritable(count));
    }
}
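The reducer's aggregation can likewise be sketched in plain Java. This illustrative helper mirrors what EOReducer does with the iterator of values delivered for one key:

```java
import java.util.Arrays;
import java.util.Iterator;

public class ReduceSketch {
    // Same aggregation as EOReducer: sum and count every value
    // that arrives for one key ("EVEN" or "ODD")
    static int[] reduce(Iterator<Integer> values) {
        int sum = 0, count = 0;
        while (values.hasNext()) {
            sum += values.next();
            count++;
        }
        return new int[] {sum, count};
    }

    public static void main(String[] args) {
        // Values the shuffle phase would deliver for key "EVEN"
        int[] r = reduce(Arrays.asList(2, 4, 6, 8).iterator());
        System.out.println("sum=" + r[0] + " count=" + r[1]);  // sum=20 count=4
    }
}
```

Because the reducer is called once per key, the EVEN and ODD groups are aggregated in separate invocations, which is why one pair of sum/count variables suffices.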

Driver code: copy and paste this program into the EODriver Java class file.

// Importing libraries
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
  
public class EODriver extends Configured implements Tool {
  
    @Override
    public int run(String[] args) throws Exception
    {
        if (args.length < 2) 
        {
            System.out.println("Please enter valid arguments");
            return -1;
        }
  
        JobConf conf = new JobConf(EODriver.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        conf.setMapperClass(EOMapper.class);
        conf.setReducerClass(EOReducer.class);
        conf.setMapOutputKeyClass(Text.class);
        conf.setMapOutputValueClass(IntWritable.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
  
        JobClient.runJob(conf);
        return 0;
    }
  
    // Main Method
    public static void main(String[] args) throws Exception
    {
        int exitcode = ToolRunner.run(new EODriver(), args);
        System.exit(exitcode);
    }
}
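Before packaging the jar, it can help to see the whole job end to end. The following plain-Java sketch (illustrative only, no Hadoop required) simulates the map, shuffle, and reduce phases for the sample input and produces the same four output lines:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PipelineSketch {
    // Map + shuffle + reduce in one method: group numbers by
    // EVEN/ODD, then emit sum followed by count for each key
    static List<String> run(String input) {
        // "Map" and "shuffle": group values under their key
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String tok : input.split(" ")) {
            int n = Integer.parseInt(tok);
            grouped.computeIfAbsent(n % 2 == 0 ? "EVEN" : "ODD",
                                    k -> new ArrayList<>()).add(n);
        }
        // "Reduce": sum then count per key, in key order
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int n : e.getValue()) sum += n;
            lines.add(e.getKey() + "\t" + sum);
            lines.add(e.getKey() + "\t" + e.getValue().size());
        }
        return lines;
    }

    public static void main(String[] args) {
        run("1 2 3 4 5 6 7 8 9").forEach(System.out::println);
    }
}
```

On the cluster, the driver above wires these same phases together: EOMapper produces the keyed pairs, Hadoop performs the grouping, and EOReducer emits the sum and count.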
  • Now you have to create a jar file. Right-click the project -> click Export -> select the export destination as JAR file -> name the jar file (EvenOdd.jar) -> click Next -> finally click Finish. Then copy this file into Cloudera's workspace directory.

  • Open a terminal on CDH and change into the workspace directory. You can do this with the command "cd workspace/". Now create a text file (EOFile.txt) and move it to HDFS. To do that, open a terminal and run the command below (make sure you are in the same directory as the jar file you just created).

    Now run this command to copy the input file into HDFS.

    hadoop fs -put EOFile.txt EOFile.txt

  • Now run the jar file with the following syntax: "hadoop jar JarFilename DriverClassName TextFileName OutputFolderName". For this example:

    hadoop jar EvenOdd.jar EODriver EOFile.txt EOOutput

  • After the code has executed, you can view the result in the EOOutput folder or by running the following command in the terminal:
    hadoop fs -cat EOOutput/part-00000