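Introduction
In the spirit of the year-end festivities, I'll keep sharing my enthusiasm for Hadoop. This article covers how to unit-test Hadoop MapReduce code. The MapReduce development cycle looks roughly like this: write the mapper and reducer, compile, package, submit the job, and retrieve the results. The process is tedious, and when a job submitted to a distributed environment fails, repeating the whole cycle to locate and debug the problem is no fun. It is therefore well worth unit-testing MapReduce code first to weed out the obvious bugs.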
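About MRUnit
MRUnit is a framework developed by Cloudera specifically for writing unit tests for Hadoop MapReduce code. You can use MapDriver to test a Map on its own, ReduceDriver to test a Reduce on its own, and MapReduceDriver to test a complete MapReduce job.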
Hands-on
We will use MRUnit to unit-test the word count feature from the previous article in this series, Basic MapReduce Programming.
· Add the MRUnit dependency
<dependency>
    <groupId>com.cloudera.hadoop</groupId>
    <artifactId>hadoop-mrunit</artifactId>
    <version>0.20.2-320</version>
    <scope>test</scope>
</dependency>
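· Test the Map on its own
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.junit.Before;
import org.junit.Test;
// Also import Mapper and MapDriver from the packages that match the API WordCountMapper uses.

public class WordCountMapperTest {
    private Mapper mapper;
    private MapDriver driver;

    @Before
    public void init() {
        mapper = new WordCountMapper();
        driver = new MapDriver(mapper);
    }

    @Test
    public void test() throws IOException {
        String line = "Taobao is a great website";
        // One (word, 1) pair is expected for each word in the line.
        driver.withInput(null, new Text(line))
              .withOutput(new Text("Taobao"), new IntWritable(1))
              .withOutput(new Text("is"), new IntWritable(1))
              .withOutput(new Text("a"), new IntWritable(1))
              .withOutput(new Text("great"), new IntWritable(1))
              .withOutput(new Text("website"), new IntWritable(1))
              .runTest();
    }
}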
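The example above uses MapDriver's withInput and withOutput to set up the map function's input key/value pair and the expected output key/value pairs, then calls runTest to run the job and check the map function. The test passes.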
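· Test the Reduce on its own
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.junit.Before;
import org.junit.Test;
// Also import Reducer and ReduceDriver from the packages that match the API WordCountReducer uses.

public class WordCountReducerTest {
    private Reducer reducer;
    private ReduceDriver driver;

    @Before
    public void init() {
        reducer = new WordCountReducer();
        driver = new ReduceDriver(reducer);
    }

    @Test
    public void test() throws IOException {
        String key = "taobao";
        List<IntWritable> values = new ArrayList<IntWritable>();
        values.add(new IntWritable(2));
        values.add(new IntWritable(3));
        // The reducer should sum the counts for the key: 2 + 3 = 5.
        driver.withInput(new Text(key), values)
              .withOutput(new Text(key), new IntWritable(5))
              .runTest();
    }
}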
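The two tests above check the map and reduce steps in isolation. To make the combined flow more concrete (the flow a MapReduceDriver test would cover), here is a minimal plain-Java sketch of word count: map each line, group the values by key, then reduce. It is only an illustration and uses no Hadoop or MRUnit classes. The class name WordCountFlowSketch, the whitespace tokenization, and the sorted grouping are all assumptions, since the real WordCountMapper and WordCountReducer come from the previous article.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Conceptual sketch only: plain Java, no Hadoop or MRUnit classes involved.
class WordCountFlowSketch {

    // Map step (assumed logic): split the line on whitespace and emit (word, 1).
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Reduce step (assumed logic): sum all counts collected for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Map every line, group values by key in sorted order
    // (a simple stand-in for the shuffle), then reduce each group.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add("Taobao is a great website");
        lines.add("Taobao is great");
        System.out.println(run(lines));
    }
}
```

With those two input lines, Taobao, is, and great are each counted twice and the remaining words once. A MapReduceDriver test would express the same idea as input pairs and expected output pairs, just as the MapDriver and ReduceDriver tests do for the individual steps.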