I put several small parts and one very large part in the input path. The small parts were processed quickly, but the huge one ran for a long time and finally died with:
FATAL org.apache.hadoop.mapred.Child: Error running child:
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2734)
at java.util.ArrayList.ensureCapacity(ArrayList.java:167)
at java.util.ArrayList.add(ArrayList.java:351)
at org.apache.hadoop.hbase.io.hfile.HFile$Writer.finishBlock(HFile.java:369)
at org.apache.hadoop.hbase.io.hfile.HFile$Writer.checkBlockBoundary(HFile.java:353)
at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:536)
at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:501)
at com.taobao.dump.HFileOutputFormat$1.write(HFileOutputFormat.java:145)
at com.taobao.dump.HFileOutputFormat$1.write(HFileOutputFormat.java:1)
at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:513)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at org.apache.hadoop.hbase.mapreduce.PutSortReducer.reduce(PutSortReducer.java:61)
at org.apache.hadoop.hbase.mapreduce.PutSortReducer.reduce(PutSortReducer.java:38)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:571)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:413)
at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
Danger rating:
4 landmines. I gave it such a high rating because I recently ran into another problem that I'd put in the same category. We have a special job that needs to run on the JobTracker all day, until we kill it ourselves; but the JT has a mechanism that kills any job whose run time exceeds its timeout limit. So, to avoid being killed, we used this one line:

jobConf.setLong("mapred.task.timeout", Long.MAX_VALUE);

setting the timeout to the largest value the method can take. It never looked like a problem and ran correctly on Hadoop MR1 the whole time, until one day we upgraded to YARN and the job failed at startup:
2013-08-15 19:46:56,583 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
java.lang.NumberFormatException: For input string: "9223372036854775807"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Integer.parseInt(Integer.java:461)
at java.lang.Integer.parseInt(Integer.java:499)
at org.apache.hadoop.conf.Configuration.getInt(Configuration.java:903)
at org.apache.hadoop.mapreduce.v2.app.TaskHeartbeatHandler.init(TaskHeartbeatHandler.java:90)
at org.apache.hadoop.yarn.service.CompositeService.init(CompositeService.java:58)
at org.apache.hadoop.mapred.TaskAttemptListenerImpl.init(TaskAttemptListenerImpl.java:95)
at org.apache.hadoop.yarn.service.CompositeService.init(CompositeService.java:58)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.init(MRAppMaster.java:328)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1145)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1142)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1097)
2013-08-15 19:46:56,587 INFO [Thread-1] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster received a signal. Signaling RMCommunicator and JobHistoryEventHandler.
2013-08-15 19:46:56,588 WARN [Thread-1] org.apache.hadoop.util.ShutdownHookManager: ShutdownHook 'MRAppMasterShutdownHook' failed, java.lang.NullPointerException
java.lang.NullPointerException
The developers did a great job tracking it down, and the cause was precisely that line: jobConf.setLong("mapred.task.timeout", Long.MAX_VALUE);
In YARN the parameter is mapreduce.task.timeout, with a note in the source to "call Configuration.setInt() instead."
The data type had become int... so the value we stored overflowed the parse.
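The failure is easy to reproduce outside Hadoop. A minimal sketch (the `parsesAsInt` helper is mine, mimicking the `Integer.parseInt` step that `Configuration.getInt()` performs on the stored string):

```java
public class TimeoutOverflowDemo {
    // Hypothetical helper: YARN reads the stored config string back with
    // Integer.parseInt(), which throws if the value does not fit an int.
    static boolean parsesAsInt(String stored) {
        try {
            Integer.parseInt(stored);
            return true;
        } catch (NumberFormatException e) {
            return false; // e.g. For input string: "9223372036854775807"
        }
    }

    public static void main(String[] args) {
        // What setLong("mapred.task.timeout", Long.MAX_VALUE) leaves in the config:
        String stored = String.valueOf(Long.MAX_VALUE); // "9223372036854775807"
        System.out.println(parsesAsInt(stored));        // false: the MRAppMaster crash
        System.out.println(parsesAsInt("86400000"));    // true: one day in ms fits an int
    }
}
```

The same config key, written as a long but read back as an int, is exactly the kind of silent contract change an upgrade can introduce.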
In fact, for this timeout parameter, one day is all we ever needed: set mapreduce.task.timeout=86400000 and delete the old Long.MAX_VALUE code.
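A quick sanity check on that number (the `setInt` call at the end is a sketch of how one might set it in job code, not a line from the original job):

```java
public class TaskTimeoutBudget {
    public static void main(String[] args) {
        long oneDayMs = 24L * 60 * 60 * 1000;
        System.out.println(oneDayMs); // 86400000

        // Integer.MAX_VALUE milliseconds is roughly 24.8 days, so any timeout
        // up to ~24 days survives Configuration.getInt() without overflowing.
        System.out.println(Integer.MAX_VALUE / oneDayMs); // 24

        // In job code one would then write something like:
        // conf.setInt("mapreduce.task.timeout", (int) oneDayMs);
    }
}
```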
Looking back, giving the timeout the maximum possible value to avoid being killed did not feel unreasonable at the time; nobody expected the parameter's data type to change with an upgrade. Likewise, in the OOM example above, setting a maximum value to keep a region from splitting also seemed harmless. But both choices planted hidden landmines in the programs.
so~~ for a parameter tuned for some special purpose, it is better to evaluate the actual need and pick a reasonable value, rather than slamming it to a brute-force maximum just to make the feature work.