Hadoop Single-Node Installation

Published: 2017-05-16 09:34


 Author: doc001    Source: 简书

  9. Format the HDFS filesystem
  Format the HDFS filesystem with the following command:
  hdfs namenode -format
  Start Hadoop
  Start HDFS:
  start-dfs.sh
  Start YARN:
  start-yarn.sh
  The web consoles of HDFS and YARN listen on ports 50070 and 8088 by default.
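  As an optional sanity check (assuming the daemons run on localhost and curl is available), you can confirm that the two web consoles respond:
  curl http://localhost:50070
  curl http://localhost:8088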
  If everything is running normally, jps will show the running Hadoop services. On my machine the output is:
  29117 NameNode
  29675 ResourceManager
  29278 DataNode
  30002 NodeManager
  30123 Jps
  29469 SecondaryNameNode
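  To confirm that HDFS is actually serving requests, not just that the processes exist, a quick check such as the following should succeed:
  hdfs dfsadmin -report
  hdfs dfs -ls /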
  Running a Hadoop job
  The classic WordCount example below shows how to use Hadoop.
  1. Prepare the program package
  Below is the source code of WordCount.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
  Compile and package the code:
  export HADOOP_CLASSPATH=$JAVA_HOME/lib/tools.jar
  bin/hadoop com.sun.tools.javac.Main WordCount.java
  jar cf wc.jar WordCount*.class
  wc.jar is the packaged Hadoop MapReduce program.
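  If invoking the compiler through bin/hadoop is inconvenient, an equivalent sketch (assuming javac and hadoop are both on the PATH) is to compile directly against the Hadoop classpath:
  javac -classpath "$(hadoop classpath)" WordCount.java
  jar cf wc.jar WordCount*.class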
  2. Prepare the input files
  Our Hadoop MapReduce program reads its input from HDFS and also writes its output back to HDFS. This article uses wordcount/input and wordcount/output as the job's input and output directories.
  Create the input directory on HDFS:
  hdfs dfs -mkdir -p wordcount/input
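  Note that HDFS paths without a leading slash, such as wordcount/input, are relative to the current user's HDFS home directory, typically /user/<username>. If in doubt, you can list it (assuming your HDFS user matches your login name):
  hdfs dfs -ls /user/$(whoami)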
  Prepare some text files as test data. The two files used in this article are shown below:
  File 1: input1
The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
The project includes these modules:
Hadoop Common: The common utilities that support the other Hadoop modules.
Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
Hadoop YARN: A framework for job scheduling and cluster resource management.
Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
  File 2: input2
Apache Hadoop 2.6.0 is a minor release in the 2.x.y release line, building upon the previous stable release 2.4.1.
Here is a short overview of the major features and improvements.
Common
Authentication improvements when using an HTTP proxy server. This is useful when accessing WebHDFS via a proxy server.
A new Hadoop metrics sink that allows writing directly to Graphite.
Specification work related to the Hadoop Compatible Filesystem (HCFS) effort.
HDFS
Support for POSIX-style filesystem extended attributes. See the user documentation for more details.
Using the OfflineImageViewer, clients can now browse an fsimage via the WebHDFS API.
The NFS gateway received a number of supportability improvements and bug fixes. The Hadoop portmapper is no longer required to run the gateway, and the gateway is now able to reject connections from unprivileged ports.
The SecondaryNameNode, JournalNode, and DataNode web UIs have been modernized with HTML5 and Javascript.
YARN
YARN's REST APIs now support write/modify operations. Users can submit and kill applications through REST APIs.
The timeline store in YARN, used for storing generic and application-specific information for applications, supports authentication through Kerberos.
The Fair Scheduler supports dynamic hierarchical user queues, user queues are created dynamically at runtime under any specified parent-queue.
  Copy these two files to wordcount/input:
  hdfs dfs -copyFromLocal input* wordcount/input/
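  It may be worth verifying the upload before running the job:
  hdfs dfs -ls wordcount/input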
  3. Run the program
  Run the program on Hadoop:
  hadoop jar wc.jar WordCount wordcount/input wordcount/output
  The results are written to wordcount/output. List the output directory:
  hdfs dfs -ls wordcount/output
  View the output:
  hdfs dfs -cat wordcount/output/part-r-00000
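  Each line of part-r-00000 is a word followed by its count, separated by a tab. If you want to rerun the job, delete the output directory first, because MapReduce refuses to overwrite an existing output path:
  hdfs dfs -rm -r wordcount/output
  To copy the results back to the local filesystem, something like the following works (the local target directory name is arbitrary):
  hdfs dfs -get wordcount/output ./wordcount-output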