Hadoop Single-Node Installation


Published: 2017-5-16 09:34


Author: doc001 Source: 简书 (Jianshu)

9. Format the HDFS file system

Use the following command to format the HDFS file system:

　　hdfs namenode -format

Starting Hadoop

Start HDFS:

　　start-dfs.sh

Start YARN:

　　start-yarn.sh

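By default, the HDFS and YARN web consoles listen on ports 50070 and 8088 respectively (50070 applies to Hadoop 2.x; Hadoop 3.x moved the NameNode web UI to port 9870).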
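As a quick way to confirm that both consoles are up, the following is a minimal sketch that tries to open a TCP connection to each port. The class name WebConsoleCheck, the host localhost, and the one-second timeout are assumptions made for illustration; they are not part of the original setup.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class WebConsoleCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    public static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // connection refused, host unreachable, or timed out
        }
    }

    public static void main(String[] args) {
        // Default ports named in the article; "localhost" is an assumption
        // that fits a single-node installation.
        int[] ports = {50070, 8088};
        for (int port : ports) {
            System.out.println("localhost:" + port + " listening: "
                    + isListening("localhost", port, 1000));
        }
    }
}
```

A successful connection only shows that something is listening on the port; opening the console in a browser is still the more reliable check.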

If everything is working, jps lists the running Hadoop services. On my machine the output is:

　　29117 NameNode

　　29675 ResourceManager

　　29278 DataNode

　　30002 NodeManager

　　30123 Jps

　　29469 SecondaryNameNode
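The listing above can also be checked programmatically. The sketch below parses jps-style output and reports which of the five daemons shown above are missing. The class name JpsCheck and the method missingDaemons are hypothetical names chosen for this example.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class JpsCheck {

    // Daemons that appear in the article's jps listing for a single-node setup.
    static final List<String> EXPECTED = Arrays.asList(
            "NameNode", "DataNode", "SecondaryNameNode",
            "ResourceManager", "NodeManager");

    /** Returns the expected daemons that do not appear in the given jps output. */
    public static Set<String> missingDaemons(String jpsOutput) {
        Set<String> running = new LinkedHashSet<>();
        for (String line : jpsOutput.split("\\R")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length >= 2) {
                running.add(parts[1]); // parts[0] is the PID, parts[1] the class name
            }
        }
        Set<String> missing = new LinkedHashSet<>(EXPECTED);
        missing.removeAll(running);
        return missing;
    }

    public static void main(String[] args) {
        // The sample is the output quoted in the article.
        String sample = String.join("\n",
                "29117 NameNode", "29675 ResourceManager", "29278 DataNode",
                "30002 NodeManager", "30123 Jps", "29469 SecondaryNameNode");
        System.out.println("Missing daemons: " + missingDaemons(sample));
    }
}
```

If a daemon is reported missing, its log file under the Hadoop logs directory is usually the first place to look.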

Running a Hadoop job

The well-known WordCount example below shows how to use Hadoop.

1. Prepare the program package

The WordCount source code follows.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every whitespace-separated token in a line.
  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: sums all counts received for a word. Also used as the combiner.
  public static class IntSumReducer
       extends Reducer<Text,IntWritable,Text,IntWritable> {

    private IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  // Driver: args[0] is the input path, args[1] is the output path.
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
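To see what the job computes without a cluster, here is a plain-Java sketch of the same map, combine, and reduce steps. It uses no Hadoop classes and is an illustration only, not how Hadoop executes the job. The class WordCountSketch and its methods mapAndCombine and reduce are hypothetical names. Because the tokenizer splits on whitespace only, "fox" and "fox." are counted as different words, and so are "The" and "the".

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class WordCountSketch {

    /** Map + combine: tokenize one input split and count its words locally. */
    public static Map<String, Integer> mapAndCombine(String split) {
        Map<String, Integer> partial = new TreeMap<>();
        StringTokenizer itr = new StringTokenizer(split); // splits on whitespace
        while (itr.hasMoreTokens()) {
            partial.merge(itr.nextToken(), 1, Integer::sum);
        }
        return partial;
    }

    /**
     * Reduce: add the partial counts from all splits. Addition is associative
     * and commutative, which is why the same summing logic can serve as both
     * combiner and reducer.
     */
    public static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> totals = new TreeMap<>(); // keys kept in sorted order
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, count) -> totals.merge(word, count, Integer::sum));
        }
        return totals;
    }

    public static void main(String[] args) {
        // Two made-up input splits, standing in for two input files.
        Map<String, Integer> totals = reduce(Arrays.asList(
                mapAndCombine("The quick fox"),
                mapAndCombine("the quick fox.")));
        System.out.println(totals); // prints {The=1, fox=1, fox.=1, quick=2, the=1}
    }
}
```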

Compile the code and package it:

　　export HADOOP_CLASSPATH=$JAVA_HOME/lib/tools.jar

　　bin/hadoop com.sun.tools.javac.Main WordCount.java

　　jar cf wc.jar WordCount*.class

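wc.jar is the packaged Hadoop MapReduce program.

2. Prepare the input files

Our Hadoop MapReduce program reads its input files from HDFS and writes its output to HDFS as well. This article uses wordcount/input and wordcount/output as the input and output directories of the test program. These are relative paths, which HDFS resolves against the user's home directory (/user/<username>).

Create the input directory on HDFS: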

　　hdfs dfs -mkdir -p wordcount/input

Prepare some text files as test data. The two files used in this article are:

File 1: input1

The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.

The project includes these modules:

Hadoop Common: The common utilities that support the other Hadoop modules.

Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.

Hadoop YARN: A framework for job scheduling and cluster resource management.

Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.

File 2: input2

Apache Hadoop 2.6.0 is a minor release in the 2.x.y release line, building upon the previous stable release 2.4.1.

Here is a short overview of the major features and improvements.

Common

Authentication improvements when using an HTTP proxy server. This is useful when accessing WebHDFS via a proxy server.

A new Hadoop metrics sink that allows writing directly to Graphite.

Specification work related to the Hadoop Compatible Filesystem (HCFS) effort.

HDFS

Support for POSIX-style filesystem extended attributes. See the user documentation for more details.

Using the OfflineImageViewer, clients can now browse an fsimage via the WebHDFS API.

The NFS gateway received a number of supportability improvements and bug fixes. The Hadoop portmapper is no longer required to run the gateway, and the gateway is now able to reject connections from unprivileged ports.

The SecondaryNameNode, JournalNode, and DataNode web UIs have been modernized with HTML5 and Javascript.

YARN

YARN's REST APIs now support write/modify operations. Users can submit and kill applications through REST APIs.

The timeline store in YARN, used for storing generic and application-specific information for applications, supports authentication through Kerberos.

The Fair Scheduler supports dynamic hierarchical user queues, user queues are created dynamically at runtime under any specified parent-queue.

Copy both files to wordcount/input:

　　hdfs dfs -copyFromLocal input* wordcount/input/

3. Run the program

Run the program on Hadoop (the output directory must not already exist, or the job fails):

　　hadoop jar wc.jar WordCount wordcount/input wordcount/output

The results are written to wordcount/output. List the output directory:

　　hdfs dfs -ls wordcount/output

View the output:

　　hdfs dfs -cat wordcount/output/part-r-00000
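Each line of part-r-00000 holds a word and its count separated by a tab, which is the default text output format. As a possible next step, the sketch below parses lines of that form and returns the most frequent words. The class TopWords and the sample lines are made up for illustration; they are not output from the run described here.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class TopWords {

    /** Parses "word<TAB>count" lines and returns the n entries with the highest counts. */
    public static List<Map.Entry<String, Integer>> top(List<String> lines, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>();
        for (String line : lines) {
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                continue; // skip lines that are not key/value pairs
            }
            String word = line.substring(0, tab);
            int count = Integer.parseInt(line.substring(tab + 1).trim());
            entries.add(new AbstractMap.SimpleEntry<String, Integer>(word, count));
        }
        // Highest count first; the sort is stable, so ties keep their input order.
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return entries.subList(0, Math.min(n, entries.size()));
    }

    public static void main(String[] args) {
        // Hypothetical sample lines in the same format as part-r-00000.
        List<String> sample = Arrays.asList("Hadoop\t5", "a\t3", "the\t7");
        System.out.println(top(sample, 2)); // prints [the=7, Hadoop=5]
    }
}
```

To use it on real results, the lines printed by the hdfs dfs -cat command above can be saved to a local file and read in place of the sample list.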
