For a distributed deployment, refer to the Hadoop Cluster Setup guide.
The cluster has two machines: 10.0.4.145 (NameNode and JobTracker roles) and 10.0.4.146 (DataNode and TaskTracker roles).
Create the same directory, /home/search/hadoop-0.17.1, on both.
On the master node (145):
[search@b2bsearch145 ~]$ cat .bash_profile
export HADOOP_HOME=/home/search/hadoop-0.17.1
export PATH=$PATH:$HADOOP_HOME/bin
Putting $HADOOP_HOME/bin on the PATH makes command-line operation convenient.
1.1 Creating keys for trusted (passwordless) SSH login
Set up trusted SSH keys from the master to every slave.
Make sure ~/.ssh/authorized_keys has file permission 600. (Note that the listing below still shows 644; if logins are rejected, fix it with chmod 600 ~/.ssh/authorized_keys.)
[search@b2bsearch145 hadoop-0.17.1]$ ll ~/.ssh/authorized_keys
-rw-r--r-- 1 search search 2529 6月25 10:01 /home/search/.ssh/authorized_keys
[search@b2bsearch145 hadoop-0.17.1]$ cat ~/.ssh/authorized_keys |grep 146
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAIEA7naNGEpcbuon2/4M+0FDRp594MNk7jV0U3SaDLlT4vLvo0viCSP/2mEMi7iadaogkSr3FbIHryUsOhZ1MSwiDc2nv3TgxAh3K/jQkbP1MDGdHzOVvScrWcTfpFhDtL29HQJit5fpST0aZDlbCn8LsYX+y171Pun9Q4HyT9TkUL0= search@alitest146
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAIEA06pe9YZTEEqmiutmjWQ1CgnmOWd3xh2YkqDinSuZi7t/Uyg/u/l0vJ5nv196dnYqdJJTyaVUU+ydcS7UJu+ykpeIYZGSL6XC2MqTMCpEVAtqP9WUhFXToJmq0tDrlYTfnYZOCIrDt+hjp+c7E7EH3phtEHdrlaAs9ZvcM/6/4L0= search@intl_search38146
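One way to set this trust up, sketched below under the assumption of OpenSSH on a Linux host. On the real cluster you would copy id_rsa.pub to each slave (e.g. with scp) and append it there; the local append shown here also covers the master SSH-ing to itself, which start-dfs.sh does when it launches the SecondaryNameNode:

```shell
# On the master: create the .ssh directory and an RSA key pair if missing
mkdir -p ~/.ssh && chmod 700 ~/.ssh
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N "" -q -f ~/.ssh/id_rsa

# Append the master's public key to authorized_keys; do the same on each
# slave with a copy of the master's id_rsa.pub (e.g. transferred via scp)
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

# sshd is picky about permissions: authorized_keys should be 600
chmod 600 ~/.ssh/authorized_keys
```

After this, `ssh 10.0.4.146` from the master should log in without a password prompt.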
1.2 Configuration files
After unpacking, go to the conf directory. The files that need changes are hadoop-env.sh, hadoop-site.xml, masters, and slaves.
In hadoop-default.xml, only the dfs.permissions.supergroup property value needs changing: set it to the current user's group name (run groups to get it; strictly speaking this override could also live in hadoop-site.xml):
<property>
<name>dfs.permissions.supergroup</name>
<value>search</value>
<description>The name of the group of super-users.</description>
</property>
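To find the group name to put in that value (a small sketch; `id -gn` prints the current user's primary group, i.e. the first entry that `groups` lists):

```shell
# The supergroup value should be the group the hadoop user belongs to.
SUPERGROUP=$(id -gn)   # primary group of the current user
echo "dfs.permissions.supergroup = $SUPERGROUP"
```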
In hadoop-env.sh, only JAVA_HOME needs changing:
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
export JAVA_HOME=/usr/ali/jdk1.6
A working hadoop-site.xml looks like this:
[search@b2bsearch145 conf]$ vi hadoop-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://10.0.4.145:54310/</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>hdfs://10.0.4.145:54311/</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/search/hadoop-0.17.1/tmp/</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx512m</value>
  </property>
  <property>
    <name>dfs.block.size</name>
    <value>5120000</value>
    <description>The default block size for new files.</description>
  </property>
</configuration>
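One detail worth checking when picking a non-default dfs.block.size: HDFS requires the block size to be a multiple of io.bytes.per.checksum (512 bytes by default), otherwise writes fail. A quick arithmetic check for the value used here:

```shell
# dfs.block.size must be a multiple of io.bytes.per.checksum (default 512),
# or HDFS will refuse to write files.
BLOCK_SIZE=5120000        # value from hadoop-site.xml above
BYTES_PER_CHECKSUM=512    # hadoop-default.xml default
if [ $((BLOCK_SIZE % BYTES_PER_CHECKSUM)) -eq 0 ]; then
  echo "OK: $BLOCK_SIZE is a multiple of $BYTES_PER_CHECKSUM"
else
  echo "BAD: adjust dfs.block.size"
fi
```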
[search@b2bsearch145 conf]$ cat masters
10.0.4.145
[search@b2bsearch145 conf]$ cat slaves
10.0.4.146
1.3 Formatting HDFS and starting the daemons
[search@b2bsearch145 hadoop-0.17.1]$ bin/hadoop namenode -format
08/08/21 19:48:54 INFO dfs.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = b2bsearch145/10.0.4.145
STARTUP_MSG: args = [-format]
STARTUP_MSG: version =0.17.1
STARTUP_MSG: build = http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.17 -r 669344; compiled by 'hadoopqa' on Thu Jun 19 01:18:25 UTC 2008
************************************************************/
08/08/21 19:48:54 INFO fs.FSNamesystem: fsOwner=search,search
08/08/21 19:48:54 INFO fs.FSNamesystem: supergroup=search
08/08/21 19:48:54 INFO fs.FSNamesystem: isPermissionEnabled=true
08/08/21 19:48:54 INFO dfs.Storage: Storage directory /home/search/hadoop-0.17.1/tmp/dfs/name has been successfully formatted.
08/08/21 19:48:54 INFO dfs.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at b2bsearch145/10.0.4.145
************************************************************/
[search@b2bsearch145 hadoop-0.17.1]$ bin/start-dfs.sh
starting namenode, logging to /home/search/hadoop-0.17.1/bin/../logs/hadoop-search-namenode-b2bsearch145.out
10.0.4.146: starting datanode, logging to /home/search/hadoop-0.17.1/bin/../logs/hadoop-search-datanode-alitest146.out
10.0.4.145: starting secondarynamenode, logging to /home/search/hadoop-0.17.1/bin/../logs/hadoop-search-secondarynamenode-b2bsearch145.out
[search@b2bsearch145 hadoop-0.17.1]$ bin/start-mapred.sh
starting jobtracker, logging to /home/search/hadoop-0.17.1/bin/../logs/hadoop-search-jobtracker-b2bsearch145.out
10.0.4.146: starting tasktracker, logging to /home/search/hadoop-0.17.1/bin/../logs/hadoop-search-tasktracker-alitest146.out
[search@b2bsearch145 hadoop-0.17.1]$ /usr/ali/jdk1.6/bin/jps
18390 NameNode
18589 JobTracker
18721 Jps
18521 SecondaryNameNode
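All three master-side daemons appear in jps; the slave should show DataNode and TaskTracker (run jps there over SSH to confirm). A small check against the jps output, with the sample text from the session above hard-coded; on a live node replace it with real jps output:

```shell
# Verify the expected master daemons appear in jps output.
# Sample copied from the session above; on a live node use: jps
JPS_OUT='18390 NameNode
18589 JobTracker
18721 Jps
18521 SecondaryNameNode'
for daemon in NameNode SecondaryNameNode JobTracker; do
  if echo "$JPS_OUT" | grep -q " ${daemon}\$"; then
    echo "$daemon: running"
  else
    echo "$daemon: MISSING"
  fi
done
```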
1.4 Running the distributed word count
First copy a local input/ directory into HDFS (here it holds copies of hadoop-default.xml and hadoop-site.xml, as the listing below shows):
[search@b2bsearch145 hadoop-0.17.1]$ bin/hadoop dfs -copyFromLocal input/ test-in
[search@b2bsearch145 hadoop-0.17.1]$ bin/hadoop dfs -ls
Found 1 items
/user/search/test-in <dir> 2008-08-21 19:53 rwxr-xr-x search search
[search@b2bsearch145 hadoop-0.17.1]$ bin/hadoop dfs -ls /user/search/test-in
Found 2 items
/user/search/test-in/hadoop-default.xml <r 1> 37978 2008-08-21 19:53 rw-r--r-- search search
/user/search/test-in/hadoop-site.xml <r 1> 178 2008-08-21 19:53 rw-r--r-- search search
[search@b2bsearch145 hadoop-0.17.1]$ bin/hadoop dfs -cat /user/search/test-in/hadoop-default.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
[search@b2bsearch145 hadoop-0.17.1]$ bin/hadoop jar hadoop-0.17.1-examples.jar wordcount /user/search/test-in test-out
08/08/21 19:55:45 INFO mapred.FileInputFormat: Total input paths to process : 2
08/08/21 19:55:46 INFO mapred.JobClient: Running job: job_200808211951_0001
08/08/21 19:55:47 INFO mapred.JobClient: map 0% reduce 0%
08/08/21 19:55:52 INFO mapred.JobClient: map 66% reduce 0%
08/08/21 19:55:54 INFO mapred.JobClient: map 100% reduce 0%
08/08/21 19:56:01 INFO mapred.JobClient: map 100% reduce 100%
08/08/21 19:56:02 INFO mapred.JobClient: Job complete: job_200808211951_0001
08/08/21 19:56:02 INFO mapred.JobClient: Counters: 16
08/08/21 19:56:02 INFO mapred.JobClient: File Systems
08/08/21 19:56:02 INFO mapred.JobClient: Local bytes read=36202
08/08/21 19:56:02 INFO mapred.JobClient: Local bytes written=72658
08/08/21 19:56:02 INFO mapred.JobClient: HDFS bytes read=39559
08/08/21 19:56:02 INFO mapred.JobClient: HDFS bytes written=19133
08/08/21 19:56:02 INFO mapred.JobClient: Job Counters
08/08/21 19:56:02 INFO mapred.JobClient: Launched map tasks=3
08/08/21 19:56:02 INFO mapred.JobClient: Launched reduce tasks=1
08/08/21 19:56:02 INFO mapred.JobClient: Data-local map tasks=3
08/08/21 19:56:02 INFO mapred.JobClient: Map-Reduce Framework
08/08/21 19:56:02 INFO mapred.JobClient: Map input records=1239
08/08/21 19:56:02 INFO mapred.JobClient: Map output records=3888
08/08/21 19:56:02 INFO mapred.JobClient: Map input bytes=38156
08/08/21 19:56:02 INFO mapred.JobClient: Map output bytes=51308
08/08/21 19:56:02 INFO mapred.JobClient: Combine input records=3888
08/08/21 19:56:02 INFO mapred.JobClient: Combine output records=1428
08/08/21 19:56:02 INFO mapred.JobClient: Reduce input groups=1211
08/08/21 19:56:02 INFO mapred.JobClient: Reduce input records=1428
08/08/21 19:56:02 INFO mapred.JobClient: Reduce output records=1211
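For this job the counters line up as expected: the combiner consumed exactly the map output, the reducer exactly the combiner output, and wordcount emits one output record per key. Checking the arithmetic on the numbers above:

```shell
# MapReduce counter consistency for this job (values from the log above):
MAP_OUTPUT=3888;    COMBINE_INPUT=3888
COMBINE_OUTPUT=1428; REDUCE_INPUT=1428
REDUCE_GROUPS=1211;  REDUCE_OUTPUT=1211
[ "$MAP_OUTPUT" -eq "$COMBINE_INPUT" ]    && echo "map output feeds the combiner"
[ "$COMBINE_OUTPUT" -eq "$REDUCE_INPUT" ] && echo "combiner output feeds the reducer"
[ "$REDUCE_GROUPS" -eq "$REDUCE_OUTPUT" ] && echo "one output record per key (wordcount)"
```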
1.5 Inspecting the results
[search@b2bsearch145 hadoop-0.17.1]$ bin/hadoop dfs -ls /user/search/test-out
Found 2 items
/user/search/test-out/_logs <dir> 2008-08-21 19:55 rwxr-xr-x search search
/user/search/test-out/part-00000 <r 1> 19133 2008-08-21 19:55 rw-r--r-- search search
[search@b2bsearch145 hadoop-0.17.1]$ bin/hadoop dfs -cat /user/search/test-out/part-00000 |more
"_logs/history/" 1
"all".</descrīption> 1
"block"(trace 1
"dir"(trac 1
"false", 1
"local", 1
For more commands, refer to the Hadoop Shell Commands guide.
1.6 Browsing the NameNode web UI (by default at http://10.0.4.145:50070/)
1.7 Browsing the DataNode web UI (by default at http://10.0.4.146:50075/)
1.8 Browsing the JobTracker web UI (by default at http://10.0.4.145:50030/)
1.9 Browsing the TaskTracker web UI (by default at http://10.0.4.146:50060/)
1.10 Stopping the cluster
To shut everything down, run bin/stop-mapred.sh followed by bin/stop-dfs.sh on the master (the reverse of the startup order).
More notes on interesting experiences with this setup will be posted in follow-ups.