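Senior technical expert at Taobao Mall (Tmall). 3 years of development, 3 years of performance testing and tuning / system testing, and 4 years of team management, test architecture, and R&D systems practice. A new stage brings a fresh start: I am deepening my work on test infrastructure and R&D architecture, and hope to become a real expert in some technical field. Referrals welcome: http://bbs.51testing.com/viewthread.php?tid=120496&extra=&page=1 . Email: jianzhao.liangjz@alibaba-inc.com, MSN: liangjianzhao@163.com. Weibo: http://t.sina.com.cn/1674816524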

Deploying Hadoop for Distributed Computing

2008-08-22 20:24:35 / Category: Automated test framework implementation and optimization

 

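Many people have heard of Google's cloud computing and its foundations, MapReduce and GFS, but only on paper. Apache and Yahoo collaborate on a similar project, Hadoop. Companies in China, such as Alimama, already use it in production, and abroad there is the Hive project.

Rather than keep talking, let's deploy one and try it out.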

 

Hadoop requires Sun JDK 1.5 or later and a Linux platform.


Further reading: http://www.infoq.com/cn/articles/hadoop-config-tip
http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Single-Node_Cluster)#Prerequisites

 

Distributed deployment

See Hadoop Cluster Setup.

 

Two machines: 10.0.4.145 (NameNode and JobTracker roles) and 10.0.4.146 (DataNode and TaskTracker roles).

Create the same directory on both: /home/search/hadoop-0.17.1

 

On the master node (145):

[search@b2bsearch145 ~]$ cat .bash_profile

export HADOOP_HOME=/home/search/hadoop-0.17.1

export PATH=$PATH:$HADOOP_HOME/bin

 

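This makes the hadoop commands convenient to run from the command line.

1.1   Creating keys to set up trusted login

Set up trusted SSH keys from the master to each slave.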

 

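Make sure the permissions on ~/.ssh/authorized_keys are 600. (The listing below shows 644 on this machine. sshd's StrictModes check rejects only files that are writable by group or others, so 644 also works, but 600 is stricter.)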

[search@b2bsearch145 hadoop-0.17.1]$ ll ~/.ssh/authorized_keys

-rw-r--r-- 1 search search 2529 625 10:01 /home/search/.ssh/authorized_keys

[search@b2bsearch145 hadoop-0.17.1]$ cat ~/.ssh/authorized_keys |grep 146

ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAIEA7naNGEpcbuon2/4M+0FDRp594MNk7jV0U3SaDLlT4vLvo0viCSP/2mEMi7iadaogkSr3FbIHryUsOhZ1MSwiDc2nv3TgxAh3K/jQkbP1MDGdHzOVvScrWcTfpFhDtL29HQJit5fpST0aZDlbCn8LsYX+y171Pun9Q4HyT9TkUL0= search@alitest146

ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAIEA06pe9YZTEEqmiutmjWQ1CgnmOWd3xh2YkqDinSuZi7t/Uyg/u/l0vJ5nv196dnYqdJJTyaVUU+ydcS7UJu+ykpeIYZGSL6XC2MqTMCpEVAtqP9WUhFXToJmq0tDrlYTfnYZOCIrDt+hjp+c7E7EH3phtEHdrlaAs9ZvcM/6/4L0= search@intl_search38146
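One way to set up the trust described in this section is to generate a key pair on the master and append its public key to authorized_keys on each slave. The sketch below is an assumption about the procedure, not necessarily the exact steps used on these machines. install_pubkey is a hypothetical helper name, and the paths in the usage comment are the OpenSSH defaults.

```shell
# Hypothetical helper: append a public key to SSH_DIR/authorized_keys
# (once only) and tighten the permissions that sshd checks.
install_pubkey() {
  local pubkey_file="$1"
  local ssh_dir="$2"
  mkdir -p "$ssh_dir" || return 1
  chmod 700 "$ssh_dir"
  touch "$ssh_dir/authorized_keys"
  # -x whole line, -F fixed string, -f take the pattern from the key file:
  # append only when the key line is not already present.
  if ! grep -qxF -f "$pubkey_file" "$ssh_dir/authorized_keys"; then
    cat "$pubkey_file" >> "$ssh_dir/authorized_keys"
  fi
  chmod 600 "$ssh_dir/authorized_keys"
}

# Usage (not run here):
#   on the master:  ssh-keygen -t rsa      # writes ~/.ssh/id_rsa.pub
#   copy id_rsa.pub to the slave, then on the slave:
#   install_pubkey /tmp/id_rsa.pub ~/.ssh
```

Calling the helper twice with the same key leaves a single entry, so it is safe to rerun.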

 

1.2   Configuration files

After unpacking, go into the conf directory. The main files to modify are hadoop-env.sh, hadoop-site.xml, masters, and slaves.

 

In the default hadoop-default.xml, change only the dfs.permissions.supergroup property, setting it to the current user's group name. Run bash -c groups to get the group name.

<property>

 <name>dfs.permissions.supergroup</name>

  <value>search</value>

 <description>The name of the group of super-users.</description>

</property>
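As a small illustration of the step above, the property block can be generated from the current user's primary group instead of typed by hand. This is only a sketch: supergroup_property is a hypothetical helper name, and it relies on id -gn, which prints the primary group name. The same block could also be placed in hadoop-site.xml as an override rather than edited into hadoop-default.xml.

```shell
# Hypothetical helper: print a dfs.permissions.supergroup property block.
# With no argument it uses the caller's primary group (id -gn).
supergroup_property() {
  local group="${1:-$(id -gn)}"
  printf '<property>\n'
  printf '  <name>dfs.permissions.supergroup</name>\n'
  printf '  <value>%s</value>\n' "$group"
  printf '</property>\n'
}

# Usage: supergroup_property search
```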

 

 

In hadoop-env.sh, change only JAVA_HOME.

 

# export JAVA_HOME=/usr/lib/j2sdk1.5-sun

export JAVA_HOME=/usr/ali/jdk1.6
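A wrong JAVA_HOME only shows up later, when the daemons fail to start, so it can help to check the path before going on. The following is a minimal sketch. check_java_home is a hypothetical helper name, and it only verifies that bin/java exists and is executable, not which JDK version it is.

```shell
# Hypothetical helper: succeed only if JAVA_HOME/bin/java is executable.
check_java_home() {
  local java_home="$1"
  if [ -x "$java_home/bin/java" ]; then
    echo "ok: $java_home/bin/java"
  else
    echo "missing or not executable: $java_home/bin/java" >&2
    return 1
  fi
}

# Usage: check_java_home /usr/ali/jdk1.6
```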

 

 

A working hadoop-site.xml:

 

 

[search@b2bsearch145 conf]$ vi hadoop-site.xml

 

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
  <name>fs.default.name</name>
  <value>hdfs://10.0.4.145:54310/</value>
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>hdfs://10.0.4.145:54311/</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/search/hadoop-0.17.1/tmp/</value>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>
</property>
<property>
  <name>dfs.block.size</name>
  <value>5120000</value>
  <description>The default block size for new files.</description>
</property>
</configuration>
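The dfs.block.size value is in bytes, so 5120000 is roughly 4.9 MB, well below the 64 MB default of this Hadoop generation. To get a feel for what that means, the sketch below computes how many blocks a file of a given size would occupy. blocks_for is a hypothetical helper name, and the calculation is plain ceiling division, not something read back from HDFS.

```shell
# Hypothetical helper: number of HDFS blocks needed for FILE_SIZE bytes,
# given BLOCK_SIZE bytes per block (defaults to the value configured above).
blocks_for() {
  local file_size="$1"
  local block_size="${2:-5120000}"
  # Ceiling division using integer arithmetic.
  echo $(( (file_size + block_size - 1) / block_size ))
}

# Usage:
#   blocks_for 37978               # a file the size of hadoop-default.xml
#   blocks_for 100000000 67108864  # a 100 MB file with the 64 MB default
```

With dfs.replication set to 1, each of those blocks is stored exactly once, so a two-machine test cluster like this one has no redundancy.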

 


[search@b2bsearch145 conf]$ cat masters

10.0.4.145

[search@b2bsearch145 conf]$ cat slaves

10.0.4.146
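The start scripts log in over SSH to every host listed in these files, so each host must accept a passwordless login from the master. Judging by the startup output in section 1.3, the host in masters is where the secondarynamenode is started, and the hosts in slaves run the datanode and tasktracker. The sketch below walks a hosts file and runs a check per host. for_each_host and check_ssh are hypothetical helper names, and the SSH options are standard OpenSSH client options.

```shell
# Hypothetical helper: run a command once per host listed in HOSTS_FILE,
# skipping blank lines and lines that start with '#'.
for_each_host() {
  local hosts_file="$1"
  shift
  local host
  # The "|| [ -n ... ]" part handles a last line without a trailing newline.
  while read -r host || [ -n "$host" ]; do
    case "$host" in
      ''|\#*) continue ;;
    esac
    # Redirect stdin so the command (for example ssh) cannot swallow the list.
    "$@" "$host" < /dev/null
  done < "$hosts_file"
}

# Hypothetical check: succeeds only if a non-interactive login works.
check_ssh() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Usage (from the Hadoop directory on the master, not run here):
#   for_each_host conf/slaves check_ssh && echo "all slaves reachable"
```

BatchMode=yes makes ssh fail instead of prompting for a password, which is the behaviour the start scripts need.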

 

 

1.3   Initializing DFS and starting the daemons

 

[search@b2bsearch145 hadoop-0.17.1]$ bin/hadoop namenode -format

08/08/21 19:48:54 INFO dfs.NameNode: STARTUP_MSG:

/************************************************************

STARTUP_MSG: Starting NameNode

STARTUP_MSG:  host = b2bsearch145/10.0.4.145

STARTUP_MSG:  args = [-format]

STARTUP_MSG:  version =0.17.1

STARTUP_MSG:  build = http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.17 -r 669344; compiled by 'hadoopqa' on Thu Jun 19 01:18:25 UTC 2008

************************************************************/

08/08/21 19:48:54 INFO fs.FSNamesystem: fsOwner=search,search

08/08/21 19:48:54 INFO fs.FSNamesystem: supergroup=search

08/08/21 19:48:54 INFO fs.FSNamesystem: isPermissionEnabled=true

08/08/21 19:48:54 INFO dfs.Storage: Storage directory /home/search/hadoop-0.17.1/tmp/dfs/name has been successfully formatted.

08/08/21 19:48:54 INFO dfs.NameNode: SHUTDOWN_MSG:

/************************************************************

SHUTDOWN_MSG: Shutting down NameNode at b2bsearch145/10.0.4.145

************************************************************/

 

[search@b2bsearch145 hadoop-0.17.1]$ bin/start-dfs.sh

starting namenode, logging to /home/search/hadoop-0.17.1/bin/../logs/hadoop-search-namenode-b2bsearch145.out

10.0.4.146: starting datanode, logging to /home/search/hadoop-0.17.1/bin/../logs/hadoop-search-datanode-alitest146.out

10.0.4.145: starting secondarynamenode, logging to /home/search/hadoop-0.17.1/bin/../logs/hadoop-search-secondarynamenode-b2bsearch145.out

[search@b2bsearch145 hadoop-0.17.1]$ bin/start-mapred.sh

starting jobtracker, logging to /home/search/hadoop-0.17.1/bin/../logs/hadoop-search-jobtracker-b2bsearch145.out

 

10.0.4.146: starting tasktracker, logging to /home/search/hadoop-0.17.1/bin/../logs/hadoop-search-tasktracker-alitest146.out

 

[search@b2bsearch145 hadoop-0.17.1]$ /usr/ali/jdk1.6/bin/jps

18390 NameNode

18589 JobTracker

18721 Jps

18521 SecondaryNameNode
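On the master, jps should list NameNode, JobTracker, and SecondaryNameNode, as above. On the slave the expected names would be DataNode and TaskTracker. The sketch below turns that eyeball check into a small function. missing_daemons is a hypothetical helper name, and it only parses the two-column "pid name" text that jps prints.

```shell
# Hypothetical helper: given jps output and a list of expected daemon names,
# print each expected name that does not appear in the output.
missing_daemons() {
  local jps_output="$1"
  shift
  local name
  for name in "$@"; do
    # Compare the second column exactly, so NameNode does not match
    # SecondaryNameNode.
    printf '%s\n' "$jps_output" \
      | awk -v n="$name" '$2 == n { found = 1 } END { exit !found }' \
      || echo "$name"
  done
}

# Usage (not run here):
#   missing_daemons "$(/usr/ali/jdk1.6/bin/jps)" NameNode JobTracker SecondaryNameNode
```

An empty result means every expected daemon is running.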

 

 

1.4   Running the distributed word count

[search@b2bsearch145 hadoop-0.17.1]$ bin/hadoop dfs -copyFromLocal input/ test-in

[search@b2bsearch145 hadoop-0.17.1]$ bin/hadoop dfs -ls

Found 1 items

/user/search/test-in   <dir>          2008-08-21 19:53       rwxr-xr-x      search search

[search@b2bsearch145 hadoop-0.17.1]$ bin/hadoop dfs -ls /user/search/test-in

Found 2 items

/user/search/test-in/hadoop-default.xml <r 1>  37978  2008-08-21 19:53       rw-r--r--      search search

/user/search/test-in/hadoop-site.xml   <r 1>  178    2008-08-21 19:53       rw-r--r--      search search

[search@b2bsearch145 hadoop-0.17.1]$ bin/hadoop dfs -cat /user/search/test-in/hadoop-default.xml

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

 

[search@b2bsearch145 hadoop-0.17.1]$ bin/hadoop jar hadoop-0.17.1-examples.jar wordcount /user/search/test-in test-out

08/08/21 19:55:45 INFO mapred.FileInputFormat: Total input paths to process : 2

08/08/21 19:55:46 INFO mapred.JobClient: Running job: job_200808211951_0001

08/08/21 19:55:47 INFO mapred.JobClient: map 0% reduce 0%

08/08/21 19:55:52 INFO mapred.JobClient: map 66% reduce 0%

08/08/21 19:55:54 INFO mapred.JobClient: map 100% reduce 0%

08/08/21 19:56:01 INFO mapred.JobClient: map 100% reduce 100%

08/08/21 19:56:02 INFO mapred.JobClient: Job complete: job_200808211951_0001

08/08/21 19:56:02 INFO mapred.JobClient: Counters: 16

08/08/21 19:56:02 INFO mapred.JobClient:  File Systems

08/08/21 19:56:02 INFO mapred.JobClient:    Local bytes read=36202

08/08/21 19:56:02 INFO mapred.JobClient:    Local bytes written=72658

08/08/21 19:56:02 INFO mapred.JobClient:    HDFS bytes read=39559

08/08/21 19:56:02 INFO mapred.JobClient:    HDFS bytes written=19133

08/08/21 19:56:02 INFO mapred.JobClient:  Job Counters

08/08/21 19:56:02 INFO mapred.JobClient:    Launched map tasks=3

08/08/21 19:56:02 INFO mapred.JobClient:    Launched reduce tasks=1

08/08/21 19:56:02 INFO mapred.JobClient:    Data-local map tasks=3

08/08/21 19:56:02 INFO mapred.JobClient:  Map-Reduce Framework

08/08/21 19:56:02 INFO mapred.JobClient:    Map input records=1239

08/08/21 19:56:02 INFO mapred.JobClient:    Map output records=3888

08/08/21 19:56:02 INFO mapred.JobClient:    Map input bytes=38156

08/08/21 19:56:02 INFO mapred.JobClient:    Map output bytes=51308

08/08/21 19:56:02 INFO mapred.JobClient:    Combine input records=3888

08/08/21 19:56:02 INFO mapred.JobClient:    Combine output records=1428

08/08/21 19:56:02 INFO mapred.JobClient:    Reduce input groups=1211

08/08/21 19:56:02 INFO mapred.JobClient:    Reduce input records=1428

08/08/21 19:56:02 INFO mapred.JobClient:    Reduce output records=1211
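To sanity-check the job, the same count can be reproduced locally on the input files with standard tools. The sketch below assumes the example wordcount splits its input on whitespace only, which is why punctuation stays attached to words in the results. local_wordcount is a hypothetical helper name. Under that assumption, the total number of tokens should correspond to "Map output records" and the number of distinct words to "Reduce output records".

```shell
# Hypothetical helper: read text on stdin and print "word<TAB>count",
# one line per distinct whitespace-separated word.
local_wordcount() {
  tr -s '[:space:]' '\n' \
    | grep -v '^$' \
    | LC_ALL=C sort \
    | uniq -c \
    | awk '{ print $2 "\t" $1 }'
}

# Usage (not run here):
#   cat input/* | local_wordcount | wc -l    # number of distinct words
#   cat input/* | wc -w                      # total number of tokens
```

The map step corresponds to the tr that emits one word per line, and the reduce step to the sort and uniq -c that group identical words and count them.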

 

1.5   Viewing the results

[search@b2bsearch145 hadoop-0.17.1]$ bin/hadoop dfs -ls  /user/search/test-out 

Found 2 items

/user/search/test-out/_logs    <dir>          2008-08-21 19:55       rwxr-xr-x      search search

/user/search/test-out/part-00000       <r 1>  19133  2008-08-21 19:55       rw-r--r--      search search

[search@b2bsearch145 hadoop-0.17.1]$ bin/hadoop dfs -cat /user/search/test-out/part-00000 |more

"_logs/history/"       1

"all".</description>   1

"block"(trace  1

"dir"(trac     1

"false",       1

"local",       1

 

For more commands, see Hadoop Shell Commands.

 

 

1.6   Browsing the NameNode in the Web UI

 

 

1.7   Browsing the DataNode in the Web UI

 

 

 

1.8   Browsing the JobTracker in the Web UI

 

 

 

1.9   Browsing the TaskTracker in the Web UI
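The screenshots for the four web UIs are missing, but the pages can be opened directly in a browser. The sketch below prints the URLs for this two-machine setup. The port numbers are assumptions: they are what I believe the stock conf/hadoop-default.xml of this release uses for the *.http.address properties, so check that file if a page does not load. web_ui_urls is a hypothetical helper name.

```shell
# Assumed stock defaults from conf/hadoop-default.xml; adjust if yours differ.
NAMENODE_HTTP_PORT=50070
DATANODE_HTTP_PORT=50075
JOBTRACKER_HTTP_PORT=50030
TASKTRACKER_HTTP_PORT=50060

# Hypothetical helper: print the web UI URL of each daemon.
# MASTER runs NameNode and JobTracker, SLAVE runs DataNode and TaskTracker.
web_ui_urls() {
  local master="$1"
  local slave="$2"
  echo "NameNode    http://$master:$NAMENODE_HTTP_PORT/"
  echo "DataNode    http://$slave:$DATANODE_HTTP_PORT/"
  echo "JobTracker  http://$master:$JOBTRACKER_HTTP_PORT/"
  echo "TaskTracker http://$slave:$TASKTRACKER_HTTP_PORT/"
}

# Usage: web_ui_urls 10.0.4.145 10.0.4.146
```

To confirm the ports on a real installation, search conf/hadoop-default.xml for property names containing http.address.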

 

1.10      Stopping the application
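The commands for this step did not survive in the post. A reasonable way to stop the cluster, mirroring the start-dfs.sh and start-mapred.sh calls in section 1.3, is to run the matching stop scripts in reverse order. The sketch below is an assumption about the procedure rather than a record of what was run. stop_cluster is a hypothetical helper name, while bin/stop-mapred.sh and bin/stop-dfs.sh are the scripts shipped alongside the start scripts.

```shell
# Hypothetical helper: stop MapReduce first, then HDFS, so that no running
# job loses its file system underneath it.
stop_cluster() {
  local hadoop_home="${1:-$HADOOP_HOME}"
  "$hadoop_home/bin/stop-mapred.sh" && "$hadoop_home/bin/stop-dfs.sh"
}

# Usage (on the master, not run here):
#   stop_cluster /home/search/hadoop-0.17.1
```

Afterwards, jps on both machines should no longer list any of the Hadoop daemons.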

 

I will post more of the interesting experiences from using it later.



panluhai   /   2011-06-28 15:34:50
Thanks for sharing; I have been studying Hadoop recently too.
liangjz   /   2008-08-22 20:26:36
Unfortunately, the images that followed could not be uploaded.
The specific port numbers, however, can be found in conf/hadoop-default.xml.
 
