Before installing Hadoop you need:
1. Java 1.6.x, preferably Sun's JDK (1.5.x also works)
2. ssh
Install ssh and rsync:
$ sudo apt-get install ssh
$ sudo apt-get install rsync
Download Hadoop
Download the latest release from http://hadoop.apache.org/core/releases.html
It is best to create a dedicated user for Hadoop, for example a user named hadoop in a group named hadoop:
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hadoop
Extract the downloaded Hadoop archive into /home/hadoop, naming the directory hadoop.
Configure JAVA_HOME:
gedit ~/hadoop/conf/hadoop-env.sh
Change
# The java implementation to use. Required.
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
to point at your Java installation directory (mine is /usr/lib/jvm/java-6-sun-1.6.0.15):
# The java implementation to use. Required.
export JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.15
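If you are not sure where your JDK lives, the directory can usually be recovered from the `java` binary on your PATH. A minimal sketch of the string manipulation involved — the hard-coded path below is just an example; on a real system substitute the output of `readlink -f "$(which java)"`:

```shell
# Example path only -- on a real machine obtain it with:
#   readlink -f "$(which java)"      (resolves to .../bin/java inside the JDK)
java_bin=/usr/lib/jvm/java-6-sun-1.6.0.15/bin/java
# Strip the trailing /bin/java with shell parameter expansion to get JAVA_HOME
java_home=${java_bin%/bin/java}
echo "$java_home"    # /usr/lib/jvm/java-6-sun-1.6.0.15
```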
Now you can run Hadoop in standalone (single-node) mode:
$ cd hadoop
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
$ cat output/*
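The bundled grep example counts, across all input files, every string matching the given regular expression. Roughly the same computation can be sketched with ordinary shell tools (this runs on throwaway demo data in a temp directory, not on Hadoop):

```shell
# Count occurrences of each string matching 'dfs[a-z.]+' -- roughly what
# the Hadoop grep example writes into output/. Demo data in a temp dir.
tmp=$(mktemp -d)
printf '<name>dfs.replication</name>\n<name>dfs.replication</name>\n' > "$tmp/sample.xml"
grep -ohE 'dfs[a-z.]+' "$tmp"/*.xml | sort | uniq -c | sort -rn
```

Here `-o` prints only the matching part of each line and `-h` suppresses filenames; `uniq -c` produces the per-match counts that the MapReduce job emits.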
Running in pseudo-distributed mode:
Configure ssh:
$ su - hadoop
$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
9d:47:ab:d7:22:54:f0:f9:b9:3b:64:93:12:75:81:27 hadoop@ubuntu
Enable passwordless login:
hadoop@ubuntu:~$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Then try:
$ ssh localhost
and check that it logs you in without asking for a password.
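Note that the `cat ... >> authorized_keys` line appends unconditionally, so running it a second time leaves a duplicate key, and sshd silently ignores an authorized_keys file with lax permissions. A sketch of an idempotent install, demonstrated on temp files so it is safe to run anywhere — point pub/auth at ~/.ssh/id_rsa.pub and ~/.ssh/authorized_keys for real use:

```shell
# Demo in a temp dir; the key string below is a harmless stand-in.
demo=$(mktemp -d)
pub="$demo/id_rsa.pub"
auth="$demo/authorized_keys"
echo "ssh-rsa AAAAB3NzaDEMOKEY hadoop@ubuntu" > "$pub"
touch "$auth"
# Append twice: the grep -qxF guard keeps the key from being duplicated.
for i in 1 2; do
  grep -qxF "$(cat "$pub")" "$auth" || cat "$pub" >> "$auth"
done
chmod 600 "$auth"   # sshd ignores authorized_keys that are group/world writable
wc -l < "$auth"     # 1 line, not 2
```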
Hadoop configuration files:
conf/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoop-datastore/hadoop-${user.name}</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
Set hadoop.tmp.dir to any path you like; ${user.name} expands automatically to the name of the user running Hadoop.
conf/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
dfs.replication is the default number of replicas per block.
conf/mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
Running it
Format the distributed filesystem:
$ bin/hadoop namenode -format
Start Hadoop:
$ bin/start-all.sh
You can inspect the NameNode and JobTracker through their web interfaces:
NameNode - http://localhost:50070/
JobTracker - http://localhost:50030/
Run the example:
$ bin/hadoop fs -put conf input
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
Examine the results:
$ bin/hadoop fs -get output output
$ cat output/*
References:
1. http://hadoop.apache.org/common/docs/current/quickstart.html
2. http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Single-Node_Cluster%29