Saturday, September 22, 2012

Running Hadoop on Ubuntu in Pseudo-Distributed Mode

I set up Hadoop on Ubuntu in pseudo-distributed mode today; the detailed steps follow:

Install Ubuntu 11.10 i386 in VirtualBox. On this release, OpenJDK 6 installs to <strong>/usr/lib/jvm/java-6-openjdk</strong> by default.
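If the JDK is not already on the system, it can be pulled from the Ubuntu repositories (the package name below is an assumption for 11.10; adjust if your release differs):
<pre lang="bash">
# Install OpenJDK 6 (assumed package name on Ubuntu 11.10)
sudo apt-get update
sudo apt-get install openjdk-6-jdk
# Verify the install location used later for JAVA_HOME
ls /usr/lib/jvm/
</pre>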

Add a dedicated Hadoop user account for running Hadoop
<pre lang="bash">sudo addgroup hadoop
sudo adduser --ingroup hadoop hadoop</pre>
Configure SSH for the hadoop user
<pre lang="bash">sudo apt-get install ssh
su - hadoop
ssh-keygen -t rsa -P ""
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys</pre>
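Logging in to localhost should now work without a password prompt (accept the host key fingerprint on the first connection):
<pre lang="bash">
ssh localhost
exit
</pre>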
Download the latest stable release of Hadoop from Hadoop's homepage. I downloaded release 1.0.2 as a gzipped tar file (hadoop-1.0.2.tar.gz), then uncompressed it:
<pre lang="bash">tar zxvf hadoop-1.0.2.tar.gz
mv hadoop-1.0.2 hadoop</pre>
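Optionally, point an environment variable at the install location so the conf directory referenced below is easy to find (the /home/hadoop path is an assumption based on the layout above):
<pre lang="bash">
# Assumes the tarball was unpacked under /home/hadoop
echo 'export HADOOP_INSTALL=/home/hadoop' >> ~/.bashrc
echo 'export PATH=$PATH:$HADOOP_INSTALL/hadoop/bin' >> ~/.bashrc
source ~/.bashrc
</pre>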
Configure Hadoop

The $HADOOP_INSTALL/hadoop/conf directory contains Hadoop's configuration files; the following files need to be edited.

hadoop-env.sh
<pre lang="bash">export JAVA_HOME=/usr/lib/jvm/java-6-openjdk</pre>
core-site.xml

<pre lang="xml">&lt;?xml version="1.0"?&gt;
&lt;?xml-stylesheet type="text/xsl" href="configuration.xsl"?&gt;

&lt;!-- Put site-specific property overrides in this file. --&gt;

&lt;configuration&gt;
  &lt;property&gt;
    &lt;name&gt;fs.default.name&lt;/name&gt;
    &lt;value&gt;hdfs://localhost:9000&lt;/value&gt;
  &lt;/property&gt;
&lt;/configuration&gt;</pre>

hdfs-site.xml

<pre lang="xml">&lt;?xml version="1.0"?&gt;
&lt;?xml-stylesheet type="text/xsl" href="configuration.xsl"?&gt;

&lt;!-- Put site-specific property overrides in this file. --&gt;

&lt;configuration&gt;
  &lt;property&gt;
    &lt;name&gt;dfs.replication&lt;/name&gt;
    &lt;value&gt;1&lt;/value&gt;
  &lt;/property&gt;
&lt;/configuration&gt;</pre>

mapred-site.xml

<pre lang="xml">&lt;?xml version="1.0"?&gt;
&lt;?xml-stylesheet type="text/xsl" href="configuration.xsl"?&gt;

&lt;!-- Put site-specific property overrides in this file. --&gt;

&lt;configuration&gt;
  &lt;property&gt;
    &lt;name&gt;mapred.job.tracker&lt;/name&gt;
    &lt;value&gt;localhost:9001&lt;/value&gt;
  &lt;/property&gt;
&lt;/configuration&gt;</pre>
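conf/masters and conf/slaves can be left at their defaults for a single-node setup; a quick sanity check:
<pre lang="bash">
cat conf/masters   # should read: localhost
cat conf/slaves    # should read: localhost
</pre>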

Format the HDFS filesystem
<pre lang="bash">
bin/hadoop namenode -format
</pre>
Start your single-node cluster
<pre lang="bash">
bin/start-all.sh
</pre>
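To confirm the cluster is up, jps (shipped with the JDK) should show the five Hadoop daemons, and the web UIs should respond:
<pre lang="bash">
jps
# Expected: NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker (plus Jps)
# NameNode web UI:   http://localhost:50070
# JobTracker web UI: http://localhost:50030
</pre>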
Run the WordCount example job
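The job reads test_wc.txt from /home/hadoop; any plain-text file will do, for example (file name and contents are just placeholders):
<pre lang="bash">
echo "hello hadoop hello world bye world" > /home/hadoop/test_wc.txt
</pre>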
<pre lang="bash">
bin/hadoop fs -copyFromLocal /home/hadoop/test_wc.txt test_wc.txt
bin/hadoop fs -ls
bin/hadoop jar hadoop-examples-1.0.2.jar wordcount test_wc.txt test_wc-output
bin/hadoop fs -cat test_wc-output/part-r-00000
bin/hadoop fs -copyToLocal test_wc-output /home/hadoop/test_wc-output
</pre>

Stop your single-node cluster
<pre lang="bash">
bin/stop-all.sh
</pre>
References:

Hadoop: The Definitive Guide
Running Hadoop On Ubuntu Linux (Single-Node Cluster)
