Multi-node Hadoop Cluster Setup

Installing Java 8 (Both Master and Slave)
—————–
$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java8-installer

root@test1:~# java -version
java version "1.8.0_66"
Java(TM) SE Runtime Environment (build 1.8.0_66-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode)

Configuring Java Environment (Both Master and Slave)
—————————-
$ sudo apt-get install oracle-java8-set-default

Creating hadoop User (Both master and Slave)
——————–
$ sudo adduser hadoop
$ sudo passwd hadoop

Switch to hadoop, generate RSA key and add to its authorized_keys (Both Master and Slave)
—————————————————————–
$ su - hadoop
$ ssh-keygen -t rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys

$ ssh localhost
$ exit
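The same key installation can be sketched in a throwaway directory to see what the steps above produce; on a real cluster the master's public key must additionally be copied to every slave (the `hadoop@<slave_IP>` target below is a placeholder):

```shell
# Generate a passphrase-less RSA key pair and install it, mirroring the steps above,
# but inside a temporary directory instead of ~/.ssh
tmp=$(mktemp -d)
ssh-keygen -t rsa -N "" -f "$tmp/id_rsa" -q
cat "$tmp/id_rsa.pub" >> "$tmp/authorized_keys"
chmod 0600 "$tmp/authorized_keys"            # authorized_keys must not be group/world readable
# On a real cluster, also push the master's key to each slave, e.g.:
#   ssh-copy-id hadoop@<slave_IP>
stat -c %a "$tmp/authorized_keys"            # prints 600
```

The 0600 mode matters: sshd silently refuses key authentication if authorized_keys is writable by anyone but the owner.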

Disable IPv6 – Hadoop does not support IPv6 (Both Master and Slave)
——————————————-
$ sudo vi /etc/sysctl.conf

net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

$ sudo sysctl -p
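The three settings can also be appended without opening an editor. The sketch below writes to a temporary file; on a real node the target is /etc/sysctl.conf (with sudo), followed by `sudo sysctl -p`:

```shell
# Append the IPv6-disable settings non-interactively.
# Using a temporary file here as a stand-in for /etc/sysctl.conf.
conf=$(mktemp)
cat >> "$conf" <<'EOF'
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
EOF
grep -c disable_ipv6 "$conf"   # prints 3
```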

Download Hadoop (Both Master and Slave)
—————

$ cd /home/hadoop/
$ wget http://apache.claz.org/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz
$ tar xzf hadoop-2.7.1.tar.gz
$ mv hadoop-2.7.1 hadoop
$ chown -R hadoop:hadoop hadoop
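It is worth verifying the tarball before unpacking it, since mirror downloads can be corrupt or tampered with. The sketch below demonstrates the `sha256sum -c` workflow against a stand-in file; for the real tarball, fetch the published checksum from the Apache archive alongside the download:

```shell
# Demonstrate checksum verification on a stand-in file.
# In practice, SHA256SUM would hold the checksum published by Apache.
tmp=$(mktemp -d)
printf 'demo' > "$tmp/hadoop-2.7.1.tar.gz"            # stand-in for the real tarball
( cd "$tmp" && sha256sum hadoop-2.7.1.tar.gz > SHA256SUM )
( cd "$tmp" && sha256sum -c SHA256SUM )               # prints "hadoop-2.7.1.tar.gz: OK"
```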

Configuration (Both Master and Slave)
————-

a. Edit the ~/.bashrc file and add the following contents

export HADOOP_HOME=/home/hadoop/hadoop
export HBASE_HOME=/home/hadoop/hbase
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$HBASE_HOME/bin

After updating ~/.bashrc, run

source ~/.bashrc
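A quick way to confirm the exports resolved correctly after sourcing. The sketch below sources a temporary copy of two of the lines above rather than the real ~/.bashrc:

```shell
# Source a temporary copy of the exports and check that nested
# variable references ($HADOOP_HOME inside another export) expand.
rc=$(mktemp)
cat > "$rc" <<'EOF'
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
EOF
. "$rc"
echo "$HADOOP_COMMON_LIB_NATIVE_DIR"   # prints /home/hadoop/hadoop/lib/native
```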

b. Edit the Hadoop configuration files (except hadoop-env.sh) and add the given contents between the <configuration> and </configuration> tags at the end of each file

$ cd $HADOOP_HOME/etc/hadoop

I. Edit core-site.xml
<property>
<name>fs.default.name</name>
<value>hdfs://<hadoop_master_IP>:9000</value> <!-- 9000 is the HDFS port -->
</property>

II. Edit hdfs-site.xml

<property>
<name>dfs.replication</name>
<value>2</value> <!-- replication factor; set to the number of slave nodes -->
</property>

<property>
<name>dfs.name.dir</name>
<value>file:///home/hadoop/hadoopdata/hdfs/namenode</value>
</property>

<property>
<name>dfs.data.dir</name>
<value>file:///home/hadoop/hadoopdata/hdfs/datanode</value>
</property>

III. Edit mapred-site.xml

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value> <!-- MapReduce runs on YARN in Hadoop 2.x -->
</property>

IV. Edit yarn-site.xml

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>

<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
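The four edits above can also be scripted with heredocs, which is handy when provisioning several nodes. The sketch below writes core-site.xml into a temporary directory; on a real node the target is $HADOOP_HOME/etc/hadoop, and `<hadoop_master_IP>` stays a placeholder exactly as in the text above:

```shell
# Write core-site.xml non-interactively. Temporary directory stands in
# for $HADOOP_HOME/etc/hadoop; the quoted heredoc prevents any expansion.
dir=$(mktemp -d)
cat > "$dir/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://<hadoop_master_IP>:9000</value>
  </property>
</configuration>
EOF
grep -c '<property>' "$dir/core-site.xml"   # prints 1
```

The other three files (hdfs-site.xml, mapred-site.xml, yarn-site.xml) follow the same pattern with their respective properties.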

Now make the following changes.

Edit hadoop-env.sh and change the JAVA_HOME variable:
export JAVA_HOME=/usr/lib/jvm/java-8-oracle

Edit yarn-env.sh and change the JAVA_HOME variable:
export JAVA_HOME=/usr/lib/jvm/java-8-oracle

Edit the file $HADOOP_HOME/etc/hadoop/slaves on the master server and add the IPs of the master and all slaves
Edit the file $HADOOP_HOME/etc/hadoop/masters on each slave server and add the IP address of the master
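Both files are plain lists with one host or IP per line. The sketch below writes them into a temporary directory; the 192.168.0.x addresses are placeholders for your real master and slave IPs:

```shell
# Write the slaves and masters lists; one address per line.
# Temporary directory stands in for $HADOOP_HOME/etc/hadoop.
dir=$(mktemp -d)
printf '%s\n' 192.168.0.10 192.168.0.11 192.168.0.12 > "$dir/slaves"   # master + 2 slaves
printf '%s\n' 192.168.0.10 > "$dir/masters"                            # master only
grep -c . "$dir/slaves"   # prints 3
```

Listing the master's IP in slaves (as the text above does) makes the master run a DataNode as well.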

Format the namenode (Only on Master)
——————-
$ hdfs namenode -format

Start the hadoop Cluster (Only on Master)
————————
$ start-dfs.sh
$ start-yarn.sh
