Ansible Playbook - Setup Hadoop CDH5 Using `tarball`.

This playbook sets up a Hadoop cluster with Ansible, installing Hadoop from the CDH5 tarball.

This is a simple Hadoop playbook for quickly getting Hadoop running on a cluster.

Here is the Script Location on Github:

Below are the steps to get started.

Get the script from Github.

Below is the command to clone.

[ahmed@ahmed-server ~]$ git clone

Before we start.

Download hadoop-2.3.0-cdh5.1.2.tar.gz to the file_archives directory.

Download jdk-7u75-linux-x64.tar.gz to the file_archives directory.

Details about each playbook role.


This role updates OS parameters and modifies the following files.

  1. sysctl.conf: updates swappiness, networking and more. The settings are in defaults/main.yml.
  2. limits.conf: updates soft and hard limits.
  3. 90-nproc.conf: updates user-based limits and adds a limits file for hadoop_user.
  4. /etc/hosts: updates the hosts file on the server, using host_name from the hosts file.

/etc/hosts file will get the server information from the [allnodes] group in the hosts file.

NOTE: The commons role also sets the HOSTNAME of each server according to these entries.
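As a rough illustration of how host_name entries could turn into /etc/hosts lines, here is a minimal Python sketch. It is not the role's actual template. The function name render_etc_hosts, the leading localhost line, and the 192.0.2.x addresses (a documentation-only range) are all assumptions made for the example.

```python
def render_etc_hosts(nodes):
    """Return /etc/hosts text for (address, host_name) pairs.

    `nodes` is a hypothetical stand-in for the [allnodes] inventory group.
    """
    # Assumption: keep a localhost entry at the top of the file.
    lines = ["127.0.0.1 localhost"]
    for address, host_name in nodes:
        lines.append(f"{address} {host_name}")
    return "\n".join(lines) + "\n"


# Hypothetical addresses; the host names come from the inventory example.
example_nodes = [
    ("192.0.2.10", "ahmd-namenode"),
    ("192.0.2.11", "ahmd-datanode-01"),
]

if __name__ == "__main__":
    print(render_etc_hosts(example_nodes), end="")
```

Each pair yields one "address host_name" line, so every node in the group can resolve every other node by name.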


This role installs JDK 1.7. The installation path comes from the java_home variable in group_vars/all.


This role creates SSH known_hosts entries for all the hosts in the hosts file.


This role makes hadoop_user a passwordless user on the Hadoop nodes.


This role installs and configures Hadoop and updates the following files.

  1. core-site.xml: adds the Namenode.
  2. hdfs-site.xml: updates HDFS parameters from defaults/main.yml.
  3. mapred-site.xml: updates the MapReduce information.
  4. yarn-site.xml: updates the YARN settings.
  5. slaves: updates the slaves information from the hosts file.
  6. JAVA_HOME: updated from group_vars.
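To make the "adds the Namenode" step more concrete, the sketch below builds a Hadoop-style site file in Python. It only illustrates the file layout, not the role's real template. The helper name build_site_xml is hypothetical, and the choice of the fs.defaultFS property with port 8020 is an assumption about what the role writes.

```python
import xml.etree.ElementTree as ET


def build_site_xml(properties):
    """Build a Hadoop *-site.xml document from a dict of name/value pairs."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    # encoding="unicode" makes tostring() return a str instead of bytes.
    return ET.tostring(root, encoding="unicode")


# Assumption: clients reach the namenode through fs.defaultFS on port 8020.
core_site = build_site_xml({"fs.defaultFS": "hdfs://ahmd-namenode:8020"})

if __name__ == "__main__":
    print(core_site)
```

The same helper would cover hdfs-site.xml, mapred-site.xml and yarn-site.xml, since all four files share the configuration/property/name/value layout.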


This role creates Hadoop users after installation. If more users are needed, add them in the post_install_setups role.

Currently, it creates a user called stormadmin. More details are in roles/post_install_setups/tasks/create_hadoop_user.yml

#  Create a Storm user on the namenode. This will eventually be an edge node.
- hosts: namenodes
  remote_user: root
  roles:
    - post_install_setups

Step 1. Update the variables below as required.

Global variables are defined in group_vars/all.

#  --------------------------------------
#  USERs
#  --------------------------------------

hadoop_user: hdadmin
hadoop_group: hdadmin
hadoop_password: <encrypted_password_here_howto_below>

#  Common location information.
install_base_path: /usr/local
soft_link_base_path: /opt

Step 2. User information comes from group_vars.

The username can be changed through the hadoop_user global variable. Currently the password is hdadmin@123

A password hash can be generated with the Python snippet below.

#  Password generated using the python command below.
#  print() works on Python 2 and 3. On passlib older than 1.7, use encrypt() in place of hash().
python -c "from passlib.hash import sha512_crypt; \
            import getpass; print(sha512_crypt.hash(getpass.getpass()))"

Here is the execution. After entering the password you will get the encrypted password which can be used in the user creation.

[ahmed@ahmed-server ~]$ python -c "from passlib.hash \
            import sha512_crypt; import getpass; print(sha512_crypt.hash(getpass.getpass()))"
Enter Password: *******
[ahmed@ahmed-server ~]$
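Before pasting the result into hadoop_password, it can help to confirm that the value is a hash and not the plain-text password. The sketch below checks only the general shape of a SHA-512 crypt string and does not verify any password. The helper name looks_like_sha512_crypt is hypothetical and is not part of the playbook or of passlib.

```python
import re

# SHA-512 crypt layout: $6$[rounds=N$]salt$digest, with a salt of up to
# 16 characters and an 86-character digest drawn from [./0-9A-Za-z].
SHA512_CRYPT_RE = re.compile(
    r"\$6\$(rounds=\d+\$)?[./0-9A-Za-z]{0,16}\$[./0-9A-Za-z]{86}"
)


def looks_like_sha512_crypt(value):
    """Return True if `value` has the shape of a SHA-512 crypt hash."""
    return SHA512_CRYPT_RE.fullmatch(value) is not None
```

For example, looks_like_sha512_crypt("hdadmin@123") returns False, which flags a plain-text password left in group_vars/all by mistake.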

Step 3. Update the hosts file.

IMPORTANT: Update the contents of the hosts file. The host_name value of each entry in the hosts file is used to generate the /etc/hosts file.

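#  All pre-prod nodes. Each entry has the form "<node address> host_name=<name>";
#  the node addresses are not shown here.
[allnodes]
<node address> host_name=ahmd-namenode
<node address> host_name=ahmd-datanode-01
<node address> host_name=ahmd-datanode-02
<node address> host_name=ahmd-resourcemanager
<node address> host_name=ahmd-secondary-namenode
<node address> host_name=ahmd-datanode-03
<node address> host_name=ahmd-datanode-04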

#  hadoop cluster







#  sshknown hosts list.


Step 4. Execute the playbook.

Run the command below.

ansible-playbook ansible_hadoop.yml -i hosts --ask-pass
