How to set up a Cloudera cluster with VirtualBox

If you ever wanted to explore a big data cluster at home, here are the details you need.

The most important resource is RAM. My host has 32 GB: I allocated 15 GB to the namenode and 3.5 GB to each datanode, 25.5 GB in total.

Setup : 4 VMs (1 namenode, 3 datanodes).

OS: CentOS 7.8

Step 1

Install CentOS 7.8 on one machine, then clone it until you have 4 machines.
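
Cloning and memory sizing can also be scripted on the host with VBoxManage instead of the VirtualBox UI. This is a minimal sketch, assuming the freshly installed VM is registered under the name "namenode" (my assumption, not something stated in this post) and that all VMs are powered off. The memory values are the 15 GB and 3.5 GB from above, expressed in MB.

```shell
# Run on the VirtualBox host while the source VM is powered off.
# Assumption: the installed CentOS VM is registered as "namenode".
BASE_VM="namenode"

clone_datanodes() {
    for dn in dn01 dn02 dn03; do
        # Clone the base VM and register the copy with VirtualBox
        VBoxManage clonevm "$BASE_VM" --name "$dn" --register
        # 3.5 GB per datanode = 3584 MB
        VBoxManage modifyvm "$dn" --memory 3584
    done
    # 15 GB for the namenode = 15360 MB
    VBoxManage modifyvm "$BASE_VM" --memory 15360
}
```

Calling clone_datanodes once produces the three datanode VMs and sets the memory of all four machines.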

Step 2

Network : configure the hostname, IP address and /etc/hosts, disable the firewall and SELinux, and set up passwordless SSH.

Hostnames : namenode, dn01, dn02, dn03

[root@namenode ~]# nmtui
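
nmtui is interactive. On CentOS 7 the hostname can also be set non-interactively with hostnamectl. A small sketch follows; set_cluster_hostname is a hypothetical helper name, and it only accepts the four names used in this post.

```shell
# Run as root on each VM, passing that VM's own name.
set_cluster_hostname() {
    name="$1"
    case "$name" in
        namenode|dn01|dn02|dn03)
            # Writes /etc/hostname and updates the running hostname
            hostnamectl set-hostname "$name"
            ;;
        *)
            echo "unexpected hostname: $name" >&2
            return 1
            ;;
    esac
}
```

For example, on the second datanode: set_cluster_hostname dn02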

IP address :

First of all, configure a NAT Network in VirtualBox and attach each VM's network adapter to it.
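
The same NAT Network setup can be done from the host command line. This is a sketch under assumptions: "ClouderaNet" is a made-up network name, and 10.0.2.0/24 is assumed as the network range because it matches the 10.0.2.x addresses and the 10.0.2.1 gateway used in this post.

```shell
# Run on the VirtualBox host with the VMs powered off.
# Assumptions: network name "ClouderaNet" and range 10.0.2.0/24.
NAT_NET="ClouderaNet"

create_nat_network() {
    VBoxManage natnetwork add --netname "$NAT_NET" --network "10.0.2.0/24" --enable
}

attach_vms() {
    for vm in namenode dn01 dn02 dn03; do
        # Switch the first adapter of each VM to the NAT Network
        VBoxManage modifyvm "$vm" --nic1 natnetwork --nat-network1 "$NAT_NET"
    done
}
```

Run create_nat_network once, then attach_vms.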

[root@namenode ~]# vi /etc/sysconfig/network-scripts/ifcfg-enp0s3
NAME=enp0s3
DEVICE=enp0s3
ONBOOT=yes
IPADDR=10.0.2.100
DNS1=8.8.8.8
DNS2=8.8.4.4
GATEWAY=10.0.2.1
NETMASK=255.0.0.0

I have allocated a static IP to each VM: 10.0.2.100 for namenode, 10.0.2.101 for dn01, 10.0.2.102 for dn02 and 10.0.2.103 for dn03.
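
Because the clones all start with the same interface file, only the IPADDR line has to change on each one. A minimal sketch, where set_static_ip is a hypothetical helper and the default path is the interface file shown above:

```shell
# Run as root on each clone. The optional second argument lets you
# point the function at a copy of the file to try it out safely.
set_static_ip() {
    last_octet="$1"
    ifcfg="${2:-/etc/sysconfig/network-scripts/ifcfg-enp0s3}"
    # Replace only the IPADDR line; gateway, DNS and netmask stay untouched
    sed -i "s/^IPADDR=.*/IPADDR=10.0.2.${last_octet}/" "$ifcfg"
}
```

On dn01 this would be set_static_ip 101, followed by systemctl restart network to apply the new address.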

Hosts file :

[root@namenode ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.0.2.100 namenode
10.0.2.101 dn01
10.0.2.102 dn02
10.0.2.103 dn03
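
The four cluster entries have to be present on every VM. A sketch of a helper that appends them and can be re-run without creating duplicates (add_cluster_hosts is a hypothetical name, and the file path can be overridden for a dry run on a copy):

```shell
add_cluster_hosts() {
    hosts_file="${1:-/etc/hosts}"
    while read -r ip name; do
        # Skip names that are already present so the function can be re-run
        if ! grep -qw "$name" "$hosts_file"; then
            printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
        fi
    done <<EOF
10.0.2.100 namenode
10.0.2.101 dn01
10.0.2.102 dn02
10.0.2.103 dn03
EOF
}
```

Run add_cluster_hosts as root on each VM.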

Firewall and SELinux:

[root@namenode ~]# systemctl disable firewalld
[root@namenode ~]# systemctl stop firewalld

[root@namenode ~]# vi /etc/selinux/config

Set SELINUX=disabled, then reboot for the change to take effect.
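
The edit can be made without opening an editor, which is handy when repeating it on four machines. A small sketch, with disable_selinux as a hypothetical helper name:

```shell
disable_selinux() {
    config="${1:-/etc/selinux/config}"
    # Rewrites only the SELINUX= line; SELINUXTYPE= does not match the pattern
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$config"
}
```

The file is only read at boot. Until the next reboot, setenforce 0 switches the running system to permissive mode.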

Passwordless SSH :

[root@namenode ~]# ssh-keygen -t rsa
[root@namenode ~]# cat .ssh/id_rsa.pub | ssh root@dn01 'cat >> .ssh/authorized_keys'
[root@namenode ~]# cat .ssh/id_rsa.pub | ssh root@dn02 'cat >> .ssh/authorized_keys'
[root@namenode ~]# cat .ssh/id_rsa.pub | ssh root@dn03 'cat >> .ssh/authorized_keys'
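
The three copy commands can be folded into a loop. This sketch also creates ~/.ssh on the target if it is missing and tightens the permissions. push_key is a hypothetical helper name, and the default key path assumes ssh-keygen was run with its default file location.

```shell
# Run on the namenode as root; you will be asked for each datanode's
# root password once.
push_key() {
    pubkey="${1:-$HOME/.ssh/id_rsa.pub}"
    for host in dn01 dn02 dn03; do
        # Create ~/.ssh if needed and restrict permissions on both the
        # directory and authorized_keys
        ssh "root@$host" 'mkdir -p ~/.ssh && chmod 700 ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys' < "$pubkey"
    done
}
```

Where it is installed, ssh-copy-id root@dn01 does the same job for a single host.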

Step 3

Installing Cloudera

Log in to the namenode and download the installer with wget.

wget https://archive.cloudera.com/cm7/7.1.1/cloudera-manager-installer.bin
chmod u+x cloudera-manager-installer.bin
sudo ./cloudera-manager-installer.bin

In this procedure, Cloudera Manager automates the installation of the Oracle JDK, Cloudera Manager Server, embedded PostgreSQL database, Cloudera Manager Agent, Runtime, and managed service software on cluster hosts. Cloudera Manager also configures databases for the Cloudera Manager Server and Hive Metastore and optionally for Cloudera Management Service roles.

NOTE: add port forwarding rules to the VirtualBox NAT Network. Ports 7180 (HTTP) and 7183 (HTTPS) are used by Cloudera Manager.
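
The port forwarding rules can be added from the host command line as well. A sketch under assumptions: the NAT Network is called "ClouderaNet" (a hypothetical name, use your own), the namenode keeps the address 10.0.2.100 from Step 2, and the same port numbers are used on the host and the guest.

```shell
# Run on the VirtualBox host.
# Assumption: NAT Network name "ClouderaNet"; override NAT_NET to change it.
NAT_NET="${NAT_NET:-ClouderaNet}"

forward_cm_ports() {
    for port in 7180 7183; do
        # Rule format: name:protocol:[host ip]:host port:[guest ip]:guest port
        VBoxManage natnetwork modify --netname "$NAT_NET" \
            --port-forward-4 "cm-$port:tcp:[]:$port:[10.0.2.100]:$port"
    done
}
```

After forward_cm_ports has run, Cloudera Manager should be reachable from the host browser at http://localhost:7180.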

Step 4

Add hosts/roles and set up Embedded PostgreSQL Database

Obtain the root password from /var/lib/cloudera-scm-server-db/data/generated_password.txt, then connect and list the existing databases:
psql -U cloudera-scm -p 7432 -h localhost -d postgres
postgres=# \l

If the hive, oozie and hue users and databases don't exist yet, create them as described here:

https://docs.cloudera.com/cloudera-manager/7.1.1/installation/topics/cdpdc-configuring-starting-postgresql-server.html

CREATE ROLE <user> LOGIN PASSWORD '<password>';
CREATE DATABASE <database> OWNER <user> ENCODING 'UTF8';
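
To avoid typing the two statements three times, they can be generated per service. This is a sketch: service_db_sql is a hypothetical helper, and it assumes the user and the database share the service name, which is a convenient convention rather than a requirement.

```shell
# Prints the two statements for one service.
# Assumption: role name and database name are both the service name.
# The password must not contain a single quote.
service_db_sql() {
    service="$1"
    password="$2"
    printf "CREATE ROLE %s LOGIN PASSWORD '%s';\n" "$service" "$password"
    printf "CREATE DATABASE %s OWNER %s ENCODING 'UTF8';\n" "$service" "$service"
}
```

The output can be piped straight into the psql command used above, for example: service_db_sql hive 'your-password' | psql -U cloudera-scm -p 7432 -h localhost -d postgres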

Step 5

Install ntp on all VMs for time synchronization

[root@namenode ~]# yum install ntp
[root@namenode ~]# ntpdate 1.ro.pool.ntp.org
[root@namenode ~]# systemctl start ntpd
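
With passwordless SSH from Step 2 in place, the datanodes can be handled from the namenode in one loop. A sketch, where ntp_on_datanodes is a hypothetical helper name. The systemctl enable call is my addition so that ntpd also starts after a reboot.

```shell
# Run on the namenode as root, after installing ntp locally.
ntp_on_datanodes() {
    for host in dn01 dn02 dn03; do
        # -y answers yum's confirmation prompt; enable starts ntpd at boot
        ssh "root@$host" 'yum install -y ntp && systemctl enable ntpd && systemctl start ntpd'
    done
}
```

Once ntpd has been running for a few minutes, ntpq -p on any VM lists the time servers it is using.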

Final picture

About the author: cosmin chauciuc
