Monthly Archives: June 2015

Building a cluster day 2

Setup of headnode and compute nodes

Here is a quick overview of the tasks that I did to get the headnode up and running.

  1. Install Ubuntu server*
  2. Enable networking*
  3. Set up a DHCP server*
  4. Set up NAT (Network Address Translation)
  5. Set up NFS
  6. Set up Csync2

* Covered in this post.

I will assume that the headnode is called node000 and the compute nodes node001 to node00n. Usually I will only show an example for one of the compute nodes, assuming that all other compute nodes are configured in the same way. If something isn’t clear, or if you know of better or easier ways to achieve the same thing, I am happy to hear about it.

1. Fresh install of Ubuntu 14.04.2 on the headnode

Either prepare a USB stick or CD with the Ubuntu Server image and boot from it (this might involve changing the boot order in your BIOS). Ubuntu will then guide you through the setup. In my case I didn’t want automatic partitioning of my two virtual RAID drives, so I selected manual partitioning when prompted. I basically ended up following this guide pretty closely.

I have a small 50 GB virtual drive and a second virtual drive of about 10 TB. The 50 GB drive should contain the / (root) and /boot partitions, whereas the 10 TB drive should be a storage partition; I also ended up reserving some space on it for swap.

I chose a 1 GB /boot partition, a 40 GB / (root) partition, an 8.2 GB /var partition and a 10 TB /storage partition, all using the ext4 file system.

After writing out the partitions I followed the rest of the instructions and managed to boot successfully into Ubuntu 14.04, though without any networking, since I am behind a firewall.

2. Getting the networking to work

In theory, getting the networking to work should have been quite easy. I want my headnode to have a static IP address and to be able to communicate with the outside world. I have two network interfaces, one for inter-node communication (eth0) and one for the outside world (eth1), or so I thought.

I went ahead and edited the network configuration:

sudo vi /etc/network/interfaces

with the following content (replace the xxx’s with your own values):

# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).
# The loopback network interface
auto lo
iface lo inet loopback
#The local network interface
auto eth0
iface eth0 inet static
address 10.0.0.109
netmask 255.255.255.0
#The external network
auto eth1
iface eth1 inet static
address xxx.xxx.xxx.xx
netmask 255.255.255.0
broadcast xxx.xxx.xxx.xxx
network xxx.xxx.xxx.x
gateway xxx.xxx.xxx.xxx

/etc/hosts should look something like this:

127.0.0.1	localhost

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
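The file above only covers localhost. One way for the nodes to find each other by name without a DNS server is to add an /etc/hosts entry per machine. Below is a minimal sketch of a hypothetical helper, cluster_hosts, that prints those entries. It assumes compute node k (node00k) gets 10.0.0.(100+k), following the fixed address 10.0.0.101 for node001 in the DHCP configuration, and that the headnode is 10.0.0.109; under that scheme at most 8 compute nodes fit.

```shell
#!/bin/sh
# Hypothetical helper: print /etc/hosts lines for the headnode and
# $1 compute nodes. Assumes node00k lives at 10.0.0.(100+k) and the
# headnode at 10.0.0.109, so at most 8 compute nodes fit this scheme.
cluster_hosts() {
    local count="$1" k=1
    if [ "$count" -gt 8 ]; then
        echo "error: node009 would collide with the headnode (10.0.0.109)" >&2
        return 1
    fi
    printf '10.0.0.109\tnode000\n'
    while [ "$k" -le "$count" ]; do
        printf '10.0.0.%d\tnode%03d\n' $((100 + k)) "$k"
        k=$((k + 1))
    done
}

# Example (on every node): cluster_hosts 4 | sudo tee -a /etc/hosts
```

Appending with tee -a keeps the existing localhost lines; run it only once per node, or duplicate entries will pile up.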

Next, the network’s nameserver needs to be set in the file /etc/resolvconf/resolv.conf.d/base. It will look something like this:

cat /etc/resolvconf/resolv.conf.d/base 
nameserver xxx.xxx.xxx.xxx #your nameserver ip here

After a reboot of the headnode, the network should in theory now be working. In practice my system had done something not so uncommon: it had renamed the network devices eth0 and eth1. Diagnosing this problem was quite cumbersome.

Running sudo ethtool eth1 gives something like this:

Settings for eth1:
Cannot get device settings: No such device
Cannot get wake-on-lan settings: No such device
Cannot get message level: No such device
No data available

Something fishy is going on. Let’s look at the kernel log to see if we can find out what is happening. Try something like:

dmesg | grep eth1

dmesg prints the kernel’s message buffer, which includes the messages from the boot sequence and therefore also information about the network interfaces. In my case the output contained a line like this:

[...]renamed network interface eth1 to p13p2[...]

Problem found. The fix is to rename the p13p1 interface back to eth0 and p13p2 back to eth1.
The following post was very useful, including its suggested fix for a bug in Ubuntu 14.04 that otherwise prevents the renaming rule from taking effect:
http://www.hellovinoth.com/ubuntu-14-04-renaming-ethernet-interfaces-from-p1p1″-to-eth0″/
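Two small helpers can speed up this kind of diagnosis; they are a sketch, not part of the procedure from the linked post. The hypothetical list_nics reads each interface’s MAC address from /sys/class/net (the standard sysfs location on Linux), which also lists interfaces that are down, and the hypothetical show_renames pulls rename messages of the form shown above out of dmesg output. The exact message wording may differ between kernel and udev versions.

```shell
#!/bin/sh
# List every network interface with its MAC address, read from sysfs.
# The optional directory argument exists only for trying it on a copy.
list_nics() {
    local dir="${1:-/sys/class/net}" path
    for path in "$dir"/*; do
        # Skip an unmatched glob and entries without an address file.
        [ -r "$path/address" ] || continue
        printf '%s %s\n' "${path##*/}" "$(cat "$path/address")"
    done
}

# Turn lines like "renamed network interface eth1 to p13p2" (read on
# stdin) into "eth1 -> p13p2". The message wording is an assumption.
show_renames() {
    sed -n 's/.*renamed network interface \([^ ]*\) to \([^ ]*\).*/\1 -> \2/p'
}

# Examples:
#   list_nics
#   dmesg | show_renames
```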

Here’s how to rename your network card in Ubuntu 14.04:

1. Get the MAC addresses of your Ethernet cards: ifconfig -a | grep HWaddr

and make a note of them, as you will need them later. Then move to the directory /etc/udev/rules.d.
2. Edit the file: sudo vi 70-persistent-net.rules (don’t worry if it doesn’t exist)
and enter the following:

SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="0x:27:96:c7:7d:xx", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="0x:27:96:c7:7d:xy", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth1"

The address values (0x:27:96:c7:7d:xx and 0x:27:96:c7:7d:xy) must be the MAC addresses of the two Ethernet cards you noted down before. Don’t copy the nonsensical ones I have given here.

3. Edit the /etc/default/grub grub file, so that the rule file will actually be used at boot and overwrite any other possible naming of the network devices:

GRUB_CMDLINE_LINUX_DEFAULT="net.ifnames=1 biosdevname=0"
# followed by
sudo update-grub

4. reboot (e.g. sudo reboot).
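If you have more than two cards, or want to avoid typos in the MAC addresses, the rules from step 2 can also be generated. A minimal sketch, assuming a hypothetical helper make_net_rules that reads "MAC NAME" pairs on stdin and writes rules with udev’s ATTR{...} brace syntax. MACs are lower-cased because the sysfs address attribute that udev compares against is printed in lower case.

```shell
#!/bin/sh
# Hypothetical helper: read "MAC NAME" pairs on stdin and print one
# persistent-net udev rule per pair, following the format of step 2.
make_net_rules() {
    local mac name
    while read -r mac name; do
        [ -n "$mac" ] || continue    # ignore blank lines
        # sysfs reports MACs in lower case, and udev compares exactly.
        mac=$(printf '%s' "$mac" | tr 'A-F' 'a-f')
        printf 'SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="%s", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="%s"\n' \
            "$mac" "$name"
    done
}

# Example ($MAC_LOCAL and $MAC_EXTERNAL are placeholders):
#   printf '%s eth0\n%s eth1\n' "$MAC_LOCAL" "$MAC_EXTERNAL" |
#       make_net_rules | sudo tee /etc/udev/rules.d/70-persistent-net.rules
```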

Log in and type ifconfig to confirm that your network adapters are now eth0 and eth1. Now we can, for example, install the OpenSSH server by typing:

sudo apt-get update
sudo apt-get install openssh-server
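To double-check that the GRUB change from step 3 took effect after the reboot, you can look at /proc/cmdline, which holds the options the running kernel was booted with. A minimal sketch with a hypothetical helper, has_boot_option:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the running kernel was booted with
# exactly the given option. The optional file argument is only for
# testing against a sample; by default /proc/cmdline is read.
has_boot_option() {
    local opt="$1" file="${2:-/proc/cmdline}"
    # Pad with spaces so "biosdevname=0" cannot match part of a longer option.
    case " $(cat "$file") " in
        *" $opt "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: has_boot_option biosdevname=0 && echo "biosdevname is disabled"
```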

3. Setting up the DHCP server

DHCP, the Dynamic Host Configuration Protocol, is a way to assign IP addresses dynamically in a network. In my case I want to run a DHCP server on the headnode that hands the compute nodes fixed IP addresses, so that they form their own local cluster network and can communicate with each other. I did the following to get the DHCP server up and running on node000:

sudo apt-get install isc-dhcp-server

There are two main files that need to be edited, /etc/default/isc-dhcp-server and /etc/dhcp/dhcpd.conf. Let’s start with the first one:

sudo vi /etc/default/isc-dhcp-server

It should contain something like the following:

# Defaults for dhcp initscript
# sourced by /etc/init.d/dhcp
# installed at /etc/default/isc-dhcp-server by the maintainer scripts

#
# This is a POSIX shell fragment
#

# On what interfaces should the DHCP server (dhcpd) serve DHCP requests?
#       Separate multiple interfaces with spaces, e.g. "eth0 eth1".
INTERFACES="eth0"

In the INTERFACES line above, replace "eth0" with (or add to it) the name of each network interface on which the server should lease addresses. In my case eth0 is the local network and eth1 talks to the outside world, so only eth0 is listed.
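If you prefer to script this edit rather than use vi, a sed substitution does the job. A sketch with a hypothetical function, set_dhcp_interfaces; it keeps a .bak copy of the original file, and the attached-suffix form -i.bak is accepted by both GNU and BSD sed.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the INTERFACES= line of the
# isc-dhcp-server defaults file, keeping a backup in <file>.bak.
set_dhcp_interfaces() {
    local ifaces="$1" file="${2:-/etc/default/isc-dhcp-server}"
    sed -i.bak "s/^INTERFACES=.*/INTERFACES=\"$ifaces\"/" "$file"
}

# Example, from a root shell (sudo -i): set_dhcp_interfaces eth0
```

A shell function cannot be passed to sudo directly, which is why the example uses a root shell.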

Then we can edit the second file:

cat /etc/dhcp/dhcpd.conf
#
# Sample configuration file for ISC dhcpd for Debian
#
# Attention: If /etc/ltsp/dhcpd.conf exists, that will be used as
# configuration file instead of this file.
#
#
# The ddns-updates-style parameter controls whether or not the server will
# attempt to do a DNS update when a lease is confirmed. We default to the
# behavior of the version 2 packages ('none', since DHCP v2 didn't
# have support for DDNS.)
ddns-update-style none;
# option definitions common to all supported networks...
option subnet-mask 255.255.255.0;
option broadcast-address 10.0.0.255;
option routers 10.0.0.109; #internal ip of headnode
option domain-name "node000"; #headnode name
option domain-name-servers 10.0.0.109, xxx.xxx.xxx.xxx; #information from the outside domain name server
default-lease-time 3600;
max-lease-time 3600;
subnet 10.0.0.0 netmask 255.255.255.0 {
range 10.0.0.120 10.0.0.130;
}
host node1 {
hardware ethernet 00:f5:45:4f:35:xx; #MAC address of the Ethernet card of compute node001
fixed-address 10.0.0.101;
}
#[...] add any other nodes in the same style, changing the IP address...
# If this DHCP server is the official DHCP server for the local
# network, the authoritative directive should be uncommented.
authoritative;

# Use this to send dhcp log messages to a different log file (you also
# have to hack syslog.conf to complete the redirection).
log-facility local7;
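With several compute nodes, the host blocks can be generated instead of typed by hand. A minimal sketch, assuming the numbering used above (node number k gets 10.0.0.(100+k), so node 1 gets 10.0.0.101) and a hypothetical helper dhcp_host_blocks that reads "k MAC" pairs on stdin. Node numbers must be written without leading zeros, and numbers above 8 are rejected because node 9 would get the headnode’s address 10.0.0.109. The blocks are named node001 and so on rather than node1; either works as a label.

```shell
#!/bin/sh
# Hypothetical helper: read "k MAC" pairs on stdin (k = compute node
# number, no leading zeros) and print one dhcpd.conf host block each.
# Assumes node k gets the fixed address 10.0.0.(100+k).
dhcp_host_blocks() {
    local k mac
    while read -r k mac; do
        [ -n "$k" ] || continue      # ignore blank lines
        if [ "$k" -lt 1 ] || [ "$k" -gt 8 ]; then
            echo "error: node $k is outside 1-8 (10.0.0.109 is the headnode)" >&2
            return 1
        fi
        printf 'host node%03d {\n  hardware ethernet %s;\n  fixed-address 10.0.0.%d;\n}\n' \
            "$k" "$mac" $((100 + k))
    done
}

# Example ($MAC1 and $MAC2 are placeholders):
#   printf '1 %s\n2 %s\n' "$MAC1" "$MAC2" | dhcp_host_blocks
```

After pasting the output into /etc/dhcp/dhcpd.conf, sudo dhcpd -t -cf /etc/dhcp/dhcpd.conf checks the file’s syntax without starting the server, which catches mistakes such as a missing semicolon before you restart the service.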

Now restart the dhcp service by typing:

sudo service isc-dhcp-server stop
sudo service isc-dhcp-server start

That’s it! Your DHCP server should now be running, but it is best to check. Type:

sudo netstat -uap
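What to look for is a UDP socket on the DHCP server port 67 owned by dhcpd; without -n, netstat prints that port by its service name, bootps. A small hypothetical filter, dhcpd_listening, makes the check scriptable:

```shell
#!/bin/sh
# Hypothetical filter: succeed if the netstat output on stdin shows a
# dhcpd process bound to UDP port 67 ("bootps" when not run with -n).
dhcpd_listening() {
    grep -Eq ':(67|bootps)[[:space:]].*/dhcpd'
}

# Example: sudo netstat -uap | dhcpd_listening && echo "dhcpd is listening"
```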

The compute nodes, of course, now need to know about the DHCP server on the headnode, so it is time to start discussing them. They should also have a clean install of Ubuntu Server 14.04.2. Make sure that their network devices have the correct names. In my case I also created the same sudo user as on the headnode, which will make many things easier later. I wasn’t too worried about the partitioning of the compute nodes, since I intend to export the /storage partition of the headnode to the compute nodes via NFS. So I will assume that there is a node001, the first compute node, up and running Ubuntu Server with nothing else configured. For its network to work with DHCP, edit the following file on node001:

cat /etc/network/interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).
# The loopback network interface
auto lo
iface lo inet loopback
#The internal network
auto eth0
iface eth0 inet dhcp


Restart the networking on node001:

sudo ifdown eth0
sudo ifup eth0

Now you should be able to (1) ping the compute node from the headnode:

ping 10.0.0.101 #for compute node001

and (2) see the DHCP request in the syslog file on your headnode:

tail /var/log/syslog
Jun 13 17:18:02 node000 dhcpd: DHCPREQUEST for 10.0.0.101 from 00:f5:45:47:35:xx via eth0
Jun 13 17:18:02 node000 dhcpd: DHCPACK on 10.0.0.101 to 00:f5:45:47:35:xx via eth0
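Once more compute nodes are connected, both checks can be repeated for all of them at once. A minimal sketch with hypothetical helpers, again assuming compute node k (node00k) gets 10.0.0.(100+k): ping_nodes pings each node once (-W 1 is the one-second reply timeout of the Linux iputils ping), and dhcp_acks lists each IP/MAC pair that dhcpd acknowledged, parsed from syslog lines like the DHCPACK line above.

```shell
#!/bin/sh
# Address of compute node k, assuming node00k -> 10.0.0.(100+k).
node_ip() {
    printf '10.0.0.%d\n' $((100 + $1))
}

# Ping compute nodes 1..$1 once each and print one status line per node.
ping_nodes() {
    local count="$1" k=1 ip
    while [ "$k" -le "$count" ]; do
        ip=$(node_ip "$k")
        if ping -c 1 -W 1 "$ip" >/dev/null 2>&1; then
            printf 'node%03d %s up\n' "$k" "$ip"
        else
            printf 'node%03d %s DOWN\n' "$k" "$ip"
        fi
        k=$((k + 1))
    done
}

# Print each "IP MAC" pair from dhcpd DHCPACK lines on stdin, once.
dhcp_acks() {
    awk '/dhcpd: DHCPACK on / {
        for (i = 1; i <= NF; i++)
            if ($i == "on") { print $(i + 1), $(i + 3); break }
    }' | sort -u
}

# Examples:
#   ping_nodes 4
#   sudo grep DHCPACK /var/log/syslog | dhcp_acks
```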