
It has been a while since I last looked at CPU benchmarks and performance, in particular now that I am getting a new server that can be used as a small cluster head node. Server prices range from £300 to the astronomical. From back in the days when I built my desktop computers from scratch I remember the amazing tools that let you compare CPUs in terms of specs, performance and price, but I literally haven't looked at them in years.

So before manually picking out CPU names and putting them into some CPU comparison site, I thought maybe I could just put together a quick script that would give me a scatter plot of CPU mark versus current price. Easy enough: twenty minutes later I had figured out how to use BeautifulSoup to parse some HTML, to be precise the HTML of the server CPU comparison on PassMark. The HTML contains the chart info, from which I read the CPU mark and, where given, the current price. Then, with some very rudimentary plotting, I can look for cheap yet high-performing CPUs (OK, I was lazy; I could have made sure that the labels don't overlap, but this is really just a quick and dirty plot):

The price range actually goes up quite a bit higher, but I have a spending limit and therefore decided to only plot up to $1500.
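As a rough illustration of the idea (the HTML snippet and class names below are invented, not PassMark's actual markup), the parsing and ranking step might look like this, here with a plain stdlib regex instead of BeautifulSoup:

```python
import re

# Invented chart markup -- PassMark's real HTML differs; this only
# illustrates the parse-then-rank idea described above.
sample_html = """
<li><span class="prdname">Intel Xeon Silver 4114</span>
<span class="mark">15000</span><span class="price">$700.00</span></li>
<li><span class="prdname">Intel Xeon E5-2670</span>
<span class="mark">14000</span><span class="price">$80.00</span></li>
"""

pattern = re.compile(
    r'class="prdname">([^<]+)</span>\s*'
    r'<span class="mark">(\d+)</span>'
    r'<span class="price">\$([\d.]+)')

cpus = [(name, int(mark), float(price))
        for name, mark, price in pattern.findall(sample_html)]

# Keep CPUs under the spending limit and rank by CPU mark per dollar.
budget = [(n, m, p) for n, m, p in cpus if p <= 1500]
budget.sort(key=lambda t: t[1] / t[2], reverse=True)
for name, mark, price in budget:
    print(f"{name}: {mark} marks at ${price:.0f}")
```

From here the (name, mark, price) tuples could be fed straight into a scatter plot.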

Then I soon realised the flaw in my great plan. Looking at some of the cheap yet high-performing CPUs, it turns out they are all quite old and have an 'end of life' status. The table does not contain that information, so I would have to go back to some manual comparison. I am not desperate enough to learn that much more about CPUs, nor do I have the time to crawl further through HTML files. However, it seems that the Xeon Silver 4114, which is commonly sold in customisable server setups, isn't such a bad choice as a midrange CPU, and it will probably be the likely choice in a potential server purchase.


Building a cluster day 2

Setup of headnode and compute nodes

Here is a quick overview of the tasks that I did to get the headnode up and running.

  1. Install Ubuntu server*
  2. Enable networking*
  3. Setup DHCP server*
  4. Setup NAT (network address translation)
  5. Setup NFS
  6. Setup Csync2

I will assume that the headnode is called node000 and the compute nodes are node001-node00n; usually I will only show an example for one of the compute nodes, assuming that all other compute nodes are configured in the same way. If for some reason you feel something isn't clear, or there might be better/easier options out there to achieve the same, I am happy to hear about it.
* What I cover in this post.

1. Fresh install of Ubuntu 14.04.2 on the headnode

Either prepare a USB stick or CD with the Ubuntu server image and boot from it (this might involve changing the boot order in your BIOS). Ubuntu will then guide you through the setup. In my case I didn't want automatic partitioning of my two virtual RAID drives, so I selected manual partitioning when prompted. I basically ended up following this guide pretty closely.

I have a small 50 GB virtual drive and a second virtual drive of about 10 TB. The 50 GB virtual drive should contain the / (root) and /boot partitions, whereas the 10 TB drive should be a storage partition, which somehow also ended up reserving some space for swap.

I chose a boot partition of 1 GB, a 40 GB / (root) partition, an 8.2 GB /var partition and a 10 TB /storage partition, all using the ext4 file system.

After writing out the partitions I just followed the rest of the instructions and managed to boot successfully into Ubuntu 14.04, without any networking, since I am actually behind a firewall.

2. Getting the networking to work

In theory getting the networking to work should have been quite easy. I want my headnode to have a static IP address and to be able to communicate with the outside world. I have two network interfaces, one for the outside and one for inter-node communication (eth0 and eth1, or at least so I thought).

I went ahead and configured

sudo vi /etc/network/interfaces

Using the following information (the 192.168.1.x addresses are examples for the internal network; substitute your own, minus the xxx's):

# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

# The local network interface
auto eth0
iface eth0 inet static
    address 192.168.1.1
    netmask 255.255.255.0

# The external network
auto eth1
iface eth1 inet static
    address xxx.xxx.xxx.xxx  # static IP assigned by your network admin
    netmask 255.255.255.0
    gateway xxx.xxx.xxx.xxx  # your network's gateway

/etc/hosts should look something like this:

127.0.0.1       localhost

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

Then the nameserver of the network needs to be set in the file /etc/resolvconf/resolv.conf.d/base. It will look something like this:

cat /etc/resolvconf/resolv.conf.d/base 
nameserver #your nameserver ip here

After a reboot of the headnode the network should, in theory, now be working. In practice my system had done something not so uncommon: it had renamed the network devices eth0 and eth1. Diagnosing this problem was quite cumbersome.

Look at the output of sudo ethtool eth1 and you get something like this:

Settings for eth1:
Cannot get device settings: No such device
Cannot get wake-on-lan settings: No such device
Cannot get message level: No such device
No data available

Something fishy is going on. Let’s look at some of the log files to see if we can clarify what is going on. Try something like:

dmesg | grep eth1

dmesg is a tool that displays messages from the kernel's boot sequence and thus also contains some information about the network interfaces. In my case the output contained something like this:

[...]renamed network interface eth1 to p13p2[...]

Problem found. The fix is to rename the p13p1 interface back to eth0 and p13p2 to eth1.
Following this post was very useful, including the suggested fix for a bug in Ubuntu 14.04 that would otherwise prevent the renaming rule from being applied.
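If several interfaces were renamed, the same dmesg output can be turned into a name mapping in a couple of lines (the sample lines below are made up to mirror the message above):

```python
import re

# Example dmesg lines (invented) of the kind the grep above surfaces.
dmesg_output = """\
[    2.1] udev: renamed network interface eth0 to p13p1
[    2.2] udev: renamed network interface eth1 to p13p2
"""

# Map the kernel's original name to the name udev assigned.
renames = dict(
    re.findall(r"renamed network interface (\S+) to (\S+)", dmesg_output))
print(renames)  # {'eth0': 'p13p1', 'eth1': 'p13p2'}
```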

Here’s how to rename your network card in Ubuntu 14.04:

1. Get your ethernet cards' MAC addresses: ifconfig | grep HWaddr

and make a note of them, as you will need them later, then move to the directory /etc/udev/rules.d.
2. Edit the file: sudo vi 70-persistent-net.rules (don't worry if it doesn't exist yet)
and type exactly:

SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="0x:27:96:c7:7d:xx", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="0x:27:96:c7:7d:xy", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth1"

ATTR{address}=="0x:27:96:c7:7d:xx" should be the MAC address of each of your ethernet cards, which you hopefully noted down before. Don't copy the nonsensical one I have given here.

3. Edit the /etc/default/grub grub file, so that the rule file will actually be used at boot and overwrite any other possible naming of the network devices:

GRUB_CMDLINE_LINUX_DEFAULT="net.ifnames=0 biosdevname=0"
#followed by
sudo update-grub

4. reboot (e.g. sudo reboot).

Log in and type ifconfig to confirm your network adapters are at eth0 and eth1. Now we can, for example, install the OpenSSH server by typing:

sudo apt-get update
sudo apt-get install openssh-server

3. Setup DHCP server

DHCP, the Dynamic Host Configuration Protocol, is a way to dynamically assign IP addresses in a network. In my case I want to run a DHCP server on the headnode, so that the compute nodes are given a set of (static) IP addresses, form their own local cluster network and can communicate with each other. I did the following to get the DHCP server up and running on node000:

sudo apt-get install isc-dhcp-server

There are two main files, /etc/default/isc-dhcp-server and /etc/dhcp/dhcpd.conf, which need to be edited. Let's start with the first one:

sudo vi /etc/default/isc-dhcp-server

You should get the following:

# Defaults for dhcp initscript
# sourced by /etc/init.d/dhcp
# installed at /etc/default/isc-dhcp-server by the maintainer scripts

# This is a POSIX shell fragment

# On what interfaces should the DHCP server (dhcpd) serve DHCP requests?
#       Separate multiple interfaces with spaces, e.g. "eth0 eth1".

In INTERFACES, add the name of the network interface that you want the server to lease addresses on, i.e. INTERFACES="eth0" in my case, where eth0 is the local network and eth1 talks to the outside world.

Then we can edit the second file:

(The 192.168.1.x addresses below are examples; adapt them to your local subnet.)

cat /etc/dhcp/dhcpd.conf
# Sample configuration file for ISC dhcpd for Debian
# Attention: If /etc/ltsp/dhcpd.conf exists, that will be used as
# configuration file instead of this file.

# The ddns-updates-style parameter controls whether or not the server will
# attempt to do a DNS update when a lease is confirmed. We default to the
# behavior of the version 2 packages ('none', since DHCP v2 didn't
# have support for DDNS.)
ddns-update-style none;

# option definitions common to all supported networks...
option subnet-mask 255.255.255.0;
option broadcast-address 192.168.1.255;
option routers 192.168.1.1; #internal IP of headnode
option domain-name "node000"; #headnode name
option domain-name-servers xxx.xxx.xxx.xxx; #nameserver of the outside network
default-lease-time 3600;
max-lease-time 3600;

subnet 192.168.1.0 netmask 255.255.255.0 {
    host node001 {
        hardware ethernet 00:f5:45:4f:35:xx; #MAC address of the ethernet card of compute node001
        fixed-address 192.168.1.101; #IP leased to node001
    }
    #[...] add any other nodes in the same style, changing the IP address...
}

# If this DHCP server is the official DHCP server for the local
# network, the authoritative directive should be uncommented.
#authoritative;

# Use this to send dhcp log messages to a different log file (you also
# have to hack syslog.conf to complete the redirection).
log-facility local7;
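With more than a handful of compute nodes, writing the per-host stanzas by hand gets tedious. A small sketch that generates them (the MAC addresses and the 192.168.1.x subnet are placeholders; substitute your own):

```python
# Generate per-host dhcpd.conf stanzas from a list of MAC addresses.
# The MACs and the 192.168.1.x subnet are placeholders.
macs = ["00:f5:45:4f:35:01", "00:f5:45:4f:35:02"]

stanzas = []
for i, mac in enumerate(macs, start=1):
    stanzas.append(
        "host node%03d {\n"
        "    hardware ethernet %s;\n"
        "    fixed-address 192.168.1.%d;\n"
        "}" % (i, mac, 100 + i))

# Paste the result into the subnet block of /etc/dhcp/dhcpd.conf.
print("\n".join(stanzas))
```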

Now restart the dhcp service by typing:

sudo service isc-dhcp-server stop
sudo service isc-dhcp-server start

That's it! Your DHCP server should be running; however, it is best to check. Type:

sudo netstat -uap

The compute nodes now of course need to know about the DHCP server on the headnode, so maybe it is time to start discussing them. They should also have a clean install of Ubuntu server 14.04.2. Make sure that their network devices have the correct names; in my case I also created the same sudo user as on the headnode (this will make many things easier later). I wasn't too worried about the partitioning of the compute nodes, since I intend to export the /storage partition of the headnode to them via NFS. So I will assume that there is a node001, the first compute node, up and running Ubuntu server but with nothing else configured. For the network to work with DHCP, edit the following file on node001:

cat /etc/network/interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).
# The loopback network interface
auto lo
iface lo inet loopback
#The internal network
auto eth0
iface eth0 inet dhcp

Restart the networking on node001:

sudo ifdown eth0
sudo ifup eth0

Now you should be able to 1. ping the compute node:

ping 192.168.1.101 #for compute node001 (example IP; use the address you assigned)

And 2. your syslog file on your headnode should show a dhcp request:

tail /var/log/syslog
Jun 13 17:18:02 node000 dhcpd: DHCPREQUEST for 192.168.1.101 from 00:f5:45:47:35:xx via eth0
Jun 13 17:18:02 node000 dhcpd: DHCPACK on 192.168.1.101 to 00:f5:45:47:35:xx via eth0
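To keep an eye on which nodes have picked up leases, the DHCPACK lines can be summarised with a short sketch (the sample log lines below use example IPs and a placeholder MAC):

```python
import re

# Sample syslog excerpt of the kind shown above (example addresses).
syslog = """\
Jun 13 17:18:02 node000 dhcpd: DHCPREQUEST for 192.168.1.101 from 00:f5:45:47:35:xx via eth0
Jun 13 17:18:02 node000 dhcpd: DHCPACK on 192.168.1.101 to 00:f5:45:47:35:xx via eth0
"""

# Each DHCPACK records a confirmed lease: an IP handed to a MAC.
leases = re.findall(r"DHCPACK on (\S+) to (\S+)", syslog)
for ip, mac in leases:
    print(f"{ip} leased to {mac}")
```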

Building a small CPU/GPU cluster – day 1

I was somehow given responsibility for a small mixed CPU/GPU cluster and have the questionable honour of installing it. Since I have limited experience with this, I decided to document it online, so that I can come back to my notes in the future should something break. WARNING: I am no expert at all in cluster administration/installation or the like; anything stated here may well be completely silly or wrong. I take no responsibility for that.

Here is a vague picture of what the cluster will look like when finished.


The headnode is set up with a MegaRAID SAS 9260-8i RAID controller and eight 2 TB hard drives. Since on day one all I did was configure the RAID settings, I will give a summary of the RAID setup I did today.

First things first, what RAID level should I use?

Ideally I want RAID level 1 on the two drives that later make up my root partition, and RAID 6 on my /home partition; that is, two hard drives dedicated to root and the rest to /home. However, since I only have 2 TB drives and I really don't need a 2 TB root partition, I decided not to go with this setup, but instead to set everything up using RAID level 6 with two virtual devices (vd0 will eventually be / and vd1 will eventually be /home).

To be honest I had very little idea about RAID levels or anything connected to physical/virtual devices, but the user guide for the MegaRAID controller was very useful in informing certain decisions, so I will link to it here. I have also taken a couple of images from it for illustration purposes.

The nice thing about this RAID controller is the very user-friendly WebBIOS interface. I power the headnode on and enter WebBIOS by pressing Ctrl+H during boot-up. Soon a welcome screen greets me and I click on the configuration wizard link to get to this:


In my case I then chose 'New configuration', since I am starting from scratch, even though I am working on a system with some previous configuration. I also choose a manual setup, so that I can configure my virtual devices exactly how I want them. Now I just walk through the configuration wizard:

  1. When I ended up on a screen like this:
    I selected all the physical drives on the left (using shift) and then clicked on ‘add to array’.
  2. Once all the drives were added and I was happy with the drive group, I clicked on ‘accept DG’ on the right and then next.
  3. Now I am in the virtual drive configuration window of my drive group 0 (and only drive group).
    At the moment there is one virtual drive. I click on it and configure it the way I like:
    – RAID 6
    – Strip size: 128 kB
    – Access policy: Read + Write
    – Read Policy: Normal
    – Write policy: WThru
    – I/O policy: direct
    – Disc cache policy: disabled
    – Disabled BGI: no
    and set the drive size to 50 GB (for root). I accept the settings and then create a second virtual drive with identical settings but now use the left over drive size, which is around 10.8 TB on the chosen RAID6 in my system.
  4. Once I am happy with my virtual devices I accept their configuration.
  5. When I am prompted to save the new configurations I do so.
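As a sanity check, the ~10.8 TB figure is consistent with RAID 6 arithmetic: two of the eight drives' worth of capacity go to parity, and the controller reports binary TiB while the drives are sold in decimal TB:

```python
# RAID 6 usable capacity: two drives' worth of the array goes to parity.
drives, drive_tb = 8, 2.0
usable_tb = (drives - 2) * drive_tb        # 12.0 decimal TB usable

# Controllers report binary units: 12e12 bytes expressed in TiB.
usable_tib = usable_tb * 1e12 / 2**40      # ~10.91 TiB

# Subtract the 50 GB first virtual drive to get the second one's size.
second_vd_tib = usable_tib - 50e9 / 2**40  # ~10.87 TiB
print(round(usable_tib, 2), round(second_vd_tib, 2))
```

which matches the "around 10.8 TB" left over for the second virtual drive.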

Now I have my RAID setup the way I want it. The welcome screen should now show two virtual drives and 8 physical drives and give me a 'pretty' graphic of how everything is set up. Next the background initialisation check will start running, and I just let it. This will take a while (I projected at least 24 h on my setup), and since I have some other things to do anyway, I let the initialisation happily check away until my next free time-slot for some more cluster building.