It has been a while since I last looked at CPU benchmarks and performance, but now I am getting a new server that can be used as a small cluster head node, and server prices range from £300 into the astronomical. From the days when I built my desktop computers from scratch I remember the amazing tools that let you compare CPUs in terms of specs, performance, price and so on, but I literally haven't looked at them in years.

So before manually picking CPU names and putting them into some CPU comparison site, I thought: maybe I can just put a quick script together that gives me a scatter plot of each CPU's mark versus its current price. Easy enough; 20 minutes later I had figured out how to use BeautifulSoup to parse some HTML, specifically the server-CPU comparison page on PassMark. The HTML contains the chart data, from which I read the CPU mark and the current price where given. Then with some very rudimentary plotting I can look for cheap yet high-performing CPUs (OK, I was lazy, I could have made sure the labels don't overlap... but this really is just a quick and dirty plot):
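For the curious, the core of the script looked something like the sketch below. To keep it self-contained it parses a made-up HTML snippet with the standard library's re module; the span class names are stand-ins for whatever PassMark actually uses, and the real script did the same job with BeautifulSoup:

```python
import re

# Hypothetical snippet shaped like a CPU chart; the real class names
# and structure on the PassMark site may differ.
SAMPLE_HTML = """
<ul class="chartlist">
  <li><span class="prdname">Intel Xeon Silver 4114 @ 2.20GHz</span>
      <span class="count">15000</span><span class="price-neww">$704.00</span></li>
  <li><span class="prdname">AMD EPYC 7351P</span>
      <span class="count">19000</span><span class="price-neww">$750.00</span></li>
  <li><span class="prdname">Some EOL CPU</span>
      <span class="count">9000</span><span class="price-neww">NA</span></li>
</ul>
"""

def parse_chart(html, price_cap=1500.0):
    """Return (name, mark, price) tuples, skipping CPUs without a listed price."""
    pattern = re.compile(
        r'<span class="prdname">(.*?)</span>\s*'
        r'<span class="count">(\d+)</span>'
        r'<span class="price-neww">\$?([\d.]+|NA)</span>',
        re.S)
    rows = []
    for name, mark, price in pattern.findall(html):
        if price == "NA":
            continue  # no current price given on the chart
        price = float(price)
        if price <= price_cap:  # my spending limit
            rows.append((name, int(mark), price))
    return rows

cpus = parse_chart(SAMPLE_HTML)
for name, mark, price in cpus:
    print(f"{name}: {mark} marks at ${price:.0f}")
```

The resulting (name, mark, price) tuples can then go straight into matplotlib's scatter for the quick and dirty plot.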

The price range actually goes up quite a bit higher, but I have a spending limit and therefore decided to only plot up to $1500.

Then I soon realised the flaw in my great plan. Looking at some of the cheap yet high-performing CPUs, it turns out they are all quite old and have 'end of life' status. The table does not contain that information, so I would have to go back to some manual comparison. I am not desperate enough to learn that much more about CPUs, nor do I have the time to crawl further through HTML files. However, it seems that the Xeon Silver 4114, which is commonly sold in customisable server setups, isn't such a bad choice as a midrange CPU, and it will probably end up in a potential server purchase.


Baby steps with KNIME

I am trying to document the things I try at work a little better. Rather than just write things into a notebook and never find them again, I thought I'd give making a small video a go. The story behind this little project is that I am trying to build workflows for biomolecular simulations. However, there are a great many (or at least one) workflow tools out there already, so I am exploring how easily such a workflow tool might be adapted for running a biomolecular simulation. Until about a month ago I had never heard of KNIME, and until today I had absolutely no idea what it did. So I took my first baby steps, downloaded it, and tried to run a first workflow. Running a simple k-means clustering on a typical benchmark clustering dataset was really quite easy; everything was very intuitive and it didn't take much longer than half an hour to get the hang of things. I made a little video of how to do this:

The video certainly has a lot of room for improvement, but it is also the first screen-capture video I have ever attempted, so bear with me as I improve these skills. (Yes, I am also aware that apparently I don't know the difference between 250 and 2500, but redoing the video just because of that didn't seem all that useful.)

Next I’ll try and actually write my own KNIME node that will execute a simple python script, since being able to execute workflow elements mostly run in python is really what I am interested in.

Building a cluster day 2

Setup of headnode and compute nodes

Here is a quick overview of the tasks that I did to get the headnode up and running.

  1. Install Ubuntu server*
  2. Enable networking*
  3. Setup DHCP server*
  4. Setup NAT (Network address translation)
  5. Setup NFS
  6. Setup Csync2

I will assume that the headnode is called node000 and the compute nodes node001-node00n; I will usually only show an example for one compute node, assuming all other compute nodes are configured in the same way. If for some reason you feel something isn't clear, or there are better/easier ways to achieve the same, I am happy to hear about it.
* What I cover in this post.

1. Fresh install of Ubuntu 14.04.2 on the headnode

Either prepare a USB or CD with the Ubuntu server image and boot from it (this might involve changing the boot order in your Bios.) Then Ubuntu will guide you through the setup. In my case I didn’t want an automatic partitioning of my two virtual RAID drives, so I selected a manual partitioning when prompted. I basically ended up following this guide pretty closely.

I have a small 50 GB virtual drive and a second virtual drive of about 10 TB. The 50 GB virtual drive should contain the / (root) and /boot partitions, whereas the 10 TB drive should be a storage partition, which somehow also ended up reserving some space for swap.

I chose a 1 GB /boot partition, a 40 GB / (root) partition, an 8.2 GB /var partition and a 10 TB /storage partition, all using the ext4 file system.

After writing out the partitions I just followed the rest of the instructions and managed to boot successfully into ubuntu 14.04, without any networking since I am actually behind a firewall.

2. Getting the networking to work

In theory getting the networking to work should have been quite easy. I want my headnode to have a static IP address and to be able to communicate with the outside world. I have two network interfaces, one for the outside and one for inter-node communication (eth0 and eth1, or at least so I thought).

I went ahead and configured

sudo vi /etc/network/interfaces

Using the following information (with x.x.x.x standing in for the real addresses on the outside network):

# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

# The local network interface
auto eth0
iface eth0 inet static
    address
    netmask

# The external network
auto eth1
iface eth1 inet static
    address x.x.x.x
    netmask x.x.x.x
    gateway x.x.x.x

/etc/hosts should look something like this:	localhost

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

Then the nameserver of the network needs to be set in the file /etc/resolvconf/resolv.conf.d/base. It will look something like this:

cat /etc/resolvconf/resolv.conf.d/base 
nameserver #your nameserver ip here

After a reboot of the headnode the network should in theory now be working. In practice my system had done something not so uncommon: it had renamed the network devices eth0 and eth1. Diagnosing this problem was quite cumbersome.

Looking at the output of sudo ethtool eth1 you get something like this:

Settings for eth1:
Cannot get device settings: No such device
Cannot get wake-on-lan settings: No such device
Cannot get message level: No such device
No data available

Something fishy is going on. Let’s look at some of the log files to see if we can clarify what is going on. Try something like:

dmesg | grep eth1

dmesg is a tool that displays the kernel's boot messages and thus also contains information about the network interfaces. In my case the output contained something like this:

[...]renamed network interface eth1 to p13p2[...]

Problem found. The fix is to rename the p13p1 interface back to eth0 and p13p2 to eth1.
This post (external link) was very useful, including the suggested fix for a bug in Ubuntu 14.04 that would otherwise prevent the renaming rule from being applied.
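Side note: the rename messages can be scraped out of dmesg in one go, which is handy on a machine with many interfaces. A small stand-alone sketch (the sample lines are made up, but follow the wording of the dmesg message above):

```python
import re

# sample dmesg lines in the wording shown above (timestamps are made up)
DMESG = """
[    2.113] systemd-udevd[312]: renamed network interface eth0 to p13p1
[    2.204] systemd-udevd[312]: renamed network interface eth1 to p13p2
"""

# map old name -> new name for every rename the kernel logged
renames = dict(re.findall(r"renamed network interface (\S+) to (\S+)", DMESG))
print(renames)
```

In practice you would feed it the real output of dmesg, e.g. via subprocess, instead of the sample string.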

Here’s how to rename your network card in Ubuntu 14.04:

1. Get your ethernet card's MAC address: ifconfig | grep HWaddr

Make a note of it, as you will need it later, then move to the directory /etc/udev/rules.d.

2. Edit the file: sudo vi 70-persistent-net.rules (don't worry if it doesn't exist yet) and type exactly:

SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="0x:27:96:c7:7d:xx", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="0x:27:96:c7:7d:xy", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth1"

ATTR{address}=="0x:27:96:c7:7d:xx" should contain the MAC addresses of your two ethernet cards, which you hopefully noted down before. Don't copy the nonsensical ones I have given here.

3. Edit the /etc/default/grub file, so that the rule file will actually be used at boot and override any other naming of the network devices:

GRUB_CMDLINE_LINUX_DEFAULT="net.ifnames=1 biosdevname=0"

followed by:

sudo update-grub

4. reboot (e.g. sudo reboot).

Log in and type ifconfig to confirm your network adapters are now named eth0 and eth1. Now we can, for example, install the OpenSSH server by typing:

sudo apt-get update
sudo apt-get install openssh-server

3. Setup DHCP server

DHCP, the Dynamic Host Configuration Protocol, is a way to dynamically assign IP addresses in a network. In my case, I want to run a DHCP server on the headnode, so that the compute nodes are given a set of (static) IP addresses, form their own local cluster network and can communicate with each other. I did the following to get the DHCP server up and running on node000:

sudo apt-get install isc-dhcp-server

There are two main files, /etc/default/isc-dhcp-server and /etc/dhcp/dhcpd.conf, which need to be edited. Let's start with the first one:

sudo vi /etc/default/isc-dhcp-server

You should get the following:

# Defaults for dhcp initscript
# sourced by /etc/init.d/dhcp
# installed at /etc/default/isc-dhcp-server by the maintainer scripts

# This is a POSIX shell fragment

# On what interfaces should the DHCP server (dhcpd) serve DHCP requests?
#       Separate multiple interfaces with spaces, e.g. "eth0 eth1".

In this file, add (or edit) the line INTERFACES="eth0", naming the interface you want the server to lease addresses on. (In my case eth0 is the local network and eth1 talks to the outside world.)

Then we can edit the second file:

cat /etc/dhcp/dhcpd.conf
# Sample configuration file for ISC dhcpd for Debian
# Attention: If /etc/ltsp/dhcpd.conf exists, that will be used as
# configuration file instead of this file.

# The ddns-updates-style parameter controls whether or not the server will
# attempt to do a DNS update when a lease is confirmed. We default to the
# behavior of the version 2 packages ('none', since DHCP v2 didn't
# have support for DDNS.)
ddns-update-style none;

# option definitions common to all supported networks...
option subnet-mask;
option broadcast-address;
option routers; # internal IP of the headnode
option domain-name "node000"; # headnode name
option domain-name-servers x.x.x.x; # nameserver of the outside network
default-lease-time 3600;
max-lease-time 3600;

# If this DHCP server is the official DHCP server for the local
# network, the authoritative directive should be uncommented.
#authoritative;

# Use this to send dhcp log messages to a different log file (you also
# have to hack syslog.conf to complete the redirection).
log-facility local7;

subnet netmask {
  host node001 {
    hardware ethernet 00:f5:45:4f:35:xx; # MAC address of the ethernet card of compute node001
    fixed-address; # static IP for node001
  }
  # [...] add any other nodes in the same style, changing the IP address...
}

Now restart the dhcp service by typing:

sudo service isc-dhcp-server stop
sudo service isc-dhcp-server start

That's it! Your DHCP server should now be running, but it is best to check. Type:

sudo netstat -uap

The compute nodes now of course need to know about the DHCP server on the headnode, so it is time to start discussing them. They should also have a clean install of Ubuntu Server 14.04.2. Make sure their network devices have the correct names; in my case I also created the same sudo user as on the headnode (this will make many things easier later). I wasn't too worried about the partitioning of the compute nodes, since I intend to export the /storage partition of the headnode to them via NFS. So I will assume that there is a node001, the first compute node, up and running Ubuntu Server but with nothing else configured. For the network to work with DHCP, edit the following file on node001:

cat /etc/network/interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).
# The loopback network interface
auto lo
iface lo inet loopback
#The internal network
auto eth0
iface eth0 inet dhcp

Restart the networking on node001:

sudo ifdown eth0
sudo ifup eth0

Now you should be able to 1. ping the compute node:

ping # for compute node001

And 2. your syslog file on your headnode should show a dhcp request:

tail /var/log/syslog
Jun 13 17:18:02 node000 dhcpd: DHCPREQUEST for from 00:f5:45:47:35:xx via eth0
Jun 13 17:18:02 node000 dhcpd: DHCPACK on to 00:f5:45:47:35:xx via eth0

Building a small CPU/GPU cluster – day 1

I somehow was given responsibility for a small mixed CPU/GPU cluster and have the questionable honour of installing it. Since I have limited experience with this, I decided to document it online, so that I can come back to my notes in the future should something break. WARNING: I am no expert in cluster administration or installation; anything stated here may well be completely silly or wrong, and I take no responsibility for that.

Here is a vague picture of what the cluster will look like when finished.


The headnode is set up with a MegaRAID SAS 9260-8i RAID controller and eight 2 TB hard drives. Since on day one all I did was configure the RAID settings, I will give a summary of the RAID setup I did today.

First things first, what RAID level should I use?

Ideally I would want RAID 1 on two drives that later make up my root partition, and RAID 6 on my /home partition, i.e. two hard drives dedicated to root and the rest to /home. However, since I only have 2 TB drives and I really don't need a 2 TB root partition, I decided instead to set everything up as RAID 6 with two virtual drives (vd0 will eventually be / and vd1 will eventually be /home).

To be honest I had very little idea about RAID levels or physical/virtual devices, but the user guide for the MegaRAID controller was very useful for certain decisions, so I will link to it here. I have also taken a couple of images out of it for illustration purposes.
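The capacity trade-off between these RAID levels is at least easy to sanity-check with a back-of-the-envelope calculation (a sketch that ignores formatting overhead):

```python
def usable_tb(n_drives, drive_tb, level):
    """Usable capacity for a few common RAID levels (no overhead accounted for)."""
    if level == 1:      # mirroring: half the drives hold copies
        return n_drives * drive_tb / 2
    if level == 5:      # one drive's worth of parity
        return (n_drives - 1) * drive_tb
    if level == 6:      # two drives' worth of parity
        return (n_drives - 2) * drive_tb
    raise ValueError(f"level {level} not covered here")

print(usable_tb(8, 2, 6))  # all eight drives in one RAID 6 group
```

With eight 2 TB drives, RAID 6 leaves six drives' worth, i.e. 12 TB; expressed in binary units that is roughly 10.9 TiB, which roughly matches the capacity the controller later reports for the second virtual drive.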

The nice thing about this RAID controller is the very user-friendly WebBIOS interface. So I power the headnode on and enter WebBIOS by pressing Ctrl+H during boot-up. Soon a welcome screen greets me and I click on the configuration wizard link to get to this:


In my case I then clicked on 'New configuration', since I am starting from scratch on a system with some previous configurations. I then chose a manual setup, so that I can configure my virtual drives exactly how I want them. Now I just walk through the configuration wizard:

  1. When I ended up on a screen like this:
    I selected all the physical drives on the left (using shift) and then clicked on ‘add to array’.
  2. Once all the drives were added and I was happy with the drive group, I clicked on ‘accept DG’ on the right and then next.
  3. Now I am in the virtual drive configuration window of my drive group 0 (and only drive group).
    At the moment there is one virtual drive. I click on it and configure it the way I like:
    – RAID 6
    – Strip size: 128 kB
    – Access policy: Read + Write
    – Read Policy: Normal
    – Write policy: WThru
    – I/O policy: direct
    – Disc cache policy: disabled
    – Disabled BGI: no
    and set the drive size to 50 GB (for root). I accept the settings and then create a second virtual drive with identical settings, but now using the leftover drive size, which with RAID 6 is around 10.8 TB on my system.
  4. Once I am happy with my virtual devices I accept their configuration.
  5. When I am prompted to save the new configurations I do so.

Now I have my RAID set up the way I want. The welcome screen should now show two virtual drives and 8 physical drives and give me a 'pretty' graphic of how everything is set up. Next the background initialisation check starts running and I just let it. This will take a while (I project at least 24 h on my setup), and since I have other things to do anyway, I let the initialisation happily check away until my next free time-slot for some more cluster building.


PyTRAM – a free energy python package

Just before the weekend a colleague and I finally managed to put out a new alpha release of the pytram package. It is now at version 0.1.5 and can be found both on GitHub and on the Python Package Index.

The new feature is an additional free energy estimator: the package now contains both dTRAM and xTRAM [1,2]. But probably the best enhancement over the previous release is that this version comes with a short documentation. It gives a brief overview of the supported input file structure, how to use the API, etc. There are also two IPython notebooks that illustrate very basic usage examples of both estimators.

More features will follow soon.

[1] H. Wu, A.S.J.S. Mey, E. Rosta, F. Noé, Statistically optimal analysis of state-discretized trajectory data from multiple thermodynamic states, J. Chem. Phys. 141, 214106 (2014).

[2] A.S.J.S. Mey, H. Wu, F. Noé, xTRAM: Estimating equilibrium expectations from time-correlated simulation data at multiple thermodynamic states, Phys. Rev. X 4, 041018 (2014).

Fast forward science entry — news

So, just over a month ago the short film 'Glacial Mysteries' was submitted as an entry to the Fast Forward Science competition. Now it seems our hard work on the film has paid off: it has been selected as one of the five finalists by the jury. People can now also vote on it, to possibly win a community choice award. So please help out if you can and vote or leave a positive comment. It would mean a lot to everyone who was involved in the project.

The latest news is that we have actually won the first prize of the jury award, and the film is finally online in an English version.
Have a look:

Fast forward science entry

Fast Forward Science is a video competition for short science films discussing current research. The idea is to make this research as broadly comprehensible as possible, meaning you do not need a PhD to understand what the film is about. This year is the first time I have co-entered the competition, with a film by Guillaume Jouvet. Below you can find the German version of the film, which was required for the competition:


(In the future I will also post an English version of the film.) This short film illustrates Guillaume's research and in particular focuses on a scientific paper he published last year (2013), which discusses how computer simulations of a glacier can be used to reconstruct the trajectory of a dead body within the glacial ice, and in this way solve the mystery of three hikers who disappeared on a hike over 80 years ago.


EMS Newsletter

As a scientist these days you are required to publish, ideally deep and profound research, as fast as you can... well, this is tricky, and this post is not about the scientific community and its publishing craze, but about a very enjoyable article I co-wrote.

It is an article on IMAGINARY and the whole history behind it, which was featured in this month's edition of the European Mathematical Society (EMS) newsletter.

IMAGINARY is a project by the Mathematisches Forschungsinstitut Oberwolfach and funded by the Klaus Tschira Stiftung. I do some freelance work for this project, and occasionally this means writing a nice little article about it for a newsletter such as the one of the EMS.

The complete newsletter including the article on IMAGINARY can be found here. Enjoy the read!

Hello world and Hello Dar es Salaam!

I have finally decided to restart my blog. For my first post I wanted to give an overview of my trip to Dar es Salaam in March 2014. This whole endeavour came about because I started doing some mathematics outreach work for a project called IMAGINARY in February 2014. IMAGINARY is a platform that aims to bring open source mathematics to the public in the form of films, exhibitions, computer programs and hands-on problems that can be built at home. All content is free, and the platform lives off the mathematics community uploading anything that might be of interest to the general public, as well as the general public using the website as a resource to learn about the fun part of mathematics.

Having freshly started work for IMAGINARY, I was asked whether I wanted to organise a small IMAGINARY exhibit in Dar es Salaam in conjunction with the Pi-Day celebrations on the 14th of March. Usually when I refer to an IMAGINARY exhibit, I actually mean a small exhibition generated with a piece of software called SURFER, which represents the solutions of equations in three variables as surfaces. Here is a repost of the blog entry about my visit to Dar es Salaam presenting SURFER at the Pi-Day celebrations:

“Pi-Day is celebrated on March the 14th by mathematics enthusiasts each year. In Tanzania, the day is celebrated as a national event by students, teachers and policy-makers. The Pi-Day celebration in Tanzania was first held in 2004 through the initiative of Mr. Beniel Seka under the auspices of the Tanzania Institute of Education; in 2007 the Mathematical Association of Tanzania (MAT) took over the responsibility for organising the event. Since then, there has not been a single year without the event taking place, with over 1000 students and teachers participating.
This year’s event was special as it marked the 10th anniversary of the celebrations.

As a result, the Mathematical Association of Tanzania, currently chaired by Dr. Sylvester E. Rugeihyamu, in partnership with the African Institute for Mathematical Sciences (AIMS-NEI), organised this year's event with the overall objective of widening participation and increasing general awareness of the role and importance of mathematics in our lives, and of making mathematics a subject of choice among students, which requires policy-makers to put more emphasis on the teaching of maths in school. For the first time, there was a pre-Pi-Day exhibition event on the 13th of March, which brought together 18 exhibitors and 6 book publishers and attracted over 900 students and over 60 teachers. The Guest of Honour of the 14th of March event was the Vice President of the United Republic of Tanzania, Dr. Mohammed G. Bilal, who delivered the keynote speech about the importance of mathematics in general and, amongst other things, for Tanzania's economic growth in particular. The event, held in Dar es Salaam's Mnazi Mmoja Park, was attended by over two thousand school children and their teachers from 3 universities, 33 secondary schools, 14 primary schools and one teacher resource centre. IMAGINARY, in collaboration with the African Institute for Mathematical Sciences, actively participated in the two-day event. This meant that for the first time the SURFER programme was displayed in the form of an interactive exhibit in Africa.

The 13th of March was a day of pre-celebrations. AIMS and IMAGINARY had a joint tent, where pupils first got a demonstration of SURFER and could then play around with the software themselves, using two laptops as well as a larger setup with a projector. I was in charge of this activity and had a teaching assistant (Mr. Laurent Shilingi) from the University of Dar es Salaam helping me, in particular with the Swahili-speaking pupils. The publicity material, consisting of postcards and posters, disappeared into the hands of students and teachers very quickly, and there was always a large group of pupils surrounding the tent, eager to understand the ideas behind SURFER. This was only broken up by the rain, which forced us to pack up the computer equipment during the pre-celebration event.

On the actual Pi-Day, i.e. the big celebration day, the timetable was a bit more formal, with a few exclusive tents for mathematics-related exhibitions, featuring IMAGINARY+AIMS, a group of students from one of the best high schools in Tanzania (Loyola High School) and the Peace Corps Volunteers. Tents with seats for a couple of thousand pupils and teachers had been arranged in front of the main stage. Everything started off with a march of the majority of the attending pupils, scheduled to arrive at the grounds around 10 am.
The rest of the day followed the following schedule:

10:00 Guest of honour attends the marching students

10:05 Students perform a song followed by an introduction by the chairperson and a performance of a traditional dance.

10:25 Mr. Samuel Awuku from AIMS- Next Einstein Initiative gave a short speech about AIMS and the outlook of an AIMS centre in Tanzania.

10:35 A short speech by the Chair of MAT, Dr. Sylvester E. Rugeihyamu

10:55 Awards are presented to top-performing mathematics students

11:25 Speech by the guest of honour, Dr. Mohammed G. Bilal

11:45 Group photos

11:55 Guest of honour visits all exhibition tents

12:00 onwards various traditional dances and performances by students

1:59:26 Pi-Hour

14:30 closing

Samuel Awuku's speech encouraged students to take up mathematics and made everyone aware of the scholarship and teacher-training opportunities that an AIMS Centre of Excellence in Tanzania could provide. Of course the collaboration with IMAGINARY was also mentioned.

Unfortunately, the rest of the speeches were in Swahili, so I have no idea what was being said, apart from the fact that the word for 'mathematics' in Swahili (hisabati) was used repeatedly. The Vice President, on the other hand, spoke in English about the MAT/AIMS partnership and confirmed his government's commitment to support establishing AIMS in Tanzania. Furthermore, in my discussion with Dr. Rugeihyamu, the importance of the Association's collaboration with other organisations in improving the teaching and learning of mathematics in Tanzania was highlighted, and he welcomed all interested organisations on board.
After his speech, the Vice President personally came around all the tents to have a look at the exhibits, including the AIMS/IMAGINARY tent, where I gave a brief introduction to SURFER, how it can aid teaching the ideas behind simple polynomial functions and solutions to multi-variable polynomial equations, and also demonstrated the connection between mathematics and art.

Many schools had prepared a dance or some other kind of contribution to the event. This was ongoing while I was busy explaining SURFER to the many curious pupils and teachers who approached our stall after the most formal part of the event was over (around 12 pm). The event lasted until 2:30 pm.

What was particularly interesting for me to observe, as a young female scientist, was that girls especially showed an interest and actively asked me what they could do to improve their grades in maths classes at school. Also, from my own perception, girls seemed more eager to play around with SURFER themselves, rather than just watching me explain the software. This is great to see, as Tanzania still has a problem of underperforming girls in science subjects; all top 10 places in a recent national mathematics competition went to boys.

All in all the event was a great success, and I want to thank Samuel Awuku, AIMS-NEI and IMAGINARY (a project by the Mathematisches Forschungsinstitut Oberwolfach and supported by the Klaus Tschira Stiftung) for having made it possible for me to participate in Pi-Day in Tanzania, and Dr. Sylvester Rugeihyamu and MAT/CHAHITA for the organisation and the generally inspiring discussions regarding mathematics in Tanzania. Lastly, I want to thank Mr. Laurent Shilingi for all the help and enthusiasm he put into demonstrating SURFER to pupils. I hope the success of this year's event means that next year's will be even bigger, again with IMAGINARY's participation in collaboration with AIMS-NEI.

For some photos related to this, check out the event’s page here.