You are here

Problem solving workshop, COMP-IT Summer School, Aug 16-20, 2010

The problems will be solved in small groups of 3-4 students.

You will get 1 ECTS if you participate in all activities of the summer school on 17-19 August and deliver a draft solution of one of the problems below during the week. A complete solution is worth 3-5 ECTS and can be delivered until 30 November 2010.

(C) Clustering

The attached data matrix contains 4590 binary vectors with 393 elements in each. There are no missing values among the observations. The task is to cluster these data with your method(s) of choice, such that you are able to choose the most appropriate number(s) of clusters according to some criteria. Pay attention to reporting what kind of clusters do emerge in your analysis.
Input data
Contact person: Professor Jukka Corander, available on Wed 18 Aug for questions

(D) Compression of DNA sequencing data
Next generation sequencing (NGS) platforms are generating massive amounts of short DNA reads in hundreds of laboratories around the world. The data contains lots of redundancy since the study targets are typically model organisms, whose whole genome sequences are almost completely known. Generic compression methods do not work well enough for short DNA reads, and hence tailored mechanisms are being developed. The proposed challenge is to develop such a compression mechanism for DNA reads. The results are evaluated on compression efficiency, compression time, decompression time, and on time to decompress the i-th read.
Complete description
Example input data (The complete data set is large and is delivered on-site via a USB stick.)
Contact person: Professor Veli Mäkinen, available Wed 18 Aug - Thu 19 Aug for questions

(E) Classification of eye movement trajectories

Can you read the mind of a computer user by inspecting just where they are looking at? As the old saying goes, eyes are the mirror of the soul, and hence tracking eye movements gives a great deal of data on what the user wants or is doing at the moment. Digging the relevant information out from the raw measurements, however, requires solving difficult machine learning challenges. You are given eye movement data measured for users reading a question and a set of possible answers, and your task is to find out which of the alternatives is the right one.
Complete description
Contact person: Arto Klami, available Wed 18 Aug for questions, phone/email otherwise

(G) Game of Life, supercomputing implementation

The Game of Life is a cellular automaton. In this excercise, a straightforward MPI implementation is discussed. You need to have a Fortran compiler and an MPI library installed in your laptop.
Complete description
Serial Fortran code
Skeleton MPI Fortran code
Contact person: Pekka Manninen and Sebastian Alftan, CSC, available during the supercomputing training Tue 17 Aug +Thu 19 Aug for questions

(I) Inverse kinematics
The task is provided by a company Sandvik Mining and Construction and is delivered to the participants via email.
Contact person: Robert Piche, available Tue 17 Aug - Thu 19 Aug for questions

Added later: Sandvik Mining and Construction http://www.miningandconstruction.sandvik.com sent 2 representatives and some equipment to the summer school, and gave a problem to be solved by the students. A very promising preliminary solution was delivered during the summer school by 8 students and 1 supervisor, and an elaborate solution was provided during autumn 2010 by 3 students and 1 supervisor. Sandvik is very pleased of the results.

(M) Markov Chains and Monte Carlo Simulations Without Detailed Balance
One of the most powerful simulation methods in physics, namely the Monte Carlo (MC) simulation method, is based on the theory of Markov chains. In virtually all practically applied simulation algorithms, the convergence of the Markov chain to its equilibrium distribution is guaranteed by enforcing the so-called detailed balance condition (DBC) between the transition  rates. However, while DBC is sucient, it is not necessary for the convergence of the chain to its equilibrium distribution. Can Monte Carlo be done more efficiently by not imposing the detailed balance condition?
Complete description

Preprint in ArXiv
Contact person: Tapio Ala-Nissilä, available Wed 18 Aug + Thu 19 Aug for questions