Bioinformatics Software Lab 2018

Location: MTZ Seminar Room, Pauwelsstr. 19, 3rd floor, corridor B, room 3.04.

Dates: Mondays, 9:30-12:30 (starting 16.04.2018)

Language: English

Prerequisite (desirable): Introduction to Bioinformatics

Credits: 7 (10 with extra work)

Lecturers: Ivan G. Costa & Zhijian Li

Evaluation: 20% prototypes / 60% final project / 20% presentation

Campus description: Bioinformatik Praktikum


Next-generation sequencing (NGS) allows the measurement of molecular characteristics of individuals on a genome-wide scale. Applying NGS methods to large patient cohorts enables precision medicine, i.e. the use of genetic features to guide medical treatment. The low-level analysis of NGS data poses major computational and statistical challenges. NGS data are typically large (1 to 100 GB per sample/patient), requiring efficient computational strategies for analysis and storage. Moreover, NGS data contain artifacts and noise, which affect the reliability of predictions and can lead to errors.

In this practical course, we will focus on the problem of detecting protein binding sites from open chromatin data. Groups will propose and implement statistical models or machine learning methods for the detection of putative binding sites within high-dimensional genomic data. The proposed tools will be used to analyze public medical genomic data from the Human Epigenome, ENCODE or the ENCODE DREAM challenge. Students will learn the computational pipelines needed to analyze sequencing data, including quality checking, alignment and post-processing steps. We will use the high-performance computing cluster of the ITC RWTH Aachen as the computational platform for this course.
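The quality-check step of an NGS pipeline usually starts from the per-base quality scores stored in FASTQ files. The sketch below is a minimal illustration, assuming Phred+33 encoding (used by Sanger and Illumina 1.8+ FASTQ files) and an illustrative Q20 cutoff; the function names are hypothetical and not part of any course tool. It decodes a quality string, converts a Phred score Q into an error probability of 10^(-Q/10), and summarizes a single read.

```python
def phred_scores(quality: str, offset: int = 33) -> list:
    """Decode an ASCII quality string into integer Phred scores (Phred+33 by default)."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """A Phred score Q corresponds to a base-call error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


def read_qc(quality: str, min_q: int = 20) -> dict:
    """Summarize one read: mean Phred score and fraction of bases below min_q.

    The default min_q=20 (1% error probability) is an illustrative assumption.
    """
    if not quality:
        raise ValueError("empty quality string")
    scores = phred_scores(quality)
    low = sum(1 for q in scores if q < min_q)
    return {"mean_q": sum(scores) / len(scores), "frac_low": low / len(scores)}


if __name__ == "__main__":
    # Hypothetical quality string: 'I' = Q40, '5' = Q20, '#' = Q2 under Phred+33.
    print(read_qc("IIII5#"))
```

Dedicated tools such as FastQC report statistics of this kind across whole files; the sketch only shows what a per-read summary computes.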



16.04.2018 – Introduction to Bioinformatics and Next-Generation Sequencing

  • NGS
  • ChIP-seq protocol
  • Open chromatin protocols (DNase-seq and ATAC-seq)
  • Transcription factor binding sites
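Footprinting in open chromatin data rests on the observation that a DNA-bound protein protects its site from DNase I cleavage or Tn5 insertion, so a bound site appears as a dip in cut counts flanked by accessible DNA. A minimal sketch of that idea, assuming per-base cut counts are already available for a window around a candidate site, compares the mean counts in the flanks with the mean count in the centre. The score definition, the pseudocount of 1 and the function name are illustrative assumptions, not the scoring used in this course.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def footprint_score(cuts, center_start, center_end, flank):
    """Hypothetical depletion score for one candidate binding site.

    cuts         -- per-base cut-site counts over a window around the site
    center_start -- first position assumed to be protected by the protein
    center_end   -- position just past the protected region (half-open interval)
    flank        -- number of bases on each side used as accessible background

    Returns mean flank count / (mean centre count + 1). Larger values indicate
    stronger protection; the +1 pseudocount avoids division by zero.
    """
    if flank < 1 or center_end <= center_start:
        raise ValueError("flank and centre region must be non-empty")
    if center_start - flank < 0 or center_end + flank > len(cuts):
        raise ValueError("window too small for the requested flank size")
    left = cuts[center_start - flank:center_start]
    right = cuts[center_end:center_end + flank]
    center = cuts[center_start:center_end]
    flank_mean = (mean(left) + mean(right)) / 2
    return flank_mean / (mean(center) + 1)


if __name__ == "__main__":
    # Toy cut counts: accessible flanks (~8) around a protected 4-bp centre (~1).
    profile = [8, 9, 7, 8, 1, 0, 2, 1, 9, 8, 7, 8]
    print(footprint_score(profile, 4, 8, 4))    # deep dip: high score
    print(footprint_score([5] * 12, 4, 8, 4))   # flat profile: score below 1
```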

23.04.2018 – Practical Course in NGS data analysis

30.04.2018 – Introduction to the Project

  • Cancer clustering (data are not public; access requires permission)
  • DREAM challenge (data volume too large)
  • Supervised learning for footprinting
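Supervised footprinting treats binding-site detection as classification: each candidate site is described by features (for example a footprint depth score and local accessibility) and labelled bound or unbound, for instance from ChIP-seq peaks of the same factor. The sketch below fits a plain logistic regression by batch gradient descent in pure Python. The features, labels and hyperparameters are invented for illustration, and logistic regression is only one possible model choice, not the one prescribed by the project.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """P(bound | features x) under the logistic model."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # d(log-loss)/d(logit)
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical training data: [footprint score, accessibility]; 1 = bound, 0 = unbound.
X_TRAIN = [[4.0, 1.0], [3.5, 0.8], [3.0, 0.9], [0.8, 0.7], [0.9, 0.2], [1.0, 0.5]]
Y_TRAIN = [1, 1, 1, 0, 0, 0]

if __name__ == "__main__":
    w, b = train_logistic(X_TRAIN, Y_TRAIN)
    for x in ([3.8, 0.9], [0.7, 0.3]):
        print(x, round(predict_proba(w, b, x), 3))
```

Because the two toy classes are linearly separable, the weights keep growing with more epochs; on real data one would standardize features, add regularization and evaluate on held-out sites.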

07.05.2018 to 09.07.2018 – Project Development

16.07.2018 – Project Presentation


R. Durbin, S. Eddy, A. Krogh, G. Mitchison. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, 1999.

Online material on NGS

Video courses

The genomics data science course on Coursera covers many aspects relevant to this course. We recommend the following lectures, which introduce HMMs (Course 13, Course 14, Course 15, Course 16).
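A standard HMM task is decoding: finding the most probable hidden-state path for an observed sequence with the Viterbi algorithm (covered, for example, in Durbin et al.). The sketch below works in log space to avoid numerical underflow on long sequences. The two states ("BG" for accessible background, "FP" for footprint), the binarized observations ("H" = high, "L" = low cleavage count) and all probabilities are hypothetical toy values chosen for illustration.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path for a discrete HMM (log-space Viterbi)."""
    log = math.log
    # V[t][s]: best log-probability of any path that ends in state s at time t
    V = [{s: log(start_p[s]) + log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, V[t - 1][r] + log(trans_p[r][s])) for r in states),
                key=lambda item: item[1],
            )
            V[t][s] = score + log(emit_p[s][obs[t]])
            back[t][s] = prev
    # Trace back from the best final state
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, V[-1][last]


# Hypothetical two-state model: background (BG) vs. footprint (FP).
STATES = ("BG", "FP")
START_P = {"BG": 0.9, "FP": 0.1}
TRANS_P = {"BG": {"BG": 0.9, "FP": 0.1}, "FP": {"BG": 0.2, "FP": 0.8}}
EMIT_P = {"BG": {"H": 0.8, "L": 0.2}, "FP": {"H": 0.1, "L": 0.9}}

if __name__ == "__main__":
    path, logp = viterbi("HHHLLLHH", STATES, START_P, TRANS_P, EMIT_P)
    print(path, logp)
```

With these toy parameters, the run of "L" symbols in `HHHLLLHH` is decoded as FP and the surrounding "H" symbols as BG.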