Bioinformatics Software Lab 2018

Location: MTZ Seminar Room, Pauwelsstr. 19, 3rd Floor, Corridor B, Room 3.04.

Dates: Mondays 9:30-12:30 (starting 16.04.2018)

Language: English

Prerequisite (desirable): Introduction to Bioinformatics

Credits: 7 (10 with extra work)

Lecturers: Ivan G. Costa & Zhijian Li

Evaluation: 20% prototypes / 60% final project / 20% presentation

Campus Description: Bioinformatik Praktikum

Description:

Next-generation sequencing (NGS) allows the molecular characteristics of individuals to be measured on a genome-wide scale. Applying NGS methods to large patient groups enables precision medicine, i.e. finding genetic features that can guide medical treatment. The low-level analysis of NGS data poses major computational and statistical challenges. NGS data are typically large (1 to 100 GB per sample/patient), requiring efficient computational strategies for data analysis and storage. Moreover, NGS data contain artifacts and noise, which affect the reliability of predictions and can lead to errors.

In this practical course, we will focus on detecting protein binding sites from open chromatin data. Groups will implement and propose statistical models or machine learning methods for detecting putative binding sites within high-dimensional genomic data. The proposed tools will be used to analyze public medical genomic data from the Human Epigenome, ENCODE or the ENCODE DREAM challenge. Students will learn the computational pipelines needed to analyze sequencing data, including quality control, alignment and post-processing steps. We will use the high-performance computing cluster of the ITC at RWTH Aachen as the computational platform for this course.

Schedule:

16.04.2018 – Introduction to Bioinformatics and Next-Generation Sequencing [lecture_1_intro]

NGS
ChIP-seq protocol
Open chromatin protocols (DNase-seq and ATAC-seq)
Transcription factor binding sites

30.04.2018 – Practical Course in NGS data analysis [lecture_2_practice, installation, pipeline]

File formats
Sequence alignment
Peak calling
Footprinting and motif analysis

07.05.2018 – Introduction to RGT and Project Description [lecture_3_RGT, practice]

Introduction to RGT
DREAM Challenge
Supervised learning for footprinting

14.05.2018 – Introduction to HPC Clusters [lecture_4_HPC, script]

28.05.2018 – Introduction to GPUs [lecture_5_GPU, script]

14.05.2018 to 09.07.2018 – Project Development

16.07.2018 – Project Presentation

Literature & Videos

Pavel A. Pevzner and Phillip Compeau, Bioinformatics Algorithms: An Active Learning Approach.

See Chapter 9 for algorithms on short-read alignment: http://bioinformaticsalgorithms.com/videos.htm

Online material on Sequencing

DREAM Challenge

One of the proposed projects is to develop alternative methods for the ENCODE TF Binding DREAM Challenge.

These papers describe the approaches of the two top-scoring teams: J-team and FactorNet.