Bioinformatics Software Lab 2018
Location: MTZ Seminar Room, Pauwelsstr. 19, 3rd floor, corridor B, room 3.04.
Dates: Mondays, 9:30-12:30 (starting 16.04.2018)
Language: English
Prerequisite (desirable): Introduction to Bioinformatics
Credits: 7 (10 with extra work)
Lecturers: Ivan G. Costa & Zhijian Li
Evaluation: 20% prototypes / 60% final project / 20% presentation
Campus Description: Bioinformatik Praktikum
Description:
Next-generation sequencing (NGS) allows the measurement of molecular characteristics of individuals on a genome-wide scale. Applying NGS methods to large patient groups enables precision medicine, i.e. finding genetic features to guide medical treatment. The low-level analysis of NGS data poses substantial computational and statistical challenges. NGS data sets are typically large (1 to 100 GB per sample/patient) and require efficient computational strategies for analysis and storage. Moreover, NGS data contain artifacts and noise, which reduce the reliability of predictions and lead to errors.
In this practical course, we will focus on the problem of detecting protein binding sites from open chromatin data. Groups will propose and implement statistical models or machine learning methods for the detection of putative binding sites within high-dimensional genomic data. The resulting tools will be used to analyze public medical genomic data from the Human Epigenome, ENCODE or the ENCODE DREAM challenge. Students will learn the computational pipelines necessary for the analysis of sequencing data, including quality check, alignment and post-processing steps. We will use the high-performance cluster of the ITC RWTH Aachen as the computational platform for this course.
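To give a feel for what "detecting putative binding sites from open chromatin data" can mean in practice, here is a minimal sketch of a naive footprint score. It is only an illustration, not the method used in the course or in RGT: it assumes that a bound protein protects DNA from cleavage, so a binding site appears as a short run of low cut counts flanked by high cut counts. The function names, the window sizes and the toy cut-count profile are all hypothetical.

```python
from typing import List, Tuple


def footprint_scores(cuts: List[int], half_width: int = 1, flank: int = 3) -> List[float]:
    """Score each position as mean flank cut count minus mean center cut count.

    The center window spans 2 * half_width + 1 bases around the position;
    each flank spans `flank` bases on either side of the center window.
    Positions too close to the ends keep a score of 0.0.
    """
    scores = [0.0] * len(cuts)
    for i in range(flank + half_width, len(cuts) - flank - half_width):
        left = cuts[i - half_width - flank : i - half_width]
        mid = cuts[i - half_width : i + half_width + 1]
        right = cuts[i + half_width + 1 : i + half_width + 1 + flank]
        flank_mean = (sum(left) + sum(right)) / (2 * flank)
        scores[i] = flank_mean - sum(mid) / len(mid)
    return scores


def best_footprint(cuts: List[int], half_width: int = 1, flank: int = 3) -> Tuple[int, float]:
    """Return the position with the highest footprint score and that score."""
    scores = footprint_scores(cuts, half_width, flank)
    pos = max(range(len(scores)), key=scores.__getitem__)
    return pos, scores[pos]


# Toy per-base cut counts (made up): a depleted stretch between two busy flanks.
toy_cuts = [1, 2, 9, 10, 8, 0, 1, 0, 9, 11, 10, 2, 1]
position, score = best_footprint(toy_cuts)
print(position, round(score, 2))
```

On real data, such a raw score would still need to handle sequencing depth, cleavage bias and noise, which is where the statistical and machine learning models mentioned above come in.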
Schedule:
16.04.2018 – Introduction to Bioinformatics and Next-Generation Sequencing [lecture_1_intro]
- NGS
- ChIP-seq protocol
- Open chromatin protocols (DNase-seq and ATAC-seq)
- Transcription factor binding sites
30.04.2018 – Practical Course in NGS data analysis [lecture_2_practice, installation, pipeline]
- File format
- Sequence alignment
- Peak calling
- Footprinting and motif analysis
07.05.2018 – Introduction to RGT and Project Description [lecture_3_RGT, practice]
- Introduction to RGT
- DREAM challenge
- Supervised learning for footprinting
14.05.2018 – Introduction to HPC clusters [lecture_4_HPC, script]
28.05.2018 – Introduction to GPU [lecture_5_GPU, script]
14.05.2018 to 09.07.2018 – Project Development
16.07.2018 – Project Presentation
Literature & Videos
See Chapter 9 for some algorithms for short-read alignment: http://bioinformaticsalgorithms.com/videos.htm
Online material on Sequencing
DREAM Challenge
One of the proposed problems is to find alternative methods for solving the ENCODE TF Binding DREAM Challenge.
The following papers describe the approaches of the two top-scoring teams: J-team and FactorNet.