Phylogenetic analyses are increasingly based on massive genomic data sets rather than single marker genes. While PCR and Sanger sequencing allowed researchers to study the relationships of organisms based on one or a few marker regions, new technologies such as high-throughput sequencing make it possible to sequence and analyze whole genomes. Computational work is therefore becoming increasingly important for modern biologists, who must analyze the massive data sets generated by these techniques. Many biologists are now challenged to work with gigabytes of sequencing data generated in only hours by parallel sequencing machines. These data are usually produced as large text files, which makes the Unix operating system (including Linux) particularly well suited to processing them, especially when it is operated from the command line. Hence, the aim of this workshop is to introduce phylogenetic and phylogenomic concepts and to show how to apply these analyses using the Linux command line.
During the first sessions of the course, you'll learn about the history of phylogenetic methods and the traditional molecular marker regions that were used for decades to estimate the relationships of organisms. Then you will learn basic but powerful Linux commands to manage your folder structure, handle large files, and install and execute programs. Once you feel comfortable in the Linux command-line environment, you will apply your new knowledge to build alignments, run complex phylogenetic methods, assemble high-throughput sequencing data into contiguous genome sequences, verify the integrity of these assemblies by read mapping, use search methods to identify gene regions, and use these regions for phylogenetic reconstruction, all on the Linux operating system. We’ll finish the course by discussing the advantages and problems of this new era of sequencing technologies.
You’ll learn how to:
- Operate Linux from the command line
- Install and execute Linux programs
- Work with phylogenetic data sets (build alignments with ClustalW and convert between file formats)
- Use aligned sequences for phylogenetic reconstruction (using RAxML, PHYLIP, and MrBayes)
- Understand the structure of large high-throughput sequencing data formats (e.g., FASTQ, SAM, VCF)
- Assemble and analyze genome sequencing data (SPAdes, BLAST)
- Check the quality of genome assemblies by read mapping (Bowtie)
- Apply reduced-representation genome methods for phylogenomics, such as RADseq and target enrichment (pyRAD and HybPiper)
What you need to know about Linux:
- Nothing; absolute beginners are more than welcome.
What you need to bring and prepare:
- Please bring a personal laptop with the BCWS Workshop Appliance installed, since this class will be more of a workshop than a lecture.
- Installation instructions for the BCWS Workshop Appliance and the course material will be made available on our website.
Course Schedule
(schedule may be subject to change)