[BBC] Few places left on Introduction to Linux and Workflows for Biologists 24-28 April 2017, Edinburgh
Bert Overduin
Bert.Overduin at ed.ac.uk
Thu Apr 6 16:45:51 CEST 2017
Dear all,
We still have a few places left on our new Introduction to Linux and
Workflows for Biologists workshop, which runs from 24 to 28 April 2017.
Cheers,
Bert
**********
INTRODUCTION TO LINUX AND WORKFLOWS FOR BIOLOGISTS
DATE: Monday 24 – Friday 28 April 2017
VENUE: The King's Buildings, The University of Edinburgh, Edinburgh,
Scotland, UK
REGISTRATION DEADLINE: Monday 10 April 2017 noon
CANCELLATION DEADLINE: Monday 17 April 2017 noon
PLACES: 20 (first come, first served)
REGISTRATION FEE: £525 (includes coffee/tea, but no lunch)
INFORMATION: Bert Overduin (bert.overduin at ed.ac.uk), Martin Jones (martin at pythonforbiologists.com)
TO REGISTER: http://genomics.ed.ac.uk/services/introduction-linux-and-workflows-biologists
Most high-throughput bioinformatics work these days takes place on the
Linux command line. The programs which do the majority of the computational
heavy lifting — genome assemblers, read mappers, and annotation tools — are
designed to work best when used with a command-line interface. Because the
command line can be an intimidating environment, many biologists learn the
bare minimum needed to get their analysis tools working. This means that
they miss out on the power of Linux to customise their environment and
automate many parts of the bioinformatics workflow. This course will
introduce the Linux command line environment from scratch and teach
students how to make the most of its tools to achieve a high level of
productivity when working with biological data.
INSTRUCTORS
Dr. Martin Jones (Founder, Python for Biologists)
Dr. Bert Overduin (Training and Outreach Bioinformatician, Edinburgh
Genomics)
WORKSHOP FORMAT
The workshop is delivered over nine half-day sessions (see the detailed
curriculum below). Each session consists of a roughly one-hour lecture
followed by two hours of practical exercises, with breaks at the
organiser’s discretion. There will also be plenty of time for students to
discuss their own problems and data.
WHO SHOULD ATTEND
This workshop is aimed at researchers and technical workers with a
background in biology who want to learn to use the Linux operating system
and the command line environment.
REQUIREMENTS
Students should have enough biological background to appreciate the
examples and exercise problems, and have at least some interest in working
with DNA sequence data. No previous computer skills are necessary, as we
will introduce Linux starting with the very basics. Students need to bring
a laptop with a Linux virtual machine installed (we will distribute
instructions for downloading and installing the virtual machine before the
course starts).
SESSION CONTENT
1. The design of Linux
In the first session we briefly cover the design of Linux: how it differs
from Windows/OS X, and how it is best used. We’ll then jump
straight onto the command line and learn about the layout of the Linux
filesystem and how to navigate it. We’ll describe Linux’s file permission
system (which often trips up beginners), how paths work, and how we
actually run programs on the command line. We’ll learn a few tricks for
using the command line more efficiently, and how to deal with programs that
are misbehaving. We’ll finish this session by looking at the built-in help
system and how to read and interpret manual pages.
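To give a flavour of the level we start at, the commands covered in this session are of the following kind (the directory and file names are only placeholders):

  # show where we are in the filesystem and what is here
  pwd
  ls -l

  # move into a directory using a relative path, then back up one level
  cd projects/sequences
  cd ..

  # check a file's permissions and make a script executable
  ls -l run_analysis.sh
  chmod u+x run_analysis.sh

  # read the manual page for a command (press q to quit)
  man ls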
2. System management
We’ll first look at a few command line tools for monitoring the status of
the system and keeping track of what’s happening to processor power,
memory, and disk space. We’ll go over the process of installing new
software from the built-in repositories (which is easy) and from source
code downloads (which is trickier). We’ll also introduce some tools for
benchmarking software (measuring the time/memory requirements of processing
large datasets).
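As an illustration, and assuming a Debian/Ubuntu-style system (the package and file names below are just examples), the session covers commands along these lines:

  # keep an eye on running processes, memory and disk space
  top
  free -h
  df -h

  # install a tool from the built-in repositories (apt assumed; bwa is just an example package)
  sudo apt-get update
  sudo apt-get install bwa

  # measure how long a command takes and how much memory it uses
  /usr/bin/time -v gzip large_file.fastq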
3. Manipulating tabular data
Many data types we want to work with in bioinformatics are stored as
tabular plain text files, and here we learn all about manipulating tabular
data on the command line. We’ll start with simple things like extracting
columns, filtering and sorting, and searching for text, before moving on to more
complex tasks like searching for duplicated values, summarizing large
files, and combining simple tools into long commands.
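For example, given a hypothetical tab-separated file of gene annotations called genes.tsv, the kind of one-liners taught here look like this:

  # extract the first and third columns of a tab-separated file
  cut -f 1,3 genes.tsv

  # keep only lines mentioning a particular chromosome, then sort them numerically on column 2
  grep "chr1" genes.tsv | sort -k 2,2n

  # count how often each value appears in column 1 (duplicated values show up at the top)
  cut -f 1 genes.tsv | sort | uniq -c | sort -rn | head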
4. Constructing pipelines
In this session we will look at the various tools Linux has for
constructing pipelines out of individual commands. Aliases, shell
redirection, pipes, and shell scripting will all be introduced here. We’ll
also look at a couple of specific utilities that help with running tools on
multiple processors and with monitoring the progress of long-running tasks.
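A taste of what this looks like in practice (file names are placeholders, and GNU parallel and pv are named only as typical examples of such tools):

  # an alias to save typing a common command
  alias ll='ls -lh'

  # redirect output to a file, and chain commands together with pipes
  grep "exon" annotation.gff > exons.gff
  cut -f 1 exons.gff | sort | uniq -c > exons_per_chromosome.txt

  # contents of a minimal shell script, count_exons.sh
  # (run it with: bash count_exons.sh annotation.gff)
  #!/bin/bash
  grep "exon" "$1" | wc -l

  # run a command on many files at once, and watch the progress of a long-running job
  parallel gzip ::: *.fastq
  pv big_file.fastq | gzip > big_file.fastq.gz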
5. EMBOSS
EMBOSS is a suite of bioinformatics command-line tools explicitly designed
to work in the Linux paradigm. We’ll get an overview of the different
sequence data formats that we might expect to work with, and put what we
learned about shell scripting to biological use by building a pipeline to
compare codon usage across two collections of DNA sequences.
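As a rough sketch, and noting that the exact EMBOSS programs used in the course may differ, a codon usage comparison could involve commands such as:

  # convert sequences from GenBank format to FASTA
  seqret -sequence input.gb -outseq input.fasta

  # calculate a codon usage table for each collection of coding sequences
  cusp -sequence set1_cds.fasta -outfile set1.cusp
  cusp -sequence set2_cds.fasta -outfile set2.cusp

  # the two tables can then be compared with the tabular-data tools from session 3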
6. Using a Linux server
Often in bioinformatics we’ll be working on a Linux server rather than on our
own computer, typically because we need access to more computing power, or
to specialized tools and datasets. In this session we’ll learn how to
connect to a Linux server and how to manage sessions. We’ll also consider
the various ways of moving data to and from a server from your own
computer, and finish with a discussion of the considerations we have to
make when working on a shared computer.
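For example (user, host, and file names are placeholders; screen is shown, but tmux is a common alternative):

  # connect to a remote Linux server
  ssh alice@server.example.ac.uk

  # start a session on the server that survives disconnection
  screen

  # copy data to and from the server from your own machine
  scp reads.fastq.gz alice@server.example.ac.uk:/home/alice/data/
  scp alice@server.example.ac.uk:/home/alice/results/variants.vcf .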
7. Combining methods
In the next two sessions — i.e. one full day — we’ll put everything we have
learned together and implement a workflow for next-gen sequence analysis.
In this first session we’ll carry out quality control on some paired-end
Illumina data and map these reads to a reference genome. We’ll then look at
various approaches to automating this pipeline, allowing us to quickly do
the same for a second dataset.
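As a sketch of what such a workflow can look like, using FastQC, bwa, and samtools as typical examples (the actual tools and file names used in the course may differ):

  # quality control report for each read file
  fastqc sample1_R1.fastq.gz sample1_R2.fastq.gz

  # index the reference genome and map the paired-end reads, sorting the output
  bwa index reference.fasta
  bwa mem reference.fasta sample1_R1.fastq.gz sample1_R2.fastq.gz | \
      samtools sort -o sample1.sorted.bam -
  samtools index sample1.sorted.bam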
8. Combining methods
The second part of the next-gen workflow is to call variants to identify
SNPs between our two samples and the reference genome. We’ll look at the
VCF file format and figure out how to filter SNPs for read coverage and
quality. By counting the number of SNPs between each sample and the
reference we will try to figure out something about the biology of the two
samples. We’ll attempt to automate this analysis in various ways so that we
could easily repeat the pipeline for additional samples.
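As an illustration only, with bcftools as one possible variant caller and with made-up thresholds and file names:

  # call variants against the reference
  bcftools mpileup -f reference.fasta sample1.sorted.bam | \
      bcftools call -mv -Oz -o sample1.vcf.gz
  bcftools index sample1.vcf.gz

  # keep only variants with reasonable depth and quality, then count them
  bcftools view -i 'DP>=10 && QUAL>=30' sample1.vcf.gz | grep -vc "^#"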
9. Customisation
Part of the Linux design is that everything can be customised. This can be
intimidating at first but, given that bioinformatics work is often fairly
repetitive, can be used to good effect. Here we’ll learn about environment
variables, custom prompts, soft links, and ssh configuration — a collection
of tools with modest capabilities, but which together can make life on the
command line much more pleasant. In this last session there will also be
time to continue working on the next-gen sequencing pipeline.
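A few small examples of the kind of customisation covered (paths, host names, and user names are placeholders):

  # add a directory of your own scripts to the search path (in ~/.bashrc)
  export PATH="$HOME/scripts:$PATH"

  # a simple custom prompt showing the current directory
  export PS1='\w \$ '

  # a soft link giving a long path a short, convenient name
  ln -s /shared/projects/2017/illumina_run_42 ~/current_run

  # lines added to ~/.ssh/config so that "ssh cluster" connects with one short command
  Host cluster
      HostName server.example.ac.uk
      User alice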
The last afternoon is reserved for finishing off the next-gen workflow
exercise, working on your own datasets, or leaving early for travel.
--
Bert Overduin, PhD
TRAINING AND OUTREACH BIOINFORMATICIAN
Bert.Overduin at ed.ac.uk
orcid.org/0000-0002-5281-8838
EDINBURGH GENOMICS
The University of Edinburgh
Ashworth Laboratories
The King's Buildings
Charlotte Auerbach Road
Edinburgh EH9 3FL
Scotland, United Kingdom
tel. +44(0)1316507403
http://genomics.ed.ac.uk