[Gtpb] ARANGS16 - Automated and reproducible analysis of NGS data

ANNOUNCEMENT / REMINDER

Applications for ARANGS16 - Automated and reproducible analysis of NGS data, with Darin London and Rutger Vos, are OPEN.

IMPORTANT DATES for this Course

Deadline for applications: May 2nd 2016
Course dates: May 9th - May 13th 2016

Candidates with an adequate profile will be accepted within 72 hours of applying, until we reach 20 participants.

Description

Next generation sequencing (NGS) technologies for DNA have resulted in an ever bigger deluge of data. Researchers are learning that analyzing the data efficiently requires the creation of sophisticated pipelines, typically using command line tools in Linux or another open source Unix-variant compute environment. Many researchers have created such pipelines to successfully analyse their data. Now they face the challenge of making these pipelines available to their colleagues.

Reproducibility has emerged as a major issue, as researchers, peer reviewers, and even pharmaceutical companies discover that the software and data used to produce a particular research finding are either not available, poorly documented, or targeted to specific compute infrastructures that are not available to the wider research community. To remedy this, funding agencies and journals are creating policies to promote software reproducibility.

In this brief workshop we will establish several best practices of reproducibility in the (comparative) analysis of data obtained by NGS. In doing so we will encounter the commonly used technologies that enable these best practices by working through use cases that illustrate the underlying principles.
Building on an existing pipeline of command line utilities, we will illustrate how the entire compute environment used to run the pipeline can be packaged into a unit that can be shared with other researchers, so that they can make full use of the environment on their own machines or on standard cloud compute environments such as Amazon or Google.

Best practices

- Command line scripting of analysis steps
- Provisioning systems to standardise software environment requirements
- Packaging of compute environment into static, portable units
- Sharing of compute environment packages

Technologies

- Next generation sequencing platforms
- Command-line executables, command line scripting and batching
- Provisioning systems: Puppet, Dockerfile
- Virtualization with VirtualBox and Vagrant
- Containerization with Docker

More details at the GTPB website:
http://gtpb.igc.gulbenkian.pt/bicourses/ARANGS16/

Best wishes
Pedro Fernandes

--
Pedro Fernandes
GTPB Coordinator
Instituto Gulbenkian de Ciência
Apartado 14
2781-901 OEIRAS
PORTUGAL
Tel +351 21 4407912
http://gtpb.igc.gulbenkian.pt

_______________________________________________
GTPB mailing list
GTPB@igc.gulbenkian.pt
https://lists.igc.gulbenkian.pt/mailman/listinfo/gtpb