Bryan Cosca, 03/16/2015 07:53 PM
Running GATK2 using Arvados
This tutorial demonstrates how to run the GATK UnifiedGenotyper pipeline using GenomeAnalysisTK-2.2-16 from the Broad Institute.
Uploading data through the web and using it on Arvados
- In your home project, click on the blue + Add data button in the top right.
- Click Upload files from my computer.
- Click Choose Files and select one set of paired-end FASTQ files that you would like to run GATK2 on.
- When you are ready, click > Start to begin the upload.
- Optionally, rename your collection so that you can find it later: click the pencil icon next to New collection in the top left corner.
- Once that is uploaded, navigate back to the dashboard and click on Run a pipeline... and choose GATK2 / exome PE fastq to snp [dev] [public].
- You can change the input by clicking on the [Choose] button next to the Sample FASTQ reads.
- Click the dropdown menu, choose the project that holds your upload (your home project, if you followed the steps above), and select your input collection. Click OK, then Run, to run GATK2 on your data!
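Before uploading, it can help to confirm that the files you pick are the two mates of a single paired-end sample. The following is a minimal Python sketch, not part of Arvados or GATK: it assumes the common `<sample>_R1.fastq[.gz]` / `<sample>_R2.fastq[.gz]` naming convention, and both that convention and the helper name `find_fastq_pairs` are hypothetical, so adjust the pattern to however your sequencer names its output.

```python
import re
from pathlib import Path

# Hypothetical naming convention (an assumption, not an Arvados requirement):
#   <sample>_R1.fastq[.gz] and <sample>_R2.fastq[.gz], or .fq instead of .fastq
PAIR_RE = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])\.(?:fastq|fq)(?:\.gz)?$")


def find_fastq_pairs(filenames):
    """Group filenames into {sample: (read1, read2)}; raise if a mate is missing."""
    mates = {}
    for name in filenames:
        m = PAIR_RE.match(Path(name).name)
        if m is None:
            continue  # not a FASTQ file under the assumed naming convention
        mates.setdefault(m["sample"], {})[m["read"]] = name
    pairs = {}
    for sample, reads in sorted(mates.items()):
        if set(reads) != {"1", "2"}:
            raise ValueError(f"sample {sample!r} is missing a mate: {sorted(reads)}")
        pairs[sample] = (reads["1"], reads["2"])
    return pairs


if __name__ == "__main__":
    files = ["sample1_R1.fastq.gz", "sample1_R2.fastq.gz", "notes.txt"]
    print(find_fastq_pairs(files))
```

Files that do not match the pattern are ignored rather than rejected, so a folder that also contains notes or checksums still passes; a sample with only one mate raises an error instead of being uploaded silently.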
Uploading data through your shell and using it on Arvados
Full documentation can be found here.
- Install the Arvados Python SDK on the system from which you will upload the data (such as your workstation, or a server containing data from your sequencer). Doing so will install the Arvados file upload tool, arv-put.
- To configure the environment with the Arvados instance host name and authentication token, see here.
- Navigate back to your Workbench dashboard, open the Projects dropdown menu, and click Home.
- Click [+ Add a subproject] to create a new project. Feel free to edit the project name or description by clicking the pencil icon to the right of the text.
- To add data, return to your shell, create a folder, and put the two paired-end FASTQ files you want to upload inside it. From that folder, run arv-put * --project-uuid qr1hi-xxxxx-yyyyyyyyyyyyyyy, replacing the placeholder with the UUID of your new project, which appears in the project's URL (qr1hi is the prefix of that UUID). Uploading all the files with a single arv-put command puts them in one collection.
- The output value xxxxxxxxxxxxxxxxxxxx+yyyy is the Arvados collection locator that uniquely identifies this collection.
- Once that is uploaded, navigate back to the dashboard and click on Run a pipeline... and choose GATK2 / exome PE fastq to snp [dev] [public].
- You can change the input by clicking on [Choose] next to the Sample FASTQ reads.
- Click on the dropdown menu, click on your newly-created project, and choose your desired input collection. Click OK and Run to run GATK2 on your data!
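The shell steps above can also be scripted. The following is a minimal Python sketch, not an Arvados tool: it pulls a UUID-shaped token out of a Workbench project URL, builds the same `arv-put <files> --project-uuid <uuid>` command used above, and splits the printed locator into its parts. It assumes UUIDs have the five-five-fifteen lowercase alphanumeric shape of `qr1hi-xxxxx-yyyyyyyyyyyyyyy`, and that the locator is a 32-digit hexadecimal MD5 digest, a `+`, and a size in bytes. Some arv-put versions may print a collection UUID instead, in which case `parse_locator` will reject it. The helper names and the example URL are hypothetical.

```python
import re
import shlex

# Arvados UUIDs look like qr1hi-xxxxx-yyyyyyyyyyyyyyy:
# a 5-character prefix, a 5-character code, and a 15-character identifier.
UUID_RE = re.compile(r"\b[a-z0-9]{5}-[a-z0-9]{5}-[a-z0-9]{15}\b")
# Assumed locator shape: 32 hex digits (MD5 digest), "+", size in bytes.
LOCATOR_RE = re.compile(r"^(?P<md5>[0-9a-f]{32})\+(?P<size>\d+)$")


def project_uuid_from_url(url):
    """Return the first UUID-shaped token in a Workbench project URL."""
    match = UUID_RE.search(url)
    if match is None:
        raise ValueError(f"no Arvados UUID found in {url!r}")
    return match.group(0)


def arv_put_command(files, project_uuid):
    """Build the argv list for: arv-put <files> --project-uuid <uuid>."""
    if not UUID_RE.fullmatch(project_uuid):
        raise ValueError(f"not an Arvados UUID: {project_uuid!r}")
    return ["arv-put", *files, "--project-uuid", project_uuid]


def parse_locator(locator):
    """Split a collection locator into (MD5 hex digest, size in bytes)."""
    match = LOCATOR_RE.match(locator.strip())
    if match is None:
        raise ValueError(f"unexpected locator format: {locator!r}")
    return match["md5"], int(match["size"])


if __name__ == "__main__":
    # Hypothetical Workbench URL; copy the real one from your browser.
    url = "https://workbench.example.org/projects/qr1hi-xxxxx-0123456789abcde"
    uuid = project_uuid_from_url(url)
    argv = arv_put_command(["sample1_R1.fastq.gz", "sample1_R2.fastq.gz"], uuid)
    print(shlex.join(argv))
    # prints: arv-put sample1_R1.fastq.gz sample1_R2.fastq.gz --project-uuid qr1hi-xxxxx-0123456789abcde
    # To actually upload, you could run:
    #   subprocess.run(argv, check=True, capture_output=True, text=True)
```

Passing an explicit file list instead of the shell's `*` keeps the upload limited to the files you checked, and validating the UUID before calling arv-put catches a mis-pasted URL before any data is sent.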
FAQ
WIP
Updated by Bryan Cosca almost 10 years ago · 5 revisions