LobSTR tutorial » History » Version 2

Bryan Cosca, 10/15/2014 02:36 PM

h1. Running lobSTR v.3 using Arvados

This tutorial demonstrates how to run the lobSTR pipeline on Arvados, using the example that Melissa Gymrek provides on her "usage page":http://melissagymrek.com/lobstr-code/usage.html. lobSTR is a tool for profiling Short Tandem Repeats (STRs) from high-throughput sequencing data. The lobSTR publication is available here: "Gymrek M, Golan D, Rosset S, & Erlich Y. lobSTR: A short tandem repeat profiler for personal genomes. Genome Research. 2012 April 22.":http://genome.cshlp.org/content/early/2012/04/19/gr.135780.111.abstract

This tutorial introduces the following Arvados features:

* How to run lobSTR v.3 using Arvados.
* How to access your pipeline results.
* How to browse and select your own input data for lobSTR and re-run the pipeline.

# Start by going to the "Curoverse":https://curoverse.com/ website and clicking the *Log In* button at the top. All Google and Google Apps accounts are currently supported: choose your account, and your Arvados account is created automatically and you are redirected to the "Arvados Workbench":https://workbench.qr1hi.arvadosapi.com/.
# In the *Active pipelines* panel, click the *Run a pipeline...* button. This opens a dialog box titled *Choose a pipeline to run*.
# Select *lobstr v.3* and click the *Next: choose inputs* button.  This will load a new page where you will supply the inputs for the pipeline.
# The default inputs from Melissa's git repository are already pre-loaded. Click on the *Run* button.  The page updates to show you that the pipeline has been submitted to run on the Arvados cluster.
# After the pipeline starts running, you can track its progress by watching the log messages from its jobs. This page refreshes automatically. When the pipeline completes successfully, a *complete* label appears in the job column. The current run time of each job, in CPU and clock hours, is also displayed. You can view individual job details by clicking on the job name.
# Once the job has finished, its output can be viewed to the right of the run time.
# Click the download button to the right of a file to download your results, or the magnifying glass to quickly view them.

h2. Uploading data and using it on Arvados

The full documentation for uploading data is available "here":http://doc.arvados.org/user/tutorials/tutorial-keep.html.

# Begin by installing the "Arvados Python SDK":http://doc.arvados.org/sdk/python/sdk-python.html on the system from which you will upload the data (such as your workstation, or a server containing data from your sequencer). This installs the Arvados file upload tool, @arv-put@.
# Navigate back to your Workbench dashboard and create a new project by opening the *Projects* dropdown menu and clicking *Home*.
# Click *+ Add a subproject*. You can edit the project name or description by clicking the pencil to the right of the text.
# To add data, run @arv-put@ with the name of your file and the option @--project-uuid qr1hi-xxxxx-yyyyyyyyyyyyyyy@. The project UUID can be found in the URL of your new project.
# The value that @arv-put@ prints, of the form xxxxxxxxxxxxxxxxxxxx+yyyy, is the Arvados collection locator that uniquely identifies this file.
# Once the upload finishes, navigate back to the dashboard, click *Run a pipeline...*, and choose *lobstr v.3*.
# Change the input by clicking *Choose* next to *Input fastq collection ID*.
# Open the dropdown menu, select the project you created, and choose your desired input collection. Click *OK* and then *Run* to run lobSTR v.3 on your data!
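The upload step above can be scripted. The following is a minimal Python sketch: it pulls the project UUID out of a Workbench project URL and assembles the matching arv-put command line, without running it. The only parts taken from this tutorial are the arv-put tool, its --project-uuid option, and the qr1hi-xxxxx-yyyyyyyyyyyyyyy placeholder. The /projects/ URL shape, the file name reads.fastq, and both helper function names are assumptions made for illustration.

```python
import re
import shlex

# Arvados UUIDs look like <5-char cluster>-<5-char type>-<15-char id>.
# The lookarounds stop the pattern from matching inside a longer token.
UUID_RE = re.compile(
    r"(?<![a-z0-9])[a-z0-9]{5}-[a-z0-9]{5}-[a-z0-9]{15}(?![a-z0-9])"
)


def project_uuid_from_url(url):
    """Return the first UUID-shaped token found in a Workbench URL.

    Hypothetical helper: it only checks the shape of the token, not
    whether the project actually exists on the cluster.
    """
    match = UUID_RE.search(url)
    if match is None:
        raise ValueError("no Arvados UUID found in %r" % url)
    return match.group(0)


def build_arv_put_command(path, project_uuid):
    """Assemble the arv-put argument list; nothing is executed here."""
    return ["arv-put", "--project-uuid", project_uuid, path]


# Placeholder UUID in the same shape as the one used in the steps above.
placeholder_uuid = "qr1hi-xxxxx-" + "y" * 15

# Assumed URL shape; copy the real one from your browser's address bar.
project_url = (
    "https://workbench.qr1hi.arvadosapi.com/projects/" + placeholder_uuid
)

command = build_arv_put_command(
    "reads.fastq",  # hypothetical input file name
    project_uuid_from_url(project_url),
)

# Print the command in a form that can be pasted into a shell.
print(shlex.join(command))
```

To perform the upload, paste the printed command into a shell, or pass the list to subprocess.run, on a machine where the Arvados Python SDK is installed.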