Pathomap tutorial » History » Version 9

Bryan Cosca, 02/13/2015 07:09 PM

h1. Running Pathomap using Arvados

This tutorial demonstrates how to run the Pathomap pipeline using the example that the Mason Lab provides on their "page":http://www.pathomap.org/. PathoMap is a research project by Weill Cornell Medical College to study the microbiome and metagenome of the built environment of NYC. The Pathomap publication is available here: "Afshinnekoo et al., Geospatial Resolution of Human and Bacterial Diversity with City-Scale Metagenomics, CELS 2015":http://dx.doi.org/10.1016/j.cels.2015.01.001. This tutorial introduces the following Arvados features:

* How to run Pathomap using Arvados.
* How to access your pipeline results.
* How to browse and select your own input data for Pathomap and re-run the pipeline.

# Start at the "Curoverse":https://curoverse.com/ website and click *Log In* at the top. All Google and Google Apps accounts are currently supported for authentication. When you choose a Google-based account, your Arvados account is created automatically and you are redirected to the "Arvados Workbench":https://workbench.qr1hi.arvadosapi.com/.
# In the *Active pipelines* panel, click on the *Run a pipeline...* button. Doing so opens a dialog box titled *Choose a pipeline to run*.
# Select *Mason Lab -- Ancestry Mapper (public)* and click the *Next: choose inputs* button. Doing so loads a new page where you supply the inputs for the pipeline.
# The default inputs from the Pathomap source code repository are pre-loaded. Click on the *Run* button. The page updates to show that the pipeline has been submitted to run on the Arvados cluster.
# After the pipeline starts running, you can track its progress by watching log messages from jobs. This page refreshes automatically. When the pipeline completes successfully, you will see a complete label in the job column. The current run time of the job in CPU and clock hours is also displayed. You can view individual job details by clicking on the job name.
# Once the job is finished, you can view its output to the right of the run time.
# Click on the download button to the right of the file to download your results, or on the magnifying glass to preview them.

h2. Uploading data through the web and using it on Arvados

# In your home project, click on the blue *+ Add data* button in the top right.
# Click *Upload files from my computer*.
# Click *Choose Files* and select the VCF file you would like to run Pathomap on.
# Once you're ready, click *> Start*.
# To rename your Collection so you can find it later, click on the pencil icon in the top left corner next to *New collection*.
# Go to the *Mason Ancestry Mapper* Project and select the 00-All.vcf.gz and 1KG chr1-22 VCF Collections.
# Move those Collections to your Home project.
# Select your VCF Collection together with the 00-All.vcf.gz and 1KG chr1-22 VCF Collections. Then, using the *Selection* dropdown menu on the left under *Data Collections*, select *Create new collection with selected collections*.
# Once that is complete, navigate back to the dashboard, click on *Run a pipeline...*, and choose *Mason Lab -- Ancestry Mapper (public)*.
# You can change the input by clicking on the *[Choose]* button next to *Input VCF file + 00-all.vcf + 1KG VCFs*.
# Click on the dropdown menu, click on Home, and choose your desired input collection. Click *OK* and *Run* to run Pathomap on your data!

h2. Uploading data through your shell and using it on Arvados

Full documentation can be found "here":http://doc.arvados.org/user/tutorials/tutorial-keep.html.

# Install the "Arvados Python SDK":http://doc.arvados.org/sdk/python/sdk-python.html on the system from which you will upload the data (such as your workstation, or a server containing data from your sequencer). Doing so will install the Arvados file upload tool, arv-put.
# Configure your environment with the Arvados instance host name and authentication token as described "here":http://doc.arvados.org/user/reference/api-tokens.html.
# Navigate back to your Workbench dashboard and create a new project by clicking on the *Projects* dropdown menu and clicking *Home*.
# Click on *[+ Add a subproject]*. Feel free to edit the Project name or description by clicking the pencil to the right of the text.
# To add data, return to your shell, create a folder, and put the VCF files you want to upload inside it. From that folder, run @arv-put * --project-uuid qr1hi-xxxxx-yyyyyyyyyyyyyyy@. The qr1hi-xxxxx-yyyyyyyyyyyyyyy value is the UUID of your new project and can be found in the URL of that project. Uploading the files with a single command ensures that they all end up in one collection.
# The output value xxxxxxxxxxxxxxxxxxxx+yyyy is the Arvados collection locator that uniquely describes your uploaded files.
# Once the upload is complete, navigate back to the dashboard, click on *Run a pipeline...*, and choose *Mason Lab -- Ancestry Mapper (public)*.
# You can change the input by clicking on *[Choose]* next to *Input VCF file + 00-all.vcf + 1KG VCFs*.
# Click on the dropdown menu, click on your newly-created project, and choose your desired input collection. Click *OK* and *Run* to run Pathomap on your data!
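The shell upload steps above can be sketched in Python as a pre-flight check. This is only an illustration under stated assumptions, not part of the Arvados SDK: the helper names (project_uuid_from_url, missing_settings, build_arv_put_command) are hypothetical, the UUID shape is inferred from the qr1hi-xxxxx-yyyyyyyyyyyyyyy placeholder used in this tutorial, and ARVADOS_API_HOST and ARVADOS_API_TOKEN are the two settings described on the API tokens page linked above. The sketch only assembles the arv-put command; it does not run it.

```python
import re
from pathlib import Path

# Assumed UUID shape, inferred from the tutorial's placeholder:
# 5-character cluster prefix, 5-character type code, 15-character identifier.
UUID_PATTERN = re.compile(r"\b[a-z0-9]{5}-[a-z0-9]{5}-[a-z0-9]{15}\b")

# Settings that arv-put needs in the environment.
REQUIRED_SETTINGS = ("ARVADOS_API_HOST", "ARVADOS_API_TOKEN")


def project_uuid_from_url(url):
    """Return the first UUID-shaped token found in a Workbench project URL."""
    match = UUID_PATTERN.search(url)
    if match is None:
        raise ValueError(f"no Arvados UUID found in {url!r}")
    return match.group(0)


def missing_settings(env):
    """List the required settings that are absent or empty in env (a mapping)."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]


def build_arv_put_command(folder, project_uuid):
    """Build one arv-put argument list covering every regular file in folder.

    Passing all files to a single arv-put call mirrors the `arv-put *` step,
    so the files end up in one collection.
    """
    if UUID_PATTERN.fullmatch(project_uuid) is None:
        raise ValueError(f"not a UUID-shaped value: {project_uuid!r}")
    files = sorted(str(path) for path in Path(folder).iterdir() if path.is_file())
    if not files:
        raise ValueError(f"no files to upload in {str(folder)!r}")
    return ["arv-put", *files, "--project-uuid", project_uuid]
```

One possible way to use it: call missing_settings(os.environ) first and stop if it returns any names, then pass the list from build_arv_put_command to subprocess.run(command, check=True) on a machine where arv-put is installed. Building the command as a list avoids shell quoting problems with unusual file names.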