Arvados: Issues https://dev.arvados.org/ 2011-02-15T11:15:42Z Arvados Redmine
GET-Evidence - Bug #468 (New): latest-flat incorrectly displays non single-base substitution vari... https://dev.arvados.org/issues/468 2011-02-15T11:15:42Z Madeleine Ball mpball@gmail.com
<p>I haven't checked, but this problem likely extends to deletions and other new length-changing or multiple-aa variants.</p>
<p>This bug blocks creating a flatfile version of gff_getevidence_map.py</p>
<p>For example, FIG4-K278Shift is showing up like this (2nd and 3rd columns are incorrect):<br />FIG4 278 278 pathogenic Moderate clinical importance, Uncertain pathogenic recessive 0 0 0 0 0 3 Y -- - Y 0 Y 3 - 2 - Y - - - - Y 4 Y 2N 1 0 This variant is predicted to cause a frameshift and may cause Charcot-Marie-Tooth Disease Type 4J in an autosomal recessive manner. Other variants in this gene which cause frameshift and premature termination have been implicated in causing this disease when compound heterozygous with another FIG4 variant.</p>
<p>While MYL2-A13T shows up like this:<br />MYL2 Ala13Thr A13T pathogenic Low clinical importance, Uncertain pathogenic dominant 1 1 1 6 6 3 Y 4Y 0 Y ! Y 4 Y 3 Y Y Y - - Y Y 1 - Familial Hypertrophic Cardiomyopathy 4596 1 455 3.054 4 N 0 0 This rare variant is implicated in causing late-onset familial hypertrophic cardiomyopathy. The variant has been found in five affected Caucasian individuals (in four families), but affected non-carriers and unaffected carriers have also been observed. No statistically significant enrichment of this variant in cases vs. controls has been shown.</p>
GET-Evidence - Bug #466 (New): xmlrpc server is careless with huge temp files https://dev.arvados.org/issues/466 2011-02-11T17:42:02Z Tom Clegg tom@curii.com
<p>For example, if processing aborts due to insufficient disk space while writing a huge temp file, the huge temp file (and perhaps other temp files) is not deleted, and the disk stays full for no good reason.</p>
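<p>One way to make sure a temp file is removed when processing aborts is to wrap its lifetime in a context manager. The following is a minimal sketch, not the server's actual code; the helper name scratch_file is hypothetical, and cleanup of this kind only runs when the process exits through a normal exception path (not on SIGKILL).</p>

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def scratch_file(suffix="", dir=None):
    """Yield the path of a new temp file and delete it on exit, even on error.

    Hypothetical helper for illustration; not part of the existing code base.
    """
    fd, path = tempfile.mkstemp(suffix=suffix, dir=dir)
    os.close(fd)  # callers reopen the path themselves
    try:
        yield path
    finally:
        try:
            os.remove(path)
        except FileNotFoundError:
            pass  # already removed by the caller
```

<p>A caller would write "with scratch_file(suffix=".gff") as path:" around the step that fills the file, so an out-of-disk error raised inside the block still triggers the removal.</p>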
<p>Ideally, the processing code should use pipelines instead of temp files, or do "foreach record: foreach plugin: ..." instead of "foreach plugin: foreach record: ...".</p>
longupload - Bug #451 (New): handle xmlhttprequest timeouts gracefully https://dev.arvados.org/issues/451 2011-01-13T19:04:50Z Tom Clegg tom@curii.com
<p>Currently, the browser may stall for a long time if a "send block to server" operation fails. Some browsers might not even call the onreadystatechange handler. The "send block" code should use a timer to detect this condition and retry, as it already does after receiving a "fail" message from the server.</p>
GET-Evidence - Feature #426 (In Progress): Use compute cloud for back-end processing https://dev.arvados.org/issues/426 2010-11-28T20:28:11Z Ward Vandewege ward@curii.com
We need to modify the background processing code so it can run on a "fresh" node:
<ul>
<li>Pre-process reference data (refFlat, hg18.2bit, hg19.2bit) and put it in warehouse storage</li>
<li>Make mr-get-evidence wrapper:
<ul>
<li>in step 0, scan the input, queue 1 jobstep per chromosome, and output the comments/metadata</li>
<li>fetch/extract the reference data (if not already extracted by previous jobstep)</li>
<li>grep for the desired chromosome, sort, do the rest of the processing</li>
</ul></li>
</ul>
<p>We should still support single-node installations. For this case we need a mechanism to prevent the server from overtaxing itself if many jobs are submitted at once (e.g., by default, max # concurrent jobs = # cpus).</p>
<ul>
<li>Possible solution: Try to flock() one of N lockfiles in /home/trait/lock/slot.X. If all are already locked, wait a random number of seconds and try again. When a flock succeeds, start the job (pass the lock to the job process, so the lock is released when the process exits).</li>
</ul>
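<p>The lockfile idea above can be sketched roughly as follows, assuming a Linux host and Python's fcntl module. The function names, the slot.0 through slot.N-1 naming, and the retry interval are assumptions made for illustration.</p>

```python
import fcntl
import os
import random
import time

LOCK_DIR = "/home/trait/lock"  # directory named in the issue text


def try_acquire_slot(lock_dir, n_slots):
    """Try each slot once; return an open, locked file object, or None."""
    for i in range(n_slots):
        path = os.path.join(lock_dir, "slot.%d" % i)  # assumed naming scheme
        f = open(path, "a")
        try:
            # Non-blocking exclusive lock: raises OSError if the slot is held.
            fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            f.close()
            continue
        return f
    return None


def acquire_slot(lock_dir=LOCK_DIR, n_slots=None, max_wait=5.0):
    """Block until a slot is free, sleeping a random interval between tries."""
    if n_slots is None:
        n_slots = os.cpu_count() or 1  # default: one slot per CPU
    while True:
        f = try_acquire_slot(lock_dir, n_slots)
        if f is not None:
            return f
        time.sleep(random.uniform(0.1, max_wait))
```

<p>To hand the lock to the job process, the descriptor could be passed along with subprocess.Popen(cmd, pass_fds=[slot.fileno()]) and the parent's copy closed afterwards; the lock is then held until the job process exits.</p>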
<p>The xmlrpc server should be replaced with a job queue. The web gui should submit a job by inserting a row into a MySQL table.</p>
The background service (probably running on the same machine as the webgui) will check the queue every few seconds (and when triggered by webgui via named socket or something). For each job in the queue:
<ul>
<li>Just delete it if we've already started/queued a process for this dataset.</li>
<li>If cloud processing is available, submit a batch job and record its job number J and queue time.</li>
<li>Start a local job if local processing slots are available <em>and</em>...
<ul>
<li>cloud processing is not available, <em>or</em></li>
<li>a batch job was submitted for this data set but failed, <em>or</em></li>
<li>a batch job was submitted for this data set more than 30 seconds ago <em>and</em> that job hasn't started yet (cloud is busy)</li>
</ul>
</li>
<li>If the batch job J for this data set has succeeded:
<ul>
<li>Make a symlink or something in {hash}-out/ so the web gui knows the results are available.</li>
<li>Delete the queue entry.</li>
<li>If there are some results in {hash}-out/ns.gff.gz etc. from previous analyses, delete them.</li>
<li>Get a local copy of the get-evidence.json file from the warehouse, but wait to get the other stuff from the warehouse until someone downloads them.</li>
</ul></li>
</ul>
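<p>The rule for when the background service starts a local job can be written as a small predicate. This is a sketch under assumptions: the QueueEntry fields and the batch state names ("queued", "running", "failed", "succeeded") are hypothetical, and only the 30-second threshold comes from the description above.</p>

```python
import time
from dataclasses import dataclass
from typing import Optional

CLOUD_START_GRACE_SECONDS = 30  # threshold from the issue text


@dataclass
class QueueEntry:
    """Hypothetical in-memory view of one row of the job queue."""
    dataset_hash: str
    batch_state: Optional[str] = None       # None if no batch job was submitted
    batch_queued_at: Optional[float] = None  # Unix time of submission


def should_start_local(entry, cloud_available, local_slot_free, now=None):
    """Return True if a local job should be started for this queue entry."""
    if not local_slot_free:
        return False
    if not cloud_available:
        return True
    if entry.batch_state == "failed":
        return True
    if entry.batch_state == "queued" and entry.batch_queued_at is not None:
        now = time.time() if now is None else now
        # The cloud is busy: the batch job has waited too long to start.
        return now - entry.batch_queued_at > CLOUD_START_GRACE_SECONDS
    return False
```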
Storage:
<ul>
<li>Copy the uploaded data to the cloud in the background service, while checking for new items in the queue. Make a symlink genotype.gff.archive -> warehouse:///{hash}/input.gff.gz</li>
<li>If user provides a warehouse:/// url instead of file:///, just make the genotype.gff.archive symlink instead of copying the file to local storage.</li>
</ul>
GET-Evidence - Feature #490 (In Progress): Classify web-hits as relevant/not-relevant https://dev.arvados.org/issues/490 2010-05-10T09:25:31Z Madeleine Ball mpball@gmail.com
<p>Each time a variant is seen in a new genome, it should be queued for web-search. The search should take place on chr position, rsID, and when available gene/amino acid change (both one letter and three letter abbreviations).</p>
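<p>As an illustration of the one-letter and three-letter forms mentioned above, the following sketch expands a short amino acid change such as A13T to Ala13Thr and builds a list of search terms. The function names and the exact shape of the search strings are assumptions; only simple substitutions are handled, and anything else (for example K278Shift) is left without a long form.</p>

```python
import re

# Standard one-letter to three-letter amino acid codes.
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def long_form(short):
    """Expand e.g. 'A13T' to 'Ala13Thr'; return None if not a substitution."""
    m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", short)
    if m is None:
        return None
    ref, pos, var = m.groups()
    if ref not in ONE_TO_THREE or var not in ONE_TO_THREE:
        return None
    return ONE_TO_THREE[ref] + pos + ONE_TO_THREE[var]


def search_terms(gene, aa_change, rsid=None, chrom=None, pos=None):
    """Collect search strings for one variant (term format is assumed)."""
    terms = []
    if chrom is not None and pos is not None:
        terms.append("%s:%d" % (chrom, pos))
    if rsid:
        terms.append(rsid)
    terms.append("%s %s" % (gene, aa_change))
    longer = long_form(aa_change)
    if longer:
        terms.append("%s %s" % (gene, longer))
    return terms
```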
<p>Logged in users should be able to classify web-search results as relevant/not-relevant. (Incrementing a counter for "relevant" or "not relevant"; user can change their mind but only one "vote" per user!)</p>
<p>Implementation notes:</p>
<ul>
<li>Add columns to flat_summary table for autoscore, web hits, genome hits, webscore... and refresh it</li>
<li>Relax web search criteria: include all single-genome-hit variants</li>
<li>Add long form AA (and rsid where applicable) to search terms in web search</li>
<li>Requeue old web searches (to pick up rsid and long form AA results)</li>
<li>Web hit vote history: {variant, url, oid, timestamp, score}</li>
<li>Web hit current vote: {variant, url, score}</li>
<li>UI: "vote yes" and "vote no" buttons (immediate ajax call)</li>
<li>During vote event, update flat summary if webscore has changed as a result</li>
<li>Tie = relevant, otherwise majority</li>
</ul>
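<p>The voting rules in these notes (one vote per user, a user may change their mind, tie counts as relevant) can be sketched as a tally over the vote history. The Vote tuple mirrors the {variant, url, oid, timestamp, score} record above; treating oid as the voter's identifier, scoring 1 for relevant and 0 for not relevant, and treating "no votes yet" as a tie are assumptions of this sketch.</p>

```python
from collections import namedtuple

# One row of the vote history; oid is assumed to identify the voter.
Vote = namedtuple("Vote", "variant url oid timestamp score")


def latest_votes(history, variant, url):
    """Map each voter to their most recent score for this (variant, url)."""
    latest = {}
    for v in sorted(history, key=lambda v: v.timestamp):
        if v.variant == variant and v.url == url:
            latest[v.oid] = v.score  # later votes overwrite earlier ones
    return latest


def current_score(history, variant, url):
    """Return 1 (relevant) or 0 (not relevant); a tie counts as relevant."""
    scores = list(latest_votes(history, variant, url).values())
    yes = sum(1 for s in scores if s == 1)
    no = sum(1 for s in scores if s == 0)
    return 1 if yes >= no else 0
```

<p>Comparing current_score before and after a vote event would show whether the flat summary needs to be updated.</p>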
<p>Questions:</p>
<ul>
<li>Is webscore=0 (not relevant) suitable for variants with no web search results?</li>
</ul>
GET-Evidence - Feature #488 (New): Allow non-nsSNP variants https://dev.arvados.org/issues/488 2010-05-08T16:02:23Z Tom Clegg tom@curii.com
Need to stop restricting database to nsSNP variants.
<ul>
<li>Support lookup/create by dbSNP id</li>
<li>Support lookup/create by {chromosome, position, ref_seq, variant_seq}</li>
</ul>
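<p>A lookup key built from {chromosome, position, ref_seq, variant_seq} might look like the sketch below. It takes one possible answer to the strand and reference questions raised in the considerations for this feature: minus-strand alleles are rewritten on the plus strand, and a reference identifier can be prefixed. The function names are hypothetical, and the position is left unchanged, which is only safe for single-base alleles.</p>

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def variant_key(chromosome, position, ref_seq, variant_seq,
                strand="+", reference=None):
    """Build a plus-strand key such as 'chr1:1234:T:G'.

    Assumption: position already refers to the plus strand, so only the
    alleles are converted when strand is '-'.
    """
    if strand == "-":
        ref_seq = reverse_complement(ref_seq)
        variant_seq = reverse_complement(variant_seq)
    parts = [chromosome, str(position), ref_seq, variant_seq]
    if reference:
        parts.insert(0, reference)  # e.g. 'hg18'
    return ":".join(parts)
```

<p>With this normalization, variant_key("chr1", 1234, "A", "C", strand="-") and variant_key("chr1", 1234, "T", "G") produce the same key, so both descriptions would resolve to one record.</p>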
Considerations:
<ul>
<li>Is "strand" helpful? Or should "chr1:1234:-:A:C" just be described as "chr1:1234:T:G"?</li>
<li>How to deal with different references?
<ul>
<li>Include a reference identifier in database key -- "hg18:chr:pos:refseq:varseq" </li>
<li>Extend database schema to include multiple identifiers per locus, so "hg18:chr1:1234" and "hg19:chr1:1235" can map to the same variant ID</li>
</ul>
</li>
<li>Ensure only one page per variant. E.g., if rs1234 maps to chr1:1234:T:G and causes ABCD-A1T, then there should only be one GET-Evidence page regardless of which order those identifiers are looked up / used to create a page.</li>
</ul>
GET-Evidence - Feature #486 (New): support publications without PMIDs (other namespaces? original... https://dev.arvados.org/issues/486 2010-05-07T18:53:26Z Tom Clegg tom@curii.com
<p>support publications without PMIDs (other namespaces? original contributions? OWW?)</p>
GET-Evidence - Feature #485 (New): Don't let users add variant pages for genes that aren't in kno... https://dev.arvados.org/issues/485 2010-05-07T18:53:13Z Tom Clegg tom@curii.com
<p>(If they're not in knownGene, they can never appear in variant reports...)</p>
GET-Evidence - Feature #484 (New): Figure out better solution to HNF1A-Ser574Gly (genomes) vs. HN... https://dev.arvados.org/issues/484 2010-05-07T18:52:55Z Tom Clegg tom@curii.com
<p>Figure out better solution to HNF1A-Ser574Gly (genomes) vs. HNF1A-Gly574Ser (omim)</p>
GET-Evidence - Feature #483 (New): Handle/prevent edit conflicts more effectively https://dev.arvados.org/issues/483 2010-05-07T18:52:35Z Tom Clegg tom@curii.com
<ul>
<li>detect “version you’re editing already superseded” and ask user what to do</li>
<li>detect “someone else is editing this page/section” if logged in</li>
</ul>
GET-Evidence - Feature #482 (New): add “affects self (hom or dominant)” checkbox (vs “affects off... https://dev.arvados.org/issues/482 2010-05-07T18:51:38Z Tom Clegg tom@curii.com
<p>add “affects self (hom or dominant)” checkbox (vs “affects offspring”) on result page</p>
GET-Evidence - Feature #481 (New): Test Internet Explorer, possibly use Chrome Frame to make edit... https://dev.arvados.org/issues/481 2010-05-07T18:50:36Z Tom Clegg tom@curii.com
<p>Test Internet Explorer, possibly use Chrome Frame to make editing features work in IE</p>
GET-Evidence - Bug #479 (New): NSF-Lys702Asn shows wrong hapmap frequency https://dev.arvados.org/issues/479 2010-05-07T18:48:33Z Tom Clegg tom@curii.com
<p>NSF-Lys702Asn shows wrong hapmap frequency</p>
GET-Evidence - Bug #478 (New): Keep "overall odds ratio" updated when editing individual sets of ... https://dev.arvados.org/issues/478 2010-05-07T18:47:29Z Tom Clegg tom@curii.com
<p>Keep "overall odds ratio" updated when editing individual sets of OR figures</p>
GET-Evidence - Feature #476 (New): allow “lab members” group to upload/view publication PDFs https://dev.arvados.org/issues/476 2010-05-07T17:31:32Z Tom Clegg tom@curii.com
<p>allow “lab members” group to upload/view publication PDFs</p>