Story #7455

Update Lightning v0.1 Specifications

Added by Sarah Guthrie almost 4 years ago. Updated over 1 year ago.

Status: Closed
Priority: Normal
Assigned To: Sarah Guthrie
Start date: 10/06/2015
Due date:
% Done: 100%
Estimated time: (Total: 6.50 h)
Story points: 1.0

Lightning.pdf (213 KB), Sarah Guthrie, 10/15/2015 11:21 PM

Subtasks

Task #7458: Review (Closed, Abram Connelly)

Task #7457: Respond to comments (Closed, Sarah Guthrie)


Related issues

Follows Lightning - Story #5590: Document Lightning Specifications (Closed, 09/15/2015)

Follows Lightning - Task #7299: Review documentation (Closed, 10/05/2015)

History

#1 Updated by Sarah Guthrie almost 4 years ago

  • Assigned To set to Sarah Guthrie
  • Description updated

#2 Updated by Sarah Guthrie almost 4 years ago

From Task #7299:

As requested, I'm focusing on the "Data Structures" and the "Lightning API" sections.

Some points:

  • All API queries should have an example that includes the request. I notice a lot of example responses but few requests. Even if the request is obvious, it should still be provided for context.
  • I don't know what POST /callsets is doing. If this is future functionality maybe it should be taken out of this spec. If you want to keep it please document what it's doing, what each of the parameters means and what the behavior is.
  • Searches can be specified and queried without any credential information? Doesn't this pose security risks? Won't searches accumulate? Why have this in the first place?
  • Provide a detailed description of the response to 8.2.2 POST /tile-variants/loci explaining what the vector ['hg19', 'chr1', 0, 3000, 3248] represents. A brief note such as "the third element is the index (0 in this case), followed by the start position and the end position (non-inclusive)" would do.
  • Provide examples for each of the data types in the Data Structures Specifications section.
  • What is the tag-set-integer from GET /tile-library/tag-sets/{tag-set-identifier} (section 8.1.4)? Give a brief explanation.
  • In section 8.1.5, make it clear that the "BRCA Lightning Instance" is a Lightning server only serving those two tile paths and nothing else.
  • What is the num-positions in 8.1.6? Explain briefly what that number represents.
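The reading of the 8.2.2 loci vector suggested above can be made concrete with a small typed wrapper. This is a sketch, assuming the five elements are, in order, assembly, chromosome, index, start position and end position (non-inclusive), exactly as the review proposes; the `Locus` type, `parse_locus` helper and field names are hypothetical and not part of the Lightning v0.1 spec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    """Hypothetical named view of a loci vector such as ['hg19', 'chr1', 0, 3000, 3248].

    Field order follows the reading suggested in the review; it is an
    assumption, not the published Lightning v0.1 definition.
    """
    assembly: str    # reference assembly, e.g. 'hg19'
    chromosome: str  # chromosome name, e.g. 'chr1'
    index: int       # the "index" element (0 in the example)
    start: int       # start position (inclusive)
    end: int         # end position (non-inclusive)

    @property
    def length(self) -> int:
        # Because the end is non-inclusive, the number of positions covered is end - start.
        return self.end - self.start


def parse_locus(vector: list) -> Locus:
    """Turn a raw 5-element loci vector into a Locus, checking its shape."""
    if len(vector) != 5:
        raise ValueError(f"expected 5 elements, got {len(vector)}")
    assembly, chromosome, index, start, end = vector
    if end < start:
        raise ValueError("end position must not precede start position")
    return Locus(assembly, chromosome, int(index), int(start), int(end))


locus = parse_locus(['hg19', 'chr1', 0, 3000, 3248])
print(locus.chromosome, locus.start, locus.end, locus.length)  # chr1 3000 3248 248
```

Naming each element this way is one option for making the response self-describing, so a reader does not need to look up the element order elsewhere.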

Maybe this is premature for this document, but concrete examples (via 'wget', 'curl', or some other tool) help with understanding. In general, I would strive to make each section or sub-section as self-contained as possible. I read API documents as a reference, often jumping straight to the section I want to understand, and usually an example is enough for me to see what a call does and how to use it. I consider it undesirable to bounce around the document looking up definitions or terminology just to understand what a particular API call does.

I also think we should hold off on providing facilities to create objects via the API. I'm happy to hear otherwise, but this has some implications that we might not want to address right now.

Otherwise it looks good as an initial API.

#3 Updated by Sarah Guthrie almost 4 years ago

  • Project changed from Curoverse Science to Lightning

#4 Updated by Sarah Guthrie almost 4 years ago

  • Status changed from New to In Progress

#5 Updated by Sarah Guthrie almost 4 years ago

Responses

All API queries should have an example that includes the request. I notice a lot of example responses but few requests. Even if the request is obvious, it should still be provided for context.

GET requests should not carry a body (HTTP gives a GET request body no defined semantics), so for GET calls I only included query parameters, where they exist.
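Even without a body, a GET request can still be shown in full, because everything it carries is in the URL. A minimal sketch using only Python's standard library; the host `http://localhost:8080/` and the `limit` query parameter are hypothetical placeholders, not documented Lightning values.

```python
from typing import Optional
from urllib.parse import urlencode, urljoin
from urllib.request import Request


def build_get_request(base_url: str, path: str, params: Optional[dict] = None) -> Request:
    """Build (but do not send) a GET request; its full URL is the complete request example."""
    url = urljoin(base_url, path)
    if params:
        # A GET carries its inputs in the path and query string, not in a body.
        url = f"{url}?{urlencode(params)}"
    return Request(url, method="GET")


# Hypothetical host and query parameter, for illustration only.
status_req = build_get_request("http://localhost:8080/", "/status")
print(status_req.get_method(), status_req.full_url)  # GET http://localhost:8080/status

search_req = build_get_request("http://localhost:8080/", "/searches", {"limit": 10})
print(search_req.full_url)  # http://localhost:8080/searches?limit=10
```

The printed URL can be pasted directly into `curl` or `wget`, which gives each GET call a concrete request example alongside its response.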

I don't know what POST /callsets is doing. If this is future functionality maybe it should be taken out of this spec. If you want to keep it please document what it's doing, what each of the parameters means and what the behavior is.

I'll put it in v0.1.1.

Searches can be specified and queried without any credential information? Doesn't this pose security risks?

Since Lightning is built on top of Arvados, I was thinking of using Arvados credentials to authorize calls to any of these APIs. Any suggestions?

Won't searches accumulate? Why have this in the first place?

Searches will accumulate, though it's possible to delete them. I wanted a RESTful way to search, regardless of how long the query takes, and POST requests seemed the most intuitive way to do that.

Provide a detailed description of the response to 8.2.2 POST /tile-variants/loci explaining what the vector ['hg19', 'chr1', 0, 3000, 3248] represents. A brief note such as "the third element is the index (0 in this case), followed by the start position and the end position (non-inclusive)" would do.

Good catch! That was supposed to be edited and moved to a v0.1.1 document.

Provide examples for each of the data types in the Data Structures Specifications section.

Done.

What is the tag-set-integer from GET /tile-library/tag-sets/{tag-set-identifier} (section 8.1.4)? Give a brief explanation.

Done.

In section 8.1.5, make it clear that the "BRCA Lightning Instance" is a Lightning server only serving those two tile paths and nothing else.

Done.

What is the num-positions in 8.1.6? Explain briefly what that number represents.

I added more of an explanation.

Maybe this is premature for this document, but concrete examples (via 'wget', 'curl', or some other tool) help with understanding.

That is pretty easy. Done.

In general, I would strive to make each section or sub-section as self-contained as possible. I read API documents as a reference, often jumping straight to the section I want to understand, and usually an example is enough for me to see what a call does and how to use it. I consider it undesirable to bounce around the document looking up definitions or terminology just to understand what a particular API call does.

Could you point out particular places that I should clarify?

I also think we should hold off on providing facilities to create objects via the API. I'm happy to hear otherwise, but this has some implications that we might not want to address right now.

POST /searches is only for searching the system. Is this acceptable as the only POST in this API version?

Next Step

I uploaded the required LaTeX PDF document, but it has some text-wrapping problems. For an HTML version that looks a lot better, follow this recipe using the curoverse/lightning GitHub repository:


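# run from the root of a local clone of the curoverse/lightning repository
git pull origin master
# install Sphinx and the Read the Docs theme
sudo apt-get install python-pip
sudo pip install Sphinx
sudo pip install sphinx_rtd_theme
# build the HTML documentation into docs/_build/html
cd docs
make html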

Open lightning/docs/_build/html/index.html in your browser. Enjoy!

#6 Updated by Sarah Guthrie almost 4 years ago

  • Target version changed from 2015-10-23 Lightning sprint to 2015-11-13 Lightning sprint

#7 Updated by Abram Connelly almost 4 years ago

The 'assembly-pdh' (doc/data_structures/v0.1.0.html#assembly) is a bit confusing. Maybe mention that it's the 'assembly-pdh' as returned by the 'GET /assemblies' call, where it's a required parameter?

The usage of 'GET /searches' is not very clear. To get the results of a search, do you first issue a 'POST /searches' request and then follow it up with a 'GET /searches/{search-id}'? This should probably be mentioned somewhere (in all '*/searches' queries, for example).
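The POST-then-GET flow asked about above, together with the accumulate-and-delete behavior described in comment #5, can be sketched with an in-memory stand-in for the server. Everything here is hypothetical: the `SearchStore` class, the "pending"/"done" status strings, the query shape, and the use of a delete operation are illustrative assumptions, not Lightning's implementation.

```python
import uuid
from typing import Any, Callable, Dict


class SearchStore:
    """In-memory stand-in for a server's /searches resource (hypothetical)."""

    def __init__(self, run_query: Callable[[dict], list]):
        self._run_query = run_query  # function that actually evaluates a query
        self._searches: Dict[str, Dict[str, Any]] = {}

    def post(self, query: dict) -> str:
        """Like POST /searches: register a search and return its new search-id."""
        search_id = str(uuid.uuid4())
        self._searches[search_id] = {"query": query, "status": "pending", "results": None}
        return search_id

    def process_pending(self) -> None:
        """Simulate the server finishing queued work, possibly long after the POST."""
        for search in self._searches.values():
            if search["status"] == "pending":
                search["results"] = self._run_query(search["query"])
                search["status"] = "done"

    def get(self, search_id: str) -> Dict[str, Any]:
        """Like GET /searches/{search-id}: report status and, once done, the results."""
        return self._searches[search_id]

    def delete(self, search_id: str) -> None:
        """Remove a finished search; without this, stored searches accumulate."""
        del self._searches[search_id]

    def __len__(self) -> int:
        return len(self._searches)


# Client flow: POST the search, then GET it by id until its status is "done".
store = SearchStore(run_query=lambda q: [x for x in range(10) if x % q["mod"] == 0])
sid = store.post({"mod": 3})
print(store.get(sid)["status"])   # pending
store.process_pending()
print(store.get(sid)["results"])  # [0, 3, 6, 9]
store.delete(sid)
print(len(store))                 # 0
```

Separating submission from retrieval is what lets a slow query finish on the server's schedule, at the cost of stored searches that someone has to clean up.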

Otherwise looks good.

#8 Updated by Abram Connelly almost 4 years ago

Not something I'm happy about, but apparently JSON doesn't allow single-quoted strings. See jquery-single-quote-in-json-response.

Examples that use single quotes should probably be switched to double quotes so that anyone copying and pasting them won't have a bad experience.
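One way to keep quoting mistakes out of the documentation is to generate each example from data and round-trip it through a strict parser. A minimal sketch with Python's standard `json` module; the payload is the GET /status example from this thread.

```python
import json

# A single-quoted example, as it might appear in a hand-written doc snippet.
bad_example = "{'api-version': '0.1.0'}"
try:
    json.loads(bad_example)
except json.JSONDecodeError as err:
    print("rejected:", err.msg)  # rejected: Expecting property name enclosed in double quotes

# Serializing Python data with json.dumps always produces double-quoted strings.
good_example = json.dumps({"api-version": "0.1.0"})
print(good_example)  # {"api-version": "0.1.0"}
```

Pasting the output of `json.dumps` into the docs, rather than typing JSON by hand, guarantees that copied examples parse.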

#9 Updated by Abram Connelly almost 4 years ago

The example at doc/api/v0.1.0.html#get-status needs quotes around the API version value (it's a string).

For example:

{ "api-version": "0.1.0" }
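A client can check the corrected GET /status body directly. This sketch (the `parse_api_version` name is hypothetical) parses the body, requires the version to be a JSON string, and splits it into integers so versions can be compared; it also shows that the unquoted form is not valid JSON at all.

```python
import json


def parse_api_version(body: str) -> tuple:
    """Return the 'api-version' of a GET /status body as a tuple of ints, e.g. (0, 1, 0)."""
    version = json.loads(body)["api-version"]
    if not isinstance(version, str):
        raise TypeError("api-version must be a JSON string")
    return tuple(int(part) for part in version.split("."))


print(parse_api_version('{ "api-version": "0.1.0" }'))  # (0, 1, 0)

# Without quotes, 0.1.0 is not a JSON value, so parsing fails outright.
try:
    parse_api_version('{ "api-version": 0.1.0 }')
except json.JSONDecodeError as err:
    print("rejected:", err.msg)
```

Comparing tuples such as `(0, 1, 0) < (0, 1, 1)` avoids the pitfalls of comparing version strings lexically.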

#10 Updated by Abram Connelly almost 4 years ago

GET /tile-library/tag-sets says

Returns a list of tag set unique identifiers (portable data hashes of the collection containing the tag set). This collection contains information about the tag set and the path dividers.

Why are two PDHs returned? Are there two whole-genome tag sets available? Give a brief description of what's being returned.

And what's meant by 'information about path dividers'?

#11 Updated by Abram Connelly almost 4 years ago

In the example for GET /tile-library/tag-sets/{tag-set-identifier}, the return values should be decimal integers, not hexadecimal integers (doc/api/v0.1.0.html#get-tile-library-tag-sets-tag-set-identifier).
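JSON numbers are decimal only (the JSON grammar has no hexadecimal literal), so hex values have to be converted before they go into an example response. A minimal sketch; the hex strings and the `values` key are hypothetical, since the original example values are not reproduced in this thread.

```python
import json

# Hypothetical hex strings, as they might have been copied into a doc example.
hex_values = ["0x1f", "0x2a0", "ff"]

# int(s, 16) accepts an optional "0x" prefix; json.dumps then writes plain decimal numbers.
decimal_values = [int(h, 16) for h in hex_values]
example = json.dumps({"values": decimal_values})  # "values" is a hypothetical key
print(example)  # {"values": [31, 672, 255]}

# A hex literal is not valid JSON, so a strict parser rejects it.
try:
    json.loads('{"values": [0x1f]}')
except json.JSONDecodeError:
    print("hex literal rejected")
```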

#12 Updated by Abram Connelly almost 4 years ago

These are some notes that should not affect the current specification. I'm placing them here to open a dialogue, to make sure these issues don't get lost, and to have an initial place for discussion (maybe we need to move them somewhere else later).

  • tile-library/tag-sets/:id/tile-variants will produce upwards of 9 GB of data (~46 characters per tile × ~20 tile variants per position × ~10M tiles ≈ 9.2 × 10^9 bytes). Enumerating every possible tile variant probably isn't what's wanted or needed, and pushing 9 GB of data over the wire is probably excessive for most scenarios.
  • I'm dubious about the utility of the tile-library/tag-sets/:id/tile-positions function. Most (all?) of the time it will be a straight enumeration from 0 to (# steps in the path - 1). It also enumerates all tile positions for every path, pushing ~150 MB for what is, most of the time, a simple enumeration.
  • We should settle on a consistent naming scheme for 'partial' tiles sooner rather than later. The issue is that tiles containing no-calls that sit at different tile positions (and so are different tiles) can hash to the same value. You (Sally) came up with the idea of concatenating the tile ID to the sequence before hashing, effectively adding the tile ID as 'salt' to ensure uniqueness. My vote is to keep the spirit of that idea by allowing 'n' in the body of the sequence where a no-call occurs, but filling in any no-calls that fall on the tags with the tag sequence, capitalized. For example, the sequence
    'anntgctggcaagtggtcagcaactggacctttgnnnnacagtacaatcacccctgccccactcctcccggccccacccccaggcagttaatgggagaagggaataactgtgtcactcctggcttccagttgctcatcttgctttaaattggaggcctctggggctgaaagaaactggacaaagtgtgctaagtagcctaatagggctggttctttttctgaaagttccctattgcagaaaaataannt'
    would turn into
    'aACtgctggcaagtggtcagcaactggacctttgnnnnacagtacaatcacccctgccccactcctcccggccccacccccaggcagttaatgggagaagggaataactgtgtcactcctggcttccagttgctcatcttgctttaaattggaggcctctggggctgaaagaaactggacaaagtgtgctaagtagcctaatagggctggttctttttctgaaagttccctattgcagaaaaataaAAt'
    (notice the capitalized sequence in the tags).
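The proposed partial-tile naming can be sketched as a small string transformation: no-calls in the body stay 'n', while no-calls that fall on a tag are replaced by that tag's base in upper case. This sketch assumes the start and end tag sequences for the tile position are known from the tile library and passed in; the function name is hypothetical, and hashing of the resulting name is left out.

```python
def name_partial_tile(seq: str, start_tag: str, end_tag: str) -> str:
    """Apply the proposed partial-tile naming convention to a lower-case tile sequence.

    No-calls ('n') inside either tag are filled in from the known tag sequence and
    upper-cased; no-calls in the body are left as 'n'. Called bases keep their
    original lower-case letter.
    """
    if len(seq) < len(start_tag) + len(end_tag):
        raise ValueError("sequence is shorter than its two tags")
    out = list(seq)
    end_offset = len(seq) - len(end_tag)
    for i, base in enumerate(start_tag):
        if out[i] == "n":
            out[i] = base.upper()
    for i, base in enumerate(end_tag):
        if out[end_offset + i] == "n":
            out[end_offset + i] = base.upper()
    return "".join(out)


# Toy example with hypothetical 3-base tags.
print(name_partial_tile("anngcgnntann", start_tag="act", end_tag="aat"))  # aCTgcgnntaAT
```

Filling tag no-calls from the position's own tags is what makes the name position-specific, which is the role the tile-ID 'salt' played in the original idea, while the capitals still record which bases were not actually called.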

#13 Updated by Abram Connelly almost 4 years ago

Errors should at least be discussed. If an error occurs, what happens? Is an HTTP error code returned? Or is 200 always returned, so that one has to inspect the message body? If so, there should be a standard response container that can carry an error message. If not, which HTTP error codes should be returned in which scenarios?
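One common convention, offered only as a possible starting point and not as anything the Lightning spec defines, answers both questions above: return a real HTTP status code and also wrap every body in one standard container with an optional error object. The container shape and the `make_response` helper are hypothetical.

```python
import json
from http import HTTPStatus
from typing import Any, Optional, Tuple


def make_response(status: HTTPStatus, data: Any = None,
                  message: Optional[str] = None) -> Tuple[int, str]:
    """Return (HTTP status code, JSON body) using one container shape for success and error."""
    body = {"data": data, "error": None}
    if status >= 400:
        # Errors carry the numeric code, the standard reason phrase, and a readable message.
        body["data"] = None
        body["error"] = {"code": int(status), "reason": status.phrase, "message": message}
    return int(status), json.dumps(body)


print(make_response(HTTPStatus.OK, {"api-version": "0.1.0"}))
print(make_response(HTTPStatus.NOT_FOUND, message="no such search-id"))
```

Clients can then branch on the status code alone, while the body keeps the same shape whether or not the call succeeded.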

Possible references (5-second Googling):

#14 Updated by Sarah Zaranek over 1 year ago

  • Status changed from In Progress to Closed

Closing this out.
