Feature #4978

Distinguish pgp-originated and other files on public profiles

Added by Ward Vandewege about 5 years ago. Updated almost 5 years ago.

Status: Resolved
Priority: Normal
Assigned To: Phil Hodgson
Category: -
Start date: 02/13/2015
Due date:
% Done: 67%
Estimated time: (Total: 0.00 h)
Story points: 0.5

Description

There is no way for the viewer of a public profile page to tell the difference between a user-uploaded file and a PGP-uploaded file.

Add a way to distinguish them, either as a column in the table or as two separate sections in the profile.


Subtasks

Task #5208: Review 4978 (Resolved, Ward Vandewege)

Task #5209: Show new column with file origin (Resolved, Phil Hodgson)

Task #5210: Render public profile as JSON (Resolved, Phil Hodgson)

Associated revisions

Revision 155a9e92
Added by Ward Vandewege almost 5 years ago

Also show source on the public genetic data page.

refs #4978

Revision 4c59eb76
Added by Ward Vandewege almost 5 years ago

List source as Participant or PGP

refs #4978

Revision c935f806
Added by Ward Vandewege almost 5 years ago

Merge branch 'tapestry-master' into 4978-public-profile-data

refs #4978

Revision 56accd66
Added by Ward Vandewege almost 5 years ago

Merge branch '4978-public-profile-data' into tapestry-master

closes #4978

Revision 5b6809b9
Added by Ward Vandewege almost 5 years ago

JSON improvements.

- Provide more information in the JSON version of the public genetic data page.
- Bring the data_source field in the JSON version of the public profiles
and the public genetic data page in line with the HTML version.

refs #4978

Revision b1dc1648
Added by Ward Vandewege almost 5 years ago

Merge branch '4978-public-profile-data' into tapestry-master

refs #4978

Revision 1d578e12
Added by Ward Vandewege almost 5 years ago

JSON improvements.

- Provide more information in the JSON version of the public profiles.

refs #4978

Revision 16c91da4
Added by Ward Vandewege almost 5 years ago

Merge branch '4978-public-profile-data' into tapestry-master

refs #4978

History

#1 Updated by Ward Vandewege about 5 years ago

  • Description updated

#2 Updated by Phil Hodgson about 5 years ago

So, my first question would be: what exactly is the difference?

#3 Updated by Phil Hodgson about 5 years ago

  • % Done changed from 0 to 80

I think I may have determined the difference, and I'm going to go with it as a working guess: the list of "user files and datasets" shows two basic types of files, ones from the UserFile model and ones from the Dataset model (on the Public Profile, also restricted by the "published" scope). I'm not sure whether the Dataset model implies that its source is the PGP, but it seems a reasonable guess. I'll make a commit that exposes the model class; that way it should be possible to distinguish them. I'm also going to throw in a JSON rendering of the Public Profile information, because it's easy for us to do and directly helpful for what's needed (i.e. no more scraping the public profile HTML!).
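A minimal sketch of the distinction described above, written in Python purely for illustration; it is not Tapestry's actual code. It assumes each listed file carries a hypothetical `model` field naming its backing class and a hypothetical `published` flag standing in for the Dataset "published" scope. The `Participant`/`PGP` labels and the `data_source` key follow revisions 4c59eb76 and 5b6809b9 above; treating every Dataset as PGP-originated is the working guess from this comment.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

# Assumed mapping from backing model class to the label shown to viewers.
# Treating Dataset as PGP-originated is the working guess from this thread.
SOURCE_BY_MODEL: Dict[str, str] = {
    "Dataset": "PGP",
    "UserFile": "Participant",
}


@dataclass
class FileRecord:
    name: str
    model: str              # hypothetical: class name of the backing record
    published: bool = True  # hypothetical stand-in for the Dataset "published" scope


def label_sources(records: Iterable[FileRecord]) -> List[Dict[str, str]]:
    """Return one row per publicly visible file, tagged with a data_source label."""
    rows = []
    for rec in records:
        # Only published Datasets appear on a public profile.
        if rec.model == "Dataset" and not rec.published:
            continue
        # Unrecognized model classes are flagged rather than guessed.
        rows.append({"name": rec.name,
                     "data_source": SOURCE_BY_MODEL.get(rec.model, "Unknown")})
    return rows


if __name__ == "__main__":
    # Illustrative file names only.
    files = [
        FileRecord("genome.tsv.bz2", "Dataset"),
        FileRecord("notes.txt", "UserFile"),
        FileRecord("draft.tsv.bz2", "Dataset", published=False),
    ]
    for row in label_sources(files):
        print(row["data_source"], row["name"])
```

Running it prints `PGP genome.tsv.bz2` and `Participant notes.txt`; the unpublished Dataset is skipped. Keying the label off the model class keeps the HTML column and the JSON `data_source` field driven by the same rule.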

#4 Updated by Ward Vandewege about 5 years ago

That distinction is correct. I also have another way of downloading each set of data almost ready, using Workbench: I've created projects for both kinds of data and am putting together some information so Madeleine can get at the data that way. Anyway, what you're doing is fine too; it will be good to indicate on the download page which files originate from Harvard PGP.

#5 Updated by Phil Hodgson about 5 years ago

Okay, that's good.

Meanwhile, for the sake of documenting the process, here's Madeleine's email. I'll start with this and leave some of the other bits (like CCRs and Samples) for another iteration.


Yes, a REST API representing public profile content would be great! I prefer JSON to XML (pretty sure this is also easy for you to do). That JSON should include the source of each file.

We should still distinguish sources on the HTML version of the page too, though; I think it's just user-unfriendly not to be clear about that...

Lots of public profile data could (should?) be represented in a JSON version. My current focus is on getting:

* a list of links to genome data files generated by the PGP (i.e. PGP is the source), plus some descriptor of file type, e.g. "Complete Genomics var file" or "Complete Genomics masterVarBeta file" (you might not record data types in this much detail, so it's okay if I have to keep inferring the data type)
* JSON-format survey data, including for each survey: its name, a timestamp, and a list of question and response values

My current code infers all this from the HTML content. But scraping is fragile, so JSON would be great to get. It's okay if I still have to make some inferences (e.g. infer file types from the file title); just avoiding parsing this from HTML would be an important improvement.
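On the consumer side, a sketch of how the requests in Madeleine's list could be met from JSON instead of scraped HTML. The JSON layout here (a `files` list with `name`, `url`, `data_source`; a `surveys` list with `name`, `timestamp`, `responses`) is an assumption, not the format that was actually shipped, and the title keywords used to infer file types are likewise guesses.

```python
import json
import re
from typing import List, Optional

# Hypothetical title keywords, checked in order. The masterVarBeta rule comes
# first so it wins over the generic "var" rule.
FILE_TYPE_RULES = [
    (re.compile(r"mastervarbeta", re.IGNORECASE), "Complete Genomics masterVarBeta file"),
    # "var" as a standalone token, so titles like "variants.vcf" do not match.
    (re.compile(r"(?<![a-z])var(?![a-z])", re.IGNORECASE), "Complete Genomics var file"),
]


def infer_file_type(title: str) -> Optional[str]:
    """Best-effort file type from a file title; None when no rule matches."""
    for pattern, label in FILE_TYPE_RULES:
        if pattern.search(title):
            return label
    return None


def pgp_files(profile: dict) -> List[dict]:
    """Files whose data_source is "PGP", each with an inferred file type."""
    return [
        {"name": f.get("name", ""), "url": f.get("url"),
         "file_type": infer_file_type(f.get("name", ""))}
        for f in profile.get("files", [])
        if f.get("data_source") == "PGP"
    ]


def surveys(profile: dict) -> List[dict]:
    """Survey name, timestamp, and (question, response) pairs."""
    return [
        {"name": s.get("name"), "timestamp": s.get("timestamp"),
         "responses": [(r.get("question"), r.get("response"))
                       for r in s.get("responses", [])]}
        for s in profile.get("surveys", [])
    ]


if __name__ == "__main__":
    # Illustrative document in the assumed layout, not real participant data.
    sample = json.loads("""
    {"files": [
       {"name": "var-example-ASM.tsv.bz2", "url": "https://example.org/1", "data_source": "PGP"},
       {"name": "masterVarBeta-example-ASM.tsv.bz2", "url": "https://example.org/2", "data_source": "PGP"},
       {"name": "my-notes.txt", "url": "https://example.org/3", "data_source": "Participant"}],
     "surveys": [
       {"name": "Example Survey", "timestamp": "2015-02-13T00:00:00Z",
        "responses": [{"question": "Q1", "response": "A1"}]}]}
    """)
    for f in pgp_files(sample):
        print(f["file_type"], f["url"])
    print(surveys(sample))
```

Returning `None` for an unmatched title, rather than a default label, leaves room for the "it's okay if I still have to do some inferences" fallback instead of silently mislabeling a file.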

#6 Updated by Phil Hodgson about 5 years ago

  • Assigned To set to Phil Hodgson

#7 Updated by Phil Hodgson about 5 years ago

Some useful work has been committed (45c0a79c) and pushed to the 4978-public-profile-data branch; it should help everyone concerned so far.

#8 Updated by Ward Vandewege almost 5 years ago

  • Status changed from New to Resolved
  • % Done changed from 33 to 100
