Feature #4978


Distinguish PGP-originated and other files on public profiles

Added by Ward Vandewege over 8 years ago. Updated over 8 years ago.

Assigned To: Phil Hodgson
Estimated time: (Total: 0.00 h)
There is no way for the viewer of a public profile page to tell the difference between a user-uploaded file and a PGP-uploaded file.

Add a way to tell them apart, either as a column in the table or by splitting the profile into two sections.

Subtasks: 3 (0 open, 3 closed)

Task #5208: Review 4978 | Resolved | Ward Vandewege | 02/13/2015

Task #5209: Show new column with file origin | Resolved | Phil Hodgson | 02/13/2015

Task #5210: Render public profile as JSON | Resolved | Phil Hodgson | 02/13/2015

Actions #1

Updated by Ward Vandewege over 8 years ago

  • Description updated (diff)
Actions #2

Updated by Phil Hodgson over 8 years ago

So, my first question would be: what indeed is the difference?

Actions #3

Updated by Phil Hodgson over 8 years ago

  • % Done changed from 0 to 80

I think I may have determined the difference, and I'm going to use it as a working guess: the list of "user files and datasets" shows two basic types of files, ones from the UserFile model and ones from the Dataset model (in the case of the Public Profile, also using the "published" scope). I'm not sure whether the Dataset model implies that its source is the PGP, but it seems a reasonable guess. I'll make a commit that exposes the model class, and maybe that way it will be possible to distinguish them. I'm also going to throw in JSON rendering of the Public Profile information, because it's so easy for us to do and so helpful for what is needed (i.e. rather than scraping the public profile HTML!).
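A minimal sketch of the idea, not the actual commit: the application's UserFile and Dataset models are mimicked here with plain Python dataclasses. The field names (`name`, `published`), the `model_class` key, the helper `profile_files_json`, and the choice to apply the "published" filter to datasets only are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class UserFile:
    """Stand-in for a file uploaded by the participant (hypothetical fields)."""
    name: str


@dataclass
class Dataset:
    """Stand-in for a dataset record, guessed in this thread to come from the PGP."""
    name: str
    published: bool = False


def profile_files_json(user_files, datasets):
    """Serialize the public file list, tagging each entry with its model class."""
    # Assumption: only published datasets are shown on the public profile.
    visible = list(user_files) + [d for d in datasets if d.published]
    entries = [{"name": f.name, "model_class": type(f).__name__} for f in visible]
    return json.dumps({"files": entries}, indent=2)


if __name__ == "__main__":
    print(profile_files_json(
        [UserFile("uploaded genotype file")],
        [Dataset("genome var file", published=True), Dataset("unpublished draft")],
    ))
```

A consumer could then treat entries whose `model_class` is `Dataset` as PGP-originated, provided the guess about the Dataset model holds.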

Actions #4

Updated by Ward Vandewege over 8 years ago

That distinction is correct. I also have another way almost ready to download each set of data, by using workbench. I've created projects for both kinds of data, and am wrapping up some information for Madeleine to get at the data that way. Anyway, what you're doing is fine too; it will be good to indicate on the download page which files originate from Harvard PGP.

Actions #5

Updated by Phil Hodgson over 8 years ago

Okay, that's good.

Meanwhile, for the sake of documenting the process, here's Madeleine's email. I'll start with this, and leave some of the other bits out for another iteration (like CCRs and Samples).

Yes, a REST API representing public profile content would be great! I prefer JSON to XML (pretty sure this is also easy for you to do). That JSON should include distinguishing the source of a file.

We should still also distinguish sources on the HTML version of the page though, it's just user unfriendly not to be clear about that I think...

Lots of public profile data could (should?) be represented in a JSON version. My current focus is on getting:
* list of links to genome data files generated by the PGP (i.e. PGP is source), plus some descriptor of file type, e.g. "Complete Genomics var file", "Complete Genomics masterVarBeta file" (you might not have data types recorded in this much detail, so it's ok if I have to continue inferring data type)
* JSON-format survey data, including for each: name of survey, timestamp, and a list of question and response values

My current code infers all this from the HTML content. But scraping is fragile, JSON would be great to get. It's okay if I still have to do some inferences (e.g. infer file types based on the file title), just avoiding parsing this from HTML would be an important improvement.
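To illustrate what the request above would look like from the consuming side, here is a rough sketch of picking out PGP-sourced genome files from a JSON profile and inferring a file type from each title. The payload layout, the `source` and `url` keys, the example.org URLs, and both helper functions are hypothetical; the thread does not specify the real format.

```python
import json

# Hypothetical payload; the actual JSON layout is not defined in this thread.
SAMPLE = """
{"files": [
  {"name": "Complete Genomics var file", "source": "PGP", "url": "https://example.org/a"},
  {"name": "My upload", "source": "user", "url": "https://example.org/b"},
  {"name": "Complete Genomics masterVarBeta file", "source": "PGP", "url": "https://example.org/c"}
]}
"""

# Title substrings mapped to inferred types, most specific first.
TYPE_HINTS = [
    ("masterVarBeta", "Complete Genomics masterVarBeta file"),
    ("var", "Complete Genomics var file"),
]


def infer_file_type(title):
    """Guess a file type from its title; return None when nothing matches."""
    lowered = title.lower()
    for needle, file_type in TYPE_HINTS:
        if needle.lower() in lowered:
            return file_type
    return None


def pgp_genome_links(payload):
    """Return the URL and inferred type of every file whose source is the PGP."""
    profile = json.loads(payload)
    return [
        {"url": f["url"], "type": infer_file_type(f["name"])}
        for f in profile["files"]
        if f["source"] == "PGP"
    ]


if __name__ == "__main__":
    for link in pgp_genome_links(SAMPLE):
        print(link["url"], link["type"])
```

Compared with scraping the HTML, the only inference left here is the title-to-type mapping, which is the part the email says is acceptable to keep.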

Actions #6

Updated by Phil Hodgson over 8 years ago

  • Assigned To set to Phil Hodgson
Actions #7

Updated by Phil Hodgson over 8 years ago

Some useful work has been committed (45c0a79c) and pushed to the 4978-public-profile-data branch; it should help everyone involved so far.

Actions #8

Updated by Ward Vandewege over 8 years ago

  • Status changed from New to Resolved
  • % Done changed from 33 to 100
