Okay, that's good.
Meanwhile, for the sake of documenting the process, here's Madeleine's email. I'll start with this, and leave some of the other bits out for another iteration (like CCRs and Samples).
Yes, a REST API representing public profile content would be great! I prefer JSON to XML (pretty sure this is also easy for you to do). The JSON should identify the source of each file.
We should still also distinguish sources on the HTML version of the page, though; I think it's user-unfriendly not to be clear about that.
Lots of public profile data could (should?) be represented in a JSON version. My current focus is on getting:
* list of links to genome data files generated by the PGP (i.e. PGP is source), plus some descriptor of file type, e.g. "Complete Genomics var file", "Complete Genomics masterVarBeta file" (you might not have data types recorded in this much detail, so it's ok if I have to continue inferring data type)
* JSON-format survey data, including for each: name of survey, timestamp, and a list of question and response values
My current code infers all this from the HTML content, but scraping is fragile, so JSON would be great to get. It's okay if I still have to do some inference (e.g. inferring file types from the file title); just avoiding parsing this from HTML would be an important improvement.
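To make the request concrete, here is a rough sketch of what a JSON profile covering the two items above might look like, and how a client could read it instead of scraping HTML. Nothing here is an existing API: the field names (`files`, `source`, `data_type`, `download_url`, `surveys`, `responses`), the example URLs, and the helper functions are all assumptions made up for illustration.

```python
import json

# Hypothetical payload: every field name and value below is an illustrative
# assumption, not a response from any existing endpoint.
EXAMPLE_PROFILE_JSON = """
{
  "files": [
    {"name": "var-example.tsv.bz2",
     "source": "PGP",
     "data_type": "Complete Genomics var file",
     "download_url": "https://example.org/files/1"},
    {"name": "participant-upload.txt",
     "source": "participant",
     "data_type": null,
     "download_url": "https://example.org/files/2"}
  ],
  "surveys": [
    {"name": "Example survey",
     "timestamp": "2013-01-01T00:00:00Z",
     "responses": [
       {"question": "Example question?", "answer": "Example answer"}
     ]}
  ]
}
"""


def pgp_genome_files(profile):
    """Return (download_url, data_type) pairs for files whose source is the PGP."""
    return [
        (f["download_url"], f.get("data_type"))
        for f in profile.get("files", [])
        if f.get("source") == "PGP"
    ]


def survey_responses(profile):
    """Map each survey name to (timestamp, [(question, answer), ...])."""
    return {
        s["name"]: (
            s["timestamp"],
            [(r["question"], r["answer"]) for r in s["responses"]],
        )
        for s in profile.get("surveys", [])
    }


profile = json.loads(EXAMPLE_PROFILE_JSON)
print(pgp_genome_files(profile))
print(survey_responses(profile))
```

A `data_type` of `null` leaves room for the case mentioned in the email where the file type is not recorded and the client keeps inferring it from the file title.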