Task #13686

Common Workflow Language (CWL) CGF check

Added by Abram Connelly almost 6 years ago. Updated over 5 years ago.

Status:
Closed
Priority:
Normal
Assigned To:
Target version:
-

Description

This CWL pipeline should check the final CGF to make sure it's consistent with the input file (VCF, GFF, etc.).

The pipeline should:

  • Take in gVCFs or GFFs to compare against a set of CGFs
  • Take in the SGLF
  • Batch the conversion to minimize reloading the SGLF

A small test on two input datasets is enough for initial testing.

Actions #1

Updated by Abram Connelly over 5 years ago

Branch 13686-check-cgf-gff.

I'm restricting the scope to GFF. When we start processing more gVCF files in earnest, we can open another ticket to extend, update, and verify that the gVCF-to-CGF checks are working.

The check assumes the scatter is on chromosome. The check loads the SGLF for a particular tilepath and then checks a batch of CGF files (for that tilepath) at once.
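
To illustrate the batching idea, here is a minimal Python sketch. It is not the code on the branch: check_batches, load_sglf and check_one are hypothetical names, and the real SGLF loading and CGF comparison are passed in as callables. The only point it shows is that the SGLF is loaded once per tilepath and reused for every CGF in the batch.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def check_batches(
    tilepaths: Iterable[str],
    cgf_files: List[str],
    load_sglf: Callable[[str], object],
    check_one: Callable[[object, str, str], bool],
) -> Dict[Tuple[str, str], bool]:
    """Check every CGF against each tilepath, loading each SGLF only once.

    load_sglf and check_one are hypothetical stand-ins for the real
    SGLF loader and the per-file consistency check.
    """
    results: Dict[Tuple[str, str], bool] = {}
    for tilepath in tilepaths:
        # The expensive step: done once per tilepath, not once per CGF.
        sglf = load_sglf(tilepath)
        for cgf in cgf_files:
            results[(tilepath, cgf)] = check_one(sglf, tilepath, cgf)
    return results
```

With the loops the other way around (CGF outside, tilepath inside), the SGLF would be reloaded for every CGF, which is what the batching is meant to avoid.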

The GFF needs to be formatted properly, as I ran into issues when using tabix with the headers that came out of the Harvard PGP site. Specifically:

  • The GFF files should have no headers (no leading '#' lines)
  • The GFF files should be indexed with tabix (the .tbi tabix index file should be present).
  • The GFF files should have the same "basename", without the suffix .gff.gz, as the input CGF files.
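
The three requirements above can be checked before running the pipeline. The following is a rough Python sketch, not part of verify-cgf-gff.sh: gff_problems is a hypothetical helper, and the ".cgf" suffix for CGF files is an assumption (adjust cgf_suffix if the files are named differently). It only looks at the first line for a header and only tests that a .tbi file exists next to the GFF, not that the index is valid.

```python
import gzip
import os
from typing import List


def gff_problems(gff_path: str, cgf_path: str, cgf_suffix: str = ".cgf") -> List[str]:
    """Return a list of problems found with a GFF file (empty if none).

    cgf_suffix is an assumption about how the CGF files are named.
    """
    problems: List[str] = []

    # Requirement 1: no header, i.e. the file must not begin with a '#' line.
    # bgzip output is gzip-compatible, so the gzip module can read it.
    with gzip.open(gff_path, "rt") as fh:
        first_line = fh.readline()
    if first_line.startswith("#"):
        problems.append("GFF begins with a '#' header line")

    # Requirement 2: the tabix index must sit next to the GFF file.
    if not os.path.exists(gff_path + ".tbi"):
        problems.append("missing tabix index (.tbi)")

    # Requirement 3: same basename as the CGF once the suffixes are removed.
    gff_base = os.path.basename(gff_path)
    if gff_base.endswith(".gff.gz"):
        gff_base = gff_base[: -len(".gff.gz")]
    cgf_base = os.path.basename(cgf_path)
    if cgf_base.endswith(cgf_suffix):
        cgf_base = cgf_base[: -len(cgf_suffix)]
    if gff_base != cgf_base:
        problems.append("GFF basename does not match CGF basename")

    return problems
```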

As a technical note, the script (verify-cgf-gff.sh) does a find for the appropriate file, so the directory structure for the GFF files isn't so important.
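
For reference, the lookup amounts to something like the Python sketch below. This is an illustration of the idea only, not what the shell script actually runs: find_gff is a hypothetical name and the ".cgf" suffix is again an assumption. It walks the directory tree and returns the first file whose name is the CGF basename plus .gff.gz.

```python
import os
from typing import Optional


def find_gff(root: str, cgf_path: str, cgf_suffix: str = ".cgf") -> Optional[str]:
    """Search root recursively for the GFF matching a CGF file's basename.

    Returns the path of the first match, or None if nothing is found.
    cgf_suffix is an assumption about how the CGF files are named.
    """
    name = os.path.basename(cgf_path)
    if name.endswith(cgf_suffix):
        name = name[: -len(cgf_suffix)]
    target = name + ".gff.gz"

    # The directory layout does not matter: any depth under root is searched.
    for dirpath, _dirnames, filenames in os.walk(root):
        if target in filenames:
            return os.path.join(dirpath, target)
    return None
```

If the same basename appears in more than one directory, this returns whichever one the walk reaches first, so basenames should be unique under the search root.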

Actions #2

Updated by Abram Connelly over 5 years ago

  • Status changed from In Progress to Closed
  • Remaining (hours) set to 0.0