Bug #15358

[cwl] CWL conformance test formattest2 fails with C locale

Added by Tom Morris 7 months ago. Updated about 3 hours ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Start date: 07/03/2019
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: -
Release relationship: Auto

Description

When LANG=C is set instead of a UTF-8 locale such as en_US.UTF-8, the CWL conformance test v1.0/formattest2.cwl fails with an encoding error while reading EDAM.owl. The file contains UTF-8 characters but has no XML encoding declaration in its prolog.

Test 65 failed: /home/ci/arvados-cwl-runner-with-checksum.sh --outdir=/tmp/tmp7oGCB5 --quiet v1.0/formattest2.cwl v1.0/formattest2-job.json
Test format checking against ontology using subclassOf.
Returned non-zero
URI prefix 'edam' of 'edam:format_1929' not recognized, are you missing a $namespaces section?
Could not load extension schema keep:29dc87213e125b67355699e8953d3820+62/EDAM.owl: 'ascii' codec can't decode byte 0xc3 in position 3352: ordinal not in range(128)
ERROR Workflow execution failed:
Expected value of 'input' to have format http://edamontology.org/format_2330 but
  File has an incompatible format: {
    "format": "http://edamontology.org/format_1929", 
    "basename": "ref.fasta", 
    "nameroot": "ref", 
    "nameext": ".fasta", 
    "location": "keep:23b1d68b203d6c75f314fe9804f50c0e+59/ref.fasta", 
    "class": "File", 
    "size": 12010
}
ERROR Workflow error, try again with --debug for more information:
Workflow did not return a result.

The offending XML snippet is:

<dc:creator>Matúš Kalaš</dc:creator>
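
The failure mode can be illustrated outside of Arvados with a small Python sketch. It assumes the snippet above is stored as UTF-8 bytes, and the helper name `try_decode` is made up for illustration; it is not part of arvados-cwl-runner or cwltool. Decoding with the ASCII codec (the effective default under LANG=C in this report) fails on the first non-ASCII byte, while decoding as UTF-8 succeeds.

```python
# The creator line from EDAM.owl, encoded as UTF-8 bytes (assumed on-disk form).
snippet = "<dc:creator>Matúš Kalaš</dc:creator>".encode("utf-8")


def try_decode(data, encoding):
    """Return the decoded text, or an 'error: ...' string if decoding fails.

    Hypothetical helper for illustration only.
    """
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as err:
        return "error: %s" % err


# ASCII cannot represent the bytes of "ú" and "š", so this reports an error.
ascii_result = try_decode(snippet, "ascii")
# UTF-8 matches how the bytes were produced, so this returns the original text.
utf8_result = try_decode(snippet, "utf-8")

print(ascii_result)
print(utf8_result)
```

The position reported in the error differs from the one in the log above because only a single line of the file is decoded here.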

Subtasks

Task #15393: Review 15358-fetch-text-encoding - Resolved - Eric Biagiotti


Related issues

Related to Arvados - Bug #15655: [CWL] encoding error when printing error log tail - Resolved - 10/09/2019

Associated revisions

Revision 88af3b04
Added by Peter Amstutz 7 months ago

Merge branch '15358-fetch-text-encoding' closes #15358

Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <>

History

#1 Updated by Tom Morris 7 months ago

  • Subject changed from [cwl] CWL conformance test fails with C locale to [cwl] CWL conformance test formattest2 fails with C locale

#3 Updated by Peter Amstutz 7 months ago

This might just be an upstream conformance test fix to the XML file to declare the correct encoding (assuming the Python XML loader handles it).

#4 Updated by Tom Morris 7 months ago

  • Target version set to 2019-07-03 Sprint

#5 Updated by Tom Morris 7 months ago

Peter Amstutz wrote:

This might just be an upstream conformance test fix to the XML file to declare the correct encoding (assuming the Python XML loader handles it).

That may mask the bug, but since the default encoding for XML files is supposed to be UTF-8, it should work without an explicit declaration.
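
As a rough check of the point about the XML default, the following sketch feeds UTF-8 bytes with no encoding declaration to Python's standard `xml.etree.ElementTree` parser. The wrapper `<root>` element and the Dublin Core namespace binding are assumptions added so the fragment parses on its own; they are not copied from EDAM.owl. When the parser receives raw bytes it handles the encoding itself, so the locale does not come into play.

```python
import xml.etree.ElementTree as ET

# Minimal stand-in document: no <?xml ... encoding=...?> declaration.
# The <root> wrapper and namespace URI are illustrative assumptions.
doc = (
    '<root xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:creator>Matúš Kalaš</dc:creator>"
    "</root>"
).encode("utf-8")

# Passing bytes lets the XML parser apply the XML default (UTF-8)
# rather than relying on a locale-dependent text decoding step.
root = ET.fromstring(doc)
creator = root.find("{http://purl.org/dc/elements/1.1/}creator").text

print(creator)
```

This suggests the problem lies in a text-decoding step before the parser sees the data, rather than in the XML file itself.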

#6 Updated by Peter Amstutz 7 months ago

  • Assigned To set to Peter Amstutz

#7 Updated by Peter Amstutz 7 months ago

This might be a default encoding problem when reading from Keep; not sure.

#8 Updated by Peter Amstutz 7 months ago

  • Status changed from New to In Progress

#9 Updated by Peter Amstutz 7 months ago

  • Target version changed from 2019-07-03 Sprint to 2019-07-17 Sprint

#10 Updated by Peter Amstutz 7 months ago

To reproduce:

$ export LANG=C
$ arvados-cwl-runner formattest2.cwl formattest2-job.json 
INFO /home/peter/work/scripts/venv/bin/arvados-cwl-runner 1.4.0.20190627185953, arvados-python-client 1.4.0.20190627173408, cwltool 1.0.20190607183319
INFO Resolved 'formattest2.cwl' to 'file:///home/peter/work/common-workflow-language/v1.0/v1.0/formattest2.cwl'
INFO Upload local files: "ref.fasta" 
INFO Using collection 23b1d68b203d6c75f314fe9804f50c0e+59 (4xphq-4zz18-gdz4bibpfb0e5ko)
INFO Upload local files: "EDAM.owl" 
INFO Using collection 29dc87213e125b67355699e8953d3820+62 (4xphq-4zz18-qweb7yf0dbmqbir)
Could not load extension schema keep:29dc87213e125b67355699e8953d3820+62/EDAM.owl: 'ascii' codec can't decode byte 0xc3 in position 3352: ordinal not in range(128)
...

#11 Updated by Peter Amstutz 7 months ago

15358-fetch-text-encoding @ 927d62b545e90676bf4729b6c1ebee56d51eacbe

Add encoding option to CollectionFsAccess.open() and use in fetch_text()

https://ci.curoverse.com/view/Developer/job/developer-run-tests/1366/
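
The shape of the change described above can be sketched as follows. This is not the code from the branch: `InMemoryFsAccess` is a hypothetical stand-in for a file-access class, and its `open()` signature and this `fetch_text()` are assumptions made for illustration. The idea is that the caller passes an explicit encoding instead of inheriting whatever the locale implies.

```python
import io


class InMemoryFsAccess:
    """Hypothetical stand-in for a file-access layer (not the Arvados class)."""

    def __init__(self, files):
        self._files = files  # maps path -> raw bytes

    def open(self, path, mode="r", encoding=None):
        raw = io.BytesIO(self._files[path])
        if "b" in mode:
            # Binary mode: hand back the bytes untouched.
            return raw
        # Text mode: with encoding=None, TextIOWrapper uses a
        # locale-dependent default, which is the behavior to avoid.
        return io.TextIOWrapper(raw, encoding=encoding)


def fetch_text(fs_access, path):
    """Read a document as text, always decoding as UTF-8 (illustrative)."""
    with fs_access.open(path, "r", encoding="utf-8") as f:
        return f.read()


fs = InMemoryFsAccess(
    {"EDAM.owl": "<dc:creator>Matúš Kalaš</dc:creator>".encode("utf-8")}
)
text = fetch_text(fs, "EDAM.owl")
print(text)
```

Making the encoding an explicit argument keeps binary reads unchanged while letting text readers opt in to UTF-8 regardless of LANG.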

#13 Updated by Peter Amstutz 7 months ago

  • Status changed from In Progress to Resolved

#14 Updated by Peter Amstutz 4 months ago

  • Related to Bug #15655: [CWL] encoding error when printing error log tail added

#15 Updated by Peter Amstutz about 3 hours ago

  • Release set to 22
