Bug #13766

[API] bug in libyaml / Psych used by API server to parse yaml

Added by Peter Amstutz over 1 year ago. Updated over 1 year ago.

Status: In Progress
Priority: Normal
Assigned To:
Category: -
Target version:
Start date: 07/09/2018
Due date:
% Done: 50%
Estimated time: (Total: 0.00 h)
Story points: -

Description

Since 0.13.11 ruamel.yaml produces files like this:

  - id: #main/x
    type: File
    default: {class: File, location: keep:169f39d466a5438ac4a90e779bf750c7+53/blorp.txt,
      size: 16, basename: blorp.txt, nameroot: blorp, nameext: .txt}

The problem is that the plain scalar keep:169f39d466a5438ac4a90e779bf750c7+53/blorp.txt, emitted in "flow style", contains an embedded ':' character.

Relevant changelog:

(0.13.11) 2017-01-23:
  - allow ':' in flow style scalars if not followed by space. Also don't
    quote such scalar as this is no longer necessary.
  - add python 3.6 manylinux wheel to PyPI

This is explicitly allowed by the YAML spec, but the API server rejects it:

<HttpError 422 when requesting https://172.17.0.2:8000/arvados/v1/workflows?alt=json returned "Definition is not valid yaml abc: (<unknown>): found unexpected ':' while scanning a plain scalar at line 22 column 38">

This seems to be a bug in libyaml, which is used by Psych (the Ruby YAML engine). It was recently fixed:

https://github.com/yaml/libyaml/pull/104

However, at the time of this writing there is no stable release of libyaml with this bugfix, only a prerelease, 0.2.2-pre1.

Psych is part of the Ruby standard library. I don't know whether it is tied to a particular libyaml version, or how to change which libyaml gets used.

Options:

  • Wait for stable releases of libyaml / Psych (unknown amount of time)
  • Downgrade ruamel.yaml (messy)
  • Some kind of regex output fixup in a-c-r to ensure these values are quoted
  • Some kind of regex input fixup on api server to ensure these values are quoted
  • Emit plain block style yaml, not "round trip" (breaks tests, tedious but fixable)
  • Emit plain json (breaks tests, tedious but fixable)

Subtasks

Task #13769: Review 13766-libyaml-workaround (Resolved, Peter Amstutz)

Task #13848: Monitor dependencies for release of update (In Progress)


Related issues

Related to Arvados - Bug #13681: [CWL] Cannot set submit-runner-ram when running from Workbench (Resolved, 07/05/2018)

Associated revisions

Revision 04c664d6
Added by Peter Amstutz over 1 year ago

Merge branch '13766-libyaml-workaround' refs #13766

Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <>

History

#1 Updated by Peter Amstutz over 1 year ago

  • Status changed from New to In Progress

#2 Updated by Peter Amstutz over 1 year ago

  • Description updated (diff)

#3 Updated by Peter Amstutz over 1 year ago

  • Description updated (diff)

#4 Updated by Peter Amstutz over 1 year ago

  • Related to Bug #13681: [CWL] Cannot set submit-runner-ram when running from Workbench added

#5 Updated by Peter Amstutz over 1 year ago

  • Status changed from In Progress to New
  • Assigned To set to Peter Amstutz

#6 Updated by Peter Amstutz over 1 year ago

  • Wait for stable releases of libyaml / Psych (unknown amount of time)

We'll keep an eye on it, but I think this would be hairy for ops, because it requires compiling and packaging a C library for all our supported platforms.

  • Downgrade ruamel.yaml (messy)

This would revert the change, but may also revert other bugfixes we want. (A downgrade would also probably inflict a lot of pain on ops.)

  • Some kind of regex output fixup in a-c-r to ensure these values are quoted
  • Some kind of regex input fixup on api server to ensure these values are quoted

Hard to write a regex that works in every situation, since it isn't actually parsing YAML.
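
To illustrate the difficulty, here is a minimal sketch of what such a fixup might look like. The function name quote_keep_locations and the pattern are hypothetical, written only for this example; they are not code from a-c-r or the API server. The fixup handles the flow-style case from the description, but it corrupts a value that is already quoted, because it matches text rather than parsing YAML.

```python
import re

# Hypothetical pattern (an assumption for illustration): a keep: locator
# followed by any characters up to whitespace or a flow-style delimiter.
KEEP_REF = re.compile(r'keep:[0-9a-f]{32}\+\d+[^\s,}\]]*')


def quote_keep_locations(text):
    """Wrap every keep: locator found in text in double quotes."""
    return KEEP_REF.sub(lambda m: '"%s"' % m.group(0), text)


# Case 1: the bare flow-style scalar from the description is quoted as intended.
plain = ("default: {class: File, "
         "location: keep:169f39d466a5438ac4a90e779bf750c7+53/blorp.txt, size: 16}")
fixed = quote_keep_locations(plain)

# Case 2: a value that was already quoted gets a second pair of quotes,
# which is no longer the same YAML value.
already_quoted = 'location: "keep:169f39d466a5438ac4a90e779bf750c7+53/blorp.txt"'
broken = quote_keep_locations(already_quoted)

print(fixed)
print(broken)
```

The pattern could be tightened to skip quoted values, but block scalars, comments, and other scalars with an embedded ':' (URLs, for example) would each need their own special case.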

  • Emit plain block style yaml, not "round trip" (breaks tests, tedious but fixable)

Turns out yaml.safe_dump doesn't work because round_trip_load introduces object types that are not plain dicts, and yaml.dump() has the same problem as yaml.round_trip_dump().

  cannot represent an object: ordereddict([('class', 'CommandLineTool'), ('requirements', [ordereddict([('class', 'DockerRequirement'), ('dockerPull', 'debian:8')])]), ('inputs', [ordereddict([('id', u'#submit_tool.cwl/x'), ('type', 'File'), ('default', ordereddict([('class', 'File'), ('location', u'keep:5d373e7629203ce39e7c22af98a0f881+52/blub.txt')])), ('inputBinding', ordereddict([('position', 1)]))])]), ('outputs', []), ('baseCommand', 'cat'), (u'id', u'#submit_tool.cwl')])

  • Emit plain json (breaks tests, tedious but fixable)

This is the fallback solution.

#7 Updated by Peter Amstutz over 1 year ago

13766-libyaml-workaround @ 335ee76030e85fa6ac4da79b598cf4c4a212443d

  • Use json.dumps() instead of yaml.round_trip_dump()
  • Fix tests.
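
The following is a minimal sketch of why json.dumps() sidesteps the problem; it is not the code from the branch. It assumes the objects returned by round_trip_load are dict and list subclasses, and uses collections.OrderedDict as a stand-in so that only the standard library is needed. The variable names are made up for the example. JSON always double-quotes strings, so the keep: locator never appears as a plain scalar containing ':'.

```python
import json
from collections import OrderedDict

# Stand-in for the ordered mapping produced by round_trip_load
# (an assumption; the real objects come from ruamel.yaml).
workflow_input = OrderedDict([
    ("id", "#main/x"),
    ("type", "File"),
    ("default", OrderedDict([
        ("class", "File"),
        ("location", "keep:169f39d466a5438ac4a90e779bf750c7+53/blorp.txt"),
        ("size", 16),
    ])),
])

# json.dumps serializes dict subclasses in iteration order and quotes
# every string, including ones with an embedded ':'.
definition = json.dumps(workflow_input, indent=2)
print(definition)
```

Since ordinary JSON documents like this one are also valid YAML flow syntax, the result should still be accepted where a YAML definition is expected.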

Another way to avoid this problem would be to make the primary workflow definition reference a collection or git commit, and have the record only store metadata / input parameter definition.

#10 Updated by Peter Amstutz over 1 year ago

  • Status changed from New to In Progress

#11 Updated by Tom Morris over 1 year ago

  • Target version changed from 2018-07-18 Sprint to 2018-08-01 Sprint

#12 Updated by Peter Amstutz over 1 year ago

  • Target version changed from 2018-08-01 Sprint to 2018-08-15 Sprint

#13 Updated by Peter Amstutz over 1 year ago

  • Target version changed from 2018-08-15 Sprint to Arvados Future Sprints
