Project

General

Profile

Actions

Bug #13766

open

[API] bug in libyaml / Pysch used by API server to parse yaml

Added by Peter Amstutz about 6 years ago. Updated 7 months ago.

Status:
In Progress
Priority:
Normal
Assigned To:
Category:
-
Target version:
Story points:
-
Release:
Release relationship:
Auto

Description

Since 0.13.11 ruamel.yaml produces files like this:

  - id: #main/x
    type: File
    default: {class: File, location: keep:169f39d466a5438ac4a90e779bf750c7+53/blorp.txt,
      size: 16, basename: blorp.txt, nameroot: blorp, nameext: .txt}

The issue comes up from keep:169f39d466a5438ac4a90e779bf750c7+53/blorp.txt in "flow style" has an embedded ':' character.

Relevant changelog:

(0.13.11) 2017-01-23:
  - allow ':' in flow style scalars if not followed by space. Also don't
    quote such scalar as this is no longer necessary.
  - add python 3.6 manylinux wheel to PyPI

This is explicitly allowed by the YAML spec, however the API server rejects it:

<HttpError 422 when requesting https://172.17.0.2:8000/arvados/v1/workflows?alt=json returned "Definition is not valid yaml abc: (<unknown>): found unexpected ':' while scanning a plain scalar at line 22 column 38">

This seems to be a bug in libyaml, which is used by Pysch (the Ruby YAML engine). It was recently fixed:

https://github.com/yaml/libyaml/pull/104

However at the time of this writing there is no stable release of libyaml with this bugfix, only a prerelease 0.2.2-pre1

Pysch is part of the standard library. I don't know if it is tied to a particular libyaml, and how to upgrade which libyaml gets used.

Options:

  • Wait for stable releases of libyaml / pysch (unknown amount of time)
  • Downgrade ruamel.yaml (messy)
  • Some kind of regex output fixup in a-c-r to ensure these values are quoted
  • Some kind of regex input fixup on api server to ensure these values are quoted
  • Emit plain block style yaml, not "round trip" (breaks tests, tedious but fixable)
  • Emit plain json (breaks tests, tedious but fixable)

Subtasks 2 (1 open1 closed)

Task #13769: Review 13766-libyaml-workaroundResolvedPeter Amstutz07/09/2018Actions
Task #13848: Monitor dependencies for release of updateIn Progress07/09/2018Actions

Related issues

Related to Arvados - Bug #13681: [CWL] Cannot set submit-runner-ram when running from WorkbenchResolvedPeter Amstutz07/05/2018Actions
Actions #1

Updated by Peter Amstutz about 6 years ago

  • Status changed from New to In Progress
Actions #2

Updated by Peter Amstutz about 6 years ago

  • Description updated (diff)
Actions #3

Updated by Peter Amstutz about 6 years ago

  • Description updated (diff)
Actions #4

Updated by Peter Amstutz about 6 years ago

  • Related to Bug #13681: [CWL] Cannot set submit-runner-ram when running from Workbench added
Actions #5

Updated by Peter Amstutz about 6 years ago

  • Status changed from In Progress to New
  • Assigned To set to Peter Amstutz
Actions #6

Updated by Peter Amstutz about 6 years ago

  • Wait for stable releases of libyaml / pysch (unknown amount of time)

We'll keep an eye on it, but I think this would be hairy for ops, because it requires compiling and packaging a C library for all our supported platforms.

  • Downgrade ruamel.yaml (messy)

Would revert this change, but may revert other bugfixes we want as well. (A downgrade would also probably inflict a lot of pain on ops).

  • Some kind of regex output fixup in a-c-r to ensure these values are quoted
  • Some kind of regex input fixup on api server to ensure these values are quoted

Hard to write a regex that works in every situation, since it isn't actually parsing YAML.

  • Emit plain block style yaml, not "round trip" (breaks tests, tedious but fixable)

Turns out yaml.safe_dump doesn't work because round_trip_load introduces object types that are not plain dicts, and yaml.dump() has the same problem as yaml.round_trip_dump().

  cannot represent an object: ordereddict([('class', 'CommandLineTool'), ('requirements', [ordereddict([('class', 'DockerRequirement'), ('dockerPull', 'debian:8')])]), ('inputs', [ordereddict([('id', u'#submit_tool.cwl/x'), ('type', 'File'), ('default', ordereddict([('class', 'File'), ('location', u'keep:5d373e7629203ce39e7c22af98a0f881+52/blub.txt')])), ('inputBinding', ordereddict([('position', 1)]))])]), ('outputs', []), ('baseCommand', 'cat'), (u'id', u'#submit_tool.cwl')])
  • Emit plain json (breaks tests, tedious but fixable)

This is the fallback solution.

Actions #7

Updated by Peter Amstutz about 6 years ago

13766-libyaml-workaround @ 335ee76030e85fa6ac4da79b598cf4c4a212443d

  • Use json.dumps() instead of yaml.round_trip_dump()
  • Fix tests.

Another way to avoid this problem would be to make the primary workflow definition reference a collection or git commit, and have the record only store metadata / input parameter definition.

Actions #10

Updated by Peter Amstutz about 6 years ago

  • Status changed from New to In Progress
Actions #11

Updated by Tom Morris about 6 years ago

  • Target version changed from 2018-07-18 Sprint to 2018-08-01 Sprint
Actions #12

Updated by Peter Amstutz about 6 years ago

  • Target version changed from 2018-08-01 Sprint to 2018-08-15 Sprint
Actions #13

Updated by Peter Amstutz about 6 years ago

  • Target version changed from 2018-08-15 Sprint to Arvados Future Sprints
Actions #14

Updated by Peter Amstutz about 3 years ago

  • Target version deleted (Arvados Future Sprints)
Actions #15

Updated by Peter Amstutz over 1 year ago

  • Release set to 60
Actions #16

Updated by Peter Amstutz 7 months ago

  • Target version set to Future
Actions

Also available in: Atom PDF