Bug #18679

arvados-api fails to start if API.DisabledAPIs is not explicitly set in config.yml

Added by Javier Bértoli 4 months ago. Updated 4 months ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
API
Target version:
Start date:
02/02/2022
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
-
Release relationship:
Auto

Description

The arvados API code fails to start, with this error:

==> error.log <==
App 38957 output: Error: The application encountered the following error: API.DisabledAPIs expected Hash but was NilClass (RuntimeError)
App 38957 output:     /var/www/arvados-api/current/lib/config_loader.rb:163:in `block in coercion_and_check'
App 38957 output:     /var/www/arvados-api/current/lib/config_loader.rb:80:in `each'
App 38957 output:     /var/www/arvados-api/current/lib/config_loader.rb:80:in `coercion_and_check'
App 38957 output:     /var/www/arvados-api/current/config/arvados_config.rb:229:in `<top (required)>'
App 38957 output:     /var/www/arvados-api/current/config/application.rb:53:in `require_relative'
App 38957 output:     /var/www/arvados-api/current/config/application.rb:53:in `<class:Application>'
App 38957 output:     /var/www/arvados-api/current/config/application.rb:50:in `<module:Server>'
App 38957 output:     /var/www/arvados-api/current/config/application.rb:48:in `<top (required)>'
App 38957 output:     /var/www/arvados-api/current/config/environment.rb:6:in `require_relative'
App 38957 output:     /var/www/arvados-api/current/config/environment.rb:6:in `<top (required)>'
App 38957 output:     config.ru:7:in `require'
App 38957 output:     config.ru:7:in `block in <main>'
App 38957 output:     /var/www/arvados-api/shared/vendor_bundle/ruby/2.7.0/gems/rack-2.2.3/lib/rack/builder.rb:125:in `instance_eval'
App 38957 output:     /var/www/arvados-api/shared/vendor_bundle/ruby/2.7.0/gems/rack-2.2.3/lib/rack/builder.rb:125:in `initialize'
App 38957 output:     config.ru:1:in `new'
App 38957 output:     config.ru:1:in `<main>'
App 38957 output:     /usr/share/passenger/helper-scripts/rack-preloader.rb:101:in `eval'
App 38957 output:     /usr/share/passenger/helper-scripts/rack-preloader.rb:101:in `preload_app'
App 38957 output:     /usr/share/passenger/helper-scripts/rack-preloader.rb:189:in `block in <module:App>'
App 38957 output:     /usr/lib/ruby/vendor_ruby/phusion_passenger/loader_shared_helpers.rb:397:in `run_block_and_record_step_progress'
App 38957 output:     /usr/share/passenger/helper-scripts/rack-preloader.rb:188:in `<module:App>'
App 38957 output:     /usr/share/passenger/helper-scripts/rack-preloader.rb:30:in `<module:PhusionPassenger>'
App 38957 output:     /usr/share/passenger/helper-scripts/rack-preloader.rb:29:in `<main>'
[ E 2022-01-25 16:45:04.9413 12971/T183 age/Cor/App/Implementation.cpp:221 ]: Could not spawn process for application /var/www/arvados-api/current: The application encountered the following error: API.DisabledAPIs expected Hash but was NilClass (RuntimeError)
  Error ID: 63e03141
  Error details saved to: /tmp/passenger-error-oxPEB0.html

[ E 2022-01-25 16:45:04.9460 12971/Tc age/Cor/Con/CheckoutSession.cpp:281 ]: [Client 3-40] Cannot checkout session because a spawning error occurred. The identifier of the error is 63e03141. Please see earlier logs for details about the error.

Seems like it's only the api server that has trouble with this. Perhaps the bug is in the rails code that ingests the config via `arvados-server config-dump`.

config.yml (3.24 KB) config.yml Javier Bértoli, 02/01/2022 10:34 PM

Subtasks

Task #18711: review 18679-handle-keys-with-null-values-in-configResolvedWard Vandewege

Associated revisions

Revision 447e80a6
Added by Ward Vandewege 4 months ago

Merge branch '18679-handle-keys-with-null-values-in-config'

closes #18679

Arvados-DCO-1.1-Signed-off-by: Ward Vandewege <>

History

#1 Updated by Ward Vandewege 4 months ago

  • Target version set to 2022-02-16 sprint
  • Priority changed from High to Normal

#2 Updated by Ward Vandewege 4 months ago

  • Target version changed from 2022-02-16 sprint to 2022-02-02 sprint

#3 Updated by Ward Vandewege 4 months ago

  • Assigned To deleted (Javier Bértoli)

#4 Updated by Ward Vandewege 4 months ago

  • Description updated (diff)

The

#5 Updated by Peter Amstutz 4 months ago

  • Target version changed from 2022-02-02 sprint to 2022-02-16 sprint

#6 Updated by Javier Bértoli 4 months ago

Added a config.yml file to reproduce the issue (same config file was working correctly in previous versions).

#7 Updated by Peter Amstutz 4 months ago

  • Assigned To set to Ward Vandewege

#8 Updated by Peter Amstutz 4 months ago

  • Release set to 46

#9 Updated by Ward Vandewege 4 months ago

  • Status changed from New to In Progress

#10 Updated by Ward Vandewege 4 months ago

Running that config file through arvados-server config-dump gives:

AutoReloadConfig: false
Clusters:
  harpo:
    API:
      AsyncPermissionsUpdateInterval: 0s
      DisabledAPIs: null
      KeepServiceRequestTimeout: 0s
      MaxConcurrentRequests: 0
      MaxIndexDatabaseRead: 0
      MaxItemsPerResponse: 0
      MaxKeepBlobBuffers: 0
      MaxRequestAmplification: 0
      MaxRequestSize: 0
      MaxTokenLifetime: 0s
      RequestTimeout: 0s
      SendTimeout: 0s
      VocabularyPath: "" 
      WebsocketClientEventQueue: 0
      WebsocketServerEventQueue: 0
...

This is very wrong, none of the defaults are there.

Indeed, after updating config.yml like this:

Clusters:
  harpo:

    SystemRootToken: "systemroottokenmushaveatleast32characters" 
    ManagementToken: "managementtokenmushaveatleast32characters" 

    API:
      DisabledAPIs: {}
...

and re-running arvados-server config-dump:

AutoReloadConfig: false
Clusters:
  harpo:
    API:
      AsyncPermissionsUpdateInterval: 20s
      DisabledAPIs: {}
      KeepServiceRequestTimeout: 15s
      MaxConcurrentRequests: 0
      MaxIndexDatabaseRead: 134217728
      MaxItemsPerResponse: 1000
      MaxKeepBlobBuffers: 128
      MaxRequestAmplification: 4
      MaxRequestSize: 134217728
      MaxTokenLifetime: 0s
      RequestTimeout: 5m
      SendTimeout: 1m
      VocabularyPath: "" 
      WebsocketClientEventQueue: 64
      WebsocketServerEventQueue: 4
...

which is as expected. The problem appears to be that the API: section has no values at all in config.yml. Setting any one value makes arvados-server config-dump do the right thing. For example, setting MaxItemsPerResponse:

AutoReloadConfig: False

Clusters:
  harpo:

    SystemRootToken: "systemroottokenmushaveatleast32characters" 
    ManagementToken: "managementtokenmushaveatleast32characters" 

    API:
      MaxItemsPerResponse: 50
...

results in

AutoReloadConfig: false
Clusters:
  harpo:
    API:
      AsyncPermissionsUpdateInterval: 20s
      DisabledAPIs: {}
      KeepServiceRequestTimeout: 15s
      MaxConcurrentRequests: 0
      MaxIndexDatabaseRead: 134217728
      MaxItemsPerResponse: 50
      MaxKeepBlobBuffers: 128
      MaxRequestAmplification: 4
      MaxRequestSize: 134217728
      MaxTokenLifetime: 0s
      RequestTimeout: 5m
      SendTimeout: 1m
      VocabularyPath: "" 
      WebsocketClientEventQueue: 64
      WebsocketServerEventQueue: 4

Okay, the problem is that the config:

AutoReloadConfig: False

Clusters:
  harpo:

    SystemRootToken: "systemroottokenmushaveatleast32characters" 
    ManagementToken: "managementtokenmushaveatleast32characters" 

    API:

    Collections:
      BlobSigningKey: "blobsigningkeymushaveatleast32characters" 
...

Is turned into

...
{
  "AutoReloadConfig": false,
  "Clusters": {
    "harpo": {
      "API": null,
      "Collections": {
        "BlobSigningKey": "blobsigningkeymushaveatleast32characters" 
      },
...

and then we call mergo.Merge(&merged, src, mergo.WithOverride) in lib/config/load.go which overwrites the default API structure with null. Doh.

#11 Updated by Ward Vandewege 4 months ago

Ready for review at e414e9d90df8756ab4fc9dab4d2a5729a51a525c on branch 18679-handle-keys-with-null-values-in-config

Tests passed at developer-run-tests: #2894 , minus wb integration tests, which passed on a re-run here: developer-run-tests-apps-workbench-integration: #3079

#12 Updated by Tom Clegg 4 months ago

This looks like a templating problem to me. The template/operator didn't intend to put a null value in their config.yml, but they did.

FBOW these are documented ways to spell null in yaml:

key1:
key2: ~
key3: null

That said, we don't currently need null to mean anything other than "use the default" and we generally prefer to have a real value like "" or {} if the intent is to say "use the zero value instead of the default", so this seems like a way to accommodate templating artifacts that doesn't cost us anything.

LGTM, thanks

#13 Updated by Ward Vandewege 4 months ago

Tom Clegg wrote:

LGTM, thanks

Thanks, merged.

#14 Updated by Ward Vandewege 4 months ago

  • % Done changed from 0 to 100
  • Status changed from In Progress to Resolved

Also available in: Atom PDF