Bug #22124 (closed)

Cannot upload data through keep-web (Workbench)

Added by Lucas Di Pentima 2 months ago. Updated about 2 months ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Keep
Story points: -
Release:
Release relationship: Auto

Description

Since the #21773 merge to main, our test-provision-* Jenkins jobs started to fail (for example: test-provision-debian12: #109). I tried deploying a test cluster with the latest dev packages and found that I could not upload files through keep-web or keepproxy, although I was able to download data just fine, either through keep-web (using Workbench) or through keepproxy (using arv-get from the outside).

Uploading data directly through Keep (i.e., using arv-put from the shell node) works correctly, so the issue seems to be related to server components that act as Keep clients.

I also tried uploading a file to tordo using Workbench, and got an error there too.
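
For reference, the failing operation is just a WebDAV PUT against keep-web, so it can be reproduced without Workbench. Here is a minimal Go sketch; the host and collection UUID mirror the logs below, while the file name and Bearer-token handling are my assumptions:

    // Hedged reproduction sketch: issue the same WebDAV PUT that
    // Workbench does, directly against keep-web. Host and collection
    // UUID mirror the logs below; token handling is an assumption.
    package main

    import (
        "fmt"
        "net/http"
        "os"
    )

    func main() {
        f, err := os.Open("report.pdf")
        if err != nil {
            panic(err)
        }
        defer f.Close()

        url := "https://download.xarv4.example.com/c=xarv4-4zz18-1npep8rzy4f784i/report.pdf"
        req, err := http.NewRequest(http.MethodPut, url, f)
        if err != nil {
            panic(err)
        }
        req.Header.Set("Authorization", "Bearer "+os.Getenv("ARVADOS_API_TOKEN"))

        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()
        // With the affected packages this comes back as a 500 with
        // "sync failed: Could not write sufficient replicas: [405] ...".
        fmt.Println(resp.Status)
    }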

While doing an upload through keep-web, I captured both keep-web and keepstore logs:

Keep-web

Sep 12 18:47:13 workbench keep-web[15367]: {"ClusterID":"xarv4","PID":15367,"RequestID":"req-z94iriqcntir1ivaiwry","level":"info","msg":"request","remoteAddr":"10.1.1.15:48972","reqBytes":0,"reqForwardedFor":"123.123.123.123","reqHost":"download.xarv4.example.com","reqMethod":"OPTIONS","reqPath":"c=xarv4-4zz18-1npep8rzy4f784i/report.pdf","reqQuery":"","time":"2024-09-12T18:47:13.988937205Z"}
Sep 12 18:47:13 workbench keep-web[15367]: {"ClusterID":"xarv4","PID":15367,"RequestID":"req-z94iriqcntir1ivaiwry","level":"info","msg":"response","priority":1,"queue":"api","remoteAddr":"10.1.1.15:48972","reqBytes":0,"reqForwardedFor":"123.123.123.123","reqHost":"download.xarv4.example.com","reqMethod":"OPTIONS","reqPath":"c=xarv4-4zz18-1npep8rzy4f784i/report.pdf","reqQuery":"","respBytes":0,"respStatus":"OK","respStatusCode":200,"time":"2024-09-12T18:47:13.989037619Z","timeToStatus":0.000095,"timeTotal":0.000095,"timeWriteBody":0.000000}
Sep 12 18:47:14 workbench keep-web[15367]: {"ClusterID":"xarv4","PID":15367,"RequestID":"req-178lb5z57die9p8gz5ti","level":"info","msg":"request","remoteAddr":"10.1.1.15:48982","reqBytes":10718439,"reqForwardedFor":"123.123.123.123","reqHost":"download.xarv4.example.com","reqMethod":"PUT","reqPath":"c=xarv4-4zz18-1npep8rzy4f784i/report.pdf","reqQuery":"","time":"2024-09-12T18:47:14.161948403Z"}
Sep 12 18:47:14 workbench keep-web[15367]: {"ClusterID":"xarv4","PID":15367,"RequestID":"req-178lb5z57die9p8gz5ti","collection_file_path":"report.pdf","collection_uuid":"xarv4-4zz18-1npep8rzy4f784i","level":"info","msg":"File upload","remoteAddr":"10.1.1.15:48982","reqBytes":10718439,"reqForwardedFor":"123.123.123.123","reqHost":"download.xarv4.example.com","reqMethod":"PUT","reqPath":"c=xarv4-4zz18-1npep8rzy4f784i/report.pdf","reqQuery":"","time":"2024-09-12T18:47:14.212947790Z","user_full_name":"Lucas Di Pentima","user_uuid":"xarv4-tpzed-3ma928fbhnnejs2"}
Sep 12 18:47:43 workbench keep-web[15367]: {"ClusterID":"xarv4","PID":15367,"RequestID":"req-178lb5z57die9p8gz5ti","error":"sync failed: Could not write sufficient replicas: [405] method not allowed","level":"error","msg":"update() returned *errors.errorString error, changing response to HTTP 500","remoteAddr":"10.1.1.15:48982","reqBytes":10718439,"reqForwardedFor":"123.123.123.123","reqHost":"download.xarv4.example.com","reqMethod":"PUT","reqPath":"c=xarv4-4zz18-1npep8rzy4f784i/report.pdf","reqQuery":"","time":"2024-09-12T18:47:43.191583205Z"}
Sep 12 18:47:43 workbench keep-web[15367]: {"ClusterID":"xarv4","PID":15367,"RequestID":"req-178lb5z57die9p8gz5ti","level":"info","msg":"response","priority":1,"queue":"api","remoteAddr":"10.1.1.15:48982","reqBytes":10718439,"reqForwardedFor":"123.123.123.123","reqHost":"download.xarv4.example.com","reqMethod":"PUT","reqPath":"c=xarv4-4zz18-1npep8rzy4f784i/report.pdf","reqQuery":"","respBody":"sync failed: Could not write sufficient replicas: [405] method not allowed\n","respBytes":75,"respStatus":"Internal Server Error","respStatusCode":500,"time":"2024-09-12T18:47:43.191700379Z","timeToStatus":29.029711,"timeTotal":29.029750,"timeWriteBody":0.000038}

Keepstore

Sep 12 18:47:43 keep0 keepstore[7895]: {"ClusterID":"xarv4","PID":7895,"RequestID":"req-19bpla8nn24vr1b8rj5g","level":"info","msg":"request","remoteAddr":"10.1.1.15:54916","reqBytes":10718439,"reqForwardedFor":"","reqHost":"10.1.2.13:25107","reqMethod":"PUT","reqPath":"/23e12771d170009f23c89b40481718f3","reqQuery":"","time":"2024-09-12T18:47:43.190142794Z"}
Sep 12 18:47:43 keep0 keepstore[7895]: {"ClusterID":"xarv4","PID":7895,"RequestID":"req-19bpla8nn24vr1b8rj5g","level":"info","msg":"response","priority":0,"queue":"api","remoteAddr":"10.1.1.15:54916","reqBytes":10718439,"reqForwardedFor":"","reqHost":"10.1.2.13:25107","reqMethod":"PUT","reqPath":"/23e12771d170009f23c89b40481718f3","reqQuery":"","respBytes":0,"respStatus":"Moved Permanently","respStatusCode":301,"time":"2024-09-12T18:47:43.190257435Z","timeToStatus":0.000103,"timeTotal":0.000107,"timeWriteBody":0.000004}
Sep 12 18:47:43 keep0 keepstore[7895]: {"ClusterID":"xarv4","PID":7895,"RequestID":"req-19bpla8nn24vr1b8rj5g","level":"info","msg":"request","remoteAddr":"10.1.1.15:54930","reqBytes":0,"reqForwardedFor":"","reqHost":"10.1.2.13:25107","reqMethod":"GET","reqPath":"23e12771d170009f23c89b40481718f3","reqQuery":"","time":"2024-09-12T18:47:43.191265893Z"}
Sep 12 18:47:43 keep0 keepstore[7895]: {"ClusterID":"xarv4","PID":7895,"RequestID":"req-19bpla8nn24vr1b8rj5g","level":"info","msg":"response","priority":0,"queue":"api","remoteAddr":"10.1.1.15:54930","reqBytes":0,"reqForwardedFor":"","reqHost":"10.1.2.13:25107","reqMethod":"GET","reqPath":"23e12771d170009f23c89b40481718f3","reqQuery":"","respBody":"method not allowed\n","respBytes":19,"respStatus":"Method Not Allowed","respStatusCode":405,"time":"2024-09-12T18:47:43.191373814Z","timeToStatus":0.000093,"timeTotal":0.000099,"timeWriteBody":0.000006}

Notes

  • As a first observation, I didn't find any request ID matches between the keep-web and keepstore logs; that might be a sign of a bug, since request IDs should be preserved across services.
  • Keepstore's response to the PUT request was a 301; I'm not sure that's correct.
  • Keep-web's subsequent GET request got a 405 (Method Not Allowed) response from keepstore.

I'm not sure why the changes made in #21822 would cause these behaviors.
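
One note on the 301-then-405 pair: that is exactly what you would expect from a Go HTTP client, because net/http follows a 301 to a PUT by reissuing the request as a GET (only GET and HEAD keep their method across a 301/302/303). A standalone sketch, not Arvados code, that reproduces the sequence from the keepstore log:

    // Demo: Go's http.Client follows a 301 to a PUT by reissuing the
    // request as a GET, so a redirected PUT surfaces as a 405 from a
    // handler that only accepts PUT; the pattern in the logs above.
    package main

    import (
        "fmt"
        "net/http"
        "net/http/httptest"
        "strings"
    )

    func main() {
        srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            fmt.Printf("server saw: %s %s\n", r.Method, r.URL.Path)
            if r.Method == http.MethodPut {
                // Stand-in for whatever made keepstore redirect the
                // PUT; the interesting part is the client's follow-up.
                http.Redirect(w, r, "/23e12771d170009f23c89b40481718f3", http.StatusMovedPermanently)
                return
            }
            // The follow-up arrives as a GET, which this PUT-only
            // endpoint rejects, matching keepstore's 405.
            http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
        }))
        defer srv.Close()

        req, _ := http.NewRequest(http.MethodPut, srv.URL+"/upload", strings.NewReader("data"))
        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()
        fmt.Println("client got:", resp.Status) // prints: client got: 405 Method Not Allowed
    }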


Subtasks 2 (0 open, 2 closed)

Task #22125: Review 22124-keepstore-put-moved (Resolved, Tom Clegg, 09/23/2024)
Task #22140: Review 22124-keepstore-double-slash (Resolved, Tom Clegg, 09/25/2024)

Related issues

Related to Arvados - Feature #21773: keep-web and keepproxy should use Services.Keepstore.InternalURLs configs directly instead of requesting /arvados/v1/keep_services/accessible (Resolved, Tom Clegg)
#1

Updated by Lucas Di Pentima 2 months ago

  • Related to Feature #21773: keep-web and keepproxy should use Services.Keepstore.InternalURLs configs directly instead of requesting /arvados/v1/keep_services/accessible added
#2

Updated by Tom Clegg 2 months ago

  • Status changed from New to In Progress
#3

Updated by Tom Clegg 2 months ago

22124-keepstore-put-moved @ 482412d27c0b25bc8fbe57ade0327dbff7f2c840 -- developer-run-tests: #4446

Pretty sure this will fix it.

I'd like to add some more testing, and turn off the redirect-if-double-slash behavior for keepstore, but in the meantime we could merge this and confirm it fixes test-provision.

#4

Updated by Lucas Di Pentima 2 months ago

LGTM, and I agree it would be convenient to add more testing to catch this kind of issue earlier. Thanks!

#5

Updated by Tom Clegg about 2 months ago

22124-keepstore-double-slash @ c81c3e3763835141da8e5d62de6550e97416f35a -- developer-run-tests: #4454
  • Trim trailing slashes in ARVADOS_KEEP_SERVICES env var.
  • Reject requests with double slash, except GET //locator.

Longer explanation, from the commit message:

    22124: Reject requests with double slash, except GET //locator.

    Previously we responded to "PUT //locator" with a 301 to a cleaned
    path, which did not work because 301 tells clients to repeat the PUT
    request as a GET.  Better to just respond 400, since this means the
    client is broken or misconfigured.

    There may be some clients that have been working successfully with the
    301 redirect behavior because they only do GET and HEAD requests.  To
    avoid breaking these, we continue to accept "GET //locator" -- but now
    we just return the data instead of a redirect to "GET /locator".
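
For concreteness, here is that rule sketched as Go middleware. This is a simplified illustration of the behavior described in the commit message, not the actual keepstore router code; the port number just echoes the logs above.

    // Sketch of the rule from the commit message: reject any
    // double-slash path with 400, except "GET //locator", which is
    // served as if it were "GET /locator" instead of redirected.
    // (The companion change trims trailing slashes from
    // ARVADOS_KEEP_SERVICES so clients stop generating "//" paths.)
    package main

    import (
        "log"
        "net/http"
        "strings"
    )

    func doubleSlashGuard(next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            if strings.HasPrefix(r.URL.Path, "//") {
                if r.Method == http.MethodGet {
                    // Tolerate old clients that only worked because
                    // their GETs survived the 301: serve directly.
                    r2 := r.Clone(r.Context())
                    r2.URL.Path = strings.TrimPrefix(r.URL.Path, "/")
                    next.ServeHTTP(w, r2)
                    return
                }
                // Any other method with a double slash means the
                // client is broken or misconfigured.
                http.Error(w, "bad request", http.StatusBadRequest)
                return
            }
            next.ServeHTTP(w, r)
        })
    }

    func main() {
        // Stand-in for keepstore's real block-data routes.
        mux := http.NewServeMux()
        mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            w.Write([]byte("block data\n"))
        })
        log.Fatal(http.ListenAndServe("127.0.0.1:25107", doubleSlashGuard(mux)))
    }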
#6

Updated by Lucas Di Pentima about 2 months ago

This LGTM, thanks!

#7

Updated by Tom Clegg about 2 months ago

  • Status changed from In Progress to Resolved
