Bug #8669

[SDKs] PySDK fails to load CAs for SSL verification when run inside Conda

Added by Sarah Guthrie over 4 years ago. Updated over 3 years ago.

Status: New
Priority: Normal
Assigned To: -
Category: -
Target version:
Start date: 03/09/2016
Due date:
% Done: 0%
Estimated time:
Story points: -

Description

(02:51:59 PM) sguthrie: oh dear, what is this? arvados.errors.KeepReadError: failed to read [-----]: service https://keep.qr1hi.arvadosapi.com:443/ responded with 0 (77, 'error setting certificate verify locations:\n CAfile: /etc/pki/tls/certs/ca-bundle.crt\n CApath: none')
(02:54:16 PM) brett: So, in order to verify the Keep proxy's SSL certificate, your client has to load a list of trusted certificate authorities.
(02:54:27 PM) brett: It looks for those at /etc/ssl/certs/ca-certificates.crt.
(02:54:43 PM) brett: If it doesn't find any there, it looks at the /etc/pki path in your error message.
(02:54:56 PM) brett: And if that fails it's supposed to fall back to a file from Python itself.
(02:55:21 PM) brett: Can you run ls -l on both the /etc/ssl path I gave, and the /etc/pki path in your error message, and paste the results?
(02:55:56 PM) sguthrie: -rw-r--r-- 1 root root 274340 Feb 26 20:47 /etc/ssl/certs/ca-certificates.crt
(02:56:31 PM) sguthrie: ls: cannot access /etc/pki/tls/certs/ca-bundle.crt: No such file or directory
(02:59:23 PM) brett: You ran those ls'es in the same environment as arv keep put? Same system and Docker container (if any)?
(02:59:28 PM) sguthrie: yep
(03:00:16 PM) brett: The error happens basically immediately, I'm assuming?
(03:00:49 PM) sguthrie: as soon as it starts trying to copy something from keep
(03:02:14 PM) brett: And it happens reliably?
(03:03:57 PM) sguthrie: 3/3

  File "/home/sguthrie/anaconda2/bin/arv-copy", line 4, in <module>
    main()
  File "/home/sguthrie/anaconda2/lib/python2.7/site-packages/arvados/commands/arv_copy.py", line 136, in main
    src_arv, dst_arv, args)
  File "/home/sguthrie/anaconda2/lib/python2.7/site-packages/arvados/commands/arv_copy.py", line 290, in copy_pipeline_template
    pt = copy_collections(pt, src, dst, args)
  File "/home/sguthrie/anaconda2/lib/python2.7/site-packages/arvados/commands/arv_copy.py", line 342, in copy_collections
    for v in obj)
  File "/home/sguthrie/anaconda2/lib/python2.7/collections.py", line 57, in __init__
    self.__update(*args, **kwds)
  File "/home/sguthrie/anaconda2/lib/python2.7/_abcoll.py", line 568, in update
    for key, value in other:
  File "/home/sguthrie/anaconda2/lib/python2.7/site-packages/arvados/commands/arv_copy.py", line 342, in <genexpr>
    for v in obj)
  File "/home/sguthrie/anaconda2/lib/python2.7/site-packages/arvados/commands/arv_copy.py", line 342, in copy_collections
    for v in obj)
  File "/home/sguthrie/anaconda2/lib/python2.7/collections.py", line 57, in __init__
    self.__update(*args, **kwds)
  File "/home/sguthrie/anaconda2/lib/python2.7/_abcoll.py", line 568, in update
    for key, value in other:
  File "/home/sguthrie/anaconda2/lib/python2.7/site-packages/arvados/commands/arv_copy.py", line 342, in <genexpr>
    for v in obj)
  File "/home/sguthrie/anaconda2/lib/python2.7/site-packages/arvados/commands/arv_copy.py", line 342, in copy_collections
    for v in obj)
  File "/home/sguthrie/anaconda2/lib/python2.7/collections.py", line 57, in __init__
    self.__update(*args, **kwds)
  File "/home/sguthrie/anaconda2/lib/python2.7/_abcoll.py", line 568, in update
    for key, value in other:
  File "/home/sguthrie/anaconda2/lib/python2.7/site-packages/arvados/commands/arv_copy.py", line 342, in <genexpr>
    for v in obj)
  File "/home/sguthrie/anaconda2/lib/python2.7/site-packages/arvados/commands/arv_copy.py", line 342, in copy_collections
    for v in obj)
  File "/home/sguthrie/anaconda2/lib/python2.7/collections.py", line 57, in __init__
    self.__update(*args, **kwds)
  File "/home/sguthrie/anaconda2/lib/python2.7/_abcoll.py", line 568, in update
    for key, value in other:
  File "/home/sguthrie/anaconda2/lib/python2.7/site-packages/arvados/commands/arv_copy.py", line 342, in <genexpr>
    for v in obj)
  File "/home/sguthrie/anaconda2/lib/python2.7/site-packages/arvados/commands/arv_copy.py", line 342, in copy_collections
    for v in obj)
  File "/home/sguthrie/anaconda2/lib/python2.7/collections.py", line 57, in __init__
    self.__update(*args, **kwds)
  File "/home/sguthrie/anaconda2/lib/python2.7/_abcoll.py", line 568, in update
    for key, value in other:
  File "/home/sguthrie/anaconda2/lib/python2.7/site-packages/arvados/commands/arv_copy.py", line 342, in <genexpr>
    for v in obj)
  File "/home/sguthrie/anaconda2/lib/python2.7/site-packages/arvados/commands/arv_copy.py", line 337, in copy_collections
    obj = arvados.util.portable_data_hash_pattern.sub(copy_collection_fn, obj)
  File "/home/sguthrie/anaconda2/lib/python2.7/site-packages/arvados/commands/arv_copy.py", line 327, in copy_collection_fn
    dst_col = copy_collection(src_id, src, dst, args)
  File "/home/sguthrie/anaconda2/lib/python2.7/site-packages/arvados/commands/arv_copy.py", line 577, in copy_collection
    data = src_keep.get(word)
  File "/home/sguthrie/anaconda2/lib/python2.7/site-packages/arvados/retry.py", line 153, in num_retries_setter
    return orig_func(self, *args, **kwargs)
  File "/home/sguthrie/anaconda2/lib/python2.7/site-packages/arvados/keep.py", line 980, in get
    "failed to read {}".format(loc_s), service_errors, label="service")
arvados.errors.KeepReadError: failed to read [-----]: service https://keep.qr1hi.arvadosapi.com:443/ responded with 0 (77, 'error setting certificate verify locations:\n  CAfile: /etc/pki/tls/certs/ca-bundle.crt\n  CApath: none')

History

#1 Updated by Sarah Guthrie over 4 years ago

  • Description updated

#2 Updated by Sarah Guthrie over 4 years ago

This hit cwl-runner/arv-run (3/3 times):

sguthrie@sguthrie-System-Product-Name:~/tmp_snap_gatk/SNAP_GATK_NA24385-template-workflow$ cwl-runner --verbose main-SNAP_GATK_NA24385-template.cwl main-SNAP_GATK_NA24385-template-samples.json 
/home/sguthrie/anaconda2/bin/cwl-runner 1.0.20160310140736
2016-03-10 16:25:09 arvados.cwl-runner[26025] INFO: Job prep_samples (su92l-8i9sb-lbhenjt91oury8r) is Queued
2016-03-10 16:25:09 arvados.arv-run[26025] INFO: Upload local files: "SNAP_GATK_NA24385.csv" 
Unexpected exception
Traceback (most recent call last):
  File "/home/sguthrie/anaconda2/lib/python2.7/site-packages/cwltool/workflow.py", line 472, in job
    **kwargs):
  File "/home/sguthrie/anaconda2/lib/python2.7/site-packages/cwltool/draft2tool.py", line 162, in job
    builder.pathmapper = self.makePathMapper(reffiles, input_basedir, **kwargs)
  File "/home/sguthrie/anaconda2/lib/python2.7/site-packages/arvados_cwl/__init__.py", line 286, in makePathMapper
    return ArvPathMapper(self.arvrunner, reffiles, input_basedir, **kwargs)
  File "/home/sguthrie/anaconda2/lib/python2.7/site-packages/arvados_cwl/__init__.py", line 260, in __init__
    fnPattern="$(task.keep)/%s/%s")
  File "/home/sguthrie/anaconda2/lib/python2.7/site-packages/arvados/commands/run.py", line 151, in uploadfiles
    item = api.collections().create(body={"owner_uuid": project, "manifest_text": collection.manifest_text()}).execute()
  File "/home/sguthrie/anaconda2/lib/python2.7/site-packages/arvados/collection.py", line 355, in manifest_text
    self.finish_current_stream()
  File "/home/sguthrie/anaconda2/lib/python2.7/site-packages/arvados/collection.py", line 318, in finish_current_stream
    self.flush_data()
  File "/home/sguthrie/anaconda2/lib/python2.7/site-packages/arvados/collection.py", line 264, in flush_data
    copies=self.replication))
  File "/home/sguthrie/anaconda2/lib/python2.7/site-packages/arvados/retry.py", line 153, in num_retries_setter
    return orig_func(self, *args, **kwargs)
  File "/home/sguthrie/anaconda2/lib/python2.7/site-packages/arvados/keep.py", line 1065, in put
    data_hash, copies, thread_limiter.done()), service_errors, label="service")
KeepWriteError: failed to write 9156ed3e3cde84c45a2fde543f55fc15 (wanted 2 copies but wrote 0): service https://keep.su92l.arvadosapi.com:443/ responded with 0 (77, 'error setting certificate verify locations:\n  CAfile: /etc/pki/tls/certs/ca-bundle.crt\n  CApath: none')
Exception on step 'alignment'
2016-03-10 16:25:38 arvados.cwl-runner[26025] ERROR: Caught unhandled exception, marking pipeline as failed
Traceback (most recent call last):
  File "/home/sguthrie/anaconda2/lib/python2.7/site-packages/arvados_cwl/__init__.py", line 385, in arvExecutor
    for runnable in jobiter:
  File "/home/sguthrie/anaconda2/lib/python2.7/site-packages/cwltool/workflow.py", line 370, in job
    for w in wj.job(builder.job, basedir, output_callback, **kwargs):
  File "/home/sguthrie/anaconda2/lib/python2.7/site-packages/cwltool/workflow.py", line 287, in job
    for newjob in step.iterable:
  File "/home/sguthrie/anaconda2/lib/python2.7/site-packages/cwltool/workflow.py", line 243, in try_make_job
    for j in jobs:
  File "/home/sguthrie/anaconda2/lib/python2.7/site-packages/cwltool/workflow.py", line 528, in dotproduct_scatter
    for j in process.job(jo, basedir, functools.partial(rc.receive_scatter_output, n), **kwargs):
  File "/home/sguthrie/anaconda2/lib/python2.7/site-packages/cwltool/workflow.py", line 145, in job
    for j in self.step.job(joborder, basedir, output_callback, **kwargs):
  File "/home/sguthrie/anaconda2/lib/python2.7/site-packages/cwltool/workflow.py", line 472, in job
    **kwargs):
  File "/home/sguthrie/anaconda2/lib/python2.7/site-packages/cwltool/workflow.py", line 370, in job
    for w in wj.job(builder.job, basedir, output_callback, **kwargs):
  File "/home/sguthrie/anaconda2/lib/python2.7/site-packages/cwltool/workflow.py", line 287, in job
    for newjob in step.iterable:
  File "/home/sguthrie/anaconda2/lib/python2.7/site-packages/cwltool/workflow.py", line 243, in try_make_job
    for j in jobs:
  File "/home/sguthrie/anaconda2/lib/python2.7/site-packages/cwltool/workflow.py", line 145, in job
    for j in self.step.job(joborder, basedir, output_callback, **kwargs):
  File "/home/sguthrie/anaconda2/lib/python2.7/site-packages/cwltool/workflow.py", line 479, in job
    raise WorkflowException(str(e))
WorkflowException: failed to write 9156ed3e3cde84c45a2fde543f55fc15 (wanted 2 copies but wrote 0): service https://keep.su92l.arvadosapi.com:443/ responded with 0 (77, 'error setting certificate verify locations:\n  CAfile: /etc/pki/tls/certs/ca-bundle.crt\n  CApath: none')

#3 Updated by Brett Smith over 4 years ago

  • Subject changed from [CLI] arv copy fails with 'error setting certificate verify locations: CAfile: /etc/pki/tls/certs/ca-bundle.crt, CApath: none' to [SDKs] KeepClient error loading CAs: 'error setting certificate verify locations: CAfile: /etc/pki/tls/certs/ca-bundle.crt, CApath: none'

Sally,

Would it be possible to get a few minutes on the system where this happens? (sguthrie-System-Product-Name, apparently.) This code is necessarily specific to the operating system it runs on, and apparently there's something funny about the environment leading to this.

#4 Updated by Brett Smith over 4 years ago

  • Target version set to Arvados Future Sprints

#5 Updated by Brett Smith about 4 years ago

Another user reported this issue. They were also using Conda. I'm starting to think that's the common thread here, although I don't yet understand how.

#6 Updated by Brett Smith about 4 years ago

  • Subject changed from [SDKs] KeepClient error loading CAs: 'error setting certificate verify locations: CAfile: /etc/pki/tls/certs/ca-bundle.crt, CApath: none' to [SDKs] PySDK fails to load CAs for SSL verification when run inside Conda

User reported that they could successfully run tools inside a regular virtualenv, but got the error when running inside Conda, on the same box.

I'm guessing that Conda includes a modified version of one of our dependencies (possibly even in the stdlib) that breaks this loading somehow. Exactly how, I'm still not sure, but that would be the first place I would look.
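
One way to compare the two environments is to print the CA locations that each interpreter's own ssl module was built with, once inside the virtualenv and once inside Conda. The following is a minimal diagnostic sketch, not part of the SDK; the function name describe_ssl_defaults is made up for illustration. It only covers Python's own ssl module, so a library that carries its own compiled-in defaults (libcurl, for example) would need a separate check.

```python
import ssl
import sys


def describe_ssl_defaults():
    """Collect the CA locations this interpreter's ssl module defaults to.

    Hypothetical helper for illustration; it is not an Arvados function.
    """
    paths = ssl.get_default_verify_paths()
    return {
        "python": sys.executable,
        "openssl": ssl.OPENSSL_VERSION,
        "cafile": paths.cafile,                  # None if the default file is missing
        "capath": paths.capath,                  # None if the default directory is missing
        "openssl_cafile": paths.openssl_cafile,  # compiled-in default file path
        "openssl_capath": paths.openssl_capath,  # compiled-in default directory path
    }


if __name__ == "__main__":
    for key, value in sorted(describe_ssl_defaults().items()):
        print("{}: {}".format(key, value))
```

If the two environments report different openssl_cafile values, that would support the idea that Conda ships builds configured for a different certificate layout.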

#7 Updated by Tom Clegg about 4 years ago

The fact that the error happens when connecting to keepproxy suggests the error comes from curl (via pycurl). We use our util.ca_certs_path function to detect and configure the cert path for google-api-client requests, which go through httplib2, but I don't think we do this for Keep requests, which go through curl/pycurl. Perhaps the curl or pycurl package in Conda is set up for Red Hat config?
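
For reference, the lookup order Brett describes in the chat log (the /etc/ssl path first, then the /etc/pki path, then a fallback file) can be sketched roughly as below. This is an illustration of that order only, not the actual util.ca_certs_path code; the name find_ca_bundle and the fallback parameter are assumptions, and the real fallback location is not given in this ticket.

```python
import os

# Candidate paths and their order are taken from the chat log in the
# description. The layout comments are general knowledge, not from the SDK.
CANDIDATE_CA_FILES = (
    "/etc/ssl/certs/ca-certificates.crt",  # Debian/Ubuntu layout
    "/etc/pki/tls/certs/ca-bundle.crt",    # Red Hat layout
)


def find_ca_bundle(candidates=CANDIDATE_CA_FILES, fallback=None):
    """Return the first candidate that is an existing file, else fallback.

    Hypothetical helper: `fallback` stands in for "a file from Python
    itself", whose real location this ticket does not name.
    """
    for path in candidates:
        if os.path.isfile(path):
            return path
    return fallback
```

On the reporter's machine the first candidate exists, so a lookup like this would return the /etc/ssl path. The error message names the /etc/pki path instead, which is consistent with the failing component not using this lookup at all.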

#8 Updated by Brett Smith about 4 years ago

Tom Clegg wrote:

The fact that the error happens when connecting to keepproxy suggests the error comes from curl (via pycurl). We use our util.ca_certs_path function to detect and configure the cert path for google-api-client requests, which go through httplib2, but I don't think we do this for Keep requests, which go through curl/pycurl. Perhaps the curl or pycurl package in Conda is set up for Red Hat config?

That looks plausible. The fix may be to use util.ca_certs_path when we set up the curl client, to tell it to look up CA certs from the same place.
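
A minimal sketch of what that could look like, assuming the CA path has already been detected and that the Keep client has a pycurl handle to configure. The helper name apply_ca_bundle is hypothetical; pycurl.CAINFO is the real pycurl option that sets the CA bundle file. The option constant is passed in as a parameter so the helper itself runs without pycurl installed.

```python
import os


def apply_ca_bundle(curl, ca_path, cainfo_option):
    """Point a curl handle at ca_path if that file exists.

    Hypothetical helper, not SDK code. `curl` is any object with a
    setopt(option, value) method, such as a pycurl.Curl handle, and
    `cainfo_option` is the option to set (pycurl.CAINFO for a real handle).
    Returns True if the option was set, False if curl's defaults were kept.
    """
    if ca_path and os.path.isfile(ca_path):
        curl.setopt(cainfo_option, ca_path)
        return True
    return False


# Usage with a real handle (assumes pycurl is installed):
#   import pycurl
#   curl = pycurl.Curl()
#   apply_ca_bundle(curl, "/etc/ssl/certs/ca-certificates.crt", pycurl.CAINFO)
```

Leaving the handle untouched when the file is missing keeps curl's built-in defaults in effect, so a bad path would not make things worse than they are today.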

#9 Updated by Brett Smith about 4 years ago

The above might be a good idea even if it doesn't resolve the immediate bug in Conda. I feel like having the entire SDK use the same CA database provides valuable consistency for users.
