Feature #15370 (closed)
[arvados-dispatch-cloud] loopback driver
Added by Tom Clegg over 5 years ago. Updated almost 2 years ago.
Description
- Create() succeeds once, but fails with a quota error if the caller tries to create multiple instances
- Instances() returns the instance that was created, if any
- Destroy() makes the instance disappear from the next Instances() result
- Instance address points to an SSH server (brought up by the driver) that accepts the dispatcher's key and executes shell commands
- If InstanceTypes is empty, it is automatically configured with a single instance type, with the host's RAM/CPU specs
When combined with #14922 this should make crunch-dispatch-local redundant.
This will also facilitate an arvados-dispatch-cloud integration test that uses the real crunch-run program instead of a stub. This might involve a few other changes, like a configurable location for lockfiles.
It's okay that this will be useless (other than single-container test cases) until #14922 is implemented, because it will also make #14922 easier to test.
Related issues
Updated by Tom Clegg over 5 years ago
- Related to Feature #14922: Run multiple containers concurrently on a single cloud VM added
Updated by Tom Clegg over 5 years ago
- Related to Idea #13908: [Epic] Replace SLURM for cloud job scheduling/dispatching added
Updated by Peter Amstutz over 3 years ago
- Target version changed from To Be Groomed to 2021-03-31 sprint
Updated by Peter Amstutz over 3 years ago
- Target version changed from 2021-03-31 sprint to 2021-04-14 sprint
Updated by Peter Amstutz over 3 years ago
- Target version changed from 2021-04-14 sprint to 2021-05-26 sprint
Updated by Peter Amstutz over 3 years ago
- Target version changed from 2021-05-26 sprint to 2021-07-07 sprint
Updated by Peter Amstutz over 3 years ago
- Target version changed from 2021-07-07 sprint to 2021-07-21 sprint
Updated by Peter Amstutz over 3 years ago
- Target version changed from 2021-07-21 sprint to 2021-08-04 sprint
Updated by Peter Amstutz over 3 years ago
- Target version changed from 2021-08-04 sprint to 2021-08-18 sprint
Updated by Peter Amstutz over 3 years ago
- Target version changed from 2021-08-18 sprint to 2021-09-01 sprint
Updated by Peter Amstutz over 3 years ago
- Target version deleted (2021-09-01 sprint)
Updated by Peter Amstutz over 2 years ago
- Target version set to 2022-04-27 Sprint
Updated by Peter Amstutz over 2 years ago
- Related to Idea #18973: Test combinations of federation scenarios added
Updated by Peter Amstutz over 2 years ago
- Target version changed from 2022-04-27 Sprint to 2022-05-11 sprint
Updated by Tom Clegg over 2 years ago
- Description updated (diff)
15370-loopback-dispatchcloud @ 34b13b1b9cc34661bf0c6774105ae03b412cbbdb -- developer-run-tests: #3085
(tests are failing because CI image doesn't have rsync)
Updated by Tom Clegg over 2 years ago
- Target version changed from 2022-05-11 sprint to 2022-05-25 sprint
Updated by Tom Clegg over 2 years ago
15370-loopback-dispatchcloud @ 2ca82cf645eb7d9dad60f98e1feca67042c38c47 -- developer-run-tests: #3126
Updated by Tom Clegg over 2 years ago
Now tests are failing because the CI image doesn't have docker, so "arv-keepdocker" doesn't work.
Added docker install recipe to arvados-server install
15370-loopback-dispatchcloud @ f07c059fca954e4d001cbf1cb36c845be9d884dd
Updated by Tom Clegg over 2 years ago
15370-loopback-dispatchcloud @ d6497b0fc0464447d6b753809807a1d11d511e50 -- developer-run-tests: #3136
Updated by Tom Clegg over 2 years ago
15370-loopback-dispatchcloud @ d6497b0fc0464447d6b753809807a1d11d511e50 -- developer-run-tests: #3145
Updated by Tom Clegg over 2 years ago
15370-loopback-dispatchcloud @ 731c5e81f5aedc82d03786670610bde68bba27c7 -- developer-run-tests: #3146
Updated by Tom Clegg over 2 years ago
15370-install-docker @ 663f3742a80b1b236d727d2d27068d03a37b4469
Updated by Ward Vandewege over 2 years ago
Tom Clegg wrote:
15370-loopback-dispatchcloud @ 731c5e81f5aedc82d03786670610bde68bba27c7 -- developer-run-tests: #3146
I updated the jenkins satellite image to incorporate the changes from main, which means docker should now be present. Running these tests again:
That failed because the jenkins user can't access Docker. I pushed the update that adds docker to the 'test' image, and gives the jenkins user access to Docker, and rebuilt the image once more.
Some different failures here:
developer-run-tests-remainder: #3301 /consoleFull
time="2022-05-20T18:30:33.430743543Z" level=error msg=failed error="Error response from daemon: client version 1.40 is too new. Maximum supported API version is 1.39"
14:30:33 exit status 1
14:30:33
14:30:33 ----------------------------------------------------------------------
14:30:33 FAIL: build_test.go:27: BuildSuite.TestBuildAndInstall
14:30:33
14:30:33 build_test.go:47:
14:30:33 c.Check(err, check.IsNil)
14:30:33 ... value *exec.ExitError = &exec.ExitError{ProcessState:(*os.ProcessState)(0xc00000e0c0), Stderr:[]uint8(nil)} ("exit status 1")
14:30:33
14:30:33 build_test.go:50:
14:30:33 c.Assert(err, check.IsNil)
14:30:33 ... value *fs.PathError = &fs.PathError{Op:"stat", Path:"/tmp/check-5577006791947779410/0/arvados-server-easy_1.2.3~rc4_amd64.deb", Err:0x2} ("stat /tmp/check-5577006791947779410/0/arvados-server-easy_1.2.3~rc4_amd64.deb: no such file or directory")
14:30:33
14:30:33 OOPS: 0 passed, 1 FAILED
14:30:33 --- FAIL: Test (0.72s)
Hmm, we were using an old Buster base image. I've bumped it to the latest and am re-building the image now. Maybe that will fix the version difference? I've also added user_allow_other to /etc/fuse.conf in the image, which should fix the other problem in lib/crunchrun tests:
packer-build-jenkins-image-arvados-tests: #82
Here we go:
Updated by Tom Clegg over 2 years ago
I suspect adding a non-functional docker may have broken the main branch build by failing tests that were skipped when there was no docker in PATH. Might need to c.Skip() the arvados-package test for the time being. Checking:
15370-docker-tests @ 36cfafd6e7eae2784c22aefdd9df26783412d42a -- developer-run-tests: #3156
Looks like the same applies to integrationSuite.TestRunTrivialContainerWithLocalKeepstore in lib/crunchrun.
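For illustration, the "skip unless docker is usable" decision could be factored into a helper like the one below. This is only a sketch, not what the branch does: `dockerSkipReason` is a hypothetical name, and probing with `docker version` is an assumption about how to detect a docker that is installed but non-functional. In a gocheck test, a non-empty result would be passed to c.Skip().

```go
package main

import (
	"fmt"
	"os/exec"
)

// dockerSkipReason (hypothetical helper) returns "" if the named binary is
// found in PATH and "<binary> version" exits successfully. Otherwise it
// returns a human-readable reason that a test could pass to c.Skip().
func dockerSkipReason(binary string) string {
	path, err := exec.LookPath(binary)
	if err != nil {
		return fmt.Sprintf("%s not found in PATH: %v", binary, err)
	}
	// Assumption: a present-but-broken docker (no daemon access, no
	// permission on the socket) makes "docker version" exit non-zero.
	out, err := exec.Command(path, "version").CombinedOutput()
	if err != nil {
		return fmt.Sprintf("%s is installed but not usable: %v: %s", path, err, out)
	}
	return ""
}

func main() {
	if reason := dockerSkipReason("docker"); reason != "" {
		fmt.Println("would skip:", reason)
		return
	}
	fmt.Println("docker looks usable; the test would run")
}
```

Checking only for the binary in PATH is what let the previously skipped tests start failing once a non-functional docker was installed, so the sketch probes the daemon as well.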
Updated by Tom Clegg over 2 years ago
15370-docker-tests @ 6caeb0768adabd32b50cc2ca6eb49d162745c4b0 -- developer-run-tests: #3157
(only wb1 integration tests failed there)
Updated by Tom Clegg over 2 years ago
Merged main:
15370-loopback-dispatchcloud @ 3fa6aa4043286ad61e5f29c136d3cc2942e8750d -- developer-run-tests: #3158
Looks like I have some more work to do.
Updated by Tom Clegg over 2 years ago
- Target version changed from 2022-05-25 sprint to 2022-06-08 sprint
Updated by Tom Clegg over 2 years ago
The cmd/arvados-package test (which we used to skip because it requires docker) fails because it takes longer than 10m. I updated run-tests.sh to change the timeout to 20m for that suite, but I also updated the jenkins config to skip it in [developer-]run-tests-remainder. We can re-enable it after either (a) changing the image prep so a new jenkins worker has a cached build image (which makes the cmd/arvados-package test run much faster) or (b) moving it to a separate run-tests-package / developer-run-tests-package jenkins job.
Also fixed a "missing keep data dir" testing bug, a docker client usage bug, and a flaky error log test.
15370-loopback-dispatchcloud @ bad877eb1d1a84d25c1fab3592e4218774816179 -- developer-run-tests: #3162
retry wb1 developer-run-tests-apps-workbench-integration: #3387
Updated by Ward Vandewege over 2 years ago
Tom Clegg wrote:
The cmd/arvados-package test (which we used to skip because it requires docker) fails because it takes longer than 10m. I updated run-tests.sh to change the timeout to 20m for that suite, but I also updated the jenkins config to skip it in [developer-]run-tests-remainder. We can re-enable it after either (a) changing the image prep so a new jenkins worker has a cached build image (which makes the cmd/arvados-package test run much faster) or (b) moving it to a separate run-tests-package / developer-run-tests-package jenkins job.
Also fixed a "missing keep data dir" testing bug, a docker client usage bug, and a flaky error log test.
15370-loopback-dispatchcloud @ bad877eb1d1a84d25c1fab3592e4218774816179 -- developer-run-tests: #3162
retry wb1 developer-run-tests-apps-workbench-integration: #3387
Is there a reason to pin on a docker API version that is so old? Latest is 1.41, and we're pinning on 1.21.
Otherwise, LGTM, thanks.
Updated by Tom Clegg over 2 years ago
Ward Vandewege wrote:
Is there a reason to pin on a docker API version that is so old? Latest is 1.41, and we're pinning on 1.21.
Sort of. I just figured old API versions are supported for a long time, so there's no particular hurry to use a newer one, in which case we might as well use the same version we use in crunch-run.
If someone wants to use docker 1.9 to build packages, who am I to say no...
Updated by Ward Vandewege over 2 years ago
Tom Clegg wrote:
Ward Vandewege wrote:
Is there a reason to pin on a docker API version that is so old? Latest is 1.41, and we're pinning on 1.21.
Sort of. I just figured old API versions are supported for a long time, so there's no particular hurry to use a newer one, in which case we might as well use the same version we use in crunch-run.
If someone wants to use docker 1.9 to build packages, who am I to say no...
Hmm, actually docker 1.9 would be a problem, the on-disk image format is different (we went through that whole painful migration in #8568 etc). I don't think anyone is using docker that old anymore.
If we're really going to default to an API version that old, there should be a comment in the code that states there is no actual reason for this, only a desire for maximal backwards compatibility.
This would avoid future concern about upping the API version when - inevitably - we'll run into a version of the Docker Engine that doesn't work with an API that old anymore.
It looks like Docker API 1.21 was introduced with Docker 1.9.0, on 2015-11-03, which is really old. For reference:
- Debian 10 (buster) ships with Docker 18.09, which has Docker API 1.39
- Ubuntu 18.04 (bionic) originally shipped with Docker 17.12, which has Docker API 1.35
Of course we use the docker package repos to install more recent versions of Docker. Even on CentOS 7 it seems that a recent docker is easily installed, cf. https://docs.docker.com/engine/install/centos/.
Updated by Tom Clegg over 2 years ago
15370-loopback-dispatchcloud @ bac1772ab074713e3c50632a4cad3cc1ce50d0ec -- developer-run-tests: #3163
updated crunch-run to docker API 1.35 and exported it as a const so arvados-package can stay in sync.
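A rough sketch of the idea, under assumptions: the constant name `DockerAPIVersion` and the helpers below are hypothetical, not the identifiers used in the commit. It shows one shared constant plus the kind of comparison implied by the earlier "client version 1.40 is too new. Maximum supported API version is 1.39" error.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// DockerAPIVersion is a hypothetical exported constant. Sharing one
// constant means two programs cannot drift apart on the pinned version.
const DockerAPIVersion = "1.35"

// parseAPIVersion splits a "major.minor" string into two integers.
func parseAPIVersion(v string) (major, minor int, err error) {
	parts := strings.SplitN(v, ".", 2)
	if len(parts) != 2 {
		return 0, 0, fmt.Errorf("invalid API version %q", v)
	}
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, fmt.Errorf("invalid API version %q: %w", v, err)
	}
	if minor, err = strconv.Atoi(parts[1]); err != nil {
		return 0, 0, fmt.Errorf("invalid API version %q: %w", v, err)
	}
	return major, minor, nil
}

// supportedBy reports whether a client API version is no newer than the
// daemon's maximum supported version. It does not model the daemon's
// minimum supported version.
func supportedBy(client, daemonMax string) (bool, error) {
	cMaj, cMin, err := parseAPIVersion(client)
	if err != nil {
		return false, err
	}
	dMaj, dMin, err := parseAPIVersion(daemonMax)
	if err != nil {
		return false, err
	}
	return cMaj < dMaj || (cMaj == dMaj && cMin <= dMin), nil
}

func main() {
	for _, client := range []string{DockerAPIVersion, "1.40"} {
		ok, err := supportedBy(client, "1.39")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("client %s against daemon max 1.39: supported=%v\n", client, ok)
	}
}
```

The versions are compared numerically rather than as strings, because a plain string comparison would order "1.9" after "1.21".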
Updated by Ward Vandewege over 2 years ago
Tom Clegg wrote:
15370-loopback-dispatchcloud @ bac1772ab074713e3c50632a4cad3cc1ce50d0ec -- developer-run-tests: #3163
updated crunch-run to docker API 1.35 and exported it as a const so arvados-package can stay in sync.
Thank you, that's great. LGTM!
Updated by Tom Clegg over 2 years ago
- Status changed from In Progress to Resolved
Applied in changeset arvados-private:commit:arvados|86660414472d4ff0d8267f9845a753497bd41692.