Idea #9623
closed
[Crunch2] [API] Reuses Containers to satisfy ContainerRequests
Added by Brett Smith over 8 years ago.
Updated over 8 years ago.
Release relationship:
Auto
Description
In ContainerRequest#resolve, before creating a new Container, find out whether a Container already exists with
- Exactly the same [command, cwd, environment, output_path, container_image, mounts, runtime_constraints] as the Container we're about to create
- state in [Queued, Locked, Running, Complete]
- (if state=Complete) exit_code==0
If multiple matching containers are found, take the first of:
- The most recent container with state=Complete
- The best Running container (order by progress desc, started_at asc -- IOW, most likely to finish soonest)
- The best Locked container (order by priority desc, created_at asc -- IOW, most likely to start soonest)
- The best Queued container (order by priority desc, created_at asc -- IOW, most likely to start soonest)
In order to efficiently compare equality of serialized fields like mounts, runtime_constraints, and environment, we must:
- Recursively sort hash keys before putting them in the database. This can be done in a Container before_save hook.
- Recursively sort hash keys before using them in a query like "where mounts=?". This must be done during ContainerRequest#resolve. (The keys do not need to be sorted in the ContainerRequest's serialized attrs.)
For now, we will
not implement:
- A mechanism allowing a client to skip re-use and force creation of a new container
- Category set to API
- Assigned To set to Tom Clegg
- Description updated (diff)
- Description updated (diff)
- Description updated (diff)
- Target version set to Arvados Future Sprints
- Story points set to 2.0
- Assigned To deleted (
Tom Clegg)
- Target version changed from Arvados Future Sprints to 2016-09-14 sprint
- Assigned To set to Lucas Di Pentima
- Status changed from New to In Progress
- As Tom clarified in IRC, please split the big SQL query in find_reusable into a few as below
- For state “Complete” : “select state=Complete group by output limit 2”. If you get 2, it means we have two different outputs and we do not want to reuse either. If you got just 1, do another query “select where state=Complete order by finished_at limit 1” and use this oldest record.
- If no containers in Complete state (none or can’t reuse), get all containers in state = Running order by progress desc, started_at asc limit 1 and use it
- Otherwise, get containers in state [Locked, Queued] order by state, priority desc, created_at asc, limit 1 and use it
- It appears that the deep_sort_hash is copied over from job.rb. Please consider moving it to arvados_base from job instead of copying
- Please remove output from other containers fixtures that are in non-Complete state
- container_test.rb
- You do not need to_json in these asserts in test “assert_equal c.environment.to_json, Container.deep_sort_hash(env).to_json”
- Progress is between 0.0 to 1.0, so it would be nice if you change progress 15.0 to 0.15 etc
- The test “find_reusable method should select running container most likely to finish sooner” has same progress for both first and second. Can you please add a test where progress is higher and that it gets selected
- Can you add another test like “find_reusable method should not select failed container” where there is one failed and one running (or any other state you like) and verify that the other gets selected?
- Ideally these new tests in container_test.rb could have been in container_request_test.rb so that we can see the reuse logic working as desired. Can you at least add a couple tests in container_request_test such that:
- CR1 in Committed state; CR2 in Uncommitted state + assert container_uuid is nil; update CR2 to Committed state and assert CR2.container_uuid == CR1.container_uuid
- CR1 in Committed state; CR2 with slightly different attributes in Uncommitted state + assert container_uuid is nil; update CR2 to Committed state and assert CR2.container_uuid != CR1.container_uuid
Previous corrections addressed at a02b4f41cb495ecfea49ac0da1f6da0aec385f7d
Added an additional test to check that no completed container is selected if there are more than 1 different outputs.
Regarding the deep_sort_hash(env).to_json comment: If we just compare hashes and not serialized hashes, the assert will be successful even if the hashes are sorted differently, that's why they're serialized on the assertion.
Complete test suite being run @ https://ci.curoverse.com/job/developer-test-job/193/
at a02b4f4
- find_reusable
- The "completed" SQL is pretty awesome :)
- The completed.order should be “finished_at asc” so that we fetch the “oldest” completed CR
- You can please accomplish the “Locked” and “Queued” state handling in one query using order by state asc
- In test "Container request #{(env1 == env2) ? 'does' : 'does not'} reuse container when committed" , please add a not_nil assert on cr1.container_uuid at line 416 (or check cr2.container_uuid is not_nil after line 422)
- In test "find_reusable method should not select completed container when inconsistent outputs exist”, you can call c_older and c_rerent as c_output1 and c_output2 instead
- Nit: If you don’t mind, please replace the “winner” with “selected or chosen container” in the tests?
Thanks.
Comments addressed at 3c69428
Thanks for the compliment! :)
- Status changed from In Progress to Resolved
- % Done changed from 50 to 100
Applied in changeset arvados|commit:6430b728cbce69b56adc01432fc9351088724efd.
Also available in: Atom
PDF