Idea #15960

open

Computing on external data

Added by Peter Amstutz almost 5 years ago. Updated about 2 months ago.

Status: New
Priority: Normal
Assigned To: -
Target version: -
Start date: 08/01/2024
Due date: 03/31/2025 (Due in about 3 months)
Story points: -
Release:
Release relationship: Auto

Description

Currently, the automatic HTTP download feature in cwl-runner effectively fills this role for users, although it copies the data into the local keepstore. Users would probably welcome extending it to also copy from s3:// URLs.

However, the big idea for this epic is on-demand retrieval from external storage: the data stays in the external system and is fetched only when it is needed.

Previous designs involved reading all the data to generate content hashes.

The current design is outlined in https://dev.arvados.org/issues/21936 and involves storing locators to external data in the manifest. The block identifiers are based on hashing the locator (and other metadata) instead of the content.
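
As a rough illustration of the idea of hashing the locator rather than the content, here is a minimal Python sketch. The function name `external_block_id`, the chosen metadata fields (byte offset, size, and a version tag such as an ETag), the canonical encoding, and the use of MD5 with a `+size` suffix (picked only to resemble the shape of existing Keep block locators) are all assumptions made for this example; the actual scheme is whatever #21936 specifies.

```python
import hashlib


def external_block_id(url: str, offset: int, size: int, version_tag: str = "") -> str:
    """Derive a block identifier from an external locator without reading the data.

    Hypothetical sketch: the fields and their encoding are assumptions,
    not the format defined in issue #21936.
    """
    # Join the locator and its metadata into one canonical string.
    canonical = "\n".join([url, str(offset), str(size), version_tag])
    # Hash the locator text itself; the external data is never fetched here.
    digest = hashlib.md5(canonical.encode("utf-8")).hexdigest()
    # Append the size, mirroring the "hash+size" shape of Keep locators.
    return f"{digest}+{size}"


if __name__ == "__main__":
    # Example URL is illustrative only.
    print(external_block_id("s3://example-bucket/reads.fastq", 0, 67108864, "etag-abc"))
```

Because the identifier depends only on the locator and metadata, it can be computed without downloading anything. The trade-off is that it identifies a reference to the data rather than the bytes themselves, which is why the sketch folds a version tag into the hash: the identifier then changes if the remote object is known to have changed.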


Related issues 4 (4 open, 0 closed)

Related to Arvados - Feature #8570: [Crunch2] Impure access to object store (New)
Related to Arvados - Feature #8569: [Crunch2] Impure mount from host fs (New)
Related to Arvados - Idea #17348: Example workflow template which streams data from S3 in first step, does some computation steps, and uploads results back to S3. (New)
Related to Arvados - Idea #21936: Minimum viable external data access feature (New)
#1

Updated by Peter Amstutz almost 5 years ago

  • Start date set to 04/01/2020
  • Due date set to 06/30/2020
#2

Updated by Peter Amstutz almost 5 years ago

  • Start date changed from 04/01/2020 to 05/01/2020
  • Due date changed from 06/30/2020 to 07/31/2020
#3

Updated by Peter Amstutz almost 5 years ago

  • Start date changed from 05/01/2020 to 04/01/2020
  • Due date changed from 07/31/2020 to 06/30/2020
#4

Updated by Peter Amstutz almost 5 years ago

  • Start date changed from 04/01/2020 to 05/01/2020
  • Due date changed from 06/30/2020 to 07/31/2020
#5

Updated by Peter Amstutz almost 5 years ago

  • Related to Feature #8570: [Crunch2] Impure access to object store added
#6

Updated by Peter Amstutz almost 5 years ago

  • Related to Feature #8569: [Crunch2] Impure mount from host fs added
#7

Updated by Peter Amstutz almost 5 years ago

  • Start date changed from 05/01/2020 to 04/01/2020
#8

Updated by Peter Amstutz almost 5 years ago

  • Due date changed from 07/31/2020 to 07/01/2020
#9

Updated by Peter Amstutz almost 5 years ago

  • Start date changed from 04/01/2020 to 05/01/2020
  • Due date changed from 07/01/2020 to 08/01/2020
#10

Updated by Peter Amstutz almost 5 years ago

  • Start date changed from 05/01/2020 to 08/01/2020
  • Due date changed from 08/01/2020 to 11/30/2020
#11

Updated by Peter Amstutz over 4 years ago

  • Start date changed from 08/01/2020 to 05/01/2020
  • Due date changed from 11/30/2020 to 06/30/2020
#12

Updated by Peter Amstutz over 4 years ago

  • Due date changed from 06/30/2020 to 07/31/2020
#13

Updated by Peter Amstutz over 4 years ago

  • Start date changed from 05/01/2020 to 05/20/2020
#14

Updated by Peter Amstutz over 4 years ago

  • Start date changed from 05/20/2020 to 06/03/2020
  • Due date changed from 07/31/2020 to 08/31/2020
#15

Updated by Peter Amstutz over 4 years ago

  • Start date changed from 06/03/2020 to 06/17/2020
  • Due date changed from 08/31/2020 to 09/16/2020
#16

Updated by Peter Amstutz over 4 years ago

  • Start date changed from 06/17/2020 to 07/29/2020
  • Due date changed from 09/16/2020 to 11/11/2020
#17

Updated by Peter Amstutz over 4 years ago

  • Start date changed from 07/29/2020 to 10/01/2020
  • Due date changed from 11/11/2020 to 01/31/2021
#18

Updated by Peter Amstutz about 4 years ago

  • Start date changed from 10/01/2020 to 01/01/2021
  • Due date changed from 01/31/2021 to 04/30/2021
#19

Updated by Peter Amstutz about 4 years ago

  • Start date changed from 01/01/2021 to 04/01/2021
  • Due date changed from 04/30/2021 to 07/31/2021
#20

Updated by Peter Amstutz almost 4 years ago

  • Start date changed from 04/01/2021 to 07/01/2021
  • Due date changed from 07/31/2021 to 11/30/2021
#21

Updated by Peter Amstutz almost 4 years ago

  • Related to Idea #17348: Example workflow template which streams data from S3 in first step, does some computation steps, and uploads results back to S3. added
#22

Updated by Peter Amstutz over 3 years ago

  • Start date changed from 07/01/2021 to 08/01/2021
  • Due date changed from 11/30/2021 to 12/31/2021
#23

Updated by Peter Amstutz over 3 years ago

  • Start date changed from 08/01/2021 to 09/01/2021
#24

Updated by Peter Amstutz over 3 years ago

  • Start date changed from 09/01/2021 to 10/01/2021
  • Due date changed from 12/31/2021 to 01/31/2022
#25

Updated by Peter Amstutz about 3 years ago

  • Start date changed from 10/01/2021 to 01/01/2022
  • Due date changed from 01/31/2022 to 06/30/2022
#26

Updated by Peter Amstutz about 3 years ago

  • Start date changed from 01/01/2022 to 06/01/2022
  • Due date changed from 06/30/2022 to 09/30/2022
#27

Updated by Peter Amstutz over 2 years ago

  • Start date changed from 06/01/2022 to 08/01/2022
  • Due date changed from 09/30/2022 to 11/30/2022
#28

Updated by Peter Amstutz over 2 years ago

  • Start date changed from 08/01/2022 to 10/01/2022
  • Due date changed from 11/30/2022 to 01/31/2023
#29

Updated by Peter Amstutz over 2 years ago

  • Start date changed from 10/01/2022 to 11/01/2022
  • Due date changed from 01/31/2023 to 02/28/2023
#30

Updated by Peter Amstutz almost 2 years ago

  • Due date changed from 02/28/2023 to 04/30/2023
#31

Updated by Peter Amstutz almost 2 years ago

  • Start date changed from 11/01/2022 to 03/01/2023
  • Due date changed from 04/30/2023 to 09/30/2023
#32

Updated by Peter Amstutz almost 2 years ago

  • Start date changed from 03/01/2023 to 05/01/2023
  • Due date changed from 09/30/2023 to 11/30/2023
#33

Updated by Peter Amstutz over 1 year ago

  • Start date changed from 05/01/2023 to 09/01/2023
  • Due date changed from 11/30/2023 to 12/31/2023
#34

Updated by Peter Amstutz over 1 year ago

  • Start date changed from 09/01/2023 to 01/01/2024
  • Due date changed from 12/31/2023 to 03/31/2024
#35

Updated by Peter Amstutz over 1 year ago

  • Description updated (diff)
#36

Updated by Peter Amstutz 12 months ago

  • Start date changed from 01/01/2024 to 01/01/2025
  • Due date changed from 03/31/2024 to 03/31/2025
#37

Updated by Peter Amstutz 10 months ago

  • Target version set to Future
#38

Updated by Peter Amstutz 6 months ago

  • Related to Idea #21936: Minimum viable external data access feature added
#39

Updated by Peter Amstutz 6 months ago

  • Description updated (diff)
#40

Updated by Peter Amstutz 6 months ago

  • Start date set to 08/01/2024
  • Due date set to 12/31/2024
#41

Updated by Peter Amstutz 6 months ago

  • Target version deleted (Future)
#42

Updated by Peter Amstutz about 2 months ago

  • Due date changed from 12/31/2024 to 03/31/2025