Idea #15960

open

Computing on external data

Added by Peter Amstutz over 4 years ago. Updated 5 days ago.

Status:
New
Priority:
Normal
Assigned To:
-
Target version:
-
Start date:
08/01/2024
Due date:
12/31/2024 (Due in about 6 months)
Story points:
-
Release:
Release relationship:
Auto

Description

Right now, automatic HTTP download in cwl-runner effectively fills this role for users, although it copies the data into the local keepstore. Users would probably also like it to support copying from s3:// URLs.

However, the big idea for this epic is on-demand retrieval from external storage: rather than copying data in up front, we fetch it from the external system at the point where it is actually read.
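
To make the on-demand idea concrete, here is a minimal sketch and not Arvados code. The class name `OnDemandBlock` and the caller-supplied `fetch` function are hypothetical; the point is only that nothing is read from the external system until the first `read()`.

```python
class OnDemandBlock:
    """Hypothetical wrapper: holds a locator and fetches the data lazily."""

    def __init__(self, locator, fetch):
        self.locator = locator  # e.g. an http(s):// or s3:// URL (assumption)
        self._fetch = fetch     # caller-supplied function: locator -> bytes
        self._data = None       # nothing is retrieved at construction time

    def read(self):
        # Contact the external system only on first access, then reuse the result.
        if self._data is None:
            self._data = self._fetch(self.locator)
        return self._data
```

A real implementation would also need to handle partial reads, failures, and data that changes upstream, none of which this sketch attempts.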

Previous designs involved reading all the data to generate content hashes.

The current design is outlined in https://dev.arvados.org/issues/21936 and involves storing locators to external data in the manifest. The block identifiers are based on hashing the locator (and other metadata) instead of the content.
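
As a rough illustration of hashing the locator and metadata instead of the content, the sketch below derives an identifier without reading any data. It is not the format specified in #21936: the function name, the choice of fields (`url`, `offset`, `size`, `etag`), the use of SHA-256, and the `digest+size` layout are all assumptions made for the example.

```python
import hashlib
import json


def external_block_id(url, offset, size, etag=None):
    """Hypothetical: build a block identifier from a locator plus metadata."""
    # Serialize the fields canonically so equal inputs always hash identically.
    descriptor = json.dumps(
        {"url": url, "offset": offset, "size": size, "etag": etag},
        sort_keys=True,
        separators=(",", ":"),
    )
    digest = hashlib.sha256(descriptor.encode("utf-8")).hexdigest()
    # Append the size so the identifier carries the block length (assumed layout).
    return f"{digest}+{size}"
```

Because the content is never read, two identifiers match only when the locator and metadata match. Including something like an ETag in the hash is one possible way to get a new identifier when the remote object changes.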


Related issues

Related to Arvados - Feature #8570: [Crunch2] Impure access to object store (New)
Related to Arvados - Feature #8569: [Crunch2] Impure mount from host fs (New)
Related to Arvados - Idea #17348: Example workflow template which streams data from S3 in first step, does some computation steps, and uploads results back to S3. (New)
Related to Arvados - Idea #21936: Minimum viable external data access feature (New)
Actions #1

Updated by Peter Amstutz over 4 years ago

  • Start date set to 04/01/2020
  • Due date set to 06/30/2020
Actions #2

Updated by Peter Amstutz over 4 years ago

  • Start date changed from 04/01/2020 to 05/01/2020
  • Due date changed from 06/30/2020 to 07/31/2020
Actions #3

Updated by Peter Amstutz over 4 years ago

  • Start date changed from 05/01/2020 to 04/01/2020
  • Due date changed from 07/31/2020 to 06/30/2020
Actions #4

Updated by Peter Amstutz over 4 years ago

  • Start date changed from 04/01/2020 to 05/01/2020
  • Due date changed from 06/30/2020 to 07/31/2020
Actions #5

Updated by Peter Amstutz over 4 years ago

  • Related to Feature #8570: [Crunch2] Impure access to object store added
Actions #6

Updated by Peter Amstutz over 4 years ago

  • Related to Feature #8569: [Crunch2] Impure mount from host fs added
Actions #7

Updated by Peter Amstutz over 4 years ago

  • Start date changed from 05/01/2020 to 04/01/2020
Actions #8

Updated by Peter Amstutz over 4 years ago

  • Due date changed from 07/31/2020 to 07/01/2020
Actions #9

Updated by Peter Amstutz over 4 years ago

  • Start date changed from 04/01/2020 to 05/01/2020
  • Due date changed from 07/01/2020 to 08/01/2020
Actions #10

Updated by Peter Amstutz over 4 years ago

  • Start date changed from 05/01/2020 to 08/01/2020
  • Due date changed from 08/01/2020 to 11/30/2020
Actions #11

Updated by Peter Amstutz over 4 years ago

  • Start date changed from 08/01/2020 to 05/01/2020
  • Due date changed from 11/30/2020 to 06/30/2020
Actions #12

Updated by Peter Amstutz over 4 years ago

  • Due date changed from 06/30/2020 to 07/31/2020
Actions #13

Updated by Peter Amstutz about 4 years ago

  • Start date changed from 05/01/2020 to 05/20/2020
Actions #14

Updated by Peter Amstutz about 4 years ago

  • Start date changed from 05/20/2020 to 06/03/2020
  • Due date changed from 07/31/2020 to 08/31/2020
Actions #15

Updated by Peter Amstutz about 4 years ago

  • Start date changed from 06/03/2020 to 06/17/2020
  • Due date changed from 08/31/2020 to 09/16/2020
Actions #16

Updated by Peter Amstutz about 4 years ago

  • Start date changed from 06/17/2020 to 07/29/2020
  • Due date changed from 09/16/2020 to 11/11/2020
Actions #17

Updated by Peter Amstutz about 4 years ago

  • Start date changed from 07/29/2020 to 10/01/2020
  • Due date changed from 11/11/2020 to 01/31/2021
Actions #18

Updated by Peter Amstutz almost 4 years ago

  • Start date changed from 10/01/2020 to 01/01/2021
  • Due date changed from 01/31/2021 to 04/30/2021
Actions #19

Updated by Peter Amstutz over 3 years ago

  • Start date changed from 01/01/2021 to 04/01/2021
  • Due date changed from 04/30/2021 to 07/31/2021
Actions #20

Updated by Peter Amstutz over 3 years ago

  • Start date changed from 04/01/2021 to 07/01/2021
  • Due date changed from 07/31/2021 to 11/30/2021
Actions #21

Updated by Peter Amstutz over 3 years ago

  • Related to Idea #17348: Example workflow template which streams data from S3 in first step, does some computation steps, and uploads results back to S3. added
Actions #22

Updated by Peter Amstutz about 3 years ago

  • Start date changed from 07/01/2021 to 08/01/2021
  • Due date changed from 11/30/2021 to 12/31/2021
Actions #23

Updated by Peter Amstutz about 3 years ago

  • Start date changed from 08/01/2021 to 09/01/2021
Actions #24

Updated by Peter Amstutz almost 3 years ago

  • Start date changed from 09/01/2021 to 10/01/2021
  • Due date changed from 12/31/2021 to 01/31/2022
Actions #25

Updated by Peter Amstutz almost 3 years ago

  • Start date changed from 10/01/2021 to 01/01/2022
  • Due date changed from 01/31/2022 to 06/30/2022
Actions #26

Updated by Peter Amstutz over 2 years ago

  • Start date changed from 01/01/2022 to 06/01/2022
  • Due date changed from 06/30/2022 to 09/30/2022
Actions #27

Updated by Peter Amstutz about 2 years ago

  • Start date changed from 06/01/2022 to 08/01/2022
  • Due date changed from 09/30/2022 to 11/30/2022
Actions #28

Updated by Peter Amstutz about 2 years ago

  • Start date changed from 08/01/2022 to 10/01/2022
  • Due date changed from 11/30/2022 to 01/31/2023
Actions #29

Updated by Peter Amstutz almost 2 years ago

  • Start date changed from 10/01/2022 to 11/01/2022
  • Due date changed from 01/31/2023 to 02/28/2023
Actions #30

Updated by Peter Amstutz over 1 year ago

  • Due date changed from 02/28/2023 to 04/30/2023
Actions #31

Updated by Peter Amstutz over 1 year ago

  • Start date changed from 11/01/2022 to 03/01/2023
  • Due date changed from 04/30/2023 to 09/30/2023
Actions #32

Updated by Peter Amstutz over 1 year ago

  • Start date changed from 03/01/2023 to 05/01/2023
  • Due date changed from 09/30/2023 to 11/30/2023
Actions #33

Updated by Peter Amstutz about 1 year ago

  • Start date changed from 05/01/2023 to 09/01/2023
  • Due date changed from 11/30/2023 to 12/31/2023
Actions #34

Updated by Peter Amstutz about 1 year ago

  • Start date changed from 09/01/2023 to 01/01/2024
  • Due date changed from 12/31/2023 to 03/31/2024
Actions #35

Updated by Peter Amstutz about 1 year ago

  • Description updated (diff)
Actions #36

Updated by Peter Amstutz 6 months ago

  • Start date changed from 01/01/2024 to 01/01/2025
  • Due date changed from 03/31/2024 to 03/31/2025
Actions #37

Updated by Peter Amstutz 4 months ago

  • Target version set to Future
Actions #38

Updated by Peter Amstutz 6 days ago

  • Related to Idea #21936: Minimum viable external data access feature added
Actions #39

Updated by Peter Amstutz 6 days ago

  • Description updated (diff)
Actions #40

Updated by Peter Amstutz 5 days ago

  • Start date set to 08/01/2024
  • Due date set to 12/31/2024
Actions #41

Updated by Peter Amstutz 5 days ago

  • Target version deleted (Future)