Feature #15397

closed

Declutter the API

Added by Peter Amstutz about 5 years ago. Updated about 1 month ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: API
Story points: 3.0
Release:
Release relationship: Auto

Description

There are several legacy tables: "humans", "specimens", and "traits". These were added extremely early on with the best intentions of supporting the PGP use case, but as far as I know they have never been used for their intended purpose. They should be deprecated and removed to de-clutter the API.

  1. Announce deprecation/removal in future version (already done)
  2. A few integration tests use these APIs because they are generic resources with no business logic. These tests need to be updated to use a different resource or otherwise perform the test a different way.
  3. Delete models, controllers, tests, and routes from API server
  4. Delete from documentation

These should not appear in the discovery document or the auto-generated python docs. The auto-generated SDKs (e.g. the R SDK) should also be updated.

We are also dropping support for hosted git repositories (the "repositories" table). This means we can delete arvados/services/githttpd.

We would also like to remove and stop publishing anything related to the jobs API, e.g. "jobs", "job_tasks", "pipeline_instances", "pipeline_templates", and "nodes".

We should also get rid of the "keep_disks" table.
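
One way to check the removals listed above is to scan a discovery document for resource names that should no longer be published. This is only a sketch: it assumes the discovery document is a JSON object with a top-level "resources" mapping keyed by resource name, and the helper name lingering_resources and the sample document are hypothetical.

```python
# Resource names this ticket proposes to stop publishing.
REMOVED_RESOURCES = {
    "humans", "specimens", "traits",
    "repositories",
    "jobs", "job_tasks", "pipeline_instances", "pipeline_templates", "nodes",
    "keep_disks",
}


def lingering_resources(discovery_doc):
    """Return the sorted names of removed resources still present in the document."""
    published = discovery_doc.get("resources", {})
    return sorted(REMOVED_RESOURCES & set(published))


# Hypothetical, heavily trimmed discovery document for illustration only.
sample_doc = {
    "resources": {
        "collections": {"methods": {}},
        "humans": {"methods": {}},
        "keep_disks": {"methods": {}},
    }
}

print(lingering_resources(sample_doc))
```

An empty result would indicate that none of the listed resources are still advertised.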

We should remove some unused fields from "api_client_authorization" response: default_owner_uuid, api_client_id, user_id (the _id fields may need to remain internally but should not be published by the API because they are not usable with any other API calls).
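
The distinction between "kept internally" and "published by the API" can be illustrated with a small filter. This is a sketch, not the API server's implementation: public_view is a hypothetical helper and the record values are placeholders.

```python
# Fields the ticket proposes to stop publishing in api_client_authorization responses.
UNPUBLISHED_FIELDS = ("default_owner_uuid", "api_client_id", "user_id")


def public_view(record):
    """Return a copy of the record without the fields that should not be published."""
    return {key: value for key, value in record.items() if key not in UNPUBLISHED_FIELDS}


# Hypothetical internal record; the values are placeholders, not real data.
internal_record = {
    "uuid": "placeholder-token-uuid",
    "owner_uuid": "placeholder-user-uuid",
    "scopes": ["all"],
    "api_client_id": 1,
    "user_id": 42,
    "default_owner_uuid": None,
}

print(sorted(public_view(internal_record)))
```

The internal record is left untouched, so the _id values remain available to server-side code.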

The updated_at column is redundant with modified_at. I believe updated_at should be dropped in all cases.

We should remove redundant resource methods from the Discovery Document (the Python and R autogenerated APIs already filter these out, so they are undocumented and nobody should be using them):

  • "show" (synonym for "get")
  • "index" (synonym for "list")
  • "destroy" (synonym for "delete")
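
The synonym list above could be applied to a discovery document roughly as follows. This is a minimal sketch assuming the document has the layout resources -> resource name -> "methods" -> method name; drop_redundant_methods and the sample document are hypothetical.

```python
import copy

# Synonym -> canonical method name, as listed above.
REDUNDANT_METHODS = {"show": "get", "index": "list", "destroy": "delete"}


def drop_redundant_methods(discovery_doc):
    """Return a copy of the document with synonym methods removed.

    A synonym is dropped only when its canonical counterpart exists, so no
    resource loses the operation entirely.
    """
    trimmed = copy.deepcopy(discovery_doc)
    for resource in trimmed.get("resources", {}).values():
        methods = resource.get("methods", {})
        for synonym, canonical in REDUNDANT_METHODS.items():
            if synonym in methods and canonical in methods:
                del methods[synonym]
    return trimmed


# Hypothetical, heavily trimmed discovery document for illustration only.
sample_doc = {
    "resources": {
        "collections": {
            "methods": {"get": {}, "show": {}, "list": {}, "index": {}, "delete": {}, "destroy": {}}
        }
    }
}

trimmed_doc = drop_redundant_methods(sample_doc)
print(sorted(trimmed_doc["resources"]["collections"]["methods"]))
```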

There's a "Managed" section under SLURM that has options related to the obsolete slurm-on-cloud configuration.

Another thing to get rid of: support for legacy component-specific config files.

Config options to remove:

Mail.IssueReporterEmailFrom, Mail.IssueReporterEmailTo -- only used by Workbench1 mailers

Mail.EmailFrom -- also only used by Workbench1 mailers

Mail.MailchimpAPIKey, Mail.MailchimpListID -- seem to be tied to a long lost arvados-mailchimp-plugin that we don't use and can't locate

If we delete all of these, the only option left is "SendUserSetupNotificationEmail" which should really migrate to the "Users" section where all the actually-in-use user notification mail options are located.
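
The config cleanup described above can be sketched as a transformation on a parsed cluster config. This assumes the config has already been loaded into nested dicts; migrate_mail_section is a hypothetical helper and the sample values are made up.

```python
import copy

# Mail options proposed for removal in this ticket.
OBSOLETE_MAIL_KEYS = (
    "IssueReporterEmailFrom",
    "IssueReporterEmailTo",
    "EmailFrom",
    "MailchimpAPIKey",
    "MailchimpListID",
)


def migrate_mail_section(cluster_config):
    """Return a copy with obsolete Mail keys removed and
    SendUserSetupNotificationEmail moved to the Users section."""
    cfg = copy.deepcopy(cluster_config)
    mail = cfg.get("Mail", {})
    for key in OBSOLETE_MAIL_KEYS:
        mail.pop(key, None)
    if "SendUserSetupNotificationEmail" in mail:
        value = mail.pop("SendUserSetupNotificationEmail")
        # A value already present under Users takes precedence.
        cfg.setdefault("Users", {}).setdefault("SendUserSetupNotificationEmail", value)
    if not mail:
        cfg.pop("Mail", None)  # drop the section once it is empty
    return cfg


# Hypothetical config fragment for illustration only.
sample_config = {
    "Mail": {
        "EmailFrom": "arvados@example.com",
        "MailchimpAPIKey": "",
        "SendUserSetupNotificationEmail": True,
    },
    "Users": {"AutoSetupNewUsers": False},
}

print(migrate_mail_section(sample_config))
```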


Subtasks: 2 (0 open, 2 closed)

Task #21683: Review 15397-remove-obsolete-apis | Resolved | Peter Amstutz | 05/06/2024
Task #21749: Review 15397-remove-obsolete-apis | Resolved | Tom Clegg | 06/14/2024

Related issues

Related to Arvados - Bug #10346: On the API docs (http://doc.arvados.org/api/), rearrange documentation so metadata features (humans, traits, specimens) do not distract/confuse people | Resolved | Peter Amstutz | 10/25/2016
Related to Arvados - Idea #15880: Remove hosted git service | Resolved | Tom Clegg
Related to Arvados Epics - Idea #20344: Arvados 3.0 | New | 08/01/2023 | 06/30/2024
Related to Arvados - Idea #20951: Document deprecated api_client_authorization fields | Resolved | Peter Amstutz
Related to Arvados - Feature #19929: Improve documentation in the discovery document | New | Brett Smith
Related to Arvados - Feature #21226: Fix or remove tests that use deprecated APIs | Duplicate | Tom Clegg
Related to Arvados - Bug #21416: Document mail-releated configuration options | Resolved | Peter Amstutz
Related to Arvados - Feature #21910: Remove api_clients APIs and api_client_id field | In Progress | Tom Clegg | 07/03/2024
Related to Arvados - Feature #18967: Drop unused columns and tables | New
Blocked by Arvados - Support #20840: Update documentation to make it clear certain APIs are deprecated | Resolved | Peter Amstutz | 09/03/2023
Blocks Arvados - Feature #21666: provision.sh uses arvados-client diagnostics instead of run-test.sh | Resolved | Lucas Di Pentima
Actions #1

Updated by Peter Amstutz about 5 years ago

  • Status changed from New to In Progress
Actions #2

Updated by Peter Amstutz about 5 years ago

  • Subject changed from Deprecate human/sample/specimen tables to Deprecate & remove human, specimens and traits tables
  • Description updated (diff)
  • Status changed from In Progress to New
Actions #3

Updated by Peter Amstutz about 5 years ago

  • Description updated (diff)
Actions #4

Updated by Tom Morris about 5 years ago

  • Related to Bug #10346: On the API docs (http://doc.arvados.org/api/), rearrange documentation so metadata features (humans, traits, specimens) do not distract/confuse people added
Actions #5

Updated by Peter Amstutz about 5 years ago

  • Description updated (diff)
Actions #6

Updated by Tom Morris about 5 years ago

  • Target version set to To Be Groomed
Actions #7

Updated by Tom Morris almost 5 years ago

  • Target version changed from To Be Groomed to Arvados Future Sprints
  • Story points set to 2.0
Actions #8

Updated by Peter Amstutz about 3 years ago

  • Target version deleted (Arvados Future Sprints)
Actions #9

Updated by Peter Amstutz over 1 year ago

  • Release set to 60
Actions #10

Updated by Peter Amstutz about 1 year ago

  • Release deleted (60)
  • Target version set to Future
Actions #11

Updated by Peter Amstutz about 1 year ago

  • Description updated (diff)
  • Subject changed from Deprecate & remove human, specimens and traits tables to Declutter the API
Actions #13

Updated by Peter Amstutz about 1 year ago

  • Related to Idea #15880: Remove hosted git service added
Actions #14

Updated by Peter Amstutz about 1 year ago

  • Description updated (diff)
Actions #15

Updated by Peter Amstutz about 1 year ago

  • Description updated (diff)
Actions #16

Updated by Peter Amstutz about 1 year ago

  • Description updated (diff)
Actions #17

Updated by Peter Amstutz about 1 year ago

  • Description updated (diff)
Actions #18

Updated by Peter Amstutz about 1 year ago

  • Description updated (diff)
Actions #19

Updated by Peter Amstutz 12 months ago

Actions #20

Updated by Peter Amstutz 12 months ago

  • Blocked by Support #20840: Update documentation to make it clear certain APIs are deprecated added
Actions #21

Updated by Brett Smith 10 months ago

  • Related to Idea #20951: Document deprecated api_client_authorization fields added
Actions #22

Updated by Peter Amstutz 10 months ago

  • Target version changed from Future to To be scheduled
Actions #23

Updated by Peter Amstutz 10 months ago

  • Target version changed from To be scheduled to Development 2023-11-29 sprint
Actions #24

Updated by Peter Amstutz 10 months ago

  • Related to Feature #19929: Improve documentation in the discovery document added
Actions #25

Updated by Peter Amstutz 10 months ago

  • Target version changed from Development 2023-11-29 sprint to Development 2023-11-08 sprint
Actions #26

Updated by Peter Amstutz 10 months ago

  • Category set to API
  • Subject changed from Declutter the API to Declutter the API
Actions #27

Updated by Peter Amstutz 9 months ago

  • Target version changed from Development 2023-11-08 sprint to Development 2023-11-29 sprint
Actions #28

Updated by Peter Amstutz 9 months ago

  • Target version changed from Development 2023-11-29 sprint to Future
Actions #29

Updated by Peter Amstutz 9 months ago

  • Target version changed from Future to Development 2024-01-03 sprint
Actions #30

Updated by Peter Amstutz 9 months ago

  • Description updated (diff)
Actions #31

Updated by Peter Amstutz 8 months ago

  • Target version changed from Development 2024-01-03 sprint to Development 2024-01-17 sprint
Actions #32

Updated by Peter Amstutz 8 months ago

  • Target version changed from Development 2024-01-17 sprint to Development 2024-01-03 sprint
Actions #33

Updated by Peter Amstutz 8 months ago

  • Description updated (diff)
Actions #34

Updated by Peter Amstutz 8 months ago

  • Description updated (diff)
Actions #35

Updated by Peter Amstutz 8 months ago

  • Story points changed from 2.0 to 3.0
Actions #36

Updated by Peter Amstutz 8 months ago

  • Related to Feature #21226: Fix or remove tests that use deprecated APIs added
Actions #37

Updated by Peter Amstutz 8 months ago

  • Target version changed from Development 2024-01-03 sprint to Development 2024-01-17 sprint
Actions #38

Updated by Peter Amstutz 7 months ago

  • Target version changed from Development 2024-01-17 sprint to Development 2024-01-31 sprint
Actions #39

Updated by Peter Amstutz 7 months ago

  • Release set to 70
Actions #40

Updated by Peter Amstutz 6 months ago

  • Target version changed from Development 2024-01-31 sprint to Development 2024-02-14 sprint
Actions #41

Updated by Peter Amstutz 6 months ago

  • Target version changed from Development 2024-02-14 sprint to Development 2024-02-28 sprint
Actions #42

Updated by Peter Amstutz 6 months ago

  • Description updated (diff)
Actions #43

Updated by Peter Amstutz 5 months ago

  • Target version changed from Development 2024-02-28 sprint to Development 2024-03-13 sprint
Actions #44

Updated by Peter Amstutz 5 months ago

  • Description updated (diff)
Actions #45

Updated by Peter Amstutz 5 months ago

  • Related to Bug #21416: Document mail-releated configuration options added
Actions #46

Updated by Peter Amstutz 5 months ago

  • Target version changed from Development 2024-03-13 sprint to Development 2024-03-27 sprint
Actions #47

Updated by Peter Amstutz 5 months ago

  • Target version changed from Development 2024-03-27 sprint to Development 2024-04-10 sprint
Actions #48

Updated by Peter Amstutz 5 months ago

  • Tracker changed from Idea to Feature
Actions #51

Updated by Peter Amstutz 4 months ago

  • Target version changed from Development 2024-04-10 sprint to Development 2024-04-24 sprint
Actions #52

Updated by Peter Amstutz 3 months ago

  • Target version changed from Development 2024-04-24 sprint to Development 2024-05-08 sprint
Actions #53

Updated by Peter Amstutz 3 months ago

  • Target version changed from Development 2024-05-08 sprint to Development 2024-04-24 sprint
Actions #54

Updated by Peter Amstutz 3 months ago

  • Assigned To set to Tom Clegg
Actions #55

Updated by Tom Clegg 3 months ago

  • Status changed from New to In Progress
  • Description updated (diff)
Actions #56

Updated by Lucas Di Pentima 3 months ago

  • Blocks Feature #21666: provision.sh uses arvados-client diagnostics instead of run-test.sh added
Actions #57

Updated by Peter Amstutz 3 months ago

  • Target version changed from Development 2024-04-24 sprint to Development 2024-05-08 sprint
Actions #58

Updated by Tom Clegg 3 months ago

15397-remove-obsolete-apis @ ae87427587b49319677f960edcf7a44d8145814c -- developer-run-tests: #4199

Done so far:
  • remove API endpoints and doc pages: humans, specimens, traits, repositories, nodes, pipeline_*, job*, keep_disks
  • remove services/githttpd and associated test/packaging/arvbox bits
  • remove git_tree mount type from crunch-run and docs
TODO:
  • some more things mentioned in issue description above (user_id, etc)
  • drop database tables? (this is irreversible, and would prevent downgrading to 2.x or even doing a rolling upgrade to 3.0 -- perhaps better to leave the tables alone for now)
  • remove "repositories" menu items etc. from services/workbench2
  • add "deprecated APIs have been removed" note to https://doc.arvados.org/main/admin/upgrading.html
Actions #59

Updated by Tom Clegg 3 months ago

I think it would be good to review & merge the above branch, and address the remaining TODOs in a subsequent branch. This branch changes a lot of files, and is already developing merge conflicts (e.g., #21611).

Actions #60

Updated by Peter Amstutz 3 months ago

  • Description updated (diff)
Actions #61

Updated by Peter Amstutz 3 months ago

I agree with the plan of merging in several stages to minimize merge conflicts.

15397-remove-obsolete-apis @ ae87427587b49319677f960edcf7a44d8145814c LGTM

Actions #62

Updated by Peter Amstutz 2 months ago

  • Target version changed from Development 2024-05-08 sprint to Development 2024-05-22 sprint
Actions #63

Updated by Brett Smith 2 months ago

I think after the last merge the documentation ToC needs to be updated to remove references to missing pages. See https://doc.arvados.org/main/api/index.html - on the left the "legacy" sections still appear and have weird empty links.

Actions #64

Updated by Tom Clegg 2 months ago

15397-remove-obsolete-apis @ 35779c1cc5e9666525432866b5e64eee9cb36a12 -- developer-run-tests: #4230

  • ✅ remove missing pages from ToC (#note-63)
  • ✅ remove all Python SDK parts that were marked deprecated in docs, update remaining usage in examples/tools/tests
  • ✅ remove destroy/index/show methods from discovery doc, update usage in examples/tools/tests
  • ✅ remove user_id, default_owner_uuid, updated_at fields from API responses
  • ✅ remove unused configs (mailchimp etc)
  • ✅ check that updated_at is already removed from API responses (2c157382b1ecf0175f0356d6c3a457dca942f5f3 in 2014)
plus some extra cleanup tasks even though they don't declutter the API:
  • ✅ remove some Python2-specific code from the supporting-python2-and-python3 era
  • ✅ fix some improperly quoted regexps (fix "invalid escape sequence \." warnings)
  • ✅ use assertRegex instead of assertRegexpMatches (fix deprecation warnings)
Todo/TBD:
  • remove api_client_id (this isn't as trivial as I'd hoped -- tests and install docs still rely on having tokens that differ only by api_client_id -- so I've set it aside for a separate follow-up branch)
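
For reference, the two Python cleanup items above (regexp quoting and assertRegex) look roughly like this in a test. This is an illustrative sketch, not code from the branch; the pattern and test names are made up.

```python
import unittest

# A raw string passes the backslash through to the regex engine unchanged.
# Writing the same pattern as a plain string ("\.") is what triggers the
# "invalid escape sequence" warning.
VERSION_PATTERN = r"^\d+\.\d+\.\d+$"


class VersionStringTest(unittest.TestCase):
    def test_matches_dotted_version(self):
        # assertRegex is the current name; assertRegexpMatches is the deprecated alias.
        self.assertRegex("3.0.0", VERSION_PATTERN)

    def test_dot_is_literal(self):
        # The escaped dot must not match an arbitrary character.
        self.assertNotRegex("3x0x0", VERSION_PATTERN)


def run_example_tests():
    """Run the example test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(VersionStringTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_example_tests()
```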
Actions #65

Updated by Peter Amstutz 2 months ago

  • Target version changed from Development 2024-05-22 sprint to Development 2024-06-05 sprint
Actions #66

Updated by Tom Clegg 2 months ago

15397-remove-obsolete-apis @ e98e166ceab6e377036fc87ce31e4d0d5238994f -- developer-run-tests: #4247
  • ✅ [things mentioned above in #note-64]
  • ✅ move remaining configs (SupportEmailAddress and SendUserSetupNotificationEmail) from Mail section to Users section
Todo:
  • remove hosted repositories feature from workbench
  • remove api_client_id
Actions #67

Updated by Peter Amstutz about 2 months ago

I'm having trouble building the Python API documentation:

2024-06-03_13:47:06.24723 running build_scripts
2024-06-03_13:47:06.64739 Traceback (most recent call last):
2024-06-03_13:47:06.64741   File "/opt/arvados-py/lib/python3.9/site-packages/pdoc/extract.py", line 218, in load_module
2024-06-03_13:47:06.64752     return importlib.import_module(module)
2024-06-03_13:47:06.64753   File "/usr/lib/python3.9/importlib/__init__.py", line 127, in import_module
2024-06-03_13:47:06.64760     return _bootstrap._gcd_import(name[level:], package, level)
2024-06-03_13:47:06.64760   File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
2024-06-03_13:47:06.64765   File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
2024-06-03_13:47:06.64769   File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
2024-06-03_13:47:06.64773   File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
2024-06-03_13:47:06.64777   File "<frozen importlib._bootstrap_external>", line 790, in exec_module
2024-06-03_13:47:06.64779   File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
2024-06-03_13:47:06.64782   File "../sdk/python/build/lib/arvados/stream.py", line 7, in <module>
2024-06-03_13:47:06.64786     from future.utils import listvalues
2024-06-03_13:47:06.64786 ModuleNotFoundError: No module named 'future'
2024-06-03_13:47:06.64787 
2024-06-03_13:47:06.64787 The above exception was the direct cause of the following exception:
2024-06-03_13:47:06.64788 
2024-06-03_13:47:06.64788 Traceback (most recent call last):
2024-06-03_13:47:06.64788   File "/usr/src/arvados/doc/pysdk_pdoc.py", line 59, in <module>
2024-06-03_13:47:06.64792     sys.exit(main(sys.argv[1:] or DEFAULT_ARGLIST))
2024-06-03_13:47:06.64792   File "/usr/src/arvados/doc/pysdk_pdoc.py", line 55, in main
2024-06-03_13:47:06.64797     pdoc.__main__.cli(arglist)
2024-06-03_13:47:06.64797   File "/opt/arvados-py/lib/python3.9/site-packages/pdoc/__main__.py", line 199, in cli
2024-06-03_13:47:06.64804     pdoc.pdoc(
2024-06-03_13:47:06.64805   File "/opt/arvados-py/lib/python3.9/site-packages/pdoc/__init__.py", line 520, in pdoc
2024-06-03_13:47:06.64841     all_modules[module_name] = doc.Module.from_name(module_name)
2024-06-03_13:47:06.64842   File "/opt/arvados-py/lib/python3.9/site-packages/pdoc/doc.py", line 404, in from_name
2024-06-03_13:47:06.64852     return cls(extract.load_module(name))
2024-06-03_13:47:06.64852   File "/usr/lib/python3.9/contextlib.py", line 79, in inner
2024-06-03_13:47:06.64858     return func(*args, **kwds)
2024-06-03_13:47:06.64858   File "/opt/arvados-py/lib/python3.9/site-packages/pdoc/extract.py", line 220, in load_module
2024-06-03_13:47:06.64866     raise RuntimeError(f"Error importing {module}") from e
2024-06-03_13:47:06.64867 RuntimeError: Error importing arvados.stream
2024-06-03_13:47:06.73333 rake aborted!
Actions #68

Updated by Brett Smith about 2 months ago

2024-06-03_13:47:06.64782   File "../sdk/python/build/lib/arvados/stream.py", line 7, in <module>
2024-06-03_13:47:06.64786     from future.utils import listvalues
2024-06-03_13:47:06.64786 ModuleNotFoundError: No module named 'future'

That import was removed in 873fcf181c037cc1e42419bfeaf5bb70c9d9e239 and should be gone for #21356. If it's still in the branch, I would start by folding in current main one way or another. If it's not, I would blow away all your Python build directories and see if that helps. (If it does, we can take a bug report against the documentation process being more reliable, but it would at least unblock review.)

Actions #69

Updated by Peter Amstutz about 2 months ago

I merged main:

15397-remove-obsolete-apis @ 6fa7f9fbcf20aa866eed0618bd09e1ce2e109baa

Between that and cleaning up arvados/sdk/python/build I was able to build the Python SDK docs.

Actions #70

Updated by Peter Amstutz about 2 months ago

If we want to merge to main again and then do a 3rd branch to finish up, I'm fine with that.

We need to have a discussion about whether to add migrations to ALTER TABLE and DROP TABLE.

The tables that are completely obsolete (jobs, humans etc) shouldn't clutter up structure.sql, so I think a migration to drop them is merited. To be extra careful, we could conditionally drop them only if they are empty.
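
The "drop only if empty" idea can be sketched as follows. Note the assumptions: this uses Python's built-in sqlite3 module purely for illustration, whereas the real change would be a migration against the production database; drop_if_empty and the table allowlist are hypothetical.

```python
import sqlite3

# Hypothetical allowlist; table names cannot be bound as SQL parameters,
# so only known names are ever interpolated into a statement.
OBSOLETE_TABLES = ("humans", "specimens", "traits", "jobs")


def drop_if_empty(conn, table):
    """Drop `table` only when it holds no rows; return True if it was dropped."""
    if table not in OBSOLETE_TABLES:
        raise ValueError(f"refusing to drop unexpected table: {table}")
    (has_rows,) = conn.execute(f'SELECT EXISTS (SELECT 1 FROM "{table}")').fetchone()
    if has_rows:
        return False
    conn.execute(f'DROP TABLE "{table}"')
    return True


# Demonstration with an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE humans (uuid TEXT)")
conn.execute("CREATE TABLE jobs (uuid TEXT)")
conn.execute("INSERT INTO jobs VALUES ('placeholder-job-uuid')")
conn.commit()

dropped = {name: drop_if_empty(conn, name) for name in ("humans", "jobs")}
remaining = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(dropped, remaining)
```

A table that still holds rows is left in place, so an operator can inspect it before deciding what to do.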

The updated_at column is everywhere. Running ALTER TABLE migration on virtually every table will be very expensive. On the other hand, some small percentage of the database is entirely wasted on storing this unused column -- it isn't free, it takes up storage space, and it gets loaded in queries. It might only be a 1% performance penalty but if we don't completely get rid of it now, we probably never will.

Actions #71

Updated by Peter Amstutz about 2 months ago

  • Target version changed from Development 2024-06-05 sprint to Development 2024-06-19 sprint
Actions #72

Updated by Tom Clegg about 1 month ago

Peter Amstutz wrote in #note-70:

We need to have a discussion about whether to add migrations to ALTER TABLE and DROP TABLE.

IMO we should
  1. release a version that does not rely on the tables/columns being present (we're already aiming for 3.0 because we want the externally visible API changes to happen between 2.x and 3.x)
  2. release a subsequent version that removes the tables/columns (this can be 3.1)
If we do both things in the same release, upgrading arvados-api-server to 3.0 on a single system node is destructive in that
  • any other arvados-api-server processes using the same database start crashing
  • downgrading to arvados-api-server 2.x results in a broken system (even in the single-node case)
  • the only way to abort a 3.0 upgrade after this point is to wipe and restore from backup
OTOH, with the two-stage approach
  • the database schema is bloated in 3.0, not trimmed until 3.1 (but this is not user visible -- the discovery doc is already trimmed)
  • there is a small performance penalty in 3.0, not fixed until 3.1
  • it is possible to abort a 3.0 upgrade without restoring database from backup

if we don't completely get rid of it now, we probably never will.

The way I see it, 3.0 is a rare opportunity to make API changes, like removing API response fields. But once that is done, each subsequent 3.x release is another opportunity to make non-API changes, like removing unused database columns.
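
A small illustration of why the first stage makes the second one safe: once the code names only the columns it needs, the same query works whether or not the unused column still exists. This is a sketch using Python's built-in sqlite3 module and a simplified, hypothetical "collections" schema, not the real database.

```python
import sqlite3

# Stage 1: application code names only the columns it actually uses.
QUERY = "SELECT uuid, modified_at FROM collections ORDER BY uuid"


def make_db(with_updated_at):
    """Build an in-memory database with or without the unused updated_at column."""
    conn = sqlite3.connect(":memory:")
    columns = "uuid TEXT, modified_at TEXT"
    row = ("placeholder-uuid", "2024-06-01")
    if with_updated_at:
        columns += ", updated_at TEXT"
        row += ("2024-06-01",)
    conn.execute(f"CREATE TABLE collections ({columns})")
    placeholders = ", ".join("?" for _ in row)
    conn.execute(f"INSERT INTO collections VALUES ({placeholders})", row)
    return conn


old_schema = make_db(with_updated_at=True)    # schema before the stage-2 migration
new_schema = make_db(with_updated_at=False)   # schema after the stage-2 migration

rows_old = old_schema.execute(QUERY).fetchall()
rows_new = new_schema.execute(QUERY).fetchall()
print(rows_old == rows_new)
```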

Actions #73

Updated by Peter Amstutz about 1 month ago

Tom Clegg wrote in #note-72:

Peter Amstutz wrote in #note-70:

We need to have a discussion about whether to add migrations to ALTER TABLE and DROP TABLE.

IMO we should
  1. release a version that does not rely on the tables/columns being present (we're already aiming for 3.0 because we want the externally visible API changes to happen between 2.x and 3.x)
  2. release a subsequent version that removes the tables/columns (this can be 3.1)
If we do both things in the same release, upgrading arvados-api-server to 3.0 on a single system node is destructive in that
  • any other arvados-api-server processes using the same database start crashing
  • downgrading to arvados-api-server 2.x results in a broken system (even in the single-node case)
  • the only way to abort a 3.0 upgrade after this point is to wipe and restore from backup
OTOH, with the two-stage approach
  • the database schema is bloated in 3.0, not trimmed until 3.1 (but this is not user visible -- the discovery doc is already trimmed)
  • there is a small performance penalty in 3.0, not fixed until 3.1
  • it is possible to abort a 3.0 upgrade without restoring database from backup

if we don't completely get rid of it now, we probably never will.

The way I see it, 3.0 is a rare opportunity to make API changes, like removing API response fields. But once that is done, each subsequent 3.x release is another opportunity to make non-API changes, like removing unused database columns.

This is a good plan.

Actions #74

Updated by Peter Amstutz about 1 month ago

  • Do a final check

Follow ups

  • Check the R and Java SDKs
  • Add ticket to purge tables in Arvados 3.1
Actions #75

Updated by Tom Clegg about 1 month ago

  • Related to Feature #21910: Remove api_clients APIs and api_client_id field added
Actions #76

Updated by Tom Clegg about 1 month ago

  • Status changed from In Progress to Resolved
Actions #77

Updated by Tom Clegg about 1 month ago
