Feature #15397
Declutter the API
Status: closed
Description
There are several legacy tables: "humans", "specimens", and "traits". These were added extremely early on with the best intentions of supporting the PGP use case, but as far as I know they have never been used for their intended purpose. They should be deprecated and removed to de-clutter the API.
- Announce deprecation/removal in future version (already done)
- A few integration tests use these APIs because they are generic resources with no business logic. These tests need to be updated to use a different resource or otherwise perform the test a different way.
- Delete models, controllers, tests, and routes from API server
- Delete from documentation
These should not appear in the discovery document or the auto-generated python docs. The auto-generated SDKs (e.g. the R SDK) should also be updated.
We are also dropping the hosted git repository support (the "repositories" table). This means we can delete arvados/services/githttpd.
We would also like to remove and stop publishing anything related to the jobs API, e.g. "jobs" and "job_tasks", "pipeline_instances", "pipeline_templates", "nodes".
We should also get rid of the "keep_disks" table.
We should remove some unused fields from the "api_client_authorization" response: default_owner_uuid, api_client_id, user_id (the _id fields may need to remain internally but should not be published by the API because they are not usable with any other API calls).
The updated_at column is redundant with modified_at; I believe updated_at should be dropped in all cases.
We should remove redundant resource methods from the Discovery Document (the Python and R autogenerated APIs already filter these out, so they are undocumented and nobody should be using them):
- "show" (synonym for "get")
- "index" (synonym for "list")
- "destroy" (synonym for "delete")
There's a "Managed" section under SLURM that has options related to the obsolete slurm-on-cloud configuration.
Another thing to get rid of: support for legacy component-specific config files.
Config options to remove:
Mail.IssueReporterEmailFrom, Mail.IssueReporterEmailTo -- only used by Workbench1 mailers
Mail.EmailFrom -- also only used by Workbench1 mailers
Mail.MailchimpAPIKey, Mail.MailchimpListID -- seem to be tied to a long lost arvados-mailchimp-plugin that we don't use and can't locate
If we delete all of these, the only option left is "SendUserSetupNotificationEmail" which should really migrate to the "Users" section where all the actually-in-use user notification mail options are located.
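As a rough illustration of the cleanup listed above, the sketch below treats a cluster configuration as a plain nested dict, drops the obsolete Mail options, and moves SendUserSetupNotificationEmail into the Users section. The `migrate_mail_section` helper, the dict layout, and the sample values (including `SomeOtherOption`) are assumptions made for this example, not the actual config loader.

```python
# Options this ticket proposes removing from the "Mail" section.
OBSOLETE_MAIL_KEYS = (
    "IssueReporterEmailFrom",
    "IssueReporterEmailTo",
    "EmailFrom",
    "MailchimpAPIKey",
    "MailchimpListID",
)
# Option this ticket proposes moving from "Mail" to "Users".
MOVED_KEY = "SendUserSetupNotificationEmail"


def migrate_mail_section(cluster_config):
    """Return (new_config, dropped_keys) without mutating the input.

    Hypothetical helper: assumes {"<Section>": {"<Option>": value}}.
    """
    new_config = {section: dict(values) for section, values in cluster_config.items()}
    mail = new_config.get("Mail", {})
    dropped = sorted(key for key in OBSOLETE_MAIL_KEYS if key in mail)
    for key in dropped:
        del mail[key]
    if MOVED_KEY in mail:
        new_config.setdefault("Users", {})[MOVED_KEY] = mail.pop(MOVED_KEY)
    # Remove the Mail section entirely once nothing is left in it.
    if "Mail" in new_config and not mail:
        del new_config["Mail"]
    return new_config, dropped


# Illustrative values only.
sample_config = {
    "Mail": {
        "EmailFrom": "arvados@example.com",
        "MailchimpAPIKey": "",
        "SendUserSetupNotificationEmail": True,
    },
    "Users": {"SomeOtherOption": False},
}

new_config, dropped = migrate_mail_section(sample_config)
print(new_config, dropped)
```

Returning the list of dropped keys would let an upgrade tool warn administrators about options that no longer have any effect.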
Updated by Peter Amstutz over 5 years ago
- Status changed from New to In Progress
Updated by Peter Amstutz over 5 years ago
- Subject changed from Deprecate human/sample/specimen tables to Deprecate & remove human, specimens and traits tables
- Description updated (diff)
- Status changed from In Progress to New
Updated by Tom Morris over 5 years ago
- Related to Bug #10346: On the API docs (http://doc.arvados.org/api/), rearrange documentation so metadata features (humans, traits, specimens) do not distract/confuse people added
Updated by Tom Morris over 5 years ago
- Target version changed from To Be Groomed to Arvados Future Sprints
- Story points set to 2.0
Updated by Peter Amstutz over 3 years ago
- Target version deleted (Arvados Future Sprints)
Updated by Peter Amstutz over 1 year ago
- Release deleted (60)
- Target version set to Future
Updated by Peter Amstutz over 1 year ago
- Description updated (diff)
- Subject changed from Deprecate & remove human, specimens and traits tables to Declutter the API
Updated by Peter Amstutz over 1 year ago
- Related to Idea #15880: Remove hosted git service added
Updated by Peter Amstutz over 1 year ago
- Related to Idea #20344: Arvados 3.0 added
Updated by Peter Amstutz over 1 year ago
- Blocked by Support #20840: Update documentation to make it clear certain APIs are deprecated added
Updated by Brett Smith over 1 year ago
- Related to Idea #20951: Document deprecated api_client_authorization fields added
Updated by Peter Amstutz over 1 year ago
- Target version changed from Future to To be scheduled
Updated by Peter Amstutz over 1 year ago
- Target version changed from To be scheduled to Development 2023-11-29 sprint
Updated by Peter Amstutz over 1 year ago
- Related to Feature #19929: Improve documentation in the discovery document added
Updated by Peter Amstutz over 1 year ago
- Target version changed from Development 2023-11-29 sprint to Development 2023-11-08 sprint
Updated by Peter Amstutz over 1 year ago
- Category set to API
- Subject changed from Declutter the API to Declutter the API
Updated by Peter Amstutz about 1 year ago
- Target version changed from Development 2023-11-08 sprint to Development 2023-11-29 sprint
Updated by Peter Amstutz about 1 year ago
- Target version changed from Development 2023-11-29 sprint to Future
Updated by Peter Amstutz about 1 year ago
- Target version changed from Future to Development 2024-01-03 sprint
Updated by Peter Amstutz about 1 year ago
- Target version changed from Development 2024-01-03 sprint to Development 2024-01-17 sprint
Updated by Peter Amstutz about 1 year ago
- Target version changed from Development 2024-01-17 sprint to Development 2024-01-03 sprint
Updated by Peter Amstutz about 1 year ago
- Related to Feature #21226: Fix or remove tests that use deprecated APIs added
Updated by Peter Amstutz about 1 year ago
- Target version changed from Development 2024-01-03 sprint to Development 2024-01-17 sprint
Updated by Peter Amstutz about 1 year ago
- Target version changed from Development 2024-01-17 sprint to Development 2024-01-31 sprint
Updated by Peter Amstutz 12 months ago
- Target version changed from Development 2024-01-31 sprint to Development 2024-02-14 sprint
Updated by Peter Amstutz 11 months ago
- Target version changed from Development 2024-02-14 sprint to Development 2024-02-28 sprint
Updated by Peter Amstutz 11 months ago
- Target version changed from Development 2024-02-28 sprint to Development 2024-03-13 sprint
Updated by Peter Amstutz 11 months ago
- Related to Bug #21416: Document mail-releated configuration options added
Updated by Peter Amstutz 10 months ago
- Target version changed from Development 2024-03-13 sprint to Development 2024-03-27 sprint
Updated by Peter Amstutz 10 months ago
- Target version changed from Development 2024-03-27 sprint to Development 2024-04-10 sprint
Updated by Peter Amstutz 9 months ago
- Target version changed from Development 2024-04-10 sprint to Development 2024-04-24 sprint
Updated by Peter Amstutz 9 months ago
- Target version changed from Development 2024-04-24 sprint to Development 2024-05-08 sprint
Updated by Peter Amstutz 9 months ago
- Target version changed from Development 2024-05-08 sprint to Development 2024-04-24 sprint
Updated by Lucas Di Pentima 9 months ago
- Blocks Feature #21666: provision.sh uses arvados-client diagnostics instead of run-test.sh added
Updated by Peter Amstutz 9 months ago
- Target version changed from Development 2024-04-24 sprint to Development 2024-05-08 sprint
Updated by Tom Clegg 8 months ago
15397-remove-obsolete-apis @ ae87427587b49319677f960edcf7a44d8145814c -- developer-run-tests: #4199
Done so far:
- remove API endpoints and doc pages: humans, specimens, traits, repositories, nodes, pipeline_*, job*, keep_disks
- remove services/githttpd and associated test/packaging/arvbox bits
- remove git_tree mount type from crunch-run and docs
- some more things mentioned in issue description above (user_id, etc)
- drop database tables? (this is irreversible, and would prevent downgrading to 2.x or even doing a rolling upgrade to 3.0 -- perhaps better to leave the tables alone for now)
- remove "repositories" menu items etc. from services/workbench2
- add "deprecated APIs have been removed" note to https://doc.arvados.org/main/admin/upgrading.html
Updated by Peter Amstutz 8 months ago
I agree with the plan of merging in several stages to minimize merge conflicts.
15397-remove-obsolete-apis @ ae87427587b49319677f960edcf7a44d8145814c LGTM
Updated by Peter Amstutz 8 months ago
- Target version changed from Development 2024-05-08 sprint to Development 2024-05-22 sprint
Updated by Brett Smith 8 months ago
I think after the last merge the documentation ToC needs to be updated to remove references to missing pages. See https://doc.arvados.org/main/api/index.html - on the left the "legacy" sections still appear and have weird empty links.
Updated by Tom Clegg 8 months ago
15397-remove-obsolete-apis @ 35779c1cc5e9666525432866b5e64eee9cb36a12 -- developer-run-tests: #4230
- ✅ remove missing pages from ToC (#note-63)
- ✅ remove all Python SDK parts that were marked deprecated in docs, update remaining usage in examples/tools/tests
- ✅ remove destroy/index/show methods from discovery doc, update usage in examples/tools/tests
- ✅ remove user_id, default_owner_uuid, updated_at fields from API responses
- ✅ remove unused configs (mailchimp etc)
- ✅ check that updated_at is already removed from API responses (2c157382b1ecf0175f0356d6c3a457dca942f5f3 in 2014)
- ✅ remove some Python2-specific code from the supporting-python2-and-python3 era
- ✅ fix some improperly quoted regexps (fix "invalid escape sequence \." warnings)
- ✅ use assertRegex instead of assertRegexpMatches (fix deprecation warnings)
- remove api_client_id (this isn't as trivial as I'd hoped -- tests and install docs still rely on having tokens that differ only by api_client_id -- so I've set it aside for a separate follow-up branch)
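For readers unfamiliar with the two Python test cleanups in the list above, here is a small self-contained sketch. The pattern and test class are invented for illustration and are not taken from the Arvados test suite.

```python
import unittest

# In a normal string literal, "\." is an invalid escape sequence and recent
# Python versions warn about it. A raw string passes the backslash through to
# the regex engine unchanged, so the dot is matched literally.
VERSION_PATTERN = r"^\d+\.\d+\.\d+$"


class VersionStringTest(unittest.TestCase):
    def test_version_matches(self):
        # assertRegex is the current name; assertRegexpMatches was a
        # deprecated alias and was removed in Python 3.12.
        self.assertRegex("3.0.0", VERSION_PATTERN)

    def test_dot_is_literal(self):
        # With the dot escaped, other separators must not match.
        self.assertNotRegex("3x0x0", VERSION_PATTERN)


# Run the tests programmatically so the result object can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(VersionStringTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```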
Updated by Peter Amstutz 8 months ago
- Target version changed from Development 2024-05-22 sprint to Development 2024-06-05 sprint
Updated by Tom Clegg 8 months ago
- ✅ [things mentioned above in #note-64]
- ✅ move remaining configs (SupportEmailAddress and SendUserSetupNotificationEmail) from Mail section to Users section
- remove hosted repositories feature from workbench
- remove api_client_id
Updated by Peter Amstutz 7 months ago
I'm having trouble building the Python API documentation:
2024-06-03_13:47:06.24723 running build_scripts
2024-06-03_13:47:06.64739 Traceback (most recent call last):
2024-06-03_13:47:06.64741   File "/opt/arvados-py/lib/python3.9/site-packages/pdoc/extract.py", line 218, in load_module
2024-06-03_13:47:06.64752     return importlib.import_module(module)
2024-06-03_13:47:06.64753   File "/usr/lib/python3.9/importlib/__init__.py", line 127, in import_module
2024-06-03_13:47:06.64760     return _bootstrap._gcd_import(name[level:], package, level)
2024-06-03_13:47:06.64760   File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
2024-06-03_13:47:06.64765   File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
2024-06-03_13:47:06.64769   File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
2024-06-03_13:47:06.64773   File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
2024-06-03_13:47:06.64777   File "<frozen importlib._bootstrap_external>", line 790, in exec_module
2024-06-03_13:47:06.64779   File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
2024-06-03_13:47:06.64782   File "../sdk/python/build/lib/arvados/stream.py", line 7, in <module>
2024-06-03_13:47:06.64786     from future.utils import listvalues
2024-06-03_13:47:06.64786 ModuleNotFoundError: No module named 'future'
2024-06-03_13:47:06.64787
2024-06-03_13:47:06.64787 The above exception was the direct cause of the following exception:
2024-06-03_13:47:06.64788
2024-06-03_13:47:06.64788 Traceback (most recent call last):
2024-06-03_13:47:06.64788   File "/usr/src/arvados/doc/pysdk_pdoc.py", line 59, in <module>
2024-06-03_13:47:06.64792     sys.exit(main(sys.argv[1:] or DEFAULT_ARGLIST))
2024-06-03_13:47:06.64792   File "/usr/src/arvados/doc/pysdk_pdoc.py", line 55, in main
2024-06-03_13:47:06.64797     pdoc.__main__.cli(arglist)
2024-06-03_13:47:06.64797   File "/opt/arvados-py/lib/python3.9/site-packages/pdoc/__main__.py", line 199, in cli
2024-06-03_13:47:06.64804     pdoc.pdoc(
2024-06-03_13:47:06.64805   File "/opt/arvados-py/lib/python3.9/site-packages/pdoc/__init__.py", line 520, in pdoc
2024-06-03_13:47:06.64841     all_modules[module_name] = doc.Module.from_name(module_name)
2024-06-03_13:47:06.64842   File "/opt/arvados-py/lib/python3.9/site-packages/pdoc/doc.py", line 404, in from_name
2024-06-03_13:47:06.64852     return cls(extract.load_module(name))
2024-06-03_13:47:06.64852   File "/usr/lib/python3.9/contextlib.py", line 79, in inner
2024-06-03_13:47:06.64858     return func(*args, **kwds)
2024-06-03_13:47:06.64858   File "/opt/arvados-py/lib/python3.9/site-packages/pdoc/extract.py", line 220, in load_module
2024-06-03_13:47:06.64866     raise RuntimeError(f"Error importing {module}") from e
2024-06-03_13:47:06.64867 RuntimeError: Error importing arvados.stream
2024-06-03_13:47:06.73333 rake aborted!
Updated by Brett Smith 7 months ago
2024-06-03_13:47:06.64782   File "../sdk/python/build/lib/arvados/stream.py", line 7, in <module>
2024-06-03_13:47:06.64786     from future.utils import listvalues
2024-06-03_13:47:06.64786 ModuleNotFoundError: No module named 'future'
That import was removed in 873fcf181c037cc1e42419bfeaf5bb70c9d9e239 and should be gone for #21356. If it's still in the branch, I would start by folding in current main one way or another. If it's not, I would blow away all your Python build directories and see if that helps. (If it does, we can take a bug report about making the documentation process more reliable, but it would at least unblock review.)
Updated by Peter Amstutz 7 months ago
I merged main:
15397-remove-obsolete-apis @ 6fa7f9fbcf20aa866eed0618bd09e1ce2e109baa
Between that and cleaning up arvados/sdk/python/build I was able to build the Python SDK docs.
Updated by Peter Amstutz 7 months ago
If we want to merge to main again and then do a 3rd branch to finish up, I'm fine with that.
We need to have a discussion about whether to add migrations to ALTER TABLE and DROP TABLE.
The tables that are completely obsolete (jobs, humans, etc.) shouldn't clutter up structure.sql, so I think a migration to drop them is merited. To be extra careful, we could conditionally drop them only if they are empty.
The updated_at column is everywhere. Running an ALTER TABLE migration on virtually every table will be very expensive. On the other hand, some small percentage of the database is entirely wasted on storing this unused column -- it isn't free, it takes up storage space, and it gets loaded in queries. It might only be a 1% performance penalty but if we don't completely get rid of it now, we probably never will.
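The "drop only if empty" idea above could look roughly like the following. This is a sketch under stated assumptions: the real change would be a Rails migration against PostgreSQL, whereas this example uses Python's built-in sqlite3 module with an in-memory database purely so it is self-contained. The `drop_table_if_empty` helper and the sample rows are hypothetical.

```python
import sqlite3


def drop_table_if_empty(conn, table):
    """Drop `table` only when it exists and has no rows.

    Returns True if the table was dropped. `table` must come from a trusted,
    hard-coded list: identifiers cannot be bound as SQL parameters, so the
    name is interpolated into the statement text.
    """
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchall()
    if not exists:
        return False
    has_rows = conn.execute(f'SELECT 1 FROM "{table}" LIMIT 1').fetchall()
    if has_rows:
        # Leave populated tables alone so no data is destroyed silently.
        return False
    conn.execute(f'DROP TABLE "{table}"')
    return True


# Stand-in database: one empty obsolete table, one that still holds a row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE humans (uuid TEXT)")
conn.execute("CREATE TABLE jobs (uuid TEXT)")
conn.execute("INSERT INTO jobs VALUES ('example-uuid')")

results = {name: drop_table_if_empty(conn, name) for name in ("humans", "jobs")}
remaining_tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
print(results, remaining_tables)
```

A guard like this keeps the migration safe for sites that did store data in a legacy table, at the cost of leaving the schema different from one installation to the next.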
Updated by Peter Amstutz 7 months ago
- Target version changed from Development 2024-06-05 sprint to Development 2024-06-19 sprint
Updated by Tom Clegg 7 months ago
Peter Amstutz wrote in #note-70:
> We need to have a discussion about whether to add migrations to ALTER TABLE and DROP TABLE.
IMO we should
- release a version that does not rely on the tables/columns being present (we're already aiming for 3.0 because we want the externally visible API changes to happen between 2.x and 3.x)
- release a subsequent version that removes the tables/columns (this can be 3.1)
If we do both things in the same release, upgrading arvados-api-server to 3.0 on a single system node is destructive in that
- any other arvados-api-server processes using the same database start crashing
- downgrading to arvados-api-server 2.x results in a broken system (even in the single-node case)
- the only way to abort a 3.0 upgrade after this point is to wipe and restore from backup
OTOH, with the two-stage approach
- the database schema is bloated in 3.0, not trimmed until 3.1 (but this is not user visible -- the discovery doc is already trimmed)
- there is a small performance penalty in 3.0, not fixed until 3.1
- it is possible to abort a 3.0 upgrade without restoring database from backup
> if we don't completely get rid of it now, we probably never will.
The way I see it, 3.0 is a rare opportunity to make API changes, like removing API response fields. But once that is done, each subsequent 3.x release is another opportunity to make non-API changes, like removing unused database columns.
Updated by Peter Amstutz 7 months ago
Tom Clegg wrote in #note-72:
> Peter Amstutz wrote in #note-70:
>> We need to have a discussion about whether to add migrations to ALTER TABLE and DROP TABLE.
> IMO we should
> - release a version that does not rely on the tables/columns being present (we're already aiming for 3.0 because we want the externally visible API changes to happen between 2.x and 3.x)
> - release a subsequent version that removes the tables/columns (this can be 3.1)
> If we do both things in the same release, upgrading arvados-api-server to 3.0 on a single system node is destructive in that
> - any other arvados-api-server processes using the same database start crashing
> - downgrading to arvados-api-server 2.x results in a broken system (even in the single-node case)
> - the only way to abort a 3.0 upgrade after this point is to wipe and restore from backup
> OTOH, with the two-stage approach
> - the database schema is bloated in 3.0, not trimmed until 3.1 (but this is not user visible -- the discovery doc is already trimmed)
> - there is a small performance penalty in 3.0, not fixed until 3.1
> - it is possible to abort a 3.0 upgrade without restoring database from backup
>> if we don't completely get rid of it now, we probably never will.
> The way I see it, 3.0 is a rare opportunity to make API changes, like removing API response fields. But once that is done, each subsequent 3.x release is another opportunity to make non-API changes, like removing unused database columns.
This is a good plan.
This is a good plan.
Updated by Peter Amstutz 7 months ago
- Do a final check
Follow ups
- Check the R and Java SDKs
- Add ticket to purge tables in Arvados 3.1
Updated by Tom Clegg 7 months ago
- Related to Feature #21910: Remove api_clients APIs and api_client_id field added
Updated by Tom Clegg 7 months ago
- Related to Feature #18967: Drop unused columns and tables added
Updated by Peter Amstutz 3 months ago
- Related to Bug #22198: Get rid of 'href` in response added