Bug #10543

implement approximate (estimated) counts for API list method

Added by Joshua Randall over 2 years ago. Updated over 1 year ago.


Implement a -count=estimate option for API list queries that returns an estimated (approximate) row count in `items_available` instead of the exact count (or no count at all, as the 'none' option introduced in #9998 allows).

Postgres has a simple way of getting an approximate row count for an entire table very quickly, and a somewhat more involved way of getting an approximate count for more sophisticated queries (https://wiki.postgresql.org/wiki/Count_estimate), which should still be much faster than a full table scan.
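To make the two approaches concrete, here is a rough sketch of what they could look like. This is not a proposed implementation. The table name `collections` and the helper `estimate_from_explain` are placeholders made up for illustration. The whole-table case reads `reltuples` from `pg_class`; the filtered case runs plain `EXPLAIN` on the query and pulls the planner's `rows=` estimate out of the top plan node, which is the same idea as the function on the wiki page.

```python
import re

# Whole-table estimate: read the planner statistics stored in pg_class.
# "collections" is only an example table name (assumption).
TABLE_ESTIMATE_SQL = (
    "SELECT reltuples::bigint FROM pg_class WHERE relname = 'collections'"
)

# Matches the "rows=N" estimate that plain EXPLAIN prints for each plan node.
_ROWS_RE = re.compile(r"\brows=(\d+)")


def estimate_from_explain(plan_lines):
    """Return the planner's row estimate for a query.

    plan_lines is the text output of `EXPLAIN <query>` (one string per
    line). The first line describes the top plan node, so the first
    rows= value found is the estimate for the whole query. Returns
    None if no estimate is present.
    """
    for line in plan_lines:
        match = _ROWS_RE.search(line)
        if match:
            return int(match.group(1))
    return None


# Example EXPLAIN output for a filtered query (illustrative values only).
example_plan = [
    "Seq Scan on collections  (cost=0.00..458.00 rows=10000 width=244)",
    "  Filter: (owner_uuid = 'example'::text)",
]
estimate = estimate_from_explain(example_plan)
```

Both numbers come from planner statistics, so their accuracy depends on how recently the table was analyzed.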

This could be used anywhere only an approximate count is needed. That could include:
- to populate a UI that displays the number of pages available rather than the count
- to populate a UI that displays the number of items available in approximate terms (e.g. instead of showing "Data Collections (7323212)", Workbench could say "Data Collections (7.3M)")
- to create an appropriately sized data structure to accommodate all the data (e.g. to set the collection map size at the beginning of a keep-balance run, which already uses 110% of the returned value)
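As a rough illustration of how a client might consume an estimated `items_available` in the three cases above, here is a small sketch. The function names and the page size are hypothetical, not part of any existing Arvados client; the 110% factor is the one mentioned for keep-balance.

```python
import math


def page_count(estimated_items, page_size):
    """Number of pages needed to show estimated_items at page_size per page."""
    return math.ceil(estimated_items / page_size)


def format_approx(n):
    """Render a count in approximate terms, e.g. 7323212 -> '7.3M'."""
    for threshold, suffix in ((1_000_000_000, "B"), (1_000_000, "M"), (1_000, "K")):
        if n >= threshold:
            return f"{n / threshold:.1f}{suffix}"
    return str(n)


def initial_map_size(estimated_items):
    """Size a container at 110% of the estimate, using integer arithmetic."""
    return estimated_items * 110 // 100


estimate = 7323212  # the example count from the list above
pages = page_count(estimate, 1000)  # page size of 1000 is an assumption
label = f"Data Collections ({format_approx(estimate)})"
capacity = initial_map_size(estimate)
```

None of these uses needs the exact count: an estimate that is off by a few percent changes neither the label nor, in practice, the sizing.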


#1 Updated by Tom Morris over 1 year ago

  • Target version set to Arvados Future Sprints
