Project

General

Profile

Actions

Bug #22127

closed

event loop slowdowns during normal data table page loads

Added by Peter Amstutz 6 months ago. Updated 12 days ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
Workbench2
Target version:
Story points:
-
Release relationship:
Auto

Description

The red bars represent a stalled event loop, this means the browser is basically unresponsive during these times. With a debug build, total time spent is about 8s.

The release build is better, but event loop is still blocked for about 2s:

One thing that is suspicious is that the stacked blue bars represent "requestContainerStatusCount" and it is only when all those requests are completed that there is a break in the event loop being blocked.


Files

wb2-debug-jank.png (325 KB) wb2-debug-jank.png Peter Amstutz, 09/17/2024 05:55 PM
wb2-release-jank.png (342 KB) wb2-release-jank.png Peter Amstutz, 09/17/2024 05:58 PM
profile_w_stagger.png (279 KB) profile_w_stagger.png Lisa Knox, 12/18/2024 09:50 PM
projectpanelatarefresh.png (344 KB) projectpanelatarefresh.png Lisa Knox, 12/18/2024 09:50 PM
actionCount_base.png (98.7 KB) actionCount_base.png Lisa Knox, 12/18/2024 09:50 PM
actionCount_no_setCount.png (94.8 KB) actionCount_no_setCount.png Lisa Knox, 12/18/2024 09:50 PM
actionCount_no_midware.png (89.3 KB) actionCount_no_midware.png Lisa Knox, 12/18/2024 09:50 PM
actionCount_modified.png (95.1 KB) actionCount_modified.png Lisa Knox, 12/18/2024 09:58 PM
arvados-workbench2-3.1.0~dev20250210170035-1.x86_64.rpm (4.76 MB) arvados-workbench2-3.1.0~dev20250210170035-1.x86_64.rpm Peter Amstutz, 02/17/2025 04:27 PM

Subtasks 2 (0 open2 closed)

Task #22440: Review 22127-wb2-optimizationResolvedStephen Smith02/19/2025Actions
Task #22441: Review vibes ResolvedPeter Amstutz01/14/2025Actions
Actions #1

Updated by Peter Amstutz 6 months ago

  • Subject changed from filterResources is inefficient to event loop slowdowns during normal data table page loads
Actions #2

Updated by Peter Amstutz 6 months ago

  • Assigned To changed from Peter Amstutz to Stephen Smith
Actions #4

Updated by Peter Amstutz 6 months ago

From the "flame" chart, it also looks like it might be doing multiple expensive re-renders in that 'blocked event loop' period. Perhaps this is happening each time the redux state changes? What is happening here is that actions trigger other actions which independently mutate the state in several steps, which might trigger intermediate renders that are never seen by the user but are still slows things down.

Actions #5

Updated by Peter Amstutz 6 months ago

  • Target version changed from Development 2024-09-25 sprint to Development 2024-10-09 sprint
Actions #6

Updated by Peter Amstutz 5 months ago

  • Assigned To deleted (Stephen Smith)
Actions #7

Updated by Peter Amstutz 5 months ago

  • Target version changed from Development 2024-10-09 sprint to Development 2024-10-23 sprint
Actions #8

Updated by Peter Amstutz 5 months ago

  • Target version changed from Development 2024-10-23 sprint to Development 2024-11-06 sprint
Actions #9

Updated by Peter Amstutz 4 months ago

  • Target version changed from Development 2024-11-06 sprint to Development 2024-11-20
Actions #10

Updated by Peter Amstutz 4 months ago

  • Target version changed from Development 2024-11-20 to Development 2024-12-04
Actions #11

Updated by Peter Amstutz 3 months ago

  • Target version changed from Development 2024-12-04 to Future
Actions #12

Updated by Peter Amstutz 3 months ago

  • Target version changed from Future to Development 2025-01-29
Actions #13

Updated by Peter Amstutz 3 months ago

  • Target version changed from Development 2025-01-29 to Development 2025-01-08
  • Assigned To set to Lisa Knox

Updated by Lisa Knox 3 months ago

What I believe is the primary source of lag is the way Data Explorer middlewares are initiated combined with the fact that all each new middleware triggers its own SET_COLUMNS call in each Data Explorer. There are 19 Data Explorers and 1 middleware for each, leading to an n^2 situation (19 * 19 = 361). Each Data Explorer has its own set of actions that are dispatched as well.

Here are all the actions dispatched when Workbench, viewing the Project Panel, is refreshed on the main branch (note that many things happen exactly 19 times, some happen 38 times (19 * 2), and SET_COLUMNS happens 380 times (19^2 + 19):

If we remove the calls to SET_COLUMNS, we see a marked improvement:

If we remove the middlwares that are not directly affecting the Project Panel:

In this state, workbench works 100% as expected as long as the only Data Explorer viewed is the Project Panel, including switching between projects. There is still some lag when switching between projects. This might be addressed by using web workers to offload some of the work to another JS thread and/or using scheduler.yield() where appropriate, which is where I will focus my efforts after this initial phase.

The long-term solution to the SET_COLUMNS problem is a substantial rewrite with the aim of lazily-loading each Data Explorer when needed, instead of loading all 19 on the initial page load/refresh.

I am still investigating alternatives to our current middleware implementation. These cannot be done lazily, but if we can offload their side effects until they are needed, we can still save a thousands of unnecessary actions.

I also investigated the "requestContainerStatusCount" issue, and implemented a wrapper that just yields control momentarily after each of the requests resolves. This doesn't speed up the page load time (it actually increases it by a tiny but nonzero amount), but it does break up the jank at several points, meaning UI input can be handled more often.

I have also applied some memoization to reduce some of the more problematic actions. The Data Table now renders twice instead of 16 times per page load, and a slight refactor means the side panel tree item renderer gets called 98 times vs 512 times. The reason for this is that it used a calculation that was located within the mapStateToDispatch for that component. This means it was re-calculated every time Redux state updated, whether or not the calculation was relevant to the component. Best prectice is that mapStateToProps should only select and map state properties directly, not compute derived data. Going through and removing any similar instances of this will go a long way to removing excess re-renders.

So far, my work on 22127-optimization @ 1b8d65a0bf940d5a7fe4a13bafc377ecdd3d0dad has yielded the following results:

The gross jank time, beginning to end, in the above picture is just under 3 seconds with the longest individual jank lasting 835ms. By comparison, Instagram, possibly the most optimized React application there is, averages 1.5-2 seconds of jank time on (logged-in) refresh.

Actions #15

Updated by Lisa Knox 2 months ago

  • Status changed from New to In Progress
Actions #16

Updated by Lisa Knox 2 months ago

We should also change it so that as soon as the user clicks to navigate to a new url, a loading screen is shown until the next resource is loaded. This will help mitigate visual confusion, as it is possible now that the Details Card of the new resource will be displayed before the data table changes.

Actions #17

Updated by Peter Amstutz 2 months ago

Lisa Knox wrote in #note-16:

We should also change it so that as soon as the user clicks to navigate to a new url, a loading screen is shown until the next resource is loaded. This will help mitigate visual confusion, as it is possible now that the Details Card of the new resource will be displayed before the data table changes.

+1

We discussed a few weeks ago using "skeleton" loading indicators (#22360), which are a bit less jarring & produce less flicker (compared to wiping the whole panel and showing a spinner) when transitioning between contents. I agree that when the old visual content stays on screen too long, it's very confusing.

Actions #18

Updated by Peter Amstutz about 2 months ago

  • Target version changed from Development 2025-01-08 to Development 2025-01-29
Actions #19

Updated by Peter Amstutz about 2 months ago

  • Subtask #22440 added
Actions #20

Updated by Peter Amstutz about 2 months ago

  • Subtask #22441 added
Actions #21

Updated by Lisa Knox about 2 months ago

22127-optimization @ ef59cf0e2bd8eb09df15f2d3e6951078df8caba4

developer-run-tests-services-workbench2: #1396

  • - All agreed upon points are implemented / addressed.
    • Points implemented/addressed are listed below
  • ✅ Anything not implemented (discovered or discussed during work) has a follow-up story.
    • The biggest improvement that could still be made is lazily loading the Data Explorers, but this is a major refactor that is probably not worth the effort
  • ✅ Code is tested and passing, both automated and manual, what manual testing was done is described
    • Manually tested by clicking around madly and trying to trigger UI changes while the page was loading
  • ✅ New or changed UX/UX and has gotten feedback from stakeholders.
  • - Documentation has been updated.
    • n/a
  • ✅ Behaves appropriately at the intended scale (describe intended scale).
    • Intended scale is a single user navigating through the appplication
  • ✅ Considered backwards and forwards compatibility issues between client and server.
    • No changes to server interaction
  • ✅ Follows our coding standards and GUI style guidelines.
Notes:
  • The main improvements here:
    • `fetchProcessProgressBarStatus` now resolves promises sequentially instead of using `Promise.all()`
    • Context Menus are now rendered lazily instead of on Data Explorer initialization
    • moved several calculations that were done inside `mapStateToProps` to inside their components
    • memoized lots of calculations and components, focusing on those that rerender excessively
    • added `shouldComponentUpdate` to the larger class components so they only update when necessary
    • removed several unused props in various components
    • updated tests because some components now take different props
    • moved contextMenu test from component to e2e testing
Actions #22

Updated by Lisa Knox about 2 months ago

22127-wb-optimization @ 4fb545539ef6c7453aab4058f0cc415ca2a34b63

developer-run-tests-services-workbench2: #1398

The changes in #22159 are incompatible with the changes in this branch. After some discussion, the decision was made to revert those changes in favor of these changes. The 22127-wb-optimization branch is the new branch rebased onto main after reverting #22159.

Actions #23

Updated by Peter Amstutz about 1 month ago

  • Release set to 75
Actions #24

Updated by Peter Amstutz about 1 month ago

  • Target version changed from Development 2025-01-29 to Development 2025-02-12
Actions #25

Updated by Stephen Smith about 1 month ago

  • I like the use of preventRerender to suppress extra renderings, but I think it would be easier to understand if it was inverted to the positive eg. shouldRender, so that understanding it doesn't involve double negatives like preventRerender=false means rendering
  • In the subprocess progress bar, the dataExplorerId is intended to be used when the progress bar should respect the type filter of an associated data explorer. Since the subprocess panel doesn't have a type filter, having dataExplorerId default to the subprocess panel doesn't really do anything, and could cause confusion when it's left blank (and defaults to subprocess panel), so I would recommend leaving that without a default
  • On the project panel runs tab, it seems that changing the type filter takes effect, but the filter selection UI doesn't update, and the progress bar also doesn't re-render to update to the new selection if the new filter changes the number of failed runs
  • Also in the subprocess progress bar, the typeFilter prop is no longer used since it's calculated inside the component, so it can be removed from ProgressBarDataProps
  • I noticed that progressIndicatorActions is now progressIndicatorsActions, since it's really 1 shared progress indicator I think keeping it singular is better since it's easier to type and also matches the filename
  • In collection-panel.tsx:291, we can get rid of the type cast if we wrap the contents in if(this.state.item), should make it a bit safer
  • I noticed the change from dispatching getResource to indexing into state.resources, if the removal of the dispatch is necessary for performance, I would recommend abstracting that into a helper (maybe getResourceFromState or something) with type param T extends Resource = Resource that returns T | undefined, as it is, indexing into the resource array causes TS to think it's guaranteed to get a Resource back, so this would make sure the dispatch-less resource fetch has the same type safety
  • I also noticed that getProperty is replaced by indexing into state.properties, since getProperty is synchronous and not dispatched but a closure that just indexes into the state.properties that it's passed, I think the performance should be the same so I think for readability we should stick to getProperty
  • Similar thing with getUserUuid in collection-panel.tsx which was replaced with state.auth.user?.uuid, getUserUuid is basically doing the same thing so I think it shouldn't hurt performance to stick to using it
Actions #26

Updated by Peter Amstutz about 1 month ago

Stephen Smith wrote in #note-25:

  • I like the use of preventRerender to suppress extra renderings, but I think it would be easier to understand if it was inverted to the positive eg. shouldRender, so that understanding it doesn't involve double negatives like preventRerender=false means rendering

I had glanced at the code before and had the same thought, but I think what's going on here is that the semantics of react.memo is testing for equality of old/new values. So probably we just want to give the methods a less confusing name, like hasSameContent().

Actions #27

Updated by Lisa Knox 29 days ago

22127-wb-optimization @ 1674e22384022cd705a01819ff234718600883b5

developer-run-tests-services-workbench2: #1408

  • I like the use of preventRerender to suppress extra renderings, but I think...

I'm actually going to push back on this one, as I went back and forth several times on the name of this function and decided that prevertRerender is best. The callback is defined by react to only allow rerender if it returns false. Peter's suggestion of hasSameContent() doesn't tell you what the function does, and someone who is not familiar with React.memo wouldn't know what the function does at first glance. There is a comment at each instance explaining to return true to prevent re-render, false to allow re-render. Not willing to die on this hill, but I feel strongly that preventRerender is the best name for this function.

  • In the subprocess progress bar, the dataExplorerId is intended...

I put the default in during an intermediate stage and forgot to remove it.

  • On the project panel runs tab, it seems that changing the type filter...

Filter selection and progress bar update properly now

  • Also in the subprocess progress bar, the typeFilter prop is no longer used since it's calculated inside the component, so it can be removed from ProgressBarDataProps

done

  • I noticed that progressIndicatorActions is now progressIndicatorsActions, since it's really 1 shared progress indicator I think keeping it singular is better since it's easier to type and also matches the filename

Again, at an intermediate stage, that rename made sense.

  • In collection-panel.tsx:291, we can get rid of the type cast if we wrap the contents in if(this.state.item), should make it a bit safer

done

  • I noticed the change from dispatching getResource to indexing...

Fixed, and getResourceFromState was applied a few other placed to take advantage of its memoization. There were some places where getResourceFromState would not work, possibly due to the memoization of the child component? In these places, I added explicit type annotations and null checks instead.

  • I also noticed that getProperty is replaced by indexing into state.properties...

done

I also memoized breadcrumbs, which got skipped in the first iteration of the ticket

Actions #28

Updated by Stephen Smith 29 days ago

  • I've just realized that getResource is not a dispatchable thunk so performance-wise it should be about as performant as:
const getResource = <T extends Resource = Resource>(id: string, state: ResourcesState): T | undefined => {
    return state[id] as T;
}

And at any rate the getResourceFromState is functionally identical to getResource which is what clued me in lol, so the original use of getResource doesn't actually involve dispatch and just adds a plan JS function call overhead on top of indexing into the resource state which shouldn't need any more optimization beyond memoization and doesn't need a separate helper - sorry for the runaround on that one

  • The project panel runs tab filters now work when clicked, but the progress bar doesn't update when the type filter is changed - the existing behavior is that when the type filter is set to run/step/run+step then the progress bar reflects the progress of that type (status filters don't affect it since any status filter would result in a single segmented progress bar)
  • I also saw there are still a few remaining instances of directly indexing into resources / properties that can be reverted:
    • project-panel-run.tsx
      • getResource<ProjectResource>(projectUuid)(state.resources);
      • getProjectPanelCurrentUuid(state) || "";
    • details-card-root.tsx - getResource(properties.currentRouteUuid)(resources);
    • workflow-panel.tsx - getProperty<string>(WORKFLOW_PANEL_DETAILS_UUID)(state.properties);
    • breadcrumbs.ts - (getProperty<Breadcrumb[]>(BREADCRUMBS)(properties) || [])
Actions #29

Updated by Lisa Knox 27 days ago

22127-wb-optimization @ 5ff94d2dfe2fa28db530c0bfb2c7768b525f23b3
developer-run-tests-services-workbench2: #1416

The Details Card reversion caused a series of other bugs that it was decided in discussion were not worth chasing down. That reversion was skipped and the indexing into the state object remains.

The Project details card had a similar intermittent bug with the current item being unavailable, so I added a null check.

Everything else was updated per feedback

Actions #30

Updated by Stephen Smith 26 days ago

Passing this back for now after some discussion about the progress bar not updating and reversion of the removal of getState/Property

Actions #31

Updated by Lisa Knox 21 days ago

22127-wb2-optimization @ fffd7eada66be02cf4f3eb1ad452222a2175af0d
developer-run-tests-services-workbench2: #1430
(all workbench tests pass)

All points addressed, plus I discovered a bug where items in the sidePanelTree wouldn't update their frozen status, which is fixed now.

Actions #32

Updated by Peter Amstutz 20 days ago

  • Target version changed from Development 2025-02-12 to Development 2025-02-26
Actions #33

Updated by Stephen Smith 19 days ago

  • It looks like there are a bunch of merge conflicts now with main, so I would recommend pulling in main again
  • In tree.tsx, the TreeItemWeight enum can be auto incrementing for simplicity by removing = N from each line
  • main-panel-root.tsx and progress-indicator-actions.ts, there are 2 WORKBENCH_LOADING_SCREEN const definitions
  • collection-panel.tsx I noticed that componentDidMount checks if this.state.item is set, but since state.item is initialized as null and the only places that setState is inside componentDidMount and componentDidUpdate, which I think is only called after componentDidMount, I think that check will always return false preventing that block from running. It looks like the purpose is to initialize the state if there is an item, so should it be checking if(!this.state.item) or something else? I'm also wondering if checking the state is necessary since you might want to always overwrite the state with the collection fetched using match.params.id 1 line above, regardless if there's already state. I'm not sure but just want to double check that
    • In the same file it also doesn't look like isFrozen is being set anywhere aside from the state initialization to null, but it's checked in 1 place
  • In resources.ts, the getResource nested memoize doesn't seem to work if the resource changes but the ID is the same - the outer memoize will return the same value without evaluating whether the resource store in the inner memoize changed. Recommend removing the outer memoize so only memoizing on the inner function that accepts resource state
  • In group-details-panel we should stick to getUserUuid(state) for the user uuid
  • Also we can use getCurrentGroupDetailsPanelUuid(state.properties) since we already use that helper function elsewhere
  • In project-panel-run and project-panel we should use getProjectPanelCurrentUuid(state) to get the projectUuid/currentItemId to keep things consistent
  • In project-input line 59 we should use getUserUuid(state) since it does the same thing and is more consistent
  • I noticed a lot of guards for getResource for undefined UUIDs, what if we change getResource to accept string | undefined and then do id ? state[id] as T : undefined, then all consumers can just call getResource without worry (it already returns undefined when resource isn't found by indexing into resources with an invalid key so callers shouldn't need any changes to handle that)
Actions #34

Updated by Peter Amstutz 15 days ago

Attached in-development rocky8 package with Workbench 2 optimizations, for testing.

Actions #35

Updated by Lisa Knox 14 days ago

22127-wb2-optimization @ 0f277306342386aba05c8c6569740ed6381f9be0

developer-run-tests-services-workbench2: #1436

All bullet points addressed and main merged into this branch.

Actions #36

Updated by Stephen Smith 13 days ago

This lgtm!

Actions #37

Updated by Lisa Knox 13 days ago

  • Status changed from In Progress to Resolved
Actions

Also available in: Atom PDF