November 8, 2020
Some time ago, I posted a screenshot on Twitter showing a flame-chart from the Profiler tool. At that time, I was working on improving the performance of an application we were developing at Egnyte. A certain functionality, for a large amount of data, took an incredibly long time - 3.5 minutes! During this time, the application displayed a "spinner," and the user didn't know if something was happening or if it had frozen. After a few days of working with the Profiler, I managed to implement improvements that reduced the calculation time from 3.5 minutes to 35 seconds. In this post, I would like to describe how I achieved this.
Let's start with a description of the functionality that, to put it briefly, was lacking in terms of performance. One of the main views shows a list of folders with files containing sensitive data. These can be credit card numbers, medical data, personal data, and more. This data can fall under one of several built-in policies, such as HIPAA or GDPR, but we also allow users to define their own policy, for example, based on a previously created dictionary. Initially, the "Sensitive Content" view showed only a flat list of folders. Last year, our Product Owner, together with the UX team, concluded that working with a flat list of folders could be inefficient, and that a much better, more natural way of representing the data would be a folder tree.
There had been several approaches to folder trees in our project. They mainly relied on the unique `folderId` property. Unfortunately, in the case of "Sensitive Content" we couldn't use it, because not every folder contains sensitive data, and only folders that do receive a `folderId`. Take this path:

`/Shared/A/B/`

In this case, we have 3 folders (Shared, A, B), of which only B has a `folderId`.
There can be a multitude of such SC (Sensitive Content) locations. The performance issue appeared already at 250K locations, and it was no edge case: we confirmed it with a client who turned out to have almost a million folders. For a list of 1M elements, the tree-building time was 3.5 minutes. My task, therefore, was to make the tree for 1M elements build in less than 60 seconds.
Returning to the algorithm. Very simple, or so it seemed :)

- split each location path by the `/` separator
- if a path segment has no `folderId`, treat it as a meta-folder

For simplicity, I omit the fact that we support different data sources and that these separators can vary greatly :) Moreover, as it later turned out, some data sources can have two folders with the same name at the same nesting level 😱 And how do you distinguish those? I will also skip the fact that the resulting tree was to be presented as a "sparse tree." In short, if a folder contains only one sub-folder, the parent path should be collapsed/merged, as in the example below:
```
List:
/Shared/A/B/C
/Shared/A/B/D

-->

Tree:
/Shared/A/B
  /C
  /D
```
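To make the collapsing rule concrete, here's a rough sketch (my reconstruction, not the production code) of how single-child chains can be folded:

```js
// A rough sketch of sparse-tree collapsing: fold chains of single-child
// folders into the deepest node of the chain.
const collapse = node => {
  let keys = Object.keys(node.children)
  // as long as the node has exactly one sub-folder, the child replaces it
  while (keys.length === 1) {
    node = node.children[keys[0]]
    keys = Object.keys(node.children)
  }
  // recurse into the remaining branches
  for (const key of keys) {
    node.children[key] = collapse(node.children[key])
  }
  return node
}
```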
As I mentioned earlier, this was not the first tree we had to display in the application. In a completely different view we also had to create a sparse tree, and we didn't want to maintain several different implementations. Therefore, we wrote a simple module for building and managing the tree. It was based on two small components:

- a `buildTree` function that took a flat array of nodes and the place (path in the tree) from which it was to insert these nodes
- a `redux-toolkit` slice that managed the tree structure (expanding, collapsing nodes, etc.)

The entire tree, or rather the two structures, "tree" and the flat "paths" map, was kept in redux as follows:
```js
{
  tree: {
    path: "", // root
    children: {
      "Shared": { // path-part or folder name as a key
        path: "/Shared",
        children: {
          "A": {
            path: "/Shared/A",
            children: {}
          }
        }
      }
    }
  },
  paths: {
    "/Shared": {
      meta: true,
      ...nodeProps
    },
    "/Shared/A": {
      meta: false,
      folderId: "some-unique-id",
      ...nodeProps
    }
  }
}
```
This structure is produced by the `buildTree` helper function.
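A simplified sketch of what such a helper could look like (my approximation; the real implementation also takes a parent path and handles sources, aggregates, and sparse merging):

```js
// Input nodes are assumed to look like:
// { path: "/Shared/A", folderId?: "...", ...nodeProps }
const buildTree = nodes => {
  const tree = { path: "", children: {} }
  const paths = {}

  for (const inputNode of nodes) {
    let node = tree
    let currentPath = ""
    for (const part of inputNode.path.split("/").filter(Boolean)) {
      currentPath += "/" + part
      if (!node.children[part]) {
        node.children[part] = { path: currentPath, children: {} }
        // every segment starts out as a meta-folder...
        paths[currentPath] = { meta: true }
      }
      node = node.children[part]
    }
    // ...and the segment we actually got data for becomes a real node
    paths[inputNode.path] = { ...inputNode, meta: !inputNode.folderId }
  }

  return { tree, paths }
}
```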
Thanks to redux-toolkit, and consequently the immer library it uses under the hood, we could perform operations on the tree very easily:
```js
import { createSlice } from "@reduxjs/toolkit"

const initialTreeState = {
  initialized: false,
  tree: {
    path: "",
    children: {},
  },
  paths: {},
}

export const createTreeSlice = treeName =>
  createSlice({
    name: treeName,
    initialState: initialTreeState,
    reducers: {
      insertTree: (state, { payload }) => {
        // merge the new flat entries into the paths map
        state.paths = {
          ...state.paths,
          ...payload.paths,
        }
        // attach the new sub-tree under the given parent node
        const node = getNodeByPath(payload.parentPath || "", state.tree)
        node.children = payload.tree.children
      },
      toggleNode: ({ tree }, { payload: path }) => {
        const node = getNodeByPath(path, tree)
        node.expanded = !node.expanded
      },
    },
  })
```
The `getNodeByPath` helper function is used to look up a node by path. It can also find a node in a sparse tree.
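For context, a minimal version of such a lookup might look like this (my sketch, not the original helper; the sparse-tree handling is omitted):

```js
const getNodeByPath = (path, tree) => {
  if (!path) return tree // an empty path means the root node
  let node = tree
  for (const part of path.split("/").filter(Boolean)) {
    node = node.children[part]
    if (!node) return undefined // the path is not present in the tree
  }
  return node
}
```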
And everything was going smoothly until a client with 1 million folders came along, and boom... The Product Owner set up an Epic in Jira titled "Support 1M folders on SC tree view." A quick brainstorming session and we immediately had a list full of ideas.
The first mistake: no one even fired up the Profiler to see what was actually taking so long. Everyone assumed that the current tree implementation was top-notch and couldn't get any better. The Profiler itself, at first glance, is not a simple tool, and maybe that was why we jumped straight to ideas like building the tree "on the fly" or moving the work to a web-worker.
You're probably wondering: how "on the fly"? After all, when a request is made, the browser only parses the JSON and hands over the response once it has fully arrived. Yes, but... in this case our backend developers also had to do some optimization work, and instead of returning the full data set at once, they started returning our SC list as a stream. Thanks to this, we could, for example, use the `oboe.js` library to parse the JSON on the fly.
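A rough sketch of that approach (the endpoint, the `locations.*` node pattern, and the consumer are made up for illustration):

```js
import oboe from "oboe"

// Parse list elements as they arrive, instead of waiting for the full body.
oboe("/api/sensitive-locations") // hypothetical streaming endpoint
  .node("locations.*", location => {
    // fires for every element of the "locations" array as soon as
    // its closing brace has been received
    handleLocation(location) // hypothetical consumer
  })
  .done(() => {
    console.log("stream finished")
  })
```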
Of course, I tried this approach because, after all, someone wrote it into the Jira task, so it had to be checked, right? 😜 Cool, the JSON was parsing "on the fly," but the stream took 30s instead of 10s, and I hadn't even started building the tree yet. So I gave up and decided to look elsewhere.
I also tested the web-worker approach, but I ran into a completely different problem. Okay, I can download 1M elements in a worker and build a tree from them, but then I have to send the result from the web-worker to the main thread. The tree structure is quite extensive, along with the data we kept in the flat `paths` structure. When you send such large data from one thread to another, the browser has to serialize it, send it, and then parse it again, and this too caused the browser to "freeze" while the data moved from one memory location to another. Of course, there are ways to send data "directly" (without copying) through so-called Transferable Objects, e.g., `ArrayBuffer`, but I decided it might not be worth the effort for now and chose to check first whether our tree implementation was really as great as we thought 😜
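For completeness, this is roughly how a zero-copy transfer works; note it only applies to binary buffers, not to plain objects like our tree (the worker file name is hypothetical):

```js
const worker = new Worker("tree-worker.js") // hypothetical worker file
const buffer = new ArrayBuffer(1024 * 1024)

// Transferring an ArrayBuffer moves ownership instead of copying;
// a plain object would be structured-cloned instead.
worker.postMessage({ buffer }, [buffer]) // the second argument lists transferables
console.log(buffer.byteLength) // 0, the buffer is now detached in this thread
```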
I sat down in front of the computer screen, launched the dev-tools, and pressed the "record" button in the Profiler. After a while, I got a colorful graph that reminded me of the Windows 98 defragmenter days 🤣
The first thing that caught my eye was the purple color that dived very, very deep. Upon closer inspection, it turned out that a lot of these purple blocks were the work of `immer.js`. A quick glance at the documentation and boom! A bullseye. It turns out that when "inserting" a large amount of data through `immer`, we can speed the process up with `Object.freeze` (there is a note about this in the immer docs). This procedure alone took me from 12.54s to 11.24s for 54K elements. For 1M, the jump was, of course, proportionally larger. But it still wasn't it...
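In practice the trick boils down to something like this (a sketch, assuming the payload is built once and never mutated afterwards; `builtTree` and `builtPaths` are hypothetical names):

```js
// Per the immer docs, data that is already frozen is skipped when immer
// drafts and freezes state, so freezing a large payload up front saves
// a lot of recursive work.
const payload = Object.freeze({
  parentPath: "",
  tree: builtTree,   // hypothetical, e.g. the result of buildTree
  paths: builtPaths, // hypothetical, the flat map from buildTree
})
dispatch(insertTree(payload))
```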
Did you know that if you click on a block in the Profiler and then jump to the source file, you get times for individual code blocks? No!? 😎 Now you know ;)
What stood out was 229ms for building a simple string 🤯 namely the current path. It turned out that this simple oversight could be replaced with a shorter piece of code that ultimately takes 1.7ms.

You might think (ironically): wow... 227ms... "bravo" 👏. What is 227ms? Looked at as a single value, it is indeed a micro-optimization. But remember, the goal was to handle 1M elements, and the path-concatenation operation ran for every single sub-folder.
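I don't have the original snippet at hand, but the class of mistake looks roughly like this (illustrative, not the actual code; `visit` is a hypothetical callback):

```js
// slow: rebuilding the whole prefix on every iteration (quadratic string work)
const visitSlow = segments =>
  segments.forEach((segment, index) => {
    const currentPath = "/" + segments.slice(0, index + 1).join("/")
    visit(currentPath) // hypothetical callback
  })

// fast: extend the previous path incrementally
const visitFast = segments => {
  let currentPath = ""
  for (const segment of segments) {
    currentPath += "/" + segment
    visit(currentPath)
  }
}
```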
How do you make a shallow copy of an object or extend another object? Nothing simpler: the spread operator `...`. If you have to support browsers like IE11, you probably use babel.js, just like us... and such a spread operator ends up being compiled down to `Object.assign` (*a big simplification).

`Object.assign` is relatively slow and can cause problems at a larger scale. In this case, I opted for simple key-by-key copying. Thanks to this simple change, I reduced 154ms to 44ms. Again, for individual elements it doesn't matter at all, but when iterating over a large data set, such optimizations can work wonders.
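The change itself is trivial; hypothetically something like:

```js
// before: spread, which (for IE11 targets) babel compiles down to an
// Object.assign-like helper
const shallowCopySpread = node => ({ ...node })

// after: plain key-by-key copying, noticeably cheaper in hot loops
const shallowCopyLoop = node => {
  const copy = {}
  for (const key in node) {
    copy[key] = node[key]
  }
  return copy
}
```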
After "tuning" immer and removing a few Object.assign
calls, or rewriting them into a simple loop, I ran out of ideas for "simple" optimizations. It was necessary to tweak the way the tree was built.
The previous implementation divided SC locations by source and built a sub-tree for each of them. For each sub-tree, aggregated values were calculated (e.g., if a folder itself contained 10 SC items but also had 50 sub-folders, we wanted to show the summed-up values), and then the source node was updated according to the values summed across all its folders.
Each such operation wrote something into the redux state. I thought: who needs that? Who needs it? After all, we don't show the tree until everything is calculated and updated. So I changed the code so that the entire tree, along with the calculated aggregate values, is first built in memory, and only then is the finished tree inserted into redux in a single operation.
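Schematically (names are illustrative):

```js
// before: one dispatch per source, each triggering reducer + immer work
// sources.forEach(source => {
//   dispatch(insertTree(buildTree(source.nodes)))
// })

// after: assemble everything in plain memory first...
const { tree, paths } = buildFullTree(sources) // hypothetical aggregate builder
// ...then hit redux exactly once with the finished result
dispatch(insertTree({ parentPath: "", tree, paths }))
```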
Moreover, the tree requirements stated that certain nodes were to be expanded by default, e.g., the first level, plus potentially the item previously selected on the list (you can jump from the list to the tree with a single button). Previously, expansion was triggered by the `toggleNode` action. I changed that too: instead of dispatching a redux action, I simply set the `expanded` value to `true` directly on the node object.
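While the tree is still a plain in-memory object, default expansion becomes a property write instead of a dispatched action (the conditions here are my guess at the requirement; `selectedPath` is hypothetical):

```js
// expand the first level by default
for (const node of Object.values(tree.children)) {
  node.expanded = true
}
// and the previously selected list item, if any
const selected = getNodeByPath(selectedPath, tree) // selectedPath: hypothetical
if (selected) selected.expanded = true
```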
You might say - numbers please! :)
For 54K elements, I went from 12.25s to 2.4s 🚀
The Product Owner was over the moon.
I asked the backend developers to prepare an environment for testing 1M elements. I wanted to see if my optimizations for 54K would hold up at that scale. And the smile didn't leave my face :)
Before optimization, the tree-building time was ~3.5 minutes. After applying the above-mentioned changes, it was reduced to 59s.
That's roughly a ~70% saving. All in all, one could say: job done. It was supposed to build in less than 60s... and 59 is less than 60 😅 All good...
I was a bit tired of digging, but a teammate rightly pointed out:
Well, nice, nice, but for me, it's still slow.
He did add later that it took nothing away from my work and that, in his opinion, I had done a great job... but it was hard to disagree with him. From the moment the user clicked the navigation element to the moment the view was displayed, they had to wait a total of 90s.

As a user, if I saw nothing but a spinner for 90s, I would be furious :) I don't even want to think about what our users felt when they had to wait 3.5 minutes... probably none of them stuck around that long 😅
I dived into the Profiler again. For meta-folders, a `folderId` was being generated, because another place in the code needed this `id` (never mind why). In the end, the generated id meant nothing; it was never sent to the backend. However, someone had come up with the idea that this meta-folder id should be a hash of the path...
```js
export const createUniqueIdForLocation = path => btoa(encodeURIComponent(path))
```
The `btoa` function encodes a string as base64. It takes 0.25ms on average... a fraction of a millisecond. But when you think about it for a moment: who needs this hash? Who needs it? Exactly! If the meta-folder id is just the base64 of the path, and the path in fact already contained the source `id` and was therefore unique across the entire list, then why bother with the hash at all?
```diff
- id: createUniqueIdForLocation(path),
+ id: path,
  name: getLocationName(path),
```
This one diff made me go from 59s to 35s for 1M elements, which gave ~40% gain 🤯
So now the client no longer waited 90s but 66s - including data retrieval and parsing! Considering that the requirements stated that the tree should be built in less than 60s, the Product Owner and clients should be satisfied 😅
Of course, we're not resting on our laurels. Blocking the user for 60s is still a bad idea, so we keep thinking about improving the implementation. Maybe we'll finally throw it onto a web-worker. Who knows? Maybe I'll gather enough material for the next post 😉.
Lesson one: instead of guessing, it's better to measure.
Lesson two: if you operate at a large scale and iterate over a large data set, optimizations at the single-`ms` level per iteration can work wonders 🚀
Lesson three: put data into redux only when you're ready 💪
Lesson four: if there's no need, don't complicate things 😉 (see ID & btoa).
I hope that, thanks to this story, you'll reach for the Profiler sooner and manage to improve the performance of more than one application.