Performance
Syside tools are designed to be fast and efficient, and we have benchmarks to prove it. Below is a performance comparison between the deprecated SysIDE Editor Legacy and the Syside tools (Modeler, Automator, and the new Editor), followed by a more detailed breakdown of Syside's performance.
Syside vs SysIDE Editor Legacy
The tables below compare the performance of the Syside tools with that of SysIDE Editor Legacy. Even though the latter did not implement some of the more expensive validation checks, such as inherited name duplication and type checking, its validation still took 30 to 40 times as long on a single thread.

| Stage | SysIDE Editor Legacy | Syside 1T | Syside 8T |
|---|---|---|---|
| Semantic resolution (or equivalent) | ❌ | 142 ms | ❌ |
| Validation | ❌ | 85 ms | 13 ms |
| Total | ❌ | 0.46 s | 0.23 s |

| Stage | SysIDE Editor Legacy | Syside 1T | Syside 8T |
|---|---|---|---|
| Semantic resolution (or equivalent) | ❌ | 125 ms | ❌ |
| Validation | ❌ | 85 ms | 13 ms |
| Total | ❌ | 0.46 s | 0.22 s |

| Stage | SysIDE Editor Legacy | Syside 1T | Syside 8T |
|---|---|---|---|
| Semantic resolution (or equivalent) | 5 s | 195 ms | 195 ms (single-threaded) |
| Validation | 2.8 s ⚠* | 70 ms | 12 ms |
| Total | 15 s | 0.56 s | 0.32 s |

* SysIDE Editor Legacy does not implement all of the more expensive validation checks.
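The speedup implied by the comparison can be checked with a few lines of arithmetic. The sketch below uses only the timings from the table that includes SysIDE Editor Legacy measurements; the dictionary and function names are illustrative and not part of any Syside API.

```python
# Timings in seconds, copied from the comparison table that includes
# SysIDE Editor Legacy measurements.
legacy = {"semantic resolution": 5.0, "validation": 2.8, "total": 15.0}
syside_1t = {"semantic resolution": 0.195, "validation": 0.070, "total": 0.56}


def speedup(old_seconds: float, new_seconds: float) -> float:
    """Return how many times faster the new timing is than the old one."""
    return old_seconds / new_seconds


# Single-threaded Syside compared with SysIDE Editor Legacy, stage by stage.
ratios = {stage: speedup(legacy[stage], syside_1t[stage]) for stage in legacy}

for stage, ratio in ratios.items():
    print(f"{stage}: {ratio:.1f}x faster on one thread")
```

With these inputs, validation works out to roughly 40 times faster, while semantic resolution and the total pipeline are each a little over 25 times faster.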
Detailed Performance Metrics
Because Syside is still under active development, multithreading is only partially implemented: some phases of the pipeline are single-threaded. The tables below compare analysing the standard library and accompanying examples from here with varying numbers of threads.
The tests were done with /usr/bin/time -vp on CachyOS using a Ryzen 7950X3D, with mimalloc injected using LD_PRELOAD=/usr/lib/libmimalloc.so. For comparison, the last column shows multithreaded performance using the standard allocator. While mimalloc yields a significant wall clock time improvement, its memory usage is nearly double that of the standard allocator, so the actual choice may depend on user preferences and system configuration. Systems with low memory may want to keep using the standard allocator if memory pressure is an issue. We will look into improving memory usage in the future.
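To make the allocator trade-off concrete, the following is a minimal sketch that parses two reports in the verbose `/usr/bin/time` format and compares wall clock time and peak memory. The sample values are taken from the third column and the standard-allocator column of the first statistics table below; the function names are hypothetical helpers, not part of Syside.

```python
def parse_time_report(report: str) -> dict[str, str]:
    """Map each 'Name: value' line of a verbose time report to a dict entry."""
    stats = {}
    for line in report.splitlines():
        # Split on the last ": " because the elapsed-time label contains colons.
        name, sep, value = line.strip().rpartition(": ")
        if sep:
            stats[name] = value
    return stats


def elapsed_seconds(value: str) -> float:
    """Convert 'h:mm:ss' or 'm:ss.ss' into seconds."""
    seconds = 0.0
    for part in value.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def summarize(report: str) -> tuple[float, int]:
    """Return (wall clock seconds, maximum resident set size in kbytes)."""
    stats = parse_time_report(report)
    wall = elapsed_seconds(stats["Elapsed (wall clock) time (h:mm:ss or m:ss)"])
    rss_kb = int(stats["Maximum resident set size (kbytes)"])
    return wall, rss_kb


# Sample excerpts; a real report contains many more lines.
MIMALLOC_REPORT = """\
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.21
    Maximum resident set size (kbytes): 678572
"""
STANDARD_REPORT = """\
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.27
    Maximum resident set size (kbytes): 313664
"""

mi_wall, mi_rss = summarize(MIMALLOC_REPORT)
std_wall, std_rss = summarize(STANDARD_REPORT)
memory_ratio = mi_rss / std_rss
time_saved = 1 - mi_wall / std_wall
print(f"memory: {memory_ratio:.2f}x, wall clock time saved: {time_saved:.0%}")
```

For this pair of columns, mimalloc uses roughly twice the peak memory in exchange for about a fifth less wall clock time, which is the trade-off described above.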
Currently, multithreaded performance is bottlenecked by the single-threaded semantic resolution phase and, to a lesser extent, by a suboptimal AST build phase. With 8 or more threads, semantic resolution takes over 50% of the total pipeline time. These issues will be addressed in future updates.
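The effect of a single-threaded phase can be estimated with Amdahl's law. The sketch below uses the timings from the first comparison table and makes a simplifying assumption that is not stated in the measurements: semantic resolution is treated as the only serial part, and everything else as perfectly parallel.

```python
def amdahl_speedup(serial_fraction: float, threads: int) -> float:
    """Upper bound on speedup when only the non-serial part scales with threads."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / threads)


# Values from the first comparison table, in seconds.
semantic_resolution = 0.142  # single-threaded phase
total_1t = 0.46
total_8t = 0.23

# Assumption: semantic resolution is the only serial work in the pipeline.
serial_fraction = semantic_resolution / total_1t
share_at_8t = semantic_resolution / total_8t
observed = total_1t / total_8t
bound_8t = amdahl_speedup(serial_fraction, 8)
limit = 1.0 / serial_fraction  # bound with an unlimited number of threads

print(f"semantic resolution share at 8 threads: {share_at_8t:.0%}")
print(f"observed speedup at 8 threads: {observed:.2f}x")
print(f"Amdahl bound at 8 threads: {bound_8t:.2f}x, limit: {limit:.2f}x")
```

Under this assumption, semantic resolution accounts for a little over 60% of the 8-thread run, and the bound at 8 threads is only about 2.5x against an observed 2x, so adding threads yields little until that phase itself is parallelised.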
Additionally, performance can be improved further by indexing and caching relevant information on disk.

| Stat | | | | standard 8T |
|---|---|---|---|---|
| User time (seconds) | 0.37 | 0.48 | 0.50 | 0.52 |
| System time (seconds) | 0.03 | 0.11 | 0.15 | 0.23 |
| Percent of CPU this job got | 98% | 245% | 304% | 275% |
| Elapsed (wall clock) time (h:mm:ss or m:ss) | 0:00.41 | 0:00.24 | 0:00.21 | 0:00.27 |
| Average shared text size (kbytes) | 0 | 0 | 0 | 0 |
| Average unshared data size (kbytes) | 0 | 0 | 0 | 0 |
| Average stack size (kbytes) | 0 | 0 | 0 | 0 |
| Average total size (kbytes) | 0 | 0 | 0 | 0 |
| Maximum resident set size (kbytes) | 630996 | 651288 | 678572 | 313664 |
| Average resident set size (kbytes) | 0 | 0 | 0 | 0 |
| Major (requiring I/O) page faults | 0 | 0 | 0 | 0 |
| Minor (reclaiming a frame) page faults | 1222 | 1275 | 1359 | 71982 |
| Voluntary context switches | 10 | 99 | 655 | 2946 |
| Involuntary context switches | 10 | 18 | 30 | 84 |
| Swaps | 0 | 0 | 0 | 0 |
| File system inputs | 0 | 0 | 0 | 0 |
| File system outputs | 0 | 0 | 0 | 0 |
| Socket messages sent | 0 | 0 | 0 | 0 |
| Socket messages received | 0 | 0 | 0 | 0 |
| Signals delivered | 0 | 0 | 0 | 0 |
| Page size (bytes) | 4096 | 4096 | 4096 | 4096 |

| Stat | | | | standard 8T |
|---|---|---|---|---|
| User time (seconds) | 0.39 | 0.42 | 0.46 | 0.44 |
| System time (seconds) | 0.02 | 0.03 | 0.04 | 0.13 |
| Percent of CPU this job got | 96% | 179% | 227% | 214% |
| Elapsed (wall clock) time (h:mm:ss or m:ss) | 0:00.43 | 0:00.25 | 0:00.22 | 0:00.26 |
| Average shared text size (kbytes) | 0 | 0 | 0 | 0 |
| Average unshared data size (kbytes) | 0 | 0 | 0 | 0 |
| Average stack size (kbytes) | 0 | 0 | 0 | 0 |
| Average total size (kbytes) | 0 | 0 | 0 | 0 |
| Maximum resident set size (kbytes) | 610336 | 631080 | 653556 | 316324 |
| Average resident set size (kbytes) | 0 | 0 | 0 | 0 |
| Major (requiring I/O) page faults | 0 | 0 | 0 | 0 |
| Minor (reclaiming a frame) page faults | 1222 | 1230 | 1308 | 73103 |
| Voluntary context switches | 9 | 179 | 1317 | 1251 |
| Involuntary context switches | 9 | 19 | 51 | 40 |
| Swaps | 0 | 0 | 0 | 0 |
| File system inputs | 0 | 0 | 0 | 0 |
| File system outputs | 0 | 0 | 0 | 0 |
| Socket messages sent | 0 | 0 | 0 | 0 |
| Socket messages received | 0 | 0 | 0 | 0 |
| Signals delivered | 0 | 0 | 0 | 0 |
| Page size (bytes) | 4096 | 4096 | 4096 | 4096 |

| Stat | | | | standard 8T |
|---|---|---|---|---|
| User time (seconds) | 0.49 | 0.52 | 0.55 | 0.55 |
| System time (seconds) | 0.03 | 0.03 | 0.05 | 0.15 |
| Percent of CPU this job got | 92% | 156% | 185% | 185% |
| Elapsed (wall clock) time (h:mm:ss or m:ss) | 0:00.56 | 0:00.35 | 0:00.32 | 0:00.38 |
| Average shared text size (kbytes) | 0 | 0 | 0 | 0 |
| Average unshared data size (kbytes) | 0 | 0 | 0 | 0 |
| Average stack size (kbytes) | 0 | 0 | 0 | 0 |
| Average total size (kbytes) | 0 | 0 | 0 | 0 |
| Maximum resident set size (kbytes) | 510924 | 529084 | 551188 | 285948 |
| Average resident set size (kbytes) | 0 | 0 | 0 | 0 |
| Major (requiring I/O) page faults | 0 | 0 | 0 | 0 |
| Minor (reclaiming a frame) page faults | 1077 | 1142 | 1197 | 66281 |
| Voluntary context switches | 10 | 294 | 1176 | 1041 |
| Involuntary context switches | 22 | 11 | 27 | 18 |
| Swaps | 0 | 0 | 0 | 0 |
| File system inputs | 0 | 0 | 0 | 0 |
| File system outputs | 0 | 0 | 0 | 0 |
| Socket messages sent | 0 | 0 | 0 | 0 |
| Socket messages received | 0 | 0 | 0 | 0 |
| Signals delivered | 0 | 0 | 0 | 0 |
| Page size (bytes) | 4096 | 4096 | 4096 | 4096 |