Database Performance Archaeology
… an expedition to uncover (and fix) database performance issues!
One of the worst things that can happen when upgrading to a newer database version is discovering that performance is not as good as before. Despite the effort put into guarding MySQL NDB Cluster’s strict performance requirements, one of our customers found a performance regression when upgrading from our 7.4 version to our 7.6 version. How did that happen when our automated performance test suite failed to show it? To answer that, and to fix the issue, we embarked on a database performance archaeology expedition…
When your performance is not good enough
MySQL NDB Cluster is an open-source in-memory distributed database developed for high availability (99.999% or more) and predictable query times. It can be found at the core of gaming, banking, telecommunication, and online services. Benchmarks [1, 2] have shown latencies on the order of 5–10 ms per transaction and a throughput of 1.4M New Order TPM using a common 2-node configuration (DBT2 benchmark, with each transaction consisting of 30 SQL statements).
While the specific performance requirements of each customer might vary, from the product side it’s critical to ensure that performance is stable from version to version. We take predictable latency and throughput very seriously and do extensive testing to ensure them [3]. Despite our best efforts, one of our customers reported a performance degradation when upgrading from the 7.4 version to the 7.6 version for one of their multiple cluster configurations.
The performance requirement for this customer defined a transaction latency below 10 milliseconds over a determined load range. The load range was determined as a function of latency: maximum throughput has its cut-off at the point where 0.1% of transactions have a latency above the 10 ms threshold. This demanding performance criterion illustrates well the level of stability required from MySQL NDB Cluster. When using the 7.4 version for one of their configurations, our customer could achieve a max throughput of 50K TPS. However, when upgrading to the 7.6 version, the maximum throughput fell a bit short, at 48K TPS, meaning a 4% performance degradation.
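To make the criterion concrete, here is a minimal sketch of how such a cut-off could be computed, assuming we have the per-transaction latencies recorded at each offered load (the function and data layout below are illustrative, not the customer’s actual tooling):

```python
# Illustrative sketch: find the highest offered load that still meets the
# "no more than 0.1% of transactions above 10 ms" criterion.
# latencies_by_load maps an offered load (TPS) to the per-transaction
# latencies (in ms) measured at that load; this layout is assumed here.

def max_throughput(latencies_by_load, threshold_ms=10.0, max_slow_ratio=0.001):
    best = 0
    for load in sorted(latencies_by_load):
        latencies = latencies_by_load[load]
        slow = sum(1 for l in latencies if l > threshold_ms)
        if slow / len(latencies) <= max_slow_ratio:
            best = load   # criterion still met at this load
        else:
            break         # cut-off reached, stop increasing the load
    return best

# Made-up example data: the cut-off would be reported as 50,000 TPS.
example = {
    40_000: [2.1] * 9990 + [11.0] * 5,    # ~0.05% slow -> OK
    50_000: [2.9] * 9990 + [11.5] * 9,    # ~0.09% slow -> OK
    55_000: [4.0] * 9900 + [12.0] * 100,  # 1% slow -> above the cut-off
}
print(max_throughput(example))  # 50000
```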
Our performance benchmarks reported significant improvements between the 7.4, 7.5, and 7.6 versions, so this regression was a surprise to us. How could this happen? Unfortunately, it’s not so difficult… Even with extensive testing, it’s not feasible to evaluate all possible configuration/hardware combinations, and that can result in performance regressions which are challenging to discover. Furthermore, MySQL NDB Cluster provides many configuration options to fine-tune performance to match hardware capabilities, which increases the scope of any investigation. In our setups no performance regression was visible, but in our customer’s setups it was… and that deserved investigation.
After a few iterations with our customer, we managed to create a setup where our existing performance tests could reproduce a visible degradation between the versions.
The image above compares MySQL NDB Cluster 7.4 and 7.6 using a setup where there is a visible performance degradation between the versions. The blue line shows the throughput measured in transactions per second (TPS): while the max throughput for 7.4 tops out at ~3500 TPS, 7.6 falls a bit short of 3200 TPS, a regression of 9%. The green line shows the mean latency measured in milliseconds (ms), where lower values are better: for 7.4 we see a very thin line, indicating very low variability, while for 7.6 the line is much thicker (indicating higher variability) across the TPS range until max throughput is reached. The orange line shows the ratio of transactions taking longer than 10 ms to all transactions (sampled for each second, values closer to zero are better): for 7.4 there is very little variation up to the point where max throughput is reached, while for 7.6 we again see much higher variability, and much earlier in the TPS range.
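The three series in such a chart can be derived from a plain transaction log. Below is a small illustrative sketch, assuming the log is a list of (completion time, latency) pairs (an assumption made for this example, not our actual test harness format), that groups completed transactions by wall-clock second and computes the throughput, mean latency, and fraction of transactions slower than 10 ms:

```python
# Illustrative only: compute per-second TPS, mean latency, and the ratio of
# slow (>10 ms) transactions from (completion_time_s, latency_ms) tuples.
from collections import defaultdict

def per_second_series(transactions, threshold_ms=10.0):
    buckets = defaultdict(list)
    for completion_time_s, latency_ms in transactions:
        buckets[int(completion_time_s)].append(latency_ms)

    series = []
    for second in sorted(buckets):
        lats = buckets[second]
        series.append({
            "second": second,
            "tps": len(lats),                          # blue line
            "mean_latency_ms": sum(lats) / len(lats),  # green line
            "slow_ratio": sum(l > threshold_ms for l in lats) / len(lats),  # orange line
        })
    return series
```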
Note that the numbers reported in our setup aren’t comparable with those reported by the customer. Even though the MySQL NDB Cluster configurations are pretty similar, the database schema, traffic, and hardware are not, and that makes a huge difference. Yet, the regression we observed in our setup was similar enough to what the customer reported, giving us confidence to use it for the investigation.
With a setup that allowed us to reproduce the problem, we began diagnosing the issue by first identifying when it was introduced. That kicked off our database performance archaeology expedition.
Database Performance Archaeology
According to Wikipedia, software archaeology is the “study of poorly documented or undocumented legacy software implementations, as part of software maintenance” [4]. So, we can define database performance archaeology as the study of poorly documented or undocumented performance changes, as part of software maintenance.
While [5] applies database performance archaeology as a case study to show performance improvements, our goal was to find the changes that explain the observed performance degradation. Given that our customer upgraded from a 7.4 version to a 7.6 version, our analysis aimed at answering two questions: i) were there changes in the 7.4 version that improved performance but were not available in later versions (7.5+)? and ii) which changes introduced performance regressions?
Studying performance improvements
When multiple software releases are maintained simultaneously (e.g. 7.4 and 7.5), improvements and bug fixes in lower releases are applied to later releases. In theory, any performance improvement in the latest version of 7.4 should also be available in the latest version of 7.5. In practice, however, there may be cases where that is not possible. Code in different versions may differ significantly, requiring alternative implementations (which can introduce bugs), or it may simply not be possible (or worthwhile) to port changes due to the removal of old features or the introduction of new ones.
Analyzing a set of patches for MySQL NDB Cluster 7.4.10, we found the following:
In the above chart series, one can see visible performance issues in the first 7.4.10 patches (with patch #5 in particular making things worse). Patches #7 and #8 fixed most issues, and patch #11 further improved performance. From patch #11 onward performance was stable. That same level of performance is visible for the 7.4.10+ versions as well.
This analysis allowed us to reduce our investigation to the few patches responsible for the performance increase. Checking those in the 7.6 versions, we confirmed that they were all present, albeit not implemented in the same way. Double-checking the implementations, we concluded that the performance improvements implemented for 7.4 were also in the 7.6 versions and hence couldn’t explain the performance differences.
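Mechanically, this kind of comparison (and the version-by-version comparison in the next section) boils down to running the same benchmark against every build and keeping the results side by side. A hedged sketch of such a sweep is shown below; the build labels, the wrapper script, and its output format are hypothetical placeholders, not our real test infrastructure:

```python
# Hypothetical sketch of a build-by-build benchmark sweep. The build labels,
# the ./run_benchmark.sh wrapper, and its "tps mean_latency_ms" output are
# placeholders for whatever harness is actually in use.
import csv
import subprocess

BUILDS = ["7.4.10-p1", "7.4.10-p5", "7.4.10-p7", "7.4.10-p8", "7.4.10-p11"]

def run_benchmark(build):
    """Start the cluster for `build`, drive the load test, return (tps, mean_ms)."""
    out = subprocess.run(
        ["./run_benchmark.sh", build],   # hypothetical wrapper script
        capture_output=True, text=True, check=True,
    ).stdout.split()
    return float(out[0]), float(out[1])

with open("sweep_results.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["build", "max_tps", "mean_latency_ms"])
    for build in BUILDS:
        tps, mean_ms = run_benchmark(build)  # identical workload for every build
        writer.writerow([build, tps, mean_ms])
```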
Studying performance regressions
After confirming that all performance-related improvements were present in later versions, answering our first question, we tried to answer the second question: which changes could be responsible for performance regressions? To answer that we looked at the different 7.4, 7.5, and 7.6 releases. In particular, we were trying to identify the change/version where latency variation increased (or when the green latency line became thicker).
The above chart series shows a subset of test runs comparing several 7.4.10 patches, the latest 7.4 and 7.5 versions, and the 7.6 versions. Between 7.4.10 and 7.4.29 performance is practically the same, showing great stability between versions. There are also no significant performance changes across the 7.5 versions. This was not the case for the 7.6 versions: using the new test setup, a big performance degradation became visible starting in 7.6.7, where we can see the latency variability increasing (thicker green line). Furthermore, we discovered that in all 7.6 versions prior to 7.6.7 our new setup managed to stumble on bugs, which is the reason why no performance data is available for those versions.
Further investigation using the above analysis allowed us to conclude that the performance regression was introduced in MySQL NDB Cluster 7.6.4, as part of the Partial Local Checkpoint feature (partial LCP) [6]. This important feature reduces the amount of logging needed to recover the database in case of a crash by doing more frequent, smaller local checkpoints. The feature was extensively tested and benchmarks showed better performance with it enabled. The issue with benchmark testing is that it is highly dependent on the configuration and available hardware, and changes in either can show significantly different results.
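For reference, partial LCP behaviour is controlled through data node configuration. The fragment below is purely illustrative (the values are examples, not recommendations, and the rest of the configuration is omitted); EnablePartialLcp and RecoveryWork are the NDB 7.6 parameters tied to this feature:

```ini
# Illustrative config.ini fragment only; values are examples, not tuning advice.
[ndbd default]
NoOfReplicas=2
DataMemory=4G
EnablePartialLcp=1   # 1 enables partial LCPs (7.6 behaviour), 0 falls back to full LCPs
RecoveryWork=60      # trades checkpoint storage overhead against work needed during recovery
```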
Fixing the issue
After identifying the potential changes, it would be nice to say that fixing the issue was simple, but it turned out that in addition to partial LCP two other issues were identified, all contributing to the performance regression in different and important ways. The story of the fixes deserves its own blog post, but it would be unfair not to show the end result.
The above charts compare the performance of 7.4.10, 7.6.14, and 7.6.14 with three performance bug fixes. The charts show that 7.6.14 with the fixes has significantly improved performance compared to 7.6.14 without them, and it is even better than the 7.4.10 version in two important ways. First, there is a lower mean latency across the TPS range (the lower the better). For example, at 2000 TPS, while we had a mean latency of 1.8 ms in 7.4.10, the mean latency dropped to 1.5 ms in 7.6.14 with the fixes (a 16% improvement). Second, latency variation also became lower. For example, at 3500 TPS, the latency variation for 7.4.10 was 22 ms (with almost 100% of transactions taking longer than 10 ms), while for 7.6.14 with the fixes the latency variation became 4.4 ms (with only 50% of transactions taking longer than 10 ms).
Conclusion
Database performance archaeology is valuable for uncovering important database performance changes, and it proved helpful in supporting our performance regression investigation. The prerequisite for such an investigation is a stable setup comprising a well-understood configuration, tuned hardware, and benchmarking tests, which is not a minor undertaking. In our investigation, we managed to narrow down the search space and locate where the regression was introduced. Additionally, the new configuration used in this investigation became part of our regular test suites, giving an extra guarantee of stability for our customers, and the resulting data is used to qualify new versions of our product.
MySQL NDB Cluster
MySQL NDB Cluster is an open-source distributed in-memory database. It combines linear scalability with high availability, providing in-memory real-time access with transactional consistency across partitioned and distributed datasets. It was developed to support scenarios requiring high availability (99.999% or more) and predictable query times.
Source code and binaries are available from: https://www.mysql.com/products/cluster