MySQL Cluster is an open-source distributed in-memory database. It combines linear scalability with high availability, providing in-memory real-time access with transactional consistency across partitioned and distributed datasets. It was developed to support scenarios requiring high-availability (99.999% or more) and predictable query time. Testing such a system is achieved via many interconnected pieces ranging from a large set of automated tests and manual exploratory testing. This post explores an overview of the testing methodologies we use and the current challenges we face.
Gaming, banking, telcos, and online services all are powered by fully-redundant and fault-tolerant software systems. At the heart of those systems, you can find MySQL Cluster — a distributed in-memory database having a minimum of five-9s availability (around 5 minutes a year). This open-source database provides nearly-linear scalability and real-time querying predictability at the microsecond level.
To achieve such higher availability it’s crucial to ensure that each release is thoroughly validated. Thousands of automated tests are run daily, for different versions and platforms to ensure defects are detected early and fixed. Testing is everyone’s responsibility and we rely on manual exploratory testing when validating releases or selected critical features.
This paper provides a birds-eye view of the software testing process of MySQL Cluster and the challenges we’re addressing.
MySQL Cluster was born in the early 90s as part of Mikael Ronström PhD studies. It was designed to meet predictability and high availability requirements of the telecommunications industry. Initially developed at Ericsson, it became part of MySQL in 2003, later acquired by Sun MicroSystems and currently under Oracle ownership.
MySQL Cluster can be used stand-alone or together with MySQL Server. On the first option, a cluster is setup by running management and data nodes, and accessed via a low-level NdbAPI (C++, Java, or NodeJS). On the second option, we additionally have one or more MySQL Servers which provide the convenience of the SQL language (for data definition, manipulation, and query) and replication features.
99.999% availability is achieved with a single cluster via an architecture where there’s no single point of failure (e.g. multiple management nodes handling connections/monitoring and multiple data nodes storing/processing data) and by a design that ensures all operations can be ran online (e.g. upgrades, configuration changes, or backups). If we use a setup with multiple clusters connected via replication more than five-9s of availability can be achieved.
Current usage and status
MySQL Cluster is used in telecommunications, banking, gaming, and online services. The requirements leading to this product are predictable response times, high-availability, and linear scaling.
Since 2017, MySQL Server has adopted a 3-months release cycle and MySQL Cluster has been following suit. At the time of writing, version 8.0.18 is under preparation. MySQL Cluster supported platforms is a subset of those of MySQL Server: Linux is the prime supported/recommended platform but additional support for Mac OS X, Solaris, and Windows exists.
Development is undertaken by a team of 22 people, from 6 different countries (from GMT-7 to GMT+5:30 timezones). A significant part of the team works remotely from home. The main technology is C++ accounting to ~700K lines of code as of version 8.0.18. Communication is mostly handled by Slack and Zoom.
Trunk-based development is used across the MySQL organization. Changes are implemented and tested in feature branches before pushed to the main branch. Code reviews are done via Gerrit, ReviewBoard, or by sharing patches by e-mail. Daily and weekly automated regression tests are executed in multiple platforms to ensure failures are detected early.
Testing MySQL Cluster
Testing is a fundamental activity to ensure MySQL Cluster availability requirements.
During the delivery of new functionalities or bug fixes, developers are expected to create tests to cover the changes done. Those tests typically fall in one of two categories: (i) single-host system tests; or (ii) multi-host scenario-based testing. There are no unit or integration tests. All tests are automated and can be run on-demand (either in testing servers or in local dev machine). Error-handling and automatic recovery cases are validated through error injection functionality. This is directly supported in MySQL Cluster by enabling error insert codes.
Automated regression testing is complemented with manual exploratory testing. This testing is always done during the time-frames preceding an official release and after the development of selected critical features.
Manual exploratory testing
Manual exploratory testing plays a fundamental role to ensure the product’s quality. It focuses on two interconnected areas: random execution of operations (e.g. traffic, maintenance); and exploiting corner case scenarios. Because MySQL Cluster is built to be highly-available, the testing goal is to make it unavailable. Invariably, this type of testing uncovers many potential issues and ultimately it’s what determines if the changes are stable enough to be released.
This testing starts by taking a build from a feature or pre-release branch, installing it in a multitude of servers, and running the cluster. A mix of automated scripts and manual procedures are then executed. Automatic scripts normally handle traffic operation (i.e. random creating/dropping of databases/tables and insertion/update/deletion of records). Operations include backup/restore of data, upgrade/downgrade of processes, and configuration changes.
MySQL Cluster has been built to support a minimum of 99.999% availability and that all operations can be executed online. For example, operations such as upgrade of cluster processes can be executed while running traffic and performing a backup. Processes can also be manually crashed allowing automatic-handover to occur. In all cases, any combination of operations should maintain the cluster operational.
The choice of operations to be executed depends a lot on the specific features being tested or the product areas that have been changed. Some times it just boils down to intuition/guessing. Still, the combination of operations should be sensible to be meaningful. All this means, that this kind of testing is very dependent on highly skilled people with a deep understanding of the product.
Often we find that exploratory testing goes hand-in-hand with exploiting corner-case scenarios in which we touch MySQL Cluster limits. When there are no crashes (expected), it’s often much harder to interpret these results leading to many discussions. It’s by far a simple process but immensely valuable as it helps to clarify behavior expectations and, consequently, the implementation of a better product.
Single-host system testing
Most of the tests fall into this category (accounting to 8,000 tests). The most common setup to run these tests consists of the simplest cluster configuration: 1 management node, 2 data nodes, and a MySQL Server. All processes run on the same host. Tests are written in a very simple language called mysqltest consisting of a mix of SQL, Perl, and command-execution. Tests are run by a tool developed and maintained by MySQL Server team called MTR (MySQL Test Runner).
MTR-based tests cover a broad range of functionality from database-specific operations to MySQL Cluster specific operations. Database specific operations are data definition operations (CREATE, ALTER, DROP), data manipulation operations (INSERT, UPDATE, DELETE), data querying operations (SELECT, SHOW), data control operations (GRANT, REVOKE), and data transaction operations (START TRANSACTION, COMMIT, etc.). Examples of MySQL Cluster specific operations are backup/restore, nodes restart. Because these tests are executed by running SQL queries via MySQL Server, they also act as integration tests between MySQL Server and MySQL Cluster. Simple replication scenarios using both MySQL Server and MySQL Cluster are also tested in this category.
There are three main reasons why the majority of tests fall into this category. First, test creation is extremely simple — a test can consist of a few lines of SQL code. Second, although starting all MySQL Cluster and Server processes takes a few seconds, it’s fairly easy to run such tests in a developer machine. Third, there is a good infrastructure which is maintained by the MySQL organization to run these tests.
Multi-host scenario-based testing
MySQL Cluster is meant to run in multiple hosts, being four hosts the minimum recommended deployment for full redundancy and fault tolerance. This configuration consists of 2 management nodes, 2 data nodes, and two MySQL servers. However, many other configurations can be used (e.g. up to 144 data nodes).
In addition to multi-host setups, there are other advanced testing functionalities required. For instance, MySQL Cluster supports online upgrade/downgrade between all supported versions (7.4, 7.5, 7.6, and 8.0) which requires setting up all these different versions, start a cluster in one of the versions and then restart different processes in other versions. Also, NdbAPI has been designed to be binary compatible between versions (i.e. a client compiled against version 7.4 can be used against 8.0) which is also tested automatically. Setting up replicated clusters requires starting multiple clusters and configuring replication between them. Finally, performance testing requires setting up the test hosts CPU affinity to specific MySQL Cluster’s processes.
The need to support the above testing requirements in a fully automated manner led to the development of own tooling. We call “Autotest” to the set of tools, scripts, automation server, and front-end to explore test results. Autotest is responsible to run over 1,000 tests using different cluster configurations leading to thousands of different combinations.
Autotest tests are implemented in C++ with the support of our testing library (NDBT). Tests directly access MySQL Cluster via NdbAPI or manipulate processes via the test orchestrator. One particularity of these tests is that they can be used to validate MySQL Cluster behavior independently of the used configuration covering a broad range of functionality. Because these tests typically reproduce customer scenarios combining non-trivial operations, they fall under the category of scenario-based testing. Tests are executed using ATRT (AutoTest Run Test). This tool is responsible for orchestrating the deployment and setup of arbitrary clusters, starting all processes, running tests, manipulating processes (e.g. downgrade), and detect cluster failures. Test results are available via custom front-ends, showing individual test run results, statistics and trend information.
The development and maintenance of testing tools/libraries and support for test execution have many challenges. The first challenge is that testing tools/libraries require constant development to accompany the testing needs. Also, similarly to the development of any other software, this requires planning, developing, and testing which is only possible having a team with clear responsibility for this. Hence, in the middle of 2018, a dedicated 5-people team has been created. The second challenge is to ensure that a complex test infrastructure like Autotest that handles multi-host software deployment and test execution works correctly reporting only test failures. After almost a year of work, we not only managed to stabilize test execution but also built-in features to support new test cases and clean up some of the existent functionality. Having achieved that, we’re now looking at the next challenges ahead.
Testing Ice Cream Cone
The test ice cream cone is an analogy depicting a large number of systems tests (ice cream part) and a minority of integration and unit tests (cone). This is to contrast with the test pyramid that defends the opposite — a large number of unit tests at the bottom and then a smaller number of integration and system tests at the pyramid top.
The challenges of the test ice cream cone are well documented: noisy test results due to the test flakiness, long testing rounds, difficulty to understand the root cause of failures. Preliminary work has been done to optimize the long testing rounds by parallelizing test execution and build mechanisms to cope with test flakiness by retrying test execution.
A significant effort is still spent on determining if a test failure is due to flakiness or not. This issue is exacerbated due to MTR and Autotest tests running in separate environments hence requiring going over test failures in two different systems. Automation has been built to analyze test failures, determine its flakiness likelihood, and check if there are bugs already assigned to these failures or if developers’ input is needed — this effort is still on-going.
Looking forward, the introduction of test selection and reduction techniques is being considered to help to identify which tests can be skipped while preserving the same test coverage.
As the number of tests grows it’s important to categorize them into different groups. Smoke tests, performing basic validation of the overall MySQL Cluster, belong to one of such groups. Examples of other groups are upgrade/downgrade tests or tests validating specific bugs. This segmentation not only helps to identify coverage gaps but also to support test selection through the decision of which tests should run more often. We’ve started this process for upgrade/downgrade tests by documenting them, identifying missing scenarios, and fixing infrastructure shortcomings.
Complementing manual exploratory testing
As mentioned before, manual exploratory testing is an invaluable activity that helps uncover many issues and clarify expectations about the product. However, the quality of this testing is tightly coupled to the skils of the individuals doing it which are hard to find. To scale this kind of testing, we’re considering the usage of Chaos Engineering tools. Tests that inject failures and modify the environment (e.g. introduce loss/delay in network links) exist meaning that basic functionality is already in place. Hence, the natural step is to further run combinations of failures/scenarios-changes to verify of a larger number of scenarios.
Ensuring the lack of performance regressions is essential in a database product. For MySQL Cluster this is critical as small variations in the responses can have a big impact on our users. Traditionally, performance testing has been done in an exploratory fashion. This means that sporadically a performance regression could be introduced.
In 2017, a project was started to develop a fully automated performance test suite. The initial specification of these test scenarios was developed in partnership with one of our customers providing us valuable information on which parts to focus on. Since then we have developed a MySQL Cluster traffic simulator (using the C++ NdbAPI) and added many load and performance test cases. Although the test suite is still not fully automated — requiring some manual monitoring of the results — this has been proved successful in identifying important bugs. Because some tests are executed for several days straight some effort is still required to use them as part of change validation.
The testing of a database product poses many different challenges. While the specifics of testing are tailored for the product under test, the testing challenges are common to many other software systems. In summary our lessons have been: automated testing is paramount to ensure a large number of scenarios are covered but they still leave space for manual exploratory testing; there’s constant need to develop and maintain testing tools and infrastructure which is most effective when handled by a dedicated team; a large number of tools/techniques to ensure we balance between adding new tests and keeping the existent tests under control.
MySQL Cluster source code and binaries are available from https://www.mysql.com/products/cluster