The original implementation of MySQL Cluster stored all information in main memory with no persistent storage. Later releases of MySQL Cluster permit storage of the data on disk. Perhaps the best quality of MySQL Cluster is that it uses the MySQL server as the query engine via the storage engine layer. Thus, you can migrate applications designed to interact with MySQL to MySQL Cluster transparently.
The shared-nothing, peer node concept permits an update executed on one server to become visible immediately on the other servers. The transmission of the updates uses a sophisticated communication mechanism designed for very high throughput across networks. The goal is to have the highest performance possible by using multiple MySQL servers to distribute the load, and high availability and redundancy by storing data in different locations.
Terminology and Components
Typical installations of the MySQL Cluster involve installing the components of the cluster on different machines on a network. Hence, MySQL Cluster is also known as a network database (NDB). When we use the term “MySQL Cluster,” we refer to the MySQL server plus the NDB components. However, when we use “NDB” or “NDB Cluster” we refer specifically to the cluster components.
MySQL Cluster is a database system that uses the MySQL server as the frontend to support standard SQL queries. A storage engine named NDBcluster is the interface that links the MySQL server with the cluster technology. This relationship is often confused. You cannot use the NDBcluster storage engine without the NDB Cluster components. However, is it is possible to use the NDB Cluster technologies without the MySQL server, but this requires lower-level programming with the NDB API.
The NDB API is object-oriented and implements indexes, scans, transactions, and event handling. This allows you to write applications that retrieve, store, and manipulate data in the cluster. The NDB API also provides object-oriented error-handling facilities to allow orderly shutdown or recovery during failures. If you are a developer and want to learn more about the NDB API, see the MySQL NDB API online documentation.
How Does MySQL Cluster Differ from MySQL?
You may be wondering, “What is the difference between a cluster and replication?” There are several definitions of clustering, but it can generally be viewed as something that has membership, messaging, redundancy, and automatic failover capabilities. Replication, in contrast, is simply a way to send messages (data) from one server to another.
You can view the MySQL Cluster as having three layers:
- Applications that communicate with the MySQL server
- The MySQL server that processes the SQL commands and communicates to the NDB storage engine
- The NDB Cluster components (sometimes called data nodes) that process the queries and return the results to the MySQL server
Note: You can scale up each layer independently with more server processes to increase performance.
The image below shows a conceptual drawing of a typical cluster installation.
The applications connect to the MySQL server, which accesses the NDB Cluster components via the storage engine layer (specifically, the NDB storage engine). We will discuss the NDB Cluster components in more detail momentarily.
There are many possible configurations. You can use multiple MySQL servers to connect to a single NDB Cluster and even connect multiple NDB Clusters via MySQL replication. We will discuss more of these configurations in later sections.
Features of MySQL Cluster
To satisfy the goals of having the highest achievable performance, high availability, and redundancy, data is replicated inside the cluster among the peer data nodes. The data is replicated using a synchronous mechanism in which each data node connects to every other data node and data is stored on multiple data nodes.
Note: It is also possible to replicate data between clusters, but in this case you use MySQL replication, which is asynchronous rather than synchronous. Asynchronous replication means you must expect a delay in updating the slaves, slaves do not report back the progress in committing changes, and you cannot expect a consistent view across all servers in the replicated architecture like you can expect within a single MySQL cluster.
MySQL Cluster has several specialized features for creating a highly available system. The most significant ones are:
- Node recovery
Data node failures can be detected via either communication loss or heartbeat failure, and you can configure the nodes to restart automatically using copies of the data from the remaining nodes. Failure and recovery can comprise single or multiple storage nodes. This is also called local recovery.
During normal data updates, copies of the data change events are written to a log stored on each data node. You can use the logs to restore the data to a point in time.
The cluster supports two forms of checkpoints, local and global. Local checkpoints remove the tail of the log. Global checkpoints are created when the logs of all data nodes are flushed to disk, creating a transaction-consistent snapshot of all node data to disk. In this way, checkpointing permits a complete system restore of all nodes from a known good synchronization point.
- System recovery
In the event the whole system is shut down unexpectedly, you can restore it using checkpoints and change logs. Typically, the data is copied from disk into memory from known good synchronization points.
- Hot backup and restore
You can create simultaneous backups of each data node without disturbing executing transactions. The backup includes the metadata about the objects in the database, the data itself, and the current transaction log.
- No single point of failure
The architecture is designed so that any node can fail without bringing down the database system.
To ensure node recovery is possible, all transactions are committed using read commit isolation and two-phase commits. Transactions are then doubly safe; that is, they are stored in two separate locations before the user gets acceptance of the transaction.
Data is automatically partitioned across the data nodes. MySQL version 5.1 Cluster supports user-defined partitioning.
- Online operations
You can perform many of the maintenance operations online without the normal interruptions. These are operations that normally require stopping a server or placing locks on data. For example, it is possible to add new data nodes online, alter table structures, and even reorganize the data in the cluster.
For more information about MySQL Cluster, see the online reference manual.
Learn more about this topic from MySQL High Availability.
Server bottlenecks and failures are a fact of life in any database deployment, but they don't have to bring everything to a halt. MySQL provides several features that can protect you from outages, whether you're running directly on the hardware, on virtual machines, or in the cloud. This book shows you how to use these features effectively, and helps you determine which combination of features will give you the most reliable system for a price you can afford.