Jump to content

What Is MySQL Cluster?

+ 4
  Chuck Bell's Photo
Posted Aug 03 2010 08:51 AM

MySQL Cluster is a shared-nothing, distributed node architecture storage solution designed for fault tolerance and high performance. Data is stored and replicated on individual data nodes (sometimes called storage nodes), where each data node executes on a separate server and maintains a copy of the data. Each cluster also contains management nodes. Updates use read-committed isolation to ensure all nodes have consistent data and a two-phased commit to ensure the nodes have the same data (if any one write fails, the update fails).

The original implementation of MySQL Cluster stored all information in main memory with no persistent storage. Later releases of MySQL Cluster permit storage of the data on disk. Perhaps the best quality of MySQL Cluster is that it uses the MySQL server as the query engine via the storage engine layer. Thus, you can migrate applications designed to interact with MySQL to MySQL Cluster transparently.

The shared-nothing, peer node concept permits an update executed on one server to become visible immediately on the other servers. The transmission of the updates uses a sophisticated communication mechanism designed for very high throughput across networks. The goal is to have the highest performance possible by using multiple MySQL servers to distribute the load, and high availability and redundancy by storing data in different locations.

Terminology and Components

Typical installations of the MySQL Cluster involve installing the components of the cluster on different machines on a network. Hence, MySQL Cluster is also known as a network database (NDB). When we use the term “MySQL Cluster,” we refer to the MySQL server plus the NDB components. However, when we use “NDB” or “NDB Cluster” we refer specifically to the cluster components.

MySQL Cluster is a database system that uses the MySQL server as the frontend to support standard SQL queries. A storage engine named NDBcluster is the interface that links the MySQL server with the cluster technology. This relationship is often confused. You cannot use the NDBcluster storage engine without the NDB Cluster components. However, is it is possible to use the NDB Cluster technologies without the MySQL server, but this requires lower-level programming with the NDB API.

The NDB API is object-oriented and implements indexes, scans, transactions, and event handling. This allows you to write applications that retrieve, store, and manipulate data in the cluster. The NDB API also provides object-oriented error-handling facilities to allow orderly shutdown or recovery during failures. If you are a developer and want to learn more about the NDB API, see the MySQL NDB API online documentation.

How Does MySQL Cluster Differ from MySQL?

You may be wondering, “What is the difference between a cluster and replication?” There are several definitions of clustering, but it can generally be viewed as something that has membership, messaging, redundancy, and automatic failover capabilities. Replication, in contrast, is simply a way to send messages (data) from one server to another.

Typical Configuration

You can view the MySQL Cluster as having three layers:

  • Applications that communicate with the MySQL server
  • The MySQL server that processes the SQL commands and communicates to the NDB storage engine
  • The NDB Cluster components (sometimes called data nodes) that process the queries and return the results to the MySQL server

Note: You can scale up each layer independently with more server processes to increase performance.

The image below shows a conceptual drawing of a typical cluster installation.

Attached Image

The applications connect to the MySQL server, which accesses the NDB Cluster components via the storage engine layer (specifically, the NDB storage engine). We will discuss the NDB Cluster components in more detail momentarily.

There are many possible configurations. You can use multiple MySQL servers to connect to a single NDB Cluster and even connect multiple NDB Clusters via MySQL replication. We will discuss more of these configurations in later sections.

Features of MySQL Cluster

To satisfy the goals of having the highest achievable performance, high availability, and redundancy, data is replicated inside the cluster among the peer data nodes. The data is replicated using a synchronous mechanism in which each data node connects to every other data node and data is stored on multiple data nodes.

Note: It is also possible to replicate data between clusters, but in this case you use MySQL replication, which is asynchronous rather than synchronous. Asynchronous replication means you must expect a delay in updating the slaves, slaves do not report back the progress in committing changes, and you cannot expect a consistent view across all servers in the replicated architecture like you can expect within a single MySQL cluster.

MySQL Cluster has several specialized features for creating a highly available system. The most significant ones are:

  • Node recovery
    Data node failures can be detected via either communication loss or heartbeat failure, and you can configure the nodes to restart automatically using copies of the data from the remaining nodes. Failure and recovery can comprise single or multiple storage nodes. This is also called local recovery.

  • Logging
    During normal data updates, copies of the data change events are written to a log stored on each data node. You can use the logs to restore the data to a point in time.

  • Checkpointing
    The cluster supports two forms of checkpoints, local and global. Local checkpoints remove the tail of the log. Global checkpoints are created when the logs of all data nodes are flushed to disk, creating a transaction-consistent snapshot of all node data to disk. In this way, checkpointing permits a complete system restore of all nodes from a known good synchronization point.

  • System recovery
    In the event the whole system is shut down unexpectedly, you can restore it using checkpoints and change logs. Typically, the data is copied from disk into memory from known good synchronization points.

  • Hot backup and restore
    You can create simultaneous backups of each data node without disturbing executing transactions. The backup includes the metadata about the objects in the database, the data itself, and the current transaction log.

  • No single point of failure
    The architecture is designed so that any node can fail without bringing down the database system.

  • Failover
    To ensure node recovery is possible, all transactions are committed using read commit isolation and two-phase commits. Transactions are then doubly safe; that is, they are stored in two separate locations before the user gets acceptance of the transaction.

  • Partitioning
    Data is automatically partitioned across the data nodes. MySQL version 5.1 Cluster supports user-defined partitioning.

  • Online operations
    You can perform many of the maintenance operations online without the normal interruptions. These are operations that normally require stopping a server or placing locks on data. For example, it is possible to add new data nodes online, alter table structures, and even reorganize the data in the cluster.

For more information about MySQL Cluster, see the online reference manual.

MySQL High Availability

Learn more about this topic from MySQL High Availability.

Server bottlenecks and failures are a fact of life in any database deployment, but they don't have to bring everything to a halt. MySQL provides several features that can protect you from outages, whether you're running directly on the hardware, on virtual machines, or in the cloud. This book shows you how to use these features effectively, and helps you determine which combination of features will give you the most reliable system for a price you can afford.

See what you'll learn

2 Replies

 : Aug 05 2010 04:54 AM
Great summary of MySQL Cluster.

Note that SQL and low-level coding aren't the only options to access the data - there are plugins for Apache, OpenLDAP and OpenJPA that directly access the data in the data nodes.

 : Aug 05 2010 06:40 AM
Really good overview

Would add that MySQL Cluster automatically handles partitioning (or the user can create their own partitioning schemes) but either way, sharding at the application level is avoided, making MySQL Cluster very write-scalable

Also, the combination of synchronous replication (via 2PC protocol) and shared nothing architecture means failover is typically sub-second as lock managers are avoided.

This contributed towards the 5 x 9s design goal of MySQL Cluster, and its strong adoption in telecoms subscriber databases, service delivery and web ecommerce workloads