
Become a MySQL DBA blog series - Database upgrades


Database vendors typically release patches with bug and security fixes on a monthly basis - so why should we care? The news is full of reports of security breaches and hacked systems, so unless security is of no concern to you, you might want to have the most current security fixes on your systems. Major versions are rarer, and usually harder (and riskier) to upgrade to. But they might bring along some important features that make the upgrade worth the effort.

In this blog post, we will cover one of the most basic tasks of the DBA - minor and major database upgrades.  

This is the sixth installment in the ‘Become a MySQL DBA’ blog series. Our previous posts in the DBA series include Replication Topology Changes, Schema Changes, High Availability, Backup & Restore, and Monitoring & Trending.

MySQL upgrades

Once every couple of years, a MySQL version becomes outdated and is no longer supported by Oracle. It happened to MySQL 5.1 on December 4, 2013, and earlier to MySQL 5.0 on January 9, 2012. It will also happen to MySQL 5.5 sometime in 2018, eight years after its GA release. It means that for both MySQL 5.0 and MySQL 5.1, users cannot rely on fixes - not even for serious security bugs. This is usually the point where you really need to plan an upgrade of MySQL to a newer version.

You won’t be dealing only with major version upgrades, though - it’s more likely that you’ll be upgrading minor versions more often, like 5.6.x -> 5.6.y. Most likely, the newer version brings fixes for bugs that affect your workload, but it can be for any other reason.

There is a significant difference in the way you perform a major and a minor version upgrade.

Preparations

Before you can even think about performing an upgrade, you need to decide what kind of testing you need to do. Ideally, you have a staging/development environment where you do tests for your regular releases. If that is the case, the best way of doing pre-upgrade tests will be to build a database layer of your staging environment using the new MySQL version. Once that is done, you can proceed with a regular set of tests. More is better - you want to focus not only on the “feature X works/does not work” aspect but also on performance.

On the database side, you can also do some generic tests. For that you would need a list of queries in a slow log format. Then you can use pt-upgrade to run them on both the old and the new MySQL version, comparing the response time and result sets. In the past, we have noticed that pt-upgrade returns a lot of false positives - it may report a query as slow while in fact the query is perfectly fine on both versions. For that, you may want to introduce some additional sanity checks - parse the pt-upgrade output, grab the slow queries it reported, execute them once more on the servers and compare the results again. What you need to keep in mind is that you should connect to both the old and the new database server in the same way (a socket connection will be faster than TCP).
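
A minimal pt-upgrade run might look like the sketch below; the hostnames, credentials and file paths are placeholders, and the exact options depend on the Percona Toolkit version you have installed.

# Replay queries from a slow log against the old and the new server,
# comparing response times and result sets (all DSN values are examples).
pt-upgrade /var/log/mysql/slow.log \
    h=old-55-host,P=3306,u=ptuser,p=ptpass \
    h=new-56-host,P=3306,u=ptuser,p=ptpass \
    > pt_upgrade_report.txt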

Typical results from such generic tests are queries where the execution plan has changed - usually it’s enough to add some indexes or force the optimizer to pick the correct one. You may also see queries with discrepancies in the result set - most likely the result of a missing explicit ORDER BY in the query - you can’t rely on rows being returned in a particular order if you didn’t sort them explicitly.
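
For example, if a query picks a different execution plan on the new version, comparing EXPLAIN output on both versions and forcing the previously used index is a quick way to confirm the diagnosis - the table and index names below are made up for illustration only.

# Hypothetical schema - compare the plan on the old and the new server.
mysql -u root -p mydb -e "EXPLAIN SELECT * FROM orders WHERE customer_id = 123 AND status = 'new'\G"

# If the new optimizer picks a worse index, test with the old one forced.
mysql -u root -p mydb -e "SELECT * FROM orders FORCE INDEX (idx_customer_status) WHERE customer_id = 123 AND status = 'new'\G"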

Minor version upgrades

A minor upgrade is relatively easy to perform - most of the time, all you need to do is install the new version using the package manager of your distribution. Once you do that, you need to ensure that MySQL has been started after the upgrade and then you should run the mysql_upgrade script. This script goes through the tables in your database and ensures all of them are compatible with the current version. It may also fix your system tables if required.
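
On an Ubuntu/Debian-style system, a minor upgrade of a single node could look roughly like this; the package name is an example and depends on the distribution and the MySQL flavour you run.

# Example for a Percona Server 5.6 node; adjust the package name to your setup.
apt-get update
apt-get install --only-upgrade percona-server-server-5.6
service mysql restart        # make sure MySQL is running on the new binaries
mysql_upgrade -u root -p     # check and, if needed, fix system and user tables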

Obviously, installing a new version of a package requires the service to be stopped, so you need to plan the upgrade process. It may differ slightly depending on whether you use Galera Cluster or MySQL replication.

MySQL replication

When we are dealing with MySQL replication, the upgrade process is fairly simple. You upgrade slave by slave, taking each one out of rotation for the time required to perform the upgrade (a short time if everything goes right, not more than a few minutes of downtime). For that you may need to make some temporary changes in your proxy configuration to ensure that traffic won’t be routed to the slave that is under maintenance. It’s hard to give any details here because it depends on your setup. In some cases it might not even be necessary to make any changes, as the proxy can adapt to topology changes on its own and detect which nodes are available and which are not. That’s how you should configure your proxy, by the way.
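
If you happen to use HAProxy with the admin socket enabled, temporarily removing a slave from rotation might look like the sketch below; the backend/server names and the socket path are assumptions about your configuration.

# Take the slave out of the read backend before the upgrade...
echo "disable server mysql_ro/slave2" | socat stdio /var/run/haproxy.sock

# ...upgrade the node, then put it back once replication has caught up.
echo "enable server mysql_ro/slave2" | socat stdio /var/run/haproxy.sock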

Once every slave has been upgraded, you need to execute a planned failover. We discussed the process in an earlier blog post. The process may also depend on your setup - it doesn’t have to be a manual one if you have tools to automate it (MHA, for example). Once a new master is elected and failover is completed, you should perform the upgrade on the old master which, at this point, should be slaving off the new master. This concludes the minor version upgrade for a MySQL replication setup.

Galera Cluster

With Galera, it is somewhat easier to perform upgrades - you stop the nodes one by one, upgrade the stopped node and then restart it before moving on to the next. If your proxy needs some manual tweaks to ensure traffic won’t hit nodes which are undergoing maintenance, you will have to make those changes. If it can detect everything automatically, all you need to do is stop MySQL, upgrade and restart. Once you have gone over all nodes in the cluster, the upgrade is complete.
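
Per node, the rolling upgrade boils down to a short sequence like the one below; run it on one node at a time and wait until the node reports ‘Synced’ before moving on (the package name is just an example).

service mysql stop
apt-get install --only-upgrade percona-xtradb-cluster-56   # example package name
service mysql start

# Wait for the node to rejoin and sync before touching the next one.
mysql -u root -p -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';"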

Major version upgrades

A major version upgrade in MySQL would be 5.x -> 5.y or even 4.x -> 5.y. Such an upgrade is more tricky and complex than the minor upgrades we just covered.

The recommended way of performing the upgrade is to dump and reload the data - this takes time (depending on the database size), so it’s usually not feasible to complete it while the slave is out of rotation. Even when using mydumper/myloader, the process will take too long. In general, if the dataset is larger than a hundred gigabytes, it will probably require additional preparations.

While it might be possible to do just a binary upgrade (install the new packages), it is not recommended as there could be some incompatibilities in binary format between the old version and the new one which, even after mysql_upgrade has been executed, may still cause some problems. We’ve seen cases where a binary upgrade resulted in weird optimizer behavior, or caused instability. All those issues were solved by performing the dump/reload process. So, while a binary upgrade may work out fine for you, you may also run into serious problems - it’s your call and eventually it’s your decision. If you decide to perform a binary upgrade, you need to do detailed (and time-consuming) tests to ensure it does not break anything. Otherwise you are at risk. That’s why dump and reload is the officially recommended way to upgrade MySQL, and that’s why we will focus on this approach.

MySQL replication

If our setup is based on MySQL replication, we will build a slave on the new MySQL version. Let’s say we are upgrading from MySQL 5.5 to MySQL 5.6. As we have to perform a long dump/reload process, we may want to build a separate MySQL host for that. The simplest way would be to use xtrabackup to grab the data from one of the slaves along with the replication coordinates. That data will allow you to slave the new node off the old master. Once the new node (still running MySQL 5.5 - xtrabackup just moves the data, so we have to use the same, original MySQL version) is up and running, it’s time to dump the data. You can use any of the logical backup tools that we discussed in our earlier post on Backup and Restore. It doesn’t matter which, as long as you can restore the data later.
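
A sketch of grabbing the data and the replication coordinates from one of the 5.5 slaves; hostnames and paths are examples, and --slave-info makes xtrabackup record the master’s binlog coordinates in the xtrabackup_slave_info file.

# On the donor slave: stream a backup into the (empty) datadir of the new host.
innobackupex --slave-info --stream=xbstream /tmp | ssh new-node "xbstream -x -C /var/lib/mysql"

# On the new node: apply the logs and note the recorded coordinates.
innobackupex --apply-log /var/lib/mysql
cat /var/lib/mysql/xtrabackup_slave_info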

After the dump has been completed, it’s time to stop MySQL, wipe out the current data directory, install MySQL 5.6 on the node, initialize the data directory using the mysql_install_db script and start the new MySQL version. Then it’s time to load the dumps - a process which may also take a lot of time. Once done, you should have a new and shiny MySQL 5.6 node. It’s time now to sync it back with the master - you can use the coordinates collected by xtrabackup to slave the node off a member of the production cluster running MySQL 5.5. What’s important to remember here is that, as you want to eventually slave the node off the current production cluster, you need to ensure that the binary logs won’t rotate out. For large datasets, the dump/reload process may take days, so you want to adjust expire_logs_days accordingly on the master. You also want to confirm you have enough free disk space for all those binlogs.
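
Once the dump has been loaded into the 5.6 node, reconnecting it to the 5.5 master could look like this; the coordinates come from xtrabackup_slave_info and every value below is a placeholder.

# On the 5.5 master: keep binlogs around long enough for the dump/reload to finish.
mysql -u root -p -e "SET GLOBAL expire_logs_days = 14;"

# On the new 5.6 node: point it at the 5.5 master using the saved coordinates.
mysql -u root -p -e "CHANGE MASTER TO MASTER_HOST='mysql55-master', MASTER_USER='repl', MASTER_PASSWORD='replpass', MASTER_LOG_FILE='binlog.000123', MASTER_LOG_POS=4; START SLAVE;"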

Once we have MySQL 5.6 slaving off the MySQL 5.5 master, it’s time to go over the 5.5 slaves and upgrade them. The easiest way now is to leverage xtrabackup to copy the data from the 5.6 node. So, we take a 5.5 slave out of rotation, stop the MySQL server, wipe out the data directory, upgrade MySQL to 5.6 and restore the data from the other 5.6 slave using xtrabackup. Once that’s done, you can set up the replication again and you should be all set.

This process is much faster than doing a dump/reload for each of the slaves - it’s perfectly fine to do it once per replication cluster and then use physical backups to rebuild the other slaves. If you use AWS, you can rely on EBS snapshots instead of xtrabackup. Similar to the logical backup, it doesn’t really matter how you rebuild the slaves, as long as it works.

Finally, once all of the slaves have been upgraded, you need to fail over from the 5.5 master to one of the 5.6 slaves. At this point you may find that you won’t be able to keep the 5.5 node in the replication topology (even if you set up master-master replication between them). In general, replicating from a newer version of MySQL to an older one is not supported - replication might break. One way or another, you’ll want to upgrade and rebuild the old master using the same process as with the slaves.

Galera Cluster

Compared to MySQL Replication, Galera is, at the same time, both trickier and easier to upgrade. A cluster created with Galera should be treated as a single MySQL server. This is crucial to remember when discussing Galera upgrades - it’s not a master with some slaves or many masters connected to each other - it’s like a single server. To perform an upgrade of a single MySQL server you need to either do the offline upgrade (take it out of rotation, dump the data, upgrade MySQL to 5.6, load the data, bring it back into rotation) or create a slave, upgrade it and finally failover to it (the process we described in the previous section, while discussing MySQL replication upgrade).

The same applies to a Galera cluster - you either take everything down for the upgrade (all nodes) or you have to build a slave - another Galera cluster connected via MySQL replication.

An online upgrade process may look as follows. For starters, you need to create the slave on MySQL 5.6 - the process is exactly the same as above: create a node with MySQL 5.5 (it can be a Galera node but it’s not required), use xtrabackup to copy the data and replication coordinates, dump the data using a logical backup tool, wipe out the data directory, upgrade MySQL to 5.6 Galera, bootstrap the cluster, load the data and slave the node off the 5.5 Galera cluster.

At this point you should have two Galera clusters - 5.5 and a single node of Galera 5.6, both connected via replication. Next step will be to build the 5.6 cluster to a production size. It’s hard to tell how to do it - if you are in the cloud, you can just spin up new instances. If you are using colocated servers in a datacenter, you may need to move some of the hardware from the old to the new cluster. You need to keep in mind the total capacity of the system to make sure it can cope with some nodes taken out of rotation. While hardware management may be tricky, what is nice is that you don’t have to do much regarding building the 5.6 cluster - Galera will use SST to populate new nodes automatically.

In general, the goal of this phase is to build a 5.6 cluster that’s large enough to handle the production workload. Once that’s done, you need to fail over to the 5.6 Galera cluster - this concludes the upgrade. Of course, you may still need to add some more nodes to it, but that is now the regular process of provisioning Galera nodes, only now you use 5.6 instead of 5.5.

 


Deploy Asynchronous Replication Slave to MariaDB Galera Cluster 10.x using ClusterControl


Combining Galera and asynchronous replication in the same MariaDB setup, aka Hybrid Replication, can be useful - e.g. as a live backup node in a remote datacenter or reporting/analytics server. We already blogged about this setup for Codership/Galera or Percona XtraDB Cluster users, but master failover did not work for MariaDB because of its different GTID approach. In this post, we will show you how to deploy an asynchronous replication slave to MariaDB Galera Cluster 10.x (with master failover!), using GTID with ClusterControl v1.2.10. 

Preparing the Master

First and foremost, you must ensure that the master and slave nodes are running MariaDB Galera 10.0.2 or later. A MariaDB replication slave requires at least one Galera node acting as a master with GTID. However, we would recommend users to configure all the MariaDB Galera nodes as masters. GTID, which is automatically enabled in MariaDB, will be used for master failover.

The following must be true for the masters:

  • At least one master among the Galera nodes
  • All masters must be configured with the same domain ID
  • log_slave_updates must be enabled
  • All masters’ MariaDB port is accessible by ClusterControl and slaves
  • Must be running MariaDB version 10.0.2 or later

To configure a Galera node as master, change the MariaDB configuration file for that node as per below:

gtid_domain_id=<must be same across all mariadb servers participating in replication>
server_id=<must be unique>
binlog_format=ROW
log_slave_updates=1
log_bin=binlog
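
After restarting a Galera node with these settings, you may want to verify that they are in effect; a quick check from the shell could look like this.

mysql -u root -p -e "SHOW GLOBAL VARIABLES WHERE Variable_name IN ('gtid_domain_id','server_id','binlog_format','log_slave_updates','log_bin');"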

 

Preparing the Slave

For the slave, you would need a separate host or VM, with or without MariaDB installed. If you do not have MariaDB installed, and choose ClusterControl to install MariaDB on the slave, ClusterControl will perform the necessary actions to prepare the slave: configure the root password (based on monitored_mysql_root_password), create the slave user (based on repl_user, repl_password), configure MariaDB, start the server and finally start replication.

In short, we must perform the following actions beforehand:

  • The slave node must be accessible using passwordless SSH from the ClusterControl server
  • MariaDB port (default 3306) and netcat port 9999 on the slave are open for connections
  • You must configure the following options in the ClusterControl configuration file for the respective cluster ID under /etc/cmon.cnf or /etc/cmon.d/cmon_<cluster ID>.cnf:
    • repl_user=<the replication user>
    • repl_password=<password for replication user>
    • monitored_mysql_root_password=<the mysql root password of all nodes including slave>
  • The slave configuration template file must be configured beforehand, and must have at least the following variables defined in the MariaDB configuration template:
    • gtid_domain_id (the value must be the same across all nodes participating in the replication)
    • server_id
    • basedir
    • datadir

To prepare the MariaDB configuration file for the slave, go to ClusterControl > Manage > Configurations > Template Configuration files > edit my.cnf.slave and add the following lines:

[mysqld]
bind-address=0.0.0.0
gtid_domain_id=1
log_bin=binlog
log_slave_updates=1
expire_logs_days=7
server_id=1001
binlog_format=ROW
slave_net_timeout=60
basedir=/usr
datadir=/var/lib/mysql

 

Attaching a Slave via ClusterControl

Let’s now add a MariaDB slave using ClusterControl. Our example cluster is running MariaDB 10.1.2 with ClusterControl v1.2.10. Our deployment will look like this:

1. Configure Galera nodes as masters. Go to ClusterControl > Manage > Configurations, click Edit/View on each configuration file and append the following lines under the [mysqld] section:

mariadb1:

gtid_domain_id=1
server_id=101
binlog_format=ROW
log_slave_updates=1
log_bin=binlog
expire_logs_days=7

mariadb2:

gtid_domain_id=1
server_id=102
binlog_format=ROW
log_slave_updates=1
log_bin=binlog
expire_logs_days=7

mariadb3:

gtid_domain_id=1
server_id=103
binlog_format=ROW
log_slave_updates=1
log_bin=binlog
expire_logs_days=7

2. Perform a rolling restart from ClusterControl > Manage > Upgrades > Rolling Restart. Optionally, you can restart one node at a time under ClusterControl > Nodes > select the corresponding node > Shutdown > Execute, and then start it again.

3. On the ClusterControl node, setup passwordless SSH to the slave node:

$ ssh-copy-id -i ~/.ssh/id_rsa 10.0.0.128

4. Then, ensure the following lines exist in the corresponding cmon.cnf or cmon_<cluster ID>.cnf:

repl_user=slave
repl_password=slavepassword123
monitored_mysql_root_password=myr00tP4ssword

Restart CMON daemon to apply the changes:

$ service cmon restart

5. Go to ClusterControl > Manage > Configurations > Create New Template or Edit/View existing template, and then add the following lines:

[mysqld]
bind-address=0.0.0.0
gtid_domain_id=1
log_bin=binlog
log_slave_updates=1
expire_logs_days=7
server_id=1001
binlog_format=ROW
slave_net_timeout=60
basedir=/usr
datadir=/var/lib/mysql

6. Now, we are ready to add the slave. Go to ClusterControl > Cluster Actions > Add Replication Slave. Choose a master and the configuration file as per the example below:

Click on Proceed. A job will be triggered and you can monitor the progress at ClusterControl > Logs > Jobs. You should notice that ClusterControl will use non-GTID replication in the Jobs details:

You can simply ignore it as we will setup our MariaDB GTID replication manually later. Once the add job is completed, you should see the master and slave nodes in the grids:

At this point, the slave is replicating from the designated master using the old way (using binlog file/position). 

Replication using MariaDB GTID

Ensure the slave has caught up with the master; the lag value should be 0. Then, stop the slave on slave1:

MariaDB> STOP SLAVE;

Verify the slave status and ensure Slave_IO_Running and Slave_SQL_Running return No. Retrieve the latest values of Master_Log_File and Read_Master_Log_Pos:

MariaDB> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State:
                  Master_Host: 10.0.0.131
                  Master_User: slave
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: binlog.000022
          Read_Master_Log_Pos: 55111710
               Relay_Log_File: ip-10-0-0-128-relay-bin.000002
                Relay_Log_Pos: 28699373
        Relay_Master_Log_File: binlog.000022
             Slave_IO_Running: No
            Slave_SQL_Running: No

Then on the master (mariadb1) run the following statement to retrieve the GTID:

MariaDB> SELECT binlog_gtid_pos('binlog.000022',55111710);
+-------------------------------------------+
| binlog_gtid_pos('binlog.000022',55111710) |
+-------------------------------------------+
| 1-103-613991                              |
+-------------------------------------------+

The result from the function call is the current GTID, which corresponds to the binary file position on the master. 

Set the GTID slave position on the slave node. Run the following statement on slave1:

MariaDB> STOP SLAVE;
MariaDB> SET GLOBAL gtid_slave_pos = '1-103-613991';
MariaDB> CHANGE MASTER TO master_use_gtid=slave_pos;
MariaDB> START SLAVE;

The slave will start catching up with the master using GTID, as you can verify with the SHOW SLAVE STATUS command:

        ...
        Seconds_Behind_Master: 384
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 101
               Master_SSL_Crl:
           Master_SSL_Crlpath:
                   Using_Gtid: Slave_Pos
                  Gtid_IO_Pos: 1-103-626963

 

Master Failover and Recovery

ClusterControl doesn’t support MariaDB slave failover with GTID via the ClusterControl UI in the current version (v1.2.10); this will be supported in v1.2.11. So, if you are using 1.2.10 or earlier, failover has to be done manually whenever the designated master fails. Initially, when you added a replication slave via ClusterControl, it only added the slave user on the designated master (mariadb1). To ensure failover works, we have to explicitly add the slave user on mariadb2 and mariadb3.

Run the following commands once on mariadb2 or mariadb3. They will replicate to all Galera nodes:

MariaDB> GRANT REPLICATION SLAVE ON *.* TO slave@'10.0.0.128' IDENTIFIED BY 'slavepassword123';
MariaDB> FLUSH PRIVILEGES;

If mariadb1 fails and you need to switch to another master, you just need to run the following statements on slave1:

MariaDB> STOP SLAVE;
MariaDB> CHANGE MASTER TO MASTER_HOST='10.0.0.132';
MariaDB> START SLAVE;

The slave will resume from the last Gtid_IO_Pos. Check the slave status to verify everything is working:

MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.0.0.132
                  Master_User: slave
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: binlog.000020
          Read_Master_Log_Pos: 140875476
               Relay_Log_File: ip-10-0-0-128-relay-bin.000002
                Relay_Log_Pos: 1897915
        Relay_Master_Log_File: binlog.000020
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 130199239
              Relay_Log_Space: 12574457
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 133
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 102
               Master_SSL_Crl:
           Master_SSL_Crlpath:
                   Using_Gtid: Slave_Pos
                  Gtid_IO_Pos: 1-103-675356

That’s it! Please give it a try and let us know how these instructions worked for you.


How to Deploy and Manage MaxScale using ClusterControl


We were delighted to announce that ClusterControl now supports MaxScale, the intelligent database gateway from the MariaDB team. In this blog, we will show you how to deploy and manage MaxScale from ClusterControl. We previously blogged about MaxScale; read that post if you’d like a quick overview and introduction. We also published a quick benchmark of HAProxy vs MaxScale.

How to install MaxScale from ClusterControl

Assume you have a Galera Cluster currently managed by ClusterControl, and as a next step you want to install MaxScale as a load balancer. From the ClusterControl UI, go to Manage > Load Balancer > Install MaxScale. You’ll be presented with the following dialog box.

Please make sure you have chosen ‘Create MaxScale Instance’. We’ll discuss the “MariaDB Repository URL” further down in this blog. The remaining options are pretty clear: 

  • choose or type in the IP address of the node where MaxScale will be installed. ClusterControl has to be able to SSH to this host in a passwordless fashion.
  • username and password - these will be used to access MaxScale’s administration interface.
  • next is another username/password - this time, it is a MariaDB/MySQL user that will be used by MaxScale to access and monitor the MariaDB/MySQL nodes in your infrastructure.

Below that you can set a couple of configuration settings for MaxScale:

  • how many threads it is allowed to use 
  • which ports should be used for different services. Currently, MaxScale deployed by ClusterControl comes with a CLI service and a Debug service - those are standard services available for MaxScale. We also create two ‘production’ services - ‘RW’, which implements a read-write split, and ‘RR’, which implements round-robin access to all of the nodes in the cluster. Each of those services has a port assigned to it - you can change it at the deploy stage.

At the bottom of the dialog box, you’ll see your nodes with checkboxes - you can pick those that will be included in the proxy’s configuration.

Let’s get back to the repository URL. Once you click on the tooltip, you’ll see the following screen:

MariaDB introduced individual links to the repository where MaxScale is stored. To get your link, you should log into the MariaDB Enterprise Portal and generate one for yourself. Once you have it, you can paste it in the MariaDB Repository URL box. This concludes the preparations for deployment, you can now click on the ‘Install MaxScale’ button and wait for the process to complete.

You can monitor the progress in ClusterControl > Logs > Jobs. When it finishes, it prints some examples covering how you can connect to your MaxScale instance. For better security you should delete this job by clicking the ‘Delete’ button above it.

How to add an existing MaxScale instance to ClusterControl?

If you already have MaxScale installed in your setup, you can easily add it to ClusterControl to benefit from health monitoring and access to MaxAdmin - MaxScale’s CLI - from the same interface you use to manage the database nodes.
The only requirement is to have passwordless SSH configured between the ClusterControl node and the host where MaxScale is running. Once you have SSH set up, you can go to Manage > Load Balancer > Install MaxScale. Once you are there, ensure you have chosen ‘Add Existing MaxScale’. You’ll then be presented with the following screen:

The only data you need to pass here is the IP of the MaxScale server and the port of the MaxScale CLI. After clicking ‘Install MaxScale’, it will be added to the list of processes monitored by ClusterControl.

How to uninstall MaxScale from ClusterControl?

If, for some reason, you would like to uninstall MaxScale, go to the ‘Nodes’ tab and then click on the ‘-’ icon next to the MaxScale node. You’ll be asked if you want to remove the node. If you click yes, MaxScale will be uninstalled from the node you’ve just chosen.

How to access MaxScale MaxAdmin commands?

MaxAdmin is a command line interface available with MaxScale that allows extensive health checks, user management, status and control of MaxScale operations. ClusterControl gives you direct access to the MaxAdmin commands. You can reach it by going to ‘Nodes’ tab and then clicking on your MaxScale node. Once you do that, you’ll be presented with the following screenshot:

You can see the status of the node, and stop the instance if needed. In the user/password boxes, you need to enter MaxScale’s administrative user credentials - this is what you put in the Admin username/password fields when installing MaxScale. If you added an existing MaxScale installation, use any administrative user that you created there. By default (if you do not provide any input) ClusterControl will try to use the default username and password created during the MaxScale installation (admin/mariadb).

The drop down box contains a predefined set of commands that you can execute by just picking them from the list and then clicking on the ‘Execute’ button. For a full list of commands, you can execute the ‘help’ command.

Results are presented in a console window:

You can clear it at any time using the ‘Reset Console’ button.

Please keep in mind that some of the commands require additional parameters. For example, the ‘show server’ command requires you to pass a server name as a parameter. You can do so by just editing the dropdown box and typing in the parameter:

As we mentioned before, you are not limited to the predefined commands - you can type anything you like in the command box. For example, we can stop and then start one of the services:

CLI access from ClusterControl talks directly to MaxAdmin, therefore you can use ClusterControl to execute everything that you are able to run using the MaxAdmin CLI client.

Have fun with MaxScale!


ClusterControl Tips & Tricks: Updating your MySQL Configuration


Requires ClusterControl 1.2.11 or later. Applies to MySQL based clusters.

From time to time it is necessary to tune and update your configuration. Here we will show you how you can change/update  individual parameters using the ClusterControl UI. Navigate to Manage > Configurations.

Let’s say you want to change max_connections from 200 to 500 on all DB nodes in your cluster.

Click on Change Parameter. Select all MySQL Servers in the DB Instances drop down and select the Group (in this case MYSQLD) where the Parameter that you want to change resides, select the parameter (max_connections), and set the New Value to 500:

Press Proceed, and then you will be presented with a Config Change Log of your parameter change:

The Config Change Log says that:

  1. Change was successful
  2. The change was possible with SET GLOBAL (in this case SET GLOBAL max_connections=500)
  3. The change was persisted in my.cnf
  4. No restart is required

What if you don’t find the parameter you want to change in the Parameter drop-down? You can then type in the parameter by hand and give it a new value. If possible, SET GLOBAL variable_name=value will be executed, and if not, a restart may be required. Remember, the change will be persisted in my.cnf upon successful execution.
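
Under the hood, the change described above corresponds roughly to the following manual steps; this is a sketch of the equivalent commands, not the exact ones ClusterControl runs, and the my.cnf path is an example.

# Apply the change at runtime on each node and verify it...
mysql -u root -p -e "SET GLOBAL max_connections = 500;"
mysql -u root -p -e "SHOW GLOBAL VARIABLES LIKE 'max_connections';"

# ...then persist it in my.cnf so it survives a restart.
sed -i 's/^max_connections.*/max_connections=500/' /etc/mysql/my.cnf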

Happy Clustering!

PS.: To get started with ClusterControl, click here!


ClusterControl Tips & Tricks: Securing your MySQL Installation


Requires ClusterControl 1.2.11 or later. Applies to MySQL based clusters.

During the life cycle of a database installation, it is common that new user accounts are created. It is good practice to verify once in a while that security is up to standard. That is, there should at least not be any accounts with global access rights, or accounts without a password.
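
If you want to spot some of these accounts manually, a quick (partial) check of the privilege table might look like this; the column names apply to MySQL/MariaDB 5.x-era servers.

# Accounts without a password.
mysql -u root -p -e "SELECT user, host FROM mysql.user WHERE password = '';"

# Accounts allowed to connect from any host.
mysql -u root -p -e "SELECT user, host FROM mysql.user WHERE host = '%';"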

Using ClusterControl, you can at any time perform a security audit.

In the User Interface go to Manage > Developer Studio. Expand the folders so that you see s9s/mysql/programs. Click on security_audit.js and then press Compile and Run.

If there are problems you will clearly see it in the messages section:

Enlarged Messages output:

Here we have accounts that do not have a password. Those accounts should not exist in a secure database installation. That is rule number one. To correct this problem, click on mysql_secure_installation.js in the s9s/mysql/programs folder.

Click on the dropdown arrow next to Compile and Run and press Change Settings. You will see the following dialog, where you enter the argument “STRICT”:

Then press Execute. The mysql_secure_installation.js script will then do the following on each MySQL database instance that is part of the cluster:

  1. Delete anonymous users
  2. Drop the 'test' database (if it exists)
  3. If STRICT is given as an argument to mysql_secure_installation.js, it will also:
    • Remove accounts without passwords

In the Message box you will see:

The MySQL database servers that are part of this cluster have now been secured, and you have reduced the risk of compromising your data.

You can re-run security_audit.js to verify that the actions have had effect.

Happy Clustering!

PS.: To get started with ClusterControl, click here!


Become a MySQL DBA blog series - Galera Cluster diagnostic logs


In a previous post, we discussed the error log as a source of diagnostic information for MySQL. We’ve shown examples of problems where the error log can help you diagnose issues with MySQL. This post focuses on the Galera Cluster error logs, so if you are currently using Galera, read on.

In a Galera cluster, whether you use MariaDB Cluster, Percona XtraDB Cluster or the Codership build, the MySQL error log is even more important. It gives you the same information as MySQL would, but you’ll also find in it information about Galera internals and replication. This data is crucial to understand the state of your cluster and to identify any issues which may impact the cluster’s ability to operate. In this post, we’ll try to make the Galera error log easier to understand.

This is the sixteenth installment in the ‘Become a MySQL DBA’ blog series.

Clean Galera startup

Let’s start by looking at what a clean startup of a Galera node looks like.

151028 14:28:50 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
151028 14:28:50 mysqld_safe Skipping wsrep-recover for 98ed75de-7c05-11e5-9743-de4abc22bd11:11152 pair
151028 14:28:50 mysqld_safe Assigning 98ed75de-7c05-11e5-9743-de4abc22bd11:11152 to wsrep_start_position

What we can see here is Galera trying to determine the sequence number of the last writeset this node applied. It is found in the grastate.dat file. In this particular case, the shutdown was clean, so all the needed data was found in that file, making the wsrep-recovery process unnecessary (i.e., the process of calculating the last writeset based on the data applied to the database). Galera sets wsrep_start_position to the last applied writeset.
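
For reference, the grastate.dat of a cleanly stopped node typically looks similar to the sample below; the UUID and seqno will differ on your system, and the exact set of fields depends on the Galera version.

cat /var/lib/mysql/grastate.dat
# GALERA saved state
version: 2.1
uuid:    98ed75de-7c05-11e5-9743-de4abc22bd11
seqno:   11152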

2015-10-28 14:28:50 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2015-10-28 14:28:50 0 [Note] /usr/sbin/mysqld (mysqld 5.6.26-74.0-56) starting as process 553 ...
2015-10-28 14:28:50 553 [Note] WSREP: Read nil XID from storage engines, skipping position init
2015-10-28 14:28:50 553 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/libgalera_smm.so'
2015-10-28 14:28:50 553 [Note] WSREP: wsrep_load(): Galera 3.12.2(rf3e626d) by Codership Oy <info@codership.com> loaded successfully.
2015-10-28 14:28:50 553 [Note] WSREP: CRC-32C: using hardware acceleration.
2015-10-28 14:28:50 553 [Note] WSREP: Found saved state: 98ed75de-7c05-11e5-9743-de4abc22bd11:11152

Next we can see the WSREP library being loaded and, again, information about the last state of the cluster before shutdown.

2015-10-28 14:28:50 553 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 172.30.4.156; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_count = 0; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum =

This line covers all configuration related to Galera.

2015-10-28 14:28:50 553 [Note] WSREP: Service thread queue flushed.
2015-10-28 14:28:50 553 [Note] WSREP: Assign initial position for certification: 11152, protocol version: -1
2015-10-28 14:28:50 553 [Note] WSREP: wsrep_sst_grab()
2015-10-28 14:28:50 553 [Note] WSREP: Start replication
2015-10-28 14:28:50 553 [Note] WSREP: Setting initial position to 98ed75de-7c05-11e5-9743-de4abc22bd11:11152
2015-10-28 14:28:50 553 [Note] WSREP: protonet asio version 0
2015-10-28 14:28:50 553 [Note] WSREP: Using CRC-32C for message checksums.
2015-10-28 14:28:50 553 [Note] WSREP: backend: asio

The initialization process continues; WSREP replication has been started.

2015-10-28 14:28:50 553 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
2015-10-28 14:28:50 553 [Note] WSREP: restore pc from disk failed

Those two lines above require a detailed comment. The file which is missing here, gvwstate.dat, contains information about the cluster state as the given node sees it. In it you can find the UUID of the node and of the other members of the cluster which are in the PRIMARY state (i.e., which are part of the cluster). In case of an unexpected crash, this file allows the node to understand the state of the cluster before it crashed. In case of a clean shutdown, the gvwstate.dat file is removed. That is why it cannot be found in our example.

2015-10-28 14:28:50 553 [Note] WSREP: GMCast version 0
2015-10-28 14:28:50 553 [Note] WSREP: (3506d410, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2015-10-28 14:28:50 553 [Note] WSREP: (3506d410, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2015-10-28 14:28:50 553 [Note] WSREP: EVS version 0
2015-10-28 14:28:50 553 [Note] WSREP: gcomm: connecting to group 'my_wsrep_cluster', peer '172.30.4.156:4567,172.30.4.191:4567,172.30.4.220:4567'

In this section, you will see the initial communication within the cluster - the node is trying to connect to the other members listed in the wsrep_cluster_address variable. For Galera, it’s enough to get in touch with one active member of the cluster - even if node “A” has contact only with node “B” but not “C”, “D” and “E”, node “B” will work as a relay and pass through the needed communication.

2015-10-28 14:28:50 553 [Warning] WSREP: (3506d410, 'tcp://0.0.0.0:4567') address 'tcp://172.30.4.156:4567' points to own listening address, blacklisting
2015-10-28 14:28:50 553 [Note] WSREP: (3506d410, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2015-10-28 14:28:50 553 [Note] WSREP: declaring 6221477d at tcp://172.30.4.191:4567 stable
2015-10-28 14:28:50 553 [Note] WSREP: declaring 78b22e70 at tcp://172.30.4.220:4567 stable

Here we can see the results of the communication exchange. At first, the node blacklists its own IP address. Then, as a result of the network exchange, the remaining two nodes of the cluster are found and declared ‘stable’. Please note the IDs used here - those node IDs are used in other places too, but here you can easily link them with the relevant IP addresses. A node ID can also be located in the gvwstate.dat file (it’s the first segment of ‘my_uuid’) or by running “SHOW GLOBAL STATUS LIKE ‘wsrep_gcomm_uuid’”. Please keep in mind that the UUID may change, for example, after a crash.

2015-10-28 14:28:50 553 [Note] WSREP: Node 6221477d state prim
2015-10-28 14:28:50 553 [Note] WSREP: view(view_id(PRIM,3506d410,5) memb {
    3506d410,0
    6221477d,0
    78b22e70,0
} joined {
} left {
} partitioned {
})
2015-10-28 14:28:50 553 [Note] WSREP: save pc into disk
2015-10-28 14:28:50 553 [Note] WSREP: gcomm: connected

As a result, the node managed to join the cluster - you can see that all IDs are listed as members, and the cluster is in ‘Primary’ state. There are no nodes in ‘joined’ state, and the cluster was not partitioned.

2015-10-28 14:28:50 553 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2015-10-28 14:28:50 553 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2015-10-28 14:28:50 553 [Note] WSREP: Opened channel 'my_wsrep_cluster'
2015-10-28 14:28:50 553 [Note] WSREP: Waiting for SST to complete.
2015-10-28 14:28:50 553 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 3
2015-10-28 14:28:50 553 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 355385f7-7d80-11e5-bb70-17166cef2a4d
2015-10-28 14:28:50 553 [Note] WSREP: STATE EXCHANGE: sent state msg: 355385f7-7d80-11e5-bb70-17166cef2a4d
2015-10-28 14:28:50 553 [Note] WSREP: STATE EXCHANGE: got state msg: 355385f7-7d80-11e5-bb70-17166cef2a4d from 0 (172.30.4.156)
2015-10-28 14:28:50 553 [Note] WSREP: STATE EXCHANGE: got state msg: 355385f7-7d80-11e5-bb70-17166cef2a4d from 1 (172.30.4.191)
2015-10-28 14:28:50 553 [Note] WSREP: STATE EXCHANGE: got state msg: 355385f7-7d80-11e5-bb70-17166cef2a4d from 2 (172.30.4.220)
2015-10-28 14:28:50 553 [Note] WSREP: Quorum results:
    version    = 3,
    component  = PRIMARY,
    conf_id    = 4,
    members    = 3/3 (joined/total),
    act_id     = 11152,
    last_appl. = -1,
    protocols  = 0/7/3 (gcs/repl/appl),
    group UUID = 98ed75de-7c05-11e5-9743-de4abc22bd11

Cluster members exchange information about their status and as a result we see a report on the quorum calculation - the cluster is in ‘Primary’ state with all three nodes joined.

2015-10-28 14:28:50 553 [Note] WSREP: Flow-control interval: [28, 28]
2015-10-28 14:28:50 553 [Note] WSREP: Restored state OPEN -> JOINED (11152)
2015-10-28 14:28:50 553 [Note] WSREP: New cluster view: global state: 98ed75de-7c05-11e5-9743-de4abc22bd11:11152, view# 5: Primary, number of nodes: 3, my index: 0, protocol version 3
2015-10-28 14:28:50 553 [Note] WSREP: SST complete, seqno: 11152
2015-10-28 14:28:50 553 [Note] WSREP: Member 0.0 (172.30.4.156) synced with group.
2015-10-28 14:28:50 553 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 11152)

Those few lines above are a bit misleading - there was no SST performed, as the node was already synced with the cluster. In short, at this very moment, the cluster is up and our node is synced with the other cluster members.

2015-10-28 14:28:50 553 [Warning] option 'innodb-buffer-pool-instances': signed value -1 adjusted to 0
2015-10-28 14:28:50 553 [Note] Plugin 'FEDERATED' is disabled.
2015-10-28 14:28:50 553 [Note] InnoDB: Using atomics to ref count buffer pool pages
2015-10-28 14:28:50 553 [Note] InnoDB: The InnoDB memory heap is disabled
2015-10-28 14:28:50 553 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins

...

2015-10-28 14:28:52 553 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.6.26-74.0-56'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  Percona XtraDB Cluster (GPL), Release rel74.0, Revision 624ef81, WSREP version 25.12, wsrep_25.12

Finally, the MySQL startup process is coming to its end - MySQL is up and running, ready to handle connections.

As you can see, during a clean start, we can find a lot of information in the error log. We can see how a node sees the cluster, we can see information about quorum calculation, and we can see the IDs and IP addresses of other nodes in the cluster. This gives us great insight into the cluster state - which nodes were up, and whether there were any connectivity issues. The information about Galera configuration is also very useful - my.cnf may have been changed in the meantime, and the same goes for the runtime configuration. Yet, we can still check how exactly Galera was configured at the time of the startup. It may come in handy when dealing with misconfigurations.

Galera node - IST process

Ok, so we covered a regular startup - when a node is up to date with the cluster. What about when a node is not up to date? There are two options. If the donor has all the needed writesets stored in its gcache, Incremental State Transfer (IST) can be performed. If not, State Snapshot Transfer (SST) will be executed, and it involves copying all data from the donor to the joining node. This time, we are looking at a node joining/synchronizing using IST. To make it easier to read, we are going to remove some of the log entries which didn’t change compared to the clean startup we discussed earlier.
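
Whether IST is possible depends on how much of the recent write history still fits in the donor’s gcache; you can check the configured size (128MB by default) via wsrep_provider_options, for example:

mysql -u root -p -e "SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options'\G" | grep -o 'gcache.size = [^;]*'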

2015-10-28 16:36:50 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2015-10-28 16:36:50 0 [Note] /usr/sbin/mysqld (mysqld 5.6.26-74.0-56) starting as process 10144 ...

...

2015-10-28 16:36:50 10144 [Note] WSREP: Found saved state: 98ed75de-7c05-11e5-9743-de4abc22bd11:-1

Ok, MySQL started and after initialization, Galera located the saved state with ‘-1’ as the writeset sequence number. It means that the node didn’t shut down cleanly and the correct state wasn’t written to the grastate.dat file.

...

2015-10-28 16:36:50 10144 [Note] WSREP: Start replication
2015-10-28 16:36:50 10144 [Note] WSREP: Setting initial position to 98ed75de-7c05-11e5-9743-de4abc22bd11:11152

Even though the grastate.dat file didn’t contain the sequence number, Galera can calculate it on its own and set up the initial position accordingly.

...

2015-10-28 16:36:50 10144 [Note] WSREP: gcomm: connecting to group 'my_wsrep_cluster', peer '172.30.4.156:4567,172.30.4.191:4567,172.30.4.220:4567'
2015-10-28 16:36:50 10144 [Warning] WSREP: (16ca37d4, 'tcp://0.0.0.0:4567') address 'tcp://172.30.4.191:4567' points to own listening address, blacklisting
2015-10-28 16:36:50 10144 [Note] WSREP: (16ca37d4, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2015-10-28 16:36:51 10144 [Note] WSREP: declaring 3506d410 at tcp://172.30.4.156:4567 stable
2015-10-28 16:36:51 10144 [Note] WSREP: declaring 78b22e70 at tcp://172.30.4.220:4567 stable
2015-10-28 16:36:51 10144 [Note] WSREP: Node 3506d410 state prim
2015-10-28 16:36:51 10144 [Note] WSREP: view(view_id(PRIM,16ca37d4,8) memb {
    16ca37d4,0
    3506d410,0
    78b22e70,0
} joined {
} left {
} partitioned {
})

Similar to the clean startup, the Galera nodes exchanged some messages and decided on the state of the cluster. In this case, all three nodes were up and the cluster was formed.

2015-10-28 16:36:51 10144 [Note] WSREP: save pc into disk
2015-10-28 16:36:51 10144 [Note] WSREP: Waiting for SST to complete.
2015-10-28 16:36:51 10144 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 3
2015-10-28 16:36:51 10144 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 17633a80-7d92-11e5-8bdc-f6091df00f00
2015-10-28 16:36:51 10144 [Note] WSREP: STATE EXCHANGE: sent state msg: 17633a80-7d92-11e5-8bdc-f6091df00f00
2015-10-28 16:36:51 10144 [Note] WSREP: STATE EXCHANGE: got state msg: 17633a80-7d92-11e5-8bdc-f6091df00f00 from 0 (172.30.4.191)
2015-10-28 16:36:51 10144 [Note] WSREP: STATE EXCHANGE: got state msg: 17633a80-7d92-11e5-8bdc-f6091df00f00 from 1 (172.30.4.156)
2015-10-28 16:36:51 10144 [Note] WSREP: STATE EXCHANGE: got state msg: 17633a80-7d92-11e5-8bdc-f6091df00f00 from 2 (172.30.4.220)
2015-10-28 16:36:51 10144 [Note] WSREP: Quorum results:
    version    = 3,
    component  = PRIMARY,
    conf_id    = 6,
    members    = 2/3 (joined/total),
    act_id     = 31382,
    last_appl. = -1,
    protocols  = 0/7/3 (gcs/repl/appl),
    group UUID = 98ed75de-7c05-11e5-9743-de4abc22bd11

Again, communication exchange happens and quorum calculation is performed. This time you can see that three members were detected and only two of them joined. This is something we expect - one node is not synced with the cluster.

2015-10-28 16:36:51 10144 [Note] WSREP: Flow-control interval: [28, 28]
2015-10-28 16:36:51 10144 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 31382)
2015-10-28 16:36:51 10144 [Note] WSREP: State transfer required:
    Group state: 98ed75de-7c05-11e5-9743-de4abc22bd11:31382
    Local state: 98ed75de-7c05-11e5-9743-de4abc22bd11:11152

State transfer was deemed necessary because the local node is not up to date with the applied writesets.

2015-10-28 16:36:51 10144 [Note] WSREP: New cluster view: global state: 98ed75de-7c05-11e5-9743-de4abc22bd11:31382, view# 7: Primary, number of nodes: 3, my index: 0, protocol version 3
2015-10-28 16:36:51 10144 [Warning] WSREP: Gap in state sequence. Need state transfer.
2015-10-28 16:36:51 10144 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'joiner' --address '172.30.4.191' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '10144''''
WSREP_SST: [INFO] Streaming with xbstream (20151028 16:36:52.039)
WSREP_SST: [INFO] Using socat as streamer (20151028 16:36:52.041)

The state transfer is performed using xtrabackup; this is true for both IST and SST (if xtrabackup is configured as the SST method). Both the xbstream and socat binaries are required for IST (and SST) to execute. It may happen that socat is not installed on the system. In such a case, the node won’t be able to join the cluster.
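
It is worth verifying up front that the required binaries are present on every node; a simple pre-flight check could look like this (package names are distribution-dependent and assume the Percona repository is configured).

# Both joiner and donor need these for xtrabackup-based IST/SST.
for bin in socat xbstream innobackupex; do
    which "$bin" >/dev/null 2>&1 || echo "missing: $bin"
done

# Example installation on a RHEL/CentOS-style system.
yum install -y socat percona-xtrabackup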

WSREP_SST: [INFO] Evaluating timeout -k 110 100 socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20151028 16:36:52.132)
2015-10-28 16:36:52 10144 [Note] WSREP: Prepared SST request: xtrabackup-v2|172.30.4.191:4444/xtrabackup_sst//1
2015-10-28 16:36:52 10144 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2015-10-28 16:36:52 10144 [Note] WSREP: REPL Protocols: 7 (3, 2)
2015-10-28 16:36:52 10144 [Note] WSREP: Service thread queue flushed.
2015-10-28 16:36:52 10144 [Note] WSREP: Assign initial position for certification: 31382, protocol version: 3
2015-10-28 16:36:52 10144 [Note] WSREP: Service thread queue flushed.
2015-10-28 16:36:52 10144 [Note] WSREP: Prepared IST receiver, listening at: tcp://172.30.4.191:4568
2015-10-28 16:36:52 10144 [Note] WSREP: Member 0.0 (172.30.4.191) requested state transfer from '*any*'. Selected 1.0 (172.30.4.156)(SYNCED) as donor.
2015-10-28 16:36:52 10144 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 31389)

This stage covers the state transfer - as you can see, a donor is selected. In our case, it is 172.30.4.156. This is pretty important information, especially when SST is executed - the donor will run xtrabackup to move the data and it will log the progress in the innobackup.backup.log file in the MySQL data directory. Different network issues may also cause problems with state transfer - it’s good to check which node was the donor, even if only to check whether there are any errors on the donor’s side.
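
During SST, following the donor-side progress is as simple as tailing that log in the donor’s data directory (the path below is the default datadir and may differ on your system).

tail -f /var/lib/mysql/innobackup.backup.log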

2015-10-28 16:36:52 10144 [Note] WSREP: Requesting state transfer: success, donor: 1
2015-10-28 16:36:53 10144 [Note] WSREP: 1.0 (172.30.4.156): State transfer to 0.0 (172.30.4.191) complete.
2015-10-28 16:36:53 10144 [Note] WSREP: Member 1.0 (172.30.4.156) synced with group.
WSREP_SST: [INFO] xtrabackup_ist received from donor: Running IST (20151028 16:36:53.159)
WSREP_SST: [INFO] Galera co-ords from recovery: 98ed75de-7c05-11e5-9743-de4abc22bd11:11152 (20151028 16:36:53.162)
WSREP_SST: [INFO] Total time on joiner: 0 seconds (20151028 16:36:53.164)
WSREP_SST: [INFO] Removing the sst_in_progress file (20151028 16:36:53.166)
2015-10-28 16:36:53 10144 [Note] WSREP: SST complete, seqno: 11152
2015-10-28 16:36:53 10144 [Warning] option 'innodb-buffer-pool-instances': signed value -1 adjusted to 0
2015-10-28 16:36:53 10144 [Note] Plugin 'FEDERATED' is disabled.
2015-10-28 16:36:53 10144 [Note] InnoDB: Using atomics to ref count buffer pool pages

...

2015-10-28 16:36:55 10144 [Note] WSREP: Signalling provider to continue.
2015-10-28 16:36:55 10144 [Note] WSREP: Initialized wsrep sidno 1
2015-10-28 16:36:55 10144 [Note] WSREP: SST received: 98ed75de-7c05-11e5-9743-de4abc22bd11:11152
2015-10-28 16:36:55 10144 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.6.26-74.0-56'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  Percona XtraDB Cluster (GPL), Release rel74.0, Revision 624ef81, WSREP version 25.12, wsrep_25.12
2015-10-28 16:36:55 10144 [Note] WSREP: Receiving IST: 20230 writesets, seqnos 11152-31382
2015-10-28 16:40:18 10144 [Note] WSREP: IST received: 98ed75de-7c05-11e5-9743-de4abc22bd11:31382
2015-10-28 16:40:18 10144 [Note] WSREP: 0.0 (172.30.4.191): State transfer from 1.0 (172.30.4.156) complete.

In this section, among others, we can see IST being performed and 20k writesets were transferred from the donor to the joining node.

2015-10-28 16:40:18 10144 [Note] WSREP: Shifting JOINER -> JOINED (TO: 34851)

At this point, the node has applied all writesets sent by the donor and switched to ‘Joined’ state. It’s not yet synced because there are remaining writesets (applied by the cluster while IST was performed) to be applied. This is actually something very important to keep in mind - when a node is a ‘Joiner’, it’s applying writesets as fast as possible, without any interruption to the cluster. Once a node switches to ‘Joined’ state, it may send flow control messages. This is particularly visible when there are lots of writesets to be sent via IST. Let’s say it took 2h to apply them on the joining node. This means it still has to apply 2h worth of traffic - and it can send flow control messages in the meantime. Given that the ‘Joined’ node is significantly lagging behind (2h), it will definitely send flow control - for the duration of the time needed to apply those 2h worth of traffic, the cluster will run with degraded performance.
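
You can watch for this effect on any node by checking the flow control counters, for example:

# wsrep_flow_control_paused shows the fraction of time replication was paused
# since the last FLUSH STATUS; _sent/_recv count flow control messages.
mysql -u root -p -e "SHOW GLOBAL STATUS LIKE 'wsrep_flow_control%';"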

2015-10-28 16:42:30 10144 [Note] WSREP: Member 0.0 (172.30.4.191) synced with group.
2015-10-28 16:42:30 10144 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 36402)
2015-10-28 16:42:31 10144 [Note] WSREP: Synchronized with group, ready for connections
2015-10-28 16:42:31 10144 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.

Finally, the node caught up on the writeset replication and synced with the cluster.

Galera SST process

In the previous part, we covered the IST process from the error log point of view. One more important and frequent process is left - State Snapshot Transfer (SST).

151102 15:39:10 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
151102 15:39:10 mysqld_safe Skipping wsrep-recover for 98ed75de-7c05-11e5-9743-de4abc22bd11:71174 pair
151102 15:39:10 mysqld_safe Assigning 98ed75de-7c05-11e5-9743-de4abc22bd11:71174 to wsrep_start_position

...

2015-11-02 15:39:11 1890 [Note] WSREP: Quorum results:
    version    = 3,
    component  = PRIMARY,
    conf_id    = 10,
    members    = 2/3 (joined/total),
    act_id     = 97896,
    last_appl. = -1,
    protocols  = 0/7/3 (gcs/repl/appl),
    group UUID = 98ed75de-7c05-11e5-9743-de4abc22bd11
2015-11-02 15:39:11 1890 [Note] WSREP: Flow-control interval: [28, 28]
2015-11-02 15:39:11 1890 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 97896)
2015-11-02 15:39:11 1890 [Note] WSREP: State transfer required:
    Group state: 98ed75de-7c05-11e5-9743-de4abc22bd11:97896
    Local state: 98ed75de-7c05-11e5-9743-de4abc22bd11:71174
2015-11-02 15:39:11 1890 [Note] WSREP: New cluster view: global state: 98ed75de-7c05-11e5-9743-de4abc22bd11:97896, view# 11: Primary, number of nodes: 3, my index: 2, protocol version 3
2015-11-02 15:39:11 1890 [Warning] WSREP: Gap in state sequence. Need state transfer.
2015-11-02 15:39:11 1890 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'joiner' --address '172.30.4.191' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '1890''''
WSREP_SST: [INFO] Streaming with xbstream (20151102 15:39:11.812)
WSREP_SST: [INFO] Using socat as streamer (20151102 15:39:11.814)
WSREP_SST: [INFO] Evaluating timeout -k 110 100 socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20151102 15:39:11.908)
2015-11-02 15:39:12 1890 [Note] WSREP: Prepared SST request: xtrabackup-v2|172.30.4.191:4444/xtrabackup_sst//1
2015-11-02 15:39:12 1890 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2015-11-02 15:39:12 1890 [Note] WSREP: REPL Protocols: 7 (3, 2)
2015-11-02 15:39:12 1890 [Note] WSREP: Service thread queue flushed.
2015-11-02 15:39:12 1890 [Note] WSREP: Assign initial position for certification: 97896, protocol version: 3
2015-11-02 15:39:12 1890 [Note] WSREP: Service thread queue flushed.
2015-11-02 15:39:12 1890 [Note] WSREP: Prepared IST receiver, listening at: tcp://172.30.4.191:4568
2015-11-02 15:39:12 1890 [Note] WSREP: Member 2.0 (172.30.4.191) requested state transfer from '*any*'. Selected 0.0 (172.30.4.220)(SYNCED) as donor.
2015-11-02 15:39:12 1890 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 98007)
2015-11-02 15:39:12 1890 [Note] WSREP: Requesting state transfer: success, donor: 0

Until this line, everything looks exactly like it does with Incremental State Transfer (IST). There was a gap in the sequence between the joining node and the rest of the cluster. SST was deemed necessary because the required writeset (71174) wasn’t available in the donor’s gcache.

WSREP_SST: [INFO] Proceeding with SST (20151102 15:39:13.070)
WSREP_SST: [INFO] Evaluating socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20151102 15:39:13.071)
WSREP_SST: [INFO] Cleaning the existing datadir and innodb-data/log directories (20151102 15:39:13.093)
removed ‘/var/lib/mysql/DB187/t1.ibd’
removed ‘/var/lib/mysql/DB187/t2.ibd’
removed ‘/var/lib/mysql/DB187/t3.ibd’

SST basically copies data from a donor to the joining node. For this to work, the current contents of the data directory have to be removed - the SST process takes care of it and you can follow the progress in the error log, as can be seen above.

...

removed directory: ‘/var/lib/mysql/DBx236’
WSREP_SST: [INFO] Waiting for SST streaming to complete! (20151102 15:39:30.349)
2015-11-02 15:42:32 1890 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000000 of size 134217728 bytes
2015-11-02 15:48:56 1890 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000001 of size 134217728 bytes
2015-11-02 15:54:59 1890 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000002 of size 134217728 bytes
2015-11-02 16:00:45 1890 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000003 of size 134217728 bytes
2015-11-02 16:06:21 1890 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000004 of size 134217728 bytes
2015-11-02 16:09:32 1890 [Note] WSREP: 0.0 (172.30.4.220): State transfer to 2.0 (172.30.4.191) complete.
2015-11-02 16:09:32 1890 [Note] WSREP: Member 0.0 (172.30.4.220) synced with group.

The SST process takes some time, depending on the time needed to copy the data. For the duration of the SST, the joining node caches writesets in gcache and, if the in-memory gcache is not large enough, on-disk gcache files are used. Please note that in the last two lines we can see information about the donor finishing the SST and getting back in sync with the rest of the cluster.

WSREP_SST: [INFO] Preparing the backup at /var/lib/mysql//.sst (20151102 16:09:32.187)
WSREP_SST: [INFO] Evaluating innobackupex --no-version-check  --apply-log $rebuildcmd ${DATA} &>${DATA}/innobackup.prepare.log (20151102 16:09:32.188)
2015-11-02 16:12:39 1890 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000005 of size 134217728 bytes
2015-11-02 16:20:52 1890 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000006 of size 134217728 bytes

In this case, the joiner uses xtrabackup as the SST method, therefore the ‘prepare’ phase has to be completed.

WSREP_SST: [INFO] Moving the backup to /var/lib/mysql/ (20151102 16:26:01.062)
WSREP_SST: [INFO] Evaluating innobackupex --defaults-file=/etc/mysql/my.cnf  --defaults-group=mysqld --no-version-check   --move-back --force-non-empty-directories ${DATA} &>${DATA}/innobackup.move.log (20151102 16:26:01.064)
WSREP_SST: [INFO] Move successful, removing /var/lib/mysql//.sst (20151102 16:26:14.600)

The last phase is to copy the files back from the temporary directory to the MySQL data directory. The rest of the process is similar to what we discussed earlier. The node initializes all plugins and InnoDB, and MySQL gets ready to handle connections. Once that’s done, it applies the remaining writesets stored in gcache.

2015-11-02 16:27:06 1890 [Note] WSREP: Deleted page /var/lib/mysql/gcache.page.000000
2015-11-02 16:27:08 1890 [Note] WSREP: Deleted page /var/lib/mysql/gcache.page.000001
2015-11-02 16:27:11 1890 [Note] WSREP: Deleted page /var/lib/mysql/gcache.page.000002
2015-11-02 16:27:15 1890 [Note] WSREP: Deleted page /var/lib/mysql/gcache.page.000003
2015-11-02 16:28:11 1890 [Note] WSREP: Deleted page /var/lib/mysql/gcache.page.000004
2015-11-02 16:28:56 1890 [Note] WSREP: Deleted page /var/lib/mysql/gcache.page.000005
2015-11-02 16:29:17 1890 [Note] WSREP: Deleted page /var/lib/mysql/gcache.page.000006
2015-11-02 16:29:24 1890 [Note] WSREP: Member 2.0 (172.30.4.191) synced with group.
2015-11-02 16:29:24 1890 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 214214)
2015-11-02 16:29:24 1890 [Note] WSREP: Synchronized with group, ready for connections
2015-11-02 16:29:24 1890 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.

After all writesets were applied, on-disk gcache files are deleted and the node finally syncs up with the cluster.
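
The gcache.page.* files created and deleted above are a sign that the in-memory gcache filled up while the SST was running. If long SSTs are common in your environment, you may want to size the gcache accordingly - a hedged example of what this could look like in my.cnf (the values are placeholders, to be tuned to your write volume and available disk space; note that wsrep_provider_options is a single semicolon-separated string, so append to any options you already set):

# /etc/mysql/my.cnf - illustrative values only
[mysqld]
wsrep_provider_options="gcache.size=2G;gcache.page_size=128M"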

Xtrabackup logs for SST

The SST process (if you use xtrabackup as the SST method), as you may have noticed, consists of three stages - streaming data to the joiner, applying logs on the joiner, and moving data from the temporary directory to its final location in the MySQL data directory. To have a complete view of the process, we need to mention some additional log files related to those three stages of the SST. You can find them in the MySQL data directory or in the temporary directory Galera uses to move the data to the joiner.

The first of them (in order of appearance) is innobackup.backup.log - a file created on the donor host which contains all the logs from the xtrabackup process executed to stream the data from the donor to the joiner. This file can be very useful in dealing with any type of SST issue. In the next blog post, we will show how you can use it to identify the problem.

Another file, located either in the MySQL data directory (e.g., /var/lib/mysql) or in the .sst temporary subdirectory (e.g., /var/lib/mysql/.sst), is innobackup.prepare.log. This file contains logs related to the ‘apply log’ phase. In this stage xtrabackup applies transaction logs to the tablespaces (ibd files) copied from the donor. It may happen that the InnoDB data is corrupted and, even though it was possible to stream the data from the donor to the joiner, it’s not possible to apply transactions from the redo log. Such errors will be logged in this file.

The last one is innobackup.move.log - it contains logs of the ‘move’ stage, in which xtrabackup moves files from the temporary (.sst) directory to their final location in the MySQL data directory.
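
When an SST fails, it is usually fastest to glance at all three files at once. A simple sketch, assuming the default /var/lib/mysql data directory:

# On the donor - the streaming (backup) phase
tail -n 50 /var/lib/mysql/innobackup.backup.log

# On the joiner - the prepare and move phases (the .sst subdirectory may still exist after a failure)
tail -n 50 /var/lib/mysql/innobackup.prepare.log /var/lib/mysql/innobackup.move.log
ls -la /var/lib/mysql/.sst/ 2>/dev/null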

In the next post, we’ll go through real life examples of errors which may impact your Galera Cluster. We will cover a wide range of issues that you can experience in production, and how to deal with them. For that, it is important to know what exactly you should expect in the logs in a ‘normal’ system state - now that you know what normal looks like, it will be much easier to understand what’s not normal :-)

ClusterControl Tips & Tricks for MySQL: Max Open Files


Requires ClusterControl 1.2.11 or later. Applies to MySQL single instances, replication setups and Galera clusters.

You have created a large database with thousands of tables (> 5000 in MySQL 5.6). Then you want to create a backup using xtrabackup. Or, if it is a Galera cluster, you have to recover a Galera node using wsrep_sst_method=xtrabackup[-v2].

Unfortunately it fails, and the following is emitted in the Job Logs:

xtrabackup: Generating a list of tablespaces
2015-11-03 19:36:02 7fdef130a780  InnoDB: Operating system error number 24 in a file operation.
InnoDB: Error number 24 means 'Too many open files'.
InnoDB: Some operating system error numbers are described at
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/operating-system-error-codes.html
InnoDB: Error: could not open single-table tablespace file ./DB75/t69.ibd

In this case you can simply increase the open_files_limit of the MySQL server(s) by going to Manage > Configurations and clicking on Change Parameter.

We want to make the change on all MySQL hosts and add the open_files_limit to the ‘MYSQLD’ group. We then have to type ‘open_files_limit’ into the “Parameter” field, since the parameter has not yet been set in the my.cnf file of the selected servers. Then it is simply a matter of setting the New Value to something appropriate. We have > 80000 tables in the database, so we’ll set the new value to 100000.

Next, press Proceed, and then you will be presented with the Config Change Log dialog:

The parameter change was successful, and the next step is to restart the MySQL servers as indicated in the dialogs. Please note that many operating systems impose an upper limit on how many open files can be set on a process.
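
As a rough illustration of that OS-level limit, you can verify what the running mysqld process is actually allowed to open, and what MySQL ended up with after the restart. The systemd override shown is an assumption - the unit name and the preferred mechanism differ between distributions:

# Limit imposed by the OS on the running mysqld process
cat /proc/$(pidof mysqld)/limits | grep 'open files'

# What MySQL itself reports after the restart
mysql -u root -p -e "SHOW GLOBAL VARIABLES LIKE 'open_files_limit';"

# Example for systemd-based systems (unit name may be mysql or mysqld):
# /etc/systemd/system/mysql.service.d/limits.conf
#   [Service]
#   LimitNOFILE=100000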

PS.: To get started with ClusterControl, click here!

Become a MySQL DBA blog series - Troubleshooting Galera cluster issues - part 1


In this blog post, we are going to show you some examples of things that can go wrong in Galera - inexplicable node crashes, network splits, clusters that won’t restart and inconsistent data. We’ll take a look at the data stored in log files to diagnose the problems, and discuss how we can deal with these.

This builds upon the previous post, where we looked into the log files produced by Galera (error log and innobackup.* logs). We discussed what regular, “normal” activity looks like - from the initialization of Galera replication to the Incremental State Transfer (IST) and State Snapshot Transfer (SST) processes, and the respective log entries.

This is the seventeenth installment in the ‘Become a MySQL DBA’ blog series.

GEN_CLUST_INDEX issue

This is an example of an issue related to the lack of a Primary Key (PK) in a table. Theoretically speaking, Galera supports InnoDB tables without a PK. The lack of a PK induces some limitations - for example, DELETEs are not supported on such tables. Row order can also differ between nodes. We’ve found that the lack of a PK can also be the culprit of serious crashes in Galera. It shouldn’t happen, but that’s how things are. Below is a sample of the error log covering such a crash.

2015-04-29 11:49:38 49655 [Note] WSREP: Synchronized with group, ready for connections
2015-04-29 11:49:38 49655 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2015-04-29 11:49:39 7ff1285b5700 InnoDB: Error: page 409 log sequence number 207988009678
InnoDB: is in the future! Current system log sequence number 207818416901.
InnoDB: Your database may be corrupt or you may have copied the InnoDB
InnoDB: tablespace but not the InnoDB log files. See
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: for more information.
BF-BF X lock conflict
RECORD LOCKS space id 1288 page no 33 n bits 104 index `GEN_CLUST_INDEX` of table `db`.`tab` trx id 4670189 lock_mode X locks rec but not gap

Here’s the hint we are looking for: ‘GEN_CLUST_INDEX’. InnoDB requires a PK to be available - it uses the PK as a clustered index, which is the main component of a table’s structure. No PK - no way to organize data within a table. If there’s no explicit PK defined on a table, InnoDB looks for UNIQUE keys to use for the clustered index. If neither a PK nor a UNIQUE key is available, an implicit PK is created - it’s referred to by InnoDB as ‘GEN_CLUST_INDEX’. If you see an entry like the one above, you can be sure that this particular table (db.tab) doesn’t have a PK defined.

In this particular case, we can see that some kind of lock conflict happened.

Record lock, heap no 2 PHYSICAL RECORD: n_fields 18; compact format; info bits 0
 0: len 6; hex 00000c8517c5; asc       ;;
 1: len 6; hex 0000004742ec; asc    GB ;;
 2: len 7; hex 5f0000026d10a6; asc _   m  ;;
 3: len 4; hex 32323237; asc 2227;;

...

 15: len 30; hex 504c4154454c4554204147475245474154494f4e20494e48494249544f52; asc xxx    ; (total 50 bytes);
 16: len 30; hex 414e5449504c4154454c4554204452554753202020202020202020202020; asc xxx    ; (total 100 bytes);
 17: len 1; hex 32; asc 2;;

In the section above you can find information about the record involved in the locking conflict. We obfuscated the data, but normally this info may point you to a particular row in the table.

15:49:39 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

key_buffer_size=25165824
read_buffer_size=131072
max_used_connections=13
max_threads=200
thread_count=12
connection_count=12
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 104208 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7fed4c000990
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7ff10037bd98 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x8ea2e5]
/usr/sbin/mysqld(handle_fatal_signal+0x4a4)[0x6788f4]
/lib64/libpthread.so.0[0x391380f710]
/lib64/libc.so.6(gsignal+0x35)[0x3913432625]
/lib64/libc.so.6(abort+0x175)[0x3913433e05]
/usr/sbin/mysqld[0x9c262e]
/usr/sbin/mysqld[0x9c691b]
/usr/sbin/mysqld[0x9c6c44]
/usr/sbin/mysqld[0xa30ddf]
/usr/sbin/mysqld[0xa36637]
/usr/sbin/mysqld[0x997267]
/usr/sbin/mysqld(_ZN7handler11ha_rnd_nextEPh+0x9c)[0x5ba3bc]
/usr/sbin/mysqld(_ZN14Rows_log_event24do_table_scan_and_updateEPK14Relay_log_info+0x188)[0x883378]
/usr/sbin/mysqld(_ZN14Rows_log_event14do_apply_eventEPK14Relay_log_info+0xd77)[0x88e127]
/usr/sbin/mysqld(_ZN9Log_event11apply_eventEP14Relay_log_info+0x68)[0x88f458]
/usr/sbin/mysqld(_Z14wsrep_apply_cbPvPKvmjPK14wsrep_trx_meta+0x58e)[0x5b6fee]
/usr/lib64/galera/libgalera_smm.so(_ZNK6galera9TrxHandle5applyEPvPF15wsrep_cb_statusS1_PKvmjPK14wsrep_trx_metaERS6_+0xb1)[0x7ff1288ad2c1]
/usr/lib64/galera/libgalera_smm.so(+0x1aaf95)[0x7ff1288e4f95]
/usr/lib64/galera/libgalera_smm.so(_ZN6galera13ReplicatorSMM9apply_trxEPvPNS_9TrxHandleE+0x283)[0x7ff1288e5e03]
/usr/lib64/galera/libgalera_smm.so(_ZN6galera13ReplicatorSMM11process_trxEPvPNS_9TrxHandleE+0x45)[0x7ff1288e66f5]
/usr/lib64/galera/libgalera_smm.so(_ZN6galera15GcsActionSource8dispatchEPvRK10gcs_actionRb+0x2c9)[0x7ff1288c3349]
/usr/lib64/galera/libgalera_smm.so(_ZN6galera15GcsActionSource7processEPvRb+0x63)[0x7ff1288c3823]
/usr/lib64/galera/libgalera_smm.so(_ZN6galera13ReplicatorSMM10async_recvEPv+0x93)[0x7ff1288e23f3]
/usr/lib64/galera/libgalera_smm.so(galera_recv+0x23)[0x7ff1288f7743]
/usr/sbin/mysqld[0x5b7caf]
/usr/sbin/mysqld(start_wsrep_THD+0x3e6)[0x5a8d96]
/lib64/libpthread.so.0[0x39138079d1]
/lib64/libc.so.6(clone+0x6d)[0x39134e88fd]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): is an invalid pointer
Connection ID (thread ID): 3
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
150429 11:49:39 mysqld_safe Number of processes running now: 0
150429 11:49:39 mysqld_safe WSREP: not restarting wsrep node automatically
150429 11:49:39 mysqld_safe mysqld from pid file /var/lib/mysql/hostname.pid ended

Finally, we have a pretty standard MySQL crash report along with a stack trace. If you look closely, you’ll see a pretty important bit of information - the stack trace contains entries about ‘libgalera_smm.so’. This points us to the Galera library being involved in this crash. As you can see, we are talking here about a node which is applying an event executed first on some other member of the cluster (_ZN9Log_event11apply_eventEP14Relay_log_info). This event involves SQL performing a table scan:

_ZN14Rows_log_event24do_table_scan_and_updateEPK14Relay_log_info
_ZN7handler11ha_rnd_nextEPh

Apparently, while scanning our ‘db.tab’ table, something went wrong and MySQL crashed. From our experience, in such cases it’s enough to create a PK on the table - we don’t recall issues which persisted after that schema change. In general, if you don’t have a good candidate for a PK on the table, the best way to solve such a problem is to create a new integer column (small/big - depending on the maximum expected table size), auto-incremented, unsigned and not null. Such a column will work great as a PK and it should prevent this type of error.
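
As an illustration only (the column name is made up, and in Galera the ALTER is replicated to all nodes through TOI, locking the table for its duration), such a change and a check for other PK-less tables could look like this:

# Add an auto-incremented, unsigned, NOT NULL PK to the problematic table
mysql -u root -p -e "ALTER TABLE db.tab
  ADD COLUMN id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;"

# Find other tables which still lack an explicit PK or UNIQUE key
mysql -u root -p -e "SELECT t.table_schema, t.table_name
  FROM information_schema.tables t
  LEFT JOIN information_schema.table_constraints c
    ON t.table_schema = c.table_schema AND t.table_name = c.table_name
   AND c.constraint_type IN ('PRIMARY KEY', 'UNIQUE')
  WHERE t.table_type = 'BASE TABLE'
    AND t.table_schema NOT IN ('mysql', 'information_schema', 'performance_schema')
    AND c.constraint_name IS NULL;"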

Please keep in mind that for tables without a PK, if you execute simultaneous writes on multiple Galera nodes, rows may end up ordered differently - this is because the implicit PK is based on the internal row ID, which increases monotonically as new rows are inserted. So, everything depends on the write concurrency - the higher it is, the more likely it is that rows were inserted in a different order on different nodes. If this happened, when you add a Primary Key you may end up with a table that is not consistent across the cluster. There’s no easy workaround here. The easiest way (but also the most intrusive) is to set one node in the cluster as the donor for the rest and then force SST on the remaining nodes - this will rebuild the other nodes from a single source of truth. Another option is to use pt-table-sync, a script which is part of the Percona Toolkit. It will allow you to sync a chosen table across the cluster. Of course, before you perform any kind of action, it’s always good to use another Percona Toolkit script (pt-table-checksum) to check whether the given table is in sync across the cluster or not. If it is in sync (because you write only to a single node, or write concurrency is low and the issue hasn’t been triggered), there’s no need to do anything.
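
A minimal sketch of how such a check and sync could look with the Percona Toolkit - host addresses, credentials and table names below are placeholders, and it is always worth previewing the changes with --print before letting pt-table-sync execute anything:

# Check whether db.tab is consistent across the cluster (run against one node)
pt-table-checksum h=172.30.4.156,u=checksum_user,p=checksum_pass \
  --recursion-method=cluster --databases=db --tables=tab

# Preview the statements needed to bring a second node in line with the first...
pt-table-sync --print h=172.30.4.156,D=db,t=tab h=172.30.4.191,D=db,t=tab

# ...and only then apply them
pt-table-sync --execute h=172.30.4.156,D=db,t=tab h=172.30.4.191,D=db,t=tab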

Network split

Galera Cluster requires a quorum to operate - 50%+1 of the nodes have to be up in order to form a ‘Primary Component’. This mechanism is designed to ensure that a split brain won’t happen and that your application does not end up talking to two separate, disconnected parts of the same cluster. Obviously, that would be a very bad situation to be in - therefore we are happy to have the protection Galera gives us. We are going to cover a real-world example of a network split and why we have to be cautious when dealing with such a scenario.

150529  9:30:05 [Note] WSREP: evs::proto(a1024858, OPERATIONAL, view_id(REG,428eb82c,1111)) suspecting node: 428eb82c
150529  9:30:05 [Note] WSREP: evs::proto(a1024858, OPERATIONAL, view_id(REG,428eb82c,1111)) suspected node without join message, declaring inactive
150529  9:30:05 [Note] WSREP: evs::proto(a1024858, OPERATIONAL, view_id(REG,428eb82c,1111)) suspecting node: d272b968
150529  9:30:05 [Note] WSREP: evs::proto(a1024858, OPERATIONAL, view_id(REG,428eb82c,1111)) suspected node without join message, declaring inactive

Two nodes (with IDs 428eb82c and d272b968) were declared inactive and left the cluster. Those nodes were, in fact, still running - the problem was related only to network connectivity.

150529  9:30:05 [Note] WSREP: view(view_id(NON_PRIM,428eb82c,1111) memb {
        60bbf616,1
        a1024858,1
} joined {
} left {
} partitioned {
        428eb82c,0
        d272b968,0
})

This cluster consists of four nodes, so with only two members left it switched to a Non-Primary state, as expected - fewer than 50%+1 of the nodes were available.

150529  9:30:05 [Note] WSREP: declaring 60bbf616 at tcp://10.87.84.101:4567 stable
150529  9:30:05 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 1, memb_num = 2
150529  9:30:05 [Note] WSREP: Flow-control interval: [23, 23]
150529  9:30:05 [Note] WSREP: Received NON-PRIMARY.
150529  9:30:05 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 5351819)
150529  9:30:05 [Note] WSREP: New cluster view: global state: 8f630ade-4366-11e4-94c6-d6eb7adc0ddd:5351819, view# -1: non-Primary, number of nodes: 2, my index: 1, protocol version 3
150529  9:30:05 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
150529  9:30:05 [Note] WSREP: view(view_id(NON_PRIM,60bbf616,1112) memb {
        60bbf616,1
        a1024858,1
} joined {
} left {
} partitioned {
        428eb82c,0
        d272b968,0
})
150529  9:30:05 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 1, memb_num = 2
150529  9:30:05 [Note] WSREP: Flow-control interval: [23, 23]
150529  9:30:05 [Note] WSREP: Received NON-PRIMARY.
150529  9:30:05 [Note] WSREP: New cluster view: global state: 8f630ade-4366-11e4-94c6-d6eb7adc0ddd:5351819, view# -1: non-Primary, number of nodes: 2, my index: 1, protocol version 3

...

150529  9:30:20 [Note] WSREP: (a1024858, 'tcp://0.0.0.0:4567') reconnecting to 428eb82c (tcp://172.16.14.227:4567), attempt 30
150529  9:30:20 [Note] WSREP: (a1024858, 'tcp://0.0.0.0:4567') address 'tcp://10.87.84.102:4567' pointing to uuid a1024858 is blacklisted, skipping
150529  9:30:21 [Note] WSREP: (a1024858, 'tcp://0.0.0.0:4567') reconnecting to d272b968 (tcp://172.16.14.226:4567), attempt 30

Above you can see a couple of failed attempts to bring the cluster back in sync. When a cluster is in the ‘non-Primary’ state, it won’t re-form the Primary Component unless all of the nodes become available again or one of the nodes is bootstrapped.
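
To see where a given node stands during such an event, the wsrep status counters are the first thing to check. And if you have verified that the partition you are connected to is the one that should survive (the other nodes are gone for good), the Primary Component can be re-enabled by hand - use the second command with extreme care, and only on one side of the split:

# Am I part of the Primary Component, and how many nodes can I see?
mysql -u root -p -e "SHOW GLOBAL STATUS WHERE Variable_name IN
  ('wsrep_cluster_status', 'wsrep_cluster_size', 'wsrep_local_state_comment');"

# Promote this partition to Primary (only if the other partition will not come back)
mysql -u root -p -e "SET GLOBAL wsrep_provider_options='pc.bootstrap=YES';"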

Bootstrapping from incorrect node

Sometimes bootstrapping the cluster from one of the nodes is the only quick way to bring it up - usually you don’t have time to wait until all nodes become available again. Actually, you could wait forever if one of the nodes is down due to a hardware failure or data corruption - in such cases bootstrapping the cluster is the only way to bring it back. What’s very important to keep in mind is that a cluster has to be bootstrapped from the most advanced node. Sometimes it may happen that one or more of the nodes are behind in applying writesets. In such cases you need to confirm which of the nodes is the most advanced one by running, for example, mysqld_safe --wsrep_recover on a stopped node. Failing to do so may result in troubles like the ones shown in the error log below:

151104 16:52:19 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
151104 16:52:19 mysqld_safe Skipping wsrep-recover for 98ed75de-7c05-11e5-9743-de4abc22bd11:235368 pair
151104 16:52:19 mysqld_safe Assigning 98ed75de-7c05-11e5-9743-de4abc22bd11:235368 to wsrep_start_position

...

2015-11-04 16:52:19 18033 [Note] WSREP: gcomm: connecting to group 'my_wsrep_cluster', peer '172.30.4.156:4567,172.30.4.191:4567,172.30.4.220:4567'

What we have here is a standard startup process for Galera. Please note that the cluster consists of three nodes.

2015-11-04 16:52:19 18033 [Warning] WSREP: (6964fefd, 'tcp://0.0.0.0:4567') address 'tcp://172.30.4.220:4567' points to own listening address, blacklisting
2015-11-04 16:52:19 18033 [Note] WSREP: (6964fefd, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2015-11-04 16:52:20 18033 [Note] WSREP: declaring 5342d84a at tcp://172.30.4.191:4567 stable
2015-11-04 16:52:20 18033 [Note] WSREP: Node 5342d84a state prim
2015-11-04 16:52:20 18033 [Note] WSREP: view(view_id(PRIM,5342d84a,2) memb {
    5342d84a,0
    6964fefd,0
} joined {
} left {
} partitioned {
})
2015-11-04 16:52:20 18033 [Note] WSREP: save pc into disk
2015-11-04 16:52:20 18033 [Note] WSREP: discarding pending addr without UUID: tcp://172.30.4.156:4567
2015-11-04 16:52:20 18033 [Note] WSREP: gcomm: connected
2015-11-04 16:52:20 18033 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2015-11-04 16:52:20 18033 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2015-11-04 16:52:20 18033 [Note] WSREP: Opened channel 'my_wsrep_cluster'
2015-11-04 16:52:20 18033 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
2015-11-04 16:52:20 18033 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2015-11-04 16:52:20 18033 [Note] WSREP: Waiting for SST to complete.
2015-11-04 16:52:20 18033 [Note] WSREP: STATE EXCHANGE: sent state msg: 69b62fe8-8314-11e5-a6d4-fae0f5e6c643
2015-11-04 16:52:20 18033 [Note] WSREP: STATE EXCHANGE: got state msg: 69b62fe8-8314-11e5-a6d4-fae0f5e6c643 from 0 (172.30.4.191)
2015-11-04 16:52:20 18033 [Note] WSREP: STATE EXCHANGE: got state msg: 69b62fe8-8314-11e5-a6d4-fae0f5e6c643 from 1 (172.30.4.220)

So far so good, nodes are communicating with each other.

2015-11-04 16:52:20 18033 [ERROR] WSREP: gcs/src/gcs_group.cpp:group_post_state_exchange():319: Reversing history: 0 -> 0, this member has applied 140700504171408 more events than the primary component.Data loss is possible. Aborting.
2015-11-04 16:52:20 18033 [Note] WSREP: /usr/sbin/mysqld: Terminated.
Aborted (core dumped)
151104 16:52:20 mysqld_safe mysqld from pid file /var/lib/mysql/mysql.pid ended

Unfortunately, one of the nodes was more advanced than the node we bootstrapped the cluster from. This insane number of events is not really valid, as you can tell. It’s true, though, that this node cannot join the cluster. The only way to do it is by running SST. But in such a case you are going to lose some of the data which wasn’t applied on the node you bootstrapped the cluster from. The best practice is to try and recover this data from the ‘more advanced’ node - hopefully there are binlogs available. If not, the process is still possible but comparing the data on a live system is not an easy task.
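
A hedged sketch of how to avoid this situation and, if it already happened, how to salvage the missing transactions - file names, binlog positions and log paths below are placeholders:

# Before bootstrapping - compare positions of the stopped nodes and pick the highest seqno
cat /var/lib/mysql/grastate.dat
# If seqno is -1 (unclean shutdown), recover the real position first
mysqld_safe --wsrep_recover
grep 'WSREP: Recovered position' /var/log/mysql/error.log | tail -n 1

# If you bootstrapped from the wrong node and binary logs were enabled on the more advanced one,
# inspect the transactions the rest of the cluster is missing...
mysqlbinlog --base64-output=decode-rows --verbose \
  --start-position=107 /var/lib/mysql/binlog.000042 | less

# ...and, after careful review, replay the raw events against the bootstrapped cluster
mysqlbinlog --start-position=107 /var/lib/mysql/binlog.000042 | mysql -u root -p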

Consistency issues

Within a cluster, all Galera nodes should be consistent. No matter where you write, the change should be replicated across all nodes. For that, as the base method of moving data around, Galera uses row based replication - this is the best way of replicating data while maintaining consistency of the data set. All nodes are consistent - this is true most of the time. Sometimes, though, things do not work as they are expected to. Even with Galera, nodes may get out of sync. There are numerous reasons for that - from node crashes which result in data loss, to human errors while tinkering with replication or the ‘wsrep_on’ variable. What is important - Galera has an internal mechanism of maintaining consistency. If it detects an error while applying a writeset, a node will shut down to ensure that the inconsistency won’t propagate, for example through SST.

This is how things will look in the error log.

2015-11-12 15:35:59 5071 [ERROR] Slave SQL: Could not execute Update_rows event on table t1.tab; Can't find record in 'tab', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 162, Error_code: 1032

If you are familiar with row based replication, you may have seen errors similar to the one above. This is information from MySQL that one of the events could not be executed. In this particular case, some update failed to apply.

2015-11-12 15:35:59 5071 [Warning] WSREP: RBR event 3 Update_rows apply warning: 120, 563464
2015-11-12 15:35:59 5071 [Warning] WSREP: Failed to apply app buffer: seqno: 563464, status: 1
     at galera/src/trx_handle.cpp:apply():351

Here we can see the Galera part of the information - the writeset with sequence number 563464 couldn’t be applied.

Retrying 2th time
2015-11-12 15:35:59 5071 [ERROR] Slave SQL: Could not execute Update_rows event on table t1.tab; Can't find record in 'tab', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 162, Error_code: 1032
2015-11-12 15:35:59 5071 [Warning] WSREP: RBR event 3 Update_rows apply warning: 120, 563464
2015-11-12 15:35:59 5071 [Warning] WSREP: Failed to apply app buffer: seqno: 563464, status: 1
     at galera/src/trx_handle.cpp:apply():351
Retrying 3th time
2015-11-12 15:35:59 5071 [ERROR] Slave SQL: Could not execute Update_rows event on table t1.tab; Can't find record in 'tab', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 162, Error_code: 1032
2015-11-12 15:35:59 5071 [Warning] WSREP: RBR event 3 Update_rows apply warning: 120, 563464
2015-11-12 15:35:59 5071 [Warning] WSREP: Failed to apply app buffer: seqno: 563464, status: 1
     at galera/src/trx_handle.cpp:apply():351
Retrying 4th time
2015-11-12 15:35:59 5071 [ERROR] Slave SQL: Could not execute Update_rows event on table t1.tab; Can't find record in 'tab', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 162, Error_code: 1032
2015-11-12 15:35:59 5071 [Warning] WSREP: RBR event 3 Update_rows apply warning: 120, 563464

To rule out transient errors (like deadlocks, for example), the writeset was re-executed four times. It was a permanent error this time, so all attempts failed.

2015-11-12 15:35:59 5071 [ERROR] WSREP: Failed to apply trx: source: 7c775f39-893d-11e5-93b5-0b0d185bde79 version: 3 local: 0 state: APPLYING flags: 1 conn_id: 5 trx_id: 2779411 seqnos (l: 12, g: 563464, s: 563463, d: 563463, ts: 11103845835944)
2015-11-12 15:35:59 5071 [ERROR] WSREP: Failed to apply trx 563464 4 times
2015-11-12 15:35:59 5071 [ERROR] WSREP: Node consistency compromized, aborting…

Having failed to apply the writeset, the node was deemed inconsistent with the rest of the cluster and Galera took the action intended to minimize the impact of this inconsistency - the node is shut down and it won’t be able to join the cluster as long as this writeset remains unapplied.

2015-11-12 15:35:59 5071 [Note] WSREP: Closing send monitor...
2015-11-12 15:35:59 5071 [Note] WSREP: Closed send monitor.
2015-11-12 15:35:59 5071 [Note] WSREP: gcomm: terminating thread
2015-11-12 15:35:59 5071 [Note] WSREP: gcomm: joining thread
2015-11-12 15:35:59 5071 [Note] WSREP: gcomm: closing backend
2015-11-12 15:35:59 5071 [Note] WSREP: view(view_id(NON_PRIM,7c775f39,3) memb {
    8165505b,0
} joined {
} left {
} partitioned {
    7c775f39,0
    821adb36,0
})
2015-11-12 15:35:59 5071 [Note] WSREP: view((empty))
2015-11-12 15:35:59 5071 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2015-11-12 15:35:59 5071 [Note] WSREP: gcomm: closed
2015-11-12 15:35:59 5071 [Note] WSREP: Flow-control interval: [16, 16]
2015-11-12 15:35:59 5071 [Note] WSREP: Received NON-PRIMARY.
2015-11-12 15:35:59 5071 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 563464)
2015-11-12 15:35:59 5071 [Note] WSREP: Received self-leave message.
2015-11-12 15:35:59 5071 [Note] WSREP: Flow-control interval: [0, 0]
2015-11-12 15:35:59 5071 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2015-11-12 15:35:59 5071 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 563464)
2015-11-12 15:35:59 5071 [Note] WSREP: RECV thread exiting 0: Success
2015-11-12 15:35:59 5071 [Note] WSREP: recv_thread() joined.
2015-11-12 15:35:59 5071 [Note] WSREP: Closing replication queue.
2015-11-12 15:35:59 5071 [Note] WSREP: Closing slave action queue.
2015-11-12 15:35:59 5071 [Note] WSREP: /usr/sbin/mysqld: Terminated.
Aborted (core dumped)
151112 15:35:59 mysqld_safe Number of processes running now: 0
151112 15:35:59 mysqld_safe WSREP: not restarting wsrep node automatically
151112 15:35:59 mysqld_safe mysqld from pid file /var/lib/mysql/mysql.pid ended

What follows is the regular MySQL shutdown process.
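
Once the root cause of the inconsistency is understood, the usual way to bring such a node back is to let it rebuild itself from a donor. A minimal sketch, assuming default paths and a systemd-managed service (the service name, init system and whether you also need to clear the data directory may differ in your setup):

# On the aborted node - force a full rebuild on the next start
systemctl stop mysql                      # skip if mysqld is already down
mv /var/lib/mysql/grastate.dat /var/lib/mysql/grastate.dat.bak
systemctl start mysql                     # the node rejoins and requests state transfer from a donor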

This concludes the first part of our troubleshooting blog. In the second part of the blog, we will cover different types of SST issues, as well as problems with network streaming.

Become a MySQL DBA blog series - Troubleshooting Galera cluster issues - part 2


This is part 2 of our blog on how to troubleshoot a Galera cluster - SST errors and problems with network streaming. In part 1 of the blog, we covered issues ranging from node crashes to clusters that won’t restart, network splits and inconsistent data. Note that the issues described are all examples inspired by real life incidents in production environments.

This is the eighteenth installment in the ‘Become a MySQL DBA’ blog series.

SST errors

State Snapshot Transfer (SST) is a mechanism intended to provision or rebuild Galera nodes. There are a couple of ways you can execute SST, including mysqldump and rsync. The most popular one, though, is to perform SST using xtrabackup - it allows the donor to stay online and makes the SST process less intrusive. The xtrabackup SST process involves streaming data over the network from the donor to the joining node. Additionally, xtrabackup needs access to MySQL to grab the InnoDB log data - it checks the ‘wsrep_sst_auth’ variable for access credentials. In this chapter we are going to cover examples of problems related to SST.

Incorrect password for SST

Sometimes it may happen that the credentials for MySQL access, set in wsrep_sst_auth, are not correct. In such a case SST won’t complete properly because xtrabackup, on the donor, will not be able to connect to the MySQL database to perform a backup. You may then see the following symptoms in the error log.

Let’s start with a joining node.

2015-11-13 10:40:57 30484 [Note] WSREP: Quorum results:
        version    = 3,
        component  = PRIMARY,
        conf_id    = 16,
        members    = 2/3 (joined/total),
        act_id     = 563464,
        last_appl. = -1,
        protocols  = 0/7/3 (gcs/repl/appl),
        group UUID = 98ed75de-7c05-11e5-9743-de4abc22bd11
2015-11-13 10:40:57 30484 [Note] WSREP: Flow-control interval: [28, 28]
2015-11-13 10:40:57 30484 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 563464)
2015-11-13 10:40:57 30484 [Note] WSREP: State transfer required:
        Group state: 98ed75de-7c05-11e5-9743-de4abc22bd11:563464
        Local state: 00000000-0000-0000-0000-000000000000:-1

So far we have the standard SST process - a node joined the cluster and state transfer was deemed necessary. 

...

2015-11-13 10:40:57 30484 [Warning] WSREP: Gap in state sequence. Need state transfer.
2015-11-13 10:40:57 30484 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'joiner' --address '172.30.4.220' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --
defaults-group-suffix '' --parent '30484''''
WSREP_SST: [INFO] Streaming with xbstream (20151113 10:40:57.975)
WSREP_SST: [INFO] Using socat as streamer (20151113 10:40:57.977)
WSREP_SST: [INFO] Stale sst_in_progress file: /var/lib/mysql//sst_in_progress (20151113 10:40:57.980)
WSREP_SST: [INFO] Evaluating timeout -k 110 100 socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20151113 10:40:58.005)
2015-11-13 10:40:58 30484 [Note] WSREP: Prepared SST request: xtrabackup-v2|172.30.4.220:4444/xtrabackup_sst//1

Xtrabackup started.

2015-11-13 10:40:58 30484 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2015-11-13 10:40:58 30484 [Note] WSREP: REPL Protocols: 7 (3, 2)
2015-11-13 10:40:58 30484 [Note] WSREP: Service thread queue flushed.
2015-11-13 10:40:58 30484 [Note] WSREP: Assign initial position for certification: 563464, protocol version: 3
2015-11-13 10:40:58 30484 [Note] WSREP: Service thread queue flushed.
2015-11-13 10:40:58 30484 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (98ed75de-7c05-11e5-9743-de4abc22bd11): 1 (Operation not permitted)
         at galera/src/replicator_str.cpp:prepare_for_IST():482. IST will be unavailable.
2015-11-13 10:40:58 30484 [Note] WSREP: Member 0.0 (172.30.4.220) requested state transfer from '*any*'. Selected 1.0 (172.30.4.156)(SYNCED) as donor.
2015-11-13 10:40:58 30484 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 563464)

IST was not available due to the state of the node (in this particular case, the MySQL data directory was not available).

2015-11-13 10:40:58 30484 [Note] WSREP: Requesting state transfer: success, donor: 1
WSREP_SST: [INFO] WARNING: Stale temporary SST directory: /var/lib/mysql//.sst from previous state transfer. Removing (20151113 10:40:58.430)
WSREP_SST: [INFO] Proceeding with SST (20151113 10:40:58.434)
WSREP_SST: [INFO] Evaluating socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20151113 10:40:58.435)
WSREP_SST: [INFO] Cleaning the existing datadir and innodb-data/log directories (20151113 10:40:58.436)
removed ‘/var/lib/mysql/ibdata1’
removed ‘/var/lib/mysql/ib_logfile1’
removed ‘/var/lib/mysql/ib_logfile0’
removed ‘/var/lib/mysql/auto.cnf’
removed ‘/var/lib/mysql/mysql.sock’
WSREP_SST: [INFO] Waiting for SST streaming to complete! (20151113 10:40:58.568)

The remaining files from the data directory were removed, and the node started to wait for the SST process to complete.

2015-11-13 10:41:00 30484 [Note] WSREP: (05e14925, 'tcp://0.0.0.0:4567') turning message relay requesting off
WSREP_SST: [ERROR] xtrabackup_checkpoints missing, failed innobackupex/SST on donor (20151113 10:41:08.407)
WSREP_SST: [ERROR] Cleanup after exit with status:2 (20151113 10:41:08.409)
2015-11-13 10:41:08 30484 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '172.30.4.220' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '30484''' : 2 (No such file or directory)
2015-11-13 10:41:08 30484 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
2015-11-13 10:41:08 30484 [ERROR] WSREP: SST script aborted with error 2 (No such file or directory)
2015-11-13 10:41:08 30484 [ERROR] WSREP: SST failed: 2 (No such file or directory)
2015-11-13 10:41:08 30484 [ERROR] Aborting

2015-11-13 10:41:08 30484 [Warning] WSREP: 1.0 (172.30.4.156): State transfer to 0.0 (172.30.4.220) failed: -22 (Invalid argument)
2015-11-13 10:41:08 30484 [ERROR] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():731: Will never receive state. Need to abort.

Unfortunately, SST failed on the donor node and, as a result, the Galera node aborted and the mysqld process stopped. There’s not much information here about the exact cause of the problem, but we’ve been pointed to the donor. Let’s take a look at its log.

2015-11-13 10:44:33 30400 [Note] WSREP: Member 0.0 (172.30.4.220) requested state transfer from '*any*'. Selected 1.0 (172.30.4.156)(SYNCED) as donor.
2015-11-13 10:44:33 30400 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 563464)
2015-11-13 10:44:33 30400 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2015-11-13 10:44:33 30400 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'donor' --address '172.30.4.220:4444/xtrabackup_sst//1' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '''' --gtid '98ed75de-7c05-11e5-9743-de4abc22bd11:563464''
2015-11-13 10:44:33 30400 [Note] WSREP: sst_donor_thread signaled with 0
WSREP_SST: [INFO] Streaming with xbstream (20151113 10:44:33.598)
WSREP_SST: [INFO] Using socat as streamer (20151113 10:44:33.600)
WSREP_SST: [INFO] Using /tmp/tmp.jZEi7YBrNl as xtrabackup temporary directory (20151113 10:44:33.613)
WSREP_SST: [INFO] Using /tmp/tmp.wz0XmveABt as innobackupex temporary directory (20151113 10:44:33.615)
WSREP_SST: [INFO] Streaming GTID file before SST (20151113 10:44:33.619)
WSREP_SST: [INFO] Evaluating xbstream -c ${INFO_FILE} | socat -u stdio TCP:172.30.4.220:4444; RC=( ${PIPESTATUS[@]} ) (20151113 10:44:33.621)
WSREP_SST: [INFO] Sleeping before data transfer for SST (20151113 10:44:33.626)
2015-11-13 10:44:35 30400 [Note] WSREP: (b7a335ea, 'tcp://0.0.0.0:4567') turning message relay requesting off
WSREP_SST: [INFO] Streaming the backup to joiner at 172.30.4.220 4444 (20151113 10:44:43.628)

In this part of the log, you can see some lines similar to those in the joiner’s log - 172.30.4.220 is the joiner and it requested state transfer from 172.30.4.156 (our donor node). The donor switched to the donor/desync state and xtrabackup was triggered.

WSREP_SST: [INFO] Evaluating innobackupex --defaults-file=/etc/mysql/my.cnf  --defaults-group=mysqld --no-version-check  $tmpopts $INNOEXTRA --galera-info --stream=$sfmt $itmpdir 2>
${DATA}/innobackup.backup.log | socat -u stdio TCP:172.30.4.220:4444; RC=( ${PIPESTATUS[@]} ) (20151113 10:44:43.631)
2015-11-13 10:44:43 30400 [Warning] Access denied for user 'root'@'localhost' (using password: YES)
WSREP_SST: [ERROR] innobackupex finished with error: 1.  Check /var/lib/mysql//innobackup.backup.log (20151113 10:44:43.638)

At some point SST failed - we can see a hint about what may be the culprit of the problem. There’s also information about where we can look for further details - the innobackup.backup.log file in the MySQL data directory.

WSREP_SST: [ERROR] Cleanup after exit with status:22 (20151113 10:44:43.640)
WSREP_SST: [INFO] Cleaning up temporary directories (20151113 10:44:43.642)
2015-11-13 10:44:43 30400 [ERROR] WSREP: Failed to read from: wsrep_sst_xtrabackup-v2 --role 'donor' --address '172.30.4.220:4444/xtrabackup_sst//1' --socket '/var/lib/mysql/mysql.s
ock' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '''' --gtid '98ed75de-7c05-11e5-9743-de4abc22bd11:563464'
2015-11-13 10:44:43 30400 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'donor' --address '172.30.4.220:4444/xtrabackup_sst//1' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '''' --gtid '98ed75de-7c05-11e5-9743-de4abc22bd11:563464': 22 (Invalid argument)
2015-11-13 10:44:43 30400 [ERROR] WSREP: Command did not run: wsrep_sst_xtrabackup-v2 --role 'donor' --address '172.30.4.220:4444/xtrabackup_sst//1' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '''' --gtid '98ed75de-7c05-11e5-9743-de4abc22bd11:563464'
2015-11-13 10:44:43 30400 [Warning] WSREP: 1.0 (172.30.4.156): State transfer to 0.0 (172.30.4.220) failed: -22 (Invalid argument)
2015-11-13 10:44:43 30400 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 563464)
2015-11-13 10:44:43 30400 [Note] WSREP: Member 1.0 (172.30.4.156) synced with group.
2015-11-13 10:44:43 30400 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 563464)

All SST-related processes terminate and the donor gets back in sync with the rest of the cluster, switching to the ‘Synced’ state.

Finally, let’s take a look at the innobackup.backup.log:

151113 14:00:57 innobackupex: Starting the backup operation

IMPORTANT: Please check that the backup run completes successfully.
           At the end of a successful backup run innobackupex
           prints "completed OK!".

151113 14:00:57 Connecting to MySQL server host: localhost, user: root, password: not set, port: 3306, socket: /var/lib/mysql/mysql.sock
Failed to connect to MySQL server: Access denied for user 'root'@'localhost' (using password: YES).

This time there’s nothing new - the problem is related to access issues to the MySQL database. Such an issue should immediately point you to my.cnf and the ‘wsrep_sst_auth’ variable.
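
For reference, a minimal sketch of what a working configuration could look like - the user name and password are placeholders, and the privileges shown are a commonly used set for xtrabackup-based SST:

# /etc/mysql/my.cnf on every node - illustrative values
[mysqld]
wsrep_sst_method=xtrabackup-v2
wsrep_sst_auth=sstuser:sstpassword

# Create the matching account
mysql -u root -p -e "CREATE USER 'sstuser'@'localhost' IDENTIFIED BY 'sstpassword';
GRANT RELOAD, LOCK TABLES, PROCESS, REPLICATION CLIENT ON *.* TO 'sstuser'@'localhost';"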

Problems with network streaming

Sometimes it may happen that streaming doesn’t work correctly. It can happen because there’s a network error, or because some of the processes involved died. Let’s check how a network error impacts the SST process.

First error log snippet comes from the joiner node.

2015-11-13 14:30:46 24948 [Note] WSREP: Member 0.0 (172.30.4.220) requested state transfer from '*any*'. Selected 1.0 (172.30.4.156)(SYNCED) as donor.
2015-11-13 14:30:46 24948 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 563464)
2015-11-13 14:30:46 24948 [Note] WSREP: Requesting state transfer: success, donor: 1
WSREP_SST: [INFO] Proceeding with SST (20151113 14:30:46.352)
WSREP_SST: [INFO] Evaluating socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20151113 14:30:46.353)
WSREP_SST: [INFO] Cleaning the existing datadir and innodb-data/log directories (20151113 14:30:46.356)
WSREP_SST: [INFO] Waiting for SST streaming to complete! (20151113 14:30:46.363)
2015-11-13 14:30:48 24948 [Note] WSREP: (201db672, 'tcp://0.0.0.0:4567') turning message relay requesting off
2015-11-13 14:31:01 24948 [Note] WSREP: (201db672, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://172.30.4.156:4567
2015-11-13 14:31:02 24948 [Note] WSREP: (201db672, 'tcp://0.0.0.0:4567') reconnecting to b7a335ea (tcp://172.30.4.156:4567), attempt 0
2015-11-13 14:33:02 24948 [Note] WSREP: (201db672, 'tcp://0.0.0.0:4567') reconnecting to b7a335ea (tcp://172.30.4.156:4567), attempt 30
2015-11-13 14:35:02 24948 [Note] WSREP: (201db672, 'tcp://0.0.0.0:4567') reconnecting to b7a335ea (tcp://172.30.4.156:4567), attempt 60
^@2015-11-13 14:37:02 24948 [Note] WSREP: (201db672, 'tcp://0.0.0.0:4567') reconnecting to b7a335ea (tcp://172.30.4.156:4567), attempt 90
^@2015-11-13 14:39:02 24948 [Note] WSREP: (201db672, 'tcp://0.0.0.0:4567') reconnecting to b7a335ea (tcp://172.30.4.156:4567), attempt 120
^@^@2015-11-13 14:41:02 24948 [Note] WSREP: (201db672, 'tcp://0.0.0.0:4567') reconnecting to b7a335ea (tcp://172.30.4.156:4567), attempt 150
^@^@2015-11-13 14:43:02 24948 [Note] WSREP: (201db672, 'tcp://0.0.0.0:4567') reconnecting to b7a335ea (tcp://172.30.4.156:4567), attempt 180
^@^@2015-11-13 14:45:02 24948 [Note] WSREP: (201db672, 'tcp://0.0.0.0:4567') reconnecting to b7a335ea (tcp://172.30.4.156:4567), attempt 210
^@^@2015-11-13 14:46:30 24948 [Warning] WSREP: 1.0 (172.30.4.156): State transfer to 0.0 (172.30.4.220) failed: -22 (Invalid argument)
2015-11-13 14:46:30 24948 [ERROR] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():731: Will never receive state. Need to abort.
2015-11-13 14:46:30 24948 [Note] WSREP: gcomm: terminating thread
2015-11-13 14:46:30 24948 [Note] WSREP: gcomm: joining thread
2015-11-13 14:46:30 24948 [Note] WSREP: gcomm: closing backend
2015-11-13 14:46:30 24948 [Note] WSREP: view(view_id(NON_PRIM,201db672,149) memb {
    201db672,0
} joined {
} left {
} partitioned {
    b7a335ea,0
    fff6c307,0
})
2015-11-13 14:46:30 24948 [Note] WSREP: view((empty))
2015-11-13 14:46:30 24948 [Note] WSREP: gcomm: closed
2015-11-13 14:46:30 24948 [Note] WSREP: /usr/sbin/mysqld: Terminated.

As you can see, it’s obvious that SST failed:

2015-11-13 14:46:30 24948 [Warning] WSREP: 1.0 (172.30.4.156): State transfer to 0.0 (172.30.4.220) failed: -22 (Invalid argument)
2015-11-13 14:46:30 24948 [ERROR] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():731: Will never receive state. Need to abort.

A few lines before that, though: 

2015-11-13 14:31:01 24948 [Note] WSREP: (201db672, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://172.30.4.156:4567
...
2015-11-13 14:45:02 24948 [Note] WSREP: (201db672, 'tcp://0.0.0.0:4567') reconnecting to b7a335ea (tcp://172.30.4.156:4567), attempt 210

The donor node was declared inactive - this is a very important indication of network problems. Given that Galera tried 210 times to communicate with the other node, it’s pretty clear this is not a transient error but something more serious. On the donor node, the error log contains the following entries that also make it clear something is not right with the network:

WSREP_SST: [INFO] Evaluating innobackupex --defaults-file=/etc/mysql/my.cnf  --defaults-group=mysqld --no-version-check  $tmpopts $INNOEXTRA --galera-info --stream=$sfmt $itmpdir 2>${DATA}/innobackup.backup.log | socat -u stdio TCP:172.30.4.220:4444; RC=( ${PIPESTATUS[@]} ) (20151113 14:30:56.713)
2015-11-13 14:31:02 30400 [Note] WSREP: (b7a335ea, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://172.30.4.220:4567
2015-11-13 14:31:03 30400 [Note] WSREP: (b7a335ea, 'tcp://0.0.0.0:4567') reconnecting to 201db672 (tcp://172.30.4.220:4567), attempt 0

...

2015-11-13 14:45:03 30400 [Note] WSREP: (b7a335ea, 'tcp://0.0.0.0:4567') reconnecting to 201db672 (tcp://172.30.4.220:4567), attempt 210
2015/11/13 14:46:30 socat[8243] E write(3, 0xc1a1f0, 8192): Connection timed out
2015-11-13 14:46:30 30400 [Warning] Aborted connection 76 to db: 'unconnected' user: 'root' host: 'localhost' (Got an error reading communication packets)
WSREP_SST: [ERROR] innobackupex finished with error: 1.  Check /var/lib/mysql//innobackup.backup.log (20151113 14:46:30.767)
WSREP_SST: [ERROR] Cleanup after exit with status:22 (20151113 14:46:30.769)
WSREP_SST: [INFO] Cleaning up temporary directories (20151113 14:46:30.771)

This time the disappearing node is 172.30.4.220 - the node which is joining the cluster. Finally, innobackup.backup.log looks as follows:

Warning: Using unique option prefix open_files instead of open_files_limit is deprecated and will be removed in a future release. Please use the full name instead.
151113 14:30:56 innobackupex: Starting the backup operation

IMPORTANT: Please check that the backup run completes successfully.
           At the end of a successful backup run innobackupex
           prints "completed OK!".

151113 14:30:56 Connecting to MySQL server host: localhost, user: root, password: not set, port: 3306, socket: /var/lib/mysql/mysql.sock
Using server version 5.6.26-74.0-56
innobackupex version 2.3.2 based on MySQL server 5.6.24 Linux (x86_64) (revision id: 306a2e0)
xtrabackup: uses posix_fadvise().
xtrabackup: cd to /var/lib/mysql
xtrabackup: open files limit requested 4000000, set to 1000000
xtrabackup: using the following InnoDB configuration:
xtrabackup:   innodb_data_home_dir = ./
xtrabackup:   innodb_data_file_path = ibdata1:100M:autoextend
xtrabackup:   innodb_log_group_home_dir = ./
xtrabackup:   innodb_log_files_in_group = 2
xtrabackup:   innodb_log_file_size = 536870912
xtrabackup: using O_DIRECT
151113 14:30:56 >> log scanned up to (8828922053)
xtrabackup: Generating a list of tablespaces
151113 14:30:57 [01] Streaming ./ibdata1
151113 14:30:57 >> log scanned up to (8828922053)
151113 14:30:58 >> log scanned up to (8828922053)
151113 14:30:59 [01]        ...done
151113 14:30:59 [01] Streaming ./mysql/innodb_table_stats.ibd
151113 14:30:59 >> log scanned up to (8828922053)
151113 14:31:00 >> log scanned up to (8828922053)
151113 14:31:01 >> log scanned up to (8828922053)
151113 14:31:02 >> log scanned up to (8828922053)

...

151113 14:46:27 >> log scanned up to (8828990096)
151113 14:46:28 >> log scanned up to (8828990096)
151113 14:46:29 >> log scanned up to (8828990096)
innobackupex: Error writing file 'UNOPENED' (Errcode: 32 - Broken pipe)
xb_stream_write_data() failed.
innobackupex: Error writing file 'UNOPENED' (Errcode: 32 - Broken pipe)
[01] xtrabackup: Error: xtrabackup_copy_datafile() failed.
[01] xtrabackup: Error: failed to copy datafile.

The pipe was broken because the network connection timed out.
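
Before digging deeper, a quick connectivity check between donor and joiner often saves time - with the xtrabackup/socat method, SST streams over port 4444 by default, while group communication uses 4567. An illustrative sketch (the IP is a placeholder):

# On the joiner - is anything actually listening for the SST stream?
ss -ltn | grep -E ':(4444|4567)'

# From the donor - can we reach the joiner's SST and group communication ports?
nc -zv 172.30.4.220 4444
nc -zv 172.30.4.220 4567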

One interesting question is how we can track errors when one of the components required to perform SST dies. It can be anything from the xtrabackup process to socat, which is needed to stream the data. Let’s go over some examples:

2015-11-13 15:40:02 29258 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 563464)
2015-11-13 15:40:02 29258 [Note] WSREP: Requesting state transfer: success, donor: 0
WSREP_SST: [INFO] Proceeding with SST (20151113 15:40:02.497)
WSREP_SST: [INFO] Evaluating socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20151113 15:40:02.499)
WSREP_SST: [INFO] Cleaning the existing datadir and innodb-data/log directories (20151113 15:40:02.499)
WSREP_SST: [INFO] Waiting for SST streaming to complete! (20151113 15:40:02.511)
2015-11-13 15:40:04 29258 [Note] WSREP: (cd0c84cd, 'tcp://0.0.0.0:4567') turning message relay requesting off
2015/11/13 15:40:13 socat[29525] E write(1, 0xf2d420, 8192): Broken pipe
/usr//bin/wsrep_sst_xtrabackup-v2: line 112: 29525 Exit 1                  socat -u TCP-LISTEN:4444,reuseaddr stdio
     29526 Killed                  | xbstream -x
WSREP_SST: [ERROR] Error while getting data from donor node:  exit codes: 1 137 (20151113 15:40:13.113)
WSREP_SST: [ERROR] Cleanup after exit with status:32 (20151113 15:40:13.116)

In the case described above, as can be clearly seen, xbstream was killed.

When the socat process gets killed, the joiner’s logs do not offer much data:

2015-11-13 15:43:05 9717 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 563464)
2015-11-13 15:43:05 9717 [Note] WSREP: Requesting state transfer: success, donor: 1
WSREP_SST: [INFO] Evaluating socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20151113 15:43:05.438)
WSREP_SST: [INFO] Proceeding with SST (20151113 15:43:05.438)
WSREP_SST: [INFO] Cleaning the existing datadir and innodb-data/log directories (20151113 15:43:05.440)
WSREP_SST: [INFO] Waiting for SST streaming to complete! (20151113 15:43:05.448)
2015-11-13 15:43:07 9717 [Note] WSREP: (3a69eae9, 'tcp://0.0.0.0:4567') turning message relay requesting off
WSREP_SST: [ERROR] Error while getting data from donor node:  exit codes: 137 0 (20151113 15:43:11.442)
WSREP_SST: [ERROR] Cleanup after exit with status:32 (20151113 15:43:11.444)
2015-11-13 15:43:11 9717 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '172.30.4.220' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '9717''' : 32 (Broken pipe)
2015-11-13 15:43:11 9717 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
2015-11-13 15:43:11 9717 [ERROR] WSREP: SST script aborted with error 32 (Broken pipe)
2015-11-13 15:43:11 9717 [ERROR] WSREP: SST failed: 32 (Broken pipe)
2015-11-13 15:43:11 9717 [ERROR] Aborting

We have just an indication that something went really wrong and that SST failed. On the donor node, things are a bit different:

WSREP_SST: [INFO] Streaming the backup to joiner at 172.30.4.220 4444 (20151113 15:43:15.875)
WSREP_SST: [INFO] Evaluating innobackupex --defaults-file=/etc/mysql/my.cnf  --defaults-group=mysqld --no-version-check  $tmpopts $INNOEXTRA --galera-info --stream=$sfmt $itmpdir 2>${DATA}/innobackup.backup.log | socat -u stdio TCP:172.30.4.220:4444; RC=( ${PIPESTATUS[@]} ) (20151113 15:43:15.878)
2015/11/13 15:43:15 socat[16495] E connect(3, AF=2 172.30.4.220:4444, 16): Connection refused
2015-11-13 15:43:16 30400 [Warning] Aborted connection 81 to db: 'unconnected' user: 'root' host: 'localhost' (Got an error reading communication packets)
WSREP_SST: [ERROR] innobackupex finished with error: 1.  Check /var/lib/mysql//innobackup.backup.log (20151113 15:43:16.305)
WSREP_SST: [ERROR] Cleanup after exit with status:22 (20151113 15:43:16.306)
WSREP_SST: [INFO] Cleaning up temporary directories (20151113 15:43:16.309)

We can find an ‘Aborted connection’ warning and information that innobackupex (the wrapper used to execute xtrabackup) finished with an error. Still not much hard data. Luckily, we also have innobackup.backup.log to check:

151113 15:46:59 >> log scanned up to (8829243081)
xtrabackup: Generating a list of tablespaces
151113 15:46:59 [01] Streaming ./ibdata1
innobackupex: Error writing file 'UNOPENED' (Errcode: 32 - Broken pipe)
xb_stream_write_data() failed.
innobackupex: Error writing file 'UNOPENED' (Errcode: 32 - Broken pipe)
[01] xtrabackup: Error: xtrabackup_copy_datafile() failed.
[01] xtrabackup: Error: failed to copy datafile.

This time it’s much clearer - we have an error in the xtrabackup_copy_datafile() function which, as you may expect, is responsible for copying files. This leads us to the conclusion that some kind of network connection issue happened between the donor and the joiner. It may not be a detailed explanation, but the conclusion is still valid - killing the process on the other end of the network connection closes that connection abruptly, which is something that can be described as a network issue of some kind.

We are approaching the end of this post - hopefully you enjoyed our small trip through the Galera logs. As you have probably figured out by now, it is important to check an issue from every possible angle when dealing with Galera errors. Galera nodes work together to form a cluster, but they are still separate hosts connected via a network. Sometimes looking at a single node’s logs is not enough to understand how the issue unfolded - you need to check every node’s point of view. This is also true for all potential SST issues - sometimes it’s not enough to check the logs on the joiner node. You also have to look into the donor’s logs - both the error log and innobackup.backup.log.

The examples of issues described in this post (and the one before) only cover a small part of all possible problems, but hopefully they’ll help you understand the troubleshooting process.

In the next post, we are going to take a closer look at pt-stalk - a tool which may help you understand what is going on with MySQL when the standard approach fails to catch the problem.

Webinar replay & slides for MySQL DBAs: performing live database upgrades in replication & Galera setups


Thanks to everyone who joined us for our recent live webinar on performing live database upgrades for MySQL Replication & Galera, led by Krzysztof Książek. The replay and slides of the webinar are now available to watch and read online via the links below.

During this live webinar, Krzysztof covered one of the most basic, but essential tasks of the DBA: minor and major database upgrades in production environments.

Watch the replay

 

Read the slides

 

Topics

  • What types of upgrades are there?
  • How do I best prepare for the upgrades?
  • Best practices for:
    • Minor version upgrades - MySQL Replication & Galera
    • Major version upgrades - MySQL Replication & Galera


Speaker

Krzysztof Książek, Senior Support Engineer at Severalnines, is a MySQL DBA with experience managing complex database environments for companies like Zendesk, Chegg, Pinterest and Flipboard. 

This webinar builds upon recent blog posts and related webinar series by Krzysztof on how to become a MySQL DBA.


Become a ClusterControl DBA - SSL Key Management and Encryption of MySQL Data in Transit


Databases usually work in a secure environment. It may be a datacenter with a dedicated VLAN for database traffic. It may be a VPC in EC2. If your network spreads across multiple datacenters in different regions, you’d usually use some kind of Virtual Private Network or SSH tunneling to connect these locations in a secure manner. With data privacy and security being hot topics these days, you might feel better with an additional layer of security.

MySQL supports SSL as a means to encrypt traffic both between MySQL servers (replication) and between MySQL servers and clients. If you use Galera cluster, similar features are available - both intra-cluster communication and connections with clients can be encrypted using SSL.
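Under the hood, this boils down to a handful of server options. A minimal sketch of the relevant my.cnf settings (all paths are placeholders) could look like this:

[mysqld]
# client-server connections
ssl-ca=/path/to/ca.pem
ssl-cert=/path/to/server-cert.pem
ssl-key=/path/to/server-key.pem
# Galera replication traffic
wsrep_provider_options="socket.ssl_key=/path/to/server-key.pem;socket.ssl_cert=/path/to/server-cert.pem;socket.ssl_ca=/path/to/ca.pem"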

A common way of implementing SSL encryption is to use self-signed certificates. Most of the time, it is not necessary to purchase an SSL certificate issued by a Certificate Authority. Anybody who’s been through the process of generating a self-signed certificate will probably agree that it is not the most straightforward process - most of the time, you end up searching the internet for how-tos and instructions on how to do this. This is especially true if you are a DBA and only go through this process every few months or even years. This is why we recently added a new ClusterControl feature designed to help you manage SSL keys across your database cluster. We based this blog post on ClusterControl 1.3.1.

Key Management in ClusterControl

You can enter Key Management by going to the Settings -> Key Management section.

You will be presented with the following screen:

You can see two certificates generated, one being a CA and the other one a regular certificate. To generate more certificates, switch to the ‘Generate Key’ tab:

A certificate can be generated in two ways - you can first create a self-signed CA and then use it to sign a certificate. Or you can go directly to the ‘Client/Server Certificates and Key’ tab and create a certificate. The required CA will be created for you in the background. Last but not least, you can import an existing certificate (for example a certificate you bought from one of many companies which sell SSL certificates).

To do that, you should upload your certificate, key and CA to your ClusterControl node and store them in the /var/lib/cmon/ca directory. Then you fill in the paths to those files and the certificate will be imported.

If you decide to generate a CA or generate a new certificate, there’s another form to fill in - you need to provide details about your organization, common name and email, and pick the key length and expiration date.

Once you have everything in place, you can start using your new certificates. ClusterControl currently supports deployment of SSL encryption between clients and MySQL databases, and SSL encryption of intra-cluster traffic in Galera Cluster. We plan to extend the variety of supported deployments in future releases of ClusterControl.

Full SSL encryption for Galera Cluster

Now let’s assume we have our SSL keys ready and we have a Galera Cluster, which needs SSL encryption, deployed through our ClusterControl instance. We can easily secure it in two steps.

First - encrypt Galera traffic using SSL. From your cluster view, one of the cluster actions is ‘Enable Galera SSL Encryption’. You’ll be presented with the following options:

If you do not have a certificate, you can generate it here. But if you already generated or imported an SSL certificate, you should be able to see it in the list and use it to encrypt Galera replication traffic. Please keep in mind that this operation requires a cluster restart - all nodes will have to stop at the same time, apply config changes and then restart. Before you proceed here, make sure you are prepared for some downtime while the cluster restarts.

Once intra-cluster traffic has been secured, we want to cover client-server connections. To do that, pick the ‘Enable SSL Encryption’ job and you’ll see the following dialog:

It’s pretty similar - you can either pick an existing certificate or generate a new one. The main difference is that applying client-server encryption does not require downtime - a rolling restart will suffice.

Of course, enabling SSL on the database is not enough - you have to copy certificates to the clients which are supposed to use SSL to connect to the database. All certificates can be found in the /var/lib/cmon/ca directory on the ClusterControl node. You also have to remember to change grants for users and make sure you’ve added REQUIRE SSL to them if you want to enforce only secure connections.
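As a hypothetical example (user name, host and certificate path are placeholders), enforcing SSL for a user and verifying an encrypted connection from the client could look like this:

mysql> GRANT USAGE ON *.* TO 'app'@'%' REQUIRE SSL;

$ mysql -uapp -p -h192.168.1.101 --ssl-ca=/path/to/ca.pem \
  -e "SHOW STATUS LIKE 'Ssl_cipher'"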

We hope you’ll find these options easy to use and that they help you secure your MySQL environment. If you have any questions or suggestions regarding this feature, we’d love to hear from you.

Severalnines Launches #MySQLHA CrowdChat


Today we launch our live CrowdChat on everything #MySQLHA!

This CrowdChat is brought to you by Severalnines and is hosted by a community of subject matter experts. CrowdChat is a community platform that works across Facebook, Twitter, and LinkedIn to allow users to discuss a topic using a specific #hashtag. This CrowdChat focuses on the hashtag #MySQLHA. So if you’re a DBA, architect, CTO, or a database novice, register to join and become part of the conversation!

Join this online community to interact with experts on Galera clusters. Get your questions answered and join the conversation around everything #MySQLHA.

Register free

Meet the experts

Art van Scheppingen is a Senior Support Engineer at Severalnines. He’s a pragmatic MySQL and database expert with over 15 years’ experience in web development. He previously worked at Spil Games as Head of Database Engineering, where he maintained a broad view of the whole database environment: from MySQL to Couchbase, Vertica to Hadoop, and from Sphinx Search to SOLR. He regularly presents his work and projects at various conferences (Percona Live, FOSDEM) and related meetups.

Krzysztof Książek, Senior Support Engineer at Severalnines, is a MySQL DBA with experience managing complex database environments for companies like Zendesk, Chegg, Pinterest and Flipboard.

Ashraf Sharif is a System Support Engineer at Severalnines. He previously worked as a principal consultant and head of a support team, delivering clustering solutions for big websites in the South East Asia region. His professional interests focus on system scalability and high availability.

Vinay Joosery is a passionate advocate and builder of concepts and businesses around Big Data computing infrastructures. Prior to co-founding Severalnines, Vinay held the post of Vice-President EMEA at Pentaho Corporation - the Open Source BI leader. He has also held senior management roles at MySQL / Sun Microsystems / Oracle, where he headed the Global MySQL Telecoms Unit, and built the business around MySQL's High Availability and Clustering product lines. Prior to that, Vinay served as Director of Sales & Marketing at Ericsson Alzato, an Ericsson-owned venture focused on large scale real-time databases.

Infrastructure Automation - Deploying ClusterControl and MySQL-based systems on AWS using Ansible


We recently made a number of enhancements to the ClusterControl Ansible Role, so it now also supports automatic deployment of MySQL-based systems (MySQL Replication, Galera Cluster, NDB Cluster). The updated role uses the awesome ClusterControl RPC interface to automate deployments. It is available at Ansible Galaxy and Github.

TL;DR: It is now possible to define your database clusters directly in a playbook (see the example below), and let Ansible and ClusterControl automate the entire deployment:

cc_cluster:
  - deployment: true
    operation: "create"
    cluster_type: "galera"
    mysql_hostnames:
      - "192.168.1.101"
      - "192.168.1.102"
      - "192.168.1.103"
    mysql_password: "MyPassword2016"
    mysql_port: 3306
    mysql_version: "5.6"
    ssh_keyfile: "/root/.ssh/id_rsa"
    ssh_user: "root"
    vendor: "percona"

What’s New?

The major improvement is that you can now automatically deploy a new database setup while deploying ClusterControl. You can also register already deployed databases.

Define your database cluster in the playbook, within the “cc_cluster” item, and ClusterControl will perform the deployment. We also introduced a bunch of new variables to simplify the initial setup of ClusterControl for default admin credentials, ClusterControl license and LDAP settings.

Along with these new improvements, we can leverage Ansible’s built-in cloud modules to automate the rest of the infrastructure that our databases rely on - instance provisioning, resource allocation, network configuration and storage options. In simple terms, write your infrastructure as code and let Ansible work together with ClusterControl to build the entire stack.

We also included example playbooks in the repository for reference. Check them out at our Github repository page.

Example Deployment on Amazon EC2

In this example we are going to deploy two clusters on Amazon EC2 using our new role:

  • 1 node for ClusterControl
  • 3 nodes Galera Cluster (Percona XtraDB Cluster 5.6)
  • 4 nodes MySQL Replication (Percona Server 5.7)

The following diagram illustrates our setup with Ansible:

First, let’s decide what our infrastructure in AWS will look like:

  • Region: us-west-1
  • Availability Zone: us-west-1a
  • AMI ID: ami-d1315fb1
    • AMI name: RHEL 7.2 HVM
    • SSH user: ec2-user
  • Instance size: t2.medium
  • Keypair: mykeypair
  • VPC subnet: subnet-9ecc2dfb
  • Security group: default

Preparing our Ansible Master

We are using Ubuntu 14.04 as the Ansible master host in a local data-center to deploy our cluster on AWS EC2.

  1. If you already have Ansible installed, you may skip this step:

    $ apt-get install ansible python-setuptools
  2. Install boto (required by ec2 python script):

    $ pip install boto
  3. Download and configure ec2.py and ec2.ini under /etc/ansible. Ensure the Python script is executable:

    $ cd /etc/ansible
    $ wget https://raw.githubusercontent.com/ansible/ansible/devel/contrib/inventory/ec2.py
    $ wget https://raw.githubusercontent.com/ansible/ansible/devel/contrib/inventory/ec2.ini
    $ chmod 755 /etc/ansible/ec2.py
  4. Set up Secret and Access Key environment variables. Get the AWS Secret and Access Key from AWS Identity and Access Management (IAM) and configure them as per below:

    $ export AWS_ACCESS_KEY_ID='YOUR_AWS_API_KEY'
    $ export AWS_SECRET_ACCESS_KEY='YOUR_AWS_API_SECRET_KEY'
  5. Configure environment variables for ec2.py and ec2.ini.

    export ANSIBLE_HOSTS=/etc/ansible/ec2.py
    export EC2_INI_PATH=/etc/ansible/ec2.ini
  6. Configure AWS keypair. Ensure the keypair exists on the node. For example, if the keypair is located under /root/mykeypair.pem, use the following command to add it to the SSH agent:

    $ ssh-agent bash
    $ ssh-add /root/mykeypair.pem
  7. Verify if Ansible can see our cloud. If there are running EC2 instances, you should get a list of them in JSON by running this command:

    $ /etc/ansible/ec2.py --list

Note that you can put the exports from steps 4 and 5 inside your .bashrc or .bash_profile file to load the environment variables automatically.
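For example (a minimal sketch, assuming bash and the values used above):

$ cat >> ~/.bashrc <<'EOF'
export AWS_ACCESS_KEY_ID='YOUR_AWS_API_KEY'
export AWS_SECRET_ACCESS_KEY='YOUR_AWS_API_SECRET_KEY'
export ANSIBLE_HOSTS=/etc/ansible/ec2.py
export EC2_INI_PATH=/etc/ansible/ec2.ini
EOF
$ source ~/.bashrc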

Define our infrastructure inside the Ansible playbook

We are going to create 2 playbooks. The first one is the definition of our EC2 instances in AWS (ec2-instances.yml) and the second one to deploy ClusterControl and the database clusters (deploy-everything.yml).

Here is an example content of ec2-instances.yml:

- name: Create instances
  hosts: localhost
  gather_facts: False

  tasks:
    - name: Provision ClusterControl node
      ec2:
        count: 1
        region: us-east-1
        zone: us-east-1a
        key_name: mykeypair
        group: default
        instance_type: t2.medium
        image: ami-3f03c55c
        wait: yes
        wait_timeout: 500
        volumes:
          - device_name: /dev/sda1
            device_type: standard
            volume_size: 20
            delete_on_termination: true
        monitoring: no
        vpc_subnet_id: subnet-9ecc2dfb
        assign_public_ip: yes
        instance_tags:
          Name: clustercontrol
          set: ansible
          group: clustercontrol

    - name: Provision Galera nodes
      ec2:
        count: 3
        region: us-east-1
        zone: us-east-1a
        key_name: mykeypair
        group: default
        instance_type: t2.medium
        image: ami-3f03c55c
        wait: yes
        wait_timeout: 500
        volumes:
          - device_name: /dev/sdf
            device_type: standard
            volume_size: 20
            delete_on_termination: true
        monitoring: no
        vpc_subnet_id: subnet-9ecc2dfb
        assign_public_ip: yes
        instance_tags:
          Name: galeracluster
          set: ansible
          group: galeracluster

    - name: Provision MySQL Replication nodes
      ec2:
        count: 4
        region: us-east-1
        zone: us-east-1a
        key_name: mykeypair
        group: default
        instance_type: t2.medium
        image: ami-3f03c55c
        wait: yes
        wait_timeout: 500
        volumes:
          - device_name: /dev/sdf
            device_type: standard
            volume_size: 20
            delete_on_termination: true
        monitoring: no
        vpc_subnet_id: subnet-9ecc2dfb
        assign_public_ip: yes
        instance_tags:
          Name: replication
          set: ansible
          group: replication

There are three types of instances with different instance_tags (Name, set and group) in the playbook. The “group” tag distinguishes our host groups so they can be referenced in our deployment playbook as part of the Ansible host inventory. The “set” tag marks the instances as created by Ansible. Since we are provisioning everything from a local data-center, we set assign_public_ip to “yes” so the instances inside the VPC subnet “subnet-9ecc2dfb” are reachable from our location.

Next, we create the deployment playbook as per below (deploy-everything.yml):

- name: Configure ClusterControl instance.
  hosts: tag_group_clustercontrol
  become: true
  user: ec2-user
  gather_facts: true

  roles:
    - { role: severalnines.clustercontrol, tags: controller }

  vars:
    cc_admin:
      - email: "admin@email.com"
        password: "test123"

- name: Configure Galera Cluster and Replication instances.
  hosts: 
    - tag_group_galeracluster
    - tag_group_replication
  user: ec2-user
  become: true
  gather_facts: true

  roles:
    - { role: severalnines.clustercontrol, tags: dbnodes }

  vars:
    clustercontrol_ip_address: "{{ hostvars[groups['tag_group_clustercontrol'][0]]['ec2_ip_address'] }}"

- name: Create the database clusters.
  hosts: tag_group_clustercontrol
  become: true
  user: ec2-user

  roles:
    - { role: severalnines.clustercontrol, tags: deploy-database }

  vars:
    cc_cluster:
      - deployment: true
        operation: "create"
        cluster_type: "galera"
        mysql_cnf_template: "my.cnf.galera"
        mysql_datadir: "/var/lib/mysql"
        mysql_hostnames:
          - "{{ hostvars[groups['tag_group_galeracluster'][0]]['ec2_ip_address'] }}"
          - "{{ hostvars[groups['tag_group_galeracluster'][1]]['ec2_ip_address'] }}"
          - "{{ hostvars[groups['tag_group_galeracluster'][2]]['ec2_ip_address'] }}"
        mysql_password: "password"
        mysql_port: 3306
        mysql_version: "5.6"
        ssh_keyfile: "/root/.ssh/id_rsa"
        ssh_user: "root"
        sudo_password: ""
        vendor: "percona"
      - deployment: true
        operation: "create"
        cluster_type: "replication"
        mysql_cnf_template: "my.cnf.repl57"
        mysql_datadir: "/var/lib/mysql"
        mysql_hostnames:
          - "{{ hostvars[groups['tag_group_replication'][0]]['ec2_ip_address'] }}"
          - "{{ hostvars[groups['tag_group_replication'][1]]['ec2_ip_address'] }}"
          - "{{ hostvars[groups['tag_group_replication'][2]]['ec2_ip_address'] }}"
          - "{{ hostvars[groups['tag_group_replication'][3]]['ec2_ip_address'] }}"
        mysql_password: "password"
        mysql_port: 3306
        mysql_version: "5.7"
        ssh_keyfile: "/root/.ssh/id_rsa"
        ssh_user: "root"
        sudo_password: ""
        vendor: "percona"

The Ansible user is “ec2-user” for the RHEL 7.2 image. The playbook shows the deployment flow as:

  1. Install and configure ClusterControl (tags: controller)
  2. Set up passwordless SSH from the ClusterControl node to all database nodes (tags: dbnodes). In this section, we have to define clustercontrol_ip_address so we know which ClusterControl node is used to manage our nodes.
  3. Perform the database deployment. The database cluster item definition will be passed to the ClusterControl RPC interface listening on the EC2 instance that has “tag_group_clustercontrol”. For MySQL replication, the first node in the mysql_hostnames list is the master.

The above are the simplest variables used to get you started. For more customization options, you can refer to the documentation page of the role, under the Variables section.

Fire them up

You need to have the Ansible role installed. Grab it from Ansible Galaxy or the GitHub repository:

$ ansible-galaxy install severalnines.clustercontrol

Then, create the EC2 instances:

$ ansible-playbook -i /etc/ansible/ec2.py ec2-instances.yml

Refresh the inventory:

$ /etc/ansible/ec2.py --refresh-cache

Verify all EC2 instances are reachable before the deployment begins (you should get SUCCESS for all nodes):

$ ansible -m ping "tag_set_ansible" -u ec2-user

Install ClusterControl and deploy the database cluster:

$ ansible-playbook -i /etc/ansible/ec2.py deploy-everything.yml

Wait for a couple of minutes until the playbook completes. Then, log in to ClusterControl using the default email address and password defined in the playbook and you should land on the ClusterControl dashboard. Go to Settings -> Cluster Jobs; you should see the “Create Cluster” jobs scheduled and the deployment in progress.

This is our final result on ClusterControl dashboard:

The total deployment time, from installing Ansible to the database deployment, was about 50 minutes. This included waiting for the instances to be created and the database clusters to be deployed. This is pretty good, considering we were spinning up 8 nodes and deploying two database clusters from scratch. How long does it take you to deploy two clusters from scratch?

Future Plan

At the moment, the Ansible role only supports deployment of the following:

  • Create new Galera Cluster
    • Percona XtraDB Cluster (5.5/5.6)
    • MariaDB Galera Cluster (5.5/10.1)
    • MySQL Galera Cluster - Codership (5.5/5.6)
  • Create new MySQL Replication
    • Percona Server (5.7/5.6)
    • MariaDB Server (10.1)
    • MySQL Server - Oracle (5.7)
  • Add existing Galera Cluster
    • Percona/MariaDB/Codership (all stable versions)

We’re in the process of adding support for other cluster types supported by ClusterControl.

We’d love to hear your feedback in the comments section below. Would you like to see integration with more cloud providers (Azure, Google Cloud Platform, Rackspace)? What about virtualization platforms like OpenStack, VMware, Vagrant and Docker? How about load balancers (HAProxy, MaxScale and ProxySQL)? And Galera arbitrator (garbd), asynchronous replication slaves to Galera clusters, and backup management right from the Ansible playbook? The list can be very long, so let us know what is important to you. Happy automation!

MySQL on Docker: Introduction to Docker Swarm Mode and Multi-Host Networking


In the previous blog post, we looked into Docker’s single-host networking for MySQL containers. This time, we are going to look into the basics of multi-host networking and Docker swarm mode, a built-in orchestration tool to manage containers across multiple hosts.

Docker Engine - Swarm Mode

Running MySQL containers on multiple hosts can get a bit more complex depending on the clustering technology you choose.

Before we try to run MySQL containers with multi-host networking, we have to understand how the image works, how many resources to allocate (disk, memory, CPU), how the networking works (the overlay network drivers - default, flannel, weave, etc.) and how fault tolerance is handled (how the container is relocated, failed over and load balanced), because all of these will impact the overall operations, uptime and performance of the database. It is recommended to use an orchestration tool to get more manageability and scalability on top of your Docker engine cluster. The latest Docker Engine (version 1.12, released on July 14th, 2016) includes swarm mode for natively managing a cluster of Docker Engines called a Swarm. Take note that Docker Engine Swarm mode and Docker Swarm are two different projects, with different installation steps, despite working in a similar way.

Some of the noteworthy parts that you should know before entering the swarm world:

  • The following ports must be opened:
    • 2377 (TCP) - Cluster management
    • 7946 (TCP and UDP) - Nodes communication
    • 4789 (TCP and UDP) - Overlay network traffic
  • There are 2 types of nodes:
    • Manager - Manager nodes perform the orchestration and cluster management functions required to maintain the desired state of the swarm. Manager nodes elect a single leader to conduct orchestration tasks.
    • Worker - Worker nodes receive and execute tasks dispatched from manager nodes. By default, manager nodes are also worker nodes, but you can configure managers to be manager-only nodes.

More details in the Docker Engine Swarm documentation.
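If firewalld is running on your hosts (we assume CentOS/RHEL here - adjust for the firewall you use), the ports listed above could be opened with something like:

$ firewall-cmd --permanent --add-port=2377/tcp
$ firewall-cmd --permanent --add-port=7946/tcp --add-port=7946/udp
$ firewall-cmd --permanent --add-port=4789/tcp --add-port=4789/udp
$ firewall-cmd --reload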

In this blog, we are going to deploy application containers on top of a load-balanced Galera Cluster on 3 Docker hosts (docker1, docker2 and docker3), connected through an overlay network. We will use Docker Engine Swarm mode as the orchestration tool.

“Swarming” Up

Let’s cluster our Docker nodes into a Swarm. Swarm mode requires an odd number of managers (obviously more than one) to maintain quorum for fault tolerance. So, we are going to use all the physical hosts as manager nodes. Note that by default, manager nodes are also worker nodes.

  1. Firstly, initialize Swarm mode on docker1. This will make the node a manager and the leader:

    [root@docker1]$ docker swarm init --advertise-addr 192.168.55.111
    Swarm initialized: current node (6r22rd71wi59ejaeh7gmq3rge) is now a manager.
    
    To add a worker to this swarm, run the following command:
    
        docker swarm join \
        --token SWMTKN-1-16kit6dksvrqilgptjg5pvu0tvo5qfs8uczjq458lf9mul41hc-dzvgu0h3qngfgihz4fv0855bo \
        192.168.55.111:2377
    
    To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
  2. We are going to add two more nodes as manager. Generate the join command for other nodes to register as manager:

    [docker1]$ docker swarm join-token manager
    To add a manager to this swarm, run the following command:
    
        docker swarm join \
        --token SWMTKN-1-16kit6dksvrqilgptjg5pvu0tvo5qfs8uczjq458lf9mul41hc-7fd1an5iucy4poa4g1bnav0pt \
        192.168.55.111:2377
  3. On docker2 and docker3, run the following command to register the node:

    $ docker swarm join \
        --token SWMTKN-1-16kit6dksvrqilgptjg5pvu0tvo5qfs8uczjq458lf9mul41hc-7fd1an5iucy4poa4g1bnav0pt \
        192.168.55.111:2377
  4. Verify if all nodes are added correctly:

    [docker1]$ docker node ls
    ID                           HOSTNAME       STATUS  AVAILABILITY  MANAGER STATUS
    5w9kycb046p9aj6yk8l365esh    docker3.local  Ready   Active        Reachable
    6r22rd71wi59ejaeh7gmq3rge *  docker1.local  Ready   Active        Leader
    awlh9cduvbdo58znra7uyuq1n    docker2.local  Ready   Active        Reachable

    At the moment, we have docker1.local as the leader. 

Overlay Network

The only way to let containers running on different hosts connect to each other is by using an overlay network. It can be thought of as a container network that is built on top of another network (in this case, the physical hosts network). Docker Swarm mode comes with a default overlay network which implements a VxLAN-based solution with the help of libnetwork and libkv. You can however choose another overlay network driver like Flannel, Calico or Weave, where extra installation steps are necessary. We are going to cover more on that later in an upcoming blog post.

In Docker Engine Swarm mode, you can create an overlay network only from a manager node and it doesn’t need an external key-value store like etcd, consul or Zookeeper.

The swarm makes the overlay network available only to nodes in the swarm that require it for a service. When you create a service that uses an overlay network, the manager node automatically extends the overlay network to nodes that run service tasks.

Let’s create an overlay network for our containers. We are going to deploy Percona XtraDB Cluster and application containers on separate Docker hosts to achieve fault tolerance. These containers must be running on the same overlay network so they can communicate with each other.

We are going to name our network “mynet”. You can only create this on the manager node:

[docker1]$ docker network create --driver overlay mynet

Let’s see what networks we have now:

[docker1]$ docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
213ec94de6c9        bridge              bridge              local
bac2a639e835        docker_gwbridge     bridge              local
5b3ba00f72c7        host                host                local
03wvlqw41e9g        ingress             overlay             swarm
9iy6k0gqs35b        mynet               overlay             swarm
12835e9e75b9        none                null                local

There are now 2 overlay networks with a Swarm scope. The “mynet” network is what we are going to use today when deploying our containers. The ingress overlay network comes by default. The swarm manager uses ingress load balancing to expose the services you want externally to the swarm.

Deployment using Services and Tasks

We are going to deploy the Galera Cluster containers through services and tasks. When you create a service, you specify which container image to use and which commands to execute inside running containers. There are two types of services:

  • Replicated services - Distributes a specific number of replica tasks among the nodes based upon the scale you set in the desired state, for example “--replicas 3”.
  • Global services - One task for the service on every available node in the cluster, for example “--mode global”. If you have 7 Docker nodes in the Swarm, there will be one container on each of them.

Docker Swarm mode has a limitation in managing persistent data storage. When a node fails, the manager will get rid of the containers and create new containers in place of the old ones to meet the desired replica state. Since a container is discarded when it goes down, we would lose the corresponding data volume as well. Fortunately for Galera Cluster, the MySQL container can be automatically provisioned with state/data when joining.

Deploying Key-Value Store

The Docker image that we are going to use is from Percona-Lab. This image requires the MySQL containers to access a key-value store (supports etcd only) for IP address discovery during cluster initialization and bootstrap. The containers will look for other IP addresses in etcd and, if there are any, start MySQL with a proper wsrep_cluster_address. Otherwise, the first container will start with the bootstrap address, gcomm://.

  1. Let’s deploy our etcd service. We will use the etcd image available here. It requires us to have a discovery URL for the number of etcd nodes that we are going to deploy. In this case, we are going to set up a standalone etcd container, so the command is:

    [docker1]$ curl -w "\n"'https://discovery.etcd.io/new?size=1'
    https://discovery.etcd.io/a293d6cc552a66e68f4b5e52ef163d68
  2. Then, use the generated URL as the “-discovery” value when creating the service for etcd:

    [docker1]$ docker service create \
    --name etcd \
    --replicas 1 \
    --network mynet \
    -p 2379:2379 \
    -p 2380:2380 \
    -p 4001:4001 \
    -p 7001:7001 \
    elcolio/etcd:latest \
    -name etcd \
    -discovery=https://discovery.etcd.io/a293d6cc552a66e68f4b5e52ef163d68

    At this point, Docker swarm mode will orchestrate the deployment of the container on one of the Docker hosts.

  3. Retrieve the etcd service virtual IP address. We are going to use that in the next step when deploying the cluster:

    [docker1]$ docker service inspect etcd -f "{{ .Endpoint.VirtualIPs }}"
    [{03wvlqw41e9go8li34z2u1t4p 10.255.0.5/16} {9iy6k0gqs35bn541pr31mly59 10.0.0.2/24}]

    At this point, our architecture looks like this:

Deploying Database Cluster

  1. Specify the virtual IP address for etcd in the following command to deploy Galera (Percona XtraDB Cluster) containers:

    [docker1]$ docker service create \
    --name mysql-galera \
    --replicas 3 \
    -p 3306:3306 \
    --network mynet \
    --env MYSQL_ROOT_PASSWORD=mypassword \
    --env DISCOVERY_SERVICE=10.0.0.2:2379 \
    --env XTRABACKUP_PASSWORD=mypassword \
    --env CLUSTER_NAME=galera \
    perconalab/percona-xtradb-cluster:5.6
  2. The deployment takes some time, as the image has to be downloaded on the assigned worker/manager nodes. You can verify the status with the following command:

    [docker1]$ docker service ps mysql-galera
    ID                         NAME                IMAGE                                  NODE           DESIRED STATE  CURRENT STATE            ERROR
    8wbyzwr2x5buxrhslvrlp2uy7  mysql-galera.1      perconalab/percona-xtradb-cluster:5.6  docker1.local  Running        Running 3 minutes ago
    0xhddwx5jzgw8fxrpj2lhcqeq  mysql-galera.2      perconalab/percona-xtradb-cluster:5.6  docker3.local  Running        Running 2 minutes ago
    f2ma6enkb8xi26f9mo06oj2fh  mysql-galera.3      perconalab/percona-xtradb-cluster:5.6  docker2.local  Running        Running 2 minutes ago
  3. We can see that the mysql-galera service is now running. Let’s list out all services we have now:

    [docker1]$ docker service ls
    ID            NAME          REPLICAS  IMAGE                                  COMMAND
    1m9ygovv9zui  mysql-galera  3/3       perconalab/percona-xtradb-cluster:5.6
    au1w5qkez9d4  etcd          1/1       elcolio/etcd:latest                    -name etcd -discovery=https://discovery.etcd.io/a293d6cc552a66e68f4b5e52ef163d68
  4. Swarm mode has an internal DNS component that automatically assigns each service in the swarm a DNS entry. So you use the service name to resolve to the virtual IP address:

    [docker2]$ docker exec -it $(docker ps | grep etcd | awk {'print $1'}) ping mysql-galera
    PING mysql-galera (10.0.0.4): 56 data bytes
    64 bytes from 10.0.0.4: seq=0 ttl=64 time=0.078 ms
    64 bytes from 10.0.0.4: seq=1 ttl=64 time=0.179 ms

    Or, retrieve the virtual IP address through the “docker service inspect” command:

    [docker1]# docker service inspect mysql-galera -f "{{ .Endpoint.VirtualIPs }}"
    [{03wvlqw41e9go8li34z2u1t4p 10.255.0.7/16} {9iy6k0gqs35bn541pr31mly59 10.0.0.4/24}]

    Our architecture now can be illustrated as below:

Deploying Applications

Finally, you can create the application service and pass the MySQL service name (mysql-galera) as the database host value:

[docker1]$ docker service create \
--name wordpress \
--replicas 2 \
-p 80:80 \
--network mynet \
--env WORDPRESS_DB_HOST=mysql-galera \
--env WORDPRESS_DB_USER=root \
--env WORDPRESS_DB_PASSWORD=mypassword \
wordpress

Once deployed, we can then retrieve the virtual IP address for wordpress service through the “docker service inspect” command:

[docker1]# docker service inspect wordpress -f "{{ .Endpoint.VirtualIPs }}"
[{p3wvtyw12e9ro8jz34t9u1t4w 10.255.0.11/16} {kpv8e0fqs95by541pr31jly48 10.0.0.8/24}]

At this point, this is what we have:

Our distributed application and database setup is now deployed by Docker containers.

Connecting to the Services and Load Balancing

At this point, the following ports are published (based on the -p flag on each “docker service create” command) on all Docker nodes in the cluster, whether or not the node is currently running the task for the service:

  • etcd - 2380, 2379, 7001, 4001
  • MySQL - 3306
  • HTTP - 80

If we connect directly to the PublishedPort, with a simple loop, we can see that the MySQL service is load balanced among containers:

[docker1]$ while true; do mysql -uroot -pmypassword -h127.0.0.1 -P3306 -NBe 'select @@wsrep_node_address'; sleep 1; done
10.255.0.10
10.255.0.8
10.255.0.9
10.255.0.10
10.255.0.8
10.255.0.9
10.255.0.10
10.255.0.8
10.255.0.9
10.255.0.10
^C

At the moment, the Swarm manager handles the load balancing internally and there is no way to configure the load balancing algorithm. We can then use external load balancers to route outside traffic to these Docker nodes. In case any of the Docker nodes goes down, the service will be relocated to the other available nodes.

That’s all for now. In the next blog post, we’ll take a deeper look at Docker overlay network drivers for MySQL containers.

Load balanced MySQL Galera setup - Manual Deployment vs ClusterControl


If you have deployed databases with high availability before, you will know that a deployment does not always go your way, even though you’ve done it a zillion times. You could spend a full day setting everything up and may still end up with a non-functioning cluster. It is not uncommon to start over, as it’s really hard to figure out what went wrong.

So, deploying a MySQL Galera Cluster with redundant load balancing takes a bit of time. This blog looks at how long it would take to do it manually versus using ClusterControl to perform the task. For those who have not used it before, ClusterControl is an agentless management and automation software for databases. It supports MySQL (Oracle and Percona Server), MariaDB, MongoDB (MongoDB Inc. and Percona), and PostgreSQL.

For manual deployment, we’ll be using the popular “Google university” to search for how-tos and blogs that provide deployment steps.

Database Deployment

Deployment of a database consists of several parts. These include getting the hardware ready, software installation, configuration tweaking and a bit of tuning and testing. Now, let’s assume the hardware is ready, the OS is installed and it is up to you to do the rest. We are going to deploy a three-node Galera cluster as shown in the following diagram:

Manual

Googling on “install mysql galera cluster” led us to this page. By following the steps explained plus some additional dependencies, the following is what we should run on every DB node:

$ semanage permissive -a mysqld_t
$ systemctl stop firewalld
$ systemctl disable firewalld
$ vim /etc/yum.repos.d/galera.repo # setting up Galera repository
$ yum install http://www.percona.com/downloads/percona-release/redhat/0.1-3/percona-release-0.1-3.noarch.rpm
$ yum install mysql-wsrep-5.6 galera3 percona-xtrabackup
$ vim /etc/my.cnf # setting up wsrep_* variables
$ systemctl start mysql --wsrep-new-cluster # ‘systemctl start mysql’ on the remaining nodes
$ mysql_secure_installation

The above commands took around 18 minutes to finish on each DB node. Total deployment time was 54 minutes.
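For reference, the wsrep_* part of /etc/my.cnf from the step above could look roughly like the following sketch (IP addresses, cluster name, SST credentials and the provider path are placeholders and depend on your packages and environment):

[mysqld]
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
# provider path differs between Codership and Percona packages
wsrep_provider=/usr/lib64/galera-3/libgalera_smm.so
wsrep_cluster_name=my_galera_cluster
wsrep_cluster_address=gcomm://10.0.0.217,10.0.0.218,10.0.0.219
wsrep_node_address=10.0.0.217
wsrep_sst_method=xtrabackup-v2
wsrep_sst_auth=sstuser:sstpassword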

ClusterControl

Using ClusterControl, here are the steps we took to first install ClusterControl (5 minutes):

$ wget http://severalnines.com/downloads/cmon/install-cc
$ chmod 755 install-cc
$ ./install-cc

Login to the ClusterControl UI and create the default admin user.

Setup passwordless SSH to all DB nodes on ClusterControl node (1 minute):

$ ssh-keygen -t rsa
$ ssh-copy-id 10.0.0.217
$ ssh-copy-id 10.0.0.218
$ ssh-copy-id 10.0.0.219

In the ClusterControl UI, go to Create Database Cluster -> MySQL Galera and enter the following details (4 minutes):

Click Deploy and wait until the deployment finishes. You can monitor the deployment progress under ClusterControl -> Settings -> Cluster Jobs and once deployed, you will notice it took around 15 minutes:

To sum it up, the total deployment time including installing ClusterControl is 15 + 4 + 1 + 5 = 25 minutes.

The following table summarizes the above deployment actions:

Area           Manual                          ClusterControl
Total steps    8 steps x 3 servers + 1 = 25    8
Duration       18 x 3 = 54 minutes             25 minutes

To summarize, we needed fewer steps and less time with ClusterControl to achieve the same result. Three nodes is more or less the minimum cluster size, and the difference would get bigger with clusters of more nodes.

Load Balancer and Virtual IP Deployment

Now that we have our Galera cluster running, the next thing is to add a load balancer in front. This provides one single endpoint to the cluster, thus reducing the complexity for applications to connect to a multi-node system. Applications would not need to have knowledge of the topology and any changes caused by failures or admin maintenance would be masked. For fault tolerance, we would need at least 2 load balancers with a virtual IP address.

By adding a load balancer tier, our architecture will look something like this:

Manual Deployment

Googling on “install haproxy virtual ip galera cluster” led us to this page. We followed the steps:

On each HAProxy node (2 times):

$ yum install epel-release
$ yum install haproxy keepalived
$ systemctl enable haproxy
$ systemctl enable keepalived
$ vi /etc/haproxy/haproxy.cfg # configure haproxy
$ systemctl start haproxy
$ vi /etc/keepalived/keepalived.conf # configure keepalived
$ systemctl start keepalived

On each DB node (3 times):

$ wget https://raw.githubusercontent.com/olafz/percona-clustercheck/master/clustercheck
$ chmod +x clustercheck
$ mv clustercheck /usr/bin/
$ vi /etc/xinetd.d/mysqlchk # configure mysql check user
$ vi /etc/services # setup xinetd port
$ systemctl start xinetd
$ mysql -uroot -p
mysql> GRANT PROCESS ON *.* TO 'clustercheckuser'@'localhost' IDENTIFIED BY 'clustercheckpassword!'

The total deployment time for this was around 42 minutes.
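As a rough sketch of the two configuration steps above (IP addresses, the virtual IP, ports and interface name are placeholders that would need to match your environment):

# /etc/haproxy/haproxy.cfg - backend definition only
listen mysql_galera
    bind *:3307
    mode tcp
    balance leastconn
    option httpchk
    server db1 10.0.0.217:3306 check port 9200
    server db2 10.0.0.218:3306 check port 9200
    server db3 10.0.0.219:3306 check port 9200

# /etc/keepalived/keepalived.conf - virtual IP definition
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 101
    virtual_ipaddress {
        10.0.0.100
    }
}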

ClusterControl

For the ClusterControl host, here are the steps taken (1 minute):

$ ssh-copy-id 10.0.0.229
$ ssh-copy-id 10.0.0.230

Then, go to ClusterControl -> select the database cluster -> Add Load Balancer and enter the IP address of the HAProxy hosts, one at a time:

Once both HAProxy instances are deployed, we can add Keepalived to provide a floating IP address and perform failover:

Go to ClusterControl -> select the database cluster -> Logs -> Jobs. The total deployment took about 5 minutes, as shown in the screenshot below:

Thus, total deployment for load balancers plus virtual IP address and redundancy is 1 + 5 = 6 minutes.

The following table summarizes the above deployment actions:

Area           Manual                                           ClusterControl
Total steps    (8 x 2 HAProxy nodes) + (8 x 3 DB nodes) = 40    6
Duration       42 minutes                                       6 minutes

ClusterControl also manages and monitors the load balancers:

Adding a Read Replica

Our setup is now looking pretty decent, and the next step is to add a read replica to Galera. What is a read replica, and why do we need it? A read replica is an asynchronous slave, replicating from one of the Galera nodes using standard MySQL replication. There are a few good reasons to have this. Long-running reporting/OLAP type queries on a Galera node might slow down an entire cluster, if the reporting load is so intensive that the node has to spend considerable effort coping with it. So reporting queries can be sent to a standalone server, effectively isolating Galera from the reporting load. An asynchronous slave can also serve as a remote live backup of our cluster in a DR site, especially if the link is not good enough to stretch one cluster across 2 sites.

Our architecture is now looking like this:

Manual Deployment

Googling on “mysql galera with slave” brought us to this page. We followed the steps:

On master node:

$ vim /etc/my.cnf # setting up binary log and gtid
$ systemctl restart mysql
$ mysqldump --single-transaction --skip-add-locks --triggers --routines --events > dump.sql
$ mysql -uroot -p
mysql> GRANT REPLICATION SLAVE ON .. ;

On slave node (we used Percona Server):

$ yum install http://www.percona.com/downloads/percona-release/redhat/0.1-3/percona-release-0.1-3.noarch.rpm
$ yum install Percona-Server-server-56
$ vim /etc/my.cnf # setting up server id, gtid and stuff
$ systemctl start mysql
$ mysql_secure_installation
$ scp root@master:~/dump.sql /root
$ mysql -uroot -p < /root/dump.sql
$ mysql -uroot -p
mysql> CHANGE MASTER ... MASTER_AUTO_POSITION=1;
mysql> START SLAVE;

The total time spent for this manual deployment was around 40 minutes (with a database of 1GB in size).

ClusterControl

With ClusterControl, here is what we should do. Firstly, configure passwordless SSH to the target slave (0.5 minute):

$ ssh-copy-id 10.0.0.231 # setup passwordless ssh

Then, on one of the MySQL Galera nodes, we have to enable binary logging to become a master (2 minutes):

Click Proceed to start enabling binary log for this node. Once completed, we can add the replication slave by going to ClusterControl -> choose the Galera cluster -> Add Replication Slave and specify as per below (6 minutes including streaming 1GB of database to slave):

Click “Add node” and you are set. Total deployment time for adding a read replica complete with data is 6 + 2 + 0.5 = 8.5 minutes.

The following table summarizes the above deployment actions:

Area           Manual        ClusterControl
Total steps    15            3
Duration       40 minutes    8.5 minutes

We can see that ClusterControl automates a number of time consuming tasks, including slave installation, backup streaming and slaving from the master. Note that ClusterControl will also handle things like master failover, so that replication does not break if the Galera master fails.

Conclusion

A good deployment is important, as it is the foundation of an upcoming database workload. Speed matters too, especially in agile environments where a team frequently deploys entire systems and tears them down after a short time. You’re welcome to try ClusterControl to automate your database deployments, it comes with a free 30-day trial of the full enterprise features. Once the trial ends, it will default to the community edition (free forever).


High Availability on a Shoestring Budget - Deploying a Minimal Two Node MySQL Galera Cluster


We regularly get questions about how to set up a Galera cluster with just 2 nodes. The documentation clearly states you should have at least 3 Galera nodes to avoid network partitioning. But there are some valid reasons for considering a 2-node deployment, e.g., if you want to achieve database high availability but have a limited budget to spend on a third database node. Or perhaps you are running Galera in a development/sandbox environment and prefer a minimal setup.

Galera implements a quorum-based algorithm to select a primary component through which it enforces consistency. The primary component needs to have a majority of votes, so in a 2-node system, a node failure or network partition leaves neither node with a majority, and the cluster cannot form a primary component without risking split brain. Fortunately, it is possible to add garbd (Galera Arbitrator Daemon), which is a lightweight stateless daemon that can act as the odd node. Arbitrator failure does not affect the cluster operations and a new instance can be reattached to the cluster at any time. There can be several arbitrators in the cluster.
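Outside of ClusterControl, garbd can also be started by hand; a minimal sketch (node addresses and cluster name are placeholders) looks like this:

$ garbd --address "gcomm://192.168.1.101:4567,192.168.1.102:4567" \
        --group my_wsrep_cluster \
        --daemon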

ClusterControl has support for deploying garbd on non-database hosts.

Normally a Galera cluster needs at least three hosts to be fully functional; however, at deploy time, two nodes will suffice to create a primary component. Here are the steps:

  1. Deploy a Galera cluster of two nodes,
  2. After the cluster has been deployed by ClusterControl, add garbd on the ClusterControl node.

You should end up with the below setup:

Deploy the Galera Cluster

Go to the ClusterControl deploy wizard to deploy the cluster.

Even though ClusterControl warns you that a Galera cluster needs an odd number of nodes, only add two nodes to the cluster.

Deploying a Galera cluster will trigger a ClusterControl job which can be monitored at the Jobs page.

Install Garbd

Once deployment is complete, install garbd on the ClusterControl host. You will find it under Manage -> Load Balancer:

Installing garbd will trigger a ClusterControl job which can be monitored at the Jobs page. Once completed, you can verify garbd is running with a green tick icon at the top bar:

That’s it. Our minimal two-node Galera cluster is now ready!

Deploying and Monitoring MySQL and MongoDB clusters in the cloud with NinesControl


NinesControl is a new service from Severalnines which helps you deploy MySQL Galera and MongoDB clusters in the cloud. In this blog post we will show you how you can easily deploy and monitor your databases on AWS and DigitalOcean.

Deployment

At the time of writing, NinesControl supports two cloud providers - Amazon Web Services and DigitalOcean. Before you attempt to deploy, you first need to configure access credentials for the cloud you’d like to run on. We covered this topic in a blog post.

Once that’s done, you should see the credentials defined for the chosen cloud provider in the “Cloud Accounts” tab.

You’ll see the screen below, as you do not have any clusters running yet:

You can click on “Deploy your first cluster” to start your first deployment. You will be presented with a screen like the one below - you can pick the cluster type you’d like to deploy and set some configuration settings like port, data directory and password. You can also set the number of nodes in the cluster and which database vendor you’d like to use.

For MongoDB, the deployment screen is fairly similar with some additional settings to configure.

Once you are done here, it’s time to move to the second step - picking the credentials to use to deploy your cluster. You have the option to pick either DigitalOcean or Amazon Web Services. You can also pick whichever credentials you have added to NinesControl. In our example, we just have a single credential, but it’s perfectly ok to have more than one credential per cloud provider.

Once you’ve made your choice, proceed to the third and final step, in which you will pick what kind of VMs you’d like to use. This screen differs between AWS and DigitalOcean.

If you picked AWS, you will have an option to choose the operating system and VM size. You also need to pick the VPC you will deploy into and the subnet which will be used by your cluster. If you don’t see anything in the drop-down lists, you can click on the “[Add]” buttons to create both a VPC and a subnet, and NinesControl will create them for you. Finally, you need to set the volume size of the VMs. After that, you can trigger the deployment.

DigitalOcean uses a slightly different screen layout, but the idea is similar - you need to pick a region, an operating system and a droplet size.

Once you are done, click on “Deploy cluster” to start deployment.

The status of the deployment will be shown in the cluster list. You can also click on the status bar to see the full log of a deployment. Whenever you’d like to deploy a new cluster, you will have to click on the “Deploy cluster” button.

Monitoring

Once deployment completes, you’ll see a list of your clusters.

When you click on one of them, you’ll see a list of nodes in the cluster and cluster-wide metrics.

Of course, metrics are cluster-dependent. Above is what you will see on a MySQL/MariaDB Galera cluster. MongoDB will present you with different graphs and metrics:

When you click on a node, you will be redirected to host statistics of that particular node - CPU, network, disk, RAM usage - all of the important basics which tell you about node health:

As you can see, NinesControl not only allows you to deploy Galera and MongoDB clusters in a fast and efficient way but it also collects important metrics for you and shows them as graphs.

Give it a try and let us know what you think.

MySQL on Docker: Deploy a Homogeneous Galera Cluster with etcd


In the previous blog post, we looked into the multi-host networking capabilities of Docker with the native overlay network and Calico. In this blog post, our journey to make Galera Cluster run smoothly on Docker containers continues. Deploying Galera Cluster on Docker is tricky when using orchestration tools. Due to the nature of the scheduler in container orchestration tools and the assumption of homogeneous images, the scheduler will just fire up the respective containers according to the run command and leave the bootstrapping process to the container’s entrypoint logic when starting up. And you do not want to do that for Galera - starting all nodes at once means each node will form a “1-node cluster” and you’ll end up with a disjointed system.

“Homogeneousing” Galera Cluster

That might be a new word, but it holds true for stateful services like MySQL Replication and Galera Cluster. As one might know, the bootstrapping process for Galera Cluster usually requires manual intervention, where you have to decide which node is the most advanced one to bootstrap from. There is nothing wrong with this step - you need to be aware of the state of each database node before deciding on the sequence in which to start them up. Galera Cluster is a distributed system, and its redundancy model works like that.

However, container orchestration tools like Docker Engine Swarm Mode and Kubernetes are not aware of the redundancy model of Galera. The orchestration tool presumes containers are independent from each other. If they are dependent, then you have to have an external service that monitors the state. The best way to achieve this is to use a key/value store as a reference point for other containers when starting up.

This is where a service discovery tool like etcd comes into the picture. The basic idea is that each node should report its state periodically to the service. This simplifies the decision process when starting up. For Galera Cluster, the node that has wsrep_local_state_comment equal to Synced shall be used as a reference node when constructing the Galera communication address (gcomm) during joining. Otherwise, the most up-to-date node has to get bootstrapped first.
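To illustrate, this state can be read on any node with a simple status query; a healthy, synced node returns “Synced”:

$ mysql -uroot -p -NBe "SHOW STATUS LIKE 'wsrep_local_state_comment'"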

Etcd has a very nice feature called TTL, where you can expire a key after a certain amount of time. This is useful to determine the state of a node, as the key/value entry only exists if an alive node reports to it. As a result, nodes won’t have to connect to each other to determine state (which is very troublesome in a dynamic environment) when forming a cluster. For example, consider the following keys:

    {
        "createdIndex": 10074,
        "expiration": "2016-11-29T10:55:35.218496083Z",
        "key": "/galera/my_wsrep_cluster/10.255.0.7/wsrep_last_committed",
        "modifiedIndex": 10074,
        "ttl": 10,
        "value": "2881"
    },
    {
        "createdIndex": 10072,
        "expiration": "2016-11-29T10:55:34.650574629Z",
        "key": "/galera/my_wsrep_cluster/10.255.0.7/wsrep_local_state_comment",
        "modifiedIndex": 10072,
        "ttl": 10,
        "value": "Synced"
    }

After 10 seconds (the ttl value), those keys will be removed from the entry. Basically, all nodes should report to etcd periodically with an expiring key. A container should report every N seconds while it’s alive (wsrep_local_state_comment=Synced and wsrep_last_committed=#value) via a background process. If a container is down, it will no longer send updates to etcd, thus the keys are removed after expiration. This simply indicates that the node was registered but is no longer synced with the cluster. It will be skipped when constructing the Galera communication address at a later point.
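As a hypothetical illustration of this reporting loop (key paths and values follow the example above), such expiring keys can be written and inspected with the etcd v2 etcdctl client:

$ etcdctl set /galera/my_wsrep_cluster/10.255.0.7/wsrep_local_state_comment "Synced" --ttl 10
$ etcdctl set /galera/my_wsrep_cluster/10.255.0.7/wsrep_last_committed "2881" --ttl 10
$ etcdctl ls /galera/my_wsrep_cluster --recursive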

The overall flow of joining procedure is illustrated in the following flow chart:

We have built a Docker image that follows the above. It is specifically built for running Galera Cluster using Docker’s orchestration tool. It is available on Docker Hub and in our GitHub repository. It requires an etcd cluster as the discovery service (multiple etcd hosts are supported) and is based on Percona XtraDB Cluster 5.6. The image includes Percona XtraBackup, jq (a JSON processor) and a shell script tailored for Galera health checks called report_status.sh.

You are welcome to fork or contribute to the project. Any bugs can be reported via Github or via our support page.

Deploying etcd Cluster

etcd is a distributed key value store that provides a simple and efficient way to store data across a cluster of machines. It’s open-source and available on GitHub. It provides shared configuration and service discovery. A simple use-case is to store database connection details or feature flags in etcd as key value pairs. It gracefully handles leader elections during network partitions and will tolerate machine failures, including the leader.

Since etcd is the brain of the setup, we are going to deploy it as a cluster daemon, on three nodes, instead of using containers. In this example, we are going to install etcd on each of the Docker hosts and form a three-node etcd cluster for better availability.

We used CentOS 7 as the operating system, with Docker v1.12.3, build 6b644ec. The deployment steps in this blog post are basically similar to the ones used in our previous blog post.

  1. Install etcd packages:

    $ yum install etcd
  2. Modify the configuration file accordingly depending on the Docker hosts:

    $ vim /etc/etcd/etcd.conf

    For docker1 with IP address 192.168.55.111:

    ETCD_NAME=etcd1
    ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
    ETCD_LISTEN_PEER_URLS="http://0.0.0.0:2380"
    ETCD_LISTEN_CLIENT_URLS="http://0.0.0.0:2379"
    ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.55.111:2380"
    ETCD_INITIAL_CLUSTER="etcd1=http://192.168.55.111:2380,etcd2=http://192.168.55.112:2380,etcd3=http://192.168.55.113:2380"
    ETCD_INITIAL_CLUSTER_STATE="new"
    ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster-1"
    ETCD_ADVERTISE_CLIENT_URLS="http://0.0.0.0:2379"

    For docker2 with IP address 192.168.55.112:

    ETCD_NAME=etcd2
    ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
    ETCD_LISTEN_PEER_URLS="http://0.0.0.0:2380"
    ETCD_LISTEN_CLIENT_URLS="http://0.0.0.0:2379"
    ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.55.112:2380"
    ETCD_INITIAL_CLUSTER="etcd1=http://192.168.55.111:2380,etcd2=http://192.168.55.112:2380,etcd3=http://192.168.55.113:2380"
    ETCD_INITIAL_CLUSTER_STATE="new"
    ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster-1"
    ETCD_ADVERTISE_CLIENT_URLS="http://0.0.0.0:2379"

    For docker3 with IP address 192.168.55.113:

    ETCD_NAME=etcd3
    ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
    ETCD_LISTEN_PEER_URLS="http://0.0.0.0:2380"
    ETCD_LISTEN_CLIENT_URLS="http://0.0.0.0:2379"
    ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.55.113:2380"
    ETCD_INITIAL_CLUSTER="etcd1=http://192.168.55.111:2380,etcd2=http://192.168.55.112:2380,etcd3=http://192.168.55.113:2380"
    ETCD_INITIAL_CLUSTER_STATE="new"
    ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster-1"
    ETCD_ADVERTISE_CLIENT_URLS="http://0.0.0.0:2379"
  3. Start the service on docker1, followed by docker2 and docker3:

    $ systemctl enable etcd
    $ systemctl start etcd
  4. Verify our cluster status using etcdctl:

    [docker3 ]$ etcdctl cluster-health
    member 2f8ec0a21c11c189 is healthy: got healthy result from http://0.0.0.0:2379
    member 589a7883a7ee56ec is healthy: got healthy result from http://0.0.0.0:2379
    member fcacfa3f23575abe is healthy: got healthy result from http://0.0.0.0:2379
    cluster is healthy

That’s it. Our etcd is now running as a cluster on three nodes. The below illustrates our architecture:

Deploying Galera Cluster

A minimum of 3 containers is recommended for a high availability setup. Thus, we are going to create 3 replicas to start with; the service can be scaled up and down afterwards (see the example below). Running standalone is also possible with the standard “docker run” command, as shown further down.
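Scaling later is a one-liner; for example, to grow the service we create below from 3 to 5 replicas:

$ docker service scale mysql-galera=5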

Before we start, it’s a good idea to remove any existing keys related to our cluster name in etcd:

$ etcdctl rm /galera/my_wsrep_cluster --recursive

Ephemeral Storage

This is the recommended way if you plan on scaling the cluster out to more nodes (or scaling back by removing nodes). To create a three-node Galera Cluster with ephemeral storage (the MySQL datadir will be lost if the container is removed), you can use the following command:

$ docker service create \
--name mysql-galera \
--replicas 3 \
-p 3306:3306 \
--network galera-net \
--env MYSQL_ROOT_PASSWORD=mypassword \
--env DISCOVERY_SERVICE=192.168.55.111:2379,192.168.55.112:2379,192.168.55.113:2379 \
--env XTRABACKUP_PASSWORD=mypassword \
--env CLUSTER_NAME=my_wsrep_cluster \
severalnines/pxc56

Persistent Storage

To create a three-node Galera Cluster with persistent storage (MySQL datadir persists if the container is removed), add the mount option with type=volume:

$ docker service create \
--name mysql-galera \
--replicas 3 \
-p 3306:3306 \
--network galera-net \
--mount type=volume,source=galera-vol,destination=/var/lib/mysql \
--env MYSQL_ROOT_PASSWORD=mypassword \
--env DISCOVERY_SERVICE=192.168.55.111:2379,192.168.55.112:2379,192.168.55.113:2379 \
--env XTRABACKUP_PASSWORD=mypassword \
--env CLUSTER_NAME=my_wsrep_cluster \
severalnines/pxc56
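If you use persistent storage, you can verify that the named volume has been created on each Docker host that runs a replica (a quick sanity check):

$ docker volume ls | grep galera-vol
$ docker volume inspect galera-vol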

Custom my.cnf

If you would like to include a customized MySQL configuration file, create a directory on the physical host beforehand:

$ mkdir /mnt/docker/mysql-config # repeat on all Docker hosts
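For illustration, a custom configuration file placed in this directory (my-custom.cnf in the example below) could contain a few overrides like these - the values are just placeholders:

[mysqld]
max_connections=500
innodb_buffer_pool_size=1G
innodb_flush_log_at_trx_commit=2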

Then, use the mount option with “type=bind” to map the path into the container. In the following example, the custom my.cnf is located at /mnt/docker/mysql-config/my-custom.cnf on each Docker host:

$ docker service create \
--name mysql-galera \
--replicas 3 \
-p 3306:3306 \
--network galera-net \
--mount type=volume,source=galera-vol,destination=/var/lib/mysql \
--mount type=bind,src=/mnt/docker/mysql-config,dst=/etc/my.cnf.d \
--env MYSQL_ROOT_PASSWORD=mypassword \
--env DISCOVERY_SERVICE=192.168.55.111:2379,192.168.55.112:2379,192.168.55.113:2379 \
--env XTRABACKUP_PASSWORD=mypassword \
--env CLUSTER_NAME=my_wsrep_cluster \
severalnines/pxc56

Wait for a couple of minutes and verify the service is running (CURRENT STATE = Running):

$ docker service ps mysql-galera
ID                         NAME            IMAGE               NODE           DESIRED STATE  CURRENT STATE           ERROR
2vw40cavru9w4crr4d2fg83j4  mysql-galera.1  severalnines/pxc56  docker1.local  Running        Running 5 minutes ago
1cw6jeyb966326xu68lsjqoe1  mysql-galera.2  severalnines/pxc56  docker3.local  Running        Running 12 seconds ago
753x1edjlspqxmte96f7pzxs1  mysql-galera.3  severalnines/pxc56  docker2.local  Running        Running 5 seconds ago

External applications/clients can connect to any Docker host IP address or hostname on port 3306; requests will be load balanced between the Galera containers. The connection gets NATed to a Virtual IP address for each service "task" (container, in this case) using the Linux kernel's built-in load balancing functionality, IPVS. If the application containers reside in the same overlay network (galera-net), use the assigned virtual IP address instead. You can retrieve it using the inspect option:

$ docker service inspect mysql-galera -f "{{ .Endpoint.VirtualIPs }}"
[{89n5idmdcswqqha7wcswbn6pw 10.255.0.2/16} {1ufbr56pyhhbkbgtgsfy9xkww 10.0.0.2/24}]
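For instance, an external client can connect through the published port on any of the Docker hosts (a quick sketch, reusing the root password defined when the service was created):

$ mysql -uroot -pmypassword -h 192.168.55.111 -P 3306 -e 'show status like "wsrep_cluster_size"'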

Our architecture is now looking like this:

As a side note, you can also run Galera in standalone mode. This is probably useful for testing purposes like backup and restore, testing the impact of queries and so on. To run it just like a standalone MySQL container, use the standard docker run command:

$ docker run -d \
-p 3306 \
--name=galera-single \
-e MYSQL_ROOT_PASSWORD=mypassword \
-e DISCOVERY_SERVICE=192.168.55.111:2379,192.168.55.112:2379,192.168.55.113:2379 \
-e CLUSTER_NAME=my_wsrep_cluster \
-e XTRABACKUP_PASSWORD=mypassword \
severalnines/pxc56

Scaling the Cluster

There are two ways you can do scaling:

  1. Use the “docker service scale” command.
  2. Create a new service with the same CLUSTER_NAME using the “docker service create” command.

Docker’s “scale” Command

The scale command enables you to scale one or more services either up or down to the desired number of replicas. The command will return immediately, but the actual scaling of the service may take some time. Galera should run on an odd number of nodes, so it can keep quorum in case of a network partition.

So a good number to scale to would be 5, then 7, and so on:

$ docker service scale mysql-galera=5

Wait for a couple of minutes to let the new containers reach the desired state. Then, verify the running service:

$ docker service ls
ID            NAME          REPLICAS  IMAGE               COMMAND
bwvwjg248i9u  mysql-galera  5/5       severalnines/pxc56

One drawback of using this method is that you have to use ephemeral storage, because Docker will likely schedule the new containers on a Docker host that already has a Galera container running. If this happens, the new volume will overlap with the existing Galera container’s volume. If you would like to use persistent storage and scale in Docker Swarm mode, you should create another new service with a couple of different options, as described in the next section.

At this point, our architecture looks like this:

Another Service with Same Cluster Name

Another way to scale is to create another service with the same CLUSTER_NAME and network. However, you can’t really use the exact same command as the first one due to the following reasons:

  • The service name should be unique.
  • The port mapping must be other than 3306, since this port has been assigned to the mysql-galera service.
  • The volume name should be different to distinguish them from the existing Galera containers.

A benefit of doing this is that you will get another virtual IP address assigned to the “scaled” service. This gives your application or client an additional endpoint to connect to for various tasks, e.g. performing a full backup in desync mode, a database consistency check or server auditing.

The following example shows the command to add two more nodes to the cluster in a new service called mysql-galera-scale:

$ docker service create \
--name mysql-galera-scale \
--replicas 2 \
-p 3307:3306 \
--network galera-net \
--mount type=volume,source=galera-scale-vol,destination=/var/lib/mysql \
--env MYSQL_ROOT_PASSWORD=mypassword \
--env DISCOVERY_SERVICE=192.168.55.111:2379,192.168.55.112:2379,192.168.55.113:2379 \
--env XTRABACKUP_PASSWORD=mypassword \
--env CLUSTER_NAME=my_wsrep_cluster \
severalnines/pxc56

If we look into the service list, here is what we see:

$ docker service ls
ID            NAME                REPLICAS  IMAGE               COMMAND
0ii5bedv15dh  mysql-galera-scale  2/2       severalnines/pxc56
71pyjdhfg9js  mysql-galera        3/3       severalnines/pxc56

And when you look at the cluster size on one of the containers, you should get 5:

[root@docker1 ~]# docker exec -it $(docker ps | grep mysql-galera | awk {'print $1'}) mysql -uroot -pmypassword -e 'show status like "wsrep_cluster_size"'
Warning: Using a password on the command line interface can be insecure.
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 5     |
+--------------------+-------+

At this point, our architecture looks like this:

To get a clearer view of the process, we can simply look at the MySQL error log file (located under Docker’s data volume) on one of the running containers, for example:

$ tail -f /var/lib/docker/volumes/galera-vol/_data/error.log

Scale Down

Scaling down is simple. Just reduce the number of replicas, or remove the service that holds the minority of containers, so that Galera stays in quorum. For example, if you have started two groups of nodes with 3 + 2 containers for a total of 5, the majority must survive, so you can only remove the second group with 2 containers. If you have three groups with 3 + 2 + 2 containers, you can lose a maximum of 3 containers. This is because the Docker Swarm scheduler simply terminates and removes the containers corresponding to the service, so Galera sees them as failed nodes - they are not shut down gracefully.

If you scaled up using the “docker service scale” command, you should scale down using the same method, by reducing the number of replicas:

$ docker service scale mysql-galera=3

Otherwise, if you chose to create another service to scale up, then simply remove the respective service to scale down:

$ docker service rm mysql-galera-scale
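Whichever method you use, it is worth confirming afterwards that the remaining containers still form a Primary component (a quick check, reusing the root password from the service definition):

$ docker exec -it $(docker ps | grep mysql-galera | awk {'print $1'}) mysql -uroot -pmypassword -e 'show status like "wsrep_cluster_status"'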

Known Limitations

There will be no automatic recovery if a split-brain happens (where all nodes end up in Non-Primary state). This is because the MySQL service is still running, yet it will refuse to serve any data and will return an error to the client. Docker has no capability to detect this, since all it cares about is the foreground MySQL process, which is not terminated, killed or stopped. Automating this recovery is risky, especially if the service discovery is co-located with the Docker hosts (etcd would also lose contact with the other members). Even if the service discovery is healthy somewhere else, it is probably unreachable from the Galera containers’ perspective, preventing the containers from seeing each other’s status correctly during the glitch.

In this case, you will need to intervene manually.
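To find the most advanced node, compare wsrep_last_committed across the surviving containers (run this on each Docker host and pick the one with the highest value):

$ docker exec -it [container ID] mysql -uroot -pyoursecret -e 'show status like "wsrep_last_committed"'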

Choose the most advanced node to bootstrap, then run the following command to promote the node to Primary (the other nodes will rejoin automatically once the network recovers):

$ docker exec -it [container ID] mysql -uroot -pyoursecret -e 'set global wsrep_provider_option="pc.bootstrap=1"'

Also, there is no automatic cleanup for the discovery service registry. You can remove all entries using either of the following commands (assuming the CLUSTER_NAME is my_wsrep_cluster):

$ curl http://192.168.55.111:2379/v2/keys/galera/my_wsrep_cluster?recursive=true -XDELETE # or
$ etcdctl rm /galera/my_wsrep_cluster --recursive

Conclusion

This combination of technologies opens the door to a more reliable database setup in the Docker ecosystem. Using service discovery to store cluster state makes it possible to run stateful containers and achieve a homogeneous setup.

In the next blog post, we are going to look into how to manage Galera Cluster on Docker.

Online schema change for MySQL & MariaDB - comparing GitHub’s gh-ost vs pt-online-schema-change


Database schema change is one of the most common activities that a MySQL DBA has to tackle. No matter if you use MySQL Replication or Galera Cluster, direct DDLs are troublesome and sometimes not feasible to execute. Add the requirement to perform the change while all databases are online, and it can get pretty daunting.

Thankfully, online schema tools are there to help DBAs deal with this problem. Arguably, the most popular of them is Percona’s pt-online-schema-change, which is part of Percona Toolkit.

It has been used by MySQL DBAs for years and is proven as a flexible and reliable tool. Unfortunately, not without drawbacks.

To understand these, we need to understand how it works internally.

How does pt-online-schema-change work?

Pt-online-schema-change works in a very simple way. It creates a temporary table with the desired new schema - for instance, if we added an index, or removed a column from a table. Then, it creates triggers on the old table - those triggers mirror changes that happen on the original table to the new table for the duration of the schema change. If a row is added to the original table, it is also added to the new one. Likewise, if a row is modified or deleted on the old table, it is also modified or deleted on the new table. Then, a background process of copying data (using LOW_PRIORITY INSERT) between the old and new table begins. Once the data has been copied, RENAME TABLE is executed to rename “yourtable” into “yourtable_old” and “yourtable_new” into “yourtable”. This is an atomic operation, and in case something goes wrong, it is possible to recover the old table.

The process described above has some limitations. For starters, it is not possible to reduce the overhead of the tool to zero. Pt-online-schema-change gives you an option to define the maximum allowed replication lag and, if that threshold is crossed, it stops copying data between the old and new table. It is also possible to pause the background process entirely. The problem is that we are talking only about the background process of running INSERTs. It is not possible to reduce the overhead caused by the fact that every operation on “yourtable” is duplicated in “yourtable_new” through triggers. If you removed the triggers, the old and new table would go out of sync without any means to sync them again. Therefore, when you run pt-online-schema-change on your system, it always adds some overhead, even if it is paused or throttled. How big that overhead is depends on how many writes hit the table which is undergoing the schema change.
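For reference, a typical pt-online-schema-change invocation with throttling and load-protection options could look like this (just a sketch - the table, credentials and thresholds below are assumptions, not values from this post):

$ pt-online-schema-change \
--alter "ADD INDEX idx_k (k)" \
--max-lag=5 \
--check-interval=2 \
--max-load="Threads_running=50" \
--critical-load="Threads_running=100" \
--chunk-size=1000 \
--execute \
h=127.0.0.1,D=sbtest1,t=sbtest1,u=root,p=mypassword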

Another issue is, again, caused by triggers - precisely by the fact that, to create triggers, one has to acquire a metadata lock in MySQL. This can become a serious problem if you have highly concurrent traffic or if you use longer transactions. Under such load, it may be virtually impossible (and we’ve seen such databases) to use pt-online-schema-change, because it is not able to acquire the metadata lock needed to create the required triggers. Additionally, waiting for the metadata lock can also block further transactions, basically grinding all database operations to a halt.

Yet another problem is foreign keys - unfortunately, there is no simple way of handling them. Pt-online-schema-change gives you two methods to approach this issue, and neither of them is really good. The main issue here is that a foreign key of a given name can only refer to a single table, and it sticks to it - even if you rename the referred table, the foreign key will follow the change. This leads to a problem: after RENAME TABLE, the foreign key will point to ‘yourtable_old’, not ‘yourtable’.

One workaround is to not use:

RENAME TABLE `yourtable` TO `yourtable_old`, `yourtable_new` TO `yourtable`;

Instead, use a two step approach:

DROP TABLE `yourtable`; RENAME TABLE `yourtable_new` TO `yourtable`;

This poses a serious problem. If, for some reason, RENAME TABLE does not work, there is no going back, as the original table has already been dropped.

Another approach would be to create a second foreign key, under a different name, which refers to ‘yourtable_new’. After RENAME TABLE, it will point to ‘yourtable’, which is exactly what we want. The thing is, you need to execute a direct ALTER to create such a foreign key - which somewhat defeats the point of using an online schema change tool in the first place, namely to avoid direct alters. If the altered table is large, such an operation is not feasible on Galera Cluster (it causes a cluster-wide stall through TOI) or in a MySQL replication setup (the serialized ALTER induces slave lag).
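In pt-online-schema-change, foreign key handling is exposed through the --alter-foreign-keys-method option, whose two main values are rebuild_constraints and drop_swap (roughly corresponding to the workarounds described above). A hedged example invocation - the table and credentials are placeholders:

$ pt-online-schema-change \
--alter "ADD COLUMN notes VARCHAR(255)" \
--alter-foreign-keys-method=rebuild_constraints \
--execute \
h=127.0.0.1,D=yourdb,t=yourtable,u=root,p=mypassword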

As you can see, while being a useful tool, pt-online-schema-change has serious limitations which you need to be aware of before you use it. If you use MySQL at scale, limitations may become a serious motivation to do something about it.

Introducing GitHub’s gh-ost

Motivation alone is not enough - you also need resources to create a new solution. GitHub recently released gh-ost, their take on online schema change. Let’s take a look at how it compares to Percona’s pt-online-schema-change and how it can be used to avoid some of its limitations.

To better understand the difference between the two tools, let’s take a look at how gh-ost works.

Gh-ost creates a temporary table with the altered schema, just like pt-online-schema-change does - it uses the “_yourtable_gho” naming pattern. It executes INSERT queries which use the following pattern to copy data from the old to the new table:

insert /* gh-ost `sbtest1`.`sbtest1` */ ignore into `sbtest1`.`_sbtest1_gho` (`id`, `k`, `c`, `pad`)
      (select `id`, `k`, `c`, `pad` from `sbtest1`.`sbtest1` force index (`PRIMARY`)
        where (((`id` > ?)) and ((`id` < ?) or ((`id` = ?)))) lock in share mode

As you can see, it is a variation of INSERT INTO new_table SELECT * FROM old_table. It uses the primary key to split the data into chunks and then works through them.

In pt-online-schema-change, ongoing traffic is handled using triggers. Gh-ost uses a triggerless approach - it relies on binary logs to track and apply changes which happened since gh-ost started to copy data. It connects to one of the hosts - by default one of the slaves - pretends to be a slave itself and asks for binary logs.

This behavior has a couple of repercussions. First of all, network traffic is increased compared to pt-online-schema-change - not only does gh-ost have to copy data, it also has to pull binary logs.

It also requires binary logs in row-based format for full data consistency - if you use statement or mixed replication, gh-ost won’t work in your setup. As a workaround, you can create a new slave, enable log_slave_updates and set it to store events in row format. Reading data from a slave is, actually, the default way in which gh-ost operates - it makes perfect sense as pulling binary logs adds some overhead and if you can avoid additional overhead on the master, you most likely want to do it. Of course, if your master uses row-based replication format, you can force gh-ost to connect to it and get binary logs.
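Before running gh-ost, you may want to check whether the host it will pull binary logs from already uses row-based logging (and, on a replica, has log_slave_updates enabled) - replica_host below is a placeholder:

$ mysql -h replica_host -uroot -p -e "show global variables like 'binlog_format'; show global variables like 'log_slave_updates'"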

What is good about this design is that you don’t have to create triggers, which, as we discussed, could become a serious problem or even a blocker. What is also great is that you can always stop parsing binary logs - it’s as if you just ran STOP SLAVE. You have the binlog coordinates, so you can easily resume from the same position later on. This makes it possible to stop practically all operations executed by gh-ost - not only the background process of copying data from the old to the new table, but also any load related to keeping the new table in sync with the old one. This is a great feature in a production environment - pt-online-schema-change requires constant monitoring, as you can only estimate the additional load on the system. Even if you pause it, it will still add some overhead and, under heavy load, this overhead may result in an unstable database. With gh-ost, on the other hand, you can just pause the whole process and the workload pattern will go back to what you are used to seeing - no additional load whatsoever related to the schema change. This is really great - it means you can start the migration at 9am when you begin your day, and stop it at 5pm when you leave the office. You can be sure that you won’t get paged late at night because a paused schema change process is not actually 100% paused and is causing problems for your production systems.
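Putting it together, a gh-ost run that reads binary logs from a replica and can be throttled or postponed through flag files might look like this (a sketch only - host names, credentials and file paths are assumptions):

$ gh-ost \
--host=replica_host \
--user=ghost \
--password=mypassword \
--database=sbtest1 \
--table=sbtest1 \
--alter="ADD INDEX idx_k (k)" \
--chunk-size=1000 \
--max-lag-millis=1500 \
--throttle-flag-file=/tmp/gh-ost.throttle \
--postpone-cut-over-flag-file=/tmp/gh-ost.postpone \
--execute

While the throttle flag file exists, gh-ost throttles its work; removing it resumes the migration, and the final cut-over is postponed for as long as the postpone flag file is present.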

Unfortunately, gh-ost is not without drawbacks. For starters, foreign keys. Pt-online-schema-change does not provide any good way of altering tables which contain foreign keys, but it is still ahead of gh-ost, as gh-ost does not support foreign keys at all - at the time of writing, that is; it may change in the future. The same goes for triggers: gh-ost, at the time of writing, does not support them at all. Neither does pt-online-schema-change - this stems from a limitation of pre-5.7 MySQL, where you couldn’t have more than one trigger of a given type defined on a table (and pt-online-schema-change had to create triggers for its own purposes). Even though that limitation has been removed in MySQL 5.7, pt-online-schema-change still does not support tables with triggers.

One of the main limitations of gh-ost is, definitely, the fact that it does not support Galera Cluster. This is because of how gh-ost performs the table switch - it uses LOCK TABLE, which does not work well with Galera. As of now, there is no known fix or workaround for this issue, which leaves pt-online-schema-change as the only option for Galera Cluster.

These are probably the most important limitations of gh-ost, but there are more. The minimal row image format is not supported (which makes your binlogs grow larger), and neither are JSON and generated columns in MySQL 5.7. The migration key must not contain NULL values, and there are limitations when it comes to mixed-case table names. You can find more details on all requirements and limitations of gh-ost in its documentation.

In our next blog post, we will take a look at how gh-ost operates, how you can test your changes and how to perform them. We will also discuss throttling in gh-ost.

Become a ClusterControl DBA - SSL Key Management and Encryption of MySQL Data in Transit


Databases usually work in a secure environment. It may be a datacenter with a dedicated VLAN for database traffic. It may be a VPC in EC2. If your network spreads across multiple datacenters in different regions, you’d usually use some kind of Virtual Private Network or SSH tunneling to connect these locations in a secure manner. With data privacy and security being hot topics these days, you might feel better with an additional layer of security.

MySQL supports SSL as a means to encrypt traffic both between MySQL servers (replication) and between MySQL servers and clients. If you use Galera cluster, similar features are available - both intra-cluster communication and connections with clients can be encrypted using SSL.
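For reference, when configured by hand this boils down to pointing both MySQL and the Galera provider at the certificate files, roughly along these lines (a simplified my.cnf sketch - the paths are placeholders, and additional settings such as SST encryption are not shown):

[mysqld]
ssl-ca=/etc/mysql/certs/ca.pem
ssl-cert=/etc/mysql/certs/server-cert.pem
ssl-key=/etc/mysql/certs/server-key.pem
wsrep_provider_options="socket.ssl_ca=/etc/mysql/certs/ca.pem;socket.ssl_cert=/etc/mysql/certs/server-cert.pem;socket.ssl_key=/etc/mysql/certs/server-key.pem"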

A common way of implementing SSL encryption is to use self-signed certificates. Most of the time, it is not necessary to purchase an SSL certificate issued by a Certificate Authority. Anybody who’s been through the process of generating a self-signed certificate will probably agree that it is not the most straightforward task - most of the time, you end up searching the internet for howtos and instructions. This is especially true if you are a DBA and only go through this process every few months or even years. This is why we recently added a new ClusterControl feature designed to help you manage SSL keys across your database cluster. We based this blog post on ClusterControl 1.3.1.

Key Management in ClusterControl

You can access Key Management by going to the Settings->Key Management section.

You will be presented with the following screen:

You can see two certificates generated, one being a CA and the other one a regular certificate. To generate more certificates, switch to the ‘Generate Key’ tab:

A certificate can be generated in two ways - you can first create a self-signed CA and then use it to sign a certificate. Or you can go directly to the ‘Client/Server Certificates and Key’ tab and create a certificate. The required CA will be created for you in the background. Last but not least, you can import an existing certificate (for example a certificate you bought from one of many companies which sell SSL certificates).

To do that, you should upload your certificate, key and CA certificate to your ClusterControl node and store them in the /var/lib/cmon/ca directory. Then fill in the paths to those files and the certificate will be imported.

If you decide to generate a CA or a new certificate, there’s another form to fill in - you need to provide details about your organization, the common name and email, and pick the key length and expiration date.

Once you have everything in place, you can start using your new certificates. ClusterControl currently supports deployment of SSL encryption between clients and MySQL databases, and SSL encryption of intra-cluster traffic in Galera Cluster. We plan to extend the variety of supported deployments in future releases of ClusterControl.

Full SSL encryption for Galera Cluster

Now let’s assume we have our SSL keys ready and we have a Galera Cluster, which needs SSL encryption, deployed through our ClusterControl instance. We can easily secure it in two steps.

First - encrypt Galera traffic using SSL. From your cluster view, one of the cluster actions is ‘Enable Galera SSL Encryption’. You’ll be presented with the following options:

If you do not have a certificate, you can generate it here. But if you already generated or imported an SSL certificate, you should be able to see it in the list and use it to encrypt Galera replication traffic. Please keep in mind that this operation requires a cluster restart - all nodes will have to stop at the same time, apply config changes and then restart. Before you proceed here, make sure you are prepared for some downtime while the cluster restarts.

Once intra-cluster traffic has been secured, we want to cover client-server connections. To do that, pick the ‘Enable SSL Encryption’ job and you’ll see the following dialog:

It’s pretty similar - you can either pick an existing certificate or generate a new one. The main difference is that applying client-server encryption does not require downtime - a rolling restart will suffice.

Of course, enabling SSL on the database is not enough - you have to copy the certificates to the clients which are supposed to use SSL to connect to the database. All certificates can be found in the /var/lib/cmon/ca directory on the ClusterControl node. You also have to remember to adjust the grants for your users and make sure you’ve added REQUIRE SSL to them if you want to enforce secure connections only.
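For example, requiring SSL for an existing application user and then connecting with the copied certificates could look like this (the user name and file paths are placeholders):

$ mysql -uroot -p -e "GRANT USAGE ON *.* TO 'app'@'%' REQUIRE SSL;"
$ mysql -h 192.168.55.111 -u app -p \
--ssl-ca=/path/to/ca.pem \
--ssl-cert=/path/to/client-cert.pem \
--ssl-key=/path/to/client-key.pem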

We hope you’ll find those options easy to use and help you secure your MySQL environment. If you have any questions or suggestions regarding this feature, we’d love to hear from you.
