Getting MySQL Replication to work for Drupal 6 - How I Did It
Last update: March 19, 2012
Instructions:
(Note: This HowTo shares what I did to get MySQL Replication working for me for multi-site Drupal 6.19 running on top of IPSConfig2 on Debian Lenny. This site, and a bunch of other sites I maintain, are running Drupal on 6 geographically-diverse, provider-diverse webservers in a load-balanced configuration by using Multi-master MySQL Replication. I cannot guarantee that it will resolve issues for your configuration, but it surely worked great for me
. Your results may vary.)I was reading and following an extremely thorough and informative HowTo from HowToForge on setting up MySQL Master-Slave Replication, and though it is extremely detailed, it would not work for my Drupal sites. (Make sure you read it as it contains some good basic details on setting up MySQL Replication. Also, check the updates at the bottom of this article as well).
- After meticulously following every step, Replication would start successfully and immediately error out and stop ... on every Drupal site!
- Specifically, MySQL Replication under Drupal often seems to lead to duplicate key errors for INSERT statements, and they look something like this one:
Error 'Duplicate entry '21:9cca7f33b555fa99b9c9c6b5a2b5b639' for key 1' on query. Default database: 'some_database'. Query: 'INSERT INTO cache_filter (cid, data, created, expire, headers) VALUES ('21:9cca7f33b555fa99b9c9c6b5a2b5b639',
... etc.
- To solve the first problem (i.e., special MySQL Replication considerations for Drupal), edit your /etc/mysql/my.cnf (at least that is the config file on Debian) and consider adding the following highlighted sections:
- Make sure to replace "DrupalDB" with your actual Drupal database name (example: webXdbY if you followed the How To referenced above).
- Use binlog-ignore-db for databases you don't want to replicate
- The replicate-wild-ignore-table stops replication of temporary data. For Drupal, you have to stop the replication of cache and watchdog data, otherwise you are likely to have replication errors almost immediately.
- The replicate-ignore-table stops replication of the selected table. In Drupal, you don't want to replicate temporary tables used by Search as that is not needed and can chew up important bandwidth and CPU
- auto_increment_increment and auto_increment_offset are optional but may be very useful if you may want to eventually try "master-master" replication. They supposedly ensure that there is no duplication or conflict in replicated records. This means that the auto_inc columns will all increment by a block of 5 (in this example) and this server will be offset #1 into that block. The other server should be setup the same way EXCEPT that the offset needs to be different (for example, choose 2 for your second server).
- To resolve the second problem (MySQLDump hanging on a database):
- You do not need to READ LOCK the tables to do the dump (that WILL effectively stop access to the databases while locked).
- So, instead of doing this in MySQL:
- You can do this at the commandline (shell) prompt:
- You can then do (at MySQL prompt; CAUTION, this deletes your database!):
- And to load the data onto your replica, at commandline (shell):
"Master" Server:
[mysqld]
log-bin=mysql-bin
# Make sure to replace "DrupalDB" with your actual Drupal database name.
binlog-do-db=DrupalDB
# A database you do NOT want to replicate
binlog-ignore-db=DontReplicateThisDB
# Don't replicate the "mysql" database itself (i.e., local MySQL config)
binlog-ignore-db=mysql
server-id=1
sync_binlog=1
#Minimizes the number of changes you lose if a problem occurs down to 1, reference
#max_allowed_packet = 16M
max_allowed_packet = 32M
#
# Don't replicate Drupal cache, watchdog, temporary search data
replicate-wild-ignore-table=DrupalDB%.cache%
replicate-wild-ignore-table=DrupalDB%.watchdog%
replicate-ignore-table=DrupalDB.temp_search_sids
replicate-ignore-table=DrupalDB.temp_search_results
# I have 5 servers in this group of replicators
auto_increment_increment = 5
# This server is server #1 of the 5
auto_increment_offset = 1
# You could choose to skip only certain MySQL errors; I skip all and I have not experienced any ill effects in months of heavy usage.
slave-skip-errors=all
Secondary or "Slave" Server:
[mysqld]
# This server is server #2 of the 5
server-id = 2
# I DO want to replicate this database
replicate-do-db = DrupalDB
sync_binlog = 1
#Minimizes the number of changes you lose if a problem occurs down to 1, reference
#
# I have 5 servers in this group of replicators
auto_increment_increment= 5
# This server is server #2 of the 5
auto_increment_offset = 2
slave-skip-errors = all
#
# Don't replicate Drupal cache, watchdog, temporary search data
replicate-wild-ignore-table=DrupalDB%.cache%
replicate-wild-ignore-table=DrupalDB%.watchdog%
replicate-ignore-table=DrupalDB.temp_search_sids
replicate-ignore-table=DrupalDB.temp_search_results
#
USE exampledb;
FLUSH TABLES WITH READ LOCK;
SHOW MASTER STATUS;
$mysqldump -uroot -pYourMySQLRootPassword --master-data --opt exampledb > dumpfile.sql
That little "--master-data" option is key. It causes the dump output to include a CHANGE MASTER TO statement that indicates the binary log coordinates (file name and position) of the dumped database. These are the master server coordinates from which the slave should start replicating.
MySQL>DROP DATABASE exampledb; CREATE DATABASE exampledb;
$mysql -uroot -pYourMySQLRootPassword exampledb < dumpfile.sql
This worked perfectly on that pesky file for me!
Some Tips:
- Since READ LOCK prevents access to your database and some of you may be forced to do some of this to a "live" database, I would recommend you reorder the instructions in the HowTo linked above, to remove the READ LOCK after the dump, before copying the Dump files from Master to Slave, so that you reduce the offline time to the bare minimum necessary and the database isn't locked while you are waiting for an scp copy to finish, especially if the servers are geographically remote to each other and you are copying a lot of data.)
- If you receive a "could not initialize master info structure, more error messages can be found in the MySQL error log", try this in MySQL:
- MySQL> reset slave;
UPDATE March 19, 2012:
- Added explanation for sync_binlog = 1
UPDATE January 24, 2012:
- Consider setting your connection timeout to a higher value if one or more of your servers has a slow network link and keeps losing connection to their master server. You might try this especially if you keep getting message like this in the log:
- Within the MySQL environment, you can do this via: MySQL> SET @@global.connect_timeout=30;
- OR within /etc/mysql/my.cnf config file, [mysqld] section: connect_timeout=30
-
[ERROR] Slave I/O thread: error reconnecting to master 'mysqluser@mysqlpassword:3306': Error: 'Lost connection to MySQL server at 'reading initial communication packet', system error: 111' errno: 2013 retry-time: 60 retries: 86400
UPDATE July 9, 2011:
- I have been using "Multi-Master" MySQL Replication with Drupal6 for 7 months now; now across at least 6 different providers (including at least 4 OpenVZ VPS's & 2 VMware® virtual machines) and it has been rock solid!
- In fact, the MySQL Replication is also working for our Piwik Analytics, and for multiple Content Delivery Networks (one using Drupal, another using Perl, and another using PHP).
- We will also migrate to a MySQL based DNS solution soon as well that will also leverage replication. The end goal is to setup our own "closest server to you" load balancing, failover, and high availability.
- I will post How I Did it (for Multi-Master MySQL Replication) soon (the "Master-Slave" instructions are below).
UPDATE July 7, 2011:
- Added a recommendation to consider using slave_compressed_protocol=1 to my.cnf
UPDATE January 28, 2011:
- I have now been testing Multiple Master MySQL Replication with Drupal 6 for about a month now and it has been working GREAT! It involves 3 Primaries ("Masters") and 2 Secondaries.
UPDATE December 27, 2010 ... some observations:
- MySQL Replication has been working GREAT for numerous Drupal sites on 5 geographically diverse Servers (1 Primary + 4 Secondaries) across 5 different Providers/ISPs over the Internet!
- I plan to upgrade these servers to "Multi Master MySQL Replication" soon. I will keep you posted on how that goes.
- One key observation: For Drupal Menu changes to show up on your secondaries, you need to clear the cache on the secondaries.
- i.e., go to "/admin/settings/performance" on each replica server and click the button at the bottom that says "Clear Cached Data"
- Also, be careful, MySQL Replication can generate a LOT of traffic! Especially if you have multiple secondaries. And, since my servers are replicating via TLS/SSL Encryption, it also increases CPU usage on all servers involved.
- So, if you have sites that you must do daily uploads or imports into like I do, I highly recommend that you set QoS on your networks to prioritize MySQL (TCP Port 3306) to a LOWER priority than your other traffic (especially your web traffic!) ... otherwise, your Primary can grind to a halt while all the secondaries grab their updates
- Consider adding slave_compressed_protocol=1 to your /etc/mysql/my.cnf, as it enables compression and should help to reduce traffic between your various MySQL instances.
- Within the MySQL environment, you can do this via: MySQL> SET @@global.slave_compressed_protocol=1;
- Of course, test to make sure that compression works properly for you


Mysql replication is much
Mysql replication is much better in new Drupal 7 than in Drupal 6 but thanks for the guide
patch
Hello,
you can find the patch here
is the same system master / slave on pressflow.
Enjoi
Thanks but I am not a fan of
IMHO Replication is a functionality of the database infrastructure and I do not want to have to make application changes in order for it work. (It is hard enough getting the application layer to work properly without having to rely on yet another patch
What I have outlined in the HowTo above works perfectly without any changes or patches to Drupal!
Could you show me some direction for D7 MySQL replication ?
Hi, Could you tell me more about D7 MySQL replication method ?
My email : hai...@sutrixmedia.com
Thanks in advance !
I will create Drupal 7 Replication how to's
I still have painful memories of Drupal 5 to Drupal 6 upgrade (I guess it is too late to write a how to about my findings? lol) when Modules were replaced by a "similar" module (e.g. FeedAPI -> Feeds), simply changed their behavior (e.g. Scheduler), were "broken" (e.g. Node Import and Date fields) and/or had major rewrites (e.g. Views).
On the positive side, I learned more about Drupal in doing the Drupal 5 to Drupal 6 upgrade than any other time.
Open Source software is AWESOME (and low purchase price), but you don't usually control the feature set or direction unless you are one of the developers.
Why cache_filter breaks
Just so readers understand why most tables replicate, and cache_filter breaks.
The auto_increment_increment = 5 and auto_increment_offset = 1 values start off server #1 with "1" for ALL auto_increment tables.
It also makes it count by fivesies as it goes up.
So your IDs on auto_increment fields on server #1 are: 1, 6, 11, 16, 21, ...
On #2, you use auto_increment_offset 2 and it gets: 2, 7, 12, 17, 22, ...
On #3, you use auto_increment_offset 3 and it gets: 3, 8, 13, 18, 23, ...
The remaining servers are left as an exercise for the reader. :-)
The cache_filter table does NOT use an auto_increment field as the unique ID. It doesn't even have an auto_increment field.
Instead, it takes the ID of the node being cached, and some kind of hash, probably involving 'data', and hand-crafts a "unique" ID. Unfortunately, the same algorithm is applied on EVERY server, so you end up with duplicate keys, because the ID and the 'data' match, and so the hash matches, and so the key matches.
Drupal core developers who are serious about scalability would be wise to consider developing a drupal_guid() function that would generate unique GUID (Globally Unique IDs) across a distributed multi-write database setup. This will NOT be trivial, and would require some unique machine_id at the database layer. Although possibly it would be easier to constrain the front-ends to have a unique machine_id...
Anyway, the replication failure for the tables NOT using auto_increment is kind of a face-palm once you realize how it all works and fits together. Hopefully this comment de-mystified at least one tiny piece of that. :-)
Thank you for this comment
The other interesting thing that I have seen and the ONLY thing that doesn't replicate in this setup is the REMOVAL of items from the menu (Primary, Secondary, or Navigation) ... you have to go to each replica and clear the cache to see the menu changes.
Watchdog
PS
I'm not so sure you would want to not replicate watchdog.
It costs bandwidth in a big way, but is purned by cron.
More importantly, it uses an auto_increment, so should be immune to ID conflicts.
And, CRITICAL, is that if your MASTER database crashes and burns due to a bug in MySQL (as unlikely as that may be) your only forensic evidence may exist in that Slave watchdog table...
PS
You may want to consider using auto_increment_increment = 20, as that will make it easier to expand to, say, 20 servers.
It is an upper bound on the number of servers you can have.
Well, okay, you can shut down ALL your databases, and edit the my.cnf files and change them all to 6 when you have 6 servers, and again to 7 for 7, and so on... That's probably not what you really want for "high availability".
Of course your run this risk of running through your INT (or BIGINT) limit 20 times as fast. But if you manage to get THAT many records in a Drupal server, you are going to have much bigger performance problems LONG before you get there, and you can address the ID limit at that time.
PPS
Your CAPTCHA is annoyingly difficult.
Interesting ... I haven't run
Funny you mention the auto_increment, because I already increased mine to 10 as, at one time I had 7 replicas going of these sites.
And yes, the Captcha is annoyingly difficult because I got tired of the BS posts (through ReCaptcha AND the current Image Captcha) from (what I thought was bots) in India and China ... turns it out it might be humans actually employed to manually type in SPAM on websites
I appreciate you leaving the comments in spite of the difficult captcha!!
Wildcard characters
Technically, you should use \_ every place you have _ in wildcard ignore.
_ matches ANY single character.
While it's unlikely you will ever have a Drupal table named 'tempAsearchBsids' that table WOULD match your pattern...
Post new comment