Getting MySQL Replication to work for Drupal 6 - How I Did It

in Drupal Technical Discussion (PUBLIC)

Last update: March 19, 2012

Instructions:

(Note: This HowTo shares what I did to get MySQL Replication working for me for multi-site Drupal 6.19 running on top of IPSConfig2 on Debian Lenny. This site, and a bunch of other sites I maintain, are running Drupal on 6 geographically-diverse, provider-diverse webservers in a load-balanced configuration by using Multi-master MySQL Replication. I cannot guarantee that it will resolve issues for your configuration, but it surely worked great for me Smile. Your results may vary.)

I was reading and following an extremely thorough and informative HowTo from HowToForge on setting up MySQL Master-Slave Replication, and though it is extremely detailed, it would not work for my Drupal sites. (Make sure you read it as it contains some good basic details on setting up MySQL Replication.   Also, check the updates at the bottom of this article as well).

  • After meticulously following every step, Replication would start successfully and immediately error out and stop ... on every Drupal site!
    • Specifically, MySQL Replication under Drupal often seems to lead to duplicate key errors for INSERT statements, and they look something like this one:

Error 'Duplicate entry '21:9cca7f33b555fa99b9c9c6b5a2b5b639' for key 1' on query. Default database: 'some_database'. Query: 'INSERT INTO cache_filter (cid, data, created, expire, headers) VALUES ('21:9cca7f33b555fa99b9c9c6b5a2b5b639',
... etc.

According to some information I have seen, this occurs more often under heavier load.
  • The second problem was that one of the databases would hang MySQLDump! It would just freeze. No error, no crash, no messages ... it just HUNG. It wasn't the largest database, in fact there was nothing remarkable about the database versus the others I had to replicate. And it passed all the integrity checks I ran on it.
  • Here is what I did to get MySQL Replication working and to resolve the MySQLDump hang:
    • To solve the first problem (i.e., special MySQL Replication considerations for Drupal), edit your /etc/mysql/my.cnf (at least that is the config file on Debian) and consider adding the following highlighted sections:
      • "Master" Server:
        [mysqld]
        log-bin=mysql-bin
           # Make sure to replace "DrupalDB" with your actual Drupal database name.
        binlog-do-db=DrupalDB
           # A database you do NOT want to replicate
        binlog-ignore-db=DontReplicateThisDB
           # Don't replicate the "mysql" database itself (i.e., local MySQL config)
        binlog-ignore-db=mysql
        server-id=1
        sync_binlog=1
        #Minimizes the number of changes you lose if a problem occurs down to 1, reference
        #max_allowed_packet = 16M
        max_allowed_packet = 32M
        #
           # Don't replicate Drupal cache, watchdog, temporary search data
        replicate-wild-ignore-table=DrupalDB%.cache%
        replicate-wild-ignore-table=DrupalDB%.watchdog%
        replicate-ignore-table=DrupalDB.temp_search_sids
        replicate-ignore-table=DrupalDB.temp_search_results
            # I have 5 servers in this group of replicators
        auto_increment_increment = 5
            # This server is server #1 of the 5
        auto_increment_offset = 1
            # You could choose to skip only certain MySQL errors; I skip all and I have not experienced any ill effects in months of heavy usage.
        slave-skip-errors=all


        Secondary or "Slave" Server:
        [mysqld]
            # This server is server #2 of the 5
        server-id = 2
            # I DO want to replicate this database
        replicate-do-db = DrupalDB
        sync_binlog = 1
        #Minimizes the number of changes you lose if a problem occurs down to 1, reference
        #
            # I have 5 servers in this group of replicators
        auto_increment_increment= 5
            # This server is server #2 of the 5
        auto_increment_offset = 2
        slave-skip-errors = all
        #
           # Don't replicate Drupal cache, watchdog, temporary search data
        replicate-wild-ignore-table=DrupalDB%.cache%
        replicate-wild-ignore-table=DrupalDB%.watchdog%
        replicate-ignore-table=DrupalDB.temp_search_sids
        replicate-ignore-table=DrupalDB.temp_search_results
        #


      • Make sure to replace "DrupalDB" with your actual Drupal database name (example: webXdbY if you followed the How To referenced above).
      • Use binlog-ignore-db for databases you don't want to replicate
      • The replicate-wild-ignore-table stops replication of temporary data. For Drupal, you have to stop the replication of cache and watchdog data, otherwise you are likely to have replication errors almost immediately.
      • The replicate-ignore-table stops replication of the selected table. In Drupal, you don't want to replicate temporary tables used by Search as that is not needed and can chew up important bandwidth and CPU
      • auto_increment_increment and auto_increment_offset are optional but may be very useful if you may want to eventually try "master-master" replication. They supposedly ensure that there is no duplication or conflict in replicated records. This means that the auto_inc columns will all increment by a block of 5 (in this example) and this server will be offset #1 into that block. The other server should be setup the same way EXCEPT that the offset needs to be different (for example, choose 2 for your second server).

    • To resolve the second problem (MySQLDump hanging on a database):
      • You do not need to READ LOCK the tables to do the dump (that WILL effectively stop access to the databases while locked).
      • So, instead of doing this in MySQL:
      • USE exampledb;
        FLUSH TABLES WITH READ LOCK;
        SHOW MASTER STATUS;


      • You can do this at the commandline (shell) prompt:
      • $mysqldump -uroot -pYourMySQLRootPassword --master-data --opt exampledb > dumpfile.sql


        That little "--master-data" option is key. It causes the dump output to include a CHANGE MASTER TO statement that indicates the binary log coordinates (file name and position) of the dumped database. These are the master server coordinates from which the slave should start replicating.

      • You can then do (at MySQL prompt; CAUTION, this deletes your database!):
      • MySQL>DROP DATABASE exampledb; CREATE DATABASE exampledb;


      • And to load the data onto your replica, at commandline (shell):
      • $mysql -uroot -pYourMySQLRootPassword exampledb < dumpfile.sql


        This worked perfectly on that pesky file for me!

    Your results may vary but I hope this helps you like it helped me!
    Some Tips:
    • Since READ LOCK prevents access to your database and some of you may be forced to do some of this to a "live" database, I would recommend you reorder the instructions in the HowTo linked above, to remove the READ LOCK after the dump, before copying the Dump files from Master to Slave, so that you reduce the offline time to the bare minimum necessary and the database isn't locked while you are waiting for an scp copy to finish, especially if the servers are geographically remote to each other and you are copying a lot of data.)
    • If you receive a "could not initialize master info structure, more error messages can be found in the MySQL error log", try this in MySQL:
      • MySQL> reset slave;

    UPDATE March 19, 2012:

    • Added explanation for sync_binlog = 1

    UPDATE January 24, 2012:

    • Consider setting your connection timeout to a higher value if one or more of your servers has a slow network link and keeps losing connection to their master server. You might try this especially if you keep getting message like this in the log:
      • [ERROR] Slave I/O thread: error reconnecting to master 'mysqluser@mysqlpassword:3306': Error: 'Lost connection to MySQL server at 'reading initial communication packet', system error: 111' errno: 2013 retry-time: 60 retries: 86400
      • Within the MySQL environment, you can do this via: MySQL> SET @@global.connect_timeout=30;
      • OR within /etc/mysql/my.cnf config file, [mysqld] section: connect_timeout=30

    UPDATE July 9, 2011:

    • I have been using "Multi-Master" MySQL Replication with Drupal6 for 7 months now; now across at least 6 different providers (including at least 4 OpenVZ VPS's & 2 VMware® virtual machines) and it has been rock solid!
    • In fact, the MySQL Replication is also working for our Piwik Analytics, and for multiple Content Delivery Networks (one using Drupal, another using Perl, and another using PHP).
    • We will also migrate to a MySQL based DNS solution soon as well that will also leverage replication. The end goal is to setup our own "closest server to you" load balancing, failover, and high availability.
    • I will post How I Did it (for Multi-Master MySQL Replication) soon (the "Master-Slave" instructions are below).

    UPDATE July 7, 2011:

    UPDATE January 28, 2011:

    • I have now been testing Multiple Master MySQL Replication with Drupal 6 for about a month now and it has been working GREAT! It involves 3 Primaries ("Masters") and 2 Secondaries.

    UPDATE December 27, 2010 ... some observations:

    • MySQL Replication has been working GREAT for numerous Drupal sites on 5 geographically diverse Servers (1 Primary + 4 Secondaries) across 5 different Providers/ISPs over the Internet!
    • I plan to upgrade these servers to "Multi Master MySQL Replication" soon. I will keep you posted on how that goes.
    • One key observation: For Drupal Menu changes to show up on your secondaries, you need to clear the cache on the secondaries.
      • i.e., go to "/admin/settings/performance" on each replica server and click the button at the bottom that says "Clear Cached Data"
    • Also, be careful, MySQL Replication can generate a LOT of traffic! Especially if you have multiple secondaries. And, since my servers are replicating via TLS/SSL Encryption, it also increases CPU usage on all servers involved.
    Your rating: None Average: 4.5 (22 votes)

    Comment viewing options

    Select your preferred way to display the comments and click "Save settings" to activate your changes.

    Mysql replication is much

    0

    Mysql replication is much better in new Drupal 7 than in Drupal 6 but thanks for the guide

    patch

    0

    Hello,

    you can find the patch here

    is the same system master / slave on pressflow.

    Enjoi

    Thanks but I am not a fan of

    0
    Thanks but I am not a fan of having to patch anything in Drupal in order to get Replication working.

    IMHO Replication is a functionality of the database infrastructure and I do not want to have to make application changes in order for it work. (It is hard enough getting the application layer to work properly without having to rely on yet another patch Worried )

    What I have outlined in the HowTo above works perfectly without any changes or patches to Drupal!

    Could you show me some direction for D7 MySQL replication ?

    0

    Hi, Could you tell me more about D7 MySQL replication method ?
    My email : hai...@sutrixmedia.com
    Thanks in advance !

    I will create Drupal 7 Replication how to's

    0
    I will create Drupal 7 Replication how to's once we have tested it thoroughly enough. To be honest, I am not usually in a hurry to upgrade until a lot of the kinks are worked out; and I am too busy trying to make money to write how to's (hey, I said I was being honest. Yes Animated )

    I still have painful memories of Drupal 5 to Drupal 6 upgrade (I guess it is too late to write a how to about my findings? lol) when Modules were replaced by a "similar" module (e.g. FeedAPI -> Feeds), simply changed their behavior (e.g. Scheduler), were "broken" (e.g. Node Import and Date fields) and/or had major rewrites (e.g. Views).

    On the positive side, I learned more about Drupal in doing the Drupal 5 to Drupal 6 upgrade than any other time.

    Open Source software is AWESOME (and low purchase price), but you don't usually control the feature set or direction unless you are one of the developers.

    Why cache_filter breaks

    0

    Just so readers understand why most tables replicate, and cache_filter breaks.
    The auto_increment_increment = 5 and auto_increment_offset = 1 values start off server #1 with "1" for ALL auto_increment tables.
    It also makes it count by fivesies as it goes up.
    So your IDs on auto_increment fields on server #1 are: 1, 6, 11, 16, 21, ...
    On #2, you use auto_increment_offset 2 and it gets: 2, 7, 12, 17, 22, ...
    On #3, you use auto_increment_offset 3 and it gets: 3, 8, 13, 18, 23, ...
    The remaining servers are left as an exercise for the reader. :-)
     
    The cache_filter table does NOT use an auto_increment field as the unique ID.  It doesn't even have an auto_increment field.
    Instead, it takes the ID of the node being cached, and some kind of hash, probably involving 'data', and hand-crafts a "unique" ID. Unfortunately, the same algorithm is applied on EVERY server, so you end up with duplicate keys, because the ID and the 'data' match, and so the hash matches, and so the key matches.
     
    Drupal core developers who are serious about scalability would be wise to consider developing a drupal_guid() function that would generate unique GUID (Globally Unique IDs) across a distributed multi-write database setup.  This will NOT be trivial, and would require some unique machine_id at the database layer.  Although possibly it would be easier to constrain the front-ends to have a unique machine_id...
     
    Anyway, the replication failure for the tables NOT using auto_increment is kind of a face-palm once you realize how it all works and fits together. Hopefully this comment de-mystified at least one tiny piece of that. :-)
     

    Thank you for this comment

    0
    Thank you for this comment and excellent information!

    The other interesting thing that I have seen and the ONLY thing that doesn't replicate in this setup is the REMOVAL of items from the menu (Primary, Secondary, or Navigation) ... you have to go to each replica and clear the cache to see the menu changes.

    Watchdog

    0

    PS
    I'm not so sure you would want to not replicate watchdog.
    It costs bandwidth in a big way, but is purned by cron.
    More importantly, it uses an auto_increment, so should be immune to ID conflicts.
    And, CRITICAL, is that if your MASTER database crashes and burns due to a bug in MySQL (as unlikely as that may be) your only forensic evidence may exist in that Slave watchdog table...
    PS
    You may want to consider using auto_increment_increment = 20, as that will make it easier to expand to, say, 20 servers.
    It is an upper bound on the number of servers you can have.
    Well, okay, you can shut down ALL your databases, and edit the my.cnf files and change them all to 6 when you have 6 servers, and again to 7 for 7, and so on...  That's probably not what you really want for "high availability".
    Of course your run this risk of running through your INT (or BIGINT) limit 20 times as fast.  But if you manage to get THAT many records in a Drupal server, you are going to have much bigger performance problems LONG before you get there, and you can address the ID limit at that time.
     
    PPS
    Your CAPTCHA is annoyingly difficult.
     

    Interesting ... I haven't run

    0
    Interesting ... I haven't run into any issues doing without replicating watchdog; but I'll definitely consider this. Bandwidth consumption is a concern too though. I'll try it and see how it is.

    Funny you mention the auto_increment, because I already increased mine to 10 as, at one time I had 7 replicas going of these sites.

    And yes, the Captcha is annoyingly difficult because I got tired of the BS posts (through ReCaptcha AND the current Image Captcha) from (what I thought was bots) in India and China ... turns it out it might be humans actually employed to manually type in SPAM on websites Eek

    I appreciate you leaving the comments in spite of the difficult captcha!! Thumbup

    Wildcard characters

    0

    Technically, you should use \_ every place you have _ in wildcard ignore.
    _ matches ANY single character.
    While it's unlikely you will ever have a Drupal table named 'tempAsearchBsids' that table WOULD match your pattern...
     

    Comment viewing options

    Select your preferred way to display the comments and click "Save settings" to activate your changes.

    Post new comment

     

    More information about formatting options

    CAPTCHA
    This question is for testing whether you are a human visitor and to prevent automated spam submissions.
    Image CAPTCHA
    Enter the characters shown in the image.

    Short URL


    Copyright © 2007-2012, WidWad LLC. All Rights Reserved.
    All trademarks and service marks are the property of their respective owners.