Mark Atwood
An idea for Amazon AWS S3 and MySQL cluster replication distribution
I've described this idea to a few people, but I figured I would post it here.

I've had an idea for using Amazon AWS S3 to distribute MySQL cluster replication data.

The existing architecture for MySQL clustering is as follows:
  1. The master has N slaves
  2. The master copies each binlog replication chunk to each slave. If there are 7 slaves, then the master has to push the same data out its network pipe 7 times.
  3. Each slave has a hot TCP connection to the master, so the master has 7 hot TCP connections, one for each slave.
  4. The slave takes each replication chunk and applies it.

Here is my idea.
  • For each replication chunk, the master creates a handle name for it, and also the handle name for the next chunk.
  • The master copies each chunk into an S3 item, once. The item's name is its handle, and it carries a piece of S3 metadata that is the handle of the next chunk.
  • Each slave tails the bucket's item list and grabs each chunk in turn. After it has applied a chunk, it writes a short item back to the bucket stating that it has applied that chunk.
  • A low priority reaper watches the bucket, and when every registered slave marks a given chunk as applied, the reaper deletes the chunk.
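The scheme above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the `Bucket` class is an in-memory stand-in for an S3 bucket (no real S3 calls are made), and the `chunk/` and `ack/` key prefixes, the `next` metadata field, and the `Master`, `Slave`, and `reap` names are all made up for this example. "Applying" a chunk is reduced to appending it to a list.

```python
import uuid


class Bucket:
    """In-memory stand-in for an S3 bucket: key -> (body, metadata)."""

    def __init__(self):
        self.items = {}

    def put(self, key, body, metadata=None):
        self.items[key] = (body, dict(metadata or {}))

    def get(self, key):
        return self.items.get(key)  # None if the item does not exist (yet)

    def delete(self, key):
        self.items.pop(key, None)

    def keys(self, prefix=""):
        return sorted(k for k in self.items if k.startswith(prefix))


class Master:
    def __init__(self, bucket):
        self.bucket = bucket
        self.next_handle = self._new_handle()
        self.first_handle = self.next_handle  # where a new slave starts

    @staticmethod
    def _new_handle():
        return "chunk/" + uuid.uuid4().hex

    def publish(self, chunk):
        """Write the chunk once, tagged with the handle of the next chunk."""
        handle, self.next_handle = self.next_handle, self._new_handle()
        self.bucket.put(handle, chunk, {"next": self.next_handle})
        return handle


class Slave:
    def __init__(self, name, bucket, start_handle):
        self.name = name
        self.bucket = bucket
        self.cursor = start_handle
        self.applied = []

    def poll(self):
        """Apply every chunk available so far, strictly in chain order."""
        count = 0
        while (item := self.bucket.get(self.cursor)) is not None:
            body, metadata = item
            self.applied.append(body)  # stand-in for applying binlog events
            self.bucket.put(f"ack/{self.name}/{self.cursor}", b"")
            self.cursor = metadata["next"]
            count += 1
        return count


def reap(bucket, slave_names):
    """Delete each chunk that every registered slave has acknowledged."""
    deleted = []
    for key in bucket.keys("chunk/"):
        acks = [f"ack/{name}/{key}" for name in slave_names]
        if all(bucket.get(ack) is not None for ack in acks):
            bucket.delete(key)
            for ack in acks:
                bucket.delete(ack)
            deleted.append(key)
    return deleted


bucket = Bucket()
master = Master(bucket)
a = Slave("a", bucket, master.first_handle)
b = Slave("b", bucket, master.first_handle)

for chunk in (b"INSERT 1", b"INSERT 2", b"INSERT 3"):
    master.publish(chunk)

a.poll()
reaped_early = reap(bucket, ["a", "b"])  # b has not acked, nothing is deleted
b.poll()
reaped_late = reap(bucket, ["a", "b"])   # both acked, all three chunks go
```

Because each slave follows the `next` handle rather than relying on listing order, it applies chunks in exactly the order the master wrote them, and a slave that polls late simply walks further along the chain.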

The advantages are:
  • The master only has to write the chunk out to the network once. There is no increased load when the number of slaves is increased.
  • The slaves can be very geographically dispersed without additional pain.
  • The master and the slave don't need hot TCP connections, VPN connections, or firewall configurations.
  • If the network partitions for a while, the slave falls behind, but will resync without pain. Also, a network partition doesn't crash the master when its binlog space is exhausted.
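To put a rough number on the first advantage, here is a back-of-the-envelope comparison of how many bytes the master pushes per chunk under each scheme. It is a deliberately simplified model: the function names are made up for this example, and it ignores protocol overhead, the small acknowledgement items the slaves write, and any S3 request or transfer charges.

```python
def fanout_egress(chunk_bytes, slaves):
    """Direct replication: the master sends the chunk once per slave."""
    return chunk_bytes * slaves


def s3_egress(chunk_bytes, slaves):
    """S3 distribution: the master uploads the chunk once, whatever the slave count."""
    return chunk_bytes


CHUNK = 1024 * 1024  # an assumed 1 MiB replication chunk

for n in (1, 7, 50):
    direct = fanout_egress(CHUNK, n) // CHUNK
    via_s3 = s3_egress(CHUNK, n) // CHUNK
    print(f"{n:>2} slaves: direct {direct} MiB, via S3 {via_s3} MiB")
```

With the 7 slaves from the example above, the master sends seven times the chunk size directly, versus one copy when it writes to the bucket.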

Current Location: MySQL Con & Expo, Ballroom A, Santa Clara Convention Center, Santa Clara, CA

3 comments
From: rberger Date: April 25th, 2007 04:16 am (UTC) (Link)
Why not just use Amazon SQS instead? I believe it does all the things you suggest using S3 for, but does them automatically.
From: fallenpegasus Date: April 25th, 2007 05:51 am (UTC) (Link)
Because SQS does not absolutely guarantee ordering. SQS items must be idempotent. If a MySQL binlog chunk gets randomly reordered, you've just hosed your database.
From: mauser Date: April 26th, 2007 01:31 am (UTC) (Link)
Where can I get N slaves? Or X slaves, preferably... :-)