An idea for Amazon AWS S3 and MySQL cluster replication distribution - Mark Atwood
I've described this idea to a few people, but I figured I would post it here.

I've had an idea for using Amazon AWS S3 to distribute MySQL cluster replication data.

The existing architecture for MySQL clustering is as follows:
  1. The master has N slaves
  2. The master copies each binlog replication chunk to each slave. If there are 7 slaves, then the master has to push the same data out its network pipe 7 times.
  3. Each slave has a hot TCP connection to the master. The master has 7 hot TCP connections, one for each slave.
  4. The slave takes each replication chunk and applies it.

Here is my idea.
  • For each replication chunk, the master creates a handle name for it, and also the handle name for the next chunk.
  • The master copies each chunk into an S3 item, once. The item's name is its handle, and it carries a piece of S3 metadata holding the handle of the next chunk.
  • Each slave tails the bucket's item list and grabs each chunk in turn. After it has applied a chunk, it writes a short item back to the bucket stating that it has applied that chunk.
  • A low priority reaper watches the bucket, and when every registered slave marks a given chunk as applied, the reaper deletes the chunk.
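
Here is a rough sketch of that flow in Python. It is only an illustration under some loud assumptions: `FakeBucket` is an in-memory stand-in I made up so the example runs without an AWS account, and the handle scheme (`chunk/00000000`), the ack key layout (`ack/<handle>/<slave>`), and the `Master`, `Slave`, and `reap` names are all hypothetical. A real version would talk to S3 and apply real binlog events.

```python
from dataclasses import dataclass, field


@dataclass
class FakeBucket:
    """In-memory stand-in for an S3 bucket: key -> (body, metadata)."""
    items: dict = field(default_factory=dict)

    def put(self, key, body, metadata=None):
        self.items[key] = (body, dict(metadata or {}))

    def get(self, key):
        return self.items.get(key)  # None if the item does not exist yet

    def delete(self, key):
        self.items.pop(key, None)

    def keys(self, prefix=""):
        return sorted(k for k in self.items if k.startswith(prefix))


class Master:
    def __init__(self, bucket):
        self.bucket = bucket
        self.seq = 0

    def publish(self, chunk):
        # Name this chunk and the next one, then write the chunk exactly once.
        handle = f"chunk/{self.seq:08d}"
        next_handle = f"chunk/{self.seq + 1:08d}"
        self.bucket.put(handle, chunk, {"next": next_handle})
        self.seq += 1
        return handle


class Slave:
    def __init__(self, name, bucket, start="chunk/00000000"):
        self.name = name
        self.bucket = bucket
        self.cursor = start   # handle of the next chunk to apply
        self.applied = []     # stand-in for the slave's database

    def poll(self):
        """Apply every chunk that is available, following the 'next' chain."""
        while (item := self.bucket.get(self.cursor)) is not None:
            body, meta = item
            self.applied.append(body)
            # Short marker item: "this slave has applied this chunk".
            self.bucket.put(f"ack/{self.cursor}/{self.name}", b"")
            self.cursor = meta["next"]


def reap(bucket, registered):
    """Delete each chunk (and its acks) once every registered slave acked it."""
    for handle in bucket.keys("chunk/"):
        acks = [f"ack/{handle}/{name}" for name in registered]
        if all(bucket.get(a) is not None for a in acks):
            bucket.delete(handle)
            for a in acks:
                bucket.delete(a)


bucket = FakeBucket()
master = Master(bucket)
a, b = Slave("a", bucket), Slave("b", bucket)
for stmt in ["INSERT 1", "INSERT 2", "INSERT 3"]:
    master.publish(stmt)

a.poll()
reap(bucket, ["a", "b"])  # slave b is still behind, so nothing is deleted
chunks_after_first_reap = len(bucket.keys("chunk/"))

b.poll()
reap(bucket, ["a", "b"])  # both slaves have acked, so the chunks go away
```

Because each slave walks the chain through the "next" metadata rather than trusting listing order, chunks are applied strictly in the order the master wrote them. The reaper only needs the list of registered slaves, so a lagging slave simply keeps its chunks alive until it catches up.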

The advantages are
  • The master only has to write the chunk out to the network once. There is no increased load when the number of slaves is increased.
  • The slaves can be very geographically dispersed without additional pain.
  • The master and the slave don't need hot TCP connections, VPN connections, or firewall configurations.
  • If the network partitions for a while, the slave falls behind, but will resync without pain. Also, a network partition doesn't crash the master by exhausting its binlog space.
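
To put a number on the first advantage, here is a back-of-the-envelope calculation. The 1 MB chunk size is a made-up figure, and the function `master_egress_bytes` is just a hypothetical helper; it ignores protocol overhead and the small "applied" marker items that the slaves write.

```python
def master_egress_bytes(chunk_bytes, slaves, via_bucket):
    """Bytes the master itself sends for one replication chunk."""
    # Direct replication: one copy per slave. Via the bucket: one copy total.
    return chunk_bytes if via_bucket else chunk_bytes * slaves


chunk = 1_000_000  # hypothetical 1 MB chunk
for n in (1, 7, 50):
    direct = master_egress_bytes(chunk, n, via_bucket=False)
    shared = master_egress_bytes(chunk, n, via_bucket=True)
    print(f"{n:>2} slaves: direct={direct:>10,} bytes  via bucket={shared:>9,} bytes")
```

The master's outbound traffic stays flat as slaves are added; the fan-out cost moves to the storage service, which each slave reads from on its own.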

Current Location: MySQL Con & Expo, Ballroom A, Santa Clara Convention Center, Santa Clara, CA

3 comments or Leave a comment
From: rberger Date: April 25th, 2007 04:16 am (UTC) (Link)
Why not just use Amazon SQS instead? I believe it does all the things you suggest using S3 for, but does them automatically.
From: fallenpegasus Date: April 25th, 2007 05:51 am (UTC) (Link)
Because SQS does not absolutely guarantee ordering. SQS items must be idempotent. If a MySQL binlog chunk gets randomly reordered, you've just hosed your database.
From: mauser Date: April 26th, 2007 01:31 am (UTC) (Link)
Where can I get N slaves? Or X slaves, preferably... :-)