Mark Atwood (fallenpegasus) wrote,
Mark Atwood
fallenpegasus

Varchar patch for MySQL memory engine

MySQL has a storage engine slash table type of "memory" or "heap". The memory engine keeps the table in memory instead of on a disk. It can be very fast. It has a couple of problems. One problem is that it handles varchar types poorly and wastefully.

This was a problem for eBay. So they paid one of their own smart employees to fix that problem.

Other people have heard about that patch, and asked for it. I couldn't give it to them, and it all started bogging down in pointless legal discussions. At least by the lawyer people and people who are into lawyer people.

So the author of the patch took a sword to the knot, and released the patch to the public.

Here it is: http://code.google.com/p/mysql-heap-dynamic-rows/



This project provides support for variable size records (aka dynamic row format) in MySQL Heap (Memory) Engine.

Contributed under GPL2 by eBay, Inc April 2008

================================

As of March 2008, MySQL Heap Engine of any version is limited to fixed row format. It allocates fixed memory size for each record stored in a given Heap Table. For example, if table A has a VARCHAR(4000) column, MySQL will use at least 4000 bytes (plus other columns and per-record overhead) for every record regardless of whether it has that much user data. In this example, the table will use 4GB memory per 1M records.

Assuming that actual data in a VARCHAR varies (average data length is less than maximum), it would be best if Heap Engine could optimize its memory use. In the example above, if average VARCHAR data was 1000 bytes, the memory consumption would be 1GB rather than 4GB.

This project changes memory allocation mechanisms used by the MySQL Heap Engine. Each Heap Table uses separate area of memory for its own private use. When inserting new records, previous versions of MySQL would always allocate areas of "reclength" size. Reclength equals to maximum space that could be ever needed by one record in a given table.

The new approach adds a "create table" option to set chunk sizes. When inserting a new record, Heap Engine will analyze actual record data and calculate the number of "chunks" needed for a compact form of that record. It will then allocate the necessary chunks, link them together into a chunkset and store data in these chunks.

Variable size records provide "dynamic" row format, as opposed to "fixed" row format. For backwards compatibility, Heap Tables default to fixed row format. The project's scope is Heap (aka Memory) Engine/Table only, MyISAM and other engines do not have similar limitations.

For added flexibility, project introduces new HP_DATASPACE construct to MySQL Heap Engine, which should greatly simplify future BLOB support. Essentially, all BLOB's in a table would share their own HP_DATASPACE instance. Unfortunately, the author of this project did not have time to add BLOB implementation.

All the interesting technical details are provided in a big comment section in the beginning of hp_dspace.c, which documents design. The same information has been copied to the project's Wiki.

This project has been implemented by Igor Chernyshev of eBay Kernel Team.

Tags: mysql
Subscribe
  • Post a new comment

    Error

    Comments allowed for friends only

    Anonymous comments are disabled in this journal

    default userpic

    Your reply will be screened

    Your IP address will be recorded 

  • 3 comments