Mark Atwood (fallenpegasus) wrote,

Created test datasets for my project

As I continue to work on my AWS S3 MySQL storage engine, I decided that I needed some S3 buckets that actually contained "stuff" to test and develop it against. So I grabbed The Devil's Dictionary off of Project Gutenberg, wrote a little Perl script to read and parse it, and created an S3 bucket that contains an item for each definition. I also grabbed 63 scans of pieces by William Bouguereau out of ArtRenewal, and created a bucket to hold them.
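For anyone wanting to build a similar dataset, here is a rough sketch of the parsing step. It is in Python rather than the Perl I actually used, and it is not my script. It assumes each dictionary entry is a blank-line-separated paragraph that opens with an all-caps headword followed by a comma, and that any paragraph not shaped like that (verse, follow-on text) belongs to the entry before it. The name `parse_entries` is made up for illustration.

```python
import re

# Assumed entry shape: an all-caps headword, a comma, then the entry text.
HEADWORD = re.compile(r"^([A-Z][A-Z' -]*[A-Z]|[A-Z]),\s+(.*)$")

def parse_entries(text):
    """Map each headword to its definition text (hypothetical helper)."""
    entries = {}
    current = None
    for para in re.split(r"\n\s*\n", text):
        para = " ".join(para.split())  # rejoin hard-wrapped lines
        if not para:
            continue
        match = HEADWORD.match(para)
        if match:
            # A new entry starts; a repeated headword overwrites the earlier one.
            current = match.group(1)
            entries[current] = match.group(2)
        elif current is not None:
            # Continuation paragraph: attach it to the preceding entry.
            entries[current] += " " + para
    return entries
```

Each key/value pair of the result would then become one item in the bucket, with the headword as the item name and the definition as the body.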

Unfortunately for me, ArtRenewal doesn't lend itself to being easily scraped, or I would have scripted the process of transferring all 226 pieces into the bucket and walked away from it. Also, something (I suspect the Perl S3 library) doesn't play well with non-ASCII characters in item names: none of the pieces with accented characters in their names uploaded. This is annoying, since item names are supposed to handle UTF-8.
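I haven't tracked down the actual cause, so take this as one thing worth checking rather than a diagnosis: an item name with accented characters has to go into the request path as the percent-encoded bytes of its UTF-8 form. A minimal sketch in Python (not the Perl library in question), assuming path-style `/bucket/key` addressing; the function name, bucket name, and file name below are all invented for the example.

```python
from urllib.parse import quote

def s3_item_path(bucket, key):
    """Build a path-style request path for an item (hypothetical helper)."""
    # quote() encodes the str as UTF-8 by default and percent-encodes every
    # byte outside the unreserved set; "/" is kept literal so that keys
    # containing slashes still look like paths.
    return "/" + bucket + "/" + quote(key, safe="/")

print(s3_item_path("bouguereau-scans", "Élégie.jpg"))
# prints /bouguereau-scans/%C3%89l%C3%A9gie.jpg
```

If a library sends the raw bytes instead, or encodes the name as Latin-1 before escaping it, the name that arrives at the server is not the one that was intended.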

But this is enough anyway to serve as a good working test base.

Oh, and many thanks to Amazon for supporting this work by giving me a comp'ed AWS S3 account.
Tags: art, geek, mysql, s3

