Mark Atwood (fallenpegasus) wrote,

Created test datasets for my project

As I continue to work on my AWS S3 MySQL storage engine, I decided that I needed some S3 buckets that actually contained "stuff" to test and develop it against. So I grabbed The Devil's Dictionary from Project Gutenberg, wrote a little Perl script to read and parse it, and created an S3 bucket that contains an item for each definition. I also grabbed 63 scans of pieces by William Bouguereau from ArtRenewal and created a bucket to hold them.
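Here is a rough sketch of that parsing step, written in Python rather than Perl. It assumes that each entry in the Gutenberg text begins on its own line with an upper-case headword followed by a comma (as in "ABASEMENT, n. ..."). It also assumes that the lines after a headword belong to the same definition until the next headword appears. The `parse_entries` helper is hypothetical and is not the script used here. The S3 upload itself is left out.

```python
import re

# Assumed entry layout: an upper-case headword (two or more characters),
# a comma, then the definition, e.g. "ABASEMENT, n.  A decent ...".
HEADWORD = re.compile(r"^([A-Z][-A-Z' ]*[A-Z]),\s+(.*)$")


def parse_entries(text: str) -> dict[str, str]:
    """Split dictionary text into {headword: definition} pairs (hypothetical helper)."""
    entries: dict[str, str] = {}
    current = None  # headword whose definition is being collected
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADWORD.match(line)
        if match:
            # A new headword starts a new entry.
            current = match.group(1)
            entries[current] = match.group(2)
        elif current is not None and line:
            # Continuation line: join it onto the current definition.
            entries[current] += " " + line
        # Blank lines and any text before the first headword are skipped.
    return entries
```

Each headword/definition pair would then become one S3 item, keyed by its headword. A verse line that happens to start with a capitalized word and a comma would be misread as a new entry, so a real parser for this text needs more care than this sketch.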

Unfortunately for me, ArtRenewal doesn't lend itself to being easily scraped. Otherwise I would have scripted the transfer of all 226 pieces into the bucket and walked away. Also, something (I suspect the Perl S3 library) doesn't play well with non-ASCII characters in item names: none of the pieces with accented characters in their names uploaded. This is annoying, since item names are supposed to handle UTF-8.
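The actual bug in the Perl library isn't pinned down here. For comparison, here is a minimal Python sketch of what a client has to do with a non-ASCII key before putting it into a REST request path. First it encodes the key as UTF-8. Then it checks the result against S3's 1024-byte key limit. Finally it percent-encodes the bytes. The function name and the sample key are hypothetical.

```python
from urllib.parse import quote

# S3 limits an object key to 1024 bytes of UTF-8.
MAX_KEY_BYTES = 1024


def encode_s3_key(key: str) -> str:
    """Percent-encode an S3 object key as UTF-8 for a request path (hypothetical helper)."""
    size = len(key.encode("utf-8"))
    if size > MAX_KEY_BYTES:
        raise ValueError(f"key is {size} bytes of UTF-8; S3 allows at most {MAX_KEY_BYTES}")
    # quote() encodes str input as UTF-8 by default. "/" is kept literal so
    # the key still reads as a path. Every other byte that is not an
    # unreserved character becomes a %XX escape.
    return quote(key, safe="/")
```

For example, `encode_s3_key("bouguereau/Été.jpg")` returns `"bouguereau/%C3%89t%C3%A9.jpg"`. A client that skips the UTF-8 step or the escaping may send a path that S3 rejects or reads as a different key. That is one plausible way uploads like these can fail.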

Still, this is enough to serve as a good working test base.

Oh, and many thanks to Amazon for supporting this work by giving me a comp'ed AWS S3 account.
Tags: art, geek, mysql, s3
