Mark Atwood (fallenpegasus) wrote,

Created test datasets for my project

As I continue to work on my AWS S3 MySQL storage engine, I decided that I needed some S3 buckets that actually contained "stuff" to test and develop against. So I grabbed The Devil's Dictionary from Project Gutenberg, wrote a little Perl script to read and parse it, and created an S3 bucket that contains an item for each definition. I also grabbed 63 scans of pieces by William Bouguereau from ArtRenewal and created a bucket to hold them.
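For illustration, here is a minimal Python sketch of that kind of parse step. It is not the Perl script itself, and it leaves out the upload. It assumes each entry in the text starts a paragraph with an upper-case headword, a comma, and a part-of-speech abbreviation (as in "ABASEMENT, n. ..."), and that any other paragraph continues the previous entry. The names `ENTRY_RE`, `parse_definitions`, and `SAMPLE` are made up for the example.

```python
import re

# Assumed entry layout: "HEADWORD, pos. definition text ..."
# e.g. "ABASEMENT, n. A decent and customary mental attitude ..."
ENTRY_RE = re.compile(
    r"^(?P<word>[A-Z][A-Z' -]*), (?P<pos>[a-z]+\.(?:[a-z]+\.)*)\s+(?P<body>.+)$"
)


def parse_definitions(text):
    """Return a dict mapping each headword to its definition text.

    Paragraphs are separated by blank lines. A paragraph that does not
    look like the start of an entry is appended to the previous entry;
    anything before the first entry is skipped.
    """
    entries = {}
    current = None
    for para in re.split(r"\n\s*\n", text):
        flat = " ".join(para.split())  # undo hard line wrapping
        if not flat:
            continue
        match = ENTRY_RE.match(flat)
        if match:
            current = match.group("word")
            entries[current] = f"{match.group('pos')} {match.group('body')}"
        elif current is not None:
            entries[current] += "\n" + flat
    return entries


# Small sample in the assumed layout, hard-wrapped like the plain-text file.
SAMPLE = """ABASEMENT, n. A decent and customary mental attitude
in the presence of wealth or power.

ABSURDITY, n. A statement or belief manifestly inconsistent
with one's own opinion.
"""

if __name__ == "__main__":
    for word, body in parse_definitions(SAMPLE).items():
        print(word, "->", body)
```

Each key of the returned dict would become one item name in the bucket, with the definition text as the item body.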

Unfortunately for me, ArtRenewal doesn't lend itself to being easily scraped, or I would have scripted the transfer of all 226 pieces into the bucket and walked away. Also, something (I suspect the Perl S3 library) doesn't play well with non-ASCII characters in item names: none of the pieces with accented characters in their names uploaded. This is annoying, since item names are supposed to handle UTF-8.
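The actual cause of the failed uploads isn't known here, but one common trouble spot with accented names is the URL path: in a REST request the item name has to be encoded as UTF-8 and then percent-escaped. A small Python sketch of that encoding follows, using the standard library's `urllib.parse.quote`. The function name `encode_item_name` and the example file name are hypothetical.

```python
from urllib.parse import quote


def encode_item_name(name: str) -> str:
    """Percent-encode an item name for use in a URL path.

    quote() encodes the string as UTF-8 by default and escapes every
    byte outside the unreserved set; "/" is kept so that names with
    path-like prefixes stay readable.
    """
    return quote(name, safe="/")


if __name__ == "__main__":
    # Hypothetical item name with accented characters.
    print(encode_item_name("Élégie.jpg"))  # %C3%89l%C3%A9gie.jpg
```

A library that escapes the name as Latin-1 bytes, or sends it unescaped, would produce a different request path than the service expects, which is one way names like this could fail while plain ASCII names succeed.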

But this is enough anyway to serve as a good working test base.

Oh, and many thanks to Amazon for supporting this work by giving me a comp'ed AWS S3 account.
Tags: art, geek, mysql, s3
