my methods

April 2nd, 2008

A useful Ruby snippet that Cameron and I came up with last night:


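class Object
  # Class-level methods this class has beyond those of its superclass.
  def self.my_methods
    methods - (superclass ? superclass.methods : [])
  end

  # Methods this object has beyond those its class's superclass provides.
  # Uses instance_methods so the superclass never has to be instantiated
  # (calling .new fails for classes whose initialize requires arguments).
  def my_methods
    parent = self.class.superclass
    methods - (parent ? parent.instance_methods : [])
  end
end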

This is useful for finding out which methods an object defines itself, without including all the methods inherited from its superclasses.
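As a quick illustration, here is a self-contained sketch of the snippet in use. Animal and Dog are hypothetical classes made up for this example, and the instance version uses instance_methods rather than instantiating the superclass, which is a substitution chosen so the example also works for classes whose constructors take arguments.

```ruby
class Object
  def self.my_methods
    methods - (superclass ? superclass.methods : [])
  end

  def my_methods
    parent = self.class.superclass
    # Assumption: instance_methods is used here instead of superclass.new.methods.
    methods - (parent ? parent.instance_methods : [])
  end
end

# Hypothetical classes, used only for illustration.
class Animal
  def breathe; end
end

class Dog < Animal
  def self.breeds; end
  def bark; end
end

dog_instance_methods = Dog.new.my_methods  # includes :bark, leaves out :breathe
dog_class_methods    = Dog.my_methods      # includes :breeds
```

Anything defined further up the inheritance chain (including my_methods itself) drops out of the result, which is what makes the output short enough to read in irb.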

Mixtape moveage

March 21st, 2008

Just as I’m getting in the habit of doing mixtapes, I’m moving them to another blog. Recently, some friends and I started a new music blog called Attacked by Jackets, and I’ve posted my latest mixtape there.

Fire in the Mission

March 17th, 2008

My apartment is right by the fire station, so I’m used to hearing sirens all the time. So, the first few I heard didn’t catch my attention. By the time the fourth went by, I could hear yelling and glass being broken. Then I heard an unfamiliar sound from the back of the house. I went to investigate, only to discover that one of the buildings across the back alley was on fire. My only response was to twitter, “Holy shit, a house down the alley is ON FIRE!”.

I quickly grabbed shoes, a jacket and my laptop (duh, I’m a nerd) and went outside.

Long story short, at least 2 buildings burned and our apartment smells smoky. No reports of injuries other than one firefighter being treated for smoke inhalation.

Photos:

by me: http://flickr.com/photos/ryansking/tags/missionfire
by cameron: http://flickr.com/photos/ymbiont/tags/missionfire

News and video:

Friday Five: Upbeat

February 29th, 2008

I picked these songs because they have a kind of upbeat energy to them. Sorta. Whatever, they just sound good together.

Download

Track Listing

  1. Time to Pretend – MGMT – Oracular Spectacular
  2. So It Goes – The Broken West – I Can’t Go On I’ll Go On
  3. Easy on Yourself – The Drive-by Truckers – A Blessing and Curse
  4. Ode to LRC – Band of Horses – Cease to Begin
  5. Narcocorrido – Okkervil River – Black Sheep Boy Appendix

Introducing Conveyor

February 26th, 2008

For the last month or so, I’ve been working, along with the guys at Minimal Loop (note: that website is blank), on a new open source project called Conveyor.

What is Conveyor? Well, that’s a good question.

One way of describing it is as a “distributed, rewindable, virtual queue server”. It speaks HTTP and will soon have a peer-to-peer replication mode. It can be treated like a queue, but because it doesn’t actually get rid of any data, you can rewind the queue to any point in the past. And because the “queues” are really just iterators, you can treat it like a group of virtual queues: each set of consumers sees what looks like its own queue.
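To make the "queues are really just iterators" idea concrete, here is a minimal in-memory sketch. It is not Conveyor's actual code or API: RewindableLog, post, next_for and rewind are hypothetical names, and a real server would persist the data and speak HTTP.

```ruby
# Hypothetical sketch of an append-only log with one cursor per consumer group.
class RewindableLog
  def initialize
    @entries = []           # append-only: nothing is ever removed
    @cursors = Hash.new(0)  # consumer group name => index of next entry to read
  end

  # Append an item and return its position in the log.
  def post(item)
    @entries << item
    @entries.size - 1
  end

  # Return the next unread item for a group and advance its cursor,
  # or nil when the group has caught up.
  def next_for(group)
    index = @cursors[group]
    return nil if index >= @entries.size

    @cursors[group] = index + 1
    @entries[index]
  end

  # Move a group's cursor to any position in the log.
  def rewind(group, to: 0)
    @cursors[group] = to.clamp(0, @entries.size)
  end
end

log = RewindableLog.new
%w[a b c].each { |item| log.post(item) }

first_for_indexer  = log.next_for(:indexer)   # "a"
second_for_indexer = log.next_for(:indexer)   # "b"
first_for_archiver = log.next_for(:archiver)  # "a": each group has its own position
log.rewind(:indexer, to: 0)
replayed = log.next_for(:indexer)             # "a" again
```

Reading never deletes anything, so adding another "queue" costs one more integer in the cursor table.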

A good catchphrase is: “Like TiVo for your data”. It records, it pauses and it rewinds a broadcast stream.

Here’s a bit of the motivation:

Many people in the web industry are coming to the realization that the era of one-size-fits-all databases is over, at least for large websites. The future for large websites’ data storage is likely a collection of special purpose data stores: GFS/MapReduce for batch jobs, inverted indexes for search and fast retrieval of small result sets, and relational databases for smaller datasets which need online analysis. BigTable and SimpleDB-like things fit in there somewhere too.

The question remains though, how do you tie these together?

In my (limited) experience with storing the same data in multiple data stores, it’s useful to treat one of the data stores as primary and the others as derivative of that primary store. So, for example, you might keep your primary data in MySQL, but build inverted indexes with Lucene. You can usually tolerate your search indexes being a little out of date with the database, just so long as they aren’t too far out, in the same way that your MySQL slaves can be out of sync with the master, but not too far.

In this case, Conveyor can be used like an application-agnostic version of MySQL binlogs, which can be replayed to write data into multiple, diverse data stores.
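Here is a rough sketch of that replay idea, with plain Ruby objects standing in for the real systems. Everything in it is hypothetical: the log is just an array, records stands in for something like MySQL, index stands in for something like Lucene, and DerivedStore is a made-up helper rather than part of Conveyor.

```ruby
require "set"

# Hypothetical helper: a store that replays a shared log at its own pace.
class DerivedStore
  attr_reader :offset

  def initialize(&apply)
    @apply = apply  # block that writes one change into this store
    @offset = 0     # how far into the log this store has replayed
  end

  # Replay every change this store has not seen yet.
  def catch_up(log)
    log.drop(@offset).each { |change| @apply.call(change) }
    @offset = log.size
  end
end

log = []                                                # ordered, append-only change log
records = {}                                            # stand-in for a relational store
index = Hash.new { |hash, word| hash[word] = Set.new }  # stand-in for a search index

record_store = DerivedStore.new { |c| records[c[:id]] = c[:body] }
index_store  = DerivedStore.new { |c| c[:body].split.each { |w| index[w] << c[:id] } }

log << { id: 1, body: "fire in the mission" }
log << { id: 2, body: "mission mixtape" }
record_store.catch_up(log)  # records is current, the index has not run yet

log << { id: 3, body: "friday five" }
record_store.catch_up(log)
index_store.catch_up(log)   # the index lagged, then replayed all three changes
```

Each store only remembers an offset, so one can fall behind the others for a while and still end up consistent with the same log.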

Another use case is a multi-stage web crawler. You have a component that fetches pages and stores them in a cache. Another component takes those pages out of the cache and parses them, then passes the results to another stage that stores them in a database and writes a log of the changed data. See where I’m going with this?

Conveyor is useful at each stage of this architecture: you get queue semantics to distribute work, you get rewindability to deal with bugs in your code without re-running jobs from scratch, and the virtual-ness of the queues means that your stages can branch with very little overhead and without redoing any previous work. Want to add a new data store later? Just write a Conveyor client that starts at the beginning of the queue, or initialize it from a snapshot (making sure you know where in the stream of data that snapshot came from), and let it catch up.
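The snapshot option only works if the snapshot remembers where in the stream it was taken. A small sketch of that bookkeeping follows, under the assumption that the log is an in-memory array and the new data store is a simple word count. Snapshot, count_words and build_store are hypothetical names, not part of Conveyor.

```ruby
# Hypothetical: a snapshot is the store's state plus the log position it reflects.
Snapshot = Struct.new(:state, :position)

# Apply one log entry to a word-count store (an illustrative derived store).
def count_words(counts, page)
  page.split.each { |word| counts[word] = counts.fetch(word, 0) + 1 }
  counts
end

# Build a store by starting from a snapshot (or from scratch) and replaying
# only the entries that come after the snapshot's position.
def build_store(log, snapshot = nil)
  counts = snapshot ? snapshot.state.dup : {}
  start  = snapshot ? snapshot.position : 0
  log.drop(start).each { |page| count_words(counts, page) }
  counts
end

log = ["a b", "b c", "c d"]

from_scratch  = build_store(log)
snapshot      = Snapshot.new(build_store(log.first(2)), 2)  # taken after two entries
from_snapshot = build_store(log, snapshot)

same_state = (from_scratch == from_snapshot)  # both routes reach the same state
```

If the recorded position were off by one, the new store would either skip an entry or count it twice, which is why the position has to travel with the snapshot.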

Anyway, Conveyor is still a rough work in progress. It’s very alpha and not many people are using it yet (read: there are probably undiscovered bugs).

If you’d like to try it, it’s as simple as sudo gem install conveyor (if you use RubyGems), or you can browse over to the RubyForge page and download a tarball. Conveyor depends on Thin, daemons and json.

Then to run it you just do conveyor <data dir>, where <data dir> is the directory where you want Conveyor to store its data.

Update: I forgot to mention that there’s also a mailing list and irc channel.