Eyeballs and Bugs

January 23rd, 2014

This tweet reminded me of an idea bouncing around my head.

“With enough eyeballs, all bugs are shallow” is clever and wise. But it can also be misleading.

Why is it wise?

Given a piece of software, if you keep adding developers to it, eventually you’ll probably find more bugs, but why?

Why is it misleading?

Because it doesn’t actually tell you who finds bugs, other than the mythical “eyeballs”.

*You’ll find a bug when someone does something new that causes a problem and is sufficiently motivated to track down the problem.*

Almost all software has existing bugs. They exist because no one has hit that case *and* been motivated to fix it.

In my time at Twitter I’ve worked on or observed many projects that make use of open source software (OSS). Invariably we push software into edge-cases that no one has experienced before. And those edge-cases invariably have bugs that we fix.

Did those bugs get fixed because there were more eyes looking at the code? Or was it because someone with larger scale and more rigorous success criteria used it? In other words, do bugs get fixed because someone with motivations different than its creators, different than even its previous users, relied on it?

Choosing OSS

When evaluating the quality of open source software, you should consider not just the quantity of users, but their diversity and/or similarity to your usage patterns.

Managing OSS

When running a project, pay attention to your new users, your outliers and the oddballs. They’re often going to discover some interesting things about your software.

Is it plugged in?

January 8th, 2013

(Cross post from Medium)

In high school, I took a class called “Computer Troubleshooting.” This may sound like a weird class for a 16-year-old to take or a high school to offer, but in practice it was savvy: the school netted a built-in, small workforce for IT support and repairs. Nothing like a little free labor to make things go.

Going in, I didn’t think I’d learn anything and when I finished I felt the same way. After all, I’d grown up around computers and was already the de facto family IT person.

Over a decade later, working as a software engineer, I realized that I’d learned something very important.

The course started with the group tackling problems together. Our teacher would approach the problem and guide one of us through it, asking questions.

“Is it plugged in?” “Is the power light on?” “Zap the PRAM,” etc.

After a few weeks, we were fixing problems on our own. My first case put me in an office with a frustrated administrator three times my age. I couldn’t figure out why the printer wouldn’t work. I ran through everything I could think of before heading back to my teacher and asking for help.

By now the problem is probably obvious to you, but to me it was anything but that. The printer was not plugged into the computer.

Thus, my teacher’s mantra became “always check to be sure it’s plugged in”.

Today I would add to that “…especially if you know it’s plugged in.”

This is the lesson I learned. Unlike technology failures, when people fail we are blind to it and don’t easily self-correct. Instead, we need to learn to step back, question our assumptions and be willing to re-check everything.

Potential Consistency

April 29th, 2010

In my role as the lead on Twitter’s migration towards Cassandra, I spend a lot of time explaining the concept of eventual consistency and why it’s not as big of a shift for us as people fear.

It seems that this fear stems from misunderstandings of both eventual consistency and its alternatives.

First off, people treat eventual consistency (and I’ll be speaking in terms of how it’s implemented in Cassandra) as if it were the normal condition. It’s not– it is the error condition. With the right parameters (R + W > N)* and no failures, you get immediate consistency. The eventual part only comes in when there are failures or you purposely tune your consistency down.
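The R + W > N condition is just arithmetic, so it can be sketched in a few lines. This is a minimal illustration, not Cassandra's API: the function names `quorum` and `quorum_overlaps` are made up for this example. R is the number of replicas a read blocks for, W the number a write blocks for, and N the number of replicas holding the data.

```python
def quorum(n: int) -> int:
    """Smallest majority of n replicas (hypothetical helper)."""
    return n // 2 + 1


def quorum_overlaps(r: int, w: int, n: int) -> bool:
    """True when any r replicas read must include at least one of the
    w replicas that acknowledged the latest write (hypothetical helper)."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


n = 3
# Quorum reads and quorum writes: 2 + 2 > 3, so reads see the latest write.
print(quorum_overlaps(quorum(n), quorum(n), n))  # True
# Consistency tuned down to one replica each way: 1 + 1 <= 3, no guarantee.
print(quorum_overlaps(1, 1, n))  # False
```

When the check is false, a read can land entirely on replicas that have not yet seen the write, which is the "purposely tune your consistency down" case.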

One alternative to Cassandra-style eventual consistency is what you see implemented in BigTable and its clones. Under normal operations with these systems you get immediate consistency. However, in the case of failures, the mutation operation can fail, requiring the client to retry (if it can). If you can’t retry (or can’t wait long enough for the retry to succeed), you’ll lose the data. These systems choose to reduce their availability in the case of failures. For some systems, this may be a great tradeoff. I think that class of systems is smaller than many think.

Another alternative to eventual consistency is a pattern I call “Potential Consistency”. Some well known architectures have this property– any system that relies on asynchronous master-slave replication + a cache (think mysql + memcache) has, at best, potential consistency.

Whether you do write-through or read-through caching (do you update the cache or invalidate it?) you can easily have different data in your master, each of your slaves (replication, especially in mysql, isn’t perfect because statement-based replication is non-deterministic) and memcache. And there are no guarantees that these differences will ever be resolved. Your data might be consistent, but once it becomes inconsistent there are no guarantees it will ever become consistent again. Unless you build something to repair that data. If you can do this successfully– congratulations, you’ve built an eventually consistent system.
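One way the database and cache can drift apart is a simple interleaving of two writers. The sketch below is a toy model under loud assumptions: plain dicts stand in for the mysql master and memcache, and every name in it (`interleaved_writes`, `repair`, the `user:1` key) is hypothetical. It shows the write-through case only.

```python
def interleaved_writes():
    """Two clients each 'write the database, then update the cache',
    but their steps interleave. Dicts stand in for mysql and memcache."""
    database, cache = {}, {}
    database["user:1"] = "A"  # client 1 writes the database
    database["user:1"] = "B"  # client 2 writes the database (B is now current)
    cache["user:1"] = "B"     # client 2 updates the cache
    cache["user:1"] = "A"     # client 1 updates the cache last (stale A wins)
    return database, cache


def repair(database, cache, key):
    """Overwrite the cached value with the database's value. Nothing in the
    toy system calls this on its own; someone has to build and run it."""
    cache[key] = database[key]


database, cache = interleaved_writes()
print(database["user:1"], cache["user:1"])  # B A

repair(database, cache, "user:1")
print(database["user:1"], cache["user:1"])  # B B
```

Without the `repair` step, nothing in this model ever brings the two copies back together. Adding that step is the "build something to repair that data" part.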

* R = read consistency (how many replicas you block for on read); W = write consistency (how many replicas you block for on write); N = number of replicas for the data in question. You typically satisfy this condition with quorum reads and writes.

shutting down inursite

August 17th, 2009

inursite was a fun project to work on, but it never took off.

Despite the presence of a for-pay option, it hasn’t made enough money to even pay its own hosting bills. On top of that my day job has kept me busy to the point that I’m unable to put any effort into inursite.

So, unless someone comes up with a better suggestion, I’m going to be shutting down inursite.com in a few weeks.

Currently, the site is hosted on a $150/month server from serverbeach, but it could probably go cheaper.

how to work with me

December 15th, 2008

I’ve been doing contract consulting work for the last year, since I left my “last ‘real job’”. Looking back on all the projects I’ve worked on this year, I’ve noticed a lot of similar problems.

The biggest theme is that I get hired to work on feature X, but I first have to spend a month building all the prerequisites for feature X. Or setting up some architecture or infrastructure. Or fixing a broken test suite. You get the point.

The next thing I tend to see is that we spend a lot of time trying to figure out what exactly it is I am to do and how we’re going to coordinate the work.

What I’ve concluded is that there are some things that my clients can do to improve the process of working together. Also, if a team doesn’t do all of these things, I’ll still work with them, but I’ll likely suggest that I work on this list first.

  • test suite

    One of my grad school professors spent endless hours trying to convince me that being strict about things like testing and MVC actually made you more agile over the long run. After a few months of working on projects in the “real world” I often couldn’t remember why I wrote a piece of code the way I wrote it, which meant that I didn’t know what I’d break by changing it. If I’d written tests I’d be in a much better situation.

    Of course, a test suite is only valuable to the extent that it covers the full usage of the software. 100% coverage is rarely worth the effort, but there is plenty of good to be had from high levels of coverage (in the 90-100% range). In general, make sure it actually tests what it needs to test.

  • continuous integration

    Not as big a win as just having a test suite, but it’s a good way to nudge you towards having the suite remain unbroken. The more important step is having an agreement on your team to keep the test suite up to date. This agreement requires investment from people who don’t write the code.

    The continuous integration system should also produce code coverage reports, so that you can keep an eye on how well your test suite is doing at testing your code.

  • a vision

    You don’t need a detailed long term vision, but if you want help at the architecture level, I need to know what’s at the horizon, which will be a different amount of time, depending on the stage your company is in. If it’s a brand new startup, a few weeks might be the horizon. For an established company it could be years.

  • a plan

    Who’s working on what? How are they going to do it?

  • a schedule

    If you’re planning on a release/launch anytime soon you should know the prerequisites for launch and an estimated date.

    Your plan, schedule and priority list are all interrelated. My working hypothesis is that you can only decide two of the three, with the other one being derived from the previous two. Note that this is related to the project triangle concept.

    Typically the way I like to work is to set the vision and schedule and figure out the plan from that. So, your vision could dictate features A through Z, each of which would take a week of work, but you’d like to launch something within a month, so we plan on doing features A-D.

  • one-step dependency setup

    In rails, rake gems:install should give me everything I need. I’m sure there are equivalents in other languages and frameworks. If not, there should be.

  • shared workspace for design/planning

    Whatever you do, don’t email me a Word or PDF document.

  • appropriate level of access

    If you want me to help you improve your architecture, I need to know what your current architecture is. Seems simple, but I’ve had this problem before.

  • if it’s important it should be written down

    This is mostly a collaboration issue. I’m fine with spending time in your office or with the occasional phone call, but if the only way to get things done is by being in your office and talking to people face to face or overhearing others’ conversations, I’m not going to get much work done. The work of building software requires (at least for me) times of deep, focussed thinking that can’t be done in a room with twenty people talking to each other. Also, I find that meetings are generally a waste of time, because most people (myself included) don’t know how to run meetings effectively. So, unless you’re very good at running meetings, they should be kept to a minimum.

(Thanks to Cameron, Eric and Randy who all read drafts of this. Of course they didn’t give me any constructive feedback, they just told me to go ahead and post it.)