the ryan king

Then I had a moment when things suddenly became clear to me– Google is a prime target for scraping and therefore the increased bandwidth and ugliness they get from using table-based layouts is probably offset by the scraping they prevent through obfuscation. If their markup were cleaner and more meaningful, they would be much easier to scrape.*

Then another revelation came.

In the process of developing microformats, what we’re really doing is making the Web more scrape-able. By establishing some guidelines we make it so that if we want to scrape a certain kind of data, we should be able to get it from all sites that adhere to the microformat with minimal difficulty.
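To make that concrete, here’s a minimal sketch of the consuming side, in Python with only the standard library. One parser pulls contact data out of any page that uses hCard class names. It understands only a tiny slice of hCard (`vcard`, `fn` and `url`) and assumes clean, well-formed markup. The sample page and names are made up for illustration, and the real hCard spec defines many more properties.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class HCardParser(HTMLParser):
    """Collects a small subset of hCard: the text of class="fn" and the
    href of class="url", for every element with class="vcard"."""

    def __init__(self):
        super().__init__()
        self.cards = []      # one dict per vcard found
        self._depth = 0      # element depth inside the current vcard (0 = outside)
        self._fn_depth = 0   # depth at which the current "fn" element opened (0 = none)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self._depth:
            self._depth += 1
            if "fn" in classes and not self._fn_depth:
                self._fn_depth = self._depth
            if "url" in classes and attrs.get("href"):
                self.cards[-1]["url"] = attrs["href"]
        elif "vcard" in classes:
            self._depth = 1
            self.cards.append({"fn": "", "url": None})

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        if self._depth == self._fn_depth:
            self._fn_depth = 0
        self._depth -= 1

    def handle_data(self, data):
        if self._fn_depth:
            self.cards[-1]["fn"] += data


def parse_hcards(markup):
    """Return a list of {"fn": ..., "url": ...} dicts found in `markup`."""
    parser = HCardParser()
    parser.feed(markup)
    parser.close()
    # Collapse the whitespace that comes from line breaks in the markup.
    return [{"fn": " ".join(c["fn"].split()), "url": c["url"]} for c in parser.cards]


if __name__ == "__main__":
    page = """
    <div class="vcard">
      <a class="url fn" href="http://example.com/">Ryan
         King</a>
    </div>
    <p>Unrelated text</p>
    <div class="vcard"><span class="fn">Matt</span></div>
    """
    print(parse_hcards(page))
    # [{'fn': 'Ryan King', 'url': 'http://example.com/'}, {'fn': 'Matt', 'url': None}]
```

Because it keys off agreed-upon class names rather than one site’s layout, the same function should work on any page that follows the format.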

This led to another revelation.

I think there is probably a class of REST webservices that could disappear (or not appear, but I’ll get to that in a second).

I think query-only webservices could probably be replaced by a well-planned URL scheme and clean, meaningful markup, which in my mind includes judicious use of microformats.
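Here’s a rough sketch of what that might look like on the publishing side, again in Python with only the standard library. The `/people/<slug>` URL scheme, the sample records, and the `hcard` and `handle` functions are all hypothetical. The idea is just that each query is a plain GET on a readable URL, and the response is ordinary XHTML whose hCard class names make it machine-readable, so there’s no separate XML format to maintain.

```python
import re
from html import escape

# Hypothetical data store for a small site.
PEOPLE = {
    "ryan": {"name": "Ryan King", "url": "http://example.com/ryan"},
    "matt": {"name": "Matt", "url": None},
}

# Hypothetical URL scheme: every query is a plain GET on a readable path.
ROUTES = [
    (re.compile(r"^/people/?$"), "list"),
    (re.compile(r"^/people/(?P<slug>[a-z0-9-]+)/?$"), "person"),
]


def hcard(person):
    """Render one person as a minimal hCard fragment (fn and url only)."""
    name = escape(person["name"])
    if person["url"]:
        url = escape(person["url"])
        return f'<div class="vcard"><a class="url fn" href="{url}">{name}</a></div>'
    return f'<div class="vcard"><span class="fn">{name}</span></div>'


def handle(path):
    """Return (status, XHTML fragment) for a GET request on `path`."""
    for pattern, kind in ROUTES:
        match = pattern.match(path)
        if not match:
            continue
        if kind == "list":
            return 200, "\n".join(hcard(p) for p in PEOPLE.values())
        person = PEOPLE.get(match.group("slug"))
        if person:
            return 200, hcard(person)
    return 404, "<p>Not found</p>"


if __name__ == "__main__":
    print(handle("/people/ryan"))
    print(handle("/people/nobody"))
```

A real site would wrap something like `handle` in an HTTP server and return complete XHTML documents. This only shows the routing and the markup.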

Of course, with each page request there would be some wasted bandwidth (presentational markup a program doesn’t need), so high-traffic webservices like Google, Amazon, eBay or Alexa probably wouldn’t work in this manner. A smaller website, though, could probably get away with saying “just scrape us” and avoid having to expose their data both in XHTML and some other homegrown XML format.

Additionally, with clean, valid, meaningful XHTML, what we’re talking about is no longer scraping, but parsing or consuming. I think this is a “difference in degree that results in a difference in kind.”

* After discussing this with Matt and Niall, I realized that Google probably isn’t maintaining the crusty layout to prevent scraping. Rumor has it that Google’s layout system is ‘inflexible’, and Niall pointed out that there are some cryptic-looking HTML comments in the Google source that facilitate the scraping. Still, I think my conclusion holds, and even if I’m wrong about Google, I’m putting this out there to show my thought process.

New Adium

Wednesday, May 4th, 2005

I just installed the new version of Adium– very cool stuff, especially the Growl integration.

One random observation, though….

My contacts list has changed to use iChat-style colored buttons next to user names, rather than just greying out or changing the background color of the contact’s line.

I don’t like this behavior, because the red dots catch my attention much more than the green dots. When I’m looking at my contacts list I usually just want to see who’s available, not who’s ‘not available.’

Of course the red-yellow-green system is borrowed from traffic lights, where having red as the loudest color is a safety *feature*.

Here it’s a bug.

OK, not a bug, but, like I said….. random observation.

Archive for May, 2005
