Archive for April, 2005

Working from coffee shops

Monday, April 11th, 2005

Jonas posted a link today to a post of his on working from coffee shops.

I’ve certainly seen the same effect in my life. In fact I’m posting this from the coffee shop on the USF campus right now. I find that I get more work done when I’m at the coffee shop, computer lab or library.

One cause of this productivity boost may be peer pressure: when you’re working at home, no one can walk past you and see that you’re playing blogshares.

Well, uhh…..

/me closes blogshares browser window

Plus there are cute girls to look at (and occasionally talk to). :-)

Tags and Term extraction

Sunday, April 10th, 2005

Recently, Yahoo released a new web service that lets you send Y! a bit of text and get back the most significant terms in it. It seems pretty obvious that they’re using something along the lines of TF-IDF (term frequency-inverse document frequency), which means they’re returning the most ‘statistically significant’ terms.
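To make the TF-IDF idea concrete, here is a minimal Python sketch that ranks a text’s terms by term frequency times inverse document frequency against a small background corpus. It only illustrates the general technique; Yahoo’s actual service and weighting are unknown, and `tokenize` and `top_terms` are hypothetical names.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Crude tokenizer: lowercase, keep runs of letters only (no stemming).
    return re.findall(r"[a-z]+", text.lower())


def top_terms(doc, corpus, k=3):
    """Return the k terms of `doc` with the highest TF-IDF score.

    `corpus` is a list of background documents; `doc` itself is counted
    as one more document, so every term has a document frequency >= 1.
    """
    documents = [set(tokenize(d)) for d in corpus] + [set(tokenize(doc))]
    n = len(documents)
    tf = Counter(tokenize(doc))
    scores = {}
    for term, count in tf.items():
        df = sum(1 for words in documents if term in words)
        # A term found in every document gets idf = log(1) = 0.
        scores[term] = count * math.log(n / df)
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


corpus = ["the cat sat on the mat", "the dog sat on the log"]
print(top_terms("the cat chased the mouse", corpus))  # ['chased', 'mouse', 'cat']
```

Note that “the”, which appears in every document, scores zero while rare words float to the top. The terms are statistically unusual, which is not the same thing as meaningful.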

Jonas Luster, international man of mystery and WordPress Inc* employee #1, has created a WordPress plugin that uses the term extraction service to add Technorati tags to his posts. It’s a really cool feature, but misguided.
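For anyone who hasn’t seen one, a Technorati tag is just a link with `rel="tag"` whose last URL path segment names the tag. Below is a hypothetical Python sketch of turning a list of extracted terms into such links. It is not Jonas’s plugin (which is a WordPress plugin), and the lowercasing and `+` for spaces are assumptions.

```python
from html import escape
from urllib.parse import quote_plus


def technorati_tag_links(terms):
    """Render each term as a Technorati-style tag link.

    Per the rel-tag convention, the tag is the last path segment of the
    href. Lowercasing and '+' for spaces are assumptions of this sketch.
    """
    links = []
    for term in terms:
        slug = quote_plus(term.lower())
        links.append(
            f'<a href="http://technorati.com/tag/{slug}" rel="tag">{escape(term)}</a>'
        )
    return " ".join(links)


print(technorati_tag_links(["WordPress", "term extraction"]))
```

Because the input terms come from a machine, anyone running the same extraction on the same post would produce the same links.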

First of all, I think it is functionally unnecessary. The plugin isn’t adding anything in terms of value or content. Why? Because any consumer of the content could do the exact same term extraction. In other words, these tags are noise, which leads me to my second point.

Second, even Jonas himself worries that this method could dilute or even pollute the Technorati tag ecosystem. I think it’s obvious from the tags the plugin is generating on his blog that the extracted terms are not nearly as useful as his own tags. For example, is this post really either beautiful or about beautiful? Surprisingly, Dave Sifry, CEO of Technorati, is excited about the new plugin.

A core problem with this approach to classifying text is that the method is text-only. It seems to me that this is the same problem pre-Google search engines had: they didn’t consider any data outside the text. Google came along and used the link graph between pages as a signal for relevance ranking. I like tagging because it is more ‘data outside the text’ that can be exploited.
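As a rough illustration of ranking by links rather than text, here is a simplified PageRank computed by power iteration in Python. This is the textbook formulation, not whatever Google actually runs; the damping factor of 0.85, the iteration count and the four-page link graph are illustrative assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    `links` maps each page to the list of pages it links to. Pages with
    no outgoing links spread their rank evenly over all pages.
    """
    pages = sorted(set(links) | {p for targets in links.values() for p in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page: distribute its rank uniformly.
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank


web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
print(max(ranks, key=ranks.get))  # prints c
```

Nothing in the computation looks at what the pages say: page `c` wins purely because the other pages point at it.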

Third, I thought the whole idea of tags was to have human-generated metadata. Am I wrong? Seriously, collaborative tagging (à la del.icio.us and Flickr) is a revolution in community-created metadata. Likewise, Technorati tagging is a significant improvement in author-created metadata.

This brings me back to my first point: there’s nothing meta about machine-generated keywords (especially if others can use a similar generator), because it’s information derived directly from the data.

Jonas admits that this is really just an experiment:

Indeed, Yahoo! Terms are not how we see the world, they’re how Yahoo!, a machine, sees it. Let’s leave them in, for a while, and see what pans out :)

Jonas: I want to know how you see the world. I’m tired of hearing how Yahoo and company view the world, and if I care about how they view the world, I’ll ask them. Oh, and continue the experiment, but with all due respect, I hope it fails. :-)

* Or is it WordPress Foundation now?

Google Maps in the real world

Thursday, April 7th, 2005

This is damn cool.

An Evolutionary Revolution

Thursday, April 7th, 2005

On the shoulders of giants…

A revolution, slowly, is happening to the Web.

Many call the changes that are occurring Web 2.0, and I think the analogy is quite useful. It seems that the Web has reached a degree of stability: browsers are relatively standards-compliant and useful, and stuff generally works, which opens up the opportunity for people to innovate.

One vision for the next iteration of the Web is called the Semantic Web. The idea is that we’ll build a web that is structured and meaningful (to computers, not humans). The vision comes from Tim Berners-Lee and is essentially a distributed knowledge system based on a markup format called RDF, a way to encode logical statements about anything in XML. It is, of course, also extensible at the edges (meaning anyone can add content and meaning to it). That is, if they understand the formats.
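Underneath the XML syntax, RDF’s data model is a set of (subject, predicate, object) statements. Here is a minimal Python sketch of that model using plain tuples and a wildcard pattern match. It is not RDF/XML and not a real RDF library; the `example.org` URIs and the `match` helper are hypothetical, and the FOAF property URIs are borrowed only as familiar vocabulary.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/people/"  # hypothetical namespace for this sketch

# Each statement is a (subject, predicate, object) triple.
triples = {
    (EX + "alice", FOAF + "name", "Alice"),
    (EX + "alice", FOAF + "knows", EX + "bob"),
    (EX + "bob", FOAF + "name", "Bob"),
}


def match(s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Whom does Alice know, and what are their names?
for _, _, friend in match(s=EX + "alice", p=FOAF + "knows"):
    for _, _, name in match(s=friend, p=FOAF + "name"):
        print(name)  # prints Bob
```

Everything here is meaningful to a program and invisible to a person reading a web page, which is exactly the machines-first trade-off discussed next.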

The Semantic Web would be a discontinuity from the current Web that we all know, mainly because it’s primarily for machines and only secondarily for humans. I think we can do better.

The ideas I’m putting across are by no means new to many people, but I’ve been thinking about them this evening in response to a paper we read for class and would like to distill and summarize my viewpoint here.

The above paper presents some interesting technology built on a prototype Semantic Web. The problem is that their rationale for building the Semantic Web is wrong:

…because HTML marries content and presentation into a single representation…

Certainly, HTML can be an intermixing of content and presentation, but it doesn’t have to be. In fact, it shouldn’t be.

With the advent of CSS and XHTML, markup can now be semantic (note the lowercase ‘s’): it can have meaningful structure that is independent of how the content is presented in a browser.

So we already have a web: one that can be used to create semantic content yet is, at the same time, presentable to users in its native form. As Tantek Çelik has said, “users first, computers second.”

Going the route of the Semantic Web would be like throwing out the source code for a mature product and rewriting it from scratch. Ask Netscape how well that works!

The Revolution Has Begun

Led by Tantek Çelik, Matt Mullenweg, Eric Meyer, Kevin Marks and others whom I’m sure I insult by omitting, a new set of standards, dubbed microformats, is appearing. These standards specify ways to mark up XHTML so that the content carries meaning. Some examples include: Votelinks, NoFollow, hCard, hCalendar, podcasting, blogchalking, xfn, RelLicense, RelTag, xFolk, and online news.

The promise of microformats is that they offer machine-usable data while at the same time providing human-usable, presentable content.
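To show what “machine-usable yet human-presentable” means in practice, here is a toy Python sketch that uses the standard library’s `html.parser` to pull an hCard `fn` (formatted name) and rel-tag links out of ordinary markup that a browser would simply display. The `MicroformatParser` class and the sample page are hypothetical, and a real microformats parser handles far more of the hCard specification.

```python
from html.parser import HTMLParser


class MicroformatParser(HTMLParser):
    """Collect hCard 'fn' names and rel="tag" links from an HTML page.

    A toy sketch: it only handles the 'fn' class and rel-tag, and it
    assumes elements marked 'fn' contain plain text (no nested tags).
    """

    def __init__(self):
        super().__init__()
        self.names = []
        self.tags = []
        self._capture = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "fn" in classes:
            self._capture = True
            self.names.append("")
        rels = (attrs.get("rel") or "").split()
        if tag == "a" and "tag" in rels and attrs.get("href"):
            # rel-tag: the tag is the last path segment of the href.
            self.tags.append(attrs["href"].rstrip("/").rsplit("/", 1)[-1])

    def handle_endtag(self, tag):
        self._capture = False

    def handle_data(self, data):
        if self._capture:
            self.names[-1] += data


page = """
<div class="vcard">
  <a class="url fn" href="http://example.com/">Jane Doe</a>,
  <span class="org">Example Co</span>
</div>
<p>Filed under <a href="http://technorati.com/tag/microformats" rel="tag">microformats</a></p>
"""
parser = MicroformatParser()
parser.feed(page)
print(parser.names)  # ['Jane Doe']
print(parser.tags)   # ['microformats']
```

The same page renders as a plain name, a company and a link for a human reader. The class and rel attributes cost nothing visually but give a program something to hold on to.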

I think what we’re seeing is a stage of evolution that will have revolutionary impact. This movement toward semantic, well-structured markup that is separated from presentation will bear other fruit as well. In many ways, AJAX, the new buzzword that encompasses all sorts of cool client-side JavaScript magic, has been enabled by the maturing of CSS.

Please, let’s forget about trying to build a new Semantic Web; let’s make the one we already have (and love) semantic.

The revolution will be evolutionary.

Viva la revolution!

contemporary christian porn

Monday, April 4th, 2005

Brandon has just compared porn to something interesting. I’m not going to say what it is he compared it to, but I’m sure you can figure it out.