
Weblog Search! Get Yer Fresh Weblog Search Here!

May 23rd, 2003

Dave Winer implemented a weblog search using the Google API. He seems really thrilled by it.
‘Cept it’s been done. Speaking of prior art, a Google API-based weblog search tool has already been built by Micah Alpern. Its results page isn’t nearly as nifty as Dave’s, but Micah’s does integrate with your blogroll to add a “Search Blogs I Read” feature. That is nifty.
However, both Dave’s and Micah’s tools have a fatal flaw: Google.

According to Google’s cache, the last time Google crawled Scripting News was May 16. (Also the last date it crawled 10RW.) If you do a search today for “weblog search” on Dave’s site, the results are incomplete. Google hasn’t yet crawled anything he’s written recently.
Movable Type, on the other hand, has a search tool already built in. (See the little search box over on the side?) I don’t know what Ben and Mena are using to power it, but whatever it is, it indexes every word of my weblog immediately. It’s probably extremely low overhead, as it doesn’t have to “crawl” — it can just index each post as you post it.
I can even search for “the” and get every post back. (Well, I assume every post; I didn’t check to see if there are posts where I didn’t use “the.”)
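I don’t know how Movable Type’s search actually works under the hood, but the “index each post as you post it” idea is easy to sketch. Here’s a minimal version, assuming a simple in-memory inverted index (each word maps to the set of posts containing it). The `Post` and `WeblogIndex` names are hypothetical, not MT’s. The index is updated inside `publish`, so a post is searchable the moment it’s saved, and common words like “the” are kept rather than dropped.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Post:
    post_id: int
    title: str
    body: str
    published: datetime


class WeblogIndex:
    """Hypothetical per-weblog index that is updated at publish time (no crawling)."""

    def __init__(self) -> None:
        self.posts: dict[int, Post] = {}
        # Maps each lowercased word to the set of post IDs that contain it.
        self.postings: dict[str, set[int]] = defaultdict(set)

    @staticmethod
    def _tokenize(text: str) -> list[str]:
        # Keeps every word, including common ones like "the".
        return re.findall(r"[a-z0-9']+", text.lower())

    def publish(self, post: Post) -> None:
        # Re-publishing (editing) a post first drops its old postings.
        if post.post_id in self.posts:
            old = self.posts[post.post_id]
            for word in self._tokenize(old.title + " " + old.body):
                self.postings[word].discard(post.post_id)
        # Index the new text immediately, so it is searchable right away.
        self.posts[post.post_id] = post
        for word in self._tokenize(post.title + " " + post.body):
            self.postings[word].add(post.post_id)

    def search(self, query: str) -> list[Post]:
        # Return posts containing every query word, newest first.
        words = self._tokenize(query)
        if not words:
            return []
        ids = set.intersection(*(self.postings.get(w, set()) for w in words))
        return sorted((self.posts[i] for i in ids), key=lambda p: p.published, reverse=True)


index = WeblogIndex()
index.publish(Post(1, "Weblog search", "Google has not crawled this yet.", datetime(2003, 5, 23, 10, 0)))
index.publish(Post(2, "Freshness", "Weblog search should be fresh.", datetime(2003, 5, 23, 12, 0)))
print([p.post_id for p in index.search("weblog search")])  # [2, 1]
```

The work per post is proportional to the length of that one post, which is why this approach stays cheap for a single weblog: there’s nothing to crawl and no waiting for the next pass.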
Like Dave’s tool, Movable Type’s search presents the results reverse-chronologically. Instead of providing the whole weblog post (which could be overkill and wasted resources, unless you’re as pithy and brief as Dave), it excerpts the post and provides a link to the full post. If you’re logged in to Movable Type, it also provides an Edit link that kicks you right into the editing form. Ooh, did I mention you can also use regular expressions?
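To make the excerpt-plus-regex behavior concrete, here’s a rough sketch using Python’s standard `re` module. It assumes posts are plain `(published, title, body)` tuples and that an excerpt is a fixed number of characters around the first match. `regex_search` and its `context` parameter are hypothetical names for illustration, not Movable Type’s.

```python
import re
from datetime import datetime


def regex_search(
    posts: list[tuple[datetime, str, str]], pattern: str, context: int = 40
) -> list[tuple[datetime, str, str]]:
    """Return (published, title, excerpt) for each post whose body matches
    `pattern`, newest first. Only the post body is searched in this sketch."""
    rx = re.compile(pattern, re.IGNORECASE)
    hits = []
    for published, title, body in posts:
        m = rx.search(body)
        if m is None:
            continue
        # Excerpt: up to `context` characters on either side of the first match.
        start = max(m.start() - context, 0)
        end = min(m.end() + context, len(body))
        excerpt = body[start:end]
        if start > 0:
            excerpt = "..." + excerpt
        if end < len(body):
            excerpt += "..."
        hits.append((published, title, excerpt))
    # Reverse-chronological, like the weblog itself.
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits


posts = [
    (datetime(2003, 5, 16), "Old post", "Nothing about searching here."),
    (datetime(2003, 5, 23), "Weblog search", "Dave Winer built a weblog search on the Google API."),
]
for published, title, excerpt in regex_search(posts, r"weblog\s+search"):
    print(published.date(), title, excerpt)
```

Returning an excerpt instead of the full post keeps the results page short. The link to the full post (and, for a logged-in author, an Edit link) would be built from each hit.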
Most importantly, I don’t have to wait for Google to crawl my weblog. The Movable Type weblog search will return hits on the stuff I blogged minutes ago. As Dave would say, Bing! That’s killer.
It seems to me like Google is overkill for a weblog search. Google’s great because it scales for humongobytes (one humongobyte = a gazillion terabytes) of information. But for the amount of content in an individual weblog, you don’t need the scalability of Google. What you do want (at least what I want) is the freshness of having everything indexed as soon as it’s posted.
Blogger has this functionality, but only on the authoring side: as an author you can search all your posts, but they don’t expose that search to readers the way MT does. They should.
Why don’t Manila and Radio already have the kind of search capability MT has? Or do they? Or am I missing the sparkliness of Dave’s tool?
Bottom line: using Google to search your own weblogs, you’re sacrificing freshness for scalability that you don’t need. And freshness is what makes weblogs tasty. [Homer Simpson voice] Mmmmm. Weblogs.

Greg Weblogs

  1. May 23rd, 2003 at 12:59 | #1

    I run a Radio blog and I used to use Atomz as my search engine. It worked great. Then Micah’s application came along and I shelved Atomz for it, not because it provided a better search for my own blog (it didn’t) but because it offered me the ability to search either my blog or all the blogs I read in one search interface. It is the latter set of blogs (the ones I read) that is the real power in Micah’s app. I agree that Google for one’s own blog isn’t the best way to go. Another problem I have is that Google considers searches restricted by “site:” (which is essentially what Micah’s call to the API does) to cover all of the pages on that server (in my case everything on http://www.island.net, my ISP’s server) and not just everything on my site (in my case only the subdirectories http://www.island.net/~leslies/blog), so depending on the search term you use you can still get a lot of spurious results. And you’re right, Radio does not have a built-in search. In theory there’s no reason why the Radio hosting service couldn’t provide one, but the software is built so that pages can be hosted anywhere without any special server configuration, so the absence of a search feature is both a feature and a missed business opportunity for the hosting service.

  2. May 23rd, 2003 at 16:30 | #2

    I used Atomz on the old version of this weblog. In my experience, it was particularly bad. It didn’t seem to index the site very well at all, so the results were almost always incomplete.
    I’m incomparably pleased with MT’s search capability, though.
