My Impressions of Google Wave

Google Wave was in the news this past week: Sept. 30th was the date the platform was opened to the wider developer community. The actual big news event was last May, when they presented a demo at their developers conference. The video of that demo is an hour and 20 minutes long, but quite illuminating. I sat down over the weekend and watched it, expecting it to be a slog, but it was quite entertaining.

While watching it, I took some notes, and what follows is a digested version of my immediate responses to the demo.

First off, all of these web services have one main flaw, a single point of failure, i.e., network connectivity (either yours or Google’s). If, for instance, you use Google-hosted services for your Wave conversations and your Internet is down, you’re dead in the water. Of course, if your Internet is down, you also can’t receive email, so perhaps that’s not so big a deal, but the advantages of Wave come in the real-time and near-real-time collaboration, whereas email suffers very little from the latency that a local Internet connection failure imposes.

Google certainly has big pipes to and from their servers and lots of redundancy, but they occasionally do have failures in some of their apps that cause them to be unreachable or slow. But consider if you decide to run your own Wave server — likely you would do so on a commercial hosting service rather than on your local office’s servers, but either way, you’re again in the situation where a crucial app is network-based, and only as reliable as the networks you depend on.

Embedding the Wave in a Blog:
Is just anyone allowed to participate in a Wave embedded in a blog, or do you have to have user authentication in place?

SPELLCHECK DEMO
Brilliant application of their Google search spelling algorithms, but how often does it actually fuck up?

IM MODEL
The default is that edits are immediately visible; indeed, they haven’t even built the feature to hide immediate updates. If even 10% of your comments are things you don’t want others to see until you’ve finished them, how many times will you end up accidentally sharing something half-composed? It seems to me this makes it necessary to be very aware of the nature of the communication before you initiate it: with certain people you’d want the default to be HIDE updates until SEND, while with others, you’d want it to default to immediate.

A complicated problem, and one that will, I think, cause endless problems for end users — how many people pay for the email services that allow you to undo SEND in an email?

EXTENSIONS vs. ROBOTS
The presentation has confused me. I thought I understood the difference, but now I’m less sure. They mentioned a distinction that a robot is server-side and an extension client-side, but the demo of Polly the Pollster seemed to obscure this — I feel less of a distinction now than I did before going into the example. Perhaps this is because they’ve successfully abstracted the underlying technology so that to the user the difference is undetectable.

MORE CONFUSION
At the conclusion of the robot demo, Lars says “OK, that’s EXTENSIONS for you” (1:04), which just goes to show that I’m not the only one who is confused.

The difference is clearly server-side vs. client-side:

  1. an extension runs in the client. Updates to it get passed to the server as part of the wave and distributed through the normal wave distribution process.
  2. a robot is code running on the server that waits to see something pass by it in the wave that triggers its server-side behavior, whatever that may happen to be.
  3. robots have client-side UI elements that dump relevant XML into the wave, which will then trigger the server-side action from the robot.
  4. thus, robots are client-side extensions that modify the wave plus a server-side process that reacts to those changes to the wave stream.

So, robots and extensions are *not* two different things. A robot is just an extension paired with server-side actions.
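
To make that concrete, here is a minimal sketch in Python of the division of labor as I understand it. This is *not* the actual Wave robots API; the class, event, and method names are invented purely for illustration:

    # Hypothetical sketch, not the real Wave robots API: the class, event, and
    # method names here are invented to illustrate the client/server split
    # described above.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Blip:
        """A fragment of a wave, as the server would hand it to a robot."""
        content: str

    class PollRobot:
        """Server-side code: it sits on the server and reacts to changes in the wave."""

        def on_blip_submitted(self, blip: Blip) -> Optional[Blip]:
            # The client-side extension has already dumped its marker (here a fake
            # <poll> element) into the wave; the robot only acts when it sees it.
            if "<poll>" in blip.content:
                return Blip(content="Poll noted; tallying responses.")
            return None  # nothing in this blip triggers the robot

    if __name__ == "__main__":
        robot = PollRobot()
        # Simulate the wave server delivering a blip that an extension modified.
        incoming = Blip(content="Lunch: <poll>pizza|sushi|salad</poll>")
        reply = robot.on_blip_submitted(incoming)
        print(reply.content if reply else "no action")

The point of the sketch is simply that the "robot" part is nothing but server-side code listening to the wave stream; everything the user sees is an extension modifying the wave.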

SECURITY/CONTENT CONTROL
While it might seem tempting to integrate a Wave into a website, the problem is the same with Wave as with SideWiki — you don’t have control. It’s collaborative, so you have all the problems that come with collaboration, where everyone is equal and nobody gets veto power. That is, unless they’ve actually engineered it for superusers who have the privilege of editing the wave, i.e., removing from the playback those things that they don’t want part of the permanent record.

IMPORTANCE OF DEVELOPERS
Interesting that the conclusion of the talk is Google’s realization of the importance of developers in making a platform successful. This is something MS always understood, something that Apple has only imperfectly understood, and something that this video shows Google is obviously coming to understand.

Why I despise Microsoft

I read in The Register today about Microsoft’s release of a plugin for Firefox that will allow you to view Open XML documents (MS’s controversial XML-based document format). But the article in The Register gave no download link, so I thought “Grrr. Annoying Register writers — don’t they have any sense?”

So, I went to MS’s download site, and put in “OXML Firefox” and got no matches. I tried some variations and got nothing. So, I went to Google and searched on “microsoft Open XML plugin for firefox” and expected to see a Microsoft.com link somewhere at the top of the search results. No dice — all the links were for third-party websites. So I went to a reputable one (ZDNet) and expected to find a link. Once again, as with The Register, no link at all.

Now I was getting *rilly* annoyed. So I saw a link that I’d missed at the bottom of the first page of results — it was a Microsoft press release and on MS’s website. “Eureka!,” I thought — “that will surely be it!” The press release itself offered nothing, but there was a list of links at the right and the first link was to “Open XML Document Viewer,” and so I thought “Eureka!” again. But when I went to the page, it wasn’t on MS’s website — it was an open-source project, and I didn’t think it could possibly be the right site for this well-publicized plugin, since it listed only 448 downloads.

So I went back to Google and visited the first link that Google had brought up, a website I’d never heard of, Softpedia.com (hence my skepticism in not going there first). It took me right to a download page, and I clicked the DOWNLOAD button. This (as is so often the case) took me to a second page that listed download sites, but there was only ONE download site, so I had trouble finding the link. Finally, I clicked it and started the download. In the meantime, I’d already downloaded the viewer from the OpenXML Viewer Project website’s download page, and when the SAVE prompt popped up for the Softpedia.com download, I noted that the file name was the same as for the previous download. I renamed the file and then compared the two, and, of course, they were identical. *sigh*

This whole frustrating process left me with a number of questions:

  • Is MS trying to hide the fact that this is a non-MS project?
  • Are all the media outlets not providing a link because…um, well, er, because?
  • And why do download sites not have code that checks if there’s only one download site to choose from and automatically initiates the download from that single site, instead of offering the user the opportunity to “choose” the one site (which confused the hell out of me, because I couldn’t see the link)?

One might get the idea that MS is not all that enthused about promoting this thing.

Oh, last lesson: always trust Google to give you the right answer at the top of the results page.

Is Google in Danger of Falling from the Top of the Search Engine Heap?

Robert Cringely posts an article today on the subject of Google’s $4.6 billion offer to buy the 700MHz wireless spectrum. In the course of writing about this, he incidentally makes an interesting assertion:

Bill Gates likes to talk about how fragile is Microsoft’s supposed monopoly and how it could disappear in a very short period of time. Well Microsoft is a Pyramid of Giza compared to Google, whose success is dependent on us not changing our favorite search engine.

Now, I’m not sure that’s Google’s only advantage — they have their fingers in a lot of pies, not least of which is ad delivery. But I’m more interested in the question of whether Google is or is not the best search engine. I did some testing recently, motivated by a James Fallows article about a study of search engines financed by Dogpile.com (a search engine aggregator). The conclusion reached by the study was that no individual search engine is providing complete results, so you need a search engine aggregator to get the full picture.

I don’t think the conclusions are correct, because the study’s methodology was based on unique URLs, rather than on whether the results were useful to a human being. I sent the following email to Fallows (the spreadsheet referred to in the text is here — sorry about the awful MS-generated HTML, as I didn’t have the time to redo it properly).

There are a couple of big problems with the story:

  1. if you run a search on Google, Yahoo, Ask and Live and then run the same search on Dogpile, the Dogpile results do not actually replicate what shows up in the search engines it’s claiming to include.
  2. Dogpile returns a lot of bogus search results.

Attached is a spreadsheet that tallies up what’s going on for “antiquarian music,” a search term of interest to a client of mine (I am their webmaster, programmer and IT support person). What it shows:

  1. Eight of the 20 results on Dogpile’s first page are IRRELEVANT to the sought-after results. All 10 results on the first page of each of the four search engines are relevant (though not all equally so).
  2. None of the Ask.com results are included, despite the fact that Dogpile’s search page claims that it’s searching Ask.com.

Now, about the individual search results:

Google is by far the most relevant. While it doubles up for two sites, all the other results are relevant, being legitimate antiquarian music dealers. The only exception is the last entry, from the Ex Libris mailing list (antiquarian librarians), which is actually an announcement of a catalog from the dealer listed at #9, so if #9 is relevant, I think that one is, too — it certainly gives you information directing you to an antiquarian music dealer.

Yahoo includes two links to Harvard Library pages that are not useful (they aren’t selling anything), as well as links to Theodore Front and Schott, both of which are music publishers/distributors that no longer sell any antiquarian music materials. It also includes the Antiquarian Funks, a Dutch musical group, which obviously doesn’t belong, though it takes more than simple computer knowledge to understand that (Google seems smart enough to figure it out!).

Ask.com adds Katzbichler, a music antiquarian in Munich who doesn’t appear in the top 10 results of the others, but it also includes a worthless link to antiquarian music books on toplivemusic.com, which has nothing at all on it that is relevant to the search. It also gives top billing to Schott, which really offers no significant antiquarian music materials. Finally, it includes a link to a republication of the Open Directory’s (DMoz.org) listing for antiquarian music; these listings are republished all over the net and basically just replicate links already found in the main listings.

Live.com also includes the Schott link, as well as two links to the American Antiquarian Society’s page on sheet music. These may or may not be relevant, depending on the individual user: I doubt someone looking for antiquarian sheet music would omit the term “sheet music” from their search, and someone looking for antiquarian music dealers would not be helped by these links. It also includes the Antiquarian Funks and the unhelpful Open Directory category listing.

So, in short, for this particular search:

  1. Dogpile
    a. misrepresents the results (it doesn’t include what it says it does).
    b. does a worse job than any of the individual search engines in providing useful links.
  2. Of the individual search engines, Google provides clearly superior results, as it filters out several links that don’t belong (e.g., Schott, Theodore Front, Antiquarian Funks), though how Google knows such complex information is tough to say.

So, for this particular search, the conclusions of the cited study do not apply. I would expect that there are a number of such searches for which that is the case.

I cannot tell from the description of methodology what could cause this kind of discrepancy, but I am bothered by this on p. 11 of the PDF about the study:

When the display URL on one engine exactly matched the display URL from one or more engines of the other engines a duplicate match was recorded for that keyword.

The problem with that is that it fails to recognize links that differ in URL but are equivalent in value. A deep-linked page might be just as useful to a searcher as a link to the home page of an entire website — this depends on the type of website and the type of search. In the case of my spreadsheet, I counted all links to any one of the websites as equivalent, no matter which page was linked, because for this particular search, that’s the way a human being would treat them.
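
To illustrate, here is a small sketch with made-up URLs, with “equivalence” reduced to comparing hostnames, which is roughly how I treated the links in the spreadsheet:

    # Illustration only: invented result URLs showing how exact-URL matching
    # undercounts overlap that a human would treat as equivalent.
    from urllib.parse import urlparse

    engine_a = ["http://www.antiqmusic-dealer.example/",
                "http://www.schott.example/catalog"]
    engine_b = ["http://www.antiqmusic-dealer.example/catalog/2007.html",
                "http://www.schott.example/"]

    # The study's approach: a duplicate is recorded only when display URLs match exactly.
    exact_overlap = set(engine_a) & set(engine_b)

    # A human's approach for this kind of search: same site, same value.
    def site(url):
        return urlparse(url).hostname

    site_overlap = {site(u) for u in engine_a} & {site(u) for u in engine_b}

    print(len(exact_overlap))  # 0: by exact URL, the engines appear to disagree completely
    print(len(site_overlap))   # 2: in fact they point to the same two dealers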

So, I would say that this emphasis on unique URLs is going to skew the results for certain classes of websites and for certain types of searches. Yes, a search that takes you to a specific article on the Washington Post’s website is going to be much more helpful than a link to the paper’s home page, but for searches like my example, that’s just not the case.

Secondly, the emphasis on unique URLs also fails to reflect the different methods the different search engines use to eliminate duplicates. There can be more than one path to the same information, and if the search engines don’t all choose the same path, those URLs would be counted by the study mechanism as different, even though they provide the exact same information.
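
The same goes for trivial variations in how a URL is written. A quick sketch, again with made-up URLs:

    # Made-up example: two URLs the study would count as different results,
    # even though they lead to the same page.
    from urllib.parse import urlparse

    def normalize(url):
        """Very rough canonical form: drop the scheme, a 'www.' prefix, and any trailing slash."""
        p = urlparse(url)
        host = p.hostname or ""
        if host.startswith("www."):
            host = host[4:]
        path = p.path.rstrip("/") or "/"
        return host + path

    a = "http://www.katzbichler.example/catalog/"
    b = "https://katzbichler.example/catalog"

    print(a == b)                        # False: counted as unique URLs
    print(normalize(a) == normalize(b))  # True: the same information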

This study was designed in a way that was guaranteed to make a meta search engine like Dogpile appear to be better. But that apparent superiority is an artifact of the methodology — a statistical ghost produced by over-reliance on computer-based determination of URL identity, instead of evaluating, from a human being’s point of view, the equivalent value of different URLs.