A reader’s delight

Posted: February 15th, 2008, by Michael Jensen

I was Googling for something completely different today, using four terms and a “quoted phrase,” and had pared down the jillions to only 38 results. At the bottom of the first page of results was an oddity: My Favorite Books. I happened to notice the URL:

http://infolab.stanford.edu/~sergey/booklist.html

And thought: Stanford, Sergey… and clicked on it. And yes, it’s a looong 1998 list of “Sergey Brin’s favorite books,” from his Stanford days. His 1998 Web page is accessible from that page, where it becomes clear that this long list is something he used for “Extracting Patterns and Relations from the World Wide Web,” given at the WebDB Workshop at EDBT ’98. His home page is a charming little thing, fresh with the newness of the Web.

And so now, I link to it here. Not that Sergey needs links, but it is an example of the “search net” phenomenon: because I was using four terms and a phrase, my specificity enabled the serendipitous discovery of a substantive chunk of content.

This is worth thinking about for publishers, because searchers and researchers are increasingly using strategies like mine to make sense of the dense underbrush of the abundant Web. If that’s the case, and if encouraging people to “stumble upon” our books is a good thing, then it behooves us to make our content indexable, one way or another.

At the National Academies Press site, we include, on the first page of every chapter (the books are presented page-by-page), the full unformatted text of the first 10 and last 10 pages of that chapter. We also include key phrases extracted from the chapter. By doing this, we provide a huge, juicy target for search engines to slurp up.
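For the curious, here is a rough sketch of how that kind of indexable block could be assembled. It is not our actual code: the function name `indexable_text`, its parameters, and the sample chapter are all made up for illustration, and the key phrases are assumed to have been extracted already by some other process.

```python
def indexable_text(pages, key_phrases, n=10):
    """Build a plain-text block for a chapter's first page.

    pages       -- list of strings, one per page of the chapter (hypothetical input)
    key_phrases -- list of already-extracted key phrases (hypothetical input)
    n           -- how many pages to take from each end of the chapter
    """
    if len(pages) <= 2 * n:
        # Short chapter: the first n and last n pages would overlap,
        # so use every page once instead of duplicating text.
        selected = list(pages)
    else:
        selected = pages[:n] + pages[-n:]

    body = "\n".join(page.strip() for page in selected)
    phrases = ", ".join(key_phrases)
    return body + "\n" + phrases


# A made-up 25-page chapter: pages 1-10 and 16-25 end up in the block.
chapter = [f"text of page {i}" for i in range(1, 26)]
block = indexable_text(chapter, ["long tail", "open access"])
```

The only real design decision here is what to do with chapters shorter than 20 pages; this sketch simply includes the whole chapter rather than repeating the overlapping pages.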

Consequently, if someone’s putting three, four, or five terms into Google or MSN or wherever, and those terms happen to be in our chapter, then we’ll show up in the search results and get that wee bit of traffic. And a wee bit of opportunity to sell that book to someone who might be interested in it (note: only 0.24% of visitors currently buy anything from our site).

But those terms would almost certainly not all be in the book’s metadata, or in the publisher’s catalog blurb, or in the table of contents. It’s only openly indexable content that will provide a big enough pool of possibilities to match ever-more-esoteric and -specific search strategies: ‘net casting of a paragraph or a document, selectable groups of terms, phrase-pair searches, etc.
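A toy illustration of the point, assuming a searcher who requires every term to match. The title, the text excerpt, and the query below are invented for the example, and real search engines do far more than this simple all-terms check.

```python
import re


def tokens(text):
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches_all(query_terms, text):
    """True only if every query term appears as a word in the text."""
    words = tokens(text)
    return all(term.lower() in words for term in query_terms)


# Invented metadata and chapter excerpt, for illustration only.
metadata = "Groundwater Contamination: A Review of Remediation Options"
full_text = (
    metadata
    + " Pump-and-treat systems in fractured bedrock aquifers"
    + " often leave residual solvent plumes."
)

query = ["fractured", "bedrock", "solvent", "remediation"]

found_in_metadata = matches_all(query, metadata)
found_in_full_text = matches_all(query, full_text)
```

With this made-up data, the four-term query misses the metadata (which lacks "fractured," "bedrock," and "solvent") but hits the full text, which is the whole argument in miniature.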

I’m still convinced that for small-market publications in particular (the kinds of books for which significant promotion is generally hard to justify), openly indexable content is a precondition for survival, in terms of long-tail backlist success in the scholarly environment. People find something, link to it, and thus promote it for free, for us, in the venues that care about that publication.

This theme pertains a bit to my comment on Joe’s Baby and Bathwater post, on the University of Pittsburgh Press’s digital library experiment, though alas, I don’t think that UPP’s library provides any indexable content. Even rough OCR would help, and I hope it’s part of their plan, eventually.
