Friday, June 30, 2006


The Organization of Information about Information Organization

At the UC Berkeley School of Information I have the great privilege and responsibility of teaching a required course called "Information Organization and Retrieval" to all incoming graduate students in our master’s program. This program attracts a wonderfully diverse set of students, some right out of college with computer science degrees, some with a few years of information industry experience, and some with social science or humanities orientations. But this heterogeneity makes it challenging for the IO&IR course to establish the foundations and framework for the program.

To me it seems natural to teach information organization and information retrieval in a single course because they are inherently interconnected. We organize to enable retrieval, and the more effort we put into organizing information, the more effectively it can be retrieved. Conversely, the more effort we put into retrieving information, the less it needs to be organized first. This is the tradeoff embodied in the contrast between human classification, as practiced by libraries and by Yahoo in its original approach to the web, and Google’s computer analysis of the web’s link and text co-occurrence patterns. We can analyze this tradeoff in terms of the intellectual or computational investments made and the subsequent allocation of costs and benefits between the information organizer and the information retriever. Of course, the relationship between these two parties is critical to the tradeoff: sometimes they are one and the same, sometimes they belong to the same company or social group, and sometimes they have no knowledge whatsoever of each other. That’s why this is all such interesting stuff to think about and teach.
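The organize-now versus retrieve-later tradeoff can be made concrete with a toy Python sketch. It is only an analogy, not a description of how any library, Yahoo, or Google actually works: the three sample documents and the names `build_index`, `lookup`, and `scan` are invented for illustration. One path spends effort up front building an index so that each query is cheap; the other does no organizing and pays for it by reading everything on every query.

```python
from collections import defaultdict

# Hypothetical sample collection, invented for this illustration.
DOCS = {
    1: "library catalog classification",
    2: "web search link analysis",
    3: "library web search",
}

def build_index(docs):
    """Organize up front: map each word to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[word].add(doc_id)
    return index

def lookup(index, word):
    """Cheap retrieval: a single dictionary lookup, paid for by the earlier organizing."""
    return sorted(index.get(word, set()))

def scan(docs, word):
    """No organizing: every query reads every document."""
    return sorted(doc_id for doc_id, text in docs.items() if word in text.split())

index = build_index(DOCS)        # the organizer's investment, made once
print(lookup(index, "library"))  # the retriever's cost per query is small
print(scan(DOCS, "library"))     # same answer, but all the cost falls on the retriever
```

Both calls return the same documents; what differs is who does the work and when, which is the allocation of costs and benefits between organizer and retriever.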

A year ago, when I was first getting ready to teach the IO&IR course in Fall 2005, I was a little surprised to learn that there isn’t any textbook that emphasizes this yin and yang of IO {and,or,vs} IR. Instead, there are books that teach IO, and books that teach IR. I guess that’s because the key concepts of IO -- categorization, classification, metadata, modeling, tagging, facets, thesauri, ontologies, information architecture, interoperability, integration … -- are more abstract and conceptual than those of IR, which are more technical -- indexing, weighting, filtering, crawling, clustering... You find IO books aimed at library and information science students and IR books aimed at computer science and computational linguistics students.

So for the IO part of the course last year, the text I used was Arlene Taylor’s "The Organization of Information." This textbook has been used in our school since 1998, when the IO&IR course was first taught, and it is undoubtedly the definitive text for students in library and information science programs. It emphasizes the foundational concepts and methods of bibliographic description and classification from these disciplines, and I thought that this would give some useful perspective to our students, who almost too eagerly embrace new technology as "progress" or who are inadequately appreciative of the value embedded in these traditional approaches. But even though I used the recent 2004 edition, my students generally dismissed the Taylor book as reactionary and as offering few insights into the current topics that most intrigued them, like social organization on the web, digital multimedia, or domain-specific metadata standards.

I’ve now spent nearly a month revising the IO&IR course for Fall 2006, and in particular I’ve been looking for a book to replace Taylor. My first candidate was Peter Morville’s "Ambient Findability," which I had high hopes for because Morville is a library scientist by training who evolved to co-author "Information Architecture for the World Wide Web," the popular O’Reilly "polar bear" book.

Morville says "ambient findability describes a fast emerging world where we can find anyone or anything from anywhere at anytime" and that’s a great theme on which to base a book. The book is easy to read and my students would probably have liked it, but I just can’t use it as a textbook. Taylor comes across as tedious in her description of cataloguing and controlled vocabularies, but she’s rigorous and practical. Morville comes across as glib and shallow, with many clever examples but not enough detail to know how to do anything. To be fair, maybe Morville isn’t trying to write a textbook, but I suspect he’s capable of writing one, so it is unfortunate that he didn’t.

I then discovered a wonderful and deep little book by Elaine Svenonius called "The Intellectual Foundation of Information Organization." Svenonius is an emeritus professor of Library and Information Studies at UCLA, and my first thought was that even though the title sounded perfect, the book would just be Taylor in a more theoretical wrapper. But to Svenonius,

"much of the literature… is inaccessible to those who have not devoted considerable time to the study of the disciplines of cataloguing, classification, and indexing… It mires what is theoretical interest in a bog of detailed rules... This book is an attempt to synthesize this literature in a language and at a level of generality that makes it understandable to those outside the discipline."

Svenonius takes on the fundamental challenges of determining what to describe, describing it, classifying it, and ensuring that the descriptions and classifications will be comprehensible to others – and pulls it off. Now she’s not as readable as Morville, nor as practical as Taylor, but I think that Svenonius is going to be good for our students. Some of them will go on to work for Yahoo and Google and so on, and they will appreciate that they had a chance to think deeply about these challenges of information organization before they faced the hard reality of designing, building and deploying information and applications that have to deal with them.

-Bob Glushko

Tags: BookReview, InformationOrganization, InformationScience, UCBerkeley

I agree with you about Ambient Findability; I found it fun, but not particularly deep or useful.

Recently I've been reading through Soumen Chakrabarti's Mining the Web, which is one of the more approachable and practical works on retrieval I've read, though with a focus on hypertext discovery. It's also a bit more up to date than most of the other works I've read, which were written in the 90s.

Hi Bob,

I wrote about Morville and Svenonius's books in the same posting several months ago (Metadata since the nineteenth century).

Bob DuCharme