Doc Or Die: June 2006

Friday, June 30, 2006

The Organization of Information about Information Organization

At the UC Berkeley School of Information I have the great privilege and responsibility of teaching a required course called "Information Organization and Retrieval" to all incoming graduate students in our master’s program. This program attracts a wonderfully diverse set of students, some right out of college with computer science degrees, some with a few years of information industry experience, and some with social science or humanities orientations. But this heterogeneity makes it challenging for the IO&IR course to establish the foundations and framework for the program.

To me it seems natural to teach information organization and information retrieval in a single course because they are inherently interconnected. We organize to enable retrieval, and the more effort we put into organizing information, the more effectively it can be retrieved. Likewise, the more effort we put into retrieving information, the less it needs to be organized first. This is the tradeoff embodied in the contrast between human classification, the approach of libraries and (originally) of Yahoo to the web, and Google’s computer analysis of the web’s link and text co-occurrence patterns. We can analyze this tradeoff in terms of the intellectual or computational investments made and the subsequent allocation of costs and benefits between the information organizer and the information retriever. The relationship between these two parties is critical to the tradeoff: sometimes they are one and the same, sometimes they belong to the same company or social group, and sometimes they have no knowledge whatsoever of each other. That’s why this is all such interesting stuff to think about and teach.
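To make the tradeoff a bit more concrete, here is a minimal Python sketch. Everything in it is invented for illustration: the three tiny documents, the hand-assigned subject headings, and the function names are all assumptions, and real catalogs and search engines are far more sophisticated. It only contrasts where the effort goes: into organizing up front, or into analysis at retrieval time.

```python
from collections import defaultdict

# Hypothetical mini-collection (illustrative text only).
DOCS = {
    "d1": "container ships unload cargo at the port of oakland",
    "d2": "library catalogs describe books with controlled vocabularies",
    "d3": "search engines analyze links and text on web pages",
}

# The organizer's investment: subject headings assigned by a person.
# These headings are made up for this sketch.
SUBJECTS = {
    "d1": ["logistics"],
    "d2": ["cataloguing"],
    "d3": ["information retrieval"],
}


def build_subject_index(subjects):
    """Invert doc -> headings into heading -> docs (work done once, up front)."""
    index = defaultdict(list)
    for doc_id, headings in subjects.items():
        for heading in headings:
            index[heading].append(doc_id)
    return dict(index)


def retrieve_by_subject(index, heading):
    """Cheap lookup, but only for headings the organizer anticipated."""
    return index.get(heading, [])


def retrieve_by_scan(docs, query):
    """No prior organization: examine every document's words at query time."""
    terms = query.lower().split()
    return sorted(
        doc_id
        for doc_id, text in docs.items()
        if all(term in text.split() for term in terms)
    )
```

The subject lookup is fast and precise but answers only the questions the organizer prepared for; the scan answers any word query but does all of its work on the retriever's side, every time.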

A year ago, when I was first getting ready to teach the IO&IR course in Fall 2005, I was a little surprised to learn that there isn’t any textbook that emphasizes this yin and yang of IO {and,or,vs} IR. Instead, there are books that teach IO, and books that teach IR. I guess that’s because the key concepts of IO -- categorization, classification, metadata, modeling, tagging, facet, thesaurus, ontology, information architecture, interoperability, integration … -- are more abstract and conceptual than those of IR, which are more technical -- indexing, weighting, filtering, crawling, clustering... You find IO books targeted at library and information science students and IR books aimed at computer science and computational linguistics students.

So for the IO part of the course last year, the text I used was Arlene Taylor’s "The Organization of Information." This textbook has been used in our school since 1998, when the IO&IR course was first taught, and it is undoubtedly the definitive text for students in library and information science programs. It emphasizes the foundational concepts and methods of bibliographic description and classification from these disciplines, and I thought that this would give some useful perspective to our students, who almost too eagerly embrace new technology as "progress" or who are inadequately appreciative of the value embedded in these traditional approaches. But even though I used the recent 2004 edition, my students generally dismissed the Taylor book as reactionary and as offering few insights into the current topics that most intrigued them, like social organization on the web, digital multimedia, or domain-specific metadata standards.

I’ve now spent nearly a month revising the IO&IR course for Fall 2006, and in particular I’ve been looking for a book to replace Taylor. My first candidate was Peter Morville’s "Ambient Findability," which I had high hopes for because Morville is a library scientist by training who evolved to co-author "Information Architecture for the World Wide Web," the popular O’Reilly "polar bear" book.

Morville says "ambient findability describes a fast emerging world where we can find anyone or anything from anywhere at anytime" and that’s a great theme on which to base a book. The book is easy to read and my students would probably have liked it – but I just can’t use it as a textbook. Taylor comes across as tedious in her description of cataloguing and controlled vocabularies, but she’s rigorous and practical. Morville comes across as glib and shallow, with many clever examples but not enough detail to know how to do anything. To be fair, maybe Morville isn’t trying to write a textbook, but I suspect he’s capable of writing one, so it is unfortunate that he didn’t.

I then discovered a wonderful and deep little book by Elaine Svenonius called "The Intellectual Foundation of Information Organization." Svenonius is a professor emerita of Library and Information Studies at UCLA, and my first thought was that even though the title sounded perfect, the book would just be Taylor in a more theoretical wrapper. But Svenonius herself observes that

"much of the literature… is inaccessible to those who have not devoted considerable time to the study of the disciplines of cataloguing, classification, and indexing… It mires what is theoretical interest in a bog of detailed rules... This book is an attempt to synthesize this literature in a language and at a level of generality that makes it understandable to those outside the discipline."

Svenonius takes on the fundamental challenges of determining what to describe, describing it, classifying it, and ensuring that the descriptions and classifications will be comprehensible to others – and pulls it off. Now she’s not as readable as Morville, nor as practical as Taylor, but I think that Svenonius is going to be good for our students. Some of them will go on to work for Yahoo and Google and so on, and they will appreciate that they had a chance to think deeply about the challenges of information organization before they faced the hard reality of designing, building and deploying information and applications that have to deal with them.

-Bob Glushko

Tags: BookReview, InformationOrganization, InformationScience, UCBerkeley

# posted by Bob Glushko @ 4:20 PM 2 comments

Friday, June 16, 2006

Thinking Outside the Box about the Box

From my house in the Berkeley hills I have a great view across the bay to San Francisco, but I like to watch the foreground action in the Port of Oakland, where towering Panamax cranes, taller than most of the buildings in downtown Oakland, load and unload huge container ships.

So when I heard about Marc Levinson’s book called "The Box: How the Shipping Container Made the World Smaller and the World Economy Bigger" I immediately bought a copy.

The shipping container was invented 50 years ago, and Levinson makes a compelling case that by radically shrinking the cost of moving goods the container was a key enabler of globalization. It also completely changed the character of port cities, eliminating most of the work on the waterfront traditionally done by longshoremen and shifting it to places built to exploit containers; for example, Port Newark took over from New York City, and Oakland supplanted San Francisco. The container played a crucial role in the Vietnam War, where the huge buildup of US forces was enabled through new container ports. Rather than coming back empty, the containers were routed through Japan and filled with "made in Japan" goods, which started the import flood from Asia.

I am sure that stories from this book will make it into my courses this coming year on "The Information and Services Economy" and "Document Engineering." The container story is a great example of how technology and business models co-evolve, and while I usually focus more on information technology in that relationship, containers are now a key part of information-centric business models and processes too. You can bundle a battery and satellite transponder into a small box that can be attached to a cargo container and report on its location, contents, and condition from anywhere, even the middle of the ocean.

-Bob Glushko

Tags: BookReview, BusinessProcess, Logistics

# posted by Bob Glushko @ 7:53 AM 0 comments

Thursday, June 01, 2006

Online Banking Lock-in and the Double WAMU

Jon Udell's critique of his online banking service was posted just a few minutes after I had a frustrating experience with my own online bank. I have an account with Washington Mutual, but not by choice. WAMU bought the bank that bought the bank that bought the bank I started with 10 years ago when I moved to San Francisco, and it just seemed like too much effort to change banks, even though the bigger the bank got, the worse the service seemed to get.

And service has gotten worse with online banking. Today I was paying a stack of bills and after I'd entered about 10 payments I needed to check something so I stupidly hit the back arrow on my browser, which erased all my work. A minute after I tediously re-entered my payments the phone rang and I spoke for 10 or 15 minutes, just long enough for WAMU to give me more great online service by logging me off (I guess that's a double WAMU).

I would change banks tomorrow if I could somehow avoid having to re-enter the account and address information for the few dozen payees, because the thought of that wasted effort keeps me locked into a bank I have come to hate. I write and teach extensively about information modeling and interoperability, and it just makes me angry to see how often people get screwed by companies that don't give much thought to either, or that use proprietary data formats to lock them in.

Like Jon's bank, WAMU seems to make changes to its online banking that shift work to its customers. One that is particularly annoying is a recent change to the URL for the login page.

Old URL: https://login.personal.wamu.com/logon/logon.asp
New URL: https://login.personal.wamu.com/IdentityManagement/Logon.asp

So I needed to change the bookmark link. Some directive must have come down from on-high that required the programmers of the online banking system to put more emphasis on Identity Management rather than just logging-in. But couldn't they just use the same login page?

-Bob Glushko

# posted by Bob Glushko @ 11:57 AM 7 comments

An Insanity Defense for VA Data Theft

When I first learned about the May 3rd theft of a laptop with personal data on about 26.5 million U.S. military veterans, my first thought was that the Veterans Affairs employee who took the data home was an illiterate idiot who somehow hadn't read any news about all the recent laptop data theft incidents. My second thought was a reminder that personal data had also been stolen last year as a result of employee negligence at my own place of work, the University of California at Berkeley, and I wondered whether these events occur too often in the public sector, where people seem to have less accountability and more employment security than in the private sector. (Breaking news: they've fired some of the responsible people, and it's about time.) But I didn't feel compelled to post about the incident because I didn't feel that I had any unique commentary to offer.

But the latest news about this incident contains a twist that gives me something to rant about. The chief privacy officer of the VA, Mark Whitney, wrote an internal memo on May 5, just two days after the burglary, in which he attempted to downplay the significance of the data loss. His reasoning was that "given the file format used to store the data, the data may not be easily accessible." In other words, because the VA stores information in a proprietary data format, presumably tied to a single application, the thief won’t be able to make much use of it.

But the application that uses the data is probably also on the stolen laptop, or why would the employee have the data there? And in any case, it is pretty easy to find specifications for most statistical data formats (a typical compendium is hosted at Carnegie Mellon University). And lots of us could probably whip up a little script that transforms almost any format into something we could more easily use or sell.
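Just how little that script can be is easy to show. Here is a minimal Python sketch, assuming a purely hypothetical fixed-width record layout and made-up sample records; nothing here reflects the VA's actual file format, which the memo does not describe. Once the column positions are known from a published specification, the "proprietary" format offers no protection at all.

```python
import csv
import io

# Hypothetical layout: (field name, start column, end column).
# Invented for illustration; not any real agency's format.
LAYOUT = [("id", 0, 6), ("name", 6, 26), ("state", 26, 28)]

# Made-up sample records, padded to match the layout above.
SAMPLE = [
    "000001" + "Ada Example".ljust(20) + "CA",
    "000002" + "Bob Sample".ljust(20) + "NY",
]


def fixed_width_to_rows(lines, layout):
    """Slice each non-blank line into a dict of trimmed field values."""
    rows = []
    for line in lines:
        if not line.strip():
            continue
        rows.append(
            {name: line[start:end].strip() for name, start, end in layout}
        )
    return rows


def rows_to_csv(rows, layout):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[name for name, _, _ in layout],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Calling `rows_to_csv(fixed_width_to_rows(SAMPLE, LAYOUT), LAYOUT)` turns the padded records into ordinary comma-separated text that any spreadsheet can open.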

So we have a kind of insanity defense here, or maybe two of them. A VA employee who copies 26.5 million records that he can’t use onto his laptop is clearly insane. But the VA is also insane if it stores information about veterans in multiple proprietary and incompatible formats. I thought it was motherhood and apple pie to "create a single view of your customer."

-Bob Glushko

# posted by Bob Glushko @ 9:18 AM 1 comments
