Monday, August 13, 2007


"Individual," "Cultural," and "Institutional" Categorization

Last week at the annual meeting of the Cognitive Science Society, I organized and led off a symposium titled "Semantics in the Wild" with Paul Maglio (IBM Almaden), Teenie Matlock (UC Merced), and Larry Barsalou (Emory University). Our goal in this set of talks was to challenge the field of cognitive science to take a more comprehensive view of categorization and classification. These fundamental cognitive activities have been thought about for millennia (e.g., Plato vs Aristotle on whether knowledge is objective or empirical) and studied in the psychology laboratory for a few decades (e.g., reaction time measurements on judgments of category membership, like comparing the time to answer "is a robin a bird?" vs "is a penguin a bird?"). The four of us got together for this symposium because our own perspectives on categorization are very different and complementary -- while I have spent many years doing "business semantics" and "document engineering," Larry has published dozens of papers on categorization studies in his psychology laboratory, Paul has studied how people manage information inside of large organizations, and Teenie has taken a psycholinguistic approach.

Paul, Teenie, Larry and I pointed out that in today's world of ubiquitous computing and ubiquitous information resources, we interact daily as individuals and as participants in organizational processes with a bewildering variety of information types, and we constantly make choices about whether and how to categorize them. So we're proposing to broaden the scope of research on categorization to study the explicit activities by individuals to classify web resources (e.g., flickr, …) and institutional efforts to define and deploy category systems to achieve business and organizational objectives.

Our fundamental claim is that these different kinds of categorization and classification activities or systems lie in a continuous multidimensional space where we can identify three important regions:

Cultural Categorization Systems

Individual Categorization (aka "Tagging")

Institutional Categorization (aka "Business Semantics")

CULTURAL categorization systems are the traditional subject matter for research. These are acquired implicitly through development via parent-child interactions, language, and experience. Formal education can build on this, but the non-formal cultural system can often dominate.

INDIVIDUAL categorization systems are developed by an individual for organizing a personal domain to aid memory, retrieval, or usage. These can serve social goals to convey information, develop a community, or manage reputation. Individual categorization systems have always existed, but they have exploded with the advent of cyberspace, especially in applications based on "tagging."

INSTITUTIONAL categorization systems involve the explicit construction of a semantic model of a domain to enable more control, robustness, and interoperability than is possible with just the cultural system. They are often the collaborative artifact of many individuals who represent different organizational or business perspectives, and they are usually developed via rigorous and formal processes (e.g., in standards organizations like OASIS, where I'm a member of the Board of Directors). Finally, institutional categorization systems require ongoing governance and maintenance because of continuous changes taking place in related cultural and individual systems.

We frankly admit that our thinking isn't fully developed, but it seems that there are many very interesting and important issues to study when you take this broader view of categorization. In particular, we see a number of dimensions or tradeoffs that define the space of categorization activities, such as:

Explicitness vs implicitness
Semantic rigor
Effort to acquire and use
Individual vs group goals
Amount of reuse of other categorization systems
Nature and rate of change over time

This fall in my Information Organization and Retrieval course at UC Berkeley, I'll be using this new framework, and I think it will help students understand better how tagging by individuals on sites such as flickr compares to the "institutional tagging" of business information in standard product classification systems like the United Nations Standard Products and Services Code (UNSPSC) or business vocabularies like the Universal Business Language (UBL).

- Bob Glushko

Do you three perceive these as interdependent? Instead of thinking of these as a cascade, or spectrum of organization types, it seems to me they are circularly dependent on one another (and themselves).

Malcolm Gladwell talks about what I think you are calling cultural organization in his book Blink. It's an interesting layman's approach to demonstrating how the culture one grows up in directly affects one's ability to quickly sort words like house, woman, man and evil into categories.

I think about this topic all the time at my UN work. But I don't agree with the comment above. Obviously, everything is interdependent, therefore the point is moot unless you're trying to identify a specific process or lifecycle for some reason.

I wonder if there's actually another angle on this problem: looking at it more from two vectors. One vector would be from most personal to most collective. The other vector would have to do with degree of structure. This second vector is still the one I can't quite get my head around. One could also look at it as most individualized meanings vs. most multifaceted meaning.

I'll explain. So take the category "fire." You could say fire is a highly collective understanding from the most micro to macro societal groupings. However, it is also the most multi-faceted or unstructured. We use fire to denote many different things/actions/processes, and it has numerous connotations depending upon context.
