Friday, July 21, 2006


My InfoWorld Podcast with Jon Udell

This week I did a phone interview with Jon Udell, the InfoWorld columnist and prolific blogger, to talk about Document Engineering {and, or, vs} bottom-up tagging, XML schemas {and, or, vs} microformats, business process and business information patterns, the unique challenges and opportunities of the university computing environment, and lots of other interesting topics…

You can replay the interview as a podcast from here.

The highlight of the interview for me was when Jon volunteered that some people say he’s a Document Engineer. I told him that I agreed, and that maybe we’d give him an honorary degree if he’d only stop saying that XML was self-describing. I think I convinced him -- someone who stresses the importance of metadata (check out the "InfoWorld Metadata Explorer" that he built) -- that if XML were really self-describing you wouldn’t need any of it.
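A minimal illustration of the point (my own example, not one from the interview): the element names below are perfectly readable, but nothing in the document itself tells a program what they mean.

```xml
<!-- Is "amount" dollars, euros, or cents? Is "date" the order date or
     the ship date? The markup alone can't say: you need a schema or
     other metadata to pin down the semantics. -->
<order>
  <date>2006-07-21</date>
  <amount>150</amount>
</order>
```

If XML were truly self-describing, that external metadata would be redundant.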

I will probably write about some of the topics that Jon and I talked about. But in the meantime, listen to the podcast.

-Bob Glushko

Thursday, July 20, 2006


The Joy of SOX

I'm writing about a book with this title… and there’s no misspelling here. It will help if I add that the subtitle is "Why Sarbanes-Oxley and Service-Oriented Architecture May Be the Best Thing That Ever Happened to You." Its author is Hugh Taylor, the VP of Marketing at SOA Software, whom I met a couple of months ago at the OASIS Interoperability Conference in San Francisco.

The Sarbanes-Oxley Act was enacted by the US Congress in 2002 to curb corrupt business activities and fraudulent accounting practices like those of Enron and WorldCom. SOX requires firms to implement adequate internal control structures and procedures and attest to their effectiveness. The essence of SOX for someone with my perspective is that a firm needs accurate information about anything that affects its financial statements, and the best way to capture and maintain that information is by automating business activities and internal operations. So that's why I talk about SOX in two of the courses I teach at UC Berkeley's School of Information: Information Organization and Retrieval and Document Engineering and Information Architecture.

Much of the writing about SOX is impenetrable, filled with accounting and business jargon. But "The Joy of SOX" reads almost like a novel, because Taylor has brilliantly written it as a comprehensive case study of a fictitious company’s efforts to deal with SOX. So Taylor's CFO character explains aspects of financial controls and reporting, his CEO and COO characters explain the interdependence of business strategy and controls, and his CIO character explains how computing infrastructure and software development practices shape and are shaped by the controls and strategy.

I especially enjoyed (and so will my students, because now my lectures on SOX will be more concrete) the many examples of how controls, business models, and information technology come together. For example, the case study firm doesn't have a uniform product coding standard, which makes it hard to track inventory and transactions, and this problem is made worse by its practice of buying closeout inventory from suppliers. Another example shows how a good policy for managing employee passwords and access privileges is worthless without policy enforcement and change management processes.

This book enabled me to finally understand some of the arcane details of compliance, just as accountants and business people who read this book will be able to understand service-oriented architecture, enterprise integration, and business process specification languages.

In addition to being hard to read, most of the writing about SOX presents it as a necessary evil to prevent worse evils from being done to unsuspecting investors or other stakeholders in a business. No question that SOX is causing increased spending (some say excessively so) in document and records management, security, business process management and document engineering as companies define, document, and automate the processes that are needed to run the company while enabling auditing and timely reporting. Some of my former students who are working for IT consulting firms are saying that SOX is like "Y2K that won’t go away" or a "full employment act" for them.

Again, here's where The Joy of SOX is unique. Taylor argues against the standard "lose-lose-lose" proposition that most people see in SOX:

If you comply, you may harm your ability to be agile and stay competitive

If you don't comply, you could go out of business (or go to jail)

If you make an empty effort at compliance, you may pass through the process but merely bury company-killing problems (and spend a lot doing so).

Instead, Taylor argues for "agile compliance," urging firms to treat their SOX efforts as an investment. This approach relies on service-oriented architecture, business process specification languages, and so on. He makes a very compelling case.

If you want SOX, buy this book.

-Bob Glushko

Wednesday, July 12, 2006


Terrorist Threat Markup Language, part 2: The Blame Game and Semantic Illiteracy

The news about the poor information quality in the "terrorist threat" database of the Department of Homeland Security is producing the expected fallout. States like New York that complied with the specifications and submitted carefully prepared lists of plausible targets -- and got stiffed when DHS passed out funds -- are complaining. And of course, states like Indiana, whose list includes a popcorn factory, a petting zoo, and a flea market, are trying to justify why they deserve disproportionate shares of funds to protect against terrorist acts.

But what interests me is that both New York and Indiana are saying the same thing, and they're both wrong. As I pointed out in my previous post, the DHS provided state and local officials with "Guidelines for Identifying National Level Critical Infrastructure and Key Resources" that included detailed definitions, classification criteria, and requirements for how to describe each asset.

Nevertheless, Rep. Carolyn Maloney, a New York congresswoman, complained on National Public Radio that the threat database shows her state with only two percent of the nation's banking and financial assets, somewhere between North Dakota and Missouri. Her explanation:

"It appears not to have any standards or definitions of what should be on this list."

And likewise, Peter Beering, Indianapolis's "terrorism preparedness coordinator," blames federal officials for not defining what assets should be protected:

"If you can't define it, if we can't agree on the definition of what a thing is, then we will never be able to count how many we should be worried about."

Another Indianan, Pam Bright (spokeswoman for Indiana's homeland security department), also blames the feds for Indiana’s inclusion of petting zoos and flea markets:

"I don't think there was a clarification as to what assets were, so every state had a different version of what they were supposed to submit."

I can understand why Congresswoman Maloney and other New Yorkers think they got ripped off because the DHS didn't have any rules, but there WERE rules (see Appendices D and E in the already-infamous report) -- the DHS just didn't enforce them. And the Indiana folks are clearly suffering from semantic illiteracy. They must have convinced themselves that they had a sensible definition of "terrorist threat" when they submitted their lists, and just couldn't imagine that other people might understand it a different way.

The only other possible explanation is that out in Indiana they were scheming to get an unfair amount of taxpayer money from the Homeland Security pork barrel, and that just wouldn't be fair.

-Bob Glushko


Needed: Terrorist Target Markup Language

The Office of Inspector General for the US Department of Homeland Security has just issued a scathing criticism of the National Asset Database. The NADB is supposed to be a comprehensive list of vital systems or locations whose destruction would have a debilitating impact on security, public health, the economy, or even morale and confidence. Unfortunately, the Inspector General's review shows that the NADB inventory contains many "non-critical assets" and thus can't support the resource allocation and risk assessment for which it was commissioned.

Many news stories and commentaries, like the one on page 1 of the 11 July New York Times titled "U.S. Terror Targets: Petting Zoo and Flea Market?" or in the "Homeland Stupidity" blog, have focused on the contents of the NADB:

For example, the inventory includes 4,055 malls, shopping centers, and retail outlets; 224 racetracks; 539 theme or amusement parks and 163 water parks; 514 religious meeting places; 4,164 educational facilities; 1,305 casinos; 234 retail stores; 127 gas stations; 130 libraries; 335 petroleum pipelines; 217 railroad bridges; 140 defense industrial base assets; 224 national monuments and icons; and 8 wind power plants.

(The NADB also includes 159 cruise ships and 34 Coca Cola bottlers/distributors).

But joking about the contents misses the far more important concern emphasized by CNN that the NADB is too flawed to determine allocation of federal security funds, supporting complaints by New York City, Washington DC and other cities that they are being shortchanged.

And now I am going to look at this news from a Document Engineering and Information Architecture perspective. Why did it happen, and how could we prevent this from happening again?

We wouldn't be surprised that the NADB contained bad information if the DHS hadn't provided state and local governments with any criteria or specifications. But that's not the explanation. Two years ago the DHS Office of Infrastructure Protection provided "Guidelines for Identifying National Level Critical Infrastructure and Key Resources" (included as an Appendix of the recent IG's report) that included detailed definitions, classification criteria, and requirements for how to describe each asset. The Guidelines include a taxonomy with 17 first-level categories and scores of subcategories, and also specify the information components needed to describe each asset, such as state, address, sector, owner, owner type, phone, local law enforcement POC, and latitude and longitude coordinates.

Here, for example, is some of the guidance about Chemical assets:

1. Sites that could cause death or serious injury in the event of a chemical release and have greater than 300,000 persons within a 25-mile radius of the facility.
2. Economic impact of more than one billion dollars per day (e.g., an event impacting multiple sectors and cumulatively causing this amount of economic damage).
NOTE: The term "sites" includes manufacturing plants; rail, maritime, or other transport systems; pipeline and other distribution networks; and storage, stockpile, and supply areas.

Nevertheless, despite this guidance, states and local governments submitted assets that didn't follow the specified formats, were incomplete, were duplicates, and in the case of Puerto Rico, were in Spanish! All of this reflects some mixture of incompetence, negligence, and political calculation to get more than a fair share of Homeland Security funds.

But suppose that the DHS had encoded these narrative specifications in an XML vocabulary called "Terrorist Target Markup Language" and required all asset submissions to conform to it. TTML would have made it possible to detect most of these problems immediately when they were submitted, and the standard organization and format of the data would have enabled additional data mining to detect anomalous information.
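To make the suggestion concrete, here is a purely hypothetical sketch of what a TTML asset submission might look like. No such vocabulary actually exists, and all of the element names and values below are mine; they simply mirror the information components that the DHS Guidelines already require.

```xml
<!-- Hypothetical TTML record. A schema could require the category to
     come from the DHS taxonomy, make every field mandatory, and reject
     malformed, incomplete, or duplicate submissions at intake. -->
<asset id="IN-00001">
  <name>Example Manufacturing Plant</name>
  <sector>Chemical</sector>
  <category>Manufacturing Plants</category>
  <state>IN</state>
  <address>123 Example Road, Indianapolis, IN</address>
  <owner ownerType="private">Example Chemical Corp.</owner>
  <lawEnforcementPOC phone="317-555-0100">Indianapolis Metro Police</lawEnforcementPOC>
  <coordinates latitude="39.77" longitude="-86.16"/>
</asset>
```

A validating parser would flag a missing latitude or an invented category the moment a state hit "submit," instead of two years later in an Inspector General's report.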

This isn't a far-fetched suggestion. There are numerous XML standards activities underway in the homeland security domain, including biometric data exchange, common alerting protocols, and emergency response.

-Bob Glushko

Wednesday, July 05, 2006


Lobsters in Louisville

A recent issue of the Economist (15 June 2006) has a collection of interesting articles about logistics (something I’ve posted about before), including one with the statistically improbable title of "Just-in-time Lobsters." The main point of that story is that Clearwater Seafoods, based in Bedford, Nova Scotia, ships 30,000 pounds of live lobsters each week from a warehouse in Louisville, Kentucky. This is an odd location for lobsters but makes perfect logistical sense because Louisville is the main hub for UPS, which ensures that when you buy lobsters on the web they can be anywhere in the world the day after they leave Louisville.

Of course, the lobsters aren't from Kentucky, they're from Nova Scotia. So the lobsters have to first be shipped by truck to Louisville, and you might wonder why it is worth the bother to do that. But that's where document engineering issues come in. Clearwater used to ship live lobsters to the US from Canada in individual packages, and each package required numerous border-crossing documents. A truck full of lobsters is essentially one giant package, so Clearwater saves a lot of time filling out forms to cross the border.

By the way, the documentation requirements for lobster shipping are pretty simple as things go. My Document Engineering course syllabus includes a report by the Australian government on "Paperless Trading" that has this staggering observation about the ridiculously complex information architecture for international transactions:

According to the United Nations Conference on Trade and Development, the average international transaction involves 27 to 30 different parties, 40 documents, 200 data elements (30 of which are repeated at least 30 times) and the re-keying of 60 to 70 per cent of data at least once.

-Bob Glushko

Tags: DocumentEngineering, Logistics
