Computerizing Pupils and Patients

The 15 May 2006 New York Times has a story titled "States Struggle to Computerize School Records" that reminded me of a post I made here a couple of months ago about electronic health records. Many states are trying to create a composite view of all the information about their public school students, including their courses, grades, test scores, attendance, and disciplinary actions, all linked by a unique tracking number. Many of these efforts have failed to meet their functional, performance, usability, or budget requirements, and for all the usual reasons.

Most large computerization projects are complex and challenging. But building a data system to collect information from all the schools in a state can be extraordinarily daunting, involving the integration of computer systems used in hundreds of districts, each of which may have multiple databases using distinct operating systems.

The only cause of failure unique to school systems seems to be that many of them simply lack the technological and process maturity to attempt this integration but have been forced to take it on by the No Child Left Behind Act of 2001, which mandates lots of reports from school districts to make them more "accountable" (I won’t get into the heated debate over this goal because it isn’t a document engineering issue).

The déjà vu part of this story is that, just as No Child Left Behind requires computerized pupil records, the US government has also called for electronic patient records. Last November the Department of Health and Human Services began funding multi-million dollar pilot projects, and even conservative estimators expect that it will take hundreds of billions of dollars to make electronic health records (EHRs) happen for everyone in the US.

Maybe we could save ten or twenty billion bucks by recognizing that electronic students and electronic patients have a great deal in common and are likely to face similar non-technical challenges in their implementation and adoption.

Semantic Illiteracy

“Analyzing sets of possible values” is an important task in the design of information models and I emphasize it in courses and projects (it is also a section heading in Chapter 12 of my Document Engineering book). Specifying constraints on content is an essential part of specifying what something means, and enforcing those constraints is critical to interoperability.

But just as people often assign bad names to things or concepts, they often fail to analyze possible values, they specify them incompletely, or they just get it wrong. You could say that they suffer from semantic illiteracy.

A few weeks ago, in the “Talk of the Town” column of the 10 April 2006 New Yorker, David Owen wrote about the drop-down list for honorifics in the sign-up form for the Skywards frequent-flier program of Emirates airline. In addition to the usual Mr / Mrs / Ms / Miss / Dr, the article said, there are at least another hundred of them. Some are translations of the usual ones (Frau, Senor, etc.), but most are infrequent or even exotic, such as Dowager, King, Midshipman, Shriman, Swami, Sultan, The Very Reverend, Vice Admiral, and Viscount. These examples illustrate that honorifics can encode gender, age, occupation, organizational status, and cultural values. Languages and cultures vary a great deal in how they represent this information.

You might wonder whether Kings, Sultans, and Vice Admirals would ever fly on commercial airlines rather than on their private jets, but as a document engineer that’s not what interests me the most. So I made a brief tour of airlines to see how they handle honorifics and it was interesting.

I started with Emirates… and was astonished to see that instead of the long list described in the New Yorker article, the application form drop-down contained only Mr, Ms, Mrs, Miss, and Master. Maybe the Emirates webmaster was embarrassed by the article and changed the list. That might be the end of the story, except that my next stop on the airline tour was British Air, and I found the exact list described in the article. BA might even be the originator of the list, because that would explain all the British military, peerage, and Church of England honorifics on the list.

I then checked United, the airline that I fly most often and with which I have amassed over half a million miles in the frequent flier program. I am registered there as "Dr Robert J Glushko," but the plastic card they gave me says "Robert J Glushko." United’s drop-down for honorifics lists seven choices in this order: Mr Ms Mrs Miss Dr Hon Prof. I am also a member of US Air’s program, which uses check boxes instead of a drop-down menu on its registration form. US Air’s choices are Mr Mrs Ms Dr, which is a shorter list (maybe fewer professors fly US Air) and, more interestingly, has Mrs before Ms, the opposite of United’s order. Can we infer anything about how United and US Air treat women as employees or customers?

Air France offers forms in French, where the honorific drop-down is a short list (M, Melle, Mme), and also in English (Mr, Miss, Mrs) – the US Air ordering for the two titles for women. An interesting twist for Air France is that you get to these different forms by choosing a country, not a language, and when you choose Canada the form defaults to French.

Singapore Air, which I’ve only flown a couple of times but really enjoyed, has an odd drop-down in its registration form. Its list of choices is Mr, Ms, Mrs, Dr, Miss, Master, Madam, Others -- but choosing Others neither asks nor allows you to specify what other title you go by. Does this mean that if I chose Others my plastic card would say "Others Robert J Glushko"? I feel like joining just to find out.

Asian cultures are very big on honorifics, so I expected Japan Air to have a comprehensive list of honorifics. I was astonished to see none at all, just the simple First Name and Last Name. This was on the English-language site, and since I don’t read Japanese, I couldn’t check what they do on the Japanese site. But maybe the Japan Air forms designers are better document engineers than those working for the other airlines and they understand how tricky the semantics are here.

The New Yorker article tries to say this in a clever way:

Attempts at exhaustiveness are inherently self-defeating; the longer a list, the more conspicuous its lacunae.

This isn’t the best advice. You should definitely be exhaustive in situations where there are standard code sets like ISO 3166 (country codes) or ISO 4217 (currency codes). And when you can’t be exhaustive because the distribution of values has a "long tail," I’d recommend that you be sensitive to the frequency of the values, cut off the low-frequency tail, and provide an OTHER category.
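To make the long-tail advice concrete, here is a minimal sketch in Python of one way to do it. Everything in it is my own illustration rather than anything an airline actually uses: the function names build_choice_list and resolve_title, the 95 percent coverage threshold, and the made-up sample of titles are all assumptions. The first function keeps the most frequent values until they cover the chosen share of observations and then appends an OTHER category; the second insists on a free-text value whenever OTHER is selected, which is the step the Singapore Air form seems to skip.

```python
from collections import Counter

OTHER = "Other"  # hypothetical label for the catch-all category


def build_choice_list(observed, coverage=0.95, other_label=OTHER):
    """Keep the most frequent values until they cover at least `coverage`
    of the observations, then append a catch-all category."""
    counts = Counter(observed)
    total = sum(counts.values())
    choices = []
    covered = 0
    for value, count in counts.most_common():
        if covered >= coverage * total:
            break  # the remaining values are the low-frequency tail
        choices.append(value)
        covered += count
    choices.append(other_label)
    return choices


def resolve_title(selection, free_text, choices, other_label=OTHER):
    """Return the title to store; the catch-all requires a free-text value."""
    if selection not in choices:
        raise ValueError(f"unknown title: {selection!r}")
    if selection == other_label:
        text = free_text.strip()
        if not text:
            raise ValueError("please specify the title when choosing Other")
        return text
    return selection


# Invented sample data, purely for illustration.
sample = (["Mr"] * 50 + ["Ms"] * 30 + ["Mrs"] * 10 + ["Dr"] * 6
          + ["Prof"] * 2 + ["Viscount", "Swami"])
choices = build_choice_list(sample)
print(choices)
print(resolve_title("Other", "Viscount", choices))
```

The threshold is a judgment call: a lower value gives a shorter list and pushes more people into OTHER, and the right trade-off depends on who actually fills in the form.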

But I think there’s a larger message here. The heterogeneity of approaches here for what might seem to be a straightforward information modeling task shows that many people just don’t realize how difficult it is to be precise about what something means. We emphasize "computer literacy" (desktop applications and web surfing) but I’ve never heard anyone fret about how poorly people name and define the things and concepts that their computer applications capture and process for them, which seems more important to me. We need "semantic literacy" or maybe even "ontological literacy" but maybe we don’t teach it because it is too hard to explain what they mean.

