Tuesday, 6 May 2008
XML in the service of crime
How's that for a headline? I hope it's caught your interest. Strictly speaking, the headline should be "XML in the service of reporting crime", or better still, "XML in the service of making historical accounts of crime available and searchable online", but that doesn't have the same ring to it!
So what on earth am I talking about? Well, two hundred and thirty-nine years' worth of reports of proceedings at London's Central Criminal Court, familiarly known as the Old Bailey, have just been made available online in a relaunch of the Old Bailey Online web site.
The "Proceedings", covering the period from 1674 to 1913, started off as privately published journalism and gradually developed into quasi-official records. They were discontinued quite abruptly when the law was changed to make it obligatory to have an official court reporter make a verbatim record of every trial.
According to Professor Richard Shoemaker of Sheffield University, the records are a "treasure trove of social, legal and family history....Now everyone from schoolchildren and amateur historians to scholars working in a range of academic disciplines can have easy access to this wealth of information."
I find historical documents like these quite fascinating, particularly when they refer to my home town. I am a very loyal Londoner, and I know the City quite well. I am also a bit of an amateur genealogist and I was eager to search the records to see if any of my forebears were ever "transported for life" or worse. Searching through the records is what brings us neatly back to XML.
The Old Bailey Online site includes considerable information about the project itself, including details of how the original documents were digitised, and how the texts were then marked up using XML tags so that information could be categorised. I am very pleased that this information has been included. Many people are quite dismissive of markup languages, and believe that the availability of full-text search has made markup obsolete. This project makes a fascinating example of why structured markup is useful and important.
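To make the case for markup over plain full-text search a little more concrete, here is a minimal sketch in Python. The element and attribute names (trial, defendant, surname and so on) and the two sample trials are invented purely for illustration; they are not the Old Bailey project's actual schema or data.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the tag names and trial records are made up for this example.
SAMPLE = """
<proceedings>
  <trial id="t1" year="1790">
    <defendant surname="Baker"/>
    <offence category="theft"/>
    <text>John Baker was indicted for stealing a silver watch.</text>
  </trial>
  <trial id="t2" year="1801">
    <defendant surname="Smith"/>
    <offence category="theft"/>
    <text>Mary Smith was indicted for stealing bread from a baker in Cheapside.</text>
  </trial>
</proceedings>
"""


def full_text_search(root, term):
    """Return ids of trials whose running text contains the term anywhere."""
    term = term.lower()
    return [
        trial.get("id")
        for trial in root.iter("trial")
        if term in trial.findtext("text", "").lower()
    ]


def defendants_named(root, surname):
    """Return ids of trials where the tagged defendant has the given surname."""
    hits = []
    for trial in root.iter("trial"):
        defendant = trial.find("defendant")
        if defendant is not None and defendant.get("surname") == surname:
            hits.append(trial.get("id"))
    return hits


root = ET.fromstring(SAMPLE)
print(full_text_search(root, "baker"))   # matches the surname and the trade alike
print(defendants_named(root, "Baker"))   # matches only the tagged defendant
```

A genealogist hunting for an ancestor called Baker gets both trials back from the full-text search, because it cannot tell a defendant from a tradesman mentioned in passing. The tagged search returns only the trial in which a Baker actually stood in the dock.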
Sunday, 27 April 2008
Where's that command gone?
I have been using Microsoft Word professionally for quite a long time - since Word 2 on Windows 3, if you want to get historical about it. Each time Microsoft have presented a major upgrade I've got a little annoyed - sometimes more than a little - because they keep moving the commands. Just when you get used to finding something on a particular menu, they go right ahead and bring out a new version, and - where's that command gone again?
Microsoft Office 2007 brought in a huge redesign of the user interface, and there's been a lot of criticism because of it. People just don't like change. Worse still, from Microsoft's point of view, is that organisations and individuals have been slow to upgrade to this new version, because it looks and feels so different from its predecessors. I myself am still sitting on the fence, with Office 2003 on my desktop machine where I do most of my work, and Office 2007 on my laptop.
There is a lot of help available if you want to (or have to) make the transition from Office 2003 to Office 2007 - much of it on the Microsoft Office Online web site. One item that's particularly useful for Word users is an interactive tool that maps Word 2003 commands to their Word 2007 equivalents. (While you wait for it to load you might like to reflect on the irony that this tool has been built with Adobe Flash.)
I'm trying to share my knowledge and expertise as widely as I can, and because of this I've recently started a Microsoft Word Users Club on Ecademy, which is a social networking website for business and self-employed people. Ecademy is more than just an online network, as there are regular real-life Ecademy meetings all over Britain and in many other countries as well. This new Ecademy group isn't in competition with the existing Word user lists and forums; it's just an extra way of spreading some useful information.
Friday, 25 April 2008
At last, some consideration for users
As I work extensively with companies developing computer software and hardware, I am proud to be a member of the British Computer Society (the BCS). I receive regular emails and magazines and occasionally attend local meetings. This week an email newsletter alerted me to a blog by John Morris on the BCS web site entitled "Data Migration and User Stories".
Morris writes in praise of "User Stories" used in Agile programming which are (and here he quotes from Wikipedia) "A software system requirement formulated as one or two sentences in the everyday language of the user". When I read this I didn't know whether to cheer or to weep. Here is a technical expert in a highly technical field who is at last paying attention to the fact that the system he is building needs to be used by real-world end users - not techies or geeks - to do real-world jobs. I suppose any consideration of user needs is a huge improvement on no consideration at all, so I think I'll err on the side of cheering.
But putting the needs of the real end user at the focus of technology projects is something we should all aspire to, and technical authors, usability specialists, and interface designers have been fighting on behalf of real end users for decades. It's been a fight because in most cases software developers and other technical experts dismiss our concerns because we're not programmers. The only things that are new about Agile "User Stories" are that it's a cute name for thinking about real end users, and that it's part of a popular development methodology which happens to be the latest trend.
I've not been a big fan of Agile, because in most of the implementations I've seen the daily scrums and the like focus on minutiae of code, and the big picture goals - helping real-world end users do their jobs - get lost. But if Agile development teams really do develop user stories, and really do keep referring back to them to make sure their project is on track, then there is a glimmer of hope.
Saturday, 19 April 2008
Death of a muse
I don't usually write about poetry, but I can't resist the opportunity to comment on a news item I heard this week. While there are arguments about the literary merits of John Betjeman's poetry, and about the new statue of him at St. Pancras station, I have always found his work readable and amusing, even if not profound or deeply meaningful. There is a certain kind of Englishness, at once both deferential and self-denigrating, that was exemplified by his poetry. He managed to show his love for England even while he was satirising it, and a particularly good example of this is shown in one of my favourite Betjeman poems, A Subaltern's Love Song, published in 1941.
The era this poem evokes was familiar to me in my childhood, not at first-hand (I'm not that old) but at second-hand. My mother and her sisters were, like this poem's heroine, young women in the Second World War, although their origins were closer to the urban working class than to the golf clubs and tennis courts of Surrey. But they loved this poem, the world it portrayed, and the way Betjeman could praise and mock in a single phrase.
I was therefore quite sad to learn that the woman who was Betjeman's muse for this poem, and who really was called Miss Joan Hunter Dunn, passed away earlier this month at the age of 92. She has received an extensive and informative obituary in the Times, and has been the subject of comment in The Guardian and elsewhere. My mother and her sisters would have felt she deserved nothing less.
Saturday, 15 March 2008
Reading by numbers
I am indebted to Karen Schriver, author of Dynamics in Document Design, for posting a note to the Info-Design Cafe mailing list about a recent article in the Wall Street Journal about readability formulas.
In his article "Can you read as well as a fifth-grader? Check the formula" columnist Carl Bialik discusses the readability formulas included in word processing software such as Microsoft Word, and weighs the value of applying such formulas mechanically. He has opinions from both supporters and detractors of readability formulas, and counts both Karen Schriver and Professor J. Peter Kincaid, one of the original instigators of the Flesch-Kincaid formula used in Microsoft Word, amongst those who question the value of the formulas.
I recently read a far more sustained attack on readability formulas, and in particular on their "dubious use" by the UK's Department for Education and Skills (DfES), written by Martin Cutts of the Plain Language Commission. In Writing by numbers: are readability formulas to clarity what karaoke is to song? Cutts complains that public bodies like the DfES use readability formulas as part of their propaganda and ignore the obvious shortcomings of what he terms "crude" tests. He notes that the main problem with readability tests is that:
[those] who apply them uncritically tend to assume that any 10-sentence [passage] with, say, 12 polysyllabic words is as good and clear as any other with 12 polysyllabic words. But its grammar and punctuation may be poor and its message muddled, ambiguous or misleading. Such findings are only likely to emerge after usability testing (not readability testing) or editorial scrutiny or both.
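Cutts's objection is easy to demonstrate. The sketch below computes the Flesch-Kincaid grade level, 0.39 x (words per sentence) + 11.8 x (syllables per word) - 15.59, for two passages. The syllable counter is a crude vowel-group heuristic of my own, assumed only for illustration, so the scores will not match Microsoft Word's exactly, and the two sample passages are invented.

```python
import re


def count_syllables(word):
    """Rough estimate: one syllable per run of vowels (a heuristic, not a dictionary)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level from word, sentence and syllable counts only."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


clear = "The cat sat on the mat. The dog ran to the door."
muddled = "Mat the on sat cat the. Door the to ran dog the."

# Both passages have identical counts, so the formula cannot tell them apart.
print(flesch_kincaid_grade(clear))
print(flesch_kincaid_grade(muddled))
```

The formula sees only counts of sentences, words and syllables, so a passage and a scrambled version of the same words come out with exactly the same grade level. Whether either one makes sense is something only a reader can judge.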
In an effort to offer an alternative to the flawed readability formulas, Cutts and the Plain Language Commission have published a Plain English Lexicon, available free of charge for download from their website. The lexicon helps you find out whether the words you write will be easily understood, by comparing their grade level in the US Living Word Vocabulary (LWV) and their frequency in the UK British National Corpus (BNC). Words that have low LWV grade levels and high BNC frequencies should be easily understood by readers on both sides of the Atlantic, according to Cutts. As long as the spelling isn't too different over there.
Wednesday, 12 March 2008
On wranglers, and other fancy titles
What's in a name? More particularly, what's in the name of a profession? Some professions are easy to identify by a one-word name: tinker, tailor, soldier, spy. Other professional designations are longer: civil engineer, cloakroom attendant, sagger-maker's bottom-knocker, or Lord Privy Seal. (Job titles can get silly - Lois Wakeman has collected some from her local supermarket such as "Oven Fresh Manager", and I have spotted nice ones in Social Services departments like "Teenage Pregnancy Team Leader".)
For my particular professional activity there is no single agreed term. I like to call myself an Information Design Consultant, because what I can do goes far beyond just writing the right words. But I have been called a technical writer, a technical communicator, an information developer, a technical author, a documentation specialist, or more fancifully, a font fondler and a member of the word police (and of the Word police as well). Scott Abel goes by another term - he calls himself the Content Wrangler.
At the University of Cambridge a wrangler is a student who gets first class honours in mathematics; it's also the name of a brand of jeans; and in the US in particular it's someone who handles animals, particularly cattle and horses, professionally. I think it's this meaning Scott had in mind - a content wrangler herds words and content elements together, selecting the best ones and corralling them into the places they need to be. Not an easy job, but immensely satisfying if done well. Scott is a content management specialist, a conference organiser, and a first class speaker himself, and his Content Wrangler web site and newsletter are extremely popular amongst us technical writers/authors/communicators.
Less than two weeks ago Scott launched a social networking website for anyone interested in "content wrangling" called The Content Wrangler Community, and yes, it is one of the groups on Ning that I was invited to join last week.
According to Scott:
The Content Wrangler Community is the new social network dedicated to people who value content as a business asset, worthy of being effectively managed. This is the place where technical communicators, medical and science writers, marketing pros, content management gurus, indexers, online community managers, document engineers, information architects, localization and translation pros, e-learning pros, taxonomists, bloggers, documentation and training managers, and content creators of all types hang out. It's much more than a blog. It's a place to join peers, to share, to collaborate, to contribute, to find information.
"Social networks are about connecting people and ideas," said Scott Abel, manager of The Content Wrangler Community. "Web-based social networks are the natural evolution of the web from a passive broadcast medium to a multi-directional communication platform that more closely supports the way humans interact in the physical world. We congregate. We join others like us. We interact with birds of a feather. Until the advent of social networking tools, the web failed miserably to connect people in meaningful ways."
I think this community is a great idea, and so do about 680 others, at the last count. I certainly need all the help I can get when I am "wrangling" words and pictures and information content into the right size and shape and format for my clients' needs, and I am sure I'll get a lot of inspiration here.
Tuesday, 4 March 2008
Do-it-yourself social networking
In the last week I have had two different invitations to join two completely unrelated social networks, but both are hosted on the same service - Ning. Ning offers ordinary mortals - people who wouldn't know where to start if they were told to configure their own webserver - a chance to create their own social networks.
This sounds like a great idea. People can start their own networks for their own interests - vintage cars, stamp-collecting, train-spotting - or their own business needs - customers, distributors or suppliers.
Ning is a clear example of a "Web 2.0" phenomenon - distributed control, open access, and user-generated content. (I am actually a sceptic about whether there is anything new in "Web 2.0". It could just be all marketing hype.) But there are two big dangers inherent in all this. The first is related to quality. The content you are reading might not actually be of any value, and might easily be bogus or deliberately misleading. How can you verify the credentials of the person whose page you are reading? In a commercial environment, a successful Wiki has participants from all levels of the company, not just the geeks. Knowing that the CEO is reading what you write can help keep you on track. And in non-commercial environments, you need to achieve a high level of participation for a user-generated knowledge network to be self-regulating.
The second problem is quantity. I barely have time to read my email, and once I start looking at the networking sites I can lose hours of productive time without noticing it. There aren't enough hours in the day to keep track of everything, which means I constantly need to make decisions about what messages to open, what links to follow and what articles to read. Sometimes, it feels easier just to switch off.

