I Can’t Believe It’s Not Bodah
A hilariously terrible first date (via Universal Hub.)
To keep track of people’s birthdays, I use Mac OS X’s[1] Birthday Calendar feature of Address Book/iCal. I was going through my calendar the other day, and I noticed that a birthday which I knew was sometime in January wasn’t showing up. It was on the corresponding Address Book contact, though. I deleted the birthday from this contact and reentered it, which fixed that entry, but on the suspicion that more birthdays might be missing, I flipped through my calendar and found:[2]

The Address Book birthday field has the misfeature that it forces a year to be specified.[3] What a rude thing for Address Book to be asking! Anyway, I’d arbitrarily picked year 1[4] as the year for any contacts whose birth years I didn’t know. Maybe, I thought, the Gregorian reform was throwing things off. However, changing the year to 1900 didn’t help matters, and in fact made them worse:

Turning the birthday calendar off (which wipes out iCal’s backing store for the calendar) and on didn’t help matters. A web search turned up some other people having the same problem, but the only useful solution they came up with was deleting and recreating entire contacts by hand.
I wanted to see if the raw data was wrong in Address Book’s database. Address Book uses Core Data in a way that makes the database difficult to work with at the SQLite command-line level, so instead I hacked /Developer/Examples/Python/PyObjC/AddressBook/Scripts/exportBook.py to emit the birthday field by adding ('Birthday', AddressBook.kABBirthdayProperty) to FIELD_NAMES and the following to encodeField:
elif isinstance(value, AppKit.NSCalendarDate):
    return value.descriptionWithCalendarFormat_("%Y-%m-%d")
It turns out that a number of entries had negative years, e.g. -1900-03-23 instead of 1900-03-23. I’m not sure how this happened, but here’s a script to fix it:
#!/usr/bin/python
"""
Fix negative birthday years in Address Book.

This work is hereby released into the Public Domain.
"""

import AddressBook
import AppKit


def personName(person):
    return "%s %s" % (
        person.valueForProperty_(AddressBook.kABFirstNameProperty),
        person.valueForProperty_(AddressBook.kABLastNameProperty)
    )


def formatDate(date):
    return date.descriptionWithCalendarFormat_("%Y-%m-%d")


def fixBirthday(birthday):
    year = int(birthday.descriptionWithCalendarFormat_("%Y"))
    if year < 0:
        # Adding twice the absolute value flips the sign: -1900 + 3800 = 1900.
        return birthday.dateByAddingYears_months_days_hours_minutes_seconds_(
            -year * 2, 0, 0, 0, 0, 0)
    else:
        return None


def fixPersonBirthday(person):
    birthdayProp = AddressBook.kABBirthdayProperty
    birthday = person.valueForProperty_(birthdayProp)
    if birthday is None:
        return
    fixedBirthday = fixBirthday(birthday)
    if fixedBirthday is not None:
        print "Fixing up %s: %s -> %s" % (
            personName(person),
            formatDate(birthday),
            formatDate(fixedBirthday)
        )
        person.setValue_forProperty_(fixedBirthday, birthdayProp)


book = AddressBook.ABAddressBook.sharedAddressBook()
for person in book.people():
    fixPersonBirthday(person)
book.save()
[1] 10.5.1, MacBook Pro Core 2 Duo
[2] Names have been changed to protect the innocent.
[3] There’s also an implementation flaw; I have my date format set to YYYY-MM-DD, and when I try to enter a year in the field, whether or not pressing a number on the keyboard will actually result in a digit appearing in the input field appears to be random. It also behaves very weirdly if there are four digits in the field already and I press another digit. I wish I could get a video of all this, but it’s not quite worth the effort of taking a screencast and a video of my fingers on the keyboard and then splicing them together…
[4] Anno Domini, not Anno Antidomini
What’s in a name? The answer turns out to vary quite widely around the world. When an English-language form, either electronic or paper, asks for a person’s name, it usually provides separate fields for first and last name, and sometimes middle name or middle initial. Aristotle Pagaltzis linked to a post by Jim Clark on Thai names, demonstrating that this approach, or even the alternative “given name, family name”, falls down pretty quickly outside the English-speaking world.
Thai names consist of a given name, a family name, and a chue len (nickname).
The obvious mapping of Thai name components onto English, (given name, family name, chue len) → (first name, last name, nickname), doesn’t work very well. Consider the Thai name Thaksin Shinawatra, chue len Meow, the former prime minister. His (romanized; more on that later) legal name is Thaksin Shinawatra. If addressing him politely, I would refer to him as Khun Thaksin.[1] Note that this is {honorific} {given name}, not {honorific} {family name}; in other words, Mr. Matthew as opposed to Mr. Sachs. His friends and family will call him Meow, not Thaksin or Shinawatra.
A further wrinkle is that when sorting a list of Thai names, the given name, not the family name, should be the sort key. Then there’s also the matter that Thaksin Shinawatra, aka Meow isn’t really the gentleman’s name at all; it’s ทักษิณ ชินวัตร, aka แม้ว. There are several standard romanizations for Thai, and whichever one the named individual prefers is considered canonical. There are also other quirks involved in the Thai script form of a name, like the lack of whitespace between the honorific and the given name.
Then there are the whole sets of different requirements for other kinds of names. The comments on Jim Clark’s blog entry, and this post by Richard Ishida, who’s in charge of i18n issues for the W3C, give some other good examples.
How much of this do we really need to worry about? When I say that Thai names should be sorted by given name, should, of course, is a horribly loaded term. If an American border control agent pulls up a list of people who have entered the country at a particular point, they probably want the sort key to be Shinawatra, not Thaksin. Mapping (given, family) → (first, last) is also probably fine for this application. So when, exactly, does the extra information need to be preserved?
Some reasons that a system might be interested in a name, or parts of a name, are:
For most English applications that don’t cater to a large international audience, it might be “good enough” to simply have a flat name field where users can enter either arbitrary names or at least their romanizations.[2] A flat name field is much more flexible. Since you probably need to support substring searches anyway, it doesn’t lose anything as far as searching is concerned.
If you want to sort by last name, or communicate with other systems that take a (first name, last name) tuple, it might be good enough to just split off the last whitespace-separated token and treat that as the last name.[3] If that’s not good enough, a pair of (first names, last name) or (given names, family name) inputs may be called for, but characters such as spaces and apostrophes (O’Flannagan) should be valid. If your application wants to try to automatically derive a secondary form of address from the name entered, maybe it shouldn’t. Is the ability to have form letters say Mr. Sachs as opposed to Matthew Sachs really worth the faux pas of Mr. Shinawatra? I guess it depends on how international your audience is; you could always ask for multiple forms of address.[4]
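Here is a minimal sketch of the "split off the last token" approach, including the idea of treating tokens such as de as part of the last name. The function name split_name and the contents of LAST_NAME_PARTICLES are my own illustrative assumptions, not part of any library, and the particle list is nowhere near complete.

```python
# Hypothetical particle list (an assumption for illustration); extend it for
# the names your users actually have.
LAST_NAME_PARTICLES = {"de", "van", "von", "der", "la"}


def split_name(full_name):
    """Split a flat name into (first names, last name).

    The last whitespace-separated token becomes the last name, and any
    known particles immediately before it (e.g. "de") are pulled in with it.
    """
    tokens = full_name.split()
    if not tokens:
        return ("", "")
    split_at = len(tokens) - 1
    # Walk left while the preceding token is a known particle, but always
    # leave at least one token for the first-name part.
    while split_at > 1 and tokens[split_at - 1].lower() in LAST_NAME_PARTICLES:
        split_at -= 1
    return (" ".join(tokens[:split_at]), " ".join(tokens[split_at:]))
```

With this heuristic, "Charles de Gaulle" splits into "Charles" and "de Gaulle", while "Thaksin Shinawatra" still splits into "Thaksin" and "Shinawatra", which is fine for sorting an American list and wrong for addressing a letter.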
For applications that want to really get localized names right, like a system-wide address book or a global social networking site, a more complex approach is called for. For instance, the Mac OS X address book framework knows about the address formats for various countries; it could extend that functionality to support different name formats. It has some rudimentary support for this, in that an individual address book entry can have a set of name ordering flags associated with it, either first name first or last name first (sic); name fields are fixed at title, first name, middle name, last name, suffix, nickname, maiden name, and phonetic (first, middle, last) name.
Per-country address format support doesn’t change which fields exist, but it changes the order they’re displayed in. Per-country name format would need to be more complicated. A Name (which a person might have more than one of with different NameFormats) might consist of:
NameFormat, defining the (country, language) associated with the name (e.g. en.US) and the set of available NameComponents
One or more (NameComponent, Value, (optional) PhoneticValue) entries

The system could provide functions like:
int Name.compareWith(Name)
String Name.representation(NAME_REPRESENTATION), where NAME_REPRESENTATION is one of:
    LEGAL_NAME
    FORMAL_NAME
    SHORT_FORMAL_NAME
    INFORMAL_NAME
    VERY_INFORMAL_NAME
Name Name.convertTo(NameFormat), which would try to convert to a different name representation using automated rules for things like romanization.

[1] Khun is a generic honorific roughly akin to Mr./Ms./Mrs. There might be a better one to use for a (former) Prime Minister. This list includes ones for teacher, aunt, sister, older person, and younger person, but suggests that khun is always used when addressing someone formally.
[2] In part two of his post Mr. Ishida recommends that applications that expect ASCII input specify it; detecting and erroring on input in unsupported scripts is probably sufficient.
[3] It might be worth having a list of tokens which will also get treated as part of the last name, such as de, with this approach.
[4] “Enter your name and how you’d like to be addressed:” ?
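To make the Name / NameFormat idea from this post a bit more concrete, here is a rough Python sketch. Everything in it is an assumption made for illustration: the field names, the per-format templates, the choice of which representation maps to which form of address, and the sample honorifics. It leaves out convertTo entirely, since romanization rules are a project of their own.

```python
from dataclasses import dataclass, field
from enum import Enum


class NameRepresentation(Enum):
    LEGAL_NAME = "legal"
    FORMAL_NAME = "formal"
    SHORT_FORMAL_NAME = "short_formal"
    INFORMAL_NAME = "informal"
    VERY_INFORMAL_NAME = "very_informal"


@dataclass
class NameFormat:
    locale: str        # e.g. "en.US" or "th.TH"
    components: tuple  # the available name component identifiers
    sort_key: tuple    # components to sort by, most significant first
    templates: dict    # NameRepresentation -> format string


@dataclass
class Name:
    name_format: NameFormat
    values: dict                                  # component -> value
    phonetic: dict = field(default_factory=dict)  # component -> phonetic value

    def representation(self, kind):
        """Render the name using its own format's template for `kind`."""
        return self.name_format.templates[kind].format(**self.values)

    def compare_with(self, other):
        """Return -1, 0, or 1, with each name using its own format's sort key."""
        mine = tuple(self.values.get(c, "") for c in self.name_format.sort_key)
        theirs = tuple(other.values.get(c, "") for c in other.name_format.sort_key)
        return (mine > theirs) - (mine < theirs)


# Illustrative formats only; real ones would need far more care.
EN_US = NameFormat(
    locale="en.US",
    components=("honorific", "given", "family", "nickname"),
    sort_key=("family", "given"),
    templates={
        NameRepresentation.LEGAL_NAME: "{given} {family}",
        NameRepresentation.FORMAL_NAME: "{honorific} {given} {family}",
        NameRepresentation.SHORT_FORMAL_NAME: "{honorific} {family}",
        NameRepresentation.INFORMAL_NAME: "{given}",
        NameRepresentation.VERY_INFORMAL_NAME: "{nickname}",
    },
)

TH_TH = NameFormat(
    locale="th.TH",
    components=("honorific", "given", "family", "chue_len"),
    sort_key=("given", "family"),
    templates={
        NameRepresentation.LEGAL_NAME: "{given} {family}",
        NameRepresentation.FORMAL_NAME: "{honorific} {given} {family}",
        NameRepresentation.SHORT_FORMAL_NAME: "{honorific} {given}",
        NameRepresentation.INFORMAL_NAME: "{given}",
        NameRepresentation.VERY_INFORMAL_NAME: "{chue_len}",
    },
)

thaksin = Name(TH_TH, {"honorific": "Khun", "given": "Thaksin",
                       "family": "Shinawatra", "chue_len": "Meow"})
matthew = Name(EN_US, {"honorific": "Mr.", "given": "Matthew",
                       "family": "Sachs", "nickname": "Matt"})
```

The point of the design is that the same request, SHORT_FORMAL_NAME, yields "Khun Thaksin" for one name and "Mr. Sachs" for the other, because the format rather than the caller decides which component follows the honorific. Comparing names across formats by each one's own sort key is a simplification; a real system would also need locale-aware collation.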
I’d set up a Trac installation for wedding planning, instead of using MediaWiki (the system Wikipedia uses, which I already had a couple of installations of) since we wanted both a wiki (venue data, possible honeymoon destinations, guest lists… shut up, it’s useful!) and ticket system (useful for tracking things like thank-you notes and being able to assign specific ones to either Liz or myself).
However, Dreamhost doesn’t support mod_python, so pages were taking way too long to load. I decided to switch over to MediaWiki for the wiki part and just use my existing Bugzilla installation for ticket tracking. Hence, a new script over on the code page, trac2mw. Our wiki was fairly tiny, so caveat user. I didn’t bother having it migrate tickets or attachments, since we didn’t have any data there that was worth preserving. The input format, a MySQL XML dump, probably isn’t ideal for a lot of people (since Trac runs on SQLite by default). It does fix up the wiki page syntax (the parts of it we were using, at least), though.
A New York Times blog post on holiday tipping linked to a gem from the Times archives, its own ancestor from 1911.
The most striking feature of the article, which appeared on page six of the magazine section, is the large political cartoon-like illustration in the center (drawn by Reginald Russom, who evidently went on to help found what later became the Australian Cartoonists’ Association.) From what I’ve noticed, while the Times Magazine still employs plenty of illustrations, they’re mostly charts and graphs; when there’s a lead image that’s not a more or less realist photograph of the article’s subject, it tends to be a photo like this one.
I love how one old newspaper article can shed light on:
Maybe this is still routine in Manhattan, at least in the more highfalutin co-ops, but I also found it noteworthy that the building’s management was expected to send you candidates if you wanted to sublet your apartment (but watch out; if you anger your super by not tipping around Christmas, he might send “several negroes and a Chinaman” your way!)
When I first got Times archives access (by subscribing to TimesSelect back in the day), I trawled the archives; there’s a lot of good stuff there. If anyone else has a favorite, I’d love to hear about it in the comments.
I love programs with a sense of humor. Note the trademarked error detail name in this screenshot from NetNewsWire:

(Not to toot my own horn, but you might like the old Zevils 404 page if you’re into that sort of thing (good luck finding the new Zevils 404 page; most attempts will just get you the WordPress index (I should figure out how to change that… Update: The solution was to create a 404.php page in my theme’s directory.)) 404lounge.net has a bunch of the things, and The Daily WTF’s Error’d! chronicles the unintentionally humorous.)
Francis Heaney pointed me at this post about things that I can’t believe aren’t “I Can’t Believe It’s Not Butter”, and their amusing names, such as “Butter It’s Not!” and “You’d Think It’s Butter!” My suggestions:
I’ve been enjoying The Superest, an ongoing game of “My Team, Your Team”; one player draws a superhero, the next draws a superhero that can defeat that one, repeat. (Via John Gruber.)