Switching Finks

One of the open-source projects I contribute to is Fink, a package manager for OS X; if you’ve used apt-get or yum on Linux, it provides a similar facility, allowing you to install, say, GnuPG by running fink install gnupg. It installs things into its own directory tree, rooted at /sw by default, to avoid interfering with things shipped by Apple (/, /usr) or manually installed by the user (/usr/local). That is, if you have Fink installed, your system will have /sw/bin, /sw/lib, /sw/etc, /sw/share/man, &c.

So that you can run things installed in these nonstandard locations, Fink provides some shell commands in /sw/bin/init.sh which edit environment variables like PATH and MANPATH to include the /sw/* directories. Most Fink users have . /sw/bin/init.sh in their ~/.profile, so these commands will be invoked when their shell starts.

Having my shell automatically pull in Fink at startup doesn’t work for me, though. It’s important to me to have a clean environment available: for instance, when I’m contributing to non-Fink open-source projects, helping someone who doesn’t have Fink installed troubleshoot something, or submitting a bug report for a program that interacts with other programs for which I have the Fink version installed but Apple ships a different version with the system.[1] (Note that this is only an issue if program A interacts with program B by invoking it as a standalone process without using an absolute path.[2])

Also, as a Fink developer, I actually have multiple Fink installations at different paths,[3] and I only want one loaded at a time; I don’t want to activate /Volumes/SandBox/fink/dev-sw in an environment where /Volumes/SandBox/fink/sw has already been pulled in!

It’s much easier to pull Fink stuff in later when I need it than to undo the changes that /sw/bin/init.sh makes to my environment. My solution for making it easy to activate a particular Fink installation was to add the following to ~/.bashrc:

    # Only pull in Fink when the parent process has exported SW.
    if [ -n "$SW" ]; then
        export CFLAGS="-I$SW/include"
        export LDFLAGS="-L$SW/lib"
        export CXXFLAGS="$CFLAGS"
        export CPPFLAGS="$CXXFLAGS"
        export ACLOCAL_FLAGS="-I \"$SW/share/aclocal\""
        export PKG_CONFIG_PATH="$SW/lib/pkgconfig"
        export PS1="[$SW_DISPNAME \\W@$(hostname -s)]\\\$ "
        . "$SW/bin/init.sh"
        export PATH=~/bin:"$PATH"
    fi

This arranges things so that if I start a new shell with SW and SW_DISPNAME set, it pulls in the Fink installation rooted at the directory $SW and puts $SW_DISPNAME in my shell prompt so that I can see which environment I’m using. The extra environment variables set before . $SW/bin/init.sh make sure that if I compile things by hand, they’ll find and link against Fink-installed libraries; the PATH setting at the end is there because init.sh places the Fink bin directory at the front of PATH, and I want my personal bin directory to come before it.

I run the following script (saved as ~/bin/finkinit) when I want to pull in Fink:

    #!/bin/bash

    FINK=${1:-main}

    case "$FINK" in
        main)
            SW=/Volumes/SandBox/fink/sw
            SW_DISPNAME="fink"
            ;;
        dev)
            SW=/Volumes/SandBox/fink/dev-sw
            SW_DISPNAME="fink-dev"
            ;;
        *)
            echo "Unknown fink install '$FINK'" >&2
            exit 1
            ;;
    esac

    export SW SW_DISPNAME
    exec /bin/bash

This gives me a subshell with Fink turned on, which I can exit out of when I want to return to a clean environment. If I run it as finkinit, I get my main Fink installation, or I can run finkinit dev to get an alternate Fink.


  1. Yes, this actually happens somewhat frequently. Sometimes Fink has a newer version of a program than the OS (e.g. Subversion 1.4.6 vs. 1.4.4), or sometimes the Fink version has more extensions enabled (I use Fink’s Apache because it lets me use Fink’s PHP, which in turn lets me install PHP’s MySQL extension by doing fink install php5-apache2-ssl-mysql as opposed to compiling it by hand).

  2. Unlike Linux, executables on Darwin have the absolute paths to their shared libraries hardcoded in the binary, so a program linked against /usr/lib/libexpat.1.dylib will always use it, even if /sw/lib/libexpat.1.dylib exists. Linux, on the other hand, uses a search path mechanism at runtime to find the libraries, similar to the way the shell figures out which program to invoke when you tell it to run ls.

  3. At the moment, one for my personal use and a clean one for testing packages I maintain in an environment without extra packages installed. This is important for testing that you’ve declared all of the necessary dependencies. 


Unmitigated Audacity

Details to follow, but I’m working on porting Audacity 1.3.4 to the Mac. I’m making good progress (note that actually clicking on anything doesn’t work yet, if you want to nitpick. Update: this was simple pilot error; stuff does actually work):

[Screenshot: Audacity 1.3.4 on Mac OS 10.5]


For All Your Finger-Pointing Needs

While working with a large codebase, I often want to find the origin of a particular line. Subversion offers a tool, annotate (aka blame, aka praise), which displays the author and revision for every line in a file, indicating who made the last change to a line. However, the last change is often not very useful; it was a minor change as a result of some other change you’re not interested in, or the code was moved around due to refactoring, and you need to go back even further.

When I need to do this, I find myself doing a sequence of:

  1. svn blame FILE | less; find the revision N where the line was last changed
  2. svn log -rN FILE | less; if the change is interesting, read the commit log for the file
  3. svn blame FILE@N-1 | less; using Subversion’s little-known peg revision[1] syntax, find the previous time the line was changed
  4. Using N-1 as the new N, return to step 2.
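
To make the loop above concrete, here is a minimal Python sketch of its two mechanical steps: reading the revision off a line of blame output, and building the PATH@REV argument for the next round. It is not the blamegame tool described below; blame_revision and previous_blame_target are hypothetical helpers, the sample output is made up, and the parsing assumes svn blame's default layout (revision, author, then the line's text). Lines with uncommitted local changes, which show no revision number, are not handled.

```python
def blame_revision(blame_output, line_number):
    """Return the revision that last changed the given 1-based line.

    Assumes each blame line looks like "<revision> <author> <text>".
    """
    lines = blame_output.splitlines()
    fields = lines[line_number - 1].split(None, 2)
    return int(fields[0])


def previous_blame_target(path, revision):
    """Build the PATH@REV argument for blaming just before `revision`."""
    return "%s@%d" % (path, revision - 1)


# Made-up blame output for a three-line file.
sample = (
    "  1201      alice int main(void) {\n"
    "  1342        bob     return run();\n"
    "  1201      alice }\n"
)

rev = blame_revision(sample, 2)
print(rev)
print(previous_blame_target("main.c", rev))
```

Feeding the second value back to svn blame is step 3; repeating with the new revision is step 4.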

I’ve put together a rough version of a tool to make this easier; it’s at /trunk/blamegame in my repository, which can be browsed with ViewVC or checked out with svn co http://zevils.com/svn/trunk/blamegame blamegame. It still needs some fine-tuning and documentation, but invoke it as blamegame FILE LINE (where FILE is a URL or the path to a file in a Subversion working copy) to start looking at a particular line of a file. You can navigate and search the file using a less-like interface. To drill down to the previous change to a line, hit r and then enter the line number. l, o, n, and m switch between viewing the commit log, the changed parts of the old file, the changed parts of the new file, and (the default) the diff. If you need to change the path you’re looking at (for instance, to jump inside a branch), use the p command. h will show the available commands.

Let me know what you think.


  1. Pretty much any Subversion command that takes a path argument can be given PATH@REVISION instead to use the version of the path at a particular revision. This is great for diff and cat as well as blame. I use it for working with deleted files and branches and diffing a branch against trunk. 


The Washington Post has a nifty gallery of mushroom photos by Taylor Lockwood.


A hilariously terrible first date (via Universal Hub.)


Wrong Dates in iCal Birthday Calendar

To keep track of people’s birthdays, I use the Birthday Calendar feature of Address Book and iCal on Mac OS X.[1] I was going through my calendar the other day, and I noticed that a birthday which I knew was sometime in January wasn’t showing up. It was on the corresponding Address Book contact, though. I deleted the birthday from this contact and reentered it, which fixed that entry, but on the suspicion that more birthdays might be missing, I flipped through my calendar and found:[2]

[Screenshot: Address Book says Mar 23, iCal says Mar 21]

The Address Book birthday field has the misfeature that it forces a year to be specified.[3] What a rude thing for Address Book to be asking! Anyway, I’d arbitrarily picked year 1[4] for any contacts whose birth years I didn’t know. Maybe, I thought, the Gregorian reform was throwing things off. However, changing the year to 1900 didn’t help matters, and in fact made them worse:

[Screenshot: Address Book says Mar 23, iCal says June 23]

Turning the birthday calendar off (which wipes out iCal’s backing store for the calendar) and on didn’t help matters. A web search turned up some other people having the same problem, but the only useful solution they came up with was deleting and recreating entire contacts by hand.

I wanted to see if the raw data was wrong in Address Book’s database. Address Book uses Core Data in a way that makes the database difficult to work with at the SQLite command-line level, so instead I hacked /Developer/Examples/Python/PyObjC/AddressBook/Scripts/exportBook.py to emit the birthday field by adding ('Birthday', AddressBook.kABBirthdayProperty) to FIELD_NAMES and the following to encodeField:

    elif isinstance(value, AppKit.NSCalendarDate):
        return value.descriptionWithCalendarFormat_("%Y-%m-%d")

It turns out that a number of entries had negative years, e.g. -1900-03-23 instead of 1900-03-23. I’m not sure how this happened, but here’s a script to fix it:

    #!/usr/bin/python
    """
    Fix negative birthday years in Address Book.
    This work is hereby released into the Public Domain.
    """
    import AddressBook
    import AppKit

    def personName(person):
        return "%s %s" % (
            person.valueForProperty_(AddressBook.kABFirstNameProperty),
            person.valueForProperty_(AddressBook.kABLastNameProperty)
        )

    def formatDate(date):
        return date.descriptionWithCalendarFormat_("%Y-%m-%d")

    def fixBirthday(birthday):
        # Returns a corrected date, or None if the year is already non-negative.
        year = int(birthday.descriptionWithCalendarFormat_("%Y"))
        if year < 0:
            return birthday.dateByAddingYears_months_days_hours_minutes_seconds_(
                -year * 2, 0, 0, 0, 0, 0)
        else:
            return None

    def fixPersonBirthday(person):
        birthdayProp = AddressBook.kABBirthdayProperty

        birthday = person.valueForProperty_(birthdayProp)
        if birthday is None:
            return

        fixedBirthday = fixBirthday(birthday)
        if fixedBirthday is not None:
            print("Fixing up %s: %s -> %s" % (
                personName(person),
                formatDate(birthday),
                formatDate(fixedBirthday)
            ))
            person.setValue_forProperty_(fixedBirthday, birthdayProp)

    book = AddressBook.ABAddressBook.sharedAddressBook()

    for person in book.people():
        fixPersonBirthday(person)

    book.save()
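
The only non-obvious part of the script is the -year * 2 in fixBirthday: adding twice the absolute value of a negative year flips its sign, so -1900 becomes 1900 while the month and day are left alone. Here is that arithmetic on its own in plain Python, with no PyObjC involved; corrected_year is a hypothetical helper that is not part of the script, and it assumes years behave like plain signed integers.

```python
def corrected_year(year):
    """Mirror fixBirthday's arithmetic: negative years get their sign flipped."""
    if year < 0:
        # Adding -year * 2 years moves e.g. -1900 forward by 3800 years.
        return year + (-year * 2)
    return year


print(corrected_year(-1900))
print(corrected_year(1985))
```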

  1. 10.5.1, MacBook Pro Core 2 Duo 

  2. Names have been changed to protect the innocent. 

  3. There’s also an implementation flaw; I have my date format set to YYYY-MM-DD, and when I try to enter a year in the field, whether or not pressing a number on the keyboard will actually result in a digit appearing in the input field appears to be random. It also behaves very weirdly if there are four digits in the field already and I press another digit. I wish I could get a video of all this, but it’s not quite worth the effort of taking a screencast and a video of my fingers on the keyboard and then splicing them together…

  4. Anno Domini, not Anno Antidomini 


Internationalization of Names

Names are complicated

What’s in a name? The answer turns out to vary quite widely around the world. When an English-language form, either electronic or paper, asks for a person’s name, it usually provides separate fields for first and last name, and sometimes middle name or middle initial. Aristotle Pagaltzis linked to a post by Jim Clark on Thai names, demonstrating that this approach, or even the alternative “given name, family name”, falls down pretty quickly outside the English-speaking world.

Thai names consist of:

  • A given name, similar to the English first name, except that it must come from a list of government-approved names;
  • A family name, which is also government-regulated; all people with the same family name are related, and new Thai citizens must select an unused name. Like all non-namespaced identifiers (domain names, instant messenger handles, user names on popular web services), the good short ones are taken; and
  • A chue len, which is typically translated as nickname, but according to Mr. Clark is more like an informal given name; it’s selected by one’s parents or close relatives early in life (though not necessarily at birth).

The obvious mapping of Thai name components onto English, (given name, family name, chue len) → (first name, last name, nickname), doesn’t work very well. Consider the former prime minister Thaksin Shinawatra, chue len Meow. His (romanized; more on that later) legal name is Thaksin Shinawatra. If addressing him politely, I would refer to him as Khun Thaksin.[1] Note that this is {honorific} {given name}, not {honorific} {family name}; in other words, Mr. Matthew as opposed to Mr. Sachs. His friends and family will call him Meow, not Thaksin or Shinawatra.

A further wrinkle is that when sorting a list of Thai names, the given name, not the family name, should be the sort key. Then there’s also the matter that Thaksin Shinawatra, aka Meow isn’t really the gentleman’s name at all; it’s ทักษิณ ชินวัตร, aka แม้ว. There are several standard romanizations for Thai, and whichever one the named individual prefers is considered canonical. There are also other quirks involved in the Thai script form of a name, like the lack of whitespace between the honorific and the given name.

Non-Thai complications

Then there are whole sets of different requirements for other kinds of names. The comments on Jim Clark’s blog entry, and this post by Richard Ishida, who’s in charge of i18n issues for the W3C, give some other good examples.

  • Russian and Icelandic have gender suffixes on the family name (Fuzaylova for a woman, Fuzaylov for a man; Fjalar Jónsson vs. Katrín Jónsdóttir).
  • Russian has nicknames (which, like Thai “nicknames”, are much more widely used than English nicknames) which are usually (always?) systematically derivable from their given names; Vladimir → Vova.
  • Scandinavian given names typically include spaces, and convention varies as to how acceptable it is to refer to Hans Christian Andersen as Hans vs. Hans Christian. This isn’t unheard of in the southern United States, either (Billy Jean, &c.). In some parts of Europe, these multipart given names are hyphenated, as in the Austrian Hans-Christian or the French Jean-Claude.
  • In France and Italy, names can have a comma which essentially divides a series of first names from a series of middle names; in France, the middle names are rarely used outside of legal contexts, while in Italy, the middle names aren’t used in legal contexts. A Mario, Alberto Giovanni Rossi would have a legal name of Mario Rossi in Italy, whereas a French Jean, Christophe Dupond would be commonly known as Jean Dupond but legally Jean, Christophe Dupond.
  • Many countries use patronymics instead of stable family names, so a set of related people won’t have the same family name.
  • Many Chinese take arbitrary western nicknames for ease of communicating with westerners.
  • Chinese names also have generational markers, so a set of siblings will all have the same “middle” name, and names are written {family}{generational}{given} in Chinese script.

So what?

How much of this do we really need to worry about? When I say that Thai names should be sorted by given name, should, of course, is a horribly loaded term. If an American border control agent pulls up a list of people who have entered the country at a particular point, they probably want the sort key to be Shinawatra, not Thaksin. Mapping (given, family) → (first, last) is also probably fine for this application. So when, exactly, does the extra information need to be preserved?

Some reasons that a system might be interested in a name, or parts of a name, are:

  • Correlating records with other systems
  • Displaying people’s names
  • Addressing people in writing (“Dear Mr. Sachs,”, “Welcome, Matthew!”) or on the phone
  • Identifying people (“To look up your records, enter your name”)
  • Searching for people (on, say, a social networking site)
  • Sorting a list of people

For most English applications that don’t cater to a large international audience, it might be “good enough” to simply have a flat name field where users can enter arbitrary names, or at least their romanizations.[2] A flat name field is much more flexible than separate fields. Since you probably need to support substring searches anyway, it doesn’t lose anything as far as searching is concerned.

If you want to sort by last name, or communicate with other systems that take a (first name, last name) tuple, it might be good enough to just split off the last whitespace-separated token and treat that as the last name.[3] If that’s not good enough, a pair of (first names, last name) or (given names, family name) inputs may be called for, but characters such as spaces and apostrophes (O’Flannagan) should be valid. If your application wants to try to automatically derive a secondary form of address from the name entered, maybe it shouldn’t. Is the ability to have form letters say Mr. Sachs as opposed to Matthew Sachs really worth the faux pas of Mr. Shinawatra? I guess it depends on how international your audience is; you could always ask for multiple forms of address.[4]
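
As a rough illustration of the "split off the last token" approach, here is a minimal Python sketch. Everything in it is an assumption for the sake of the example: split_name is a hypothetical helper, the particle list is a tiny made-up sample in the spirit of the footnote's suggestion about de, and a single-token name is arbitrarily treated as a last name.

```python
# Hypothetical, deliberately short list of particles that stay with the last name.
LAST_NAME_PARTICLES = {"de", "van", "von", "der", "la"}


def split_name(full_name):
    """Split a flat name into (first names, last name) on whitespace."""
    tokens = full_name.split()
    if len(tokens) < 2:
        # Nothing to split; treat the whole thing as the last name.
        return ("", full_name.strip())
    split_at = len(tokens) - 1
    # Pull preceding particles ("de", "van der", ...) into the last name,
    # but always leave at least one token for the first names.
    while split_at > 1 and tokens[split_at - 1].lower() in LAST_NAME_PARTICLES:
        split_at -= 1
    return (" ".join(tokens[:split_at]), " ".join(tokens[split_at:]))


print(split_name("Matthew Sachs"))
print(split_name("Hans Christian Andersen"))
print(split_name("Charles de Gaulle"))
```

Note that this happily produces ("Thaksin", "Shinawatra") too; the split itself is easy, and the faux pas comes from what you then do with the pieces.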

For applications that want to really get localized names right, like a system-wide address book or a global social networking site, a more complex approach is called for. For instance, the Mac OS X address book framework knows about the address formats for various countries; it could extend that functionality to support different name formats. It has some rudimentary support for this, in that an individual address book entry can have a set of name ordering flags associated with it, either first name first or last name first (sic); name fields are fixed at title, first name, middle name, last name, suffix, nickname, maiden name, and phonetic (first, middle, last) name.

Per-country address format support doesn’t change which fields exist, but it changes the order they’re displayed in. Per-country name format support would need to be more complicated. A Name (a person might have more than one, each with a different NameFormat) might consist of:

  • NameFormat, defining the (country, language) associated with the name (e.g. en.US) and the set of available NameComponents
  • A list of (NameComponent, Value, (optional) PhoneticValue)

The system could provide functions like:

  • int Name.compareWith(Name)
  • String Name.representation(NAME_REPRESENTATION), where NAME_REPRESENTATION is one of:
    • LEGAL_NAME
    • FORMAL_NAME
    • SHORT_FORMAL_NAME
    • INFORMAL_NAME
    • VERY_INFORMAL_NAME
  • Name Name.convertTo(NameFormat), which would try to convert to a different name representation using automated rules for things like romanization.
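
Here is one way the Name and NameFormat idea might look in Python 3, purely as a sketch of the design above and not any real address book API. The locale strings, component names, templates, and the choice of sort component are all assumptions made up for the example; phonetic values and convertTo are left out entirely.

```python
from dataclasses import dataclass


@dataclass
class NameFormat:
    locale: str           # e.g. "en.US"
    components: tuple     # component names this format allows
    sort_component: str   # component used as the primary sort key
    templates: dict       # representation kind -> str.format template


@dataclass
class Name:
    name_format: NameFormat
    values: dict          # component name -> value

    def __post_init__(self):
        unknown = set(self.values) - set(self.name_format.components)
        if unknown:
            raise ValueError("unknown components: %s" % sorted(unknown))

    def representation(self, kind):
        """Render the name as LEGAL_NAME, FORMAL_NAME, and so on."""
        return self.name_format.templates[kind].format(**self.values)

    def sort_key(self):
        return self.values[self.name_format.sort_component].casefold()

    def compare_with(self, other):
        """Return -1, 0, or 1, like the compareWith function above."""
        a, b = self.sort_key(), other.sort_key()
        return (a > b) - (a < b)


# Two hypothetical formats: English sorts by family name, Thai by given name.
EN_US = NameFormat("en.US", ("honorific", "given", "family"), "family", {
    "LEGAL_NAME": "{given} {family}",
    "FORMAL_NAME": "{honorific} {family}",
    "INFORMAL_NAME": "{given}",
})
TH_TH = NameFormat("th.TH", ("honorific", "given", "family", "chue_len"), "given", {
    "LEGAL_NAME": "{given} {family}",
    "FORMAL_NAME": "{honorific} {given}",
    "INFORMAL_NAME": "{chue_len}",
})

matthew = Name(EN_US, {"honorific": "Mr.", "given": "Matthew", "family": "Sachs"})
thaksin = Name(TH_TH, {"honorific": "Khun", "given": "Thaksin",
                       "family": "Shinawatra", "chue_len": "Meow"})

print(matthew.representation("FORMAL_NAME"))
print(thaksin.representation("FORMAL_NAME"))
print(thaksin.representation("INFORMAL_NAME"))
```

Putting the templates and the sort component on the format rather than on each name is the point of the exercise: the same representation("FORMAL_NAME") call yields family-name-based output for one person and given-name-based output for the other without the caller knowing which convention applies.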

  1. Khun is a generic honorific roughly akin to Mr./Ms./Mrs. There might be a better one to use for a (former) Prime Minister. This list includes ones for teacher, aunt, sister, older person, and younger person, but suggests that khun is always used when addressing someone formally. 

  2. In part two of his post, Mr. Ishida recommends that applications that expect ASCII input specify it; detecting and erroring on input in unsupported scripts is probably sufficient.

  3. It might be worth having a list of tokens which will also get treated as part of the last name, such as de, with this approach. 

  4. “Enter your name and how you’d like to be addressed:” ? 


Migrating a wiki from Trac to MediaWiki

I’d set up a Trac installation for wedding planning, instead of using MediaWiki (the system Wikipedia uses, which I already had a couple of installations of) since we wanted both a wiki (venue data, possible honeymoon destinations, guest lists… shut up, it’s useful!) and ticket system (useful for tracking things like thank-you notes and being able to assign specific ones to either Liz or myself).

However, Dreamhost doesn’t support mod_python, so pages were taking way too long to load. I decided to switch over to MediaWiki for the wiki part and just use my existing Bugzilla installation for ticket tracking. Hence, a new script over on the code page, trac2mw. Our wiki was fairly tiny, so caveat user. I didn’t bother having it migrate tickets or attachments, since we didn’t have any data there that was worth preserving. The input format, a MySQL XML dump, probably isn’t ideal for a lot of people (since Trac runs on SQLite by default). It does fix up the wiki page syntax (the parts of it we were using, at least), though.
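
For a flavor of what fixing up the wiki page syntax involves, here is a minimal Python sketch that converts a small subset of Trac markup to MediaWiki markup. This is not trac2mw and doesn't claim to match what it does; trac_to_mediawiki is a hypothetical function, the page names in the sample are made up, and the only constructs handled are {{{ }}} blocks on their own lines, single-level bullet lists, and [wiki:Page label] links. Nested lists are flattened and inline markup is left alone.

```python
import re


def trac_to_mediawiki(text):
    """Convert a small, assumed subset of Trac wiki markup to MediaWiki."""
    out = []
    in_pre = False
    for line in text.splitlines():
        stripped = line.strip()
        # {{{ ... }}} blocks on their own lines become <pre> ... </pre>.
        if stripped == "{{{" and not in_pre:
            out.append("<pre>")
            in_pre = True
            continue
        if stripped == "}}}" and in_pre:
            out.append("</pre>")
            in_pre = False
            continue
        if in_pre:
            out.append(line)  # leave preformatted content untouched
            continue
        # Trac bullets are indented; MediaWiki wants "*" in the first column.
        line = re.sub(r"^\s+\*\s+", "* ", line)
        # [wiki:Page label] -> [[Page|label]], then [wiki:Page] -> [[Page]].
        line = re.sub(r"\[wiki:([^\s\]]+)\s+([^\]]+)\]", r"[[\1|\2]]", line)
        line = re.sub(r"\[wiki:([^\s\]]+)\]", r"[[\1]]", line)
        out.append(line)
    return "\n".join(out)


# Made-up sample page.
sample = "\n".join([
    " * [wiki:GuestList the guest list]",
    " * [wiki:Honeymoon]",
    "{{{",
    " * [wiki:Raw]",
    "}}}",
])
print(trac_to_mediawiki(sample))
```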


Less Edward Tufte, More Don Martin

A New York Times blog post on holiday tipping linked to a gem from the Times archives, its own ancestor from 1911.

The most striking feature of the article, which appeared on page six of the magazine section, is the large political cartoon-like illustration in the center (drawn by Reginald Russom, who evidently went on to help found what later became the Australian Cartoonists’ Association.) From what I’ve noticed, while the Times Magazine still employs plenty of illustrations, they’re mostly charts and graphs; when there’s a lead image that’s not a more or less realist photograph of the article’s subject, it tends to be a photo like this one.

I love how one old newspaper article can shed light on:

  • Other concerns of the period (the legality of a state (or city?)-wide income tax was being argued before the State Supreme Court)
  • Typical incomes and wages (a bit over $1M/yr in 2006 dollars is their example income for a “well-bred” New Yorker)
  • Types of service-sector employees one might utilize (such as elevator boy, charwoman, furnaceman, telephone operator, milkman, and stenographer, in addition to less remarkable professions)
  • Things that one might fear malfunctioning in an apartment (how little some things change; here we have the electric buzzer, hot water, windows (by the glass being broken, not routine mechanical failure), and mail delivery)

Maybe this is still routine in Manhattan, at least in the more highfalutin co-ops, but I also found it noteworthy that the building’s management was expected to send you candidates if you wanted to sublet your apartment (but watch out; if you anger your super by not tipping around Christmas, he might send “several negroes and a Chinaman” your way!)

When I first got Times archives access (by subscribing to TimesSelect back in the day), I trawled the archives; there’s a lot of good stuff there. If anyone else has a favorite, I’d love to hear about it in the comments.


I love programs with a sense of humor. Note the trademarked error detail name in this screenshot from NetNewsWire:

[Screenshot: Fancy Debug Info™]

(Not to toot my own horn, but you might like the old Zevils 404 page if you’re into that sort of thing. Good luck finding the new Zevils 404 page; most attempts will just get you the WordPress index. I should figure out how to change that… Update: The solution was to create a 404.php page in my theme’s directory. 404lounge.net has a bunch of the things, and The Daily WTF’s Error’d! chronicles the unintentionally humorous.)