Human Name Variations in Databases

I normally write here about my adventures learning data science, but my expertise for years has been database design and reporting, and I have some knowledge to contribute to a discussion that I thought I’d document.

A conversation on Twitter today about how people’s names are stored in databases, with stories of frustration from people who have had terrible customer or patient experiences because of “unusual” names, made me want to write about this topic. When you search for information on name standards in databases, you usually find guidance on field names, lengths, and so on. What is harder to find is information on how to store the full variety of people’s names in a system of record.

To get an idea of how people are named in different cultures, see the W3C’s article on the topic: http://www.w3.org/International/questions/qa-personal-names

Some examples they give of names that may not fit into a database the “traditional American way” are:

  • “Mao Ze Dong” – Mao is the family name, Dong is the given name, and Ze is a generational name common to all siblings in a family. In Chinese script, the names are not separated by spaces.
  • “José Eduardo Santos Tavares Melo Silva” – Brazilian name which includes many ancestral family names.
  • “Kogaddu Birappa Timappa Nair” – Indian name which includes village name, father’s name, given name, and last name.

You may think that people should just conform their names to our forms, for example by choosing three names for “first”, “middle”, and “last”. Or maybe you think the data collection form should have a single “name” field and not split it up. However, it’s not that simple, and the need to format names for various uses (like mailing labels and letters) adds further challenges.

Unfortunately, sometimes the challenge is just getting an organization to accept your actual name at all. Once your name is in a system of record, the next challenge is how it gets used. Inconsistent usage across systems can complicate government IDs and driver’s licenses (if the state ID name rules don’t allow you to use the name on your federal record, for instance), cause insurance claim rejections (when your name doesn’t match exactly between the doctor’s office and insurance company databases), or create multiple accounts at retail establishments like pharmacies. I have two different accounts at my local CVS, and when picking up medicine I always confuse the person at the window by mentioning both my maiden and married names. Also, the name on my CVS discount card was mis-entered, so I have to spell my name incorrectly if they need to look it up for any reason.

Here’s my experience with a “nontraditional” name, after I got married and wanted to make my maiden name a second middle name:

Luckily, I didn’t have trouble changing my name with Social Security, despite scary stories from other women I know who had to fight to get their names the way they wanted them on their cards. Side note: one benefit I’ve since discovered is that because the new name on my driver’s license still includes my maiden name, I’m more believable when I show up somewhere that doesn’t have my married name on file and try to get them to change it.

My Birth Name: Renée Marie Parilak
(I leave off the accent when filling out forms because it would just increase the chance of entry errors, like Rene’e or Renee’. Yes, I’ve seen both.)

My Married Name: Renee Marie Parilak Teate.
First name “Renee”, middle name is now “Marie Parilak”, and last name “Teate”.

When I submitted this name on name change forms with credit card companies and other organizations, I had a few challenges, like not enough room on the middle name line, but I sent in all of the forms with my full name. My credit cards came back with all of these variations printed on them, depending on each company’s conventions:

  • Renee Marie Parilak Teate
  • Renee M. Teate
  • Renee M. P. Teate
  • Renee M. Parilak (yes, one came back unchanged)

Actually, the other card that came back unchanged was my voter registration card. I had filled out the update form at the polls because they gave me a hard time the last time I voted, when my “proof of identity” documents showed different names, and they mailed me a new card with my old name on it. Really.

Another card shows my full name correctly when the fields are concatenated, but stores my last name as “Parilak Teate” instead of “Teate”.

Here’s a friend’s experience (with names removed for privacy purposes):

My parents, to forestall in-law fights over naming, made it so that the first name of all their female children was my mother’s middle name (which is what she goes by; it’s a family tradition for women to go by their middle names), so we are all named “SharedFirstName MiddleName Surname”. Every girl was meant to be called by her middle name, and so it has been.

Many bureaucracies have forms that force everyone to use only their middle initial (if they even acknowledge the middle name). It’s been a problem throughout my life, especially at doctor’s offices, but it escalated enormously once various bureaucratic systems were connected to the internet.

The latest wrinkle started about 2 years ago when one of my sisters moved from the family home to an apartment. She filled out a USPS change-of-address form. Suddenly, not just her mail but some of my mail, my mom’s mail and my sisters’ mail started going to her apartment. Our small-town postal personnel suggested workarounds, none of which worked. It wasn’t as simple as going to the post office, showing ID to prove who we were, and having postal personnel escalate it to whoever is in charge of that database. We tried repeatedly to get the mistake corrected by personally visiting our local post office and talking with the supervisor.

Meanwhile, my sister moved out of that apartment, meaning she was no longer there to reroute that mail to us. My mom, my sister, and I, who are all on Social Security (for age or disability), started to find Medicare and other important SocSec mail now had that apartment address on it instead of our true address. No one from USPS or SocSec ever contacted the address/contact of record to verify we wanted an address change. We have different Social Security numbers, and you’d think that would be enough to double-check we weren’t the same person despite slightly similar names, but no. This points out how easy it would be for an ID thief to get a bunch of your most sensitive information rerouted to them. Google has a simple check-in when someone logs in from a strange computer or tries to change the password, yet USPS and SocSec don’t??

While this was going on, I’d tell medical providers to be sure to address bills using my middle name as the first name so they wouldn’t be rerouted, which they would be if the providers used my first name and middle initial, since that happens to be identical to one of my sisters’ names in that format. They’d refuse or beg off, saying that their biller took care of that and there was no way to get that kind of customization. So despite being a customer who didn’t want to shirk my bills and who was being proactive about it all, I risked running up huge medical bills because of mail going somewhere else, and because medical offices were so maddeningly subject to the almighty database that they would not avoid sending their mail into a black hole.

A couple months ago I tried to call Soc Sec to get advice about something. Even to get advice, they ask you your SS#, mother’s maiden name and other stuff. When I gave my mother’s maiden name over the phone, the operator told me I was wrong! WTF! She was ENORMOUSLY rude to me when I tried to let her know what mistake was happening.

I was fed up with years of this and called my congressperson’s office. So far, they’ve dropped the ball completely. I guess it’s The White House next. Which is a waste of my time, a waste of gov’t time, etc. All of this could have been avoided if good database design practices prevailed (and if bureaucratic organizations would quit distancing customers/clients from reaching the person in the organization who would be capable of making changes to the database once you’ve proved that you are YOU and have always been YOU).

In the article linked above, the W3C addresses some “implications for field design” for name fields, including field length, whether to split the name up, and so on. I suggest you go read it. Here’s the link again: http://www.w3.org/International/questions/qa-personal-names

If I were designing a name entry form today (Note: for a system that actually needs to store full names! Not all of them do!), I would ask the user for:

  • Prefix: [Mr., Ms., Rev., etc. – optional, could allow “other” entry]
  • Given/First Name(s):
  • Middle Name(s): [optional]
  • Family/Last Name(s):
  • Maiden Last Name: [if applicable]
  • Suffix(es): [Jr, III, Esq. etc. – optional, allow “other” entry]
  • Full Legal Name: [optional, would default to First, Middle, Last]
  • What should we call you? (Preferred given name or nickname if desired): [auto-fill with given name; allow user to edit]
  • Preferred Mail Name: [auto-fill with prefix, first, middle, last, suffix; allow user to edit]
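As a rough illustration, the form above could map onto a record like the following Python sketch. The class name `PersonName`, its field names, and the helper `_join` are hypothetical. The auto-fill rules follow the bracketed notes in the list (full legal name defaults to first, middle, last; the “call you” name defaults to the given name; the mail name defaults to prefix, first, middle, last, suffix), and anything the user typed in is kept as-is.

```python
from dataclasses import dataclass


def _join(*parts: str) -> str:
    """Join the non-empty parts with single spaces."""
    return " ".join(p for p in parts if p)


@dataclass
class PersonName:
    """Hypothetical record mirroring the form fields listed above."""
    given: str               # Given/First Name(s)
    family: str              # Family/Last Name(s)
    middle: str = ""         # Middle Name(s), optional
    prefix: str = ""         # Mr., Ms., Rev., ... optional
    suffix: str = ""         # Jr, III, Esq., ... optional
    maiden_last: str = ""    # Maiden Last Name, if applicable
    full_legal: str = ""     # defaults to First Middle Last
    call_me: str = ""        # "What should we call you?"
    mail_name: str = ""      # Preferred Mail Name

    def __post_init__(self) -> None:
        # Auto-fill only the blank fields with the defaults described on
        # the form; anything the user typed in is left untouched.
        if not self.full_legal:
            self.full_legal = _join(self.given, self.middle, self.family)
        if not self.call_me:
            self.call_me = self.given
        if not self.mail_name:
            self.mail_name = _join(self.prefix, self.given, self.middle,
                                   self.family, self.suffix)


renee = PersonName(given="Renee", middle="Marie Parilak", family="Teate",
                   maiden_last="Parilak", prefix="Ms.")
friend = PersonName(given="SharedFirstName", middle="MiddleName",
                    family="Surname", call_me="MiddleName")
print(renee.full_legal)   # Renee Marie Parilak Teate
print(renee.mail_name)    # Ms. Renee Marie Parilak Teate
print(friend.call_me)     # MiddleName
```

Because defaults are only applied to blank fields, my friend’s record keeps “MiddleName” as the name to call her, while her full legal name still starts with her shared first name.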

Note: if this were a mobile form, I wouldn’t ask for all of the name variations up front! Complicated mobile forms are a turn-off and can lose you customers, so just ask for either the full legal name (and have your code guess how to split it into first/middle/last fields) or first and last name, plus “What do you want us to call you?”. Then provide a detailed profile page that lets users go in later and fix anything your system got wrong.
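For the mobile case, “have your code guess” could be as simple as the heuristic below: first word as the given name, last word as the family name, anything in between as middle names, with a trailing suffix peeled off. This is a hypothetical sketch, not a recommended parser; `guess_name_parts` and the `KNOWN_SUFFIXES` list are assumptions, and the Mao Ze Dong example shows exactly why the result has to stay editable.

```python
# Hypothetical, deliberately short list of suffixes to peel off the end.
KNOWN_SUFFIXES = {"jr", "jr.", "sr", "sr.", "ii", "iii", "iv", "esq", "esq."}


def guess_name_parts(full_name: str) -> dict:
    """Naively guess first/middle/last; the profile page must let users fix it."""
    tokens = full_name.split()
    suffix = ""
    # Only treat the last word as a suffix if at least two names remain.
    if len(tokens) > 2 and tokens[-1].lower() in KNOWN_SUFFIXES:
        suffix = tokens.pop()
        tokens[-1] = tokens[-1].rstrip(",")  # "Doe, Jr." -> family "Doe"
    if len(tokens) == 0:
        return {"given": "", "middle": "", "family": "", "suffix": suffix}
    if len(tokens) == 1:
        # A single name: keep it as the given name, leave family blank.
        return {"given": tokens[0], "middle": "", "family": "", "suffix": suffix}
    return {
        "given": tokens[0],
        "middle": " ".join(tokens[1:-1]),
        "family": tokens[-1],
        "suffix": suffix,
    }


print(guess_name_parts("Renee Marie Parilak Teate"))
print(guess_name_parts("John Doe, Jr."))
print(guess_name_parts("Mao Ze Dong"))  # guesses wrong on purpose
```

For “Mao Ze Dong” this returns Mao as the given name and Dong as the family name, which is backwards, so the correction step on the profile page is not optional.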

In my case, I would be able to specify that my first name is “Renee” and my last name is “Teate”, and the other two names are my middle names. My friend could specify her full legal name and also that she prefers to be addressed as her middle name and not her legal first name. Mao Ze Dong could specify that Mao is the Family Name and Dong is the Given name, while still leaving the names in the original order for the full legal name.

Make sure your database can handle all of the variations that may be written on a paper form. Also make sure you can handle special characters, such as accented letters like the “é” in Renée.
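One special-character trap is that the same accented name can arrive as different code point sequences: “é” can be a single code point or a plain “e” followed by a combining accent, and the two look identical on screen. A common approach, assumed here rather than prescribed, is to normalize names to Unicode NFC with Python’s standard `unicodedata` module before storing them; `normalize_for_storage` is a hypothetical helper.

```python
import unicodedata


def normalize_for_storage(name: str) -> str:
    """Trim surrounding whitespace and convert to Unicode NFC before saving."""
    return unicodedata.normalize("NFC", name.strip())


composed = "Ren\u00e9e"      # é as a single code point (U+00E9)
decomposed = "Rene\u0301e"   # e followed by a combining acute accent (U+0301)

print(composed == decomposed)              # False: different code points
print(len(composed), len(decomposed))      # 5 6
print(normalize_for_storage(composed) ==
      normalize_for_storage(decomposed))   # True
```

Without normalization, a lookup for “Renée” typed on one device can miss the “Renée” saved from another, which is the same kind of duplicate-record problem described above.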

Another note: if you are going to display the name publicly in any way, like on a user profile on a website, you have to give the person full control over how their name is displayed there. There are various safety and personal reasons a person may not want their name displayed the way you want to display it. Here’s one example story. Also see Nymwars.

And if my system were something like a medical insurance record, where past names may come in from doctor’s offices even though I have the patient’s current name, I might ask for a list of past names to keep on the record. You can store several “former names” in a table with a one-to-many relationship to the person’s primary record, and copy all of the prior name fields into it whenever a new name comes in. You can even store names that aren’t the person’s actual name but may come in from another system regularly (like a misspelling), or an incorrect version of their name that you stored in the past.
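Here is one way the former-names table might look, sketched with Python’s built-in `sqlite3` module and an in-memory database. The table and column names (`person`, `person_former_name`, `name_type`) and the `change_name` helper are hypothetical; the point is the one-to-many relationship and copying the old name fields before overwriting them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    person_id   INTEGER PRIMARY KEY,
    given_name  TEXT NOT NULL,
    middle_name TEXT,
    family_name TEXT NOT NULL
);
-- One person can have many former or alternate names.
CREATE TABLE person_former_name (
    former_name_id INTEGER PRIMARY KEY,
    person_id      INTEGER NOT NULL REFERENCES person(person_id),
    given_name     TEXT,
    middle_name    TEXT,
    family_name    TEXT,
    name_type      TEXT NOT NULL,  -- e.g. 'prior legal', 'known misspelling'
    recorded_on    TEXT DEFAULT CURRENT_DATE
);
""")


def change_name(conn, person_id, given, middle, family):
    """Copy the current name into person_former_name, then store the new one."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            "INSERT INTO person_former_name "
            "(person_id, given_name, middle_name, family_name, name_type) "
            "SELECT person_id, given_name, middle_name, family_name, 'prior legal' "
            "FROM person WHERE person_id = ?",
            (person_id,),
        )
        conn.execute(
            "UPDATE person SET given_name = ?, middle_name = ?, family_name = ? "
            "WHERE person_id = ?",
            (given, middle, family, person_id),
        )


conn.execute("INSERT INTO person VALUES (1, 'Renee', 'Marie', 'Parilak')")
change_name(conn, 1, "Renee", "Marie Parilak", "Teate")
# A misspelling that another system keeps sending can be stored as well.
conn.execute(
    "INSERT INTO person_former_name (person_id, given_name, family_name, name_type) "
    "VALUES (?, ?, ?, ?)",
    (1, "Rene'e", "Teate", "known misspelling"),
)
for row in conn.execute(
    "SELECT given_name, middle_name, family_name, name_type "
    "FROM person_former_name WHERE person_id = ? ORDER BY former_name_id",
    (1,),
):
    print(row)
```

Keeping a `name_type` column lets the same table hold prior legal names and known bad spellings without confusing the two.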

When in doubt, you can use the full legal name on communications. If sending an informal email, you can use the “What should we call you?” name. If sending a formal letter, you can use the Preferred Mail Name. Someone could have the preferred mail name of “Mr. J. Edgar Hoover” (funny that’s the name that came to mind as an example of a first initial when I’m writing about storing personal information), but prefer to be called “Ed” in person, and it’s good for your organization to know that.
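The selection rules in the paragraph above fit in a small function. This is a sketch: `Channel` and `name_for` are hypothetical names, and the full legal name in the example record is added only for illustration.

```python
from enum import Enum


class Channel(Enum):
    INFORMAL_EMAIL = "informal_email"
    FORMAL_LETTER = "formal_letter"
    OTHER = "other"


def name_for(channel: Channel, full_legal: str,
             call_me: str = "", mail_name: str = "") -> str:
    """Pick the stored name variant for a channel.

    Falls back to the full legal name when no preference is stored,
    or for any channel without a specific rule ("when in doubt").
    """
    if channel is Channel.INFORMAL_EMAIL and call_me:
        return call_me
    if channel is Channel.FORMAL_LETTER and mail_name:
        return mail_name
    return full_legal


hoover = {"full_legal": "John Edgar Hoover",
          "call_me": "Ed",
          "mail_name": "Mr. J. Edgar Hoover"}
print(name_for(Channel.INFORMAL_EMAIL, **hoover))  # Ed
print(name_for(Channel.FORMAL_LETTER, **hoover))   # Mr. J. Edgar Hoover
print(name_for(Channel.OTHER, **hoover))           # John Edgar Hoover
```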

If you have spouses in your database, the marriage record should store preferred informal and mailing joint names, like “Mr. and Mrs. Doe” or “John and Jane”, rather than just auto-generating the combinations dynamically (though you could default to that), since some couples have strong preferences about how their names are shown in letter salutations (like wanting the wife’s name first, or preferring always to be referred to formally). I work in a fundraising organization, and they definitely want to address large donors the way the donors want to be addressed! A relatively safe way to do envelope labels is to use “stacked” name labels, where both spouses’ preferred mailing names are written out completely and listed one above the other. This avoids uncomfortable situations in cases where the spouses have different last names, for instance. It also lets you handle cases that some systems have trouble with, such as “Drs. Jane and John Doe”, or professional/military prefixes and suffixes in a certain order.
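A minimal sketch of both ideas: a stored joint salutation that only falls back to an auto-generated “A and B” when the couple hasn’t stated a preference, and a stacked label that writes each spouse’s preferred mailing name out in full. The function names, people, and address are hypothetical placeholders.

```python
def joint_salutation(stored: str, name_a: str, name_b: str) -> str:
    """Use the couple's stored preference; only default to 'A and B' if blank."""
    return stored if stored else f"{name_a} and {name_b}"


def stacked_label(mail_name_a: str, mail_name_b: str, address_lines: list) -> str:
    """Write both preferred mailing names out in full, one above the other."""
    return "\n".join([mail_name_a, mail_name_b, *address_lines])


print(joint_salutation("", "Jane", "John"))                        # Jane and John
print(joint_salutation("Drs. Jane and John Doe", "Jane", "John"))  # Drs. Jane and John Doe
print(stacked_label("Dr. Jane Doe", "Mr. John Smith",
                    ["123 Main St", "Anytown, ST 00000"]))
```

The stacked label never has to decide whose surname “wins”, which is why it copes with spouses who have different last names.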

Speaking of marriage, there is a great article on storing a variety of marriage relationships in your database, called “Y2Gay”. It’s a good read for database designers or anyone who has to think about data issues!

Besides having a system that can respond to customer preferences (following their preferences will make them happier customers; rejecting an insurance claim because the first/middle/last names don’t exactly line up with your record will not), having all of these name variations does help in a data science way as well: you can now better match profiles coming in from different systems and have a higher degree of confidence that you are pulling incoming data into the correct “Mary Smith” record, for instance.
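To make the matching idea concrete, here is a rough sketch that compares an incoming name against every stored variant (current name, former names, known misspellings) for each person. It uses Python’s standard `difflib.SequenceMatcher` as a simple stand-in for a real record-linkage method; `match_key`, `best_match`, the 0.85 threshold, and the sample records are all assumptions for illustration.

```python
import difflib
import unicodedata


def match_key(name: str) -> str:
    """Hypothetical matching key: casefold, drop accents and punctuation, collapse spaces."""
    decomposed = unicodedata.normalize("NFD", name.casefold())
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in no_marks)
    return " ".join(cleaned.split())


def best_match(incoming: str, people: dict, threshold: float = 0.85):
    """Return (person_id, score) for the closest stored name variant, or None."""
    key = match_key(incoming)
    best = None
    for person_id, variants in people.items():
        for variant in variants:
            score = difflib.SequenceMatcher(None, key, match_key(variant)).ratio()
            if best is None or score > best[1]:
                best = (person_id, score)
    # Below the threshold, return None so the caller can send the record
    # for manual review instead of attaching it to the wrong person.
    return best if best is not None and best[1] >= threshold else None


# Each person's current name plus former names and known misspellings.
people = {
    1: ["Renee Marie Parilak Teate", "Renee M. Parilak", "Rene'e Teate"],
    2: ["Mary Smith"],
}
print(best_match("Ren\u00e9e Parilak", people))
print(best_match("Mary Smyth", people))
```

The incoming “Renée Parilak” lands on person 1 through the stored “Renee M. Parilak” variant even though it doesn’t exactly match any stored name. In practice you would also compare other fields, like date of birth or address, before merging records.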

Hopefully, our stories provide some food for thought for people designing databases, websites, and processes involving people’s names!

Please share your experiences, thoughts, and comments on my database field choices below!

Here are some other references on the topic:

Bad assumptions programmers make about names

Related stackoverflow topic

Using External Data in Data Matching

Data Models and Real World Alignment