I apologize to anyone I may appear to be talking down to, but I honestly affirm that is not what I am doing here. I don't know the technical level of anyone here, so I'm just going for the lowest common denominator!
It's certainly true that comparing integers to other integers is quicker than comparing one set of letters to another set of letters. It's also true that storing an integer in a database takes up a lot less space than storing all those letters.
(Incidentally, I'm using the word 'integer' and not 'number' specifically - using any 'number', such as 13.5674201, would be just as problematic. Using a neat integer with no decimal places is a lot easier to compare.)
But the problem described above is not insurmountable, and here's why...
In a database you would have a table listing all the languages in the core rules, and against each one there would be a unique number. No two languages would have the same number. This means that every character "User A" creates stores only those numbers (for speed and storage reasons) - but when you look at the character, the software goes "hey, Albion the Dwarf knows languages 1 and 3, which according to this list are Common and Dwarven" (albeit a lot quicker than you or I can read).
So how do you allow "User A" to have other languages that aren't in the core list? Simple. You have another table, which adds in any language that doesn't already exist - but is only visible to "User A" and all the characters he cares to create. How is this stored? Using numbers - no different to the way the core list is stored.
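For anyone who wants to see the two-table idea in something runnable, here's a minimal sketch using Python's built-in sqlite3 module. To be clear, this is just my illustration and not how any real product does it: the table names (core_language, user_language, character_language), the column names, the made-up language "Gutter Speak" and the two helper functions are all assumptions of mine.

```python
import sqlite3

# In-memory database, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE core_language (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
-- Hypothetical per-user table: same idea, but each row has an owner.
CREATE TABLE user_language (
    id    INTEGER PRIMARY KEY,
    owner TEXT NOT NULL,
    name  TEXT NOT NULL,
    UNIQUE (owner, name)
);
-- A character stores only numbers: one id from either list per row.
CREATE TABLE character_language (
    character_name TEXT NOT NULL,
    core_id INTEGER REFERENCES core_language(id),
    user_id INTEGER REFERENCES user_language(id),
    CHECK ((core_id IS NULL) <> (user_id IS NULL))
);
""")

conn.executemany(
    "INSERT INTO core_language (id, name) VALUES (?, ?)",
    [(1, "Common"), (2, "Elven"), (3, "Dwarven")],
)
# "Gutter Speak" is an invented custom language belonging to User A.
conn.execute(
    "INSERT INTO user_language (id, owner, name) VALUES (?, ?, ?)",
    (1, "User A", "Gutter Speak"),
)
conn.executemany(
    "INSERT INTO character_language (character_name, core_id, user_id) "
    "VALUES (?, ?, ?)",
    [
        ("Albion the Dwarf", 1, None),
        ("Albion the Dwarf", 3, None),
        ("Albion the Dwarf", None, 1),
    ],
)


def languages_for(conn, character_name):
    """Turn a character's stored numbers back into language names."""
    rows = conn.execute(
        """
        SELECT COALESCE(c.name, u.name)
        FROM character_language AS cl
        LEFT JOIN core_language AS c ON c.id = cl.core_id
        LEFT JOIN user_language AS u ON u.id = cl.user_id
        WHERE cl.character_name = ?
        ORDER BY cl.rowid
        """,
        (character_name,),
    ).fetchall()
    return [name for (name,) in rows]


def visible_languages(conn, owner):
    """Core list plus whatever this one user has added for themselves."""
    rows = conn.execute(
        "SELECT name FROM core_language "
        "UNION ALL "
        "SELECT name FROM user_language WHERE owner = ?",
        (owner,),
    ).fetchall()
    return sorted(name for (name,) in rows)


print(languages_for(conn, "Albion the Dwarf"))
print(visible_languages(conn, "User B"))
```

The first print shows Albion's three languages looked up from nothing but stored numbers, and the second shows that another user only ever sees the core list.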
It's so simple any reasonably experienced developer would know how to design it. I've done it several times on several projects - each with many users. But let's number crunch a little...
How many DDi subscribers are there? Any idea? I don't. Let's pick a completely arbitrary and ridiculous 1 million! How many languages are there in 4e? I know this one! Well, I don't - but I'd be surprised if it was more than 20.
Out of those 1 million subscribers how many do you think use languages not in the core list? Let's be equally arbitrary and say *all* of them. And not just one additional language, let's have another 20 languages for *each* subscriber!
How big would this table be? 20,000,000 rows. Yes, but in terms of storage? Sorry to go all technical but let's assume 4 bytes for the number (well, I say *assume*, it *is* 4 bytes for a standard 32-bit integer) and 100 bytes for a 50 letter language name at 2 bytes per letter (err, wow! That's probably a language in itself) - this means each new language takes up 104 bytes. Multiply that by 20,000,000 and we get 2,080,000,000 bytes, or roughly 2GB. 2GB!? That's less than a memory stick nowadays.
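If you'd rather check the sums than take my word for it, here they are as a few lines of Python. The variable names are mine, the figures are the deliberately silly ones from this post, and it ignores any overhead a real database adds for indexes and the like.

```python
# Back-of-envelope storage estimate using the post's made-up figures.
subscribers = 1_000_000        # arbitrary guess at subscriber count
custom_per_subscriber = 20     # extra languages per subscriber
id_bytes = 4                   # one 32-bit integer
name_bytes = 100               # 50 letters at an assumed 2 bytes each

rows = subscribers * custom_per_subscriber
total_bytes = rows * (id_bytes + name_bytes)

total_gb = total_bytes / 10**9    # decimal gigabytes
total_gib = total_bytes / 2**30   # binary gibibytes

print(f"{rows:,} rows, {total_bytes:,} bytes")
print(f"{total_gb:.2f} GB, or {total_gib:.2f} GiB")
```

Whether that counts as just over or just under 2 gigabytes depends on whether you count in powers of 10 or powers of 2, but either way it's tiny.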
In the examples above I've shown a simple way of implementing it, then blown it apart with over-the-top numbers. It's peanuts, it really is.