The following essay was originally published in the September 1991 issue of the Linguica APA (Issue #9). I have made a few minor changes since then.

****************************************************************

Designing an Artificial Language Morphology

by Rick Morneau

September 23, 1991
Revised: July 16, 1994

In this essay, I will discuss ways in which phonemes can be combined into morphemes (minimal units of meaning), and how morphemes can be combined into words. I will discuss morphology only in a very restricted sense; i.e., the *shapes* of words. I will not discuss inflectional morphology at all. And I will postpone the discussion of derivational morphology to my forthcoming essay on Lexical Semantics. As a result, this essay will be somewhat abstract.

Since the morphological rules of a language state how phonemes can be linked together to form morphemes, the morphology of a language will have a strong effect on how easy or difficult it is to pronounce. Fortunately or unfortunately, most people have difficulty with complex consonant clusters, and so words such as /mksjzptlk/ are not likely to be part of any language's lexicon (unless, of course, you're from the fifth dimension :-).

Even clusters that some people consider simple can be quite a challenge to others. For example, most Indo-European languages allow consonant clusters within a single syllable. English examples of this are the "str" in "string", the "bl" in "blue", the "spl" in "splash", the "sk" in "skip", and the "pr" in "prune". Native speakers of most Indo-European languages have few if any problems producing these sounds, but others who study English find them quite difficult and many never master them. Keep this in mind when designing your artificial language (henceforth AL) if you want your language to appeal to as many people as possible.

A word can consist of one or more syllables.
For the purpose of this discussion, a syllable is a vowel or diphthong optionally preceded by one or more consecutive consonants, and optionally followed by one or more consecutive consonants. Thus, for the vast majority of languages, a syllable has the form:

     {C}V{V}{C}

     where {} indicates zero or more of the enclosed item
           C  indicates a consonant
           V  indicates a vowel or semivowel

However, very few languages take full advantage of the capabilities of the human vocal tract. In fact, a large majority of the world's languages manage to get by with a subset of the above structure which looks more like this:

     [C][S]V[V][S][N]

     where [] indicates that the enclosed item is optional
           C  indicates a consonant
           S  indicates a semivowel
           V  indicates a vowel
           N  indicates a nasal

Thus, the simpler structure will allow syllables pronounced like English "him", "queen", "boa" and "toy", but it will not allow syllables like "hit", "string", "plank" or "flirt". The more complex structure will allow all of them.

Note that the lack of consonant clusters and the requirement that the final consonant be a nasal greatly reduce the number of possible syllables that one can create from a fixed phonemic inventory. However, when two such syllables are juxtaposed, the result is very easy to pronounce. For example, speakers of Indo-European languages can pronounce /gwikto/ as easily as /gwinto/, but speakers of most other languages will find /gwikto/ so difficult that they will often slip in a vowel between the /k/ and the /t/. The nasal /n/ is not a problem because nasals are highly vocalic in nature, and co-articulate very smoothly with the preceding vowel.

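
The two syllable templates above can be written as patterns and checked mechanically. The following Python fragment is only a rough sketch under stated assumptions: the one-letter-per-phoneme inventory, the names GENERAL, RESTRICTED, fits and word_fits, and the hand-made syllable splits of /gwinto/ and /gwikto/ are all illustrative and are not part of the essay.

```python
import re

# Assumed one-letter-per-phoneme classes; the essay fixes no inventory here.
CONSONANTS = "bpdtgkzsvfmn"  # nasals are included as ordinary consonants
SEMIVOWELS = "yw"
VOWELS = "aeiou"
NASALS = "mn"

# {C}V{V}{C}, read literally: any run of consonants, one or more vowels
# or semivowels, then any run of consonants.
GENERAL = re.compile(
    f"[{CONSONANTS}]*[{VOWELS}{SEMIVOWELS}]+[{CONSONANTS}]*"
)

# [C][S]V[V][S][N]: each bracketed slot holds at most one phoneme.
RESTRICTED = re.compile(
    f"[{CONSONANTS}]?[{SEMIVOWELS}]?[{VOWELS}]{{1,2}}"
    f"[{SEMIVOWELS}]?[{NASALS}]?"
)


def fits(template, syllable):
    """Return True if the whole syllable matches the template."""
    return template.fullmatch(syllable) is not None


def word_fits(template, syllables):
    """Return True if every syllable of a pre-split word matches."""
    return all(fits(template, s) for s in syllables)


# Hand-made syllable splits of the essay's two example words (an assumption).
for word in (["gwin", "to"], ["gwik", "to"]):
    print(".".join(word), word_fits(GENERAL, word), word_fits(RESTRICTED, word))
```

Under these assumptions, both words fit the general template, but only gwin.to fits the restricted one, because the /k/ that closes "gwik" is not a nasal.
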
If you feel that the second structure is too limiting, you may want to consider a compromise which will be easy to pronounce for most but not all people, and looks like this:

     [C1][S]V[V][S][C2]

     where [] indicates that the enclosed item is optional
           C1 indicates any consonant
           S  indicates a semivowel
           V  indicates a vowel
           C2 indicates a continuant consonant or a nasal

Continuant consonants are fricatives and liquids; i.e., just about everything except nasals, stops and affricates. However, a potential problem shows up here when C2 of a syllable equals C1 of the following syllable, as in /bassun/. One solution is simply to insist that the double consonant be audibly lengthened. A second approach is to use only non-continuants and non-nasals (that is, stops and affricates) for C1, so that C2 can never equal the C1 that follows it.

Once you've decided on the general shape of a syllable, the next step is to decide how to hook syllables together to form morphemes and words. At this point, you have two choices: an ad hoc approach or a formal approach.

If you plan to borrow morphemes directly from existing languages, then you're limited to the ad hoc approach. Basically, you'll choose your morphemes from existing languages and combine the roots, prefixes, suffixes and infixes to create a word. Esperanto and most of the ALs based on European languages fall into this category.

In a more formal approach, the shape of a morpheme will indicate the role it plays in a word. Thus, a prefix will have a different shape than a root, which will have a different shape than a suffix, and so forth. In fact, if you play your cards right, you will not only be able to split a word into its component morphemes on sight, but you'll also know where word boundaries are, even if there are no spaces or pauses between them. You might say that your morphemes and words are auto-isolating or self-segregating. (Loglanists use the expression *audiovisual isomorphism* to describe this, which, in my humble opinion, is totally inappropriate and overly pedantic.)
This, of course, would be ideal if you want to speak to a computer, since you won't have to put pauses between words. [By the way, the problem of isolating words in continuous speech is one of the most difficult that the speech-processing community is now facing. I don't expect a solution any time soon.]

So, how do we create a self-segregating morphology? We do it by ensuring that each type of morpheme can always be identified by its shape, and by ensuring that each type can occupy only one position in a word. Consider a simple example of an easy-to-pronounce language with only three morpheme types:

     C = b, p, d, t, g, k, z, s, v, f
     V = a, e, i, o, u
     S = y, w
     N = m, n

     prefix = CSV
     root   = CVN
     suffix = CV

     word   = {prefix} {root} suffix

Thus, examples of complete words would be:

     za, ke, tembo, sandu, kwabe, pyobendi, kyusintemda, byupwetu, etc.

Note that if we removed all the spaces and ran these words together, we could still split them apart easily and unambiguously.

This example, however, has at least one serious flaw. Since the root form is CVN, the maximum number of roots we can form with our phoneme inventory is only 10 x 5 x 2 = 100. Since we'll need many more than that, let's add disyllabic and trisyllabic root forms:

     C = b, p, d, t, g, k, z, s, v, f
     V = a, e, i, o, u
     S = y, w
     N = m, n
     SPECIAL = q (English "ch" in "church")
               x (English "sh" in "shop")

     prefix = CSV
     root   = CVN or CV[N]qV[N] or CV[N]xV[N]CV[N]
     suffix = CV

     word   = {prefix} {root} suffix

Note that "q" and "x" simply indicate that the root continues with one or two more syllables, respectively. Examples of two-syllable roots would be binqan, temqu and saqem. Examples of three-syllable roots would be kuxiba, tixendi, zomxate and panxotun. Next, add prefixes and suffixes and you would have something like kwabinqandu, temqusa, pyosaqembe, kuxibato, fyotixendika, zomxatebi and panxotunki. (With this type of morphology, no one is going to accuse you of being Eurocentric.
:-)

Note that, even with this small phonemic inventory, you can create 2,250 unique disyllabic roots (10 x 5 x 3 x 5 x 3, counting m, n or nothing for each optional N) and 337,500 unique trisyllabic roots (10 x 5 x 3 x 5 x 3 x 10 x 5 x 3).

The above is just one of many possible examples of what can be done with a formally designed morphology. You can add new forms such as CVC, CV[S]N, C[S]VN, C[S]V[S]N, CV'V, CV'VN, C[S]V'VN and so on (where the apostrophe indicates a glottal stop), or you can dedicate specific phonemes to specific purposes as we did above with "q" and "x". Your choices are limited only by the requirements you set for yourself.

****************************************************************

Addendum:

An idea that occurred to me after I wrote the above piece was to dedicate a vowel, such as /a/, for exclusive use in creating polysyllabic morphemes. This phoneme would not be used for anything else. For example, a morpheme of type CVN could be "tun", "batun", "kwasatun", "dambyamatun", etc. In other words, whenever /a/ appears, it indicates that the morpheme continues to the right. Only the last syllable is used to determine the morpheme's type.

End of essay