The following essay was originally published in the September 1991 issue
of the Linguica APA (Issue #9).  I have made a few minor changes since
then.


****************************************************************

                   Designing an Artificial Language

                            Morphology

                         by Rick Morneau

                        September 23, 1991
                       Revised: July 16, 1994


In this essay, I will discuss ways in which phonemes can be combined
into morphemes (minimal units of meaning), and how morphemes can be
combined into words.  I will discuss morphology only in a very
restricted sense; i.e., the *shapes* of words.  I will not discuss
inflectional morphology at all.  And I will postpone the discussion of
derivational morphology to my forthcoming essay on Lexical Semantics.
As a result, this essay will be somewhat abstract.

Since the morphological rules of a language state how phonemes can be
linked together to form morphemes, the morphology of a language will
have a strong effect on how easy or difficult it is to pronounce.
Fortunately or unfortunately, most people have difficulty with complex
consonant clusters, and so words such as /mksjzptlk/ are not likely to
be part of any language's lexicon (unless, of course, you're from the
fifth dimension :-).  Even clusters that some people consider simple can
be quite a challenge to others.  For example, most Indo-European
languages allow consonant clusters within a single syllable.  English
examples of this are the "str" in "string", the "bl" in "blue", the
"spl" in "splash", the "sk" in "skip", and the "pr" in "prune".  Native
speakers of most Indo-European languages have few if any problems
producing these clusters, but many other learners of English find them
quite difficult, and some never master them.  Keep this in mind when designing
your artificial language (henceforth AL) if you want your language to
appeal to as many people as possible.

A word can consist of one or more syllables.  For the purpose of this
discussion, a syllable is a vowel or diphthong optionally preceded by
one or more consecutive consonants, and optionally followed by one or
more consecutive consonants.  Thus, for the vast majority of languages,
a syllable has the form:

        {C}V{V}{C}     where {} indicates zero or more of the
                                  enclosed item
                              C indicates a consonant
                              V indicates a vowel or semivowel

However, very few languages take full advantage of the capabilities of
the human vocal tract.  In fact, a large majority of the world's
languages manage to get by with a subset of the above structure which
looks more like this:

        [C][S]V[V][S][N] where [] indicates that the enclosed
                                   item is optional
                                C indicates a consonant
                                S indicates a semivowel
                                V indicates a vowel
                                N indicates a nasal

Thus, the simpler structure will allow syllables pronounced like English
"him", "queen", "boa" and "toy", but it will not allow syllables like
"hit", "string", "plank" or "flirt".  The more complex structure will
allow all of these.  Note that the lack of consonant clusters and the
requirement that the final consonant be a nasal greatly reduce the
number of possible syllables that one can create from a fixed phonemic
inventory.  However, when two such syllables are juxtaposed, the result
is very easy to pronounce.  For example, speakers of Indo-European
languages can pronounce /gwikto/ as easily as /gwinto/, but speakers of
most other languages will find /gwikto/ so difficult that they will
often slip in a vowel between the /k/ and the /t/.  The nasal /n/ is not
a problem because nasals are highly vocalic in nature, and co-articulate
very smoothly with the preceding vowel.
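
As a rough illustration, the simpler template can be checked
mechanically.  The sketch below is only that: the one-letter-per-phoneme
inventory is my own assumption (it is not part of any particular
language), and "kwin" is a crude respelling of English "queen".

```python
import re

# Assumed toy inventory: one letter stands for one phoneme.
C = "bpdtgkzsvfhlrmn"   # any consonant
S = "yw"                # semivowels
V = "aeiou"             # vowels
N = "mn"                # nasals

# [C][S]V[V][S][N]: every slot except the first V is optional.
SIMPLE = re.compile(f"[{C}]?[{S}]?[{V}][{V}]?[{S}]?[{N}]?")

def is_simple_syllable(s):
    """True if the whole string fits the simpler syllable template."""
    return SIMPLE.fullmatch(s) is not None

for s in ["him", "kwin", "boa", "toy", "hit", "string", "plank", "flirt"]:
    print(s, is_simple_syllable(s))
```

The first four strings are accepted and the last four are rejected,
either because of an initial cluster or because the final consonant is
not a nasal.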

If you feel that the second structure is too limiting, you may want to
consider a compromise which will be easy to pronounce for most but not
all people, and looks like this:

        [C1][S]V[V][S][C2] where []  indicates that the enclosed
                                      item is optional
                                  C1 indicates any consonant
                                  S  indicates a semivowel
                                  V  indicates a vowel
                                  C2 indicates a continuant consonant
                                      or a nasal

Continuant consonants are fricatives and liquids; i.e., just about
everything except nasals, stops and affricates.  However, a potential
problem shows up here when C2 of a syllable equals C1 of the following
syllable, as in /bassun/.  One solution is simply to insist that the
double consonant be audibly lengthened.  A second approach is to
restrict C1 to stops and affricates (i.e., non-continuants and
non-nasals), so that C2 can never equal the C1 that follows it.
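
The double-consonant problem is easy to detect mechanically.  Here is a
small sketch, assuming that words arrive already split into syllables
and that each letter stands for one phoneme; the function name and the
letter sets are hypothetical.

```python
def doubled_boundaries(syllables):
    """Return (index, consonant) for each boundary where a syllable's final
    phoneme equals the next syllable's initial phoneme."""
    vowels_and_semivowels = set("aeiouyw")  # assumed one-letter phonemes
    hits = []
    for i in range(len(syllables) - 1):
        last, first = syllables[i][-1], syllables[i + 1][0]
        if last == first and last not in vowels_and_semivowels:
            hits.append((i, last))
    return hits

print(doubled_boundaries(["bas", "sun"]))   # boundary 0 doubles 's'
print(doubled_boundaries(["ban", "sun"]))   # no doubled consonant

# If C1 is limited to stops, it shares nothing with C2, so the
# problem cannot arise in the first place.
C1_STOPS = set("bpdtgk")        # assumed stop inventory
C2_ALLOWED = set("zsvflrmn")    # assumed continuants and nasals
print(C1_STOPS & C2_ALLOWED)    # empty set
```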

Once you've decided on the general shape of a syllable, the next step is
to decide how to hook them together to form morphemes and words.  At
this point, you have two choices: an ad hoc approach or a formal
approach.  If you plan to borrow morphemes directly from existing
languages, then you're limited to the ad hoc approach.  Basically,
you'll choose your morphemes from existing languages and combine the
roots, prefixes, suffixes and infixes to create a word.  Esperanto and
most of the ALs based on European languages fall into this category.

In a more formal approach, the shape of a morpheme will indicate the
role it plays in a word.  Thus, a prefix will have a different shape
than a root, which will have a different shape than a suffix, and so
forth.  In fact, if you play your cards right, you will not only be able
to split a word into its component morphemes on sight, but you'll also
know where word boundaries are, even if there are no spaces or pauses
between them.  You might say that your morphemes and words are
auto-isolating or self-segregating.  (Loglanists use the expression
*audiovisual isomorphism* to describe this, which, in my humble opinion,
is totally inappropriate and overly pedantic.)  This, of course, would
be ideal if you want to speak to a computer, since you won't have to put
pauses between words.  [By the way, the problem of isolating words in
continuous speech is one of the most difficult that the speech-
processing community is now facing.  I don't expect a solution any time
soon.]

So, how do we create a self-segregating morphology?  We do it by ensuring
that each type of morpheme can always be identified by its shape, and by
ensuring that each type can occupy only one position in a word.
Consider a simple example of an easy-to-pronounce language with only
three morpheme types:

                C = b, p, d, t, g, k, z, s, v, f
                V = a, e, i, o, u
                S = y, w
                N = m, n

                prefix = CSV
                root = CVN
                suffix = CV

                word = {prefix} {root} suffix

Thus, examples of complete words would be: za, ke, tembo, sandu, kwabe,
pyobendi, kyusintemda, byupwetu, etc.  Note that if we removed all
spaces and squished them all together, we could easily and unambiguously
split them apart.  This example, however, has at least one serious flaw.
Since the root form is CVN, the maximum number of roots we can form with
our phoneme inventory is only 10 x 5 x 2 = 100.  Since we'll need many
more than that, let's add disyllabic and trisyllabic root forms:

                C = b, p, d, t, g, k, z, s, v, f
                V = a, e, i, o, u
                S = y, w
                N = m, n
                SPECIAL = q (English "ch" in "church")
                          x (English "sh" in "shop")

                prefix = CSV
                root = CVN   or   CV[N]qV[N]   or   CV[N]xV[N]CV[N]
                suffix = CV

                word = {prefix} {root} suffix

Note that "q" and "x" simply indicate that the root continues with one
or two more syllables, respectively.  Examples of two-syllable roots
would be binqan, temqu and saqem.  Examples of three-syllable roots
would be kuxiba, tixendi, zomxate and panxotun.  Next, add prefixes and
suffixes and you would have something like kwabinqandu, temqusa,
pyosaqembe, kuxibato, fyotixendika, zomxatebi and panxotunki.  (With
this type of morphology, no one is going to accuse you of being
Eurocentric.  :-)  Note that, even with this small phonemic inventory,
you can create 2,250 unique disyllabic roots and 337,500 unique
trisyllabic roots.
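
To make the self-segregating property concrete, here is a minimal
sketch of a segmenter for the morphology above.  It is one possible
approach, not the only one: the regular expression simply tries the
longer root forms first, and the names MORPHEME and segment are my own
inventions for illustration.

```python
import re

C, V, S, N = "bpdtgkzsvf", "aeiou", "yw", "mn"
syl = f"[{C}][{V}][{N}]?"          # CV[N]

# Order matters: longer root forms are tried before shorter ones.
MORPHEME = re.compile(
    f"(?P<root3>{syl}x[{V}][{N}]?{syl})"   # CV[N]xV[N]CV[N]
    f"|(?P<root2>{syl}q[{V}][{N}]?)"       # CV[N]qV[N]
    f"|(?P<root1>[{C}][{V}][{N}])"         # CVN
    f"|(?P<prefix>[{C}][{S}][{V}])"        # CSV
    f"|(?P<suffix>[{C}][{V}])"             # CV
)

def segment(stream):
    """Split an unspaced stream into words, each a list of (kind, morpheme)."""
    words, current, seen_root, pos = [], [], False, 0
    while pos < len(stream):
        m = MORPHEME.match(stream, pos)
        if m is None:
            raise ValueError(f"no morpheme at position {pos}")
        kind = "root" if m.lastgroup.startswith("root") else m.lastgroup
        if kind == "prefix" and seen_root:
            raise ValueError(f"prefix after root at position {pos}")
        seen_root = seen_root or kind == "root"
        current.append((kind, m.group()))
        pos = m.end()
        if kind == "suffix":          # a suffix always closes the word
            words.append(current)
            current, seen_root = [], False
    if current:
        raise ValueError("stream ends inside a word (missing suffix)")
    return words

stream = ("kwabinqandutemqusapyosaqembekuxibato"
          "fyotixendikazomxatebipanxotunki")
print(["".join(text for _, text in word) for word in segment(stream)])
```

Run on the seven example words squished together, this recovers the
original seven words.  It works because a suffix always ends a word and
because no morpheme can begin with a nasal, "q" or "x".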

The above is just one of many possible examples of what can be done with
a formally designed morphology.  There are many other things that you
can do.  You can add new forms (CVC, CV[S]N, C[S]VN, C[S]V[S]N, CV'V,
CV'VN, C[S]V'VN, etc.  (where the apostrophe indicates a glottal stop)),
or you can dedicate specific phonemes for specific purposes as we did
above with "q" and "x".  Your choices are limited only by the
requirements you set for yourself.
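
When weighing one form against another, it helps to count how many
morphemes each form can produce.  The sketch below does this for the
phoneme classes used earlier; the helper count_forms and its slot
notation are hypothetical, invented only for this illustration.

```python
from math import prod

# Class sizes from the example inventory above.
SIZES = {"C": 10, "V": 5, "S": 2, "N": 2}

def count_forms(slots):
    """Count the strings a template can produce.

    `slots` is a list of (class, optional) pairs.  An optional slot adds
    one extra choice (leaving it empty).  Fixed marker phonemes such as
    "q" and "x" contribute a factor of 1, so they are simply omitted.
    """
    return prod(SIZES[c] + (1 if optional else 0) for c, optional in slots)

req, opt = False, True
cvn = [("C", req), ("V", req), ("N", req)]    # CVN
cv_n = [("C", req), ("V", req), ("N", opt)]   # CV[N]
v_n = [("V", req), ("N", opt)]                # V[N]
cvc = [("C", req), ("V", req), ("C", req)]    # CVC

print(count_forms(cvn))                 # 100
print(count_forms(cv_n + v_n))          # 2250   (CV[N]qV[N])
print(count_forms(cv_n + v_n + cv_n))   # 337500 (CV[N]xV[N]CV[N])
print(count_forms(cvc))                 # 500
```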

****************************************************************

Addendum:

An idea that occurred to me after I wrote the above piece was to
dedicate a vowel, such as /a/, for exclusive use in creating
polysyllabic morphemes.  This phoneme would not be used for anything
else.  For example, a morpheme of type CVN could be "tun", "batun",
"kwasatun", "dambyamatun", etc.  In other words, whenever /a/ appears,
it indicates that the morpheme continues to the right.  Only the last
syllable is used to determine the morpheme's type.
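
Here is a rough sketch of how a morpheme built this way might be
classified.  It rests on several assumptions of my own: continuation
syllables have the shape C[S]a[N], the final syllable uses one of the
three shapes from the first example (CSV, CVN, CV) with any vowel except
/a/, and nasals are allowed as initial consonants, since "dambyamatun"
needs one.  It looks at one isolated morpheme, not at continuous speech.

```python
import re

C, S, N = "bpdtgkzsvfmn", "yw", "mn"    # nasals allowed as onsets (assumed)
V = "eiou"                              # /a/ is reserved for continuation

CONT = f"(?:[{C}][{S}]?a[{N}]?)"        # assumed continuation syllable
FINAL = {                               # shapes from the first example
    "root": f"[{C}][{V}][{N}]",
    "prefix": f"[{C}][{S}][{V}]",
    "suffix": f"[{C}][{V}]",
}

def classify(morpheme):
    """Return (type, final_syllable), or None if the morpheme is ill-formed."""
    for kind, shape in FINAL.items():
        m = re.fullmatch(f"{CONT}*({shape})", morpheme)
        if m:
            return kind, m.group(1)
    return None

for word in ["tun", "batun", "kwasatun", "dambyamatun", "tan"]:
    print(word, classify(word))
```

The first four are all classified as roots ending in "tun".  The last
one is rejected, because a morpheme cannot end on a continuation
syllable.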


                         End of essay