Proto Indo-European

What is Proto Indo-European Language?

The term Proto Indo-European (PIE) refers to the common ancestor of a large family of languages, from which numerous daughter languages arose over time as tribes moved apart. The theory centers on the idea that many languages of Europe and Asia developed from a single original language. Linguists believe that this original language was spoken around 5000 B.C., either in the region between Poland and the former U.S.S.R. or in modern-day Turkey. As tribes grew, evolved, and migrated, their dialects changed until they were substantially different from one another. Over time, each dialect became a daughter language of the original PIE; it is believed that about half the world's population speaks an Indo-European language, English being the most widespread. Though different, these languages share some basic vocabulary, pronunciations, and phonetic rules; linguists believe that by comparing them one can identify connections that link them back to the original PIE.

Where did it Originate?

Two Theories:

The origin of the PIE language is a long-debated topic, and two major theories address it. The first, and the most widely accepted and supported, is the Kurgan theory, which postulates that the people of an archaeological "Kurgan culture" (a term grouping the Pit Grave culture and its predecessors) in the Pontic steppe were the most likely speakers of the PIE language.

The other, less accepted theory is that of the Anatolian Urheimat (homeland). This theory suggests that the spread of the Indo-European languages was a result of the spread of agriculture. It implies a significantly older age for the PIE language (ca. 9,000 years as opposed to ca. 6,000 years).


According to the archeologist Marija Gimbutas, who formulated the Kurgan theory, the Indo-Europeans were a nomadic people in southern Russia who expanded on horseback in several waves during the 3rd millennium BC. Their expansion coincided with the taming of the horse, which gives the theory archeological support. Gimbutas put further emphasis on the idea that the invading cultures were patriarchal while those invaded were matriarchal, an idea she supported with Neolithic graves that she read as indicating a matriarchal society. Evidence offered for the Anatolian theory comes from a process called glottochronology (an approach in historical linguistics for estimating the time at which languages diverged, based on the assumption that the basic, or core, vocabulary of a language changes at a constant average rate). According to this method, the Indo-European languages date back about 9,000 years, which matches the older timeline of the Anatolian theory. However, most mainstream linguists consider the method invalid.
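
The constant-rate assumption behind glottochronology can be made concrete with the classic formula associated with Morris Swadesh, t = ln(c) / (2 ln(r)), where c is the share of core vocabulary two languages still have in common and r is the assumed share retained per thousand years. The sketch below is illustrative only: the function name is hypothetical, the default retention rate of 0.86 is the figure conventionally quoted for a 100-word list, and the sample value of 30% shared vocabulary is made up for the example rather than measured.

```python
import math


def divergence_time(shared_fraction, retention_rate=0.86):
    """Estimate how many millennia ago two languages split.

    shared_fraction: proportion (0..1] of core-vocabulary items that are
        still cognate in both languages.
    retention_rate: assumed proportion of core vocabulary a language keeps
        per thousand years (an assumption, not a measured constant).
    """
    if not 0 < shared_fraction <= 1:
        raise ValueError("shared_fraction must be in (0, 1]")
    if not 0 < retention_rate < 1:
        raise ValueError("retention_rate must be in (0, 1)")
    # Both languages change independently, hence the factor of 2.
    return math.log(shared_fraction) / (2 * math.log(retention_rate))


if __name__ == "__main__":
    # Hypothetical input: two languages sharing 30% of their core vocabulary.
    print(round(divergence_time(0.30), 2), "thousand years")
```

The formula is only as good as its constant-rate assumption, which is exactly the point most linguists dispute.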

The Kurgan theory hypothesizes that the Kurgan people had four successive periods, with the earliest (Kurgan I) including the Samara and Seroglazovo cultures of the Dnieper/Volga region in the Copper Age (early 4th millennium BC).

Figure: Scheme of Indo-European migrations from ca. 4000 to 1000 BC according to the Kurgan hypothesis. The purple area corresponds to the assumed Urheimat (Samara culture, Sredny Stog culture). The red area corresponds to the area which may have been settled by Indo-European-speaking peoples up to ca. 2500 BC; the orange area to 1000 BC.
(Chart and the above explanation are courtesy of Wikipedia.)

The Kurgans were thought to be nomadic pastoralists who, according to the model, had expanded throughout the Pontic-Caspian steppe and into Eastern Europe by the early 3rd millennium BC. Evidence for such a theory has come from studying cognates from many languages in a family.
The Kurgan theory finds support in cognate sets from modern Indo-European languages. For example, there are no cognates for words meaning tiger, camel, monkey · olive, palm, desert, rice · gold, silver, iron · ocean, ship, sea. This suggests that the original Indo-Europeans did not live in a warm climate, did not live near the sea, and did not work metals. Instead, we do find cognate sets for words meaning snow, freezing, cold, winter, summer, spring · oak, beech, birch, willow · bear, wolf, otter, beaver, deer, rabbit · horse, sheep, goat, pig, dog, herd, cow · wheel, axle, door, timber, thatch, yoke, oxen, wagon · seed, sow, weave, sew. The Indo-Europeans were originally an inland people. They lived in a temperate climate with seasons; they drank alcohol made of grain, not wine made of grapes, which suggests that this temperate climate was more cold than warm. They farmed, had herds of domesticated animals, and transported themselves in wagons drawn by oxen. (Notice that there are cognates for all the major parts of a wagon in modern Indo-European languages!) All this evidence suggests that the original homeland of the Indo-Europeans was an inland area between what is now northern Europe and southern Russia.

One of the best-received suggestions is that the original Indo-Europeans were the Kurgan mound-builders who lived northwest of the Caucasus Mountains and north of the Caspian Sea about 4000 B.C. This suggestion is attractive because: 1. Kurgan cultural artifacts and their geographic location fit the culture and environment suggested by Indo-European cognate sets, and 2. between about 4000 B.C. and 2000 B.C. the Kurgan people began a massive series of expansions into Europe and the Middle East. This is approximately the time that linguists believe the original PIE language separated into different branches in different geographical areas. (NOTE: When the Indo-Europeans spread across Europe, they were not moving into unoccupied territory. Archeological evidence shows that Europe was inhabited by humans long before 4000 B.C. The Indo-Europeans either pushed aside or absorbed the earlier people, causing their languages to become extinct or nearly so. Only a few isolated pockets of pre-Indo-European languages remain today. One is Basque, spoken in the Pyrenees Mountains on the border between France and Spain.) Archeological evidence also supports the Kurgan theory. Compare it to the information we can infer about the Indo-Europeans based on the linguistic evidence in cognate sets. The Kurgans:

  • domesticated horses and cattle and used them for meat, milk and transportation;
  • farmed and herded;
  • were a mobile people who used four-wheeled wagons to cart their belongings;
  • had a warrior nobility and a common laboring class;
  • worshipped a sky god associated with thunder;
  • built elaborate burial sites, suggesting a belief in life after death;
  • built fortified places on hilltops.

Though most evidence points to the Kurgan theory, it is important to remember that these are only hypotheses, and we may never know the real origins of the Indo-European people. One major point of controversy is that the Kurgan theory implies migration from Europe into Asia, while the Anatolian theory implies migration from Asia into Europe. The question of the Indo-European homeland became a great source of conflict during the 20th century, when the Nazi party appropriated theories of Indo-European origins to support their ideology of an expanding Aryan race.
(Sources: Department of Linguistics, University of Oregon, Wikipedia and Wordiq)

Sir William Jones (1746-1794)

Sir William Jones is best known for observing that Sanskrit resembled Greek and Latin. Throughout his lifetime, Jones learned 28 languages, often by teaching himself. In 1783, he traveled to Calcutta, India, to serve as a judge in the Supreme Court. There he learned the ancient language of Sanskrit in order to better prepare himself for his study of Hindu and Muslim law. He found numerous similarities among Sanskrit, Greek, and Latin and, knowing that these three languages were considered to be some of the earliest, he used the connections among them as the basis of his new proposition that they must all have one common root. The connections among Greek, Latin and Sanskrit were, according to Jones in his 1786 speech, "so strong indeed, that no philologer could examine them all three, without believing them to have sprung from some common source... [and] supposing that both the Gothic and the Celtic, though blended with a very different idiom, had the same origin." That common source, the same origin, was Proto-Indo-European. Although other scholars at that time were already aware that some languages were "relatives," it was Jones' observation that gained the most attention.

We can think of cognates as linguistic archeological remains. Cognates are “pairs/sets of words descended from a common ancestor” and not just words that look like each other (e.g. coffee, Kaffee, and café, which are an instance of the same word being borrowed by various languages). Cognates are historically related words.

The road to reconstructing Indo-European began in 1647, when a Dutch linguist and scholar, Marcus Zuerius van Boxhorn, proposed that similarities among what are now called the Indo-European languages pointed to a common origin. He supposed the existence of a primitive common language, which he called “Scythian.” In his hypothesis he included the following languages: Dutch, Greek, Latin, Persian, and German, later adding Slavic, Celtic, and Baltic languages.

Zuerius’ hypothesis re-appeared in 1786 when Sir William Jones lectured on similarities among the four oldest languages known in his time: Latin, Greek, Sanskrit, and Persian.

Sanskrit   Avestan   Greek    Latin    Gothic     English
pita                 pater    pater    fadar      father
padam                poda     pedem    fotu       foot
bhratar              phrater  frater   brothar    brother
bharami    barami    phero    fero     baira      bear
jivah      jivo               wiwos    qius       quick
sanah      hano      henee    senex    sinista    senile
virah      viro               wir      wair       were(wolf)
                     tris     tres     thri       three
                     deka     decem    taihun     ten
           satem     he-katon centum   hund(rath) hundred
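
The table rewards a closer look at how sounds line up, not just at how similar the words appear. As a minimal sketch, the snippet below takes four Latin and Gothic pairs copied from the table and tallies how their initial consonants correspond; the function name and the choice of rows are assumptions made for illustration, and real comparative work uses far more data and phonetic transcription rather than spelling.

```python
from collections import Counter

# (Latin, Gothic) forms copied from the cognate table above.
PAIRS = [
    ("pater", "fadar"),
    ("pedem", "fotu"),
    ("frater", "brothar"),
    ("fero", "baira"),
]


def initial_correspondences(pairs):
    """Count how often each pair of word-initial letters lines up."""
    return Counter((first[0], second[0]) for first, second in pairs)


if __name__ == "__main__":
    counts = initial_correspondences(PAIRS)
    for (latin, gothic), n in counts.most_common():
        print(f"Latin {latin}- : Gothic {gothic}-  ({n} examples)")
```

Even in this tiny sample, Latin p- consistently answers to Gothic f-, and Latin f- to Gothic b-. Recurring matches of this kind, rather than isolated look-alikes, are what point to a common ancestor; the p : f pattern is part of the set of correspondences known as Grimm's law.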

Then, between 1833 and 1852, the German linguist Franz Bopp supported Sir William Jones’ theory and produced his Comparative Grammar of Sanskrit, Zend, Greek, Latin, Lithuanian, Gothic and German, which marked the beginning of Indo-European studies as an academic discipline.

However, scholars still disagree about how these languages spread. Proponents of the Anatolian theory associate the spread of Indo-European languages with the Neolithic spread of farming, while the appeal of the Kurgan hypothesis lies in the fact that part of its proposed mode of spread, military conquest, coincides with historical reports.
Information courtesy of the following website: Cambridge Encyclopedia Vol. 36

The Comparative Method & Reconstruction

If Proto-Indo-European ceased to be spoken before it was ever written down, how do we know it was the ancestor of so many languages we speak today? We know it through the Comparative Method, a technique developed by 19th-century scholars in their attempts to reconstruct proto-languages by comparing related languages and examining their similarities. Using the Comparative Method, linguists work back from attested words and rules to reconstructed ones. There are two kinds of reconstruction: internal reconstruction (looking closely at a single language to determine its history and evolution), and comparative reconstruction (comparing two or more related languages using the Comparative Method); comparative reconstruction is used more frequently. What we know today about Proto-Indo-European we owe to the Comparative Method. It gave linguists a way to relate the presumed sounds of Proto-Indo-European to the sounds of the languages that are said to have derived from it. It is how we know there is a connection.

Certainly, just because words appear to be similar does not mean they are related. Some similarities could be coincidental; for example, in many languages the words for mother and father are "mama" and "papa," based on the first sounds most babies make. Also, as we know from English, many words are "borrowed" from other languages; for example, the English scientific word "arachnid" comes from the Greek word for spider, which is also the name of Arachne in the Greek myth that explained to the ancient Greeks how the spider came to exist. So, the comparatist must exclude these instances before assuming a genetic relationship through one common ancestor. Comparatists work by the theory of "one fact/one hypothesis": the one fact is that the languages present so many exact similarities that this cannot be a matter of "chance" or "borrowing." The one hypothesis, consequently, is that they must be descendants of a common ancestor.

Early in the 19th century, scholars started to examine similarities among languages spoken in their own day, and they were able to group them into what we call "Indo-European." As scholars compared the different modern languages, they found that these were continuations of a single early language that we call Indo-European or Proto-Indo-European. It is an incredible and fascinating fact that scholars reconstructed sounds and words of a language spoken before writing even existed. English is said to be the most widespread member of the Indo-European family, because it is spoken by over 300 million people and is one of the most important languages in use today.

12 Branches of the Indo-European family

The Celtic Branch

This is now the smallest branch.

The Germanic Branch

These languages originate from Old Norse and Saxon. Due to the influence of early Christian missionaries, the vast majority of the Celtic and Germanic languages use the Latin alphabet.

The Latin Branch

Also called the Italic or Romance Languages.
Latin is one of the most important classical languages. Its alphabet (derived from the Greek alphabet) is used by many languages of the world. Latin was long used by the scientific establishment and the Catholic Church as their means of communication.
Italian and Portuguese are the closest modern major languages to Latin.

The Slavic Branch

These languages are confined to Eastern Europe.

The Baltic Branch

There are three Baltic states but only two Baltic languages: Lithuanian and Latvian.

The Hellenic Branch

The only extant language in this branch is Modern Greek.

The Illyric Branch

Another single language branch. Only Albanian belongs to this branch.

The Anatolian Branch

All languages in this branch are extinct.

The Thracian Branch

This branch is represented by a single modern language, Armenian. It has its own script.

The Iranian Branch

These languages are descended from Ancient Persian, the literary language of the Persian Empire and one of the great classical languages.
The main language of this branch is Farsi (also called Iranian, Dari and Persian), the main language of Iran and much of Afghanistan. Kurdish is a close relation.

The Indic Branch

This branch has the most languages. Most are found in North India. They are derived from Sanskrit (the classical language of Hinduism, dating from 1000 BC).

The Tokharian Branch

Turfanian and Kuchean are recently identified extinct languages once spoken in northwest China.
Information courtesy of: Krysstal

Indo-European Language Family Trees

Dan Short: Centum Languages - we were granted permission to use this family tree on our wikispace.
This family tree offers a listing of Indo-European languages from western Europe - click on the website for even more comprehensive information about each language.


Dan Short: Satem Languages - we were granted permission to use this family tree as well on our wikispace.
This family tree offers a listing of Indo-European languages from Eastern Europe & Asia - click on the website for even more comprehensive information about each language.


Schleicher's Fable

In 1868, August Schleicher, a German linguist, became the first to write a text in the reconstructed Proto Indo-European language. The text is a fable with all the necessary components: a short narrative that makes a moral point and whose main characters are animals.

  • Proto Indo-European text: Avis, akvasas ka. Avis, jasmin varna na a ast, dadarka akvams, tam, vagham garum vaghantam, tam, bharam magham, tam, manum aku bharantam. Avis akvabhjams a vavakat: kard aghnutai mai vidanti manum akvams agantam. Akvasas a vavakant: krudhi avai, kard aghnutai vividvant-svas: manus patis varnam avisams karnauti svabhjam gharmam vastram avibhjams ka varna na asti. Tat kukruvants avis agram a bhugat.

  • Modern English literary translation: The Sheep and the Horses. A sheep that had no wool saw horses, one pulling a heavy wagon, one carrying a big load, and one carrying a man quickly. The sheep said to the horses: "My heart pains me, seeing a man driving horses." The horses said: "Listen, sheep, our hearts pain us when we see this: a man, the master, makes the wool of the sheep into a warm garment for himself. And the sheep has no wool." Having heard this, the sheep fled into the plain.

Schleicher's text was largely based on Sanskrit, but aside from the language, it is also important to note the subject matter of the tale. The action verbs are "saw," "pulling," "carrying," "driving," "makes," "heard," and "fled." To an early herding and farming society, these actions illustrate a typical day. Their way of life depended on physical labor and use of the senses (sight, hearing, touch). Furthermore, the nouns are "sheep," "wool," "wagon," "load," "man," and "garment," all of which may have been real objects of importance to the original PIE-speaking people. Schleicher was therefore not only illustrating the construction of the language but also doing so with the kind of vocabulary a native speaker might have used. One may even conclude that the moral of the fable is a reference to man's influence over language and its many changes.

Although Schleicher was the first to compose a text in PIE, various scholars later published revised versions to reflect how the reconstruction of PIE changed over time. Those revisions were by Hermann Hirt in 1939, Winfred Lehmann and Ladislav Zgusta in 1979, Douglas Adams in 1997, and most recently in 2005 by Frederik Kortlandt.

Synthetic Languages

A synthetic language is a language that uses inflectional forms, such as affixes, as a primary means of indicating the grammatical function of its words. Synthetic languages are also referred to as inflected languages. An example of a synthetic language is Latin.

The opposite of a synthetic language is an analytic language, also known as an isolating language, in which the word forms are mostly or totally fixed, and grammatical functions are indicated through the use of helper words and word order. Chinese is an example of an analytic language.

Inflection in Indo-European Languages

Dr. Kelley Ross writes that “A conspicuous feature of Indo-European grammar is the original extensive inflection of nouns and verbs.” Ross further explains that “All these languages actively inflect nouns and adjectives for case, gender, and number, except English, where there is only a remnant of the system, mainly in the pronouns.”

Dr. Ross uses the chart featured below to illustrate various cases that occur in the inflection of nouns in a selection of Indo-European languages.

To help explain the function of these inflections, Dr. Ross notes that:

  • The vocative case (Voc) occurs when someone is being addressed.
  • The nominative case (Nom) is the subject of a sentence.
  • The genitive case (Gen) denotes possession, "of" or "from."
  • The accusative case (Acc) is the direct object of a sentence, or expresses motion towards something.
  • The dative case (Dat) is the indirect object; can use "to" or "for."
  • The ablative case (Abl) marks nouns involved in a motion away from something; sometimes referred to as the adverbial case. Latin has 15 documented types of ablative.
  • The instrumental case (Ins) is the agent for the passive voice or the means.
  • The locative case (Loc) means "at" or the location of something.
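
To see how a synthetic language packs these functions into word endings, here is a minimal sketch that declines a Latin noun in the singular. It is illustrative only: the function name is hypothetical, it assumes a regular second-declension masculine noun ending in -us (such as lupus, "wolf"), vowel length marks are omitted, and only the six cases of classical Latin are shown. Nouns in -ius and nouns of other declensions that happen to end in -us would need different rules.

```python
# Singular endings for regular Latin second-declension nouns in -us.
SINGULAR_ENDINGS = {
    "Nom": "us",  # subject
    "Voc": "e",   # direct address
    "Acc": "um",  # direct object
    "Gen": "i",   # possession, "of"
    "Dat": "o",   # indirect object, "to" or "for"
    "Abl": "o",   # "from", "by", "with"
}


def decline_singular(noun):
    """Return the singular case forms of a regular noun in -us."""
    if not noun.endswith("us"):
        raise ValueError("this sketch only handles nouns ending in -us")
    stem = noun[:-2]
    return {case: stem + ending for case, ending in SINGULAR_ENDINGS.items()}


if __name__ == "__main__":
    for case, form in decline_singular("lupus").items():
        print(f"{case}: {form}")
```

Because the ending carries the grammatical role, Latin word order can vary freely, whereas English relies on word order and helper words such as "of," "to," and "from" for the same job.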


The Nostratic Hypothesis

Nostratic is a controversial hypothesis proposing a single family of languages that links many of the indigenous language families of Asia, Europe, North America, and Africa. When translated, the term "Nostratic" means "our language". It is a broad term used to group together a variety of language families that are claimed to be related.
Holger Pedersen, a Danish linguist, built upon the then-current idea that all languages descended from one central language and first proposed the notion of "Nostratic" as the basis for the Indo-European, Finno-Ugric, Samoyed, Turkish, Mongolian, Manchu, Yukaghir, Eskimo, Semitic, and Hamitic languages. He believed this "Nostratic" hypothesis could potentially encompass a variety of other languages as well.
Nostratic theory relies heavily upon the comparative method, that is, the process by which sound-and-meaning correlates as well as grammar correlates are matched among language families. Though some linguists in Europe and Russia have expressed support for the hypothesis, many American linguists have challenged its claims, arguing that the data collected to support the hypothesis is flawed.

Many linguists have long been fascinated by the "fist-five" dilemma: the clear similarities among the English words "fist," "finger," and "five." The same pattern can be seen in Dutch ("vuist," "vinger," and "vijf") and German ("Faust," "Finger," and "fünf"). According to linguists, many years ago, before these different languages split from their original "mother tongue," there was a clear correlation among these three words. The linguist Dr. Manaster Ramer contends that in order to find the root of this similarity, one must look beyond PIE and examine other language groups, for example Uralic and Altaic. Uralic includes Finnish, Estonian, and Hungarian, while Altaic includes the Turkish and Mongolian languages. These two language families, along with PIE, seem to have roots going back to the lost language of Nostratic, which is said to have been spoken over 12,000 years ago. Many linguists are skeptical of the existence of Nostratic, due to a lack of evidence. The 1998 book Nostratic: Sifting the Evidence, published by John Benjamins, examines the issue and aims to provide a balanced view of the different sides of the debate.

This diagram shows the breakdown of the supposed Nostratic language.

Evolution of the Celtic, Italic, and Hellenic Branches

In the tables that follow, columns show 500/1000-year ranges, reading left to right; successive rows display groupings of sub-families (in bold face), languages within them (italicized if dead), and, reading left to right, not just a chronological but an evolutionary sequence.

Proto-Celtic speakers moved generally west from the PIE homeland, probably alongside groups from the Italic branch, spreading across southern Europe into central Turkey, northern Italy, France, Spain, and eventually the British Isles. As centuries passed, their language evolved into one group of languages labeled Continental (spoken by "Gauls" across southern Europe and mentioned by Julius Caesar among others), and another labeled Insular (spoken in the British Isles). Continental Celts later adopted Latin, or Greek in the case of those in Turkey, and the Continental Celtic languages, attested from the 6th century B.C., were lost. Insular Celtic split into a Goidelic subgroup that developed in Ireland, and a Brythonic subgroup that developed in England & Wales. Later in history, Goidelic Celts migrated to Scotland; also later in history, Brythonic Celts under pressure from the Anglo-Saxons returned to the Continent and settled in Brittany, on the western point of France.



Table (Celtic branch; recovered column headings: 500-1 BC, 1-500 AD). Languages listed by row:
  • Ogham Irish, Old Irish, Middle Irish, Irish Gaelic, Scots Gaelic
  • Old Welsh, Middle Welsh
  • Old Cornish, Middle Cornish
  • Old Breton, Middle Breton


The Italic peoples began their descent into the Italian peninsula around the 2nd millennium B.C. Two subgroups developed from Proto-Italic: Sabellic and Latino-Faliscan, both attested by 7th century B.C. inscriptions (the former in Umbrian, the latter in Faliscan). But the growing strength of the Latin speakers, culminating in the Roman Empire, resulted in most competing tongues in Italy (and many elsewhere, for example Continental Celtic) being extinguished. With the collapse of the Empire, the provincial Vulgar Latin dialects rather than Classical Latin survived, and in time developed into the Romance languages.



Table (Italic branch; recovered column headings: 500-1 BC, 1-500 AD). Languages listed:
  • Classical Latin
  • Old Italian
  • Old French
  • Old Provençal
  • Old Spanish
  • Old Portuguese


For all practical purposes, the Hellenic family is represented by a single language spoken in Greece and the Aegean Islands: Greek, which is attested in a number of dialects spanning more than three millennia. The oldest, Mycenaean Greek texts pre-date the 14th century B.C. and were written in the script known as Linear B. But an invasion of possibly illiterate Dorian tribes circa 1100 B.C. was followed by the collapse of Mycenaean civilization and the loss of the art of Greek writing. A few hundred years later the Greeks adapted a Phoenician script, adding, for the first time, letters representing vowels. This script developed into what we know as the Greek alphabet, which formed the early basis of the Etruscan & Roman alphabets among others (a more modern example being Cyrillic).




Table (Hellenic branch; recovered column headings: 500-1 BC, 1-500 AD). Languages listed:
  • Ancient Greek
  • Attic Greek
  • Koine Greek
  • Middle Greek
  • Homeric Greek
  • Doric Greek

Source: University of Texas: Linguistic Research Center

Other Proto Languages

Besides Proto Indo-European, other reconstructed proto-languages include Proto-Uralic and Proto-Dravidian.


Proto-Uralic is thought to have been spoken between 7000 and 5000 B.C.E. in the region of the Ural Mountains. This proto-language developed into Proto-Finno-Ugric and Proto-Samoyedic, as well as Finno-Permic (to the west). Interestingly, transcriptions of Proto-Uralic are conventionally written not in the IPA but in the UPA (Uralic Phonetic Alphabet). The reconstructed language has no diphthongs or long vowels, and no initial or final consonant clusters. Six noun cases are reconstructed, along with three grammatical numbers (one, two, many). There are no articles and no noun genders. Despite all the research that has been conducted, only about 200 words of Proto-Uralic have been reconstructed.
Wikipedia: Proto-Uralic Language

The Dravidian languages of India, which descend from Proto-Dravidian, comprise twenty-one languages that can be classified into three groups: North, Central, and South. The Northern Group includes Brahui, Malto, and Kudukh. The Central Group includes Gondi, Konda, Kui, Manda, Parji, Gadaba, Kolami, Pengo, Naiki, Kuvi, and Telugu. Of these, only Telugu became a literary language when it split from the proto-language between 1500 and 1000 B.C.E. The Southern Group includes Tulu, Kannada, Kodagu, Toda, Kota, Malayalam, and Tamil. Not surprisingly, the groupings have been made because the languages share significant linguistic features. The features include five short vowels, their five long counterparts, and sixteen consonant sounds. Typically, sentence syntax appears as

Proto-Dravidian gave rise to over 21 different languages. They are classified into:
1. The Northern Group:
  • consists of three languages

2. The Southern Group:
  • consists of languages such as Kannada, Tamil, Malayalam, and Tulu.

3. The Central Group:
  • consists of ten languages
  • only one of these ten became a literary language. The rest remained tribal languages.

For further reference, see:

The Comparative Method and IE Languages
The Early History of Indo-European Languages
Britannica: Sir William Jones
Wikipedia: Sir William Jones
Dngu.Org Schleicher's Fable

NYTimes Article
Was Nostratic A Real Language? - Newswise Article