South Cushitic is one of the branches of Cushitic whose unity has been repeatedly debated. Last fall I took an initial stance in favor of continuing to recognize at least the wider Rift family, and argued that the Rift-like features in Ma’a and Dahalo could very well be their real genealogical stratum, as opposed to some contact influence from East Cushitic (and not the inverse). There might however also be a different problem to tackle: is “South Cushitic” really a member of Cushitic? After all, Cushitic is itself not a primary language family but, most likely, only a branch of the wider Afrasian / Afro-Asiatic, so even finding some parallels with Cushitic that would point to relationship does not necessarily have to mean membership in it. We also have a very similar research-historical parallel: the difficulties in demonstrating the membership of “West Cushitic” were initially resolved by promoting it into “Omotic”, an independent Afrasian subfamily, and concluding that its similarities with Cushitic mainly result from contact. It also helps that the remainder of Cushitic is itself widespread / diverse enough that it has been easy to notice that several “Cushomotic” features are not well-distributed in the subfamily, not even in the closest-by East Cushitic, but rather concentrate in the third-order subgroups in closest contact with Omotic (mostly in Highland East Cushitic = HEC and in Oromoid).
Basically the same procedure has brought me to today’s idea. For the last year I’ve been going on and off thru the scattered literature on Cushitic reconstruction, starting from the large East Cushitic. Its reconstruction efforts have naturally relied more on the bigger and more accessible languages, which were documented in some detail earlier than others, e.g. Saho–Afar, Somali, Oromo, Konso, Sidamo, Hadiyya. A stroll through newer data, e.g. on the smaller group of Arboroid languages in the far southwest (Arbore, El Molo, Daasanach), comparing their data with already extant PEC reconstructions as you go, nonetheless still reveals unreported but obvious cognates by the dozens. This is about what I would expect: good distribution (≈ early reconstructibility) within a given subset of related languages predicts also good distribution in relatives outside of it, especially relatives that descend from the same proto-language, but often also in external relatives. Brief initial ventures, though I don’t have systematic data about this yet, suggest that this pattern would continue even into the more distantly related Agaw languages and Beja. So it seems to me Proto-East-Cushitic was at least close enough to Proto-Cushitic that it had not radically changed all of its core lexicon, and will thus be its best first-order approximation.
(I would hope to show this methodological point also with other data eventually; e.g. within the Finnic or Samic subfamilies, the distribution of Uralic vocabulary seems to me to be on average better than that of vocabulary from external sources or of unknown origin. But this still requires detailed statistical analysis.)
Only some of the third-order subfamilies like HEC then seem to have, probably due to contact effects, lexically diverged yet more radically. Still, there are enough East Cushitic subgroups (8 or 9 altogether [1]) that most of the old native vocabulary would be expected to have been preserved in at least a few of them, and so to remain reconstructible. This is the hypothesis I initially took as my plan for approaching South Cushitic too. Going thru e.g. the entire known Dahalo lexicon, or all the ~2500 existing Proto-West-Rift (PWR) reconstructions, and comparing them with dozens of other Cushitic languages all on equal ground, would be lots of work and would risk false positives. Integrating these into general Cushitic reconstruction should be easier by first taking the 200–400 well-established East Cushitic reconstructions and following thru to see what cognates we will have for them in the proposed SCu. languages, especially since there already exists literature arguing that SCu. might be merely an additional East Cushitic subgroup or a couple of them. An even more specific hypothesis I had sketched (and others have supported too) was indeed that perhaps they don’t even diverge as their own branch entirely, but rather might be derived by heavy relexification from, or close to, some known East Cushitic subgroup. A couple of curious cases also exist where SCu. seems to side with Arboroid or Yaaku, two branches right where we want them in the southwest of the ECu. area, in showing some sort of innovative / divergent basic vocabulary. A few examples from Václav Blažek’s 1997 “Cushitic Lexicostatistics: The Second Attempt”:
- ‘bird’: HEC *čʼiiɗa (? < *čʼidʔ-a) ~ PWR *tsʼiraʕa ≠ common EC *kimbir-
- ‘to eat’: Yaaku ɛk- ~ Dahalo ʕag- ~ PWR *ʕag- ≠ common EC *ʕun-; though there is *ʕVg- (HEC *ag-, others *ʕu/ig-) instead for ‘to drink’
- ‘fat’, ‘to be fat’: Yaaku deeʔ-ɛuʔ a. ~ Dahalo ɗeʕʕ-em- v. ~ PWR *duʕ-iya n. ≠ common EC *ħayɗ- a., *kils- n., *-kli/us- v.
- ‘fire’: Yaaku iku ~ Dahalo ʔééga ~ Arboroid *ʔeeg-; ≠ PWR *ʔaɬa; no common EC term. [2] Also, an ablaut relationship with ‘eat’ seems conceivable, in which case this would be probably downstream of the ‘drink’ > ‘eat’ shift (…but would also require the Dahalo to be a loan from some different branch to explain *ʕ > ʔ).
Thus, after finally getting my list of the most solid PEC vocabulary into a tentatively workable shape (though much lit. review would also remain to be done), I have simply gone looking up their West Rift equivalents. In work extending existing language families, I find this approach, the “streetlight method” as I call it (a fortified version of the streetlight effect), a preferable start to following something entirely pre-given like the Swadesh list. If we already know there is no good reconstructed PEC candidate for e.g. ‘green’, and also know that there is a good one for ‘elephant’, then not bothering to look up ‘green’ in any additional relatives we’re surveying will save work, while checking their words for ‘elephant’ will have good odds of finding additional cognates that a Swadesh list method would miss. There is only a certain size bracket of language families this works well with, though. Again, the 2.5k Proto-West Rift reconstructions would be too much to use as a basis for this, while something like Proto-Afrasian with only a dozen or two widely accepted proto-etyma would be too little. But since Proto-East Cushitic comes out with a well-suited count of a couple hundred strong reconstructions, let’s run with those.
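The lookup procedure described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming two hypothetical dictionaries keyed by English gloss; the handful of forms are ones cited elsewhere in this post, and the name `streetlight_candidates` is my own label, not an established tool. It only pairs up forms with the same gloss for manual inspection and makes no cognacy judgment by itself.

```python
# Hypothetical mini-datasets keyed by gloss; the forms are those cited in this post.
PEC = {
    "hole": "*bohl-",
    "mouth": "*ʔaf-",
    "eye": "*ʔil-",
    "hyena": "*waraab-",
    "spear": "*warħan-",
    "bird": "*kimbir-",
}

PWR = {
    "hole": "*bohooŋʷ",
    "bird": "*tsʼiraʕa",
    "fire": "*ʔaɬa",      # no common EC term, so never looked up
    "stone": "*tɬʼaaʕa",  # likewise outside the "streetlight"
}


def streetlight_candidates(source, target):
    """Pair each well-established source reconstruction with the target
    form of the same gloss, if one is recorded.

    Returns (pairs, skipped): pairs are (gloss, source_form, target_form)
    tuples to be inspected by hand; skipped are source glosses for which
    the target lexicon has no entry.
    """
    pairs, skipped = [], []
    for gloss, source_form in source.items():
        if gloss in target:
            pairs.append((gloss, source_form, target[gloss]))
        else:
            skipped.append(gloss)
    return pairs, skipped


pairs, skipped = streetlight_candidates(PEC, PWR)
for gloss, pec_form, pwr_form in pairs:
    print(f"{gloss}: PEC {pec_form} ~ PWR {pwr_form}")
```

Note that the ‘bird’ pair comes out as a candidate even though the two forms are plainly unrelated, so every hit still needs a human decision. A same-gloss lookup like this will also miss cognates that have shifted in meaning, such as the ‘drink’ / ‘eat’ case above.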
The initial results turned out baffling. Few of the solid PEC reconstructions seem to have equivalents in PWR (be it as descendants or cognates); and a decent number of what turns up is hardly core vocabulary, instead things like PEC *bohl- ~ PWR *bohooŋʷ ‘hole’, PEC *fuučʼ- ~ Iraqw fuutɬʼ- ‘to whistle’. [3] The haul does not give the appearance of a set of cognates as much as an oldish set of loanwords. This I guess accords also with one stance by Kießling & Mous (K&M) in their book on Proto-West Rift, in their discussion of external connections: that a few clearly related and clearly widespread-in-Cushitic terms like *ʔaf- ‘mouth’, *ʔil- ‘eye’ look “too similar” and could be suspected of also being loans rather than real cognates. I would find (did find) this claim ill-founded if (West) Rift were simply a sister or niece branch to East Cushitic, even more so if it were a slightly disguised daughter: there is no rule that real cognates must look obscure, and many simple root structures can easily persist for millennia without major change (they do just that within ECu. itself). Dahalo then turns up somewhat more reflexes of PEC roots, but it seems some of these have to be recent loans already for phonological reasons. Old word-initial *w-, for example, is normally reflected in Dahalo as v- (in cases like vatɬʼ- ‘to return’ ~ PWR *watɬʼ- ‘to return home’), and yet for PEC *waraab- ‘hyena’, *warħan- ‘spear’ we find wáraaba, waraħa (probably from some Somaloid variety, e.g. Rendille has waraba, warħan); this already covers two of just four words documented by Tosco as beginning with w-. [4]
Is there not a “South Cushitic”?
What if this is simply an Omotic-like situation all over again? That is, we are rather dealing with a family distinct from Cushitic, which has however been its neighbor for millennia; to establish some distance, let’s call it formulaically “Rift–Dahalo” (RD) for now. I would think Proto-RD, irrespective of whether it is Cushitic or not, was spoken in the ballpark of 4000–5000 years ago somewhere in the vicinity of Lake Turkana, from where the Rift branch spread southwards along the titular Rift Valley, while pre-Dahalo went instead east, over the valley’s escarpments and then roughly downstream along either the Tana, or maybe already through the deserts of today’s northern Kenya and its intermittent rivers draining into the Jubba (cf. map at Wikipedia); this same approximate sort of route was probably also followed later on by pre-Somali (ca. 2000–3000 years ago) and still later by southern Oromo varieties (< 1000 years ago). The dispersal of RD might well have been also itself driven by the (gradual?) arrival or expansion of some East Cushitic group(s) in its former homeland. The current geographic gap between the Rift languages in central Tanzania and East Cushitic in Ethiopia + Northern Kenya then most likely just results from their intermediate contact zone having been run over by Nilotic “recently” (ca. 1000 years ago), but Cushitic material in RD & RD material in e.g. Arboroid would still testify to their former status as neighbors. Still, perhaps not as any kind of close relatives! This is why we would find parallels mainly in somewhat marginal vocabulary, or with somewhat marginal East Cushitic distribution. Even *ʔaf- ‘mouth’ could be easily chalked up as a Wanderwort, since it has after all also gotten into all three Omotic families and Ethiopian Semitic. (This was, I suspect, likely driven by its use also for ‘language’.)
The idea that RD could be its own Afroasiatic branch actually does not even appear to be new: overview literature refers to this having been suggested in passing already in a 1983 paper by Fleming, which I have not yet seen. However, it is difficult to find either any follow-up on this idea or any clear critiques of it, and the idea seems to have been left basically ignored for four decades now.
As outlined when I first aired my suspicions on Twxttxr a few weeks back, this hypothesis would clearly be testable in various ways. If Proto-RD still comes from the milieu of modern Ethiopia, as is corroborated also by other lines of evidence like genetics and archeology, and with suggestions even in Iraqw oral history, we would probably expect contacts not just with rump Cushitic but also (the various components of) Omotic. My currently also ongoing collection of Aroid etymologies should thus be a good dataset to compare with as well. If RD were still a part of Cushitic, we’d definitely want to find more (e.g.) Rift–ECu. etymologies than Rift–Aroid etymologies; a few of the latter are already noted by K&M too. Also the other, definitely non-Eastern parts of Cushitic would make good control groups, now in the other direction. If those few RD ~ PEC comparisons that cannot be critiqued on any distributional grounds, the couple things like *ʔil- ‘eye’, were an older inherited Proto-Cushitic layer, then we should find signs of this layer also in Agaw and Beja, even some stuff that had been mostly or entirely lost in ECu. while still surfacing in SCu. Ehret’s first-pass Proto-Cushitic reconstruction provides in fact many contenders for this already. These would have to be critically reviewed though, already due to the many uncertainties of his SCu. reconstruction or his freestyling attitude to semantics. To pick an example at random, he compares Beja lekik ‘to lose, misplace’ with Iraqw ɬaq- ‘to fail, be unable’ and nothing in-between; in K&M’s reconstruction however this shows up only as a medial derivative *ɬaaqʼat- ‘to be tired, unable’. This suggests ‘to fail’, if this is a real meaning, doesn’t come from the direction of Ehret’s proposed PCu. meaning ‘to miss’, but instead from ‘to be too tired to manage’… and overall I would hesitate to take this as a reliable etymology. The idea I’ve seen recently that RD *ɬ doesn’t correspond to ECu. *l but *š could also point to comparing instead HEC *šakkʼ- ‘to tire, become tired’ (plausibly a similar medial derivative as in Rift: < *šakʼ-ɗ- < *šakʼ-at-), which would now again be a comparison within geographic reach of being due to borrowing rather than inheritance.
There remains to my eyes good evidence that the RD family is still a member of Afroasiatic in general though (as far as anything is), in their general grammar and, I suppose also, typology; e.g. distinctive markers in phonology include the ejectives, pharyngeals and at least traces of ablaut. Ejectives may count as just some sort of an old East African areal feature (also found in all of Omotic and Koman–Gumuz), but the other two have been for long taken as at least probable AA markers (even their lack has been used for arguing against Omotic being in AA…). Perhaps also some of the lexical evidence that has been presented in Proto-AA reconstructions, at times drawn only from RD but not (the rest of?) Cushitic, is worth something. A couple random examples of this from Ehret that do not look immediately terrible:
- Egyptian ibw ‘refuge, shelter’ ~ PSCu. *yab- (Dahalo jab- ‘to save’ + PWR *yab- ‘to fence in’; Ehret also tries to include Kʼwadza yabo ‘bow’ by some just-so-story);
- Eg. wħʕ ‘to investigate’ ~ PSCu. *waħ- (Da. vaħ- ‘to see’ + uncertain-looking Ma’a data; Blažek adds also Yaaku -wax- ‘to see’) ~ Chadic *w- ‘to see’;
- Eg. rḫs ‘to slaughter’ ~ PSCu. *raaxʷ- (PWR *daaxʷ- ‘to draw blood from cattle’ + Ma’a ráo ‘arrow for drawing blood from cattle’);
- Coptic [which variety?] hmom ‘to be hot’ ~ Arabic √ħmm ‘to be warm, to bathe, etc.’ ~ PSCu. *ħam- (Da. ħanṯið- < *ħam-t-is- ‘to warm oneself’ + ? PWR *haam- ‘to be hot’ with irregular *h-).
It’s also interesting to notice that many of these have parallels in Egyptian in particular, and this trend seems to continue further too. Now, the subgrouping of Afroasiatic is anything but settled, and Cushitic gets placed sometimes closer to Semitic and Berber, sometimes closer to Chadic (which is itself often placed close to Egyptian) … in theory this instability, too, could be due to Cushitic being really two branches, such that RD is the older Ethiopian branch that groups with Egyptian and/or Chadic, while mainline Cushitic could have arrived from Arabia along the same path as Ethiopian Semitic later did, thus grouping with Semitic? If we remove RD from consideration, the mainline Cushitic maximum of diversity is clearly in Eritrea (between Beja, Bilin and Saho), which could suggest a homeland along the Red Sea coast; perhaps even the Arabian coast rather than the African; with Agaw and East Cushitic later expanding south and inland independently, and at least the latter running there into Rift–Dahalo.
Grammatical-typological notes
Moving briefly from lexicon to grammar, a general weakness is that no such thing as Proto-Cushitic or even Proto-East Cushitic grammar has been reconstructed, and comparative work remains at comparing the various shallow third-degree branches among each other. At least one discussion of the grammatical evidence for Cushitic subclassification, Kießling (2001), [5] interestingly underlines thruout that many ECu. and SCu. grammatical parallels are either definitely or at least possibly already PAA archaisms. This angle would probably require yet more serious consideration. He also notes that some other parallels, such as a handful of plural suffixes (which are remarkably numerous across Cushitic), are concentrated in Omo–Tana; i.e. in Arboroid + Somaloid, i.e. the two language groups that, before the Oromo expansion, seem to have formed the southern fringe of East Cushitic… are your language contact senses not tingling yet? In another discussion, Tosco (2000), [6] Rift and Dahalo are assigned as independent entities under East Cushitic, but not much is given to motivate its own unity (some scarcer parallels crisscrossing its members clearly demand at least part of them to be arealisms). Also, for the overall unity of Cushitic, he refers mainly to the suffix conjugation. But then how sure are we to treat it as an innovation rather than archaism, and if still that, as a localized rather than areal innovation? I must remind as a warning that all of Omotic, too — if any of it is really Afrasian — is suffix-conjugating; thus suggesting also either 1) a retention, pointing to an Ethiopian origin of Afrasian in general; 2) the ability of suffix-conjugation to develop in multiple lineages in parallel, thus maybe also independently in RD; or maybe 3) returning to some sort of a “Macro-Cushitic” theory, where we merely demote Omotic to the first offshoot of Cushitic overall.
My sympathies are on the 2nd of these options. My experience with various close dives to the dialectology of well-documented modern languages has taught me above all that syntax and “pure” structural morphology spread very freely areally among close relatives or tight language areals, and are close to worthless for family subclassification; phonology is at times better but, especially in its major outlines, also often weak to arealisms; small details of lexicology and morphophonology, fully arbitrary single features free from roaming systemic pressures, provide the real gold standard results. Though since innovations of these sorts will be in turn vulnerable to secondary losses, some of them also to accidental similarities, even here only statistically compounding evidence can be, well, statistically significant. It will be very rare to find any “silver bullet” features, sharply and deeply cutting enough that they could cleanly establish a subgroup all by themselves, not just initially but for all time. All this might sound slightly backwards to people who have been taught repeatedly only about the value of paradigmatic morphology for language classification — but there is no contradiction here; we are simply talking about different meanings of “classification”! Paradigmatic morphology is standard proof for the existence of relationships between languages, but is not much evidence about their closeness. Pick twenty modern and ten older Indo-European languages between several branches, then try to classify them just by their nominal case systems; would you ever be able to find that Modern English, with its zero cases, has any particular relationship even with just Old English, let alone with e.g. any stage of German? 
Or consider the recent case of an alleged “Rung” subgroup of Sino-Tibetan, a concept set up for various languages clearly sharing much paradigmatic morphology, but which failed to take heed of widespread later drift across ST towards isolating grammar (i.e. loss of morphology), thus leaving the system to be more likely a shared archaism from somewhere close to Proto-ST than an exclusively shared innovation. Its disproof, too, has come precisely from what I call above “small arbitrary features”.
One more argument for my idea for an independent RD family is itself somewhat typological but is hopefully not particularly weak to arealisms. Contact-induced relexification is possible, not even particularly extraordinary; in Cushitic we already have Highland East Cushitic as a fairly clear example, although that too seems to be less thorough than what would have to be assumed under a hypothesis of Rift-as-East-Cushitic. But relexification is a process we expect to leave some sort of a phonological signal, if our starting language family and its new linguistic neighborhood are themselves phonotypologically distinct to begin with. In Afroasiatic, we do have unique markers that will serve for this! The most conspicuous are the pharyngeal consonants *ħ, *ʕ (or perhaps really epiglottals *[ʜ], *[ʡ]; beside the point though), highly rare worldwide but regardless maintained in e.g. Arabic, Hebrew, Somali, Dullay, Dahalo and Rift for not just a couple thousand years since their recent branchings in their immediate family tree, but probably for the whole whopping 10,000+ years since Proto-Afrasian. I believe they will thus make particularly good examples of what I call “inheritance phonemes”. And yes, e.g. corresponding to ? Proto-RD *ʕag- ‘to eat’, PEC *ʕVg- ‘to drink’ noted above, we can indeed find confirmation for deep native origin also in Arabic √ʕǧm ‘to chew, bite’ (even if evidence elsewhere seems lacking and perhaps this is ultimately not quite PAA, but something like a loan between pre-Cushitic and pre-Semitic). If we assume relexification in HEC, or even if we went back to assuming any given bit of Omotic to be still close to Cushitic and just massively relexified, we expect this process to probably eliminate or marginalize pharyngeals. Disappear they do, actually soundlawfully entirely; the one trace of former pharyngeals in HEC is probably word-medial *-ʔ-, which reflects all three of PEC *-ʔ-, *-ʕ-, *-ħ- [7] and is mostly found in native Cushitic roots. 
The novel, non-PEC vocabulary of HEC is instead full of very generic-looking roots like *bululo ‘ash’, *buudaa ‘horn’, *gooba ‘neck’, *ibibe ‘louse’, *kin- ‘stone’, that you might run into in practically any language family in the world. Their respective PEC equivalents *darʕ-, *gaas-, *lukʼm-, *ʔingir-, *ɗagħ- meanwhile already demonstrate one of each pharyngeal among them (duly preserved in e.g. Tsʼamakko darʕo, East Dullay tarʕo, Somali ɖagaħ).
The Rift languages & Dahalo do not work like this! They not only preserve pharyngeals putatively inherited from Proto-Cushitic, but also seem to retain them as “productive” in their own vocabulary. Running into some is indeed so easy that, for demonstration, let’s look up the exact same Swadesh-100 concepts again in these, even though I just now picked them only from my HEC lexical database without looking at wider Cushitic:
- ‘Ash’: Da. ʔíívu ~ ʔííbu, PWR *ʔura, *daʕaraa (the latter of course akin to PEC *darʕ-)
- ‘Horn’: Da. tumpi, PWR *xadaaŋʷ
- ‘Neck’: Da. ɗááʕeero, PWR *ʔisa
- ‘Louse’: Da. ʔítta, PWR *ʔitinoo (mutual cognates)
- ‘Stone’: Da. máve (← Swahili ma-we pl.), PWR *tɬʼaaʕa
Thus, whatever is the pre-Cushitic source of all the divergent South Cushitic vocabulary, clearly it itself already had an Afroasiatic look to it, as we can see in examples like Dahalo ɗááʕeero ‘neck’ (or also e.g. ʕani ‘head’ ≠ PEC *matħ-), West Rift *tɬʼaaʕa ‘rock’ [8] (or also e.g. *tsʼigaħa ‘four’ ≠ PEC *ʔafar-).
(No clear cognates for these in Beja or Agaw either. Awngi yíntí ‘louse’, ≠ South Agaw *bɨtt-, is noted by Blažek to have some similarity to RD *ʔit(t)-, but then this would to my eyes come also about as close to PEC *ʔingir-. The one definite Proto-Cushitic item for this mini-list is rather Agaw *kɨrm- ~ Konsoid *kʰolm- ~ ? Beja kokelem for ‘neck’, from which we find also some divergent reflexes like Somali kolon ‘dewlap’, Kʼwadza kolima ‘nape’.)
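The contrast in this five-concept mini-list can also be counted mechanically. The snippet below is only a rough sketch: it assumes that a form “has a pharyngeal” whenever its transcription contains ħ or ʕ, takes the first variant where the list gives two, and keeps the Swahili loan máve in the Dahalo column for completeness. The names `PHARYNGEALS`, `FORMS` and `pharyngeal_counts` are my own labels for illustration. Five concepts are of course far too few for any statistical claim.

```python
# Pharyngeal symbols only; the glottal stop ʔ is deliberately excluded.
PHARYNGEALS = {"ħ", "ʕ"}

# Forms for 'ash', 'horn', 'neck', 'louse', 'stone' as cited in this post.
FORMS = {
    "HEC (novel)": ["*bululo", "*buudaa", "*gooba", "*ibibe", "*kin-"],
    "PEC": ["*darʕ-", "*gaas-", "*lukʼm-", "*ʔingir-", "*ɗagħ-"],
    "Dahalo": ["ʔíívu", "tumpi", "ɗááʕeero", "ʔítta", "máve"],
    # PWR has two 'ash' terms, hence six forms for five concepts.
    "PWR": ["*ʔura", "*daʕaraa", "*xadaaŋʷ", "*ʔisa", "*ʔitinoo", "*tɬʼaaʕa"],
}


def has_pharyngeal(form):
    """True if the transcription contains ħ or ʕ anywhere."""
    return any(ch in PHARYNGEALS for ch in form)


def pharyngeal_counts(forms_by_group):
    """Map each group name to the number of its forms with a pharyngeal."""
    return {
        group: sum(has_pharyngeal(form) for form in forms)
        for group, forms in forms_by_group.items()
    }


counts = pharyngeal_counts(FORMS)
for group, n in counts.items():
    print(f"{group}: {n} of {len(FORMS[group])} forms contain a pharyngeal")
```

A character-level check like this is crude (it knows nothing about position in the word or about loan status), but it is enough to make the point of the list explicit: the novel HEC roots contain no pharyngeals at all, while the RD forms do.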
My proposal of the independence of RD can be thus also restated as follows:
- the source of their non-Cushitic vocabulary not just looked like but was/is Afrasian;
- it is not any hypothetical lost family, but simply just the still-extant Rift-Dahalo family itself, which already existed before eventually coming under major East Cushitic influence.
Sanity checkpoint: what’s the previous evidence?
Before I go too wild with all this, we should also pause to check what really is the prior argument for the Cushitic membership of Rift & co. A consistent research-historical claim (thus also Kießling & Mous) holds that this classification was established “by” or “since” Greenberg, with the precise reference varying slightly, between his original 1950 Hamito-Semitic article, its reappearance in the 1955 collection Studies in African Linguistic Classification, and its 1963 revision as a chapter in The Languages of Africa (in the last, Ma’a and Dahalo now appear in SCu. also). This is surely correct in that the classification as South Cushitic has persisted as received knowledge since then. But did Greenberg really show this classification, or merely assert it? His classification work is after all not very careful at, or reliable for, distinguishing these two. Even many of his entirely novel details are given with only passing comment. E.g. the theory of Omotic membership of Aroid has its origins in his footnote III.14 (as of the 1963 book), which I quote here in full: “Bako [= Aari; J.P.] and the languages closely related to it were left unclassified in SALC because of lack of evidence. Material now available shows that these languages are without doubt Western Cushitic.” [9] On South Cushitic we can find a more promising assertion in footnote III.12: “I do plan, however, to publish separately the evidence for the affiliation of Southern Cushitic to the rest of Cushitic”. Alas, no trace of such a work seems to be cited in later literature or to appear in a purportedly complete bibliography of his. [10]
The Iraqw grammar by Mous adds a little more detail on the early history of classification of SCu., referring also to the proposal having been first made already by good old Reinisch (in some unspecified source) and elaborated further by lexical comparison by Meinhof “in 1906”, again uncited, but this probably refers to an article “Linguistische Studien in Ostafrika, XI: Mbulungwe” (= on the Rift language Burunge, its name here in a Bantu-adopted form [11]). If lexical comparison was the main evidence cited there, and the conclusion was only more vaguely in favor of a “Hamitic” affinity (however, it is unclear to me from Mous’ wording whether at least Reinisch might have asserted them being indeed Cushitic), I will probably not miss anything too major by restricting my discussion mainly to what is presented by Greenberg.
For the second half of this blog post, let us thus go over the SCu. material that incidentally appears in Greenberg’s discussion of Afroasiatic, to find out if later researchers have been justified in taking him at his word on this. Greenberg’s discussion on demonstrating the AA origin of Chadic grammatical elements already covers some Cushitic material, but there the only example language he cites repeatedly is Beja; there are also single appearances by Oromo, Sidamo, and several Agaw varieties, but nothing SCu. We’ll have to turn to his list of 78 Afroasiatic cognate sets to find any concrete South Cushitic data. I will stick to discussing the Cushitic parallels, since again, I am not planning on questioning Rift-Dahalo being still ultimately Afroasiatic.
1. ‘Antelope’: Beja garuwa; East: Sidamo gedimo; Iraqw gwarɛħi ‘dikdik’ (+ Chadic)
K&M confirm PWR *gʷareeħa ‘dikdik’, as well as have a very similar-looking *gʷareʕáy ‘gazelle sp.’, which probably allows segmenting off *gʷaree- ‘antelope’. No term for ‘antelope’ appears in Hudson’s Proto-HEC reconstructions, nor is Greenberg’s Sidamo word recorded by Hudson, though it seems to be akin to his Gedeo gadansa ‘antelope’, Burji gadama ‘kudu’. Sasse’s An Etymological Dictionary of Burji (1982) adds further Oromo gadamsa and proposes that this might be a loan into Burji. At least the Gedeo form clearly is (reflecting the exclusively Oromo development of the feminine singular ending *-tV to -sV after a sonorant). Consistent medial *-d- in these would be compatible with WR *-r-, but is not so with Beja. Hudson has however also a more promising-looking synonym from Sidamo: sg.m. gaarraančo, pl. gaarraano, and perhaps Burji gargulaččo ‘Ethiopian duiker’ could be related to this too. There is a lack of any explicit labial element here though that could account for PWR *gʷ-, but maybe -rr- represents assimilation from something like *garw- (and is the Beja then too from this by epenthesis to *garuw-?). All in all, I could buy this comparison… Still, distribution in East Cushitic is very spotty and checking additional data fails to turn up anything in the neighborhood of #g⁽ʷ⁾ar- either (e.g. Daasanach has ðeere ‘kudu’, gini ‘gazelle’, nooto ‘dikdik’; Arbore for the same species respectively serem, ʔizze, tarri; Tsʼamakko has mirža ‘kudu’, sawro ‘dikdik’; Gawwada has katinko ‘greater kudu’, illikakko ‘lesser kudu’, mirkiya ‘gazelle’, sawro ‘dikdik’, the first of which does match Oromo-HEC *gadVm- though). I can’t really rule out any of accidental similarity, external substrate origin, or a loan between HEC and RD. Also, outside Cushitic entirely, Hadza has geweda-ko for ‘dikdik’, and also the borrowing of this into pre-WR could probably yield at least *gweda- > *gʷar- (I cannot tell if the *-ee- adds more difficulty).
21. ‘Cow’: Beja šaʔ; East: Somali saa; Burungi [sic] se ‘ox’ (+ Chadic, Berber)
A solid Cushitic and ECu. etymology (PEC *šaʕ-, reflected in all branches other than Yaaku and Dullay [12]). The Burunge word seems like a mis-citation of WR ⁽*⁾ɬee ‘cow’. However, if this is related (through an *š > *ɬ shift — and not to PEC *loʔ ‘cattle’), lack of *ʕ and the *e-vocalism probably suggests an old borrowing from (pre-)Arboroid; cf. Arbore, Daasanach seʔ. If this is not mis-cited, and rather a word found in Burunge only (K&M have no trace of it), then we have clearly a still more recent loan from the same source.
22. ‘Day’: Ma’a azi; Yemsa aši ‘now’ (+ Chadic, Berber)
Way too spotty evidence to trust an ounce on, and note that Yemsa is North Omotic rather than Cushitic proper. Elsewhere we have clearly unrelated words for ‘day’ like PWR *lalaʔoo, Oromoid+Dullay *guyy-, HEC *barra (< PEC *barr- ‘some time period’), Agaw *gärk-. If Bender is to be trusted, the Yemsa word is rather akin to Bench haši ‘now’, and probably further to Proto-Ometo *hat(s)ʼi id., with no signs of coming from a meaning ‘day’ at any reconstructible time-depth.
28. ‘Egg’: Agaw: Bilin kaɣaluuna, Chamir qaluuna; East: Saho unkualale; Iraqw qanhi (+ Chadic, Berber)
The first two come clearly from some Proto-Agaw root at least, though Appleyard notes reconstruction is difficult (with other data he gets together *qäɣal- ~ *qʷäräɣ-). The Saho word must be just a loan from Amharic ɨnkʼwɨlal, which has also been borrowed into other Agaw varieties such as Awngi ɨnkʷlal. The Iraqw, properly qanħi, is per K&M rather cognate to Gorowa qanħaa ‘germinated grains’, but it looks like the semantic shift here is by bleeding from PWR *qʼanaʔoo ‘eggs’ (or are they perchance both from a common verb root meaning ca. ‘hatch, sprout, spawn’?) … still, I do not think sharing *kʼa- + some resonant gives sufficient grounds to reliably compare it with Agaw.
29. ‘Eye’: Beja lili; East: Somali il; Agaw: Quara (y)il; Iraqw ila (+ Chadic, Egyptian, Berber)
Solid pan-Cushitic *ʔil-, as already noted, continues to be widely recognized as PAA too.
41. ‘Knee’: Beja gunba; East: Saho guluub, Somali jilib; Agaw: Bilin girib; Iraqw guruŋgura (+ Chadic, Berber)
A solid Cushitic etymology; Proto-Agaw *gɨrb, PEC *gi/ulb- (I have been suspecting primarily *i, with an umlaut *gilub > *gulub in some consonant-stem reflexes), Dahalo gilli sg. : gillibe pl., PWR *guruguunda ~ *guruŋguuda (borrowed apparently also into Hadza: gurunguri). The *r/*l discrepancy seems to rule out borrowing from ECu. into Rift (OTOH might point to borrowing into Dahalo), and probably it is rather a real cognate; but it equally well seems to rule out Rift as a low-level member of ECu.
51. ‘Mouth’: Beja yaf; East: Somali af; Iraqw aafa (+ Chadic, Semitic)
Solid pan-Cushitic *ʔaf-, as already noted, again widely recognized as an already PAA root.
74. ‘Tree’: East: Somali ged; Ma’a m-xatu (+ Chadic, Egyptian, Akkadian)
The Ma’a is compared by Ehret with PWR *xaʔi ‘trees’ (sg. *xaʔi-noo), which would alas show that -tu is a suffix and lead to problems with all other cognates proposed by Greenberg. Ehret instead adduces Agaw *kan- ‘tree’, but this also ends up requiring ad hoc sound correspondences and Appleyard is not in support. Even Ehret leaves Somali (properly geed) off as well, since a Rendille cognate gey known by now shows that this rather comes from Proto-Somaloid *geyz. Thus no good cognates for SCu. to be found in this entry. A better-established Cushitic term for ‘tree’ is PEC *kʼor- (also in Dahalo: kʼoro).
78. ‘Woman’: East: Daasanach minne, Sidamo menti; Bench main; Iraqw ameni (+ Chadic)
The Sidamo has cognates all across HEC (and all with the feminine singular suffix); they probably reflect metathesis from PEC *nVm- ‘person’. The Bench (in North Omotic!) is presumably in origin a loan from this, though Proto-Ometo *ma(č)čʼ- is not dissimilar either (could this lexeme have somehow triggered metathesis in HEC?). It’s conceivable that the Daas. word shows that a metathesized form of the root used to occur also more widely across ECu.; it is not cognate with Arbore saallé, El Molo sáále for ‘woman’, but these seem to be innovative anyway. The precise Iraqw word here doesn’t seem to appear in K&M but would look like some sort of derivative from PWR *ʔaama ‘mother, old woman’. If so, it is then clearly not comparable (not analyzable as **a-men-) and instead a typical nursery term, worse yet of a type that’s not itself common in Cushitic (for ‘mother’, PEC has parallel variants *ʔaayy-, *yaayy-). Also worth noting that metathesis and semantic shift from older *nVm- seem to further shoot down the Chadic comparison, too.
…and there we have it: Greenberg’s total provided evidence that might be in favor of thinking that the Rift languages are Cushitic; just nine etymologies, of which by my tally ‘day’ is immediately rejectable, ‘tree’ and ‘woman’ also rejectable after consideration, and ‘antelope’, ‘egg’ probably wrong too; while ‘cow’ is likely a loan. ‘Eye’, ‘mouth’ and ‘knee’ remain, but these for one do not provide much reason to think they’re specifically Cushitic (being decently represented also wider across AA), and for the first two, we could indeed also think they are loans from East Cushitic rather than cognates. Probably some more data could have been dug up from the Rift records available already in the 1950s–60s to supplement some of Greenberg’s other AA etymologies too (he has some Cushitic reflexes given for 40 more of them, many of them representing what we can today identify as generally solid PEC or Proto-Agaw etymologies). But if this kind of basic work has not been done, perhaps we should just default to the weaker position: there is no evidence from Greenberg to think RD is Cushitic.
Substantially more RD data has been involved in further Cushitic comparative work since the 60s, but I believe basically all of this should be re-looked over also with the hypothesis of an unrelated or only distantly related neighboring family in mind. A starting assumption that a relationship does exist often leads to proposing etymological connections that otherwise would not have been made, or which would have otherwise been taken as loan relationships, especially in the hands of Ehret-like etymological optimists. E.g. it seems to me there are good odds that the HEC word for ‘bird’ and the Yaaku words for ‘eat’, ‘fat’ just derive as “einzelzweiglich” loans from RD, back when it used to still be distributed up to Ethiopia. Inversely, even a quick glance again over Blažek’s Cushitic lexicostatistic data shows that many parallels in the extinct-ish and more poorly attested RD languages (Kʼwadza, Aasax, Ma’a), that have been assumed to be cognates, probably show just the effects of (East) Cushitic contact. If we find e.g. Ma’a i-tirao ‘liver’, which matches PEC *tir(-aw) excellently but neither PWR *daʔayee nor Dahalo mákko at all, we will have now good grounds to analyze it as a relatively recent loanword rather than any kind of an archaism (and should probably forget all about trying to treat PWR *rararaʔoo ‘spleen’ as a semantically and phonetically divergent cognate). In this case we awkwardly cannot get together any proper Proto-RD term either, but keeping on sifting the mess that is current RD comparative work beyond West Rift might eventually reveal cases of this, of e.g. a shared native term in West Rift + Dahalo that contrasts with an intrusive Cushitic term in Kʼwadza or Aasax.
Possible future directions
(Remember when I wrote a dozen or so paragraphs ago, “before I go too wild with all this…”?)
This reanalysis gives also new openings for the study of other East African straggler languages, as we should expect of any new language family proposal. My thoughts turn first of all to Yaaku, which per received knowledge is also supposed to be a very distinct language that retains fairly little inherited East Cushitic material. In fact it seems to me to be about the same proportion as we find also in Dahalo altogether. As already demonstrated above, it also sides with Dahalo and/or West Rift on at least a few interesting isoglosses. Earlier theories that place it as a heavily divergent member of either Dullay or Arboroid also seem to me to not really hold up, and rather there would seem to have been some secondary contacts between Yaaku and both of these subgroups. Perhaps these would be even enough to account for most or all specifically East Cushitic material found in it? Perhaps it, too, is not actually Cushitic in origin, but an additional branch of Rift-Dahalo which was also heavily Cushitized, in a different fashion, sometime in its earlier history? It is not obvious to me if this is ruled out.
A second, more adventurous target could be Hadza: notoriously a click language, which, again since Greenberg at least, has been mainly compared with the Khoisan languages. This does not have to be fatal though, since the third central–eastern African language with clicks is indeed Dahalo! In the absence of reasons to think there ever were clicks in Proto-Cushitic, let alone Proto-Afrasian, the consensus has always been that the Dahalo click vocabulary would represent recent substrate influence from some sort of a Paleoafrican hunter-gatherer language (perhaps simply the current speakers’ former language, partly preserved through language shift). If we now opt to derive Dahalo rather from a proto-language in common only with Rift, this is no longer obvious. Moreover, Kießling & Mous have already proposed that also the disproportionately high frequency of the ejective affricates *tsʼ, *tɬʼ in PWR could result from some sort of Paleoafrican substratum (they do write “Khoisan”, but this does not seem to carry any claim of an observable relationship with the languages this usually covers), whose clicks were mapped to these consonants. On a very quick initial partial look-over, I have not yet found any likely-looking West Rift cognates for click vocabulary in Dahalo; the closest find was PWR *ɬanú ~ Da. ⁿǀéénu ‘python’, but this turns out to have an even closer but clickless match in Hadza: ɬanó ‘python’. But even the initial setting-up of a hypothesis that maybe Proto-Rift-Dahalo already had clicks, which were then lost in Rift by sound change rather than sound substitution, gives good reasons to extend this exploration also further into Hadza; spoken not far to the west of Iraqw, and quite a few Rift–Hadza parallels are known (a few already came up naturally above).
Regardless, fieldworkers report that there is close to no contact between their speakers, either ongoing or within living memory, so all such parallels would have to derive from some earlier period in history. It would again be good to check that we’re not dealing with an outright relationship.
Hadza was, incidentally, already compared with Afroasiatic in general in some detail just last year by Militarev… he leaves all click vocabulary outside consideration, but still finds an interestingly high proportion of core vocabulary similarities with Dahalo in particular (though still a smaller percentage than what we have between Dahalo and PEC), say Ha. mitɬʼa ~ Da. mittɬʼo ‘bone’. [13] A reserved but not entirely negative review by Starostin also notes that Militarev’s hypothesis currently has too many degrees of freedom due to comparison with all too many Afrasian languages or branches at once, and that one way of getting more believable results could be to not leave Hadza as a complete loose cannon in the giant AA family tree, but put a foot down to analyze it as something like a highly divergent Chadic or Cushitic offshoot. Alright, may I sell you gentlemen a brand-new seventh branch of Afrasian, located right next door, which also happens to already have clicks in it…? Honestly though this sounds too perfect; it would be kind of a miracle if it turned out there actually were identifiable click cognates to be found between Hadza and Dahalo, and just the “Khoisan” vs. “Afrasian” conceptual firewall had prevented everyone from finding them so far; but again, needs to be actually checked! [14] — Perhaps also Sandawe, click language “number two” of the region, could have something to contribute to all this, e.g. some confirmation on what exactly happens to different click types in Rift. But this I do not think has much odds at all of being a part of RD, given that the quite distinct Khoe–Kwadi (Central Khoisan) family seems to remain its best candidate for relatives.
I don’t think I need to start underlining too hard other possible corollaries such as, if there is an independent RD family, it might also have had extinct branches spoken elsewhere yet, and that this could help explain various phenomena…; since at this point it seems further work will be limited most of all by working out its own overall reconstruction beyond where Ehret had left it. Of course, also the safety checks I propose above, of seeing how will West Rift and Dahalo compare with e.g. Aroid or Agaw, or indeed why not e.g. Egyptian or Semitic, would be helpful in any case, whatever that then turns up. But I think I have at least managed to sound an alarm, at the very minimum for myself, that some things have to be in fact checked here in more detail.
[1] North to south by farthest extent: Saho–Afar, Somaloid, Oromo, Highland East, Bayso, Konsoid, Dullay, Arboroid, Yaaku. Oromo+Konsoid are often grouped as Oromoid, which seems likely (strong lexical and “family-typological” similarity), which gives the count of 8; the theory of grouping Somaloid, Bayso and Arboroid as Omo–Tana I don’t think I trust as much, and the theory of combining everything except HEC, Yaaku and maybe Dullay as a single Lowland East Cushitic seems basically unfounded, a research-historical artefact from early understanding of East Cushitic being mostly based on languages of this group. The South LEC theory (LEC minus Saho–Afar = the union of Oromoid and Omo-Tana) could be at least better than either this or Omo–Tana, but here I am in turn afraid it might not represent common inheritance as much as the long-standing contact influence of Somali and Oromo on their neighbors.
[2] Contrast e.g. HEC *giira, Oromoid *abidda (I’d think akin to Arboroid *awat- ‘sun’), Somaloid *dab, Dullay *katte for ‘fire’.
[3] Loss of *-l- and *čʼ ~ *tɬʼ are not grounds for trouble, they do appear regular in light of other parallels. K&M note also e.g. PEC *-kʼsol- ~ PWR *qʼas-aw- ‘to laugh’; PEC *kʼačʼ- ~ PWR *qʼuutɬʼ- ‘to cut’ (relatively basic verbs yes but neither of these especially deep core vocabulary either).
[4] Of the other two, wongo ‘earth’ is stated by Tosco to be a recent Northern Swahili loan; waala ‘rhinoceros’ is probably connected with Somali wiyil id., but I don’t think I like the Proto-Cushitic reconstruction *waɣl- ~ *wiɣl- proposed by Ehret, his *ɣ looks often dubious and reconstructing ablaut from just two reflexes with differing vocalism is definitely bad practice. — Medial -w- could maybe provide similar leverage too for identifying recent loans, but I do not have a structurally-alphabetized rootlist of Dahalo ready to go for easily locating these.
[5] Ronald Kießling (2001): “South Cushitic links to East Cushitic” [I believe to be read as an NP “the links of SCu. to ECu.” and not a full clause “SCu. does link to ECu.”], from the collection New Data and New Methods in Afroasiatic Linguistics. Robert Hetzron in memoriam (ed. Zaborski, pub. Harrassowitz).
[6] Mauro Tosco (2000): “Cushitic Overview”. – Journal of Ethiopian Studies 33/2, 87–121. [It took me actually until the writing of this blog post to get around to this paper, I had been confusing it with a different paper of Tosco’s from 2003 titled “Cushitic and Omotic overview”; which, despite more promises in the title, is actually far shorter and superficial.]
[7] I’m not sure if the last of these correspondences has been explicitly noted in literature (at least Sasse’s 1979 article on PEC does not note it), but cf. HEC *baʔ- ‘to extinguish’ < PEC *baħ- ‘to go out’, Burji naʔ- < PEC *naħ- ‘to fear’, supporting this development at least after *a. More indirectly and speculatively, maybe also HEC *kʼoʔo << PEC *ɗuħ- ‘marrow’, where the onset might be routable by loaning from East Dullay varieties that have *čʼ > kʼ, if this was really rather *čʼ = “*ɗ₁” (but alas the key lexical sources for Dullay do not even document ‘marrow’, and Oromo ɗuha, Konso ɗohota do not support *čʼ either). — Conversely, after *e, again perhaps most laryngeals rather generally merge as *h (HEC *leho < PEC *liħ- ‘six’, but also HEC *reh- < PEC *leʔ- ‘to die’), though there is some inconsistency even within HEC that may suggest more complex history (e.g. -h- in Hadiyya leh-, Kambaata reh- ‘to die’; Burji reeya ~ reeha ‘death’; but Sidamo re- and even Gedeo reʔ- ‘to die’).
[8] There would be some possibility that *tɬʼaaʕa ~ *ɗagħ- are cognates, by cluster simplification *għ > *gʕ > *ʕ; the onsets still cannot match natively. K&M only propose Somali ɖaʕa ‘sound of falling rock’ as a cognate… I’m not sure how much I like the concept of comparing Swadesh-list core vocabulary with ideophones.
[9] A discussion similar to this post on Aroid material appearing in Greenberg’s comparative data would be trivially easy: the total number of concrete Aroid data he cites at any point seems to be zero.
[10] There would be a possibility that some content on the topic appears in his later works on other aspects of Afro-Asiatic, but I’m not going on a wild goose chase for this, especially if professional Cushiticists have apparently not found anything to cite either. See Greenberg’s article on Wikipedia for the bibliography, attributed there to William Croft, who at least seems to have once hosted it, though neither the document itself nor Croft’s old webpage on Greenberg tells me anything about who actually assembled it, or where it might have been published.
[11] Which one, I have no idea. At least Swahili seems to have by now indeed Kiburunge, and would be expected to have rendered at least -r- correctly for long now (though b → mb might be understandable for its own b being typically implosive).
[12] HEC *saʔa (and not **šaʔa) however looks to me like a loan from Oromo saʔa. Also original *saʕ- and, inversely, a loan into Oromo from e.g. HEC or pre-Konso (to explain lack of expected *s > f) could be reconstructed within East Cushitic, but this would be a worse match with both Beja and West Rift.
[13] These are likely further connected with HEC *mikʼe, Tsʼamakko (= West Dullay) meeqʼe, East Dullay *mikʼatte and Yaaku mučo for ‘bone’ (altogether adding up to complementary distribution with “Lowland East Cushitic” *laf-). Even in the absence of Rift cognates, this seems to already pull this word group to Proto-RD. Presumably we have the established *čʼ > *tɬʼ change here that happens at least in Dahalo and Rift, further fed here by some sort of palatalization *-ikʼ- > *-ičʼ-; but what family did this latter change happen in, Cushitic or RD? PWR only has unrelated *fara for ‘bone’. Some relationship with *mitsʼoo ‘calf of the leg’ could be possible though, even if *tsʼ rather than *tɬʼ is awkward.
[14] From some basic wordlists of Hadza and Dahalo at hand, it is at least clear these won’t be turning up by the cartloads. One initial comparison I can note as maybe within speculation distance is Ha. ⁿǀʼóso- ‘to be full’ ~ Da. ǀut- ‘to fill’; does not fill me with confidence though, sure enough both show a dental click, but then not much else directly matches. Nothing good in PWR to bridge the distance either; among several (near-)synonyms, *hatsʼ- ‘to be full’, *hatsʼ-is- ‘to fill’, *niiqʼ- ‘to fill’, *qʼip- ‘to be shut, stuffed, full’, *tip- ‘to fill (a hole)’ are all non-starters.
The Old Hungarian Vowel Shift
Talking with people about the history of Hungarian over the last couple of centuries usually leaves me with the impression that the topic would deserve to be better-known. It’s after all about as well-researched as the history of any modern European language; maybe towards the lower end due to not being embedded in a wider field like Slavic studies or Germanistics. While Hungarian is of course not entirely an isolate, it is still about the next-closest thing though, and the context of Uralic studies for some reason has not provided even what little help it could. E.g. the most prominent reference works on overall Uralic historical phonology have never really gone beyond modern standard Hungarian. Skipping details of dialectology could be justified, but I have for a good while now thought that, in particular, it would be worthwhile to note that several sound changes posited between Proto-Uralic and modern Hungarian are not just conjecture, but recent enough to be demonstrated also philologically. Getting started on that, here is an overview of what I think of as the most prominent bundle, which I’m going to collectively call the Old Hungarian Vowel Shift — or OHVS for short. [1]
For a recap first of all, the modern standard Hungarian vowel system is structurally relatively symmetric with seven long/short vowel pairs, but with quality differences between a few pairs. For the benefit of non-Uralicists, I’m including also a reminder about their IPA values:
The e / é asymmetry can be also easily reconstructed back to a more symmetric system. Most Hungarian dialects (but not that of Budapest, which has naturally been the base of the literary standard) traditionally distinguish also a short /e/, whose usual Hungarological transcription is the slightly unintuitive ë. (Here I must also give a shout-out to the portal / blog kiejtés.hu, which hosts an extremely handy cheat sheet on Hungarian words having mid ë instead of open e. This information would really deserve to be found also in almost any good dictionary, but regardless usually isn’t.) A smaller number of dialects and various late medieval to early modern written records have also a distinction between two long vowels that have merged as standard é, which distinction tends to be notated é ê, or sometimes é₁ é₂. [2] This originally pairs up with the /e ɛ/ distinction, even though é₂ is today mostly not an open vowel; in some dialects the distinction is even é₁ > í versus é₂ > é. Vowel harmony, too, still helps in distinguishing all these E-vowels: open e ê are harmonic equivalents of open a á (in still earlier times these four might be best reconstructed & notated as fully symmetric *ä *ǟ *a *ā, and some morphophonological descriptions follow this even for modern Hungarian), while mid ë is a harmonic equivalent of ö o. (“Real” mid é is however rare in suffixes, mainly just from *ëj.) I have, regardless, yet to see / find any really thorough breakdown of which instances of modern é go back to which; and for further confusion, in a few words ModHu. has even adopted the dialectal reflex of *é₁ as í.
So altogether this ends up at an 8+8 system that we could call “common Middle Hungarian”:
(The qualitative asymmetry of a / á still remains here; it seems to me to have much older roots, already pre-Proto-Hungarian.)
On the path further back in time there are more drastic changes. The vowel system of sufficiently old Hungarian seems fairly different and there have been several theories on how to interpret the unstabilized orthography of the time, smushed down to just the five standard Latin vowel letters a e i o u. With the help of triangulation from not just modern Hungarian, but also from loanword strata and ultimately from Uralic comparison, I think that a decent understanding is possible by now. It might not have been presented anywhere especially well though. And what seems to be presented even worse in the literature is the development from that system to the standard Middle Hungarian one, due to the long-standing Hungarological bad habit of discussing sound change mainly as “tendencies”, not recognizing the fundamental difference between regular sound laws vs. irregular changes.
A note about periodization is also in order by this point. “Old Hungarian” (ómagyar) seems to be these days most often defined as the period up to the mid-1500s, prior to the stabilization of standard orthography. However, nothing of particular note for the overall phonological inventory happens around the end of this period, and the Middle Hungarian vowel system per se seems to have been in existence at least thruout the 1400s. This period of, if you will, “Not So Old Hungarian” could be structurally simply treated as an extension of Middle Hungarian. A system that appears much different is only found even further back: this is the form in which Hungarian is attested in the first two longer texts from 1055 and 1195, as well as in fragmentary mentions in Byzantine and Arabic sources from a couple of centuries further back still. (Besides the lack of the vowel shift under discussion, this stage of Hungarian is also distinguished by e.g. lack of apocope, lack of some vowel epentheses and syncopes, many preserved diphthongs, and some consonantal archaisms like the voiced velar fricative /ɣ/.) This period is what I have most often referred to as “Old Hungarian”, following the definition I have learned at Helsinki, after Kulonen 1993, Johdatus unkarin kielen historiaan, where the cutoff is given instead at 1300, or Papp 1968, Unkarin kielen historia, where it is given at 1350. But if the vast majority of texts that can be now included under that label represent a different, later language variety, an additional term would be in order for disambiguation. “Archaic Hungarian” (AHu.) seems tentatively workable.
I thus define the OHVS as the vowel system rearrangement that seems to appear between Archaic Hungarian and “Not So Old Hungarian”. Going by attestations would suggest most of the shift happening around the 13th–14th century, which is a bit of a gap in the written record of Hungarian, but probably it has been actually gradual and proceeded in different dialects at different rates, and there are indications of it being underway already in the AHu. period itself. Orthography might have also lagged behind actual phonetics, particularly for [ø] where no pre-established grapheme existed. Although this would be a topic for later, it is to me doubtless that this was also triggered by Hungarian’s arrival in a new phonological ecosystem in Central Europe. Earlier pre-Hungarian seems like its vowel phonology would have followed broadly similar structural lines as attested in Ob-Ugric, in particular with a primary split between lax / tense rather than short / long vowels.
The most generally interesting part of the OHVS is close vowel lowering. What we today reconstruct as *i, *ü, *u in Proto-Uralic (and which surface largely as such in Finnic and Meadow Mari) have short mid vowels ë, ö, o as their regular default reflexes in Hungarian: this is among the oldest known points of wider Uralic vocalic reconstruction still standing, first identified by Genetz in 1898. [3] Mid vowel reflexes appear widely in Uralic though, and a mid-century theory from Steinitz reconstructed reduced vowels *ĕ, *ö̆, *ŏ to begin with (≈ IPA [ɪ], [ʏ], [ʊ]; in a few of Steinitz’ later works notated also *ĭ, *ü̆, *ŭ) … however there is not much evidence for corresponding non-reduced close vowels, and thus since circa 1970 the consensus has been, slightly implicitly, in favor of widespread reduction and lowering across Uralic.
To the other evidence for original close vowels should be clearly added also Archaic Hungarian, where we find quite consistently ‹i›, ‹u›. At this time the letter ‹ü› had, I believe, not even been invented yet [4] and so the reflex of PU *ü is not distinguished; either of ‹i u› may substitute. Usually the latter though, and in at least some cases it’s conceivable that ‹i› indicates actual /ɪ/ that was only later labialized by some combinatory rules. I would reconstruct these close vowels as being already reduced however, to explain their eventual lowering in all Hungarian varieties. This also matches general reduction and lowering of all original close vowels in both Mansi and Khanty. [5]
This is not to say that AHu. only had reduced close vowels. Vowel length must have already existed at the time, and was simply not indicated. Many treatments of AHu. have been oddly skeptical about this, but the maintenance of non-contracted í ú and their short alternants i u into Modern Hungarian is inexplicable otherwise. A contrast between reduced /ɪ ʏ ʊ/ and tense /ī ū/ would even allow for the latter to have short allophones [i u] already in AHu., but I am not committed on whether this was actually the case (e.g. the history of the mostly secondary tense *ǖ > modern ű ~ ü could complicate things). I will only insist on a phonological distinction. Traditional explanations, where e.g. the standard example of út : uta- ‘road’ develops a long vowel only around this period, by compensatory lengthening after apocope from *utu (AHu. 1055 utu), are incapable of explaining why the short vowel alternant in the oblique stem remains as uta- and does not lower to **ota-; and they do a poor job also in explaining why many other words do not develop any such long-vocalic forms. [6] The only coherent explanation that I see is that this word was already *ūtʊ, not **utu. There was no “compensatory lengthening” here: the modern Hungarian length alternation in close vowels instead represents conditional shortening (e.g. uta- < *ūta-, preceding the open stem vowel a), and it is this that re-introduces short close [i y u] into the language, after the entirely regular lowering of old reduced †ɪ ʏ ʊ. Their phonologization seems to have had multiple causes, such as loanwords; the introduction of contracted long vowels; dialectally maybe even the development of secondary í from *é₁.
(I have objections also to the compensatory lengthening analysis of Hungarian non-close-vowel length-alternating stems, of the type (kêz > ) kéz : keze- ‘hand’, but this would be a different discussion.)
Another interesting point about close vowel lowering is that these are not mergers. There is, in particular, no evidence whatsoever for the existence of ö /ø/ in Hungarian before this! All native instances come from †ʏ. An old short /e/ is often proposed, e.g. in the very few cases where Hungarian *ē alternates with /e/ in some dialect forms or doublets, but this is not very compelling to me (however I am also not settled on if we should assume e.g. post-Archaic Hungarian vowel shortening, or AHu. *ē ~ *ɪ alternations). Modern mid /o/ is definitely also new as such and similarly about always goes back to †ʊ, but in the literature it remains as an open question if a distinct old †o might have existed that has now merged into modern a (see below for my view). Examples where we find modern e, ö, o corresponding to, say, Turkic *e, *ö, *o do not require being treated as archaisms: they can be easily accounted as having been first substituted as close vowels, which then have coincidentally later evolved to mid vowels. It does not seem there are many AHu. examples confirming this situation, but might include at least ölyv ‘buzzard’, attested pre-vowel shift in 1055 as uluueſ = †ʏľw-äš. If from earlier *ɪlɪɣɪ, this can be well derived ← Mongolic *elige ‘hawk sp.’ [7]
Of course, as expected, we find mid vowels also for the short close vowels of loangiving languages. Examples of this in Turkic loans would be numerous, already say török ‘Turk’ < AHu. turku = †tʏrkʏ ← *türk. Examples from Slavic maybe deserve closer attention: here they often stand for the “yers”, *ь *ъ (from Proto-Balto-Slavic short *i *u), which at the time of early contacts must have been the very same reduced vowels [ɪ ʊ] as in AHu. Their later development in Slavic has been also often towards mid vowels, but not necessarily identically to Hungarian. So e.g. given ModHu. rozs ‘rye’, we do not need to go hunting for a Slavic source that would have even a modern o (as in Russian рожь) instead of palatalizing it into a modern e (as in say Czech rež) or lowering it into a modern a (as in say Serbo-Croatian raž) — we can just start right from Proto-Slavic *rʊžɪ, and have it borrowed as AHu. *rʊžʊ, which will then natively develop into rozs. I think I’ve seen a handful of papers already by now which seem to be confused about this, and discuss Slavic loans in Hungarian in terms of various things being substituted as mid ë, ö, o … where there would be in fact nothing unexpected going on, as long as we took the loans to precede the OHVS and go back to the AHu. close vowels †ɪ, †ʏ, †ʊ. Again, the lowering is not a merger, and thus given almost any word with modern short mid vowels, this is not just a possible but an obligatory reconstruction. A word like rozs could not have had anything other than /ʊ/ in it, if it had existed already in the Archaic Hungarian period with its different vowel system. Whether we could actually find an attestation with ‹u› in the limited corpus of the period is not highly relevant for this conclusion (tho as it happens, in this case we can — a placename Ruzus = †[rʊžʊš] has been recorded in 1292, [8] presumably equal to the modern personal name Rozsos).
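For readers who like to see this sort of thing mechanically, here is a minimal sketch of the close vowel lowering as an ordered rule set. It rests on loud simplifying assumptions: the function names are made up for illustration, apocope is modelled only as loss of a word-final reduced vowel, and epenthesis, conditional shortening and the /l/ + coronal environment are left out entirely. It is meant only to show why a reading †ʊta- would wrongly predict **ota-, while †ūta- passes through untouched.

```python
# Illustrative sketch only: rule names, ordering and scope are assumptions
# made for this example, not an established formalization of the OHVS.

# Reduced close vowels and their regular short mid reflexes.
LOWERING = {"ɪ": "ë", "ʏ": "ö", "ʊ": "o"}
REDUCED = set(LOWERING)


def apocope(form: str) -> str:
    """Drop a word-final reduced vowel (a simplified stand-in for apocope)."""
    return form[:-1] if form and form[-1] in REDUCED else form


def lower_reduced(form: str) -> str:
    """Lower every remaining reduced close vowel to its short mid reflex."""
    return "".join(LOWERING.get(ch, ch) for ch in form)


def ohvs(form: str) -> str:
    """Apply apocope, then lowering. Tense long vowels such as ū are untouched."""
    return lower_reduced(apocope(form))


# †rʊžʊ gives the stem spelled rozs; †ʊta- would give **ota-; †ūta- stays as is
# (its later shortening to uta- is not modelled here).
for archaic in ("rʊžʊ", "ʊta", "ūta"):
    print(archaic, "->", ohvs(archaic))
```

Since the lowering is not a merger, the same table can in principle be read backwards for default cases, which is the sense in which †ʊ in rozs is an obligatory reconstruction.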
The relationship between AHu. close reduced and later short mid vowels is alas still not entirely bijective: some conditional developments create instances of o and ö also from short open a and e (that is, not /e/ but /ɛ/). The most regular condition is probably the somewhat odd-looking environment /l/ + coronal, reflected in examples such as hal- ‘die’ : holt ‘dead’, or †feld > föld ‘land’; a unique enough soundlaw in Hungarian that I suspect it still needs a good summary treatment eventually (which I believe would provide a few corollaries for other details of Hungarian historical phonology too). Suffice to say for now that I believe it suggests coda /l/ to have been velarized [ɫ] in Hungarian at the time, which might even have been close-to-phonemic after the rise of new coda [l] by apocope (and thus, no a > o in basic CVC words like hal ‘fish’); and I reiterate the warning that, particularly in this environment, modern o, ö cannot be mechanically re-written back to AHu. †ʊ, †ʏ.
A fourth vowel lowering is also frequently posited to happen around this time in the history of Hungarian: o to a. If this existed, it might as well be regular (since modern o by default comes from †ʊ) and would count as another part of the OHVS. But this proposal seems very dubious to me. Recall that Hungarian short a is to this day labial and slightly higher-than-open — I have used /ɔ/ in this blog post mainly for ease of typing, but calling it /ɒ/ is also commonplace and might be justified by how it is indeed the short counterpart of the fully open /aː/. [9] There is also not much consistency in where ‹o› appears in AHu. While the text corpus of the time is small, e.g. placenames such as Lake Balaton can be found during this period in many variants like ‹Balatin›, ‹Balotin› or ‹Bolotin›. I believe this is nothing more than graphical vacillation: Archaic Hungarian had no **/o/, but did already have /ɔ/, which was inconsistently written as either ‹a› or ‹o› in different records.
This is all regardless of the fact that Hungarian does show lowering of Proto-Uralic *o, including in some of these words (e.g. PU *konta ‘hunting group’ >> AHu. 1055 hodu > ModHu. had ‘army’). I think this lowering had already happened much earlier. One clear reason in favor is that the apparent shift from ‹o› to modern a only applies to short vowels, while for modern Hungarian long fully open á, we find consistently ‹a› all across Old Hungarian, including in cases where this *ā is from ultimate PU *o (e.g. PU *kota >> AHu. 1195 haz > ModHu. ház ‘house’). Another is that also an original PU *a can be fleetingly reflected as ‹o› (e.g. PU *amta- >> AHu. 1195 od- > ModHu. ad- ‘to give’). The alternative would be to assume two separate lowerings of *o, the first of which would either coincidentally happen only in words that eventually develop a long vowel, or would apply only after the rise of the length contrast (an unnatural sound change: long vowels tend towards raising, not lowering), as well as some kind of raising of old *a, again only when short. Instead, my reconstruction is that there was only a single, early lowering *o > *a, followed later on by a length split, and eventual re-labialization and slight phonetic raising of short *ă.
In loanwords, too, it is open /ɔ/ and not mid /o/ that we can take as the initial reflex of Early Common Slavic *ă; which later yields /o/ everywhere across Slavic, but which loanwords to & from e.g. Greek and Finnic alike demonstrate was still open [ɑ] or [ɒ] well into the medieval period. Correspondences like modern Hungarian pap ~ modern Slavic pop ‘priest’ are thus not a case of (or evidence of) Hungarian having innovated an open vowel, but of all of Slavic having innovated a mid vowel (from Early Common Slavic *păpŭ). This word also illustrates well the other external anchors of the Slavic reconstruction, being derived from Greek πάπας and surfacing in Finnic as pappi, with open /a~ɑ/ maintained despite Slavic transmission all across Europe. Although this has been standard knowledge in Slavistics for a long time, notation such as *popъ remains common, and I can’t help but wonder whether Hungarologists have taken proper notice yet, as this is another topic on which I have often seen confused statements. [10]
The last, maybe best-known, part of what I would include in the OHVS is the introduction of the long mid labial vowels ó, ő. Much as their short counterparts, these too are innovative in the Middle Hungarian vowel system, coming about primarily in the many diphthong smoothings of the period. The three main sources seem to be †ɔw, †ʊw and †ʏw (all possibly from still earlier *Vɣ). This much could be inferred already from modern Hungarian morphophonology: while long á, é, í, ú, ű can be found to alternate with short vowels, for ó and ő there are no such alternants, and instead we find stems like hó : hava- ‘moon’, ló : lova- ‘horse’, kő : köve- ‘stone’, which readily allow reconstructing “*hav, *lov, *köv”, i.e. *χɔw(ɔ-), *lʊw(ɔ-), *kʏw(ɛ-), and point towards a similar history also for non-alternating ó, ő. We will be of course happy to find out that the Old Hungarian data confirms this as well: at least a diphthong ou (perhaps already merged from *ɔw × *ʊw), especially in the passive participle ending, remains well-attested for quite a while, not only in the Archaic Hungarian period but also later, up to the 14th century. The case of the front diphthong is harder to discern, since ‹ew› or ‹eu› is widely taken up in the “Not So Old Hungarian” period as a grapheme for even short /ø/, making it impossible to tell directly if cases standing for eventual ő should be read as already [øː], as still [øw], or in some cases perhaps even more literally as [ew] or [ɪw]. But I would follow the principle of symmetry and assume that [øw] > [øː] takes place simultaneously with [ou] > [oː].
(On the other hand, some dialects of Hungarian do not do this — they show instead long close ú, ű for *ʊw, *ʏw; lú ‘horse’, kű ‘stone’ etc. Secondary raising from *ow, *øw, but before the merger of the first with *ɔw, would be possible too, but phonetically the even simpler option seems to be direct smoothing from *ʊw, *ʏw to *ū, *ǖ.)
At this point, then, only one mid vowel proper is left in Archaic Hungarian: long mid front *ē, well preserved later on as Middle Hungarian é₁ and modern standard Hu. é. It could be tempting to try analysing even this away as an innovation (e.g. from an illabial diphthong *ëj < *ɪj?); but etymologically, it reflects Proto-Uralic *e (e.g. PU *pelə- >> AHu. fel- > ModHu. fél- ‘to fear’), and thus it does appear to be simply a long-standing preservation.
There are some more difficult additional questions about the Archaic Hungarian vowel system, e.g. if a back unrounded *ɨ or *ɯ, or a short open illabial *a could have still existed at this point, merging into other vowels later. I will therefore not present any presumptive overall vowel inventory of Archaic Hungarian at this point. The most important conclusion I would like to underline here is after all simpler: to outline regular phonological changes between Archaic Hungarian and Modern Hungarian, which also equips us to reconstruct AHu. forms even where they have not been attested exactly or at all. At times there will be more than one option for this, but at others, such as the example of rozs ‘rye’ above, the choice will be unambiguous. Doing so will typically even bring many words closer to their etymological origins, whether native or borrowed.
[1] My initial thought was to go with “Great Hungarian Vowel Shift” with more gravitas, but that seems a bit overblown, when it leaves at least a third of the vowel system unscathed and is not even the only rearrangement event of similar scale found in Hungarian historical vocalism.
[2] A fun but missed possibility might have been to notate this as open é versus close e̋, to go with Hungarian’s distinctive ő and ű as the long versions of ö and ü.
[3] Genetz, Arvid 1899: Unkarin ensi tavuun vokaalien suhteet suomalais-lappalais-mordvalaisiin, based on a talk the year before. — There are many slightly earlier results on vowel correspondences among either the western end of Uralic or among the Ugric group, but this I think is the first work properly reaching from the “Finno” to the “Ugric”.
[4] The gradual graphical development of e.g. oe to oᵉ to oͤ to ö can be in fact seen, parallel to German, also in Not So Old Hungarian (which attests also a few other variants like the inverse-order eo, eᵒ, e̊).
[5] Sammallahti’s 1988 paper on Uralic historical phonology even took this as grounds to still follow Steinitz in part, setting up a contrast between full *ī *ū and reduced *ĭ *ü̆ *ŭ in Proto-Ugric (even if not in Proto-Uralic or -Finno-Ugric), but the yet wider appearance of close vowel reduction in Uralic by now makes me suspect an areal explanation might fare better than a strictly genealogical one. For that matter, Hungarian and Ob-Ugric do not quite agree on the distribution of *ī *ū, and later post-Proto-Ugric adjustments are still needed in his model.
[6] One fairly easy adjustment, which seems to be also required for other reasons, would be to abandon the idea that all modern CVC nouns were originally CVCV (as I’ve already passingly suggested in another post about Old Hungarian). This would alas still leave plenty of problems in the theory of compensatory lengthening.
[7] In Turkic only attested in Yakut, surely as a Mongolic loan. Existing literature for some reason mostly prefers to speculate on an unattested Bulgharic *iläg- behind both Mongolic and Hungarian, I presume due to balking at the prospect of direct Mongolic loanwords in Hungarian.
[8] As reported in the major etymological dictionary of Hungarian, A Magyar Nyelv Történeti-Etimológiai Szótára. I would have no idea off the cuff what the exact primary source is for this kind of AHu. attestation, outside the handful of major sources.
[9] I assume this is the primary reason, since no one seems to like calling e equivalently /æ/, although it is also relatively open as far as vowels-usually-called-/ɛ/ come.
[10] For discussion of problems that come from reading the traditional Old Church Slavonic-based transcription of Proto-Slavic overly literally, I will pitch Lindstedt 1991: “The Notational Fallacy in Common Slavonic Phonology”.