The problematics of genetics and the Aryan issue

Illustration by Deepak Harichandan  

Tony Joseph’s article (“How genetics is settling the Aryan migration debate”, June 16) on how recent genetic studies of Indian populations might be “settling the Aryan migration debate” attempts to summarise polemical as well as technical aspects of the contribution of genetics to the debate in question. I will focus here mostly on methodological issues to argue that the conclusions of two recent studies Mr. Joseph’s article leans heavily on are much shakier than conveyed. (Those are “A genetic chronology for the Indian Subcontinent”, published earlier this year in BMC Evolutionary Biology, which I will refer to as “Silva et al.” after its first author, and “Reconstructing Indian population history” by David Reich and four co-authors, published in Nature in 2009, henceforth “Reich et al.”)


To begin with, most studies of population genetics suffer from shortcomings and flaws, some of which are currently unavoidable, while others are the result of subjective, personal choices. The larger public interested in the field of population genetics should be all the more aware of these problems as genetic studies come to us in a scientific garb; in actual fact, they are scientific only in part, and there is still much room for human prejudice and error.


Skewed population samplings

The first limitation is one of numbers. Silva et al.’s study sequenced very few new genomes of the Subcontinent’s populations; rather, it revisited older samples with new techniques (about 1,500 for their mtDNA study and 850 for their genome-wide study, if I have correctly read the paper’s Additional file 1). That is, of course, a valid exercise, but such a small data set remains inadequate to represent the diversity of Indian populations, which the paper itself often stresses (“a remarkable genetic diversity”, “a very complex history”, etc.), and may easily lead to over-interpretation of the limited data. It is the case with all genetic studies to date — understandably so, DNA sequencing being a time-consuming and costly affair. Moreover, the paper (see its Fig. 2) inherits from earlier studies serious inconsistencies in categorising the samples, grouped sometimes regionally (“Sindhi”, “Bengali from Bangladesh”, “Gujarati from Houston”, “Indian Telugu from UK”, with no further details), sometimes caste-wise (“Kshatriya”, “Low-caste South” and “Central”, “Brahmin South” and “Central”, again without further details), and sometimes religion-wise (“Muslim”, with no geographical precision). A look at Table S3 in Additional file 1 makes it clear that thousands of communities from all over the Subcontinent are left out of the picture. Bias is built into the data set.


Let us turn to another example of this limitation: I refer to the construct of Ancestral North Indians (ANI) and Ancestral South Indians (ASI) promoted by Reich et al., according to whom the ANI were “genetically close to Middle Easterners, Central Asians, and Europeans”, making it “tempting to assume that the population ancestral to ANI and CEU [Europeans] spoke ‘Proto-Indo–European’”. In reality, as I showed in a paper published last year (“Aryans and the Indus Civilization: Archaeological, Skeletal, and Molecular Evidence”), the study had no samples whatsoever from several major Indian States (Himachal Pradesh, Punjab, Haryana, Bihar, West Bengal, Odisha, Maharashtra, Tamil Nadu and a few Northeastern States), while other States (Jammu & Kashmir, Uttarakhand, Rajasthan, Gujarat, Madhya Pradesh, Jharkhand, Chhattisgarh, Kerala) were represented by a single population. With such a heavily skewed distribution, the constructs of ANI and ASI clearly lacked any scientific validity — and, moreover, were never defined genetically in the said paper. This prompted in 2014 the biologist B.M. Reddy to call them “ill-conceived and untenable as units of study”. ANI and ASI were advanced simply because they were expected to conform to predetermined results, such as ANI entering as Indo-Aryan speakers. When Mr. Joseph asserts that Reich et al.’s study “proved that most groups in India today can be approximated as a mixture of these two populations, with the ANI ancestry higher in traditionally upper caste and Indo-European speakers”, he merely resorts to the classic “argument from authority” — except that there is no authority at all in this case. That particular study only “proved” how good scientists are apt to bungle when their theoretical framework remains amateurish. 
We might as well put forth constructs of “Ancestral Eastern Indians” and “Ancestral Western Indians” and demonstrate that most Indian populations “can be approximated as a mixture of these two” — the approach would be just as valid, or invalid, as that of Reich et al.


In fact, Reich et al. were themselves careful to “warn that ‘models’ in population genetics... although they provide an important framework for testing historical hypotheses, are oversimplifications. For example, the true ancestral populations of India were probably not homogeneous as we assume in our model, but instead were probably formed by clusters of related groups that mixed at different times”. Clearly, the same complex processes of intermixing continued through most of Indian prehistory and history, which means that there are no “true ancestral populations of India”, barring the original immigration from Africa some 60 or 70,000 years ago. The concept of Adivasi has no genetic legitimacy, unless we are ready to restrict it to the Andamanese.


The problem of circularity

The second flaw, hinted at above, is that of circularity. Mr. Joseph also quotes Reich et al.’s above words of caution, and adds, “In other words, ANI is likely to have resulted from multiple migrations, possibly including the migration of Indo-European language speakers” (note the use of the definite article “the” before “migration”). Silva et al.’s paper also refers to “multiple dispersals into the Subcontinent.” If, then, we have “multiple migrations” or “dispersals” at different periods, what are the criteria that allow us to associate a particular migration with “the” presumed migration of Indo-Aryan speakers? There are no criteria at all, only a predetermined dogma that Indo-Aryan speakers did enter India in the early or mid-second millennium B.C.E.


Silva et al.’s study suffers from this argument from circularity by constantly assuming that any populations migrating in or about central Asia during the Bronze Age must have been “speaking a proto-Indo-Iranian language”; the genetic data are not allowed to gradually build up a picture of possible population movements but immediately squeezed into the mainstream linguistic model.


In a perceptive paper of 2007 entitled “Anthropological, historical, archaeological and genetic perspectives on the origins of caste in South Asia”, Nicole Boivin, an archaeologist at Oxford University’s School of Archaeology with wide experience of the Subcontinent, offered this critique of genetic studies: “In reading the genetics literature on South Asia, it is very clear that many of the studies actually start out with some assumptions that are clearly problematic, if not in some cases completely untenable. Perhaps the single most serious problem concerns the assumption, which many studies actually start with as a basic premise... that the Indo–Aryan invasions are a well-established (pre)historical reality.” Ms. Boivin cited a few well-advertised genetic studies to show how they “confirm such invasions in large part because they actually assume them to begin with”.


There is another aspect to the circularity issue: migrations out of India are simply kept out of the picture, yet some did occur. In the Bronze Age, all archaeological evidence brandished in support of an Aryan migration into the Subcontinent has failed in the end, as none of the second-millennium B.C.E. material cultures earlier attributed to immigrating Indo-Aryans has been shown to be intrusive; on the other hand, we do have clear archaeological trails for a Harappan presence in central Asia, across Iran, in the Persian Gulf (with a few Harappan colonies, outposts or “enclaves” all the way to Mesopotamia). Those are not isolated, one-off cases, but occurrences repeated over several centuries. Why not investigate such trails in the whole R1a debate, instead of positing, as Silva et al. do, that “the Y-chromosome haplogroup R1a... spread with pastoralism and the Indo-European languages into South Asia”?


The same applies to the historical period: no doubt, Persian, Greek, Kushana etc. invasions did take place, but there is also firm historical evidence for an Indian presence in Persia, Anatolia, Armenia and Greece, besides of course Afghanistan and central Asia (to look only northward and westward): why is this never factored in? It is peculiar that the perspective of the two genetic studies Mr. Joseph leans on is wholly unidirectional, almost as a distant echo of Karl Marx’s diktat that “the whole of [India’s] past history, if it be anything, is the history of the successive conquests she has undergone”.


Besides, geneticists tend to neglect non-migratory mechanisms, such as sustained interactions, which historically occurred along trade networks and are bound to have complexified the genetic picture. It is not just facile correlations between migration and language that genetic studies must guard against, but also unilinear, north-to-south interpretations.


“Freezing” India’s populations

Another pitfall has been pointed out by several scholars, among them Ms. Boivin again. It may be best illustrated by Silva et al.’s assertion that “the full jāti system” was established about the start of the Common Era (2,000 years ago) and its endorsement of another genetic study as regards the “freezing of India’s population structure” some 1,500 years ago. Ms. Boivin noted the “problematic assumption... that caste is unchanging” — for instance, that today’s Brahmin necessarily had Brahmin ancestors, which need not be correct, or again that castes were strictly endogamous, which was rarely the case. In fact, there is sound historical and epigraphic evidence of caste mobility in early India — how much is impossible to quantify, but even a small amount would have meaningful consequences; in any case, India’s population has been anything but “frozen”, which means that genetic studies reconstructing a picture on the basis of today’s castes are liable to err when they assume that those castes were identical three to four millennia back.


K.S. Singh, who headed the Anthropological Survey of India and its massive “People of India” project, went further and wrote: “We are mostly a mixed people, and there is no genetical basis to either caste or varna.” As biologists V. Tripathy, A. Nirmala, and B.M. Reddy also pointed out in 2008, many genetic studies betray “a lack of anthropological insights into Indian population structure, as many of the papers have been written by people of non-Anthropology (especially Indian Anthropology) background.”


More issues

Silva et al.’s paper suffers from this shortcoming on a few more issues. One is its limited understanding of the archaeological context of South Asia. It speaks of the “re-peopling [of South Asia] after the Last Glacial Maximum” (about 18,000 years ago); even if there were, as there must have been, migrations into the Subcontinent after the Last Glacial Maximum, there is no question of “re-peopling”, since substantial upper Palaeolithic populations thrived in many parts of the Subcontinent, as numerous prehistoric studies have shown. Second, the study naively tries to correlate the “spread of agriculture” with a few haplogroups, suggesting that agriculture came into the Subcontinent through migrations — but we have enough evidence for the indigenous spread of agriculture in the Subcontinent, not only in the Northwest but independently in the Ganges valley too at a deep Neolithic period. (True, some important cultivars, millets for instance, came from outside India, but that is a different story.) Thirdly, Silva et al. attribute the “spread of Dravidian languages” to those first farmers, while recent studies from genetics (S. Sengupta et al., 2006; P.A. Underhill, 2008) as well as agro-linguistics (D.Q. Fuller, 2003) have tended to show that those languages originated in south India — this far-reaching contradiction should have at least required a discussion.


Finally, Silva et al.’s study opines that “genetic influx from Central Asia in the Bronze Age was strongly male-driven, consistent with the patriarchal, patrilocal and patrilineal social structure attributed to the inferred pastoralist early Indo-European society” (or, as Mr. Joseph puts it, “those who migrated were predominantly male”). This is a double case of circularity: a patriarchal social structure is “attributed” to an “inferred” pastoralist Indo-European society, and results are interpreted to fit this double assumption. Anthropologically, “pastoral” migrations that leave their womenfolk behind make no sense. “The male-dominated arrival of Indo-Aryan speakers from Central Asia” is nothing but a recycling of the nineteenth-century paradigm: it would be conceivable only in the context of an aggressive military campaign, invasion and conquest (which archaeology has emphatically ruled out), not with repeated waves of peaceful pastoral immigrations.


The importance of multidisciplinarity

Contrary to Mr. Joseph’s thesis that “genetics is settling the Aryan migration debate”, no single discipline will ever be able to do so on its own. Vedic scholars have agreed that there is no direct reference to an Aryan invasion or migration into India, although from the mid-nineteenth century till today, many of them (Western scholars mostly) have tried to force this scenario on the texts; archaeologists have largely rejected the same scenario (owing to the absence of evidence on the ground); bioanthropologists (such as the late K.A.R. Kennedy, Brian Hemphill or John Lukacs) have also ruled it out, finding no discontinuity in the Northwest’s skeletal record of the 2nd millennium B.C.E.; archaeoastronomers have unearthed numerous references in the Vedic literature pointing to astronomical observations in the 3rd and 4th millennia; on the other hand, linguists have, though with a few exceptions, endorsed the invasion/migration paradigm as providing the simplest explanation for the diffusion of the family of Indo-European languages into the Subcontinent (with similar migrations into Iran and Europe), although after two centuries of linguistic studies they remain unable to agree on the location of the original homeland of the said family or the chronology of the said diffusion (the French prehistorian Jean-Paul Demoule’s recent book, Mais où sont passés les Indo-Européens?, offers a solid erudite critique of the linguistic approach). At bottom, few scholars from either side, or from any side, seem to realise that the Indo-European problem will be laid to rest only when a theory can effectively take care of all the above disciplines, and a few more: the problem is essentially multidisciplinary.


Returning to genetics, to brandish one study as the final solution to the Indo-European problem while ignoring the many earlier (and a few recent) studies that reached vastly different conclusions, especially as regards the “genomic unity” of Indian populations, the “caste-tribe genetic continuum”, or the origin of the R1a haplogroup, simply reflects a personal choice, not an objective assessment. Better sequencing or analytical techniques are no guarantee against interpretative flaws. And contrary to what Mr. Joseph claims at the start of his article, many studies based not just on (matrilineal) mtDNA, but also on (patrilineal) Y-DNA of Indian populations were published in the last two decades or so (my above-mentioned paper lists quite a few). The author has clearly not surveyed the field, rich with at least 150 studies for the Subcontinent alone.


To give one last example, a study, “Y-chromosomal sequences of diverse Indian populations and the ancestry of the Andamanese”, by Mayukh Mondal et al., just published in the May issue of Human Genetics, concludes that “Indian populations have complex ancestry which cannot be explained by a single expansion model” and throws a few challenging statements: “The time divergence between Indian and European Y-chromosomes, based on the closest neighbour analysis, shows two different distinctive divergence times for J2 and R1a, suggesting that the European ancestry in India is much older (>10 kya [more than 10,000 years]) than what would be expected from a recent migration of Indo-European populations into India (~4 to 5 kya [4 to 5,000 years])… Our time divergence estimate matches the previous studies which argued that most of the haplogroups present in India arose inside India rather than being brought from outside… The puzzling point is that the well-recognized later Indo-European migration, which strongly affected the northern regions, did not produce detectable major changes in the Y-chromosome gene-pool… [This result] downplays the importance of migration related to the Indo-Aryan linguistic expansion.” Clearly, divergences in genetic studies are much wider than Mr. Joseph tells us!


In fact, it is reassuring and instructive to see geneticists disagree — and a useful reminder that the discipline still has much room for subjective interpretations. It may be that in 20 or 30 years many or most of the grey areas will be rigorously settled, and flawed methodologies averted, but we are still far from that: “The journey is really just beginning,” wrote Peter A. Underhill in concluding a 2008 paper on “Interpreting patterns of Y chromosome diversity: pitfalls and promise”, which highlighted technical details of potential misinterpretations of genetic data. One of them, incidentally, concerns different estimates among geneticists for the average frequency at which mutations occur in the Y-DNA (the so-called “molecular clock”), rendering the dates for splits in haplogroups more uncertain than most studies take care to specify. The “clock” can eventually be calibrated only by sequencing of ancient DNA (aDNA) from the Subcontinent, ideally from various regions and periods; we are, again, very far from having such data (although there is some promise from a few Harappan skeletons).


Finally, Mr. Joseph’s article suffers from overemphatic language, which, for instance, omits all cautionary statements from Underhill et al.’s paper of 2014 (such as “our data do not enable us to directly ascribe the patterns of R1a geographic spread to specific prehistoric cultures or more recent demographic events”) so as to make its conclusions appear ironclad, a stand the authors themselves carefully avoided. However, Mr. Joseph is undoubtedly right in stating that “We are all migrants.” (K.S. Singh said it earlier: “An Indian is a migrant par excellence.”) As regards his conclusion that “We are a multi-source civilization, not a single-source one, drawing its cultural impulses, its tradition and practices from a variety of lineages and migration histories”, I agree again, but only partly: Indian civilization is something more than a khichri of ethnicons, even if they all did enrich it: culturally, it has been more a creator and a giver than a recipient, as most Indologists recognised long ago.

Tony Joseph responds


Michel Danino’s response to my article suffers from multiple deficiencies, the first of which is that he fails to recognise that my piece rests heavily on five studies (not two, or one, as he says at different places). Four of these were published between 2013 and now, and one was published in 2009. This is important because, as was explained in my piece, in a field making rapid advances it is misleading to depend, as Prof. Danino recommends, on older studies that arrived at their conclusions using fuzzy data.


Second, the doubt that Prof. Danino tries to throw over the science of population genetics is peculiar because until recently, it was this science that was being used to argue that the migration of Indo-European language speakers into India in the Bronze Age had been disproved! That doubts about population genetics are being cast precisely at a time when it has started using better techniques such as whole genome sequencing and arriving at clearer answers is noteworthy.


Prof. Danino makes an effort to question the methodology adopted by these studies, but the proper way to question commonly-accepted methodology in a scientific field might be to get a peer-reviewed article published in a well-regarded journal.


An intriguing aspect of Prof. Danino’s response is his opposition to the Ancestral North Indians/Ancestral South Indians framework. From 2009 until now, this was the framework that was used, in media reports, to argue against Bronze Age migrations into India, as my piece explains at length. Now when it is shown that the study actually supports it, he says the framework is faulty! A typical case of wanting to have it both ways, one would think.


At one place, Prof. Danino says: “If, then, we have ‘multiple migrations’ or ‘dispersals’ at different periods, what are the criteria that allow us to associate a particular migration with ‘the’ presumed migration of Indo-Aryan speakers? There are no criteria at all, only a predetermined dogma that Indo-Aryan speakers did enter India in the early or mid-second millennium B.C.E.” This is incorrect. It is the phylogenetic and geographic structure of haplogroups that allows scientists to associate particular migrations with particular regions, as explained in my piece. In addition, the discovery of R1a in ancient DNA across Europe and Central Asia (especially in Sintashta, Andronovo and Srubnaya remains) provides strong evidence. The well-understood science of linguistics in turn allows us to define the linguistic areas and follow the evolution of languages.


Prof. Danino next moves on to suggest that ‘Out-of-India’ is a reasonable theory, and not a far-out hypothesis as it is usually considered to be. Again, the best way to demonstrate this would have been through a peer-reviewed article in a well-regarded journal that provides an “Out of India” explanation for the genetic spread that we observe. That he quotes not a single study to back his conviction is informative. The ‘Out-of-India’ hypothesis has been around for many decades, and if it were tenable at all, shouldn’t there have been many peer-reviewed papers by now making the case and fleshing out the details?


At one place, Prof. Danino asks why the migration history of India is “unidirectional”, as if India has been singled out for special treatment. But neither is the migration history of India unidirectional, nor is its migration history singular. There have been well-accepted out-migrations from India, such as that of the Romani people. And Europe has been subject to no fewer migration episodes than India. In fact, the Silva paper of 2017 makes explicit and detailed comparisons between the migration histories of India and Europe, so it is wrong to suggest that India has somehow been singled out.


The sections in Prof. Danino’s response dealing with the caste issue are irrelevant to my piece which doesn’t anywhere link the caste system to the arrival of the Bronze Age migrants from Central Asia. On the contrary, my piece clearly says, quoting from the Priya Moorjani et al. study in the American Journal of Human Genetics in 2013, that there was significant mixture of populations of India between 4,200 years ago and 1,900 years ago which, to the best of our knowledge, left only the Onge of Andaman and Nicobar unaffected. My piece also quoted Ms. Moorjani to say there was a shift from widespread mixture to strict endogamy which happened only in the beginning centuries of the Common Era, long after the Bronze Age migrations.

On the question of multidisciplinarity, it should be borne in mind that linguistics has always favoured the theory of migration of Indo-European language speakers to South Asia and other regions of the world. (And, ironically, Prof. Danino, who argues for multidisciplinarity, also finds fault with the geneticists for relying on linguistics to connect the geographical spread of R1a and the spread of Indo-European languages!) Archaeology has been indeterminate on the Bronze Age migration question, but there is no archaeological discovery that will militate against genetics joining linguistics in favouring migration as the right explanation.


Lastly, that Prof. Danino doesn’t like khichri is also informative in a way. It seems to suggest a discomfort with diversity and a preference for uniformity or, perhaps, purity, or maybe just a preference for nursery pudding. Or perhaps it is just that he hasn’t had an opportunity to taste aviyal. As Indians and citizens of a great civilization, we take delight in having a multitude of things in common, and comfort with diversity is certainly one of them.



Michel Danino is an author, a guest professor at IIT Gandhinagar, where he assists the growth of its Archaeological Sciences Centre, and a member of the Indian Council of Historical Research.

Tony Joseph is a writer and former editor of BusinessWorld. Twitter: @tjoseph0010
