New reports clearly confirm ‘Arya’ migration into India

A Rakhigarhi resident surveying the excavation site in 2014.   | Photo Credit: D. Krishnan

The last time a paper titled ‘The Genomic Formation of South and Central Asia’ was released online, in March 2018, it created a sensation in India and around the world. This was mostly because the paper, co-authored by 92 scientists, many of them doyens of different disciplines, said that between 2000 BCE and 1000 BCE, there were significant migrations from the Central Asian Steppe that most likely brought Indo-European languages into India, just as Steppe migrations into Europe a thousand years earlier, beginning around 3000 BCE, had spread Indo-European languages to that continent as well. In other words, the paper supported the long-held idea of an ‘Arya’ migration into India or, to put it more accurately, a migration of Indo-European-language-speaking people who called themselves ‘Arya’.

Many did not like that finding, and their most important counter-argument was that the paper had not been peer-reviewed and had merely been released on a preprint server; therefore, one had to withhold judgement until it was published in a peer-reviewed scientific journal. That the paper was co-authored by 92 scientists of high reputation, including many from India, did not matter in their opinion. The lead author of the paper was Vagheesh Narasimhan of Harvard Medical School, while Kumarasamy Thangaraj of the Centre for Cellular and Molecular Biology was a co-director, along with David Reich of Harvard Medical School. Other Indian co-authors included Niraj Rai of the Birbal Sahni Institute for Palaeosciences and Vasant Shinde, then Director of Deccan College.

Even more evidence

Well, that paper has now been peer-reviewed and published in the most reputed of journals, Science. It has 117 scientists as co-authors, significantly up from the 92 last year. The paper is now titled ‘The Formation of Human Populations in South and Central Asia’. And what does it say on the question of Steppe migrations? The same thing, but with even more evidence and detail. Here is a direct quote, and not just any quote, but the very essence of the paper:

“By sequencing 523 ancient humans, we show that the primary source of ancestry in modern South Asians is a prehistoric genetic gradient between people related to early hunter-gatherers of Iran and Southeast Asia. After the Indus Valley Civilization’s decline, its people mixed with individuals in the southeast [i.e., southeast of northwestern India where the Indus Valley Civilization flourished: editor] to form one of the two main ancestral populations of South Asia [called Ancestral South Indians or ASI: editor], whose direct descendants live in southern India. Simultaneously, they mixed with descendants of Steppe pastoralists who, starting around 4000 years ago, spread via Central Asia to form the other main ancestral population [or Ancestral North Indians, ANI: editor]. The Steppe ancestry in South Asia has the same profile as that in Bronze Age Eastern Europe, tracking a movement of people that affected both regions and that likely spread the distinctive features shared between Indo-Iranian and Balto-Slavic languages.”

A map of Steppe migrations, from the paper in ‘Science’.  

The First Indians

Shorn of scientific jargon, here is what that means: The reference to the early hunter-gatherers of Southeast Asia is a reference to the Andamanese, whom the rest of the paper abbreviates as AHG or Andamanese Hunter Gatherers. This is the same as the Ancient Ancestral South Indians (AASI) that the earlier paper talked about, or First Indians, which is the term used in my book, Early Indians. No matter which name you use — hunter-gatherers of Southeast Asia, AHG or First Indians — they all refer to the descendants of the Out of Africa migrants who reached India around 65,000 years ago and then moved on to Southeast Asia, East Asia and further on.

So this is what the abstract means in full: The primary source of ancestry for today’s South Asians is a mixture of First Indians and a people related to the hunter-gatherers of Iran. This mixed population created the agricultural revolution in northwestern India and built the Harappan Civilisation that followed. When the Harappan Civilisation declined after 2000 BCE due to a long drought, the Harappans moved south-eastwards (from northwestern India) to mix with other First Indians to form the Ancestral South Indian (ASI) population whose descendants live in south India today.

Around the same time, the Harappans also mixed with Steppe pastoralists who had by then migrated to north India through Central Asia, to form the Ancestral North Indian (ANI) population. The Steppe ancestry of the people of both South Asia and Eastern Europe in the Bronze Age explains how the movements of the Central Asians between the two regions caused the well-known similarities between the Indo-Iranian and Balto-Slavic languages.

A calf grazes at the excavation site.   | Photo Credit: V.V. Krishnan

The study goes on to further elaborate on the Steppe migration. Here is another quote from the same paper: “Between around 2000 and 1000 BCE, people of largely Central Steppe - MLBA (or Middle to Late Bronze Age) ancestry expanded toward South Asia, mixing with people along the Indus Periphery Cline to form the Steppe Cline.”

If these quotes surprise you because you thought the recent genetic studies had disproved Arya migration, then you have a bone to pick with some voices in Indian mass media for utterly misleading you. The Science study substantiated its earlier findings about Steppe migrations into India with even more evidence, but many newspapers and websites chose to go to town with headlines such as this: ‘New genetic studies dent Arya migration theory.’


So how did Indian media twist a straight story into something diametrically opposite? To answer that, we have to look at a second study that was released at the same time. This study, based on the ancient DNA of a woman who lived in the Harappan site of Rakhigarhi about 4,600 years ago, was published in Cell, co-authored by 28 scientists including some co-authors of the Science report, such as Thangaraj, Reich, Narasimhan and Rai, with Shinde being the lead author. The study’s title seemed straightforward: ‘An Ancient Harappan Genome Lacks Ancestry from Steppe Pastoralists or Iranian Farmers.’ But this made many journalists jump to the conclusion that it meant there was no Arya migration either.

Two of the four skeletons found at Rakhigarhi in 2015.   | Photo Credit: AFP

The journalists would not have reached this hasty conclusion had they read at least the summary of the Cell paper. Here is a direct quote from the summary: “These individuals had little if any Steppe pastoralist related ancestry, showing that it was not ubiquitous in northwest South Asia during the IVC as it is today.” Pay particular attention to the last four words: “as it is today”. The meaning is clear. Today, Steppe pastoralist ancestry is ubiquitous, but it was not so during the period of the Indus Valley Civilisation. (How ubiquitous is it today? The new studies have that figure too: it could be up to 30% in some population groups in India.)

The only possible conclusion from this, therefore, is that the Steppe migrations to India happened after the decline of the Harappan Civilisation. That is no surprise. It has always been understood that the Arya migration from the Steppe happened after 2000 BCE. So to anyone who applies their mind, the absence of Steppe ancestry in a skeleton in Rakhigarhi from 2600 BCE is clear confirmation that the earlier understanding was correct, that the Arya were not present during the Harappan Civilisation, and that they arrived later. In other words, the Harappan Civilisation was pre-Arya, and so was the language they spoke.


So what’s new?

The Cell paper, in fact, goes on to talk about Indo-European languages arriving with the Steppe pastoralists after 2000 BCE. Here is a quote from the paper that minces no words about migrations from the Central Asian Steppe bringing Indo-European languages to India between 2000 BCE and 1500 BCE: “However, a natural route for Indo-European languages to have spread into South Asia is from Eastern Europe via Central Asia in the first half of the 2nd millennium BCE, a chain of transmission that did occur as has been documented in detail with ancient DNA. The fact that Steppe pastoralist ancestry in South Asia matches that in Bronze Age Eastern Europe (but not Western Europe) provides additional evidence for this theory, as it elegantly explains the shared distinctive features of Balto-Slavic and Indo-Iranian languages.”


On the two key issues, who the Harappans were and who the Arya were, the new studies thus arrive at exactly the same conclusions. The Harappans who created the agricultural revolution in northwestern India and then built the Harappan civilisation were a mix of First Indians and Iranians who spoke a pre-Arya language. The Arya were Central Asian Steppe pastoralists who arrived in India between roughly 2000 BCE and 1500 BCE, and brought Indo-European languages to the subcontinent. Is there anything on which the two papers differ? No. They reach the same conclusions, which is not surprising considering that the two simultaneously published papers have many authors in common.

But is there anything new in these two studies that we didn’t know earlier? Yes, a few details. For example, the earlier study, ‘The Genomic Formation of South and Central Asia’, said the migrants from Iran who mixed with First Indians were herders. The new study says the Iranians arrived in India before agriculture or even herding had begun anywhere in the world. In other words, these migrants were likely to have been hunter-gatherers, which means they did not bring a knowledge of agriculture. In Early Indians, I have made a strong case for agricultural experiments to have begun in India independently and have pointed out, in support, that critical domestications of animals such as zebu cattle and water buffalo had happened in India independently of elsewhere.

A seal found at Rakhigarhi.   | Photo Credit: V.V. Krishnan

A few other details provide greater clarity on these prehistoric migrations that shaped Indian demography. For example, previous studies had described the Steppe migrations as happening between 2000 BCE and 1000 BCE. The new Science paper narrows this to between 2000 BCE and 1500 BCE, that is, the first half of the second millennium BCE. This is because after 1500 BCE the populations of Central Asia begin to show a higher level of East Asian ancestry of a kind that is not noticeable in India.

Another spin on the new studies suggests an ‘Out of India’ migration. This is also misleading. If by ‘Out of India’ migration we are referring to the fact that some Harappans visited neighbouring civilisations or cultures such as the Bactria-Margiana Archaeological Complex (BMAC) or Shahr-i-Sokhta, with whom they had trade and cultural links, these are well-known and unsurprising facts.


But if by ‘Out of India’ migration we mean large-scale migration of prehistoric Indians towards the West, spreading culture and language all the way from, say, Harappa to Iceland, then there is not a shred of evidence, genetic or otherwise, to suggest that. It also contradicts the studies’ position that migrations from the Central Asian Steppe brought Indo-European languages to India after 2000 BCE.

The DNA clincher

One question often raised is: how robust can ancient DNA studies based on a few samples be? The answer is that it would be a mistake to look at one genome as something akin to, say, a person in a survey answering a question with a Yes or a No. A single genome carries within itself the genetic track record of a person’s ancestors going back thousands or tens of thousands of years. So when you sequence a genome, say the one belonging to the woman from Rakhigarhi, you are getting a peek into the genomes of thousands of people ancestral to her. That is why in population genetics studies, even a few samples can provide huge insights.

Importantly, the two recent studies are based on 12 ancient DNA samples: one from Rakhigarhi, eight from Shahr-i-Sokhta in eastern Iran and three from Gonur in BMAC. The study published last year was based on just three samples from Shahr-i-Sokhta and Gonur. The number of samples has now quadrupled.

The three samples studied last year were considered a proxy for the Harappan people because they stood out (or were outliers) from the rest of the population of that time; they carried a significant amount of First Indian ancestry, unlike the others around them. This suggested they were migrants from the Harappan Civilisation. The new Cell study validates the assumption that these outliers were indeed migrants from Harappan cities because the Rakhigarhi DNA sample matches exactly the 11 samples from Shahr-i-Sokhta and Gonur. It has reconfirmed the earlier findings with even more robust data.

Clay toys found at Rakhigarhi   | Photo Credit: V.V. Krishnan

On the question of the language of the Harappans, the 2018 study had mentioned the possibility of it being Dravidian. The new paper goes into greater detail to suggest that Dravidian was likely to have been the language of the Ancestral South Indians (ASI) formed as a result of the mixing of the Harappan population with First Indians. The study says: “A possible scenario combining genetic data with archaeology and linguistics is that proto-Dravidian was spread by peoples of the Indus Valley Civilisation along with the Indus Periphery Cline ancestry component of the ASI.” The study also points out that of the 11 ancient DNA samples of Harappan migrants recovered from Shahr-i-Sokhta and Gonur, two carried the Y chromosome haplogroup H1a1d2, which is today primarily found in southern India.

There is a lesson here for both readers and the media. When reporting on science, it is important to go by what is written in the papers rather than by statements made outside them. Peer-reviewed papers published in reputed journals by well-known scientists are robust and durable; fleeting statements made at press conferences are ephemeral and prone to being misheard or misreported, and could sometimes run contrary to the evidence on hand.


What we know today, based on these two papers, is mostly what we knew last year, but with far greater supporting evidence. Which is that we are a multi-source civilisation, not a single-source one, drawing our cultural impulses, traditions and practices from a variety of heredities and migration histories. We are all Indians. We are all migrants. And we are all mixed.

The writer is the author of Early Indians: The Story of Our Ancestors and Where We Came From.

