Imagine a Schumacher or, if you are a still older aficionado of motorsport, a Niki Lauda at the wheel of a Formula 1 car attempting a manoeuvre at speeds touching 300 km per hour. The expert racer, racing a car that is loaded with an array of sensors and embedded microprocessors, attempts a move that requires the machine to respond instantly to his command.

Now imagine that Schumacher instead gets a spinning wheel, like the one you see on your desktop PC or laptop, indicating that the computer is still busy with the task he asked it to perform in an instant.

In engineering or computing parlance, this delayed response is termed latency. It may be only a fraction of a millisecond, but that is enough to send a Lauda crashing. (Lauda did suffer a crash in 1976, though of course not because of any latency in the car’s electronics, and he returned to the circuit a hero after sustaining life-threatening injuries.)

Latency is the key

Latency is a critical issue in many everyday applications. For most ordinary tasks on a PC, Windows, Linux or Mac OS will do, but mission-critical tasks that tolerate almost no delay or error require a Real-Time Operating System (RTOS).

Of course, there is a hierarchy of applications, depending on how much tolerance is permissible. For instance, the margin of error in manoeuvring a craft like the Curiosity on the surface of Mars is smaller than that for a racing car, or for the embedded systems found in home appliances or in automobile infotainment systems.

Every device that is part of an information network (switches, routers and equipment that is closer to the customer-end) requires an RTOS that drives each of the components. Industries that are driven by automation, such as a petrochemical facility or a power utility, require controllers to drive them. Railway equipment such as anti-collision systems, signalling gateways and braking systems too require an RTOS to drive them precisely.

“Wherever you need a specific functionality, which is to be executed at a specific time and under specified circumstances, you need an RTOS,” explains Venkatesh Kumaran, country head, Wind River (India), the company that designed the operating system that drives the Curiosity.

“Basically, you need it when you have zero margin of error,” Mr. Kumaran observes. “Imagine if you used a Windows operating system to run the controls of an aircraft. If it fails to boot up when you need it, you are finished. Any mission- or safety-critical application that operates under sub-millisecond performance conditions requires an RTOS,” he says. This is the key difference between an enterprise operating system and an RTOS.

Severe constraints

Mission-critical applications, such as those aboard the Curiosity, or the controls aboard a fighter aircraft or an F1 racing car, typically have to function under four severe constraints: space, weight, power consumption, and an extremely hostile environment.

Wind River’s VxWorks is the core operating system of the National Aeronautics and Space Administration’s Mars mission.

“It is not merely the landing of the Curiosity on Mars. From the launch of the rocket (in November 2011), the craft had to survive, land and then conduct a series of experiments during its stay on the Mars surface,” explains Mr. Kumaran.

The operating system itself provides the environment in which the hardware is configured for the hostile environment on Mars, says Praful Joshi, technology head, Wind River India. This is done by ‘swap’, a means by which the system is enabled to operate under constraints imposed by weight, space and power requirements, he explains.

“The hardware can be consolidated by using multiple cores, but consolidating software is much more difficult,” he adds.

The challenges

Why is software development for such consolidated architectures more difficult? “The complexity of software code is becoming more and more complex, the challenge being the constant push to keep reducing the lines of code, while at the same time ensuring that you get more out of what you write,” says Mr. Kumaran.

“It is surprising that the Curiosity, considering the complexity of the tasks that it had to perform, required only 500,000 lines of code. It took eight years of work,” he observes. To put this in some perspective, consider this: a generic switch or a router requires many more lines of code than the Curiosity needed.

A critical difference between an RTOS and a general-purpose operating system such as Linux is the manner in which they respond to interrupts raised by hardware components, which are handled by “interrupt service routines”.

A general-purpose system responds at varying intervals, depending on what else it is preoccupied with; hence the irritating spinning wheel that appears just when you want the system to do something fast. An RTOS, however, will consistently respond within the same bounded interval throughout its life. To put it simply, every request made to an RTOS is treated as coming from a VIP and nobody is kept waiting.
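The variation described above can be observed, roughly, on an ordinary PC. What follows is a minimal sketch in Python, not a benchmark: it asks a general-purpose operating system to wake a task once every millisecond and records how late each wake-up actually is. The function names and the one-millisecond period are illustrative assumptions, not taken from any RTOS vendor's tools.

```python
import time


def measure_wakeup_lateness(period_s=0.001, iterations=200):
    """Return how late (in microseconds) each periodic wake-up was.

    Hypothetical helper for illustration: it schedules wake-ups at fixed
    deadlines and measures the gap between each deadline and the moment
    the process actually resumed.
    """
    lateness_us = []
    next_deadline = time.perf_counter() + period_s
    for _ in range(iterations):
        remaining = next_deadline - time.perf_counter()
        if remaining > 0:
            # On a general-purpose OS the sleep may overrun by an
            # unpredictable amount, depending on what else is running.
            time.sleep(remaining)
        woke_at = time.perf_counter()
        lateness_us.append((woke_at - next_deadline) * 1e6)
        next_deadline += period_s
    return lateness_us


def summarize(samples):
    """Report best case, worst case and the spread between them (jitter)."""
    return {
        "min_us": min(samples),
        "max_us": max(samples),
        "jitter_us": max(samples) - min(samples),
    }


if __name__ == "__main__":
    stats = summarize(measure_wakeup_lateness())
    print("earliest wake-up lateness: %.1f us" % stats["min_us"])
    print("latest wake-up lateness:   %.1f us" % stats["max_us"])
    print("jitter (spread):           %.1f us" % stats["jitter_us"])
```

The figures will differ from machine to machine and from run to run, which is precisely the point. On a desktop system the worst case typically grows when other programs are busy, whereas an RTOS is designed to keep that worst case within a known bound.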

What’s the big deal?

If it is such a desirable objective, why can Microsoft not make Windows behave like an RTOS, in which every request is handled consistently and quickly? Or, to put it differently, what challenge does Microsoft face in pursuing such an objective?

Developing an RTOS is difficult because it needs to be nimble and compact even as it handles complex tasks, explains Mr. Kumaran. Significantly, while VxWorks in all its glory occupies only 100 kilobytes, the footprint of the Windows operating system is not less than 10 MB, he points out.