A "page 2" article on OSnews about yet another Java vulnerability recently gave me something to think about: is there something intrinsically insecure about Java, Flash, or any other interpreted language for that matter? Are interpreters actually as good for security, in the long run, as their proponents often claim? What do interpreters actually bring, and not bring, in the security department?
The promises of interpreters
It has been the case in the past, and it is becoming the case again: interpreters (a.k.a. "virtual machines" or "managed execution environments") are trendy. Whole categories of software that used to be written exclusively in C or C++ now tend to be written in C#, Python, or Java. This is because these languages bring a whole range of functionality that would be hard to provide in a compiled language: cross-platform scripting, dynamically sized arrays without manual memory management, deeply integrated garbage collection, dynamic typing, run-time code generation…
So interpreted languages make the developer's life easier, and as such make coding faster. In the current software development market, where paid developers are seldom treated better than slaves, this is undeniably a precious asset, and one that I will not dispute. I will also overlook the endless interpreter speed debate: yes, JIT-compiled code could theoretically outperform statically compiled code thanks to per-CPU optimization, and no, this is far from happening today for real-world use cases.
No, what interests me here is the alleged security benefit of using interpreted code over compiled code. Interpreted C# and Java code is often claimed to be somehow more secure than its C/C++ cousin. Buffer overflows, return-to-libc, privilege escalation, and a whole range of other attacks are supposedly prevented by the intrinsically hardened, malware-proof structure of managed languages. In a purely interpreted world, malware would surely become nothing but a bad memory of ancient times, with only social-engineering attacks remaining, avoidable through better user education. This belief is so popular that OS development in managed code is a hot research topic: we have seen OSs written in C# (Singularity, SharpOS), in Java (JNode, JX), and in obscure languages created specifically for this purpose (Inferno, Valix).
Why is that? Aren't all these languages immune to common attacks like buffer overflows?
The limits of managed code
There is no such thing as a purely interpreted language. At some point, one always needs to run something on real hardware, and for this, machine code is needed. All or part of every interpreter is therefore written in an "unsafe" language like assembly, C, or C++. So although managed languages may be protected from things like buffer overflows by design, their interpreters are not. Interpreters are big and complex pieces of software, often more complex than the operating systems they run on, so they will have exploitable bugs. Triggering them is only a matter of feeding the right code to the interpreter.
Worse yet, interpreters are less protected than most compiled programs. By their very nature, they must execute arbitrary code all the time and make untrusted, writable memory regions executable, which is a very bad practice from a security point of view. Unless the OS explicitly supports them, interpreters also defeat sandboxing schemes: since what a given script will do cannot be known in advance (permissions would have to be granted per script), the interpreter itself must be granted every permission that any script it runs might ever need.
Although it is not an intrinsic part of interpreter design, many popular interpreted languages offer a very extensive standard library for cross-CPU and cross-OS portability. Cross-CPU portability is often not a big deal (most modern OSs provide abstractions that make it a breeze), but cross-OS portability is, from a security point of view. That's because it requires the language's standard library to re-implement a large part of each OS's API in order to achieve consistent application behavior across OSs that couldn't care less about each other. More code means more exploits, and effort duplication means that exploits get fixed more slowly. Putting a wider range of responsibilities on the shoulders of interpreter developers also means that they must become familiar with a growing range of system design areas, thus either becoming less specialized or growing in number; both outcomes are bad news as far as code quality and security are concerned.
What modern interpreted languages do, and do well, is reduce the trusted computing base. Average application developers, working 16 hours a day for $800 a month and heavily lacking sleep, are no longer responsible for ensuring system security and avoiding buffer overflow or return-to-libc exploits; only the interpreter's developers are. This means that if interpreters were kept clean and lean, coded by a small number of highly specialized people focusing on JIT compilation performance and rock-solid security, they should indeed result in a major increase in computer security in the long run.
However, this is not how interpreters are currently designed. Languages like Flash or Java aim at being jacks of all trades, offering consistent high-level functionality across every existing platform on Earth. Their standard libraries are bloated with a huge amount of duplicated work done in the name of portability, so interpreter development requires much more manpower and loses focus. Instead of being very good at one thing, the interpreters for these languages become average at everything. Long forgotten are the lessons of the old UNIX philosophy: "Do one thing, and do it well." Too bad, for this idea actually makes a lot of sense, except where doing otherwise truly cannot be avoided.
Interpreter security benefits are also heavily overrated as long as OSs do not offer explicit support for them. In the current context of OSs providing ACLs and sandboxing at the binary level, interpreters become a nasty use case that holds back advances in sandboxing technology, by requiring maximal privileges in order to carry out any order an arbitrary script might give. As an example, there is currently no such thing as an ACL for a web application: either it can't access your home directory, and as such can't be used for things like backup, or it can, and random ads will be able to wipe your home directory without you knowing. Fine-grained security must then be implemented by the interpreters themselves, which results in inconsistent UIs, annoying behaviours, and much reduced security as work is duplicated.
So in my opinion, interpreter security is not currently taken seriously enough to justify the belief that managed code results in improved security. Only time will tell whether security concerns will be considered more carefully in the long run. Until then, I think that managed code security is highly overrated. Convenient – yes, portable – yes, secure – not so much…