Is managed code actually more secure ?

A pg2 article on OSnews about yet another Java vulnerability recently gave me something to think about : is there something intrinsically insecure about Java, Flash, or any other interpreted language for that matter ? Are interpreters actually as good for security as their proponents often claim them to be in the long run ? What do interpreters actually bring, and not bring, in the security department ?

The promises of interpreters

It has been the case in the past and it is becoming the case again : interpreters (aka “virtual machines” or “managed execution environments”) are trendy. Whole categories of software that used to be written exclusively in C or C++ now tend to prefer C#, Python, and Java. This is because these languages bring a whole range of functionality that would be hard to provide with a compiled programming language : cross-platform scripting, dynamic array allocation without manual memory management, deeply integrated garbage collection, dynamic typing, run-time code generation…

So interpreted languages make the developer’s life easier, and as such make coding faster. In the current software development market, where paid developers are seldom treated better than slaves, this is undeniably a precious asset that I will not dispute. I will also overlook the endless interpreter speed debate : yes, JITed code could theoretically perform better than compiled code thanks to per-CPU optimization, and no, this is far from happening nowadays for real-world use cases.

No, what is of interest to me here are the alleged security benefits of using interpreted code over compiled code. Interpreted C# and Java code is often claimed to be somehow more secure than its C/C++ cousin. Buffer overflows, return-to-libc, privilege escalation, and a whole range of other attacks are supposedly prevented by the intrinsically hardened, malware-proof structure of managed languages. In a purely interpreted world, malware would surely become nothing but a bad memory of ancient times, with only social engineering-based attacks remaining, avoidable through better user education. This belief is actually so popular that OS development in managed code is a hot research topic : we have seen OSs written in C# (Singularity, SharpOS), Java (JNode, JX), and obscure languages created specifically for this purpose (Inferno, Valix).

Yet if you look at the top sources of exploits nowadays, you curiously won’t see huge unmanaged programs like Windows or Linux ranking very high. You’ll pretty consistently observe that the top attack vectors on today’s desktops are the JRE, Adobe Reader, and Adobe Flash. A bit more research will also tell you that a great many exploits come from JavaScript code running in web browsers.

Why is it so ? Aren’t all these languages immune to common attacks like buffer overflow ?

The limits of managed code

There is no such thing as a purely interpreted language. At some point, one always needs to run something on real hardware, and for that, machine code is needed. All or part of each interpreter is therefore written in an “unsafe” language like assembly, C, or C++. So although the languages themselves may be protected from things like buffer overflows by design, their interpreter is not. Interpreters are big and complex, often more complex than the operating systems they run on, so they will have exploits. Triggering them is only a matter of feeding the right code into the interpreter.
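
To make this concrete, here is a minimal C sketch of the kind of bug that can hide in an interpreter’s native code. The helper name and buffer size are made up for illustration, but the pattern is real : the managed language bounds-checks its own arrays, while nothing checks this C code.

    #include <string.h>

    /* Hypothetical native helper inside an interpreter: it copies a string
     * coming from a managed script into a fixed-size native buffer.
     * A script that passes a name longer than 63 bytes overflows
     * native_name and corrupts adjacent memory, no matter how safe the
     * managed language itself is. */
    static char native_name[64];

    void set_script_name(const char *name_from_script)
    {
        strcpy(native_name, name_from_script);  /* no length check: classic overflow */
    }

    /* A safer version bounds the copy explicitly. */
    void set_script_name_checked(const char *name_from_script)
    {
        strncpy(native_name, name_from_script, sizeof(native_name) - 1);
        native_name[sizeof(native_name) - 1] = '\0';
    }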

Worse yet, interpreters are less protected than most compiled programs. Due to their very nature, they need to execute arbitrary code all the time and make untrusted, writable memory regions executable, which is a very bad practice from a security point of view. Unless the OS has explicit support for them, interpreters also defeat system-level sandboxing, since their purpose cannot be determined in advance (it would have to be determined on a per-script basis), and as such they need permission to do whatever the scripts they run will ever need to do.
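
As an illustration, here is a rough POSIX sketch of what a JIT compiler inside such an interpreter typically does : request memory that is both writable and executable, which is precisely what hardening policies like W^X and DEP try to forbid. The machine code bytes are hypothetical ; only the protection flags matter here.

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/mman.h>

    typedef int (*jit_func)(void);

    /* Sketch of a JIT code buffer. Mapping the region writable AND executable
     * in one step is convenient, but it gives up the W^X protection that
     * ordinary compiled programs benefit from. A more careful JIT would map
     * PROT_READ|PROT_WRITE first, then flip to PROT_READ|PROT_EXEC with
     * mprotect() once code emission is done. */
    jit_func emit_code(const uint8_t *code, size_t len)
    {
        void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE | PROT_EXEC,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED)
            return NULL;
        memcpy(buf, code, len);
        return (jit_func)buf;
    }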

Although not an intrinsic part of interpreter design, many popular interpreted languages offer a very extensive standard library for cross-CPU and cross-OS program portability. While cross-CPU portability is often not much of a big deal (most modern OSs provide abstractions that make it a breeze), cross-OS portability is, from a security point of view. That’s because it requires the language’s standard library to re-implement a large part of each OS’ API, in order to achieve consistent application behavior across OSs which couldn’t care less. More code means more exploits, effort duplication means that exploits get fixed more slowly, and putting a wider range of responsibilities on the shoulders of interpreter developers means that they have to become familiar with a growing range of system design areas, thus either becoming less specialized or increasing in number, both outcomes being bad news as far as code quality and security are concerned.

Conclusion

What modern interpreted languages do, and do well, is reduce the trusted computing base. Average application developers, working 16h a day for $800 a month and heavily lacking sleep, are not responsible for ensuring system security and avoiding buffer overflow and return-to-libc exploits; only the interpreter’s developers are. This means that if interpreters are kept clean and lean, coded by a small number of highly specialized people who focus on JIT compilation performance and rock-solid security, they should indeed result in a major increase in computer security in the long run.

However, this is not how interpreters are currently designed. Languages like Flash or Java aim at being jacks of all trades, offering consistent high-level functionality across every existing platform on Earth. Their standard library is bloated with a huge amount of duplicated work in the name of portability, so interpreter development requires much more manpower and loses focus. Instead of being very good at one thing, the interpreters for these languages become average at everything. Long forgotten here are the lessons of the old UNIX philosophy : “Do one thing, and do it well”. Too bad, because this idea actually makes a lot of sense wherever it can be applied.

Interpreter security benefits are also heavily overrated when OSs do not offer explicit support for them. In the current context of OSs providing ACLs and sandboxing at the binary level, interpreters become a nasty use case that holds back advances in sandboxing technology, because they require maximal security privileges in order to carry out whatever arbitrary scripts may ask of them. As an example, there is currently no such thing as an ACL for a web application : either it can’t access your home directory, and as such can’t be used for things like backup, or it can, and random ads will be able to wipe your home directory without you knowing. Fine-grained security must then be implemented by interpreters themselves, which results in inconsistent UIs, annoying behaviours, and much reduced security as work is duplicated.

So in my opinion, interpreter security is not currently taken seriously enough for the belief that managed code results in improved security to be justified. Only time will tell whether security concerns will be considered more carefully in the long run. Until then, I think that managed code security is highly overrated. Convenient – yes, portable – yes, secure – not so much…

10 thoughts on “Is managed code actually more secure ?”

  1. Tom Novelli July 13, 2011 / 11:44 pm

    Right. You could make a good case that managed code offers a false sense of security. Sure, GC eliminates one class of low-level exploits. And rewriting large parts of Firefox in JavaScript (for example) could eliminate some of the C++ exploits. But there will always be weaknesses in application logic, and a careless Java/JS/Python/PHP coder is going to make more mistakes than a good C coder.

  2. anon4cec July 14, 2011 / 1:49 pm

    Interesting article.
    I’ve been noticing that when you read security warning bulletins, they hardly ever mention whether the issue only occurs on a machine without a firewall and (good) anti-virus.

    Third-world programmers working 16 hrs a day.
    It’s sad that such highly qualified people should work such long hours for such low pay. But this is starting to happen in richer countries also.
    It made me think about this story below about game programmers.

    http://arstechnica.com/gaming/news/2011/05/the-death-march-the-problem-of-crunch-time-in-game-development.ars

  3. Hadrien July 15, 2011 / 11:12 am

    I’d say that by the time you need antivirus software, the OS’ security is already busted :) Antiviruses just help limit the spread of an attack, because once the identified threat is reported to AV manufacturers, they start building up rules against it so that people with an antivirus are warned in the future.

    I’d like to see an OS that’s secure enough that the large overhead of modern antiviruses is not needed.

    As for firewalls, I don’t know how important they are on the client side. I’ve heard of them being used in large corporate networks, making sure that desktop A that doesn’t need access to server B doesn’t have it. But for the average Joe’s desktop, connected to a network that only includes a printer, a few PCs, and the modem/router ? I don’t know.

    I’ve heard horrifying stories about how professional developers are being treated, and not solely in the third world. Apparently, even in the US, you can still sometimes find 70-hour weeks in some places. This saddens me as a horrible loss of perspective : not only do these people lose their personal life, they also lose their intellectual productivity to a large extent. Under this amount of stress, the human mind becomes fairly poor at creativity and smart solutions, and essentially turns into a dumb machine that turns coffee, food and orders into code. So I don’t understand how this can even make sense from a management point of view, unless the people above them share some horrifying belief that all hours of work are equal.

  4. Luke McCarthy September 5, 2011 / 2:04 am

    “All or part of each interpreter is as such written in an “unsafe” language like Assembly, C, or C++.”

    This is not necessarily true. For example, many Smalltalk interpreters are written in Smalltalk. Of course, you need a way of bootstrapping the system initially, but once that is done there is no more need for any “unsafe” language.

  5. Nick P August 23, 2013 / 3:25 am

    Good write up. I’m gonna play devil’s advocate as I usually do on this concept.

    “Yet if you look at the top sources of exploits nowadays, you curiously won’t see huge unmanaged programs like Windows or Linux ranking in a very high position. ”

    Perhaps the low-hanging fruit was removed after 10+ years of steady exploits? I’ll add that Microsoft has their SDL and Linux kernel hackers have higher than average quality. You’d expect to see fewer vulnerabilities than in the average company’s code, native or otherwise. And fewer over time as they leverage many battle-hardened components in new efforts.

    “You’ll pretty consistently observe that the top attack vectors on nowadays’ desktops are the JRE, Adobe Reader, and Adobe Flash.”

    Yes. Each of these runs overprivileged, is written in an unsafe language, often has a monolithic style, makes extensive use of risky pointers, allows easy execution of code from untrusted sources, and has plenty of built-in functionality. Sounds like Windows in pre-NT days. Would anyone look at that mess and generalize its problems to native code in general? Not wise. ;)

    “Why is it so ? Aren’t all these languages immune to common attacks like buffer overflow ?”

    They aren’t languages. JRE and Flash are language runtimes that, as you pointed out, lack the safe properties of the languages they host. Adobe Reader is a native app written in C++ so it doesn’t even compare.

    To reply to the question directly, though, yes these languages do seem to prove their immunity. I’ve already said why those specific platforms are a good target. However, notice that everyone is targeting the platform itself rather than all the code in the apps. This shows in practice that using managed code removes the apps themselves from attackers’ crosshairs as far as local or code injection attacks go (excluding SQL/web apps). That benefits security by letting app developers focus on what they’re better at and reducing the TCB in general.

    “Interpreters are big and complex, often more complex than the operating systems they run on, so they will have exploits. Triggering them is only a matter of feeding the right code into the interpreter.”

    That’s mostly true in practice. However, this is not an inherent limitation. The correct-by-construction and verified approaches to software development might knock that out like they have other things (incl OS kernels). I mean, I’ve seen prototype and production systems where code injection from input data is impossible by design from the hardware up. And CPUs are interpreters for native bytecode when you get down to it. ;)

    “Due to their very nature, they need to execute arbitrary code all the time and make untrusted writable memory regions executable, which is a very bad practice from a security point of view. ”

    Not inherently. Current problems stem from their developers’ design choices.

    “That’s because it requires the language’s standard library to re-implement a large part of each OS’ API, in order to achieve a consistent application behavior among OSs which couldn’t care less. More code means more exploit, effort duplication means that exploits get fixed more slowly, and putting a wider range of responsibilities on the shoulders of interpreter developers means that they’ll have to become familiar with a growing range of system design areas, thus either becoming less specialized or increasing in number, both outcomes being bad news as far as code quality and security is concerned.”

    That’s currently bad across the board with popular approaches. Some academic work shows us it can help when done right, by wrapping unsafe OS constructs with typesafe, HLL interfaces. I’ve often combined process separation, IPC, mandatory access controls, and API wrapping to prevent vulnerabilities in interactions between an isolated component and OS services. I’m sure such techniques can help with interpreters. Example: Google App Engine isn’t hacked daily despite hosting untrusted Python apps, IIRC.

    Re conclusion

    Your conclusion makes some of my own points too. I’ll throw in a few suggestions about certain issues.

    Implementation

    Write the interpreter in a type-safe systems language. There are several (Ada & Oberon/Modula come to mind). A number of the issues were already covered in OS implementations written in type-safe languages.

    If you can’t use a type-safe language throughout, you can use typesafe programming partially. The use of model checkers, carefully structured internals, and typesafe prototypes can help knock many problems out of the design before it gets close to production. Maybe unsafe code becomes a tiny portion of the interpreter behind a carefully designed interface (eg the H-Layer Tolmach’s people use). Or you go entirely with another language via conversion with careful source mapping. Early Orange Book A1 systems did that between, for ex, the Gypsy spec language and Pascal implementations.

    If it’s unsafe code all the way, try to use layering, information hiding, and reduction of risky constructs anyway. Throw in safe subsets & static analysis for most of it if possible. Move most of the extra risky/privileged stuff into its own module with careful interfaces. The easier it is to mentally analyse, the better the coders will do. Do Fagan-style software inspections to find the errors that most hackers know how to find.

    Library Code

    Ideally, each piece of library code is a guarded, typesafe function within the interpreter. Why do it that way? Well, you get the type safety and info flow analyses that are already happening for the interpreter anyway. Also, if the runtime is generated per app, then any unneeded modules or privileges can be left out. Inline reference monitor type approaches are also easier as every valid path is known ahead of time. So, the total security benefits are reduction of code, reduction of trusted code, managed external libraries, easier attack detection, and possibly easier embedding of other security functionality later on (like provenance tracking or something).

    Native libraries. If outside code is native/unsafe, then you probably can’t rewrite it. Possibilities for enhancement here include tech such as NaCl, binary rewriting to catch certain classes of problems (esp memory), native-to-managed compilers, API wrapping, process isolation, physical isolation, etc. If that all sounds like trouble, well, interfacing with complicated unsafe code in a safe way usually is. And nobody has proof it can be done securely either: new attacks resulting from the complexity might be on their way. NaCl gives us some hope as it’s had a pretty nice effect so far: most problems seem to be found by skilled white hats and fixed.

    Underlying platform

    Relatively easy approach. Use an OS with few security vulnerabilities, a decent number of security features, and mature set of features for files/networking. An enterprise Linux, OpenBSD, and NetBSD come to mind. Strip out every kernel or userland component you can. Put protections in front of the apps and regularly reinstall the OS from a clean copy. Takes care of plenty of problems.

    Safest, but hard, approach. Start with an architecture with trusted boot, signed code loading, simple CPU, hardware-enforced POLA [esp for memory/IO], and hardware-protected control flow. Leverage that, EAL6 type design techniques and safe language/coding to build any trusted native software including VM. Finally, port standard libraries over to make it useful. Exemplary designs to look into for this stuff include KeyKOS (POLA), JX (low TCB interpreter), SPIN (type safe OS w/ dynamic loading), and SAFE architecture’s tagged processor (or alternate HISC chip).

    Middle. Use a microkernel and virtualization to keep trusted and untrusted portions separated. The safer interpreter can be implemented directly on the microkernel. The other stuff can run within the virtualized OS using standard libraries. Communication is done carefully while leveraging IPC, address space and capabilities of underlying kernel. This strategy has been used to host Java and Ada runtimes on top of microkernels like Integrity-178B for safety-critical projects. Nizza and Perseus Security Architectures did similar things for security. I’m sure the idea has more payoff potential.

    Conclusion

    Current managed runtimes aren’t designed for high security. Many even follow well-known security antipatterns. This has predictably resulted in huge amounts of vulnerabilities, past, present & surely future. The good news is that there exist many technologies and practices in security engineering with provable security benefits. At the very least, strong TCB reduction with provably impossible code injection is achievable with existing tech, at added expense and headache for users. A high amount of TCB and risk reduction over traditional apps is also possible with less investment.

    Like with most things, it’s economic and social forces that drive the insecurity. Anyone wanting to go in the opposite direction can just combine some of the ideas I mentioned. Either compartmentalize the risk out of existing platforms or build safer ones from the ground up. Tradeoffs abound in each path. Good luck.

  6. Hadrien August 24, 2013 / 5:05 pm

    @Nick P: First, thank you for this high-quality comment :)

    I agree with your overall point about interpreters shifting part of the security burden away from application developers and towards interpreter developers, and as such enforcing better separation of concerns. I also agree that if they had better designs and were written in safer programming languages like Ada, they could probably do away with part of their security flaws, at the cost of the extra headache of not using C/C++ which is pretty much the lingua franca of native code libraries.

    On the other hand, I am less convinced by the idea that runtimes could do away with excessive privileges and untrusted code execution. Keeping a runtime in an appropriately tight sandbox would require making it highly modular and applying different sandboxing rules to its core functionality and to its extra modules. The latter sounds very difficult to do with modern-day OSs, at least unless each module is run in a separate process, which could easily become pretty wasteful as far as performance is concerned.

    As for arbitrary code execution, I have a hard time understanding how something like the Java runtime could do its job if it were not allowed to translate third-party bytecode to machine code and execute that code. It also sounds hard to design a bytecode that can be proven incapable of doing harm to the system, even on a per-software basis, without strongly sacrificing VM performance. Again, there seems to be a tradeoff between speed and security here.

    Regarding formal verification of code, I like the idea in theory, but as far as I understand it, current implementations have serious limitations. Proving the security of nontrivial code which makes heavy use of pointers and dynamic allocation, as an example, still remains out of the reach of today’s tools as far as I know. Exceptions and hardware interrupts also seem to remain as much of a headache when someone wants to prove something about a piece of code as they are when it comes to actually debugging that piece of code.

  7. Nick P August 24, 2013 / 7:46 pm

    ” First, thank you for this high-quality comment :)”

    You’re very welcome. :)

    “The latter sounds very difficult to do with modern-day OSs, at least unless each module is run in a separate process, which could easily become pretty wasteful as far as performance is concerned.”

    Well, you answered your own question. Using built-in OS restrictions on the processes containing the runtimes is an easy route. Tresys’ SELinux and Argus’ Pitbull solutions are examples here. A recent example making better tradeoffs is Capsicum, a capability framework embedded into FreeBSD. It was just a few percent slower than normal applications and required minimal modification. There are also solutions for memory and control flow integrity for native apps in the published literature with a small performance penalty.
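
    For illustration, here is a minimal FreeBSD Capsicum sketch (hypothetical function name, error handling trimmed) of how a process hosting a runtime could confine itself before touching untrusted scripts:

        #include <sys/capsicum.h>
        #include <fcntl.h>

        /* Open the one file the runtime will need, restrict the descriptor to
         * read-only rights, then enter capability mode so no new global
         * resources (files, sockets, ...) can be acquired afterwards. */
        int run_sandboxed_script(const char *script_path)
        {
            cap_rights_t rights;
            int fd = open(script_path, O_RDONLY);
            if (fd < 0)
                return -1;
            cap_rights_init(&rights, CAP_READ, CAP_FSTAT);
            if (cap_rights_limit(fd, &rights) < 0)
                return -1;
            if (cap_enter() < 0)   /* from here on, only pre-opened descriptors work */
                return -1;
            /* ... hand fd to the interpreter loop ... */
            return 0;
        }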

    “As for arbitrary code execution, I have a hard time understanding how something like the Java runtime could do its job if it were not allowed to translate third-party bytecode to machine code and execute that code.”

    The simplest method is via pre-verification and signed loaders. The Sandia Secure Processor, a Java processor, trades away regular dynamic loading of Java functionality for the signed loading of a static binary. The developers choose what they want on there, compile it, sign it and load it. Plus, it’s an object processor so all code is conformant to the type system. SPIN and JX achieved similar type safety for most of their system code, API and apps. So, the easy method remains safe coding guidelines, pervasive type safety, and/or careful code loading.
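
    Just to sketch that idea in code: refuse to hand bytecode to the interpreter unless a detached signature over it verifies against a baked-in public key. I’m using libsodium’s Ed25519 API purely as a convenient example here, not what those systems actually used.

        #include <sodium.h>

        /* Hypothetical signed loader: unsigned or tampered bytecode never runs. */
        int load_signed_bytecode(const unsigned char *bytecode, unsigned long long len,
                                 const unsigned char sig[crypto_sign_BYTES],
                                 const unsigned char pubkey[crypto_sign_PUBLICKEYBYTES])
        {
            if (sodium_init() < 0)
                return -1;
            if (crypto_sign_verify_detached(sig, bytecode, len, pubkey) != 0)
                return -1;   /* reject before the interpreter ever sees the code */
            /* ... pass bytecode to the (pre-verified) interpreter ... */
            return 0;
        }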

    What about our Windows/Linux desktops? Yes, we don’t have it so easy there, do we? ;) Using a runtime that eliminates direct memory access already helps a whole lot. Untrusted apps interfacing with standardized, well-understood interfaces can help too. Honestly, I think that, outside of writing all the trusted code in something like Ada, the best things we can do are API wrapping/guarding (see Joe-E), OS limitations on the runtime (inc CPU/memory use), design by contract, and static analysis on programs. There’s no silver bullet for safely running untrusted programs on untrustworthy platforms. It’s a bullshit requirement if you ask me, but it’s also the status quo, eh?
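
    As a minimal POSIX sketch of the "OS limitations on the runtime" part (the limit values are arbitrary examples): cap the CPU time and address space of the process hosting the interpreter before it starts running untrusted code.

        #include <sys/resource.h>

        /* Applied once at startup, before any untrusted script executes. */
        static int limit_runtime(void)
        {
            struct rlimit cpu = { .rlim_cur = 30, .rlim_max = 30 };  /* seconds of CPU time */
            struct rlimit mem = { .rlim_cur = 256UL << 20,           /* 256 MiB of address space */
                                  .rlim_max = 256UL << 20 };

            if (setrlimit(RLIMIT_CPU, &cpu) != 0)
                return -1;
            if (setrlimit(RLIMIT_AS, &mem) != 0)
                return -1;
            return 0;
        }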

    There are other ways. One is to use microkernels or capability operating systems in combination with these managed runtimes. Keeping a low amount of kernel code and isolating trusted code behind clear interfaces has proven security benefits. Strong controls on CPU, memory and IPC with capabilities for each service can be done easily with microkernels. POLA and decomposition are natural. So, the app would be broken into pieces with each piece getting just the privileges it needs and only controlled information sharing.

    I was excited to see that the Barracuda Application Server combines a tiny HTTP component, a Lua VM, and INTEGRITY microkernel. I’m not saying they follow all my design considerations. They just seem to be going in a similar direction. Remember also that we can program apps to dump privileges after they no longer need them. That can be *very* helpful when the runtime ultimately needs too much privilege, esp at config/startup.

    (Note: A trusted platform like this could be made with virtualization to run side-by-side with a mainstream OS. And there are many examples of tiny, robust VMMs. The majority of apps would run on the unsafe side and critical stuff on the safe side. A trusted path ensures you know which one you are looking at. Nizza did this for an eCommerce demonstrator, for instance.)

    I used to put critical apps on their own piece of hardware and design the TCB specifically for them. That option is illustrated by two commercial systems: Sentinel’s HYDRA firewall (PPC PCI card) and Secure64’s SourceT OS (Itanium based). NSA said HYDRA lived up to its penetration resistance claims. Matasano’s evaluation of SourceT said its architectural design meant they couldn’t find any way to inject code into it. Both products have plenty of functionality, from networking to application code. And they perform well. So, I don’t feel it’s a stretch of the imagination to think we could do the same for a node that just runs managed code, esp. one application at a time.

    “Regarding formal verification of code, I like the idea in theory, but as far as I understand it, current implementations have serious limitations. Proving the security of nontrivial code which makes heavy use of pointers and dynamic allocation, as an example, still remains out of the reach of today’s tools as far as I know. ”

    They do. Much can be improved by changing the architecture or coding style. There have been many designs and systems deployed that do that to varying degrees. The GEMSOS security kernel used Pascal and call-by-value, for instance. The best work I’ve seen recently is Microsoft’s Verve OS, the SPARK language, and the CompCert compiler. The latter two have been proven via industrial use and independent testing, respectively.

    *If* I used formal verification, I’d apply it to critical parts such as interpreter state machine, interface to native code, memory model, control flow, etc. I’d also use the best tool for each job and just be careful with abstraction mapping. Many people call that a risk. My goal is knocking out errors rather than a stack of math. It seems to get that done. So that is that. ;)

    That leads me to make a quick mention of formal specification vs verification. I read almost all the papers written in the Orange Book days of “verified” security. Most projects that did proofs consistently said that, although proofs caught few errors, “designing for” proofs seemed to catch/prevent all kinds of problems. A precise specification, good models, good structuring, code to spec correspondence… these things that were necessary to begin a proof effort also inherently knocked out many problems and helped find others. So, I’d focus less on “proofs” and more on strong “specs” that evolved with the code side-by-side, one regularly checking the other. A1 VAX Security Kernel took that approach and seL4 did that for proofs on C code too.

    Random Note: I also think it will help if we make sure interpreters accepting arbitrary code only accept “bytecode” with a good semantic definition. Accepting strings of long, complicated commands and programs is a sure way to end up with a complex interpreter, parsing errors, and maybe even injection. I’d rather it be in a format that’s easy to convert to the internal representation, with enough type information to check. The verifier/JIT would be separate from the main interpreter code and small enough to rigorously verify. Keeping a bunch of string processing out of the interpreter also makes it easier to design it as a state engine, something the verification community has extensive tools and experience dealing with.

    Hope some of this helps you or other readers solve some of their security problems. I always say many modern problems were solved in the past to varying degrees. Our industry just has a hard time learning from the past or passing down our experience.

  8. fidel Madibana November 16, 2016 / 7:54 am

    Between an unmanaged language and a managed language, which one is better at protecting against buffer overflow attacks?

  9. Hadrien November 16, 2016 / 9:00 am

    The one with array bounds checking, which is mostly unrelated to managed versus unmanaged. For example, Ada, D and OCaml have array bounds checking.
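
    To illustrate with a minimal, made-up C example: the out-of-bounds write below compiles silently and corrupts whatever happens to sit after the array, while a bounds-checked language would raise an error at the offending access and turn a potential exploit into a clean failure.

        #include <stdio.h>

        int main(void)
        {
            int values[4] = {0, 0, 0, 0};
            int i = 4;          /* one past the last valid index */
            values[i] = 42;     /* undefined behaviour in C, no runtime check */
            printf("%d\n", values[0]);
            return 0;
        }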
