After my HCS talk last week, a grad student who was in attendance mailed to ask for my thoughts about the intersection of security and programming languages.
I’ve received this question with some frequency, and even gave a brief talk about it last year. The subject matter is rather nuanced, and providing an explanation that does it justice would take a lot of effort, so it’s been sitting on my “to properly write about when I have some time” pile for quite a while now. Unfortunately, it recently became clear to me that The Pile is mostly a black hole. Not wishing to sorely disappoint Greg the Grad Student, I sent him the following sketch of an answer.
If I had to grossly overgeneralize, I’d say people looking at language security fall in roughly three schools of thought:
- The “My name is Correctness, king of kings” people say that security problems are merely one manifestation of incorrectness, which is dissonance between what the program is supposed to do and what its implementation actually does. This tends to be the group led by mathematicians, and you can recognize them because their solutions revolve around proofs and the writing and (automatic) verification thereof.
- The “If you don’t use a bazooka, you can’t blow things up” people say that security problems are a byproduct of exposing insufficiently intelligent or well-trained programmers to dangerous language features that don’t come with a safety interlock. You can identify these guys because they tend to make new languages that no one uses, and frequently describe them as “like popular language X but safer”.
- The “We need to change how we fundamentally build software” people say that security problems are the result of having insufficiently fine-grained methods for delegating individual bits of authority to individual parts of a running program, which traditionally results in all parts of a program having all the authority, which means the attack surface becomes a Cartesian product of every part of the program and every bit of authority which the program uses. You can spot these guys because they tend to throw around the phrase “object-capability model”.
Now, while I’m already grossly overgeneralizing, I think the first group is almost useless, the second group is almost irrelevant, and the third group is absolutely horrible at explaining what the hell they’re talking about.
(If I was trying to be less overly general, I’d mention that in some instances the groups overlap substantially, and some subsets of these groups, such as the subset of group 2 that’s working on SFI and sandboxing, are relevant and occasionally produce good work.)
Combined, these will give you enough references to chase the subject matter as far down the rabbit hole as you dare descend. Good luck, and may the gods have mercy on your soul.