A formally verified compiler is a compiler that comes with a machine-checked proof that no bug is introduced during compilation. This correctness property states that the compiler preserves the semantics of programs. Formally verified compilers guarantee the absence of correctness bugs, but do not protect against other classes of bugs, such as security bugs. This limitation partly arises from the traditional form of stating compiler correctness as preservation of a semantics that does not capture non-functional properties such as security. Moreover, proof techniques for compiler correctness, including the traditional notions of simulation, do not immediately apply to secure compilation and need to be extended accordingly.
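As a rough sketch (this is not CompCert's exact statement, which also accounts for source programs with undefined behavior), semantic preservation can be phrased as backward refinement of observable behaviors:

\[ \mathit{compile}(S) = \mathsf{OK}(C) \;\wedge\; C \Downarrow B \;\Longrightarrow\; S \Downarrow B \]

that is, any observable behavior B of the compiled program C is already a behavior of the source program S.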
This talk will address the challenges of secure compilation from the specific angle of turning an existing formally verified compiler into a formally verified secure compiler. Two case studies will illustrate this approach; each addresses a notion of security and uses modular reasoning (first proving correctness, then security) to show that compilation preserves security. Specifically, we consider the problem of secure compilation for CompCert, a formally verified, moderately optimizing compiler for C programs, programmed and verified using the Coq proof assistant [1]. CompCert has evolved significantly over the last 15 years, starting as an academic project and now being used in commercial settings [2].
The first case study focuses on software fault isolation and considers a novel security-enhancing sandboxing transformation [3]; it ensures that an untrusted module cannot escape its dedicated isolated address space. The second case study [4] focuses on side-channel protection and considers cryptographic constant-time, a popular software-based countermeasure against timing-based and cache-based attacks. Informally, an implementation is secure with respect to the cryptographic constant-time policy if its control flow and sequence of memory accesses do not depend on secrets.
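To make the policy concrete, here is a minimal illustration (our own sketch, not code from [4]; Python's actual execution times are not constant, so this only illustrates the policy at the level of control flow):

```python
# Illustrative sketch (not code from [4]): the constant-time policy
# rules out secret-dependent branches and memory accesses.

def select_branching(secret_bit: int, a: int, b: int) -> int:
    # NOT constant-time: control flow depends on the secret.
    return a if secret_bit else b

def select_ct(secret_bit: int, a: int, b: int) -> int:
    # Constant-time style: a bitmask replaces the branch, so control
    # flow is independent of secret_bit (32-bit non-negative inputs).
    mask = -secret_bit & 0xFFFFFFFF   # all ones if secret_bit == 1, else 0
    return (a & mask) | (b & ~mask & 0xFFFFFFFF)

assert select_ct(1, 7, 9) == 7 and select_ct(0, 7, 9) == 9
```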
Logic Programming (LP) is considered relatively simple for non-programmers, as it allows the developer to focus on the facts and rules of a logical derivation rather than on algorithms. Secure multiparty computation (MPC) is a methodology that allows several parties to process private data collaboratively without revealing the data to any party. In this paper, we bring together the notions of MPC and LP, allowing users to write privacy-preserving applications in a logic programming language.
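As a minimal illustration of the MPC side (a generic additive secret-sharing sketch, not the specific protocols used in the paper):

```python
import random

MODULUS = 2**61 - 1  # a public modulus; shares are uniform mod this

def share(secret: int, n_parties: int) -> list[int]:
    """Split a secret into n additive shares; any n-1 shares reveal nothing."""
    shares = [random.randrange(MODULUS) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % MODULUS)
    return shares

def reconstruct(shares: list[int]) -> int:
    return sum(shares) % MODULUS

# Parties can add shared values locally: share-wise addition of two
# sharings reconstructs to the sum of the two secrets.
x_shares = share(20, 3)
y_shares = share(22, 3)
z_shares = [(a + b) % MODULUS for a, b in zip(x_shares, y_shares)]
assert reconstruct(z_shares) == 42
```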
Memory-trace Obliviousness (MTO) is a noninterference property: programs that enjoy it have neither explicit nor implicit information leaks, even when the adversary can observe the program counter and the address trace of memory accesses. Probabilistic MTO (PMTO) relaxes MTO to accept probabilistic programs. In prior work, we developed λobliv, whose type system aims to enforce PMTO [2]. We showed that λobliv could typecheck (recursive) Tree ORAM [6], a sophisticated algorithm that implements a probabilistically oblivious key-value store. We conjectured that λobliv ought to be able to typecheck more optimized oblivious data structures (ODSs) [8], but that its type system was as yet too weak.
In this short paper we show we were wrong: ODSs cannot be implemented in λobliv because they are not actually PMTO, due to the possibility of overflow, which occurs when an oram_write silently fails due to a local lack of space. This was surprising to us because Tree ORAM can also overflow but is still PMTO. The paper explains what is going on and sketches the task of adapting the PMTO property, and λobliv's type system, to characterize ODS security.
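For intuition about what MTO demands, consider a trivially oblivious lookup (our own sketch, unrelated to λobliv's syntax): it touches every slot in a fixed order, so the address trace reveals nothing about the secret index; Tree ORAM achieves the same guarantee with randomized, polylogarithmic cost.

```python
def oblivious_read(memory: list[int], secret_index: int) -> int:
    # Fixed, public access pattern: every slot is read exactly once,
    # in the same order, regardless of secret_index.
    result = 0
    for i in range(len(memory)):
        hit = int(i == secret_index)   # secret-dependent data, not addresses
        result = hit * memory[i] + (1 - hit) * result
    return result

assert oblivious_read([10, 20, 30, 40], 2) == 30
```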
Types indexed with extra type-level information are a powerful tool for statically enforcing domain-specific security properties. In many cases, this extra information is runtime-irrelevant, and so it can be completely erased at compile time without degrading the performance of the compiled code. In practice, however, the added bureaucracy often disrupts the development process, as programmers must fully satisfy complex new constraints before their code will even compile.
In this work we present WRIT, a plugin for the GHC Haskell compiler that relaxes the type checking process in the presence of runtime-irrelevant constraints. In particular, WRIT can automatically coerce between runtime equivalent types, allowing users to run programs even in the presence of some classes of type errors. This allows us to gradually secure our code while still being able to compile at each step, separating security concerns from functional correctness.
Moreover, we present a novel way to specify which types should be considered equivalent for the purpose of allowing the program to run, how ambiguity at the type level should be resolved and which constraints can be safely ignored and turned into warnings.
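To illustrate the kind of runtime-irrelevant indexing at stake (a language-neutral sketch with hypothetical Public/Secret labels, expressed here in Python rather than WRIT's Haskell setting):

```python
from typing import Generic, TypeVar

L = TypeVar("L")          # a phantom security label: erased at runtime

class Public: ...
class Secret: ...

class Labeled(Generic[L]):
    """A value indexed by a runtime-irrelevant label L.  Only the
    static checker sees L; at runtime every Labeled is the same wrapper."""
    def __init__(self, value: int) -> None:
        self.value = value

def declassify_ok(x: Labeled[Public]) -> int:
    return x.value

secret: Labeled[Secret] = Labeled(42)
# declassify_ok(secret)   # rejected by a static checker, yet the two
                          # types are identical once labels are erased --
                          # exactly the kind of coercion WRIT can mediate.
```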
Verification techniques have been applied to the design of secure protocols for decades. However, relatively few efforts have been made to ensure that verified designs are also implemented securely. Static code verification techniques offer one way to bridge the verification gap between design and implementation, but require substantial expertise and manual labor to realize in practice. In this short paper, we propose black-box runtime verification as an alternative approach to extend the security guarantees of protocol designs to their implementations. Instead of instrumenting the complete protocol implementation, our approach only requires instrumenting common cryptographic libraries and network interfaces with a runtime monitor that is automatically synthesized from the protocol specification. This lightweight technique allows the effort for instrumentation to be shared among different protocols and ensures security with presumably minimal performance overhead.
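As a minimal sketch of the idea (hypothetical event names and API, not the synthesized monitors themselves), a monitor derived from a protocol state machine can sit between the application and an instrumented network interface:

```python
class ProtocolMonitor:
    """Minimal sketch of a black-box runtime monitor: a state machine
    synthesized from the protocol spec checks each event emitted by
    instrumented crypto/network library calls."""

    TRANSITIONS = {          # e.g. a two-message handshake
        ("init", "send_hello"): "await_ack",
        ("await_ack", "recv_ack"): "done",
    }

    def __init__(self) -> None:
        self.state = "init"

    def observe(self, event: str) -> None:
        key = (self.state, event)
        if key not in self.TRANSITIONS:
            raise RuntimeError(f"protocol violation: {event} in {self.state}")
        self.state = self.TRANSITIONS[key]

monitor = ProtocolMonitor()

def monitored_send(sock, payload: bytes, event: str) -> None:
    monitor.observe(event)   # instrumentation point in the network layer
    sock.sendall(payload)
```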
It is easier than ever before to build complex web applications that handle sensitive user data. At the same time, regulatory shifts have made data breaches more costly than ever before.
While starting Akita, I discovered just how difficult it is for software teams to maintain an up-to-date picture of how sensitive data flows across complex applications. A major challenge is that modern web applications run across many heterogeneous components, often communicating via remote procedure calls. Unfortunately, network calls subvert all known software analysis methods for the application layer, and network tools alone do not yield the full picture. The result is that developers end up piecing the whole story together by reading code, logs, and documentation.
At Akita, we observed that network-based application programming interfaces (APIs) are both a root cause of what we call the Software Heterogeneity Problem and also the key to its solution. The proliferation of APIs for both internal and external use, with the rise of service-oriented architectures and the growth of the API economy, has made it easy to quickly build applications that are amalgams of cross-service network calls. At the same time, there is consolidation around a handful of interface definition languages for web APIs. This makes it possible for us to address the Software Heterogeneity Problem by applying programming languages techniques at the API layer.
In this talk, I will introduce the Software Heterogeneity Problem and its consequences, demonstrate one way to tackle it at the API layer, and outline API-level security problems I believe we can solve as a community.
Reverse engineering unknown binary message formats is an important part of security research. Error detecting codes such as checksums and Cyclic Redundancy Check codes (CRCs) are commonly added to messages as a guard against corrupt or untrusted input. Before an analyst can manufacture input for software that uses checksums, they must discover the algorithm used to calculate a valid checksum. To address this need, we have developed a program-synthesis-based approach for detecting and reverse-engineering checksum algorithms automatically.
Our approach takes a small set of binary messages as input and automatically returns a Python implementation of the checksum algorithm if one can be found. Our approach first performs a search over the message space to identify the location of the checksum and then uses program synthesis to identify the operations performed on the message to compute the checksum. We return runnable code to the user both to calculate a checksum from a message and to validate a message according to the checksum algorithm. We also generate unit tests, allowing the user to validate that the synthesized checksum algorithm is correct with respect to the input messages.
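For instance, the synthesized output might look like the following (a hypothetical example we constructed, showing a simple one-byte additive checksum stored in the final byte of the message):

```python
# Hypothetical example of synthesized output: a 1-byte additive
# checksum over the message body, stored in the final byte.

def checksum(body: bytes) -> int:
    return sum(body) & 0xFF

def validate(message: bytes) -> bool:
    body, stored = message[:-1], message[-1]
    return checksum(body) == stored

# A generated unit test pins the implementation to the input corpus.
def test_example_message() -> None:
    msg = bytes([0x01, 0x02, 0x03, 0x06])   # body 01 02 03, checksum 06
    assert validate(msg)

test_example_message()
```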
We created the Tufts Checksum Corpus, comprising 12 checksum inference questions collected from posts on reverse engineering question-and-answer sites and 2 instances of common internet protocol checksums.
Our approach successfully synthesized the underlying checksum algorithms for 12 out of 14 cases in our test suite.
Recent efforts have sought to design new smart contract programming languages that make writing blockchain programs safer. But programs on the blockchain are beholden only to the safety properties enforced by the blockchain itself: even the strictest language-only properties can be rendered moot on a language-oblivious blockchain due to inter-contract interactions. Consequently, while safer languages are a necessity, fully realizing their benefits necessitates a language-aware redesign of the blockchain itself.
To this end, we propose that the blockchain be viewed as a typechain: a chain of typed programs, not arbitrary blocks, that are included iff they typecheck against the existing chain. Reaching consensus, or blockchecking, validates typechecking in a Byzantine fault-tolerant manner. Safety properties traditionally enforced by a runtime are instead enforced by a type system, with the aim of statically capturing smart contract correctness.
To provide a robust level of safety, we contend that a typechain must minimally guarantee (1) asset linearity and liveness, (2) physical resource availability, including CPU and memory, (3) exceptionless execution, or no early termination, (4) protocol conformance, or adherence to some state machine, and (5) inter-contract safety, including reentrancy safety. Despite their exacting nature, typechains are extensible, allowing for rich libraries that extend the set of verified properties. We expand on typechain properties and present examples of real-world bugs they prevent.
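As a minimal sketch of the core discipline (hypothetical interfaces; a real typechain would typecheck rich contract types, not strings), a node appends a program only if it typechecks against the chain built so far:

```python
from typing import Callable, List

# Sketch of the typechain discipline: a block is a typed program, and a
# node appends it only if it typechecks against the existing chain.

TypeChecker = Callable[[List[str], str], bool]

def append_block(chain: List[str], program: str,
                 typechecks: TypeChecker) -> bool:
    """Append `program` iff it typechecks against the existing chain;
    consensus ("blockchecking") would replicate this check across
    Byzantine fault-tolerant validators."""
    if not typechecks(chain, program):
        return False          # ill-typed programs never enter the chain
    chain.append(program)
    return True
```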