Finding the Dwarf: Recovering Precise Types from WebAssembly Binaries
Thu 16 Jun 2022 23:00 - 23:20 at Toucan - Analysis
The increasing popularity of WebAssembly as a compilation target creates a demand for understanding and reverse engineering WebAssembly binaries. An important first step in this process is to recover the types of functions in the binary. Unfortunately, there currently is no automated approach for obtaining type information beyond the four built-in, low-level types of WebAssembly. This paper presents SnowWhite, a learning-based approach for recovering precise, high-level parameter and return types for WebAssembly functions. SnowWhite distinguishes itself from prior work for other binary formats by representing the types-to-predict in an expressive type language. This language can describe a large number of complex types, instead of the fixed and usually small type vocabulary used in prior binary type prediction approaches. As types are sentences in the type language, we formulate the prediction as a sequence prediction task and build on the success of neural sequence-to-sequence models. We evaluate SnowWhite on a large-scale dataset of 6.3 million type samples extracted from over 300,000 WebAssembly object files. The results show the type language to be more expressive than prior work, precisely describing 1,225 types instead of the 7 to 35 types considered previously. Despite this expressiveness, the type prediction has high accuracy, exactly predicting 44.5% (75.2%) of all parameter types and 57.7% (80.5%) of all return types within the top-1 (top-5) predictions.
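To make the top-1 and top-5 figures concrete, here is a minimal sketch of how exact-match top-k accuracy over type sentences could be computed. It is not SnowWhite's evaluation code. The function name `top_k_accuracy`, the tuple-of-tokens representation, and the type tokens in the sample data are all assumptions made for illustration; the abstract does not specify the actual type language syntax.

```python
from typing import Sequence, Tuple

# Assumption: a predicted type is a "sentence", modeled here as a tuple of tokens.
TypeSentence = Tuple[str, ...]


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[TypeSentence]],
    ground_truth: Sequence[TypeSentence],
    k: int,
) -> float:
    """Fraction of samples whose true type exactly matches one of the
    first k ranked predictions (every token must be identical)."""
    if len(ranked_predictions) != len(ground_truth):
        raise ValueError("need one ranked prediction list per sample")
    if not ground_truth:
        return 0.0
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, ground_truth)
        if truth in ranked[:k]
    )
    return hits / len(ground_truth)


# Hypothetical type sentences; the token names are illustrative only.
truths = [
    ("pointer", "primitive", "char"),
    ("primitive", "int", "32"),
]
predictions = [
    [("pointer", "primitive", "char"), ("pointer", "struct")],
    [("primitive", "uint", "32"), ("primitive", "int", "32")],
]

top1 = top_k_accuracy(predictions, truths, k=1)
top2 = top_k_accuracy(predictions, truths, k=2)
```

In this toy data the first sample is matched by its top-ranked prediction and the second only by its second-ranked one, so the score rises as k grows, which is the same effect behind the gap between the top-1 and top-5 numbers reported above.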
Thu 16 Jun. Displayed time zone: Pacific Time (US & Canada)
10:40 - 12:00
10:40 | 20m Talk | CycleQ: an efficient basis for cyclic equational reasoning | PLDI | Eddie Jones (University of Bristol), C.-H. Luke Ong (University of Oxford), Steven Ramsay (University of Bristol)
11:00 | 20m Talk | Finding the Dwarf: Recovering Precise Types from WebAssembly Binaries | PLDI
11:20 | 20m Talk | Abstract Interpretation Repair | PLDI | Roberto Bruni (University of Pisa), Roberto Giacobazzi (University of Verona), Roberta Gori (University of Pisa), Francesco Ranzato (University of Padova)
11:40 | 20m Talk | Differential Cost Analysis with Simultaneous Potentials and Anti-potentials | PLDI | Đorđe Žikelić (IST Austria), Pauline Bolignano (Amazon), Bor-Yuh Evan Chang (University of Colorado Boulder & Amazon), Franco Raimondi (Amazon & Middlesex University)
22:40 - 00:00
22:40 | 20m Talk | CycleQ: an efficient basis for cyclic equational reasoning | PLDI | Eddie Jones (University of Bristol), C.-H. Luke Ong (University of Oxford), Steven Ramsay (University of Bristol)
23:00 | 20m Talk | Finding the Dwarf: Recovering Precise Types from WebAssembly Binaries | PLDI
23:20 | 20m Talk | Abstract Interpretation Repair | PLDI | Roberto Bruni (University of Pisa), Roberto Giacobazzi (University of Verona), Roberta Gori (University of Pisa), Francesco Ranzato (University of Padova)
23:40 | 20m Talk | Differential Cost Analysis with Simultaneous Potentials and Anti-potentials | PLDI | Đorđe Žikelić (IST Austria), Pauline Bolignano (Amazon), Bor-Yuh Evan Chang (University of Colorado Boulder & Amazon), Franco Raimondi (Amazon & Middlesex University)