ExeBench: An ML-scale dataset of executable C functions
Tue 14 Jun 2022 04:15 - 04:30 at Boardroom - Evening
Machine learning promises to transform compilation and software engineering, yet is frequently limited by the scope of available datasets. In particular, there is a lack of the runnable, real-world datasets required for tasks ranging from neural program synthesis to machine learning-guided program optimization. We introduce a new dataset, ExeBench, which attempts to address this. It tackles two key issues with real-world code: references to external types and functions, and the scalable generation of IO examples. ExeBench is the first publicly available dataset that pairs real-world C code taken from GitHub with IO examples that allow these programs to be run. We develop a toolchain that scrapes GitHub, analyzes the code, and generates runnable snippets of code. We analyze our benchmark suite using several metrics, and show that it is representative of real-world code. ExeBench contains 4.5M compilable and 700k executable C functions. This scale of executable, real functions will enable the next generation of machine learning-based programming tasks.
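To make the idea of pairing a C function with IO examples concrete, here is a minimal sketch. It is not taken from ExeBench and does not reflect its actual data format or toolchain: the function `manhattan`, the type `point_t` (standing in for a type that would normally come from an external header), the `io_example_t` record, and the helper `count_passing` are all hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an external type: in scraped code this
 * definition would live in a header that is not part of the snippet,
 * so it has to be supplied before the function can compile. */
typedef struct {
    int x;
    int y;
} point_t;

/* Hypothetical "real-world" function under test:
 * Manhattan distance of a point from the origin. */
int manhattan(const point_t *p) {
    assert(p != NULL);
    int ax = p->x < 0 ? -p->x : p->x;
    int ay = p->y < 0 ? -p->y : p->y;
    return ax + ay;
}

/* One IO example: an input value and the output expected for it. */
typedef struct {
    point_t input;
    int expected;
} io_example_t;

/* Illustrative IO examples (written by hand here, not generated). */
const io_example_t EXAMPLES[] = {
    {{0, 0}, 0},
    {{3, -4}, 7},
    {{-2, -5}, 7},
};

/* Runs the function on every example and returns how many
 * produced the expected output. */
size_t count_passing(void) {
    size_t n = sizeof EXAMPLES / sizeof EXAMPLES[0];
    size_t passed = 0;
    for (size_t i = 0; i < n; i++) {
        if (manhattan(&EXAMPLES[i].input) == EXAMPLES[i].expected) {
            passed++;
        }
    }
    return passed;
}
```

Calling `count_passing()` from a `main` function returns 3 for the three examples above. The sketch touches both issues named in the abstract: the external type has to be made available before the function compiles, and the function only becomes executable once inputs and expected outputs accompany it.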
Mon 13 Jun (displayed time zone: Pacific Time (US & Canada))
15:30 - 17:00
15:30 | 45m | Keynote | Unsupervised Program Synthesis: Hierarchy and Perception (MAPS) | Kevin Ellis, Cornell University
16:15 | 15m | Talk | ExeBench: An ML-scale dataset of executable C functions (MAPS) | Jordi Armengol-Estapé, University of Edinburgh; Jackson Woodruff, University of Edinburgh; Alexander Brauckmann, University of Edinburgh; José Wesley de Souza Magalhães, University of Edinburgh; Michael F. P. O'Boyle, University of Edinburgh
16:30 | 15m | Talk | Automatically Debugging AutoML Pipelines Using Maro: ML Automated Remediation Oracle (MAPS)
16:45 | 15m | Talk | A Graph Neural Network-based performance model for Deep Learning Applications (MAPS) | Shikhar Singh, University of Texas; James Hegarty, Facebook; Hugh Leather, University of Edinburgh, UK; Benoit Steiner, Facebook
Tue 14 Jun (displayed time zone: Pacific Time (US & Canada))
03:30 - 05:00
03:30 | 45m | Keynote | Unsupervised Program Synthesis: Hierarchy and Perception (MAPS) | Kevin Ellis, Cornell University
04:15 | 15m | Talk | ExeBench: An ML-scale dataset of executable C functions (MAPS) | Jordi Armengol-Estapé, University of Edinburgh; Jackson Woodruff, University of Edinburgh; Alexander Brauckmann, University of Edinburgh; José Wesley de Souza Magalhães, University of Edinburgh; Michael F. P. O'Boyle, University of Edinburgh
04:30 | 15m | Talk | Automatically Debugging AutoML Pipelines Using Maro: ML Automated Remediation Oracle (MAPS)
04:45 | 15m | Talk | A Graph Neural Network-based performance model for Deep Learning Applications (MAPS) | Shikhar Singh, University of Texas; James Hegarty, Facebook; Hugh Leather, University of Edinburgh, UK; Benoit Steiner, Facebook