Project acronym BIGCODE
Project Learning from Big Code: Probabilistic Models, Analysis and Synthesis
Researcher (PI) Martin Vechev
Host Institution (HI) EIDGENOESSISCHE TECHNISCHE HOCHSCHULE ZUERICH
Call Details Starting Grant (StG), PE6, ERC-2015-STG
Summary The goal of this proposal is to fundamentally change the way we build and reason about software. We aim to develop new kinds of statistical programming systems that provide probabilistically likely solutions to tasks that are difficult or impossible to solve with traditional approaches.
These statistical programming systems will be based on probabilistic models of massive codebases (also known as "Big Code") built via a combination of advanced programming language techniques and powerful machine learning and natural language processing methods. To solve a particular challenge, a statistical programming system will query a probabilistic model, compute the most likely predictions, and present those to the developer.
Based on probabilistic models of "Big Code", we propose to investigate new statistical techniques along three fundamental research directions: i) statistical program synthesis, where we develop techniques that automatically synthesize and predict new programs; ii) statistical prediction of program properties, where we develop new techniques that can predict important facts (e.g., types) about programs; and iii) statistical translation of programs, where we investigate new techniques for translating programs (e.g., from one programming language to another, or into natural language).
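As a minimal illustration of the query mechanism described above (the corpus, model, and function names below are hypothetical and purely for exposition, not the project's actual system), a probabilistic model over code tokens can be trained on a codebase and then queried for its most likely predictions:

from collections import Counter, defaultdict

# Toy stand-in for "Big Code": a few tokenized code snippets.
corpus = [
    ["for", "i", "in", "range", "(", "n", ")", ":"],
    ["for", "x", "in", "items", ":"],
    ["if", "x", "in", "items", ":"],
]

# Train a bigram model: count how often each token follows a given token.
model = defaultdict(Counter)
for snippet in corpus:
    for prev, nxt in zip(snippet, snippet[1:]):
        model[prev][nxt] += 1

def most_likely(prev_token, k=3):
    """Query the model: return the k most likely next tokens with their probabilities."""
    counts = model[prev_token]
    total = sum(counts.values())
    return [(tok, count / total) for tok, count in counts.most_common(k)]

print(most_likely("in"))  # -> [('items', 0.666...), ('range', 0.333...)]

A real system would replace the bigram counts with far richer probabilistic models and present the top predictions to the developer inside their tools.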
We believe the research direction outlined in this interdisciplinary proposal opens a new and exciting area of computer science. This area will combine sophisticated statistical learning and advanced programming language techniques to build the next generation of statistical programming systems.
We expect the results of this proposal to have an immediate impact upon millions of developers worldwide, triggering a paradigm shift in the way tomorrow's software is built, as well as a long-lasting impact on scientific fields such as machine learning, natural language processing, programming languages and software engineering.
Max ERC Funding
1 500 000 €
Duration
Start date: 2016-04-01, End date: 2021-03-31
Project acronym DAPP
Project Data-centric Parallel Programming
Researcher (PI) Torsten Hoefler
Host Institution (HI) EIDGENOESSISCHE TECHNISCHE HOCHSCHULE ZUERICH
Call Details Starting Grant (StG), PE6, ERC-2015-STG
Summary We address a fundamental and increasingly important challenge in computer science: how to program large-scale heterogeneous parallel computers. Society relies on these computers to satisfy the growing demands of important applications such as drug design, weather prediction, and big data analytics. Architectural trends make heterogeneous parallel processors the fundamental building blocks of computing platforms ranging from quad-core laptops to million-core supercomputers; failing to exploit these architectures efficiently will severely limit the technological advance of our society. Computationally demanding problems are often inherently parallel and can readily be compiled for various target architectures. Yet, efficiently mapping data to the target memory system is notoriously hard, and the cost of fetching two operands from remote memory is already orders of magnitude higher than that of any arithmetic operation. Data access cost grows with the amount of parallelism, which makes data layout optimizations crucial. Prevalent parallel programming abstractions largely ignore data access and guide programmers to design threads of execution that are scheduled onto the machine. We depart from this control-centric model to a data-centric program formulation in which we express programs as collections of values, called memlets, that are mapped as first-class objects by the compiler and runtime system. Our holistic compiler and runtime system aims to substantially advance the state of the art in parallel computing by combining static and dynamic scheduling of memlets onto complex heterogeneous target architectures. We will demonstrate our methods on three challenging real-world applications in scientific computing, data analytics, and graph processing. We strongly believe that, without holistic data-centric programming, the growing complexity and inefficiency of parallel programming will create a scaling wall that will limit our future computational capabilities.
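As a purely illustrative sketch of the contrast with control-centric programming (all names below, including Memlet and Tasklet, are hypothetical pseudocode for this summary, not the project's actual system or API), a data-centric formulation makes every data access an explicit, first-class object that a compiler and runtime are free to map onto the target memory system:

from dataclasses import dataclass

@dataclass
class Memlet:
    """First-class description of data movement: which elements of which container are accessed."""
    container: str   # name of the data container, e.g. "A"
    subset: tuple    # half-open element range accessed, e.g. (0, 1024)

@dataclass
class Tasklet:
    """A small unit of computation annotated with its incoming and outgoing memlets."""
    name: str
    reads: tuple
    writes: tuple

# A toy dataflow program: each tasklet declares exactly what data it moves, so a
# scheduler can place computation and lay out containers on a heterogeneous
# machine (CPUs, GPUs, distributed memory) without rewriting the program.
program = (
    Tasklet("scale",  reads=(Memlet("A", (0, 1024)),), writes=(Memlet("B", (0, 1024)),)),
    Tasklet("reduce", reads=(Memlet("B", (0, 1024)),), writes=(Memlet("sum", (0, 1)),)),
)

for t in program:
    print(t.name, "reads", [m.container for m in t.reads], "writes", [m.container for m in t.writes])

In a control-centric model the programmer would instead spell out threads and their schedule, leaving the data layout implicit and hard to optimize.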
Max ERC Funding
1 499 672 €
Duration
Start date: 2016-06-01, End date: 2021-05-31
Project acronym FLIRT
Project Fluid Flows and Irregular Transport
Researcher (PI) Gianluca Crippa
Host Institution (HI) UNIVERSITAT BASEL
Call Details Starting Grant (StG), PE1, ERC-2015-STG
Summary "Several important partial differential equations (PDEs) arising in the mathematical description of physical phenomena exhibit transport features: physical quantities are advected by velocity fields that drive the dynamics of the system. This is the case for instance for the Euler equation of fluid dynamics, for conservation laws, and for kinetic equations.
A ubiquitous feature of these phenomena is their intrinsic lack of regularity. From the mathematical point of view this stems from the nonlinearity and/or nonlocality of the PDEs. Moreover, the lack of regularity also encodes actual properties of the underlying physical systems: conservation laws develop shocks (discontinuities that propagate in time), and solutions to the Euler equation exhibit rough and "disordered" behaviors. This irregularity is the major difficulty in the mathematical analysis of such problems, since it prevents the use of many standard methods, foremost the classical (and powerful) theory of characteristics.
For these reasons, the study, in a non-smooth setting, of transport and continuity equations and of flows of ordinary differential equations is a fundamental tool for approaching challenging and important questions concerning these PDEs.
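For reference (standard textbook notation, not part of the proposal text), the objects just mentioned are, for a velocity field b, the transport equation, the continuity equation, and the flow of the associated ordinary differential equation:

\partial_t u + b \cdot \nabla u = 0, \qquad
\partial_t \rho + \operatorname{div}(\rho\, b) = 0, \qquad
\frac{\mathrm{d}}{\mathrm{d}t} X(t,x) = b\bigl(t, X(t,x)\bigr), \quad X(0,x) = x.

When b fails to be Lipschitz, the classical Cauchy-Lipschitz theory of characteristics no longer applies, which is precisely the non-smooth setting considered here.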
This project aims at establishing:
(1) deep insight into the structure of solutions of nonlinear PDEs, in particular the Euler equation and multidimensional systems of conservation laws,
(2) rigorous bounds for mixing phenomena in fluid flows, phenomena for which giving a precise mathematical formulation is extremely challenging.
The unifying factor of this proposal is that the analysis will rely on major advances in the theory of flows of ordinary differential equations in a non-smooth setting, thus providing a robust formulation via characteristics for the PDEs under consideration. The guiding thread is the crucial role of geometric measure theory techniques, which are extremely effective for describing and investigating irregular phenomena.
Max ERC Funding
1 009 351 €
Duration
Start date: 2016-06-01, End date: 2021-05-31
Project acronym GRAPHCPX
Project A graph complex valued field theory
Researcher (PI) Thomas Hans Willwacher
Host Institution (HI) EIDGENOESSISCHE TECHNISCHE HOCHSCHULE ZUERICH
Call Details Starting Grant (StG), PE1, ERC-2015-STG
Summary The goal of the proposed project is to create a universal (AKSZ-type) topological field theory with values in graph complexes, capturing the rational homotopy types of manifolds, configuration spaces, and embedding spaces.
If successful, such a theory will unite certain areas of mathematical physics, topology, homological algebra and algebraic geometry. More concretely, from the physical viewpoint it would give a precise topological interpretation of a class of well-studied topological field theories, as opposed to the current state of the art, in which these theories are defined by giving formulae without guarantees on the non-triviality of the produced invariants.
From the topological viewpoint such a theory will provide new tools to study much-sought-after objects such as configuration and embedding spaces, and tentatively also diffeomorphism groups, through small combinatorial models given by Feynman diagrams. In particular, this will unite and extend existing graphical models of configuration and embedding spaces due to Kontsevich, Lambrechts, Volic, Arone, Turchin and others.
From the homological algebra viewpoint a field theory as above provides a wealth of additional algebraic structures on the graph complexes, which are some of the most central and most mysterious objects in the field.
Such algebraic structures are expected to yield constraints on the graph cohomology, as well as ways to construct series of previously unknown classes.
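For background (standard material on Kontsevich's graph complexes, not text from the proposal), the differential on such a graph complex acts by contracting edges, schematically

d\Gamma \;=\; \sum_{e \in E(\Gamma)} \pm\, \Gamma / e,

where \Gamma/e denotes the graph obtained from \Gamma by contracting the edge e and the signs are fixed by a choice of orientation; the additional algebraic structures mentioned above would enrich complexes of this kind.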
Max ERC Funding
1 162 500 €
Duration
Start date: 2016-07-01, End date: 2021-06-30