Title: Coded Caching and Distributed Computing: Opportunities and Challenges (Download the slides)
Speakers: Aditya Ramamoorthy (Iowa State University, U.S.), Petros Elia (Eurecom, France)
Abstract: This tutorial will bring together the exciting areas of information-theoretic caching and distributed computing, and will discuss powerful ways in which advanced coding can exploit the deep connections between memory, communication and computing. Current research is at a juncture where the massive potential gains from jointly tapping into these three resources is evident. However, it is well understood that the gains are massively limited by fundamental bottlenecks. This requires, not only addressing these bottlenecks head on, but also exploring some of these ideas in a variety of new settings and topologies. The tutorial will explore the role of information theory and coding, in meeting the fundamental performance limits in a variety of such pertinent settings, as well as their role in resolving some of these bottlenecks. Furthermore, we will discuss how the consideration of alternate settings may complement and even accentuate the original gains.
We will first consider recent advances in information-theoretic coded caching, demonstrating the tremendous benefits of applying coding techniques to the caching problem, for a variety of wired as well as wireless settings. In this context, we will offer a top level overview of caching, with a detailed discussion of the underlying coding machinery, offering explanations of the achievable schemes, discussing the corresponding outer bounds, and highlighting more recent results as well as opportunities for future work.
The tutorial will then discuss the basics of distributed computing, and outline the coding connections that connect computing with caching. We will focus on the basic Map Reduce paradigm and demonstrate that information-theoretic approaches allow us to operate on a natural computation vs. communication trade-off. In particular, we will offer a detailed discussion of how judicious placement of tasks on computation nodes along with coded transmission in the shuffle phase, benefits job execution times.
The overall discussion will outline a unified view of the biggest challenges that appear in both these domains that can be key obstacles to realizing the promised gains. In the final part of the tutorial we will present an in-depth discussion of interesting directions based on combinatorics and graph theory for their resolution.
Title: Taming Nonconvexity in Information Science (Download the slides)
Speakers: Yuxin Chen (Princeton University, U.S.), Yuejie Chi (Carnegie Mellon University, U.S.)
Abstract: In various information science applications of interest, parameter estimation is naturally posed as a nonconvex optimization problem, especially in the high-dimensional scenario. In order to reduce the degrees of freedom and to combat the curse of dimensionality, it is often necessary to exploit low-dimensional geometric structures of the information embedded in the data, including sparsity, low-rank structure, and other structural priors. Notably, many important low-dimensional structures are best described using nonconvex constraints. This poses significant challenges --- both statistically and computationally --- for developing globally convergent algorithms with near-optimal statistical guarantees. In recent years, statistical procedures have been developed to promote low-dimensional structures using convex relaxation, which typically lift the problem into higher dimensions and convexify the problem. However such approaches are often computationally expensive.
Motivated by the computational consideration, there is a recent surge in designing nonconvex procedures, in which one attempts to solve the original nonconvex formulation directly. Fortunately, despite the nonconvexity, the loss surface of many information processing tasks exhibits benign geometric structures under natural statistical models, thus enabling provably efficient algorithmic solutions without resorting to convex relaxation. In this tutorial, we will introduce these recent findings and discuss how to design provably fast algorithms that properly exploit such geometric properties.
The tutorial will be organized as follows. We will start by introducing classical convex optimization algorithms and discussing why they are inefficient for solving large-scale information science problems. We will then use several important problems (e.g. low-rank matrix estimation, phase retrieval, robust PCA, mixed linear regression, blind deconvolution, one-layer neural networks) to motivate nonconvex formulations for statistical estimation, and to discuss the hidden convexity properties underlying such problems, such as restricted local convexity and regularity conditions. Specifically, we will discuss two threads of research. The first one focuses on global landscape of such problems and unveils the “no bad local minima” phenomenon. The second one introduces several classes of fast algorithms that properly exploit global/local geometry and other statistical properties of the data. The tutorial will concentrate on gradient-based first-order algorithms, which are well suited for solving large-scale problems. The roles of initialization (including both random initialization and spectral initialization) and regularization procedures will be discussed. We will also emphasize an implicit regularization phenomenon underlying unregularized first-order methods.