Data, Physics, and Life Through the Lens of Information Theory Special Issue Dedicated to the Memory of Toby Berger

2024

Guest editors

Jerry Gibson

Ioannis Kontoyiannis

Yingbin Liang

S. Sandeep Pradhan

Andreas Winter

Ram Zamir

This special issue of the �鶹��ýӳ�� Journal on Selected Areas in Information Theory is dedicated to the memory of Toby Berger, one of the most important information theorists of our time, who passed away in 2022 at the age of 81. He made foundational contributions to a wide range of areas in information theory, including rate-distortion theory, network information theory, quantum information theory, and bio-information theory. He also left a deep imprint on diverse fields in applied mathematics and theoretical engineering, such as Markov random fields, group testing, multiple access theory, and detection and estimation. Well known for his technical brilliance, he tackled many challenging problems, but above all, it is his pursuit of elegance in research and writing that shines throughout his work. The goal of this special issue is to celebrate Toby Berger’s lasting legacy and his impact on information theory and beyond. Original research papers on topics within the realm of his scientific investigations and their “offspring”, as well as expository articles that survey his pioneering contributions and their modern developments, are invited.

Jun Chen Jerry Gibson Ioannis Kontoyiannis Yingbin Liang S. Sandeep Pradhan Andreas Winter Ram Zamir Richard E. Blahut Yasutada Oohama Aaron B. Wagner Raymond W. Yeung

Jingjing Qian Sadaf Salehkalaibar Jun Chen Ashish Khisti Wei Yu Wuxian Shi Yiqun Ge Wen Tong

This paper studies the rate-distortion-perception (RDP) tradeoff for a Gaussian vector source coding problem where the goal is to compress the multi-component source subject to distortion and perception constraints. Specifically, the RDP setting with either the Kullback-Leibler (KL) divergence or Wasserstein-2 metric as the perception loss function is examined, and it is shown that for Gaussian vector sources, jointly Gaussian reconstructions are optimal. We further demonstrate that the optimal tradeoff can be expressed as an optimization problem, which can be explicitly solved. An interesting property of the optimal solution is as follows. Without the perception constraint, the traditional reverse water-filling solution for characterizing the rate-distortion (RD) tradeoff of a Gaussian vector source states that the optimal rate allocated to each component depends on a constant, called the water level. If the variance of a specific component is below the water level, it is assigned a zero compression rate. However, with active distortion and perception constraints, we show that the optimal rates allocated to the different components are always positive. Moreover, the water levels that determine the optimal rate allocation for different components are unequal. We further treat the special case of perceptually perfect reconstruction and study its RDP function in the high-distortion and low-distortion regimes to obtain insight to the structure of the optimal solution.

Venugopal V. Veeravalli Georgios Fellouris George V. Moustakides

In the problem of quickest change detection, a change occurs at some unknown time in the distribution of a sequence of random vectors that are monitored in real time, and the goal is to detect this change as quickly as possible subject to a certain false alarm constraint. In this work we consider this problem in the presence of parametric uncertainty in the post-change regime and controlled sensing. That is, the post-change distribution contains an unknown parameter, and the distribution of each observation, before and after the change, is affected by a control action. In this context, in addition to a stopping rule that determines the time at which it is declared that the change has occurred, one also needs to determine a sequential control policy, which chooses the control action at each time based on the already collected observations. We formulate this problem mathematically using Lorden’s minimax criterion, and assuming that there are finitely many possible actions and post-change parameter values. We then propose a specific procedure for this problem that employs an adaptive CuSum statistic in which (i) the estimate of the parameter is based on a fixed number of the more recent observations, and (ii) each action is selected to maximize the Kullback-Leibler divergence of the next observation based on the current parameter estimate, apart from a small number of exploration times. We show that this procedure, which we call the Windowed Chernoff-CuSum (WCC), is first-order asymptotically optimal under Lorden’s minimax criterion, for every possible value of the unknown post-change parameter, as the mean time to false alarm goes to infinity. We also provide simulation results to illustrate the performance of the WCC procedure.

Sundara Rajan Srinivasavaradhan Pavlos Nikolopoulos Christina Fragouli Suhas Diggavi

We study the problem of group testing with non-identical, independent priors. So far, the pooling strategies that have been proposed in the literature take the following approach: a hand-crafted test design along with a decoding strategy is proposed, and guarantees are provided on how many tests are sufficient in order to identify all infections in a population. In this paper, we take a different, yet perhaps more practical, approach: we fix the decoder and the number of tests, and we ask, given these, what is the best test design one could use? We explore this question for the Definite Non-Defectives (DND) decoder. We formulate a (non-convex) optimization problem, where the objective function is the expected number of errors for a particular design. We find approximate solutions via gradient descent, which we further optimize with informed initialization. We illustrate through simulations that our method can achieve significant performance improvement over traditional approaches.

Ezgi Ozyilkan Johannes Ballé Elza Erkip

We consider lossy compression of an information source when the decoder has lossless access to a correlated one. This setup, also known as the Wyner-Ziv problem, is a special case of distributed source coding. To this day, practical approaches for the Wyner-Ziv problem have neither been fully developed nor heavily investigated. We propose a data-driven method based on machine learning that leverages the universal function approximation capability of artificial neural networks. We find that our neural network-based compression scheme, based on variational vector quantization, recovers some principles of the optimum theoretical solution of the Wyner-Ziv setup, such as binning in the source space as well as optimal combination of the quantization index and side information, for exemplary sources. These behaviors emerge although no structure exploiting knowledge of the source distributions was imposed. Binning is a widely used tool in information theoretic proofs and methods, and to our knowledge, this is the first time it has been explicitly observed to emerge from data-driven learning.

Giuseppe Serra Photios A. Stavrou Marios Kountouris

In this paper, we study the computation of the rate-distortion-perception function (RDPF) for a multivariate Gaussian source assuming jointly Gaussian reconstruction under mean squared error (MSE) distortion and, respectively, Kullback–Leibler divergence, geometric Jensen-Shannon divergence, squared Hellinger distance, and squared Wasserstein-2 distance perception metrics. To this end, we first characterize the analytical bounds of the scalar Gaussian RDPF for the aforementioned divergence functions, also providing the RDPF-achieving forward “test-channel” realization. Focusing on the multivariate case, assuming jointly Gaussian reconstruction and tensorizable distortion and perception metrics, we establish that the optimal solution resides on the vector space spanned by the eigenvector of the source covariance matrix. Consequently, the multivariate optimization problem can be expressed as a function of the scalar Gaussian RDPFs of the source marginals, constrained by global distortion and perception levels. Leveraging this characterization, we design an alternating minimization scheme based on the block nonlinear Gauss–Seidel method, which optimally solves the problem while identifying the Gaussian RDPF-achieving realization. Furthermore, the associated algorithmic embodiment is provided, as well as the convergence and the rate of convergence characterization. Lastly, for the “perfect realism” regime, the analytical solution for the multivariate Gaussian RDPF is obtained. We corroborate our results with numerical simulations and draw connections to existing results.

Xinyi Tong Jian Xu Shao-Lun Huang

In the popular federated learning scenarios, distributed nodes often represent and exchange information through functions or statistics of data, with communicative processes constrained by the dimensionality of transmitted information. This paper investigates the fundamental limits of distributed parameter estimation and model training problems under such constraints. Specifically, we assume that each node can observe a sequence of i.i.d. sampled data and communicate statistics of the observed data with dimensionality constraints. We first show the Cramer-Rao lower bound (CRLB) and the corresponding achievable estimators for the distributed parameter estimation problems, and the geometric insights and the computable algorithms of designing efficient estimators are also presented. Moreover, we consider model parameters training for distributed nodes with limited communicable statistics. We demonstrate that in order to optimize the excess risk, the feature functions of the statistics shall be designed along the largest eigenvectors of a matrix induced by the model training loss function. In summary, our results potentially provide theoretical guidelines of designing efficient algorithms for enhancing the performance of distributed learning systems.

Yanyan Dong Shenghao Yang Jie Wang Fan Cheng

Wireless communication links suffer from outage events caused by fading and interference. To facilitate a tractable analysis of network communication throughput and latency, we propose an outage link model to represent a communication link in the slow fading phenomenon. For a line-topology network with outage links, we study three types of intermediate network node schemes: random linear network coding, store-and-forward, and hop-by-hop retransmission. We provide the analytical formulas for the maximum throughputs and the end-to-end latency for each scheme. To gain a more explicit understanding, we perform a scalability analysis of the throughput and latency as the network length increases. We observe that the same order of throughput/latency holds across a wide range of outage functions for each scheme. We illustrate how our exact formulae and scalability results can be applied to compare different schemes.

Hassan ZivariFard Rémi A. Chou

Consider a source and multiple users who observe the independent and identically distributed (i.i.d.) copies of correlated Gaussian random variables. The source wishes to compress its observations and store the result in a public database such that (i) authorized sets of users are able to reconstruct the source with a certain distortion level, and (ii) information leakage to non-authorized sets of colluding users is minimized. In other words, the recovery of the source is restricted to a predefined access structure. The main result of this paper is a closed-form characterization of the fundamental trade-off between the source coding rate and the information leakage rate. As an example, threshold access structures are studied, i.e., the case where any set of at least t users is able to reconstruct the source with some predefined distortion level and the information leakage at any set of users with a size smaller than t is minimized.

Jay Whang Alliot Nagle Anish Acharya Hyeji Kim Alexandros G. Dimakis

We consider the Distributed Source Coding (DSC) problem concerning the task of encoding an input in the absence of correlated side information that is only available to the decoder. Remarkably, Slepian and Wolf showed in 1973 that an encoder without access to the side information can asymptotically achieve the same compression rate as when the side information is available to it. This seminal result was later extended to lossy compression of distributed sources by Wyner, Ziv, Berger, and Tung. While there is vast prior work on this topic, practical DSC has been limited to synthetic datasets and specific correlation structures. Here we present a framework for lossy DSC that is agnostic to the correlation structure and can scale to high dimensions. Rather than relying on hand-crafted source modeling, our method utilizes a conditional Vector-Quantized Variational auto-encoder (VQ-VAE) to learn the distributed encoder and decoder. We evaluate our method on multiple datasets and show that our method can handle complex correlations and achieves state-of-the-art PSNR.

Ariel K. Feldman Praveen Venkatesh Douglas J. Weber Pulkit Grover

This paper brings together topics of two of Berger’s main contributions to information theory: distributed source coding, and living information theory. Our goal is to understand which information theory techniques can be helpful in understanding a distributed source coding strategy used by the natural world. Towards this goal, we study the example of the encoding of location of an animal by grid cells in its brain. We use information measures of partial information decomposition (PID) to assess the unique, redundant, and synergistic information carried by multiple grid cells, first for simulated grid cells utilizing known encodings, and subsequently for data from real grid cells. In all cases, we make simplifying assumptions so we can assess the consistency of specific PID definitions with intuition. Our results suggest that the measure of PID proposed by Bertschinger et al. (Entropy, 2014) provides intuitive insights on distributed source coding by grid cells, and can be used for subsequent studies for understanding grid-cell encoding as well as broadly in neuroscience.

Ahmed H. Salamah Kaixiang Zheng Linfeng Ye En-Hui Yang

Conventional image compression techniques are primarily developed for the human visual system. However, with the extensive use of deep neural networks (DNNs) for computer vision, more and more images will be consumed by DNN-based intelligent machines, which makes it crucial to develop image compression techniques customized for DNN vision while being JPEG compliant. In this paper, we revisit the JPEG rate distortion theory for DNN vision. First, we propose a novel distortion measure, dubbed the sensitivity weighted error (SWE), for DNN vision. Second, we incorporate SWE into the soft decision quantization (SDQ) process of JPEG to trade SWE for rate. Finally, we develop an algorithm, called OptS, for designing optimal quantization tables for the luminance channel and chrominance channels, respectively. To test the performance of the resulting DNN-oriented compression framework and algorithm, experiments of image classification are conducted on the ImageNet dataset for four prevalent DNN models. Results demonstrate that our proposed framework and algorithm achieve better rate-accuracy (R-A) performance than the default JPEG. For some DNN models, our proposed framework and algorithm provide a significant reduction in the compression rate up to 67.84% with no accuracy loss compared to the default JPEG.

Elad Domanovitz Anatoly Khina Tal Philosof Yuval Kochman

We consider a line network of nodes, connected by additive white noise channels, equipped with local feedback. We study the velocity at which information spreads over this network. For transmission of a data packet, we give an explicit positive lower bound on the velocity, for any packet size. Furthermore, we consider streaming, that is, transmission of data packets generated at a given average arrival rate. We show that a positive velocity exists as long as the arrival rate is below the individual Gaussian channel capacity, and provide an explicit lower bound. Our analysis involves applying pulse-amplitude modulation to the data (successively in the streaming case), and using linear mean-squared error estimation at the network nodes. For general white noise, we derive exponential error-probability bounds. For single-packet transmission over channels with (sub-)Gaussian noise, we show a doubly-exponential behavior, which reduces to the celebrated Schalkwijk–Kailath scheme when considering a single node. Viewing the constellation as an “analog source”, we also provide bounds on the exponential decay of the mean-squared error of source transmission over the network.

Nadim Ghaddar Lele Wang

The problem of coding for the uplink and downlink of cloud radio access networks (C-RAN’s) with K users and L relays is considered. It is shown that low-complexity coding schemes that achieve any point in the rate-fronthaul region of joint coding and compression can be constructed starting from at most $4(K+L)-2$ point-to-point codes designed for symmetric channels. This reduces the seemingly hard task of constructing good codes for C-RAN’s to the much better understood task of finding good codes for single-user channels. To show this result, an equivalence between the achievable rate-fronthaul regions of joint coding and successive coding is established. Then, rate-splitting and quantization-splitting techniques are used to show that the task of achieving a rate-fronthaul point in the joint coding region can be simplified to that of achieving a corner point in a higher-dimensional C-RAN problem. As a by-product, some interesting properties of the rate-fronthaul region of joint decoding for uplink C-RAN’s are also derived.

Michael Gastpar Erixhen Sula

The Shannon lower bound has been the subject of several important contributions by Berger. This paper surveys Shannon bounds on rate-distortion problems under mean-squared error distortion with a particular emphasis on Berger’s techniques. Moreover, as a new result, the Gray-Wyner network is added to the canon of settings for which such bounds are known. In the Shannon bounding technique, elegant lower bounds are expressed in terms of the source entropy power. Moreover, there is often a complementary upper bound that involves the source variance in such a way that the bounds coincide in the special case of Gaussian statistics. Such pairs of bounds are sometimes referred to as Shannon bounds. The present paper puts Berger’s work on many aspects of this problem in the context of more recent developments, encompassing indirect and remote source coding such as the CEO problem, originally proposed by Berger, as well as the Gray-Wyner network as a new contribution.

Sundara Rajan Srinivasavaradhan Pavlos Nikolopoulos Christina Fragouli Suhas Diggavi

Proactive testing and interventions are crucial for disease containment during a pandemic until widespread vaccination is achieved. However, a key challenge remains: Can we accurately identify all new daily infections with only a fraction of tests needed compared to testing everyone, everyday? Group testing reduces the number of tests but overlooks infection dynamics and non i.i.d nature of infections in a community, while on the other hand traditional SIR (Susceptible-Infected-Recovered) models address these dynamics but don’t integrate discrete-time testing and interventions. This paper bridges the gap. We propose a “discrete-time SIR stochastic block model” that incorporates group testing and daily interventions, as a discrete counterpart to the well-known continuous-time SIR model that reflects community structure through a specific weighted graph. We analyze the model to determine the minimum number of daily group tests required to identify all infections with vanishing error probability. We find that one can leverage the knowledge of the community and the model to inform nonadaptive group testing algorithms that are order-optimal, and therefore achieve the same performance as complete testing using a much smaller number of tests.

Michael G Jabbour Nilanjana Datta

Uniform continuity bounds on entropies are generally expressed in terms of a single distance measure between probability distributions or quantum states, typically, the total variation-or trace distance. However, if an additional distance measure is known, the continuity bounds can be significantly strengthened. Here, we prove a tight uniform continuity bound for the Shannon entropy in terms of both the local-and total variation distances, sharpening an inequality in I. Sason, �鶹��ýӳ�� Trans. Inf. Th., 59, 7118 (2013). We also obtain a uniform continuity bound for the von Neumann entropy in terms of both the operator norm-and trace distances. We then apply our results to compute upper bounds on channel capacities. We first refine the concept of approximate degradable channels by introducing (ɛ,ν)—degradable channels. These are ε;-‒close in diamond norm and ν-‒close in completely bounded spectral norm to their complementary channel when composed with a degrading channel. This leads to improved upper bounds to the quantum-and private classical capacities of such channels. Moreover, these bounds can be further improved by considering certain unstabilized versions of the above norms. We show that upper bounds on the latter can be efficiently expressed as semidefinite programs. As an application, we obtain a new upper bound on the quantum capacity of the qubit depolarizing channel.

�鶹��ýӳ��

Table of Contents