The Worst-Case Data-Generating Probability Measure in Statistical Learning
The worst-case data-generating (WCDG) probability measure is introduced as a tool for characterizing the generalization capabilities of machine learning algorithms.
Linear regression is a fundamental primitive in supervised machine learning, with applications ranging from epidemiology to finance. In this work, we propose methods for speeding up distributed linear regression. We do so by leveraging randomized techniques, while also ensuring security and straggler resiliency in asynchronous distributed computing systems. Specifically, we randomly rotate the basis of the system of equations and then subsample blocks, to simultaneously secure the information and reduce the dimension of the regression problem.
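As a rough illustration of the rotate-then-subsample idea, here is a minimal sketch under simplifying assumptions (a dense Gaussian-QR rotation, uniform block sampling, and no modelling of the security or straggler aspects; all sizes and names are illustrative, not the authors' exact scheme):

```python
# Randomized rotation followed by block subsampling for overdetermined least squares.
import numpy as np

rng = np.random.default_rng(0)
n, d = 1200, 20                                  # rows and features (illustrative sizes)
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.01 * rng.standard_normal(n)

# 1) Randomly rotate the basis of the system: G = Q A, h = Q b, with Q orthogonal.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
G, h = Q @ A, Q @ b

# 2) Subsample row blocks to reduce the dimension of the problem.
block = 100
blocks = np.split(np.arange(n), n // block)
keep = rng.choice(len(blocks), size=6, replace=False)
rows = np.concatenate([blocks[i] for i in keep])

# 3) Solve the reduced regression problem.
x_hat, *_ = np.linalg.lstsq(G[rows], h[rows], rcond=None)
print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```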
The framework of approximate differential privacy is considered and augmented by leveraging the notion of “the total variation of a (privacy-preserving) mechanism” (denoted by $\eta$-TV). With this refinement, an exact composition result is derived and shown to be significantly tighter than the optimal bounds for differential privacy (which do not consider the total variation). Furthermore, it is shown that $(\varepsilon,\delta)$-DP with $\eta$-TV is closed under subsampling. The induced total variation of commonly used mechanisms is computed.
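As a concrete instance of an induced total variation (an illustration under the standard reading of $\eta$-TV as the total variation between output distributions on neighbouring datasets, not a reproduction of the paper's derivations): the Laplace mechanism with sensitivity $\Delta$ and scale $b$ induces $\eta = 1 - e^{-\Delta/(2b)}$, which the sketch below checks numerically.

```python
# Numerical check of the total variation induced by the Laplace mechanism on two
# neighbouring queries differing by the sensitivity Delta (illustrative values).
import numpy as np

Delta, b = 1.0, 2.0
x = np.linspace(-50, 50, 1_000_001)              # fine grid; the tails beyond are negligible
dx = x[1] - x[0]
p0 = np.exp(-np.abs(x) / b) / (2 * b)            # output density on dataset D
p1 = np.exp(-np.abs(x - Delta) / b) / (2 * b)    # output density on a neighbouring dataset D'
tv_numeric = 0.5 * np.sum(np.abs(p0 - p1)) * dx
tv_closed = 1 - np.exp(-Delta / (2 * b))
print(tv_numeric, tv_closed)                     # both approximately 0.2212
```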
We present the notion of reasonable utility for binary mechanisms, which applies to all utility functions in the literature. This notion induces a partial ordering on the performance of all binary differentially private (DP) mechanisms. DP mechanisms that are maximal elements of this ordering are optimal DP mechanisms for every reasonable utility. By viewing differential privacy as a randomized graph coloring, we characterize these optimal DP mechanisms in terms of their behavior on a certain subset of boundary datasets that we call a boundary hitting set.
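To make "binary DP mechanism" concrete, the sketch below shows standard randomized response, the textbook example of a binary $\varepsilon$-DP mechanism; it is only an illustration of the object being ordered, and does not implement the paper's characterization via boundary hitting sets.

```python
# Randomized response: report the true bit with probability e^eps / (1 + e^eps).
import numpy as np

def randomized_response(bit: int, epsilon: float, rng) -> int:
    """Keep the true bit with prob. e^eps/(1+e^eps); flip it otherwise."""
    keep = np.exp(epsilon) / (1 + np.exp(epsilon))
    return bit if rng.random() < keep else 1 - bit

rng = np.random.default_rng(0)
eps = 1.0
reports = [randomized_response(1, eps, rng) for _ in range(10_000)]
print(np.mean(reports))   # roughly e/(1+e) ~ 0.73 when the true bit is 1
```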
We study the problem of group testing with non-identical, independent priors. So far, the pooling strategies that have been proposed in the literature take the following approach: a hand-crafted test design along with a decoding strategy is proposed, and guarantees are provided on how many tests are sufficient in order to identify all infections in a population. In this paper, we take a different, yet perhaps more practical, approach: we fix the decoder and the number of tests, and we ask, given these, what is the best test design one could use?
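A minimal simulation of the ingredients is sketched below, assuming (purely for illustration) a Bernoulli pooling design and the simple COMP decoder that clears every item appearing in a negative pool; the paper instead asks how to optimize the test design for a fixed decoder and test budget, which this snippet does not do.

```python
# Group testing with non-identical, independent priors: Bernoulli design + COMP decoding.
import numpy as np

rng = np.random.default_rng(0)
N, T = 200, 60                               # population size, test budget
priors = rng.uniform(0.005, 0.05, size=N)    # non-identical infection priors
status = rng.random(N) < priors              # ground-truth infection vector

design = rng.random((T, N)) < 0.03           # Bernoulli pooling matrix
outcomes = design.astype(int) @ status.astype(int) > 0   # a pool is positive iff it hits an infection

# COMP decoding: any item appearing in a negative pool is cleared; the rest are flagged.
cleared = design[~outcomes].any(axis=0)
estimate = ~cleared
print("false negatives:", np.sum(status & ~estimate),
      "false positives:", np.sum(~status & estimate))
```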
Local differential privacy is a powerful method for privacy-preserving data collection. In this paper, we develop a framework for training Generative Adversarial Networks (GANs) on differentially privatized data. We show that entropic regularization of optimal transport, a popular regularization method in the literature that has often been leveraged for its computational benefits, enables the generator to learn the raw (unprivatized) data distribution even though it only has access to privatized samples.
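The sketch below illustrates the two ingredients, local privatization and entropy-regularized optimal transport, under simplifying assumptions (scalar data clipped to $[0,4]$, Laplace-noise privatization, a plain Sinkhorn iteration with an illustratively large regularizer, and a fixed candidate generator sample in place of actual GAN training); it is not the paper's training procedure.

```python
# Locally privatize scalar samples and evaluate an entropy-regularized OT cost
# between the privatized sample and a candidate generator sample.
import numpy as np

rng = np.random.default_rng(0)
n, eps_ldp, reg = 500, 1.0, 10.0

raw = np.clip(rng.normal(2.0, 1.0, size=n), 0.0, 4.0)     # data clipped to [0, 4]
priv = raw + rng.laplace(scale=4.0 / eps_ldp, size=n)     # eps-LDP Laplace noise (sensitivity 4)
gen = rng.normal(0.0, 1.0, size=n)                        # candidate generator output

C = (priv[:, None] - gen[None, :]) ** 2                   # squared-distance cost matrix
K = np.exp(-C / reg)
a = b = np.full(n, 1.0 / n)                               # uniform marginals
u = np.ones(n)
for _ in range(200):                                      # Sinkhorn iterations
    v = b / (K.T @ u)
    u = a / (K @ v)
P = u[:, None] * K * v[None, :]                           # entropic optimal coupling
print("entropic OT cost:", float(np.sum(P * C)))
```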
In this paper, we propose differentially private algorithms for the problem of stochastic linear bandits in the central, local, and shuffled models. In the central model, we achieve almost the same regret as the optimal non-private algorithms, which means we get privacy for free. In particular, we achieve a regret of $\tilde{O}\!\left(\sqrt{T}+\frac{1}{\varepsilon}\right)$ matching the known lower bound for private linear bandits, while the best previously known algorithm achieves $\tilde{O}\!\left(\frac{1}{\varepsilon}\sqrt{T}\right)$.
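To make the gap concrete, a rough back-of-the-envelope comparison (ignoring the hidden logarithmic factors and constants, and not taken from the paper): with $T = 10^{6}$ rounds and $\varepsilon = 0.1$, the additive bound behaves like $\sqrt{T} + 1/\varepsilon \approx 10^{3} + 10$, whereas the earlier multiplicative bound behaves like $\sqrt{T}/\varepsilon \approx 10^{4}$, an order of magnitude larger.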
We consider lossy compression of an information source when the decoder has lossless access to a correlated one. This setup, also known as the Wyner-Ziv problem, is a special case of distributed source coding. To this day, practical approaches for the Wyner-Ziv problem have neither been fully developed nor heavily investigated. We propose a data-driven method based on machine learning that leverages the universal function approximation capability of artificial neural networks.
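For reference, the classical quadratic-Gaussian benchmark against which data-driven Wyner-Ziv schemes are often compared (a standard result, stated here as background rather than as a claim about the paper's experiments) is $R^{\mathrm{WZ}}_{X|Y}(D) = \frac{1}{2}\log^{+}\!\left(\sigma^{2}_{X|Y}/D\right)$, where $\sigma^{2}_{X|Y}$ is the conditional variance of the source given the side information and $D$ is the allowed mean-squared distortion.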
Generalization error bounds are essential for understanding how well machine learning models perform on unseen data. In this work, we propose a novel method, the Auxiliary Distribution Method, that leads to new upper bounds on the expected generalization error, suitable for supervised learning scenarios.
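For context, bounds of this type control the expected generalization error through an information measure between the learned hypothesis $W$ and the training sample $S$ of size $n$; a classical instance (stated as background, not as the paper's new bound) is $\big|\mathbb{E}[\mathrm{gen}(W,S)]\big| \le \sqrt{2\sigma^{2} I(W;S)/n}$ for losses that are $\sigma$-sub-Gaussian.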
Secure aggregation protects the local models of the users in federated learning by ensuring that the server obtains no information beyond the aggregate model at each iteration. Naively implemented, however, secure aggregation fails to protect the integrity of the aggregate model against a malicious server that forges the aggregation result, which motivates verifiable aggregation in federated learning.
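A minimal sketch of the masking idea behind secure aggregation is given below (standard pairwise additive masking over a toy modulus; names and sizes are illustrative, and the verifiability mechanism motivating the paper is not modelled).

```python
# Pairwise additive masking: the server sees only masked models, yet the masks
# cancel in the sum, so the aggregate is recovered exactly.
import numpy as np

rng = np.random.default_rng(0)
num_users, dim, modulus = 4, 8, 2**16
models = rng.integers(0, 100, size=(num_users, dim))      # quantized local models

masked = models.copy()
for i in range(num_users):
    for j in range(i + 1, num_users):
        mask = rng.integers(0, modulus, size=dim)          # shared between users i and j
        masked[i] = (masked[i] + mask) % modulus           # user i adds the mask
        masked[j] = (masked[j] - mask) % modulus           # user j subtracts it

aggregate = masked.sum(axis=0) % modulus                   # what the server computes
print(np.array_equal(aggregate, models.sum(axis=0) % modulus))   # True
```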