Workshop on Algorithms for Large Data (Online) 2021
This workshop aims to foster collaborations between researchers across multiple disciplines through a set of central questions and techniques for algorithm design for large data. We will focus on topics such as sublinear algorithms, randomized numerical linear algebra, streaming and sketching, and learning and testing.
| What | Workshop on Algorithms for Large Data |
|---|---|
| When | Monday, August 23 - Wednesday, August 25, 2021 |
| Where | The workshop will be held virtually. |
Organizers
- Ainesh Bakshi (Carnegie Mellon)
- Rajesh Jayaram (Google Research NYC)
- Samson Zhou (Carnegie Mellon)
Registration
The workshop is now over. Thanks for your interest in attending WALD(O)'21!
Speakers
- Sepehr Assadi (Rutgers)
- Clément Canonne (Sydney)
- Petros Drineas (Purdue)
- Talya Eden (MIT)
- Alina Ene (Boston University)
- Sumegha Garg (Harvard)
- Anna C. Gilbert (Yale)
- Robert Krauthgamer (Weizmann)
- Rasmus Kyng (ETH)
- Jerry Li (MSR)
- Sepideh Mahabadi (TTIC)
- Cameron Musco (UMass Amherst)
- Christopher Musco (NYU)
- Jelani Nelson (Berkeley)
- Richard Peng (Georgia Tech)
- Eric Price (UT Austin)
- Sofya Raskhodnikova (Boston University)
- Dana Ron (Tel Aviv)
- Madhu Sudan (Harvard)
- Santosh Vempala (Georgia Tech)
- Erik Waingarten (Stanford)
- David Wajc (Stanford)
- David P. Woodruff (Carnegie Mellon)
- Qin Zhang (Indiana)
Schedule
I will discuss the Steiner forest problem in the Euclidean plane, where the input is a multiset of points, partitioned into k color classes, and the goal is to find a minimum-cost
Euclidean graph G such that every color class is connected. We study this problem in dynamic streams, where the input is provided by a stream of insertions and
deletions of colored points from the discrete grid [\Delta]^2.
Our main result is a single-pass streaming algorithm that uses poly(k \log\Delta) space and time, and estimates the cost of an optimal Steiner forest solution
within ratio arbitrarily close to the famous Euclidean Steiner ratio $\alpha_2$ (currently $1.1547 \leq \alpha_2 \leq 1.214$). Our approach relies on a novel
combination of streaming techniques, like sampling and linear sketching, with the classical dynamic-programming framework for geometric optimization problems,
which usually requires large memory and has so far not been applied in the streaming setting.
I will also discuss possible directions for future work.
Joint work with Artur Czumaj, Shaofeng H.-C. Jiang, and Pavel Vesely.
There has been a rapidly growing interest in recent years in understanding various graph parameter estimation problems such as estimating
the size of maximum cuts or maximum matchings, weight of minimum spanning trees, or property testing problems such as connectivity,
cycle-freeness, or bipartiteness in the streaming model. However, from the lower bound perspective, almost all prior work has focused
solely on single-pass algorithms, and not much is known for multi-pass algorithms, even for just two passes.
In this talk, we discuss the first multi-pass lower bounds for a wide family of these problems: for many problems of interest,
including all the above, obtaining a (1+eps)-approximation requires near-linear space or Omega(1/eps) passes, even on highly
restricted families of graphs such as bounded-degree planar graphs. A key ingredient of our proofs is a simple streaming XOR Lemma,
a generic hardness amplification result that might be of independent interest: informally speaking, if a p-pass s-space streaming
algorithm can only solve a decision problem with advantage delta > 0 over random guessing, then it cannot solve XOR of L independent
copies of the problem with advantage much better than delta^L.
We will also discuss open problems and possible directions for future work. Based on the following joint works:
Sepehr Assadi and Vishvajeet N: Graph Streaming Lower Bounds for Parameter Estimation and Property Testing via a Streaming XOR Lemma (arXiv: abs/2104.04908)
Sepehr Assadi, Gillat Kol, Raghuvansh Saxena, and Huacheng Yu: Multi-Pass Graph Streaming Lower Bounds for Cycle Counting, MAX-CUT, Matching Size, and Other Problems (arXiv: abs/2009.03038)
Counting up to N deterministically of course takes Theta(log N) bits. In what is arguably the first streaming algorithm, Morris (1978) gave a randomized counter that improves on this bound exponentially. The best known analysis of his algorithm shows that it gives a (1+eps)-approximation to N with probability at least 1-delta using O(loglog N + log(1/eps) + log(1/delta)) bits with high probability (the space usage is itself a random variable). We show that a very slight (but necessary) tweak of his algorithm actually achieves the better bound O(loglog N + log(1/eps) + loglog(1/delta)) bits, and we complement it with a new lower bound, establishing optimality: our upper and lower bounds match up to small explicit constant factors. Joint work with Huacheng Yu.
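For a concrete reference point, here is a minimal Python sketch of the original Morris counter (the talk concerns a tweaked variant and its tight analysis; this baseline, with averaging of independent copies, is only meant to illustrate the idea):

```python
import random

class MorrisCounter:
    """Morris's approximate counter: stores x ~ log2(N) instead of N."""

    def __init__(self):
        self.x = 0  # in principle O(loglog N) bits, since x is about log2(N)

    def increment(self):
        # Increase the stored exponent with probability 2^(-x).
        if random.random() < 2.0 ** (-self.x):
            self.x += 1

    def estimate(self):
        # E[2^X - 1] equals the true count, so this estimate is unbiased.
        return 2.0 ** self.x - 1

# Averaging independent copies reduces the variance (one way to get 1+eps).
counters = [MorrisCounter() for _ in range(100)]
for _ in range(10000):
    for c in counters:
        c.increment()
print(sum(c.estimate() for c in counters) / len(counters))  # roughly 10000
```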
Consider the problem of computing the majority of a stream of n i.i.d. uniformly random bits.
This problem, known as the coin problem, is central to a number of counting problems in different data stream models.
We show that any streaming algorithm for solving this problem with large constant advantage (over the uniform distribution)
must use Ω(log n) bits of space. Previously, it was known that computing the majority on every input (i.e., in the worst case) with constant probability
takes Ω(log n) space. We extend our lower bound to prove tight lower bounds for solving multiple, randomly interleaved copies
of the coin problem, as well as for solving the OR of multiple copies of a variant of the coin problem. Our proofs involve new measures
of information complexity that are well-suited for data streams.
We use these lower bounds to obtain a number of new results for data streams. In each case there is an underlying d-dimensional
vector x with additive updates to its coordinates given in a stream of length m. The input streams arising from our coin lower bound
have nice distributional properties, and consequently for many problems for which we only had lower bounds in general turnstile streams,
we now obtain the same lower bounds in more natural models, such as the bounded deletion model, in which ||x||_2 never drops by a
constant fraction of what it was earlier, or in the random order model, in which the updates are ordered randomly.
Based on joint work with Mark Braverman and David P. Woodruff.
In this talk we will describe some of our recent work giving new upper and lower bounds on the approximability of
constraint satisfaction problems (CSPs) in the (single-pass, often dynamic) streaming settings. In particular,
when streaming algorithms are constrained to sub-polynomial space in the input length and the constraints may be
inserted or removed we get a fine dichotomy: CSPs on n variables are either solvable in polylogarithmic space or require
at least sqrt(n) space. We also get Omega(n) lower bounds for a broad class of CSPs. Our positive results show the broad
applicability of what we call "bias-based algorithms", and our negative results work by abstracting and significantly
generalizing previous bounds for the Maximum Cut problem. In the talk we will also describe the many obvious and non-obvious
open questions and directions.
Based on the following joint works:
1) Chi-Ning Chou, Alexander Golovnev, Madhu Sudan, Santhoshini Velusamy: Approximability of all Boolean CSPs in the dynamic streaming setting. CoRR abs/2102.12351 (2021)
2) Chi-Ning Chou, Alexander Golovnev, Madhu Sudan, Santhoshini Velusamy: Approximability of all finite CSPs in the dynamic streaming setting. CoRR abs/2105.01161 (2021)
3) Noah Singer, Madhu Sudan, Santhoshini Velusamy: Streaming approximation resistance of every ordering CSP. CoRR abs/2105.01782 (2021)
4) Chi-Ning Chou, Alexander Golovnev, Madhu Sudan, Ameya Velingker, Santhoshini Velusamy: Linear Space Streaming Lower Bounds for Approximating CSPs. CoRR abs/2106.13078 (2021)
Adaptive gradient descent methods, such as the celebrated Adagrad algorithm (Duchi, Hazan, and Singer; McMahan and Streeter) and ADAM algorithm (Kingma and Ba), are some of the most popular and influential iterative algorithms for optimizing modern machine learning models. Algorithms in the Adagrad family use past gradients to set their step sizes and are remarkable due to their ability to automatically adapt to unknown problem structures such as (local or global) smoothness and convexity. However, these methods achieve suboptimal convergence guarantees even in the standard setting of minimizing a smooth convex function, and it has been a long-standing open problem to develop an accelerated analogue of Adagrad in the constrained setting. In this talk, we present one such accelerated adaptive algorithm for constrained convex optimization that simultaneously achieves optimal convergence in the smooth, non-smooth, and stochastic settings, without any knowledge of the smoothness parameters or the variance of the stochastic gradients. The talk is based on joint work with Huy Nguyen (Northeastern University) and Adrian Vladu (CNRS & IRIF, Université de Paris).
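As background for the talk, here is a minimal Python sketch of the basic (diagonal) Adagrad update; the accelerated, constrained variant presented in the talk is substantially more involved, and the step-size rule below is just the textbook one:

```python
import numpy as np

def adagrad(grad, x0, n_steps=500, eta=1.0, eps=1e-8):
    """Diagonal Adagrad: per-coordinate step sizes set from past gradients."""
    x = x0.astype(float).copy()
    g_sq = np.zeros_like(x)                    # running sum of squared gradients
    for _ in range(n_steps):
        g = grad(x)
        g_sq += g * g
        x -= eta * g / (np.sqrt(g_sq) + eps)   # adaptive per-coordinate step
    return x

# Example: minimize a poorly scaled quadratic f(x) = c1*x1^2 + c2*x2^2.
c = np.array([1.0, 100.0])
x_star = adagrad(lambda x: 2 * c * x, np.array([5.0, 5.0]))
print(x_star)  # approaches the minimizer (0, 0) despite the bad scaling
```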
The random order graph streaming model has received significant attention recently, with problems such as matching size estimation, component counting, and the evaluation of bounded degree constant query testable properties shown to admit surprisingly space efficient algorithms. The main result of this work is a space efficient single pass random order streaming algorithm for simulating nearly independent random walks that start at uniformly random vertices. We show that the distribution of k-step walks from $a$ vertices chosen uniformly at random can be approximated up to error $\epsilon$ per walk using $O_{k,\epsilon}(a)$ words of space with a single pass over a randomly ordered stream of edges, solving an open problem of Peng and Sohler [SODA '18]. Applications of our result include the estimation of the average return probability of the k-step walk (the trace of the k-th power of the random walk matrix) as well as the estimation of PageRank. We complement our algorithm with a strong impossibility result for directed graphs. Based on joint work with John Kallaugher and Michael Kapralov.
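To make the quantity in question concrete, here is a minimal offline Monte Carlo estimator (in Python) of the average k-step return probability, i.e., tr(P^k)/n; the talk's contribution is emulating exactly this kind of sampling with a single pass over a randomly ordered edge stream, which the sketch below does not attempt:

```python
import random

def avg_return_probability(adj, k, num_walks=20000):
    """Estimate tr(P^k)/n, the average k-step return probability, by
    sampling random walks from uniformly random start vertices."""
    n = len(adj)
    returns = 0
    for _ in range(num_walks):
        start = v = random.randrange(n)
        for _ in range(k):
            v = random.choice(adj[v])   # one step of the simple random walk
        returns += (v == start)
    return returns / num_walks

# 4-cycle: a walk returns to its start after 2 steps with probability 1/2.
cycle = [[1, 3], [0, 2], [1, 3], [0, 2]]
print(avg_return_probability(cycle, 2))  # approximately 0.5
```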
I will discuss the (semi-)streaming submodular matching problem, where the edges of an unknown n-node graph are presented one by one, and
the objective is to, using only \tilde{O}(n) bits of memory, output a matching of high value, as quantified by some submodular function.
This problem, which generalizes semi-streaming matching and weighted matching, is one of the first problems studied in the context of
streaming submodular optimization, and has spurred the development of key techniques in the area.
We present a number of improved upper and lower bounds on the approximability of this problem. I will outline the new techniques used to
obtain our upper bounds (applying the primal-dual method to a natural configuration LP for our streaming submodular problem) and our
lower bounds (combining conditional and unconditional hardness in a streaming context). I will leave time to discuss a number of intriguing
follow-up directions and open questions.
Based on joint work with Roie Levin.
In this talk, I'll discuss a number of recent works in fine-grained complexity that establish hardness results for many classes of structured linear equations and linear programs. We will see that if the nearly-linear time solvers for Laplacian matrices and their generalizations can be extended to solve just slightly larger families of linear systems, then they can be used to quickly solve all systems of linear equations over the reals. This result can be viewed either positively or negatively: either it suggests a new route to getting faster general linear equation solvers, or it indicates that progress on solvers for structured linear equations may soon halt. Along similar lines, if the accuracy of fast solvers for packing and covering linear programs can be substantially improved, this will lead to faster solvers for general linear equations. Finally, a recent result shows that any algorithm that solves the 2-commodity maximum flow problem can solve general linear programs in essentially the same time. This last result follows the strategy of Itai's polynomial-time reduction of a linear program to a 2-commodity flow problem (JACM '78) but makes the reduction faster and more robust. I'll close with some open problems in fine-grained complexity and point out how you might spot more opportunities. The talk is based on joint works with Ming Ding (ETH Zurich), Di Wang (Google Research), and Peng Zhang (Rutgers).
A celebrated line of work on isoperimetric inequalities for the hypercube
[Margulis '74, Talagrand '93, Chakrabarty and Seshadhri '13, Khot Minzer Safra '15]
studies the size of the "boundary" between the points on which a Boolean function takes value 0 and the points on which it takes value 1.
This boundary is defined in terms of the (directed or undirected) edges of a high-dimensional hypercube that represents the domain of
the function. The inequality of Khot, Minzer, and Safra '15 implies all the previous inequalities in this line of work.
It relates the average of the square root of the number of decreasing edges incident on a vertex of the hypercube to the distance to
monotonicity of the function.
We generalize all the inequalities in this line of work to real-valued functions. Our main contribution is a Boolean decomposition that represents every
real-valued function f over an arbitrary partially ordered domain as a collection of Boolean functions over the same domain, roughly capturing the distance
of f to monotonicity and the structure of violations of f to monotonicity. We apply our generalized isoperimetric inequalities to improve algorithms for
testing monotonicity and approximating the distance to monotonicity for real-valued functions.
Based on joint work with Hadley Black (UCLA) and Iden Kalemaj (Boston University).
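For orientation, the inequality mentioned above can be stated schematically as follows (this paraphrase suppresses constants and the exact normalization): for a Boolean function $f : \{0,1\}^n \to \{0,1\}$,

$\mathbb{E}_{x \sim \{0,1\}^n}\big[\sqrt{I_f^-(x)}\big] \geq c \cdot \epsilon(f),$

where $I_f^-(x)$ is the number of decreasing edges of the hypercube incident on $x$ (edges along which $f$ decreases) and $\epsilon(f)$ is the normalized Hamming distance from $f$ to the nearest monotone function. The Boolean decomposition described in the talk is what allows statements of this form to be transferred to real-valued $f$.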
In this talk, we exploit the ill-posedness of linear inverse problems to design algorithms to release differentially private data or measurements of the physical system. We discuss the spectral requirements on a matrix such that only a small amount of noise is needed to achieve privacy and contrast this with the ill-conditionedness. We then instantiate our framework with several diffusion operators and explore recovery via l1 constrained minimization. Our work indicates that it is possible to produce locally private sensor measurements that both keep the exact locations of the heat sources private and permit recovery of the "general geographic vicinity" of the sources.
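As a point of reference for the privacy mechanism, here is a minimal Python sketch of releasing measurements y = Kx via the standard Gaussian mechanism; the sensitivity convention (neighboring inputs differ in one coordinate of x by at most 1) and the smoothing kernel below are illustrative assumptions, and the l1-constrained recovery step discussed in the talk is omitted:

```python
import numpy as np

def private_measurements(K, x, eps, delta):
    """Release y = Kx + Gaussian noise calibrated by the Gaussian mechanism.

    With neighboring x differing in one coordinate by at most 1, the L2
    sensitivity of x -> Kx is the largest column norm of K. For smoothing
    (diffusion-like) operators this can be small, which is the talk's
    point: ill-posedness helps privacy.
    """
    sensitivity = np.linalg.norm(K, axis=0).max()
    sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / eps
    return K @ x + np.random.normal(0.0, sigma, size=K.shape[0])

# Illustrative diffusion-style operator and two point "heat sources".
n = 100
K = np.exp(-np.abs(np.subtract.outer(np.arange(n), np.arange(n))) / 5.0)
K /= K.sum(axis=0)                       # column-normalized smoothing kernel
x = np.zeros(n); x[20] = x[70] = 1.0
y = private_measurements(K, x, eps=1.0, delta=1e-5)
```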
Effective resistance is a physics-motivated quantity that encapsulates both the congestion and the dilation of a flow.
It has close connections with many key concepts in combinatorial optimization, graph sparsification, and network science,
such as electrical flows, leverage scores, and network centrality.
This talk will briefly overview sublinear-time data structures for maintaining effective resistances and related quantities in dynamically changing graphs.
It will focus on random walk interpretations of Schur Complements, which are intermediate states of Gaussian elimination.
Potential connections with vertex sparsification and fine-grained complexity will also be discussed.
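For readers new to the quantity, here is the textbook (static, O(n^3)) way to compute an effective resistance via the Laplacian pseudoinverse, in Python; the talk is about maintaining such quantities in sublinear time under updates, which this baseline does not do:

```python
import numpy as np

def effective_resistance(edges, n, u, v):
    """R_eff(u, v) = (e_u - e_v)^T L^+ (e_u - e_v) for unit-resistance edges."""
    L = np.zeros((n, n))
    for a, b in edges:
        L[a, a] += 1; L[b, b] += 1
        L[a, b] -= 1; L[b, a] -= 1
    chi = np.zeros(n)
    chi[u], chi[v] = 1.0, -1.0
    return chi @ np.linalg.pinv(L) @ chi

# Path 0-1-2: two unit resistors in series between 0 and 2.
print(effective_resistance([(0, 1), (1, 2)], 3, 0, 2))  # 2.0
```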
The Gaussian Mixture Model (Pearson 1906) is widely used for high-dimensional data. While classical results establish its unique identifiability, it was only shown in 2010 (Kalai-Moitra-Valiant, Belkin-Sinha) that for any fixed number of component Gaussians, the underlying mixture parameters can be estimated to arbitrary accuracy in polynomial time. Robust statistics (Huber; Tukey) asks for estimation of underlying models robustly, i.e., in the presence of a bounded fraction of arbitrary noise (outliers). This appeared to be computationally intractable, even for mean estimation of "nice" distributions, until relatively recently (Diakonikolas-Kamath-Kane-Li-Moitra-Stewart '16, Lai-Rao-Vempala '16). These and other works highlight the robust estimation of GMMs as a central open problem. In this talk, we will present the first polynomial-time algorithm for the problem for any fixed number of components, with no assumptions on the underlying mixture. The techniques developed appear likely to be useful for other problems. This is joint work with Ainesh Bakshi, Ilias Diakonikolas, He Jia, Daniel Kane and Pravesh Kothari.
In the adversarially robust streaming model, a stream of elements is presented to an algorithm and is allowed to depend on the output of the algorithm at earlier times during the stream. In the classic insertion-only model of data streams, Ben-Eliezer et al. show how to convert a non-robust algorithm into a robust one with a roughly 1/eps factor overhead. This was subsequently improved to a 1/sqrt(eps) factor overhead by Hassidim et al., suppressing logarithmic factors. For general functions the latter is known to be best possible, by a result of Kaplan et al. We show how to bypass this impossibility result by developing data stream algorithms for a large class of streaming problems, with no overhead in the approximation factor. Our class of streaming problems includes the most well-studied problems such as the L2-heavy hitters problem, Fp-moment estimation, as well as empirical entropy estimation. We substantially improve upon all prior work on these problems, giving the first optimal dependence on the approximation factor. As in previous work, we obtain a general transformation that applies to any non-robust streaming algorithm and depends on the so-called flip number. However, the key technical innovation is that we apply the transformation to what we call a difference estimator for the streaming problem, rather than an estimator for the streaming problem itself. We then develop the first difference estimators for a wide range of problems. Our difference estimator methodology is not only applicable to the adversarially robust model, but to other streaming models where temporal properties of the data play a central role; in particular, we obtain the first optimal dependence on eps for Fp-moment estimation in the sliding window model. Based on joint work with Samson Zhou.
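For intuition about where the 1/eps-type overheads come from, here is a minimal Python sketch in the spirit of the sketch-switching transformation of Ben-Eliezer et al. cited above, with a toy estimator; this is the baseline that the talk's difference-estimator approach improves on, and all names below are illustrative:

```python
import random

class SampledSum:
    """Toy non-robust estimator: tracks a sum by sampling 10% of updates."""
    def __init__(self):
        self.total = 0.0
    def update(self, x):
        if random.random() < 0.1:
            self.total += 10 * x
    def estimate(self):
        return max(self.total, 1.0)    # keep outputs positive for the ratios

def robust_transform(make_estimator, eps, flip_bound):
    """Run flip_bound independent copies; advance to a fresh, never-revealed
    copy whenever the public output changes by a (1+eps) factor, so the
    adversary only ever sees estimates it can no longer bias."""
    copies = [make_estimator() for _ in range(flip_bound)]
    state = {"active": 0, "output": None}

    def update(item):
        for c in copies:
            c.update(item)
        est = copies[state["active"]].estimate()
        out = state["output"]
        if out is None or est > (1 + eps) * out or est < out / (1 + eps):
            state["output"] = est
            state["active"] = min(state["active"] + 1, flip_bound - 1)
        return state["output"]

    return update

query = robust_transform(SampledSum, eps=0.5, flip_bound=100)
for _ in range(10000):
    out = query(1.0)
print(out)  # roughly 10000, revealed through a bounded number of switches
```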
Linear programming is a central problem in computer science and applied mathematics with numerous applications across a wide range of domains, including machine learning and data science. Interior point methods (IPMs) are a common approach to solving linear programs with strong theoretical guarantees and solid empirical performance. The time complexity of these methods is dominated by the cost of solving a linear system of equations at each iteration. In common applications of linear programming, particularly in data science and scientific computing, the size of this linear system can become prohibitively large, requiring the use of iterative solvers which provide an approximate solution to the linear system. Approximately solving the linear system at each iteration of an IPM invalidates common analyses of IPMs and the theoretical guarantees they provide. In this talk we will discuss how randomized linear algebra can be used to design and analyze theoretically and practically efficient IPMs when using approximate linear solvers.
Given samples from an unknown distribution p over {1,2,...,n}, how much
data is required to decide whether p is eps-far from the uniform
distribution, versus eps'-close to it? This question, tolerant
uniformity testing (and its two related variants, tolerant identity and
closeness testing), has received significant interest over the past
decade and is now reasonably well understood in both extreme cases,
eps'=0 ("standard" testing) and eps=2eps' (maximally noisy setting), for
which the sample complexity scales as sqrt(n) and n/log n, respectively.
In this talk, I will describe recent work revisiting this question, in
which we obtain the full trade-off between those two cases for both
identity and closeness testing. Specifically, we fully characterize the
"price of tolerance" for those problems as a function of n, eps, and
eps', up to a single log n factor. Our upper and lower bounds show quite
a few interesting phenomena, even in regimes which were hitherto thought
to be well understood.
Joint work with Ayush Jain, Gautam Kamath, and Jerry Li
(arXiv: abs/2106.13414).
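For context on the eps'=0 endpoint mentioned above, here is a minimal Python implementation of the classic collision-based uniformity tester (the sqrt(n)-sample, non-tolerant regime); the threshold constant below is illustrative, and the tolerant setting addressed in the talk requires genuinely different, more expensive techniques:

```python
import random
from itertools import combinations

def collision_uniformity_test(samples, n, eps):
    """Accept iff the empirical collision rate looks like the uniform one.

    Uniform p over [n] has collision probability exactly 1/n, while any p
    that is eps-far from uniform in total variation has collision
    probability at least (1 + 4*eps^2)/n, so counting pairwise collisions
    separates the two cases once len(samples) >> sqrt(n)/eps^2.
    """
    m = len(samples)
    collisions = sum(1 for a, b in combinations(samples, 2) if a == b)
    rate = collisions / (m * (m - 1) / 2)
    return rate <= (1 + 2 * eps ** 2) / n     # True means "looks uniform"

n, m = 1000, 2000
uniform = [random.randrange(n) for _ in range(m)]
skewed = [random.randrange(n // 2) for _ in range(m)]   # TV distance 1/2
print(collision_uniformity_test(uniform, n, 0.5))   # True (w.h.p.)
print(collision_uniformity_test(skewed, n, 0.5))    # False (w.h.p.)
```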
In this work, we study the problem of testing subsequence-freeness.
For a given subsequence (word) w = w_1 … w_k, a sequence (text) T = t_1 … t_n is said to contain w if there exist indices 1 <= i_1 < … < i_k <= n
such that t_{i_j} = w_j for every 1 <= j <= k. Otherwise, T is w-free.
While a large majority of the research in property testing deals with algorithms that perform queries, here we consider sample-based testing (with one-sided error). In the "standard" sample-based model (i.e., under the uniform distribution), the algorithm is given samples (i,t_i) where i is drawn independently and uniformly at random. The algorithm should distinguish between the case that T is w-free and the case that T is \epsilon-far from being w-free (i.e., more than an \epsilon-fraction of its symbols must be modified to make it w-free).
Freitag, Price, and Swartworth (Proceedings of RANDOM, 2017) showed that O(k^2\log k/\epsilon) samples suffice for this testing task.
We obtain the following results.
+ The number of samples sufficient for sample-based testing (under the uniform distribution) is O(k/\epsilon). This upper bound builds on a characterization that we present for the distance of a text T from w-freeness in terms of the maximum number of copies of w in T, where these copies should obey certain restrictions.
+ We prove a matching lower bound, which holds for every word w. This implies that the above upper bound is tight.
+ The same upper bound holds in the more general distribution-free sample-based model. In this model the algorithm receives samples (i,t_i) where $i$ is distributed according to an arbitrary distribution p (and the distance from w-freeness is measured with respect to p).
We highlight the fact that while we require that the testing algorithm work for every distribution and when only provided with samples, the complexity we get matches a known lower bound for a special case of the seemingly easier problem of testing subsequence-freeness under the uniform distribution and with queries, by Canonne et al. (Theory of Computing, 2019).
This is joint work with Asaf Rosin.
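To make the copies-based characterization tangible, here is a small Python routine that greedily counts disjoint copies of w in T as a subsequence (T is w-free iff the count is 0); this greedy statistic is only an illustration of the copies/distance connection, not the paper's exact characterization, which restricts which copies may be counted:

```python
def count_disjoint_copies(T, w):
    """Greedily count disjoint occurrences of w as a subsequence of T.

    c[j] holds the number of partial copies that have matched w[0..j-1];
    each text symbol extends the longest extendable partial copy.
    """
    k = len(w)
    c = [0] * (k + 1)
    copies = 0
    for ch in T:
        for j in range(k - 1, 0, -1):     # extend the longest prefix first
            if ch == w[j] and c[j] > 0:
                c[j] -= 1
                if j + 1 == k:
                    copies += 1           # a full disjoint copy completed
                else:
                    c[j + 1] += 1
                break
        else:
            if ch == w[0]:                # start a new partial copy
                if k == 1:
                    copies += 1
                else:
                    c[1] += 1
    return copies

print(count_disjoint_copies("aaabbb", "ab"))   # 3
print(count_disjoint_copies("bbbaaa", "ab"))   # 0, i.e., "ab"-free
```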
We study the problem of estimating the trace of a matrix that can only be accessed through matrix-vector multiplication. We introduce a new randomized algorithm, Hutch++, which computes a (1+epsilon) approximation to the trace of any positive semidefinite (PSD) matrix using just O(1/epsilon) matrix-vector products. This improves quadratically on the ubiquitous Hutchinson's estimator, which requires O(1/epsilon^2) matrix-vector products. Our approach is based on a simple technique for reducing the variance of Hutchinson's estimator using low-rank approximation, and is easy to implement and analyze. Moreover, we prove that up to a logarithmic factor, the complexity of Hutch++ is optimal amongst all matrix-vector query algorithms. We show that it significantly outperforms Hutchinson's method in experiments, on both PSD and non-PSD matrices. In the talk, I will also discuss a broader research program of finding asymptotically optimal algorithms for basic linear algebra problems in the matrix-vector query model of computation. Beyond our work on trace estimation, there is exciting recent progress on linear systems and eigenvalue approximation. I will briefly discuss remaining open questions and interesting connections to work in theoretical computer science, including on communication complexity and fine-grained complexity theory. This work is joint with Raphael A Meyer, Christopher Musco, and David P Woodruff. The associated paper appeared in the SIAM Symposium on Simplicity in Algorithms (SOSA 2021) and can be found here: https://arxiv.org/abs/2010.09649.
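Hutch++ is simple enough to state in a few lines; here is a minimal Python/NumPy sketch under the convention that matvec accepts a block of query vectors (the budget-splitting constants are illustrative):

```python
import numpy as np

def hutchpp(matvec, n, m):
    """Hutch++ trace estimate using roughly m matrix-vector products.

    A low-rank sketch handles the top of the spectrum exactly, and plain
    Hutchinson handles the deflated remainder, whose variance is small.
    """
    k = m // 3
    S = np.random.choice([-1.0, 1.0], size=(n, k))    # sketch queries
    G = np.random.choice([-1.0, 1.0], size=(n, k))    # Hutchinson queries
    Q, _ = np.linalg.qr(matvec(S))                    # basis for range(AS)
    t1 = np.trace(Q.T @ matvec(Q))                    # exact trace on range(Q)
    G_perp = G - Q @ (Q.T @ G)                        # project queries away
    t2 = np.trace(G_perp.T @ matvec(G_perp)) / k      # Hutchinson on remainder
    return t1 + t2

# Example with an explicit PSD matrix (matvec is normally implicit).
n = 200
B = np.random.randn(n, n)
A = B @ B.T
print(hutchpp(lambda V: A @ V, n, 60), np.trace(A))
```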
We consider the problem of approximating the arboricity of a graph $G=(V,E)$, which we denote by $arb(G)$, in sublinear time, where the arboricity of a graph is the minimal number of forests required to cover its edges. An algorithm for this problem may perform degree and neighbor queries, and is allowed a small error probability. We design an algorithm that outputs an estimate $\hat{\alpha}$ such that, with probability $1-1/poly(n)$, $arb(G)/(c \log^2 n) \leq \hat{\alpha} \leq arb(G)$, where $n=|V|$ and $c$ is a constant. The expected query complexity and running time of the algorithm are $O(n/arb(G)) \cdot poly(\log n)$, and this upper bound also holds with high probability. This bound is optimal for such an approximation up to a $poly(\log n)$ factor. Based on joint work with Saleet Mossel and Dana Ron.
In this talk, we consider the problem of preconditioning a given matrix M by a diagonal matrix. In other words, the problem is to find a scaling of the rows or columns of M which maximally improves the condition number of M. This is a well studied problem in numerical computing, and heuristics such as Jacobi preconditioning are widely used in practice to solve it. Despite this, the problem remains poorly understood. We make two contributions to this front: (1) we provide new, optimal guarantees for Jacobi preconditioning, demonstrating that it achieves a quadratic approximation to the optimal scaling, and moreover, this is optimal, and (2) we provide new, nearly-linear time algorithms which computes a constant factor scaling for M. We achieve these algorithms by developing new solvers for structured mixed packing and covering SDPs. Finally, we demonstrate applications of these techniques to settings such as semi-random linear regression, and improving risk in several statistical regression models.
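Since Jacobi preconditioning is central to the first result, here is a short Python illustration of the scaling and the kind of condition-number improvement at stake (the example matrix is illustrative):

```python
import numpy as np

def jacobi_precondition(M):
    """Two-sided Jacobi scaling D^{-1/2} M D^{-1/2} with D = diag(M)."""
    d = np.sqrt(np.diag(M))
    return M / np.outer(d, d)

# A positive definite matrix that is ill-conditioned purely due to scaling.
A = np.array([[1.0, 0.5], [0.5, 1.0]])
S = np.diag([1.0, 1e4])
M = S @ A @ S
print(np.linalg.cond(M))                       # huge (order 1e8)
print(np.linalg.cond(jacobi_precondition(M)))  # 3, the condition number of A
```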
In this talk, we prove a two-sided variant of the Kirszbraun theorem. Consider an arbitrary subset X of Euclidean space and its superset Y. Let f be a 1-Lipschitz map from X to R^m. The Kirszbraun theorem states that the map f can be extended to a 1-Lipschitz map f' from Y to R^m. While the extension f' does not increase distances between points, there is no guarantee that it does not decrease distances significantly. In fact, f' may even map distinct points to the same point (that is, it can infinitely decrease some distances). However, we prove that there exists a (1 + \epsilon)-Lipschitz outer extension f' (that maps from Y to R^{m'}) that does not decrease distances more than "necessary". I will then show some applications of this theorem to metric embeddings. This is a joint work with Arturs Backurs, Konstantin Makarychev and Yury Makarychev.
I will discuss new work on the popular kernel polynomial method (KPM) for approximating the spectral density (eigenvalue distribution) of an nxn Hermitian matrix A with real-valued eigenvalues. I will show that a practical variant of the KPM algorithm can approximate the spectral density to epsilon accuracy in the Wasserstein-1 distance with roughly O(1/epsilon) matrix-vector multiplications with A. Moreover, the method is robust to inaccuracy in these matrix-vector multiplications, which allows it to be combined with any approximation algorithm for computing them. As an application, I will discuss a randomized sublinear time algorithm for approximating the spectral density of any normalized graph adjacency or Laplacian matrix. The talk will cover the main tools used in our work, which include Jackson’s seminal work on approximation with positive kernels, and stability results for computing orthogonal polynomials via three-term recurrence relations.
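Here is a minimal Python sketch of the moment-estimation core of the KPM: stochastic estimates of the Chebyshev moments tau_k = tr(T_k(A))/n via the three-term recurrence, assuming the spectrum has been rescaled into [-1, 1]; applying a damping kernel (such as Jackson's) and recovering the density from the moments are omitted:

```python
import numpy as np

def chebyshev_moments(matvec, n, num_moments, num_queries=10):
    """Estimate tau_k = tr(T_k(A))/n stochastically, for spec(A) in [-1, 1].

    Uses T_{k+1}(A)v = 2 A T_k(A)v - T_{k-1}(A)v, so each query vector
    costs one matvec per additional moment.
    """
    taus = np.zeros(num_moments)
    for _ in range(num_queries):
        v = np.random.choice([-1.0, 1.0], size=n)   # Rademacher query vector
        t_prev, t_cur = v, matvec(v)                # T_0(A)v and T_1(A)v
        taus[0] += v @ t_prev
        if num_moments > 1:
            taus[1] += v @ t_cur
        for k in range(2, num_moments):
            t_prev, t_cur = t_cur, 2 * matvec(t_cur) - t_prev
            taus[k] += v @ t_cur
    return taus / (num_queries * n)

# Example: a symmetric matrix rescaled so its spectrum lies in [-1, 1].
n = 300
B = np.random.randn(n, n)
Sym = (B + B.T) / 2
A = Sym / np.abs(np.linalg.eigvalsh(Sym)).max()
print(chebyshev_moments(lambda x: A @ x, n, 8))   # tau_0 is exactly 1
```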
Reinforcement learning has witnessed great research advancement in recent years and achieved successes in many practical applications. However, reinforcement-learning algorithms also have the reputation for being data- and computation-hungry for large-scale applications. Today we will talk about how to make reinforcement-learning algorithms scalable via introducing multiple learning agents and allowing them to collect data and learn optimal strategies collaboratively. We will use a basic problem called best arm identification in multi-armed bandits as a vehicle to introduce the collaborative learning model. We will also discuss some follow-up works and future directions.
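As a concrete starting point for the collaborative model, here is a minimal single-agent baseline in Python: successive elimination for best arm identification (the confidence radius is a standard textbook choice; the collaborative algorithms discussed in the talk distribute these pulls across agents under communication constraints):

```python
import math
import random

def successive_elimination(arms, delta=0.05):
    """Return the index of the best arm with probability at least 1 - delta.

    `arms` is a list of pull() functions returning rewards in [0, 1]; each
    round pulls every surviving arm once and discards arms whose upper
    confidence bound falls below the best lower confidence bound.
    """
    active = list(range(len(arms)))
    means = [0.0] * len(arms)
    t = 0
    while len(active) > 1:
        t += 1
        for i in active:
            means[i] += (arms[i]() - means[i]) / t   # running mean
        rad = math.sqrt(math.log(4 * len(arms) * t * t / delta) / (2 * t))
        best = max(means[i] for i in active)
        active = [i for i in active if means[i] + rad >= best - rad]
    return active[0]

arms = [lambda p=p: 1.0 if random.random() < p else 0.0
        for p in (0.3, 0.5, 0.7)]
print(successive_elimination(arms))   # 2, with probability >= 1 - delta
```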
We give new sketching algorithms for Geometric MST and EMD. For Geometric MST, we are given n points $x_1, …, x_n \in [N]^d$,
and we give a linear sketch that can approximate the minimum over all spanning trees T of the complete graph on [n] of $\sum_{(i,j)\in T} \|x_i - x_j\|_p$, where $p \in [1,2]$,
up to a factor of $\tilde{O}(\log n)$ using space $polylog(n,d,N)$. For EMD, we are given two sets A, B of n points in $[N]^d$, and we give a linear sketch
that can approximate the Earth Mover Distance of A and B in $\ell_p$ (the cost of a minimum-cost matching between A and B) up to a multiplicative factor of $\tilde{O}(\log n)$
and additive error $\epsilon \cdot Nnd$ using space $poly(1/\epsilon, \log(nd))$ (the additive error can be removed with two rounds of sketching). These linear sketches
give corresponding streaming algorithms.
Prior work gave sketches that were tailored to low-dimensional spaces (their space complexity was exponential in the dimension), or
suffered approximation factors that degraded as the dimension increased. I'll survey what's known about these two problems, and explain
the main idea behind our new sketches. Namely, we revisit the method of tree embeddings in the context of these sketching algorithms,
and give a new "data-dependent" tree embedding which doesn't suffer a degrading distortion as the dimension increases. As we go,
I'll make sure to highlight a lot of really exciting open problems related to MST and EMD, as well as geometric sketching/streaming
algorithms more broadly.
Joint work with Xi Chen, Rajesh Jayaram, and Amit Levi.
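For a sense of what a (data-oblivious) tree embedding buys, here is a minimal Python sketch of the classic randomly shifted grid estimate for EMD in two dimensions, whose distortion is the O(log N)-type factor that the data-dependent embedding in the talk improves on; the constants below are illustrative:

```python
import random
from collections import Counter

def emd_grid_estimate(A, B, N):
    """Randomly shifted grid (quadtree) estimate of EMD(A, B) for equal-size
    point sets in [N]^2: at each scale, points unmatched within a cell must
    cross a cell boundary, costing about the cell width.
    """
    sx, sy = random.randrange(N), random.randrange(N)
    est, width = 0.0, 1
    while width <= 2 * N:
        cells_a = Counter(((x + sx) // width, (y + sy) // width) for x, y in A)
        cells_b = Counter(((x + sx) // width, (y + sy) // width) for x, y in B)
        surplus = sum(abs(cells_a[c] - cells_b[c])
                      for c in cells_a.keys() | cells_b.keys())
        est += width * surplus / 2
        width *= 2
    return est

A = [(1, 1), (10, 10)]
B = [(2, 1), (10, 12)]
print(emd_grid_estimate(A, B, 16))   # same order as the true EMD, which is 3
```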
Posters
Tuesday, August 24th, 2-2:30 pm ET:
- Mitali Bafna (Harvard): Optimal Fine-grained Hardness of Approximation of Linear Equations
- Sabyasachi Basu (UC Santa Cruz): The Complexity of Testing All Properties of Planar Graphs, and the Role of Isomorphism
- Alex Block (Purdue): Private and Resource-Bounded Locally Decodable Codes for Insertions and Deletions
- Justin Chen (MIT): Worst-Case Analysis for Randomly Collected Data
- Tyler Chen (Washington): Simple Algorithms for Spectral Sum and Spectrum Approximation
- Zhili Feng (Carnegie Mellon): Dimensionality Reduction for the Sum-of-Distances Metric
- Mehrdad Ghadiri (Georgia Tech): Fast Low-Rank Tensor Decomposition by Ridge Leverage Score Sampling
- Praneeth Kacham (Carnegie Mellon): Reduced Rank Regression with Operator Norm Error
- Iden Kalemaj (Boston University): Sublinear-Time Computation in the Presence of an Online Adversary
- John Kallaugher (UT Austin): Separations and Equivalences between Turnstile Streaming and Linear Sketching
- Young-San Lin (Purdue): Online Directed Spanners and Steiner Forests
- Arvind Mahankali (Carnegie Mellon): Streaming and Distributed Algorithms for Robust Column Subset Selection
- Raphael Meyer (NYU): Hutch++: Optimal Stochastic Trace Estimation
- Tamalika Mukherjee (Purdue): Differentially Private Sublinear-Time Clustering
- Shyam Narayanan (MIT): Learning-Based Support Estimation in Sublinear Time
- Aditya and Advait Parulekar (UT Austin): L1 Regression with Lewis Weights Subsampling
- Edward (Ted) Pyne (Harvard): Local Access to Random Walks
- Archan Ray (UMass Amherst): Estimating Eigenvalues of Symmetric Indefinite Matrices
- Raghuvansh Saxena (Princeton): Near-Optimal Two-Pass Streaming Algorithm for Sampling Random Walks over Directed Graphs
- Ayush Sekhari (Cornell): SGD: The Role of Implicit Regularization, Batch-size and Multiple-epochs
- Supratim Shit (Technion): Nonparametric Coreset for Clustering
- Sandeep Silwal (MIT): Adversarial Robustness of Streaming Algorithms through Importance Sampling
- Kevin Tian (Stanford): Recent Advances in List-Decodable Mean Estimation
- Arsen Vasilyan (MIT): On Learning Monotone Probability Distributions over the Boolean Cube
- Santhoshini Velusamy (Harvard): How Well Can We Approximate CSPs in Streaming Settings?
- Chen Wang (Rutgers): Streaming Algorithms for Coin Tossing, Noisy Comparisons, and Multi-Armed Bandits
- Pruthuvi Maheshakya Wijewardena (Utah): Additive Error Guarantees for Weighted Low Rank Approximation
- Taisuke Yasuda (Carnegie Mellon): Exponentially Improved Dimension Reduction in L1
- Fred Zhang (Berkeley): Robust and Heavy-Tailed Mean Estimation Made Simple, via Regret Minimization
- Lichen Zhang (Carnegie Mellon): Fast Sketching of Polynomial Kernels of Polynomial Degree
Support
WALD(O)'21 is generously supported by the LEarning and Algorithms for People and Systems group and the National Science Foundation. Web design by Pedro Paredes.