Tutorials will have a duration of 3 hours and will take place at the conference venue on Sunday 4th and Monday 5th, June 2023.
Sun Jun 4, AM
Feng Yin (The Chinese University of Hong Kong, Shenzhen), Lei Cheng (Zhejiang University), Sergios Theodoridis (National and Kapodistrian University of Athens)
Motivation and Significance
Sparsity-aware learning has been the focus of scientific research for over twenty years. However, modern applications in the data era have posed new challenges such as overfitting avoidance, scalability, and uncertainty quantification, among others. In the realm of machine learning and signal processing, two major paths have survived for extracting information from data: the path through optimization-based methods (sometimes called Frequentist methods) and the path through Bayesian methods. A major difference between the two paths lies in the way regularization is embedded into the problem. In optimization-based methods, one first has to decide on the specific regularization function to be used and then search for the optimal configuration of the associated regularization parameters. It is widely acknowledged that the regularization parameters need fine-tuning, which can be carried out either through cross-validation or via a subjective choice by the user. In contrast, under the Bayesian philosophy, one only has to adopt a proper prior, and all the related information for both model selection and model parameter learning can be acquired from the training data.
Although extensions to some cutting-edge data-adaptive models have been extensively investigated in the context of optimization-based methods, it is only very recently that Bayesian sparsity-promoting techniques, including the joint design of priors and inference algorithms, have started to attract significant attention. In particular, we will focus on sparsity-aware Bayesian learning for three popular models, namely deep neural networks (DNNs), Gaussian processes (GPs), and tensor decompositions. The challenges that we need to conquer are twofold: 1) The art of prior: how can the sparsity-promoting priors in simple models be extended and tailored to fit such modern data-driven models with complex patterns? 2) The art of inference: how can modern optimization theory and stochastic approximation techniques be leveraged to design fast, accurate, and scalable inference algorithms?
In recent years, various breakthroughs on the above models and challenges have been achieved. In particular, in supervised learning involving over-parameterized deep neural networks, novel data-driven mechanisms have been proposed to intelligently prune redundant neuron connections without human assistance. In a similar vein, sparsity-promoting priors have been used in the context of Gaussian processes to automatically produce optimal and interpretable kernels, resulting in enhanced generalization performance. On the unsupervised learning front, advances in tensor decompositions have shown that sparsity-promoting priors can unravel the few underlying source signals from their mixed observations in a completely tuning-free fashion. Such techniques have found various applications in modern data processing, including data classification, time-series data prediction, blind source separation, image completion, and wireless communications.
Timeliness and Novelty
It is encouraging to see that Bayesian learning has returned to the central stage of the AI community in recent years due to the ever-increasing interest in learning with uncertainties, scalable methods, meta-learning with small (non-stationary) data, active learning, continual learning, etc. (Z. Ghahramani, NeurIPS keynote 2016; N. Lawrence, NeurIPS tutorial 2017; D. Dunson, NeurIPS 2018; E. Khan, NeurIPS tutorial 2019; S. Theodoridis, ICASSP Plenary 2022). Inspired by these developments, we see great potential in applying advanced Bayesian models to more and more intelligent signal processing, wireless communications, and general data modeling tasks. The goal of this tutorial is to provide a timely and comprehensive review, together with hands-on implementations, of sparsity-aware Bayesian learning techniques in the context of three frontline modeling trends, namely over-parameterized deep neural networks, Gaussian processes, and tensor decompositions. We will introduce, on the one hand, various newly proposed sparsity-promoting priors, as well as some salient ones that, although powerful and generic, had never been used before in the above models, and, on the other hand, some recent developments in tailored inference algorithms. These two fundamental challenges, namely the art of prior and the art of inference, have been rigorously tackled in a number of recent works. However, there is still no tutorial that gives a unified treatment of the underlying key ideas and techniques, possibly due to the wide variety of data tools and applications involved. This tutorial aims to present, in a unified way, a broad range of sparsity-promoting priors and the associated inference techniques for generic signal processing and machine learning tasks. We will articulate the inherent mechanisms using easy-to-understand wording and abundant graphical illustrations. In addition, well-selected experimental studies will help the audience from the signal processing community grasp the essence of modern sparsity-aware Bayesian learning as a powerful and promising alternative to optimization-based methods.
Outline of the Tutorial
In this tutorial, we will take a pedagogical approach to give an overview of the motivation, fundamentals, and recent advances of sparsity-aware Bayesian learning. A detailed outline is given as follows. A tentative schedule for our tutorial is given in Table 1 below.
- Part I: Gentle Introduction to Bayesian Learning Basics. In this part, we will first briefly touch on the philosophy of Bayesian learning and then use Bayesian linear regression as an example to elucidate the notation, terminology, and unique features of Bayesian learning.
- Part II: The Art of Prior in Sparsity-Aware Bayesian Learning. This part will be devoted to exemplifying how the sparsity-promoting priors can be incorporated into modern data processing tools, including deep neural networks (DNN), Gaussian processes (GP), and tensor decompositions (TD) models. More precisely, in this part,
- we will survey the design of Gaussian scale mixture prior for DNN and TD, the Indian buffet process prior for DNN, and the spectral mixture kernel prior for GP;
- we will further analyze the sparsity-promoting properties of such priors in these models, which will naturally lead us to the fundamental trade-off between the expressive power of a statistical model and the tractability of the associated inference problems.
- Part III: The Art of Inference in Sparsity-Aware Bayesian Learning. To enable efficient inference for modern data processing, in this part:
- we will show that different inference tasks can be unified as a common evidence maximization problem, and introduce popular relaxation/approximation techniques such as (stochastic/sparse) variational approximation and smooth transform approximation;
- we will show how modern optimization techniques, including alternating direction method of multipliers (ADMM), inexact block coordinate descent method (BCD), and stochastic optimization, can be integrated into the realm of Bayesian inference.
- Part IV: Signal Processing and Machine Learning Applications.
In this part, we will present a number of experimental studies to show the superior automatic model learning capability of the introduced algorithms. The selected applications will include, among others,
- Adversarial learning in data classification using sparsity-aware DNN;
- Time-series prediction using sparsity-aware GP regression;
- Image completion using sparsity-aware TD;
- Social group clustering using sparsity-aware TD.
- Part V: Concluding Remarks and Future Directions. We will conclude the tutorial, summarize the key outcomes, and discuss potential research directions.
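As a minimal sketch of the Bayesian linear regression example named in Part I, and of the evidence that Part III proposes to maximize, the following Python code computes the Gaussian posterior over the weights and the log marginal likelihood. The zero-mean isotropic Gaussian prior with precision `alpha`, the Gaussian noise with precision `beta`, and the function names are assumptions made for illustration; they are not taken from the tutorial material, and no sparsity-promoting prior is used here.

```python
import numpy as np

def blr_posterior(Phi, y, alpha, beta):
    """Posterior N(mean, cov) over weights for y = Phi @ w + noise.

    Assumed model: prior w ~ N(0, I / alpha), noise ~ N(0, 1 / beta).
    """
    A = alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi  # posterior precision
    cov = np.linalg.inv(A)
    mean = beta * cov @ Phi.T @ y
    return mean, cov

def log_evidence(Phi, y, alpha, beta):
    """Log marginal likelihood log p(y | alpha, beta) of the same model."""
    n, m = Phi.shape
    mean, _ = blr_posterior(Phi, y, alpha, beta)
    A = alpha * np.eye(m) + beta * Phi.T @ Phi
    # Regularized data-fit term evaluated at the posterior mean.
    fit = 0.5 * beta * np.sum((y - Phi @ mean) ** 2) + 0.5 * alpha * mean @ mean
    _, logdet = np.linalg.slogdet(A)
    return (0.5 * m * np.log(alpha) + 0.5 * n * np.log(beta)
            - fit - 0.5 * logdet - 0.5 * n * np.log(2.0 * np.pi))
```

Comparing `log_evidence` over a grid of `alpha` values is one simple way to pick the regularization strength from the training data alone, without cross-validation.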
Yuejie Chi (Carnegie Mellon University), Zhize Li (Carnegie Mellon University)
Rationale for the tutorial: The proliferation of multi-agent environments in emerging applications such as internet-of-things (IoT), networked sensing, and autonomous systems has led to a flurry of activity in developing federated and decentralized optimization algorithms for training predictive models, particularly in the realm of federated learning (FL). The distinctive features of these large-scale distributed systems pose unique challenges that are not well captured by the classical distributed optimization framework, and have therefore spurred a significant amount of recent algorithmic development focusing on the following aspects:
- Resource efficiency: the ever-growing scale and high dimensionality of the datasets call for algorithms that are resource-efficient in terms of both communication and computation.
- Resiliency to heterogeneity: Data samples collected from different agents can be highly unbalanced and heterogeneous, where vanilla federated optimization algorithms (e.g. FedAvg) can converge very slowly or even diverge, and better algorithm designs are called for to handle the heterogeneity issue.
- Privacy preservation: While FL holds great promise for harnessing the inferential power of private data stored on a large number of distributed clients, the local data at clients often contain sensitive or proprietary information that the clients have not consented to share. It is thus desirable for federated optimization algorithms to preserve privacy in a guaranteed manner.
Our goal is to equip signal processing researchers with the core toolkits and recent advances of federated optimization and inspire the pursuit of further theory, algorithms, and applications from the signal processing community on this multi-disciplinary and fast-growing topic. Given the popularity of FL in various signal processing applications, we expect this tutorial will be very timely and attract a large audience. Last but not least, our theme on FL is closely related to building AI systems, and therefore fits very well with the conference’s theme on AI this year.
Tutorial abstract and outline: The proposed tutorial will systematically cover recent advances in federated optimization, highlighting algorithmic ideas that enable resource efficiency, resiliency, and privacy in both server-client and network settings, and addressing multiple FL paradigms including but not limited to horizontal, vertical, and personalized FL. In particular, the primary focus will be on the nonconvex setting, which is more important for modern machine learning applications. The structure of the tutorial is tentatively outlined below. It is worth emphasizing that the algorithms mentioned below are examples of what we intend to cover/illustrate and should not be taken as an exhaustive list.
- We will begin with an introduction to federated learning with its various popular variants, and discuss the unique challenges associated with federated optimization.
- Efficient federated optimization: we will discuss resource-efficient, and in particular communication-efficient, federated optimization algorithms, including algorithms that perform multiple local updates (which aim to reduce the number of communication rounds) and communication compression (which aims to reduce the communication cost per round).
- Resilient federated optimization: we will discuss the vulnerability of federated optimization in the presence of data heterogeneity, together with efficient algorithmic solutions to provably overcome these limitations.
- Private federated optimization: we will highlight the necessity of privacy guarantees and notions of privacy measures, followed by algorithm developments.
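The two communication-saving ideas in the outline above, multiple local updates and compressed communication, can be sketched together in a few lines of Python. This is an illustrative toy under stated assumptions, not one of the algorithms the tutorial covers: the least-squares client objectives, the top-k sparsifier, the synthetic data, and all function names and default parameters are hypothetical choices.

```python
import numpy as np

def top_k(v, k):
    """Compression: keep the k largest-magnitude entries of v, zero the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def local_updates(w, X, y, lr, steps):
    """Several full-batch gradient steps on one client's least-squares loss."""
    w = w.copy()
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / len(y)
    return w

def global_loss(w, clients):
    """Average of the clients' least-squares losses."""
    return float(np.mean([0.5 * np.mean((X @ w - y) ** 2) for X, y in clients]))

def fedavg_topk(clients, dim, rounds=50, local_steps=5, lr=0.05, k=3):
    """FedAvg-style loop: each client sends a top-k sparsified model update."""
    w = np.zeros(dim)
    losses = [global_loss(w, clients)]
    for _ in range(rounds):
        # Each client starts from the current global model and works locally.
        updates = [top_k(local_updates(w, X, y, lr, local_steps) - w, k)
                   for X, y in clients]
        w = w + np.mean(updates, axis=0)  # server averages compressed updates
        losses.append(global_loss(w, clients))
    return w, losses

def make_clients(num_clients=4, n=50, dim=10, seed=0):
    """Synthetic clients sharing one ground-truth model (assumed for the demo)."""
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=dim)
    clients = []
    for _ in range(num_clients):
        X = rng.normal(size=(n, dim))
        clients.append((X, X @ w_true + 0.1 * rng.normal(size=n)))
    return clients
```

Calling `fedavg_topk(make_clients(), dim=10)` returns the final model and the loss after each round. Raising `local_steps` trades communication rounds for local computation, while lowering `k` shrinks each message at the cost of a coarser update.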
No Touch Needed: Contact-Free Physiological Sensing for Fitness and Healthcare Using Cameras and RF Signals
Wenjin Wang (Southern University of Science and Technology (SusTech)), Chenshu Wu (University of Hong Kong), Chau-Wai Wong (NC State University), Min Wu (University of Maryland)
The Rationale of the Tutorial
Contact-free sensing via cameras and radio frequency (RF) enables continuous and comfortable monitoring of vital signs and physiological signals without the need for physical contact with the human body; it can also support social distancing and lower the risk of contact-based infection. Contact-free sensing can facilitate a rich set of healthcare and fitness applications, including in-hospital care units (such as intensive/neonatal care), automatic gating/triggering for medical imaging (MRI/CT), sleep centers, assisted living and senior centers, home-based baby and elderly care, fitness and sports, automotive driver monitoring, and AR/VR based therapy. Contact-free monitoring may also help mitigate a future epi-/pandemic via remote triage and home-based self-monitoring.
Timeliness: Techniques in contact-free sensing via cameras and RF have been rapidly developing over the past decade and have attracted attention from academia, funding agencies, industries, and healthcare professionals. Many design principles of related algorithms are signal-processing based, but contact-free sensing has not been systematically introduced or embraced by the broader signal-processing community at ICASSP. This proposed tutorial will be a timely opportunity to engage signal-processing researchers and industry technologists in this promising yet technically challenging area.
Novelty: Contact-free sensing applications almost always face very low SNR conditions. For example, in camera-based sensing, glare on human skin and subject motion may result in the signal of interest having a smaller magnitude than that of unwanted artifacts. Similarly, in RF sensing, reflected signals due to subtle breathing/heartbeat motions are extremely weak, while RF signals usually experience significant noise and complex multipath effects. Various signal processing and machine learning techniques have been tailored to extract signals of interest in very low SNR conditions. This tutorial will cover both fundamental principles and more sophisticated real-world applications, using both signal processing and deep learning approaches.
Importance and Appeal to the SP community: Contact-free physiological sensing by visual and/or RF modalities is a suite of increasingly popular cross-disciplinary research areas that such sister communities as biomedical engineering, computer vision, communications, and mobile computing have been embracing with open arms in recent years. Yet the key building blocks touch multiple technical areas of the SP community—it is truly “Signal Processing Inside” and a perfect fit for the ICASSP audience. The challenging signal processing scenarios encountered in contact-free sensing may remind SP researchers of similar problems they have encountered in other research topics such as denoising and multimedia forensics. The proposed tutorial will raise more awareness of the exciting R&D opportunities in the SP community on contact-free physiological sensing, stimulate discussions and explorations, and help the SP community play strong roles in this emerging area. The vital roles of signal processing to be discussed in the tutorial can help students appreciate the importance of signal processing through timely and appealing examples.
Detailed Description and Topic Outline of the Proposed Tutorial
Physiological signals such as heart rate, heart rate variability, respiration rate, and blood oxygen saturation play important roles in our lives. This tutorial will focus on camera- and RF-based contact-free sensing techniques that enable convenient and ubiquitous physiological sensing. Respective applications in fitness and healthcare will be covered. Below is a preliminary outline that we envision for the ICASSP audience:
Part 1. Fundamentals and techniques for camera-based physiological sensing (led by Wenjin Wang)
1.1 Pulse rate: different measurement principles and models (blood absorption, BCG motion); core algorithms for pulse extraction (physiological model based); solutions to improve robustness (multi-site measurement, distortion-based optimization, robust multi-frequency tracking, motion estimation and compensation, etc.); RGB and IR setups (multi-wavelength cameras, time-sequential cameras, RGB2NIR systems, designed light source, auto-camera control);
1.2 Respiration rate: different measurement principles (motion based, temperature based); core algorithms for respiratory signal extraction (optical flow, profile correlation, pixel flow); limitations and sensitivities (body motion);
1.3 Blood oxygen saturation: different multi-wavelength settings (red-IR, full-IR); core algorithms for SpO2 signal extraction; solutions to improve robustness (parallax reduction between cameras, wavelength selection, etc.);
1.4 Blood pressure: multi-wavelength pulse transit time based blood pressure, multi-site pulse transit time based blood pressure, camera-PPG waveform based blood pressure;
1.5 Clinical trials in hospital units such as NICU and ICU.
Part 2. Robust and privacy-aware physiological sensing (led by Chau-Wai Wong & Min Wu)
2.1 Physiological signal extraction under low SNR conditions: Micro-signal extraction strategies; fitness motion handling; robust tracking of multiple weak frequency traces.
2.2 Privacy-aware physiological sensing: Privacy protection: PulseEdit, identity-preserving transformation; adversarial manipulation.
2.3 PPG-to-ECG inference: Biophysical linkage between ECG and PPG; principled vs. data-driven approaches.
Part 3. Physiological sensing in the dark and through occlusion via RF signals (led by Chenshu Wu)
3.1 Principles and challenges of RF sensing; WiFi sensing vs. FMCW radar sensing; CSI introduction.
3.2 Core techniques for WiFi sensing (motion estimation, speed monitoring, breathing rate estimation); applications of WiFi sensing (wellbeing monitoring, fall detection, gait recognition, sleep monitoring).
3.3 Data-driven wireless sensing: datasets and model.
Audrey Repetti (Heriot Watt University), Nelly Pustelnik (CNRS), Jean-Christophe Pesquet (CentraleSupélec)
Since the early 2000s, two major trends have strongly impacted signal and image processing methods, including those for solving inverse problems: (i) sparsity and proximal algorithms and (ii) deep learning. Both are founded on fixed point strategies, which provide a simplified and unified framework to model, analyze, and solve a great variety of problems.
This tutorial will summarize most of the recent developments on this subject, focusing mostly on applications to imaging inverse problems.
The first part of the tutorial will be dedicated to splitting proximal algorithms. They form a particular class of fixed point strategies that make it possible to handle sums of not necessarily smooth composite functions. For more than two decades, these methods, associated with Bayesian or compressive sensing frameworks, have been state-of-the-art for solving inverse problems in signal and image processing. They usually aim at solving variational problems, including minimization problems and variational inequalities, usually favoring analysis or synthesis sparsity, and possibly taking into account feasibility constraints. During the past decade, numerous splitting proximal algorithms have been developed in an attempt to solve such problems efficiently. They operate by dividing problems into individual components that can be handled independently (either sequentially or in parallel) during the iterative process.
The first part of the tutorial will be dedicated to a concise overview of the vast literature on splitting methods (forward-backward, Douglas-Rachford, primal-dual schemes, and ADMM, including inertial acceleration, block approaches, and extensions to non-convex optimization).
During the last decade, neural networks have become ubiquitous for solving most signal and image processing problems. Traditionally, neural networks were mostly seen as black boxes. Recently, however, they have been paired with optimization techniques, both to gain a better theoretical understanding and to handle more complicated tasks, including inverse problems. In this context, two main frameworks have emerged, inspired by fixed point strategies: unfolded neural networks, where the architecture of the network mimics a finite number of iterations of an optimization algorithm, and plug-and-play (PnP) algorithms, where neural networks are substituted for nonlinear steps involved in iterative optimization algorithms.
On the one hand, unfolded networks, such as the pioneering LISTA, which is based on unrolled forward-backward iterations, alternate between a linear transform and a nonlinear step (proximal step). Since LISTA, more advanced proximal algorithm structures have been used to design more flexible and versatile networks. Although the design of unfolded networks is grounded in fixed point theory, only a part of these schemes can be interpreted as such.
On the other hand, PnP approaches usually replace the proximal operator of standard proximal algorithms (e.g. forward-backward or ADMM) by a more powerful denoising network. Recent works show that, by training the network to satisfy some Lipschitz properties, the resulting PnP algorithm converges to a solution of a monotone variational inequality. Both approaches have shown excellent performance in image processing.
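To make the alternation between a linear step and a proximal step concrete, here is a minimal Python sketch of the forward-backward iteration for the l1-regularized least-squares problem, min over x of 0.5 * ||A x - y||^2 + lam * ||x||_1. The problem choice, the fixed step size 1/L, and the function names are assumptions made for illustration. An unfolded network such as LISTA would keep this layer structure but learn the linear operators and thresholds from data, and a PnP scheme would replace `soft_threshold` with a denoising network; neither variant is implemented here.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (the nonlinear step)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def forward_backward(A, y, lam, n_iter=200):
    """Forward-backward (ISTA) iterations for l1-regularized least squares."""
    L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the smooth gradient
    x = np.zeros(A.shape[1])
    objective = []
    for _ in range(n_iter):
        z = x - A.T @ (A @ x - y) / L      # forward (gradient, linear) step
        x = soft_threshold(z, lam / L)      # backward (proximal) step
        objective.append(0.5 * np.sum((A @ x - y) ** 2) + lam * np.sum(np.abs(x)))
    return x, objective
```

Each pass through the loop corresponds to one "layer" in the unfolded view: an affine map of the current iterate followed by a pointwise nonlinearity.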
The second part of the tutorial will be focused on how neural networks and splitting optimization algorithms can be paired to create very powerful tools for signal and image processing.
Numerical performance in the context of imaging inverse problems will be illustrated through several real applications studied by the authors.
Importance, timeliness, and novelty:
Fixed point theory makes it possible to present most of the recent advances in data science in a unifying framework. Despite the numerous contributions on this subject in the signal and image processing community, no tutorial on it has been proposed in the past years.
Zhijin Qin (Tsinghua University)
The rationale for the tutorial
Shannon and Weaver categorized communications into three levels:
- Level A. How accurately can the symbols of communication be transmitted?
- Level B. How precisely do the transmitted symbols convey the desired meaning?
- Level C. How effectively does the received meaning affect conduct in the desired way?
In the past decades, researchers have primarily focused on Level A communications. With the development of cellular communication systems, the achieved transmission rate has improved by tens of thousands of times, and the system capacity is gradually approaching the Shannon limit. Semantic communications have been regarded as a promising direction to improve system efficiency and reduce data traffic, so as to realize Level B or even Level C communications. Semantic communications aim to successfully transmit the semantic information that is relevant to the transmission task at the receiver.
It has been demonstrated recently that deep learning (DL) has great potential to break the bottleneck of conventional communication systems, especially when it is applied to the design of semantic communication systems, which could revolutionize communication systems. In this tutorial, we will give an overview of semantic communications, including how they differ from typical communication systems and which performance metrics they use. We will also introduce our recent research on deep learning enabled semantic communications, that is, communications beyond the Shannon paradigm.
In the past decades, communications research has primarily focused on how to accurately and effectively transmit symbols (measured in bits) from the transmitter to the receiver, with bit-error rate (BER) or symbol-error rate (SER) usually taken as the performance metric. With the development of cellular communication systems, the achieved transmission rate has improved by tens of thousands of times, and the system capacity is gradually approaching the Shannon limit. According to IDC, the global amount of data will increase from 33 ZB in 2018 to 175 ZB in 2025, which creates a serious bottleneck for existing communication systems, as massive data transmission requires wireless connectivity while spectrum resources are scarce.
To further improve system efficiency and reduce data traffic, semantic communications have been regarded as a promising direction. They are recognized as the second level of communications by Shannon and Weaver, in addition to typical communications, which focus on the successful transmission of symbols. Semantic communications aim to realize the successful exchange of semantic information beyond the transmission of bit sequences or symbols. Unlike typical source coding methods, only the information relevant to the transmission goal is captured and transmitted, which reduces the data traffic significantly.
In this tutorial, we will first introduce the concept of semantic communication and highlight its key differences from typical communications. We then detail the general model and performance metrics of semantic communications. Afterwards, we will present the latest work on deep learning enabled semantic communications for text, speech, and image transmission. By employing a semantic encoder and a channel encoder and designing them jointly, a semantic communication system can achieve a significant performance improvement in terms of semantic information exchange. In addition, such massive amounts of data are usually high-dimensional, multimodal, and distributed, and need to be exchanged in an efficient, effective, and timely manner. We will provide a unified semantic communication structure to support multimodal data transmission for multiple tasks.
This tutorial will introduce the new idea of semantic communications and show how deep learning can be used to facilitate their design, which will be beneficial to researchers in the signal processing community. The intended audience includes PhD students, postdocs, and researchers with a general background in machine learning and wireless signal processing.
A detailed description of the tutorial outlining the topics and subtopics covered.
- Introduction: what is semantic communication? (10 minutes)
- Semantic Communication Basis (40 minutes)
- General Model of Semantic Communications
- Design Principles and Performance Metrics
- Remarks and Questions-and-Answers
- DL enabled Semantic Communications (40 minutes)
- Semantic Communications for Text, Speech, and Image
- Semantic Communications for Multimodal Data
- Remarks and Questions-and-Answers
- Unified Semantic Communications (40 minutes)
- Multi-User Semantic Communications
- Unified Semantic Communication Structure
- Remarks and Questions-and-Answers
- Research Challenges and Conclusions (30 minutes)
Panagiotis Traganitis (Michigan State University), Georgios B. Giannakis (University of Minnesota)
Crowdsourcing has emerged as a powerful paradigm for tackling various machine learning, data mining, and data science tasks, by enlisting inexpensive crowds of human workers, or annotators, to accomplish a given learning and inference task. While conceptually similar to distributed data and decision fusion, crowdsourcing seeks to not only aggregate information from multiple human annotators or unreliable (a.k.a. weak) sources, but to also assess their reliabilities. Thus crowdsourcing can be readily adapted to information fusion tasks in unknown or contested environments, where data may be provided from unreliable and even adversarial agents. The overarching goal of this tutorial is a unifying framework for learning from unreliable (or “weak”) information sources, while being resilient to adversarial attacks.
Focusing on the classification task, exposition will start with classical tools for crowdsourced label aggregation, that simultaneously infer annotator reliabilities and true labels. Contemporary methods that leverage the statistical moments of annotator responses will be presented next. Building on the aforementioned models, a host of approaches that deal with data dependencies, including dynamic, networked data, and Gaussian Process-, as well as Deep Learning-based tools will be presented. Finally, approaches that can identify coalitions of colluding adversaries will be presented. Impact of the unified framework will be demonstrated through extensive synthetic and real-data tests.
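The idea of jointly inferring annotator reliabilities and true labels can be illustrated with a minimal Python sketch for binary labels. It uses a simplified "one-coin" annotator model (each annotator is correct with a single unknown probability) fitted by expectation-maximization, in the spirit of classical label-aggregation tools. The model, the uniform label prior, the majority-vote initialization, and the function names are assumptions made for illustration; this is not the presenters' framework.

```python
import numpy as np

def majority_vote(R):
    """R has shape (annotators, items), entries in {-1, +1}; ties go to +1."""
    return np.where(R.sum(axis=0) >= 0, 1, -1)

def one_coin_em(R, n_iter=20):
    """Jointly estimate labels and per-annotator accuracy with EM.

    Assumes every annotator labels every item and a uniform prior on labels.
    """
    q = (R.mean(axis=0) + 1.0) / 2.0  # soft majority vote: P(label = +1)
    votes_pos = (R == 1)
    for _ in range(n_iter):
        # M-step: accuracy = expected fraction of items the annotator got right.
        acc = (votes_pos * q + (~votes_pos) * (1.0 - q)).mean(axis=1)
        acc = np.clip(acc, 1e-3, 1.0 - 1e-3)  # keep the log-odds finite
        # E-step: reliability-weighted vote gives the posterior log-odds.
        weights = np.log(acc / (1.0 - acc))
        q = 1.0 / (1.0 + np.exp(-(weights @ R)))
    return np.where(q >= 0.5, 1, -1), acc
```

An annotator whose estimated accuracy falls below 0.5 receives a negative weight, so consistently wrong (for example, adversarial) answers are flipped rather than merely ignored, which plain majority voting cannot do.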
Timeliness and intended audience:
Contemporary machine learning (ML) and artificial intelligence (AI) advances yield impressive results in a variety of tasks, thanks to training on massive labeled datasets. Nevertheless, curating such large datasets can take a long time and may also incur prohibitive costs. The continued proliferation of larger ML and AI models calls for approaches that reduce the need for ground-truth labels. Crowdsourcing offers an efficient low-cost option to label massive datasets, by querying crowds of (possibly non-expert and even unreliable) human annotators and judiciously combining their responses, thereby facilitating training of large ML/AI models.
At the same time, crowdsourcing offers a fresh perspective on data and decision fusion: under the crowdsourcing regime, sensors or information sources are allowed to be unreliable or to have unknown noise, and these sources of unreliability are estimated. The combination of crowdsourcing and data fusion is further realized in the field of crowdsensing, where mobile devices of human crowds can act as sensors in a distributed system.
The target audience includes graduate students and researchers with a basic background in machine learning and statistical signal processing, and an interest in data science as well as data fusion for learning and decision-making problems. The audience will become familiar with state-of-the-art approaches to crowdsourcing and decision fusion under various scenarios, including the presence of adversaries; will obtain an in-depth understanding of their merits and the key technical challenges involved; and will be able to leverage their potential on a spectrum of learning tasks.
Outline of tutorial content:
The tutorial is organized as follows.
Part I. Introduction: Context, motivation, and timeliness. (20 min)
Part II. Crowdsourced classification: (i) Annotator models for label aggregation; (ii) Probabilistic and Bayesian algorithms for label aggregation; (iii) Moment-matching approaches. (45 min)
Part III. Data-aware crowdsourcing: (i) models for sequential and networked data; (ii) Gaussian Process and Deep Learning algorithms. (30 min)
Part IV. Adversarially-robust crowdsourcing: (i) Identifying spammer annotators; (ii) Identifying arbitrary adversaries. (45 min)
Part V. Open issues: Challenging yet promising research directions. (15 min)
Sun Jun 4, PM
Parameter-Efficient Learning for Speech and Language Processing: Adapters, Prompts, and Reprogramming
Pin-Yu Chen (IBM Research), Hung-yi Lee (National Taiwan University), Chao-Han Huck Yang (Georgia Institute of Technology), Kai-Wei Chang (National Taiwan University), Cheng-Han Chiang (National Taiwan University)
The Importance of the selected tutorial topic
With rising interest in using frozen pre-trained models for diverse downstream applications, how to design a performance-effective and parameter-efficient training framework is an open topic. Since several recently discovered techniques share similar design principles, we aim to provide an in-depth summary and a taxonomy of the differences among parameter-efficient learning modules. The topic presented is emerging as an essential pathway to designing foundation models for the research community.
Timeliness and tutorial outlining the topics and subtopics covered.
- Introduction and Motivation for Studying Parameter-Efficient Learning (50 mins)
To be presented by Dr. Huck Yang, Amazon Alexa
- Background: Large-scale Pre-trained and Foundation Models
- Definition of parameter-efficient learning
- Basics of Adapters, Prompts, Reprogramming, and LoRA
- Theoretical justification with Wasserstein Measurement
- Model reprogramming for time series, music, and multilingual ASR
- New Approaches on Neural Model Reprogramming (50 mins)
To be presented by Dr. Pin-Yu Chen, IBM Research AI
- From Adversarial ML to Model Reprogramming
- Reprogramming for Medical Images and DNA
- Reprogramming for Privacy and Fairness
- Parameter-Efficient Learning for Natural Language Processing (30 mins)
To be presented by Cheng-Han Chiang and Prof. Hung-yi Lee
- Various Adapters
- Various Prompting Techniques
- Parameter-Efficient Learning for Speech Processing (30 mins)
To be presented by Kai-Wei Chang and Prof. Hung-yi Lee
- Adapter for Speech
- Speech Prompt
- Conclusion and Open Questions (10 mins)
- Lessons learned: a signal processor wandering in the land of large-scale models
- Available resources and code for research in parameter-efficient learning
Potential research directions: Parameter-efficient learning has shown promising success together with pre-trained models in new and challenging applications, such as few-shot learning, cross-task adaptation (e.g., ASR to SLU), and cross-domain transfer learning (e.g., speech to ECG signals).
Novelty of the tutorial: This is the first tutorial to focus on resource- and parameter-efficient learning, providing a comprehensive overview of this new research direction.
How it can introduce new ideas, topics, and tools to the SP community: The tutorial organizers come from established research groups in this new area, covering most perspectives and applications. We will provide reproducible benchmarks and surveys as supporting material to the ICASSP audience for this tutorial.
Osvaldo Simeone (King’s College London)
In the current noisy intermediate-scale quantum (NISQ) era, quantum machine learning is emerging as a dominant paradigm to program gate-based quantum computers. In quantum machine learning, the gates of a quantum circuit are parametrized, and the parameters are tuned via classical optimization based on data and on measurements of the outputs of the circuit. Parametrized quantum circuits (PQCs) can efficiently address combinatorial optimization problems, implement probabilistic generative models, and carry out inference (classification and regression). They can be implemented on actual quantum computers via cloud-based interfaces accessible through several software libraries, such as IBM’s Qiskit, Google’s Cirq, and Xanadu’s PennyLane.
The emergence of quantum machine learning offers an opportunity for signal processing researchers to contribute to the development of quantum computing. In this context, this tutorial will provide a self-contained introduction to quantum machine learning for an audience of signal processing practitioners with a background in probability and linear algebra. It first presents the background, concepts, and tools necessary to describe quantum operations and measurements. Then, it covers parametrized quantum circuits, the variational quantum eigensolver (VQE), as well as unsupervised and supervised quantum machine learning formulations.
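As a minimal sketch of the loop described above (parametrized gate, measurement, classical parameter update), the following NumPy simulation tunes a single-qubit circuit. The RY gate, the Pauli-Z cost, the parameter-shift gradient, and all function names are illustrative assumptions rather than material from the tutorial; a real experiment would use one of the libraries named above instead of a hand-written simulator.

```python
import numpy as np

# Pauli-Z observable; its expectation value plays the role of the cost.
Z = np.diag([1.0, -1.0])


def ry(theta):
    """Matrix of a single-qubit RY(theta) rotation gate."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])


def expectation_z(theta):
    """Simulate RY(theta)|0> and return the exact expectation <Z>."""
    state = ry(theta) @ np.array([1.0, 0.0])
    return float(state @ Z @ state)


def parameter_shift_grad(theta):
    """Gradient of <Z> w.r.t. theta from two shifted circuit evaluations."""
    shift = np.pi / 2
    return 0.5 * (expectation_z(theta + shift) - expectation_z(theta - shift))


def train(theta=0.3, lr=0.4, steps=100):
    """Classical gradient descent on the circuit parameter (assumed settings)."""
    for _ in range(steps):
        theta -= lr * parameter_shift_grad(theta)
    return theta


theta_opt = train()
print(theta_opt, expectation_z(theta_opt))
```

In this toy setting the cost equals cos(theta), so the optimizer drives theta towards pi, where the qubit is measured in state |1> with certainty. On hardware, `expectation_z` would be replaced by an average over repeated measurements.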
The tutorial will be structured as follows:
- Qubits, quantum gates, and measurements
- Single qubit systems (45 minutes)
- Multi-qubit systems (45 minutes)
- Quantum computing and quantum machine learning
- Basics of quantum computing (30 minutes)
- Quantum machine learning: from the VQE to supervised learning (1 hour)
Jose Principe (University of Florida); Robert Jenssen (UiT – The Arctic University of Norway), Shujian Yu (UiT – The Arctic University of Norway)
The Rationale for the Tutorial
In recent years, Information-Theoretic Learning (ITL) has been exploiting the remarkable advantages of information-theoretic methods in solving various deep learning problems. Notable examples include: 1) using the Information Bottleneck principle to explain the generalization behavior of DNNs or to improve their adversarial robustness and out-of-distribution (OOD) generalization; 2) incorporating information-theoretic concepts (such as divergence and conditional mutual information) to learn causal and invariant representations, to quantify uncertainty, and to optimize the value of information in abstract tasks such as the exploitation-exploration dilemma in reinforcement learning.
“Information-Theoretic Learning” has always been a main subject area under the category “Machine Learning for Signal Processing” in previous ICASSP iterations. However, there have been no related tutorials so far. With the recent rapid development of advanced techniques at the intersection of information theory, machine learning, and signal processing, such as neural network-based mutual information estimators, deep generative models, and causal representation learning for time series, it is now a good time to deliver a tutorial on these topics. This tutorial will attract audiences from both academia and industry.
Topics and Subtopics in this Tutorial (This tutorial includes 5 sections)
Part I (50 minutes)
° Section I: Introduction and background (15 minutes)
➢ History of Information-Theoretic Learning (ITL) and its recent surge in deep learning
° Section II: Basic elements of Information Theory (35 minutes)
➢ Information-Theoretic quantities (entropy, mutual information, divergence) and their estimators
Part II (60 minutes)
° Section III: Information-Theoretic Learning principles
➢ Information Bottleneck: theory and applications
➢ Principle of Relevant Information: methods and applications
➢ Other learning principles (Information maximization, Correlation Explanation, etc.)
Part III (50 minutes)
° Section IV: Advancing deep learning problems by Information Theory (40 minutes)
➢ Self-supervised, semi-supervised and multi-view learning
➢ Domain adaptation and out-of-domain generalization
° Section V: Challenges and Open Research Directions (10 minutes)
➢ Theories, Principles, and Algorithms
➢ Broader Applications
Tianyi Chen (Rensselaer Polytechnic Institute), Zhangyang Wang (University of Texas at Austin)
Importance of the tutorial. Deep learning has achieved remarkable success in many machine learning (ML) tasks, such as image classification, speech recognition, and language translation. However, these breakthroughs are often difficult to translate into real-world applications, because deep learning faces new challenges when deployed in real-world settings. These challenges include the lack of training samples, the cost of iterative training, the cost of tuning hyperparameters, distribution shifts, and adversarial samples. The bilevel machine learning foundation in this tutorial offers a promising framework to address these challenges.
Timeliness of the tutorial. Recently, bilevel optimization (BO) and its applications to ML, encompassing learning-to-optimize (L2O), implicit/equilibrium models, meta-learning, and neural architecture search (NAS), have been gaining growing interest in the ML, signal processing, and optimization communities. For example, at NeurIPS 2022, roughly 100 papers were on topics related to BO, spanning both fundamental theory and applications. Our tutorial will cover the basic tools for solving BO, review statistical learning theory for analyzing the solution obtained by the empirical version of BO, and highlight promising applications.
Novelty of the tutorial. Existing efforts on BO applications such as L2O, NAS, and meta-learning have been mostly devoted to empirical improvements, many of which do not have theoretical guarantees. Without a solid theoretical foundation, one cannot assess the quality of the learned ML models from a certain number of samples. On the other hand, modern optimization algorithms can already tackle large-scale ML problems, but the majority of efforts have been made to solve problems with relatively simple structures. To tackle new challenges in ML, however, the resultant optimization problems often have hierarchically coupled structures and are typically formulated as BO. While BO dates back to von Stackelberg’s seminal work, developing scalable algorithms for ML and providing non-asymptotic guarantees are new.
Contributions to the SP community. This tutorial will stimulate new theoretical and algorithmic research on bilevel machine learning by experts in statistical and adaptive SP; it can also greatly benefit the use of bilevel learning in traditional SP application domains such as communications, image processing, and audio and speech processing; see two recent monographs on image processing and communications.
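For orientation, a generic bilevel problem and its implicit-gradient (hypergradient) formula can be written as follows. The symbols $x$, $y$, $F$, $G$, and $\Phi$ are notation assumed here for illustration, and the formula presumes that $G(x,\cdot)$ is strongly convex and twice differentiable, so that $y^*(x)$ is unique.

```latex
\begin{aligned}
&\min_{x}\ \Phi(x) := F\bigl(x, y^*(x)\bigr)
\quad \text{s.t.} \quad y^*(x) = \arg\min_{y} G(x, y), \\
&\nabla \Phi(x) = \nabla_x F\bigl(x, y^*(x)\bigr)
 - \nabla^2_{xy} G\bigl(x, y^*(x)\bigr)\,
   \bigl[\nabla^2_{yy} G\bigl(x, y^*(x)\bigr)\bigr]^{-1}
   \nabla_y F\bigl(x, y^*(x)\bigr).
\end{aligned}
```

Here $x$ is the upper-level variable (e.g., hyperparameters) and $y$ the lower-level variable (e.g., model weights). The second line follows from differentiating the lower-level optimality condition $\nabla_y G(x, y^*(x)) = 0$ with respect to $x$.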
Outline of Tutorial (total 180 mins)
- Part I – Introduction and Background (20 mins)
- New challenges of machine learning in recent years (5 mins)
- Bilevel optimization as a toolbox to address those challenges (5 mins)
- Comparison to conventional optimization for machine learning (10 mins)
- Part II – Applications of Bilevel Machine Learning (60 mins)
- Learning-to-optimize (L2O) such as model-based, model-free L2O (20 mins)
- Hyperparameter optimization such as neural architecture search (NAS) (20 mins)
- Meta-learning such as MAML, meta representation learning (20 mins)
- Part III – Optimization Methods for Bilevel Machine Learning (40 mins)
- History of bilevel optimization and its recent surge (5 mins)
- Implicit gradient-based methods for bilevel optimization (10 mins)
- Explicit gradient-based methods for bilevel optimization (15 mins)
- Value function-based methods for bilevel optimization (10 mins)
- Part IV – Generalization Theory for Bilevel Machine Learning (40 mins)
- Generalization bounds on popular (overparameterized) meta-learning methods (10 mins)
- Generalization and adaptation for L2O algorithms (10 mins)
- Generalization bounds on generic bilevel learning algorithms (10 mins)
- Sample complexity comparison with single-level learning algorithms (10 mins)
- Part V – Challenging yet Promising Open Research Directions (20 mins)
- Applications of L2O, hyperparameter optimization and meta-learning (10 mins)
- New theory and algorithms for bilevel optimization for machine learning (10 mins)
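To make the "implicit gradient-based methods" item in the outline above concrete, here is a small sketch that tunes a ridge-regression penalty as a bilevel problem. The choice of ridge regression, the validation loss, and all function names are assumptions made for illustration; the tutorial's own algorithms may differ.

```python
import numpy as np


def lower_solution(lam, X, y):
    """Lower level: w*(lam) = argmin_w 0.5*||Xw - y||^2 + 0.5*lam*||w||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def upper_loss(lam, X, y, Xv, yv):
    """Upper level: validation loss evaluated at the lower-level solution."""
    residual = Xv @ lower_solution(lam, X, y) - yv
    return 0.5 * float(residual @ residual)


def implicit_hypergradient(lam, X, y, Xv, yv):
    """d(upper_loss)/d(lam) via the implicit function theorem."""
    d = X.shape[1]
    w = lower_solution(lam, X, y)
    hess = X.T @ X + lam * np.eye(d)   # Hessian of the lower objective in w
    grad_w = Xv.T @ (Xv @ w - yv)      # gradient of the upper loss in w
    # The cross derivative of the lower objective w.r.t. (lam, w) equals w,
    # and the upper loss has no direct dependence on lam.
    return -float(w @ np.linalg.solve(hess, grad_w))


# Synthetic data (assumed) to compare against a finite-difference estimate.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(20, 3)), rng.normal(size=20)
Xv, yv = rng.normal(size=(10, 3)), rng.normal(size=10)
lam, eps = 0.5, 1e-5
finite_diff = (upper_loss(lam + eps, X, y, Xv, yv)
               - upper_loss(lam - eps, X, y, Xv, yv)) / (2 * eps)
print(implicit_hypergradient(lam, X, y, Xv, yv), finite_diff)
```

The two printed values should agree closely. In large-scale settings the exact linear solve is typically replaced by an iterative approximation, since forming and inverting the Hessian is impractical.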
Takuya Fujihashi (Osaka University), Toshiaki Koike-Akino (Mitsubishi Electric Research Laboratories)
Thanks to rapid progress in robotics, sensors, communications, and artificial intelligence (AI), human-machine interaction (HMI), i.e., the interaction between users and remote robots via wired/wireless channels, will be a key technology for realizing teleworking, remote operation, and epidemic care. For better HMI to support human intelligence, users and robots will exchange multiple sensorial media (mulsemedia) signals (audiovisual and tactile from robots and bioelectric signals from users) with low delay, to identify each user’s intelligence through biosignal processing for the received bioelectric signals. This tutorial will give a comprehensive review of fundamentals and state-of-the-art communications techniques for mulsemedia signals and biosignal processing techniques.
RATIONALE FOR TUTORIAL
HMI utilizing mulsemedia information is one of the promising techniques in the 5G through 6G era. Fig. 1 shows the overview of HMI. To exchange high-quality mulsemedia signals between the remote robots and users over networks, many studies have designed digital-based coding and transmission solutions. However, such digital-based solutions sometimes suffer from significant quality degradation, known as cliff and leveling effects, due to fluctuating channel characteristics. One typical solution is to retransmit the distorted signals from the sender to the receiver upon failure. However, retransmission increases the response delay of the HMI, and a long response delay may disturb the interaction between the user and the remote robot. The remote robot tries to identify each user’s vital signs, physiological data, and biometrics from the bioelectric signals, such as electroencephalography (EEG), to support the user’s activity. For this purpose, we have designed a signal processing solution for the bioelectric signals. It is found that the measured bioelectric signals highly depend on the user’s physiological conditions, mental states, and other factors. To deal with user-dependent nuisance factors for biosignal processing, the remote robot needs long calibration for analyzing other users.
This tutorial addresses how to solve the above-mentioned issues of the communications techniques for mulsemedia signals and biosignal processing techniques. Specifically, we will introduce analog joint source channel coding (AJSCC) and pre-shot learning, which are promising paradigms for mulsemedia communications and biosignal processing, respectively. We show that AJSCC, originally designed for visual information, has a high potential for the untethered delivery of mulsemedia signals. We then introduce the basic principles of AJSCC and its applications, including efficient energy compaction, optimal power allocation, and communication overhead reduction, in a comprehensive manner. We show that AJSCC, combined with emerging techniques such as graph signal processing and graph neural networks, improves the quality compared to digital-based delivery schemes.
Pre-shot learning, a.k.a. domain generalization, disentangles the user-dependent nuisance factors and extracts nuisance-robust/invariant features from the measured bioelectric signals. The fundamentals and recent achievements of pre-shot learning based on adversarial training and other regularization techniques will be introduced in this tutorial. We will verify that the designed domain-invariant learning is effective for HMI development.
DETAILED DESCRIPTION OF THE TUTORIAL – OUTLINING THE TOPICS AND SUBTOPICS COVERED
1) Human machine interaction: overview and challenges
2) Basic principles and applications of AJSCC
- Key issues of low-delay communications for mulsemedia signals: cliff, leveling-off, and stair-case effects
- How to solve the issues – energy compaction, power allocation, and modulation designs of AJSCC
- Remaining issues and opportunities – efficient energy compaction, optimal power allocation, communication overhead, packet loss resilience
3) AJSCC of mulsemedia signals
- Designs for visual, haptic, and bioelectric signals
- AI-empowered AJSCC: multi-layer perceptron (MLP), convolutional neural networks (CNN), vs. graph neural networks (GNN)
- Integration with digital-based solutions: hybrid digital analog delivery
4) Remote human-machine interaction systems with biosignal processing
- Basics of bioelectric signals: electroencephalography (EEG) and electromyography (EMG)
- Biosignal processing: individual identification from the measured bioelectric signals
- Key issues: long machine-calibration or long human training sessions due to user-dependent nuisance features
5) Advanced transfer learning for user-independent biosignal processing
- Few-shot and zero-shot learning: state-of-the-art TL methods across different users
- Pre-shot learning: extraction of nuisance-robust features without calibration
- AutoBayes/AutoTransfer: Automated optimization for Bayesian inference and transfer learning.
Urbashi Mitra (USC)
Learning or exploration-exploitation problems abound in applications such as anomaly detection, target localization, dynamical system tracking, medical diagnosis, and wireless body area sensor networks. Initially, one is unclear about the state of the environment and the goal is to take observations that refine the understanding of the state. If one has a series of “experiments” (or queries), each of which provides information about the state, an important question is how to design that sequence of experiments to enable a decision about the environmental state as quickly as possible. In particular, it is of interest to determine the next best experiment as a function of the past observations. A formulation of this problem is active hypothesis testing, which has been persistently studied since the 1940s. We will review classic results in sequential decision making and informativeness and make connections to active testing and learning, focusing on new results for the non-asymptotic case. In particular, random walks and martingale theory play a large role. We shall examine static and dynamic environments (active tracking). We shall then apply these strategies to active matrix completion problems.
I have investigated the tutorials from the last six years (not all websites are accessible) and realized that there have not been any recent tutorials on this topic: active learning and active classification, with a focus on finite-sample approaches. Most classic, and even fairly recent, active learning strategies are focused on asymptotic regimes, in contrast to our approaches (which are asymptotically optimal as well).
The tutorial will cover classical fundamentals as well as new approaches developed within my research group. These active learning and classification strategies can have an impact on a wide variety of applications. Applications that we have considered include active sensor selection in wireless body area sensing networks, resource allocation for cognitive networks, active localization, actuation and control, and health care. The use of the active methods in these applications will be presented.
- History on experiment design
- Motivating example: active boundary detection in sensor networks
- Basics of Hypothesis Testing (Bayes, Neyman-Pearson, Chernoff-Stein, Chernoff Information, SPRTs)
- Monty Hall and decision trees
- WBANs and POMDPs (notions of informativeness)
- Moment generating functions of log likelihood ratio functions and properties
- Finite Sample analysis of NP tests/composite tests
- Kullback-Leibler divergence and distribution optimization
- Optimal strategies/comparisons to Chernoff strategy
- Neural network architectures for optimal active strategies
- Anomaly detection and tight finite sample bounds
- Martingales/Concentration inequalities
- SARS-CoV-2 testing/group testing
- Application: active localization (multi-armed bandits)
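As a point of reference for the SPRT item in the list above, the following is a minimal sketch of Wald's sequential probability ratio test for two Gaussian mean hypotheses. It is a passive test (the experiment is fixed rather than chosen adaptively), and the Gaussian model, the error targets, and the function name are assumptions made for illustration only.

```python
import math


def sprt(samples, mu0, mu1, sigma, alpha=0.05, beta=0.05):
    """Wald's SPRT between H0: mean mu0 and H1: mean mu1 (known sigma).

    alpha and beta are the target false-alarm and miss probabilities.
    Returns the decision and the number of samples consumed.
    """
    upper = math.log((1 - beta) / alpha)   # cross above: decide H1
    lower = math.log(beta / (1 - alpha))   # cross below: decide H0
    llr = 0.0
    for n, x in enumerate(samples, start=1):
        # Log-likelihood ratio increment of one Gaussian observation.
        llr += (mu1 - mu0) * (x - 0.5 * (mu0 + mu1)) / sigma ** 2
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return "undecided", len(samples)


print(sprt([1.0] * 10, mu0=0.0, mu1=1.0, sigma=1.0))
```

The test stops as soon as the accumulated log-likelihood ratio leaves the interval between the two thresholds, so the sample size is random. An active strategy would additionally choose which experiment generates each new sample.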
Mon Jun 5, AM
Moe Win (Massachusetts Institute of Technology), Andrea Conti (University of Ferrara)
Motivation and objectives: The availability of real-time high-accuracy location awareness is essential for current and future wireless applications, particularly for the Internet-of-Things and 5G/B5G ecosystem. Reliable localization and navigation of people, objects, and vehicles – Localization-of-Things (LoT) – is a critical component for a diverse set of applications including connected communities, smart environments, vehicle autonomy, asset tracking, medical services, military systems, and crowd sensing.
The coming years will see the emergence of network localization and navigation in challenging environments with sub-meter accuracy and minimal infrastructure requirements. Network localization and navigation give rise to a new paradigm for context-aware wireless communications, enabling a variety of new applications that rely on position information of mobile nodes. As the ability to localize devices in wireless networks becomes increasingly important, it is necessary for researchers in communications to be aware of both the fundamentals and the state of the art in location aware networks. This tutorial is aimed at students, researchers, and practitioners to provide the knowledge on LoT in a rigorous, yet concise form.
Scope: Attendees of this tutorial will learn about location-aware networks in two ways. On the one hand, they will get a high-level overview of fundamental performance bounds, ranging techniques, positioning algorithms, and network experimentation. On the other hand, the tutorial will serve as an introduction to the state of the art in location inference for active and passive localization employing wideband wireless technologies. Results based on measurements collected via network experimentation employing wideband and ultra-wideband radios are used to illustrate the concepts.
Abstract: The availability of real-time high-accuracy location awareness is essential for current and future wireless applications, particularly those involving Internet-of-Things and beyond 5G ecosystem. Reliable localization and navigation of people, objects, and vehicles – Localization-of-Things (LoT) – is a critical component for a diverse set of applications including connected communities, smart environments, vehicle autonomy, asset tracking, medical services, military systems, and crowd sensing. The coming years will see the emergence of network localization and navigation in challenging environments with sub-meter accuracy and minimal infrastructure requirements. We will discuss the limitations of traditional positioning, and move on to the key enablers for high-accuracy location awareness: wideband transmission and cooperative processing.
Topics covered will include: fundamental bounds, cooperative algorithms for 5G and B5G standardized scenarios, and network experimentation. Fundamental bounds serve as performance benchmarks, and as a tool for network design. Cooperative algorithms are a way to achieve dramatic performance improvements compared to traditional non-cooperative positioning. To harness these benefits, system designers must consider realistic operational settings; thus, we present the performance of cooperative localization based on measurement campaigns. We will also present LoT enablers, including reconfigurable intelligent surfaces, which promise to provide a dramatic gain in terms of localization accuracy and system robustness in next generation networks.
Outline: The presentation outline is as follows.
- Problem Formulation
- Localization Basics
- Measurement Phase
- Localization Phase
- Performance Evaluation
- Localization Systems
- High-accuracy Localization
- Theoretical Foundation
- Cooperative Algorithms
- Network Experimentation
- Performance Evaluation
- Localization of Untagged Objects
- Problem Formulation
- Performance Evaluation
- Research Directions
- Summary and Conclusions
Prof. Lajos Hanzo (University of Southampton)
Moore’s law has indeed prevailed since he outlined his empirical rule-of-thumb in 1965, but based on this trend the scale of integration is set to depart from classical physics, entering nano-scale integration, where the postulates of quantum physics have to be obeyed. The quest for quantum-domain communication solutions was inspired by Feynman’s revolutionary idea in 1985: particles such as photons or electrons might be relied upon for encoding, processing and delivering information. Hence, in the light of these trends, it is extremely timely to build an interdisciplinary momentum in the area of quantum communications, where there is an abundance of open problems for a broad community to solve collaboratively. In this workshop-style interactive presentation we will address the following issues:
- We commence by highlighting the nature of the quantum channel, followed by techniques of mitigating the effects of quantum decoherence using quantum codes.
- Then we bridge the subject areas of large-scale search problems in wireless communications and exploit the benefits of quantum search algorithms in multi-user detection, in joint-channel estimation and data detection, localization and in routing problems of networking, for example.
- We survey advances in quantum key distribution networks.
- The dawn of quantum communications
- The basics of quantum mechanics
- Superposition, Measurement, Entanglement, No-Cloning, Teleportation
- Quantum Communications Models
- Decoherence: the Quantum Noise
- Noisy Quantum Teleportation
- Quantum Coding & Quantum Error Mitigation Techniques
- Quantum Search Algorithms
- Quantum Key Distribution (QKD)
- QKD networks and the quantum Internet (Qinternet)
- What are the basic pros and cons as well as limitations of quantum technologies;
- How to mitigate the deleterious effects of quantum-domain impairments;
- How to design quantum codes for quantum-domain error mitigation;
- What are the pros and cons of quantum search algorithms;
- What are the pros and cons of quantum key distribution networks;
Importance and Timeliness:
The Nobel Prize in Physics was awarded to three quantum scientists in the year 2022. In recognition of the rapid evolution of this field, the ComSoc Emerging Technical Committee on Quantum Communications and Information Technology was created (https://qcit.committees.comsoc.org/) and there is also an IEEE-level cross-society initiative (https://quantum.ieee.org/). Furthermore, a JSAC special issue was published. ComSoc is also a sponsor of the IEEE Quantum Week (https://qce.quantum.ieee.org/). The EU is investing 5 billion euros into quantum science, and it has also become part of the current 5-year plan in China.
Fanghui Liu (EPFL), Johan Suykens (KU Leuven), Volkan Cevher (EPFL)
Brief description of tutorial proposal:
In this tutorial, we contend that the conventional wisdom which supports the use of simple signal models in signal processing and machine learning tasks missed the bigger picture that launched large-scale neural networks (NNs) into stardom. Along the way, we have also found out that even if the inference problems can be non-convex and are huge-scale, we can still find “good” solutions along with these non-convex representations with acceptable computational resources. To be specific,
- The conventional wisdom of simple signal models missed the bigger picture, especially overparameterized neural networks. Going beyond concise signal representations can bring statistical advantages, e.g., in sparsity recovery.
- The conventional wisdom of avoiding large-scale non-convex optimization is a bit outdated. Even if problems are non-convex, we can find good solutions to non-convex problems with neural network representations.
- Moreover, counterintuitively, these models defy classical learning theory in terms of generalization (e.g., benign overfitting and double descent) and robustness.
We believe a comprehensive tutorial is desirable for understanding NNs in terms of their “good” performance, “bad” explanation, and “ugly” phenomena. Our tutorial summarizes the progress on the theoretical understanding of NNs from theory to computation, in two parts: 1) generalization guarantees of NNs, and the benefits of over-parameterization in NNs regarding implicit bias, benign overfitting, and robustness; 2) stochastic approximation with acceleration and min-max optimization for robust learning, covering applications such as NNs and adversarial training, reinforcement learning, sparse recovery, and inverse problems.
Outline: We will use the following outline and elaborate on the details for a three-hour tutorial:
- Part I: Introduction, preliminaries, and background
- Background of NNs in over-parameterization
- Introduction to non-convex optimization of over-parameterized NNs training
- Statistical learning theory for over-parameterized models
- Part II: Theoretical guarantees of over-parameterized models
- Generalization guarantees: benign overfitting, double descent
- Benefits of over-parameterization: implicit bias, robustness
- Application to signal processing: sparsity recovery, phase retrieval
- Part III: Stochastic approximation in over-parameterized NNs training
- Basic methods, gradient based algorithm, and Adagrad
- Extensions of Adagrad for neural network training: RMSprop and Adam
- Properties, shortcomings and extensions of RMSprop and Adam
- Part IV: Min-max optimization: inverse problems, GANs
- Basic methods, gradient descent-ascent, extra-gradient.
- Deterministic primal-dual methods: properties of primal-dual hybrid gradient (PDHG)
- Stochastic primal-dual methods: properties of stochastic PDHG and variants.
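To illustrate why the Part IV outline above lists extra-gradient alongside gradient descent-ascent, here is a small sketch on the bilinear saddle problem min over x, max over y of x*y. The objective, step size, and function names are assumptions chosen for illustration and are not taken from the tutorial material.

```python
import math


def gradient_descent_ascent(x, y, lr, steps):
    """Simultaneous GDA on f(x, y) = x * y (descent in x, ascent in y)."""
    for _ in range(steps):
        x, y = x - lr * y, y + lr * x
    return x, y


def extragradient(x, y, lr, steps):
    """Extra-gradient: take a look-ahead step, then update with its gradient."""
    for _ in range(steps):
        x_half, y_half = x - lr * y, y + lr * x      # extrapolation step
        x, y = x - lr * y_half, y + lr * x_half      # update step
    return x, y


start = (1.0, 1.0)
gda_point = gradient_descent_ascent(*start, lr=0.1, steps=1000)
eg_point = extragradient(*start, lr=0.1, steps=1000)
print("GDA distance to saddle:", math.hypot(*gda_point))
print("EG distance to saddle: ", math.hypot(*eg_point))
```

On this toy problem the unique saddle point is the origin. Plain gradient descent-ascent spirals away from it for any positive step size, whereas the extra-gradient iterates contract towards it when the step size is below one.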
The tutorial welcomes not only graduate students of all levels but also veteran ML researchers and other signal processing researchers to learn new techniques and algorithms for NNs training with statistical guarantees. We also believe that the solid understanding of over-parameterized NNs will be of great interest to the industrial participants for practical uses.
Keshab K Parhi (University of Minnesota)
With the exponential increase in the amount of data collected per day, the fields of artificial intelligence and machine learning continue to progress at a rapid pace with respect to algorithms, models, applications and hardware. In particular, deep neural networks have revolutionized the field by providing unprecedented human-like performance in solving many real-world problems such as image or speech recognition. There is also significant research aimed at unraveling the principles of computation in large biological neural networks and, in particular, biologically plausible spiking neural networks. Research efforts are also directed towards developing energy-efficient computing systems for machine learning and AI. New system architectures and computational models, from tensor processing units to in-memory computing, are being explored. Reducing energy consumption requires careful design choices from many perspectives. Some examples include: choice of model, approximations of the models for reduced storage and memory access, choice of precision for different layers of networks, and in-memory computing. The half-day tutorial will provide a detailed overview of the new developments related to brain-inspired computing models and their energy-efficient architectures. Specific topics include: (a) computing models: perceptrons, convolutional neural networks, recurrent neural networks, spiking neural networks, Boltzmann machines, hyper-dimensional computing; (b) backpropagation for training; (c) computing architectures: systolic arrays for convolutional neural networks, low-energy accelerators via sparsity, tensor decomposition, and quantization, in-memory computing; (d) accelerators for training deep neural networks.
A. Computing Models: Perceptrons, multi-layer perceptrons, convolutional neural networks, recurrent neural networks: long short-term memory and gated recurrent units, spiking neural networks, graph neural networks.
B. Training: Review of back-propagation
C. Computing Architectures: Systolic arrays for convolutional neural networks, computer architectures for spiking neural networks, low-energy neural network accelerators via sparsity, tensor decomposition, and quantization, and in-memory computing. Architectures for training neural networks with reduced latency and energy consumption. Architectures for graph neural networks.
George Alexandropoulos (National and Kapodistrian University of Athens), Yonina Eldar (Weizmann Institute), Merouane A Debbah (Technology Innovation Institute)
The proposed tutorial is very timely and will be largely based on our recent overview article entitled “Pervasive Machine Learning for Smart Wireless Environments Enabled by Reconfigurable Intelligent Surfaces”, which appeared in the Proceedings of the IEEE journal in September 2022. In addition, the tutorial will include the latest research of the speakers on federated and reinforcement learning approaches for such environments, as described in their relevant articles. In contrast to the various available tutorials on the RIS technology and its applications for communications, localization, and sensing, this tutorial is the first to focus on RIS-enabled smart wireless environments, which are defined as wireless networks that include multiple RISs that need to be jointly optimized with affordable control information exchange overhead, enabling fascinating applications resulting from their widespread incorporation in the environment of interest.
A comprehensive modeling of the RIS-enabled smart wireless environments comprising multi-antenna base stations/access points, multiple, possibly multi-antenna, user equipment, and multiple commonly accessible RISs will be provided. In the sequel, a detailed introduction to reinforcement learning theory will be provided with the purpose of explaining the principles behind the most prominent deep reinforcement learning algorithms currently in use by the RIS-empowered wireless communications community, supplemented by a thorough taxonomy that emphasizes their different characteristics. In addition, the state of the art in centralized and distributed supervised learning approaches for orchestrating smart wireless environments will be presented, highlighting their limitations in comparison with their online learning counterparts. To this end, we only expect the audience to be familiar with basic concepts in machine learning. A principled application of deep reinforcement learning to RIS-empowered smart radio environments is presented, detailing the correspondences among the design parameters of the wireless system and the respective terminology. The tutorial’s last part will be devoted to the latest applications of machine learning for smart wireless environments, highlighting various future research directions.
- Current 5G status and 6G requirements
- RIS-enabled smart wireless environments
- Part I: Fundamentals of RISs
- Hardware architectures
- Use cases and network architectures
- Channel and operation modeling
- Control and optimization
- Part II: Learning for Smart Wireless Environments
- System modeling
- Centralized supervised learning
- Reinforcement learning
- Distributed learning with reduced control overhead
- Open challenges and future directions
- Part III: Applications of Smart Wireless Environments
- Capacity optimization
- Federated spectrum learning
- Localization and radio-frequency sensing
- Integrated RIS control and communications
- Open challenges and future directions
- Concluding Remarks
Ismail Ben Ayed (ETS Montreal)
The rationale for the tutorial:
Deep learning is dominating computer vision, natural language processing and a broader spectrum of signal-processing applications. When trained on labeled data sets that are large enough, deep models can achieve outstanding performance. Nevertheless, such supervised models may have difficulty generalizing to novel tasks unseen during training, given only a few examples and limited related context. Few-shot learning has emerged as an appealing paradigm to bridge this gap. It has recently triggered substantial research effort and interest, with large numbers of publications within the computer vision, natural language processing and machine learning communities. Whether it is for detecting a rare sound event or for classifying a rare condition in a medical image, few-shot learning problems occur almost everywhere in practice. The importance of, and wide interest in, few-shot inference is even greater with the recent rise of very large-scale unsupervised image-language models. These foundational pre-training models, such as CLIP (Radford et al., ICML 2021), have shown very promising few-shot generalization capabilities. Therefore, fast and practical algorithms for adapting such foundational models to downstream tasks, given only a handful of labels, are becoming of central interest.
This tutorial will focus on very recent trends in few-shot learning, and is a substantial update of the tutorial we gave to a pattern-recognition audience at ICPR in 2022 (link provided above). In this update, I will emphasize very recent approaches to few-shot inference and fine-tuning, and focus on the adaptation of large-scale foundational image-language models to downstream tasks. Several of the discussed inference methods are based on techniques/concepts that should be appealing to a broad signal-processing audience, including minimum description length (MDL), expectation-maximization (EM), Markov Random Fields (MRFs), and information-theoretic concepts. This provides good ground to trigger new ideas and applications of few-shot learning in signal processing. I expect the tutorial to be appealing to a large and diversified signal-processing audience, including: researchers and graduate students interested in developing few-shot algorithms in the wide range of signal-processing applications; and industrial researchers/engineers working on real-world problems in which the few-shot learning challenge arises naturally. The last two editions of ICASSP did not have a tutorial focused on this very recent progress in few-shot learning. There are previous ICASSP tutorials that focused on meta-learning approaches: [Li et al., Meta Learning and its applications to Human Language Processing, 2021] and [Simeone and Chen, Learning with Limited Samples – Meta Learning and Applications to Communications, 2022]. While I will briefly mention meta-learning, the focus will be on recent transfer-learning and inference approaches that a large body of recent work has shown to substantially outperform meta-learning methods, and on the adaptation of very recent large-scale image-language models.
A description of the tutorial outlining the topics and subtopics covered:
- Introduction to few-shot learning and test-time adaptation (with focus on foundational image-language models):
- Motivation: From large-scale image-language models to downstream tasks
- Basic definitions: Base training, few-shot inference and test-time adaptation
- Applications and public benchmarks
- Meta-learning approaches: Are they really working?
- Metric-learning based approaches (e.g., ProtoNets)
- Optimization-based approaches (e.g., MAML)
- Strong transfer-learning and base-training baselines (e.g., SimpleShot)
- Advanced inference and fine-tuning approaches
- Inductive vs. transductive inference
- Strong fine-tuning baselines: what and how to fine-tune?
- Minimum description length (MDL) and expectation-maximization inference
- MRF-regularization approaches (e.g., LaplacianShot)
- Information-theoretic approaches (e.g., TIM)
- An example of advanced applications: Few-shot and open-vocabulary image segmentation
- Specific challenges
- Meta-learning vs. transfer-learning approaches: A case study of transductive few-shot inference
- Strong fine-tuning baselines for open-vocabulary segmentation
- Limitations of current models and benchmarks
- Conclusion and outlook
Mon Jun 5, PM
Fernando Pereira (Instituto Superior Técnico – Instituto de Telecomunicações)
The rationale for the tutorial
Visual communications have a fundamental role in human societies. In the digital era, this has led to the explosion of image and video-based applications and services, notably following the democratization of image and video acquisition, storage and streaming. However, conventional image and video-based experiences are far from real-world immersion; this has motivated a rush for more realistic, immersive and interactive visual experiences, notably offering 6-DoF (Degrees-Of-Freedom) experiences where the user is free to exploit the six translational and rotational types of motion possible in the real world. The recent advances in visual data acquisition and consumption have led to the emergence of the so-called plenoptic visual models, where Point Clouds (PCs) are playing an increasingly important role. Point clouds are a 3D visual model where the visual scene is represented through a set of points and associated attributes, notably color. To offer realistic and immersive experiences, point clouds need to have millions, or even billions, of points, thus calling for efficient representation and coding solutions. This is critical for emerging applications and services, notably virtual and augmented reality, personal communications and meetings, education and medical applications and virtual museum tours. The COVID-19 pandemic has sharply increased this need for more realistic and immersive experiences, thus making it urgent to advance in this domain.
Recent years have seen a growing wave of data-driven algorithms, such as deep neural networks, which have taken a lead role in many research and development areas, especially in multimedia. This interest is driven by several factors, notably advances in processing power, availability of large data sets, and algorithmic advances. Deep learning (DL)-based solutions are the state of the art for multiple computer vision tasks, both high-level and lower-level. These advances have led to the exploitation of DL-based tools in the image coding domain, reaching competitive performance in a short time. This has been recognized by JPEG (Joint Photographic Experts Group), which has launched the JPEG AI project with the scope of creating a learning-based image coding standard; the first version of this standard has shown a rate reduction of more than 30% relative to the best available conventional image coding solutions.
Despite their recent arrival in the point cloud coding arena, the first exploitations of DL-based technology, many pioneered by the proposer of this tutorial, have shown very promising compression performance and opened a new line of research. Moreover, DL-based point cloud coding has the potential to offer an effective, common, unique representation for both human and machine consumption, since it is possible to use the same, single coded bitstream to decode for human visualization as well as for direct machine consumption, notably to perform computer vision tasks such as classification, recognition, detection, etc. with competitive performance.
The potential of this type of coding approach has been recognized by JPEG and MPEG which have started explorations towards standardizing DL-based point cloud coding solutions. The initiatives mentioned above clearly demonstrate that after many years of developments and advances with hand-crafted coding tools, the multimedia coding landscape is facing a revolution with the emergence of deep learning-based technology, which has the potential to unify several multimedia domains under the same technological umbrella, notably coding, classification, recognition, detection, denoising, super-resolution, etc. Considering the huge impact of multimedia representation technology in our society and lives, this tutorial would be highly beneficial for the multimedia signal processing community since this DL-based technical wave is coming fast and will have a lasting impact.
A detailed description of the tutorial outlining the topics and subtopics covered
The point cloud representation technical area has received many contributions in recent years, notably adopting deep learning-based approaches, and it is critical for the future of immersive media experiences. In this context, the key objective of this tutorial is to review the most relevant point cloud representation and coding solutions available in the literature, with a special focus on DL-based solutions and their specific novel features, e.g. model design and training. Special attention will be dedicated not only to point cloud coding for human visualization but also to computer vision tasks, e.g. classification, performed from the same coded stream. In brief, the outline will be:
- 3D visual representation and coding
- Plenoptic function-based imaging: light fields, point clouds and meshes
- Point cloud basics and applications
- Point cloud coding standards for static and dynamic point clouds
- Emerging deep learning-based point cloud representation
- Deep learning-based point cloud coding standardization projects
- Trends and coming challenges
Marcello Caleffi (University of Naples Federico II)
In the theory of quantum communications, a deeper structure has recently been unveiled, showing that the capacity does not completely characterize the channel's ability to transmit information, due to phenomena (namely superadditivity, superactivation and causal activation) with no counterpart in the classical world. Although how deep this structure goes is yet to be fully uncovered, it is crucial for the communication and signal-processing engineering community to grasp the implications of these phenomena for understanding and deriving the fundamental limits of communications. Hence, the aim of this tutorial is to shed light on these phenomena by providing the audience with easy access to, and a guide through, the relevant literature and the prominent results from a communication engineering perspective.
A description of the topics that the tutorial will address, emphasizing their Timeliness:
The tutorial first provides an introduction to quantum communications by reviewing the basics of the associated quantum mechanics formalism. Then, the fundamental differences between classical and quantum communications are illustrated. This includes the very recent results about the deeper – yet to be fully understood – structure of quantum capacities, including the marvels of quantum phenomena – namely, superadditivity, superactivation and causal activation – with no counterpart in the classical world.
The tutorial is complemented with an overview of the challenges arising with the implications of these phenomena for understanding and deriving the fundamental limits of communications. A workshop-style interactive presentation will be adopted to engage the audience in paving the way for the classical communications community to contribute to the quantum communications field. Whilst this overview is ambitious in terms of providing a research-oriented outlook, potential attendees require only a modest background in communications.
The mathematical contents are kept to a minimum and a conceptual approach is adopted. Postgraduate students, researchers, and practitioners as well as managers looking for cross-pollination of their experience with other topics may find the coverage of the presentation beneficial.
The participants will receive the set of slides as supporting material and they may find the detailed mathematical analysis in a paper recently published by IEEE Communications Surveys & Tutorials.
- Introduction and Motivation
- The dawn of quantum communications
- The basics of quantum mechanics: Superposition, Measurement, Entanglement, No-Cloning, Teleportation
- Quantum Communications
- From classical capacity to quantum capacities
- Classical Capacity of Quantum Channels
- Quantum Capacity of Quantum Channels
- Quantum Marvels
- Beyond Shannon
- Superadditivity of Quantum Channel Capacities
- Superactivation of Quantum Channel Capacities
- Causal activation of Quantum Channel Capacities
- Conclusions and Future Perspectives
Indicative Scheduling: 9:00 am-12:30 pm
- 09:00-09:30: Introduction and Motivation
- 09:30-10:20: Quantum Communications
- 10:20-10:40: Break
- 10:40-12:15: Beyond Shannon
- 12:15-12:30: Discussions
Linglong Dai (Tsinghua University), Haiyang Zhang (Nanjing University of Posts and Telecommunications), Yonina Eldar
The rationale for the tutorial
Extremely large antenna arrays (ELAA) have been viewed as one of the essential technologies for 6G networks. ELAA, deploying hundreds or even thousands of antennas at the base station or at passive reconfigurable intelligent surfaces (RIS), can significantly improve the system performance. ELAA for 6G not only means a sharp increase in the number of antennas, but also results in a fundamental change of the electromagnetic characteristics. The electromagnetic radiation field can generally be divided into far-field and near-field regions. The boundary between these two regions is determined by the Rayleigh distance, which is proportional to the product of the square of the array aperture and the carrier frequency. With the significant increase of the antenna number and carrier frequency in future 6G systems, the near-field region of ELAA will expand by orders of magnitude. Therefore, near-field communications will become essential for future 6G networks, and have attracted growing research interest in the past two years.
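As a rough illustration of why the near-field region grows with both aperture and carrier frequency, the sketch below evaluates the commonly used Rayleigh distance expression 2D²/λ, with λ = c/f. The function name and the example aperture and frequency values are hypothetical and chosen only for illustration; they are not taken from the tutorial material.

```python
# Speed of light in vacuum, in metres per second.
C = 299_792_458.0


def rayleigh_distance(aperture_m: float, carrier_hz: float) -> float:
    """Rayleigh distance 2 * D^2 / lambda in metres, with lambda = c / f.

    aperture_m: array aperture D in metres.
    carrier_hz: carrier frequency f in hertz.
    """
    wavelength = C / carrier_hz
    return 2.0 * aperture_m ** 2 / wavelength


# Hypothetical configurations, for illustration only.
for aperture_m, carrier_hz in [(0.1, 3.5e9), (0.5, 30e9), (1.0, 100e9)]:
    d = rayleigh_distance(aperture_m, carrier_hz)
    print(f"D = {aperture_m} m, f = {carrier_hz / 1e9:.1f} GHz -> {d:.2f} m")
```

Doubling the aperture quadruples the distance, while doubling the carrier frequency doubles it, which is the scaling described in the paragraph above.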
Unlike far-field communications from 1G to 5G, where plane-wave propagation holds, near-field communications for 6G are characterized by spherical-wave propagation. This fundamental change in the properties of electromagnetic fields introduces both challenges and opportunities for ELAA systems. On the one hand, some of the classic theories of wireless communication derived under the assumption of far-field plane waves may suffer from unacceptable performance degradation. On the other hand, near-field propagation brings new potential benefits to communications, e.g., enhancing the signal strength at target receivers or providing a new degree of freedom in the distance domain for the improvement of spectrum efficiency. The challenges and opportunities encountered in near-field communications are closely related to signal processing techniques, such as sparse signal processing for near-field channel estimation, optimization-based near-field beamforming design, near-field inter-user interference mitigation techniques based on iterative algorithm design, etc. Consequently, the design and analysis of dedicated signal processing techniques will play an important role in near-field communications, which motivates us to deliver this tutorial on signal processing for 6G near-field communications.
The proposed tutorial is self-contained. We do not require any background in near-field communications. Familiarity with basic concepts of signal processing and communications is enough for the audience to follow this tutorial. We believe that the tutorial will be beneficial and inspiring for attendees with a background in signal processing, communications, or information theory, and we therefore expect it to attract a relatively large number of attendees.
The tutorial outlining the topics and subtopics covered:
In this tutorial, we shall present in a pedagogic fashion the leading approaches for facilitating near-field communications using signal processing techniques. Specifically, the fundamental difference between far-field and near-field communications will be clarified first. Then, we will discuss the challenges of near-field communications and how signal processing techniques can be utilized to address these challenges. Moreover, we will introduce the opportunities of near-field communications and how to realize this potential with tools derived from signal processing. Finally, several open problems and future research directions will be pointed out from the signal processing perspective. The outline and time schedule are listed below:
| Duration | Topic | Speaker |
|---|---|---|
| 20 minutes | The background of near-field communications for 6G | Yonina C. Eldar |
| 40 minutes | Fundamentals of near-field communications | Linglong Dai |
| 30 minutes | Challenges of near-field communications | Linglong Dai and Haiyang Zhang |
| 10 minutes | Coffee Break | / |
| 30 minutes | Opportunities for near-field communications | Linglong Dai and Haiyang Zhang |
| 30 minutes | Future research directions and conclusions | Yonina C. Eldar |
| 20 minutes | Q&A | Linglong Dai, Haiyang Zhang, and Yonina C. Eldar |
Manan Suri (IIT Delhi), Sounak Dey (Tata Consultancy Services Ltd.), Arun M. George (TCS Research & Innovation)
Embedding intelligence at the edge has become a critical requirement for many industry domains, especially disaster management, healthcare, manufacturing, retail, surveillance, remote sensing etc. Classical Machine Learning or Deep Learning (ML/DL) based systems, being heavy in terms of required computation and power consumption, are not suitable for edge devices such as robots, drones, automated cars, satellites, routers, wearables etc., which are mostly battery driven and have very limited compute resources. Inspired by the extreme power efficiency of mammalian brains, an alternative computing paradigm of Spiking Neural Networks (SNN), also known as Neuromorphic Computing (NC), has evolved with a promise to bring significant power efficiency compared to existing edge-AI solutions. NC follows a non-von Neumann architecture where computation and memory are collocated, as in brain neurons, and SNNs handle only sparse event-based data (spikes) in an asynchronous fashion. SNNs are inherently very efficient at capturing features in temporally varying signals and have been found to efficiently classify/process auditory data, recognize gestures/actions from video streams, spot keywords in audio streams, classify and predict time series from different sensors used in IoT, regenerate temporal patterns etc. The community is pursuing multiple sophisticated dedicated neuromorphic hardware platforms, such as Intel Loihi, IBM TrueNorth, Brainchip Akida, SpiNNaker and DYNAPs, to name a few.
Moreover, advanced and futuristic nanoelectronic devices and materials are being explored to build energy-efficient neuromorphic computers. This domain, as well as this tutorial, therefore lies at the intersection of Computational Neuroscience, Machine Learning and In-memory Neuromorphic Computation techniques.
RATIONALE AND STRUCTURE OF THE TUTORIAL
The ICASSP community is at the forefront of research in the domain of signal processing. Thus it is extremely relevant to conduct a tutorial on advances in the domain of Neuromorphic Computing at the forum. We are proposing two valuable cross vertical elements in this tutorial:
(i) Firstly, the proposed tutorial has been developed keeping both academic and industrial/application interests in mind. The speakers represent leading academic and industry research teams on the subject, with several years of theoretical and applied experience on the topic. Over the last few years, while solving customer requirements related to edge computing, TCS Research has successfully taken neuromorphic research to real market applications. At the same time, the group at IIT Delhi has contributed significantly towards the development of cutting-edge neuromorphic hardware and memory-inspired computing.
(ii) Secondly, the proposed tutorial will not only cover the foundational basics (i.e. algorithms, bio-inspiration, mathematics) of the subject, but will also delve into real hardware-level implementation and actual application use-cases (such as gesture recognition in robotics, time series classification and prediction in IoT, continuous health monitoring, remote sensing via satellite etc.) as pursued in industry so far.
The tutorial is structured to cover all relevant aspects of SNN and NC as detailed below.
- Biological Background, Software & Simulation of SNNs:
We will start the tutorial by giving background on the modus operandi of biological neurons, their equivalent computational models, the spike generation process, synaptic weight updates and learning rules. Next, we will speak in detail about (i) existing software tools and SNN simulators, (ii) how to create a basic SNN, feed data into it and perform a simple classification task, and (iii) how to tune it towards better performance.
Speaker: Sounak Dey, duration: 50 minutes.
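To make the notion of a computational neuron model and spike generation concrete, here is a minimal sketch of a leaky integrate-and-fire (LIF) neuron, one widely used simplified model. It is not taken from the tutorial material: the function name, the parameter values and the forward-Euler discretization are assumptions chosen purely for illustration.

```python
def simulate_lif(input_current, dt=1e-3, tau=20e-3,
                 v_rest=0.0, v_thresh=1.0, v_reset=0.0):
    """Simulate a single leaky integrate-and-fire neuron (illustrative sketch).

    input_current: sequence of input values, one per time step.
    dt, tau: time step and membrane time constant in seconds (assumed values).
    Returns (voltages, spikes), two lists with one entry per time step.
    """
    v = v_rest
    voltages, spikes = [], []
    for i_in in input_current:
        # Forward-Euler step of tau * dv/dt = -(v - v_rest) + i_in
        v += (dt / tau) * (-(v - v_rest) + i_in)
        fired = v >= v_thresh
        if fired:
            # Emit a spike and reset the membrane potential.
            v = v_reset
        voltages.append(v)
        spikes.append(fired)
    return voltages, spikes


# A constant supra-threshold input makes the neuron fire periodically.
_, spikes = simulate_lif([1.5] * 1000)
print("spike count:", sum(spikes))
```

With a constant input below the threshold the membrane potential settles without ever firing, while a stronger input yields a higher firing rate; this rate behaviour is what simple classification examples typically build on.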
- Neuromorphic Hardware Basics:
The second part of the tutorial will cover the basics of dedicated hardware approaches for implementing neuromorphic algorithms in the real world. Key techniques will be benchmarked for their pros and cons. How specialized neuromorphic hardware can offer performance benefits will be discussed. The session will end with a discussion of futuristic nanomaterials proposed for neuromorphic computation.
Speaker: Manan Suri, duration: 50 minutes.
- Application & Implementation:
The third and final part of the tutorial will provide a detailed landscape of the applications that have been developed and tested so far using SNN and NC, including our own experiences. Aspects such as spike encoding techniques, conversion of ANNs to SNNs, and FPGAs will be discussed.
Speaker: Arun George, duration: 50 minutes.
A flexible and interactive model of discussion and Q&A with the audience will be followed throughout the tutorial. Duration: 10-30 minutes.
Pier Luigi Dragotti (Imperial College London), Amanda Foust (Imperial College London), Pingfan Song (University of Cambridge)
In order to achieve step changes in our understanding of brain function and dysfunction, for example to detect the onset of neurological diseases and disorders, large-scale imaging studies of neural populations are needed. Achieving this goal requires the ability to capture the dynamics of large populations of neurons at high speed and resolution over a large area of the brain. Multi-photon microscopy with fluorescent indicators is unparalleled in its ability to image cellular activity and neural circuits, deep in live light-scattering tissues, at single-cell resolution. In the typical set-up, multi-photon microscopes image a sample by operating in raster scanning modality. However, this approach is slow and is therefore inadequate to capture fast biological dynamics over a large area. The best way to image fast is by using scanless microscopes, and a particularly attractive candidate for high-speed three-dimensional (3D) imaging is light-field microscopy (LFM). Unfortunately, disentangling volumetric information from light-field images is a formidably difficult computational task. At the same time, the flexibility of modern microscopes, where illumination strategies, optics and fluorescence indicators can be changed according to specific needs, opens up the possibility of developing very creative and innovative computational solutions for the reconstruction of neural activity from light-field data. In particular, it calls for the development of machine learning methods that fully exploit the physics of the acquisition device and any prior on the objects being imaged.
The goal of this tutorial is to introduce the signal processing audience to multi-photon microscopy for neuroscience and to highlight the computational challenges related to fast, volumetric, and cellular-resolution neuronal activity imaging. The main focus is on providing an overview of light-field microscopy and on recently developed physics-driven machine learning architectures for extracting neural activity from light-field temporal sequences. These data-driven methods exploit in full the structure and sparsity of neural activity as well as properties of the acquisition device.
As highlighted before, because of the flexibility of these microscopes, we believe there is a great opportunity for the signal processing community to contribute new algorithms and new physics-driven neural networks to this timely and important topic, and also to develop theory and methods to “learn” the whole microscope pipeline (“end-to-end learning”).
- Introduction: the problem, its importance and why signal processing and machine learning methods are essential to address the inverse problems related to reconstructing neural activities from microscopy images
- Multi-photon microscopy overview:
- Types of fluorescence indicators
- Illumination strategies (raster scanning versus scanless techniques)
- Light-field microscopy: light-field optics, wave-optic model to describe the image formation process
- Volume Reconstruction Methods:
- Properties of neuronal activity signals and structure of light-field images
- Model-based and data-driven methods to retrieve volumetric information from light-field microscopy images
- Recent results and opportunities for machine learning in multi-photon microscopy:
- Hybrid approaches for volume reconstruction
- Model-based machine learning for localization of neurons
- Multi-modal microscopes for semi-supervised learning
- Learning microscopes (end-to-end learning)
Recent Advances in Independent Vector Analysis/Extraction: Applications in Speech Processing and Data Fusion
Tulay Adali (University of Maryland, Baltimore County), Zbynek Koldovsky (Technical University of Liberec)
The rationale for the tutorial
° Importance: Blind Source Separation (BSS) is the decomposition of a given set of observations into its latent variables in an unsupervised manner. Independent Vector Analysis (IVA) generalizes Independent Component Analysis (ICA) to multiple datasets to fully leverage the statistical dependence among the datasets. ICA and IVA have been useful in numerous applications, such as audio and speech processing and medical data analysis. When used with multiple types of diversity, they provide a particularly attractive framework that also relaxes the need for strong independence among the underlying latent variables (sources).
° Timeliness: Blind methods provide attractive alternatives to solutions such as deep neural networks (DNN) as they do not require any training and yield factorizations that are directly interpretable, which is not easy to achieve with deep nets. They can also successfully work in conjunction with DNNs, and can be attractive as physics based solutions.
° Novelty of the tutorial: The tutorial considers both IVA and Independent Vector Extraction, considers a general framework for both based on use of multiple types of diversity, and emphasizes recent advances of the framework in methods and applications to speech processing and medical data analysis. Matrix decompositions are attractive machine learning solutions, and IVA with multiple types of diversity provides a powerful solution to a number of practical problems.
° New ideas, topics, and tools: In addition to the basic introduction, new dynamic mixing models, theoretical topics such as lower bounds for achievable accuracy and identifiability conditions, new algorithms, and the integration of sparse representations and Graph Signal Processing into ICA/IVA will also be discussed. A number of new ideas will be presented, addressing important open problems in terms of theory and applications in speech processing and medical data analysis/fusion.
Part I – Blind Source Separation (BSS)
° Problem formulation: BSS, ICA and IVA principles, instantaneous and convolutive mixing models, frequency domain problem formulation, indeterminacies, permutation problem, discontinuity problem, application to complex-valued signals.
° ICA and IVA will be posed in a general framework where multiple types of diversity (statistical property) are jointly leveraged, in particular, non-Gaussianity, nonstationarity, non-whiteness, non-circularity, and statistical dependence across multiple datasets.
° Maximum likelihood estimation and Cramér-Rao-induced bounds on the achievable interference-to-signal ratio, performance bounds, identifiability conditions.
° Algorithm development: gradient and second-order algorithms, auxiliary function-based optimization, symmetric and deflationary approaches.
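For orientation, the instantaneous mixing model and the indeterminacies listed in Part I are commonly written as below. The symbols are assumptions introduced here for illustration and may differ from the speakers' notation: x are the observations, s the latent sources, A the unknown mixing matrix, W the estimated de-mixing matrix, P a permutation matrix, Λ a diagonal scaling matrix, and the superscript [k] indexes the K datasets in IVA.

```latex
% ICA: instantaneous linear mixing and separation
\mathbf{x}(t) = \mathbf{A}\,\mathbf{s}(t), \qquad
\mathbf{y}(t) = \mathbf{W}\,\mathbf{x}(t), \qquad
\mathbf{W}\mathbf{A} = \mathbf{P}\,\boldsymbol{\Lambda}
\quad \text{(order and scaling indeterminacies)}

% IVA: one mixture per dataset, with dependence exploited across datasets
\mathbf{x}^{[k]}(t) = \mathbf{A}^{[k]}\,\mathbf{s}^{[k]}(t), \qquad k = 1, \dots, K
```

In the frequency-domain formulation of convolutive mixtures, each frequency bin can play the role of one dataset k, which is how the permutation problem across bins arises.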
Part II – Blind Source Extraction (BSE)
° BSE problem: Independent Component/Vector Extraction (ICE/IVE), relation to previous methods, indeterminacies of the source-of-interest.
° Dynamic mixing models: naïve time-varying mixing model, Constant Separating Vector (CSV) model, Constant Mixing Vector (CMV) model, convex CSV model, corresponding Cramér-Rao-induced bounds (CRIBs) and identifiability conditions, double nonstationarity-based model.
° IVA/IVE approaches exploiting sparsity of mixing (de-mixing) channels.
° Piloted IVE: Partial control of global convergence (extraction of the desired source) using pilot signals.
Part III – Applications of BSS and BSE
° Blind extraction of a moving speaker.
° Fusion and joint analysis of neuroimaging data.