Electrical Engineering and Systems Science
- [1] arXiv:2405.10331 [pdf, ps, other]
-
Title: Detecting 5G Signal Jammers Using Spectrograms with Supervised and Unsupervised LearningSubjects: Signal Processing (eess.SP)
Cellular networks are potential targets of jamming attacks to disrupt wireless communications. Since the fifth generation (5G) of cellular networks enables mission-critical applications, such as autonomous driving or smart manufacturing, the resulting malfunctions can cause serious damage. This paper proposes to detect broadband jammers by an online classification of spectrograms. These spectrograms are computed from a stream of in-phase and quadrature (IQ) samples of 5G radio signals. We obtain these signals experimentally and describe how to design a suitable dataset for training. Based on this data, we compare two classification methods: a supervised learning model built on a basic convolutional neural network (CNN) and an unsupervised learning model based on a convolutional autoencoder (CAE). After comparing the structure of these models, their performance is assessed in terms of accuracy and computational complexity.
- [2] arXiv:2405.10372 [pdf, ps, html, other]
-
Title: Efficient model predictive control for nonlinear systems modelled by deep neural networksComments: 8 pages, 5 figuresSubjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Optimization and Control (math.OC)
This paper presents a model predictive control (MPC) for dynamic systems whose nonlinearity and uncertainty are modelled by deep neural networks (NNs), under input and state constraints. Since the NN output contains a high-order complex nonlinearity of the system state and control input, the MPC problem is nonlinear and challenging to solve for real-time control. This paper proposes two types of methods for solving the MPC problem: the mixed integer programming (MIP) method which produces an exact solution to the nonlinear MPC, and linear relaxation (LR) methods which generally give suboptimal solutions but are much computationally cheaper. Extensive numerical simulation for an inverted pendulum system modelled by ReLU NNs of various sizes is used to demonstrate and compare performance of the MIP and LR methods.
- [3] arXiv:2405.10389 [pdf, ps, html, other]
-
Title: Physics-Informed Heterogeneous Graph Neural Networks for DC Blocker PlacementHongwei Jin, Prasanna Balaprakash, Allen Zou, Pieter Ghysels, Aditi S. Krishnapriyan, Adam Mate, Arthur Barnes, Russell BentComments: Paper is accepted by PSCC 2024Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
The threat of geomagnetic disturbances (GMDs) to the reliable operation of the bulk energy system has spurred the development of effective strategies for mitigating their impacts. One such approach involves placing transformer neutral blocking devices, which interrupt the path of geomagnetically induced currents (GICs) to limit their impact. The high cost of these devices and the sparsity of transformers that experience high GICs during GMD events, however, calls for a sparse placement strategy that involves high computational cost. To address this challenge, we developed a physics-informed heterogeneous graph neural network (PIHGNN) for solving the graph-based dc-blocker placement problem. Our approach combines a heterogeneous graph neural network (HGNN) with a physics-informed neural network (PINN) to capture the diverse types of nodes and edges in ac/dc networks and incorporates the physical laws of the power grid. We train the PIHGNN model using a surrogate power flow model and validate it using case studies. Results demonstrate that PIHGNN can effectively and efficiently support the deployment of GIC dc-current blockers, ensuring the continued supply of electricity to meet societal demands. Our approach has the potential to contribute to the development of more reliable and resilient power grids capable of withstanding the growing threat that GMDs pose.
- [4] arXiv:2405.10429 [pdf, ps, html, other]
-
Title: Physics-Guided State-Space Model Augmentation Using Weighted Regularized Neural NetworksSubjects: Systems and Control (eess.SY)
Physics-guided neural networks (PGNN) is an effective tool that combines the benefits of data-driven modeling with the interpretability and generalization of underlying physical information. However, for a classical PGNN, the penalization of the physics-guided part is at the output level, which leads to a conservative result as systems with highly similar state-transition functions, i.e. only slight differences in parameters, can have significantly different time-series outputs. Furthermore, the classical PGNN cost function regularizes the model estimate over the entire state space with a constant trade-off hyperparameter. In this paper, we introduce a novel model augmentation strategy for nonlinear state-space model identification based on PGNN, using a weighted function regularization (W-PGNN). The proposed approach can efficiently augment the prior physics-based state-space models based on measurement data. A new weighted regularization term is added to the cost function to penalize the difference between the state and output function of the baseline physics-based and final identified model. This ensures the estimated model follows the baseline physics model functions in regions where the data has low information content, while placing greater trust in the data when a high informativity is present. The effectiveness of the proposed strategy over the current PGNN method is demonstrated on a benchmark example.
- [5] arXiv:2405.10435 [pdf, ps, html, other]
-
Title: Two-Stage Stochastic Optimal Power Flow for Microgrids With Uncertain Wildfire EffectsSubjects: Systems and Control (eess.SY)
Large-scale power outages caused by extreme weather events are one of the major factors weakening grid resilience. In order to prevent the critical infrastructure from cascading failure, power lines are often proactively de-energized under the threat of a progressing wildfire. In this context, the potential of microgrid (MG) functioning in islanded mode can be exploited to enhance the resiliency of the power grid. However, there are numerous uncertainties originating from these types of events and an accurate modeling of the MG is required to harness its full potential. In this paper, we consider the uncertainty in line outages depending on fire propagation and reduced solar power generation due to the particulate matter in wildfire smoke. We formulate a two-stage stochastic MG optimal power flow problem by utilizing a second-order cone relaxation of the DistFlow model. Leveraging an effective approximation of the resistive heat gain, we separate the complicating constraints of dynamic line rating from the resulting optimization problem. Extensive simulation results corroborate the merits of our proposed framework, which is tested on a modified IEEE 22-bus system.
- [6] arXiv:2405.10507 [pdf, ps, html, other]
-
Title: Flexible Beamforming for Movable Antenna-Enabled Integrated Sensing and CommunicationSubjects: Signal Processing (eess.SP)
This paper investigates flexible beamforming design in an integrated sensing and communication (ISAC) network with movable antennas (MAs). A bistatic radar system is integrated into a multi-user multiple-input-single-output (MU-MISO) system, with the base station (BS) equipped with MAs. This enables array response reconfiguration by adjusting the positions of antennas. Thus, a joint beamforming and antenna position optimization problem, namely flexible beamforming, is proposed to maximize communication rate and sensing mutual information (MI). The fractional programming (FP) method is adopted to transform the non-convex objective function, and we alternatively update the beamforming matrix and antenna positions. Karush-Kuhn-Tucker (KKT) conditions are employed to derive the close-form solution of the beamforming matrix, while we propose an efficient search-based projected gradient ascent (SPGA) method to update the antenna positions. Simulation results demonstrate that MAs significantly enhance the ISAC performance when employing our proposed algorithm, achieving a 59.8% performance gain compared to fixed uniform arrays.
- [7] arXiv:2405.10510 [pdf, ps, html, other]
-
Title: Implementation of the Feedforward Multichannel Virtual Sensing Active Noise Control (MVANC) by Using MATLABSubjects: Signal Processing (eess.SP); Audio and Speech Processing (eess.AS)
The multichannel virtual sensing active noise control (MVANC) methodology is an advanced approach that may provide a wide area of silence at specific virtual positions that are distant from the physical error microphones. Currently, there is a scarcity of open-source programs available for the MVANC algorithm. This work presents a MATLAB code for the MVANC approach, utilizing the multichannel filtered-x least mean square (MCFxLMS) algorithm. The code is designed to be applicable to systems with any number of channels. The code can be found on GitHub.
- [8] arXiv:2405.10535 [pdf, ps, html, other]
-
Title: Dual-Robust Integrated Sensing and Communication: Beamforming under CSI Imperfection and Location UncertaintySubjects: Signal Processing (eess.SP)
A dual-robust design of beamforming is investigated in an integrated sensing and communication (ISAC) system.Existing research on robust ISAC waveform design, while proposing solutions to imperfect channel state information (CSI), generally depends on prior knowledge of the target's approximate location to design waveforms. This approach, however, limits the precision in sensing the target's exact location. In this paper, considering both CSI imperfection and target location uncertainty, a novel framework of joint robust optimization is proposed by maximizing the weighted sum of worst-case data rate and beampattern gain. To address this challenging problem, we propose an efficient two-layer iteration algorithm based on S-Procedure and convex hull. Finally, numerical results verify the effectiveness and performance improvement of our dual-robust algorithm, as well as the trade-off between communication and sensing performance.
- [9] arXiv:2405.10540 [pdf, ps, html, other]
-
Title: Radar Positioning for Accurate Sensing of Pulse Waves at Multiple Sites Using a 3D Human ModelComments: 8 pages, 8 figures, 8 tables. This work is going to be submitted to the IEEE for possible publicationSubjects: Signal Processing (eess.SP)
This study proposes a sensing method using a millimeter-wave array radar and a depth camera to measure pulse waves at multiple sites on the human body. Using a three-dimensional shape model of the target human body measured by the depth camera, the method identifies reflection sites on the body through electromagnetic scattering simulation. On the basis of the simulation, the radar system can be positioned at a suitable location for measuring pulse waves depending on the posture of the target person. Through measurements using radar and depth camera systems, we demonstrate that the proposed method can estimate the body displacement waveform caused by pulse waves accurately, improving the accuracy by 14% compared with a conventional approach without a depth camera. The proposed method can be a key to realizing an accurate and noncontact sensor for monitoring blood pressure.
- [10] arXiv:2405.10550 [pdf, ps, html, other]
-
Title: LighTDiff: Surgical Endoscopic Image Low-Light Enhancement with T-DiffusionTong Chen, Qingcheng Lyu, Long Bai, Erjian Guo, Huxin Gao, Xiaoxiao Yang, Hongliang Ren, Luping ZhouSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Advances in endoscopy use in surgeries face challenges like inadequate lighting. Deep learning, notably the Denoising Diffusion Probabilistic Model (DDPM), holds promise for low-light image enhancement in the medical field. However, DDPMs are computationally demanding and slow, limiting their practical medical applications. To bridge this gap, we propose a lightweight DDPM, dubbed LighTDiff. It adopts a T-shape model architecture to capture global structural information using low-resolution images and gradually recover the details in subsequent denoising steps. We further prone the model to significantly reduce the model size while retaining performance. While discarding certain downsampling operations to save parameters leads to instability and low efficiency in convergence during the training, we introduce a Temporal Light Unit (TLU), a plug-and-play module, for more stable training and better performance. TLU associates time steps with denoised image features, establishing temporal dependencies of the denoising steps and improving denoising outcomes. Moreover, while recovering images using the diffusion model, potential spectral shifts were noted. We further introduce a Chroma Balancer (CB) to mitigate this issue. Our LighTDiff outperforms many competitive LLIE methods with exceptional computational efficiency.
- [11] arXiv:2405.10553 [pdf, ps, html, other]
-
Title: Revealing the Trade-off in ISAC Systems: The KL Divergence PerspectiveComments: 5 pages, 5 figures; submitted to IEEE journals for possible publicationSubjects: Signal Processing (eess.SP)
Integrated sensing and communication (ISAC) is regarded as a promising technique for 6G communication network. In this letter, we investigate the Pareto bound of the ISAC system in terms of a unified Kullback-Leibler (KL) divergence performance metric. We firstly present the relationship between KL divergence and explicit ISAC performance metric, i.e., demodulation error and probability of detection. Thereafter, we investigate the impact of constellation and beamforming design on the Pareto bound via deep learning and semi-definite relaxation (SDR) techniques. Simulation results show the trade-off between sensing and communication performance in terms of bit error rate (BER) and probability of detection under different parameter set-ups.
- [12] arXiv:2405.10561 [pdf, ps, html, other]
-
Title: Infrared Image Super-Resolution via Lightweight Information Split NetworkSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Single image super-resolution (SR) is an established pixel-level vision task aimed at reconstructing a high-resolution image from its degraded low-resolution counterpart. Despite the notable advancements achieved by leveraging deep neural networks for SR, most existing deep learning architectures feature an extensive number of layers, leading to high computational complexity and substantial memory demands. These issues become particularly pronounced in the context of infrared image SR, where infrared devices often have stringent storage and computational constraints. To mitigate these challenges, we introduce a novel, efficient, and precise single infrared image SR model, termed the Lightweight Information Split Network (LISN). The LISN comprises four main components: shallow feature extraction, deep feature extraction, dense feature fusion, and high-resolution infrared image reconstruction. A key innovation within this model is the introduction of the Lightweight Information Split Block (LISB) for deep feature extraction. The LISB employs a sequential process to extract hierarchical features, which are then aggregated based on the relevance of the features under consideration. By integrating channel splitting and shift operations, the LISB successfully strikes an optimal balance between enhanced SR performance and a lightweight framework. Comprehensive experimental evaluations reveal that the proposed LISN achieves superior performance over contemporary state-of-the-art methods in terms of both SR quality and model complexity, affirming its efficacy for practical deployment in resource-constrained infrared imaging applications.
- [13] arXiv:2405.10570 [pdf, ps, other]
-
Title: Simultaneous Deep Learning of Myocardium Segmentation and T2 Quantification for Acute Myocardial Infarction MRIYirong Zhou, Chengyan Wang, Mengtian Lu, Kunyuan Guo, Zi Wang, Dan Ruan, Rui Guo, Peijun Zhao, Jianhua Wang, Naiming Wu, Jianzhong Lin, Yinyin Chen, Hang Jin, Lianxin Xie, Lilan Wu, Liuhong Zhu, Jianjun Zhou, Congbo Cai, He Wang, Xiaobo QuComments: 10 pages, 8 figures, 6 tablesSubjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI)
In cardiac Magnetic Resonance Imaging (MRI) analysis, simultaneous myocardial segmentation and T2 quantification are crucial for assessing myocardial pathologies. Existing methods often address these tasks separately, limiting their synergistic potential. To address this, we propose SQNet, a dual-task network integrating Transformer and Convolutional Neural Network (CNN) components. SQNet features a T2-refine fusion decoder for quantitative analysis, leveraging global features from the Transformer, and a segmentation decoder with multiple local region supervision for enhanced accuracy. A tight coupling module aligns and fuses CNN and Transformer branch features, enabling SQNet to focus on myocardium regions. Evaluation on healthy controls (HC) and acute myocardial infarction patients (AMI) demonstrates superior segmentation dice scores (89.3/89.2) compared to state-of-the-art methods (87.7/87.9). T2 quantification yields strong linear correlations (Pearson coefficients: 0.84/0.93) with label values for HC/AMI, indicating accurate mapping. Radiologist evaluations confirm SQNet's superior image quality scores (4.60/4.58 for segmentation, 4.32/4.42 for T2 quantification) over state-of-the-art methods (4.50/4.44 for segmentation, 3.59/4.37 for T2 quantification). SQNet thus offers accurate simultaneous segmentation and quantification, enhancing cardiac disease diagnosis, such as AMI.
- [14] arXiv:2405.10606 [pdf, ps, html, other]
-
Title: Carrier Aggregation Enabled MIMO-OFDM Integrated Sensing and CommunicationComments: 13page, 9figures, Submitted to IEEE Transactions on Wireless CommunicationsSubjects: Signal Processing (eess.SP)
In the evolution towards the forthcoming era of sixth-generation (6G) mobile communication systems characterized by ubiquitous intelligence, integrated sensing and communication (ISAC) is in a phase of burgeoning development. However, the capabilities of communication and sensing within single frequency band fall short of meeting the escalating demands. To this end, this paper introduces a carrier aggregation (CA)- enabled multi-input multi-output orthogonal frequency division multiplexing (MIMO-OFDM) ISAC system fusing the sensing data on high and low-frequency bands by symbol-level fusion for ultimate communication experience and high-accuracy sensing. The challenges in sensing signal processing introduced by CA include the initial phase misalignment of the echo signals on high and low-frequency bands due to attenuation and radar cross section, and the fusion of the sensing data on high and lowfrequency bands with different physical-layer parameters. To this end, the sensing signal processing is decomposed into two stages. In the first stage, the problem of initial phase misalignment of the echo signals on high and low-frequency bands is solved by the angle compensation, space-domain diversity and vector crosscorrelation operations. In the second stage, this paper realizes symbol-level fusion of the sensing data on high and low-frequency bands through sensing vector rearrangement and cyclic prefix adjustment operations, thereby obtaining high-precision sensing performance. Then, the closed-form communication mutual information (MI) and sensing Cramer-Rao lower bound (CRLB) for the proposed ISAC system are derived to explore the theoretical performance bound with CA. Simulation results validate the feasibility and superiority of the proposed ISAC system.
- [15] arXiv:2405.10623 [pdf, ps, html, other]
-
Title: Model-free fast charging of lithium-ion batteries by online gradient descentSubjects: Systems and Control (eess.SY)
A data-driven solution is provided for the fast-charging problem of lithium-ion batteries with multiple safety and aging constraints. The proposed method optimizes the charging current based on the observed history of measurable battery quantities, such as the input current, terminal voltage, and temperature. The proposed method does not need any detailed battery model or full-charging training episodes. The theoretical convergence is proven under mild conditions and is validated numerically on several linear and nonlinear battery models, including single-particle and equivalent-circuit models.
- [16] arXiv:2405.10649 [pdf, ps, html, other]
-
Title: Recovery of Sparse Graph SignalsComments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessibleSubjects: Signal Processing (eess.SP); Systems and Control (eess.SY); Optimization and Control (math.OC)
This paper investigates the recovery of a node-domain sparse graph signal from the output of a graph filter. This problem, often referred to as the identification of the source of a diffused sparse graph signal, is seminal in the field of graph signal processing (GSP). Sparse graph signals can be used in the modeling of a variety of real-world applications in networks, such as social, biological, and power systems, and enable various GSP tasks, such as graph signal reconstruction, blind deconvolution, and sampling. In this paper, we assume double sparsity of both the graph signal and the graph topology, as well as a low-order graph filter. We propose three algorithms to reconstruct the support set of the input sparse graph signal from the graph filter output samples, leveraging these assumptions and the generalized information criterion (GIC). First, we describe the graph multiple GIC (GM-GIC) method, which is based on partitioning the dictionary elements (graph filter matrix columns) that capture information on the signal into smaller subsets. Then, the local GICs are computed for each subset and aggregated to make a global decision. Second, inspired by the well-known branch and bound (BNB) approach, we develop the graph-based branch and bound GIC (graph-BNB-GIC), and incorporate a new tractable heuristic bound tailored to the graph and graph filter characteristics. Finally, we propose the graph-based first order correction (GFOC) method, which improves existing sparse recovery methods by iteratively examining potential improvements to the GIC cost function through replacing elements from the estimated support set with elements from their one-hop neighborhood. We conduct simulations that demonstrate that the proposed sparse recovery methods outperform existing methods in terms of support set recovery accuracy, and without a significant computational overhead.
- [17] arXiv:2405.10691 [pdf, ps, html, other]
-
Title: LoCI-DiffCom: Longitudinal Consistency-Informed Diffusion Model for 3D Infant Brain Image CompletionZihao Zhu, Tianli Tao, Yitian Tao, Haowen Deng, Xinyi Cai, Gaofeng Wu, Kaidong Wang, Haifeng Tang, Lixuan Zhu, Zhuoyang Gu, Jiawei Huang, Dinggang Shen, Han ZhangSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
The infant brain undergoes rapid development in the first few years after birth.Compared to cross-sectional studies, longitudinal studies can depict the trajectories of infants brain development with higher accuracy, statistical power and flexibility.However, the collection of infant longitudinal magnetic resonance (MR) data suffers a notorious dropout problem, resulting in incomplete datasets with missing time points. This limitation significantly impedes subsequent neuroscience and clinical modeling. Yet, existing deep generative models are facing difficulties in missing brain image completion, due to sparse data and the nonlinear, dramatic contrast/geometric variations in the developing brain. We propose LoCI-DiffCom, a novel Longitudinal Consistency-Informed Diffusion model for infant brain image Completion,which integrates the images from preceding and subsequent time points to guide a diffusion model for generating high-fidelity missing data. Our designed LoCI module can work on highly sparse sequences, relying solely on data from two temporal points. Despite wide separation and diversity between age time points, our approach can extract individualized developmental features while ensuring context-aware consistency. Our experiments on a large infant brain MR dataset demonstrate its effectiveness with consistent performance on missing infant brain MR completion even in big gap scenarios, aiding in better delineation of early developmental trajectories.
- [18] arXiv:2405.10705 [pdf, ps, html, other]
-
Title: 3D Vessel Reconstruction from Sparse-View Dynamic DSA Images via Vessel Probability Guided Attenuation LearningZhentao Liu, Huangxuan Zhao, Wenhui Qin, Zhenghong Zhou, Xinggang Wang, Wenping Wang, Xiaochun Lai, Chuansheng Zheng, Dinggang Shen, Zhiming CuiComments: 12 pages, 13 figures, 5 tablesSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Digital Subtraction Angiography (DSA) is one of the gold standards in vascular disease diagnosing. With the help of contrast agent, time-resolved 2D DSA images deliver comprehensive insights into blood flow information and can be utilized to reconstruct 3D vessel structures. Current commercial DSA systems typically demand hundreds of scanning views to perform reconstruction, resulting in substantial radiation exposure. However, sparse-view DSA reconstruction, aimed at reducing radiation dosage, is still underexplored in the research community. The dynamic blood flow and insufficient input of sparse-view DSA images present significant challenges to the 3D vessel reconstruction task. In this study, we propose to use a time-agnostic vessel probability field to solve this problem effectively. Our approach, termed as vessel probability guided attenuation learning, represents the DSA imaging as a complementary weighted combination of static and dynamic attenuation fields, with the weights derived from the vessel probability field. Functioning as a dynamic mask, vessel probability provides proper gradients for both static and dynamic fields adaptive to different scene types. This mechanism facilitates a self-supervised decomposition between static backgrounds and dynamic contrast agent flow, and significantly improves the reconstruction quality. Our model is trained by minimizing the disparity between synthesized projections and real captured DSA images. We further employ two training strategies to improve our reconstruction quality: (1) coarse-to-fine progressive training to achieve better geometry and (2) temporal perturbed rendering loss to enforce temporal consistency. Experimental results have demonstrated superior quality on both 3D vessel reconstruction and 2D view synthesis.
- [19] arXiv:2405.10723 [pdf, ps, html, other]
-
Title: Eddeep: Fast eddy-current distortion correction for diffusion MRI with deep learningComments: submitted to MICCAI 2024Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Modern diffusion MRI sequences commonly acquire a large number of volumes with diffusion sensitization gradients of differing strengths or directions. Such sequences rely on echo-planar imaging (EPI) to achieve reasonable scan duration. However, EPI is vulnerable to off-resonance effects, leading to tissue susceptibility and eddy-current induced distortions. The latter is particularly problematic because it causes misalignment between volumes, disrupting downstream modelling and analysis. The essential correction of eddy distortions is typically done post-acquisition, with image registration. However, this is non-trivial because correspondence between volumes can be severely disrupted due to volume-specific signal attenuations induced by varying directions and strengths of the applied gradients. This challenge has been successfully addressed by the popular FSL~Eddy tool but at considerable computational cost. We propose an alternative approach, leveraging recent advances in image processing enabled by deep learning (DL). It consists of two convolutional neural networks: 1) An image translator to restore correspondence between images; 2) A registration model to align the translated images. Results demonstrate comparable distortion estimates to FSL~Eddy, while requiring only modest training sample sizes. This work, to the best of our knowledge, is the first to tackle this problem with deep learning. Together with recently developed DL-based susceptibility correction techniques, they pave the way for real-time preprocessing of diffusion MRI, facilitating its wider uptake in the clinic.
- [20] arXiv:2405.10749 [pdf, ps, html, other]
-
Title: Universal Joint Source-Channel Coding for Modulation-Agnostic Semantic CommunicationSubjects: Signal Processing (eess.SP)
From the perspective of joint source-channel coding (JSCC), there has been significant research on utilizing semantic communication, which inherently possesses analog characteristics, within digital device environments. However, a single-model approach that operates modulation-agnostically across various digital modulation orders has not yet been established. This article presents the first attempt at such an approach by proposing a universal joint source-channel coding (uJSCC) system that utilizes a single-model encoder-decoder pair and trained vector quantization (VQ) codebooks. To support various modulation orders within a single model, the operation of every neural network (NN)-based module in the uJSCC system requires the selection of modulation orders according to signal-to-noise ratio (SNR) boundaries. To address the challenge of unequal output statistics from shared parameters across NN layers, we integrate multiple batch normalization (BN) layers, selected based on modulation order, after each NN layer. This integration occurs with minimal impact on the overall model size. Through a comprehensive series of experiments, we validate that this modulation-agnostic semantic communication framework demonstrates superiority over existing digital semantic communication approaches in terms of model complexity, communication efficiency, and task effectiveness.
- [21] arXiv:2405.10750 [pdf, ps, html, other]
-
Title: Parameter Identification for Electrochemical Models of Lithium-Ion Batteries Using Bayesian OptimizationComments: 6 pagesSubjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
Efficient parameter identification of electrochemical models is crucial for accurate monitoring and control of lithium-ion cells. This process becomes challenging when applied to complex models that rely on a considerable number of interdependent parameters that affect the output response. Gradient-based and metaheuristic optimization techniques, although previously employed for this task, are limited by their lack of robustness, high computational costs, and susceptibility to local minima. In this study, Bayesian Optimization is used for tuning the dynamic parameters of an electrochemical equivalent circuit battery model (E-ECM) for a nickel-manganese-cobalt (NMC)-graphite cell. The performance of the Bayesian Optimization is compared with baseline methods based on gradient-based and metaheuristic approaches. The robustness of the parameter optimization method is tested by performing verification using an experimental drive cycle. The results indicate that Bayesian Optimization outperforms Gradient Descent and PSO optimization techniques, achieving reductions on average testing loss by 28.8% and 5.8%, respectively. Moreover, Bayesian optimization significantly reduces the variance in testing loss by 95.8% and 72.7%, respectively.
- [22] arXiv:2405.10779 [pdf, ps, html, other]
-
Title: Baseline Results for Selected Nonlinear System Identification BenchmarksMax D. Champneys, Gerben I. Beintema, Roland Tóth, Maarten Schoukens, Maarten Schoukens, Timothy J. RogersSubjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
Nonlinear system identification remains an important open challenge across research and academia. Large numbers of novel approaches are seen published each year, each presenting improvements or extensions to existing methods. It is natural, therefore, to consider how one might choose between these competing models. Benchmark datasets provide one clear way to approach this question. However, to make meaningful inference based on benchmark performance it is important to understand how well a new method performs comparatively to results available with well-established methods. This paper presents a set of ten baseline techniques and their relative performances on five popular benchmarks. The aim of this contribution is to stimulate thought and discussion regarding objective comparison of identification methodologies.
- [23] arXiv:2405.10780 [pdf, ps, other]
-
Title: Intelligent Neural Interfaces: An Emerging Era in NeurotechnologySubjects: Signal Processing (eess.SP); Hardware Architecture (cs.AR); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)
Integrating smart algorithms on neural devices presents significant opportunities for various brain disorders. In this paper, we review the latest advancements in the development of three categories of intelligent neural prostheses featuring embedded signal processing on the implantable or wearable device. These include: 1) Neural interfaces for closed-loop symptom tracking and responsive stimulation; 2) Neural interfaces for emerging network-related conditions, such as psychiatric disorders; and 3) Intelligent BMI SoCs for movement recovery following paralysis.
- [24] arXiv:2405.10786 [pdf, ps, html, other]
-
Title: Distinctive and Natural Speaker Anonymization via Singular Value Transformation-assisted MatrixComments: Accepted by IEEE/ACM Transactions on Audio, Speech, and Language ProcessingSubjects: Audio and Speech Processing (eess.AS)
Speaker anonymization is an effective privacy protection solution that aims to conceal the speaker's identity while preserving the naturalness and distinctiveness of the original speech. Mainstream approaches use an utterance-level vector from a pre-trained automatic speaker verification (ASV) model to represent speaker identity, which is then averaged or modified for anonymization. However, these systems suffer from deterioration in the naturalness of anonymized speech, degradation in speaker distinctiveness, and severe privacy leakage against powerful attackers. To address these issues and especially generate more natural and distinctive anonymized speech, we propose a novel speaker anonymization approach that models a matrix related to speaker identity and transforms it into an anonymized singular value transformation-assisted matrix to conceal the original speaker identity. Our approach extracts frame-level speaker vectors from a pre-trained ASV model and employs an attention mechanism to create a speaker-score matrix and speaker-related tokens. Notably, the speaker-score matrix acts as the weight for the corresponding speaker-related token, representing the speaker's identity. The singular value transformation-assisted matrix is generated by recomposing the decomposed orthonormal eigenvectors matrix and non-linear transformed singular through Singular Value Decomposition (SVD). Experiments on VoicePrivacy Challenge datasets demonstrate the effectiveness of our approach in protecting speaker privacy under all attack scenarios while maintaining speech naturalness and distinctiveness.
- [25] arXiv:2405.10803 [pdf, ps, html, other]
-
Title: A Large-scale Multi Domain Leukemia Dataset for the White Blood Cells Detection with Morphological Attributes for ExplainabilityComments: Early AcceptSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Earlier diagnosis of Leukemia can save thousands of lives annually. The prognosis of leukemia is challenging without the morphological information of White Blood Cells (WBC) and relies on the accessibility of expensive microscopes and the availability of hematologists to analyze Peripheral Blood Samples (PBS). Deep Learning based methods can be employed to assist hematologists. However, these algorithms require a large amount of labeled data, which is not readily available. To overcome this limitation, we have acquired a realistic, generalized, and large dataset. To collect this comprehensive dataset for real-world applications, two microscopes from two different cost spectrums (high-cost HCM and low-cost LCM) are used for dataset capturing at three magnifications (100x, 40x, 10x) through different sensors (high-end camera for HCM, middle-level camera for LCM and mobile-phone camera for both). The high-sensor camera is 47 times more expensive than the middle-level camera and HCM is 17 times more expensive than LCM. In this collection, using HCM at high resolution (100x), experienced hematologists annotated 10.3k WBC types (14) and artifacts, having 55k morphological labels (Cell Size, Nuclear Chromatin, Nuclear Shape, etc.) from 2.4k images of several PBS leukemia patients. Later on, these annotations are transferred to other 2 magnifications of HCM, and 3 magnifications of LCM, and on each camera captured images. Along with the LeukemiaAttri dataset, we provide baselines over multiple object detectors and Unsupervised Domain Adaptation (UDA) strategies, along with morphological information-based attribute prediction. The dataset will be publicly available after publication to facilitate the research in this direction.
- [26] arXiv:2405.10825 [pdf, ps, html, other]
-
Title: Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and OpportunitiesHao Zhou, Chengming Hu, Ye Yuan, Yufei Cui, Yili Jin, Can Chen, Haolun Wu, Dun Yuan, Li Jiang, Di Wu, Xue Liu, Charlie Zhang, Xianbin Wang, Jiangchuan LiuSubjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
Large language models (LLMs) have received considerable attention recently due to their outstanding comprehension and reasoning capabilities, leading to great progress in many fields. The advancement of LLM techniques also offers promising opportunities to automate many tasks in the telecommunication (telecom) field. After pre-training and fine-tuning, LLMs can perform diverse downstream tasks based on human instructions, paving the way to artificial general intelligence (AGI)-enabled 6G. Given the great potential of LLM technologies, this work aims to provide a comprehensive overview of LLM-enabled telecom networks. In particular, we first present LLM fundamentals, including model architecture, pre-training, fine-tuning, inference and utilization, model evaluation, and telecom deployment. Then, we introduce LLM-enabled key techniques and telecom applications in terms of generation, classification, optimization, and prediction problems. Specifically, the LLM-enabled generation applications include telecom domain knowledge, code, and network configuration generation. After that, the LLM-based classification applications involve network security, text, image, and traffic classification problems. Moreover, multiple LLM-enabled optimization techniques are introduced, such as automated reward function design for reinforcement learning and verbal reinforcement learning. Furthermore, for LLM-aided prediction problems, we discussed time-series prediction models and multi-modality prediction problems for telecom. Finally, we highlight the challenges and identify the future directions of LLM-enabled telecom networks.
- [27] arXiv:2405.10828 [pdf, ps, html, other]
-
Title: Analysis of Impulsive Interference in Digital Audio Broadcasting Systems in Electric VehiclesComments: 44th Symposium on Information Theory and Signal Processing in the Benelux (SITB 2024), Delft, the NetherlandsSubjects: Signal Processing (eess.SP); Machine Learning (cs.LG)
Recently, new types of interference in electric vehicles (EVs), such as converters switching and/or battery chargers, have been found to degrade the performance of wireless digital transmission systems. Measurements show that such an interference is characterized by impulsive behavior and is widely varying in time. This paper uses recorded data from our EV testbed to analyze the impulsive interference in the digital audio broadcasting band. Moreover, we use our analysis to obtain a corresponding interference model. In particular, we studied the temporal characteristics of the interference and confirmed that its amplitude indeed exhibits an impulsive behavior. Our results show that impulsive events span successive received signal samples and thus indicate a bursty nature. To this end, we performed a data-driven modification of a well-established model for bursty impulsive interference, the Markov-Middleton model, to produce synthetic noise realization. We investigate the optimal symbol detector design based on the proposed model and show significant performance gains compared to the conventional detector based on the additive white Gaussian noise assumption.
- [28] arXiv:2405.10833 [pdf, ps, html, other]
-
Title: Automatic segmentation of Organs at Risk in Head and Neck cancer patients from CT and MRI scansSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Background and purpose: Deep Learning (DL) has been widely explored for Organs at Risk (OARs) segmentation; however, most studies have focused on a single modality, either CT or MRI, not both simultaneously. This study presents a high-performing DL pipeline for segmentation of 30 OARs from MRI and CT scans of Head and Neck (H&N) cancer patients.
Materials and methods: Paired CT and MRI-T1 images from 42 H&N cancer patients alongside annotation for 30 OARs from the H&N OAR CT & MR segmentation challenge dataset were used to develop a segmentation pipeline. After cropping irrelevant regions, rigid followed by non-rigid registration of CT and MRI volumes was performed. Two versions of the CT volume, representing soft tissues and bone anatomy, were stacked with the MRI volume and used as input to an nnU-Net pipeline. Modality Dropout was used during the training to force the model to learn from the different modalities. Segmentation masks were predicted with the trained model for an independent set of 14 new patients. The mean Dice Score (DS) and Hausdorff Distance (HD) were calculated for each OAR across these patients to evaluate the pipeline.
Results: This resulted in an overall mean DS and HD of 0.777 +- 0.118 and 3.455 +- 1.679, respectively, establishing the state-of-the-art (SOTA) for this challenge at the time of submission.
Conclusion: The proposed pipeline achieved the best DS and HD among all participants of the H&N OAR CT and MR segmentation challenge and sets a new SOTA for automated segmentation of H&N OARs. - [29] arXiv:2405.10870 [pdf, ps, html, other]
-
Title: Multicenter Privacy-Preserving Model Training for Deep Learning Brain Metastases AutosegmentationYixing Huang, Zahra Khodabakhshi, Ahmed Gomaa, Manuel Schmidt, Rainer Fietkau, Matthias Guckenberger, Nicolaus Andratschke, Christoph Bert, Stephanie Tanadini-Lang, Florian PutzComments: Submission to the Green Journal (Major Revision)Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Objectives: This work aims to explore the impact of multicenter data heterogeneity on deep learning brain metastases (BM) autosegmentation performance, and assess the efficacy of an incremental transfer learning technique, namely learning without forgetting (LWF), to improve model generalizability without sharing raw data.
Materials and methods: A total of six BM datasets from University Hospital Erlangen (UKER), University Hospital Zurich (USZ), Stanford, UCSF, NYU and BraTS Challenge 2023 on BM segmentation were used for this evaluation. First, the multicenter performance of a convolutional neural network (DeepMedic) for BM autosegmentation was established for exclusive single-center training and for training on pooled data, respectively. Subsequently bilateral collaboration was evaluated, where a UKER pretrained model is shared to another center for further training using transfer learning (TL) either with or without LWF.
Results: For single-center training, average F1 scores of BM detection range from 0.625 (NYU) to 0.876 (UKER) on respective single-center test data. Mixed multicenter training notably improves F1 scores at Stanford and NYU, with negligible improvement at other centers. When the UKER pretrained model is applied to USZ, LWF achieves a higher average F1 score (0.839) than naive TL (0.570) and single-center training (0.688) on combined UKER and USZ test data. Naive TL improves sensitivity and contouring accuracy, but compromises precision. Conversely, LWF demonstrates commendable sensitivity, precision and contouring accuracy. When applied to Stanford, similar performance was observed.
Conclusion: Data heterogeneity results in varying performance in BM autosegmentation, posing challenges to model generalizability. LWF is a promising approach to peer-to-peer privacy-preserving model training. - [30] arXiv:2405.10875 [pdf, ps, html, other]
-
Title: Recursively Feasible Shrinking-Horizon MPC in Dynamic Environments with Conformal Prediction GuaranteesSubjects: Systems and Control (eess.SY); Machine Learning (stat.ML)
In this paper, we focus on the problem of shrinking-horizon Model Predictive Control (MPC) in uncertain dynamic environments. We consider controlling a deterministic autonomous system that interacts with uncontrollable stochastic agents during its mission. Employing tools from conformal prediction, existing works derive high-confidence prediction regions for the unknown agent trajectories, and integrate these regions in the design of suitable safety constraints for MPC. Despite guaranteeing probabilistic safety of the closed-loop trajectories, these constraints do not ensure feasibility of the respective MPC schemes for the entire duration of the mission. We propose a shrinking-horizon MPC that guarantees recursive feasibility via a gradual relaxation of the safety constraints as new prediction regions become available online. This relaxation enforces the safety constraints to hold over the least restrictive prediction region from the set of all available prediction regions. In a comparative case study with the state of the art, we empirically show that our approach results in tighter prediction regions and verify recursive feasibility of our MPC scheme.
- [31] arXiv:2405.10892 [pdf, ps, html, other]
-
Title: Neuroscheduling for Remote EstimationComments: Submitted for presentation at the 2024 Asilomar Conference on Signals, Systems, and ComputersSubjects: Systems and Control (eess.SY)
Many modern distributed systems consist of devices that generate more data than what can be transmitted via a communication link in near real time with high-fidelity. We consider the scheduling problem in which a device has access to multiple data sources, but at any moment, only one of them is revealed in real-time to a remote receiver. Even when the sources are Gaussian, and the fidelity criterion is the mean squared error, the globally optimal data selection strategy is not known. We propose a data-driven methodology to search for the elusive optimal solution using linear function approximation approach called neuroscheduling and establish necessary and sufficient conditions for the optimal scheduler to not over fit training data. Additionally, we present several numerical results that show that the globally optimal scheduler and estimator pair to the Gaussian case are nonlinear.
New submissions for Monday, 20 May 2024 (showing 31 of 31 entries )
- [32] arXiv:2405.10391 (cross-list from cs.RO) [pdf, ps, html, other]
-
Title: Vision Transformers for End-to-End Vision-Based Quadrotor Obstacle AvoidanceComments: 8 pages, 10 figures, 3 tablesSubjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
We demonstrate the capabilities of an attention-based end-to-end approach for high-speed quadrotor obstacle avoidance in dense, cluttered environments, with comparison to various state-of-the-art architectures. Quadrotor unmanned aerial vehicles (UAVs) have tremendous maneuverability when flown fast; however, as flight speed increases, traditional vision-based navigation via independent mapping, planning, and control modules breaks down due to increased sensor noise, compounding errors, and increased processing latency. Thus, learning-based, end-to-end planning and control networks have shown to be effective for online control of these fast robots through cluttered environments. We train and compare convolutional, U-Net, and recurrent architectures against vision transformer models for depth-based end-to-end control, in a photorealistic, high-physics-fidelity simulator as well as in hardware, and observe that the attention-based models are more effective as quadrotor speeds increase, while recurrent models with many layers provide smoother commands at lower speeds. To the best of our knowledge, this is the first work to utilize vision transformers for end-to-end vision-based quadrotor control.
- [33] arXiv:2405.10441 (cross-list from cs.RO) [pdf, ps, other]
-
Title: Trajectory tracking control of a Remotely Operated Underwater Vehicle based on Fuzzy Disturbance Adaptation and Controller Parameter OptimizationSubjects: Robotics (cs.RO); Systems and Control (eess.SY)
The exploration of under-ice environments presents unique challenges due to limited access for scientific research. This report investigates the potential of deploying a fully actuated Remotely Operated Vehicle (ROV) for shallow area exploration beneath ice sheets. Leveraging advancements in marine robotics technology, ROVs offer a promising solution for extending human presence into remote underwater locations. To enable successful under-ice exploration, the ROV must follow precise trajectories for effective localization signal reception. This study develops a multi-input-multi-output (MIMO) nonlinear system controller, incorporating a Lyapunov-based stability guarantee and an adaptation law to mitigate unknown environmental disturbances. Fuzzy logic is employed to dynamically adjust adaptation rates, enhancing performance in highly nonlinear ROV dynamic systems. Additionally, a Particle Swarm Optimization (PSO) algorithm automates the tuning of controller parameters for optimal trajectory tracking. The report details the ROV dynamic model, the proposed control framework, and the PSO-based tuning process. Simulation-based experiments validate the efficacy of the methodology, with experimental results demonstrating superior trajectory tracking performance compared to baseline controllers. This work contributes to the advancement of under-ice exploration capabilities and sets the stage for future research in marine robotics and autonomous underwater systems.
- [34] arXiv:2405.10463 (cross-list from physics.optics) [pdf, ps, html, other]
-
Title: Single-shot volumetric fluorescence imaging with neural fieldsOumeng Zhang, Haowen Zhou, Brandon Y. Feng, Elin M. Larsson, Reinaldo E. Alcalde, Siyuan Yin, Catherine Deng, Changhuei YangSubjects: Optics (physics.optics); Image and Video Processing (eess.IV)
Single-shot volumetric fluorescence (SVF) imaging offers a significant advantage over traditional imaging methods that require scanning across multiple axial planes as it can capture biological processes with high temporal resolution across a large field of view. Existing SVF imaging methods often require large, complex point spread functions (PSFs) to meet the multiplexing requirements of compressed sensing, which limits the signal-to-noise ratio, resolution and/or field of view. In this paper, we introduce the QuadraPol PSF combined with neural fields, a novel approach for SVF imaging. This method utilizes a cost-effective custom polarizer at the back focal plane and a polarization camera to detect fluorescence, effectively encoding the 3D scene within a compact PSF without depth ambiguity. Additionally, we propose a reconstruction algorithm based on the neural fields technique that addresses the inaccuracies of phase retrieval methods used to correct imaging system aberrations. This algorithm combines the accuracy of experimental PSFs with the long depth of field of computationally generated retrieved PSFs. QuadraPol PSF, combined with neural fields, significantly reduces the acquisition time of a conventional fluorescence microscope by approximately 20 times and captures a 100 mm$^3$ cubic volume in one shot. We validate the effectiveness of both our hardware and algorithm through all-in-focus imaging of bacterial colonies on sand surfaces and visualization of plant root morphology. Our approach offers a powerful tool for advancing biological research and ecological studies.
- [35] arXiv:2405.10496 (cross-list from cs.IT) [pdf, ps, html, other]
-
Title: Electromagnetic Information Theory for Holographic MIMO CommunicationsLi Wei, Tierui Gong, Chongwen Huang, Zhaoyang Zhang, Wei E. I. Sha, Zhi Ning Chen, Linglong Dai, Merouane Debbah, Chau YuenSubjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Holographic multiple-input multiple-output (HMIMO) utilizes a compact antenna array to form a nearly continuous aperture, thereby enhancing higher capacity and more flexible configurations compared with conventional MIMO systems, making it attractive in current scientific research. Key questions naturally arise regarding the potential of HMIMO to surpass Shannon's theoretical limits and how far its capabilities can be extended. However, the traditional Shannon information theory falls short in addressing these inquiries because it only focuses on the information itself while neglecting the underlying carrier, electromagnetic (EM) waves, and environmental interactions. To fill up the gap between the theoretical analysis and the practical application for HMIMO systems, we introduce electromagnetic information theory (EIT) in this paper. This paper begins by laying the foundation for HMIMO-oriented EIT, encompassing EM wave equations and communication regions. In the context of HMIMO systems, the resultant physical limitations are presented, involving Chu's limit, Harrington's limit, Hannan's limit, and the evaluation of coupling effects. Field sampling and HMIMO-assisted oversampling are also discussed to guide the optimal HMIMO design within the EIT framework. To comprehensively depict the EM-compliant propagation process, we present the approximate and exact channel modeling approaches in near-/far-field zones. Furthermore, we discuss both traditional Shannon's information theory, employing the probabilistic method, and Kolmogorov information theory, utilizing the functional analysis, for HMIMO-oriented EIT systems.
- [36] arXiv:2405.10513 (cross-list from cs.LG) [pdf, ps, html, other]
-
Title: Federated Learning With Energy Harvesting Devices: An MDP FrameworkSubjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
Federated learning (FL) requires edge devices to perform local training and exchange information with a parameter server, leading to substantial energy consumption. A critical challenge in practical FL systems is the rapid energy depletion of battery-limited edge devices, which curtails their operational lifespan and affects the learning performance. To address this issue, we apply energy harvesting technique in FL systems to extract ambient energy for continuously powering edge devices. We first establish the convergence bound for the wireless FL system with energy harvesting devices, illustrating that the convergence is impacted by partial device participation and packet drops, both of which depend on the energy supply. To accelerate the convergence, we formulate a joint device scheduling and power control problem and model it as a Markov decision process (MDP). By solving this MDP, we derive the optimal transmission policy and demonstrate that it possesses a monotone structure with respect to the battery and channel states. To overcome the curse of dimensionality caused by the exponential complexity of computing the optimal policy, we propose a low-complexity algorithm, which is asymptotically optimal as the number of devices increases. Furthermore, for unknown channels and harvested energy statistics, we develop a structure-enhanced deep reinforcement learning algorithm that leverages the monotone structure of the optimal policy to improve the training performance. Finally, extensive numerical experiments on real-world datasets are presented to validate the theoretical results and corroborate the effectiveness of the proposed algorithms.
- [37] arXiv:2405.10514 (cross-list from cs.IT) [pdf, ps, html, other]
-
Title: Secrecy Performance Analysis of Multi-Functional RIS-Assisted NOMA NetworksComments: 14 pages, 9 figures, submitted to IEEE transactions on wireless communicationSubjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Although reconfigurable intelligent surface (RIS) can improve the secrecy communication performance of wireless users, it still faces challenges such as limited coverage and double-fading effect. To address these issues, in this paper, we utilize a novel multi-functional RIS (MF-RIS) to enhance the secrecy performance of wireless users, and investigate the physical layer secrecy problem in non-orthogonal multiple access (NOMA) networks. Specifically, we derive closed-form expressions for the secrecy outage probability (SOP) and secrecy throughput of users in the MF-RIS-assisted NOMA networks with external and internal eavesdroppers. The asymptotic expressions for SOP and secrecy diversity order are also analyzed under high signal-to-noise ratio (SNR) conditions. Additionally, we examine the impact of receiver hardware limitations and error transmission-induced imperfect successive interference cancellation (SIC) on the secrecy performance. Numerical results indicate that: i) under the same power budget, the secrecy performance achieved by MF-RIS significantly outperforms active RIS and simultaneously transmitting and reflecting RIS; ii) with increasing power budget, residual interference caused by imperfect SIC surpasses thermal noise as the primary factor affecting secrecy capacity; and iii) deploying additional elements at the MF-RIS brings significant secrecy enhancements for the external eavesdropping scenario, in contrast to the internal eavesdropping case.
- [38] arXiv:2405.10518 (cross-list from cs.CV) [pdf, ps, html, other]
-
Title: Enhancing Perception Quality in Remote Sensing Image Compression via Invertible Neural NetworkSubjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Decoding remote sensing images to achieve high perceptual quality, particularly at low bitrates, remains a significant challenge. To address this problem, we propose the invertible neural network-based remote sensing image compression (INN-RSIC) method. Specifically, we capture compression distortion from an existing image compression algorithm and encode it as a set of Gaussian-distributed latent variables via INN. This ensures that the compression distortion in the decoded image becomes independent of the ground truth. Therefore, by leveraging the inverse mapping of INN, we can input the decoded image along with a set of randomly resampled Gaussian distributed variables into the inverse network, effectively generating enhanced images with better perception quality. To effectively learn compression distortion, channel expansion, Haar transformation, and invertible blocks are employed to construct the INN. Additionally, we introduce a quantization module (QM) to mitigate the impact of format conversion, thus enhancing the framework's generalization and improving the perceptual quality of enhanced images. Extensive experiments demonstrate that our INN-RSIC significantly outperforms the existing state-of-the-art traditional and deep learning-based image compression methods in terms of perception quality.
- [39] arXiv:2405.10589 (cross-list from cs.CV) [pdf, ps, html, other]
-
Title: Improving Point-based Crowd Counting and Localization Based on Auxiliary Point GuidanceSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
Crowd counting and localization have become increasingly important in computer vision due to their wide-ranging applications. While point-based strategies have been widely used in crowd counting methods, they face a significant challenge, i.e., the lack of an effective learning strategy to guide the matching process. This deficiency leads to instability in matching point proposals to target points, adversely affecting overall performance. To address this issue, we introduce an effective approach to stabilize the proposal-target matching in point-based methods. We propose Auxiliary Point Guidance (APG) to provide clear and effective guidance for proposal selection and optimization, addressing the core issue of matching uncertainty. Additionally, we develop Implicit Feature Interpolation (IFI) to enable adaptive feature extraction in diverse crowd scenarios, further enhancing the model's robustness and accuracy. Extensive experiments demonstrate the effectiveness of our approach, showing significant improvements in crowd counting and localization performance, particularly under challenging conditions. The source codes and trained models will be made publicly available.
- [40] arXiv:2405.10695 (cross-list from cs.IT) [pdf, ps, other]
-
Title: On the Design of Super ConstellationsThrassos K. Oikonomou, Dimitrios Tyrovolas, Sotiris A. Tegos, Panagiotis D. Diamantoulakis, Panagiotis Sarigiannidis, George K. KaragiannidisSubjects: Information Theory (cs.IT); Signal Processing (eess.SP)
In the evolving landscape of sixth-generation (6G) wireless networks, which demand ultra high data rates, this study introduces the concept of super constellation communications. Also, we present super amplitude phase shift keying (SAPSK), an innovative modulation technique designed to achieve these ultra high data rate demands. SAPSK is complemented by the generalized polar distance detector (GPD-D), which approximates the optimal maximum likelihood detector in channels with Gaussian phase noise (GPN). By leveraging the decision regions formulated by GPD-D, a tight closed-form approximation for the symbol error probability (SEP) of SAPSK constellations is derived, while a detection algorithm with O(1) time complexity is developed to ensure fast and efficient SAPSK symbol detection. Finally, the theoretical performance of SAPSK and the efficiency of the proposed O(1) algorithm are validated by numerical simulations, highlighting both its superiority in terms of SEP compared to various constellations and its practical advantages in terms of fast and accurate symbol detection.
- [41] arXiv:2405.10814 (cross-list from cs.IT) [pdf, ps, html, other]
-
Title: Data-Driven Symbol Detection for Intersymbol Interference Channels with Bursty Impulsive NoiseComments: This work has been submitted to the IEEE for possible publicationSubjects: Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)
We developed machine learning approaches for data-driven trellis-based soft symbol detection in coded transmission over intersymbol interference (ISI) channels in presence of bursty impulsive noise (IN), for example encountered in wireless digital broadcasting systems and vehicular communications. This enabled us to obtain optimized detectors based on the Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm while circumventing the use of full channel state information (CSI) for computing likelihoods and trellis state transition probabilities. First, we extended the application of the neural network (NN)-aided BCJR, recently proposed for ISI channels with additive white Gaussian noise (AWGN). Although suitable for estimating likelihoods via labeling of transmission sequences, the BCJR-NN method does not provide a framework for learning the trellis state transitions. In addition to detection over the joint ISI and IN states we also focused on another scenario where trellis transitions are not trivial: detection for the ISI channel with AWGN with inaccurate knowledge of the channel memory at the receiver. Without access to the accurate state transition matrix, the BCJR- NN performance significantly degrades in both settings. To this end, we devised an alternative approach for data-driven BCJR detection based on the unsupervised learning of a hidden Markov model (HMM). The BCJR-HMM allowed us to optimize both the likelihood function and the state transition matrix without labeling. Moreover, we demonstrated the viability of a hybrid NN and HMM BCJR detection where NN is used for learning the likelihoods, while the state transitions are optimized via HMM. While reducing the required prior channel knowledge, the examined data-driven detectors with learned trellis state transitions achieve bit error rates close to the optimal full CSI-based BCJR, significantly outperforming detection with inaccurate CSI.
- [42] arXiv:2405.10847 (cross-list from cs.RO) [pdf, ps, html, other]
-
Title: Model Predictive Contouring Control for Vehicle Obstacle Avoidance at the Limit of Handling Using Torque VectoringAlberto Bertipaglia, Davide Tavernini, Umberto Montanaro, Mohsen Alirezaei, Riender Happee, Aldo Sorniotti, Barys ShyrokauComments: Accepted at IEEE/ASME International Conference on Advanced Intelligent Mechatronics, Boston, USA, 2024Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
This paper presents an original approach to vehicle obstacle avoidance. It involves the development of a nonlinear Model Predictive Contouring Control, which uses torque vectoring to stabilise and drive the vehicle in evasive manoeuvres at the limit of handling. The proposed algorithm combines motion planning, path tracking and vehicle stability objectives, prioritising collision avoidance in emergencies. The controller's prediction model is a nonlinear double-track vehicle model based on an extended Fiala tyre to capture the nonlinear coupled longitudinal and lateral dynamics. The controller computes the optimal steering angle and the longitudinal forces per each of the four wheels to minimise tracking error in safe situations and maximise the vehicle-to-obstacle distance in emergencies. Thanks to the optimisation of the longitudinal tyre forces, the proposed controller can produce an extra yaw moment, increasing the vehicle's lateral agility to avoid obstacles while keeping the vehicle stable. The optimal forces are constrained in the tyre friction circle not to exceed the tyres and vehicle capabilities. In a high-fidelity simulation environment, we demonstrate the benefits of torque vectoring, showing that our proposed approach is capable of successfully avoiding obstacles and keeping the vehicle stable while driving a double-lane change manoeuvre, in comparison to baselines lacking torque vectoring or collision avoidance prioritisation.
- [43] arXiv:2405.10859 (cross-list from cs.RO) [pdf, ps, html, other]
-
Title: A Nonlinear Model Predictive Control for Automated Drifting with a Standard Passenger VehicleComments: Accepted at IEEE/ASME International Conference on Advanced Intelligent Mechatronics, Boston, USA, 2024Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
This paper presents a novel approach to automated drifting with a standard passenger vehicle, which involves a Nonlinear Model Predictive Control to stabilise and maintain the vehicle at high sideslip angle conditions. The proposed controller architecture is split into three components. The first part consists of the offline computed equilibrium maps, which provide the equilibrium points for each vehicle state given the desired sideslip angle and radius of the path. The second is the predictive controller minimising the errors between the equilibrium and actual vehicle states. The third is a path-following controller, which reduces the path error, altering the equilibrium curvature path. In a high-fidelity simulation environment, we validate the controller architecture capacity to stabilise the vehicle in automated drifting along a desired path, with a maximal lateral path deviation of 1 m. In the experiments with a standard passenger vehicle, we demonstrate that the proposed approach is capable of bringing and maintaining the vehicle at the desired 30 deg sideslip angle in both high and low friction conditions.
Cross submissions for Monday, 20 May 2024 (showing 12 of 12 entries )
- [44] arXiv:2208.14328 (replaced) [pdf, ps, html, other]
-
Title: 3D Near-Field Virtual MIMO-SAR Imaging Using FMCW Radar Systems at 77 GHzSubjects: Signal Processing (eess.SP)
In this paper, we present 3D high resolution radar imaging at millimeter-Wave (mmWave) frequencies by means of a combination of virtual Multiple Input Multiple Output (MIMO) Frequency Modulated Continuous Wave (FMCW) Radars and Synthetic Aperture Radar (SAR) which results in a compact, low-cost, and high-speed 3D mmWave imagery system with low complexity.
- [45] arXiv:2303.00351 (replaced) [pdf, ps, html, other]
-
Title: Leveraging SO(3)-steerable convolutions for pose-robust semantic segmentation in 3D medical dataComments: Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) this https URLJournal-ref: Machine.Learning.for.Biomedical.Imaging. 2 (2024)Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Convolutional neural networks (CNNs) allow for parameter sharing and translational equivariance by using convolutional kernels in their linear layers. By restricting these kernels to be SO(3)-steerable, CNNs can further improve parameter sharing. These rotationally-equivariant convolutional layers have several advantages over standard convolutional layers, including increased robustness to unseen poses, smaller network size, and improved sample efficiency. Despite this, most segmentation networks used in medical image analysis continue to rely on standard convolutional kernels. In this paper, we present a new family of segmentation networks that use equivariant voxel convolutions based on spherical harmonics. These networks are robust to data poses not seen during training, and do not require rotation-based data augmentation during training. In addition, we demonstrate improved segmentation performance in MRI brain tumor and healthy brain structure segmentation tasks, with enhanced robustness to reduced amounts of training data and improved parameter efficiency. Code to reproduce our results, and to implement the equivariant segmentation networks for other tasks is available at this http URL
- [46] arXiv:2303.12286 (replaced) [pdf, ps, html, other]
-
Title: Explainable Semantic Communication for Text TasksSubjects: Signal Processing (eess.SP)
Task-oriented semantic communication has gained increasing attention due to its ability to reduce the amount of transmitted data without sacrificing task performance. Although some prior efforts have been dedicated to developing semantic communications, the semantics in these works remains to be unexplainable. Challenges related to explainable semantic representation and knowledge-based semantic compression have yet to be explored. In this paper, we propose a triplet-based explainable semantic communication (TESC) scheme for representing text semantics efficiently. Specifically, we develop a semantic extraction method to convert text into triplets while using syntactic dependency analysis to enhance semantic completeness. Then, we design a semantic filtering method to further compress the duplicate and task-irrelevant triplets based on prior knowledge. The filtered triplets are encoded and transmitted to the receiver for completing intelligent tasks. Furthermore, we apply the propsed TESC scheme to two emblematic text tasks: sentiment analysis and question answering, in which the semantic codec is meticulously customized for each task. Experimental results demonstrate that 1) TESC scheme outperforms benchmarks in terms of Top-1 accuracy and transmission efficiency, and 2) TESC scheme enjoys about 150% performance gain compared to the traditional communication method.
- [47] arXiv:2303.17593 (replaced) [pdf, ps, html, other]
-
Title: Anatomically aware dual-hop learning for pulmonary embolism detection in CT pulmonary angiogramsFlorin Condrea, Saikiran Rapaka, Lucian Itu, Puneet Sharma, Jonathan Sperl, A Mohamed Ali, Marius LeordeanuComments: Accepted to Computers in Biology and Medicine journalSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Pulmonary Embolisms (PE) represent a leading cause of cardiovascular death. While medical imaging, through computed tomographic pulmonary angiography (CTPA), represents the gold standard for PE diagnosis, it is still susceptible to misdiagnosis or significant diagnosis delays, which may be fatal for critical cases. Despite the recently demonstrated power of deep learning to bring a significant boost in performance in a wide range of medical imaging tasks, there are still very few published researches on automatic pulmonary embolism detection. Herein we introduce a deep learning based approach, which efficiently combines computer vision and deep neural networks for pulmonary embolism detection in CTPA. Our method features novel improvements along three orthogonal axes: 1) automatic detection of anatomical structures; 2) anatomical aware pretraining, and 3) a dual-hop deep neural net for PE detection. We obtain state-of-the-art results on the publicly available multicenter large-scale RSNA dataset.
- [48] arXiv:2305.09441 (replaced) [pdf, ps, html, other]
-
Title: STLCCP: An Efficient Convex Optimization-based Framework for Signal Temporal Logic SpecificationsComments: 17 pagesSubjects: Systems and Control (eess.SY); Formal Languages and Automata Theory (cs.FL); Robotics (cs.RO)
Signal Temporal Logic (STL) is capable of expressing a broad range of temporal properties that controlled dynamical systems must satisfy. In the literature, both mixed-integer programming (MIP) and nonlinear programming (NLP) methods have been applied to solve optimal control problems with STL specifications. However, neither approach has succeeded in solving problems with complex long-horizon STL specifications within a realistic timeframe. This study proposes a new optimization framework, called \textit{STLCCP}, which explicitly incorporates several structures of STL to mitigate this issue. The core of our framework is a structure-aware decomposition of STL formulas, which converts the original program into a difference of convex (DC) programs. This program is then solved as a convex quadratic program sequentially, based on the convex-concave procedure (CCP). Our numerical experiments on several commonly used benchmarks demonstrate that this framework can effectively handle complex scenarios over long horizons, which have been challenging to address even using state-of-the-art optimization methods.
- [49] arXiv:2306.08722 (replaced) [pdf, ps, html, other]
-
Title: Learning to Stabilize High-dimensional Unknown Systems Using Lyapunov-guided ExplorationComments: 32 pages, 7 figures; Accepted by the 6th Annual Conference on Learning for Dynamics and Control (L4DC 2024)Subjects: Systems and Control (eess.SY)
Designing stabilizing controllers is a fundamental challenge in autonomous systems, particularly for high-dimensional, nonlinear systems that can hardly be accurately modeled with differential equations. The Lyapunov theory offers a solution for stabilizing control systems, still, current methods relying on Lyapunov functions require access to complete dynamics or samples of system executions throughout the entire state space. Consequently, they are impractical for high-dimensional systems. This paper introduces a novel framework, LYapunov-Guided Exploration (LYGE), for learning stabilizing controllers tailored to high-dimensional, unknown systems. LYGE employs Lyapunov theory to iteratively guide the search for samples during exploration while simultaneously learning the local system dynamics, control policy, and Lyapunov functions. We demonstrate its scalability on highly complex systems, including a high-fidelity F-16 jet model featuring a 16D state space and a 4D input space. Experiments indicate that, compared to prior works in reinforcement learning, imitation learning, and neural certificates, LYGE reduces the distance to the goal by 50% while requiring only 5% to 32% of the samples. Furthermore, we demonstrate that our algorithm can be extended to learn controllers guided by other certificate functions for unknown systems.
- [50] arXiv:2308.05042 (replaced) [pdf, ps, other]
-
Title: An Electrical Grid with Discrete Energy LevelsComments: 15 pages, 7 figuresSubjects: Systems and Control (eess.SY)
Minimizing both power fluctuations and energy waste in an electrical grid is a central challenge to energy policy. Any discrepancy between power production and loads may lead to inefficiencies and instability in the system. Right now, the electrical grid is an analog system that only retroactively reacts to power demands. The balancing act becomes even harder with the penetration of sustainable resources (e.g., wind turbines). Here, we consider the effect of random perturbations to the grid's steady states operation. A model is constructed and analyzed within the framework of a randomly perturbed Markovian chain. Instead of balancing continuous values for supply and demand, the model assumes that both the generators and the users adhere to discrete pattern energy levels which is supplemented by local, short-term energy storage units (STESU). Under reasonable assumptions, we show that this grid maintains stability (meaning, a constant difference between supply and demand) over long periods of time while subjected to randomly fluctuating energy conditions.
- [51] arXiv:2309.05075 (replaced) [pdf, ps, html, other]
-
Title: Secure Set-Based State Estimation for Linear Systems under Adversarial Attacks on SensorsSubjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Set-based state estimation plays a vital role in the safety verification of dynamical systems, which becomes significantly challenging when the system's sensors are susceptible to cyber-attacks. Existing methods often impose limitations on the attacker's capabilities, restricting the number of attacked sensors to be strictly less than half of the total number of sensors. This paper proposes a Secure Set-Based State Estimation (S3E) algorithm that addresses this limitation. The S3E algorithm guarantees that the true system state is contained within the estimated set, provided the initialization set encompasses the true initial state and the system is redundantly observable from the set of uncompromised sensors. The algorithm gives the estimated set as a collection of constrained zonotopes, which can be employed as robust certificates for verifying whether the system adheres to safety constraints. Furthermore, we demonstrate that the estimated set remains unaffected by attack signals of sufficiently large and also establish sufficient conditions for attack detection, identification, and filtering. This compels the attacker to inject only stealthy signals of small magnitude to evade detection, thus preserving the accuracy of the estimated set. When a few number of sensors (less than half) can be compromised, we prove that the estimated set remains bounded by a contracting set that converges to a ball whose radius is solely determined by the noise magnitude and is independent of the attack signals. To address the computational complexity of the algorithm, we offer several strategies for complexity-performance trade-offs. The efficacy of the proposed algorithm is illustrated through its application to a three-story building model.
- [52] arXiv:2311.03682 (replaced) [pdf, ps, html, other]
-
Title: Incentive Design for Eco-driving in Urban Transportation NetworksSubjects: Systems and Control (eess.SY); Social and Information Networks (cs.SI); Optimization and Control (math.OC)
Eco-driving emerges as a cost-effective and efficient strategy to mitigate greenhouse gas emissions in urban transportation networks. Acknowledging the persuasive influence of incentives in shaping driver behavior, this paper presents the `eco-planner,' a digital platform devised to promote eco-driving practices in urban transportation. At the outset of their trips, users provide the platform with their trip details and travel time preferences, enabling the eco-planner to formulate personalized eco-driving recommendations and corresponding incentives, while adhering to its budgetary constraints. Upon trip completion, incentives are transferred to users who comply with the recommendations and effectively reduce their emissions. By comparing our proposed incentive mechanism with a baseline scheme that offers uniform incentives to all users, we demonstrate that our approach achieves superior emission reductions and increased user compliance with a smaller budget.
- [53] arXiv:2311.13889 (replaced) [pdf, ps, html, other]
-
Title: SIMBa: System Identification Methods leveraging BackpropagationComments: First two authors contributed equally. Submitted to IEEE TCSTSubjects: Systems and Control (eess.SY)
This manuscript details and extends the SIMBa toolbox (System Identification Methods leveraging Backpropagation) presented in previous work, which uses well-established Machine Learning tools for discrete-time linear multi-step-ahead state-space System Identification (SI). SIMBa leverages linear-matrix-inequality-based free parametrizations of Schur matrices to guarantee the stability of the identified model by design. In this paper, backed up by novel free parametrizations of Schur matrices, we extend the toolbox to show how SIMBa can incorporate known sparsity patterns or true values of the state-space matrices to identify without jeopardizing stability.
We extensively investigate SIMBa's behavior when identifying diverse systems with various properties from both simulated and real-world data. Overall, we find it consistently outperforms traditional stable subspace identification methods, and sometimes significantly, especially when enforcing desired model properties. These results hint at the potential of SIMBa to pave the way for generic structured nonlinear SI. The toolbox is open-sourced on this https URL. - [54] arXiv:2401.07780 (replaced) [pdf, ps, other]
-
Title: Learning Soft Constrained MPC Value Functions: Efficient MPC Design and Implementation providing Stability and Safety GuaranteesSubjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Model Predictive Control (MPC) can be applied to safety-critical control problems, providing closed-loop safety and performance guarantees. Implementation of MPC controllers requires solving an optimization problem at every sampling instant, which is challenging to execute on embedded hardware. To address this challenge, we propose a framework that combines a tightened soft constrained MPC formulation with supervised learning to approximate the MPC value function. This combination enables us to obtain a corresponding optimal control law, which can be implemented efficiently on embedded platforms. The framework ensures stability and constraint satisfaction for various nonlinear systems. While the design effort is similar to that of nominal MPC, the proposed formulation provides input-to-state stability (ISS) with respect to the approximation error of the value function. Furthermore, we prove that the value function corresponding to the soft constrained MPC problem is Lipschitz continuous for Lipschitz continuous systems, even if the optimal control law may be discontinuous. This serves two purposes: First, it allows to relate approximation errors to a sufficiently large constraint tightening to obtain constraint satisfaction guarantees. Second, it paves the way for an efficient supervised learning procedure to obtain a continuous value function approximation. We demonstrate the effectiveness of the method using a nonlinear numerical example.
- [55] arXiv:2402.03145 (replaced) [pdf, ps, other]
-
Title: SafEDMD: A certified learning architecture tailored to data-driven control of nonlinear dynamical systemsSubjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Optimization and Control (math.OC)
The Koopman operator serves as the theoretical backbone for machine learning of dynamical control systems, where the operator is heuristically approximated by extended dynamic mode decomposition (EDMD). In this paper, we propose Stability- and certificate-oriented EDMD (SafEDMD): a novel EDMD-based learning architecture which comes along with rigorous certificates, resulting in a reliable surrogate model generated in a data-driven fashion. To ensure the trustworthiness of SafEDMD, we derive proportional error bounds, which vanish at the origin and are tailored to control tasks, leading to certified controller design based on semi-definite programming. We illustrate the developed method by means of several benchmark examples and highlight the advantages over state-of-the-art methods.
- [56] arXiv:2403.13680 (replaced) [pdf, ps, html, other]
-
Title: Step-Calibrated Diffusion for Biomedical Optical Image RestorationYiwei Lyu, Sung Jik Cha, Cheng Jiang, Asadur Chowdury, Xinhai Hou, Edward Harake, Akhil Kondepudi, Christian Freudiger, Honglak Lee, Todd C. HollonSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
High-quality, high-resolution medical imaging is essential for clinical care. Raman-based biomedical optical imaging uses non-ionizing infrared radiation to evaluate human tissues in real time and is used for early cancer detection, brain tumor diagnosis, and intraoperative tissue analysis. Unfortunately, optical imaging is vulnerable to image degradation due to laser scattering and absorption, which can result in diagnostic errors and misguided treatment. Restoration of optical images is a challenging computer vision task because the sources of image degradation are multi-factorial, stochastic, and tissue-dependent, preventing a straightforward method to obtain paired low-quality/high-quality data. Here, we present Restorative Step-Calibrated Diffusion (RSCD), an unpaired image restoration method that views the image restoration problem as completing the finishing steps of a diffusion-based image generation task. RSCD uses a step calibrator model to dynamically determine the severity of image degradation and the number of steps required to complete the reverse diffusion process for image restoration. RSCD outperforms other widely used unpaired image restoration methods on both image quality and perceptual evaluation metrics for restoring optical images. Medical imaging experts consistently prefer images restored using RSCD in blinded comparison experiments and report minimal to no hallucinations. Finally, we show that RSCD improves performance on downstream clinical imaging tasks, including automated brain tumor diagnosis and deep tissue imaging. Our code is available at this https URL.
- [57] arXiv:2403.15521 (replaced) [pdf, ps, html, other]
-
Title: Exploring new territory: Calibration-free decoding for c-VEP BCIComments: 6 pages, 2 figures, 9th Graz Brain-Computer Interface Conference 2024Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)
This study explores two zero-training methods aimed at enhancing the usability of brain-computer interfaces (BCIs) by eliminating the need for a calibration session. We introduce a novel method rooted in the event-related potential (ERP) domain, unsupervised mean maximization (UMM), to the fast code-modulated visual evoked potential (c-VEP) stimulus protocol. We compare UMM to the state-of-the-art c-VEP zero-training method that uses canonical correlation analysis (CCA). The comparison includes instantaneous classification and classification with cumulative learning from previously classified trials for both CCA and UMM. Our study shows the effectiveness of both methods in navigating the complexities of a c-VEP dataset, highlighting their differences and distinct strengths. This research not only provides insights into the practical implementation of calibration-free BCI methods but also paves the way for further exploration and refinement. Ultimately, the fusion of CCA and UMM holds promise for enhancing the accessibility and usability of BCI systems across various application domains and a multitude of stimulus protocols.
- [58] arXiv:2404.00082 (replaced) [pdf, ps, html, other]
-
Title: Data-Driven Room Acoustic Modeling Via Differentiable Feedback Delay Networks With Learnable Delay LinesComments: The article has been submitted to EURASIP Journal on Audio, Speech, and Music Processing on Jan 02, 2024 and is currently under reviewSubjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
Over the past few decades, extensive research has been devoted to the design of artificial reverberation algorithms aimed at emulating the room acoustics of physical environments. Despite significant advancements, automatic parameter tuning of delay-network models remains an open challenge. We introduce a novel method for finding the parameters of a Feedback Delay Network (FDN) such that its output renders target attributes of a measured room impulse response. The proposed approach involves the implementation of a differentiable FDN with trainable delay lines, which, for the first time, allows us to simultaneously learn each and every delay-network parameter via backpropagation. The iterative optimization process seeks to minimize a perceptually-motivated time-domain loss function incorporating differentiable terms accounting for energy decay and echo density. Through experimental validation, we show that the proposed method yields time-invariant frequency-independent FDNs capable of closely matching the desired acoustical characteristics, and outperforms existing methods based on genetic algorithms and analytical FDN design.
- [59] arXiv:2405.09551 (replaced) [pdf, ps, html, other]
-
Title: Towards Bi-Hemispheric Emotion Mapping through EEG: A Dual-Stream Neural Network ApproachDavid Freire-Obregón, Daniel Hernández-Sosa, Oliverio J. Santana, Javier Lorenzo-Navarro, Modesto Castrillón-SantanaComments: Second place award at the Brain Responses to Emotional Avatars Challenge held by the 18th IEEE International Conference on Automatic Face and Gesture Recognition(FG2024)Subjects: Signal Processing (eess.SP); Human-Computer Interaction (cs.HC)
Emotion classification through EEG signals plays a significant role in psychology, neuroscience, and human-computer interaction. This paper addresses the challenge of mapping human emotions using EEG data in the Mapping Human Emotions through EEG Signals FG24 competition. Subjects mimic the facial expressions of an avatar, displaying fear, joy, anger, sadness, disgust, and surprise in a VR setting. EEG data is captured using a multi-channel sensor system to discern brain activity patterns. We propose a novel two-stream neural network employing a Bi-Hemispheric approach for emotion inference, surpassing baseline methods and enhancing emotion recognition accuracy. Additionally, we conduct a temporal analysis revealing that specific signal intervals at the beginning and end of the emotion stimulus sequence contribute significantly to improve accuracy. Leveraging insights gained from this temporal analysis, our approach offers enhanced performance in capturing subtle variations in the states of emotions.
- [60] arXiv:2405.09554 (replaced) [pdf, ps, html, other]
-
Title: Underdetermined DOA Estimation of Off-Grid Sources Based on the Generalized Double Pareto PriorSubjects: Signal Processing (eess.SP); Information Theory (cs.IT)
In this letter, we investigate a new generalized double Pareto based on off-grid sparse Bayesian learning (GDPOGSBL) approach to improve the performance of direction of arrival (DOA) estimation in underdetermined scenarios. The method aims to enhance the sparsity of source signal by utilizing the generalized double Pareto (GDP) prior. Firstly, we employ a first-order linear Taylor expansion to model the real array manifold matrix, and Bayesian inference is utilized to calculate the off-grid error, which mitigates the grid dictionary mismatch problem in underdetermined scenarios. Secondly, an innovative grid refinement method is introduced, treating grid points as iterative parameters to minimize the modeling error between the source and grid points. The numerical simulation results verify the superiority of the proposed strategy, especially when dealing with a coarse grid and few snapshots.
- [61] arXiv:2206.05850 (replaced) [pdf, ps, html, other]
-
Title: Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Conservative Natural Policy Gradient Primal-Dual AlgorithmComments: The latest version fixed the error in the proof of Lemma 4 in AAAI2023Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
We consider the problem of constrained Markov decision process (CMDP) in continuous state-actions spaces where the goal is to maximize the expected cumulative reward subject to some constraints. We propose a novel Conservative Natural Policy Gradient Primal-Dual Algorithm (C-NPG-PD) to achieve zero constraint violation while achieving state of the art convergence results for the objective value function. For general policy parametrization, we prove convergence of value function to global optimal upto an approximation error due to restricted policy class. We even improve the sample complexity of existing constrained NPG-PD algorithm \cite{Ding2020} from $\mathcal{O}(1/\epsilon^6)$ to $\mathcal{O}(1/\epsilon^4)$. To the best of our knowledge, this is the first work to establish zero constraint violation with Natural policy gradient style algorithms for infinite horizon discounted CMDPs. We demonstrate the merits of proposed algorithm via experimental evaluations.
- [62] arXiv:2304.00598 (replaced) [pdf, ps, html, other]
-
Title: Stochastic Reachability of Uncontrolled Systems via Probability Measures: Approximation via Deep Neural NetworksComments: 8 pages, 4 figures, 1 table, Submitted to the Conference on Decision and Control 2024Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
This paper poses a theoretical characterization of the stochastic reachability problem in terms of probability measures, capturing the probability measure of the state of the system that satisfies the reachability specification for all probabilities over a finite horizon. We achieve this by constructing the level sets of the probability measure for all probability values and, since our approach is only for autonomous systems, we can determine the level sets via forward simulations of the system from a point in the state space at some time step in the finite horizon to estimate the reach probability. We devise a training procedure which exploits this forward simulation and employ it to design a deep neural network (DNN) to predict the reach probability provided the current state and time step. We validate the effectiveness of our approach through three examples.
- [63] arXiv:2309.00520 (replaced) [pdf, ps, html, other]
-
Title: Robust Online Learning over NetworksSubjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Multiagent Systems (cs.MA); Systems and Control (eess.SY)
The recent deployment of multi-agent networks has enabled the distributed solution of learning problems, where agents cooperate to train a global model without sharing their local, private data. This work specifically targets some prevalent challenges inherent to distributed learning: (i) online training, i.e., the local data change over time; (ii) asynchronous agent computations; (iii) unreliable and limited communications; and (iv) inexact local computations. To tackle these challenges, we apply the Distributed Operator Theoretical (DOT) version of the Alternating Direction Method of Multipliers (ADMM), which we call "DOT-ADMM". We prove that if the DOT-ADMM operator is metric subregular, then it converges with a linear rate for a large class of (not necessarily strongly) convex learning problems toward a bounded neighborhood of the optimal time-varying solution, and characterize how such neighborhood depends on (i)-(iv). We first derive an easy-to-verify condition for ensuring the metric subregularity of an operator, followed by tutorial examples on linear and logistic regression problems. We corroborate the theoretical analysis with numerical simulations comparing DOT-ADMM with other state-of-the-art algorithms, showing that only the proposed algorithm exhibits robustness to (i)-(iv).
- [64] arXiv:2401.07494 (replaced) [pdf, ps, html, other]
-
Title: Input Convex Lipschitz RNN: A Fast and Robust Approach for Engineering TasksSubjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); Systems and Control (eess.SY)
Computational efficiency and non-adversarial robustness are critical factors in process modeling and optimization for real-world engineering applications. Yet, conventional neural networks often fall short in addressing both simultaneously, or even separately. Drawing insights from natural physical systems and existing literature, it is known theoretically that an input convex architecture will enhance computational efficiency, while a Lipschitz-constrained architecture will bolster non-adversarial robustness. However, integrating both properties into one model is a nontrivial task, as enforcing one property may compromise the other one. Therefore, in this work, we develop a novel network architecture, termed Input Convex Lipschitz Recurrent Neural Networks, that inherits the strengths of both convexity and Lipschitz continuity. This model is explicitly designed for fast and robust optimization-based tasks, which outperforms existing recurrent units in terms of computational efficiency and non-adversarial robustness. Additionally, we have successfully implemented this model in various practical engineering applications, such as optimization of chemical processes and real-world solar irradiance prediction for Solar PV system planning at LHT Holdings in Singapore. Source code is available at this https URL.
- [65] arXiv:2401.17583 (replaced) [pdf, ps, html, other]
-
Title: Agile But Safe: Learning Collision-Free High-Speed Legged LocomotionComments: Published at RSS 2024, Project website: this https URLSubjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Systems and Control (eess.SY)
Legged robots navigating cluttered environments must be jointly agile for efficient task execution and safe to avoid collisions with obstacles or humans. Existing studies either develop conservative controllers (< 1.0 m/s) to ensure safety, or focus on agility without considering potentially fatal collisions. This paper introduces Agile But Safe (ABS), a learning-based control framework that enables agile and collision-free locomotion for quadrupedal robots. ABS involves an agile policy to execute agile motor skills amidst obstacles and a recovery policy to prevent failures, collaboratively achieving high-speed and collision-free navigation. The policy switch in ABS is governed by a learned control-theoretic reach-avoid value network, which also guides the recovery policy as an objective function, thereby safeguarding the robot in a closed loop. The training process involves the learning of the agile policy, the reach-avoid value network, the recovery policy, and an exteroception representation network, all in simulation. These trained modules can be directly deployed in the real world with onboard sensing and computation, leading to high-speed and collision-free navigation in confined indoor and outdoor spaces with both static and dynamic obstacles.
- [66] arXiv:2402.08502 (replaced) [pdf, ps, other]
-
Title: Provable Traffic Rule Compliance in Safe Reinforcement Learning on the Open SeaSubjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
For safe operation, autonomous vehicles have to obey traffic rules that are set forth in legal documents formulated in natural language. Temporal logic is a suitable concept to formalize such traffic rules. Still, temporal logic rules often result in constraints that are hard to solve using optimization-based motion planners. Reinforcement learning (RL) is a promising method to find motion plans for autonomous vehicles. However, vanilla RL algorithms are based on random exploration and do not automatically comply with traffic rules. Our approach accomplishes guaranteed rule-compliance by integrating temporal logic specifications into RL. Specifically, we consider the application of vessels on the open sea, which must adhere to the Convention on the International Regulations for Preventing Collisions at Sea (COLREGS). To efficiently synthesize rule-compliant actions, we combine predicates based on set-based prediction with a statechart representing our formalized rules and their priorities. Action masking then restricts the RL agent to this set of verified rule-compliant actions. In numerical evaluations on critical maritime traffic situations, our agent always complies with the formalized legal rules and never collides while achieving a high goal-reaching rate during training and deployment. In contrast, vanilla and traffic rule-informed RL agents frequently violate traffic rules and collide even after training.
- [67] arXiv:2403.01009 (replaced) [pdf, ps, html, other]
-
Title: Design and Performance Evaluation of SEANet, a Software-defined Networking Platform for the Internet of Underwater ThingsComments: 14 pages, 18 figuresSubjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
This paper presents the design and performance evaluation of the SEANet platform, a software-defined acoustic modem designed for enhancing underwater networking and Internet of Underwater Things (IoUT) applications. Addressing the limitations of traditional acoustic modems, which suffer from low data rates and rigid architectures, SEANet introduces a versatile, adaptive framework capable of reconfiguring all layers of the protocol stack in real-time to accommodate diverse marine applications. The platform integrates high-performance, wideband data converters with modular hardware and software components, enabling real-time adaptation to changing environmental and operational conditions. Experimental evaluations conducted in oceanic settings demonstrate the SEANet capability to significantly exceed the performance of existing commercial underwater modems, supporting data rates up to 150 kbit/s and effectively doubling the performance metrics of conventional systems. Our robust testing also highlights the SEANet proficiency in channel estimation, Orthogonal Frequency-Division Multiplexing (OFDM) link establishment, and interoperability through the JANUS communication standard. Our results underscore the SEANet potential to transform underwater communication technologies, providing a scalable and efficient solution that supports high data rate applications and fosters the expansion of IoUT deployments.
- [68] arXiv:2403.15523 (replaced) [pdf, ps, html, other]
-
Title: Towards auditory attention decoding with noise-tagging: A pilot studyComments: 6 pages, 2 figures, 9th Graz Brain-Computer Interface Conference 2024Subjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Auditory attention decoding (AAD) aims to extract from brain activity the attended speaker amidst candidate speakers, offering promising applications for neuro-steered hearing devices and brain-computer interfacing. This pilot study makes a first step towards AAD using the noise-tagging stimulus protocol, which evokes reliable code-modulated evoked potentials, but is minimally explored in the auditory modality. Participants were sequentially presented with two Dutch speech stimuli that were amplitude-modulated with a unique binary pseudo-random noise-code, effectively tagging these with additional decodable information. We compared the decoding of unmodulated audio against audio modulated with various modulation depths, and a conventional AAD method against a standard method to decode noise-codes. Our pilot study revealed higher performances for the conventional method with 70 to 100 percent modulation depths compared to unmodulated audio. The noise-code decoder did not further improve these results. These fundamental insights highlight the potential of integrating noise-codes in speech to enhance auditory speaker detection when multiple speakers are presented simultaneously.
- [69] arXiv:2403.16165 (replaced) [pdf, ps, html, other]
-
Title: Input-to-State Stability of Newton Methods for Generalized Equations in Nonlinear OptimizationComments: Submitted to 2024 Conference on Decision and ControlSubjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
We show that Newton methods for generalized equations are input-to-state stable with respect to disturbances such as due to inexact computations. We then use this result to obtain convergence and robustness of a multistep Newton-type method for multivariate generalized equations. We demonstrate the usefulness of the results with other applications to nonlinear optimization. In particular, we provide a new proof for (robust) local convergence of the augmented Lagrangian method.
- [70] arXiv:2405.05170 (replaced) [pdf, ps, html, other]
-
Title: Picking watermarks from noise (PWFN): an improved robust watermarking model against intensive distortionsSubjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Digital watermarking is the process of embedding secret information by altering images in an undetectable way to the human eye. To increase the robustness of the model, many deep learning-based watermarking methods use the encoder-noise-decoder architecture by adding different noises to the noise layer. The decoder then extracts the watermarked information from the distorted image. However, this method can only resist weak noise attacks. To improve the robustness of the decoder against stronger noise, this paper proposes to introduce a denoise module between the noise layer and the decoder. The module aims to reduce noise and recover some of the information lost caused by distortion. Additionally, the paper introduces the SE module to fuse the watermarking information pixel-wise and channel dimensions-wise, improving the encoder's efficiency. Experimental results show that our proposed method is comparable to existing models and outperforms state-of-the-art under different noise intensities. In addition, ablation experiments show the superiority of our proposed module.
- [71] arXiv:2405.07291 (replaced) [pdf, ps, html, other]
-
Title: Robust Beamforming with Gradient-based Liquid Neural NetworkXinquan Wang, Fenghao Zhu, Chongwen Huang, Ahmed Alhammadi, Faouzi Bader, Zhaoyang Zhang, Chau Yuen, Merouane DebbahSubjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Millimeter-wave (mmWave) multiple-input multiple-output (MIMO) communication with the advanced beamforming technologies is a key enabler to meet the growing demands of future mobile communication. However, the dynamic nature of cellular channels in large-scale urban mmWave MIMO communication scenarios brings substantial challenges, particularly in terms of complexity and robustness. To address these issues, we propose a robust gradient-based liquid neural network (GLNN) framework that utilizes ordinary differential equation-based liquid neurons to solve the beamforming problem. Specifically, our proposed GLNN framework takes gradients of the optimization objective function as inputs to extract the high-order channel feature information, and then introduces a residual connection to mitigate the training burden. Furthermore, we use the manifold learning technique to compress the search space of the beamforming problem. These designs enable the GLNN to effectively maintain low complexity while ensuring strong robustness to noisy and highly dynamic channels. Extensive simulation results demonstrate that the GLNN can achieve 4.15% higher spectral efficiency than that of typical iterative algorithms, and reduce the time consumption to only 1.61% that of conventional methods.
- [72] arXiv:2405.09062 (replaced) [pdf, ps, html, other]
-
Title: Naturalistic Music Decoding from EEG Data via Latent Diffusion ModelsEmilian Postolache, Natalia Polouliakh, Hiroaki Kitano, Akima Connelly, Emanuele Rodolà, Taketo AkamaSubjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
In this article, we explore the potential of using latent diffusion models, a family of powerful generative models, for the task of reconstructing naturalistic music from electroencephalogram (EEG) recordings. Unlike simpler music with limited timbres, such as MIDI-generated tunes or monophonic pieces, the focus here is on intricate music featuring a diverse array of instruments, voices, and effects, rich in harmonics and timbre. This study represents an initial foray into achieving general music reconstruction of high-quality using non-invasive EEG data, employing an end-to-end training approach directly on raw data without the need for manual pre-processing and channel selection. We train our models on the public NMED-T dataset and perform quantitative evaluation proposing neural embedding-based metrics. We additionally perform song classification based on the generated tracks. Our work contributes to the ongoing research in neural decoding and brain-computer interfaces, offering insights into the feasibility of using EEG data for complex auditory information reconstruction.
- [73] arXiv:2405.09814 (replaced) [pdf, ps, html, other]
-
Title: Semantic Gesticulator: Semantics-Aware Co-Speech Gesture SynthesisComments: SIGGRAPH 2024 (Journal Track); Project page: this https URLSubjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
In this work, we present Semantic Gesticulator, a novel framework designed to synthesize realistic gestures accompanying speech with strong semantic correspondence. Semantically meaningful gestures are crucial for effective non-verbal communication, but such gestures often fall within the long tail of the distribution of natural human motion. The sparsity of these movements makes it challenging for deep learning-based systems, trained on moderately sized datasets, to capture the relationship between the movements and the corresponding speech semantics. To address this challenge, we develop a generative retrieval framework based on a large language model. This framework efficiently retrieves suitable semantic gesture candidates from a motion library in response to the input speech. To construct this motion library, we summarize a comprehensive list of commonly used semantic gestures based on findings in linguistics, and we collect a high-quality motion dataset encompassing both body and hand movements. We also design a novel GPT-based model with strong generalization capabilities to audio, capable of generating high-quality gestures that match the rhythm of speech. Furthermore, we propose a semantic alignment mechanism to efficiently align the retrieved semantic gestures with the GPT's output, ensuring the naturalness of the final animation. Our system demonstrates robustness in generating gestures that are rhythmically coherent and semantically explicit, as evidenced by a comprehensive collection of examples. User studies confirm the quality and human-likeness of our results, and show that our system outperforms state-of-the-art systems in terms of semantic appropriateness by a clear margin.