# Calendar

Apr
30
Thu
Thesis Proposal: Ke Li
Apr 30 @ 3:00 pm

This presentation is happening remotely. Click this link as early as 15 minutes before the scheduled start time of the presentation to watch in a Zoom meeting.

Title: Context-aware Language Modeling and Adaptation for Automatic Speech Recognition

Abstract: Language models (LMs) are an important component in automatic speech recognition (ASR) and are usually trained on transcriptions. Language use is strongly influenced by factors such as domain, topic, style, and user preference. However, transcriptions from speech corpora are usually too limited to fully capture contextual variability in test domains, and some of the information is only available at test time. A change of application domain often induces a mismatch in the lexicon and in the distribution of words. Even within the same domain, topics can shift and user preference can vary. These observations indicate that LMs trained purely on transcriptions that may not be representative of the test domains are far from ideal and may severely affect ASR performance. To mitigate these mismatches, adapting LMs to contextual variables is desirable.

The goal of this work is to explore general and lightweight approaches for neural LM adaptation and context-aware modeling for ASR. In the adaptation direction, two approaches are investigated. The first is based on cache models. Although neural LMs outperform n-gram LMs at modeling longer context, previous studies show that some of them, for example LSTMs, still capture only a relatively short span of context. Cache models that capture relatively long-term self-trigger information have proved useful for n-gram LM adaptation. This work extends a fast marginal adaptation framework to neural LMs and adapts LSTM LMs in an unsupervised way. Specifically, pre-trained LMs are adapted to cache models estimated from decoded hypotheses. This method is lightweight as it does not require retraining. The second approach is interpolation-based. Linear interpolation is a simple and robust adaptation approach, but it is suboptimal because the weights are globally optimized and not aware of local context. To tackle this issue, a mixer model that combines pre-trained neural LMs with dynamic weighting is proposed. Experimental results show that it outperforms fine-tuning and linear interpolation in most scenarios. As for context-aware modeling, this work proposes a simple and effective way to implicitly integrate cache models into neural LMs. It provides a simple alternative to the pointer sentinel mixture model. Experiments show that the proposed method is more effective on relatively rare words and outperforms several baselines. Future work will focus on analyzing the importance and the effect of various contextual factors on ASR and on developing approaches for representing and modeling these factors to improve ASR performance.
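
To make the cache and interpolation ideas above concrete, here is a minimal sketch of the linear-interpolation baseline, assuming a unigram cache built from recently decoded words. It is not the fast marginal adaptation method or the mixer model from the talk; the function names, the toy probabilities, and the weight of 0.25 are all hypothetical.

```python
from collections import Counter


def cache_unigram(history):
    """Estimate a unigram cache model from recently decoded words."""
    counts = Counter(history)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def interpolate(p_base, p_cache, weight):
    """Linearly interpolate two word distributions with one global weight."""
    vocab = set(p_base) | set(p_cache)
    return {
        word: (1 - weight) * p_base.get(word, 0.0) + weight * p_cache.get(word, 0.0)
        for word in vocab
    }


# Toy base LM distribution (hypothetical numbers, not from any real model).
p_base = {"the": 0.5, "model": 0.3, "zoom": 0.2}

# Words from earlier decoded hypotheses; "zoom" has been frequent recently.
history = ["zoom", "zoom", "the", "zoom"]

p_cache = cache_unigram(history)
p_adapted = interpolate(p_base, p_cache, weight=0.25)
```

Because the weight is a single global constant, the adapted model boosts "zoom" everywhere, regardless of local context. That is the limitation the abstract attributes to linear interpolation and that dynamic weighting is meant to address.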
May
14
Thu
Thesis Proposal: Arun Nair
May 14 @ 3:00 pm

This presentation will be taking place remotely. Follow this link to enter the Zoom meeting where it will be hosted. Please do not enter the meeting earlier than 15 minutes before the talk is scheduled to begin.

Title: Machine Learning for Collaborative Signal Processing in Beamforming and Compressed Sensing

Abstract: Life today has become inextricably linked with the many sensors working in concert in our environment, from the webcam and microphone in our laptops to the arrays of wireless transmitters and receivers in cellphone towers. Collaborative signal processing methods tackle the challenge of efficiently processing data from multiple sources. Recently, machine learning methods have become very popular tools for collaborative signal processing, largely due to the success of deep learning. The large volume of data created by multiple sensors pairs well with the data-hungry nature of modern machine learning models, holding great promise for efficient solutions.

This proposal extends ideas from machine learning to problems in collaborative signal processing. Specifically, this work will focus on two collaborative signal processing methods – beamforming and compressed sensing. Beamforming is commonly employed in sensor arrays for directional signal transmission and reception by combining the signals received in the array elements to enhance a signal of interest. On the other hand, compressed sensing is a widely applicable mathematical framework that guarantees exact signal recovery even at sub-Nyquist sampling rates if suitable sparsity and incoherence assumptions are satisfied. Compressed sensing accomplishes this via convex or greedy optimization to fuse the information in a small number of signal measurements.
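
As an illustration of how beamforming combines array signals, the following is a minimal delay-and-sum sketch in plain Python. It assumes integer sample delays and a noise-free tone; the array geometry, the delays, and the function name are hypothetical and are not taken from the proposal.

```python
import math


def delay_and_sum(channels, delays):
    """Align each channel by its integer sample delay, then average them."""
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[t + d] for ch, d in zip(channels, delays)) / len(channels)
        for t in range(n)
    ]


def energy(signal):
    """Sum of squared samples."""
    return sum(s * s for s in signal)


# Hypothetical 3-element array: the same tone arrives 0, 2 and 4 samples late.
source = [math.sin(2 * math.pi * 0.05 * t) for t in range(200)]
delays = [0, 2, 4]
channels = [[0.0] * d + source[: len(source) - d] for d in delays]

steered = delay_and_sum(channels, delays)       # delays match the source direction
unsteered = delay_and_sum(channels, [0, 0, 0])  # no steering applied
```

With the correct delays the channels add coherently and the source is recovered, while the unsteered sum partially cancels. This is the directional gain that beamforming relies on.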

The first part of this work was motivated by the common experience of attempting to capture a video on a mobile device but having the target of interest contaminated by the surrounding environment (e.g., construction sounds from outside the camera’s field of view). Fusing visual and auditory information, we propose a novel audio-visual zooming algorithm that directionally filters the received audio data using beamforming to focus only on audio originating from within the field of view of the camera. Second, we improve the quality of ultrasound image formation by introducing a novel beamforming framework that leverages the benefits of deep learning. Ultrasound images currently suffer from severe speckle and clutter degradations which cause poor image quality and reduce diagnostic utility. We propose to design a deep neural network to learn end-to-end transformations that extract information directly from raw received US channel data. Finally, we improve upon optimization-based compressed sensing recovery by replacing the slow iterative optimization algorithms with far faster convolutional neural networks.

Jun
5
Fri
Thesis Proposal: Uejima Takeshi
Jun 5 @ 10:00 am

This presentation will be taking place remotely. Follow this link to enter the Zoom meeting where it will be hosted. Do not enter the meeting before 9:45 AM EDT.

Title: A Unified Visual Saliency Model for Neuromorphic Implementation

Abstract: Human eyes capture and send large amounts of data from the environment to the brain. However, the visual cortex cannot process all the information in detail at once. To deal with the overwhelming quantity of the input, the early stages of visual processing select a small subset of the input for detailed processing. Because only the fovea has high-resolution imaging, the observer needs to move the eyes for thorough scene inspection. Therefore, eye movements can be thought of as one of the observable outputs of the early visual process in the brain, which represents what is interesting and important for the observer. Modeling how the brain selects important information, and where humans fixate, is an intriguing research topic in neuroscience and computer vision and is generally referred to as visual saliency modeling. Beyond its scientific importance, a better understanding of this process will improve the effectiveness of graphic arts, advertisements, traffic signs, camouflage, and many other applications.

To date, there have been some studies on developing bio-inspired saliency models. Russell et al. proposed a biologically plausible visual saliency model called the proto-object based saliency model. It has been successful at predicting human fixations; however, it works exclusively on low-level features: intensity, color, and orientation. The Russell et al. model has since been extended with a motion channel as well as a disparity (depth) channel. Texture, however, has neither been well studied in the visual saliency field nor been incorporated into a proto-object based model, and no attempt has been made to combine all of these features in one model. Here, we propose an augmented version of the model that incorporates texture, motion, and disparity features.

In addition to designing the unified proto-object based model, we investigate the rationality of the visual process in biological systems from the viewpoint of efficiency in representing natural stimuli. This study will advance visual saliency modeling and improve the accuracy of human fixation prediction. In addition, it will deepen our knowledge of how the visual cortex deals with complex environments.

Committee Members

Ralph Etienne-Cummings, Department of Electrical and Computer Engineering

Andreas Andreou, Department of Electrical and Computer Engineering

Philippe Pouliquen, Department of Electrical and Computer Engineering

Jun
18
Thu
Thesis Proposal: Soohyun Lee
Jun 18 @ 3:00 pm

This presentation will be taking place remotely. Follow this link to enter the Zoom meeting where it will be hosted. Do not enter the meeting before 2:45 PM EDT.

Title: Optical coherence tomography (OCT) – guided ophthalmic therapy

Abstract: Optical coherence tomography (OCT), which provides cross-sectional images noninvasively at micrometer-scale resolution in real time, has been widely applied to the diagnosis and treatment guidance of ocular diseases.

Selective retina therapy (SRT) is an effective laser treatment method for retinal diseases associated with a degradation of the retinal pigment epithelium (RPE). The SRT selectively targets the RPE, so it reduces negative side effects and facilitates healing of the induced retinal lesions. However, the selection of proper laser energy is challenging because of ophthalmoscopically invisible lesions in the RPE and variance in melanin concentration between patients and even between regions within an eye. In the first part of this work, we propose and demonstrate SRT monitoring and temperature estimation based on speckle variance OCT (svOCT) for dosimetry control. SvOCT quantifies speckle pattern variation caused by moving particles or structural changes in biological tissues. We find that the svOCT peak values have a reliable correlation with the degree of retinal lesion formation. The temperature at the neural retina and RPE is estimated from the svOCT peak values using numerically calculated temperature, which is consistent with the observed lesion creation.
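
The speckle variance computation can be sketched as a per-pixel variance of intensity across repeated frames of the same location, which is the commonly used definition. The number of frames, the toy intensities, and the function name below are assumptions for illustration; the abstract does not state the exact formulation used in this work.

```python
def speckle_variance(frames):
    """Per-pixel variance of intensity across N frames of the same location."""
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    sv = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            values = [frame[r][c] for frame in frames]
            mean = sum(values) / n
            sv[r][c] = sum((v - mean) ** 2 for v in values) / n
    return sv


# Three toy 2x2 frames (hypothetical intensities): the left column is static,
# the right column changes from frame to frame.
frames = [
    [[1.0, 5.0], [2.0, 2.0]],
    [[1.0, 7.0], [2.0, 4.0]],
    [[1.0, 9.0], [2.0, 6.0]],
]
sv_map = speckle_variance(frames)
```

Static pixels give zero variance, while pixels whose speckle pattern changes between frames give a positive value. This is why structural change during lesion formation shows up as a peak.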

In the second part, we propose to develop a hand-held subretinal injector actively guided by a common-path OCT (CP-OCT) distal sensor. Subretinal injection delivers drugs or stem cells into the space between the RPE and photoreceptor layers, so it can directly affect resident cells and tissues in the subretinal space. The technique requires high stability and dexterity from the surgeon because of the fine anatomy of the retina, and it is challenging because of physiological motions of the surgeon such as hand tremor. We mainly focus on two aspects of the CP-OCT guided subretinal injector: (i) a high-performance fiber probe based on a high-index epoxy lensed fiber to enhance the CP-OCT retinal image quality in a wet environment; (ii) automated layer identification and tracking: each retinal layer boundary, as well as the retinal surface, is tracked using convolutional neural network (CNN)-based segmentation for accurate localization of a needle. The CNN performing retinal layer segmentation is integrated into the CP-OCT system for targeted layer distance sensing, and the CP-OCT distal sensor guided system is tested on ex vivo bovine retina.

Sep
3
Thu
Thesis Proposal: Jonathan Jones
Sep 3 @ 3:00 pm

Note: This is a virtual presentation. Here is the link for where the presentation will be taking place.

Title: Fine-grained activity recognition for assembly videos

Abstract: When a collaborative robot is working with a human partner to build a piece of furniture or an industrial part, the robot must be able to perceive which parts are connected and where, and it must be able to reason about how these connections can change as the result of its partner’s actions. This need can also arise in industrial process monitoring and manufacturing applications, where an automated system verifies a product as it progresses through the assembly line. These assembly processes require systems that can reason geometrically and temporally, relating the structure of an assembly to the manipulation actions that created it.

Grounded in a behavioral study of spatial cognition, this proposal combines methods for physical and temporal reasoning to enable the analysis and automated perception of assembly actions. We develop a temporal model that relates manipulation actions to the structures they produce and describe its use in enabling fine-grained behavioral analyses. Then, we apply our sequence model to recognize assembly actions in a variety of assembly scenarios. Finally, we describe a method for part-based reasoning that makes our approach robust to occluded and previously unseen assemblies.

Committee Members

Sanjeev Khudanpur, Department of Electrical and Computer Engineering

Greg Hager, Department of Computer Science

Vishal Patel, Department of Electrical and Computer Engineering

Sep
10
Thu
Thesis Proposal: Vishwanath Sindagi
Sep 10 @ 3:00 pm

Note: This is a virtual presentation. Here is the link for where the presentation will be taking place.

Title: Single Image-based Crowd Counting Using Deep Learning Techniques

Abstract: With the ubiquitous use of surveillance cameras and advances in computer vision, crowd scene analysis has gained a lot of interest in recent years. In this work, we focus on the task of estimating crowd counts and high-quality density maps, which has wide applications in video surveillance, traffic monitoring, public safety, urban planning, scene understanding, and flow monitoring. The methods developed for crowd counting can also be extended to counting tasks in other fields such as cell microscopy, vehicle counting, and environmental surveys. The task of crowd counting and density estimation has seen significant progress in recent years. However, due to complexities such as occlusions, high clutter, non-uniform distribution of people, non-uniform illumination, and intra-scene and inter-scene variations in appearance, scale, and perspective, the resulting accuracies are far from optimal. Furthermore, existing methods tend to perform poorly on datasets that differ from the dataset used for training the models.

In this work, we specifically address two of the major issues plaguing the crowd counting community: (i) scale variations and (ii) poor cross-dataset performance. In order to address the problem of scale variations, we analyze existing scale-aware counting models and identify that their poor performance is due to the lack of contextual information and the poor quality of predicted density maps. We propose to overcome these issues by incorporating multiple context cues into the learning process, and additionally improving the quality of the predicted density maps using adversarial training. Finally, we explore the use of contextual information as weak image-level labels to improve cross-dataset performance.
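
For readers unfamiliar with density maps, the sketch below shows one common way a ground-truth map is built from head annotations, assuming a fixed-width Gaussian per person normalized to unit mass. The grid size, `sigma`, and the head coordinates are hypothetical, and this is not the proposed counting network.

```python
import math


def density_map(points, height, width, sigma=1.5):
    """Place one unit-mass Gaussian per annotated head on a height x width grid."""
    dmap = [[0.0] * width for _ in range(height)]
    for py, px in points:
        kernel = [
            [
                math.exp(-((y - py) ** 2 + (x - px) ** 2) / (2 * sigma ** 2))
                for x in range(width)
            ]
            for y in range(height)
        ]
        total = sum(map(sum, kernel))  # normalize so each person contributes 1
        for y in range(height):
            for x in range(width):
                dmap[y][x] += kernel[y][x] / total
    return dmap


# Three hypothetical head annotations given as (row, column).
heads = [(2, 3), (5, 5), (7, 1)]
dmap = density_map(heads, height=10, width=10)
count = sum(map(sum, dmap))
```

Summing the map recovers the number of annotated people, so a network that regresses the density map yields a count and a spatial distribution at the same time.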

Committee Members

Rama Chellappa, Department of Electrical and Computer Engineering

Carlos Castillo, Department of Electrical and Computer Engineering

Vishal Patel, Department of Electrical and Computer Engineering

Oct
15
Thu
Thesis Proposal: Niharika Shimona D’Souza
Oct 15 @ 3:00 pm

Note: This is a virtual presentation. Here is the link for where the presentation will be taking place.

Title: Mapping Brain Connectivity to Behavior: from Network Optimization Frameworks to Deep-Generative Hybrid Models

Abstract: Autism Spectrum Disorder (ASD) is a complex neurodevelopmental disorder characterized by multiple impairments and levels of disability that vary widely across the ASD spectrum. Currently, the most common methods of quantifying symptom severity are almost solely based on a trained clinician’s evaluation. Recently, neuroimaging techniques such as resting state functional MRI (rs-fMRI) and Diffusion Tensor Imaging (DTI) have been gaining popularity for studying aberrant brain connectivity in ASD. My thesis aims at linking the symptomatic characterization of ASD with the functional and structural organization of a typical patient’s brain as given by rs-fMRI and DTI respectively. My talk is organised into two main parts, as follows:

Network Optimization Models for rs-fMRI connectomics and clinical severity:
Analysis of a multi-subject rs-fMRI imaging study often begins at the group level, for example, by estimating group-averaged functional connectivity across all subjects. The shortcomings of data-driven machine learning techniques such as PCA, k-PCA, and SVMs are largely attributed to their failure to capture both the group structure and the individual patient variability, which is why they fail to generalize to unseen patients. To overcome these limitations, we developed a matrix factorization technique that represents the rs-fMRI correlation matrices by decomposing them into a sparse set of representative subnetworks modeled by rank-one outer products. The subnetworks are combined using patient-specific non-negative coefficients. The network representations are fixed across the entire group; however, the strength of the subnetworks can vary across individuals. We significantly extend prior work in the area by using these very network coefficients to simultaneously predict behavioral measures via techniques ranging from simple linear regression models to parametric kernel methods to Artificial Neural Networks (ANNs). The main novelty of the algorithms lies in jointly optimizing the regression/ANN weights in conjunction with the rs-fMRI matrix factors. By leveraging techniques from convex and non-convex optimization, these frameworks significantly outperform several state-of-the-art machine learning, graph theoretic, and deep learning baselines at generalization to unseen patients.
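
The factorization described above can be written out roughly as follows. The symbols are illustrative assumptions rather than the notation of the thesis: $\Gamma_n$ is the rs-fMRI correlation matrix of patient $n$, $b_k$ is the $k$-th shared subnetwork, $c_{nk} \ge 0$ is the patient-specific coefficient, $y_n$ is a behavioral score, and $f(\cdot\,;\theta)$ is the regression model or ANN. The $\ell_1$ sparsity penalty, the squared prediction loss, and the trade-off weights $\lambda$ and $\gamma$ are also assumed choices.

$$
\Gamma_n \approx \sum_{k=1}^{K} c_{nk}\, b_k b_k^{\top}, \qquad c_{nk} \ge 0
$$

$$
\min_{\{b_k\},\,\{c_n\},\,\theta}\; \sum_{n} \Big\| \Gamma_n - \sum_{k=1}^{K} c_{nk}\, b_k b_k^{\top} \Big\|_F^2 \;+\; \lambda \sum_{k=1}^{K} \| b_k \|_1 \;+\; \gamma \sum_{n} \big( y_n - f(c_n;\theta) \big)^2
$$

The first term ties the shared subnetworks to each patient's connectivity, and the last term ties the same coefficients $c_n$ to behavior, which is what joint optimization refers to in the paragraph above.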

Deep-Generative Hybrid Frameworks for Integrating Multimodal and Dynamic Connectivity with Behavior:
There is now growing evidence that functional connectivity between regions is a dynamic process evolving over a static anatomical connectivity profile, and that modeling this evolution is crucial to understanding ASD. Thus, we propose an integrated deep-generative framework that jointly models complementary information from resting-state functional MRI (rs-fMRI) connectivity and diffusion tensor imaging (DTI) tractography to extract predictive biomarkers of a disease. The generative part of our framework is a structurally-regularized Dynamic Dictionary Learning (sr-DDL) model that decomposes the dynamic rs-fMRI correlation matrices into a collection of shared basis networks and time-varying patient-specific loadings. This matrix factorization is guided by the DTI tractography matrices to learn anatomically informed connectivity profiles. The deep part of our framework is an LSTM-ANN block, which models the temporal evolution of the patient sr-DDL loadings to predict multidimensional clinical severity. Once again, our coupled optimization procedure collectively estimates the basis networks, the patient-specific dynamic loadings, and the neural network weights. Our hybrid model outperforms state-of-the-art baselines in a cross-validated setting and extracts interpretable multimodal neural signatures of brain dysfunction in ASD.

In recent years, graph neural networks have shown great promise in brain connectivity research due to their ability to underscore subtle interactions between communicating brain regions while exploiting the underlying hierarchy of brain organization. To conclude, I will present some ongoing explorations based on end-to-end graph convolutional networks that directly model the evolution of the rs-fMRI signals/connectivity patterns over the underlying anatomical DTI graphs.

Committee Members

Archana Venkataraman, Department of Electrical and Computer Engineering

Rene Vidal, Department of Biomedical Engineering

Carey E. Priebe, Department of Applied Mathematics & Statistics

Stewart Mostofsky, Director of Center for Neurodevelopmental and Imaging Research, Kennedy Krieger Institute

Kilian Pohl, Program Director, Image Analysis, Center for Health Sciences, and Biomedical Computing, SRI International; Associate Professor of Psychiatry and Behavioral Sciences, Stanford University

Nov
5
Thu
Thesis Proposal: Jeff Craley
Nov 5 @ 3:00 pm

Note: This is a virtual presentation. Here is the link for where the presentation will be taking place.

Title: Localizing Seizure Foci with Deep Neural Networks and Graphical Models

Abstract: Worldwide estimates of the prevalence of epilepsy range from 1-3% of the total population, making it one of the most common neurological disorders. With its wide prevalence and dramatic effects on quality of life, epilepsy represents a large and ongoing public health challenge. Critical to the treatment of focal epilepsy is the localization of the seizure onset zone. The seizure onset zone is defined as the region of the cortex responsible for the generation of seizures. In the clinic, scalp electroencephalography (EEG) recording is the first modality used to localize the seizure onset zone.

My work focuses on developing machine learning techniques to localize this zone from these recordings. Using Bayesian techniques, I will present graphical models designed to capture the observed spreading of seizures in clinical EEG recordings. These models directly encode clinically observed seizure spreading phenomena to capture seizure onset and evolution. Using neural networks, the raw EEG signal is evaluated for seizure activity. In this talk, I will propose extensions to these techniques employing semi-supervised learning and architectural improvements for training sophisticated neural networks designed to analyze scalp EEG signals. In addition, I will propose modeling improvements to current graphical models for evaluating the confidence of localization results.

Committee Members

Archana Venkataraman (Department of Electrical and Computer Engineering)

Sri Sarma (Department of Biomedical Engineering)

Rene Vidal (Department of Biomedical Engineering)

Richard Leahy (Department of Electrical Engineering Systems – University of Southern California)

Thesis Proposal: Yan Jiang
Nov 5 @ 3:00 pm

Note: This is a virtual presentation. Here is the link for where the presentation will be taking place.

Title: Leveraging Inverter-Based Frequency Control in Low-Inertia Power Systems

Abstract: The shift from conventional synchronous generation to renewable converter-interfaced sources has led to a noticeable degradation of power system frequency dynamics. Fortunately, recent technology advancements in power electronics and electric storage facilitate the potential to enable higher renewable energy penetration by means of inverter-interfaced storage units. With proper control approaches, fast inverter dynamics can ensure the rapid response of storage units to mitigate degradation. A straightforward choice is to emulate the damping effect and/or inertial response of synchronous generators through droop control or virtual inertia, yet these do not necessarily fully exploit the benefits of inverter-interfaced storage units. For instance, droop control sacrifices steady-state effort share to improve dynamic performance, while virtual inertia amplifies frequency measurement noise. This work thus seeks to challenge this naive choice of mimicking synchronous generator characteristics and instead advocates for a principled control design perspective. To achieve this goal, we build our analysis upon quantifying power network dynamic performance using $\mathcal L_2$ and $\mathcal L_\infty$ norms so as to perform a systematic study evaluating the effect of different control approaches on both frequency response metrics and storage economic metrics.
The main contributions of this project will be as follows: (i) We will propose a novel dynamic droop control approach for grid-following inverters that can be tuned to achieve low noise sensitivity, fast synchronization, and Nadir elimination without affecting the steady-state performance; (ii) We will propose a new frequency shaping control approach that allows a trade-off between the rate of change of frequency (RoCoF) and storage control effort; (iii) We will further extend the proposed solutions to operate in a grid-forming setting that is suitable for a non-stiff power grid where the amplitude and frequency of the grid voltage are not well regulated.
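
To illustrate the two conventional baselines mentioned in the abstract, the sketch below simulates a heavily simplified single-bus frequency model. It assumes a first-order swing equation with no turbine dynamics, so it shows steady-state deviation and RoCoF but cannot reproduce a frequency nadir. All parameter values and names are hypothetical, and this is not the dynamic droop or frequency shaping controller proposed in the talk.

```python
def simulate(inertia, damping, droop_gain=0.0, virtual_inertia=0.0,
             step=-0.1, dt=0.001, t_end=20.0):
    """Forward-Euler simulation of (m + m_v) * dw/dt = -(d + k) * w + step.

    w is the frequency deviation, step is a sudden power imbalance,
    k is the inverter droop gain and m_v is the inverter virtual inertia.
    Returns the final deviation and the largest |dw/dt| (RoCoF) observed.
    """
    w = 0.0
    rocof_max = 0.0
    for _ in range(int(t_end / dt)):
        dw = (-(damping + droop_gain) * w + step) / (inertia + virtual_inertia)
        rocof_max = max(rocof_max, abs(dw))
        w += dt * dw
    return w, rocof_max


# Hypothetical per-unit parameters.
w_base, rocof_base = simulate(inertia=2.0, damping=1.0)
w_droop, rocof_droop = simulate(inertia=2.0, damping=1.0, droop_gain=4.0)
w_vi, rocof_vi = simulate(inertia=2.0, damping=1.0, virtual_inertia=2.0)
```

In this toy model, droop shrinks the steady-state deviation from `step / d` to `step / (d + k)` but leaves the initial RoCoF unchanged, while virtual inertia halves the initial RoCoF without changing where the frequency settles. The noise amplification attributed to virtual inertia is not modeled here.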

Committee Members

Enrique Mallada (Department of Electrical & Computer Engineering)

Pablo A. Iglesias (Department of Electrical & Computer Engineering)

Dennice F. Gayme (Department of Mechanical Engineering)

Nov
19
Thu
Thesis Proposal: Puyang Wang
Nov 19 @ 3:00 pm

Note: This is a virtual presentation. Here is the link for where the presentation will be taking place.

Title: Accelerating Magnetic Resonance Imaging using Convolutional Recurrent Neural Networks

Abstract: Fast and accurate MRI image reconstruction from undersampled data is critically important in clinical practice. Compressed sensing based methods are widely used in image reconstruction, but they are slow because of their iterative algorithms. Deep learning based methods have shown promising advances in recent years. However, recovering the fine details from highly undersampled data is still challenging. Moreover, the current protocol for Amide Proton Transfer-weighted (APTw) imaging commonly starts with the acquisition of high-resolution T2-weighted (T2w) images, followed by APTw imaging at a particular geometry and locations (i.e., slices) determined by the acquired T2w images. Although many advanced MRI reconstruction methods have been proposed to accelerate MRI, existing methods for APTw MRI lack the capability of taking advantage of structural information in the acquired T2w images for reconstruction. In this work, we introduce a novel deep learning-based method with Convolutional Recurrent Neural Networks (CRNN) to reconstruct the image from multiple scales. Finally, we explore the use of the proposed Recurrent Feature Sharing (RFS) reconstruction module to utilize intermediate features extracted from the matched T2w image by the CRNN so that the missing structural information can be incorporated into the undersampled APT raw image, thus effectively improving the image quality of the reconstructed APTw image.
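
To show why reconstruction from undersampled data is hard, the following sketch uses a 1D toy signal and a plain discrete Fourier transform as a stand-in for k-space. It keeps every other frequency sample and reconstructs by zero-filling. The signal, the regular sampling mask, and the function name are assumptions for illustration; this is neither the CRNN nor a compressed sensing solver.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (inverse scales by 1/n)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[k] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for k in range(n))
        for m in range(n)
    ]
    return [v / n for v in out] if inverse else out


# Toy 1D "image": a box occupying samples 8..15 of 32.
signal = [1.0 if 8 <= i < 16 else 0.0 for i in range(32)]
kspace = dft(signal)

# Keep every other k-space sample (2x acceleration), zero the rest.
mask = [i % 2 == 0 for i in range(32)]
undersampled = [v if keep else 0.0 for v, keep in zip(kspace, mask)]

full_recon = [v.real for v in dft(kspace, inverse=True)]
zero_filled = [v.real for v in dft(undersampled, inverse=True)]
```

The fully sampled inverse transform returns the box, while the zero-filled result contains the box at half amplitude plus an aliased copy shifted by half the field of view. Learned or iterative reconstruction methods aim to remove such artifacts.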

Committee Members

Vishal M. Patel, Department of Electrical and Computer Engineering

Rama Chellappa, Department of Electrical and Computer Engineering