Title: Applications of high-speed optical signal processing in high-dimensional data acquisition
Abstract: Thanks to their large bandwidth and ability to capture large amounts of information in parallel, optical technologies have transformed the way we capture, process, and communicate information. During this talk I will discuss how optical signal processing can be used in conjunction with novel data compression strategies to break the decades-long bottleneck faced by electronic systems. In particular, I will discuss the utility of optical signal processing in big-data applications ranging from high-speed material characterization to capturing neural signals over large volumes at unprecedented depth and speed.
During the first half of this talk I will discuss how we are taking advantage of parallel image acquisition techniques to gain a deeper understanding of rapidly evolving combustion events over a broad spectral range. Despite the rich body of scientific research, the volatile nature of the combustion process has presented an obstacle to our understanding of the chemical kinetics involved in flame propagation and evolution. Many combustive reactions occur on sub-millisecond time scales and involve high-velocity motion and interaction of fuel reagents. Hyperspectral imaging technologies are an attractive solution that combines high spatial resolution with fine spectral resolution. However, most conventional hyperspectral cameras rely on slow scanning mechanisms and are therefore ill-suited for capturing fast-evolving events. The emergence of Compressive Sensing (CS) over the past decade has opened the door to acquiring high-dimensional signals at high speed. In the first part of this talk I will discuss how novel optical techniques can be combined with CS algorithms to realize Mega Frame hyperspectral imaging platforms for material diagnostics.
The second portion of my talk will focus on high spatio-temporal neural recording applications. Multi-photon microscopy has been a major breakthrough in overcoming optical scattering when imaging individual neurons deep inside the brain of live animals. Despite the impressive image quality and robustness to scattering, point scanning multi-photon microscopes face a fundamental trade-off between the field of view (FOV) and imaging speed. Higher speed, volumetric multi-photon imaging and stimulation technologies have the potential to revolutionize monitoring of neural network activity in vivo. In this part I will discuss our efforts to develop a scalable, volumetric, two-photon neural recording technology that combines rapid, volumetric scanning of a wide illumination field with synchronized high-resolution dynamic spatial patterning within the illumination field. This approach will allow us to both rapidly address large volumes and also achieve high-resolution random access within the sub-regions of the scan. We will leverage the random access capabilities of this hardware to implement compressive and adaptive imaging strategies that maximize the image information acquired for a given time and laser energy.
Title: Control of pattern formation in excitable systems
Abstract: Pattern formation embodies the beauty and complexity of nature. Some patterns like traveling and rotating waves are dynamic, while others such as dots and stripes are static. Both dynamic and static patterns have been observed in a variety of physiological and biological processes such as rotating action potential waves in the brain during sleep, traveling calcium waves in the cardiac muscle, static patterns on the skins of animals, and self-regulated patterns in the animal embryo. Excitable systems represent a class of ultrasensitive systems that are capable of generating different kinds of patterns depending on the interplay between activator and inhibitor dynamics. Through manipulation of different excitable parameters, a diverse array of traveling wave and standing wave patterns can be obtained. In this thesis, I use pattern formation theory to control the excitable systems involved in cell migration and neuroscience to alter the observed phenotype, in an attempt to affect the underlying biological process.
Cell migration is critical in many processes such as cancer metastasis and wound healing. Cells move by extending periodic protrusions of their cortex, and recent work has shown that the cellular cortex is an excitable medium where waves of biochemical species organize the cellular protrusion. Altering the protrusive phenotype can drastically alter cell migration, which can in turn affect critical physiological processes. In the first part of this thesis, I use excitable wave theory to model and predict wave pattern changes in amoeboid cells.
The study of excitable systems originated in neuroscience, where different patterns of activity reflect different brain states. Sleep is associated with slow waves, while repeated high-frequency waves are associated with epileptic seizures. These patterns arise from the interplay between the cerebral cortex and the thalamus, which form a closed-loop architecture. In the second part of this thesis, I use a three-layer, two-dimensional thalamocortical model to explore the parameters that may influence the spatio-temporal dynamics on the cortex.
Title: Collaborative Regression and Classification via Bootstrapping
Abstract: In modern machine learning problems and applications, the data are both high-dimensional and large in volume, making data analysis time-consuming and computationally inefficient. Sparse recovery algorithms are developed to extract the underlying low-dimensional structure from the data. Classical signal recovery based on l1 minimization solves the least squares problem with all available measurements via sparsity-promoting regularization. It has shown promising performance in regression and classification. Previous work on Compressed Sensing (CS) theory reveals that when the true solution is sparse and the number of measurements is large enough, solutions to l1 minimization converge to the ground truth. In practice, when the number of measurements is low, when the noise level is high, or when measurements arrive sequentially in a streaming fashion, conventional l1 minimization algorithms tend to struggle in signal recovery.
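As an illustration of the l1 minimization baseline described above, the following is a minimal sketch of the iterative shrinkage-thresholding algorithm (ISTA) for the problem min_x 0.5*||Ax - y||^2 + lam*||x||_1. ISTA is one standard solver chosen here for brevity; the abstract does not say which solver the author used. The names ista and soft_threshold, the step size, and the small matrix are all assumptions made for this example.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t (proximal operator of t * |v|)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista(A, y, lam, step, n_iter=500):
    """Approximately minimize 0.5 * ||A x - y||^2 + lam * ||x||_1.

    A is a list of rows, y a list of measurements. The step size should
    not exceed 1 / (largest eigenvalue of A^T A) for convergence.
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(n_iter):
        # Residual r = A x - y
        r = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        # Gradient of the least-squares term: A^T r
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # Gradient step followed by soft-thresholding
        x = [soft_threshold(x[j] - step * g[j], step * lam) for j in range(n)]
    return x


# Hypothetical underdetermined system: 2 measurements, 3 unknowns.
A_demo = [[1.0, 0.0, 1.0],
          [0.0, 1.0, 1.0]]
y_demo = [1.0, 1.0]
x_hat = ista(A_demo, y_demo, lam=0.05, step=0.3)
```

The soft-thresholding step is what promotes sparsity: coefficients whose magnitude stays below step * lam after the gradient update are set exactly to zero.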
This research aims to use multiple local measurements, generated by resampling via bootstrap or sub-sampling, to efficiently make global predictions in the aforementioned challenging scenarios. We develop two main approaches: one extends the conventional bagging scheme in sparse regression beyond a fixed bootstrapping ratio, whereas the other, called JOBS, enforces support consistency among bootstrapped estimators in a collaborative fashion. We first derive rigorous theoretical guarantees for both proposed approaches and then carefully evaluate them with extensive simulations to quantify their performance. Our algorithms are more robust than conventional l1 minimization, especially in scenarios with high measurement noise and a low number of measurements. Our theoretical analysis also provides key guidance on how to choose optimal parameters, including bootstrapping ratios and the number of collaborative estimates. Finally, we demonstrate that our proposed approaches yield significant performance gains in both sparse regression and classification, which are two crucial problems in the field of signal processing and machine learning.
Title: Brain structure segmentation using multiple MRI pulse sequences
Abstract: Medical image segmentation is the process of delineating anatomical structures of interest in images. Automatic segmentation algorithms applied to brain magnetic resonance images (MRI) allow for the processing of large volumes of data for the study of neurodegenerative diseases. Widely-used segmentation software packages only require T1-weighted (T1-w) MRI and segment cortical and subcortical structures, but are unable to segment structures that do not appear in T1-w MRI. Other MRI pulse sequences have properties that allow for the segmentation of structures that are invisible (or barely discernible) in T1-w MRI.
In this dissertation, three novel medical image segmentation algorithms are proposed to segment the following structures of interest: the thalamus; the falx and tentorium; and the meninges. The common theme that connects these segmentation algorithms is that they use information from multiple MRI pulse sequences because the structures they target are nearly invisible in T1-w MRI. Segmentation of these structures is used in the study of neurodegenerative diseases such as multiple sclerosis and for the development of computational models of the brain for the study of traumatic brain injury.
Our automatic thalamus and thalamic nuclei segmentation algorithm extracts features from T1-w MRI, T2-w MRI, and diffusion tensor imaging (DTI) to train a random forest classifier. Using leave-one-out cross-validation on nine subjects, our algorithm achieves mean Dice coefficients of 0.897 and 0.902 for the left and right thalami, respectively, which are higher than those of the three state-of-the-art methods we compared against.
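For readers unfamiliar with the Dice coefficient reported above, it measures the overlap between two segmentations A and B as 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (identical). The sketch below is a minimal illustration; the function name and the representation of a segmentation as a collection of voxel indices are assumptions made for this example, not the author's implementation.

```python
def dice_coefficient(seg_a, seg_b):
    """Dice overlap between two segmentations given as voxel-index collections."""
    a, b = set(seg_a), set(seg_b)
    if not a and not b:
        # Convention assumed here: two empty segmentations agree perfectly.
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))


# Hypothetical voxel indices for an automatic and a manual segmentation.
automatic = {(0, 0, 1), (0, 1, 1), (1, 1, 1), (1, 2, 1)}
manual = {(0, 1, 1), (1, 1, 1), (1, 2, 1), (2, 2, 1)}
overlap = dice_coefficient(automatic, manual)  # 3 shared voxels out of 4 + 4
```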
Our falx and tentorium segmentation algorithm uses T1-w MRI and susceptibility-weighted imaging (SWI) to register multiple atlases and fuse their boundary points to generate a subject-specific falx and tentorium. Our method is compared against single-atlas approaches and achieves the lowest mean surface distances, 0.86 mm and 0.99 mm, to a manually delineated falx and tentorium, respectively.
Our meninges reconstruction algorithm uses T1-w MRI, T2-w MRI, and a synthetic computed tomography (CT) image generated via a convolutional neural network to find two layers of the meninges: the subarachnoid space and dura mater. We compare our method with other brain extraction and intracranial volume estimation algorithms. Our method produces a subarachnoid space segmentation with a mean Dice score of 0.991, which is comparable to the top-performing state-of-the-art method, and produces a dura mater segmentation with a mean Dice score of 0.983, which is the highest among the compared methods.
Title: Minimally-Invasive Lens-free Computational Microendoscopy
Abstract: Ultra-miniaturized imaging tools are vital for numerous biomedical applications. Such minimally invasive imagers allow for navigation into hard-to-reach regions and, for example, observation of deep brain activity in freely moving animals with minimal ancillary tissue damage. Conventional solutions employ distal microlenses. However, as lenses become smaller and thus less invasive, they develop greater optical aberrations, requiring bulkier compound designs with restricted field-of-view. In addition, tools capable of 3-dimensional volumetric imaging require components that physically scan the focal plane, which ultimately increases the distal complexity, footprint, and weight. Simply put, minimally-invasive imaging systems have limited information capacity due to their given cross-sectional area.
This thesis explores minimally-invasive lens-free microendoscopy enabled by a successful integration of signal processing, optical hardware, and image reconstruction algorithms. Several computational microendoscopy architectures that simultaneously achieve miniaturization and high information content are presented. Leveraging computational imaging techniques enables color-resolved imaging with a wide field-of-view and 3-dimensional volumetric reconstruction of an unknown scene from a single camera frame without any actuated parts, further improving the trade-off between performance and invasiveness in microendoscopy.
Title: Semi-supervised training for automatic speech recognition
Abstract: State-of-the-art automatic speech recognition (ASR) systems use sequence-level objectives like Connectionist Temporal Classification (CTC) and Lattice-free Maximum Mutual Information (LF-MMI) for training neural network-based acoustic models. These methods are known to be most effective with large datasets containing hundreds or thousands of hours of data. It is difficult to obtain large amounts of supervised data other than in a few major languages like English and Mandarin. It is also difficult to obtain supervised data in a myriad of channel and environmental conditions. On the other hand, large amounts of unsupervised audio can be obtained fairly easily. There are enormous amounts of unsupervised data available in broadcast TV, call centers and YouTube for many different languages and in many environmental conditions. The goal of this research is to discover how to best leverage the available unsupervised data for training acoustic models for ASR.
In the first part of this thesis, we extend the Maximum Mutual Information (MMI) training to the semi-supervised training scenario. We show that maximizing Negative Conditional Entropy (NCE) over lattices from unsupervised data, along with state-level Minimum Bayes Risk (sMBR) on supervised data, in a multi-task architecture gives word error rate (WER) improvements without needing any confidence-based filtering.
In the second part of this thesis, we investigate using lattice-based supervision as the numerator graph to incorporate uncertainties in unsupervised data in the LF-MMI training framework. We explore various aspects of creating the numerator graph, including splitting lattices for minibatch training, applying tolerance to frame-level alignments, pruning beam sizes, word LM scale, and inclusion of pronunciation variants. We show that the WER recovery rate (WRR) of our proposed approach is 5-10% absolute better than that of the baseline of using the 1-best transcript as supervision, and is stable in the 40-60% range even on large-scale setups and multiple different languages.
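The abstract does not define WRR. A definition commonly used in the semi-supervised ASR literature, assumed here, is the fraction of the gap between a supervised-only baseline and an "oracle" system (trained with true transcripts for the unsupervised data) that the semi-supervised system closes. The sketch below computes it under that assumption; the function name and the WER values are hypothetical and are not results from the thesis.

```python
def wer_recovery_rate(wer_baseline, wer_semisup, wer_oracle):
    """WRR = (baseline WER - semi-supervised WER) / (baseline WER - oracle WER).

    Assumed definition: baseline is trained on supervised data only, and
    oracle is trained as if the unsupervised data had true transcripts.
    """
    gap = wer_baseline - wer_oracle
    if gap <= 0:
        raise ValueError("oracle WER must be lower than baseline WER")
    return (wer_baseline - wer_semisup) / gap


# Hypothetical WERs in percent, chosen only to illustrate the arithmetic.
wrr = wer_recovery_rate(wer_baseline=20.0, wer_semisup=17.0, wer_oracle=15.0)
```

Under this definition, a WRR of 0 means the unsupervised data gave no benefit, and a WRR of 1 means the semi-supervised system matched the oracle.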
Finally, we explore transfer learning for the scenario where we have unsupervised data in a mismatched domain. First, we look at the teacher-student learning approach for cases where parallel data is available in source and target domains. Here, we train a “student” neural network on the target domain to mimic a “teacher” neural network on the source domain data, but using sequence-level posteriors instead of the traditional approach of using frame-level posteriors.
We show that the proposed approach is very effective in dealing with acoustic domain mismatch in multiple scenarios of unsupervised domain adaptation: clean to noisy speech, 8 kHz to 16 kHz speech, and close-talk microphone to distant microphone.
Second, we investigate approaches to mitigate language domain mismatch, and show that a matched language model significantly improves WRR. We finally show that our proposed semi-supervised transfer learning approach works effectively even on large-scale unsupervised datasets with 2000 hours of audio in natural and realistic conditions.
Title: Strategies for Handling Out-of-Vocabulary Words in Automatic Speech Recognition
Abstract: Nowadays, most ASR (automatic speech recognition) systems deployed in industry are closed-vocabulary systems, meaning the system can recognize only a limited vocabulary of words whose pronunciations are provided to it. Words outside this vocabulary are called out-of-vocabulary (OOV) words, for which either the pronunciations or both the spellings and pronunciations are not known to the system. The basic motivations for developing strategies to handle OOV words are as follows. First, in the training phase, missing or wrong pronunciations of words in the training data result in poor acoustic models. Second, in the test phase, words outside the vocabulary cannot be recognized at all, and misrecognition of OOV words may affect recognition performance on their in-vocabulary neighbors as well. Therefore, this dissertation is dedicated to exploring strategies for handling OOV words in closed-vocabulary ASR.
First, we investigate dealing with OOV words in ASR training data, by introducing an acoustic-data driven pronunciation learning framework using a likelihood-reduction based criterion for selecting pronunciation candidates from multiple sources, i.e. standard grapheme-to-phoneme algorithms (G2P) and phonetic decoding, in a greedy fashion. This framework effectively expands a small hand-crafted pronunciation lexicon to cover OOV words, for which the learned pronunciations have higher quality than approaches using G2P alone or using other baseline pruning criteria. Furthermore, applying the proposed framework to generate alternative pronunciations for in-vocabulary (IV) words improves both recognition performance on relevant words and overall acoustic model performance.
Second, we investigate dealing with OOV words in ASR test data, i.e. OOV detection and recovery. We first conduct a comparative study of a hybrid lexical model (HLM) approach for OOV detection, and several baseline approaches, with the conclusion that the HLM approach outperforms others in both OOV detection and first pass OOV recovery performance. Next, we introduce a grammar-decoding framework for efficient second pass OOV recovery, showing that with properly designed schemes of estimating OOV unigram probabilities, the framework significantly improves OOV recovery and overall decoding performance compared to first pass decoding.
Finally, we propose an open-vocabulary word-level recurrent neural network language model (RNNLM) rescoring framework, making it possible to rescore lattices containing recovered OOVs using a word-level RNNLM that was unaware of OOVs when it was trained. Above all, the whole OOV recovery pipeline shows the potential of a highly efficient open-vocabulary word-level ASR decoding framework, tightly integrated into a standard WFST decoding pipeline.
Title: Automated Spore Analysis Using Bright-Field Imaging and Raman Microscopy
Abstract: In 2015, it was determined that the United States Department of Defense had been shipping samples of B. anthracis spores which had undergone gamma irradiation but were not fully inactivated. In the aftermath of this event, alternative and orthogonal methods were investigated to analyze spores and determine their viability. In this thesis we demonstrate a novel analysis technique that combines bright-field microscopy images with Raman chemical microscopy.
We first developed an image segmentation routine based on the watershed method to locate individual spores within bright-field images. This routine was able to effectively demarcate 97.4% of the Bacillus spores within the bright-field images with minimal over-segmentation. Size and shape measurements, including major and minor axis lengths and area, were then extracted for 4048 viable spores and showed very good agreement with previously published values. When similar measurements were taken on 3627 gamma-irradiated spores, a statistically significant difference was noted for the minor axis length, ratio of major to minor axis, and total area when compared to the non-irradiated spores. Classification results show the ability to correctly classify 67% of viable spores with an 18% misclassification rate using the bright-field image by thresholding the minimum classification length.
Raman chemical imaging microscopy (RCIM) was then used to measure populations of viable, gamma-irradiated, and autoclaved spores of B. anthracis Sterne, B. atrophaeus, B. megaterium, and B. thuringiensis kurstaki. Significant spectral differences were observed between viable and inactivated spores due to the disappearance of features associated with calcium dipicolinate after irradiation. Principal component analysis showed the ability to distinguish viable spores of B. anthracis Sterne and B. atrophaeus from each other and from the other two Bacillus species.
Finally, Raman microscopy was used to classify mixtures of viable and gamma-inactivated spores. A technique was developed that uses the size and shape characteristics obtained from the bright-field image to preferentially target viable spores. A practical demonstration of the technique was performed on a field of view containing approximately 7,000 spores, of which only 12 were viable, to simulate a sample that was not fully irradiated. Ten of these viable spores were properly classified while interrogating just 25% of the total spores.
Title: Robust Adaptive Strategies for Myographic Prosthesis Movement Decoding
Abstract: Improving the condition-tolerance, stability, response time, and dexterity of neural prosthesis control strategies is a major clinical goal in helping amputees achieve natural restorative upper-limb function. Currently, the dominant noninvasive neural source for prosthesis motor control is the skin-surface recorded electromyographic (EMG) signal. Decoding movement intentions from EMG is a challenging problem because this signal type is subject to a high degree of interference from noise and conditional influences. As a consequence, much of the movement intention information contained within the EMG signal has remained significantly under-utilized for the purposes of controlling robotic prostheses. We sought to overcome this information deficit through the use of adaptive strategies for machine learning, sparse representations, and signal processing to significantly improve myographic prosthesis control. This body of research represents the current state-of-the-art in condition-tolerant EMG movement classification (Chapter 3), stable and responsive EMG sequence decoding during movement transitions (Chapter 4), and positional regression to reliably control 7 wrist and finger degrees-of-freedom (Chapter 5). To our knowledge, the methods we describe in Chapter 5 elicit the most dexterous, biomimetic, and natural prosthesis control performance ever obtained from the surface EMG signal.
Title: Loss Landscapes of Neural Networks and their Generalization: Theory and Applications
Abstract: In the last decade or so, deep learning has revolutionized entire domains of machine learning. Neural networks have helped achieve significant improvements in computer vision, machine translation, speech recognition, etc. These powerful empirical demonstrations leave a wide gap between our current theoretical understanding of neural networks and their practical performance. The theoretical questions in deep learning can be put under three broad but inter-related themes: 1) Architecture/Representation, 2) Optimization, and 3) Generalization. In this dissertation, we study the landscapes of different deep learning problems to answer questions in the above themes.
First, in order to understand what representations can be learned by neural networks, we study simple autoencoder networks with one hidden layer of rectified linear units. We connect autoencoders to the well-known signal processing problem of sparse coding. We show that the squared reconstruction error loss function has a critical point at the ground truth dictionary under an appropriate generative model.
Next, we turn our attention to a problem at the intersection of optimization and generalization. Training deep networks through empirical risk minimization is a non-convex problem with many local minima in the loss landscape. A number of empirical studies have observed that “flat minima” for neural networks tend to generalize better than sharper minima. However, quantifying the flatness or sharpness of minima has been an issue due to possible rescaling in neural networks with positively homogeneous activations. We use ideas from Riemannian geometry to define a new measure of flatness that is invariant to rescaling. We test the hypothesis that flatter minima generalize better through a number of different experiments on deep networks.
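The rescaling issue mentioned above can be seen in a toy example. With ReLU activations, multiplying the first-layer weights by alpha > 0 and dividing the second-layer weights by alpha leaves the network function unchanged, yet a naive curvature-based sharpness proxy changes by a factor of alpha squared. The sketch below illustrates this with a hypothetical two-layer network and a finite-difference second derivative with respect to a single output weight. This crude proxy is an assumption made for illustration only and is not the Riemannian measure proposed in the dissertation.

```python
def relu(v):
    return [max(0.0, t) for t in v]


def matvec(W, x):
    return [sum(w * t for w, t in zip(row, x)) for row in W]


def two_layer(W1, W2, x):
    """Forward pass of a bias-free two-layer ReLU network."""
    return matvec(W2, relu(matvec(W1, x)))


def rescale(W1, W2, alpha):
    """Scale layer 1 by alpha and layer 2 by 1/alpha (alpha > 0)."""
    W1s = [[alpha * w for w in row] for row in W1]
    W2s = [[w / alpha for w in row] for row in W2]
    return W1s, W2s


def curvature_wrt_output_weight(W1, W2, x, target, k, h=1e-3):
    """Finite-difference second derivative of squared error w.r.t. W2[0][k]."""
    def loss(delta):
        W2p = [row[:] for row in W2]
        W2p[0][k] += delta
        return (two_layer(W1, W2p, x)[0] - target) ** 2
    return (loss(h) - 2.0 * loss(0.0) + loss(-h)) / (h * h)


# Hypothetical weights and input, chosen only for illustration.
W1 = [[1.0, -2.0], [0.5, 1.5]]
W2 = [[2.0, -1.0]]
x = [1.0, 2.0]
W1s, W2s = rescale(W1, W2, alpha=4.0)

same_output = two_layer(W1, W2, x) == two_layer(W1s, W2s, x)
sharp_before = curvature_wrt_output_weight(W1, W2, x, target=0.0, k=1)
sharp_after = curvature_wrt_output_weight(W1s, W2s, x, target=0.0, k=1)
```

Both parameter settings represent the same function, so a useful flatness measure should assign them the same value. This is the motivation for a rescaling-invariant definition.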
Finally, we apply deep networks to computer vision problems with compressed measurements of natural images and videos. We conduct experiments to characterize the situations in which these networks fail, and those in which they succeed. We train deep networks to perform object detection and classification directly on these compressive measurements of images, without trying to reconstruct the scene first. These experiments are conducted on public datasets as well as datasets specific to a sponsor of our research.