Title: Minimally-Invasive Lens-free Computational Microendoscopy
Abstract: Ultra-miniaturized imaging tools are vital for numerous biomedical applications. Such minimally invasive imagers allow for navigation into hard-toreach regions and, for example, observation of deep brain activity in freely moving animals with minimal ancillary tissue damage. Conventional solutions employ distal microlenses. However, as lenses become smaller and thus less invasive they develop greater optical aberrations, requiring bulkier compound designs with restricted field-of-view. In addition, tools capable of 3-dimensional volumetric imaging require components that physically scan the focal plane, which ultimately increases the distal complexity, footprint, and weight. Simply put, minimally-invasive imaging systems have limited information capacity due to their given cross-sectional area.
This thesis explores minimally-invasive lens-free microendoscopy enabled by a successful integration of signal processing, optical hardware, and image reconstruction algorithms. Several computational microendoscopy architectures that simultaneously achieve miniaturization and high information content are presented. Leveraging the computational imaging techniques enables color-resolved imaging with wide field-of-view, and 3-dimensional volumetric reconstruction of an unknown scene using a single camera frame without any actuated parts, further advancing the performance versus invasiveness of microendoscopy.
Title: Semi-supervised training for automatic speech recognition.
Abstract: State-of-the-art automatic speech recognition (ASR) systems use sequence-level objectives like Connectionist Temporal Classification (CTC) and Lattice-free Maximum Mutual Information (LF-MMI) for training neural network-based acoustic models. These methods are known to be most effective with large size datasets with hundreds or thousands of hours of data. It is difficult to obtain large amounts of supervised data other than in a few major languages like English and Mandarin. It is also difficult to obtain supervised data in a myriad of channel and envirormental conditions. On the other hand, large amounts of
unsupervised audio can be obtained fairly easily. There are enormous amounts of unsupervised data available in broadcast TV, call centers and YouTube for many different languages and in many environment conditions. The goal of this research is to discover how to best leverage the available unsupervised data for training acoustic models for ASR.
In the first part of this thesis, we extend the Maximum Mutual Information (MMI) training to the semi-supervised training scenario. We show that maximizing Negative Conditional Entropy (NCE) over lattices from unsupervised data, along with state-level Minimum Bayes Risk (sMBR) on supervised data, in a multi-task architecture gives word error rate (WER) improvements without needing any confidence-based filtering.
In the second part of this thesis, we investigate using lattice-based supervision as numerator graph to incorporate uncertainities in unsupervised data in the LF-MMI training framework. We explore various aspects of creating the numerator graph including splitting lattices for minibatch training, applying tolerance to frame-level alignments, pruning beam sizes, word LM scale and inclusion of pronunciation variants. We show that the WER recovery rate (WRR) of our proposed approach is 5-10\% absolute better than that of the baseline of using 1-best transcript as supervision, and is stable in the 40-60\% range even on large-scale setups and multiple different languages.
Finally, we explore transfer learning for the scenario where we have unsupervised data in a mismatched domain. First, we look at the teacher-student learning approach for cases where parallel data is available in source and target domains. Here, we train a “student” neural network on the target domain to mimic a “teacher” neural network on the source domain data, but using sequence-level posteriors instead of the traditional approach of using frame-level posteriors.
We show that the proposed approach is very effective to deal with acoustic domain mismatch in multiple scenarios of unsupervised domain adaptation — clean to noisy speech, 8kHz to 16kHz speech, close-talk microphone to distant microphone.
Second, we investigate approaches to mitigate language domain mismatch, and show that a matched language model significantly improves WRR. We finally show that our proposed semi-supervised transfer learning approach works effectively even on large-scale unsupervised datasets with 2000 hours of
audio in natural and realistic conditions.
Title: Strategies for Handling Out-of-Vocabulary Words in Automatic Speech Recognition
Abstract: Nowadays, most ASR (automatic speech recognition) systems deployed in industry are closed-vocabulary systems, meaning we have a limited vocabulary of words the system can recognize, and where pronunciations are provided to the system. Words out of this vocabulary are called out-of-vocabulary (OOV) words, for which either pronunciations or both spellings and pronunciations are not known to the system. The basic motivations of developing strategies to handle OOV words are: First, in the training phase, missing or wrong pronunciations of words in training data results in poor acoustic models. Second, in the test phase, words out of the vocabulary cannot be recognized at all, and mis-recognition of OOV words may affect recognition performance of its in-vocabulary neighbors as well. Therefore, this dissertation is dedicated to exploring strategies of handling OOV words in closed-vocabulary ASR.
First, we investigate dealing with OOV words in ASR training data, by introducing an acoustic-data driven pronunciation learning framework using a likelihood-reduction based criterion for selecting pronunciation candidates from multiple sources, i.e. standard grapheme-to-phoneme algorithms (G2P) and phonetic decoding, in a greedy fashion. This framework effectively expands a small hand-crafted pronunciation lexicon to cover OOV words, for which the learned pronunciations have higher quality than approaches using G2P alone or using other baseline pruning criteria. Furthermore, applying the proposed framework to generate alternative pronunciations for in-vocabulary (IV) words improves both recognition performance on relevant words and overall acoustic model performance.
Second, we investigate dealing with OOV words in ASR test data, i.e. OOV detection and recovery. We first conduct a comparative study of a hybrid lexical model (HLM) approach for OOV detection, and several baseline approaches, with the conclusion that the HLM approach outperforms others in both OOV detection and first pass OOV recovery performance. Next, we introduce a grammar-decoding framework for efficient second pass OOV recovery, showing that with properly designed schemes of estimating OOV unigram probabilities, the framework significantly improves OOV recovery and overall decoding performance compared to first pass decoding.
Finally we propose an open-vocabulary word-level recurrent neural network language model (RNNLM) re scoring framework, making it possible to re-score lattices containing recovered OOVs using a word-level RNNLM, that was ignorant of OOVs when it was trained. Above all, the whole OOV recovery pipeline shows the potential of a highly efficient open-vocabulary word-level ASR decoding framework, tightly integrated into a standard WFST decoding pipeline.
Note: This is a virtual seminar that will be broadcast in Olin Hall 305. Refreshments will be available outside Olin Hall 305 at 2:30 PM.
Title: Computational infrastructure to improve scientific reproducibility
Abstract: The massive increase in the dimensionality of scientific data and the proliferation of complex data analysis methods has raised increasing concerns about the reproducibility of scientific results in many domains of science. I will first present evidence that analytic flexibility in neuroimaging research is associated with surprising variability in scientific outcomes in the wild, even holding the raw data constant. These findings motivate the development of well-tested software tools for neuroimaging data processing and analysis. I will focus in particular on the role of software development tools such as containerization and continuous integration, which provide the potential to deliver automated and reproducible data analysis at scale. I will also discuss the challenging tradeoffs inherent in the usage of complex software by scientists, and the need for increased transparency and validation of scientific software.
Bio: Russell A. Poldrack is the Albert Ray Lang Professor in the Department of Psychology and Professor (by courtesy) of Computer Science at Stanford University, and Director of the Stanford Center for Reproducible Neuroscience. His research uses neuroimaging to understand the brain systems underlying decision making and executive function. His lab is also engaged in the development of neuroinformatics tools to help improve the reproducibility and transparency of neuroscience, including the Openneuro.org and Neurovault.org data sharing projects and the Cognitive Atlas ontology.
Title: Electrets (Dielectrics with quasi-permanent Charges or Dipoles) – A long history and a bright future
Abstract: The history of electrets can be traced back to Thales of Miletus (approx. 624-546 B.C.E.) who reported that pieces of amber (“electron”) attract or repel each other. The science of fundamental electrical phenomena is closely intertwined with the development of electrets which came under such terms as “electrics”, “electrophores”, “charged/poled dielectrics”, etc. until about one century ago. Modern electret research started with Oliver Heaviside (1850-1925), who defined the concept of a “permanently electrized body” and proposed the name “electret” in 1885, and Mototarô Eguchi, who experimentally investigated carnauba wax electrets at the Higher Naval College in Tokyo around 1920. Today, we see a wide range of electret types, electret materials, and electret applications, which are being investigated and developed all over the world in a truly global endeavour. A classification of electrets will be followed by a few examples of useful electret effects and exciting device applications – mainly in the area of electro-mechanical and electro-acoustical transduction which started with the invention of the electret microphone by Sessler and West in the early 1960s. Furthermore, possible synergies between electret research and ultra-high-voltage DC electrical insulation will be mentioned.
Bio: Reimund Gerhard is a Professor of Physics and Astronomy at the University of Potsdam and the current President of the IEEE Dielectrics and Electrical Insulation Society (DEIS). He graduated from the Technical University of Darmstadt as Diplom-Physiker in 1978 and earned his PhD (Doktor-Ingenieur) in Communications Engineering from TU Darmstadt in 1984. From 1985 to 1994, Gerhard was a Research Scientist and Project Manager at the Heinrich-Hertz Institute for Communications Technology (now the Fraunhofer Institute) in Berlin, Germany. He was appointed as a Professor at the University of Potsdam in 1994. From 2004 to 2012, Gerhard served as the Chairman of the Joint Board for the Master-of-Science Program in Polymer Science of FU Berlin, HU Berlin, TU Berlin, and the University of Potsdam. He also served as the Dean of the Faculty of Science at the University of Potsdam from 2008 to 2012, eventually serving as a Senator of the University of Potsdam from 2014 to 2016.
Prof. Gerhard has received many awards and honors over his long career, including an Award (ITG-Preis) from the Information Technology Society (ITG) in the VDE, a silver medal from the Foundation Werner-von-Siemens-Ring, a First Prize Technology Transfer Award Brandenburg, Whitehead Memorial Lecturer of the IEEE CEIDP, and the Award of the EuroEAP Society “for his fundamental scientific contributions in the field of transducers based on dielectric polymers.” He is a Fellow of the American Physical Society (APS) and the Institute of Electrical and Electronics Engineers (IEEE). His research interests include polymer electrets with quasi-permanent space charge, ferro- or piezoelectrets (polymer films with electrically charged cavities), ferroelectric polymers with piezo- and pyroelectric properties, polymer composites with novel property combinations, physical mechanisms of dipole orientation and charge storage, electrically deformable dielectric elastomers (sometimes also called “electro-electrets”), as well as the physics of musical instruments.
Note: There will be a reception after the lecture.
Title: Automated Spore Analysis Using Bright-Field Imaging and Raman Microscopy
Abstract: In 2015, it was determined that the United States Department of Defense had been shipping samples of B. anthracis spores which had undergone gamma irradiation but were not fully inactivated. In the aftermath of this event alternative and orthogonal methods were investigated to analyze spores determine their viability. In this thesis we demonstrate a novel analysis technique that combines bright-field microscopy images with Raman chemical microscopy.
We first developed an image segmentation routine based on the watershed method to locate individual spores within bright-field images. This routine was able to effectively demarcate 97.4% of the Bacillus spores within the bright-field images with minimal over-segmentation. Size and shape measurements, to include major and minor axis and area, were then extracted for 4048 viable spores which showed very good agreement with previously published values. When similar measurements were taken on 3627 gamma-irradiated spores, a statistically significant difference was noted for the minor axis length, ratio of major to minor axis, and total area when compared to the non-irradiated spores. Classification results show the ability to correctly classify 67% of viable spores with an 18% misclassification rate using the bright-field image by thresholding the minimum classification length.
Raman chemical imaging microscopy (RCIM) was then used to measure populations of viable, gamma irradiated, and autoclaved spores of B. anthracis Sterne, B. atrophaeus. B. megaterium, and B. thuringensis kurstaki. Significant spectral differences were observed between viable and inactivated spores due to the disappearance of features associated with calcium dipicolinate after irradiation. Principal component analysis was used which showed the ability to distinguish viable spores of B. anthracis Sterne and B. atrophaeus from each other and the other two Bacillus species.
Finally, Raman microscopy was used to classify mixtures of viable and gamma inactivated spores. A technique was developed that fuses the size and shape characteristics obtained from the bright-field image to preferentially target viable spores. Simulating a scenario of a practical demonstration of the technique was performed on a field of view containing approximately 7,000 total spores of which are only 12 were viable to simulate a sample that was not fully irradiated. Ten of these spores are properly classified while interrogating just 25% of the total spores.
Title: Robust Adaptive Strategies for Myographic Prosthesis Movement Decoding
Abstract: Improving the condition-tolerance, stability, response time, and dexterity of neural prosthesis control strategies are major clinical goals to aid amputees in achieving natural restorative upper-limb function. Currently, the dominant noninvasive neural source for prosthesis motor control is the skin-surface recorded electromyographic (EMG) signal. Decoding movement intentions from EMG is a challenging problem because this signal type is subject to a high degree of interference from noise and conditional influences. As a consequence, much of the movement intention information contained within the EMG signal has remained significantly under-utilized for the purposes of controlling robotic prostheses. We sought to overcome this information deficit through the use of adaptive strategies for machine learning, sparse representations, and signal processing to significantly improve myographic prosthesis control. This body of research represents the current state-of-the-art in condition-tolerant EMG movement classification (Chapter 3), stable and responsive EMG sequence decoding during movement transitions (Chapter 4), and positional regression to reliably control 7 wrist and finger degrees-of-freedom (Chapter 5). To our knowledge, the methods we describe in Chapter 5 elicit the most dexterous, biomimetic, and natural prosthesis control performance ever obtained from the surface EMG signal.
Title: Loss Landscapes of Neural Networks and their Generalization: Theory and Applications
Abstract: In the last decade or so, deep learning has revolutionized entire domains of machine learning. Neural networks have helped achieve significant improvements in computer vision, machine translation, speech recognition, etc. These powerful empirical demonstrations leave a wide gap between our current theoretical understanding of neural networks and their practical performance. The theoretical questions in deep learning can be put under three broad but inter-related themes: 1) Architecture/Representation, 2) Optimization, and 3) Generalization. In this dissertation, we study the landscapes of different deep learning problems to answer questions in the above themes.
First, in order to understand what representations can be learned by neural networks, we study simple Autoencoder networks with one hidden layer of rectified linear units. We connect autoencoders to the well-known problem in signal processing of Sparse Coding. We show that the squared reconstruction error loss function has a critical point at the ground truth dictionary under an appropriate generative model.
Next, we turn our attention to a problem at the intersection of optimization and generalization. Training deep networks through empirical risk minimization is a non-convex problem with many local minima in the loss landscape. A number of empirical studies have observed that “flat minima” for neural networks tend to generalize better than sharper minima. However, quantifying the flatness or sharpness of minima has been an issue due to possible rescaling in neural networks with positively homogenous activations. We use ideas from Riemannian geometry to define a new measure of flatness that is invariant to rescaling. We test the hypothesis that flatter minima generalize better through a number of different experiments on deep networks.
Finally, we apply deep networks to computer vision problems with compressed measurements of natural images and videos. We conduct experiments to characterize the situations in which these networks fail, and those in which they succeed. We train deep networks to perform object detection and classification directly on these compressive measurements of images, without trying to reconstruct the scene first. These experiments are conducted on public datasets as well as datasets specific to a sponsor of our research.
Title: Neural Circuit Mechanisms of Stimulus Selection Underlying Spatial Attention
Thesis Committee: Shreesh P. Mysore, Hynek Hermansky, Mounya Elhilali, Ralph Etienne-Cummings
Abstract: Humans and animals routinely encounter competing pieces of information in their environments, and must continually select the most salient in order to survive and behave adaptively. Here, using computational modeling, extracellular neural recordings, and focal, reversible silencing of neurons in the midbrain of barn owls, we uncovered how two essential computations underlying competitive selection are implemented in the brain: a) the ability to select the most salient stimulus among all pairs of stimulus locations, and b) the ability to signal the most salient stimulus categorically.
We first discovered that a key inhibitory nucleus in the midbrain attention network, called isthmi pars magnocellularis (Imc), encodes visual space with receptive fields that have multiple excitatory hotspots (‘‘lobes’’). Such (previously unknown) multilobed encoding of visual space is necessitated for selection at all location-pairs in the face of scarcity of Imc neurons. Although distributed seemingly randomly, the RF lobe-locations are optimized across the high-firing Imc neurons, allowing them to combinatorially solve selection across space. This combinatorially optimized inhibition strategy minimizes metabolic and wiring costs.
Next, we discovered that a ‘donut-like’ inhibitory mechanism in which each competing option suppresses all options except itself is highly effective at generating categorical responses. It surpasses motifs of feedback inhibition, recurrent excitation, and divisive normalization used commonly in decision-making models. We demonstrated experimentally not only that this mechanism operates in the midbrain spatial selection network in barn owls, but also that it is required for categorical signaling by it. Moreover, the pattern of inhibition in the midbrain forms an exquisitely structured ‘multi-holed’ donut consistent with this network’s combinatorial inhibitory function (computation 1).
Our work demonstrates that the vertebrate midbrain uses seemingly carefully optimized structural and functional strategies to solve challenging computational problems underlying stimulus selection and spatial attention at all location pairs. The neural motifs discovered here represent circuit-based solutions that are generalizable to other brain areas, other forms of behavior (such as decision-making, action selection) as well as for the design of artificial systems (such as robotics, self-driving cars) that rely on the selection of one among many options.
University policy at this present time: Students and faculty CAN attend dissertation defenses as long as there are fewer than 25 people.
Title: Deep Learning Based Novelty Detection
Abstract: In recent years, intelligent systems powered by artificial intelligence and computer vision that perform visual recognition have gained much attention. These systems observe instances and labels of known object classes during training and learn association patterns that can be used during inference. A practical visual recognition system should first determine whether an observed instance is from a known class. If it is from a known class, then the identity of the instance is queried through classification. The former process is commonly known as novelty detection (or novel class detection) in the literature. Given a set of image instances from known classes, the goal of novelty detection is to determine whether an observed image during inference belongs to one of the known classes.
In this thesis, deep learning-based approaches to solve novelty detection is studied under four different settings. In the first two settings, the availability of out-of-distributional data (OOD) is assumed. With this assumption, novelty detection can be studied for cases where there are multiple known classes and a single known class separately. These two problem settings are referred to as Multi-class novelty detection with OOD data and one-class novelty detection with OOD data in the literature, respectively. It is also possible to study this problem in a more constrained setting where only the data from known classes are considered for training. When there exist multiple classes in this setting novelty detection problem is known as Multiple-class novelty detection or Open-set recognition. On the other hand, when only a single class exists it is known as one-class novelty detection.
Finally, we study a practical application of novelty detection in mobile Active Authentication (AA). For a practical AA-based novelty detector, latency and efficiency are as important as the detection accuracy. Solutions are presented for the problem of quickly detecting intrusions with lower false detection rates in mobile AA systems with higher resource efficiency. Bayesian and Minimax versions of the Quickest Change Detection (QCD) algorithms are introduced to quickly detect intrusions in mobile AA systems. These algorithms are extended with an update rule to facilitate low-frequency sensing which leads to low utilization of resources.
Committee Members: Vishal Patel, Trac Tran, Najim Dehak