Title: Automated Spore Analysis Using Bright-Field Imaging and Raman Microscopy
Abstract: In 2015, it was determined that the United States Department of Defense had been shipping samples of B. anthracis spores which had undergone gamma irradiation but were not fully inactivated. In the aftermath of this event, alternative and orthogonal methods were investigated to analyze spores and determine their viability. In this thesis we demonstrate a novel analysis technique that combines bright-field microscopy images with Raman chemical microscopy.
We first developed an image segmentation routine based on the watershed method to locate individual spores within bright-field images. This routine was able to effectively demarcate 97.4% of the Bacillus spores within the bright-field images with minimal over-segmentation. Size and shape measurements, to include major and minor axis and area, were then extracted for 4048 viable spores which showed very good agreement with previously published values. When similar measurements were taken on 3627 gamma-irradiated spores, a statistically significant difference was noted for the minor axis length, ratio of major to minor axis, and total area when compared to the non-irradiated spores. Classification results show the ability to correctly classify 67% of viable spores with an 18% misclassification rate using the bright-field image by thresholding the minimum classification length.
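The thresholding step described above can be sketched in a few lines of Python. This is only an illustration: the measurements, the cutoff value, and the direction of the comparison are hypothetical (the abstract does not say whether irradiated spores have a larger or smaller minor axis). Only the idea of a single length cutoff comes from the text.

```python
def classify_by_minor_axis(minor_axes_um, threshold_um):
    """Label a spore 'viable' (True) when its minor axis meets the cutoff.

    Assumption: viable spores are taken to have the larger minor axis.
    Flip the comparison if the measured populations show the opposite.
    """
    return [length >= threshold_um for length in minor_axes_um]


def rates(viable_lengths, irradiated_lengths, threshold_um):
    """Return (fraction of viable spores kept, fraction of irradiated spores mislabeled)."""
    viable_hits = classify_by_minor_axis(viable_lengths, threshold_um)
    irradiated_hits = classify_by_minor_axis(irradiated_lengths, threshold_um)
    true_positive_rate = sum(viable_hits) / len(viable_hits)
    false_positive_rate = sum(irradiated_hits) / len(irradiated_hits)
    return true_positive_rate, false_positive_rate


# Made-up minor-axis lengths in micrometers, for illustration only.
viable = [0.82, 0.85, 0.79, 0.88, 0.76, 0.90]
irradiated = [0.70, 0.74, 0.81, 0.68, 0.72, 0.77]

tpr, fpr = rates(viable, irradiated, threshold_um=0.80)
print(f"viable kept: {tpr:.2f}, irradiated mislabeled: {fpr:.2f}")
```

Sweeping `threshold_um` over a range of values traces out the trade-off between the two rates, which is how a single operating point such as the one quoted above would be chosen.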
Raman chemical imaging microscopy (RCIM) was then used to measure populations of viable, gamma-irradiated, and autoclaved spores of B. anthracis Sterne, B. atrophaeus, B. megaterium, and B. thuringiensis kurstaki. Significant spectral differences were observed between viable and inactivated spores due to the disappearance of features associated with calcium dipicolinate after irradiation. Principal component analysis showed the ability to distinguish viable spores of B. anthracis Sterne and B. atrophaeus from each other and from the other two Bacillus species.
Finally, Raman microscopy was used to classify mixtures of viable and gamma-inactivated spores. A technique was developed that fuses the size and shape characteristics obtained from the bright-field image to preferentially target viable spores. A practical demonstration of the technique was performed on a field of view containing approximately 7,000 total spores, of which only 12 were viable, to simulate a sample that was not fully irradiated. Ten of these viable spores were properly classified while interrogating just 25% of the total spores.
Title: Robust Adaptive Strategies for Myographic Prosthesis Movement Decoding
Abstract: Improving the condition-tolerance, stability, response time, and dexterity of neural prosthesis control strategies is a major clinical goal in helping amputees achieve natural restorative upper-limb function. Currently, the dominant noninvasive neural source for prosthesis motor control is the skin-surface recorded electromyographic (EMG) signal. Decoding movement intentions from EMG is a challenging problem because this signal type is subject to a high degree of interference from noise and conditional influences. As a consequence, much of the movement intention information contained within the EMG signal has remained significantly under-utilized for the purposes of controlling robotic prostheses. We sought to overcome this information deficit through the use of adaptive strategies for machine learning, sparse representations, and signal processing to significantly improve myographic prosthesis control. This body of research represents the current state-of-the-art in condition-tolerant EMG movement classification (Chapter 3), stable and responsive EMG sequence decoding during movement transitions (Chapter 4), and positional regression to reliably control 7 wrist and finger degrees-of-freedom (Chapter 5). To our knowledge, the methods we describe in Chapter 5 elicit the most dexterous, biomimetic, and natural prosthesis control performance ever obtained from the surface EMG signal.
Title: “Honey I shrank the microscope!” And Other Adventures in Functional Imaging
Abstract: Imaging the brain in action, in awake freely behaving animals without the confounding effect of anesthetics poses unique design and experimental challenges. Moreover, imaging the evolution of disease models in the preclinical setting over their entire lifetime is also difficult with conventional imaging techniques. This lecture will describe the development and applications of a miniaturized microscope that circumvents these hurdles. This lecture will also describe how image acquisition, data visualization and engineering tools can be leveraged to answer fundamental questions in cancer, neuroscience and tissue engineering applications.
Bio: Dr. Pathak is an ideator, educator and mentor focused on transforming lives through the power of imaging. He received the BS in Electronics Engineering from the University of Poona, India. He received his PhD from the joint program in Functional Imaging between the Medical College of Wisconsin and Marquette University. During his PhD he was a Whitaker Foundation Fellow. He completed his postdoctoral fellowship at the Johns Hopkins University School of Medicine in Molecular Imaging. He is currently Associate Professor of Radiology, Oncology and Biomedical Engineering at Johns Hopkins University (JHU). His research is focused on developing new imaging methods, computational models and visualization tools to ‘make visible’ critical aspects of cancer, neurobiology and tissue engineering. His work has been recognized by multiple journal covers and awards including the Bill Negendank Award from the International Society for Magnetic Resonance in Medicine (ISMRM) given to “outstanding young investigators in cancer MRI” and the Career Catalyst Award from the Susan Komen Breast Cancer Foundation. He serves on review panels for national and international funding agencies, and the editorial boards of imaging journals. He is dedicated to mentoring the next generation of imagers and innovators. He has mentored over sixty students, was the recipient of the ISMRM’s Outstanding Teacher Award in 2014, a 125 Hopkins Hero in 2018 for outstanding dedication to the core values of JHU, and a Career Champion Nominee in 2018 for student career guidance and support.
Title: Loss Landscapes of Neural Networks and their Generalization: Theory and Applications
Abstract: In the last decade or so, deep learning has revolutionized entire domains of machine learning. Neural networks have helped achieve significant improvements in computer vision, machine translation, speech recognition, etc. These powerful empirical demonstrations leave a wide gap between our current theoretical understanding of neural networks and their practical performance. The theoretical questions in deep learning can be put under three broad but inter-related themes: 1) Architecture/Representation, 2) Optimization, and 3) Generalization. In this dissertation, we study the landscapes of different deep learning problems to answer questions in the above themes.
First, in order to understand what representations can be learned by neural networks, we study simple Autoencoder networks with one hidden layer of rectified linear units. We connect autoencoders to the well-known problem in signal processing of Sparse Coding. We show that the squared reconstruction error loss function has a critical point at the ground truth dictionary under an appropriate generative model.
Next, we turn our attention to a problem at the intersection of optimization and generalization. Training deep networks through empirical risk minimization is a non-convex problem with many local minima in the loss landscape. A number of empirical studies have observed that “flat minima” for neural networks tend to generalize better than sharper minima. However, quantifying the flatness or sharpness of minima has been an issue due to possible rescaling in neural networks with positively homogeneous activations. We use ideas from Riemannian geometry to define a new measure of flatness that is invariant to rescaling. We test the hypothesis that flatter minima generalize better through a number of different experiments on deep networks.
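The rescaling issue can be seen in a toy example. The sketch below is not the measure proposed in the dissertation; it only illustrates why a naive curvature-based notion of sharpness is ambiguous. It uses a hypothetical scalar two-layer ReLU network, scales the first weight by a positive factor `alpha`, and divides the second by the same factor. The network output is unchanged, yet the curvature of the loss with respect to the first weight shrinks by a factor of `alpha ** 2`.

```python
def relu(z):
    return max(0.0, z)


def network(x, w1, w2):
    """Scalar two-layer network: w2 * relu(w1 * x)."""
    return w2 * relu(w1 * x)


def loss(w1, w2, x, y):
    """Squared error on a single example."""
    return (network(x, w1, w2) - y) ** 2


def curvature_w1(w1, w2, x, y, eps=1e-3):
    """Central finite-difference estimate of d^2 loss / d w1^2."""
    return (loss(w1 + eps, w2, x, y) - 2 * loss(w1, w2, x, y)
            + loss(w1 - eps, w2, x, y)) / eps ** 2


x, y = 1.5, 2.0          # one hypothetical training example
w1, w2 = 0.8, 1.2        # hypothetical weights
alpha = 10.0             # any positive rescaling factor

same_output = network(x, w1, w2), network(x, alpha * w1, w2 / alpha)
sharp = curvature_w1(w1, w2, x, y)
flat = curvature_w1(alpha * w1, w2 / alpha, x, y)
print(same_output, sharp, flat)
```

Because both parameter settings define the same function, any flatness measure meant to predict generalization should assign them the same value, which is the motivation for a rescaling-invariant definition.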
Finally, we apply deep networks to computer vision problems with compressed measurements of natural images and videos. We conduct experiments to characterize the situations in which these networks fail, and those in which they succeed. We train deep networks to perform object detection and classification directly on these compressive measurements of images, without trying to reconstruct the scene first. These experiments are conducted on public datasets as well as datasets specific to a sponsor of our research.
Title: Neural Circuit Mechanisms of Stimulus Selection Underlying Spatial Attention
Thesis Committee: Shreesh P. Mysore, Hynek Hermansky, Mounya Elhilali, Ralph Etienne-Cummings
Abstract: Humans and animals routinely encounter competing pieces of information in their environments, and must continually select the most salient in order to survive and behave adaptively. Here, using computational modeling, extracellular neural recordings, and focal, reversible silencing of neurons in the midbrain of barn owls, we uncovered how two essential computations underlying competitive selection are implemented in the brain: a) the ability to select the most salient stimulus among all pairs of stimulus locations, and b) the ability to signal the most salient stimulus categorically.
We first discovered that a key inhibitory nucleus in the midbrain attention network, called isthmi pars magnocellularis (Imc), encodes visual space with receptive fields that have multiple excitatory hotspots (‘‘lobes’’). Such (previously unknown) multilobed encoding of visual space is necessary for selection at all location-pairs given the scarcity of Imc neurons. Although distributed seemingly randomly, the RF lobe-locations are optimized across the high-firing Imc neurons, allowing them to combinatorially solve selection across space. This combinatorially optimized inhibition strategy minimizes metabolic and wiring costs.
Next, we discovered that a ‘donut-like’ inhibitory mechanism in which each competing option suppresses all options except itself is highly effective at generating categorical responses. It surpasses motifs of feedback inhibition, recurrent excitation, and divisive normalization used commonly in decision-making models. We demonstrated experimentally not only that this mechanism operates in the midbrain spatial selection network in barn owls, but also that it is required for categorical signaling by it. Moreover, the pattern of inhibition in the midbrain forms an exquisitely structured ‘multi-holed’ donut consistent with this network’s combinatorial inhibitory function (computation 1).
Our work demonstrates that the vertebrate midbrain uses seemingly carefully optimized structural and functional strategies to solve challenging computational problems underlying stimulus selection and spatial attention at all location pairs. The neural motifs discovered here represent circuit-based solutions that are generalizable to other brain areas, other forms of behavior (such as decision-making, action selection) as well as for the design of artificial systems (such as robotics, self-driving cars) that rely on the selection of one among many options.
University policy at this present time: Students and faculty CAN attend dissertation defenses as long as there are fewer than 25 people.
Title: Deep Learning Based Novelty Detection
Abstract: In recent years, intelligent systems powered by artificial intelligence and computer vision that perform visual recognition have gained much attention. These systems observe instances and labels of known object classes during training and learn association patterns that can be used during inference. A practical visual recognition system should first determine whether an observed instance is from a known class. If it is from a known class, then the identity of the instance is queried through classification. The former process is commonly known as novelty detection (or novel class detection) in the literature. Given a set of image instances from known classes, the goal of novelty detection is to determine whether an observed image during inference belongs to one of the known classes.
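The two-step recognition process described above (first decide whether the instance is from a known class, then classify it) can be sketched with one common baseline: thresholding the maximum softmax score. This is not necessarily the approach taken in the thesis, and the class names, logits, and threshold below are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw classifier scores into probabilities."""
    shifted = [v - max(logits) for v in logits]  # subtract max for numerical stability
    exps = [math.exp(v) for v in shifted]
    total = sum(exps)
    return [e / total for e in exps]


def recognize(logits, class_names, threshold=0.7):
    """Return a known class name, or 'novel' when confidence is low."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "novel"           # step 1: reject as not belonging to a known class
    return class_names[best]     # step 2: classify among the known classes


classes = ["cat", "dog", "bird"]
print(recognize([4.0, 0.5, 0.2], classes))   # one class clearly dominates
print(recognize([1.0, 0.9, 1.1], classes))   # no class stands out
```

The threshold trades missed novel instances against false rejections of known ones, so in practice it would be tuned on held-out data.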
In this thesis, deep learning-based approaches to novelty detection are studied under four different settings. In the first two settings, the availability of out-of-distribution (OOD) data is assumed. With this assumption, novelty detection can be studied separately for cases where there are multiple known classes and where there is a single known class. These two problem settings are referred to in the literature as multi-class novelty detection with OOD data and one-class novelty detection with OOD data, respectively. It is also possible to study this problem in a more constrained setting where only the data from known classes are considered for training. When there exist multiple classes in this setting, the novelty detection problem is known as multiple-class novelty detection or open-set recognition. On the other hand, when only a single class exists, it is known as one-class novelty detection.
Finally, we study a practical application of novelty detection in mobile Active Authentication (AA). For a practical AA-based novelty detector, latency and efficiency are as important as the detection accuracy. Solutions are presented for the problem of quickly detecting intrusions with lower false detection rates in mobile AA systems with higher resource efficiency. Bayesian and Minimax versions of the Quickest Change Detection (QCD) algorithms are introduced to quickly detect intrusions in mobile AA systems. These algorithms are extended with an update rule to facilitate low-frequency sensing which leads to low utilization of resources.
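The abstract does not spell out the specific QCD procedures, so as an illustration of the minimax flavor, here is a minimal sketch of the classical CUSUM test applied to a stream of authentication match scores. The Gaussian score model, the means, the standard deviation, and the alarm threshold are all assumptions made for this example, and the low-frequency update rule mentioned above is not modeled.

```python
def cusum_alarm_time(scores, mean_genuine, mean_intruder, sigma, threshold):
    """Return the index of the first alarm, or None if no alarm is raised.

    Assumes Gaussian scores with known means before and after the change
    and a shared standard deviation sigma.
    """
    statistic = 0.0
    for t, s in enumerate(scores):
        # Log-likelihood ratio of 'intruder' versus 'genuine' for one score.
        llr = ((mean_intruder - mean_genuine) / sigma ** 2) * (
            s - (mean_genuine + mean_intruder) / 2.0
        )
        # CUSUM recursion: accumulate evidence, but never drop below zero.
        statistic = max(0.0, statistic + llr)
        if statistic >= threshold:
            return t
    return None


# Hypothetical match scores: the genuine user hovers near 0.9,
# and an intruder takes over at index 5 with scores near 0.4.
scores = [0.92, 0.88, 0.91, 0.87, 0.90, 0.42, 0.38, 0.45, 0.40, 0.41]
alarm = cusum_alarm_time(scores, mean_genuine=0.9, mean_intruder=0.4,
                         sigma=0.1, threshold=20.0)
print(alarm)
```

Raising the threshold lowers the false detection rate at the cost of a longer detection delay, which is the latency trade-off the abstract refers to.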
Committee Members: Vishal Patel, Trac Tran, Najim Dehak
Taking place remotely. Email Belinda Blinkoff for more information.
Title: Engineering Earth-Abundant Colloidal Plasmonic and Semiconductor Nanomaterials for Solar Energy Harvesting and Detection Applications
Abstract: Colloidal nanomaterials have shown intriguing optical and electronic properties, making them important building blocks for a variety of applications, including photocatalysis, photovoltaics, and photodetectors. Their morphology and composition are effective tuning knobs for achieving desirable spectral characteristics for specific applications. In addition, they can be synthesized using solution-processed methods which possess the advantages of low cost, facile fabrication, and compatibility with building flexible devices. There is an ongoing quest for better colloidal materials with superior properties and high natural abundance for commercial viability. This thesis focuses on three such materials classes and applications: 1) studying the photophysical properties of earth-abundant plasmonic aluminum nanoparticles, 2) tailoring the optical profiles of semiconductor quantum dot solar cells with near-infrared sensitivity, and 3) using one-dimensional nanostructures for photodetector applications. A variety of analytical techniques and simulations are employed for characterization of both the morphology and optical properties of the nanostructures and for evaluating the performance of nanomaterial-based optoelectronic devices.
The first experimental section of this thesis consists of a systematic study of electron relaxation dynamics in solution-processed large aluminum nanocrystals. Transient absorption measurements are used to obtain the important characteristic relaxation timescales for each thermalization process. We show that several of the relevant timescales in aluminum differ from those in analogous noble metal nanoparticles and propose that surface modification could be a useful tool for tuning heat transfer rates between the nanostructures and solvent. Further systematic studies on the relaxation dynamics in aluminum nanoparticles with tunable sizes show size-dependent phonon vibrational and damping characteristics that are influenced by size polydispersity, surface oxidation, and the presence of organic capping layers on the particles. These studies are significant first steps in demonstrating the feasibility of using aluminum nanomaterials for efficient photocatalysis.
The next section summarizes studies on the design and fabrication of multicolored PbS-based quantum dot solar cells. Specifically, thin film interference effects and multi-objective optimization methods are used to generate cell designs with controlled reflection and transmission spectra resulting in programmable device colors or visible transparency. Detailed investigations into the trade-off between the attainable color or transparency and photocurrent are discussed. The results of this study could be used to enable solar cell window-coatings and other controlled-color optoelectronic devices.
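To illustrate how thin-film interference shapes a reflection spectrum, the sketch below evaluates the standard Airy formula for a single lossless, non-dispersive film at normal incidence. It is a deliberately simplified stand-in: the devices described above are multilayer stacks designed with multi-objective optimization, and the refractive indices and thickness used here are hypothetical rather than values for PbS quantum dot films.

```python
import cmath
import math


def single_film_reflectance(n_in, n_film, n_sub, thickness_nm, wavelength_nm):
    """Reflectance of one lossless film at normal incidence (Airy formula)."""
    r_top = (n_in - n_film) / (n_in + n_film)        # incident medium / film
    r_bottom = (n_film - n_sub) / (n_film + n_sub)   # film / substrate
    # Phase accumulated in one pass through the film.
    delta = 2.0 * math.pi * n_film * thickness_nm / wavelength_nm
    round_trip = cmath.exp(2j * delta)
    r = (r_top + r_bottom * round_trip) / (1.0 + r_top * r_bottom * round_trip)
    return abs(r) ** 2


# Sweep the visible range for a hypothetical 100 nm film (n = 1.5) on a
# substrate with n = 2.25; none of these values are taken from the thesis.
for wavelength in range(400, 701, 100):
    R = single_film_reflectance(1.0, 1.5, 2.25, 100.0, wavelength)
    print(f"{wavelength} nm: R = {R:.4f}")
```

Changing the film thickness shifts which wavelengths are suppressed or reflected, which is the basic lever behind tuning a device's apparent color.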
The last experimental section of this thesis describes work on using 1D antimony selenide nanowires for flexible photodetector applications. A one-pot solution-based synthetic method is developed for producing a molecular ink which allows fabrication of devices on flexible substrates. Thorough characterization of the nanowire composition and morphology is performed. Flexible, broadband antimony selenide nanowire photodetectors are fabricated and show fast response and good mechanical stability. With further tuning of the nanowire size, spectral selectivity should be achievable. The excellent performance of the nanowire photodetectors is promising for the broad implementation of semiconductor inks in flexible photodetectors and photoelectronic switches.
Committee Members: Susanna Thon, Amy Foster, Jin Kang
This presentation happened remotely. Follow this link to view it. Please note that the presentation doesn’t start until 30 minutes into the video.
Title: Learning Spoken Language Through Vision
Abstract: Humans learn spoken language and visual perception at an early age by being immersed in the world around them. Why can’t computers do the same? In this talk, I will describe our work to develop methodologies for grounding continuous speech signals at the raw waveform level to natural image scenes. I will first present self-supervised models capable of jointly discovering spoken words and the visual objects to which they refer, all without conventional annotations in either modality. Next, I will show how the representations learned by these models implicitly capture meaningful linguistic structure directly from the speech signal. Finally, I will demonstrate that these models can be applied across multiple languages, and that the visual domain can function as an “interlingua,” enabling the discovery of word-level semantic translations at the waveform level.
Bio: David Harwath is a research scientist in the Spoken Language Systems group at the MIT Computer Science and Artificial Intelligence Lab (CSAIL). His research focuses on multi-modal learning algorithms for speech, audio, vision, and text. His work has been published at venues such as NeurIPS, ACL, ICASSP, ECCV, and CVPR. Under the supervision of James Glass, his doctoral thesis introduced models for the joint perception of speech and vision. This work was awarded the 2018 George M. Sprowls Award for the best Ph.D. thesis in computer science at MIT.
He holds a Ph.D. in computer science from MIT (2018), a S.M. in computer science from MIT (2013), and a B.S. in electrical engineering from UIUC (2010).
This presentation is happening remotely. Click this link as early as 15 minutes before the scheduled start time of the presentation to watch in a Zoom meeting.
Title: Interpretable End-to-End Neural Network for Audio and Speech Processing
Abstract: This talk introduces extensions of the basic end-to-end automatic speech recognition (ASR) architecture by focusing on its integration function to tackle major problems faced by current ASR technologies in adverse environments, including the cocktail party and data sparseness problems. The first topic is to integrate microphone-array signal processing, speech separation, and speech recognition in a single neural network to realize multichannel multi-speaker ASR for the cocktail party problem. Our architecture is carefully designed to maintain the role of each module as a differentiable subnetwork so that we can jointly optimize the whole network but still keep the interpretability of each subnetwork, including the speech separation, speech enhancement, and acoustic beamforming abilities in addition to ASR. The second topic is based on semi-supervised training using cycle-consistency, which enables us to leverage unpaired speech and/or text data by integrating ASR with text-to-speech (TTS) within the end-to-end framework. This scheme can be regarded as an interpretable disentanglement of audio signals with explicit decomposition of linguistic characteristics by ASR and of speaker and speaking style characteristics by speaker embedding. These explicitly decomposed characteristics are converted back to the original audio signals by neural TTS; thus we form an acoustic feedback loop based on speech recognition and synthesis like human hearing, and both components can be jointly optimized only with the audio data.
This was a virtual seminar that can be viewed by clicking here.
Title: Unifying Human Processes and Machine Models for Spoken Language Interfaces
Abstract: Recent years have witnessed tremendous progress in digital speech interfaces for information access (e.g., Amazon’s Alexa, Google Home, etc.). The commercial success of these applications is hailed as one of the major achievements of the “AI” era. Indeed, these accomplishments are made possible only by sophisticated deep learning models trained on enormous amounts of supervised data over extensive computing infrastructure. Yet these systems are not robust to variations (such as accent and out-of-vocabulary words), remain uninterpretable, and fail in unexpected ways. Most important of all, these systems cannot be easily extended to speech and language disabled users, who would potentially benefit the most from the availability of such technologies. I am a speech scientist interested in computational modelling of the human speech communication system towards building intelligent spoken language systems. I will present my research, in which I have tapped into human speech communication processes to build robust spoken language systems, drawing specifically on theories of phonology and physiological data, including cortical signals in humans as they produce fluent speech. The insights from these studies reveal elegant organizational principles and computational mechanisms employed by the human brain for fluent speech production, the most complex of motor behaviors. These findings hold the key to the next revolution in human-inspired, human-compatible spoken language technologies that, besides alleviating the problems faced by current systems, can meaningfully impact the lives of millions of people with speech disability.
Bio: Gopala Anumanchipalli, PhD, is a researcher at the Department of Neurological Surgery and the Weill Institute for Neurosciences at the University of California, San Francisco. His interests are in i) understanding neural mechanisms of human speech production towards developing next-generation brain-computer interfaces, and ii) computational modelling of human speech communication mechanisms towards building robust speech technologies. Earlier, Gopala was a postdoctoral fellow at UCSF working with Edward F Chang, MD, and previously received his PhD in Language and Information Technologies from Carnegie Mellon University, working with Prof. Alan Black on speech synthesis.