Title: Extending the potential of thin-film optoelectronics via optical and photonic engineering
Project summary: Thin-film optoelectronics based on solution-processed materials have become a major research focus in recent decades. Because of their solution-processed nature, these technologies have proven convenient and versatile in a wide range of applications, including solar power harvesting, photodetection, light-emitting devices and even lasing. Some variants of these materials have also enabled, and now dominate, the field of flexible electronics, especially display technologies. There they reached large-scale industrialization and commercialization years ago, specifically in applications where their conventional counterparts (bulk semiconductors) are limited. The development of optoelectronic applications using organic materials, colloidal quantum dots, perovskites, etc., has been made possible by research progress on two fronts: the materials and chemical engineering of the active material itself, and the optical and photonic engineering of the device architecture and related structures. This project focuses mainly on the latter set of approaches, applied to lead chalcogenide-based colloidal quantum dot thin films.
Colloidal quantum dots (CQDs) are semiconductor nanocrystals (1-10 nm in diameter) of the corresponding bulk material. The spatial confinement of electrons and holes significantly reshapes the energy band structure. This usually manifests as a series of discrete energy levels above the bulk conduction band edge and below the bulk valence band edge, instead of the quasi-continuum of states observed in bulk semiconductors. The spacings between these discrete levels depend strongly on the size of the quantum dots, and this size in turn determines the optical transitions responsible for absorption (Figure 1b), modulation of the refractive index, etc. In this sense, CQDs are considered “tunable”: their properties can be selected by controlling the synthesis so that the ensemble consists predominantly of CQDs of one desired shape and size.
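The size dependence described above can be made concrete with an empirical sizing curve. The sketch below assumes a fit of the form E_g(d) = E_bulk + 1/(a·d² + b·d), with coefficients of the kind often quoted for PbS. Treat the numbers as illustrative assumptions, not as calibration data for any particular synthesis. The helper names are hypothetical.

```python
"""Illustrative size-to-gap mapping for PbS CQDs (assumed empirical fit)."""
import math

# Assumed coefficients of an empirical PbS sizing curve of the form
#   E_g(d) = E_bulk + 1 / (a*d^2 + b*d),   d in nm, E in eV.
# Illustrative only; check against your own absorption spectra.
E_BULK_EV = 0.41  # approximate bulk PbS band gap at room temperature
A_COEFF = 0.0252
B_COEFF = 0.283


def gap_from_diameter(d_nm: float) -> float:
    """Effective (first-exciton) gap in eV for a dot of diameter d_nm."""
    if d_nm <= 0:
        raise ValueError("diameter must be positive")
    return E_BULK_EV + 1.0 / (A_COEFF * d_nm**2 + B_COEFF * d_nm)


def diameter_from_gap(gap_ev: float) -> float:
    """Invert the fit: solve a*d^2 + b*d - 1/(E - E_bulk) = 0 for d > 0."""
    if gap_ev <= E_BULK_EV:
        raise ValueError("gap must exceed the bulk value")
    c = -1.0 / (gap_ev - E_BULK_EV)
    disc = B_COEFF**2 - 4.0 * A_COEFF * c
    return (-B_COEFF + math.sqrt(disc)) / (2.0 * A_COEFF)


if __name__ == "__main__":
    for d in (2.5, 3.0, 4.0, 6.0):
        print(f"d = {d:.1f} nm -> E_g ~ {gap_from_diameter(d):.2f} eV")
    print(f"1.3 eV target -> d ~ {diameter_from_gap(1.3):.2f} nm")
```

Under this assumed fit, the 1.3 eV gap used for PbS CQD solar cells corresponds to dots roughly 3 nm across. Smaller dots give larger gaps.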
CQDs are solution-processed materials. Their processing starts with synthesis from solutions containing metal-organic precursors. Controlled growth of the nanocrystals yields a dispersion of pristine CQDs in a solvent. The CQDs are then purified and chemically treated to modify their surface ligands through a series of precipitation, redispersion, phase-transfer and concentration steps. CQD films are deposited onto the desired substrates by solution-compatible techniques such as spin-casting, blade coating and screen printing. A functional CQD film is typically 10-500 nm thick, depending on the application. It is usually preceded and/or followed by the deposition of other electronically functional device layers.
Lead sulfide (PbS) CQDs are widely used in applications involving solar photon absorption and the resulting energy conversion. In a typical CQD solar cell, PbS CQDs with an effective band gap of 1.3 eV are chosen as the active material. The full device uses a p-n or p-i-n structure. A typical architecture consists of a transparent conductive oxide (TCO) electrode, an electron transport layer (ETL), the absorbing PbS CQD film, a hole transport layer (HTL) and a metal top electrode. Similar structures, with the critical layers substituted, are also used in photodetectors and light-emitting diodes.
In the first section of the project, we studied and exploited the color-reproduction capability that reflective interference gives CQD solar cells, while maintaining high photon absorption and current generation. The second section explores the possibility of simultaneously controlling the spectral reflection, transmission and absorption of thin-film optoelectronics using photonic crystal structures embedded in CQD films and other highly absorptive materials. In the third section, we designed and built a 2D multi-modal scanning characterization system. It maps photoluminescence (PL), transient photocurrent and transient photovoltage with micron resolution across realistically large device areas. The last section focuses on economical and scalable solar concentration solutions for CQD and other thin-film solar cells.
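Reflective interference color in a layered device like the one above can be estimated with the standard characteristic-matrix (transfer-matrix) method at normal incidence. The sketch below is a minimal illustration, not the project's simulation code. It assumes non-magnetic layers, the N = n - ik sign convention for absorbing layers, and placeholder optical constants rather than measured PbS CQD data.

```python
"""Normal-incidence reflectance of a thin-film stack (characteristic-matrix method)."""
import numpy as np


def stack_reflectance(wavelength_nm, layers, n_ambient=1.0, n_substrate=1.5):
    """Reflectance R at normal incidence.

    layers: list of (complex_index, thickness_nm), ordered from the incident
    (ambient) side toward the substrate. Indices use N = n - 1j*k, so k > 0
    means absorption in this sign convention.
    """
    m = np.eye(2, dtype=complex)
    for n_layer, d_nm in layers:
        delta = 2.0 * np.pi * n_layer * d_nm / wavelength_nm  # phase thickness
        layer_matrix = np.array(
            [[np.cos(delta), 1j * np.sin(delta) / n_layer],
             [1j * n_layer * np.sin(delta), np.cos(delta)]]
        )
        m = m @ layer_matrix
    b, c = m @ np.array([1.0, n_substrate], dtype=complex)
    r = (n_ambient * b - c) / (n_ambient * b + c)  # amplitude reflection coefficient
    return float(abs(r) ** 2)


if __name__ == "__main__":
    # Placeholder (assumed, not measured) constants: a weakly absorbing
    # 200 nm film on glass, illuminated from air, swept across the visible.
    stack = [(2.4 - 0.3j, 200.0)]
    for wl in range(400, 701, 50):
        print(wl, round(stack_reflectance(wl, stack), 3))
```

Sweeping wavelength gives the reflectance spectrum that sets the perceived color. For a full cell, one would list the TCO, ETL, CQD and HTL layers and use the metal electrode's complex index as the "substrate".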
Within the scope of this proposal, we mostly limit our discussion and demonstrations to PbS CQD solar cells. However, the techniques and principles described below could be applied to most optoelectronic materials that share solution-compatible deposition and processing procedures.
Title: New Diagnostic and Therapeutic Tools for Intravascular Magnetic Resonance Imaging (IVMRI)
Abstract: Intravascular (IV) magnetic resonance imaging (IVMRI) is a developing technology that uses minimally invasive MRI coils to guide diagnosis and treatment. It combines the signal-to-noise ratio (SNR) enhancement from miniature local MRI coils with the multiple contrast mechanisms provided by MRI. This combination has expanded the possibilities for high-resolution, image-guided diagnosis and treatment of atherosclerosis and of nearby or surrounding cancers. Recent years have seen the development of many advanced MRI techniques, including MRI thermometry and real-time MRI, yet developing procedures that apply these advances to intravascular MRI remains challenging.
Among interventional diagnostic techniques, MRI endoscopy is an IVMRI technique that transfers MRI from the laboratory frame of reference to the IV coil’s frame of reference. This enables high-resolution MRI of blood vessels with endoscope-style functionality. Prior MRI endoscopy work was limited to ~2 frames per second (fps), which is not real-time and is potentially limiting in clinical applications. Further increasing the speed of MRI endoscopy without excessive undersampling artifacts could allow an IVMRI endoscope to be rapidly deployed and advanced entirely under MRI guidance. Such an endoscope could evaluate local, advanced, intra- and extra-vascular disease at high resolution using MRI’s unique multi-contrast and multi-functional assessment capabilities. Furthermore, given its unique capability for high-resolution thermometry, IVMRI is well suited to guiding and monitoring ablation therapy in diseases such as vessel-involving cancers. Prior work using an IVMRI loopless antenna for both MRI and radiofrequency ablation (RFA) was limited in precision and ablated only the tissue in direct contact with the probe. Thus, one goal is to extend IVMRI methods with state-of-the-art real-time MRI acceleration, providing MRI endoscopy at speeds comparable to those of existing catheterization and optical endoscopy procedures.
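Going from ~2 fps to real-time rates forces undersampling. The arithmetic below relates frame rate, repetition time (TR) and readouts per frame to show why. It assumes a radial trajectory, which is common in real-time MRI but is not stated in this abstract. It uses the usual (π/2)·N rule of thumb for the number of radial spokes needed for Nyquist sampling. The TR of 2.5 ms and the 128-pixel base resolution in the usage note are hypothetical values, and the function names are illustrative.

```python
import math


def frame_rate_fps(tr_ms: float, spokes_per_frame: int) -> float:
    """Frames per second when each frame uses `spokes_per_frame` readouts of length TR."""
    return 1000.0 / (tr_ms * spokes_per_frame)


def spokes_for_target_fps(tr_ms: float, fps: float) -> int:
    """Largest number of spokes that still fits in one frame at the target rate."""
    return math.floor(1000.0 / (tr_ms * fps))


def nyquist_spokes(base_resolution: int) -> int:
    """Approximate radial spokes for Nyquist sampling: (pi / 2) * N."""
    return math.ceil(math.pi / 2.0 * base_resolution)


def undersampling_factor(base_resolution: int, spokes_per_frame: int) -> float:
    """How far below Nyquist a frame with the given spoke count is sampled."""
    return nyquist_spokes(base_resolution) / spokes_per_frame


if __name__ == "__main__":
    tr_ms, n = 2.5, 128  # hypothetical sequence parameters
    for fps in (2, 10):
        spokes = spokes_for_target_fps(tr_ms, fps)
        print(fps, "fps:", spokes, "spokes,",
              round(undersampling_factor(n, spokes), 2), "x undersampled")
```

With these assumed values, 2 fps leaves room for 200 spokes, close to the roughly 202 needed for Nyquist sampling. At 10 fps only 40 spokes remain, which is about 5-fold undersampling. This is why faster endoscopy leans on advanced reconstruction and artifact suppression.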
A second goal is to provide a minimally invasive, IV-accessed ablation technology with precise localization and perivascular ablation. Such a technology could render resectable an otherwise inaccessible or non-resectable cancer with vascular involvement.
To these ends, a Max-Planck Institute (MPI) real-time MRI system employing graphics processing units (GPUs) is first adapted to enable MRI endoscopy at 10 fps with real-time display, and is demonstrated in vitro and in vivo. To further improve image quality, we propose using a convolutional neural network (CNN) trained on artifact patterns generated from motionless endoscopy to reduce artifacts during real-time imaging. A new method based on generative models and manifold learning is then proposed to optimize image contrast adaptively as the endoscopic surroundings change.
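One way to obtain supervised pairs for an artifact-removal CNN is to treat a motionless acquisition as a clean reference and simulate the undersampled frames it would produce. The sketch below does this with a random Cartesian phase-encode mask plus a fully sampled k-space center. Both the mask and the function name are assumptions chosen for simplicity, not the trajectory or training pipeline of the MPI system.

```python
import numpy as np


def make_artifact_pair(clean, keep_fraction=0.25, n_center=8, rng=None):
    """Simulate an undersampled frame from a clean (motionless) reference image.

    Returns (degraded, target, mask). The Cartesian line mask is an
    illustrative assumption, not the acquisition of any specific system.
    """
    rng = np.random.default_rng() if rng is None else rng
    clean = np.asarray(clean, dtype=float)
    ny = clean.shape[0]
    # Centered k-space of the reference image (DC at row ny // 2).
    kspace = np.fft.fftshift(np.fft.fft2(clean))
    # Keep a random subset of phase-encode lines plus a fully sampled center.
    mask = rng.random(ny) < keep_fraction
    c = ny // 2
    mask[c - n_center // 2 : c + n_center // 2] = True
    kspace_us = kspace * mask[:, None]
    # Zero-filled reconstruction: this is where undersampling artifacts appear.
    degraded = np.abs(np.fft.ifft2(np.fft.ifftshift(kspace_us)))
    return degraded, clean, mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.random((64, 64))  # stand-in for a motionless endoscopy frame
    pairs = [make_artifact_pair(reference, rng=rng) for _ in range(4)]
    print(len(pairs), pairs[0][0].shape)
```

A network could then be trained to map `degraded` to `clean`, or to predict the residual `degraded - clean`. Either way, it learns what undersampling artifacts look like without needing ground truth from moving scenes.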
To address the second goal, an intravascular ultrasound ablation transducer is integrated with IVMRI to provide a tool that can also deliver therapy. Integrating an IV high-intensity focused ultrasound (HIFU) ablation component extends the precision and depth of ablation and avoids contact injuries. Procedures are developed to evaluate accuracy using ex vivo samples, and feasibility is demonstrated in animals in vivo.
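When MR thermometry monitors ablation, temperature maps are often converted into a thermal dose. The widely used CEM43 model expresses that dose as cumulative equivalent minutes at 43 °C. The sketch below assumes this model and an often-used lethal threshold of 240 CEM43. It is not necessarily the dose criterion used in this work, and the function names are hypothetical.

```python
import numpy as np


def cem43_minutes(temps_c, dt_s):
    """Cumulative equivalent minutes at 43 C along the first (time) axis.

    temps_c: array of shape (n_times, ...) in deg C, e.g. from MR thermometry;
    dt_s: sample spacing in seconds. Uses R = 0.5 at or above 43 C, else 0.25.
    """
    t = np.asarray(temps_c, dtype=float)
    r = np.where(t >= 43.0, 0.5, 0.25)
    return np.sum((dt_s / 60.0) * r ** (43.0 - t), axis=0)


def ablation_mask(temps_c, dt_s, threshold_cem43=240.0):
    """Voxels whose accumulated dose reaches an assumed lethal threshold."""
    return cem43_minutes(temps_c, dt_s) >= threshold_cem43


if __name__ == "__main__":
    # Hypothetical 30 s heating history (1 s frames) for three voxels.
    temps = np.array([[37.0, 50.0, 56.0]] * 30)
    print(cem43_minutes(temps, 1.0), ablation_mask(temps, 1.0))
```

Because the dose grows exponentially with temperature, a few degrees at the edge of the heated zone decide whether tissue is ablated. This is why precise thermometry matters for depth-controlled perivascular ablation.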
Title: Collaborative Regression and Classification via Bootstrapping
Abstract: In modern machine learning problems and applications, the data are both high-dimensional and large in volume, making data analysis time-consuming and computationally demanding. Sparse recovery algorithms have been developed to extract the underlying low-dimensional structure from such data. Classical signal recovery based on l1 minimization solves the least-squares problem over all available measurements with sparsity-promoting regularization. It has shown promising performance in regression and classification. Compressed Sensing (CS) theory shows that when the true solution is sparse and the number of measurements is large enough, solutions of l1 minimization converge to the ground truth. In practice, however, conventional l1 minimization algorithms tend to struggle in signal recovery in three situations: when the number of measurements is low, when the noise level is high, or when measurements arrive sequentially in a streaming fashion.
This work aims to make efficient global predictions in the challenging scenarios described above by using multiple local measurement sets, generated by bootstrap resampling or sub-sampling. We develop two main approaches. One generalizes the conventional bagging scheme for sparse regression beyond a fixed bootstrapping ratio. The other, called JOBS, enforces support consistency among the bootstrapped estimators in a collaborative fashion. We first derive rigorous theoretical guarantees for both approaches and then evaluate them through extensive simulations to quantify their performance. Our algorithms are more robust than conventional l1 minimization, especially in scenarios with high measurement noise and a low number of measurements. Our theoretical analysis also provides key guidance on choosing optimal parameters, including bootstrapping ratios and the number of collaborative estimates. Finally, we demonstrate that the proposed approaches yield significant performance gains in both sparse regression and classification, two crucial problems in signal processing and machine learning.
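The two ideas above can be sketched with plain proximal-gradient solvers. `bagged_lasso` is classic bagging: it solves an l1-regularized least-squares (Lasso) problem by ISTA on each bootstrap resample and averages the results. `joint_sparse_bootstrap` is an assumed illustration of "collaborative" estimation. It couples the bootstrap estimates through a row-wise l1/l2 (group) penalty so that they share one support. This coupling is our hypothetical stand-in and is not the published JOBS formulation. All function names, the ratio parameter and the penalty scaling are assumptions.

```python
import numpy as np


def soft_threshold(v, tau):
    """Proximal operator of tau * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def lasso_ista(a, y, lam, n_iter=1500):
    """Minimize 0.5*||a @ x - y||^2 + lam*||x||_1 by ISTA (proximal gradient)."""
    step = 1.0 / np.linalg.norm(a, 2) ** 2  # 1 / Lipschitz constant of the gradient
    x = np.zeros(a.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x - step * (a.T @ (a @ x - y)), step * lam)
    return x


def bootstrap_indices(m, ratio, n_boot, rng):
    """Row indices for n_boot resamples of size ratio*m, drawn with replacement."""
    size = max(1, int(round(ratio * m)))
    return [rng.integers(0, m, size=size) for _ in range(n_boot)]


def bagged_lasso(a, y, lam, ratio=1.0, n_boot=20, seed=0):
    """Bagging: solve the Lasso on each bootstrap resample, then average."""
    rng = np.random.default_rng(seed)
    fits = [lasso_ista(a[idx], y[idx], lam)
            for idx in bootstrap_indices(len(y), ratio, n_boot, rng)]
    return np.mean(fits, axis=0)


def joint_sparse_bootstrap(a, y, lam, ratio=1.0, n_boot=20, seed=0, n_iter=1500):
    """Assumed collaborative variant: a row-wise l2 penalty couples the supports.

    Minimizes sum_k 0.5*||A_k x_k - y_k||^2 + lam*sqrt(K)*sum_i ||X[i, :]||_2,
    where column k of X is the k-th bootstrap estimate, then averages columns.
    """
    rng = np.random.default_rng(seed)
    blocks = [(a[idx], y[idx]) for idx in bootstrap_indices(len(y), ratio, n_boot, rng)]
    step = 1.0 / max(np.linalg.norm(ak, 2) ** 2 for ak, _ in blocks)
    tau = step * lam * np.sqrt(n_boot)
    x = np.zeros((a.shape[1], n_boot))
    for _ in range(n_iter):
        grad = np.column_stack([ak.T @ (ak @ x[:, k] - yk)
                                for k, (ak, yk) in enumerate(blocks)])
        v = x - step * grad
        norms = np.linalg.norm(v, axis=1, keepdims=True)
        # Group soft-thresholding: each feature is kept or zeroed in all estimates.
        x = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0) * v
    return x.mean(axis=1)
```

`ratio` plays the role of the bootstrapping ratio, and `n_boot` the number of collaborative estimates. These are the parameters the abstract says the theory helps choose. The sqrt(K) scaling keeps the per-estimate shrinkage comparable to the single-Lasso λ.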
Title: Brain structure segmentation using multiple MRI pulse sequences
Abstract: Medical image segmentation is the process of delineating anatomical structures of interest in images. Automatic segmentation algorithms applied to brain magnetic resonance images (MRI) allow for the processing of large volumes of data for the study of neurodegenerative diseases. Widely used segmentation software packages only require T1-weighted (T1-w) MRI and segment cortical and subcortical structures, but are unable to segment structures that do not appear in T1-w MRI. Other MRI pulse sequences have properties that allow for the segmentation of structures that are invisible (or barely discernible) in T1-w MRI.
In this dissertation, three novel medical image segmentation algorithms are proposed to segment the following structures of interest: the thalamus; the falx and tentorium; and the meninges. The common theme that connects these segmentation algorithms is that they use information from multiple MRI pulse sequences because the structures they target are nearly invisible in T1-w MRI. Segmentation of these structures is used in the study of neurodegenerative diseases such as multiple sclerosis and for the development of computational models of the brain for the study of traumatic brain injury.
Our automatic thalamus and thalamic nuclei segmentation algorithm extracts features from T1-w MRI, T2-w MRI, and diffusion tensor imaging (DTI) to train a random forest classifier. In leave-one-out cross-validation on nine subjects, our algorithm achieves mean Dice coefficients of 0.897 and 0.902 for the left and right thalami, respectively. These are higher than the Dice scores of the three state-of-the-art methods we compared against.
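The Dice coefficient reported here and below measures overlap between an automatic and a manual segmentation as 2|A ∩ B| / (|A| + |B|). It ranges from 0 (no overlap) to 1 (identical). A minimal sketch follows. The per-label helper and the convention of scoring two empty masks as 1.0 are assumptions for illustration.

```python
import numpy as np


def dice(seg_a, seg_b):
    """Dice coefficient 2|A & B| / (|A| + |B|) of two binary masks."""
    a = np.asarray(seg_a, dtype=bool)
    b = np.asarray(seg_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * np.logical_and(a, b).sum() / total


def per_label_dice(labels_a, labels_b, label_ids):
    """Dice for each structure in two label maps (e.g. left/right thalamus ids)."""
    labels_a = np.asarray(labels_a)
    labels_b = np.asarray(labels_b)
    return {lab: dice(labels_a == lab, labels_b == lab) for lab in label_ids}
```

For example, `per_label_dice(auto, manual, [LEFT_ID, RIGHT_ID])` would return one score per thalamus, with the label ids being whatever a given atlas uses.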
Our falx and tentorium segmentation algorithm uses T1-w MRI and susceptibility-weighted imaging (SWI) to register multiple atlases, then fuses their boundary points to generate a subject-specific falx and tentorium. Compared with single-atlas approaches, our method achieves the lowest mean surface distances to manual delineations: 0.86 mm for the falx and 0.99 mm for the tentorium.
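Mean surface distance compares two surfaces through their boundary points. One common symmetric definition averages, in both directions, each point's distance to the nearest point on the other surface. The brute-force sketch below assumes that definition and point coordinates in millimetres. Other papers use one-sided or differently weighted variants.

```python
import numpy as np


def mean_surface_distance(points_a, points_b):
    """Symmetric mean of nearest-neighbour distances between two point sets.

    points_a: (N, 3) and points_b: (M, 3) boundary points, e.g. in mm.
    Brute force, O(N*M) memory; fine for small surfaces.
    """
    a = np.asarray(points_a, dtype=float)
    b = np.asarray(points_b, dtype=float)
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # pairwise distances
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())
```

For realistic meshes with many thousands of vertices, a KD-tree query (for example SciPy's `cKDTree`) avoids building the full distance matrix.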
Our meninges reconstruction algorithm uses T1-w MRI, T2-w MRI, and a synthetic computed tomography (CT) image generated by a convolutional neural network to find two layers of the meninges: the subarachnoid space and the dura mater. We compare our method with other brain extraction and intracranial volume estimation algorithms. Our method produces a subarachnoid space segmentation with a mean Dice score of 0.991, comparable to the top-performing state-of-the-art method. It also produces a dura mater segmentation with a mean Dice score of 0.983, the highest among the compared methods.
Title: Minimally-Invasive Lens-free Computational Microendoscopy
Abstract: Ultra-miniaturized imaging tools are vital for numerous biomedical applications. Such minimally invasive imagers allow navigation into hard-to-reach regions and, for example, observation of deep brain activity in freely moving animals with minimal collateral tissue damage. Conventional solutions employ distal microlenses. However, as lenses become smaller, and thus less invasive, they suffer greater optical aberrations, requiring bulkier compound designs with a restricted field of view. In addition, tools capable of 3-dimensional volumetric imaging require components that physically scan the focal plane, which increases distal complexity, footprint, and weight. Simply put, the information capacity of minimally invasive imaging systems is limited by their cross-sectional area.
This thesis explores minimally invasive, lens-free microendoscopy enabled by the integration of signal processing, optical hardware, and image reconstruction algorithms. Several computational microendoscopy architectures that simultaneously achieve miniaturization and high information content are presented. Computational imaging techniques enable color-resolved imaging with a wide field of view. They also enable 3-dimensional volumetric reconstruction of an unknown scene from a single camera frame without any actuated parts. Together, these further advance the performance-versus-invasiveness trade-off of microendoscopy.
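In many lens-free imagers, the sensor records the scene convolved with a characteristic point-spread function (PSF), and the image is recovered computationally. The sketch below assumes a shift-invariant, circular-convolution model and recovers the scene with closed-form Tikhonov (Wiener-style) deconvolution in the Fourier domain. It is a generic illustration, not the reconstruction used in this thesis, and the function names are hypothetical.

```python
import numpy as np


def lensless_forward(scene, psf):
    """Assumed shift-invariant model: measurement = scene circularly convolved with psf."""
    return np.real(np.fft.ifft2(np.fft.fft2(scene) * np.fft.fft2(psf)))


def tikhonov_deconvolve(measurement, psf, reg=1e-3):
    """Closed-form minimizer of ||h * x - y||^2 + reg * ||x||^2 via the FFT."""
    h = np.fft.fft2(psf)
    x_hat = np.conj(h) * np.fft.fft2(measurement) / (np.abs(h) ** 2 + reg)
    return np.real(np.fft.ifft2(x_hat))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    psf = rng.random((64, 64))  # stand-in for a calibrated, pseudo-random PSF
    scene = np.zeros((64, 64))
    scene[20:30, 30:40] = 1.0
    noisy = lensless_forward(scene, psf) + 0.01 * rng.standard_normal((64, 64))
    rec = tikhonov_deconvolve(noisy, psf, reg=1e-2)
    print(np.linalg.norm(rec - scene) / np.linalg.norm(scene))
```

Volumetric reconstruction from one frame extends the same idea: the measurement is modelled as a sum of per-depth convolutions, each with its own PSF. That larger inverse problem is usually solved iteratively rather than in closed form.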
Title: Soroban: A Mixed-Signal Neuromorphic Processing in Memory Architecture
Abstract: The demand for data-intensive processing keeps growing, ranging from everyday tasks such as image-based search to serious ones such as disease diagnosis in personalized medicine. To meet it, we urgently need a new cloud computing paradigm and energy-efficient (i.e., “green”) technologies. We believe that a brain-inspired approach employing unconventional processing offers an alternative paradigm for BIGDATA computing.
My research aims to go beyond state-of-the-art processor-in-memory architectures. Among unconventional processors, charge-based computing has been an attractive solution since its introduction with charge-coupled device (CCD) imagers in the 1970s. Such architectures have been adapted into compute-in-memory arrays used for signal processing, neural networks and pattern recognition, relying on the same underlying physics. Other work has applied the same concept in charge-injection devices (CIDs), which have also been used for similar pattern recognition tasks. However, these computing elements have not been integrated with the support infrastructure for high-speed input/output that BIGDATA streaming applications require. In this work, the CID concept is ported to a smaller 55 nm CMOS node, where it has shown promising preliminary results as a multilevel-input computing element for hardware inference. A mixed-signal, charge-based vector-vector multiplier (VMM) is explored that computes directly on a common readout line of a dynamic random-access memory (DRAM). Low power consumption and high area density are achieved by storing local parameters in a DRAM computing crossbar.
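To illustrate why computing "directly on a common readout line" yields a dot product, the behavioral sketch below models an idealized charge-sharing multiply-accumulate. Each cell holds a stored bit w_i and is charged in proportion to a quantized multilevel input q_i. All cells then share charge with the line capacitance, so the line voltage is proportional to the sum of w_i·q_i. The capacitances, level count and function names are hypothetical, and the model ignores leakage, mismatch and readout noise. It is not the Soroban circuit.

```python
import numpy as np


def quantize_inputs(x, levels):
    """Map inputs in [0, 1] onto `levels` evenly spaced multilevel values."""
    x = np.clip(np.asarray(x, dtype=float), 0.0, 1.0)
    return np.round(x * (levels - 1)) / (levels - 1)


def charge_share_dot(x, w, levels=4, c_cell=1e-15, c_line=10e-15, v_ref=1.0):
    """Idealized line voltage after charge sharing (hypothetical parameters).

    Cell i stores charge c_cell * w_i * q_i * v_ref (w_i in {0, 1}); connecting
    all cells to the line divides the total charge by the total capacitance.
    """
    q = quantize_inputs(x, levels)
    w = np.asarray(w, dtype=float)
    charge = c_cell * v_ref * np.dot(w, q)
    return charge / (c_line + len(q) * c_cell)
```

Because the denominator is fixed for a given array size, the output is a scaled copy of the digital dot product. An ADC on the line then recovers the result.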
Title: Semi-supervised training for automatic speech recognition
Abstract: State-of-the-art automatic speech recognition (ASR) systems use sequence-level objectives such as Connectionist Temporal Classification (CTC) and Lattice-free Maximum Mutual Information (LF-MMI) to train neural network-based acoustic models. These methods are known to be most effective with large datasets of hundreds or thousands of hours. Large amounts of supervised data are difficult to obtain except in a few major languages such as English and Mandarin. Supervised data are also difficult to obtain across the myriad of channel and environmental conditions. On the other hand, large amounts of unsupervised audio can be obtained fairly easily. Enormous amounts are available from broadcast TV, call centers and YouTube, in many different languages and environmental conditions. The goal of this research is to discover how best to leverage the available unsupervised data for training acoustic models for ASR.
In the first part of this thesis, we extend the Maximum Mutual Information (MMI) training to the semi-supervised training scenario. We show that maximizing Negative Conditional Entropy (NCE) over lattices from unsupervised data, along with state-level Minimum Bayes Risk (sMBR) on supervised data, in a multi-task architecture gives word error rate (WER) improvements without needing any confidence-based filtering.
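The negative conditional entropy objective rewards the model for being confident about the transcription of unsupervised audio: NCE = Σ_w p(w|O) log p(w|O). The sketch below approximates a lattice with an N-best list of hypothesis log-scores. That approximation, the acoustic-scale parameter and the function names are assumptions. A real implementation would compute the same quantity over all lattice paths with forward-backward.

```python
import numpy as np


def path_posteriors(log_scores, acoustic_scale=1.0):
    """Normalize scaled hypothesis log-scores into posteriors p(w | O)."""
    s = acoustic_scale * np.asarray(log_scores, dtype=float)
    s -= s.max()  # numerical stability before exponentiating
    p = np.exp(s)
    return p / p.sum()


def negative_conditional_entropy(log_scores, acoustic_scale=1.0):
    """-H(W | O) = sum_w p(w | O) log p(w | O) over the listed hypotheses."""
    p = path_posteriors(log_scores, acoustic_scale)
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz])))
```

The value is 0 when one hypothesis holds all the probability mass and reaches -log(N) when N hypotheses are equally likely. Maximizing it therefore pushes the model toward confident decisions on untranscribed data without a hard confidence filter.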
In the second part of this thesis, we investigate using lattice-based supervision as the numerator graph to incorporate the uncertainties of unsupervised data into the LF-MMI training framework. We explore various aspects of creating the numerator graph. These include splitting lattices for minibatch training, applying tolerance to frame-level alignments, pruning beam sizes, word LM scale, and the inclusion of pronunciation variants. We show that the WER recovery rate (WRR) of our proposed approach is 5-10% absolute better than that of the baseline using 1-best transcripts as supervision. It also remains stable in the 40-60% range even on large-scale setups and across multiple languages.
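WER recovery rate expresses how much of the gap between a seed model and an oracle is closed. The seed model is trained on supervised data only. The oracle is also trained on the true transcripts of the unsupervised set. The sketch below assumes that usual definition, WRR = (WER_seed - WER_semi) / (WER_seed - WER_oracle). The numbers in the usage note are hypothetical.

```python
def wer_recovery_rate(wer_seed: float, wer_semi: float, wer_oracle: float) -> float:
    """Fraction of the seed-to-oracle WER gap recovered by semi-supervised training."""
    gap = wer_seed - wer_oracle
    if gap <= 0:
        raise ValueError("oracle WER must be lower than seed WER")
    return (wer_seed - wer_semi) / gap
```

For instance, with a hypothetical seed WER of 30%, an oracle WER of 20% and a semi-supervised WER of 25%, the WRR is 0.5. This falls within the 40-60% range reported above.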
Finally, we explore transfer learning for the scenario where we have unsupervised data in a mismatched domain. First, we look at the teacher-student learning approach for cases where parallel data is available in source and target domains. Here, we train a “student” neural network on the target domain to mimic a “teacher” neural network on the source domain data, but using sequence-level posteriors instead of the traditional approach of using frame-level posteriors.
We show that the proposed approach is very effective in dealing with acoustic domain mismatch in multiple unsupervised domain adaptation scenarios: clean to noisy speech, 8 kHz to 16 kHz speech, and close-talk to distant microphone.
Second, we investigate approaches to mitigate language domain mismatch, and show that a matched language model significantly improves WRR. Finally, we show that our proposed semi-supervised transfer learning approach works effectively even on large-scale unsupervised datasets with 2000 hours of audio in natural and realistic conditions.
Title: Strategies for Handling Out-of-Vocabulary Words in Automatic Speech Recognition
Abstract: Nowadays, most automatic speech recognition (ASR) systems deployed in industry are closed-vocabulary systems: they can recognize only a limited vocabulary of words whose pronunciations are provided to the system. Words outside this vocabulary are called out-of-vocabulary (OOV) words; either their pronunciations, or both their spellings and pronunciations, are unknown to the system. There are two basic motivations for developing strategies to handle OOV words. First, in the training phase, missing or wrong pronunciations of words in the training data result in poor acoustic models. Second, in the test phase, OOV words cannot be recognized at all, and misrecognizing them may also degrade the recognition of their in-vocabulary neighbors. This dissertation is therefore dedicated to exploring strategies for handling OOV words in closed-vocabulary ASR.
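A first diagnostic for any closed-vocabulary system is the token-level OOV rate of a corpus against the recognition vocabulary. The sketch below computes it together with the most frequent OOV word types, which are the natural candidates for pronunciation learning. The function name is hypothetical.

```python
from collections import Counter


def oov_report(tokens, vocabulary):
    """Token-level OOV rate and OOV word types sorted by frequency."""
    vocab = set(vocabulary)
    tokens = list(tokens)
    oov_counts = Counter(t for t in tokens if t not in vocab)
    rate = sum(oov_counts.values()) / len(tokens) if tokens else 0.0
    return rate, oov_counts.most_common()
```

Both the rate and the ranked list matter. A few frequent OOV types can often be fixed by adding learned pronunciations to the lexicon. A long tail of rare OOVs is what detection-and-recovery methods must handle at test time.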
First, we investigate dealing with OOV words in ASR training data by introducing an acoustic-data-driven pronunciation learning framework. It uses a likelihood-reduction-based criterion to greedily select pronunciation candidates from multiple sources, i.e., standard grapheme-to-phoneme (G2P) algorithms and phonetic decoding. This framework effectively expands a small hand-crafted pronunciation lexicon to cover OOV words. The learned pronunciations have higher quality than those obtained from G2P alone or with other baseline pruning criteria. Furthermore, applying the framework to generate alternative pronunciations for in-vocabulary (IV) words improves both recognition performance on the relevant words and overall acoustic model performance.
Second, we investigate dealing with OOV words in ASR test data, i.e. OOV detection and recovery. We first conduct a comparative study of a hybrid lexical model (HLM) approach for OOV detection, and several baseline approaches, with the conclusion that the HLM approach outperforms others in both OOV detection and first pass OOV recovery performance. Next, we introduce a grammar-decoding framework for efficient second pass OOV recovery, showing that with properly designed schemes of estimating OOV unigram probabilities, the framework significantly improves OOV recovery and overall decoding performance compared to first pass decoding.
Finally, we propose an open-vocabulary word-level recurrent neural network language model (RNNLM) rescoring framework. It makes it possible to rescore lattices containing recovered OOVs using a word-level RNNLM that was unaware of those OOVs when it was trained. Overall, the complete OOV recovery pipeline shows the potential of a highly efficient open-vocabulary word-level ASR decoding framework, tightly integrated into a standard WFST decoding pipeline.
Title: Advanced Image Reconstruction and Analysis for Fluorescence Molecular Tomography (FMT) and Positron Emission Tomography (PET)
Abstract: Molecular imaging provides efficient ways to monitor different biological processes noninvasively, and high-quality imaging is necessary in order to fully explore the value of molecular imaging. To this end, advanced image generation algorithms are able to significantly improve image quality and quantitative performance. In this research proposal, we focus on two imaging modalities, fluorescence molecular tomography (FMT) and positron emission tomography (PET), that fall in the category of molecular imaging. Specifically, we studied the following two problems: i) reconstruction problem in FMT and ii) partial volume correction in brain PET imaging.
Reconstruction in FMT: FMT is an optical imaging modality that uses diffuse light. The FMT reconstruction problem is highly ill-posed due to photon scattering in biological tissue, so regularization techniques are commonly used to alleviate its ill-posedness. Conventional reconstruction algorithms oversmooth, which reduces the resolution of the reconstructed images. Moreover, a Gaussian noise model is commonly chosen, although most FMT systems based on charge-coupled devices (CCDs) or photomultiplier tubes (PMTs) are contaminated by Poisson noise. In our work, we propose an FMT reconstruction algorithm using sparsity-initialized maximum-likelihood expectation maximization (MLEM). The algorithm preserves edges by exploiting sparsity while also accounting for Poisson noise. Through simulation experiments, we compare the proposed method with a pure sparse reconstruction method and with MLEM using uniform initialization. We show that the proposed method holds several advantages over both.
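The two ingredients named above can be sketched together. A sparse, non-negative l1-regularized least-squares solution (projected ISTA) provides the starting point. The standard Poisson MLEM update x ← x / (Aᵀ1) · Aᵀ(y / Ax) then refines it. The small positive floor, the regularization weight and the function names are assumptions, not the authors' settings. The floor is needed because MLEM's multiplicative update can never revive a voxel that starts at exactly zero.

```python
import numpy as np


def sparse_init(a, y, lam, n_iter=300, floor=1e-3):
    """Non-negative l1-regularized least squares (projected ISTA) as initializer."""
    step = 1.0 / np.linalg.norm(a, 2) ** 2
    x = np.zeros(a.shape[1])
    for _ in range(n_iter):
        # Prox of lam*||x||_1 restricted to x >= 0 is a one-sided soft threshold.
        x = np.maximum(x - step * (a.T @ (a @ x - y)) - step * lam, 0.0)
    # Assumed design choice: a small positive floor keeps every voxel updatable.
    return x + floor * max(float(x.max()), 1e-12)


def mlem(a, y, x0, n_iter=200, eps=1e-12):
    """Poisson ML-EM iterations: x <- x / (A^T 1) * A^T (y / (A x))."""
    sens = a.T @ np.ones(a.shape[0])  # sensitivity of each voxel
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_iter):
        ratio = y / np.maximum(a @ x, eps)
        x = x / np.maximum(sens, eps) * (a.T @ ratio)
    return x


def poisson_loglik(a, x, y, eps=1e-12):
    """Poisson log-likelihood up to a constant: sum(y log(Ax) - Ax)."""
    ax = np.maximum(a @ x, eps)
    return float(np.sum(y * np.log(ax) - ax))
```

Each MLEM iteration keeps the estimate non-negative and does not decrease the Poisson likelihood. Starting from the sparse solution rather than a uniform image is what carries the edge-preserving prior into the maximum-likelihood refinement.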
Partial volume correction of brain PET imaging: The so-called partial volume effect (PVE) is caused by the limited resolution of PET systems and reduces the quantitative accuracy of PET imaging. Depending on the stage at which they are applied, partial volume correction (PVC) algorithms can be categorized into reconstruction-based and post-reconstruction methods. Post-reconstruction PVC methods can be applied directly to reconstructed PET images and do not require access to the raw data or reconstruction algorithms of PET scanners. Many of these methods use anatomical information from MRI to further improve their performance. However, conventional MR-guided post-reconstruction PVC methods require segmentation of the MR images and assume a uniform activity distribution within each segmented region. In this proposal, we develop a post-reconstruction PVC method based on deconvolution with parallel level set regularization. It is implemented as a non-smooth optimization using the split Bregman method. The proposed method incorporates MRI information without requiring segmentation or making any assumption about the activity distribution. Simulation experiments compare the proposed method with several other segmentation-free methods as well as a conventional segmentation-based PVC method. The results show that the proposed method outperforms the other segmentation-free methods and is more robust to MR information mismatch than the conventional segmentation-based PVC method.
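The size of the partial volume effect can be estimated with a simple 1D model. A uniform region of width w, blurred by a Gaussian point-spread function, keeps only a fraction erf(w / (2√2·σ)) of its true peak value, where σ = FWHM / (2√(2 ln 2)). The sketch below computes this recovery coefficient. The 1D geometry and the example widths are simplifying assumptions, and real 3D structures lose even more.

```python
import math


def recovery_coefficient(width_mm: float, fwhm_mm: float) -> float:
    """Peak of a 1D uniform region after Gaussian blur, relative to its true value."""
    sigma = fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return math.erf(width_mm / (2.0 * math.sqrt(2.0) * sigma))


if __name__ == "__main__":
    for w in (2.0, 4.0, 8.0, 16.0):  # hypothetical structure widths
        print(w, "mm:", round(recovery_coefficient(w, 4.0), 3))
```

Under an assumed 4 mm FWHM, a 2 mm-thick structure retains less than half of its true activity. A structure as wide as the FWHM retains roughly three quarters. This loss is what PVC methods try to undo.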
Note: This is a virtual seminar that will be broadcast in Olin Hall 305. Refreshments will be available outside Olin Hall 305 at 2:30 PM.
Title: Computational infrastructure to improve scientific reproducibility
Abstract: The massive increase in the dimensionality of scientific data and the proliferation of complex data analysis methods have raised increasing concerns about the reproducibility of scientific results in many domains of science. I will first present evidence that analytic flexibility in neuroimaging research is associated with surprising variability in scientific outcomes in the wild, even when the raw data are held constant. These findings motivate the development of well-tested software tools for neuroimaging data processing and analysis. I will focus in particular on the role of software development tools such as containerization and continuous integration, which offer the potential to deliver automated and reproducible data analysis at scale. I will also discuss the challenging tradeoffs inherent in scientists' use of complex software, and the need for increased transparency and validation of scientific software.
Bio: Russell A. Poldrack is the Albert Ray Lang Professor in the Department of Psychology and Professor (by courtesy) of Computer Science at Stanford University, and Director of the Stanford Center for Reproducible Neuroscience. His research uses neuroimaging to understand the brain systems underlying decision making and executive function. His lab is also engaged in the development of neuroinformatics tools to help improve the reproducibility and transparency of neuroscience, including the Openneuro.org and Neurovault.org data sharing projects and the Cognitive Atlas ontology.