Title: Exploring scalable coating of inorganic semiconductor inks: the surface structure-property-performance correlations
Abstract: Inorganic semiconductor inks – such as colloidal quantum dots (CQDs) and transition metal oxides (MOs) – can potentially enable low-cost flexible and transparent electronics via ‘roll-to-roll’ printing. The surfaces of these nanometer-sized CQDs and MO ultra-thin films give rise to surface phenomena with implications for film formation during coating, crystallinity and charge transport. In this talk, I will describe my recent efforts aimed at understanding the crucial role of surface structure in these materials using photoemission spectroscopy and X-ray scattering. Time-resolved X-ray scattering helps reveal the various stages of the CQD ink-to-film transformation during blade-coating. Interesting insights include evidence of an early onset of CQD nucleation toward self-assembly and superlattice formation. I will close by discussing fresh results which suggest that nanoscale morphology significantly impacts charge transport in MO ultra-thin (≈5 nm) films. Control over crystallographic texture and film densification allows us to achieve high-performing (electron mobility ≈40 cm² V⁻¹ s⁻¹), blade-coated MO thin-film transistors.
Bio: Dr. Ahmad R. Kirmani is a Guest Researcher in the Materials Science and Engineering Division, National Institute of Standards and Technology (NIST) in the group of Dr. Dean M. DeLongchamp and Dr. Lee J. Richter. He is exploring scalable coating of inorganic semiconductor inks using X-ray scattering. He received his PhD in Materials Science and Engineering from the King Abdullah University of Science and Technology (KAUST) under the supervision of Prof. Aram Amassian in 2017 for probing the surface structure-property relationship in colloidal quantum dot photovoltaics. He has published 30 articles in high-impact journals such as Advanced Materials, ACS Energy Letters and the Nature family. He has also been a volunteer science writer for the Materials Research Society (MRS) for the past couple of years, contributing 10 news articles, opinions and perspectives.
Title: A Theory and Practice of the Lifelong Learnable Forest
Abstract: Since Vapnik’s and Valiant’s seminal papers on learnability, various lines of research have generalized their concepts of learning and learners. In this paper, we formally define what it means to be a lifelong learner. Given this definition, we propose the first lifelong learning algorithm with theoretical guarantees that it can perform forward transfer and reverse transfer, while not experiencing catastrophic forgetting. Our algorithm, dubbed Lifelong Learning Forests, outperforms the current state-of-the-art deep lifelong learning algorithm on the CIFAR 10-by-10 challenge problem, despite its simplicity and mathematical tractability. Our approach immediately lends itself to further algorithmic developments that promise to exceed current performance limits of existing approaches.
Title: “Honey I shrank the microscope!” And Other Adventures in Functional Imaging
Abstract: Imaging the brain in action in awake, freely behaving animals, without the confounding effect of anesthetics, poses unique design and experimental challenges. Moreover, imaging the evolution of disease models in the preclinical setting over their entire lifetime is also difficult with conventional imaging techniques. This lecture will describe the development and applications of a miniaturized microscope that circumvents these hurdles. It will also describe how image acquisition, data visualization and engineering tools can be leveraged to answer fundamental questions in cancer, neuroscience and tissue engineering applications.
Bio: Dr. Pathak is an ideator, educator and mentor focused on transforming lives through the power of imaging. He received the BS in Electronics Engineering from the University of Poona, India. He received his PhD from the joint program in Functional Imaging between the Medical College of Wisconsin and Marquette University. During his PhD he was a Whitaker Foundation Fellow. He completed his postdoctoral fellowship at the Johns Hopkins University School of Medicine in Molecular Imaging. He is currently Associate Professor of Radiology, Oncology and Biomedical Engineering at Johns Hopkins University (JHU). His research is focused on developing new imaging methods, computational models and visualization tools to ‘make visible’ critical aspects of cancer, neurobiology and tissue engineering. His work has been recognized by multiple journal covers and awards including the Bill Negendank Award from the International Society for Magnetic Resonance in Medicine (ISMRM) given to “outstanding young investigators in cancer MRI” and the Career Catalyst Award from the Susan Komen Breast Cancer Foundation. He serves on review panels for national and international funding agencies, and the editorial boards of imaging journals. He is dedicated to mentoring the next generation of imagers and innovators. He has mentored over sixty students, was the recipient of the ISMRM’s Outstanding Teacher Award in 2014, a 125 Hopkins Hero in 2018 for outstanding dedication to the core values of JHU, and a Career Champion Nominee in 2018 for student career guidance and support.
This presentation happened remotely. Follow this link to view it. Please note that the presentation doesn’t start until 30 minutes into the video.
Title: Learning Spoken Language Through Vision
Abstract: Humans learn spoken language and visual perception at an early age by being immersed in the world around them. Why can’t computers do the same? In this talk, I will describe our work to develop methodologies for grounding continuous speech signals at the raw waveform level to natural image scenes. I will first present self-supervised models capable of jointly discovering spoken words and the visual objects to which they refer, all without conventional annotations in either modality. Next, I will show how the representations learned by these models implicitly capture meaningful linguistic structure directly from the speech signal. Finally, I will demonstrate that these models can be applied across multiple languages, and that the visual domain can function as an “interlingua,” enabling the discovery of word-level semantic translations at the waveform level.
Bio: David Harwath is a research scientist in the Spoken Language Systems group at the MIT Computer Science and Artificial Intelligence Lab (CSAIL). His research focuses on multi-modal learning algorithms for speech, audio, vision, and text. His work has been published at venues such as NeurIPS, ACL, ICASSP, ECCV, and CVPR. Under the supervision of James Glass, his doctoral thesis introduced models for the joint perception of speech and vision. This work was awarded the 2018 George M. Sprowls Award for the best Ph.D. thesis in computer science at MIT.
He holds a Ph.D. in computer science from MIT (2018), an S.M. in computer science from MIT (2013), and a B.S. in electrical engineering from UIUC (2010).
This presentation is happening remotely. Click this link as early as 15 minutes before the scheduled start time of the presentation to watch in a Zoom meeting.
Title: Interpretable End-to-End Neural Network for Audio and Speech Processing
Abstract: This talk introduces extensions of the basic end-to-end automatic speech recognition (ASR) architecture, focusing on its integration function to tackle major problems faced by current ASR technologies in adverse environments, including the cocktail party and data sparseness problems. The first topic is to integrate microphone-array signal processing, speech separation, and speech recognition in a single neural network to realize multichannel multi-speaker ASR for the cocktail party problem. Our architecture is carefully designed to maintain the role of each module as a differentiable subnetwork so that we can jointly optimize the whole network but still keep the interpretability of each subnetwork, including the speech separation, speech enhancement, and acoustic beamforming abilities in addition to ASR. The second topic is based on semi-supervised training using cycle-consistency, which enables us to leverage unpaired speech and/or text data by integrating ASR with text-to-speech (TTS) within the end-to-end framework. This scheme can be regarded as an interpretable disentanglement of audio signals with explicit decomposition of linguistic characteristics by ASR and of speaker and speaking style characteristics by speaker embedding. These explicitly decomposed characteristics are converted back to the original audio signals by neural TTS; thus we form an acoustic feedback loop based on speech recognition and synthesis, like human hearing, and both components can be jointly optimized with only the audio data.
This was a virtual seminar that can be viewed by clicking here.
Title: Unifying Human Processes and Machine Models for Spoken Language Interfaces
Abstract: Recent years have witnessed tremendous progress in digital speech interfaces for information access (e.g., Amazon’s Alexa, Google Home, etc.). The commercial success of these applications is hailed as one of the major achievements of the “AI” era. Indeed, these accomplishments are made possible only by sophisticated deep learning models trained on enormous amounts of supervised data over extensive computing infrastructure. Yet these systems are not robust to variations (such as accents, out-of-vocabulary words, etc.), remain uninterpretable, and fail in unexpected ways. Most important of all, these systems cannot be easily extended to users with speech and language disabilities, who would potentially benefit the most from the availability of such technologies. I am a speech scientist interested in computational modelling of the human speech communication system towards building intelligent spoken language systems. I will present my research, in which I’ve tapped into human speech communication processes (specifically, theories of phonology and physiological data, including cortical signals in humans as they produce fluent speech) to build robust spoken language systems. The insights from these studies reveal elegant organizational principles and computational mechanisms employed by the human brain for fluent speech production, the most complex of motor behaviors. These findings hold the key to the next revolution in human-inspired, human-compatible spoken language technologies that, besides alleviating the problems faced by current systems, can meaningfully impact the lives of millions of people with speech disability.
Bio: Gopala Anumanchipalli, PhD, is a researcher at the Department of Neurological Surgery and the Weill Institute for Neurosciences at the University of California, San Francisco. His interests are in i) understanding neural mechanisms of human speech production towards developing next-generation Brain-Computer Interfaces, and ii) computational modelling of human speech communication mechanisms towards building robust speech technologies. Earlier, Gopala was a postdoctoral fellow at UCSF working with Edward F Chang, MD, and he previously received his PhD in Language and Information Technologies from Carnegie Mellon University, working with Prof. Alan Black on speech synthesis.
This presentation will be taking place remotely. Follow this link to enter the Zoom meeting where it will be hosted. Do not enter the meeting before 11:45 AM EDT.
Title: Deep Learning for Face and Behavior Analytics
Abstract: In this talk I will describe the AI systems we have built for face analysis and complex activity detection. I will describe SfSNet, a DCNN that produces an accurate decomposition of an unconstrained image of a human face into shape, reflectance and illuminance. We present a novel architecture that mimics Lambertian image formation and a training scheme that uses a mixture of labeled synthetic and unlabeled real-world images. I will describe our results on the properties of DCNN-based identity features for face recognition. I will show how the DCNN features trained on in-the-wild images form a highly structured organization of image and identity information. I will also describe our results comparing the performance of our state-of-the-art face recognition systems to that of super recognizers and forensic face examiners.
I will describe our system for detecting complex activities in untrimmed security videos. In these videos the activities happen in small areas of the frame and some activities are quite rare. Our system is faster than real time, very accurate and works well with visible spectrum and IR cameras. We have defined a new approach to compute activity proposals.
I will conclude by highlighting future directions of our work.
Bio: Carlos D. Castillo is an assistant research scientist at the University of Maryland Institute for Advanced Computer Studies (UMIACS). He has done extensive work on face and activity detection and recognition for over a decade and has both industry and academic research experience. He received his PhD in Computer Science from the University of Maryland, College Park, where he was advised by Dr. David Jacobs. During the past 5 years he has been involved with the UMD teams in IARPA JANUS, IARPA DIVA, and DARPA L2M. He was the recipient of the best paper award at the International Conference on Biometrics: Theory, Applications and Systems (BTAS) 2016. The software he developed under IARPA JANUS has been transitioned to many USG organizations, including the Department of Defense, the Department of Homeland Security, and the Department of Justice. In addition, the UMD JANUS system is being used operationally by the Homeland Security Investigations (HSI) Child Exploitation Investigations Unit to provide investigative leads in identifying and rescuing child abuse victims, as well as catching and prosecuting criminal suspects. The technologies his team developed provided the technical foundations for a spinoff startup company, Mukh Technologies LLC, which creates software for face detection, alignment and recognition. In 2018, Dr. Castillo received the Outstanding Innovation of the Year Award from the UMD Office of Technology Commercialization. His current research interests include face and activity detection and recognition, and deep learning.
Note: This is a virtual presentation. Check this page at a later date for the Zoom meeting room where it will be taking place.
Title: Towards building a clinically-inspired ultrasound innovation hub: Design, Development and Clinical Validation of novel Ultrasound hardware for Imaging, Therapeutics, Sensing and other applications.
Abstract: Ultrasound is a relatively established modality with a number of exciting, yet not fully explored applications, ranging from imaging and image-guided navigation to tumor ablation, neuro-modulation, piezoelectric surgery, and drug delivery. In this talk, Dr. Manbachi will discuss some of his ongoing projects, which aim to address low-frequency bone sonography, minimally invasive ablation in neuro-oncology, and implantable sensors for spinal cord blood flow measurements.