Calendar

Jun
29
Tue
Dissertation Defense: Yan Jiang
Jun 29 @ 1:00 pm
Dissertation Defense: Yan Jiang

Note: This is a virtual presentation. Here is the link for where the presentation will be taking place.

Title: Leveraging Inverter-Interfaced Energy Storage for Frequency Control in Low-Inertia Power Systems

Abstract: The shift from conventional synchronous generation to renewable inverter-interfaced sources has led to a noticeable degradation of frequency dynamics in power systems, mainly due to a loss of inertia. Fortunately, recent technological advances and cost reductions in energy storage open the door to higher renewable energy penetration via inverter-interfaced energy storage. With proper control laws imposed on inverters, the rapid power-frequency response of energy storage helps mitigate this degradation. A straightforward choice is to emulate the droop response and/or inertial response of synchronous generators through droop control (DC) or virtual inertia (VI), yet these laws do not necessarily fully exploit the benefits of inverter-interfaced energy storage. This thesis therefore challenges the naive choice of mimicking synchronous generator characteristics by advocating a principled control design perspective.

To achieve this goal, we build an analysis framework for quantifying the performance of power systems using signal and system norms, within which we perform a systematic study of the effect of different control laws on both frequency response metrics and storage economic metrics. More precisely, under a mild yet insightful proportionality assumption, we perform a modal decomposition that yields closed-form expressions or conditions for synchronous frequency, Nadir, rate of change of frequency (RoCoF), synchronization cost, frequency variance, and steady-state effort share. Together, these results pave the way for a better understanding of how sensitive the various performance metrics are to different control laws.

Our analysis unveils several limitations of traditional control laws, such as the inability of DC to improve dynamic performance without sacrificing steady-state performance, and the unbounded frequency variance introduced by VI in the presence of frequency measurement noise. Therefore, rather than clinging to the idea of imitating synchronous generator behavior via inverter-interfaced energy storage, we search for better solutions.

We first propose dynam-i-c Droop control (iDroop)—inspired by the classical lead/lag compensator—which we prove to enjoy several desirable properties. First, the added degrees of freedom in iDroop allow one to decouple the dynamic performance improvement from the steady-state performance. In addition, the lead/lag property of iDroop makes it less sensitive to stochastic power fluctuations and frequency measurement noise. Last but not least, iDroop can be tuned either to achieve zero synchronization cost or to achieve Nadir elimination, by which we mean removing the overshoot in the transient system frequency. In particular, the Nadir elimination tuning of iDroop shows potential for balancing the various performance metrics in practice. However, iDroop has no control over the RoCoF, which is undesirable in low-inertia power systems given the risk of falsely triggering protection.

We then propose frequency shaping control (FS)—an extension of iDroop—whose most outstanding feature is its ability to shape the system frequency dynamics following a sudden power imbalance into first-order dynamics with a specified synchronous frequency and RoCoF, each set by one of two independent control parameters.
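For a first-order frequency response, the steady-state (synchronous) frequency deviation and the initial RoCoF together determine the time constant. A minimal numerical illustration of that relationship (the target values below are made up for illustration and are not from the thesis):

```python
import numpy as np

# Illustrative targets (not values from the defense)
delta_f_ss = -0.5   # synchronous (steady-state) frequency deviation, Hz
rocof_0 = -2.0      # initial rate of change of frequency, Hz/s

# For first-order dynamics delta_f(t) = delta_f_ss * (1 - exp(-t/tau)),
# the initial slope is delta_f_ss / tau, so the time constant follows:
tau = delta_f_ss / rocof_0  # = 0.25 s

t = np.linspace(0, 2, 2001)
delta_f = delta_f_ss * (1 - np.exp(-t / tau))

# Numerical initial slope should match the prescribed RoCoF
slope0 = (delta_f[1] - delta_f[0]) / (t[1] - t[0])
```

The trajectory has no overshoot (hence no Nadir beyond the steady-state value), which is the qualitative benefit of shaping the response into first-order dynamics.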

We finally validate the theoretical results through extensive numerical experiments on a more realistic power system test case that violates the proportionality assumption; the experiments clearly confirm that our proposed control laws outperform the traditional ones overall.

Committee Members

  • Enrique Mallada, Department of Electrical and Computer Engineering
  • Pablo A. Iglesias, Department of Electrical and Computer Engineering
  • Dennice F. Gayme, Department of Mechanical Engineering
  • Petr Vorobev, Center for Energy Science and Technology, Skolkovo Institute of Science and Technology
Jun
30
Wed
Dissertation Defense: Ashwin Bellur
Jun 30 @ 10:00 am
Dissertation Defense: Ashwin Bellur

Note: This is a virtual presentation. Here is the link for where the presentation will be taking place.

Title: Bio-Mimetic Sensory Mapping with Attention for Auditory Scene Analysis

Abstract: The human auditory system performs complex tasks, such as following a conversation in a busy cafe or picking out the melodic line of a particular instrument in an ensemble orchestra, with remarkable ease. It also adapts effortlessly to constantly changing conditions and novel stimuli. The auditory system achieves this through complex neuronal processes. First, the low-dimensional signal representing the acoustic stimulus is mapped to a higher-dimensional space through a series of feed-forward neuronal transformations, in which the different auditory objects in the scene become discernible. These feed-forward processes are then complemented by top-down processes like attention, driven by the cognitive regions, which modulate the feed-forward processes in a manner that shines a spotlight on the object of interest: the interlocutor in the busy cafe, or the instrument of interest in the ensemble orchestra.

In this work, we explore leveraging these mechanisms observed in the mammalian brain, within computational frameworks, for addressing various auditory scene analysis tasks such as speech activity detection, environmental sound classification and source separation. We develop bio-mimetic computational strategies to model the feed-forward sensory mapping processes as well as the corresponding complementary top-down mechanisms capable of modulating the feed-forward processes during attention.

In the first part of this work, using Gabor filters as an approximation for the feed-forward processes, we show that retuning the feed-forward processes under top-down attentional feedback is extremely potent in enabling robust detection of speech activity. We introduce the notion of memory to represent prior knowledge of acoustic objects and show that memories of objects can be used to deploy the necessary top-down feedback. Next, we expand the feed-forward processes into a data-driven, distributed deep belief system consisting of multiple streams that capture the stimulus at different spectrotemporal resolutions, a feature observed in the human auditory system. We show that such a distributed system with inherent redundancies, further complemented by top-down attentional mechanisms using distributed object memories, allows for robust classification of environmental sounds in mismatched conditions. Finally, we show that incorporating these ideas of distributed processing and attention into deep neural networks leads to state-of-the-art performance even on complex tasks such as source separation. Further, we show that in such a distributed system the sum of the parts is better than the individual parts, and that this property can be used to generate real-time top-down feedback, which in turn can be used to adapt the network to novel conditions during inference.
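The Gabor approximation of the feed-forward mapping can be sketched as a small filter bank that projects a one-dimensional signal into one channel per frequency tuning; the center frequencies and bandwidth below are illustrative choices, not the parameters used in the work:

```python
import numpy as np

def gabor_kernel(center_freq, sigma, fs=16000, width=0.025):
    """1D Gabor filter: a sinusoid windowed by a Gaussian envelope."""
    t = np.arange(-width, width, 1.0 / fs)
    envelope = np.exp(-0.5 * (t / sigma) ** 2)
    return envelope * np.cos(2 * np.pi * center_freq * t)

# A small bank tuned to different frequencies (illustrative values)
bank = [gabor_kernel(f, sigma=0.005) for f in (250, 500, 1000, 2000)]

# Map a signal into a higher-dimensional representation:
# one output channel per filter
fs = 16000
t = np.arange(0, 0.2, 1.0 / fs)
signal = np.sin(2 * np.pi * 500 * t)  # a pure 500 Hz tone
channels = np.stack([np.convolve(signal, k, mode="same") for k in bank])

# The channel tuned to 500 Hz responds most strongly to the tone
energies = (channels ** 2).sum(axis=1)
```

Attentional retuning in this picture amounts to adjusting the filter parameters (e.g., gains or tunings) so that channels carrying the attended object are emphasized.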

Overall, the results show that leveraging these biologically inspired mechanisms within computational frameworks leads to enhanced robustness and adaptability to novel conditions, traits of the human auditory system that we sought to emulate.

Committee Members

  • Mounya Elhilali, Department of Electrical and Computer Engineering
  • Najim Dehak, Department of Electrical and Computer Engineering
  • Rama Chellappa, Department of Electrical and Computer Engineering

Dissertation Defense: Soohyun Lee
Jun 30 @ 2:00 pm
Dissertation Defense: Soohyun Lee

Note: This is a virtual presentation. Here is the link for where the presentation will be taking place.

Title: Optical Coherence Tomography (OCT)-Guided Ophthalmic Therapy

Abstract: Optical coherence tomography (OCT), which noninvasively provides cross-sectional images at micrometer scale in real time, has been widely applied to the diagnosis and treatment guidance of various ocular diseases.

In the first part of this work, we develop a hand-held subretinal injector actively guided by a common-path OCT (CP-OCT) distal sensor. Subretinal injection is becoming increasingly prevalent in both the scientific research and clinical communities as an efficient way of treating retinal diseases. It delivers drugs or stem cells into the space between the RPE and photoreceptor layers, and thus directly affects resident cells and tissues in the subretinal space. However, the technique demands high stability and dexterity from the surgeon because of the fine anatomy of the retina, and it is made challenging by physiological motions such as hand tremor. We focus on two aspects of the CP-OCT guided subretinal injector: (i) a high-performance fiber probe based on a high-index epoxy lensed fiber to enhance CP-OCT retinal image quality; and (ii) automated layer identification and tracking, in which each retinal layer boundary, as well as the retinal surface, is tracked using 1D convolutional neural network (CNN)-based segmentation on A-scans for accurate localization of the needle. The CNN model is integrated into the CP-OCT system for real-time target boundary distance sensing, and unwanted axial motions are compensated based on the target boundary tracking. The CP-OCT distal sensor guided system is tested on ex vivo bovine retina and achieves micro-scale depth targeting accuracy, showing its promise for clinical application.

In the second part, we propose and demonstrate selective retina therapy (SRT) monitoring and temperature estimation based on speckle variance OCT (svOCT) for dosimetry control. SRT is an effective laser treatment for retinal diseases associated with degradation of the retinal pigment epithelium (RPE). Because SRT selectively targets the RPE, it reduces negative side effects and facilitates healing of the induced retinal lesions. However, selecting the proper laser energy is challenging because lesions in the RPE are ophthalmoscopically invisible and melanin concentration varies between patients and even between regions within an eye. SvOCT quantifies speckle pattern variation caused by moving particles or structural changes in biological tissues. SvOCT images are calculated as the interframe intensity variance of the sequence, and they show abrupt speckle variance changes induced by laser pulse irradiation. We find that svOCT peak values correlate reliably with the degree of retinal lesion formation. The temperature at the neural retina and RPE is estimated from the svOCT peak values using numerically calculated temperatures, which is consistent with the observed lesion creation.
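The interframe intensity variance at the heart of svOCT is a simple pixelwise computation over repeated B-scans; a minimal sketch on synthetic data (array sizes and intensity values are illustrative, not from the study):

```python
import numpy as np

def speckle_variance(frames):
    """svOCT image: pixelwise intensity variance across a sequence
    of B-scans acquired at the same location (frames: N x H x W)."""
    return frames.var(axis=0)

rng = np.random.default_rng(0)
# 8 repeated B-scans of a mostly static scene...
frames = np.full((8, 64, 64), 100.0) + rng.normal(0, 1, (8, 64, 64))
# ...with one region whose intensity fluctuates strongly between
# frames (standing in for laser-induced structural changes)
frames[:, 20:30, 20:30] += rng.normal(0, 20, (8, 10, 10))

sv = speckle_variance(frames)
# The dynamic region stands out with much higher speckle variance
```

Peaks of such variance maps are what the abstract correlates with lesion formation and temperature.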

Committee Members

  • Jin U. Kang, Department of Electrical and Computer Engineering
  • Israel Gannot, Department of Electrical and Computer Engineering
  • Mark Foster, Department of Electrical and Computer Engineering
Aug
6
Fri
Closing Ceremonies for Computational Sensing and Medical Robotics (CSMR) REU
Aug 6 @ 9:00 am – 3:00 pm

The closing ceremonies of the Computational Sensing and Medical Robotics (CSMR) REU are set to take place Friday, August 6 from 9am until 3pm at this Zoom link. Seventeen undergraduate students from across the country are eager to share the culmination of their past 10 weeks of summer work.

The schedule for the day is listed below, but each presentation is featured in more detail in the program. Please invite your students and faculty, and feel free to distribute this flyer to advertise the event.

We would love for everyone to come learn about the amazing summer research these students have been conducting!

 

2021 REU Final Presentations
Time | Presenter | Project Title | Faculty Mentor(s) | Student/Postdoc/Research Engineer Mentor(s)
9:00 | Ben Frey | Deep Learning for Lung Ultrasound Imaging of COVID-19 Patients | Muyinatu Bell | Lingyi Zhao
9:15 | Camryn Graham | Optimization of a Photoacoustic Technique to Differentiate Methylene Blue from Hemoglobin | Muyinatu Bell | Eduardo Gonzalez
9:30 | Ariadna Rivera | Autonomous Quadcopter Flying and Swarming | Enrique Mallada | Yue Shen
9:45 | Katie Sapozhnikov | Force Sensing Surgical Drill | Russell Taylor | Anna Goodridge
10:00 | Savannah Hays | Evaluating SLANT Brain Segmentation using CALAMITI | Jerry Prince | Lianrui Zuo
10:15 | Ammaar Firozi | Robustness of Deep Networks to Adversarial Attacks | René Vidal | Kaleab Kinfu, Carolina Pacheco
10:30 | Break
10:45 | Karina Soto Perez | Brain Tumor Segmentation in Structural MRIs | Archana Venkataraman | Naresh Nandakumar
11:00 | Jonathan Mi | Design of a Small Legged Robot to Traverse a Field of Multiple Types of Large Obstacles | Chen Li | Ratan Othayoth, Yaqing Wang, Qihan Xuan
11:15 | Arko Chatterjee | Telerobotic System for Satellite Servicing | Peter Kazanzides, Louis Whitcomb, Simon Leonard | Will Pryor
11:30 | Lauren Peterson | Can a Fish Learn to Ride a Bicycle? | Noah Cowan | Yu Yang
11:45 | Josiah Lozano | Robotic System for Mosquito Dissection | Russell Taylor, Iulian Iordachita | Anna Goodridge
12:00 | Zulekha Karachiwalla | Application of Dual Modality Haptic Feedback within Surgical Robotics | Jeremy Brown |
12:15 | Break
1:00 | James Campbell | Understanding Overparameterization from Symmetry | René Vidal | Salma Tarmoun
1:15 | Evan Dramko | Establishing FDR Control for Genetic Marker Selection | Soledad Villar, Jeremias Sulam | N/A
1:30 | Chase Lahr | Modeling Dynamic Systems Through a Classroom Testbed | Jeremy Brown | Mohit Singhala
1:45 | Anire Egbe | Object Discrimination Using Vibrotactile Feedback for Upper Limb Prosthetic Users | Jeremy Brown |
2:00 | Harrison Menkes | Measuring Proprioceptive Impairment in Stroke Survivors (Pre-Recorded) | Jeremy Brown |
2:15 | Deliberations
3:00 | Winner Announced
Aug
9
Mon
Dissertation Defense: Debojyoti Biswas
Aug 9 @ 10:00 am
Dissertation Defense: Debojyoti Biswas

Note: This is a virtual presentation. Here is the link for where the presentation will be taking place.

Title: Stochastic Models of Chemotactic Signaling Processes

Abstract: Stochasticity is ubiquitous in biological processes. Its contribution to shaping the output response is not restricted to systems in which the interacting entities have low copy numbers; intrinsic fluctuations can also affect systems in which the interacting species are abundant. Chemotaxis, the migration of cells towards chemical cues, is one such example. Chemotaxis is a fundamental process behind a wide range of biological events, ranging from the innate immune response of organisms to cancer metastasis. In this dissertation, we study the role that stochastic fluctuations play in the mechanism that regulates chemotaxis in the social amoeba Dictyostelium discoideum. It has been argued theoretically and shown experimentally that stochastically driven threshold crossings of an underlying excitable system lead to the protrusions that enable amoeboid cells to move. To date, however, there has been no computational model that accurately accounts for the effects of noise: most models merely inject noise extraneously into deterministic models, leading to stochastic differential equations. In contrast, in this study we employ an entirely different paradigm to account for noise effects, based on the reaction-diffusion master equation. Using a modular approach and a three-dimensional cell model with specific subdomains attributed to the cell membrane and cortex, we develop a detailed model of the receptor-mediated regulation of the signal transduction excitable network (STEN), which has been shown to drive actin dynamics. Using this model, we recreate the patterns of wave propagation seen experimentally in both front- and back-side markers, and we recreate the effects of various perturbations. Our model provides further support for the biased excitable network hypothesis, which posits that directed motion arises from a spatially biased regulation of the threshold for activation of an excitable network.
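The master-equation paradigm differs fundamentally from injecting noise into deterministic equations: reactions fire as discrete random events whose propensities depend on the current state. A minimal Gillespie (stochastic simulation algorithm) sketch for a birth-death process, the simplest instance of this paradigm (the rates are illustrative and not taken from the dissertation's model):

```python
import numpy as np

def gillespie_birth_death(k_on=50.0, k_off=1.0, t_end=20.0, seed=0):
    """Exact stochastic simulation of production (rate k_on) and
    first-order degradation (rate k_off * n) of a single species."""
    rng = np.random.default_rng(seed)
    t, n = 0.0, 0
    times, counts = [t], [n]
    while t < t_end:
        rates = np.array([k_on, k_off * n])
        total = rates.sum()
        t += rng.exponential(1.0 / total)     # waiting time to next event
        if rng.random() < rates[0] / total:   # pick which reaction fires
            n += 1                            # production
        else:
            n -= 1                            # degradation
        times.append(t)
        counts.append(n)
    return np.array(times), np.array(counts)

times, counts = gillespie_birth_death()
# Copy number fluctuates around the deterministic mean k_on / k_off
```

A reaction-diffusion master equation extends this idea by also treating diffusive hops between spatial subvolumes (here, membrane and cortex subdomains) as stochastic events.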

Here we also consider another aspect of the chemotactic response. While front and back markers redistribute in response to chemoattractant gradients, over time this spatial heterogeneity becomes established and can persist even when the external chemoattractant gradient is removed. We refer to this persistent segregation of the cell into back and front regions as polarity. In this dissertation, we study various mechanisms by which polarity can be established. For example, we consider the role of vesicular trafficking as a means of bringing back markers from the front to the rear of the cell. We then study how BAR-domain proteins that are sensitive to membrane curvature can amplify small shape heterogeneities, leading to cell polarization. Finally, we develop computational models that describe a novel framework by which polarity can be established and perturbed through alteration of the charge distribution on the inner leaflet of the cell membrane.

Committee Members

  • Pablo A. Iglesias, Department of Electrical and Computer Engineering
  • Noah J. Cowan, Department of Mechanical Engineering
  • Enrique Mallada, Department of Electrical and Computer Engineering
  • Peter N. Devreotes, Department of Cell Biology
Aug
12
Thu
Dissertation Defense: Yufan He
Aug 12 @ 1:00 pm
Dissertation Defense: Yufan He

Note: This is a virtual presentation. Here is the link for where the presentation will be taking place.

Title: Retinal OCT Image Analysis Using Deep Learning

Abstract: Optical coherence tomography (OCT) is a noninvasive imaging modality that uses low-coherence light to take cross-sectional images of optically scattering media. OCT has been widely used in diagnosing retinal and neural diseases by imaging the human retina. The thicknesses of retinal layers are important biomarkers for neurological diseases like multiple sclerosis (MS); for example, the peripapillary retinal nerve fiber layer (pRNFL) and ganglion cell plus inner plexiform layer (GCIP) thicknesses can be used to assess global disease progression in MS patients. Automated OCT image analysis tools are therefore critical for quantitatively monitoring disease progression and exploring biomarkers. With the development of more powerful computational resources, deep learning based methods have achieved much better accuracy, speed, and algorithmic flexibility for many image analysis tasks. However, without task-specific modifications, these emerging deep learning methods are not satisfactory when applied directly to tasks like retinal layer segmentation.

In this thesis, we present a set of novel deep learning based methods for OCT image analysis. Specifically, we focus on automated retinal layer segmentation from macular OCT images. The first problem we address is that existing deep learning methods do not incorporate explicit anatomical rules and cannot guarantee the layer segmentation hierarchy (pixels of an upper layer should have no overlap or gap with pixels of the layer beneath it). To solve this, we developed an efficient fully convolutional network that generates structured layer surfaces with correct topology and can also segment retinal lesions (cysts or edema). The second problem we address is that segmentation uncertainty reduces the sensitivity for detecting mild retinal changes in MS patients over time. To solve this, we developed a longitudinal deep learning pipeline that incorporates both inter-slice and longitudinal segmentation priors to achieve more consistent segmentation for monitoring patient-specific retinal changes. The third problem we address is that deep learning models degrade when test data come from different scanners (domain shift). We address this problem with a novel test-time domain adaptation method: unlike existing solutions, our model can dynamically adapt to each test subject during inference without time-consuming retraining. Our deep networks achieve state-of-the-art segmentation accuracy, speed, and flexibility compared to existing methods.
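One common way to guarantee such a layer hierarchy, sketched here as an illustration rather than the thesis's exact architecture, is to predict the top surface plus non-negative inter-surface gaps and accumulate them down each A-scan column, so ordered surfaces hold by construction:

```python
import numpy as np

def ordered_surfaces(raw_outputs):
    """Map unconstrained network outputs (S surfaces x W columns) to
    surface depths guaranteed to satisfy surface_0 <= surface_1 <= ...
    by accumulating non-negative gaps along each column."""
    top = raw_outputs[0]                      # first surface: unconstrained
    gaps = np.maximum(raw_outputs[1:], 0.0)   # remaining rows: gaps >= 0
    return np.vstack([top, top + np.cumsum(gaps, axis=0)])

# Unconstrained outputs could imply crossing surfaces...
raw = np.array([[10.0, 12.0],
                [ 5.0, -3.0],   # a negative "gap" would mean a crossing
                [ 4.0,  6.0]])
surf = ordered_surfaces(raw)
# ...but the reconstructed depths are monotonically ordered per column
```

Because the ordering is enforced structurally, no post-processing is needed to repair overlapping layer boundaries.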

Committee Members

  • Jerry Prince, Department of Electrical and Computer Engineering
  • Archana Venkataraman, Department of Electrical and Computer Engineering
  • Vishal Patel, Department of Electrical and Computer Engineering
Aug
25
Wed
Dissertation Defense: Honghua Guan
Aug 25 @ 1:00 pm
Dissertation Defense: Honghua Guan

Note: This is a virtual presentation. Here is the link for where the presentation will be taking place.

Title: High-throughput Optical Explorer in Freely-behaving Rodents

Abstract: Optical brain imaging is one of the most important branches of neuroimaging and has seen thirty years of intense development. To monitor neuronal activity in vivo, calcium imaging and sensing techniques are widely used in neuroscience investigations, measuring the calcium (Ca2+) status of an isolated cell or a population of cells. Benefiting from the different types of genetically encoded calcium indicators (GECIs), especially the GCaMP family, optical calcium imaging makes it possible to monitor the electrical activity of hundreds of neurons in cell culture or in living animals, and thus to elucidate the function of neuronal circuits at fine spatial resolution.

Among optical brain imaging tools, multiphoton microscopy, owing to its depth-resolving ability, confined excitation volume, and deep imaging penetration, has become the standard choice for noninvasive in vivo brain imaging. However, the current experimental routine requires head fixation of animals during data acquisition. This configuration inevitably introduces unwanted stress and precludes many behavioral studies, such as reward/punishment training, memory, and social interaction. The scanning two-photon fiberscope described in this thesis is a promising technical direction for bridging this gap. Owing to its ultra-compact design and light weight, it is an ideal optical brain imaging modality for assessing dynamic neuronal activity in freely-behaving rodents with subcellular resolution. One significant challenge with the compact scanning two-photon fiberscope is its suboptimal imaging throughput, due to the limited choices of miniature optomechanical components.

This dissertation reports our efforts to improve the throughput of the two-photon fiberscope system from several perspectives, including introducing multiple-wavelength excitation for simultaneous multicolor imaging, increasing imaging speed, and enlarging the field of view (FOV). We also discuss our contributions to animal model preparation protocols for in vivo imaging in freely-behaving mice.

The improved system throughput enables many new applications that were previously impractical or impossible. We first report a compact multicolor two-photon fiberscope platform that uses two coherent pulsed outputs (the pump and Stokes beams) from an optical parametric oscillator (OPO). By overlapping the two coherent pulses temporally and spatially, we can synchronize them and generate a third, virtual wavelength. These three wavelengths, covering a large range from 750 nm to 1200 nm, are suitable for many fluorescent proteins and calcium indicators commonly used in neuroscience studies. The method also offers practical benefits (e.g., reasonable cost and an integrated system). Imaging results acquired from the “Brainbow” mouse model demonstrate that we can excite several different fluorescent proteins simultaneously with optimal excitation efficiency.

In addition, we propose a deep-learning (DL) based solution that significantly improves the imaging frame rate with minimal loss in image quality. A two-step transfer learning strategy generates appropriate training datasets for improving the quality (signal-to-noise ratio and spatial resolution) of high-speed in vivo images. The method allows a more than 10-fold increase in imaging speed (from ~2.0 fps to ~26 fps) while maintaining high SNR and imaging resolution. This new DL-assisted two-photon fiberscope opens up new avenues for studying and understanding the neural basis of behavior.

Committee Members

  • Xingde Li, Department of Biomedical Engineering
  • Mark Foster, Department of Electrical and Computer Engineering
  • Jing U. Kang, Department of Electrical and Computer Engineering
  • Israel Gannot, Department of Electrical and Computer Engineering
  • Hui Lu, Department of Pharmacology and Physiology, George Washington University
Sep
9
Thu
Distinguished Lecture Series: Peter Abadir, Associate Professor of Medicine, Johns Hopkins University School of Medicine
Sep 9 @ 3:00 pm – 4:00 pm
Distinguished Lecture Series: Peter Abadir, Associate Professor of Medicine, Johns Hopkins University School of Medicine

Note: This is a virtual presentation. Here is the link for where the presentation will be taking place.

Title: Engineering Innovations to Change Aging: A Geriatrician’s Attempt at Standing Circuits.

Abstract: The population of older adults with chronic illnesses and functional and cognitive decline is rapidly expanding in the US and worldwide. In parallel, there has been a rapid emergence of new uses for artificial intelligence (AI) and technology in health care, driven by developments in sensors, computing at macro and micro scales, communication networks, and progress in deep learning and other reasoning methods. Despite these parallel trends, little focused effort has been made to bridge the gap between these AI and technology developments and the growing needs of older adults and their caregivers. This is partly because the clinical needs of this vulnerable population are tremendous, including dementia, depression, polypharmacy, delirium, incontinence, vertigo, falls, spontaneous bone fractures, failure to thrive, neglect and abuse, and social isolation. The impact of social isolation and depression became even more evident during the recent COVID pandemic, given that almost half of women aged over 75 live alone. Properly managing these complex needs requires special training and expertise, and, to complicate matters further, physicians specialized in caring for older adults are in short supply: an estimated 1.07 geriatricians exist per 10,000 elderly residents in the United States. To design practical AI tools and technologies that better care for older adults, engineers and scientists must work hand in hand with clinical providers specially trained to understand and manage the complex needs of older adults across the physical, cognitive, and social domains. In addition, the successful development, testing, and piloting of these technologies require collaboration with clinical researchers who have access to substantial research infrastructure and older patients in real-world clinical settings.
Here we will focus on the impact of aging and discuss our attempts at connecting the wires between clinicians and engineers, including establishing Gerotech Incubators to foster collaboration between geriatricians and engineers.

Bio: Dr. Peter Abadir is an assistant professor of medicine at the Johns Hopkins University School of Medicine. His area of clinical expertise is geriatric medicine.

After receiving his medical degree from the University of Al Fateh, Dr. Abadir completed his residency in family medicine at the University of Kentucky College of Medicine. He performed his fellowship in geriatric medicine and gerontology at the Johns Hopkins University School of Medicine.

Dr. Abadir’s research interests include changes in the renin-angiotensin-aldosterone system with aging, signal transduction and the role of cross talk between angiotensin II receptors in aging, and understanding the role of angiotensin II in the development of vascular aging.

He has been recognized by the Hopkins Department of Medicine with the W. Leigh Thompson Excellence in Research Award. He is a member of the American Geriatrics Society and The Gerontological Society of America.

 

Sep
22
Wed
Dissertation Defense: Blake Dewey
Sep 22 @ 2:30 pm
Dissertation Defense: Blake Dewey

Note: This is a virtual presentation. Here is the link for where the presentation will be taking place.

Title: Synthesis-Based Harmonization of Multi-Contrast Structural MRI

Abstract: The flexible design of the MRI system allows multiple images with different acquisition parameters to be collected in a single scanning session. However, since MRI does not have standards regulating image acquisition (unlike other imaging modalities, such as computed tomography), differences in acquisition lead to variability in image appearance between manufacturers, imaging centers, and even individual scanners. This variability can significantly degrade the quality of analysis, setting the stage for harmonization.

This dissertation describes four main contributions to the literature on synthesis-based harmonization of structural brain MR images. In synthesis-based harmonization, harmonized images are created that can be used confidently in automated analysis pipelines, such as whole-brain segmentation, where image variability can cause inconsistent results. In our first contribution, we acquired a cross-domain dataset to provide training and validation data for our harmonization methods. This dataset is crucial to our work, as it provides examples of the same subjects under two different acquisition environments. In our second contribution, we used this unique cross-domain dataset directly to develop a supervised harmonization method. Our method, called DeepHarmony, uses state-of-the-art deep learning architectures and training strategies to provide significantly improved image harmonization over other synthesis methods. In our third contribution, we proposed an unsupervised harmonization framework for the broader setting where cross-domain data are not acquired. This novel framework is based on representation learning: we aim to separate anatomical features from the acquisition environment in a disentangled latent space. We used multi-contrast MRI images from the same scanning session as internal supervision to encourage this disentangled latent representation, and we demonstrated that this regularization alone generates disentanglement in a completely data-driven way. In our final contribution, we extended our unsupervised work to a more diverse clinical trial dataset that includes T2-FLAIR and PD-weighted images. For this substantially more complex dataset, we improved the disentanglement architecture and training strategies to produce a more consistent latent space. This method was shown to properly enforce our expectations on the latent space and can also flag images with inconsistent acquisition.

Committee Members

  • Jerry Prince, Department of Electrical and Computer Engineering
  • Vishal Patel, Department of Electrical and Computer Engineering
  • Webster Stayman, Department of Biomedical Engineering
  • Peter van Zijl, Department of Radiology
  • Peter Calabresi, Department of Neurology
Sep
29
Wed
Dissertation Defense: Raghavendra Pappagari
Sep 29 @ 3:30 pm
Dissertation Defense: Raghavendra Pappagari

Note: This is a virtual presentation. Here is the link for where the presentation will be taking place.

Title: Towards Better Understanding of Spoken Conversations: Assessment of Emotion and Sentiment

Abstract: Emotions play a vital role in our daily lives, as they help us convey to other parties information that is impossible to express verbally. While humans can easily perceive emotions, emotions are notoriously difficult for machines to define and recognize. Automatically detecting the emotion of a spoken conversation, however, is useful for a diverse range of applications such as human-machine interaction and conversation analysis. Automatic speech emotion recognition (SER) can be broadly classified into two types: SER from isolated utterances and SER from long recordings. In this thesis, we present machine learning based approaches to recognizing emotion in both settings.

Isolated utterances are usually shorter than 10 s in duration and are assumed to contain only one major emotion. One of the main obstacles to achieving high emotion recognition accuracy in this case is the lack of large annotated datasets. We proposed to mitigate this problem with transfer learning and data augmentation techniques. We show that utterance representations (x-vectors) extracted from speaker recognition models (x-vector models) contain emotion-predictive information, and that adapting those models provides significant improvements in emotion recognition performance. To further improve performance, we proposed CopyPaste, a novel perceptually motivated data augmentation method for isolated utterances. Assuming that the presence of emotions other than neutral dictates a speaker's overall perceived emotion in a recording, the concatenation of an emotional utterance (with emotion E) and a neutral utterance can still be labeled with emotion E. We show that training the model on this concatenated data along with the original training data improves performance. We presented three CopyPaste schemes and evaluated them on two models – one trained independently and another adapted from an x-vector speaker recognition model via transfer learning – in both clean and noisy test conditions. We validated the proposed approaches on three datasets, each collected with a different elicitation method: Crema-D (acted emotions), IEMOCAP (induced emotions), and MSP-Podcast (spontaneous emotions).
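The core CopyPaste labeling rule described above can be sketched in a few lines. This is a minimal toy illustration of the labeling assumption only; the function name and list-based "waveforms" are hypothetical, and the three schemes in the thesis differ in which utterance pairs are drawn.

```python
import random

def copypaste(emotional, neutral, label):
    """Toy sketch of the CopyPaste rule: concatenating an emotional
    utterance with a neutral one yields a new training example that
    keeps the emotional label. Order is randomized so the model sees
    the emotional segment at either end of the recording."""
    parts = [emotional, neutral]
    random.shuffle(parts)
    return parts[0] + parts[1], label

# Toy waveforms as lists of samples (real inputs would be audio arrays).
angry = [0.9, -0.8, 0.7]
neutral = [0.1, 0.0, -0.1]

augmented, label = copypaste(angry, neutral, "angry")
print(len(augmented), label)  # 6 angry
```

The augmented pairs are then mixed with the original training data, which is where the reported gains come from.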

As isolated utterances are assumed to contain only one emotion, the proposed models make predictions at the utterance level, i.e., one emotion prediction for the whole utterance. However, these models cannot be directly applied to conversations, which can contain multiple emotions, unless the locations of emotion boundaries are known. In this work, we propose to recognize emotions in conversations through frame-level classification, where predictions are made at regular intervals. We investigated several deep learning architectures – transformers, ResNet-34, and BiLSTM – that can exploit context in conversations. We show that models trained on isolated utterances perform worse than models trained on conversations, suggesting the importance of context. Based on the inner workings of the attention operation, we propose a data augmentation method, DiverseCatAugment (DCA), to equip transformer models with better classification ability. However, these models do not exploit the turn-taking patterns available in conversations. Speakers in a conversation take turns to exchange information, and the emotion in each turn can depend on the speaker's and the conversation partner's emotions in past turns. We show that exploiting the information of who is speaking when in the conversation improves emotion recognition performance. The proposed models can exploit speaker information even in the absence of speaker segmentation information.
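Frame-level classification, as opposed to one label per utterance, can be sketched as a classifier applied at regular hops over the conversation, each time seeing a short context window. Everything in this sketch is a hypothetical stand-in: the "model" is a trivial sign-of-mean rule, not the transformer/ResNet-34/BiLSTM architectures studied in the thesis.

```python
def frame_level_predictions(frames, classify, hop=1):
    """Sketch of frame-level emotion recognition: emit one prediction
    per hop so emotion boundaries inside the conversation need not be
    known in advance. `classify` is any model over a context window."""
    context = 3  # frames of left context fed to the model (illustrative)
    preds = []
    for t in range(0, len(frames), hop):
        window = frames[max(0, t - context): t + 1]
        preds.append(classify(window))
    return preds

# Toy "model": label a window by the sign of its mean value.
def toy_classifier(window):
    return "happy" if sum(window) / len(window) >= 0 else "sad"

conversation = [0.5, 0.4, -0.6, -0.7, 0.2]
print(frame_level_predictions(conversation, toy_classifier))
# ['happy', 'happy', 'happy', 'sad', 'sad']
```

A real model would replace the sign rule with a learned classifier over acoustic frames, and the context window is where conversational context (and, with speaker information, turn-taking) enters.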

Annotating utterances with emotions is not a simple task: it is expensive, time-consuming, and depends on the number of emotions used for annotation. However, annotation schemes can be changed to reduce the annotation effort for a given application. For example, for some applications the goal is only to classify emotions as positive or negative, rather than into more detailed categories such as angry, happy, sad, and disgust. We considered one such application in this thesis: predicting customer satisfaction (CSAT) in call center conversations. CSAT is defined as the overall sentiment (positive vs. negative) of the customer about his/her interaction with the agent. As the goal is to predict only one label for the whole conversation, we perform utterance-level classification. We conducted a comprehensive search for adequate acoustic and lexical representations at different levels of conversational granularity, such as the word/frame, turn, and call levels. From the acoustic signal, we found that the proposed x-vector representation combined with a feed-forward deep neural network outperformed widely used prosodic features. From transcripts, CSAT Tracker, a novel method that computes the overall prediction from individual segment outcomes, performed best. Both methods rely on transfer learning to obtain the best performance. We also fused the acoustic and lexical features using a convolutional network. We evaluated our systems on US English telephone speech from call center data. We found that lexical models perform better than acoustic models and that their fusion provides significant gains. The analysis of errors revealed that calls in which customers accomplished their goals but were still dissatisfied are the most difficult to predict correctly. We also found that the customer's speech is more emotional than the agent's.
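The aggregation step, from per-utterance outcomes to one call-level CSAT label, can be sketched as follows. This is a deliberately simplified stand-in (a plain average over hypothetical segment scores); the actual CSAT Tracker computation described in the thesis is more involved.

```python
def call_level_csat(segment_scores):
    """Hedged sketch of turning utterance-level sentiment outcomes into
    a single call-level CSAT label. A simple mean is used here purely
    for illustration; CSAT Tracker itself is a learned method."""
    mean = sum(segment_scores) / len(segment_scores)
    return "satisfied" if mean >= 0 else "dissatisfied"

# Hypothetical per-utterance sentiment scores through one call
# (negative values = negative sentiment).
scores = [0.2, -0.1, -0.6, -0.4, 0.1]
print(call_level_csat(scores))  # dissatisfied
```

The point of the sketch is the structure: classify segments, then reduce them to one label per call, which matches the "one label for the whole conversation" goal stated above.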

Committee Members:

  • Najim Dehak, Department of Electrical and Computer Engineering
  • Jesús Villalba, Department of Electrical and Computer Engineering
  • Hynek Hermansky, Department of Electrical and Computer Engineering