Calendar

Sep
9
Thu
Distinguished Lecture Series: Peter Abadir, Associate Professor of Medicine, Johns Hopkins University School of Medicine
Sep 9 @ 3:00 pm – 4:00 pm

Note: This is a virtual presentation. Here is the link for where the presentation will be taking place.

Title: Engineering Innovations to Change Aging: A Geriatrician’s Attempt at Standing Circuits.

Abstract: The population of older adults with chronic illnesses and functional and cognitive decline is rapidly expanding in the US and worldwide. In parallel, there has been a rapid emergence of new uses for artificial intelligence (AI) and technology in health care, driven by developments in sensors, computing at macro and micro scales, communication networks, and progress in deep learning and other reasoning methods. Despite these parallel trends, little focused effort has been made to bridge the gap between the growing needs of older adults and their caregivers and these AI and technology developments. This is partly because the clinical needs of this vulnerable population are tremendous, including dementia, depression, polypharmacy, delirium, incontinence, vertigo, falls, spontaneous bone fractures, failure to thrive, neglect and abuse, and social isolation. The impact of social isolation and depression became even more evident during the recent COVID pandemic, given that almost half of women over age 75 live alone. Properly managing these complex needs requires special training and expertise, and to complicate matters further, physicians who specialize in caring for older adults are in short supply: an estimated 1.07 geriatricians exist per 10,000 elderly residents in the United States. To design practical AI tools and technologies to better care for older adults, engineers and scientists must work hand in hand with clinical providers specially trained to understand and manage the complex needs of older adults across the physical, cognitive, and social domains. In addition, the successful development, testing, and piloting of these technologies require collaboration with clinical researchers who have access to substantial research infrastructure and older patients in real-world clinical settings.
Here we will focus on the impact of aging and discuss our attempts at connecting wires between clinicians and engineers, including establishing Gerotech Incubators to foster collaboration between geriatricians and engineers.

Bio: Dr. Peter Abadir is an associate professor of medicine at the Johns Hopkins University School of Medicine. His area of clinical expertise is geriatric medicine.

After receiving his medical degree from the University of Al Fateh, Dr. Abadir completed his residency in family medicine at the University of Kentucky College of Medicine. He performed his fellowship in geriatric medicine and gerontology at the Johns Hopkins University School of Medicine.

Dr. Abadir’s research interests include changes in the renin-angiotensin-aldosterone system with aging, signal transduction and the role of crosstalk between angiotensin II receptors in aging, and the role of angiotensin II in the development of vascular aging.

He has been recognized by the Hopkins Department of Medicine with the W. Leigh Thompson Excellence in Research Award. He is a member of the American Geriatrics Society and The Gerontological Society of America.

 

Sep
23
Thu
Thesis Proposal: Jaejin Cho
Sep 23 @ 3:00 pm

Note: This is a virtual presentation. Here is the link for where the presentation will be taking place.

Title: Improving speaker embedding in speaker verification: Beyond speaker discriminative training

Abstract: Speaker verification (SV) is the task of verifying a claimed identity from a voice signal. A well-performing SV system requires a method to transform a variable-length recording into a fixed-length representation (a.k.a. embedding vector) that compactly captures the biometric information distinguishing one speaker from another. There are two popular methods: i-vector and x-vector. Although i-vector is still used nowadays, x-vector outperforms i-vector in many SV tasks as deep learning research has surged. The x-vector, however, has limitations, and we mainly tackle two of them in this proposal: 1) the embedding still includes information about the spoken text, and 2) training requires speaker labels, so it cannot leverage data that lack them.
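To make the verification step concrete, the following is a minimal sketch of scoring two recordings once each has been reduced to a fixed-length embedding. It is not the proposal's method: plain mean pooling over frame features stands in for an i-vector or x-vector extractor, and the function names, the cosine scoring, and the 0.7 threshold are all assumptions chosen for illustration.

```python
import math

def mean_pool(frames):
    """Average a variable-length list of frame vectors into one fixed-length vector.

    Assumption: this is only a stand-in for a trained embedding extractor.
    """
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(enroll_frames, test_frames, threshold=0.7):
    """Accept the claimed identity if the embeddings are similar enough.

    The threshold is hypothetical; in practice it would be tuned on held-out trials.
    """
    score = cosine_similarity(mean_pool(enroll_frames), mean_pool(test_frames))
    return score >= threshold, score
```

Recordings of different lengths map to vectors of the same dimension, so the comparison reduces to a single score and a threshold. A score below the threshold for a genuine speaker is the false rejection case discussed in the abstract.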

In the first half, we tackle the text dependency in the x-vector speaker embedding. Spoken text remaining in the x-vector can degrade its performance in text-independent SV because utterances of the same speaker may have different embeddings due to different spoken text. This could lead to a false rejection, i.e., the system rejects a valid target speaker. To tackle this issue, we propose to disentangle the spoken text and speaker identity into separate latent factors using a text-to-speech (TTS) model. Two observations motivate this choice. First, a multi-speaker end-to-end TTS system has text and speech encoders, each of which focuses on encoding information in its corresponding modality. These encoders enable text-independent speaker embedding learning by reconstructing the frames of a target speech segment, given a speaker embedding of another speech segment of the same utterance. Second, extensive work on neural TTS in recent years has improved speech synthesis quality. We hypothesize that speech synthesis quality and speaker embedding quality are positively correlated, since the speaker encoder in a TTS system needs to learn well for better speech synthesis of multiple speakers. We confirm the above two points through a series of experiments.

In the second half, we focus on leveraging unlabeled data to learn embeddings. Because much more unlabeled data exists than labeled data, leveraging it is essential, yet this is not straightforward with x-vector training. It is, however, possible with the proposed TTS method. First, we show how to use the TTS method for this purpose. The results show that it can leverage the unlabeled data, but it still requires some labeled data to post-process the embeddings for the final SV system. To develop a completely unsupervised SV system, we apply a self-supervised technique proposed in computer vision research, distillation with no labels (DINO), and compare it to the TTS method. The results show that the DINO method outperforms the TTS method in unsupervised scenarios and enables SV with no labels.
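For readers unfamiliar with DINO, the sketch below illustrates its core objective as generally described for the vision setting: a student network is trained to match the centered, sharpened output distribution of a teacher network on a different view of the same input, and the teacher is updated as an exponential moving average of the student. It is a simplified illustration only. The function names and default temperatures are assumptions, and treating two segments of the same utterance as the two views is an assumption about how the idea might carry over to speech, not a description of the proposal's implementation.

```python
import math

def log_softmax(logits, temperature):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_total = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_total for s in scaled]

def dino_loss(student_logits, teacher_logits, center,
              student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy between the teacher and student output distributions.

    The teacher output is centered (to discourage collapse onto one dimension)
    and sharpened with a lower temperature. No gradient would flow through the
    teacher in a real training loop. Temperatures here are illustrative.
    """
    centered = [t - c for t, c in zip(teacher_logits, center)]
    teacher_probs = [math.exp(lp) for lp in log_softmax(centered, teacher_temp)]
    student_log_probs = log_softmax(student_logits, student_temp)
    return -sum(p * lq for p, lq in zip(teacher_probs, student_log_probs))

def ema_update(teacher_weights, student_weights, momentum=0.996):
    """Move teacher weights toward the student as an exponential moving average."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_weights, student_weights)]
```

Because the training target comes from the teacher rather than from speaker labels, the loss can be computed on unlabeled recordings. It is small when the student agrees with the teacher across views and large when it does not.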

Future work will focus on 1) exploring the DINO-based method in semi-supervised scenarios, and 2) fine-tuning the network for downstream tasks such as emotion recognition.

Committee Members

  • Najim Dehak, Department of Electrical and Computer Engineering
  • Jesús Villalba, Department of Electrical and Computer Engineering
  • Sanjeev Khudanpur, Department of Electrical and Computer Engineering
  • Hynek Hermansky, Department of Electrical and Computer Engineering
  • Laureano Moro-Velazquez, Department of Electrical and Computer Engineering