Thesis Proposal: Honghua Guan
Jul 6 @ 12:30 pm
Thesis Proposal: Honghua Guan

Note: This is a virtual presentation. Here is the link for where the presentation will be taking place.

Title: High-throughput Optical Explorer in Freely-behaving Rodents

Abstract: One critical goal for neuroscience is to explore the mechanisms underlying neuronal information processing. A suitable brain imaging tool is of great significance to be capable of recording clear neuronal signals over prolonged periods. Among different imaging modalities, multiphoton microscopy becomes the choice for in vivo brain applications owing to its subcellular resolution, optical sectioning and deep penetration. The current experimental routine, however, requires head-fixation of animals during data acquisition. This configuration will inevitably introduce unwanted stress and limit many behavior studies such as social interaction. The scanning two-photon fiberscope is a promising technical direction to bridge this gap. Benefiting from its ultra-compact design and light-weight, it is an ideal optical brain imaging modality to assess dynamic neuronal activities in freely-behaving rodents with subcellular resolution. One significant challenge with the compact scanning two-photon fiberscope is its suboptimal imaging throughput due to the limited choices of miniature optomechanical components.

In this project, we present a compact multicolor two-photon fiberscope platform. We achieve three-wavelength excitation by synchronizing the pulse trains from a femtosecond OPO and its pump. The imaging results demonstrate that we can excite several different fluorescent proteins simultaneously with an optimal excitation efficiency. In addition, we propose a deep neural network (DNN) based solution that significantly improves the imaging frame rate with minimal loss in image quality. This innovation enables 10-fold speed enhancement for the scanning two-photon fiberscope, making it feasible to perform video-rate (26 fps) two-photon imaging in freely-moving mice with excellent imaging resolution and SNR that were previously not possible.

Committee Members

  • Xingde Li, Department of Biomedical Engineering
  • Mark Foster, Department of Electrical and Computer Engineering
  • Jing U. Kang, Department of Electrical and Computer Engineering
  • Israel Gannot, Department of Electrical and Computer Engineering
  • Hui Lu, Department of Pharmacology and Physiology, George Washington University
Thesis Proposal: Jaejin Cho
Sep 23 @ 3:00 pm
Thesis Proposal: Jaejin Cho

Note: This is a virtual presentation. Here is the link for where the presentation will be taking place.

Title: Improving speaker embedding in speaker verification: Beyond speaker discrimanitive training

Abstract: Speaker verification (SV) is a task to verify a claimed identity from the voice signal. A well-performing SV system requires a method to transform a variable-length recording into a fixed-length representation (a.k.a. embedding vector), compacting the speaker biometric information that captures distinctive features over different speakers. There are two popular methods: i-vector and x-vector. Although i-vector is still used nowadays, x-vector outperforms i-vector in many SV tasks as deep learning research surges. The x-vector, however, has limitations, and we mainly tackle two of them in this proposal: 1) the embedding still includes information about the spoken text, 2) it cannot leverage data that do not have speaker labels since the training requires the labels.

In the first half, we tackle the text-dependency in the x-vector speaker embedding. Spoken text remaining in x-vector can degrade its performance in text-independent SV because utterances of the same speaker may have different embeddings due to different spoken text. This could lead to a false rejection, i.e., the system rejects a valid target speaker. To tackle this issue, we propose to disentangle the spoken text and speaker identity into separate latent factors using a text-to-speech (TTS) model. First, the multi-speaker end-to-end TTS system has text and speech encoders, each of which focuses on encoding information in its corresponding modality. These encoders enable text-independent speaker embedding learning by reconstructing the frames of a target speech segment, given a speaker embedding of another speech segment of the same utterance. Second, many efforts to the neural TTS research over recent years have improved the speech synthesis quality. We hypothesize that speech synthesis and speaker embedding qualities positively correlate since the speaker encoder in a TTS system needs to learn well for better speech synthesis of multiple speakers. We confirm the above two points through a series of experiments.

In the second half, we focus on leveraging unlabeled data to learn embedding. Considering that much more unlabeled data exists than labeled data, leveraging the unlabeled data is essential, which is not straightforward with the x-vector training. This, however, is possible with the proposed TTS method. First, we show how to use the TTS method for this purpose. The results show that it can leverage the unlabeled data, but it still requires some labeled data to post-process the embeddings for the final SV system. To develop a completely unsupervised SV system, we apply a self-supervised technique proposed in computer vision research, distillation with no labels (DINO), and compare this to the TTS method. The results show that the DINO method outperforms the TTS method in unsupervised scenarios and enables SV with no labels.

Future work will focus on 1) exploring the DINO-based method in semi-supervised scenarios, 2) fine-tuning the network for downstream tasks such as emotion recognition.

Committee Members

  • Najim Dehak, Department of Electrical and Computer Engineering
  • Jesús Villalba, Department of Electrical and Computer Engineering
  • Sanjeev Khudanpur, Department of Electrical and Computer Engineering
  • Hynek Hermansky, Department of Electrical and Computer Engineering
  • Laureano Moro-Velazquez, Department of Electrical and Computer Engineering
Back to top