We are developing a Python-based visualization suite for multichannel EEG data. Unlike existing tools, our visualizer can load and run PyTorch models on the EEG and plot channel-wise predictions of seizure activity. Additional features include a user-friendly GUI, customizable lowpass and highpass filters for data preprocessing, an annotation editor that allows the user to mark key events in the data, and spectral visualization. The modified EEG data can be exported as an EDF file, and the main viewer can be saved as a high-resolution PNG image. There is also a command-line option for batch processing, as well as standalone apps for Windows and Mac that can be launched without a pre-existing Python installation. [github][Windows App][MAC App]
As a complement to our EEG Visualizer, we have developed an anonymization tool that removes protected health information from EDF files. The tool allows the user to alter the EDF header fields and provides default settings for scrubbing patient IDs and timestamps. [github][Windows App][MAC App]
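To illustrate what scrubbing an EDF header involves, here is a minimal sketch that uses only the Python standard library. It is not the tool's actual code. The byte offsets follow the fixed 256-byte header layout of the EDF specification; the function names (`scrub_header`, `anonymize_file`) and the placeholder values are hypothetical choices made for this example and are not necessarily the tool's defaults.

```python
# Byte ranges of the identifying fields in the fixed 256-byte EDF header.
PATIENT_ID = slice(8, 88)       # 80 bytes: local patient identification
RECORDING_ID = slice(88, 168)   # 80 bytes: local recording identification
START_DATE = slice(168, 176)    # 8 bytes: dd.mm.yy
START_TIME = slice(176, 184)    # 8 bytes: hh.mm.ss
FIXED_HEADER_BYTES = 256


def _field(text, width):
    """Encode text as ASCII, left-justified and space-padded to width bytes."""
    raw = text.encode("ascii")
    if len(raw) > width:
        raise ValueError(f"{text!r} does not fit in {width} bytes")
    return raw.ljust(width)


def scrub_header(header,
                 patient="X X X X",              # placeholder (assumption)
                 recording="Startdate X X X X",  # placeholder (assumption)
                 date="01.01.85",                # placeholder (assumption)
                 time="00.00.00"):               # placeholder (assumption)
    """Return a copy of the header bytes with the identifying fields replaced."""
    if len(header) < FIXED_HEADER_BYTES:
        raise ValueError("an EDF header has at least 256 bytes")
    out = bytearray(header)
    out[PATIENT_ID] = _field(patient, 80)
    out[RECORDING_ID] = _field(recording, 80)
    out[START_DATE] = _field(date, 8)
    out[START_TIME] = _field(time, 8)
    return bytes(out)


def anonymize_file(src_path, dst_path):
    """Copy an EDF file, scrubbing the fixed header and leaving the rest as-is."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(scrub_header(data[:FIXED_HEADER_BYTES]))
        dst.write(data[FIXED_HEADER_BYTES:])
```

Because every replacement has exactly the width of the field it overwrites, the signal headers and data records that follow keep their positions. Note that this sketch does not touch annotation channels, which may also contain identifying text.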
Emotional Speech Conversion
- Chained Encoder-Decoder-Predictor Network with Latent Variable Regularization [github]
This model implements multispeaker emotion conversion based on a chained encoder-decoder-predictor neural network architecture. The encoder constructs a latent embedding of the fundamental frequency (F0) contour and the spectrum, which we regularize using the Large Deformation Diffeomorphic Metric Mapping (LDDMM) registration framework. The decoder uses this embedding to predict the modified F0 contour in a target emotional class. Finally, the predictor uses the original spectrum and the modified F0 contour to generate a corresponding target spectrum. Our joint objective function simultaneously optimizes the parameters of all three model blocks. In addition, the LDDMM regularization allows our model to generalize out of sample and convert phrases not seen during training.
- Non-parallel Emotion Conversion using a Deep-Generative Hybrid Network and an Adversarial Pair Discriminator [github]
We introduce a novel method for emotion conversion in speech that does not require parallel training data. Our approach loosely relies on a cycle-GAN schema to minimize the reconstruction error from converting back and forth between emotion pairs. However, unlike the conventional cycle-GAN, our discriminator classifies whether a pair of real and generated input samples corresponds to the desired emotion conversion (e.g., A to B) or to its inverse (B to A). This setup, which we refer to as a variational cycle-GAN (VC-GAN), is equivalent to minimizing the empirical KL divergence between the source features and their cyclic counterpart. In addition, our generator combines a trainable deep network with a fixed generative block to implement a smooth and invertible transformation on the input features, in our case, the fundamental frequency (F0) contour. This hybrid architecture regularizes our adversarial training procedure. Our model can also generalize to new speakers.
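The two ingredients described above, the cyclic reconstruction error and the pair-wise discriminator input, can be illustrated with a toy sketch in plain Python. This is not the released model. The affine "generators", the function names, the label convention (1 for A to B, 0 for B to A), and the (real, generated) ordering of each pair are all assumptions made for this example.

```python
def cycle_error(f0, g_ab, g_ba):
    """Mean absolute error between an F0 contour and its A -> B -> A round trip."""
    cycled = g_ba(g_ab(f0))
    return sum(abs(x - y) for x, y in zip(f0, cycled)) / len(f0)


def discriminator_pairs(real_a, real_b, g_ab, g_ba):
    """Build (pair, label) examples for a pair discriminator.

    Each pair holds a real contour and its generated conversion.
    Label 1 marks the desired direction (A to B), label 0 its inverse (B to A).
    """
    return [
        ((real_a, g_ab(real_a)), 1),
        ((real_b, g_ba(real_b)), 0),
    ]


# Toy stand-ins for the generators: an invertible affine map on F0 values (Hz).
def to_b(f0):
    return [1.2 * v + 10.0 for v in f0]


def to_a(f0):
    return [(v - 10.0) / 1.2 for v in f0]


contour_a = [110.0, 120.0, 135.0, 128.0]
contour_b = [150.0, 171.0, 166.0]

round_trip_error = cycle_error(contour_a, to_b, to_a)   # close to zero here
pairs = discriminator_pairs(contour_a, contour_b, to_b, to_a)
```

In this toy setting the round trip is exact because `to_a` inverts `to_b`; in an actual model the generators are learned, and the cyclic error becomes one term of the training objective.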
- Sample, Attend and Morph: A Deep-Bayesian Framework for Adaptive Speech Duration Modification [github]
We propose the first method to adaptively modify the duration of a given speech signal. Our approach uses a Bayesian framework to define a latent attention map that links frames of the input and target utterances. We train a masked convolutional encoder-decoder to produce this attention map via a stochastic version of the mean absolute error loss function. Our model first predicts the length of the target speech signal using the encoder embeddings. The predicted length determines the number of steps for the decoder operation. During testing, we first generate the attention map as a proxy for the similarity matrix between the given input speech and an unknown target speech signal. Using this matrix, we compute a warping path that aligns the two signals.
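As a rough illustration of the last step, the sketch below recovers a monotonic warping path from a similarity matrix with standard dynamic time warping. It is a generic sketch, not necessarily the procedure used in the released code. It assumes similarities in [0, 1] (rows index input frames, columns index target frames) and converts them to costs as `1 - similarity`. The helper `warp_frames` is a hypothetical name that shows one simple way to resample input frames along the path.

```python
def warping_path(similarity):
    """Return the minimum-cost monotonic path from (0, 0) to (n-1, m-1)."""
    n, m = len(similarity), len(similarity[0])
    acc = [[float("inf")] * m for _ in range(n)]   # accumulated cost
    back = [[None] * m for _ in range(n)]          # backpointers
    acc[0][0] = 1.0 - similarity[0][0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            # Allowed predecessors: diagonal, previous input frame, previous target frame.
            candidates = []
            if i > 0 and j > 0:
                candidates.append((acc[i - 1][j - 1], (i - 1, j - 1)))
            if i > 0:
                candidates.append((acc[i - 1][j], (i - 1, j)))
            if j > 0:
                candidates.append((acc[i][j - 1], (i, j - 1)))
            best_cost, best_prev = min(candidates, key=lambda c: c[0])
            acc[i][j] = best_cost + (1.0 - similarity[i][j])
            back[i][j] = best_prev
    # Trace the path back from the final cell to the origin.
    path = [(n - 1, m - 1)]
    while path[-1] != (0, 0):
        i, j = path[-1]
        path.append(back[i][j])
    path.reverse()
    return path


def warp_frames(frames, path):
    """Pick one input frame per target frame along the path (last match wins)."""
    source_for_target = {}
    for i, j in path:
        source_for_target[j] = i
    return [frames[source_for_target[j]] for j in sorted(source_for_target)]


# Toy case: two input frames stretched to four target frames.
toy_similarity = [
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
]
toy_path = warping_path(toy_similarity)
stretched = warp_frames(["frame0", "frame1"], toy_path)
```

In the toy case each input frame is matched to two consecutive target frames, so the two-frame input is stretched to four frames. With a predicted target length shorter than the input, the same path would instead drop or merge input frames.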