Calendar

Oct
23
Fri
Dissertation Defense: Gaspar Tognetti
Oct 23 @ 2:00 pm

Note: This is a virtual presentation. Here is the link for where the presentation will be taking place.

Title: Circuits and Architecture for Bio-Inspired AI Accelerators

Abstract: Technological advances in microelectronics envisioned through Moore’s law have led to more powerful processors that can handle complex and computationally intensive tasks. Nonetheless, these advancements through technology scaling have come at the unfavorable cost of significantly larger power consumption, which has posed challenges for data processing centers and computers at scale. Moreover, with the emergence of mobile computing platforms constrained by power and bandwidth for distributed computing, the necessity for more energy-efficient scalable local processing has become more significant.

Unconventional Compute-in-Memory (CiM) architectures such as the analog winner-takes-all associative-memory, the Charge-Injection Device (CID) processor, and analog-array processing have been proposed as alternatives. Unconventional charge-based computation has been employed for neural network accelerators in the past, where impressive energy efficiency per operation has been attained in 1-bit vector-vector multiplications (VMMs), and in recent work, multi-bit vector-vector multiplications. A similar approach was used in earlier work, where a charge-injection device array was utilized to store binary coded vectors, and computations were done using binary or multi-bit inputs in the charge domain; computation is carried out by counting quanta of charge at the thermal noise limit, using packets of about 1000 electrons. These systems are neither analog nor digital in the traditional sense but employ mixed-signal circuits to count the packets of charge and hence we call them Quasi-Digital. By amortizing the energy costs of the mixed-signal encoding/decoding over compute-vectors with a large number of elements, high energy efficiencies can be achieved.

In this dissertation, I present a design framework for AI accelerators using scalable compute-in-memory architectures. On the device level, two primitive elements are designed and characterized as target storage technologies: (i) a multilevel non-volatile computational cell and (ii) a pseudo Dynamic Random-Access Memory (pseudo-DRAM) computational bit-cell. Experimental results in deep-submicron CMOS processes demonstrate successful operation; subsequently, behavioral models were developed and employed in large-scale system simulations and emulations. Thereafter, at the level of circuit description, compute-in-memory crossbars and mixed-signal circuits were designed, allowing seamless connectivity to digital controllers. At the level of data representation, both binary and stochastic-unary coding are used to compute Vector-Vector Multiplications (VMMs) at the array level, demonstrating successful experimental results and providing insight into the integration requirements that larger systems may demand. Finally, on the architectural level, two AI accelerator architectures for data center processing and edge computing are discussed. Both designs are scalable multi-core Systems-on-Chip (SoCs), where vector-processor arrays are tiled on a 2-layer Network-on-Chip (NoC), enabling neighbor communication and a flexible compute vs. memory trade-off. General-purpose Arm/RISC-V co-processors provide adequate bootstrapping and system housekeeping, and a high-speed interface fabric facilitates Input/Output to main memory.

Committee Members

Andreas Andreou, Department of Electrical and Computer Engineering

Ralph Etienne-Cummings, Department of Electrical and Computer Engineering

Philippe Pouliquen, Department of Electrical and Computer Engineering

Dissertation Defense: Ruizhi Li
Oct 23 @ 2:00 pm

Note: This is a virtual presentation. Here is the link for where the presentation will be taking place.

Title: An Efficient and Robust Multi-Stream Framework for End-to-End Speech Recognition

Abstract: In voice-enabled domestic or meeting environments, distributed microphone arrays aim to process distant-speech interaction into text with high accuracy. However, with dynamic corruption from noise, reverberation, or human movement present, there is no guarantee that any microphone array (stream) is constantly informative. In these cases, an appropriate strategy to dynamically fuse streams or select the most informative array is necessary.

The multi-stream paradigm in Automatic Speech Recognition (ASR) considers scenarios where parallel streams carry diverse or complementary task-related knowledge. Such streams could be defined as microphone arrays, frequency bands, various modalities, etc. Hence, a robust stream fusion is crucial to emphasize more informative streams over corrupted ones, especially under unseen conditions. This thesis focuses on improving the performance and robustness of speech recognition in multi-stream scenarios.

In recent years, with the increasing use of Deep Neural Networks (DNNs) in ASR, End-to-End (E2E) approaches, which directly transcribe human speech into text, have received greater attention. In this thesis, a multi-stream framework is presented based on a joint Connectionist Temporal Classification/Attention (CTC/ATT) E2E model, where parallel streams are represented by separate encoders. On top of the regular attention networks, a secondary stream-fusion network is introduced to steer the decoder toward the most informative streams. Two representative frameworks are proposed: Multi-Encoder Multi-Array (MEM-Array) and Multi-Encoder Multi-Resolution (MEM-Res).
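
As a rough illustration of the stream-fusion idea described above, the following sketch combines per-stream context vectors using softmax weights. It is a minimal toy under stated assumptions, not the model from the thesis: the function names, the two-dimensional context vectors, and the hand-picked relevance scores are all hypothetical, and in a real system the scores would come from a learned fusion network.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_streams(contexts, scores):
    """Return the softmax-weighted sum of per-stream context vectors.

    contexts: list of equal-length vectors, one per stream (encoder).
    scores:   one relevance score per stream (hypothetical values here;
              a trained fusion network would normally produce them).
    """
    weights = softmax(scores)
    dim = len(contexts[0])
    fused = [sum(w * c[i] for w, c in zip(weights, contexts)) for i in range(dim)]
    return fused, weights

# Two toy streams; the first is scored as more informative.
contexts = [[1.0, 0.0], [0.0, 1.0]]
fused, weights = fuse_streams(contexts, scores=[2.0, 0.0])
```

Because the weights sum to one, the fused vector stays on the same scale as the individual stream contexts, and a stream with a low score contributes proportionally less.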

The MEM-Array model aims at improving far-field ASR robustness using multiple microphone arrays, which are activated by separate encoders. With an increasing number of streams (encoders) requiring substantial memory and massive amounts of parallel data, a practical two-stage training strategy is designed to address these issues. Furthermore, a two-stage augmentation scheme is presented to improve the robustness of the multi-stream model, where a small amount of parallel data is sufficient to achieve competitive results. In MEM-Res, two heterogeneous encoders with different architectures, temporal resolutions, and separate CTC networks work in parallel to extract complementary information from the same acoustics. Compared with the best single-stream performance, both models achieve substantial improvement and also outperform various conventional fusion strategies.

While the proposed framework optimizes information in multi-stream scenarios, this thesis also studies Performance Monitoring (PM) measures to predict whether the recognition result of an end-to-end model is reliable, without ground-truth knowledge. Four different PM techniques are investigated, suggesting that PM measures on attention distributions and decoder posteriors are well correlated with true performance.

Committee Members

Hynek Hermansky, Department of Electrical and Computer Engineering

Shinji Watanabe, Department of Electrical and Computer Engineering

Najim Dehak, Department of Electrical and Computer Engineering

Gregory Sell, JHU Human Language Technology Center of Excellence

Dec
16
Wed
Dissertation Defense: Tsan Zhao
Dec 16 @ 2:00 pm

Note: This is a virtual presentation. Here is the link for where the presentation will be taking place.

Title: Medical Image Modality Synthesis and Resolution Enhancement Based on Machine Learning Techniques

Abstract: To achieve satisfactory performance from automatic medical image analysis algorithms such as registration or segmentation, medical imaging data with the desired modality/contrast and high isotropic resolution are preferred, yet they are not always available. We addressed this problem in this thesis using 1) image modality synthesis and 2) resolution enhancement.

The first contribution of this thesis is a computed tomography (CT)-to-magnetic resonance imaging (MRI) image synthesis method, which was developed to provide MRI when CT is the only modality that is acquired. The main challenges are that CT has poor contrast as well as high noise in soft tissues and that the CT-to-MR mapping is highly nonlinear. To overcome these challenges, we developed a convolutional neural network (CNN) which is a modified U-net. With this deep network for synthesis, we developed the first segmentation method that provides detailed grey matter anatomical labels on CT neuroimages using synthetic MRI.

The second contribution is a method for resolution enhancement for a common type of acquisition in clinical and research practice, one in which there is high resolution (HR) in the in-plane directions and low resolution (LR) in the through-plane direction. The challenge of improving the through-plane resolution for such acquisitions is that state-of-the-art CNN-based super-resolution methods are sometimes not applicable due to the lack of external LR/HR paired training data. To address this challenge, we developed a self super-resolution algorithm called SMORE and its iterative version called iSMORE, which are CNN-based yet do not require LR/HR paired training data other than the subject image itself. SMORE/iSMORE create training data from the HR in-plane slices of the subject image itself, then train and apply CNNs to through-plane slices to improve spatial resolution and remove aliasing. In this thesis, we perform SMORE/iSMORE on multiple simulated and real data sets to demonstrate their accuracy and generalizability. Also, SMORE as a preprocessing step is shown to improve segmentation accuracy.
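
The self-supervised step, creating LR/HR training pairs from the subject's own HR in-plane slices, can be sketched as follows. This is only an illustrative toy: the abstract does not specify the degradation model, so averaging blocks of rows (a crude stand-in for blur plus downsampling) is an assumption, as are the function names and the tiny 4x2 slice.

```python
def degrade_rows(slice_2d, factor):
    """Simulate low resolution along the row axis of a 2D slice.

    Averages each group of `factor` consecutive rows, then repeats the
    averaged row `factor` times so the output keeps the input shape.
    Block averaging is an assumed degradation, chosen for simplicity.
    """
    n_rows = len(slice_2d)
    if n_rows % factor != 0:
        raise ValueError("row count must be divisible by factor")
    degraded = []
    for start in range(0, n_rows, factor):
        block = slice_2d[start:start + factor]
        mean_row = [sum(col) / factor for col in zip(*block)]
        degraded.extend([list(mean_row) for _ in range(factor)])
    return degraded

def make_training_pairs(in_plane_slices, factor):
    """Pair each HR in-plane slice with its artificially degraded version."""
    return [(degrade_rows(s, factor), s) for s in in_plane_slices]

# One toy HR slice with 4 rows and 2 columns.
hr_slice = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]
pairs = make_training_pairs([hr_slice], factor=2)
lr_slice, target = pairs[0]
```

A network trained on such pairs learns to map degraded slices back to sharp ones; it could then be applied to the genuinely low-resolution through-plane slices of the same image.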

In summary, CT-to-MR synthesis, SMORE, and iSMORE were demonstrated in this thesis to be effective preprocessing algorithms for visual quality and other automatic medical image analysis such as registration or segmentation.

Committee Members

Jerry Prince, Department of Electrical and Computer Engineering

John Goutsias, Department of Electrical and Computer Engineering

Trac Tran, Department of Electrical and Computer Engineering

Feb
11
Thu
ECE Special Seminar: Amir Manbachi
Feb 11 @ 3:05 pm

Note: This is a virtual presentation. Here is the link for where the presentation will be taking place.

Title: Towards building a clinically-inspired ultrasound innovation hub: Design, Development and Clinical Validation of novel Ultrasound hardware for Imaging, Therapeutics, Sensing and other applications.

Abstract: Ultrasound is a relatively established modality with a number of exciting, yet not fully explored applications, ranging from imaging and image-guided navigation, to tumor ablation, neuro-modulation, piezoelectric surgery, and drug delivery. In this talk, Dr. Manbachi will be discussing some of his ongoing projects aiming to address low-frequency bone sonography, minimally invasive ablation of neuro-oncology and implantable sensors for spinal cord blood flow measurements.

Bio: Dr. Manbachi is an Assistant Professor of Neurosurgery and Biomedical Engineering at Johns Hopkins University. His research interests include applications of sound and ultrasound to various neurosurgical procedures. These applications include imaging the spine and brain, detection of foreign body objects, remote ablation of brain tumors, monitoring of blood flow and tissue perfusion, as well as other upcoming interesting applications such as neuromodulation and drug delivery. His teaching activities include mentorship of BME Design Teams as well as close collaboration with clinical experts in Surgery and Radiology at Johns Hopkins.

His previous work included the development of ultrasound-guided spine surgery. He obtained his PhD from the University of Toronto, under the supervision of Dr. Richard S.C. Cobbold. Prior to joining Johns Hopkins, he was a postdoctoral fellow at Harvard-MIT Division of Health Sciences and Technology (2015-16) and the founder and CEO of Spinesonics Medical (2012–2015), a spinoff from his doctoral studies.

Amir is an author on >25 peer-reviewed journal articles, >30 conference proceedings, 10 invention disclosures / patent applications and a book entitled “Towards Ultrasound-guided Spinal Fusion Surgery.” He has mentored 150+ students and has so far raised $1.1M of funding, and his interdisciplinary research has been recognized by a number of awards, including University of Toronto’s 2015 Inventor of the Year award, Ontario Brain Institute 2013 fellowship, Maryland Innovation Initiative and Cohen Translational Funding.

Dr. Manbachi has extensive teaching experience, particularly in the fields of engineering design, medical imaging and entrepreneurship (both at Hopkins and Toronto), for which he received the University of Toronto’s Teaching Excellence award in 2014, as well as Johns Hopkins University career centre’s award nomination for students’ “Career Champion” (2018) and finally Johns Hopkins University Whiting School of Engineering’s Robert B. Pond Sr. Excellence in Teaching Award (2018).

Feb
25
Thu
ECE Seminar: Ashutosh Dutta
Feb 25 @ 3:00 pm

Note: This is a virtual presentation. Here is the link for where the presentation will be taking place.

Title: 5G Security – Opportunities and Challenges

Abstract: Software Defined Networking (SDN) and Network Function Virtualization (NFV) are the key pillars of future networks, including 5G and beyond, which promise to support emerging applications such as enhanced mobile broadband, ultra-low latency, and massive sensing-type applications while providing resiliency in the network. Service providers and other vertical industries (e.g., Connected Cars, IoT, eHealth) can leverage SDN/NFV to provide flexible and cost-effective service without compromising the end user quality of service (QoS). While NFV and SDN open the door for flexible networks and rapid service creation, they offer security opportunities while also introducing additional challenges and complexities in some cases. With the rapid proliferation of 4G and 5G networks, operators have now started the trial deployment of network function virtualization, especially with the introduction of various virtualized network elements in the access and core networks. While several standardization bodies (e.g., ETSI, 3GPP, NGMN, ATIS, IEEE) have started looking into the many security issues introduced by SDN/NFV, additional work is needed with the larger security community, including vendors, operators, universities, and regulators.

This talk will address the evolution of cellular technologies towards 5G but will largely focus on various security challenges and opportunities introduced by SDN/NFV and 5G networks such as Hypervisor, Virtual Network Functions (VNFs), SDN controller, orchestrator, network slicing, cloud RAN, edge cloud, and security function virtualization. This talk will introduce a threat taxonomy for 5G security from an end-to-end system perspective, potential threats introduced by these enablers, and associated mitigation techniques. At the same time, some of the opportunities introduced by these pillars will also be discussed. This talk will also highlight some of the ongoing activities within various standards communities and will illustrate a few deployment use case scenarios for security, including threat taxonomy for both operator and enterprise networks.

Bio: Ashutosh Dutta is currently senior scientist and 5G Chief Strategist at the Johns Hopkins University Applied Physics Laboratory (JHU/APL). He is also a JHU/APL Sabbatical Fellow and adjunct faculty at The Johns Hopkins University. Ashutosh also serves as the chair for the Electrical and Computer Engineering Department of the Engineering for Professionals Program at Johns Hopkins University. His career, spanning more than 30 years, includes Director of Technology Security and Lead Member of Technical Staff at AT&T, CTO of Wireless for NIKSUN, Inc., Senior Scientist and Project Manager in Telcordia Research, Director of the Central Research Facility at Columbia University, adjunct faculty at NJIT, and Computer Engineer with TATA Motors. He has more than 100 conference and journal publications and standards specifications, three book chapters, and 31 issued patents. Ashutosh is co-author of the book “Mobility Protocols and Handover Optimization: Design, Evaluation and Application,” published by IEEE and John Wiley & Sons.

As a Technical Leader in 5G and security, Ashutosh has been serving as the founding Co-Chair for the IEEE Future Networks Initiative that focuses on 5G standardization, education, publications, testbed, and roadmap activities. Ashutosh serves as IEEE Communications Society’s Distinguished Lecturer for 2017-2020 and as an ACM Distinguished Speaker (2020-2022). Ashutosh has served as the general Co-Chair for the premier IEEE 5G World Forums and has organized 65 5G World Summits around the world.

Ashutosh served as the chair for IEEE Princeton / Central Jersey Section, Industry Relation Chair for Region 1 and MGA, Pre-University Coordinator for IEEE MGA and vice chair of Education Society Chapter of PCJS. He co-founded the IEEE STEM conference (ISEC) and helped to implement EPICS (Engineering Projects in Community Service) projects in several high schools. Ashutosh has served as the general Co-Chair for the IEEE STEM conference for the last 10 years. Ashutosh served as the Director of Industry Outreach for IEEE Communications Society from 2014 to 2019. He was the recipient of the prestigious 2009 IEEE MGA Leadership award and the 2010 IEEE-USA professional leadership award. Ashutosh currently serves as Member-At-Large for IEEE Communications Society for 2020-2022.

Ashutosh obtained his BS in Electrical Engineering from NIT Rourkela, India; MS in Computer Science from NJIT; and Ph.D. in Electrical Engineering from Columbia University, New York under the supervision of Prof. Henning Schulzrinne. Ashutosh is a Fellow of IEEE and senior member of ACM.

Mar
18
Thu
Dissertation Defense: Vishwanath Sindagi
Mar 18 @ 2:00 pm – 4:00 pm

Note: This is a virtual presentation. Here is the link for where the presentation will be taking place.

Title: Single Image Based Crowd Counting Using Deep Learning

Abstract: Estimating count and density maps from crowd images has a wide range of applications such as video surveillance, traffic monitoring, public safety and urban planning. In addition, techniques developed for crowd counting can be applied to related tasks in other fields of study such as cell microscopy, vehicle counting and environmental survey. The task of crowd counting and density map estimation from a single image is a difficult problem since it suffers from multiple issues like occlusions, perspective changes, background clutter, non-uniform density, and intra-scene and inter-scene variations in scale and perspective. These issues are further exacerbated in highly congested scenes. In order to overcome these challenges, we propose a variety of deep learning architectures that specifically incorporate various aspects such as global/local context information, attention mechanisms, and specialized iterative and multi-level multi-pathway fusion schemes for combining information from multiple layers in a deep network. Through extensive experiments and evaluations on several crowd counting datasets, we demonstrate that the proposed networks achieve significant improvements over existing approaches.
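
To make the link between density maps and counts concrete, the sketch below builds a density map from dot annotations by placing one normalized Gaussian per annotated point, so that summing the map recovers the count. This is a common construction for ground-truth density maps rather than the networks proposed in the thesis; the image size, point coordinates, and `sigma` value are assumptions chosen for illustration.

```python
import math

def density_map(points, height, width, sigma=1.0):
    """Build a density map with one unit-mass Gaussian per annotated point.

    Each kernel is normalized over the image grid, so every annotation
    contributes exactly 1 to the total even when it lies near a border.
    """
    dmap = [[0.0] * width for _ in range(height)]
    for (py, px) in points:
        kernel = [
            [math.exp(-((y - py) ** 2 + (x - px) ** 2) / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)
        ]
        total = sum(map(sum, kernel))
        for y in range(height):
            for x in range(width):
                dmap[y][x] += kernel[y][x] / total
    return dmap

# Three hypothetical head annotations given as (row, column).
points = [(2, 3), (5, 5), (7, 1)]
dmap = density_map(points, height=10, width=10)
estimated_count = sum(map(sum, dmap))
```

A counting network trained to regress such maps yields a count by summing its predicted map, while the map itself shows where in the image the crowd is concentrated.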

We also recognize the need for large amounts of data for training the deep networks and their inability to generalize to new scenes and distributions. To overcome this challenge, we propose novel semi-supervised and weakly-supervised crowd counting techniques that effectively leverage large amounts of unlabeled/weakly-labeled data. In addition to developing techniques with the ability to learn from limited labeled data, we also introduce a new large-scale crowd counting dataset which can be used to train considerably larger networks. The proposed dataset consists of 4,372 high resolution images with 1.51 million annotations. We made explicit efforts to ensure that the images are collected under a variety of diverse scenarios and environmental conditions. The dataset provides a richer set of annotations like dots, approximate bounding boxes, blur levels, etc.

Committee Members

  • Vishal Patel, Department of Electrical and Computer Engineering
  • Rama Chellappa, Department of Electrical and Computer Engineering
  • Alan Yuille, Department of Computer Science
Mar
24
Wed
Dissertation Defense: Puyang Wang
Mar 24 @ 8:30 am

Note: This is a virtual presentation. Here is the link for where the presentation will be taking place.

Title: Deep Learning Based Methods for Ultrasound Image Segmentation and Magnetic Resonance Image Reconstruction

Abstract: In recent years, deep learning (DL) algorithms, in particular convolutional networks, have rapidly become a methodology of choice for analyzing medical images. They have shown promising performance in many medical image analysis (MIA) problems, including classification, segmentation and reconstruction. However, the inherent differences between natural images and medical images (ultrasound, MRI, etc.) have hindered the performance of DL-based methods that were originally designed for natural images. Another obstacle for DL-based MIA comes from the limited availability of large-scale training datasets, as it has been shown that large and diverse datasets can effectively improve the robustness and generalization ability of DL networks.

In this thesis, we develop various deep learning-based approaches to address two medical image analysis problems. In the first problem, we focus on computer assisted orthopedic surgery (CAOS) applications that use ultrasound as the intra-operative imaging modality. This problem requires an automatic and real-time algorithm to detect and segment bone surfaces and shadows in order to guide the orthopedic surgeon to a standardized diagnostic viewing plane with minimal artifacts. Due to the limitation of relatively small datasets and image differences from multiple ultrasound machines, we develop DL-based frameworks that leverage a local phase filtering technique and integrate it into the DL framework, thus improving the robustness.

Finally, we propose a fast and accurate Magnetic Resonance Imaging (MRI) image reconstruction framework using a novel Convolutional Recurrent Neural Network (CRNN). Extensive experiments and evaluation on knee and brain datasets have shown its outstanding results compared to traditional compressed sensing and other DL-based methods. Furthermore, we extend this method to enable multi-sequence reconstruction, where a T2-weighted MRI image can provide guidance and improvement to the reconstruction of an amide proton transfer-weighted MRI image.

Committee Members

  • Vishal M. Patel, Department of Electrical and Computer Engineering
  • Rama Chellappa, Department of Electrical and Computer Engineering
  • Carlos Castillo, Department of Electrical and Computer Engineering
  • Shanshan Jiang, Department of Radiology and Radiological Science
  • Ilker Hacihaliloglu, Department of Biomedical Engineering (Rutgers University)

Mar
26
Fri
Dissertation Defense: Phani Nidadavolu
Mar 26 @ 9:00 am

Note: This is a virtual presentation. Here is the link for where the presentation will be taking place.

Title: Unsupervised Domain Adaptation for Speaker Verification in the Wild

Abstract: Performance of automatic speaker verification (ASV) systems is very sensitive to mismatch between training (source) and testing (target) domains. The best way to address domain mismatch is to perform matched condition training – gather sufficient labeled samples from the target domain and use them in training. However, in many cases this is too expensive or impractical. Usually, gaining access to unlabeled target domain data, e.g., from open source online media, and labeled data from other domains is more feasible. This work focuses on making ASV systems robust to uncontrolled (‘wild’) conditions, with the help of some unlabeled data acquired from such conditions.

Given acoustic features from both domains, we propose learning a mapping function – a deep convolutional neural network (CNN) with an encoder-decoder architecture – between the features of the two domains. We explore training the network in two different scenarios: training on paired speech samples from both domains and training on unpaired data. In the former case, where the paired data is usually obtained via simulation, the CNN is treated as a non-linear regression function and is trained to minimize the L2 loss between original and predicted features from the target domain. Though effective, we provide empirical evidence that this approach introduces distortions that affect verification performance. To address this, we explore training the CNN using an adversarial loss (along with L2), which makes the predicted features indistinguishable from the original ones and thus improves verification performance.

The above framework, though effective, cannot be used to train the network on unpaired data obtained by independently sampling speech from both domains. In this case, we first train a CNN using adversarial loss to map features from source to target. We then map the predicted features back to the source domain using an auxiliary network, and minimize a cycle-consistency loss between the original and reconstructed source features.
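
The cycle-consistency term described above can be sketched in a few lines. This is a toy illustration only: the two mapping networks are replaced by hypothetical hand-written functions, the feature vector is made up, and the use of a mean absolute (L1) distance is an assumption, since the abstract does not state which distance is used.

```python
def l1_loss(a, b):
    """Mean absolute difference between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def cycle_consistency_loss(source_feats, forward_map, backward_map):
    """Map source -> target -> source and compare with the original features."""
    predicted_target = forward_map(source_feats)
    reconstructed = backward_map(predicted_target)
    return l1_loss(source_feats, reconstructed)

# Hypothetical stand-ins for the source-to-target CNN and the auxiliary network.
def forward(v):
    return [2.0 * x + 1.0 for x in v]

def exact_inverse(v):
    return [(x - 1.0) / 2.0 for x in v]

def imperfect_inverse(v):
    return [x / 2.0 for x in v]

feats = [0.0, 1.0, -2.0]
loss_exact = cycle_consistency_loss(feats, forward, exact_inverse)
loss_imperfect = cycle_consistency_loss(feats, forward, imperfect_inverse)
```

The loss is zero only when the backward map undoes the forward map, which is what lets unpaired training constrain the forward map to preserve the content of the source features.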

To prevent the CNN from over-fitting when trained on limited amounts of data, we present a simple regularizing technique. Our unsupervised adaptation approach using feature mapping also complements its supervised counterpart, where adaptation is done using labeled data from both domains. We focus on three domain mismatch scenarios: (1) sampling frequency mismatch between the domains, (2) channel mismatch, and (3) robustness to far-field and noisy speech acquired from wild conditions.

Committee Members

  • Najim Dehak, Department of Electrical and Computer Engineering
  • Jesús Villalba, Department of Electrical and Computer Engineering
  • Hynek Hermansky, Department of Electrical and Computer Engineering
  • Sanjeev Khudanpur, Department of Electrical and Computer Engineering
May
10
Mon
Dissertation Defense: Jordi Abante
May 10 @ 3:00 pm

Note: This is a virtual presentation. Here is the link for where the presentation will be taking place.

Title: Statistical Signal Processing Methods for Epigenetic Landscape Analysis

Abstract: Since the DNA structure was discovered in 1953, a great deal of effort has been put into studying this molecule in detail. We now know DNA comprises an organism’s genetic makeup and constitutes a blueprint for life. The study of DNA has dramatically increased our knowledge about cell function and evolution and has led to remarkable discoveries in biology and medicine.

Just as DNA is replicated during cell division, several chemical marks are also passed onto progeny during this process. Epigenetics studies these marks and represents a fascinating research area given their crucial role. Among all known epigenetic marks, 5mC DNA methylation is probably one of the most important, given its well-established association with various biological processes, such as development and aging, and with diseases such as cancer. The work in this dissertation focuses primarily on this epigenetic mark, although it has the potential to be applied to other heritable marks.

In the 1940s, Waddington introduced the term epigenetic landscape to conceptually describe cell pluripotency and differentiation. This concept lived in the abstract plane until Jenkinson et al. (2017, 2018) estimated actual epigenetic landscapes from whole-genome bisulfite sequencing (WGBS) data, and the work led to startling results with biological implications in development and disease. Here, we introduce an array of novel computational methods that draw from that work. First, we present CPEL, a method that uses a variant of the original landscape proposed by Jenkinson et al., which, together with a new hypothesis testing framework, allows for the detection of DNA methylation imbalances between homologous chromosomes. Then, we present CpelTdm, a method that builds upon CPEL to perform differential methylation analysis between groups of samples using targeted bisulfite sequencing data. Finally, we extend the original probabilistic model proposed by Jenkinson et al. to estimate methylation landscapes and perform differential analysis from nanopore data.

Overall, this work addresses immediate needs in the study of DNA methylation. The methods presented here can lead to a better characterization of this critical epigenetic mark and enable biological discoveries with implications for diagnosing and treating complex human diseases.

Committee Members

  • John Goutsias, Department of Electrical and Computer Engineering
  • Archana Venkataraman, Department of Electrical and Computer Engineering
  • Sanjeev Khudanpur, Department of Electrical and Computer Engineering
May
24
Mon
Dissertation Defense: Xing Di
May 24 @ 12:00 pm

Note: This is a virtual presentation. Here is the link for where the presentation will be taking place.

Title: Deep Learning Based Face Image Synthesis

Abstract: Face image synthesis is an important problem in the biometrics and computer vision communities due to its applications in law enforcement and entertainment. In this thesis, we develop novel deep neural network models and associated loss functions for two face image synthesis problems, namely thermal to visible face synthesis and visual attribute to face synthesis.

In particular, for thermal to visible face synthesis, we propose a model which makes use of facial attributes to obtain better synthesis. We use attributes extracted from visible images to synthesize attribute-preserved visible images from thermal imagery. A pre-trained attribute predictor network is used to extract attributes from the visible image. Then, a novel multi-scale generator is proposed to synthesize the visible image from the thermal image guided by the extracted attributes. Finally, a pre-trained VGG-Face network is leveraged to extract features from the synthesized image and the input visible image for verification.

In addition, we propose another thermal to visible face synthesis method based on a self-attention generative adversarial network (SAGAN) which allows efficient attention-guided image synthesis. Rather than focusing only on synthesizing visible faces from thermal faces, we also propose to synthesize thermal faces from visible faces. Our intuition is based on the fact that thermal images also contain some discriminative information about the person for verification. Deep features from a pre-trained Convolutional Neural Network (CNN) are extracted from the original as well as the synthesized images. These features are then fused to generate a template which is then used for cross-modal face verification.

Regarding attribute to face image synthesis, we propose the Att2SK2Face model for face image synthesis from visual attributes via sketch. In this approach, we first synthesize a facial sketch corresponding to the visual attributes and then generate the face image based on the synthesized sketch. The proposed framework is based on a combination of two different Generative Adversarial Networks (GANs) – (1) a sketch generator network which synthesizes a realistic sketch from the input attributes, and (2) a face generator network which synthesizes facial images from the synthesized sketch images with the help of facial attributes.

Finally, we propose another synthesis model, called Att2MFace, which can simultaneously synthesize multimodal faces from visual attributes without requiring paired data in different domains for training the network. We introduce a novel generator with multimodal stretch-out modules to simultaneously synthesize multimodal face images. Additionally, multimodal stretch-in modules are introduced in the discriminator which discriminates between real and fake images.

Committee Members

  • Vishal Patel, Department of Electrical and Computer Engineering
  • Rama Chellappa, Department of Electrical and Computer Engineering
  • Carlos Castillo, Department of Electrical and Computer Engineering