Calendar

Jun
18
Thu
Dissertation Defense: Yansong Zhu
Jun 18 @ 1:00 pm

This presentation will be taking place remotely. Follow this link to enter the Zoom meeting where it will be hosted. Do not enter the meeting before 12:45 PM EDT. 

Title: Improved Modeling and Image Generation for Fluorescence Molecular Tomography (FMT) and Positron Emission Tomography (PET)

Abstract: In this thesis, we aim to improve quantitative medical imaging with advanced image generation algorithms. We focus on two specific imaging modalities: fluorescence molecular tomography (FMT) and positron emission tomography (PET).

In the case of FMT, we present a novel photon propagation model for its forward model and, in addition, propose and investigate a reconstruction algorithm for its inverse problem. In the first part, we develop a novel Neumann-series-based radiative transfer equation (RTE) that incorporates reflection boundary conditions in the model. In addition, we propose a novel reconstruction technique for diffuse optical imaging that incorporates this Neumann-series-based RTE as the forward model. The proposed model is assessed using a simulated 3D diffuse optical imaging setup, and the results demonstrate the importance of considering photon reflection at boundaries when performing photon propagation modeling. In the second part, we propose a statistical reconstruction algorithm for FMT. The algorithm is based on sparsity-initialized maximum-likelihood expectation maximization (MLEM), taking into account the Poisson nature of data in FMT and the sparse nature of images. The proposed method is compared with a pure sparse reconstruction method as well as a uniform-initialized MLEM reconstruction method. Results indicate the proposed method is more robust to noise and shows improved qualitative and quantitative performance.
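
As a rough illustration of the MLEM step named above, the following Python sketch applies the classical multiplicative MLEM update for Poisson data to a toy linear system. The matrix A, the data y, and both starting images are made-up values chosen for illustration; the thesis's actual sparsity-based initialization and FMT forward model are not reproduced here.

```python
def mlem(A, y, x0, n_iter=50, eps=1e-12):
    """Classical MLEM iteration for data modeled as y ~ Poisson(A x)."""
    n_meas, n_vox = len(A), len(A[0])
    # Sensitivity image: column sums of A (that is, A^T applied to a vector of ones).
    sens = [sum(A[i][j] for i in range(n_meas)) for j in range(n_vox)]
    x = list(x0)
    for _ in range(n_iter):
        # Forward projection: predicted measurements A x.
        proj = [sum(A[i][j] * x[j] for j in range(n_vox)) for i in range(n_meas)]
        # Ratio of measured to predicted counts (guarded against division by zero).
        ratio = [y[i] / max(proj[i], eps) for i in range(n_meas)]
        # Back-project the ratio and apply the multiplicative update.
        x = [
            x[j] * sum(A[i][j] * ratio[i] for i in range(n_meas)) / max(sens[j], eps)
            for j in range(n_vox)
        ]
    return x


# Toy, noise-free example with a sparse true image (hypothetical numbers).
A = [[0.8, 0.1, 0.1],
     [0.1, 0.8, 0.1],
     [0.1, 0.1, 0.8]]
x_true = [0.0, 5.0, 0.0]
y = [sum(a * x for a, x in zip(row, x_true)) for row in A]

x_uniform = mlem(A, y, x0=[1.0, 1.0, 1.0])      # uniform initialization
x_sparse_init = mlem(A, y, x0=[0.1, 4.0, 0.1])  # stand-in for a sparse initial estimate
```

Because the update is multiplicative, a non-negative starting image stays non-negative, and a voxel that starts at exactly zero stays at zero. This is one reason the choice of initial image can matter for sparse targets.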

For PET, we present an MRI-guided partial volume correction algorithm for brain imaging, aiming to recover the qualitative and quantitative loss due to the limited resolution of the PET system while keeping image noise at a low level. The proposed method is based on an iterative deconvolution model with regularization using parallel level sets. A non-smooth optimization algorithm is developed so that the proposed method can be feasibly applied to 3D images and avoids the additional blurring caused by conventional smooth optimization processes. We evaluate the proposed method using both simulation data and in vivo human data collected from the Baltimore Longitudinal Study of Aging (BLSA). Our proposed method is shown to generate images with reduced noise and improved structural detail, as well as an increased number of statistically significant voxels in a study of aging. Results demonstrate that our method has promise to provide superior performance in clinical imaging scenarios.

Thesis Committee

  • Arman Rahmim, Department of Electrical and Computer Engineering, Department of Radiology and Radiological Sciences (advisor, primary reader)
  • Yong Du, Department of Radiology and Radiological Sciences (secondary reader)
  • Jin Kang, Department of Electrical and Computer Engineering
  • Trac Tran, Department of Electrical and Computer Engineering
Jul
28
Tue
Dissertation Defense: Ben Skerritt-Davis
Jul 28 @ 10:00 am

This presentation will be taking place remotely. Follow this link to enter the Zoom meeting where it will be hosted. Do not enter the meeting before 9:45 AM EDT.

Title: Statistical Inference in Auditory Perception

Abstract: The human auditory system effortlessly parses complex sensory inputs despite the ever-present randomness and uncertainty in real-world scenes. To achieve this, the brain tracks sounds as they evolve in time, collecting contextual information to construct an internal model of the external world for predicting future events. Previous work has shown the brain is sensitive to many predictable (and often complex) patterns in sequential sounds. However, real-world environments exhibit a broader spectrum of predictability, and moreover, the level of predictability is constantly in flux. How does the brain build robust internal representations of such stochastic and dynamic acoustic environments?

This question is addressed through the lens of a computational model based in statistical inference. Embodying theories from Bayesian perception and predictive coding, the model posits the brain collects statistical estimates from sounds and maintains multiple hypotheses for the degree of context to include in predictive processes. As a potential computational solution for perception of complex and dynamic sounds, this model is used to connect sensory inputs with listeners’ responses in a series of human behavioral and electroencephalography (EEG) experiments incorporating uncertainty. Experimental results point toward the underlying sufficient statistics collected by the brain, and the extension of these statistical representations to multiple dimensions is examined along spectral and spatial dimensions. The computational model guides interpretation of behavioral and neural responses, revealing multiplexed responses in the brain corresponding to different levels of predictive processing. In addition, the model is used to explain individual differences across listeners highlighted by uncertainty.

The proposed computational model was developed based on first principles, and its usefulness is not limited to the experiments presented here. The model was used to replicate a range of previous findings in the literature, unifying them under a single framework. Moving forward, this general and flexible model can be used as a broad-ranging tool for studying the statistical inference processes behind auditory perception, overcoming the need to minimize uncertainty in perceptual experiments and pushing what was previously considered feasible for study in the laboratory towards what is typically encountered in the “messy” environments of everyday listening.

Committee Members

Mounya Elhilali, Department of Electrical and Computer Engineering

Jason Fischer, Department of Psychological & Brain Sciences

Hynek Hermansky, Department of Electrical and Computer Engineering

James West, Department of Electrical and Computer Engineering

Aug
21
Fri
Dissertation Defense: Gary Li
Aug 21 @ 11:00 am

This presentation will be taking place remotely. Follow this link to enter the Zoom meeting where it will be hosted. Do not enter the meeting before 10:45 AM EDT.

Title: Task-based Optimization of Administered Activity for Pediatric Renal SPECT Imaging

Abstract: Like any real-world problem, the design of an imaging system always requires tradeoffs. For medical imaging modalities using ionizing radiation, a major tradeoff is between diagnostic image quality (IQ) and risk to the patient from absorbed dose (AD). In nuclear medicine, reducing the AD requires reducing the administered activity (AA). Lower AA to the patient can reduce risk and adverse effects, but can also result in reduced diagnostic image quality. Thus, ultimately, it is desirable to use the lowest AA that gives sufficient image quality for accurate clinical diagnosis.

In this dissertation, we proposed and developed tools for a general framework for optimizing RD with task-based assessment of IQ. Here, IQ is defined as an objective measure of how well the user performs the diagnostic task for which the images were acquired. To investigate IQ as a function of renal defect detectability, we have developed a projection image database modeling imaging of 99mTc-DMSA, a renal function agent. The database uses a highly realistic population of pediatric phantoms with anatomical and body morphological variations. Using the developed projection image database, we have explored patient factors that affect IQ and are currently in the process of determining relationships between IQ and AA in terms of these factors. Our data have shown that factors that are more local to the target organ may be more robust than weight for estimating the AA needed to provide a constant IQ across a population of patients. In the case of renal imaging, we have discovered that girth is more robust than weight (currently used in clinical practice) in predicting the AA needed to provide a desired IQ. In addition to exploring the patient factors, we also worked on improving the task-simulating capability of the anthropomorphic model observer. We proposed a deep learning-based anthropomorphic model observer to fully and efficiently (in terms of both training data and computational cost) model the clinical 3D detection task using multi-slice, multi-orientation image sets. The proposed model observer is important and could be readily adapted to model human observer performance on detection tasks for other imaging modalities such as PET, CT or MRI.

Committee Members

Eric Frey – Department of Radiology and Radiological Science. Faculty adviser.

Yong Du – Department of Radiology and Radiological Science. Second reader.

Vishal Patel – Department of Electrical and Computer Engineering.

George Sgouros – Department of Radiology and Radiological Science.

Archana Venkataraman – Department of Electrical and Computer Engineering.

Dissertation Defense: Nathan Henry
Aug 21 @ 11:00 am

This presentation will be taking place remotely. Follow this link to enter the Zoom meeting where it will be hosted. Do not enter the meeting before 10:45 AM EDT.

Title: Mid-Infrared and Terahertz Frequency Combs from Quantum Cascade Lasers

Abstract: Optical frequency combs (FC) allow for extremely high resolution and broadband spectroscopic measurements that are captured contemporaneously rather than through some scanning action. Spectroscopic access to the infrared and THz is highly coveted as many molecular resonances lie in this region. However, due to a lack of available materials, emission of FC in the IR has been difficult, with many attempts resulting in low power and efficiency. In 2014 [1] the first mid-IR FC was characterized from a free-running QCL, requiring no extra elements. However, due to the inherently short upper-state lifetime of the laser, the FC is atypical in that it is not characterized by pulses but rather by frequency modulation (FM). While the QCL FC has advanced significantly, it is not fully understood. As a result, spectroscopic measurements can become unreliable and sensitive to environmental changes, and recovery of absolute frequency can be difficult.

To better understand the FC QCL, a set of rate equations adapted from the optical Bloch equations is developed and found to be fully adequate for describing the origins and dynamics of FM FC. This work addresses two modes of operation (pseudo-random and chirped FM), calculating the dynamics of a QCL modeled after real-world measurements. Using specifications of real-world QCLs (THz and IR), the gain is modeled under various operational scenarios and the most efficient state is identified. The period of the FM is postulated to be determined by the relative strengths of the various hole burning mechanisms, and stability is shown for multiple regimes.

Further work is presented addressing the stability of QCL FCs. We begin by deriving the linewidth of the FC-generating QCL and show that it can indeed be just as narrow as that of more conventional FCs. Subsequent to this work, we use a two-dimensional model to achieve an engineered power-law dispersion, which can mitigate offset frequency drift, offering the potential to significantly lower the phase noise. It is the hope of the author that this research will be used to develop a deeper understanding of FC-producing QCLs that contribute to many fields of human endeavor such as medical diagnostics, remote sensing, time standardization, etc.

Committee Members

Jacob Khurgin – Department of Electrical and Computer Engineering. Adviser.

Susanna Thon – Department of Electrical and Computer Engineering.

Amy Foster – Department of Electrical and Computer Engineering.

Sep
4
Fri
Dissertation Defense: Yida Lin
Sep 4 @ 1:30 pm

This presentation will be taking place remotely. Follow this link to enter the Zoom meeting where it will be hosted. Do not enter the meeting before 1:15 PM EDT.

Title: Extending the Potential of Thin-film Optoelectronics via Optical Engineering

Abstract: Optoelectronics based on nanomaterials have become a research focus in recent years, and this technology bridges the fields of solid-state physics, electrical engineering and materials science. The rapid development in optoelectronic devices in the last century has both benefited from and spurred advancements in the science and engineering of photon detection and manipulation, image sensing, high-efficiency and high-power-density light emission, displays, communications and renewable energy harvesting. A particularly promising material class for optoelectronics is colloidal nanomaterials, due to their functionality, cost-efficiency and even new physics, thanks to their exotic properties in the areas of light-matter interaction, low-dimensionality, and solution-processability which dramatically reduces the time and cost required to fabricate thin film devices, and at the same time provides wide compatibility with existing materials interfaces and device structures. This thesis focuses on exploring and assessing the capabilities of lead sulfide quantum dot-based solar cells and photodetectors. The discussion involves advances in techniques such as implementing novel photonic structures, designing and building novel characterization systems and methods, and coupling to external optical structures and components.

This thesis comprises three sections. The first section focuses on the design and adaptation of photonic structures to tailor the function and response of photovoltaics and other absorption-based optoelectronics for specific applications. In the first part, we introduce consideration of complete multi-layer thin film interference effects into the design of solar cells. By numerical calculation and optimization of the film thicknesses as well as precise fabrication control, devices with specific target colors or optical transparency levels were achieved. In the second part, we investigate the presence of 2D photonic crystal bands in absorbing materials that can be readily incorporated into nanomaterial thin films through nanostructuring of the material. We carried out simulations and theoretical analyses and proposed a method to realize simultaneous selectivity in the device reflection, transmission and absorption spectra that are critical for optoelectronic applications.

The next section focuses on designing and building a multi-modal microscopy system for thin-film optoelectronic devices, accompanied by analyses and explanation of complex experimental data. The goal of the system was to provide simultaneous 2D spatial measurements including, but not limited to, photoluminescence spectra, time-resolved photocurrent and photovoltage responses, and a rich variety of combinations of these measurements and their associated derived quantities, collected with micrometer resolution. The multi-dimensional data helped us understand the intercorrelation between local defective regions in films and the entire device behavior, and provided a more comprehensive profile of the mutual relationships between solar cell figures of merit.

In the last section, we discuss a new implementation of miniature solar concentrator arrays for lead sulfide quantum dot solar cells. First, we design and analyze the effects of a medium concentration ratio lens-type concentrator made from polydimethylsiloxane, a flexible organosilicon polymer. The concentrators were designed and optimized with the aid of ray-tracing simulation tools to achieve the best compatibility with colloidal nanomaterial-based solar cells. Experimentally, we produced an integrated concentrator system delivering 20-fold current and power enhancements close to the theoretical predictions, and also used our concentrator measurements to explain the rarely explored carrier dynamics critical to high-power operation of thin film solar cells. Next, we design a wide-acceptance-angle dielectric solar concentrator that can be adapted to many types of high-efficiency small-area solar cells. The design was generated using rigorous optical models that define the behaviors of light rays and was verified with ray-tracing optical simulations to yield results for the full annual 2D time-resolved collectible power for the resulting system. Finally, we discuss strategies for further extending the possibilities of nanomaterial-based optoelectronics for future challenges in energy production and related applications.

Committee Members

Susanna Thon – Department of Electrical and Computer Engineering

Jacob Khurgin – Department of Electrical and Computer Engineering

Mark Foster – Department of Electrical and Computer Engineering

Oct
16
Fri
Dissertation Defense: Golnoosh Kamali
Oct 16 @ 12:00 pm

Note: This is a virtual presentation. Here is the link for where the presentation will be taking place.

Title: Transfer function models of cortico-cortical evoked potentials for the localization of seizures in medically refractory epilepsy patients

Abstract: Surgical resection of the seizure onset zone (SOZ) could potentially lead to seizure-freedom in medically refractory epilepsy (MRE) patients. However, localizing the SOZ is a time-consuming, subjective process involving visual inspection of intracranial electroencephalographic (iEEG) recordings captured during invasive passive patient monitoring. Cortical stimulation is currently performed on patients undergoing invasive EEG monitoring for the main purpose of mapping functional brain networks such as language and motor networks. We hypothesized that the evoked responses from single pulse electrical stimulation (SPES) can be used to localize the SOZ as they may express the natural frequencies and connectivity of the iEEG network. We constructed patient-specific transfer function models from evoked responses recorded from 22 MRE patients who underwent SPES evaluation and iEEG monitoring. We then computed the frequency and connectivity dependent “peak gain” of the system, as measured by the H_∞ norm from systems theory, and the corresponding “floor gain,” which is the gain at which the H_∞ dipped 3 dB below the DC gain. In cases for which clinicians had high confidence in localizing the SOZ, the highest peak gain transfer functions with the smallest “floor gains” corresponded to when the clinically annotated SOZ and early spread regions were stimulated. In more complex cases, there was a large spread of the peak gains when the clinically annotated SOZ was stimulated. Interestingly, for patients who had successful surgeries, our ratio of peak-to-floor (PF) gains agreed with clinical localization, no matter the complexity of the case. For patients with failed surgeries, the PF ratio did not match clinical annotations. Our findings suggest that transfer function gains and their corresponding frequency responses computed from SPES evoked responses may improve SOZ localization and thus surgical outcomes.
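
For readers unfamiliar with the term, the “peak gain” of a single-input, single-output transfer function is the largest magnitude of its frequency response, which is its H_∞ norm. The Python sketch below approximates it on a frequency grid for a hypothetical continuous-time second-order system; the system, its parameters (wn, zeta), and the grid are assumptions for illustration and are not the patient-specific models fitted in the thesis.

```python
import math


def gain(omega, wn=1.0, zeta=0.1):
    """Magnitude |H(j*omega)| for H(s) = wn^2 / (s^2 + 2*zeta*wn*s + wn^2)."""
    s = complex(0.0, omega)
    return abs(wn**2 / (s**2 + 2.0 * zeta * wn * s + wn**2))


def peak_gain(n_points=20000, w_min=1e-2, w_max=1e2, wn=1.0, zeta=0.1):
    """Approximate the H-infinity norm as the largest gain on a log-spaced grid."""
    best_w, best_g = w_min, gain(w_min, wn, zeta)
    for k in range(1, n_points):
        w = w_min * (w_max / w_min) ** (k / (n_points - 1))
        g = gain(w, wn, zeta)
        if g > best_g:
            best_w, best_g = w, g
    return best_w, best_g


w_peak, g_peak = peak_gain()
dc_gain = gain(0.0)

# Closed-form peak of this second-order system (valid for zeta < 1/sqrt(2)),
# used only to sanity-check the grid search.
analytic_peak = 1.0 / (2.0 * 0.1 * math.sqrt(1.0 - 0.1**2))
peak_db_above_dc = 20.0 * math.log10(g_peak / dc_gain)
```

A grid search like this only approximates the supremum; a lightly damped resonance needs a fine grid near the resonant frequency, which is why the sketch compares the result against the closed-form value.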

Committee Members

Sridevi V. Sarma, Department of Biomedical Engineering

Joon Y. Kang, Department of Neurology

Archana Venkataraman, Department of Electrical and Computer Engineering

Nathan E. Crone, Department of Neurology

Oct
23
Fri
Dissertation Defense: Gaspar Tognetti
Oct 23 @ 2:00 pm

Note: This is a virtual presentation. Here is the link for where the presentation will be taking place.

Title: Circuits and Architecture for Bio-Inspired AI Accelerators

Abstract: Technological advances in microelectronics envisioned through Moore’s law have led to more powerful processors that can handle complex and computationally intensive tasks. Nonetheless, these advancements through technology scaling have come at an unfavorable cost of significantly larger power consumption, which has posed challenges for data processing centers and computers at scale. Moreover, with the emergence of mobile computing platforms constrained by power and bandwidth for distributed computing, the necessity for more energy-efficient scalable local processing has become more significant.

Unconventional Compute-in-Memory (CiM) architectures such as the analog winner-takes-all associative-memory, the Charge-Injection Device (CID) processor, and analog-array processing have been proposed as alternatives. Unconventional charge-based computation has been employed for neural network accelerators in the past, where impressive energy efficiency per operation has been attained in 1-bit vector-vector multiplications (VMMs), and in recent work, multi-bit vector-vector multiplications. A similar approach was used in earlier work, where a charge-injection device array was utilized to store binary coded vectors, and computations were done using binary or multi-bit inputs in the charge domain; computation is carried out by counting quanta of charge at the thermal noise limit, using packets of about 1000 electrons. These systems are neither analog nor digital in the traditional sense but employ mixed-signal circuits to count the packets of charge and hence we call them Quasi-Digital. By amortizing the energy costs of the mixed-signal encoding/decoding over compute-vectors with a large number of elements, high energy efficiencies can be achieved.

In this dissertation, I present a design framework for AI accelerators using scalable compute-in-memory architectures. On the device level, two primitive elements are designed and characterized as target storage technologies: (i) a multilevel non-volatile computational cell and (ii) a pseudo Dynamic Random-Access Memory (pseudo-DRAM) computational bit-cell. Experimental results in deep-submicron CMOS processes demonstrate successful operation; subsequently, behavioral models were developed and employed in large-scale system simulations and emulations. Thereafter, at the level of circuit description, compute-in-memory crossbars and mixed-signal circuits were designed, allowing seamless connectivity to digital controllers. At the level of data representation, both binary and stochastic-unary coding are used to compute Vector-Vector Multiplications (VMMs) at the array level, demonstrating successful experimental results and providing insight into the integration requirements that larger systems may demand. Finally, on the architectural level, two AI accelerator architectures for data center processing and edge computing are discussed. Both designs are scalable multi-core Systems-on-Chip (SoCs), where vector-processor arrays are tiled on a 2-layer Network-on-Chip (NoC), enabling neighbor communication and a flexible compute vs. memory trade-off. General-purpose Arm/RISC-V co-processors provide adequate bootstrapping and system housekeeping, and a high-speed interface fabric facilitates Input/Output to main memory.
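
The abstract does not spell out its stochastic-unary coding, so the following Python sketch shows only a loosely related textbook idea from stochastic computing: a value in [0, 1] is encoded as a random bit stream whose fraction of ones equals the value, and the bitwise AND of two independent streams then estimates the product. The function names, stream length, and seed are assumptions for illustration and do not describe the hardware presented in the dissertation.

```python
import random


def to_stream(p, n_bits, rng):
    """Encode p in [0, 1] as a bit stream with P(bit = 1) = p."""
    return [1 if rng.random() < p else 0 for _ in range(n_bits)]


def stochastic_dot(xs, ws, n_bits=20000, seed=0):
    """Estimate sum(x * w) using AND gates on independent random bit streams."""
    rng = random.Random(seed)
    total = 0.0
    for x, w in zip(xs, ws):
        sx = to_stream(x, n_bits, rng)
        sw = to_stream(w, n_bits, rng)
        # For independent streams, P(a AND b = 1) = x * w, so counting ones
        # in the ANDed stream estimates the product.
        ones = sum(a & b for a, b in zip(sx, sw))
        total += ones / n_bits
    return total


xs = [0.2, 0.5, 0.9]
ws = [0.7, 0.4, 0.1]
exact = sum(x * w for x, w in zip(xs, ws))
approx = stochastic_dot(xs, ws)
```

The estimate is noisy, and its error shrinks roughly with the square root of the stream length, so longer streams trade time for precision.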

Committee Members

Andreas Andreou, Department of Electrical and Computer Engineering

Ralph Etienne-Cummings, Department of Electrical and Computer Engineering

Philippe Pouliquen, Department of Electrical and Computer Engineering

Dissertation Defense: Ruizhi Li
Oct 23 @ 2:00 pm

Note: This is a virtual presentation. Here is the link for where the presentation will be taking place.

Title: An Efficient and Robust Multi-Stream Framework for End-to-End Speech Recognition

Abstract: In voice-enabled domestic or meeting environments, distributed microphone arrays aim to process distant-speech interaction into text with high accuracy. However, with dynamic corruption from noise, reverberation, or human movement present, there is no guarantee that any microphone array (stream) is constantly informative. In these cases, an appropriate strategy to dynamically fuse streams or select the most informative array is necessary.

The multi-stream paradigm in Automatic Speech Recognition (ASR) considers scenarios where parallel streams carry diverse or complementary task-related knowledge. Such streams could be defined as microphone arrays, frequency bands, various modalities, etc. Hence, robust stream fusion is crucial to emphasize more informative streams over corrupted ones, especially under unseen conditions. This thesis focuses on improving the performance and robustness of speech recognition in multi-stream scenarios.

In recent years, with the increasing use of Deep Neural Networks (DNNs) in ASR, End-to-End (E2E) approaches, which directly transcribe human speech into text, have received greater attention. In this thesis, a multi-stream framework is presented based on a joint Connectionist Temporal Classification/Attention (CTC/ATT) E2E model, where parallel streams are represented by separate encoders. On top of the regular attention networks, a secondary stream-fusion network is used to steer the decoder toward the most informative streams. Two representative frameworks are proposed: Multi-Encoder Multi-Array (MEM-Array) and Multi-Encoder Multi-Resolution (MEM-Res).

The MEM-Array model aims at improving far-field ASR robustness using multiple microphone arrays which are activated by separate encoders. With an increasing number of streams (encoders) requiring substantial memory and massive amounts of parallel data, a practical two-stage training strategy is designed to address these issues. Furthermore, a two-stage augmentation scheme is presented to improve the robustness of the multi-stream model, where a small amount of parallel data is sufficient to achieve competitive results. In MEM-Res, two heterogeneous encoders with different architectures, temporal resolutions and separate CTC networks work in parallel to extract complementary information from the same acoustics. Compared with the best single-stream performance, both models achieve substantial improvement and also outperform various conventional fusion strategies.

While the proposed framework optimizes information in multi-stream scenarios, this thesis also studies Performance Monitoring (PM) measures to predict whether the recognition result of an end-to-end model is reliable, without ground-truth knowledge. Four different PM techniques are investigated, suggesting that PM measures on attention distributions and decoder posteriors are well correlated with true performance.

Committee Members

Hynek Hermansky, Department of Electrical and Computer Engineering

Shinji Watanabe, Department of Electrical and Computer Engineering

Najim Dehak, Department of Electrical and Computer Engineering

Gregory Sell, JHU Human Language Technology Center of Excellence

Dec
16
Wed
Dissertation Defense: Tsan Zhao
Dec 16 @ 2:00 pm

Note: This is a virtual presentation. Here is the link for where the presentation will be taking place.

Title: Medical Image Modality Synthesis and Resolution Enhancement Based on Machine Learning Techniques

Abstract: To achieve satisfactory performance from automatic medical image analysis algorithms such as registration or segmentation, medical imaging data with the desired modality/contrast and high isotropic resolution are preferred, yet they are not always available. We addressed this problem in this thesis using 1) image modality synthesis and 2) resolution enhancement.

The first contribution of this thesis is a computed tomography (CT)-to-magnetic resonance imaging (MRI) image synthesis method, which was developed to provide MRI when CT is the only modality that is acquired. The main challenges are that CT has poor contrast as well as high noise in soft tissues and that the CT-to-MR mapping is highly nonlinear. To overcome these challenges, we developed a convolutional neural network (CNN) which is a modified U-net. With this deep network for synthesis, we developed the first segmentation method that provides detailed grey matter anatomical labels on CT neuroimages using synthetic MRI.

The second contribution is a method for resolution enhancement for a common type of acquisition in clinical and research practice, one in which there is high resolution (HR) in the in-plane directions and low resolution (LR) in the through-plane direction. The challenge of improving the through-plane resolution for such acquisitions is that the state-of-the-art convolutional neural network (CNN)-based super-resolution methods are sometimes not applicable due to a lack of external LR/HR paired training data. To address this challenge, we developed a self super-resolution algorithm called SMORE and its iterative version called iSMORE, which are CNN-based yet do not require LR/HR paired training data other than the subject image itself. SMORE/iSMORE create training data from the HR in-plane slices of the subject image itself, then train and apply CNNs to through-plane slices to improve spatial resolution and remove aliasing. In this thesis, we perform SMORE/iSMORE on multiple simulated and real data sets to demonstrate their accuracy and generalizability. Also, SMORE as a preprocessing step is shown to improve segmentation accuracy.

In summary, CT-to-MR synthesis, SMORE, and iSMORE were demonstrated in this thesis to be effective preprocessing algorithms for visual quality and other automatic medical image analysis such as registration or segmentation.

Committee Members

Jerry Prince, Department of Electrical and Computer Engineering

John Goutsias, Department of Electrical and Computer Engineering

Trac Tran, Department of Electrical and Computer Engineering

Mar
18
Thu
Dissertation Defense: Vishwanath Sindagi
Mar 18 @ 2:00 pm – 4:00 pm

Note: This is a virtual presentation. Here is the link for where the presentation will be taking place.

Title: Single Image Based Crowd Counting Using Deep Learning

Abstract: Estimating count and density maps from crowd images has a wide range of applications such as video surveillance, traffic monitoring, public safety and urban planning. In addition, techniques developed for crowd counting can be applied to related tasks in other fields of study such as cell microscopy, vehicle counting and environmental survey. The task of crowd counting and density map estimation from a single image is a difficult problem since it suffers from multiple issues like occlusions, perspective changes, background clutter, non-uniform density, and intra-scene and inter-scene variations in scale and perspective. These issues are further exacerbated in highly congested scenes. In order to overcome these challenges, we propose a variety of deep learning architectures that specifically incorporate various aspects such as global/local context information, attention mechanisms, and specialized iterative and multi-level multi-pathway fusion schemes for combining information from multiple layers in a deep network. Through extensive experiments and evaluations on several crowd counting datasets, we demonstrate that the proposed networks achieve significant improvements over existing approaches.

We also recognize the need for large amounts of data for training the deep networks and their inability to generalize to new scenes and distributions. To overcome this challenge, we propose novel semi-supervised and weakly-supervised crowd counting techniques that effectively leverage large amounts of unlabeled/weakly-labeled data. In addition to developing techniques with the ability to learn from limited labeled data, we also introduce a new large-scale crowd counting dataset which can be used to train considerably larger networks. The proposed dataset consists of 4,372 high resolution images with 1.51 million annotations. We made explicit efforts to ensure that the images are collected under a variety of diverse scenarios and environmental conditions. The dataset provides a richer set of annotations like dots, approximate bounding boxes, blur levels, etc.

Committee Members

  • Vishal Patel, Department of Electrical and Computer Engineering
  • Rama Chellappa, Department of Electrical and Computer Engineering
  • Alan Yuille, Department of Computer Science