Jamal Molin, Chetan Singh Thakur
From humans to computers, the ability to efficiently extract the most interesting regions of a visual scene is vital for any vision system. Visual attention depends on this extraction of salient regions for further visual processing. It is important for the survival of biological systems because it allows rapid detection of prey and predators in a cluttered visual environment. Furthermore, efficient and accurate computation of salient regions is essential for tasks including object detection, recognition, and tracking, both in humans and in computers. In humans, the retina transmits 100 Mbps of visual information per optic nerve. It is therefore unlikely that this enormous amount of data is processed in parallel across the entire visual field, even by a brain as complex as the human one. Instead, it is hypothesized that at early stages of visual processing, the visual system only transmits interesting regions of the visual field.
Proto-object based visual saliency model
Our research involves developing a biologically plausible, feed-forward, bottom-up dynamic visual saliency model capable of predicting human eye fixations over time. Many current visual saliency models are feature-based. Our model is object-based; specifically, it is based on the notion of proto-objects: pre-attentive, dynamic structures within a visual stimulus that generate the perception of an object (a coherent field) when attended to and dissolve back into their proto-object state when attention is released. Such proto-objects are computed in the brain via communication between border-ownership cells and grouping cells, which encode figure-ground relationships within the visual scene. Saliency is then computed as a function of proto-objects within various feature channels (color, orientation, and intensity). Scale invariance is achieved by performing the computations on an image pyramid. The model flow can be seen in the diagram below.
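The image-pyramid step mentioned above can be illustrated with a minimal sketch. This is not the published Matlab implementation: the function names, the 2x2 block averaging, the nearest-neighbour upsampling, and especially `local_contrast` (a simple stand-in for the border-ownership and grouping computation) are assumptions chosen only to show how a per-level response can be computed at several scales and summed at the base resolution.

```python
def downsample(img):
    """Halve each dimension by averaging non-overlapping 2x2 blocks."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * r][2 * c] + img[2 * r][2 * c + 1]
              + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(w)] for r in range(h)]


def build_pyramid(img, levels):
    """Return [full, half, quarter, ...] with at most `levels` entries."""
    pyramid = [img]
    while len(pyramid) < levels and min(len(pyramid[-1]), len(pyramid[-1][0])) >= 2:
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


def upsample_to(img, height, width):
    """Nearest-neighbour resize back to the base resolution."""
    h, w = len(img), len(img[0])
    return [[img[r * h // height][c * w // width] for c in range(width)]
            for r in range(height)]


def local_contrast(img):
    """Hypothetical per-level response: absolute deviation from the level mean.

    The real model computes proto-objects here; this is only a placeholder.
    """
    flat = [v for row in img for v in row]
    mean = sum(flat) / len(flat)
    return [[abs(v - mean) for v in row] for row in img]


def multiscale_map(img, levels=3):
    """Sum the per-level responses after resizing each to the base resolution."""
    height, width = len(img), len(img[0])
    total = [[0.0] * width for _ in range(height)]
    for level in build_pyramid(img, levels):
        response = upsample_to(local_contrast(level), height, width)
        for r in range(height):
            for c in range(width):
                total[r][c] += response[r][c]
    return total
```

Because the same response function runs at every pyramid level, a structure that is large in the original image is handled at a coarser level much as a small structure is handled at the finest level. In a full model this would be repeated per feature channel (color, orientation, intensity) before the channels are combined.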
Dynamic proto-object based visual saliency model (considers motion):
Another important factor in visual saliency is motion. The world is dynamic and constantly changing, so it is important to consider temporal information in a visual saliency model. Moreover, research has shown that motion is a more important cue than other low-level features for computing visual saliency. Our research involves determining the most biologically plausible method for integrating a motion component into our proto-object based model in order to predict human fixations/saccades on video (dynamic visual stimuli) as opposed to static stimuli. In doing so, we consider the spatiotemporal receptive fields of simple cells in V1. Our current model can be seen in the diagram below.
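To make the idea of a spatiotemporal receptive field concrete, the sketch below models one as a Gabor function oriented in space-time (one spatial dimension plus time) and combines a quadrature pair into a phase-invariant "motion energy" response. This is a common textbook formulation used here for illustration, not necessarily the filter used in our model; the function names and all parameter values (`fx`, `ft`, `sigma_x`, `sigma_t`, `half`) are assumptions.

```python
import math


def st_gabor(x, t, fx=0.25, ft=0.25, sigma_x=2.0, sigma_t=2.0, phase=0.0):
    """Space-time Gabor: Gaussian envelope times a drifting sinusoidal carrier.

    fx is in cycles/pixel and ft in cycles/frame; together they set the
    space-time orientation (preferred velocity) of the receptive field.
    """
    envelope = math.exp(-(x * x) / (2 * sigma_x ** 2) - (t * t) / (2 * sigma_t ** 2))
    return envelope * math.cos(2 * math.pi * (fx * x + ft * t) + phase)


def drifting_grating(fx, ft):
    """Return a stimulus function s(x, t); flipping the sign of ft reverses its direction."""
    return lambda x, t: math.cos(2 * math.pi * (fx * x + ft * t))


def motion_energy(stimulus, fx=0.25, ft=0.25, half=6):
    """Squared responses of an even/odd (quadrature) filter pair at the origin."""
    even = odd = 0.0
    for t in range(-half, half + 1):
        for x in range(-half, half + 1):
            s = stimulus(x, t)
            even += s * st_gabor(x, t, fx, ft, phase=0.0)
            # A phase shift of -pi/2 turns the cosine carrier into a sine.
            odd += s * st_gabor(x, t, fx, ft, phase=-math.pi / 2)
    return even * even + odd * odd


preferred = motion_energy(drifting_grating(0.25, 0.25))   # matches the filter
opposite = motion_energy(drifting_grating(0.25, -0.25))   # drifts the other way
```

With these assumed parameters, `preferred` comes out far larger than `opposite`, which is the direction selectivity that lets such a filter serve as a motion-sensitive channel.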
1. Software (Matlab) implementation of the proto-object based visual saliency model. This model predicts human eye fixations on various images better than chance and, furthermore, outperforms other state-of-the-art visual saliency models (both feature-based and object-based) in predicting human eye fixations. This work was published in the journal Vision Research (Alex Russell et al., 2014).
2. Software (Matlab) implementation of a dynamic proto-object based visual saliency model. This model considers motion that may exist within the visual scene. It predicts human eye fixations better than chance on a set of videos, and it does so better than other state-of-the-art dynamic visual saliency models that also consider motion. This work was published at the CISS 2015 conference (Jamal Molin et al., 2015).
3. Hardware implementation of the dynamic proto-object based visual saliency model on an FPGA (Field Programmable Gate Array) for real-time processing. This hardware-based implementation runs at ~8 fps (~3x faster than the Matlab implementation), which allows for a wider range of real-time image processing applications. This work was demoed at BioCAS 2015 (Jamal Molin et al., 2015).
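For readers unfamiliar with what "better than chance" means in items 1 and 2, the sketch below shows one common way to score a saliency map against recorded fixations: the area under the ROC curve (AUC), where 0.5 corresponds to chance. The text above does not state which metric was used in the cited papers, so the metric, the function name `fixation_auc`, and the choice of all non-fixated pixels as negatives are assumptions for illustration only.

```python
def fixation_auc(saliency, fixations):
    """Probability that a fixated pixel has higher saliency than a non-fixated one.

    `saliency` is a 2D list of scores; `fixations` is a list of (row, col)
    positions. Ties count as half a win, so a constant map scores 0.5 (chance).
    """
    fixated = set(fixations)
    positives = [saliency[r][c] for (r, c) in fixated]
    negatives = [saliency[r][c]
                 for r in range(len(saliency))
                 for c in range(len(saliency[0]))
                 if (r, c) not in fixated]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

A map that assigns its highest values to the fixated locations scores close to 1.0, and a map that systematically ranks them lowest scores close to 0.0. Comparing two models then amounts to comparing their scores on the same fixation data.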
Future work involves using the Oculus Rift DK2 and building an eye tracker to compile a virtual-reality-based video dataset for validating visual saliency models in such an environment. Furthermore, we are interested in using the real-time eye tracker within the Rift for augmented reality or assisted vision by integrating the visual saliency model with the eye tracker.
Finally, we also seek to design an audio-visual saliency model by integrating an audio component into our visual saliency model.
- J. L. Molin, E. Niebur, and R. Etienne-Cummings, “How is motion integrated into a proto-object based visual saliency model?,” Proc. CISS 2015, Baltimore, MD, March 2015.
- A. F. Russell, S. Mihalas, R. von der Heydt, E. Niebur, and R. Etienne-Cummings, “A model of proto-object based saliency,” Vision Research, vol. 94, pp. 1–15, 2014.
- J. L. Molin, A. F. Russell, S. Mihalas, E. Niebur, and R. Etienne-Cummings, “Proto-object based visual saliency model with a motion-sensitive channel,” BioCAS, pp. 25–28, Atlanta, GA, October 2013.
- T. Figliolia, D. R. Mendat, A. F. Russell, T. A. Murray, E. Niebur, R. Etienne-Cummings, and A. G. Andreou, “Auditory Modulation Of Visual Proto-Object Formation In A Hierarchical Auditory-Visual Saliency Map,” Proc. CISS 2013, Baltimore, MD, March 2013.