Schiller and Carvey (2006) studied the perceptual grouping of 4 x 4 clusters of stimuli (see Fig. 1). Subjects viewed a collection of quartet stimuli (a cluster), each composed of a central spot and four targets located ~1 degree from the central spot at radial positions of 45, 135, 225, and 315 degrees (see Fig. 2). One diagonal pair of targets was flashed, followed by a flash of the other diagonal pair. This evokes apparent motion in which the targets oscillate either in a zig-zag pattern from left to right or in a see-saw pattern from top to bottom. The direction of apparent motion depends on how one views the targets: left-right eye movements between the targets induce the zig-zag pattern, and up-down eye movements between the targets induce the see-saw pattern. Most importantly, the pattern adopted is transferred to all the quartets, so that a uniform oscillation is seen across the entire group of clusters. Making the targets differ in luminance, color, or object shape did not affect the grouping. However, when the sizes of the targets were made maximally different, e.g., 1 degree vs. 0.2 degrees in diameter (a 5-fold difference in size), grouping across the quartets broke down (see Fig. 3).
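
The quartet geometry described above can be sketched in Python. This is a minimal illustration, not the stimulus code used by Schiller and Carvey (2006): coordinates are in degrees of visual angle relative to each cluster's central spot, the helper names are hypothetical, and the 5-degree spacing between cluster centres is an assumption chosen only so that the 4 x 4 array fits within roughly 20 by 20 degrees.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

# Values from the text: targets ~1 degree from the central spot at
# 45, 135, 225 and 315 degrees. Cluster spacing is a hypothetical choice.
ECCENTRICITY_DEG = 1.0
ANGLES_DEG = (45, 135, 225, 315)
CLUSTER_SPACING_DEG = 5.0


def quartet_targets(cx: float, cy: float, r: float = ECCENTRICITY_DEG) -> List[Point]:
    """(x, y) positions, in degrees, of the four targets around one central spot."""
    return [
        (cx + r * math.cos(math.radians(a)), cy + r * math.sin(math.radians(a)))
        for a in ANGLES_DEG
    ]


def flash_frames(targets: List[Point]) -> Tuple[List[Point], List[Point]]:
    """Split a quartet into the two diagonal pairs that flash in alternation.

    Frame A holds the 45/225-degree diagonal, frame B the 135/315-degree one.
    """
    return [targets[0], targets[2]], [targets[1], targets[3]]


def cluster_centres(n: int = 4, spacing: float = CLUSTER_SPACING_DEG) -> List[Point]:
    """Centres of an n x n array of quartets, centred on the origin."""
    offset = (n - 1) * spacing / 2.0
    return [(i * spacing - offset, j * spacing - offset) for i in range(n) for j in range(n)]


if __name__ == "__main__":
    for centre in cluster_centres():
        frame_a, frame_b = flash_frames(quartet_targets(*centre))
        print(centre, frame_a, frame_b)
```

The two frames returned by `flash_frames` are shown in alternation. Whether the resulting motion is seen as zig-zag (left-right) or see-saw (top-bottom) is a perceptual outcome that the geometry alone does not determine.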

[For a view of the various stimulus configurations and the percepts induced by the 4 x 4 clusters of quartet stimuli, see: ‘https://web.mit.edu/bcs/schillerlab/book.html’. For a consistent viewing experience, sit 57 cm from your monitor so that the targets are spaced ~1 degree of visual angle from the central spot and the 4 x 4 clusters occupy a region of ~20 by 20 degrees of visual angle: about two hand-widths, not including the thumb, with the arm fully extended from the face/shoulder toward the monitor.]
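
The 57 cm viewing distance is convenient because, at that distance, 1 cm on the screen subtends very nearly 1 degree of visual angle. The standard relation between physical size s, viewing distance d, and visual angle θ is s = 2d·tan(θ/2). The small sketch below (function names are hypothetical) can be used to check a particular setup.

```python
import math


def size_on_screen_cm(angle_deg: float, distance_cm: float = 57.0) -> float:
    """Physical extent (cm) that subtends angle_deg at distance_cm: s = 2 d tan(theta / 2)."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


def visual_angle_deg(size_cm: float, distance_cm: float = 57.0) -> float:
    """Inverse relation: visual angle (deg) subtended by size_cm at distance_cm."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


if __name__ == "__main__":
    # At 57 cm, 1 degree is just under 1 cm and 20 degrees is just over 20 cm.
    print(f"1 deg  -> {size_on_screen_cm(1.0):.3f} cm")
    print(f"20 deg -> {size_on_screen_cm(20.0):.2f} cm")
```

Run as a script, this prints about 0.995 cm for 1 degree and 20.10 cm for 20 degrees, so at 57 cm the 1 cm ≈ 1 degree rule of thumb holds closely across the whole 20-degree display.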

To arrive at an understanding of grouping, let us consider the induction of phosphenes from the visual cortex of macaque monkeys. When phosphenes are evoked from area V1 of a monkey using electrical current, phosphene detection must first be established through learning, which can take 3 to 5 days, and up to a week, of 500 or so stimulation trials per day (Tehovnik and Slocum 2009). Once established, detection transfers immediately to any region within the V1 map-sheet (Bartlett, Doty et al. 2005; Doty 1965; Doty 1969; Doty and Rutledge 1959; Tehovnik and Slocum 2009). Within V1, the range of phosphene sizes is restricted by the retinal magnification factor, and the perceptual quality of a phosphene is set by the genetic/developmental characteristics of the activated neurons (Tehovnik and Slocum 2007b). Transfers from V1 to other areas such as V4, however, are not immediate (Bartlett, Doty et al. 2005; Doty 1965; Doty 1969; Doty and Rutledge 1959). This is because, compared with V1, area V4 has a different retinal magnification factor and its neurons encode different perceptual qualities (DeYoe et al. 1996; Kolster, Orban et al. 2010; Sereno et al. 1995). Thus, with the same current as used in V1, V4 phosphenes differ in size and perceptual content. Because the size and quality of the phosphene are now different, the phosphene is novel to the monkey and therefore necessitates new learning (Bartlett, Doty et al. 2005; Doty 1965; Doty 1969). In other words, each map in the visual cortex encodes a range of percepts and sizes that is treated uniformly within an area but not across areas.
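
The argument that the same current yields phosphenes of different sizes in different areas can be made concrete with a toy calculation. This is a hedged sketch under stated assumptions, not a model taken from the cited studies. It assumes that the radius of directly activated tissue follows the commonly used current-distance approximation I = K·r², that the same current activates the same extent of cortex in each area, and that cortical magnification follows the inverse-linear form M(E) = A/(E + E2). All numeric constants and the area labels are hypothetical placeholders, not measured values for V1 or V4.

```python
import math

# Toy model; every constant here is a hypothetical placeholder.
K_UA_PER_MM2 = 1000.0  # assumed excitability constant in I = K * r**2

# Hypothetical (A in mm, E2 in deg) magnification constants for two areas.
AREAS = {"area_A": (15.0, 0.75), "area_B": (5.0, 1.5)}


def activated_radius_mm(current_ua: float, k: float = K_UA_PER_MM2) -> float:
    """Radius of directly activated tissue under the I = K * r**2 approximation."""
    return math.sqrt(current_ua / k)


def magnification_mm_per_deg(ecc_deg: float, a: float, e2: float) -> float:
    """Inverse-linear cortical magnification M(E) = A / (E + E2), in mm/deg."""
    return a / (ecc_deg + e2)


def phosphene_diameter_deg(current_ua: float, ecc_deg: float, a: float, e2: float) -> float:
    """Activated cortical diameter (mm) divided by M(E) (mm/deg) gives degrees."""
    return 2.0 * activated_radius_mm(current_ua) / magnification_mm_per_deg(ecc_deg, a, e2)


if __name__ == "__main__":
    # Same current, same eccentricity, different (hypothetical) areas.
    for name, (a, e2) in AREAS.items():
        print(name, round(phosphene_diameter_deg(100.0, 5.0, a, e2), 3))
```

With identical current and eccentricity, the area with the smaller magnification constant yields the larger predicted phosphene. That is the qualitative point made above. Because the constants are placeholders, the absolute values carry no meaning.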

The foregoing highlights that the maps within the visual cortex, from V1 to V5 (MT) to MST, STS, and IT, have different size characteristics based on their retinal magnification factors. Their neurons additionally encode different perceptual qualities: whether they respond to objects or to motion, and at what stereo-depth relative to the fixation plane. Indeed, disparity tuning increases from fine (as low as 0.2 degrees in V1) to coarse (as high as 10 degrees in LIP) as one moves from V1 to V5 (MT), through MST/LIP/IT, and beyond to the frontal lobes (Burkhalter and Van Essen 1986; Eifuka and Wurtz 1999; Felleman and Van Essen 1987; Ferraina et al. 2000; Gnadt and Beyer 1998; Gnadt and Mays 1995; Hubel and Wiesel 1970; Livingstone and Hubel 1987; Maunsell and Van Essen 1983; Poggio and Fischer 1977; Poggio and Talbot 1981; Poggio et al. 1988; Roy et al. 1992; Uka et al. 2000; Watanabe et al. 2002). As well, eye movements in depth (i.e., those with a vergence component) can be evoked electrically from the occipital-temporal junction, which includes V4, MT, MST, and IT, and from the frontal lobes, including regions within and outside the frontal eye fields (Jampel 1960). [Note that when the clusters for apparent motion are presented at different stereo-depth planes relative to one another, grouping between the clusters is compromised (Schiller and Carvey 2006), but this is a topic for another discussion.]

So, how can the foregoing explain the effect of size on grouping in apparent motion? When subjects scan the 4 x 4 clusters, they place their fovea onto the details of a cluster, and the direction of oscillation, whether left-right or up-down, is determined at the foveal level by scanning either left-right-left and so on or up-down-up and so on. This foveal behavior sets in motion the grouping of all the clusters. But when the size of the elements within a cluster is maximally varied, the grouping is interrupted. This might be because, at the fovea, no visual area is able to encode the full range of sizes spanning a 5-fold difference. Therefore, a size-difference threshold for interrupting the grouping process needs to be determined by titrating the difference between the smallest and largest targets until grouping is interrupted. The resulting range can then be compared with the range of sizes encoded by a visual area, as determined by its retinal magnification factor, whether V1, MT, or MST, for example (DeYoe et al. 1996; Kolster, Orban et al. 2010; Sereno et al. 1995; Tehovnik and Slocum 2007b). In short, a visual area’s magnification factor is expected to put a size limit on grouping, using the apparent motion of quartet stimuli as the test paradigm.
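
The titration proposed above could be run as an adaptive staircase on the ratio of the largest to the smallest target size. The sketch below shows one possible procedure, not the one the study would necessarily use. It assumes a 1-up/1-down staircase on the log size ratio, with the step halved at each reversal. The function name, the starting values, and the stand-in observer are all hypothetical. In a real session, `grouped` would be the subject's report on each trial of whether all quartets still oscillated in unison.

```python
import math
from typing import Callable, List, Optional


def titrate_size_ratio(
    grouped: Callable[[float], bool],
    start_ratio: float = 5.0,
    start_step: float = 0.5,
    min_step: float = 0.01,
    max_trials: int = 200,
    n_final: int = 4,
) -> float:
    """Estimate the largest/smallest target-size ratio at which grouping breaks down.

    1-up/1-down staircase on log(ratio): if the quartets still group, the
    size difference is increased; if grouping breaks, it is decreased. The
    step is halved at every reversal, and the estimate is the geometric mean
    of the ratios at the last n_final reversals.
    """
    log_r = math.log(start_ratio)
    step = start_step
    last: Optional[bool] = None
    reversals: List[float] = []
    for _ in range(max_trials):
        response = grouped(math.exp(log_r))
        if last is not None and response != last:
            reversals.append(log_r)
            step /= 2.0
            if step < min_step:
                break
        last = response
        # Still grouped -> make sizes more different; broken -> less different.
        log_r = max(0.0, log_r + (step if response else -step))  # ratio >= 1
    if not reversals:
        raise RuntimeError("no reversals: the starting range never crossed threshold")
    tail = reversals[-n_final:]
    return math.exp(sum(tail) / len(tail))


if __name__ == "__main__":
    # Stand-in observer with a hypothetical cut-off at a 2.5-fold size difference.
    print(round(titrate_size_ratio(lambda ratio: ratio < 2.5), 2))
```

Real reports are noisy, so an actual experiment would likely use more reversals, interleaved staircases, or a psychometric-function fit. The deterministic observer in the usage line only checks that the procedure homes in on a known cut-off. That cut-off could then be set against the size range implied by an area's magnification factor.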
