Philosophy 371: Minds & Brains/Cognitive Science Lab 1997

Lab 4: Into the Third Dimension

Preamble. What vision is NOT.

As every eighth grader knows, the story of vision begins as light passes through the lens of the eye to be focused on the retina, forming a tiny upside-down image of the scene in front of the eye. Up to this point, the eye functions like a camera, and the retinal image is simply a faithful picture of the outside world, with colors, shapes, and motions still literally present. But as any eighth grader could figure out, the story to this point does not yet offer an explanation of vision, but only a precondition for it. After all, if a person’s optic nerve were severed, that person would be completely blind even with perfectly functioning eyes. The retinal image is not sufficient for visual perception to occur. Obviously, the full explanation of how vision works depends on processing in the brain.

So, what happens to that retinal image? How does the brain transform it into an understanding of the visual and spatial world? An eighth grader might suppose that the optic nerve simply transmits the image, like cable TV, back to the brain, where the brain simply "looks" at it to see what’s there. But this is still an inadequate answer: To say that the "mind’s eye" "looks" at an "inner picture" explains nothing, without an explanation of what this inner "looking" could be. And, is there anything like an "inner picture" to be looked at? Many people believe in inner pictures, but our labs so far have offered several reasons to doubt this hypothesis:

• If there were an inner picture, we should be able to use it as we might use a literal picture, and make a reasonably detailed copy of it by hand. But our drawing lab showed that the mind’s picture was at best a crude approximation of a picture, not at all like a photographic image.

• The drawing lab also suggested that our internal representation of faces was built from generic "facial features" -- eyes, noses, etc. -- which we could manipulate (like Mr. Potato Head) to create a likeness. We found that those generic features dominated over the specific details of the face before us. Even with a live model, our attempts to capture the literal lines were swamped by the plug-in mouths and noses. Only by turning the model upside-down could we set aside the generic features.

• If vision led to an inner picture, we might expect the understanding to be a single process that would work equally efficiently on every feature of the picture. But the feature detection lab showed that some simple features are processed by a quick, parallel process, while others require a slow, serial process. Q jumps out in a field of O’s, but F must be actively sought in a field of E’s. This suggests that some features are processed separately from others, and by very different mechanisms.

• What are the mechanisms? The widget exercises suggested that it is easy to build devices that are specialized for extracting just one feature from the sensory world. A spot, movement in a certain direction, symmetry -- these are just three features that widget detectors easily detect. Note that widget detectors are micro-specialists. None of them produces a picture of anything, nor do they use pictures in any sense.

Could we come to understand vision in this widgety way? Could we see it as a bundle of parallel feature-detecting pathways, where increasingly sophisticated widget-detectors report their own micro-specialized results? Tonight we continue our explorations of the mechanisms of vision. Once again, prepare to be surprised.

The lab: How is three-dimensional form perceived?

The image on the retina is flat. The eighth grader’s theory of vision imagines that this picture is recreated in the brain, where it is presumably flat as well. But we understand the world with depth, the objects in it having solid, 3-dimensional form. How do we get the third dimension out of flatland? Most of us learn standard answers to this question: the "pictorial depth cues," like perspective, and other cues to depth, like binocular disparity (exploiting the slight difference in the point of view of our two eyes). These cues do work. When we see depth in pictures, for example, we have to rely on perspective and occlusion (a nearby object blocks the object it stands in front of). It requires surprisingly little information to activate a depth cue. Even the sketchiest of pictures often specifies a cue sufficient to pop out into the third dimension.

Given the usual line-up of these cues, familiar from intro psychology courses, you may suppose that depth perception is fully explained. After all, these cues seem sufficient to explain depth perception in all the normal cases. People do respond to them. End of story.

Or is it? Perhaps you are wondering, Are there other mechanisms of depth and form perception beyond the static cues of traditional perceptual psychology? We’re glad you asked! Tonight’s lab reveals a set of cues you may not have suspected, and explores their workings.

As a warm-up discussion, we will enumerate the traditional cues. This will be the background for developing new non-natural stimuli that do not offer the eye the usual cues to depth or 3-dimensional form. If these non-natural stimuli nonetheless pop out into the third dimension, then there must be some other depth cue, some other perceptual mechanism, to explain it.

We’ll explore two main types of 3-d form perception:

I. The perception of geometrical objects (in this lab, these include cylinders, spheres, barbells, hyperbolic paraboloids, and elliptic cones).

II. The perception of biological form (in this lab, a moving person).

The goal will be to develop experiments to specify the processes involved in each kind of perception.

The lab: Basic procedure.

This is best done with a partner.

This lab uses the "Insight 2" software package, as did last week’s. As you arrive in the lab, look on the desktop screen to see if the TAs have already copied the "Insight 2: In color" folder onto the desktop. If it is there, jump to step 3. Otherwise, follow all these steps.

Running the software:

1. Follow this path:

Mac HD

Network Servers

Class software

2. Click on the folder Insight 2.

Drag the folder "Insight 2:In color" onto your desktop.

It will take a minute to copy.

3. Open the "Insight 2:In color" folder.

4. Click on the icon labelled "Insight 2" (just that, and nothing else). You may have to expand the window to find this file.

5. A main menu will open. You may see a notice about the Spatial Vision module of the program. If so, just click OK.

6. Click on Form & Motion.

Overview of program operation:

After you click on Begin, you’ll be able to experiment with several moving stimuli. These are called Movies. You can vary several aspects of each movie: the shape of the stimulus, the number of dots, the "correlation" or uniformity of dot motion, the background, and others. (You use various pull-down menus to make these changes.) After you set the parameters you want, pull down the File menu, and select Generate Movie. Then pull it down again, and select Play movie. When you want to stop the movie, move your mouse to the center of the screen and double-click. Note that, in general, parameter settings stay on until changed for a new movie.
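For the curious, here is a minimal Python sketch of how a dot movie of this kind might be built. It is not the Insight 2 code, and every name in it (sphere_dots, generate_movie, step_deg, and so on) is invented for illustration. It assumes that "correlation" means the fraction of dots that follow the rigid rotation of the shape, while the remaining dots jump to a random spot on each frame; the software may well define it differently.

```python
import math
import random


def sphere_dots(n_dots, rng):
    """Scatter n_dots uniformly over the surface of a unit sphere."""
    dots = []
    for _ in range(n_dots):
        z = rng.uniform(-1.0, 1.0)
        theta = rng.uniform(0.0, 2.0 * math.pi)
        r = math.sqrt(1.0 - z * z)
        dots.append((r * math.cos(theta), r * math.sin(theta), z))
    return dots


def rotate_about_vertical(dot, angle):
    """Rotate one 3-d point about the vertical (y) axis by angle radians."""
    x, y, z = dot
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)


def generate_movie(n_dots=125, correlation=1.0, n_frames=30,
                   step_deg=6.0, seed=0):
    """Return a list of frames; each frame is a flat list of (x, y) dots.

    ASSUMPTION: 'correlation' is the fraction of dots that rotate rigidly
    with the sphere. The rest are re-scattered at random on every frame.
    """
    rng = random.Random(seed)
    dots = sphere_dots(n_dots, rng)
    n_coherent = round(correlation * n_dots)
    step = math.radians(step_deg)
    frames = []
    for _ in range(n_frames):
        # Orthographic projection: keep x and y, throw depth away.
        frames.append([(x, y) for x, y, _depth in dots])
        moved = []
        for i, dot in enumerate(dots):
            if i < n_coherent:
                moved.append(rotate_about_vertical(dot, step))
            else:
                moved.append(sphere_dots(1, rng)[0])
        dots = moved
    return frames


if __name__ == "__main__":
    movie = generate_movie(n_dots=50, correlation=0.75, n_frames=10)
    print(len(movie), "frames of", len(movie[0]), "dots each")
```

Notice that each frame is only a flat scatter of (x, y) positions. No single frame in this sketch contains a perspective, occlusion, or disparity cue, so any depth a viewer sees on playback would have to come from the motion itself.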

Before you begin, you should read the next section on "Program Operation," taken directly from the software.

Then you might read the attached "Introduction," also taken from the software. This background may help you in thinking about the report for tonight’s lab. (The report questions are on the very last page.)

Your report:

In all the experiments in tonight’s lab, the visual system begins with the simple stimulus of dots in motion. When there are enough dots, when there are not too many irrelevant dots, and when the motion is "right," the perception of form results. The lab focuses on two kinds of forms: regular geometric forms and "biological forms." In your report, address these questions:

• Is the perception of geometric form a distinct process from the perception of biological form?

• If they are one and the same process, describe that process.

• If they are two distinct processes, describe both processes, highlighting what is different between them.

In either case, describe experiments you conducted with the software that justify your answer.

One way to think of the question: From the independent moving dots, the brain extracts certain information which it combines with other information (including, perhaps, previously learned information). What information is extracted, and how? What does the brain "look for" in the moving dots?

You may hand this in as hard copy or via Docex, in the "Phil 371" folder. Make sure both names are on the report. Try to finish tonight, but if that is not feasible, reports will be accepted until 4 PM tomorrow. (Please bring them to my office, McCook 325.)

How to gather relevant experimental data for your report:

You can develop any experiment you want as you justify your main conclusion. However, here are four possible experiments to warm up on. Each may offer relevant clues to the processes of 3-d form perception. If you and your partner perform the experiments and discuss the questions raised for each one, you will find that you have many ideas for the main report question. Your answers for these warm-up questions need not be written out or handed in.

1. Experiment 1: Compare the effects of reducing dot correlations (1.00, 0.75, 0.35) and reducing the number of dots in the object (125, 50, 10, 2) for one of the geometric shapes (e.g. a sphere). Describe the effects of the two manipulations on your perception of the moving shape. Which degrades your perception more? Why?

2. Experiment 2: Compare your perception of the biological motion object with each of the background types (stationary, random, random motion, drifting). Which background degrades your perception of biological motion the most? The least? Why?

3. Experiment 3: Which is degraded more by adding a random motion background -- your perception of a moving geometric shape or of the biological motion? Why?

4. Experiment 4: The biological motion object is made of just 12 dots. Are 12 dots enough to generate the perception of a complete geometric shape (e.g. a sphere)? Why or why not?
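If you and your partner want to keep track of what you have tried, a small condition table may help. The Python sketch below lists every combination of correlation and dot count from Experiment 1, with an empty slot for your own rating of how clearly the shape appears. The names and the rating slot are hypothetical, not part of Insight 2, and the "coherent_dots" column rests on the assumption that correlation is the fraction of dots moving together with the shape.

```python
from itertools import product

# Values taken from Experiment 1 above.
CORRELATIONS = [1.00, 0.75, 0.35]
DOT_COUNTS = [125, 50, 10, 2]


def condition_table():
    """Build one row per (correlation, dot count) combination."""
    rows = []
    for corr, n_dots in product(CORRELATIONS, DOT_COUNTS):
        rows.append({
            "correlation": corr,
            "dots": n_dots,
            # ASSUMPTION: correlation = fraction of dots moving coherently.
            "coherent_dots": round(corr * n_dots),
            # Fill this in yourself after watching the movie.
            "rating": None,
        })
    return rows


for row in condition_table():
    print(f"corr={row['correlation']:.2f}  dots={row['dots']:>3}  "
          f"coherent~{row['coherent_dots']:>3}  rating={row['rating']}")
```

Laying the conditions out this way makes one thing plain: lowering the correlation and lowering the dot count both shrink the number of dots that carry the shape, but only the first also adds dots that work against it. That difference may be worth keeping in mind when you ask which manipulation degrades perception more.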