The Role of Mental Representations in Cognitive Studies of Film

Anthony Chemero and Andrei Cimpian

Department of Psychology

Franklin and Marshall College

Lancaster, PA 17604-3003 USA

 

Classical cognitive science (Fodor and Pylyshyn 1988) is founded on the idea that the mind is a digital computer and that thinking is computation.  Since computation is usually understood as the rule-governed manipulation of representations (Haugeland 1985), this foundational idea requires the assumption that the mind contains representations of aspects of the environment.  Yet, prompted by advances in modeling within cognitive science, many have come to deny this assumption.  That is, many have embraced anti-representationalism: very roughly, the idea that cognition does not involve representations at all. Because of the emphasis they place on the role of perceptual and cognitive processes in viewing films, proponents of Cognitive Film Theory (CFT) would benefit from a clear perspective on the relevance of this debate to their field in its current state, as well as on its potential implications for future research. In short, an awareness of the issues involved in cognitive scientists’ disagreement over representations is likely to enhance the theoretical acuity of CFT. The aim of this paper is therefore to discuss the degree to which CFT should find it useful to rely on mental representations in its interpretation of film viewing. To this end, we will first provide and explain one standard definition of mental representation and then apply it, along with a non-representational account, to a phenomenon commonly encountered in movie theaters: people’s ability to see a surface on which colored light is projected as either a two-dimensional surface or a three-dimensional scene, but never both at once. The conclusion of the paper will draw on the comparison of the two approaches to this phenomenon in order to reach more general conclusions about the usefulness of adopting either stance (representationalist or non-representationalist) in CFT research.

We will start by providing a definition of representation that matches up with both philosophical tradition and current usage in cognitive science (Chemero 2000).  In order to show that something is a representation, it will suffice to show that it meets the following three conditions.

A feature R_0 of a system S (say, an organism) will be counted as a Representation for S if and only if:

(R1) R_0 stands between a representation producer P and a representation consumer C (say, a part of the perceptual apparatus of S) that have been standardized to fit one another.

(R2) R_0 has as its proper function to adapt the representation consumer C to some aspect A_0 of the environment, in particular by leading S to behave appropriately with respect to A_0, even when A_0 is not the case.

(R3) There are (in addition to R_0) transformations of R_0, R_1 ... R_n, that have as their function to adapt the representation consumer C to corresponding transformations of A_0, A_1 ... A_n.

Though we will not defend this definition here (see Chemero 1998, Chemero and Eck 1999, Millikan 1984), we will very briefly point out a few of its features.  First, since it requires a representation to have functions, it is teleological (R2).  Second, it requires that the representation serve as a representation in the context of producing and consuming devices (R1). Combining R1 and R2, we could say that something is a representation whenever it is one of several things that were designed to be used as representations by some agent (where ‘agent’ is intended to be neutral among humans, non-human animals, and machines).  Third, it requires that a representation be part of a system of representations (R3).  Agents must be able to represent more than one thing, else they should not be thought of as representing anything at all.  Fourth, it requires that we follow Millikan (1984) in focusing on the representation consumer in determining the content of a representation: the content is the way the world would need to be for the behavior caused by the representation consumer to be adaptive (R2).

According to this definition of representation, everything that was designed to interact with its environment represents its environment. That is, one can argue that any such system is representational under the definition outlined above, which matches up with usage in cognitive science and philosophy (see Chemero 2000 for the argument). A straightforward consequence of this definition is that claims such as “A person’s ability to understand the razor wire scene in Suspiria is due to his/her mental representations” are not (by themselves) informative; they are merely equivalent to the claim that people have well-designed visual and auditory systems. It would thus seem that the mere positing of mental representations does not carry much in the way of explanatory force.  There must, then, be some other reason for invoking mental representations in our explanations.  The question that must be asked is what those other reasons are.

The best answer to this question is that positing mental representations provides a guide to the discovery of new phenomena.  That is, positing mental representations in film viewers (and other cognitive agents) is useful because doing so allows for the generation of hypotheses that can be tested by experiment, with the results of the experiment yielding further hypotheses and further experiments (see Chemero 2000).  Notice that this is a methodological, not metaphysical, reason to posit mental representations; mental representations should be posited only if they lead to progress in our scientific understanding of perception, cognition, and the like.  So, ultimately, whether or not we should explain our abilities in terms of mental representations will be determined by how much we can explain, and how well we can explain it, in terms of mental representations. That is, it is a pragmatic matter, one which depends upon how useful positing mental representations is.  To steal a phrase from William Ramsey (1997), mental representations should be posited only if they earn their explanatory keep.

Currently in cognitive science, the extent to which mental representations earn their explanatory keep is hotly debated.  Fodor (1975), Clark (1997), and Markman & Dietrich (2000) (among others) have argued that representations are absolutely necessary. Brooks (1991), van Gelder (1995), and Keijzer (2001) (among others) have argued that representations are an explanatory dead end.  In our opinion, the history of representation-based cognitive science is one of limited successes and numerous failures.  On the other hand, in the case of non-representational cognitive science, it is too early to tell what the merits of the approach are; there just has not been enough time for the empirical evidence to accumulate.  Nevertheless, it is in this (empirical, ultimately pragmatic) light that we should consider the question of whether representations are necessary in order to explain our abilities, and the results are not in yet.  A priori arguments for or against representations are unconvincing and almost always a waste of time (see Chemero 2001).

The implications of this state of affairs for Cognitive Film Theory are now easily drawn: since it is not clear whether representations are necessary in cognitive science in general, it is also not clear whether they are necessary in CFT.  Therefore, individual researchers are left with considerable freedom in how they approach their projects: their intuitions concerning the value of representation in explaining our experiences with motion pictures should be the determining factor in how they go about their research.  Importantly, irrespective of their initial convictions about whether thought is the use and/or manipulation of internal mental representations, the work of CFT researchers will contribute not only to their own field, but also to the debate over representations in cognitive science.

In order to illustrate the ideas presented thus far, we will now pit the two types of explanations against each other by presenting two models (one representational, one non-representational) of the same phenomenon, which we will call critical switching. To see critical switching in action, consider the recent television commercial by the pest control company Orkin.  This commercial begins as a standard pitch, with a man describing the benefits of hiring Orkin to rid your home of pests.  Then, apparently, a cockroach skitters across the surface of the TV monitor.  Immediately, one’s attention is drawn from the actor, who appears to sit a few feet behind the TV screen selling a product (the scene), to the TV screen itself (a surface).  This change of perspective is sufficiently compelling that many viewers reported swatting at, even breaking, their television sets trying to kill the cockroach.

Just as one sees one, but not both, of two possible cubes when looking at a Necker cube, so when viewing a film one sees the play of colors on the screen as either a three-dimensional scene or a two-dimensional surface, but never both simultaneously. As Anderson (1996) suggested, our perceptual systems will not tolerate ambiguity, and thus we tend to go back and forth between projected-film-as-scene and projected-film-as-surface but do not settle in between. This is critical switching.  In what follows we will describe two different models purporting to explain critical switching: a representational model and a non-representational model.

Anderson (1996, and personal communication) has identified several factors that play a role in determining whether one sees a projected film as two-dimensional or three-dimensional:

(a) motion parallax – a depth cue which arises when the observer (in our case, the camera) moves against a stationary background, so that closer objects appear to move much faster than objects at a distance (in the direction opposite to the camera’s motion);

(b) surface noise – small bodies (specks of dust, hair, etc.) and irregularities on the surface of the projection screen;

(c) depth information – information gained from the structure of the light projected on the screen, mostly from texture gradients, shading, etc.;

(d) “narrative strength” – the degree to which the plot is able to “draw in” the audience.

The identification of these factors makes it possible to explain critical switching using a connectionist network (Rumelhart and McClelland 1986).

Connectionist network research is an approach to artificial intelligence that is based (loosely, we must add) on knowledge about brains.  A connectionist network consists of a (sometimes large) number of simple units or nodes that communicate with each other via connections that are capable of carrying only very simple signals. The parallel with the anatomy of the nervous system lies in the fact that the units are intended to be analogous to neurons, while the connections are intended to be analogous to synapses.  Connections are weighted.  The weight of a connection between two units determines the degree to which one influences the other. Weights can be either excitatory (tending to stimulate the activity of the receiving unit) or inhibitory (tending to suppress it).  Also, units are normally arranged into layers.  There is always an input layer (analogous to an organism’s perceptual system) and an output layer.  There are usually one or more hidden layers.  A neural network structured this way (see Figure 1) is able to learn, and there is no need for additional programming to encode the network’s extra knowledge.  A neural network can be taught to solve particular problems; in general, when a network learns to solve a problem, the weights with which the network starts out (usually assigned randomly) are modified.

In neural networks, representations are distributed, in that groups of units are responsible for individual representations.  A distributed representation is a pattern of activation that exists across several units simultaneously.  For example, a particular pattern of activation across 10 nodes might be a representation of "zebra".  A different pattern across the same nodes might be a representation of "giraffe".  If we were to apply this type of representational model to the phenomenon under scrutiny (switching between the perception of a two-dimensional surface and a three-dimensional scene), we would first have to claim that there exist parts of our brains that represent the level of each of the relevant factors – motion parallax, surface noise, depth information, and narrative strength.  Supposing that we also know from experiments what combinations of levels of the factors above give us two-dimensional and three-dimensional perceptions, we are in a position to train a connectionist network to match up with human perception. 

As seen in Figure 2, such a network would have an input layer consisting of four nodes (one for each of the factors), one hidden layer (although here we could vary both the number of hidden layers and the number of units within each layer), and an output layer, which indicates the type of perception that will occur given a particular input.  After being trained by backpropagation (a supervised learning algorithm which, on each trial, adjusts every connection weight in proportion to its contribution to the error in the network’s response), the network would perform appropriately on the training set.  It should also generalize: it would give the “right answer” on sets of values not in the training set.
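The network just described can be sketched in a few lines of Python. What follows is a minimal illustration under stated assumptions, not the model of Figure 2: the hidden-layer size, the learning rate, and the names `TinyNet`, `toy_label`, and `make_training_set` are ours, and the labeling rule is an invented placeholder standing in for the experimental data on human perception that the text supposes we have.

```python
import math
import random

LEVELS = [0.0, 0.5, 1.0]  # coarse grid of factor levels (an assumption)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def toy_label(parallax, noise, depth, narrative):
    """Invented placeholder rule, NOT experimental data: 1.0 = scene, 0.0 = surface."""
    return 1.0 if parallax + depth + narrative - 2.0 * noise > 1.0 else 0.0


def make_training_set():
    # Inputs are ordered [motion parallax, surface noise, depth information, narrative strength].
    return [([p, s, d, n], toy_label(p, s, d, n))
            for p in LEVELS for s in LEVELS for d in LEVELS for n in LEVELS]


class TinyNet:
    """Four input nodes -> one hidden layer -> one output unit, all sigmoid."""

    def __init__(self, n_in=4, n_hidden=3, seed=0):
        rng = random.Random(seed)
        # Each weight row ends with a bias weight; weights start out random.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        inputs = list(x) + [1.0]  # the constant 1.0 feeds the bias weight
        hidden = [sigmoid(sum(w * v for w, v in zip(row, inputs)))
                  for row in self.w_hidden]
        out = sigmoid(sum(w * h for w, h in zip(self.w_out, hidden + [1.0])))
        return hidden, out

    def predict(self, x):
        return self.forward(x)[1]

    def train_step(self, x, target, lr=0.5):
        """One backpropagation update on the squared error for a single example."""
        hidden, out = self.forward(x)
        delta_out = (out - target) * out * (1.0 - out)
        # Pass the output error back through the weights as they were before this update.
        delta_hidden = [delta_out * self.w_out[j] * h * (1.0 - h)
                        for j, h in enumerate(hidden)]
        for j, h in enumerate(hidden + [1.0]):
            self.w_out[j] -= lr * delta_out * h
        inputs = list(x) + [1.0]
        for j, row in enumerate(self.w_hidden):
            for i, v in enumerate(inputs):
                row[i] -= lr * delta_hidden[j] * v


def train(net, data, epochs=500, seed=0):
    rng = random.Random(seed)
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)  # present the examples in a fresh order each epoch
        for x, target in data:
            net.train_step(x, target)


def mean_squared_error(net, data):
    return sum((net.predict(x) - t) ** 2 for x, t in data) / len(data)
```

After `train(net, make_training_set())`, an output from `net.predict` above 0.5 is read as "scene" and one below 0.5 as "surface". The first value returned by `forward` is the pattern of activation across the hidden units, which is the kind of distributed pattern a representational explanation would appeal to.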

Successful discrimination between the sets of values that lead to perception in two dimensions or three dimensions in such a network would be explained in terms of distributed representations across the hidden nodes.  In particular, we would explain the match between the performance of the network and the performance of humans by pointing to the internal representations of motion parallax, etc. in the network, and by suggesting that humans have similar internal representations of these features of the motion picture.

A non-representational model of critical switching can be formulated within dynamical systems theory (DST), a branch of mathematics (see Port and van Gelder 1995).  Most basically, according to the dynamical approach, agents are dynamical systems and are best explained mathematically, without any reference to representations.  A dynamical system is a set of quantitative variables changing continually, concurrently, and interdependently over time in accordance with dynamical laws described by some set of equations.  A crucial feature of the dynamical approach is that some of the components of a dynamical system might be inside the agent, while others are outside.  Given this set of assumptions, DST claims that there is no good reason to say that the mind is contained within the brain or even within the skin of an organism.  On this view, there is no need to posit representations: the animal interacts with the environment, not representations of it, and parts of the environment are actually components of the animal-plus-environment dynamical system.

We can describe this aspect of dynamical systems explanations by saying that the agent and environment are best thought of as a single coupled dynamical system.  That is, we could (somewhat arbitrarily) make a distinction between an agent A and its environment E, but it is often more useful to think of them as one larger system U.  Figure 3 contains a more detailed depiction of what is meant by coupling: the change of the animal is a function of the animal’s state and its sensing of the environment. Moreover, the change of the environment is a function of the environment’s state and the animal’s motion. The changes effected on and by the animal and the environment influence one another and form one system: the animal and the environment are coupled.  For our purposes, it is crucial to notice that U (the animal + environment) is the dynamical system of interest.
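The coupling described above can be written compactly as a pair of differential equations. This is a sketch in generic notation: the symbols $x_A$, $x_E$, $A$, $E$, $S$, and $M$ are assumed here for illustration and need not match those used in Figure 3.

```latex
\begin{align}
\dot{x}_A &= A\bigl(x_A;\, S(x_E)\bigr), \\
\dot{x}_E &= E\bigl(x_E;\, M(x_A)\bigr).
\end{align}
```

Here $x_A$ and $x_E$ are the states of the animal and the environment, $S$ stands for the animal's sensing of the environment, and $M$ for the effect of the animal's motion on the environment. Because each equation contains the other system's state, neither can be solved on its own, which is the sense in which the two form the single system U.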

The idea of coupling is illustrated in a dynamical model of cognition that has proven widely applicable and whose range is being extended to more aspects of cognition: the HKB model (Kelso 1995; Haken, Kelso and Bunz 1985).  The HKB model is based upon a very simple, very robust experimental result.  Subjects asked to wag their index fingers left-to-right can produce only two stable patterns of bimanual coordination.  In one, called in-phase or relative phase 0, the fingers approach one another at the mid-line of the body; in the other, called out-of-phase or relative phase .5, the fingers move simultaneously to the left, then to the right, like the windshield wipers on most cars. When subjects are asked to wag their fingers out-of-phase at gradually increasing rates, they eventually become unable to do so and slip into in-phase wagging.  The HKB model for this behavior applies a vector field to the relative phase of the fingers.  At slower rates, this field has two attractors, one at relative phase .5, another at relative phase 0.  This means that any finger wagging will tend to be stable only when one of these values for relative phase is maintained.  But as the rate increases (and passes what HKB call the critical point), the attractor at .5 disappears, so the only remaining attractor is at relative phase 0.  So finger-wagging at higher rates will tend to be stable only when it is in-phase.  The mathematical model of this behavior (including its fundamental equation) is illustrated in Figure 4.  In the equation for the energy potential (V), φ is the relative phase and the ratio b/a is inversely related to rate (in our case, the rate of finger wagging).  This function, it is worth noting, is the simplest that will accommodate all the data.
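The disappearance of the attractor at relative phase .5 can be checked directly from the potential. The sketch below assumes the standard form of the HKB potential (Haken, Kelso and Bunz 1985), with $\phi$ the relative phase in radians (so that relative phase .5 corresponds to $\phi = \pi$) and $a, b > 0$. If the equation in Figure 4 is parameterized differently, the threshold would change accordingly.

```latex
\begin{align}
V(\phi) &= -a\cos\phi - b\cos 2\phi, \\
\frac{dV}{d\phi} &= a\sin\phi + 2b\sin 2\phi, \\
\frac{d^{2}V}{d\phi^{2}} &= a\cos\phi + 4b\cos 2\phi .
\end{align}
```

The first derivative vanishes at both $\phi = 0$ and $\phi = \pi$. At $\phi = 0$ the curvature is $a + 4b > 0$, so in-phase wagging is always an attractor. At $\phi = \pi$ the curvature is $4b - a$, which is positive only when $b/a > 1/4$. Under these assumptions the critical point lies at $b/a = 1/4$: as the rate rises and $b/a$ falls below this value, the out-of-phase attractor vanishes.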

The HKB model is most impressive because it is extremely widely applicable: it can be used to describe nearly every sort of two-factor coordinated action where there is critical behavior, including instances of limb-limb coordination, person-person interaction, or, most importantly for present purposes, person-external signal interaction.  Furthermore, the HKB mathematics can be applied to any coupling of dynamical systems with critical points, points where a change in the behavior of the agent occurs due to the disappearance of an attractor. This model is therefore applicable to critical switching in film perception as well: a person-viewing-a-film is a coupled dynamical system, part of which is on the screen, part of which is inside the viewer. If we assume that the values of the HKB equation variables a and b measure the coupling between the viewer and aspects of the film, then, at some values of a and b (say, for high surface noise and low motion parallax), it will only be possible to view the film as a surface, while for others, the opposite will be true.  For yet other values of a and b, it will be possible to view the film as either a surface or a scene.  That is, we will see critical switching between viewing the film as surface and viewing the film as scene.
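The three regimes just described (surface only, scene only, either) can be illustrated with a short Python sketch. It rests on several assumptions of ours rather than on anything established above: the potential is taken to have the standard HKB form V = -a cos(phi) - b cos(2 phi) with b > 0, the parameter a is allowed to take either sign, and the assignment of "scene" to relative phase 0 and "surface" to relative phase .5 in `PERCEPT` is arbitrary. How a and b would depend on measurable factors such as surface noise is left open.

```python
import math

# Hypothetical reading of the two attractors; the assignment is arbitrary.
PERCEPT = {0.0: "scene", 0.5: "surface"}


def stable_phases(a, b):
    """Stable relative phases (in cycles) of V = -a*cos(phi) - b*cos(2*phi), for b > 0."""
    stable = []
    if a + 4.0 * b > 0.0:  # curvature of V at phi = 0
        stable.append(0.0)
    if 4.0 * b - a > 0.0:  # curvature of V at phi = pi
        stable.append(0.5)
    return stable


def possible_percepts(a, b):
    """Which ways of seeing the film are available at these parameter values."""
    return [PERCEPT[p] for p in stable_phases(a, b)]


def settle(phi0, a, b, dt=0.01, steps=20000):
    """Follow d(phi)/dt = -dV/d(phi) by Euler steps from phi0 (radians).

    Returns whichever of the relative phases 0.0 and 0.5 lies nearest the
    final state, so it is meaningful only once the state has settled.
    """
    phi = phi0
    for _ in range(steps):
        phi -= dt * (a * math.sin(phi) + 2.0 * b * math.sin(2.0 * phi))
    cycles = (phi / (2.0 * math.pi)) % 1.0
    return (round(2.0 * cycles) / 2.0) % 1.0
```

With a = 1 and b = 1, `possible_percepts` returns both percepts, and a state started near relative phase .5 stays there. With a = 1 and b = 0.1 only one attractor remains, and `settle(math.pi + 0.1, 1.0, 0.1)` leaves the neighborhood of .5 for relative phase 0, which is the switch. Nothing in the sketch stores or consults an internal model of the film; the switch follows from the shape of the potential alone.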

Notice that given this application of the HKB model, there is no need to posit representations inside the film viewer to account for critical switching.  This dynamical account does not require mental representations because the switching from seeing the film as surface to seeing it as a scene does not occur inside the viewer.  The film viewer is not taken to be building an internal representation of the film, which she then sees as being either 2D or 3D, but not both.  Instead, the viewer and film are just one system, and the critical switching occurs in the relationship between them.

We have reviewed two models – one representational, one non-representational – of the same phenomenon, critical switching.  Cognitive Film Theorists can adopt either perspective in conducting their research – at this stage, their decisions should be based on the degree to which they find one model more convincing than the other or on whether they feel that one approach is likely to provide more plausible explanations of the phenomenon they are studying.  As it is extremely unlikely that an a priori demonstration of the real nature of cognition will ever be provided, the results of the studies conducted in CFT, irrespective of their theoretical stance, will be viewed as contributions to the eventual resolution of the debate over representations within cognitive science.

 

 


Works Cited

 

Anderson, J. (1996).  An Ecological Approach to Cognitive Film Theory.  Carbondale: Southern Illinois University Press.

 

Brooks, R. (1991). Intelligence Without Representation. Artificial Intelligence 47: 139-159.

 

Chemero, A. (1998). How to be an Anti-Representationalist. Doctoral Thesis in Philosophy and Cognitive Science, Indiana University, Bloomington, IN.

 

Chemero, A. (2000). Anti-representationalism and the dynamical stance. Philosophy of Science 67, 625-647.

 

Chemero, A. (2001). Dynamical Explanation and Mental Representation. Trends in Cognitive Sciences, 5, 4, 140-141.

 

Chemero, A. and Eck, D. (1999).  An Exploration of Representational Complexity via Coupled Oscillator Systems. In U. Priss (ed.) Proceedings of the 1999 Midwest AI and Cognitive Science Conference, Cambridge: AAAI Press.

 

Clark, A. (1997). Being There. Cambridge: MIT Press.

 

Fodor, J. (1975).  The Language of Thought.  Cambridge: MIT Press.

 

Fodor, J. and Pylyshyn, Z. (1988). Connectionism and Cognitive Architecture. Cognition 28: 3-71.

 

Haugeland, J. (1985).  Artificial Intelligence: The Very Idea.  Cambridge: MIT Press.

 

Haken, H., Kelso, J.A.S.,  Bunz, H.  (1985).  A theoretical model of phase transitions in human hand movements. Biological Cybernetics 51, 347-356.

 

Keijzer, F. (2001).  Representation and Behavior.  Cambridge: MIT Press.

 

Kelso, J. A. S. (1995). Dynamic Patterns. Cambridge: MIT Press.

 

Markman, A. and Dietrich, E. (2000).  Extending the Classical View of Representations.  Trends in Cognitive Sciences 4, 470-475.

 

Millikan, R. (1984). Language, Thought and Other Biological Categories. Cambridge: MIT Press.

 

Port, R. and van Gelder, T. (1995). Mind as Motion. Cambridge: MIT Press.

 

Ramsey, W. (1997). Do Connectionist Representations Earn their Explanatory Keep? Mind and Language, 12: 34-66.

 

van Gelder, T. (1995). What might Cognition be if not Computation? Journal of Philosophy, 92, 345-381.

 

 

 

 


 


Figure 1. A two-layer and a three-layer connectionist network.


 

 

 


 

 


Figure 2. Sample connectionist network that outputs whether the perception is of a two-dimensional surface or a three-dimensional scene.

 


 


Figure 3. The coupling of the animal and the environment and the equations that express it.

 


 


Figure 4. The Haken-Kelso-Bunz model for finger wagging.  Please note that the two vertical lines on the graph of potential vs. phase are in fact the same line.