Anthony Chemero and Andrei Cimpian
Department of Psychology
Franklin and Marshall College
Lancaster, PA 17604-3003 USA

Classical cognitive science (Fodor and Pylyshyn 1988) is founded on the
idea that mind is a digital computer and that thinking is computation. Since computation is usually understood as
the rule-governed manipulation of representations (Haugeland 1985), this
foundational idea requires the assumption that the mind contains
representations of aspects of the environment.
Yet, due to advances in modeling within cognitive science, many have
been denying this necessary condition.
That is, many have embraced anti-representationalism – very roughly, the
idea that cognition does not involve representations at all. Due to the
emphasis they place on the role of perceptual and cognitive processes in
viewing films, sympathizers of Cognitive Film Theory (CFT) would benefit from a
clear perspective on the relevance of this debate to their field in its current
state, as well as on its potential implications for future research. In short,
an awareness of the issues involved in the cognitive scientists’ disagreement
over representations is likely to enhance the theoretical acuity of CFT. Thus,
the thrust of this paper is to discuss the degree to which CFT should find it
useful to rely on mental representations in its interpretation of film viewing;
to this end, we will first provide and explain one standard definition of
mental representation and then apply it, along with a non-representational
account, to a phenomenon commonly encountered, especially in movie theaters –
people’s ability to see a surface on which colored light is projected as either
a two-dimensional surface or a three-dimensional scene, but never both. The
conclusion of the paper will draw on the comparison of the two approaches to
this phenomenon in order to reach more general conclusions about the usefulness
of adopting either stance (representationalist or non-representationalist) in
CFT research. We
will start by providing a definition of representation that will match up with both philosophical tradition and current usage in
cognitive science (Chemero 2000).
In order to show that something is a representation, it will suffice to
show that it meets the following three conditions. A feature R0 of a system S
(say, an organism) will be counted as a representation for S if and only if:

(R1) R0 stands between a representation producer P and a representation
consumer C (say, a part of the perceptual apparatus of S) that have been
standardized to fit one another.

(R2) R0 has as its proper function to adapt the representation consumer C to
some aspect A0 of the environment, in particular by leading S to behave
appropriately with respect to A0, even when A0 is not the case.

(R3) There are (in addition to R0) transformations of R0, R1...Rn, that have
as their function to adapt the representation consumer C to corresponding
transformations of A0, A1...An.

Though we will not defend this definition here (see Chemero 1998, Chemero and Eck
1999, Millikan 1984), we will very briefly point out a few of its
features. First, as mentioned above,
since it requires a representation to have functions, it is teleological (R2). Second, it requires that the representation
serve as a representation in the context of producing and consuming devices
(R1). Combining R1 and R2, we could say that something is a representation whenever it is one
of several things that were designed to be used as representations by some
agent (where ‘agent’ is intended to be neutral among humans, non-human animals,
and machines). Third,
it requires that a representation be part of a system of representations
(R3). Agents must be able to represent
more than one thing, else they should not be thought of as representing
anything at all. Fourth, it requires
that we follow Millikan (1984) in focusing on the representation consumer in
determining the content of a representation – the content is the way the world
would need to be for the behavior caused by the representation consumer to be
adaptive (R2). According
to this definition of representation, everything
that was designed to interact with its environment represents its environment.
That is, one can argue that any system is representational, using the
definition of representation outlined above, one that matches up with usage in
cognitive science and philosophy (see Chemero 2000 for argument). A
straightforward consequence of this definition is that claims such as “A person’s
ability to understand the razor wire scene in Suspiria is due to his/her
mental representations” are not (by themselves) informative – they are merely
equivalent to the claim that people have well-designed visual and auditory
systems. It would thus seem that the mere positing of mental representations
does not carry much in the way of explanatory force. There must, then, be some other reason for invoking mental
representations in our explanations.
The question that must be asked is what those other reasons are. The
best answer to this question is that positing mental representations provides a
guide to the discovery of new phenomena. That is, positing mental
representations in film viewers (and other cognitive agents) is useful because
doing so allows for the generation of hypotheses that can be tested by
experiment, with the results of the experiment yielding further hypotheses and
further experiments. (See Chemero 2000.)
Notice that this is a methodological, not metaphysical, reason to posit
mental representations; mental representations should be posited only if they
lead to progress in our scientific understanding of perception, cognition, and
the like. So, ultimately, whether or not we should explain our abilities in terms of mental
representations will be determined by how much we can explain, and how well we
can explain it, in terms of mental representations – that is, it’s a pragmatic matter, one which depends upon
how useful positing mental representations is. To steal a phrase from William Ramsey (1997), mental
representations should be posited only if they earn their explanatory keep. Currently
in cognitive science, the extent to which mental representations earn their
explanatory keep is
hotly debated. Fodor (1975), Clark
(1997), and Markman & Dietrich (2000) (among others) have argued that
representations are absolutely necessary. Brooks (1991), van Gelder (1995), and
Keijzer (2001) (among others) have argued that representations are an
explanatory dead end. In our opinion,
the history of representation-based cognitive science is one of limited
successes and numerous failures. On the
other hand, in the case of non-representationally-based cognitive science, it
is too early to tell what the merits of the approach are – there just has not
been enough time for the empirical evidence to accumulate. Nevertheless, it is in this (empirical,
ultimately pragmatic) light that we should consider the question of whether
representations are necessary in order to explain our abilities – and the
results are not in yet. A priori
arguments for or against representations are unconvincing and almost always a
waste of time. (See Chemero 2001.) The
implications of this state of affairs for Cognitive Film Theory are now easily
drawn: Since it’s not clear whether representations are necessary in cognitive
science in general, it is also not clear whether they are necessary in
CFT. Therefore, individual researchers
are left with considerable freedom in how they approach their projects – their
intuitions concerning the value of representation in explaining our experiences
with motion pictures should be the determining factor in how they go about
their research. Importantly,
irrespective of what their initial convictions are regarding whether or not
thought is the use and/or manipulation of internal mental representations, the work
of CFT researchers will contribute not only to their own field, but also to the
debate over representations in cognitive science. In
order to illustrate the ideas presented thus far, we will now pit the two types
of explanations against each other by presenting two models (one
representational, one non-representational) of the same phenomenon, which we
will call critical switching. To see critical
switching in action, consider the recent television commercial by the pest
control company Orkin. This commercial
begins as a standard pitch, with a man describing the benefits of hiring Orkin
to rid your home of pests. Then,
apparently, a cockroach skitters across the surface of the TV monitor. Immediately, one’s attention is drawn from
the actor sitting a few feet behind the TV screen, selling a product (the
scene), to the TV screen itself (a surface).
This change of perspective is sufficiently compelling that many viewers
reported swatting at, even breaking, their television sets trying to kill the
cockroach. Just
as one sees one, but not both, of two different possible cubes when looking at
a Necker cube, so when viewing a film one sees the play of colors on the screen as
either a three-dimensional scene or a two-dimensional surface, but never both simultaneously. As Anderson
(1996) suggested, our perceptual systems will not tolerate ambiguity, and thus
we tend to go back and forth between projected-film-as-scene and projected-film-as-surface
but not settle in-between. This is critical switching. In what follows we will describe two
different models purporting to explain critical switching: a representational
model, and a non-representational model. Anderson (1996, and
personal communication) has identified several factors that play a role in
determining whether one sees a projected film as two-dimensional or
three-dimensional:

(a) motion parallax – a depth cue which arises when the observer (in our case, the
camera) moves against a stationary background, thus making it seem that closer
objects move much faster than objects at a distance (in the direction of motion
opposite to the camera’s);

(b) surface noise – small bodies (specks of dust, hair, etc.) and irregularities on the
surface of the projection screen;

(c) depth information – information gained from the structure of the light projected on
the screen, mostly from texture gradients, shading, etc.;

(d) “narrative strength” – the degree to which the plot is able to “draw in” the audience.

The identification of these factors makes it
possible to explain critical switching using a connectionist network
(Rumelhart and McClelland 1986). Connectionist network
research is a way of doing artificial intelligence that is based (loosely, we
must add) on knowledge about brains. A connectionist network
consists of a (sometimes large) number of simple units or nodes that
communicate with each other via connections
that are capable of carrying only very simple signals. The parallel with the
anatomy of the nervous system is realized in virtue of the fact that the units
are intended to be analogous to neurons, while the connections are intended to
be analogous to synapses. Connections
are weighted. The weight of a connection between two units
determines the degree to which one influences the other. Weights can either be excitatory (which tend to stimulate the
activity of the receiving unit) or inhibitory
(which tend to suppress the activity of the unit). Also, units are normally arranged into layers.
There is always an input layer
(analogous to an organism’s perceptual system) and an output layer. There are
usually one or more hidden layers. A neural network structured this way (see
Figure 1) is able to learn – and there is no need for additional
programming to encode the network’s extra knowledge. A neural network can be taught to solve particular problems; in general, when a
network learns to solve a problem, the weights
with which the network starts out (usually assigned randomly) are modified. In
neural networks, representations
are distributed, in that groups of
units are responsible for individual representations. A distributed representation is a pattern of activation that exists across several units
simultaneously. For example, a
particular pattern of activation across 10 nodes might be a representation of
"zebra". A different pattern across the same nodes might be a
representation of "giraffe".
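The notions of weighted connections and distributed representation can be made concrete with a small sketch. The Python fragment below is an illustration only, not a model drawn from the literature: the weights, the two input patterns, and the names layer_activations, HIDDEN_WEIGHTS, zebra_input and giraffe_input are all invented for the example. It shows how one and the same group of units can carry two different patterns of activation.

```python
import math


def sigmoid(x):
    """Squash a unit's net input into an activation between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_activations(inputs, weights):
    """Return the activations of one layer of receiving units.

    weights[j][i] is the weight of the connection from sending unit i to
    receiving unit j: positive weights are excitatory, negative inhibitory.
    """
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]


# Hypothetical hand-picked weights: 2 input units feeding 3 hidden units.
HIDDEN_WEIGHTS = [
    [4.0, -4.0],  # excited by input 0, inhibited by input 1
    [-4.0, 4.0],  # inhibited by input 0, excited by input 1
    [2.0, 2.0],   # excited by both inputs
]

# Two made-up input patterns standing in for two different stimuli.
zebra_input = [1.0, 0.0]
giraffe_input = [0.0, 1.0]

# Each stimulus is carried by a pattern across all three hidden units,
# not by any single unit.
zebra_pattern = layer_activations(zebra_input, HIDDEN_WEIGHTS)
giraffe_pattern = layer_activations(giraffe_input, HIDDEN_WEIGHTS)

print("zebra  :", [round(a, 2) for a in zebra_pattern])
print("giraffe:", [round(a, 2) for a in giraffe_pattern])
```

Note that the third hidden unit is equally active for both stimuli; the two stimuli are distinguished only by the pattern across the group as a whole.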
If we were to apply this type of representational model to the
phenomenon under scrutiny (switching between the perception of a
two-dimensional surface and a three-dimensional scene), we would first have to
claim that there exist parts of our brains that represent the level of each of
the relevant factors – motion parallax, surface noise, depth information, and
narrative strength. Supposing that we
also know from experiments what combinations of levels of the factors above
give us two-dimensional and three-dimensional perceptions, we are in a position
to train a connectionist network to match up with human perception. As seen in Figure 2,
such a network would have an input layer consisting of four nodes – one for
each of the factors, one hidden layer (although here we could vary both the
number of hidden layers and the number of units within each layer), and an
output layer, which indicates the type of perception that will occur given a
particular input. After being trained
by backpropagation (a supervised learning algorithm which adjusts the weight
of each connection so as to reduce the difference between the network’s
response and the correct response on a particular trial), the network would
perform appropriately on the training set.
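A rough sketch of such a network follows, in Python. Everything specific in it is an assumption made for illustration: the eight training examples are invented rather than taken from experiments, the choice of three hidden units, the learning rate, and the number of passes are arbitrary, and the function names are hypothetical. It is meant only to show the shape of the procedure (four inputs, one hidden layer, one output, weights adjusted by backpropagation), not to report a result.

```python
import math
import random

# Invented training data, for illustration only. Inputs are
# (motion parallax, surface noise, depth information, narrative strength),
# each scaled to 0..1. Target 1.0 = seen as 3-D scene, 0.0 = seen as 2-D surface.
TRAINING_SET = [
    ((0.9, 0.1, 0.8, 0.9), 1.0),
    ((0.8, 0.2, 0.9, 0.7), 1.0),
    ((0.7, 0.1, 0.6, 0.8), 1.0),
    ((0.9, 0.3, 0.7, 0.6), 1.0),
    ((0.1, 0.9, 0.2, 0.1), 0.0),
    ((0.2, 0.8, 0.1, 0.3), 0.0),
    ((0.3, 0.9, 0.3, 0.2), 0.0),
    ((0.1, 0.7, 0.2, 0.2), 0.0),
]
N_INPUT, N_HIDDEN = 4, 3


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_network(rng):
    """Small random starting weights; the last entry of each row is a bias."""
    hidden = [[rng.uniform(-0.5, 0.5) for _ in range(N_INPUT + 1)]
              for _ in range(N_HIDDEN)]
    output = [rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN + 1)]
    return hidden, output


def forward(net, x):
    """Return (hidden activations, output activation) for input x."""
    hidden_w, output_w = net
    h = [sigmoid(sum(w * v for w, v in zip(row, list(x) + [1.0])))
         for row in hidden_w]
    y = sigmoid(sum(w * v for w, v in zip(output_w, h + [1.0])))
    return h, y


def train(net, data, epochs=3000, rate=0.5):
    """Online backpropagation on squared error; modifies net in place."""
    hidden_w, output_w = net
    for _ in range(epochs):
        for x, target in data:
            h, y = forward(net, x)
            # Error signal at the output unit.
            delta_out = (y - target) * y * (1.0 - y)
            # Error signals passed back to the hidden units (computed
            # before the output weights are changed).
            delta_hid = [delta_out * output_w[j] * h[j] * (1.0 - h[j])
                         for j in range(N_HIDDEN)]
            for j, v in enumerate(h + [1.0]):
                output_w[j] -= rate * delta_out * v
            for j in range(N_HIDDEN):
                for i, v in enumerate(list(x) + [1.0]):
                    hidden_w[j][i] -= rate * delta_hid[j] * v


def mean_squared_error(net, data):
    return sum((forward(net, x)[1] - t) ** 2 for x, t in data) / len(data)


network = init_network(random.Random(0))
error_before = mean_squared_error(network, TRAINING_SET)
train(network, TRAINING_SET)
error_after = mean_squared_error(network, TRAINING_SET)
print(f"mean squared error: {error_before:.3f} -> {error_after:.3f}")

# A made-up input that does not appear in the training set.
novel = (0.8, 0.2, 0.7, 0.8)
print("output for novel input:", round(forward(network, novel)[1], 2))
```

An output above 0.5 is read here as "scene" and below 0.5 as "surface"; that threshold is likewise a convention of the sketch.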
It would also generalize: it would give the “right answer” on sets of
values not in the training set. Successful
discrimination between the sets of values that lead to perception in two
dimensions or three dimensions in such a network would be explained in terms of
distributed representations across
the hidden nodes. In particular, we
would explain the match between the performance of the network and the
performance of humans by pointing to the internal representations of motion
parallax, etc. in the network, and by suggesting that humans have similar
internal representations of these features of the motion picture. A non-representational
model of critical switching can be formulated within dynamical systems theory
(DST), a branch of mathematics (see Port and van Gelder 1995). Most basically, according to the dynamical
approach, agents are dynamical systems
and are best explained mathematically,
without any reference to representations.
A dynamical system is a set of quantitative variables changing
continually, concurrently, and interdependently over time in accordance with
dynamical laws described by some set of equations. A crucial feature of the dynamical approach is that some of the
components of a dynamical system might be inside the agent, while others are
outside. Given this set of assumptions,
DST claims that there is no good reason to say that the mind is contained
within the brain or even within the skin of an organism. If this is right, there is no need to
posit representations: the animal interacts with the environment, not
representations of it, and parts of the environment are actually components of
the animal-plus-environment dynamical system. We can describe this
aspect of dynamical systems explanations by saying that the agent and
environment are best thought of as a single coupled
dynamical system. That is, we could
(somewhat arbitrarily) make a distinction between an agent A and its
environment E, but it is often more useful to think of them as one larger
system U. Figure 3 contains a
more detailed depiction of what is meant by coupling, together with the
equations that express it. The
idea of coupling is illustrated in a dynamical model of cognition that has
proven widely applicable and whose range is being extended to more aspects of
cognition: the HKB model (Kelso 1995; Haken, Kelso and Bunz 1985). The HKB model is based upon a very simple,
very robust experimental result. Subjects
asked to wag their index fingers left-to-right can produce only two stable
patterns of bimanual coordination. In
one, called in-phase or relative phase 0, the fingers approach
one another at the mid-line of the body; in the other, called out-of-phase or relative phase .5, the fingers move simultaneously to the left,
then to the right, like the windshield wipers on most cars. As subjects were
asked to wag their fingers out-of-phase at gradually increasing rates, they
eventually were unable to do so, and slipped into in-phase wagging. The HKB model for this behavior applies a vector
field to the relative phase of the fingers.
At slower rates, this field has two attractors, one at relative phase
.5, another at relative phase 0. This
means that any finger wagging will tend to be stable only when one of these
values for relative phase is maintained.
But as the rate increases (and passes what HKB call the critical point), the attractor at .5
disappears, so the only remaining attractor is at relative phase 0. So finger-wagging at higher rates will tend
to be stable only when it is in-phase.
The mathematical model of this behavior (including its fundamental
equation) is illustrated in Figure 4.
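The equation itself appears only in Figure 4. In its standard published form (Haken, Kelso and Bunz 1985; Kelso 1995) the potential is V(φ) = −a cos φ − b cos 2φ, where φ is relative phase in radians (so φ = π corresponds to relative phase .5 in the units used here), and the ratio b/a falls as the rate of wagging rises. Assuming that form, the following Python sketch checks which relative phases are attractors and follows the relative phase as it settles. The particular values of a and b, the step size, and the function names are choices made for the illustration.

```python
import math


def hkb_potential(phi, a, b):
    """HKB potential V(phi) = -a*cos(phi) - b*cos(2*phi), phi in radians."""
    return -a * math.cos(phi) - b * math.cos(2.0 * phi)


def hkb_flow(phi, a, b):
    """Rate of change of relative phase: dphi/dt = -dV/dphi."""
    return -a * math.sin(phi) - 2.0 * b * math.sin(2.0 * phi)


def is_attractor(phi, a, b, eps=1e-3):
    """True if phi is a local minimum of the potential (a stable phase)."""
    v = hkb_potential(phi, a, b)
    return (v < hkb_potential(phi - eps, a, b)
            and v < hkb_potential(phi + eps, a, b))


def settle(phi_start, a, b, dt=0.01, steps=20000):
    """Follow the flow with simple Euler steps and return the final phase."""
    phi = phi_start
    for _ in range(steps):
        phi += dt * hkb_flow(phi, a, b)
    return phi


def as_fraction_of_cycle(phi):
    """Convert radians to the 0-to-1 relative-phase units used in the text."""
    return (phi / (2.0 * math.pi)) % 1.0


# Start slightly off out-of-phase wagging (relative phase just under .5).
start = math.pi - 0.1

# Illustrative values (assumptions): b/a = 1.0 stands in for a slow rate,
# b/a = 0.1 for a fast rate past the critical point at b/a = 0.25.
slow = as_fraction_of_cycle(settle(start, a=1.0, b=1.0))
fast = as_fraction_of_cycle(settle(start, a=1.0, b=0.1))

print("slow rate, attractor at .5?", is_attractor(math.pi, 1.0, 1.0))
print("fast rate, attractor at .5?", is_attractor(math.pi, 1.0, 0.1))
print(f"slow rate: settles at relative phase {slow:.2f}")
print(f"fast rate: settles at relative phase {fast:.2f}")
```

With the slow-rate values the out-of-phase pattern is maintained; with the fast-rate values the same starting point slips into in-phase wagging, as in the experimental result described above.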
In the equation for the energy potential (V), the values of the variables a and b determine which attractors are present. The HKB model is most
impressive because it is extremely widely applicable – it can be used to
describe nearly every sort of two-factor coordinated action where there is
critical behavior, including instances of limb-limb coordination, person-person
interaction, or, most importantly for present purposes, person-external signal interaction. Furthermore, the HKB mathematics can be applied to any coupling
of dynamical systems with critical points, points where a change in the
behavior of the agent occurs due to the disappearance of an attractor. This
model is therefore applicable to critical switching in film perception as well:
a person-viewing-a-film is a coupled dynamical system, part of which is on the
screen, part of which is inside the viewer. If we assume that the values of the
HKB equation variables a and b measure the coupling between the viewer and
aspects of the film, then, at some values of a and b (say, for high surface
noise and low motion parallax), it will only be possible to view the film as a
surface, while for others, the opposite will be true. For yet other values of a and b, it will be possible to view the
film as either a surface or a scene.
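The three regimes just described can be read off the standard HKB potential V(φ) = −a cos φ − b cos 2φ under two assumptions that go beyond the text: that relative phase 0 is taken to stand for film-as-scene and relative phase .5 (φ = π) for film-as-surface, and that a may take either sign, which the finger-wagging case does not require. How measured film factors such as surface noise would fix a and b is left open here. The Python sketch below, with a hypothetical function name and made-up coupling values, simply classifies a pair (a, b) by which views are stable.

```python
def stable_views(a, b):
    """List the views that are attractors of V(phi) = -a*cos(phi) - b*cos(2*phi).

    Assumed labelling (not from the source): relative phase 0 stands for
    film-as-scene, relative phase .5 (phi = pi) for film-as-surface.
    A fixed point is stable where the second derivative of V is positive:
      at phi = 0:   V'' = a + 4b
      at phi = pi:  V'' = 4b - a
    """
    views = []
    if a + 4.0 * b > 0.0:
        views.append("scene")
    if 4.0 * b - a > 0.0:
        views.append("surface")
    return views


# Hypothetical coupling values chosen only to land in each regime.
for a, b in [(2.0, 0.25), (-2.0, 0.25), (0.5, 1.0)]:
    print(f"a={a:+.1f}, b={b:.2f}: stable views = {stable_views(a, b)}")
```

On this reading, critical switching is possible only in the third regime, where both views are attractors; moving a or b across a boundary removes one attractor and forces a single view.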
That is, we will see critical switching between viewing the film as
surface and viewing the film as scene. Notice that given this
application of the HKB model, there is no need to posit representations inside
the film viewer to account for critical switching. This dynamical account does not require mental representations
because the switching from seeing the film as surface to seeing it as a scene
does not occur inside the viewer. The
film viewer is not taken to be building an internal representation of the film,
which she then sees as being either 2D or 3D, but not both. Instead, the viewer and film are just one
system, and the critical switching occurs in the relationship between them. We have reviewed two models
– one representational, one non-representational – of the same phenomenon,
critical switching. Cognitive Film
Theorists can adopt either perspective in conducting their research – at this
stage, their decisions should be based on the degree to which they find one
model more convincing than the other or on whether they feel that one approach
is likely to provide more plausible explanations of the phenomenon they are
studying. As it is extremely unlikely
that an a priori demonstration of the
real nature of cognition will ever be
provided, the results of the studies conducted in CFT, irrespective of their
theoretical stance, will be viewed as contributions to the eventual resolution
of the debate over representations within cognitive science.

Works Cited

Anderson, J. (1996). An Ecological Approach to Cognitive Film Theory. Carbondale: Southern Illinois University Press.
Brooks, R. (1991). Intelligence Without Representation. Artificial Intelligence 47: 139-159.
Chemero, A. (1998). How to be an Anti-Representationalist. Doctoral Thesis in Philosophy and Cognitive Science, Indiana University, Bloomington, IN.
Chemero, A. (2000). Anti-representationalism and the dynamical stance. Philosophy of Science 67: 625-647.
Chemero, A. (2001). Dynamical Explanation and Mental Representation. Trends in Cognitive Sciences 5(4): 140-141.
Chemero, A. and Eck, D. (1999). An Exploration of Representational Complexity via Coupled Oscillator Systems. In U. Priss (ed.), Proceedings of the 1999 Midwest AI and Cognitive Science Conference. Cambridge: AAAI Press.
Clark, A. (1997). Being There. Cambridge: MIT Press.
Fodor, J. (1975). The Language of Thought. Cambridge: MIT Press.
Fodor, J. and Pylyshyn, Z. (1988). Connectionism and Cognitive Architecture. Cognition 28: 3-71.
Haken, H., Kelso, J. A. S., and Bunz, H. (1985). A theoretical model of phase transitions in human hand movements. Biological Cybernetics 51: 347-356.
Haugeland, J. (1985). Artificial Intelligence: The Very Idea. Cambridge: MIT Press.
Keijzer, F. (2001). Representation and Behavior. Cambridge: MIT Press.
Kelso, J. A. S. (1995). Dynamic Patterns. Cambridge: MIT Press.
Markman, A. and Dietrich, E. (2000). Extending the Classical View of Representations. Trends in Cognitive Sciences 4: 470-475.
Millikan, R. (1984). Language, Thought and Other Biological Categories. Cambridge: MIT Press.
Port, R. and van Gelder, T. (1995). Mind as Motion. Cambridge: MIT Press.
Ramsey, W. (1997). Do Connectionist Representations Earn their Explanatory Keep? Mind and Language 12: 34-66.
van Gelder, T. (1995). What might Cognition be if not Computation? Journal of Philosophy 92: 345-381.

Figure 1. A two-layer and a three-layer connectionist network.
Figure 2. Sample connectionist network that outputs whether the perception is of a two-dimensional surface or a three-dimensional scene.
Figure 3. The coupling of the animal and the environment and the equations that express it.
Figure 4. The Haken-Kelso-Bunz model for finger wagging. Please note that the two vertical lines on the graph of potential vs. phase are in fact the same line.