Cognitive Approaches to
Software Comprehension: Results, Gaps and
Introduction
Notations Differ in Intricate Ways
Types of Information and Comprehension Processes
Some Complications
Resources: A Different Interpretation
Conclusion: People do what's easiest
Bibliography
Figure 1
Figure 2
Figure 3
Figure 4
Extended abstract of talk at workshop on Experimental
Psychology in Software Comprehension
Studies 97, University of Limerick, Ireland.
Thomas Green, 1997
Computer Based Learning Unit
University of Leeds
Leeds LS2 9JT, UK
greenery At ntlworld Dot com (sorry about anti-spam tricks)
http://homepage.ntlworld.com/greenery/
This talk describes some results from studies on the psychology of
program comprehension. The
particular focus I have chosen is on the relationship between the
notation and the cognitive
processes. (For present purposes, a programming language is a
computational paradigm expressed
in a notation.) The main thrust is that comprehension requires
extracting or inferring many kinds of
information, some of which is easier to get at than others, and that
one should avoid bold claims
about the comprehension process until the role of notation as an aid
or hindrance is better
understood.
An easy trap is to view program comprehension as a single process
that is the same in all
circumstances. Two common versions of that over-simplification are
the idea that one language or
notation is somehow 'natural', and the idea that one language or
notation is 'the best'. In the course
of this paper I shall expose a third version, the idea that program
comprehension is always
performed the same way, whatever the circumstances.
Naturalism
Adherents of particular languages frequently claim that
their language must be easy because it is
more natural, or because it works the way people think, etc.
Imperative programming, object-
oriented programming, logic programming, functional programming,
graphical programming, and
others are all 'natural' in someone's eyes. This is patently absurd:
they can't all be natural;
anyway, programming languages are not really like natural languages
(how would you pose a
rhetorical question, like this one, in a program?), and if they were
'natural' they wouldn't be so
hard to learn or comprehend. Not surprisingly, studies have repeatedly
demonstrated the falsity
of these claims (e.g. Détienne, 1990).
Superlativism
Nor is any one language uniformly best for all
purposes. Using miniature programming languages,
Green (1977) showed that jump-style programs were much harder to
understand than nest-style
when the task was to report the set of truth-values controlling
outcomes, but that the two styles
were indistinguishable when the task was to infer the outcomes from
the truth-values. Curtis et
al. (1989) and Green et al. (1991) showed similar results in the
domain of graphical programming,
respectively using the control-flow and the dataflow paradigms. That
may look as though nested
styles are better, but there is a much better interpretation, as
follows.
Match-Mismatch
Gilmore and Green (1984) showed that rule-based
paradigms are better than imperative
paradigms for the first task described above, but vice versa for the
second task (Figure 1). They
proposed the 'match-mismatch conjecture', according to which all
notations contain many types of
information and every notation highlights some kinds at the cost of
obscuring other kinds.
Extracting information about a program is correspondingly easy when
the information matches the
notation and hard when there is a mismatch. A good parallel is
swimming upstream or
downstream: sequential information is easy to determine from a Basic
program, because one is
swimming downstream, but hard to determine from, say, an event-driven
program, because one is
trying to swim in the opposite direction from the language. There is
a corollary to that conjecture:
adding tools to improve comprehension is only worthwhile for the
'upstream' types of information,
the ones that are hard to extract in the given language. Thus, a
tracing package that is supposed to
help Basic novices do the second task (infer outcomes from
truth-values) would probably be less
useful than one that helped them do the first task (infer
truth-values from outcomes).
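The two tasks can be made concrete with a minimal sketch in Python. It is not taken from any of the studies cited: the predicates (hard, juicy, tall), the outcome names and all function names are invented for illustration. The same decision logic is written once in nest-style and once as rules; the second task (truth-values to outcome) reads straight off the nested form, while the first task (outcome to truth-values) reads straight off the rules and needs an exhaustive search in the nested form.

```python
from itertools import product

# Hypothetical predicates, invented for this illustration.
CONDITIONS = ("hard", "juicy", "tall")


def nested(hard, juicy, tall):
    """Nest-style form: executing it answers 'which outcome?' directly."""
    if hard:
        if juicy:
            return "boil"
        return "roast"
    if tall:
        return "chop"
    return "fry"


# Rule-style form: each outcome is listed beside the truth-values controlling it.
RULES = [
    ({"hard": True, "juicy": True}, "boil"),
    ({"hard": True, "juicy": False}, "roast"),
    ({"hard": False, "tall": True}, "chop"),
    ({"hard": False, "tall": False}, "fry"),
]


def outcome_from_rules(values):
    """Second task on the rule form: scan the rules for one that matches."""
    for conds, outcome in RULES:
        if all(values[name] == wanted for name, wanted in conds.items()):
            return outcome
    raise ValueError("no rule matches")


def conditions_from_rules(outcome):
    """First task on the rule form: a direct lookup ('downstream')."""
    return [conds for conds, out in RULES if out == outcome]


def conditions_from_nested(outcome):
    """First task on the nested form: try every combination ('upstream')."""
    hits = []
    for combo in product([True, False], repeat=len(CONDITIONS)):
        if nested(*combo) == outcome:
            hits.append(dict(zip(CONDITIONS, combo)))
    return hits


if __name__ == "__main__":
    print(nested(True, False, True))
    print(conditions_from_rules("roast"))
    print(conditions_from_nested("roast"))
```

Note that the rule form returns only the conditions that matter for an outcome, whereas the search over the nested form returns every full assignment, including the irrelevant value of tall. That difference is a property of this sketch, but it is in the spirit of the conjecture: each notation makes one question cheap and the other expensive.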
Since those studies, more detailed analyses of information types
have been proposed. Pennington
(1987) proposed that programmers extract 5 types of information
(Figure 2), confirmed by an
experiment using a fairly long study-and-modification task with
experienced programmers. On the
basis of a pre-existing text-analysis model she interpreted her
results to indicate programming
understanding as a bottom-up process, starting with control flow and
culminating in forming the
'situation model', reflected in the function information. Her work
has attracted several follow-ups. Corritore and Wiedenbeck (1991)
reported a much shorter experiment on novice Pascal
programmers that reached similar conclusions (control flow most
accurate, function reached last).
Ramalingam and Wiedenbeck (1997) found an interestingly different
pattern for C++ object-oriented programming
that can be interpreted as support (Figure 3),
pace some problems with their
experimental method. Bergantz and Hassell (1991) collated utterances
by programmers while
studying Prolog programs and found that their subjects initially
focused on data flow, later moving
to consider the function. Good, Brna and Cox (1997), whose experimental
technique and careful reasoning
outshine the other follow-up studies, found that despite Prolog's
claims as a functional, logic-based language,
Prolog novices were much like the Corritore and
Wiedenbeck Pascal novices in
that most of their program summaries were procedural.
We seem to have the beginnings of broad agreement. Maybe we can
design tools for these 5 types to
aid the comprehension of novices by making it easier to extract each
of them.
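To show what the 5 types look like on a single program, here is a small sketch in Python. The program and the wording of the questions are assumptions made for illustration; they are not the materials of Pennington or of any follow-up study. Only the five category names (operations, control flow, data flow, state, function) come from the work discussed above.

```python
def total_overdue(fines, limit):
    """Hypothetical example program, not drawn from any cited study."""
    total = 0
    flagged = []
    for name, amount in fines:
        if amount > limit:
            flagged.append(name)
        total += amount
    return total, flagged


# One illustrative comprehension question per information type.
# The wording of each question is an assumption, for illustration only.
QUESTIONS = {
    "operations": "Is total set to 0 before the loop starts?",
    "control flow": "Is 'total += amount' run for every record, or only inside the if?",
    "data flow": "Does the value of limit affect the returned total?",
    "state": "When the loop ends, does flagged hold every name whose amount exceeded limit?",
    "function": "Does the program report the total of all fines and who exceeded the limit?",
}


if __name__ == "__main__":
    print(total_overdue([("ann", 5), ("bob", 20)], 10))
    for kind, question in QUESTIONS.items():
        print(f"{kind}: {question}")
```

In an imperative notation like this one, the control-flow question can be answered by reading the indentation, while the data-flow question requires tracing which variables feed which. That is the sense in which one type of information may simply be easier to get at than another.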
Although the '5 types' school has some good empirical results,
there are still some problems. The
textual analysis model underlying Pennington's work and its
successors is not the only approach.
Notably, Koenemann and Robertson (1991) have argued against the
'programs as texts' approach,
claiming that its view of the comprehension strategy, as a process of
reading through the whole
text, is unlikely to apply to large programs; they propose an
'as-needed' strategy in which text is
only consulted when needed. However, I myself do not believe that
there is a genuine opposition
between these two views. At different scales, both can be reasonable
models.
There are also some methodological problems. Some of the '5 types'
studies compared errors by
types, others used classification of program summaries. As Good et
al. observed, classifying
statements in summaries into just one of the 5 types is sometimes
very hard, and in some cases no
measure was reported of inter-judge consistency, lessening the value
of the study. Maybe these 5
types are really only different in the mind of the experimenter.
The '5 types' studies share a common model, in which comprehension
proceeds from procedural
understanding through data flow to function or situation model. This
'control-flow-first' model
postulates that the same cognitive processing takes place for all
types of programming language
and for all types of problem. I believe, however, that this
interpretation has been too strongly
shaped by the 'programs as text' view and by focusing on hypothesized
mental models. Instead, I
suggest that programmers comprehend programs using whatever
resources are available.
The research on the 5 types has entirely bypassed Gilmore and
Green's upstream/downstream
distinction. We still need to study the 5 types in conditions that
separate upstream and
downstream, by comparing different paradigms; since most of the
studies reported have used
Pascal, Fortran, or a similar language, control flow information has
been the downstream, easily
accessible type of information. Let us consider those studies that
have looked at different
paradigms.
There has been a movement towards a shared acceptance of the
'control-flow-first' model,
asserting that programmers in general, and novices in particular,
find it easiest to form procedural
models of programs. They therefore start by looking for control flow
information and slowly build
up a 'situation model'. I suggest that this is a third version of the
over-simplification trap I
described at the start of this paper, the idea that program
comprehension is a single process that
is the same in all circumstances.
A more plausible picture is that people use the easiest resources
available. Historically, the
development of languages and environments has made control flow
information much more
accessible than other types, so the 'control-flow-first' model is a
good approximation of most
existing environments. But the way forward in designing environments
and in understanding
comprehension, if my interpretation is correct, is to offer
programmers the chance to choose from a
richer array of resources. That means we need to improve access to
other types of information,
especially that which is 'upstream'. We can do that by changing
either the language or the
working environment, following Mulholland's lead.
More generally: the way forward is not to make strong, simple
claims about how cognitive
processes work. The way forward is to study the details
of how notations convey information.
Acknowledgements
Thanks to Judith Good for putting me right on details.
Remaining mistakes are mine, of course.
Bibliography
Bergantz, D. and Hassell, J. (1991) Information relationships in Prolog programs: how do programmers comprehend functionality? Int. J. Man-Machine Studies, 35, 313-328.
Corritore, C. L. and Wiedenbeck, S. (1991) What do novices learn during program comprehension? Int. J. Human-Computer Interaction, 3(2), 199-222.
Curtis, B., Sheppard, S., Kruesi-Bailey, E., Bailey, J. and Boehm-Davis, D. (1989) Experimental evaluation of software documentation formats. J. Systems and Software, 9(2), 167-207.
Détienne, F. (1990) Difficulties in designing with an object-oriented language: an empirical study. In D. Diaper, D. Gilmore, G. Cockton and B. Shackel (Eds.) Human-Computer Interaction - INTERACT 90. Elsevier.
Gilmore, D. J. and Green, T. R. G. (1984) Comprehension and recall of miniature programs. Int. J. Man-Machine Studies, 21, 31-48.
Good, J., Brna, P. and Cox, R. (1997) Program comprehension and novices: does programming language make a difference? Technical Report 97/10, Computer Based Learning Unit, University of Leeds.
Green, T. R. G. (1977) Conditional program statements and their comprehensibility to professional programmers. J. Occupational Psychology, 50, 93-109.
Green, T. R. G. and Navarro, R. (1995) Programming plans, imagery, and visual programming. In Nordby, K., Helmersen, P. H., Gilmore, D. J., and Arnesen, S. (Eds.) INTERACT-95. London: Chapman and Hall (pp. 139-144).
Green, T. R. G., Petre, M. and Bellamy, R. K. E. (1991) Comprehensibility of visual and textual programs: a test of 'Superlativism' against the 'match-mismatch' conjecture. In J. Koenemann-Belliveau, T. Moher, and S. P. Robertson (Eds.), Empirical Studies of Programmers: Fourth Workshop. Norwood, NJ: Ablex. Pp. 121-146.
Koenemann, J. and Robertson, S. P. (1991) Expert problem-solving strategies for program comprehension. In S. P. Robertson, G. M. Olson and J. S. Olson (Eds.) Reaching Through Technology, Proc. ACM Conf. on Human Factors in Computing Systems CHI '91. Addison-Wesley.
Mulholland, P. (1997) Using a fine-grained comparative evaluation technique to understand and design software visualization tools. Empirical Studies of Programmers, 1997 (in press).
Pennington, N. (1987) Stimulus structures and mental representations in expert comprehension of computer programs. Cognitive Psychology, 19, 295-341.
Ramalingam, V. and Wiedenbeck, S. (1997) An empirical study of novice program comprehension in the imperative and object-oriented styles. Empirical Studies of Programmers, 1997 (in press).
Saariluoma, P. and Sajaniemi, J. (1989) Visual information chunking in spreadsheet calculation. Int. J. Man-Machine Studies, 30, 475-488.
Figure 1. Schematic view of the match-mismatch conjecture
An idealised form of the claim made by Gilmore and Green (1984) is:
Extracting information is easy when the program structure matches the question asked.
Extracting information is hard in the other two, mismatched, cases.
Figure 2. Pennington (1987)
Subjects: 40 professional programmers (20 Cobol, 20 Fortran)
Materials: 200-line program (Fortran or Cobol), which subjects could
scroll in a one- or two-pane window. Half talked aloud,
half did not.
task:
Results:
Explanation?
Text models (after van Dijk and Kintsch, 1983) - two
cross-referenced representations:
the textbase is a hierarchy of representations (surface,
micro, macro)
the situation model is 'what it's about'
Pennington suggests that the functional relations are 'more
comprehensible in the terms of the real
world objects';
so the textbase would be dominated by procedural relations, affected
by program structure
the functional relations would be derived from the situation model
Figure 3. Ramalingam and Wiedenbeck (1997)
Subjects: 75 students in an introductory programming course learning C, then C++
6 program segments:
3 using the C subset
3 using C++ features
Task:
5 comprehension questions per program:
subjects studied the program on paper for 2 minutes, then turned
the page and answered questions
from memory
The claim is that the error-rate on types of question indicated
the form of the mental
representation
Results
for non-OOP:
errors on operations and control < errors on dataflow, state or function
for OOP:
errors on dataflow and function < errors on control, state or operations
(Same pattern for best and worst quartiles)
Explanation?
(Theirs): The form of the mental representation for OOP is in
terms of real-world objects, so it is
easier to build the situation model
(Mine): The structure of C++ makes it easier to extract data flow
and function information than in
C
[Note that both explanations could be true]
Figure 4. Mulholland (1997)
Subjects: 64 Open University psychology students, working in pairs; their utterances were later analysed
Task: to find the difference between a Prolog program on paper
and one in the computer, being
allowed to see only the trace output of the latter
4 modified versions of original program (control flow change, data
flow change, relation name
change, and a change to an atom)
4 kinds of tracer (no space to describe details here)
Results
Although control flow information was more frequent than data flow
in all conditions, the ratio
depended on the tracer. Data flow was discussed most (both absolutely
and relatively) with the
tracer called Plater.
Strategy of reviewing data-flow was commonest with Plater
Frequency of utterances about the tracer itself (indicating
problems in using it) differed between
tracers. Plater was easiest to understand.
Mean number of utterances by content, for each of 4 tracers
Explanation?
When data flow information was easily accessible from the tracer
(e.g. from Plater), it was a
preferred resource. When it was hard to extract data flow, the
subjects had to fall back on control
flow information.