Cognitive Approaches to Software Comprehension: Results, Gaps and Limitations

Outline:
Notations Differ in Intricate Ways
Types of Information and Comprehension Processes
Resources: A Different Interpretation
Conclusion: People do what's easiest
Extended abstract of talk at workshop on Experimental
Psychology in Software Comprehension
Studies 97, University of Limerick, Ireland.
Thomas Green, 1997
Computer Based Learning Unit
University of Leeds
Leeds LS2 9JT, UK
greenery At ntlworld Dot com (sorry about anti-spam tricks)
This talk describes some results from studies on the psychology of
program comprehension. The
particular focus I have chosen is on the relationship between the notation and the cognitive
processes. (For present purposes, a programming language is a computational paradigm expressed
in a notation.) The main thrust is that comprehension requires extracting or inferring many kinds of
information, some of which is easier to get at than others, and that one should avoid bold claims
about the comprehension process until the role of notation as an aid or hindrance is better understood.
An easy trap is to view program comprehension as a single process
that is the same in all
circumstances. Two common versions of that over-simplification are the idea that one language or
notation is somehow 'natural', and the idea that one language or notation is 'the best'. In the course
of this paper I shall expose a third version, the idea that program comprehension is always
performed the same way, whatever the circumstances.
Adherents of particular languages frequently claim that their language must be easy because it is
more natural, or because it works the way people think, etc. Imperative programming, object-
oriented programming, logic programming, functional programming, graphical programming, and
others are all 'natural' in someone's eyes. This is patently absurd: they can't all be natural;
anyway, programming languages are not really like natural languages (how would you pose a
rhetorical question, like this one, in a program?), and if they were 'natural' they wouldn't be so
hard to learn or comprehend. Not surprisingly, studies have repeatedly demonstrated the falsity
of these claims (e.g. Détienne, 1990).
Nor is any one language uniformly best for all purposes. Using miniature programming languages,
Green (1977) showed that jump-style programs were much harder to understand than nest-style
when the task was to report the set of truth-values controlling outcomes, but that the two styles
were indistinguishable when the task was to infer the outcomes from the truth-values. Curtis et
al. (1989) and Green et al. (1991) showed similar results in the domain of graphical programming,
respectively using the control-flow and the dataflow paradigms. That may look as though nested
styles are better, but there is a much better interpretation, as follows.
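The jump/nest contrast can be made concrete with a small sketch (the condition names and outcomes below are my invention, not Green's original materials), written both ways so the two tasks can be compared:

```cpp
#include <string>

// Nest-style: the truth-values controlling each outcome can be read
// directly off the indentation structure (the 'backward' task is easy).
std::string classify_nested(bool wet, bool cold) {
    if (wet) {
        if (cold) return "sleet";
        return "rain";
    }
    if (cold) return "frost";
    return "dry";
}

// Jump-style (the GOTO idiom of 1970s code): following execution forward
// is no harder, but recovering which truth-values lead to a given
// outcome means tracing backwards through the jumps.
std::string classify_jump(bool wet, bool cold) {
    if (!wet) goto not_wet;
    if (!cold) goto rain;
    return "sleet";
rain:
    return "rain";
not_wet:
    if (!cold) goto dry;
    return "frost";
dry:
    return "dry";
}
```

Both versions compute the same outcomes; on Green's account the difference only appears when the reader must work 'upstream', from an outcome back to its controlling conditions.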
Gilmore and Green (1984) showed that rule-based paradigms are better than imperative
paradigms for the first task described above, but vice versa for the second task (Figure 1). They
proposed the 'match-mismatch conjecture', according to which all notations contain many types of
information and every notation highlights some kinds at the cost of obscuring other kinds.
Extracting information about a program is correspondingly easy when the information matches the
notation and hard when there is a mismatch. A good parallel is swimming upstream or
downstream: sequential information is easy to determine from a Basic program, because one is
swimming downstream, but hard to determine from say an event-driven program, because one is
trying to swim in the opposite direction from the language. There is a corollary to that conjecture:
adding tools to improve comprehension is only worthwhile for the 'upstream' types of information,
the ones that are hard to extract in the given language. Thus, a tracing package that is supposed to
help Basic novices do the second task (infer outcomes from truth-values) would probably be less
useful than one that helped them do the first task (infer truth-values from outcomes).
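As a hedged illustration of the match-mismatch idea (the rule table and names below are mine, not Gilmore and Green's actual materials): the same decision can be written sequentially, which favours the 'what outcome follows from these inputs?' question, or as a rule table, which favours the 'which circumstances produce this outcome?' question:

```cpp
#include <string>
#include <vector>

// Sequential (imperative) form: reading 'downstream' -- tracing the
// statements in order -- answers the forward question easily.
std::string band_sequential(int qty, bool member) {
    std::string band = (qty >= 10) ? "bulk" : "full";
    if (member) band += "+member";
    return band;
}

// Rule-based form: each outcome is listed beside the circumstances
// that produce it, so the backward question reads straight off.
struct Rule { int min_qty; bool member; const char* band; };

std::string band_rules(int qty, bool member) {
    static const std::vector<Rule> rules = {
        {10, true,  "bulk+member"},
        {10, false, "bulk"},
        {0,  true,  "full+member"},
        {0,  false, "full"},
    };
    for (const Rule& r : rules)
        if (qty >= r.min_qty && member == r.member) return r.band;
    return "full";  // unreachable: the last two rules cover all cases
}
```

Neither form is 'best': each makes one kind of question cheap and the other expensive, which is exactly the match-mismatch point.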
Since those studies, more detailed analyses of information types
have been proposed. Pennington
(1987) proposed that programmers extract 5 types of information (Figure 2), confirmed by an
experiment using a fairly long study-and-modification task with experienced programmers. On the
basis of a pre-existing text-analysis model she interpreted her results to indicate programming
understanding as a bottom-up process, starting with control flow and culminating in forming the
'situation model', reflected in the function information. Her work has attracted several follow-ups. Corritore and Wiedenbeck (1991) reported a much shorter experiment on novice Pascal programmers that reached similar conclusions (control flow most accurate, function reached last). Ramalingam and Wiedenbeck (1997) found an interestingly different pattern for C++ object-oriented programming that can be interpreted as support (Figure 3), despite some problems with their experimental method. Bergantz and Hassell (1991) collated utterances by programmers while
studying Prolog programs and found that their subjects initially focused on data flow, later moving
to consider the function. Good, Brna and Cox (1997), whose experimental technique and careful reasoning outshine the other follow-up studies, found that despite Prolog's claims as a declarative, logic-based language, Prolog novices were much like Corritore and Wiedenbeck's Pascal novices in that most of their program summaries were procedural.
We seem to have the beginnings of broad agreement. Maybe we can design tools around these 5 types, aiding novices' comprehension by making each type easier to extract.
Although the '5 types' school has some good empirical results,
there are still some problems. The
textual analysis model underlying Pennington's work and its successors is not the only approach.
Notably, Koenemann and Robertson (1991) have argued against the 'programs as texts' approach,
claiming that its view of the comprehension strategy, as a process of reading through the whole
text, is unlikely to apply to large programs; they propose an 'as-needed' strategy in which text is
only consulted when needed. However, I myself do not believe that there is a genuine opposition
between these two views. At different scales, both can be reasonable models.
There are also some methodological problems. Some of the '5 types'
studies compared errors by
types, others used classification of program summaries. As Good et al. observed, classifying
statements in summaries into just one of the 5 types is sometimes very hard, and in some cases no
measure was reported of inter-judge consistency, lessening the value of the study. Maybe these 5
types are really only different in the mind of the experimenter.
The '5 types' studies share a common model, in which comprehension
proceeds from procedural
understanding through data flow to function or situation model. This 'control-flow-first' model
postulates that the same cognitive processing takes place for all types of programming language
and for all types of problem. I believe, however, that this interpretation has been too strongly
shaped by the 'programs as text' view and by focusing on hypothesized mental models. Instead, I
suggest that programmers comprehend programs using whatever resources are available.
The research on the 5 types has entirely bypassed Gilmore and Green's match-mismatch distinction. We still need to study the 5 types in conditions that separate upstream and downstream, by comparing different paradigms; since most of the studies reported have used Pascal, Fortran, or a similar language, control flow information has been the downstream, easily accessible type of information. Let us consider those studies that have looked at different paradigms.
There has been a movement towards shared acceptance of the view that programmers in general, and novices in particular, find it easiest to form procedural
models of programs. They therefore start by looking for control flow information and slowly build
up a 'situation model'. I suggest that this is a third version of the over-simplification trap I
described at the start of this paper, the idea that program comprehension is a single process that
is the same in all circumstances.
A more plausible picture is that people use the easiest resources
available. Historically, the
development of languages and environments has made control flow information much more
accessible than other types, so the 'control-flow-first' model is a good approximation of most
existing environments. But the way forward in designing environments and in understanding
comprehension, if my interpretation is correct, is to offer programmers the chance to choose from a
richer array of resources. That means we need to improve access to other types of information,
especially that which is 'upstream'. We can do that by changing either the language or the
working environment, following Mulholland's lead.
More generally: the way forward is not to make strong, simple
claims about how cognitive
processes work. The way forward is to study the details of how notations convey information.
Thanks to Judith Good for putting me right on details. Remaining mistakes are mine, of course.
Bergantz, D. and Hassell, J. (1991) Information relationships in Prolog programs: how do programmers comprehend functionality? Int. J. Man-Machine Studies, 35, 313-328.
Corritore, C. L. and Wiedenbeck, S. (1991) What do novices learn during program comprehension? Int. J. Human-Computer Interaction, 3(2), 199-222.
Curtis, B., Sheppard, S., Kruesi-Bailey, E., Bailey, J. and Boehm-Davis, D. (1989) Experimental evaluation of software documentation formats. J. Systems and Software, 9(2), 167-207.
Détienne, F. (1990) Difficulties in designing with an object-oriented language: an empirical study. In D. Diaper, D. Gilmore, G. Cockton and B. Shackel (Eds.) Human-Computer Interaction - INTERACT 90. Elsevier.
Gilmore, D. J. and Green, T. R. G. (1984) Comprehension and recall of miniature programs. Int. J. Man-Machine Studies, 21, 31-48.
Good, J., Brna, P. and Cox, R. (1997) Program comprehension and novices: does programming language make a difference? Technical Report 97/10, Computer Based Learning Unit, University of Leeds.
Green, T. R. G. (1977) Conditional program statements and their comprehensibility to professional programmers. J. Occupational Psychology, 50, 93-109.
Green, T. R. G. and Navarro, R. (1995) Programming plans, imagery, and visual programming. In Nordby, K., Helmersen, P. H., Gilmore, D. J., and Arnesen, S. (Eds.) INTERACT-95. London: Chapman and Hall (pp. 139-144).
Green, T. R. G., Petre, M. and Bellamy, R. K. E. (1991) Comprehensibility of visual and textual programs: a test of 'Superlativism' against the 'match-mismatch' conjecture. In J. Koenemann-Belliveau, T. Moher, and S. P. Robertson (Eds.), Empirical Studies of Programmers: Fourth Workshop. Norwood, NJ: Ablex. Pp. 121-146.
Koenemann, J. and Robertson, S. P. (1991) Expert problem-solving strategies for program comprehension. In S. P. Robertson, G. M. Olson and J. S. Olson (Eds.) Reaching Through Technology, Proc. ACM Conf. on Human Factors in Computing Systems CHI '91. Addison-Wesley.
Mulholland, P. (1997) Using a fine-grained comparative evaluation technique to understand and design software visualization tools. Empirical Studies of Programmers, 1997 (in press).
Pennington, N. (1987) Stimulus structures and mental representations in expert comprehension of computer programs. Cognitive Psychology, 19, 295-341.
Ramalingam, V. and Wiedenbeck, S. (1997) An empirical study of novice program comprehension in the imperative and object-oriented styles. Empirical Studies of Programmers, 1997 (in press).
Saariluoma, P. and Sajaniemi, J. (1989) Visual information chunking in spreadsheet calculation. Int. J. Man-Machine Studies, 30, 475-488.
Figure 1. Schematic view of the match-mismatch conjecture. An idealised form of the claim made by Gilmore and Green (1984): extracting information is easy when the program structure matches the question asked, and hard in the two mismatching cases.
Pennington (1987):
Subjects: 40 professional programmers (20 Cobol, 20 Fortran).
Materials: a 200-line program (Fortran or Cobol); subjects could scroll in a one- or two-pane window; half talked aloud.
Text models (after van Dijk and Kintsch, 1983) distinguish two levels:
the textbase is a hierarchy of representations (surface, micro, macro);
the situation model is 'what it's about'.
Pennington suggests that the functional relations are 'more comprehensible in the terms of the real world objects'; so the textbase would be dominated by procedural relations, affected by program structure, and the functional relations would be derived from the situation model.
Ramalingam and Wiedenbeck (1997):
Subjects: 75 students in an introductory programming course learning C, then C++.
Materials: 6 program segments (3 using the C subset, 3 using C++ features), with 5 comprehension questions per program; subjects studied each program on paper for 2 minutes, then turned the page and answered the questions.
The claim is that the error rate on each type of question indicated the form of the mental representation:
for the C programs, errors on operations and control < errors on data flow, state or function;
for the C++ programs, errors on data flow and function < errors on control, state or operations.
(Same pattern for best and worst quartiles.)
Their explanation: the form of the mental representation for OOP is in terms of real-world objects, so it is easier to build the situation model.
My explanation: the structure of C++ makes it easier to extract data flow and function information than in C.
[Note that both explanations could be true]
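My reading of that result can be sketched as follows (the account example is invented for illustration; it is not taken from Ramalingam and Wiedenbeck's materials). In the C-subset style, the data flow through a shared variable is dispersed across free functions, whereas the C++ class gathers the data and its operations into one named unit, making function and data-flow information easier to extract:

```cpp
// C-subset style: to recover the program's function, the reader must
// find every statement that touches the shared variable `balance_c`.
double balance_c = 0.0;
void deposit_c(double amount)  { balance_c += amount; }
void withdraw_c(double amount) { balance_c -= amount; }

// C++ style: the class boundary names the role ('an account') and
// localises the data flow, so the situation model is easier to build.
class Account {
public:
    void deposit(double amount)  { balance_ += amount; }
    void withdraw(double amount) { balance_ -= amount; }
    double balance() const { return balance_; }
private:
    double balance_ = 0.0;
};
```

On this reading, the C++ advantage on data-flow and function questions reflects notational accessibility rather than a different mental representation, though both explanations could hold.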
Mulholland (1997):
Subjects: 64 Open University psychology students, working in pairs; their utterances were later analysed.
Task: to find the difference between a Prolog program on paper and one in the computer, while only being allowed to see the trace output of the latter.
Materials: 4 modified versions of the original program (a control flow change, a data flow change, a relation name change, and a change to an atom), and 4 kinds of tracer (no space to describe details here).
Results: although control flow information was more frequent than data flow in all conditions, the ratio depended on the tracer. Data flow was discussed most (both absolutely and relatively) with the tracer called Plater. The strategy of reviewing data flow was commonest with Plater. The frequency of utterances about the tracer itself (indicating problems in using it) differed between tracers; Plater was easiest to understand.
[Figure: mean number of utterances by content, for each of the 4 tracers.]
When data flow information was easily accessible from the tracer (e.g. from Plater), it was a preferred resource. When it was hard to extract data flow, the subjects had to fall back on control flow.