
Cognitive Approaches to Software Comprehension:

Results, Gaps and Limitations


Extended abstract of talk at workshop on Experimental Psychology in Software Comprehension Studies 97, University of Limerick, Ireland.

Thomas Green, 1997

Computer Based Learning Unit
University of Leeds
Leeds LS2 9JT, UK
greenery At ntlworld Dot com (sorry about anti-spam tricks)


This talk describes some results from studies on the psychology of program comprehension. The
particular focus I have chosen is on the relationship between the notation and the cognitive
processes. (For present purposes, a programming language is a computational paradigm expressed
in a notation.) The main thrust is that comprehension requires extracting or inferring many kinds of
information, some of which is easier to get at than others, and that one should avoid bold claims
about the comprehension process until the role of notation as an aid or hindrance is better
understood.

Notations Differ in Intricate Ways

An easy trap is to view program comprehension as a single process that is the same in all
circumstances. Two common versions of that over-simplification are the idea that one language or
notation is somehow 'natural', and the idea that one language or notation is 'the best'. In the course
of this paper I shall expose a third version, the idea that program comprehension is always
performed the same way, whatever the circumstances.

Adherents of particular languages frequently claim that their language must be easy because it is
more natural, or because it works the way people think, etc. Imperative programming, object-
oriented programming, logic programming, functional programming, graphical programming, and
others are all 'natural' in someone's eyes. This is patently absurd: they can't all be natural;
anyway, programming languages are not really like natural languages (how would you pose a
rhetorical question, like this one, in a program?), and if they were 'natural' they wouldn't be so
hard to learn or comprehend. Not surprisingly, studies have repeatedly demonstrated the falsity
of these claims (e.g. Détienne, 1990).

Nor is any one language uniformly best for all purposes. Using miniature programming languages,
Green (1977) showed that jump-style programs were much harder to understand than nest-style
when the task was to report the set of truth-values controlling outcomes, but that the two styles
were indistinguishable when the task was to infer the outcomes from the truth-values. Curtis et
al. (1989) and Green et al. (1991) showed similar results in the domain of graphical programming,
respectively using the control-flow and the dataflow paradigms. That may look as though nested
styles are better, but there is a much better interpretation, as follows.

Gilmore and Green (1984) showed that rule-based paradigms are better than imperative
paradigms for the first task described above, but vice versa for the second task (Figure 1). They
proposed the 'match-mismatch conjecture', according to which all notations contain many types of
information and every notation highlights some kinds at the cost of obscuring other kinds.
Extracting information about a program is correspondingly easy when the information matches the
notation and hard when there is a mismatch. A good parallel is swimming upstream or
downstream: sequential information is easy to determine from a Basic program, because one is
swimming downstream, but hard to determine from, say, an event-driven program, because one is
trying to swim in the opposite direction from the language. There is a corollary to that conjecture:
adding tools to improve comprehension is only worthwhile for the 'upstream' types of information,
the ones that are hard to extract in the given language. Thus, a tracing package that is supposed to
help Basic novices do the second task (infer outcomes from truth-values) would probably be less
useful than one that helped them do the first task (infer truth-values from outcomes).
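The jump/nest contrast and the two tasks above can be made concrete with a sketch. The original miniature languages are not reproduced here, so this is a hedged reconstruction in Python with invented condition and outcome names; Python has no goto, so the jump-style version emulates labelled jumps with a label variable.

```python
# Nest-style: the conditions governing each outcome are visible
# directly from the indentation structure.
def nest_style(high, wide):
    if high:
        if wide:
            return "crate"
        else:
            return "tube"
    else:
        return "box"

# Jump-style: the same logic written as labelled jumps,
# emulated here with a label variable (Python lacks goto).
def jump_style(high, wide):
    label = "L1"
    while True:
        if label == "L1":
            label = "L2" if high else "L3"
        elif label == "L2":
            if wide:
                return "crate"
            label = "L4"
        elif label == "L3":
            return "box"
        elif label == "L4":
            return "tube"
```

Both versions compute the same outcomes, so inferring the outcome from given truth-values reads 'downstream' in either; but reporting which truth-values yield "tube" forces the reader of the jump-style version to trace a chain of labels upstream, which is exactly where the difference appeared.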

Types of Information and Comprehension Processes

Since those studies, more detailed analyses of information types have been proposed. Pennington
(1987) proposed that programmers extract 5 types of information (Figure 2), confirmed by an
experiment using a fairly long study-and-modification task with experienced programmers. On the
basis of a pre-existing text-analysis model she interpreted her results to indicate program
understanding as a bottom-up process, starting with control flow and culminating in forming the
'situation model', reflected in the function information. Her work has attracted several follow-
ups. Corritore and Wiedenbeck (1991) reported a much shorter experiment on novice Pascal
programmers that reached similar conclusions (control flow most accurate, function reached last).
Ramalingam and Wiedenbeck (1997) found an interestingly different pattern for C++ object-
oriented programming that can be interpreted as support (Figure 3), pace some problems with their
experimental method. Bergantz and Hassell (1991) collated utterances by programmers while
studying Prolog programs and found that their subjects initially focused on data flow, later moving
to consider the function. Good, Brna and Cox (1997), whose experimental technique and careful
reasoning outshine the other follow-up studies, found that despite Prolog's claims as a declarative,
logic-based language, Prolog novices were much like the Corritore and Wiedenbeck Pascal novices in
that most of their program summaries were procedural.

We seem to have the beginnings of broad agreement. Maybe we can aid the comprehension of novices
by designing tools that make each of these 5 types of information easier to extract.
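As a concrete illustration of what the 5 types amount to at the level of a single fragment, here is an invented example (not Pennington's materials) with one question of each type attached as a comment:

```python
# A toy fragment, with one comprehension question per information type.
def total_above(values, threshold):
    total = 0                   # state: what is 'total' before the loop?
    for v in values:            # control flow: how often does the body run?
        if v > threshold:       # operations: what does this comparison do?
            total = total + v   # data flow: where does the value of 'v' flow?
    return total                # function: what is this code for, overall?
```

The point of the experiments summarised here is that readers do not answer these five kinds of question with equal ease, and that the notation influences which come cheaply.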

Some Complications

Although the '5 types' school has some good empirical results, there are still some problems. The
textual analysis model underlying Pennington's work and its successors is not the only approach.
Notably, Koenemann and Robertson (1991) have argued against the 'programs as texts' approach,
claiming that its view of the comprehension strategy, as a process of reading through the whole
text, is unlikely to apply to large programs; they propose an 'as-needed' strategy in which text is
only consulted when needed. However, I myself do not believe that there is a genuine opposition
between these two views. At different scales, both can be reasonable models.

There are also some methodological problems. Some of the '5 types' studies compared errors by
types, others used classification of program summaries. As Good et al. observed, classifying
statements in summaries into just one of the 5 types is sometimes very hard, and in some cases no
measure was reported of inter-judge consistency, lessening the value of the study. Maybe these 5
types are really only different in the mind of the experimenter.

Resources: A Different Interpretation

The '5 types' studies share a common model, in which comprehension proceeds from procedural
understanding through data flow to function or situation model. This 'control-flow-first' model
postulates that the same cognitive processing takes place for all types of programming language
and for all types of problem. I believe, however, that this interpretation has been too strongly
shaped by the 'programs as text' view and by focusing on hypothesized mental models. Instead, I
suggest that programmers comprehend programs using whatever resources are available.

The research on the 5 types has entirely bypassed Gilmore and Green's upstream/downstream
distinction. We still need to study the 5 types in conditions that separate upstream and
downstream, by comparing different paradigms; since most of the studies reported have used
Pascal, Fortran, or a similar language, control flow information has been the downstream, easily
accessible type of information. Let us consider those studies that have looked at different
paradigms, representations, and environments.

  1. By concentrating on textual representations researchers may be overlooking cognitive resources
    available in richer representations. Mental representations of programs seem to use spatial
    imagery where possible, e.g. in spreadsheets or in a graphical programming language (Saariluoma
    and Sajaniemi, 1989; Green and Navarro, 1995), and the former study demonstrated that in some
    circumstances the spatial representation is preferred for problem solving.
  2. The results obtained by Ramalingam and Wiedenbeck are most easily interpreted as showing
    that data flow information is more readily available in C++ than in C. (Their interpretation is
    quite different: they claim that OOP encourages a mental model in which objects play a large part
    and that C++ is therefore more natural. I think that it will be quickly conceded that any claim
    that relies on C++ being 'natural' needs to be viewed cautiously.)
  3. There has so far been no mention of the working environment. It turns out, quite unexpectedly to
    me at least, that different environments can greatly change the pattern of information usage.
    Mulholland (1997) compared what pairs of students talked about while investigating Prolog
    programs using each of 4 kinds of debugger. He showed that although control-flow information
    was the commonest topic, as expected from other results described above, the ratio between control-
    flow and data-flow depended on the type of tracer in use (Figure 4). Working with Plater, a tracer
    that was deliberately designed to reveal data flow, the subjects spent more time using a strategy of
    following data rather than following control.
  4. Finally, although two other Prolog studies (Bergantz and Hassell, Good et al.) supported the
    control-flow-first model, the support was only moderate, and I think that the renowned opacity of
    Prolog, and the results of Mulholland just described, suggest that we can place little weight on
    these results until different environments have been compared.

Conclusion: People do what's easiest

There has been a movement towards a shared acceptance of the 'control-flow-first' model,
asserting that programmers in general, and novices in particular, find it easiest to form procedural
models of programs. They therefore start by looking for control flow information and slowly build
up a 'situation model'. I suggest that this is a third version of the over-simplification trap I
described at the start of this paper, the idea that program comprehension is a single process that
is the same in all circumstances.

A more plausible picture is that people use the easiest resources available. Historically, the
development of languages and environments has made control flow information much more
accessible than other types, so the 'control-flow-first' model is a good approximation of most
existing environments. But the way forward in designing environments and in understanding
comprehension, if my interpretation is correct, is to offer programmers the chance to choose from a
richer array of resources. That means we need to improve access to other types of information,
especially that which is 'upstream'. We can do that by changing either the language or the
working environment, following Mulholland's lead.

More generally: the way forward is not to make strong, simple claims about how cognitive
processes work. The way forward is to study the details  of how notations convey information.

Thanks to Judith Good for putting me right on details. Remaining mistakes are mine, of course.


Bergantz, D. and Hassell, J. (1991) Information relationships in Prolog programs: how do
programmers comprehend functionality? Int. J. Man-Machine Studies, 35, 313-328.

Corritore, C. L. and Wiedenbeck, S. (1991) What do novices learn during program comprehension?
Int. J. Human-Computer Interaction, 3(2), 199-222.

Curtis, B., Sheppard, S., Kruesi-Bailey, E., Bailey, J. and Boehm-Davis, D. (1989) Experimental
evaluation of software documentation formats. J. Systems and Software , 9 (2), 167-207.

Détienne, F. (1990) Difficulties in designing with an object-oriented language: an empirical study.
In D. Diaper, D. Gilmore, G. Cockton and B. Shackel (Eds.) Human-Computer Interaction -
INTERACT '90. Elsevier.

Gilmore, D. J. and Green, T. R. G. (1984) Comprehension and recall of miniature programs. Int. J.
Man-Machine Studies, 21, 31-48.

Good, J., Brna, P. and Cox, R. (1997). Program comprehension and novices: does programming
language make a difference? Technical Report 97 /10, Computer Based Learning Unit,
University of Leeds.

Green, T. R. G. (1977) Conditional program statements and their comprehensibility to professional
programmers. J. Occupational Psychology , 50, 93-109.

Green, T. R. G. and Navarro, R. (1995) Programming plans, imagery, and visual programming. In
Nordby, K., Helmersen, P. H., Gilmore, D. J., and Arnesen, S. (Eds.) INTERACT-95.  London:
Chapman and Hall (pp. 139-144).

Green, T. R. G., Petre, M. and Bellamy, R. K. E. (1991) Comprehensibility of visual and textual
programs: a test of 'Superlativism' against the 'match-mismatch' conjecture. In J. Koenemann-
Belliveau, T. Moher, and S. P. Robertson (Eds.), Empirical Studies of Programmers: Fourth
Workshop. Norwood, NJ: Ablex. Pp. 121-146.

Koenemann, J. and Robertson, S. P. (1991) Expert problem-solving strategies for program
comprehension. In S. P. Robertson, G. M. Olson and J. S. Olson (Eds.) Reaching Through
Technology: Proc. ACM Conf. on Human Factors in Computing Systems CHI '91. Addison-Wesley.

Mulholland, P. (1997) Using a fine-grained comparative evaluation technique to understand and
design software visualization tools. Empirical Studies of Programmers, 1997 (in press).

Pennington, N. (1987) Stimulus structures and mental representations in expert comprehension of
computer programs. Cognitive Psychology, 19, 295-341.

Ramalingam, V. and Wiedenbeck, S. (1997) An empirical study of novice program comprehension in
the imperative and object-oriented styles. Empirical Studies of Programmers, 1997 (in press).

Saariluoma, P. and Sajaniemi, J. (1989) Visual information chunking in spreadsheet calculation.
Int. J. Man-Machine Studies, 30, 475-488.

Figure 1


An idealised form of the claim made by Gilmore and Green (1984) is:

Extracting information is easy when the program structure matches the question asked, and hard
in the two mismatching cases:

                         truth-values from outcomes    outcomes from truth-values
  imperative notation    hard                          easy
  rule-based notation    easy                          hard

Schematic view of the match-mismatch conjecture

Figure 2

Pennington, 1987, Study 2 - a summary

subjects: 40 professional programmers: 20 Cobol, 20 Fortran

materials: 200-line program (Fortran or Cobol). Could scroll in a one- or two-pane window. Half talked aloud,
half did not.




Text models (after van Dijk and Kintsch, 1983) - two cross-referenced representations:
the textbase is a hierarchy of representations (surface, micro, macro)
the situation model is 'what it's about'

Pennington suggests that the functional relations are 'more comprehensible in the terms of the real
world objects';
so the textbase would be dominated by procedural relations, affected by program structure
the functional relations would be derived from the situation model

Figure 3

Ramalingam and Wiedenbeck, in press

Subjects: 75 students in an introductory programming course learning C, then C++

6 program segments:
3 using the C subset
3 using C++ features


5 comprehension questions per program, one per type of information (operations, control flow,
data flow, state, function)

subjects studied the program on paper for 2 minutes, then turned the page and answered questions
from memory

The claim is that the error-rate on types of question indicated the form of the mental
representation:


for non-OOP:
errors on operations and control < errors on dataflow, state or function

for OOP:
errors on dataflow and function < errors on control, state or operations

(Same pattern for best and worst quartiles)


(Theirs): The form of the mental representation for OOP is in terms of real-world objects, so it is
easier to build the situation model

(Mine): The structure of C++ makes it easier to extract data flow and function information than in C.

[Note that both explanations could be true]

Figure 4

Mulholland, in press

64 Open University psychology students worked in pairs; utterances were later analysed

task was to find the difference between a Prolog program on paper and one in the computer, only
being allowed to see the trace output of the latter

4 modified versions of original program (control flow change, data flow change, relation name
change, and a change to an atom)

4 kinds of tracer (no space to describe details here)


Although control flow information was more frequent than data flow in all conditions, the ratio
depended on the tracer. Data flow was discussed most (both absolutely and relatively) with the
tracer called Plater.

Strategy of reviewing data-flow was commonest with Plater

Frequency of utterances about the tracer itself (indicating problems in using it) differed between
tracers. Plater was easiest to understand.

Mean number of utterances by content, for each of 4 tracers


When data flow information was easily accessible from the tracer (e.g. from Plater), it was a
preferred resource. When it was hard to extract data flow, the subjects had to fall back on control
flow information.