Copyright © 2004 FactOne.
All rights reserved.
A Summary of Eye-movement Methodologies

Compiled by Sheree Josephson, Ph.D.

This short article summarizes some of the research methodologies that can be utilized to analyze and interpret eye-movement data. The summary draws heavily from the seminal article by Joseph H. Goldberg and Xerxes P. Kotval titled “Computer interface evaluation using eye movements: methods and constructs” published in a 1999 edition of the International Journal of Industrial Ergonomics.

Goldberg and Kotval make a convincing argument for the use of eye-movement data in interface evaluation and usability testing. They say: “Interface evaluation and usability testing are expensive, time-intensive exercises, often done with poorly documented standards and objectives. They are frequently qualitative, with poor reliability and sensitivity. Provision of an improved tool for rapid and effective evaluation of graphical user interfaces was the motivating goal underlying the present work assessing eye movements as an indicator of interface usability (p. 632).”

The following summary takes a look at eye-movement data as (1) measures of processing, (2) measures of search, (3) measures of scanpaths, and (4) other measures.

1. Measures of processing

1.1 Number of fixations

In visual search, the number of fixations is related to the number of components that the user is required to process, but not the depth of required processing (Goldberg and Kotval, 1999, p. 643). However, once the searcher has found what he/she is interested in, the number of fixations indicates the amount of interest in a visual area. Mackworth and Morandi (1967) made comparisons between visual fixations on, and verbal estimates of, the relative importance of regions within photographs. These authors found that the regions that were rated highly for informativeness produced the highest fixation frequency. Baker and Loeb (1973) found appreciable correlations between ratings of the importance of sections of geometric forms and durations of fixations on those sections.

1.2 Location of fixations

Fixations indicate one’s spatial focus of attention over time. The eyes naturally fixate on areas that are surprising, salient, or important through experience (Loftus and Mackworth, 1978).

1.3 Fixation duration

Viviani (1990) found that the minimum processing duration during a fixation is 100 to 150 milliseconds with the average length of a typical fixation at 250 to 300 milliseconds. Yarbus (1967) agreed that fixation durations are in the same range.

Buswell (1935) found that more difficult processing produced longer fixation durations. Mackworth (1976) noted that higher display densities produced fixation durations that were 50 to 100 milliseconds longer than those on lower-density displays. Looking at character and line spacing in a reading task, Kolers et al. (1981) found more fixations per line (and fewer fixations per word) with more tightly grouped, single-spaced material. Fewer, yet longer, fixations were made with smaller, more densely packed text characters.

In summary, longer fixations imply that the user is spending more time interpreting the visual representation or relating it to an internalized representation. Goldberg and Kotval (1999) said that representations that require long fixations are not as meaningful to the user as those with shorter fixation durations (p. 643).

1.4 Cumulative fixation time

The total amount of fixation time on an area of interest (AOI) is generally interpreted as the amount of interest a viewer has in that particular visual element. It is also interpreted as the amount of time spent processing the information.

Latimer (1988) suggests making a cumulative fixation time (CFT) plot, which is a topographical representation of the cumulative fixation times made by a subject while inspecting a stimulus. Such plots can also be cumulative across stimuli and/or subjects (p. 438). The CFT measure is close to what Just and Carpenter (1976) referred to as “location and duration of gaze.” To plot a CFT, a computer program reads a text file of x-y coordinates and fixation times, and then (1) converts the x-y coordinates to row and column coordinates, (2) accumulates the fixation times for each location, and (3) prints out truncated t values in their screen locations (p. 438). Methods and computer programs exist to depict the CFT in a three-dimensional graph.
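The three accumulation steps described above can be sketched in a few lines of Python. This is only an illustration, not Latimer's program: the function name, the tuple layout of the input, and the `cell_size` parameter are assumptions made for the example.

```python
from collections import defaultdict


def cumulative_fixation_times(fixations, cell_size=20):
    """Accumulate fixation time per grid cell.

    fixations: iterable of (x, y, duration_ms) tuples in screen pixels.
    cell_size: assumed side length of a square grid cell, in pixels.
    Returns a dict mapping (row, col) to total fixation time in ms.
    """
    cft = defaultdict(float)
    for x, y, duration in fixations:
        # Step 1: convert x-y coordinates to row and column coordinates.
        row = int(y // cell_size)
        col = int(x // cell_size)
        # Step 2: accumulate the fixation time for that location.
        cft[(row, col)] += duration
    return dict(cft)


# Hypothetical fixation records: two fall in the same cell, one elsewhere.
example_fixations = [(105, 42, 250), (110, 45, 300), (400, 300, 180)]
example_cft = cumulative_fixation_times(example_fixations)
```

The resulting dictionary can then be printed in screen layout or passed to a plotting routine to obtain the topographical view.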

1.5 Cluster analysis

Clusters are heavy concentrations of fixations and fixation time (attention) on particular areas of a screen and stimuli. If a CFT distribution can be partitioned into such clusters in some relatively objective manner, an experimenter should be able to point to such clusters and assert that stimuli or stimulus parts located under them have received significant attention by a subject or subjects (Latimer, 1988, p. 445). The mean of a cluster in a CFT distribution is assumed to be the central point of an area of the screen, a stimulus or a stimulus region for which the fixations within the cluster were intended (Latimer, 1988, p. 445).

Latimer proposes using the k-means cluster methodology for analysis of eye-fixation location. According to this approach, each object or location is given three descriptors (a triple): its row coordinate, its column coordinate and its t value. Latimer then uses the three-dimensional representations to estimate the location and number of clusters, and the initial cluster means are noted. Once the initial estimates are settled, the remaining triples (t values together with their row and column coordinates) are considered in random order for membership within each cluster. The criterion of membership is the Euclidean distance from each cluster mean, and each randomly chosen triple is allocated to its nearest cluster. After a triple has been allocated, new mean row and column coordinates and a new total t value are computed from the triples already in the cluster and the new triple. The process of random assignment of triples continues until all triples in the CFT distribution have been allocated and the final mean positions ascertained (Latimer, 1988, p. 445).
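A rough sketch of this sequential allocation is given below. It rests on several assumptions that the summary does not settle: the initial estimates are supplied as seed triples, distance is measured on row and column coordinates only, and cluster positions are unweighted means. All names are hypothetical.

```python
import math
import random


def sequential_cluster(seeds, remaining, rng_seed=0):
    """Allocate (row, col, t) triples one at a time to the nearest cluster.

    seeds: one (row, col, t) triple per initial cluster estimate.
    remaining: the other triples, considered in random order.
    """
    clusters = [
        {"row": float(r), "col": float(c), "total_t": float(t), "n": 1}
        for r, c, t in seeds
    ]
    order = list(remaining)
    random.Random(rng_seed).shuffle(order)
    for row, col, t in order:
        # Membership criterion: Euclidean distance to each cluster mean
        # (assumed here to use row and column only).
        nearest = min(
            clusters,
            key=lambda k: math.hypot(row - k["row"], col - k["col"]),
        )
        n = nearest["n"]
        # Update the mean position and the total t value of that cluster.
        nearest["row"] = (nearest["row"] * n + row) / (n + 1)
        nearest["col"] = (nearest["col"] * n + col) / (n + 1)
        nearest["total_t"] += t
        nearest["n"] = n + 1
    return clusters


example_clusters = sequential_cluster(
    seeds=[(0, 0, 100), (10, 10, 200)],
    remaining=[(1, 1, 50), (9, 9, 70), (0, 2, 30)],
)
```

Because each triple is allocated once and never revisited, the outcome can depend on the random order when clusters overlap; in the small example the two groups are far enough apart that the order does not matter.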

1.6 On-target fixations

The proportion of on-target fixations can be determined by counting the number of fixations on the designated area of interest (AOI) or target, then dividing by the total number of fixations.
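As a minimal sketch, assuming the AOI is an axis-aligned rectangle given as (left, top, right, bottom) in pixels, the ratio could be computed as follows. The rectangle representation and the function name are illustrative assumptions.

```python
def on_target_ratio(fixations, aoi):
    """Fraction of fixations that fall inside a rectangular AOI.

    fixations: list of (x, y) fixation positions in pixels.
    aoi: (left, top, right, bottom) bounds, assumed inclusive.
    """
    if not fixations:
        return 0.0
    left, top, right, bottom = aoi
    hits = sum(
        1 for x, y in fixations
        if left <= x <= right and top <= y <= bottom
    )
    return hits / len(fixations)


# Hypothetical data: two of the four fixations land inside the AOI.
example_ratio = on_target_ratio(
    [(10, 10), (50, 50), (60, 55), (200, 200)],
    aoi=(40, 40, 100, 100),
)
```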

2. Measures of search

2.1 Scanpath length

Scanpath length is computed by summing the distance (in pixels) between gazepoint samples.
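The summation can be written directly, assuming the gazepoint samples are available as an ordered list of (x, y) pixel coordinates. The function name is hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math


def scanpath_length(points):
    """Sum of Euclidean distances between consecutive gazepoint samples."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


# Two segments of 5 and 6 pixels.
example_length = scanpath_length([(0, 0), (3, 4), (3, 10)])
```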

For information search tasks, the ideal scanpath is a straight line to the target, with relatively short fixation duration at the target (Goldberg and Kotval, 1999, p. 635).

Shorter scanpaths seem to indicate that the information is well-organized and information is easy to find. Goldberg and Kotval (1999) found that a well-organized grouping of component buttons in a computer interface resulted in shorter scanpaths, covering smaller areas.

Lengthy scanpaths indicate less efficient scanning behavior but do not distinguish between search and information processing times.

2.2 Scanpath duration

Scanpath duration is more related to processing complexity than to visual search efficiency, because far more time is spent in fixations than in saccades. Using 60 Hz gazepoint samples, the number of samples is directly proportional to the temporal duration of each scanpath: Scanpath Duration = n × 16.67 ms, where n is the number of samples in the scanpath. However, when working with fixations, the scanpath duration must sum the fixation durations together with the saccade durations (Goldberg and Kotval, 1999, p. 638).
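Both ways of obtaining the duration are simple arithmetic, sketched here with hypothetical function names. The 16.67 ms figure is just 1000 ms divided by the 60 Hz sampling rate, so the sampling rate is kept as a parameter.

```python
def duration_from_samples(n_samples, rate_hz=60.0):
    """Scanpath duration in ms from a count of evenly spaced samples."""
    return n_samples * 1000.0 / rate_hz


def duration_from_fixations(fixation_ms, saccade_ms):
    """Scanpath duration in ms as fixation time plus saccade time."""
    return sum(fixation_ms) + sum(saccade_ms)


one_second = duration_from_samples(60)
from_events = duration_from_fixations([250, 300], [40])
```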

2.3 Convex hull area

The area covered by a scanpath is also important to an analysis of visual search. Goldberg and Kotval (1999) propose an algorithm to construct convex hulls and compute hull area in order to minimize the effect an outlying gazepoint sample would have if the area of a scanpath were simply computed by drawing a circle around the scanpath. Using the convex hull area in conjunction with the length of the scanpath, Goldberg and Kotval are able to determine whether a lengthy search covered a large or localized area on a display.
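The summary does not reproduce Goldberg and Kotval's own hull algorithm, so the sketch below substitutes a standard technique: Andrew's monotone chain method for the hull and the shoelace formula for its area. Function names are hypothetical.

```python
def convex_hull(points):
    """Convex hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b makes a counter-clockwise turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain repeats the first point of the other.
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    n = len(vertices)
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


# Four corner samples plus one interior sample that the hull ignores.
example_hull = convex_hull([(0, 0), (4, 0), (4, 3), (0, 3), (2, 1)])
example_area = polygon_area(example_hull)
```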

2.4 Spatial density

Coverage of an interface due to search and processing may be captured by the spatial distribution of gazepoint samples. Evenly spread samples throughout the display indicate extensive search with an inefficient path, whereas targeted samples in a small area reflect direct and efficient search (Goldberg and Kotval, 1999, p. 640). Spatial density can be computed by dividing the visual stimulus into grid cells representing the physical screen area or specific screen objects and counting how many of these cells received at least one gazepoint sample. The spatial density index is equal to the number of cells containing at least one sample, divided by the total number of grid cells.
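A minimal sketch of the index, assuming a uniform square grid laid over the physical screen area (rather than cells tied to specific screen objects), might look like this. The parameter names are illustrative.

```python
import math


def spatial_density(samples, width, height, cell_size):
    """Fraction of grid cells that contain at least one gazepoint sample.

    samples: iterable of (x, y) positions in pixels.
    width, height: display size in pixels.
    cell_size: assumed side length of a square grid cell, in pixels.
    """
    n_cols = math.ceil(width / cell_size)
    n_rows = math.ceil(height / cell_size)
    visited = {
        (int(y // cell_size), int(x // cell_size))
        for x, y in samples
        if 0 <= x < width and 0 <= y < height  # ignore off-screen samples
    }
    return len(visited) / (n_rows * n_cols)


# A 100 x 100 display split into four cells; the samples touch two of them.
example_density = spatial_density(
    [(10, 10), (20, 20), (60, 10)], width=100, height=100, cell_size=50
)
```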

2.5 Transition matrix

A transition matrix expresses the frequency of eye-movement transitions between areas of interest (AOIs) (Ponsoda et al., 1995). This measurement, also known as link analysis (Jones et al., 1949), considers both search area and movement over time. Frequent transitions from one region of a display to another indicate inefficient scanning with extensive search. Goldberg and Kotval (1999) derive a density index from the matrix by dividing the number of active transition cells (those containing at least one transition) by the total number of cells. According to their formula, a large index value indicates a dispersed, lengthy and wandering scanpath, while smaller values point to more directed and efficient search.
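The matrix and the index derived from it can be sketched as follows, assuming each fixation has already been coded with the label of the AOI it fell in. One choice made here purely for illustration is that every consecutive pair of fixations counts as a transition, including pairs that stay within the same AOI.

```python
def transition_matrix(aoi_sequence, aois):
    """Count transitions between consecutive AOI-coded fixations.

    aoi_sequence: ordered AOI labels, one per fixation.
    aois: list of all AOI labels; fixes the row and column order.
    """
    index = {name: i for i, name in enumerate(aois)}
    matrix = [[0] * len(aois) for _ in aois]
    for src, dst in zip(aoi_sequence, aoi_sequence[1:]):
        matrix[index[src]][index[dst]] += 1
    return matrix


def transition_density(matrix):
    """Active cells (at least one transition) divided by all cells."""
    cells = [count for row in matrix for count in row]
    return sum(1 for count in cells if count > 0) / len(cells)


example_matrix = transition_matrix(
    ["A", "B", "A", "B", "C"], aois=["A", "B", "C"]
)
example_index = transition_density(example_matrix)
```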

2.6 Number of saccades

The number of saccades in a scanpath indicates the amount of search, with more saccades implying a greater amount of search. The number of saccades equals the number of fixations minus one.

2.7 Saccadic amplitude

Saccadic amplitude is defined as the distance covered by a saccade. The average saccadic amplitude is computed by summing the distances between consecutive fixations and dividing the sum by the number of fixations minus one. Well-designed Web pages or interfaces should have few interim fixations.
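In code, the average is the total inter-fixation distance divided by the number of saccades. The sketch assumes fixations are given as an ordered list of (x, y) pixel positions; the function name is hypothetical.

```python
import math


def mean_saccadic_amplitude(fixations):
    """Mean distance between consecutive fixations, in pixels."""
    if len(fixations) < 2:
        return 0.0  # no saccades to average
    total = sum(math.dist(a, b) for a, b in zip(fixations, fixations[1:]))
    return total / (len(fixations) - 1)


# Three fixations give two saccades of 5 and 6 pixels.
example_amplitude = mean_saccadic_amplitude([(0, 0), (3, 4), (3, 10)])
```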

3. Measures of scanpaths

3.1 Scanpath regularity

Measures of scanpath complexity and regularity, considering integrated error or deviation from a regular cycle, can indicate variance. Many potential measures of scanpath complexity are possible, once cyclic scanning behavior is identified (Goldberg & Kotval, 1999, p. 643).

3.2 Markov analysis

A Markov process is a stochastic model for the probabilities that the viewers’ eyes will move from one visual element to another. The assumption is that scanpaths across visual elements can be described by a first-order Markov process – that is, each eye fixation depends only on the previous one. Three stochastic properties may underlie visual scanning: this first-order dependence, plus reversibility and stationarity. Reversibility means that saccades from element A to B occur as often as saccades from B to A (Ellis & Smith, 1985), and stationarity predicts that the scanpaths of viewers exposed repeatedly to the same visual stimuli remain constant across exposures.
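Under the first-order assumption, the model is fully described by the probability of moving from each visual element to each other element. A simple way to estimate those probabilities from one observed sequence of AOI-coded fixations is to count transitions and normalize each row, as in this sketch (the function name and input format are assumptions):

```python
from collections import Counter, defaultdict


def transition_probabilities(sequence):
    """Estimate first-order transition probabilities from one sequence.

    sequence: ordered AOI labels, one per fixation.
    Returns {source: {destination: probability}}.
    """
    counts = defaultdict(Counter)
    for src, dst in zip(sequence, sequence[1:]):
        counts[src][dst] += 1
    return {
        src: {dst: n / sum(row.values()) for dst, n in row.items()}
        for src, row in counts.items()
    }


example_probs = transition_probabilities("ABABAC")
```

Comparing the raw counts for A to B against those for B to A gives an informal check on reversibility, and comparing estimates across repeated exposures gives an informal check on stationarity.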

Noton and Stark (1971) theorized about the existence of scanpaths. Their prediction is that a subject scans a new stimulus during the first exposure and stores the sequence of fixations in memory as a spatial model, so that a scanpath is established. When the subject is re-exposed to the stimulus, the first few eye movements tend to follow the same scanpath established during the initial viewing of the stimulus, which facilitates stimulus recognition.

Ellis and Smith (1985) elaborated on Noton and Stark’s scanpath theory by suggesting that scanpaths can be generated by completely random, stratified random, or statistically dependent stochastic processes. A completely random process assumes that each element of a visual has an equal probability of being focused on during each fixation. A stratified random process assumes that the probabilities of visual elements being fixated reflect the attentional attractiveness of those elements, but they do not depend on previous fixations. The statistically dependent stochastic process specifies that the position of a fixation depends on previous fixations. Rayner (1995) and Stark and Ellis (1981) believe it is unlikely that saccades from one fixation point to another are either completely random or stratified random processes and look toward statistically dependent stochastic processes as an explanation.

3.3 String-edit

The string-edit method is a technique that measures resemblances between eye-path sequences by means of a simple metric based on the insertions, deletions and substitutions required to transform one sequence into another.

Abbott and Hrycak (1990) noted several advantages of string-edit methods for studying event sequences and outlined several difficulties with Markovian sequence models. First, and foremost, they argued the sequence-generating process may have a longer history than the immediate past typically used in Markov analysis. Second, Markov models describe the stochastic processes that generate observed sequences, and can be used to explore the goodness of fit of a predicted model, but do not address the question of whether there is a typical event sequence for a given process. Abbott and Hrycak (1990) argued that the direct testing of the Markov model – in terms of actual resemblance between generated and observed sequences – requires a technique for assessing similarity between sequences, categorizing sequences, and identifying typical sequences. String-edit analysis affords all of these techniques.

The first step in comparing eye-path sequences using the string-edit technique is to define a sequence alphabet for the stimulus. This is accomplished by assigning each defined target area an alphabetic code. The second step is to define the eye-path sequence for each subject’s viewing of the stimulus material by recording the sequence of fixations by the defined target area within which the fixation occurred (called “target tracing” by Salvucci and Anderson [2001]). For example, a viewing beginning with a single fixation in area “A” followed by three fixations in area “C” would generate a sequence beginning “ACCC...”.

Next, optimal matching analysis (OMA) is used to compare the coded sequences. OMA is a generic string-edit tool for sequence comparison (Holmes, 1997) when each sequence is represented by well-defined elements drawn from a relatively small sequence alphabet. OMA produces a numerical index – the Levenshtein distance – of the similarity between any two sequences, computed as the smallest possible cost of elementary operations of insertion, substitution and deletion of units required to align or transform one sequence into another (Sankoff & Kruskal, 1983; Abbott & Forrest, 1986). Similar sequences will, when compared, have smaller dissimilarity indexes; the more different two sequences, the greater the index.

Alignments may use a combination of substitutions and indels (insertions and deletions) to produce the Levenshtein distance. In their application of the string-edit method, Brandt and Stark (1997) set equal substitution costs for all pairs of sequence elements. Josephson and Holmes (2002) based the substitution values on a measure of distance.

The contribution to the Levenshtein distance by the length of the compared eye-path sequences (defined by the number of fixations in each) is an issue that has to be considered in OMA. To adjust for the role of sequence length in shaping the total cost of alignment, the inter-sequence distance is determined by dividing the raw sum alignment cost by the length of the longer sequence in the sequence pair. This makes the distance relative to length and comparable across pairs of varying lengths.
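The distance and its length adjustment can be illustrated with a standard dynamic-programming Levenshtein routine. This is a generic sketch rather than the WinPhaser implementation: it uses one substitution cost for all pairs of elements, in the manner attributed above to Brandt and Stark (1997), and the function names and default costs are assumptions.

```python
def levenshtein(a, b, indel=1.0, sub=1.0):
    """Minimum total cost of insertions, deletions and substitutions
    needed to transform sequence a into sequence b."""
    # prev[j] holds the cost of transforming the processed prefix of a
    # into the first j elements of b.
    prev = [j * indel for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * indel]
        for j, y in enumerate(b, start=1):
            cost = 0.0 if x == y else sub
            curr.append(min(
                prev[j] + indel,      # delete x
                curr[j - 1] + indel,  # insert y
                prev[j - 1] + cost,   # match or substitute
            ))
        prev = curr
    return prev[-1]


def normalized_distance(a, b):
    """Alignment cost divided by the length of the longer sequence."""
    longer = max(len(a), len(b))
    return levenshtein(a, b) / longer if longer else 0.0


# Two hypothetical AOI-coded eye paths that differ in one fixation.
example_distance = normalized_distance("ACCC", "ABCC")
```

Substitution costs based on the distance between target areas, as in Josephson and Holmes (2002), would replace the constant `sub` with a lookup on the pair of elements being compared.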

Next, WinPhaser software (Holmes, 1996) is used to generate a sequence distance matrix of distance indexes for each possible pair of sequences for each stimulus. WinPhaser’s OMA package uses a dynamic programming algorithm by Andrew Abbott. UCINET software (Borgatti, Everett & Freeman, 1992) is then used to perform non-metric multidimensional scaling and hierarchical cluster analysis on the distance matrices. Scaling arranges the sequences in n-dimensional space such that the spatial arrangement approximates the distances between sequences; cluster analysis helps to define “neighborhoods” of similar cases within that n-dimensional space.

4. Other measures

4.1 Backtrack

A backtrack is any saccadic motion that deviates more than 90 degrees in angle from its immediately preceding saccade. The resulting acute angles in the scanpath indicate rapid changes in direction, due to changes in goals and mismatch between users’ expectation and the observed interface layout (Goldberg & Kotval, 1999, p. 643).
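Two saccades deviate by more than 90 degrees exactly when the dot product of their direction vectors is negative, which gives a compact way to count backtracks. The sketch assumes fixations are an ordered list of (x, y) positions; the function name is hypothetical.

```python
def count_backtracks(fixations):
    """Count saccades that turn more than 90 degrees from the previous one."""
    # One displacement vector per saccade.
    vectors = [
        (b[0] - a[0], b[1] - a[1])
        for a, b in zip(fixations, fixations[1:])
    ]
    # A negative dot product means the direction changed by more than 90 degrees.
    return sum(
        1 for u, v in zip(vectors, vectors[1:])
        if u[0] * v[0] + u[1] * v[1] < 0
    )


# Rightward, rightward, then a sharp reversal, then a turn of under 90 degrees.
example_backtracks = count_backtracks(
    [(0, 0), (10, 0), (20, 0), (5, 1), (5, 10)]
)
```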

4.2 On-target: all-target fixations

The ratio of on-target to all-target fixations can be computed by counting the number of fixations falling within a designated AOI or target, then dividing by all the fixations. This is a content-dependent efficiency measure of search, with smaller ratios indicating lower efficiency (Goldberg & Kotval, 1999, p. 643).

4.3 Post-target fixations

The number of post-target fixations or fixations on other areas, following capture of the target, can indicate the target’s meaningfulness to a user (Goldberg & Kotval, 1999, p. 643).

References

Abbott, A. & Forrest, J. (1986) Optimal matching methods for historical sequences. Journal of Interdisciplinary History, 16, 471-494.

Abbott, A. & Hrycak, A. (1990) Measuring resemblance in sequence data. American Journal of Sociology, 96, 144-185.

Baker, M. A. & Loeb, M. (1973) Implications of measurement of eye fixations for a psychophysics of form perception. Perception & Psychophysics, 13, 185-192.

Borgatti, S. P., Everett, M. G. & Freeman, L. C. (1992) UCINET IV (Version 1.0) [Computer software]. Columbia: Analytic Technologies.

Brandt, S. A. & Stark, L. W. (1997) Spontaneous eye movements during visual imagery reflect the content of the visual scene. Journal of Cognitive Neuroscience, 9, 27-38.

Ellis, S. R. & Smith, J. D. (1985) Patterns of statistical dependency in visual scanning. In R. Groner et al. (Eds.), Eye movements and human information processing, (pp. 221-238). Amsterdam: Elsevier Science Publishers BV.

Goldberg, J. H. & Kotval, X. P. (1999) Computer interface evaluation using eye movements: methods and constructs. International Journal of Industrial Ergonomics, 24, 631-645.

Holmes, M. E. (1996) WinPhaser user’s manual (Version 1.0c) [computer software].

Holmes, M. E. (1997) Optimal matching analysis of negotiation phase sequences in simulated and authentic hostage negotiations. Communication Reports, 10, 1-8.

Jones, R. E., Milton, J. L. & Fitts, P. M. (1949) Eye fixations of aircraft pilots; IV: Frequency, duration and sequence of fixations during routine instrument flight, US Air Force Technical Report 5975.

Josephson, S. & Holmes, M. E. (2002) Visual attention to repeated Internet images: testing the scanpath theory on the World Wide Web. Proceedings of ETRA ’02 (Eye Tracking and Research Applications Symposium), New Orleans, March 25-27.

Just, M. A. & Carpenter, P. A. (1976) Eye fixations and cognitive processes. Cognitive Psychology, 8, 441-480.

Kolers, P. A.; Duchnicky, R. L. & Ferguson, D. C. (1981) Eye movement measurement of readability of CRT displays. Human Factors, 23(5), 517-527.

Latimer, C. R. (1988). Eye-movement data: Cumulative fixation time and cluster analysis. Behavior Research Methods, Instruments, & Computers, 20(5), 437-470.

Loftus, G. R. & Mackworth, N. H. (1978) Cognitive determinants of fixation location during picture viewing. Journal of Experimental Psychology: Human Perception and Performance, 4(4), 565-572.

Mackworth, N. H. & Morandi, A. J. (1967) The gaze selects informative details within pictures. Perception & Psychophysics, 7, 173-178.

Mackworth, N. H. (1976). Stimulus density limits the useful field of view. In R. A. Monty & J. W. Senders (Eds.), Eye movements and psychological processes (pp. 307-321). Hillsdale, NJ: Lawrence Erlbaum Associates.

Noton, D. & Stark, L. W. (1971) Scanpaths in saccadic eye movements while viewing and recognizing patterns. Vision Research, 11, 929-942.

Ponsoda, V., Scott, D. & Findlay, J. M. (1995) A probability vector and transition matrix analysis of eye movements during visual search. Acta Psychologica, 88, 167-185.

Rayner, K. (1995) Eye movements and cognitive processes in reading, visual search, and scene perception. In J. M. Findlay et al. (Eds.), Eye movement research: Mechanisms, processes and applications (pp. 3-22). Amsterdam: Elsevier Science Publishers BV.

Salvucci, D. D. & Anderson, J. R. (2001) Automated eye-movement protocol analysis. Human-Computer Interaction, 16, 39-86.

Sankoff, D. & Kruskal, J. B. (Eds.) (1983) Time warps, string edits, and macromolecules: The theory and practice of sequence comparison. Reading, MA: Addison-Wesley.

Stark, L. W. & Ellis, S. R. (1981) Scanpaths revisited: cognitive models direct active looking. In Eye movements: cognition and visual perception (pp. 193-226), D. F. Fisher et al., (Eds.), Hillsdale, NJ: Lawrence Erlbaum Associates.

Viviani, P. (1990) Chapter 8. In Kowler, E. (Ed.) Eye movements and their role in visual and cognitive processes. Amsterdam: Elsevier Science.

Yarbus, A. L. (1967) Eye movements and vision. New York: Plenum Press.