Electronic Proceedings of the
ACM Workshop on Effective Abstractions in Multimedia
November 4, 1995
San Francisco, California

Effective Presentation of Information Through Page Layout: a Linguistically-Based Approach

Klaus Reichenberger, Klaas Jan Rondhuis, Jörg Kleinz and John Bateman: GMD-German National Research Centre for Information Technology; IPSI-Integrated Publication and Information Systems Institute; Dolivostraße 15, 64293 Darmstadt, Germany; [reichen, rondhuis, kleinz, bateman]@darmstadt.gmd.de

Abstract

This contribution deals with the structuring and presentation of information of a textual and visual nature, as found on a typical magazine page. Here, linguistic, graphical and typographical resources go hand in hand to organise the material to be presented, thus providing not only an overview of the document content but also easier access to it. In the following, some concrete layout alternatives are analysed with the aim of identifying layout decisions. We then outline possible motivations and realisations of these layout decisions.

Introduction
Some Layout Examples
Discussion
How is the Layout Organisation Motivated?
Realisation of the Layout Goals
Summary and Conclusions
Bibliography

Introduction

One of the most widely discussed issues in new media is "the linearity of text and the non-linearity of discourse or thought": hypermedia systems, for example, propose a technical solution for the supposed inability of text to communicate anything other than linear structures. This contribution, however, starts from the premise that text (here: written, formatted text) does succeed in communicating complex non-linear structures in an efficient manner, and it aims to analyse this very phenomenon in order to improve the quality of information presentation in general.
   There are substantial linguistic resources for creating links between different parts of a text or across texts. These have been extensively classified by, for example, Halliday and Hasan (1976) under the term endophoric; this includes anaphora, cataphora, etc. Martin (1992) provides further textual uses of the resources of identification, illustrated in examples such as "The next section concerns..", "the previous three examples were...", as well as simple "this" and "that" as used internally to a text. Locating the possible antecedents of such text-referring anaphors has frequently been investigated (cf. Sidner 1981; Reichman 1985; Asher 1993). Connectives of various kinds also serve a text organizing function (termed variously internal, pragmatic, presentational in the literature: see Bateman and Rondhuis (1994) for an extensive overview), marking argument chains and textual progressions: e.g., "Previously,", "In contrast,", "Indeed", etc. These often combine with the identificational devices. Finally, links can be explicitly made - in which case the information communicated is relocated entirely at the meta-level: e.g., "See chapter 8", "The reader is referred to the excellent work of Smith (1988)", etc. While the possibilities for bringing different parts of a text or of several texts into relations are very numerous, using them too much can detract from the effectiveness of a text. Too many cross-references render a text unreadable. Moreover, textual antecedents are not always clearly identifiable: for example, the text-referent of proximal "this" may in principle be selected anywhere from the set of open discourse topics.
   The linguistic means of realising non-linear structures in texts are usually accompanied and supported by the page layout when the text is presented. We take a broad view of page layout: besides the placement of text blocks and pictures on the page, we are concerned with typographical issues such as the choice of type, spacing, leading etc. All these layout resources work hand in hand with linguistic resources to provide meta-information about the text, which fits together into a structural overview (see Norrish 1987). Such an overview has three main functions: it allows non-linear discourses to be communicated; it serves as an access tool for the non-linear, selective consumption of the document; and it creates expectations on the part of the reader concerning the content of the document, thus influencing, for example, the decision to read the document at all.
   In this work, we will investigate how page layout works and how it contributes to the presentation of a text. This means identifying typographical resources and motivating their use. In this respect, page layout differs strongly from our previous field of work, diagram design (Reichenberger et al. 1995). This is because diagrams communicate subject-matter information, whereas page layout mainly communicates meta-information, i.e., information about the information-communicating process (the text). This meta-information can be: the text has such and such sections, they all have such and such lengths, they should be read in this order etc. Therefore the use of layout resources is not motivated directly by the subject matter as in diagram design, but draws on a text organisation specification similar to that employed for motivating the linguistic resources - here represented concretely in terms of Rhetorical Structure Theory (henceforth RST).
   Correspondingly, the point of departure is a given text. Typographical decisions in a narrower sense are only concerned with the placement and formatting of that text. In the broader process of page layout, purely graphical elements (lines, colored areas etc.) can also be introduced, but their role relative to written text is usually the opposite of that in diagram design. There, graphical elements are the primary carriers of the message, and text in labels and legends only acts as a key to it; here in layout, text and its typographical arrangement carry the message, and graphical elements are only used to support or ornament it.

Some Layout Examples

In the following we present three example layouts of the same magazine article - adapted from an originally multicoloured article in a German sports magazine (FIT FOR FUN 1995, p. 92). In magazines, layout not only plays a more important role than in other publications but is also subject to fewer generic constraints (unlike, e.g., encyclopedia or newspaper layouts), which makes magazines a suitable test case.
Apart from functional aspects, layout always has a certain style which should appeal to the target reader and should raise certain expectations about the contents and their literary style. Although it is not always possible, we will try to separate functional and stylistic aspects of the layout and restrict ourselves to the functional ones. Therefore we have used the same stylistic means for all examples.

^{Figure 1: First example layout of the Unihoc page}

The first example contains no layout decisions that would subdivide the text into segments and establish relations among these segments - despite the fact that the text organisation in terms of RST (see below) contains many such segments and relations. The only subdivisions used in this example are the heading and the paragraph divisions. This example has only been added for comparison.

^{Figure 2: Second example layout of the Unihoc page}

In example 2 a relatively large number of layout decisions are taken to structure the article. For instance, the most important statements form a block of their own at the top of the page, additional information is placed in a vertical grey bar, and the main text is divided into two sections, "The rules" and "Unihoc variants". We believe that this layout fulfils the tasks set out in the introduction reasonably well.

^{Figure 3: Third example layout of the Unihoc page}

Example 3 shows that achieving an overview of the article is not necessarily supported by the arbitrary application of layout measures. Among the unmotivated decisions that make this layout incomprehensible are the following: the "variants" section breaks down into two parts of different visual appearance; additional information (events) becomes erroneously important due to its prominent position and font size; and the information about the author is related only to the first text block and not to the entire article. The problem this raises is the same as in any generation task: once more flexibility of expression or presentation is made available, it is essential to control this flexibility appropriately in order to avoid wrong decisions.

Discussion

Most of the layout measures applied in example 2 and misapplied in example 3 have something in common, namely grouping. That means that the coverage of the printed area is no longer homogeneous, instead elements form clusters which are embedded. The prominent role of grouping comes as no surprise: grouping is something like the core component of visual perception (Sarkar and Boyer 1993). Besides, grouping seen as an abstract operation on the textual level is probably the most important and most common means for the organisation of text or any data in general into an overview. Therefore it seems logical to employ grouping purposively for the layout of pages. The recursive grouping of layout example 2 into larger and larger units can be seen in the following thumbnail images showing the same page at different resolutions.

^{Figure 4: In the different resolutions we see how words gradually blend into lines, lines into blocks etc. The labeling of the main components (A, B, C) is referred to in figure 5.}

^{Figure 5: The hierarchy imposed by visual grouping taking parts of the example layout number 2 as an example (pic = picture, w = word, ch = character).}

This grouping is represented schematically in figure 5. The diagram can be read as follows: The vertical axis shows the progressive reduction of the page resolution. The placement of an element along this axis indicates that from this resolution on downwards its subelements cannot be distinguished anymore. The lower such a point lies, the higher the resolution-reduction must be for blending its children into one element, i.e. the more prominent these children are. The depth of two elements with regard to their next common root indicates the strength of their visual coherence. We assume that the strength of visual coherence is proportional to the strength of rhetorical coherence.
   Finally, the legibility of an element corresponds to the minimal resolution at which the text is still readable, which is more or less the resolution where the single characters just blend into words. Obviously, this depends mainly on the type size.
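One way to make the reading of figure 5 concrete is to store the grouping hierarchy as a tree and measure how far two elements lie below their nearest common root. The following Python sketch does this under several assumptions: the element names and parent links are hypothetical rather than taken from figure 5, and treating a smaller distance as stronger visual coherence is an interpretation of the text, not a definition given in it.

```python
from typing import Dict, List, Optional

# Hypothetical parent links for a fragment of a grouping hierarchy;
# the element names are illustrative and not read off figure 5.
PARENT: Dict[str, Optional[str]] = {
    "page": None,
    "A": "page",
    "B": "page",
    "headline": "A",
    "summary": "A",
    "rules": "B",
    "variants": "B",
}


def path_to_root(element: str) -> List[str]:
    """Return the element followed by all of its ancestors up to the root."""
    path = [element]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def common_root(a: str, b: str) -> str:
    """Return the nearest group that contains both elements."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_of_a:
            return node
    raise ValueError("elements do not share a root")


def grouping_distance(a: str, b: str) -> int:
    """Larger of the two depths below the common root (assumed measure).

    A smaller value is read here as tighter visual grouping.
    """
    root = common_root(a, b)
    return max(path_to_root(a).index(root), path_to_root(b).index(root))
```

With these hypothetical links, `grouping_distance("headline", "summary")` is smaller than `grouping_distance("headline", "rules")`, reflecting that the first pair shares a closer group.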
   An increase of prominence as well as legibility also means an increase of the space required (free space, additional graphical elements, larger fonts etc.). The attribution of prominence and legibility to elements is a decision which up to a certain point can also have a formal aesthetic motivation. However, it basically has to reflect the intrinsic or rhetorical importance of the elements for the achievement of the goal of the entire group, which we call communicative significance. That means, given the same amount of text, a more significant element may use up more space, either through its own size or through the amount of space around it. For instance, in the headline group of example 2, the "subheadline" "Teamsport Unihoc" has only a small communicative significance compared to the main headline but the same amount of text; visually, it seems to be only an appendage to it. The ratio of communicative significance to amount of text is called communicative effectiveness. Differences in communicative effectiveness lead to different types of text elements with different prominence and legibility values.
   However, other cases exist in which one would like to distinguish between different text types without falling back on a difference in communicative effectiveness. An example: the author segment and the events segment have been attributed different types. Communicative effectiveness, however, is an instrument for explaining the different treatment of elements on the same level that form one group, so it cannot account for this difference as it stands. Consequently, we would need to extend this instrument to the comparison of elements that do not satisfy the above conditions. However, even the introduction of an overall effectiveness would not justify the difference in type between the author segment and the events segment, since it is practically equal for both segments. There has to be an additional reason for type distinction. This becomes even clearer if we consider the example of a multi-column page of a multilingual edition where each column contains the same text but in a different language. Here all texts obviously have the same communicative effectiveness; nevertheless, they may be attributed different types.
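The ratio defined above can be written down directly. The sketch below is illustrative only: the text gives no scale for communicative significance and no unit for the amount of text, so the numeric values and the choice of characters as the unit are assumptions.

```python
from dataclasses import dataclass


@dataclass
class TextElement:
    name: str
    significance: float  # hypothetical score; the paper defines no scale
    text_amount: int     # counted in characters here, which is an assumption

    def effectiveness(self) -> float:
        """Communicative significance divided by the amount of text."""
        if self.text_amount <= 0:
            raise ValueError("text_amount must be positive")
        return self.significance / self.text_amount


# Same amount of text, different significance, as in the headline group.
headline = TextElement("main headline", significance=10.0, text_amount=20)
subheadline = TextElement("subheadline", significance=2.0, text_amount=20)
```

Under these made-up values the main headline has the higher effectiveness, which is the situation in which it would be given more prominence than the subheadline.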
   Another interesting issue is reading order: the Latin alphabet imposes a left-to-right and top-to-bottom reading order. However, this default reading order is only defined on columns of text. Consequently, it is only valid within groups, for instance in the "The rules" group of our example. Among groups, the reading order is a result of the interaction between depth of embedding, visual prominence and the default reading order. In this layout the most likely starting points are the beginning of the bold paragraph at the top of the page "Among the Swedes..." and the headline "Game with ball and boards".
   From this discussion we see that the main goal of the layout, to provide an overview of the document, is mainly realised through three subgoals: grouping, type distinction and reading order. In the following graph the three subgoals are represented as relations between the elements of the Unihoc example. We call this representation, which also contains the communicative effectiveness of the elements, the layout structure (henceforth LS).

^{Figure 6 a+b: Layout Structure of the Unihoc article (effectiveness values have been omitted)}
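As a rough illustration of what a layout structure might hold, the following Python sketch stores grouping, type distinction and reading order as relations over named elements. The class, its field names and the example values are hypothetical; they are not read off figure 6, and the effectiveness values are left empty.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple


@dataclass
class LayoutStructure:
    """The three layout subgoals as relations over named elements."""
    groups: Dict[str, List[str]] = field(default_factory=dict)       # group -> members
    types: Dict[str, str] = field(default_factory=dict)              # element -> type label
    reads_before: Set[Tuple[str, str]] = field(default_factory=set)  # (earlier, later)
    effectiveness: Dict[str, float] = field(default_factory=dict)    # element -> value

    def same_type(self, a: str, b: str) -> bool:
        """True if both elements carry the same type label."""
        return self.types[a] == self.types[b]

    def reading_order(self) -> List[str]:
        """One linear order of all typed elements that respects reads_before."""
        sorter = TopologicalSorter()
        for element in self.types:
            sorter.add(element)
        for earlier, later in self.reads_before:
            sorter.add(later, earlier)
        return list(sorter.static_order())


# Illustrative values only.
unihoc = LayoutStructure(
    groups={"main text": ["rules", "variants"], "additional": ["events", "author"]},
    types={"summary": "lead", "rules": "body", "variants": "body",
           "events": "listing", "author": "credit"},
    reads_before={("summary", "rules"), ("rules", "variants")},
)
```

The reading order is kept as a partial order on purpose: elements without an ordering constraint, such as the events and author segments here, may appear anywhere in the result of `reading_order()`.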

So far we have presented a layout description on a high level, that is through layout goals represented in the LS. It seems plausible that these goals can lead to a layout similar to example 2. Moreover it appears that these layout goals reflect up to a certain point the rhetorical organisation of the text. Since both assumptions are based only on the intuitive analysis of example 2, they require further generalisation and formalisation. That is why we need to ask, on the one hand, how these goals can be sensibly realised and, on the other hand, how they are formally motivated.

How is the Layout Organisation Motivated?

We have proposed that page layout typically supports the non-linear structure underlying a text. In order to motivate page layout we therefore need to include a sufficient account of this structure. Our current work has investigated the "rhetorical organization" of the document to be presented in terms of Rhetorical Structure Theory (RST: Mann & Thompson 1987), one of the most widely applied accounts of rhetorical organization used in text generation. It appears that the kinds of problems that arise in the planning of a rhetorical structure in order to express some information in a communicatively effective manner show many similarities with those arising in the planning of an appropriate layout organization. We are building on these similarities in order to provide as uniform a model of the document creation process as possible. In the work described here, we assume that a text planning process has already created a rhetorical structure organization (or, alternatively, that some text has been marked-up according to such an organization). Our problem is then to achieve a mapping from the rhetorical organization to the layout structure. Here there are clear overlaps with work in text generation on, for example, "aggregation" (e.g., Dalianis and Hovy, to appear).
As we will discuss below, we need to be able to decide which components of the rhetorical organization will be grouped and ordered in order to provide an appropriate layout structure. However, in contrast to work on aggregation that assumes an already formed rhetorical structure as the starting point for aggregation principles, we consider it necessary to allow principles of aggregation to be both goal-driven and applicable to all levels of representation. Thus aggregation considerations must also apply even in the construction of an appropriate rhetorical organization. Some layout decisions can also be understood as global "aggregations" which are realized not linguistically but visually. For the present paper, however, we will restrict our discussion to the use of the RST organization as motivation for layout organization.
RST seeks to describe the structure of a text in terms of rhetorical relations which hold between the segments of the text. It is a functional theory: the segments related are functional rather than textual, i.e. a rhetorical relation does not necessarily need to have a specific grammatical or lexical realisation. Mann and Thompson distinguish 23 relations which they argue to be adequate for the description of a wide variety of texts. The concept of nuclearity plays a crucial role in the definition of rhetorical relations. Often two segments related by a rhetorical relation are not of equal importance: the text segment most essential to the writer's purpose is called the nucleus, the other segment is the satellite.

^{Figure 7: RST structure of the Unihoc text}

This is graphically depicted in figure 7 (where [4b] is the nucleus and [4a] the satellite and ci stands for the relation which relates them). A nucleus cannot be omitted from the text without endangering the text's coherence, whereas a satellite can. If a relation holds between text segments of equal importance it is called multinuclear (for instance the joint relation between [10] and [11] in figure A5). A definition of an RST relation typically consists of the following elements: the name of the relation; the constraints on the nucleus, on the satellite and on their combination; the effect of the relation and the locus of this effect. The definition in table 1, taken from Mann and Thompson (1987), illustrates this.

RST relation name: condition
constraints on N: none
constraints on S: S presents a hypothetical, future, or otherwise unrealized situation (relative to the situational context of S)
constraints on N+S combination: Realization of the situation presented in N depends on realization of that presented in S.
the effect: Reader recognizes how the realization of the situation presented in N depends on the realization of the situation presented in S.
locus of the effect: N and S

^{Table 1: definition of RST relation}
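For a system that processes RST structures, the elements of such a definition could be held in a simple record. The following Python sketch transcribes table 1 into one; the class and field names are hypothetical and not part of RST itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RSTRelationDefinition:
    """The elements of an RST relation definition as listed in table 1."""
    name: str
    constraints_on_nucleus: str
    constraints_on_satellite: str
    constraints_on_combination: str
    effect: str
    locus_of_effect: str


# Content copied from table 1 (N = nucleus, S = satellite).
CONDITION = RSTRelationDefinition(
    name="condition",
    constraints_on_nucleus="none",
    constraints_on_satellite=(
        "S presents a hypothetical, future, or otherwise unrealized "
        "situation (relative to the situational context of S)"
    ),
    constraints_on_combination=(
        "Realization of the situation presented in N depends on "
        "realization of that presented in S."
    ),
    effect=(
        "Reader recognizes how the realization of the situation presented "
        "in N depends on the realization of the situation presented in S."
    ),
    locus_of_effect="N and S",
)
```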

The structure in figure 7 also shows that RST is recursive: nuclei and satellites themselves may also be RST structures. There are neither constraints on the order of the nucleus and the satellites, nor on the number of satellites a nucleus may have. Metainformation as required by the layout is usually not formulated directly in the form of a rhetorical relation, with the exception of the summary relation. For that reason a transformation of RST relations into the layout structure is not straightforward, but has to be done through the analysis of the RST structure.
This is for instance the case in unit A of example 2, in which a kind of summary of the document is made available to the reader. The contents of this summary are neither related to the rest of the text in the RST structure by means of the summary relation, nor are they explicitly represented as a coherent text unit, in which case the unit would be either a satellite or a nucleus. Our hypothesis is rather that this unit is composed of the most important statements on the central theme. These statements can be found by gathering branches in the RST structure until a certain volume has been reached. These branches must not be too deeply embedded, which means that the root of the branch has to be directly connected on a high level to the root nucleus, which may be included as well. Another example is the events segment, which expresses a form of additional information on the central theme. Our hypothesis is that if such information has only a limited outflow (no or little other information is dependent on it) or if it has a distinct and/or regular structure, it is a likely candidate for a separate layout item. It is to be expected that a large part of the layout goals can be inferred from RST by means of a structural analysis. Additional information is required to determine the layout goals completely and thus enable the transformation of the RST structure into the layout structure. For instance, RST ignores headlines completely, whereas they are essential in the layout structure. To overcome this problem, headlines should be generated from suitable text elements identified by the RST structure. Theme analysis could play an important role in this headline generation. It can also be helpful in another field: thematic dependencies can constrain the reading order of elements and thus determine layout goals.
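The branch-gathering hypothesis for the summary unit can be sketched as a small selection procedure. Everything beyond the hypothesis itself is an assumption here: the `Branch` record, the depth values assigned to the example sentences, the character-based volume limit, and the decision to skip a branch that would exceed the limit rather than stop.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Branch:
    """A branch of the RST structure, reduced to what the heuristic needs."""
    label: str
    depth: int  # relations between the branch root and the root nucleus
    text: str


def select_summary(branches: List[Branch], max_depth: int, max_chars: int) -> List[Branch]:
    """Gather shallow branches, shallowest first, up to a volume limit."""
    chosen: List[Branch] = []
    used = 0
    for branch in sorted(branches, key=lambda b: b.depth):
        if branch.depth > max_depth:
            break  # all remaining branches are embedded too deeply
        if used + len(branch.text) > max_chars:
            continue  # would exceed the volume; try the next branch
        chosen.append(branch)
        used += len(branch.text)
    return chosen


# Sentences from the appendix; the depth values are hypothetical.
BRANCHES = [
    Branch("1", 1, "Among the Swedes it is the most popular and best-known branch of sport."),
    Branch("2", 0, "We are talking about Unihoc, also called Floorball or Indoor Bandy."),
    Branch("3", 1, "This mixture of hockey and ice hockey is attracting ever more supporters."),
    Branch("9", 3, "Stopping the ball with the stick and the foot is allowed, as well as playing via the board."),
]
```

Calling `select_summary(BRANCHES, max_depth=1, max_chars=1000)` keeps the three shallow sentences and drops the deeply embedded rule; lowering `max_chars` shortens the summary further.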

Realisation of the Layout Goals

After having treated the motivation of the layout goals, we now turn to a more concrete level, their actualisation. The resources generally used to realize the layout goals are the distribution of the elements on the page and the use of various text attributes and formatting parameters (type face, type size, weight, justification: left, right, middle, flush setting, line space etc.). The use of all text attributes and some formatting parameters leads to a certain appearance of the type, which in terms of visual perception may be called texture. The texture of an element is the most important means to realize type distinction. Grouping and reading order are mainly realized by purposively positioning the elements. However, there is no one-to-one mapping between the individual layout resources and the layout goals: a layout goal can generally be realized by multiple resources; most resources can be used for the realization of multiple goals. A more extensive enumeration of the different resources and their appropriateness to communicate grouping, reading order and type similarity is given in table 2.

             constraint  group  order  type sim.
              closeness  +++
              alignment  +++    +      +
             same width  +             ++
            same height  +             ++
             same shape  +      -      ++
           same texture  ++     --     +++
         equally marked  +      --     ++
           before/after  -      +++
                overlap  ++     ++     -
             same layer  +      --     +
              same area  +++           +++

^{Table 2: Use of the graphical and typographical resources for the different layout goals}
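To show how table 2 could feed an automatic process, the sketch below transcribes it into a lookup table and ranks the resources for a given layout goal. The numeric encoding ("+++" as 3 down to a double minus as -2, blank as 0) is an assumption, as is reading the spaced-out minus signs in the order column as a double minus; the function name is hypothetical.

```python
from typing import Dict, List

# Ratings transcribed from table 2 under the assumed numeric encoding.
SUITABILITY: Dict[str, Dict[str, int]] = {
    "closeness":      {"group": 3,  "order": 0,  "type": 0},
    "alignment":      {"group": 3,  "order": 1,  "type": 1},
    "same width":     {"group": 1,  "order": 0,  "type": 2},
    "same height":    {"group": 1,  "order": 0,  "type": 2},
    "same shape":     {"group": 1,  "order": -1, "type": 2},
    "same texture":   {"group": 2,  "order": -2, "type": 3},
    "equally marked": {"group": 1,  "order": -2, "type": 2},
    "before/after":   {"group": -1, "order": 3,  "type": 0},
    "overlap":        {"group": 2,  "order": 2,  "type": -1},
    "same layer":     {"group": 1,  "order": -2, "type": 1},
    "same area":      {"group": 3,  "order": 0,  "type": 3},
}


def rank_resources(goal: str) -> List[str]:
    """Resources with a positive rating for the goal, best first.

    Ties keep the row order of table 2 because Python's sort is stable.
    """
    rated = [(scores[goal], name) for name, scores in SUITABILITY.items()
             if scores[goal] > 0]
    return [name for _, name in sorted(rated, key=lambda pair: -pair[0])]
```

Such a ranking only covers one goal at a time. The negative entries, for example texture with respect to reading order, are the cases where realising one goal may interfere with another.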

   The interdependence of the layout goals as well as the overlaps in the assignment of layout goals to graphical resources indicate that normally many different layouts will exist which are functionally similar and that any process of automatically transforming the layout goals into an actual layout is very complex.
   Some characteristics and components of such a process can already be identified. It will have to take into account the fact that layout is not solely determined by constraints as regards the content to be communicated to the reader, the so-called communicative goal, but also by constraints concerning the content-independent outward appearance (the presentational goals). For instance, a magazine can have its own typical "look" fixed in style guides, which may constrain the choice of type faces, colours, etc. In most cases, including most magazine articles, the construction of layout goals and the search for the best realisation only has to be done on a relatively high level. The fact is, the layout goals and means for the realisation of elements smaller than those identified in the LS - e.g. paragraphs, lines, words, letters - are largely determined by the conventions of microtypography (Hochuli 1987).
   One component for automatically realising layouts will most likely be a resource allocation process that solves the conflicts among the layout goals competing for resources and finds an optimal combination. A further special problem arises from the fact that, although many of the resources from table 2 constrain positioning, they do not determine it completely.
   In the area of page layout a number of specific layout strategies are commonly used, each offering a mechanism for the concrete spatial distribution of the layout elements. These strategies range from "free" positioning (employed in our example 2) through grid-based positioning to the linear filling of columns (used in example layout 1). In most strategies, the spatial distribution mechanism brings with it a preselection from the graphical resources. For example, the only resources that the filling of columns leaves to pick from are texture, marks and colored areas. All positioning-related resources are either occupied by that strategy or incompatible with it.
   Two international standards in the area of layout, FOSI (1990) and DSSSL (1994), are based on the column-filling strategy. They both provide a language to model the behaviour of SGML-marked text in all possible situations that may occur during the filling of columns. The only work known to us that takes into account other spatial distribution strategies is that of Weitzman and Wittenburg (1994). Here, the distribution strategy for each layout is given through the set of constraints - somewhat more concrete than our resources - that is applied to the elements in the given case. The application of the constraints is driven by manually generated rules; their realisation is done by a constraint-solving process.

Summary and Conclusions

In this paper we have investigated both semantic and visual aspects of layout. In connection with these aspects three levels (RST, LS and graphical layout) have been introduced, which can be regarded as intermediate steps in the transition from a semantic to a visual representation of documents. Since our goal is the automatic generation of documents it will be necessary to develop a process which performs the transition from one level into the next, i.e. the transformation from RST to LS as well as the conversion of LS into a graphical layout. This forms the main direction of our current work.

Bibliography

[Asher 1993]

Nicholas Asher. Reference to abstract objects in discourse. Kluwer Academic Publishers, Dordrecht, 1993.

[Arens & Hovy 1990]

Y. Arens & E. H. Hovy. How to describe what? Towards a theory of modality utilization. The Twelfth Annual Conference of the Cognitive Science Society. Lawrence Erlbaum Associates, Cambridge, MA, 1990, pp. 487 - 494.

[Bateman and Rondhuis 1994]

John Bateman and Klaas Jan Rondhuis. Coherence Relations: Analysis and Specification. Deliverable R1.1.2:a,b. ESPRIT-project EP6665 DANDELION.

[Dalianis & Hovy, to appear]

Hercules Dalianis and Eduard Hovy. Aggregation in Natural Language Generation. To appear in: Trends in Natural Language Generation: An Artificial Intelligence Perspective, Springer, forthcoming 1995.

[DSSSL 1994]

Information technology - Text and office systems - Document Style Semantics and Specification Language (DSSSL). ISO/IEC DIS 10179.2, International Organization for Standardization, 1994.

[FIT FOR FUN 1995]

FIT FOR FUN, Deutschlands großes Aktiv-Magazin, FIT FOR FUN Verlag, No. 5, 1995

[FOSI 1990]

Markup Requirements and Generic Style Specification for Electronic Printed Output and Exchange of Text (contains section on Formatting Output Specification Instance - FOSI), Military Specification MIL-M-28001A. CALS Policy Office, 1990.

[Halliday and Hasan 1976]

Michael A.K. Halliday and Ruqaiya Hasan. Cohesion in English. Longman, London, 1976.

[Hochuli 1987]

J. Hochuli. Das Detail in der Typografie. Compugraphic, Wilmington (Mass.), 1987

[Mann and Thompson 1987]

William C. Mann and Sandra A. Thompson. Rhetorical structure theory: a theory of text organization. Technical Report RS-87-190, USC/Information Sciences Institute, 1987. Reprint series.

[Martin 1992]

James R. Martin. English text: systems and structure. Benjamins, Amsterdam, 1992.

[Norrish 1987]

Patricia Norrish. The graphic translatability of text. British Library Board and University of Reading, 1987

[Reichenberger et al. 1995]

Klaus Reichenberger, Thomas Kamps and Gene Golovchinsky. Towards a Generative Theory of Diagram Design. To appear in: Proceedings of InfoVis '95, October 1995, Atlanta, Georgia.

[Reichman 1985]

Rachel Reichman. Making computers speak like you and me. The M.I.T. Press, Cambridge, MA, 1985.

[Sarkar and Boyer 1993]

S. Sarkar and K. L. Boyer. Perceptual Organisation in Computer Vision: A Review and a Proposal for a Classificatory Structure. In: IEEE Trans. on Systems, Man and Cybernetics, Vol. 23, No. 2, pp. 382-399, 1993.

[Sidner 1981]

Candace L. Sidner. Focusing for interpretation of pronouns. In: American Journal of Computational Linguistics, number 4, volume 7, pages 217-231.

[Weitzman and Wittenburg 1994]

Louis Weitzman and Kent Wittenburg. Automatic Presentation of Multimedia Documents Using Relational Grammars. In: Proceedings of ACM Multimedia '94, San Francisco, CA, USA, 1994, pp. 443-451.

Appendix: The example text

[1] Among the Swedes it is the most popular and best-known branch of sport. [2] We are talking about Unihoc, also called Floorball or Indoor Bandy. [3] This mixture of hockey and ice hockey is attracting ever more supporters. [4a] Since the middle of the eighties, this dynamic team sport has also been played in Germany and [4b] the step to becoming a school sport is imminent. [5] Unihoc can be played in the gym as well as outside, on grass or ice. [6a] Because the ball can be played with both sides of the stick, [6b] it is much easier to master than normal hockey. [7] One can continue playing behind the goal (four metres up to the board) and [8] there is no offside rule. [9] Stopping the ball with the stick and the foot is allowed, as well as playing via the board. [10] Not allowed is raising the blade of the stick above knee height, or lifting, hitting and holding the opponent's stick. [11] Nor is it allowed to enter the goal area, to play the ball while lying or kneeling, to move the stick around between the legs of the opponent, and to engage in hard body contact. [12] Unihoc allows many alternatives in how it is played. [13] One possibility: [14] each team has six players, and no goalie. [15a] In front of the goal there is a no-go area - [15b] no players are allowed within a semicircle of almost 2 meters radius. [16] The second alternative requires more tactical insight: [17] here there are 6 players per team, plus a goalie. [18] Each receives a clear function, which determines their effective playing area. [19a] The two defenders may only act within their own half; [19b] in contrast, the two attackers may only play within their opponents' half. [20] Only the midfield players can have their fling on the entire playing field. [21] In this exciting variant solo artists have no chance. [22] Seeing the free team member and passing the ball to him is essential. [23] Two variants have become dominant. 
[24] On the large field (forty metres long, twenty metres wide) two six-man teams with fixed goalie oppose each other (playing time: two times twenty minutes). [25a] A board keeps the ball continuously in play; [25b] rest periods hardly ever occur. [26] As in ice hockey, a player substituting another does not lead to an interruption of play (up to eight substitute players per team). [27] This is the variant that is used in international matches. [28] The German Unihoc Union (0421/4984255) frequently goes back to the small field variant: [29] where mixed 4-man teams play without a goalie. [30] The playing field is only thirty metres long and sixteen metres wide, while the playing time is halved. [31] The goals are also smaller than on the large field. [32] The goals (60 x 90 centimeters) are collapsible. [33] In the gym light holed plastic balls (20 grams, 8 centimeter diameter) are used. [34] The sticks (Kevlar 95 Mark, plastic 10 Mark) are 100 to 120 centimeters in length. [35] Complete sets of Unihoc equipment cost around 450 Marks. [36] Info: 05357/18181. [37] 05.-07.5., Düsseldorf, [38] 09.-11.6., Clausthal-Zellerfeld, [39] 16.-18.6., München, [40] 23.-25.6., Halle/Saale, [41] 03.-05.11., Bremen, [42] 10.-21.11., Göteborg, [43] 17.-19.11., Bremen, German Championships. [44] Further information: 0421/23 94 01