How do visual and multimodal texts construct meaning through image, layout and design?

Analyse how visual and multimodal texts construct perspectives and position viewers through visual and design choices

A focused answer to the WACE Year 12 English Unit 4 dot point on visual and multimodal texts. It covers how framing, salience, gaze, colour and layout carry meaning, how word and image interact, and how to write visual analysis with the same rigour as language analysis.

Generated by Claude Opus 4.8 · 6 min answer · Updated 2026-06-02

Reviewed by: AI editorial process; not yet individually human-reviewed

What this dot point is asking

WACE English includes visual and multimodal texts, and the Comprehending section regularly presents images, advertisements, cartoons or texts that combine words and pictures. Students who treat the image as a backdrop to the words lose marks. This dot point asks you to read visual and design choices analytically, applying the same discipline of naming a feature and arguing its effect that you use on prose, and to understand how the different modes in a multimodal text work together.

Visual texts have their own metalanguage

Just as written texts have syntax and diction, visual texts have a vocabulary of choices you can name precisely.

Framing and composition: what is included, what is cropped out, and how the elements are arranged.
Salience: what draws the eye first, through size, contrast or placement.
Gaze and angle: where a depicted figure looks, and whether the viewer looks up at, down on, or level with the subject.
Colour: the palette and its connotations, including warmth, coldness and saturation.
Layout: in texts with words and images, how the two are arranged and which leads.

Using this vocabulary accurately lets you argue effect rather than describe the picture.

A camera angle is a position

Visual choices construct perspective in the literal and the analytical sense. A low angle looking up at a figure positions the viewer to feel small before them, lending the subject power. A high angle looking down does the reverse. A direct gaze meeting the viewer demands engagement; an averted gaze invites the viewer to observe unseen. None of this is neutral. Each choice positions the viewer to feel and judge in a particular way, and naming the choice is the start of the analysis.

Model analytical paragraph on a visual text

Reading how an image positions a viewer

The charity advertisement positions the viewer through a single dominant choice, the child's direct gaze meeting the camera at the viewer's own eye level. By refusing the high angle that would invite pity from above, the image instead demands engagement as an equal, so the viewer is positioned not to feel sorry for the child but to feel addressed by them. Salience reinforces the demand: the child occupies the upper third in sharp focus while the background blurs into a desaturated grey, stripping away context so nothing competes for the eye. The warm tone reserved solely for the child's face, set against that cold surround, constructs the figure as the only source of life in the frame. The composition therefore argues a perspective, that this is a person who looks back rather than a victim to be observed, and the viewer is positioned to respond to a relationship rather than a scene.

The paragraph names visual features with accurate metalanguage and argues the position each constructs, treating the image exactly as it would treat a written text.

Multimodal texts coordinate their modes

In a text that combines words and images, the meaning lives partly in how the modes interact. Words can anchor an ambiguous image toward one meaning, an image can undercut or ironise the words above it, and layout decides which mode the reader meets first. Analysing a multimodal text means reading the relationship between modes, not just each mode alone.

A reliable analytical frame

Build the point around this chain: the visual choice of [framing, salience, gaze, colour or layout] positions the viewer to [response] by [how], constructing a perspective in which [view]. The frame keeps your visual analysis as rigorous as your language analysis.

How this maps to the exam

The Comprehending section regularly includes a visual or multimodal text, often paired with a written one, and the marks reward genuine visual analysis rather than description. Reading the relationship between modes is also useful in Responding when studied texts are films or graphic works, where image and word are inseparable.

Exam-style practice questions

Practice questions written in the style of SCSA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

WACE 2022 · 7 marks · Section One (Comprehending). Analyse how the visual text positions the viewer through its design choices. Refer to specific visual features.

A 7 mark answer reads the image with the same rigour as language, naming visual choices and arguing the position each constructs.

Plan: identify the dominant visual choice (gaze, salience, framing, colour or angle), then build two or three points that each name a feature and argue its effect on the viewer.

Para 1 (the dominant choice). Name it, for example a direct gaze at eye level, and argue what it demands of the viewer (engagement as an equal rather than pity from above).

Para 2 (reinforcement). Show how salience, colour or composition reinforces the position, attaching effect to each.

For a multimodal text, add a point on how word and image interact (anchoring, undercutting, which leads).

Markers reward analysis of visual choices and penalise description of what the image shows. Use accurate visual metalanguage throughout.

WACE 2023 · 6 marks · Section One (Comprehending). Compare how a written text and an image represent the same subject.

A 6 mark comparison weighs the affordances of each mode against the other rather than describing the two texts in turn.

Plan: name the shared subject, then compare how the written text and the image each represent it, moving between the two within paragraphs.

For the written text, analyse a language choice and its effect; for the image, analyse a visual choice (salience, gaze, framing) and its effect.

Strong move: explain the difference by what each mode is built to do, for example prose interiorising across time versus an image working spatially in a single frame.

Markers reward integrated comparison, accurate metalanguage for both modes, and difference explained by mode rather than asserted.