How do you combine words, images and sound into one purposeful text?
Create a multimodal text that combines modes purposefully to communicate meaning to an audience.
How to create multimodal texts in TCE English: combining written, visual and audio modes with purpose, meeting the work requirement for at least one multimodal text.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this dot point is asking
The course's work requirements specify that you create a range of texts, including at least one multimodal text for assessment. A multimodal text communicates through more than one mode at once: written or spoken words, still or moving images, sound, and design elements such as layout, colour and typography. A podcast with scripted narration and sound design, a short documentary, an annotated photo essay, a designed infographic with explanatory text, and a narrated slide presentation are all multimodal. A common mistake is to treat the extra modes as garnish on a piece of writing. They are part of the meaning.
The central principle is that modes should do different jobs and reinforce one another. If your images merely repeat what your words already say, you have a written text with pictures, not a multimodal one. The strongest multimodal texts let each mode carry what it does best. Words can argue, define and sequence. Images can show, evoke and condense. Sound can set mood, pace and emphasis. When you assign each idea to the mode that conveys it best, the whole becomes more than the sum of its parts.
Designing for audience and purpose
Every choice in a multimodal text should answer to audience and purpose, just as in any composition. Who is this for, and what should it make them think, feel or do? That question governs the tone of your narration, the style of your images, the pace of your edits and the formality of your design. A public health message for teenagers and a reflective documentary for adults will look and sound completely different, and the difference is not accidental; it is designed.
Coherence across modes is the technical challenge. The text must feel like one piece, not separate components stapled together. Visual consistency (such as a repeated colour or motif), a steady narrative voice and a clear structure that the audience can follow all hold a multimodal text together. Transitions matter especially in time-based modes like audio and video, where a clumsy cut can break the audience's attention.
Planning before producing
Resist the urge to open editing software before you have a plan. A storyboard, or a simple plan that maps which mode carries which idea, will save hours and produce a tighter text. For a podcast, draft the script and mark where sound will enter. For a video, storyboard the shots against the narration. For an infographic, sketch the visual hierarchy before choosing colours. The planning is where the design thinking happens; the software just executes it.
When you submit a multimodal text, be ready to explain your design choices if asked, since understanding why you combined the modes as you did is part of demonstrating control over the form.