ExamExplained
NSW · Software Engineering

Inquiry Question 1: How are large-scale software solutions developed and managed?

Describe testing strategies, including unit testing, integration testing, system testing and user acceptance testing

A focused answer to the HSC Software Engineering Module 4 dot point on testing. Unit, integration, system, UAT, the test pyramid, test-driven development, the worked Python example, and the traps markers look for.

Reviewed by: AI editorial process; not yet individually human-reviewed


What this dot point is asking

NESA wants you to distinguish the testing strategies that operate at different scales of a system, identify their purpose, and give a concrete example of each.

The answer

The test pyramid

The conventional model: many cheap tests at the base, fewer expensive tests at the top.

[Figure: The test pyramid. Four horizontal bands, widest at the base: unit tests (milliseconds each), integration tests (~1-2 seconds each), end-to-end system tests (tens of seconds each), and UAT at the narrow top (minutes to a release cycle). Cost and time per test increase moving up; the number of tests decreases moving up.]

An inverted pyramid (many end-to-end, few unit tests) is slow, flaky and hard to debug.

Unit testing

Test one function or class in isolation. Dependencies (database, network, file system) are replaced with mocks or stubs.

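# code under test
def calculate_gst(price):
    # price is GST-inclusive, so the GST component is one eleventh of it
    return round(price - price / 1.1, 2)

# unit tests (pytest collects functions named test_*, so no import is needed)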

def test_calculate_gst_basic():
    assert calculate_gst(11.0) == 1.0

def test_calculate_gst_zero():
    assert calculate_gst(0.0) == 0.0

def test_calculate_gst_rounds_to_two_decimals():
    assert calculate_gst(10.99) == 1.0

Properties: fast, deterministic, run on every commit, locate bugs precisely.
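The calculate_gst tests above have no dependencies to replace. As a minimal sketch of the "mocks or stubs" idea, the example below passes a stand-in for a database price lookup into the function under test. The names order_total and price_lookup are hypothetical and used only for illustration; Mock comes from Python's standard unittest.mock module.

```python
from unittest.mock import Mock


def order_total(product_id, qty, price_lookup):
    """Return the total for qty units; price_lookup stands in for a database call."""
    unit_price = price_lookup(product_id)
    return round(unit_price * qty, 2)


def test_order_total_uses_stubbed_price():
    # Stub: returns a canned price, so no real database is touched.
    fake_lookup = Mock(return_value=11.0)

    assert order_total(product_id=7, qty=2, price_lookup=fake_lookup) == 22.0
    # Mock check: the dependency was called exactly once, with the product id.
    fake_lookup.assert_called_once_with(7)
```

Passing the dependency in as a parameter is one design choice that makes the replacement easy; patching a module-level function is another common approach.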

Integration testing

Test how components work together, including real or test-instance external services (database, message queue).

def test_create_order_integration(test_db):
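    # Real test database, populated with a test user.
    # `client` is assumed to be the app's test client (for example a Flask test client), created elsewhere.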
    response = client.post(
        "/api/orders",
        json={"product_id": 7, "qty": 2},
        headers={"Authorization": "Bearer test-token"},
    )
    assert response.status_code == 201
    order = test_db.execute("SELECT * FROM orders WHERE id = ?", (response.json["id"],)).fetchone()
    assert order is not None
    assert order["product_id"] == 7
    items = test_db.execute("SELECT * FROM order_items WHERE order_id = ?", (order["id"],)).fetchall()
    assert len(items) == 1

Properties: slower than unit tests (seconds), catch issues that arise at boundaries (SQL errors, contract mismatches, transaction handling).
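The test above depends on a web app and fixtures that are not shown. A self-contained sketch of the same idea, assuming an in-memory SQLite database stands in for the test database, might look like the following. The create_order function and the two-table schema are hypothetical, chosen to mirror the orders and order_items tables queried above.

```python
import sqlite3


def make_test_db():
    """Create a fresh in-memory database with a minimal, assumed schema."""
    db = sqlite3.connect(":memory:")
    db.row_factory = sqlite3.Row
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, product_id INTEGER NOT NULL)")
    db.execute(
        "CREATE TABLE order_items ("
        "order_id INTEGER NOT NULL, product_id INTEGER NOT NULL, qty INTEGER NOT NULL)"
    )
    return db


def create_order(db, product_id, qty):
    """Write an order row and its line item, and return the new order id."""
    with db:  # commits on success, rolls back if an exception is raised
        cur = db.execute("INSERT INTO orders (product_id) VALUES (?)", (product_id,))
        order_id = cur.lastrowid
        db.execute(
            "INSERT INTO order_items (order_id, product_id, qty) VALUES (?, ?, ?)",
            (order_id, product_id, qty),
        )
    return order_id


def test_create_order_writes_both_rows():
    db = make_test_db()
    order_id = create_order(db, product_id=7, qty=2)

    # Real SQL runs against a real (if temporary) database engine.
    order = db.execute("SELECT * FROM orders WHERE id = ?", (order_id,)).fetchone()
    items = db.execute(
        "SELECT * FROM order_items WHERE order_id = ?", (order_id,)
    ).fetchall()
    assert order["product_id"] == 7
    assert len(items) == 1 and items[0]["qty"] == 2
```

Unlike the unit test, nothing here is mocked: a typo in the SQL or a mismatch between the code and the schema would make this test fail.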

System (end-to-end) testing

Test the whole application from outside, typically through the UI or public API, against a deployed environment.

import { test, expect } from "@playwright/test";

test("user can complete a purchase", async ({ page }) => {
  await page.goto("/");
  await page.getByRole("button", { name: "Sign in" }).click();
  await page.getByLabel("Email").fill("test@example.com");
  await page.getByLabel("Password").fill("test-password");
  await page.getByRole("button", { name: "Log in" }).click();

  await page.getByRole("link", { name: "Mechanical keyboard" }).click();
  await page.getByRole("button", { name: "Add to cart" }).click();
  await page.getByRole("link", { name: "Checkout" }).click();
  await page.getByRole("button", { name: "Pay now" }).click();

  await expect(page.getByText("Thank you for your order")).toBeVisible();
});

Properties: slow (tens of seconds per test), flaky (real browser, real network), catch issues no other layer can.

User acceptance testing (UAT)

The product is exercised by real users (or business stakeholders standing in for them) against acceptance criteria from the original brief. Driven by humans, not automation.

A typical UAT scenario:

  • Acceptance criteria: "A merchandiser can add a new promotional banner to the home page that disappears after the promotion end date."
  • Tester: the head of merchandising.
  • Pass criteria: they can complete the task without developer help, and the banner behaves as documented.

UAT happens after development, before release. Confirms the system meets the business needs, not just the technical specification.

Test-driven development (TDD)

Write the test first, watch it fail, write the code to make it pass, then refactor. Cycle:

  1. Red: write a failing test.
  2. Green: write the simplest code that passes.
  3. Refactor: clean up the code while tests stay green.

TDD produces a comprehensive test suite as a side effect, encourages small focused units, and surfaces design issues early.
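One pass through the cycle can be sketched as follows, using a calculate_late_fee(days_overdue) function. The fee rule (50 cents per overdue day, capped at $10) is an assumption made up for this illustration.

```python
# Step 1 (red): the tests are written first. Run before calculate_late_fee
# exists, each of them fails with a NameError.
def test_no_fee_when_not_overdue():
    assert calculate_late_fee(0) == 0.0


def test_fee_is_fifty_cents_per_day():
    assert calculate_late_fee(3) == 1.5


def test_fee_is_capped():
    assert calculate_late_fee(100) == 10.0


# Step 2 (green) and step 3 (refactor): the simplest code that passes, then
# tidied so the magic numbers become named constants. The tests do not change.
DAILY_FEE = 0.50  # assumed rate, for illustration only
MAX_FEE = 10.00   # assumed cap, for illustration only


def calculate_late_fee(days_overdue):
    return min(max(days_overdue, 0) * DAILY_FEE, MAX_FEE)
```

The tests sit above the implementation here only to show the order in which they were written; Python resolves the function name when each test is called, not when it is defined.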

Other test types

  • Regression testing: rerun existing tests after a change to confirm nothing was broken. Usually automated.
  • Performance testing: measure response time and throughput under load.
  • Security testing: SAST, DAST, penetration testing (see secure-development-lifecycle).
  • Smoke testing: a quick check after deployment that the basics work.
  • Property-based testing: generate random inputs and assert properties (rather than checking specific cases).
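As a rough, hand-rolled sketch of the property-based idea, the test below generates random prices and checks properties that should hold for every input, rather than a handful of chosen cases. It uses only the standard random module; dedicated libraries such as Hypothesis automate the generation and report a minimal failing input. The two properties chosen are assumptions about what calculate_gst should guarantee.

```python
import random


def calculate_gst(price):
    # Same function as in the unit testing section: price is GST-inclusive.
    return round(price - price / 1.1, 2)


def test_gst_properties(trials=1000, seed=0):
    rng = random.Random(seed)  # fixed seed keeps the run repeatable
    for _ in range(trials):
        price = round(rng.uniform(0, 10_000), 2)
        gst = calculate_gst(price)
        # Property 1: GST is never negative and never exceeds the price.
        assert 0 <= gst <= price
        # Property 2: GST is one eleventh of the price, within rounding error.
        assert abs(gst - price / 11) <= 0.005 + 1e-9
```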

Tooling

| Language | Unit | Integration | E2E |
| --- | --- | --- | --- |
| Python | pytest | pytest with fixtures | Playwright, Selenium |
| JavaScript | Vitest, Jest | Vitest, Jest | Playwright, Cypress |
| Java | JUnit | JUnit + Testcontainers | Selenium |

Continuous testing

Tests run on every commit in CI. Failed tests block merging. This is what makes continuous integration work.

Exam-style practice questions

Practice questions written in the style of NESA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

2024 HSC · 6 marks
Distinguish between unit, integration, system and user acceptance testing. Give an example of each in the context of an online shopping site.

Unit testing tests one function or class in isolation. Dependencies are mocked or stubbed. Fast, runs in milliseconds. Example: a calculate_gst(price) function is tested to return 0.10 for an input of 1.10 (10 percent GST). The test does not touch the database.

Integration testing tests how units work together. Components and external services (database, message queue) are real or close-to-real. Example: the checkout handler is tested against a real test database to confirm an order row, an order_items row, and a payment row are all written correctly when a checkout request comes in.

System testing tests the application as a whole, end-to-end, in an environment that resembles production. Example: a Playwright test opens the home page, logs in, adds an item to the cart, completes checkout and checks that the confirmation email is received - all through a real browser against a deployed staging copy.

User acceptance testing (UAT) is the human-driven test that the product meets business needs. Real users (or business stakeholders standing in for them) exercise the system against acceptance criteria. Example: the merchandising manager checks that the new "shop by occasion" navigation works as expected, finds the right products, and matches the agreed brief.

The four layers form a pyramid: many unit tests at the base, fewer integration tests above, fewer system tests above that, and selective UAT at the top. Markers reward all four levels named correctly, the right scope at each level (isolation, components together, end-to-end, business), and a concrete shopping-site example for each.

Practice questions

Original practice questions graded from foundation to exam level, each with a full worked solution. Try them before revealing the solution.

foundation · 3 marks
A developer writes a test for a `calculate_late_fee(days_overdue)` function using a fake, hard-coded due date and no database connection. Identify the test level and justify your answer.

Unit test. The function is tested in isolation, with no real database or external dependency involved, matching the definition of a unit test (one function tested alone, dependencies replaced with fakes).

Marking criteria: 1 mark for correctly naming "unit test", 2 marks for justification referencing isolation and the absence of real external dependencies (not just "it tests one function").

foundation · 3 marks
List the four levels of the test pyramid from base to top, and state whether the number of tests at each level typically increases or decreases moving up the pyramid.

Unit tests, integration tests, system (end-to-end) tests, user acceptance testing (UAT), from base to top.

The number of tests DECREASES moving up the pyramid: many fast unit tests at the base, progressively fewer, slower and more expensive tests at each level above.

Marking criteria: 1 mark for the four levels in the correct order, 2 marks for correctly identifying and explaining the decreasing pattern.

core · 4 marks
A CI pipeline log shows the following run times for a school library app: 420 unit tests in 9 seconds, 40 integration tests in 70 seconds, 12 end-to-end tests in 4 minutes. (a) Calculate the average run time per test at each level, in milliseconds or seconds as appropriate. (b) Explain what this data confirms about the test pyramid shape.

(a) Average time per test.

$$\text{unit: } \frac{9\ \text{s}}{420} \approx 0.021\ \text{s} \approx 21\ \text{ms per test}$$

$$\text{integration: } \frac{70\ \text{s}}{40} = 1.75\ \text{s per test}$$

$$\text{end-to-end: } \frac{240\ \text{s}}{12} = 20\ \text{s per test}$$

(b) What this confirms. The cost per test rises steeply at each level up the pyramid: roughly 80 times more from unit to integration (about 21 ms to 1.75 s) and roughly 11 times more from integration to end-to-end (1.75 s to 20 s). That is why the pyramid puts the most tests at the cheapest level (unit) and the fewest at the most expensive level (end-to-end); if the counts were reversed, the same suite would take vastly longer to run on every commit.
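The arithmetic can be checked with a short script. It also estimates the "counts reversed" case, under the assumption that the per-test cost at each level stays the same while the unit and end-to-end counts are swapped.

```python
# (test count, total run time in seconds) from the CI log in the question
runs = {
    "unit": (420, 9),
    "integration": (40, 70),
    "end-to-end": (12, 240),
}

# Average seconds per test at each level
per_test = {level: total / count for level, (count, total) in runs.items()}

actual_total = sum(total for _, total in runs.values())

# Hypothetical inverted pyramid: same per-test costs, counts swapped so that
# end-to-end tests are the most numerous.
inverted_counts = {"unit": 12, "integration": 40, "end-to-end": 420}
inverted_total = sum(per_test[level] * n for level, n in inverted_counts.items())

for level, seconds in per_test.items():
    print(f"{level}: {seconds:.3f} s per test")
print(f"actual suite: {actual_total} s")
print(f"inverted suite: {inverted_total:.0f} s")
```

Under that assumption the suite grows from 319 seconds (a little over 5 minutes) to about 8,470 seconds (roughly 2 hours 21 minutes) per run.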

Marking criteria: 1 mark for correct unit test average, 1 mark for correct integration average, 1 mark for correct end-to-end average, 1 mark for linking the increasing per-test cost to the reason for the pyramid's shape.

core · 4 marks
A team removes all integration tests, keeping only unit tests and end-to-end tests, believing the two extremes cover everything in between. Explain why this leaves a gap, using the shopping-site checkout example.

Unit tests confirm that individual functions, such as calculate_gst(price), work correctly in isolation. End-to-end tests confirm that the whole checkout flow works when driven through a real browser. Neither level confirms that the CHECKOUT HANDLER correctly writes an order row, an order_items row and a payment row together to a real database in one transaction; a bug in how those three writes interact (for example, a payment row created without a matching order row if the transaction is not atomic) could pass unit tests (each write function works alone) and even pass a happy-path end-to-end test, yet corrupt data under load or on a network blip.

Integration tests specifically target this boundary between components, which is why removing them leaves a real gap even with both extremes covered.
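A boundary bug of this kind can be sketched with an in-memory SQLite database. The schema and both checkout functions are hypothetical, and the failure is simulated with a payment amount of 0 that violates a CHECK constraint. In this simplified version the orphan is an order row with no payment row, the mirror image of the example above, but the cause is the same: the writes are not in one transaction.

```python
import sqlite3


def make_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, product_id INTEGER NOT NULL)")
    db.execute(
        "CREATE TABLE payments ("
        "order_id INTEGER NOT NULL, amount REAL NOT NULL CHECK (amount > 0))"
    )
    return db


def count_rows(db, table):
    return db.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


def checkout_non_atomic(db, product_id, amount):
    # Bug: each write is committed on its own, so a failure between them
    # leaves the first write in place.
    cur = db.execute("INSERT INTO orders (product_id) VALUES (?)", (product_id,))
    db.commit()
    db.execute("INSERT INTO payments (order_id, amount) VALUES (?, ?)", (cur.lastrowid, amount))
    db.commit()


def checkout_atomic(db, product_id, amount):
    with db:  # both writes commit together, or neither does
        cur = db.execute("INSERT INTO orders (product_id) VALUES (?)", (product_id,))
        db.execute(
            "INSERT INTO payments (order_id, amount) VALUES (?, ?)", (cur.lastrowid, amount)
        )


def test_non_atomic_leaves_orphan_order():
    db = make_db()
    try:
        checkout_non_atomic(db, product_id=7, amount=0)  # payment write fails
    except sqlite3.IntegrityError:
        pass
    assert count_rows(db, "orders") == 1 and count_rows(db, "payments") == 0


def test_atomic_rolls_back_everything():
    db = make_db()
    try:
        checkout_atomic(db, product_id=7, amount=0)  # payment write fails
    except sqlite3.IntegrityError:
        pass
    assert count_rows(db, "orders") == 0 and count_rows(db, "payments") == 0
```

Each insert statement works on its own, so unit tests with a mocked database would pass for both versions; only a test that runs against a real database engine can see the orphaned row.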

Marking criteria: 1 mark for stating what unit tests do confirm, 1 mark for stating what end-to-end tests do confirm, 2 marks for a concrete example of a boundary/interaction bug that only an integration test would catch.

exam · 6 marks
A start-up ships a new feature with only end-to-end tests, arguing this gives 'real' confidence because it tests exactly what the user experiences. Evaluate this testing strategy for a fast-moving product team.

This is a 6-mark EVALUATE: markers reward a judgement backed by contrasted evidence, not a one-sided description.

Plan. Thesis: an end-to-end-only strategy gives high-fidelity confidence per test but is a poor overall strategy for a fast-moving team, because it trades away speed, cost and precise fault localisation, which matter more when the team is shipping frequently.

Model paragraph. It is true that end-to-end tests give the highest-fidelity signal of the four levels, since they exercise the real UI, real network calls and a deployed environment exactly as a user would encounter them, catching integration and configuration issues that a unit test mocking every dependency would miss entirely. However, for a fast-moving team this benefit is outweighed by three costs. First, speed: end-to-end tests run in tens of seconds each rather than milliseconds, so a suite large enough to cover a growing feature set would soon take many minutes per run, slowing every commit and discouraging developers from running tests locally. Second, flakiness: real browsers and real network calls introduce timing-dependent failures unrelated to the code itself, eroding trust in the suite until failures are ignored, which can be more dangerous than having no tests at all because the suite still appears to offer protection. Third, and most costly for a fast team, fault localisation: when an end-to-end test fails, it only reports that "checkout is broken" somewhere across dozens of components, whereas a failing unit test points to the exact function at fault, letting a developer fix the bug in minutes rather than hours of manual investigation. The better strategy follows the test pyramid: keep a small number of end-to-end tests for the most critical user journeys and rely on unit and integration tests, run on every commit, for the bulk of coverage. This gives both speed and the boundary/interaction confidence that unit tests alone cannot provide.

Marker's note: top-band answers (1) acknowledge the genuine strength of end-to-end tests rather than dismissing them, (2) name at least two concrete costs (speed, flakiness, fault localisation) rather than a vague "it's slow", (3) reference the test pyramid concept explicitly, and (4) close with an explicit recommendation, not a neutral summary.

exam · 5 marks
A team practises strict test-driven development (TDD) for a new refund-processing module. Explain how the red-green-refactor cycle would change the order in which the team writes code, and assess one risk of applying TDD to this specific module.

Order change. Under TDD the team writes a FAILING test first, for example test_refund_cannot_exceed_original_payment() asserting the function rejects a refund of $150 against a $100 payment (RED). They then write the simplest implementation that makes this test pass, such as a single comparison and early return (GREEN). Only then do they refactor the implementation for clarity or to remove duplication, re-running the test suite to confirm behaviour is unchanged (REFACTOR). This reverses the usual order of "write the feature, then test it": the test is written before any implementation exists.

Risk for a refund module. Refund logic often has many edge cases (partial refunds, multiple refunds against one payment, currency rounding, already-refunded payments). If the team writes tests for only the cases they think of up front, TDD can create a false sense of completeness: all written tests pass (green), but an untested edge case, such as two concurrent partial refunds together exceeding the original payment, can still slip through, because TDD only guarantees that the tests you wrote pass, not that you wrote the right tests.
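The risk can be illustrated with a deliberately simplified sketch. The Payment class and both refund functions are hypothetical, and the two refunds are applied one after the other rather than concurrently, so this shows the missing test case but does not model the locking a real concurrent system would need.

```python
class Payment:
    def __init__(self, amount):
        self.amount = amount
        self.refunded = 0.0


def refund_naive(payment, amount):
    """Passes the one test written up front: checks a single refund only."""
    if amount > payment.amount:
        raise ValueError("refund exceeds original payment")
    payment.refunded += amount


def refund_checked(payment, amount):
    """Also accounts for refunds already issued against the payment."""
    if payment.refunded + amount > payment.amount:
        raise ValueError("refunds exceed original payment")
    payment.refunded += amount


def rejects(refund, payment, amount):
    """Return True if the refund function raises ValueError for this request."""
    try:
        refund(payment, amount)
    except ValueError:
        return True
    return False


def test_refund_cannot_exceed_original_payment():
    # The test the team wrote first; the naive version is green here.
    assert rejects(refund_naive, Payment(100.0), 150.0)


def test_two_partial_refunds_cannot_exceed_payment():
    # The edge case nobody had written a test for.
    payment = Payment(100.0)
    refund_checked(payment, 60.0)
    assert rejects(refund_checked, payment, 60.0)
```

With refund_naive, two $60 refunds against a $100 payment both succeed and $120 is paid out, even though the original test stays green throughout.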

Marking criteria: 1 mark for describing the reversed order (test before code), 1 mark for a correct, concrete red example, 1 mark for a correct, concrete green/refactor example, 2 marks for a specific, well-explained risk tied to the refund domain (not a generic "tests might be wrong").
