ExamExplained
NSW · Software Engineering

Inquiry Question 1: How are large-scale software solutions developed and managed?

Apply code review and quality practices, including peer review, style guides, linters and static analysis

A focused answer to the HSC Software Engineering Module 4 dot point on code review. It covers pull request reviews, style guides, linters, static analysis, a worked example, and the traps markers look for.

Reviewed by: AI editorial process; not yet individually human-reviewed

What this dot point is asking

NESA wants you to describe how code review works in a software team, the mechanical tools that support it, and how the combination of human review and automated tools produces high-quality code.

The answer

Code review

A peer reviews every change before it is merged. In modern Git workflows, this happens on pull requests:

  1. The author opens a PR.
  2. Reviewers read the diff, leave inline comments, ask questions, request changes.
  3. The author replies or revises. Comments are resolved.
  4. When reviewers approve and automated checks pass, the PR can be merged.
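
The merge rule in step 4 can be sketched as a small predicate. This is only an illustration: `PullRequest`, its fields, and `can_merge` are hypothetical names, not the API of any real Git hosting service.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    """Hypothetical model of a PR's review state (not a real hosting API)."""
    approvals: int = 0
    changes_requested: bool = False
    unresolved_comments: int = 0
    # Maps a check name (for example "lint" or "tests") to whether it passed.
    checks: dict[str, bool] = field(default_factory=dict)


def can_merge(pr: PullRequest, required_approvals: int = 1) -> bool:
    """Return True only when both the human gate and the automated gate pass."""
    human_gate = (
        pr.approvals >= required_approvals
        and not pr.changes_requested
        and pr.unresolved_comments == 0
    )
    # An empty dict means no check has reported yet, so the gate stays closed.
    automated_gate = bool(pr.checks) and all(pr.checks.values())
    return human_gate and automated_gate
```

Treating "no checks reported" as a failure is a design choice in this sketch: it keeps a PR from merging before CI has run at all.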

Figure: A pull request passing through the human review gate and the automated check gate. The author opens a PR, which passes through two parallel gates: a human review gate (a reviewer reads the diff, comments, and approves or requests changes) and an automated check gate (linter, type checker, static analysis, tests). Both gates run for every change and both must pass before it merges into the main branch; if either gate fails, the change goes back to the author for revision, not into main.

What reviewers look for

  • Correctness: does the code do what the PR says it does?
  • Tests: are there tests for the change? Do they cover the edge cases?
  • Security: are inputs validated, outputs encoded, queries parameterised?
  • Maintainability: is the code clear? Are the names good? Is the structure consistent with the rest of the codebase?
  • Design: is this the right approach? Are there simpler alternatives?
  • Documentation: are docs updated? Are non-obvious decisions noted?
  • Performance: any obvious hot spots? N+1 queries? Memory leaks?
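
As one concrete instance of the security item, a reviewer would expect to see a parameterised query rather than SQL built by string concatenation. The sketch below uses Python's built-in `sqlite3` module; the `users` table and the `find_user` helper are made up for illustration.

```python
import sqlite3


def find_user(conn: sqlite3.Connection, username: str) -> list[tuple]:
    """Look up a user with a parameterised query.

    The ? placeholder makes the driver treat username as data, never as SQL.
    """
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (username,)
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

print(find_user(conn, "alice"))        # prints [(1, 'alice')]
# A classic injection payload matches nothing: it is compared as a plain string.
print(find_user(conn, "' OR '1'='1"))  # prints []
```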

Constructive review

Best practice from industry:

  • Comment on the code, not the person. "This function could be clearer", not "you wrote unclear code".
  • Distinguish blocking issues from suggestions. Use prefixes like [blocking] or [nit].
  • Ask questions rather than make pronouncements. "Why this approach?" surfaces assumptions.
  • Approve when ready. Holding approval for minor preferences slows everything down.
  • Authors: take feedback seriously, push back when warranted, do not take it personally.
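
If a team adopts the [blocking] and [nit] prefixes, a small script can sort comments by severity. This is a hypothetical helper, assuming the prefix always appears at the start of the comment; it is not a feature of any particular review tool.

```python
import re

# Matches a leading [blocking] or [nit] tag, in any letter case.
PREFIX = re.compile(r"^\[(blocking|nit)\]\s*", re.IGNORECASE)


def classify(comment: str) -> tuple[str, str]:
    """Return (kind, text); comments without a known prefix are 'unlabelled'."""
    match = PREFIX.match(comment)
    if match is None:
        return ("unlabelled", comment.strip())
    return (match.group(1).lower(), comment[match.end():].strip())


def blocks_merge(comments: list[str]) -> bool:
    """True if at least one comment is marked as blocking."""
    return any(classify(c)[0] == "blocking" for c in comments)
```

For example, `classify("[nit] rename tmp to total")` returns `("nit", "rename tmp to total")`.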

Style guides

A document (or shared linter config) describing how the team writes code:

  • Naming conventions (snake_case for Python variables, camelCase for JavaScript).
  • File and folder structure.
  • Comment style.
  • Error handling patterns.
  • Import order.

Examples: PEP 8 for Python, Airbnb JavaScript Style Guide, Google's various style guides.

Linters and formatters

Automated tools that enforce style and catch common mistakes:

| Language | Linter | Formatter |
|---|---|---|
| Python | ruff (or flake8 + pylint) | black, ruff format |
| JavaScript/TypeScript | eslint | prettier |
| Go | golangci-lint | gofmt |
| Java | Checkstyle | google-java-format |
| Rust | clippy | rustfmt |

These tools run automatically in CI on every PR, and a failed lint check blocks the merge.

# A ruff configuration in pyproject.toml
[tool.ruff]
line-length = 100
target-version = "py311"

[tool.ruff.lint]
select = ["E", "F", "B", "I", "S", "C90"]
# E = pycodestyle errors
# F = pyflakes (unused imports, undefined names)
# B = bugbear (common bug patterns)
# I = import order
# S = security (bandit)
# C90 = complexity (McCabe)

Static analysis

Static analysis goes deeper than linting: it analyses the structure of the code without running it.

  • Type checkers: mypy (Python), TypeScript, Flow. Catch type mismatches and missing null checks.
  • Security scanners: Semgrep, SonarQube, Snyk Code. Match common vulnerability patterns.
  • Complexity analysers: measure cyclomatic complexity (also called McCabe complexity) to identify functions that should be refactored.
  • Dead code detectors: identify unreachable code and unused imports.

Run in CI alongside tests.
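
To show what "without running it" means, the sketch below parses source code with Python's standard `ast` module and gives each function a rough complexity score. It is a simplified approximation of cyclomatic complexity (one plus the number of branching nodes), not the exact algorithm of any named tool, and nested functions are also counted towards their enclosing function.

```python
import ast

# Node types treated as decision points in this simplified count.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp, ast.BoolOp)


def complexity_by_function(source: str) -> dict[str, int]:
    """Parse source (never executing it) and score each function."""
    scores: dict[str, int] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            branches = sum(
                isinstance(child, BRANCH_NODES) for child in ast.walk(node)
            )
            scores[node.name] = 1 + branches
    return scores


SAMPLE = '''
def grade(mark):
    if mark >= 90:
        return "A"
    elif mark >= 75:
        return "B"
    for _ in range(3):
        pass
    return "C"
'''

print(complexity_by_function(SAMPLE))  # prints {'grade': 4}
```

The sample scores 4 because it contains an `if`, an `elif` (a nested `if` in the syntax tree) and a `for`, plus the base path.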

Pre-commit hooks

Run linters and formatters automatically on every commit, before the change leaves the developer's machine. Saves a round trip with CI.

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.5.0
    hooks:
      - id: ruff
      - id: ruff-format
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.10.0
    hooks:
      - id: mypy
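
A hook is essentially a program that receives the staged file names as arguments and exits non-zero to reject the commit. The following is a minimal sketch of a custom check written on that assumption; `find_trailing_whitespace` and `main` are hypothetical names, and wiring it into the config above is left out.

```python
from pathlib import Path


def find_trailing_whitespace(paths: list[str]) -> list[str]:
    """Return 'file:line' locations whose line ends in spaces or tabs."""
    problems = []
    for path in paths:
        text = Path(path).read_text(encoding="utf-8")
        for number, line in enumerate(text.splitlines(), start=1):
            if line != line.rstrip(" \t"):
                problems.append(f"{path}:{number}")
    return problems


def main(argv: list[str]) -> int:
    """Report problems and return the exit code the hook should use."""
    problems = find_trailing_whitespace(argv)
    for location in problems:
        print(f"trailing whitespace: {location}")
    # A non-zero exit code is what tells the hook runner to reject the commit.
    return 1 if problems else 0

# As a real script, this would end with: sys.exit(main(sys.argv[1:]))
```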

Architecture decision records (ADRs)

For decisions that shape the codebase, write a short record:

# ADR 0007: Use PostgreSQL not MongoDB

## Status
Accepted, 2026-05-15

## Context
We need a database. The team has more SQL experience than NoSQL.

## Decision
Use PostgreSQL 15.

## Consequences
- Strong consistency by default.
- Mature tooling (psql, pgAdmin).
- The team must design relational schemas, including for semi-structured
  data (use jsonb columns where needed).

ADRs preserve the why behind technical choices, so future maintainers can revisit them with full context.

Metrics

Some teams track:

  • Time to review: median hours from PR open to first review.
  • Time to merge: median hours from PR open to merge.
  • Review thoroughness: comments per 100 lines of diff.
  • Coverage: test coverage percentage.
  • Lint debt: open lint errors.

Use metrics as signals to investigate, not targets to optimise (Goodhart's law).
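
As a rough sketch of how the first three metrics could be computed, assume each merged PR is exported as a record like the one below. The `PRRecord` fields are illustrative, not the schema of any real tool, and the function assumes a non-empty list with at least one changed line.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class PRRecord:
    """Hypothetical per-PR record; field names are illustrative."""
    hours_to_first_review: float
    hours_to_merge: float
    review_comments: int
    lines_changed: int


def review_metrics(prs: list[PRRecord]) -> dict[str, float]:
    """Summarise review speed and thoroughness for a batch of PRs."""
    total_lines = sum(pr.lines_changed for pr in prs)
    total_comments = sum(pr.review_comments for pr in prs)
    return {
        "median_hours_to_first_review": median(pr.hours_to_first_review for pr in prs),
        "median_hours_to_merge": median(pr.hours_to_merge for pr in prs),
        "comments_per_100_lines": 100 * total_comments / total_lines,
    }
```

The median is used rather than the mean so that one PR left open over a holiday does not distort the picture.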

Exam-style practice questions

Practice questions written in the style of NESA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

2024 HSC · 5 marks
Explain the purpose of code review in a software project. Describe two complementary practices that improve code quality alongside code review.

Code review is the practice of having another developer read and approve every change before it is merged into the main branch. The reviewer checks for correctness, security, style, maintainability, and adherence to team conventions. Review is conducted on a pull request (PR), with line-by-line comments and an explicit approve or request-changes decision.

Purposes:

  • Catch bugs before they reach main, while the change is small and fresh.
  • Share knowledge across the team. Reviewing teaches both reviewer and author.
  • Maintain consistency. Reviewers enforce the team's standards.
  • Provide an audit trail. Every change has a recorded discussion attached.

Complementary practice 1: linters and formatters. Tools like ruff (Python), eslint (JavaScript), prettier and black automate enforcement of style rules and catch common mistakes. The team configures the linter once, and every commit is checked in CI. This frees humans from arguing about formatting and lets the code review focus on logic and design.

Complementary practice 2: static analysis. Tools like mypy (Python type checking), TypeScript, SonarQube and Semgrep analyse the code without running it. They catch type mismatches, missing null checks, unreachable code, common security patterns (such as hardcoded secrets), and complexity hotspots. Static analysis catches systematic issues that humans tire of catching.

Together, linters and static analysis handle the mechanical issues so code review can focus on the higher-value questions: is this the right design? Have edge cases been considered? Does it match user needs?

Markers reward a clear definition of code review, at least two distinct purposes, and two complementary practices (not just two linters). Best answers note that reviews are most effective when freed from mechanical issues by tooling.

Practice questions

Original practice questions graded from foundation to exam level, each with a full worked solution. Try them before revealing the solution.

foundation · 2 marks
State two reasons a team runs a linter in CI rather than relying only on human reviewers to check formatting.

Any two of: linters are faster and cheaper than a human reading every line for style; linters are consistent, applying the same rule every time with no fatigue; linters free reviewer attention for design and correctness rather than formatting; linters catch issues (unused imports, undefined names) that are easy for a human to miss.

Marking criteria: 1 mark per valid distinct reason, up to 2 marks.

foundation · 3 marks
Distinguish between a linter and a static analysis tool, giving one named example of each.

A linter checks surface-level style and simple mistakes against a fixed rule set, for example unused imports or inconsistent indentation. Example: eslint for JavaScript.

A static analysis tool inspects the code's structure more deeply without executing it, catching issues such as type mismatches, security vulnerability patterns or excessive complexity. Example: mypy for Python type checking, or Semgrep for security patterns.

Marking criteria: 1 mark for a correct linter description with a named example, 1 mark for a correct static analysis description with a named example, 1 mark for stating that static analysis goes deeper than surface style.

core · 4 marks
A team's pull requests average 600 changed lines and sit unreviewed for three days. Explain two changes the team should make, and how each addresses the underlying problem.

Change 1: enforce smaller PRs. A 600-line PR is too large for a reviewer to hold in their head; splitting the work into smaller, focused PRs (for example, under 200 lines) means each PR can be reviewed thoroughly in one sitting, which directly reduces the risk of a superficial "LGTM" approval.

Change 2: set a review-time service-level objective. Introducing a target such as "every PR gets a first review within one business day", tracked and reported weekly, gives the team a visible signal when reviews are being neglected, whereas without a tracked target reviews silently slip because no individual is accountable for turnaround.

Marking criteria: 1 mark per named change (2 max), 1 mark per explanation of how it addresses the root cause (large size or lack of accountability) rather than just restating the symptom, up to 4 marks total.

core · 4 marks
The table below shows lint and review data for two sprints at a small startup.

| Sprint | Avg PR size (lines) | Median time to first review (hours) | Open lint errors at sprint end |
|---|---|---|---|
| Sprint 8 | 340 | 30 | 12 |
| Sprint 9 | 120 | 6 | 0 |

Using the data, explain what changed between Sprint 8 and Sprint 9, and why the Sprint 9 practices are likely to produce higher-quality code.

Between the two sprints, average PR size fell from 340 to 120 lines, median time to first review fell from 30 hours to 6 hours, and open lint errors fell from 12 to 0.
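
The size of each change can be quantified as a percentage drop, which can strengthen a data-based answer. A quick calculation from the table values (the dictionary keys are just labels chosen for this sketch):

```python
def percent_drop(before: float, after: float) -> float:
    """Percentage reduction from before to after, rounded to 1 decimal place."""
    return round(100 * (before - after) / before, 1)


sprint_8 = {"avg_pr_size": 340, "hours_to_first_review": 30, "open_lint_errors": 12}
sprint_9 = {"avg_pr_size": 120, "hours_to_first_review": 6, "open_lint_errors": 0}

for metric in sprint_8:
    print(metric, percent_drop(sprint_8[metric], sprint_9[metric]))
# prints:
# avg_pr_size 64.7
# hours_to_first_review 80.0
# open_lint_errors 100.0
```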

This pattern is consistent with the team adopting smaller, more focused PRs (explaining the drop in size), which are naturally reviewed faster because there is less for a reviewer to hold in their head at once (explaining the drop in review time), and adopting an enforced lint gate in CI so lint errors can no longer accumulate unresolved (explaining the drop to 0 open errors).

Smaller, promptly reviewed PRs with zero outstanding lint debt are higher quality because bugs are caught while the change is still fresh in the author's memory, and reviewers are not distracted by mechanical style issues that a linter should have already caught, so their attention goes to correctness and design.

Marking criteria: 1 mark for correctly reading all three trends from the table, 1 mark for linking smaller PR size to faster review, 1 mark for linking the lint gate to zero outstanding errors, 1 mark for explaining why the combination improves code quality (not just restating the numbers).

core · 3 marks
Explain why 'approving without reading' a large pull request is a common trap, and state one process change that reduces it.

Approving without reading gives the appearance of review (a recorded approval) without the substance of it: bugs, design flaws and security issues that only surface through careful reading pass into main unchecked, and the team develops false confidence that changes have been vetted.

Process change: cap PR size (for example, a soft limit of 300-400 changed lines) so that a reviewer can realistically read the whole diff in one sitting, or require the reviewer to explicitly comment on at least the key files changed before approving.
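
A soft size limit like this can be checked automatically. The sketch below counts changed lines in a unified diff; it is a simplification (a removed line whose content itself begins with "--" would be mistaken for a file header), and the function names and the default limit of 400 are assumptions for illustration.

```python
def changed_lines(unified_diff: str) -> int:
    """Count added and removed lines in a unified diff, ignoring file headers."""
    count = 0
    for line in unified_diff.splitlines():
        if line.startswith(("+++", "---")):
            continue  # file header lines, not content changes
        if line.startswith(("+", "-")):
            count += 1
    return count


def exceeds_soft_limit(unified_diff: str, limit: int = 400) -> bool:
    """True when the diff is larger than the team's soft limit."""
    return changed_lines(unified_diff) > limit


SAMPLE_DIFF = """--- a/app.py
+++ b/app.py
@@ -1,3 +1,4 @@
 import os
-x = 1
+x = 2
+y = 3
"""

print(changed_lines(SAMPLE_DIFF))  # prints 3
```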

Marking criteria: 1 mark for identifying the false-confidence risk, 1 mark for explaining the consequence (issues reach main unchecked), 1 mark for a workable process change.

exam · 6 marks
Evaluate the claim that 'code review is more valuable for spreading knowledge across a team than for catching bugs'.

This is a 6-mark EVALUATE: markers reward a judgement supported by contrasted evidence on both sides, not a one-sided description.

Case for the claim (knowledge-spreading dominates)
Every review exposes the reviewer to a part of the codebase they might not otherwise touch, and exposes the author to the reviewer's mental model and conventions. Over time this raises the bus factor (the number of people who would have to leave before a section of the system becomes unmaintainable) and builds a shared, consistent understanding of the system that no amount of bug-catching alone would produce. Teams that rotate reviewers often find this benefit compounds: junior developers ramp up faster because they read senior code every week.
Case against the claim (bug-catching still dominates in impact)
A single bug that reaches production, especially one touching security or data integrity, can cost far more in incident response, reputational damage and lost trust than the marginal knowledge gained from one more review. Mechanical tools (linters, static analysis, CI tests) already catch many mechanical bugs, but human review remains the main defence against logic errors, missing edge cases and design flaws that automation cannot detect, which is precisely why review remains mandatory even on teams with excellent automated tooling.
Judgement
The two benefits are not actually in competition: they occur on every reviewed PR simultaneously, so ranking one above the other depends on team maturity. For a mature team with strong CI (tests, linters, static analysis, high coverage), review's marginal value shifts toward knowledge-spreading and design discussion, because most mechanical bugs are already caught before a human ever opens the PR. For a team with thin automated coverage, review remains the primary bug-catching layer, so the claim is false. The claim is therefore only partially true, and depends heavily on how much of the mechanical checking has already been automated.

Marker's note: top-band answers (1) explicitly argue both sides rather than picking one immediately, (2) connect the strength of each side to a stated condition (team maturity, automation coverage) rather than asserting it universally, and (3) end with an explicit, qualified judgement rather than a neutral summary.
