Module 1: Secure Software Architecture

NSW Software Engineering syllabus dot point

Inquiry Question 2: How can the security of a developed solution be evaluated?

Apply input validation, sanitisation and output encoding to defend against injection attacks

A focused answer to the HSC Software Engineering Module 1 dot point on input validation. Allow-list vs deny-list, sanitisation, output encoding, parameterised queries, the worked SQL injection example, and the traps markers look for.

Generated by Claude Opus. Reviewed by Better Tuition Academy. 6 min answer.


What this dot point is asking

NESA wants you to define three related but distinct techniques - input validation, sanitisation, and output encoding - and apply each to defend against injection attacks. You need to know when to use which, and why parameterised queries (output encoding for SQL) are the primary defence against SQL injection.

The answer

Input validation

Check that incoming data matches an expected format before processing it. Two approaches:

  • Allow-list (preferred): specify exactly what is permitted, reject everything else. "Username must be 3-32 alphanumeric characters."
  • Deny-list: specify what is forbidden, allow everything else. Brittle because attackers find creative bypasses.

Validate at the server, not just in the browser. Browser validation improves UX but does nothing for an attacker who calls your API directly.

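import re

# Allow-list: 3-32 ASCII letters or digits, matching the rule stated above.
USERNAME_PATTERN = re.compile(r"[a-zA-Z0-9]{3,32}")

def validate_username(username):
    """Return the username unchanged if it is valid; otherwise raise ValueError."""
    # fullmatch must consume the whole string, so ^ and $ anchors are not needed.
    if not isinstance(username, str) or not USERNAME_PATTERN.fullmatch(username):
        raise ValueError("Invalid username format")
    return username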
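
To see why a deny-list is brittle, the following sketch runs the same inputs through both approaches. The deny-list entries, the sample inputs and the function names are invented for illustration; the allow-list pattern follows the "3-32 alphanumeric characters" rule quoted above.

```python
import re

# Hypothetical deny-list: block a few strings known to be dangerous.
DENY_LIST = ["<script", "' OR", "--"]

# Allow-list following the rule quoted above: 3-32 alphanumeric characters.
ALLOW_PATTERN = re.compile(r"[A-Za-z0-9]{3,32}")


def passes_deny_list(value: str) -> bool:
    """Accept anything that does not contain a known-bad substring."""
    return not any(bad in value for bad in DENY_LIST)


def passes_allow_list(value: str) -> bool:
    """Accept only values that match the permitted pattern in full."""
    return ALLOW_PATTERN.fullmatch(value) is not None


attempts = [
    "alice99",                       # legitimate username
    "<SCRIPT>alert(1)</SCRIPT>",     # upper case dodges "<script"
    "<img src=x onerror=alert(1)>",  # no script tag at all
    "' oR '1'='1",                   # mixed case dodges "' OR"
]

for value in attempts:
    print(f"{value!r}: deny-list={passes_deny_list(value)}, "
          f"allow-list={passes_allow_list(value)}")
```

Every attack string slips past this deny-list because none of them contains a blocked substring exactly as written, while the allow-list accepts only the legitimate username.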

Sanitisation

Transform input to make it safe for downstream use. Sanitisation removes or escapes unwanted characters but does not reject the request.

import html

def sanitise_for_display(text):
    return html.escape(text)

print(sanitise_for_display("<script>alert(1)</script>"))
# &lt;script&gt;alert(1)&lt;/script&gt;

Sanitisation is a useful layer of defence in depth but is fragile when used as the only defence, because different output contexts (HTML attribute, JavaScript string, SQL value, URL) have different escape rules. The html.escape call above, for example, is only correct for HTML output.

Output encoding

Transform data at the boundary where it is written to a target context. The encoding depends on the context:

  • HTML body: HTML-encode <, >, &, ", '.
  • HTML attribute: HTML-encode plus quote the attribute.
  • JavaScript string: JavaScript-encode and never trust user input as code.
  • SQL: do not escape values by hand - use parameterised queries, which keep the data separate from the SQL text.
  • URL parameter: URL-encode.
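
The contexts in the list above can be sketched with Python's standard library. This is a minimal illustration, not a complete encoding library: the function names and the sample input are hypothetical, and a real project would normally rely on its template engine's auto-escaping. SQL is left out on purpose because the defence there is a parameterised query rather than an encoder.

```python
import html
import json
from urllib.parse import quote

# Hypothetical hostile input used to exercise each encoder.
user_input = '"><script>alert(1)</script> & more'


def encode_html_body(value: str) -> str:
    # Escapes <, >, &, " and ' as HTML entities.
    return html.escape(value, quote=True)


def encode_html_attribute(value: str) -> str:
    # Same entity encoding, with the attribute value always wrapped in quotes.
    return f'"{html.escape(value, quote=True)}"'


def encode_url_parameter(value: str) -> str:
    # Percent-encodes every character that is not unreserved in a URL.
    return quote(value, safe="")


def encode_js_string(value: str) -> str:
    # json.dumps yields a quoted string literal that JavaScript accepts; the
    # extra replacements stop "</script>" from closing an inline script block.
    return json.dumps(value).replace("<", "\\u003c").replace(">", "\\u003e")


print(encode_html_body(user_input))
print(encode_html_attribute(user_input))
print(encode_url_parameter(user_input))
print(encode_js_string(user_input))
```

The same input produces four different outputs, which is why encoding has to happen at the point of output, once the target context is known.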

The big example: parameterised queries

A vulnerable login query:

def login(username, password):
    query = f"SELECT * FROM users WHERE name = '{username}' AND pass = '{password}'"
    return db.execute(query).fetchone()

Submitting admin as the username and ' OR '1'='1 as the password turns the query into:

SELECT * FROM users WHERE name = 'admin' AND pass = '' OR '1'='1'

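Because AND binds more tightly than OR, the WHERE clause is true for every row, so fetchone() returns the first user in the table (often the admin account) regardless of the password.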
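
Running the vulnerable pattern against a throwaway in-memory SQLite database makes the effect concrete. This is a sketch for study purposes only: the table layout, the sample rows and the name vulnerable_login are made up for illustration, and sqlite3 stands in for whatever database the real application uses.

```python
import sqlite3

# In-memory database with two made-up accounts.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, pass TEXT)")
db.executemany(
    "INSERT INTO users (name, pass) VALUES (?, ?)",
    [("admin", "correct-horse"), ("alice", "hunter2")],
)


def vulnerable_login(username, password):
    # DELIBERATELY UNSAFE: user input is pasted straight into the SQL text.
    query = f"SELECT * FROM users WHERE name = '{username}' AND pass = '{password}'"
    return db.execute(query).fetchone()


# A wrong password finds no row, as intended.
print(vulnerable_login("admin", "wrong-password"))

# The injected password rewrites the WHERE clause, so a row comes back anyway.
print(vulnerable_login("admin", "' OR '1'='1"))

# The username does not even need to exist for the attack to succeed.
print(vulnerable_login("nobody", "' OR '1'='1"))
```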

The fix:

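import bcrypt  # third-party package: pip install bcrypt

def login(username, password):
    # The ? placeholder keeps the username as data; it is never parsed as SQL.
    query = "SELECT id, pass_hash FROM users WHERE name = ?"
    row = db.execute(query, (username,)).fetchone()
    # Assumes rows support access by column name (e.g. sqlite3.Row) and that
    # pass_hash holds a bcrypt hash as bytes.
    if row and bcrypt.checkpw(password.encode(), row["pass_hash"]):
        return row["id"]
    return None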

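The database driver binds the value to the ? placeholder as data, so it is never parsed as part of the SQL statement. No string concatenation, no hand-written escape rules, no injection. The fix also stops comparing plaintext passwords in SQL and instead checks the submitted password against a stored bcrypt hash. Placeholder syntax varies by driver: sqlite3 uses ?, while psycopg2 uses %s.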
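
A self-contained way to check that the injection no longer works is sketched below. To keep it runnable with only the standard library, PBKDF2 from hashlib is swapped in for bcrypt; the schema, the add_user helper and the iteration count are assumptions made for this example, not part of the original fix.

```python
import hashlib
import hmac
import os
import sqlite3

db = sqlite3.connect(":memory:")
db.row_factory = sqlite3.Row  # lets rows be read by column name
db.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, salt BLOB, pass_hash BLOB)"
)


def hash_password(password, salt):
    # PBKDF2 from the standard library stands in for bcrypt in this sketch.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


def add_user(name, password):
    # Hypothetical helper: store a random salt and the derived hash.
    salt = os.urandom(16)
    db.execute(
        "INSERT INTO users (name, salt, pass_hash) VALUES (?, ?, ?)",
        (name, salt, hash_password(password, salt)),
    )


def login(username, password):
    # The ? placeholder keeps the username out of the SQL text entirely.
    row = db.execute(
        "SELECT id, salt, pass_hash FROM users WHERE name = ?", (username,)
    ).fetchone()
    if row is None:
        return None
    candidate = hash_password(password, row["salt"])
    # Constant-time comparison avoids leaking how many bytes matched.
    return row["id"] if hmac.compare_digest(candidate, row["pass_hash"]) else None


add_user("admin", "correct-horse")
print(login("admin", "correct-horse"))    # the real password succeeds
print(login("admin", "' OR '1'='1"))      # the payload is just a wrong password
print(login("' OR '1'='1", "anything"))   # the payload is just an unknown name
```

The injection strings are now treated as ordinary text: one is a password that does not match the stored hash, the other a username that does not exist.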

Defence in depth

Real systems combine all three:

  1. Validate at input: reject obviously malformed data early.
  2. Use parameterised queries for SQL (and the equivalent technique for other interpreters, such as passing arguments as a list rather than building a shell command string).
  3. Encode at output for HTML, JavaScript, URL contexts.
  4. Apply Content Security Policy headers to limit damage if XSS slips through.
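
The layers above can be combined in one request handler, sketched here for a hypothetical page that greets a user and shows a comment. The function name, the page markup and the exact policy string are assumptions for illustration; layer 2 (parameterised queries) is omitted because this sketch does not touch a database.

```python
import html
import re

USERNAME_PATTERN = re.compile(r"[A-Za-z0-9]{3,32}")

# Example policy: same-origin resources only, no plugins, no inline scripts.
CSP_POLICY = "default-src 'self'; script-src 'self'; object-src 'none'"


def build_greeting_response(username: str, comment: str):
    """Return (status, headers, body) for a hypothetical profile page."""
    headers = {
        "Content-Type": "text/html; charset=utf-8",
        "Content-Security-Policy": CSP_POLICY,
    }
    # Layer 1: allow-list validation rejects malformed usernames outright.
    if not USERNAME_PATTERN.fullmatch(username):
        return 400, headers, "<p>Invalid username</p>"
    # Layer 3: free-text input is HTML-encoded at the point of output.
    body = f"<h1>Hello, {html.escape(username)}</h1><p>{html.escape(comment)}</p>"
    # Layer 4: the CSP header asks the browser to refuse inline scripts anyway.
    return 200, headers, body


status, headers, body = build_greeting_response(
    "alice99", "<script>alert(1)</script>"
)
print(status)
print(headers["Content-Security-Policy"])
print(body)
```

If the output encoding were ever missed, the policy header would still tell the browser not to run the injected inline script, which is the point of layering the defences.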

Past exam questions, worked

Real questions from past NESA papers on this dot point, with our answer explainer.

2024 HSC, 5 marks: Distinguish between input validation, sanitisation and output encoding. Show how each technique defends against an SQL injection attack on a login form.

Input validation checks that data matches an expected format before any further processing. A username field might require 3-32 characters, alphanumeric only. If the input fails the check, the request is rejected. Validation works best as an allow-list (specify what is allowed) rather than a deny-list (specify what is blocked).

Sanitisation transforms input to make it safe for downstream use - for example, stripping or escaping characters that would have a special meaning. Validation rejects bad input; sanitisation modifies it.

Output encoding transforms data at the point it is written to a target context (HTML, SQL, shell). Encoding HTML entities (< becomes &lt;) prevents reflected XSS; SQL parameter binding prevents SQL injection.

For an SQL injection attack via the login form:

  • Validation: accept only usernames that match the allow-list (for example, 3-32 alphanumeric characters), which excludes quote and semicolon characters. This adds defence in depth but is not the primary fix.
  • Sanitisation: strip or escape SQL metacharacters. Fragile because escape rules differ by database.
  • Output encoding (parameterised queries): pass the username as a bound parameter so the database driver never interprets it as SQL. This is the primary defence against SQL injection.

Markers reward the three definitions, the distinction (validation rejects, sanitisation transforms, encoding adapts to context), and identifying parameterised queries as the canonical defence.
