Inquiry Question 2: How can the security of a developed solution be evaluated?
Apply input validation, sanitisation and output encoding to defend against injection attacks
A focused answer to the HSC Software Engineering Module 1 dot point on input validation. Allow-list vs deny-list, sanitisation, output encoding, parameterised queries, the worked SQL injection example, and the traps markers look for.
Reviewed by: AI editorial process; not yet individually human-reviewed
Have a quick question? Jump to the Q&A page
What this dot point is asking
NESA wants you to define three related but distinct techniques - input validation, sanitisation, and output encoding - and apply each to defend against injection attacks. You need to know when to use which, and why parameterised queries (output encoding for SQL) are the primary defence against SQL injection.
The answer
Input validation
Check that incoming data matches an expected format before processing it. Two approaches:
- Allow-list (preferred): specify exactly what is permitted, reject everything else. "Username must be 3-32 alphanumeric characters."
- Deny-list: specify what is forbidden, allow everything else. Brittle because attackers find creative bypasses.
Validate at the server, not just in the browser. Browser validation improves UX but does nothing for an attacker who calls your API directly.
import re
USERNAME_PATTERN = re.compile(r"^[a-zA-Z0-9_]{3,32}$")
def validate_username(username):
if not USERNAME_PATTERN.fullmatch(username):
raise ValueError("Invalid username format")
return username
Sanitisation
Transform input to make it safe for downstream use. Removes or escapes unwanted characters but does not reject the request.
import html
def sanitise_for_display(text):
return html.escape(text)
print(sanitise_for_display("<script>alert(1)</script>"))
# <script>alert(1)</script>
Sanitisation is a useful defence in depth but is fragile when used as the only defence. Different output contexts (HTML attribute, JavaScript string, SQL value, URL) have different escape rules.
Output encoding
Transform data at the boundary where it is written to a target context. The encoding depends on the context:
- HTML body: HTML-encode
<,>,&,",'. - HTML attribute: HTML-encode plus quote the attribute.
- JavaScript string: JavaScript-encode and never trust user input as code.
- SQL: do not encode - use parameterised queries.
- URL parameter: URL-encode.
An owned schematic traces one piece of untrusted input through all three layers to three different output contexts, showing why the SAME stored value needs a DIFFERENT encoding at each destination.
The big example: parameterised queries
A vulnerable login query:
def login(username, password):
query = f"SELECT * FROM users WHERE name = '{username}' AND pass = '{password}'"
return db.execute(query).fetchone()
Submitting ' OR '1'='1 as the password turns the query into:
SELECT * FROM users WHERE name = 'admin' AND pass = '' OR '1'='1'
which returns the admin row regardless of password.
The fix:
def login(username, password):
query = "SELECT id, pass_hash FROM users WHERE name = ?"
row = db.execute(query, (username,)).fetchone()
if row and bcrypt.checkpw(password.encode(), row["pass_hash"]):
return row["id"]
return None
The database driver substitutes the ? placeholder with the value safely. No string concatenation, no escape rules, no injection.
Defence in depth
Real systems combine all three:
- Validate at input: reject obviously malformed data early.
- Use parameterised queries for SQL (and equivalent techniques for other languages).
- Encode at output for HTML, JavaScript, URL contexts.
- Apply Content Security Policy headers to limit damage if XSS slips through.
Exam-style practice questions
Practice questions written in the style of NESA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
2024 HSC5 marksDistinguish between input validation, sanitisation and output encoding. Show how each technique defends against an SQL injection attack on a login form.Show worked answer →
Input validation checks that data matches an expected format before any further processing. A username field might require 3-32 characters, alphanumeric only. If the input fails the check, the request is rejected. Validation works best as an allow-list (specify what is allowed) rather than a deny-list (specify what is blocked).
Sanitisation transforms input to make it safe for downstream use - for example, stripping or escaping characters that would have a special meaning. Validation rejects bad input; sanitisation modifies it.
Output encoding transforms data at the point it is written to a target context (HTML, SQL, shell). Encoding HTML entities (< becomes <) prevents reflected XSS; SQL parameter binding prevents SQL injection.
For an SQL injection attack via the login form:
- Validation: reject usernames containing quote or semicolon characters. Defends in depth but is not the primary fix.
- Sanitisation: strip or escape SQL metacharacters. Fragile because escape rules differ by database.
- Output encoding (parameterised queries): pass the username as a bound parameter so the database driver never interprets it as SQL. This is the primary defence against SQL injection.
Markers reward the three definitions, the distinction (validation rejects, sanitisation transforms, encoding adapts to context), and identifying parameterised queries as the canonical defence.
Practice questions
Original practice questions graded from foundation to exam level, each with a full worked solution. Try them before revealing the solution.
foundation3 marksDefine input validation, sanitisation and output encoding in one sentence each, giving a concrete example of each applied to a web form field.Show worked solution →
Validation: checks that data matches an expected format before processing, e.g. rejecting a phone number field that contains letters.
Sanitisation: transforms input to remove or escape unwanted content without rejecting it, e.g. stripping HTML tags from a plain-text bio field.
Output encoding: transforms data at the point it is written into a target context, e.g. HTML-encoding a comment before rendering it on a page so embedded tags display as text rather than executing.
Marking criteria: 1 mark per correct definition with a matching concrete example (3 marks total); definitions must show the reject/transform/context distinction, not just repeat the term.
foundation4 marksA junior developer writes this function to fetch a product by ID from a URL parameter:
```python
def get_product(product_id):
query = f"SELECT * FROM products WHERE id = {product_id}"
return db.execute(query).fetchone()
```
Identify the vulnerability, show an input that exploits it, and rewrite the function to fix it.Show worked solution →
Vulnerability: SQL injection. The product_id value is concatenated directly into the query string instead of being passed as data, so an attacker can inject SQL syntax.
Exploiting input: submitting product_id = "0 OR 1=1" turns the query into SELECT * FROM products WHERE id = 0 OR 1=1, which returns every row in the table instead of one product.
Fix:
def get_product(product_id):
query = "SELECT * FROM products WHERE id = ?"
return db.execute(query, (product_id,)).fetchone()
The ? placeholder is bound to product_id by the driver, so the value can never change the query's structure regardless of its content.
Marking criteria: 1 mark for correctly naming SQL injection, 1 mark for a working exploit input with explanation, 2 marks for a correct parameterised rewrite.
core5 marksA signup form has the field constraints below. For each field, write an allow-list validation rule (in words or a short regex) and state one reason an allow-list is safer here than a deny-list.
| Field | Expected format |
|---|---|
| Username | 3-32 letters, digits, underscore |
| Postcode (Australia) | exactly 4 digits |
| Display name | 1-50 Unicode letters, spaces, hyphens |Show worked solution →
Username: ^[a-zA-Z0-9_]{3,32}$. An allow-list is safer because it is easy to state exactly what a username should contain; a deny-list of "bad" characters would need to anticipate every unusual character an attacker might try.
Postcode: ^[0-9]{4}$. Postcodes have a fixed, well-defined structure, so an allow-list eliminates letters, symbols, or extra digits in one rule rather than trying to list every invalid character.
Display name: 1 to 50 Unicode letters, spaces and hyphens, e.g. ^[\p{L} -]{1,50}$ conceptually. An allow-list still works here even with international names, because it specifies the permitted CATEGORIES (letters, space, hyphen) rather than trying to enumerate every disallowed symbol, which is impossible to do completely for a deny-list.
Marking criteria: 1 mark per correct rule (3 marks), 2 marks distributed across justifications that correctly explain why enumerating permitted patterns beats enumerating forbidden ones for that field.
core5 marksA login form is vulnerable to SQL injection. An attacker submits the password field as `' OR '1'='1`. Explain, step by step, why this bypasses the login check in a vulnerable concatenated query, and explain precisely why a parameterised query with a bound parameter is not vulnerable to the same input.Show worked solution →
- Step 1: the vulnerable query
query = f\"SELECT * FROM users WHERE name = '{username}' AND pass = '{password}'\". String concatenation inserts the raw password text directly into the SQL.- Step 2: what the injected input does
- Substituting
' OR '1'='1for password produces... AND pass = '' OR '1'='1'. The database parser reads this as a complete SQL condition: the closing quote ends the intended string, andOR '1'='1'is always true, so the WHERE clause evaluates to true regardless of the real password. - Step 3: why login succeeds
- Because the overall condition is always true for the matched username, the query returns the row for
admin(or any username supplied) with no correct password required. - Step 4: why a bound parameter is immune
db.execute(\"SELECT * FROM users WHERE name = ? AND pass = ?\", (username, password))sends the SQL text and the values to the database SEPARATELY. The database compiles the query structure first, with placeholders, and only afterwards substitutes the supplied values as literal data. The characters' OR '1'='1are treated as the literal contents of the password field being compared, not as SQL syntax, so the attack has no effect on the query's logic.
Marking criteria: 1 mark for correctly tracing how the quote and OR clause complete a true condition, 2 marks for explaining why the login then succeeds, 2 marks for explaining that binding separates query structure from data so injected syntax is treated as a literal value.
core4 marksA blog stores comments in a database using a parameterised INSERT query, so SQL injection is not possible. A comment containing `<script>alert(1)</script>` is later shown to every visitor and the script executes. Explain what went wrong and the missing control.Show worked solution →
Parameterising the INSERT protects the SQL layer only; it does not protect the HTML rendering layer. The comment text was stored safely, but when the page later writes the raw comment into the HTML response, the browser interprets <script> as a real tag and executes it.
The missing control is output encoding at the HTML boundary: the comment text must be HTML-encoded immediately before being written into the page (converting <, >, &, and quote characters to their entities) so the browser displays the text literally instead of parsing it as markup. This is a stored cross-site scripting (XSS) vulnerability, caused by validating/parameterising one boundary (SQL) while leaving another boundary (HTML output) unencoded.
Marking criteria: 1 mark for identifying this as stored XSS, 1 mark for explaining that SQL parameterisation does not protect the HTML output boundary, 2 marks for correctly identifying HTML output encoding as the missing control and how it prevents execution.
exam7 marksA file-sharing app lets users upload a file and choose its display name, which is later used to build a server-side file path for download, e.g. `open(f"/uploads/{display_name}")`. Evaluate a layered defence strategy (validation, sanitisation, output encoding, and any other relevant control) against this scenario, and justify which layer should be implemented first.Show worked solution →
This is an extended-response evaluate question: markers reward identifying the specific attack (path traversal), a layered defence with justification for EACH layer's role, and a reasoned priority.
- The attack
- Because
display_nameis concatenated directly into a filesystem path, a value such as../../etc/passwdallows the attacker to escape the intended/uploads/directory and read (or, depending on the operation, overwrite) arbitrary files on the server. This is a path traversal / directory traversal attack, a form of injection where the "interpreter" being injected into is the filesystem rather than SQL. - Layer 1: input validation (allow-list)
- Reject any
display_namecontaining/,\, or.., and restrict it to a safe pattern such as letters, digits, hyphens and a single file extension. This stops the obvious traversal payloads before the value is used at all. - Layer 2: sanitisation as defence in depth
- Even after validation, normalise the resolved path (e.g. resolve it to an absolute path) and strip any residual path separators, in case validation is bypassed by an encoding trick (such as a URL-encoded slash) the allow-list did not anticipate.
Layer 3: output/target encoding is not the right frame here; the equivalent boundary control is a safe API. Rather than building a path string at all, generate a random internal filename (e.g. a UUID) for storage, and keep the user's display_name only as separate metadata used purely for display, never for filesystem access. This removes the injection target completely rather than trying to filter it.
Layer 4: least privilege. Run the file-serving process with a filesystem user that has read access only inside /uploads/, so even a successful traversal cannot reach sensitive files elsewhere on the server.
Priority and judgement. Layer 3 (never using user-controlled text to build a path; storing files under a generated identifier) should be implemented first, because it eliminates the vulnerability class entirely rather than relying on a filter that could later be bypassed by an encoding the developers did not foresee. Validation and least-privilege remain valuable as defence in depth, but should not be the ONLY control, since history shows deny/allow-list filename filters have repeatedly been bypassed by novel encodings.
Marking criteria: 1 mark for correctly naming path traversal as the attack, 1 mark each for a correctly justified validation, sanitisation and least-privilege layer (3 marks), 1 mark for identifying the generated-filename approach as the strongest control, 1 mark for a reasoned priority judgement rather than an unjustified list order.
exam6 marksAssess the claim that 'output encoding alone is a complete defence against all injection attacks', referring to at least two different injection types in your answer.Show worked solution →
A strong assess response takes a clear position, supports it with at least two named injection types, and reconciles the counter-view.
- Position
- The claim is false. Output encoding is essential but context-specific, and some injection targets are better defended by a structural control (like parameterisation) than by encoding alone.
- Evidence 1: SQL injection
- For SQL, the recommended primary defence is a parameterised query, which separates query structure from data entirely, not "encoding" the value into the SQL string. Attempting to defend SQL injection purely by escaping characters is fragile, because escape rules differ subtly between database engines and have historically had bypasses (e.g. multi-byte character tricks defeating naive escaping functions).
- Evidence 2: cross-site scripting (XSS)
- For HTML output, encoding IS the correct and sufficient primary defence, because HTML-encoding the small, well-defined set of special characters (
<,>,&, quotes) reliably prevents the browser from parsing injected markup as executable script, provided it is applied consistently at every output boundary (HTML body, attribute, JavaScript string each need their OWN encoding rules). - Reconciling
- Output encoding remains necessary in both cases as defence in depth (encoding a value before using it in a shell command or LDAP query still helps), but "output encoding alone" is not sufficient for injection targets like SQL and OS commands, where a structural control (parameterised queries, safe APIs that never build a string from user input) removes the injection class more reliably than any escaping rule.
- Judgement
- Output encoding is a necessary but not universally sufficient defence: it is the correct primary control for XSS, but for SQL and command injection the primary control should be a structural one (parameterisation, safe APIs), with encoding used as an additional layer, not the sole defence.
Marking criteria: 2 marks for a clear position with reasoning, 3 marks for correctly explained, contrasting examples (SQL injection and XSS) that justify the position, 1 mark for an explicit final judgement distinguishing "necessary" from "sufficient".
