Inquiry Question 1: How do machine learning systems work?
Distinguish machine learning from classical programming, and define the roles of model, features, training data and predictions
A focused answer to the HSC Software Engineering Module 3 dot point on what machine learning is. It covers classical programming vs ML, the roles of training data, features, model and predictions, a worked example, and the traps markers look for.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this dot point is asking
NESA wants you to draw the fundamental distinction between writing rules by hand (classical programming) and having an algorithm learn rules from data (machine learning). You also need to know the standard ML vocabulary: model, features, training data, labels, predictions, and how a model is evaluated using a training/validation/test split.
The answer
Classical programming
The developer writes the rules. The program takes inputs and applies the rules to produce outputs.
rules + data --> answers
Example: a thermostat. "If temperature > 25, turn on the AC." The developer decides the rule.
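The thermostat rule can be written directly as code. This is a minimal sketch; the function and constant names are illustrative, not part of any syllabus-specified program.

```python
# Classical programming: the developer writes the rule by hand.
AC_THRESHOLD_C = 25  # the threshold chosen by the developer in the example


def ac_should_run(temperature_c: float) -> bool:
    """Return True when the air conditioner should be switched on."""
    return temperature_c > AC_THRESHOLD_C


print(ac_should_run(30))  # True
print(ac_should_run(25))  # False, because the rule is strictly greater than 25
```

Nothing here was learned from data: changing the behaviour means a developer edits the rule.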
Machine learning
The developer provides examples (data plus the correct answers). The algorithm learns the rules.
data + answers --> model
Then in use:
model + new data --> predictions
Example: an image classifier. The developer collects 100,000 photos labelled "cat" or "dog", and trains a model to predict the label from the pixels.
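An image classifier is too large to show here, so the sketch below illustrates the same two-step flow on a deliberately tiny problem: learning a temperature cut-off from labelled examples instead of hard-coding it. The training data and the function name train_threshold are made up for illustration, and real learning algorithms are far more sophisticated than this.

```python
def train_threshold(examples):
    """Learn a cut-off from (value, label) pairs, where label is True or False.

    Tries the midpoint between each pair of neighbouring values and keeps the
    one that misclassifies the fewest training examples.
    Assumes at least two distinct values are present.
    """
    values = sorted({value for value, _ in examples})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(threshold):
        # Count examples where the rule "value > threshold" disagrees with the label.
        return sum((value > threshold) != label for value, label in examples)

    return min(candidates, key=errors)


# Hypothetical training data: (temperature in degrees C, did the occupant turn the AC on?)
training_data = [(18, False), (21, False), (23, False), (26, True), (29, True), (33, True)]

threshold = train_threshold(training_data)  # data + answers --> model
prediction = 31 > threshold                 # model + new data --> prediction
print(threshold, prediction)                # 24.5 True
```

Here the "model" is a single learned number. The developer never typed 24.5; it came from the examples.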
The standard vocabulary
- Training data: the examples used to train the model. Typically a table where each row is one example.
- Features: the input columns. For email spam, features might be subject_length, contains_url, sender_blacklisted. Features need to be numeric (or one-hot encoded categories) for most algorithms.
- Label (also called target): the answer column for each training example. "spam" or "not spam".
- Model: the trained artefact. Internally, a set of learned parameters that map features to predictions.
- Prediction: the model's output for a new, unseen example.
- Training: the process of running a learning algorithm to fit the model's parameters to the training data.
- Inference: using the trained model to make predictions on new data.
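To make the vocabulary concrete, the sketch below turns one email record into a numeric feature vector. The three feature names come from the list above; the attachment_type column and its categories are hypothetical, added only to show what one-hot encoding looks like.

```python
# Hypothetical categories for an illustrative extra column, attachment_type.
ATTACHMENT_TYPES = ["none", "pdf", "exe"]


def to_feature_vector(email):
    """Turn one email record (a dict) into a list of numbers."""
    # One-hot encoding: one 0/1 column per category, with exactly one column set to 1.
    one_hot = [1 if email["attachment_type"] == t else 0 for t in ATTACHMENT_TYPES]
    return [
        email["subject_length"],
        int(email["contains_url"]),        # True/False becomes 1/0
        int(email["sender_blacklisted"]),
        *one_hot,
    ]


example = {
    "subject_length": 42,
    "contains_url": True,
    "sender_blacklisted": False,
    "attachment_type": "pdf",
}
label = "spam"  # the answer column, stored separately from the features

print(to_feature_vector(example))  # [42, 1, 0, 0, 1, 0]
```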
Worked Python
A minimal end-to-end ML workflow with scikit-learn:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# Training data: 150 iris flowers, 4 features each, 3 species labels.
data = load_iris()
X = data.data  # features: sepal/petal lengths and widths
y = data.target  # label: species (0, 1, or 2)
# Split into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
# Train.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
# Predict on unseen examples.
predictions = model.predict(X_test)
# Evaluate.
print(f"Accuracy: {accuracy_score(y_test, predictions):.2f}")
The developer wrote no rules about iris species. The model learned the boundaries from the training data.
When to use ML vs classical
ML is the right tool when:
- The rules are complex, change over time, or are hard to articulate (spam, image recognition, machine translation).
- Labelled data is available in volume.
- An approximate answer is acceptable (predictions are probabilistic, not exact).
Classical programming is the right tool when:
- The rules are well-understood and stable (calculating GST, sorting a list, parsing JSON).
- Errors are unacceptable (banking transactions, control systems).
- The dataset is small or unavailable.
ML is a tool, not a default. Most software is still classical because the rules are clear and exact answers are required.
The training/test split
You never evaluate a model on data it has already seen. Standard practice:
- 60-80 percent of the data is the training set.
- 10-20 percent is the validation set, used during development to tune hyperparameters.
- 10-20 percent is the test set, used once at the end to estimate real-world performance.
If the test accuracy is much lower than the training accuracy, the model is overfitting: it has memorised the training data instead of learning patterns that generalise.
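A three-way split can be sketched in plain Python as follows. The 70/15/15 fractions are one choice within the ranges above, and the function name three_way_split and the seed are assumptions for illustration. Libraries such as scikit-learn provide their own splitting helpers.

```python
import random


def three_way_split(examples, train_frac=0.70, val_frac=0.15, seed=42):
    """Shuffle the examples, then cut them into training, validation and test sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable

    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)

    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains
    return train, validation, test


train, validation, test = three_way_split(range(1000))
print(len(train), len(validation), len(test))  # 700 150 150
```

Shuffling before cutting matters: if the data is sorted (for example by date or by label), an unshuffled split would give the three sets different distributions.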
Exam-style practice questions
Practice questions written in the style of NESA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
2024 HSC (4 marks): Distinguish between classical programming and machine learning. Use the example of classifying email as spam or not spam.
Worked answer:
In classical programming the developer writes the rules. To classify spam, the developer might write code like: "if the subject contains 'viagra' OR the sender is on a blacklist OR there are too many exclamation marks, mark as spam". The program takes rules and data and produces answers.
In machine learning the developer provides examples and lets the algorithm find the rules. The developer collects thousands of labelled emails ("spam" or "not spam") - the training data. An algorithm (logistic regression, a naive Bayes classifier, a neural network) analyses the features (words in the subject, sender, capitalisation, links, attachments) and learns weights that best separate spam from non-spam. The program takes data and answers and produces a model that can predict new examples.
Spam classification works better with ML because:
- Spammers change tactics constantly. Hand-written rules go stale; ML retrained on recent data adapts.
- Many subtle signals (combinations of features) matter together. ML captures interactions classical rules miss.
- Labelling is cheap (users click "this is spam") and large datasets are available.
Markers reward the explicit contrast (classical: rules + data leads to answers; ML: data + answers leads to model), the spam-specific application, and at least one reason ML is preferred for this problem.
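For revision, the hand-written rule quoted in the classical half of the answer might look like the sketch below. The blacklist entry and the cut-off for "too many" exclamation marks are hypothetical values chosen for illustration.

```python
BLACKLIST = {"promo@example.com"}  # hypothetical blacklisted sender
MAX_EXCLAMATION_MARKS = 3          # hypothetical cut-off for "too many"


def is_spam(subject: str, sender: str) -> bool:
    """Classical rule: every condition below was chosen by the developer."""
    return (
        "viagra" in subject.lower()
        or sender in BLACKLIST
        or subject.count("!") > MAX_EXCLAMATION_MARKS
    )


print(is_spam("Cheap VIAGRA now", "a@example.org"))      # True
print(is_spam("Meeting at 3", "colleague@example.org"))  # False
```

A spammer who writes "v1agra" slips straight past this rule, which is the "rules go stale" weakness the answer describes.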
Practice questions
Original practice questions graded from foundation to exam level, each with a full worked solution. Try them before revealing the solution.
Foundation (2 marks): State two differences between classical programming and machine learning.
Worked solution:
Any two of the following (1 mark each):
- Classical programming takes rules and data to produce answers; machine learning takes data and known answers (labels) to produce a model.
- In classical programming the developer writes the logic; in machine learning an algorithm learns the logic from examples.
- Classical programs give exact, predictable outputs for a given rule; ML models give probabilistic predictions that can be wrong.
Marking criteria: 1 mark per correctly stated and distinct difference, to a maximum of 2.
Foundation (3 marks): A supermarket wants software that calculates GST (10 percent) on a receipt total. Explain, with a reason, whether this should be built using classical programming or machine learning.
Worked solution:
This should be built using classical programming.
The relationship between the receipt total and the GST owed is a fixed, well-understood rule: $\text{GST} = 0.10 \times \text{total}$. There is nothing to "learn" from examples because the rule never changes and has no exceptions that depend on subtle patterns in data.
Classical code (gst = total * 0.10) is exact every time, requires no training data, and is far cheaper to build, test and verify than collecting a dataset and training a model for a task that has an exact mathematical answer.
Marking criteria: 1 mark for correctly choosing classical programming, 1 mark for identifying the rule is fixed/exact and does not vary, 1 mark for explaining why this makes classical programming more appropriate than ML (no need to learn from data, and ML's probabilistic output is not exact).
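A slightly fuller sketch of the classical solution is shown below. It uses Python's decimal module so that money is not subject to floating-point rounding error. Rounding to the nearest cent with ROUND_HALF_UP is an assumption made for illustration, not a statement of the official rounding rule.

```python
from decimal import Decimal, ROUND_HALF_UP

GST_RATE = Decimal("0.10")  # the fixed 10 percent rule from the question


def gst_on(total: str) -> Decimal:
    """Return the GST on a receipt total, rounded to the nearest cent."""
    gst = Decimal(total) * GST_RATE
    # Rounding mode is an illustrative assumption.
    return gst.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(gst_on("80.00"))  # 8.00
print(gst_on("59.95"))  # 6.00
```

The same input always gives the same output, and the function can be verified with a handful of test cases. No dataset is involved.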
Core (4 marks): The table below shows training accuracy and test accuracy for three candidate models trained on the same dataset.
| Model | Training accuracy | Test accuracy |
|---|---|---|
| A | 0.78 | 0.76 |
| B | 0.99 | 0.71 |
| C | 0.85 | 0.83 |
(a) Identify which model is overfitting, and justify your answer using the figures.
(b) State which model should be preferred for deployment, giving a reason.
Worked solution:
(a) Model B is overfitting. Its training accuracy (0.99) is far higher than its test accuracy (0.71), a gap of 0.28. This large gap means the model has memorised patterns specific to the training data rather than learning patterns that generalise to unseen data. Models A and C both show small training-test gaps (0.02), so neither is overfitting substantially.
(b) Model C should be preferred. Although Model B has the highest training accuracy, training accuracy is not a reliable measure because the model has already seen that data. Model C has the highest test accuracy (0.83), which is the best available estimate of how the model will perform on genuinely new data, so it is expected to generalise best in production.
Marking criteria: 1 mark for correctly identifying Model B, 1 mark for justifying using the size of the training-test gap, 1 mark for correctly selecting Model C for deployment, 1 mark for justifying using test accuracy (not training accuracy) as the relevant metric.
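The comparison in this solution can be checked with a few lines of Python. This is only a sketch of the arithmetic, using the figures from the table; the variable names are illustrative.

```python
# (training accuracy, test accuracy) for each model, copied from the table.
results = {"A": (0.78, 0.76), "B": (0.99, 0.71), "C": (0.85, 0.83)}

# Training-test gap for each model, rounded to avoid floating-point noise.
gaps = {name: round(train - test, 2) for name, (train, test) in results.items()}

most_overfit = max(gaps, key=gaps.get)                                 # largest gap
best_for_deployment = max(results, key=lambda name: results[name][1])  # highest test accuracy

print(gaps)                 # {'A': 0.02, 'B': 0.28, 'C': 0.02}
print(most_overfit)         # B
print(best_for_deployment)  # C
```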
Core (4 marks): A school records the following data for five students to predict who will pass HSC Physics.
| Student | Year 11 Physics mark | Attendance rate | Practice papers done | Outcome |
|---|---|---|---|---|
| 1 | 88 | 0.95 | 6 | Pass |
| 2 | 45 | 0.60 | 1 | Fail |
| 3 | 72 | 0.88 | 4 | Pass |
| 4 | 51 | 0.70 | 2 | Fail |
| 5 | 66 | 0.80 | 3 | Pass |
(a) Identify the features and the label in this dataset.
(b) Identify one likely source of bias if this model were trained only on data from one school.
Worked solution:
(a) Features and label. The features (inputs) are Year 11 Physics mark, attendance rate and practice papers done. The label (the answer column) is Outcome (Pass/Fail).
(b) Source of bias. Training only on one school's historical students means the model learns patterns specific to that school's teaching style, cohort ability and marking. It may not generalise to a school with different resources, teacher quality or student population, so predictions for students outside that school could be unreliable or unfair.
Marking criteria: 1 mark for correctly listing all three features, 1 mark for correctly identifying the label, 1 mark for identifying a plausible bias source (single-school data not representative of a wider population), 1 mark for explaining the consequence (unreliable or unfair predictions elsewhere).
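In code, separating features from the label usually means splitting each row into inputs and an answer. The sketch below does this for the five rows in the table; the names X and y follow a common convention and are not required by the syllabus.

```python
# Each row: (Year 11 Physics mark, attendance rate, practice papers done, outcome)
students = [
    (88, 0.95, 6, "Pass"),
    (45, 0.60, 1, "Fail"),
    (72, 0.88, 4, "Pass"),
    (51, 0.70, 2, "Fail"),
    (66, 0.80, 3, "Pass"),
]

X = [list(row[:3]) for row in students]  # features: the three input columns
y = [row[3] for row in students]         # label: the Outcome column

print(X[0])  # [88, 0.95, 6]
print(y)     # ['Pass', 'Fail', 'Pass', 'Fail', 'Pass']
```

The Student column is left out deliberately: it is an identifier, not a feature, and carries no information about who will pass.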
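Core (5 marks): A dataset of 2,000 labelled examples is split using a 70/15/15 rule into training, validation and test sets.
(a) Calculate the number of examples in each set.
(b) Explain the difference in purpose between the validation set and the test set.
Worked solution:
(a) Calculating the split.
Training: $0.70 \times 2000 = 1400$ examples.
Validation: $0.15 \times 2000 = 300$ examples.
Test: $0.15 \times 2000 = 300$ examples.
Check: $1400 + 300 + 300 = 2000$.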
(b) Validation versus test. The validation set is used repeatedly during development to tune choices such as hyperparameters or which features to include, so it indirectly influences the final model. The test set is held back and used only once, after all tuning is finished, to give an unbiased, final estimate of how the model will perform on completely new data. If the test set were used for tuning like the validation set, that final estimate would no longer be trustworthy.
Marking criteria: 1 mark each for the three correct example counts (3 marks total), 1 mark for correctly stating the validation set's role in tuning, 1 mark for correctly stating the test set's role as a single final unbiased check.
Exam (6 marks): A ride-share company wants to predict how long a trip will take before the driver accepts it. Justify whether this problem should be solved with classical programming or machine learning, referring to the roles of features, training data and label. Discuss one risk of deploying such a system without a proper validation strategy.
Worked solution:
- Recommendation: machine learning
- Trip duration depends on a complex, constantly shifting combination of factors (traffic conditions, time of day, weather, road works and driver route choice) that cannot be captured by a small set of fixed rules. A hand-written formula would quickly go stale as traffic patterns change, whereas a model can be retrained on recent trips.
- Setting it up as an ML problem
- The training data is a large historical log of completed trips. The features would include pickup and drop-off coordinates, time of day, day of week, and live traffic indicators. The label is the actual trip duration recorded for each historical trip. A regression model is trained to learn the relationship between the features and the label, producing a model that outputs a predicted duration (in minutes) for a new, not-yet-completed trip.
- Risk of skipping a validation strategy
- Without a validation set held separate from training, the company has no honest way to tune the model (e.g. choosing which features matter, or when to stop training) without risking overfitting to the training log. It might select a model that memorises quirks of past trips, such as a road closure that no longer exists, and confidently predict wildly wrong durations in production. This directly damages the customer experience (mis-set pickup expectations) and driver allocation efficiency, and the company would have no reliable, unbiased estimate of real-world accuracy before the system goes live.
Marking criteria: 1 mark for correctly justifying ML over classical programming, 1 mark for identifying complex/changing rules as the reason, 1 mark each for correctly identifying training data, features and label in this context (3 marks), 1 mark for a specific, well-explained risk of omitting a validation strategy (overfitting leading to unreliable real-world predictions).
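To see what "a regression model is trained" means, the sketch below fits a straight line to a tiny, made-up trip log using ordinary least squares. It is heavily simplified: a single feature (distance) stands in for the coordinates, time and traffic features in the answer, and the data is invented and perfectly linear so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: feature = distance in km, label = duration in minutes.
distance_km = [2.0, 5.0, 8.0, 12.0]
duration_min = [9.0, 15.0, 21.0, 29.0]

slope, intercept = fit_line(distance_km, duration_min)  # training
predicted = slope * 10.0 + intercept                    # inference on a new 10 km trip
print(slope, intercept, predicted)                      # 2.0 5.0 25.0
```

Real trip data is noisy, so a real model would not fit its training log perfectly, which is exactly why held-out validation and test data are needed to judge it.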
Exam (7 marks): Assess the claim that machine learning will eventually replace classical programming for all software tasks.
Worked solution:
This is an ASSESS question: markers reward a supported judgement, not a one-sided description.
- Plan
- Thesis: machine learning will not replace classical programming for all tasks, because the two solve different kinds of problems; ML expands what software can attempt, it does not make classical programming obsolete.
- Case for ML expanding capability
- ML is transformative precisely where rules are too complex, subtle or fast-changing to hand-code: image recognition, natural language translation, fraud detection and recommendation are now solved far better by models trained on millions of examples than by any hand-written rule set. As labelled data and compute become cheaper, more tasks fall into this category.
- Case for classical programming remaining essential
- Many tasks have exact, well-understood, stable rules where classical programming is strictly better: calculating tax, sorting a list, validating a form, or controlling a lift. These need deterministic, auditable, exactly-correct behaviour; an ML model's probabilistic output is unacceptable where an error has legal or safety consequences (banking transactions, aircraft control systems). Classical code is also cheaper to build and verify when no training data is needed at all.
- Model paragraph (excerpt)
- Machine learning has genuinely expanded the boundary of what software can do, but it has not made classical programming redundant; it has moved tasks that were previously impractical for computers, such as recognising a face in a photo, into the category of solvable problems. Tasks that already sat comfortably on the "well-understood rule" side of that boundary, such as computing GST or sorting a database, gain nothing from being reframed as an ML problem: they would trade an exact, cheap, verifiable answer for a probabilistic, expensive, harder-to-audit one. The two paradigms therefore coexist by design, with most real systems, such as a ride-share trip-time predictor or a spam filter, combining a classical programming shell (input validation, business rules, security) around an ML component used only where classical rules genuinely fail.
Marker's note: top-band answers (1) take an explicit position rather than listing pros and cons neutrally, (2) give concrete named examples on both sides, (3) explain WHY each paradigm suits its examples (complexity/stability of rules, need for exactness), and (4) close with a reasoned judgement, ideally noting that most real systems use both together.
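The "classical shell around an ML component" idea can be sketched as below. The model is a stand-in with made-up coefficients, and the validation and business rules are hypothetical examples, not taken from any real ride-share system.

```python
def predict_duration_min(distance_km):
    """Stand-in for a trained model; the coefficients are made up for illustration."""
    return 2.0 * distance_km + 5.0


def quote_trip(distance_km):
    """Return a whole-minute trip estimate for a validated distance."""
    # Classical shell: exact, hand-written input validation.
    if not isinstance(distance_km, (int, float)) or distance_km <= 0:
        raise ValueError("distance_km must be a positive number")

    estimate = predict_duration_min(distance_km)  # ML component (stubbed here)

    # Classical shell: a hypothetical business rule applied to the prediction.
    return max(1, round(estimate))  # never quote less than 1 minute


print(quote_trip(10))  # 25
```

The checks that must be exactly right stay in ordinary code, and the model is consulted only for the part that cannot be written as a fixed rule.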
