Module 3: Software Automation
Inquiry Question 1: How do machine learning systems work?
Describe the basic structure of a neural network, including neurons, layers, weights, activation functions and training by backpropagation
A focused answer to the HSC Software Engineering Module 3 dot point on neural networks. Neurons, layers, weights, activation functions, forward pass, backpropagation, the worked example, and the traps markers look for.
What this dot point is asking
NESA wants you to describe the basic architecture of a feed-forward neural network and the mechanics of how it is trained. You do not need to derive gradients, but you should know the components by name and what each does.
The answer
A feed-forward neural network is a stack of layers. The diagram shows a small network with an input layer of four features, one hidden layer of five neurons, and an output layer of three classes. Every neuron in one layer is connected to every neuron in the next.
The artificial neuron
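The basic unit. It takes inputs from the previous layer, multiplies each by a weight, sums them, adds a bias, and applies an activation function:
$$y = f\left(\sum_i w_i x_i + b\right)$$
Here $x_i$ are the inputs, $w_i$ the weights, $b$ the bias and $f$ the activation function.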
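The neuron described above can be sketched in a few lines of plain Python. This is a minimal illustration, not library code: the function names `neuron` and `sigmoid` and the input values are assumptions chosen for the example.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


# Illustrative values only: three inputs, three weights, one bias.
output = neuron([1.0, 2.0, 3.0], [0.5, -0.25, 0.1], 0.2, sigmoid)
print(output)  # roughly 0.62, because the weighted sum plus bias is 0.5
```

Swapping `sigmoid` for a different function changes the neuron's behaviour without touching the weighted-sum step.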
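Common activation functions (here $z$ is the neuron's weighted sum plus bias):
- ReLU: $f(z) = \max(0, z)$. The default in hidden layers.
- Sigmoid: $f(z) = \frac{1}{1 + e^{-z}}$. Squashes the output to $(0, 1)$. Used for binary classification output.
- Softmax: turns a vector of scores into probabilities summing to 1. Used for multi-class classification output.
- Tanh: $f(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$. Squashes to $(-1, 1)$. Older default.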
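The four activation functions listed above are short enough to write out directly. The sketch below uses only the Python standard library; subtracting the maximum score inside `softmax` is a common numerical-stability trick added here as an assumption, and it does not change the result.

```python
import math


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z):
    return math.tanh(z)


def softmax(scores):
    # Shift by the largest score so exp() never overflows.
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


print(relu(-2.0), relu(3.0))  # 0.0 3.0
print(sigmoid(0.0))           # 0.5
print(tanh(0.0))              # 0.0
probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))      # three probabilities that sum to 1 (up to rounding)
```

Note that ReLU, sigmoid and tanh act on one number at a time, while softmax needs the whole vector of scores.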
Layers
A neural network is a stack of layers:
- Input layer: one neuron per feature. For an image, that might be 28 x 28 = 784 input neurons.
- Hidden layers: one or more layers between input and output. Each neuron is connected to every neuron in the previous layer (in a fully connected network).
- Output layer: one neuron for regression, $k$ neurons for $k$-class classification.
A "deep" network has many hidden layers. Each layer learns increasingly abstract features.
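To get a feel for how layer sizes translate into learnable values, you can count the weights and biases in a fully connected network. The helper `count_parameters` below is a hypothetical name for this sketch; it is applied to the small network of four inputs, five hidden neurons and three outputs described earlier.

```python
def count_parameters(layer_sizes):
    """Count the weights and biases in a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per connection between the layers
        total += n_out         # one bias per neuron in the receiving layer
    return total


small = count_parameters([4, 5, 3])
print(small)  # 43: (4*5 + 5) for the hidden layer plus (5*3 + 3) for the output layer
```

Adding neurons or layers makes this number grow quickly, which is one reason deep networks need a lot of training data.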
Forward pass
To make a prediction, feed the input through every layer in turn. Each layer computes its weighted sums and activations. The output layer produces the prediction.
For a 784-input, 128-hidden, 10-output digit classifier:
- Input: 784 pixel values, normalised to $[0, 1]$.
- Hidden layer: 128 neurons, each computing a weighted sum of the 784 inputs and applying ReLU.
- Output layer: 10 neurons (one per digit 0-9), each computing a weighted sum of the 128 hidden activations. Softmax is then applied across all 10 sums together.
The output is a probability distribution over the 10 digits. The predicted digit is the one with the highest probability.
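The forward pass can be traced by hand in plain Python. To keep the sketch short it uses a much smaller network (4 inputs, 5 hidden neurons, 3 outputs) rather than the 784-128-10 classifier, and the weights are random and untrained, so the prediction itself is meaningless. The helper names `dense`, `random_layer` and `forward` are assumptions made for this example.

```python
import math
import random


def relu(z):
    return max(0.0, z)


def softmax(scores):
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum plus bias for each neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def random_layer(n_in, n_out, rng):
    """Untrained layer: small random weights, zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)  # fixed seed so the run is repeatable
w1, b1 = random_layer(4, 5, rng)
w2, b2 = random_layer(5, 3, rng)


def forward(x):
    hidden = [relu(z) for z in dense(x, w1, b1)]  # hidden layer with ReLU
    scores = dense(hidden, w2, b2)                # output layer weighted sums
    return softmax(scores)                        # probabilities over 3 classes


probs = forward([0.2, 0.7, 0.1, 0.9])
predicted = probs.index(max(probs))  # class with the highest probability
print(probs, predicted)
```

Training would change the numbers stored in `w1`, `b1`, `w2` and `b2`; the forward-pass code itself stays the same.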
Loss
Measures how wrong the prediction is.
- Cross-entropy loss for classification: low when the predicted probability of the correct class is high.
- Mean squared error for regression: low when the predicted value is close to the true value.
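Both loss functions above are simple to compute for a single example. In this sketch the function names and the sample numbers are illustrative assumptions; cross-entropy is written for one example whose correct class index is known.

```python
import math


def cross_entropy(probs, true_class):
    """Negative log of the probability given to the correct class."""
    return -math.log(probs[true_class])


def mean_squared_error(predictions, targets):
    """Average of the squared differences between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


confident = cross_entropy([0.9, 0.05, 0.05], 0)  # correct class gets 0.9
unsure = cross_entropy([0.4, 0.3, 0.3], 0)       # correct class gets only 0.4
mse = mean_squared_error([2.5, 0.0], [3.0, 1.0])

print(confident, unsure)  # the confident, correct prediction has the lower loss
print(mse)                # 0.625
```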
Backpropagation
The training algorithm. For each batch of training examples:
- Forward pass: compute predictions and the loss.
- Backward pass: compute the gradient of the loss with respect to every weight in the network, using the chain rule of calculus.
- Update: adjust every weight by a small step opposite to its gradient. The step size is the learning rate.
After many passes through the training data (epochs), the weights settle into values that produce good predictions.
The optimiser controls how the updates are applied. Stochastic gradient descent (SGD) updates after each mini-batch. Adam is a popular adaptive variant.
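The forward, backward and update steps can be seen in miniature with a single linear neuron, where the chain rule has only one link. This is a deliberately simplified sketch: it uses plain gradient descent over the whole data set rather than mini-batches, and the data and learning rate are assumed values. In a multi-layer network the same chain-rule step is repeated layer by layer, from the output back to the input.

```python
# Illustrative training data following the rule y = 2x + 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

w, b = 0.0, 0.0       # start with an untrained neuron
learning_rate = 0.05  # assumed step size

for epoch in range(2000):
    grad_w, grad_b = 0.0, 0.0
    for x, y in data:
        pred = w * x + b         # forward pass
        error = pred - y         # loss for this example is error ** 2
        # Backward pass via the chain rule:
        # d(loss)/dw = d(loss)/d(pred) * d(pred)/dw = (2 * error) * x
        grad_w += 2 * error * x
        grad_b += 2 * error      # d(pred)/db is 1
    grad_w /= len(data)
    grad_b /= len(data)
    # Update: step opposite to the gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(w, b)  # close to 2 and 1
```

If the learning rate is set too high the updates overshoot and the loss grows instead of shrinking, which is why it is treated as a value to tune.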
A worked code example
A minimal feed-forward network in PyTorch:
```python
import torch
import torch.nn as nn
import torch.optim as optim


class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 128)  # input layer -> hidden layer
        self.fc2 = nn.Linear(128, 10)   # hidden layer -> output layer

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        return self.fc2(x)  # raw scores; softmax is applied inside the loss


model = MLP()
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

# Stand-in batch so the snippet runs on its own (assumption: 32 random examples).
batch_x = torch.randn(32, 784)
batch_y = torch.randint(0, 10, (32,))

# One training step:
predictions = model(batch_x)          # forward pass
loss = loss_fn(predictions, batch_y)  # compute loss
loss.backward()                       # backpropagation
optimizer.step()                      # update weights
optimizer.zero_grad()                 # reset gradients
```
The framework handles the gradient calculations automatically.
Hyperparameters
Choices the developer makes that affect training:
- Number of layers and neurons per layer (network architecture).
- Activation functions (ReLU, sigmoid, tanh).
- Learning rate (how big each weight update is).
- Batch size (examples per gradient step).
- Number of epochs (passes through the training data).
- Regularisation (dropout, L2 weight decay) to prevent overfitting.
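Several of these choices interact. As a rough sketch, the batch size and number of epochs together determine how many weight updates training performs. The hidden layer size and learning rate below match the worked code example; the batch size, epoch count, dropout rate and data set size are assumed values for illustration.

```python
import math

hyperparameters = {
    "hidden_layers": [128],  # one hidden layer of 128 neurons
    "activation": "relu",
    "learning_rate": 1e-3,
    "batch_size": 32,        # assumed
    "epochs": 10,            # assumed
    "dropout": 0.2,          # assumed
}
training_examples = 60000    # assumed data set size

# One weight update per batch; a final partial batch still counts as a step.
steps_per_epoch = math.ceil(training_examples / hyperparameters["batch_size"])
total_steps = steps_per_epoch * hyperparameters["epochs"]
print(steps_per_epoch, total_steps)  # 1875 18750
```

Doubling the batch size halves the number of updates per epoch, so the learning rate or epoch count often needs adjusting alongside it.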
Overfitting
A neural network with enough parameters can memorise the training data exactly. Such a model has zero training error but performs poorly on new data. Detection: training loss keeps falling while validation loss starts rising. Prevention: more training data, smaller network, dropout, regularisation, early stopping.
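The detection rule and early stopping can be sketched as a check on recorded validation losses. The function name `early_stopping_epoch`, the `patience` parameter and the loss values are all assumptions made for this illustration; the numbers are invented to show the pattern, not measured results.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the epoch with the best validation loss.

    Training is treated as stopped once the validation loss has failed
    to improve for `patience` epochs in a row.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_epoch


# Invented losses: training loss keeps falling, validation loss turns upward.
train_losses = [0.90, 0.60, 0.40, 0.30, 0.20, 0.10]
val_losses = [0.95, 0.70, 0.55, 0.60, 0.68, 0.80]

best = early_stopping_epoch(val_losses)
print(best)  # 2: validation loss was lowest at epoch index 2
```

In practice you would keep a copy of the weights from the best epoch and restore them when training stops.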
Beyond feed-forward
For images and other structured inputs, specialised architectures perform far better:
- Convolutional neural networks (CNNs) for images.
- Recurrent neural networks (RNNs) for sequences.
- Transformers for language and many other domains.
These are out of HSC scope at the architectural level, but you should recognise the names.
Past exam questions, worked
Real questions from past NESA papers on this dot point, with our answer explainer.
2024 HSC, 6 marks: Describe the structure of a simple feed-forward neural network with one hidden layer and explain how it learns from training data.
A feed-forward neural network has three kinds of layer. The input layer has one neuron per feature. The hidden layer(s) contain artificial neurons that combine inputs from the previous layer. The output layer produces the prediction - one neuron for regression, one neuron per class for classification.
Each neuron computes a weighted sum of its inputs plus a bias, then passes the result through an activation function (typically ReLU in hidden layers, softmax or sigmoid in the output). Mathematically: $y = f\left(\sum_i w_i x_i + b\right)$, where $w_i$ are learned weights, $x_i$ are the inputs from the previous layer, $b$ is the bias and $f$ is the activation function.
Training has two phases per batch of examples.
Forward pass: feed the input through every layer in turn to produce a prediction. Compare the prediction to the true label to compute a loss (cross-entropy for classification, mean squared error for regression).
Backward pass (backpropagation): compute the gradient of the loss with respect to every weight, using the chain rule of calculus. Update each weight by a small step in the direction that reduces the loss. The learning rate sets the size of that step, and the optimiser (typically stochastic gradient descent or Adam) controls how the updates are applied.
Repeat for many epochs (passes through the training data) until the loss stops decreasing on a held-out validation set. Stopping at that point gives the network the best chance of generalising to new examples.
Markers reward the three-layer structure, weights/bias/activation in the neuron description, both forward and backward pass, and recognising that backpropagation uses the chain rule to compute gradients.
Related dot points
- Distinguish machine learning from classical programming, and define the roles of model, features, training data and predictions
A focused answer to the HSC Software Engineering Module 3 dot point on what machine learning is. Classical programming vs ML, the role of training data, features, model and predictions, the worked example, and the traps markers look for.
- Compare supervised, unsupervised and reinforcement learning, and identify a typical application of each
A focused answer to the HSC Software Engineering Module 3 dot point on learning paradigms. Supervised classification and regression, unsupervised clustering, reinforcement learning, applications of each, the worked example, and the traps markers look for.
- Explain how the quality and representativeness of training data affect a model, including the risks of bias and overfitting
A focused answer to the HSC Software Engineering Module 3 dot point on training data. Sample bias, label bias, the train/test split, overfitting and underfitting, the worked example, and the traps markers look for.