Inquiry Question 2: How are machine learning systems used to develop solutions?

Describe applications of machine learning in industry, including image recognition, natural language processing, recommendation systems and predictive maintenance

A full study guide to the HSC Software Engineering Module 3 dot point on ML applications: image recognition, NLP, recommendation systems, predictive maintenance, an owned MLOps lifecycle figure, worked examples, common traps, and graded practice.

Generated by Claude Opus 4.8. 9 min answer. Updated 2026-07-03.

Reviewed by: AI editorial process; not yet individually human-reviewed

In plain English

Think of an industrial ML deployment as hiring four different specialist tradespeople for a big job, not just one all-rounder. The image-recognition specialist reads photos and flags problems, the same way a building inspector spots a crack in a wall. The NLP specialist reads and writes messages, like a receptionist handling customer emails. The recommendation specialist is a matchmaker, quietly suggesting who might want what based on who liked similar things before. The predictive-maintenance specialist is the mechanic who listens for early rattles in a machine, long before it actually breaks down. Hiring each specialist (training a model) is only the start: someone still has to roster them, check their work, and retrain them when the job changes, which is exactly what MLOps does for a deployed model.

What this dot point is asking

NESA wants you to know the major categories of industrial ML deployment, the type of learning each uses, and a realistic deployment challenge for each. The four main categories are image recognition, natural language processing, recommendation systems and predictive maintenance.

The answer

Image recognition

Computer vision systems classify or detect objects in images. Applications:

Medical imaging: detecting pneumonia in chest x-rays, cancer in pathology slides, diabetic retinopathy in retinal scans.
Autonomous driving: detecting pedestrians, vehicles, traffic signs, road markings.
Quality control: identifying defects on a factory production line.
Agriculture: identifying weeds or pest damage from drone imagery.
Retail: cashier-less stores (Amazon Go) tracking which items shoppers take.

Learning type: supervised classification or object detection. Typically convolutional neural networks (CNNs).

Challenges: requires very large labelled datasets, must work across lighting and equipment variations, ethical issues around surveillance.
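To make the CNN idea concrete, here is a minimal sketch of the one operation a convolutional layer repeats many times: sliding a small filter over an image. The 5x5 image and the Sobel-style filter values are hypothetical, and a real CNN learns its filter weights from labelled data rather than having them written by hand.

```python
# Hypothetical 5x5 grayscale image: dark on the left (0), bright on the right (9).
image = [[0, 0, 9, 9, 9] for _ in range(5)]

# A 3x3 vertical-edge filter (Sobel-style). In a CNN these weights are learned.
kernel = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]


def convolve2d(img, k):
    """Slide the kernel over the image (no padding) and return the feature map.

    Like most deep learning libraries, this is technically cross-correlation:
    the kernel is not flipped before it is applied.
    """
    kh, kw = len(k), len(k[0])
    out_h = len(img) - kh + 1
    out_w = len(img[0]) - kw + 1
    return [
        [
            sum(img[r + i][c + j] * k[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


feature_map = convolve2d(image, kernel)
for row in feature_map:
    print(row)  # each row is [36, 36, 0]
```

The large values mark where the filter overlaps the dark-to-bright edge, and the zero marks the flat bright region. Stacking many learned filters like this is what lets a CNN pick out cracks, tumours or pedestrians.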

The four categories at a glance

Category	Typical learning type	Typical architecture	One real deployment challenge
Image recognition	Supervised classification / object detection	Convolutional neural network	Domain shift across cameras, lighting, populations
Natural language processing	Self-supervised pre-training plus supervised fine-tuning	Transformer / large language model	Hallucination and prompt injection
Recommendation systems	Collaborative + content-based filtering, plus RL	Matrix factorisation / neural ranking	Cold start and filter bubbles
Predictive maintenance	Supervised regression or classification, plus anomaly detection	Gradient-boosted trees / time-series models	Rare-event, imbalanced training data

Few industrial systems rely on a single learning type; markers reward naming the dominant type and recognising the others in play.

Natural language processing (NLP)

Systems that understand or generate human language. Applications:

Machine translation (Google Translate).
Sentiment analysis of customer reviews or social media.
Question answering and chatbots.
Summarisation of long documents.
Code generation (GitHub Copilot, Claude Code).
Email and document search.

Learning type: self-supervised pre-training (the model learns to predict missing or next words in unlabelled text) plus task-specific supervised fine-tuning. Modern systems use transformer architectures, especially large language models.

Challenges: large compute cost for training and inference, hallucination (confident wrong answers), cultural and language coverage gaps, prompt injection attacks.

Recommendation systems

Predict items a user is likely to want. Applications:

Netflix, YouTube, Spotify: what to watch or listen to next.
Amazon, eBay: products a customer is likely to buy.
News feeds (Facebook, X, TikTok): which posts to show.
Job sites (LinkedIn, Seek): roles matched to a candidate.

Learning type: a mix of collaborative filtering (find users similar to you, recommend what they liked), content-based filtering (find items similar to ones you liked), and reinforcement learning to optimise long-term engagement.

Challenges: cold start (new users or items have no history), filter bubbles (showing only what users already agree with), measuring success (clicks vs satisfaction vs long-term wellbeing).
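The collaborative filtering idea and the cold start problem can be sketched in a few lines. This is an illustration only: the users, titles and ratings are made up, cosine similarity is one of several possible similarity measures, and the "most-rated item" fallback is an assumed cold-start strategy, not how any named platform works.

```python
from math import sqrt

# Hypothetical ratings (1-5); a missing key means "not rated yet".
ratings = {
    "ana": {"Dune": 5, "Alien": 4, "Up": 1},
    "ben": {"Dune": 4, "Alien": 5, "Heat": 5},
    "cara": {"Up": 5, "Coco": 4, "Dune": 1},
}


def cosine(a, b):
    """Cosine similarity over the items both users rated (0.0 if none shared)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=1):
    """Score unseen items by the similarity-weighted ratings of other users."""
    mine = ratings.get(user, {})
    scores = {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = cosine(mine, theirs)
        for item, rating in theirs.items():
            if item not in mine:
                scores[item] = scores.get(item, 0.0) + sim * rating
    if not any(scores.values()):
        # Cold start: no similar users found, so fall back to the most-rated items.
        counts = {}
        for theirs in ratings.values():
            for item in theirs:
                counts[item] = counts.get(item, 0) + 1
        return sorted(counts, key=counts.get, reverse=True)[:top_n]
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend("ana", ratings))  # ['Heat']  (ben rates like ana and loved Heat)
print(recommend("dev", ratings))  # ['Dune']  (new user, popularity fallback)
```

Notice that the function never looks at what the films are about, only at who rated what. That is why a brand new user gets a generic answer.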

Predictive maintenance

Predict when industrial equipment will fail. Applications:

Manufacturing: motors, pumps and bearings in factories.
Energy: wind turbines, transformers, power lines.
Transport: aircraft engines, train wheels, ship engines.
Mining: haul truck components.

Learning type: supervised regression (time to failure) or classification (will it fail in the next N days), with anomaly detection as a complement.

Challenges: rare-event labels (most machines do not fail in any given week), cost-sensitive evaluation (false negatives cost more than false positives), sensor noise and missing data.
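Anomaly detection is useful here precisely because it needs no failure labels. A minimal sketch, assuming a hypothetical set of healthy vibration readings for one pump and a simple z-score rule (real systems often use more robust time-series methods):

```python
from statistics import mean, stdev

# Hypothetical vibration readings (mm/s) recorded while the pump was known to be healthy.
baseline = [2.0, 2.1, 1.9, 2.2, 2.0, 2.1, 1.9, 2.0]
# New readings to check against that baseline.
new_readings = [2.1, 2.3, 3.8, 5.6]

mu = mean(baseline)
sigma = stdev(baseline)


def is_anomaly(value, threshold=3.0):
    """Flag a reading more than `threshold` standard deviations from the baseline mean."""
    return abs(value - mu) / sigma > threshold


flags = [is_anomaly(x) for x in new_readings]
print(flags)  # [False, False, True, True]
```

The threshold of 3 standard deviations is an assumption chosen for illustration. Lowering it catches problems earlier but raises more false alarms, which is the cost trade-off described in the challenges.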

Other categories worth knowing

Fraud detection: classifying transactions as legitimate or fraudulent.
Demand forecasting: predicting retail or energy demand.
Translation and accessibility: real-time captions, sign-language recognition.
Drug discovery: predicting which molecules bind to a target.
Generative AI: image, video, audio and text generation.

A worked Python example

A simple sentiment analysis pipeline using scikit-learn:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

reviews = [
    "The food was amazing and the service was great.",
    "Terrible experience, will not return.",
    "Excellent, I loved every dish.",
    "Boring and overpriced.",
]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("logreg", LogisticRegression()),
])
pipeline.fit(reviews, labels)

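print(pipeline.predict(["The meal was fantastic"]))   # [1]
# "Terrible" and "overpriced" appear only in the negative training reviews.
print(pipeline.predict(["Terrible and overpriced"]))  # [0]

This is a tiny example, but the structure generalises: convert text to numbers, train a classifier, predict on new text. With only four training reviews, a word the model has never seen (such as "hated") carries no signal, so a real system needs far more labelled data.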

Australian context

CSIRO runs ML projects in agriculture (weed detection from drones) and climate modelling.
Cochlear uses ML in hearing implant signal processing.
Big four banks use ML for fraud detection, credit scoring and customer service routing.
Atlassian and Canva ship ML features in their products (smart search, content generation).
Telstra and major mining companies use predictive maintenance on infrastructure and equipment.

Deployment realities

Deploying ML is more than training a model:

Data pipelines keep training data fresh.
Model serving runs the trained model at inference time, often at scale.
Monitoring detects when the model's predictions drift from reality.
Retraining updates the model when data shifts.
A/B testing compares model versions against a baseline.
Fallbacks provide a safe response when the model is uncertain.

This is sometimes called MLOps. These six activities form a loop, not a one-off pipeline: retraining feeds straight back into a refreshed data pipeline.
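As a rough sketch of the monitoring step, the check below compares the average predicted score on recent traffic with the average seen at deployment time. The function name, the sample scores and the 0.10 tolerance are all hypothetical, and production monitoring usually compares whole distributions rather than a single mean.

```python
def drift_detected(reference_scores, recent_scores, tolerance=0.10):
    """Return True when the mean predicted score has shifted by more than `tolerance`."""
    ref_mean = sum(reference_scores) / len(reference_scores)
    recent_mean = sum(recent_scores) / len(recent_scores)
    return abs(recent_mean - ref_mean) > tolerance


# Hypothetical model scores captured when the model was first deployed.
reference = [0.10, 0.20, 0.15, 0.05]

print(drift_detected(reference, [0.12, 0.18, 0.10, 0.14]))  # False: behaviour is stable
print(drift_detected(reference, [0.40, 0.50, 0.35, 0.45]))  # True: time to investigate or retrain
```

A True result does not prove the model is wrong. It is a trigger for a human or a retraining job to look more closely.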

Worked example

A bank wants to use ML to detect fraudulent credit card transactions. Outline the design.

Data: every transaction with features (amount, merchant, time, location, distance from last transaction). Labels: confirmed fraud from chargebacks.
Learning type: supervised classification. Severe class imbalance (99.9 percent legitimate), so use techniques like class weighting or specialised loss.
Deployment: the model scores every incoming transaction in real time. Above a threshold, the transaction is held for review. Below it, it goes through.
Monitoring: track precision, recall, false positive rate, and the rate of customer complaints. Retrain monthly.
Fallback: if the model service is unavailable, fall back to simple rules ("decline transactions over $10,000 from a new device") rather than going down.

Challenges: fraudsters adapt their tactics, so the model must be retrained constantly. False positives (declining legitimate transactions) hurt customer trust, so threshold tuning is critical.

Marker's note: full marks need the type of learning named, a specific technique for the class imbalance (not just "it is imbalanced"), a concrete deployment mechanism (a threshold and what happens above/below it), and a fallback that keeps the system safe rather than simply stating "monitor it".
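The deployment and fallback steps of this design can be sketched as follows. The threshold value, the transaction fields and the two stand-in scoring functions are assumptions made for illustration; the fallback rule is the one quoted in the worked example.

```python
REVIEW_THRESHOLD = 0.8  # hypothetical; in practice tuned against the cost of false positives


def rule_based_decision(txn):
    """Fallback rule from the worked example: decline large transactions from a new device."""
    if txn["amount"] > 10_000 and txn["new_device"]:
        return "decline"
    return "approve"


def decide(txn, score_fn):
    """Score a transaction with the model, or fall back to simple rules if scoring fails."""
    try:
        score = score_fn(txn)
    except Exception:
        # Model service unavailable: keep the payment system running on rules.
        return rule_based_decision(txn)
    return "hold_for_review" if score > REVIEW_THRESHOLD else "approve"


def fake_model(txn):
    """Stand-in for a trained classifier; returns a made-up fraud probability."""
    return 0.95 if txn["amount"] > 5_000 else 0.05


def broken_model(txn):
    """Stand-in for a model service that is down."""
    raise ConnectionError("model service unavailable")


print(decide({"amount": 50, "new_device": False}, fake_model))        # approve
print(decide({"amount": 8_000, "new_device": False}, fake_model))     # hold_for_review
print(decide({"amount": 12_000, "new_device": True}, broken_model))   # decline
```

The point for an exam answer is the shape of the design: one threshold, a defined action on each side of it, and a safe path that does not depend on the model being available.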

Common traps

Listing only one type of learning: Industry usually combines several. Recommendation systems use collaborative filtering plus content-based filtering plus RL.
Forgetting the deployment layer: Training a model is the easy bit. Serving it reliably, monitoring it and retraining it are the hard parts. Markers reward mentioning MLOps activities.
Treating image recognition as a solved problem: Domain shift (different cameras, lighting, populations) breaks deployed models. Continuous evaluation matters.
Conflating "ML system" with "deep learning": Many industrial deployments still use logistic regression, gradient-boosted trees or random forests. Deep learning is one tool, not the only one.
Ignoring cost asymmetry: In fraud, missing a fraud costs more than declining a real customer; in cancer screening, missing a tumour costs more than a false alarm. Evaluation must reflect costs, not just accuracy.
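The cost asymmetry trap is easiest to see with numbers. In this sketch the scored transactions and the two dollar costs are invented; the only claim is the arithmetic, which shows two thresholds with identical accuracy but very different total cost.

```python
# Hypothetical (model score, is_fraud) pairs for eight transactions.
scored = [
    (0.05, 0), (0.10, 0), (0.30, 0), (0.40, 1),
    (0.60, 0), (0.70, 1), (0.90, 1), (0.20, 0),
]

COST_MISSED_FRAUD = 500  # assumed cost of a false negative
COST_FALSE_ALARM = 20    # assumed cost of a false positive


def accuracy(threshold):
    """Fraction of transactions classified correctly at this threshold."""
    correct = sum((score >= threshold) == bool(is_fraud) for score, is_fraud in scored)
    return correct / len(scored)


def total_cost(threshold):
    """Total dollar cost of the mistakes made at this threshold."""
    cost = 0
    for score, is_fraud in scored:
        flagged = score >= threshold
        if is_fraud and not flagged:
            cost += COST_MISSED_FRAUD
        elif flagged and not is_fraud:
            cost += COST_FALSE_ALARM
    return cost


for t in (0.35, 0.65):
    print(t, accuracy(t), total_cost(t))
# 0.35 0.875 20
# 0.65 0.875 500
```

Both thresholds make exactly one mistake, so accuracy cannot tell them apart. Only the cost calculation shows that the lower threshold is the better choice under these assumed costs.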

Exam-style practice questions

Practice questions written in the style of NESA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

2024 HSC (5 marks): Describe two industry applications of machine learning. For each, identify the type of learning used and one challenge in deploying the system.

Application 1: medical image analysis.

Used by hospitals to assist radiologists in reading chest x-rays for pneumonia, mammograms for breast cancer, and dermatology images for skin cancer. The model classifies an image as normal or suggesting a particular condition.

Type of learning: supervised classification, typically a convolutional neural network trained on thousands of images each labelled by a radiologist.

Deployment challenge: regulatory approval. Medical AI is classed as a medical device by health regulators (TGA in Australia, FDA in the US) and must pass clinical evaluation. Models also need to be robust across imaging equipment from different manufacturers and patient populations, which requires diverse training data.

Application 2: predictive maintenance in industrial plants.

Used by manufacturers and energy companies to predict when machinery will fail. Sensors on motors, pumps and turbines stream vibration, temperature and acoustic data. A model predicts which machines will fail within the next two weeks so they can be serviced proactively.

Type of learning: supervised regression or classification, sometimes with anomaly detection on top. Training data is sensor readings paired with historical failure dates.

Deployment challenge: rare-event problem. Failures are infrequent, so training data is imbalanced. The model must avoid both false negatives (missed failures that lead to expensive downtime) and false positives (unnecessary maintenance). Cost-sensitive evaluation is essential.

Markers reward two distinct applications, the correct type of learning for each, a specific deployment challenge (not just "it is hard"), and ideally one mitigation per challenge.

Practice questions

Original practice questions graded from foundation to exam level, each with a full worked solution. Try them before revealing the solution.

Foundation (3 marks): Identify the type of learning used by a system that classifies chest x-rays as showing pneumonia or not, and justify your choice.

Type of learning: supervised classification.

Justification: the model is trained on a dataset of x-ray images where each image has already been labelled by a radiologist as "pneumonia" or "normal". The model learns a mapping from labelled inputs to a known, fixed set of output categories, which is the definition of supervised classification rather than unsupervised learning, where no labels would exist.

Marking criteria: 1 mark for naming supervised classification, 2 marks for justifying the answer using the presence of labelled training data and a defined output category.

Foundation (3 marks): State two deployment challenges facing a natural language processing chatbot used for customer service, and explain why each is a genuine risk beyond simple training accuracy.

Challenge 1: hallucination. The chatbot may give a fluent, confident answer that is factually wrong, e.g. quoting an incorrect refund policy. This misleads customers and creates a reputational and possibly legal risk, even if the model's overall accuracy score looks high.

Challenge 2: prompt injection. A user can craft an input designed to make the chatbot ignore its instructions, for example to reveal internal data or bypass a safety rule. This is a security risk, not an accuracy problem, so it will not show up in normal accuracy metrics at all.

Marking criteria: 1 mark per correctly identified and explained risk (2 marks), 1 mark for explaining why each risk matters beyond raw training accuracy.

Core (4 marks): The table below shows weekly average vibration and temperature readings from a factory pump monitored by a predictive maintenance model, alongside the model's predicted failure risk.

Week	Vibration (mm/s)	Temperature (°C)	Predicted failure risk
1	2.1	58	4%
2	2.3	59	6%
3	3.8	64	22%
4	5.6	71	58%
5	6.9	76	81%

(a) Identify the trend in the data and state what action the maintenance team should take by week 5.
(b) Explain why a predictive maintenance model like this is still described as solving a "rare-event" problem, even though this pump's own risk clearly increased.

(a) Trend and action. Vibration, temperature and predicted failure risk all rise every week from week 1 to week 5, with the sharpest increases from week 3 onward. The risk crosses a typical "schedule maintenance now" threshold (often around 50 percent) between weeks 3 and 4. By week 5, at 81 percent predicted risk, the team should schedule the pump for immediate maintenance rather than waiting for the next routine shutdown.

(b) Why it is still a rare-event problem. The "rare-event" description applies at the training-data level across the whole fleet of machines a company operates: in any given week, only a small fraction of all pumps ever reach high failure risk, so the historical training data is dominated by the low-risk, non-failing case. This remains true even though one individual pump, once it starts to degrade, can show a clear, obvious rising trend like the one in this table.

Marking criteria: 1 mark for correctly describing the rising trend, 1 mark for a specific recommended action tied to week 5's risk level, 1 mark for identifying that rarity is a fleet-wide training-data property, 1 mark for distinguishing this from one pump's clearly rising individual risk.
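If you want to check the threshold reasoning in part (a), a short sketch like this does it. The risk values are copied from the table in the question; the 0.50 cut-off is the assumed "around 50 percent" figure from the solution, not a fixed industry standard.

```python
# (week, predicted failure risk) copied from the table in the question.
weekly_risk = [(1, 0.04), (2, 0.06), (3, 0.22), (4, 0.58), (5, 0.81)]

MAINTENANCE_THRESHOLD = 0.50  # assumed cut-off, matching the solution's "around 50 percent"


def first_week_over(readings, threshold):
    """Return the first week whose risk is at or above the threshold, else None."""
    for week, risk in readings:
        if risk >= threshold:
            return week
    return None


print(first_week_over(weekly_risk, MAINTENANCE_THRESHOLD))  # 4
```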

Core (5 marks): Compare collaborative filtering and content-based filtering as used in a recommendation system, and explain one weakness of each.

Collaborative filtering recommends items by finding users with similar past behaviour to the target user and suggesting items those similar users liked. It needs no information about the items themselves, only a matrix of user interactions. Weakness: the cold start problem. A brand new user (or a brand new item with no ratings yet) has no interaction history, so the system cannot find "similar users" or "similar items" to base a recommendation on.

Content-based filtering recommends items by comparing an item's own features (genre, description, ingredients, specifications) to items a user has liked before. It works even for a new item as soon as its features are known. Weakness: over-specialisation. Because it only ever suggests items similar to what a user already liked, it struggles to introduce genuinely novel or unexpected items, narrowing the user's exposure over time.

Marking criteria: 1 mark for a correct description of collaborative filtering, 1 mark for a correct description of content-based filtering, 1 mark each for a genuine, distinct weakness of each technique (2 marks), with credit only if the weakness is tied to how that specific technique works.
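Content-based filtering can be sketched using nothing but item features. The catalogue and its tags below are hypothetical, and Jaccard similarity (shared tags divided by all tags) is just one simple way to compare feature sets.

```python
# Hypothetical catalogue: each title is described only by its feature tags.
catalogue = {
    "Dune": {"sci-fi", "epic", "desert"},
    "Alien": {"sci-fi", "horror", "space"},
    "Up": {"animation", "family", "adventure"},
    "Arrival": {"sci-fi", "drama", "space"},  # brand new item with no ratings at all
}


def jaccard(a, b):
    """Shared tags divided by all tags across the two items."""
    return len(a & b) / len(a | b)


def similar_items(liked, catalogue, top_n=1):
    """Rank every other item by feature overlap with an item the user liked."""
    scores = {
        title: jaccard(catalogue[liked], features)
        for title, features in catalogue.items()
        if title != liked
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(similar_items("Alien", catalogue))  # ['Arrival']
```

The new title is recommended even though nobody has rated it, which is the strength described above. The weakness is visible too: a fan of "Alien" will only ever be shown more sci-fi.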

Exam (6 marks): A logistics company installs vibration and temperature sensors on 400 delivery trucks and wants to use predictive maintenance to reduce roadside breakdowns. Design an ML solution. Your answer must name the type of learning, the training data required, one way to handle the rare-event problem, and one MLOps practice needed after deployment.

Type of learning: Supervised learning: either regression (predict days until failure) or binary classification (will this truck fail within the next 14 days), trained on historical sensor readings paired with the dates of past breakdowns.
Training data: Time series of vibration, temperature, engine hours and odometer readings from each truck's sensors, labelled with whether and when a breakdown followed. Data should span multiple truck models and driving conditions so the model generalises across the fleet, not just to one truck type.
Handling the rare-event problem: Because breakdowns are rare relative to normal operation, the training set will be heavily imbalanced. Techniques include class weighting (penalising a missed failure more heavily than a false alarm in the loss function), oversampling breakdown examples, or framing the task as anomaly detection so the model flags sensor patterns that deviate from a truck's normal baseline rather than needing many labelled failure examples.
MLOps practice after deployment: Ongoing monitoring: track the model's predicted-risk distribution and its real-world false negative rate (missed breakdowns) over time, and retrain periodically as trucks age, routes change, or new truck models are added to the fleet, since a model trained on the original 400 trucks will drift as the fleet and conditions change.

Marking criteria: 1 mark for the correct type of learning, 1 mark for realistic and specific training data, 2 marks for a genuine, named technique to handle class imbalance, 2 marks for a specific MLOps practice (not just "the company should monitor it") linked to why it is needed for this system.
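Class weighting, one of the techniques named in the solution, comes down to simple arithmetic. The sketch below uses the common "balanced" heuristic, weight = n_samples / (n_classes × class_count), on a hypothetical dataset of 990 normal truck-weeks and 10 breakdown truck-weeks.

```python
from collections import Counter

# Hypothetical labels: 0 = no breakdown that week, 1 = breakdown.
labels = [0] * 990 + [1] * 10


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


weights = balanced_class_weights(labels)
print(weights[1])            # 50.0
print(round(weights[0], 3))  # 0.505
```

With these weights, one misclassified breakdown counts roughly 99 times as much in the loss as one misclassified normal week, so the model can no longer score well by ignoring the rare class.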

Exam (7 marks): Evaluate the claim that "the hardest part of building an industrial machine learning system is training an accurate model." Support your evaluation using at least two of image recognition, NLP, recommendation systems or predictive maintenance.

This is an EVALUATE question: markers reward a clear judgement supported by contrasted evidence from named applications, not a description of each application in isolation.

Thesis: The claim is largely false for real industrial systems: reaching a high training accuracy is usually achievable with enough labelled data and compute, but keeping a deployed model reliable, fair and useful over time is a harder, ongoing engineering problem that training accuracy alone does not capture.
Evidence 1: predictive maintenance: A model can reach very high training accuracy simply by predicting "no failure" for almost every example, since failures are rare; this training accuracy is misleadingly high and does not reflect real usefulness. The genuinely hard part is designing cost-sensitive evaluation so the model actually catches the rare failures that matter, and then keeping it accurate as machines age and sensor patterns drift, which requires ongoing monitoring and retraining long after the "accurate model" was first trained.
Evidence 2: recommendation systems: A recommender can achieve strong offline accuracy on historical click data yet still fail commercially if it creates filter bubbles or cannot handle the cold start problem for new users and products. Here, "accuracy" on past data is not even the right target: the harder problem is designing an evaluation that reflects long-term customer satisfaction, and building the infrastructure (A/B testing, monitoring engagement over weeks, not just click-through in the moment) to know whether the deployed system is actually working.
Counterpoint: For some tightly scoped image recognition tasks, such as detecting a single well-defined defect type on a stable production line with abundant labelled examples, training an accurate model genuinely is close to the whole problem, since the input distribution barely changes after deployment. This shows the claim is not uniformly false, but it is the exception rather than the rule.
Judgement: Across most industrial ML systems, especially those with rare events, evolving user behaviour, or changing real-world conditions, the ongoing MLOps burden (evaluation design, monitoring, retraining, handling imbalance and drift) outweighs the initial difficulty of reaching a good training accuracy, so the claim should be rejected as an overgeneralisation.

Marker's note: top-band answers (1) state an explicit thesis rather than just listing pros and cons, (2) use accurate, specific detail from at least two named applications rather than generic statements, (3) include a genuine counterpoint before reaching a judgement, and (4) end with an explicit judgement that directly answers the claim.
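The predictive maintenance evidence in this answer rests on a piece of arithmetic worth seeing once. In the sketch below the 2 percent failure rate is hypothetical, and the "model" is deliberately useless: it predicts "no failure" for every machine.

```python
# Hypothetical ground truth: 2 failures among 100 machine-weeks.
actual = [0] * 98 + [1] * 2
# A do-nothing "model" that always predicts no failure.
predicted = [0] * 100

correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)

caught = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
recall = caught / sum(actual)  # share of real failures the model caught

print(accuracy)  # 0.98
print(recall)    # 0.0
```

A 98 percent accuracy figure with zero failures caught is exactly why the answer argues that evaluation design, not raw accuracy, is the harder problem.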