ExamExplained
NSW · Software Engineering

Inquiry Question 2: How are machine learning systems used to develop solutions?

Identify the ethical implications of automation and artificial intelligence, including accountability, transparency, employment effects and the use of personal data

A focused answer to the HSC Software Engineering Module 3 dot point on AI ethics. It covers accountability, transparency, employment effects and personal data, real cases (COMPAS, Amazon hiring, Robodebt), a worked example, and the traps markers look for.

Reviewed by: AI editorial process; not yet individually human-reviewed


What this dot point is asking

NESA wants you to identify the major ethical concerns raised by automated decision-making, and to be able to discuss them with reference to real cases. You should know at least two cases in detail, and be ready to connect named ethical principles (accountability, transparency, fairness, privacy) to specific evidence from those cases rather than discussing ethics in the abstract.

The answer

The big ethical concerns

Accountability and redress.

When an automated decision is wrong, who is responsible? If a self-driving car causes a crash, is it the manufacturer, the software supplier, the safety driver, or the company that operates the fleet? Affected individuals need a clear avenue to complain and to be made whole.

Transparency and explainability.

Many ML systems are black boxes. A model can output "refuse this loan" without anyone being able to explain why. The European Union's GDPR restricts solely automated decisions that significantly affect people and is widely read as giving them a right to an explanation; Australian law is moving in the same direction.

Bias and fairness.

A model trained on biased data perpetuates bias at scale. See the dot point on training data and bias; the case studies below give examples.

Privacy and use of personal data.

ML systems are trained on personal data. Customers, patients and citizens may not have consented to that use, or even know it is happening. The General Data Protection Regulation (GDPR) and Australia's Privacy Act set baselines: lawful purpose, minimisation, retention limits, deletion rights.

Surveillance.

Facial recognition, gait analysis and behaviour prediction enable monitoring at scale. The Australian Human Rights Commission and the Office of the Australian Information Commissioner have both warned about facial recognition deployment.

Employment effects.

Automation displaces some jobs and creates others. Truck drivers, radiologists, call centre staff and translators all face changing labour markets. Industries and governments share responsibility for managing the transition (retraining, social safety nets).

Concentration of power.

A handful of companies train the largest models. Concentrated technical capability becomes concentrated economic and political power.

[Figure: An automated decision pipeline with six ethical checkpoints. Personal data collected from citizens (privacy and consent) feeds a model trained on historical data (bias and fairness), which produces an automated decision (transparency) that affects a real person (accountability and contestability); outcomes retrain the model. Employment effects and concentration of power are flagged as system-wide concerns.]

Case studies

Robodebt (Australia, 2016-2020).

The Australian government deployed an automated system to identify welfare debt by comparing self-reported income to ATO data, averaging annual income across pay periods. The averaging method produced false debts where income was lumpy. ~470,000 false debts totalling over $1.7 billion were issued. The Federal Court found the scheme unlawful in 2019. A Royal Commission in 2023 found "venality, incompetence and cowardice". The government paid $1.8 billion in settlement. Lessons: human-in-the-loop for consequential decisions, no reverse onus of proof, external audit before deployment, deliberate decision-making about averaging assumptions.
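The averaging flaw is easiest to see with a little arithmetic. The sketch below is illustrative only: the benefit rule, the constants `INCOME_FREE_AREA`, `TAPER` and `BASE_BENEFIT`, and the worker's pay pattern are all invented for this example and are not the real Centrelink rules. It compares what a person was correctly paid, based on their actual fortnightly income, with what averaging assumes they were entitled to.

```python
# Hypothetical illustration of income averaging. All figures are invented and
# the benefit rule is deliberately simplified (not the real Centrelink rules).

FORTNIGHTS = 26
INCOME_FREE_AREA = 300.0  # assumed: earnings allowed per fortnight before the benefit reduces
TAPER = 0.5               # assumed: benefit drops 50c per dollar earned above the free area
BASE_BENEFIT = 500.0      # assumed: full fortnightly benefit


def entitlement(fortnight_income: float) -> float:
    """Benefit payable for one fortnight under the simplified rule."""
    reduction = max(0.0, fortnight_income - INCOME_FREE_AREA) * TAPER
    return max(0.0, BASE_BENEFIT - reduction)


# A casual worker earns $2,600 a fortnight for 10 fortnights, then nothing for
# 16 fortnights, and claims the benefit only while out of work.
actual_income = [2600.0] * 10 + [0.0] * 16
claimed = [False] * 10 + [True] * 16

# Correct payment: assessed on the real income in each claimed fortnight.
paid = sum(entitlement(inc) for inc, c in zip(actual_income, claimed) if c)

# Averaging: annual income is spread evenly across every fortnight.
averaged = sum(actual_income) / FORTNIGHTS
assessed = sum(entitlement(averaged) for c in claimed if c)

false_debt = paid - assessed
print(f"Averaged fortnightly income: ${averaged:,.2f}")
print(f"Correctly paid: ${paid:,.2f}; entitlement under averaging: ${assessed:,.2f}")
print(f"Debt raised by averaging: ${false_debt:,.2f}")
```

Under these made-up numbers the person was paid correctly in every fortnight they claimed, yet averaging treats them as having earned $1,000 in each of those fortnights and raises a debt of $5,600 that they do not owe.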

COMPAS recidivism scoring (US, 2016).

Northpointe's COMPAS algorithm gave US courts a risk score for criminal defendants. ProPublica's 2016 analysis found Black defendants were nearly twice as likely as white defendants to be incorrectly labelled high risk, while white defendants were more likely to be incorrectly labelled low risk. The case became a major catalyst for research into algorithmic fairness.

Amazon hiring tool (US, 2018).

Amazon trained a model on a decade of CVs submitted to the company, mostly from men. The model learned to penalise CVs that contained the word "women's" (as in "women's chess club captain") and to downrank graduates of women-only colleges. Amazon scrapped the project. Lessons: training on biased historical data reproduces bias, and simply removing the protected attribute does not help because proxies leak it.
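Proxy leakage can be shown with a toy example. The sketch below is not Amazon's model: it is a deliberately simple keyword scorer, and the `history` data is hand-made for illustration. There is no gender field anywhere in the data, yet a keyword that appears only on women's CVs ends up with a low learned score.

```python
from collections import defaultdict

# Hypothetical, hand-made historical data: (CV keywords, was_hired).
# There is no gender column, but "women's ..." keywords act as a proxy for it.
history = [
    ({"python", "chess club"}, True),
    ({"java", "rugby"}, True),
    ({"python", "rugby"}, True),
    ({"java", "chess club"}, True),
    ({"python", "women's chess club"}, False),
    ({"java", "women's college"}, False),
    ({"python", "women's college"}, True),
    ({"java", "women's chess club"}, False),
]


def learn_keyword_scores(rows):
    """Score each keyword by the historical hire rate of CVs containing it."""
    hired = defaultdict(int)
    seen = defaultdict(int)
    for keywords, was_hired in rows:
        for kw in keywords:
            seen[kw] += 1
            hired[kw] += int(was_hired)
    return {kw: hired[kw] / seen[kw] for kw in seen}


scores = learn_keyword_scores(history)


def score_cv(keywords):
    """Average the learned keyword scores; unseen keywords are ignored."""
    known = [scores[kw] for kw in keywords if kw in scores]
    return sum(known) / len(known) if known else 0.0


# Two CVs with the same skill and the same activity; only the proxy differs.
cv_a = {"python", "chess club"}
cv_b = {"python", "women's chess club"}
print(f"CV A: {score_cv(cv_a):.3f}  CV B: {score_cv(cv_b):.3f}")
```

The scorer never sees gender, but it ranks the second CV lower because the biased hiring history taught it that the proxy keyword predicts rejection.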

Apple Card credit limits (US, 2019).

The Apple Card, issued by Goldman Sachs, was reported to give some women lower credit limits than their husbands despite shared finances and equivalent histories. Goldman could not explain the individual decisions. The case prompted a regulator investigation, which ultimately found that the bank had not violated fair lending laws, but the inability to explain the decisions revealed how opaque such systems can be.

Clearview AI facial recognition (global).

Clearview scraped 3 billion images from social media without consent and sold facial recognition to law enforcement. Australia's Information Commissioner ruled in 2021 that Clearview had breached the Privacy Act and ordered it to stop collecting data on Australians. Multiple regulators in the UK, Italy, France and Canada have made similar findings.

Generative AI and content (current).

LLMs train on web-scale text that includes copyrighted works without consent. Image generators do the same with art. Lawsuits and regulatory action are ongoing. Workers in writing, illustration, voice acting and translation face direct labour-market effects.

Principles for responsible deployment

The Australian Government's AI Ethics Principles (2019) and the OECD AI Principles (2019) converge on roughly the same list:

  1. Human, societal and environmental wellbeing is the primary goal.
  2. Human-centred values: respect human rights, diversity and individual autonomy.
  3. Fairness: avoid unfair discrimination.
  4. Privacy protection and security.
  5. Reliability and safety.
  6. Transparency and explainability.
  7. Contestability: people can challenge decisions.
  8. Accountability: responsibility is identifiable.

A worked code example: a fairness audit

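import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix

# Placeholder 0/1 labels so the audit runs end to end. In practice `actuals`
# is the known ground truth and `predictions` comes from the model under audit.
rng = np.random.default_rng(0)
actuals = rng.integers(0, 2, size=200)
predictions = rng.integers(0, 2, size=200)

audit = pd.DataFrame({
    "group": ["F"] * 100 + ["M"] * 100,
    "predicted": predictions,  # from the model
    "actual": actuals,         # known ground truth
})

for group, sub in audit.groupby("group"):
    # labels=[0, 1] guarantees a 2x2 matrix even if a group is missing a class
    tn, fp, fn, tp = confusion_matrix(sub["actual"], sub["predicted"], labels=[0, 1]).ravel()
    fpr = fp / (fp + tn)  # share of actual negatives wrongly predicted positive
    fnr = fn / (fn + tp)  # share of actual positives wrongly predicted negative
    print(f"{group}: false positive rate={fpr:.2f}, false negative rate={fnr:.2f}")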

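As a working rule for this audit, a gap of 5 percentage points or more in false positive rate between groups is grounds to halt deployment, investigate the source, and remediate. There is no universally agreed threshold, so the figure should be set and justified for each system.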
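The 5 percentage-point check can be turned into an automated gate. The following is a minimal sketch in plain Python, assuming 0/1 labels where 1 is the positive prediction; the helper names `false_positive_rate` and `fpr_gap`, the `THRESHOLD` constant and the sample labels are all invented for illustration.

```python
def false_positive_rate(actual, predicted):
    """FPR = FP / (FP + TN), computed over the cases where actual == 0."""
    negatives = [p for a, p in zip(actual, predicted) if a == 0]
    if not negatives:
        raise ValueError("no actual negatives in this group")
    return sum(negatives) / len(negatives)


def fpr_gap(groups):
    """Return (largest FPR difference between any two groups, per-group rates).

    `groups` maps a group name to a pair of lists: (actual, predicted).
    """
    rates = {name: false_positive_rate(a, p) for name, (a, p) in groups.items()}
    return max(rates.values()) - min(rates.values()), rates


THRESHOLD = 0.05  # the 5 percentage-point gap, expressed as a proportion

# Hypothetical labels: each group has 10 actual negatives and 10 actual positives.
groups = {
    "F": ([0] * 10 + [1] * 10, [1, 1, 1] + [0] * 7 + [1] * 10),  # 3 of 10 negatives flagged
    "M": ([0] * 10 + [1] * 10, [1] + [0] * 9 + [1] * 10),        # 1 of 10 negatives flagged
}

gap, rates = fpr_gap(groups)
needs_review = gap > THRESHOLD
print(rates, f"gap={gap:.2f}", "HALT AND INVESTIGATE" if needs_review else "within threshold")
```

Here the gap is 20 percentage points, so the gate flags the model for review. A check like this only detects a disparity; finding its cause still needs human investigation.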

Exam-style practice questions

Practice questions written in the style of NESA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

2025 HSC (6 marks): Discuss the ethical implications of using a machine learning system to make decisions about people, with reference to one real-world case study.

Machine learning systems making decisions about people raise three big ethical issues.

Accountability: when an automated decision is wrong, who is responsible? The developer who wrote the model, the company that deployed it, or the user who relied on it? Without clear accountability, harmed individuals cannot seek redress.

Transparency: many ML models are opaque. The model can predict a refusal without anyone being able to explain why. Where decisions affect people's lives (a loan refusal, a parole denial, a job rejection), the inability to explain the decision violates the right to a reasoned outcome.

Bias and harm: a model trained on biased data perpetuates bias at scale. One biased human decision affects one person; a biased automated decision affects everyone the system assesses.

Case study: Robodebt (Australia, 2016-2020). Centrelink deployed an automated system to identify welfare debts by comparing reported income to ATO data, averaging annual income across pay periods. The system issued ~470,000 incorrect debt notices for over $1.7 billion. People received debts they did not owe, with little recourse and no human verification. The system was found unlawful, a Royal Commission found "venality, incompetence and cowardice", and the government settled for $1.8 billion.

Lessons: human-in-the-loop review for consequential decisions, the burden of proof should not shift to the affected individual, transparency about how the system works is essential, and external audit before deployment to high-stakes domains.

Markers reward at least two distinct ethical concerns, a named real-world case (Robodebt, COMPAS, Amazon hiring, Apple Card credit limits, Clearview AI), and at least one mitigation principle.

Practice questions

Original practice questions graded from foundation to exam level, each with a full worked solution. Try them before revealing the solution.

Foundation (3 marks): Define 'accountability' and 'transparency' as ethical concerns in automated decision-making, and explain why a system can satisfy one without satisfying the other.

Accountability is being able to identify who is responsible when an automated decision causes harm or is wrong, so the affected person has someone to hold answerable and a path to redress.

Transparency is being able to explain why a system produced a particular decision, in terms a person can understand.

A system can satisfy one without the other: a company might clearly accept legal accountability (e.g. "we are responsible for our model's decisions") while still using a black-box model that no one, including the company, can actually explain, so an affected individual knows WHO to complain to but not WHY the decision was made. Conversely, a fully explainable rule-based system with no clear ownership could tell a person exactly why they were rejected, but leave them with no one accountable to appeal to.

Marking criteria: 1 mark for a correct definition of accountability, 1 mark for a correct definition of transparency, 1 mark for a valid explanation of how the two can come apart (a specific example scenario, not just restating the definitions).

Foundation (3 marks): Name the case study in which a government automated income-averaging system issued approximately 470,000 incorrect debt notices, and state one specific safeguard that could have prevented the harm.

The case is Robodebt (Australia, 2016-2020).

One valid safeguard (any one, fully explained):

  • Human-in-the-loop verification: requiring a human caseworker to check and approve each debt notice before it was sent, rather than fully automating the decision, would have caught cases where income averaging produced an incorrect result for someone with irregular pay.
  • No reverse onus of proof: not shifting the burden onto the individual to disprove an automatically generated debt would have prevented people being pursued for debts the agency itself could not properly substantiate.
  • External audit before deployment: independently testing the income-averaging method against real pay patterns before rollout would likely have revealed it produced false positives for people with lumpy income.

Marking criteria: 1 mark for correctly naming Robodebt, 1 mark for a valid, specific safeguard, 1 mark for clearly explaining how that safeguard would have prevented the specific harm described (false debts from income averaging).

Core (5 marks): The table below summarises a fairness audit comparing a loan-approval model's false positive rate (incorrectly denying a creditworthy applicant) and false negative rate (incorrectly approving a risky applicant) across two applicant groups.

| Group | False positive rate | False negative rate |
|---|---|---|
| Group X | 0.06 | 0.09 |
| Group Y | 0.21 | 0.08 |

(a) Identify the disparity revealed by this audit.
(b) Explain why this disparity is an ethical concern even if the model's OVERALL accuracy is high.
(c) Suggest one action the deploying company should take.

(a) The disparity
Group Y has a much higher false positive rate (0.21) than Group X (0.06), a gap of 0.15. This means creditworthy applicants in Group Y are denied loans they should have received far more often than creditworthy applicants in Group X, while the false negative rates are similar across both groups.
(b) Why this matters despite high overall accuracy
Overall accuracy averages across the whole population and can hide large disparities between subgroups; a model can be 90 percent accurate overall while systematically disadvantaging one group. This is unfair (a form of bias) because an individual's outcome depends partly on their group membership rather than solely on their actual creditworthiness, which may also breach anti-discrimination obligations even without any intent to discriminate.
(c) Recommended action
The company should halt or pause deployment for Group Y, investigate the source of the disparity (e.g. biased historical training data or a proxy feature correlated with group membership), and remediate the model (retraining on more representative or reweighted data, or applying a fairness constraint) before redeploying, then re-audit before resuming.

Marking criteria: 1 mark for correctly identifying the false positive rate gap, 1 mark for correctly noting the false negative rates are similar, 2 marks for a full explanation of why subgroup disparity is an ethical concern despite good overall accuracy, 1 mark for a specific, actionable remediation step (not just "fix the bias").
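The point in part (b) can be checked with arithmetic. The sketch below uses the error rates from the audit table, but the group sizes in `applicants` and the `creditworthy_share` value are assumptions made up for this illustration; the question does not give them.

```python
# Error rates come from the audit table. Group sizes and the share of
# creditworthy applicants are assumptions invented for this illustration.
rates = {"X": {"fpr": 0.06, "fnr": 0.09}, "Y": {"fpr": 0.21, "fnr": 0.08}}
applicants = {"X": 9000, "Y": 1000}  # assumed: Group Y is a minority of applicants
creditworthy_share = 0.8             # assumed: the same in both groups

total_errors = 0.0
accuracy = {}
for group, n in applicants.items():
    creditworthy = n * creditworthy_share  # applicants who should be approved
    risky = n - creditworthy               # applicants who should be denied
    wrongly_denied = rates[group]["fpr"] * creditworthy
    wrongly_approved = rates[group]["fnr"] * risky
    errors = wrongly_denied + wrongly_approved
    accuracy[group] = 1 - errors / n
    total_errors += errors
    print(f"Group {group}: accuracy={accuracy[group]:.3f}, wrongly denied={wrongly_denied:.0f}")

overall_accuracy = 1 - total_errors / sum(applicants.values())
print(f"Overall accuracy: {overall_accuracy:.3f}")
```

Under these assumed numbers the model is about 92 percent accurate overall, while Group Y's accuracy is about 82 percent and its creditworthy applicants are wrongly denied at 3.5 times the rate of Group X. Because Group Y is small, its errors barely move the headline figure.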

Core (5 marks): Compare the ethical failures in the Robodebt case and the Amazon hiring tool case. In your answer, identify one ethical concern common to both cases and one way in which the two cases differ.

Common concern: lack of a human check before harm occurred. In Robodebt, automated debt notices were issued to citizens without a caseworker verifying the income-averaging calculation first. In the Amazon hiring tool, resumes were scored and effectively filtered without a recruiter checking whether the model's downranking of "women's" terms was legitimate. In both cases, an automated output was allowed to directly affect people (a debt demand; a rejected application) without a mandatory human review step that could have caught the specific flaw before it caused harm.

A key difference. Robodebt's harm came from a flawed AVERAGING METHOD applied uniformly (a design/logic flaw that produced false debts for anyone with irregular income, not tied to protected characteristics), whereas the Amazon tool's harm came from LEARNED BIAS in historical training data (a decade of mostly-male hires), which specifically and systematically disadvantaged women, a protected-characteristic-linked harm. Robodebt was ultimately found unlawful and cost the government $1.8 billion in settlement; Amazon internally identified the bias during development and cancelled the project before full deployment, so it did not cause the same scale of public harm.

Marking criteria: 1 mark for correctly identifying a valid common concern, 2 marks for explaining it clearly with reference to both cases, 1 mark for identifying a valid difference, 1 mark for explaining that difference with reference to both cases.

Exam (7 marks): A hospital is considering an automated triage system that scores incoming emergency patients on urgency using a model trained on ten years of past triage records. Discuss the ethical implications of deploying this system, referring to at least THREE distinct ethical concerns and at least one real-world case study covered in this topic.

This is a DISCUSS question worth 7 marks: markers want breadth (multiple concerns), depth (specific reasoning), and a case study used as evidence, not decoration.

Concern 1: Bias from historical training data
Ten years of past triage records may reflect historical under-triage of certain groups (e.g. patients from communities with language barriers, or historically under-diagnosed conditions in women). A model trained on this data would learn and automate that same pattern at scale. This mirrors the Amazon hiring tool case, where a model trained on a decade of biased historical outcomes reproduced and scaled that bias rather than correcting it, even though no one intended it to discriminate.
Concern 2: Accountability when a triage score is wrong
If a patient is under-triaged and deteriorates while waiting, who is responsible: the developers, the hospital, or the triage nurse who deferred to the score? Clear accountability matters enormously in a life-or-death context, where the consequences of an unresolved "who is responsible" question are far more severe than in a typical commercial application.
Concern 3: Transparency and human-in-the-loop review
Nursing staff need to understand why a score was assigned to exercise appropriate clinical judgement rather than blindly following an opaque number. As with Robodebt, where automated debt decisions were issued with no human verification and no explanation to affected people, an unexplained and unreviewed triage score risks harming patients whose case does not fit the pattern the model learned, especially rare or atypical presentations.
Judgement
These concerns do not mean the system should not be built, but that it must be deployed as a decision-support tool, advisory only, with a human clinician as the accountable decision-maker, subgroup fairness audits performed before and after deployment, and clear escalation and override procedures, directly reflecting the lessons of Robodebt (human-in-the-loop, no automation of consequential decisions without verification) and the Amazon case (audit training data for historical bias before trusting model outputs).

Marking criteria: 1 mark per correctly identified and explained ethical concern relevant to this scenario (up to 3 concerns, 3 marks), 1 mark for accurately applying the Amazon case, 1 mark for accurately applying the Robodebt case, 1 mark for a specific mitigation tied to the scenario (human-in-the-loop, subgroup audit), 1 mark for an overall reasoned judgement rather than a list.
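The "advisory only, with a human clinician as the accountable decision-maker" design in the judgement can be sketched as a data structure. Everything here is hypothetical: the `TriageCase` fields, the `finalise` and `is_actionable` helpers, and the sample values are invented to illustrate the idea and do not describe any real hospital system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TriageCase:
    patient_id: str
    model_score: float                        # advisory urgency score in [0, 1]
    model_reason: str                         # short explanation shown to the clinician
    clinician_category: Optional[int] = None  # final category, set only by a human
    decided_by: Optional[str] = None          # the accountable clinician


def finalise(case: TriageCase, clinician: str, category: int) -> TriageCase:
    """Record the clinician's decision; the model score never finalises a case."""
    case.clinician_category = category
    case.decided_by = clinician
    return case


def is_actionable(case: TriageCase) -> bool:
    """A case may proceed only once a named clinician has decided it."""
    return case.clinician_category is not None and case.decided_by is not None


case = TriageCase("P-001", model_score=0.35, model_reason="vital signs within normal range")
blocked_before_review = not is_actionable(case)  # model output alone is not a decision

# The clinician sees the score and its reason, disagrees, and overrides it.
finalise(case, clinician="RN Lee", category=2)
print(blocked_before_review, is_actionable(case), case.decided_by)
```

Keeping the model's score and the clinician's decision in separate fields means every outcome has a named person attached to it, and an override is an ordinary action rather than an exception.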

Core (4 marks): Explain why the Clearview AI case (global) is primarily a privacy and consent issue rather than a bias issue, and state one specific action a regulator can take in response.

Why this is primarily a privacy and consent issue. Clearview AI scraped approximately 3 billion images from social media and other public websites without asking the people in those images for permission, and without any of them knowing their photo had been collected. The core wrong is not that the facial recognition matching was inaccurate or unfairly discriminatory between groups (a bias/fairness issue); it is that personal biometric data was collected and commercialised (sold to law enforcement) without lawful basis or consent, which is a privacy issue regardless of how accurate the matching later turned out to be.

Regulator action. Australia's Information Commissioner ruled in 2021 that Clearview breached the Privacy Act and ordered it to stop collecting data on Australians and to destroy existing data collected unlawfully. Equivalent regulators in the UK, Italy, France and Canada issued comparable findings and orders, showing this was treated as a data protection breach, not a bias/accuracy complaint.

Marking criteria: 1 mark for correctly identifying the absence of consent/lawful basis as the core issue, 1 mark for explicitly distinguishing this from a bias/fairness concern, 1 mark for a specific, accurate regulator action (Australian Information Commissioner ruling/order), 1 mark for noting the scale or multi-jurisdictional nature of the finding.
