NSW Biology: Syllabus dot point

Inquiry Question 5: Can population genetic patterns be predicted with any accuracy?

Investigate the use of data analysis from a large-scale collaborative project to identify trends, patterns and relationships, for example: the use of population genetics data in conservation management; population genetics studies used to determine the inheritance of a disease or disorder; population genetics relating to human evolution

A focused HSC Biology Module 5 answer on using large-scale collaborative data to find trends in population genetics: allele and genotype frequencies in a gene pool, why pooled data matters, and examples from the 1000 Genomes Project, conservation genetics, disease-allele tracking and human evolution.

Generated by Claude Opus 4.8 | 13 min answer | Updated 2026-06-28

Reviewed by: AI editorial process; not yet individually human-reviewed

Quick answer

Population genetics studies the gene pool of a whole population using allele frequencies (which sum to 1) and genotype frequencies, and these change through selection, genetic drift, gene flow, mutation and non-random mating. Large-scale collaborative projects such as the 1000 Genomes Project pool genetic data from many populations so researchers can compare frequencies and read trends, patterns and relationships. Worked examples include human evolution (diversity falls with distance from Africa, supporting the out-of-Africa model), inherited-disease tracking (carrier frequencies across regions), and conservation (monitoring loss of diversity in a threatened species). Pooling gives larger, more representative, replicable samples that reveal rare alleles and small trends, though sampling bias and ethics are limitations.

What this dot point is asking

NESA wants you to investigate how data from a large-scale collaborative project is analysed to identify trends, patterns and relationships in population genetics. The command word is "investigate", so you are expected to know how the data is gathered and pooled, and crucially how scientists read trends, patterns and relationships out of it - applied to a real context such as human evolution, an inherited disease, or conservation of a threatened species.

This is an applied dot point, not a pure-recall one. Marks reward students who can take frequency data and reason about what it shows: which populations are related, how a disease allele is distributed, or whether a gene pool is shrinking. You should be able to do a simple allele-frequency calculation and interpret a chart or table of frequencies.

The answer

What population genetics is

Population genetics studies the alleles in a whole population rather than in a single individual. The key idea is the gene pool: the total set of all alleles of all genes in every member of a population.

Two quantities describe a gene pool:

Allele frequency - the proportion of all copies of a gene that are a particular allele. For a gene with just two alleles, $A$ and $a$, $\text{freq}(A) + \text{freq}(a) = 1$.
Genotype frequency - the proportion of individuals with a given genotype (e.g. $AA$, $Aa$ or $aa$).

Calculating an allele frequency. Because each diploid individual carries two alleles, you count alleles, not individuals. For a sample of $AA$, $Aa$ and $aa$ individuals:

$$\text{freq}(A) = \frac{(2 \times \text{number of } AA) + (\text{number of } Aa)}{2 \times \text{total individuals}}$$

So in a sample of 100 individuals with 36 $AA$, 48 $Aa$ and 16 $aa$, there are $(2 \times 36) + 48 = 120$ copies of $A$ out of 200 alleles, giving $\text{freq}(A) = 0.60$ and $\text{freq}(a) = 0.40$.
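The counting rule above can be sketched in a few lines of Python. This is only an illustration: the function name `allele_frequencies` and its parameter names are made up for this example, and it assumes a diploid gene with exactly two alleles.

```python
def allele_frequencies(n_AA: int, n_Aa: int, n_aa: int) -> tuple:
    """Return (freq_A, freq_a) from genotype counts in a diploid sample."""
    total_alleles = 2 * (n_AA + n_Aa + n_aa)  # two allele copies per individual
    if total_alleles == 0:
        raise ValueError("sample must contain at least one individual")
    copies_A = 2 * n_AA + n_Aa  # each heterozygote contributes one A
    copies_a = 2 * n_aa + n_Aa  # and one a
    return copies_A / total_alleles, copies_a / total_alleles


# The sample from the text: 36 AA, 48 Aa, 16 aa
freq_A, freq_a = allele_frequencies(36, 48, 16)
print(freq_A, freq_a)  # 0.6 0.4
```

Dividing by `2 * individuals` rather than by the number of individuals is the step the calculation depends on.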

How and why allele frequencies change

A gene pool is not fixed. Allele frequencies shift between generations through:

Natural selection - alleles that aid survival and reproduction become more common.
Genetic drift - random change from chance alone, strongest in small populations.
Gene flow (migration) - individuals moving between populations carry alleles with them, adding variation.
Mutation - the original source of new alleles.
Non-random mating - including inbreeding, which raises homozygosity.

(Hardy-Weinberg gives a formal "no-change" baseline, but the HSC does not require its algebra here - you only need the idea that these forces change frequencies over time.)
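To see why genetic drift is described as strongest in small populations, one option is a small simulation. The sketch below uses a simple Wright-Fisher-style model, which is an assumption brought in for illustration and is not part of the syllabus: each generation, every allele copy is drawn at random according to the previous generation's frequency, with no selection, mutation or migration. The function name `simulate_drift` and the population sizes are hypothetical.

```python
import random


def simulate_drift(pop_size: int, start_freq: float, generations: int, seed: int) -> list:
    """Track freq(A) over time when chance alone picks each generation's alleles."""
    rng = random.Random(seed)       # seeded so the run is repeatable
    n_alleles = 2 * pop_size        # diploid: two allele copies per individual
    freq = start_freq
    history = [freq]
    for _ in range(generations):
        # Each allele copy in the next generation is A with probability freq.
        copies_A = sum(1 for _ in range(n_alleles) if rng.random() < freq)
        freq = copies_A / n_alleles
        history.append(freq)
    return history


small = simulate_drift(pop_size=10, start_freq=0.5, generations=50, seed=1)
large = simulate_drift(pop_size=1000, start_freq=0.5, generations=50, seed=1)

# Largest departure from the starting frequency seen in each run
print(max(abs(f - 0.5) for f in small))
print(max(abs(f - 0.5) for f in large))
```

In a model like this the small population typically wanders much further from 0.5 than the large one, and once an allele reaches a frequency of 0 or 1 it stays there, because drift alone cannot reintroduce a lost allele.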

Large-scale collaborative projects and the value of pooled data

To see real trends you need a lot of data from many populations. No single lab can sequence enough genomes, so researchers run large-scale collaborative projects: many institutions pool genetic data into one shared, often open-access, database.

Why pooling matters:

A much larger sample size makes allele-frequency estimates more reliable and reduces sampling error.
Sampling many populations and regions lets researchers compare frequencies and spot patterns a single sample would miss.
Even rare alleles and small trends become detectable.
Results can be cross-checked (replicated) between contributors, improving reliability.
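The link between sample size and sampling error can be made concrete with a rough sketch. It assumes allele copies are sampled independently at random, so the usual binomial standard error applies; that assumption, the function names, and the three laboratory counts are all invented for illustration and are not real data.

```python
import math


def pooled_allele_frequency(samples):
    """samples: list of (copies_of_allele, individuals_sampled), one per contributor.

    Returns (pooled_frequency, total_allele_copies_examined).
    """
    total_copies = sum(copies for copies, _ in samples)
    total_alleles = 2 * sum(individuals for _, individuals in samples)  # diploid
    return total_copies / total_alleles, total_alleles


def standard_error(freq, total_alleles):
    """Binomial standard error of an allele-frequency estimate."""
    return math.sqrt(freq * (1 - freq) / total_alleles)


# Hypothetical counts from three contributing laboratories (made-up numbers)
labs = [(12, 50), (130, 500), (1180, 5000)]

single_freq, single_alleles = pooled_allele_frequency(labs[:1])
pooled_freq, pooled_alleles = pooled_allele_frequency(labs)

se_single = standard_error(single_freq, single_alleles)
se_pooled = standard_error(pooled_freq, pooled_alleles)
print(se_single, se_pooled)
```

Under these assumptions the pooled estimate has a much smaller standard error than the single small laboratory's estimate, which is the quantitative sense in which a larger sample "reduces sampling error".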

Worked applications - reading trends, patterns and relationships

Human evolution and migration (1000 Genomes / Human Genome Diversity work): Projects that sequenced thousands of people from populations across every continent built a shared catalogue of allele frequencies. The clear trend is that genetic diversity is highest in African populations and decreases with distance from Africa (shown above). The relationship this reveals is that populations further from Africa descend from progressively smaller migrating groups (each carrying a subset of alleles - a "serial founder effect"), which supports the out-of-Africa model. The same datasets revealed the pattern that non-African genomes carry a few percent of Neanderthal DNA, evidence of ancient interbreeding.
Inherited disease: Pooling de-identified genetic-screening data lets a health authority measure the carrier frequency of a disease allele across many regions. The pattern is that certain recessive alleles are concentrated in particular ancestral populations. Unaffected carriers who are far more common than affected individuals support autosomal recessive inheritance, and the regional differences tell health services where to target screening and genetic counselling.
Conservation genetics: For a threatened species, pooled data from monitored populations and captive-breeding records track the number of different alleles and the genotype frequencies over time. A falling allele count signals loss of genetic diversity from drift and inbreeding; managers then choose breeding pairs or translocations that add alleles and reduce inbreeding.

Worked exam answers

Worked example 1 - calculate an allele frequency (3 marks)

Question. In a sample of 200 diploid plants, 98 are $RR$, 84 are $Rr$ and 18 are $rr$. Calculate the frequency of the $r$ allele.

Model answer. Total alleles $= 2 \times 200 = 400$. Copies of $r = (2 \times 18) + 84 = 36 + 84 = 120$. So $\text{freq}(r) = \dfrac{120}{400} = 0.30$ (30 percent).

Marker's note: 1 mark for counting total alleles (400), 1 mark for counting $r$ copies correctly (remembering each heterozygote carries one $r$), 1 mark for the correct final frequency. Counting individuals instead of alleles, or forgetting the heterozygotes, loses marks.
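An optional cross-check, not part of the marking scheme, is to compute the other allele from the same counts and confirm that the two frequencies sum to 1:

$$\text{freq}(R) = \frac{(2 \times 98) + 84}{400} = \frac{280}{400} = 0.70, \qquad \text{freq}(R) + \text{freq}(r) = 0.70 + 0.30 = 1.00$$

If the sum is not 1, the usual cause is a miscounted heterozygote or a denominator of 200 instead of 400.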

Worked example 2 - interpret a trend in pooled data (4 marks)

Question. A collaborative project plots a genetic-diversity index for human populations against their distance from East Africa. The index falls steadily from about 0.92 in African populations to about 0.58 in Indigenous American populations. Identify the trend and explain what it suggests about human origins.

Model answer. The trend is that genetic diversity decreases as distance from East Africa increases. This suggests modern humans originated in Africa and spread outward in waves: each migrating group carried only a subset of the parent population's alleles (a serial founder effect), so populations further from Africa have less diversity. This pattern is strong evidence for the out-of-Africa model of human evolution.

Marker's note: 1 mark for stating the trend (diversity falls with distance), 1 mark for the founder-effect/subset mechanism, 1 mark for linking to an African origin, 1 mark for naming the out-of-Africa model. Describing the graph without explaining the mechanism caps at 2 marks.
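The founder-effect mechanism in the model answer can be illustrated with a toy simulation. This is a deliberately simplified sketch, not a model of real human migration: it assumes a source population in which every allele is equally common, and that each new population is founded by a small random sample of allele copies from the previous one. The function name and all parameter values are hypothetical.

```python
import random


def serial_founder_counts(n_alleles: int, copies_each: int,
                          founders: int, steps: int, seed: int) -> list:
    """Count the distinct alleles remaining after each successive founding event."""
    rng = random.Random(seed)
    # Source population: every allele present in equal numbers (a simplification).
    pool = [allele for allele in range(n_alleles) for _ in range(copies_each)]
    counts = [len(set(pool))]
    for _ in range(steps):
        if founders > len(pool):
            raise ValueError("founding group cannot exceed the population")
        # A small migrating group carries only a random sample of allele copies.
        group = rng.sample(pool, founders)
        counts.append(len(set(group)))
        # The new population grows back from the founders' alleles only.
        pool = group * copies_each
    return counts


counts = serial_founder_counts(n_alleles=50, copies_each=20,
                               founders=40, steps=6, seed=0)
print(counts)  # distinct alleles at the source, then after each founding event
```

Because each new population contains only alleles present in its founders, the count can never rise from one step to the next. That is the same direction as the falling diversity index in the question, although a real diversity index measures more than a simple allele count.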

Worked example 3 - value of pooled collaborative data (3 marks)

Question. Explain why a large-scale collaborative project gives more reliable conclusions about allele frequencies than a single laboratory's study.

Model answer. Pooling data from many laboratories produces a far larger sample size drawn from many populations, so allele-frequency estimates are more representative of the true gene pool and less affected by sampling error. The larger dataset can detect rare alleles and small trends, and results can be cross-checked between contributors, improving reliability and validity. A single small study can be skewed by chance or by one unrepresentative sample.

Marker's note: marks for (1) larger/more representative sample reducing sampling error, (2) detecting rare alleles or small trends, (3) replication across labs. Simply writing "more data is better" without linking to reliability does not earn full marks.

Common traps

Counting individuals instead of alleles: Allele frequency counts allele copies: each diploid individual has two, and each heterozygote contributes one copy of each allele. Dividing by the number of people instead of $2 \times$ people is the most common arithmetic error.
Treating a pattern as proof of cause: A difference in allele frequency between populations is a correlation/pattern, not automatically a cause. NESA rewards students who say the trend "suggests" or "supports" a conclusion and note it needs further investigation.
Confusing allele frequency with genotype frequency: "30 percent of people are $rr$" (a genotype frequency) is not the same as "the $r$ allele frequency is 30 percent". Read which one the question asks for.
Forgetting the value of collaboration: A question about a "large-scale collaborative project" is partly testing why pooling data helps (bigger, more representative, replicable sample). Answers that describe only the biology and ignore the collaborative/data-analysis angle miss marks.
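The allele-versus-genotype trap is easy to see with numbers. The short sketch below reuses the plant counts from worked example 1 (98 $RR$, 84 $Rr$, 18 $rr$); the variable names are arbitrary.

```python
counts = {"RR": 98, "Rr": 84, "rr": 18}
individuals = sum(counts.values())  # 200 plants

# Genotype frequency: proportion of INDIVIDUALS that are rr
genotype_freq_rr = counts["rr"] / individuals

# Allele frequency: proportion of ALLELE COPIES that are r
allele_freq_r = (2 * counts["rr"] + counts["Rr"]) / (2 * individuals)

print(genotype_freq_rr, allele_freq_r)  # 0.09 0.3
```

Only 9 percent of the plants are $rr$, yet the $r$ allele frequency is 30 percent, because most copies of $r$ sit in heterozygotes.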

Exam technique

Show the allele count: In a frequency calculation, always write the total number of alleles ($2 \times$ individuals) and the count of the target allele before the final fraction - markers award method marks for the working, not just the answer.
Name the trend, pattern AND relationship: The dot point uses all three words. In a data question, state the trend (direction over the axis), the pattern (which groups differ) and the relationship (what it implies), and link to the named context (evolution, disease or conservation).
Use "suggests" / "supports": Frequency data is evidence, not proof. Cautious, evidence-based language ("the data supports the out-of-Africa model") reads as Band 6.
Address the collaboration explicitly: For "investigate the use of a large-scale collaborative project", say why pooled data is more reliable (sample size, representativeness, replication) and, for "discuss/evaluate", give a limitation (sampling bias, ethics/privacy, inconsistent methods).
Use precise vocabulary: Gene pool, allele frequency, genotype frequency, genetic drift, gene flow, founder effect, carrier frequency, out-of-Africa model. Precise terms separate Band 6 from Band 4.

Practice questions

Original practice questions graded from foundation to exam level, each with a full worked solution. Try each question before reading its solution.

Foundation (2 marks). Define the terms 'gene pool' and 'allele frequency' as used in population genetics.

Worked solution

1 mark - gene pool. The gene pool is the total collection of all alleles of all genes present in every member of a population at a given time.

1 mark - allele frequency. The allele frequency is the proportion (or percentage) of a particular allele among all the alleles of that gene in the population.

Both definitions must refer to a population (not an individual) to earn the mark. Saying allele frequency is "how common a gene is" without referencing the proportion of alleles is too vague for the mark.

Foundation (3 marks). A population of 100 diploid butterflies is sampled for a wing-colour gene with two alleles, B and b. The counts are 36 BB, 48 Bb and 16 bb. Calculate the frequency of the B allele and the frequency of the b allele, showing your working.

Worked solution

1 mark - count the alleles: Each diploid individual carries two alleles, so the population has $200$ alleles in total. Number of $B$ alleles $= (2 \times 36) + 48 = 72 + 48 = 120$. Number of $b$ alleles $= (2 \times 16) + 48 = 32 + 48 = 80$.
1 mark - frequency of B: $\text{freq}(B) = \dfrac{120}{200} = 0.60$ (60 percent).
1 mark - frequency of b: $\text{freq}(b) = \dfrac{80}{200} = 0.40$ (40 percent).

The two frequencies must sum to $1.0$; a quick check ($0.60 + 0.40 = 1.0$) confirms the working. A common slip is to count individuals instead of alleles - the heterozygotes (Bb) contribute one allele of each type.

Foundation (2 marks). Explain why pooling data from many laboratories in a large-scale collaborative project gives more reliable allele-frequency estimates than a single small study.

Worked solution

1 mark - larger and more representative sample. Pooling data produces a much larger sample size drawn from many populations and regions, so the allele-frequency estimates are more representative of the true gene pool and less affected by chance sampling error.

1 mark - reliability / validity. A larger, shared dataset lets researchers detect smaller or rarer trends, cross-check (replicate) results between labs, and reduces the influence of any one biased or unusual sample, improving the reliability and validity of the conclusions.

The mark is for linking sample size to reliability, not just stating "more data is better".

Core (4 marks). Describe how a large-scale collaborative project such as the 1000 Genomes Project can be used to identify trends, patterns and relationships in allele frequencies between human populations.

Worked solution

Award up to 4 marks for a description that names the project's approach AND links it to trends, patterns and relationships.

Pooled sequencing (1 mark): The project sequenced the genomes of thousands of people from many populations across several continents and compiled the data into a single shared, open database.
Allele-frequency catalogue (1 mark): For each genetic variant, the project records the frequency of each allele in each population group, building a catalogue that can be compared across populations.
Identifying patterns (1 mark): Comparing these frequencies reveals patterns - for example, some alleles are common in one continental group but rare in another, while most variation is shared by all groups.
Relationships and trends (1 mark): These patterns reveal relationships between populations (which groups are more closely related) and trends such as decreasing genetic diversity with distance from Africa, supporting the out-of-Africa model of human migration.

A response that only describes sequencing without linking it to comparing frequencies between populations caps at 2 marks.

Core (5 marks). A conservation team monitors a small, isolated population of an endangered marsupial. Genetic testing over 40 years shows the number of different alleles in the population falling steadily, and the proportion of individuals homozygous for a harmful recessive allele rising. Analyse what these trends indicate about the population and explain why pooled, collaborative genetic data is valuable for its management.

Worked solution

Falling allele number = loss of genetic diversity (1-2 marks): A steady fall in the number of different alleles shows the population is losing genetic diversity (its gene pool is shrinking). In a small, isolated population this is expected from genetic drift (random loss of alleles each generation) and inbreeding, because there is no gene flow from outside.
Rising homozygosity for the harmful allele (1-2 marks): A rising proportion of homozygous-recessive individuals indicates increasing inbreeding, which brings two copies of harmful recessive alleles together more often. This raises the rate of the genetic disorder and lowers survival and fertility - reducing the population's chance of long-term survival.
Value of collaborative data (1-2 marks): Pooled genetic data across multiple monitored populations (and captive-breeding records) lets managers compare diversity between populations, identify which individuals carry the rarest alleles, and choose mating pairs or translocations that add new alleles and reduce inbreeding. Sharing data also allows trends to be detected earlier and more reliably than any single small study could.

A top response names genetic drift and inbreeding, links them to the two trends, and connects the management decision to restoring genetic diversity.

Exam (6 marks). A health authority pools de-identified genetic-screening data from clinics across several regions to track the frequency of a recessive allele that causes an inherited blood disorder. The combined dataset shows the carrier frequency is markedly higher in some regions than others. (a) Explain how analysing this pooled data helps identify inheritance patterns of the disorder. (b) Discuss the benefits and one limitation of using a large-scale collaborative dataset for this purpose.

Worked solution

Target a sequenced response that uses the data to reason about inheritance AND weighs the value of the collaborative approach.

(a) Identifying the inheritance pattern (2-3 marks): Because the dataset records carrier (heterozygote) and affected (homozygous-recessive) frequencies across many people, it can be compared with what autosomal recessive inheritance predicts: carriers are unaffected, affected individuals are much rarer than carriers, and two carriers have a $\tfrac{1}{4}$ chance of an affected child at each pregnancy. Data that fit these predictions support autosomal recessive inheritance. The regional differences in carrier frequency reveal a pattern - the allele is concentrated in certain ancestral populations, allowing the disorder's distribution to be mapped and predicted (for example, who to offer screening to).
(b) Benefits (1-2 marks): A large pooled dataset gives a big, representative sample, so even a rare allele is detected reliably; it lets authorities compare regions, target screening and genetic counselling, and replicate findings across clinics, improving reliability.
(b) One limitation (1 mark): A valid limitation, for example: sampling bias (only people who attended screening are included, which may not represent the whole population), privacy/ethical concerns with combining personal genetic data, or inconsistent testing methods between clinics making data harder to compare.

Full marks need the inheritance pattern named as autosomal recessive in (a), at least one clear benefit AND one valid limitation in (b).
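The $\tfrac{1}{4}$ figure for two carriers comes from listing the four equally likely allele combinations, as in a Punnett square. A minimal sketch of that enumeration follows; the function name `cross` and the allele letters are illustrative, and it assumes a single gene with each parent passing on either of its two alleles with equal probability.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent1: str, parent2: str) -> dict:
    """Offspring genotype probabilities for one gene, e.g. cross("Aa", "Aa")."""
    # Pair every allele from parent 1 with every allele from parent 2.
    # Sorting puts the uppercase (dominant) letter first, so "aA" becomes "Aa".
    offspring = Counter("".join(sorted(pair)) for pair in product(parent1, parent2))
    total = sum(offspring.values())
    return {genotype: Fraction(n, total) for genotype, n in offspring.items()}


carrier_cross = cross("Aa", "Aa")
print(carrier_cross["aa"])  # 1/4: chance of an affected (homozygous recessive) child
print(carrier_cross["Aa"])  # 1/2: chance of an unaffected carrier
```

The same function shows that a carrier and a non-carrier (`cross("Aa", "AA")`) cannot have an affected child, which is why population-level carrier frequency matters for targeting screening.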

Exam (7 marks). Evaluate the use of data analysis from large-scale collaborative projects to identify trends, patterns and relationships in population genetics. In your answer, refer to at least two different applications (for example human evolution, inherited disease, or conservation).

Worked solution

"Evaluate" requires a judgement supported by evidence weighed on both sides. A Band 6 response reaches a clear conclusion, not just a list of examples.

What the approach does (1 mark): Large-scale collaborative projects pool genetic data from many individuals and populations into shared databases, letting researchers measure and compare allele and genotype frequencies to find trends, patterns and relationships no single study could see.
Application 1 - strengths (1-2 marks): In human evolution and migration, projects such as the 1000 Genomes Project show genetic diversity is highest in African populations and decreases with distance from Africa, supporting the out-of-Africa model and mapping ancient migration and interbreeding (for example with Neanderthals). The huge, shared sample makes these relationships statistically robust.
Application 2 - strengths (1-2 marks): In inherited disease and conservation, pooled data tracks carrier-allele frequencies across regions (targeting screening and counselling) and monitors loss of diversity in endangered species (guiding breeding to reduce inbreeding). Pooling detects rare alleles and small trends reliably.
Limitations / judgement (2-3 marks): Limitations include sampling bias (some populations are over- or under-represented), ethical and privacy issues with shared genetic data, inconsistent methods between contributors, and the fact that frequency data shows correlation, not always cause. A supported conclusion: on balance the approach is highly valuable and largely justified - the scale and shared, replicable data give insights impossible from small studies - provided sampling bias and ethical safeguards are managed. An answer that lists applications without an explicit weighed judgement caps at 5 marks.