Inquiry Question 5: Can population genetics be used to determine inheritance patterns in a population?
Investigate the use of data analysis from a large-scale collaborative project to identify trends, patterns and relationships, for example: the use of population genetics data in conservation management; population genetics studies used to determine the inheritance of a disease or disorder; population genetics relating to human evolution
A focused HSC Biology Module 5 answer on using large-scale collaborative data to find trends in population genetics: allele and genotype frequencies in a gene pool, why pooled data matters, and examples from the 1000 Genomes Project, conservation genetics, disease-allele tracking and human evolution.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this dot point is asking
NESA wants you to investigate how data from a large-scale collaborative project is analysed to identify trends, patterns and relationships in population genetics. The command word is "investigate", so you are expected to know how the data is gathered and pooled, and crucially how scientists read trends, patterns and relationships out of it - applied to a real context such as human evolution, an inherited disease, or conservation of a threatened species.
This is an applied dot point, not a pure-recall one. Marks reward students who can take frequency data and reason about what it shows: which populations are related, how a disease allele is distributed, or whether a gene pool is shrinking. You should be able to do a simple allele-frequency calculation and interpret a chart or table of frequencies.
The answer
What population genetics is
Population genetics studies the alleles in a whole population rather than in a single individual. The key idea is the gene pool: the total set of all alleles of all genes in every member of a population.
Two quantities describe a gene pool:
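- Allele frequency - the proportion of all copies of a gene that are a particular allele. For a gene with two alleles $A$ and $a$, at frequencies $p$ and $q$, $p + q = 1$.
- Genotype frequency - the proportion of individuals with a given genotype (e.g. $AA$, $Aa$ or $aa$).
Calculating an allele frequency. Because each diploid individual carries two alleles, you count alleles, not individuals. For a sample of $N_{AA}$, $N_{Aa}$ and $N_{aa}$ individuals:
$$p = \frac{2N_{AA} + N_{Aa}}{2(N_{AA} + N_{Aa} + N_{aa})}, \qquad q = 1 - p$$
So in a sample of 100 individuals with 36 $AA$, 48 $Aa$ and 16 $aa$, there are $(2 \times 36) + 48 = 120$ copies of $A$ out of 200 alleles, giving $p = 120/200 = 0.6$ and $q = 0.4$.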
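The counting rule above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function name `allele_frequencies` and its parameter names are assumptions made for this example.

```python
def allele_frequencies(n_AA, n_Aa, n_aa):
    """Return (p, q): the frequencies of allele A and allele a from genotype counts."""
    total_alleles = 2 * (n_AA + n_Aa + n_aa)  # two alleles per diploid individual
    if total_alleles == 0:
        raise ValueError("sample must contain at least one individual")
    count_A = 2 * n_AA + n_Aa  # homozygotes carry two copies, heterozygotes one
    count_a = 2 * n_aa + n_Aa
    return count_A / total_alleles, count_a / total_alleles


# The worked sample: 36 AA, 48 Aa and 16 aa individuals.
p, q = allele_frequencies(36, 48, 16)
print(p, q)  # 0.6 0.4
```

Counting alleles rather than individuals is the whole point of the function: the heterozygotes contribute one copy to each tally.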
How and why allele frequencies change
A gene pool is not fixed. Allele frequencies shift between generations through:
- Natural selection - alleles that aid survival and reproduction become more common.
- Genetic drift - random change from chance alone, strongest in small populations.
- Gene flow (migration) - individuals moving between populations carry alleles with them, adding variation.
- Mutation - the original source of new alleles.
- Non-random mating - including inbreeding, which raises homozygosity.
(Hardy-Weinberg gives a formal "no-change" baseline, but the HSC does not require its algebra here - you only need the idea that these forces change frequencies over time.)
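To see why drift is strongest in small populations, the sketch below simulates random sampling of alleles from one generation to the next, with no selection, mutation or migration. It is a simplified Wright-Fisher-style model offered for illustration only; the function name `simulate_drift`, the population sizes and the seed are all assumptions chosen for this example.

```python
import random


def simulate_drift(pop_size, start_freq, generations, seed=None):
    """Track one allele's frequency when chance alone decides which copies are inherited."""
    rng = random.Random(seed)
    n_alleles = 2 * pop_size  # diploid population
    freq = start_freq
    history = [freq]
    for _ in range(generations):
        # Each allele copy in the next generation is drawn at random from the current pool.
        copies = sum(1 for _ in range(n_alleles) if rng.random() < freq)
        freq = copies / n_alleles
        history.append(freq)
    return history


# Hypothetical populations: 10 individuals versus 1000 individuals.
small = simulate_drift(pop_size=10, start_freq=0.5, generations=50, seed=1)
large = simulate_drift(pop_size=1000, start_freq=0.5, generations=50, seed=1)
print("small population final frequency:", small[-1])
print("large population final frequency:", large[-1])
```

Running it with different seeds typically shows the small population wandering far from 0.5, sometimes losing or fixing the allele entirely, while the large population stays close to its starting frequency.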
Large-scale collaborative projects and the value of pooled data
To see real trends you need a lot of data from many populations. No single lab can sequence enough genomes, so researchers run large-scale collaborative projects: many institutions pool genetic data into one shared, often open-access, database.
Why pooling matters:
- A much larger sample size makes allele-frequency estimates more reliable and reduces sampling error.
- Sampling many populations and regions lets researchers compare frequencies and spot patterns a single sample would miss.
- Even rare alleles and small trends become detectable.
- Results can be cross-checked (replicated) between contributors, improving validity.
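The link between sample size and sampling error can be made concrete with a rough calculation. The sketch below assumes alleles are sampled at random, so the estimate behaves like a binomial proportion over 2N allele copies; the function name `allele_freq_standard_error` and the example frequency of 0.1 are assumptions for illustration. It says nothing about sampling bias, which a larger sample does not remove.

```python
import math


def allele_freq_standard_error(p, n_individuals):
    """Approximate standard error of an allele-frequency estimate from n diploid individuals."""
    n_alleles = 2 * n_individuals
    return math.sqrt(p * (1 - p) / n_alleles)


# Hypothetical allele at a true frequency of 0.1, sampled at three study sizes.
for n in (50, 500, 5000):
    se = allele_freq_standard_error(0.1, n)
    print(f"{n:>5} individuals: standard error ~ {se:.4f}")
```

Because the error shrinks with the square root of the sample size, a pooled study 100 times larger gives an estimate about 10 times more precise.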
Worked applications - reading trends, patterns and relationships
- Human evolution and migration (1000 Genomes / Human Genome Diversity work)
- Projects that sequenced thousands of people from populations across every continent built a shared catalogue of allele frequencies. The clear trend is that genetic diversity is highest in African populations and decreases with distance from Africa (shown above). The relationship this reveals is that populations further from Africa descend from progressively smaller migrating groups (each carrying a subset of alleles - a "serial founder effect"), which supports the out-of-Africa model. The same datasets revealed the pattern that non-African genomes carry a few percent of Neanderthal DNA, evidence of ancient interbreeding.
- Inherited disease
- Pooling de-identified genetic-screening data lets a health authority measure the carrier frequency of a disease allele across many regions. Because the data record which genotypes are affected (carriers are unaffected, and only individuals with two copies of the allele show the disorder), they are consistent with autosomal recessive inheritance. The pattern that certain recessive alleles are concentrated in particular ancestral populations tells health services where to target screening and genetic counselling.
- Conservation genetics
- For a threatened species, pooled data from monitored populations and captive-breeding records track the number of different alleles and the genotype frequencies over time. A falling allele count signals loss of genetic diversity from drift and inbreeding; managers then choose breeding pairs or translocations that add alleles and reduce inbreeding.
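As one illustration of how pooled records become a comparable pattern, the sketch below combines screening counts from several clinics and reports a carrier frequency per region. All records are hypothetical numbers invented for this example, and the function name `pooled_carrier_frequency` is an assumption.

```python
from collections import defaultdict

# Hypothetical clinic records: (region, people screened, carriers found).
clinic_records = [
    ("Region A", 1200, 24),
    ("Region A", 800, 20),
    ("Region B", 1500, 150),
    ("Region B", 500, 40),
]


def pooled_carrier_frequency(records):
    """Sum the counts per region first, then divide, so larger clinics carry more weight."""
    screened = defaultdict(int)
    carriers = defaultdict(int)
    for region, n_screened, n_carriers in records:
        screened[region] += n_screened
        carriers[region] += n_carriers
    return {region: carriers[region] / screened[region] for region in screened}


for region, freq in pooled_carrier_frequency(clinic_records).items():
    print(f"{region}: carrier frequency = {freq:.3f}")
```

Pooling the raw counts, rather than averaging each clinic's percentage, stops a small clinic with an unusual result from distorting the regional figure.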
Practice questions
Original practice questions graded from foundation to exam level, each with a full worked solution. Try each question before reading its solution.
Foundation (2 marks): Define the terms 'gene pool' and 'allele frequency' as used in population genetics.
1 mark - gene pool. The gene pool is the total collection of all alleles of all genes present in every member of a population at a given time.
1 mark - allele frequency. The allele frequency is the proportion (or percentage) of a particular allele among all the alleles of that gene in the population.
Both definitions must refer to a population (not an individual) to earn the mark. Saying allele frequency is "how common a gene is" without referencing the proportion of alleles is too vague for the mark.
Foundation (3 marks): A population of 100 diploid butterflies is sampled for a wing-colour gene with two alleles, B and b. The counts are 36 BB, 48 Bb and 16 bb. Calculate the frequency of the B allele and the frequency of the b allele, showing your working.
- 1 mark - count the alleles
- Each diploid individual carries two alleles, so the population has $2 \times 100 = 200$ alleles in total. Number of B alleles $= (2 \times 36) + 48 = 120$. Number of b alleles $= (2 \times 16) + 48 = 80$.
- 1 mark - frequency of B
- $f(B) = 120/200 = 0.6$ (60 percent).
- 1 mark - frequency of b
- $f(b) = 80/200 = 0.4$ (40 percent).
The two frequencies must sum to $1$; a quick check ($0.6 + 0.4 = 1$) confirms the working. A common slip is to count individuals instead of alleles - the heterozygotes (Bb) contribute one allele of each type.
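As an optional cross-check (not part of the marking scheme above), the same answer follows from the genotype frequencies: an allele's frequency equals the frequency of its homozygotes plus half the frequency of the heterozygotes. The $f(\cdot)$ notation for frequency is an assumption used here for illustration.

```latex
\begin{align*}
f(BB) &= \tfrac{36}{100} = 0.36, \quad f(Bb) = \tfrac{48}{100} = 0.48, \quad f(bb) = \tfrac{16}{100} = 0.16 \\
f(B) &= f(BB) + \tfrac{1}{2} f(Bb) = 0.36 + 0.24 = 0.60 \\
f(b) &= f(bb) + \tfrac{1}{2} f(Bb) = 0.16 + 0.24 = 0.40
\end{align*}
```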
Foundation (2 marks): Explain why pooling data from many laboratories in a large-scale collaborative project gives more reliable allele-frequency estimates than a single small study.
1 mark - larger and more representative sample. Pooling data produces a much larger sample size drawn from many populations and regions, so the allele-frequency estimates are more representative of the true gene pool and less affected by chance sampling error.
1 mark - reliability / validity. A larger, shared dataset lets researchers detect smaller or rarer trends, cross-check (replicate) results between labs, and reduces the influence of any one biased or unusual sample, improving the reliability and validity of the conclusions.
The mark is for linking sample size to reliability, not just stating "more data is better".
Core (4 marks): Describe how a large-scale collaborative project such as the 1000 Genomes Project can be used to identify trends, patterns and relationships in allele frequencies between human populations.
Award up to 4 marks for a description that names the project's approach AND links it to trends, patterns and relationships.
- Pooled sequencing (1 mark)
- The project sequenced the genomes of thousands of people from many populations across several continents and compiled the data into a single shared, open database.
- Allele-frequency catalogue (1 mark)
- For each genetic variant, the project records the frequency of each allele in each population group, building a catalogue that can be compared across populations.
- Identifying patterns (1 mark)
- Comparing these frequencies reveals patterns - for example, some alleles are common in one continental group but rare in another, while most variation is shared by all groups.
- Relationships and trends (1 mark)
- These patterns reveal relationships between populations (which groups are more closely related) and trends such as decreasing genetic diversity with distance from Africa, supporting the out-of-Africa model of human migration.
A response that only describes sequencing without linking it to comparing frequencies between populations caps at 2 marks.
Core (5 marks): A conservation team monitors a small, isolated population of an endangered marsupial. Genetic testing over 40 years shows the number of different alleles in the population falling steadily, and the proportion of individuals homozygous for a harmful recessive allele rising. Analyse what these trends indicate about the population and explain why pooled, collaborative genetic data is valuable for its management.
- Falling allele number = loss of genetic diversity (1-2 marks)
- A steady fall in the number of different alleles shows the population is losing genetic diversity (its gene pool is shrinking). In a small, isolated population this is expected from genetic drift (random loss of alleles each generation) and inbreeding, because there is no gene flow from outside.
- Rising homozygosity for the harmful allele (1-2 marks)
- A rising proportion of homozygous-recessive individuals indicates increasing inbreeding, which brings two copies of harmful recessive alleles together more often. This raises the rate of the genetic disorder and lowers survival and fertility - reducing the population's chance of long-term survival.
- Value of collaborative data (1-2 marks)
- Pooled genetic data across multiple monitored populations (and captive-breeding records) lets managers compare diversity between populations, identify which individuals carry the rarest alleles, and choose mating pairs or translocations that add new alleles and reduce inbreeding. Sharing data also allows trends to be detected earlier and more reliably than any single small study could.
A top response names genetic drift and inbreeding, links them to the two trends, and connects the management decision to restoring genetic diversity.
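The two trends in this question, falling allele number and rising homozygosity, can both be read directly from genotype records. The sketch below is a minimal illustration using invented data: the allele labels, the two surveys and the function name `diversity_summary` are all hypothetical.

```python
def diversity_summary(genotypes):
    """Return (number of distinct alleles, proportion of homozygous individuals)."""
    alleles = {allele for genotype in genotypes for allele in genotype}
    homozygous = sum(1 for a, b in genotypes if a == b)
    return len(alleles), homozygous / len(genotypes)


# Hypothetical genotypes at one gene, recorded in two surveys taken decades apart.
survey_early = [("A1", "A2"), ("A3", "A4"), ("A1", "A3"), ("A2", "A2")]
survey_late = [("A1", "A1"), ("A1", "A2"), ("A2", "A2"), ("A1", "A1")]

for label, survey in (("early", survey_early), ("late", survey_late)):
    n_alleles, homozygosity = diversity_summary(survey)
    print(f"{label}: {n_alleles} alleles, homozygosity = {homozygosity:.2f}")
```

In this made-up data the allele count drops from 4 to 2 while homozygosity rises from 0.25 to 0.75, the same direction of change the question describes.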
Exam (6 marks): A health authority pools de-identified genetic-screening data from clinics across several regions to track the frequency of a recessive allele that causes an inherited blood disorder. The combined dataset shows the carrier frequency is markedly higher in some regions than others. (a) Explain how analysing this pooled data helps identify inheritance patterns of the disorder. (b) Discuss the benefits and one limitation of using a large-scale collaborative dataset for this purpose.
Target a sequenced response that uses the data to reason about inheritance AND weighs the value of the collaborative approach.
- (a) Identifying the inheritance pattern (2-3 marks)
- Because the dataset records carrier (heterozygote) and affected (homozygous-recessive) frequencies across many people, it supports the disorder being autosomal recessive: carriers are unaffected, and two carriers have a 1 in 4 (25 percent) chance of an affected child at each pregnancy. The regional differences in carrier frequency reveal a pattern - the allele is concentrated in certain ancestral populations, allowing the disorder's inheritance and distribution to be mapped and predicted (for example, who to offer screening to).
- (b) Benefits (1-2 marks)
- A large pooled dataset gives a big, representative sample, so even a rare allele is detected reliably; it lets authorities compare regions, target screening and genetic counselling, and replicate findings across clinics, improving reliability.
- (b) One limitation (1 mark)
- A valid limitation, for example: sampling bias (only people who attended screening are included, which may not represent the whole population), privacy/ethical concerns with combining personal genetic data, or inconsistent testing methods between clinics making data harder to compare.
Full marks need the inheritance pattern named as autosomal recessive in (a), at least one clear benefit AND one valid limitation in (b).
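The carrier-by-carrier cross that underlies part (a) can be enumerated like a Punnett square. The sketch below is illustrative only: the allele labels "A" and "a" and the function name `cross` are assumptions, and it treats each parental allele as equally likely to be passed on.

```python
from collections import Counter
from itertools import product


def cross(parent1, parent2):
    """Enumerate the equally likely offspring genotypes of a one-gene cross."""
    # Pair every allele from one parent with every allele from the other.
    offspring = Counter("".join(sorted(pair)) for pair in product(parent1, parent2))
    total = sum(offspring.values())
    return {genotype: count / total for genotype, count in offspring.items()}


# "A" = normal allele, "a" = recessive disorder allele (labels chosen for illustration).
print(cross("Aa", "Aa"))  # {'AA': 0.25, 'Aa': 0.5, 'aa': 0.25}
```

Only the aa offspring are affected, which is why two unaffected carriers can still have an affected child one time in four.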
Exam (7 marks): Evaluate the use of data analysis from large-scale collaborative projects to identify trends, patterns and relationships in population genetics. In your answer, refer to at least two different applications (for example human evolution, inherited disease, or conservation).
"Evaluate" requires a judgement supported by evidence weighed on both sides. A Band 6 response reaches a clear conclusion, not just a list of examples.
- What the approach does (1 mark)
- Large-scale collaborative projects pool genetic data from many individuals and populations into shared databases, letting researchers measure and compare allele and genotype frequencies to find trends, patterns and relationships no single study could see.
- Application 1 - strengths (1-2 marks)
- In human evolution and migration, projects such as the 1000 Genomes Project show genetic diversity is highest in African populations and decreases with distance from Africa, supporting the out-of-Africa model and mapping ancient migration and interbreeding (for example with Neanderthals). The huge, shared sample makes these relationships statistically robust.
- Application 2 - strengths (1-2 marks)
- In inherited disease and conservation, pooled data tracks carrier-allele frequencies across regions (targeting screening and counselling) and monitors loss of diversity in endangered species (guiding breeding to reduce inbreeding). Pooling detects rare alleles and small trends reliably.
- Limitations / judgement (2-3 marks)
- Limitations include sampling bias (some populations are over- or under-represented), ethical and privacy issues with shared genetic data, inconsistent methods between contributors, and the fact that frequency data shows correlation, not always cause. A supported conclusion: on balance the approach is highly valuable and largely justified - the scale and shared, replicable data give insights impossible from small studies - provided sampling bias and ethical safeguards are managed. An answer that lists applications without an explicit weighed judgement caps at 5 marks.
