PUBH2203 Biostatistics
4pm Thursday 25 August 2022
Question 1. [7 marks] This question relates to the following extract of a table from an article by Hui et al. titled ‘Increase in preterm stillbirths in association with reduction in iatrogenic preterm births during COVID-19 lockdown in Australia: a multicentre cohort study’ published in Obstetrics, 2022. The exposure of interest was whether women were exposed or unexposed to lockdown restrictions during pregnancy.
The methods included the following text: “Statistical significance was tested with the t-test or chi-squared test as appropriate.”
(a) [1 mark] In which BMI category does the median BMI fall for the exposed group? Provide reasoning.
(b) [1 mark] The P-value for Smoking in pregnancy is 0.41. State the hypotheses that this P-value relates to and describe what conclusion can be drawn from this test.
(c) [2 marks] Create a 2x2 contingency table showing the counts of Smoking in pregnancy by cohort (Exposed vs Control). By hand, calculate the odds of smoking in pregnancy separately for the Exposed and Control cohorts (provide your answers to 3 decimal places).
(d) [2 marks] By hand, calculate the odds ratio and 95% confidence interval for the association between smoking in pregnancy and exposure to lockdown restrictions (provide your answers to 3 decimal places). Write a sentence that interprets the odds ratio and associated 95% confidence interval.
(e) [1 mark] Is the P-value for Smoking in pregnancy compatible with the confidence interval you calculated in (d)? Explain your reasoning.
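As a rough illustration of the hand calculations asked for above, the following Python sketch computes the odds in each cohort, the odds ratio, and a 95% confidence interval using the usual log odds ratio (Woolf) standard error. The four cell counts are hypothetical placeholders, not the values from the Hui et al. table, and the variable names are chosen only for this example.

```python
import math

# Hypothetical placeholder counts, NOT taken from the Hui et al. table.
exposed_smokers, exposed_nonsmokers = 30, 970
control_smokers, control_nonsmokers = 40, 960

# Odds of smoking within each cohort: smokers divided by non-smokers.
odds_exposed = exposed_smokers / exposed_nonsmokers
odds_control = control_smokers / control_nonsmokers

# Odds ratio for the exposed cohort relative to the control cohort.
odds_ratio = odds_exposed / odds_control

# Standard error of log(OR): square root of the sum of reciprocal cell counts.
se_log_or = math.sqrt(
    1 / exposed_smokers
    + 1 / exposed_nonsmokers
    + 1 / control_smokers
    + 1 / control_nonsmokers
)

# 95% confidence interval, built on the log scale and back-transformed.
z = 1.96
ci_lower = math.exp(math.log(odds_ratio) - z * se_log_or)
ci_upper = math.exp(math.log(odds_ratio) + z * se_log_or)

print(f"Odds (exposed): {odds_exposed:.3f}")
print(f"Odds (control): {odds_control:.3f}")
print(f"Odds ratio: {odds_ratio:.3f} (95% CI {ci_lower:.3f} to {ci_upper:.3f})")
```

Substituting the counts from the 2x2 table gives figures that can be checked against the hand calculation.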
Question 2. [10 marks] Use SPSS and the file bsn81.sav available from the LMS pages to carry out the following statistical summaries and analyses. For this question, you will need to create a new variable called AGEGROUP which categorises age into groups, where the groups are defined as in computing 1 (and shown below). Be sure to assign appropriate labels to the variables you create.
Age                   AGEGROUP
55.0 to 59.9 years    1
60.0 to 64.9 years    2
65.0 to 69.9 years    3
70.0 to 74.9 years    4
75.0 to 79.9 years    5
80.0 to 85.0 years    6
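The recoding in the table above can be sketched outside SPSS as a small Python function. This is only an illustration of the banding logic: the function name `age_group` is hypothetical, and it assumes that ages between the printed band limits (for example 59.95) belong to the lower band and that ages outside 55.0 to 85.0 are left uncoded.

```python
def age_group(age):
    """Return the AGEGROUP code (1 to 6) for an age in years.

    Assumption: bands are treated as [55, 60), [60, 65), ..., [75, 80),
    with the last band closed at both ends, [80, 85]. Ages outside
    55.0 to 85.0 return None.
    """
    if age < 55.0 or age > 85.0:
        return None
    if age >= 80.0:
        return 6
    # The first five bands are each 5 years wide, starting at 55.0.
    return int((age - 55.0) // 5) + 1


# Value labels matching the table, as would be attached to the new variable.
AGEGROUP_LABELS = {
    1: "55.0 to 59.9 years",
    2: "60.0 to 64.9 years",
    3: "65.0 to 69.9 years",
    4: "70.0 to 74.9 years",
    5: "75.0 to 79.9 years",
    6: "80.0 to 85.0 years",
}

for example_age in (55.0, 64.9, 72.3, 85.0):
    code = age_group(example_age)
    print(example_age, code, AGEGROUP_LABELS[code])
```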
(a) [1 mark] Produce a scatterplot of FVC against AGE and calculate an appropriate statistic that quantifies the relationship between these two variables.
(b) [0.5 marks] Calculate the median, upper and lower quartiles for FVC and highlight these in your SPSS output.
(c) [0.5 marks] Determine the interval that contains the middle 80% of FVC values.
(d) [1 mark] Produce an Error Bar Chart that shows the mean and standard deviation of FVC by AGEGROUP and comment on observed differences across AGEGROUP levels.
(e) [3 marks] Perform a one-way ANOVA and post-hoc t-tests to compare mean FVC across AGEGROUP. Present relevant output from SPSS, comment on the overall and pairwise test results and draw a conclusion about differences based on this analysis (include appropriate pairwise estimates).
(f) [2 marks] Group EXERCISE to create a new dichotomous variable (with appropriate labels) called EXGROUP that represents those who do not exercise (EXERCISE=0) and those who exercise at least once per week (EXERCISE=1-7). Presenting relevant output, use SPSS to answer the following:
    ⦁ Perform an appropriate test that compares the mean FVC between those who do and do not exercise.
    ⦁ What is the estimated mean difference in FVC for these two groups and is this difference significant? Provide the p-value and 95% confidence interval for the difference.
(g) [2 marks] Fit a multiple linear regression model to investigate the relationship between FVC and EXGROUP (as a categorical factor) after adjusting for AGEGROUP and SEX (both as categorical factors). Include your SPSS output, include main effects only, and tick ‘Parameter estimates’ in the options so that your output includes them. Based on this model (and the results from part (f)), provide a summary of the relationship between FVC and EXGROUP and comment on the impact that age and sex have on this relationship.
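For the median, quartiles and middle 80% interval of FVC requested in the list above, the underlying idea is that the middle 80% of values lies between the 10th and 90th percentiles. The Python sketch below shows this with the standard library. The FVC values are hypothetical and do not come from bsn81.sav, and the percentile interpolation used here may differ slightly from the method SPSS applies.

```python
import statistics

# Hypothetical FVC values in litres, NOT taken from bsn81.sav.
fvc = [2.1, 2.4, 2.8, 3.0, 3.1, 3.3, 3.5, 3.6, 3.9, 4.2, 4.4]

# n=4 returns the three quartile cut points: lower quartile, median, upper quartile.
q1, median, q3 = statistics.quantiles(fvc, n=4)

# n=10 returns the nine decile cut points; the first and last are the
# 10th and 90th percentiles, which bound the middle 80% of values.
deciles = statistics.quantiles(fvc, n=10)
p10, p90 = deciles[0], deciles[-1]

print(f"Lower quartile: {q1:.2f}, median: {median:.2f}, upper quartile: {q3:.2f}")
print(f"Middle 80% of values: {p10:.2f} to {p90:.2f}")
```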
Question 3. [8 marks] Use SPSS and the file bsn81.sav available from the LMS pages to carry out the following statistical summaries and analyses.
(a) [2 marks] Produce a single table that examines the percentage of individuals who have HAYFEVER in each SMOKING group and comment on the relationship between SMOKING and HAYFEVER. Perform an appropriate test to determine whether the relationship is statistically significant, providing a p-value and conclusion.
(b) [3 marks] Fit a logistic regression model in SPSS to test whether there is an association between HAYFEVER (response variable) and SMOKING (as a categorical variable). Present relevant output from SPSS and highlight in your output the overall p-value for the effect of SMOKING. Based on this model, provide odds ratios, 95% confidence intervals and p-values comparing each of the other smoking categories with never smokers (use Never smokers as the reference level).
(c) [2 marks] Fit a logistic regression model similar to that fitted in (b), but include AGE (quantitative) and SEX (categorical) in the model in addition to SMOKING. Present relevant output from SPSS and, based on this model (and the results of the model fitted in (b)), provide an interpretation of the relationship between SMOKING and HAYFEVER and comment on the effect that age and sex have on this relationship.
(d) [1 mark] Produce a single chart showing the prevalence of HAYFEVER by SMOKING and SEX.
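To make the structure of the SMOKING by HAYFEVER table concrete, the Python sketch below computes the percentage with hayfever in each smoking group and the Pearson chi-squared statistic for the table. The counts and the three smoking group names are hypothetical assumptions, not values from bsn81.sav, and the p-value itself would still come from SPSS or a chi-squared distribution with the degrees of freedom shown.

```python
# Hypothetical counts, NOT from bsn81.sav. Each row is a smoking group
# (group names assumed) and holds (hayfever, no hayfever) counts.
counts = {
    "Never": (40, 160),
    "Ex-smoker": (25, 125),
    "Current": (15, 135),
}

row_totals = {group: sum(cells) for group, cells in counts.items()}
col_totals = [sum(cells[j] for cells in counts.values()) for j in range(2)]
grand_total = sum(row_totals.values())

# Row percentages: percentage with hayfever within each smoking group.
percent_hayfever = {
    group: 100 * cells[0] / row_totals[group] for group, cells in counts.items()
}

# Pearson chi-squared: sum over cells of (observed - expected)^2 / expected,
# where expected = row total * column total / grand total.
chi_squared = 0.0
for group, cells in counts.items():
    for j in range(2):
        expected = row_totals[group] * col_totals[j] / grand_total
        chi_squared += (cells[j] - expected) ** 2 / expected

# Degrees of freedom: (rows - 1) * (columns - 1).
degrees_of_freedom = (len(counts) - 1) * (2 - 1)

for group, pct in percent_hayfever.items():
    print(f"{group}: {pct:.1f}% with hayfever")
print(f"Chi-squared = {chi_squared:.3f} on {degrees_of_freedom} df")
```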