Introduction to Probability and Statistics

Assignment

In this assignment, we will use the dataset of diabetes patients taken from here.

import pandas as pd
import numpy as np

df = pd.read_csv("../../data/diabetes.tsv",sep='\t')
df.head()

In this dataset, the columns are as follows:

  • Age and sex are self-explanatory

  • BMI is body mass index

  • BP is average blood pressure

  • S1 through S6 are different blood measurements

  • Y is a quantitative measure of disease progression over one year

Let’s study this dataset using methods of probability and statistics.

Task 1: Compute the mean and variance of every column
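
One possible starting point is sketched below. It runs on a small made-up table rather than the real file, and the column names (AGE, BMI, BP, Y) are an assumption about how the columns are labelled in the dataset.

```python
import pandas as pd

# Toy stand-in for the diabetes table: the values are made up for
# illustration, and the column names are assumed, not read from the file.
toy = pd.DataFrame({
    "AGE": [59, 48, 72, 24, 50],
    "BMI": [32.1, 21.6, 30.5, 25.3, 23.0],
    "BP": [101.0, 87.0, 93.0, 84.0, 101.0],
    "Y": [151, 75, 141, 206, 135],
})

# agg computes several statistics per column in one call.
# pandas' var() is the sample variance (ddof=1) by default.
summary = toy.agg(["mean", "var"]).T
print(summary)
```

With the real data, the same `agg` call should work on `df` directly, provided every column is numeric.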

Task 2: Plot boxplots for BMI, BP and Y depending on gender
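
A boxplot draws the quartiles of each group, so it can help to look at those numbers first. The sketch below computes them per group on a made-up table; the column name SEX and its coding as 1 and 2 are assumptions.

```python
import pandas as pd

# Toy data: values are illustrative only, and the SEX coding (1/2) is assumed.
toy = pd.DataFrame({
    "SEX": [1, 1, 1, 1, 2, 2, 2, 2],
    "BMI": [21.0, 23.0, 25.0, 27.0, 22.0, 26.0, 30.0, 34.0],
    "Y": [80, 100, 120, 140, 90, 150, 210, 270],
})

# Quartiles per group: the box spans the 25th to 75th percentile,
# and the line inside the box marks the median (50th percentile).
quartiles = toy.groupby("SEX")[["BMI", "Y"]].quantile([0.25, 0.5, 0.75])
print(quartiles)
```

For the plot itself, pandas offers `DataFrame.boxplot(column=..., by=...)`, which needs matplotlib installed; something like `df.boxplot(column="BMI", by="SEX")` should produce one box per group if the column names match.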

Task 3: What is the distribution of the Age, Sex, BMI and Y variables?
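
As a rough sketch of how a distribution can be inspected numerically, the example below treats a categorical column with relative frequencies and a continuous column with binned counts. The data are made up, and the column names AGE and SEX and the bin edges are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Toy data: values are illustrative only; column names are assumed.
toy = pd.DataFrame({
    "AGE": [25, 31, 38, 42, 47, 51, 55, 58, 63, 70],
    "SEX": [1, 2, 1, 1, 2, 2, 1, 2, 1, 2],
})

# Discrete variable: relative frequency of each category.
sex_share = toy["SEX"].value_counts(normalize=True).sort_index()

# Continuous variable: counts per bin, the numbers behind a histogram.
# The bin edges here are arbitrary; the last bin includes its right edge.
counts, edges = np.histogram(toy["AGE"], bins=[20, 40, 60, 80])

print(sex_share)
print(counts, edges)
```

A roughly equal share per category suggests a uniform discrete distribution, while bell-shaped bin counts hint at something close to normal; a histogram plot of the real columns makes the same comparison visually.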

Task 4: Test the correlation between different variables and disease progression (Y)

Hint: A correlation matrix shows most directly which variables are related.
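
The hint can be sketched as follows on a made-up table. `DataFrame.corr()` uses the Pearson coefficient by default; the column names are assumptions, and the numbers are not taken from the dataset.

```python
import pandas as pd

# Toy data: values are illustrative only; column names are assumed.
toy = pd.DataFrame({
    "BMI": [21.0, 24.0, 26.0, 29.0, 31.0, 35.0],
    "BP": [80.0, 95.0, 85.0, 100.0, 90.0, 110.0],
    "Y": [70, 110, 130, 170, 200, 260],
})

# Full correlation matrix (Pearson by default).
corr = toy.corr()

# Keep only the correlations with Y, strongest first by absolute value.
y_corr = corr["Y"].drop("Y").sort_values(key=abs, ascending=False)

print(corr)
print(y_corr)
```

Sorting by absolute value keeps strong negative correlations near the top, which a plain descending sort would push to the bottom.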

Task 5: Test the hypothesis that the degree of diabetes progression is different between men and women
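
One way to approach this, sketched below, is a permutation test on the difference of group means. This is a stand-in chosen because it needs only NumPy; a two-sample t-test is the more common choice. The data are made up, and the SEX coding (1/2), the number of permutations and the random seed are all assumptions.

```python
import numpy as np
import pandas as pd

# Toy data: values are illustrative only; SEX coding (1/2) is assumed.
toy = pd.DataFrame({
    "SEX": [1] * 6 + [2] * 6,
    "Y": [120, 135, 150, 160, 145, 130, 150, 170, 140, 165, 155, 180],
})

group_a = toy.loc[toy["SEX"] == 1, "Y"].to_numpy()
group_b = toy.loc[toy["SEX"] == 2, "Y"].to_numpy()

# Test statistic: absolute difference between the two group means.
observed = abs(group_a.mean() - group_b.mean())

# Under the null hypothesis the group labels are exchangeable, so we
# shuffle the pooled values and recompute the statistic many times.
rng = np.random.default_rng(0)
pooled = np.concatenate([group_a, group_b])
n_perm = 5000
hits = 0
for _ in range(n_perm):
    shuffled = rng.permutation(pooled)
    diff = abs(shuffled[:len(group_a)].mean() - shuffled[len(group_a):].mean())
    hits += diff >= observed

# Add-one correction keeps the estimated p-value strictly positive.
p_value = (hits + 1) / (n_perm + 1)
print(f"observed difference = {observed:.1f}, p = {p_value:.4f}")
```

A small p-value means a difference this large rarely arises when labels are assigned at random, which is evidence against the hypothesis that the two groups progress equally.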