Introduction to Probability and Statistics
Assignment
In this assignment, we will use the dataset of diabetes patients taken from here.
import pandas as pd
import numpy as np
df = pd.read_csv("../../data/diabetes.tsv",sep='\t')
df.head()
The columns in this dataset are as follows:
Age and sex are self-explanatory
BMI is body mass index
BP is average blood pressure
S1 through S6 are different blood measurements
Y is a quantitative measure of disease progression over one year
Let's study this dataset using methods of probability and statistics.
Task 1: Compute the mean and variance of every column
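One possible starting point for this task is sketched below. The helper name `mean_and_variance` is hypothetical and not part of the assignment; it assumes the columns of interest are numeric, and it reports the sample variance (pandas' default, `ddof=1`) rather than the population variance.

```python
import pandas as pd


def mean_and_variance(frame: pd.DataFrame) -> pd.DataFrame:
    """Return one row per numeric column with its mean and sample variance.

    Hypothetical helper for illustration only.
    """
    # Keep numeric columns only, so stray text columns do not raise errors.
    numeric = frame.select_dtypes(include="number")
    # DataFrame.var() divides by n - 1 by default (sample variance).
    return pd.DataFrame({"mean": numeric.mean(), "variance": numeric.var()})
```

Calling `mean_and_variance(df)` on the loaded dataset should give a small table with one row per column.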
Task 2: Plot boxplots for BMI, BP and Y depending on gender
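Before plotting, it can help to look at the numbers a boxplot is built from. The sketch below computes the quartiles of each variable per group. The helper name `quartiles_by_group` is hypothetical, and the column names `SEX`, `BMI`, `BP` and `Y` are assumptions about how the file labels its columns; adjust them to whatever `df.head()` shows.

```python
import pandas as pd


def quartiles_by_group(frame, group_col="SEX", value_cols=("BMI", "BP", "Y")):
    """Return the 25%, 50% and 75% quantiles of each value column per group.

    Hypothetical helper; column names are assumed, not confirmed by the data.
    """
    grouped = frame.groupby(group_col)[list(value_cols)]
    # The result is indexed by (group value, quantile level).
    return grouped.quantile([0.25, 0.5, 0.75])


# With matplotlib installed, the plot itself can be drawn with pandas:
#   df.boxplot(column=["BMI", "BP", "Y"], by="SEX")
```

The median row (0.5) corresponds to the line inside each box, and the 0.25 and 0.75 rows to the box edges.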
Task 3: What is the distribution of the Age, Sex, BMI and Y variables?
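A histogram is one common way to inspect the shape of a distribution. The sketch below computes the fraction of observations in each bin without drawing anything; the helper name `empirical_distribution` and the default of 10 bins are illustrative choices, not part of the assignment.

```python
import numpy as np


def empirical_distribution(values, bins=10):
    """Return (bin_edges, fractions): the share of observations in each bin.

    Hypothetical helper for illustration only.
    """
    counts, edges = np.histogram(np.asarray(values, dtype=float), bins=bins)
    # Normalize counts so the fractions sum to 1.
    return edges, counts / counts.sum()
```

For a continuous column, `empirical_distribution(df["BMI"])` would be a reasonable first look (assuming the column is named `BMI`). For a categorical column such as sex, `value_counts(normalize=True)` on the column is usually more informative than a histogram.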
Task 4: Test the correlation between different variables and disease progression (Y)
Hint: A correlation matrix gives the most useful overview of which variables are dependent on each other.
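Following the hint, a minimal sketch of how the correlation matrix could be reduced to the part relevant for this task is shown below. The helper name `correlation_with_target` is hypothetical, and the target column name `Y` is assumed from the column description above. It uses the Pearson correlation, which is the default of `DataFrame.corr()` and only captures linear relationships.

```python
import pandas as pd


def correlation_with_target(frame, target="Y"):
    """Return each numeric column's Pearson correlation with the target,
    ordered from strongest to weakest by absolute value.

    Hypothetical helper for illustration only.
    """
    # Full correlation matrix over the numeric columns.
    corr = frame.select_dtypes(include="number").corr()
    # Keep the target's column and drop its trivial correlation with itself.
    with_target = corr[target].drop(target)
    # Order by strength, ignoring the sign of the correlation.
    order = with_target.abs().sort_values(ascending=False).index
    return with_target.reindex(order)
```

Values close to 1 or -1 indicate a strong linear relationship with disease progression; values near 0 indicate a weak one.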