Learning Outcomes
By the end of this session, students will be able to:
 Continue practicing basic software commands
 Learn how to explore the dataset, identifying the different types of variable stored
 Calculate the different measures of location and spread
 Plot frequency distributions and histograms
You can download a copy of the slides here: 1.3: Descriptive Statistics
Video A1.3 – Data Distributions and Descriptive Statistics (20 minutes)
A1.3a Practical: Stata – Summarising different types of variables
Remembering the skills you learned in the last section, please do the following:
 Set working directory using the command ‘cd <filepath>’
 Open your dataset in Stata
 Look at your dataset using ‘browse’
 Get to know your variables using the commands ‘describe’, ‘codebook’ or ‘inspect’
Question 1.3a.i: After completing the steps above, can you classify the following variables as Categorical, Binary, Ordinal or Continuous?

 Prior CVD
 SBP
 Frailty
 Death
Question 1.3a.ii: What are the measures of location (mean, median and mode) and the measures of spread (range, interquartile range and standard deviation) for the cholesterol levels of participants in the dataset (use the overall cholesterol ‘chol’ variable)?
A1.3a. Answers
Answer A1.3a.i: Prior CVD is a binary variable, as it only has two values: 0 and 1. The same is true of the ‘death’ variable. SBP is a continuous variable. Frailty is an ordinal variable, with categories ascending from 1 to 5. There are no purely ‘categorical’ (unordered) variables, such as ‘ethnicity’, in this dataset: all the grouped variables (like age_grp and bmi_grp4) have an order to their categories, so they are ‘ordinal’.
Answer A1.3a.ii:
Mean = 5.51
Median = 5.47
Mode = 5.07
Standard deviation = 1.01
Range = 8.53
Interquartile range = 1.30
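Answers above produced using the code below. Sorting the frequency table puts the most common value (the mode) in the first row; tabstat gives the other statistics: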
tab chol, sort
tabstat chol, stat(n mean median sd iqr range)
A1.3b PRACTICAL: Stata – Bar charts and histograms
Bar charts
Bar charts are a useful way of comparing groups by a particular characteristic. We can tell Stata what summary statistic we wish to include in the bar chart, for example, the frequency within each category of a variable, or the mean of one variable within each level of another categorical variable.
For categorical variables, it can be useful to look at frequencies within each level. To do this, we use the ‘graph bar’ command and include ‘(count),’ followed by ‘over ([variable name])’. The following code will present a bar chart comparing the frequencies within each age group category:
graph bar (count), over (age_grp)
To look at percentages within each category:
graph bar (percent), over (age_grp)
 Explore the above command with some variables within your dataset.
We can also look at summary statistics of a continuous variable within each level of another, categorical variable. For example, the following code will produce a bar chart that presents the mean of vitamin D serum levels within each age group category:
graph bar vitd, over (age_grp)
To present the median of vitamin D by age group, you simply include (median) after the command ‘graph bar’:
graph bar (median) vitd, over (age_grp)
It is also possible to present the bar chart with multiple categorical variables. The following code will produce a bar chart presenting the mean vitamin D by age groups and history of cardiovascular disease.
graph bar vitd, over (age_grp) over (prior_cvd)
To add a title to the y-axis, we can use the following code:
graph bar (mean) vitd, over(age_grp) ytitle(Mean vitamin D concentration)
To remove the y-axis labels, or to change their size, you can use the following code:
graph bar (mean) vitd, over(age_grp) ytitle(Mean vitamin D concentration) ylabel(, nolabels)
graph bar (mean) vitd, over(age_grp) ytitle(Mean vitamin D concentration) ylabel(, labels labsize(small))
If comparing multiple variables on one chart, it can be useful to change the colour of the bars. To do this, add the option ‘bar(1, fcolor([insert colour]))’, where 1 refers to the bars of the first variable:
graph bar (mean) vitd, bar(1, fcolor(black)) ytitle(Mean vitamin D concentration)
 Try producing a number of different bar charts and play around with changing different features.
Histograms
When you want to look at the distribution of a variable, rather than compare characteristics between groups, you can use a histogram. A histogram can be produced for a continuous variable or a numerically coded categorical variable, as long as it is measured on an interval scale. Type ‘histogram [variable name]’:
histogram sbp
If the variable is not continuous, type ‘, discrete’ afterwards:
histogram bmi, discrete
A histogram is often used to check whether a variable is normally distributed. To add a normal distribution curve to the histogram, use the following code:
histogram bmi, discrete normal
To adjust the number of bins, include ‘, bin ([number of bins])’
histogram sbp, bin (20)
histogram sbp, bin (10)
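You can also add a title and labels to the x-axis. Make sure the quotation marks are plain straight quotes ("); curly quotes pasted from a web page or word processor can make Stata report an error.
histogram bmi, discrete normal title("Body Mass Index")
histogram bmi_grp4, discrete normal title("Body Mass Index") xlabel(1 "Underweight" 2 "Normal weight" 3 "Overweight" 4 "Obese")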
It is also possible to show the percentage or frequency on a histogram. To do this, add the ‘percent’ or ‘frequency’ option to the histogram command:
histogram bmi, discrete percent
histogram bmi, discrete frequency
Grouping continuous data
There are different ways you can group continuous data to create a categorical variable. Always store the grouping in a new variable, so you are not altering the original; each of the commands below creates a new variable.
1. ‘xtile’
If you want to create a new variable with percentiles, the ‘xtile’ command is useful. For example, if you wish to produce deciles of systolic blood pressure:
xtile sbp10=sbp, nquantiles(10)
Or quartiles of systolic blood pressure:
xtile sbp4=sbp, nquantiles(4)
2. ‘cut’
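If you want to create a variable with specific categories, you can use the ‘egen’ command with the ‘cut()’ function. The code below creates a new categorical systolic blood pressure variable with the categories <90 = low; 90 to <120 = normal; 120 to <130 = elevated; ≥130 = high. Each interval includes its lower cut-off but not its upper one, and by default each group is coded by its lower cut-off (0, 90, 120 and 130); adding the ‘icodes’ option codes the groups 0, 1, 2, 3 instead.
egen sbp_cat=cut(sbp), at(0, 90, 120, 130, 231)
Note that the maximum systolic blood pressure recorded in this population is 230. Because the upper cut-off is excluded from the top interval, using 231 as the final cut-off ensures that every value up to and including 230 is assigned to a group.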
3. ‘recode’
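The recode() function gives a similar grouping to egen cut, with two differences: each group is coded by its upper cut-off rather than its lower one, and the upper cut-off is included in the group (a value is coded 90 if it is at most 90, 120 if it is at most 120, and so on). A systolic blood pressure of exactly 90 therefore falls in the lowest group with recode() but in the second group with egen cut. Because sbp_cat already exists, give the new variable a different name:
gen sbp_cat2=recode(sbp, 90, 120, 130, 231)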
4. ‘autocode’
The autocode() function creates evenly spaced categories of a continuous variable, coding each value by the upper bound of its interval:
gen [new var name]=autocode([original var name], [number of categories], [minimum], [maximum])
To create a new systolic blood pressure categorical variable with 4 evenly spaced categories between 0 and 230 (coded 57.5, 115, 172.5 and 230), again using a new variable name:
gen sbp_cat3=autocode(sbp, 4, 0, 230)
You can use the tab and tabstat commands to check that your new categorical variables include the correct categories. Use the ‘label variable’, ‘label define’ and ‘label values’ commands to label the new variable and its categories; note that value labels can only be attached to integer codes.
 Questions A1.3b:
 Which type of variable can you plot with a bar chart? When should you use a histogram?
 Plot a histogram of total cholesterol and describe the distribution.
 Can you change the number of bins used to plot the histogram? What is the effect of changing the number of bins?
 Split total cholesterol into groups and make a bar chart of the number of participants in each cholesterol group. Can you give this graph a title? Can you label the y axis and change the colour of the bars in the chart?
Answers
Answer A1.3b.i:
A bar chart can be used to compare the frequency and percentage of participants within each level of a categorical variable. Bar charts can also be used to look at summary statistics of continuous variables, but only within the levels of a categorical variable.
Histograms should be used to look at the distribution of data.
Answer A1.3b.ii:
histogram chol
(approximately normally distributed)
Answer A1.3b.iii:
histogram chol
With too few bins, such as the 3 used below, it becomes difficult to identify the shape of the distribution:
histogram chol, bin(3)
Answer A1.3b.iv:
tab chol
* irecode() gives 0 if chol <= 5, 1 if chol <= 7.5 and 2 otherwise (value labels need integer codes)
gen chol_cat=irecode(chol, 5, 7.5)
label var chol_cat "Categories of cholesterol"
label define chol_cat 0 "Normal" 1 "High" 2 "Very high"
label values chol_cat chol_cat
tab chol_cat
graph bar (count), over(chol_cat) bar(1, fcolor(black)) ytitle(Frequency) title("Cholesterol groups")
A1.3a PRACTICAL: R – Summarising different types of variables
There are many different ways to generate basic descriptive statistics for your dataset in R, some of which have already been covered.
Here, we will cover several basic functions. Remember that there is always more information to be found online, and many good resources exist only a Google search away!
Summary
The summary function can be used to generate the minimum, 1st quartile, median, mean, 3rd quartile and maximum, plus the number of missing (NA) values, for each numeric variable in a dataset. See the following:
summary(object.name)
summary(whitehall.data)
Note that this produces summary statistics for the entire dataset. To focus in on a specific column variable, we can use the $ operator as previously:
summary(whitehall.data$bmi)
It is useful here to check that your minimum, maximum and mean are reasonable values. Try to generate some summary statistics for systolic blood pressure (SBP). Are these figures reasonable?
You can also obtain summary statistics that are differentiated by another variable, such as ‘age group’, using the ‘by’ function, as follows:
by(object.name, object.name$variable, summary)
by(whitehall.data, whitehall.data$age_grp, summary)
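To see what summary() and by() return without loading the full dataset, here is a minimal sketch on a small invented data frame. The name toy and its values are illustrative assumptions, not Whitehall data.

```r
# Small invented data frame standing in for whitehall.data
toy <- data.frame(
  age_grp = c(1, 1, 2, 2, 2),
  sbp     = c(120, 130, NA, 140, 150)
)

# Summary of one column: min, quartiles, median, mean, max and the number of NAs
s <- summary(toy$sbp)
print(s)

# by() splits the data frame by age group and runs summary() on each part
b <- by(toy, toy$age_grp, summary)
print(b)
```

The "NA's" entry in the output is a quick way to see how many values are missing before calculating anything else.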
Does this generate any interesting information, or do any differences become immediately apparent?
Measures of Central Tendency
a) Mean
To calculate the mean, we use the following function.
mean(object.name$variable)
mean(whitehall.data$sbp)
This may look complete, but we saw in our earlier investigation that there are some NA (missing) values in this dataset, and if any value is NA, mean() returns NA. We therefore need to tell R to calculate the mean without these values, using na.rm = TRUE. You can run both versions to see the difference.
mean(object.name$variable, na.rm = TRUE)
mean(whitehall.data$sbp, na.rm = TRUE)
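The effect of a single missing value can be seen on a tiny invented vector. This is a minimal sketch; the values in x are made up for illustration.

```r
# Invented values with one missing observation
x <- c(5.2, 6.1, NA, 4.8)

sum(is.na(x))              # how many values are missing: 1
mean(x)                    # NA, because one value is missing
mean(x, na.rm = TRUE)      # mean of the three observed values
median(x, na.rm = TRUE)    # the same argument works for median()
```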
b) Median
We can compute the median using a similar function:
median(object.name$variable, na.rm = TRUE)
median(whitehall.data$sbp, na.rm=TRUE)
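c) Mode
Base R has no function for the statistical mode, so we use Mode() from the DescTools package. Note the capital M: base R's lowercase mode() is a different function that returns an object's storage mode (for example "numeric"), not its most common value.
install.packages("DescTools")
library(DescTools)
Mode(x, na.rm = FALSE)
If there are missing values, set na.rm = TRUE:
Mode(whitehall.data$bmi_grp4, na.rm = TRUE)
[1] 3
attr(,"freq")
[1] 2091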
The most common category of BMI group is ‘3’, with N=2091.
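If DescTools cannot be installed, a few lines of base R can stand in for Mode(). The sketch below is our own (stat_mode is a hypothetical name, not a function from any package); unlike Mode(), it does not attach a "freq" attribute, and the example vector is invented.

```r
# Hypothetical helper, not part of base R or DescTools: returns the most
# frequent value(s) of a vector after dropping NAs; ties return every mode
stat_mode <- function(x) {
  x <- x[!is.na(x)]
  counts <- table(x)
  modes <- names(counts)[counts == max(counts)]
  # table() stores the values as character names, so convert numbers back
  if (is.numeric(x)) as.numeric(modes) else modes
}

bmi_grp_example <- c(2, 3, 3, NA, 1, 3, 2)   # invented group codes
stat_mode(bmi_grp_example)   # 3 is the most common code
stat_mode(c(1, 1, 2, 2))     # a tie returns both values: 1 2
```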
d) Quantiles
The quantile function can be used to calculate the median and the first and third quartiles, by giving a second argument, ‘probs’, that sets the required proportion as a decimal between 0 and 1.
Remember that the first quartile is the value below which 25% of values lie, and the third quartile is the value below which 75% lie. In the same way, we can use the quantile function to compute any percentile cut-off for our data.
quantile(object.name$variable, probs = proportion, na.rm = TRUE)
quantile(whitehall.data$sbp, probs = 0.25, na.rm = TRUE)
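probs also accepts a vector, so several cut-offs can be requested at once. A minimal sketch on invented values; R's default method is quantile type 7, and Stata and SPSS may use other definitions, so quartiles can occasionally differ slightly between packages.

```r
x <- c(1:9, NA)   # invented values: 1 to 9 plus one missing value

# Several cut-offs at once: first quartile, median and third quartile
quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE)   # 3, 5, 7

# Any percentile, for example the 90th
quantile(x, probs = 0.9, na.rm = TRUE)                  # 8.2
```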
e) IQR
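To calculate the IQR, we can simply use the following (as with mean(), na.rm = TRUE is needed when there are missing values; without it IQR() stops with an error):
IQR(object.name$variable, na.rm = TRUE)
IQR(whitehall.data$sbp, na.rm = TRUE)
See if you can figure out how to calculate the IQR using the quantile function!
quantile(whitehall.data$sbp, probs = 0.75, na.rm = TRUE) - quantile(whitehall.data$sbp, probs = 0.25, na.rm = TRUE)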
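A quick check that the two approaches agree, and of what happens when na.rm is left out. This is a minimal sketch on invented values.

```r
x <- c(1:9, NA)   # invented values with one missing observation

iqr_builtin <- IQR(x, na.rm = TRUE)
iqr_manual  <- unname(quantile(x, 0.75, na.rm = TRUE) - quantile(x, 0.25, na.rm = TRUE))
c(iqr_builtin, iqr_manual)   # both 4

# Without na.rm = TRUE, IQR() stops with an error when values are missing
tryCatch(IQR(x), error = function(e) conditionMessage(e))
```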
f) Range
To calculate the range, we subtract the minimum value from the maximum value, as shown:
max(object.name$variable, na.rm = TRUE) - min(object.name$variable, na.rm = TRUE)
max(whitehall.data$sbp, na.rm = TRUE) - min(whitehall.data$sbp, na.rm = TRUE)
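R also has a range() function, but it returns the minimum and maximum rather than their difference (unlike the ‘range’ statistic in Stata's tabstat). Wrapping it in diff() gives the single number. A minimal sketch on invented values:

```r
x <- c(4.2, NA, 9.8, 3.1)   # invented values

range(x, na.rm = TRUE)         # R's range() returns both ends: 3.1 9.8
diff(range(x, na.rm = TRUE))   # the range as one number: 9.8 - 3.1 = 6.7
```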
g) SD and Variance
To calculate the standard deviation and variance, you can use the following simple functions (again adding na.rm = TRUE if there are missing values):
sd(object.name$variable, na.rm = TRUE)
var(object.name$variable, na.rm = TRUE)
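var() and sd() compute the sample variance and standard deviation, dividing the sum of squared deviations from the mean by n - 1. Working through this by hand on a small invented vector shows what the functions return:

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # invented values; mean is 5

n <- length(x)
var_manual <- sum((x - mean(x))^2) / (n - 1)   # squared deviations divided by n - 1
sd_manual  <- sqrt(var_manual)                 # SD is the square root of the variance

c(var(x), var_manual)   # both 32/7, about 4.571
c(sd(x), sd_manual)     # both about 2.138
```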
 Question A1.3a.i: After completing the steps above, can you classify the following variables as Categorical, Binary, Ordinal, Continuous?

 Prior CVD
 SBP
 Frailty
 Death
 Question A1.3a.ii: What are the measures of central tendency (mean, median and mode) and the measures of spread (range, interquartile range and standard deviation) for the measured cholesterol of participants in our dataset (using the chol variable)?
Answer
Answer A1.3a.i:
 Prior CVD: binary
 SBP: continuous
 Frailty: ordinal
 Death: binary
To answer this question, you can use what we learned about the structure of the data in the earlier practical exercise A1.2b.
Answer A1.3a.ii:
To answer this question, we need to apply the knowledge we gained in exercise A1.3a.
For the measures of central tendency:
1) Mean
As described earlier, calculating the mean and median is relatively simple; we just have to make sure to tell R how to treat missing values with the na.rm argument.
mean(whitehall.data$chol, na.rm=TRUE)
[1] 5.510199
2) Median
median(whitehall.data$chol, na.rm=TRUE)
[1] 5.47
3) Mode
Mode(whitehall.data$chol, na.rm = TRUE)
[1] 5.07
attr(,"freq")
[1] 35
4) Range
As above, to calculate the range we subtract the minimum value from the maximum value.
max(whitehall.data$chol, na.rm = TRUE) - min(whitehall.data$chol, na.rm = TRUE)
[1] 8.53
5) IQR
As above, there are two ways to calculate the IQR. Either method will give you the same answer, as below.
IQR(whitehall.data$chol, na.rm = TRUE)
quantile(whitehall.data$chol, probs = 0.75, na.rm = TRUE) - quantile(whitehall.data$chol, probs = 0.25, na.rm = TRUE)
[1] 1.3
6) SD
Calculating standard deviation can again be performed with a simple function.
sd(whitehall.data$chol, na.rm=TRUE)
[1] 1.00712
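If you find yourself repeating these calls for several variables, they can be bundled into one function. This is a sketch under our own assumptions: describe_measures is a hypothetical name, the example values are invented, and the mode is left out because it needs DescTools (or a similar helper).

```r
# Hypothetical helper: the location and spread measures from this answer
# in a single call, always dropping missing values
describe_measures <- function(x) {
  c(
    n      = sum(!is.na(x)),
    mean   = mean(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    sd     = sd(x, na.rm = TRUE),
    range  = diff(range(x, na.rm = TRUE)),
    iqr    = IQR(x, na.rm = TRUE)
  )
}

# Invented values; with the real data you would call describe_measures(whitehall.data$chol)
chol_example <- c(4.5, 5.0, 5.5, NA, 6.0, 6.5)
round(describe_measures(chol_example), 2)
```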
A1.3b PRACTICAL: R – Bar charts and histograms
R offers several powerful tools to make easily customisable visualisations of data.
Bar Charts
To make a simple bar chart, we can use the barplot() function, adding arguments to represent the data in different ways. The first step is to install some useful packages that will assist with our graphing, and then load them with library(). These open-source packages contain several additional tools that we can use.
install.packages("tidyverse")
install.packages("RColorBrewer")
library(tidyverse)
library(RColorBrewer)
We then generate a frequency table, which we assign to the object ‘bmi.counts’. It is this object that we ask R to plot.
bmi.counts <- table(whitehall.data$bmi)
Within the barplot function, we can label the x and y axes using ‘xlab=’ and ‘ylab=’, title the graph using ‘main=’, and change the colour of the bars using ‘col=’. Here, the base R function heat.colors(20) generates a vector of 20 contiguous colours; the RColorBrewer package offers further palettes through its brewer.pal() function.
barplot(bmi.counts, xlab = "BMI", ylab = "Number of People", main = "BMI of Whitehall Participants", col = heat.colors(20))
R can also create stacked bar charts, by passing barplot() a two-way table, and you can change the orientation of any bar chart with the argument ‘horiz = TRUE’.
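A minimal sketch of both ideas, using invented data in place of two whitehall.data columns. The pdf(NULL) and dev.off() calls only send the plots to a null device so the sketch runs without a screen; leave them out when working in RStudio.

```r
# Invented data standing in for two whitehall.data columns
age_grp   <- c(1, 1, 2, 2, 2, 3, 3, 3, 3)
prior_cvd <- c(0, 1, 0, 0, 1, 0, 1, 1, 1)

# Two-way table: rows are CVD history (0/1), columns are age groups
counts <- table(prior_cvd, age_grp)

pdf(NULL)   # null device so the sketch runs anywhere; omit in RStudio

# A two-way table gives stacked bars: one bar per age group, split by CVD history
mids <- barplot(counts, xlab = "Age group", ylab = "Number of People",
                col = c("grey80", "grey30"),
                legend.text = c("No prior CVD", "Prior CVD"))

# horiz = TRUE turns the same chart on its side, so xlab now labels the counts
barplot(counts, horiz = TRUE, xlab = "Number of People", ylab = "Age group",
        col = c("grey80", "grey30"))

invisible(dev.off())
```

Adding beside = TRUE to barplot() would place the two CVD groups side by side instead of stacking them.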
Histograms
We can make a simple histogram using the ‘hist()’ function in R. This function takes a vector of values and plots their histogram. As before, we will add axis labels and a title.
hist(whitehall.data$bmi, xlab = "BMI", ylab = "Number of People", main = "BMI of Whitehall Participants", col = heat.colors(12), density = 100)
Grouping Continuous Data
Grouping continuous data is simply an extension of the skills we used earlier to create our binary variable. There are many reasons why you might want to categorise a continuous variable, and cut-offs need to be defined for the different categories. For example, BMI is a skewed variable, and different parts of its distribution may have different relationships with the diseases that we investigate.
Our first step is to create a new, empty column, which we will title ‘bmi.grouped’:
whitehall.data$bmi.grouped <- NA
Our subsequent code will act to populate this new column we’ve created in our data frame.
Our goal is to get R to assign values to this new column, which will be drawn from the existing BMI data recorded for those rows. Remember that each row is a collection (vector) of data that represents a different individual.
We will code BMI of less than 18.5 as 0, 18.5 to less than 25 as 1, 25 to less than 30 as 2, and 30 or more as 3. This can be achieved with the simple operators below:
whitehall.data$bmi.grouped[whitehall.data$bmi < 18.5] <- 0
whitehall.data$bmi.grouped[whitehall.data$bmi >= 18.5 & whitehall.data$bmi < 25] <- 1
whitehall.data$bmi.grouped[whitehall.data$bmi >= 25 & whitehall.data$bmi < 30] <- 2
whitehall.data$bmi.grouped[whitehall.data$bmi >= 30] <- 3
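The same grouping can also be written in one step with base R's cut(), which returns a labelled factor directly. This is an alternative, not the method used above; the example vector is invented and includes the boundary values so you can check where each one lands.

```r
bmi_example <- c(17.9, 18.5, 24.99, 25, 29.9, 30, NA)   # invented values, including boundaries

bmi_cut <- cut(bmi_example,
               breaks = c(-Inf, 18.5, 25, 30, Inf),
               right = FALSE,   # intervals are [lower, upper): 18.5 starts the second group
               labels = c("<18.5", "18.5-24.9", "25-29.9", ">=30"))

data.frame(bmi_example, bmi_cut)
table(bmi_cut, useNA = "ifany")
```

Setting right = FALSE matches the >= and < comparisons used in the indexing code above; with the default right = TRUE, a BMI of exactly 25 would fall in the lower group instead.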
It is important to also check that this variable is in the right format, using class().
class(whitehall.data$bmi.grouped)
[1] "numeric"
R views our new variable as ‘numeric’. However, it is clearly an ordered categorical variable, so we need R to treat it as a factor. At the same time, we can label our groups using the labels = c() argument, putting each category label in quotation marks. This can be done as follows:
whitehall.data$bmi.grouped <- factor(whitehall.data$bmi.grouped, labels = c("<18.5", "18.5-24.9", "25-29.9", ">=30"))
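One thing to watch: factor() with labels only works if every group actually occurs in the data, because it takes the levels from the values present. Listing the possible codes with levels = avoids this, and ordered = TRUE records that the groups have an order. A minimal sketch on invented codes:

```r
grouped <- c(1, 2, 2, 3, 1, NA)   # invented codes; nobody falls in group 0
bmi_labels <- c("<18.5", "18.5-24.9", "25-29.9", ">=30")

# Without levels =, factor() only finds the three codes present, so four labels fail
tryCatch(factor(grouped, labels = bmi_labels), error = function(e) conditionMessage(e))

# Listing all possible codes keeps empty groups; ordered = TRUE records the ordering
bmi_factor <- factor(grouped, levels = 0:3, labels = bmi_labels, ordered = TRUE)
table(bmi_factor)              # "<18.5" is kept with a count of 0
bmi_factor[4] > bmi_factor[1]  # TRUE: ">=30" ranks above "18.5-24.9"
```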
 Question A1.3b.i: Which type of variable can you plot with a bar chart? When should you use a histogram?
 Question A1.3b.ii: Plot the bar chart that counts the number of participants in each BMI group and save it. Can you give this graph a title? Can you label the y axis and change the colour of the bars in the chart?
 Question A1.3b.iii: Plot a histogram of SBP and describe the distribution.
 Question A1.3b.iv: Regroup SBP in a different way, and decide which grouping best represents the data.
 Question A1.3b.v: Can you change the number of bins used to plot the histogram of SBP? What is the effect of changing the number of bins?
Question A1.3b Answers
Answer A1.3b.i :
Categorical variables (such as our newly created BMI categories) can be visually represented using a bar graph.
Histograms should be employed for continuous numerical variables. In our current dataset, variables that could be appropriately represented using a histogram include systolic blood pressure, blood cholesterol, and LDL levels.
Answer A1.3b.ii:
To generate this bar chart, we follow the same process as before we grouped BMI into the new variable. Note that we've changed the number of colours in our contiguous spectrum to 4, to reflect the new number of categories. Because horiz = TRUE turns the chart on its side, the x-axis now shows the counts and the y-axis the BMI groups, so the axis labels are swapped accordingly.
bmi.grouped.graph <- table(whitehall.data$bmi.grouped)
barplot(bmi.grouped.graph, horiz = TRUE, xlab = "Number of People", ylab = "BMI", main = "BMI of Whitehall Participants", col = heat.colors(4))
Answer A1.3b.iii:
As above, the code for this histogram is as follows:
hist(whitehall.data$sbp, xlab = "SBP", ylab = "Number of People", main = "Histogram of SBP of Whitehall Participants", col = heat.colors(12), density = 100)
There appears to be a small right skew (positive skew) to this distribution but it is approximately normally distributed.
Answer A1.3b.iv:
We will group SBP according to the American College of Cardiology/American Heart Association guidelines for hypertension management, which classify systolic blood pressure as normal (below 120 mmHg), elevated (120 to 129 mmHg), stage 1 hypertension (130 to 139 mmHg) or stage 2 hypertension (140 mmHg or above). The process for coding this new variable is nearly identical to the BMI categorisation, so have a go yourself!
whitehall.data$sbp.grouped <- NA
whitehall.data$sbp.grouped[whitehall.data$sbp < 120] <- 1
whitehall.data$sbp.grouped[whitehall.data$sbp >= 120 & whitehall.data$sbp < 130] <- 2
whitehall.data$sbp.grouped[whitehall.data$sbp >= 130 & whitehall.data$sbp < 140] <- 3
whitehall.data$sbp.grouped[whitehall.data$sbp >= 140] <- 4
class(whitehall.data$sbp.grouped)
whitehall.data$sbp.grouped <- factor(whitehall.data$sbp.grouped, labels = c("Normotensive", "Elevated", "Stage 1 Hypertension", "Stage 2 Hypertension"))
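Another way to regroup SBP is by quartiles, so that each group holds roughly a quarter of participants (similar in spirit to Stata's xtile). A minimal sketch on invented values; with the real data you would replace sbp_example with whitehall.data$sbp. If many values are tied, the quartile cut-offs can coincide and cut() will stop with an error about non-unique breaks.

```r
sbp_example <- c(102, 115, 118, 121, 125, 128, 133, 136, 139, 144, 152, 168)  # invented values

# Quartile cut-offs taken from the data itself: min, Q1, median, Q3, max
cutoffs <- quantile(sbp_example, probs = seq(0, 1, by = 0.25), na.rm = TRUE)

# include.lowest = TRUE keeps the smallest value in the first group
sbp_q4 <- cut(sbp_example, breaks = cutoffs, include.lowest = TRUE,
              labels = c("Q1", "Q2", "Q3", "Q4"))
table(sbp_q4)   # three values in each quartile group
```

Comparing the counts and cut-offs from this grouping with the guideline-based groups is one way to decide which grouping best represents the data.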
Answer A1.3b.v:
By default, R chooses the size of each histogram bin automatically. However, the default bins may not always give an appropriate or sufficient visualisation of the data. We can change the number of bins by adding the argument breaks = to our hist() code (R treats a single number as a suggestion and adjusts it to give convenient break points):
hist(whitehall.data$sbp, xlab = "SBP", ylab = "Number of People", main = "Histogram of SBP of Whitehall Participants", col = heat.colors(12), density = 100, breaks = 1000)
If we try two extremes, we end up with two quite different graphs, and the distribution and skew become easier to assess.
With many narrow bins, we could say that the data appear closer to a ‘normal’ distribution, even though the underlying data haven't changed!
hist(whitehall.data$sbp, xlab = "SBP", ylab = "Number of People", main = "Histogram of SBP of Whitehall Participants", col = heat.colors(12), density = 100, breaks = 10)
When represented with larger bins, however, the distribution does not seem to be quite as normally distributed. Some of the information at lower levels of SBP and in the far right tail is obscured.
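Because a single breaks number is only a suggestion, you can inspect the bins R actually used, or set them exactly with a vector of break points. A minimal sketch on simulated SBP-like values (not Whitehall data); plot = FALSE returns the binning without drawing.

```r
set.seed(1)
# Simulated SBP-like values, not Whitehall data
sbp_example <- round(rnorm(500, mean = 130, sd = 18))

# plot = FALSE returns the binning without drawing anything
h_suggested <- hist(sbp_example, breaks = 10, plot = FALSE)
length(h_suggested$counts)   # may differ from 10: R rounds to "pretty" break points

# A vector of break points gives exact 10 mmHg bins covering all the data
lo <- floor(min(sbp_example) / 10) * 10
hi <- ceiling(max(sbp_example) / 10) * 10
h_exact <- hist(sbp_example, breaks = seq(lo, hi, by = 10), plot = FALSE)
h_exact$breaks
h_exact$counts
```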
A1.3a PRACTICAL: SPSS – Summarising different types of variables
Open the FoSSA Whitehall data set in SPSS.
Check that you have set up and classified all of your variables correctly in the previous practical.
We are now going to run some descriptive statistics on one of the variables.
In SPSS you can run a whole range of descriptives at once using the ‘Explore’ function.
Go to the menu bar at the top of the window and select
Analyze >> Descriptive Statistics >> Explore
When the Explore window opens, move the variable you are interested in into the ‘Dependent List’ box by selecting it and then clicking on the blue arrow next to the box.
At the bottom of the box in the ‘Display’ section, select ‘Statistics’, as we do not want to explore plots of the data at this stage.
Click OK to run the analysis.
Use this method to find the measures of central tendency (mean and median) and the measures of spread (range, interquartile range and standard deviation) for the cholesterol levels of participants in dataset (use overall cholesterol ‘chol’ variable).
Answer
Once you have run the analysis the answer tables pop up in a separate output window. You will see something that looks like this.
From this you can extract the descriptive statistics you were asked for in the question:
Mean = 5.51
Median = 5.47
Standard deviation = 1.01
Range = 8.53
Interquartile range = 1.30
There are other options under the Descriptive Statistics heading from the Analyze menu which you can investigate to see what other statistics you can run on your data.
A1.3b PRACTICAL: SPSS – Bar charts and histograms
Bar Charts
Bar charts are a useful way of comparing groups by a particular characteristic. The most straightforward way to do this in SPSS is to use the Chart Builder function.
We can tell SPSS what summary statistic we wish to include in the bar chart, for example, the frequency within each category of a variable, or the mean of one variable within each level of another categorical variable.
Select
Graphs >> Chart Builder
A warning about ‘Define Variable Properties’ will pop up. You should have categorised all of your variables properly in the previous sections, so you can just press ‘OK’ to move on to creating a chart.
You will then see the ‘Chart Builder’ window open. Select ‘Bar’ from the ‘Gallery’ menu on the bottom left, then drag and drop the type of bar chart you are interested in into the preview window in the centre.
In this example we will select ‘Simple Bar’.
You can drag and drop your variables into the chart. Remember we want a categorical variable on the x axis, to define the groups, and then a continuous variable on the y axis, to define the height of the bar.
The ‘Element Properties’ tab on the right-hand side allows you to edit the elements of the chart. Once you select a simple bar chart, the default selection is ‘Bar1’.
The ‘Statistics’ box allows you to define which statistic to display. ‘Count’ will display the number (frequency) of individuals in a category and does not require a continuous variable on the y axis. ‘Mean’ will show the mean of each category as the height of the bar, ‘Median’ will show the median of each category, and so on. You can also select the box which says ‘Display error bars’ and decide what you would like these error bars to show.
You can also select to edit the properties of the x and y axes. This allows you to customise scales or write axis labels at this stage.
Clicking on ‘Chart Appearance’ allows you to define the colours of your bars at this stage as well.
If you forget to do any of these formatting steps during set-up, don't worry: SPSS also allows you to edit your graph in the output window. All you need to do is double-click on it and a new editing window will open.
Histograms
When you want to look at the distribution of a variable, rather than compare characteristics between groups, you can use a histogram. A histogram can be produced for a continuous variable or a numerically coded categorical variable, as long as it is measured on an interval scale.
In SPSS you just need to open the Chart Builder, select ‘Histogram’ from the menu on the bottom left and then drag and drop the ‘Simple Histogram’ into the preview window.
Once you have done this you will see the option in the element properties tab to add a normal curve, which can help with assessing the distribution of the data. You can also click on ‘Set Parameters’ which allows you to change the number or range of the bins (groups) in your histogram.
Grouping continuous data
You can group continuous data to create a new categorical variable in SPSS in the following way.
Select
Transform >> Recode into Different Variables
Move your existing continuous variable into the centre box, then name your new variable in the Output Variable section on the right, then press ‘Change’.
Then select ‘Old and New Values’. In the Old Value section, use the ‘Range’ options to define the range of values you want to code into each category, then enter a number for that category in the New Value section. Press ‘Add’ and your change will appear in the Old --> New box on the right.
Once you have set up all of your recoding rules, press ‘Continue’ and then ‘OK’; your new variable will appear in the Data View and Variable View. You can now allocate a measurement type and labels for each of the values (low, medium and high, for example) in the Variable View, as you did when you initially set up your data.
Now try to answer the following questions
 Which type of variable can you plot with a bar chart? When should you use a histogram?
 Plot a histogram of total cholesterol and describe the distribution.
 Can you change the number of bins used to plot the histogram? What is the effect of changing the number of bins?
 Split total cholesterol into groups and make a bar chart of the number of participants in each cholesterol group. Can you give this graph a title? Can you label the y axis and change the colour of the bars in the chart?
Answer
I. A bar chart can be used to compare the frequency and percentage of participants within each level of a categorical variable. Bar charts can also be used to look at summary statistics of continuous variables, but only within the levels of a categorical variable. Histograms should be used to look at the distribution of data.
II. The histogram indicates that the data are normally distributed. Note how the bars approximately follow the normal curve which has been added.
III. With too few bins it becomes difficult to identify the distribution of the data.
Example with bins set to 10
Example with bins set to 5
IV. Example using 0 to 5 mmol/l = Normal, 5.01 to 7.5 mmol/l = High, 7.51+ mmol/l = Very high.
You can change the bars to any colour you wish within the Chart Builder or the editing window.