Here is one possible solution to the practical.
0) Load required packages
```r
# Load packages
library(dplyr)
library(readr)
```
1) Load the dataset `haiti-healthsites.csv` into R.
```r
# Read in health centre data
hc <- read_csv('data/haiti/haiti-healthsites.csv')
```
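A quick way to check that step 1 worked is to look at the column names and types that `read_csv()` produced. The sketch below writes a tiny, entirely hypothetical CSV with only the four columns used in this practical to a temporary file (the real file has more rows and columns) and reads it back. The facility names, communes and populations are made up.

```r
library(readr)

# Hypothetical stand-in for data/haiti/haiti-healthsites.csv:
# three made-up facilities, using only the columns this practical needs
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "name,adm1_en,adm2_en,total",
  "Clinic A,North,Commune A,90000",
  "Clinic B,North,Commune A,90000",
  "Clinic C,West,Commune B,40000"
), tmp)

hc_demo <- read_csv(tmp)

# Check that the columns have the expected names and types
names(hc_demo)           # "name" "adm1_en" "adm2_en" "total"
sapply(hc_demo, class)   # 'total' should be numeric, the rest character
```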
2) Select the variables required for analysis (health facilities, admin areas, population).
```r
# Keep the facility name, admin areas and population; drop all other columns
hc <- select(hc, name, adm1_en, adm2_en, total)
```
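`select()` keeps only the named columns, in the order given, and drops everything else. A minimal sketch on a made-up data frame; the `amenity` column is hypothetical and simply stands in for the extra columns in the real file.

```r
library(dplyr)

# Made-up rows; 'amenity' stands in for columns the analysis does not need
hc_demo <- tibble(
  amenity = c("clinic", "hospital"),
  name    = c("Facility 1", "Facility 2"),
  adm1_en = c("North", "West"),
  adm2_en = c("Commune A", "Commune B"),
  total   = c(50000, 40000)
)

hc_demo <- select(hc_demo, name, adm1_en, adm2_en, total)
names(hc_demo)  # "name" "adm1_en" "adm2_en" "total"
```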
3) Filter the data so that only the data for the northern departments (North, North-East and North-West) remains.
```r
# Keep rows whose department is any one of the three northern departments
adm2north <- filter(hc, adm1_en %in% c("North", "North-East", "North-West"))
```
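`%in%` is what makes this filter work: it returns `TRUE` for every row whose `adm1_en` matches any of the three department names. Writing `adm1_en == c(...)` instead would compare element by element, recycling the three names along the rows, and silently drop rows that should be kept. The sketch below contrasts the two on six made-up rows (six is chosen so the recycling does not trigger a length warning).

```r
library(dplyr)

# Made-up rows: four are in a northern department, two are not
hc_demo <- tibble(
  name    = paste("Facility", 1:6),
  adm1_en = c("North", "North", "North-West", "West", "North-East", "Centre")
)

north <- c("North", "North-East", "North-West")

# %in% tests each row against the whole set of department names
right <- filter(hc_demo, adm1_en %in% north)
nrow(right)  # 4

# == recycles 'north' along the rows, so row 2 ("North" vs "North-East") is lost
wrong <- filter(hc_demo, adm1_en == north)
nrow(wrong)  # 3
```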
4) Group the data by the smallest-scale admin level.
5) Create summary variables of the number of health facilities and the population per commune.
```r
adm2north <- adm2north %>%
  group_by(adm1_en, adm2_en) %>%
  summarise(healthcentres = n(), pop = first(total))
## `summarise()` has grouped output by 'adm1_en'. You can override using the `.groups` argument.
```
The pipe (`%>%`) passes the result on its left as the first argument to the function on its right, linking the `adm2north` object to the next line of code. The block can be read as: take the data frame `adm2north`, then group it by `adm1_en` and `adm2_en`, then summarise each group by counting its health centres and taking the first value of the population.
Note: the population (`total`) is already aggregated at admin level 2, so every facility row in a commune carries the same value and taking the first one is enough.
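To see what the grouped summary produces, and that the piped version is just a more readable way of writing nested function calls, here is a sketch on made-up data in which, as described in the note, `total` repeats the commune population on every facility row. The communes and numbers are hypothetical. `.groups = "drop"` is added here only to return an ungrouped result and avoid the grouping message.

```r
library(dplyr)

# Made-up facility rows: 'total' is the commune population, repeated per row
hc_demo <- tibble(
  name    = c("F1", "F2", "F3", "F4"),
  adm1_en = c("North", "North", "North", "North-West"),
  adm2_en = c("Commune A", "Commune A", "Commune B", "Commune C"),
  total   = c(60000, 60000, 25000, 30000)
)

# Piped form, as in the practical
piped <- hc_demo %>%
  group_by(adm1_en, adm2_en) %>%
  summarise(healthcentres = n(), pop = first(total), .groups = "drop")

# The same computation written as nested calls
nested <- summarise(group_by(hc_demo, adm1_en, adm2_en),
                    healthcentres = n(), pop = first(total), .groups = "drop")

identical(piped, nested)  # TRUE
piped$healthcentres       # 2 1 1
piped$pop                 # 60000 25000 30000
```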
6) Calculate the number of people per health facility.
```r
# Add a column: commune population divided by its number of health facilities
adm2north <- adm2north %>%
  mutate(pop_per_health = pop/healthcentres)
```
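`mutate()` adds `pop_per_health` as a new column computed row by row, so each commune's population is divided by its own facility count. A sketch on a hypothetical two-commune summary of the kind produced by steps 4 and 5:

```r
library(dplyr)

# Hypothetical commune-level summary (made-up values)
adm2_demo <- tibble(
  adm2_en       = c("Commune A", "Commune B"),
  healthcentres = c(2L, 4L),
  pop           = c(60000, 30000)
)

adm2_demo <- mutate(adm2_demo, pop_per_health = pop / healthcentres)
adm2_demo$pop_per_health  # 30000 7500
```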
7) Sort the data to find the areas with the highest number of people per health facility.
```r
# Sort communes from most to fewest people per health facility
arrange(adm2north, desc(pop_per_health))
```
| adm1_en | adm2_en | pop_per_health | healthcentres |
|---|---|---|---|
| North-West | Chamsolme | 30361.0 | 1 |
| North | Limbe | 28434.0 | 3 |
| North | Pilate | 27025.0 | 2 |
| North | Borgne | 22307.0 | 3 |
| North | Saint-Raphael | 17918.3 | 3 |
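`adm2north` is still grouped by `adm1_en` after `summarise()`, but `arrange()` ignores grouping by default (`.by_group = FALSE`), which is why the ranking above runs across all three departments. The sketch below, on hypothetical data, contrasts this with `.by_group = TRUE` and uses `head()` to keep only the top rows.

```r
library(dplyr)

# Hypothetical commune summary, still grouped by department
adm2_demo <- tibble(
  adm1_en        = c("North", "North", "North-West"),
  adm2_en        = c("Commune A", "Commune B", "Commune C"),
  pop_per_health = c(5000, 20000, 12000)
) %>%
  group_by(adm1_en)

# Default: sorted across all groups
overall <- arrange(adm2_demo, desc(pop_per_health))
overall$adm2_en  # "Commune B" "Commune C" "Commune A"

# .by_group = TRUE sorts by the grouping variable first
within <- arrange(adm2_demo, desc(pop_per_health), .by_group = TRUE)
within$adm2_en   # "Commune B" "Commune A" "Commune C"

# Keep only the two communes with the most people per facility
head(ungroup(overall), 2)
```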
In one block
Steps 1 to 6 can also be chained together into a single pipeline.
```r
# Load packages
library(dplyr)
library(readr)
# Read in health centre data
hc <- read_csv('data/haiti/haiti-healthsites.csv')
adm2north <- hc %>%
  select(name, adm1_en, adm2_en, total) %>%
  filter(adm1_en %in% c("North", "North-East", "North-West")) %>%
  group_by(adm1_en, adm2_en) %>%
  # Population is constant within a commune, so the first value is enough
  summarise(healthcentres = n(), pop = first(total)) %>%
  mutate(pop_per_health = pop/healthcentres)
# Sort by population per health centre
arrange(adm2north, desc(pop_per_health))
```
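To check the chained version end to end without the real file, the sketch below runs essentially the same pipeline (with `first(total)` for the population and a final `ungroup()`) on a small, made-up stand-in for `hc`. The facilities, communes and populations are hypothetical; the `West` row is there to confirm that the filter removes it.

```r
library(dplyr)

# Made-up stand-in for the health sites data
hc_demo <- tibble(
  name    = paste("Facility", 1:5),
  adm1_en = c("North", "North", "North", "North-West", "West"),
  adm2_en = c("Commune A", "Commune A", "Commune B", "Commune C", "Commune D"),
  total   = c(60000, 60000, 45000, 24000, 99000)
)

demo_north <- hc_demo %>%
  select(name, adm1_en, adm2_en, total) %>%
  filter(adm1_en %in% c("North", "North-East", "North-West")) %>%
  group_by(adm1_en, adm2_en) %>%
  summarise(healthcentres = n(), pop = first(total)) %>%
  mutate(pop_per_health = pop / healthcentres) %>%
  ungroup()

ranked <- arrange(demo_north, desc(pop_per_health))
ranked$adm2_en         # "Commune B" "Commune A" "Commune C"
ranked$pop_per_health  # 45000 30000 24000
```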