Weekly Update: Survey data from Carers of Elderly Persons - Assessing how carers behave and how we could help them

Tony Wang
Jun 6, 2021


I recently became involved in a research project about carers of elderly persons here in Hong Kong. Although it has only been a week, I have already done quite a lot of data processing and analysis.

Background Information

From the lifespan perspective, an increasing share of the population will suffer from various levels of physical and/or cognitive impairment. Regular unpaid (informal) care will be required to support their daily functioning and to meet their health and social care needs. There are two major groups. Elderly persons require more assistance due to ageing. Persons with disabilities require assistance in daily living due to inborn disabilities, illnesses, tragedies or other unexpected life events.

In Hong Kong, while most elderly persons can live independently, approximately 2.5% (roughly 280,500) of community-dwelling older adults require assistance in daily living (Census and Statistics Department, 2009, 2012, 2013). Carers of elderly persons are persons taking care of frail elders who are aged 60 and above, who have different levels of physical and/or cognitive impairment, and who require assistance in their activities of daily living (ADL), such as eating, walking, and bathing, and/or instrumental activities of daily living (IADL), such as shopping, preparing meals, and taking medications, as well as emotional, financial, and decision-making support. According to the latest statistics, adult children constitute the largest group (37.3%) of unpaid carers, followed by spouses (26.3%) (Census and Statistics Department, 2009). When available, daughters provided the most care support for community-dwelling centenarians in Hong Kong.

This project proposes a multi-step risk assessment and care management approach with the aims of (1) coordinating and building a network of “graduated carers” and current carers to enhance the care capacity of carers and increase social capital; and (2) extending the concept of “age-friendly districts” to include “carers of elderly” to envisage age-friendly policy.

Our original data

The original data are downloaded from Qualtrics as several Excel files. Since the data collection period is not yet over, the dataset will be updated from time to time, so I decided to write some functions to update and clean it. However, the survey data is extremely complex, with a very large number of columns. These variables hold information about caregivers (age, job, and so on) and about care recipients. The care-recipient part gets particularly complex because a caregiver can have more than one care recipient.

Therefore, I first wrote this extract_loop function to loop through the care recipients and extract them. You may notice that the functions take a file path called ‘baseline_path’. This file is the essence of the whole thing: it defines the structure of my data (once you can organize all the observations in one structure, the dataset is basically cleaned). Another challenge is that carers have different past experiences, so we have to deal with blank entries, which can be resolved with a simple filter().

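library(readxl) #read_xlsx()
library(dplyr)  #bind_rows(), filter(), select()

#extract every care recipient block (17 columns each) from one survey file
#baseline_path is not used here; it is kept so the call matches extract_info
extract_loop <- function(dir, filename, baseline_path, name_columns){
  path <- file.path(dir, filename)
  data <- read_xlsx(path = path, sheet = 'Sheet0')
  vars <- colnames(data)
  block_width <- 17 #number of columns per care recipient
  #Start the loop here
  i <- 1
  loop <- data.frame()
  while(i <= length(vars)){
    #A recipient block starts at a column whose name begins with a digit 1-9
    starts_block <- substr(vars[i], 1, 1) %in% as.character(1:9)
    if(starts_block && (i + block_width - 1) <= length(vars)){
      print(vars[i])
      #Drop the first row and take the whole block
      temp <- data.frame(data[-1, i:(i + block_width - 1)])
      if(any(!is.na(temp))){
        colnames(temp) <- name_columns[12:28]
        temp <- cbind(data[c('Org','Case_ID')][-1,], temp)
        loop <- bind_rows(loop, temp)
      }
      i <- i + block_width #jump to the column after this block
    } else {
      i <- i + 1
    }
  }
  #Remove the blank entries of carers with fewer care recipients
  loop <- filter(loop, !is.na(A1_Q4))
  return(loop)
}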
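
To make the reshaping idea easier to follow, here is a minimal sketch on a made-up data frame. It is not the project code: the column names (`1_Q1`, `R_AGE`, and so on) are hypothetical, each block has 2 columns instead of 17, and it uses base R only so it runs without any packages.

```r
# Toy stand-in for the survey export (all names and values are made up):
# two care recipient blocks of 2 columns each, side by side.
wide <- data.frame(
  Org     = c("A", "A", "B"),
  Case_ID = c("c1", "c2", "c3"),
  `1_Q1`  = c("80", "75", "90"),
  `1_Q2`  = c("F", "M", "F"),
  `2_Q1`  = c("82", NA, NA),
  `2_Q2`  = c("M", NA, NA),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

block_width <- 2
block_names <- c("R_AGE", "R_SEX")  # hypothetical baseline names

# A block starts at every column named "<digit>..._Q1"
starts <- grep("^[1-9].*_Q1$", names(wide))

# Cut out each block, give it the common names, and attach the carer IDs
blocks <- lapply(starts, function(s) {
  block <- wide[, s:(s + block_width - 1)]
  names(block) <- block_names
  cbind(wide[c("Org", "Case_ID")], block)
})

# Stack the blocks: one row per (carer, care recipient)
long <- do.call(rbind, blocks)

# Drop the blank entries of carers who have only one care recipient
long <- long[!is.na(long$R_AGE), ]
print(long)
```

Carer c1 ends up with two rows, one per care recipient, while c2 and c3 keep one row each.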

Next we can deal with the information on the caregiver's side. Tidyverse functions can be used to select the important questions.

#extract caregiver information from one survey
#(change_name() is a helper that is not shown in this post)
extract_info <- function(dir, filename, baseline_path, columns){
  path <- file.path(dir, filename)
  data <- change_name(read_xlsx(path = path, sheet = 'Sheet0'))
  name_columns <- names(columns)
  start_col <- 18 #This is the column position of Org in the survey

  #Rename the 11 caregiver columns starting at Org to their baseline names
  names(data)[start_col:(start_col + 10)] <- name_columns[1:11]
  #Keep the caregiver columns: baseline columns 1-11 and 29 onwards
  keep <- name_columns[c(1:11, 29:length(columns))]
  info <- select(columns, all_of(keep))
  temp <- select(data, all_of(keep))[-1,] #drop the first row, as in extract_loop
  info <- rbind(info, temp)
  return(info)
}

We now have two functions. A third one is needed to loop through all the Excel files.

#integrate function integrates all the information extracted from loop
integrate <- function(dir, baseline_path){
  files <- list.files(dir)
  columns <- read_xlsx(path = baseline_path, sheet = 'All SPSS Code')
  name_columns <- names(columns)
  care_recipients <- data.frame()
  care_giver <- columns
  for(file in files){
    care_recipients <- rbind(care_recipients, extract_loop(dir, file, baseline_path, name_columns))
    care_giver <- rbind(care_giver, extract_info(dir, file, baseline_path, columns))
  }
  #Remove the repetitive answers
  care_recipients <- care_recipients[!duplicated(care_recipients),]
  care_giver <- care_giver[!duplicated(care_giver),]
  #Return only after every file has been processed
  return(list(care_giver, care_recipients))
}
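
The file-by-file accumulation can be tried out without the real survey. The sketch below is only an illustration under my own assumptions: it writes two throwaway CSV files (hypothetical names `wave1.csv` and `wave2.csv`) to a temporary folder instead of reading Qualtrics Excel exports, and it uses base R only.

```r
# Create a temporary folder with two small made-up survey files
tmp <- file.path(tempdir(), "survey_demo")
dir.create(tmp, showWarnings = FALSE)
write.csv(data.frame(Case_ID = c("c1", "c2"), Q1 = c("Yes", "No")),
          file.path(tmp, "wave1.csv"), row.names = FALSE)
write.csv(data.frame(Case_ID = c("c3", "c4"), Q1 = c("No", "Yes")),
          file.path(tmp, "wave2.csv"), row.names = FALSE)

# List only the CSV files, with their full paths
files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# Read every file, then stack the pieces once all of them are read
pieces <- lapply(files, read.csv, stringsAsFactors = FALSE)
combined <- do.call(rbind, pieces)
print(combined)
```

Reading all the pieces into a list and binding them once at the end avoids growing a data frame inside the loop, which matters little for a handful of files but keeps the pattern simple.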

You may notice the “Remove the repetitive answers” comment in the code. This handles a special situation that occurs when the survey is not filled in properly and the same answers appear more than once. The duplicated() function is used to eliminate the unnecessary repetitions.
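
As a small illustration of what duplicated() does, here is a sketch on made-up answers (the column names are hypothetical):

```r
# Made-up answers where the second row repeats the first one exactly
answers <- data.frame(
  Case_ID = c("c1", "c1", "c2"),
  Q1      = c("Yes", "Yes", "No"),
  stringsAsFactors = FALSE
)

# duplicated() flags a row only if an identical row appeared earlier
dup <- duplicated(answers)
print(dup)  # FALSE TRUE FALSE

# Keep the first occurrence of each row
deduped <- answers[!dup, ]
print(deduped)
```

Note that only exact repeats are removed. If a carer submitted twice with even one different answer, both rows stay and need to be resolved by hand.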

Finally, we can wrap all the functions in one tidy function, which also writes the cleaned data to new CSV files and returns all of it.

#cleansurvey function outputs the cleaned data
cleansurvey <- function(path_recipient, path_giver, dir, baseline_path){
  result <- integrate(dir, baseline_path)
  names(result) <- c('Giver', 'Recipient')
  #Use $ so that a data frame, not a one-element list, is written
  write.csv(result$Giver, path_giver)
  write.csv(result$Recipient, path_recipient)
  return(result)
}

Mark the data

The last section I would like to cover is how to do the marking and assign weights to answers.

Marking <- function(dir, baseline_path, result){
  Mark_baseline <- read_xlsx(path = baseline_path, sheet = 'Qs Marking')

  temp <- result$Giver
  #The first column of the sheet is skipped; every other column is one question
  for(qs in names(Mark_baseline)[-1]){
    answers <- temp[[qs]]
    #The mark is the position of the answer in the baseline column
    marks <- match(answers, Mark_baseline[[qs]])
    #A missing answer gets a mark of 0
    marks[is.na(answers)] <- 0
    temp[[qs]] <- marks
  }

  resultmarked <- c(result, Giver_marked = list(temp))

  return(resultmarked)
}

This piece of code is quite neat considering that dozens of questions are marked through it. The secret lies in the baseline file I created. Its 'Qs Marking' sheet has one column per question that lists the possible answers, and the position of an answer in that column becomes its mark.

Figure: the Qs Marking sheet from the baseline file
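
Here is a minimal sketch of the position-based marking on made-up data. The answer options below are hypothetical and are not the real content of the baseline sheet.

```r
# Hypothetical marking column: answers listed from worst to best
mark_baseline <- data.frame(
  Q26_SLEEP = c("Very poor", "Poor", "Fair", "Good"),
  stringsAsFactors = FALSE
)

# Made-up survey answers, including one blank
answers <- c("Good", NA, "Poor", "Fair")

# The mark is the position of each answer in the baseline column
scores <- match(answers, mark_baseline$Q26_SLEEP)

# Blank answers get a mark of 0
scores[is.na(answers)] <- 0L
print(scores)  # 4 0 2 3
```

An answer that is not blank but also does not appear in the baseline column would still come out as NA, which is a handy signal that the sheet is missing an option.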

Working with factors instead is also a reasonable choice if you prefer them. For example, we can manipulate the data to plot sleep condition against health condition. The y-axis is the mean health score for each sleep condition, and the x-axis is Q26 (as a factor) reordered by that mean health score. In the plot, health condition and sleep condition appear to be strongly related.

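library(dplyr)
library(forcats) #fct_reorder()
library(ggplot2)

result$Giver %>%
  select(contains(c('Q19','Q26'))) %>%
  mutate(across(-Q19_EQ5D_VAS_1, as.factor)) %>%
  mutate(Q19_EQ5D_VAS_1 = as.integer(Q19_EQ5D_VAS_1)) %>%
  group_by(Q26_SLEEP) %>%
  #na.rm = TRUE so that a missing health score does not turn a group mean into NA
  summarise(mean = mean(Q19_EQ5D_VAS_1, na.rm = TRUE)) %>%
  ggplot() +
  geom_point(aes(x = fct_reorder(Q26_SLEEP, mean), y = mean))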
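
The reordering step can be checked on made-up numbers. This is a sketch under my own assumptions: the scores are invented, and I use base R (split(), sapply(), and factor() with explicit levels) as a stand-in for group_by(), summarise() and fct_reorder() so that it runs without any packages.

```r
# Made-up caregiver data: sleep condition and health score (0-100)
giver <- data.frame(
  Q26_SLEEP      = c("Good", "Poor", "Good", "Fair", "Poor", "Fair"),
  Q19_EQ5D_VAS_1 = c(80, 40, 90, 60, 50, 70),
  stringsAsFactors = FALSE
)

# Mean health score per sleep condition
means <- sapply(split(giver$Q19_EQ5D_VAS_1, giver$Q26_SLEEP), mean)

# Sort the groups by their mean, lowest first
means <- sort(means)
print(means)

# Use that order as the factor levels, which is what the x-axis will follow
giver$Q26_SLEEP <- factor(giver$Q26_SLEEP, levels = names(means))
print(levels(giver$Q26_SLEEP))
```

With these numbers the levels come out as Poor, Fair, Good, so the points on a plot would rise from left to right.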

Summary

This post would get too long if I included radar plots and grouping, so those will be the focus of next week's update. See you then.

Written by Tony Wang

I am currently an undergraduate at HKU, studying statistics and computer science. Economics also attracts me a lot. You can also call me an active effective altruism practitioner.
