Exploring Census Household Pulse Survey Part 1
Background
The Census Bureau began the Household Pulse Survey to measure the impacts of the Coronavirus Pandemic on the U.S. Household Population. This post will demonstrate some basics of downloading the data, getting it into R, and doing some simple analysis.
Download this post as an R Markdown file here.
Getting the data
This analysis will be based around using the Public Use File (PUF). The PUF contains the person-level responses to the survey and can be used to produce custom estimates. The PUFs for each week are published at https://www.census.gov/programs-surveys/household-pulse-survey/datasets.html.
The code below will download and unzip the data.
# icesTAF::mkdir("Data")
# download.file("https://www2.census.gov/programs-surveys/demo/datasets/hhp/2020/wk1/HPS_Week01_PUF_CSV.zip", "Data/HPS_Week01_PUF_CSV.zip")
# unzip("Data/HPS_Week01_PUF_CSV.zip", exdir = "Data/HPS_Week01_PUF_CSV")
Working with the data
library(forcats)
library(scales)
library(srvyr)
library(tidyverse)
First, we read in the PUF.
puf <- read_csv(file = "Data/HPS_Week01_PUF_CSV/pulse2020_puf_01.csv")
The PUF contains the PWEIGHT variable to produce (weighted) estimates. In order to calculate standard errors though, we also need the “Replicate Weights” file, attaching it to the PUF.
repweights <- read_csv(file = "Data/HPS_Week01_PUF_CSV/pulse2020_repwgt_puf_01.csv")
puf_w_weights <- inner_join(puf, repweights, by = c("SCRAM","WEEK"))
Now, we convert data frame to survey object. This allows for calculating summary statistics without re-specifying the weight each time.
wgts <- colnames(repweights)[3:length(colnames(repweights))]
survey_puf <- as_survey_rep(puf_w_weights, id = SCRAM, weights = PWEIGHT,
repweights = all_of(wgts), type = "Fay", rho = 0.5005)
And now we should be all set to start analyzing the data. First, let’s make sure we know what we are doing by estimating something that already appears in the Detailed Tables. Specifically, we’ll look at the Housing 2b table. The table states that there are 8,918,242 persons in renter occupied housing units with No Confidence in the Ability to Pay Next Month’s Rent, 12,571,649 with slight confidence, and so on. We can replicate these numbers, adding the category IDs from the data.
renters_payment_confidence <-
survey_puf %>%
filter(WEEK == "1" & TENURE == "3") %>%
group_by(MORTCONF) %>%
survey_count() %>%
mutate_if(is.numeric, round, digits = 0)
renters_payment_confidence$MORTCONF <- factor(renters_payment_confidence$MORTCONF, labels =
c( "Question Seen But Category Not Collected",
"Missing / Did Not Report",
"No Confidence",
"Slight Confidence",
"Moderate Confidence",
"High Confidence",
"Payment Deferred"))
knitr::kable(renters_payment_confidence, format.args = list(big.mark = ","))
MORTCONF | n | n_se |
---|---|---|
Question Seen But Category Not Collected | 170,927 | 61,811 |
Missing / Did Not Report | 153,139 | 37,374 |
No Confidence | 8,918,242 | 377,552 |
Slight Confidence | 12,571,649 | 374,676 |
Moderate Confidence | 18,070,862 | 480,523 |
High Confidence | 30,643,777 | 609,009 |
Payment Deferred | 938,815 | 153,909 |
We see that we were able to successfully reproduce the estimates. However, the standard errors are slightly off. (If anyone knows why, please let me know.)
Now we can produce custom estimates. For example, the pulse asks respondents a series of questions about their mental health. Health Table 2a lists Symptoms of Anxiety By Selected Characteristics. However, the respondent’s housing situation is not one of the characteristics.
Let’s look at the symptoms of anxiety for renters.
anxiety_for_all_renters <- survey_puf %>%
filter(WEEK == "1" & TENURE == "3") %>%
group_by(ANXIOUS) %>%
summarise(proportion = survey_mean())
anxiety_for_all_renters$Group <- "All Renters"
anxiety_for_renters_w_no_conf <- survey_puf %>%
filter( WEEK == "1" &
TENURE == "3" &
MORTCONF == "1" ) %>%
group_by(ANXIOUS) %>%
summarise(proportion = survey_mean())
anxiety_for_renters_w_no_conf$Group <- "Renters With No Confidence
in Paying Next Month's Rent"
anxiety_for_renters <- rbind(anxiety_for_all_renters, anxiety_for_renters_w_no_conf)
anxiety_for_renters$ANXIOUS <- factor(anxiety_for_renters$ANXIOUS, labels = c("Missing",
"Not at all",
"Several days",
"More than half the days",
"Nearly every day"))
ggplot(anxiety_for_renters, aes(x = ANXIOUS, y = proportion, fill = Group)) +
geom_bar(stat = "identity", position = "dodge") +
theme(axis.text.x = element_text(angle = 45)) +
xlab("Over the last 7 days, how often have you been bothered by the
following problems: Feeling nervous, anxious, or on edge? Would you
say not at all, several days, more than half the days, or nearly every
day?") +
ylab("Share of Respondants") +
scale_y_continuous(labels = scales::percent)
As might be expected, renters who report not being confident in their ability to pay next month’s rent also report feeling anxious more often.
One thing to note about the Pulse is that tenure (whether the home is owned or rented) is missing for a large number of the respondents.
tenure_w_missing <- survey_puf %>%
filter(WEEK == "1") %>%
group_by(TENURE) %>%
summarise(proportion = survey_mean())
tenure_w_missing$TENURE <- factor(tenure_w_missing$TENURE, labels =
c( "Question Seen But Category Not Collected",
"Missing / Did Not Report",
"Owned free and clear",
"Owned with a mortgage",
"Rented",
"Occupied without payment of rent"))
tenure_w_missing$proportion <- scales::label_percent()(tenure_w_missing$proportion)
tenure_w_missing[,1:2]
## # A tibble: 6 x 2
## TENURE proportion
## <fct> <chr>
## 1 Question Seen But Category Not Collected 0.5%
## 2 Missing / Did Not Report 10.2%
## 3 Owned free and clear 18.3%
## 4 Owned with a mortgage 40.8%
## 5 Rented 28.7%
## 6 Occupied without payment of rent 1.5%
We can see from below that even though tenure is missing for a large share of respondants, the proportion of owners and renters appears in line with that reported in the American Community Survey.
tenure_no_missing <- survey_puf %>%
filter(WEEK == "1" & TENURE != "-88" & TENURE != "-99") %>%
group_by(TENURE) %>%
summarise(proportion = survey_mean())
tenure_no_missing$TENURE <- factor(tenure_no_missing$TENURE, labels =
c( "Owned free and clear",
"Owned with a mortgage",
"Rented",
"Occupied without payment of rent"))
tenure_no_missing$proportion <- scales::label_percent()(tenure_no_missing$proportion)
tenure_no_missing[,1:2]
## # A tibble: 4 x 2
## TENURE proportion
## <fct> <chr>
## 1 Owned free and clear 21%
## 2 Owned with a mortgage 46%
## 3 Rented 32%
## 4 Occupied without payment of rent 2%