Presentation Exercise

Get the original data

I found this data set from the FiveThirtyEight website about American opinions on the Coronavirus in relation to political events from March 2020 to April 2021: https://data.fivethirtyeight.com/ I chose to use the covid_concern_polls.csv file to recreate the graph about how worried Americans are about Covid infection, which can be found here. I added the original graph in this section for convenience. Image Alt Text I examined the data to discover there are start and end dates for each response and four levels of worry for each topic: very, somewhat, not very, and not at all, which are all represented on the graph I want to replicate. There are 678 observations of 15 variables stored in this dataset.

library(readr)
library(here)
pollsdata <- read_csv(here("presentation-exercise", "covid_concern_polls.csv"))
str(pollsdata)
spc_tbl_ [678 × 15] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ start_date : Date[1:678], format: "2020-01-27" "2020-01-31" ...
 $ end_date   : Date[1:678], format: "2020-01-29" "2020-02-02" ...
 $ pollster   : chr [1:678] "Morning Consult" "Morning Consult" "YouGov" "Morning Consult" ...
 $ sponsor    : chr [1:678] NA NA "Economist" NA ...
 $ sample_size: num [1:678] 2202 2202 1500 2200 1000 ...
 $ population : chr [1:678] "a" "a" "a" "a" ...
 $ party      : chr [1:678] "all" "all" "all" "all" ...
 $ subject    : chr [1:678] "concern-economy" "concern-economy" "concern-infected" "concern-economy" ...
 $ tracking   : logi [1:678] FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ text       : chr [1:678] "How concerned are you that the coronavirus will impact the following? U.S. economy" "How concerned are you that the coronavirus will impact the following? U.S. economy" "Taking into consideration both your risk of contracting it and the seriousness of the illness, how worried are "| __truncated__ "How concerned are you that the coronavirus will impact the following? U.S. economy" ...
 $ very       : num [1:678] 19 26 13 23 11 11 22 22 22 10 ...
 $ somewhat   : num [1:678] 33 32 26 32 24 28 23 35 21 28 ...
 $ not_very   : num [1:678] 23 25 43 24 33 39 37 28 33 42 ...
 $ not_at_all : num [1:678] 11 7 18 9 20 22 19 15 23 19 ...
 $ url        : chr [1:678] "https://morningconsult.com/wp-content/uploads/2020/02/200167_crosstabs_CORONAVIRUS_Adults_v2_JB-1.pdf" "https://morningconsult.com/wp-content/uploads/2020/02/200191_crosstabs_CORONAVIRUS_Adults_v2_JB-1.pdf" "https://d25d2506sfb94s.cloudfront.net/cumulus_uploads/document/73jqd6u5mv/econTabReport.pdf" "https://morningconsult.com/wp-content/uploads/2020/02/200214_crosstabs_CORONAVIRUS_Adults_v4_JB.pdf" ...
 - attr(*, "spec")=
  .. cols(
  ..   start_date = col_date(format = ""),
  ..   end_date = col_date(format = ""),
  ..   pollster = col_character(),
  ..   sponsor = col_character(),
  ..   sample_size = col_double(),
  ..   population = col_character(),
  ..   party = col_character(),
  ..   subject = col_character(),
  ..   tracking = col_logical(),
  ..   text = col_character(),
  ..   very = col_double(),
  ..   somewhat = col_double(),
  ..   not_very = col_double(),
  ..   not_at_all = col_double(),
  ..   url = col_character()
  .. )
 - attr(*, "problems")=<externalptr> 
dim(pollsdata)
[1] 678  15

Ask AI to recreate the original graph

This is the original prompt that I asked ChatGPT 3.5: R code to generate the graph titled “How worried are Americans about infection?” on this website https://projects.fivethirtyeight.com/coronavirus-polls/ using the covid_concern_polls.csv file from this github repository https://github.com/fivethirtyeight/covid-19-polls

I used the links to the website and GitHub repository to ensure that it knew which graph I am trying to recreate, which has been saved as Inspiration_graph.png in the presentation-exercise folder. The first step it gave me was to load three packages: readr, dplyr, and ggplot2. The second step was to save the data as an object, which I have already done. Next, it suggested filtering the data for the question regarding infection: “How worried are you about personally being infected with coronavirus?”. I made several modifications to the code because it did not know the object name that I stored my data in, and it did not know the exact verbiage of the question that was asked. This was not successful because I do not have one “estimate” for the percentage of people in each of the worry categories ranging from not_at_all to very. I should have known that this code would not be accurate because it said these steps will produce a polar bar graph when I am trying to produce a line graph. I commented out this code for the sake of rendering my website with this exercise.

# load required packages
library(dplyr)
library(ggplot2)

# modified filtering code
# concern_data <- pollsdata %>%
 # filter(text == "How worried are you about you or someone in your family being infected with the Coronavirus??") %>%
 # select(starts_with("estimate"))

This is the second prompt I asked ChatGPT: R code to produce a line graph with percentages of four different categories “very”, “somewhat”, “not very”, and “not at all” on the y-axis and time on the x-axis using the covid_concern_polls.csv file from this github repository https://github.com/fivethirtyeight/covid-19-polls

I asked for a line graph specifically, and I know that I need to create variables with the percentages of each response category. I filtered for the subject of interest: “concern-infected” to limit the data to information about concern over Coronavrius infection. The graph models how concerned Americans are that they, someone in their family or someone else they know will be infected with Coronavirus, so I filtered for responses to all of the questions asking about concern over Coronavirus infections, which yielded 77 observations.

# filter for all responses about Coronavirus concern
concern_data <- pollsdata %>% 
    filter(subject == "concern-infected")

The next suggestion from ChatGPT was to create percentages for each of the response variables. The first variable that had to be changed here is the grouping factor because modeldate was not included in the original dataset. Additionally, there were no “estimate” variables, so I just used the column names instead because each column had count data for that observation.

# group data by date and calculate percentages for each category
concern_data2 <- concern_data %>%
  group_by(end_date) %>%
  summarise(
    very_percent = sum(very) / n(),
    somewhat_percent = sum(somewhat) / n(),
    not_very_percent = sum(not_very) / n(),
    not_at_all_percent = sum(not_at_all) / n()
  )

# check that it worked 
str(concern_data2)
tibble [226 × 5] (S3: tbl_df/tbl/data.frame)
 $ end_date          : Date[1:226], format: "2020-02-04" "2020-02-09" ...
 $ very_percent      : num [1:226] 13 11 11 22 16 10 16 19 23 30 ...
 $ somewhat_percent  : num [1:226] 26 24 28 23 24.5 30 31 33 32.5 32 ...
 $ not_very_percent  : num [1:226] 43 33 39 37 37.5 40 27 32.5 27 24 ...
 $ not_at_all_percent: num [1:226] 18 20 22 19 21 20 16 15 17 15 ...

ChatGPT suggested changing the shape of the data before attempting to visualize it. It did not account for the fact that I need the tidyr package to use the pivot_longer() function, so I loaded that package here. This transformation left me with a data set with 276 observations of 3 variables.

# load packages
library(tidyr)

# convert shape of the dataset
concern_data_long <- concern_data2 %>%
  pivot_longer(cols = c(very_percent, somewhat_percent, not_very_percent, not_at_all_percent),
               names_to = "Concern Level",
               values_to = "Percentage")

Lastly, I attempted to recreate the plot from the original webpage. The first attempt is not bad, but it looks like I have missing data in the “not_at_all_percent” variable which was not the case in the graph on the website.

# attempt to recreate the original plot
graph1 <- ggplot(concern_data_long, aes(x = end_date, y = Percentage, color = `Concern Level`)) +
  geom_line() +
  labs(title = "COVID-19 Concern Levels Over Time",
       x = "Date",
       y = "Percentage",
       color = "Concern Level") +
  theme_minimal()
graph1

After finding there are 6 missing observations of the Percentage variable, I decided to omit those values because they account for such a minimal percentage of total observations, which fixed the strange gap in the line graph.

# explore for missing data and remove it
sum(is.na(concern_data_long$Percentage))
[1] 8
concern_data_long <- na.omit(concern_data_long)

# check that missing data removal fixed the issue
graph2 <- ggplot(concern_data_long, aes(x = end_date, y = Percentage, color = `Concern Level`)) +
  geom_line() +
  labs(title = "COVID-19 Concern Levels Over Time",
       x = "Date",
       y = "Percentage",
       color = "Concern Level") +
  theme_minimal()
graph2

I need to make aesthetic changes, so the replicated graph matches the original graph more accurately. The title needs to be changed and centered. I used ChatGPT to create code to separate the title into three lines and change the size of the second title line, so it will fit better. I got a warning message that vectorized input to element_text is not supported in ggplot2, so all three lines of the title are the same size. The original x-axis labels each month, so I created custom labels and dropped the “date” label for the entire axis. While the original graph is interactive, that is slightly out of reach with my coding knowledge right now, so I decided to keep the stagnant legend. I noticed the legend was inverted, so I corrected that. I manually changed the colors of the lines using hex codes.

# create 14 breaks for the x-axis and custom labels for each break
breaks <- seq(as.Date("2020-02-28"), by = "month", length.out = 14)
custom_labels <- c("3/1/20", "4/1", "5/1", "6/1", "7/1",
                   "8/1", "9/1", "10/1", "11/1", "12/1", "1/1/21", "2/1", "3/1", "4/1")

# create updated version of the graph with modifications
graph3 <- ggplot(concern_data_long, aes(x = end_date, y = Percentage, color = `Concern Level`)) +
    geom_line() +
    scale_color_manual(values = c("#800080","#BA55D3","#FFDAB9","#FF0000")) +
    labs(title = "How worried are Americans about infection?",
         subtitle = paste("How concerned Americans say they are that they, someone in their family or someone else they know will", "\n", "become infected with the coronavirus")) +
    theme(plot.title = element_text(hjust = 0.5, size = 10), 
          plot.subtitle = element_text(hjust = 0.5, size = 8)) +
    scale_x_date(breaks = breaks, labels = custom_labels) +
    xlab(NULL) + 
    guides(color = guide_legend(reverse = TRUE))
graph3

Lastly, I need to add labels on specific dates. I used this simple prompt in ChatGPT: how to add labels to specific dates on a line graph. It suggested adding a geom_text() layer. I had to go back and forth with ChatGPT a couple times to find a date format that worked. Unfortunately, I received a consistent error about the geom_text() layer not being able to find the Percentage variable which prevented it from adding the labels onto my existing graph. I attempted the same format with geom_label() and got the same error. I added an arbitrary Percentage variable to the labels_data just to see if that would fix the error, but it was also unsuccessful. I would appreciate any input on how to solve this error. I commented out the last piece of code for the sake of rendering my website.

# create custom labels to be added as another layer on the original graph 
labels_data <- data.frame(
  modeldate = as.Date(c("2020-02-29", "2020-05-28", "2020-10-02", "2020-11-07", "2021-01-02")),  
  label_text = c("First US death reported", "US deaths surpass 100,000", "Trump diagnosed with Covid-19", "Biden declared election winner", "Biden sworn into office",
  Percentage = c(60, 60, 60, 60, 60))
)

 ## graph4 <- graph3 + 
 ##  geom_label(data = labels_data, aes(x = modeldate, y=Percentage, label = label_text))

After receiving input from a classmate, Erick Mollinedo, I added several geom_vline() and geom_text() functions to add the missing labels. The geom_vline() functions created dashed lines on each of the important dates, which I think solved the previous issue because it provided a location for the new layer to be added to in ggplot(). The geom_text() functions allow the actual descriptions to be added. The labels were a little too large for my graph originally, so I changed the size of each one to make them fit better. This addition brought me much closer to replicating the original graph although my lines appear to be significantly more jagged than the original. I also received a suggestion to use the ggthemes package to make the final replication appear closer to the original by adding the theme_538() function. I received multiple errors about not being able to find this function, so ChatGPT suggested creating my own function to mimic the 538 theme. This change removed the gridlines in the background and adjusted the sizing of the x-axis labels to match the original graph more closely.

# create a function to mimic the 538 style
theme_538 <- function() {
  theme_minimal() +
    theme(
      axis.title = element_text(size = 12),
      axis.text = element_text(size = 10),
      panel.grid.major = element_blank(),
      panel.grid.minor = element_blank(),
      panel.border = element_blank(),
      panel.background = element_blank(),
      axis.line = element_line(color = "black"),
      axis.ticks = element_line(color = "black"),
      plot.title = element_text(size = 14, hjust = 0.5)
    )
}

# apply the 538 style function and add labels to dates of interest
graph4 <- ggplot(concern_data_long, aes(x = end_date, y = Percentage, color = `Concern Level`)) +
    geom_line() +
    theme_538() +
    scale_color_manual(values = c("#800080","#BA55D3","#FFDAB9","#FF0000")) +
    labs(title = "How worried are Americans about infection?",
         subtitle = paste("How concerned Americans say they are that they, someone in their family or someone else they know will", "\n", "become infected with the coronavirus")) +
    theme(plot.title = element_text(hjust = 0.5, size = 10), 
          plot.subtitle = element_text(hjust = 0.5, size = 8)) +
    scale_x_date(breaks = breaks, labels = custom_labels) +
    xlab(NULL) + 
    guides(color = guide_legend(reverse = TRUE)) +
    geom_vline(xintercept = as.Date("2020-02-29"), linetype = "dashed") + 
    geom_vline(xintercept = as.Date("2020-05-28"), linetype = "dashed") +
    geom_vline(xintercept = as.Date("2020-10-02"), linetype = "dashed") +
    geom_vline(xintercept = as.Date("2020-11-07"), linetype = "dashed") +
    geom_vline(xintercept = as.Date("2021-01-20"), linetype = "dashed") +
    geom_text(aes(x = as.Date("2020-02-29"), y = 55, label = paste("First U.S.", "\n", "death reported")), size = 3, angle = 0, vjust = 0, fontface = "italic", color = "black") + 
  geom_text(aes(x = as.Date("2020-05-28"), y = 55, label = paste("U.S. deaths", "\n", "surpass 100,000")), size = 3, angle = 0, vjust = 0, fontface = "italic", color = "black") +
  geom_text(aes(x = as.Date("2020-10-02"), y = 48, label = paste("Trump diagnosed", "\n", "with COVID-19")), size = 3, angle = 0, vjust = 0, fontface = "italic", color = "black") +
  geom_text(aes(x = as.Date("2020-11-07"), y = 57, label = paste("Biden declared", "\n", "election winner")), size = 3, angle = 0, vjust = 0, fontface = "italic", color = "black") +
  geom_text(aes(x = as.Date("2021-01-20"), y = 55, label = paste("Biden sworn", "\n", "into office")), size = 3, angle = 0, vjust = 0, fontface = "italic", color = "black")
graph4

Original graph included again for quick comparison. Image Alt Text

Since Dr. Handel recommended that new users begin with the gt package when creating tables, I specifically asked ChatGPT to use that package with this prompt: R code using the gt package to create a table displaying the percentages for four categories (very, somewhat, not very, not at all) for each daily observation. I am using the same data from the exercise above, which is stored as concern_data2.

The original code provided by ChatGPT had outdated syntax in the column argument of the tab_spanner() function, so I updated that using the simple c() function. I also had to add the real variable names found in concern_data2.

# load packages
library(gt)

# generate first attempt at publication quality table
table1 <- concern_data2 %>%
  gt() %>%
  tab_spanner(
    label = "Percentage",
    columns = c(very_percent, somewhat_percent, not_very_percent, not_at_all_percent)
  ) %>%
  tab_header(
    title = "Observations with Percentage Breakdown"
  )
table1
Observations with Percentage Breakdown
end_date Percentage
very_percent somewhat_percent not_very_percent not_at_all_percent
2020-02-04 13.00000 26.00000 43.00000 18.000000
2020-02-09 11.00000 24.00000 33.00000 20.000000
2020-02-11 11.00000 28.00000 39.00000 22.000000
2020-02-16 22.00000 23.00000 37.00000 19.000000
2020-02-18 16.00000 24.50000 37.50000 21.000000
2020-02-25 10.00000 30.00000 40.00000 20.000000
2020-02-28 16.00000 31.00000 27.00000 16.000000
2020-03-03 19.00000 33.00000 32.50000 15.000000
2020-03-08 23.00000 32.50000 27.00000 17.000000
2020-03-09 30.00000 32.00000 24.00000 15.000000
2020-03-10 14.00000 35.00000 35.00000 16.000000
2020-03-11 24.00000 26.00000 39.00000 11.000000
2020-03-12 22.33333 36.33333 26.33333 11.333333
2020-03-13 25.00000 33.66667 23.33333 14.000000
2020-03-15 33.66667 31.33333 22.33333 10.333333
2020-03-16 23.00000 34.66667 28.66667 13.333333
2020-03-17 15.00000 41.00000 33.00000 11.000000
2020-03-19 26.93333 41.83333 22.13333 8.100000
2020-03-22 39.00000 32.00000 18.00000 9.000000
2020-03-23 33.33333 37.33333 21.00000 8.666667
2020-03-24 35.40000 35.00000 21.40000 8.400000
2020-03-25 33.20000 38.20000 19.60000 8.400000
2020-03-26 36.50000 34.50000 20.50000 8.500000
2020-03-27 37.50000 35.50000 18.50000 8.500000
2020-03-28 31.70000 41.10000 20.00000 7.050000
2020-03-29 35.00000 36.66667 17.66667 8.666667
2020-03-30 30.60000 34.60000 20.80000 12.400000
2020-03-31 34.00000 36.50000 20.50000 8.250000
2020-04-01 37.00000 35.66667 19.00000 8.333333
2020-04-02 44.00000 35.33333 15.00000 5.333333
2020-04-03 40.00000 37.00000 16.50000 6.500000
2020-04-04 39.10000 38.50000 14.90000 5.550000
2020-04-05 35.00000 37.00000 16.00000 10.000000
2020-04-06 43.33333 35.33333 15.33333 5.666667
2020-04-07 40.16667 36.00000 16.16667 7.000000
2020-04-08 41.50000 34.00000 18.00000 6.500000
2020-04-09 40.25000 39.00000 15.00000 6.000000
2020-04-10 34.92500 36.35000 17.67500 8.225000
2020-04-11 34.00000 40.00000 19.00000 6.000000
2020-04-12 23.80000 41.60000 23.20000 10.000000
2020-04-13 35.33333 38.00000 18.66667 8.000000
2020-04-14 31.59286 37.78143 21.59286 8.831429
2020-04-15 37.00000 35.00000 19.66667 7.666667
2020-04-16 37.33333 35.66667 20.33333 6.666667
2020-04-17 34.50000 35.50000 22.50000 7.500000
2020-04-18 32.40000 37.80000 21.00000 8.200000
2020-04-19 25.40000 37.80000 22.60000 12.000000
2020-04-20 37.33333 34.33333 21.00000 7.333333
2020-04-21 36.78667 36.70833 19.27500 7.236667
2020-04-22 36.00000 37.00000 19.33333 7.666667
2020-04-23 38.33333 38.33333 18.00000 5.666667
2020-04-24 36.50000 36.50000 22.50000 5.500000
2020-04-25 27.80000 43.10000 19.20000 9.200000
2020-04-26 30.28571 35.85714 20.42857 11.000000
2020-04-27 36.50000 34.50000 23.00000 5.500000
2020-04-28 32.62200 38.05200 21.70200 7.692000
2020-04-29 38.50000 35.50000 20.00000 6.500000
2020-04-30 37.66667 37.33333 18.33333 5.666667
2020-05-01 33.66667 37.33333 19.33333 9.333333
2020-05-02 29.70000 38.20000 22.00000 9.000000
2020-05-03 28.33333 35.33333 22.33333 12.666667
2020-05-04 39.00000 32.20000 18.60000 9.800000
2020-05-05 30.32833 36.97500 22.19167 10.105000
2020-05-06 38.50000 32.50000 21.00000 8.500000
2020-05-07 36.33333 37.00000 19.33333 6.666667
2020-05-08 37.00000 35.00000 20.50000 7.500000
2020-05-09 31.60000 39.90000 18.80000 8.700000
2020-05-10 24.20000 37.80000 22.80000 13.200000
2020-05-11 36.50000 34.00000 21.50000 8.500000
2020-05-12 31.73400 37.18200 21.12800 9.850000
2020-05-13 35.66667 36.33333 19.00000 9.000000
2020-05-14 33.50000 37.25000 19.75000 10.000000
2020-05-15 40.50000 32.00000 18.00000 9.500000
2020-05-16 28.60000 37.60000 20.90000 12.400000
2020-05-17 29.60000 34.60000 21.60000 11.800000
2020-05-18 39.00000 33.25000 19.00000 8.000000
2020-05-19 26.49333 39.33000 22.45667 11.320000
2020-05-20 40.00000 34.00000 19.50000 6.000000
2020-05-21 36.00000 42.00000 15.00000 7.000000
2020-05-23 27.60000 37.40000 24.80000 9.700000
2020-05-24 20.50000 38.50000 27.50000 11.500000
2020-05-25 32.75000 36.25000 18.50000 11.500000
2020-05-26 28.44000 37.74667 23.22333 10.226667
2020-05-27 26.00000 38.00000 22.00000 13.000000
2020-05-28 30.00000 33.00000 22.00000 13.000000
2020-05-30 24.60000 36.30000 27.40000 10.700000
2020-05-31 22.25000 35.75000 24.25000 15.000000
2020-06-01 33.25000 34.00000 20.50000 11.500000
2020-06-02 22.60000 38.40000 25.63333 12.066667
2020-06-03 14.00000 37.00000 20.00000 29.000000
2020-06-05 22.00000 38.00000 28.00000 12.000000
2020-06-06 23.70000 37.60000 23.90000 14.300000
2020-06-07 20.25000 36.00000 24.75000 16.000000
2020-06-08 31.50000 33.50000 22.50000 13.000000
2020-06-09 23.77667 37.13667 24.82333 13.963333
2020-06-11 28.00000 41.00000 23.00000 8.000000
2020-06-13 26.00000 37.80000 22.40000 12.300000
2020-06-14 21.33333 37.00000 24.00000 15.333333
2020-06-15 34.25000 33.00000 20.25000 12.250000
2020-06-16 30.30000 33.60000 26.20000 8.833333
2020-06-21 16.00000 37.00000 28.00000 17.500000
2020-06-22 32.00000 33.33333 22.00000 11.333333
2020-06-23 25.37500 38.59000 22.97000 12.875000
2020-06-25 34.00000 43.00000 19.00000 5.000000
2020-06-26 34.00000 37.00000 17.00000 11.000000
2020-06-27 23.10000 39.30000 21.80000 13.800000
2020-06-28 19.00000 39.50000 25.50000 15.500000
2020-06-29 33.00000 38.00000 18.00000 9.000000
2020-06-30 34.25000 33.25000 15.25000 12.000000
2020-07-05 18.00000 39.00000 25.50000 15.000000
2020-07-06 35.00000 36.00000 15.00000 10.000000
2020-07-07 28.83500 37.58000 20.81000 12.160000
2020-07-12 26.00000 36.00000 21.66667 14.000000
2020-07-13 47.00000 30.50000 13.50000 9.000000
2020-07-14 24.00000 40.00000 26.00000 11.000000
2020-07-15 33.00000 33.00000 17.00000 12.000000
2020-07-19 26.66667 37.00000 20.66667 13.000000
2020-07-20 48.00000 31.50000 13.00000 7.000000
2020-07-21 29.35500 36.37000 22.68000 10.575000
2020-07-22 51.00000 28.00000 20.00000 1.000000
2020-07-23 37.00000 40.00000 17.00000 6.000000
2020-07-24 35.00000 37.00000 17.00000 11.000000
2020-07-26 22.50000 39.00000 21.50000 14.500000
2020-07-27 38.00000 33.00000 17.00000 9.000000
2020-07-28 29.07333 36.47000 22.26333 10.536667
2020-08-02 24.50000 44.50000 20.00000 10.500000
2020-08-03 40.00000 32.50000 17.00000 8.500000
2020-08-04 25.00000 40.00000 23.00000 12.000000
2020-08-09 11.00000 44.00000 30.00000 15.000000
2020-08-10 39.50000 31.50000 17.00000 11.000000
2020-08-11 31.43000 36.58500 19.06000 12.510000
2020-08-15 31.00000 34.00000 15.00000 14.000000
2020-08-16 21.50000 40.00000 22.50000 14.000000
2020-08-17 36.00000 34.00000 18.00000 7.000000
2020-08-18 37.50000 33.00000 21.50000 8.000000
2020-08-19 42.00000 31.00000 17.00000 9.000000
2020-08-21 32.00000 36.00000 20.00000 12.000000
2020-08-22 37.00000 33.00000 19.00000 8.000000
2020-08-23 25.00000 36.00000 17.00000 16.000000
2020-08-25 21.00000 37.00000 28.00000 14.000000
2020-08-28 37.00000 35.00000 17.00000 10.000000
2020-08-29 31.00000 37.00000 19.00000 10.000000
2020-08-30 19.00000 38.00000 25.50000 16.000000
2020-08-31 33.75000 34.00000 21.50000 10.500000
2020-09-01 25.31500 36.95000 24.08000 13.205000
2020-09-04 30.00000 38.00000 21.00000 11.000000
2020-09-06 27.00000 35.00000 18.00000 17.000000
2020-09-08 34.00000 32.00000 19.50000 14.000000
2020-09-13 27.00000 33.00000 20.00000 16.000000
2020-09-14 35.75000 32.00000 21.75000 9.250000
2020-09-15 27.27500 36.37000 24.10500 12.225000
2020-09-19 29.00000 43.00000 19.00000 10.000000
2020-09-20 22.00000 36.00000 20.00000 17.000000
2020-09-21 36.00000 32.50000 20.50000 11.000000
2020-09-22 24.00000 36.00000 26.00000 14.000000
2020-09-23 22.00000 42.00000 22.00000 13.000000
2020-09-24 29.00000 33.00000 18.00000 15.000000
2020-09-25 49.00000 26.00000 24.00000 1.000000
2020-09-27 26.00000 33.00000 21.00000 16.000000
2020-09-30 20.00000 41.00000 25.00000 15.000000
2020-10-02 20.00000 35.50000 27.00000 17.500000
2020-10-03 27.66667 40.33333 21.33333 10.333333
2020-10-04 24.00000 36.00000 20.00000 15.000000
2020-10-06 39.47000 32.58000 16.97000 10.690000
2020-10-09 29.00000 36.00000 15.00000 12.000000
2020-10-11 23.00000 39.66667 19.33333 15.333333
2020-10-12 38.50000 31.50000 18.00000 10.000000
2020-10-13 21.00000 42.00000 24.00000 14.000000
2020-10-18 25.00000 38.00000 21.00000 14.666667
2020-10-20 27.48333 37.77333 19.91000 14.576667
2020-10-22 49.00000 27.00000 23.00000 NA
2020-10-24 36.00000 41.00000 15.00000 7.000000
2020-10-25 28.00000 35.00000 18.00000 14.000000
2020-10-27 28.33333 36.00000 22.00000 13.000000
2020-11-01 30.00000 35.50000 19.50000 14.500000
2020-11-02 24.00000 38.00000 24.00000 14.000000
2020-11-08 25.00000 35.00000 17.00000 17.000000
2020-11-10 23.00000 36.00000 23.00000 18.000000
2020-11-15 25.00000 34.00000 16.00000 16.000000
2020-11-16 38.00000 31.00000 17.00000 12.000000
2020-11-20 47.00000 28.00000 23.00000 NA
2020-11-22 36.00000 39.00000 17.00000 8.000000
2020-11-24 25.00000 38.00000 23.00000 14.000000
2020-11-29 26.00000 31.00000 18.00000 17.000000
2020-12-07 44.50000 32.50000 14.50000 9.000000
2020-12-08 27.50000 37.50000 20.50000 13.000000
2020-12-13 25.00000 34.00000 17.00000 18.000000
2020-12-15 30.75500 36.20000 20.60500 11.770000
2020-12-17 45.00000 28.00000 26.00000 NA
2020-12-21 47.00000 33.00000 13.00000 5.000000
2020-12-22 26.00000 35.00000 25.00000 14.000000
2021-01-03 25.00000 33.00000 17.00000 15.000000
2021-01-12 22.00000 40.00000 22.00000 16.000000
2021-01-13 31.54500 33.77500 14.46000 14.390000
2021-01-15 34.00000 35.00000 19.00000 12.000000
2021-01-17 28.00000 32.00000 16.00000 15.000000
2021-01-19 22.00000 40.00000 21.00000 16.000000
2021-01-24 60.00000 19.00000 12.00000 7.000000
2021-01-25 19.00000 41.00000 23.00000 9.000000
2021-01-26 52.00000 25.00000 20.00000 NA
2021-01-31 22.00000 33.00000 18.00000 16.000000
2021-02-02 29.40500 34.42000 21.57500 12.500000
2021-02-07 32.00000 33.00000 19.00000 11.000000
2021-02-09 26.00000 35.00000 23.00000 16.000000
2021-02-14 25.00000 33.00000 16.00000 15.000000
2021-02-15 35.00000 34.00000 14.00000 11.000000
2021-02-16 24.00000 37.00000 24.00000 16.000000
2021-02-18 50.00000 29.00000 20.00000 NA
2021-02-22 23.00000 35.00000 25.00000 17.000000
2021-02-28 22.00000 32.00000 17.00000 17.000000
2021-03-01 35.50000 32.00000 19.00000 12.500000
2021-03-02 27.08500 34.20000 23.23500 14.280000
2021-03-06 27.00000 45.00000 20.00000 8.000000
2021-03-09 22.00000 35.00000 25.00000 18.000000
2021-03-13 29.00000 39.00000 21.00000 11.000000
2021-03-14 41.00000 NA 28.50000 NA
2021-03-18 22.00000 38.00000 28.00000 12.000000
2021-03-21 35.00000 28.00000 32.00000 NA
2021-03-23 18.00000 34.00000 30.00000 17.000000
2021-03-28 20.00000 32.00000 21.00000 19.000000
2021-03-29 29.00000 33.33333 24.66667 13.333333
2021-03-30 21.33333 34.66667 27.00000 17.333333
2021-04-06 20.17000 33.08000 26.82500 18.125000
2021-04-11 20.00000 30.00000 22.00000 20.000000
2021-04-13 16.00000 34.00000 29.00000 20.000000
2021-04-20 20.00000 31.00000 31.00000 17.000000

The first table looks decent, but I want to clean up a few things stylistically. I added a more descriptive title and renamed the columns, so the table would not contain variable names. Following my classmate Erick’s suggestion, I created a new variable, month to summarize the data in the original table to make it more useful. I felt like it was unnecessary to have 5 decimal points for each percentage, so I limited the decimals to 1 place when I created the new average percentages by month. The result is a clean and publication-quality table that is much easier to digest than the original.

# load packages required to manipulate dates
library(lubridate)

# create new month variable to summarize the table and rename columns
covid_summary <- concern_data2 %>%
  mutate(month = floor_date(as.Date(end_date, format = "%Y-%m-%d"), "month")) %>% 
  group_by(month) %>% #Group by month of the year
  summarise(avg_very_percent = round(mean(very_percent, na.rm = TRUE), 1),
            avg_somewhat_percent = round(mean(somewhat_percent, na.rm = TRUE), 1),
            avg_not_very_percent = round(mean(not_very_percent, na.rm = TRUE), 1),
            avg_not_at_all_percent = round(mean(not_at_all_percent, na.rm = TRUE), 1)) %>%
  mutate(across(starts_with("avg_"), ~ as.numeric(format(., nsmall = 2)))) %>% 
  rename(Month = "month", 
         Very = "avg_very_percent",
         Somewhat = "avg_somewhat_percent",
         `Not very` = "avg_not_very_percent",
         `Not at all` = "avg_not_at_all_percent")

## generate table with proper labels
table2 <- covid_summary %>% 
  gt() %>%
  tab_spanner(label = "Percentage",
              columns = c("Very", "Somewhat", "Not very", "Not at all")) %>%
  tab_header(title = "Levels of concern about COVID-19 infections among Americans") %>% 
  fmt_number(columns = c("Very", "Somewhat", "Not very", "Not at all"),
             decimals = 1)
table2
Levels of concern about COVID-19 infections among Americans
Month Percentage
Very Somewhat Not very Not at all
2020-02-01 14.1 26.6 36.6 19.4
2020-03-01 28.7 35.2 24.3 10.9
2020-04-01 35.8 37.0 19.2 7.4
2020-05-01 31.7 36.3 21.2 10.1
2020-06-01 26.4 37.0 22.5 13.0
2020-07-01 33.0 35.7 19.4 10.4
2020-08-01 30.3 36.2 20.4 11.5
2020-09-01 29.0 35.4 21.5 12.8
2020-10-01 29.7 36.3 20.0 12.9
2020-11-01 29.9 34.5 19.8 14.5
2020-12-01 35.1 33.7 19.5 11.8
2021-01-01 31.6 33.2 18.2 13.4
2021-02-01 29.6 33.6 20.0 14.4
2021-03-01 27.2 35.0 25.0 14.2
2021-04-01 19.0 32.0 27.2 18.8