Presentation Exercise

Get the original data

I found this data set from the FiveThirtyEight website about American opinions on the Coronavirus in relation to political events from March 2020 to April 2021: https://data.fivethirtyeight.com/ I chose to use the covid_concern_polls.csv file to recreate the graph about how worried Americans are about Covid infection, which can be found here. I added the original graph in this section for convenience. Image Alt Text I examined the data to discover there are start and end dates for each response and four levels of worry for each topic: very, somewhat, not very, and not at all, which are all represented on the graph I want to replicate. There are 678 observations of 15 variables stored in this dataset.

library(readr)
library(here)
pollsdata <- read_csv(here("presentation-exercise", "covid_concern_polls.csv"))
str(pollsdata)

spc_tbl_ [678 × 15] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ start_date : Date[1:678], format: "2020-01-27" "2020-01-31" ...
 $ end_date   : Date[1:678], format: "2020-01-29" "2020-02-02" ...
 $ pollster   : chr [1:678] "Morning Consult" "Morning Consult" "YouGov" "Morning Consult" ...
 $ sponsor    : chr [1:678] NA NA "Economist" NA ...
 $ sample_size: num [1:678] 2202 2202 1500 2200 1000 ...
 $ population : chr [1:678] "a" "a" "a" "a" ...
 $ party      : chr [1:678] "all" "all" "all" "all" ...
 $ subject    : chr [1:678] "concern-economy" "concern-economy" "concern-infected" "concern-economy" ...
 $ tracking   : logi [1:678] FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ text       : chr [1:678] "How concerned are you that the coronavirus will impact the following? U.S. economy" "How concerned are you that the coronavirus will impact the following? U.S. economy" "Taking into consideration both your risk of contracting it and the seriousness of the illness, how worried are "| __truncated__ "How concerned are you that the coronavirus will impact the following? U.S. economy" ...
 $ very       : num [1:678] 19 26 13 23 11 11 22 22 22 10 ...
 $ somewhat   : num [1:678] 33 32 26 32 24 28 23 35 21 28 ...
 $ not_very   : num [1:678] 23 25 43 24 33 39 37 28 33 42 ...
 $ not_at_all : num [1:678] 11 7 18 9 20 22 19 15 23 19 ...
 $ url        : chr [1:678] "https://morningconsult.com/wp-content/uploads/2020/02/200167_crosstabs_CORONAVIRUS_Adults_v2_JB-1.pdf" "https://morningconsult.com/wp-content/uploads/2020/02/200191_crosstabs_CORONAVIRUS_Adults_v2_JB-1.pdf" "https://d25d2506sfb94s.cloudfront.net/cumulus_uploads/document/73jqd6u5mv/econTabReport.pdf" "https://morningconsult.com/wp-content/uploads/2020/02/200214_crosstabs_CORONAVIRUS_Adults_v4_JB.pdf" ...
 - attr(*, "spec")=
  .. cols(
  ..   start_date = col_date(format = ""),
  ..   end_date = col_date(format = ""),
  ..   pollster = col_character(),
  ..   sponsor = col_character(),
  ..   sample_size = col_double(),
  ..   population = col_character(),
  ..   party = col_character(),
  ..   subject = col_character(),
  ..   tracking = col_logical(),
  ..   text = col_character(),
  ..   very = col_double(),
  ..   somewhat = col_double(),
  ..   not_very = col_double(),
  ..   not_at_all = col_double(),
  ..   url = col_character()
  .. )
 - attr(*, "problems")=<externalptr>

dim(pollsdata)

[1] 678  15

Ask AI to recreate the original graph

This is the original prompt that I asked ChatGPT 3.5: R code to generate the graph titled “How worried are Americans about infection?” on this website https://projects.fivethirtyeight.com/coronavirus-polls/ using the covid_concern_polls.csv file from this github repository https://github.com/fivethirtyeight/covid-19-polls

I used the links to the website and GitHub repository to ensure that it knew which graph I am trying to recreate, which has been saved as Inspiration_graph.png in the presentation-exercise folder. The first step it gave me was to load three packages: readr, dplyr, and ggplot2. The second step was to save the data as an object, which I have already done. Next, it suggested filtering the data for the question regarding infection: “How worried are you about personally being infected with coronavirus?”. I made several modifications to the code because it did not know the object name that I stored my data in, and it did not know the exact verbiage of the question that was asked. This was not successful because I do not have one “estimate” for the percentage of people in each of the worry categories ranging from not_at_all to very. I should have known that this code would not be accurate because it said these steps will produce a polar bar graph when I am trying to produce a line graph. I commented out this code for the sake of rendering my website with this exercise.

# load required packages
library(dplyr)
library(ggplot2)

# modified filtering code
# concern_data <- pollsdata %>%
 # filter(text == "How worried are you about you or someone in your family being infected with the Coronavirus??") %>%
 # select(starts_with("estimate"))

This is the second prompt I asked ChatGPT: R code to produce a line graph with percentages of four different categories “very”, “somewhat”, “not very”, and “not at all” on the y-axis and time on the x-axis using the covid_concern_polls.csv file from this github repository https://github.com/fivethirtyeight/covid-19-polls

I asked for a line graph specifically, and I know that I need to create variables with the percentages of each response category. I filtered for the subject of interest: “concern-infected” to limit the data to information about concern over Coronavrius infection. The graph models how concerned Americans are that they, someone in their family or someone else they know will be infected with Coronavirus, so I filtered for responses to all of the questions asking about concern over Coronavirus infections, which yielded 77 observations.

# filter for all responses about Coronavirus concern
concern_data <- pollsdata %>% 
    filter(subject == "concern-infected")

The next suggestion from ChatGPT was to create percentages for each of the response variables. The first variable that had to be changed here is the grouping factor because modeldate was not included in the original dataset. Additionally, there were no “estimate” variables, so I just used the column names instead because each column had count data for that observation.

# group data by date and calculate percentages for each category
concern_data2 <- concern_data %>%
  group_by(end_date) %>%
  summarise(
    very_percent = sum(very) / n(),
    somewhat_percent = sum(somewhat) / n(),
    not_very_percent = sum(not_very) / n(),
    not_at_all_percent = sum(not_at_all) / n()
  )

# check that it worked 
str(concern_data2)

tibble [226 × 5] (S3: tbl_df/tbl/data.frame)
 $ end_date          : Date[1:226], format: "2020-02-04" "2020-02-09" ...
 $ very_percent      : num [1:226] 13 11 11 22 16 10 16 19 23 30 ...
 $ somewhat_percent  : num [1:226] 26 24 28 23 24.5 30 31 33 32.5 32 ...
 $ not_very_percent  : num [1:226] 43 33 39 37 37.5 40 27 32.5 27 24 ...
 $ not_at_all_percent: num [1:226] 18 20 22 19 21 20 16 15 17 15 ...

ChatGPT suggested changing the shape of the data before attempting to visualize it. It did not account for the fact that I need the tidyr package to use the pivot_longer() function, so I loaded that package here. This transformation left me with a data set with 276 observations of 3 variables.

# load packages
library(tidyr)

# convert shape of the dataset
concern_data_long <- concern_data2 %>%
  pivot_longer(cols = c(very_percent, somewhat_percent, not_very_percent, not_at_all_percent),
               names_to = "Concern Level",
               values_to = "Percentage")

Lastly, I attempted to recreate the plot from the original webpage. The first attempt is not bad, but it looks like I have missing data in the “not_at_all_percent” variable which was not the case in the graph on the website.

# attempt to recreate the original plot
graph1 <- ggplot(concern_data_long, aes(x = end_date, y = Percentage, color = `Concern Level`)) +
  geom_line() +
  labs(title = "COVID-19 Concern Levels Over Time",
       x = "Date",
       y = "Percentage",
       color = "Concern Level") +
  theme_minimal()
graph1

After finding there are 6 missing observations of the Percentage variable, I decided to omit those values because they account for such a minimal percentage of total observations, which fixed the strange gap in the line graph.

# explore for missing data and remove it
sum(is.na(concern_data_long$Percentage))

[1] 8

concern_data_long <- na.omit(concern_data_long)

# check that missing data removal fixed the issue
graph2 <- ggplot(concern_data_long, aes(x = end_date, y = Percentage, color = `Concern Level`)) +
  geom_line() +
  labs(title = "COVID-19 Concern Levels Over Time",
       x = "Date",
       y = "Percentage",
       color = "Concern Level") +
  theme_minimal()
graph2

I need to make aesthetic changes, so the replicated graph matches the original graph more accurately. The title needs to be changed and centered. I used ChatGPT to create code to separate the title into three lines and change the size of the second title line, so it will fit better. I got a warning message that vectorized input to element_text is not supported in ggplot2, so all three lines of the title are the same size. The original x-axis labels each month, so I created custom labels and dropped the “date” label for the entire axis. While the original graph is interactive, that is slightly out of reach with my coding knowledge right now, so I decided to keep the stagnant legend. I noticed the legend was inverted, so I corrected that. I manually changed the colors of the lines using hex codes.

# create 14 breaks for the x-axis and custom labels for each break
breaks <- seq(as.Date("2020-02-28"), by = "month", length.out = 14)
custom_labels <- c("3/1/20", "4/1", "5/1", "6/1", "7/1",
                   "8/1", "9/1", "10/1", "11/1", "12/1", "1/1/21", "2/1", "3/1", "4/1")

# create updated version of the graph with modifications
graph3 <- ggplot(concern_data_long, aes(x = end_date, y = Percentage, color = `Concern Level`)) +
    geom_line() +
    scale_color_manual(values = c("#800080","#BA55D3","#FFDAB9","#FF0000")) +
    labs(title = "How worried are Americans about infection?",
         subtitle = paste("How concerned Americans say they are that they, someone in their family or someone else they know will", "\n", "become infected with the coronavirus")) +
    theme(plot.title = element_text(hjust = 0.5, size = 10), 
          plot.subtitle = element_text(hjust = 0.5, size = 8)) +
    scale_x_date(breaks = breaks, labels = custom_labels) +
    xlab(NULL) + 
    guides(color = guide_legend(reverse = TRUE))
graph3

Lastly, I need to add labels on specific dates. I used this simple prompt in ChatGPT: how to add labels to specific dates on a line graph. It suggested adding a geom_text() layer. I had to go back and forth with ChatGPT a couple times to find a date format that worked. Unfortunately, I received a consistent error about the geom_text() layer not being able to find the Percentage variable which prevented it from adding the labels onto my existing graph. I attempted the same format with geom_label() and got the same error. I added an arbitrary Percentage variable to the labels_data just to see if that would fix the error, but it was also unsuccessful. I would appreciate any input on how to solve this error. I commented out the last piece of code for the sake of rendering my website.

# create custom labels to be added as another layer on the original graph 
labels_data <- data.frame(
  modeldate = as.Date(c("2020-02-29", "2020-05-28", "2020-10-02", "2020-11-07", "2021-01-02")),  
  label_text = c("First US death reported", "US deaths surpass 100,000", "Trump diagnosed with Covid-19", "Biden declared election winner", "Biden sworn into office",
  Percentage = c(60, 60, 60, 60, 60))
)

 ## graph4 <- graph3 + 
 ##  geom_label(data = labels_data, aes(x = modeldate, y=Percentage, label = label_text))

After receiving input from a classmate, Erick Mollinedo, I added several geom_vline() and geom_text() functions to add the missing labels. The geom_vline() functions created dashed lines on each of the important dates, which I think solved the previous issue because it provided a location for the new layer to be added to in ggplot(). The geom_text() functions allow the actual descriptions to be added. The labels were a little too large for my graph originally, so I changed the size of each one to make them fit better. This addition brought me much closer to replicating the original graph although my lines appear to be significantly more jagged than the original. I also received a suggestion to use the ggthemes package to make the final replication appear closer to the original by adding the theme_538() function. I received multiple errors about not being able to find this function, so ChatGPT suggested creating my own function to mimic the 538 theme. This change removed the gridlines in the background and adjusted the sizing of the x-axis labels to match the original graph more closely.

# create a function to mimic the 538 style
theme_538 <- function() {
  theme_minimal() +
    theme(
      axis.title = element_text(size = 12),
      axis.text = element_text(size = 10),
      panel.grid.major = element_blank(),
      panel.grid.minor = element_blank(),
      panel.border = element_blank(),
      panel.background = element_blank(),
      axis.line = element_line(color = "black"),
      axis.ticks = element_line(color = "black"),
      plot.title = element_text(size = 14, hjust = 0.5)
    )
}

# apply the 538 style function and add labels to dates of interest
graph4 <- ggplot(concern_data_long, aes(x = end_date, y = Percentage, color = `Concern Level`)) +
    geom_line() +
    theme_538() +
    scale_color_manual(values = c("#800080","#BA55D3","#FFDAB9","#FF0000")) +
    labs(title = "How worried are Americans about infection?",
         subtitle = paste("How concerned Americans say they are that they, someone in their family or someone else they know will", "\n", "become infected with the coronavirus")) +
    theme(plot.title = element_text(hjust = 0.5, size = 10), 
          plot.subtitle = element_text(hjust = 0.5, size = 8)) +
    scale_x_date(breaks = breaks, labels = custom_labels) +
    xlab(NULL) + 
    guides(color = guide_legend(reverse = TRUE)) +
    geom_vline(xintercept = as.Date("2020-02-29"), linetype = "dashed") + 
    geom_vline(xintercept = as.Date("2020-05-28"), linetype = "dashed") +
    geom_vline(xintercept = as.Date("2020-10-02"), linetype = "dashed") +
    geom_vline(xintercept = as.Date("2020-11-07"), linetype = "dashed") +
    geom_vline(xintercept = as.Date("2021-01-20"), linetype = "dashed") +
    geom_text(aes(x = as.Date("2020-02-29"), y = 55, label = paste("First U.S.", "\n", "death reported")), size = 3, angle = 0, vjust = 0, fontface = "italic", color = "black") + 
  geom_text(aes(x = as.Date("2020-05-28"), y = 55, label = paste("U.S. deaths", "\n", "surpass 100,000")), size = 3, angle = 0, vjust = 0, fontface = "italic", color = "black") +
  geom_text(aes(x = as.Date("2020-10-02"), y = 48, label = paste("Trump diagnosed", "\n", "with COVID-19")), size = 3, angle = 0, vjust = 0, fontface = "italic", color = "black") +
  geom_text(aes(x = as.Date("2020-11-07"), y = 57, label = paste("Biden declared", "\n", "election winner")), size = 3, angle = 0, vjust = 0, fontface = "italic", color = "black") +
  geom_text(aes(x = as.Date("2021-01-20"), y = 55, label = paste("Biden sworn", "\n", "into office")), size = 3, angle = 0, vjust = 0, fontface = "italic", color = "black")
graph4

Original graph included again for quick comparison. Image Alt Text

Since Dr. Handel recommended that new users begin with the gt package when creating tables, I specifically asked ChatGPT to use that package with this prompt: R code using the gt package to create a table displaying the percentages for four categories (very, somewhat, not very, not at all) for each daily observation. I am using the same data from the exercise above, which is stored as concern_data2.

The original code provided by ChatGPT had outdated syntax in the column argument of the tab_spanner() function, so I updated that using the simple c() function. I also had to add the real variable names found in concern_data2.

# load packages
library(gt)

# generate first attempt at publication quality table
table1 <- concern_data2 %>%
  gt() %>%
  tab_spanner(
    label = "Percentage",
    columns = c(very_percent, somewhat_percent, not_very_percent, not_at_all_percent)
  ) %>%
  tab_header(
    title = "Observations with Percentage Breakdown"
  )
table1

Observations with Percentage Breakdown
end_date	Percentage
end_date	very_percent	somewhat_percent	not_very_percent	not_at_all_percent
2020-02-04	13.00000	26.00000	43.00000	18.000000
2020-02-09	11.00000	24.00000	33.00000	20.000000
2020-02-11	11.00000	28.00000	39.00000	22.000000
2020-02-16	22.00000	23.00000	37.00000	19.000000
2020-02-18	16.00000	24.50000	37.50000	21.000000
2020-02-25	10.00000	30.00000	40.00000	20.000000
2020-02-28	16.00000	31.00000	27.00000	16.000000
2020-03-03	19.00000	33.00000	32.50000	15.000000
2020-03-08	23.00000	32.50000	27.00000	17.000000
2020-03-09	30.00000	32.00000	24.00000	15.000000
2020-03-10	14.00000	35.00000	35.00000	16.000000
2020-03-11	24.00000	26.00000	39.00000	11.000000
2020-03-12	22.33333	36.33333	26.33333	11.333333
2020-03-13	25.00000	33.66667	23.33333	14.000000
2020-03-15	33.66667	31.33333	22.33333	10.333333
2020-03-16	23.00000	34.66667	28.66667	13.333333
2020-03-17	15.00000	41.00000	33.00000	11.000000
2020-03-19	26.93333	41.83333	22.13333	8.100000
2020-03-22	39.00000	32.00000	18.00000	9.000000
2020-03-23	33.33333	37.33333	21.00000	8.666667
2020-03-24	35.40000	35.00000	21.40000	8.400000
2020-03-25	33.20000	38.20000	19.60000	8.400000
2020-03-26	36.50000	34.50000	20.50000	8.500000
2020-03-27	37.50000	35.50000	18.50000	8.500000
2020-03-28	31.70000	41.10000	20.00000	7.050000
2020-03-29	35.00000	36.66667	17.66667	8.666667
2020-03-30	30.60000	34.60000	20.80000	12.400000
2020-03-31	34.00000	36.50000	20.50000	8.250000
2020-04-01	37.00000	35.66667	19.00000	8.333333
2020-04-02	44.00000	35.33333	15.00000	5.333333
2020-04-03	40.00000	37.00000	16.50000	6.500000
2020-04-04	39.10000	38.50000	14.90000	5.550000
2020-04-05	35.00000	37.00000	16.00000	10.000000
2020-04-06	43.33333	35.33333	15.33333	5.666667
2020-04-07	40.16667	36.00000	16.16667	7.000000
2020-04-08	41.50000	34.00000	18.00000	6.500000
2020-04-09	40.25000	39.00000	15.00000	6.000000
2020-04-10	34.92500	36.35000	17.67500	8.225000
2020-04-11	34.00000	40.00000	19.00000	6.000000
2020-04-12	23.80000	41.60000	23.20000	10.000000
2020-04-13	35.33333	38.00000	18.66667	8.000000
2020-04-14	31.59286	37.78143	21.59286	8.831429
2020-04-15	37.00000	35.00000	19.66667	7.666667
2020-04-16	37.33333	35.66667	20.33333	6.666667
2020-04-17	34.50000	35.50000	22.50000	7.500000
2020-04-18	32.40000	37.80000	21.00000	8.200000
2020-04-19	25.40000	37.80000	22.60000	12.000000
2020-04-20	37.33333	34.33333	21.00000	7.333333
2020-04-21	36.78667	36.70833	19.27500	7.236667
2020-04-22	36.00000	37.00000	19.33333	7.666667
2020-04-23	38.33333	38.33333	18.00000	5.666667
2020-04-24	36.50000	36.50000	22.50000	5.500000
2020-04-25	27.80000	43.10000	19.20000	9.200000
2020-04-26	30.28571	35.85714	20.42857	11.000000
2020-04-27	36.50000	34.50000	23.00000	5.500000
2020-04-28	32.62200	38.05200	21.70200	7.692000
2020-04-29	38.50000	35.50000	20.00000	6.500000
2020-04-30	37.66667	37.33333	18.33333	5.666667
2020-05-01	33.66667	37.33333	19.33333	9.333333
2020-05-02	29.70000	38.20000	22.00000	9.000000
2020-05-03	28.33333	35.33333	22.33333	12.666667
2020-05-04	39.00000	32.20000	18.60000	9.800000
2020-05-05	30.32833	36.97500	22.19167	10.105000
2020-05-06	38.50000	32.50000	21.00000	8.500000
2020-05-07	36.33333	37.00000	19.33333	6.666667
2020-05-08	37.00000	35.00000	20.50000	7.500000
2020-05-09	31.60000	39.90000	18.80000	8.700000
2020-05-10	24.20000	37.80000	22.80000	13.200000
2020-05-11	36.50000	34.00000	21.50000	8.500000
2020-05-12	31.73400	37.18200	21.12800	9.850000
2020-05-13	35.66667	36.33333	19.00000	9.000000
2020-05-14	33.50000	37.25000	19.75000	10.000000
2020-05-15	40.50000	32.00000	18.00000	9.500000
2020-05-16	28.60000	37.60000	20.90000	12.400000
2020-05-17	29.60000	34.60000	21.60000	11.800000
2020-05-18	39.00000	33.25000	19.00000	8.000000
2020-05-19	26.49333	39.33000	22.45667	11.320000
2020-05-20	40.00000	34.00000	19.50000	6.000000
2020-05-21	36.00000	42.00000	15.00000	7.000000
2020-05-23	27.60000	37.40000	24.80000	9.700000
2020-05-24	20.50000	38.50000	27.50000	11.500000
2020-05-25	32.75000	36.25000	18.50000	11.500000
2020-05-26	28.44000	37.74667	23.22333	10.226667
2020-05-27	26.00000	38.00000	22.00000	13.000000
2020-05-28	30.00000	33.00000	22.00000	13.000000
2020-05-30	24.60000	36.30000	27.40000	10.700000
2020-05-31	22.25000	35.75000	24.25000	15.000000
2020-06-01	33.25000	34.00000	20.50000	11.500000
2020-06-02	22.60000	38.40000	25.63333	12.066667
2020-06-03	14.00000	37.00000	20.00000	29.000000
2020-06-05	22.00000	38.00000	28.00000	12.000000
2020-06-06	23.70000	37.60000	23.90000	14.300000
2020-06-07	20.25000	36.00000	24.75000	16.000000
2020-06-08	31.50000	33.50000	22.50000	13.000000
2020-06-09	23.77667	37.13667	24.82333	13.963333
2020-06-11	28.00000	41.00000	23.00000	8.000000
2020-06-13	26.00000	37.80000	22.40000	12.300000
2020-06-14	21.33333	37.00000	24.00000	15.333333
2020-06-15	34.25000	33.00000	20.25000	12.250000
2020-06-16	30.30000	33.60000	26.20000	8.833333
2020-06-21	16.00000	37.00000	28.00000	17.500000
2020-06-22	32.00000	33.33333	22.00000	11.333333
2020-06-23	25.37500	38.59000	22.97000	12.875000
2020-06-25	34.00000	43.00000	19.00000	5.000000
2020-06-26	34.00000	37.00000	17.00000	11.000000
2020-06-27	23.10000	39.30000	21.80000	13.800000
2020-06-28	19.00000	39.50000	25.50000	15.500000
2020-06-29	33.00000	38.00000	18.00000	9.000000
2020-06-30	34.25000	33.25000	15.25000	12.000000
2020-07-05	18.00000	39.00000	25.50000	15.000000
2020-07-06	35.00000	36.00000	15.00000	10.000000
2020-07-07	28.83500	37.58000	20.81000	12.160000
2020-07-12	26.00000	36.00000	21.66667	14.000000
2020-07-13	47.00000	30.50000	13.50000	9.000000
2020-07-14	24.00000	40.00000	26.00000	11.000000
2020-07-15	33.00000	33.00000	17.00000	12.000000
2020-07-19	26.66667	37.00000	20.66667	13.000000
2020-07-20	48.00000	31.50000	13.00000	7.000000
2020-07-21	29.35500	36.37000	22.68000	10.575000
2020-07-22	51.00000	28.00000	20.00000	1.000000
2020-07-23	37.00000	40.00000	17.00000	6.000000
2020-07-24	35.00000	37.00000	17.00000	11.000000
2020-07-26	22.50000	39.00000	21.50000	14.500000
2020-07-27	38.00000	33.00000	17.00000	9.000000
2020-07-28	29.07333	36.47000	22.26333	10.536667
2020-08-02	24.50000	44.50000	20.00000	10.500000
2020-08-03	40.00000	32.50000	17.00000	8.500000
2020-08-04	25.00000	40.00000	23.00000	12.000000
2020-08-09	11.00000	44.00000	30.00000	15.000000
2020-08-10	39.50000	31.50000	17.00000	11.000000
2020-08-11	31.43000	36.58500	19.06000	12.510000
2020-08-15	31.00000	34.00000	15.00000	14.000000
2020-08-16	21.50000	40.00000	22.50000	14.000000
2020-08-17	36.00000	34.00000	18.00000	7.000000
2020-08-18	37.50000	33.00000	21.50000	8.000000
2020-08-19	42.00000	31.00000	17.00000	9.000000
2020-08-21	32.00000	36.00000	20.00000	12.000000
2020-08-22	37.00000	33.00000	19.00000	8.000000
2020-08-23	25.00000	36.00000	17.00000	16.000000
2020-08-25	21.00000	37.00000	28.00000	14.000000
2020-08-28	37.00000	35.00000	17.00000	10.000000
2020-08-29	31.00000	37.00000	19.00000	10.000000
2020-08-30	19.00000	38.00000	25.50000	16.000000
2020-08-31	33.75000	34.00000	21.50000	10.500000
2020-09-01	25.31500	36.95000	24.08000	13.205000
2020-09-04	30.00000	38.00000	21.00000	11.000000
2020-09-06	27.00000	35.00000	18.00000	17.000000
2020-09-08	34.00000	32.00000	19.50000	14.000000
2020-09-13	27.00000	33.00000	20.00000	16.000000
2020-09-14	35.75000	32.00000	21.75000	9.250000
2020-09-15	27.27500	36.37000	24.10500	12.225000
2020-09-19	29.00000	43.00000	19.00000	10.000000
2020-09-20	22.00000	36.00000	20.00000	17.000000
2020-09-21	36.00000	32.50000	20.50000	11.000000
2020-09-22	24.00000	36.00000	26.00000	14.000000
2020-09-23	22.00000	42.00000	22.00000	13.000000
2020-09-24	29.00000	33.00000	18.00000	15.000000
2020-09-25	49.00000	26.00000	24.00000	1.000000
2020-09-27	26.00000	33.00000	21.00000	16.000000
2020-09-30	20.00000	41.00000	25.00000	15.000000
2020-10-02	20.00000	35.50000	27.00000	17.500000
2020-10-03	27.66667	40.33333	21.33333	10.333333
2020-10-04	24.00000	36.00000	20.00000	15.000000
2020-10-06	39.47000	32.58000	16.97000	10.690000
2020-10-09	29.00000	36.00000	15.00000	12.000000
2020-10-11	23.00000	39.66667	19.33333	15.333333
2020-10-12	38.50000	31.50000	18.00000	10.000000
2020-10-13	21.00000	42.00000	24.00000	14.000000
2020-10-18	25.00000	38.00000	21.00000	14.666667
2020-10-20	27.48333	37.77333	19.91000	14.576667
2020-10-22	49.00000	27.00000	23.00000	NA
2020-10-24	36.00000	41.00000	15.00000	7.000000
2020-10-25	28.00000	35.00000	18.00000	14.000000
2020-10-27	28.33333	36.00000	22.00000	13.000000
2020-11-01	30.00000	35.50000	19.50000	14.500000
2020-11-02	24.00000	38.00000	24.00000	14.000000
2020-11-08	25.00000	35.00000	17.00000	17.000000
2020-11-10	23.00000	36.00000	23.00000	18.000000
2020-11-15	25.00000	34.00000	16.00000	16.000000
2020-11-16	38.00000	31.00000	17.00000	12.000000
2020-11-20	47.00000	28.00000	23.00000	NA
2020-11-22	36.00000	39.00000	17.00000	8.000000
2020-11-24	25.00000	38.00000	23.00000	14.000000
2020-11-29	26.00000	31.00000	18.00000	17.000000
2020-12-07	44.50000	32.50000	14.50000	9.000000
2020-12-08	27.50000	37.50000	20.50000	13.000000
2020-12-13	25.00000	34.00000	17.00000	18.000000
2020-12-15	30.75500	36.20000	20.60500	11.770000
2020-12-17	45.00000	28.00000	26.00000	NA
2020-12-21	47.00000	33.00000	13.00000	5.000000
2020-12-22	26.00000	35.00000	25.00000	14.000000
2021-01-03	25.00000	33.00000	17.00000	15.000000
2021-01-12	22.00000	40.00000	22.00000	16.000000
2021-01-13	31.54500	33.77500	14.46000	14.390000
2021-01-15	34.00000	35.00000	19.00000	12.000000
2021-01-17	28.00000	32.00000	16.00000	15.000000
2021-01-19	22.00000	40.00000	21.00000	16.000000
2021-01-24	60.00000	19.00000	12.00000	7.000000
2021-01-25	19.00000	41.00000	23.00000	9.000000
2021-01-26	52.00000	25.00000	20.00000	NA
2021-01-31	22.00000	33.00000	18.00000	16.000000
2021-02-02	29.40500	34.42000	21.57500	12.500000
2021-02-07	32.00000	33.00000	19.00000	11.000000
2021-02-09	26.00000	35.00000	23.00000	16.000000
2021-02-14	25.00000	33.00000	16.00000	15.000000
2021-02-15	35.00000	34.00000	14.00000	11.000000
2021-02-16	24.00000	37.00000	24.00000	16.000000
2021-02-18	50.00000	29.00000	20.00000	NA
2021-02-22	23.00000	35.00000	25.00000	17.000000
2021-02-28	22.00000	32.00000	17.00000	17.000000
2021-03-01	35.50000	32.00000	19.00000	12.500000
2021-03-02	27.08500	34.20000	23.23500	14.280000
2021-03-06	27.00000	45.00000	20.00000	8.000000
2021-03-09	22.00000	35.00000	25.00000	18.000000
2021-03-13	29.00000	39.00000	21.00000	11.000000
2021-03-14	41.00000	NA	28.50000	NA
2021-03-18	22.00000	38.00000	28.00000	12.000000
2021-03-21	35.00000	28.00000	32.00000	NA
2021-03-23	18.00000	34.00000	30.00000	17.000000
2021-03-28	20.00000	32.00000	21.00000	19.000000
2021-03-29	29.00000	33.33333	24.66667	13.333333
2021-03-30	21.33333	34.66667	27.00000	17.333333
2021-04-06	20.17000	33.08000	26.82500	18.125000
2021-04-11	20.00000	30.00000	22.00000	20.000000
2021-04-13	16.00000	34.00000	29.00000	20.000000
2021-04-20	20.00000	31.00000	31.00000	17.000000

The first table looks decent, but I want to clean up a few things stylistically. I added a more descriptive title and renamed the columns, so the table would not contain variable names. Following my classmate Erick’s suggestion, I created a new variable, month to summarize the data in the original table to make it more useful. I felt like it was unnecessary to have 5 decimal points for each percentage, so I limited the decimals to 1 place when I created the new average percentages by month. The result is a clean and publication-quality table that is much easier to digest than the original.

# load packages required to manipulate dates
library(lubridate)

# create new month variable to summarize the table and rename columns
covid_summary <- concern_data2 %>%
  mutate(month = floor_date(as.Date(end_date, format = "%Y-%m-%d"), "month")) %>% 
  group_by(month) %>% #Group by month of the year
  summarise(avg_very_percent = round(mean(very_percent, na.rm = TRUE), 1),
            avg_somewhat_percent = round(mean(somewhat_percent, na.rm = TRUE), 1),
            avg_not_very_percent = round(mean(not_very_percent, na.rm = TRUE), 1),
            avg_not_at_all_percent = round(mean(not_at_all_percent, na.rm = TRUE), 1)) %>%
  mutate(across(starts_with("avg_"), ~ as.numeric(format(., nsmall = 2)))) %>% 
  rename(Month = "month", 
         Very = "avg_very_percent",
         Somewhat = "avg_somewhat_percent",
         `Not very` = "avg_not_very_percent",
         `Not at all` = "avg_not_at_all_percent")

## generate table with proper labels
table2 <- covid_summary %>% 
  gt() %>%
  tab_spanner(label = "Percentage",
              columns = c("Very", "Somewhat", "Not very", "Not at all")) %>%
  tab_header(title = "Levels of concern about COVID-19 infections among Americans") %>% 
  fmt_number(columns = c("Very", "Somewhat", "Not very", "Not at all"),
             decimals = 1)
table2

Levels of concern about COVID-19 infections among Americans
Month	Percentage
Month	Very	Somewhat	Not very	Not at all
2020-02-01	14.1	26.6	36.6	19.4
2020-03-01	28.7	35.2	24.3	10.9
2020-04-01	35.8	37.0	19.2	7.4
2020-05-01	31.7	36.3	21.2	10.1
2020-06-01	26.4	37.0	22.5	13.0
2020-07-01	33.0	35.7	19.4	10.4
2020-08-01	30.3	36.2	20.4	11.5
2020-09-01	29.0	35.4	21.5	12.8
2020-10-01	29.7	36.3	20.0	12.9
2020-11-01	29.9	34.5	19.8	14.5
2020-12-01	35.1	33.7	19.5	11.8
2021-01-01	31.6	33.2	18.2	13.4
2021-02-01	29.6	33.6	20.0	14.4
2021-03-01	27.2	35.0	25.0	14.2
2021-04-01	19.0	32.0	27.2	18.8