Five Years of COVID-19 Data: Variants, Hospitalizations, Vaccines, and What Wastewater Tells Us Now

1. About   publicHealth dataviz covid

covid-tracker-banner.jpeg

Figure 1: JPEG produced with DALL-E 4o

Few events in modern history have generated as much real-time public data as COVID-19. This post draws on four federal datasets – variant proportions, hospital admissions, vaccination progress, and wastewater surveillance – to tell the story of five years of a virus reshaping American life, from emergence through endemicity.

2. TLDR   tldr

The COVID-19 pandemic has been one of the most intensively documented public health events in history. CDC's public data APIs capture the full arc: five waves of distinct variant lineages, a hospitalization record that dwarfed any recent respiratory disease season, the fastest vaccine rollout in American history (with significant state-level variation), and a new permanent surveillance infrastructure built on wastewater. As of early 2026, wastewater monitoring shows SARS-CoV-2 activity is present but well below pandemic peaks. The virus didn't disappear — it became endemic, and public health built tools to watch it.

3. Introduction: COVID-19 as a Data Story   analysis

Few events in modern history have generated as much real-time public data as COVID-19. At the pandemic's height, public health agencies at every level — federal, state, and local — were publishing case counts, test positivity rates, hospitalization figures, vaccine uptake, variant sequences, and wastewater viral loads, often daily. Some of that infrastructure has since been wound down. But what remains in the CDC's public APIs tells a coherent story about five years of a virus reshaping American life.

This post draws on four distinct federal datasets that remain active and publicly accessible:

  • Variant Proportions (CDC): Genomic surveillance tracking which SARS-CoV-2 lineage dominates circulating sequences each week, starting in January 2021 and updated continuously through today.
  • Hospital Admissions (CDC): Weekly adult COVID-19 admissions reported by hospitals to HHS/CDC, from August 2020 through October 2024 when mandatory reporting ended.
  • Vaccination Progress (CDC): Dose administration and completion rates by state and nationally, from the first shots in December 2020 through the formal end of the federal vaccination data program in May 2023.
  • Wastewater Surveillance (CDC NWSS): SARS-CoV-2 viral signal detected at wastewater treatment plants nationwide, from mid-2020 to the present — the only dataset that continues tracking COVID-19 activity in real time.

Together these four data streams let us trace the pandemic from emergence through endemicity.

4. The Variant Succession   dataviz variants

One of the defining features of SARS-CoV-2 has been its capacity for rapid evolutionary change. CDC's genomic surveillance program tracks the proportion of sequenced specimens attributed to each circulating lineage each week. The result is a clear visual record of variant succession — each new dominant strain displacing the last, often within weeks.

Several structural patterns emerge from the variant timeline:

  • Alpha and Delta were sequential. Alpha (B.1.1.7, first identified in the UK) became dominant in the US through spring 2021 before Delta (B.1.617.2, first identified in India) displaced it almost entirely by July 2021. Delta was more transmissible than any prior variant; it drove the summer and fall 2021 wave and remained dominant until Omicron arrived at the end of that year.
  • Omicron was a discontinuity. The emergence of B.1.1.529 (Omicron) in late November 2021 wasn't just a variant transition — it was a reset. Omicron displaced Delta in a matter of weeks, achieving dominance faster than any prior variant. But unlike Delta, which produced severe disease in unvaccinated populations at rates comparable to the original strain, Omicron showed substantially reduced severity per infection (partly due to its different cell tropism, partly due to widespread prior immunity). The trade-off: it was dramatically more contagious, and it produced the largest single hospitalization wave of the pandemic.
  • Omicron fragmented into subvariants. After the initial BA.1/BA.1.1 wave, Omicron didn't recede — it diversified. BA.2, BA.2.12.1, BA.4, and BA.5 emerged in rapid succession through 2022, each outcompeting its predecessor. By late 2022, BQ.1 and BQ.1.1 took over; by early 2023, XBB.1.5 dominated; through 2023-2024, EG.5, HV.1, JN.1, KP.2, KP.3, and XEC followed. The post-Omicron era has been characterized by continuous churn among subvariants, none achieving quite the dramatic emergence of the original Omicron wave.
  • The current era: XFG. As of February 2026, the XFG lineage (a recombinant descendant of Omicron) accounts for the largest share of sequenced specimens, alongside XFG.2.5.1 and XFG.1.1. The pattern has become stable: a new Omicron-descended subvariant achieves dominance every few months, drives a modest uptick in activity, and gives way to the next. This is how seasonal respiratory viruses behave.

The variant chart illustrates something important about pathogen evolution under population immunity: selective pressure favors immune evasion over severity. Each successive Omicron subvariant became better at evading prior immunity but didn't revert to Delta-level severity. The virus found a stable evolutionary niche — high transmissibility, periodic immune escape, manageable (for most) disease burden.
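A variant chart like this one stays readable only if minor lineages are collapsed. The Data and Methods section notes that variants whose peak share stays below 5% are grouped as "Other". A minimal sketch of that grouping step follows; the function name, the tuple layout, and the sample shares are all hypothetical and only illustrate the idea, not the exact code behind the chart.

```python
from collections import defaultdict


def group_minor_variants(rows, threshold=0.05):
    """Merge variants whose peak weekly share stays below `threshold` into "Other".

    rows: list of (week, variant, share) tuples, with share as a fraction (0 to 1).
    Returns a dict mapping (week, label) to the summed share.
    """
    # First pass: find each variant's highest share across all weeks.
    peak = defaultdict(float)
    for _week, variant, share in rows:
        peak[variant] = max(peak[variant], share)

    # Second pass: relabel minor variants and sum shares per week.
    grouped = defaultdict(float)
    for week, variant, share in rows:
        label = variant if peak[variant] >= threshold else "Other"
        grouped[(week, label)] += share
    return dict(grouped)


# Hypothetical shares for placeholder lineages (not real CDC values).
rows = [
    ("week1", "lineage_A", 0.60), ("week1", "lineage_B", 0.33),
    ("week1", "lineage_C", 0.04), ("week1", "lineage_D", 0.03),
    ("week2", "lineage_A", 0.25), ("week2", "lineage_B", 0.70),
    ("week2", "lineage_C", 0.03), ("week2", "lineage_D", 0.02),
]
for key, share in sorted(group_minor_variants(rows).items()):
    print(key, round(share, 2))
```

Grouping on peak share rather than weekly share keeps a lineage's label stable across the whole timeline: a variant that ever mattered keeps its own color even in the weeks when it was rare.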

5. The Hospitalization Record   dataviz hospitals

Variant proportions tell us about the virus. Hospital admissions tell us about the impact on people.

The hospitalization data runs from August 2020 through October 2024, when the federal government ended mandatory hospital reporting requirements. Several features of this record are striking:

  • The Omicron BA.1 wave was the largest hospitalization event of the pandemic. In mid-January 2022, weekly adult COVID-19 admissions in the US peaked at levels roughly double the prior record (the winter 2020-21 wave). This is counterintuitive given that Omicron caused less severe disease per infection — but the sheer number of infections was so large that even a lower hospitalization rate produced record admissions. The US hospital system was significantly strained, with staff quarantines and patient surges occurring simultaneously.
  • The winter 2020-21 wave was the first severe wave. Before vaccines, before any widespread immunity, the original strain drove a sustained winter surge that overwhelmed hospitals across the Sun Belt, Midwest, and Northeast in sequence. This wave established the baseline for what COVID-19 could do to a fully susceptible population.
  • Delta drove a sharp, sustained summer surge in 2021. Unlike prior waves, which tracked seasonality (rising in winter, falling in summer), Delta spread aggressively through an unvaccinated population during the summer of 2021. Southern states with lower vaccination rates were hit first and hardest. The Delta wave killed approximately 130,000 Americans between June and November 2021 — a toll concentrated heavily among the unvaccinated.
  • Post-Omicron waves have been progressively smaller. Each subsequent wave (BA.4/5 in summer 2022, XBB.1.5 in winter 2022-23, JN.1 in winter 2023-24) has produced lower peak hospitalizations than its predecessor. This reflects accumulated immunity from prior infection and vaccination, improved treatments (antivirals, updated vaccines), and possible reduced inherent severity of circulating strains. The JN.1 wave in late 2023 / early 2024 — the last full wave captured in this dataset — produced roughly one-tenth the hospitalizations of the Omicron BA.1 peak.
  • Mandatory reporting ended October 2024. The federal requirement for hospitals to report COVID-19 admissions to HHS expired, creating a gap in national surveillance. Wastewater data (covered below) now serves as the primary ongoing signal for COVID-19 activity.
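The Omicron paradox above (milder per infection, yet record admissions) is simple multiplication: admissions are infections times the per-infection hospitalization rate. The sketch below works through it with purely illustrative numbers chosen for round arithmetic; they are not estimates drawn from the CDC data, and the function name is hypothetical.

```python
def expected_admissions(infections, admissions_per_1000_infections):
    """Admissions implied by an infection count and a per-1,000 hospitalization rate."""
    return infections * admissions_per_1000_infections / 1000


# Illustrative inputs only (NOT real estimates):
# an earlier wave with fewer infections but a higher hospitalization rate,
earlier_wave = expected_admissions(1_000_000, 20)
# and a later wave with a rate 60% lower but five times as many infections.
later_wave = expected_admissions(5_000_000, 8)

print(earlier_wave, later_wave, later_wave / earlier_wave)
```

With these made-up inputs, the later wave produces twice the admissions despite the much lower rate, because the growth in infections outweighs the drop in severity.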

6. The Vaccination Campaign   dataviz vaccines

The COVID-19 vaccine rollout was, by any historical measure, extraordinary. Within one year of a novel pathogen being identified, multiple safe and effective vaccines were authorized, manufactured at scale, and administered to hundreds of millions of Americans.

The national vaccination timeline shows three distinct phases:

  1. The initial sprint (December 2020 – June 2021). Vaccines were authorized for emergency use and administered to priority groups (healthcare workers, long-term care residents, the elderly) before broadening to all adults in April 2021. The pace was remarkable — at peak velocity in April 2021, the US was administering over 3 million doses per day. By June 2021, roughly 45% of the population had received at least one dose.
  2. The plateau (July 2021 onward). First-dose uptake flattened sharply in mid-2021. The remaining unvaccinated population proved more resistant to vaccination, owing to a combination of hesitancy, access barriers, and political polarization. Despite significant public health campaigns, employer mandates, and ongoing Delta-related mortality that disproportionately affected the unvaccinated, the national primary-series completion rate leveled off near 70%.
  3. Boosters and bivalent doses. The booster program launched in fall 2021, primarily targeting those 65+ and immunocompromised. An updated bivalent booster targeting Omicron BA.4/5 was authorized in September 2022, but uptake was substantially lower than the primary series — reflecting both waning urgency (Omicron was less severe) and vaccination fatigue. The CDC stopped tracking vaccine data after May 2023 as the federal vaccination program wound down.

6.1. State-Level Variation   dataviz maps

The national averages obscure enormous variation between states.

The spread in peak primary-series vaccination rates across states is striking — a gap of more than 30 percentage points separated the most and least vaccinated states. Several patterns are consistent with other health and political geography data:

  • New England led. Vermont, Massachusetts, Connecticut, and Maine achieved the highest vaccination rates, with more than 80% of their populations completing a primary series. These states combined high trust in public institutions, dense urban healthcare access, and early employer and institutional mandates.
  • Mountain West and Deep South lagged. Wyoming, Idaho, Mississippi, and Alabama had the lowest vaccination rates, often below 55%. These states also experienced higher per-capita COVID-19 mortality, reflecting the direct consequence of the protection gap.
  • The metro-rural divide within states was sharper than the state-level data suggests. A state like Georgia appears near the middle of the distribution, but its rural counties show vaccination rates in the 30-40% range while Atlanta metro approaches 80%. County-level data reveals geography that state aggregates obscure.
  • The gap had real mortality consequences. Studies published throughout 2021-2023 consistently showed that unvaccinated adults were 5-10x more likely to be hospitalized with COVID-19 and 10-15x more likely to die during Delta. The state variation in vaccination rates translated directly into preventable deaths.

The vaccination data program formally ended in May 2023. The federal COVID-19 vaccine tracking infrastructure — which produced daily granular data at county level throughout the pandemic — no longer exists in its original form.

7. Wastewater: The Ongoing Signal   dataviz wastewater

With case counts discontinued in 2023 and hospital reporting ended in 2024, the primary ongoing source of real-time COVID-19 tracking is wastewater surveillance.

SARS-CoV-2 is shed in human feces regardless of whether someone has symptoms or has sought testing. Wastewater treatment plants — which aggregate sewage from thousands to millions of people — serve as passive surveillance systems. The CDC's National Wastewater Surveillance System (NWSS) collects viral load measurements from hundreds of sites nationwide and converts them to percentile scores: 0 means the lowest viral signal ever recorded at that site, 100 means the highest.
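To make the percentile idea concrete, here is a minimal sketch that ranks a site's latest reading against that same site's history, so that the lowest-ever reading scores 0 and the highest-ever scores 100. This simple rank-based formula is an assumption for illustration; the NWSS pipeline's actual normalization may differ, and the function name and readings are hypothetical.

```python
def site_percentile(readings):
    """Percentile (0-100) of the latest reading within the site's own history.

    readings: chronological list of viral-load measurements for one site,
    with the most recent value last. Units are arbitrary.
    """
    if len(readings) < 2:
        raise ValueError("need at least two readings to rank against history")
    current = readings[-1]
    # Count how many readings in the site's record fall strictly below the latest one.
    below = sum(1 for value in readings if value < current)
    return 100.0 * below / (len(readings) - 1)


# Hypothetical readings for three sites (arbitrary units).
print(site_percentile([1, 2, 3, 10]))  # latest reading is the highest on record
print(site_percentile([4, 5, 6, 1]))   # latest reading is the lowest on record
print(site_percentile([5, 1, 9, 3]))   # latest reading sits in the lower part of the range
```

Because each site is ranked only against itself, a percentile of 80 at a small rural plant and 80 at a large urban plant both mean "high for this location", even if the raw viral loads differ widely.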

The wastewater trend shows several features not visible in hospitalization data (which ended in late 2024):

  • The JN.1 wave (winter 2023-24) was clearly visible. Wastewater percentiles spiked nationally in December 2023 and January 2024, consistent with the last major wave captured in the hospitalization data. The peak was substantial but well below the Omicron BA.1 winter (when some sites recorded their all-time highest signals).
  • A summer 2024 wave appeared. Consistent with the KP.3/XEC variant transitions, wastewater showed an uptick in summer 2024 — a pattern that was barely visible in the (by then curtailed) hospital data but clearly detectable in sewage.
  • Activity has remained at low but non-zero levels. As of early 2026, the national median wastewater percentile sits well below the 50th percentile — meaning most sites are detecting viral levels below their historical midpoint. COVID-19 is present but not producing the surges of the acute pandemic phase.
  • Wastewater leads clinical data by 4-7 days. One of wastewater surveillance's key advantages is its lead time over clinical testing or hospitalization data. When viral shedding rises in sewage, emergency department visits and hospitalizations typically follow within a week. This makes it a useful early-warning system for resource planning.
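One common way to estimate a lead time like this is to shift one series against the other and see which shift gives the strongest correlation. The sketch below does that on synthetic daily curves with a 5-day offset built in, so it demonstrates the technique rather than reproducing the 4-7 day finding. Real wastewater samples are noisier and usually collected less often than daily, and the function names here are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def best_lead_days(wastewater, clinical, max_lag=14):
    """Lag (in days) at which wastewater[t] correlates best with clinical[t + lag]."""
    scores = {}
    for lag in range(max_lag + 1):
        # Drop the last `lag` wastewater days and the first `lag` clinical days
        # so the two slices line up as (t, t + lag) pairs.
        w = wastewater[: len(wastewater) - lag]
        c = clinical[lag:]
        scores[lag] = pearson(w, c)
    return max(scores, key=scores.get)


# Synthetic wave: the clinical curve is the wastewater curve delayed by 5 days.
days = range(120)
wastewater = [math.exp(-(((d - 50) / 12) ** 2)) for d in days]
clinical = [math.exp(-(((d - 55) / 12) ** 2)) for d in days]

print(best_lead_days(wastewater, clinical))
```

On real data the correlation peak is usually broad rather than sharp, so the result is better read as a plausible range of lead times than as a single number.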

7.1. COVID Activity by State   dataviz maps

The state-level map shows the most recent four weeks of wastewater activity, expressed as each state's median site percentile.

Geographic variation in wastewater activity reflects both true differences in viral transmission and differences in site coverage. States with very few reporting wastewater treatment plants (often rural states like Wyoming, Montana, or North Dakota) have less reliable state-level estimates, since a single large urban WWTP can dominate the state median. Densely monitored states like California, Illinois, and New York have hundreds of reporting sites, making their state-level estimates much more stable.
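The coverage caveat can be made explicit by reporting the site count next to each state median. The sketch below is one way to do that; the `min_sites` cutoff of 5, the function name, and the sample percentiles are hypothetical choices for illustration, not values used in the map.

```python
from collections import defaultdict
from statistics import median


def state_medians(site_rows, min_sites=5):
    """Median site percentile per state, flagged when few sites report.

    site_rows: iterable of (state, percentile) pairs, one per reporting site.
    min_sites: hypothetical cutoff below which the estimate is flagged.
    """
    by_state = defaultdict(list)
    for state, percentile in site_rows:
        by_state[state].append(percentile)
    return {
        state: {
            "median": median(values),
            "n_sites": len(values),
            "low_coverage": len(values) < min_sites,
        }
        for state, values in by_state.items()
    }


# Hypothetical site percentiles for two placeholder states.
site_rows = [
    ("State X", 20), ("State X", 35), ("State X", 40),
    ("State X", 45), ("State X", 60), ("State X", 30),
    ("State Y", 80), ("State Y", 10),
]
for state, summary in state_medians(site_rows).items():
    print(state, summary)
```

In this toy example "State Y" reports a median of 45 from only two sites that disagree sharply, which is exactly the situation where a state-level number deserves less weight.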

The Sun Belt states (Florida, Texas, Arizona) have historically shown elevated summer wastewater signals, likely reflecting indoor congregating driven by heat. Northeast and Midwest states often show higher winter signals. The current map captures a snapshot; the national trend chart above shows how this changes over time.

8. What COVID Data Looks Like Now   analysis endemic

Five years after the first US COVID-19 deaths, the data tells a story of transition from pandemic to endemic:

The surveillance landscape has changed. The dense, multi-layer data infrastructure of 2021-2022 (daily case counts by county, hospital occupancy by state, seven-day test positivity, daily vaccination records) has largely been dismantled. What remains is leaner but more sustainable: continuous wastewater monitoring, periodic genomic sequencing, and influenza-season syndromic surveillance integrated with COVID-19 tracking.

Immunity has fundamentally reshaped the disease burden. The CDC estimated that by February 2022, roughly 58% of Americans had been infected at least once, and that share kept rising afterward. Combined with vaccination, population-level immunity against severe disease is high. This doesn't prevent infection, because Omicron subvariants evade neutralizing antibodies effectively, but it dramatically reduces hospitalization and death rates per infection compared to the pre-immune era.

The seasonal pattern has asserted itself. COVID-19 now behaves more like influenza or RSV than like a novel pandemic pathogen: a predictable winter surge (driven partly by seasonal behavior, partly by waning immunity), a smaller summer bump, and relatively quiet spring and fall periods. This regularity allows healthcare systems to plan rather than react.

The population most at risk has narrowed. During Delta, an unvaccinated 40-year-old with no comorbidities faced meaningful risk of severe disease. Today, for the same person with prior infection and vaccination history, COVID-19 carries risk more comparable to a bad flu year. The remaining high-risk populations — adults over 75, immunocompromised individuals, those with multiple chronic conditions — still benefit substantially from updated vaccines and antiviral treatment, but the population-wide burden has shifted.

Long COVID remains an open chapter. The datasets analyzed here do not capture long COVID prevalence. CDC surveys have estimated that anywhere from 6% to 11% of ever-infected adults experience persistent symptoms, which represents tens of millions of Americans and remains one of the least well-quantified ongoing consequences of the pandemic.

9. Data and Methods   data methodology

All data come from publicly accessible CDC APIs; no authentication is required.

  • Variant proportions: CDC Nowcast variant proportions (jr58-6ysp). Filtered to usa_or_hhsregion = "USA" for national estimates. Variants with peak share below 5% grouped as "Other." Data current through February 2026. Proportions are model-smoothed estimates rather than raw sequencing counts; their confidence intervals are not displayed here.
  • Hospitalizations: CDC Hospital Respiratory Data (aemt-mg7g). Filtered to jurisdiction = "USA" for national aggregate. Field: total_admissions_adult_covid_confirmed (confirmed adult COVID-19 admissions in the reporting week). Data runs August 2020 – October 2024 when mandatory reporting ended.
  • Vaccination: CDC COVID-19 Vaccinations in the United States, Jurisdiction (unsk-b7fc). National data filtered to location = "US"; state data filtered to two-letter state codes. Fields: administered_dose1_pop_pct, series_complete_pop_pct, additional_doses_vax_pct, bivalent_booster_pop_pct. Data runs December 2020 – May 2023.
  • Wastewater surveillance: CDC NWSS Public SARS-CoV-2 Wastewater Metric Data (2ew6-ywp6). The percentile field represents each site's current viral level ranked against its own historical distribution (0=lowest ever, 100=highest ever). State-level estimates computed as median percentile across reporting sites. National trend computed as median percentile across all sites reporting in each biweekly window, restricted to windows with at least 75 reporting sites. Wastewater data current through September 2025.
  • Charts: Generated using Plotly and Python from the raw API data. All source code available in the site's GitHub repository.
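For readers who want to pull the same data, the sketch below builds query URLs for the national filters listed above. It assumes the datasets are served from data.cdc.gov through the Socrata SODA API, where `$where` and `$limit` are standard query parameters. The host, the row limit of 50000, and the helper name are assumptions for illustration, not a copy of the site's own code. The block only constructs and inspects URLs; it makes no network request.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed host for the CDC open-data portal (Socrata SODA endpoint).
BASE = "https://data.cdc.gov/resource"

# Dataset IDs and national filters as listed in Data and Methods.
DATASETS = {
    "variants": ("jr58-6ysp", "usa_or_hhsregion", "USA"),
    "hospitalizations": ("aemt-mg7g", "jurisdiction", "USA"),
    "vaccinations": ("unsk-b7fc", "location", "US"),
}


def national_query_url(name, limit=50000):
    """Build a SODA query URL that returns the national rows of one dataset."""
    dataset_id, field, value = DATASETS[name]
    params = {"$where": f"{field} = '{value}'", "$limit": limit}
    # urlencode percent-encodes "$", quotes, and spaces, which is still a valid query.
    return f"{BASE}/{dataset_id}.json?{urlencode(params)}"


url = national_query_url("variants")
print(url)
# Decode the query string again to confirm the filter survived encoding.
print(parse_qs(urlsplit(url).query))
```

The resulting URL can be passed to any HTTP client. For large datasets the SODA API also supports paging with `$offset`, which a fuller version would likely need.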