---
title: "Trip updates"
author: "Matthew Palm"
date: "2026-05-18"
output: html_vignette
vignette: >
  %\VignetteIndexEntry{Trip updates}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

This vignette shows how to read GTFS-realtime trip updates into R using the {gtfsrealtime} package. Trip updates describe the real-time progress of a vehicle along a scheduled trip, including predicted arrival and departure times, delays, skipped stops, canceled trips, and other real-time information. The {gtfsrealtime} package reads the nested GTFS-realtime trip update format and flattens it into a data frame that is easier to inspect and analyze in R.

## Load libraries

First, we load {gtfsrealtime} to read GTFS-realtime files and {dplyr} to inspect and summarize the resulting data frame.

```{r setup, message=FALSE}
library(gtfsrealtime)
library(dplyr)
```

## Load a GTFS-realtime trip updates feed

This example uses a New York City trip updates feed included with {gtfsrealtime}. The file is compressed with bzip2 to save space. {gtfsrealtime} can automatically detect and read uncompressed files as well as files compressed with zip, gzip, or bzip2. Zip files can contain multiple GTFS-realtime files, in which case {gtfsrealtime} will read all of them. You can differentiate which file each update came from based on the `file_index` field.

GTFS-realtime time values are stored as Unix timestamps, which are interpreted relative to UTC. To convert to local time, we provide a local time zone. Time zones are specified in standardized TZ database format, generally Continent/City. If you do not want to convert times, you can specify a time zone of Etc/UTC.
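
To make the time zone conversion concrete, here is a minimal base R sketch of what interpreting a Unix timestamp in different time zones looks like. The timestamp value is hypothetical (it is not taken from the example feed), and this illustrates the general idea rather than how {gtfsrealtime} performs the conversion internally.

```{r timestamp-sketch}
# Hypothetical Unix timestamp: seconds since 1970-01-01 00:00:00 UTC
ts <- 1779105600

# The same instant, displayed in UTC and in New York local time
utc_time <- as.POSIXct(ts, origin = "1970-01-01", tz = "Etc/UTC")
local_time <- as.POSIXct(ts, origin = "1970-01-01", tz = "America/New_York")

format(utc_time, "%Y-%m-%d %H:%M:%S %Z")
format(local_time, "%Y-%m-%d %H:%M:%S %Z")

# Only the display changes; the underlying instant is identical
as.numeric(utc_time) == as.numeric(local_time)
```

Both objects store the same number of seconds since the epoch; the time zone only affects how that instant is printed.
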
```{r read-updates, warning=FALSE}
updates <- read_gtfsrt_trip_updates(
  system.file("nyc-trip-updates.pb.bz2", package = "gtfsrealtime"),
  "America/New_York"
)
```

When reading this example feed, {gtfsrealtime} warns that some GTFS-realtime entity IDs are duplicated. In these cases, the package appends suffixes such as `_duplicated_1` so that each row can be represented with a unique `id`. There are quite a few of them, so they are suppressed here to keep the vignette readable, but the first two are:

```
1: ! ID UP_A6-Weekday-SDon-094800_B6_243 is duplicated. Replacing with UP_A6-Weekday-SDon-094800_B6_243_duplicated_1 . This may cause joins between different GTFS-realtime files (even within a ZIP archive) to be incorrect.
2: ! ID UP_A6-Weekday-SDon-094800_B6_243 is duplicated. Replacing with UP_A6-Weekday-SDon-094800_B6_243_duplicated_2 . This may cause joins between different GTFS-realtime files (even within a ZIP archive) to be incorrect.
```

These warnings are useful in practice: duplicated entity IDs can affect workflows that join records across GTFS-realtime files or across multiple files within a ZIP archive, as IDs may no longer match across files.

## Explore trip updates

GTFS-realtime trip updates are [hierarchical](https://gtfs.org/documentation/realtime/reference/#message-tripupdate); one trip update can contain information about the trip as a whole as well as updates for multiple stops along that trip. `read_gtfsrt_trip_updates()` flattens that structure into a data frame. As a result, the same `trip_id` may appear in multiple rows when the feed contains stop-level updates for multiple stops.

```{r glimpse-updates}
glimpse(updates)
```

## Inspecting one trip across its stops

Because a single trip can include predictions for multiple stops, it is useful to inspect all rows associated with one `trip_id`.
In the example below, we select the first trip in the feed and display the route, stop ID, stop sequence, and predicted arrival and departure times and delays for each stop. If a trip update has no stop time updates, it will appear as a single row with all the `stop_*` fields set to `NA`. All of the columns are described in the documentation for `read_gtfsrt_trip_updates()`.

```{r inspect-updates}
updates |>
  filter(trip_id == first(trip_id)) |>
  select(
    trip_id,
    route_id,
    stop_id,
    stop_sequence,
    arrival_time,
    departure_time,
    arrival_delay,
    departure_delay
  )
```
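
Because the flattened data frame has one row per stop time update, per-trip summaries need a grouping step. The sketch below uses a small hand-made tibble with hypothetical values (not taken from the NYC feed) that mimics the layout described above, including a trip with no stop time updates. It assumes that a stop-level row has at least one of `stop_id` or `stop_sequence` filled in.

```{r per-trip-summary-sketch}
library(dplyr)

# Hypothetical toy data mimicking the flattened layout: one row per stop
# time update, and a single row with NA stop fields for a trip without any
toy_updates <- tibble(
  trip_id = c("trip_a", "trip_a", "trip_a", "trip_b"),
  stop_id = c("S1", "S2", "S3", NA),
  stop_sequence = c(1L, 2L, 3L, NA)
)

# Count rows and actual stop-level updates for each trip
trip_summary <- toy_updates |>
  group_by(trip_id) |>
  summarize(
    n_rows = n(),
    n_stop_updates = sum(!is.na(stop_id) | !is.na(stop_sequence))
  )

trip_summary
```

Counting non-missing stop fields rather than rows keeps a trip without stop time updates from being reported as having one. The same pattern should carry over to `updates`, provided the assumption about the stop fields holds for your feed.
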