Using data from the NOAA Storm Database covering 1950 to November 2011, we review the effect of different types of weather events on the United States. We compare which weather events are most harmful to population health, measured by injuries and fatalities, both in total (sum) and per event (mean). We also review which weather events have the greatest economic effect, in total (sum) and per event (mean), based on property damage and crop damage. We find that tornadoes are most harmful to public health and that flooding is most costly to the nation’s economy.
We first read in the entire data set, found at this link, using the following code:
library(dplyr)  # provides select, group_by, summarise_each, and arrange used below
data <- read.csv(bzfile("repdata-data-StormData.csv.bz2"))
We then split the data into a Public Health data set and an Economic Damage data set. In the economic data set, we create new columns that convert the exponent codes K, M, and B to their numeric values (1e3, 1e6, 1e9) so that total monetary damages can be computed.
Public Health Data Set
We select only the relevant columns, then compute the mean and sum of fatalities and injuries, grouped by event type. The result is returned in descending order of total fatalities.
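# Keep only the event type and the two health columns
data.health <- select(data, EVTYPE, FATALITIES, INJURIES)
# Mean and sum per event type, sorted by total fatalities
data.health.final <- arrange(
  summarise_each(group_by(data.health, EVTYPE), funs(mean, sum)),
  desc(FATALITIES_sum)
)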
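On current dplyr releases, `summarise_each()` and `funs()` are deprecated. The same grouped mean-and-sum summary can be sketched with `across()`. This is a minimal sketch, assuming dplyr 1.0.0 or later; the `toy` data frame and its values are invented for illustration and are not taken from the storm data.

```r
library(dplyr)

# Hypothetical toy data standing in for the EVTYPE/FATALITIES/INJURIES columns
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "HEAT", "HEAT", "HEAT"),
  FATALITIES = c(0, 2, 1, 3, 2),
  INJURIES   = c(5, 15, 0, 4, 2),
  stringsAsFactors = FALSE
)

# Mean and sum of each health column per event type,
# sorted by total fatalities in descending order
toy.final <- toy %>%
  group_by(EVTYPE) %>%
  summarise(across(c(FATALITIES, INJURIES), list(mean = mean, sum = sum))) %>%
  arrange(desc(FATALITIES_sum))

print(toy.final)
```

The column names keep the `<column>_<function>` pattern (for example `FATALITIES_sum`), so the `arrange()` call works unchanged, although the column order differs from the `summarise_each()` output.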
Economic Data Set
We select only the relevant columns, express damages in dollars rather than abbreviated exponent codes, and then calculate the mean and sum for each event type. We then return the results in descending order of total property damage.
# Multipliers for the damage exponent codes; any other code yields NA in the lookup
full.num <- c("K" = 1e3, "k" = 1e3, "M" = 1e6, "m" = 1e6, "B" = 1e9, "b" = 1e9, "0" = 1)
data.econ <- select(data, EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
data.econ$PROPDMGTOT <- data.econ$PROPDMG * full.num[as.character(data.econ$PROPDMGEXP)]
data.econ$CROPDMGTOT <- data.econ$CROPDMG * full.num[as.character(data.econ$CROPDMGEXP)]
# Damages with a blank or unrecognized exponent code are counted as 0
data.econ[is.na(data.econ)] <- 0
data.econ$DMGTOT <- data.econ$PROPDMGTOT + data.econ$CROPDMGTOT
# Mean and sum per event type, sorted by total property damage
data.econ.final <- arrange(
  summarise_each(
    group_by(select(data.econ, EVTYPE, DMGTOT, PROPDMGTOT, CROPDMGTOT), EVTYPE),
    funs(mean, sum)
  ),
  desc(PROPDMGTOT_sum)
)
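The exponent conversion relies on indexing a named vector by the exponent code. The sketch below illustrates that lookup in base R on a few made-up rows, including what happens to codes that are not in `full.num`. The `toy` values, the `"?"` code, and the `n.unmatched` counter are assumptions for illustration only.

```r
# Same multiplier table as in the analysis code
full.num <- c("K" = 1e3, "k" = 1e3, "M" = 1e6, "m" = 1e6, "B" = 1e9, "b" = 1e9, "0" = 1)

# Hypothetical rows: the amounts and the "?" code are invented for illustration
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2, 10, 7),
  PROPDMGEXP = c("K", "M", "B", "", "?"),
  stringsAsFactors = FALSE
)

# Named-vector lookup: codes missing from full.num (including "") come back as NA
mult <- unname(full.num[toy$PROPDMGEXP])
toy$PROPDMGTOT <- toy$PROPDMG * mult

# Count how many rows are about to be zeroed, then replace NA with 0
n.unmatched <- sum(is.na(toy$PROPDMGTOT))
toy$PROPDMGTOT[is.na(toy$PROPDMGTOT)] <- 0

print(toy)
print(n.unmatched)
```

Counting the unmatched rows before zeroing them is one way to check how much reported damage the NA-to-0 step discards.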
Health Effects of Weather Events on United States Population
## Source: local data frame [6 x 5]
##
##           EVTYPE FATALITIES_mean INJURIES_mean FATALITIES_sum INJURIES_sum
## 1        TORNADO        0.092874       1.50607           5633        91346
## 2 EXCESSIVE HEAT        1.134088       3.88856           1903         6525
## 3    FLASH FLOOD        0.018019       0.03274            978         1777
## 4           HEAT        1.221643       2.73794            937         2100
## 5      LIGHTNING        0.051796       0.33198            816         5230
## 6      TSTM WIND        0.002292       0.03163            504         6957
Sorted by descending total fatalities from 1950 to 2011, tornadoes are easily the leading cause of both injuries and fatalities. However, it is important to note that the mean fatality and injury counts per tornado are NOT the highest: heat and excessive heat average more than one fatality per event (about 1.2 and 1.1), against roughly 0.09 for a tornado. It is the sheer number of recorded tornado events that produces these totals; an individual tornado is not the most deadly or injurious type of event.
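Because each total is the per-event mean multiplied by the number of events, the number of records behind each row can be recovered as sum / mean. The sketch below does this for the fatality columns of the table above. The `health` data frame simply retypes the printed values, and `EVENTS_implied` is a hypothetical column name; the results are approximate because the printed means are rounded.

```r
# Values copied from the printed health summary table
health <- data.frame(
  EVTYPE          = c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD",
                      "HEAT", "LIGHTNING", "TSTM WIND"),
  FATALITIES_mean = c(0.092874, 1.134088, 0.018019, 1.221643, 0.051796, 0.002292),
  FATALITIES_sum  = c(5633, 1903, 978, 937, 816, 504),
  stringsAsFactors = FALSE
)

# sum = n * mean, so the implied number of recorded events is sum / mean
health$EVENTS_implied <- round(health$FATALITIES_sum / health$FATALITIES_mean)

# Show event types from most to least frequently recorded
print(health[order(-health$EVENTS_implied), c("EVTYPE", "EVENTS_implied")])
```

On these figures, tornado records number on the order of 60,000, against fewer than 2,000 for excessive heat, which is why tornadoes lead the totals despite a much lower mean.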
Monetary harm in the form of Property and Crop Damage
## Source: local data frame [6 x 7]
##
##              EVTYPE DMGTOT_mean PROPDMGTOT_mean CROPDMGTOT_mean DMGTOT_sum
## 1             FLOOD     5935390         5711826       2.236e+05  1.503e+11
## 2 HURRICANE/TYPHOON   817201282       787566364       2.963e+07  7.191e+10
## 3           TORNADO      945593          938752       6.842e+03  5.735e+10
## 4       STORM SURGE   165990579       165990559       1.916e+01  4.332e+10
## 5       FLASH FLOOD      323565          297378       2.619e+04  1.756e+10
## 6              HAIL       64984           54501       1.048e+04  1.876e+10
## Variables not shown: PROPDMGTOT_sum (dbl), CROPDMGTOT_sum (dbl)
The table is sorted in descending order of total property damage from 1950 to 2011. Floods are the most expensive event type by a significant margin, with about $150 billion in combined property and crop damage. This is expressed in the following plot:
barplot(data.econ.final$DMGTOT_sum[1:5],
        horiz = FALSE,
        names.arg = substr(data.econ.final$EVTYPE[1:5], 1, 9),
        xlab = "Event Type",
        ylab = "Property + Crop Damage in Dollars",
        main = "5 Most Expensive Weather Events Overall")
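The plot takes the first five rows of a table that the summary code sorts by `PROPDMGTOT_sum`, so the bars are the top five by property damage rather than by combined damage. The sketch below re-ranks the six printed rows by `DMGTOT_sum` to show where the two orderings differ. It covers only the rows shown in the printed table, since the rest of `data.econ.final` is not visible here; the `econ` data frame retypes those printed values.

```r
# DMGTOT_sum values copied from the six rows of the printed damage table
econ <- data.frame(
  EVTYPE     = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO",
                 "STORM SURGE", "FLASH FLOOD", "HAIL"),
  DMGTOT_sum = c(1.503e11, 7.191e10, 5.735e10, 4.332e10, 1.756e10, 1.876e10),
  stringsAsFactors = FALSE
)

# Re-rank the printed rows by combined property and crop damage
econ.by.total <- econ[order(econ$DMGTOT_sum, decreasing = TRUE), ]
top5 <- econ.by.total$EVTYPE[1:5]

print(top5)
```

Among the printed rows, hail overtakes flash flood once crop damage is included. If the plot is meant to rank by combined damage, one option is to apply `arrange(data.econ.final, desc(DMGTOT_sum))` before taking the first five rows.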
Note that once again, the ranking by damage per event does not necessarily follow the ranking by total cost. In fact, among event types with at least several discrete events recorded, a hurricane tends to be the most expensive per event. This can be seen in the following logarithmic plot:
# log = "y" draws the axis on a log scale; tick labels remain in dollars
barplot(data.econ.final$DMGTOT_mean[1:5],
        log = "y",
        horiz = FALSE,
        names.arg = substr(data.econ.final$EVTYPE[1:5], 1, 9),
        xlab = "Event Type",
        ylab = "Property + Crop Damage per Event in Dollars (log scale)",
        main = "Mean Cost per Event (logarithmic in dollars)")