Economic Effects of Weather Events with R

Synopsis

Using data from the NOAA Storm Database covering 1950 through November 2011, we review the effects of different types of weather events on the United States. We compare which event types are most harmful to population health, measured by injuries and fatalities, both in total (sum) and per event (mean). We also review which event types have the greatest economic effect, in total (sum) and per event (mean), based on property damage and crop damage. We find that tornadoes are the most harmful to public health in total, and that floods are the most costly to the nation’s economy.

Data Processing

We first read in the entire data set, found at this link, using the following code:

library(dplyr)
data <- read.csv(bzfile("repdata-data-StormData.csv.bz2"))
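As a small, self-contained check of the reading step, the sketch below writes a tiny bzip2-compressed CSV to a temporary file and reads it back through bzfile(). The toy columns and values are invented for illustration, and stringsAsFactors = FALSE is an optional assumption that the original code does not use; with it, the exponent columns arrive as character vectors rather than factors.

```r
# Toy data frame; the values are invented for illustration only.
toy <- data.frame(
  EVTYPE     = c("FLOOD", "HAIL"),
  PROPDMG    = c(2.5, 10),
  PROPDMGEXP = c("M", "K"),
  stringsAsFactors = FALSE
)

# Write the toy data to a temporary bzip2-compressed CSV file.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# Read it back the same way the full data set is read above.
roundtrip <- read.csv(bzfile(tmp), stringsAsFactors = FALSE)
str(roundtrip)

unlink(tmp)  # remove the temporary file
```

On the full data set, the same call pattern applies with the real file name in place of the temporary one.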

We then split the data into a Public Health dataset and an Economic Damage dataset. In the economic dataset, we create new columns that convert the exponent codes K, M, and B to their numeric values (1e3, 1e6, 1e9) so that total monetary damages can be computed.

Public Health Data Set

We select only the relevant columns, then compute the mean and sum of each, grouped by event type. The result is returned in descending order of total fatalities.

data.health <- select(data, EVTYPE, FATALITIES, INJURIES)
data.health.final <- arrange(summarise_each(group_by(data.health, EVTYPE), funs(mean, sum)), desc(FATALITIES_sum))
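The summarise_each() and funs() helpers used above are deprecated in recent dplyr releases. The sketch below shows one possible equivalent using across(), assuming dplyr 1.0.0 or later; the toy data are invented, and the toy names are hypothetical. The resulting columns carry the same names as above, although across() orders them column by column (FATALITIES_mean, FATALITIES_sum, INJURIES_mean, INJURIES_sum).

```r
library(dplyr)

# Toy data; the values are invented for illustration only.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "HEAT", "HEAT", "HEAT"),
  FATALITIES = c(0, 4, 1, 2, 3),
  INJURIES   = c(10, 30, 0, 5, 1),
  stringsAsFactors = FALSE
)

# Group by event type, compute mean and sum of each measure,
# then sort by total fatalities, largest first.
toy.summary <- toy %>%
  group_by(EVTYPE) %>%
  summarise(across(c(FATALITIES, INJURIES), list(mean = mean, sum = sum)),
            .groups = "drop") %>%
  arrange(desc(FATALITIES_sum))

print(toy.summary)
```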

Economic Data Set

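We select only the relevant columns, express damages in dollars rather than abbreviated exponent codes, and then calculate the mean and sum for each event type. Exponent codes that are not listed in full.num produce NA and are therefore counted as zero damage. The results are returned in descending order of total property damage (PROPDMGTOT_sum).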

full.num <- c("K" = 1e3, "k" = 1e3, "M" = 1e6, "m" = 1e6, "B" = 1e9, "b" = 1e9, "0" = 1)
data.econ <- select(data, EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
data.econ$PROPDMGTOT <- data.econ$PROPDMG * full.num[as.character(data.econ$PROPDMGEXP)]
data.econ$CROPDMGTOT <- data.econ$CROPDMG * full.num[as.character(data.econ$CROPDMGEXP)]
data.econ[is.na(data.econ)] <- 0
data.econ$DMGTOT <- data.econ$PROPDMGTOT + data.econ$CROPDMGTOT
data.econ.final <- arrange(summarise_each(group_by(select(data.econ, EVTYPE, DMGTOT, PROPDMGTOT, CROPDMGTOT), EVTYPE), funs(mean, sum)), desc(PROPDMGTOT_sum))
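To make the exponent conversion concrete, here is a minimal sketch on invented toy values. It reuses the full.num lookup vector from above and shows that indexing a named vector with a code it does not contain (here the empty string and "H", both chosen as hypothetical examples) returns NA, which the zero-fill step then turns into zero damage.

```r
# Same lookup vector as in the processing code above.
full.num <- c("K" = 1e3, "k" = 1e3, "M" = 1e6, "m" = 1e6,
              "B" = 1e9, "b" = 1e9, "0" = 1)

# Toy data; the values are invented for illustration only.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10, 3),
  PROPDMGEXP = c("K", "M", "B", "", "H"),
  stringsAsFactors = FALSE
)

# Character indexing returns NA for codes absent from full.num.
multiplier <- unname(full.num[as.character(toy$PROPDMGEXP)])
toy$PROPDMGTOT <- toy$PROPDMG * multiplier

# Mirror the zero-fill step: unknown codes end up as zero damage.
toy$PROPDMGTOT[is.na(toy$PROPDMGTOT)] <- 0

print(toy)
```

Counting how many rows fall into the NA case before zero-filling is one way to gauge how much recorded damage this simplification leaves out.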

Results

Health Effects of Weather Events on United States Population

head(data.health.final)
## Source: local data frame [6 x 5]
## 
##           EVTYPE FATALITIES_mean INJURIES_mean FATALITIES_sum INJURIES_sum
## 1        TORNADO        0.092874       1.50607           5633        91346
## 2 EXCESSIVE HEAT        1.134088       3.88856           1903         6525
## 3    FLASH FLOOD        0.018019       0.03274            978         1777
## 4           HEAT        1.221643       2.73794            937         2100
## 5      LIGHTNING        0.051796       0.33198            816         5230
## 6      TSTM WIND        0.002292       0.03163            504         6957

Sorted in descending order of total fatalities from 1950-2011, tornadoes are easily the leading cause of both fatalities and injuries. However, the mean fatality and injury counts per tornado are not the highest: heat events, for example, average more than one fatality per event, compared with about 0.09 for tornadoes. It is the sheer number of recorded tornado events that produces the high totals; an individual tornado is not the deadliest or most injurious type of event.
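The contrast between totals and per-event means can be sketched on invented toy data. The example below also counts events per type and drops types with too few records before ranking by mean; the min.events threshold is a hypothetical choice for illustration, not something applied in the analysis above.

```r
library(dplyr)

# Toy records; event counts and fatalities are invented for illustration.
toy <- data.frame(
  EVTYPE     = c(rep("TORNADO", 6), "HEAT", "HEAT", "RARE EVENT"),
  FATALITIES = c(0, 1, 0, 3, 2, 2, 2, 4, 5),
  stringsAsFactors = FALSE
)

min.events <- 2  # hypothetical threshold for "enough" recorded events

toy.rank <- toy %>%
  group_by(EVTYPE) %>%
  summarise(events          = n(),
            FATALITIES_sum  = sum(FATALITIES),
            FATALITIES_mean = mean(FATALITIES),
            .groups = "drop") %>%
  filter(events >= min.events) %>%   # drop types with a single record
  arrange(desc(FATALITIES_mean))     # rank by per-event mean

print(toy.rank)
```

In this toy case the frequent event type has the larger total while the less frequent one has the larger mean, which is the same pattern seen for tornadoes and heat in the table above.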

Monetary harm in the form of Property and Crop Damage

head(data.econ.final)
## Source: local data frame [6 x 7]
## 
##              EVTYPE DMGTOT_mean PROPDMGTOT_mean CROPDMGTOT_mean DMGTOT_sum
## 1             FLOOD     5935390         5711826       2.236e+05  1.503e+11
## 2 HURRICANE/TYPHOON   817201282       787566364       2.963e+07  7.191e+10
## 3           TORNADO      945593          938752       6.842e+03  5.735e+10
## 4       STORM SURGE   165990579       165990559       1.916e+01  4.332e+10
## 5       FLASH FLOOD      323565          297378       2.619e+04  1.756e+10
## 6              HAIL       64984           54501       1.048e+04  1.876e+10
## Variables not shown: PROPDMGTOT_sum (dbl), CROPDMGTOT_sum (dbl)

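Sorted in descending order of total property damage from 1950-2011, floods are the most expensive event type by a wide margin: their combined property and crop damage (about 1.5e11 dollars) is roughly double that of hurricanes/typhoons (about 7.2e10 dollars). The first five rows are shown in the following plot: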

barplot(data.econ.final$DMGTOT_sum[1:5], horiz = FALSE, names.arg=substr(data.econ.final$EVTYPE[1:5],1,9), xlab = "Event Type", ylab = "Property + Crop Damage in Dollars",
        main = "5 Most Expensive Weather Events Overall")

[Figure, chunk plot.totaldamage: bar plot of total property + crop damage in dollars for the first five event types]

Note that, once again, the ranking by damage per event does not follow the ranking by total damage. Among event types with at least several recorded events, hurricanes tend to be the most expensive per event. This can be seen in the following plot, drawn on a logarithmic scale:

# log = "y" draws the axis on a log scale; tick labels remain in dollars
barplot(data.econ.final$DMGTOT_mean[1:5], log = "y", horiz = FALSE, names.arg=substr(data.econ.final$EVTYPE[1:5],1,9), xlab = "Event Type", ylab = "Property + Crop Damage per Event in Dollars (log scale)",
        main = "Mean Cost per Event (logarithmic in dollars)")

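[Figure, chunk plot.meandamage: bar plot of mean property + crop damage per event in dollars, logarithmic y-axis, for the first five event types]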

 
