Predicting the Trajectory of Atlantic Hurricanes with a Statistical Model
Thomas Varley • José Koluder-Ramírez • Lucas Pham
Abstract
A hurricane is a tropical cyclone with maximum wind speeds greater than 74 miles per hour. Hurricanes not only cause billions of dollars in damage and numerous fatalities, but also have a significant impact on the environment through pollution. During these catastrophic storms, accurate knowledge of whether and when the storm will affect a given area is incredibly important for preparations that minimize property loss, damage, and environmental effects. Using historical hurricane data provided by the National Hurricane Center, statistical models were created to accurately assess the trajectory of a hurricane once a storm system has formed. A multiple linear regression model, a regression-based decision tree, and an XGBoost model were trained to compare different approaches to predicting hurricane trajectory. The input to each model consisted of two location metrics, either longitude and latitude or distance and bearing, as well as two intensity metrics, maximum wind speed and maximum wind pressure. An important hyperparameter was the number of previous six-hour timestamps included to predict the location at the next timestamp. After testing differing numbers of timestamps, the optimal model found was the multiple linear regression model that used four six-hour timestamps to predict the location and intensity of the storm at the next six-hour timestamp. Normalizing the inputs to this model further increased the accuracy of its predictions. The multiple linear regression model also remained accurate when using its own predictions as inputs to forecast more than one six-hour timestamp into the future.
Introduction
Hurricanes are rapidly-rotating storm systems that often originate in the Atlantic Ocean
and contain a low-pressure center, a spiral arrangement of extreme thunderstorms, and winds
exceeding 74 miles per hour (8). After forming over the Atlantic Ocean, hurricanes typically
travel north, where they make landfall over North America or the Caribbean. When these
hurricanes reach land, the intense wind and rainfall can be catastrophic, often spurring severe
economic instability, claiming the lives of numerous individuals, and causing irreparable damage
to the environment through the destruction of infrastructure that stores toxic materials (7). As
such, it is incredibly important to be able to accurately predict where a hurricane will travel after
its formation, so that people can evacuate the likely path and precautions can be taken to protect
any infrastructure that would be in danger.
The National Weather Service (NWS) and the National Hurricane Center (NHC) have a
variety of models that can be utilized in order to predict the future path of hurricanes. These
models generally fall into one of two categories: dynamical models or statistical models (2).
Dynamical models are incredibly complex, as they account for the physical equations of motion
in the atmosphere and utilize this information to predict the future path of the hurricane. While
dynamical models are oftentimes the most accurate predictors of a hurricane's path, their
complexity can lead to exceptionally long runtimes when generating those predictions (1). Statistical
models are much simpler and much faster than their dynamical counterpart, as they utilize
statistical formulas to generate their predictions through the use of historical hurricane data (2).
This paper focuses on the use of several statistical models to predict the future path of a
hurricane.
Using data provided by the National Hurricane Center, a series of statistical models were
created to predict the future path of a hurricane given initial information on the hurricane and
historical hurricane data. A total of three statistical models were trained in order to allow for
comparison when determining the most accurate model to predict the future path of a hurricane:
a linear regression model, an XGBOOST model, and a regression-based decision tree. For each
of the three models, an input consists of a series of location and intensity metrics, which the
models then used to generate their prediction for the path of the hurricane. For the location
metrics, the longitude and latitude of the hurricanes were first applied to each of the three models to generate predictions; the location metrics were then switched to distance and bearing in order to compare which set of predictions was more accurate. For the intensity metrics, the
maximum wind speed and maximum wind pressure were applied to each of the three models.
Beyond simply altering which metrics were input for each model, it was also necessary to
select the optimal number of six-hour timestamps that would be utilized by each statistical model to
predict the next location of a hurricane. This number of timestamps served as an important
hyperparameter in the learning process, as it impacted the effectiveness of the three statistical
models. Upon calculating the mean absolute error for the three statistical models, each using a
variety of different numbers of six-hour timestamps, it was determined that the most effective
statistical model for predicting the path of a hurricane was a multiple linear regression model that
had the selected hyperparameter of four six-hour timestamps, as this was the least complex
model amongst multiple models that achieved similarly small mean absolute errors. Through
normalization, this model was able to further reduce mean absolute error and improve in
predictive accuracy. This result demonstrates that a more complex model does not necessarily
lead to a more accurate model, as multiple linear regression models that utilized more than four
six-hour timestamps to make their predictions often suffered from overfitting and were more
inaccurate as a result.
Methods
The data from the National Hurricane Center came in the form of a text file. The given data form a (50911 x 23) matrix: 50911 observations of 23 different variables describing 1864 total hurricanes that occurred in the Atlantic Ocean from 1851 to 2018. The file was not a CSV
and could not be read traditionally with built-in Pandas functions. For each hurricane, the
identifying information was listed such as hurricane name, year, and unique identifying codes.
Then, for each hurricane, there was a variable number of records, one per six-hour timestamp. Each timestamp
consisted of the time and date and the atmospheric measurements. A PDF was provided that gave
a detailed explanation of how to successfully read the text file. To acquire the data, each line of
the file was read and relevant information was extracted using the specific indexing information
provided in the PDF.
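The parsing step can be sketched as follows. This is a minimal illustration only: the column offsets, field names, and file name below are hypothetical placeholders, since the actual offsets come from the format PDF distributed with the data.

```python
# Minimal sketch of the line-by-line parsing step. The column offsets and
# file name are hypothetical placeholders; the real offsets come from the
# format PDF provided with the NHC data.
def parse_timestamp_line(line):
    """Extract one six-hour observation from a fixed-position record."""
    return {
        "date": line[0:8],               # placeholder offsets, not the real spec
        "time": line[10:14],
        "latitude": line[23:27],
        "longitude": line[30:36],
        "max_wind_speed": line[38:41],
        "max_pressure": line[43:47],
    }

with open("hurricanes.txt") as f:        # file name assumed
    lines = f.readlines()
# Header lines identifying each hurricane would be detected and handled
# separately before calling parse_timestamp_line on the data lines.
```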
After reading in all the lines and using the associated substring to extract the specific data
points, the information was stored as a list of hurricane objects. Each object consisted of
properties for their name, year, and other hurricane specific identifying information.
Additionally, each hurricane object consisted of a Pandas DataFrame that held the timestamp
specific data for each timestamp within a hurricane.
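A sketch of this container is below; the attribute and column names are assumptions for illustration, not the authors' exact class definition.

```python
import pandas as pd

class Hurricane:
    """One storm: identifying information plus its six-hour timestamps."""
    def __init__(self, storm_id, name, year):
        self.storm_id = storm_id
        self.name = name
        self.year = year
        # One row per six-hour timestamp.
        self.timestamps = pd.DataFrame(
            columns=["latitude", "longitude", "max_wind_speed", "max_pressure"]
        )
```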
To create a model, it first had to be determined which features were relevant. It was then
decided to include the latitude and longitude, which represent the location of the hurricane at a
timestamp as well as the maximum wind speed and pressure to represent the intensity of a
hurricane at a timestamp. The other features included in the data provided by the National
Hurricane Center were nearly all missing for hurricanes before 2004 which represented a
significant portion of the data. Thus, those features were not used in the models. To determine
the optimal number of timestamps included, models were tested using differing numbers of
timestamps from 3 timestamps to 10 timestamps of all 4 input variables. Thus, the input size
ranged from (3,4) to (10,4). To organize the list of hurricane objects obtained by extracting the
data from the text file into input and output data, hurricane objects that had fewer observations
than the required window length were discarded. Then, hurricanes were either split into the
training or testing set using scikit-learn's train_test_split, assigning 80% of the hurricanes as training
hurricanes and the rest as test hurricanes. For the training hurricanes and the test hurricanes
separately, and for each window length, inputs were created by selecting every run of consecutive timestamps whose length equaled the window length, with the next timestamp becoming the output of that input-output pair. For example, a hurricane that recorded 10 timestamps would be
able to provide 5 input and output pairs using a window length of 5 where each input had
dimension (5,4) and each output had dimension (1,4). As all models required a two-dimensional input, and the combined training samples formed a three-dimensional array, each input was flattened so that a single input had dimension (1, window length * 4). Additionally, as the training inputs were created, a list of the actual timestamp lines used to make the inputs and outputs, without repeats, was maintained. For example, with a window length of 5, a hurricane with only 10 timestamps would contribute its first 9 timestamps to this non-repeated input list.
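The windowing just described might look like the following sketch, where `hurricanes` is the assumed list of parsed hurricane objects and the feature names follow the earlier discussion:

```python
import numpy as np
from sklearn.model_selection import train_test_split

FEATURES = ["latitude", "longitude", "max_wind_speed", "max_pressure"]

def make_windows(hurricane, window):
    """Build flattened (input, output) pairs from one hurricane's timestamps."""
    values = hurricane.timestamps[FEATURES].to_numpy()
    X, y = [], []
    for start in range(len(values) - window):
        X.append(values[start:start + window].reshape(-1))  # (window, 4) -> (window*4,)
        y.append(values[start + window])                    # next timestamp, shape (4,)
    return np.array(X), np.array(y)

# 80/20 split at the hurricane level, so no storm spans both sets.
train_hurricanes, test_hurricanes = train_test_split(
    hurricanes, train_size=0.8, random_state=0
)
```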
Three variations of the data, applied to all three model types, were investigated first: using the data as is, normalizing the data, and replacing the latitude and longitude inputs with the distance and bearing between consecutive points. Each variation was hypothesized to have an impact on the performance of the models. Once the data had been organized into training inputs, training outputs, test inputs, test outputs, and the list of training timestamps without repeats, the training inputs and outputs could be used to train each model, and the test inputs and outputs to evaluate each model's performance by mean absolute error.

To see the effect of normalizing the data, the scikit-learn StandardScaler class was used. One scaler was fit on the list of training inputs without repeats and another on the test inputs to determine the mean and standard deviation of each feature within the training set and testing set, respectively. Once these means and standard deviations were found, the normalization was applied to the training and testing input features and to the training outputs. The normalized training inputs and outputs were used to train each of the three models. The normalized test inputs were then used to create predictions from the trained model; each prediction was transformed back to the original scale, and the mean absolute error was calculated against the test outputs.

To investigate using distance and bearing in place of longitude and latitude, the distance in kilometers between two points on the Earth was calculated with a built-in geopy function, and the bearing was estimated from the two coordinates. To convert an input with longitude and latitude values to distance and bearing values, the distance and bearing between consecutive timestamp locations were calculated. The later timestamp's maximum wind speed and maximum wind pressure were used as its intensity metrics. Thus, an input consisting of 5 timestamps with longitude and latitude coordinates creates an input of 4 timestamps or, in general, one fewer timestamp than the original input's window length. The output is the distance and bearing from the original input's last timestamp location to the original output's timestamp location, and the output's maximum wind speed and maximum wind pressure are the same as the original output's. The three models are then trained and tested in the same manner as the non-normalized longitude- and latitude-based models.
In order to assess a model’s performance on predicting timestamps further than one in the
future using previous predictions as inputs, the models were first created using the normalized
longitude and latitude method with window lengths from 3 to 10 timestamps. Then, to determine a model's accuracy in predicting a given number of timestamps into the future, a prediction was first made using the initial window of inputs. All inputs except the first timestamp, together with the new prediction, were then used to predict the second timestamp, and so on until the desired number of timestamps into the future was reached. At that point, the prediction and the actual output values were recorded. Using all the test hurricanes that had enough timestamps to carry the prediction the necessary number of timestamps into the future, the mean absolute error was recorded for each number of timestamps in the future.
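A sketch of this roll-forward procedure, assuming a model already trained on flattened windows as described above:

```python
import numpy as np

def predict_steps_ahead(model, first_window, steps):
    """Predict `steps` timestamps into the future, feeding each prediction
    back in as the newest timestamp.

    first_window: array of shape (window, 4) holding the initial timestamps.
    """
    window = first_window.copy()
    for _ in range(steps):
        pred = model.predict(window.reshape(1, -1))[0]  # next timestamp, shape (4,)
        window = np.vstack([window[1:], pred])          # drop oldest, append prediction
    return pred  # the prediction `steps` timestamps into the future
```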
Figure 1: The project’s overall workflow
Standard Score Normalization:
Standard Score Normalization (or Z-scores) linearly transforms data values to have a mean of zero and a standard deviation of one. The Z-score of each value is equal to the difference between the value and the sample mean, divided by the sample standard deviation:

Z_i = (x_i − x̄) / s

This formula is used to normalize longitude, latitude, wind speed, and wind pressure.
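In practice this is the role of scikit-learn's StandardScaler; the sketch below assumes the window matrices train_X, train_y, test_X and an untrained model from the Methods already exist.

```python
from sklearn.preprocessing import StandardScaler

# Assumed variables: train_X, train_y, test_X, model. (The Methods fit a
# second scaler on the test inputs; a single input scaler is shown here
# for brevity.)
x_scaler = StandardScaler().fit(train_X)   # learns per-feature mean and std
y_scaler = StandardScaler().fit(train_y)

model.fit(x_scaler.transform(train_X), y_scaler.transform(train_y))

# Predictions are mapped back to the original units before computing MAE.
pred = y_scaler.inverse_transform(model.predict(x_scaler.transform(test_X)))
```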
Mean Absolute Error
In this paper, Mean Absolute Error (MAE) is used to evaluate the correctness of predictions for different window lengths, instead of the more commonly used Mean Square Error (MSE) or Root Mean Square Error (RMSE). In this case, the errors are the absolute differences |y_i − x_i| between the actual values y_i in the testing set and the predicted values x_i derived from using the training data as predictors. By calculating the mean of those differences (dividing the sum of all absolute differences by the number of observations n),

MAE = (1/n) Σ_i |y_i − x_i|,

the performance of the model can be assessed by how close the predictions are to the actual data. There are multiple reasons why MAE is a better evaluation metric than RMSE for this analysis. First, from an interpretation standpoint, MAE is beneficial due to its simplicity. Second, since the errors in this model are considerably small, it is more appropriate to use MAE instead of RMSE, as RMSE is more effective at penalizing large errors (5).
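For concreteness, the metric as used here is simply:

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error: (1/n) * sum of |y_i - x_i| over n observations."""
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))
```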
Multiple Linear Regression
Multiple Linear Regression is being used in this project as one of the three main fitting models
(along with Decision Tree and XGBoost). Multiple Linear Regression is a straightforward method that can be interpreted to assess which features are most important in predicting the outputs. Furthermore, throughout the course of this project, the decision to fit the multiple linear regression with different window lengths (i.e., different numbers of input variables) was due to the expectation of a linear correlation between variables such as wind speed, wind pressure, etc. and the location/path of the hurricane.
Bearing angle between two points: Longitude and Latitude
Bearing can be defined as direction or an angle, between the north-south line of earth or meridian
and the line connecting the target and the reference point (4). The given longitude and latitude
allow the distance between the points to be computed, but the bearing tells the direction in which the hurricane is moving.
Bearing from point A to point B can be calculated as:
β = atan2(X, Y),

where atan2(X, Y) is the function that returns the angle in the Euclidean plane, given in radians, between the positive Y-axis and the ray to the point (X, Y) ≠ (0, 0), and where X and Y are calculated as:

X = cos θ_B · sin ΔL
Y = cos θ_A · sin θ_B − sin θ_A · cos θ_B · cos ΔL

with θ_A, θ_B, and ΔL being the latitude of point A, the latitude of point B, and the difference between the two longitudes, respectively.
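Both quantities can be computed as in the following sketch, which uses geopy's geodesic distance and the atan2 formula above for the bearing:

```python
import math
from geopy.distance import geodesic

def distance_km(point_a, point_b):
    """Distance in kilometers between two (latitude, longitude) pairs."""
    return geodesic(point_a, point_b).kilometers

def bearing_deg(point_a, point_b):
    """Bearing from point A to point B via the atan2 formula above."""
    lat_a, lon_a = map(math.radians, point_a)
    lat_b, lon_b = map(math.radians, point_b)
    d_lon = lon_b - lon_a
    x = math.cos(lat_b) * math.sin(d_lon)
    y = (math.cos(lat_a) * math.sin(lat_b)
         - math.sin(lat_a) * math.cos(lat_b) * math.cos(d_lon))
    return math.degrees(math.atan2(x, y))
```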
Decision Tree Regression
When it comes to small-to-medium structured/tabular data, decision tree based algorithms are
considered best-in-class as of the time of this report (3). Therefore, this method of statistical
modeling is used as one out of three models to train the data. Decision tree learning uses a
decision tree (as a predictive model) to go from observations about an item (represented in the
branches) to conclusions about the item's target value (represented in the leaves). Based on specific conditions, the tree is split into further branches, creating more sub-trees as the tree's height increases. A main reason why both Decision Tree Regression and Multiple Linear Regression are used in this project is that, unlike linear models, Decision Tree Regression maps nonlinear relationships quite well. Therefore, having both models in the same study can help determine whether the relationship is linear or not.
XGBOOST
XGBoost is a decision-tree-based ensemble machine learning algorithm that uses a gradient boosting framework and is implemented in Python. XGBoost has been described as the top tree-based algorithm, built on decision trees, optimizing the gradient boosting algorithm through parallel processing, tree pruning, handling of missing values, and regularization to avoid overfitting and bias (6). Although XGBoost does not work best for all prediction problems, it is a powerful tool that should always be considered. Furthermore, since the Decision Tree is one of the three models used, XGBoost can serve as a point of comparison for the gap in predictive performance between the two.
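A sketch of the three-model comparison is below; default hyperparameters are shown, which are not necessarily the settings used in the study, and the window matrices train_X, train_y, test_X, test_y are assumed from the Methods.

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.multioutput import MultiOutputRegressor
from sklearn.tree import DecisionTreeRegressor
from xgboost import XGBRegressor

models = {
    "multiple linear regression": LinearRegression(),
    "decision tree": DecisionTreeRegressor(),
    # XGBRegressor predicts one target at a time, so wrap it for 4 outputs.
    "xgboost": MultiOutputRegressor(XGBRegressor()),
}
for name, model in models.items():
    model.fit(train_X, train_y)
    print(name, mean_absolute_error(test_y, model.predict(test_X)))
```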
Results and Discussion
Figure 2: Scatter plots depicting MAE per Window Length using various models
Top Left: Scatter plot depicting the mean absolute error of location features using all three
models across different window lengths. Top Right: Scatter plot depicting the mean absolute
error of all features using all three models across different window lengths. Middle Left: Scatter plot depicting the mean absolute error of location features using all three models across different window lengths with normalized input. Middle Right: Scatter plot depicting the mean
absolute error of all features using all three models across different window lengths with
normalized input. Bottom Left: Scatter plot depicting the mean absolute error of distance
features using all three models across different window lengths with distance and bearing.
Bottom Right: Scatter plot depicting the mean absolute error of all features using all three
models across different window lengths with distance and bearing.
The first objective of this investigation was to determine the optimal model for hurricane
path prediction, along with determining whether using distance or location metrics would
improve accuracy, and finding the optimal number of timestamps to predict the next immediate
timestamp’s location and intensity. As shown in Figure 2, using the longitude and latitude as
opposed to the distance and bearing produced more accurate results when predicting both the
trajectory of the hurricane and the combined trajectory and intensity. The distance based metrics
were not normalized, as opposed to the location based metrics, because the bearing is an angle
and the information would be lost by normalizing the angle. The distance based metrics may
have performed poorly because converting the location variables to distances between locations potentially loses information about where the hurricane actually is at a given timestamp. For example, hurricanes in the Gulf of Mexico may generally behave differently than
hurricanes in the middle of the Atlantic Ocean. That information is preserved when the longitude
and latitude are used, but when distance and bearing are used, the information about the specific
location of the hurricane is lost. Thus, there are significant errors in the models that used distance
and bearing instead of longitude and latitude.
In every graph in Figure 2 where longitude and latitude are the location metrics, the multiple linear regression produces the output with the least mean absolute error for both combinations of variables. Based upon the mean absolute error, both for the location features alone and for all variables combined, the optimal model is the multiple linear regression. The optimal number of timestamps used to predict the next timestamp is not as apparent. In the four graphs where longitude and latitude are used as the features, window lengths from three to ten timestamps perform approximately the same on the test hurricanes when measuring the mean absolute error of the latitude and longitude only. However, when the mean absolute error is calculated using the longitude, latitude, maximum wind speed, and maximum pressure, there are locally optimal window lengths at four and nine timestamps. In practice, using only four timestamps, or 24 hours, of data may be more accessible and accurate than using nine timestamps, or 54 hours, of data. Additionally, a model that uses four timestamps is significantly less complex than a model that uses nine: the four-timestamp model must learn only 17 parameters per output, while the nine-timestamp model must learn 37. Normalizing this model when using longitude and latitude brought a small decrease in the mean absolute error for nearly every window length. Thus, the optimal model is the normalized multiple linear regression model with a window length of four timestamps.
Figure 3: Top scatter plot depicts the mean absolute error of location variables of predicting a
given number of steps away from the initial window length input values. Bottom scatter plot
depicts the mean absolute error of all four features when predicting a given number of steps
away from the initial window length of input values.
In Figure 3, as the number of timestamps predicted beyond the original window of input timestamps increased, the accuracy of the multiple linear regression models decreased for all window lengths used, whether the mean absolute error was calculated using all four features or just the longitude and latitude features. However, for nearly every number of timestamps beyond the original input that was tested, a window length of four provided the most accurate model. A potential reason the four-timestamp model best predicts the trajectory of a hurricane, as opposed to models using more timestamps, is that only the most recent few six-hour timestamps are important in determining a hurricane's trajectory over the next timestamp; a more complex model that uses data from more than 24 hours (four timestamps) in the past adds noise, overfits to the training data, and thus hinders the model's performance on the test data.
Figure 4: Scatter plots depicting the predicted path of a hurricane in blue and the actual path of
a hurricane in red for three sample hurricanes using a multiple linear regression with a window
length of 4. Each model uses its previous predictions to predict the next target for all predictions
after the initial 4 timestamps are given.
Figure 4 shows the multiple linear regression model trained with a window length of four being used to predict test hurricanes: only the first four timestamps were given, and the model used its own predictions to predict the subsequent outputs. The general trend is that the simpler the path of the hurricane, the better the linear model is able to predict timestamps more than one timestamp into the future. The top right hurricane in Figure 4 is arguably the most linear of the three hurricanes. Similarly, the hurricane on the bottom is relatively simple and does not loop back in on itself, and the model is able to predict its path well. The linear model seems to predict the path of the hurricane with high accuracy even when the hurricane is more than 10 timestamps past the original four timestamps. However, another aspect of the prediction is the actual distance travelled. Sometimes a hurricane will travel quickly in one six-hour timestamp but slowly in another. The linear model fails to capture that varying distance and instead chooses a near-constant distance between consecutive points that minimizes the overall error. Thus, if a hurricane exhibits varying distances between points, the linear model exhibits worse performance that compounds as more predictions are made. This is confirmed by the values in Figure 5: the top right hurricane, despite appearing the most linear, performs worse than the bottom hurricane.

The trend that a simpler hurricane path is easier to predict also holds for the top left hurricane, which loops back in on itself; the linear model performs the worst on this hurricane. The linear model performing better on simpler hurricanes makes sense conceptually, as it is difficult for a linear model to learn patterns beyond what can be represented as a sum of a set number of linear parameters.
Hurricane Graph Location    Mean Absolute Error of Location Features
Top Left                    3.092
Top Right                   2.852
Bottom                      0.606
Figure 5: Table depicting the mean absolute error of the three hurricanes in Figure 4
Conclusion
During this investigation into determining the optimal statistical model of the three chosen models, and the necessary number of timestamps to include, to most accurately predict the trajectory of a hurricane, it was determined that a multiple linear regression model with a normalized input of the four previous six-hour timestamps of longitude, latitude, maximum wind speed, and maximum pressure was sufficient to accurately predict the trajectory of a hurricane, not only for the next timestamp, but also for numerous timestamps after the initial prediction, using its own predicted timestamps as inputs. However, the nonlinearity of hurricane motion indicates that a more complex model is necessary to further improve the accuracy of predicting a hurricane's trajectory. It is proposed that utilizing a more complex model such as a Recurrent Neural Network, or obtaining more detailed information about the hurricanes such as shorter timestamp intervals or additional features, would increase the accuracy of the trajectory prediction.
References
1. Alemany, Sheila, et al. “Predicting Hurricane Trajectories Using a Recurrent Neural Network.” ArXiv.org, 12 Sept. 2018, arxiv.org/abs/1802.02548.
2. “Statistical, Statistical-Dynamical, and Trajectory Models.” Hurricanes: Science and Society, www.hurricanescience.org/science/forecast/models/modeltypes/statistical/.
3. Brid, Rajesh S. “Decision Trees - A Simple Way to Visualize a Decision.” Medium, GreyAtom, 26 Oct. 2018, medium.com/greyatom/decision-trees-a-simple-way-to-visualize-a-decision-dc506a403ae
4. Upadhyay, Akshay. “Formula to Find Bearing or Heading Angle between Two Points: Latitude Longitude.” GIS MAP INFO, 31 May 2019, www.igismap.com/formula-to-find-bearing-or-heading-angle-between-two-points-latitude-longitude/.
5. Jj. “MAE and RMSE - Which Metric Is Better?” Medium, Human in a Machine World, 23 Mar. 2016, medium.com/human-in-a-machine-world/mae-and-rmse-which-metric-is-better-e60ac3bde13d.
6. Morde, Vishal. “XGBoost Algorithm: Long May She Reign!” Medium, Towards Data Science, 8 Apr. 2019, towardsdatascience.com/https-medium-com-vishalmorde-xgboost-algorithm-long-she-may-rein-edd9f99be63d.
7. Mosier, Jeff. “Texas Plants Spewed 8 Million Pounds of Air Pollutants as Hurricane Harvey Hit.” Phys.org, 17 Aug. 2018, phys.org/news/2018-08-texas-spewed-million-pounds-air.html.
8. “Hurricanes.” National Oceanic and Atmospheric Administration, www.noaa.gov/education/resource-collections/weather-atmosphere-education-resources/hurricanes.