Predicting the Trajectory of Atlantic Hurricanes with a Statistical Model
Thomas Varley • José Koluder-Ramírez • Lucas Pham
Abstract
A hurricane is a tropical cyclone with maximum wind speeds greater than 74 miles per hour. Hurricanes not only cause billions of dollars in damage and numerous fatalities, but also have a significant impact on the environment through pollution. During these catastrophic storms, accurate knowledge of whether and when the storm will affect a given area is incredibly important for preparations that minimize property loss, damage, and environmental effects. Using historical hurricane data provided by the National Hurricane Center, statistical models were created to accurately assess the trajectory of a hurricane once a storm system has formed. A multiple linear regression model, a regression-based decision tree, and an XGBoost model were trained to compare different approaches to predicting hurricane trajectory. The input to each model consisted of two location metrics, either longitude and latitude or distance and bearing, as well as two intensity metrics, maximum wind speed and maximum wind pressure. An important hyperparameter was the number of previous six-hour timestamps included to predict the location at the next timestamp. After testing differing numbers of timestamps, the optimal model found was the multiple linear regression model that used four six-hour timestamps to predict the location and intensity of the storm at the next six-hour timestamp. Normalizing the inputs to this model further increased the accuracy of its predictions. The multiple linear regression model also remained accurate when using its own predictions as inputs to forecast more than one six-hour timestamp into the future.
Introduction
Hurricanes are rapidly-rotating storm systems that often originate in the Atlantic Ocean
and contain a low-pressure center, a spiral arrangement of extreme thunderstorms, and winds
exceeding 74 miles per hour (8). After forming over the Atlantic Ocean, hurricanes typically
travel north, where they make landfall over North America or the Caribbean. When these
hurricanes reach land, the intense wind and rainfall can be catastrophic, often spurring severe
economic instability, claiming the lives of numerous individuals, and causing irreparable damage
to the environment through the destruction of infrastructure that stores toxic materials (7). As
such, it is incredibly important to be able to accurately predict where a hurricane will travel after
its formation, so that people can evacuate the likely path and precautions can be taken to protect
any infrastructure that would be in danger.
The National Weather Service (NWS) and the National Hurricane Center (NHC) have a
variety of models that can be utilized in order to predict the future path of hurricanes. These
models generally fall into one of two categories: dynamical models or statistical models (2).
Dynamical models are incredibly complex, as they account for the physical equations of motion
in the atmosphere and utilize this information to predict the future path of the hurricane. While
dynamical models are oftentimes the most accurate predictors of a hurricane's path, their
complexity can lead to exceptionally long runtimes when generating those predictions (1). Statistical
models are much simpler and much faster than their dynamical counterpart, as they utilize
statistical formulas to generate their predictions through the use of historical hurricane data (2).
This paper focuses on the use of several statistical models to predict the future path of a
hurricane.
Using data provided by the National Hurricane Center, a series of statistical models were
created to predict the future path of a hurricane given initial information on the hurricane and
historical hurricane data. A total of three statistical models were trained in order to allow for
comparison when determining the most accurate model to predict the future path of a hurricane:
a linear regression model, an XGBOOST model, and a regression-based decision tree. For each
of the three models, an input consists of a series of location and intensity metrics, which the
models then used to generate their prediction for the path of the hurricane. For the location
metrics, the longitude and latitude of the hurricanes were first applied to each of the three models to generate predictions; the location metrics were then switched to distance and bearing in order to compare which set of predictions was more accurate. For the intensity metrics, the
maximum wind speed and maximum wind pressure were applied to each of the three models.
Beyond simply altering which metrics were input for each model, it was also necessary to
select the optimal number of six-hour timestamps that would be utilized by each statistical model to
predict the next location of a hurricane. This number of timestamps served as an important
hyperparameter in the learning process, as it impacted the effectiveness of the three statistical
models. Upon calculating the mean absolute error for the three statistical models, each using a
variety of different numbers of six-hour timestamps, it was determined that the most effective
statistical model for predicting the path of a hurricane was a multiple linear regression model that
had the selected hyperparameter of four six-hour timestamps, as this was the least complex
model amongst multiple models that achieved similarly small mean absolute errors. Through
normalization, this model was able to further reduce mean absolute error and improve in
predictive accuracy. This result demonstrates that a more complex model does not necessarily
lead to a more accurate model, as multiple linear regression models that utilized more than four
six-hour timestamps to make their predictions often suffered from overfitting and were more
inaccurate as a result.
Methods
The data from the National Hurricane Center came in the form of a text file. The given data form a (50911 x 23) matrix: 50911 observations of 23 different variables describing 1864 total hurricanes that occurred in the Atlantic Ocean from 1851 to 2018. The file was not a CSV
and could not be read traditionally with built-in Pandas functions. For each hurricane, the
identifying information was listed such as hurricane name, year, and unique identifying codes.
Then, for each hurricane, there was a variable number of records, one per six-hour timestamp. Each timestamp
consisted of the time and date and the atmospheric measurements. A PDF was provided that gave
a detailed explanation of how to successfully read the text file. To acquire the data, each line of
the file was read and relevant information was extracted using the specific indexing information
provided in the PDF.
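The parsing step can be sketched as follows. This is a minimal illustration only: the column offsets, field names, and file name below are hypothetical placeholders, since the actual offsets come from the format PDF distributed with the data.

```python
# Minimal sketch of the line-by-line parsing step. The column offsets and
# file name are hypothetical placeholders; the real offsets come from the
# format PDF provided with the NHC data.
def parse_timestamp_line(line):
    """Extract one six-hour observation from a fixed-position record."""
    return {
        "date": line[0:8],               # placeholder offsets, not the real spec
        "time": line[10:14],
        "latitude": line[23:27],
        "longitude": line[30:36],
        "max_wind_speed": line[38:41],
        "max_pressure": line[43:47],
    }

with open("hurricanes.txt") as f:        # file name assumed
    lines = f.readlines()
# Header lines identifying each hurricane would be detected and handled
# separately before calling parse_timestamp_line on the data lines.
```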
After reading in all the lines and using the associated substring to extract the specific data
points, the information was stored as a list of hurricane objects. Each object consisted of
properties for their name, year, and other hurricane specific identifying information.
Additionally, each hurricane object consisted of a Pandas DataFrame that held the timestamp
specific data for each timestamp within a hurricane.
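A sketch of this container is below; the attribute and column names are assumptions for illustration, not the authors' exact class definition.

```python
import pandas as pd

class Hurricane:
    """One storm: identifying information plus its six-hour timestamps."""
    def __init__(self, storm_id, name, year):
        self.storm_id = storm_id
        self.name = name
        self.year = year
        # One row per six-hour timestamp.
        self.timestamps = pd.DataFrame(
            columns=["latitude", "longitude", "max_wind_speed", "max_pressure"]
        )
```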
To create a model, it first had to be determined which features were relevant. It was then
decided to include the latitude and longitude, which represent the location of the hurricane at a
timestamp as well as the maximum wind speed and pressure to represent the intensity of a
hurricane at a timestamp. The other features included in the data provided by the National
Hurricane Center were nearly all missing for hurricanes before 2004 which represented a
significant portion of the data. Thus, those features were not used in the models. To determine
the optimal number of timestamps included, models were tested using differing numbers of
timestamps from 3 timestamps to 10 timestamps of all 4 input variables. Thus, the input size
ranged from (3,4) to (10,4). To organize the list of hurricane objects obtained by extracting the
data from the text file into input and output data, hurricane objects that had fewer observations
than the required window length were discarded. Then, hurricanes were either split into the
training or testing set using scikit-learn's train_test_split, assigning 80% of the hurricanes as training
hurricanes and the rest as test hurricanes. For the training hurricanes and the test hurricanes
separately, and for each window length, inputs were created by selecting every run of consecutive timestamps whose length equaled the window length, with the next timestamp becoming the output of that input-output pair. For example, a hurricane that recorded 10 timestamps would be
able to provide 5 input and output pairs using a window length of 5 where each input had
dimension (5,4) and each output had dimension (1,4). As all models required a two-dimensional input, and the combined training samples formed a three-dimensional array, each input was flattened so that a single input had dimension (1, window length * 4). Additionally, as the training inputs were created, a list of the actual timestamp lines used to make the inputs and outputs, without repeats, was maintained. For example, with a window length of 5, a hurricane with only 10 timestamps would contribute its first 9 timestamps to this non-repeated input list.
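The windowing just described might look like the following sketch, where `hurricanes` is the assumed list of parsed hurricane objects and the feature names follow the earlier discussion:

```python
import numpy as np
from sklearn.model_selection import train_test_split

FEATURES = ["latitude", "longitude", "max_wind_speed", "max_pressure"]

def make_windows(hurricane, window):
    """Build flattened (input, output) pairs from one hurricane's timestamps."""
    values = hurricane.timestamps[FEATURES].to_numpy()
    X, y = [], []
    for start in range(len(values) - window):
        X.append(values[start:start + window].reshape(-1))  # (window, 4) -> (window*4,)
        y.append(values[start + window])                    # next timestamp, shape (4,)
    return np.array(X), np.array(y)

# 80/20 split at the hurricane level, so no storm spans both sets.
train_hurricanes, test_hurricanes = train_test_split(
    hurricanes, train_size=0.8, random_state=0
)
```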
Three variations of the data, applied to all three model types, were investigated first: using the data as is, normalizing the data, and replacing the latitude and longitude inputs with the distance and bearing between consecutive points. Each variation was hypothesized to have an impact on the performance of the models. Once the data had been organized into training inputs, training outputs, test inputs, test outputs, and the list of training timestamps without repeats, the training inputs and outputs could be used to train each model, and the test inputs and outputs to evaluate each model's performance by mean absolute error.

To see the effect of normalizing the data, the scikit-learn StandardScaler class was used. One scaler was fit on the list of training inputs without repeats and another on the test inputs to determine the mean and standard deviation of each feature within the training set and testing set, respectively. Once these means and standard deviations were found, the normalization was applied to the training and testing input features and to the training outputs. The normalized training inputs and outputs were used to train each of the three models. The normalized test inputs were then used to create predictions from the trained model; each prediction was transformed back to the original scale, and the mean absolute error was calculated against the test outputs.

To investigate using distance and bearing in place of longitude and latitude, the distance in kilometers between two points on the Earth was calculated with a built-in geopy function, and the bearing was estimated from the two coordinates. To convert an input with longitude and latitude values to distance and bearing values, the distance and bearing between consecutive timestamp locations were calculated. The later timestamp's maximum wind speed and maximum wind pressure were used as its intensity metrics. Thus, an input consisting of 5 timestamps with longitude and latitude coordinates creates an input of 4 timestamps or, in general, one fewer timestamp than the original input's window length. The output is the distance and bearing from the original input's last timestamp location to the original output's timestamp location, and the output's maximum wind speed and maximum wind pressure are the same as the original output's. The three models are then trained and tested in the same manner as the non-normalized longitude- and latitude-based models.
In order to assess a model’s performance on predicting timestamps further than one in the
future using previous predictions as inputs, the models were first created using the normalized
longitude and latitude method with window lengths from 3 to 10 timestamps. Then, to determine a model's accuracy in predicting a given number of timestamps into the future, a prediction was first made using the initial window of inputs. All inputs except the first timestamp, together with the new prediction, were then used to predict the second timestamp, and so on until the desired number of timestamps into the future was reached. At that point, the prediction and the actual output values were recorded. Using all the test hurricanes that had enough timestamps to carry the prediction the necessary number of timestamps into the future, the mean absolute error was recorded for each number of timestamps in the future.
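A sketch of this roll-forward procedure, assuming a model already trained on flattened windows as described above:

```python
import numpy as np

def predict_steps_ahead(model, first_window, steps):
    """Predict `steps` timestamps into the future, feeding each prediction
    back in as the newest timestamp.

    first_window: array of shape (window, 4) holding the initial timestamps.
    """
    window = first_window.copy()
    for _ in range(steps):
        pred = model.predict(window.reshape(1, -1))[0]  # next timestamp, shape (4,)
        window = np.vstack([window[1:], pred])          # drop oldest, append prediction
    return pred  # the prediction `steps` timestamps into the future
```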
Figure 1: The project’s overall workflow
Standard Score Normalization:
Standard Score Normalization (or Z-scores) linearly transforms data values to have a mean of zero and a standard deviation of one. The Z-score of each value is equal to the difference between the value and the sample mean, divided by the sample standard deviation:

Z_i = (x_i − x̄) / s

This formula is used to normalize longitude, latitude, wind speed, and wind pressure.
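In practice this is the role of scikit-learn's StandardScaler; the sketch below assumes the window matrices train_X, train_y, test_X and an untrained model from the Methods already exist.

```python
from sklearn.preprocessing import StandardScaler

# Assumed variables: train_X, train_y, test_X, model. (The Methods fit a
# second scaler on the test inputs; a single input scaler is shown here
# for brevity.)
x_scaler = StandardScaler().fit(train_X)   # learns per-feature mean and std
y_scaler = StandardScaler().fit(train_y)

model.fit(x_scaler.transform(train_X), y_scaler.transform(train_y))

# Predictions are mapped back to the original units before computing MAE.
pred = y_scaler.inverse_transform(model.predict(x_scaler.transform(test_X)))
```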
Mean Absolute Error
In this paper, Mean Absolute Error (MAE) is used to evaluate the correctness of predictions for different window lengths, instead of the more commonly used Mean Square Error (MSE) or Root Mean Square Error (RMSE). In this case, the errors are the absolute differences |y_i − x_i| between the actual values y_i in the testing set and the predicted values x_i derived from using the training data as predictors. By calculating the mean of those differences (dividing the sum of all absolute differences by the number of observations n),

MAE = (1/n) Σ_i |y_i − x_i|,

the performance of the model can be assessed by how close the predictions are to the actual data. There are multiple reasons why MAE is a better evaluation metric than RMSE for this analysis. First, from an interpretation standpoint, MAE is beneficial due to its simplicity. Second, since the errors in this model are considerably small, it is more appropriate to use MAE instead of RMSE, as RMSE is more effective at penalizing large errors (5).
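For concreteness, the metric as used here is simply:

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error: (1/n) * sum of |y_i - x_i| over n observations."""
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))
```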
Multiple Linear Regression
Multiple Linear Regression is being used in this project as one of the three main fitting models
(along with Decision Tree and XGBoost). Multiple Linear Regression is a straightforward method that can be interpreted to assess which features are most important in predicting the outputs. Furthermore, throughout the course of this project, the decision to fit the multiple linear regression with different window lengths (i.e., different numbers of input variables) was due to the expectation of a linear correlation between variables such as wind speed, wind pressure, etc. and the location/path of the hurricane.
Bearing angle between two points: Longitude and Latitude
Bearing can be defined as direction or an angle, between the north-south line of earth or meridian
and the line connecting the target and the reference point (4). The given longitude and latitude
allow the distance between the points to be computed, but the bearing tells the direction in which the hurricane is moving.
Bearing from point A to point B can be calculated as:
β = atan2(X, Y),

where atan2(X, Y) is the function that returns the angle in the Euclidean plane, given in radians, between the positive Y-axis and the ray to the point (X, Y) ≠ (0, 0), and where X and Y are calculated as:

X = cos θ_B · sin ΔL
Y = cos θ_A · sin θ_B − sin θ_A · cos θ_B · cos ΔL

with θ_A, θ_B, and ΔL being the latitude of point A, the latitude of point B, and the difference between the two longitudes, respectively.
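Both quantities can be computed as in the following sketch, which uses geopy's geodesic distance and the atan2 formula above for the bearing:

```python
import math
from geopy.distance import geodesic

def distance_km(point_a, point_b):
    """Distance in kilometers between two (latitude, longitude) pairs."""
    return geodesic(point_a, point_b).kilometers

def bearing_deg(point_a, point_b):
    """Bearing from point A to point B via the atan2 formula above."""
    lat_a, lon_a = map(math.radians, point_a)
    lat_b, lon_b = map(math.radians, point_b)
    d_lon = lon_b - lon_a
    x = math.cos(lat_b) * math.sin(d_lon)
    y = (math.cos(lat_a) * math.sin(lat_b)
         - math.sin(lat_a) * math.cos(lat_b) * math.cos(d_lon))
    return math.degrees(math.atan2(x, y))
```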
Decision Tree Regression
When it comes to small-to-medium structured/tabular data, decision tree based algorithms are
considered best-in-class as of the time of this report (3). Therefore, this method of statistical
modeling is used as one out of three models to train the data. Decision tree learning uses a
decision tree (as a predictive model) to go from observations about an item (represented in the
branches) to conclusions about the item's target value (represented in the leaves). Based on specific conditions, the tree is split into further branches, creating more sub-trees as the tree's height increases. A main reason why both Decision Tree Regression and Multiple Linear Regression are used in this project is that, unlike linear models, Decision Tree Regression maps nonlinear relationships quite well. Therefore, having both models in the same study can help determine whether the relationship is linear or not.
XGBOOST
XGBoost is a decision-tree-based ensemble machine learning algorithm that uses a gradient boosting framework and is implemented in Python. XGBoost has been described as the top tree-based algorithm, built on decision trees, optimizing the gradient boosting algorithm through parallel processing, tree pruning, handling of missing values, and regularization to avoid overfitting and bias (6). Although XGBoost does not work best for all prediction problems, it is a powerful tool that should always be considered. Furthermore, since the Decision Tree is one of the three models used, XGBoost can serve as a point of comparison for the gap in predictive performance between the two.
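A sketch of the three-model comparison is below; default hyperparameters are shown, which are not necessarily the settings used in the study, and the window matrices train_X, train_y, test_X, test_y are assumed from the Methods.

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.multioutput import MultiOutputRegressor
from sklearn.tree import DecisionTreeRegressor
from xgboost import XGBRegressor

models = {
    "multiple linear regression": LinearRegression(),
    "decision tree": DecisionTreeRegressor(),
    # XGBRegressor predicts one target at a time, so wrap it for 4 outputs.
    "xgboost": MultiOutputRegressor(XGBRegressor()),
}
for name, model in models.items():
    model.fit(train_X, train_y)
    print(name, mean_absolute_error(test_y, model.predict(test_X)))
```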
Results and Discussion
Figure 2: Scatter plots depicting MAE per Window Length using various models
Top Left: Scatter plot depicting the mean absolute error of location features using all three
models across different window lengths. Top Right: Scatter plot depicting the mean absolute
error of all features using all three models across different window lengths. Middle Left: Scatter plot depicting the mean absolute error of location features using all three models across different window lengths with normalized input. Middle Right: Scatter plot depicting the mean
absolute error of all features using all three models across different window lengths with
normalized input. Bottom Left: Scatter plot depicting the mean absolute error of distance
features using all three models across different window lengths with distance and bearing.
Bottom Right: Scatter plot depicting the mean absolute error of all features using all three
models across different window lengths with distance and bearing.
The first objective of this investigation was to determine the optimal model for hurricane
path prediction, along with determining whether using distance or location metrics would
improve accuracy, and finding the optimal number of timestamps to predict the next immediate
timestamp’s location and intensity. As shown in Figure 2, using the longitude and latitude as
opposed to the distance and bearing produced more accurate results when predicting both the
trajectory of the hurricane and the combined trajectory and intensity. The distance based metrics
were not normalized, as opposed to the location based metrics, because the bearing is an angle
and the information would be lost by normalizing the angle. The distance based metrics may
have performed poorly because converting the location variables to distances between locations potentially loses information about where the hurricane actually is at a given timestamp. For example, hurricanes in the Gulf of Mexico may generally behave differently than
hurricanes in the middle of the Atlantic Ocean. That information is preserved when the longitude
and latitude are used, but when distance and bearing are used, the information about the specific
location of the hurricane is lost. Thus, there are significant errors in the models that used distance
and bearing instead of longitude and latitude.
In every graph in Figure 2 where longitude and latitude are the location metrics, the multiple linear regression produces the output with the least mean absolute error for both combinations of variables. Based upon the mean absolute error, both for the location features alone and for all variables combined, the optimal model is the multiple linear regression. The optimal number of timestamps used to predict the next timestamp is not as apparent. In the four graphs where longitude and latitude are used as the features, window lengths from three to ten timestamps perform approximately the same on the test hurricanes when measuring the mean absolute error of the latitude and longitude only. However, when the mean absolute error is calculated using the longitude, latitude, maximum wind speed, and maximum pressure, there are locally optimal window lengths at four and nine timestamps. In practice, using only four timestamps, or 24 hours, of data may be more accessible and accurate than using nine timestamps, or 54 hours, of data. Additionally, a model that uses four timestamps is significantly less complex than a model that uses nine: the four-timestamp model must learn only 17 parameters per output, while the nine-timestamp model must learn 37. Normalizing this model when using longitude and latitude brought a small decrease in the mean absolute error for nearly every window length. Thus, the optimal model is the normalized multiple linear regression model with a window length of four timestamps.
Figure 3: Top scatter plot depicts the mean absolute error of location variables of predicting a
given number of steps away from the initial window length input values. Bottom scatter plot
depicts the mean absolute error of all four features when predicting a given number of steps
away from the initial window length of input values.
In Figure 3, as the number of timestamps predicted beyond the original window of input timestamps increased, the accuracy of the multiple linear regression models decreased for all window lengths used, whether the mean absolute error was calculated using all four features or just the longitude and latitude features. However, for nearly every number of timestamps beyond the original input that was tested, a window length of four provided the most accurate model. A potential reason the four-timestamp model best predicts the trajectory of a hurricane, as opposed to models using more timestamps, is that only the most recent few six-hour timestamps are important in determining a hurricane's trajectory over the next timestamp; a more complex model that uses data from more than 24 hours (four timestamps) in the past adds noise, overfits to the training data, and thus hinders the model's performance on the test data.
Figure 4: Scatter plots depicting the predicted path of a hurricane in blue and the actual path of
a hurricane in red for three sample hurricanes using a multiple linear regression with a window
length of 4. Each model uses its previous predictions to predict the next target for all predictions
after the initial 4 timestamps are given.
Figure 4 shows the multiple linear regression model trained with a window length of four being used to predict test hurricanes: only the first four timestamps were given, and the model used its own predictions to predict the subsequent outputs. The general trend is that the simpler the path of the hurricane, the better the linear model is able to predict timestamps more than one timestamp into the future. The top right hurricane in Figure 4 is arguably the most linear of the three hurricanes. Similarly, the hurricane on the bottom is relatively simple and does not loop back in on itself, and the model is able to predict its path well. The linear model seems to predict the path of the hurricane with high accuracy even when the hurricane is more than 10 timestamps past the original four timestamps. However, another aspect of the prediction is the actual distance travelled. Sometimes a hurricane will travel quickly in one six-hour timestamp but slowly in another. The linear model fails to capture that varying distance and instead chooses a near-constant distance between consecutive points that minimizes the overall error. Thus, if a hurricane exhibits varying distances between points, the linear model exhibits worse performance that compounds as more predictions are made. This is confirmed by the values in Figure 5: the top right hurricane, despite appearing the most linear, performs worse than the bottom hurricane.

The trend that a simpler hurricane path is easier to predict also holds for the top left hurricane, which loops back in on itself; the linear model performs the worst on this hurricane. The linear model performing better on simpler hurricanes makes sense conceptually, as it is difficult for a linear model to learn patterns beyond what can be represented as a sum of a set number of linear parameters.
Hurricane Graph Location    Mean Absolute Error of Location Features
Top Left                    3.092
Top Right                   2.852
Bottom                      0.606
Figure 5: Table depicting the mean absolute error of the three hurricanes in Figure 4
Conclusion
During this investigation into determining the optimal statistical model of the three chosen models, and the necessary number of timestamps to include, to most accurately predict the trajectory of a hurricane, it was determined that a multiple linear regression model with a normalized input of the four previous six-hour timestamps of longitude, latitude, maximum wind speed, and maximum pressure was sufficient to accurately predict the trajectory of a hurricane, not only for the next timestamp, but also for numerous timestamps after the initial prediction, using its own predicted timestamps as inputs. However, the nonlinearity of hurricane motion indicates that a more complex model is necessary to further improve the accuracy of predicting a hurricane's trajectory. It is proposed that utilizing a more complex model such as a Recurrent Neural Network, or obtaining more detailed information about the hurricanes such as shorter timestamp intervals or additional features, would increase the accuracy of the trajectory prediction.
References
1. Alemany, Sheila, et al. “Predicting Hurricane Trajectories Using a Recurrent Neural Network.” ArXiv.org, 12 Sept. 2018, arxiv.org/abs/1802.02548.
2. “Statistical, Statistical-Dynamical, and Trajectory Models.” Hurricanes: Science and Society, www.hurricanescience.org/science/forecast/models/modeltypes/statistical/.
3. Brid, Rajesh S. “Decision Trees - A Simple Way to Visualize a Decision.” Medium, GreyAtom, 26 Oct. 2018, medium.com/greyatom/decision-trees-a-simple-way-to-visualize-a-decision-dc506a403ae
4. Upadhyay, Akshay. “Formula to Find Bearing or Heading Angle between Two Points: Latitude Longitude.” GIS MAP INFO, 31 May 2019, www.igismap.com/formula-to-find-bearing-or-heading-angle-between-two-points-latitude-longitude/.
5. Jj. “MAE and RMSE - Which Metric Is Better?” Medium, Human in a Machine World, 23 Mar. 2016, medium.com/human-in-a-machine-world/mae-and-rmse-which-metric-is-better-e60ac3bde13d.
6. Morde, Vishal. “XGBoost Algorithm: Long May She Reign!” Medium, Towards Data Science, 8 Apr. 2019, towardsdatascience.com/https-medium-com-vishalmorde-xgboost-algorithm-long-she-may-rein-edd9f99be63d.
7. Mosier, Jeff. “Texas Plants Spewed 8 Million Pounds of Air Pollutants as Hurricane Harvey Hit.” Phys.org, 17 Aug. 2018, phys.org/news/2018-08-texas-spewed-million-pounds-air.html.
8. “Hurricanes.” National Oceanic and Atmospheric Administration, www.noaa.gov/education/resource-collections/weather-atmosphere-education-resources/hurricanes.