Analytical study of climatic conditions of Shannon Airport

Raghvendra Pratap Singh
8 min read · Sep 23, 2020

Temperature, rain and sunshine are the primary factors that affect the climatic conditions of any geographical location. This analysis was conducted on climatic data collected at Shannon Airport. Four factors were taken into account: maximum temperature, minimum temperature, rain and sunshine. According to the data, maximum temperature ranged from -6 °C to 32 °C, whereas minimum temperature ranged from -11.4 °C to 18.9 °C. Similarly, rain ranged from 0 mm to 52.5 mm and sunshine from 0 hours to 15.8 hours. The 95% confidence intervals of all four variables were then calculated to find the range of values.

Sunshine and maximum temperature were correlated, with a Pearson correlation coefficient of r = 0.25. Similarly, r for rain and minimum temperature fell between 0 and 0.25. No correlation was found between minimum temperature and sunshine, whereas rain and sunshine were found to be inversely correlated.

After applying ANOVA, a predictive analysis using linear regression was also conducted for rain and sunshine. With a standard split of 80% of the data for training and 20% for testing, RMSE values of 1.44 and 2.8 were obtained for rain and sunshine, respectively. After running multiple tests with different splits of test and training data, we obtained 1.26 and 1.17 as the RMSE values for rain and sunshine, respectively.

Airports have been in operation since the start of the 20th century [2] and, with commercialization, they have become an integral part of the world's thriving economy. Nowadays, myriad airplanes take off and land at airports across the world. Many factors affect air traffic, including rainfall, sunshine and temperature. This report takes the data collected at Shannon Airport, located on the west coast of Ireland, and analyses the variations in temperature, sunshine and rain over the period from 1st May 2005 to 31st October 2019.

Moreover, this report tries to find the relation between the factors taken into account and performs a predictive analysis on the available data set.

Dataset

This data set was collected from the website of the Irish Government. It contains daily data for Shannon Airport from 1st May 2005 to 31st October 2019. The data covers climatic factors such as maximum temperature, minimum temperature, precipitation amount and duration of sunshine, with 5297 rows and 25 columns.
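As a quick sanity check on the row count, the number of calendar days in the stated period can be computed with Python's standard library. This is only a sketch: it assumes one row per day with no gaps, which the article does not state explicitly.

```python
from datetime import date

# Period covered by the data set, as stated in the article.
start = date(2005, 5, 1)
end = date(2019, 10, 31)

# Both endpoints are included, hence the + 1.
n_days = (end - start).days + 1
print(n_days)  # 5297
```

The result matches the 5297 rows reported, which is consistent with one record per day.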

Exploratory analysis

The data set contains factors affecting the climate, such as maximum temperature, minimum temperature, precipitation of rain (in mm), CBL pressure, sunshine, global radiation and evaporation. For our study, we have focussed on 4 factors: maximum temperature (in °C), minimum temperature (in °C), precipitation of rain (in mm) and duration of sunshine (in hours).

The mean maximum temperature was 14.157 °C, with 32 °C as the highest value and -6 °C as the lowest. The mean minimum temperature was 7.355 °C, with 18.9 °C as the highest and -11.4 °C as the lowest. Similarly, the mean precipitation of rain was 2.838 mm, with 52.5 mm as the highest and 0 mm as the lowest, and the mean duration of sunshine was 3.835 hours, with 15.8 hours as the maximum and 0 hours as the minimum.

The modes of sunshine and rain were both 0, whereas those of maximum and minimum temperature were 12.7 °C and 9.6 °C, respectively.

The median values of sunshine and rain were 2.8 hours and 0.7 mm, respectively, whereas those of maximum and minimum temperature were 14.2 °C and 7.6 °C, respectively.
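The summary statistics above can be reproduced along the following lines. This is a minimal sketch using only the standard library; the `sun_hours` values are made up for illustration and are not the Shannon readings, and `summarise` is a hypothetical helper name.

```python
import statistics

# Hypothetical daily sunshine readings in hours (not the Shannon data).
sun_hours = [0.0, 0.0, 0.0, 1.2, 2.8, 3.5, 6.1, 9.4, 12.0]

def summarise(values):
    """Return the mean, median, mode, minimum and maximum of a sample."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "min": min(values),
        "max": max(values),
    }

summary = summarise(sun_hours)
print(summary)
```

In this toy sample, as in the reported sunshine figures, the mode is below the median and the median is below the mean.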

Plotting this data gave the following results:

The plots of sunshine and rain are clearly right-skewed, whereas the plots of maximum and minimum temperature form a bell curve with a few outliers. Because of the right skew, the mean lies to the right of the median, and both lie to the right of the peak (the mode). A similar inference can be drawn from the box plots below:

To understand the variation, we replaced the 0-hour values of sunshine with the median value, i.e., 2.8 hours. This made little difference to the skewness, and the result was as follows:
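The effect of this replacement on skewness can be checked numerically as well as visually. The sketch below is an illustration only: it uses made-up sunshine values and a hypothetical `skewness` helper that computes the population skewness (third central moment divided by the variance raised to the power 1.5). A positive result indicates a right skew.

```python
import statistics

def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    mean = statistics.fmean(values)
    n = len(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical sunshine durations in hours (not the Shannon data).
sun_hours = [0.0, 0.0, 0.0, 1.2, 2.8, 3.5, 6.1, 9.4, 12.0]

median = statistics.median(sun_hours)
# Replace the zero-sunshine days with the median of the series.
replaced = [median if v == 0 else v for v in sun_hours]

skew_before = skewness(sun_hours)
skew_after = skewness(replaced)
print(skew_before, skew_after)
```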

Later, to find the range in which most of the values were falling, we calculated the 95% confidence interval of each variable and obtained the following table:
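The article does not say how the interval was computed, so the following is only one plausible sketch: a 95% confidence interval for the mean using the normal approximation, which is reasonable for a sample of several thousand rows (for small samples a t-based interval would be more appropriate). The function name `mean_confidence_interval` and the sample values are assumptions. Note that an interval of this kind describes the uncertainty of the mean rather than the spread of individual daily values.

```python
import math
import statistics

def mean_confidence_interval(values, level=0.95):
    """Normal-approximation confidence interval for the sample mean."""
    n = len(values)
    mean = statistics.mean(values)
    # Standard error of the mean from the sample standard deviation.
    sem = statistics.stdev(values) / math.sqrt(n)
    # Two-sided critical value, about 1.96 for a 95% level.
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * sem, mean + z * sem

# Hypothetical daily maximum temperatures in °C (not the Shannon data).
maxtp = [9.5, 12.7, 14.2, 11.0, 16.8, 18.1, 13.3, 15.6, 10.4, 14.9]
low, high = mean_confidence_interval(maxtp)
print(low, high)
```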

After this, a Pearson test was conducted to find the correlation between the four factors.
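For reference, the Pearson coefficient between two series can be computed as in the sketch below. The helper `pearson_r` and the sample values are illustrative assumptions, not the notebook's code or the Shannon data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical daily values (not the Shannon data).
sun = [0.0, 1.5, 3.0, 6.0, 9.0]
rain = [8.0, 5.0, 2.0, 0.5, 0.0]
maxtp = [9.0, 12.0, 11.0, 16.0, 19.0]

print(pearson_r(sun, maxtp))  # positive: more sun goes with warmer days here
print(pearson_r(sun, rain))   # negative: more sun goes with less rain here
```

A value near 0 suggests no linear relationship, while values towards +1 or -1 suggest a stronger direct or inverse relationship.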

Because the correlation coefficients did not provide enough evidence of a relationship, we set the null hypothesis of ANOVA as H0: μ1 = μ2 = μ3 = μ4, where μ1, μ2, μ3 and μ4 are the means of maxtp, mintp, rain and sunshine, respectively. We used the one-way ANOVA method to test the hypothesis and made the following observations:

With these values, we inferred that none of the factors are directly related to one another in the available data, and the null hypothesis was rejected in favour of the alternative hypothesis.
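The F statistic behind a one-way ANOVA can be sketched as follows. This is a hand-rolled illustration with made-up samples and a hypothetical function name `one_way_anova_f`; a library routine such as `scipy.stats.f_oneway` would also return the p-value.

```python
def one_way_anova_f(*groups):
    """Return the F statistic of a one-way ANOVA across the given groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical samples of the four variables (not the Shannon data).
maxtp = [12.7, 14.2, 16.0, 11.5, 15.1]
mintp = [7.6, 9.6, 5.2, 8.8, 6.4]
rain = [0.0, 0.7, 5.3, 0.0, 2.1]
sun = [0.0, 2.8, 6.5, 0.0, 4.4]

f_stat = one_way_anova_f(maxtp, mintp, rain, sun)
print(f_stat)
```

A large F value indicates that at least one group mean differs from the others, which is what rejecting H0 amounts to.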

Finally, for the predictive analysis, we used linear regression and split our data in the ratio of 80% (for training) and 20% (for testing). We intended to predict the values of sunshine and rain, each from the values of the remaining three factors. For sunshine, we got a root mean square error (RMSE) of 2.8, whereas for rain we got an RMSE of 1.44. Multiple tests were conducted for rain by changing the share of the test set, and the best RMSE was recorded as 1.26 when 95% of the data was set aside for testing. Moreover, when we tried to predict the values of sunshine without any training set, i.e., with 100% of the data used as the test set, we found the best RMSE to be 1.17.
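The workflow described above (fit on 80% of the rows, score on the remaining 20% with RMSE) can be sketched without any third-party library. Everything here is an assumption for illustration: the data is synthetic, and `fit_linear`, `predict` and `rmse` are hypothetical helpers that solve ordinary least squares through the normal equations. In practice a library such as scikit-learn (`LinearRegression`, `train_test_split`) would usually do this work.

```python
import math
import random

def fit_linear(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    rows = [[1.0] + list(r) for r in X]  # prepend an intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for i in range(k):
            if i != c:
                factor = A[i][c] / A[c][c]
                A[i] = [a - factor * b for a, b in zip(A[i], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]  # intercept first

def predict(coef, X):
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in X]

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Synthetic data: three predictors per day and a noisy linear target.
random.seed(0)
X = [[random.uniform(0, 25), random.uniform(-5, 15), random.uniform(0, 20)]
     for _ in range(200)]
y = [1.0 + 0.3 * a - 0.2 * b + 0.1 * c + random.gauss(0, 1) for a, b, c in X]

# 80% training, 20% testing (the rows are already in random order).
split = int(0.8 * len(X))
coef = fit_linear(X[:split], y[:split])
test_rmse = rmse(y[split:], predict(coef, X[split:]))
print(round(test_rmse, 2))
```

RMSE is expressed in the units of the target, so an RMSE of 2.8 for sunshine means a typical error of roughly 2.8 hours.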

Later, we selected the days on which both rain and sunshine were greater than their respective means; 390 days satisfied this condition. We also set four conditions:

1. Days when maxtp = -6 °C (lowest maximum temperature) and sunshine > mean of sunshine.

2. Days when mintp = -11.4 °C (lowest minimum temperature) and sunshine > mean of sunshine.

3. Days when mintp < mean of minimum temperature and sunshine > mean of sunshine.

4. Days when maxtp < mean of maximum temperature and sunshine > mean of sunshine.

The results of the above are explained in the Results and findings section.
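The day-level filters listed above can be expressed as simple counts. The sketch below uses a handful of made-up records; the keys `maxtp` and `mintp` follow the names used in this article, while `rain`, `sun` and the helper `count` are assumed names.

```python
import statistics

# Hypothetical daily records (not the Shannon data).
days = [
    {"maxtp": 15.0, "mintp": 8.0, "rain": 5.0, "sun": 6.0},
    {"maxtp": 10.0, "mintp": 2.0, "rain": 0.0, "sun": 7.0},
    {"maxtp": 20.0, "mintp": 12.0, "rain": 1.0, "sun": 1.0},
    {"maxtp": 5.0, "mintp": -1.0, "rain": 0.0, "sun": 0.0},
    {"maxtp": 18.0, "mintp": 9.0, "rain": 9.0, "sun": 1.0},
]

mean = {key: statistics.mean(d[key] for d in days) for key in days[0]}
lowest_maxtp = min(d["maxtp"] for d in days)
lowest_mintp = min(d["mintp"] for d in days)

def count(condition):
    """Number of days for which the condition holds."""
    return sum(1 for d in days if condition(d))

# Days with both rain and sunshine above their respective means.
wet_and_sunny = count(lambda d: d["rain"] > mean["rain"] and d["sun"] > mean["sun"])

# Conditions 1 to 4 from the list above.
cond1 = count(lambda d: d["maxtp"] == lowest_maxtp and d["sun"] > mean["sun"])
cond2 = count(lambda d: d["mintp"] == lowest_mintp and d["sun"] > mean["sun"])
cond3 = count(lambda d: d["mintp"] < mean["mintp"] and d["sun"] > mean["sun"])
cond4 = count(lambda d: d["maxtp"] < mean["maxtp"] and d["sun"] > mean["sun"])

print(wet_and_sunny, cond1, cond2, cond3, cond4)
```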

Research questions

1. How are temperature, rain and sunshine related to each other?

2. Were there any days on which both rain and sunshine were greater than their respective means?

3. At the lowest temperatures, was there any day on which the duration of sunshine was greater than its mean?

Methods used and why

To find the average value of each variable, we calculated its mean. Because of the large difference between the highest and lowest values, the mode was also calculated for all four variables, and we found that 0 hours was the most frequently recorded duration of sunshine. To understand the variation in the data better, we also calculated the median and used it to replace the zero values. Later, we calculated the 95% confidence interval of each variable.

To answer our first research question and investigate the relationship between the variables, correlation and ANOVA were performed. Finally, a predictive analysis was done using linear regression to determine the patterns and predict future trends.

Results and findings

To answer the first research question, we conducted the Pearson test (as we had data on an interval scale, like temperature) and ANOVA. The Pearson test showed a relationship among the four variables to some extent, but with ANOVA our null hypothesis was rejected.

The calculations for our second research question gave a result of 390 days out of 5297. This means that on 7.36% of days, rain and sunshine were both greater than their respective means.

For our third research question, we did not find any days for condition 1 or condition 2, but condition 3 gave 1060 days, which means about 20% of days satisfied this condition. Similarly, 808 days satisfied condition 4, which is 15.25% of days.
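The percentages quoted for the second and third research questions follow directly from the day counts and the 5297 rows of the data set, as this short check shows (the dictionary labels are just descriptive names chosen here):

```python
total_days = 5297  # rows in the data set

counts = {
    "rain and sun above mean": 390,
    "condition 3": 1060,
    "condition 4": 808,
}

# Share of all days, in percent, rounded to two decimals.
shares = {name: round(100 * n / total_days, 2) for name, n in counts.items()}
print(shares)
```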

Finally, for the predictive analysis, since the RMSE values were far from 0.5, we identified the following possible reasons:

· The training and test splits of the data had little similarity.

· Only four factors were taken into account, and these depend on other factors such as geographical location, pressure and evaporation.

· The volume of data is low for prediction.

Conclusions

Climatic conditions depend on assorted factors; temperature, sunshine and rain are a few of them. According to the results obtained for these factors, we can say that sunshine depends on the maximum temperature on any given day, but it does not depend on the minimum temperature or rain. As for rain, there is a slight dependency on minimum temperature, but no dependency on maximum temperature or sunshine was found using the Pearson test. Moreover, if we had taken other factors into account along with temperature, rain and sunshine, the results produced above could have been slightly better.

To get my IPython notebook, please follow this link on Kaggle!

I have shared my own experience in this article. Please share your thoughts if you find anything incorrect here.

Twitter: @MrTomarOfficial

LinkedIn: https://ie.linkedin.com/in/raghvendra-pratap-singh-tomar
