Regression Modelling Estimation of Marine Diesel Generator Fuel Consumption and Emissions

This study aims to estimate the fuel consumption of marine diesel generators onboard. Objective technical specifications and operational data on the ship's power generating plants and port calls were collected from an oceangoing oil/chemical tanker and used to develop the mathematical model of the plant in the Python and MATLAB environment. The model consists of alternators, prime movers and load distributions of the ship’s power generating plant and provides information on fuel consumption in metric tons calculated based on hours of operation and specific fuel consumption data. Regression models have helped predict future fuel consumption for the plant and the optimal model for the dataset was identified by comparing four


INTRODUCTION
Trustworthy ship system fuel consumption (FC) estimation is relevant in economic and environmental terms (Prpić-Oršić and Faltinsen, 2012). Understanding fuel consumption dependencies increases the management efficiency of ships. FC systems can be optimized by using predictions. Future estimations are also helpful for assessing vessel and system conformity with the regulations of the International Maritime Organization (IMO). IMO regulations aim to lower shipping-related emissions, since maritime transportation is a major contributor to air pollution (IMO, 2014). In addition to setting emission limits, the IMO aims to improve the efficiency of marine vessels through instruments such as the Energy Efficiency Design Index (EEDI) and the Ship Energy Efficiency Management Plan (SEEMP). Since emissions depend on the quantity of fossil fuels consumed by ship systems, FC optimization is becoming increasingly important. Ship-owners use fuel estimations to comply with regulations and improve the operational efficiency of their vessels (Eide et al., 2011; Uyanık et al., 2020).
Marine diesel generators power the ship by using energy obtained from fossil fuel combustion. They consist of a diesel engine as the prime mover and a synchronous alternator for three-phase electricity generation. Power generating plants may have more than one marine diesel generator, depending on the ship's electrical load. The number of marine diesel generators in a plant can vary depending on the type of operation (McGeorge, 1995). Marine diesel generators are continuous fuel consumers that considerably contribute to air pollution. Ship electrical needs are particularly high during cargo transfer operations in ports, resulting in the release of pollutants (Styhre et al., 2017). Real-time assessment of marine diesel generator FC can indicate emission quantities and encourage improvements.
A number of studies over the past two decades have dealt with FC calculation and estimation onboard ships. For instance, Kesgin and Vardar (2001) computed emissions from ships in the Istanbul and Canakkale straits using automatic identification system (AIS) data. Miola and Ciuffo (2011) proposed an alternative approach to ship pollutant prediction and assessed the reliability of current techniques. Prpić-Oršić and Faltinsen (2012) proposed a method for estimating ship speed loss and related CO2 emissions for a container ship on the North Atlantic route. Winnes et al. (2015) computed GHG emissions using AIS and ship technical data, then analyzed GHG emission reduction strategies for ships in port areas. They analyzed three scenarios, namely "Alternative Fuel", "Ship Design" and "Operation", at the Port of Gothenburg. Tichavska and Tovar (2015) built a model based on AIS data and the Ship Traffic Emission Assessment Model (STEAM). Their model calculated the emissions from cruisers and ferries in the Port of Las Palmas. Bialystocki and Konovessis (2016) proposed a statistical approach to predicting a ship's FC and speed curves. BalBesikci et al. (2016) constructed an artificial neural network (ANN) to analyze the relationship between engine revolutions per minute (rpm) and outside factors using noon report data. Chang (2016) analyzed the relationship between carbon emission production and deadweight tonnage in shipping transportation. Leloup et al. (2016) built a ship propulsion system model involving kites and conducted a fuel estimation for the system. Styhre et al. (2017) developed a model that estimates GHG emissions from ship operations in different ports. Simonsen et al. (2018) constructed a model that predicts FC and energy usage of cruise vessels using AIS data and technical specifications of ships. Wang et al. (2018) estimated ship FC using Least Absolute Shrinkage and Selection Operator (LASSO) regression. Yang et al. (2019) built a genetic algorithm-based grey-box model to estimate a ship's FC using a crude oil tanker's operational data. Le et al. (2020) presented an ANN model to predict the FC of container ships in Korea. Liu and Duru (2020) proposed a probabilistic Bayesian prediction algorithm that forecasts ship emissions based on ship movements gathered from AIS data. Le et al. (2020) conducted a study that predicted the FC of ocean-going container ships using a regression model. The data were acquired from a large container terminal in Korea. Farag and Ölçer (2020) used ANN and multi-regression methods to predict ship power and FC. Uyanik et al. (2020) compared the performance of different regression algorithms with respect to ship fuel consumption assessment. Reis et al. (2020) developed and tested two feature-oriented models to estimate shipping CO2 emissions using an actual data set from a Ro-Pax ship. Li et al. (2020) conducted a study that aimed to optimize the exhaust emissions from a marine dual-fuel engine using the Response Surface Method. Zhu et al. (2021) introduced a joint model to estimate the fuel consumption of a passenger ship. Kim et al. (2021) created an ANN model which can estimate the FC of a container ship using operational data. Moreira et al. (2021) used an ANN method to estimate FC and ship speed by establishing the correlation between propulsion, weather inputs, ship speed and FC.
This overview of the literature shows that ANN is a popular strategy owing to its good performance with complex problems. The studies generally focus on propulsion and examine it from an environmental standpoint. The objective of this study is to enable the estimation of fuel consumption and emissions of marine diesel generators used on a 29,681 GT oil/chemical tanker. The ship's power generating plant is a constant emission producer that has not been studied in detail. The plant runs during both navigation and port operations. Particularly in port operations, the number of working generators increases due to higher load demands, resulting in higher air pollutant emissions. A detailed analysis of the plant can help determine exact pollutant quantities and consequently encourage plant innovations. This study provides an improved mathematical model that includes different plant operation modes and regression analysis based on the data obtained from the model's calculations. The model is based on real-time data and operations. The study is significant because the future estimations given focus on the plant's environmental impact. The remainder of the paper is divided into three sections. Section 2 explains the mathematical background, simulation modelling of a marine diesel generator plant and regression models. Section 3 gives the results, findings and comparison of these models, while Section 4 contains conclusions, discussions, and recommendations.

MATHEMATICAL BACKGROUND AND MODEL DESCRIPTION
This section gives the equations used in the ship power plant model, describes the logic and provides validation of the model. Evaluation metrics, regression algorithms and preferred choices are also mentioned in this section.

Ship Power Plant Simulation
Technical specifications, operational data, electrical load in different operating modes and cargo transfer information have been collected from a ship. Position and port of call data have been obtained from AIS. The data were collected between 06/12/2019 and 07/06/2021 and used to develop a mathematical model of the plant; the plant's fuel consumption over the same period was computed in Python and MATLAB. Various regression models were compared to find the best prediction algorithm for the data set.
Trans.marit.sci.2022; 01: 79-94
The prime mover uses heavy fuel oil (HFO) during navigation and marine gas oil (MGO) during harbor and port operations. Fuel specifications were obtained from ship fuel records. The manufacturer of the engine provided look-up tables that include brake-specific FC data for the prime mover in the numerical model. The torque and power transmitted to the synchronous generator were calculated using the manufacturer's stroke, bore and brake mean effective pressure (bmep) data. Eq. 1 is the displacement volume (V_d) calculation, where b is the bore and s the piston stroke. Eq. 2 gives the transmitted power (P) calculation. Reference generators had six cylinders (n), while the number of firing strokes (N) was calculated by dividing the number of revolutions by 120. η_t is transmission efficiency, set at 0.99 (Altosole et al., 2019; Altosole and Figari, 2011; Pulkrabeck, 2004).
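The prime mover calculation described above can be sketched as follows. This is only an illustration: the formulas V_d = (π/4)·b²·s and P = bmep·V_d·n·N·η_t are the standard textbook forms and are assumed here to correspond to Eq. 1 and Eq. 2, and the bore, stroke, bmep and speed values are hypothetical rather than taken from the ship's documentation.

```python
import math


def displacement_volume(bore_m, stroke_m):
    """Swept volume of one cylinder in m^3, assumed form of Eq. 1."""
    return math.pi / 4.0 * bore_m ** 2 * stroke_m


def transmitted_power_kw(bmep_pa, bore_m, stroke_m, rpm, cylinders=6, eta_t=0.99):
    """Power delivered to the alternator in kW, assumed form of Eq. 2."""
    # Firing strokes per second per cylinder: rpm / 120, as stated in the text
    # (consistent with one power stroke every two revolutions).
    firing_strokes = rpm / 120.0
    v_d = displacement_volume(bore_m, stroke_m)
    return bmep_pa * v_d * cylinders * firing_strokes * eta_t / 1000.0


# Hypothetical engine data, for illustration only.
power = transmitted_power_kw(bmep_pa=20e5, bore_m=0.21, stroke_m=0.32, rpm=900)
print(f"Transmitted power: {power:.1f} kW")
```

With this form, the transmitted power scales linearly with engine speed and bmep, which is why look-up tables indexed by load are sufficient to recover brake-specific FC.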
The synchronous alternator onboard test trial provided the data for the simulation. Eq. 3 demonstrates the phase voltage calculation using armature current (I_A), armature resistance (R_A), synchronous reactance (X_s) and induced voltage (E_A).
Terminal voltage (V_T) equals √3·V_Φ for Y-connected phases. Eq. 4 is the calculation of alternator input power (P_in). γ is the angle between E_A and I_A. Eq. 5 indicates synchronous generator output power that can be calculated using line or phase variables, where I_L is line current. Eq. 6 is the voltage drop (VD) percentage calculation, using no-load voltage (V_nl) and full-load voltage (V_fl). The higher the synchronous generator load, the lower the rpm of the prime mover. Eq. 7 is the calculation of the prime mover's speed drop (SD) percentage. The generator's no-load speed is n_nl, and full-load speed n_fl.

$$SD = \frac{n_{nl} - n_{fl}}{n_{fl}} \cdot 100 \quad (7)$$
Eq. 8 explains the relationship between generator power and frequency. The generator's no-load frequency is f_nl, the system's operating frequency is f_sys, and the slope of the speed-power curve is s_p, expressed in MW/Hz.
In a system with three equivalent generators functioning in parallel, the total power is equal to the sum of their power. The total power (P_tot) of the system is shown in Eq. 9 (Chapman, 2005; Krause et al., 2002; Hansen and Michalke, 2008).
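The power-frequency relationship and the parallel operation of the generators can be illustrated with a short sketch. It assumes the usual droop form P = s_p·(f_nl - f_sys) for Eq. 8 and the sum over running generators for Eq. 9; the no-load frequencies, slopes and load used below are hypothetical values, not the ship's data.

```python
def system_frequency(load_mw, f_nl, slopes):
    """Solve sum_i s_p_i * (f_nl_i - f_sys) = load for f_sys (Hz)."""
    weighted = sum(s * f for s, f in zip(slopes, f_nl))
    return (weighted - load_mw) / sum(slopes)


def generator_powers(f_sys, f_nl, slopes):
    """Power of each generator in MW, assumed form of Eq. 8."""
    return [s * (f - f_sys) for s, f in zip(slopes, f_nl)]


# Three equivalent generators in parallel (hypothetical settings).
f_nl = [61.0, 61.0, 61.0]      # no-load frequencies, Hz
slopes = [1.0, 1.0, 1.0]       # speed-power curve slopes, MW/Hz
load = 1.5                     # total system load, MW

f_sys = system_frequency(load, f_nl, slopes)
powers = generator_powers(f_sys, f_nl, slopes)
print(f_sys, powers, sum(powers))
```

Because the generators are equivalent, the load is shared equally; unequal no-load frequency settings or slopes would shift the share between the sets.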
The ship power plant simulation consists of three main parts: the prime mover, the synchronous generator and the required electrical load. The model compares the active power of generators, computed in the synchronous generator section, with load and current to determine the number of generators required for the given operating mode. Then, it ensures the operation of the ship's synchronous generators in a predetermined sequence, depending on the number of generators. The generator operation sequence can be determined at the beginning of the simulation and is initially set as 3, 1, 2. The generator sequence is updated at maintenance intervals obtained from the operation manual. In other words, the model is adjusted to ensure the overhauled generator is the last in the sequence. If a generator is in operation, the model fills the binary control array with value 1, otherwise with 0. The program next examines the control array to calculate the frequency, speed drop and slope of the running generators' power curve. Then it computes system frequency, power, load, and current requirements for each generator, utilizing the control array and the number of running generators. The computed load of each generator, power of the prime mover, armature currents, internally generated voltages, line voltages, voltage regulation percentages and synchronous power output help determine brake-specific FC under changing load conditions. FC and emissions produced by the generator, both in kg per hour, are calculated using brake-specific FC. Operation types and times have been established based on port call and operational records obtained from the ship. The simulation therefore computes total FC and emissions in metric tons. Figure 2 illustrates the logic of the marine diesel generator's mathematical model.
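The generator selection logic described above can be sketched roughly as follows. The function names, the rated power and the `max_loading` threshold are hypothetical and introduced only for illustration; the source states the 3, 1, 2 starting sequence and the binary control array but not the exact switching criterion.

```python
import math


def generators_required(load_kw, rated_kw, max_loading=0.85):
    """Smallest number of sets whose usable capacity covers the load.

    max_loading is an assumed loading limit, not a value from the paper.
    """
    usable_kw = rated_kw * max_loading
    return max(1, math.ceil(load_kw / usable_kw))


def control_array(load_kw, rated_kw, sequence=(3, 1, 2), max_loading=0.85):
    """Binary array indexed by generator number: 1 = running, 0 = stopped."""
    needed = min(generators_required(load_kw, rated_kw, max_loading), len(sequence))
    running = set(sequence[:needed])
    return [1 if gen in running else 0 for gen in sorted(sequence)]


def after_overhaul(sequence, overhauled):
    """Move the overhauled generator to the end of the starting sequence."""
    return tuple(g for g in sequence if g != overhauled) + (overhauled,)


array = control_array(load_kw=900.0, rated_kw=600.0)
print(array)                          # generators 3 and 1 run, generator 2 is idle
print(after_overhaul((3, 1, 2), 3))   # generator 3 becomes last in the sequence
```

The control array produced here is what the later steps would iterate over when computing frequency, load share and brake-specific FC for each running set.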
Emission calculations are based on fuel consumption computed by the model and emission factors taken from Kuzu et al. (2021) and Trozzi (2010). Eq. 10 is the emission calculation and Table 2 shows marine gas oil (MGO) and heavy fuel oil (HFO) emission factors.
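As a rough illustration of how fuel consumption and emissions follow from brake-specific FC, the sketch below assumes that Eq. 10 has the common form E = FC × EF. The emission factors in the example are placeholders chosen only to make the code run; they are not the Table 2 values, and the brake-specific FC, power and hours are likewise hypothetical.

```python
def fuel_consumption_t(bsfc_g_per_kwh, power_kw, hours):
    """Fuel burned in metric tons from brake-specific FC, power and running hours."""
    return bsfc_g_per_kwh * power_kw * hours / 1e6


def emissions_t(fuel_t, emission_factors):
    """Emissions in metric tons, assuming E = FC x EF with EF in t per t of fuel."""
    return {pollutant: fuel_t * ef for pollutant, ef in emission_factors.items()}


# Placeholder factors for illustration only (NOT the values from Table 2).
placeholder_factors = {"CO2": 3.2, "NOx": 0.06, "SO2": 0.01}

fuel = fuel_consumption_t(bsfc_g_per_kwh=200.0, power_kw=500.0, hours=10.0)
print(fuel, emissions_t(fuel, placeholder_factors))
```

In the model, the fuel type (HFO during navigation, MGO in harbor and port) would select which set of factors is applied for each operation mode.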

Performance Evaluation Model
Error rate is the discrepancy between the actual value and the model's output and can be used to assess the model's performance. The validity and performance of the marine diesel generator model have been assessed using mean absolute error (MAE) as a relevant, understandable and reliable method (Chai & Drexler, 2014; Willmott & Matsuura, 2005). MAE is a metric that evaluates the discrepancy between real and anticipated values without taking direction into account. Therefore, lower MAE indicates better prediction performance (Chai & Drexler, 2014). Regression algorithms have been evaluated using MAE, correlation score (R²) and root mean squared error (RMSE). Comparing three evaluation metrics improves assessment reliability and helps detect overfitting. R² is a well-known performance measurement metric that evaluates the strength of a relation in regression. It ranges from 0 to 1, with higher R² being indicative of better performance (Kasuya, 2019). RMSE is used to validate MAE for regression performance measurement purposes. Eq. 11, Eq. 12, and Eq. 13 are the equations for MAE, RMSE, and R², respectively.

$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|y_{p,i} - y_{t,i}\right| \quad (11)$$

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_{p,i} - y_{t,i}\right)^{2}} \quad (12)$$

$$R^{2} = 1 - \frac{\sum_{i=1}^{n}\left(y_{t,i} - y_{p,i}\right)^{2}}{\sum_{i=1}^{n}\left(y_{t,i} - \bar{y}\right)^{2}} \quad (13)$$

where n is the total number of data, y_p the calculated or predicted value, y_t the real or true value and ȳ the mean of the data (Chai & Drexler, 2014; Kasuya, 2019).
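The three metrics can be computed directly from their definitions. The following is a minimal NumPy sketch using the standard formulas; the sample arrays are made up for illustration and the function names are not from the source.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_pred - y_true)))


def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Made-up values for illustration.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 6.0]
print(mae(y_true, y_pred), rmse(y_true, y_pred), r2(y_true, y_pred))
```

For any data set RMSE is at least as large as MAE, which gives a quick sanity check when the two are reported side by side.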

Ship Power Plant Simulation Validation
Various performance measures can be used to validate the model. The performance criteria used here were both measured on the ship and computed with the mathematical model, so performance was tested by comparing calculated and real values.

Fuel Consumption Prediction
The mathematical model computed the plant's FC in the designated period. For future estimations, several regression techniques have been trained on the data. However, the data vary depending on the working hours of the generator. The data are therefore standardized to improve performance and reduce computation time for complicated methods. Standardization is a scaling method that yields a mean of zero and a standard deviation of one by first subtracting the mean of the data (x̄) from each data point (x) and then dividing by the standard deviation (σ). Eq. 14 explains the process (Trebuňa et al., 2014).

$$z = \frac{x - \bar{x}}{\sigma} \quad (14)$$

OLS is a type of generalized linear regression that can be used to model a single response variable recorded on at least an interval scale (Craven and Islam, 2011). OLS deals with the correlation between dependent and independent variables. The value of the dependent parameter is obtained from a linear combination of the independent parameters plus an error term (e). Eq. 15 shows the OLS formula.

$$y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n + e \quad (15)$$

where β is the regression coefficient, X the independent variable and y the estimated value (Pohlman and Leitner, 2003).
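Standardization followed by an OLS fit can be sketched with NumPy alone. The example below is an illustration under stated assumptions: the data are synthetic, a single predictor (working hours) is used, the 80/20 split reported in the paper is taken in time order, and scaling is applied before splitting. None of the helper names come from the source.

```python
import numpy as np


def standardize(x):
    """Return (x - mean) / std together with the mean and std used."""
    x = np.asarray(x, float)
    mean, std = x.mean(), x.std()
    return (x - mean) / std, mean, std


def ordered_split(x, y, train_fraction=0.8):
    """Split without shuffling so the test set is the most recent 20 %."""
    cut = int(len(x) * train_fraction)
    return x[:cut], x[cut:], y[:cut], y[cut:]


def fit_ols(x, y):
    """Least-squares estimate of (beta_0, beta_1) for y = beta_0 + beta_1 * x."""
    design = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


def predict(beta, x):
    return beta[0] + beta[1] * x


# Synthetic cumulative FC that grows linearly with working hours.
hours = np.arange(1000, dtype=float)
fc = 0.2 * hours + 5.0

z, mean, std = standardize(hours)
x_train, x_test, y_train, y_test = ordered_split(z, fc)
beta = fit_ols(x_train, y_train)
print(beta, np.max(np.abs(predict(beta, x_test) - y_test)))
```

Forecasting beyond the observed period then amounts to standardizing future working hours with the same mean and standard deviation and calling `predict`.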
ARIMA is a well-known statistical approach to predicting time-related data. ARIMA(p, d, q) can be defined as a linear combination of past values of y_t and e. Eq. 16 shows the formulation of the ARIMA.

$$y_t = \varphi_0 + \alpha_1 y_{t-1} + \alpha_2 y_{t-2} + \dots + \alpha_p y_{t-p} - m_1 e_{t-1} - m_2 e_{t-2} - \dots - m_q e_{t-q} + e_t \quad (16)$$

where p, d, and q are ARIMA orders, α and m estimated regression weights, and φ_0 the trend member (Mills, 2019). Optimal model orders for each data set are identified by computing the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) using an iterative algorithm. The algorithm iterated over every combination of p, q, d within the specified limits. The p and q range was 1-5, while the d range was 1-3. These limits depend on autocorrelation and partial autocorrelation plots. Algorithm results suggest that ARIMA (0,2,2) is the ideal model for the data.
SVR determines the acceptable error in the model, as well as the optimal data fitting line. The purpose of the objective function is to minimize coefficients and constraints. The approximation of the magnitude of the normal vector (||w||) is given in Eq. 17 (Awad and Khanna, 2015).
ANN is one of the deep learning algorithms and is widely used in both classification and regression problems. The Python trial results helped identify the number of layers, neurons, activation function, batch size, epochs and optimizer. The rectified linear unit (ReLU), which can be defined as y = max(0, x), performs best with the data compared to other activation functions. Since the data are free of complex patterns or large bias, more basic activation functions like ReLU can be applied and reduce the ANN training time (Dreyfus, 2005). The Adam optimizer is used for network creation due to its computational efficiency, easy implementation and suitability for large datasets (Kingma and Ba, 2014). The model uses 50 epochs and a batch size of 10. Figure 7 depicts the ANN structure.

RESULTS AND FINDINGS
The mathematical model calculated the FC of the marine diesel generator plant in different ship operation modes between December 6, 2019 and June 7, 2021. Fig 8 illustrates total FC and FC by operation modes in metric tons. Air pollutants from the operations were calculated using FC and the emission factors shown in Table 2. Table 3 gives emission production in each operation mode in metric tons. 65 % of plant operations occurred during navigation and the rest were in harbor and port areas, since the ship's routes in the period examined involved many oceangoing voyages. Even though cargo handling operations require the highest electrical load, they accounted for only 4 % of utilization time and had the lowest fuel consumption of all operations.
The simulation computed FC by operation mode and total operating hours. The data were transformed to an hourly basis to prepare them for regression analysis. Fig 10 shows hourly FC increments of the plant over the period of 13,268 hours. The figure demonstrates that there are linear relationships in the data, so the models' R² scores are high, as expected. Four regression models, from basic to complex, were fitted to the data. An attempt was made to use other linear regression algorithms, such as Ridge, Lasso and Elastic Net, but their parameter optimization ended with OLS. Figures 11, 12, 13, and 14 compare actual (test data) and predicted values and show the evaluation metrics and one-year predictions of each regression algorithm. As anticipated, R² scores were high for each model, with OLS achieving the highest R² of 0.9992. OLS had the lowest MAE and RMSE metrics of 3.932 and 2.935 respectively.
Figure 11 shows that OLS has the best curve overlap of all the methods applied. Lower MAE and RMSE values also support this. However, in spite of OLS being the fastest and the most straightforward method, towards the end, the predicted value line is higher than the actual value line, i.e. predicted values obtained are higher than those obtained by other techniques.
Figure 12. Comparison of test data and predicted values, evaluation metrics, and one-year prediction using ARIMA (0, 2, 2).
A similar phenomenon can be seen in Figure 12 for ARIMA (0,2,2). In spite of having the highest MAE, successful curve overlap can be observed. Nevertheless, in the second part of the data, its predicted value curve is lower than the actual value curve and thus yields lower future forecasts. In addition, ARIMA's running time is the second-highest after ANN. SVR's MAE and RMSE are particularly high at the beginning of the data set. In the second part of the data set, SVR gives a perfect overlap, which seemingly suggests better predictions; however, the MAE and RMSE values obtained are slightly higher compared to other regression techniques.
Even though it is a complex methodology, the computation time of the SVR model dramatically drops with data standardization.
ANN, on the other hand, has a high R² score, and its future predictions and other evaluation metrics are decent. Figure 14 shows the curve overlap, metrics and predictions obtained with ANN. ANN seems not to be the optimal choice for this data set. Its computation time is the highest by far, and it is better suited to more complex datasets with multiple inputs and uncertainty. For this particular dataset, OLS is preferable to other prediction methods owing to its superior performance metrics and computation time. One year after the end of the period observed, the plant's total FC predicted by the OLS model is 4,322,436 t. Five-year and ten-year predictions are 10,684.86 and 18,615,472 t, respectively. Table 4 shows emission predictions for these periods based on estimated data to highlight the environmental effect of the plant. The emission calculation of the forecasted FC is based on the assumption that the route of the ship will remain the same.

CONCLUSION AND DISCUSSION
The research examined the FC and emissions of a ship power plant through mathematical modelling. The numerical simulation calculated FC based on the data obtained from an oceangoing oil-chemical tanker. The study focused on marine diesel generators due to their major adverse environmental impact, especially in port and harbor areas. In addition, ships powered by two-stroke main engines use at least one diesel generator during navigation, resulting in continuous emissions by the plant. Model outputs have shown that between 6 December 2019 and 7 June 2021 the plant produced 8,592.38 t of CO2, 252.15 t of NOx, and 53.98 t of SO2. These values indicate substantial emissions even from a single ship plant. Even though IMO takes measures to reduce NOx and SO2, CO2 reduction seems impossible without abandoning fossil fuels. To highlight the issue more effectively, regression models were used to forecast future FC levels. Four different regression algorithms, suitable for the data, were trained and compared. The comparison identified OLS as the most suitable FC forecasting regression model for the current data. Although the other three algorithms performed successfully in terms of computation time, evaluation metrics and curve overlap quality, OLS is the optimal choice. One, five and ten-year forecasts obtained by OLS application to periods after 7 June 2021 showed that the plant's CO2 production capacity was 58,567,998 t. This quantity of emissions has a significant impact on air pollution, especially given that it was generated by a single ship's auxiliary engines. Though promising, green ports supplying ships with electricity only prevent air pollution in the urban area, and their development is still an ongoing process. Short-term onboard solutions include plant hybridization using alternative energy sources or waste heat energy. Their installation and usage can be faster and would help reduce emissions from marine diesel generators. Long-term solutions include the usage of alternative fuels and energy approaches onboard. However, they are still ongoing projects that need to be improved to be efficiently implemented on ships. Future studies will focus on battery additions to the plant and battery charging strategies due to their faster applicability.

Figure 2. The algorithm scheme of the mathematical model.
Figures 3.a, 3.b, and 3.c illustrate power-sharing based on total system load, as determined by parallel running tests on each generator. Generator 1 (Fig 3.a), Generator 2 (Fig 3.b), and Generator 3 (Fig 3.c) have MAEs of 0.0168, 0.0095 and 0.0102 respectively for this indicator, all of which are small error rates. For generators 1, 2 and 3, Fig 4.a, Fig 4.b and Fig 4.c show the comparison between the frequency decrease due to increasing load measured by governor tests and the frequency computed by the model. The MAEs of frequency calculations for generators 1, 2 and 3 are 0.0034, 0.0039, and 0.004, which is within acceptable limits for this assessment. Fig 4.d illustrates the system frequency decrease depending on varying total system load. The parallel running test in ship trials starts with a stable system frequency of 60 Hz. Since the model initially ignores stabilization mechanisms, the system frequency computation error rate is slightly higher, with an MAE of 0.0099. Fig 5.a compares the real and calculated power of synchronous generators as armature current increases. This benchmarking has an MAE of 0.000325, which is an acceptable error rate. Fig 5.b is a comparison between calculated and actual line voltages as a function of increasing armature current. As the MAE in this comparison is 0.0035, the model's line voltage prediction can be assumed to be acceptable.

Figure 3. Comparison of measured and calculated power in correlation with varying system loads (a) generator 1 (b) generator 2 (c) generator 3.

Figure 4. Validation analogy between measured and calculated frequency in correlation with increasing generator power (a) generator 1 (b) generator 2 (c) generator 3 (d) system.

Figure 5. Validation analogy between measured and calculated (a) power and (b) line voltage as armature current increases.

Transformed and scaled data have 13,268 rows and are split into 80 % train and 20 % test sets. The data split configuration is obtained from trials of various split combinations. According to the results, the 80-20 split not only provides the optimum amount of data to test the model, but also a large quantity of training data. Regression models are trained with training data and evaluated with test data using evaluation metrics. The optimum model is identified and used for future fuel consumption predictions. Fig 6 illustrates the FC estimation process. Techniques suitable for plant FC estimation are as follows:
• Ordinary least squares (OLS) regression
• Auto-regressive integrated moving average (ARIMA)
• Support vector regression (SVR)
• Artificial neural network (ANN)

Figure 7. ANN structure.

Fig 9 illustrates the plant's utilization time distribution by mode of operation. The plant produces 8,592.38 t of CO2, which is remarkable compared to other pollutants. NOx and SO2 are the second and third highest generated pollutants, respectively.

Figure 8. Total generator FC by operation mode.
Figure 9. Plant utilization time distribution by operation mode.

Figure 10. FC of the plant by working hours.

Figure 11. Comparison of test data and predicted values, evaluation metrics, and one-year prediction using OLS.

Figure 13. Comparison of test data and predicted values, evaluation metrics, and one-year prediction using SVR.

Figure 14. Comparison of test data and predicted values, evaluation metrics, and one-year prediction using ANN.

Table 1 provides the essential specifications of these parts. Load test, governor test, parallel running test, open circuit and short-circuit test results have been obtained from the alternator manufacturer's manuals and ship electrical equipment tests conducted before installation. FC and other required data for the prime mover model have been obtained from instructions and manuals of the diesel engine manufacturer. Data on electrical load in different operation modes have been obtained from electrical load tests conducted during ship trials. Figure 1 is a basic sketch of the ship's electrical distribution system.

Table 1. General information on marine diesel generators and electrical load.

Table 3. Emissions from marine diesel generators by operation mode.