ESTIMATING AUTO INSURANCE PREMIUM USING GENERALISED LINEAR MODELS

1. INTRODUCTION

1.1 BACKGROUND INFORMATION AND PROBLEM STATEMENT

The basic role of insurance is to pool fortuitous losses, providing financial protection and a means of transferring the risk of loss in exchange for an insurance premium. Insureds pose varying levels of risk to an insurer, so it is rational that the rates charged correspond to individual risk levels.

The need for non-uniform rates arises from the non-homogeneous nature of an insurance portfolio, which, under uniform pricing, brings about the phenomenon of anti-selection. Charging the same rate for the entire portfolio means that unfavorable risks are insured at too low a rate, and as a negative effect the better risks are discouraged from insuring. According to Ohlsson and Johansson (2010), the idea is that if an insurance company charges too high a premium for some policies, these will be lost to a competitor with a fairer price. Suppose a company charges too little for young drivers and too much for old drivers; it will then tend to lose old drivers to competitors while attracting young drivers, and this adverse selection will result in economic loss both ways: by losing profitable policies and gaining underpriced ones. Non-life insurance pricing techniques are therefore employed mainly to combat anti-selection by dividing the insurance portfolio into sub-portfolios based on certain influence variables. Each subdivision will then contain policyholders with similar risk profiles, who are charged the same reasonable tariff.

A method usually employed to estimate the premium combines the conditional expectation of the claim frequency with the expected cost of claims, given the observable risk characteristics. The evaluation of risks in order to determine the insurance premium is performed by actuaries, who over time have proposed and applied different statistical methods. Most often a distinction is made between the overall premium level of an insurance portfolio and the question of how the total required premium should be divided among the policyholders.
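The frequency-severity decomposition described above can be sketched numerically; all figures below are invented purely for illustration:

```python
# Hypothetical figures for a single tariff cell (invented for illustration only).
exposure = 1200.0      # policy-years observed in the cell
claim_count = 180      # number of claims reported
total_cost = 90_000.0  # total claim amount paid

frequency = claim_count / exposure   # expected number of claims per policy-year
severity = total_cost / claim_count  # expected cost per claim
pure_premium = frequency * severity  # equals total_cost / exposure

print(frequency, severity, pure_premium)
```

In practice both components are estimated by separate regression models on the observable risk characteristics, rather than by raw cell averages as here.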

The overall premium level is based on considerations of predicted future costs (reserves) and the cost of capital (expected profit), as well as the market situation. Historically, these calculations have not involved much statistical analysis. The question of how much to charge each individual given the premium level, on the other hand, involves the application of rather advanced statistical models.

Statistical models provide a simple summary of data in terms of the major systematic effects, together with a summary of the nature and magnitude of the random variation. Regression analysis plays a key role in statistics as one of its most powerful and widely used techniques for analyzing models and predicting future trends. However, simple linear regression is not always the best tool, for the following reasons. First, the dependent variable of interest may have a non-continuous distribution, so the predicted values should follow that distribution as well; since simple linear regression assumes the response variable follows a normal distribution, it may not be the best model for data that do not. In addition, when the effect of the predictors on the dependent variable is not linear in nature, the simple linear regression model is inadequate.
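A small sketch of the first limitation, using made-up data: fitting ordinary least squares to count data can produce predictions that a count distribution cannot take, such as negative claim counts.

```python
import numpy as np

# Made-up claim counts that grow roughly exponentially with a risk score.
x = np.arange(11, dtype=float)                              # risk score 0..10
y = np.array([0, 0, 0, 0, 1, 1, 2, 4, 8, 16, 30], float)   # claim counts

slope, intercept = np.polyfit(x, y, 1)  # ordinary least squares line
fitted = intercept + slope * x

# The fitted line dips below zero at low risk scores, an impossible
# value for a claim count.
print(intercept, fitted.min())
```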

Generalized Linear Models (GLMs) extend the linear modeling process to a wider class of problems involving the relationship between a response and one or more explanatory variables. In this context, linear regression, previously used to evaluate the effect of explanatory variables on the event of interest, has been replaced since the 1980s by GLMs. According to Michaela (2013), these models allow modeling a non-linear behavior and a non-Gaussian distribution of residuals. This property is very useful for the analysis of non-life insurance, where claim frequency and claim cost follow asymmetric densities that are clearly non-Gaussian. The establishment of GLMs has improved the quality of predictive risk modeling techniques and thereby helped produce a fair tariff given the nature of the risks.

The main objective of this paper is to apply GLMs in order to assess, in an equitable and reasonable manner, the premium charged to each insured.

1.2 SCOPE OF STUDY

The next section presents a review of the literature concerning the application of GLMs in non-life insurance pricing. Section 3 describes the research methodology employed in this paper; each of its subsections probes the estimation of the claim frequency and of the cost of claims, leading to the calculation model of the pure premium. Section 4 presents a study on data from an auto insurance branch, explaining briefly the risk factors that enable dividing the insurance portfolio into premium classes and showing how to obtain the corresponding premium. Section 5 presents the conclusions of the study.

2. LITERATURE REVIEW

Historically, the Gaussian linear regression model proposed by Legendre and Gauss in the 19th century limited the application of actuarial science models in explaining realistic risk occurrences. The model, proposed mainly to quantify the impact of exogenous variables on a phenomenon of interest, took the lead in econometrics, but its application in insurance has proved difficult. According to Michaela (2013), the linear model implies a series of hypotheses that are not compatible with the reality imposed by the frequency and cost of damages generated by risk occurrences. The Gaussian probability density, the linearity of the predictor and homoscedasticity are the model's most relevant assumptions.

The increasing complexity of statistical methods and the widening range of risks meant that actuaries had to find models that explain risk occurrences as realistically as possible. A major development in non-life insurance pricing, the Minimum Bias procedure, was introduced by Bailey and Simon (1960). The method specifies the link between the explanatory variables and the risk levels, together with a measure of the difference between predicted and observed values. Once these are established, an iterative algorithm can be used to calculate the coefficient associated with each risk level using a minimum-distance criterion.

According to Michaela (2013), the algorithm was subsequently found to be a particular case of the Generalized Linear Models, although it was created outside a recognized statistical framework.
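The iterative idea can be sketched for a multiplicative two-factor tariff with a balance-type update, in which each factor's coefficients are adjusted in turn so that fitted and observed totals agree at every level. This is a simplified sketch, not Bailey and Simon's original notation, and the data are invented.

```python
import numpy as np

# Invented observed average claims per cell for two rating factors
# (say, age class i and vehicle class j), built to be exactly multiplicative.
x_true = np.array([1.0, 1.5, 2.5])       # hypothetical age-class relativities
y_true = np.array([0.8, 1.0, 1.3, 2.0])  # hypothetical vehicle-class relativities
r = np.outer(x_true, y_true)             # observed cell means r[i, j]
w = np.ones_like(r)                      # exposures (weights), here all equal

# Balance-type minimum bias iteration: at each step, solve for one factor
# so that weighted observed and fitted totals match within each level.
x = np.ones(r.shape[0])
y = np.ones(r.shape[1])
for _ in range(50):
    x = (w * r).sum(axis=1) / (w @ y)      # update row relativities
    y = (w * r).sum(axis=0) / (w.T @ x)    # update column relativities

print(np.outer(x, y))  # fitted cell means
```

The coefficients are only identified up to a common scale factor, but the fitted products x[i] * y[j] reproduce the multiplicative cell means.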

The implementation merits of these models, in both statistics and actuarial science, are attributed to the British statisticians John Nelder and Robert Wedderburn (1972). They proved that generalizing linear modeling allows a deviation from the assumption of normality, extending the Gaussian model to a particular family of distributions known as the exponential family. Members of this family include, but are not limited to, the Binomial, Normal, Poisson and Gamma distributions. Regression models whose response variable is distributed as a member of the exponential family share the same characteristics. In contrast to classical normal linear regression, there are fewer restrictions here: in addition to the wide range of possible response distributions, the variance need not be constant (heteroscedasticity is allowed) and the relation between the fitted values and the predictors need not be linear.

GLMs have the advantage of a theoretical framework that enables the application of statistical tests to compare the fit of competing models. Nelder and Wedderburn (1972) also suggest that the estimation of parameters be done by the maximum likelihood procedure, with the parameter estimates obtained through an iterative algorithm. Nelder's contribution to propounding and completing the GLM theory continued in collaboration with the Irish statistician Peter McCullagh, whose book (1989) offers extensive information on the iterative algorithm and on the asymptotic properties (properties that hold when the sample size becomes large) of the parameter estimates.
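The iterative algorithm referred to above is, in modern terms, iteratively reweighted least squares (IRLS). A minimal sketch for a Poisson GLM with log link on simulated data (all coefficients and sample sizes invented):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated claim counts with log E[y] = 0.5 + 1.2 * x (invented coefficients).
n = 500
x = rng.uniform(0.0, 2.0, n)
X = np.column_stack([np.ones(n), x])   # design matrix with intercept
beta_true = np.array([0.5, 1.2])
y = rng.poisson(np.exp(X @ beta_true))

# Iteratively reweighted least squares for Poisson regression, log link.
beta = np.zeros(2)
for _ in range(25):
    eta = X @ beta
    mu = np.exp(eta)             # inverse link: mean of the Poisson response
    z = eta + (y - mu) / mu      # working response
    W = mu                       # working weights for the log link
    beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))

print(beta)  # should be close to beta_true
```

Each pass is a weighted least squares solve on the working response, and the estimates converge to the maximum likelihood values.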

The complexity and abundance of papers have been remarkable since the establishment of the GLM principles. Many authors and scientists have successfully highlighted, developed and improved the assumptions imposed by the practical applications of the models in non-life insurance. Among the precursors of the GLM approach as the main statistical tool in determining insurance pricing is Jean Lemaire (1985). Based on these models, he aimed to estimate the probability of risk occurrence in auto insurance, to establish the insurance premium, and to measure the effectiveness of the models used to estimate it. A significant contribution in this field also goes to Arthur Charpentier and Michel Denuit (2005), who successfully covered, in a modern view, all the aspects of insurance mathematics. Recent studies also reveal the contributions of de Jong and Heller (2008), Kaas et al. (2009), Frees (2010) and Ohlsson and Johansson (2010), who have highlighted the particularities of GLMs in non-life insurance risk modeling.