
CHAPTER III

Research Methodology

The objective of this chapter is to review the research design. It focuses on explaining the methodology used in the project and describes the statistical methods that were chosen to strengthen the analysis and, thereby, the significance of the study.

3.1 Introduction

It is important to introduce and justify the methods used in the study, as they determine its statistical outline. Various studies on rural banking have been conducted in the past, yet studying the effect of eBanking on the satisfaction level of customers in the rural market remains difficult. Most research on rural banking has been carried out in developing countries, especially in Africa and South Asia. Studies have also been undertaken in India, but their focus has largely been limited to Regional Rural Banks and microfinance companies in the rural market. Very few studies have examined the presence of private banks in the rural market and the significance of internet banking. This study therefore sets out to analyse the effectiveness of eBanking in the rural market, which makes the selection of appropriate statistical methods all the more important for the significance of the results.

Accordingly, this chapter presents the methodology adopted for the study, covering data collection, data analysis and a detailed description of the results.

3.2 Research Design

A research design is a systematic plan to study a scientific problem. The design of a study defines the study type (descriptive, correlational, semi-experimental, experimental, review, meta-analytic) and sub-type (e.g., descriptive-longitudinal case study), the research question, hypotheses, independent and dependent variables, experimental design and, if applicable, data collection methods and a statistical analysis plan.

A descriptive research design has been used in this study.

Page 3: project

Descriptive research design is a method in which both qualitative and quantitative variables can be used. It includes three types of methods:

1. Observational method

2. Case study method

3. Survey method

The survey method has been used in this study. In survey research, participants answer questions administered through interviews or questionnaires, and the researchers then describe the responses given. A close-ended questionnaire has been used in this study.

In this study the questionnaire is based on the eBanking satisfaction of people in the rural segment. The surveyed population was required to respond to the different variables on a five-point Likert scale, where 1 denotes the least satisfactory rating and 5 the most satisfactory.

The collected data have been analyzed with the IBM SPSS statistical software. At the outset, the Cronbach's alpha test has been employed to check the internal consistency (reliability) of the data. The Kaiser-Meyer-Olkin and Bartlett's tests were then conducted to assess the sampling adequacy and sphericity of the collected data. To diagnose the problem of multicollinearity, the degree of correlation has been estimated. As the results showed a collinearity problem, factor analysis has been used as a dimension-reduction tool. The results have further been analyzed through regression to establish the relationship between the REGR factor scores and the overall satisfaction level of customers.

3.3 Statement of Problem

Since the privatization of banks in 1994, the Indian banking sector has grown at a rapid pace. Due to intense competition and the entry of new players, it has become very tough for existing players to maintain their profitability in the market. With time, banks are moving beyond the urban market, and the new RBI guidelines are also increasing the pressure on banking players to expand their presence in the rural market. As a result, big private players such as HDFC Bank and ICICI Bank are following aggressive strategies to explore opportunities in the rural market. However, rural banking involves high administration costs, which is a tough challenge for banks. Here, eBanking can play a significant role in reducing administration costs (cost of operations), so it is very important for private banks to understand its importance. Banks can gauge the significance of eBanking by studying their existing rural customers.


Hence, this study aims at determining the satisfaction level of rural customers with eBanking. A simple questionnaire method has been used. The collected data have been analyzed with the IBM SPSS statistical software. The Cronbach's alpha test has been employed to check the internal consistency (reliability) of the data, and the Kaiser-Meyer-Olkin and Bartlett's tests were then conducted to assess the sampling adequacy and sphericity of the collected data. To diagnose the problem of multicollinearity, the degree of correlation has been estimated. As the results showed a collinearity problem, factor analysis has been used as a dimension-reduction tool. The results have further been analyzed through regression to establish the relationship between the REGR factor scores and the overall satisfaction level of customers.

3.4 Objectives of the Study

Main Objective: The main objective of the study is to understand the role of eBanking in helping a bank penetrate the rural market.

Secondary Objectives:

I. To assess the scope of banking in the rural market.

II. To analyse the overall satisfaction of rural customers with e-banking services.

III. To identify the factors that influence rural customers' satisfaction with e-banking.

IV. To identify the primary obstacles hindering the wide acceptability and propensity to use e-banking as a primary banking channel in rural areas.

V. To assess the ability of rural people to understand banking activities.


3.5 Research Methodology

3.5.1 Data Collection

The study is primarily based upon primary data collected through a questionnaire administered to rural users of e-banking channels in different villages of Shimla District of Himachal Pradesh (India). The questionnaire comprises 2 general questions and 17 questions relating to the variables to be studied. The selection of variables is based upon previous research work. The surveyed population was required to respond to the different variables on a five-point Likert scale, where 1 denotes the least satisfactory rating and 5 the most satisfactory.

3.5.2 Analysis of Data

The collected data have been analyzed with the IBM SPSS statistical software. At the outset, the Cronbach's alpha test has been employed to check the internal consistency (reliability) of the data. The Kaiser-Meyer-Olkin and Bartlett's tests were then conducted to assess the sampling adequacy and sphericity of the collected data. To diagnose the problem of multicollinearity, the degree of correlation has been estimated. As the results showed a collinearity problem, factor analysis has been used as a dimension-reduction tool. The results have further been analyzed through regression to establish the relationship between the REGR factor scores and the overall satisfaction level of customers.

3.6 Statistical Tools Used in the Study

3.6.1 Likert Scale: A method of ascribing quantitative value to qualitative data in order to make it amenable to statistical analysis. A numerical value is assigned to each potential choice, and a mean figure for all the responses is computed at the end of the evaluation or survey.

Response categories used in the scale: Typically there are five response categories, ranging from "strongly disagree" at one end to "strongly agree" at the other. The scale is quite useful for evaluating a respondent's opinion of important purchasing, product, or satisfaction features. The scores can be used to chart the distribution of opinion across the population, and for further analysis the mean score can be cross-tabulated with contributing factors. For the score to have meaning, each item in the scale should be closely related to the same topic of measurement. In this study the Likert scale is based on the satisfaction level of customers, where 1 stands for least satisfaction and 5 for most satisfaction.
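As a minimal numerical illustration of this kind of scoring (the item names and responses below are hypothetical and are not taken from the actual questionnaire), item means and a simple cross-tabulation can be computed as follows:

```python
import pandas as pd

# Hypothetical Likert responses: one row per respondent, one column per item (1-5).
responses = pd.DataFrame({
    "ease_of_use": [4, 5, 3, 2, 4],
    "transaction_speed": [3, 4, 4, 2, 5],
    "overall_satisfaction": [4, 5, 3, 2, 4],
})

# Mean score for each item across all respondents.
print(responses.mean())

# Cross-tabulate overall satisfaction against one contributing factor.
print(pd.crosstab(responses["overall_satisfaction"], responses["ease_of_use"]))
```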

3.6.2 Cronbach's Alpha Test: Cronbach's alpha (α) is a coefficient of internal consistency. It is commonly used as an estimate of the reliability of a psychometric test for a sample of examinees. It was first named alpha by Lee Cronbach in 1951, as he had intended to continue with further coefficients. The measure can be viewed as an extension of the Kuder-Richardson Formula 20 (KR-20), which is an equivalent measure for dichotomous items. Alpha is not robust against missing data. Several other Greek letters have been used by later researchers to designate other measures used in a similar context.

The theoretical value of alpha varies from zero to 1, since it is the ratio of two variances. However, depending on the estimation procedure used, estimates of alpha can take on any value less than or equal to 1, including negative values, although only positive values make sense. Higher values of alpha are more desirable. Some professionals, as a rule of thumb, require a reliability of 0.70 or higher (obtained on a substantial sample) before they will use an instrument. Obviously, this rule should be applied with caution when alpha has been computed from items that systematically violate its assumptions. Furthermore, the appropriate degree of reliability depends upon the use of the instrument.
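For a scale of k items, with σ_i² the variance of item i and σ_X² the variance of the total score, alpha is computed as:

```latex
\alpha \;=\; \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_{i}^{2}}{\sigma_{X}^{2}}\right)
```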

Cronbach's alpha will generally increase as the intercorrelations among test items increase, and it is thus known as an internal-consistency estimate of the reliability of test scores. Because intercorrelations among test items are maximized when all items measure the same construct, Cronbach's alpha is widely believed to indirectly indicate the degree to which a set of items measures a single unidimensional latent construct. However, the average intercorrelation among items is affected by skew just like any other average. Thus, whereas the modal intercorrelation among test items will equal zero when the set of items measures several unrelated latent constructs, the average intercorrelation among test items will be greater than zero in this case. Indeed, several investigators have shown that alpha can take on quite high values even when the set of items measures several unrelated latent constructs. As a result, alpha is most appropriately used when the items measure different substantive areas within a single construct.

Cronbach's alpha     Internal consistency
α ≥ 0.9              Excellent (high-stakes testing)
0.7 ≤ α < 0.9        Good (low-stakes testing)
0.6 ≤ α < 0.7        Acceptable
0.5 ≤ α < 0.6        Poor
α < 0.5              Unacceptable
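The alpha reported in this study was obtained in SPSS; a minimal sketch of the same computation outside SPSS, assuming the 17 Likert items are held in a DataFrame (the data below are randomly generated placeholders), would be:

```python
import numpy as np
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a DataFrame whose columns are the scale items."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)       # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)   # variance of the total score
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Placeholder data: 50 respondents, 17 items scored 1-5.
data = pd.DataFrame(np.random.randint(1, 6, size=(50, 17)))
print(f"Cronbach's alpha: {cronbach_alpha(data):.3f}")
```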


3.6.3 Kaiser-Meyer-Olkin (KMO) Test: The KMO measure of sampling adequacy predicts whether data are likely to factor well, based on correlations and partial correlations. In the days of manual factor analysis this was extremely useful, and KMO can still be used to assess which variables to drop from the model because they are too multicollinear.

There is a KMO statistic for each individual variable as well as an overall KMO statistic computed across all variables. KMO varies from 0 to 1.0, and the overall KMO should be 0.60 or higher to proceed with factor analysis. If it is not, drop the indicator variables with the lowest individual KMO statistic values until the overall KMO rises above 0.60.

To compute KMO overall, the numerator is the sum of squared correlations of all variables in the

analysis (except the 1.0 self-correlations of variables with themselves, of course). The

denominator is this same sum plus the sum of squared partial correlations of each variable i with

each variable j, controlling for others in the analysis. The concept is that the partial correlations

should not be very large if one is to expect distinct factors to emerge from factor analysis.
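Written out, with r_ij denoting the simple correlation between variables i and j and p_ij the corresponding partial correlation (controlling for all other variables in the analysis), the overall KMO statistic is:

```latex
\mathrm{KMO} \;=\; \frac{\sum_{i \neq j} r_{ij}^{2}}{\sum_{i \neq j} r_{ij}^{2} \;+\; \sum_{i \neq j} p_{ij}^{2}}
```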

In short, KMO is an index used to check sampling adequacy by identifying whether sufficient correlation exists among the variables. It compares the magnitudes of the observed correlation coefficients with the partial correlation coefficients. The minimum acceptable value of KMO is 0.50.
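Outside SPSS, the overall KMO statistic and Bartlett's test of sphericity (the two diagnostics applied to the questionnaire data in this study) can be obtained, for example, with the factor_analyzer package; a sketch, assuming the 17 Likert items have been loaded into a DataFrame (the file name is hypothetical), might look like:

```python
import pandas as pd
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

# Rows = respondents, columns = the 17 Likert items (hypothetical file name).
df = pd.read_csv("survey_responses.csv")

chi_square, p_value = calculate_bartlett_sphericity(df)
kmo_per_variable, kmo_overall = calculate_kmo(df)

print(f"Bartlett's test of sphericity: chi2 = {chi_square:.2f}, p = {p_value:.4f}")
print(f"Overall KMO: {kmo_overall:.3f}")  # at least 0.50-0.60 is needed to factor well
```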

3.6.4 Bartlett's Test: Bartlett's test (see Snedecor and Cochran, 1989) is used to test whether k samples are from populations with equal variances. Equality of variances across samples is called homoscedasticity or homogeneity of variances. Some statistical tests, for example the analysis of variance, assume that variances are equal across groups or samples. The Bartlett test can be used to verify that assumption.

Bartlett's test is sensitive to departures from normality. That is, if the samples come from non-

normal distributions, then Bartlett's test may simply be testing for non-normality. Levene's test

and the Brown–Forsythe test are alternatives to the Bartlett test that are less sensitive to

departures from normality.

Bartlett's test is used to test the null hypothesis, H0 that all k population variances are equal

against the alternative that at least two are different.


If there are k samples with sizes n_i and sample variances S_i^2, then Bartlett's test statistic is:
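In its standard form (as given, e.g., in Snedecor and Cochran, 1989):

```latex
\chi^{2} \;=\;
\frac{(N-k)\,\ln S_{p}^{2} \;-\; \sum_{i=1}^{k}(n_{i}-1)\,\ln S_{i}^{2}}
     {1 + \dfrac{1}{3(k-1)}\left(\sum_{i=1}^{k}\dfrac{1}{n_{i}-1} - \dfrac{1}{N-k}\right)},
\qquad
S_{p}^{2} = \frac{1}{N-k}\sum_{i=1}^{k}(n_{i}-1)S_{i}^{2},
\qquad
N = \sum_{i=1}^{k} n_{i}
```

Under the null hypothesis, the statistic approximately follows a chi-square distribution with k − 1 degrees of freedom, so H0 is rejected when the computed value exceeds the corresponding critical value.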

3.6.5 Multicollinearity: It is a statistical phenomenon in which two or more predictor

variables in a multiple regression model are highly correlated, meaning that one can be linearly

predicted from the others with a non-trivial degree of accuracy. In this situation the coefficient

estimates of the multiple regression may change erratically in response to small changes in the

model or the data. Multicollinearity does not reduce the predictive power or reliability of the

model as a whole, at least within the sample data themselves; it only affects calculations

regarding individual predictors. That is, a multiple regression model with correlated predictors

can indicate how well the entire bundle of predictors predicts the outcome variable, but it may

not give valid results about any individual predictor, or about which predictors are redundant

with respect to others.

Detection of Multicollinearity:

Indicators that multicollinearity may be present in a model:

1. Large changes in the estimated regression coefficients when a predictor variable is added

or deleted

2. Insignificant regression coefficients for the affected variables in the multiple regression,

but a rejection of the joint hypothesis that those coefficients are all zero (using an F-test)

3. If a multivariable regression finds an insignificant coefficient for a particular explanatory variable, yet a simple linear regression of the explained variable on this explanatory variable shows its coefficient to be significantly different from zero, this situation indicates multicollinearity in the multivariable regression.

4. Some authors have suggested a formal detection tolerance or the variance inflation factor (VIF) for multicollinearity:

Tolerance_j = 1 − R_j²,  VIF_j = 1 / Tolerance_j

where R_j² is the coefficient of determination of a regression of explanator j on all the other explanators. A tolerance of less than 0.20 or 0.10 and/or a VIF of 5 or 10 and above indicates a multicollinearity problem (a computational sketch follows this list).

5. Condition number test: The standard measure of ill-conditioning in a matrix is the

condition index. It will indicate that the inversion of the matrix is numerically unstable

with finite-precision numbers (standard computer floats and doubles). This indicates the

potential sensitivity of the computed inverse to small changes in the original matrix. The

Condition Number is computed by finding the square root of (the maximum eigenvalue

divided by the minimum eigenvalue). If the Condition Number is above 30, the

regression is said to have significant multicollinearity.

6. Farrar–Glauber test: If the variables are found to be orthogonal, there is no

multicollinearity; if the variables are not orthogonal, then multicollinearity is present. C.

Robert Wichers has argued that Farrar–Glauber partial correlation test is ineffective in

that a given partial correlation may be compatible with different multicollinearity

patterns. The Farrar–Glauber test has also been criticized by other researchers.

7. Construction of a correlation matrix among the explanatory variables will yield

indications as to the likelihood that any given couplet of right-hand-side variables are

creating multicollinearity problems. Correlation values (off-diagonal elements) of at least

.4 are sometimes interpreted as indicating a multicollinearity problem.
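As a computational illustration of indicators 4 and 7 above (the file name and variable set are placeholders, not the study's actual data), the correlation matrix and the variance inflation factors could be obtained as follows:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Explanatory variables: rows = respondents, columns = candidate predictors.
X = pd.read_csv("survey_responses.csv")  # hypothetical file name

# Indicator 7: pairwise correlations among the explanators.
print(X.corr().round(2))

# Indicator 4: variance inflation factor for each explanator (constant excluded).
X_const = sm.add_constant(X)
vif = pd.Series(
    [variance_inflation_factor(X_const.values, i) for i in range(1, X_const.shape[1])],
    index=X.columns,
)
print(vif)  # values of about 5-10 or more point to a multicollinearity problem
```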

Consequences of Multicollinearity:

1. Even extreme multicollinearity (so long as it is not perfect) does not violate OLS

assumptions. OLS estimates are still unbiased and BLUE (Best Linear Unbiased

Estimators)


2. Nevertheless, the greater the multicollinearity, the greater the standard errors. When high

multicollinearity is present, confidence intervals for coefficients tend to be very wide and

t-statistics tend to be very small. Coefficients will have to be larger in order to be

statistically significant, i.e. it will be harder to reject the null when multicollinearity is

present.

3. Note, however, that large standard errors can be caused by things besides

multicollinearity.

4. When two variables are highly and positively correlated, their slope coefficient

estimators will tend to be highly and negatively correlated. When, for example, b1 is

greater than β1, b2 will tend to be less than β2. Further, a different sample will likely

produce the opposite result. In other words, if you overestimate the effect of one

parameter, you will tend to underestimate the effect of the other. Hence, coefficient

estimates tend to be very shaky from one sample to the next.

Remedies for Multicollinearity:

1. Make sure you have not fallen into the dummy variable trap; including a dummy variable

for every category (e.g., summer, autumn, winter, and spring) and including a constant

term in the regression together guarantee perfect multicollinearity.

2. Try seeing what happens if you use independent subsets of your data for estimation and

apply those estimates to the whole data set. Theoretically you should obtain somewhat

higher variance from the smaller datasets used for estimation, but the expectation of the

coefficient values should be the same. Naturally, the observed coefficient values will

vary, but look at how much they vary.

3. Leave the model as is, despite multicollinearity. The presence of multicollinearity doesn't

affect the efficacy of extrapolating the fitted model to new data provided that the

predictor variables follow the same pattern of multicollinearity in the new data as in the

data on which the regression model is based.

4. Drop one of the variables. An explanatory variable may be dropped to produce a model

with significant coefficients. However, you lose information (because you've dropped a

variable). Omission of a relevant variable results in biased coefficient estimates for the

remaining explanatory variables.


5. Obtain more data, if possible. This is the preferred solution. More data can produce more

precise parameter estimates (with lower standard errors), as seen from the formula in

variance inflation factor for the variance of the estimate of a regression coefficient in

terms of the sample size and the degree of multicollinearity.

6. Mean-center the predictor variables. Generating polynomial terms can cause some

multicollinearity if the variable in question has a limited range. Mean-centering will

eliminate this special kind of multicollinearity. However, in general, this has no effect. It

can be useful in overcoming problems arising from rounding and other computational

steps if a carefully designed computer program is not used.

7. Standardize your independent variables. This may help reduce a false flagging of a

condition index above 30.

8. It has also been suggested that using the Shapley value, a game theory tool, the model

could account for the effects of multicollinearity. The Shapley value assigns a value for

each predictor and assesses all possible combinations of importance.

9. Ridge regression or principal component regression can be used.

10. If the correlated explanators are different lagged values of the same underlying

explanator, then a distributed lag technique can be used, imposing a general structure on

the relative values of the coefficients to be estimated.

In this study, multicollinearity has been examined among the different variables. Where multicollinearity was found, factor analysis has been used as a dimension-reduction tool before the regression analysis.

3.6.6 Factor Analysis: Factor analysis is a tool for reducing a large number of variables to a small number of factors that is capable of explaining the observed variance in the original variables. Initially, the communalities of the variables have been calculated to represent the amount of variation extracted from each variable; a variable with a higher communality is better represented by the extracted factors. The extraction of factors is done by the principal component analysis method.
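A sketch of this step outside SPSS, using the factor_analyzer package purely as an illustration (the file name and the number of factors retained are assumptions, not the study's actual results), might look like:

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer

# Rows = respondents, columns = the 17 Likert items (hypothetical file name).
df = pd.read_csv("survey_responses.csv")

# Principal-component extraction with varimax rotation; 4 factors is illustrative only.
fa = FactorAnalyzer(n_factors=4, rotation="varimax", method="principal")
fa.fit(df)

print("Communalities:", fa.get_communalities())         # variation extracted per variable
print("Variance explained:", fa.get_factor_variance())  # per-factor variance explained
loadings = pd.DataFrame(fa.loadings_, index=df.columns)
print(loadings.round(2))
```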


3.6.7 Regression Analysis: It is a statistical process for estimating the relationships

among variables. It includes many techniques for modeling and analyzing several variables,

when the focus is on the relationship between a dependent variable and one or more independent

variables. More specifically, regression analysis helps one understand how the typical value of

the dependent variable (or 'criterion variable') changes when any one of the independent

variables is varied, while the other independent variables are held fixed. Most commonly,

regression analysis estimates the conditional expectation of the dependent variable given the

independent variables – that is, the average value of the dependent variable when the

independent variables are fixed. Less commonly, the focus is on a quantile, or other location

parameter of the conditional distribution of the dependent variable given the independent

variables. In all cases, the estimation target is a function of the independent variables called the

regression function. In regression analysis, it is also of interest to characterize the variation of the

dependent variable around the regression function which can be described by a probability

distribution.

Regression analysis is widely used for prediction and forecasting, where its use has substantial

overlap with the field of machine learning. Regression analysis is also used to understand which

among the independent variables are related to the dependent variable, and to explore the forms

of these relationships. In restricted circumstances, regression analysis can be used to infer causal

relationships between the independent and dependent variables. However this can lead to

illusions or false relationships, so caution is advisable; for example, correlation does not imply

causation.

Many techniques for carrying out regression analysis have been developed. Familiar methods

such as linear regression and ordinary least squares regression are parametric, in that the

regression function is defined in terms of a finite number of unknown parameters that are

estimated from the data. Nonparametric regression refers to techniques that allow the regression

function to lie in a specified set of functions, which may be infinite-dimensional.

The performance of regression analysis methods in practice depends on the form of the data

generating process, and how it relates to the regression approach being used. Since the true form

of the data-generating process is generally not known, regression analysis often depends to some


extent on making assumptions about this process. These assumptions are sometimes testable if a

sufficient quantity of data is available. Regression models for prediction are often useful even

when the assumptions are moderately violated, although they may not perform optimally.

However, in many applications, especially with small effects or questions of causality based on

observational data, regression methods can give misleading results.

Regression models involve the following variables:

The unknown parameters, denoted as β, which may represent a scalar or a vector.

The independent variables, X.

The dependent variable, Y.

In various fields of application, different terminologies are used in place of dependent and

independent variables.

A regression model relates Y to a function of X and β.

Y ≈ f(X, β)

The approximation is usually formalized as E(Y | X) = f(X, β). To carry out regression analysis,

the form of the function f must be specified. Sometimes the form of this function is based on

knowledge about the relationship between Y and X that does not rely on the data. If no such

knowledge is available, a flexible or convenient form for f is chosen.

Assume now that the vector of unknown parameters β is of length k. In order to perform a

regression analysis the user must provide information about the dependent variable Y:

If N data points of the form (Y, X) are observed, where N < k, most classical approaches

to regression analysis cannot be performed: since the system of equations defining the

regression model is underdetermined, there are not enough data to recover β.

If exactly N = k data points are observed, and the function f is linear, the equations Y =

f(X, β) can be solved exactly rather than approximately. This reduces to solving a set of

N equations with N unknowns (the elements of β), which has a unique solution as long as

the X are linearly independent. If f is nonlinear, a solution may not exist, or many

solutions may exist.


The most common situation is where N > k data points are observed. In this case, there is

enough information in the data to estimate a unique value for β that best fits the data in

some sense, and the regression model when applied to the data can be viewed as an

overdetermined system in β.

In the last case, the regression analysis provides the tools for:

1. Finding a solution for the unknown parameters β that will, for example, minimize the distance between the measured and predicted values of the dependent variable Y (the method of least squares, formalized below).

2. Under certain statistical assumptions, the regression analysis uses the surplus of

information to provide statistical information about the unknown parameters β and

predicted values of the dependent variable Y.
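For the linear case Y = Xβ + ε, the least-squares criterion mentioned in point 1 and its well-known closed-form solution can be written as:

```latex
\hat{\beta} \;=\; \arg\min_{\beta}\,\lVert Y - X\beta\rVert^{2}
\;=\; \left(X^{\top}X\right)^{-1}X^{\top}Y
```

The closed-form solution exists provided the columns of X are linearly independent, which corresponds to assumption 4 in the list of classical assumptions below.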

Classical assumptions for regression analysis:

1. The sample is representative of the population for the inference prediction.

2. The error is a random variable with a mean of zero conditional on the explanatory

variables.

3. The independent variables are measured with no error. (Note: If this is not so, modeling

may be done instead using errors-in-variables model techniques).

4. The predictors are linearly independent, i.e. it is not possible to express any predictor as a

linear combination of the others.

5. The errors are uncorrelated, that is, the variance–covariance matrix of the errors is

diagonal and each non-zero element is the variance of the error.

6. The variance of the error is constant across observations (homoscedasticity). If not,

weighted least squares or other methods might instead be used.
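Bringing the pieces of this section together, the regression step of the study (the overall-satisfaction item regressed on the factor scores) could be sketched outside SPSS roughly as follows; the column and file names are placeholders, not the actual variables of the study:

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical frame holding the saved factor scores and the overall-satisfaction item.
data = pd.read_csv("factor_scores.csv")
X = sm.add_constant(data[["factor_1", "factor_2", "factor_3", "factor_4"]])
y = data["overall_satisfaction"]

model = sm.OLS(y, X).fit()   # ordinary least squares fit
print(model.summary())       # coefficients, t-statistics, R-squared, etc.
```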

3.7 Summary and Conclusion


In this chapter an attempt has been made to justify the research design. The study follows a descriptive research design based on a questionnaire. The questionnaire has been framed as simply as possible so that a good response rate can be achieved, and it has been constructed to determine the satisfaction level with eBanking in the rural market. It attempts to cover almost all necessary elements of eBanking and has been prepared on the basis of a five-point Likert scale.

The collected data have been analyzed with the IBM SPSS statistical software. At the outset, the Cronbach's alpha test has been employed to check the internal consistency (reliability) of the data. The Kaiser-Meyer-Olkin and Bartlett's tests were then conducted to assess the sampling adequacy and sphericity of the collected data. To diagnose the problem of multicollinearity, the degree of correlation has been estimated. As the results showed a collinearity problem, factor analysis has been used as a dimension-reduction tool. The results have further been analyzed through regression to establish the relationship between the REGR factor scores and the overall satisfaction level of customers.