CHAPTER III
Research Methodology
The objective of this chapter is to provide an overview of the research design. The chapter explains the methodology used in the project and describes the statistical methods employed to strengthen the significance of the study.
3.1 Introduction
It is very important to introduce and evaluate the methods used in the study, as they determine the statistical outline of the project. In the past, various studies have been conducted on rural banking by different researchers, yet the effect of eBanking on the satisfaction level of customers in the rural market remains difficult to study. Most studies on rural banking have been conducted in developing countries, especially in Africa and South Asia. In India, research has been undertaken, but its focus has largely been limited to Regional Rural Banks and microfinance companies in the rural market. Only a few studies have examined the presence of private banks in the rural market and the significance of internet banking. It is therefore important to analyse the effectiveness of eBanking in the rural market, and to select appropriate statistical methods that increase the significance of the study.
Accordingly, this chapter presents the methodology adopted for the study: data collection, data analysis and a detailed description of the results.
3.2 Research Design
A research design is a systematic plan to study a scientific problem. The design of a study
defines the study type (descriptive, correlational, semi-experimental, experimental, review, meta-
analytic) and sub-type (e.g., descriptive-longitudinal case study), research question, hypotheses,
independent and dependent variables, experimental design, and, if applicable, data collection
methods and a statistical analysis plan.
A descriptive research design has been used in this study. Descriptive research can employ both qualitative and quantitative variables, and includes three methods:
1. Observational Method
2. Case Study Method
3. Survey Method
In this study the survey method has been used. In survey research, participants answer questions administered through interviews or questionnaires, and researchers then describe the responses given. A close-ended questionnaire has been used in this study.
In this study the questionnaire is based on the eBanking satisfaction of people in the rural segment. The surveyed population was required to respond to the different variables on a five-point Likert scale, with 1 as least satisfactory and 5 as most satisfactory.
The collected data has been analyzed with the IBM statistical software SPSS. At the outset the Cronbach alpha test was employed to check the internal consistency (reliability) of the data. Kaiser-Meyer-Olkin and Bartlett's tests were then conducted to test the sampling adequacy and sphericity of the collected data. To diagnose multicollinearity, the degree of correlation was estimated. As the results showed a collinearity problem, factor analysis was applied as a dimension-reduction tool. The results were further analyzed through regression to relate the REGR factor scores to the overall satisfaction level of customers.
3.3 Statement of Problem
Since the entry of new private banks in 1994, the Indian banking sector has grown at a rapid pace. Due to intense competition and the entry of new players, it is very tough for existing players to maintain profitability in the market. Over time, banks have been moving beyond the urban market, and new RBI guidelines are also increasing the pressure on banking players to expand their presence in the rural market. As a result, big private players like HDFC Bank and ICICI Bank are following aggressive strategies to explore opportunities in the rural market. However, rural banking involves huge administration costs, which is a tough task for rural banks. Here eBanking can play a significant role in reducing administration costs (cost of operations), so it is very important for private banks to understand the importance of eBanking, which they can gauge by studying existing rural customers. Hence this study aims at determining the satisfaction level of rural customers with eBanking.
A simple questionnaire method has been used in the study, and the collected data has been analyzed in SPSS following the sequence of tests described in Section 3.2: Cronbach alpha for reliability, Kaiser-Meyer-Olkin and Bartlett's tests for sampling adequacy and sphericity, correlation analysis to diagnose multicollinearity, factor analysis for dimension reduction, and regression relating the extracted factor scores to overall satisfaction.
3.4 Objectives of the Study
Main Objective: - The main objective of study is to understand the role of eBanking in a
bank to penetrate the rural market.
Secondary Objectives:-
I. To view the scope of banking in the rural market.
II. To analyse the overall satisfaction of rural customers with e-banking services.
III. To identify the factors that influence rural customers' satisfaction with e-banking.
IV. To identify the primary obstacles hindering the wide acceptability and propensity to use e-banking as a primary banking channel in rural areas.
V. To find out the ability of rural people to understand banking activities.
3.5 Research Methodology
3.5.1 Data Collection
The study is primarily based upon primary data collected through a questionnaire from rural
users of e-banking channels from different villages of Shimla District of Himachal Pradesh
(India). The questionnaire comprises 2 general questions and 17 questions relating to the variables to be studied. The selection of variables is based upon previous research work. The surveyed population was required to respond to the different variables on a five-point Likert scale, with 1 as least satisfactory and 5 as most satisfactory.
3.5.2 Analysis of Data
The collected data has been analyzed with the IBM statistical software SPSS. At the outset the Cronbach alpha test was employed to check the internal consistency (reliability) of the data. Kaiser-Meyer-Olkin and Bartlett's tests were then conducted to test the sampling adequacy and sphericity of the collected data. To diagnose multicollinearity, the degree of correlation was estimated. As the results showed a collinearity problem, factor analysis was applied as a dimension-reduction tool. The results were further analyzed through regression to relate the REGR factor scores to the overall satisfaction level of customers.
3.6 Statistical Tools Used in the Study
3.6.1 Likert Scale: A method of ascribing quantitative value to qualitative data, to make it
amenable to statistical analysis. A numerical value is assigned to each potential choice and a
mean figure for all the responses is computed at the end of the evaluation or survey.
Variables used in the scale:- Generally five response options are used, ranging from Strongly Disagree at one end to Strongly Agree at the other. The scale is quite useful for evaluating a respondent's opinion of important purchasing, product, or satisfaction features, and the scores can be used to chart the distribution of opinion across the population. For further analysis, the score mean can be cross-tabulated with contributing factors. For the score to have meaning, each item in the scale should be closely related to the same topic of measurement. In this study the Likert scale is based on the satisfaction level of customers, where 1 stands for least satisfaction and 5 for most satisfaction.
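This scoring step can be sketched as follows. The item names and responses below are hypothetical, not the study's actual questionnaire items; each list holds one 1-5 rating per respondent.

```python
# Hypothetical Likert data: one list of 1-5 ratings per item, same respondents.
responses = {
    "ease_of_use": [4, 5, 3, 4, 2],
    "reliability": [3, 4, 4, 5, 3],
    "transaction_speed": [2, 3, 3, 4, 3],
}

# Mean score per item, the basic quantity cross-tabulated in later analysis.
item_means = {item: sum(scores) / len(scores) for item, scores in responses.items()}
for item, mean in item_means.items():
    print(f"{item}: {mean:.2f}")
```

The per-item means would then feed the distribution charts and cross-tabulations the text describes.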
3.6.2 Cronbach Alpha test:- Cronbach's alpha (α) is a coefficient of internal consistency. It is commonly used as an estimate of the reliability of a psychometric test for a sample of examinees. It was first named alpha by Lee Cronbach in 1951, as he had intended to continue with further coefficients. The measure can be viewed as an extension of the Kuder–Richardson Formula 20 (KR-20), which is an equivalent measure for dichotomous items. Alpha is not robust against missing data. Several other Greek letters have been used by later researchers to designate other measures used in a similar context.
The theoretical value of alpha varies from zero to 1, since it is the ratio of two variances.
However, depending on the estimation procedure used, estimates of alpha can take on any value
less than or equal to 1, including negative values, although only positive values make sense.
Higher values of alpha are more desirable. Some professionals, as a rule of thumb, require a reliability of 0.70 or higher (obtained on a substantial sample) before they will use an instrument. Obviously, this rule should be applied with caution when alpha has been computed from items that systematically violate its assumptions. Furthermore, the appropriate degree of reliability depends upon the use of the instrument.
Cronbach's alpha will generally increase as the intercorrelations among test items increase, and it is thus known as an internal-consistency estimate of the reliability of test scores. Because intercorrelations among test items are maximized when all items measure the same construct, Cronbach's alpha is widely believed to indirectly indicate the degree to which a set of items measures a single unidimensional latent construct. However, the average intercorrelation among items is affected by skew just like any other average. Thus, whereas the modal intercorrelation among test items will equal zero when the set of items measures several unrelated latent constructs, the average intercorrelation among test items will be greater than zero in this case. Indeed, several investigators have shown that alpha can take on quite high values even when the set of items measures several unrelated latent constructs. As a result, alpha is most appropriately used when the items measure different substantive areas within a single construct.
Cronbach's alpha    Internal consistency
α ≥ 0.9             Excellent (high-stakes testing)
0.7 ≤ α < 0.9       Good (low-stakes testing)
0.6 ≤ α < 0.7       Acceptable
0.5 ≤ α < 0.6       Poor
α < 0.5             Unacceptable
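A minimal sketch of the alpha computation, using the standard formula α = k/(k−1) · (1 − Σ item variances / variance of respondent totals). The item scores below are illustrative, not the study's data.

```python
from statistics import pvariance  # population variance

def cronbach_alpha(items):
    # items: one list of scores per item, aligned by respondent
    k = len(items)
    n = len(items[0])
    item_vars = [pvariance(scores) for scores in items]          # per-item variance
    totals = [sum(item[i] for item in items) for i in range(n)]  # per-respondent total
    return k / (k - 1) * (1 - sum(item_vars) / pvariance(totals))

items = [
    [4, 5, 3, 4, 2],
    [3, 4, 4, 5, 3],
    [4, 4, 3, 5, 3],
]
print(round(cronbach_alpha(items), 3))  # → 0.789
```

By the table above, an alpha of 0.789 would fall in the "Good" band.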
3.6.3 Kaiser-Meyer-Olkin:- The KMO measure of sampling adequacy predicts whether data are likely to factor well,
based on correlation and partial correlation. In the old days of manual factor analysis, this was
extremely useful. KMO can still be used, however, to assess which variables to drop from the
model because they are too multicollinear.
There is a KMO statistic for each individual variable, and their sum is the KMO overall statistic.
KMO varies from 0 to 1.0 and KMO overall should be .60 or higher to proceed with factor
analysis. If it is not, drop the indicator variables with the lowest individual KMO statistic values,
until KMO overall rises above .60.
To compute KMO overall, the numerator is the sum of squared correlations of all variables in the
analysis (except the 1.0 self-correlations of variables with themselves, of course). The
denominator is this same sum plus the sum of squared partial correlations of each variable i with
each variable j, controlling for others in the analysis. The concept is that the partial correlations
should not be very large if one is to expect distinct factors to emerge from factor analysis.
It is thus an index of whether sufficient correlation exists among the variables for sampling adequacy: it compares the magnitudes of the observed correlation coefficients with those of the partial correlation coefficients. Some authors treat 0.50 as the minimum acceptable value of KMO, although 0.60 is the more common threshold cited above.
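The overall KMO computation described above can be sketched directly from a correlation matrix: the partial correlations (controlling for all other variables) come from the inverse of the correlation matrix, and KMO is the ratio of summed squared correlations to summed squared correlations plus summed squared partials, off-diagonal only. The matrix below is illustrative.

```python
import numpy as np

def kmo_overall(R):
    inv = np.linalg.inv(R)
    # partial correlations of each pair, controlling for the other variables
    d = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / d
    off = ~np.eye(R.shape[0], dtype=bool)  # mask the 1.0 self-correlations
    r2 = np.sum(R[off] ** 2)
    p2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + p2)

R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
print(round(kmo_overall(R), 3))  # → 0.658
```

Here the partial correlations are small relative to the raw correlations, so KMO exceeds the 0.60 threshold and factoring would be considered acceptable.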
3.6.4 Bartlett’s tests:- Bartlett's test (see Snedecor and Cochran, 1989) is used to test if k
samples are from populations with equal variances. Equal variances across samples is called
homoscedasticity or homogeneity of variances. Some statistical tests, for example the analysis of
variance, assume that variances are equal across groups or samples. The Bartlett test can be used
to verify that assumption.
Bartlett's test is sensitive to departures from normality. That is, if the samples come from non-
normal distributions, then Bartlett's test may simply be testing for non-normality. Levene's test
and the Brown–Forsythe test are alternatives to the Bartlett test that are less sensitive to
departures from normality.
Bartlett's test is used to test the null hypothesis, H0 that all k population variances are equal
against the alternative that at least two are different.
If there are k samples with sizes ni and sample variances Si², then Bartlett's test statistic is

    χ² = [ (N − k) ln(Sp²) − Σi (ni − 1) ln(Si²) ] / [ 1 + (1 / (3(k − 1))) ( Σi 1/(ni − 1) − 1/(N − k) ) ]

where N = Σi ni and Sp² = (1/(N − k)) Σi (ni − 1) Si² is the pooled estimate of the variance.
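In the factor-analysis setting of this study, the form actually applied alongside KMO is Bartlett's test of sphericity, whose null hypothesis is that the correlation matrix R is an identity matrix (i.e. the variables are uncorrelated and unsuitable for factoring). Its statistic is χ² = −(n − 1 − (2p + 5)/6) ln|R|, with p(p − 1)/2 degrees of freedom. A minimal sketch on an illustrative correlation matrix:

```python
import math

import numpy as np

def bartlett_sphericity(R, n):
    # R: p x p correlation matrix; n: number of observations
    p = R.shape[0]
    chi2 = -(n - 1 - (2 * p + 5) / 6) * math.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return chi2, df

R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
chi2, df = bartlett_sphericity(R, n=100)
print(round(chi2, 2), df)  # → 73.36 3.0
```

A large χ² relative to the chi-squared distribution with df degrees of freedom rejects the identity-matrix hypothesis, supporting the use of factor analysis.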
3.6.5 Multicollinearity:- Multicollinearity is a statistical phenomenon in which two or more predictor variables in a multiple regression model are highly correlated, meaning that one can be linearly predicted from the others with a non-trivial degree of accuracy. In this situation the coefficient estimates of the multiple regression may change erratically in response to small changes in the model or the data. Multicollinearity does not reduce the predictive power or reliability of the model as a whole, at least within the sample data themselves; it only affects calculations regarding individual predictors. That is, a multiple regression model with correlated predictors can indicate how well the entire bundle of predictors predicts the outcome variable, but it may not give valid results about any individual predictor, or about which predictors are redundant with respect to others.
Detection of Multicollinearity:-
Indicators that multicollinearity may be present in a model:
1. Large changes in the estimated regression coefficients when a predictor variable is added
or deleted
2. Insignificant regression coefficients for the affected variables in the multiple regression,
but a rejection of the joint hypothesis that those coefficients are all zero (using an F-test)
3. If a multivariable regression finds an insignificant coefficient of a particular explanatory,
yet a simple linear regression of the explained variable on this explanatory variable
shows its coefficient to be significantly different from zero, this situation indicates
multicollinearity in the multivariable regression.
4. Some authors have suggested a formal detection tolerance or the variance inflation factor (VIF) for multicollinearity:

       Tolerance = 1 − Rj²,    VIF = 1 / Tolerance

   where Rj² is the coefficient of determination of a regression of explanator j on all the other explanators. A tolerance of less than 0.20 or 0.10 and/or a VIF of 5 or 10 and above indicates a multicollinearity problem.
5. Condition number test: The standard measure of ill-conditioning in a matrix is the
condition index. It will indicate that the inversion of the matrix is numerically unstable
with finite-precision numbers (standard computer floats and doubles). This indicates the
potential sensitivity of the computed inverse to small changes in the original matrix. The
Condition Number is computed by finding the square root of (the maximum eigenvalue
divided by the minimum eigenvalue). If the Condition Number is above 30, the
regression is said to have significant multicollinearity.
6. Farrar–Glauber test: If the variables are found to be orthogonal, there is no
multicollinearity; if the variables are not orthogonal, then multicollinearity is present. C.
Robert Wichers has argued that Farrar–Glauber partial correlation test is ineffective in
that a given partial correlation may be compatible with different multicollinearity
patterns. The Farrar–Glauber test has also been criticized by other researchers.
7. Construction of a correlation matrix among the explanatory variables will yield
indications as to the likelihood that any given couplet of right-hand-side variables are
creating multicollinearity problems. Correlation values (off-diagonal elements) of at least
.4 are sometimes interpreted as indicating a multicollinearity problem.
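The tolerance/VIF screen of indicator 4 can be sketched as follows. For standardized predictors, the VIFs are simply the diagonal elements of the inverse of the predictor correlation matrix; the data below are synthetic, with x2 deliberately constructed to be collinear with x1.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.3 * rng.normal(size=200)  # deliberately collinear with x1
x3 = rng.normal(size=200)             # independent predictor
X = np.column_stack([x1, x2, x3])

# VIF_j = j-th diagonal element of the inverse correlation matrix
R = np.corrcoef(X, rowvar=False)
vif = np.diag(np.linalg.inv(R))
for name, v in zip(["x1", "x2", "x3"], vif):
    flag = "problem" if v >= 5 else "ok"
    print(f"{name}: VIF={v:.2f} ({flag})")
```

The collinear pair x1/x2 produces VIFs well above the threshold of 5, while the independent x3 stays near 1.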
Consequences of Multicollinearity:-
1. Even extreme multicollinearity (so long as it is not perfect) does not violate OLS
assumptions. OLS estimates are still unbiased and BLUE (Best Linear Unbiased
Estimators)
2. Nevertheless, the greater the multicollinearity, the greater the standard errors. When high
multicollinearity is present, confidence intervals for coefficients tend to be very wide and
t-statistics tend to be very small. Coefficients will have to be larger in order to be
statistically significant, i.e. it will be harder to reject the null when multicollinearity is
present.
3. Note, however, that large standard errors can be caused by things besides
multicollinearity.
4. When two variables are highly and positively correlated, their slope coefficient estimators will tend to be highly and negatively correlated. When, for example, b1 is greater than β1, b2 will tend to be less than β2. Further, a different sample will likely produce the opposite result. In other words, if you overestimate the effect of one parameter, you will tend to underestimate the effect of the other. Hence, coefficient estimates tend to be very shaky from one sample to the next.
Remedies for Multicollinearity:-
1. Make sure you have not fallen into the dummy variable trap; including a dummy variable
for every category (e.g., summer, autumn, winter, and spring) and including a constant
term in the regression together guarantee perfect multicollinearity.
2. Try seeing what happens if you use independent subsets of your data for estimation and
apply those estimates to the whole data set. Theoretically you should obtain somewhat
higher variance from the smaller datasets used for estimation, but the expectation of the
coefficient values should be the same. Naturally, the observed coefficient values will
vary, but look at how much they vary.
3. Leave the model as is, despite multicollinearity. The presence of multicollinearity doesn't
affect the efficacy of extrapolating the fitted model to new data provided that the
predictor variables follow the same pattern of multicollinearity in the new data as in the
data on which the regression model is based.
4. Drop one of the variables. An explanatory variable may be dropped to produce a model
with significant coefficients. However, you lose information (because you've dropped a
variable). Omission of a relevant variable results in biased coefficient estimates for the
remaining explanatory variables.
5. Obtain more data, if possible. This is the preferred solution. More data can produce more
precise parameter estimates (with lower standard errors), as seen from the formula in
variance inflation factor for the variance of the estimate of a regression coefficient in
terms of the sample size and the degree of multicollinearity.
6. Mean-center the predictor variables. Generating polynomial terms can cause some
multicollinearity if the variable in question has a limited range. Mean-centering will
eliminate this special kind of multicollinearity. However, in general, this has no effect. It
can be useful in overcoming problems arising from rounding and other computational
steps if a carefully designed computer program is not used.
7. Standardize your independent variables. This may help reduce a false flagging of a
condition index above 30.
8. It has also been suggested that using the Shapley value, a game theory tool, the model
could account for the effects of multicollinearity. The Shapley value assigns a value for
each predictor and assesses all possible combinations of importance.
9. Ridge regression or principal component regression can be used.
10. If the correlated explanators are different lagged values of the same underlying
explanator, then a distributed lag technique can be used, imposing a general structure on
the relative values of the coefficients to be estimated.
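As an illustration of remedy 9, here is a minimal ridge-regression sketch on synthetic, deliberately collinear data: the L2 penalty λ is added to X'X before solving, which shrinks and stabilizes the coefficient estimates. The data, model, and λ are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = x1 + 0.1 * rng.normal(size=100)  # nearly collinear pair
X = np.column_stack([x1, x2])
y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=100)

# Ridge estimate: beta = (X'X + lambda * I)^-1 X'y
lam = 1.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print("OLS:  ", np.round(beta_ols, 2))
print("Ridge:", np.round(beta_ridge, 2))
```

The individual OLS coefficients are unstable under collinearity, but their sum (the well-identified direction) is recovered by both estimators; ridge trades a small bias for much lower variance in the individual coefficients.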
In this study, multicollinearity has been analysed between the different factors; where it is present, factor analysis and regression analysis have been used to reduce it.
3.6.6 Factor Analysis:- Factor analysis is a tool that reduces a large number of variables to a small number capable of explaining the observed variance in the full set. Initially the communalities of the variables are calculated to represent the amount of variation extracted from each variable; a variable with a higher communality is better represented. The extraction of variables is done by the principal component analysis method.
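A minimal sketch of principal-component extraction on an illustrative correlation matrix (not the study's data): eigen-decompose the correlation matrix, retain components with eigenvalue greater than 1 (the Kaiser criterion), and compute loadings and communalities.

```python
import numpy as np

# Illustrative correlation matrix: three related variables plus one outlier.
R = np.array([[1.0, 0.7, 0.6, 0.1],
              [0.7, 1.0, 0.5, 0.2],
              [0.6, 0.5, 1.0, 0.1],
              [0.1, 0.2, 0.1, 1.0]])

eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]  # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

keep = eigvals > 1.0  # Kaiser criterion: retain components with eigenvalue > 1
loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
communalities = (loadings ** 2).sum(axis=1)  # variance extracted per variable
print("components retained:", int(keep.sum()))
print("communalities:", np.round(communalities, 2))
```

The communality of each variable is the sum of its squared loadings on the retained components; the weakly correlated fourth variable ends up with a much lower communality than the related trio.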
3.6.7 Regression Analysis: - It is a statistical process for estimating the relationships
among variables. It includes many techniques for modeling and analyzing several variables,
when the focus is on the relationship between a dependent variable and one or more independent
variables. More specifically, regression analysis helps one understand how the typical value of
the dependent variable (or 'criterion variable') changes when any one of the independent
variables is varied, while the other independent variables are held fixed. Most commonly,
regression analysis estimates the conditional expectation of the dependent variable given the
independent variables – that is, the average value of the dependent variable when the
independent variables are fixed. Less commonly, the focus is on a quantile, or other location
parameter of the conditional distribution of the dependent variable given the independent
variables. In all cases, the estimation target is a function of the independent variables called the
regression function. In regression analysis, it is also of interest to characterize the variation of the
dependent variable around the regression function which can be described by a probability
distribution.
Regression analysis is widely used for prediction and forecasting, where its use has substantial
overlap with the field of machine learning. Regression analysis is also used to understand which
among the independent variables are related to the dependent variable, and to explore the forms
of these relationships. In restricted circumstances, regression analysis can be used to infer causal
relationships between the independent and dependent variables. However this can lead to
illusions or false relationships, so caution is advisable; for example, correlation does not imply
causation.
Many techniques for carrying out regression analysis have been developed. Familiar methods
such as linear regression and ordinary least squares regression are parametric, in that the
regression function is defined in terms of a finite number of unknown parameters that are
estimated from the data. Nonparametric regression refers to techniques that allow the regression
function to lie in a specified set of functions, which may be infinite-dimensional.
The performance of regression analysis methods in practice depends on the form of the data
generating process, and how it relates to the regression approach being used. Since the true form
of the data-generating process is generally not known, regression analysis often depends to some
extent on making assumptions about this process. These assumptions are sometimes testable if a
sufficient quantity of data is available. Regression models for prediction are often useful even
when the assumptions are moderately violated, although they may not perform optimally.
However, in many applications, especially with small effects or questions of causality based on
observational data, regression methods can give misleading results.
Regression models involve the following variables:
The unknown parameters, denoted as β, which may represent a scalar or a vector.
The independent variables, X.
The dependent variable, Y.
In various fields of application, different terminologies are used in place of dependent and
independent variables.
A regression model relates Y to a function of X and β.
Y ≈ f(X, β)
The approximation is usually formalized as E(Y | X) = f(X, β). To carry out regression analysis,
the form of the function f must be specified. Sometimes the form of this function is based on
knowledge about the relationship between Y and X that does not rely on the data. If no such
knowledge is available, a flexible or convenient form for f is chosen.
Assume now that the vector of unknown parameters β is of length k. In order to perform a
regression analysis the user must provide information about the dependent variable Y:
If N data points of the form (Y, X) are observed, where N < k, most classical approaches
to regression analysis cannot be performed: since the system of equations defining the
regression model is underdetermined, there are not enough data to recover β.
If exactly N = k data points are observed, and the function f is linear, the equations Y =
f(X, β) can be solved exactly rather than approximately. This reduces to solving a set of
N equations with N unknowns (the elements of β), which has a unique solution as long as
the X are linearly independent. If f is nonlinear, a solution may not exist, or many
solutions may exist.
The most common situation is where N > k data points are observed. In this case, there is
enough information in the data to estimate a unique value for β that best fits the data in
some sense, and the regression model when applied to the data can be viewed as an
overdetermined system in β.
In the last case, the regression analysis provides the tools for:
1. Finding a solution for unknown parameters β that will, for example, minimize the
distance between the measured and predicted values of the dependent variable Y (also
known as method of least squares).
2. Under certain statistical assumptions, the regression analysis uses the surplus of
information to provide statistical information about the unknown parameters β and
predicted values of the dependent variable Y.
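The most common overdetermined case (N > k) can be sketched with an ordinary least-squares fit on synthetic data; the model Y = 2 + 3X + noise and all numbers here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=50)
Y = 2.0 + 3.0 * X + rng.normal(scale=0.5, size=50)  # true intercept 2, slope 3

# Design matrix [1, X]; lstsq finds beta minimizing ||A @ beta - Y||^2
A = np.column_stack([np.ones_like(X), X])
beta, residuals, rank, _ = np.linalg.lstsq(A, Y, rcond=None)
print("intercept ~", round(beta[0], 1), " slope ~", round(beta[1], 1))
```

With 50 observations and 2 unknown parameters the system is overdetermined, and the least-squares solution recovers the true intercept and slope up to sampling noise.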
Classical assumptions for regression analysis:
1. The sample is representative of the population for the inference prediction.
2. The error is a random variable with a mean of zero conditional on the explanatory
variables.
3. The independent variables are measured with no error. (Note: If this is not so, modeling
may be done instead using errors-in-variables model techniques).
4. The predictors are linearly independent, i.e. it is not possible to express any predictor as a
linear combination of the others.
5. The errors are uncorrelated, that is, the variance–covariance matrix of the errors is
diagonal and each non-zero element is the variance of the error.
6. The variance of the error is constant across observations (homoscedasticity). If not,
weighted least squares or other methods might instead be used.
Summary and Conclusion
In this chapter an attempt has been made to justify the research design. The study uses a descriptive research design based on a questionnaire. The questionnaire has been kept as simple as possible so that a positive response can be received, and has been constructed to determine the satisfaction level with eBanking in the rural market. It tries to cover almost all necessary elements of eBanking and has been prepared on the basis of a five-point Likert scale.
The collected data has been analyzed with the IBM statistical software SPSS. At the outset the Cronbach alpha test was employed to check the internal consistency (reliability) of the data. Kaiser-Meyer-Olkin and Bartlett's tests were then conducted to test the sampling adequacy and sphericity of the collected data. To diagnose multicollinearity, the degree of correlation was estimated. As the results showed a collinearity problem, factor analysis was applied as a dimension-reduction tool. The results were further analyzed through regression to relate the REGR factor scores to the overall satisfaction level of customers.