Defining Information Systems from an Academic Perspective

Authors: Devipsita Bhattacharya, Samuel Birk, John Gastreich, Justin Giboney, Chenhui Guo, Shan Jiang, YuKai Lin, Jeff Proudfoot, Ryan Schuetzler, Jaebong Son, Xing Wan, Justin Williams

Contents

What is MIS?
How can MIS be identified within academia?
What differentiates high and low quality MIS research?
Method
Data for Article Citations and Characteristics
Measures
Dependent variable.
Contribution.
Article Coding
Cluster analysis of Research Paper Abstracts
Approach
Dataset
Process for Analyzing Abstract
Five Naturally Formed Clusters
IS for Decision Support Cluster
Organizational Behavior Cluster
Electrical Engineering & Healthcare Cluster
Economics & Accounting Cluster
What MIS is NOT Cluster
Conclusions from Clustering Analysis
Keyword Analysis
Vector Space Model
Cosine Similarity
Implications
Limitations
Discipline Correlation by Citations
References Analysis
Number of citations received by a discipline
Number of references given by a discipline
Number of Self citations made by each discipline
Number of citations received Vs number of references made
Market share of citations received by discipline
Market share of references given by discipline
Interaction Between MIS and Other Disciplines
What Makes a MIS Article a High-Quality Article?
Overview
Advanced statistical analysis of high-quality MIS articles
The 6 Basic Variables
The Generation of Textual Variables
The results of Logistic Regression Analyses
Analysis to Identify the Determinants of a Highly-Cited MIS Paper
Introduction
SPSS Logistic Regression
Analysis 1
Analysis 2
Analysis 3
R Matched Pair Logistic Regression
Using coded variables
Using other variables
Discussion
Future Research
General Conclusions
Works Cited
Appendix 1 - C2MIS Evolution
Appendix 2 - Highly Cited MIS Articles
Appendix 3 - Motion Chart

What is MIS?

Previous MIS696a projects and the MIS literature in general make it evident that this is not an easy question to answer. Our class decided to approach it from two directions. First, considering the importance of MIS as a multidisciplinary field, we were interested in determining the similarities, differences, and relationships between MIS and several related research fields (e.g., psychology, management, economics, and computer science). The hope was that this examination would provide insight into how MIS can be identified within academia. Second, we were interested in determining not only what MIS is in general, but also what attributes can be used to identify high quality MIS research, which we consider to be the core of MIS.

How can MIS be identified within academia?

The project conducted by MIS 696a in 2009 concluded that future projects should investigate the relationship between MIS and related disciplines. To explore these relationships, we first selected the 12 related disciplines identified by Katerattanakul, Han, and Rea (2006). Several of these disciplines were also identified by previous projects. We then identified 6 to 9 high quality journals for each of the 12 disciplines using either ISI's discipline search or articles citing quality journals within the discipline [1]. Next, article and citation data for this study were obtained from the Institute for Scientific Information's (ISI) Web of Knowledge website. ISI is the major source of academic research citation information, reporting citation and article information for over 8,500 scientific journals. Citation and article information was obtained for articles published in the journals identified for the 13 disciplines (MIS and the 12 related disciplines). Because we were interested in studying articles that have an impact on their field through their empirical and theoretical research, we limited our scope to publications that ISI classified as articles, notes, or reviews (hereafter called articles). In total, citation and article information was obtained for 102,388 articles. We will refer to this as the overall citation database.

[1] For any discipline not listed below, we could not find an article ranking its journals, so we used Thomson Reuters' ISI Web of Knowledge database and identified the top 6-9 journals within the discipline based on 5-year journal impact factor.

MIS: Huang, H., & Hsu, J. S. (2005). An evaluation of publication productivity in information systems: 1999-2003. Communications of the Association for Information Systems. Lin, A. C. H., & Gregor, S. (2009). Publication productivity in information systems 2003-2007: A focus on the 'Basket of Six' and the Pacific Asia Region. Pacific Asia Journal of the Association for Information Systems, 1, 1-16. Clark, J. G., Warren, J., & Au, Y. A. (2009). Assessing research publication productivity in leading information systems journals: A 2003-2007 update. Communications of the Association for Information Systems, 24, 225-254.

Economics: Ritzberger, K. (2008). A ranking of journals in economics and related fields. German Economic Review, 9, 402-430.

Library Science: Bar-Ilan, J. (2010). Rankings of information and library science journals by JIF and by h-type indices. Journal of Informetrics, 4, 141-147.

Marketing: Baumgartner, H., & Pieters, R. (2003). The structural influence of marketing journals: A citation analysis of the discipline and its subareas over time. Journal of Marketing, 67, 123-139. Theoharakis, V., & Hirst, A. (2002). Perceptual differences of marketing journals: A worldwide perspective. Marketing Letters, 13(4), 389-402.

Medical Informatics: Baskerville, R. L., & Myers, M. D. (2002). Information systems as a reference discipline. MIS Quarterly, 26(1), 1-14.

Accounting: Bonner, S. E., Hesford, J. W., Van der Stede, W. A., & Young, S. M. (2006). The most influential journals in academic accounting. Accounting, Organizations and Society, 31, 663-685.

This data was then cleaned and mined to determine relationships between MIS and other disciplines based on frequently used words in abstracts, commonly used keywords, and the number of citations to other disciplines. More information about these analyses will be provided later.

What differentiates high and low quality MIS research?

We felt that beyond understanding how to identify MIS in academia at large, it was necessary to determine how to identify MIS research that is widely recognized and appreciated for its contribution to the field. These articles might be considered the core of MIS, and as such we felt it was necessary to determine how to differentiate this research from general MIS publications.

Further, one of the most important indicators of a scholar's success is the number and quality of the articles s/he publishes throughout a career. These publications are used to inform promotion and salary decisions for professors. Therefore, it seemed practical for us, as future MIS scholars, to understand the factors that influence the quality of MIS publications.

A recent study by Judge et al. (2007) has begun to identify some of the important factors directly influencing article citations, such as the quality of the journal the article was published in, whether the article explored new theoretical paradigms, the number of references the article cited, and the type of research design employed. In addition, these authors found that the clarity of an article's presentation, the prestige of author affiliations, and whether the study included independent data sources indirectly affected the citations an empirical article received.

We therefore sought to investigate the impact of these factors on citations in the field of MIS. We also attempted to extend the Judge et al. (2007) study by specifically identifying the characteristics that distinguish highly-cited MIS articles from those that are less highly-cited. This is important because our research indicates that although citation classics (i.e., articles that have received at least 100 citations) represent a relatively small proportion of the total number of articles published, they account for a large proportion of the citations attributed to journals. Given the disproportionately large effect of these classics, we believe it is critical to understand the factors that predict whether an article will have a disproportionate level of impact.

We also included additional measures of theoretical contribution (i.e., theory building and theory testing; Colquitt & Zapata-Phelan, 2007; Newman & Cooper, 1993) that were excluded from the Judge et al. study. Finally, we examine a list of citation classics (and matched pair articles) published between 1970 and 2010.

Method

Data for Article Citations and Characteristics

Using the overall citation database, we identified a subset of citation classics. We defined a citation classic as an article in our database having 100 or more citations. Fifty of these articles were randomly selected to be coded. We also identified 50 non-citation classics (articles having fewer than 100 citations) that were matched on publication year and journal to control for article age and journal source, factors which have been shown to have substantial effects on citations in the field of management (Colquitt & Zapata-Phelan, 2007; Judge et al., 2007; Podsakoff et al., 2005). This gave us a set of 50 matched pairs, for a total of 100 articles. We will refer to this data as the matched pair database in future discussions.
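As a rough illustration, the matched-pair selection described above could be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the procedure actually used for the study: the record fields (`id`, `journal`, `year`, `citations`), the function name `build_matched_pairs`, and the sample records are all hypothetical, and the sketch simply skips any classic for which no same-journal, same-year non-classic exists.

```python
import random
from collections import defaultdict

CLASSIC_THRESHOLD = 100  # a citation classic has 100 or more citations (from the text)


def build_matched_pairs(articles, n_pairs=50, seed=0):
    """Pair randomly chosen classics with non-classics from the same journal and year."""
    rng = random.Random(seed)
    classics = [a for a in articles if a["citations"] >= CLASSIC_THRESHOLD]

    # Index the non-classics by (journal, year) so candidate matches are easy to find.
    pool = defaultdict(list)
    for a in articles:
        if a["citations"] < CLASSIC_THRESHOLD:
            pool[(a["journal"], a["year"])].append(a)

    rng.shuffle(classics)  # random selection order among the classics
    pairs = []
    for classic in classics:
        candidates = pool[(classic["journal"], classic["year"])]
        if not candidates:
            continue  # assumption: drop classics that have no comparable non-classic
        match = candidates.pop(rng.randrange(len(candidates)))
        pairs.append((classic, match))
        if len(pairs) == n_pairs:
            break
    return pairs


# Hypothetical sample records, for illustration only.
articles = [
    {"id": 1, "journal": "Journal A", "year": 1995, "citations": 250},
    {"id": 2, "journal": "Journal A", "year": 1995, "citations": 12},
    {"id": 3, "journal": "Journal B", "year": 2001, "citations": 140},
    {"id": 4, "journal": "Journal B", "year": 2001, "citations": 30},
    {"id": 5, "journal": "Journal B", "year": 2003, "citations": 400},
]
pairs = build_matched_pairs(articles, n_pairs=2)
```

Matching on journal and year inside the lookup key keeps article age and journal source constant within each pair, which is the control the paragraph above describes.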

Measures

Dependent variable. Article impact was measured as the total number of citations that each article in our sample had accrued in ISI's Social Science Citation Index (SSCI) from the time it was published until October 31, 2010. The total number of citations an article has received is the most frequently used measure of article impact (Bergh et al., 2006; Colquitt & Zapata-Phelan, 2007; Judge et al., 2007; Kacmar & Whitfield, 2000; Newman & Cooper, 1993; Podsakoff et al., 2005, 2008).

Contribution. We employed two measures to assess universalistic attributes related to the ideas presented in each article. First, we coded the research plot of each article according to the criteria described by Newman and Cooper (1993). Articles having refinement plots were those that focused on increasing the accuracy of the scope of known things by replicating previous studies in different settings or with different statistical/methodological techniques. Articles having extension plots were those that focused on articulating an existing paradigm (Newman & Cooper, 1993, p. 520) by examining relationships between previously examined dependent variables and previously unexamined independent variables, examining the relationships between multiple independent variables (previously examined in isolation) and a dependent variable, or proposing new mediators/moderators to better explain existing links between independent and dependent variables. Finally, articles with exploration plots tried to carry a paradigm into unknown territory by examining a traditional dependent variable as an independent variable, examining variables at a new level of analysis, or introducing a new causal network or foundations for new theory. Raters coded each article based on the highest-level research plot present in the article.

Second, we also coded articles based on the theory building and theory testing criteria developed by Colquitt and Zapata-Phelan (2007). Colquitt and Zapata-Phelan developed 5-point scales designed to rate the extent to which empirical articles build new theory (1 = attempts to replicate previous findings to 5 = introduces a new construct [or significantly reconceptualizes an existing one]) and test existing theory (1 = is inductive or grounds predictions with logical speculation to 5 = grounds predictions with existing theory). These authors reported that both the theory building and theory testing variables explained significant variance in empirical article impact, even after controlling for the year the article was published. Therefore, we coded both dimensions for all empirical articles in our sample.

Article Coding

First, each student independently coded a set of 3 randomly selected articles from the matched pair database. After completing their codes, the raters compared codes and discussed all discrepancies, which led to the refinement of several coding criteria. Following this, students again independently coded a set of 3 randomly selected articles from the matched pair database. Comparison of the individual codes and discussion of differences in ratings led to a few additional refinements to the criterion coding scheme. In the final step of this process, each student was assigned a set of randomly selected articles representing approximately a third of the remaining dataset. Each student coded the articles on his/her own and then met with a group of 1 or 2 other students who had coded the same articles. The groups changed depending on the particular articles, ensuring cross-validation of coding decisions.

Cluster analysis of Research Paper Abstracts

Our assumption is that similar disciplines use similar words, and that the similarity among disciplines may be reflected in the abstracts of research papers. Therefore, text mining was used to analyze and compare the abstracts from research papers to determine which disciplines are similar to MIS and which are dissimilar.

Approach

To conduct text mining, two main analyses were required: term extraction and cluster analysis. Both analyses were performed using Microsoft SQL Server 2008. For term extraction, SQL Server text mining uses a Markov chain-based grammar model to detect terms in addition to performing a normal stemming technique. The extracted terms were used as input values for the cluster analysis. The expectation-maximization (EM) clustering algorithm of SQL Server 2008 is a probabilistic technique that iteratively refines an initial cluster model to fit the data and determines the probability with which a data point belongs to each cluster. Therefore, the results are probabilistic.
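To make the idea of iterative, probabilistic refinement concrete, the following is a minimal textbook sketch of EM for a two-component Gaussian mixture on one-dimensional data. It is not SQL Server's implementation, whose internals are not described here; the function name `em_two_gaussians`, the starting values, and the sample points are all assumptions chosen for illustration.

```python
import math


def em_two_gaussians(xs, mu=(0.0, 1.0), iters=50):
    """Soft-cluster 1-D points into two Gaussian components using EM."""
    mu = list(mu)
    var = [1.0, 1.0]
    weight = [0.5, 0.5]
    resp = []
    for _ in range(iters):
        # E-step: probability that each point belongs to each cluster.
        resp = []
        for x in xs:
            dens = []
            for k in range(2):
                norm = math.sqrt(2 * math.pi * var[k])
                dens.append(weight[k] * math.exp(-((x - mu[k]) ** 2) / (2 * var[k])) / norm)
            total = sum(dens)
            resp.append([d / total for d in dens])

        # M-step: re-estimate each cluster from its softly assigned points.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weight[k] = nk / len(xs)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            spread = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            var[k] = max(spread, 1e-6)  # floor keeps the variance from collapsing to zero
    return mu, resp


# Hypothetical data: two well-separated groups of points.
points = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2]
means, memberships = em_two_gaussians(points, mu=(0.0, 6.0))
```

Each row of `memberships` holds one point's membership probabilities, which is why the output of this kind of clustering is a probability per cluster and not a hard assignment.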

Dataset

Of the 102,388 records in our complete dataset, 38,642 were chosen for this analysis because only those records include abstracts; the remaining records have no abstracts and therefore could not be included.

Figure 1 shows the proportion of each discipline in the abstract analysis. The MIS, Economics, and Electrical Engineering disciplines had a large volume of research papers, approximately 5,000 each. In contrast, the Accounting and Communication disciplines had relatively few research papers, about 1,000 each.

To prevent the unbalanced data volume from causing biased results, the number of terms to represent each discipline was limited to 150.

Figure 1 - Proportion of Discipline in Analysis

Process for Analyzing Abstract

We followed four steps to analyze abstracts. Through the analysis, we expected that similar disciplines would be clustered together.

Figure 2 - Abstract Analysis Process

Step 1: Nouns and noun phrases (terms) were extracted from all thirteen disciplines by the Term Frequency (TF) method, which is commonly used in text mining tasks.

Step 2: Using the terms extracted in Step 1, a global vocabulary was created, representing all terms from all disciplines. We obtained 817 terms after the elimination of redundant terms.

Step 3: The global vocabulary was used to create a bag-of-words model by indexing all extracted terms. The bag-of-words model is a simplifying representation used in natural language processing, text mining, and information retrieval systems.

Step 4: The clustering algorithm used this bag-of-words model as input variables to form clusters.
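Steps 1 to 3 can be sketched in Python as shown below. This is only an illustration under simplifying assumptions: whitespace-separated lowercase tokens stand in for the nouns and noun phrases that the actual term extraction produced, the function names and sample abstracts are hypothetical, and the per-discipline limit defaults to the 150 terms mentioned above.

```python
from collections import Counter


def top_terms(abstracts, limit=150):
    """Step 1: return the `limit` most frequent tokens in one discipline's abstracts."""
    counts = Counter(word for text in abstracts for word in text.lower().split())
    return [term for term, _ in counts.most_common(limit)]


def build_vocabulary(terms_by_discipline):
    """Step 2: merge per-discipline term lists into one sorted, duplicate-free vocabulary."""
    return sorted({t for terms in terms_by_discipline.values() for t in terms})


def bag_of_words(text, vocabulary):
    """Step 3: count how often each vocabulary term occurs in one abstract."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


# Hypothetical mini-corpus, for illustration only.
corpus = {
    "MIS": ["decision support system design", "database system adoption"],
    "Accounting": ["audit fee valuation", "earnings forecast valuation"],
}
terms = {discipline: top_terms(texts, limit=3) for discipline, texts in corpus.items()}
vocabulary = build_vocabulary(terms)
vector = bag_of_words("system database system", vocabulary)
```

The resulting count vectors, one per abstract and all indexed by the same vocabulary, are the kind of input a clustering algorithm would consume in Step 4.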

Five Naturally Formed Clusters

From the analysis, five naturally formed clusters were generated (Figure 3). Clusters were then labeled by their top ten keywords and their most predominant disciplines; the labels given were IS for Decision Support (5,524 articles), Organizational Behavior (6,196 articles), Electrical Engineering & Healthcare (9,270 articles), Economics & Accounting (9,555 articles), and What MIS is NOT (8,097 articles).

Figure 3 - Cluster Proportions

IS for Decision Support Cluster

The first cluster (Figure 4) shows the highest concentrations of MIS (30.3%) and Library Science (24.1%), with a rather large segment of Communication (14.4%). All of the other disciplines in the cluster represent less than ten percent each. This cluster shows a high overlap of MIS, Library Science, and Communication. In this cluster, Psychology, Economics, and Electrical Engineering have very low representation.

Figure 4 - IS for Decision Support Cluster

Terms in this cluster include: decision support system (DSS), information, system, software, organization, database, Web, collaboration, knowledge, and information retrieval. From these keywords, it can be seen that the abstracts of the articles in this cluster are highly related to topics such as collaboration, information systems, and decision making.

Organizational Behavior Cluster

The second cluster (Figure 5) was labeled Organizational Behavior because of the number of its disciplines related to human behavior and its keywords related to the human side. This cluster was more evenly distributed than the previous cluster, with several disciplines representing six percent or more. It is represented by Management (20.2%), Computer Science (12.9%), Marketing (11.8%), Sociology (9.8%), Library Science (8.0%), and Psychology (6.6%). MIS also appears to be somewhat related to the articles in this cluster, with a 6.6% representation.

Figure 5 - Organizational Behavior Cluster

Terms in this cluster include: transformational leadership, leader-member exchange (LMX), relational uncertainty, organizational citizenship behavior (OCB), organizational commitment, leadership, satisfaction, culture, meta-analysis, and social movement.

Electrical Engineering & Healthcare Cluster

The third cluster (Figure 6) was labeled the Electrical Engineering & Healthcare Cluster because it is dominated by the disciplines of Electrical Engineering (30.2%) and Healthcare (20.5%). Again, MIS is represented with a 6.6% share of the articles in this cluster. This indicates that MIS is somewhat related to the Electrical Engineering and Healthcare disciplines.

Figure 6 - Electrical Engineering & Healthcare Cluster

Terms in this cluster include: inverter, induction motor, sensor, topology, mobile robot, neural network, architecture, system, support vector machine (SVM), and genetic algorithm (GA).

Economics & Accounting Cluster

The fourth cluster (Figure 7) was labeled the Economics & Accounting Cluster due to the dominance of the Economics and Accounting disciplines with both representing about 27% of the cluster. Another strongly represented discipline is Marketing (16.2%). Most other disciplines represent five percent or less of the cluster. This cluster is very financial-based with an emphasis on numbers.

Figure 7 - Economics & Accounting Cluster

Terms in this cluster include: earnings announcement, Financial Accounting Standard Board (FASB), Sarbanes-Oxley Act (SOX), audit fee, equilibrium, valuation, private information, bidder, earnings forecast, and incentive.

What MIS is NOT Cluster

The fifth cluster (Figure 8) was labeled What MIS is NOT because MIS makes up only a very limited slice of the cluster. This cluster is dominated by Psychology (23.4%), Sociology (17.3%), Education (16.5%), Communication (16.1%), and Healthcare (9.8%). All of the other disciplines represented five percent or less of the cluster. Interestingly, MIS represented only 1.2% of the cluster, which indicates that MIS is very dissimilar to the types of papers this cluster represents.

Figure 8 - What MIS is NOT Cluster

Terms in this cluster include: somatic symptom, body mass index, bipolar disorder, anxiety disorder, (major) depression, (psychiatric, mental) disorder, physical activity, medication, blood pressure, and competitive intelligence (CI).

Conclusions from Clustering Analysis

The results from this cluster analysis show that MIS is more similar to some disciplines and less similar to others (Figure 9). MIS is more similar to Library Science and Communication with common terms such as information, system, software, organization, database, Web, collaboration, knowledge, and information retrieval. On the other extreme, MIS appears to be very dissimilar to Psychology & Social Sciences which have key terms such as body mass index, physical activity, blood pressure, bipolar disorder, and anxiety disorder.

Figure 9 - Conclusions from Clustering Analysis

Keyword Analysis

In order to compare the relationships among MIS and the other twelve disciplines with keyword analysis, there are three questions to be asked and addressed in this section:

1. How can a discipline be represented with keywords?

2. Based on this representation, how can the relations/similarities among different disciplines be compared?

3. How do the relations/similarities between MIS and the other disciplines evolve over time?

We propose to use the vector space model (Raghavan & Wong, 1986; Salton et al., 1975) to represent each discipline and to compare disciplines using cosine similarity, a typical measure of similarity in text mining. Below, we explain how the keyword analysis was conducted. At the end of this section, we provide implications as well as limitations of this keyword analysis.

Vector Space Model

In a vector space model, target items are represented as vectors. The elements inside a vector are called features. In our context, we use paper keywords as features to represent a discipline. Keywords have been used for years to represent the content of a text in information retrieval and text mining studies. The first step is to transform each discipline into a vector of the representative keywords used in that field (Figure 10).

Figure 10 - An illustration of the transformation from discipline keywords to discipline feature vectors

After the transformation, it is the vectors, instead of keywords themselves, which are used to represent a discipline. Suppose that we have successfully represented two disciplines, say MIS and Computer Science, using vectors of their corresponding keywords. Their representative term vectors will be:

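$$\vec{v}_{1} = \langle w_{11}, w_{12}, \ldots, w_{1x} \rangle \quad \text{(MIS)}$$

$$\vec{v}_{2} = \langle w_{21}, w_{22}, \ldots, w_{2x} \rangle \quad \text{(Computer Science)}$$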

Cosine Similarity

Following this, we can use cosine similarity, a widely adopted similarity measure in the vector space model, to estimate the similarity between two disciplines, i.e., vectors. Take Figure 11 for example. The cosine similarity between v and v1 measures the angle θ1. Similarly, the cosine similarity between v and v2 measures the angle θ2. Two vectors are deemed more similar if the angle between them is smaller. Given that θ1 is smaller than θ2 in Figure 11, v and v1 are more similar than v and v2.

Figure 11 - Illustration of cosine similarity

While it is easy to discern an included angle in a two-dimensional space with the naked eye, it becomes difficult, if not impossible, to do so in a space with more than three dimensions, which is typical of vector spaces. Therefore, we need a formula to calculate the included angle. Following the previous example of the MIS and Computer Science vectors, the cosine similarity between the two disciplines is calculated as follows:

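$$\cos\theta = \frac{\vec{v}_{1} \cdot \vec{v}_{2}}{\lVert \vec{v}_{1} \rVert \, \lVert \vec{v}_{2} \rVert} = \frac{\sum_{i=1}^{x} w_{1i} \, w_{2i}}{\sqrt{\sum_{i=1}^{x} w_{1i}^{2}} \, \sqrt{\sum_{i=1}^{x} w_{2i}^{2}}}$$

where w1i and w2i are the i-th elements of the MIS and Computer Science vectors, respectively.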

With this understanding, we can further investigate the similarity between two disciplines over different periods of time. To do this, we simply use the keywords that appeared in a particular time span to represent the disciplines (see Figure 12 for an example with five-year intervals).

Figure 12 - five-year interval
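The windowed comparison described above can be sketched in Python as follows. This is a minimal sketch, not the code used for the study: it assumes each paper is a (year, keywords) pair and that vector elements are raw keyword counts, since the weighting scheme is not stated in the text. The function names and sample papers are hypothetical.

```python
import math
from collections import Counter


def keyword_vector(papers, start_year, end_year):
    """Count keyword occurrences in papers published within [start_year, end_year]."""
    counts = Counter()
    for year, keywords in papers:
        if start_year <= year <= end_year:
            counts.update(keywords)
    return counts


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse keyword-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a window without keywords has no direction; report 0 by convention
    return dot / (norm_a * norm_b)


# Hypothetical keyword data for two disciplines.
mis = [(1991, ["dss", "database"]), (1993, ["dss", "adoption"]), (1997, ["e-commerce"])]
cs = [(1992, ["database", "algorithm"]), (1994, ["dss"]), (1996, ["algorithm"])]

sim_1990_1994 = cosine_similarity(keyword_vector(mis, 1990, 1994), keyword_vector(cs, 1990, 1994))
sim_1995_1999 = cosine_similarity(keyword_vector(mis, 1995, 1999), keyword_vector(cs, 1995, 1999))
```

Storing each vector as a sparse `Counter` avoids building the full global keyword index; keywords missing from one discipline simply contribute zero to the dot product.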

The following three diagrams (Figures 13-15) show the similarities of MIS with the other 11 disciplines using different time spans (one year, two years, and five years). The data points start from 1990 because most earlier papers did not contain keyword information, and a keyword-based vector space model can only be built when keywords are available.

Figure 13 - The similarities of MIS with the other 11 disciplines using one-year time span

Figure 14 - The similarities of MIS with the other 11 disciplines using two-year time span

Figure 15 - The similarities of MIS with the other 11 disciplines using five-year time span

Implications

Although similarities computed with a one-year time span provide more data points, they are rather unstable, because the keywords from a single year may not be representative enough of a discipline. Therefore, it makes sense to use the results with a slightly longer time interval.

Based on the experiments, the following implications are provided.

1. From the diagrams, it can be seen that the four disciplines most similar to MIS are Computer Science, Marketing, Management, and Medical Informatics.

2. Although there are some fluctuations, the ordering of the disciplines by similarity is rather stable. That is, Computer Science, Marketing, and Management are consistently high, whereas Accounting, Psychology, and Education are always the least similar disciplines to MIS.

3. The similarities of MIS with Management and Marketing show a growing trend. This implies that over the past decades MIS has been gradually moving toward a business discipline from its origins in operations research and computer science.

Limitations

This analysis is not without limitations. The most significant limitation is that keywords are not always available. For instance, most articles in the early years do not have keywords (either because the authors did not provide them or because they were not recorded in our source database, i.e., ISI Web of Science). In addition, articles in some influential publications, such as MIS Quarterly, provide no keyword information, which introduces a certain bias into our keyword analysis. Nevertheless, this bias is expected to be compensated for by the abstract-based and citation-based analyses in this study.

Discipline Correlation by Citations


For our analysis we assumed that the top (A) journals of each discipline are the purest definition of the discipline. Therefore we found articles that listed the top journals of each field. To illustrate, if each discipline is a mountain in a range of mountains, the top of each mountain is its most distinct part. As the mountain slopes down, it gets closer to the other mountains in the range. Just as the top of a mountain is the point furthest from all the other mountains, the top journals of a discipline are going to be the furthest from the other disciplines.

In order to find out which disciplines are most closely related to IS, we used citation counts from each top journal to every other top journal. The more cites between two journals, the closer the mountains would be to each other. The first finding is that IS top journals cite Marketing and Management top journals much more than any other top journals (see Figure 16). It is interesting to see that CS and Accounting top journals are not highly cited by IS.

Figure 16 - Cites by IS

Seeing which discipline is cited by IS is only half of the story. We also need to see which disciplines cite IS (see Figure 17). This will show us which disciplines think of IS as a reference discipline. Disciplines that cite IS are going to be more closely related than disciplines that do not cite IS. Figure 17 shows a much different picture than Figure 16. Figure 17 shows us that IS is most cited by Marketing, Education, Library Science, and Healthcare.

Figure 17 - Cites to IS

Another way to determine how IS fits in with the other disciplines is to see how IS is being cited and how it is citing over time. Figure 18 compares how much a discipline cites other disciplines with how much it is cited by other disciplines. We can see that over time a smaller percentage of the cites made by IS (pink) go to other disciplines (i.e., IS is citing itself more and more). We also see that, of the cites IS receives, a growing percentage comes from other disciplines. This shows that IS is becoming a reference discipline not only to itself, but to other disciplines as well. This is in contrast to Electrical Engineering (EE), which is folding in on itself: EE is citing other disciplines less and less, while being less and less cited by other disciplines. This shows that EE is not closely related to many other disciplines.

Figure 18 - Cite Comparison

References Analysis

As an additional step, year-wise trends in citations received and references given were visualized using Google motion charts. A Google motion chart, built with the Google Visualization API, is a dynamic chart for exploring several indicators over time. The chart is rendered within the browser using Flash.

The charts developed are described below. In this document, the charts are rendered using the Microsoft Word chart tool. The Google motion charts have been attached as an HTML document in the Appendix. Note: Viewing the Google motion charts requires an Internet connection.

Number of citations received by a discipline

The line chart below shows the number of citations received by each discipline from 1970 to 2009. The citations received count does not include the self citations received by a discipline.

Figure 19: Number of citations received by discipline

Psychology and Economics have been the frontrunner disciplines in citations received since 1970. In 2009, the Economics and Marketing disciplines received the most citations, followed by Marketing and Psychology.

Figure 20: Number of citations received by MIS

The MIS discipline received its first citation in 1976 and has slowly developed over the years into a reference discipline. In 2009, MIS received 456 citations from the 12 other disciplines, as shown in the chart above.

In 2009, MIS received citations primarily from the following disciplines (excluding self citations).

Table 1 - Top Citers of MIS

Discipline          Citations to MIS (2009)
Education           161
Healthcare          92
Library Science     59
Marketing           56
Management          26

Number of references given by a discipline

The second chart presents the number of references given by a discipline to other disciplines from 1970 to 2009. This references given count does not include the self references made by a discipline.

Management clearly emerges as the discipline that has made the most references to other disciplines since 1970, although its count of references made dropped sharply in 2009. The Accounting and Marketing disciplines follow in second and third place, respectively.

Figure 21: Number of references given by discipline

In 2009, MIS made 422 references to other disciplines in its top journal papers. The references given trend for MIS is very similar to the citations received trend, as shown by the chart below.

Figure 22: Number of references given by MIS

In 2009, MIS made references primarily to the following disciplines (excluding self citations).

Table 2 - Top Reference Disciplines of MIS

Discipline          References from MIS (2009)
Marketing           165
Management          157
Economics           25
Psychology          22

It would be interesting to analyze the comparison between citations received and references given for each discipline to get a clearer picture.

Number of Self citations made by each discipline

The third chart presents the self citations made by each discipline from 1970 to 2009. It is interesting to observe much does a discipline cite itself in the top journal papers

As the first and second charts show, Economics receives many citations but rarely references other disciplines. In the chart below, it is evident that Economics cites itself in the majority of its top journal papers. Other frontrunners in self citations include the Marketing and Management disciplines.

Figure 23: Number of self citations by discipline

As the chart below (Figure 24) shows, MIS is among the disciplines that make the fewest self citations. Interestingly, this count is still higher than its count of references to other disciplines: in 2009, for instance, MIS made 422 references and 514 self citations.

Figure 24: Self citations made by MIS

Number of citations received Vs number of references made

Figure 25 compares the number of citations received by each discipline with the number of references it gave from 1970 to 2009.

Figure 25: Number of citations received Vs number of references made

Over 1970-2009, MIS has lately emerged as a reference discipline, with a growing number of citations received. By comparison, Economics is a major reference discipline but seldom references other disciplines, whereas Accounting references other disciplines more but does not receive many citations from them. Management and Marketing, on the other hand, have maintained a fair balance between citations received and references made.

Market share of citations received by discipline

Figure 26 shows the market share of disciplines from 1970 to 2009. Market share here means each discipline's percentage of the total citations received by all disciplines in a given year. This count does not include self citations. The chart shows how the dynamics of the citations received by each discipline have changed over the years. Psychology and Economics dominate the share of citations received, which corroborates the observations made in earlier charts.
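As a minimal sketch of the market-share calculation described above, the snippet below divides each discipline's citations received by the total for that year. The counts are made-up illustrative values rather than figures from our dataset, and the function name market_share is hypothetical.

```python
def market_share(received):
    """Return each discipline's share of all citations received in one year.

    `received` maps discipline -> citations received from OTHER disciplines
    (self citations are assumed to be excluded already).
    """
    total = sum(received.values())
    if total == 0:
        return {d: 0.0 for d in received}
    return {d: count / total for d, count in received.items()}


# Illustrative counts only; these are not values from the study.
example_year = {"Psychology": 500, "Economics": 400, "MIS": 100}
shares = market_share(example_year)
print(shares["MIS"])  # 0.1
```

The same function applies to the market share of references given if the input holds references made instead of citations received.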

Figure 26 Market Share of Citations Received by Discipline

The market share of citations received by the MIS discipline is indicated by the teal blue bars (highlighted by a black oval) in Figure 27. It shows that MIS has gained significance as a referenced discipline over the years, which becomes evident from 1988 onwards. In 2009, its share stands at 7.1%.

Figure 27 - MIS highlighted

Market share of references given by discipline

The last chart (Figure 28) in this series of Google motion charts shows the market share of references given.

Figure 28 Market Share of References Given by Discipline

Market share of references means each discipline's percentage of the total references made by all disciplines in a given year. This count does not include self citations. The chart shows how the dynamics of the references given by each discipline have changed over the years. Management and Marketing dominate the share of references made.

Figure 29: MIS highlighted

The market share of references given by the MIS discipline is indicated by the teal blue bars (highlighted by a black oval) in Figure 29 above. It shows that the market share of MIS references made to other disciplines has increased significantly over the years. In 2009, it stands at 6.6%.

Interaction Between MIS and Other Disciplines

In this section, we discuss the interaction between MIS and other, non-MIS disciplines, represented as citation relationships. In 1970, MIS research emerged as an interdisciplinary field that cited many other fields of study. As it evolved, the importance of MIS studies was gradually recognized by researchers in other fields and inspired research in those fields. To better understand this phenomenon, we look at citation statistics on how many papers in other disciplines MIS has cited, and how many MIS papers were cited by research in other disciplines.

Our data is a year-wise count of how many times papers belonging to one discipline cite papers in each discipline of the discipline collection, from 1970 to 2010. The discipline collection includes: MIS, accounting, communication, computer science, economics, education, electrical engineering, healthcare, library science, management, marketing, psychology and sociology. As is often the case, each paper cites mostly papers from its own discipline; these usually exceed 60% of all citations in that paper. Including this large volume of self-discipline citations would inflate the denominator and distort the proportion of citations attributed to each other discipline. We therefore exclude self-discipline citations and use only the count of interdisciplinary citations as the denominator when calculating proportions.

Here, we will introduce three indicators to measure the interaction between MIS and other disciplines: MIS market share (MISMS), contribution to MIS (C2MIS) and MIS consumption (MISC). They are defined as follows:

D: the collection of all discussed non-MIS disciplines, $D := \{$accounting, communication, computer science, economics, education, electrical engineering, healthcare, library science, management, marketing, psychology, sociology$\}$. In the formulas below, $c(a, b)$ denotes the number of citations from papers in discipline $a$ to papers in discipline $b$ within a given period.

$$\mathrm{MISMS}(d) = \frac{c(d, \mathrm{MIS})}{\sum_{k \in (D \cup \{\mathrm{MIS}\}) \setminus \{d\}} c(d, k)}, \quad d \in D$$

MISMS indicates what percentage of all interdisciplinary citations made in one field goes to MIS. The higher the MISMS, the more important MIS is to this field and the more inspiration it gives. It is analogous to the market share of MIS in another field, say d, if we see all disciplines except d as competitors selling papers for d to cite.

$$\mathrm{C2MIS}(d) = \frac{c(\mathrm{MIS}, d)}{\sum_{k \in D} c(\mathrm{MIS}, k)}, \quad d \in D$$

C2MIS indicates what percentage of all interdisciplinary citations made by MIS goes to one field, say d. The higher the C2MIS, the more d contributes to MIS by providing ideas, theory or methodology.

$$\mathrm{MISC}(d) = \frac{c(d, \mathrm{MIS})}{\sum_{k \in D} c(k, \mathrm{MIS})}, \quad d \in D$$

MISC indicates what percentage of all citations to MIS papers from other fields comes from one field, say d. The higher the MISC, the more closely MIS is related to d. It is analogous to the regional consumption of MIS, if we see all non-MIS disciplines as consumers of MIS papers. MISC differs from MISMS in that it measures the importance of one field to MIS, while MISMS measures the importance of MIS in one field. MISC can be high in a field whose MISMS is low; as we will see later, such is the case in marketing.
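The three indicators can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the nested dictionary cites, the helper names and all counts are hypothetical, and the discipline list is shortened to keep the example small. Self-discipline citations are left out of every denominator, as described above.

```python
# cites[a][b] = number of citations from papers in discipline a to papers in b.
# Hypothetical counts for a shortened discipline list; not data from the study.
cites = {
    "MIS":        {"MIS": 50, "management": 30, "marketing": 60, "education": 10},
    "management": {"MIS": 5,  "management": 80, "marketing": 40, "education": 5},
    "marketing":  {"MIS": 10, "management": 50, "marketing": 90, "education": 0},
    "education":  {"MIS": 20, "management": 10, "marketing": 10, "education": 70},
}
OTHERS = [d for d in cites if d != "MIS"]  # the set D of non-MIS disciplines


def interdisciplinary_total(a):
    """All citations made by discipline a, excluding citations to itself."""
    return sum(n for b, n in cites[a].items() if b != a)


def misms(d):
    """Share of d's interdisciplinary citations that go to MIS."""
    total = interdisciplinary_total(d)
    return cites[d]["MIS"] / total if total else 0.0


def c2mis(d):
    """Share of MIS's interdisciplinary citations that go to d."""
    total = interdisciplinary_total("MIS")
    return cites["MIS"][d] / total if total else 0.0


def misc(d):
    """Share of all citations to MIS from other fields that come from d."""
    total = sum(cites[k]["MIS"] for k in OTHERS)
    return cites[d]["MIS"] / total if total else 0.0


print(misms("education"))           # 0.5
print(c2mis("marketing"))           # 0.6
print(round(misc("education"), 4))  # 0.5714
```

By construction, C2MIS and MISC each sum to 1 over all non-MIS disciplines in a period with any citations, whereas MISMS values are computed against a different denominator for each discipline and need not sum to 1.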

Because we used citation statistics only from the top journals in each discipline, the data does not cover all publications, and year-wise values fluctuate. We therefore use five-year periods as the unit of time to offset the fluctuation caused by data shortage. Tables 3, 4 and 5 give the values of the indicators from 1970 to 2010.
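The five-year aggregation can be sketched as follows. We assume here that the periods start in 1971, matching the period labels used in the tables; the function names and the sample counts are hypothetical.

```python
from collections import defaultdict


def period_label(year, first_year=1971, width=5):
    """Map a year to its five-year period label, e.g. 1978 -> '1976~1980'."""
    start = first_year + ((year - first_year) // width) * width
    return f"{start}~{start + width - 1}"


def aggregate(yearly_counts):
    """Sum year-wise citation counts into five-year periods."""
    totals = defaultdict(int)
    for year, count in yearly_counts.items():
        totals[period_label(year)] += count
    return dict(totals)


# Illustrative year-wise counts only.
print(aggregate({1976: 1, 1980: 2, 1981: 4}))
# {'1976~1980': 3, '1981~1985': 4}
```

The indicators are then computed on the aggregated counts rather than on the individual years.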

Table 3 - MISMS statistics from 1970 to 2010

MIS Market Share        1971~1975  1976~1980  1981~1985  1986~1990  1991~1995  1996~2000  2001~2005  2006~2010
Accounting              0          0          0          0          0          0.0061     0.0025     0.0150
Communication           0          0          0          0          0.0117     0.0264     0.0808     0.0434
Computer Science        0          0          0          0.1041     0.2736     0.2200     0.2488     0.1446
Economics               0          0          0          0          0.0032     0.0039     0.0042     0.0100
Education               0          0          0          0.0343     0.0222     0.0113     0.0679     0.2021
Electrical Engineering  0          0          0.0387     0.1229     0.1054     0.1174     0.1250     0.1380
Healthcare              0          0          0          0.0263     0          0.0341     0.0737     0.1453
Library Science         0          0.0493     0.0251     0.1861     0.2007     0.2354     0.1522     0.1760
Management              0          0          0          0.0023     0.0066     0.0139     0.0161     0.0127
Marketing               0          0          0          0.0072     0.0165     0.0290     0.0364     0.0752
Psychology              0          0          0          0          0          0.0050     0.0079     0
Sociology               0          0          0          0          0.0010     0.0070     0.0026     0.0046

Table 4 - C2MIS statistics from 1970 to 2010

Contribution to MIS     1971~1975  1976~1980  1981~1985  1986~1990  1991~1995  1996~2000  2001~2005  2006~2010
Accounting              0          0          0.0714     0.0494     0.0351     0.0288     0.0226     0.0123
Communication           0          0          0          0.0035     0.0039     0.0196     0.0062     0.0041
Computer Science        0          0          0.0595     0.0742     0.0449     0.0311     0.0280     0.0242
Economics               0          0          0          0          0.0273     0.0461     0.0631     0.0629
Education               0          0          0          0.0035     0          0          0          0.0030
Electrical Engineering  0          0          0.1071     0.0989     0.0957     0.0461     0.0311     0.0170
Healthcare              0          0          0          0          0.0019     0          0.0046     0.0098
Library Science         0          0          0          0.0070     0.0175     0.0103     0.0062     0.0123
Management              0          0          0.1309     0.2261     0.3027     0.3598     0.3616     0.3588
Marketing               0          0          0.5833     0.4840     0.3750     0.3713     0.3943     0.4057
Psychology              0          0          0.0238     0.0424     0.0839     0.0668     0.0779     0.0722
Sociology               0          0          0.0238     0.0106     0.0117     0.0196     0.0038     0.0170

Table 5 - MISC statistics from 1970 to 2010

MIS Consumption         1971~1975  1976~1980  1981~1985  1986~1990  1991~1995  1996~2000  2001~2005  2006~2010
Accounting              0          0          0          0          0          0.0166     0.0075     0.0262
Communication           0          0          0          0          0.0173     0.0360     0.0567     0.0215
Computer Science        0          0          0          0.0349     0.1805     0.1274     0.0982     0.0155
Economics               0          0          0          0          0.0034     0.0055     0.0037     0.0047
Education               0          0          0          0.1118     0.0381     0.0193     0.0850     0.2562
Electrical Engineering  0          0          0.4545     0.1538     0.0868     0.0858     0.0812     0.0465
Healthcare              0          0          0          0.0069     0          0.0221     0.1001     0.1463
Library Science         0          1          0.5454     0.5244     0.3819     0.3462     0.1096     0.1469
Management              0          0          0          0.0629     0.1319     0.1495     0.2211     0.0667
Marketing               0          0          0          0.1048     0.1562     0.1634     0.2230     0.2656
Psychology              0          0          0          0          0          0.0083     0.0094     0
Sociology               0          0          0          0          0.0034     0.0193     0.0037     0.0033

Figure 30 shows the MISMS trends for each discipline. MIS has been important for computer science, library science and electrical engineering, and has contributed much to papers in these three fields. It has not been dominant, however, because the highest value does not exceed 0.3. The importance of MIS is also emerging in education, healthcare and marketing. In education in particular, MISMS has become the highest among all fields.

Figure 31 shows the C2MIS trends for each discipline. Marketing and management literature have been, and remain, the highest contributors to MIS. Together they act as the dominant interdisciplinary source of citations for MIS. To our surprise, the emerging research that uses economic models in MIS is not reflected in our data. One explanation is that such studies are still too few to be published in top MIS journals.

Figure 32 shows the MISC trends for each discipline. Almost all the disciplines decrease in MISC, indicating that MIS is becoming less important in these fields. The four disciplines with increasing MISC are library science, education, healthcare and marketing; MIS is becoming more and more important to them.

Figure 30 - MISMS trends

Figure 31 - C2MIS trends

Figure 32 - MISC trends

What Makes a MIS Article a High-Quality Article?

Overview

In this section, we review the ISI dataset to answer the question "What makes an MIS article a high-quality article?" To accomplish this, we reviewed a subset of the full ISI dataset, representing articles from eight MIS journals, in order to identify the most significant factors correlated with high-quality articles. For the purpose of this study, we define high quality to mean 100 or more citations.

For this study we filtered the total ISI dataset down to a subset of eight significant journals with a primary focus on the field of information systems. The journals we selected were Information & Management, Decision Support Systems, Information Systems, MIS Quarterly, European Journal of Information Systems, Information Systems Research, Journal of Management Information Systems, and Journal of Strategic Information Systems. We reviewed three previous studies involving MIS journals and drew on them to develop this list of eight journals (Huang, 2005; Clark et al., 2009; Lin & Gregor, 2009).

In our MIS-only dataset, we identified 4868 articles from the eight journals published from 1978 to 2005. We further filtered the data to 4745 articles labeled as Article, Editorial Material, Proceedings Paper, or Review. Finally, we excluded articles published in the last five years so that the results would not be biased by articles so new that even high-quality ones may have few citations.
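The filtering and labeling steps can be sketched as below. The record fields (doc_type, year, citations), the function name prepare and the sample records are hypothetical stand-ins for the ISI fields, and the cutoff year is passed in as a parameter.

```python
HIGH_QUALITY_THRESHOLD = 100  # citations, following the definition above
KEPT_TYPES = {"Article", "Editorial Material", "Proceedings Paper", "Review"}


def prepare(records, last_year):
    """Keep allowed document types published up to last_year and add a binary label."""
    prepared = []
    for rec in records:
        if rec["doc_type"] not in KEPT_TYPES or rec["year"] > last_year:
            continue
        labeled = dict(rec)  # copy so the input records are not modified
        labeled["quality"] = 1 if rec["citations"] >= HIGH_QUALITY_THRESHOLD else 0
        prepared.append(labeled)
    return prepared


# Illustrative records only; not taken from the dataset.
sample = [
    {"doc_type": "Article", "year": 1995, "citations": 250},
    {"doc_type": "Review", "year": 2001, "citations": 40},
    {"doc_type": "Letter", "year": 1990, "citations": 300},   # dropped: document type
    {"doc_type": "Article", "year": 2009, "citations": 120},  # dropped: too recent
]
rows = prepare(sample, last_year=2005)
print([r["quality"] for r in rows])  # [1, 0]
```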

Once we had identified the dataset to be studied, we created six logistic regression models to review the data. The first of these models is called the standard model and included the following factors: years since publication, number of references, number of authors, number of pages, and document type. We called the second one the standard + name model; it adds the journal name to the standard model. We made sure to isolate the journal name from the standard model because of the high likelihood that it would dominate the model.

We then created four conceptual phrase models based on the text contained in the title, author keyword and ISI keyword fields in the dataset. We used text mining to identify the most frequently occurring terms and a computational linguistic method to group the terms into related conceptual phrases. To create the remaining four models we added each of the conceptual phrase lists to the original standard model. See Table 6 for a description of all six models.

Finally we ran all six models through a logistic regression to determine the significant factors correlated to high quality articles. The remainder of this portion of the paper will describe the procedure and the results of these analyses in detail.

Table 6 - Description of Six Regression Models

Model Name                        Description
Standard Model                    Includes years since publication, number of references, number of authors, number of pages, and type of document
Standard + Name                   Standard model plus journal name
Standard + Title                  Standard model plus conceptual phrases from article title
Standard + Author Keyword         Standard model plus conceptual phrases from author keyword
Standard + ISI Keyword            Standard model plus conceptual phrases from ISI keyword
Standard + Title + Author + ISI   Standard model plus conceptual phrases from article title, author keyword and ISI keyword

Advanced statistical analysis of high-quality MIS articles

Having defined high-quality MIS articles, we next use statistical tools to discover the indicators of high-quality articles. This question can be answered with a regression analysis. Logistic regression is well suited to a binary dependent variable and provides a way to show the importance of different variables. The dependent variable in this task is the binary variable quality, which takes the value 1 or 0. Several groups of variables are entered into several logit models; the pseudo R-square of each model is calculated, and the coefficients of the variables are used to explain the factors correlated with high-quality MIS articles.
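To make the modeling step concrete, here is a minimal sketch of fitting a logit model with an intercept and a single predictor by Newton-Raphson. It is not the procedure used for the results in this section, which were obtained from a statistics package with many more variables; the data values and function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(xs, ys, iterations=25):
    """Maximum-likelihood fit of P(y=1) = sigmoid(b0 + b1*x) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p          # gradient with respect to b0
            g1 += (y - p) * x    # gradient with respect to b1
            h00 += w             # entries of the 2x2 information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * step = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def log_likelihood(b0, b1, xs, ys):
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


# Hypothetical data: one predictor and the binary quality label (not separable).
x_values = [1, 2, 3, 4, 5, 6]
quality = [0, 0, 1, 0, 1, 1]
b0, b1 = fit_logit(x_values, quality)
print(b1 > 0)  # True: larger x goes with higher odds of quality = 1
```

A positive coefficient means that the odds of an article being high quality rise with the predictor; the gain in log likelihood over the intercept-only model is what the pseudo R-square measures summarize.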

The 6 Basic Variables

There are some explicit variables we can use to build the logit models. Year is a discrete variable giving the number of years since the article was first published. Number of references is a discrete variable giving the number of references cited in the article by the authors. Number of pages is a discrete variable giving the length of the article in pages. Number of authors is a discrete variable giving the number of authors of the article. Document type is a categorical variable indicating whether the item is a research article, editorial material, a proceedings paper or a review. These basic variables can serve as independent variables in our logit model, but they cannot explain much of the variance. Journal name is a categorical variable with eight values (the names of the eight MIS journals); it is an important variable that shows in which journal the article was published.

Table 7 - The 6 Basic variables Used in the Logit Model

Variable                                  Type         Distribution or Values
Number of years since article published   Numeric      Min=5, Max=32, Mean=23.22, StD=6.6426
Number of references                      Numeric      Min=0, Max=411, Mean=33.252, StD=27.630
Number of pages                           Numeric      Min=1, Max=77, Mean=14.400, StD=7.795
Number of authors                         Numeric      Min=1, Max=17, Mean=2.166, StD=1.058
Document type                             Categorical  {Article, Editorial Material, Proceedings Paper, Review}
Journal name                              Categorical  {Information & Management, Decision Support Systems, Information Systems, MIS Quarterly, European Journal of Information Systems, Information Systems Research, Journal of Management Information Systems, Journal of Strategic Information Systems}

The Generation of Textual Variables

Since we have the title, author keyword and ISI keyword fields for each article, we can make good use of them by converting the unstructured textual data into structured variables. Converting the title strings into structured variables takes several steps. In the first step, we use a statistical natural language processing method called term extraction to extract frequently used terms from the title field. We extracted more than 1000 terms from the titles. This list is too long to be useful, so we grouped terms with close meanings into a higher-level semantic unit that we call a conceptual phrase. The computational linguistic method used here is called the lexical series algorithm, which generates higher-level semantic units by grouping terms with similar meanings together.

Table 8 shows that "innovation adoption" and other terms such as "adoption of technology" are in the same group, called "adoption", meaning that they are all about adoption. With this technique we generated 100 conceptual phrases from the titles of all 4745 MIS articles, 50 from author keywords and 50 from ISI keywords.

Table 8 - Terms that are within the Conceptual Phrase Adoption

Terms                              Number of Words
Adoption                           1
Innovation adoption                2
Electronic marketplace adoption    3
Electronic billing adoption        3
EIS adoption                       2
EDI adoption                       2
E-commerce Adoption                2
Adoption of technology             3
Adoption of online                 3
Adoption of inter-organizational   3
Adoption of client-server          2
Adoption of determinants           3
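One way to picture how a conceptual phrase becomes a model variable is the sketch below, which turns a title into one 0/1 indicator per conceptual phrase. It is a simplified stand-in using substring matching and is not the lexical series algorithm; the binary coding, the function name and the "outsourcing" group are assumptions made for illustration, while the "adoption" terms are a few of those listed in Table 8.

```python
# Conceptual phrase -> member terms. "adoption" uses terms from Table 8;
# "outsourcing" is a hypothetical second group added for contrast.
CONCEPTUAL_PHRASES = {
    "adoption": ["innovation adoption", "edi adoption", "adoption of technology", "adoption"],
    "outsourcing": ["it outsourcing", "outsourcing"],
}


def phrase_indicators(title, phrases=CONCEPTUAL_PHRASES):
    """Return {conceptual phrase: 1 or 0}: 1 if any member term occurs in the title."""
    text = title.lower()
    return {
        phrase: int(any(term in text for term in terms))
        for phrase, terms in phrases.items()
    }


# Hypothetical title, for illustration only.
print(phrase_indicators("Determinants of EDI Adoption in Small Firms"))
# {'adoption': 1, 'outsourcing': 0}
```

Each indicator can then enter a logit model as one independent variable alongside the basic variables.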

The results of Logistic Regression Analyses

To show the importance of different variables for high-quality MIS articles, we conducted six logistic regression analyses. Table 9 summarizes the six logit models, in which different combinations of the variables are entered as the independent variables.

Table 9 - The Variables used in Logit Models

Model                             -2 Log/df     Cox & Snell   Nagelkerke
Standard Model                    176.731/7     .037          .135
Standard + Name                   407.946/14    .082          .304
Standard + Title                  405.301/107   .082          .302
Standard + Author Keyword         307.701/57    .063          .232
Standard + ISI Keyword            328.428/57    .067          .247
Standard + Title + Author + ISI   659.500/207   .130          .479

Comparing the Nagelkerke R square of the logistic models shows how the different models perform. The standard model has an R square of 0.135, which is the lowest. The standard + name model has an R square of 0.304, much higher than that of the standard model. This means that the name of the journal can explain a lot about why an article received many citations and was selected as a high-quality article. It is common sense that high-quality journals only accept high-quality articles. In the MIS field, the articles published in MIS Quarterly and Information Systems Research receive many more citations than those published in other journals.
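The pseudo R-square values in Table 9 can be related to the likelihood statistics with the standard formulas. The sketch below rests on two assumptions that are not stated explicitly in the table: that the first number in the "-2 Log/df" column is the model chi-square (the reduction in -2 log likelihood relative to the intercept-only model), and that the sample size is the 4745 articles left after filtering by document type. The Nagelkerke function needs the log likelihood of the intercept-only model, which is not reported here, so it is shown without a check against the table.

```python
import math


def cox_snell_r2(model_chi_square, n):
    """Cox & Snell pseudo R-square from the model chi-square and sample size."""
    return 1.0 - math.exp(-model_chi_square / n)


def nagelkerke_r2(model_chi_square, null_log_likelihood, n):
    """Nagelkerke R-square: Cox & Snell divided by its maximum attainable value."""
    max_r2 = 1.0 - math.exp(2.0 * null_log_likelihood / n)
    return cox_snell_r2(model_chi_square, n) / max_r2


N_ARTICLES = 4745  # assumption: sample size after document-type filtering

# (model chi-square, Cox & Snell value reported in Table 9)
table9 = {
    "Standard Model": (176.731, 0.037),
    "Standard + Name": (407.946, 0.082),
    "Standard + Title": (405.301, 0.082),
    "Standard + Author Keyword": (307.701, 0.063),
    "Standard + ISI Keyword": (328.428, 0.067),
    "Standard + Title + Author + ISI": (659.500, 0.130),
}

for name, (chi_square, reported) in table9.items():
    computed = cox_snell_r2(chi_square, N_ARTICLES)
    print(f"{name}: computed {computed:.3f}, reported {reported:.3f}")
```

Under these assumptions the computed Cox & Snell values agree with the reported ones to three decimals for all six models.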

The next step is to add the journal name to the model, which we label the standard + name model. The result shows that the six independent variables in the standard + name model reduce the -2 log likelihood statistic by 407.946 (degrees of freedom = 14, sig. = 0.0). As to the R squares, the Cox and Snell R square is 0.082 and the Nagelkerke R square is 0.304, both significantly higher than those of the standard model. The estimated coefficients and the p-values of all the variables are shown in the following table.

Table 10 - The Logistic Regression standard + name model (Cox & Snell =.082; Nagelkerke = .304)

Variable                                    Coefficient     Std. Error   Wald     Sig.
Intercept                                   -3.12           .631         24.322   .000
Number of References                        .018            .003         27.822   .000
Number of Pages                             .002            .012         .029     .866
Number of Authors                           .079            .084         .879     .348
Years since published                       .002            .017         .009     .923
Article Document Type                       .458            .365         1.573    .210
Editorial Document Type                     -1.676          1.111        2.273    .132
Proceedings Document Type                   .268            .527         .258     .612
Review Document Type                        0 (benchmark)
Decision Support Systems                    -3.269          .439         55.545   .000
European Journal of Information Systems     -4.078          1.022        15.913   .000
Information & Management                    -2.827          .334         71.766   .000
Information Systems                         -2.886          .403         51.335   .000
Information Systems Research                -.506           .233         4.696    .030
Journal of Management Information Systems   -1.719          .355         23.511   .000
Journal of Strategic Information Systems    -3.687          1.014        13.227   .000
MIS Quarterly                               0 (benchmark)
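The coefficients in Table 10 are on the log-odds scale, so exponentiating a coefficient gives the odds of being a high-quality article relative to the benchmark category, holding the other variables fixed. The sketch below applies this standard conversion to two rows of the table, together with the usual Wald statistic (coefficient divided by its standard error, squared). The function names are ours, and small differences from the reported Wald values are to be expected because the table entries are rounded.

```python
import math


def odds_ratio(coefficient):
    """Odds relative to the benchmark category for a logit coefficient."""
    return math.exp(coefficient)


def wald(coefficient, std_error):
    """Wald chi-square statistic for a single coefficient."""
    return (coefficient / std_error) ** 2


# (coefficient, standard error, reported Wald) taken from Table 10
rows10 = {
    "Decision Support Systems": (-3.269, 0.439, 55.545),
    "Information Systems Research": (-0.506, 0.233, 4.696),
}

for journal, (coef, se, reported_wald) in rows10.items():
    print(
        f"{journal}: odds ratio vs MIS Quarterly = {odds_ratio(coef):.3f}, "
        f"Wald = {wald(coef, se):.2f} (reported {reported_wald})"
    )
```

An odds ratio well below 1 means that, other variables being equal, an article in that journal has much lower odds of reaching 100 citations than an article in MIS Quarterly.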

The result of logistic regression for the standard + name model objectively generates an answer to the question What is the best journal in the area of MIS. In the model category MIS Quarterly is selected as the benchmark category (with coefficient of 0). We can find that all the other categories have significant (p-value