Authorship Attribution Using Committee Machines with k-nearest Neighbors Rated Voting

Ali Osman Kusakci

A. O. Kusakci is with the Industrial Engineering Department, International University of Sarajevo, Hrasnicka Cesta 15, 71210, Bosnia and Herzegovina (e-mail: [email protected], [email protected]).

Abstract—Authorship attribution, namely the determination of the author of a text, may become an extraordinarily complex and sensitive job due to its relatively difficult feature extraction phase and highly nonlinear nature. This paper proposes a classification tool using committee machines consisting of multilayered perceptron neural networks (MLPs) to identify the author of a text. Each expert is an individual MLP learning a complex input-output relation composed of 14 lexical, stylometric attributes extracted from the corpus. The resulting mapping after training is used to identify German-language texts written by two different authors. Unlike other committee-based classification tools, the individual answers of the experts are combined with a novel voting method, k-nearest neighbors rated voting. The proposed method shows very promising results when benchmarked against the simple majority voting technique.

Index Terms—Artificial neural networks, author identification, committee machines, k-nearest neighbors rated voting

I. INTRODUCTION

STYLOMETRY, the statistical analysis of writing style, relies mainly on the assumption that individuals have distinctive ways of writing and that this writing style cannot be manipulated consciously [1]. Based on this assumption, a long history of linguistic and stylistic investigation has been devoted to author identification. In recent years, the number of stylometry-based author identification applications has grown in areas such as criminal law, civil law, and computer security, as part of a broader area of intelligent identification technologies including cryptographic signatures, intrusion detection systems, spam mail detection, and others [2]-[4].

Authorship attribution may become an extraordinarily complex and sensitive job due to its computational demand and the legal consequences it may entail. In broad terms, it can be considered a typical classification problem, where a set of text documents with known authors is used for learning and the aim is to determine the corresponding author of an anonymous text. A more specific and precise definition of stylometry in [2] implies that it is more about distinguishing the styles of two authors. This two-author approach can certainly be extended to the case of more than two authors with some effort.


In contrast to other classification tasks, the main problem in authorship attribution studies is that it is not clear which features of a text should be extracted to identify an author correctly. Researchers have proposed feature sets covering all levels of language, including even non-linguistic features such as the layout of a document [2]. Early stylometric studies introduced the idea of counting word and sentence lengths [5]. Other features that have been employed include various measures of vocabulary richness and lexical repetition, and word frequency distributions [6].

So-called function words are another widely used feature set. Mosteller and Wallace [7] counted the frequency of words like ‘while’ and ‘upon’, while Burrows [8], [9] proposed to use sets of more than fifty common high-frequency words and conducted a version of principal component analysis on the data. Reference [2] notes that one reason function words perform well is their topic independence: function words tend to be relatively meaningless in themselves and simply describe relationships between content words, i.e., syntax. More precisely, a person’s preferred syntactic constructions can also be a cue to her/his authorship.

Recent research mainly concentrates on statistical and artificial intelligence techniques, including artificial neural networks (ANN) [10], [11], support vector machines (SVM) [1], decision trees [12], and many other classification tools [13].

Tweedie et al. [10] applied multi-layer perceptrons to identify the authors of the disputed Federalist Papers. As input variables they used the frequencies of eleven function words and designed a network consisting of three hidden and two output neurons. The results were consistent with the classification results of other researchers employing other methods.

Tsimboukakis and Tambouratzis [11] employed neural networks and support vector machines on a dataset comprising texts from the minutes of the Greek Parliament. They extracted 85 features used as distinctive characteristics of five speakers. The highest classification accuracy achieved was 90.5%; the MLP resulted in slightly lower classification accuracy than the SVM.

Reference [14] used the learning vector quantization (LVQ) technique to classify 23 disputed Tamil articles. They extracted 24 function words as input variables from 32 articles and tried to attribute authorship of the disputed articles from the same period. Based on the results of their study, LVQ seemed to be a powerful technique for stylometric analysis.


In this paper, an application of committee machines with MLP neural network experts to the authorship identification problem is presented. A new method, the k-nearest neighbors rated voting (k-NNRV) algorithm, is utilized instead of simple majority voting when combining the responses of the parallel networks. This method aims to account for the varying specialization power of the experts on different regions of the feature space. Above all, this approach recognizes that one author may use different syntactic styles, by employing parallel experts studying partially separated parts of the input space.

The texts studied are literary works of two world-renowned writers, the German-Swiss Hermann Hesse and the Austrian Stefan Zweig. 14 lexical and syntactical attributes are selected to describe the corpus. To construct the committee machine, the following steps will be applied:
1) To reduce the dimensionality of the feature space, the dataset will be pre-processed and normalized with min/max mapping.
2) Three MLPs with different initial weights will be designed.
3) An ensemble of experts (a committee) will be constructed by employing the k-NNRV algorithm for each test data sample to be estimated.

After this introduction, the paper is organized as follows: a short description of neural networks, together with the structure of the committee machine and the proposed k-NNRV method, is presented in Section II. Section III is devoted to a detailed description of the dataset and the methodology, with the configurations of the proposed algorithm. Section IV presents the obtained results together with a discussion, and Section V concludes the paper.

II. NEURAL NETWORKS

An ANN architecture is capable of learning from a set of examples and recognizing patterns, with the property that the gained knowledge generalizes successfully to other patterns from the same domain [15]. The generalization power of ANNs in the presence of extreme values and highly nonlinear patterns has been widely recognized [16], [17]. The topology of a feed-forward NN with the back-propagation algorithm will not be discussed here due to page limitations; readers are assumed to be familiar with the basic terminology of this domain. Some guidelines followed in this paper while designing the ANN are discussed briefly below.

A. Basic Design Guidelines

A good choice of the initial values of the synaptic weights can be of tremendous help in designing a well-behaving network. Both large and small values should be avoided in the initialization stage [18]. As a rule of thumb, synaptic weights can be picked from a uniform distribution with zero mean and a variance that makes the standard deviation of the induced local field of the neurons lie in the region between the linear and saturated parts of the sigmoid activation function. One practical approach which leads to a good choice of initial weights is the Nguyen-Widrow method [19]. It divides the input space into small intervals and chooses the values so that the active region of each neuron in the layer is approximately evenly distributed across the layer's input space [19].

Assume that the range of the input signals and the non-saturated region of the activation function is (-1, 1). For an $m \times (p+1)$ weight matrix $w$ with the bias terms in its first column, the magnitude of the synaptic weights is given as follows:

$$G = 0.7\, m^{1/p} \qquad (1)$$

where $m$ stands for the number of hidden neurons of a network fed with $p$ input variables for each input sample. To initialize the weights, an $m \times p$ dimensional matrix $A$ has to be randomly generated. Based on $A$, $m$ normalized random unit vectors $a_j$ are given by

$$a_j = \frac{A(j,:)}{\lVert A(j,:) \rVert}. \qquad (2)$$

By using $a_j$, the synaptic weights (disregarding the bias term) can be initialized as

$$w(j, 2{:}p{+}1) = G\, a_j \qquad (3)$$

for $j = 1, 2, \ldots, m$. As seen in (3), the synaptic weights fill columns 2 through $p+1$; the weights for the bias are inserted in the first column as

$$w(j,1) = \operatorname{sgn}(w(j,2)) \cdot G \cdot \beta_j \qquad (4)$$

where $\beta_j = 2(j-1)/(m-1) - 1$ for $j = 1, 2, \ldots, m$.
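For concreteness, the following is a minimal NumPy sketch of the initialization in (1)-(4). The function name, the uniform draw used to generate $A$, and the restriction to a single hidden layer are illustrative assumptions, and $m \ge 2$ is assumed so that $\beta_j$ is well defined.

```python
import numpy as np

def nguyen_widrow_init(p, m, rng=None):
    """Nguyen-Widrow initialization following Eqs. (1)-(4): returns an
    m x (p+1) weight matrix w with the bias terms in its first column.
    Assumes inputs and activations live in (-1, 1) and m >= 2."""
    rng = np.random.default_rng() if rng is None else rng
    G = 0.7 * m ** (1.0 / p)                           # Eq. (1)

    A = rng.uniform(-1.0, 1.0, size=(m, p))            # random m x p matrix
    a = A / np.linalg.norm(A, axis=1, keepdims=True)   # Eq. (2): unit row vectors a_j

    w = np.empty((m, p + 1))
    w[:, 1:] = G * a                                   # Eq. (3): synaptic weights
    j = np.arange(1, m + 1)
    beta = 2.0 * (j - 1) / (m - 1) - 1.0               # beta_j evenly spaced in [-1, 1]
    w[:, 0] = np.sign(w[:, 1]) * G * beta              # Eq. (4): bias terms
    return w
```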

sufficient sensitivity and robustness, the number of input and output neurons are generally to be seen as external specification of the NN [20]. However, performance of a NN model strongly depends on some features including number of input neurons and statistical specifications of input data [18]. To make a NN perform better the information content has to be maximized by choosing the training example on the basis that its information content is the largest possible for the task to be solved [21].

The number of hidden neurons is another factor to be specified in the model. Although processing time restricts the number of neurons in the hidden layer, there is no binding constraint that determines it. Thus, this number has to be determined experimentally.

Another essential stage of designing a NN with good generalization features is pre-processing the inputs. Each input variable should be pre-processed so that its mean value is close to zero, or else small compared to its standard deviation [21]. An easy normalization strategy is to map the values of an input variable between -1 and 1 with the so-called min/max mapping. This operation results in a mean value close to zero, as suggested above.
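As an illustration, here is a minimal sketch of such a min/max mapping; the column-wise form and the assumption that no input variable is constant are ours, as the paper does not spell out the formula.

```python
import numpy as np

def minmax_map(X):
    """Map each column (input variable) of X linearly onto [-1, 1].
    Assumes no column is constant, so x_max > x_min everywhere."""
    x_min = X.min(axis=0)    # per-variable minimum over all samples
    x_max = X.max(axis=0)    # per-variable maximum over all samples
    return 2.0 * (X - x_min) / (x_max - x_min) - 1.0
```

Applied to the 14-attribute samples, this centers each input near zero, in line with the guideline above.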

As a final word on NNs, the whole structure of the network, including the number of neurons and the size and level of representativeness of the dataset, strongly affects the performance of the network, and the whole structure has to be tailored very carefully [22]. That is why the design of artificial NNs is both a science and an art.

B. Committee Machines with k-NNRV

Unlike other AI tools, committee machines allow the learning task to be distributed among a number of experts, and accordingly the input space into a set of subsets. The knowledge acquired by the experts can then be combined into a universal decision that is assumed to be superior to that attainable by any of them acting alone [18]. For the overall decision, mostly a simple majority voting technique is used. However, simple voting does not take into consideration that each expert may have a different area of expertise. To overcome this shortcoming, this paper proposes an alternative voting method, k-nearest neighbors rated voting (k-NNRV), which accounts for the varying specialization power of each expert on different regions of the feature space. The method can be summarized in the following steps:

(i) Obtaining the k-nearest neighbors of a test point: The local neighborhood of a point $q_i$ from the test set is defined as a subset $\Omega_i = \{x_1, x_2, \ldots, x_k\}$ of the training dataset that lies in the immediate neighborhood of that point. In other words, $\Omega_i$ is formed by calculating the Euclidean distances $\lVert q_i - x_j \rVert$ over all $N$ training data samples and taking the $k$ points with the shortest distances as the k-nearest neighbors of $q_i$.

(ii) Computing the accuracy of the experts: Given $\{Ex_1, Ex_2, \ldots, Ex_L\}$, the set of $L$ experts composing the committee, the test samples are fed to each committee member. The accuracy $ACC_{i,l}$ of an expert $Ex_l$ in the neighborhood of $q_i$ can be formulated as the number of correct classifications when $Ex_l$ is fed with $\Omega_i$.

(iii) Computing the overall decision: To generate a performance measure based on the accuracy numbers, two approaches can be employed.

(a) Collaborative method: The rating $r_{i,l}$ of $Ex_l$ for the $i$th test sample is given as:

$$r_{i,l} = ACC_{i,l} \Big/ \sum_{l=1}^{L} ACC_{i,l}. \qquad (5)$$

Let $y_{i,l}$ denote the response of the $l$th expert to the $i$th data point. The overall output of the system is a weighted average of the responses given by each expert:

$$y_i = \sum_{l=1}^{L} r_{i,l}\, y_{i,l}. \qquad (6)$$

(b) Competitive method: The rating $r_{i,l}$ for $Ex_l$ at the $i$th sample is defined in the same manner as in (5). Unlike the former method, only the winning expert's decision is accepted as the overall response. More precisely, the expert with $\max_l(r_{i,l})$ wins the competition; in mathematical terms, $y_i = y_{i,l^*}$ where $r_{i,l^*} = \max_l(r_{i,l})$.
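A compact sketch of steps (i)-(iii) follows, assuming binary class labels coded as -1/+1 and experts exposed as prediction callables; both conventions are interface assumptions, not something prescribed by the paper.

```python
import numpy as np

def knnrv_predict(q, X_train, y_train, experts, k=5, mode="competitive"):
    """k-NNRV decision for one test point q, following steps (i)-(iii).
    experts is a list of callables; expert(X) returns labels in {-1, +1}."""
    # (i) k-nearest neighbors of q in the training set (Euclidean distance)
    d = np.linalg.norm(X_train - q, axis=1)
    idx = np.argsort(d)[:k]
    X_nbr, y_nbr = X_train[idx], y_train[idx]

    # (ii) local accuracy ACC_{i,l}: correct classifications on the neighborhood
    acc = np.array([np.sum(ex(X_nbr) == y_nbr) for ex in experts], dtype=float)

    # (iii) ratings r_{i,l} as in Eq. (5); fall back to equal ratings if all fail
    L = len(experts)
    r = acc / acc.sum() if acc.sum() > 0 else np.full(L, 1.0 / L)
    responses = np.array([ex(q[None, :])[0] for ex in experts])

    if mode == "collaborative":
        return np.sign(r @ responses)     # Eq. (6), thresholded back to a label
    return responses[np.argmax(r)]        # competitive: winner takes all
```

Note that the neighborhood is drawn from the training set rather than the test set precisely because the true labels of the neighbors must be known to score the experts.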

III. DATASET AND METHODOLOGY

In this research, texts of the writers Hermann Hesse and Stefan Zweig are used as the corpus. Three novels from each author are selected, and 14 different attributes are extracted from their writings to determine whether either of the considered writers can be recognized as the author of a paragraph taken from their books. The corpus is given in Table I.

Whether the author of a novel is reflected in its lexical and syntactic characteristics is the question yet to be answered. To extract a content-independent and reliable training set, samples from two different books of each author were included, while in the test phase a dataset is generated, for each author, from one book not previously seen by the committee machine.

This study proposes a three-unit authorship attribution scheme:
1) a feature extraction unit,
2) a committee machine (parallel running experts) unit,
3) a k-NNRV unit.

A. Feature Extraction

Establishing features that work as effective discriminators of the texts under study is one of the main challenges in research on authorship analysis. Therefore, a sufficient number of descriptors must be selected, together with a large data pool.

For each novelist, 2000 paragraphs are randomly selected from two of their books, while the third book of each is used to generate test data with 300 samples per author. Thus, a training data pool with 4000 data samples is prepared for teaching the individual MLPs, and 600 samples are used to test the estimation power of the committee machine.

Before generating the dataset, the books were pre-processed to eliminate punctuation and capitalization effects, so that, for example, “und”, “und?” and “Und” are counted as the same token. In each paragraph, 41 attributes are counted, some of which are used to determine indirect features of the writers. For example, 36 function words are counted in total, but only 6 of them are included in the dataset directly, whereas the remaining statistics are combined and exploited to generate second-order attributes, such as a cumulative running sum for the number of words per paragraph and the total number of function words.
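A minimal sketch of this pre-processing and counting step is given below. The regular-expression tokenizer and the exact attribute names are assumptions; of the 41 counted attributes, the paper names only the six retained function words, which are visible in Table II.

```python
import re
from collections import Counter

FUNCTION_WORDS = ["es", "sich", "nicht", "wie", "auch", "und"]  # the six kept in the dataset

def paragraph_features(paragraph):
    """Lower-case and strip punctuation so that "und", "und?" and "Und"
    collapse to one token, then count a few of the lexical attributes."""
    comma_count = paragraph.count(",")                    # counted before stripping
    tokens = re.findall(r"[a-zäöüß]+", paragraph.lower())
    counts = Counter(tokens)
    feats = {
        "word_count": len(tokens),
        "char_count": sum(len(t) for t in tokens),        # letters only, an assumption
        "comma_count": comma_count,
    }
    feats["char_per_word"] = feats["char_count"] / max(feats["word_count"], 1)
    for w in FUNCTION_WORDS:
        feats[w + "_count"] = counts[w]
    return feats
```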

The descriptive statistics for these 14 textual descriptors are given in Table II.

TABLE I
LIST OF BOOKS USED

                Training and 1st Test Set                      2nd Test Set
                Book 1                 Book 2                  Book 3
Stefan Zweig    Ungeduld des Herzens   Rachel rechtet mit Gott Joseph Fouché: Bildnis eines politischen Menschen
Hermann Hesse   Das Glasperlenspiel    Freunde                 Gertrud


As can be seen from Table II, Hesse prefers long paragraphs, while Zweig’s data exhibit strong deviations in some attributes when compared with Hesse’s style, including the number of words and the counts of “auch”, “und”, “nicht”, and “wie”.

B. Committee Machine

In this study, three different MLPs are generated to form a committee of experts. The MLPs are trained with 500 data samples each, randomly selected from the training data pool of 4000 sample paragraphs, each paragraph having 14 descriptors. Then, 1000 samples randomly selected from the training data pool are stored separately to test each expert individually before building the ensemble of experts. This step is called the first testing stage. The architecture of the MLPs, obtained after several runs of experiments, is described in Table III.
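As an illustration of this setup, the sketch below trains three MLPs on random 500-sample draws from the pool using scikit-learn. This is an approximation of the Table III configuration, not a reproduction: MLPClassifier optimizes log-loss rather than an MSE goal and does not use Nguyen-Widrow initialization.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def build_committee(X_pool, y_pool, n_experts=3, n_train=500, seed=0):
    """Train n_experts MLPs, each on a random n_train-sample draw from
    the training pool, roughly following the Table III settings."""
    rng = np.random.default_rng(seed)
    experts = []
    for _ in range(n_experts):
        idx = rng.choice(len(X_pool), size=n_train, replace=False)
        mlp = MLPClassifier(hidden_layer_sizes=(20,),   # 20 hidden neurons
                            activation="tanh",          # tanh sigmoid hidden layer
                            learning_rate_init=0.004,   # learning parameter
                            max_iter=1500)              # max number of epochs
        mlp.fit(X_pool[idx], y_pool[idx])
        experts.append(mlp.predict)  # callable interface used by knnrv_predict above
    return experts
```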

After each MLP is tested individually, the committee of experts is set up, and the third books of both authors are fed to the networks to assess the generalization power of the committee machine. This stage is called the second testing stage. The outputs of the MLPs are evaluated with the k-NNRV algorithm described in Section II. First, the k-nearest neighbors of each test point are determined from the training set and sent to the three experts, while the ratios of correct classifications out of the k neighbors are recorded to grade the level of expertise of each committee member in the region of the specific test point. This ratio can be interpreted as the reliability of the estimation of a specific expert for that test point. The ratings are utilized in two ways: the collaborative method multiplies the answers by the ratings and adds them linearly to obtain the overall estimation of the committee, whereas the competitive method designates the expert with the highest rating as the “winner” and bases the decision on the winning expert’s response.

Lastly, the proposed collaborative and competitive approaches are benchmarked against simple majority voting to demonstrate the increased prediction power of the committee machine.
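The benchmark itself is straightforward; under the same -1/+1 label convention assumed earlier, and with an odd number of experts so that no ties occur, it reduces to a sign:

```python
import numpy as np

def majority_vote(responses):
    """Simple majority vote over the experts' -1/+1 labels."""
    return np.sign(np.sum(responses))
```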

Another question this study aims to answer is whether the number of neighbors influences the accuracy of the committee. To test this, two different k values, 5 and 10, are tested.

For the comparison of results, two measures are employed: (i) the mean squared error (MSE), as defined in [18]; and (ii) the F-measure [23], formulated in [24] as

$$F = \frac{2PR}{P + R} \qquad (7)$$

where $P$ and $R$ stand for precision and recall, respectively, with equal weights assigned to both ratios.
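A short sketch of both measures follows; treating one author at a time as the positive class, per row of Tables IV and V, is our assumption about how the per-author figures were computed.

```python
import numpy as np

def precision_recall_f(y_true, y_pred, positive):
    """Precision, recall, and F-measure (Eq. (7)) with one author as positive class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    P = tp / (tp + fp) if tp + fp > 0 else 0.0
    R = tp / (tp + fn) if tp + fn > 0 else 0.0
    F = 2 * P * R / (P + R) if P + R > 0 else 0.0
    return P, R, F

def mse(y_true, y_pred):
    """Mean squared error between target and predicted labels."""
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)
```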

IV. RESULTS

The classification results for each expert at the end of the training and first testing phases are given in Table IV. For each expert, recall, precision, F-measure, and MSE values are depicted. The results in the training stage show that the third expert has the highest overall accuracy, 87.2%, while the second expert reaches 84.9%. Viewing the results on the first test data in Table IV, it can be noticed that all experts reach a good generalization power, as they all performed only slightly worse than in the training stage. Another point worth mentioning is the experts’ slightly differing recall and precision values for each author. This indicates a different specialization of the experts.

TABLE II
DESCRIPTIVE STATISTICS FOR TRAINING AND TEST SET

                    Training and 1st Test Set                2nd Test Set
                    Zweig     Hesse     Zweig     Hesse      Zweig     Hesse     Zweig     Hesse
                    Mean      Mean      Std Dev   Std Dev    Mean      Mean      Std Dev   Std Dev
Char count          470.447   529.14    541.061   585.766    1250.3    354.28    807.482   344.595
Char per word       5.3141    5.435     0.65401   0.6680     5.9601    4.668     0.342262  0.5285
Word count          88.5775   95.812    100.118   103.459    209.67    75.906    135.0682  72.737
Cum run1            2985.26   12727     3170.85   6598.83    -2330.    1501.1    1205.643  811.75
Word per sentence   14.439    24.256    9.2106    15.683     22.229    18.559    6.680943  11.577
Comma count         8.842     9.8655    11.4306   11.574     21.95     6.6066    14.88487  6.5779
es count            0.6365    1.175     1.1619    1.7002     0.8       0.8766    1.094223  1.1886
sich count          0.774     0.8245    1.2616    1.3844     2.36      0.3466    2.156983  0.7034
nicht count         1.064     1.035     1.8004    1.478      1.85      0.9333    2.069244  1.3347
wie count           0.6975    0.5375    1.2941    1.0440     1.056     0.3866    1.282812  0.7869
auch count          0.2015    0.56      0.5357    0.9979     0.53      0.3366    0.798682  0.6255
und count           3.159     4.9925    4.4575    6.2988     7.1666    3.83      5.184769  4.6288
2-3-4-5 sum         13.809    17.20     16.5691   19.312     27.976    12.786    18.65716  12.830
Cum run2            490.28    1848.2    487.420   1134.57    -356.4    258.62    203.2614  137.56

TABLE III
STRUCTURE OF MLPS

Max number of epochs                       1500
Activation function of hidden layer       Tanh sigmoid
Activation function of output layer       Tanh sigmoid
MSE goal                                   0.01
Number of neurons in the hidden layer     20
Number of data in the training data pool  4000
Number of training data for each expert   500
Number of 1st test data for each expert   1000
Number of 2nd test data for committee     600
Learning parameter                         0.004


The results obtained by employing the committee machine with the mentioned voting strategies are given in Table V. As seen from the table, the overall accuracy on the second test data for k-NNRV with 5 and 10 neighbors is above 86%, while the highest overall value, 89.6%, is reached with the competitive voting approach.

Additionally, it can be deduced that the proposed voting algorithm, with both methods, improves the system’s performance by up to 5.2% for Zweig and 4.3% for Hesse when the F-measures are compared with the values of the single-expert case. This clearly shows that a committee composed of three MLPs can improve classification accuracy over any of the experts acting alone. Furthermore, the committee of experts exhibits lower standard deviations in all performance measures. This reflects the trade-off between the variance of the estimation and model complexity.

When benchmarked against simple majority voting, the proposed voting algorithm with both the collaborative and competitive methods demonstrates better prediction performance, with the competitive method superior to the others for all k values. Therefore, this study provides strong evidence that the proposed voting method is effective.

Considering the lower standard deviations as k increases, the model is more stable when constructed with more neighbors. However, increasing the number of neighbors does not necessarily result in an immediate improvement in the F-measure and MSE values for all voting methods, as can be seen from Table V. Thus, based on the results of the experiments, there is no direct relation between the range of the neighborhood and the prediction performance. However, this statement needs to be verified on different datasets and with higher/lower k values.

Another very surprising implication of this study is that the committee does not contribute to an improvement of the recall values for the paragraphs authored by Hesse. On the other hand, building a committee machine makes a significant difference of up to 11.7% in the recall of Zweig’s paragraphs. This in turn increases the precision ratios on the Hesse data. Hence, an overall improvement in the F-measures for both authors is evident.

V. CONCLUSION

Despite its long history, the authorship attribution domain is still a promising research area, since researchers have so far not come to an agreement on which artificial intelligence tools and which feature sets should be used. Conventional works assume that each author has a unique, inimitable style and try to model it. However, one author may change her/his way of writing over the course of time. This paper on author identification shows that parallel running experts with the proposed k-NNRV method can be a very efficient tool to deal with such variations in style.

Needless to say, this study provides strong evidence of a good solution for the selected dataset, but it is not the sole and comprehensive solution in this domain. The findings would have to be verified against a much wider set of books, as other books by the same authors may exhibit different characteristics. Combining neural networks with fuzzy decision making in the combination stage of the experts’ answers, by exploiting the ratings as fuzzy numbers, seems to promise good results. Another question yet to be answered is how the number of experts in the committee affects the prediction power of the system. Adding more experts to the committee, with propagation of unlearned data to the next expert, may improve prediction accuracy, as has been shown in [25] for a different domain. Additionally, focusing on the feature selection process, for example by employing principal component analysis on raw data consisting of a high number of attributes, would also help to recognize the more informative features among many others.

TABLE IV
CLASSIFICATION RESULTS FROM AN AVERAGE OF 20 TRIALS FOR THREE EXPERTS ON TRAINING AND 1ST TEST DATASET

Training            R %         P %         F %         MSE
Exp.1   Zweig       87.6±10.2   86.7±7.1    86.4±3.6
        Hesse       85.2±10.1   88.7±7.7    86.1±3.3
        Overall                             86.4±2.9    0.56±0.32
Exp.2   Zweig       84.6±10     85.1±5      84.2±3.8
        Hesse       84.3±8.1    85.9±8      84.4±2.3
        Overall                             84.9±3.3    0.62±0.1
Exp.3   Zweig       87.1±2.5    87.4±2.4    87.2±2.4
        Hesse       87.4±2.4    87.1±2.4    87.3±2.4
        Overall                             87.2±2.3    0.51±0.09

Test 1              R %         P %         F %         MSE
Exp.1   Zweig       83.8±17.6   86.4±10.9   83.0±9.9
        Hesse       84.3±9.8    85.9±7.1    84.3±3.3
        Overall                             84.1±5.4    0.63±0.21
Exp.2   Zweig       82.0±11.4   83.9±4.7    82.3±4.2
        Hesse       83.4±8.0    83.7±8.7    82.8±2.4
        Overall                             82.7±2.9    0.69±0.11
Exp.3   Zweig       84.4±2.3    85.8±1.9    85.1±1.8
        Hesse       86.0±2.1    84.7±2.0    85.3±1.7
        Overall                             85.2±1.7    0.59±0.07

TABLE V
CLASSIFICATION RESULTS FROM AN AVERAGE OF 20 TRIALS FOR THE COMMITTEE ON 2ND TEST SET WITH TWO DIFFERENT VOTING APPROACHES

5-nearest neighbors            R %        P %        F %        MSE
Collaborative    Zweig         92.5±3.6   85.9±3.0   89.0±1.8
Method           Hesse         84.7±4.1   92.0±3.1   88.1±2.2
                 Overall                             88.6±1.9   0.49±0.08
Competitive      Zweig         94.9±3.6   85.9±2.5   90.1±1.5
Method           Hesse         84.2±3.7   94.5±3.6   89.0±1.7
                 Overall                             89.6±1.6   0.41±0.06
Simple Majority  Zweig         87.1±2.5   87.4±2.4   87.2±2.4
Voting           Hesse         87.4±2.4   87.1±2.4   87.3±2.4
                 Overall                             87.3±2.4   0.51±0.10

10-nearest neighbors           R %        P %        F %        MSE
Collaborative    Zweig         93.9±2.6   84.7±3.4   89.0±1.1
Method           Hesse         82.8±4.6   93.3±2.5   87.6±1.8
                 Overall                             88.4±1.4   0.46±0.06
Competitive      Zweig         97.6±2.0   83.9±3.3   90.1±1.2
Method           Hesse         81.0±4.7   97.2±2.1   88.2±2.0
                 Overall                             89.3±1.5   0.42±0.06
Simple Majority  Zweig         90.2±6.1   84.3±2.7   86.9±2.0
Voting           Hesse         82.9±4.2   89.9±5.0   86.0±1.1
                 Overall                             86.5±1.4   0.54±0.06


ACKNOWLEDGMENT

This study was proposed to the author by Prof. Dr. Mehmet Can. The author acknowledges his priceless efforts and encouragement. The author also thanks Amir Yamak for his feature extraction software and Suvad Selman for his valuable comments.

REFERENCES

[1] J. Diederich, J. Kindermann, and E. Leopold, “Authorship attribution with support vector machines,” Applied Intelligence, vol. 19, pp. 109-123, 2003.

[2] P. Juola, “Authorship Attribution,” Foundations and Trends in Information Retrieval, vol. 1, no. 3, pp. 233-334, 2006.

[3] D. R. Prabhakar and M. Basavaraju, “A Novel Method of Spam Mail Detection using Text Based Clustering Approach,” International Journal of Computer Applications, vol. 5, no. 4, pp. 15-25, 2010.

[4] C.-F. Tsai, Y.-F. Hsu, C.-Y. Lin, and W.-Y. Lin, “Intrusion detection by machine learning: A review,” Expert Systems with Applications, vol. 36, no. 10, pp. 11994-12000, Dec. 2009.

[5] T. C. Mendenhall, “The characteristic curves of composition,” Science, vol. 9, no. 216, pp. 237–249, 1887.

[6] H. S. Sichel, “On a Distribution Law for Word Frequencies,” Journal of the American Statistical Association, vol. 70, no. 351, p. 542, Sep. 1975.

[7] F. Mosteller and D. L. Wallace, Inference and Disputed Authorship: The Federalist. Reading: Addison-Wesley, 1964.

[8] J. F. Burrows, “Word patterns and story shapes: The statistical analysis of narrative style,” Literary and linguistic computing, vol. 2, no. 2, pp. 61-70, 1987.

[9] J. F. Burrows, “‘An ocean where each kind. . .’: Statistical analysis and some major determinants of literary style,” Computers and the Humanities, vol. 23, no. 4–5, pp. 309-321, Aug. 1989.

[10] F. J. Tweedie, S. Singh, and D. I. Holmes, “Neural network applications in stylometry: The Federalist Papers,” Computers and the Humanities, vol. 30, no. 1, pp. 1-10, 1996.

[11] N. Tsimboukakis and G. Tambouratzis, “A comparative study on authorship attribution classification tasks using both neural network and statistical methods,” Neural Computing and Applications, vol. 19, no. 4, pp. 573-582, Nov. 2009.

[12] C. Apté, F. Damerau, and S. M. Weiss, “Automated learning of decision rules for text categorization,” ACM Transactions on Information Systems, vol. 12, no. 3, pp. 233-251, Jul. 1994.

[13] T. Tas and A. K. Gorur, “Author Identification for Turkish Texts,” Journal of art and sciences, vol. 7, pp. 151–161, 2007.

[14] M. Bagavandas, A. Hameed, and G. Manimannan, “Neural Computation in Authorship Attribution: The Case of Selected Tamil Articles,” Journal of Quantitative Linguistics, vol. 16, no. 2, pp. 115-131, May 2009.

[15] J. Arifovic and R. Gençay, “Using genetic algorithms to select architecture of a feedforward artificial neural network,” Physica A: Statistical Mechanics and its Applications, vol. 289, no. 3–4, pp. 574-594, Jan. 2001.

[16] T. Kaya, E. Aktas, I. Topcu, and B. Uluengin, “Modeling Toothpaste Brand Choice: An Empirical Comparison of Artificial Neural Networks and Multinomial Probit Model,” International Journal of Computational Intelligence Systems, vol. 3, no. 5, pp. 674–687, 2010.

[17] S. Anbazhagan and N. Kumarappan, “Day-ahead price forecasting in Asia’s first liberalized electricity market using artificial neural networks,” International Journal of Computational Intelligence Systems, vol. 4, no. 4, pp. 476-485, 2011.

[18] S. Haykin, Neural Networks: A comprehensive Foundation. Upper Saddle River, New Jersey: Prentice Hall International, Inc., 1999.

[19] D. Nguyen and B. Widrow, “Improving the learning speed of 2-layer neural networks by choosing initial values of the adaptive weights,” in International Joint Conference on Neural Networks, 1990, pp. 21-26.

[20] “…authorship attribution of literary texts,” International Journal of Applied Mathematics and Informatics, vol. 1, no. 4, pp. 151-158, 2007.

[21] Y. Le Cun, Efficient Learning and Second-Order Methods. 1993.

[22] G. Montagna, M. Morelli, O. Nicrosini, P. Amato, and M. Farina, “Pricing derivatives by path integral and neural networks,” Physica A: Statistical Mechanics and its Applications, vol. 324, no. 1-2, pp. 189-195, Jun. 2003.

[23] C. J. van Rijsbergen, Information retrieval, 2nd ed. London: Butterworths, 1979.

[24] H. T. Ng, W. B. Goh, and K. L. Low, “Feature selection, perceptron learning, and a usability case study for text categorization,” ACM SIGIR Forum, vol. 31, no. SI, pp. 67-73, Dec. 1997.

[25] F. Åström and R. Koker, “A parallel neural network approach to prediction of Parkinson’s Disease,” Expert Systems with Applications, vol. 38, no. 10, pp. 12470-12474, Sep. 2011.
