A New Dimensional Knowledge Discovery for Routing Network using Latent Feature Model

IJIRST –International Journal for Innovative Research in Science & Technology| Volume 2 | Issue 02 | July 2015 ISSN (online): 2349-6010



Mr. Emil George
Department of Computer Science and Engineering
Nehru College of Engineering and Research Center, Pampady, Thrissur, Kerala, India

Ms. Silja Varghese
Department of Computer Science and Engineering
Nehru College of Engineering and Research Center, Pampady, Thrissur, Kerala, India

Abstract

Cyber Physical Systems primarily interact with other systems through data collection, and information collection depends on the field of knowledge discovery. The information comes either from field-based filtering or from collaborative filtering. A cross domain recommendation model filters data based on the rating patterns across domains; its basic function is to seek and retrieve information while reducing repeated records. The model identifies the common rating pattern shared across domains and determines the domain-specific rating patterns in each domain, capturing the similar behavior and relations that exist among the fields. For clustering, the system uses different data sets together with an additional amount of accumulated data. These data sets are compared, filtered by item, and clustered by domain. The clustering process reduces incomplete data and yields more meaningful data. The domain recommendation model integrates knowledge discovery and information transfer as part of the information transfer organization. Most systems retrieve items from a single field, whereas cross domain recommendation provides data by integrating multiple areas with identical details.

Keywords: Cyber Physical System, Cross Domain Recommendation, Nonnegative Matrix Factorization, Knowledge

Discovery and Information Retrieval

_______________________________________________________________________________________________________

I. INTRODUCTION

Nowadays, information handling and sharing are most abundant in the field of e-commerce. This abundance makes it difficult to obtain the right information from suitable sources. Recommender systems offer a useful way to cope with this information overload, and they are an active research field for gathering the most suitable information. Most of these systems operate within a single domain. However, user knowledge acquired in one domain can be clustered and combined with other domains that have similar settings. The cross domain recommendation model is not limited to information sharing; it also provides recommendations drawn from related user ratings in different domains, e.g. recommending not just a movie [2], but also music CDs and books related to that movie. Existing work also offers joint recommendation over multiple domains, but it selects recommendations based only on user preferences, and the sharing is confined to a particular area of interest. The critical tasks are to recover the correlations between user preferences in different domains, to relate item factors to user interests, to design suitable models that integrate these findings, and to develop a framework that encompasses all of these evaluations. Cross domain recommendation may produce more relevant and more precise recommendations than a single-domain model.

Finally, there are schemes that concentrate on combining recommendations for different domains to establish a single knowledge base with standardized features. Whenever a user visits the web site, the system records the user's feedback and uses it as ratings. Based on these ratings, the recommender system predicts personalized information about the items of interest, and the information is partitioned and sorted with suitable approaches. In some situations, however, users are unwilling to supply useful feedback because of a lack of involvement. For instance, a user who has just bought an electronic device may have no plan to purchase another one soon. This causes data sparsity problems. Insufficient or unreliable responses from users are a major limitation of single-domain models. To address such problems, the cross domain recommendation model uses multiple domains with similar reference features. Because information is shared across multiple fields, the cross domain recommendation model collects a vast quantity of data. Most models assume that the collected information is dense for all users and items [4]. An augmented matrix can be constructed by horizontally merging all clustered matrices with similar feedback.

Existing studies of social networks focus on the relationship between two different areas. Such interrelations have had an impact on society and have produced new combinations of related fields. For example, bioinformatics was formed by combining biology with computer science, and medical information processing applies the aggregated results of data mining to medicine, to the benefit of medical science. Cross domain recommendation can be useful wherever similar things from different fields are combined, because the combined solution draws useful information from both fields. Cross domain recommendation works mainly by knowledge prediction, which makes studying the collaboration and reaching a result hard. The main challenge is to recover the patterns of the cross domains. Cross-domain clustering is usually not applied in the traditional single-domain approach [3], partly because suitable relationships are more difficult to obtain in a domain with which one is not familiar. This makes the approach challenging to apply directly without proper training samples and information. Compared with data mining in general, clustering helps researchers to predict the region they need to analyze and to find identical feedback. The cross domain area therefore requires a higher degree of prediction.

II. RELATED WORKS

In the area of medical analysis, an analyst who wants to apply data prediction techniques to detect diseases will find it difficult to identify suitable relationships through data mining. Working across domains may make it difficult to find suitable combinations of terminology from two dissimilar areas. Most domain cluster models are founded on a single field. In this model, the matrix is built from the user-item pairs of the auxiliary matrix. This is the same as the model used for data collection in self-taught learning, which collects an abundant quantity of data from the auxiliary matrix and represents the test data as a linear combination of basis elements. Cross domain recommendation arranges two-sided data as a matrix whose rows and columns are the users and the items. Instead of the array used in the self-taught learning model, the cross domain model uses this matrix form. The code book model is founded on model-based collaborative filtering and the flexible mixture model. Model-based collaborative filtering is likewise founded on two-sided clustering and is defined as

$$P(r, i, j) = P(i)\,P(j)\,P(r \mid u_i, v_j)\,\phi(u_i, v_j),$$

where $u_i$ denotes the user cluster to which the $i$th user belongs, $v_j$ the item cluster to which the $j$th item belongs, and $\phi$ the cluster association parameter. The flexible mixture model is the probabilistic model defined as

$$P(r, i, j) = \sum_{u, v} P(i \mid u)\,P(j \mid v)\,P(r \mid u, v)\,P(u)\,P(v),$$

where $u$ denotes the user cluster, $v$ the item cluster, and $r$ the rating. The flexible mixture model is a fully probabilistic model, whereas the cross domain model is based on clustering items with users.

Work on collaborative filtering has addressed sparsity reduction and similarity enhancement [3] and seeks knowledge within the same domain via user-item clustering, association rules, and item-based reasoning [5], smoothing the auxiliary matrix by filling in the average rating values within each user cluster [6], and finding rating patterns by simultaneously clustering the items and users. After clustering, the user-based and item-based search results are combined. These models serve to find the required domain. Nevertheless, they are based entirely on a single domain. P. B. Li [3] described a system that attempts to combine social networks for collaborative filtering; it worked with both content-based and collaborative filtering. Various ranking systems learn heterogeneous content from social networks using machine learning methods combined with regression. Yuan et al. [8] aimed to fuse heterogeneous social relationships for recommendation using factorization and regularization techniques. Other algorithms recommend by combining the similarities of online social community users through collaborative topic modeling and probabilistic domain modeling. Yet most of these works apply collaborative filtering to a single domain only and do not consider the sparsity problem; they provide data only for specific areas on their respective subjects. Dual transfer learning exploits duality through matrix tri-factorization in order to solve the problems that arise in clustering and sorting information.
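As an illustration of the flexible mixture model described in this section, the following minimal Python sketch evaluates the joint probability P(r, i, j) by summing over user clusters u and item clusters v. All probability tables are made-up toy values and the function and variable names are hypothetical; the sketch only mirrors the formula and is not the authors' implementation.

```python
from itertools import product


def fmm_joint(r, i, j, p_u, p_v, p_i_given_u, p_j_given_v, p_r_given_uv):
    """P(r, i, j) = sum over (u, v) of P(i|u) P(j|v) P(r|u,v) P(u) P(v)."""
    total = 0.0
    for u, v in product(range(len(p_u)), range(len(p_v))):
        total += (p_i_given_u[u][i] * p_j_given_v[v][j]
                  * p_r_given_uv[u][v][r] * p_u[u] * p_v[v])
    return total


# Toy parameters (illustrative assumptions, not data from the paper):
# 2 user clusters, 2 item clusters, 2 users, 2 items, 2 rating levels.
p_u = [0.5, 0.5]                          # P(u)
p_v = [0.5, 0.5]                          # P(v)
p_i_given_u = [[1.0, 0.0], [0.0, 1.0]]    # P(i | u), indexed [u][i]
p_j_given_v = [[1.0, 0.0], [0.0, 1.0]]    # P(j | v), indexed [v][j]
p_r_given_uv = [[[0.9, 0.1], [0.2, 0.8]],  # P(r | u, v), indexed [u][v][r]
                [[0.3, 0.7], [0.6, 0.4]]]

prob = fmm_joint(0, 0, 0, p_u, p_v, p_i_given_u, p_j_given_v, p_r_given_uv)
```

Because every conditional table is normalized, the joint probabilities over all (r, i, j) sum to one, which is a quick sanity check on any parameter set.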

III. PROBLEM DEFINITION

The existing model builds the latent feature structure by sharing common ratings across the domains, collecting the items contained in the user-item clusters. However, the rating patterns from multiple domains rarely contain the same ratings. At the same time, a limited number of domains are more closely related to the target domain of interest than the others. The relatedness among multiple domains therefore cannot be captured by identical rating patterns alone, since these do not represent every domain exactly. The domain cluster model is built on the item cluster factor and user cluster factor derived from the available rating data. Based on subspace identification, the model learns the user-item rating pattern at the cluster level that is shared across the domains. It can simultaneously capture domain-specific rating patterns by learning from the different domains. Consider book-rating and movie-rating web sites: books and movies can share similar topics or clusters (e.g., the categories classical or comedy), but the similarity does not hold in every situation; for example, a cluster of award-winning movies does not help to cluster books on the topic of award history. Inspired by this situation, the recommendation model relies on knowledge-based assumptions to establish relations across the domains. These predictions relate the user ratings to item characteristics and to similarities between the domains. Data collected from multiple sources have heterogeneous data types, and each type is categorized as a “domain”. This problem setting assumes that some data types in the auxiliary domains can be obtained more easily than the data type in the target domain. Normally, the setting requires the user/item sets in the different data domains to be the same. Knowledge can then be discovered and transferred by finding relationships between data domains.
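The split between a rating pattern shared across domains and a pattern specific to each domain can be written compactly. The expression below is one way to state it, following our reading of the cluster-level latent factor formulation in [1]; the symbols are assumptions introduced here for illustration and are not defined in this paper.

```latex
% Assumed notation (not defined in this paper):
%   X_d : rating matrix of domain d (users x items)
%   U_d : user cluster factor of domain d
%   V_d : item cluster factor of domain d
%   S_0 : cluster-level rating pattern shared by all domains
%   S_d : cluster-level rating pattern specific to domain d
X_d \approx U_d \,[\, S_0 \;\; S_d \,]\, V_d^{\top}, \qquad d = 1, \dots, D
```

Under this reading, only S_0 carries knowledge between domains, while S_d absorbs what does not transfer, such as the award-history example above.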

IV. PROPOSED TECHNIQUE

The domain clustering model consists of two domains: one is the domain used to process the data, and the other is modeled with the co-clustered items. A latent feature model is used for clustering items as well as users.

1) Definition 1: Nonparametric latent feature models. The idea behind the latent feature model is to decompose data into small components based on the domain and to cluster each item/user based on similarities.

Table - 1
User and Item profiles in books and movies domains

Attribute    Value (Book Domain)   Rating   Attribute       Value (Movie Domain)   Rating
Toy Story    Cartoon               4.2      White Balloon   Cartoon                1.0
Golden Eye   Fantasy Novel         3.5      Apollo          Adventure              0.8
Get Shorty   Adventure             5.0      Crumb           Carrie Fisher          1.0

When clustering the users and items, the algorithm works from the rating information as well as the time information available in the user data, and therefore provides more accurate results when co-clustering items and users with respect to time. Table 1 compares two different domains that have similar characteristics and behavior. Each item is classified by the rating assigned to it, and items are related to users through the user identifiers. The attributes of the book domain are clustered by item type (e.g. cartoon), and rating-based similarities are computed between items or users. Consider the example in Table 1, which compares attributes from the book and movie domains on the shared category “Cartoon”. Similarities are then computed between ratings of the same items provided by the same users, or between ratings of similar items provided by similar target users.
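The paper does not state which similarity measure is used. As a minimal sketch, assuming cosine similarity over co-rated items (a common choice in collaborative filtering), the rating-based similarity between two users could be computed as follows. The two user profiles are hypothetical and only borrow item names from Table 1.

```python
import math


def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity over the keys (items or users) present in both dicts."""
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0  # nothing co-rated, so no evidence of similarity
    dot = sum(ratings_a[k] * ratings_b[k] for k in common)
    norm_a = math.sqrt(sum(ratings_a[k] ** 2 for k in common))
    norm_b = math.sqrt(sum(ratings_b[k] ** 2 for k in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical user profiles (item -> rating); not data from the paper.
user_a = {"Toy Story": 4.2, "Golden Eye": 3.5, "Get Shorty": 5.0}
user_b = {"Toy Story": 4.0, "Get Shorty": 5.0, "Apollo": 0.8}

similarity = cosine_similarity(user_a, user_b)
```

The same function works for item-item similarity if each dict maps users to the ratings they gave that item.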

Fig. 1: User-item values in two rating matrices. Unknown data is represented by “?”. The parameters u1 and u2 represent the user rating values, and v1 and v2 represent the corresponding ratings for items.

In the latent feature model, the information with the most similarities in each domain can be considered when collecting data. For example, for the topic “Data Mining”, the group with the most similarities would be found through keywords such as “clustering”, “sorting”, and “classification”.

A. Latent Feature Relational Model

The latent feature relational model is a method proposed for filtering the user/item ratings. The latent feature model allows similarity models to cluster a range of data sets, and multiple latent features can be active in an observed data set. Dimensionality reduction methods such as factor analysis and probabilistic matrix factorization provide a statistical approach to extracting the latent features. These methods yield a small set of dimensions and model each data point as a weighted combination of them. Dimensionality reduction can improve predictions on observed data. For user/item rating patterns, the model allows dependencies between data points. The dependencies between observations may be spatial or temporal, and they arise because nearby data points share latent features. The main goal of the latent feature model is to improve the performance of generative models by properly representing the latent structure of individual entities in the observed data.
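To make the idea of "a small set of dimensions" with each data point as a weighted combination concrete, here is a minimal sketch that uses truncated SVD as a stand-in. This is a substitution for illustration: the paper names factor analysis and probabilistic matrix factorization, not SVD, and the function name and toy matrix are hypothetical.

```python
import numpy as np


def low_rank_approximation(X, k):
    """Return per-row weights and k latent dimensions so that X ~ weights @ basis."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    weights = U[:, :k] * s[:k]  # each row: weights on the k latent dimensions
    basis = Vt[:k, :]           # each row: one latent dimension over the items
    return weights, basis


# Toy rating matrix of rank 1 (illustrative values only).
X = np.outer([1.0, 2.0, 3.0], [1.0, 0.0, 2.0])
weights, basis = low_rank_approximation(X, k=1)
reconstruction = weights @ basis
```

With k smaller than the true rank, the reconstruction becomes an approximation, which is the regime in which such models are used for prediction.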

B. Model Description

The data set is formed as a matrix Z with an infinite number of columns. The rows of Z correspond to the users and the columns correspond to the items; each cell holds the rating given by that user to that item. The data sets are filled by selecting the matched rating patterns of similar domains. The rating distribution matrix consists of a finite number of rows and columns. If the observed ratings from the two matrices are more similar, the resulting matrix is more likely to share the same ratings.

V. DATA SETS

The following real-world data sets are used for the experiments.

1) MovieLens data set: This data set contains more than 100000 user ratings on a scale from 1 to 5, given by 2875 users on 11542 movies. A matrix is created by grouping the data and randomly choosing 300 users and 300 movies from those with the most ratings; unknown entries are filled with the average rating.

2) Book-Crossing data set: This book rating data set contains 21500 user ratings on a scale from 1 to 7, given by 560 users on 25000 books. The ratings are limited to the range 1 to 5 for comparison with the MovieLens data set. A sub-matrix of 200 users and 500 books is selected randomly from those with the most ratings.
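The preprocessing steps described for the two data sets can be sketched as follows. Several details are assumptions, because the paper does not specify them: ratings are mapped from 1 to 7 onto 1 to 5 by linear rescaling, unknown entries are marked as NaN and filled with the global mean of the observed ratings, and the densest rows and columns are picked deterministically rather than sampled at random. All function names are hypothetical.

```python
import numpy as np


def rescale(r, old_min=1.0, old_max=7.0, new_min=1.0, new_max=5.0):
    """Linearly map ratings from [old_min, old_max] to [new_min, new_max]."""
    return new_min + (r - old_min) * (new_max - new_min) / (old_max - old_min)


def densest_submatrix(X, n_users, n_items):
    """Keep the rows, then the columns, with the most observed (non-NaN) ratings."""
    observed = ~np.isnan(X)
    rows = np.argsort(-observed.sum(axis=1), kind="stable")[:n_users]
    sub = X[rows]
    cols = np.argsort(-(~np.isnan(sub)).sum(axis=0), kind="stable")[:n_items]
    return sub[:, cols]


def fill_with_mean(X):
    """Replace unknown (NaN) entries with the mean of the observed ratings."""
    filled = X.copy()
    filled[np.isnan(filled)] = np.nanmean(X)
    return filled


# Toy matrix (illustrative values only): rows are users, columns are items.
toy = np.array([[1.0, np.nan, np.nan],
                [2.0, 3.0, np.nan],
                [4.0, 5.0, 6.0]])
dense_part = densest_submatrix(toy, n_users=2, n_items=2)
filled = fill_with_mean(np.array([[1.0, np.nan], [3.0, np.nan]]))
```

A per-user or per-item mean would be an equally plausible reading of "average ratings"; the global mean is used here only because it is the simplest.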

VI. EXPERIMENT RESULT AND DISCUSSION

A. Cross Domain User-Item Clustering

A useful way to detect these issues is to define a cross domain graph connecting the user's source and auxiliary domains and to use the clustering algorithm to rank the relations in the auxiliary domain. User-item similarities provide a better choice for multi-domain connections through the subset of user-item pairs shared between the domains. In this method, not all collections are considered. One challenge is how to relate the source domain to the auxiliary domain; the clustering algorithm is used for this purpose.

B. Rating Pattern Sharing across Domains

Rating-pattern sharing was proposed in Code Book Transfer (CBT) for solving adaptive transfer learning (domain adaptation) problems in collaborative filtering (CF). The idea was then incorporated into a probabilistic model, the Rating-Matrix Generative Model (RMGM), for solving collective transfer learning (multi-task learning) problems in CF. Both CBT and RMGM are cross-domain CF methods over system domains, in which rating patterns are transferred across the related matrices. User-item ratings are matrix data over two finite sets of objects (users and items). These objects are co-clustered simultaneously, and similarity among different rating matrices is found by matching group-level rating patterns. We can thus simultaneously group users based on their ratings on items and group items based on the ratings they receive from users in both domains, to find shared group-level rating patterns.
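A group-level rating pattern of the kind described here can be summarized as a small "code book" whose entries are the average rating of each (user group, item group) block. The sketch below assumes that the group memberships are already known, for example from a co-clustering step; the function name, the NaN convention for unknown ratings, and the toy data are hypothetical.

```python
import numpy as np


def build_codebook(X, user_cluster, item_cluster, n_user_clusters, n_item_clusters):
    """Average observed rating for every (user cluster, item cluster) block."""
    observed = ~np.isnan(X)
    U = np.eye(n_user_clusters)[np.asarray(user_cluster)]  # one-hot, users x K
    V = np.eye(n_item_clusters)[np.asarray(item_cluster)]  # one-hot, items x L
    sums = U.T @ np.where(observed, X, 0.0) @ V             # rating sum per block
    counts = U.T @ observed.astype(float) @ V               # observed count per block
    # Blocks without any observed rating are left at 0.
    return np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)


# Toy data (illustrative only): 3 users, 3 items, NaN marks an unknown rating.
X = np.array([[5.0, 5.0, 1.0],
              [4.0, np.nan, 2.0],
              [1.0, 2.0, 5.0]])
codebook = build_codebook(X, user_cluster=[0, 0, 1], item_cluster=[0, 0, 1],
                          n_user_clusters=2, n_item_clusters=2)
```

Two domains whose code books look alike share a group-level rating pattern even when their users and items do not overlap, which is the property that rating-pattern sharing relies on.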

C. Domain Clustering for Collaborative Filtering

In this work, the user-item matrix X is clustered using the orthogonal nonnegative matrix tri-factorization algorithm, which produces a new user-item co-clustering. Traditional methods require comparing against the whole database; the clustering method, in contrast, can filter the data based on the clustering results and thereby reduce the search space.

Table - 2
Mean Absolute Error (MAE) of the compared models on the EachMovie and Book-Crossing related domains

Algorithms                                      Given 5   Given 10   Given 20
NMF (Non Negative Matrix Factorization)         0.980     0.942      0.879
CBT (Code Book Transfer)                        0.954     0.922      0.839
RMGM (Rating Matrix Generative Model)           0.944     0.933      0.915
CBCF (Cluster Based Collaborative Filtering)    0.924     0.896      0.890
CLFM (Cluster-Level Latent Feature model)       0.933     0.909      0.88

Table 2 shows the Mean Absolute Error of the compared models on the EachMovie and Book-Crossing related domains under different configurations. The evaluation criteria are formed using different parameters. For each user, three different numbers of observed ratings (Given 5, Given 10, and Given 20) are provided for comparing the algorithms. Fig. 3 shows the comparison of the different models. In the Given 5 setting, the MAE obtained with the RMGM model (0.944) is lower, and hence better, than that of CBT (0.954) and NMF (0.980).
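The orthogonal nonnegative matrix tri-factorization used in the subsection on domain clustering for collaborative filtering can be sketched as follows. The paper does not give its update rules, so this sketch assumes the multiplicative updates commonly associated with Ding et al. (2006) for X ≈ F S Gᵀ, where F groups users and G groups items. The function name, iteration count, and toy matrix are hypothetical, and missing ratings are assumed to have been filled in beforehand.

```python
import numpy as np


def onmtf(X, k, l, n_iter=200, eps=1e-9, seed=0):
    """Factor a nonnegative matrix X (n x m) as F (n x k) @ S (k x l) @ G.T (l x m)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    F = rng.random((n, k)) + eps  # user cluster indicators (soft)
    S = rng.random((k, l)) + eps  # cluster-level rating pattern
    G = rng.random((m, l)) + eps  # item cluster indicators (soft)
    for _ in range(n_iter):
        XtFS = X.T @ F @ S
        G *= np.sqrt(XtFS / (G @ (G.T @ XtFS) + eps))
        XGSt = X @ G @ S.T
        F *= np.sqrt(XGSt / (F @ (F.T @ XGSt) + eps))
        S *= np.sqrt((F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps))
    return F, S, G


# Toy block-structured rating matrix (illustrative values only).
X = np.array([[5.0, 5.0, 1.0, 1.0],
              [5.0, 4.0, 1.0, 2.0],
              [1.0, 1.0, 5.0, 5.0],
              [2.0, 1.0, 4.0, 5.0]])
F, S, G = onmtf(X, k=2, l=2)
user_groups = F.argmax(axis=1)  # hard user cluster assignment
item_groups = G.argmax(axis=1)  # hard item cluster assignment
```

Taking the arg-max of each row of F and G turns the soft factors into the user and item groups, and S then plays the role of the cluster-level rating pattern.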



Fig. 2: MAE performance of the compared models with respect to the value of shared subspace dimensionality in EachMovie domain.

The evaluation shows that CLFM is more effective than the other models for clustering user ratings, and that it reduces sparsity by sharing useful information across the systems. In the graph, a smaller MAE value indicates better performance.
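The MAE metric and the "Given N" evaluation settings can be sketched as follows. The split shown is an assumption about the protocol: for each test user, N ratings are treated as observed and the remaining ratings are held out for scoring. For reproducibility the sketch takes the first N ratings rather than a random sample, and all names and values are hypothetical.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and true ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def given_n_split(user_ratings, n):
    """Split one user's ratings into n observed ratings and the held-out rest."""
    items = list(user_ratings.items())
    return dict(items[:n]), dict(items[n:])


# Toy example (illustrative values only).
observed, held_out = given_n_split({"a": 1, "b": 2, "c": 3}, n=2)
mae = mean_absolute_error([4, 3, 5], [5, 3, 3])
```

A model is fitted on the observed part and its predictions for the held-out items are compared with the true ratings, so a lower MAE means more accurate predictions.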

VII. CONCLUSION

This paper focuses on the cluster-level latent feature model over multiple domains. The clustering techniques and domain factors are obtained via orthogonal nonnegative matrix tri-factorization within the cluster-level latent feature model, which can transfer useful knowledge from the auxiliary domain to the target domain. The knowledge is transferred in the form of a code book. The experiments evaluated the performance of rating and ranking prediction in terms of various metrics, using different models and other comparative methods. The model is based entirely on clustering user/item ratings using cluster-level features. In future work, we plan to systematically study the co-clustering scenario for multiple domains and to form new clustering methods with differentiating agents.

REFERENCES

[1] Sheng Gao, Zhanyu Ma, and Patrick Gallinari, “A Cross Domain Recommendation model for Cyber Physical System,” IEEE Transactions on Emerging Topics in Computing, vol. 1, no. 2, January 2014.
[2] P. B. Li, “Cross-domain collaborative filtering: A brief survey,” in Proc. 23rd IEEE ICTAI, Nov. 2011, pp. 1085–1086.
[3] Jie Tang, Sen Wu, Jimeng Sun, and Hang Su, “Cross-domain Collaboration Recommendation,” KDD'12, August 12–16, 2012.
[4] Gang Chen and Fei Wang, “Collaborative Filtering using Orthogonal Nonnegative Matrix Tri-factorization,” Seventh IEEE International Conference on Data Mining.
[5] F. Abel, E. Herder, G. J. Houben, N. Henze, and D. Krause, “Cross-system user modelling and personalization on the social web,” User Modelling and User-Adapted Interaction, 1–42 (2011).
[6] Siting Ren and Sheng Gao, “Improving Cross-domain recommendation through Probabilistic Cluster Level Latent Factor Model,” cs.IR, 24 Sept. 2014.
[7] Bin Li, Xingquan Zhu, Ruijiang Li, and Chengqi Zhang, “Rating Knowledge sharing in Cross-domain Collaborative filtering,” IEEE Transactions on Cybernetics, 2014.
[8] Q. Yuan, L. Chen, and S. Zhao, “Factorization vs. regularization: fusing heterogeneous social relationships in top-N recommendation,” in RecSys'11, pages 245–252, 2011.
