IEEE Projects for M.tech,2013 IEEE Projects, Final year Engineering Projects,Best student projects,

Knowledge and dataNO PRJ

TITLEABSTRACT DOMAIN YOP

1 A Fast Clustering-Based Feature Subset Selection Algorithm for High-Dimensional Data

Feature selection involves identifying a subset of the most useful features that produces compatible results as the original entire set of features. A feature selection algorithm may be evaluated from both the efficiency and effectiveness points of view. While the efficiency concerns the time required to find a subset of features, the effectiveness is related to the quality of the subset of features. Based on these criteria, a fast clustering-based feature selection algorithm (FAST) is proposed and experimentally evaluated in this paper. The FAST algorithm works in two steps. In the first step, features are divided into clusters by using graph-theoretic clustering methods. In the second step, the most representative feature that is strongly related to target classes is selected from each cluster to form a subset of features. Features in different clusters are relatively independent, the clustering-based strategy of FAST has a high probability of producing a subset of useful and independent features. To ensure the efficiency of FAST, we adopt the efficient minimum-spanning tree (MST) clustering method. The efficiency and effectiveness of the FAST algorithm are evaluated through an empirical study. Extensive experiments are carried out to compare FAST and several representative feature selection algorithms, namely, FCBF, ReliefF, CFS, Consist, and FOCUS-SF, with respect to four types of well-known classifiers, namely, the probability-based Naive Bayes, the tree-based C4.5, the instance-based IB1, and the rule-based RIPPER before and after feature selection. The results, on 35 publicly available real-world high-dimensional image, microarray, and text data, demonstrate that the FAST not only produces smaller subsets of features but also improves the performances of the four types of classifiers.

Knowledge and data

2013

2 Anomaly Detection via Online Oversampling principal Component Analysis

Anomaly detection has been an important research topic in data mining and machine learning. Many real-world applications such as intrusion or credit card fraud detection require an effective and efficient framework to identify deviated data instances. However, most anomaly detection methods are typically implemented in batch mode, and thus cannot be easily extended to large-scale problems without sacrificing computation and memory requirements. In this paper, we propose an online oversampling principal component analysis (osPCA) algorithm to address this problem, and we aim at detecting the presence of outliers from a large amount of data via an online updating technique. Unlike prior principal component analysis (PCA)-based approaches, we do not store the entire data matrix or covariance matrix, and thus our approach is especially of interest in online or large-scale problems. By oversampling the target instance and extracting the principal direction of the data, the proposed osPCA allows us to determine the anomaly of the target instance according to the variation of the resulting dominant eigenvector. Since our osPCA need not perform eigen analysis explicitly, the proposed framework is favored for online applications which have computation or memory limitations. Compared with the well-known power method for PCA and other popular anomaly detection algorithms, our experimental results verify the feasibility of our proposed method in terms of both accuracy and efficiency.

Knowledge and data

2013

3 Lineage Encoding: An Efficient Wireless XML Streaming Supporting Twig Pattern Queries

In this paper, we propose an energy and latency efficient XML dissemination scheme for the mobile computing. We define a novel unit structure called G-node for streaming XML data in the wireless environment. It exploits the benefits of the structure indexing and attribute summarization that can integrate relevant XML elements into a group. It provides a way for selective access of their attribute values and text content. We also propose a lightweight and effective encoding scheme, called Lineage Encoding, to support evaluation of predicates and twig pattern queries over the stream. The Lineage Encoding scheme represents the parent-child relationships among XML elements as a sequence of bit-strings, called Lineage Code(V, H), and provides basic operators and functions for effective twig pattern query processing at mobile clients. Extensive experiments using real and synthetic data sets demonstrate our scheme outperforms conventional wireless XML broadcasting methods for simple path queries as well as complex twig pattern queries with predicate conditions.

Knowledge and data

2013

#56, II Floor, Pushpagiri Complex, 17th Cross 8th Main, Opp Water Tank,Vijaynagar,Bangalore-560040.

Website: www.citlprojects.com, Email ID: [email protected],[email protected]: 9886173099 / 9986709224, PH : 080 -23208045 / 23207367

JAVA / J2EE PROJECTS – 2013(Networking, Network-Security, Mobile Computing, Cloud Computing, Wireless Sensor Network, Datamining,

Webmining, Artificial Intelligence, Vanet, Ad-Hoc Network)

mailto:[email protected],[email protected]

http://www.citlprojects.com/

4 MKBoost: A Framework of Multiple Kernel Boosting

Multiple kernel learning (MKL) is a promising family of machine learning algorithms using multiple kernel functions for various challenging data mining tasks. Conventional MKL methods often formulate the problem as an optimization task of learning the optimal combinations of both kernels and classifiers, which usually results in some forms of challenging optimization tasks that are often difficult to be solved. Different from the existing MKL methods, in this paper, we investigate a boosting framework of MKL for classification tasks, i.e., we adopt boosting to solve a variant of MKL problem, which avoids solving the complicated optimization tasks. Specifically, we present a novel framework of Multiple kernel boosting (MKBoost), which applies the idea of boosting techniques to learn kernel-based classifiers with multiple kernels for classification problems. Based on the proposed framework, we propose several variants of MKBoost algorithms and extensively examine their empirical performance on a number of benchmark data sets in comparisons to various state-of-the-art MKL algorithms on classification tasks. Experimental results show that the proposed method is more effective and efficient than the existing MKL techniques.

Knowledge and data

2013

5 TACI: Taxonomy-Aware Catalog Integration

A fundamental data integration task faced by online commercial portals and commerce search engines is the integration of products coming from multiple providers to their product catalogs. In this scenario, the commercial portal has its own taxonomy (the “master taxonomy”), while each data provider organizes its products into a different taxonomy (the “provider taxonomy”). In this paper, we consider the problem of categorizing products from the data providers into the master taxonomy, while making use of the provider taxonomy information. Our approach is based on a taxonomy-aware processing step that adjusts the results of a text-based classifier to ensure that products that are close together in the provider taxonomy remain close in the master taxonomy. We formulate this intuition as a structured prediction optimization problem. To the best of our knowledge, this is the first approach that leverages the structure of taxonomies in order to enhance catalog integration. We propose algorithms that are scalable and thus applicable to the large data sets that are typical on the web. We evaluate our algorithms on real-world data and we show that taxonomy-aware classification provides a significant improvement over existing approaches.

Knowledge and data

2013

6 Mining Order-Preserving Submatrices from Data with Repeated Measurements

Order-preserving submatrices (OPSM's) have been shown useful in capturing concurrent patterns in data when the relative magnitudes of data items are more important than their exact values. For instance, in analyzing gene expression profiles obtained from microarray experiments, the relative magnitudes are important both because they represent the change of gene activities across the experiments, and because there is typically a high level of noise in data that makes the exact values untrustable. To cope with data noise, repeated experiments are often conducted to collect multiple measurements. We propose and study a more robust version of OPSM, where each data item is represented by a set of values obtained from replicated experiments. We call the new problem OPSM-RM (OPSM with repeated measurements). We define OPSM-RM based on a number of practical requirements. We discuss the computational challenges of OPSM-RM and propose a generic mining algorithm. We further propose a series of techniques to speed up two time dominating components of the algorithm. We show the effectiveness and efficiency of our methods through a series of experiments conducted on real microarray data.

Knowledge and data

2013

Documents

IEEE Projects for M.tech,2013 IEEE Projects, Final year Engineering Projects,Best student projects,