1 Applications of Slow Intelligence Systems

2 Outline
Application: Social Influence Analysis
Application: Product & Service Optimization
Application: Topic/Trend Detection
Application: High Dimensional Feature Selection
Discussion

4 Application to Social Influence Analysis
In large social networks, nodes (users, entities) are influenced by others for many different reasons. How to model diffusion processes over a social network, and how to predict which nodes will influence which other nodes, has been an active research topic recently, and researchers have proposed many algorithms. Our objective in applying SIS technology is to use these algorithms and evolutionarily select the best one, with the most appropriate parameters, for social influence analysis.

5 The Social Influence Analysis SIS System
The input data stream is first processed by the Pre-Processor. The Enumerator then invokes the super-component that creates the various social influence analysis algorithms, such as Linear Threshold LIM, Susceptible-Infective-Susceptible SIS, Susceptible-Infective-Recovered SIR and Independent Cascading. The Tester collects and presents the test results.

6 LIM results for concept 1 and concept 3 with two combinations of parameters on the Plurk dataset

7 LIM results for concept 1 and concept 3 with two combinations of parameters on the Facebook dataset

8 The SIA/SIS System
The Timing Controller restarts the social influence analysis cycle with a different SIA super-component, such as the Heat Diffusion algorithms, or with a different pre-processor. The Eliminator eliminates the inferior SIA algorithms, and the Concentrator selects the optimal SIA algorithm.
10 SIS Application to Product Configuration
Production of personalized or custom-tailored goods or services to meet consumers' diverse and changing needs.

11 Ontological Filter and Slow Intelligence System
Figure 6 - Ontological Filter and the Slow Intelligence System

12 A Scenario
A customer would like to buy a personal computer to play video games and surf the internet. He knows that he needs an operating system, a web browser and an antivirus package. In particular, he prefers a Microsoft Windows operating system. He lives in the United States and prefers a desktop. He also prefers low-cost components.

13 Ontological Transform for Product Configurator

15 Topic Detection and Tracking (TDT) System Overview
Detect current hot topics and predict future hot topics based on data collected from the internet.
The TDT system is composed of:
  Crawler & Extractor
    Collect the latest data from the internet for the user's needs
    Restrict the range of data collection from web data (focused crawler)
  Topic Extractor
    Discover current hot topics from a set of text documents
  Topic Detector
    Predict hot topics

16 Topic/Trend Detection System: Crawler & Extractor
Diagram components: Users, Keywords of Interests, Web data, Social Media, Web Crawler, HTML documents, Information Extractor, Text documents, DB, Topic Extractor.
Information Extractor: extracts articles and metadata (title, author, content, etc.) from semi-structured web content.

17 Focused Crawler: Classification
Taxonomy Creation: Yahoo!, Open Directory Project
Example Collection: URLs, browsing
Taxonomy Selection and Refinement: the system proposes the most common classes; the user marks classes as GOOD and changes trees
Interactive Exploration: the system proposes URLs found in a small neighborhood of the examples; the user examines and includes some of these examples
Training: integrate refinements into the statistical class model (classifier-specific action)

18 Focused Crawler: Distillation
Distillation: identify relevant hubs by running a topic distillation algorithm; raise the visit priorities of hubs and their immediate neighbors.
Feedback: report the most popular sites and resources; mark results as useful/useless; send feedback to the classifier and distiller.

19 Extractor
Given a Web page:
  Build the HTML tag tree
  Mine data regions
    Mining data records directly is hard
  Identify data records from each data region
  Learn the structure of a general data record
    A data record can contain optional fields
  Extract the data

20 TDT Petri Net Simulation: Topic Detection and Tracking
22 Crawler
23 Initial State
24 Accept user input
25 Validate user input
26 Refine user input
27 Train the system
28 Detect most popular topic
29 Extractor
30 Extractor activated
31 Generate HTML tag trees
32 Detect important data
33 Train the system with record
34 Extract data
35 Save data into knowledge base
36 Topic Detection and Tracking

37 Slow Intelligence Steps (shown in blue on the slides)
Accept user request
Send request data to TDT
Enumerator generates combinations
Eliminator selects the best method to fit our need
Evaluate combinations
Use concentrator to highlight the selected results
Send the result to TDT
Generate the instructions to the server
Dispatcher gets the instruction
Decide where we are going to send the instructions
Send the instructions to the server
End of simulation run

39 Introduction
High-dimensional feature selection is a hot topic in statistics and machine learning.
Model the relationship between one response and its associated features, based on a sample of size n.

40 Math formulation
Consider a vector of n responses and their associated covariate vectors, each with p features. For the classification problem, where the responses are binary class labels, we assume a logistic model. We estimate the regression coefficients and the bias by minimizing the loss function of this model.

41 Application
Supervised learning: the gene selection problem in bioinformatics.
  One wants to eliminate irrelevant genes (features) to obtain a robust classifier.
  One wants to know which genes are the most critical factors for the disease.
Data: n samples (patients or healthy individuals); each sample's data consists of p gene expression levels.
Output: the important genes selected.

42 Challenges
Dimensionality grows rapidly with interactions of the features.
  Portfolio selection and network modeling: 2000 stocks involve over 2 million unknown parameters in the covariance matrix.
  Protein-protein interaction: the sample size may be on the order of thousands, but the number of features can be on the order of millions.
The goal is to construct effective methods for learning relationships between features and responses in high dimensions for scientific purposes.

43 Feature Selection Approach
Main SIS procedure: main_Enumerator, main_Eliminator, main_Adaptator, main_Propagator, main_Concentrator, time controller
Sub procedure: sub_enumerator, sub_concentrator, knowledge base

44 Main Enumerator
Enumerate the p features. Among these features, some are relevant to the responses while others are not.

45 Main Eliminator
Compute the Pearson correlation between each feature and the response, rank the values from high to low, and eliminate the lowest-ranked features. The number of features eliminated is controlled by a pre-defined constant, and the remaining features form the selected top feature set.

46 Sub Enumerator
Enumerate all feature selection algorithms in the knowledge base by applying each of them to the selected top feature set, and select the top features from that set for each algorithm.
Knowledge Base: stores the existing candidate algorithms.
We add L1-regularized regression, elastic-net regularized regression and forward stepwise regression. In principle, any feature selection algorithm can be put into the knowledge base.

47 Sub Concentrator
For each selected feature set, we compute the loss function and choose the algorithm with the minimum loss. The sub-system then takes its features from the feature set of that algorithm; this becomes the current selected feature set.

48 Main Adaptor
For every other feature among the total p features, we add it individually to the current selected feature set and compute the loss function.

49 Main Concentrator
Rank all of these candidate features by their loss from low to high, and select the top features with the smallest loss.

50 Main Propagator
Add these top features to the current selected feature set to form the new feature set.

51 Timing Controller
The timing controller controls the termination of the whole process. It sets a threshold K on the number of cycles. If the threshold has been reached, the process stops after the sub concentration step and outputs the selected features; otherwise, the process continues to main adaptation. The larger K is, the more accurate the feature selection result, but the more time it needs to compute. Thus slow decision cycles can result in better performance in the long run.

52 General algorithm

53 Experimental Results: Dataset description
Leukemia dataset: Leukemia is a type of cancer of the blood. This dataset consists of 72 samples, 47 from patients with acute lymphoblastic leukemia and 25 from patients with acute myeloid leukemia, each with expression levels of 7129 human genes. The data is split into 38 samples for the training set and 34 samples for the testing set.
Colon cancer dataset: This dataset consists of 62 samples, 40 tumor colon tissues and 22 normal colon tissues, each with expression levels of 2000 human genes. The data is split into 32 samples for the training set and 30 samples for the testing set.

54 Experimental protocol
We compare our system with the three individual feature selection algorithms in the knowledge base. We report the number of errors and the balanced error rate.

55 Experimental results
Our method outperforms each individual algorithm.
When we increase K, the number of cycles defined by the time controller, the accuracy of our system improves. There is a tradeoff between running time and performance.

56 Experimental results
In terms of biological background, these genes are critical for leukemia:
  Zyxin is known to interact with leukemogenic bHLH proteins. It is selected by both SIS (K=5) and SIS (K=10).
  Cystatin C (CST3) and Cystatin A are two very important genes selected by SIS (K=10) but not by SIS (K=5), which indicates that a larger K leads to a more accurate result.

58 Discussions
Implemented Social Influence Analysis algorithms to find the best model based upon Slow Intelligence principles.
Applied the Slow Intelligence principle to ontological filtering for Product and Service Selection.
Modeled and simulated a Trend and Topic Detection system using a Petri net within the framework of the Slow Intelligence System.
Studied a new feature selection application within the framework of the Slow Intelligence System. It leads to superior performance and can handle high-dimensional data.

59 Further Work
Design a mechanism to dynamically update the knowledge base by applying the SIS approach to itself.
Design a user-friendly interface to develop and manage an application system.

Q&A
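
Returning to the feature selection application, the Main Eliminator step and the balanced error rate from the experimental protocol can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the authors' implementation: the function names and the `keep` parameter are hypothetical, features are ranked by the absolute value of their Pearson correlation (the slides only say the values are ranked from high to low), labels are assumed to be 0 or 1, and the balanced error rate is taken to be the common definition (the mean of the two per-class error rates), since the formula on the slide did not survive.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)


def main_eliminator(rows, labels, keep):
    """Rank feature columns by |Pearson correlation| with the labels and
    return the indices of the `keep` highest-ranked features.
    Assumption: absolute correlation is used; ties are broken by column index."""
    scores = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, labels)), j))
    scores.sort(key=lambda item: (-item[0], item[1]))
    return [j for _, j in scores[:keep]]


def balanced_error_rate(y_true, y_pred):
    """Mean of the error rates on the positive (1) and negative (0) classes.
    Assumption: both classes occur at least once in y_true."""
    pairs = list(zip(y_true, y_pred))
    positives = sum(1 for t, _ in pairs if t == 1)
    negatives = sum(1 for t, _ in pairs if t == 0)
    false_neg = sum(1 for t, p in pairs if t == 1 and p != 1)
    false_pos = sum(1 for t, p in pairs if t == 0 and p != 0)
    return 0.5 * (false_neg / positives + false_pos / negatives)
```

The balanced error rate is useful here because both datasets are imbalanced (for example 40 tumor versus 22 normal colon samples), so a raw error count alone can hide poor performance on the smaller class.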