tanay-dixit
Mead learning– Big data & deep learning projects
1. B2B Recommendation (similar to Indiamart)
● Dataset: weblog file with approximately 700,000 search queries from different users across different categories of a B2B search engine.
● Each search record in the weblog file contains the following fields: date, time, s-computername, s-ip, cs-method, cs-uri-stem, cs-uri-query, s-port, cs-username, c-ip, cs(User-Agent), cs(Cookie), cs(Referer), cs-host, sc-status, sc-substatus, sc-win32-status, sc-bytes, cs-bytes, time-taken
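The field list above maps naturally onto a per-line parser. The following is a minimal sketch, not the project's actual code: it assumes each log line is space-separated, carries the fields in exactly the order listed, and has no embedded spaces inside values. The class name `WeblogLineParser` and the sample line are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class WeblogLineParser {
    // Field names in the order given in the dataset description (assumed to match the file).
    static final String[] FIELDS = {
        "date", "time", "s-computername", "s-ip", "cs-method", "cs-uri-stem",
        "cs-uri-query", "s-port", "cs-username", "c-ip", "cs(User-Agent)",
        "cs(Cookie)", "cs(Referer)", "cs-host", "sc-status", "sc-substatus",
        "sc-win32-status", "sc-bytes", "cs-bytes", "time-taken"
    };

    /** Splits one space-delimited log line into a field-name to value map. */
    static Map<String, String> parse(String line) {
        String[] values = line.trim().split("\\s+");
        if (values.length != FIELDS.length) {
            throw new IllegalArgumentException(
                "Expected " + FIELDS.length + " fields, got " + values.length);
        }
        Map<String, String> record = new LinkedHashMap<>();
        for (int i = 0; i < FIELDS.length; i++) {
            record.put(FIELDS[i], values[i]);
        }
        return record;
    }

    public static void main(String[] args) {
        // Hypothetical sample line; not taken from the real dataset.
        String sample = "2015-01-01 10:00:00 WEB01 10.0.0.1 GET /search.aspx "
            + "q=cotton+yarn 80 - 192.0.2.7 Mozilla/5.0 - - www.example.com "
            + "200 0 0 5120 310 125";
        Map<String, String> r = parse(sample);
        System.out.println(r.get("c-ip") + " searched " + r.get("cs-uri-query"));
    }
}
```

Keeping the record as a map keyed by the original field names makes it easy to pull out cs-uri-query, c-ip, or time-taken for the analyses on the following slides.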
Tools and Technology
● Dataset: Weblog of the Fibre2Fashion domain
● Technology: Java
● Algorithm: Collaborative Deep Learning for Recommendation Systems
B2B search engine
Cs-uri-query
Weblog Analysis
Unique page visitor data with timestamp
Unique Page in weblog
Visitor client address
Unique Client IP in weblog
Recommendation
Server IP
Client IP Page
Query Time taken
This figure shows the pages on which clients spent the most time.
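The weblog analyses on these slides (unique client IPs, pages ranked by time spent) can be sketched as simple aggregations over parsed records. This is an illustrative sketch only: the `Hit` class and method names are hypothetical, and time-taken is summed in whatever unit the log uses.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class WeblogStats {
    /** Minimal view of one log record: client IP, page, and time-taken. */
    static final class Hit {
        final String clientIp;
        final String page;
        final long timeTaken;

        Hit(String clientIp, String page, long timeTaken) {
            this.clientIp = clientIp;
            this.page = page;
            this.timeTaken = timeTaken;
        }
    }

    /** Distinct client IPs seen in the log. */
    static Set<String> uniqueClientIps(List<Hit> hits) {
        Set<String> ips = new HashSet<>();
        for (Hit h : hits) {
            ips.add(h.clientIp);
        }
        return ips;
    }

    /** Sum of time-taken per page. */
    static Map<String, Long> totalTimeByPage(List<Hit> hits) {
        Map<String, Long> totals = new HashMap<>();
        for (Hit h : hits) {
            totals.merge(h.page, h.timeTaken, Long::sum);
        }
        return totals;
    }

    /** Pages ordered from most to least total time-taken. */
    static List<Map.Entry<String, Long>> pagesByTimeDescending(List<Hit> hits) {
        List<Map.Entry<String, Long>> ranked =
            new ArrayList<>(totalTimeByPage(hits).entrySet());
        ranked.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));
        return ranked;
    }

    public static void main(String[] args) {
        // Made-up records for illustration.
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit("192.0.2.1", "/search.aspx", 100));
        hits.add(new Hit("192.0.2.2", "/product.aspx", 400));
        hits.add(new Hit("192.0.2.1", "/search.aspx", 350));
        System.out.println("Unique clients: " + uniqueClientIps(hits).size());
        System.out.println("Top page: " + pagesByTimeDescending(hits).get(0).getKey());
    }
}
```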
2. Pingax Recommendation
● We created a Google Chrome plug-in with a recommendation system that gives the best recommendations, based on the user's search, for items across different websites.
● Technology: Java, Apache Mahout, Apache Hadoop, JSON, JavaScript
Pingax Digital Recommendation Settings
Recommendation
1st Recommendation
Recommendation
2nd Recommendation
Recommendation
Recommended item
3. IMDB movie review classification
● Dataset: IMDB movie review text files
● Tools & technology: Eclipse, Java
● Machine Learning & Deep Learning Technique: Deep Learning using Linear Support Vector Machines, and Conditional Random Fields as Recurrent Neural Networks
● Class labels: Positive, Negative, Neutral
Dataset
POSITIVE
NEGATIVE
SVM train
Training Parameters
Accuracy
CRF Train
CRF Train parameters
CRF Test
CRF Test parameters
Accuracy
4. A Collaborative Approach for Web Personalized Recommendation System
● Dataset: MovieLens data
● Technology: Java
● Algorithm: User-based collaborative deep learning filtering technique
Dataset
• The full data set: 100,000 ratings by 943 users on 1,682 items.
• Each user has rated at least 20 movies.
• Users and items are numbered consecutively from 1.
• The data is randomly ordered and split 80%/20% into training and test data.
User ID, Item (Movie) ID, Rating
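To make the user-based approach concrete, here is a minimal sketch of user-based collaborative filtering with Pearson similarity over (user ID, item ID, rating) triples. It is an assumption-laden illustration, not the project's implementation: it uses every positively correlated user who rated the item rather than a fixed neighbourhood size, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class UserBasedCf {
    /** Stores one (user, item, rating) triple in a nested map. */
    static void add(Map<Integer, Map<Integer, Double>> ratings,
                    int user, int item, double rating) {
        ratings.computeIfAbsent(user, k -> new HashMap<>()).put(item, rating);
    }

    static double mean(Map<Integer, Double> row) {
        double sum = 0;
        for (double r : row.values()) {
            sum += r;
        }
        return sum / row.size();
    }

    /** Pearson correlation over the items both users have rated. */
    static double pearson(Map<Integer, Double> a, Map<Integer, Double> b) {
        int n = 0;
        double sumA = 0, sumB = 0;
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            Double rb = b.get(e.getKey());
            if (rb != null) {
                n++;
                sumA += e.getValue();
                sumB += rb;
            }
        }
        if (n < 2) {
            return 0.0; // too little overlap to measure correlation
        }
        double meanA = sumA / n, meanB = sumB / n;
        double cov = 0, varA = 0, varB = 0;
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            Double rb = b.get(e.getKey());
            if (rb != null) {
                double da = e.getValue() - meanA;
                double db = rb - meanB;
                cov += da * db;
                varA += da * da;
                varB += db * db;
            }
        }
        if (varA == 0 || varB == 0) {
            return 0.0;
        }
        return cov / Math.sqrt(varA * varB);
    }

    /** Mean-centred weighted average of neighbours' ratings; NaN if no neighbour qualifies. */
    static double predict(Map<Integer, Map<Integer, Double>> ratings, int user, int item) {
        Map<Integer, Double> target = ratings.get(user);
        if (target == null) {
            throw new IllegalArgumentException("Unknown user " + user);
        }
        double num = 0, den = 0;
        for (Map.Entry<Integer, Map<Integer, Double>> e : ratings.entrySet()) {
            if (e.getKey() == user) {
                continue;
            }
            Double r = e.getValue().get(item);
            if (r == null) {
                continue; // this user has not rated the item
            }
            double w = pearson(target, e.getValue());
            if (w <= 0) {
                continue; // keep only positively correlated users
            }
            num += w * (r - mean(e.getValue()));
            den += w;
        }
        return den == 0 ? Double.NaN : mean(target) + num / den;
    }

    public static void main(String[] args) {
        // Tiny made-up rating set, not MovieLens data.
        Map<Integer, Map<Integer, Double>> ratings = new HashMap<>();
        add(ratings, 1, 1, 5); add(ratings, 1, 2, 3); add(ratings, 1, 3, 4);
        add(ratings, 2, 1, 4); add(ratings, 2, 2, 2); add(ratings, 2, 3, 3); add(ratings, 2, 4, 5);
        System.out.println("Predicted rating: " + predict(ratings, 1, 4));
    }
}
```

Predictions are not clamped to the rating scale here; a real system would typically restrict them to the valid range and cap the neighbourhood size.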
Evaluation of System
Analysis with different similarity measures
Similarity measure     Preprocessing   Neighbours   SCORE          RMSE
Pearson coefficient    With            20           0.80281969     0.807538041
Pearson coefficient    With            100          0.82893594     0.83615161
Pearson coefficient    With            1000         0.9604098849   0.957224878
Pearson coefficient    Without         20           0.9146494635   0.882693501
Pearson coefficient    Without         100          0.843550186    0.8463632529
Pearson coefficient    Without         1000         0.86570804     0.86579376
Euclidean distance     With            20           0.7694310985   0.767654578
Euclidean distance     With            100          0.7399951798   0.746315315
Euclidean distance     With            1000         0.724689055    0.740014834
Euclidean distance     Without         20           0.89754363     0.900235117
Euclidean distance     Without         100          0.83637523     0.8443356
Euclidean distance     Without         1000         0.89751163     0.90023111
Log-likelihood         With            20           0.795861932    0.7785749268
Log-likelihood         With            100          0.7467147307   0.759681185
Log-likelihood         With            1000         0.726946821    0.74111319
Log-likelihood         Without         20           0.823518133    0.82825220
Log-likelihood         Without         100          0.8034275434   0.80880505
Log-likelihood         Without         1000         0.805596293    0.814149532
Tanimoto coefficient   With            20           0.7945015649   0.7798661428
Tanimoto coefficient   With            100          0.7490188732   0.76143536582
Tanimoto coefficient   With            1000         0.7290753292   0.74004491
Tanimoto coefficient   Without         20           0.832473341    0.8394239511
Tanimoto coefficient   Without         100          0.8064418071   0.813111737
Tanimoto coefficient   Without         1000         0.8037747631   0.811900568
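For reference, two of the quantities named in this comparison can be sketched as follows. This assumes RMSE is the usual root-mean-square error over held-out ratings and that the Tanimoto coefficient is computed on the sets of items each user has rated; the slides do not define SCORE, so it is not reproduced here. The class name is hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class EvalMetrics {
    /** Root-mean-square error between predicted and actual ratings. */
    static double rmse(double[] predicted, double[] actual) {
        if (predicted.length != actual.length || predicted.length == 0) {
            throw new IllegalArgumentException("Arrays must be non-empty and equal in length");
        }
        double sumSq = 0;
        for (int i = 0; i < predicted.length; i++) {
            double d = predicted[i] - actual[i];
            sumSq += d * d;
        }
        return Math.sqrt(sumSq / predicted.length);
    }

    /** Tanimoto (Jaccard) coefficient: |A intersect B| / |A union B|. */
    static double tanimoto(Set<Integer> a, Set<Integer> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<Integer> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int union = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / union;
    }

    public static void main(String[] args) {
        // Made-up values for illustration.
        System.out.println(rmse(new double[] {4, 3}, new double[] {5, 3}));
        System.out.println(tanimoto(
            new HashSet<>(Arrays.asList(1, 2, 3)),
            new HashSet<>(Arrays.asList(2, 3, 4))));
    }
}
```

Lower RMSE means predictions are closer to the held-out ratings, which is how the rows of the table can be compared.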
5. Flower grain image classification using supervised classification algorithms
● Dataset: Magnified images of flowers
● Technology: Java
● Algorithm: Grain Analysis (Image Processing)
● Machine learning technique: Neural network, deep belief network, and SVM
● Microscope magnification: 100X
Deep Neural Network
Here we extended a simple neural network architecture to a deep belief network.
Simple Neural Network Deep Belief Network
Flower and its grain images
Model Parameter
SVM training Parameters
Code & Accuracy
6. Large-scale medical text classification and identification in healthcare
● Dataset: Medical text files
Healthcare
ENTITY RELATIONSHIP DETECTION FROM LARGE TEXT FILES
In this project, we developed an algorithm that predicts whether a relationship exists between two entities in medical text files (for example, "leg pain" or "pain in leg"). We used a deep learning support vector machine algorithm (binary classifier) to identify such relationships.
Accuracy: 87% on testing (unseen) data
Healthcare
ENTITY DETECTION IN CLINICAL DOMAIN
In this project, we detected different keywords (modifiers) such as negation, conditional, severity, temporal, body measurements, some disease names, and others in large medical text files using NLP algorithms, and classified them using probabilistic graphical models such as deep CRF networks and hidden Markov models.
Accuracy: 93% on testing (unseen) data.
7. Neural network design for rock image recognition
The objective of this project is to develop a method for a rock image classification system using microscopic imaging of surface parameters. The rock surface parameters are color, grain, and texture. The combined features extracted from these parameters are used to uniquely identify the rock type or to recognize its signature. We designed and developed a multi-layer feed-forward deep neural network to classify non-linear, complex data.
Accuracy: 95% on test rock images not seen during training
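The forward pass of a multi-layer feed-forward network like the one described can be sketched as below. This is only an illustration under stated assumptions: the sigmoid activation, the layer sizes, and the weight values are not given in the slides, and the class name is hypothetical. Training (for example, by backpropagation) is omitted.

```java
class FeedForwardNet {
    // weights[l][j][i]: weight from input unit i to output unit j in layer l.
    final double[][][] weights;
    // biases[l][j]: bias of output unit j in layer l.
    final double[][] biases;

    FeedForwardNet(double[][][] weights, double[][] biases) {
        this.weights = weights;
        this.biases = biases;
    }

    // Sigmoid is an assumed activation; the slides do not name one.
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    /** Propagates a feature vector through every layer and returns the output activations. */
    double[] forward(double[] input) {
        double[] activation = input;
        for (int l = 0; l < weights.length; l++) {
            double[] next = new double[weights[l].length];
            for (int j = 0; j < next.length; j++) {
                double z = biases[l][j];
                for (int i = 0; i < activation.length; i++) {
                    z += weights[l][j][i] * activation[i];
                }
                next[j] = sigmoid(z);
            }
            activation = next;
        }
        return activation;
    }

    /** Index of the largest output, read as the predicted class. */
    static int argMax(double[] v) {
        int best = 0;
        for (int i = 1; i < v.length; i++) {
            if (v[i] > v[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Three inputs stand in for combined color, grain, and texture features (made-up values).
        double[][][] w = {
            {{0.2, -0.1, 0.4}, {0.5, 0.3, -0.2}},   // hidden layer: 3 inputs, 2 units
            {{1.0, -1.0}, {-1.0, 1.0}}              // output layer: 2 units, 2 classes
        };
        double[][] b = {{0.0, 0.0}, {0.0, 0.0}};
        FeedForwardNet net = new FeedForwardNet(w, b);
        double[] out = net.forward(new double[] {0.6, 0.1, 0.9});
        System.out.println("Predicted class index: " + argMax(out));
    }
}
```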