MSc presentation Bioinformatics

DESCRIPTION

This presentation describes a software solution for food science that uses pattern recognition to determine meat spoilage. The software uses GPU computing and machine learning techniques, building training and test sets to assess the quality of recognition.

  • 1. Biological patterns (electronic nose) data classification and recognition: machine learning approaches using GPGPU. Pavels Kartasevs, MSc Applied Bioinformatics course, Cranfield University

2. Contents: Electronic nose; SVM and ANN; Comparison of developed solution; Heterogeneous processing; Results; Further improvements and conclusions.

3. Problem description: Prediction tools make it possible to analyze information from different sources. Application: meat spoilage prediction. The meat spoilage problem (from manufacturer to producer). A sufficiently fast solution and the availability of free software.

4. Meat spoilage: A problem that can impact health. Caused by many different bacteria. Disadvantages of sensory panel and laboratory analysis. Automatic analysis tools.

5. Electronic nose: A wide, emerging field of cheap analysis devices. Can be used in food science. Automatic food quality determination.

6. Electronic nose in prediction of meat spoilage: The electronic nose generates data. Low cost of the device. Fast results. Interpretation of e-nose results.

7. SVM and neural networks: SVM: support vector machines are a relatively new form of supervised machine learning. Artificial neural networks: the model of an artificial neural network mimics the structure of the human brain.

8. Difference between SVM and ANN: SVM: is fast; must perform a grid search to find the optimum solution; constructs a mathematical model of the problem. ANN: learns, in contrast to SVM; can work more efficiently than SVM; processing speed depends on the neuron count.

9. SVM performance comparison: [Bar chart: run time in minutes (axis 0 to 350) for the programs ensembleSVM_Count_ALL.R and CPP_BIO on two machines, "Intel Xeon(R) CPU X5492 @ 3.40GHz 8 DDR2 800 Mhz, Ubuntu 64-bit" and Core i5-3210M / 4 Gb DDR3. Data labels: 112, 10, 308, 19.]

10. Implementation: To reach this speed, the whole application and algorithm were reimplemented in C/C++, which is among the fastest programming languages. LibSVM C/C++ library.

11. Prediction performance of R SVM: 1 iteration; C parameter from 1 to 50 with step 1; gamma from 0.1 to 10 with step 0.1; 80 SVMs. Time: 60 min. Time: 6.5 min.

12. GPU as co-processor: The GPU is good at parallel computations. GPU memory latency. GPU library call latency.

13. GPU libraries results: [Bar chart: "Processing time of 2Mb beef_fillets_fitr data"; x-axis: Library (Easy-cpu, Easy-gpu, Svm-train(cpu), Gpusvm-0.2); y-axis: Time (seconds), 0 to 35.] Why is the GPU slower?

14. GPU ensemble: Because the amount of data is small, running a single SVM on the GPU is inefficient. Given the GPU architecture, however, it makes sense to run an ensemble of SVMs on the GPU in parallel.

15. Re-implementation of libsvm on the GPU: Two different approaches: target NVIDIA Fermi GPUs; target all NVIDIA GPUs.

16. GPU re-implementation results: Full GPU processing time: 1 minute vs 12 seconds on the CPU. As accurate as the CPU.

17. GPU re-implementation results (2): Heterogeneous GPU processing. [Bar chart: time in seconds (axis 0 to 2.5); cpu 1.8, gpu 2.] The GPU implementation is slower by 10%.

18. GPU re-implementation results (3): [Pie chart: "Time spent by the CPU calculating the SVM matrix"; SVM kernel calculation 30.00%, other computing 70.00%.] Calculating the SVM kernel matrix on the GPU saves ~30% of the CPU time, leaving the CPU free to do other calculations.

19. Solution for Graphical User Interface: [Figure with numbered callouts 1, 2, 3, 3, 4.]

20. Future improvements: Further improvements to the solution might include: re-implementing the solution fully in Java to make it portable and independent of libraries and platforms; adding a web interface to the solution; writing an installer to make the solution easy to install.

21. Conclusions: The implemented solution is 10 times faster than the existing R framework solution. Graphical interface implemented. Different analysis types. Heterogeneous computing.

22. Thank you. Any questions?
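
Slide 18 attributes about 30% of the CPU time to calculating the SVM kernel matrix. As a minimal sketch of what that step computes, assuming the RBF kernel (the gamma parameter on slide 11 suggests it, but the slides do not state it), the C++ below fills K[i][j] = exp(-gamma * ||x_i - x_j||^2) on the CPU. The names `Matrix`, `squared_distance` and `rbf_kernel_matrix` are hypothetical and are not taken from LibSVM or from the presented software.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Dense matrix type used only for this illustration (hypothetical name).
using Matrix = std::vector<std::vector<double>>;

// Squared Euclidean distance between two feature vectors of equal length.
double squared_distance(const std::vector<double>& a,
                        const std::vector<double>& b) {
    double sum = 0.0;
    for (std::size_t k = 0; k < a.size(); ++k) {
        const double d = a[k] - b[k];
        sum += d * d;
    }
    return sum;
}

// RBF kernel matrix (assumed kernel): K[i][j] = exp(-gamma * ||x_i - x_j||^2).
// Each entry depends on one pair of samples only, so entries can be
// computed independently of each other.
Matrix rbf_kernel_matrix(const Matrix& samples, double gamma) {
    const std::size_t n = samples.size();
    Matrix K(n, std::vector<double>(n, 1.0));  // diagonal stays 1 = exp(0)
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            const double v =
                std::exp(-gamma * squared_distance(samples[i], samples[j]));
            K[i][j] = v;  // the matrix is symmetric,
            K[j][i] = v;  // so each pair is computed once
        }
    }
    return K;
}
```

Because every entry is independent, a GPU version could in principle assign one thread per (i, j) pair. Whether the presented re-implementation does so is not stated in the slides.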