
Lightning Talks & Integrations Track - Real-Time Streaming of Events to Predict Noise @ ABDW17, Pune


• High Throughput Raw Data Ingestion (~35,000 ops/sec)
• Data Pre-Processing (<2 hrs for 96 GB data)
• Visualization on raw and processed data
• Data Access & Fast Query (400K rows/sec)
• Achieved 89% Prediction Accuracy of parasitic events

/input/Perietids_1/Day_7.csv

Time,Variable3,Variable7
1,-0.000456,0.000293
2,0.016824,0.047206
3,0.022986,0.046584
4,0.027403,0.048821
5,0.026759,0.039114
6,0.016151,0.025287

/output/Perietids_1_Day_7.csv

Region,Well,Day,Time,Variable,Value
Perietids_1,1,Day_7,1,Variable3,-0.000456
Perietids_1,1,Day_7,1,Variable7,0.000293
Perietids_1,1,Day_7,2,Variable3,0.016824
Perietids_1,1,Day_7,2,Variable7,0.047206
Perietids_1,1,Day_7,3,Variable3,0.022986
Perietids_1,1,Day_7,3,Variable7,0.046584
Perietids_1,1,Day_7,4,Variable3,0.027403
Perietids_1,1,Day_7,4,Variable7,0.048821
Perietids_1,1,Day_7,5,Variable3,0.026759
Perietids_1,1,Day_7,5,Variable7,0.039114
Perietids_1,1,Day_7,6,Variable3,0.016151
Perietids_1,1,Day_7,6,Variable7,0.025287
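The input and output samples suggest a wide-to-long reshape: one row per (Time, Variable) pair, tagged with Region, Well and Day. A minimal Python sketch of that step follows. The function name `wide_to_long` is hypothetical, and the rule that Region is the parent directory name, Well is the suffix after its last underscore, and Day is the file stem is an assumption inferred from the sample paths, not something the slides state.

```python
import csv
import io
from pathlib import PurePosixPath


def wide_to_long(input_path, wide_csv_text):
    """Reshape one wide day file into long-format rows (header row included)."""
    path = PurePosixPath(input_path)
    region = path.parent.name           # e.g. "Perietids_1" (assumed: directory name)
    well = region.rsplit("_", 1)[-1]    # assumed: well id is the suffix after the last "_"
    day = path.stem                     # e.g. "Day_7" (assumed: file name without extension)

    rows = [("Region", "Well", "Day", "Time", "Variable", "Value")]
    reader = csv.DictReader(io.StringIO(wide_csv_text))
    variables = [name for name in reader.fieldnames if name != "Time"]
    for record in reader:
        # Emit one output row per variable column, keeping values as text.
        for variable in variables:
            rows.append((region, well, day, record["Time"], variable, record[variable]))
    return rows


if __name__ == "__main__":
    sample = "Time,Variable3,Variable7\n1,-0.000456,0.000293\n2,0.016824,0.047206\n"
    for row in wide_to_long("/input/Perietids_1/Day_7.csv", sample):
        print(",".join(row))
```

Values are passed through as strings so the output matches the input digits exactly; at the ~900 million row scale reported later, the same logic would more plausibly run as a distributed job than as a single-process script.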



Single Job


| Sr. no. | Algorithm | Area under ROC (Receiver Operating Characteristic) | Area under PR (Precision Recall) |
|---|---|---|---|
| 1 | Decision Tree | 0.910859892634328 | 0.8998662900517511 |
| 2 | Gradient Boosted Tree | 0.9153602433919851 | 0.8982661492173283 |
| 3 | Random Forest Tree | 0.8921605540835773 | 0.8737343102424423 |
| 4 | Support Vector Machine | 0.45597755210613794 | 0.2680391212960384 |
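For readers unfamiliar with the two columns, the sketch below shows one way such scores can be computed from true labels and model scores. It is an illustration in plain Python, not the code behind the table: `roc_auc` uses the pairwise (Mann-Whitney) definition, and `average_precision` is one common step-wise estimate of the area under the PR curve, which can differ from a trapezoidal estimate. Both function names are hypothetical, and the average-precision version assumes scores without ties.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve: the probability that a random positive
    is scored above a random negative (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (assumes no tied scores)."""
    total_pos = sum(1 for y in labels if y == 1)
    if total_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at each recalled positive
    return precision_sum / total_pos


if __name__ == "__main__":
    y_true = [0, 0, 1, 1]
    y_score = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(y_true, y_score), average_precision(y_true, y_score))
```

An area under ROC near 0.5 means the ranking is no better than chance, which is why the Support Vector Machine row stands out against the three tree-based models.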

Data Pre-processing job (~96 GB data)
• Time = 1 hr 53 mins, ~900 million input rows, with Min, Max, (Max-Min), Average features
• Time = 12 hr 29 mins, ~900 million input rows, with Min, Max, (Max-Min), Average, Distinct features (Spark MLlib)
• Time = 52 mins, ~900 million input rows, with Min, Max, (Max-Min), Average, Distinct features (scikit-learn)
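The feature list above (Min, Max, Max-Min, Average, Distinct) can be sketched as a per-group aggregation over long-format rows. The grouping key (Region, Well, Day, Variable), the reading of "Distinct" as a count of distinct values, and the function name `aggregate_features` are all assumptions made for illustration; the slides do not spell them out.

```python
from collections import defaultdict


def aggregate_features(long_rows):
    """Compute min, max, range, average and distinct count per group.

    long_rows: iterable of (region, well, day, time, variable, value) tuples.
    Grouping by (region, well, day, variable) is an assumption.
    """
    groups = defaultdict(list)
    for region, well, day, _time, variable, value in long_rows:
        groups[(region, well, day, variable)].append(float(value))

    features = {}
    for key, values in groups.items():
        lowest, highest = min(values), max(values)
        features[key] = {
            "min": lowest,
            "max": highest,
            "range": highest - lowest,            # the (Max-Min) feature
            "average": sum(values) / len(values),
            "distinct": len(set(values)),         # assumed meaning of "Distinct"
        }
    return features


if __name__ == "__main__":
    rows = [
        ("Perietids_1", "1", "Day_7", "1", "Variable3", "-0.000456"),
        ("Perietids_1", "1", "Day_7", "2", "Variable3", "0.016824"),
        ("Perietids_1", "1", "Day_7", "3", "Variable3", "0.022986"),
    ]
    print(aggregate_features(rows))
```

This in-memory version only illustrates the logic; for ~900 million rows the same aggregation would need a distributed or chunked implementation.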

Model Building & Save
• 2 mins for 9752 input rows post-aggregations

Event Prediction for 2 wells
• 3-4 mins, ~12 million input rows, 70 day files for 2 wells (140 records)
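To compare the three pre-processing runs on a common scale, the reported durations can be converted into rough rows-per-second rates. The sketch below does only that arithmetic; the run labels are shorthand introduced here, and because the slides give the row count as approximately 900 million, the results are order-of-magnitude figures rather than measurements.

```python
def rows_per_second(rows, hours=0, minutes=0):
    """Average processing rate for a job of the given duration."""
    seconds = hours * 3600 + minutes * 60
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return rows / seconds


INPUT_ROWS = 900_000_000  # "~900 million input rows" from the slides (approximate)

# Durations as reported on the slides; the dictionary keys are illustrative labels.
runs = {
    "basic features": {"hours": 1, "minutes": 53},
    "with Distinct (Spark MLlib)": {"hours": 12, "minutes": 29},
    "with Distinct (scikit-learn)": {"hours": 0, "minutes": 52},
}

if __name__ == "__main__":
    for label, duration in runs.items():
        rate = rows_per_second(INPUT_ROWS, **duration)
        print(f"{label}: about {rate:,.0f} rows/sec")
```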

Overall Accuracy 93.45%
Precision 87.47%
Recall/TPR 90.07%
Specificity 94.73%

| Total events=1603 | Predicted (0) | Predicted (1) | Total |
|---|---|---|---|
| True (0) | TN=1079 | FP=60 | 1139 |
| False (1) | FN=45 | TP=419 | 464 |
| Total | 1124 | 479 | 1603 |

Overall Accuracy 92.20%
Precision 88.90%
Recall/TPR 89.53%
Specificity 95.54%

| Total events=1977 | Predicted (0) | Predicted (1) | Total |
|---|---|---|---|
| True (0) | TN=1350 | FP=63 | 1413 |
| False (1) | FN=59 | TP=505 | 564 |
| Total | 1409 | 568 | 1977 |
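The four headline metrics follow from the confusion-matrix counts by the standard definitions, sketched below. The function name `classification_metrics` is hypothetical. Values recomputed from the counts may differ slightly from the percentages printed on the slides, for example through rounding or transcription.

```python
def classification_metrics(tn, fp, fn, tp):
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,     # share of all events classified correctly
        "precision": tp / (tp + fp),       # share of predicted positives that are real
        "recall": tp / (tp + fn),          # true positive rate (TPR)
        "specificity": tn / (tn + fp),     # true negative rate
    }


if __name__ == "__main__":
    # Counts taken from the two confusion matrices above.
    matrices = {
        "1603 events": dict(tn=1079, fp=60, fn=45, tp=419),
        "1977 events": dict(tn=1350, fp=63, fn=59, tp=505),
    }
    for label, counts in matrices.items():
        metrics = classification_metrics(**counts)
        summary = ", ".join(f"{name}={value:.2%}" for name, value in metrics.items())
        print(f"{label}: {summary}")
```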
