Journal of Computer Engineering, Vol 2, No 1 (2011), ISSN: 2010-1619
Journal homepage: http://icoci.org/jce

Data Mining Techniques for Wind Speed Analysis (A case study for Gaza Strip)

1 Marwa F. Al-Roby, 2 Alaa M. El-Halees
Computer Science Department, Islamic University of Gaza – Palestine
1 [email protected] , 2 [email protected]

Abstract: Weather data mining is a form of data mining concerned with finding hidden patterns in the large amounts of available meteorological data, so that the retrieved information can be transformed into usable knowledge. In this paper we used meteorological data mining to analyze wind speed behavior. The data are daily historical records taken by the meteorological station of Gaza between January 2003 and November 2006. After preprocessing the data, we applied four data mining techniques: association rules, classification, clustering and outlier analysis. Of these four tasks, we found classification to be the most appropriate for weather data, especially the neural network method, because the data are a time series.

Keywords: Data mining, Association rules, Classification, Cluster, Outlier analysis

I. INTRODUCTION

Weather prediction has been one of the most interesting and fascinating domains. It plays an important role in natural and human life. Agriculture, aviation, fisheries, food security, shipping, safety at sea, monitoring of water resources, many industries and tourism depend on weather conditions [7]. Weather prediction is often used to predict and warn about natural disasters caused by abrupt changes in climatic conditions, so that the necessary protective measures can be taken early [7]. It also helps in using wind energy as an alternative source of energy, mainly for electrical power generation. For these reasons, scientists have forecast meteorological characteristics using a large set of methods. One of them is data mining, a recently developed method that can be successfully applied in this domain [2].

This paper investigates wind speed prediction as a data mining domain using a case study. It shows what kind of data can be collected, how the data can be preprocessed, how data mining methods can be applied to the data, and finally how we can benefit from the discovered knowledge. Many kinds of knowledge can be discovered from data. In this work we investigated the most common ones: association, classification, clustering and outlier detection.

The paper is structured as follows: Section 2 summarizes related work in weather data mining. Section 3 gives a general description of the data used in our case study. Section 4 describes the preprocessing of the data. Section 5 illustrates our experiments in applying data mining methods to the wind speed data. Finally, we conclude the paper with a summary.

II. LITERATURE REVIEW

Much research has been done in the domain of wind speed data analysis. In this section we review some of it.

Motivated by the increasing demand for energy that comes with growing agricultural and industrial activity, K. Sreelakshmi and P. Ramakanthkumar [4] introduced a feed-forward neural network model for short-term wind speed prediction, trained with the back propagation algorithm. They use six parameter values (mean temperature, humidity, wind gust, wind direction, barometric pressure and wind speed) as input to the model. Ten years of historical data are considered for the experiments. They arrive at a model whose predicted wind speed differs from the actual value by at most 5%.

Elia Georgiana Petre [2] tried to forecast weather using CART (Classification And Regression Trees), one of the most popular decision tree algorithms, which can be used to build a classification tree that predicts future temperature values. The meteorological data were registered over Hong Kong between 2002 and 2005. They report that the CART algorithm works only with nominal variables, so they translated the data to nominal type. The results obtained for their example are illustrated in Figure (1).

Figure 1: Evaluation on training set

Paulo Cortez and Aníbal Morais [1] present a novel data mining approach to predict the burned area (or size) of forest fires using meteorological data. They used recent real-world

data collected from the northeast region of Portugal. They ran several experiments with five data mining techniques: multiple regression, Decision Trees (DT), Random Forest (RF), Neural Networks (NN) and Support Vector Machines (SVM). They used four distinct feature selection setups: spatial, temporal, components from the Canadian Fire Weather Index (FWI) system, and meteorological data (rain, wind, temperature and humidity). They find that the SVM technique produces the best predictions for small fires; the drawback is lower predictive accuracy for large fires.

T. Tugay Bilgin and A. Yılmaz Çamurcu [5] determined regions in Turkey that have similar air temperature characteristics. They applied density-based spatial clustering of applications with noise (DBSCAN) to an air temperature database collected from countrywide meteorology stations in Turkey, and obtained clusters with similar temperature trends.

L. Fugon et al. [3] used different data mining models to evaluate wind power forecasting. The models evaluated include neural networks, Random Forests and support vector machines.

III. DATA COLLECTION

In our case study we collected the weather dataset from the meteorological station of Gaza. Gaza is located on the coast of the Mediterranean Sea, north of the Sinai Peninsula and southwest of Jerusalem, at 34° longitude and 31° latitude [10]. The Gaza Strip has a temperate climate, with mild winters and dry, hot Mediterranean-type summers subject to drought [11]. The observed wind speed data are daily historical records covering the four years from January 2003 to November 2006. The number of records is 1429. Our work uses five parameter values acquired from the weather station report: average temperature, pressure, humidity, wind direction and wind speed. These parameters, listed in Table 1, are used as input together with the day and month attributes. The output is the degree of wind speed. All attributes are numeric.

IV. DATA PREPARATION AND PREPROCESSING

To get better input data for the data mining techniques, we preprocessed the collected data. After integrating the data into one file, we rearranged the data in lag format to improve interpretability and inclusiveness. A lag is the value of wind speed recorded on a preceding day. In this case we made three lags of wind speed [day, yesterday, and day before yesterday] through windowing. We then discretized the numerical wind speed attribute into categorical values; for example, we grouped the wind speed attribute into five groups: very high, high, middle, low and very low. We also applied normalization to a subset of the attributes [average temperature, pressure, humidity, wind direction and the three wind speed lags] to convert the values to the range between zero and one. Figure (2) illustrates the normalized data, the windowed wind speed attribute and the discretized output.
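The preprocessing steps above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually run in the case study: the equal-width binning, the choice of the following day's speed as the class, all function names and the sample speeds are assumptions made for the example.

```python
from typing import List, Sequence, Tuple

# Class names follow the paper; the equal-width bin edges are an assumption.
LABELS = ["very low", "low", "middle", "high", "very high"]


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize(value: float, lo: float, hi: float) -> str:
    """Map a wind speed to one of five equal-width bins between lo and hi."""
    if hi == lo:
        return LABELS[0]
    width = (hi - lo) / len(LABELS)
    index = min(int((value - lo) / width), len(LABELS) - 1)
    return LABELS[index]


def windowed_rows(
    speeds: Sequence[float], window: int = 3
) -> List[Tuple[List[float], str]]:
    """Build (lags, class) rows; lags are [day, yesterday, day before yesterday].

    Assumption: the class is the discretized speed of the following day.
    """
    lo, hi = min(speeds), max(speeds)
    rows = []
    for t in range(window - 1, len(speeds) - 1):
        lags = list(speeds[t - window + 1 : t + 1])[::-1]
        rows.append((lags, discretize(speeds[t + 1], lo, hi)))
    return rows


# Made-up daily wind speeds, for illustration only.
speeds = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
normalized = min_max_normalize(speeds)
rows = windowed_rows(speeds)
```

In this sketch each row pairs three consecutive daily speeds with a class label, which is the shape of input the later classification and association steps would consume.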

After applying these preprocessing techniques, we also visualized the data using the Weka software, as displayed in Figure (3).

V. DATA MINING TASKS IN WIND SPEED SYSTEMS

In the next sections, we describe the results of applying each of the four data mining tasks to our data.

Table 1: List of parameters

Figure 2: Normalized data, windowed wind speed attribute and discretized output.

Figure 3: Visualizing the data used in the case study using Weka software.


VI. ASSOCIATION RULES

Association rule mining, which plays a critical role in the field of data mining, searches for interesting relationships among items in a given data set [6]. It studies the frequency of items occurring together in transactional databases and, based on a threshold called support, identifies the frequent item sets. Another threshold, confidence, which is the conditional probability that an item appears in a transaction when another item appears, is used to pinpoint association rules.

For association rule mining, we used the FP-Growth mining algorithm with min support = 0.95 and min confidence = 0.8. It finds rules of the form if [wind_sp1 = low, RH_avg = high and T_avg = high] then [wind_sp0 = low], which is the first rule in Figure (4). This rule says that when lag 1 of wind speed is low, the average humidity is high and the average temperature is high, lag 0 of wind speed is low. The fourth rule is if [wind_sp2 = low, wind_sp1 = low, output (wind speed) = low, RH_avg = high, T_avg = high] then [wind_sp0 = low]. That means that when lag 2 of wind speed is low, lag 1 of wind speed is low, the output wind speed is low, the average humidity is high and the average temperature is high, lag 0 of wind speed is low. The strongest association rules are obtained by changing min confidence to 0.987, as illustrated in Figure (4).
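To make the two thresholds concrete, the following sketch computes support and confidence for a single rule by direct counting. It is not the FP-Growth algorithm itself, only the measures that such an algorithm reports; the function names and the four toy "days" are assumptions invented for the example and are not taken from the Gaza data.

```python
from typing import Iterable, List, Set


def support(transactions: List[Set[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item of the itemset."""
    items = set(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(
    transactions: List[Set[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> float:
    """Estimate P(consequent | antecedent) as support(both) / support(antecedent)."""
    ante = set(antecedent)
    both = ante | set(consequent)
    ante_support = support(transactions, ante)
    if ante_support == 0:
        return 0.0
    return support(transactions, both) / ante_support


# Toy transactions (one per day), made up for illustration.
days = [
    {"wind_sp1=low", "RH_avg=high", "T_avg=high", "wind_sp0=low"},
    {"wind_sp1=low", "RH_avg=high", "T_avg=high", "wind_sp0=low"},
    {"wind_sp1=low", "RH_avg=high", "T_avg=high", "wind_sp0=middle"},
    {"wind_sp1=high", "RH_avg=low", "T_avg=high", "wind_sp0=high"},
]
rule_if = {"wind_sp1=low", "RH_avg=high", "T_avg=high"}
rule_then = {"wind_sp0=low"}

rule_support = support(days, rule_if | rule_then)
rule_confidence = confidence(days, rule_if, rule_then)
```

On these toy days the rule holds on two of the three days where its antecedent occurs; a rule is kept only if both measures reach the chosen minimum thresholds.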

    VII. CLASSIFICATION

Classification analysis is a data mining task that organizes data into given classes. Also known as supervised classification, it uses given class labels to order the objects in the data collection. Classification approaches normally use a training set in which all objects are already associated with known class labels [8]. In wind speed data mining, we classify the degree of wind speed for each day into one of our classes (very high, high, middle, low and very low). We split the data into 70% for training and 30% for testing. In our case study we used three classification methods:

First method: Using Rule Induction

Rule induction is one of the most important techniques of machine learning; it extracts useful if-then rules from data based on statistical significance [9]. We used it to represent logical rules of the wind speed data, as illustrated in Figure (5). We reached an accuracy of 64.80%.

Second method: Using k-nearest neighbor

This technique classifies each record in a data set based on a combination of the classes of the k record(s) most similar to it in a historical data set [9]. In our case we chose k = 5. We reached an accuracy of 62.70%, which is lower than the accuracy of rule induction.
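The k-nearest neighbor idea can be sketched as follows. This is a minimal illustration under stated assumptions: the paper does not name the similarity measure, so Euclidean distance and a simple majority vote are assumed here, and the function names and the tiny normalized data set are invented for the example.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Row = Tuple[Sequence[float], str]  # (normalized features, class label)


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train: List[Row], query: Sequence[float], k: int = 5) -> str:
    """Return the majority class among the k training rows closest to query."""
    neighbors = sorted(train, key=lambda row: euclidean(row[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def accuracy(train: List[Row], test: List[Row], k: int = 5) -> float:
    """Fraction of test rows whose predicted class equals the true class."""
    correct = sum(1 for feats, label in test if knn_predict(train, feats, k) == label)
    return correct / len(test)


# Made-up normalized rows, for illustration only.
train_rows = [
    ([0.10, 0.10], "low"), ([0.20, 0.10], "low"), ([0.15, 0.20], "low"),
    ([0.90, 0.80], "high"), ([0.80, 0.90], "high"), ([0.85, 0.85], "high"),
]
test_rows = [([0.10, 0.20], "low"), ([0.90, 0.90], "high")]
toy_accuracy = accuracy(train_rows, test_rows, k=3)
```

The same `accuracy` helper could be applied to a 70%/30% split such as the one described above; the toy example uses k = 3 only because it has six training rows.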

Third method: Using neural network

A neural network is a simulation of how the human brain acquires knowledge through learning. It is a nonlinear predictive model that resembles biological neural networks in structure. A multilayer perceptron network (MLPN) from the RapidMiner program was trained on the dataset using the feed-forward back propagation (FFBP) algorithm with two hidden layers, as shown in Figure (6), and 1000 training cycles. On our wind speed data, we reached an accuracy of 67.37%.

As a result, we find that the neural network method performed quite well compared with the other classification methods. To determine which method is best, we applied a T-Test to estimate the probability of the null hypothesis, that is, the probability that random values would produce the same result.

Figure 4: Top five association rules for wind speed data

Figure 5: Rule model of wind speed

Figure 6: The structure of the feed-forward back propagation neural network (FFBP).


    VIII. CLUSTERING

Similar to classification, clustering is the organization of data into classes. However, unlike classification, in clustering the class labels are unknown; the task is to find groups of objects such that the objects in one group are similar to one another and different from the objects in other groups. Clustering is also called unsupervised classification, because the classification is not dictated by given class labels [8].

In wind speed data mining, clustering has been used to group the wind speed state of each day according to the changeability of the weather on that day. In our case we used the K-Means algorithm to cluster the given data into five groups (k = 5) and to describe the groups based on their behavior. Figure (7) gives the mean of each cluster for each attribute.
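A bare-bones version of K-Means (Lloyd's iteration) is sketched below to show where per-cluster means such as those in Figure (7) come from. It is an illustration only: taking the first k points as initial centroids is a simplifying assumption (the initialization used in the case study is not stated), and the function names and toy points are invented for the example.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance; sufficient for nearest-centroid comparisons."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(
    points: List[Sequence[float]], k: int, max_iterations: int = 100
) -> Tuple[List[List[float]], List[int]]:
    """Return (centroids, assignment) after alternating assign/update steps."""
    # Assumption: the first k points are the initial centroids (naive, deterministic).
    centroids = [list(p) for p in points[:k]]
    assignment: List[int] = []
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the iteration has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment


# Two made-up, well-separated groups of normalized points.
toy_points = [
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [1.0, 1.0], [0.9, 1.0], [1.0, 0.9],
]
toy_centroids, toy_assignment = k_means(toy_points, k=2)
```

The returned centroids are exactly the attribute means of each cluster. K-Means can settle in a poor local optimum depending on the starting centroids, which is why practical tools usually try several initializations.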

To show the graph we used Singular Value Decomposition (SVD) with two dimensions, as shown in Figure (8).

IX. OUTLIER DETECTION

Outliers are data elements that cannot be grouped into a given class or cluster. Also known as exceptions or surprises, they are often very important to identify. While outliers can be considered noise and discarded in some applications, they can reveal important knowledge in other domains, and thus their analysis can be very significant and valuable [8].

In our case study, we used outlier analysis to detect outliers in the wind speed data set. Two methods were applied for outlier detection:

First method: Using Detect Outlier (Distances)

The Detect Outlier (Distances) method identifies n outliers in the given wind speed data set based on the distance to their k nearest neighbors. In this case we set the number of neighbors to be analyzed to k = 10 and the number of top-n outliers to 30, and we chose the Euclidean distance function for calculating the distance between two points. To show the graph we used Singular Value Decomposition (SVD) with two dimensions, as shown in Figure (9). Red points represent outliers and blue points represent normal instances; the green circle surrounds the outliers.
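One common way to realize a distance-based detector of this kind is to score each point by the Euclidean distance to its k-th nearest neighbor and report the n highest-scoring points. The sketch below follows that formulation; whether the operator used in the case study scores points in exactly this way is an assumption, and the function names and toy points are invented for the example.

```python
import math
from typing import List, Sequence


def kth_neighbor_distance(points: List[Sequence[float]], index: int, k: int) -> float:
    """Euclidean distance from points[index] to its k-th nearest other point."""
    distances = sorted(
        math.dist(points[index], other)
        for j, other in enumerate(points)
        if j != index
    )
    return distances[k - 1]


def top_n_outliers(points: List[Sequence[float]], k: int, n: int) -> List[int]:
    """Indices of the n points with the largest k-th neighbor distance."""
    scored = [(kth_neighbor_distance(points, i, k), i) for i in range(len(points))]
    scored.sort(reverse=True)  # largest distance first
    return [i for _, i in scored[:n]]


# Made-up points: five close together and one far away.
toy_points = [
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.05, 0.05],
    [5.0, 5.0],
]
toy_outliers = top_n_outliers(toy_points, k=2, n=1)
```

With the settings reported above, the call would use k = 10 and n = 30 on the normalized wind speed records.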

Second method: Using Detect Outlier (LOF)

The Detect Outlier (LOF) method identifies outliers in the given wind speed data set based on local outlier factors. In this case we set the lower bound for minimal points to 10 and the upper bound for minimal points to 30, and we again chose the Euclidean distance function for calculating the distance between two points. The LOF is computed for each minimal points value in the range by averaging the ratio between the local reachability density of the objects in the k-neighborhood and that of the object itself. To show the graph we used Singular Value Decomposition (SVD) with two dimensions, as shown in Figure (10). Red points represent outliers and blue points represent normal instances; the green circle surrounds the outliers.
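The local outlier factor for one fixed neighborhood size can be sketched as below. This is a simplified illustration rather than the operator used in the case study: it evaluates a single k instead of the range from 10 to 30, it takes exactly k neighbors even when distances tie, and it does not handle duplicate points (which would give a zero reachability sum). The function names and toy points are assumptions made for the example.

```python
import math
from typing import List, Sequence, Tuple


def k_nearest(points: List[Sequence[float]], i: int, k: int) -> List[Tuple[float, int]]:
    """The k nearest other points to points[i], as (distance, index) pairs."""
    others = sorted(
        (math.dist(points[i], points[j]), j) for j in range(len(points)) if j != i
    )
    return others[:k]


def local_outlier_factors(points: List[Sequence[float]], k: int) -> List[float]:
    """LOF of every point for one neighborhood size k (simplified, see lead-in)."""
    n = len(points)
    neighbors = [k_nearest(points, i, k) for i in range(n)]
    # k-distance: distance to the k-th nearest neighbor.
    k_distance = [neighbors[i][-1][0] for i in range(n)]
    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(i, j) = max(k-distance(j), distance(i, j)).
    lrd = []
    for i in range(n):
        total = sum(max(k_distance[j], d) for d, j in neighbors[i])
        lrd.append(k / total)
    # LOF: mean ratio of the neighbors' density to the point's own density.
    return [
        sum(lrd[j] for _, j in neighbors[i]) / (k * lrd[i]) for i in range(n)
    ]


# Made-up points: a small square with a center point, plus one distant point.
toy_points = [
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, 0.5],
    [6.0, 6.0],
]
toy_lof = local_outlier_factors(toy_points, k=2)
```

Values close to 1 indicate a point whose density matches that of its neighbors, while values well above 1 indicate a point in a much sparser region than its neighbors.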

Figure 7: Clustering wind speed data into five groups using the K-Means algorithm

Figure 8: Graph of wind speed clusters

Figure 9: Graph displaying the outliers in wind speed data by Detect Outlier (Distances)


X. CONCLUSION

In this paper, we applied data mining tasks (association rules, classification, clustering and outlier analysis) to a wind speed data set. The data set consists of daily historical observations recorded from January 2003 to November 2006 by the meteorological station of Gaza.

We applied association rules using the FP-Growth mining algorithm with min support = 0.95 and min confidence = 0.8, and obtained the five strongest rules by changing min confidence to 0.987.

Then we discovered classification rules by applying three methods: rule induction, k-nearest neighbor and neural network. With rule induction we reached an accuracy of 64.80%, and with k-nearest neighbor we reached 62.70%, which is lower than the accuracy of rule induction. Finally, we applied the feed-forward back propagation (FFBP) algorithm and reached an accuracy of 67.37%.

We also clustered the wind speed data into five groups and described them based on their behavior. After that, we used outlier analysis to detect outliers in the data, applying two methods: Detect Outlier (Distances) and Detect Outlier (LOF).

Finally, we found that the most appropriate of these techniques for weather data is the classification task, especially the neural network method, because the data are a time series.

REFERENCES

[1] Cortez, P. and A. Morais, "A data mining approach to predict forest fires using meteorological data", New Trends in Artificial Intelligence, Proceedings of the 13th EPIA Portuguese Conference on Artificial Intelligence, pp. 512-523, 2007.

[2] Elia Georgiana Petre, "A Decision Tree for Weather Prediction", pp. 77-82, LXI, No 1/2009.

[3] Fugon L., Juban J., Kariniotakis G., 2008. "Data mining for wind power forecasting". European Wind Energy Conference.

[4] Sreelakshmi K., Ramkanthakumar P., 2008. "Neural Networks for short-term wind speed prediction". World Academy of Science, Engineering and Technology 42, 721-725.

[5] Tugay Bilgin and Yilmaz Çamurcu, 2004, "A Data Mining Application on Air Temperature Database", in LNCS 3261, Advances in Information Systems, Springer Berlin/Heidelberg, ISBN 9783540234784, pp. 68-76.

[6] S. Yau and L. Zhou, "Association rule mining and quantitative association rule mining among infrequent items".

[7] World Meteorological Organization (WMO), http://www.wmo.int/pages/index_en.html [Online]

[8] http://churmura.com/technology/computerscience/whatcandataminingdo/30684/ [Online]

[9] http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm [Online]

[10] http://www.webgaza.net/background/Gaza_City.htm [Online]

[11] http://www.gazaweather.com/ [Online]

Figure 10: Graph displaying the outliers in wind speed data by Detect Outlier (LOF)