Humanitarian Assistance Ontology Implementation during Disaster Management
in Chennai Flood-2015 Using Text Mining Techniques
1C. Anbarasi and 2P. Mayilvahanan
1Vels University, Pallavaram, Chennai. anbuchandignou@gmail.com
2Department of Computer Applications, Vels University, Chennai, Tamil Nadu, India. Mayilkadir@yahoo.com
International Journal of Pure and Applied Mathematics, Volume 116 No. 21 2017, 729-739. ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version). url: http://www.ijpam.eu (Special Issue)
Abstract
A disaster management plan for the city is in the works, following alarm
over a series of earthquakes that recently occurred in Nepal, the tremors
felt in various parts of the country, and the flood disaster in Chennai in
2015. The main task of this research work is to construct an application
that uses data mining techniques and algorithms during a disaster
situation.
The Commissionerate of Revenue Administration in association with
Chennai Corporation and Chennai district collectorate will work together
on this, according to sources. Chennai is yet to have a comprehensive
disaster management plan, which includes predefined roles and
responsibilities with specific tasks for each official. The disaster
management plan will include detailed mapping of safest escape routes
and resources for facilitating rescue and relief operations. The lack of a
disaster management plan has previously led to a delay in relief and rescue
work after major disasters such as tsunami and flood in the city.
We will start collecting data on resources at the ward level. Chennai
Corporation has 200 wards covering an area of 426 sq km. Ward level
mapping was done after the tsunami. Some of the earlier works pertaining
to mapping for disaster preparedness are not relevant after the boundaries
of wards and zones changed following expansion of the city, said a disaster
management expert who worked on the resilience index for the city. A
Climate Disaster Resilience Index prepared for Chennai, based on data
collected on five elements (physical, social, economic, institutional and
natural) for the ten old zones of the Corporation, has to be revised for the
15 existing zones.
There is also a need for integrating other local bodies on the outskirts in
the disaster management plan. According to a previous study, the coastline
from Ennore to Kasimedu Fishing Harbour was found to be safer. The
coastline from Cooum River to Kovalam creek was more vulnerable to
disasters such as a tsunami. The plan for disaster management will have a
list of low lying areas, slums, persons with disability, senior citizens,
pregnant ladies, cooks, electricians, power cutting tools, ham radios,
dilapidated buildings, hospitals and schools.
Humanitarian Assistance Ontology in data mining for Crisis Response is
a new starting point for researchers; its purpose is to develop an
application tool using the ontology concept in data mining. In the existing
methods the data is collected manually. After collection, the data is sent
to another team for inspection and verification. Once inspection and
verification are complete, the report is submitted to the supervising team
of workers. The report is then sent to the decision makers, who assign
human assistance during the disaster. Carrying out the work this way
during a disaster is tedious, and the procedure depends on a large number
of workers.
The recently proposed OntoDM helps to build a decision support system
for decision makers during a disaster situation using the humanitarian
assistance ontology. The proposed ontology, named OntoDM, is based on a
general framework for data mining. This includes the definition of basic
data mining concepts such as data type and dataset. The online disaster
data is collected from social media and is extremely heterogeneous, both
structurally and semantically. This creates a need for data integration and
ingestion in order to assist emergency management officials in rapid
disaster recovery whenever disasters occur.
Comments and queries posted during a disaster through social media such as
Twitter, WhatsApp, e-mail, Facebook and blogs are collected and integrated
for analysis and management. In this research work, sample data from the
Chennai flood of 2015 is taken for analysis. The research is conducted
with social conscience and could help society to be aware of the disaster
situation. When the situation is shared through social media, many of us
are exposed to the current crisis needs. The work is an application of the
concept of ontology: it shows how ontology can help affected people share
their situation online and seek the help of the National Disaster Relief
Force (NDRF) and State Disaster Relief Force (SDRF). As an attempt to bring
out new trends in data mining research, a comprehensive study was carried
out to build an onto-database that delivers the right information at the
right time to the right people, as a technology in the context of crisis
response.
Key Words: Disaster management, social media, ontology, classification
algorithm, text mining, word cloud.
1. Introduction
All these data are in many different formats. The solutions can be categorized
into main components that interact with one another, namely to employ disaster
management data spaces. Immediately after a disaster, the data is not in a
structured format. However, by applying data mining concepts and algorithms,
the data is automatically pre-processed into useful information for decision
makers in a short time. To pre-process the data, this research work proposes
a method that merges logic rules and data mining techniques to represent the
humanitarian needs.
2. Literature Review
Earlier findings show that very little of the data that needs to be collected
during a disaster has actually been analysed. Crisis management operations
must be dynamic rather than static, yet there is no established method for
such operations. A decision support system is also needed for decision makers
during crisis response to enhance task performance [1]. The work in [2]
proposes a graph clustering method on a directed weighted graph network for
detecting communities in a social network, based on neighbourhood nodes and
the frequency of the path traversed. Social media attracts young people to
organize difficult tasks and motivates them to be activists during an
emergency. However, research is still under way to find the best way to
organize and improve response methods during a crisis [3]. Social media
supports wide-scale interaction that can be collectively resourceful,
self-policing, and generative of information that is otherwise hard to
obtain [4].
An automatic approach for identifying messages communicated via Twitter
contributes to situational awareness, which alerts the public during
emergencies [5]. The Intelligent Disaster Decision Support System (IDDSS) was
recently developed to provide a platform for integrating a vast range of road
network, traffic, geographic, economic and meteorological data as well as
dynamic disaster and transport models [6]. The work in [7] assesses the role
of IDSS in decision making. First, it explores the definitions and
understanding of DSS and IDSS. Second, it illustrates a framework of IDSS
along with the various tools and technologies that support it.
3. Proposed Methodology
Once the cluster solutions are obtained, the clusters are further examined for
their correct classification. In the classification algorithm the data sets
are homogeneous. This research work deals with water-related comments from
300 users. The proposed method also recommends appropriate humanitarian
responses. The main advantage of OntoDM is that it prioritizes humanitarian
needs and identifies humanitarian responses to water-related queries
automatically. The proposed method converts unstructured data into structured
data, so the decision makers can focus more on implementing the solutions.
The method is implemented on real data from the Chennai Flood 2015 crisis.
Its use in the Chennai Flood 2015 crisis clearly shows the need for
humanitarian assistance in specific places to avoid delay and confusion in
humanitarian responses in real time. The developed application, DisManOnto,
collects information related to the disaster situation from social media.
Using a classification algorithm, water-related queries are extracted and
stored in a database. By applying a set of ontological rules, the
humanitarian assistance ontology is recommended for constructing a decision
support system.
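The flow described above (pick out water-related comments, then apply rules to recommend a response) can be illustrated with a minimal Python sketch. It is not the DisManOnto implementation: the paper does not list its classifier or its ontological rules, so simple keyword matching stands in for the classification algorithm, and the names WATER_KEYWORDS, RESPONSE_RULES, is_water_related, recommend_response and triage, as well as the keywords and responses themselves, are hypothetical.

```python
# Hypothetical keyword list and rule table, invented for illustration only.
WATER_KEYWORDS = {"water", "drinking", "flood", "flooded", "sewage"}

RESPONSE_RULES = [
    # (words that must all appear in the comment, recommended response)
    ({"drinking", "water"}, "supply drinking water"),
    ({"sewage"}, "send sanitation team"),
    ({"flooded"}, "send pumping and rescue team"),
]


def is_water_related(comment):
    """Keyword matching used as a stand-in for the classification step."""
    words = set(comment.lower().split())
    return bool(words & WATER_KEYWORDS)


def recommend_response(comment):
    """Return the response of the first rule whose words all occur."""
    words = set(comment.lower().split())
    for required, response in RESPONSE_RULES:
        if required <= words:
            return response
    return "forward for manual review"


def triage(comments):
    """Keep water-related comments and pair each with a recommendation."""
    return [(c, recommend_response(c)) for c in comments if is_water_related(c)]
```

Calling triage on a list of raw comments returns only the water-related ones, each paired with a suggested response; comments that match no rule fall back to manual review, so nothing is dropped silently.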
Structure of HAO-System
Figure 1 (Structure of a HAO system) presents the overall data flow for the
implementation of HAO in identifying the response type during a disaster.
Data is collected from social media using text mining libraries. The
unstructured data is converted into structured input data, and features are
extracted from it. These features are presented to the input layer of the HAO
algorithm along with the target values assigned for each cluster analysis. At
the end of the cluster analysis by the HAO algorithm, the final SPSS results
are stored in the database.
Figure 1: Structure of HAO-System
Architecture of HAO with Text Mining Libraries
Figure 2: Architecture of HAO-System
[Figure labels: Disaster management data collection from messages, e-mail, Twitter, WhatsApp, Facebook, blogs and web portal; Unstructured data; Inductive queries; Data preprocessing; Data extraction; Structured data; Onto database; Document summarization; Spatial clustering; Classification clustering; Reports]
Social Media - Data Query
When a disaster happens, the system receives a lot of information at once. The
system must therefore select the small portion of entities that a user really
cares about for display in the dashboards.
Development Sample
By applying the following statistical methods, the sample data can be
developed into useful information.
1. Statistical modeling – natural language processing, sentiment analysis,
naïve Bayes, k-means clustering
2. Business rules formulation – strategic rules to promote meaningful insights
3. Statistical tools – R language and SPSS.
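Of the techniques listed, naïve Bayes is the most direct way to assign a comment to a category. The following is a minimal Python sketch of a multinomial naïve Bayes classifier with add-one smoothing, written only to illustrate the idea. The paper names R and SPSS as its tools and does not publish its training data, so the function names train_naive_bayes and predict and any labels passed to them are assumptions.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labelled_comments):
    """Count documents per class and words per class from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labelled_comments:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary


def predict(model, text):
    """Return the label with the highest log posterior for the given text."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(class_counts):
        # Log prior of the class.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # words never seen in training carry no evidence
            # Add-one (Laplace) smoothed log likelihood of the word.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

A model is built once with train_naive_bayes from a handful of labelled comments and then reused by predict for each incoming comment. Working in log space avoids numerical underflow when comments are long.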
Text Mining
The data process and service component retrieves data from the databases and
performs the necessary data processing, analytical, or mining functions to
generate the response results requested from the web interface, e.g., customer
remarks or information from external sources. The unformatted input data from
the development sample is converted into formatted data using text mining
libraries and a word cloud.
Decision Making by NLP, Sentiment Analysis
We have developed techniques to facilitate information sharing and
collaboration between both private and public sector participants for major
disaster recovery planning and management. The proposed work is an
integrated environment for exploratory analysis of spatial data that equips an
analyst with a variety of data mining tools and provides automated mapping of
source data and data mining results.
4. Text Mining Techniques Used
Creating a word cloud using R software
Install and Load the Required Packages
Text mining and word cloud packages are required. They can be installed and
loaded using R code, e.g. install.packages("tm") for text mining and
install.packages("SnowballC") for text stemming. They are then loaded with
library("tm") and library("SnowballC").
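A word cloud draws each word at a size that grows with its frequency. The paper builds its cloud with R packages; the Python sketch below only illustrates that underlying computation, assuming a linear mapping from frequency to font size. The function name word_cloud_sizes and its parameters min_size, max_size and top_n are hypothetical.

```python
from collections import Counter


def word_cloud_sizes(words, min_size=10, max_size=40, top_n=50):
    """Map the most frequent words to font sizes, scaled linearly by count."""
    counts = Counter(words).most_common(top_n)
    if not counts:
        return {}
    highest = counts[0][1]   # most_common() sorts by descending count
    lowest = counts[-1][1]
    span = highest - lowest
    sizes = {}
    for word, freq in counts:
        if span:
            offset = (freq - lowest) * (max_size - min_size) / span
        else:
            offset = max_size - min_size  # all words equally frequent
        sizes[word] = round(min_size + offset)
    return sizes
```

The input is a flat list of already cleaned words, for example the words left after stop word removal. The returned dictionary can be handed to any plotting routine that places text at a given size.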
Algorithm 1 - Term Extraction
Input: A corpus M.
Output: The list N of multi-word comment terms.
Step 1: Collect bigram frequencies for M in a proximity database ontoDB.
Step 2: For all 4-grams s t u v in M, remove one count for t u in ontoDB if
 - mi(t, u) < mi(s, t) or
 - mi(t, u) < mi(u, v)
Step 3: For all entries (t, u) in ontoDB, add (t, u) to the list N if:
 - C(t, u) > minCount
 - S(t, u) > minLogM
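The three steps of Algorithm 1 can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: mi is taken to be pointwise mutual information computed from the unadjusted counts, C is taken to be the adjusted bigram count, and because the score S is not defined in the text the sketch substitutes mi for it. The function name extract_two_word_terms and the parameters min_count and min_score are hypothetical.

```python
import math
from collections import Counter


def extract_two_word_terms(tokens, min_count=1, min_score=0.0):
    """Return candidate two-word terms from a list of tokens."""
    unigrams = Counter(tokens)
    # Step 1: collect bigram frequencies (the "proximity database").
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)

    def mi(a, b):
        # Pointwise mutual information from the original bigram counts.
        joint = bigrams[(a, b)]
        if joint == 0:
            return float("-inf")
        return math.log((joint * n) / (unigrams[a] * unigrams[b]))

    # Step 2: for every 4-gram s t u v, remove one count for (t, u) when a
    # neighbouring bigram is more strongly associated than (t, u) itself.
    adjusted = Counter(bigrams)
    for s, t, u, v in zip(tokens, tokens[1:], tokens[2:], tokens[3:]):
        if mi(t, u) < mi(s, t) or mi(t, u) < mi(u, v):
            adjusted[(t, u)] -= 1

    # Step 3: keep bigrams that pass both thresholds. mi stands in for the
    # undefined score S, which is an assumption of this sketch.
    return sorted(
        pair
        for pair, count in adjusted.items()
        if count > min_count and mi(*pair) > min_score
    )
```

The tokens are expected to be a single list covering the whole corpus in reading order. Raising min_count makes the extractor stricter, since a bigram must then survive the discounting in Step 2 with more occurrences left.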
Algorithm 2 - Multi-word term extraction algorithm
Input: A list N of two-word comment terms for a corpus M in any language, and
a proximity database ontoDB consisting of bigram frequencies for M.
Output: The list X of extracted multi-word terms.
Step 1: Collect features for comment terms
 For each comment term u in N
  For each v1 v2 … u … v2i-1 v2i in M
   Add all possible substrings involving u to ontoDB.
Step 2: Save the proximity database
 Remove each entry in ontoDB that has frequency < minFreq.
Step 3: Extend two-word comment terms into an initially empty list X
 For each comment term u in N
  extend(u, X, ontoDB)
  if most occurrences of u in the corpus have not been
  extended then add u to X.
Cleaning the text
The tm_map() function is used to remove unnecessary white space, to convert
the text to lower case, and to remove common stop words such as "the" and
"we". The information value of stop words is near zero because they are so
common in a language, so removing them is useful before further analysis. The
following steps can be carried out in R to clean the text:
Algorithm 3 - Stop word removal
Input – group of sentences
Output – sentences without stop words
Step 1. Read the input
Step 2. Convert the text to lower case
Step 3. Remove numbers
Step 4. Remove common English stop words
Step 5. Remove your own stop words, specified as a character vector
Step 6. Remove punctuation
Step 7. Eliminate extra white spaces
Step 8. Apply text stemming
Step 9. Output the sentences without stop words
Algorithm 4 - Data generation in tabular form
Input – Unstructured information
Output – Structured information
Step 1 – Browsing information
Step 2 – Document preprocessing
 Sentence boundary detection
 Stop word removal
 Stemming
 Term disambiguation
Step 3 – Term extraction
 Calculate term frequencies.
 Calculate percentage.
Step 4 – Check for the terms in the ontology.
Step 5 – Data is stored in the table for the terms that have similarity frequency
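Steps 3 to 5 of Algorithm 4 amount to counting terms, expressing each count as a share of all terms, and keeping the rows whose term appears in the ontology. A minimal Python sketch follows, assuming the comments have already been cleaned and that the ontology is available as a plain set of terms. The function name build_term_table, the row layout, and the choice to compute the percentage over all words in the cleaned comments are assumptions, since the paper does not specify them.

```python
from collections import Counter


def build_term_table(cleaned_comments, ontology_terms):
    """Return one row per ontology term that occurs in the comments."""
    # Step 3: term frequencies over all cleaned comments.
    counts = Counter(
        word for comment in cleaned_comments for word in comment.split()
    )
    total = sum(counts.values())
    rows = []
    # Step 4: check each ontology term against the extracted terms.
    for term in sorted(ontology_terms):
        freq = counts[term]
        if freq:
            # Step 5: store the term with its frequency and percentage.
            rows.append(
                {
                    "term": term,
                    "frequency": freq,
                    "percentage": round(100.0 * freq / total, 1),
                }
            )
    return rows
```

Each returned row is a dictionary, so the result can be written straight to a database table or a CSV file. Ontology terms that never occur in the data are left out rather than stored with a zero count.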
5. Implementation
The word cloud and text mining techniques are applied to the Chennai flood
2015 data set and the results are generated. The screenshots below show that
the text mining libraries are implemented successfully. Comments from social
media are captured, clustered, classified and evaluated, and reports related
to crisis response are generated.
Figure 3: Disaster Management System Home Page
Figure 4: Database - Chennai Flood 2015
Figure 5: Display of Calculation of Percentage from Various Respondents
6. Future Enhancement
The system evaluation results demonstrate the effectiveness and efficiency of
our proposed approaches. During the system implementation and assessment
process, the users pointed out limitations and suggested possible
enhancements. Our future efforts will focus on the following tasks:
evaluation with image and video files, and development of a faster and more
accurate application tool that captures the current user's comments and
provides actionable answers dynamically. The feedback from our users is
positive and suggests that our system can be used to share valuable
actionable information and to pursue more complex tasks.
7. Conclusion
This research work discusses the steps of HAO implementation during disaster
management in the Chennai flood of 2015. The extraction of structured data
from unstructured data conveys the demographic details of the affected people.
Using the proposed methodology, an application tool is developed on inductive
queries and comments collected from social media during the Chennai flood of
December 2015. The data set is classified using text mining libraries and a
word cloud, and the classification algorithm, implemented with logical rules,
helps to structure the data. From the pool of information, we are able to
convert raw data into meaningful information. This meaningful information
helps the decision makers to respond speedily to a crisis during a disaster
situation.
References
[1] Saeed N.A.L., Zakaria N.H., Ahmad M.N., The use of Social Media in Knowledge Integration for Improving Disaster Emergency Management Task Performance: Review of Flood disasters, Indian Journal of Science and Technology 9(34) (2016), 1-12.
[2] Parimala M., Lopez D., Kaspar S., K-Neighbourhood Structural Similarity Approach for spatial Clustering, Indian Journal of Science and Technology 8(23) (2015), 1-11.
[3] Singla M.L., Apoorv D., How Social Media Gives You Competitive Advantage, Indian 8(4) (2015), 1-6.
[4] Sutton J., Palen L., Shklovski I., Backchannels on the front lines: Emergent uses of social media in the southern California wildfires, Proceedings of the 5th International ISCRAM Conference (2008), 1-9.
[5] Verma S., Natural Language Processing to the Rescue Extracting Situational Awareness, Tweets During Mass Emergency, ICWSM (2011), 1-8.
[6] Kaviani A., Thompson R.G., Rajabifard A., Griffin G., Chen Y., A decision support system for improving the management of traffic networks during disasters, Australasian Transport Research Forum (2015).
[7] Tariq A., Rafi K., Intelligent decision support systems-A framework, Information and Knowledge Management 2(6) (2012), 1-9.
[8] Lindsay B., Social Media and Disasters: Current Uses, Future Options, and Policy Considerations, Congressional Research Service (2011).
[9] Bizer C., Lehmann J., Kobilarov G., Auer S., Becker C., Cyganiak R., Hellmann S., DBpedia-A crystallization point for the Web of Data. Web Semantics: science, services and agents on the world wide web 7(3) (2009), 154-165.
[10] Nuno F., Borbinha J., Calado P., An Approach for Named Entity Recognition in Poorly Structured Data, The Semantic Web: Research and Applications (2012), 718–732.
[11] Frantzi K., Ananiadou S., Mima H., Automatic recognition of multi-word terms: the c-value/nc-value method, International Journal on Digital Libraries 3(2) (2000), 115-130.
[12] Harrison C., Jorder M., Stern H., Stavinsky F., Reddy V., Hanson H., Waechter H., Lowe L., Gravano L., Balter S., Using online reviews by restaurant patrons to identify unreported cases of foodborne illness–New York City, 2012-2013, Morbidity and Mortality Weekly Report (MMWR) 63(20) (2014), 441–445.
[13] Kang J.S., Kuznetsova P., Luca M., Choi Y., Where not to eat? Improving public policy by predicting hygiene inspections using online reviews, Empirical Methods in Natural Language Processing (2013), 1443–1448.
[14] Lamb A., Paul M.J., Dredze M., Separating fact from fear: Tracking flu infections on Twitter, Proceedings of NAACL-HLT (2013), 789–795.
[15] Myers S.A., Leskovec J., The bursty dynamics of the Twitter information network, Proceedings of the 23rd International World Wide Web Conference (2014), 913–924.
[16] Naaman M., Becker H., Gravano L., Hip and trendy: Characterizing emerging trends on Twitter, Journal of the American Society for Information Science and Technology 62(5) (2011), 902–918.
[17] Paul M.J., Dredze M., You are what you tweet: Analyzing Twitter for public health, Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media (2011).
[18] Petrović S., Osborne M., Lavrenko V., Streaming first story detection with application to Twitter, Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics (2010), 181–189.
[19] Psallidas F., Becker H., Naaman M., Gravano L., Effective event identification in social media, IEEE Data Engineering Bulletin 36(3) (2013), 42–50.
[20] Magdum P.M., Nandedkar V.S., Developing D-Matrix of Unstructured Text Using Ontology Based Text Mining. International Advanced Research Journal in Science, Engineering and Technology 3(6) (2016), 122-144.