Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
Agile Analytics: How organizations can benefit from the agile
methodology for smoother delivery of data-driven analytics projects
Masterarbeit
Zur Erlangung des Grades eines Masters of Science im Studiengang Webwissenschaft
vorgelegt von
Rafi Ullah
215202917
Erstgutachter: Prof. Dr. Harald von Kortzfleisch, Institut für Management
Zweitgutachterin: Denisse Mendoza Almanzar, M.Sc., Institut für Management
Betreuung: Denisse Mendoza Almanzar, M.Sc., Institut für Management
Koblenz, August 2019
ii
Erklärung
Ich versichere, dass ich die vorliegende Arbeit selbständig verfasst und keine anderen als
die angegebenen Quellen und Hilfsmittel benutzt habe.
Ja Nein
Mit der Einstellung dieser Arbeit in die Bibliothek bin ich einverstanden. ☐ ☐
ABSTRACT iii
ABSTRACT
Despite the inception of new technologies at a breakneck pace, many analytics projects
fail mainly due to the use of incompatible development methodologies. As big data
analytics projects are different from software development projects, the methodologies
used in software development projects could not be applied in the same fashion to
analytics projects. The traditional agile project management approaches to the projects do
not consider the complexities involved in the analytics. In this thesis, the challenges
involved in generalizing the application of agile methodologies will be evaluated, and
some suitable agile frameworks which are more compatible with the analytics project will
be explored and recommended. The standard practices and approaches which are
currently applied in the industry for analytics projects will be discussed concerning
enablers and success factors for agile adaption. In the end, after the comprehensive
discussion and analysis of the problem and complexities, a framework will be
recommended that copes best with the discussed challenges and complexities and is
generally well suited for the most data-intensive analytics projects.
Keywords Agile Analytics, BI, Data Science, Data Analytics
TABLE OF CONTENTS iv
TABLE OF CONTENTS
1 Introduction ............................................................................................................. 1
1.1 General ............................................................................................................... 1
1.2 Problem Statement ............................................................................................. 2
1.3 Research Questions and Objectives ................................................................... 3
1.4 Research Methods .............................................................................................. 4
2 Theoretical Background ......................................................................................... 5
2.1 Agile Development ............................................................................................ 5
2.2 Agile Software Development ............................................................................. 5
2.3 Different Types of Agile Methods ..................................................................... 5
2.3.1 Extreme Programming (XP) ....................................................................... 6
2.3.2 SCRUM ...................................................................................................... 6
2.3.3 Kanban ........................................................................................................ 7
2.3.4 Crystal ......................................................................................................... 7
2.3.5 Feature Driven Development...................................................................... 8
2.3.6 Dynamic Systems Development Method (DSDM) .................................... 8
2.4 Business Intelligence (BI) .................................................................................. 9
2.4.1 Introduction to BI ....................................................................................... 9
2.4.2 BI Architecture ......................................................................................... 10
2.4.3 BI architecture Layers .............................................................................. 10
2.5 Big Data ........................................................................................................... 12
2.5.1 Three V’s from Big Data .......................................................................... 12
2.5.2 Big Data Tools and Technologies ............................................................ 13
2.6 Agile Analytics ................................................................................................ 16
2.6.1 Introduction to Agile Analytics ................................................................ 16
2.6.2 Phases of Agile Analytics ......................................................................... 19
2.7 Data Value Chain for Analytics ....................................................................... 21
2.7.1 Data Discovery & Acquisition ................................................................. 21
2.7.2 Integration ................................................................................................. 22
2.7.3 Data Analysis & Delivery......................................................................... 23
2.7.4 Data Governance ...................................................................................... 23
2.8 Business Intelligence and Analytics ................................................................ 24
2.8.1 Levels of Business Analytics .................................................................... 25
2.9 Past and Existing Agile Analytics Development ............................................. 26
v
2.10 Agile BI ............................................................................................................ 26
2.10.1 Factor Effecting Agile BI ......................................................................... 28
3 Literature Review .................................................................................................. 32
3.1 Introduction to Literature Review .................................................................... 32
3.1.1 The need for conducting a Literature Review .......................................... 33
3.1.2 Stages of Literature Review ..................................................................... 34
3.1.3 Structure of the Literature Review ........................................................... 34
3.2 Selected Research and Literature Review Methodologies .............................. 35
3.2.1 A Short Guide to Action Research (Johnson, 2012) ................................ 35
3.2.2 Guidelines for performing Systematic Literature Reviews in Software
Engineering (Kitchenham & Charters, 2007) and Procedures for Performing
Systematic Reviews (Kitchenham, 2004) ............................................................... 37
3.2.3 Scoping studies: Towards a methodological framework (Arksey &
O’Malley, 2005) ...................................................................................................... 41
3.3 Chosen Methodology ....................................................................................... 43
4 Systematic Literature Review .............................................................................. 44
4.1 Planning the Review ........................................................................................ 44
4.1.1 Review Objectives .................................................................................... 44
4.1.2 Research Questions .................................................................................. 44
4.1.3 Search Strategy ......................................................................................... 44
4.1.4 Search Criteria .......................................................................................... 45
4.2 Conducting the Review .................................................................................... 46
4.2.1 Search the Literature ................................................................................. 47
4.2.2 Selection of Literature .............................................................................. 48
4.2.3 Literature Quality Assessment.................................................................. 50
4.2.4 Data Extraction and Synthesis .................................................................. 54
4.2.5 Present the Review ................................................................................... 61
5 Conclusion, Limitations and Future work .......................................................... 71
5.1 Conclusion ....................................................................................................... 71
5.2 Limitations ....................................................................................................... 79
5.3 Future Work ..................................................................................................... 79
6 References .............................................................................................................. 81
7 Annotated Bibliography ....................................................................................... 87
LIST OF FIGURES vi
LIST OF FIGURES
Figure 1 Research Onion (Own Image Following (Saunders & Schwaferts, n.d.) .......... 4
Figure 2 Business intelligence architecture (Own image following (Jasim Hadi, Hameed
Shnain, Hadishaheed, & Haji Ahmad, 2015) ................................................................. 10
Figure 3 ETL process ..................................................................................................... 11
Figure 4.Analytics ladder (Own image following Tom, 2012) ...................................... 17
Figure 5 (Own Image following Nagle & Sammon, 2013) ............................................ 22
Figure 6 Stages of Literature (Own image following (Levy & Ellis, 2006)) ................. 34
Figure 7 Steps for Action Research Process (Own image following (Johnson, 2012)) . 36
Figure 8 Steps to perform Systematic Literature Review (Own image following (Okoli
& Schabram, 2010)) ....................................................................................................... 40
Figure 9 Iterative search cycle (Own image following (Jacobs, 2015) .......................... 47
Figure 10 Snowballing step (Own image following (Wohlin, 2014)) ............................ 49
Figure 11 The selection and filtration stages of primary studies (Own image following
(Ahmad et al., 2013)) ...................................................................................................... 51
Figure 12 Grounded Theory Process (Own image following(Chun Tie et al., 2019)) .. 57
Figure 13 Year-wise selected publications (Own Image) .............................................. 62
Figure 14 Percentage of methodologies applied in selected papers (Own Image) ......... 63
Figure 15 Research papers type (Own Image) ............................................................... 64
Figure 16 Agile Data Science lifecycle (Own image following (Schleier-Smith, 2015))
........................................................................................................................................ 72
Figure 17 Agile Analytics Pyramids (Own image following (Schleier-Smith, 2015)) .. 73
LIST OF TABLES vii
LIST OF TABLES
Table 1 Apache technologies and description following (SINGH & SINGLA, 2015) .. 16
Table 2 Disadvantages of traditional Business Intelligence Approach following
(MUNTEAN & SURCEL, 2013) ................................................................................... 27
Table 3 “five C’s” for literature review following (Labaree, n.d.) ................................. 33
Table 4 Difference to Literature Review ........................................................................ 33
Table 5 Comparison of Systematic Reviews with Scoping Reviews (Own table
following (Grant & Booth, 2009) ................................................................................... 43
Table 6 Database resources for literature ....................................................................... 45
Table 7 Query results after search and initial filters ....................................................... 49
Table 8 Selected Journals and Conference Papers ......................................................... 53
Table 9 Grounded theory and Meta-Ethnography difference (Own table following
(Khan, 2014) ................................................................................................................... 58
Table 10 Concept Matrix (Own image following (Müller-Bloch & Kranz, 2015)) ....... 58
Table 11 Concept Matrix ................................................................................................ 61
Table 12 Type of Research papers ................................................................................. 63
Table 13 Papers findings and summary .......................................................................... 70
LIST OF ABBREVIATIONS viii
LIST OF ABBREVIATIONS
Business Analytics (BA)
Data Science (DS)
Systematic Literature Review (SLR)
Extreme Programming (XP)
Test-Driven Development (TDD)
Product Owner (PO)
Work In Progress (WIP)
Feature Driven Development (FDD)
Dynamic System Development Method (DSDM)
Rapid Application Development (RAD)
Data Warehouse (DW)
Hadoop Distributed File System (HDFS)
Value-Driven Development (VDD)
Business Intelligence (BI)
Big Data (BD)
Big Data Management (BDM)
CI/CD (Continuous Integration/Continuous Delivery)
Introduction 1
1 Introduction
1.1 General
To be in competition, modern enterprises utilizing data for their competitive advantage.
Organizations are collecting and curating data on a grand scale in support of innovative,
data-intensive applications both to increase revenues and to improve operational
efficiencies. It has become clear that the value of data lies in an organizations ability to
operationalise it. To work on the ocean of data to create data pipeline and extract useful
information for decision support Business Analytics (BA) is a great challenge (Valentine
& Merchan, 2016). Moreover applying some data science models and get it to production
as a working product is no less than a battle. Teams are not coding collaboratively; models
have to wait for engineering and testing, which ultimately makes it difficult to manage
the resources.
In data-driven digital transformation, the ability to derive and process data in a
meaningful way offers a competitive advantage. It helps to create new questions over that
analytics and identify new opportunities. Combining data with suitable agile
methodologies makes the process of achieving business value very efficient and cost-
effective.
Data Science, together with agile methodology, facilitates to observe more and beyond to
solve the problems creatively and in a systematic way. Over time, agile principles and
practices have been evolved, along with new challenges and future directions. New
trends, including data science and fast analytics, have arisen as part of business
intelligence (Dharmapal & Sikamani, 2016).
Analytics product delivery could not be accomplished through traditional waterfall
software development model because analytics is directed more towards data discovery,
getting the relevant information and then making sense out of it. The assumption is to
how to utilize an agile methodology with a focus on knowledge discovery rather than
software features development.
Agile Analytics is also based on a set of rules and guiding principles. Practically, It is a
set of well-organized techniques, which can partially be customized for specific analytics
project (Collier, 2011 p.7). Agile analytics comprises of all the practices ranging from
project planning and management to monitoring. It also enables effective collaboration
Introduction 2
between all stakeholders, including customers, management and the teams responsible
for project delivery.
1.2 Problem Statement
Ever since the introduction of agile methodologies in 2001. They had been playing a vital
role in software development and has been applied excessively in many organizations
ranging from start-ups to large enterprises. By applying agile methodologies, the
organizations have delivered successful software development projects (Collier, 2011
p.9). Development of analytics projects including BI, big data and data science are not
possible with a traditional agile approach. The focus of an analytics project is to create
business value using data discovery. The agile methodologies will be applied to analytics
software with a focus on information use rather than tool development. There is a growing
requirement in the last five years to provide faster information and analytics for decision
making (Collier, 2011 p.10).
In analytics, BI systems are provided to assist the decision-making for top management
of an organization. For the success of any BI initiative, it is essential to deliver accurate
information. To some extent, BI solutions still lag due to time constraints, which
ultimately causes project failure. To tackle all these issues and to get the required
information to management, BI system development should be agile.
With the emergence of data science, it became part of a much broader set of analytics
applications across all industries. The common challenge of enterprise data science is that
the specialized data science teams are working on the things which are not related to their
expertise. With a traditional waterfall model of product development, it is challenging to
create an environment of collaboration, data is mostly inaccessible, and data scientist are
not aware of project ownership (Valentine & Merchan, 2016). With an agile approach to
data science, it embraces collaboration and creativity. Changes can be made later as a
project evolves, and stakeholder feedback is ongoing.
Introduction 3
1.3 Research Questions and Objectives
In any organization development of analytics and BI system requires the people from
much different expertise to collaborate to bring the requirements to the actual product.
RQ1: How have agile methodologies evolved with the emergence of data science?
RQ2: Which agile frameworks, methodologies and best practices are well known for
data-intensive applications and how they could be applied?
RQ3: What are the enablers and success factors of Agile Analytics?
RQ4: What are the challenges organizations faced for adopting agile methodologies for
analytics applications?
The objective of the research is to identify how organizations can employ the best agile
practices to enhance productivity and successful completion of data analytics projects and
create value for the organizations. The focus will be on how we can create value from
data using Agile Analytics. It provides a way of exploring the data by focusing on finding
the value (Tom, 2012). It will be done by understanding the factors that affect Agile
Analytics. Any analytics application including BI, big data and data science systems
comprises of different processes, and each process is composed of multiple steps. Then
to be completely agile, all the steps and components should also be agile. At every step,
all the components from BD, DS and BI and software development teams should work
and collaborate smoothly (Collier, 2011 p.38). The bottleneck in any component will
affect the project as a whole and will slow down the delivery. The outcome of this thesis
would be the guidelines on how an organization can apply agile methodologies to data
analytics projects.
Introduction 4
1.4 Research Methods
Qualitative Research Methods
The essential purpose of the research is to deliver the factual knowledge by abolishing
the myths and misconceptions which allow the people to act (Timulak, 2009). There are
generally two methods for conducting research they are quantitative and qualitative
methods (Yin, 2015). In this thesis, we will use qualitative methods. The systematic
literature review (SLR) and document analysis are used as a research methodology. A
systematic literature review (SLR) is an organized way to identify, evaluate and interpret
research findings related to a specific research topic (Saltz & Shamshurin, 2016). It
facilitates the identification of gaps and helps summarize existing research concerning
process and methodologies used by agile teams to collaborate.
Figure 1 Research Onion (Own Image Following (Saunders & Schwaferts, n.d.)
Theoretical Background 5
2 Theoretical Background
2.1 Agile Development
The core idea and manifesto behind agile is “individuals and interactions over processes
and tools; working software over comprehensive documentation; customer collaboration
over contract negotiation; and responding to change over following a plan. The result of
following these ideals, software development becomes less formal, more dynamic, and
customer-focused” (Chang & Larson, 2016).
2.2 Agile Software Development
In recent years, agile methodologies have emerged as a whole new alternative to
traditional software project management methodologies. There has been a consensus for
a substitute for documentation driven software development processes (Ilieva, Ivanov, &
Stefanova, 2004). The key challenges involved in software development are changes
which occur and evolves at a much higher rate. In that case, if there occurs any change
later in the project lifecycle, it will not only increase the cost but will also delay the project
delivery. Agile methodologies have been introduced to counter these risks.
After the introduction of agile manifesto in 2001, many agile methods have been
introduced, although these methods differ in particulars but serve the core idea of enabling
the teams to accommodate the rapid change which ultimately reduces the project cost and
make sure timely delivery (Paetsch et al., 2003).
Although every agile method is useful for a separate set of problems and contexts, the
challenge remains how to determine which agile method is suitable for given project
activities. All methodologies come with the risk and the main tasks for project managers
is to identify and manage those risks by utilizing the most effective methodology for a
specific situation.
2.3 Different Types of Agile Methods
The agile methodologies have emerged at a breakneck pace. There are multiple
methodologies introduced for different problem contexts. We will discuss some of the
agile methodologies and will compare how they are different in application and working.
Theoretical Background 6
2.3.1 Extreme Programming (XP)
Being the widely used agile methodology, XP supports the values shared by the agile
manifesto for software development and also specify its own set of principles and
guidelines (Paetsch et al., 2003). Extreme programming is based on a set of values. It is
a way to work in coordination with corporate and personal values. It analyzes the progress
and evaluates activities in the software development process. XP values include
simplicity, communication, feedback, courage, and respect (Newkirk, 2002). Extreme
programming is applied using a set of 12 practices. It works well mostly with a small
development of up to 15 developers. Rather than focusing on a long process of
requirements and design documentation XP focus on executable functionality and TDD.
It emphasized robust regression testing, refactoring and code inspections through pair
programming.
2.3.2 SCRUM
Scrum is an agile framework used for managing the entire lifecycle of software
development projects. Sometimes discussed as an agile methodology, Scrum is a robust
framework used for managing software development processes. It put extra responsibility
to the teams to decide and come up with the working solution rather than providing detail
for each step. The scrum model contains multiple components, but the roles, processes,
and artefacts are considered as the main components. In Scrum planning meeting, teams
commit to a specific set of the feature instead of task definitions. The scrum teams consist
of five to ten people each, who work on the development of assigned features from the
start to the end (Cervone, 2011). The teams are self-organizing and cross-functional.
Overall there is no team leader to assign tasks, and all the issues and problems are solved
as a whole team. As scrum is an iterative technique, it delivers the product features in an
incremental way. Scrum could be applied to projects of either size from small, medium
to large scale projects and even scalable to the whole organization (Livermore, 2007).
Scrum contains two important roles Scrum Master and Product Owner (PO). Scrum
Master’s responsibility is to make sure that the team is working according to scrum rules
and guidelines. PO represents customers and conveys the business requirements and lead
towards the required product (Angela Stringfellow, 2017). Scrum processes consist of
five main activities, including sprint kick-off meeting, sprint planning, the sprint itself,
daily scrum meeting, and at the end sprint review meeting (Cervone, 2011). Projects
Theoretical Background 7
evolve with a series of sprints. In the planning meeting at start team members commit to
certain features, and then sprint backlog is created to track the project progress.
2.3.3 Kanban
Organizations moving towards agility across the world are implementing Kanban to their
existing software development processes to model their workflows in an agile way and
faster delivery to market (Matharu, Mishra, Singh, & Upadhyay, 2015). Kanban is a lean
method and popular framework used in agile software development. The robust feature
provided by Kanban is to visualize the flow of work using charts with every column
representing each step in the flow (Poppendieck & Cusumano, 2012). In software
development, each work item is represented visually on a board called Kanban board,
through which team members can see the actual progress and state of the work item. It is
a method to create the product with continuous delivery without burdening the teams,
which ultimately helps the teams to work together more efficiently. The Kanban offers
visibility to each process in software development by showing assigned task to each
developer and prioritize the critical tasks and spot the bottlenecks.
Moreover, its objective is to minimize WIP items and develop only those which are
requested by the customer, which will produce the flow of continual delivery of items to
the customer. According to team capacity, Kanban limits the work in progress items and
balance the demand and output of the team’s delivered items. It helps to maintain a steady
flow by visualizing the processes and minimizing the bugs (Ahmad, Markkula, & Oivo,
2013).
2.3.4 Crystal
Crystal is an agile development framework and a part of crystal methodologies family
developed by Alistair Cockburn in early 1990s. Crystal, along with adaptive ideas,
evolved together to provide guidelines for software development processes (Beck et al.,
2001). Rather than tools and processes, it put more emphasis on individuals and
collaboration between them, which will bring efficiency to the development project.
Crystal is a very flexible model which could easily be integrated with other agile models
and frameworks. The core idea is that there should be different treatment for every
different project based on the situation. Crystal method is a methodology toolkit in which
organizations combine elements from different methodologies to fit the individual
project. Organizations develop different combinations of methodologies which fit their
Theoretical Background 8
business needs (Livermore, 2007). It provides incremental development with not more
than four months. To deal with time constraint an active communication between the
teams and other stakeholders is recommended with support to lightweight documentation
(Farhan, Tauseef, & Fahiem, 2009).
2.3.5 Feature Driven Development
Introduced in 1997, FDD is an agile development methodology which consists of two
stages, first is to identify feature to develop and second is to develop that features
iteratively. FDD can effectively control complex and incremental projects. It offers
different capabilities to management including analysis and performs different tasks to
add the value to the customer, checking the project progress with time and budget
constraint, identify risks and ways to address and minimize them (Hunt, 2006). FDD
consists of different phases which consist of
Development of the standard model, the advanced and high-level model, and
the scope of the system is defined
Build a feature list, features are identified from information gathered after
modelling
Plan by feature, development plan for each identified feature is formed
Design by feature, detailed design of feature is defined
Build by feature, after detailed design teams start developing the feature
FDD can adapt to each project of a different nature (Firdaus, Ghani, & Yasin, 2013).
2.3.6 Dynamic Systems Development Method (DSDM)
DSDM is an end-to-end agile framework which not only addresses the whole project life
cycle but also addresses its impact on business. Like other agile frameworks, DSDM put
the focus on individuals rather than tools. It understands the needs of business and delivers
the software as quickly as possible by providing best practices for rapid application
development (Stapleton, 1999). Invented in 1994 while using Rapid Application
Development (RAD) by project managers, DSDM represents and utilize much of existing
knowledge about project management and became a famous framework among software
development community for handling complex business projects (Voigt, 2004). It is
known for its capability for rapidly delivering working software having budget and time
constraints with changing requirements. It is adequate for a large scale and also on small
Theoretical Background 9
solutions. The primary purpose of DSDM formation is to replace the unstructured
software development using RAD. Being an iterative and incremental approach, it
emphasizes continuous customer involvement. It has a total of four phases
Feasibility and Business Studies
In this phase, the technical feasibility of application is analyzed against the proposed
problem. In addition applicability of DSDM is also verified for that specific problem.
After carefully reviewing the business problem with system requirements, the underlying
architecture of the system is defined
Functional Model Iteration
The prototype of the required system is developed iteratively for early user review and to
get a clear understanding of the proposed system. The prototype is improved iteratively
based on user feedback.
Design and Build Iteration
At this phase, the prototypes are tested to verify their usability in the operational
environment. The components are refined further to achieve the required system.
Implementation
At this stage, the system is deployed, and the users are trained.
2.4 Business Intelligence (BI)
2.4.1 Introduction to BI
In the past two decades, BI has become increasingly crucial among business communities.
BI is known as a set of standard processes, tools, and technologies needed to look insight
into data for information and use that information to derive business plans and to make
critical decisions. BI is an umbrella concept which includes big data and BA and
knowledge management (H. Chen, Chiang, Storey, & Robinson, 2012). With the
lowering cost to store and process a vast amount of data, there has been exceptional
growth in the adoption of BI technologies from large enterprises in the last decade.
Businesses are employing sophisticated data analysis techniques to make business
decisions and to offer personalized services for customers (Chaudhuri, Dayal, &
Narasayya, 2011).
With BI, companies can lower cost and improve business opportunities. It also helps to
identify inefficient business process causing loss. Although BI comes up with huge
Theoretical Background 10
promises, implementing a successful BI project is not less than a challenge. Data must be
organized and processed before feeding the BI system to get better insights.
2.4.2 BI Architecture
The most essential in any BI architecture is the data warehouse and data marts on which
any BI system can fully function. The classic BI architecture consists of multiple layers,
including data sources, data warehouse, ETL layer, and end-user (Chaudhuri et al., 2011).
DW is the central part at which data from multiple sources are gathered and stored after
processing. The main purpose behind building a DW is that data must be stored centrally,
which later could easily be accessible for analysis and decision making (Narasimhan,
2014). A data warehouse is considered as a critical factor for any successful BI
implementation.
Figure 2 Business intelligence architecture (Own image following (Jasim Hadi, Hameed Shnain,
Hadishaheed, & Haji Ahmad, 2015)
2.4.3 BI architecture Layers
BI architecture has the following layers
Data Source Layer
Data source layer comprises of several data sources which are operational database and
external data sources. In a traditional setting, operational data refers to structured data
stored in relational databases. The external data sources consist of social media and
customer data (Jasim Hadi et al., 2015). The data from all these resources will be
transferred to DW.
Theoretical Background 11
ETL Layer (Extraction, Transformation and Loading)
The ETL is the process to Extract, Transform and Load the data from multiple selected
data sources to the data warehouse. In the extraction process data from external and
internal sources are identified and extracted. The Extraction process is critical as it must
be planned carefully as the later analytics process depends on this data. The organization
must decide about the relevant data sources. Extracting all the data or irrelevant data will
not be sufficient for decision making (Coelho Da Silva et al., 2018). The extracted data
is moved to the staging area. The extracted data is different and is not useful for analysis.
In the transformation process, data is cleaned, mapped and transformed by applying
multiple operations before moving it to the data warehouse. The concluding step involves
in the ETL process is loading. In this step, relatively massive datasets need to be loaded
to the DW (Chaudhuri et al., 2011). Hence it needs optimization for better performance.
Figure 3 ETL process
Data Warehouse Layer
At this layer, all the data collected from multiple sources is loaded into DW. A DW is a
storage location where data is stored to be directly consumed for reporting and analytics
purposes. The data serves as a single point for serving different organizational analytics
needs by preserving the data quality(Chaudhuri et al., 2011). In some cases, due to
complexities and organizational requirements, data warehouses divided into subsets
Theoretical Background 12
called data marts. Each data mart could serve the analytics requirements for different
departments (Chaudhuri et al., 2011).
End-User layer
End-user consists of dashboards and analytics portals of an organization. The reporting
tools display the data in a meaningful way. The end-user can manipulate and interact with
the dashboard to perform analysis.
2.5 Big Data
The term Big Data is used for large scale datasets, both structured and unstructured, which
are generated through day to day or other business transactions and activities. Diverse
data is produced from several sources in different sizes. As the name implies, the term
big data describes the data which cannot be captured and processed with traditional
databases. Data from numerous sources, including the internet of things (IoT), social
media and mobile devices data are the new sources of complex data. The Big Data sources
include data from sensors, devices, logs, transactions, social, web and many other types
of data from many different sources (Orton, 2015).
By analysing the big data, the researchers and analysts can find insights which could be
used by business users to make decisions. Different analytics techniques such as machine
learning, data science, natural language processing (NLP), and statistical analysis are
applied on data to get in-depth insights.
Big data have many characteristics which differentiate it from transactional data that are
Volume, Velocity, and Variety.
2.5.1 Three V’s from Big Data
Volume
Volume denotes the amount of data. In addition to structured data, the big data systems
have to process a large amount of unstructured data. Data could be a value from sensors,
a social media feed, and log data, mobile, and customer touchpoint data or any unknown
value (Swapnil & Anil, 2016). The size of the data can exceed to terabytes. Because of
lowing the price or storage cost, it has become a standard to store the data for different
purposes. Different technologies are available in the market to store huge amount of data,
including Hadoop, which store the data in a distributed file system known as HDFS
(Narasimhan, 2014).
Theoretical Background 13
Velocity
Velocity represents the speed at which data is created. As the flow of information is
continuous, it created increasingly at a breakneck pace. To provide a quick response and
to maximize efficiency, rapid processing is needed (Swapnil & Anil, 2016). In some
processes, data flow must be analyzed as soon as it is created from the source. Ability to
analyze real-time data provide actionable insights rapidly. The actual potential of data
could only be unleashed based on how quickly the system can process the generated data.
Velocity related challenges could be handled with new innovative data technologies like
NoSQL database(Swapnil & Anil, 2016).
Variety
As discussed before, data is collected from several sources which could be structured,
semi-structured and non-structured data sources. Structured data refers to relational
databases and is stored in tables, semi-structured data refers to XML, JSON and the data
from NoSQL, unstructured data consists of log files, text and multimedia content, sensors
data and many different kinds of other business data (Narasimhan, 2014). Variety refers
to the nature of data represented by various data sources with different data types. It is
the biggest challenge to analyze a large volume of data of various types (Swapnil & Anil,
2016).
2.5.2 Big Data Tools and Technologies
Due to the size and types of data involves, big data requires the latest tools and
technologies to process it. There are latest technologies available which not only provide
the extension of traditional databases but also perform storage and processing of big data.
Currently, Hadoop and Map-reduce are the leading tools to process big data. As a
programming model map-reduce work on the Hadoop framework. It performs parallel
processing by dividing the tasks into smaller sub-tasks (SINGH & SINGLA, 2015).
Hadoop
Hadoop is a distributed batch processing system used for parallel processing of big data.
Hadoop has three main components, HDFS, YARN, and MapReduce. It is a scalable and
distributed system which can work on minimum network bandwidth (Singh, Singh, Garg,
& Mishra, 2015). Hadoop is very flexible and low cost to manage and process large data
sets efficiently. It also does job scheduling for data and resources and provides load
balancing (Coelho Da Silva et al., 2018). Hadoop Distributed File System (HDFS) is the
Theoretical Background 14
most critical component of Hadoop to manage the data from different servers. It is file-
based and does not require a data model to store data. It can handle all type of data from
log files to XML and JSON files.
MapReduce
Developed by google MapReduce is the main technology which is implemented by
google in an open-source environment. MapReduce permits to write optimized programs
to process large chunks of data. The fundamental function of MapReduce is to divide a
big task into several smaller chunks and treat them accordingly (Chaudhuri et al., 2011).
MapReduce has a total of four components
Input The data collected to process is further divided into chunks which are allocated to
mappers. These small chunks are called input.
Mapper Mappers are the smallest unit assigned for processing
Reducer Output from mappers serves as the input for the reducer which ultimately
aggregates it for final output
Output All Jobs which are aggregated at reducer are collected here and represented as
the combined output
Hadoop Distributed File System
HDFS is a data storage element of open source Apache Hadoop project. Able to store any
form of data, HDFS runs on low-cost hardware and is highly scalable to thousands of
systems. Based on master-slave architecture, HDFS is a file-based system. It converts
large data files into standard 64 MB chunks and stores them in a massive cluster
(Narasimhan, 2014). There are two nodes, data node and name node in each Hadoop
cluster. The name node is considered as the master node. It controls all data nodes. Data
nodes are regarded as slave nodes(Coelho Da Silva et al., 2018).
Hadoop Technologies Description
Apache HBase HBase is a NoSQL database which runs
on the top of (Hadoop distributed file
system) HDFS. Written in java, Hbase
allows to read and write the data on the
HDFS simultaneously.
Theoretical Background 15
Apache PIG PIG is a scripting language. It is used to
analyze large datasets. With PIG the
developers can write the programs to
process large dataset from Hadoop cluster.
PIG convert the programs into map-
reduce jobs and execute them. Initially, it
was developed by yahoo for its data
scientists.
Apache Sqoop Sqoop was built to transfer bulk data
between relational databases and Hadoop.
It is used to import data between external
data sources to HBASE, HDFS or HIVE,
and vice versa. It allows parallel data
transferring ability. It uses SQL queries
and already saved data to perform import
and export and to perform updates
between Hadoop and relational database.
Apache Hive Just like SQL, Hive is a query language
called HiveQL. It was developed by
Facebook, mainly for data analysis. It is
adopted in many Big Data organizations.
It also allows querying the data from
HDFS, which later would be converted
into map-reduce jobs.
Apache Flume Flume assists in collecting and moving a
large amount of data with the help of
distributed services. With flume, the user
can utilize log data and can perform
parallel streaming to collect real-time log
data from multiple sources.
Apache Zookeeper Apache Zookeeper is an open-source
project initiated by Apache. It provides
centralized infrastructure to perform
synchronization across the whole Hadoop
Theoretical Background 16
cluster. It also provides naming services
and configurations throughout the cluster.
Table 1 Apache technologies and description following (SINGH & SINGLA, 2015)
2.6 Agile Analytics
2.6.1 Introduction to Agile Analytics
The volume of data is increasing rapidly through both internal and external resources, and
analytics tools are also getting more sophisticated to handle different domains of an
enterprise. Organizations are trying to find value through analytics in the face of complex
market competition. Decision-makers of any organization need rapid and precise
analytics insights for making timely decisions to be ahead of competitors. Quick changes
are required in the current system to cope with market threats, which ultimately end up in
the organization with many parallel system developments as the companies are not
prepared to handle the change at this pace. There is no unified and consistent approach to
integrate new technologies with the existing systems (Earley, 2014).
Agile analytics provide flexible analysis and adjusting change to specific needs rather
than following the rigid flow. Similar to agile methodology, Agile Analytics also
comprises of a set of rules and guiding principles. By focusing on end goals, it provides
data-driven predictions for decision making. It is a style for building DW, analytics, and
BI application. It offers faster data discovery, which is well suited to changing business
requirements. It focuses on iterative development lifecycle with an emphasis on finding
value by analysing multiple data points. It reveals new interpretations and answers from
result findings. It can analyze unorganized and raw data by prioritizing value over the set
of processes. It will give the flexibility to examine the datasets with no real value and will
also help to remove biases.
Theoretical Background 17
Figure 4.Analytics ladder (Own image following Tom, 2012)
Development of complex BI and analytics systems involves high risk, lengthy and
complicated planning and massive uncertainty about project success. With analytics
systems using the traditional system, a development method could result in a high-quality
product. However, it requires much planning and is comparatively expensive to adopt, as
it requires extensive design and documentation. Putting much effort before having even
a small workable solution could be a massive setback for the company if the project fails.
In that case, it is recommended to have only sufficient amount of planning, adoption to
change, preparation for risks and involvement of team members to productively achieve
the development targets. Often confused with methodology or framework, Agile
Analytics is a development style. In every BI an analytics project, every team must
support agile values and principles and adopt technical and team management and
customer collaboration (Collier, 2011, p. 3-6).
Although Agile Analytics delivers meaningful information at a swift pace, it requires
some changes in the organizational approach to handling the data. With technology
advancement, it has become easy for businesses to identify useful information from data.
Theoretical Background 18
Agile analytics can accommodate changes and can scales very fast upon new
requirements and impacts. It does not require long project timeline and huge investments
to start (Tom, 2012). Tasks from business use cases related explicitly to analytics are
assigned to cross-functional teams. For specific tasks related to another specialized area,
a member from other teams could also be injected in the analytics team temporarily.
With the main focus on high-quality BI and analytics solution, Agile Analytics have
multiple goals as explained below
Iterative and Incremental Development
Although it is considered as a modern approach, iterative and incremental development
dates back to mid-1950s. As the development world embraced agile methodologies,
incremental, iterative and evolutionary style of development widely applied, replacing
the traditional waterfall model. It offers incremental development of a system by
prioritizing user-valued features and evolve the system by continuous adaption of
customer feedback (Larman & Basili, 2003). Work is performed in small iterations from
one to three week and not more than four weeks. These small iterations with continuous
user feedback will keep the project in the right way.
Value-Driven Development
It is generally believed that goal-oriented development to create business value could
more effectively be achieved using value-driven development. The aim is to get a value
feature at the end of every iteration (Ktata & Lévesque, n.d.). In VDD, application
development is directly associated with business economics values. Irrespective of
complexities and steps involved in the development of data infrastructure, users are only
concerned about the outcome which ultimately would solve the business problem (Souza,
Moreira, Abrahão, Araújo, & Insfran, 2016).
Production Quality
After each iteration, the provided feature must perform as per business requirement. It
must be tested and debugged accurately at every stage of development. As the product
evolves with every iteration, the outcome should be a working functionality. Continuous
testing should be planned and performed rigorously to get a quality feature. The feature
should be accepted finally after getting approval from the end-user.
Theoretical Background 19
Automate Routines
Agile development discourages to perform routine processes manually. To be fully agile,
routine processes must be automated. Manual testing should be replaced by automated
testing. Manually testing every test case causes valuable time loss. Automated testing will
support to validate the system repeatedly to ensure product quality. With the emergence
of DevOps, Continues Integration and Continuous Delivery could be performed, which
will reduce considerable time to production. Analytics teams automate any repeated
process and can put more focus on developing quality features.
Team Collaboration
The analytics project requires expertise from different areas comprising software
development to data engineering and data science. Collaborations among all stakeholders
are crucial to project success. Establishing a workspace which provides a collaborative
environment and promotes positive collaboration among team members (Hossain, Ali
Babar, & Paik, 2009). Continuous interaction and collaboration between the user and the
development team with a daily meeting among technical team members ensure a
successful project.
Self-Organization and Self-Managing Team
Create self-organizing independent teams which can manage their tasks autonomously.
Hire the experts and provide them with the required tools, then allow them to be
successful on their own. Agile project management enables team members to work
independently and facilitate collaboration between end-user and other stakeholders in the
project. The teams will decide the features to deliver in every iteration and will be fully
responsible for committed modules.
2.6.2 Phases of Agile Analytics
Feasibility and Planning Phase
Being the first phase, it includes user requirements collection, identifying stakeholders
and develop product backlog. Customer expectations are being set after a deep
understanding of business and data. User stories which also serve as product backlog are
written and explained by the product owner. Features are planned to develop and deliver
independent from each other. Different tools and technologies are assessed and identified
to be used in the analytics phase. Different data sources are also identified to collect data
and infrastructure is also planned. Resource identification and allocation, along with the
Theoretical Background 20
release iteration plan, are prepared (Dharmapal & Sikamani, 2016). Cross-functional self-
dependent teams are made and assigned tasks for iterative development of identified
features. Despite core team members, other members could be added later as the product
evolves.
Implementation Phase
Features are prioritized and developed iteratively from product backlog prepared in the
planning phase. Also, data is collected from several data sources that will be transformed
into the standard scope and combined to central storage. Data cleaning is performed by
removing isolated and unrelated data. At this phase daily stand up meeting is performed
to increase collaboration between all stakeholders including data analyst, business
owners, development team and product. Collaboration among teams is essential for
successful and timely product development (Dharmapal & Sikamani, 2016). This phase
also allows parallel development of user interfaces (UI) for data visualization. Pre-
deployment model testing is performed to ensure that the newly developed model is
compatible with the existing system and interfaces. One of the critical features for Agile
Analytics framework is that it facilitates the parallel development of application
components including UI, Model development and visualization. After the pre-
deployment pilot phase, the results are evaluated from the management to assess the
business impact. If the outcomes fulfil expectations, the project will move to the next
stage, and if it deviates, the modifications will be performed. The project could also be
abandoned if the required output is not possible (Tom, 2012). Once the project passes the
inspection, it will be moved to the production environment. At this stage, the project has
deliverable consisting of new dashboard, reports and analytics solution (MUNTEAN &
SURCEL, 2013).
One of the principal value of agile development is that it can accommodate changing
business requirements, and the final deliverable would be a required product. Predictive
analytics could also be performed on data using analytics tools. Continuous integration
and testing for different environments in every iteration will ensure the required project
without any deviation (Dharmapal & Sikamani, 2016).
Final Phase
The final phase makes sure that all the requirements have been successfully implemented.
It is the final stage before the product is released. Final testing is performed, and models
Theoretical Background 21
are evaluated. The output product from this phase will benefit to end-user to make
business decisions.
2.7 Data Value Chain for Analytics
Data value supply chain describes the data flow process in an organization and explains
the essential tasks require to perform in each task to acquire valuable business value. It
has been used as a process model for the development of a decision support system. It
consists of multiple activities that need to be performed in a sequence to get valuable data
insights. In the information value chain, organizations employ multiple different activities
(Curry, 2016). A systematic review of each activity would be helpful to provide insight
into value generation and hurdles involved in that process.
The data value chain explains the insight into the data evolution, starting from collecting
raw data to perform analysis until the final predictions for decision making. The
framework to generate the value consists of four main activities by which organizations
create the commercial value of their data.
1. Data Discovery
2. Integration
3. Data Analysis & Delivery
4. Data Governance
2.7.1 Data Discovery & Acquisition
Every organization needs data sources to perform analyzes and make predictions; Data
discovery involves the collection, preparing and organizing the data sets (Miller & Mork,
2013). With the emergence of the latest data technologies, it has become possible to
collect, store and process a massive amount of data without having full information about
its structure and meaning. Hadoop is a commonly used software invented by google to
land a massive amount of data. It manages data storage and processing in a distributed
environment without any need to do data modelling (Chang & Larson, 2016). Considered
as a hub of data initiatives, including predictive analytics, machine learning, and data
mining application, Hadoop supports structured and unstructured data and gives the user
more flexibility.
The first step of the Data Value Chain is to collect the data. It could be from multiple
sources, consisting of both internal and external. Data could either be human or sensors
generated. In the case of social media when a user posts something, a range of data is
Theoretical Background 22
generated including user location, device information, users like and dislikes and content
of data. Considering the total number of that specific social media platform, the total
amount of data collected is beyond imagination. In other forms, all B2b transactions,
financial transaction, flight, and aviation systems are some of the other few examples to
understand the amount and significance of data generated through customer touchpoints
daily (The Data Value Chain, 2018). With the rise in the popularity and adoption of the
Internet of Things (IoT) and industry 4.0 digitalization, sensors and recording devices
also create a massive sum of data.
The first step is to create the inventory of available data sources and to evaluate the quality
against different parameters. The crucial step is to convert unstructured data into
structured data (Miller & Mork, 2013). After identifying and collecting the data, in the
next step, it is filtered and cleaned before putting it into the DW. Analytics is performed
over the data from DW. Data discovery is one of the biggest challenges in the whole
analytics process as the later steps fully dependent on this (Curry, 2016).
Figure 5 (Own Image following Nagle & Sammon, 2013)
2.7.2 Integration
After deciding the data sources and performing data acquisition, the data which is
collected from multiple sources would be transformed into the common representation
which is appropriate for performing analysis. Mapping is performed to relate the data
from multiple sources. Physical integration is performed in the form of DW. It is also
Theoretical Background 23
possible to integrate virtually through federated models. With the latest semantic web
technologies, it has become possible to query on combined data in a non-tabular form.
Relational databases are more suitable for tabular data (Miller & Mork, 2013). As the data
is integrated in a structured way, its value increased. Data is stored in a convenient way
for fast access to quickly perform analysis to get the strategically useful information (The
Data Value Chain, 2018).
2.7.3 Data Analysis & Delivery
At this stage, data is analyzed to get data insights for decision making and delivered to
the customer. It involves exploring the data to extract hidden information which could
make an impact according to the business needs. There are many different methods of
analysis, the simple and basic approach to define mathematical and statistical models
(Curry, 2016). There are many sophisticated methods in practice which are currently
being used to get insight into data. The results from this stage would be used to make
decisions. To better understand the problem, visualization is performed in the form of
static result report or application with interactive charts and graphs. By visualization, end-
users can easily understand the meaningful information to quickly grasp the problem and
make decisions (Miller & Mork, 2013). The final aim of the whole process is what actions
should be taken after visualizing the problem. If the visualized results are showing the
trends in loss, the stakeholders can take necessary steps and make a decision which could
positively affect the situation to convert the loss into reward. The end-user should be able
to grasp the information and interact with the application without any difficulty(Curry,
2016).
2.7.4 Data Governance
Data Governance is a collection of rules, policies, practices, and processes which ensure
the data management during the whole analytics process. It deals with the availability,
integrity, and security of data to make sure quality data is available for analysis at every
instance of analytics implementation phases. A Data Governance program consists of a
set of rules and procedures and a governing body to plan and execute those procedures
(Tallon, 2013). Considered as one of the main force for driving business value from data,
Data Governance is a success factor for any analytics initiative. It has a direct link to data
quality, compliance, and security of data.
Theoretical Background 24
2.8 Business Intelligence and Analytics
Increasingly available robust technologies to process BD, BI, and advanced analytics
have become very significant. The prospect linked with data analytics for different
organizations has helped to evolve techniques, technologies, practices, and
methodologies to analyze the business data to assist the business to understand market
opportunities and threats and to make the timely decisions. In addition to utilizing tools
and technologies, business analytics provides business-centric approach and
methodologies which have a productive impact on applications from various industries
including e-commerce, healthcare, security, financial institutions and more (Lim & Chen,
2013).
The terms Analytics, Business Analytics, Advanced Analytics, and Business Intelligence
are related to one another. Analytics can be well-defined as the process or practice which
analyzes, explore, visualize and find trends in data by utilizing information system,
statistical and operations research methodologies. Analytics is a common terminology
equally applied to all disciplines (Schniederjans, Schniederjans, & Starkey, 2014). It is
the ability to collect, manage and analyze massive datasets to get rightful information at
the fast pace and to present it to the management. It consists of not only tools and
technologies but the methodologies specific to this process. Some organizations have
adopted the latest technique and technologies in BI systems. Applying new techniques
like predictive modelling and interactive visualization in analytics is called advanced
analytics (Halper, 2017). Traditional BI systems provide transaction and cost
management activities with visual representations in the form of dashboards. It includes
ETL, DW and OLAP functionalities. With self-service and advanced data analytics
techniques, users can interact with user interfaces, perform predictive analytics, visual
data discovery, reporting, and data analysis in a more advanced and sophisticated way
(Halper, 2017).
Sometimes referred to as Business Analytics, Advanced analytics offers a range of
algorithms to perform complex analysis on structured and unstructured bulk of data. It
uses sophisticated machine learning models, statistical formulas and other cutting-edge
tools and technologies to find patterns and behaviour of data to make predictions and to
optimize decisions. It is not only an algorithmic process but a complete set of activities
consisting of the infrastructure of data management and processing, organizational
processes to adopt data-driven environment, and development techniques (Halper, 2017).
Theoretical Background 25
2.8.1 Levels of Business Analytics
There are many levels of analytics, but the three primary levels are discussed below
1. Descriptive Analytics
2. Predictive Analytics
3. Prescriptive Analytics
Descriptive Analytics
In the analytics process, the first step is to identify and reason about past events about
what has happened and why? Descriptive Analytics answer these questions (Mohr &
Hürtgen, 2018). Performed on big data, descriptive analytics help to discover hidden
information, and explain the characteristics and relationship among the characteristics. It
provides brief information about what has happened and why and what is happening right
now, e.g. pay-per-click and email marketing (Sun, Sun, & Strang, 2018).
Predictive Analytics
Used as a recommendation, Predictive Analytics describes the events which could happen
in future. Predictive analytics algorithms such as machine learning, regression analysis,
neural networks exist since long. Though, the infrastructure and technology
advancements in recent years have made it very convenient to use these algorithms in
analytics application. Predictive analytics have been widely used in the marketing
industry to better understand customer needs and preferences (H. Chen et al., 2012).
Prescriptive Analytics
In prescriptive analytics, analysis is performed on raw data, and the best solution is
recommended. It reasons the information about expected possible scenarios, past
performance, available resources, and suggests the strategy for future actions. It combines
and utilizes both descriptive and predictive analytics (Sun et al., 2018). Based on the
results from predictive analytics, it suggests all the favourable outcomes and proposes a
future course of action to bring in a particular outcome. It uses feedback and learning
system to improve the relations between suggested action and required outcome.
Recommendation systems used by Spotify and Netflix are well-known examples of
prescriptive analytics.
Theoretical Background 26
2.9 Past and Existing Agile Analytics Development
Ever since the introduction of agile development strategies in software development.
They have been proved very significant. However, there has not been satisfactory
research done on how to use agile methodologies for the development of an analytics
application. Development of analytics systems such as BI system is incremental and
always needs some revisions to implement changing business requirements (Ko &
Abdullaev, 2007). Agile methodologies can manage this environment. As being specific
to the organizational needs, there is no standardized approach to apply agile
methodologies.
There are some proposed agile methodologies for the development of individual stages
of the analytics process, but no particular methods for the development of complete
analytics solutions or incorporating change in the existing one (Knabke & Olbrich,
n.d.).In the book “Agile Analytics: A Value-Driven Approach to Business Intelligence
and Data Warehousing”, ken collier in 2011 introduced and implemented agile
methodologies that specifically deals with the development of Data-Intensive
solutions(Collier, 2011).
2.10 Agile BI
Agile meaning is the ability to be flexible to adopt change. Defined by Forrester Research
Agile BI is a methodology which combines all stakeholders including business processes,
organizational structure, tools and technologies to assist the decision-makers (Evelson,
2014). There is an abrupt shift towards agile methodologies from the traditional waterfall
approach. The popularity of agile methodologies gathered support and attention of people
of the data community (Kannan & Services, 2011). Adoption of fast analytics,
particularly machine learning and data science, natural language processing, predictive
analytics, and artificial intelligence, are not supported in traditional BI systems.
Traditional BI system mostly provides static views on structured data, which contradicts
the agile manifesto which allows to adopt changing business requirements. There are
some disadvantages of traditional BI systems mentioned below in the table
Disadvantage Problem
Massive duplication of data With every change, there will be a new
requirement to change duplicate data,
Theoretical Background 27
which causes data inconsistencies and will
reduce data quality.
Different tools required to perform
different tasks
As the meta specifications are not shared
across tools, different tools are used to
perform a different task which in turn
create data inconsistencies
Strong relational data models Due to structured relational data models, it
is very inflexible to make any change.
Analysis of external and unstructured data
is also minimal
Waterfall model approach With the waterfall approach, there are
very long development cycles with less
user involvement. Very inflexible to adopt
changing analytics requirements. Testing
is performed at the end.
Table 2 Disadvantages of traditional Business Intelligence Approach following (MUNTEAN &
SURCEL, 2013)
For BI projects, teams must be collaborative and well organized. The BI systems should
be developed with a focus on creating business value rather than getting more into
technological details. Typical BI systems consist of three layers consisting of ETL, DW
and Front End (Ko & Abdullaev, 2007). Organizations have multiple data sources from
internal to external data sources with many different types of data, including databases,
files, and emails, excel sheets, audio, and video data and more. Traditional BI system
does not utilize all the data but instead, rely on small fractions for the required result.
Some BI systems only depend on structured data (MUNTEAN & SURCEL, 2013).
The BI projects should be planned iteratively rather than provide all in one package.
Initially, small but working functionality would lay the foundation of later effective
delivery until the maturity and successful product delivery (MUNTEAN & SURCEL,
2013).
Most of agile BI project teams use Just-In-Time planning with iterative and incremental
planning to accommodate rapid change in the requirements. Customers provide
continuous feedback and work alongside the development teams to keep development in
line with business needs. Teams deliver partial but fully functional features after each
Theoretical Background 28
iteration. The product evolves with every iteration to accommodate all changes. Teams
and customers collaborate to accommodate changes and to work together to achieve the
desired product (Leybourn, 2013). Customers expect fast delivery without waiting for
months. Agile methodologies can make a positive impact on users by delivering value
information quickly rather than waiting for months. The approach of applying agile
methodologies on BI systems would be how to get the required information rather than
building a software application (Chang & Larson, 2016). Agile BI provides timely
information in a proper format to the right person.
Application of agile in BI is still not much researched upon in academia, but it is a buzz
word and highly discussed in the business world. The most essential and highly applied
aspect of agile is agile software development. Although software development using agile
methodologies is widely known, agile business intelligence is a less known but a vital
aspect. There are some success factor for agile BI discussed below
2.10.1 Factor Effecting Agile BI
Iterative Development Life Cycle
In agile software development, the whole process is iterative and incremental. Rather than
spending months and years to come up with finished product consisting of all required
functionalities, the agile development put more focus on rapid delivery of partial
functionalities get the customer feedback to refine the features for the next iteration. This
iterative and incremental approach also well suited for large scale analytics projects. In
BI implementation software is the constant, while the other things include creating the
data connections between software and data sources and creating the user experience are
variable. As the user experience is consists of a large part of BI systems, profound user
involvement is critical in successful implementation (Intelligence, n.d.). Teams are
working together in an iterative process to extract and derive useful knowledge in each
iteration. In BI, each iteration is based on user stories, and the end of each iteration, the
release represents each story (Intelligence, n.d.). The whole systems are divided and
developed in small chunks. The primary purpose of iterative development is to get
customer feedback and to validate the project progress, which means is that every
iteration should provide working functionality. One of the critical task to apply agile
methodologies to analytics projects is to adopt a personalised project management
process which supports iterative and incremental approach (Collier, 2011).
Theoretical Background 29
Flexible to Adopt Changes
The development process should flexible to adopt changes as the product evolves. The
organizations need to define the strategy to predict and cope with the sudden changes.
The changes could be different types depending on nature. Internal business changes
could be a change in strategy or reorganization (Strohmaier & Lindstaedt, 2005). External
changes could be the emergence of latest technologies and change in demand dimension.
Agile analytics is to identify and respond to changes quickly. The system must be able to
predict future changes by using collected data and should respond rapidly (Woolley,
Melbourne, & Hobbs, 2008). BI applications are developed to answer the strategic
question according to changing and new business needs. Agile BI could be viewed as the
ability of a business to provide valuable information in response to rapid changes. In
advanced business analytics solutions, the ability to realize the upcoming changes by
performing data comparison will provide the ability for early adoption (Sanaa, Afifi, &
Darwish, 2016).
Automate ongoing BI Process
Foundation of creating the analytics application utilizing agile methodology depends on
its ability to automate the repetitive task and to focus on quick insights into data to derive
fast decisions (Valentine & Merchan, 2016). The routine manual process should be
automated, and teams should implement automated continuous implementation and
deployment. With cutting edge latest technologies like machine learning and artificial
intelligence, and data science systems are capable of automating as much workflow as
possible. Increased automatization and intelligence of BI systems will increase the
efficiency to analyze in a short time by reducing manual inputs (Sanaa et al., 2016). Time
for each iteration could be minimized by automating testing and deployment (H. M. Chen,
Kazman, & Haziyev, 2016).
Team and User Collaboration
Collaboration among teams, team members and customers are pivotal to successful
implementation. As compared to traditional approaches, agile put more focus on
collaboration, which in turn enhance productivity. In an agile setting, customers work
alongside development teams until project maturity (Kruchten & Gorans, 2014).
Collaboration with the customer is valued over contract negotiation. Changes made
alongside development iteration could be reviewed quickly to validate customer
expectations. Without customer feedback, functional documents will define how the
Theoretical Background 30
product should look like; these documents sometimes deviate from customer
expectations. With the agile approach, customers can see the working solutions
throughout the whole project (H. M. Chen et al., 2016).
Technology/ Technical Factors
Applying agile methodologies to the BI system will change the way the infrastructure
was being used for testing, build, and deployment from many different teams. The
induction of new Agile Development and Operations (DevOps) tools that optimize and
reduce the time to delivery and support frequent releases to end-users. Tools and
technology support is essential for any agile team to be critical (Kruchten & Gorans,
2014). Although considered as the most crucial factor in BI development, all other things
should be in place before the technology phase. Data for business intelligence is collected
from multiple sources; the sources could be structured, semi-structured or unstructured.
Its ranges from spreadsheets, logs and sensors data, JSON files and databases. Data
representation is different for each source, and some sources will have data quality issues,
which later could affect the development process and lead to an unsuccessful
implementation (H. Chen et al., 2012). To tackle this problem, organizations create an
integrated data source to make it consistent with using with BI and analytics system.
Every organization has a different approach for data integration, which ultimately decides
project success (Williams & Williams, 2007). Considered as the most critical and integral
to BI, the DW integrate and serve the data to BI systems, and the way teams handle it
affect the agility in BI systems (Baars & Zimmer, 2013). One of the significant
components of the analytics system is frond-end application. The business value created
through processing on extracted data could only be understandable by the user if
displayed correctly at front-end (Nagle & Sammon, 2013).
Front-end technologies are directly accessed by users, hence critical to Agile BI
implementation. To be agile in BI implementation, the whole components of this process
should be agile (Krawatzeck, Dinter, & Thi, 2015). Agility in front-end application could
be achieved by introducing new tools or by modifying the existing ones as needed. There
could be multiple different tools involved in this process (Baars & Zimmer, 2013).
Alignment of Goals and Business Value
Goals and objective for implementing Agile Analytics must be cleared. Every business
has different reasons and objectives; some adopt it to increase productivity, to improve
customer involvement, manage project risk and to address the quality problem and more.
Theoretical Background 31
There could be plenty of reasons to shift to an agile way. The goals must be cleared and
evaluated against the success factor. The organization should also analyze how the
objective supports business goals. It will ensure achieving a strategic objective by
enabling the improvement in analytics processes (Collier, 2011 p.302).
Training and Coaching
Formalized way of transferring knowledge is represented as training. Agile training help
to understand agile practices, principles, and values to newly formed Agile groups.
Sometimes referred to as a scrum master, the agile coach is a vital member of an agile
community who has deep understandings of agile techniques. The coach could be a
mentor, collaboration conductor, and problem-solver. Successfully adopting agile
methodologies require a focus on coaching and training. All team members, including
managers, need agile training (Collier, 2011).
Literature Review 32
3 Literature Review
3.1 Introduction to Literature Review
The literature review is the way of exploring by targeting a specific area of research to
define the approaches to answer the questions. Focusing and elaborating the work which
has been done on a specific topic, it may give the contextual knowledge to perform higher
work or to answer at its own(Levy & Ellis, 2006). It analyzes and combines information
from scholarly literature about significant issues by comparing and evaluating main
arguments, methodologies and approaches.
According to (Webster & Watson, 2002), the literature review is the process of reviewing
the existing literature to create a foundation for proceeding knowledge. It assists in the
development of theory and to uncover areas where more research efforts are needed. It
could be a necessary part of a research report, article or dissertation and could also be
used as a standalone research paper.
The literature review must be structured to facilitate the reader to grasp the idea entirely.
Following are the “five C’s” of writing literature review discussed in the table below
Cite The researcher should always keep the focus
specific to the research.
Compare Compare the arguments and theories
presented in the research paper or literature to
get the authors finding and agreements. Find
out who has employed the same approach?
Contrast Contrast several arguments, methodologies,
and approaches explained in the literature to
find the point of disagreement and debate.
Critique Find out the persuasive arguments and the
fact behind its persuasiveness. Which
findings are the most reliable ones and why?
Put more attention to the assertive verbs used
by the author.
Connect Connect the findings with the area of research
and draw perspective based on what has been
said in the literature.
Literature Review 33
Table 3 “five C’s” for literature review following (Labaree, n.d.)
The difference of literature review to the book review and annotated bibliography
Annotated Bibliography Book Review Literature Review
Annotated Bibliography
reviews and aggregate the
relevant sources and
explain its significance
with relevance to the
research question.
It involves analysing and
evaluating the book.
A literature review
involves surveying all
relevant literature to
identify already revealed
and hidden facts about a
particular topic.
Table 4 Difference to Literature Review
3.1.1 The need for conducting a Literature Review
Form a concrete theoretical foundation for research
One reason behind conducting the literature review is to analyze and explore the research
and outcomes which has already been done regarding the specific area of interest. It lay
the solid foundation to perform quality research based on analysis and synthesis of
available literature. Although all information from literature is not equally accurate, the
researchers need to use quality literature as a basis for further studies. It ensures the
legitimacy of research performed and brings out reliable findings (Levy & Ellis, 2006).
By providing the foundation for a proposed area of research, it also authenticates the
existence of the research problem (Abbott & Grady, 2011).
Conducting a useful literature review not only provides a theoretical foundation for
further studies but also help to choose the precise research methodology for further
studies. It should also provide the researcher to justify the selection of specific
methodology as to why applying that approach is more suitable in that specific area of
research (Webster & Watson, 2002).
How and which literature fits with the research topic and vice versa
Based on the concept-centric approach, to conduct a useful literature review, the
researcher must continuously validate how the following literature is related to his area
of research. The answer to this question will keep the researcher on track to related
literature. Furthermore, the researcher should focus on the literature, which corroborates
with the problem related to the research area on which the researcher is working. It will
Literature Review 34
give the solid ground to the researcher to perform research on that problem (Levy & Ellis,
2006). Not only should the researcher try to make the literature fit into his study but also
their study should fit into the literature body of knowledge.
3.1.2 Stages of Literature Review
The simplest version of conducting a literature review can be explained in three steps, 1)
Input, 2) Processing, and 3) Output. In this, the main and the most critical is the
processing. It involves performing multiple activities sequentially, including collect,
comprehend, analyze and synthesize. The output of the literature review should contribute
something new to the body of knowledge. Strictly following the three steps guarantees
the effective literature review for a beginner researcher (Levy & Ellis, 2006).
Figure 6 Stages of Literature (Own image following (Levy & Ellis, 2006))
3.1.3 Structure of the Literature Review
The structure of the literature review consists of three main parts
1. Introduction
2. Body
3. Conclusion
Introduction
The introduction describes the topic with some elaboration about its significance and a
statement about the conclusion that the researcher will extract after analysis and
synthesizing the review. Moreover, in this, the researcher also discusses the importance
of research related to the research question (Skene, n.d.).
Literature Review 35
Body
In this section, the researcher examines the past research with a focus on agreements,
arguments, and methodologies. With the focus on the area of interest, the researcher will
find the gaps and will state how his work will respond to it (Andersson, Beveridge, &
Singh, 2015). Each paragraph in the body should deal with the separate theme of the
research topic. Several reviews are synthesized in each paragraph to make a clear
connection between different sources. Each source must be critically analyzed to
contribute to filling the research gap (Willans, n.d.).
Conclusion
A conclusion provides the overall summary and findings of the literature review
(Andersson et al., 2015). It explains in detail the overall literature and which literature
leads to conclusive findings. It provides the answer to the research questions and gives
suggestions about future work in this direction (Willans, n.d.).
3.2 Selected Research and Literature Review Methodologies
3.2.1 A Short Guide to Action Research (Johnson, 2012)
Action research is proposed by Kurt Lewin in 1940 in the United States. Using it changed
the way the researcher interacts with the research setting. The name “action research”
represents its working, where the work is not separated from investigation to conclude
and propose the actions to solve the problem. According to (Johnson, 2012), action
research deals with a variety of investigative and analytic research to identify problems
and weakness. It involves studying a real school situation to improve the quality of direct
practices. It is a systematic and sequential way to observe the practices applied, their
problems and recommend possible treatment to solve that particular problem.
The action research consists of five essential steps
1) Identify a problem or topic area to perform research to determine the course of
studies.
2) Decide which data should be collected, identify data sources and decide the
frequency of data collection.
3) Collect and analyze data from sources identified in the previous step.
4) Describe findings and how they could be applied, create a plan to apply a course
of action based on the finding from the analysis step.
Literature Review 36
5) Share findings and plan of action with others
Action research does not progress in a linear sequential way always. Based on the
situation, some or several processes need to repeat to get to the required results.
Although there is much freedom to collect, analyze and present the findings, action
research is still a very systematic approach. It contributes to finding the answer to
questions which are unknown. It is not a very complicated method and is considered as
very problem-focused, precise and accurate methodology. To adequately apply this
methodology, an in-detail study must be planned before starting to collect data.
The length and time duration for every action research varies. It depends on how big the
problem is and the time needed for data collection and analysis. Observations should be
at regular intervals independent of its duration. By performing a detailed literature review
and relating the conclusions to it delivers a contextual understanding of research.
Figure 7 Steps for Action Research Process (Own image following (Johnson, 2012))
Literature Review 37
3.2.2 Guidelines for performing Systematic Literature Reviews in
Software Engineering (Kitchenham & Charters, 2007) and
Procedures for Performing Systematic Reviews (Kitchenham,
2004)
What is Systematic Literature review?
According to (Kitchenham, 2004), a systematic literature review is a systematic way of
identifying and evaluating the already performed research related to a specific topic or
field of study.
A thorough and fair literature review performed systematically. It is considered to be of
high scientific value, as the later research depends on the outcomes from this. It gives the
basis for carrying out systematic reviews. Considerably taking more time and effort than
a traditional review, the systematic review provides thorough information regarding the
effects of certain phenomena’s against research setting and empirical methods. If the
outcome is consistent, this is the indication that the phenomenon is robust. In other cases,
the reason and source of variation could be investigated.
A rigorously performed literature review in a systematic way would assist the research
process in multiple ways
Help to summarize and synthesize the existing research and evidence concerning
to conduct and outcomes
Identify research gaps identify and suggest the area for further investigation
Helps to avoid duplication of work
Assist in designing framework shape and the future research investigation plans
Review Process
A systematic review consists of multiple activities. The number and order of activities
performed differ according to different guidelines. In the guidelines presented by
(Kitchenham, 2004), the author has divided the review process in three-phase, with each
phase consisting of multiple different activities.
1. Planning the review
2. Conducting the review
3. Reporting the review
Literature Review 38
1. Planning the review
The need for systematic review
Before beginning to perform a systematic literature review, the researcher must ensure
whether it is needed. The need for systematic review arises when researchers want to
summarize all the phenomenon related to the research topic in an unbiased way.
Specifying Research Questions
In the planning phase, the most crucial step is to define the research question. The
researchers must have a clear and concise research question.
Structure of Research Question
Before performing SLR against the desired topic, the research question must be complete
to follow PICOC method.
Population, identified as to who? Population refers to the target group in the area
of investigation
Intervention, defined as what or how? Intervention address the specific issue or
area of research to perform an investigation
Comparison, identified as compared to what? Comparison specifies what the
investigation will be compared to.
Outcomes identified as and specify what the required outcomes researcher want
to achieve are
Context represents the settings of circumstances and environment of investigation
Development of a Review Protocol
A review protocol states the underlying method which would be used to perform the
systematic review. A structured and well-defined protocol will make sure the outcomes
are not biased.
2. Conducting the Review
Identification of Research
The primary purpose of the systematic review is to find as many as possible related
literature specific area of studies and research question. Generate search strategy to look
for relevant literature in the database. Define keywords and generate queries to retrieve
documents. Search strategies are mostly iterative.
Literature Review 39
Study Selection
After the relevant studies have been searched and collected, they will be assessed based
on their relevance. A selection criterion is defined and the literature which meets that
criteria are selected.
Literature Review 40
Figure 8 Steps to perform Systematic Literature Review (Own image following (Okoli &
Schabram, 2010))
Literature Review 41
Study Quality Assessment
After the literature is selected based on inclusion and exclusion criteria. The quality of
the literature is assessed. The quality represents how much the research is bias and has
maximum validation.
Data Extraction
Data extraction is performed by doing full-text studies of the selected literature and then
record the information in the form.
Data Synthesis
The results from the primary studies are collected and summarized at this stage. Synthesis
could be descriptive occasionally supplemented by quantitative summary. Using
statistical techniques to achieve quantitative synthesis is called meta-analysis. The
synthesis activities must be defined in the review protocol.
3. Reporting the Review
Effectively communicating the result from the review process is very important. To make
sure that the later researcher could evaluate the finding and process study performed. The
final report should contain all the details.
3.2.3 Scoping studies: Towards a methodological framework (Arksey
& O’Malley, 2005)
There are several terms which have been used to describe the scoping review in literature
including scoping study, systematic scoping review, scoping project, scoping report,
evidence mapping and many more. In the available literature, it has also been defined by
many authors in different ways.
According to the definition discussed in the paper by (Arksey & O’Malley, 2005), the
scoping review purpose is to quickly record the critical concepts from literature which
supports the research area with the evidence sources available. When targeting a complex
area which has not been rigorously reviewed, it could also be considered as a standalone
project.
As proposed in definition, scoping review involves a complete review of the literature.
The extent of detailed analysis of extracted literature always depends on the research
question and study purpose. There could be multiple reasons to conduct a scoping study,
the four most common of them discussed below.
Literature Review 42
1) It is performed to study the variety and nature of research activity. In this type of
research, findings are not discussed in details.
2) Sometimes mapping is performed to assess the feasibility of conducting
systematic review relevant to the research area.
3) To describes detailed findings in the area of research study, and provide a
mechanism to summarize those findings which the policymakers and consumers
can quickly utilize.
4) Scoping review identifies research gaps by summarising the literature.
According to the methodological framework developed and adopted by (Arksey &
O’Malley, 2005), consists of several stages before giving a conclusion or findings. There
are a total of five stages
Stage 1 Identifying the research question
Stage 2 Identifying relevant studies
Stage 3 Study selection
Stage 4 Charting the data
Stage 5 Organising, summarizing and reporting the results.
Comparison of Systematic Literature Review and Scoping Review
Label Description Search Appraisal Synthesis Analysis
Scoping Review Initial assessment of
potentially available
research literature
according to the
scope of the study.
The goal of this
review is to
recognise the extent
of possible research
evidence.
Search
depends on
time and other
constraints.
No formal
quality
check
Mostly in
tabular
form with
some
narrative
comments.
Describe the
quality as well as
quantity of
literature by critical
features. Tries to
specify a practical
review.
Literature Review 43
3.3 Chosen Methodology
Systematic reviews are a type of research synthesis which is conducted to categorise and
retrieve findings which is relevant to a specific research topic. In the following thesis,
Systematic literature review is used as a research methodology to answer the research
question.
As discussed in detail above SLR is suitable in the following situations to
1. There is a need to find standardized international evidence
2. To Check the current practice, variations to that practices or to classify new
practices of the research area
3. There is a need to search, recognize and report the areas which have the
potential for future research
4. Need to Inspect the conflicting results
5. Produce the statement to use as decision-making output
According to the following indication, SLR is best suited to our requirements (Munn et
al., 2018). The SLR identifies and interprets already available research to answer the
research questions by summarizing the findings and perform synthesis on those findings
(Kitchenham & Charters, 2007).
The main steps during the research performed namely planning, conducting and reporting
the review by following the guidelines proposed by (Kitchenham & Charters, 2007)
Systematic
Literature
Review
In SLR, it tries to
search, appraise and
synthesize research
evidence
systematically by
following the
provided principles
and guidelines.
Perform an
exhaustive and
comprehensive
search.
The
inclusion
and
exclusion
criteria are
determined
through
quality
checks.
The
synthesis
is
narrative
with
tabular
form.
Describe what is
already known, the
recommendations,
unknown
phenomena’s and
uncertainty about
findings, and future
research
recommendations
Table 5 Comparison of Systematic Reviews with Scoping Reviews (Own table following (Grant & Booth, 2009)
Systematic Literature Review 44
4 Systematic Literature Review
4.1 Planning the Review
In planning, research questions are proposed, a search strategy is defined along with
search terms and search query, inclusion and exclusion criteria are defined. Following
steps are explained in detail
4.1.1 Review Objectives
With already growing usage of agile methodologies in software development, data-driven
organization have also put a focus on employing agile methodologies for analytics
application development. In analytics, software development is not the whole process but
a part of the analytics process lifecycle. Timely information is more critical than working
functionality. Employing traditional methodologies have not reaped the same result as
agile in software development. With successfully implementing the agile methods in
software development, these methodologies could also be advantageous for delivering
successful analytics projects (Nagle & Sammon, 2013). There have been several
contributions to the research related to agile adoption in analytics application, but there
is still no agreement and unobstructed views on the research topic. The main objective
for doing this research is to have an in-depth understanding of agile practices in the data-
driven environment and the challenges faced during the agile adoption. The agile best
practices, enablers and success factors will also be discussed.
4.1.2 Research Questions
RQ1: How have agile methodologies evolved with the emergence of data science?
RQ2: Which agile frameworks, methodologies and best practices are well known for
data-intensive applications and how they could be applied?
RQ3: What are the enablers and success factors for Agile Analytics?
RQ4: What are the challenges organizations faced for adopting agile methodologies for
analytics applications?
4.1.3 Search Strategy
After proposing the objective for underlying research study and research question, a
search strategy is defined to retrieve and analyze the already available literature specific
Systematic Literature Review 45
to the research area. It involves determining the search sources which are mostly
electronic databases. The studies were searched and retrieved from specified databases.
The references to studies are also analyzed to discover other but related literature. After
that, inclusion and exclusion criteria’s are applied to the documents which were retrieved
through search from the electronic database and reference search.
4.1.4 Search Criteria
The criteria defined to retrieve the literature from an electronic database consists of two
parts. The first part of the search string includes words related to agile and the second part
related to data analytics.
Database Resources
Database URL
ACM Digital library https://dl.acm.org/
IEEE Xplore https://ieeexplore.ieee.org
SpringerLink https://link.springer.com/
ScienceDirect https://www.sciencedirect.com/
Google Scholar https://scholar.google.com/
Table 6 Database resources for literature
Example Database search query
Agile AND (“business intelligence” OR “data analytics” OR “advanced analytics” OR
“big data analytics” OR “BI” OR “data science” OR “data-driven” OR “analytics”)
Keeping this search string as an underlying theme. For each database, the search query is
generated according to the specification provided with that database.
In a standalone scenario, the following query is also applied.
“Agile Analytics”
There are also some queries used in combination with individual keywords from agile
and data analytics. In the actual search queries will be modified according to digital
database search requirements.
Agile Approaches AND (“business intelligence” OR “data analytics” OR “advanced
analytics” OR “big data analytics” OR “BI” OR “data science” OR “data-driven” OR
“analytics”)
Systematic Literature Review 46
Agile methodology AND (“business intelligence” OR “data analytics” OR “advanced
analytics” OR “big data analytics” OR “BI” OR “data science” OR “data-driven” OR
“analytics”)
Agile project management AND (“business intelligence” OR “data analytics” OR
“advanced analytics” OR “big data analytics” OR “BI” OR “data science” OR “data-
driven” OR “analytics”)
Inclusion and Exclusion
From all the literature retrieved through a search query, the most relevant research which
satisfies the inclusion criteria is selected.
At this stage, inclusion and exclusion criteria’s are applied to the selected literature
iteratively.
Inclusion
The study and the research performed are in English.
The study is conducted between 2002 to 2019.
The paper should describe the agile method and practices applied in any of
development phase of an analytics application.
The literature gives insight into the adoption of agile in a data-driven environment.
The study must be related to the research topic.
Full text of the selected literature must be available.
Exclusion
Literature which is not directly focusing on the agile methodologies and practices
studies that do not discuss agile methods in context to a data-driven environment.
PowerPoint presentations or slides, keynotes or tutorials.
Agile methods applied in manufacturing or other industrial fields which do not
involve data-oriented or analytics application.
Duplicate paper from different databases.
Research literature other than the English language.
4.2 Conducting the Review
In this phase, the findings and documents retrieved from the search and retrieval process
will be discussed in detail
Systematic Literature Review 47
4.2.1 Search the Literature
The main goal of a systematic literature review is to find the possible number of literature
related to the research area. To fulfil this requirement database search is performed
according to search strategy using keywords and search strings. At first, preliminary
searches were performed to examine the initial search results and fine-tune the query
accordingly. Top results were compared with the most relevant papers which were already
downloaded for testing. Searches were performed using the most common query string
consisting of a single word or sentence. The search results were highly dependent on the
keywords or search string used and were easily influenced by the minor change. Search
is also performed to filter based on the titles. Though results showed that search string
consisting of multiple keywords has some more precision than single keywords.
Figure 9 Iterative search cycle (Own image following (Jacobs, 2015)
After that, using the search strategy defined in the last step, the databases were searched,
and related literature was retrieved. In the first step, there were vast numbers of results
returned by each digital database, especially google scholar, which returned total in
thousands. At google scholar, there were some duplicate references to other digital
databases. From the original search from all databases, in total, a few hundred documents
were retrieved.
Systematic Literature Review 48
4.2.2 Selection of Literature
After that inspection is performed based on inclusion and exclusion criteria. The selection
and inspection process is iterative, in round one by extensive study of titles and abstract,
documents which satisfy these criteria were selected. The search on all databases returned
hundreds of papers in total. After this stage full-text filtering is performed. There were
also vast numbers of irrelevant documents which were discarded. After the first round,
we still had a considerable amount of papers left. We also have to apply exclusion criteria.
After carefully applying the exclusion criteria, we left with a total 196 of documents.
Doing the 3rd round of review, the final documents count is 47.
By referencing the base search strings and keywords as defined in the last steps, these
strings were modified for each database. The search strings mentioned in Table 7 was not
used precisely in the current form but modified and refined based on the search results.
Like in SpringerLink, we explicitly filtered the result based on the word “agile” in the
title. Similar modifications are also performed on other databases. Results from google
scholar are not mentioned in the table as it returns a massive number of documents which
are duplicate to already retrieved documents from different databases. In most cases, it is
referring to the other database documents which we had already extracted. However,
some papers which are retrieved from google scholar and added were checked through
full-text search along with inclusion and exclusion criteria in the final round.
Search String ACM IEEE Springerlink Science direct
Total Selected Total Selected Total Selected Total Selected
Agile AND (“business intelligence” OR “data
analytics” OR “advanced analytics” OR “big
data analytics” OR “BI” OR “data science” OR
“data-driven” OR “analytics”)
12 9 690 128 95 50 96 15
"Agile Analytics"
5 2 4 3 14 7 9 6
agile AND ("data analytics")
5 1 61 12 1 1 686 96
agile AND ("business intelligence")
3 2 30 9 35 17 11 4
agile AND ("advanced analytics")
2 2 6 2 5 1 136 32
Systematic Literature Review 49
agile AND ("big data analytics")
4 2 21 7 31 7 2 2
agile AND ("data science")
6 4 19 9 11 10 1 1
agile AND (“data-driven”)
22 7 34 11 40 26 76 11
Table 7 Query results after search and initial filters
Snowballing
Snowballing is a data-gathering methodology used in the SLR. The method utilizes the
reference leads to get to more useful data. The reference list of literature is searched and
followed to discover other useful but related sources (Wohlin, 2014). In the following
research snowballing demonstrated good result to find content related to Agile Analytics.
There was comparatively less literature retrieved through standard search because of
Agile Analytics being a very new concept.
Figure 10 Snowballing step (Own image following (Wohlin, 2014))
Systematic Literature Review 50
Papers gathered after the search and applying filtration were evaluated, and references
were followed. Forward snowballing was applied to identify referenced paper from
already selected papers. This method of gathering the data is suitable for relatively new
topics. Agile analytics is a new trend, and not much research was available on this topic.
Also, in industry, it has not yet been implemented widely. With the help of snowballing,
it was possible to get to the most relevant literature.
4.2.3 Literature Quality Assessment
Quality assessment criteria are applied to assess the methodological quality of the selected
literature. It consists of a list of questions. The answers to these specific questions provide
the measure to the extent to which the selected literature will contribute to the area of
research.
The quality assessment following the criteria was applied for the reason that it will help
to investigate the effectiveness of synthesis findings. This type of quality measures
criteria’s has been applied in systematic literature review since long (Dybå & Dingsøyr,
2008).
Each paper was evaluated according to quality assessment questions defined. Against
every question, there is a rating scale. Scale range from 0 to 1 with research accurately
answering the question rates 1, not answering at all rate 0 and partially answers were rated
0.5. The questions for quality assessments were
C1. Does the explicit context of the research defined in the paper?
C2. Do the objectives of selected literature are well defined?
C3. Does the research findings are ambiguous and are not clearly stated?
C4. How valuable is the area of study based on the research findings?
Systematic Literature Review 51
Figure 11 The selection and filtration stages of primary studies (Own image following (Ahmad
et al., 2013))
The quality assessment criteria were applied on all 47 papers, which were finally selected
after applying inclusion and exclusion criteria and rounds of filtering. As the area of
investigation related to agile analytics is entirely new, there are not many studies
available. Therefore after applying inclusion and exclusion criteria’s, moderate quality
assessment was performed with a less strict criteria’s. Criteria’s are clearly defined and
applied in context to the application of agile methodologies in the analytics environment.
After performing the quality assessment, there was a total of 17 papers left.
Sr.No Title of Article/Journal Authors Publication
Year
P1 Agile big data analytics:
AnalyticsOps for data science
Nancy W. Grady
Jason A. Payne
Huntley Parker
2017
P2 Comparing Data Science Project
Management Methodologies via a
Controlled Experiment
Jeffrey Saltz
Ivan Shamshurin
Kevin Crowston
2017
P3 Agile Project Management
Approach and its Use in Big Data
Management
Patrícia
Franková
Martina
Drahošová
Peter Balco
2016
Systematic Literature Review 52
P4 Agile big data analytics
development: An architecture-
centric approach
Hong Mei Chen,
Rick Kazman,
Serge Haziyev
2016
P5 A Review and Future Direction of
Agile, Business Intelligence,
Analytics and Data Science
Victor Chang,
Deanne Larson
2016
P6 The Goal Questions Metrics for
Agile Business Intelligence
Hadeer Sanaa,
Walaa Afifi,
Nagy Darwish
2016
P7 Big data analytics using agile model Surend Raj
Dharmapal,
K. Thirunadana
Sikamani
2016
P8 Model for Assessment of Agile
Methodology for Implementing Data
Warehouse Projects
Kuldeep
Deshpande,
Bhimappa Desai
2015
P9 How to make business intelligence
agile: The agile BI actions catalogue
Robert
Krawatzeck,
Barbara Dinter,
Duc Anh Pham
Thi
2015
P10 Agile Business Intelligence:
Collection and Classification of
Agile Business Intelligence Actions
by Means of a Catalog and a
Selection Guide
Robert
Krawatzeck,
Barbara Dinter
2015
P11 An Architecture for Agile Machine
Learning in Real-Time Applications
Johann Schleier-
Smith
2015
P12 Agile BI : The Future of BI Mihaela
MUNTEAN,
Traian SURCEL
2013
Systematic Literature Review 53
P13 A Classification for Business
Intelligence Agility Indicators
Henning Baars,
Michael Zimmer
2013
P14 Business Intelligence and Analytics Hsinchun Chen,
Roger H.
L.Chiang,
Veda C Storey
2012
P15 Acceptance of agile methodologies:
A critical review and conceptual
framework
Frank K.Y Chan,
James Y.L.Thong
2009
P16 Future research challenges in
business agility - Time, control and
information systems
Markus
Strohmaier,
Herwig Rollett
2005
P17 Beyond Flexible Information
Systems : Why Business Agility
Matters
Markus
Strohmaier,
Stefanie N
Lindstaedt
2005
Table 8 Selected Journals and Conference Papers
Systematic Literature Review 54
4.2.4 Data Extraction and Synthesis
The data analysis is performed before extracting data. There are two methods of analysis
discussed here.
1. Literature Spread Sheet
Referring to the guidelines proposed by (Kitchenham & Charters, 2007), after the
selection of literature, the analysis was performed to identify and extract relevant
information. In this process, an excel sheet was set up to record the core ideas and
concepts, methodologies and findings the selected literature. Summarizing in the findings
in the excel table provides a high level of understanding and clarification.
Following are the example data that could be extracted in addition to some other relevant
details
Title of the paper
Authors
Review date
Relevance to the research area
Methodology applied
Data Analysis
Validation techniques
Limitations
Future work
Publication year
After the following data extracted from the papers, content analysis was performed to
describe the nature of each study.
2. Annotated Bibliography
An annotated bibliographies are the list of summaries of the selected paper. The primary
purpose of the annotated bibliography is to get a summary and evaluation of the article.
The summary of the paper discussed the concepts, methodologies used, relevance to
research, limitations and other important concepts related to our research topic.
Beside annotation, the bibliographic information of the article includes
Article title
Systematic Literature Review 55
Author name(s)
Publication year
Publication title (journal, report, book etc.)
There are multiple styles available to write bibliographic information. In this research,
the APA style is adopted. Annotated Bibliography of all selected literature is performed
and explained.
Chosen style
The Literature Spread Sheet style is a way of data analysis, which is more suitable when
more than one researchers are working on the same topic, and there is a need to compare
the results. In this case, thesis research is performed by one person. Hence Annotated
Bibliography was adopted for data analysis.
Data Synthesis method
There are multiple methods for performing data synthesis, both qualitative and
quantitative. In this research, the descriptive qualitative synthesis method is applied.
Among several different methods available, two methods which are most suitable to this
research type were discussed briefly. In the end, one method was chosen and applied to
the collected data. The following two methods are discussed below
1. Meta-Ethnography
2. Grounded Theory
Meta-Ethnography
The primary purpose of SLR is to form the synthesis based on the findings. Synthesis is
considered as a vital stage of SLR. Meta-Ethnography is an approach in which the main
purpose is to perform synthesis by incorporating identified and selected concepts from
multiple studies into the high-level theoretical structure. This approach is contrary to data
aggregating quantitative approach like a meta-analysis. It has been applied vastly in health
science, but there are very fewer approaches where it has been applied in software
engineering or computing (B Silva et al., 2013). There are a total of seven steps of
performing the synthesis of collected data
1. Getting started
2. Deciding the relevant literature
3. Reading the studies
Systematic Literature Review 56
4. Determining the relations between studies
5. Translating the studies into one another
6. Synthesising the translation
7. Expressing the synthesis
As there are multiple steps which have already been performed during the standard SLR
process. The synthesis after the analysis is performed following these iterative steps.
Grounded Theory
Grounded Theory is a method of synthesis. The primary assumptions and features of the
grounded theory include (Barnett-Page & Thomas, n.d.)
It provides concurrent phases of data collection along with data analysis
It is an inductive analysis approach
The theory is developed based on the data
The constant comparison method is utilized
Use of theoretical sampling
Generate the new theory
It is known as one of the best methods to produce theories. Grounded theory is well known
for its ability to discover theory about the phenomenon with less explanation available
about that phenomena. Grounded theory is sometimes also defined as the systematic way
of discovering the theory from obtained data. It is ideally useful when to uncover social
processes. The researchers employed grounded theory to find patterns and interactions
among different social units (Chun Tie, Birks, & Francis, 2019).
Systematic Literature Review 57
Figure 12 Grounded Theory Process (Own image following(Chun Tie et al., 2019))
Characteristics Grounded Theory Meta-Ethnography
Focus Developing the theory based
on data
Describe a culture-sharing
group
Best suited for Best suited for understanding
the interaction in social
phenomena
Described shared pattern
of a social group
Discipline
Background
Sociology Anthropology and
sociology
Unit of Analysis Process action and interaction
of individuals
Culture sharing groups
Data Collection
Forms
Interviews with individuals
In the qualitative literature
review
Observation, documents
and artefacts
Systematic Literature Review 58
Data Analysis
Strategies
Using coding Theme and description of
the case and cross-case
studies
Witten report Generate theory The detailed analysis
consists of cases
Table 9 Grounded theory and Meta-Ethnography difference (Own table following (Khan, 2014)
Chosen Method
By analysing the characteristic of both methodologies and suitability for the following
research topic. The Grounded Theory method was chosen for data synthesis.
Concept Matrix
Concept Matrix is an effective technique to organize the research after writing annotated
bibliographies and summarizing the concepts of all the papers. It is a technique employed
by the researchers to present the association between the available research articles and
the research topic. Each column in the matrix represents the concept intersection between
different articles for a broader topic(Ochoa, n.d.).
Sources Concepts
Concept 1 Concept 2 Concept 3 Concept 4
Source 1 ✖ ✖ ✖
Source 2
✖ ✖ ✖
Source 3 ✖ ✖
…. … . . . . .
✖ ✖
Table 10 Concept Matrix (Own image following (Müller-Bloch & Kranz, 2015))
Concept matrix supports in recognizing opportunities for synthesis. It describes the
precise depiction of the research topic by analysing the overlapping concepts. Synthesis
of all the article occurs when the bigger picture is displayed after answering the related
questions.
What are the aspects the authors agreed related to the research topic?
What are the disagreements?
What are the concepts which are discussed in specific articles but not in others?
How has the research aspect been changed between different articles?
Systematic Literature Review 59
What are the questions raised between articles?
By answer the following questions and some others, concepts are organized for synthesis
purpose. Construction of the matrix also depends on personal creativity, originality and
proficiency of the researcher (Klopper, Lubbe, & Rugbeer, 2007). Before the synthesis
process, the concepts must be grouped and presented logically. Concept matrix is
considered as an effective way of communicating findings and insights.
Concepts
Title
A
gil
e
A
gil
e B
I /
Anal
yti
cs
Big
Dat
a /D
ata
war
ehouse
Agil
e B
I /
Anal
yti
cs F
ram
ework
s
Agil
e A
nal
yti
cs
Agil
e M
ethodolo
gy
Chal
lenges
of
Agil
e B
I/A
nal
yti
cs
Lim
itat
ions
of
Agil
e B
I/ A
nal
yti
cs
Agil
e A
nal
yti
cs E
nab
lers
or
Succ
ess
Fac
tor
Pra
ctic
es
Dif
fere
nce
to o
ther
agil
e m
ethods
Agile big data analytics:
AnalyticsOps for data science
✖
✖
✖
✖
✖
✖
✖
✖
Comparing Data Science
Project Management
Methodologies via a
Controlled Experiment
✖
✖
✖
✖
✖
Agile Project Management
Approach and its Use in Big
Data Management
✖
✖
✖
✖
✖
✖
✖
Agile big data analytics
development: An
architecture-centric approach
✖
✖
✖
✖
✖
✖
✖
✖
✖
A Review and Future
Direction of Agile, Business
Intelligence, Analytics and
Data Science
✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖
Systematic Literature Review 60
The Goal Questions Metrics
for Agile Business
Intelligence
✖
✖
✖
✖
✖
✖
Big data analytics using agile
model
✖ ✖ ✖ ✖ ✖ ✖ ✖
Model for Assessment of
Agile Methodology for
Implementing Data
Warehouse Projects
✖ ✖ ✖ ✖ ✖ ✖ ✖
How to make business
intelligence agile: The agile
BI actions catalogue
✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖
Agile Business Intelligence:
Collection and Classification
of Agile Business Intelligence
Actions by Means of a
Catalog and a Selection
Guide
✖ ✖ ✖ ✖ ✖ ✖ ✖ ✖
An Architecture for Agile
Machine Learning in Real-
Time Applications
✖ ✖ ✖ ✖ ✖ ✖
Agile BI : The Future of BI
✖
✖
✖
✖
✖
✖
✖
✖
✖
A Classification for Business
Intelligence Agility Indicators
✖
✖
✖
✖
✖
✖
✖
✖
Business Intelligence and
Analytics
✖
✖
✖
✖
✖
✖
✖
✖
✖
Acceptance of agile
methodologies: A critical
review and conceptual
framework
✖
✖
✖
✖
✖
✖
Future research challenges in
business agility - Time,
✖
✖
✖
✖
✖
Systematic Literature Review 61
control and information
systems
Beyond Flexible Information
Systems : Why Business
Agility Matters
✖
✖
✖
✖
✖
Table 11 Concept Matrix
4.2.5 Present the Review
In the last phase, results from the literature review analysis are presented. The papers
which were searched and collected from multiple electronic databases will be used to
present the findings. The data was collected from the articles and papers using reading all
text and writing short Annotated Bibliographies and later was synthesised by the data
synthesis approach. After that, based on the finding, results are presented.
From 17 selected papers related to different aspects of the area of research, each paper
was analyzed based on context, research question and empirical findings. The studies
were conducted using multiple different methods.
Year-wise Selected Papers
From selected papers, the bar chart is displayed with the number of paper selected yearly
in Figure 13. The image shows the number of papers published in a year from 2005-2017.
Most of the papers, a total of 6 are from the year 2016. As the area of study is quite new,
much of the work has been performed in recent years.
Systematic Literature Review 62
Figure 13 Year-wise selected publications (Own Image)
Research Methodology
There were multiple methodologies used in different research papers. Most of the selected
paper has a different type of literature review as a research methodology. Total 58.8% of
content have utilized this methodology. Detail of each research paper against the article
is displayed below in the table
Case Study Number of papers Percentage
Experiment 3 17.6 %
Survey 2 11.8 %
Case study 1 5.9 %
Literature Review 10 58.8 %
Others 1 5.9 %
Systematic Literature Review 63
Figure 14 Percentage of methodologies applied in selected papers (Own Image)
Type of Research Articles
During the search and selection process, a considerable number of articles were selected
and sorted, and finally, 17 articles were selected. Out of the following selected papers,
total 8 were Journal papers and 9 were conference papers.
Type of Article Total number Percentage
Journal 8 47.1 %
Conference 9 52.9 %
Table 12 Type of Research papers
Systematic Literature Review 64
Figure 15 Research papers type (Own Image)
After collecting and summarizing all the papers. The following findings are extracted from
each paper
Paper Title
Summary/Excerpts/Findings
Research
Approach
Type of
Article
Agile big data
analytics: AnalyticsOps
for data science
The author discussed the challenges faced during big data
analytics lifecycle, especially with the introduction of
advanced analytics. Traditional project management
techniques cannot be applied in the same fashion.
Machine learning and data science concepts need to be
adopted with an agile approach to an analytics solution.
The tailored agile approach should be adopted
specifically to the project. The author proposed a new
Data Analytics process model called Data Science Edge,
which can adapt to new tools and technologies. It serves
as a complete process model for knowledge discovery.
By adopting agile models, the outcome would be quicker.
The DSE proposed a conceptual framework which adapts
to changes required in advanced analytics.
Literature
Review
Conference
Systematic Literature Review 65
Comparing Data
Science Project
Management
Methodologies via a
Controlled Experiment
The author has compared multiple agile methodologies to
measure and compare their performance on data science
projects. There is not much information available to
execute data science projects using agile methodologies.
There is a need for well-defined process model
specifically designed for data science projects. Different
agile methodologies were tested for data science projects.
According to experimentation result, Kanban turned out
to be the most accepted methodology concerning factors
like ease of use, project results, the satisfaction of
individual team members and the willingness of the
teams to work on future projects.
Experiment Conference
Agile Project
Management Approach
and its Use in Big Data
Management
The author compares different views from a management
perspective to find the agile approach. The argument is
about whether there must be a standard approach for all
projects, or there must be a tailored approach for each
different project. Big Data projects are the main focus
point. The principles of agile manifesto are recommended
for big data management. The author concluded that agile
methodologies could be applied for smoother execution
and delivery of big data projects. The agile approach is
suitable for Big Data Management (BDM), considering
the tools and technologies factors.
Survey Journal
Agile big data analytics
development: An
architecture-centric
approach
A methodology specific to Big Data Analytics is
recommended. The methodology is known as
Architecture-Centric Agile Big Data Analytics (AABA).
Big data analytics is crucial to increase business
efficiency. The author states the architectural challenges
faced during the big data analytics process. Success is
dependent on the performance and robustness of the
underlying architecture. The other challenges are of
adopting an agile methodology for the teams belongs to
Case Study Journal
Systematic Literature Review 66
different areas of expertise. Architecture centric approach
by following the agile methodology is recommended for
successful implementation. AABA methodology
explained in detail. AABA being more architecture
focused is slightly different from Agile Analytics.
A Review and Future
Direction of Agile,
Business Intelligence,
Analytics and Data
Science
With the introduction of advanced analytics like data
science, BI projects have changed the way of applying
agile methodologies. Agile methodologies must be
altered and tailored according to the changing
requirements. The author summarized the changes
required to incorporate new tools and technologies in the
existing working environment. The author compares and
proposed the new development lifecycle for advanced
analytics projects. A framework for BI delivery is also
presented by including the new advanced and fast
analytics processes.
Literature
Review
Journal
The Goal Questions
Metrics for Agile
Business Intelligence
This paper has discussed the agile business intelligence,
agile BI framework and data warehouse. Explained the
challenges and discussed the suitability of GQM for
performance measurement of the agile BI teams. The
GQM can guide the teams to achieve their desired goals.
The questions are defined against the defined metrics.
The answers to the questions provided the measurements
against the team’s performance.
Experiment Journal
Big data analytics using
agile model
The author provides guidelines about how agile
methodologies could be applied to Big Data projects. The
author explains each phase and step of analytics delivery
and explains how agile methodologies could be adopted
to each step. The author proposed three phases planning
phase, development phase and closure phase. The author
explains different roles in the agile analytics process.
Literature
Review
Conference
Systematic Literature Review 67
Model for Assessment
of Agile Methodology
for Implementing Data
Warehouse Projects
The paper provides guidelines for assessing the suitable
agile methodology according to the project’s needs.
Failure in data warehouse projects is due to using the
traditional waterfall model. In some cases, agile
methodologies have also not able to overcome failures,
primarily because each project needs a tempered
approach. Analytical Hierarchy Process (AHP) can be
used to analyze factors which cause for selecting agile for
data warehouse project. The author explained step by step
approach to AHP. The steps include deciding for goals,
comparison, comparison result and priority matric,
accuracy check, and generalized model. The factors
which cause selection of agile method are requirement
status, organization culture and process, team
effectiveness, data environment infrastructure.
Survey Journal
How to make business
intelligence agile: The
agile BI actions
catalogue
The paper discussed the competitive requirement in the
business context, which is volatile and changing quickly.
The author discussed how agile methodologies could be
utilized to react quickly to the changing requirement in
the BI system. A structured approach to react to change
is missing in agile BI current approach. BI action
catalogue is recommended, which consists of a set of
action and steps needed to adopt agility in BI. The author
divides the actions into four categories, which are
Principles, process models, techniques, and technologies.
Total of 21 actions is recommended.
Literature
Review
Conference
Agile Business
Intelligence: Collection
and Classification of
Agile Business
Intelligence Actions by
List of actions is summarized based on the categories
defined to increase the BI agility. Selection guide consists
of a series of action that is recommended by the author.
Schema is defined for different levels of selection guide.
Thirty-one actions are recommended. Literature review
Literature
Review
Journal
Systematic Literature Review 68
Means of a Catalog and
a Selection Guide
methodology is applied to analyze and synthesize the
collected data and summarize the results.
An Architecture for
Agile Machine
Learning in Real-Time
Applications
The use of agile for a machine learning project is entirely
new. The author discussed the ways how machine
learning for advanced analytics projects could embrace
agile methodologies for smoother development and
delivery. With the introduction of advanced techniques in
analytics, there is a need for a new sophisticated process
model which could be tailored according to project needs.
Agile methodologies are applied in software
development. The architecture, along with agile
methodology approach, is recommended. The benefits of
the recommended approach are improved collaboration,
natural real-time processing and quick iterations.
Concluding the article author asserts that agile
methodologies could be used for implementing machine
learning projects in data science teams.
Experiment Conference
Agile BI : The Future
of BI
The author tried to answer several questions related to
agile BI. What is agile and how it is crucial for BI? What
are the factors and essential features which support agile
BI? The main components and layers of traditional BI
architecture are discussed. The traditional approach is
compared with an agile approach. Due to a massive set of
problems with this approach, the agile approach is
recommended with the benefits explained for each step.
There are three main components of agile BI, which are
agile development, agile business analytics, and agile
information infrastructure. Agile BI is useful for
changing user requirements, and to effectively handle the
failures of IT system to meet the business needs, and fast
delivery to market. The key elements to promote agile BI
is recommended.
Literature
Review
Conference
Systematic Literature Review 69
A Classification for
Business Intelligence
Agility Indicators
Meaning of agile is different for BI systems. BI
functionality is a critical decision-support tool for top
management and decision-makers. Agile is considered to
be a practical methodology for BI development. There is
not much work has been done until now on the following
topic. The case study method is used with interviews
from 14 experts. Industries with different areas are
chosen for analysing how they are applying agile
methodologies to their BI projects. BI architecture and
agility of the target companies are studied. The agility of
enterprise should not be measured with BI agility. BI
agility must be measured after the requirements are
precise and mature. BI agility is split between different
organizational components. Agility must be divided and
measured at each component. BI agility divided into three
components BI functional agility, BI scale agility, and BI
content agility.
Case Study Conference
Business Intelligence
and Analytics
The latest data technologies have increased the
competition among the companies. Management needs to
make quick decisions to be ahead in the competition. BI
responsibilities have changed and evolved with data
technologies. Emerging areas of BI&A are identified
using the research framework. The BI&A is divided into
three versions, BI&A 1.0 to 3.0. Each the aspects
including application, data, analytics and impacts, were
discussed. The current and future landscape of BI&A
explained.
Literature
Review
Journal
Acceptance of agile
methodologies: A
critical review and
conceptual framework
A framework for applying agile methodologies have been
recommended. With the adoption of agile methodologies
in software, it has become smoother for teams to work
iteratively in an incremental way. Much of the challenges
handled efficiently with agile methodologies, which was
Literature
Review
Journal
Systematic Literature Review 70
not possible earlier with the traditional waterfall model.
Earlier, organizations faced difficulty with adopting agile
methodologies. The framework for acceptance of agile
methodologies is discussed and defined. The factors
involved in acceptance are motivation related factors like
career consequences and organizational culture, training
and external support, opportunity related factors, agile
methodology characteristics, and knowledge
management outcomes. Knowledge management could
be used to examine the agile methodology acceptance.
Future research
challenges in business
agility - Time, control
and information
systems
The author discussed the challenges faced by the
organization in the form of competition. The businesses
need to be able to adapt changes quickly. This ability to
adapt changes quickly is called business agility. The
dimension of problems concerning to business agility is
time, control, and information system. There are two
directions of business agility research, which are, organic
IS and decision support system. The organic IS serves as
autonomous IS. Decision support systems strengthen the
human qualities in the control system. The triadic
problem of time, control and information system will
remain a challenge.
Literature
Review
Conference
Beyond Flexible
Information Systems:
Why Business Agility
Matters
The current state of the information system can handle
the dynamic needs of businesses? Agility for any
organization depends on how flexible are the information
systems of that organization. Guidelines are proposed,
which helps the businesses to adopt maximum agility. B-
KIDE framework for business oriented modelling of
organizational knowledge. The following framework
address the knowledge to better understand process
flexibility.
Literature
Review
Conference
Table 13 Papers findings and summary
Conclusion, Limitations and Future work 71
5 Conclusion, Limitations and Future work
5.1 Conclusion
From all the studies performed using systematic literature review on the application of
agile methodologies in BI, analytics and DW, there exist multiple challenges in adoption
due to less practical information available. The impact of Agile BI on business agility is
not yet studied rigorously. The thesis aims to identify agile methodologies practices in
Business Analytics environment, and the issues organizations faced and the new
challenges with the introduction of new techniques and technologies. There has been
much technological advancement in data storage and handling with data moving from
relational databases to NoSQL databases. The new techniques and approaches like data
science and machine learning have been introduced in analytics. There are multiple new
and robust technologies for Big Data management. The agile methods have applied to
most of the cases, and the outcome was better than the traditional approach. With
customer involvement, every delivery after each iteration is a working functionality. The
agile approach has addressed the problems of each stage of analytics lifecycle from data
acquisitions to tools election, modelling issues and presentation.
RQ1: How have agile methodologies evolved with the emergence of data science?
Due to the successful implementation of agile methodologies in software development,
the analytics and data science practitioners are trying to incorporate agile methodologies
following agile manifesto. There have been multiple approaches to data science from a
software development approach to architectural approach, and working on the
development of a statistical model. All of these approaches somehow needed an agile
approach with incremental iterations.
As compared to software development, data science requires more responsiveness. Agile
promises this through the responsive development lifecycle and continuous feedback
through validation. It makes sure that there is a continuous correction and also provides
learning to the model. Teams collaborate to improve model performance. The agile
approach provides all these characteristics by keeping the project quality and deadlines.
Conclusion, Limitations and Future work 72
Figure 16 Agile Data Science lifecycle (Own image following (Schleier-Smith, 2015))
As shown in Figure 16. The agile data science lifecycle starts with an understanding of the
problem. The solution to the problem is proposed by creating new features. Training and
test data is created and tested. After that model is deployed to the production environment,
its performance and results are observed. If there is a need for improvement, a new
iteration is started with new ideas and required improvements. The iteration will continue
until the model is stable.
According to (Schleier-Smith, 2015) theoretical framework to perform data science with
agile involves the following areas
Iterative development
Intermediate testing
Prototyping
Data visualization and understanding
Structuring the data by data-value pyramids
Follow the critical path
Document the whole process
The theoretical framework also highlights the importance of iterative implementation, test
and training of algorithm until deployment and later observation. In each iteration,
intermediate work is shared with the teams to incorporate the feedback. In Figure 17, as
explained the process path of retrieving the value from the data.
Conclusion, Limitations and Future work 73
Figure 17 Agile Analytics Pyramids (Own image following (Schleier-Smith, 2015))
The Data Scientist mostly work in the upper layer of the pyramid. The tasks performed at
each layer are mentioned in the pyramid with the respective layer. At the prediction layer
algorithms are trained, and in the end at action layer, discovered information is utilized
to create business value (Schleier-Smith, 2015).
The process model based on CRISP-DM called Data Science Edge. The DSE
recommends the BDA lifecycle and explains how it is different from BI. These are
Scope
Data Acquisition/Discover
Analyze/Visualize
Validate
Deployment
Based on the experiments performed on the actual data science teams working in an agile
environment at Walmart, a checklist is recommended consist of actions that should be
performed before putting the code in production.
The data scientist should be able to describe the accuracy of model and CPU
consumption
All type of debts must be minimized
Everything should be documented
The weakness of the model must be known to improve in the next iteration
Conclusion, Limitations and Future work 74
DataOps techniques with tools and technologies have also been employed by data science
agile teams to automate the processes and reduce the time of analytics development
lifecycle.
RQ2: Which agile frameworks, methodologies and best practices are well known for data-
intensive applications and how they could be applied?
There are multiple methods and frameworks available. The selection of the specific
method or framework depends on the requirements. Irrespective of methodology or
frameworks, teams in agile analytics use just in time planning with iterative and
incremental delivery to incorporate sudden changes when the requirements change. After
each iteration, the product evolves due to continuous involvement and feedback of the
customer. Teams collaborate with the customer to towards required results.
Depending on the nature of requirements, BI type and context, there are multiple agile
frameworks. To some extent, all of the frameworks and methodologies required tailoring
according to business needs and type of BI system. Some of them are discussed below
Scrum
Used as a project management framework, Scrum is famous for its incremental nature of
product delivery.
Kanban
It is a project management methodology used as continuous workflow management.
Team’s activities can be visualized.
Test-Driven Development
In this approach, the developers write the test case and then develop the components to
pass that test case. It ensures quality control in each life cycle.
Extreme programming
It is an agile framework which provides sustainable delivery within the teams.
As explained in the paper (Deshpande & Desai, 2015), the author explained the model to
assess the suitability of applying agile methodology to the analytics projects. There are
numerous aspects which need to be taken into considerations before moving towards
agile.
Conclusion, Limitations and Future work 75
In the paper (Sanaa et al., 2016), based on the best practices, the author recommended a
framework for Agile BI. The framework consists of five phases.
Discovery phase involves analysing the data sources and requirements
Design phase involves designing the BI architecture
Development phase objective is to deliver working software functionality
Test included performing the final testing before deployment
Deploy the working functionality after each iteration
DataOps or AnalyticsOps is the methodology recommended specifically for data-
intensive projects which are using data science and machine learning. It enhances the fast
time to value by promoting cross-functional collaboration. With an emphasis on people
and processes, it enhances productivity by enabling agile and iterative workflow. The
DataOps must be employed by several different functional groups to deliver business
value. The following are the functional groups
Data Engineers
Data Scientists
Developers and Architects
Operations
IT security and Data Governance
In another article, an architecture-centric approach called Architecture-Centric Agile Big
Data Analytics (AABB) is recommended. It addresses both technical and organizational
issues. It is different from other agile analytics approaches as it focuses more on
architecture as a key enabler for success in agile adoption.
CRISP-DM and the extension called Data Science Edge (DSE) process model
recommended by SAIC in the paper (Grady, Payne, & Parker, 2017). It serves as a model
for knowledge discovery. As compared to processes of agile BI analytics. DSE has the
following steps
Plan
Collect
Curate
Analyze
Act
Conclusion, Limitations and Future work 76
The planning phase involves the planning of analytics systems, data sources and
requirements management. Collect phase handles the data collection process from
different data sources. Curation of data is to map the data to integrate data into a central
location. Analyze phase includes the design and development of the knowledge
discovery. Last is the Act phase, which defines what actions have to perform based on
the knowledge discovered?
RQ3: What are the enablers and success factors for Agile Analytics?
The key enabler for analytics projects of any organization depends on how the
organization is planning to embrace agility. The business itself is agile or not. The data
itself, selected methodology and the technologies have been the key enablers for agile
analytics. The technology enabler has a specific focus on how the data is handled from
data collection to data warehouse and its use for later in the machine learning or data
science models. The availability of self-service capability and interactive visualization of
analytics on a range of data. The users of the analytics projects are mostly management
and decision-makers. They must be able to perform multiple interactive tasks without any
assistance from IT teams. The technology advancement and available set of tools enable
the user to retrieve the required result very quickly. There are multiple approaches to
design among them we discussed is Architecture Centric Big Data Analytics (AABA) (H.
M. Chen et al., 2016). Software architecture has a central role in adapting agile
methodology. Although technology has a vital role in agile analytics, it heavily depends
on organizational practices and how quickly and to what extent they are following agile.
In this thesis and our findings, we have discussed multiple approaches of agile
methodologies of adapting agile to analytics project. In other research questions, multiple
methods and frameworks are also discussed and explained.
The success factors as explained in the literature, among them some we have discussed
already in the literature are (Sanaa et al., 2016)
Iterative development life cycle
Flexible to Adopt Changes
Value-Driven Development
Production Quality
Automate Routines
Team Collaboration
Conclusion, Limitations and Future work 77
Self-Organization and Self-Managing Team
Training and Coaching
Customizing the agile approach
Leadership
Communication and transparency
RQ4: What are the challenges organizations faced for adopting agile methodologies for
analytics applications?
There are multiple challenges involved in agile adoption. Some of them are specific to
analytics projects, and some are related to organizations moving towards agile in generals.
In case of analytics projects, if the organizations have not adopted to agile methodologies
at all, then moving towards agile, they will face multiple challenges specific to analytics
and agile adoption in general. Some of them are discussed below
User involvement
User experience is a critical component of BI and analytics. User must be involved in
every phase of the implementation as possible to achieve the best results. All-time contact
with the user is sometimes hard as the user could have a different schedule (Deshpande
& Desai, 2015). To make the agile implementation plan which coincides with the user
schedule is quite challenging.
Absence of Data ownership and Data governance
In most cases, organizations are struggling with information management. The
identification of data assets and acquisition based on the ownership of data is quite
challenging. For ongoing agile analytics process, a continuous availability, integrity, and
security of data is of great concern and require many efforts.
There is no or minimal automation and TDD
Test-driven development is considered as a proven methodology for producing a quality
product. Sometimes organizations have not adopted this process and also performing
routine tasks manually rather than automating the process (Collier, 2011 p.225).
Lack of CI/CD practices
Like in software development DevOps technologies provide CI/CD capabilities which
have fasten the time to delivery. CI/CD also improves collaboration among the teams,
Conclusion, Limitations and Future work 78
which is the fundamental principle of agile methodology. In analytics, adoption of CI/CD
is relatively new and not much adopted.
The undecided priority of immediate or long term goals
When deciding to apply agile methodologies, Sometimes, organizations do not have a
clear vision about the project and their organizational needs. They are always confused
between immediate results or long term projects.
Technological difficulties in Agile Analytics
Tool support
There are multiple tools available for a different task in BI lifecycle. For TDD and
ETL tasks, data warehouse and automation tasks, the available tools are not
mature enough like in software development.
Data volume
The management of the volume of data is a great challenge. Providing the data for
analysis from multiple data source is a processing challenge.
Heavy Lifting
There always needs much efforts in the backend to perform simple analytics tasks.
Development of DW after collection of data multiple sources is a great challenge
as it requires much efforts.
Organizational challenges
Lack of motivation
One of the organizational challenges is that the organizations are not motivated to
adopt change. The benefits of adopting agile over current processes are not clear.
Complex hierarchy
A change in hierarchies and working environment needs a lot more efforts.
Organizations sometimes stuck into a bureaucratic process and are not willing to
move towards more flat hierarchies.
Not willing to invest
Moving towards agile needs massive investment in terms of hiring professionals,
training and coaching and some other organizational stuff. The companies with a
limited budget for an agile transformation are sometimes reluctant for such an
investment.
Conclusion, Limitations and Future work 79
5.2 Limitations
The idea of Agile Analytics is relatively new, and there is not much known about the
impacts of applying it in the analytics projects. As there are not much actual case studies
are available till now, this is considered as the limitation of the study as to know the real-
time effects. However, the overall impacts which were proposed in the literature were
studied. The organizational policies and the extent to which the organizations needs to
tailor the agile methodologies are also not covered in this scope of the thesis. The other
limitation was the absence of a quantitative approach. The literature sample set for a
qualitative approach was also not diverse and did not have the detail about actual
implementations.
5.3 Future Work
Agile Analytics is a relatively new context, and there have not been many publications
on this topic. Application of agile in analytics has been there for a while, but most of them
are tailored approaches of software development methodologies. As shown in our article
search part, there has not been many results for Agile Analytics. The concept itself is
entirely new and mixed with the induvial function or a part of the analytics life cycle.
Much of the available literature was related to Agile BI. The use cases were from
traditional Bi systems with obsolete requirements.
There has not been much discussion about new perspectives such as Data Science and
Machine Learning in analytics, especially in BI. The nature and the context of the
traditional agile approach are not producing the required results. Agile Analytics is the
specialized agile approach to analytics, including BI and data warehouse projects. It has
not been widely applied in the real world, and there has been less information available
through practitioners. This topic needs extensive investigation as there are many areas
which are still unexplored.
There have already been some attempts to explore the different context of agile analytics,
including frameworks and methodologies, enablers and success factors, challenges
organizations face during adoption, and the best practices involved in the adoption
process. The research provided a good understanding of applying agile methodologies to
Big Data Analytics. Agile methodologies are already making their space in the decision-
support system, but there is still a need for improvement. In the following thesis, several
different areas related to agile analytics are investigated and reviewed based on the
available literature.
Conclusion, Limitations and Future work 80
There could be many directions for further research, as the selected topic is very vast. The
research in academia has started to rise, but there is not much theoretical work has been
performed primarily in case of project management, there are very fewer case studies
available. Still, there is a need to perform further studies through case studies as it
provides real insight into the working and implementation of agile practices and provides
with a point of view from industry practitioners. Studying on the organizational
environment of companies where agile has been or going to be adopted in the analytics
would give a whole lot of new information. The companies approach towards agile
analytics along with technical capabilities could be studied. The outcomes could be
compared with the ideal outcomes and investigate the challenges faced during adoption
was tackled by respective organizations.
There could be further research on the methodologies, which has no proven results in
real-time projects. The practices explained could be verified and further studies in
different areas which are not studied in the literature. The behavioural barriers in the agile
adoption within the analytics teams are also an area very less explored and could be a
potential area for further research.
References 81
6 References
Abbott, L., & Grady, C. (2011). A Systematic Review of the Empirical Literature
Evaluating IRBs: What We Know and What We Still Need to Learn. Journal of
Empirical Research on Human Research Ethics.
Ahmad, M. O., Markkula, J., & Oivo, M. (2013). Kanban in software development: A
systematic literature review. Proceedings - 39th Euromicro Conference Series on
Software Engineering and Advanced Applications, SEAA 2013, 9–16.
Andersson, B., Beveridge, A., & Singh, K. (2015). Literature Review, Academic Tip
Sheet. 2003, 1–2.
Angela Stringfellow. (2017). What Is Scrum? How It Works, Best Practices, and More -
DZone Agile. Retrieved June 4, 2019, from https://dzone.com/articles/what-is-
scrum-software-development-how-it-works-be
Arksey, H., & O’Malley, L. (2005). Scoping studies: Towards a methodological
framework. International Journal of Social Research Methodology: Theory and
Practice.
B Silva, F. Q., J O Cruz, S. S., Gouveia, T. B., Fernando Capretz, L., B, F. Q., J O, S.
S., … B da Silva, F. Q. (2013). Using Meta-Ethnography to Synthesize Research:
A Worked Example of the Relations between Personality on Software Team
Processes.
Baars, H., & Zimmer, M. (2013). A Classification for Business Intelligence Agility
Indicators. Proceedings of the 21st European Conference on Information Systems
(ECIS 2013), 1–12.
Barnett-Page, E., & Thomas, J. (n.d.). Methods for the synthesis of qualitative research:
a critical review ESRC National Centre for Research Methods NCRM Working
Paper Series Number (01/09).
Beck, K., Beedle, M., Van Bennekum, A., Cockburn, A., Cunningham, W., Fowler, M.,
… Thomas, D. (2001). Manifesto for Agile Software Development Twelve
Principles of Agile Software.
Cervone, H. F. (2011). Understanding agile project management methods using Scrum.
OCLC Systems and Services, 27(1), 18–22.
Chang, V., & Larson, D. (2016). A Review and Future Direction of Agile, Business
Intelligence, Analytics and Data Science. International Journal of Information
Management, 36, 700–710.
Chaudhuri, S., Dayal, U., & Narasayya, V. (2011). An overview of business intelligence
technology. Communications of the ACM, 54(8), 88.
Chen, H., Chiang, R. H. L., Storey, V. C., & Robinson, J. M. (2012). SPECIAL ISSUE:
BUSINESS INTELLIGENCE RESEARCH BUSINESS INTELLIGENCE AND
ANALYTICS: FROM BIG DATA TO BIG IMPACT.
Chen, H. M., Kazman, R., & Haziyev, S. (2016). Agile big data analytics development:
An architecture-centric approach. Proceedings of the Annual Hawaii International
Conference on System Sciences, 2016-March, 5378–5387.
Chun Tie, Y., Birks, M., & Francis, K. (2019). Grounded theory research: A design
References 82
framework for novice researchers. SAGE Open Medicine, 7, 205031211882292.
Coelho Da Silva, T. L., Magalhães, R. P., Brilhante, I. R., De Macêdo, J. A. F., Araújo,
D., Rego, P. A. L., & Neto, A. V. L. (2018). Big data analytics technologies and
platforms: A brief review. CEUR Workshop Proceedings, 2170, 25–32.
Collier, K. W. (2011). Agile Analytics: A Value-Driven Approach to Business
Intelligence and Data Warehousing: Delivering the Promise of Business
Intelligence (Agile Software Development).
Curry, E. (2016). The Big Data Value Chain: Definitions, Concepts, and Theoretical
Approaches. In New Horizons for a Data-Driven Economy (pp. 29–37).
Deshpande, K., & Desai, B. (2015). Model for Assessment of Agile Methodology for
Implementing Data Warehouse Projects. International Journal of Applied
Information Systems, 9(8), 42–49.
Dharmapal, S. R., & Sikamani, K. T. (2016). Big data analytics using agile model.
International Conference on Electrical, Electronics, and Optimization Techniques,
ICEEOT 2016, 1088–1091.
Dybå, T., & Dingsøyr, T. (2008). Empirical studies of agile software development: A
systematic review. Information and Software Technology.
Earley, S. (2014). Agile analytics in the age of big data. IT Professional, 16(4), 18–20.
Evelson, B. (2014). The Good The Bad And The Ugly Of Enterprise BI. Retrieved from
https://go.forrester.com/blogs/14-08-25-
the_good_the_bad_and_the_ugly_of_enterprise_bi/
Farhan, S., Tauseef, H., & Fahiem, M. A. (2009). Adding agility to architecture tradeoff
analysis method for mapping on crystal. 2009 WRI World Congress on Software
Engineering, WCSE 2009, 4, 121–125.
Firdaus, A., Ghani, I., & Yasin, N. I. M. (2013). Developing Secure Websites Using
Feature Driven Development (FDD): A Case Study. Journal of Clean Energy
Technologies, 1(4), 322–326.
Grady, N. W., Payne, J. A., & Parker, H. (2017). Agile big data analytics: AnalyticsOps
for data science. Proceedings - 2017 IEEE International Conference on Big Data,
Big Data 2017, 2018-Janua, 2331–2339.
Halper, F. (2017). TDWI Self-Service Analytics Maturity Model Guide Interpreting Your
Assessment Score 2017.
Hossain, E., Ali Babar, M., & Paik, H. Y. (2009). Using scrum in global software
development: A systematic literature review. Proceedings - 2009 4th IEEE
International Conference on Global Software Engineering, ICGSE 2009, 175–184.
Hunt, J. (2006). Feature-Driven Development. In Agile Software Construction (pp. 161–
182).
Ilieva, S., Ivanov, P., & Stefanova, E. (2004). Analyses of an agile methodology
implementation. Proceedings. 30th Euromicro Conference, 2004., 326–333.
Intelligence, W. B. (n.d.). Where Agile Meets BI.
Jacobs, S. (2015). Searching the literature is iterative. Retrieved from
References 83
https://wp.nyu.edu/discoveringevidence/its-iterative/
Jasim Hadi, H., Hameed Shnain, A., Hadishaheed, S., & Haji Ahmad, A. (2015). Big
Data and Five V’S Characteristics. International Journal of Advances in
Electronics and Computer Science, (2), 2393–2835.
Johnson, A. (2012). A Short Guide to Action Research. Special Education Department
Faculty Publications.
Kannan, K., & Services, I. P. (2011). Agile Methodology for Data Warehouse and Data
Integration Projects. Integration The Vlsi Journal.
Khan, S. N. (2014). Qualitative Research Method: Grounded Theory. International
Journal of Business and Management, 9(11).
Kitchenham, B. (2004). Procedures for Performing Systematic Reviews.
Kitchenham, B., & Charters, S. (2007). Source: “Guidelines for performing Systematic
Literature Reviews in SE”, Kitchenham et al Guidelines for performing Systematic
Literature Reviews in Software Engineering.
Klopper, R., Lubbe, S., & Rugbeer, H. (2007). The Matrix Method of Literature
Review. In Alternation (Vol. 14).
Knabke, T., & Olbrich, S. (n.d.). Towards agile BI: applying in-memory technology to
data warehouse architectures.
Ko, I. S., & Abdullaev, S. R. (2007). LNCS 4490 - A Study on the Aspects of
Successful Business Intelligence System Development. In LNCS (Vol. 4490).
Krawatzeck, R., Dinter, B., & Thi, D. A. P. (2015). How to make business intelligence
agile: The agile BI actions catalog. Proceedings of the Annual Hawaii
International Conference on System Sciences, 2015-March, 4762–4771.
Kruchten, P., & Gorans, P. (2014). A Guide to Critical Success Factors in Agile
Delivery. 29.
Ktata, O., & Lévesque, G. (n.d.). Agile development: Issues and avenues requiring a
substantial enhancement of the business perspective in large projects.
Labaree, R. V. (n.d.). Research Guides: Organizing Your Social Sciences Research
Paper: Writing a Research Proposal.
Larman, C., & Basili, V. R. (2003). Iterative and incremental development: A brief
history. Computer, 36(6), 47–56.
Levy, Y., & Ellis, T. J. (2006). Proceedings of the 2006 Informing Science and IT
Education Joint Conference Towards a Framework of Literature Review Process
in Support of Information Systems Research.
Leybourn, E. (2013). AGILE BUSINESS INTELLIGENCE OR HOW TO GIVE
MANAGEMENT WHAT THEY NEED WHEN THEY NEED IT.
Lim, E. P., & Chen, H. (2013). Business Intelligence and Analytics : Research
Directions. 3(4), 1–10.
Livermore, J. A. (2007). Factors that Impact Implementing an Agile Software
Development Methodology. Proceedings 2007 IEEE SoutheastCon, 82–86.
References 84
Matharu, G. S., Mishra, A., Singh, H., & Upadhyay, P. (2015). Empirical Study of
Agile Software Development Methodologies. ACM SIGSOFT Software
Engineering Notes, 40(1), 1–6.
Miller, H. G., & Mork, P. (2013). From data to decisions: A value chain for big data. IT
Professional, 15(1), 57–59.
Mohr, N., & Hürtgen, H. (2018). Achieving Business Impact with Big Data. McKinsey,
15.
Müller-Bloch, C., & Kranz, J. (2015). A Framework for Rigorously Identifying
Research Gaps in Qualitative Literature Reviews Completed Research Paper.
Munn, Z., Peters, M. D. J., Stern, C., Tufanaru, C., McArthur, A., & Aromataris, E.
(2018). Systematic review or scoping review? Guidance for authors when choosing
between a systematic or scoping review approach. BMC Medical Research
Methodology.
MUNTEAN, M., & SURCEL, T. (2013). Agile BI The Future of BI. Informatica
Economica, 17(3/2013), 114–124.
Nagle, T., & Sammon, D. (2013). Fast and Flexible: Exploring Agile Data Analytics.
Cutter Benchmark Review, 13(2), 5.
Narasimhan, R. (2014). Big Data – A Brief Study. International Journal of Scientific &
Engineering Research, 5(9).
Newkirk, J. (2002). Introduction to agile processes and extreme programming.
Proceedings of the 24th International Conference on Software Engineering - ICSE
’02, 695.
Ochoa, A. (n.d.). Concept Matrix for Sustainability Literature Review Aspect of Topic
Natural Resource Levels Biodiversity Pollution Economic Impacts Urbanization
How Topics Come Together.
Okoli, C., & Schabram, K. (2010). Working Papers on Information Systems A Guide to
Conducting a Systematic Literature Review of Information Systems Research A
Guide to Conducting a Systematic Literature Review of Information Systems
Research.
Orton, C. G. (2015). Biological treatment planning. AAPM 57th Meeting, 1–5.
Paetsch, F., Maurer, F., Eberlein, A., Maurer, F., Eberlein, A., & Maurer, F. (2003).
Requirements Engineering and Agile Software Development. Proceedings of the
Workshop on Enabling Technologies: Infrastructure for Collaborative Enterprises,
WETICE, 2003-Janua, 1–6.
Poppendieck, M., & Cusumano, M. A. (2012). Lean Software Development: A Tutorial.
IEEE Software, 29(5), 26–32.
Saltz, J. S., & Shamshurin, I. (2016). Big data team process methodologies: A literature
review and the identification of key factors for a project’s success. Proceedings -
2016 IEEE International Conference on Big Data, Big Data 2016.
Sanaa, H., Afifi, W., & Darwish, N. (2016). The Goal Questions Metrics for Agile
Business Intelligence. Egyptian Computer Science Journal, 40(2), 24–42.
Schleier-Smith, J. (2015). An Architecture for Agile Machine Learning in Real-Time
References 85
Applications. 2059–2068.
Schniederjans, M., Schniederjans, D., & Starkey, C. (2014). Business Analytics
PRINCIPLES, CONCEPTS AND APPLICATIONS.
SINGH, J., & SINGLA, V. (2015). Big Data: Tools and Technologies in Big Data.
International Journal of Computer Applications, 112(15), 975–8887.
Singh, S., Singh, P., Garg, R., & Mishra, P. K. (2015). Big Data: Technologies, Trends
and Applications. 6(5), 4633–4639.
Skene, A. (n.d.). Riting a Lit Review. Writing.
Souza, E., Moreira, A., Abrahão, S., Araújo, J., & Insfran, E. (2016). Comparing Value-
Driven Methods: an experiment design Adapt@Cloud: User-Centered Dynamic
Adaptation of Cloud Services View project Requirements Engineering conference
View project Comparing Value-Driven Methods: an experiment design.
Stapleton, J. (1999). DSDM: Dynamic Systems Development Method. Proceedings of
the Technology of Object-Oriented Languages and Systems, 406.
Strohmaier, M., & Lindstaedt, S. N. (2005). Beyond Flexible Information Systems : Why
Business Agility Matters.
Sun, Z., Sun, L., & Strang, K. (2018). Big Data Analytics Services for Enhancing
Business Intelligence. Journal of Computer Information Systems, 58(2), 162–169.
Swapnil, W., & Anil, Y. (2016). Big Data: Characteristics, Challenges and Data
Mining. In International Journal of Computer Applications.
Tallon, P. P. (2013). Corporate governance of big data: Perspectives on value, risk, and
cost. Computer, 46(6), 32–38.
The Data Value Chain. (2018).
Timulak, L. (2009). Meta-analysis of qualitative studies: A tool for reviewing
qualitative research findings in psychotherapy. Psychotherapy Research.
Tom, B. (2012). Agile Analytics. ACM SIGSOFT Software Engineering Notes, 37(6),
43.
Valentine, C., & Merchan, W. (2016). DataOps: an Agile Methodology for Data-Driven
Organizations. 8.
Voigt, B. J. j. (2004). Dynamic System Development Method. (January), 22.
Webster, J., & Watson, R. T. (2002). Analyzing the past to prepare the future. MIS
Quarterly, 26(2), xiii–xxiii.
Willans, B. (n.d.). Writing a literature review.
Williams, S., & Williams, N. (2007). Critical Acclaim for The Profit Impact of Business
Intelligence.
Wohlin, C. (2014). Guidelines for Snowballing in Systematic Literature Studies and a
Replication in Software Engineering.
Woolley, B., Melbourne, V., & Hobbs, G. (2008). Agility in Information System.
Bseccanterburyacnz.
References 86
Yin, R. K. (2015). Yin, Robert K. - Case study research _ design and methods-Sage
Publication (2015).
Annotated Bibliography 87
7 Annotated Bibliography
Grady, N. W., Payne, J. A., & Parker, H. (2017). Agile big data analytics:
AnalyticsOps for data science. Proceedings - 2017 IEEE International Conference
on Big Data
This article describes the challenges faced during the development lifecycle of an
analytics application and with the introduction of advanced analytics. The author
explains that the traditional way of handling the analytics cannot meet the changing
environment and agile approach to decision-support information. How to incorporate
Machine Learning and Data Science concepts in the analytics lifecycle and utilize
agile methodologies in the information development lifecycle. The author explains
how relying on the traditional data analytics lifecycle has created complexities and
legacies and explains the needs to adapt to more agile and sophisticated processes.
The author has adopted literature review methodology and industry practices. The
author starts by explaining the limitations of the existing analytics process models.
The author stresses the idea of “failing fast” and follow the direction which could
promise the required outcome. The author explains how agile methodologies have
been adopted for iterative and incremental software development and also describes
the drivers causing that, among them are human factors, cost and risk factors and the
system complexity. The author has proposed a new Big Data Analytics process model
to incorporate new tools and technologies called Data Science Edge (DSE). It is
extending and based on the CRISP-DM. It provides useful enhancement and serves
as a complete lifecycle to knowledge discovery, which includes data storage,
collection and software development. It provides the mechanism to decompose the
steps to provide better enhancement and analytics improvement. The author has also
discussed the theoretical frameworks to perform data science proposed by another
researcher. The author compared the business intelligence and fast analytics with their
proposed DSE process model. In the end, the author concludes that by following the
agile methodologies, the outcomes could be expected more quickly to make a decision
and to validate the current state of the project. The author explains how the DSE
process model aligned itself similar to agile methodologies adaption in software
development and proposed the conceptual changes required to adopt agile in data
analytics.
Annotated Bibliography 88
Saltz, J., Shamshurin, I., & Crowston, K. (2017). Comparing Data Science Project
Management Methodologies via a Controlled Experiment. Proceedings of the 50th
Hawaii International Conference on System Sciences (2017), 1013–1022.
In the following conference paper, the author has compared different agile
methodologies to compare their performance concerning a data science project. The
author has argued that with rapidly growing technologies and capability to process a
vast amount of data, the data science field is proliferating. However, there is not much
known about how to execute the project in a group. The author stresses the need for
well-defined process methodologies for data science projects. The author has applied
controlled experimentation methodology to test and compare the performance of
different agile methodologies. The author chooses student groups as a subject and
designed experimentation in a way that each group worked following different agile
methodologies throughout the semester. The limitations are that the experimentations
are not applied to real data science teams rather than students. In the end, the author
analyzed and compared the performance results. The main research questions were to
identify criteria to measure different methodologies and decide which methodology
is better than others. In the start, the author did some literature review to build some
theoretical background knowledge and explained the existing data science process
methodologies like CRISP-DM and SEMMA and some other customized
methodologies. The author explains with the help of a literature review that most of
the data science projects are managed in an ad hoc way, with 55% of Big Data project
never completed. The author compared the experimentation in the context of software
development and after assigning the project and setting up the experimentation
environment. The author did an interview and surveys and rated each methodology.
The author provides pros and cons and also tried to extract the details about the
environments where each method perform better than others. The survey results
showed that the Kanban is the most accepted methodology concerning factors, ease
of use, project results, the willingness of the teams to work on future projects and
satisfaction of individual team members.
Annotated Bibliography 89
Chen, H. M., Kazman, R., & Haziyev, S. (2016). Agile big data analytics
development: An architecture-centric approach. Proceedings of the Annual Hawaii
International Conference on System Sciences, 2016-March, 5378–5387.
In this paper, the author has suggested a methodology targeting Big Data Analytics
projects. The proposed methodology, called Architecture-centric Agile Big Data
Analytics (AABA). The author states about the importance of big data for
organizations to increase their operational efficiency, identify markets, making timely
decisions and many more. The author highlights the challenges faced during Big Data
implementation in an architectural point of view. The author explaining the technical
challenges argued that success in any analytics system is directly dependent on robust
infrastructure architecture that can effectively process big data. Organizational
challenges include how the data scientist teams, along with software and data
engineers, can work effectively to get the value from data. The author explained how
agile methodologies could effectively tackle these challenges. In the paper, the author
analyzed how big data system should be designed, which can adequately support the
analytics process and how agile methods could be adapted for Big Data Analytics.
The author proposed and tested the architecture-centric development, which,
according to him, can support both, the big data system design and agile adaption.
The author used an empirical case study method called (CPR) to identify practical
issues and collaborate with the experts. The author analyzed 10 case studies from
different organizations. The author explained AABA methodology in detail and
described the software architecture as the key enabler for agility. The author explained
the difference to AABA methodology to Agile Analytics, claiming it more
architecture focussed. The author claims that this approach gives better control of
continuous delivery. Although author claimed the success in the adaption of
methodology, he clarifies that there are some biases and limitations including case
study as a method itself, the use case studies from relatively big companies which are
already in profits, outsourced part of projects and some more.
Chang, V., & Larson, D. (2016). A Review and Future Direction of Agile, Business
Intelligence, Analytics and Data Science. International Journal of Information
Management, 36, 700–710.
In this article, the author is investigating agile business intelligence and how the
advancement in technologies and arrival of Advanced Analytics and Data Science has
changed the way agile methodologies were applied in BI delivery. To adapt the new
Annotated Bibliography 90
trend in BI agile methodologies must be altered to fit the changing circumstances. The
author tried to understand and summarize how and how much agile practices have
been transformed with new trends in BI and the challenges involved. The author starts
by explaining the background information related to core topics and gradually
discussing current agile practices in BI delivery. The author briefly compares the BI
delivery and Fast Analytics and Data Science project lifecycle with detail to each step.
After analysing both separately, the author described how fast analytics and data
science could be adopted in Agile BI delivery framework. The author proposes two
separate layers consists of a different task for Both BI and fast analytics and data
science systems. The alignment between both layers with the impact the successful
delivery of analytics system.
Sanaa, H., Afifi, W., & Darwish, N. (2016). The Goal Questions Metrics for Agile
Business Intelligence. Egyptian Computer Science Journal, 40(2), 24–42.
In this paper, the author has argued the BI process and the challenges involved in that
process. The paper aims to measure the performance of agile teams working towards
the development of the BI system. The author has applied the Goal Question Agility
Matrix (GQAM) method for this purpose. Developed by Victor Basilli in 1994,
GQAM relies on BI and Agile concepts. The research methodology applied in this
paper is experimentation as the performance of the teams will be measured against
the set criteria defined by Goal Question Metrics (GQM). The author has explained
about the essential components of BI systems, agile BI frameworks, agile BI success
factors and tools and technologies required for success. The author explained all the
phases of agile BI framework from the discovery phase to deploy. The author talked
about the characteristics and attributes of the analytics application. In the end, the
author applied GQAM and explained how it could be used to measure team
performance. The GQAM is based on GQM, which has three components goal,
question and metrics. It also provides the teams with guidance to achieve the desired
performance.
Franková, P., Drahošová, M., & Balco, P. (2016). Agile Project Management
Approach and its Use in Big Data Management. Procedia Computer Science,
83(Ant), 576–583.
In this paper, the author has argued the different views from the management
perspective that whether there must be a tailored approach for every project or there
Annotated Bibliography 91
must be the standard approach to all projects. The author put a particular focus on the
management of Big Data projects. The author used the survey as a research approach.
After performing interviews with management from different projects, the author
identified and recommended the principles of the agile manifesto, which could be
applied in big data management. Based on the research and data collected, the author
tried to answer the questions about which approach is suitable for big data
management, is agile approach is suitable for big data projects, and the
recommendations for implementation of big data projects. The author starts from
explaining the agile approach for software development and big data management
process to make a theoretical background. Then the author compared the different
aspects of BDM concerning their importance to the projects and concluded that agile
methodologies could be applied to big data projects. By suggesting the agile
methodologies for BDM author also advocated the other factors, including the
technological considerations.
Dharmapal, S. R., & Sikamani, K. T. (2016). Big data analytics using agile model.
International Conference on Electrical, Electronics, and Optimization Techniques,
ICEEOT 2016, 1088–1091.
In this conference paper, the author gave the background information about Big Data
Analytics and discussed how agile methodologies could be applied to Big Data
projects. The author argued about the importance of Big Data for business for
competitive advantage. The author explains the companies can utilize the data for
sales forecast, marketing and various purposes. The author describes the history and
the rate at which the data is being created. The steps involved in the big data analytics
from setting the goal to tool delivery are discussed. The author divided the whole
analytics process into three phases planning phase, development phase, and closure
phase. Besides, the author argued about different roles in agile development specific
to big data analytics and compared the agile approach with the traditional waterfall
approach. The author used a literature review methodology to extract findings and
conclude results. The author concluded by explaining the adoption of agile in
analytics projects. It provided collaboration, timely delivery to market and customer
satisfaction.
Annotated Bibliography 92
Deshpande, K., & Desai, B. (2015). Model for Assessment of Agile Methodology for
Implementing Data Warehouse Projects. International Journal of Applied
Information Systems, 9(8), 42–49.
In this paper, the author has argued about the failure of data warehouse projects are
due to its implementation using the traditional waterfall model. The failure ratio for
data warehouse projects is very high. The author describes that although it is
considered that agile methodologies can overcome this failure, but still, there are cases
where agile methodologies have not made any difference. The author proposed that
there must be an assessment for suitability of agile methodology before adopting it
for the project. In this paper, the author has suggested the Analytical Hierarchy
Process (AHP) to analyze the factors impacting the selection of agile for data
warehouse projects. The author discusses the problems and challenges involved with
the traditional waterfall approach to data warehouse implementation. Then the author
explained about history and usage of agile methodology for data warehouse projects.
By defining the objective of the study to identify the characteristics that impact the
outcome of data warehouse project implemented using agile methodology, whether
these characteristics affect positively or negatively, and what are the chances to
predict the success based on that characteristics. The author then explained the steps
for the AHP process, which include deciding for goals, comparison, comparison result
and priority matric, accuracy check, and generalized model. The selection of BI
projects suitable for agile methodologies depends on the four main factors, which are
requirement status, organization culture and process, team effectiveness, data
environment infrastructure. The author used the survey as a research methodology. In
the end, the author proposed the model the assessment model and argued about
applying it to as many projects to find the suitability score.
Krawatzeck, R., Dinter, B., & Thi, D. A. P. (2015). How to make business
intelligence agile: The agile BI actions catalog. Proceedings of the Annual Hawaii
International Conference on System Sciences, 2015-March, 4762–4771.
In this paper, the author is arguing about the changing requirement in the BI system,
how the BI system would react in case of sudden and volatile requirements. The
ability to react to change in BI systems is known as BI agility. The author conveyed
that there is no structured approach to agile BI. To remain competitive for an
organization, it must be able to respond to changing requirements. The paper aims to
recommend the agile BI action catalogue. For any BI implementations, the
Annotated Bibliography 93
practitioners can compare their actions with the catalogue to find missing steps. The
author discussed the agile BI and divide the action required into four categories
Principles, process models, techniques, and technologies. The author explained the
step by step process of literature review and the analysis performed on the selected
articles. For each category defined above, the author explains in details by collecting
and summarizing the relevant literature to answer the questions. The total of 21
actions were recommended. The author argued that most of the actions identified are
related to the implementation phase.
Schleier-Smith, J. (2015). An Architecture for Agile Machine Learning in Real-
Time Applications. 2059–2068.
In this conference paper, the author put the focus on the more advanced techniques
used by recommendation systems such as machine learning. With the advancement in
the data handling tools and technologies and the emergence of data science and
machine learning, there is a need to handle the project systematically as software
development projects. Agile methods have been adopted for a long time and proved
to be very beneficial in software development. In the case of machine learning agile
approach is entirely new, and not much had been done till now. The author proposed
an approach and architecture to handle the machine learning projects iteratively.
Giving the example of some top tech companies like Facebook and hi5 explained their
approach to machine learning and data science. The author summarized the approach
and explained the key benefits, including improved collaboration, natural real-time
processing and quick iterations. The author utilized experimentation as a research
methodology and the Meet me application as a real-time implementation. By applying
the event history architecture, the author concluded how agile methodologies could
be applied in data science teams for implementing machine learning projects.
Krawatzecka, R., & Dintera, B. (2015). Agile Business Intelligence: Collection and
Classification of Agile Business Intelligence Actions by Means of a Catalog and a
Selection Guide. Information Systems Management, 32(3), 177–191
In this journal article, the author summarized a list of actions based on multiple
categories to increase the BI agility. The author recommends the selection guide for
BI actions required to adapt quickly changing requirements. The categories defined
are principles, methods, techniques and technologies. The author explained the
current background and importance of BI and explained the categories in detail.
Annotated Bibliography 94
Defined the schema for different levels of selection guide. The author proposed 31 BI
action based on three levels and classification schema. The author applied the
literature review methodology to analyze and synthesize the collected data and
summarize the results in the form of selection guide.
MUNTEAN, M., & SURCEL, T. (2013). Agile BI.The Future of BI. Informatica
Economica, 17(3/2013), 114–124.
In this paper, the author tried to answer the fundamental question related to BI
development. The discussed the concepts which help to understand what is agile?
Why is it appropriate to develop BI using the agile methodology? Also, which are the
key features to support the agile BI? The author starts by explaining the basic concepts
and tools and technologies for the BI system. The author defined BI according to
different kinds of literature. The author explains the traditional BI architecture. The
main components and layers in traditional BI systems include but not limited to, data
sources, ETL layer, data warehouse, and BI tools for visualization. After a brief
description of the architecture, the author explains its disadvantages. The
disadvantages include a massive amount of duplicate data, use of different tools for
every different task, rigid data models and more. The author then moves on to
explaining the agile BI. The author breakdown the agile BI into three main
components agile development, agile business analytics, agile information
infrastructure. The author then explains each component in detail by comparing it with
the traditional approach and performing a SWOT analysis for BA. After all the
analysis performed, the author concluded that agile BI is useful for changing user
requirements, to effectively handle the inability of IT to meet the business demands,
and fast delivery to market. The author identified the key elements to promote agile
BI.
Baars, H., & Zimmer, M. (2013). A Classification for Business Intelligence Agility
Indicators. Proceedings of the 21st European Conference on Information Systems
(ECIS 2013), 1–12.
In this paper, the author tries to distinguish how agile meaning is different for every
BI solution. The author explains the BI function and its importance to top
management. The author then continues by explaining what it means for BI to be agile
and the effectiveness of using agile in BI systems. The author explained the related
work from other publication and argued that on this topic, there is not much academic
Annotated Bibliography 95
research has been done so far, and most of the information is available through
practitioners. The author applied case study research methodology and interviewed
14 experts for data collection. In the case study method author choose the companies
from different industrial background and which have already adapted. The author
explored the architecture and agility applied to BI implementation of companies. The
author concluded different result, the agility for the enterprise should not be tied to BI
agility, BI agility should be measured after the requirement is matured and become
concrete, BI agility should split into organizational components, BI agility should split
according to each architectural layer. Overall, BI agility should be divided into three
main components BI functional agility, BI scale agility, and BI content agility.
Chen, H., Chiang, R. H. L., & Storey, V. C. (2012). Business Intelligence and
analytics. MIS Quarterly, 36(4), 1165–1188.
In this paper, the author describes the importance of BI and relates it to the new
emergence of the latest data technologies. The author proposed the framework to
identify the emerging area of BI&A research. The author identified the critical
challenges involved in this process. The author starts by explaining the key arguments
and the facts related to BI, analytics and big data emergence and current level of use.
The author has divided the overall BI&A structure into three versions, BI&A 1.0 to
3.0. The author then discussed the evolution and technology advancement in each
version. There are new areas of research with the capability of new technologies to
process and analyze the massive amount of data. The characteristics and capabilities
of each version are discussed and summarized. The impact and usefulness of BI&A
versions are discussed on areas like E-Commerce and market intelligence, E-
Government and politics, science and technology, smart health and wellbeing, and
security and public safety. Each aspect of BI&A like application, data, analytics and
impacts were discussed. Current and future landscape of BI&A system are explained
briefly.
Chan, F. K. Y., & Thong, J. Y. L. (2009). Acceptance of agile methodologies: A
critical review and conceptual framework. Decision Support Systems, 46(4), 803–
814.
In this paper, the author has performed a critical review of the framework for applying
agile methodologies to a project. The author explains how the challenges can be
handles while adopting agile methodologies for software development. The author
Annotated Bibliography 96
used the literature review as a research methodology and provided a framework to
perform future research. The author explains the introduction of agile methodologies
in software development and explains the history of its arrival. The author discussed
the problems organizations faced during adoption, which include developer ignore it,
it was not correctly adjustable and more. The author explains the conceptual
framework and the factors for acceptance of agile methodology which are ability
related factors like training and external support, motivation related factors like career
consequences and organizational culture, and the other factors are opportunity related
factors, agile methodology characteristics, knowledge management outcomes. All
these factors should be in consideration before performing the acceptance test of agile
methodology. Knowledge management was proposed to examine the acceptance of
agile methodology. The author explains how the characteristics explained in the
conceptual framework can be used to understand the decision of developers to accept
agile methodologies.
Strohmaier, M., & Rollett, H. (2005). Future research challenges in business agility
- Time, control and information systems. Proceedings - Seventh IEEE
International Conference on E-Commerce Technology Workshops, CEC 2005
Workshops, 2005, 109–115.
In this paper, the author discussed the challenges faced by businesses in the face of
competitions for competitors. This paper conceptualizes business agility and
identifies challenges for future research. The author explains the triadic dimension
problems concerning to business agility that includes, time, control, and information
system. Based on the following analysis, the author proposed two directions of
business agility research, which are, organic IS and decision support system. The
author starts by explaining the triadic problem in context to Business Agility and
levels of analysis. The organic IS serves as autonomous IS, which is goal-oriented
and decision support systems strengthen the human qualities in the control system.
Due to the changing nature of Business agility the triadic problem of time, control and
information system will remain a challenge.
Strohmaier, M., & Lindstaedt, S. N. (2005). Beyond Flexible Information Systems :
Why Business Agility Matters.
In this paper, the author argues about whether the current state of the information
system can handle the dynamic needs of businesses. The business agility for any
organization depends on how flexible are the information systems of that
Annotated Bibliography 97
organization. How quickly they respond to the change. The author performed analysis
and proposed a guideline, which helps the businesses to adopt maximum agility. The
author explains the business agility concept with the meaning of flexibility in business
agility. The author applied B-KIDE framework for business oriented modelling of
organizational knowledge. The following framework address the knowledge to better
understand process flexibility. The author applied literature review methodology for
analysis and proposed outcome.