DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2017

Improving Back-End Service Data Collection

ISABEL GHOURCHIAN
CHARLOTTA SPIK

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF INFORMATION AND COMMUNICATION TECHNOLOGY


Bachelor thesis, 15hp

Supervisor: Johan Montelius

Examiner: Thomas Sjöland


Abstract

This project was carried out for Anchr, a company that develops a location-based mobile application for listing nearby hangouts in a specified area. To do this, the application integrates a number of services and sends requests to them to find out whether they list any nearby locations. One of these services is Meetup, an application in which users can create social events and gatherings.

The problem this project aims to solve is that a large number of requests are sent to Meetup’s service to get information about the events so that they can be displayed in the application. Only a limited number of requests can be sent within a specified time period before the service is locked. Meetup’s service therefore cannot be integrated into the application as it is currently implemented, because the feature becomes useless when no requests can be sent. The purpose of this project is to find an alternative way of collecting events from the service without it locking, which would enable the service to be integrated into the application.

The hypothesis is that the current method of sending requests to get events can be replaced by a listener that listens for incoming events on Meetup’s stream and thereby receives an update directly whenever an event is created or updated.

The result of the project is a system that listens for events instead of repeatedly sending requests. The locking problem no longer occurs, since no requests are sent to Meetup’s service.

Keywords

Streaming, API, Java, caching, mobile application


Referat

This project was carried out for a company called Anchr, which develops a location-based mobile application for listing nearby social places within a specified area. For this, they integrated a number of services to which they send requests to see whether any nearby places are listed for those services. One of these services is Meetup, an application where users can create social events.

The problem this degree project aims to solve is that a large number of requests are sent to Meetup’s service to get information about the events so that they can be displayed in the application. This is a problem because only a limited number of requests can be sent to the service within a certain time interval before the service is locked. This means that Meetup’s service cannot be integrated into the application as it is currently implemented, since the feature becomes unusable if no requests can be sent. The purpose of this project is therefore to find an alternative way of collecting events from the service without it being locked. This would allow the service to be integrated into the application.

The hypothesis is that, instead of using the current method of sending requests to get new events, a listener can be implemented that listens for incoming events from Meetup’s stream, so that updates are received directly when an event is created or updated.

The result is that there is now a system that listens for events instead of repeatedly sending requests. The problem with the locking of the service no longer exists, since no requests are sent to Meetup’s service.

Keywords

Streaming, API, Java, caching, mobile application


Contents

Introduction
  1.1. Background
  1.2. Problem
  1.3. Purpose
  1.4. Goal
  1.5. Method
  1.6. Delimitations
  1.7. Outline
Background
  2.1. Difference between REST API and streaming API
  2.2. Streaming
  2.3. Previous work in the area
Method
  3.1. Presentation of different methods
  3.2. Methods used in this project
The implemented system
  4.1. Software development tools
  4.2. JavaScript Object Notation (JSON)
  4.3. Dropwizard
  4.4. Retrofit, Okio and OkHttp
  4.5. RxJava
Result
Discussion and conclusion
  6.1. Sources of error
  6.2. Choice of methods
  6.3. Future work
References


Terminology

API              Set of data and functions for interaction between computer programs
Binary search    A search algorithm for finding a target value within a sorted list or array
Dropwizard       Java framework for web service development
JSON             Human-friendly data-interchange format
Maven            Software project management and comprehension tool
OkHttp           Used for sending network requests
Okio             Used for buffering data
REST             Architectural style used for modern web services
REST API         Listens and responds to requests from a client
Retrofit         HTTP client for Java
RxJava           API for asynchronous programming with observable streams
Streaming        Making data available as soon as the client needs it
Streaming API    Persistent connection with the client


Chapter 1

Introduction

This chapter briefly describes the project: its background, purpose, goal and method, giving a short explanation of why and how the project was carried out. It also presents the delimitations and an outline of the report.

1.1. Background

The work was done for a startup called Anchr, a small Swedish company run by only three people. The company’s business concept is to develop a location-based mobile application. The idea behind the application is to show all so-called “hangouts” within a specified radius of the user’s current position. These hangouts could, for example, be cafés, restaurants, bars and monuments. The application should be user-friendly and make everyday life easier by helping users find locations of interest without having to perform extensive online searches. Information about each place, such as opening hours, reviews and a description, is displayed. This information is collected from a number of different APIs, for example Yelp and Wikipedia. With this application a user can quickly see all nearby hangouts just by starting the application. The user can also search for, share and save events, build a network of user connections and chat with other users.

One API that the company wants to integrate into its application is Meetup. Meetup is a service that allows people all over the world to arrange events, so-called “meetups”, that anyone can join. These events could be anything from restaurant visits to programming events [1]. The service allows people to meet and socialize with others who have similar interests. Meetup provides a REST API and a streaming API that can be used to get information from its server (see section 2.1 for an explanation of the difference between these two kinds of APIs). There are different API methods for the different types of information a user of the API could be interested in receiving. For example, a user who wants the events themselves would use the “events” API method, while a user who is only interested in photos from Meetup would use the “photos” API method. The API method used for this project is “events”. By integrating Meetup’s API with the company’s application, all Meetup events near the user’s location are displayed in the application, so that users can easily find and search for Meetup events.

1.2. Problem

To get data from Meetup, Anchr has implemented a client that sends requests to Meetup’s REST API. Each request contains the location of the user, a radius and optionally a search term, and asks whether there are any meetups within the specified radius of the user’s location that match the search term, if one is provided. Meetup’s server then responds to the client with the meetups matching the query, if any exist.

The problem with collecting data this way is that only a limited number of requests can be sent within a certain amount of time, as Meetup has specified a limit to avoid overloading its server. If this limit is exceeded within the specified time period, the API key is locked. Since an API key is required to use the API, no more requests can then be sent. This is problematic because the user could miss nearby meetup events: only some of them can be displayed, and no new ones appear if the user changes position. The current client code for Meetup’s API therefore cannot be integrated into the application, as it does not provide a stable and working system. The question this project aims to answer is therefore: is there an alternative way to collect the data without the limitations of the REST API?
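To make the limitation concrete, the following minimal Java sketch models a fixed-window request quota of the kind described above. The class name `RequestQuota`, the window length and the limit are assumptions made for illustration only; Meetup’s actual limits and locking behaviour are not specified in this report.

```java
/**
 * Illustrative model of a fixed-window request quota.
 * All names and numbers are assumptions made for this sketch;
 * they are not taken from Meetup's API documentation.
 */
final class RequestQuota {
    private final int maxRequests;    // requests allowed per window
    private final long windowMillis;  // length of one window
    private long windowStart;         // start time of the current window
    private int used;                 // requests counted in the current window

    RequestQuota(int maxRequests, long windowMillis, long nowMillis) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
        this.windowStart = nowMillis;
    }

    /** Returns true if a request may be sent at nowMillis, false if the quota is exhausted. */
    boolean tryAcquire(long nowMillis) {
        if (nowMillis - windowStart >= windowMillis) {
            // A new window has begun: reset the counter.
            windowStart = nowMillis;
            used = 0;
        }
        if (used >= maxRequests) {
            return false; // a polling client would be rejected here
        }
        used++;
        return true;
    }
}
```

If every position change of every user triggers one request, the quota is consumed in proportion to the number of active users, which is why a polling client can reach the limit quickly.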

1.3. Purpose

The purpose of this report is to explain how a system can be developed to gather data from Meetup’s API without the problem described in section 1.2. Furthermore, the research methods used to do this are presented and discussed.

The purpose of the project is to solve the problem presented in section 1.2 so that a working integration of the Meetup API into the application can be implemented.

1.4. Goal

The goal of this project is to develop a system that is able to receive events from Meetup’s API without the API key being locked and the flow of events stopping. It should be possible to integrate the system into the existing application, and it should give users quick access to events near their current position.

The system should be a microservice that collects the desired data from Meetup’s service without the locking problem. The hypothesis is that this can be achieved by implementing a listener that listens for incoming updates from the server’s streaming API, instead of sending repeated requests to the server’s REST API. If this succeeds, the number of requests sent to the REST API will be reduced to zero, as there is no longer any need to send requests to get information about the events, and the problem is eliminated.

The goal is also to save the data received from the streaming API in a database, so that the information is collected in an easily accessible place. The stream continues to provide events indefinitely, delivering one every time an event is created or updated. These events must be saved somewhere, because if they were only displayed directly in the application when they were received, they would disappear every time the user restarted the application.

Furthermore, the system must run for a couple of months before it can be released with the application. If the application were released as soon as the system was implemented, no events created before the system started running would be displayed in the application, and the user would miss a number of events. For example, if a user created an event in May that takes place in July, and the system started running in June, the event would not appear in the application, because it was created before the system started running and therefore was never saved in the database. If, however, the system is allowed to run for a couple of months to gather events, the events created before the system started will have passed, and only upcoming events will be displayed. The database should then contain all upcoming events from the server, which allows the microservice to send requests to the database instead of to the server’s API. The hypothesis is that after a few months the database will be in sync with the information held by the server.
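The storage behaviour described above can be sketched as follows. This is a minimal in-memory stand-in for the database, not the schema used in the project; the class `EventStore` and the fields of `Event` are hypothetical. The sketch shows the two properties the reasoning relies on: an update replaces the earlier version of the same event, and queries return only events that have not yet taken place.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory stand-in for the event database (illustrative only). */
final class EventStore {
    /** Hypothetical event record with only the fields this sketch needs. */
    static final class Event {
        final String id;
        final String name;
        final long startTimeMillis;

        Event(String id, String name, long startTimeMillis) {
            this.id = id;
            this.name = name;
            this.startTimeMillis = startTimeMillis;
        }
    }

    private final Map<String, Event> eventsById = new HashMap<>();

    /** Inserts a new event or overwrites an earlier version with the same id. */
    void upsert(Event event) {
        eventsById.put(event.id, event);
    }

    /** Returns the events that start at or after nowMillis, earliest first. */
    List<Event> upcoming(long nowMillis) {
        List<Event> result = new ArrayList<>();
        for (Event e : eventsById.values()) {
            if (e.startTimeMillis >= nowMillis) {
                result.add(e);
            }
        }
        result.sort(Comparator.comparingLong((Event e) -> e.startTimeMillis));
        return result;
    }
}
```

An event that was created before the listener started never reaches `upsert`, which is why the store only becomes complete once all such events have passed.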

The goal is also that the system should be able to handle the number of events sent by the server. There is no guarantee that the server will send events at an even rate, so the system must be able to handle several events per second as well as irregular bursts of events.
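One common way to absorb irregular bursts is to place a bounded buffer between the component that reads the stream and the component that writes to the database. The sketch below illustrates the idea with a standard `ArrayBlockingQueue`; the class `EventBuffer`, its capacity and the choice to count and drop events on overflow are assumptions for illustration, not a description of the implemented system.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Bounded buffer between the stream listener and the database writer (sketch). */
final class EventBuffer {
    private final BlockingQueue<String> queue;
    private long dropped; // events rejected because the buffer was full

    EventBuffer(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Called by the listener for each incoming event; never blocks the reader. */
    synchronized boolean accept(String eventJson) {
        boolean stored = queue.offer(eventJson);
        if (!stored) {
            dropped++;
        }
        return stored;
    }

    /** Called by the writer: removes up to maxBatch buffered events in arrival order. */
    List<String> drainBatch(int maxBatch) {
        List<String> batch = new ArrayList<>();
        queue.drainTo(batch, maxBatch);
        return batch;
    }

    synchronized long droppedCount() {
        return dropped;
    }
}
```

Whether overflow should drop events, block the reader or grow the buffer is a design decision that depends on how costly a missed event is; the drop counter here only makes the overflow visible.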

1.4.1. Benefits, ethics and sustainability

An ethical concern in this project is a possible violation of people’s privacy, as the application has access to the user’s location. This information is sent to the system over a TCP connection. It is therefore important that the connection is secure so that it cannot be compromised. Otherwise anyone could get access to the locations of the application’s users, which would be a violation of their privacy.

This project may also contribute to a more sustainable environment. Since the resulting application displays nearby events, the user does not need to travel long distances. A user who is looking for a social event or hangout in an unfamiliar area is likely to travel to a place they already know. Travelling long distances in this way can affect the environment negatively, since emissions from trains, subways, cars and other means of transport pollute the air. With the finished application, the user can instead pick up their phone and quickly get access to all events near their current position. Ideally, the presented events are within walking distance or require only a limited amount of transportation. It is therefore likely that the application will benefit the environment by reducing the users’ travel.

Furthermore, people benefit from the application in their everyday life, since it helps them reduce their travelling time and cost and makes it easier to find places without having to search online for information.

1.5. Method

For this project the engineering design process was used to develop the system. This method involves defining the problem, performing background research and designing a solution on this basis before the product is developed and tested. The method is thus about starting from an identified problem, either a general one in society or a more specific one for a company or customer, and developing an idea that solves it. It is designed specifically for engineers developing new products and is the most common method used by engineers. For a more detailed description of this method and the steps involved, as well as a presentation of the different versions of the engineering design process, see section 3.2.

Various other methods were evaluated before the most appropriate one was chosen. For a description of these methods, see section 3.1. For a comparison between the methods and a discussion of how the most appropriate one for this project was chosen, see section 6.2.

1.6. Delimitations

In this project only one solution, listening to a streaming API, is considered for collecting the desired data from Meetup without the API key being locked. This is not a feasible method if a service does not provide a streaming API, so this work is only relevant in situations where a streaming API is available. It is likely that there are other methods, not considered in this report, that could reduce the number of requests sent to the API and prevent the locking. The main reason for this delimitation is the limited time available for the project.

Furthermore, only one streaming API is investigated in this project, which means that all conclusions are based solely on the results from this streaming API. However, the purpose of streaming APIs is generally the same, so the conclusions can be expected to be relevant for other streaming APIs as well.

1.7. Outline

Chapter 2 explains the concepts needed to understand the system that was built and describes previous work in the same area. Chapter 3 presents the existing methods considered for this project and the method chosen. Chapter 4 presents the system architecture and the steps involved in building the system. Chapter 5 describes the results of the project, and finally chapter 6 discusses sources of error, the choice of methods and future work.


Chapter 2

Background

This chapter presents the background for the project. It describes the concepts the reader might need in order to understand the project, namely the difference between a REST API and a streaming API and how streaming works, and presents previous similar work in this area.

2.1. Difference between REST API and streaming API

An application programming interface (API) is a set of data and functions that allows computer programs to interact and exchange information. APIs are used by client programs to communicate with web services. [2]

Representational state transfer (REST) is an architectural style that is commonly used for modern web services. It is built on the following constraints [2]:

1. Identification of resources: An identifier for a Web-based concept, for example a URI.
2. Manipulation of resources through representations: Allows a resource to be represented in different formats without changing its identifier.
3. Self-descriptive messages: Metadata providing additional information about the resource.
4. Hypermedia as the engine of application state: Links to related resources.

A Web API listens and responds to requests that come from a client. If the API follows the REST architectural style it is called a REST API. A REST API makes a web service “RESTful” [2].

Unlike a REST API, a streaming API uses a “persistent connection”: it keeps the request open indefinitely and never closes the connection. The following steps are involved when using a streaming API [3]:

1. The client makes an initial request.
2. The server defers the response until an update is available or a timeout has occurred.
3. The server sends an update to the client when it becomes available.
4. The connection is not terminated by the sent update and the server returns to step three.

This way of using a streaming API relies on the server’s ability to send data repeatedly on the same response without terminating the request. It reduces network latency, since the client and the server do not need to close and reopen the connection repeatedly. [3]
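From the client’s point of view, the steps above amount to reading a response body that never ends. The following minimal Java sketch illustrates this with a plain `BufferedReader`. It assumes, for illustration only, that the server writes one update per line and may send blank keep-alive lines; the class `StreamListener` is hypothetical and is not the code of the implemented system.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

/** Reads a long-lived response body line by line and hands each update to a listener. */
final class StreamListener {
    /**
     * Blocks until the stream ends and returns the number of updates delivered.
     * Assumption for this sketch: one update per line, blank lines are keep-alives.
     */
    static int listen(Reader responseBody, Consumer<String> onUpdate) {
        int delivered = 0;
        try (BufferedReader reader = new BufferedReader(responseBody)) {
            String line;
            // readLine() blocks while the server has nothing to send and
            // returns as soon as an update has been written to the response.
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip keep-alive lines
                }
                onUpdate.accept(line);
                delivered++;
                // The connection stays open: loop and wait for the next update.
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return delivered;
    }
}
```

In a real client the `Reader` would wrap the input stream of the open HTTP response, for example the stream returned by `HttpURLConnection.getInputStream()`, and the loop would only end when the connection is closed or fails.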

2.2. Streaming

Streaming is about making data available as soon as a client application needs it. This is useful when a user needs immediate access to data that is constantly updated. If delays in data transfer are critical and the data needs to be analyzed as soon as it becomes available, streaming is the appropriate approach. [4]

Streaming should be distinguished from downloading. Consider the following example: imagine you want a drink of water. If you fill a glass with water and then drink it, this can be viewed as downloading. If you instead drink the water directly from the bottle, this can be viewed as streaming [5]. The difference is that in the first case the water is stored in a temporary location before it is consumed, while in the second case it is consumed directly from the source.

A streaming data system is a system that delivers data immediately when the client requests it. Such systems are in-the-moment systems. [4]

Streaming technology is capable of transmitting data such as video and audio over the internet in real time, as the events happen. The main features of streaming are the following [5]:

● Delivers live content, for example a football match, concert or political speech, at the same time as it happens.
● Provides random access to movies. The streaming server can act as a remote video player and perform functions such as skipping back and forth and playing only a portion of a media production.
● Occupies no space on the user’s hard disk. The user saves the URL of the stream instead of the actual media.
● Uses no more bandwidth than is needed.
● Allows streaming of tracks.
● Can use broadcast and multicast approaches, so that a stream can be sent to several users.

A streaming server must deliver data in real time as soon as it becomes available, although some level of transmission errors can be tolerated. This distinguishes it from regular web servers, which only serve data for download and do not have the capability to stream it. Streaming servers use excess bandwidth to buffer data ahead faster than real time. When packets are lost, the server retransmits only the lost packets and thereby reduces network traffic. [5]

2.3. Previous work in the area

In a previous study, Bifet, Holmes and Pfahringer examine a streaming API in order to detect changes in tweets and in the frequency of terms. The API is provided by Twitter, a microblogging service where users post messages known as tweets. Tweets are often short and are generated constantly. The Twitter streaming API provides real-time access to tweets in a filtered form, including replies and mentions created by public accounts. The API requires a valid Twitter account and uses basic HTTP authentication; the generated data can then be retrieved in JSON format. [6]

Bifet, Holmes and Pfahringer worked with a method called MOA-TweetReader, which reads tweets in real time using the Twitter streaming API. It also detects changes and finds terms whose frequency changes. Figure 1 displays the architecture of the method. [6]

Figure 1: Architecture of the MOA-TweetReader model

MOA-TweetReader takes tweets as input and converts them to machine learning instances, which are then processed with standard machine learning methods. It stores the frequency of the most frequent terms and uses a change detector to detect changes in the frequencies of these items. Each tweet is a list of words that is transformed by an adapted Twitter filter, which retrieves the most relevant features. This filter is based on the Space Saving algorithm, which according to the authors has the best performance results. The algorithm works as follows: every time an item that is already being monitored arrives, its count is incremented by one. To detect changes the authors used ADWIN, a change detector that keeps a variable-length window of recently seen items. If the older part of the window differs from the rest of the window, it is dropped. [6]
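The counting rule can be illustrated with a small Java sketch of a Space Saving style counter. The text above only states what happens when a monitored item arrives; the handling of unmonitored items in the sketch (use a free slot if there is one, otherwise replace the item with the smallest count and continue from that count plus one) follows the commonly described form of the algorithm and is an assumption here, as are the class name `SpaceSavingCounter` and the linear scan for the minimum, which a practical implementation would replace with a more efficient structure.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of a Space Saving style frequent-items counter with a fixed number of slots. */
final class SpaceSavingCounter {
    private final int capacity;
    private final Map<String, Long> counts = new HashMap<>();

    SpaceSavingCounter(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    void offer(String item) {
        Long current = counts.get(item);
        if (current != null) {
            counts.put(item, current + 1); // monitored item: increment its count
        } else if (counts.size() < capacity) {
            counts.put(item, 1L);          // free slot: start monitoring the item
        } else {
            // No free slot: evict the item with the smallest count and let the
            // new item take over that count plus one.
            String minItem = null;
            long minCount = Long.MAX_VALUE;
            for (Map.Entry<String, Long> e : counts.entrySet()) {
                if (e.getValue() < minCount) {
                    minCount = e.getValue();
                    minItem = e.getKey();
                }
            }
            counts.remove(minItem);
            counts.put(item, minCount + 1);
        }
    }

    /** Estimated count of the item, or 0 if it is not currently monitored. */
    long estimate(String item) {
        return counts.getOrDefault(item, 0L);
    }
}
```

Because an evicted item’s count is inherited by its replacement, the estimates can overstate the true frequency of an item but memory use stays bounded by the number of slots.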

Streaming data is useful when one wants to discover what is happening all over the world at any given time. The MOA-TweetReader system that the authors introduced for data streams works well when the Twitter streaming API delivers a large quantity of tweets in real time. [6]

Furthermore, Bommaiah, Guo, Hofmann and Paul present the design and implementation of a caching system for streaming media. The core elements of their streaming cache are called helpers. The helpers are caching proxies inside the network. Each client is associated with one helper, and the helpers service requests for streaming objects by sharing common resources. If a client wants a streaming object it sends a request to the server, and the request is then redirected to the client’s helper. [7]

The caching system consists of helpers, each of which has a limited amount of disk, memory, network and computational resources. When a helper receives a request it must decide how to handle it with these limited resources. A streaming request that arrives at a cache can be a partial hit, since parts of the object may be stored in the cache and other parts somewhere else. [7]
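The notion of a partial hit can be illustrated with a small Java sketch in which a streaming object is split into numbered segments. The class `SegmentCache`, the segment numbering and the key format are hypothetical and are not taken from the cited system; the sketch only shows how a cache can report which parts of a requested range it holds and which must be fetched elsewhere.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of a segment cache in which a request can be a full, partial or no hit. */
final class SegmentCache {
    // Key: object id + "#" + segment index. Value: the segment's bytes.
    private final Map<String, byte[]> segments = new HashMap<>();

    void store(String objectId, int segmentIndex, byte[] data) {
        segments.put(objectId + "#" + segmentIndex, data);
    }

    /**
     * Returns the segment indices in [first, last] that are not in the cache,
     * that is, the segments that must be fetched from somewhere else.
     */
    List<Integer> missingSegments(String objectId, int first, int last) {
        List<Integer> missing = new ArrayList<>();
        for (int i = first; i <= last; i++) {
            if (!segments.containsKey(objectId + "#" + i)) {
                missing.add(i);
            }
        }
        return missing;
    }
}
```

An empty result corresponds to a full hit, a result covering the whole range to a miss, and anything in between to a partial hit.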


The main modules of a helper are an RTSP client and server, which receive and process requests from the server, a buffer manager for the available memory, a cache manager for the disk space allocated for caching, and finally a scheduler that manages the global queue of events. With this kind of caching, the helpers improve the perceived quality of multimedia streams. [7]


Chapter 3

Method

This chapter presents the methods considered in this project, as well as the methods that were ultimately used. The methods considered mostly focus on engineering and technology, as these were judged to be more in line with the project, but some more general methods were considered as well. This is by no means a complete presentation of all available development methods.

3.1. Presentation of different methods

This section presents the methods considered for this project. These are the following:

● Deductive method

● Inductive method

● Hypothetico-deductive method

● Quantitative method

● Qualitative method

● The design science research methodology process

A discussion of the different aspects of the methods, and of why they were not chosen for this project, is presented in section 6.2.

3.1.1. Inductive and deductive methods

A number of methods were researched and considered for this project. For example, Blomkvist and Hallin, in the book “Metoder för teknologer” (Methods for technologists) [8], contrast deductive methods with inductive methods. A deductive method involves initial research in the area in order to form theories, which are then tested in an empirical study. An inductive method, in contrast, involves carrying out an empirical study based on the identified problem and then drawing conclusions from, and seeking to understand, the results. [8]

In “The qualitative content analysis process” [9] the authors write that whether an inductive or a deductive method is used depends on the purpose of the study. They recommend an inductive method when there are no previous studies dealing with the phenomenon or when knowledge is fragmented. The inductive method is described as moving from the specific to the general: particular instances are observed and then combined into a general statement. A deductive method, on the other hand, is recommended when the aim is to test an earlier theory in a different situation or to compare categories at different time periods. In contrast to the inductive method, the deductive method is described as moving from the general to the specific, as it is based on an earlier theory or model. According to the same source, both the deductive and the inductive method consist of three main phases: preparation, organizing and reporting. [9]

3.1.2. Hypothetico-deductive method

According to Andersson and Ekholm, the hypothetico-deductive method is the method most often used within scientific and technological research. It takes an initial theory or idea as the starting point for the research, which is followed by experimentation and finally evaluation of the result. [10] The following three criteria need to be met for the method to be scientific [10]:

1. Objective, that is, it gives basically the same result regardless of who performs the research.
2. Controllable, that is, the method can be checked against alternative methods.
3. Theoretically rooted, that is, there are hypotheses or theories that can explain how the method works.

According to the author of “Hermeneutics and the hypothetico-deductive method”[11], the

hypothetico-deductive method is about formulating a hypothesis and deducing

consequences from it to arrive at conclusions. These conclusions are well supported through

the way their deductive consequences fit in with well-supported beliefs.[11]

The scientific method follows a number of steps. The list below shows the steps of the scientific

method adapted for technological research[11]:

1. How can the problem formulation be solved?

2. How can a product be developed to solve this problem effectively?

3. What information is available and required to develop the product?

4. Develop the product from the information from step 3. If the product is shown to

be complete, continue to step 6.

5. Try again with a new product.

6. Create a model/simulation of the suggested product.

7. What are the consequences of the model/simulation?

8. Test the application of the model/simulation. If the outcome is not satisfactory,

continue to step 9, otherwise skip to step 10.

9. Identify and correct shortcomings in the model/simulation.

10. Evaluate the result relative to existing knowledge and practice, and identify new

problem areas for future research.

3.1.3. Quantitative and qualitative method

The terms “quantitative” and “qualitative” refer to the type of data generated in the research

process. The main difference is that quantitative research produces data in the form of numbers

while qualitative research produces data in the form of text or prose. A qualitative method

can for example use surveys to gather opinions from different target

groups, while a quantitative method can measure some numerical value. Because they produce

different kinds of results, the research methods themselves typically differ. Qualitative research

is mostly linked with non-economic social science disciplines, while quantitative research is

strongly associated with economics and the natural sciences.[12] Table 1 summarizes the

difference between qualitative and quantitative methods.

Table 1: Comparison Between Qualitative and Quantitative Methods[12]

Data collected through quantitative methods are often believed to give more objective

information, as they are collected using standardized methods and can be

replicated. Because of this, qualitative research is considered most suitable for formative

evaluations, whereas quantitative research can be used for summative evaluations, as these

require quantitative measures to judge the ultimate value of the project.[13]

3.1.4. The design science research methodology process

Design science research focuses on the development of artifacts with the intention of

improving the functional performance of the artifact. Design science research is typically

applied to different aspects of engineering and computer science, such as algorithms and

human/computer interfaces.[14]

A typical design science research method proceeds as follows[14]:

1. Awareness of problem: the output of this phase is a proposal for a new research

effort.

2. Suggestion: This phase is described in “Design Science Research in Information

Systems” as: “Suggestion is essentially a creative step wherein new functionality is

envisioned based on a novel configuration of either existing or new and existing

elements”[14].

3. Development: in this step the artifact is developed and implemented. The

techniques for how this is done depend on what artifact is developed.

4. Evaluation: in this step the artifact is evaluated according to criteria specified in

the proposal developed in the first step. Deviations from the hypothesis are noted,

analyzed and tentatively explained.

5. Conclusion: the results are consolidated and the knowledge gained is categorized

as either “firm”, meaning facts that have been learned and can be repeatedly

applied or behavior that can be repeatedly invoked, or as “loose ends”, meaning

anomalous behavior that defies explanation and may very well serve as the subject

of further research.

Figure 2 shows the research process model for the design science research methodology.

Figure 2: Design Science Research Process Model[14]

In “A Design Science Research Methodology for Information Systems Research” the design

science research process is described as:

[...] a rigorous process to design artifacts to solve observed problems, to

make research contributions, to evaluate the designs, and to

communicate the result to appropriate audiences. Such artifacts may

include constructs, models, methods, and instantiations. [15]

Hevner describes design science as a problem solving process and writes that the

fundamental principle of design science research is that “knowledge and understanding of a

design problem and its solution are acquired in the building and application of an

artifact”[16]. Hevner presents seven guidelines whose purpose is to assist researchers in understanding the

requirements for effective design science research. Hevner writes that each of the seven

guidelines should be addressed in some manner for design science research to be complete.

The guidelines, together with Hevner's summarized descriptions of them, are shown in

Table 2.[16]

Table 2: Design Science Research Guidelines[16]

3.2. Methods used in this project

In this work a deductive method (see section 3.1.1.) has been used. This is because a literature study was

performed initially and a hypothesis was formed based on the information

gathered. After this, a system was implemented (see section 4) to test and verify the

hypothesis. From the results conclusions were drawn and the question was answered.

The method used to develop the system was also quantitative, as it was about gathering data

from a system and it gave a numerical result, as opposed to a qualitative method that

produces data in the form of text or prose. The method used was also experimental, as several

experiments were performed to decrease the number of requests sent, which further points

to it being a quantitative method (see section 3.1.3.).

Furthermore, the engineering design process has been used to design and develop the

product. This method aims to identify a need and from this create solutions and develop

a product. A formal definition of engineering design is found in the curriculum guidelines of

the Accreditation Board for Engineering and Technology (ABET). ABET states that:

Engineering design is the process of devising a system, component, or

process to meet desired needs. It is a decision-making process (often

iterative), in which the basic sciences, mathematics, and the engineering

sciences are applied to convert resources optimally to meet these stated

needs.[17]

Tayal describes the engineering design process as a “formulation of a plan or scheme to assist

an engineer in creating a product”[18]. It is further described as an often iterative decision

making process to meet desired needs. Below is a series of steps that engineers follow to

solve problems[18]:

1. Define the problem

2. Do background research

3. Specify requirements

4. Create alternative solutions

5. Choose best solution

6. Do development work

7. Build a prototype

8. Test and redesign

It is common for engineers to jump back and forth between the steps and to repeat

earlier steps[18]. This is called an iterative process. The steps are thus not followed

religiously.

The engineering design process is a so-called “open-ended” design as the best solution to

meet the requirements of the problem is not known in advance. Previous knowledge together

with information gathered from research is used to explore possible solutions and compare

different ideas in order to select the solution that best uses the available resources and best

meets the product's requirements.

Yousef Haik describes the engineering design process as

[...] a sequence of events and a set of guidelines that helps define a clear

starting point that takes the designer from visualizing a product in

his/her imagination to realizing it in real life in a systematic manner—

without hindering their creative process.[19]

This definition also makes it clear that the engineering design process is about starting with

an idea, developing a design and finally developing the product. The author describes two

different ways to design a device or system[19]:

1. Evolutionary change: here the product is allowed to evolve over time with only

slight improvement.

2. Innovation: technological discoveries have placed a great deal of emphasis on new

products, which draw heavily on innovation.

Haik uses the telephone as an example when describing the difference between these two

points. The telephone was an innovative design as it was a new product made possible from

technological discoveries. The telephone then evolved slowly for many decades but with only

minor improvements until the next innovation and technological jump occurred with the

mobile phone. This in turn evolved with small improvements being added until the next

innovative design, and so on. Haik also adds that “although the emphasis is on innovation,

designers must test their ideas against prior design. Engineers can design for the

future but must base results on the past”.[19]

There are however multiple versions of the engineering design process, as is presented by

T.J. Howard, S.J. Culley and E. Dekoninck in their report “Describing the creative design

process by the integration of engineering design and cognitive psychology literature”[20],

where they compare different versions of the method. The version used for this project is the

one described above. The list below is described as the generally agreed-upon phases of the

process[20]:

1. Establishing a need phase

2. Analysis of task phase

3. Conceptual design phase

4. Embodiment design phase

5. Detailed design phase

6. Implementation phase

The most noticeable difference between the version used and the version presented above is

that the one presented above is organized around the different phases of work, while the

one used is more of a step-by-step guide for how to develop the product. Furthermore, the list

presented above focuses on the different design phases, while the version used

focuses more on the implementation and the testing of the product.

Chapter 4

The implemented system

The developed system’s basic idea is to listen to a streaming API and save the results in a

database, thus eliminating the need to send requests to the server. The process of building

this system includes a number of different steps that were performed in order to achieve the

desired working system:

1. Explore Meetup’s streaming API

2. Set up a client

3. Make the client listen to the streaming API

4. Set up a database with tables corresponding to the different information included

in the API

5. Store the generated results from the listening stream in the database

6. Retrieve the information from the database and convert it into a Java object

7. Test the system and collect data

The process where the information travels from the streaming API to the user is shown in

figure 3 below:

Figure 3: Illustration of the Process of Sending Data from Meetup's Server to the User

Exploring the streaming API was done to know what kind of response the client would

receive so that the client could be built to handle what the server sent. The desired

information was collected from the streaming API’s documentation[21]. From this

documentation it was learned that the response from the API is in JSON format (see section

4.2. for a description of JSON). It was also found that Meetup uses a persistent connection

with the client that will only be terminated for server maintenance. This is what was desired

for this project as the application developed requires an indefinite stream to get all the

information wanted. The documentation also specified what host (base URL) the client

should connect to. Furthermore, the documentation contained a detailed figure of the

information contained in the response.

The next step in the development process was setting up a client. This was made simple by

the Retrofit library, which uses OkHttp to send network requests (see section 4.4.).

The client is the system that connects to the server and receives

the response.

To make the client listen to the streaming API the Retrofit framework (see section 4.4.) was

used. This framework allows the user to make a client listen to a specified endpoint, which

in the case of this project is Meetup’s streaming endpoint.
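
A minimal sketch of such a listening loop is shown below. It uses only the Java standard library rather than Retrofit, and it assumes that the stream delivers one JSON object per line. The class name `EventStreamReader` and the method `readEvents` are hypothetical and are not taken from the implemented system.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class EventStreamReader {

    /**
     * Reads events from the source until it ends and hands each non-empty
     * line to the handler. Returns the number of events delivered.
     * Assumption: one JSON object per line.
     */
    static long readEvents(Reader source, Consumer<String> handler) {
        long count = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // ignore blank lines between events
                }
                handler.accept(line);
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        // A StringReader stands in for the network stream in this sketch.
        String sample = "{\"id\":\"a1\"}\n\n{\"id\":\"b2\"}\n";
        List<String> received = new ArrayList<>();
        long n = readEvents(new StringReader(sample), received::add);
        System.out.println(n + " events: " + received);
    }
}
```

With a real connection the loop would simply keep blocking on `readLine()` for as long as the server keeps the stream open.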

Step 4 was setting up a database with the tables needed to store all the information from the

API. This was done using PostgreSQL, which is an open source object-relational database

system[22]. What tables to create was determined from the information in the API’s

documentation, where a detailed description of what was sent from the server was available.

When all the tables were set up in the database, the code was written for caching the response

from the server in the database. This mostly involved writing SQL statements for inserting

information in the database.
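
As an illustration of this caching step, the sketch below inserts one fee object using plain JDBC. The column names `event_id`, `amount` and `currency`, as well as the class `EventCache`, are assumptions made for illustration; the actual schema follows Meetup's documentation and is not reproduced here.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class EventCache {

    /**
     * Builds a parameterized INSERT statement. Table and column names are
     * concatenated, so they must come from trusted code, never from user input.
     */
    static String insertSql(String table, List<String> columns) {
        String names = String.join(", ", columns);
        String placeholders = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO " + table + " (" + names + ") VALUES (" + placeholders + ")";
    }

    /** Stores one fee object; the values are bound as parameters. */
    static void insertFee(Connection conn, String eventId, double amount, String currency)
            throws SQLException {
        // Hypothetical column names for the "fee" table.
        String sql = insertSql("fee", Arrays.asList("event_id", "amount", "currency"));
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, eventId);
            ps.setDouble(2, amount);
            ps.setString(3, currency);
            ps.executeUpdate();
        }
    }
}
```

Binding the values as parameters, instead of concatenating them into the SQL text, avoids quoting problems with free-text fields such as event descriptions.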

The next step was retrieving the information in the database and creating a Java object with

the information. This object is what is sent when a GET request is sent to the application with

a user’s location and possible search term. Postman was used to send requests to the

application to test the response. This meant sending a request to one of the endpoints set

up for the application. For example, to one endpoint a location (longitude and latitude) and

a radius are sent and the response from the application is an array of the objects created for

all the Meetup events within the specified radius of the location. (An example of the response

from the application can be seen in figure 5).
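
How such a radius check could work is sketched below. The text does not state how the implemented system computes distances, so the haversine formula used here is an assumption, as are the class `GeoFilter` and the reduced `Event` type.

```java
import java.util.ArrayList;
import java.util.List;

final class GeoFilter {

    static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius

    /** Hypothetical minimal event type with only the fields needed here. */
    static final class Event {
        final String id;
        final double lat;
        final double lon;

        Event(String id, double lat, double lon) {
            this.id = id;
            this.lat = lat;
            this.lon = lon;
        }
    }

    /** Great-circle distance in kilometres between two points (haversine formula). */
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** Returns the events that lie within radiusKm of the given location. */
    static List<Event> withinRadius(List<Event> events, double lat, double lon, double radiusKm) {
        List<Event> hits = new ArrayList<>();
        for (Event e : events) {
            if (distanceKm(lat, lon, e.lat, e.lon) <= radiusKm) {
                hits.add(e);
            }
        }
        return hits;
    }
}
```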

Finally, extensive tests were done to make sure everything was working as intended. This was

done by running the program for a full week on one of KTH’s servers (sky2.it.kth.se). It is of

course not possible to run the application indefinitely, but it was decided that a week of

streaming should be sufficient to determine that the stream could be trusted to be endless,

as well as making sure that the system could handle bursts of events being received in a small

amount of time. During this time, the number of received items from the server was counted

to calculate the average number of items received per minute. The result of this is presented

in section 5.

4.1. Software development tools

The system was developed in Java using the IDE “IntelliJ IDEA”. A Maven project was set up

in IntelliJ to manage the libraries used. Maven is a software project management and

comprehension tool[23]. Maven uses the project object model (POM) to manage a project’s

build, reporting and documentation[23].

The database used was PostgreSQL, with pgAdmin used as an administration tool for the database.

However, most of the database-related work was done in the Windows command prompt

(CMD).

4.2. JavaScript Object Notation (JSON)

JSON is a human-friendly, lightweight data-interchange format. It is easy to parse and

generate and is also language independent. It is built on the following two structures[24]:

● A collection of name and value pairs, which is realized as an object, record, struct, dictionary,

hash table, keyed list or associative array.

● An ordered list of values, which is realized as an array, vector, list or sequence.

These are universal data structures that virtually all modern programming languages support in some form.

A JSON object is a set of name and value pairs that begins with a left

brace, “{”, and ends with a right brace, “}”. Each name is followed by a colon, and the

name and value pairs are separated by commas[24]. An

example of a JSON object can be seen in figure 4.

Figure 4: Example of a JSON Object

The figure shows a JSON object, “widget”, with other JSON objects within it, “window”,

“image” and “text”. On the left side of the colon is the name of the object, for example

“window”, and on the right is the content of the object.

4.3. Dropwizard

Dropwizard is a Java framework which gathers different libraries for development of a REST

web service. It provides functions for building web applications with the help of Maven. Since

a web application needs HTTP in order to work properly, Dropwizard uses the Jetty HTTP

library for embedding an HTTP server into the project. The project has a main method that

starts the HTTP server and the application is run as a simple process.[25]

In order to build web applications there is a need for performance, as well as clean and

testable classes which map HTTP requests into objects. The Jersey framework provides

these different features and is integrated in the Dropwizard framework. Among other things

Jersey also supports streaming output and GET requests, two features that are both used in

the developed application.[25]

The created Dropwizard application includes an application class which gathers all the

different bundles and commands to provide basic functionality. It is from this class that the

whole application is started in a run method. This run method is called from a main method,

that works as an entry point for the application.[25]

4.4. Retrofit, Okio and OkHttp

Retrofit is a type-safe HTTP client for Java, and is used to connect to a REST web service. It

is a framework for authenticating and interacting with APIs, where the API interfaces are

turned into callable objects. Retrofit also makes it possible to download JSON data from a

web API with the help of HTTP annotations[26]. Retrofit uses OkHttp to send network

requests. OkHttp is an HTTP client for sending and receiving HTTP-based network

requests[27,28].

OkHttp was also used in this project to deactivate the client's default timeout, in other

words to let the connection to the server stay open indefinitely. Without deactivating this timeout, the

connection to the server was closed if no event was sent to the client within the default amount

of time, resulting in an exception from the client.

The Okio library was used to buffer the data received from the server, which made it possible

to easily access, store and process data. This was done with Okio’s BufferedSource, which

contains an internal buffer for storing bytes. A buffer is a sequence of bytes where the size

does not have to be defined in advance and there is no obligation to handle positions, limits

or capacities. The buffer works like a queue: data is written to the end and read from the front. In other words,

BufferedSource was used to buffer the events received from the server. Since the stream is

endless, a fixed size could not be given and this buffer was therefore used for storing the

JSON objects received from the stream.[29]

4.5. RxJava

RxJava is a library for asynchronous programming with observable streams. An observable is an

object that represents a source of data and emits items as they become available.

A subscriber is used to listen to the observable, and the connection between the two is called a subscription. The subscriber

listens until the observable signals that it has completed or, as in this case, continues

indefinitely.[30]
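
The relationship between an observable and its subscribers can be illustrated with a small hand-written observer pattern. The sketch below deliberately does not use RxJava's API: `SimpleObservable`, `subscribe` and `emit` are hypothetical names, and the sketch omits completion signals, error handling and scheduling, all of which RxJava provides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hand-written stand-in for an observable; not RxJava's API. */
final class SimpleObservable<T> {

    private final List<Consumer<T>> subscribers = new ArrayList<>();

    /** Registers a subscriber that will receive every item emitted from now on. */
    void subscribe(Consumer<T> subscriber) {
        subscribers.add(subscriber);
    }

    /** Pushes one item to all current subscribers, in subscription order. */
    void emit(T item) {
        for (Consumer<T> subscriber : subscribers) {
            subscriber.accept(item);
        }
    }

    public static void main(String[] args) {
        SimpleObservable<String> stream = new SimpleObservable<>();
        stream.subscribe(event -> System.out.println("received " + event));
        stream.emit("event-1");
        stream.emit("event-2");
    }
}
```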

Chapter 5

Result

The previous system, which sent multiple requests, exceeded the limit of 200

requests that could be sent before the API key locked itself. With the current system the

number of requests has been decreased to zero. This is because instead of sending

several requests to the Meetup API to receive the events, the system is now listening to

Meetup’s streaming API so that the client receives all newly created events and all event

updates directly.

The objects that are received from the stream sometimes arrive several at a time in a burst

and sometimes several minutes can go by without anything being sent from the server.

Because of this variation in the arrival rate, tests had to be carried out to make sure the client could

handle it, including several events arriving in a small amount of time.

The program was therefore run for a week to see if any complications would occur. The

number of events received during this week was documented in order to calculate the

average number of events received in one minute (see section 4). The result was that

129 540 events were received during the week, which means the average number of events

received per minute is approximately 13.
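
The average follows directly from the number of minutes in a week:

```latex
\[
\frac{129\,540\ \text{events}}{7 \times 24 \times 60\ \text{min}}
  = \frac{129\,540}{10\,080}
  \approx 12.85
  \approx 13\ \text{events per minute}
\]
```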

The events sent from the streaming API are sent as JSON objects (see section 4.2. for a

description of JSON). This object contains all the information that is sent from the API,

including id, time and description of the events. The JSON object for events contains other

JSON objects for fee, venue and group, which contain information about the price of the event,

the venue where it takes place and the group that created the event, respectively. Group in

turn contains two JSON objects for category and group photo. Each of these objects has

different fields for the information the object contains. For example, fee contains fields for

amount and currency. If any of the objects do not contain any information, for example if the

event is free and therefore has no fee, the object is left out of the event. Below is a

pretty-printed version (structured over several lines instead of being displayed as a single line) of one

JSON object sent from the server (see figure 5). As seen from the figure, it is a free event as

it contains no fee object.

Figure 5: Example of an Event Sent from Meetup's server

These JSON objects were then saved in a database, where each object was saved in a separate

table, that is, there was one table for each of the objects stated above where the information

from that object was stored. The tables had columns corresponding to each of the names of

the fields. Each of the rows in a table corresponds to an event received from the server. Furthermore,

each table contains a column for the event id so that it can be identified which event the

information is from. Table 3 shows an example of the content of the table “fee”. For

example, we can see from the table that the first row is from an event with the ID

“skkjcnywjbtb” that costs 15 GBP (British pounds) per person to attend. If a numeric value

does not exist it is set to -1, as shown in the second column for the fee_api_version.
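
The handling of missing numeric values could be sketched as below. The helper `numericOrMissing` and the representation of a parsed fee object as a `Map` are assumptions made for illustration; only the sentinel value -1 and the field names `amount` and `currency` come from the text above.

```java
import java.util.HashMap;
import java.util.Map;

final class FeeRow {

    /** Sentinel stored in a numeric column when the field is absent. */
    static final double MISSING = -1;

    /** Returns the numeric value of the field, or -1 if it is absent or not a number. */
    static double numericOrMissing(Map<String, Object> fee, String field) {
        Object value = fee.get(field);
        return (value instanceof Number) ? ((Number) value).doubleValue() : MISSING;
    }

    public static void main(String[] args) {
        Map<String, Object> fee = new HashMap<>();
        fee.put("amount", 15);
        fee.put("currency", "GBP");
        System.out.println(numericOrMissing(fee, "amount"));
        System.out.println(numericOrMissing(fee, "api_version")); // hypothetical absent field
    }
}
```

A sentinel keeps the column non-nullable, at the cost that -1 must never be a legitimate value for that field.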

Table 3: Table "fee" in the Database

Chapter 6

Discussion and conclusion

From the result it is evident that the goal of finding an alternative way of collecting data from

Meetup’s server without the locking problem of the REST API has been achieved. The

number of requests is minimized and the data is collected and stored without any problems.

This means that the hypothesis of implementing a listener to listen to a stream of events from

the streaming API, which was presented in section 1.4., achieved the desired result. It is now

possible to receive all the events from Meetup’s API without the API key locking itself.

From running the system for a week (see sections 4 and 5) without the system crashing or any

other complications, the conclusion is drawn that the system can run indefinitely and can handle a

large amount of data arriving in a small amount of time.

6.1. Sources of error

A possible error source of this project is that at irregular times after several hours of

streaming, an EOFException[31] occurs. This is an exception thrown to signal an end of file.

Extensive testing has been done to determine the cause of the error and find a solution,

however, no pattern has been identified of when the exception occurs or why. From the

research process it has been determined that the most likely reason is a malformed response

sent from the server. This can however not be confirmed, since the event causing the exception

has not been identified on any of the occasions when the exception occurred. The testing that has been done to

determine the cause involved running the stream directly in a browser. It was discovered that

the stream stopped when run in the browser at the same time it stopped when run in the

system developed in this project, and no more events were received until the page was

refreshed. From this it is likely that the fault is not caused by the developed system, but rather

by the data sent from the server.

The problem described above is temporarily solved by performing a so-called “retry” on the

connection to the stream. This automatically restarts the connection every time the exception

occurs, so that the client can continue listening to the stream. Restarting the connection

could in the worst case cause a loss of data, if any data is sent during the small amount of

time it takes for the system to restart. This would mean events are lost so that they are never

displayed in the application, and the user could miss Meetup events. However, since the

stream can go on for several hours before the exception occurs and since the time it takes to

restart the connection is short, at most half a second, the impact is not considered severe.

The odds of receiving an event at precisely this time are quite small, and since the application

is currently only released in Sweden and the large majority of the Meetup events take place

in other countries, the effect of this error source is considered small.
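
The retry behaviour described above can be sketched as a loop around the listening call. This is an illustration under assumptions, not the implemented code: the interface `StreamSession` is a hypothetical stand-in for one connection to the stream, and the upper bound on restarts is an addition of the sketch (the system described above restarts every time the exception occurs).

```java
import java.io.EOFException;

final class RetryingListener {

    /** Hypothetical stand-in for one connection to the stream. */
    interface StreamSession {
        /** Blocks while events arrive; returns normally only on a deliberate shutdown. */
        void listen() throws EOFException;
    }

    /**
     * Re-opens the session whenever it ends with an EOFException.
     * Returns the number of restarts that were needed. The maxRetries
     * bound is an assumption of this sketch.
     */
    static int listenWithRetry(StreamSession session, int maxRetries) {
        int restarts = 0;
        while (true) {
            try {
                session.listen();
                return restarts;
            } catch (EOFException e) {
                if (restarts == maxRetries) {
                    throw new IllegalStateException("giving up after " + restarts + " restarts", e);
                }
                restarts++; // events sent before the next listen() call starts are lost
            }
        }
    }
}
```

Counting the restarts also gives a simple way to log how often the exception occurs, which could help when looking for a pattern.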

6.2. Choice of methods

A method can, as described in section 3.1, be either deductive or inductive. A deductive

method was chosen rather than an inductive method as a hypothesis was formed for how the

problem could be solved early in the process and the system was then implemented to test

the hypothesis and solve the problem, which follows the deductive method (see section

3.1.1.). As described in section 3.1.1. an inductive method is, unlike the deductive method,

about making an empirical study based on an identified problem and drawing

conclusions from the results. This is the opposite approach to that of the

deductive method.

The engineering design process was chosen as the main method for this project as it is a well-

established method for development of new products for engineers. It seemed suitable for

the kind of work that was to be done in this project. The method had the advantage of giving

concrete steps to follow through the development process, which provided a clear structure

for the work and as a result made it easier to perform. Following the steps throughout the

project gave a clear structure and was especially useful to specify the requirements of the

system. This not only made it clear what needed to be done but also helped to figure out how to

do it.

The hypothetico-deductive method is, as stated in section 3.1.2, about starting with an initial

theory and then performing experiments and evaluating the result. Though this is basically what

has been done during this project, the engineering design process was chosen in favor of this

method as the engineering design process was deemed more specifically formed for

engineering development and therefore more suitable for this project. Therefore, the steps

for the hypothetico-deductive method were not intentionally followed, though some of them

have been performed because this method is quite similar to the engineering design

process, only broader and more general.

Lastly, the design science research method was considered for this project. This method is

quite similar to the engineering design process, as it is about the development of artifacts to

solve observed problems, something that is central for the engineering design process as well.

Furthermore, it is often applied to engineering and computer science. However, the

guidelines (see section 3.1.4) that according to Hevner have to be addressed for the method to

be classified as design science research have not been fully addressed as some of them

were deemed irrelevant or too extensive for this project. For example, guideline three states

that “the utility, quality, and efficacy of a design artifact must be rigorously demonstrated via

well-executed evaluation methods”[16]. This level of evaluation was not done mainly due to

time limits. Because these guidelines were not followed, the design science research method

has not been used.

6.3. Future work

Continued work on this project could involve solving the recurring EOFException (see

section 6.1.) that occurs after several hours of continuous streaming. As mentioned in section

6.1., this exception usually signals that the end of a file or stream is reached unexpectedly.

Since the stream is supposed to be endless according to its documentation[21], the program

should not receive this kind of exception. To resolve it, further research must be

done to determine the cause of the error. Resolving it would mean

that the problem with a potential loss of events during the time it takes to restart the

connection with the server (see section 6.1.) no longer would be an issue.

In this project only one solution to the problem of sending too many requests to the server

has been considered, namely to use a streaming API instead of sending requests to a REST

API (see section 1.6.). This makes the solution quite specific, only working in the case a

streaming API is provided for the service. Future work could therefore involve finding

alternative solutions to reduce the number of requests sent to the server so the locking does

not occur.

The system that has been developed for this project is currently run separately from the

system that Anchr has developed to form the mobile application. The next step would thereby

be to integrate the created system with their existing system, so that the events received from

Meetup’s streaming API can be displayed in the application.

The current system is built in a way that when a request is received all the rows in all the

tables in the database are searched through in order to find the events that match the

request. As the system is continuously running, the database will fill up with millions of rows

of data, which will increase the search time when querying the database. This will affect the

performance of the system and the application when listing nearby events. Therefore, the

system could be developed further by implementing a more efficient way of searching the

database. One possible solution would be sorting the events when they are inserted into the

database so that binary search can be used to find events in a shorter amount of time.

Furthermore, a system could be developed to delete events that have passed, and therefore

no longer are relevant, from the database and thereby reducing the number of rows that must

be searched for each request.
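
The two suggestions above can be sketched on an in-memory array of event start times. This only illustrates the idea under stated assumptions: the times are epoch milliseconds kept in ascending order, and the class `EventIndex` with its methods is hypothetical and not part of the implemented system.

```java
import java.util.Arrays;

final class EventIndex {

    /**
     * Binary search for the index of the first time that is greater than or
     * equal to key in an ascending array. Returns the array length if no
     * such element exists.
     */
    static int lowerBound(long[] sortedTimes, long key) {
        int lo = 0;
        int hi = sortedTimes.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedTimes[mid] < key) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Returns a copy without the events that start before now. */
    static long[] dropPassed(long[] sortedTimes, long now) {
        return Arrays.copyOfRange(sortedTimes, lowerBound(sortedTimes, now), sortedTimes.length);
    }

    public static void main(String[] args) {
        long[] times = {10, 20, 20, 30, 40};
        System.out.println(lowerBound(times, 25));
        System.out.println(Arrays.toString(dropPassed(times, 25)));
    }
}
```

Each lookup takes a number of steps logarithmic in the number of events, compared with scanning every row. Two calls to `lowerBound`, one for each end of an interval, would also delimit the events that start within a given time interval.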

A future feature that could be added to the application is allowing the user to search for

events within a specific time interval. This could for example be useful if a user is visiting a

particular place for a limited amount of time. The user could then choose to only display

events that take place within this specified time interval.

References

1. "About Meetup." https://www.meetup.com/about/ 2017-05-26

2. Masse, M. REST API Design Rulebook: Designing Consistent RESTful Web Service Interfaces: O'Reilly Media, 2011.

3. Loreto, S.; Saint-Andre, P.; Salsano, S.; and Wilkins, G. "Known Issues and Best Practices for the Use of Long Polling and Streaming in Bidirectional HTTP." 2011.

4. Psaltis, Andrew G. "Streaming Data " Understanding the real-time pipeline:Manning Publications Co., 2016.

5. Kozamernik, Franc. "Media Streaming over the Internet." an overview of delivery technologies:EBU Technical Department, 2002.

6. Bifet, Albert; Holmes, Geoff; Pfahringer, Bernhard; and Gavalda, Ricard. "Detecting Sentiment Change in Twitter Streaming Data." Proceedings of the Second Workshop on Applications of Pattern Analysis, vol 17 (Diethe TomBalcazar JoseShawe-Taylor John and Tirnauca Cristina, eds). Proceedings of Machine Learning Research:PMLR, 2011;5--11.

7. Bommaiah, E.; Guo, K.; Hofmann, M.; and Paul, S. Design and implementation of a caching system for streaming media over the Internet. Proceedings Sixth IEEE Real-Time Technology and Applications Symposium. RTAS 2000 2000 2000;111-121.

8. Blomkvist, Pär, and Hallin, Anette. Metod för teknologer: Studentlitteratur AB, 2014.

9. Elo, Satu, and Kyngäs, Helvi. "The Qualitative Content Analysis Process." Journal of Advanced Nursing 62 (2008): 107-115.

10. Andersson, Niclas, and Ekholm, Anders. "Vetenskaplighet – Utvärdering av tre implementeringsprojekt inom IT Bygg och Fastighet 2002: Lunds Tekniska Högskola, Institutionen för Byggande och Arkitektur, 2002.

11. Martin, M., and McIntyre, L.C. "Hermeneutics and the hypothetico-deductive method." In Readings in the Philosophy of Social Science. United States of America: The MIT Press, 1994.

12. Garbarino, Sabine, and Holland, Jeremy. "Quantitative and Qualitative Methods in Impact Evaluation and Measuring Results." Discussion Paper. Birmingham, UK: University of Birmingham, 2009.

13. Frechtling, Joy. "An Overview of Quantitative and Qualitative Data Collection Methods." In The 2002 User-Friendly Handbook for Project Evaluation, 2002.

14. Vaishnavi, V, and Kuechler, W. "Design Science Research in Information Systems." 2004.

15. Peffers, Ken; Tuunanen, Tuure; Rothenberger, Marcus; and Chatterjee, Samir. "A Design Science Research Methodology for Information Systems Research." J. Manage. Inf. Syst. 24 (2007): 45-77.

REFERENCES

28

16. Hevner, Alan R.; March, Salvatore T.; Park, Jinsoo; and Ram, Sudha. "Design science in information systems research." MIS Q. 28 (2004): 75-105.

17. ABET. "Criteria for Accrediting Engineering Programs." Effective for Reviews During the 2016-2017 Accreditation Cycle United States of America:ABET, 2015.

18. Tayal, S.P. "Engineering Design Process." International Journal of Computer Science and Communication Engineering (2013).

19. Haik, Yousef, and Shahin, Tamer. "Engineering Design Process." United States of America:Global Engineering: Christopher M. Shortt 2011.

20. Howard, T. J.; Culley, S. J.; and Dekoninck, E. "Describing the creative design process by the integration of engineering design and cognitive psychology literature." Design Studies 29 (3// 2008): 160-180.

21. "OpenEvents Stream." https://www.meetup.com/meetup_api/docs/stream/2/open_events/ 2017-05-26

22. www.postgresql.org. "About." https://www.postgresql.org/about/ 2017-05-26

23. maven.apache.org. "Welcome to Apache Maven." https://maven.apache.org/# 2017-05-29

24. www.json.org. "Introducing JSON." http://www.json.org/

25. www.dropwizard.io. "Getting Started." http://www.dropwizard.io/1.1.0/docs/getting-started.html 2017-05-29

26. square.github.io. "Retrofit: A type-safe HTTP client for Android and Java." http://square.github.io/retrofit/ 2017-05-26

27. square.github.io. "OkHttp: An HTTP & HTTP/2 client for Android and Java applications." http://square.github.io/okhttp/

28. guides.codepath.com. "Using OkHttp." https://guides.codepath.com/android/Using-OkHttp 2017-05-26

29. github.com. "Okio." https://github.com/square/okio 2017-05-26

30. www.vogella.com. "RxJava 2.0 - Tutorial." http://www.vogella.com/tutorials/RxJava/article.html 2017-05-26

31. docs.oracle.com. "Class EOFException." http://docs.oracle.com/javase/7/docs/api/java/io/EOFException.html?is-external=true 2017-05-26

TRITA TRITA-ICT-EX-2017:89

www.kth.se