Kasi & Branchaud 1

Proactive Task-Conflict Avoidance through Constraint Processing

Bakhtiar Kasi and Josh Branchaud

Supervised by

Berthe Choueiry and Anita Sarma

May 3, 2011

Introduction and Motivation

In software engineering, where large systems are developed by teams of programmers, workspace awareness tools are becoming increasingly necessary. As developers in a team work on their individual

contributions to a project, it is necessary that they each have the safety of their own

workspace. Developers need this isolation in order to work without distraction and to be

creative. Though this isolation is important, it is even more important to have effective

communication throughout the team. Effective communication is essential to the design

and implementation of a successful software project. Without communication,

requirements would not be met, the work of each developer would be completely

uncoordinated, and conflicts of varying severity would arise. All these side effects would lead to overdue and over-budget software projects that would more than likely fail. Thus, communication is of utmost importance. It becomes clear that there needs to

be a balance between developer isolation and team communication. The aim of many

workspace awareness tools is to provide this balance.

One of the many available workspace awareness tools is Palantír. Palantír

is an Eclipse IDE plug-in that communicates with each user’s workspace and the

centralized versioning system so as to provide a relevant and minimally intrusive layer of

communication between all developers on a team. Each team member becomes

increasingly aware of the relevant changes going on in the workspaces of their

teammates. This is an approach to file-level workspace awareness because it focuses on

each user’s interactions with the individual files in the project. Another approach, not supported by Palantír, is task-level awareness. Task-level awareness is an abstraction of

file-level awareness. Instead of focusing on a user’s activity at a file-level, it takes a step

back to look at a task-centric view of the project.

The task-centric view is an approach to software development that focuses on the tasks in

a user’s workspace rather than the files. A user is either delegated a set of tasks or can

create their own tasks. The details of the task-centric approach and an implementation

known as Mylyn will be presented in the next section of this paper. Task-centric tools

like Mylyn are susceptible to some of the same concerns as the file-level approach. As at the file level, conflicts can arise in the task-centric approach. It therefore becomes just as important to provide a workspace awareness tool for Mylyn and similar tools in order to prevent, or at least mitigate, conflicts.

Conflicts occur in two cases: (1) when multiple developers concurrently edit the same

artifact, and (2) when changes to one artifact affect concurrent changes to another

artifact. In the first case, two developers edit the same artifact in separate workspaces, so

their respective changes need to be combined to create a consistent version. We term this

kind of conflict a Direct Conflict. Note that merge tools help in resolving direct conflicts

to an extent, but cannot always guarantee a semantically consistent and desired outcome,

as a result of which merging is still a time-consuming and often manual process. As an example of the second case (indirect conflicts), a developer working in his or her private workspace may modify a library interface that another developer has just imported and started referring to as part of a change in his or her own private workspace. This kind of conflict is usually more difficult to detect, as it tends to reveal

itself at a later stage in the development process (e.g., as a build failure, test case failure,

or, worse, bug after deployment). We term this kind of conflict an Indirect Conflict. [3]

We propose a way of preventing conflicts that we call proactive task-conflict avoidance. One approach to minimizing conflicts among the tasks of a team is constraint processing. We can identify the constraints (dependencies across tasks that may lead to direct or indirect conflicts) and then solve these constraints to schedule tasks in a way that minimizes conflicts. By analyzing Mylyn data for a team, we can proactively determine optimal task

selection such that conflicts between tasks (based on files) are avoided. We analyze two approaches to this optimal task selection, each formulated as a constraint satisfaction problem (CSP). The first approach was implemented and tested, and we report the results.

The second approach is simply a proof of concept. Other approaches have been

considered but not pursued in depth and as such will be addressed in future work.

However, before going into the details of our solutions, we will give more background on

Mylyn and discuss the process of our data collection.

Background

Mylyn, formerly known as Mylar [1], extends the Eclipse SDK by providing a task-centric

interface to programmers. Mylyn makes it possible for Eclipse to display only the

artifacts that are relevant to a particular task in a workspace. A task is any unit of work

that the programmer is interested in at a particular moment, such as a user-reported bug,

new functionality or a modification request. Tasks created in the workspace are stored locally and are referred to as local tasks; users can also create, view or modify tasks in a remote repository such as Bugzilla or Trac. Each task has an associated context: the set of all artifacts relevant to the task. The task context includes the files or methods being edited and the APIs to which the programmer refers.

The task context grows as the user interacts with different artifacts in the workspace.

Initially the task context is empty; files are added to the context as the task progresses.

Mylyn puts the task context to use by focusing the UI on the interesting information and hiding everything else. The activity monitor in Mylyn extracts this information by continuously monitoring the programmer’s activity and interactions in a workspace and saving them to the task context. This reduces information overload and focuses a programmer’s work by filtering and ranking the information that the development environment would otherwise present.

Mylyn monitors both direct and indirect interactions of the programmer. A direct

interaction occurs when a programmer selects a particular Java file to view its source, or

when she edits a part of it and then saves the file. Indirect interactions are the events

where program elements and relationships are indirectly selected or edited. For example,

a propagation event occurs when a programmer navigates to a different file by using the

‘open declaration’ shortcut in Eclipse. Another interesting event is the prediction event

which describes possible future interactions that Mylyn anticipates the programmer might

perform. An example of a prediction event is a scenario where a parent class may be of interest to the current task because a class that inherits from it is being edited.

Mylyn monitors programmers’ activities and captures the relevance of code elements to

their task in a degree-of-interest (DOI) model [2]. The model associates an interest value

with each artifact in the task context. As the user interacts with a program element, its DOI value increases (it gains interest). Conversely, the DOI value decays (decrements) for artifacts that are not selected regularly. Mylyn uses text cues to highlight files and

methods that are most interesting in a given task.
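The interest bookkeeping described above can be illustrated with a minimal sketch. The class name, the gain of 1.0 per interaction, and the decay of 0.1 per event are assumptions chosen for illustration; they are not Mylyn's actual parameters or API.

```python
from collections import defaultdict


class DoiModel:
    """Toy degree-of-interest model (hypothetical, not Mylyn's implementation)."""

    def __init__(self, gain=1.0, decay=0.1):
        self.gain = gain      # assumed interest gained per interaction
        self.decay = decay    # assumed interest lost per event without interaction
        self.interest = defaultdict(float)

    def interact(self, artifact):
        # Every other artifact already in the context decays a little...
        for other in self.interest:
            if other != artifact:
                self.interest[other] -= self.decay
        # ...while the artifact that was just touched gains interest.
        self.interest[artifact] += self.gain

    def ranked(self):
        # Most interesting artifacts first.
        return sorted(self.interest, key=self.interest.get, reverse=True)


model = DoiModel()
for name in ["Task.java", "TaskList.java", "Task.java", "Task.java"]:
    model.interact(name)
```

In this sketch a file that is touched repeatedly rises to the top of `model.ranked()`, while a file touched once slowly loses interest, which mirrors the gain and decay behavior described in the text.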

From the programmer’s point of view, the model represents the subset of program

elements in the IDE that are relevant to the current task. Mylyn helps improve programmers’ productivity by reducing the time spent on searching, scrolling, and navigating. It also helps the programmer switch context easily between tasks, so the

programmer does not have to rebuild the task context again.

Figure 1: Mylyn integration in the Eclipse IDE

Figure 1 shows how Mylyn integrates into the Eclipse IDE. On the right of the figure (label 1) is the user’s task list, organized by task ID and short description. Tasks may be saved locally or at a centralized location such as the Eclipse repository. The interesting feature of Mylyn is the

task focused UI (label 4); only files relevant to the current task are shown here. Visual

cues are used to highlight interesting items. Files that are used frequently are landmarked

by making them bold.

Data Collection

We used Mylyn’s open source data for this project; Mylyn keeps its development

information in the Bugzilla repository (eclipse.org). The data is open source and is

publicly available. Mylyn keeps this information in the form of XML files, each of which corresponds to a task and contains all the interactions a user performed within that task. A snippet of this XML information is shown in the following figure:

Figure 2: Context information saved by Mylyn

Each event has an associated event type, a timestamp, and the file modified in the event.
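As a rough illustration of how such events might be pulled out of a task context, the following sketch parses a small XML sample. The sample data is invented, and the element and attribute names (`InteractionEvent`, `Kind`, `StartDate`, `StructureHandle`) are assumptions, since the figure is not reproduced here; this is not the parser used in the project.

```python
import xml.etree.ElementTree as ET

# Invented sample; element and attribute names are assumptions.
SAMPLE = """
<InteractionHistory Id="task-1">
  <InteractionEvent Kind="selection" StartDate="2008-03-01 10:15:00"
                    StructureHandle="src/Foo.java"/>
  <InteractionEvent Kind="edit" StartDate="2008-03-01 10:17:30"
                    StructureHandle="src/Foo.java"/>
  <InteractionEvent Kind="edit" StartDate="2008-03-01 10:40:02"
                    StructureHandle="src/Bar.java"/>
</InteractionHistory>
"""


def parse_events(xml_text):
    """Return (kind, timestamp, file) tuples for every event in a task context."""
    root = ET.fromstring(xml_text.strip())
    return [
        (e.get("Kind"), e.get("StartDate"), e.get("StructureHandle"))
        for e in root.iter("InteractionEvent")
    ]


def edited_files(events):
    """Files touched by at least one edit event."""
    return {path for kind, _, path in events if kind == "edit"}


events = parse_events(SAMPLE)
files = edited_files(events)
```

The set returned by `edited_files` is the kind of per-task file set on which conflicts between tasks can later be based.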

To access the Mylyn issues and their associated task contexts, we wrote a custom web crawler that connected to the Bugzilla repository and downloaded the XML files for six releases of Mylyn. In this way we collected task contexts for a total of 155 tasks.

Using our own XML parser, we extracted all the events from the XML and saved them to

our own customized database in MySQL. After the data collection was complete we were

able to perform some statistical analysis and extract useful scenarios from the data. The

following table summarizes our data:

Task repository:       Bugzilla (Eclipse.org)
Versions:              3.0, 3.0.4, 3.1, 3.3, 3.4, 3.5
# of tasks:            582 (458 complete, 124 incomplete)    Median: 6 days
Tasks with contexts:   155                                   Median: 25 mins
Between:               05-29-2007 and 04-05-2011
Developers:            18
Edit events:           553 (2% of total events)

Table 1: Summary of the Mylyn data

Unfortunately, the data that we collected was not as useful as originally expected.

Specifically, the task information was not accurate; recorded task durations sometimes ran to a couple of days. An extreme example is a task that was open for more than 365 days. This shows that open source data is not always reliable: users do not always update task information correctly, and some tasks may be assigned and started but never completed. To make the data useful for the project context we

extracted scenarios irrespective of the event types and irrespective of the dates for at least

three developers. We found at least 1000 tasks that had potential conflicts in their context

and using these tasks we were able to extract 15 different scenarios for our analysis.

These scenarios will be further explained in a later section.

Approach Overview

As discussed earlier, we pursued two approaches in this project. The first, and primary,

approach was modeled, implemented, and tested. The second approach was

mainly modeled and sample scenarios were run by hand as proof of concept.

Other approaches, as discussed in various meetings with advisors, were

considered but not pursued. Some of these other approaches appear to hold

promise, but were simply out of the scope of this project. As such, these

approaches will be considered for future work as the project matures and is

further refined.

Main Approach:

Looking at the data we had collected, it was clear that there were many questions that

could be asked each with a different way of solving it. It also became apparent that we

were trying to bite off more than we could chew. The complexity of the various problems

and scenarios that we were discussing was daunting. We needed to simplify the question

so as to get at an answer. More complex questions can come later. As a result, we came

up with the following approach.

A broad way of terming our goal for this approach is to help each developer choose tasks

that do not conflict with other developers’ tasks. It is important to make a distinction

here. The goal is not to force the developers to work on particular tasks, but rather to

suggest tasks to the developers. The suggestions can be taken or they can be ignored. We

do not wish to force a developer to do a certain task nor do we have a way of forcing

them. Developers should come to realize that by taking the suggestions of the tool, they

will become more productive and spend less time resolving conflicts later on.

As stated above, we needed to make some simplifications in order for this project to

become feasible. The first assumption is that we are building a solution for a single day

of work. The day will be broken up into equally sized intervals so that tasks can be

assigned to those intervals. This implies the assumption that each task requires the

same amount of time to complete. Furthermore, each developer has enough tasks for the

day, so no interval will go unassigned as long as there is a consistent solution. Each task

has a set of files associated with it which is the basis for conflicts between tasks. A final

point of clarification is that we are not trying to take a temporal reasoning approach here.

Though this may seem like a scheduling problem based on time intervals, it is more of a

pseudo-scheduling problem. Each interval is fixed in length and time, so no temporal

reasoning is required.

We can define a problem with two parts: the given information and the question we want

to answer with that information.

Given:

  • A set of developers (each with a distinct name)
  • A task list for each developer (each containing a distinct set of tasks)
  • The files associated with each task (may overlap with other tasks)
  • Interval length (same for all intervals)
  • Length of a work day
  • Intervals (obtained by creating distinct intervals based on the number of intervals in a day and then multiplying that by the number of developers)

Question: What assignment of tasks to intervals results in no conflicts?

After defining the problem, modeling the problem naturally follows. Based on the

discussion so far, a partial model has begun to form. However, the model can be more

explicitly defined as a CSP. A CSP consists of a set of variables, the set of domains for

those variables, and the constraints between those variables.

CSP = (V, D, C)

  • V, Variables: the set of time intervals obtained by breaking the day into equal-length intervals based on the given interval length, with one such set of intervals per developer.
  • D, Domains: the given task lists. Each domain is associated with a distinct set of variables, because each of those sets of variables can be thought of as belonging to a specific developer.
  • C, Constraints: there are two types of constraints. The binary form of the ‘all-diff’ constraint forces a given set of variables to take on different values. The conflict constraints assert that no interval can take on a value that conflicts with the assignment of an interval that is chronologically equivalent.
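The (V, D, C) formulation can be sketched in a few lines of code. The developer names, task identifiers, and the `build_csp` helper below are hypothetical and serve only to show how variables, domains, and the two constraint types could be enumerated; this is not the XCSP encoding used in the experiments.

```python
from itertools import combinations

# Hypothetical input: two developers with three tasks each.
task_lists = {
    "P1": ["a1", "a2", "a3"],
    "P2": ["b1", "b2", "b3"],
}
intervals_per_day = 2  # length of the work day divided by the interval length


def build_csp(task_lists, intervals_per_day):
    # V: one variable per (developer, interval) pair.
    variables = [(dev, k) for dev in task_lists for k in range(intervals_per_day)]
    # D: each variable's domain is the task list of its developer.
    domains = {(dev, k): list(task_lists[dev]) for dev, k in variables}
    # C: binary constraints between pairs of variables.
    constraints = []
    for x, y in combinations(variables, 2):
        if x[0] == y[0]:
            constraints.append((x, y, "all-diff"))   # same developer
        elif x[1] == y[1]:
            constraints.append((x, y, "conflict"))   # same time slot
    return variables, domains, constraints


variables, domains, constraints = build_csp(task_lists, intervals_per_day)
```

With two developers and two intervals this yields four variables and four binary constraints: one all-diff constraint per developer and one conflict constraint per time slot.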

The above CSP definition can be made clearer with an example. We use one of the scenarios we tested to further illustrate the modeling of this constraint problem.

The variables in this CSP are the intervals which are the colored rectangles (blue, yellow,

and orange). Each of these intervals needs to take on a task as its value. The number of

domain sets in this CSP is equal to the number of developers and in this case is equal to

3. The color of each interval denotes the developer to which it belongs. The first domain set is for the blue intervals because it is the task list of developer P1. The second domain set is for the yellow intervals because it is the task list of developer P2, and so on.

As for the constraints, there are two types as mentioned. The first type says that intervals

of the same color (which have the same domain) cannot take on the same value.

A task in IiPn cannot equal a task in IiPm for i ∈ {1,2,3}, n,m ∈ {1,2,3,4}, and n ≠ m. (Binary representation of all-diff)

The second type says that intervals that are vertically aligned (that is, occupy the same time slot) cannot have conflicting tasks, which are tasks that have at least one file in common.

A task in IiPn cannot have conflicting files with a task in IjPn for i,j ∈ {1,2,3}, i ≠ j, and n ∈ {1,2,3,4}.
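The two constraint types can be expressed as a small checker over a complete assignment. This is a minimal sketch: the function name `violations`, the developer labels, the task identifiers, and the file names are all invented for illustration.

```python
def violations(assignment, task_files):
    """List the constraint violations in a complete assignment.

    assignment maps (developer, interval) -> task id.
    task_files maps task id -> set of files in the task's context.
    """
    found = []
    slots = sorted(assignment)
    for idx, x in enumerate(slots):
        for y in slots[idx + 1:]:
            tx, ty = assignment[x], assignment[y]
            same_dev, same_slot = x[0] == y[0], x[1] == y[1]
            if same_dev and tx == ty:
                # All-diff: a developer cannot do the same task twice.
                found.append(("all-diff", x, y))
            elif not same_dev and same_slot and task_files[tx] & task_files[ty]:
                # Conflict: concurrent tasks share at least one file.
                found.append(("conflict", x, y))
    return found


# Invented example data.
task_files = {
    "a1": {"Foo.java"}, "a2": {"Bar.java"},
    "b1": {"Foo.java"}, "b2": {"Qux.java"},
}
good = {("P1", 0): "a1", ("P1", 1): "a2", ("P2", 0): "b2", ("P2", 1): "b1"}
bad = {("P1", 0): "a1", ("P1", 1): "a1", ("P2", 0): "b1", ("P2", 1): "b2"}
```

Here `violations(good, task_files)` is empty, whereas `bad` repeats task `a1` for P1 and schedules `a1` and `b1`, which share `Foo.java`, in the same interval.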

Given this modeling of the problem, we were able to generate a number of testable

scenarios. We created 15 scenarios each with 3 developers and 4 intervals per day. Each

developer was given a task list of size 6. What sets the scenarios apart is that they vary by

how constrained they are (how many conflicts there are between tasks). We decided it

would also be interesting to decrease interval length allowing for more tasks to fit into a

given work day. We made variations of each of these scenarios by allowing 5 and 6

intervals to fit into a day. The result was 45 scenarios that could be run through one of the

binary CSP solvers created during the semester.

We chose to use the FCCBJ hybrid solver which gets the benefits of Forward Checking

and Conflict-Directed Backjumping in a single solver. Using this solver we were able to

quickly find solutions to the XCSP scenarios. We were surprised to find that even our more highly constrained scenarios had multitudes of solutions. Though we seek to find solutions, it is important to consider the boundary cases, to see the point at which a scenario becomes too highly constrained to have a solution. So, we created a couple of scenarios that were designed to be overly constrained. This, of course, resulted in no solutions. In the interest of having a scenario worth presenting, we wanted to find a balance so as to have a few solutions, but not a million solutions.
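To give a feel for how solutions to such a scenario can be enumerated, the following sketch counts solutions with backtracking search and forward checking. It deliberately omits conflict-directed backjumping, so it is a simplification and not the FCCBJ solver used in the experiments. The instance (developers, tasks, files) is invented.

```python
def compatible(x, tx, y, ty, task_files):
    """True if assigning tx to slot x and ty to slot y breaks no constraint."""
    if x[0] == y[0]:                       # same developer: all-diff
        return tx != ty
    if x[1] == y[1]:                       # same interval: no shared files
        return not (task_files[tx] & task_files[ty])
    return True


def count_solutions(variables, domains, task_files):
    """Count solutions by backtracking search with forward checking."""
    def search(idx, doms):
        if idx == len(variables):
            return 1
        x, total = variables[idx], 0
        for tx in doms[x]:
            # Forward checking: prune future domains against x = tx.
            pruned = dict(doms)
            for y in variables[idx + 1:]:
                pruned[y] = [ty for ty in doms[y]
                             if compatible(x, tx, y, ty, task_files)]
            # Continue only if no future domain was wiped out.
            if all(pruned[y] for y in variables[idx + 1:]):
                total += search(idx + 1, pruned)
        return total
    return search(0, domains)


# Toy instance (invented developers, tasks, and files).
task_lists = {"P1": ["a1", "a2", "a3"], "P2": ["b1", "b2", "b3"]}
task_files = {
    "a1": {"Foo.java"}, "a2": {"Bar.java"}, "a3": {"Baz.java"},
    "b1": {"Foo.java"}, "b2": {"Qux.java"}, "b3": {"Bar.java"},
}
variables = [(dev, k) for dev in task_lists for k in range(2)]
domains = {v: list(task_lists[v[0]]) for v in variables}
n_solutions = count_solutions(variables, domains, task_files)
```

For this toy instance, with two developers, two intervals, and two conflicting task pairs, the search reports 22 solutions. Adding more shared files to `task_files` shrinks that number until the instance becomes over-constrained and the count drops to zero.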

We created the following scenario which had 96 solutions and is relatively easy to

understand.

Example 1: Fixed length task intervals

The FCCBJ solver was able to find solutions for the above scenario; the labels show the order in which the tasks are assigned in just one of the possible solutions. Since all tasks had fixed length, it was possible for us to find the solutions beforehand. From the example you can see that Frank was assigned task 252297, whereas David and Shawn were assigned tasks 189689 and 256809 respectively. With this assignment we are sure that there is no conflict among the contexts of these three developers. Similarly, following the sequence proposed by the scheduler ensures that there is no conflict within any interval and that all the tasks are completed by the end of the day.

Secondary Approach:

Our secondary approach is still based on the previous assumption that the initial

assignments are done by considering tasks as fixed length tasks, so we need to assign one

task to each interval. But to make the scenario more realistic we will consider the

possibility that a developer may finish a task earlier than expected. Task duration can vary with the individual differences between developers and with the nature of the task. Some tasks are shorter and are finished earlier. Similarly, some (experienced) developers will be able to complete tasks earlier than others.

Example 2: Variable length task intervals

In the above example (Example 2), based on the fixed-length assumption, the scheduler will recommend a task sequence for all three developers. Assume that after some time Frank is done with his first task (225927). At this point the scheduler will try to assign a task to Frank from the remaining three tasks in such a way that it does not conflict with the tasks currently being carried out by David and Shawn (i.e., we want a task assignment for Frank that does not conflict with tasks 189689 and 256809 of David and Shawn respectively). At this point we remove Frank’s initial task from his domain and remove the constraints involving this task; we also add new unary constraints to make sure that the assignments for David and Shawn are not changed. With this new set of constraints the scheduler is able to select 254862 for Frank. This approach reevaluates the CSP every time a task is completed. It is a greedy approach that attempts to find the best task for a developer to work on whenever a task has been completed.
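The re-evaluation step can be sketched as follows. This is a deliberately simplified, one-step version: it treats the tasks that other developers are currently working on as fixed (the role played by the unary constraints) and returns the first remaining task that shares no file with them, rather than re-solving the whole CSP. The function name, task identifiers, and file names are invented.

```python
def next_task(developer, remaining, in_progress, task_files):
    """Pick the next task for `developer` when they finish early.

    remaining   -- tasks still in the developer's domain (finished task removed)
    in_progress -- {other developer: task currently being worked on}; these act
                   as fixed assignments
    task_files  -- task id -> set of files in the task's context
    """
    busy_files = set()
    for dev, task in in_progress.items():
        if dev != developer:
            busy_files |= task_files[task]
    for task in remaining:
        if not (task_files[task] & busy_files):
            return task
    return None  # every remaining task conflicts with ongoing work


# Invented example data.
task_files = {
    "d1": {"A.java", "B.java"},   # David's current task
    "s1": {"C.java"},             # Shawn's current task
    "f2": {"B.java"},
    "f3": {"D.java"},
    "f4": {"C.java", "E.java"},
}
in_progress = {"David": "d1", "Shawn": "s1"}
choice = next_task("Frank", ["f2", "f3", "f4"], in_progress, task_files)
```

In this example `f2` and `f4` overlap with files that David and Shawn are editing, so `f3` is suggested. When no remaining task is conflict-free the function returns `None`, which is the situation in which a softer notion of constraint would be needed.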

Future Work

The two approaches discussed above only cover a small subset of the ways in which we

could target the problem of task-conflict avoidance. We recognize that we have only

begun to explore this area. As such, we will share some of the directions in which further

pursuit of this research topic could go.

First off, the current approach that we have taken is extensible. It is not just limited to 3

developers and a few time intervals. More complex situations with many more

developers, larger task sets, and larger projects could be modeled this way as well. This

approach is, however, built on top of a number of large assumptions as spelled out earlier.

Now that a simplified version has been explored, more complex solutions can be pursued. Additionally, this and other similar approaches are limited until they can be integrated with the Mylyn environment.

In order for various approaches to be usable and effective for teams, it is necessary to

have access to the data in real-time so that information and solutions can also be provided

to the developers in real-time. It would be a significant achievement to be able to create

an Eclipse plug-in to work with Mylyn that could generate solutions in real-time for

users.

This problem could also be approached with different types of constraints. Two types that were discussed and considered are Soft

Constraints and MaxCSPs. The advantage of these two types of constraints is that they

would work in realistic scenarios where tasks may be much more highly constrained.

They would be able to provide solutions that hard/crisp constraints simply cannot allow.

For instance, with Soft Constraints a solution could be found where conflicts exist, but

because of the weighting of the CSP, an optimal solution is found. Similarly, MaxCSPs

could be used. MaxCSPs seek to find a solution that maximizes the number of constraints

satisfied. This is another way in which some constraints might be broken, but an optimal

solution can still be found. Approaches like these would surely make this type of tool

more robust.
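As a small illustration of the MaxCSP idea, the sketch below keeps the all-diff requirement hard but treats file conflicts as soft, and exhaustively searches for a schedule that violates as few conflict constraints as possible (equivalently, satisfies as many as possible). It is a brute-force sketch on invented data, suitable only for tiny instances, and not a proposal for how a real MaxCSP solver would work.

```python
from itertools import permutations, product


def best_schedule(task_lists, task_files, intervals):
    """Exhaustively find a schedule with the fewest same-interval file conflicts.

    The all-diff requirement stays hard (permutations never repeat a task);
    conflict constraints are soft and merely counted.
    """
    devs = sorted(task_lists)
    per_dev = [list(permutations(task_lists[d], intervals)) for d in devs]
    best, best_cost = None, None
    for combo in product(*per_dev):
        cost = 0
        for k in range(intervals):
            for i in range(len(devs)):
                for j in range(i + 1, len(devs)):
                    if task_files[combo[i][k]] & task_files[combo[j][k]]:
                        cost += 1
        if best_cost is None or cost < best_cost:
            best, best_cost = dict(zip(devs, combo)), cost
    return best, best_cost


# Invented over-constrained instance: b1 conflicts with both of P1's tasks.
task_lists = {"P1": ["a1", "a2"], "P2": ["b1", "b2"]}
task_files = {
    "a1": {"Foo.java"}, "a2": {"Foo.java"},
    "b1": {"Foo.java"}, "b2": {"Bar.java"},
}
schedule, cost = best_schedule(task_lists, task_files, intervals=2)
```

With hard conflict constraints this instance has no solution, because `b1` must run alongside one of P1's tasks. The soft version still returns a schedule, here with a single violated conflict constraint.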

The second approach, as presented earlier in this paper, has a unique limitation. It is dynamic, and because of that modeling scenarios for it by hand is infeasible. Rather, we would need to build a simulator that could be used to create scenarios step by step. This is necessary because dynamic scenarios cannot feasibly be created and tested by hand.

Another area that we believe would be interesting to explore is using the DOI model as a

heuristic. The DOI model can give us an idea of the likelihood of certain files being

associated with certain tasks. Having this information would allow the tool to be even

more proactive.

Conclusions

We tried to model our scenarios as closely as possible on real-world scenarios, but we concluded that the open source data for Mylyn was not as useful as we had thought; we could only predict the types of problems that could occur in a real development environment. However, given that we tested our model on scenarios ranging from simple (with fewer constraints) to complex (with many constraints), we were able to find more than one solution in all cases. We are hopeful that our scenarios may be generalized to real-world problem instances. However, we can only further test the strength of our model once we have events generated by the simulator.

References

[1] Mik Kersten and Gail C. Murphy. 2006. Using task context to improve programmer productivity. SIGSOFT '06/FSE-14. DOI = http://dx.doi.org/10.1145/1181775.1181777

[2] Mik Kersten and Gail C. Murphy. 2005. Mylar: a degree-of-interest model for IDEs. AOSD '05. DOI = http://dx.doi.org/10.1145/1052898.1052912

[3] Anita Sarma, David Redmiles, and Andre van der Hoek. 2008. Empirical evidence of the benefits of workspace awareness in software configuration management. In Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering (SIGSOFT '08/FSE-16). ACM, New York, NY, USA, 113-123. DOI = http://doi.acm.org/10.1145/1453101.1453118