
Decision-Tree Learning for Negotiation Rules

Zhongli Ding

A paper submitted to the

Computer Science Electrical Engineering Department

in partial fulfillment of the requirements for the M.S. degree at

University of Maryland Baltimore County

January, 2001

CMSC 698 Advisory Committee

Dr. Yun Peng (Advisor), Associate Professor in Computer Science

Dr. Charles K Nicholas (Reader), Associate Professor in Computer Science

Certified by

Yun Peng, CMSC698 Advisor


Abstract

The emergence of e-commerce increases the importance of research on multi-agent systems (MAS). The term MAS is used loosely to refer to any distributed system whose components (or agents) are designed, implemented, and operate independently of each other. Multi-agent systems are suitable for domains that involve interactions between different people or organizations with different (possibly conflicting) goals and proprietary information. One potential application area of MAS is the supply chain management system (SCMS), which integrates a company's activities across the entire e-supply chain, from acquisition of raw materials and purchased components through fabrication, assembly, test, and distribution of finished goods. The roles of the individual entities in the supply chain can be implemented as distinct functional software agents that cooperate with each other to realize the system's functionality in an e-business environment. A great portion of the interactions in supply chains takes place through strategic negotiation between enterprises and consumers. Correspondingly, automated negotiation between two or more agents (say, buyers and sellers) in a multi-agent SCMS is very important. Much better benefits and profits can be obtained if these autonomous negotiation agents are capable of learning and reasoning from experience and of improving their negotiation behavior incrementally, just as human negotiators do. Learning can be used either to extract an entire rule set for an agent or to improve a pre-existing set of rules. In this project, based on a negotiation-based MAS framework for supply chain management and a set of defined negotiation performatives, we test the feasibility of adopting a decision-tree learning (or rule-based learning) method in the negotiation process. Experimental results on the effect of using rule-based learning in a pairwise negotiation between one buyer and one seller are presented. They show that, with a carefully designed data scheme and sufficiently many training samples, decision-tree learning can be used to effectively learn decision rules for e-commerce activities such as negotiation in supply chains.

Keywords: E-Commerce, Multi-Agent System, Supply Chain Management System, Negotiation, Negotiation Performatives, Decision-Tree Learning, Rule-Based Learning


1. Introduction

The development of computer software and hardware has led to the appearance of non-human software agents. A software agent is considered an entity with goals, capable of actions, endowed with domain knowledge, and situated in an environment [29]. The term multi-agent system (MAS) is used loosely to refer to any distributed system whose components (or agents) are designed, implemented, and operate independently of each other. Multi-agent systems are suitable for domains that involve interactions between different people or organizations with different (possibly conflicting) goals and proprietary information [29]. Compared with monolithic systems and traditional distributed systems, the design and implementation of a MAS is of considerable complexity with respect to both its structure and its functionality, due to insufficient knowledge of the system environment, the required coordination of activities between multiple agents, and the dynamic nature of the MAS.

A supply chain is the process of moving goods from the customer order through the raw materials stage, supply, production, and distribution of products to the customer. More formally, a supply chain is a network of suppliers, factories, warehouses, distribution centers, and retailers through which raw materials are acquired, transformed, produced, and delivered to the customer [30]. A supply chain management system (SCMS) manages the cooperation of these system components. In the computational world, the roles of the individual entities in a supply chain can be implemented as distinct agents. Correspondingly, a SCMS becomes a MAS in which functional agents cooperate with each other to implement the system's functionality [31].

In supply chains, enterprises and consumers interact with each other strategically. A great portion of these interactions takes place through negotiation. Thus, automated negotiation between two or more agents (say, buyers and sellers) in a multi-agent SCMS is very important. Much better benefits and profits can be obtained if these autonomous negotiation agents are capable of learning and reasoning from experience and of improving their negotiation behavior incrementally, just as human negotiators do. Moreover, problems stemming from the complexity of MAS can be avoided, or at least reduced, by endowing the agents with the ability to adapt and to learn, that is, with the ability to improve the future performance of the total system, of a part of it, or of a single agent [3,17,23]. Learning can be used either to extract an entire rule set for an agent or to improve a pre-existing set of rules.

Two of the most important issues concerning learning in negotiation are: How should the overall negotiation process be modeled, i.e., how should the modeling framework of our negotiation-based multi-agent system for supply chain management be designed? And which learning algorithm or method should we choose for the decision-making of the agents?

In [31], researchers propose a negotiation-based MAS framework for supply chain management and describe a number of negotiation performatives, which can be used to construct pair-wise and third-party negotiation protocols for functional agent cooperation. The paper also explains how to formally model the negotiation process using Colored Petri Nets (CPN) and provides an example of establishing a virtual chain by solving a distributed constraint satisfaction problem.

Based on this framework, one main task is to test the feasibility of adopting a decision-tree learning (or rule-based learning) method in the negotiation process, and my project experiment is part of this task. There exist many machine learning methods that might be useful, but as pointed out later, decision-tree learning is the most suitable one. In this project, I experiment with the effect of using a rule-based learning method in a pairwise negotiation process between one buyer and one seller. This is the first step of our work; future work might extend it into a more complex negotiation-based multi-agent system with many functional agents joining, staying in, bargaining within, or leaving the system.

In Section 2, we give a brief summary of past research and learning techniques. In Section 3, we give a simple explanation of the designed MAS framework of the system and the set of negotiation performatives used. In Section 4 we present our experimental results so far and give some simple analysis. Finally, in Section 5, we conclude and give our future research goals. In the Appendix, we also provide some sample experimental results.

2. Learning Overview and Decision-Tree Learning

In this section, we briefly survey existing research on learning and adaptation in multi-agent systems, especially work applied to e-commerce activities, and give a short introduction to the decision-tree learning method along with the reasons we chose it, before presenting the design framework of our experimental Negotiation Rules Learning (NRL) system and the experimental results in later sections.

2.1 Categories and Objectives of Learning

There are a number of different ways in which learning can be used within a MAS for different objectives.

An agent, standing alone, can learn its owner's intentions and decision-making strategies. In this case, the human user often serves as the trainer whose decisions in response to environmental inputs are used as the training samples. As the training samples accumulate, the agent can incrementally learn the decision logic of the human, which might be difficult to encode explicitly as decision rules by hand.

An agent, standing alone, can learn to improve its responses to the environment's inputs (including those from other agents) as long as some objective functions (e.g., various utility functions) are well defined. In this case, the training samples are its previous interactions with the environment, including the corresponding objective function values.

An agent can learn about other agents in order to compete or cooperate with them. This type of learning is deeper than the previous two in that an agent learns something that other agents use to make their decisions and uses such knowledge to better fine-tune its own strategy. Learning in this category can be as simple as learning a few parameters that other agents use to conduct their operations (e.g., the reservation prices and markup percentages of suppliers in a supply chain [24]), or as complicated as learning models of other agents' decision strategies [6,9,10,19,23].

A set of agents can learn to simultaneously adjust their respective decision processes. This type of learning occurs mostly in those MAS whose agents are tightly cooperating with each other to achieve a common goal (e.g., winning a robot soccer game), and the learning inputs often reflect the system’s performance (e.g., scores in a soccer game) rather than performance of individual agents (players) [14,18].

In some applications, MAS learning can be done in so-called "batch mode", i.e., the system is trained over a set of pre-assembled training samples while the system is not in use (either before the system's deployment or when the system is taken offline). In most cases, however, it is preferred that learning be conducted in "incremental mode", i.e., the system incrementally adjusts/modifies itself by learning from a continuous stream of inputs from the environment while it is in actual use [25]. This is because 1) training samples, which record the interaction history of an agent, can be collected more efficiently and truthfully when the system is in actual use; and 2) incremental learning allows the system or its agents to adapt to changes in the environment in a timely fashion.

2.2 Example Techniques in MAS Learning

Interest in research on MAS learning has increased steadily in the past several years, and there are now many MAS learning systems with vastly different architectures, application areas, and machine learning techniques. What follows are brief descriptions of some examples of these systems and the learning techniques they use.

Reinforcement learning: Reinforcement learning (RL) is the process by which an agent improves its behavior in an environment via experience. RL is based on the idea that the tendency of an agent to perform an action should be strengthened (reinforced by a reward) if the action produces favorable results, and weakened (punished) if it produces unfavorable results. One of the most important advantages of RL, in comparison with other learning methods, is that it requires very little prior knowledge of the environment, as it does not require a target or desirable output for each input when forming a training sample. RL algorithms such as Q-learning [7,11] can incrementally adjust the system toward more favorable outcomes as long as the system is provided a feedback judgment (good or bad) on its output for a given input. For this reason, RL has become one of the most widely used learning methods in MAS [15,16,11]. The most noted application of RL is perhaps in the domain of robot soccer games [14,18], where the game's outcome (win or lose) is fed back to train the soccer team. RL has also been applied to other problems, including setting the right prices in competitive marketplaces [20], learning agent coordination mechanisms [11], learning to schedule multiple goals [1], and dealing with malicious agents in a market-based MAS [21].
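To make the reward-driven update concrete, the following is a minimal sketch of the tabular Q-learning update rule. It is a generic illustration only; the state and action encodings and the reward signal are hypothetical placeholders and the code is not taken from any of the systems cited above.

from collections import defaultdict

# Minimal tabular Q-learning update (generic illustration; states, actions,
# and rewards are hypothetical placeholders, not from the cited systems).
Q = defaultdict(float)      # Q[(state, action)] -> estimated long-term value
alpha, gamma = 0.1, 0.9     # learning rate and discount factor

def q_update(state, action, reward, next_state, actions):
    # Strengthen or weaken the tendency to take `action` in `state`
    # according to the observed reward (the reinforcement signal).
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])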

Optimization-based learning techniques: Optimization-based learning methods such as genetic algorithms [8], neural networks [12,14,20], and linear programming [22] have been used in some experimental MAS to train individual agents to optimize their performance, as long as their performance can be specified in the form of objective functions (e.g., their utility functions). One example of such a system is DragonChain, which uses a genetic algorithm (GA) approach to improve its performance in playing the MIT "Beer Game", a game of an electronic supply chain for beer [8]. Mimicking the law of biological evolution, the survival of the fittest, the GA learning in DragonChain was able to help the system obtain good beer-order policies for both retailers and wholesalers by searching through the huge space of possible order policies. Their experiments showed that this system outperformed those governed by classic business rules by eliminating the Bullwhip phenomenon and, more interestingly, that it could dynamically change its policies to adapt to changing order patterns from the customers.

Probabilistic learning techniques: Probabilistic learning techniques are of particular interest to MAS learning because of their ability to handle the high degree of uncertainty in the learning environment caused by agent interaction. Uncertainty is even more prevalent when an agent tries to learn models of other agents. In probabilistic learning, an agent does not attempt to learn a deterministic model of another agent, but rather a probability distribution over a set of possible models of that agent. Examples of formalisms that support probabilistic learning include Bayesian Belief Networks (BBN), which represent probabilistic dependencies among variables of interest in a graphical representation, and Influence Diagrams (ID), which further extend BBN to include decision nodes and utility nodes. A decision-making model for supply chain management, called Bazaar, was developed based on BBN [24]. In this system, an agent (say a buyer) uses Bayesian learning to incrementally learn the distributions of the reservation prices and markup rates of its negotiation partners (sellers). Work by Suryadi and Gmytrasiewicz [19] uses ID learning for an agent to construct models of other agents. An agent in their system maintains a number of possible models for each of the other agents it is interacting with, together with a probability distribution over these models. When none of the existing models has sufficiently high probability, one of them is modified (the parameters and even the structure of the underlying network are changed) to better reflect the observed behavior. An unspoken assumption of the above probabilistic learning systems is that the learning agent must have some prior knowledge of the behaviors of the other agents it is trying to learn. At the very least, it has to assume knowledge of the set of parameters with which the behavior of the other agent can be expressed, because these parameters are the necessary building blocks for the possible probabilistic models. Unfortunately, this assumption may not hold in many MAS applications.

Supervised learning: Supervised learning covers a class of learning methods that require a teacher to tell the learning system what the target/correct/desired output is for each training input. The target output is then compared with the current system output, and the discrepancy is used to drive the update of the system. Supervised learning includes backpropagation training in neural networks, k-nearest neighbor, minimum entropy, and some forms of decision tree learning. Supervised learning is particularly suitable for learning user models for personal agents and human interface agents [4,9,12,13]. This type of agent works on behalf of human users and tries to best satisfy the users' needs. Instead of providing detailed rules to guide the agent (which may not be feasible for complex tasks), the human user can easily act as the teacher by providing the desirable response to each input as a training sample for the agent. Payne et al. have used the k-nearest neighbor method to train a user interface agent [13], while Pannu and Sycara have used the backpropagation method to train a personal agent for text filtering and notification [12].

Rule-based learning: Learning rules for rule-based reasoning systems has also been reported in the literature [5,13]. Decision tree learning is perhaps the most mature technique for this type of learning. The advantage of rule-based learning lies in the fact that rules are easy for humans to understand. This allows domain experts to inspect and evaluate rules generated by a learning module and decide whether to accept each of these rules. Moreover, since rules are probably the easiest way to represent and encode experts' knowledge, many learning systems can start with a set of pre-defined rules and then let the rule-based learning module modify the rule set with additional observations. Learning thus greatly facilitates the growth, modification, and consistency maintenance of the knowledge base. These are precisely the reasons that we have chosen rule-based learning for our EECOMS Negotiation Rules Learning (NRL) task.

2.3 Decision-Tree Learning

Simply stated, a decision tree is a representation of a decision procedure for determining the class label to associate with a given instance (represented by a set of attribute-value pairs). All non-leaf nodes in the tree are decision nodes. A decision node is associated with a test (a question about the value of a particular attribute), with a branch corresponding to each of the possible outcomes of the test. Each leaf node carries a class label (the answer for an instance). Traversing a path from the root to a leaf is much like playing a game of twenty questions, following the decisions made at each decision node on the path. Decision trees can be induced from examples (training samples) that are already labeled.

One of the concerns of DT learning is how to construct trees that are as small as possible (measured by the number of distinct paths a tree has from its root) while remaining consistent with the training samples. In the worst case, the induced tree degenerates so that each sample has its own unique path (the tree size would then be exponential in the number of attributes involved). An information-theoretic approach has been taken by several DT learning algorithms to address this problem and, to a lesser extent, the problem of generalization [15].

The basic idea of a DT learning algorithm is:

For each decision point:
  If all remaining examples are all positive or all negative, we are done.
  Else if there are some positive and some negative examples left and attributes left, pick the remaining attribute that is the "most important", the one that tends to divide the remaining examples into homogeneous sets.
  Else if there are no examples left, no such example has been observed; return the default.
  Else if there are no attributes left, then examples with the same description have different classifications: noise, insufficient attributes, or a nondeterministic domain.
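The following Python sketch illustrates this recursion with an entropy-based choice of the "most important" attribute. It is a generic ID3-style illustration written for this report, not the implementation used by C5.0, Ripper, or ITI.

import math
from collections import Counter

# Generic ID3-style decision-tree induction (illustrative sketch only).
# Each example is a dict of attribute values plus a label under the key "class".
def entropy(examples):
    counts = Counter(e["class"] for e in examples)
    total = len(examples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def most_important(attributes, examples):
    # Pick the attribute whose split yields the most homogeneous subsets.
    def remainder(attr):
        values = set(e[attr] for e in examples)
        return sum(
            len(sub) / len(examples) * entropy(sub)
            for v in values
            for sub in [[e for e in examples if e[attr] == v]]
        )
    return min(attributes, key=remainder)

def induce_tree(examples, attributes, default=None):
    if not examples:                       # no examples observed: return default
        return default
    labels = set(e["class"] for e in examples)
    if len(labels) == 1:                   # all positive or all negative: done
        return labels.pop()
    if not attributes:                     # noise / insufficient attributes
        return Counter(e["class"] for e in examples).most_common(1)[0][0]
    best = most_important(attributes, examples)
    majority = Counter(e["class"] for e in examples).most_common(1)[0][0]
    tree = {best: {}}
    for value in set(e[best] for e in examples):
        subset = [e for e in examples if e[best] == value]
        rest = [a for a in attributes if a != best]
        tree[best][value] = induce_tree(subset, rest, majority)
    return tree

# Example use: induce_tree(samples, ["color", "size", "shape"]) returns a
# nested dict such as {"color": {"red": "+", "blue": {"shape": {...}}}}.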

Figure 3 gives a simple example of DT learning. A tree of good size has been induced. It has 6 distinct paths, but in the worst case it could have had 12 distinct paths, one for each unique value-assignment pattern of (color, size, shape). This is because some general rules were induced (if color = red, then immediately conclude class = +; shape is considered only if color = blue). The figure also shows the set of if-then rules that can be generated from the induced tree. Essentially, each distinct path represents a rule: the value assignments on the branches along the path constitute the condition part of the rule, and the value assignment at the leaf node at the end of the path constitutes the conclusion part of the rule.

2.4 The Choice of the DT Learning Method

An assumption we made when selecting a suitable learning method is that the decisions included in the training data (extracted from messages exchanged during negotiation sessions) are good decisions. Therefore, the goal of learning is not to further optimize the decision process that was used to generate the data, but to learn the rules/strategies that lead to these decisions. In other words, the learned rules should make decisions that are the same as (or similar to) those in the training set when the same or similar decision parameters are given. This leads us to the choice of supervised learning, instead of unsupervised or reinforcement learning. The training samples serve as instructions from a teacher or supervisor, as each sample gives the desired or "correct" value assignment to the target attribute with respect to the pattern of value assignments to the decision parameters in the sample.

Among all supervised learning methods, we have chosen to experiment with Decision Tree Learning (DT learning) [14,15,26] for the following reasons:

DT learning is a mature technology. It has been studied for more than 20 years, has been applied to various real-world problems, and its learning algorithm has been improved by several significant modifications.

The basic algorithm and its underlying principles are easy to understand, and it is easy to apply DT learning to specific problem domains, including our NRL task.

Several good, easy-to-use DT learning packages are commercially available (free or at reasonable cost) [26,27,28].

It is easy to convert the induced decision tree to a set of rules, which are much easier than other representations for human experts to evaluate and manipulate, and to incorporate into an existing rule-based system.

3. Negotiation Rules Learning System Framework

In a negotiation MAS, learning can be applied to various aspects of agent negotiation: training the agent to make decisions in a way similar to what an experienced human manager would do; learning models of an agent's negotiation partner at different levels of detail; and learning negotiation strategies that outsmart its negotiation partners. We have experimented with the decision tree learning method in our EECOMS negotiation rules learning task. In this section we give a brief description of the negotiation rule learning system design and the set of negotiation performatives used.

3.1 Objectives

Several theoretical issues arise in such a MAS: the high time complexity of the learning process, the lack of sufficient learning data and prior knowledge, the inherent uncertainty of the learning results, and the stability and convergence of a learning MAS. Thus, the overall objective is to study the feasibility of employing machine learning techniques to automatically induce negotiation decision rules for supply chain players (buyers and sellers) from transaction data. Specifically, we aim to investigate the following:

Constructing a rule base by learning the decision strategy of a human expert: Initially, the human makes all negotiation decisions. The prospective rules induced from the negotiation transactions are shown to the human for approval/rejection. The approved rules are then incorporated into the existing rule base. Including humans in the loop gives us quality training samples, as they are generated by an experienced human negotiator rather than by an agent with a set of not-yet-well-developed rules. It also makes the rule base more trustworthy to humans, since every induced rule is inspected by a human before it is included in the rule base.

Learning the model of the negotiation partners’ behaviors: By properly incorporating the learned partner’s decision rules, an agent can make more informed decisions, and in turn improve its performance (reducing the negotiation time/steps and increasing the payoff).

3.2 Outline of NRL System Design

Negotiation partners (buyers and sellers), represented by computer programs in a virtual supply chain, constitute a multi-agent system in the broad sense of this term as discussed in Section 1. Figure 1 shows the diagram of such a system. For each side of the negotiation there is a decision module, and the rules in these modules can become more and more complete over time, since each agent has a learning ability implemented by its learning module. Initially, we have a human negotiator (or a set of pre-defined rules) to guide the negotiation process.

3.3 Negotiation Performatives

All of the functional agents in a MAS should have some understanding of the system ontology and use a certain Agent Communication Language (ACL) to converse, transfer information, share knowledge, and negotiate with each other. An ACL offers a minimal set of performatives to describe agent actions and allows users to extend them, as long as the newly defined performatives conform to the rules of the ACL's syntax and semantics. The Knowledge Query and Manipulation Language (KQML) and the ACL defined by the Foundation for Intelligent Physical Agents (FIPA ACL) are the most widely used and studied ACLs. In KQML there are no predefined performatives for agent negotiation actions. In FIPA ACL there are some performatives, such as Proposal, CFP, and so on, for general agent negotiation processes, but they are not sufficient for our purposes. For example, there are no performatives to handle third-party negotiation. The NRL system design therefore uses a negotiation performative set designed for MAS dealing with supply chain management [31].


The following table gives each negotiation performative's name, its meaning, and the performatives a functional agent can use to reply when that performative comes in:

Name        Meaning                               Performative Responses
CFP         call for proposal                     Proposal | Terminate
CFMP        call for modified proposal            Proposal | Terminate
Reject      reject a proposal                     Proposal | Terminate
Terminate   terminate the negotiation             NONE
Accept      accept a proposal                     NONE
Proposal    the action of submitting a proposal   Accept | Reject | Terminate | CFMP

Initially, one agent starts the negotiation by sending a CFP message to the other agent. After several rounds of conversation in which proposals and counter-proposals are exchanged, the negotiation between the two agents ends when one side accepts (or rejects) the other side's proposal or terminates the negotiation process without any further explanation [31].
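The legal-reply relation in the table above can be written down directly in code. The sketch below is our own illustrative encoding, not part of the framework in [31]; the performative names follow the table.

# Legal responses to each incoming performative, as listed in the table above
# (illustrative encoding only; names follow the performative set of [31]).
LEGAL_RESPONSES = {
    "CFP":       {"Proposal", "Terminate"},
    "CFMP":      {"Proposal", "Terminate"},
    "Reject":    {"Proposal", "Terminate"},
    "Terminate": set(),          # negotiation is over; no reply expected
    "Accept":    set(),          # deal closed; no reply expected
    "Proposal":  {"Accept", "Reject", "Terminate", "CFMP"},
}

def is_legal_reply(incoming, reply):
    # True if `reply` is an allowed response to the `incoming` performative.
    return reply in LEGAL_RESPONSES.get(incoming, set())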

4. Experiments and Results

A preliminary experimental learning system for NRL was constructed earlier to evaluate the feasibility of learning decision rules for a buyer agent. A set of 27 training samples was manually generated following the data schema. These samples were fed to C5.0, a decision-tree learning package (a descendant of Ross Quinlan's classic ID3 decision tree learning algorithm [14,15]) obtained from RuleQuest Research, http://www.rulequest.com/. The learning completed successfully; a decision tree was constructed, from which a set of eight decision rules was generated by C5.0. These rules suggest to the buyer agent what actions it should take next (i.e., what types of messages should be sent out next), based on factors such as how well the terms such as price, quantity, and delivery date in its current "Call-For-Proposal" match those in the returned "Proposal" from the seller it is negotiating with, the reputation of the seller, and how far the current negotiation has progressed.

The initial results from the experiment were encouraging. They showed that decision tree learning can be an effective tool for learning negotiation rules from data that reflect past negotiation experience. The few rules learned were in general reasonable and consistent with the intuitions we had and used to generate the training data. However, the experiment was restricted by the quality of the training data (only 27 hand-made samples were used), and the results were far from convincing (only three rules were learned).

Encouraged by the results from the preliminary study, we went forward to conduct a more extensive experiment of NRL by decision tree learning. The main extensions include the following:

A program to automatically generate training samples was developed. This data generator is based on a set of decision rules that take into consideration all the important decision factors for making proposals (and counter-proposals). Unlike the small data set generated manually and somewhat arbitrarily in the preliminary study, this data generator allows us to experiment with a large quantity of learning samples (hundreds or even thousands of them) that are consistent and more realistic.

A better data schema is selected after a series of experiments to yield good results.


The buyer agent not only learns decision rules for itself but also learns rules that the seller appears to use, thus constructing a (simple) model of the seller.

Next we summarize the experiment system and present the results.

4.1 The Training Data Generator

More realistic training data may induce better, more meaningful rules. They can also be used to test and validate the learning results. Since no realistic data set of sufficient size is available to us, and we could not obtain help from human experts in supply chain negotiation, we were not able to resolve this problem to our complete satisfaction. As an alternative, we developed an automatic training sample generator to generate as many samples as we need. This generator essentially simulates the negotiation process on both the buyer side and the seller side, based on two sets of decision rules encoded in the program, to generate a stream of message exchanges between the buyer and the seller. The actual training samples are then extracted from these messages. By changing the relevant rules, the sample generator can be used to generate samples reflecting different negotiation scenarios.

Message format: (msg_type, sender, receiver, item, price, quantity, delivery_date)

For example, a negotiation session may start with the following message from the buyer to the seller, (CFP Buyer Seller X 9.25 50 7), meaning that Buyer wishes to entertain a proposal from Seller for 50 pieces of good X at a price of $9.25 a piece, to be delivered 7 days from this day. Seller may respond with a message (Proposal Seller Buyer X 11.25 50 7), meaning that it can sell these goods by the given delivery date at a unit price of $11.25. To simplify the experiment, we let the buyer and the seller negotiate over only one type of good, named X.
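A minimal in-code representation of this message format might look like the sketch below; the field names follow the format above, while the record type and variable names are our own illustration.

# Illustrative representation of the negotiation message format
# (msg_type, sender, receiver, item, price, quantity, delivery_date).
from collections import namedtuple

Message = namedtuple(
    "Message",
    ["msg_type", "sender", "receiver", "item", "price", "quantity", "delivery_date"],
)

# The opening exchange described above:
cfp = Message("CFP", "Buyer", "Seller", "X", 9.25, 50, 7)
proposal = Message("Proposal", "Seller", "Buyer", "X", 11.25, 50, 7)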

System parameters: A set of random numbers is used to determine the attribute values of the CFP message for each negotiation session and the values of the initial Proposal message sent in response to the CFP message. These numbers, to an extent, simulate the dynamic nature of the internal and external environment in which the negotiation takes place. The parameters include the following (a code sketch of how they might be drawn appears after the list).

Buyer's need for good X:

quantity: a random number between 1 and 200 (with probability 0.1 in [1,50], probability 0.8 in [51,150], and probability 0.1 in [151,200]); it can be mapped to three regions: [1,50] = small, [51,150] = ordinary, [151,200] = large.

delivery_date: a random number between 1 and 20; it can also be mapped to three regions: 1-5 days = short, 6-15 days = regular, 16-20 days = long.

asking price: a random number between 7 and 11. The fair market unit price is $10, and the range of all possible prices is partitioned into six regions: min (7 <= price < 8), low (8 <= price < 9), normalminus (9 <= price < 10), normalplus (10 <= price <= 11), high (11 < price <= 12), and max (12 < price <= 13).

importance of the order: a random number of binary value.

Seller's reputation: a random number of binary value.

Seller's capacity to supply good X:

daily production capacity of good X: a random number between 8 and 12.

current inventory: a random number between 20 and 50.

importance of the Buyer as a customer: a random number of binary value.
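As referenced above, the following is a sketch of how such a generator might draw these parameters. The original generator's code is not available, so this is a reconstruction from the stated distributions only; function and variable names are our own.

# Illustrative reconstruction of the random parameter draws described above.
import random

def draw_buyer_need():
    r = random.random()
    if r < 0.1:                              # probability 0.1 -> small order
        quantity = random.randint(1, 50)
    elif r < 0.9:                            # probability 0.8 -> ordinary order
        quantity = random.randint(51, 150)
    else:                                    # probability 0.1 -> large order
        quantity = random.randint(151, 200)
    delivery_date = random.randint(1, 20)    # short / regular / long regions
    asking_price = random.uniform(7, 11)     # fair market unit price is $10
    important_order = random.randint(0, 1)   # binary
    seller_reputation = random.randint(0, 1) # binary
    return quantity, delivery_date, asking_price, important_order, seller_reputation

def draw_seller_capacity():
    daily_capacity = random.randint(8, 12)
    inventory = random.randint(20, 50)
    buyer_important = random.randint(0, 1)   # binary
    return daily_capacity, inventory, buyer_important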

With these random numbers, each negotiation session starts with a CFP message with a different quantity, delivery_date, and asking price. In response, the seller first determines whether the requested delivery_date can be met for the given quantity, based on the current inventory and the daily production capacity from this day to the delivery_date. To simplify the data generation, we assume that the seller submits its initial proposal (usually with a price higher than the asking price in the CFP message) only if the requested quantity and date can be met. The negotiation then continues with the buyer gradually increasing the asking price and the seller decreasing the bidding price until the session ends. The details of the negotiation are governed by the decision rules at either side.

Negotiation rules for the Seller agent: The following rules are used in the data generator to form response messages from the Seller agent.

SR-1: Terminate the negotiation IF Seller cannot meet the quantity and date requested in the incoming CFP message.
SR-2: Terminate the negotiation IF asking price = min (7 <= price < 8).
SR-3: Terminate the negotiation IF asking price = low (8 <= price < 9) AND Buyer is NOT an important customer.
SR-4: Otherwise, submit a Proposal with the requested quantity and date, with the price determined by:
  SR-41: IF the asking price = normalplus or high or max (10 <= price <= 13) THEN bidding price = asking price.
  SR-42: Otherwise,
    IF incoming msg_type = CFP THEN
      IF asking price = low (8 <= price < 9) AND Buyer is an important customer THEN propose a higher price (bidding price > asking price)
      ELSE IF asking price = normalminus (9 <= price < 10) AND Buyer is NOT an important customer THEN propose a higher price (bidding price > asking price)
      ELSE IF asking price = normalminus (9 <= price < 10) AND Buyer is an important customer THEN bidding price = asking price
    ELSE propose a lower price (reduce the bidding price toward the asking price).
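Expressed in code, the Seller's side of the generator might look like the sketch below. The helper names, the function signature, and the way price increments are applied are our own assumptions; only the rule structure of SR-1 through SR-42 is taken from the description above.

# Illustrative sketch of the Seller rules SR-1..SR-42 (helper names, signature,
# and price-adjustment details are assumptions, not the original generator code).
def price_region(p):
    if 7 <= p < 8:    return "min"
    if 8 <= p < 9:    return "low"
    if 9 <= p < 10:   return "normalminus"
    if 10 <= p <= 11: return "normalplus"
    if 11 < p <= 12:  return "high"
    return "max"      # 12 < p <= 13

def seller_decide(can_meet_qty_date, asking_price, buyer_important,
                  incoming_msg, previous_bid, raise_price, lower_price):
    region = price_region(asking_price)
    if not can_meet_qty_date:                       # SR-1
        return ("Terminate", None)
    if region == "min":                             # SR-2
        return ("Terminate", None)
    if region == "low" and not buyer_important:     # SR-3
        return ("Terminate", None)
    # SR-4: submit a Proposal; determine the bidding price.
    if region in ("normalplus", "high", "max"):     # SR-41
        return ("Proposal", asking_price)
    if incoming_msg == "CFP":                       # SR-42, first proposal
        if region == "low" and buyer_important:
            return ("Proposal", raise_price(asking_price))
        if region == "normalminus" and not buyer_important:
            return ("Proposal", raise_price(asking_price))
        return ("Proposal", asking_price)           # normalminus, important buyer
    # SR-42, responding to a CFMP: lower the previous bid toward the ask.
    return ("Proposal", lower_price(previous_bid))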

Negotiation rules for the Buyer agent: The following rules are used in the data generator to form messages from the Buyer agent.

BR-1: Terminate the negotiation IF bidding price = max (12 < price <= 13).
BR-2: Terminate the negotiation IF bidding price = high (11 < price <= 12) AND the current depth of negotiation >= 7.
BR-3: Reject the incoming proposal IF bidding price = high (11 < price <= 12) AND (Seller's reputation is bad OR this order is not important) AND the current depth of negotiation < 7.
BR-4: CFMP for a lower price IF bidding price = high (11 < price <= 12) AND Seller's reputation is good AND this order is important AND the current depth of negotiation < 7.
BR-5: Accept the current proposal IF the bidding price <= 10.
BR-6: Accept the current proposal IF the bidding price = asking price (delta_price = 0).
BR-7: Accept the current proposal IF bidding price = normalplus (10 < price <= 11) AND Seller's reputation is good AND the order is important.
BR-8: CFMP for a lower price IF bidding price = normalplus (10 < price <= 11) AND (Seller's reputation is bad OR this order is not important) AND the current depth of negotiation < 7.
BR-9: Terminate IF bidding price = normalplus (10 < price <= 11) AND (Seller's reputation is bad OR this order is not important) AND the current depth of negotiation >= 7.
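The Buyer's side can be sketched in the same illustrative style; again, the parameter names are assumptions and only the structure of BR-1 through BR-9 is taken from the list above.

# Illustrative sketch of the Buyer rules BR-1..BR-9 (parameter names are
# assumptions, not the original generator code).
def buyer_decide(bidding_price, asking_price, seller_reputation_good,
                 order_important, depth):
    if bidding_price > 12:                                    # max region: BR-1
        return "Terminate"
    if bidding_price > 11:                                    # high region
        if depth >= 7:                                        # BR-2
            return "Terminate"
        if seller_reputation_good and order_important:        # BR-4
            return "CFMP"
        return "Reject"                                       # BR-3
    if bidding_price <= 10 or bidding_price == asking_price:  # BR-5, BR-6
        return "Accept"
    # normalplus region (10 < price <= 11)
    if seller_reputation_good and order_important:            # BR-7
        return "Accept"
    if depth < 7:                                             # BR-8
        return "CFMP"
    return "Terminate"                                        # BR-9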

The function used by the Buyer agent to adjust its price during negotiation is the logistic function 1/(1 + exp(-x)), where x ranges from -3 to +3 (with x = depth - 4 this reproduces the diff_buyer values tabulated below). The function used by the Seller agent to adjust its price is (Depth_Max - depth + 1) * tan(0.25).

depth        0      1      2      3      4      5      6      7
diff_buyer   N/A    0.05   0.12   0.27   0.50   0.73   0.88   0.95
diff_seller  2.043  1.787  1.532  1.277  1.021  0.766  0.511  0.255
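The table values can be reproduced with the short sketch below; the mapping x = depth - 4 for the buyer and Depth_Max = 7 for the seller are our inferences from the tabulated numbers.

# Reproduce the price-adjustment increments tabulated above.
import math

DEPTH_MAX = 7   # inferred from the seller column (2.043 = 8 * tan(0.25))

def diff_buyer(depth):
    # Logistic curve; x = depth - 4 gives 0.05 ... 0.95 for depths 1..7.
    return 1.0 / (1.0 + math.exp(-(depth - 4)))

def diff_seller(depth):
    # (Depth_Max - depth + 1) * tan(0.25) gives 2.043 ... 0.255 for depths 0..7.
    return (DEPTH_MAX - depth + 1) * math.tan(0.25)

for d in range(DEPTH_MAX + 1):
    buyer = round(diff_buyer(d), 2) if d > 0 else "N/A"
    print(d, buyer, round(diff_seller(d), 3))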

4.2 Data Schema for Training Samples


The negotiation process in NRL is very complex. Consider just the task of making a counter-proposal by a buyer when it receives a new proposal during a negotiation session. This task amounts to optimizing a function (e.g., payoff) based on a high-dimensional many-to-many mapping. The input involves parameters reflecting the enterprise's planning and execution (customer orders, production capacity, inventory, etc.); the distance between asking and bidding values of negotiation terms (prices, quantities, delivery dates, etc.) at the given point of a negotiation session; the trustworthiness of the negotiation partner; how long the current session has lasted (the longer it lasts, the less likely a good deal can be made); the importance to the buyer of the on-going negotiation succeeding; the availability of alternative sellers; and so on. The output is also composed of a large number of parameters that give a detailed description of a (counter-)proposal. The training samples for DT learning are composed of these attributes.

A training sample is a vector, or a list of values, for a set of attributes extracted from a message exchanged during the negotiation. A sample can be divided into two parts. The first part consists of the attributes one hopes the learned rules can be used to generate; they are referred to as the learning target. The second part includes those attributes that support the conclusions of the learned rules in assigning values to the target attributes; they are referred to as the decision parameters. The data model or data scheme for the training samples specifies what is to be learned (the target attribute) and what the decision parameters are (the other attributes the target depends on).

A training sample is synthesized from three consecutive messages: the current incoming message, the one that precedes it, and the one in response to it. Figure 2 is an example of training samples used in our early learning experiment.

In this experiment, to simplify the investigation, we decided to focus on learning rules for determining the appropriate message type for responding to an incoming message; that is, our learning target is the performative (or message type) that will be used to respond to the incoming one. The target attribute will be one of CFP, CFMP, Terminate, Accept, or Reject for Buyer, and Terminate or Proposal for Seller.


Selection of the decision parameters is more complicated. Since the type of each outgoing message from an agent is determined by the content of the incoming message and the content of the previous message from the same agent, a large number of attributes that may potentially affect the new message type can be extracted from the two preceding messages and from their differences. For example, consider the situation in which Buyer receives a Proposal (msg-2) from Seller after sending a CFP message (msg-1). The new message (msg-3) from Buyer, in response to msg-2, may depend on:

attributes from msg-1: bprice (Buyer's asking price), bquantity (Buyer's requested quantity), bdate (Buyer's requested delivery_date), last_msg (type of the last msg from Buyer to Seller);

attributes from msg-2: sprice (Seller's bidding price), squantity (Seller's proposed quantity), sdate (Seller's proposed delivery_date), incoming_msg (type of the incoming msg from Seller);

attributes from the difference between msg-1 and msg-2: delta_price, delta_quantity, delta_date, match_qd (true only if both delta_quantity and delta_date are zero);

attributes about other properties of Buyer: opp_reputation (Buyer's evaluation of Seller's reputation), weight_item (whether this order is important to Buyer), depth (number of msgs Buyer has sent during the current session).

A small set of decision parameters may be insufficient for the learning module to differentiate training examples, resulting in a decision tree with many ambiguous leaf nodes (nodes with multiple labels). On the other hand, a large set of decision parameters may refine the decision tree to a level that is too detailed to be practically useful, because the induced tree would have a great height and a large number of branches (rules). For example, one of our experimental runs used all the parameters listed above. The induced decision tree for Buyer had a height of 14, which means that some rules would have to check as many as 14 conditions before drawing a conclusion. Moreover, a total of more than 100 rules were generated from this tree. It may be possible to obtain a workable, smaller set of rules from these raw rules by some pruning techniques, but this would require substantial post-learning processing.

After several trials, we chose the following decision parameter sets for the Buyer and Seller, respectively:

Buyer: sender, depth, receiver, last_msg, incoming_msg, item, sprice, opp_reputation, weight_item, match_qd.

Seller: sender, depth, receiver, last_msg, incoming_msg, item, bprice, opp_importance, match_qd.
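Under the Buyer's data model above, a single training sample could be represented as in the sketch below; the record type, field ordering, and example values are illustrative only and are not taken from the actual data files.

# Illustrative record for one Buyer training sample under the data model above.
from collections import namedtuple

BuyerSample = namedtuple(
    "BuyerSample",
    ["sender", "depth", "receiver", "last_msg", "incoming_msg", "item",
     "sprice", "opp_reputation", "weight_item", "match_qd",
     "target"],                      # learning target: type of the next message
)

sample = BuyerSample(
    sender="Seller", depth=2, receiver="Buyer",
    last_msg="CFP", incoming_msg="Proposal", item="X",
    sprice="high", opp_reputation="good", weight_item="important",
    match_qd=True,
    target="CFMP",                   # what Buyer should send next (cf. BR-4)
)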

4.3 The Experiment Results

We have experimented with three software packages for decision tree learning: (1) C5.0/See5.0 from RuleQuest Research [27], (2) Ripper from AT&T [28], and (3) ITI (Incremental Tree Inducer) from the University of Massachusetts [26]. C5.0/See5.0 was not selected for the final experiment because the free-of-charge version we have restricts the dataset to no more than 200 training samples, which is not sufficient to make the learning process converge. Ripper, although it does not restrict the size of the training set, was rejected because it always produces a very small number of rules (possibly due to a severe pruning process it uses to generate the final output). ITI was selected not only because it works well with our learning task but also because it supports incremental learning, a valuable feature we plan to further explore in the future.

3000 negotiation sessions were randomly generated by the automatic data generator described in Section 4.1. Each session includes a sequence of message exchanges between Buyer and Seller, starting with a CFP message from Buyer. The experiment showed that this amount of training samples is sufficient for the learning process to converge (the induced tree becomes stable). Datasets of smaller size may be used to learn most, but not all, decision rules, because they do not contain all possible scenarios, especially those with small probabilities. These samples were fed into ITI under the data model described in Section 4.2. The two induced decision trees and their corresponding data model files, for Buyer and Seller respectively, are included in the Appendix.

Learned rules for Buyer: 12 rules can be generated from the induced decision tree, corresponding to the 12 paths in the tree. Two rules (the first and the last, counting from left to right) are related to starting and ending a session; the other 10 are rules for determining the new message type. These rules match very well with the rules used to generate the training data; there is no apparent inconsistency between the two sets of rules. For example, the second rule

IF sprice = max THEN Terminate

is the same as BR-1 in Section 4.1. The next three rules (all under the condition that quantity and delivery date match)

IF sprice = high AND weight_item = unimportant THEN Reject
IF sprice = high AND weight_item = important AND opp_reputation = bad THEN Reject
IF sprice = high AND weight_item = important AND opp_reputation = good THEN CFMP

jointly match the rules of BR-3 and BR-4. All other induced rules also match the data generation rules well.

Learned rules for Seller: 6 rules can be generated from the decision tree, corresponding to the 6 paths in the tree. The last two rules are related to starting and ending a session; the first four are for determining the new message type. These rules again match very well with the rules used to generate the training data. The first two rules from the tree

IF bprice = min THEN Terminate
IF bprice = low AND opp_weight = unimportant THEN Terminate

are the same as SR-2 and SR-3 in Section 4.1, and the next two rules

IF bprice = low AND opp_weight = important THEN Proposal
IF bprice > low THEN Proposal

jointly match the rules of SR-4.

5. Conclusions and Future Work

Due to the inherently complex, uncertain, and dynamic nature of multi-agent systems, it is very difficult, if not impossible, to encode agents' decision strategies a priori. Learning while doing therefore becomes imperative for constructing MAS in applications. This is also true for automatic negotiation systems for supply chain management, where each entity operates autonomously and interacts with others (its negotiating partners) to reach deals. We have begun an empirical investigation of the feasibility of adopting some existing machine learning techniques to learn negotiation rules (NRL) from transaction data (mainly from the messages exchanged among negotiation agents). Our experimental results show that, with a carefully designed data scheme and sufficiently many training samples, the decision tree learning method can be used to effectively learn decision rules for some e-commerce activities such as negotiation in supply chains.


More interestingly, our experiment showed that the Buyer agent could learn the model of its partner (the Seller) using only the information available in the messages they exchanged during the negotiation. Although the model learned is about the behavior of the seller, not about the underlying mechanism governing the decision making at the Seller agent, it may provide the buyer some power to predict the response from the seller or to choose the actions that will bring the most desired responses.

Although our experiment only involves learning how to determine one aspect of the responding message, namely its message type, it is conceivable that this method can be used to learn other aspects (e.g., how to set the price) from the same raw data (possibly with different data models). In other words, a more complete model (with multiple facets) of an agent can be constructed by simultaneously running multiple decision tree learning processes, one for each aspect of the agent.

In the future, more investigation can be pursued in the following directions: (1) Experiment with decision tree learning on other, preferably real-world, training data. (2) Study how to incorporate the learned model of a partner (or opponent) to improve one's own negotiation decision-making. (3) Investigate the applicability of incremental decision tree learning and how it can improve the agent's performance by making it adaptive to the changing environment. (4) Develop a hybrid learning architecture that employs different learning techniques for different aspects of e-commerce activities.

References

[1] Arai, S. Sycara, K., and Payne, T.R. (2000). Multi-agent Reinforcement Learning for Scheduling Multiple-Goals. In Proceedings of the Fourth International Conference on Multi-Agent Systems (ICMAS'2000).

[2] Arrow, K. (1962). The implications of learning by doing. Review of Economic Studies, 29, 166-170.

[3] Brazdil, P., Gams, M., Sian, S., Torgo, L., and van de Velde, W. (1991). Learning in distributed systems and multi-agent environments. In Y. Kodratoff (Ed.), Machine Learning -- EWSL-91 (pp. 412-423). Lecture Notes in Artificial Intelligence, Vol. 482. Berlin: Springer-Verlag.

[4] Caglayan, A., et al., (1996). Lessons from Open Sesame!, a User Interface Agent. In Proceedings of PAAM ’96.

[5] Haynes, T., Lau, K., and Sen, S. (1996). Learning Cases to Compliment Rules for Conflict Resolution in Multiagent Systems. In Working Notes of the AAAI Spring Symposium on Adaptation, Coevolution, and Learning in Multiagent Systems, Stanford, CA, March, 1996.

[6] Hu, J. and Wellman, M. (1998). Online Learning About Other Agents in a Dynamic Multiagent System. In Proceedings of the Second International Conference on Autonomous Agents (Agents98), Minneapolis, MN, USA, May 1998

[7] Humphrys, M. (1995). W-learning: Competition among selfish Q-learners. Technical Report no. 362, Computer Laboratory, University of Cambridge.

[8] Kimbrough, S.O., Wu, D.J., and Zhong, F. (2000). Artificial Agents Play the Beer Game, Eliminate the Bullwhip Effect, and Whip the MBAs. http://grace.wharton.upenn.edu/~sok/fmec/schedule.html

[9] Maes, P. (1994). Social interface agents: Acquiring competence by learning from users and other agents. In O. Etzioni (Ed.), Working Notes of the 1994 AAAI Spring Symposium on Software Agents.

[10] Mor, Y., Goldman, C.V., and Rosenschein, J.S. (1996). Learn Your Opponent's Strategy (in Polynomial Time). In G. Weiß and S. Sen (Eds.), Adaptation and Learning in Multi-Agent Systems (pp. 164-176). Lecture Notes in Artificial Intelligence, Vol. 1042. Springer-Verlag, 1996.

[11] Mundhe, M. and Sen, S. (1999). Evaluating Concurrent Reinforcement Learners. In Proceedings of IJCAI-99 Workshop on Agents Learning About, From and With other Agents. 1999, Stockholm, Sweden.

[12] Pannu, A. and Sycara, K. (1996). Learning Personal Agent for Text Filtering and Notification. In Proceedings of the International Conference of Knowledge-Based Systems (KBCS 96), Dec., 1996.


[13] Payne, T.R., Edwards, P., and Green, C.L. (1995). Experience with rule induction and k-nearest neighbor methods for interface agents that learn. In WSIMLC-95.

[14] Quinlan, J.R. (1986). Induction of Decision Trees. Machine Learning, 1, 81-106.

[15] Quinlan, J.R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings of the 10th International Conference on Machine Learning, 236-243.

[16] Schmidhuber, J. (1996). A General Method For Multi-Agent Reinforcement Learning In Unrestricted Environments. In Working Notes of the AAAI Spring Symposium on Adaptation, Coevolution, and Learning in Multiagent Systems, Stanford, CA, March, 1996.

[17] Sian, S.S. (1991). Extending Learning to Multiple Agents: Issues and a Model for Multi-Agent Machine Learning (MAML). In Y. Kodratoff (Ed.), Machine learning -- EWSL91 (pp. 440--456). Berlin: Springer-Verlag.

[18] Stone, P. and Veloso, M. (1996). Collaborative and Adversarial Learning: A Case Study in Robotic Soccer. In Working Notes of the AAAI Spring Symposium on Adaptation, Coevolution, and Learning in Multiagent Systems, Stanford, CA, March, 1996.

[19] Suryadi, D. and Gmytrasiewicz, P.J. Learning Models of Other Agents Using Influence Diagrams. In Proceedings of IJCAI-99 Workshop on Agents Learning About, From and With other Agents. 1999, Stockholm, Sweden.

[20] Tesauro, G. (1999). Pricing in Agent Economies Using Neural Networks and Multi-Agent Q-learning. In Proceedings of IJCAI-99 Workshop on Agents Learning About, From and With other Agents. 1999, Stockholm, Sweden.

[21] Vidal, J. and Durfee, E. (1997) Agents Learning about Agents: A Framework and Analysis. In Working Papers of the AAAI-97 Workshop on Multiagent Learning.

[22] Weiß, G. and S. Sen (Eds.) (1996). Adaptation and Learning in Multi-Agent Systems (pp. 1-21). Lecture Notes in Artificial Intelligence, Vol. 1042. SpringerVerlag.

[23] Weiß, G. (1996). Adaptation and Learning in Multi-Agent Systems: Some Remarks and a Bibliography. In G. Weiß and S. Sen (Eds.), Adaptation and Learning in Multi-Agent Systems (pp. 1-21). Lecture Notes in Artificial Intelligence, Vol. 1042. Springer-Verlag.

[24] Zeng, D. and Sycara, K (1996). Bayesian Learning in Negotiation. In Working Notes of the AAAI 1996 Stanford Spring Symposium Series on Adaptation, Coevolution and Learning in Multiagent Systems.

[25] Zeng, D. and Sycara, K. (1997). Benefits of Learning in Negotiation. In Proceedings of AAAI.

[26] http://www.cs.umass.edu/~lrn/iti/ -- ITI (Incremental Tree Inducer), University of Massachusetts.

[27] http://www.rulequest.com/ -- C5.0/See5.0 for DT learning, RuleQuest Research.

[28] http://www.research.att.com/~diane/ripper/ripper-2.5.tar.gz -- Ripper for DT learning, AT&T.

[29] Stone, P. and Veloso, M. (1997). Multiagent Systems: A Survey from a Machine Learning Perspective. Under review for journal publication, February 1997.

[30] Barbuceanu, M. and Fox, M.S. (1994). The Information Agent: An Infrastructure Agent Supporting Collaborative Enterprise Architectures. In Proceedings of the Third Workshop on Enabling Technologies: Infrastructure for Collaborative Enterprises, Morgantown, West Virginia. IEEE Computer Society Press.

[31] Chen, Y., Peng, Y., Finin, T., Labrou, Y., Cost, S., Chu, B., Sun, R., and Willhelm, B. (1999). A Negotiation-based Multi-agent System for Supply Chain Management. In Workshop on Supply Chain Management, Autonomous Agents '99, Seattle, WA, May 1999.

Acknowledgement

I would like to thank my advisor Dr. Yun Peng for his great help with this Master's project and Dr. Charles K. Nicholas for reviewing this report. I also want to thank Mr. Ye Chen and Dr. Tim Finin for their pertinent suggestions.

Appendix: Induced Decision Trees (by ITI)

Data Model: selleri.names -- Decision tree for the Seller agent


Data Model: buyeri.names -- Decision tree for the Buyer agent
