Trading Agent Competition: Performance Evaluation


Presented by Brett Borghetti

[email protected]@cs.umn.edu

22 March 2006


Think about this

• You own a small business
• You make a bunch of strategic decisions/plans/policies
• Your 1st quarter net profit is $100,000
  – Which choices helped?
  – Which choices hurt?
  – Can your decisions be examined independently?
  – How do you improve next quarter?

The Situation

• We sometimes have to make our plans and policies before their execution
• We don’t know fully what the market will do next quarter (uncertainty)
• We are in competition with other businesses/entities who may act to thwart our plans

A Solution

• Repeat (until good enough):
  – Predict the effects of our choices offline
  – Adjust our choices to optimize outcome
• Execute our plans
• Measure the effectiveness of our choices online

Presentation Overview

• TAC-SCM Overview
• Current analysis methods
• New methods
• Future Research

What is TAC-SCM?

• Simulation of a market supply chain
  – Agent is the computer manufacturer
  – Buys parts from suppliers in auction
  – Manages assembly line/production schedule
  – Reverse Auction to sell computers
  – Ships computers to customers
• Six agents compete: maximize profit

TAC-SCM Interaction

Game Flow Diagram

TAC - Why is it Interesting?

• Complexity: Beyond human-in-the-loop capability
  – Compete with 5 other agents selling computers
  – Real time: 15 sec/day x 220 days
  – Auctions (normal and reverse for all transactions)
  – 8 parts suppliers with production capacity changing daily
  – 16 different computer types to build in 3 price classes
  – 100s of Customers with varying demand and reserve prices
  – Price probing, future purchase decisions, ...
• Small market: Agents have large impact on each other
  – Explicit Competition – PROFIT!
  – Learning others’ habits & patterns and out-thinking them
  – Information denial / Decision perturbation

UMN MinneTAC Design

• Component-based architecture
  – Procurement – Purchases parts from suppliers
  – Production – Manages the production line
  – Sales – Interacts with customers to make sales
  – Shipping – Plans customer shipping schedule
  – Repository – Centralized data storage / accessors
  – Oracle – Decision assistance evaluators

Design pros and cons

• Lower module coupling = good design
  – More simultaneous developers
  – Easier to maintain
• Self interest vs. Common good
• Causality – which components are responsible for a good or bad decision?
• How do we analyze and improve our global performance?

Current Analysis Methods

• Run offline simulations and tweak components to optimize profit
  – CPU intensive (1 hour per game)
  – Statistical significance => many games
  – Competition is limited
  – Causal analysis is complicated

New Analysis Methods

• What if we could measure performance of components inside of the agent?
  – We could directly compare performance between two components of the same type against the same TAC market dataset
  – We could reduce the number of games required to show correlations / relative performance
  – We could more rapidly determine which ‘tweaks’ actually have an effect on game outcome
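
One way to read the first two points: if two variants of a component are run against the same replayed market, their profits can be compared game by game, and the market noise they share drops out of the difference. The Python sketch below uses invented profit figures and a hypothetical helper, `paired_summary`; it is an illustration of the idea, not MinneTAC code.

```python
from math import sqrt
from statistics import mean, stdev

def paired_summary(profits_a, profits_b):
    """Mean per-game profit difference (A - B) and its standard error."""
    if len(profits_a) != len(profits_b):
        raise ValueError("need one profit per variant for every game")
    diffs = [a - b for a, b in zip(profits_a, profits_b)]
    return mean(diffs), stdev(diffs) / sqrt(len(diffs))

# Illustrative numbers only: profit of two variants of one component,
# each run on the same four replayed market datasets.
variant_a = [10.0, 4.0, 7.0, 1.0]
variant_b = [8.0, 3.0, 6.0, 0.0]

mean_diff, std_err = paired_summary(variant_a, variant_b)
print(mean_diff, std_err)
```

In this toy data the raw profits swing widely from market to market, while the per-game differences barely move, which is why a paired design may need fewer games to show a relative effect.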

Challenges of Measuring

• Which metrics are actually correlated with profit?
• How do we assign sharing of credit or blame?
• How do we account for the varying market conditions while taking measurements over multiple games?
• How do we simulate various competitive environments offline?

Methodology - Overview

• Controlling the market conditions
  – Control Randomness
  – Control market supply / demand situation
• Measuring component performance
  – Create metrics
  – Determine if metric is correlated with profit
  – Assign component responsibility

Controlling Randomness

• Re-design server to allow deterministic / replayable games
• Three types of random processes:
  – Server variables (customer/supplier)
  – Agent-dependent variables
  – Dummy agent variables
• Each process gets its own seed
  – Eliminates race conditions in replays
  – Allows some processes true randomness while others replay
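
Per-process seeding could look something like the following Python sketch. The process names and the `make_streams` helper are assumptions made for illustration, not the actual TAC server design. The point is only that each random process draws from its own generator, so one process can be replayed exactly while another is left free to vary.

```python
import random

# Hypothetical labels for the three kinds of random process.
PROCESSES = ("server", "agent_dependent", "dummy_agent")

def make_streams(seeds):
    """Build one independent generator per random process.

    An integer seed makes that process replayable; a seed of None lets
    Python seed the generator from an unpredictable source instead.
    """
    return {name: random.Random(seeds.get(name)) for name in PROCESSES}

def customer_demand(streams, days):
    """Draw a toy daily demand series from the server stream only."""
    rng = streams["server"]
    return [rng.randint(80, 320) for _ in range(days)]

# Replay the server process while letting the dummy agents vary between runs.
seeds = {"server": 2006, "agent_dependent": 7, "dummy_agent": None}
run_1 = make_streams(seeds)
run_2 = make_streams(seeds)

run_1["dummy_agent"].random()  # extra draws on a different stream
demand_1 = customer_demand(run_1, days=5)
demand_2 = customer_demand(run_2, days=5)
print(demand_1 == demand_2)    # the server stream is unaffected by them
```

With a single shared generator, the values each process sees would depend on the order in which the processes happened to draw, which is one plausible reading of the "race conditions" the slide mentions.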

Market Manipulation Agents

• Goal – develop a way of manipulating supply and demand conditions during a simulation to observe how competitive agents respond
• Method – Build TAC agents that are not concerned with their own profit, but rather with absorbing/releasing market share

Market Manipulation Agents

• Market Relief Agent
  – Accepts and fulfils no customer RFQs
  – Purchases no parts from suppliers
  – Result: Reduces demand on suppliers and reduces supply to customers
• Market Pressure Agent
  – Makes more promises to customers than a regular agent could handle
  – Buys more parts from suppliers than a regular agent should
  – Result: Increases demand on suppliers and causes customer demand to go down

Measuring Component Performance

• Create suite of metrics to measure:
  – Replacement costs when a part is sold
  – Storage costs of parts/computers
  – Late penalties
  – Wasted production cycles
  – Remaining inventory at end of game
  – ...

Measuring Causality

• How do we assign responsibility?
• For example: Why was the item late?
  • Didn’t ship the product?
  • Didn’t make the product?
  • Didn’t have the parts?
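
The late-order question can be framed as walking the chain backwards from shipping to procurement and blaming the first stage that had what it needed but did not act. The Python sketch below is a hedged illustration: the `OrderTrace` fields and the single-cause blame rule are assumptions, not the MinneTAC metrics.

```python
from dataclasses import dataclass

@dataclass
class OrderTrace:
    """Hypothetical per-order facts as of the due date."""
    parts_available: bool   # were the parts in stock in time?
    product_built: bool     # was the computer assembled in time?
    product_shipped: bool   # was it shipped by the due date?

def blame_for_late_order(trace):
    """Return the component held responsible, or None if the order shipped."""
    if trace.product_shipped:
        return None
    if trace.product_built:
        return "Shipping"      # product existed but was not shipped
    if trace.parts_available:
        return "Production"    # parts existed but the product was not built
    return "Procurement"       # there was nothing to build with

print(blame_for_late_order(OrderTrace(True, False, False)))
```

A rule this simple assigns all blame to one component. It ignores shared responsibility, for example a Sales component that promised more than the others could deliver, which is exactly the credit-sharing problem raised earlier.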

Implementing Metrics

• Allow for easy creation of new metrics
  – Serialize game information
  – Evaluations can then be made offline
  – Enables us to experiment in finding metrics that are correlated with profit
• But how do we even know if a metric is correlated with profit?
  – Large amount of variability in each game
  – Need a large sample size, which takes time
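
Checking whether a metric tracks profit can start with a plain correlation over serialized game records. The Python sketch below assumes each game was saved as one JSON object per line with the hypothetical fields `late_penalties` and `profit`; the storage format and the numbers are invented for illustration.

```python
import json
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)

# Stand-in for a file of serialized games (one JSON record per line).
serialized_games = """\
{"late_penalties": 1.0, "profit": 9.0}
{"late_penalties": 2.0, "profit": 7.0}
{"late_penalties": 3.0, "profit": 8.0}
{"late_penalties": 4.0, "profit": 4.0}
"""

games = [json.loads(line) for line in serialized_games.splitlines()]
r = pearson([g["late_penalties"] for g in games],
            [g["profit"] for g in games])
print(round(r, 3))
```

Because the records are stored, a new candidate metric can be scored against old games without replaying them. With only a handful of games the coefficient is itself very noisy, which matches the slide's point that a large sample is needed.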

Results to date

• We have some preliminary data regarding how the manipulation agents cause the other agents to behave under various market conditions

Performance Results: Market Relief Agent vs Dummy Agents

[Chart: Cost of Materials Per Order vs #MRAs. x-axis: #MRAs (1 to 5); y-axis: Cost per Order ($), 0 to 25,000]

[Chart: Revenue per order shipped vs #MRAs. x-axis: #MRAs (1 to 5); y-axis: Revenue per order shipped, 24,100 to 24,800. Note the scale of this graph.]

Performance Results: Market Relief Agent vs Dummy Agents

[Chart: Material Cost, Profit & Fees vs #MRAs. x-axis: #MRAs (1 to 5); y-axis: Material Cost & Profit, 90,000,000 to 125,000,000; series: Fees & Interest, Total Avg Profit, Total Avg Material Cost]

[Chart: Customer Orders Accepted vs #MRAs. x-axis: #MRAs (1 to 5); y-axis: # Customer Orders, 4,000 to 6,500; series: Missed Orders, Late Orders, On time Orders]

• Unexpected benefits!
  – MRAs can reveal undesirable traits/logic flaws in an agent

Performance Results: Market Pressure Agent vs MinneTAC

Performance Results: Market Pressure Agent vs Competition

Conclusions

• We’ve created some new tools for measuring offline performance
  – Replayable games
  – Market Condition Manipulation
  – Embedded Metrics Collection
• Started choosing which metrics contain information that allows profit prediction

Future Work

• Improve Market Manipulation agents
  – Make competition modeling more realistic
• Find additional metrics that have a better correlation to overall profit
  – Better off-line prediction of on-line performance
• Use metrics to guide development of better components
  – Leads to better profit performance [build to the metric]
• Use on-line metrics to make live strategic decisions
  – Live ‘tuning’ of components if they begin to underperform
  – Selection of ‘pinch-hitter’ components in certain market conditions

Acknowledgement / Info

Special thanks to:
  – Eric Sodomka
  – Dr. Maria Gini
  – Dr. John Collins
  – UMN TAC team

More Info at
  – MinneTAC website
    • www.cs.umn.edu/tac
  – SICS website
    • www.sics.se/tac
