Tax Collections Optimization for New York Statedmkd.cs.vt.edu/papers/Interfaces12.pdf · Miller et al.: Tax Collections Optimization for New York State 76 Interfaces 42(1), pp. 74–84,

Vol. 42, No. 1, January–February 2012, pp. 74–84ISSN 0092-2102 (print) � ISSN 1526-551X (online) http://dx.doi.org/10.1287/inte.1110.0618

© 2012 INFORMS

THE FRANZ EDELMAN AWARDAchievement in Operations Research

Tax Collections Optimization for New York State

Gerard Miller, Melissa Weatherwax, Timothy GardinierDepartment of Taxation and Finance, State of New York, Albany, New York 12227

{[email protected], [email protected], [email protected]}

Naoki Abe, Prem Melville, Cezar Pendus, David JensenIBM Research, Yorktown Heights, New York 10598

{[email protected], [email protected], [email protected], [email protected]}

Chandan K. ReddyDepartment of Computer Science, Wayne State University, Detroit, Michigan 48202,

[email protected]

Vince Thomas, James Bennett, Gary Anderson, Brent CooleyGlobal Business Services, IBM Corporation, Armonk, New York 10504

{[email protected], [email protected], [email protected], [email protected]}

The New York State Department of Taxation and Finance (NYS DTF) collects over $1 billion annually in assesseddelinquent taxes. The mission of DTF’s Collections and Civil Enforcement Division (CCED) is to increase col-lections, but to do so in a manner that respects the rights of citizens, by taking actions commensurate with eachdebtor’s situation. CCED must accomplish this in an environment with limited resources. In a collaborativework, NYS DTF, IBM Research, and IBM Global Business Services developed a novel tax collection optimizationsolution to address this challenge. The operations research-based solution combines data analytics and optimiza-tion using the unifying framework of constrained Markov decision processes (C-MDP). The system optimizesthe collection actions of agents with respect to maximizing long-term returns, while taking into account the com-plex dependencies among business needs, resources, and legal constraints. It generates a customized collectionspolicy instead of broad-brush rules, thereby improving both the efficiency and adaptiveness of the collectionsprocess. It also enhances and improves the tax agency’s ability to administer taxes equitably across the broadscope of individual taxpayers’ situations. The system became operational in December 2009; from 2009 to 2010,New York State increased its collections from delinquent revenue by $83 million (8 percent) using the same setof resources. Given a typical annual increase of 2 to 4 percent, the system’s expected benefit is approximately$120 to $150 million over a period of three years, far exceeding the initial target of $90 million.

Key words : dynamic programming; tax policy; decision support systems; data analysis.

The New York State Department of Taxation andFinance (NYS DTF) collects over $1 billion annu-

ally in assessed delinquent taxes. The mission ofDTF’s Collections and Civil Enforcement Division(CCED) is to increase collections, but to do so in amanner that respects the rights of citizens by takingactions commensurate with each debtor’s situation.

CCED must accomplish this in an environment withlimited resources.

New York State has always been aggressive inapplying technology to facilitate collections; however,critical resource decisions about “who will work thecases and when” have essentially relied on manualrules based on gut decisions. The result was a

74

INFORMS

holds

copyrightto

this

article

and

distrib

uted

this

copy

asa

courtesy

tothe

author(s).

Add

ition

alinform

ation,

includ

ingrig

htsan

dpe

rmission

policies,

isav

ailableat

http://journa

ls.in

form

s.org/.

Miller et al.: Tax Collections Optimization for New York StateInterfaces 42(1), pp. 74–84, © 2012 INFORMS 75

one-size-fits-all approach that was both inefficientand inequitable in treating tax debtors. NYS DTFsought a more flexible decision model that wouldcustomize its collection activities to an individualdebtor’s situation.

In a collaborative work, NYS DTF, IBM Research,and IBM Global Business Services developed a noveltax collections optimization solution to meet this chal-lenge. The solution combines data analytics and opti-mization using the unifying framework of constrainedMarkov decision processes (C-MDP). It optimizescollection actions to maximize long-term returns,while considering other complex dependencies.

The resulting system has some notable character-istics. Via the tight coupling of data analytics andoptimization in the C-MDP framework, it realizesa level of decision automation that had never beenachieved. Because the optimization process is guidedby data analytics, the resulting collection policy canadapt to changes in the environment. It also opti-mizes the collection rules, driving long-term returns,which are critical in this domain because all collec-tions actions (e.g., warrant as prerequisites to levies)do not show immediate rewards. This aspect canfree management from relying on manual, sequen-tial workflow rules, which have hindered the opti-mization of collections processes to date. Althoughthe sequential dependencies between various collec-tions stages are automatically discovered, the systemalso accepts hard constraints, which are expressedas rules, for use in its optimization process. Hence,the system brings together three principal elementsof operations research (OR)—analytics, optimization,and rules—in a novel, coherent, and effective man-ner, and simultaneously attains their correspondingbenefits—adaptability, optimality, and practicality.

The system became operational in December 2009,and the results to date are compelling. New YorkState increased its delinquent revenues by $83 million(8 percent) using the same set of resources. Assum-ing CCED’s typical annual increase of 2 to 4 percent,the expected additional benefits from the case identi-fication and selection system (CISS) are $120 to $150million in additional tax revenues over the next threeyears. This far exceeds the initial target of $99 mil-lion. We expect revenue increases to improve as thesystem adapts to environmental changes. In addition,

we have seen an increase in the productivity of ourprimary enforcement actions and a decrease in theneed to use them. The use of detailed characteris-tics to make informed decisions about individual taxdebtors allows us to treat taxpayers as individuals.The result has been both an increase in revenues anda decrease in the number of enforcement actions—asituation in which all parties win.

ChallengesThe NYS Tax Department has always stayed abreastof and taken advantage of new technologies thatcould improve its collections efficiency. In 2005, thedepartment started discussions with IBM about anOR project, which we called the Case Identifica-tion and Selection System (CISS). During the previ-ous 10 years, the tax department had implementednumerous changes to modernize its operations; how-ever, problems still existed. To understand the valuethat OR has added through CISS, an understandingof the background and nature of the task of collectingdelinquent taxes in New York State is important.

Tax collection in New York State, much like every-thing else, is made more complex by the numberand diversity of its citizens. Tax debtors come in allshapes, sizes, and levels of sophistication; some aremore willing to pay their taxes than others. Collec-tion techniques appropriate for collecting a debt froma small family-owned business are unlikely to be thebest techniques for collecting taxes from a Fortune 500company; however, both are in our debtor inventory.In addition, in New York, unlike some other states,the collections division is responsible for collectingsome 40 state and local taxes.

The laws dictating how tax debts are collected inNew York State are an additional hurdle. The statemust first file an administrative lien, also known as atax warrant or judgment, before any levy, wage gar-nishment, or seizure of other assets is allowed. Thisrequirement made it necessary for our OR solutionto address not only what was the best decision at apoint in time, but to anticipate the future impact thata decision would have (i.e., a levy requires a priorwarrant).

In the decade prior to 2005, CCED operated asa field-based collections operation that relied almost

INFORMS

holds

copyrightto

this

article

and

distrib

uted

this

copy

asa

courtesy

tothe

author(s).

Add

ition

alinform

ation,

includ

ingrig

htsan

dpe

rmission

policies,

isav

ailableat

http://journa

ls.in

form

s.org/.

Miller et al.: Tax Collections Optimization for New York State76 Interfaces 42(1), pp. 74–84, © 2012 INFORMS

exclusively on effective and costly face-to-face con-tact with tax debtors at their homes or businesses.By 2005, CCED had centralized its operations arounda state-of-the-art call center and a reduced fieldstaff that focused primarily on business tax debts. Itreduced its work force by 30 percent and significantlyshifted its staff from field to call center functions.These adjustments and the corresponding automa-tion of many tasks resulted in an increase in collec-tions revenue to twice what it had been in 1995 withno significant increase in the amount available foractive collections. In 1995, CCED collected $500 mil-lion; by 2005, its collections were just under $1 billionalthough it had reduced its staff by 30 percent.

The success story was good; however, problemsstill existed. As CCED evaluated where it had comefrom and where it wanted to go, one fact becameclear. Decisions on what collections path a case shouldtake were still being made based on crude formulasdeveloped based on that most basic analytical tool—the gut. Initial routing choices—should a particularcase be worked by call center staff, field staff, spe-cialized units, or not at all—were based on broad lin-ear statements such as “business tax cases over $xwill go to the field.” The problems with this are obvi-ous. Although some business tax cases over $x arebest suited for the field, many could be collected suc-cessfully by telephone. The cost of a telephone call isdramatically less than a field visit. Additionally, casesstayed in the call center too long and were not beingassigned to the field fast enough. By the time a casewas assigned to the field, the taxpayer was often outof business and the debt no longer collectible.

This one-size-fits-all approach was also problematicwhen looked at relative to the collection cycle withinthe call center. Once assigned to call center inventory,all cases followed a linear path of allowable actionsthat were based on law and policy. All cases startedwith an initial telephone contact in which we wouldtry to come to an agreement with the taxpayer toavoid any civil enforcement actions. If left unresolved,the case proceeded to warrant and, if still unresolved,to levy or wage garnishment based on the availabil-ity of those sources. All tax debtors were treated thesame in this process. Decisions to take action werebased on what we could do. Via the magic of OR,

CISS now allows us to make decisions based on whatwe should do.

Let’s look at that last statement: decisions basedon what we should do. Why is that so important?In evaluating the process that we describe above, doesit make sense to treat all tax debtors the same? Notonly was our linear, gut-driven process not maximiz-ing revenue, it was also unfair to taxpayers. If thetax debtor is a chronic delinquent who has requiredenforcement action in the past, is a telephone calllikely to result in payment? Conversely, if this is thefirst infraction for a tax debtor who has a long historyof voluntary compliance, should we not take extraefforts to contact the debtor before we proceed withwarrant and levy procedures?

We needed a way to make decisions based on theindividual characteristics of each tax debtor. The lin-ear one-size-fits-all approach was both inefficient andunfair; there had to be a better way. How could wepossibly find a way to do this on an inventory ofmore than a million tax debtors, all with their ownindividual stories? Our answer was OR and businessanalytics.

Solution

Solutions ConsideredBecause of the complexity of the collection process,and particularly because of the legal and businessconstraints, following rigid manually constructedrules to guide collection activities is common practice.Even in state-of-the-art rule-based collection systems,the role of data analytics is typically limited to thatof augmenting the rules engine with scores that canbe referenced by the rules. However, existing systemsbased on manually constructed rules suffer from theshortcoming that the rules do not adapt to changes inthe environment, including economic and organiza-tional changes. Hence, the need for a more adaptivealternative approach has been recognized in the field.

An alternative to such a rigid manual rule systemis a mathematical model that describes the complexcollections process, applies analytics to estimate it,and then optimizes to arrive at a collections pol-icy within the class of policies describable in themodel. The MDP provides one such framework witha rich-enough vocabulary to describe the complexity

INFORMS

holds

copyrightto

this

article

and

distrib

uted

this

copy

asa

courtesy

tothe

author(s).

Add

ition

alinform

ation,

includ

ingrig

htsan

dpe

rmission

policies,

isav

ailableat

http://journa

ls.in

form

s.org/.


involved in a collections process (e.g., the presence ofdifferent stages and the intricate interactions amongthem). An earlier work at General Electric appliedan MDP-based approach to the problem of manag-ing consumer credit delinquency (Makuch et al. 1992).However, this work was based primarily on opti-mization techniques and did not integrate data ana-lytics and optimization to the extent that we didin the present work. Specifically, the states of theMDP were fixed a priori, and the transition prob-abilities among them were estimated prior to theoptimization process. Today, with the abundance ofhistorical data in digital form and with advances indata mining and analytics technology, the businessrequirements and expectations demand a collectionsmanagement system that can customize enforcementactions depending on a diverse set of individualtaxpayer characteristics. Working with a small fixednumber of taxpayer states is insufficient for this.

As we discuss in this paper, we developed anovel tax collections optimization solution that usesa unique combination of data analytics and optimiza-tion techniques in a unified manner, based on theframework of C-MDP. The system optimizes the col-lection actions of agents with respect to maximizinglong-term returns, while taking into account the com-plex dependencies among business needs, resources,and legal constraints.

With C-MDP and the associated methodology ofconstrained reinforcement learning (C-RL), the sys-tem realizes a level of decision automation on a scalethat had never been achieved in real-world businessoptimization applications. In addition, to deal withthe complex constraints and dependencies that havehindered the automation of collections decision pro-cesses, it also accepts constraints expressed as rules.

Contacttaxpayer by

phoneAssessmentinitiated

No responsefrom taxpayer

Assigned tocall center Payment

Installpayment

Availableto warrant

Issue warrantAvailable

to levy

Complete

Find financialsource

Figure 1: The graphic illustrates an MDP formulation of the collections process.

Thus, analytics, optimization, and rules are broughttogether in a coherent manner to provide a practicaland viable solution.

Solution DescriptionIn this subsection, we describe our solution in detail.The description is at a conceptual level that draws onconcrete examples. We refer the reader to Abe et al.(2010) for the technical details.

As we mention above, we use the MDP framework(Puterman 1994, Tsitsiklis and Van Roy 1997) to for-mulate the process of tax collections. Intuitively, thestates of the MDP represent the stages in the collec-tion process that respective taxpayers’ cases undergo.The actions are the collections actions taken by DTFfor those cases, and the rewards are the dollars col-lected, optionally reflecting other measures of com-pliance (see Figure 1). We assume that the taxpayersmake transitions through this state space representingthe collections process, and their transition probabil-ities and their rewards will naturally depend on thechoice of the collections actions. The goal of our solu-tion is to find a near-optimal policy that determinesthe appropriate choice of collections actions to take,depending on the taxpayer’s current state.

An important point worth emphasizing is thatalthough Figure 1 appears to show relatively fewstates, which are fixed a priori, this is not the case.We elect to represent the state space by a high-dimensional feature space that consists of hundredsof features summarizing each taxpayer’s history,including transactions and collection actions taken.We used approximately 200 features in the deployedsystem; however, the system can accommodate a vari-able number of features. Partitions (segments) of this

INFORMS

holds

copyrightto

this

article

and

distrib

uted

this

copy

asa

courtesy

tothe

author(s).

Add

ition

alinform

ation,

includ

ingrig

htsan

dpe

rmission

policies,

isav

ailableat

http://journa

ls.in

form

s.org/.


feature space are automatically discovered within ourmodeling and optimization procedure.

When we can reasonably assume that the full struc-ture of the MDP is known or is estimated a priori(i.e., we can reliably estimate the state set and thetransition and expected reward functions), then well-known methods (e.g., the value iteration procedure)exist to solve for an optimal policy in the MDP. How-ever, we emphasize that this assumption is not rea-sonable for a large-scale real-world problem such asthe one we have at hand. Given that we have ahigh-dimensional feature space representing the statespace, reliably estimating the structure of the MDP isa formidable task. Therefore, we turn to the approachknown as RL (i.e., approximate dynamic program-ming) or the collection of methods for learning a near-optimal policy in an MDP, given access only to thedata generated from the MDP, not the MDP structureitself (Kaelbling et al. 1996, Puterman 1994, Suttonand Barto 1998, Tsitsiklis and Van Roy 1997).

Another important aspect of our application is theexistence of various constraints that must be respectedin our search for an optimal collections policy. Sev-eral types of constraints are involved. The first cate-gory consists of resource constraints that are imposedbecause some collections actions consume physicalresources. Other constraints might not be resourcerelated; these include bounds on the number of cer-tain types of actions that are considered reasonablein a given period. The second major category of con-straints, called action constraints, are those that areimposed by business or legal considerations.

The foregoing considerations motivate our solutionto be based on a constrained version of RL method-ology (C-RL). Although the literature includes a con-siderable amount of prior work in the area of C-MDP(Altman 1992), less is known in the area of C-RL.Existing methods within the C-MDP literature, whichare primarily based on optimization, are not well-suited to our current formulation, because they donot readily render themselves to be combined effec-tively with statistical data modeling, except throughdirect estimation of the C-MDP structure separatelyfrom the optimization process. In this work, we electan integrated approach in which data analytics andoptimization are tightly coupled and applied in aniterative fashion.

Figure 3 (see the appendix) provides an informaldescription of how the generic method works. Thekey lies in the constrained version of Bellman’s equa-tion, shown in the center of the figure, for estimatingthe expected long-term reward as a function of stateand action. The concept is that the long-term reward(R on the left side) equals the sum of the immediatereward and the long-term reward expected from thenext state.

As in the original value iteration procedure for solv-ing Bellman’s equation, we iterate this process eachtime by updating the value of R on the left sideusing the estimated value of R from the previousiteration on the right side. Note that the equality cor-responds to regression or estimation, whereas max-imization corresponds to optimization, both over alarge population. In our method, we do the estimationstep using a scalable procedure for segmented lin-ear regression; in each segment, R is estimated usingregression in terms of action variables (Apte et al.2001, Natarajan and Pednault 2002). Thus, the proce-dure discovers segments in the feature space, each ofwhich is relatively uniform with respect to the effectsof actions. When this process is completed, the out-put (a set of action effectiveness coefficients per seg-ment) is entered into the optimization process, whichdetermines a policy that allocates actions to maximizeR over the population while respecting various con-straints. This iterative procedure, consisting of ana-lytics and optimization, is repeated many times; eachtime, it performs further “looks ahead” and improvesthe quality of estimation and optimization.

The actual algorithm we deploy is more involved;it includes steps to normalize R in its iterative process,which brings out the relative effects of competingactions on R. However, the basic concept is capturedin this intuitive account. Abe et al. (2010) providedetails.

The quality of the models improves through thisiterative process. This includes the segmentation. Thenumber of segments is generally different in each iter-ation, and the quality of segmentation improves asthe iterations proceed, as do the overall action alloca-tions. For example, note that the segmentation in theinitial iteration is optimized with respect to the esti-mation of immediate rewards, which is clearly sub-optimal for estimating the long-term reward. This is

INFORMS

holds

copyrightto

this

article

and

distrib

uted

this

copy

asa

courtesy

tothe

author(s).

Add

ition

alinform

ation,

includ

ingrig

htsan

dpe

rmission

policies,

isav

ailableat

http://journa

ls.in

form

s.org/.


also confirmed in empirical evaluation using multipledata sets, as Abe et al. (2010) discuss.

The discovered segments and the action allocationsgiven to them by the constrained optimization pro-cedure form the collections rules. Note also that theaction constraints can involve complex definitions;hence, representing them as rules is natural. Thus, thedeveloped system can be viewed as an automatic rulegeneration system, which receives as input action con-straint rules (and training data), invokes analytics andoptimization, and generates collections rules. Both theinput and output rules of the engine can be housedin a rules management system, with the engine pro-viding an automatic rule generation capability.

Another key advantage of the engine is its focus onmaximizing long-term rewards. By using the C-MDPframework, the rules it generates are optimized withrespect to the long-term return, which is critical inthe collections domain because all collections actionsdo not enjoy immediate rewards (e.g., warrants asprerequisites to levy, as noted earlier), and lookingahead is necessary to fairly assess the value of suchactions. Rules optimized from a myopic viewpoint(i.e., with respect to immediate reward maximization)would not be suitable. Although the generated rulesare optimized with respect to long-term rewards, theyare instance-in-time rules that specify which actionsare to be taken now, and they can be executed in arules engine. This is an added advantage of our MDPformulation.

Scalability is another strength of our solution. As itscomponents do, it uses a scalable segmented regres-sion engine (Apte et al. 2001, Natarajan and Ped-nault 2002) and a fast linear program solver (COIN-OR), both of which were developed primarily at IBMResearch. These modules are incorporated in a uni-fying framework of C-RL; the solution provides ascalable system that is well suited for real-worlddeployments.

DeploymentIn this section, we give some concrete examples ofthe key artifacts in the system deployed at NYSDTF. These include actions, resource constraints, statespace (modeling) features, legal and business con-straints, output segments, and action allocations.

Actions and Resource ConstraintsIn the legacy system, the overall collections pro-cess is guided by the notion of case assignment.We have accordingly designed the set of actions: weconsider direct collections actions that can be takenat an assigned organization, and indirect or move-ment actions that move cases from one organizationto another. Table 1 lists the set of actions being con-sidered. The first and third groups consist of directactions, and the second group includes the movementactions. The third group (i.e., perform field visit) isspecial in that it can only be performed in districtoffices.

Resources are equated to the person-hours avail-able in various organizations for performing the col-lections actions being considered. For each action thatwe consider, we estimate the expected amount of timerequired to perform it based on historical data (seeTable 1, column 2); we use these hours in conjunctionwith the person-hours in organizations to determinethe resource constraints. Note that direct actions con-sume resources from the organization to which thecase is assigned currently, whereas indirect actions,in and of themselves, do not consume any resources.However, they will incur resources in their futureorganizations; we consider this in evaluating the look-ahead mechanism inherent in the C-MDP framework.Additionally, we provide a form of constraints to

Action Hours Upper bound

Collections actionsContact taxpayer by mail 0001 51000Contact taxpayer by telephone 0014 21000Create warrant 0001 51000Create income execution 0001 10Create levy 0009 51000

Movement actionsMove to district office 0 51910Move to high-value team 0 330Move to collection vendors 0 340Move to indiv. case enforcement 0 100

Organization-specific actionPerform field visit 00625 51000

No actionTake no action 0 1001000

Table 1: The table illustrates collections actions being considered, includ-ing the hours required to perform them and upper bounds on their alloca-tions specified in hours per day.

INFORMS

holds

copyrightto

this

article

and

distrib

uted

this

copy

asa

courtesy

tothe

author(s).

Add

ition

alinform

ation,

includ

ingrig

htsan

dpe

rmission

policies,

isav

ailableat

http://journa

ls.in

form

s.org/.


explicitly control the number of allocations of actionsthat we call upper bounds. Because there is one uni-fied action (i.e., move to district office) for movinga case to any district office, unlike other specializedorganizations, we allow for an additional type ofbound per district office.

Modeling FeaturesGiven data consisting of taxpayer background infor-mation, complete taxpayer history of transactions(i.e., payments) and actions (e.g., contact and collec-tions actions) taken onto them by the DTF, we gen-erate a time-stamped sequence of feature vectors atmultiple sampling (or evaluation) time steps for eachtaxpayer case; we use these as training data in theC-RL procedure. In the deployed system, we usedapproximately 200 modeling features. Below, we listsome examples that we have grouped into categoriesbased on their features.

(1) Taxpayer: number of nonrestricted financialsources, sales tax inactive indicator, number ofbankruptcy filings;

(2) liability: total liability balance, sum of collectibleassessments, sum of assessments available to warrant;

(3) transactional: tax paid previous year, number ofpayments since last action, number of payments todate, sum of payments during previous year;

(4) collections: number of open perfected warrants,days since last warrant perfected.

Legal and Business ConstraintsIn addition to the modeling features, the engineuses a group of features called action constraints—binary features specifying, for each action considered,whether the action is currently allowed on the case,according to the business and legal rules. The genera-tion of these features is done by entering the modelingfeature vectors into the rules engine, which contains acatalog of approximately 300 business and legal rulesthat users have constructed (see Table 2).

Output Segments and AllocationsTable 3 shows an example segment output by theengine. Some interesting observations can be made.Notice that this segment (i.e., Segment 212), for whichmany warrant actions are allocated, includes the con-dition that the “number of nonrestricted financial

Rule no. Rule contents

Contact rule502.12 A collection letter should not be sent to a taxpayer

whose mailing address is invalid.2,000.1 A contact action should only occur for a taxpayer with

at least one open mature assessment.2,005.9 A contact by mail must not be made for a taxpayer with

an active promise to pay within 30 days.Levy rule

2,601 A levy is not allowed for a taxpayer unless the taxpayerhas at least one perfected warrant.

Table 2: The table shows samples of action constraint rules.

sources” is at least 1. This is one of the condi-tions necessary for a levy; its inclusion for a seg-ment for which warrant is recommended suggeststhat the look-ahead mechanism of our RL methodis working properly. Also note that “state equalscall_center_not_warranted," which means that thecase has not been warranted.

Deployment ArchitectureThe architecture of the deployed system, CISS for Col-lections, is schematically depicted in Figure 2. Theoverall solution is deployed and executed within aWebSphere process server (WPS) application server.Data sources include the collections data, tax returndata, and additional external data (indicated as Sys-tem 1, System 2, and System 3 in the figure), whichare used to compute summary information about thetaxpayers; this information is stored as an intermedi-ate representation referred to as taxpayer profiles. Thetaxpayer profile exists in a DB2 database (see TP pro-file in Figure 2) as a single XML document for eachtaxpayer. The profile records the history of the tax-payer states using the temporal data model approach

Segment definition Action allocation

Segment 212state = call_center_not_warranted 5,103 crt_warrantand tax_paid_last_year ≤ $X 152 no_actionand 1 ≤ number_nonrestricted_financial_sourcesand 1 ≤ sales_tax_inactive_indicatorand number_payments_since_last_action ≤ 1etc 0 0 0

Table 3: The table illustrates an example of an output model segment.

INFORMS

holds

copyrightto

this

article

and

distrib

uted

this

copy

asa

courtesy

tothe

author(s).

Add

ition

alinform

ation,

includ

ingrig

htsan

dpe

rmission

policies,

isav

ailableat

http://journa

ls.in

form

s.org/.


Figure 2: The diagram shows the collections system architecture.

suggested by Wang and Zaniolo (2004), using transac-tional semantics with a granularity of one day. A sim-ple transform can generate the taxpayer’s state as itwas on any particular day.

Numerous legacy systems generate events, whichare passed to the engine by a separate workflowprocess that specifies how events flow through theengine. Each event has an XSLT2 (i.e., a language fortransforming XML documents into other XML doc-uments) transform that updates the taxpayer profilewith the new information. Each week, taxpayers whohave had an event, and those whose next review willbecome due that week, are considered for scoring.They pass through a number of transforms that checktheir suitability for scoring, apply prescriptive rules,and generate the state features, thus providing thescoring data. An analogous process is used to gener-ate the training data of the C-RL procedure, exceptin the latter case in which a sequence of state fea-ture vectors per taxpayer is generated at a numberof evaluation points in the past. The two types ofstate feature data are indicated as taxpayer state (cur-rent) and taxpayer state (training data), respectively,in Figure 2. Taxpayer profiles are computed for overtwo million taxpayers, typically as training data for

modeling. We use feature data for a subset of them—approximately a few hundred thousand taxpayers,each with five to 10 state feature vectors; on average,they total approximately one million data records.

The overall CISS for Collections solution includesbusiness process management (BPM) technology formonitoring purposes. The automation and humanworkflow components of the WPS are the corner-stone of the BPM aspects of the CISS for Collec-tions solution. These components help orchestrateover 30,000 automated collections actions and deliverover 8,000 pieces of collection work requiring humaninvolvement each week.

BenefitsIncreasing revenue from collections on outstandingassessed tax liabilities is our project’s primary goal.At the end of the first year, we clearly achievedthis goal and exceeded expectations. The projectedincrease in revenue for CISS was $99 million tobe achieved over three years. As previously noted,CCED implemented the CISS system in December2009. A comparison of CCED revenue in 2010 (afterthe CISS implementation) and in 2009 (prior toCISS implementation) shows an increase of $83 mil-lion (8.22 percent). CCED has averaged year-to-year

INFORMS

holds

copyrightto

this

article

and

distrib

uted

this

copy

asa

courtesy

tothe

author(s).

Add

ition

alinform

ation,

includ

ingrig

htsan

dpe

rmission

policies,

isav

ailableat

http://journa

ls.in

form

s.org/.


increases ranging between 2–4 percent during the pastseveral years; this places the revenue increase fromCISS in its first year in the range of $40–$50 millionalready realized, with a promise of $120–$150 millionover the three-year period originally projected.

This increase was possible despite several condi-tions that normally would have led CCED to expectreduced collections (or average collections, at best) in2010 versus 2009. Primary among these reasons wasthat 2009 had been an unusually successful year. The6 percent increase over 2008 was the largest year-to-year increase on a steady path of year-to-year pro-gression over the previous five years. CCED expectedthis because it had implemented a new program thatrequired all New York State banks to match theiraccount holders to judgment debtor files. The resultwas a bonanza of new levy sources and a significantincrease in levy collections. Levy revenue accountedfor almost one-third of the total increase in 2009.CCED anticipated that the effectiveness of the match-ing process would decrease over time. The 8 percentincrease over what had been its best year, and withoutthe flood of new sources provided by the first match-ing in 2009, indicates that some factor had changed.

CCED was also concerned about the effect the econ-omy, which was beginning to show signs of slowrecovery in mid-to-late 2009, would have on collec-tions in 2010. We knew that the most recent previousrecession in 2001 had caused no real impact in the firstyear of recovery (2002); however, it resulted in a defi-nite decrease in collections by the second year (2003),in which CCED experienced its only year-to-yeardecrease in collections in the past decade. Finally, wenote that because of budget constraints, maintainingstaffing levels was difficult in 2010. In 2009, we wereable to maintain a consistent staffing level throughoutthe year; however, in 2010, we were unable to replacestaff members as they left; our staff numbers, partic-ularly in our call center, did not return to expectedlevels until October 2010.

Because of these concerns (i.e., the prior year’sunusually high revenue, the economy, and staffingreductions), CCED had projected only a 3 percentincrease over 2009 for 2010. This projected increaseconsisted almost totally of the $30 million projectedfor CISS in its first year. The actual increase of over8 percent was a surprise. In addition to the strong

revenue numbers in spite of the obstacles noted,the improvement in dollars collected per action isanother strong indication that CISS recommendationsfor enforcement action were both superior and theroot cause of our increased revenue in 2010. Warrants,which as described previously are tax liens againstthe assets of an individual or business, and levies,which are legal notices that require a third party suchas bank to turn over funds to pay a tax debt, accountfor over $250 and $150 million, respectively, in rev-enue per year. A comparison of actions taken prior toCISS recommendations with those taken as a result ofthe recommendations shows a marked increase in rev-enue per action taken. In the case of warrants, 22 per-cent more revenue per action taken was achieved; theincrease in dollars per levy served was 11 percent.

As we noted in the Challenges section, our goalswere to both increase collections and to make deci-sions based on what we should do—not what wecould do. Actions such as warrants and levies clearlyhave serious consequences for tax debtors and impactothers who depend on those debtors. If greateramounts of revenue can be collected and CCEDreduces the need to use these disruptive collectiontools, all parties win.

It had long been a general principle that the morewarrants we did and the more levies we served, themore money we would collect. CISS forced us toreevaluate that concept. We reduced the number ofwarrants by 9 percent and the number of levies servedby 3 percent, while increasing revenue from bothin 2010. This translates to thousands of New YorkState citizens who did not have to undergo the ordealof finding out that money had been taken fromtheir bank accounts or their credit had been harmed.CISS makes the recommendation to take these actionsbased on the tax debtor’s prior history and of whatwill work now—not what action is next in a linearprocess.

The more-complex tax debtor cases require morethan a point-in-time recommendation. These casesmust be separated from those that can be workedappropriately via a series of recommendations foraction as soon as possible. They must be assigned toa field agent and are often time sensitive because thedebtor might go out of business. CISS has reducedthe average age of a case that has been assigned to a

INFORMS

holds

copyrightto

this

article

and

distrib

uted

this

copy

asa

courtesy

tothe

author(s).

Add

ition

alinform

ation,

includ

ingrig

htsan

dpe

rmission

policies,

isav

ailableat

http://journa

ls.in

form

s.org/.


field office by 9.3 percent. An evaluation of collectionsby our field operations shows this has paid off. Rev-enue from field operations increased by $32 million(12 percent) in 2010 compared to 2009.

Appendix. The Core AlgorithmIn this appendix, we briefly describe the core C-RLalgorithm used in our solution. Recall that the objec-tive associated with an MDP is to find a policy, �,that maximizes the expected cumulative reward,

E

[

�∑

t=0

R4st1�4st55

]

1

where st1 t = 01 0 0 0 1�, are the sequence of statesresulting from the repeated application of the policy.

As we mentioned earlier, the key concept behindthe class of C-RL methods we use in this work isin the constrained version of Bellman’s equation, inwhich the maximization on the right side is obtainedthrough constrained optimization rather than throughstrict maximization (see Figure 2). Loosely basedon this idea, we can derive constrained versionsfor many known RL methods, including constrainedQ-learning (Watkins 1989). Although the foregoing

Figure 3: The schematic shows a representation of a generic C-RL procedure.

discussion motivates a family of methods, not all ofthem are well suited to real-world applications. Deal-ing with variable time intervals between actions isa critical issue. Among the body of past works thataddressed the problem of extending Q-learning andother related learning methods to variable time inter-vals and continuous-time setting (Baird 1994), Baird’sadvantage updating algorithm is particularly attrac-tive and has proven effective in previous applications(Abe et al. 2004).

Advantage updating is based on the notion ofadvantage of an action a relative to the optimal actionat a given state s, written A4s1a5:

A4s1a5=1ãts

4Q4s1 a5− maxa′

Q4s1a′550 (1)

In the above equation, Q4s1a5 denotes the Q-functionor the two-place value function, and ãts denotes thetime interval between the state s and the subsequentstate. The notion of advantage is useful because it nor-malizes the dependence of the value function on thetime interval (by division by ãts), and factors out theinfluence of the state (by subtraction of maxa′ Q4s1a′5).

Given this notion of advantage, advantage updatingis an online learning method that learns this function

INFORMS

holds

copyrightto

this

article

and

distrib

uted

this

copy

asa

courtesy

tothe

author(s).

Add

ition

alinform

ation,

includ

ingrig

htsan

dpe

rmission

policies,

isav

ailableat

http://journa

ls.in

form

s.org/.


iteratively, by a coupled set of update rules for theestimates of A and V (the value function), and a nor-malization step for A4s1a51 which drives maxa′ A4s1a′5

toward zero. Although it differs from the canoni-cal Q-learning method, its central step still involveschoosing an action that maximizes the A-value esti-mate. Therefore, given the standard version of thisalgorithm, its constrained version can be derived byreplacing the maximization by the appropriate con-strained optimization, giving rise to a specific instan-tiation of the general idea described in Figure 3. Abeet al. (2010) provide details of this method.

ReferencesAbe, N., N. Verma, C. Apte, R. Schroko. 2004. Cross channel

optimized marketing by reinforcement learning. Proc. 10thSIGKDD Internat. Conf. Knowledge Discovery Data Mining, ACM,New York, 767–772.

Abe, N., P. Melville, C. Pendus, C. K. Reddy, D. L. Jensen,V. P. Thomas, J. J. Bennett, et al. 2010. Optimizing debt col-lections using constrained reinforcement learning. Proc. 16thSIGKDD Internat. Conf. Knowledge Discovery Data Mining, ACM,New York, 75–84.

Altman, E. 1992. Asymptotic properties of constrained Markovdecision processes. Technical Report RR-1598, Institut National

de Recherche en Informatique et en Automatique (INRIA),Valbonne, France.

Apte, C., E. Bibelnieks, R. Natarajan, E. Pednault, F. Tipu,D. Campbell, B. Nelson. 2001. Segmentation-based modelingfor advanced targeted marketing. Proc. 7th SIGKDD Inter-nat. Conf. Knowledge Discovery Data Mining, ACM, New York,408–413.

Baird, L. C. 1994. Reinforcement learning in continuous time:Advantage updating. Proc. Internat. Conf. Neural Networks,Orlando, FL, 2448–2453.

Kaelbling, L. P., M. L. Littman, A. W. Moore. 1996. Reinforcementlearning: A survey. J. Artificial Intelligence Res. 4(1) 237–285.

Makuch, W. M., J. L. Dodge, J. G. Ecker, D. C. Granfors, G. J. Hahn.1992. Managing consumer credit delinquency in the US econ-omy: A multi-billion dollar management science application.Interfaces 22(1) 90–109.

Natarajan, R., E. P. D. Pednault. 2002. Segmented regression estima-tors for massive data sets. Proc. 2nd SIAM Internat. Conf. DataMining, SIAM, Philadelphia, 566–582.

Puterman, M. L. 1994. Markov Decision Processes: Discrete StochasticDynamic Programming. John Wiley & Sons, Inc., New York.

Sutton, R. S., A. G. Barto. 1998. Reinforcement Learning: An Introduc-tion. MIT Press, Cambridge, MA.

Tsitsiklis, J. N., B. Van Roy. 1997. An analysis of temporal differencelearning with function approximation. IEEE Trans. AutomaticControl 42(5) 674–690.

Wang, F., C. Zaniolo. 2004. Xbit: An XML-based bitemporal datamodel. Proc. 23rd Internat. Conf. Conceptual Modeling, Shanghai,China, Springer-Verlag, Berlin, 810–824.

Watkins, C. J. C. H. 1989. Learning from delayed rewards. Ph.D.thesis, Cambridge University, Cambridge, UK.

INFORMS

holds

copyrightto

this

article

and

distrib

uted

this

copy

asa

courtesy

tothe

author(s).

Add

ition

alinform

ation,

includ

ingrig

htsan

dpe

rmission

policies,

isav

ailableat

http://journa

ls.in

form

s.org/.

Documents

Tax Collections Optimization for New York Statedmkd.cs.vt.edu/papers/Interfaces12.pdf · Miller et al.: Tax Collections Optimization for New York State 76 Interfaces 42(1), pp. 74–84,