POPULATION AND HOUSEHOLD PROJECTION METHODOLOGY FOR CITIES AND SUBREGIONS JUNE 2003 Stuart H. Sweeney Professor of Geography University of California, Santa Barbara




EXECUTIVE SUMMARY AND RECOMMENDATIONS

This document provides an overview of several alternative methodologies that could be used to project households and population for the SCAG region. The current endeavor is to produce a set of trend projections that exclude the influence of local institutional constraints or development scenarios. This report describes methods suitable for the trend extrapolation and suggests other methods that may be appropriate for the baseline forecast and plan forecast. The report also explores methods that are not feasible given currently available data sources but may be feasible for future trend projections.

The overall framework for the projections is circumscribed by several key decisions made by SCAG staff. First, the methodologies are required to be consistent with the housing unit method. This is certainly a reasonable choice since the housing unit method is essentially the state-of-the-art for small area estimates and projections. Second, all city and subregion projections are constrained to be non-negative. This constraint is motivated by political and administrative considerations. Third, vacancy rates will not be explicitly projected. This forces a major deviation from the logic of the housing unit method but the choice makes political sense. Instead, the households (occupied housing units) will be projected. Finally, as noted above, the trend extrapolations will not consider anything beyond historical trends in the data. Institutional constraints, land constraints, and build-out scenarios from general plans will not be considered in the current round of forecasts.

Though the document is wide-ranging, the constraints above force a choice among a small set of alternatives at the current stage of the planning process. The projections can be broken down into two main modules: (1) occupied housing units, and (2) persons per household. The remainder of this summary provides an overview of the alternatives under each module, with references to the detailed sections in the main document, and my recommendations.

Occupied housing unit projections

The alternatives for the occupied housing projection are related to the data inputs rather than the methodology. The methodology is a constrained extrapolation using stochastic simulation (sections 3.2 and 3.4). The input data series, as described in section 3.1, can include up to 21 observations by combining information from annual time series, either the California Department of Finance E-5 series or the U.S. Census Bureau building permit series, with enumeration-based values from the 1980, 1990, and 2000 censuses. The E-5 series seems to be the better choice (see section 3.1). The second issue is whether model parameters should be estimated using the 21 observation series for each city or whether the long-term trend reflected in the absolute change between 1980 and 2000 (two observations per city) would be more representative of the future.

Thus, the two options are:

(1) use the 21 observation series for each city, or,

(2) use only two observations, the earliest and latest values, per city.

My recommendation is option 1 (see section 3.1), but the decision should be put to the members of the FTTF.


Persons per household forecast

The PPH projection is more problematic given the tension between expectations for a strong demographic component in the methodology and the lack of suitable data to support such a methodology. The so-called ‘state-of-the-art’ for PPH projections tends to be very rudimentary at the city level. Section 4.3 describes two demographically-driven models, but both fail on different accounts: one due to data limitations and the experimental, untested nature of the methodology, the other due to significant internal validity issues with the methodology. The feasible alternatives at the current time are described in sections 4.1 and 4.2.

Again, there are two options:

(1) use the evolution of the occupied housing type mixture in conjunction with city level PPH by type values to project total PPH, or,

(2) use a constrained trend extrapolation of the E-5 PPH values with bounds determined by expert opinion, currently [1.2, 5.5].

My recommendation is neutral in this case. The decision should be made after general discussion by the FTTF.


1. INTRODUCTION

This document provides a description of alternative projection methodologies that will be used to produce a final set of trend population and household forecasts at the small area level for the SCAG region. The base year for the projection is 2000 and the target year is 2030. To start, we should note that we assign distinct meanings to the terms projection and forecast, following Isserman (1991). The term projection refers to conditional if-then outcomes. That is, a projected value is simply the logical extension of assumptions used in a model. The sections below discuss several alternative projection models for each of the components of the housing unit projection method. The term forecast refers to the one projection model, or equivalently one set of assumptions, that the forecaster expects will be the most representative over the forecast period; in this case 2000 to 2030. Whereas the projections are logical extensions of a set of assumptions, forecasts reflect the expert opinion of the forecasting team. The trend forecast of population and housing will be derived from the subset of projection models that seem most appropriate given the weight of both empirical and contextual evaluation criteria.

The nature of the constraints imposed on the projections process is worth noting at the outset. There are three distinct domains that impose limits on the projections: (1) political/administrative, (2) data availability/quality, and (3) methodological. The first domain is concerned with the expedience of administrative process and the anticipated political salience of model assumptions and results among constituency groups. The second and third domains are traditional concerns in any model building exercise. In the small area context, the data availability and data quality concerns are severe. Each of the three domains imposes fundamental limits on the process with commensurate impacts on the resulting projected values.

Another overarching consideration is the decision by SCAG staff to partition projections into a trend projection, a baseline forecast, and a plan forecast. Some of the elements below would only be considered in the baseline forecast and plan forecast. They are included here to provide a complete statement of the proposed methodology.

Lastly, the objective in the current round of trend projections is twofold. On the one hand, the current exercise needs to produce a trend projection to be reviewed by local jurisdictions and subregions. On the other hand, the methodological developments may not be feasible in the current round due to data limitations, political constraints, or simply time limits or budgetary constraints. Yet, any developments which are not operationalized during this round will provide a useful basis for the next round of trend projections in 2005 and may inform future data collection needs.

2. HOUSING UNIT METHOD: OVERVIEW

The overall methodological framework for the small area projections is provided by the housing unit method. The housing unit approach is an often used, and widely accepted, small area estimation and projection methodology. For small area intercensal estimates, the population (P_t) is estimated as the group quarters (GQ_t) population plus the product of occupied housing units (H_t) and average persons per household (PPH_t). In most applications the group quarters contribution is negligible and the accuracy of the estimate depends on the ability to estimate, either directly or by proxy, the number of new and demolished housing units, the vacancy rate, and the average number of persons per unit.

In a projections context, each of the three components of the relationship is projected into the future. Thus, the projected population in year t+n is expressed as

$$\hat{P}_{t+n} = \widehat{GQ}_{t+n} + \hat{H}_{t+n} \times \widehat{PPH}_{t+n} \tag{1}$$

where the hats indicate projected values. In the standard approach the number of occupied housing units is derived from three other forecasts including new housing units, demolished housing units, and the vacancy rate. The first two sum to the total number of housing units, after adjusting for the base year housing stock.
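As a minimal numerical sketch of the housing unit identity described above, the following Python function combines the three projected components. The function name and the input figures are illustrative assumptions, not values from the SCAG data.

```python
def project_population(group_quarters, occupied_units, pph):
    """Housing unit identity: population = GQ + H * PPH."""
    if min(group_quarters, occupied_units, pph) < 0:
        raise ValueError("all components must be non-negative")
    return group_quarters + occupied_units * pph


# Hypothetical city in the target year: 1,200 group quarters residents,
# 25,000 occupied housing units, and 2.8 persons per household.
population = project_population(1200, 25000, 2.8)
```

Because the group quarters term is usually small, the result is driven almost entirely by the product of occupied units and persons per household, which is why the two are projected as separate modules.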

The methodological choice of the housing unit method was imposed by the SCAG staff. It is a reasonable choice since it is essentially the ‘state-of-the-art’ method for making either estimates or projections at the small area level. It is used by most state demography agencies and by the U.S. Census Bureau. It is also noteworthy that the method has persisted for almost 50 years without any major modifications to the approach. That stagnancy speaks to the paucity of data at the small area level, thus necessitating a simple method.

The methods proposed for the SCAG sub-county demographic projections attempt to improve on the housing unit method while remaining within its overall architecture stated in equation (1). The traditional method is uni-regional, assumes spatial independence, and is not stochastic. Over the relatively long 30 year projection period, some of the most important considerations will be related to the impact of long-run demographic trends, the spatial evolution of the population composition, and the extent to which either administrative or resource constraints impede the creation of new housing units. The methods described below introduce spatial dependence, stochastic simulation, and a multiregional demographic sub-model. These improvements push the limits of the data but are also careful to incorporate all possible information into the projections.

The sections below discuss projection methods, data, and assumptions for each of the components in (1). The last section of the methodology discusses the key assumptions that will be important to consider in the projections review stage.

3. OCCUPIED HOUSING UNITS BY TYPE

By definition, projection year total housing units will equal the number of current housing units plus newly built units over the projection period and less demolitions over the projection period. As noted previously, occupied housing units are derived from total housing units by multiplying the latter by one minus the vacancy rate. Based on the political contentiousness surrounding projection year vacancy rates, or the proportion of seasonal housing, the SCAG staff decided to directly model occupied housing units. The lack of any demolition data provides an additional justification for proceeding directly with occupied housing units. The methods below apply equally to any historical housing series and could be adapted at a later date to project total housing units.

The subsections below propose projection methods that range from simple to complex; the complexity attempts to capture more realistic sets of conditions that we expect to prevail over the forecast period. The first subsection (3.1) describes the base data sources and modifications to that data; the second (3.2) describes standard extrapolation methods that are applied to population or employment projections; the third (3.3) describes several approaches that can be used to derive city-level growth ceilings based on either prior growth, land use change, or both; and the fourth (3.4) proposes a stochastic simulation method that accounts for spatial dependence in the housing unit projections. Section 3.5 describes some additional calculations needed for the sub-types of total occupied housing units.

3.1 Data

The primary data inputs used to project the number of occupied housing units are: (1) historical census enumeration data for occupied housing units by type at the city level, (2) historical building permit series at the city level, (3) the E-5 series of intercensal housing by type estimates from the California Department of Finance, Demographic Research Unit, and (4) high spatial resolution land use classification maps for the years 1990 and 1993. The enumeration data is from the 1980, 1990, and 2000 censuses. The building permit series are collected by the U.S. Census Bureau and provide information on the number of single-family and multi-family permits issued for the years 1980 to 2000. The E-5 series provides annual estimates of total housing and housing by type for 1980 to 2000. The land use maps provide the observed land use allocation by detailed land use classification based on the interpretation of aerial photographs. We are confident that the housing unit data and land use transition matrices (derived from the land use maps) provide a good indicator of future occupied housing unit trends.

The projections literature cites several common drawbacks associated with the permit data in both estimation and projection contexts. Some of the shortcomings include the inability to identify seasonal versus year-round housing units, that permits only indicate the intent to build, not evidence of a completed structure (this can be overcome by using ‘certified’ permit data), and that permits may indicate add-ons to existing structures rather than the creation of new units. Also, simple extrapolation models could predict more housing units than can reasonably be expected to occur given the extent of build out in a city, the availability of undeveloped land, and zoning constraints on existing land. In our case the building permit series from the Census Bureau only includes the number of permits and the number of units for new housing units. Moreover, by focusing only on occupied housing units the seasonality issue is immaterial.

The E-5 series provides an alternative information set capturing intercensal variations in housing growth with housing type detail. The estimates from the E-5 series are probably more accurate than the building permit information for at least two reasons. First, the state demographers are more aggressive in collecting information on city level changes than the U.S. Census Bureau. Therefore the base data is likely to be cleaner than the raw building permit figures. Second, state demographers carefully impose important assumptions on the interpretation of the base data files. Importantly, building permit counts have a lagged transition into completed housing units. A similar lag assumption could be imposed on the census building permit series but that would merely duplicate efforts already executed by state demographers and reflected in the E-5 series.

The reason for using either the E-5 series or the building permit series is the same. The annual series provide an indication of the intercensal changes. In general, using a larger, more detailed information set should improve the resulting projections. The census enumeration data only provides one, two, or three data points for a given housing type and city depending on when the city was incorporated. By combining the relative rates of change from the annual series with the known enumeration values from the census, the combined series is up to 21 years in length (1980-2000). For cities incorporated after 1980 the series will be shorter. The combined E-5/census enumeration series passes through each of the observed decadal enumeration values and matches the relative rate of change from the E-5 data for the intercensal periods. Figure 1 displays four examples of the underlying data series that combine census data with the E-5 series. There are four data series derived from the data: one for total occupied units, one for single family occupied units, and one for multi-family occupied units.
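The report does not spell out how the combined series is computed. The sketch below shows one way it could be done, under the assumption that the ratio of census count to annual estimate is interpolated geometrically between benchmark years, so the result hits each census value and keeps the year-to-year pattern of the annual series up to a constant factor per period. The function name and the sample figures are hypothetical.

```python
def splice_to_benchmarks(annual, census):
    """Adjust an annual series so it passes through census benchmark values.

    annual: dict mapping year -> annual estimate (e.g. an E-5 style series)
    census: dict mapping benchmark year -> enumerated count
    """
    years = sorted(census)
    combined = {}
    for start, end in zip(years, years[1:]):
        ratio_start = census[start] / annual[start]
        ratio_end = census[end] / annual[end]
        for year in range(start, end + 1):
            weight = (year - start) / (end - start)
            # Geometric interpolation of the census/estimate ratio (assumed rule)
            adjustment = ratio_start ** (1 - weight) * ratio_end ** weight
            combined[year] = annual[year] * adjustment
    return combined


# Hypothetical annual estimates and census counts for one city
annual = {1990 + i: 10000 * 1.01 ** i for i in range(11)}
census = {1990: 10200, 2000: 11500}
combined = splice_to_benchmarks(annual, census)
```

With three benchmarks (1980, 1990, 2000) the same function yields the full 21-observation series; a city incorporated after 1980 would simply supply fewer benchmark years.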

One could make arguments against the use of the longer, more detailed series just described. The 1990s in the SCAG region witnessed a significant shift from historical growth patterns. Southern California has traditionally been a fast growing region and future patterns are likely to continue that long-term trend despite the economic recession and slow population and housing growth throughout much of the 1990s. This perspective would argue for only including the long-term growth reflected by the change between 1980 and 2000. While this perspective is valid in noting the long-term growth prospects and aberrant pattern of the 1990s, the annual series may reflect not only the recession but also important fundamental constraints on the growth prospects for particular cities, independent of the recession. Some cities and regions erected significant institutional barriers to housing growth during the 1980s and 1990s. Other cities may have reached either natural limits or planned build-out limits during the 1990s. If either of these cases is true, then the annual series would detect important trends that would not be reflected in the long-term growth rate. The annual series can also be justified on the ground that near-term annual variations exert less influence on the resulting projections than the 1980 data point. The latter point, given its distance from the series mean, has more leverage on the parameter estimates of the extrapolation models. Therefore, the annual series essentially satisfies both concerns.

3.2 Extrapolation models

Extrapolation methods are used to predict the future values of a variable based on its observed historical time path. In this case, the occupied housing unit data series for each city is used to estimate the parameters of alternative functional forms.

Alternative functional forms are provided in Table 1. In each case the functional relationship can be transformed to a linear relationship (see Table 1). Given the linear transformed relationship, standard regression methods can be used to estimate the parameters of the non-linear functions. The predicted number of units for a given model is based on those parameter estimates. The adequacy of the functional relationship is assessed using standard input and output evaluation measures. The input measures are listed in the last column of Table 1, and each is simply a mathematical expression of the core assumption for each functional form. For example, the geometric growth equation implicitly assumes that the growth rate is constant over the projection period. The input evaluation criteria test whether this assumption is true in the observed data series. The output evaluation criteria include standard measures of fit (Mean Absolute Percent Error) and bias (Mean Error). Once a final ‘best fit’ curve is selected, the parameter estimates are used to project the number of building permits from 2000 to 2030.
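To make the estimate-then-evaluate sequence concrete, here is a minimal Python sketch for the geometric case. It assumes the textbook form H_t = H_0 (1 + r)^t, which becomes linear after taking logs; Table 1 may parameterize the curve differently. The helper names, the sign convention for Mean Error (fitted minus actual), and the synthetic series are all assumptions made for illustration.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_geometric(units):
    """Fit H_t = base * (1 + rate)**t by regressing log(H_t) on t."""
    t = list(range(len(units)))
    intercept, slope = fit_line(t, [math.log(u) for u in units])
    return math.exp(intercept), math.exp(slope) - 1.0


def mape(actual, fitted):
    """Mean Absolute Percent Error, in percent."""
    return 100.0 * sum(abs(a - f) / a for a, f in zip(actual, fitted)) / len(actual)


def mean_error(actual, fitted):
    """Mean Error as fitted minus actual (assumed sign convention)."""
    return sum(f - a for a, f in zip(actual, fitted)) / len(actual)


# Synthetic 21-observation series growing at exactly 2 percent per year
units = [1000 * 1.02 ** t for t in range(21)]
base, rate = fit_geometric(units)
fitted = [base * (1.0 + rate) ** t for t in range(21)]
fit_quality = mape(units, fitted)
bias = mean_error(units, fitted)
```

In practice the same two output measures would be computed for every candidate curve, and the curve with the best combination of fit and bias would be carried forward.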

Figure 2 provides a graphical representation of each nonlinear functional form suitable for the extrapolation projections of the occupied housing unit data. The only curve not represented in the figure is a linear curve; we omit it since it is easy to visualize a straight line. Three of the four nonlinear functional forms require the input of a growth ceiling. In most cases, especially with a long projection period, it is unlikely that housing construction will continue unconstrained. This is certainly the case in cities that, for all practical purposes, have neither vacant land nor agricultural land. Without calculating a growth ceiling, c, the only functional forms available to fit to the data are either a linear or geometric function.
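The role of the ceiling in the linearizing transformation can be illustrated with a Gompertz curve. The sketch below assumes one common parameterization, H_t = c exp(-a exp(-b t)), which may differ from the form in Table 1. Once c is supplied, ln(-ln(H_t / c)) is linear in t, so ordinary regression recovers a and b; without c the transformation cannot be applied. All names and numbers are hypothetical.

```python
import math


def fit_gompertz(units, ceiling):
    """Fit H_t = ceiling * exp(-a * exp(-b * t)) for a known ceiling.

    Uses the linearization ln(-ln(H_t / ceiling)) = ln(a) - b * t.
    """
    if min(units) <= 0 or max(units) >= ceiling:
        raise ValueError("observations must lie strictly between 0 and the ceiling")
    t = list(range(len(units)))
    z = [math.log(-math.log(u / ceiling)) for u in units]
    n = len(t)
    mean_t, mean_z = sum(t) / n, sum(z) / n
    sxx = sum((ti - mean_t) ** 2 for ti in t)
    sxy = sum((ti - mean_t) * (zi - mean_z) for ti, zi in zip(t, z))
    slope = sxy / sxx
    intercept = mean_z - slope * mean_t
    return math.exp(intercept), -slope  # (a, b)


def predict_gompertz(ceiling, a, b, t):
    return ceiling * math.exp(-a * math.exp(-b * t))


# Synthetic series generated from known parameters, then re-estimated
true_ceiling, true_a, true_b = 50000.0, 1.5, 0.08
units = [predict_gompertz(true_ceiling, true_a, true_b, t) for t in range(21)]
a_hat, b_hat = fit_gompertz(units, true_ceiling)
units_2030 = predict_gompertz(true_ceiling, a_hat, b_hat, 50)
```

The projected value approaches but never exceeds the ceiling, which is the behavior the constrained curves are meant to enforce over a long projection period.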

The growth constraints are also extremely important in the current context since the city and subregion projected values have to add up to the county level totals derived using alternative methods. The constraints provide a means to operationalize the county control totals while still relying on historical city level trends.

3.3 Growth ceilings

The growth ceiling, c, in Figure 1 can be based on expert opinion or derived analytically from data that inform the nature of potential constraints. There are three potential constraints in the SCAG region: (1) water availability, W, (2) institutional (e.g. zoning, development climate) constraints, I, and (3) land availability, L. The research team has already reviewed the available data from regional water districts and determined that there is insufficient information to develop plausible and empirically-informed water constraints. The institutional and land availability constraints can be quantified with current information, and subsequent studies currently in progress at SCAG will improve the ability to develop realistic growth ceilings. In addition, city-level constraints should also be informed by the historical growth trajectory, G.

Given the foregoing discussion, the growth constraint should be proportional to a set of factors that inform the likely proportionate division of a known county total over a set of cities and unincorporated subregions. We could write the vector of constraints,

$$C \propto f(W, I, L, G) \tag{2}$$

where the first three components would be excluded in a trend projection but would be included in a baseline forecast and a plan forecast. The growth trajectory is easy to operationalize given the county constraint and an observed growth increment over a representative historical period.

The proportionality, rather than equality, is used in (2) since the sum of the constraints has to equal the county control total; equivalently, the city level constraint proportions have to sum to one. In a stochastic simulation we could regard those proportions as multinomial probabilities; we will return to that point below.

We could also specify a particular functional form for (2) such as,

$$C \propto \tilde{W}^{\alpha_W}\,\tilde{I}^{\alpha_I}\,\tilde{L}^{\alpha_L}\,\tilde{G}^{\alpha_G} \tag{3}$$

where the tildes indicate that the variables have been normalized to lie between 0 and 1 and the superscript parameters, α, are a set of subjective weights indicating the relative strength of the factors expected to prevail over the forecast period. The weights could be set by an expert panel (such as the FTTF) or by participants in the local review process.
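A short Python sketch may help show how a weighted product of normalized factors turns into city constraints that sum to the county control total. It assumes each factor is normalized by dividing by its largest city value (the report does not say which normalization is used) and that an excluded factor is given a weight of zero. The function name, factor labels, and figures are hypothetical.

```python
def city_constraints(factors, weights, control_total):
    """Split a county control total across cities using weighted factors.

    factors: dict mapping factor name -> list of non-negative city values
    weights: dict mapping factor name -> subjective exponent
    Returns (proportions, constraints), one entry per city.
    """
    n_cities = len(next(iter(factors.values())))
    scores = [1.0] * n_cities
    for name, values in factors.items():
        largest = max(values)
        for i, value in enumerate(values):
            # Normalize to [0, 1] by the largest city value (assumed rule)
            scores[i] *= (value / largest) ** weights[name]
    total = sum(scores)
    proportions = [score / total for score in scores]
    return proportions, [p * control_total for p in proportions]


# Trend projection: only the historical growth increment G is active
growth = {"G": [500.0, 1500.0, 2000.0]}
proportions, constraints = city_constraints(growth, {"G": 1.0}, 40000.0)
```

With only the growth factor active and a weight of one, each city's share reduces to its share of the observed growth increment, which is the trend-projection case described above.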

In considering the land use and institutional components of the constraints, there are three potential scenarios. In several cities the 1993 land use data indicates development environments that have already reached build-out. In those cases it is unrealistic to assume that growth in multifamily or single family units can occur without offsetting declines in other types of housing stock. In those cities new growth in population will primarily depend on changes in the occupancy rates (persons per household) discussed in section 4 below. A second scenario applies to cities on the urban fringe with relatively vast holdings of developable land (either vacant or agricultural). In those cases the constraints are calculated but are not binding over the forecast interval.

The third scenario is the most critical. In these cases, the cities or subregions have developable land but the constraint is binding over the projection period. The review process should look carefully at the validity of the constraint, both in terms of land use classifications and the institutional environment. For example, in Ventura and Oxnard there are large stocks of agricultural land but recent legislation has made it much more difficult to rezone the agricultural land for higher use.

The next two sections describe particular approaches that could be used to operationalize the land use portion of the constraint. The first is an accounting method based on land use classification maps. The second is based on Landis’s urban futures model.

Accounting method:

The first approach for identifying a growth ceiling is based on a simple accounting framework using the 1990 and 1993 land use classification maps. The basic idea is to use the observed changes during that period in conjunction with the observed building activity or growth in occupied units to derive an expected ceiling on single family and multifamily units. Note that this method would not be used for the trend projection.

Table 2 provides a simplified matrix of observed land use transitions during the 1990 to 1993 period. Panel A indicates the number of acres that have either stayed the same (main diagonal) or changed classifications. Panel B should be read across the rows. It indicates the proportion of a given land use type in 1990 that either remained the same or changed to a different type in 1993. Panel C indicates the net changes in acreage. The data in Table 2 represents aggregate changes for the entire SCAG region.

Similar matrices can be defined for each city and subregion within the SCAG region. Defining a city level land use change matrix as L with elements l_ij, where the subscripts index the beginning and ending land use, we can define a building permit ceiling as

(4)

The notation b_t1 refers to the number of building permits issued, or growth in occupied units, in year t of type 1 (in this case single family dwellings). The three major terms enclosed in parentheses in (4) are interpreted as follows: the first term is the total undeveloped land (vacant plus agricultural), the second term is land in rural and medium density residential as a proportion of all developed land, and the last term is the observed units per acre for rural to medium density residential growth over the 1990 to 1993 period. The product of the first two terms yields the acres of undeveloped land that are expected to be developed as rural to medium density residential. The last term converts those acres into building permits or occupied units depending on the modeling context.

Figure 2: Calculation of growth ceiling, c

As shown in Figure 2, using 1993 as the benchmark, the value from (4) can be added to the existing cumulative permits issued up to 1993 to arrive at the growth ceiling, c.
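The accounting logic can be sketched in a few lines of Python. The function below follows the verbal description of the three terms (undeveloped acres, residential share of developed land, observed units per acre) and then adds the cumulative 1993 total. It is an illustration of that description rather than a transcription of equation (4); every parameter name and every number is hypothetical.

```python
def growth_ceiling(vacant_acres, agricultural_acres,
                   residential_acres, developed_acres,
                   new_units, new_residential_acres,
                   cumulative_units_1993):
    """Accounting-style growth ceiling for one housing type in one city."""
    # Term 1: total undeveloped land
    undeveloped = vacant_acres + agricultural_acres
    # Term 2: residential land as a proportion of all developed land
    residential_share = residential_acres / developed_acres
    # Term 3: observed units per acre of new residential land, 1990 to 1993
    units_per_acre = new_units / new_residential_acres
    additional_units = undeveloped * residential_share * units_per_acre
    return cumulative_units_1993 + additional_units


# Hypothetical city: 2,000 undeveloped acres, half of developed land is
# residential, and recent growth averaged 6 units per acre.
ceiling = growth_ceiling(800, 1200, 3000, 6000, 900, 150, 20000)
```

A city with no vacant or agricultural land gets a ceiling equal to its 1993 cumulative total, which corresponds to the build-out scenario discussed earlier.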

Parametric (‘Landis’) model:

An alternative approach to modeling the land use constraint would be a parametric model of land use change. One alternative would be to build on the types of models developed by John Landis, a Professor of City Planning at UC Berkeley. For completeness, we provide a brief synopsis of his approach in this section. Our opinion at this time is that the data sources are insufficient to take this approach and, moreover, the results may be comparable to the less demanding accounting method.

Landis’ California Urban Futures Model (CUF-II) is an urban growth and land use change model comprised of three pieces: projections of growth, a multinomial logit land use change model, and a growth allocation module. In CUF-II an observation is one hectare observed in two different years with relevant descriptive data. A general logit model is fit to the observed land use transitions, and the parameters estimated can then be used to estimate the probability of transitioning from land use type I to land use type J for an arbitrary cell. Landis does not use CUF-II to estimate population. Rather, he produces an external population estimate and allocates the population based on the probability scores.


There are several potential problems with applying the Landis framework to the current context: (1) population is estimated externally and any growth embedded in the land use change model results from the growth between the two years of observation; (2) since transitions for a piece of land are estimated, standardized units would be desirable, either plots or cells (neither is available for this study); (3) additional data is necessary to predict the probability of change; (4) spillover rules would need to be developed.

Land use change models such as CUF-II might aid population projections with land constraints in three ways. First, the probabilities associated with each cell provide preferences for the land use for that cell. That information could be used to estimate the amount of land available for residential use. Second, land in adjacent regions may appear to be available for residential use when land constrains a region’s growth. Third, the amount of land available for conversion from non-residential to residential use could be estimated. Likewise the amount of residential land converting to non-residential uses could be estimated and taken off-line.

3.4 Spatial dependence and stochastic simulation

The extant literature on projections using the housing unit method excludes methods to control for several important aspects of urban growth. It is very likely that the individual jurisdiction level time series will exhibit spatial dependence; that is, the time paths of series in proximate jurisdictions will be highly correlated. Another problem in the forecasting literature is that the final forecasts often only provide a single forecasted time path, and on occasion a high and low alternative. These outputs do not adequately represent the uncertainty in the final forecast.

Traditional population forecasts do not provide a measure of the uncertainty inherent in projections. Most important, a single population forecast typically underestimates the temporal variability of the observation record. Stochastic simulation amounts to generating alternative, equally likely, forecasts which are all consistent with the available information. Such alternative forecasts exhibit the correct spatiotemporal variability inferred from the data, and, taken as a set, provide a measure of uncertainty in population or housing growth.

The use of a growth constraint informed by city level growth trajectories, G, provides a parsimonious method to incorporate information on spatial dependence. If growth is allocated based on prior growth, the natural spatial dependence in those growth patterns will implicitly be contained in the city level constraint.

Stochastic simulation can be incorporated by specifying a probability distribution for the constraints, C, and fitting constrained growth curves to each draw from the distribution. As noted above, the problem of defining a set of small area projections equal to a region level constraint is identical to finding a set of multinomial probabilities with parameters defined by the proportions.

Using a Bayesian approach we would use the conjugate prior of the multinomial, the Dirichlet distribution, in conjunction with the observed growth increments. Additional uncertainty in the actual growth increments can be specified using a Beta distribution that allows the input to the Dirichlet to vary between an upper and lower bound. In the current projections the upper and lower bound are defined by the minimum and maximum of the set defined by the 5-year, 10-year, and 15-year annual growth increments. The two parameters of the Beta distribution (w and v) are set by the analyst. In the current model the values are set at either (2,4) or (4,2) to sample more heavily in the direction of the 10-year annual growth increment since the SCAG staff seems to have some preference for that historical range. Another option would be to use (1,1) which specifies a uniform distribution with the range between the minimum and maximum discussed above.

1 The county control totals for occupied housing units by type are projected separately. The forecasted county control totals for single family, multiple family, and mobile homes are produced using exponential smoothing time series models. The independent forecasts are forced to sum to the total occupied housing county control totals.

A single iteration of the simulation involves three steps: (1) a draw from the Beta distribution for each city that defines a vector of annual growth increment realizations, (2) the vector is used to make a single draw from the Dirichlet distribution which outputs a set of proportions, and (3) the proportions define a set of city constraints consistent with the control total and those constraints are used to estimate the parameters of a particular growth curve (exponential, Weibull, or Gompertz). The simulation is repeated multiple times (10,000+) to define a distribution of growth curves for each city.
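The first two steps of one iteration, and the conversion of proportions into city constraints, can be sketched in Python using only the standard library. This is an illustrative sketch and not the SAS IML code used for the projections: the function name, the three hypothetical cities, their bounds, and the control total are assumptions, as is the choice to pass the Beta-scaled increments directly to the Dirichlet as its parameters.

```python
import random

def beta_dirichlet_shares(low, high, w, v, rng):
    """One simulation draw: Beta-scaled increments, then Dirichlet shares."""
    # Step 1: for each city, draw a growth increment between its bounds.
    increments = [lo + (hi - lo) * rng.betavariate(w, v)
                  for lo, hi in zip(low, high)]
    # Step 2: a Dirichlet draw built from normalized Gamma variates, with
    # the increments as parameters (assumed; they must be positive).
    gammas = [rng.gammavariate(a, 1.0) for a in increments]
    total = sum(gammas)
    return [g / total for g in gammas]

rng = random.Random(2003)
low = [120.0, 40.0, 300.0]    # hypothetical minimum annual increments
high = [200.0, 90.0, 450.0]   # hypothetical maximum annual increments
control_total = 25000.0       # hypothetical county growth to allocate

shares = beta_dirichlet_shares(low, high, 2, 4, rng)
# Step 3 (first half): proportions become city-level constraints.
constraints = [control_total * s for s in shares]
```

Because the shares sum to one, the city constraints add up to the control total on every draw, which is the property the growth curve fitting in step (3) relies on.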

Finally, for each city simulation distribution and for each five-year increment of the forecast period (2005, 2010, ..., 2030) it is possible to recover a median value and upper and lower probability bounds. Since the median is recovered for each five-year increment independently, the medians are drawn from different estimated growth curves. The median values are therefore used to select, for each city, the single simulated growth curve that has the smallest mean absolute deviation from the medians.
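The selection rule can be illustrated with a small Python sketch. The three curves below are made-up values for a single city, and the function name is hypothetical; the sketch only shows the logic of taking pointwise medians and then choosing the simulated curve closest to them.

```python
from statistics import median

def select_curve(curves):
    """Return the simulated curve closest to the pointwise medians."""
    n_periods = len(curves[0])
    # Median across simulations, computed separately for each period.
    medians = [median(c[t] for c in curves) for t in range(n_periods)]

    # Mean absolute deviation of one curve from the median path.
    def mad(curve):
        return sum(abs(x - m) for x, m in zip(curve, medians)) / n_periods

    return min(curves, key=mad), medians

# Hypothetical simulated paths for one city (three periods each).
curves = [
    [100.0, 110.0, 121.0],
    [100.0, 120.0, 144.0],
    [100.0, 105.0, 110.0],
]
best, medians = select_curve(curves)
```

Selecting an actual simulated curve, rather than reporting the medians themselves, keeps the final projection on a single coherent growth path.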

Model runs to date indicate the exponential always provides the best fit to the historical data and provides the most plausible projection year values. Experimentation with the simulation also indicates that 10,000 simulated values produce stable projection estimates.

3.5 Additional calculations for units by type

The method described in section 2.3 works for total occupied housing units but a few additional steps are needed to calculate the occupied housing units by type (single-family detached, multi-family detached, mobile homes, and other). The basic method for single-family and multiple-family occupied units is identical to the simulation method discussed above. The mobile-home and other categories are derived using fixed rates from the 2000 census. The fixed rates are either in relation to total housing or in relation to the sum of single- and multi-unit occupied housing. At the end of that process, there are four series that each add up to the county control totals for each type of occupied housing unit (see footnote 1). The problem is that the sum over each type of occupied housing unit does not equal the total occupied housing value for each city. Iterative proportional fitting is used to force the ‘by type’ values to satisfy both the city occupied housing unit totals and the county occupied housing unit by type totals. The resulting data matches the marginal totals defined by the county control totals and total occupied units by city totals while retaining the odds ratios of the unadjusted ‘by type’ projections for each city.

2 For example, the cohort-component model used for the county projections relies on the age-structure of fertility and mortality. The model is eminently simple in construction and avoids any detailed consideration of the economics of fertility. Even at the county level, the theoretically superior multiregional construction is jettisoned in favor of the simpler net migration formulation because of data availability and quality constraints.

4. HOUSEHOLD SIZE PROJECTIONS

Given projected occupied housing units from section 3, the next step is to estimate the resident population living in the occupied housing units. The resident population is derived by multiplying the number of occupied housing units by the persons per household (PPH). Spatial and temporal variation in PPH is related to several socio-economic factors. High PPH values (overcrowding) may be related to poverty, may be motivated by social-learning and safety concerns among recent immigrants (ethnic enclaves), or may reflect a pocket of concentrated housing demand characteristic of areas around universities. Low PPH values could be linked to elderly populations and a range of other social and economic pathologies. Ideally a PPH forecasting model would build on theories of household formation, immigration and assimilation, aging, and the economics of housing. Yet it is important to keep in mind that forecasting models deviate from explanatory models in their adherence to social science theories. Forecasting, by nature, relies on strong patterns that maximize predictive power (see footnote 2). Forecasting relies on time series data whereas explanatory models are typically specified using either rich cross-section data or small samples of longitudinal data. As such, model specifications available for forecasting are usually much more highly constrained by data than those available for explanatory modeling. Moreover, these constraints become increasingly stringent as spatial resolution is increased. This is certainly the case with PPH at the city and subregion level.

The sections below outline four alternative methods that could be used to produce trend extrapolations for PPH. The most salient model from a theoretical and forecasting standpoint, the multiregional microsimulation in section 4.3 below, has data requirements that force undue reliance on model schedules and interpolation and that exceed the resources available for the current round of projections. The feasible alternatives for the current trend projections are relatively simple in form as dictated by data constraints.

4.1 Constant value, simple trend, or decomposition by type for persons per household

The standard approach is to either assume a constant PPH with the value taken from the last census or estimate a trend using past censuses. The constant assumption is unsatisfactory since constancy is unlikely to prevail. The trend projections are somewhat problematic since both the areal extent and the existence of cities in the SCAG region vary over the three census years (1980, 1990, and 2000). At most the historical data provides three values and in several instances only two values or a single value. Moreover, there are expectations that PPH will be trending up and then down in many places, which would impose a quadratic form on the extrapolations. Over a 30 year projection interval it is likely that the quadratic will dive sharply, yielding unrealistic results.

Another possibility is to view changes in the total PPH value as the result of underlying changes in the composition of total occupied housing. The PPH values for different types of occupied housing show substantial variation. The median 1990 PPH values for single-family, multi-family, mobile, and other are, respectively, 3.1, 2.4, 1.9, and 2.7. Figure 3 shows the distribution of the four values over the 200 cities. Whereas the single-family PPH is higher and symmetric, the multi-family PPH is lower and highly skewed. Since the observed PPH values by housing type in particular regions are reflective of underlying differences in income distribution and other local features that may influence the degree of relative crowding, it is plausible to allow changes in the total PPH to emerge from the shifts in the composition of housing types and their respective PPH values. As the composition of housing shifts over time, the PPH will shift with it.

Figure 4 shows some sample results from using the 1990 PPH by type with an allowance for historical drift. The PPH by type values are then applied to the occupied housing unit by type forecasts to derive a total PPH value. The resulting plots show that even using a roughly fixed set of PPH rates, the resulting total PPH projected values are not fixed and can follow a non-linear path.
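The arithmetic behind the decomposition is a weighted average, which a brief Python sketch makes concrete. The PPH rates are the 1990 medians quoted above; the unit counts for the two years are hypothetical and chosen only to show how a shift toward multi-family housing moves the total PPH even though the rates by type are held fixed.

```python
def total_pph(units_by_type, pph_by_type):
    """Total PPH as population over occupied units, summed across types."""
    population = sum(units_by_type[k] * pph_by_type[k] for k in units_by_type)
    return population / sum(units_by_type.values())

# Median 1990 PPH values by type quoted in the text.
pph = {"single": 3.1, "multi": 2.4, "mobile": 1.9, "other": 2.7}

# Hypothetical occupied-unit counts for one city in two years.
units_2000 = {"single": 6000, "multi": 3000, "mobile": 800, "other": 200}
units_2030 = {"single": 7000, "multi": 6000, "mobile": 800, "other": 200}

pph_2000 = total_pph(units_2000, pph)
pph_2030 = total_pph(units_2030, pph)
```

With these made-up counts the total PPH falls between the two years because multi-family units, which carry a lower rate, grow faster than single-family units.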

4.2 Constrained trend using expert information

One of the problems with a simple extrapolation model of PPH is that predicted values may violate plausible bounds on the range of PPH. The lower limit is clearly unity in theory but it is extremely unlikely that any city would consist entirely of one person households. For the upper limit, values in excess of 6 would fall outside of any historical values observed in the region. One way to determine plausible bounds is to poll a panel of local experts. In this case, conversations with two experts on the Los Angeles area suggest bounds of 1.2 and 5.5. The validity of these bounds should be discussed by the Forecasting Technical Task Force.

This approach would involve three steps. First, constrained extrapolation curves would be fit to the historical data with values constrained to lie between 1.2 and 5.5. Next, the PPH value would be multiplied by the occupied housing for each city to calculate the resident population (recall equation 1). Finally, the projected resident population for each city would be rescaled to equal the county control totals for resident population. Note that the rescaled resident population values will imply a revised PPH for each city. If the revised PPH falls outside the bounds given above, iterative methods will be used to satisfy both the PPH constraints and the county control totals.
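The report does not specify the iterative method, so the following Python sketch shows only one plausible scheme, under stated assumptions: cities whose rescaled PPH would leave the bounds are pinned at the bound, and the remaining cities absorb the residual population. The function name and the three-city inputs are hypothetical, and the control total is assumed to be attainable within the bounds.

```python
def rescale_pph(households, pph, control_total, lo=1.2, hi=5.5):
    """Rescale city PPH to hit a population control total within [lo, hi]."""
    def clip(x):
        return min(max(x, lo), hi)

    pph = [clip(p) for p in pph]
    pinned = [False] * len(pph)  # True once a city sits on a bound
    while True:
        pinned_pop = sum(h * p for h, p, f in zip(households, pph, pinned) if f)
        free_pop = sum(h * p for h, p, f in zip(households, pph, pinned) if not f)
        if free_pop == 0:
            return pph  # every city is pinned; nothing left to adjust
        factor = (control_total - pinned_pop) / free_pop
        violators = [i for i, f in enumerate(pinned)
                     if not f and not lo <= pph[i] * factor <= hi]
        if not violators:
            # All free cities stay in bounds: apply the factor and finish.
            return [p if f else p * factor for p, f in zip(pph, pinned)]
        for i in violators:
            pph[i] = clip(pph[i] * factor)
            pinned[i] = True

households = [1000, 2000, 500]   # hypothetical occupied units by city
initial_pph = [2.0, 3.0, 5.0]    # hypothetical extrapolated PPH values
result = rescale_pph(households, initial_pph, control_total=12000.0)
```

In this example a single proportional rescaling would push the third city above 5.5, so it is held at the upper bound and the other two cities are scaled slightly more to recover the control total.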

The constraints imposed on PPH could also be used in the decomposition approach described in section 4.1.

4.3 Incorporating demographic processes

Spatial demography model: multiregional demographic microsimulation

The estimation and projection of persons per household (PPH) is still an active area of research in the academic literature. A constant PPH is often unrealistic and research indicates that projection errors for PPH contribute the most to the total error in final forecasts based on the housing unit method. In the SCAG region, a constant PPH would seem to be a particularly bad assumption. The region continues to receive a substantial proportion of the new immigrants to the United States and those immigrant flows are highly spatially focused within the region. The effect of immigrant populations on PPH is particularly important to consider since immigrants and natives use the existing housing stock differently. In general, recent immigrant populations are characterized by much higher levels of overcrowding, though the level of overcrowding varies by immigrant type. Moreover, there is evidence that foreign-born populations have higher birth rates than the native-born population. As assimilation occurs the foreign-born population tends to follow a predictable path through the housing stock and fertility schedules converge towards those of the native-born population.

One way to explicitly account for such dynamics is to use a small area cohort-component model with detailed accounting for foreign-born / native-born categories. In particular, we could use a zip code level cohort model that tracks the age-disaggregate stocks of native-born and foreign-born, by year of entry, with each population subject to characteristic fertility schedules, migration schedules, and immigration schedules. Variation within the foreign-born population depends largely on the fertility patterns prevailing in an immigrant’s country of origin. Therefore the ideal model would also decompose the foreign-born population by ethnicity, or at minimum Latino/non-Latino. The value of a multiregional cohort-component model is that it would allow the PPH values to emerge from the underlying demographic processes at a small spatial scale.

The problem with building an operational spatial demographic model is that it requires extremely detailed spatial data. The required city-level vital statistics (birth and death) and immigration flows can be estimated from the national microdata birth and death records and the INS zip code data. In both cases the data provide an incomplete picture for our purposes, and spatial interpolation and model schedules are required to develop a complete set of data accounts. The intraregional migration accounts are even more problematic. One source is the 1990 Public Use Microdata Sample (PUMS), in which combinations of the sampling areas can be used to recover an incomplete geography of the SCAG region. This data could be combined with DMV records but that data is unavailable for the current round of projections. Overall the required data accounts place extreme demands on the available data sources and require the use of model schedules and smoothing techniques typically applied in developing country contexts.

Another option that partially captures the spatial dynamics is to use net migration rather than a complete matrix of interregional migration. The problem with net migration is that it is well established in the demography literature that the use of net migration in long-term projections will bias the resulting projections. Places with net out-migration will uniformly decline towards zero and those with net in-migration will trend towards a partitioning of the total population among the growing regions.

Demographic regression model: regression-extrapolation-imputation

An alternative approach would be to specify a predictive regression equation for PPH that includes relevant socio-economic variables as covariates. The regression equation would have to be specified using the 200 observations (cities and unincorporated subregions) for the SCAG region. The model parameters would be reflective of the overall region and may not be reflective of a particular city. Subregion dummy variables and interaction effects could be used to partially alleviate this problem.

Assuming that a regression equation specification does fit the cross-sectional data, the parameters would be used to predict future values of PPH for each city. There are several problems with this approach. First, the constant parameter assumption, commonly termed functional stationarity, assumes that the parameters do not drift over time (unlike state-space models where the parameters evolve over time). Second, the predictions of the PPH values assume the existence of forecasts for each of the independent variables. This means that each of the independent variables needs to be projected into the future. More importantly, it means that the errors in those projections will be pooled and perhaps amplified in the final predictions from the regression model.

Third, if we are projecting individual independent variables such as population by ethnicity, we are only a small step removed from directly projecting total population. If that is the case, then why not just directly project total population? The information sets are essentially identical. In fact, projections of the independent variables either implicitly or explicitly assume the existence of total population projections. The independent variables in the PPH regression should include things such as percent Latino, percent foreign born, and proportion over age 65. The projections of those independent variables can either be done using levels constrained to the county control totals or direct projections of the percentages. The levels approach ensures consistency with the county control totals but the recovery of percentages necessitates projections of total population (the denominator of the percentages). The projection of percentages does not ensure consistency with county control totals and the total population projection is implicit in the percentages. Moreover, projection and time series methods for rates are less well defined than those for levels.

Overall, the functional stationarity approach is tempting but is rarely used because of the issues listed above. The statistical problems resulting in error propagation are a sufficient deterrent alone; the circularity of either implicitly or explicitly projecting the total population (as a denominator) provides an additional damning critique.

5. GROUP QUARTERS POPULATION

The group quarters population is composed of individuals living in university dormitories, military barracks, and prisons. One often-used assumption is that the group quarters population will grow at the same rate as the rest of the population; in other words, the group quarters population is a constant fraction of total population. Over the 30 year projection period, the use of that assumption would tend to overestimate the group quarters population.

One option here would be to use a survey sample to directly gather information about future facilities development plans among the major institutions in the region. Universities and prisons, for instance, have their own long-range plans which include future construction. It is also likely that the cost of such a survey would not be worth the improved information since the group quarters population is such a small share of the total population. A targeted survey of those communities where the group quarters population is relatively large would make the most sense.

6. KEY ASSUMPTIONS

The three main areas where assumptions should be subject to local review are those that traditionally account for the most error in forecast values. Those areas include the housing unit forecasts (section 3) and the occupancy rates (section 4).

Occupied housing unit projection assumptions:

• Existence and nature of development constraints: If the extrapolation procedures use an external constraint, the local review process should attempt to ensure that the constraint is as reflective of reality as possible. The available data should only be viewed as a starting point. The local review process should focus on identifying sources of institutional constraints at the city level and quantifying those sources in terms of single family and multifamily units. The general plan build-out analysis currently underway at SCAG may help in the identification of more realistic constraint values.

• Excluding zoning codes from the land use map: We are currently excluding a range of land use codes (such as public infrastructure) and physical topographies (land slopes steeper than a given value). The exclusions remove acreage from the developable land category. The review process should evaluate whether the set of exclusions is sufficient, too strict, or too liberal.

• Nature of new growth: The methodology in equation (2) proceeds on the assumption that new development of single family and multifamily units will be added in proportions reflecting the existing development of the city. This assumption will likely understate the amount of land available to housing unit development since outlying areas will usually be developed at lower intensities (assuming a decreasing rent gradient) than the existing land uses. More conservative values to use in the second term of equation (2) should be discussed in the review process.

Occupancy (Persons per household) assumptions:

• Constant rate or demographically driven PPH: The conservative approach would be to rely on the constant PPH rate assumption or the use of PPH rates by type and rely on the changing composition of total occupied housing. The spatial demography approach is the most theoretically appealing but the data demands are prohibitive under the current timeline and budgetary constraints. The demographic regression approach is intuitively appealing on one level but theoretical and methodological drawbacks are severe. Of all the options, the use of PPH by type and relying on the projections of occupied housing units by type seems to be the best trade-off between predictive power and theory at the current time.

• Assumptions on fertility schedules and convergence: In the demographically driven approach the fertility schedules used for the foreign born and the assumed rate of convergence should be subject to review and tested for sensitivity to the rates chosen.

SCAG SMALL AREA PROJECTIONS: MANUAL AND CODE

JUNE 2003

STUART SWEENEY
DEPT. OF GEOGRAPHY
UNIVERSITY OF CALIFORNIA AT SANTA BARBARA

SAS CODE

DATA PREPARATION:
  adjust DOF to census.sas
  adjust DOF to census - mobile homes.sas
  adjust DOF to census - PPH.sas

HOUSEHOLD FORECAST:
  HH simulation calls.sas
  main HH simulation.sas
  graph HH sim.sas

PPH FORECAST:
  SCAG IML function library.sas
  PPH constrained regression.sas
  RSA projection.sas

INTRODUCTION

This document provides an overview of the computer code and associated data files used to create the 2030 small area forecasts for the Southern California Association of Governments. The code is written in the SAS programming language with most of the core modules written in SAS IML (interactive matrix language). All of the code and associated data files are provided on the CD-ROM included with this manual. After copying the SCAG directory from the CD-ROM to the main C: drive, the code can be executed in SAS software. A complete printed set of the computer programs is included as an appendix to this document.

The code can be grouped into three categories: (1) data preparation, (2) household forecast, and (3) PPH forecast. The remaining sections will briefly explain the tasks achieved by code in each category, the input and output data from each set of code, and the interrelationship between different sets of code.

The code should be fairly easy to update for future forecasting exercises. If the same underlying geography is used, updates will be extremely straightforward. Easy modifications would include changes in the input baseline data, revised county control totals, or revised PPH constraints. If the geography changes, the modifications will require tracing through all of the do loops over the 200 geographic units.

DATA PREPARATION

The basic data inputs include information from the California Department of Finance Demographic Research Unit (DOF), the U.S. Census Bureau, and control totals derived from concurrent SCAG projections. The DOF data includes building permit and PPH time series. The census data provides decennial census counts. All of the data sets were provided by SCAG.

One of the major tasks in preparing the data was rectifying all of the geography across different periods. The baseline census data is inconsistent across decades for some cities (due to annexation) as is the DOF data. A SAS program in the ‘format’ subdirectory includes several format statements that map either 1990 or 2000 place FIPS codes to a numeric ID variable. Including the subregions (RSAs) there are 200 geographic units. Every SAS program calls this basic format file to load all of the place formats. The numeric ID allows for do loops over the complete sets of spatial units.

As described in the SCAG methodology, the DOF and baseline census data can be combined to create synthetic time series. There are three programs that carry out that task:

1D. Adjust DOF to census.sas - Adjusts the DOF data to pass through observed census values. The code is executed once for each type of housing unit except mobile homes. There is a series of SAS macro variables nested in the middle of the program which assign the series (hh, sf, mf), variable name strings, and the output file name (see page A-2). The code is used to create the files: hh_adj_dof, sf_adj_dof, and mf_adj_dof. The code also generates line graphs which can be used to visually inspect the output data.

2D. Adjust DOF to census - mobile homes.sas - This code is almost identical to the previous program. The mobile home data is more problematic and requires some unique coding. The output file is: mb_adj_dof.

3D. Adjust DOF to census - PPH.sas - This code performs the same basic task for PPH values in the census and DOF data. The output data is: pph_adj_dof.

HOUSEHOLD FORECAST WITH STOCHASTIC CONSTRAINT

After the data sets are created above, the next step is to project household values, by housing type, out to 2030. As described in the methodology, the projection is constrained by a county control total. Each control total is partitioned into a city level constraint by sampling from a Dirichlet distribution with its parameters determined by a beta distribution. The beta distribution is bounded between an upper and lower bound formed by the maximum and minimum of the 5-, 10-, and 15-year growth rate for the city. Thus, fast growing cities will tend to account for more of the future growth. The goal is to use the statistical distribution to incorporate uncertainty about the future city level constraints. Other approaches are discussed in the methodology but the current code only implements constraints initialized off of prior growth in the observed series.

There are two programs that implement the stochastic forecast:

1H. HH simulation calls.sas - This code includes macro variables to specify general parameters (number of simulations, shape parameters for the beta distribution) as well as specific parameters related to housing type series. The program calls the main simulations program (2H) using the %include statements. Note that the call for mobile homes includes the use of spatial contiguity to resolve some missing data issues with that series.

2H. main HH simulation.sas - This program is passed arguments from 1H and then carries out the stochastic simulation. The basic flow of the program is as follows:

1. Define growth limits
2. Main simulation
   a. start county loop
   b. start city within county loop
   c. draw ‘&sim’ number of values from beta-Dirichlet
   d. compute constrained exponential projection for each sim
   e. force adding up, select ‘optimal’ projection
   f. end city loop
   g. end county loop
3. Write the projection data to a file.

Note that the beta-Dirichlet sampling is drawn in a single pass rather than as a loop. This dramatically speeds up the execution of the program. Still, the program does take a long time to execute with 10,000 or more simulations. The program also includes a program execution monitor that will display the city and iteration during the program execution. Also note that the program directly calls the file ‘control’ which contains the county control totals.
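Step 2d of the program flow can be illustrated with a short Python sketch. The manual does not state the exact functional form, so the sketch assumes one simple reading of a constrained exponential: a path that starts at the base-year value and is forced to reach a target value (base plus the simulated city constraint) at the end of the horizon. The function name and the numeric inputs are hypothetical.

```python
import math

def constrained_exponential(base, target, horizon, steps):
    """Exponential path from `base` at t=0 that reaches `target` at t=horizon."""
    # The growth rate is pinned down by the two endpoint values (assumed form).
    rate = math.log(target / base) / horizon
    return [base * math.exp(rate * t) for t in steps]

steps = [0, 5, 10, 15, 20, 25, 30]   # years since 2000: 2000, 2005, ..., 2030
path = constrained_exponential(base=10000.0, target=13000.0,
                               horizon=30, steps=steps)
```

In the simulation one such path would be computed for every draw of the city constraint, giving the distribution of curves from which the ‘optimal’ projection is selected in step 2e.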

3H. graph HH sim.sas - This program is also called by 1H. It prints line graphs of the output data from 2H.

PPH AND RSA FORECASTS WITH CONSTRAINTS

The last step is to calculate population values from the household values by applying a persons per household rate. Again, the methodology report describes several alternative approaches. The method implemented is the one selected by the FTTF committee. Specifically, the PPH values from the DOF series are adjusted to pass through the observed census values in the baseline file. This was accomplished by 3D. Given the adjusted PPH values the goal is to extrapolate PPH but to force it to lie between constraints specified by an analyst (in this case Simon Choi). The constraints prevent the PPH values from becoming outlandishly high or low.

The PPH forecast becomes a large balancing exercise. The HH forecast from the preceding step is the initial value. But HH values given the PPH projection may imply a total population that exceeds the county control totals. Moreover, no series can trend negative.

The solution to the problem is a series of iterative adjustment programs that search for a PPH path that is jointly consistent with the county population constraint, PPH bounds, and household by type forecasts. The iterative solution uses 3-way iterative proportional fitting (IPF) nested within a larger constraint checking loop. After that task is complete, it is relatively straightforward to derive the RSA forecasts. The programs associated with this block include:

1P. SCAG IML function library.sas - This program creates a compiled library of IML functions that can be called by 2P. The SCAG directory on the CD-ROM already includes the compiled library so it is not necessary to run the program again unless the function code is modified.

2P. PPH constrained regression.sas - This program is rather lengthy and could have been broken into several shorter programs. The first section executes a basic regression based extrapolation of PPH. The second major block uses an iterative method to find a solution to the multiple constraints binding the city level forecast values of PPH, HH by type, and population. The last section verifies that the constraints have been met and produces line graphs to inspect the resulting data. The iterative solution takes a while to converge so a program execution monitor was written to display the iteration and convergence information.

3P. RSA projection.sas - This program simply decomposes the city forecasts into RSA regions.

ADDITIONAL COMMENTS

Extending code to new geographies or forecast lengths

As currently written, the code is tailored to this specific forecasting run with 200 regions and a forecast period from 2000 to 2030. The primary control over the regionalization (geography) is contained in the format program. The formats map census city codes and subregion codes to ordinal city/region IDs. The main simulation program uses a loop over the counties and within it a loop over the vector of cities. There is also a format (cc) which is an ordinal indicator of the city ID within a county. Thus any changes to the geography will have to have associated changes in the formats. After that is done, any loops over 200 will need to be changed.

The hard coding of the forecast period is more extensive. In several places the code refers to 21 x 1 vectors (1980-2000). For the forecast period, with its five-year interval, the code uses ((0:6)*5)`+J(7,1,21), which generates the time index for the forecast period 2000 to 2030 (21, 26, ..., 51).
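The arithmetic behind these hard-coded values can be checked with a short Python sketch. It is not part of the SAS programs; the function names and the convention that 1980 has time index 1 are assumptions inferred from the 21 x 1 historical vectors described above.

```python
def history_length(base_year, last_year):
    """Number of annual observations from base_year through last_year."""
    return last_year - base_year + 1


def forecast_index(base_year, start_year, end_year, step):
    """Return (years, t), where t counts years from base_year (base_year -> 1)."""
    years = list(range(start_year, end_year + 1, step))
    t = [year - base_year + 1 for year in years]
    return years, t


if __name__ == "__main__":
    print(history_length(1980, 2000))           # 21
    years, t = forecast_index(1980, 2000, 2030, 5)
    print(years)  # [2000, 2005, 2010, 2015, 2020, 2025, 2030]
    print(t)      # [21, 26, 31, 36, 41, 46, 51]
```

Changing the forecast horizon or interval changes both the number of forecast points (7 here) and the starting index (21 here), which is why both appear as literals in the IML expressions.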

To help in adapting the code to subsequent applications, I have included a simple comment “* – YEAR CHANGE – ;” for sections with hard-coded time or “* – GEOG CHANGE – ;” for sections with hard-coded geography. I have not included “* – YEAR CHANGE – ;” comments in obvious cases where 1980, 1990, 2000, or 2030 are directly referenced. In these cases a search and replace will work directly.
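One way to locate the flagged sections is a small text search. The Python sketch below is a hypothetical helper, not something shipped with the SAS programs. It matches the marker wording quoted above and also the shorter forms `** –TIME– ;` and `** –GEOG– ;` that appear in the appendix listings.

```python
import re

# Matches "* – YEAR CHANGE – ;", "* – GEOG CHANGE – ;", "** –TIME– ;", "** –GEOG– ;"
MARKER = re.compile(
    r"\*+\s*[–-]\s*(YEAR CHANGE|GEOG CHANGE|TIME|GEOG)\s*[–-]\s*;")


def find_change_markers(text):
    """Return a list of (line_number, kind) for every marker found in text."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in MARKER.finditer(line):
            label = match.group(1)
            kind = "time" if label in ("YEAR CHANGE", "TIME") else "geography"
            hits.append((line_no, kind))
    return hits


if __name__ == "__main__":
    sample = (
        "do s=1 to 200 ; ** –GEOG– ;\n"
        "ids=ids//(J(21,1,s)||(1980:2000)`) ; ** –TIME– ;\n"
        "x=1 ; * – YEAR CHANGE – ;\n"
    )
    for hit in find_change_markers(sample):
        print(hit)
```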

IPF/Adding up routines

One of the more convoluted sections of code deals with adding-up constraints. As noted above, the projections are constrained to be non-negative and, for PPH, required to lie between upper and lower bounds. The function library contains three modules I wrote related to iterative proportional fitting. Note that these are not part of the function library that comes with SAS and are therefore not documented in the SAS help documents. The first IPF module handles the simple case of a 2-dimensional array, X, with r rows and c columns. The function is:

IPF(row,col,dist,pred)

where,

row = a known row total with dimension r x 1,
col = a known column total with dimension 1 x c,
dist = an r x c matrix,
pred = an r x c matrix with row totals equal to row, column totals equal to col, and odds ratios equal to those of dist.
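For readers unfamiliar with the technique, here is a minimal Python sketch of 2-dimensional iterative proportional fitting. It is not the IML module itself: the function name `ipf_2d`, the use of nested lists, and the `max_iter` and `tol` arguments are assumptions for illustration. It returns the fitted matrix rather than writing it to an output argument.

```python
def ipf_2d(row, col, dist, max_iter=1000, tol=1e-9):
    """Fit an r x c matrix to row and column totals by alternate scaling.

    row:  list of r row totals
    col:  list of c column totals (same grand total as row)
    dist: r x c seed matrix whose odds ratios are preserved
    """
    r, c = len(row), len(col)
    pred = [[float(v) for v in seed_row] for seed_row in dist]
    for _ in range(max_iter):
        # Scale each row to its target total.
        for i in range(r):
            total = sum(pred[i])
            if total > 0:
                factor = row[i] / total
                pred[i] = [v * factor for v in pred[i]]
        # Scale each column to its target total.
        for j in range(c):
            total = sum(pred[i][j] for i in range(r))
            if total > 0:
                factor = col[j] / total
                for i in range(r):
                    pred[i][j] *= factor
        # Columns now match exactly; stop once rows match as well.
        if max(abs(sum(pred[i]) - row[i]) for i in range(r)) < tol:
            break
    return pred


if __name__ == "__main__":
    fitted = ipf_2d(row=[40, 60], col=[30, 70], dist=[[1, 2], [3, 4]])
    for fitted_row in fitted:
        print(fitted_row)
```

Because each step only multiplies whole rows or whole columns by a constant, the cross-product (odds) ratios of the seed matrix carry through to the result.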

The second IPF routine is for a 3-dimensional matrix (a cube) with two of the ‘faces’ (the 2-dimensional marginal distributions) being known. In this application the 3-dimensional array is YEAR x CITY x TYPE, with the ‘faces’ YEAR x CITY and YEAR x TYPE given. Suppose the 3-dimensional array is indexed by y, c, and t. The function is:

IPF3_2F(f13, f23, offset, dimension, margins, max iterations, stopping rule)

where,

f13 = a known 2-dimensional matrix input as an N x 1 vector (where N=y*c),
f23 = a known 2-dimensional matrix input as an M x 1 vector (where M=y*t),
offset = a known distribution matrix input as a Z x 1 vector (where Z=y*c*t),
dimension = a 1 x 3 vector containing {y c t},
margins = an argument used by the SAS function MARG; in this case it should be {1 1 2, 2 3 3}. This does not need to be changed for alternative geographies or forecast lengths.

The final two arguments simply control the maximum number of iterations and the stopping rule. The latter should be a small number (0.001).
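The same idea extends to the cube. The Python sketch below is an illustration only, not the IML module: it takes the faces and the offset as nested lists instead of stacked vectors, and the function name, argument names, and return value are assumptions. The default stopping rule mirrors the 0.001 suggested above. The two faces must agree on the total for each year.

```python
def ipf3_two_faces(f_yc, f_yt, offset, max_iter=100, tol=1e-3):
    """Fit a YEAR x CITY x TYPE cube to two known faces.

    f_yc:   YEAR x CITY totals (summed over TYPE)
    f_yt:   YEAR x TYPE totals (summed over CITY)
    offset: YEAR x CITY x TYPE seed cube
    Returns (cube, iterations_used).
    """
    n_y, n_c, n_t = len(offset), len(offset[0]), len(offset[0][0])
    x = [[[float(offset[y][c][t]) for t in range(n_t)]
          for c in range(n_c)] for y in range(n_y)]
    for iteration in range(1, max_iter + 1):
        # Match the YEAR x CITY face.
        for y in range(n_y):
            for c in range(n_c):
                total = sum(x[y][c])
                if total > 0:
                    factor = f_yc[y][c] / total
                    x[y][c] = [v * factor for v in x[y][c]]
        # Match the YEAR x TYPE face.
        for y in range(n_y):
            for t in range(n_t):
                total = sum(x[y][c][t] for c in range(n_c))
                if total > 0:
                    factor = f_yt[y][t] / total
                    for c in range(n_c):
                        x[y][c][t] *= factor
        # YEAR x TYPE now matches exactly; check YEAR x CITY.
        err = max(abs(sum(x[y][c]) - f_yc[y][c])
                  for y in range(n_y) for c in range(n_c))
        if err < tol:
            return x, iteration
    return x, max_iter


if __name__ == "__main__":
    # Hypothetical 2 x 2 x 2 example; each year totals 100 on both faces.
    cube, n_iter = ipf3_two_faces(
        f_yc=[[30, 70], [40, 60]],
        f_yt=[[50, 50], [20, 80]],
        offset=[[[1, 2], [3, 4]], [[2, 1], [1, 3]]])
    print(n_iter)
    print(cube)
```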


APPENDIX: SAS computer code

I. Data preparation

adjust DOF to census.sas
adjust DOF to census - mobile homes.sas
adjust DOF to census - PPH.sas

II. Household forecast with stochastic constraint

HH simulation calls.sas
main HH simulation.sas
graph HH sim.sas

III. PPH and RSA forecast with constraints

SCAG IML function library.sas
PPH constrained regression.sas
RSA projection.sas

IV. IML and MATLAB code related to general plan data

Create adjusted HH data with neg
MATLAB code (working draft)


/**************************************************************
** adjust DOF to census.sas -
** Adjusts DOF building permit data to census household counts.
** Needs to be re-run for hh, sf, and mf -- see macro variables
** below.
**
** INPUT DATA: baseline.dbf (census data), dof (building permit)
** OUTPUT DATA: ##_adj_dof, ##=hh,sf,mf
***************************************************************/

options nonotes nocenter linesize=256 pagesize=1000 ;
libname s "c:\scag\data" ;

%include "c:\scag\formats\scag place formats.sas" ;

PROC IMPORT OUT= base_cen DATAFILE= "C:\SCAG\data\baseline.dbf" DBMS=DBF REPLACE; GETDELETED=NO;RUN ;data t ; set s.dof ; where year=1990 and mb^=. and p00<99000 ; id=put(p00,p0_num.)/1 ; keep id mb ;run ;data base_cen ; set base_cen ; hh90=h90 ; id=put(fip00,p0_num.)/1 ; hc8090=hh90-hh80 ; sc8090=sdo90-sdo80 ; mc8090=mdo90-mdo80 ; hc9000=hh00-hh90 ; sc9000=sdo00-sdo90 ; mc9000=mdo00-mdo90 ;run ;proc sort data=base_cen ; by id ; run ;proc sort data=t ; by id ; run ;data base_cen ; merge base_cen t ; by id ; mb_occ=mbo90/mb ;run ;proc sort data=base_cen ; by subreg ; run ;proc summary data=base_cen nway ; where .5<=mb_occ<=1 ; var mb_occ ; class subreg ; output out=t mean=occ_avg ;run ;proc sort data=base_cen ; by subreg ; run ;proc sort data=t ; by subreg ; run ;data base_cen ; merge base_cen t ; by subreg ; if mb_occ>1 then mb_occ=1 ; if mb_occ=. then mb_occ=occ_avg ; if mb_occ<.5 then mb_occ=.5 ;run ;proc sort data=base_cen ; by id ; run ;/* proc print data=base_cen ; var id mbo90 mb mb_occ ; run ; */

data dof ; set s.dof ; keep year p00 sf mf mb hu sf80 mf80 mb80 hu80 ;data dof_c ; set s.dof ; where p00<99000 ; id=put(p00,p0_num.)/1 ; keep id year p00 sf mf mb hu sf80 mf80 mb80 hu80 ;data dof_c ; set dof_c ; where id notin (119,128,164) ; run ;proc iml ; use dof ;

read all var {year p00 sf mf mb hu sf80 mf80 mb80 hu80} where(p00=99025) into dof025 ; r=nrow(dof025) ; id=J(r,1,8) ; dof025=id||dof025 ;read all var {year p00 sf mf mb hu sf80 mf80 mb80 hu80} where(p00=99037) into dof037 ;


r=nrow(dof037) ;

id=J(r,1,12)//J(r,1,15)//J(r,1,19)//J(r,1,50)//J(r,1,55)//J(r,1,71)//J(r,1,98)//J(r,1,104); dof037=id||(dof037//dof037//dof037//dof037//dof037//dof037//dof037//dof037) ;read all var {year p00 sf mf mb hu sf80 mf80 mb80 hu80} where(p00=99059) into dof059 ; r=nrow(dof059) ; id=J(r,1,119)//J(r,1,128)//J(r,1,138); dof059=id||(dof059//dof059//dof059) ;read all var {year p00 sf mf mb hu sf80 mf80 mb80 hu80} where(p00=99065) into dof065 ; r=nrow(dof065) ; id=J(r,1,153)//J(r,1,164); dof065=id||(dof065//dof065) ;read all var {year p00 sf mf mb hu sf80 mf80 mb80 hu80} where(p00=99071) into dof071 ; r=nrow(dof071) ; id=J(r,1,189) ; dof071=id||dof071 ;read all var {year p00 sf mf mb hu sf80 mf80 mb80 hu80} where(p00=99111) into dof111 ; r=nrow(dof111) ; id=J(r,1,200) ; dof111=id||dof111 ;

dof=dof025//dof037//dof059//dof065//dof071//dof111 ;create dof_s from dof [colname={id year p00 sf mf mb hu sf80 mf80 mb80 hu80}] ; append

from dof ;quit ;data dof ; set dof_c dof_s ; run ;proc summary data=dof nway ; var sf mf mb hu sf80 mf80 mb80 hu80 ; class id ; output out=t sum= ;run ;proc means ; run ;

proc iml ; free ids ; do s=1 to 200 ; ** –GEOG– ; ids=ids//(J(21,1,s)||(1980:2000)`) ; ** –TIME– ; end ; create ids from ids [colname={id year}] ; append from ids ;quit ;proc sort data=ids ; by id year ; run ;proc sort data=dof ; by id year ; run ;data dof ; merge ids dof ; by id year ; drop p00 ;run ;

**** start adjustments **** ;

%let out =s.mf_adj_dof ; * sf mf mb hh ;
%let outstr =id year mf c_mf df occ path subpth ;
%let var = mf ;
%let var9_=mf ; * sf mf mb hu ;
%let var8_=mf80 ; * sf80 mf80 mb80 hu80 ;
*%let string=hh80 hh90 hh00 hc8090 hc9000 ;
*%let string=sdo80 sdo90 sdo00 sc8090 sc9000 ;
%let string=mdo80 mdo90 mdo00 mc8090 mc9000 ;
*%let string=mbo00 mb_occ ;
options nonotes ;

proc iml ; r=3:12 ; r9=1||r ; ** –TIME– ; r=1:10 ; r8=r||12 ; ** –TIME– ; do i=1 to 200 ; * 1-200 ; ** –GEOG– ; use dof ; read all var {&var9_} where(id=i & year>=1990) into h9 ;

read all var {&var8_} where(id=i & year<=1990) into h8 ; read all var {id year} where(id=i) into year ;

read all var {&var9_} where(id=i & year=1990) into h1990 ; use base_cen ; read all var {&string} where(id=i) into cen ;


if nrow(h9)=12 then h9=h9[r9,] ;if nrow(h8)=12 then h8=h8[r8,] ; * change missing values to zero ;cen=choose(cen=.,0,cen) ;h8=choose(h8=.,0,h8) ;h9=choose(h9=.,0,h9) ;

h1990=h9[1,] ; *print h8 h9 h1990 cen ;

* combining dof and census data ;if (cen[,1]>0 & h8[1,]>0) | (cen[,2]>0 & h1990>0) then do ; * both cen value positive,

DOF value present ; t=J(21,3,0) ; path=J(21,1,1) ;

neg8=0 ; neg9=0 ; t[1,]=(cen[,1]||h8[1,]||(cen[,1]/h8[1,])) ; t[11,]=(cen[,2]||h1990||(cen[,2]/h1990)) ;t[21,]=(cen[,3]||h9[11,]||(cen[,3]/h9[11,])) ; t[,2]=h8[1:10,]//h9 ; if cen[,1]>0 then do ; * value in 1980 ;

d=h9[2:11,]-h9[1:10,] ; if sum(d<0)>0 then neg9=1 ; subpth=J(21,1,1) ; p2=cen[,2]//((d/sum(d))*cen[,5]) ;

if sum(d)=0 then p2=cen[,2]//(J(10,1,.1)*cen[,5]) ; d=h8[2:11,]-h8[1:10,] ; if sum(d<0)>0 then neg8=1 ;

p1=cen[,1]//((d/sum(d))*cen[,4]) ; if sum(d)=0 then p1=cen[,1]//(J(10,1,.1)*cen[,4]) ; p=p1//p2[2:11,] ;

end ; if cen[,1]=0 then do ; * no value in 1980 ; d=h9[2:11,]-h9[1:10,] ; if sum(d<0)>0 then neg9=1 ; subpth=J(21,1,2) ; p2=cen[,2]//((d/sum(d))*cen[,5]) ;

if sum(d)=0 then p2=cen[,2]//(J(10,1,.1)*cen[,5]) ; p=J(10,1,0)//p2 ;

end ; free n ;

do j=1 to nrow(p) ; n=n//sum(p[1:j,]) ;

end ; if sum(choose({48,65,66}=i,1,0))>0 then do ; print i n t p ; end ;end ;

if (cen[,1]>0 & h8[1,]=0) | (cen[,2]>0 & h1990=0) then do ; * cen value positive in 80 or90 and DOF absent ; t=J(21,3,0) ; path=J(21,1,2) ; subpth=J(21,1,0) ; t[1,1]=cen[,1] ; t[11,1]=cen[,2] ; t[21,1]=cen[,3] ;

t[,2]=h8[1:10,]//h9 ; n=t[,1] ;end ; if (sum(h9>0)<=10 & sum(h9>0)>0) then do ;

g=h9/h9[11,] ; path=J(21,1,3) ; subpth=J(21,1,0) ; t=J(21,3,0) ; t[11:21,1]=g*cen[,3] ; t[,2]=h8[1:10,]//h9 ;

n=t[,1] ;end ;if (sum(h8>0)<=10 & sum(h8>0)>0) then do ;

g=h8/h8[11,] ; path=J(21,1,4) ; subpth=J(21,1,0) ; n[1:11,]=g*cen[,2] ;end ;

if neg8=1 then n[1:10,]=t[1:10,1] ; if neg9=1 then n[11:21,]=t[11:21,1] ; if cen[,4]<0 then n[1:10,]=J(10,1,0) ;

if cen[,5]<0 then n[1:20,]=J(20,1,0) ; if (sum(n>0)=1 | sum(n>0)=0) & ((cen[,3]-cen[,1])<=0 | cen[,1]=0) then do ;n=J(21,1,cen[,3]) ; a={'a'} ; *print i a n ; end ;

if sum(n>0)=1 & (cen[,3]-cen[,1])>0 then do ; n[1,]=cen[,1] ;

n[21,]=cen[,3] ; b={'b'} ; * print i b n ;


end ;dat=dat//(J(21,1,i)||(1980:2000)`||round(n,1)||t||path||subpth) ; ** –TIME– ;

end ; create dat from dat [colname={&outstr}] ; append from dat ;quit ;

data &out ; set dat ; if &var=0 then &var=. ;* if cum=. and cen>0 then cum=cen ;origin=0 ;plc00=put(id,num_p0_.)/1 ;name=put(id,num_nam.) ; run ;proc sort data=&out ; by id ; *by name ; run ;

options linesize=100 pagesize=100 ;data t ; set &out ; miss=0 ; if year=2000 and &var^=c_&var then miss=1 ; if miss=1 then put id 3."," @@ ; *if id in (43, 84) then put id 4. year 5. mf comma10. c_mf comma10. path 3. subpth 3. ;run ;

goptions reset=(axis, legend, pattern, symbol, title, footnote) norotate hpos=0 vpos=0 htext= ftext= ctext= target= gaccess= gsfmode= ;goptions device=WIN ctext=blue graphrc interpol=join;symbol1 c=DEFAULT ci=black v=NONE width=5 ;symbol2 c=DEFAULT ci=blue v=NONE ;symbol3 c=DEFAULT ci=BLUE v=NONE ;symbol4 c=DEFAULT ci=GREEN v=NONE ;symbol5 c=DEFAULT ci=orange v=NONE ;axis1 color=blue width=2.0 ;axis2 color=blue width=2.0 ;axis3 color=blue width=2.0 ;

proc gplot data=&out ; * where year<2015 ; by id ; *by name ; plot (&var origin) * year /overlay haxis=axis1 vaxis=axis2 frame ;run; quit;proc printto log=work.curstat.last.log new;run;proc printto;run;


/**************************************************************
** adjust DOF to census - mobile homes.sas -
** Adjusts DOF building permit data to census household counts
** for mobile homes.
**
** INPUT DATA: baseline.dbf (census data), dof (building permit)
** OUTPUT DATA: mb_adj_dof
***************************************************************/

options nonotes nocenter linesize=256 pagesize=1000 ;
libname s "c:\scag\data" ;
%include "c:\scag\formats\scag place formats.sas" ;

PROC IMPORT OUT= base_cen DATAFILE= "C:\SCAG\data\baseline.dbf" DBMS=DBF REPLACE; GETDELETED=NO;RUN ;data t ; set s.dof ; where year=1990 and mb^=. and p00<99000 ; id=put(p00,p0_num.)/1 ; keep id mb ;run ;data base_cen ; set base_cen ; hh90=h90 ; id=put(fip00,p0_num.)/1 ; hc8090=hh90-hh80 ; sc8090=sdo90-sdo80 ; mc8090=mdo90-mdo80 ; hc9000=hh00-hh90 ; sc9000=sdo00-sdo90 ; mc9000=mdo00-mdo90 ;run ;proc sort data=base_cen ; by id ; run ;proc sort data=t ; by id ; run ;data base_cen ; merge base_cen t ; by id ; mb_occ=mbo90/mb ;run ;proc sort data=base_cen ; by subreg ; run ;proc summary data=base_cen nway ; where .5<=mb_occ<=1 ; var mb_occ ; class subreg ; output out=t mean=occ_avg ;run ;proc sort data=base_cen ; by subreg ; run ;proc sort data=t ; by subreg ; run ;data base_cen ; merge base_cen t ; by subreg ; if mb_occ>1 then mb_occ=1 ; if mb_occ=. then mb_occ=occ_avg ; if mb_occ<.5 then mb_occ=.5 ;run ;proc sort data=base_cen ; by id ; run ;

data dof ; set s.dof ; keep year p00 sf mf mb hu sf80 mf80 mb80 hu80 ;data dof_c ; set s.dof ; where p00<99000 ; id=put(p00,p0_num.)/1 ; keep id year p00 sf mf mb hu sf80 mf80 mb80 hu80 ;data dof_c ; set dof_c ; where id notin (119,128,164) ; run ;proc iml ; use dof ;

read all var {year p00 sf mf mb hu sf80 mf80 mb80 hu80} where(p00=99025) into dof025 ; r=nrow(dof025) ; id=J(r,1,8) ; dof025=id||dof025 ;read all var {year p00 sf mf mb hu sf80 mf80 mb80 hu80} where(p00=99037) into dof037 ; r=nrow(dof037) ;

id=J(r,1,12)//J(r,1,15)//J(r,1,19)//J(r,1,50)//J(r,1,55)//J(r,1,71)//J(r,1,98)//J(r,1,104);


dof037=id||(dof037//dof037//dof037//dof037//dof037//dof037//dof037//dof037) ;read all var {year p00 sf mf mb hu sf80 mf80 mb80 hu80} where(p00=99059) into dof059 ; r=nrow(dof059) ; id=J(r,1,119)//J(r,1,128)//J(r,1,138); dof059=id||(dof059//dof059//dof059) ;read all var {year p00 sf mf mb hu sf80 mf80 mb80 hu80} where(p00=99065) into dof065 ; r=nrow(dof065) ; id=J(r,1,153)//J(r,1,164); dof065=id||(dof065//dof065) ;read all var {year p00 sf mf mb hu sf80 mf80 mb80 hu80} where(p00=99071) into dof071 ; r=nrow(dof071) ; id=J(r,1,189) ; dof071=id||dof071 ;read all var {year p00 sf mf mb hu sf80 mf80 mb80 hu80} where(p00=99111) into dof111 ; r=nrow(dof111) ; id=J(r,1,200) ; dof111=id||dof111 ;

dof=dof025//dof037//dof059//dof065//dof071//dof111 ;create dof_s from dof [colname={id year p00 sf mf mb hu sf80 mf80 mb80 hu80}] ; append

from dof ;quit ;data dof ; set dof_c dof_s ; run ;proc summary data=dof nway ; var sf mf mb hu sf80 mf80 mb80 hu80 ; class id ; output out=t sum= ;run ;proc means ; run ;

proc iml ; free ids ; do s=1 to 200 ; ** –GEOG– ; ids=ids//(J(21,1,s)||(1980:2000)`) ; ** –TIME– ; end ; create ids from ids [colname={id year}] ; append from ids ;quit ;proc sort data=ids ; by id year ; run ;proc sort data=dof ; by id year ; run ;data dof ; merge ids dof ; by id year ; drop p00 ;run ;

**** start adjustments **** ;

%let out =s.mb_adj_dof ; * sf mf mb hh ;
%let outstr =id year mb c_mb path ;
%let var = mb ;
%let var9_=mb ; * sf mf mb hu ;
%let var8_=mb80 ; * sf80 mf80 mb80 hu80 ;
*%let string=hh80 hh90 hh00 hc8090 hc9000 ;
*%let string=sdo80 sdo90 sdo00 sc8090 sc9000 ;
*%let string=mdo80 mdo90 mdo00 mc8090 mc9000 ;
%let string=mbo00 mb_occ ;
options nonotes ;

proc iml ; r=3:12 ; r9=1||r ; ** –TIME– ; r=1:10 ; r8=r||12 ; b=0 ; ** –TIME– ; do i=1 to 200 ; * 1-200 ; ** –GEOG– ; use dof ; read all var {&var9_} where(id=i & year>=1990) into h9 ;

read all var {&var8_} where(id=i & year<=1990) into h8 ; read all var {id year} where(id=i) into year ;

read all var {&var9_} where(id=i & year=1990) into h1990 ; use base_cen ; read all var {&string} where(id=i) into cen ;

if nrow(h9)=12 then h9=h9[r9,] ;if nrow(h8)=12 then h8=h8[r8,] ; * change missing values to zero ;


cen=choose(cen=.,0,cen) ;h8=choose(h8=.,0,h8) ;h9=choose(h9=.,0,h9) ;

h1990=h9[1,] ; *print h8 h9 h1990 cen ;

if cen[,1]>0 & sum(h9>0)>0 then do ; p2=(h9/h9[11,])*cen[,1] ; path=J(21,1,1) ;

p1=(h8/h8[11,])*p2[1,] ; p=p1//p2[2:11,] ;

p=choose((J(21,1,p[21,])-p)<0,0,p) ; nzero=sum(p>0) ;

end ; if cen[,1]>0 & sum((h8//h9)>0)=0 then do ;

p=J(21,1,cen[,1]) ; path=J(21,1,2) ;nzero=sum(p>0) ;

end ;if cen[,1]=0 & sum((h8//h9)>0)=0 then do ;

p=J(21,1,0) ; path=J(21,1,3) ;nzero=sum(p>0) ;

end ;if cen[,1]=0 & sum((h8//h9)>0)>0 then do ; * no census value - assume error in baseline

assumption, pos dof value ;h8=(h8/h8[11,])*h9[1,] ; path=J(21,1,4) ;p=h8[1:10,]//h9 ; p=int(choose((J(21,1,p[21,])-p)<0,0,p)*cen[,2]) ;

nzero=sum(p>0) ;b=b+1 ;

end ;do z=1 to nrow(p)-1 ;

if p[z+1,]<p[z,] then p[z+1,]=p[z,] ;end ;

* print i nzero ; t=J(21,1,0) ; t[21,1]=cen[,1] ;if cen[,1]=0 then do ; p_=p ; p=J(21,1,0) ; print p_ p ; end ;dat=dat//(J(21,1,i)||(1980:2000)`||p||t||path) ;

end ; create dat from dat [colname={&outstr}] ; append from dat ;quit ;

data &out ; set dat ; if &var=0 then &var=. ;* if cum=. and cen>0 then cum=cen ;origin=0 ;plc00=put(id,num_p0_.)/1 ;name=put(id,num_nam.) ; run ;data t ; set &out ; miss=0 ; name=put(id,num_nam.) ; if year=2000 and &var^=c_&var then miss=1 ; if miss=1 then put id 3."," @@ ; run ;data t ; set &out ; miss=0 ; name=put(id,num_nam.) ; if id in ( 18, 20, 24, 31, 43, 48, 54, 57, 65, 66, 68, 73, 84, 94, 96, 99,101,128,193) then put id 4. year 5. &var comma10. c_&var comma10. path 4. +2 name $40. ;run ;

proc sort data=&out ; by id ; *by name ; run ;goptions reset=(axis, legend, pattern, symbol, title, footnote) norotate hpos=0 vpos=0 htext= ftext= ctext= target= gaccess= gsfmode= ;goptions device=WIN ctext=blue graphrc interpol=join;symbol1 c=DEFAULT ci=black v=NONE width=5 ;symbol2 c=DEFAULT ci=blue v=NONE ;symbol3 c=DEFAULT ci=BLUE v=NONE ;


symbol4 c=DEFAULT ci=GREEN v=NONE ;symbol5 c=DEFAULT ci=orange v=NONE ;axis1 color=blue width=2.0 ;axis2 color=blue width=2.0 ;axis3 color=blue width=2.0 ;

proc gplot data=&out ; * where year<2015 ; by id ; *by name ; plot (&var origin) * year /overlay haxis=axis1 vaxis=axis2 frame ;run; quit;proc printto log=work.curstat.last.log new;run;proc printto;run;


/**************************************************************
** adjust DOF to census - PPH.sas -
** Adjusts DOF building permit data to census household counts
** for persons per household (PPH). Incorporates min and max
** values from FTTF.
**
** INPUT DATA: baseline.dbf (census data), dof (building permit)
** OUTPUT DATA: pph_adj_dof
***************************************************************/

options nonotes nocenter linesize=256 pagesize=100 ;
libname s "c:\scag\data" ;
%include "c:\scag\formats\scag place formats.sas" ;

PROC IMPORT OUT= base_cen DATAFILE= "C:\SCAG\data\baseline.dbf" DBMS=DBF REPLACE; GETDELETED=NO;RUN ;data base_cen ; set base_cen ; hh90=h90 ; id=put(fip00,p0_num.)/1 ; pph80=res80/hh80 ; pph90=res90/hh90 ; pph00=res00/hh00 ; if res80=0 then pph80=pph90 ; c8090=pph90-pph80 ; c9000=pph00-pph90 ;run ;proc sort data=base_cen ; by id ; run ;

data dof ; set s.dof ; keep year p00 pph ;data dof_c ; set s.dof ; where p00<99000 ; id=put(p00,p0_num.)/1 ; keep id year p00 pph ;data dof_c ; set dof_c ; where id notin (119,128,164) ; run ;proc iml ; use dof ;

read all var {year p00 pph} where(p00=99025) into dof025 ; r=nrow(dof025) ; id=J(r,1,8) ; dof025=id||dof025 ;read all var {year p00 pph} where(p00=99037) into dof037 ; r=nrow(dof037) ;

id=J(r,1,12)//J(r,1,15)//J(r,1,19)//J(r,1,50)//J(r,1,55)//J(r,1,71)//J(r,1,98)//J(r,1,104); dof037=id||(dof037//dof037//dof037//dof037//dof037//dof037//dof037//dof037) ;read all var {year p00 pph} where(p00=99059) into dof059 ; r=nrow(dof059) ; id=J(r,1,119)//J(r,1,128)//J(r,1,138); dof059=id||(dof059//dof059//dof059) ;read all var {year p00 pph} where(p00=99065) into dof065 ; r=nrow(dof065) ; id=J(r,1,153)//J(r,1,164); dof065=id||(dof065//dof065) ;read all var {year p00 pph} where(p00=99071) into dof071 ; r=nrow(dof071) ; id=J(r,1,189) ; dof071=id||dof071 ;read all var {year p00 pph} where(p00=99111) into dof111 ; r=nrow(dof111) ; id=J(r,1,200) ; dof111=id||dof111 ;

dof=dof025//dof037//dof059//dof065//dof071//dof111 ;create dof_s from dof [colname={id year p00 pph}] ; append from dof ;

quit ;data dof ; set dof_c dof_s ; run ;


proc summary data=dof nway ; var pph ; class id ; output out=t sum= ;run ;proc means ; run ;

proc iml ; free ids ; do s=1 to 200 ; ids=ids//(J(21,1,s)||(1980:2000)`) ; ** –TIME– ; end ; create ids from ids [colname={id year}] ; append from ids ;quit ;proc sort data=ids ; by id year ; run ;proc sort data=dof ; by id year ; run ;data dof ; merge ids dof ; by id year ; drop p00 ;run ;

**** start adjustments **** ;%let out =s.pph_adj_dof ; %let outstr =id year pph pph_cen pph_dof lim lim_ mn_s mx_s ;%let var = phh ;%let string=pph80 pph90 pph00 c8090 c9000 ; options nonotes ;

proc iml ; r=3:12 ; r9=1||r ; ** –TIME– ; r=1:10 ; r8=r||12 ; ** –TIME– ; do i=1 to 200 ; * 1-200 ; use dof ; read all var {pph} where(id=i & year>=1990) into h9 ;

read all var {pph} where(id=i & year<=1990) into h8 ; read all var {id year} where(id=i) into year ;

read all var {pph} where(id=i & year=1990) into h1990 ; use base_cen ; read all var {&string} where(id=i) into cen ;

read all var {mn_s mx_s} where(id=i) into lim ;if nrow(h9)=12 then h9=h9[r9,] ;if nrow(h8)=12 then h8=h8[r8,] ;

if cen[,1]<1.2 | cen[,1]>5.5 then cen[,1]=h8[1,] ;if cen[,2]<1.2 | cen[,2]>5.5 then cen[,2]=h9[1,] ;cen[,4]=cen[,2]-cen[,1] ;cen[,5]=cen[,3]-cen[,2] ;

* change missing values to zero ;cen=choose(cen=.,0,cen) ;h8=choose(h8=.,0,h8) ;h9=choose(h9=.,0,h9) ;

h1990=h9[1,] ;

* combining dof and census data ; t=J(21,3,0) ; ** –TIME– ;

neg8=0 ; neg9=0 ; t[1,]=(cen[,1]||h8[1,]||(cen[,1]/h8[1,])) ; t[11,]=(cen[,2]||h1990||(cen[,2]/h1990)) ; t[21,]=(cen[,3]||h9[11,]||(cen[,3]/h9[11,])) ; t[,2]=h8[1:10,]//h9 ; if cen[,1]>0 then do ; * value in 1980 ;

d=h9[2:11,]-h9[1:10,] ; if sum(d<0)>0 then neg9=1 ; p2=cen[,2]//((d/sum(d))*cen[,5]) ;

d=h8[2:11,]-h8[1:10,] ; if sum(d<0)>0 then neg8=1 ; p1=cen[,1]//((d/sum(d))*cen[,4]) ;

p=p1//p2[2:11,] ; end ; if cen[,1]=0 then do ; * no value in 1980 ; d=h9[2:11,]-h9[1:10,] ; if sum(d<0)>0 then neg9=1 ;


p2=cen[,2]//((d/sum(d))*cen[,5]) ; p=J(10,1,0)//p2 ;

end ; free n ;

do j=1 to nrow(p) ; n=n//sum(p[1:j,]) ;

end ;

if sum(t[,1]>0)=1 then do ; t[1:20,1]=t[1:20,2] ;

print t;end ;tmp=t[loc(t[,1]),1] ;dif=tmp[nrow(tmp),]-tmp[1,] ;if dif>=0 then do ;

lim_a=J(21,1,lim[,2]+.2) ; lim_b=J(21,1,lim[,1]-.2) ;end;if dif<0 then do ;

lim_a=J(21,1,lim[,1]-.2) ; lim_b=J(21,1,lim[,2]+.2) ;end ;lim_c=J(21,1,lim[,1])||J(21,1,lim[,2]) ;dat=dat//(J(21,1,i)||(1980:2000)`||n||t[,1:2]||lim_a||lim_b||lim_c) ;

end ; create dat from dat [colname={&outstr}] ; append from dat ;quit ;

data &out ; set dat ; if pph=0 then pph=. ;if pph_cen=0 then pph_cen=. ;if pph_dof=0 then pph_dof=. ; lw=1.2 ; hg=5.5 ;plc00=put(id,num_p0_.)/1 ;name=put(id,num_nam.) ; run ;proc sort data=&out ; by id ; run ;

goptions reset=(axis, legend, pattern, symbol, title, footnote) norotate hpos=0 vpos=0 htext= ftext= ctext= target= gaccess= gsfmode= ;goptions device=WIN ctext=blue graphrc interpol=join;symbol1 c=DEFAULT ci=red v=NONE width=3 ;symbol2 c=DEFAULT ci=blue v=NONE width=3 ;symbol3 c=DEFAULT ci=green v=NONE width=2 ;symbol4 c=DEFAULT ci=green v=NONE width=2 ;axis1 color=blue width=2.0 ;axis2 color=blue width=2.0 ;axis3 color=blue width=2.0 ;title "PPH" ;footnote "pph=red, bounds=blue" ;proc gplot data=&out ; by id ; plot (pph_dof pph_cen lw hg) * year / overlay haxis=axis1 vaxis=axis2 frame ;

format id num_nam. ;run; quit;proc printto log=work.curstat.last.log new;run;proc printto;run;


/**************************************************************
** HH simulation calls.sas -
** Sets parameters and calls 'main HH simulation.sas' for each
** type of housing data (hh,sf,mf,mb). Macro variables control
** the number of simulations and shape of the beta distribution.
**
** INPUT DATA: baseline.dbf (census data), ##_adj_dof (##=hh,sf,mf,mb),
**             control.dbf
** OUTPUT DATA: ##_stat (##=hh,sf,mf,mb)
***************************************************************/

options nonotes nocenter linesize=256 ;
libname s "c:/scag/data" ;

******************************** ;
*** common program controls **** ;
******************************** ;
%include "c:/scag/formats/scag place formats.sas" ;
%let sim=501 ; * number of simulations ;
%let o=_o ;
%let v= 1 ; * beta dist shape parm 1 - beta(v,w) ;
%let w= .2 ; * beta dist shape parm 2 ;

**************************** ;
*** TOTAL HOUSEHOLDS **** ;
**************************** ;
%let cont=0 ; * 1=use neighbors, 0=do not use neighbors ;
%let type=Total HH ; * ID used in screen monitor ;
%let var=HH ; * HH, SF, MF, MB -- in caps ;
%let out=hh_stat_test ;
%let in=hh_adj_dof ; * hh_adj sf_adj mf_adj ;
%include "c:/scag/main hh simulation.sas" ;

**************************** ;
*** SINGLE FAMILY HH **** ;
**************************** ;
%let cont=0 ; * 1=use neighbors, 0=do not use neighbors ;
%let type=Single family ; * ID used in screen monitor ;
%let var=SF ; * HH, SF, MF, MB -- in caps ;
%let out=sf_stat_test ;
%let in=sf_adj_dof ; * hh_adj sf_adj mf_adj ;
%include "c:/scag/main hh simulation.sas" ;

*************************** ;
*** MULTI FAMILY HH **** ;
*************************** ;
%let cont=0 ; * 1=use neighbors, 0=do not use neighbors ;
%let type=Multi family ; * ID used in screen monitor ;
%let var=MF ; * HH, SF, MF, MB -- in caps ;
%let out=mf_stat_test ;
%let in=mf_adj_dof ; * hh_adj sf_adj mf_adj ;
%include "c:/scag/main hh simulation.sas" ;

*************************** ;
*** MOBILE HOMES HH **** ;
*************************** ;
%let cont=1 ; * 1=use neighbors, 0=do not use neighbors ;
%let type=Mobile homes ; * ID used in screen monitor ;
%let var=MB ; * HH, SF, MF, MB -- in caps ;
%let out=mb_stat_test ;
%let in=mb_adj_dof ; * hh_adj sf_adj mf_adj ;
%include "c:/scag/main hh simulation.sas" ;

/*** Graphs ***/
%let type=Total HH;
%let out=hh_stat ;
%include "c:/scag/graph hh sim.sas" ;

%let type=Single-family HH;


%let out=sf_stat ;
%include "c:/scag/graph hh sim.sas" ;

%let type=Multi-family HH;
%let out=mf_stat ;
%include "c:/scag/graph hh sim.sas" ;

%let type=Mobile homes;
%let out=mb_stat ;
%include "c:/scag/graph hh sim.sas" ;


/**************************************************************
** main HH simulation.sas -
**
** Extrapolation for building permit data with stochastic constraint.
** General parameters for any HH type and with or without spatial weights.
** Program is called by HH simulation calls.sas.
**
** INPUT DATA: baseline.dbf (census data), ##_adj_dof (##=hh,sf,mf,mb),
**             control.dbf
** OUTPUT DATA: ##_stat (##=hh,sf,mf,mb)
***************************************************************/

data bp ; set s.&in ; cum=&var ; cou=put(id,num_cou.)/1 ; cc=put(id,num_cc.)/1 ;run ;proc iml ; use bp ; do i=1 to 200 ; ** –GEOG– ; read all var {cum} where(id=i) into z ; if sum(z=.)=nrow(z) then do ; vw={.2 1} ; mn=0 ; mx=1 ; end ; if sum(z=.)<nrow(z) then do ; read all var {cum} where(id=i & year=1985) into g85 ; read all var {cum} where(id=i & year=1990) into g90 ; read all var {cum} where(id=i & year=1995) into g95 ; read all var {cum} where(id=i & year=2000) into g00 ; g_=((g00-g95)/5)||((g00-g90)/10)||((g00-g85)/15) ; t=choose(g_=.,0,g_) ; if sum(t>0)=0 then do ; read all var {year cum} where(id=i & cum>0) into t ;

g_=(t[nrow(t),2]-t[1,2])/(t[nrow(t),1]-t[1,1]) ; if g_<1 then g_=0||max(1,int(t[nrow(t),2]*.0005)) ; if g_>=1 then g_=g_||(g_*1.03) ;

end ; mn=g_[,><] ; mx=g_[,<>] ; vw={&v &w} ; if (g_[,1]-mn)/(mx-mn)<.5 then vw={&w &v} ; end ; g=g//(i||vw||mn||mx) ; end ; create lim from g [colname={id v w mn mx}] ; append from g ;quit ;data lim_ ; set lim ; run ;

proc iml ; if &cont=1 then do ; ** use neighbor weights when &cont=1 ; use s.plc_cont ; read all var _num_ into cont ; cont=cont[,3:ncol(cont)] ; use lim_ ; read all var {mn mx} into lim ; read all var _all_ into g ; do i=1 to 200 ; ** –GEOG– ; if sum(cont[i,]>0)=0 then do ; read all var {mn mx} where(id=i) into lm_i ; new=lm_i ; end ; if sum(cont[i,]>0)>0 then do ; cnt=cont[i,loc(cont[i,])] ; read all var {mn mx} where(id=i) into lm_i ; new=((0.7*lm_i)+(0.3*(lim[cnt,][+,]/ncol(cnt)))) ; end ; lim_=lim_//new ; end ; lim=g||lim_ ; create lim from lim [colname={id v w mn_o mx_o mn mx}] ; append from lim ; end ;


quit ;

data lim ; set lim ; cou=put(id,num_cou.)/1 ; cc=put(id,num_cc.)/1 ;run ;

proc iml ;
start monitor(county,city) ;
  window progress color='GRAY' irow=20 rows=9 columns=45 ;
  display progress(/ "&type" //"County:" county " of 6" // "City:" city " of 200") noinput ;
finish monitor ;
start regress(X,Y,out) ; * least squares parameter estimation ;
  N=NROW(X); /* number of observations */
  K=NCOL(X); /* number of variables */
  YBAR=Y[+,]/(n-1) ; YBAR=repeat(ybar,n,1) ;
  XPX=X`*X; /* cross-products */
  XPY=X`*Y;
  XPXI=INV(XPX); /* inverse crossproducts */
  B=XPXI*XPY; /* parameter estimates */
  YHAT=X*B; /* predicted values */
  RESID=Y-YHAT; /* residuals */
  SSE=RESID`*RESID; /* sum of squared errors */
  DF=N-K; /* degrees of freedom error */
  if df<=0 then df=1 ;
  RMSE=SQRT(SSE/DF); /* root mean squared error */
  out=(B`)||RMSE ;
finish ;
** init ;
year_f=((0:6)*5)`+J(7,1,2000) ; ** –TIME– ;
use bp ; read all var {id year cum} where(year=2000) into bp00 ;

* Beta-Dirichlet Sampling ;use s.control ; read all var {c1 c2 c3 c4 c5 c6} where(id='CT'|id="&var") into cnty ; * county control totals ;cnty=cnty` ; free c_ ;do z=1 to 6 ; *** start county iteration, '1'=HH2000 ; ** –GEOG– ; free pred_e y00_ ; cnt=cnty[z,1] ; use lim ; read all var {id cc v w mn mx} where(cou=cnt) into lim ; free beta ; *** start beta sampling *** ; do i=1 to nrow(lim) ; v=J(1,&sim,lim[i,3]) ; w=J(1,&sim,lim[i,4]) ;

mn=lim[i,5] ;mx=lim[i,6] ;beta_=mn+((mx-mn)*rand('beta',v,w)) ;beta=beta//choose(beta_<0,0,beta_) ;

end ; *** end beta sampling *** ; free gamma ; *** start Dirichlet sampling *** ; do i=1 to nrow(lim) ; nz=loc(beta[i,]) ;

if ncol(nz)=0 then g=beta[i,] ;if ncol(nz)=&sim then g=rand('gamma',beta[i,]) ;if ncol(nz)<&sim & ncol(nz)>0 then do ; g=beta[i,] ;

g[,loc(beta[i,])]=rand('gamma',beta[i,loc(beta[i,])]) ;end;

gamma=gamma//g ; end ; *** end Dirichlet sampling *** ; dr=gamma#(1/gamma[+,]) ; c_=(dr*(cnty[z,8]-cnty[z,2])) ; do i=1 to nrow(lim) ; city=lim[i,1] ; run monitor(z,city) ; use bp ; read all var {year cum} where(id=city) into y ;

y00=y[nrow(y),2] ; y00_=y00_//y00 ; if sum(y[,2]=.)=nrow(y) then do ;

prd_e=J(nrow(year_f),&sim,0) ; end ; if sum(y[,2]=.)<nrow(y) then do ; ** start over non-missing ; t=loc(y[,2])` ; y=y[t,2] ; one=j(nrow(y),1,1) ; x1=one||t ; t_f=((0:6)*5)`+J(7,1,21) ; ** –TIME– ; one_f=J(nrow(t_f),1,1) ; x1_f=one_f||t_f ; free prd_e prd_w prd_c ci ; do j=1 to &sim ;

c=c_[i,j] ; ci_=c_[i,] ; c=c+y00 ;

c=repeat(c,nrow(y),1) ; ci=ci||c ; c_f=repeat(c[1,],nrow(t_f),1) ; if sum((c-y)<=0)=0 then do ;

y1=log(c-y) ; * exponential ; call regress(x1,y1,reg1) ;

prd_e=prd_e||(c_f-exp(x1_f*reg1[,1:2]`)) ; end ; if sum((c-y)<=0)>0 then do ; prd_e=prd_e||J(7,1,y[nrow(y),]) ; end ;

end ; * j ; end ; ** end non-missing ;

pred_e=pred_e//(J(nrow(prd_e),1,city)||year_f||prd_e) ; end ; * i ; median=pred_e[,1:2]||(median(pred_e`)`) ;

**** start adding up **** ; j=2 ; free id prd_es md_e prd_e ; y00_=choose(y00_=.,0,y00_) ; base=repeat(y00_,1,&sim) ; print z ; do y=2005 to 2030 by 5 ; * forces adding up over county subset ; ** –TIME– ; j=j+1 ; prd_e=pred_e[loc(pred_e[,2]=y),] ; id=id//prd_e[,1:2] ;

prd_e=prd_e[,3:ncol(pred_e)]-pred_e[loc(pred_e[,2]=(y-5)),3:ncol(pred_e)] ; *print prd_e[format=comma10.] ; base=((prd_e#(1/prd_e[+,]))*(cnty[z,j]-cnty[z,(j-1)]))+base ; *print base[format=comma10.] ; prd_es=prd_es//base ; end ; do k=1 to nrow(prd_es) ; * extract median, 10th, and 90th percentile ; p=prd_es[k,] ; * exponential model results ;

rnk_p=p ; p[,rank(p)]=rnk_p ;

md_e=md_e//p[,int(ncol(p+1)*.1)||int(ncol(p+1)/2)||int(ncol(p+1)*.9)] ; end ; md_dev=abs(prd_es-repeat(md_e[,2],1,ncol(prd_es)))[+,] ; * absolute deviations from median, summed by simulation ; min_e=prd_es[,loc(md_dev=md_dev[><])] ; * min deviation prediction from exponential ; stat=stat//(id||min_e||md_e) ; median_=median_//median ; **** end adding up **** ;end ; *** end county iteration ;stat=(bp00||repeat(bp00[,3],1,ncol(stat)-3))//stat ;lbl1={id year e_opt lw_e md_e up_e} ;create stat from stat [colname=lbl1] ; append from stat ;create median from median_ [colname={id year md_eu}] ; append from median_ ;quit ;

proc sort data=stat ; by id year ; run ;data bp ; set bp ; where year in (1980,1985,1990,1995,2000) ; ** –TIME– ; keep id year plc00 name cum ;run ;data stat ; merge bp median stat ; by id year ; &var&o=cum ; &var=e_opt ; origin=0 ; name=put(id,num_nam.) ; p00=put(id,num_p0_.) ;run ;proc sort data=stat ; by id year ; run ;data s.&out ; set stat ; run ;
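The Beta-Dirichlet step in the simulation program above can be illustrated in isolation. The following is a minimal sketch, not the report's program: it assumes three hypothetical cities with made-up shape parameters (v, w), growth bounds (mn, mx), and a made-up county increment, and it uses CALL RANDGEN rather than the RAND function used in the listing. Each city's growth is drawn from a Beta distribution rescaled to its bounds, a Gamma variate is drawn with that growth as its shape, and the Gamma draws are normalized into shares of the county increment.

```sas
proc iml ;
  call randseed(20030601) ;          /* hypothetical seed */
  v  = {2, 2, 5} ;                   /* hypothetical Beta shape parameters, one per city */
  w  = {5, 2, 2} ;
  mn = {10, 50, 100} ;               /* hypothetical lower growth bounds */
  mx = {100, 200, 400} ;             /* hypothetical upper growth bounds */
  county_growth = 500 ;              /* hypothetical county control increment */
  n = nrow(v) ;
  g = j(n, 1, 0) ;
  do i = 1 to n ;
    b = j(1, 1, .) ;
    call randgen(b, "Beta", v[i], w[i]) ;     /* Beta draw on (0,1) */
    growth = mn[i] + (mx[i] - mn[i]) * b ;    /* rescale to [mn, mx] */
    x = j(1, 1, .) ;
    call randgen(x, "Gamma", growth) ;        /* Gamma draw with shape = scaled growth */
    g[i] = x ;
  end ;
  share = g / sum(g) ;               /* normalized Gammas form one Dirichlet draw */
  alloc = share * county_growth ;    /* city increments that sum to the county increment */
  print share alloc ;
quit ;
```

Because the shares sum to one, the allocated increments add up to the county increment by construction, which is the property the listing relies on before its regression step.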

/**************************************************************
 ** graph HH sim.sas -
 ** Line graphs of output from HH simulations.
 **
 ** INPUT DATA: ##_stat (##=hh,sf,mf,mb)
 ** OUTPUT DATA: line graphs
 ***************************************************************/

goptions reset=(axis, legend, pattern, symbol, title, footnote) norotate hpos=0 vpos=0 htext= ftext= ctext= target= gaccess= gsfmode= ;goptions device=WIN ctext=blue graphrc interpol=join;symbol1 c=DEFAULT ci=BLACK v=NONE width=4 ;symbol2 c=DEFAULT ci=blue v=NONE width=3 ;symbol3 c=DEFAULT ci=red v=NONE width=2 ;symbol4 c=DEFAULT ci=brown v=NONE ;symbol5 c=DEFAULT ci=brown v=NONE ;symbol6 c=DEFAULT ci=green v=NONE width=2 ;symbol7 c=DEFAULT ci=blue v=NONE ;axis1 color=blue width=2.0 ;axis2 color=blue width=2.0 ;axis3 color=blue width=2.0 ;* plot (cum e_opt md_e lw_e up_e md_eu origin) ;title "&type" ;footnote "cum=blk, e_opt=blue, md_e=red, bounds=brn, md_eu=green" ;proc gplot data=s.&out ; by id ; plot (cum e_opt md_e lw_e up_e md_eu origin) * year /overlay haxis=axis1 vaxis=axis2 frame ;

format id num_nam. ;run; quit;proc printto log=work.curstat.last.log new;run;proc printto;run;

/**************************************************************
 ** SCAG IML function library.sas -
 ** Creates several versions of iterative proportional fitting and
 ** a basic regression function. The functions are written out
 ** to an IML library.
 **
 ** functions: ipf, adjust, ipf3_2F, monitor, and regress
 ***************************************************************/

libname imlcat "c:/scag" ;options notes ;proc iml ;start ipf(row,col,dist,pred) ; * ipf function ; row_d=dist[,+] ; col_d=dist[+,]` ; row_m=row ; col_m=(col/sum(col))*sum(row) ; iter=0 ; criteria=10 ; do while(criteria>.000001) ; iter=iter+1 ; dist=row_m#(dist#(1/dist[,+])) ; dist=col_m`#(dist#(1/dist[+,])) ; row_f=dist[,+] ; col_f=dist[+,]` ; r_fit=row_m-row_f ; c_fit=col_m-col_f ; criteria=sum(abs(r_fit))+sum(abs(c_fit)) ; end ; pred=dist ;finish ipf ;

** used with ipf3 ** ;start adjust(m_,nm_,mm_,id_,idm,dim) ; nm_=nm_` ; mm_=mm_` ; r=0 ; do k=1 to dim[,3] ; do j=1 to dim[,2] ; do i=1 to dim[,1] ; r=r+1 ; r_=loc(abs((idm-repeat(id_[r,],nrow(idm),1)))[,+]=0) ; m_[r]=m_[r]*(nm_[r_]/mm_[r_]) ; end ; end ; end ;finish adjust ;

start ipf3_2F(n12,n13,m,dim,cf,iter,stop) ; * Args (f13,f23,offset,dimension,margins,maxiterations,stopping rule) ; *** create id *** ; free id id_12 id_13 id_23 ; do k=1 to dim[,3] ; do j=1 to dim[,2] ; id_23=id_23//(j||k) ; do i=1 to dim[,1] ; id=id//(i||j||k) ;

if k=1 then id_12=id_12//(i||j) ;if j=1 then id_13=id_13//(i||k) ;

end ; end ; end ;

*** ipf iterations ; pass=0 ; n_all=n12||n13 ; do until(fit<stop | pass=iter) ; pass=pass+1 ; call marg(mid,m12,dim,m,cf[,1]) ;

run adjust(m,n12,m12,id[,{1 2}],id_12,dim) ; call marg(mid,m13,dim,m,cf[,2]) ;

run adjust(m,n13,m13,id[,{1 3}],id_13,dim) ; *call marg(mid,m23,dim,m,cf[,3]) ;

*run adjust(m,n23,m23,id[,{2 3}],id_23,dim) ; call marg(mid,m_all,dim,m,cf[,{1 2}]) ;

fit=sum(abs(n_all-m_all)) ; end ;

mrg=n_all`||m_all` ; * print pass fit ;finish ipf3_2F ;

start monitor(county,city,iter,crit,fit,title) ; window progress color='GRAY' irow=5 rows=30 columns=60 ; display progress(/ title //"County:" county " of 6" // "City:" city " of 200" // "Iter:" iter " Crit:" crit // "Fit:" fit) noinput ;finish monitor ;

start regress(X,Y,out) ; * least squares parameter estimation ; N=NROW(X); /* number of observations */ K=NCOL(X); /* number of variables */ YBAR=Y[+,]/(n-1) ; YBAR=repeat(ybar,n,1) ; XPX=X`*X; /* cross-products */ XPY=X`*Y; XPXI=INV(XPX); /* inverse crossproducts */ B=XPXI*XPY; /* parameter estimates */ YHAT=X*B; /* predicted values */ RESID=Y-YHAT; /* residuals */ SSE=RESID`*RESID; /* sum of squared errors */ DF=N-K; /* degrees of freedom error */ if df<=0 then df=1 ; RMSE=SQRT(SSE/DF); /* root mean squared error */ out=(B`)||RMSE ; finish ;

reset storage=imlcat.scag ;store module=_all_ ;show ;quit ;
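The two-way ipf module stored by the library above alternates row and column rescaling until both sets of margins are matched. The following is a minimal standalone sketch of that idea under stated assumptions: the seed table and the target margins are hypothetical toy values, and the loop is capped at 100 passes (a cap the library module does not have).

```sas
proc iml ;
  /* hypothetical toy inputs, not taken from the SCAG data */
  seed  = {1 2, 3 4} ;     /* starting 2 x 2 table */
  row_t = {40, 60} ;       /* target row totals (column vector) */
  col_t = {30 70} ;        /* target column totals (row vector) */
  fitted = seed ;
  iter = 0 ;
  crit = 1 ;
  do while(crit > 1e-8 & iter < 100) ;
    iter = iter + 1 ;
    fitted = fitted # (row_t / fitted[,+]) ;   /* rescale each row to its target */
    fitted = fitted # (col_t / fitted[+,]) ;   /* rescale each column to its target */
    crit = sum(abs(fitted[,+] - row_t)) + sum(abs(fitted[+,] - col_t)) ;
  end ;
  print iter crit ;
  print fitted ;
quit ;
```

The two target margins must share the same grand total (100 here) for the iteration to converge; the library's ipf module enforces this by rescaling the column targets to the sum of the row targets.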

/**************************************************************
 ** PPH constrained regression.sas - version 4
 **
 ** Fits regression curves to the adjusted PPH data and uses upper
 ** and lower limits for each city as specified by Simon Choi. The program
 ** also forces all trends to be non-negative and forces city forecasts to
 ** add up to county control totals. The last section checks non-negativity
 ** and adding up, then plots the resulting data.
 **
 ** INPUT DATA: baseline.dbf (census data), pph_adj_dof, control, ##_stat (hh,sf,mf,mb)
 ** OUTPUT DATA: SCAG_forecast (specified in the macro variable &out).
 **
 ** Calls functions from the library SCAG.
 ***************************************************************/

libname s "c:\scag\data" ;libname imlcat "c:\scag" ;%include "c:\scag\formats\scag place formats.sas" ;

%let type = PPH and RPOP ;%let out = scag_forcast_8_13_02 ;%let neg = 20 ;

PROC IMPORT OUT= base_cen DATAFILE= "C:\SCAG\data\baseline.dbf" DBMS=DBF REPLACE; GETDELETED=NO;RUN ;data base_cen ; set base_cen ; id=put(fip00,p0_num.)/1 ; cou=put(id,num_cou.)/1 ; cc=put(id,num_cc.)/1 ; cou_n=put(cou,county.)/1 ; gqi_=0 ; gqn_=0 ; if gn00^=. then gqn_=gn00/res00 ; if gi00^=. then gqi_=gi00/res00 ; hox_=hox00/hh00 ; pop00=res00+gn00+gi00 ; res_=res00/pop00 ; keep res00 pop00 gi00 gn00 hh00 sdo00 mdo00 mbo00 hox00 gqi_ gqn_ res_ hox_ id--cou_n cou ;run ; proc sort data=base_cen ; by cou cc id ; run ;data pph ; set s.pph_adj_dof ; cou=put(id,num_cou.)/1 ; cc=put(id,num_cc.)/1 ; keep year id pph_cen cc cou lim mn_s mx_s ;run ;proc sort ; by id year ; run ;proc freq data=pph ; tables cou*cc*id*lim / noprint out=lim ;run ;data lim ; set lim ; cou_n=put(cou,county.)/1 ; run ;data hh ; set s.hh_stat ; cou=put(id,num_cou.)/1 ; cou_n=put(cou,county.)/1 ; run ;

options nocenter notes linesize=256 ;proc iml ;reset storage=imlcat.scag ;load ;show storage ;

** init ;year_f=((0:6)*5)`+J(7,1,2000) ; ** –TIME– ;use pph ; read all var {pph_cen} where(year=2000) into pph00 ; read all var {id} where(year=2000) into id ;use base_cen ; read all var {cou_n pop00 res00 gi00 gn00} into cen ;base00=id||J(200,1,2000)||cen||pph00 ; ** –GEOG– ;

use s.control ; * county control totals ;

read all var {c1 c2 c3 c4 c5 c6} where(id='CT'|id="HH") into hh_control ; read all var {c1 c2 c3 c4 c5 c6} where(id='CT'|id="SF") into sf ; read all var {c1 c2 c3 c4 c5 c6} where(id='CT'|id="MF") into mf ; read all var {c1 c2 c3 c4 c5 c6} where(id='CT'|id="MB") into mb ; read all var {c1 c2 c3 c4 c5 c6} where(id='CT'|id="OT") into ot ; read all var {c1 c2 c3 c4 c5 c6} where(id='CT'|id="PP") into pop ; read all var {c1 c2 c3 c4 c5 c6} where(id='CT'|id="RS") into res ; read all var {c1 c2 c3 c4 c5 c6} where(id='CT'|id="GI") into gqi ; read all var {c1 c2 c3 c4 c5 c6} where(id='CT'|id="GN") into gqn ;sf=sf` ; mf=mf` ; mb=mb` ; ot=ot` ; pop=pop` ; res=res` ; gqi=gqi` ; gqn=gqn` ;hh_control=hh_control` ;id=hh_control[,1] ;tmp=shape(sf[,2:8],0,1)||shape(mf[,2:8],0,1)||shape(mb[,2:8],0,1)||shape(ot[,2:8],0,1) ;tmp=(tmp#(1/tmp[,+]))#shape(hh_control[,2:8],0,1) ;sf=id||shape(tmp[,1],0,7) ; ** –TIME– ;mf=id||shape(tmp[,2],0,7) ; ** –TIME– ;mb=id||shape(tmp[,3],0,7) ; ** –TIME– ;ot=id||shape(tmp[,4],0,7) ; ** –TIME– ;

id=pop[,1] ;tmp=shape(res[,2:8],0,1)||shape(gqi[,2:8],0,1)||shape(gqn[,2:8],0,1) ;tmp=(tmp#(1/tmp[,+]))#shape(pop[,2:8],0,1) ;res=id||shape(tmp[,1],0,7) ; ** –TIME– ;gqi=id||shape(tmp[,2],0,7) ; ** –TIME– ;gqn=id||shape(tmp[,3],0,7) ; ** –TIME– ;

/*** ADJUST HH BY TYPE ***/

**** check and correct negative trends ;use hh ; read all var {id year cou_n hh} where(year>1995) into hh ;use s.sf_stat ; read all var {md_e} where(year>1995) into sf_ ;use s.mf_stat ; read all var {md_e} where(year>1995) into mf_ ;use s.mb_stat ; read all var {md_e} where(year>1995) into mb_ ;use base_cen ; read all var {hox_} into ot_r ;ot_r=shape(repeat(ot_r,1,7),0,1) ; ot_=hh[,4]#ot_r ;type_=hh||sf_||mf_||mb_||ot_ ; type_=choose(type_=.,0,type_) ;

* rescale hh city control to equal county control for each year ;free hh_r type00_ ; do z=1 to 6 ; ** –GEOG– ; county=hh_control[z,2:8] ; hh_=shape(hh[loc(hh[,3]=z),4],0,7) ; ** –TIME– ; hh_=(hh_#(1/hh_[+,]))#county ; hh_r=hh_r//shape(hh_,0,1) ; type00=type_[loc(type_[,2]=2000 & type_[,3]=z),5:8] ; type00_=type00_//(type_[loc((type_[,2]=2000 & type_[,3]=z)),1:4]|| ((type00#(1/type00[+,]))#(sf[z,2]||mf[z,2]||mb[z,2]||ot[z,2]))); end ;type_[,4]=hh_r ;type_=type00_[loc(type00_[,2]=2000),]//type_[loc(type_[,2]>2000),] ;

cf={1 1 2,2 3 3} ;

do z=1 to 6 ; * start county iteration ; ** –GEOG– ; ** correction loop ; type=type_ ; hh_type=type_ ; pass=0 ; stop_=0 ; do until(adj=0 | fit_<0.005 | pass=&neg) ; use lim ; read all var {cou cc id lim} where(cou_n=z) into lim ; dim=6||nrow(lim)||4 ; * YEAR X CITY X TYPE ; ** –GEOG– ; pass=pass+1 ; if pass=1 then do ; adj=0 ; fit_last=uniform(J(7,4,0)) ; ** –TIME– ;

end ; if pass>10 then do ;

tmp=shape(type[,5:8][,+],0,7) ; ** –TIME– ; type[,4]=shape((tmp#(1/tmp[+,]))#hh_control[z,2:8],0,1) ;end ;

f12=type[loc((type[,2]>2000 & type[,3]=z)),4] ; * YEAR X CITY ;f13=(sf[z,3:8]||mf[z,3:8]||mb[z,3:8]||ot[z,3:8])` ; * YEAR X TYPE ;rc=nrow(type[loc((type[,2]>2000 & type[,3]=z)),5:8]`)||ncol(type[loc((type[,2]>2000 &

type[,3]=z)),5:8]`) ;dist=shape(type[loc((type[,2]>2000 & type[,3]=z)),5:8]`,0,1) ; * YEAR X CITY X TYPE ;

run ipf3_2F(f12`,f13`,dist,dim,cf,100,0.05) ;dist=shape(dist,rc[,1],rc[,2])`;type=type[loc((type[,2]=2000 & type[,3]=z)),]//(type[loc((type[,2]>2000 &

type[,3]=z)),1:4]||dist) ; free hh_type ;

fit=J(7,4,0) ; do i=1 to nrow(lim) ; city=lim[i,3] ; s=type[loc(type[,1]=city),] ;

do j=5 to 8 ; neg_1=loc(J(1,1,0)//((s[2:7,j]-s[1:6,j])<0)) ; neg_2=loc(J(2,1,0)//((s[3:7,j]-s[1:5,j])<0)) ; neg_3=loc(J(3,1,0)//((s[4:7,j]-s[1:4,j])<0)) ;

neg_4=loc(J(4,1,0)//((s[5:7,j]-s[1:3,j])<0)) ; if nrow(neg_1)>0 then do ;

adj=adj+1 ; if ncol(neg_2)=0 then neg_2=neg_1 ;

if ncol(neg_3)=0 then neg_3=neg_1 ; if ncol(neg_4)=0 then neg_4=neg_1 ; neg_=union(neg_1,neg_2,neg_3) ; b=min(neg_)-1 ; e=max(neg_)+1 ; r=ncol(neg_) ; if max(neg_)=7 then s[neg_,j]=J(r,1,s[b,j]) ; if max(neg_)<7 then s[neg_,j]=J(r,1,s[b,j])+(((1:r)`/(r+1))*(s[e,j]-s[b,j])) ;

end ; end ;

fit=fit+(s[,5:8]) ; hh_type=hh_type//s ;

end ; type=hh_type ; fit=fit-((sf[z,]`||mf[z,]`||mb[z,]`||ot[z,]`)[2:8,]) ;

fit_=max(fuzz(abs(fit[2:7,]))) ; pass_=pass_//(z||adj) ;

run monitor(z,0,pass,adj,fit_,{"Correct neg hh trend"}) ; end ; print z pass fit_ fit [format=8.3] ; ** end loop ; hh_stat=hh_stat//hh_type ; end ; *** end county iteration ; lbl2={id year cou_n hh sf mf mb ot} ;create hh_stat from hh_stat [colname=lbl2] ; append from hh_stat ;create pass from pass_ [colname={z adj}] ; append from pass_ ;

/*** END ADJUST HH ***/

/*** START PPH & RES ***/

free c_ pop_type ;do z=1 to 6 ; *** start county iteration ; ** –GEOG– ; free hh_ pred_e hh_type ; use lim ; read all var {cou cc id lim} where(cou_n=z) into lim ; use base_cen ; read all var {res00} where(cou_n=z) into res00 ; read all var {pop00} where(cou_n=z) into pop00 ;

read all var {gqi_ gqn_} where(cou_n=z) into gq ; read all var {res_} where(cou_n=z) into res_ ; read all var {id hh00 pop00 res00 gi00 gn00} where(cou_n=z) into

pop_00 ;

pop_00=pop_00[,1]||J(nrow(lim),1,2000)||pop_00[,2:6] ; pass=0 ; error=1 ; dim=6||nrow(lim)||3 ; * YEAR X CITY X TYPE ; ** –TIME– ; do until(pass=3 | error=0) ; ** start until loop A ; free pred_e ; pass=pass+1 ; adj=0 ; do i=1 to nrow(lim) ; * start over i ; city=lim[i,3] ; run monitor(z,city,0,0,0,{"PPH regressions"}) ; use hh_stat ; read all var {hh} where(id=city & year>1995) into hh ; use pph ; read all var {year pph_cen} where(id=city) into y ; read all var {mn_s mx_s} where(id=city & year=2000) into lim_s ;

if pass=1 then pph_lim=lim_s[,+]/2 ; if pass>1 then pph_lim=pph[loc(pph[,1]=city & pph[,2]=2030),3] ;

y00=y[nrow(y),2] ; y00_=y00_//y00 ;

** quadratic ; if sum(y[,2]=.)<nrow(y) then do ; ** start over non-missing ; t=(loc(y[,2])`) ;

t2=t#t ; y=y[t,2]//pph_lim ; one=j(nrow(y),1,1) ; x1=one||((t||t2)//(51||(51*51))) ; t_f=((0:6)*5)`+J(7,1,21) ; ** –TIME– ; one_f=J(nrow(t_f),1,1) ; x1_f=one_f||t_f||(t_f#t_f) ; call regress(x1,y,reg1) ; prd_e=x1_f*reg1[,1:3]` ;

end ; ** end non-missing ; pred_e=pred_e//(J(nrow(prd_e),1,city)||year_f||prd_e||hh||J(7,1,gq[i,1])||J(7,1,gq[i,2]) ||J(7,1,res_[i,])||J(7,1,lim_s[,1])||J(7,1,lim_s[,2])||J(7,1,pph_lim)) ; end ; * end loop over i ; **** start adding up **** ; id=pred_e[,1:4] ; idy=shape(id[,1],0,7)[,1] ; p_=pred_e[,3:8] ; type=(p_[,1:2][,#])||(p_[,1:3][,#])||(p_[,{1 2 4}][,#]) ; * res, gqi, gqn ; p_=shape(type[,+],0,7) ; p_=id||shape((p_#(1/p_[+,]))#pop[z,2:8],0,1)||pred_e[,5:7] ; * id, year, pph, hh, pop,res_ ; p_[loc(p_[,2]=2000),5]=pop_00[,4] ; free p_type p_adj ;; do i=1 to nrow(lim) ; * start over i ; city=lim[i,3] ; s=p_[loc(p_[,1]=city),] ;

neg_1=loc(J(1,1,0)//((s[2:7,5]-s[1:6,5])<0)) ; neg_2=loc(J(2,1,0)//((s[3:7,5]-s[1:5,5])<0)) ; neg_3=loc(J(3,1,0)//((s[4:7,5]-s[1:4,5])<0)) ;

neg_4=loc(J(4,1,0)//((s[5:7,5]-s[1:3,5])<0)) ; neg_5=loc(J(5,1,0)//((s[6:7,5]-s[1:2,5])<0)) ; neg_6=loc(J(6,1,0)//((s[7:7,5]-s[1:1,5])<0)) ; padj=0 ;

if nrow(neg_1)>0 then do ; adj=adj+1 ; padj=1 ;

if ncol(neg_2)=0 then neg_2=neg_1 ; if ncol(neg_3)=0 then neg_3=neg_1 ; if ncol(neg_4)=0 then neg_4=neg_1 ; if ncol(neg_5)=0 then neg_5=neg_1 ; if ncol(neg_6)=0 then neg_6=neg_1 ;

neg_=min(neg_1,neg_2,neg_3,neg_4,neg_5,neg_6):max(neg_1,neg_2,neg_3,neg_4,neg_5,neg_6) ; b=min(neg_)-1 ; e=max(neg_)+1 ; r=ncol(neg_) ; if max(neg_)=7 then s[neg_,5]=J(r,1,s[b,5]) ; if max(neg_)<7 then s[neg_,5]=J(r,1,s[b,5])+(((1:r)`/(r+1))*(s[e,5]-s[b,5])) ;

if s[7,5]<s[1,5] then s[,5]=J(7,1,s[1,5]) ; end ; p_type=p_type//s ; p_adj=p_adj//padj ;

end ; * end over i ; p_adj=repeat(p_adj,1,7) ; p_=shape(p_type[,5],0,7) ; p_0=p_#abs(p_adj-J(nrow(p_adj),7,1)) ; p_0d=J(nrow(p_0),1,0)||(p_0[,2:7]-p_0[,1:6]) ; p_0d=p_0d#(1/p_0d[+,]) ; p_1=p_#p_adj ; p_0d=choose(p_0d=.,0,p_0d)#(pop[z,2:8]-p_1[+,]-p_0[+,]) ; p_=p_1+p_0+p_0d ; *print idy p_ [format=comma8.] p_0d [format=comma8.] ; *print idy p_1 [format=comma8.] p_0 [format=comma8.] ; p_type[,5]=shape(p_,0,1) ; pph=id[,1:2]||(p_type[,8]#p_type[,5])#(1/p_type[,4]) ; pph=pph[loc(pph[,2]=2030),] ; lim_=pred_e[loc(pred_e[,2]=2030),8:9] ; lw=(round(pph[,3],.1)<round(lim_[,1],.1)) ; hg=(round(pph[,3],.1)>round(lim_[,2],.1)) ; error=(lw||hg)[+,] ; pph_old=pph ; pph[,3]=choose(pph[,3]<lim_[,1],lim_[,1],pph[,3]) ; * clamp to lower limit ; pph[,3]=choose(pph[,3]>lim_[,2],lim_[,2],pph[,3]) ; * clamp to upper limit ; dif=round(pph[,3],.1)-round(p_type[loc(p_type[,2]=2030),3],.1) ; print adj pass error ; end ; * end until loop A ; rs=p_type[,5]#p_type[,8] ; gq=(rs||rs)#p_type[,6:7] ; dist=rs||gq ; f12=p_type[loc((p_type[,2]>2000)),5] ; * YEAR X CITY ; f13=(res[z,3:8]||gqi[z,3:8]||gqn[z,3:8])` ; * YEAR X TYPE ; rc=nrow(dist[loc(p_type[,2]>2000),]`)||ncol(dist[loc(p_type[,2]>2000),]`) ; dist=shape(dist[loc(p_type[,2]>2000),]`,0,1) ; * YEAR X CITY X TYPE ; run ipf3_2F(f12`,f13`,dist,dim,cf,100,0.05) ; dist=shape(dist,rc[,1],rc[,2])`; p_type=pop_00//(p_type[loc(p_type[,2]>2000),{1 2 4 5}]||dist) ;

*print p_type ; *** force res, gqi, gqn positive ; pass=0 ; t_type=p_type ; do until (pass=1500 | fit_<.005) ; * start until loop B ; fit=J(7,3,0) ; adj=0 ; pass=pass+1 ; ** –TIME– ; dist=shape(t_type[loc(t_type[,2]>2000),5:7]`,0,1) ; * YEAR X CITY X TYPE ; ** –TIME– ; run ipf3_2F(f12`,f13`,dist,dim,cf,100,0.05) ; dist=shape(dist,rc[,1],rc[,2])`; t_=(pop_00//(t_type[loc(t_type[,2]>2000),1:4]||dist)) ; if pass=1 then fit_last=uniform(J(7,3,0)) ; free t_type ; do i=1 to nrow(lim) ; * start over i ; city=lim[i,3] ; t=t_[loc(t_[,1]=city),] ;

do j=5 to 7 ; * start over j ; neg_1=loc(J(1,1,0)//((t[2:7,j]-t[1:6,j])<0)) ;

neg_2=loc(J(2,1,0)//((t[3:7,j]-t[1:5,j])<0)) ; neg_3=loc(J(3,1,0)//((t[4:7,j]-t[1:4,j])<0)) ;

neg_4=loc(J(4,1,0)//((t[5:7,j]-t[1:3,j])<0)) ; neg_5=loc(J(5,1,0)//((t[6:7,j]-t[1:2,j])<0)) ; neg_6=loc(J(6,1,0)//((t[7:7,j]-t[1:1,j])<0)) ;

if nrow(neg_1)>0 then do ; adj=adj+1 ;

if ncol(neg_2)=0 then neg_2=neg_1 ; if ncol(neg_3)=0 then neg_3=neg_1 ; if ncol(neg_4)=0 then neg_4=neg_1 ; if ncol(neg_5)=0 then neg_5=neg_1 ; if ncol(neg_6)=0 then neg_6=neg_1 ;

neg_=min(neg_1,neg_2,neg_3,neg_4,neg_5,neg_6):max(neg_1,neg_2,neg_3,neg_4,neg_5,neg_6) ;

b=min(neg_)-1 ; e=max(neg_)+1 ; r=ncol(neg_) ; if max(neg_)=7 then t[neg_,j]=J(r,1,t[b,j]) ; if max(neg_)<7 then t[neg_,j]=J(r,1,t[b,j])+(((1:r)`/(r+1))*(t[e,j]-t[b,j])) ; if t[7,j]<t[1,j] then t[,j]=J(7,1,t[1,j]) ; end ;

end ; * end over j ; fit=fit+(t[,5:7]) ;

t_type=t_type//t ; end ; * end over i ; fit=fit-((res[z,]`||gqi[z,]`||gqn[z,]`)[2:8,]) ;

fit_=max(fuzz(abs(fit[2:7,]))) ; run monitor(z,0,pass,adj,fit_,{"Correct neg res, gq trends"}) ;

if int(pass/100)=(pass/100) then do ; print pass adj fit_ ;

end ; end ; * end until loop B ; *** end force positive ; pop_type=pop_type//t_type ; *pot_type=pot_type//(pot_00//(t_type[loc(t_type[,2]>2000),{1 2 4 5}]||dist)) ;end ; * end county iteration ;print pop_type [colname={id year hh pop res gqi gqn}] ;pph=pop_type[,1:2]||pop_type[,5]#(1/pop_type[,3]) ;create pop_type from pop_type [colname={id year hh pop res gqi gqn}] ; append from pop_type ;quit ;

data tmp ; set s.pph_adj_dof ; where year=2000 ; year=2030 ; keep id year mn_s mx_s ;run ;proc sort data=tmp ; by id year ; run ;proc sort data=hh_stat ; by id year ; run ;data pop_stat ; set pop_type ; drop hh ; run ;proc sort data=pop_type ; by id year ; run ;data stat ; merge pop_type hh_stat tmp ; by id year ; pph=res/hh ; p00=put(id,num_p0_.)/1 ; name=put(id,num_nam.) ; low=1.2 ; high=5.5 ; origin=0 ; cou=put(id,num_cou.)/1 ; cc=put(id,num_cc.)/1 ; cou_n=put(cou,county.)/1 ; bound=0 ; dif=0 ; if year=2030 and (round(pph,.1)<round(mn_s,.1) or round(pph,.1)>round(mx_s,.1)) then do ; bound=1 ;

if pph<mn_s then dif=pph-mn_s ; if pph>mx_s then dif=pph-mx_s ;

put id 3. year 5. pph 7.4 mn_s 7.4 mx_s 7.4 dif 8.4 ; end ;run ;proc sort data=stat ; by id year ; run ;proc means data=stat sum ; var bound ; run ;data s.&out ; set stat ; run ;

** Check for control totals ** ;

proc freq data=s.&out ; tables year*cou_n / out=pop noprint ; weight pop ;proc freq data=s.&out ; tables year*cou_n / out=res noprint ; weight res ;proc freq data=s.&out ; tables year*cou_n / out=gqi noprint ; weight gqi ;proc freq data=s.&out ; tables year*cou_n / out=gqn noprint ; weight gqn ;proc freq data=s.&out ; tables year*cou_n / out=hh noprint ; weight hh ;proc freq data=s.&out ; tables year*cou_n / out=sf noprint ; weight sf ;proc freq data=s.&out ; tables year*cou_n / out=mf noprint ; weight mf ;

proc freq data=s.&out ; tables year*cou_n / out=mb noprint ; weight mb ;proc freq data=s.&out ; tables year*cou_n / out=ot noprint ; weight ot ; run ;proc iml ; use pop ; read all var {count} into t ; pop=shape(t,7,6) ; ** –TIME– ; use res ; read all var {count} into t ; res=shape(t,7,6) ; ** –TIME– ; use gqi ; read all var {count} into t ; gqi=shape(t,7,6) ; ** –TIME– ; use gqn ; read all var {count} into t ; gqn=shape(t,7,6) ; ** –TIME– ; use hh ; read all var {count} into t ; hh=shape(t,7,6) ; ** –TIME– ; use sf ; read all var {count} into t ; sf=shape(t,7,6) ; ** –TIME– ; use mf ; read all var {count} into t ; mf=shape(t,7,6) ; ** –TIME– ; use mb ; read all var {count} into t ; mb=shape(t,7,6) ; ** –TIME– ; use ot ; read all var {count} into t ; ot=shape(t,7,6) ; fit_val=hh//sf//mf//mb//ot//pop//res//gqi//gqn ; use s.control ; read all var {c1 c2 c3 c4 c5 c6} into ctrl ; read all var {id} into ctr_id ; ctrl=ctrl[2:nrow(ctrl),] ; fit=fuzz(fit_val-ctrl) ; print ctr_id fit [format=8.3] ;quit ;

* hh by type ;goptions reset=(axis, legend, pattern, symbol, title, footnote) norotate hpos=0 vpos=0 htext= ftext= ctext= target= gaccess= gsfmode= ;goptions device=WIN ctext=blue graphrc interpol=join;symbol1 c=DEFAULT ci=black v=NONE width=3 ;symbol2 c=DEFAULT ci=blue v=NONE width=2 ;symbol3 c=DEFAULT ci=red v=NONE width=2 ;symbol4 c=DEFAULT ci=green v=NONE width=2 ;symbol5 c=DEFAULT ci=orange v=NONE width=2 ;symbol6 c=DEFAULT ci=blue v=NONE width=1 ;axis1 color=blue width=2.0 ;axis2 color=blue width=2.0 ;axis3 color=blue width=2.0 ;footnote 'HH=black, SF=blue, MF=red, MB=green, OT=brown' ;proc gplot data=s.&out ; by id ; plot (hh sf mf mb ot origin) * year / overlay haxis=axis1 vaxis=axis2

frame ;

format id num_nam. ;run; quit;proc printto log=work.curstat.last.log new;run;proc printto;run;

* res,pop, etc. graphs ;goptions reset=(axis, legend, pattern, symbol, title, footnote) norotate hpos=0 vpos=0 htext= ftext= ctext= target= gaccess= gsfmode= ;goptions device=WIN ctext=blue graphrc interpol=join;symbol1 c=DEFAULT ci=black v=NONE width=2 ;symbol2 c=DEFAULT ci=blue v=NONE width=1 ;symbol3 c=DEFAULT ci=red v=NONE width=1 ;symbol4 c=DEFAULT ci=green v=NONE width=1 ;symbol5 c=DEFAULT ci=blue v=NONE width=1 ;axis1 color=blue width=2.0 ;axis2 color=blue width=2.0 ;axis3 color=blue width=2.0 ;footnote 'POP=black, RES=blue, GQI=red GQN=green' ;proc gplot data=s.&out ; by id ; plot (pop res gqi gqn origin) * year / overlay haxis=axis1 vaxis=axis2

frame ;

format id num_nam. ;run; quit;proc printto log=work.curstat.last.log new;

run;proc printto;run;

* pph graphs ;

goptions reset=(axis, legend, pattern, symbol, title, footnote) norotate hpos=0 vpos=0 htext= ftext= ctext= target= gaccess= gsfmode= ;goptions device=WIN ctext=blue graphrc interpol=join;symbol1 c=DEFAULT ci=black v=NONE width=2 ;symbol2 c=DEFAULT ci=yellow line=2 v=NONE width=1 ;symbol3 c=DEFAULT ci=yellow line=2 v=NONE width=1 ;axis1 color=blue width=2.0 ;axis2 color=blue width=2.0 ;axis3 color=blue width=2.0 ;footnote 'PPH=black' ;proc gplot data=s.&out ; by id ; plot (pph low high) * year / overlay haxis=axis1 vaxis=axis2 frame ; format id num_nam. ;run; quit;proc printto log=work.curstat.last.log new;run;proc printto;run;
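The negative-trend correction used repeatedly in the program above replaces a declining stretch of a city series with values interpolated between the last period before the decline and the first period after it, or holds the series flat when the decline runs to the final period. The following is a simplified sketch of that rule for a single hypothetical seven-period series; it checks only one-period differences, whereas the program also checks longer lags.

```sas
proc iml ;
  s = {100, 110, 105, 103, 120, 130, 140} ;   /* hypothetical 7-period series */
  d = s[2:7,1] - s[1:6,1] ;                   /* one-period changes */
  neg = loc(d < 0) ;                          /* positions of negative changes */
  if ncol(neg) > 0 then do ;
    neg = neg + 1 ;                           /* index the later period of each drop */
    idx = min(neg):max(neg) ;                 /* contiguous stretch to replace */
    b = min(idx) - 1 ;                        /* last period before the stretch */
    e = max(idx) + 1 ;                        /* first period after the stretch */
    r = ncol(idx) ;
    if max(idx) = 7 then s[idx,1] = j(r, 1, s[b,1]) ;                        /* hold flat to the end */
    else s[idx,1] = s[b,1] + (t(1:r) / (r + 1)) * (s[e,1] - s[b,1]) ;        /* linear interpolation */
  end ;
  print s ;
quit ;
```

In this toy series the dip at periods 3 and 4 is replaced by two values evenly spaced between 110 (period 2) and 120 (period 5), so the corrected series is non-decreasing. In the program, the corrected values are then passed back through the adding-up step, which is why the correction sits inside an iteration.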

/**************************************************************
 ** RSA projection.sas -
 ** Uses baseline data for RSA and creates a simple rescaled forecast
 ** for SCAG subregions based on the city forecasts.
 **
 ** INPUT DATA: rsa_base_7_29, scag_forecast (from PPH constrained regression.sas)
 ** OUTPUT DATA: rsa_forecast
 ***************************************************************/

libname s "c:\scag\data" ;libname imlcat "c:\scag" ;options nocenter linesize=256 ;%let in = scag_forcast_8_13_02 ; %let out = rsa_forcast_8_28_02 ;

%let code= 99025, 99006, 99008, 99010, 99011, 99012, 99013, 99014, 99017, 99059, 99003, 99099, 99071, 99015

;

data stat ; set s.&in ; where int(p00/1000)=99 ; data t ; set stat ; where year=2030 ; put p00 6.", " ;run ;

PROC IMPORT OUT= rsa DATAFILE= "C:\SCAG\data\rsa_base_7_29.dbf" DBMS=DBF REPLACE; GETDELETED=NO;RUN ; proc iml ;reset storage=imlcat.scag ;load ;show storage ;

cf={1 1 2,2 3 3} ;

free rsa_pop rsa_hh ;code={&code} ;do w=1 to nrow(code) ; c=code[w,] ; use rsa ; read all var _all_ where(subreg=c) into rsa_ ; id=rsa_[,1:2] ; rsa_=rsa_[,3:ncol(rsa_)] ; rsa_p=(rsa_#(1/rsa_[+,])) ; do y=2000 to 2030 by 5 ; use stat ; read all var {pop res gqn gqi hh sf mf mb ot} where(p00=c & year=y) into stat ; rsa=rsa//(id||J(nrow(id),1,y)||(rsa_p#stat)) ; end ;end ;create rsa_tu from rsa [colname={subreg rsa year pop res gqn gqi hh sf mf mb ot}] ; append from rsa ;close rsa_tu ;sort rsa_tu out=rsa_ts by subreg rsa year ;use rsa_ts ; read all var _all_ into rsa ; print rsa ;

do w=1 to nrow(code) ; c=code[w,] ;

*IPF for population ; f12_p=rsa[loc((rsa[,3]>2000 & rsa[,1]=c)),4] ; * YEAR X RSA - pop ; sr=nrow(f12_p)/6 ; dim=6||sr||3 ; * YEAR X RSA X TYPE ; ** –TIME– ; dist=rsa[loc((rsa[,3]>2000 & rsa[,1]=c)),5:7] ; f13_p=(shape(dist[,1],sr,6)[+,]||shape(dist[,2],sr,6)[+,]||shape(dist[,3],sr,6)[+,])` ; *YEAR X TYPE - pop ; ** –TIME– ; rc_p=nrow(dist`)||ncol(dist`) ; dist_p=shape(dist`,0,1) ; * YEAR X RSA X TYPE - pop ; print f12_p f13_p dist_p dim cf ; run ipf3_2F(f12_p`,f13_p`,dist_p,dim,cf,100,0.05) ; dist_p=shape(dist_p,rc_p[,1],rc_p[,2])`; rsa_pop=rsa_pop//(rsa[loc((rsa[,3]=2000 & rsa[,1]=c)),1:7]//(rsa[loc((rsa[,3]>2000 &rsa[,1]=c)),1:3]||f12_p||dist_p)) ; print rsa_pop ;

*IPF for HH ; f12_h=rsa[loc((rsa[,3]>2000 & rsa[,1]=c)),8] ; * YEAR X RSA - hh ; dim=6||sr||4 ; * YEAR X RSA X TYPE ; dist=rsa[loc((rsa[,3]>2000 & rsa[,1]=c)),9:12] ; f13_h=(shape(dist[,1],sr,6)[+,]||shape(dist[,2],sr,6)[+,]||shape(dist[,3],sr,6)[+,]||shape(dist[,4],sr,6)[+,])` ; * YEAR X TYPE - hh ; ** –TIME– ; rc_h=nrow(dist`)||ncol(dist`) ; dist_h=shape(dist`,0,1) ; * YEAR X RSA X TYPE - hh ; *print f12_h f13_h dist_h dim cf ; run ipf3_2F(f12_h`,f13_h`,dist_h,dim,cf,100,0.05) ; dist_h=shape(dist_h,rc_h[,1],rc_h[,2])`; rsa_hh=rsa_hh//(rsa[loc((rsa[,3]=2000 & rsa[,1]=c)),{1 2 3 8 9 10 11 12}]//(rsa[loc((rsa[,3]>2000 & rsa[,1]=c)),1:3]||f12_h||dist_h)) ; print rsa_hh ; end ;check=sum(rsa_pop[,1:3]-rsa_hh[,1:3]) ;print check ;rsa=rsa_pop||rsa_hh[,4:8] ;pop_=rsa[,5:7][,+] ;hh_=rsa[,9:12][,+] ;pop=rsa[,4] ;hh=rsa[,8] ;chk2=((pop-pop_)||(hh-hh_))[+,]; print chk2 ;* use rsa ; * read all var _all_ into rsa_ ;* rsa=(rsa_[,1:2]||J(nrow(rsa_),1,2000)||rsa_[,3:ncol(rsa_)])//rsa ; create rsa_ from rsa [colname={subreg rsa year pop res gqn gqi hh sdo mdo mbo hox}] ; append from rsa ;quit ;

proc sort data=rsa_ ; by subreg rsa year ; run ;data rsa_ ; set rsa_ ; pop_=res+gqn+gqi ; hh_=sdo+mdo+mbo+hox ; dif_p=pop-pop_ ; dif_h=hh-hh_ ;run ;proc print data=rsa_ ; var pop pop_ dif_p hh hh_ dif_h ;run ;

data s.&out ; set rsa_ ; run ;
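The first stage of the RSA program above is a proportional rescaling: each RSA keeps its base-year share of the subregion, and that share is applied to the subregion forecast for every variable. The following is a minimal sketch of that step under stated assumptions: the base counts and the forecast row are hypothetical toy values for three RSAs and two variables.

```sas
proc iml ;
  /* hypothetical base-year counts: rows are RSAs, columns are pop and hh */
  base = {200 80, 300 120, 500 200} ;
  share = base # (1 / base[+,]) ;     /* each RSA's share of the subregion total, by column */
  fc = {1200 480} ;                   /* hypothetical subregion forecast for one year */
  rsa_fc = share # fc ;               /* rescaled RSA forecast */
  total = rsa_fc[+,] ;                /* column sums, equal to fc by construction */
  print rsa_fc ;
  print total ;
quit ;
```

Because the shares in each column sum to one, the RSA values reproduce the subregion forecast exactly. The program then applies ipf3_2F so that the component types (res, gqn, gqi and sf, mf, mb, ot) also add up within each RSA and year.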

/**************************************************************
 ** Create adjusted HH data with neg trends.sas -
 **
 ** OUTPUT DATA: hh_gr.csv file to matlab
 ***************************************************************/

**** start adjustments **** ;

libname s "c:\scag\data" ;

%include "c:\scag\formats\scag place formats.sas" ;

%let out =s.hh_adj_neg ; * sf mf mb hh ;
%let outstr =id year hh c_hh df occ path subpth ;
%let var = hh ;
%let var9_=hu ; * sf mf mb hu ;
%let var8_=hu80 ; * sf80 mf80 mb80 hu80 ;
%let string=hh80 hh90 hh00 hc8090 hc9000 ;

options nonotes ;
data base_cen ; set base_cen ; if hh80=. then hh80=0 ; run ;

proc iml ;
r=3:12 ; r9=1||r ;
r=1:10 ; r8=r||12 ;
yr=do(1980,2000,1)` ;
free out gr ;
do i=1 to 200 ; * 1-200 ;
use dof ;
read all var {&var9_} where(id=i & year>=1990) into hh90d ;
read all var {&var8_} where(id=i & year<=1990) into hh80d ;
read all var {id year} where(id=i) into year ;

if nrow(hh90d)=12 then hh90d=hh90d[r9,] ;
if nrow(hh80d)=12 then hh80d=hh80d[r8,] ;

use base_cen ;
read all var {hh80} where(id=i) into hh80c ;
read all var {hh90} where(id=i) into hh90c ;
read all var {hh00} where(id=i) into hh00c ;

if i=56 then hh80c=23000 ; * Error in value for 1980 Carson City ;
if i=78 then hh80c=2900 ; * Error in value for 1980 Commerce City ;

if (hh80c>0 & hh90c>0) then do ; * start observations with complete data for 80, 90, 00 censuses ;

*** 1980-1990 adjustment ; slp_c=(hh90c-hh80c)/10 ;

slp_d=(hh80d[11,]-hh80d[1,])/10 ; hh80=hh80c ; do j=1 to 10 ; inc=(hh80d[j+1,]-(hh80d[1,]+slp_d*j))/(hh80d[1,]+slp_d*j) ;

hh80=hh80//int( (hh80c+slp_c*j) + ((hh80c+slp_c*j)*inc) ) ; end ;

*** 1990-2000 adjustment ; slp_c=(hh00c-hh90c)/10 ;

slp_d=(hh90d[11,]-hh90d[1,])/10 ; hh90=hh90c ; do j=1 to 10 ; inc=(hh90d[j+1,]-(hh90d[1,]+slp_d*j))/(hh90d[1,]+slp_d*j) ;

hh90=hh90//int( (hh90c+slp_c*j) + ((hh90c+slp_c*j)*inc) ) ; end ;

out_=J(nrow(yr),1,i)||yr||(hh80d//J(nrow(yr)-nrow(hh80d),1,.))||(J(nrow(yr)-nrow(hh90d),1,.)//hh90d)||(hh80//hh90[2:11,]) ;

end ; * end observations with complete data for 80, 90, 00 censuses ;

if (hh80c=0 & sum(hh80d>0)>0 & hh90c>0) then do ; * start observations missing the 1980 census but with 1980s estimates ;

*** 1980-1990 adjustment ; print i "Missing 1980 Census" ; print hh80c hh90c hh80d hh90d ;

hh80d=hh80d[loc(hh80d>0),] ; esthh=hh90c*(hh80d[1,]/hh80d[nrow(hh80d),]) ;

slp_c=(hh90c-esthh)/(nrow(hh80d)-1) ; slp_d=(hh80d[nrow(hh80d),]-hh80d[1,])/(nrow(hh80d)-1) ;

hh80=esthh ; do j=1 to (nrow(hh80d)-1) ; inc=(hh80d[j+1,]-(hh80d[1,]+slp_d*j))/(hh80d[1,]+slp_d*j) ;

hh80=hh80//int( (esthh+slp_c*j) + ((esthh+slp_c*j)*inc) ) ; end ;

if nrow(hh80)< 11 then hh80=J(11-nrow(hh80),1,.)//hh80 ; if nrow(hh80d)<11 then hh80d=J(11-nrow(hh80d),1,.)//hh80d ; *** 1990-2000 adjustment ;

slp_c=(hh00c-hh90c)/10 ; slp_d=(hh90d[11,]-hh90d[1,])/10 ;

hh90=hh90c ; do j=1 to 10 ; inc=(hh90d[j+1,]-(hh90d[1,]+slp_d*j))/(hh90d[1,]+slp_d*j) ;

hh90=hh90//int( (hh90c+slp_c*j) + ((hh90c+slp_c*j)*inc) ) ; end ;

out_=J(nrow(yr),1,i)||yr||(hh80d//J(nrow(yr)-nrow(hh80d),1,.))||(J(nrow(yr)-nrow(hh90d),1,.)//hh90d)||(hh80//hh90[2:11,]) ;

end ; * end observations missing the 1980 census but with 1980s estimates ;

if (hh80c=0 & sum(hh80d>0)=0 & hh90c>=0) then do ; * start observations with no 1980 census value and no 1980s estimates ;

*** 1990-2000 adjustment - type 1 ; if hh90c>0 then do ; if sum(hh90d>0)=11 then do ;

print i "Missing 1990 Census - Type 1a" ; print hh90c hh00c hh80d hh90d ;

slp_c=(hh00c-hh90c)/10 ; slp_d=(hh90d[11,]-hh90d[1,])/10 ;

hh90=hh90c ; do j=1 to 10 ; inc=(hh90d[j+1,]-(hh90d[1,]+slp_d*j))/(hh90d[1,]+slp_d*j) ;

hh90=hh90//int( (hh90c+slp_c*j) + ((hh90c+slp_c*j)*inc) ) ; end ;

end ;

if sum(hh90d>0)<11 then do ; print i "Missing 1990 Census - Type 1b" ;

print hh90c hh00c hh80d hh90d ;
mx=max(loc(hh90d[,1]=.)) ;
imp=hh90d[11,]/hh00c ;
hh90d[1,]=int(hh90c*imp) ;
slp=(hh90d[mx+1,]-hh90d[1,])/(mx) ;
do j=1 to mx-1 ;
hh90d[j+1,]=int(hh90d[1,]+slp*j) ;
end ;

slp_c=(hh00c-hh90c)/10 ; slp_d=(hh90d[11,]-hh90d[1,])/10 ;

hh90=hh90c ; do j=1 to 10 ; inc=(hh90d[j+1,]-(hh90d[1,]+slp_d*j))/(hh90d[1,]+slp_d*j) ;

hh90=hh90//int( (hh90c+slp_c*j) + ((hh90c+slp_c*j)*inc) ) ; end ;

end ; end ;

*** 1990-2000 adjustment - type 2 ; if hh90c=0 then do ;

print i "Missing 1990 Census - Type 2 *************************** BING " ; print hh90c hh00c hh80d hh90d ; hh90d=hh90d[loc(hh90d>0),] ; esthh=hh00c*(hh90d[1,]/hh90d[nrow(hh90d),]) ;

slp_c=(hh00c-esthh)/(nrow(hh90d)-1) ; slp_d=(hh90d[nrow(hh90d),]-hh90d[1,])/(nrow(hh90d)-1) ;

hh90=esthh ;

do j=1 to (nrow(hh90d)-1) ; inc=(hh90d[j+1,]-(hh90d[1,]+slp_d*j))/(hh90d[1,]+slp_d*j) ;

hh90=hh90//int( (esthh+slp_c*j) + ((esthh+slp_c*j)*inc) ) ; end ; if nrow(hh90)< 11 then hh90=J(11-nrow(hh90),1,.)//hh90 ;

if nrow(hh90d)<11 then hh90d=J(11-nrow(hh90d),1,.)//hh90d ; end ;

out_=J(nrow(yr),1,i)||yr||(J(nrow(yr),1,.))||(J(nrow(yr)-nrow(hh90d),1,.)//hh90d)||(J(nrow(yr)-nrow(hh90d),1,.)//hh90) ;
end ; * end observations with no 1980 census value and no 1980s estimates ;

/* start growth rates */
gr20=out_[loc(out_[,5]>0),5] ;
gr20=(gr20[nrow(gr20),]-gr20[1,])/nrow(gr20) ;
gr10=out_[loc(out_[,5]>0),5] ;

if nrow(gr10)<=10 then gr10=(gr10[nrow(gr10),]-gr10[1,])/nrow(gr10) ; if nrow(gr10)>10 then gr10=(gr10[nrow(gr10),]-gr10[nrow(gr10)-10,])/11 ;

gr=gr//(i||gr10||gr20) ;

/* end growth rates */

out=out//out_ ;
end ;
use s.control ;
read all var {C1 C2 C3 C4 C5 C6} where(id='HH') into HH ;
read all var {C1 C2 C3 C4 C5 C6} where(id='CT') into cou ;
ctrl=cou[,]`||HH[nrow(HH),]` ;
print ctrl ;
create ctrl from ctrl [colname={cou lim}] ;
append from ctrl ;
create out from out [colname={id year hh80d hh90d hh}] ;
append from out ;
create gr from gr [colname={id gr10 gr20}] ;
append from gr ;
quit ;

data s.hh_adj_neg ; set out ; run ;
data s.hh_gr ; set gr ; cou=put(id,num_cou.)/1 ; run ;
proc sort data=s.hh_gr ; by cou ; run ;
proc sort data=ctrl ; by cou ; run ;
data s.hh_gr ; merge s.hh_gr ctrl ; by cou ; run ;
proc sort data=s.hh_gr ; by id ; run ;

data out ; set out ; zero=0 ; max=20000 ; name=put(id,num_nam.) ; run ;
goptions reset=(axis, legend, pattern, symbol, title, footnote) norotate hpos=0 vpos=0 htext= ftext= ctext= target= gaccess= gsfmode= ;
goptions device=WIN ctext=blue graphrc interpol=join ;
symbol1 c=DEFAULT ci=black v=NONE width=2 ;
symbol2 c=DEFAULT ci=black v=NONE width=2 ;
symbol3 c=DEFAULT ci=BLUE v=NONE width=2 ;
symbol4 c=DEFAULT ci=GREEN v=NONE ;
symbol5 c=DEFAULT ci=orange v=NONE ;
axis1 color=blue width=2.0 ;
axis2 color=blue width=2.0 ;
axis3 color=blue width=2.0 ;

proc gplot data=out ; by id ; plot (hh80d hh90d hh zero max) * year / overlay haxis=axis1 vaxis=axis2 frame ;

format id num_nam. ;
run ; quit ;
proc printto log=work.curstat.last.log new ; run ;
proc printto ; run ;
proc means data=s.hh_gr ; run ;

proc iml ; yr=do(1980,2000,1)`//do(2005,2060,5)` ; do i=1 to 200 ; id=id//(J(nrow(yr),1,i)||yr) ;

end ; create id from id [colname={id year}] ; append from id ; quit ;

data t ; set s.hh_gr ; file "c:\scag\matlab\hh_gr.csv" ; put id "," cou "," lim "," gr10 "," gr20 ; run ;

data t ; set s.hh_adj_neg ; keep id year hh ; run ;
proc sort data=t ; by id year ; run ;
data t ; merge id t ; by id year ; run ;
data t ; set t ; file "c:\scag\matlab\hh_neg.csv" ; put id "," year "," hh ; run ;
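
The 1980-1990 and 1990-2000 adjustment loops in the listing above share one pattern: take the straight line between two census counts, then re-apply each year's proportional deviation of the annual estimate series from its own straight line. The Python sketch below restates that pattern for a single city and decade. The function name `adjust_to_census` and the sample numbers are hypothetical, and `math.trunc` is used as a stand-in for the `int()` call in the SAS code.

```python
import math


def adjust_to_census(annual_est, census_start, census_end):
    """Re-anchor an annual estimate series to two census counts.

    annual_est   : estimates for the first through last year of the decade
    census_start : census count in the first year
    census_end   : census count in the last year
    """
    n = len(annual_est) - 1
    slp_c = (census_end - census_start) / n        # census-to-census slope
    slp_d = (annual_est[-1] - annual_est[0]) / n   # slope of the estimate series
    adjusted = [census_start]
    for j in range(1, n + 1):
        trend_d = annual_est[0] + slp_d * j
        # Proportional deviation of the estimate from its own linear trend.
        inc = (annual_est[j] - trend_d) / trend_d
        trend_c = census_start + slp_c * j
        adjusted.append(math.trunc(trend_c + trend_c * inc))
    return adjusted


# Made-up series: the estimates run 100 below the census and bump up in year 1.
est = [900, 930, 940, 960, 980, 1000, 1020, 1040, 1060, 1080, 1100]
adj = adjust_to_census(est, 1000, 1200)
```

Because the deviation is zero in the first and last years by construction, the adjusted series starts and ends on the census counts and carries only the within-decade shape of the estimates.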

/**************************************************************
** MATLAB code – this is a working draft and generates
** complex numbers for some city forecasts
**************************************************************/

% Constrained city forecasts for SCAG to 2030 ;

%control parameters ;
alph=.5 ; % weight on past 10 year growth in 2030 constraint ;
seed=5 ; % random number seed ;
nsim=3 ; % number of simulations ;
nyear=71 ; % number of years including forecast year ;
byear=1980 ; % base year ;
eyear=2050 ; % end year ;
Gprms=[0 0.15 0 1; 1 0.2 .3 1 ; 2 0.5 .4 1] ;
gparm=[Gprms(:,[1 2]) zeros(3,1) Gprms(:,3) Gprms(:,3) ones(3,1)] ;

% data ;
addpath c:\scag\matlab
hh=csvread('hh_data.csv') ;
hh_s=hh(find(hh(:,3)>0),:) ;

% design matrix ;
constant=kron(eye(200),ones(nyear,1,1)) ;
time=kron(eye(200),(byear:1:eyear)') ;
x_f=[constant time] ; % projection matrix ;
%x_f(:,1)=ones(size(x_f,1),1) ;
x=x_f(find(hh(:,3)>0),:) ; % design matrix ;

%correlated beta dist ;
gp=csvread('scag_gp.csv') ;
mn=gp(:,1)+ones(200,1) ;
mx=gp(:,3) ;
rng=mx-mn ;
w=gp(:,4) ;
v=gp(:,5) ;
all=dlmread('all.csv',',') ;
coord=all(:,[1 2]) ;
GMRF=cholsim(coord,gparm,nsim,seed) ;
UNIF=rand(10000,1) ;
prob=hist_equal(GMRF,UNIF,[0 1]) ;
BETA=betainv(prob,repmat(w,1,nsim),repmat(v,1,nsim)) ;
gp_lim=round(repmat(mn,1,nsim)+(BETA.*repmat(rng,1,nsim))) ;

% Simulated gompertz growth paths ;
Gomp = zeros(200,nsim);
y_f=zeros(size(x_f,1),nsim) ;
for is = 1:nsim
    ind=(hh(find(hh(:,2)==2000),3)<gp_lim(:,is)) ; % 0/1 indicator of increase or decrease ;
    ind=reshape(repmat(ind,1,nyear)',200*nyear,1) ;
    ind_=ind(find(hh(:,3)>0),1) ;
    c=reshape(repmat(gp_lim(:,is),1,nyear)',200*nyear,1) ;
    c_=c(find(hh(:,3)>0),1) ;
    y=zeros(size(c_,1),1) ;
    row=find(ind_==0) ;
    y(row)=log(log(hh_s(row,3))-log(c_(row))) ;
    row=find(ind_==1) ;
    y(row)=log(log(c_(row))-log(hh_s(row,3))) ;
    beta=inv(x'*x)*(x'*y) ;
    XB=x_f*beta ;
    row=find(ind==0) ;
    y_f(row,is)=c(row).*exp(exp(XB(row))) ; % inverse of y=log(log(hh)-log(c)) ;
    row=find(ind==1) ;
    y_f(row,is)=c(row)./exp(exp(XB(row))) ; % inverse of y=log(log(c)-log(hh)) ;
    %disp(' ');
    %tit = ['Realization # ', num2str(is)];
    %disp(tit);
    %disp('Mean and variance of simulated values');

    %disp(num2str([mean(noise(:)) var(noise(:),1)]))
end

beta=reshape(beta,200,2) ;
slope=exp(beta(:,2)) ;
figure(2) ;
locmap([coord slope], [-119.2464 -114.5857 ; 32.6857 34.8612],[min(slope) max(slope)],.2,'Slope',0) ;
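
The simulation loop above linearizes each household series with a double-log transform around a limit `c` before fitting a line in time. The Python sketch below shows that transform and its algebraic inverse as a round trip on made-up numbers. The function names `to_linear` and `from_linear` are hypothetical. Unlike the MATLAB listing, which sets one increase/decrease indicator per city from the year 2000 value, this sketch decides the branch for every observation; a value on the wrong side of the limit makes the inner log difference negative, which may be one source of the complex numbers the header mentions.

```python
import numpy as np


def to_linear(hh, ceiling):
    """Double-log transform of household counts around a limit."""
    hh = np.asarray(hh, dtype=float)
    below = hh < ceiling
    # The gap is positive on whichever side of the limit the value sits.
    gap = np.where(below,
                   np.log(ceiling) - np.log(hh),
                   np.log(hh) - np.log(ceiling))
    return np.log(gap), below


def from_linear(y, ceiling, below):
    """Invert to_linear: divide by the factor below the limit, multiply above."""
    factor = np.exp(np.exp(y))
    return np.where(below, ceiling / factor, ceiling * factor)


# Round trip on made-up counts with a limit of 1000 households.
hh = np.array([500.0, 800.0, 1500.0])
y, below = to_linear(hh, 1000.0)
back = from_linear(y, 1000.0, below)
```

A count exactly equal to the limit has a zero gap and therefore no finite transform, so such observations would need separate handling.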