Developing a specific measure of insolvency risk for the Pension Protection Fund

  • 1

    Developing a specific measure of insolvency risk for the Pension Protection Fund

  • 2

    Contents

    Executive Summary

    1. Introduction

    2. Scope and Purpose of the model

    3. Scorecard Development Overview

    4. Data Sources

    5. Data Variables

    6. Model Validation

    7. Overrides

    8. Operational application, delivery and support of the score

    9. Conclusions and Next Steps

    10. Appendix

    About Experian

    Our business information credentials

  • 3

    Executive Summary

    The purpose of this paper is to document the creation of the PPF-specific Insolvency Risk Score created by Experian for the sole purpose of providing the PPF with an objective evaluation of the likelihood of any given eligible scheme employer or guarantor becoming insolvent during the next 12 months. In it, we introduce the background and outline the context in which such a score is required. We define the scope and purpose of the model, the success criteria against which it is to be assessed and the steps taken to meet these criteria. We then detail the approach to developing the model, including the definition of the outcome that the model seeks to predict, the data variables tested and ultimately used in creating the scorecards, the population on which it was trained, segmentation of this population and the methodology used to construct scorecards for each of the segments. Next, we provide evidence of the ability of the model to meet the success criteria and finally, we outline the operational application of the model.

  • 4

    1. Introduction

    The Pension Protection Fund (PPF) collects an annual Levy from PPF-eligible Defined Benefit (DB) pension schemes (mainly in the private sector). In the event that an employer becomes insolvent, and there are insufficient assets in the pension scheme to pay PPF levels of compensation, the scheme would enter the PPF, with the PPF taking the assets of the scheme and paying compensation to the members out of the funds collected from schemes that transfer, the Levy and the return on its assets.

    There are around 17,000 PPF employers, and the total levy collected, whilst varying from year to year, is around £650m per year. Approximately 10% of this Levy is Scheme-based, i.e. calculated based on the size of liabilities within the scheme. Almost 90% of the Levy is Risk-based, i.e. based on the likelihood that a scheme will need to make a claim on the PPF and the likely scale of any potential claim. The definition of the Risk-based Levy is found in s175 of the Pensions Act 2004 and the Board needs to determine the rules for calculating it within these parameters. In broad terms, there are four fundamental determining factors:

    1. the degree to which the pension fund is underfunded;

    2. the likelihood of the employer becoming insolvent over the course of the 12 months for which the levy is paid;

    3. the risks associated with the nature of a scheme's investments when compared with the nature of its liabilities;

    4. risk reduction measures (e.g. contingent assets).

    The first, third and fourth components are calculated by the PPF, whilst the second is calculated by an independent third party with expertise in predictive risk modelling. Experian was retained by the PPF in July 2013 to provide the Insolvency Risk Score component. The Insolvency Risk Score element is also relevant to one of the risk-reduction measures described in the fourth factor, namely Type A contingent assets (parent company guarantees) which, in some circumstances, provide for the guarantor's insolvency risk to be used, rather than that of the employer(s).

    The decision to retain Experian was made after a competitive tender in which Experian offered two alternative solutions. The first was the option to use its off-the-shelf credit score, Commercial Delphi, which is widely used by lenders and providers of trade credit as a tool to inform lending decisions. As this score already existed, the PPF were able to evaluate the strength of this option from the outset of the project. The alternative proposed solution was for Experian to construct a new scoring model, calibrated to predict insolvencies within the specific universe of employers that participate in eligible DB pension schemes and are, therefore, in the PPF universe. This paper focuses on the construction of this new model: the PPF-specific score, or Pension Protection Score (PPS).

  • 5

    2. Scope and Purpose of the Model

    Within Experian's proposal to the PPF was a choice between two models:

    1) Experian's proprietary off-the-shelf credit score, Commercial Delphi;

    2) a tailored score, created specifically to model insolvency risk within the PPF universe.

    Details of this choice are outlined in Section 2.3. Essentially though, the PPF chose to retain both options until they could be compared and an informed decision taken (since, of course, the PPF-specific model did not exist and so could not be tested until it had been built).

    The initial premise of the PPF-specific model was that it should be as demonstrably accurate in its ability to predict insolvencies as possible, thereby reassuring stakeholders that the Risk-based levy was determined in as fair a way as possible. However, other considerations besides absolute accuracy are important to the acceptance of the model amongst stakeholders, and to reflect this, the PPF outlined a set of success criteria against which the PPF-specific model would be assessed, in comparison with the alternative option, Commercial Delphi.

    2.1 PPF's success criteria

    The PPF set the following criteria against which the PPF-specific model and the alternative option, Commercial Delphi, would be assessed:

    1. Improved predictive power. The standard test of the predictive power of risk scores is the Gini coefficient. Expressed as a percentage, a score of 100% would indicate that a model was capable of perfectly identifying in advance which of a given population would survive the period and which would not. A score of 0% would indicate that a model had no more predictive power than assigning probabilities to the population at random. What constitutes a good Gini depends on how easy it is to predict the outcome of interest. In the context of insolvency risk prediction, scores of below 45% would typically be considered weak, 45-55% is average and anything above 55% strong. As a further test, on the set of employers that are listed and so have a rating from one or more of the main Rating Agencies (Moody's, S&P and Fitch), the scores were compared to these ratings to ensure that they were broadly aligned with the agencies' view of the business.

  • 6

    2. Adherence to best practice. The PPF retained PwC, an independent specialist, to evaluate the process by which the score was developed and report on any concerns that their view of best practice procedure was not followed during this process.

    3. Stability of model outputs. As well as being predictive, it is important that the score developed should be stable and sustainable. Stability is tested in a number of ways. Firstly, the predictive power (measured by the Gini coefficient) should not fluctuate significantly over time. A degree of fluctuation is inevitable, given the relatively small size of the population being scored and the consequent fact that a small number of unexpected insolvencies in a given year could have a material impact on the Gini for that year. However, significant fluctuations would give cause for concern. Secondly, the model should be demonstrated to work well across different segments of the population: for example, on small, medium and large firms; independents and group members; manufacturers, construction firms, retailers, service sector businesses etc. Thirdly, the probability of insolvency assigned to a given business should not fluctuate significantly over time, and any substantial increase or decrease in score should be justifiable.

    4. Resilience to manipulation. There is clearly a strong incentive for employers to seek to improve their insolvency-risk score, as doing so reduces the levy that they have to pay. Whilst it is not possible (and indeed, not desired) to prevent employers from being able to take steps to improve their score, it is desirable that employers should not be able to game the model: that is, to take simple steps to change pieces of information that in reality do not affect their survival prospects, yet do improve their score.

    5. An appropriate level of transparency of calculation. Whilst not wishing to make it easy or even possible for employers to manipulate their score, it is important that each employer is able to understand its score and to challenge the score that it has been awarded. This requires a high level of transparency in the calculation of the score.

    6. Successful transition from D&B failure scores. It is recognised that employers have learned to understand the D&B scores that currently affect their levy calculation. It is unavoidable that in switching to a different score, some businesses will find their insolvency risk has increased, while for others, it will have decreased. Naturally, this means that some businesses will be levied differently. It is, therefore, important that employers can understand how the new score is calculated so that they can understand why their score has changed.

  • 7

    7. Appropriate appeals mechanism. As noted above, it must be possible for any employer to challenge the score that it has been awarded. In the first instance, this should be through an informal enquiry to understand the score, with scheme sponsors having the opportunity to correct any inaccurate information which has contributed to it. Should this process not satisfactorily resolve the complaint, then it must be clear to the employer what steps it can take to appeal formally against its score, and how this appeal will be dealt with by Experian and, if relevant, by the PPF.

    8. Flexibility and coverage. It is important that as many of the PPF employers as possible are assigned a score based on their own information, rather than a default score, for example the industry or scheme average. In particular, certain types of employers, such as Not-for-Profit organisations and non-UK employers, have been identified as populations for which the score must be able to flex, or an alternative solution provided, to ensure such entities are being scored fairly and appropriately.

    9. Cost of information requirements on employers. Again recognising the importance to employers of being able to monitor their scores and take positive steps to improve these (though not through manipulation, as outlined above), the new score should not increase the burden on employers in terms of gathering information, and if possible, should reduce this burden.
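    As an illustration of criterion 1, the sketch below computes a Gini coefficient under the usual credit-scoring convention, Gini = 2 x AUC - 1, where AUC is the proportion of (insolvent, surviving) employer pairs in which the insolvent employer received the higher risk score. The paper does not state the exact formula Experian used, so this convention, the function names and the sample data are assumptions; the bands in describe_gini are those quoted in criterion 1 for insolvency risk prediction.

```python
from typing import Sequence


def gini_coefficient(risk_scores: Sequence[float], insolvent: Sequence[bool]) -> float:
    """Gini = 2 * AUC - 1 (assumed convention, not confirmed by the paper).

    AUC is the share of (insolvent, survivor) pairs in which the insolvent
    employer has the higher risk score; tied scores count as half a pair.
    """
    failed = [s for s, bad in zip(risk_scores, insolvent) if bad]
    survived = [s for s, bad in zip(risk_scores, insolvent) if not bad]
    if not failed or not survived:
        raise ValueError("need at least one insolvent and one surviving employer")

    correctly_ordered = 0.0
    for f in failed:
        for s in survived:
            if f > s:
                correctly_ordered += 1.0
            elif f == s:
                correctly_ordered += 0.5

    auc = correctly_ordered / (len(failed) * len(survived))
    return 2.0 * auc - 1.0


def describe_gini(gini: float) -> str:
    """Bands quoted in the text: below 45% weak, 45-55% average, above 55% strong."""
    if gini < 0.45:
        return "weak"
    if gini <= 0.55:
        return "average"
    return "strong"


# Hypothetical data: higher score means higher assumed insolvency risk.
example_scores = [0.9, 0.8, 0.1, 0.2]
example_outcomes = [True, True, False, False]
example_gini = gini_coefficient(example_scores, example_outcomes)  # perfect ranking
```

    A model that ranks every insolvent employer above every survivor scores 100%, while one that gives all employers the same score scores 0%, matching the two reference points described in criterion 1.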

    2.2 Experian's approach to meeting these criteria

    Several of these success criteria are things that the construction of a new score, tailored to the PPF employer universe, was explicitly intended to achieve. Others were not specific considerations at the time of proposing the approach.

    1) Improved predictive power. As outlined in the introduction, the primary reason for proposing to construct a new score was that a model tailored to the PPF employer population would be able to use variables specifically proven to be predictive within that highly distinctive cross-section of the business universe, and combine them in an optimal way to predict probability of insolvency. The extent to which this proves true will be the main measure of success against the first of the criteria.

    The comparison of the new score against Ratings Agency ratings where available is a further test of the model. However, it should be noted that such ratings are produced using a wealth of data about the business being rated, which is only available via intensive research. Nonetheless, one would expect that a AAA rated firm, deemed by the Ratings Agencies to be of the highest possible credit quality, should not receive a poor insolvency probability score.

  • 8

    2) Adherence to best practice. Needless to say, Experian continually strives to provide the best possible quality of work to all clients. The approach we outline in Section 3 followed our own standard methods for building predictive models, and we have engaged with PwC, from the start of their involvement in the project, to understand and react to concerns or questions they may have about our methodology or results. Several adjustments to the original model have been made as a direct result of feedback provided by PwC, including the addition of two entirely new scorecards.

    3) Stability of model outputs. Although not one of the main reasons for constructing the new score, a higher level of stability than would be found in off-the-shelf models is likely to be a beneficial side effect. These off-the-shelf credit scores are developed with the intention of giving credit providers the most up-to-date view of how likely a given business is to be able to repay a debt. As such, they are typically redeveloped periodically (every few years) to improve predictive power, often benefitting from new data sources not previously available. This is obviously likely to result in significant changes. Secondly, they tend to make significant use of timely, event-driven data (monthly payment performance, County Court Judgments (CCJs) etc.) to ensure that scores are as current as possible. This induces turbulence on a monthly basis, meaning that the score as at 31st March in a given year may or may not be the same as those assigned over the preceding few months.

    The new score uses primarily financial information derived from filed accounts, making minimal use of payment performance and none at all of CCJs. Therefore, for the vast bulk of employers the only factors which can trigger a change to the score are the filing of a new set of accounts, or potentially a new charge being registered at Companies House, or an existing charge being satisfied. For a small proportion of employers, a significant change in payment performance could also lead to a monthly change, but even within this population, the weight of this characteristic is not great. Substantial testing has been undertaken to confirm the stability of the score in terms of predictive power within different segments of the population and over different timescales (see Section 6, Model Validation, for more details).

    4) Resilience to manipulation. The fact that publicly available information, primarily filed accounts, is the principal component of the score means that it should be substantially harder to make changes that have no substantive impact on the probability of insolvency but that result in positive score changes. Adding an extra director to the board or changing the SIC code reported on annual accounts are examples of alterations which can be made relatively easily, yet which are very unlikely to fundamentally alter the likelihood of the business becoming insolvent. Within the new score, for a business to achieve an improvement in score, it would need to improve some fundamental aspect of its financial health: for example, reduce its liabilities, improve its profit margins or increase its cash and liquid assets.

  • 9

    Additionally, recognising that most employers will want to understand their score and, in the interest of building and maintaining a positive relationship with these employers, a high degree of transparency is agreed to be beneficial. Indeed, the ability to provide greater transparency than proprietary off-the-shelf models was seen as a very significant advantage of the new score.

    5) Transparency of calculation. The use of a model whose details can be laid out more fully in the PPF's annual determination provides for a much greater level of transparency. Experian has also developed a "what-if" analysis calculator, which will be freely downloadable from a web portal, enabling users with access rights to see exactly which information has been used to generate their score, and to test the impact of making any changes to that information.

    It is inevitable that, to a degree, transparency and susceptibility to manipulation go hand in hand: the more information is provided to employers about how their score is calculated, the better able they are to make changes that will influence the score.

    6) Successful transition from D&B failure scores. In principle, one would expect that a strong, healthy business would be seen as exactly that, no matter what specific metrics were used to measure the insolvency risk attached to the business. Similarly, a struggling business will not be able to hide that reality. Nonetheless, it is inevitable that different models might produce different results in some cases.

    The key aspect of this success criterion for us is the ability to fully justify the scores that we have awarded. The level of transparency that comes with the score will help with this: we will be able to state specifically which variables have been used and how heavily each weighs within the model, and to provide empirical evidence to justify why certain characteristics have led to a given business being scored in a certain way.

    Naturally, in switching to a new scorecard, a proportion of employers will experience a change in the insolvency probability assigned to them: if this were not the case, it would be impossible for the score to represent any significant improvement in predictive power. In some cases, the change in insolvency probability will be relatively minor, i.e. will not alter the levy band to which a scheme is assigned. In other cases though, the movement could be more substantial.

    7) Appropriate appeals mechanism. Experian, in consultation with the PPF, has developed a formal Appeals process. The expectation is that the greater transparency provided by the new score, and the provision of an online portal for monitoring scores and their input data, will reduce the volume of Appeals compared to that previously experienced.

  • 10

    8) Flexibility and coverage. The initial proposal was to construct and calibrate the new score within the population for which we have full visibility of insolvencies, i.e. Companies-House registered businesses, which account for 90% of the employer population. Given that the score relies predominantly on financial data from filed accounts, we proposed to source accounts for non-Companies-House registered businesses and then apply the same scoring algorithm to the information in these accounts as is applied to those sourced from Companies House.

    There are two main alternative sources of financial data. For charities registered with the Charity Commission (in England and Wales, Scotland and Northern Ireland), accounts can be sourced directly from these bodies. For non-UK businesses, we have a partner organisation, DBI, which is able to source electronic accounts from local data suppliers across most of the countries in which PPF employers reside.

    Additionally, both Experian (for the UK) and DBI (for non-UK entities) have the ability to research companies manually and capture filed accounts that may not appear on any central register such as Companies House or the Charity Commission, but which are publicly available. We will also enable entities which are not required to file account information to submit this to us directly. Finally, for the relatively small population of employers for which it is not possible to source the relevant information by one of these methods, we would use industry, scheme or blended average insolvency probabilities (in line with current PPF practice).

    9) Cost of information requirements on employers. The nature of the inputs to the score (mainly filed account figures) should ensure that the burden on most employers in terms of information gathering or provision is light, and no more than they are required to do in any case. It is, of course, possible that some employers may choose to increase the amount of information that they report in an effort to improve their score, but this would be a choice that they make, rather than a burden imposed upon them. As noted above, the transparency of the score should also reduce the amount of time employers need to understand their scores, the components that go into them, and what they would need to do to improve them (albeit that the latter may not be as easy to achieve as was previously the case).

    In meeting these criteria, Experian's engagement with the PPF has been extensive, and has incorporated feedback and concerns raised by the Industry Steering Group set up by the PPF as well as PwC's comments in their role as an independent reviewer of best practice.

  • 11

    2.3 The choice between using an existing model or constructing a PPF-specific model

    Experian has an off-the-shelf credit score, Commercial Delphi, which is well established (the first version was developed in 2003) and is widely used by lenders and businesses extending trade credit to customers as a tool to inform these lending decisions. However, it was not our preferred proposition for a number of reasons:

    1. Commercial Delphi is trained on predicting failure across the entire business universe of well over 2m incorporated businesses (a non-Ltd Commercial Delphi also exists, again trained to predict failures across a similar population of sole traders and partnerships). As we outline in Section 2.4, the PPF employer population is not at all representative of this wider universe in terms of a number of fundamental measures and characteristics. Consequently:

    a. some of the data variables used within this score are likely to be inappropriate indicators of insolvency probability for the PPF universe;

    b. other data items may be highly predictive within the PPF population, yet are either not predictive, or rarely available within the wider universe and so not used within Commercial Delphi;

    c. even when variables are used within Commercial Delphi and remain predictive within the PPF universe, their relative weight within the score may not be optimal for this particular population.

    2. Commercial Delphi is designed as a failure score, not an insolvency score. That is, it has been calibrated to predict all forms of company failure, including dissolution, and not solely those involving insolvency.

    3. Commercial Delphi is a proprietary model, the formula for which is Experian's IP. Whilst our customer services team is trained and well used to handling queries about scores from both clients and businesses that have been scored, the level of transparency around the calculation of the score that could be provided would be limited. This could lead to frustration and an unsatisfactory experience for the levy payers seeking to understand their score and how it could be improved.

    4. In fact, "lessons learnt" discussions with the PPF highlighted concerns that stakeholders have expressed about off-the-shelf models such as Commercial Delphi:

    a. concerns over the manipulability of the score, due to some of the information used to calculate it (for example, number of directors and industry sector classification are relatively easy to amend without making any fundamental changes to the business, yet can affect off-the-shelf models);

  • 12

    b. relating to point 1a above, specific concerns that variables like payment performance and CCJs, whilst highly predictive for smaller businesses, are not appropriate measures for the multi-million-pound turnover businesses which make up the majority of PPF employers;

    c. concerns that the inclusion of factors such as these leads to a level of volatility in the score month-on-month when the reality is that the actual likelihood of a business becoming insolvent over the next 12 months rarely varies to that extent.

    In summary, therefore, our expectation was that Commercial Delphi would be, to a large extent, a like-for-like replacement for the D&B rating: both are very widely used and accepted models within the whole UK business universe, but are not optimised for the PPF universe. Both have an element of "black box", i.e. lack of transparency over their calculation. As a generic model, Commercial Delphi is not optimised to resist manipulation, that is, changes that improve the score without truly improving the quality of the business. It also includes timely and event-driven information such as CCJs, which can lead to scores fluctuating significantly over even a short period of time.

    Experian has recent experience in constructing a separate score based purely on financial information, as well as in developing bespoke scorecards for other clients. Some of these are trained on the clients' own experience of default and, in some cases, the models constructed incorporate customers' own data. Such bespoke work has shown that there are substantial improvements in model accuracy to be gained from tailoring a model to both the specific outcome that one wishes to predict, and the population of businesses that one requires to score.

    Consequently, our recommendation was to follow a similar approach and build a new model, trained on actual historical insolvency experiences within the universe of PPF employers over the 7 years that the PPF has been calculating a Risk-based Levy, and using variables demonstrated to be most predictive and appropriate for the very particular mix of businesses having eligible defined-benefit pension schemes.

  • 13

    2.4 Overview of the PPF universe

    The population for which a score is required consists of 17,084 distinct employers as of March 2013. This breaks down as follows:

    Figure 1: PPF Employers by type of entity

    Type of business                                                                           Number of employers  % of population
    UK businesses incorporated and filing annual accounts at Companies House                                12,113              71%
    Charities & not-for-profit organisations; Public sector entities (health, education etc.)                3,052              18%
    Non-registered sole traders / partnerships                                                                 734               4%
    Non-UK businesses                                                                                          437               3%
    Registered businesses not filing accounts                                                                  221               1%
    Dormant companies                                                                                           65               0%
    Unmatched UK entities                                                                                      462               3%
    Total                                                                                                   17,084             100%

    Within the largest population above, that of UK businesses incorporated at Companies House, there are substantial, and very fundamental, differences in the types of business when one compares PPF employers to the wider business universe on which off-the-shelf commercial credit scores are calibrated:

    PPF employers are typically large businesses (between 1 in 3 and 1 in 4 having turnover in excess of £100m, whilst over 99.5% of the business universe is smaller than this).

    The vast bulk of PPF employers (86%) are not independent entities, but part of a group: either the parent (11%) or a subsidiary (75%). Within the whole universe, the inverse is true: over 90% of firms are independent (and indeed, 40% are sole traders or traditional partnerships, not even incorporated entities; these account for fewer than 2% of the PPF employer population).

    Only 8% of PPF employers were incorporated in the 21st century, with the majority (70%) being established for over 20 years. By comparison, over 50% of firms currently alive and trading in the UK were created since the turn of the century.

    The manufacturing sector accounts for a significant proportion of PPF employers (20%, compared to under 10% of the overall business universe). The PPF universe has a correspondingly lower proportion of retail and service sector businesses than the wider business population.

  • 14

    3. Scorecard Development Overview

    The model has been developed by two teams within Experian working separately, but in conjunction with one another.

    a. The London team consists of a Product Director, Project Manager, Senior Consultant, Lead Data Analyst and Senior Data Analyst, with support from other Analysts as needed. The expertise here is in working with the data on a daily basis, and understanding its nuances, caveats, possibilities and difficulties, as well as in building predictive models.

    b. The Paris team consists of a Senior Project Manager and Lead Data Analyst, both of whom have statistical backgrounds and qualifications and extensive experience of building predictive models such as this one, supported by a Client Director with an actuarial degree.

    Most of the preparatory work was undertaken by the London team, with findings shared and discussed with the Paris team. The actual segmentation and construction of the models was undertaken by the Paris team, with findings shared with the London team and with regular iterations of the models taking into account comments and concerns raised. The testing of the models was then shared between the two teams, with the Paris team undertaking the more statistical tests and the London team testing the model on different subsets of the population and over different timeframes.

    Firstly, the universe of employers to be scored is identified. In this case, the PPF was able to provide details of every eligible scheme and every employer and guarantor within those schemes for each of the historical levy years from 2006/07 to 2013/14. The majority of employers appear in all, or many, of these levy years, though it is possible for new employers to appear, and of course for existing employers to be removed (e.g. because they have gone insolvent).

    Each of these employers was matched against the Experian database, and assigned a business-level Unique Reference Number. In the majority of cases, the PPF was able to include the Companies House Number (CRN) attached to each employer. This was cross-referenced against the CRN derived from the Experian match. For the purposes of constructing the model, only "safe" matches, i.e. those with the same CRN as provided by the PPF, were used. This avoided contaminating the model with spurious data caused by potentially incorrect matches.

  • 15

    For the purposes of ongoing matching, instances of discrepancy between the CRN provided via the Pensions Regulator (tPR) and that which Experian matches to will be raised with the PPF for manual decisions to be made as to the most appropriate match. Naturally, in the event that a wrong choice of match is made, the employer in question would be able to raise this with Experian's customer services team and provide the relevant information to have the match corrected.
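    The "safe match" rule described above can be sketched as follows. This is a minimal illustration rather than Experian's matching logic: the record layout, the field names and the normalisation rule (trimming, upper-casing and left-padding the number to 8 characters) are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class EmployerMatch:
    employer_id: str              # hypothetical PPF-side identifier
    ppf_crn: Optional[str]        # CRN supplied by the PPF, if any
    experian_crn: Optional[str]   # CRN of the record the employer was matched to


def normalise_crn(crn: Optional[str]) -> Optional[str]:
    """Assumed normalisation: trim, upper-case and left-pad to 8 characters."""
    if crn is None or not crn.strip():
        return None
    return crn.strip().upper().zfill(8)


def split_matches(
    matches: Iterable[EmployerMatch],
) -> Tuple[List[EmployerMatch], List[EmployerMatch]]:
    """Return (safe, for_review).

    A match is 'safe' only when both CRNs are present and identical after
    normalisation; everything else is set aside for manual review.
    """
    safe: List[EmployerMatch] = []
    for_review: List[EmployerMatch] = []
    for match in matches:
        ppf = normalise_crn(match.ppf_crn)
        experian = normalise_crn(match.experian_crn)
        if ppf is not None and ppf == experian:
            safe.append(match)
        else:
            for_review.append(match)
    return safe, for_review


# Hypothetical records for illustration only.
sample = [
    EmployerMatch("E1", "1234567", "01234567"),   # same number, padding differs
    EmployerMatch("E2", "SC123456", "SC654321"),  # discrepancy
    EmployerMatch("E3", None, "07654321"),        # no CRN supplied by the PPF
]
safe_matches, review_matches = split_matches(sample)
```

    Only the first record would enter model development under this rule; the other two correspond to the cases that the text says are referred for a manual decision.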

    Any businesses that had become insolvent (based on the information Experian receives from Companies House, detailed in Section 4) were flagged as such, with the date of entering insolvency recorded. Since the PPF levy year runs from 1st April to 31st March, it is necessary to calculate scores for all employers as at 31st March, predicting the likelihood of each employer becoming insolvent over the subsequent 12 months of the levy year. Consequently, all information known about a given employer as at 31st March of the levy year in question is attached to that employer. In this way, each employer is treated as a separate data point for each year in which they appear. We simply re-assess their probability of becoming insolvent over the next 12 months based on the latest information available.
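    The outcome flag described in this paragraph can be sketched as below. The function names are hypothetical, and the treatment of the window boundaries (1st April to the following 31st March, both inclusive) is an assumption, since the paper does not spell out how boundary dates were handled.

```python
from datetime import date
from typing import Optional


def score_date(levy_year_start: int) -> date:
    """Scores are calculated as at 31st March, the day before the levy year starts."""
    return date(levy_year_start, 3, 31)


def insolvent_in_levy_year(levy_year_start: int, insolvency_date: Optional[date]) -> bool:
    """True if the employer entered insolvency during the levy year.

    Assumed window: 1st April of levy_year_start to 31st March of the
    following year, both dates inclusive.
    """
    if insolvency_date is None:
        return False
    window_start = date(levy_year_start, 4, 1)
    window_end = date(levy_year_start + 1, 3, 31)
    return window_start <= insolvency_date <= window_end


# One observation per employer per levy year: the same hypothetical employer,
# insolvent on 15 October 2008, is a survivor in 2007/08 and a failure in 2008/09.
outcome_2007 = insolvent_in_levy_year(2007, date(2008, 10, 15))
outcome_2008 = insolvent_in_levy_year(2008, date(2008, 10, 15))
```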

    This approach increases the number of data points on which the model can be trained: effectively around 17,000 employers x 6 years, rather than each employer representing just one data point. It also means that the model should not be over-fitted to a particular year, which may or may not be representative of the experience of insolvency in other years. Instead, the calibration period covers a benign economic environment (2006-7), a recession (2008-9), and a recovery period (2010-12). To further check for the possibility of over-fitting the model to the population on which it was developed, a hold-out sample of 10% of the total population over the 6-year period was randomly selected and tested to ensure it was representative of the development population.
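    A random 10% hold-out of employer-year observations might be drawn as in the sketch below. The tuple layout, the seed and the use of the insolvency rate as a representativeness check are assumptions made for illustration; the paper does not describe which tests were applied to the hold-out sample.

```python
import random
from typing import List, Sequence, Tuple

# (employer_id, levy_year_start, insolvent_in_levy_year): hypothetical layout
Observation = Tuple[str, int, bool]


def split_holdout(
    observations: Sequence[Observation],
    holdout_fraction: float = 0.10,
    seed: int = 0,
) -> Tuple[List[Observation], List[Observation]]:
    """Randomly split observations into (development, hold-out) samples."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = list(observations)
    rng.shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


def insolvency_rate(observations: Sequence[Observation]) -> float:
    """Share of observations flagged as insolvent during the levy year."""
    return sum(1 for _, _, insolvent in observations if insolvent) / len(observations)


# Hypothetical population: 1,000 employer-years with a 2% insolvency rate.
population = [(f"E{i}", 2008, i % 50 == 0) for i in range(1000)]
development, holdout = split_holdout(population)

# One simple (assumed) representativeness check: compare insolvency rates.
rate_gap = abs(insolvency_rate(development) - insolvency_rate(holdout))
```

    A large gap between the two rates would suggest that the hold-out sample is not representative of the development population and that the split should be examined before the sample is used to test for over-fitting.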

    The information appended to each employer as of 31st March of the levy year covered full financials from the most recent set of filed accounts and the previous 3 years of filed accounts, plus a selection of non-financial information which has proven to be predictive in past risk models (see section 7 for more details of the variables tested). Each variable was assessed in terms of:

    c. the proportion of the development population for which the information was available;

    d. the ability of the variable, alone, to predict which employers would become insolvent during the levy year (as measured by the Gini coefficient of the variable in question).
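
    The two assessment criteria above can be sketched as follows. This is a minimal illustration rather than Experian's implementation: it assumes the Gini coefficient is computed as 2 x AUC - 1 over the records where the variable is known, and that higher values of the variable indicate higher risk. The function names and the sample values are hypothetical.

```python
from typing import Optional, Sequence


def coverage(values: Sequence[Optional[float]]) -> float:
    """Proportion of records for which the variable is known (not None)."""
    return sum(v is not None for v in values) / len(values)


def gini(values: Sequence[Optional[float]], insolvent: Sequence[bool]) -> float:
    """Single-variable Gini, assumed here to be 2 * AUC - 1.

    Only records with a known value are used. Higher values of the
    variable are assumed to indicate a higher risk of insolvency.
    """
    known = [(v, y) for v, y in zip(values, insolvent) if v is not None]
    bads = [v for v, y in known if y]
    goods = [v for v, y in known if not y]
    if not bads or not goods:
        raise ValueError("need at least one insolvent and one solvent record")
    # AUC = share of (insolvent, solvent) pairs ranked correctly; ties count half.
    concordant = sum(
        1.0 if b > g else 0.5 if b == g else 0.0
        for b in bads
        for g in goods
    )
    auc = concordant / (len(bads) * len(goods))
    return 2 * auc - 1


# Hypothetical candidate variable (e.g. a gearing ratio) for six employers.
gearing = [0.9, 0.8, None, 0.3, 0.2, 0.7]
went_insolvent = [True, False, False, False, False, True]

print(round(coverage(gearing), 3))              # 0.833
print(round(gini(gearing, went_insolvent), 3))  # 0.667
```

    A variable with low coverage or a Gini close to zero would be a candidate for removal, subject to the element of judgment described in the text.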

    Variables with poor coverage, i.e. where a substantial proportion of the population would be classed as unknown, and those shown to have little predictive power were removed and not assessed within the model. Rather than applying absolute cut-offs, e.g. requiring a minimum of 75% coverage of a variable, and a Gini of at least 10% for a variable to be retained, an element of judgment was applied.

    This recognises that many of the variables capture essentially the same characteristics so, for example, if two very similar measures of profitability were considered and one has a higher level of coverage and predictive power, the other may be removed even if it still has reasonably high levels of both. Conversely, a variable which is relatively complementary to others being considered, and shown to be additively predictive, may be retained even if the coverage is lower than might ordinarily be acceptable.

    Having thus identified the selection of candidate variables for inclusion within the model, segmentation of the population was undertaken to further divide it into homogeneous sub-populations.

    3.1 Population segmentation and sampling

    It was agreed that the population to be used for creating the model scorecards would be only those PPF employers that matched with a sufficiently high level of confidence to the Experian business database. Additionally, only employers where Experian agreed upon the match with PPF (exactly the same Companies House number) would be used to ensure that all businesses used within the model were definitively within the PPF universe.

    This decision means that non-UK businesses and businesses not registered with Companies House were not included within the modelling population. The reason for this is the lack of a definitive record of insolvency from a consistent source. The scoring of these businesses is, therefore, based on a combination of applying the scorecards to those businesses where the data is available, and overrides in the form of industry or scheme averages.

    In addition to these exclusions, any business that was considered to be dead, i.e. had already entered insolvency proceedings or had entered into dissolution prior to the scoring date (31st March of each given year), was removed from the modelling population.

    In developing a predictive model, it is preferable not to apply a one-size-fits-all approach, but to segment the population into a number of homogeneous sub-populations, with a separate scorecard being developed for each segment. The inputs and weightings assigned to the scorecard for one segment may be substantially different to those in another segment. It is of course crucial to ensure that the number of data points within each sub-population is sufficient to enable a statistically robust model to be produced. However, this means that it is necessary, in defining the segments to be modelled separately, to make certain choices as to what constitutes homogeneity.

    Numerous choices exist in terms of how to segment any given universe population: for example, size of business, industry sector, group affiliation, number and type of annual accounts filed (and so depth and history of data available). The rationale for creating segments is to identify populations that behave significantly and demonstrably differently, either in absolute terms or in terms of the factors that predict the likelihood of insolvency.

    So, for example, if large firms have a significantly lower insolvency rate than small firms, that is a justification for placing small and large firms in different segments and modelling them separately. Within service sector businesses, it may be that the relationship between trade creditors and trade debtors is an important predictor, whereas within retail the level of stock is more critical, and within manufacturing, gearing levels or fixed assets play a more important role. This would justify treating these as separate segments, as combining them into a single scorecard would dilute some of these fundamental differences.

    The fact that some firms file full accounts with a P&L, whilst others only file small accounts and report an abbreviated balance sheet, means that a scorecard that uses measures of profitability may work well within the full account filing population, but the measures of profitability could not be calculated within the small account filers and so the scorecard would be less predictive. And members of corporate groups should be treated not in isolation, but in the context of the group within which they operate.

    A logical conclusion would be that a separate model should be created for each micro-segment: one for small retail businesses that have filed just one set of small accounts; another for similar ones that have filed several years of accounts, enabling us to consider trends over time in variables; another for similar ones filing full accounts; others for medium-sized, large and very large retailers; and the same combinations for manufacturing firms, construction, property, business services, consumer services and public services, with all of the above populations split into independent firms and group members.

    Clearly, this could very quickly lead to hundreds, even thousands, of truly homogeneous micro-segments being created. Indeed, the ultimate extension of the segmentation principle would be that even within an apparently homogeneous population no two businesses are truly alike, and each business should have a scorecard created specifically for its circumstances. The ratings agencies Moody's, S&P and Fitch adopt a company-by-company approach. But of course, they are assessing a finite, largely stable and volumetrically small population of firms (around 2.5% of the PPF universe matched to a rated business, although a further 10-15% are subsidiaries of rated businesses), and this approach is not scalable to a wider, and consistently evolving, population.

    In practice, besides the obvious time implications of building so many separate scorecards, the reality is that for the scorecards to be statistically robust, and consider as many variables as possible, it is crucial that there are sufficient data points and specifically enough insolvent businesses within each segment on which to train the model. Since the scorecard modelling works by assessing the relative probabilities of insolvency between given characteristics, if one particular characteristic does not contain any businesses that subsequently go insolvent, then the relationship between the insolvency rate associated with this characteristic and any other cannot be measured. It could not reasonably be assumed that simply because no insolvencies were associated with the characteristic within the population in question and over the 6-year window of observation, the characteristic is inherently associated with a 0% insolvency rate.
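
    The difficulty with a characteristic that contains no insolvencies can be illustrated with a short sketch. It assumes that characteristics are compared on the log-odds scale, as is usual for logistic regression; the function name and the counts are hypothetical and are not taken from the model.

```python
import math


def log_odds(insolvent: int, total: int) -> float:
    """Log-odds of insolvency for one characteristic (bucket).

    Undefined when the bucket contains no insolvent (or no solvent)
    businesses, which is why such buckets cannot be compared with others.
    """
    solvent = total - insolvent
    if insolvent == 0 or solvent == 0:
        raise ValueError("log-odds undefined: bucket has no insolvent or no solvent businesses")
    return math.log(insolvent / solvent)


# Hypothetical counts: two buckets with insolvencies can be compared...
difference = log_odds(12, 400) - log_odds(5, 505)
print(difference > 0)  # True: the first bucket is the riskier one

# ...but a bucket with zero observed insolvencies cannot.
try:
    log_odds(0, 500)
except ValueError as err:
    print(err)
```

    Observing zero insolvencies in 500 records does not show a true rate of 0%; it only shows that the sample is too small to estimate the rate, which is the point made in the text.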

    Consequently, there is a trade-off between defining as many homogeneous segments as possible (homogeneity being measured specifically in terms of their insolvency behaviour), yet having as few segments as possible to maximize the number of data points in each and, thereby, allow more variables to be considered in each model, maximizing the robustness of the model.

    Ultimately it was decided that industry-based segmentation should not be used. This is due to the ease with which the industry classification could be manipulated by employers to enable scoring based upon a more lenient scorecard.

    Instead, the total population is segmented into smaller sub-populations where the data availability, or the profile of the data, is markedly different. This approach allows for discrimination of the insolvency rating and allows discrete, homogeneous sub-populations to be scored. The scorecards have been developed on all Companies House registered, UK-based businesses within the PPF's employer portfolio. The segments are homogeneous: they have the same data available, different levels of insolvency rates are seen between one segment and another, and the variables that prove predictive within one segment are not the same ones that prove predictive (or at least not with the same weighting) within another segment.

    3.1.1 Primary segmentation

    The modelled population is segmented into independent businesses and those that are part of a corporate group. Group members and Independents should not be mixed together because there are fundamental differences in the complexity of the two types of business, and in the factors that underpin their chances of becoming insolvent:

    1. Businesses that are part of a corporate group will have a different risk profile to those that are independent. A business that is part of a corporate group will have additional risks or benefits from being associated with other businesses.

    2. Factors such as a failing subsidiary or a strong parent company, potentially, have an impact on insolvency risk.

    3. Financial support from parent or sister companies through intra-group funding may provide assistance to corporate group members in a way that is unavailable to independent businesses.

    Being part of a corporate group may also affect how accounts are filed. Therefore this population is further split into those that are filing consolidated accounts and those that are filing non-consolidated accounts. There are two key reasons for splitting the population in this way:

    1. the different segments have differing insolvency rates so should be treated using different scorecards;

    2. the variables available for those filing consolidated accounts are an aggregate view of the group as a whole rather than of the individual company in question. As such the values within these variables are not comparable and so should be treated separately.

    3.1.2 Secondary segmentation

    The second segmentation is based on whether a company is filing full accounts which include P&L information. The choice to further segment the population based on the type of accounts filed is justified in two ways: firstly due to the similarities in information available, and secondly because the evidence shows that there are different insolvency rates associated with firms filing small accounts compared to those filing full or consolidated accounts. Clearly the latter point is correlated with the size of firm, as shown in Figure 2.

    The increased depth of data within these full accounts allows for the inclusion within the scorecard of variables not available for small accounts filers. If the two types of account filers were mixed in the same scorecard, it would either result in P&L information not proving predictive, due to the high proportion of firms with an unknown value (all those filing small accounts), or the P&L information would be one or more of the components of the scorecard, but small account filers would all receive the same coefficient for this characteristic or these characteristics (the coefficient for unknown), reducing the ability to discriminate between them.

    When using several financial variables in a model and mixing into the same scorecard different sub-populations, some of which have this financial information available and others not, all the financial variables with unknown values are correlated among themselves, which creates a bias in the results of the model.

    As shown in Figure 3, the risk profile (insolvency rates) for businesses filing small accounts is significantly worse than for those filing full accounts. The account type segmentation captures the discrimination in risk levels almost as much as a size-based segmentation (using total assets - see Figure 4). It also allows for greater stability over time, since a profit-seeking business will always be scored using the same scorecard unless it:

    a) becomes part of a group / ceases to be part of a group

    b) moves from filing one type of accounts to another

    With a size-driven segmentation, businesses can be scored using one scorecard at one point in time, then, if through growth or decline (which could be barely significant) they cross a threshold, they would be scored using an entirely different scorecard. This introduces more of a cliff-edge risk to the score, as businesses close to the size-boundary between scorecards may be able to evaluate which scorecard they prefer to be scored by and take steps to ensure they are scored in their preferred way.

    Figure 2: Employer size and type of accounts

    [Chart. Axis: 0% to 100%. Categories: Independent - Full Account; Independent - Small/Medium Account; Independent - Others; Non-Consolidated Group Members - Full; Non-Consolidated Group Members - Small/Medium; Non-Consolidated Group Members - Others; Consolidated Group Members. Legend (size bands): >250M; 100M to 250M; 20M to 100M; 5M to 20M; 1M to 5M.]

    Figure 3: Secondary segmentation insolvency rates

    Segmentation                     Number of Employers    Insolvencies    Insolvency Rate
    Consolidated Group Accounts                   13,260             152              1.15%
    Group Members Full Accounts                   33,471             239              0.71%
    Group Members Small Accounts                   4,308              82              1.90%
    Independent Full Accounts                      5,497              53              0.96%
    Independent Small Accounts                     4,081             102              2.50%
    Total                                         60,617             628              1.04%

    Figure 4: Size-based segmentation insolvency rates

    Total Assets                     Number of Employers    Insolvencies    Insolvency Rate

    Figure 5: Tertiary Segmentation insolvency rates

    Segmentation                     Number of Employers    Insolvencies    Insolvency Rate
    Group Members Full: £50m plus                  9,062              46              0.51%
    Group Members Full: £10m - £50m               10,910              74              0.68%
    Group Members Full: £0 - £10m                 13,499             119              0.88%
    Total                                         33,471             239              0.71%

    3.1.4 Not-for-Profit scorecard

    Having constructed separate scorecards for each of the segments described above, using the methodology outlined in section 7, analysis was undertaken into the performance of the model. We considered the predictive power of each scorecard and of the overall model (measured by the Gini coefficient, described in 3.1), and also compared the predicted insolvency rate within various subsets of the population over the 6-year period with the observed insolvency rate over the same period. This confirmed that the model was predictive in both aspects across different sectors and sizes of business.

    However, within the sector including Health, Education, Local and Regional Government and Charities, the model was less predictive than in other sectors. More critically, it significantly overestimated the insolvency rate that this population would experience. As a result the scores awarded to entities in this sector were on average weaker than the performance of the sector merited.

    Several options to overcome this were discussed, including the application of a single score for all entities in the sector, and applying a multiplier to scale down the insolvency probability of all entities in the sector to reflect the difference between the predicted and observed central tendency. The decision reached, however, was to construct a separate scorecard in addition to the existing seven based on the segmentation above. Any entity meeting the definition of Not-for-Profit (NFP) will be scored based on this scorecard, irrespective of where it falls within the primary, secondary and tertiary segmentation. Whilst significant work was undertaken to finalise the precise definition by which NFPs will be identified, for the purposes of constructing this additional scorecard, an approximate working definition of NFP was used.

    Since the scorecards are constructed using Companies House insolvency information, only NFPs registered at Companies House could be included within the population used to build the scorecard. Rather than using Industry Sector classifications, which are often inaccurate and can easily be changed by the business without any need to verify the accuracy of the change, we used the legal form that each entity had registered under as the means of identifying NFPs. Entities registered under section 60 of the Companies Act 2006, i.e. limited by guarantee rather than by shares, with no shareholders benefitting from the performance of the business and required to operate on a not-for-profit basis, were classed as NFP, as were Industrial and Provident Societies and Royal Charter companies.

    This gave a population of around 3,500 entities within the PPF universe, and fewer than 40 insolvencies over the 6-year observation period. This was deemed to be insufficient to build a statistically robust model and so this population was supplemented with other businesses in one of the above legal forms outside of the PPF universe. To ensure this population was as similar in profile as possible to the PPF employers included in the NFP population, analysis was undertaken which led to the removal from the supplemental population of all entities incorporated post 2006 or with total assets of less than £100,000 (a significant number within the non-PPF population, but hardly any within the PPF population). This yielded a model development population of over 20,000 entities with around 500 insolvencies, with the insolvency rate within the non-PPF population being very close to that within the PPF population.

    Figure 6: Not-For-Profit insolvency rates

    Segmentation                     Number of Employers    Insolvencies    Insolvency Rate
    Not-For-Profit                                10,996              28              0.25%

    3.1.5 Consolidated / Large Corporate override

    Within the development sample, the consolidated scorecard was applied to:

    1. All group members filing consolidated accounts;

    2. All group members seen to be the ultimate parent company within the group (although the majority of these would satisfy the first criterion and file consolidated accounts, this is not always the case).

    This scorecard is always applied to any non-UK accounts sourced (which were not in the development sample). Subsequent analysis of the scorecards suggested that within the population being scored on the Group Member Full accounts £50m+ turnover scorecard, there are naturally some that are really very large indeed, and it was queried whether these should be scored on the basis of the Consolidated Account scorecard (which should therefore be renamed Consolidated / Large or Non-UK Corporate scorecard) instead. Analysis showed that the subset of firms in the Group Full £50m+ turnover scorecard that also had £500m+ of total assets, i.e. the very top end of this population in terms of size, had a higher Gini if the Consolidated Accounts scorecard was applied to them than under the current £50m+ scorecard. Although the impact on the overall Gini was negligible, due to the small proportion of firms affected, it was deemed justifiable to implement this override. Consequently, in implementing the scorecards, the population of corporate group members will be scored based on the consolidated / large or non-UK corporate scorecard if they meet one of the following criteria:

    1. filing consolidated accounts;

    2. seen to be the ultimate parent within the group;

    3. non-UK company not filing at Companies House and accounts sourced from the relevant country;

    4. filing accounts showing turnover to be greater than £50m AND total assets greater than £500m.
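
    The four criteria above amount to a simple rule, sketched below for corporate group members. The field names are hypothetical and chosen for illustration only, and monetary amounts are assumed to be held in pounds sterling.

```python
from dataclasses import dataclass


@dataclass
class GroupMember:
    """Hypothetical record for an employer that is part of a corporate group."""
    files_consolidated_accounts: bool
    is_ultimate_parent: bool
    non_uk_accounts_sourced: bool  # non-UK company, accounts sourced from its own country
    turnover: float                # assumed to be in pounds
    total_assets: float            # assumed to be in pounds


def uses_consolidated_scorecard(e: GroupMember) -> bool:
    """True if any one of the four override criteria is met."""
    return (
        e.files_consolidated_accounts
        or e.is_ultimate_parent
        or e.non_uk_accounts_sourced
        # Criterion 4 requires BOTH thresholds to be exceeded.
        or (e.turnover > 50e6 and e.total_assets > 500e6)
    )


large_subsidiary = GroupMember(False, False, False, turnover=80e6, total_assets=750e6)
mid_subsidiary = GroupMember(False, False, False, turnover=80e6, total_assets=120e6)

print(uses_consolidated_scorecard(large_subsidiary))  # True
print(uses_consolidated_scorecard(mid_subsidiary))    # False
```

    Employers for which the rule returns False would remain on the scorecard implied by the primary, secondary and tertiary segmentation.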

    3.1.6 Overview of scorecards

    Figure 5 below shows the total volume of records included within each scorecard within the model development (including the holdout population).

    Figure 5: Summary of insolvency rates by scorecard

    Scorecard                                    Number of Employers    Insolvencies    Insolvency Rate
    Consolidated Group / XL / Non-UK Accounts                 13,260             152              1.15%
    Group Members Full Accounts: £50m plus                     9,062              46              0.51%
    Group Members Full Accounts: £10m - £50m                  10,910              74              0.68%
    Group Members Full Accounts: £0 - £10m                    13,499             119              0.88%
    Non-Consolidated Small Accounts                            4,308              82              1.90%
    Independent Full Accounts                                  5,497              53              0.96%
    Independent Small Accounts                                 4,081             102              2.50%
    NFP                                                       10,996              28              0.25%
    Total                                                     71,613             656              0.92%

    The PPF populations not included within this sample were:

    Unmatched entities

    Unincorporated entities (individuals, sole traders and partnerships)

    Non-UK entities (filing accounts outside the UK)

    For the application of the scorecard, accounts will be sourced for such entities wherever possible and the formula for the appropriate scorecard applied to these accounts. Where this is not possible, scheme, industry or blended average scores will be applied as detailed in section 9.5.

    3.1.7 Test sampling overview

    The employer data provided to create the bespoke model spanned years 2007 through to 2013. As such there are 6 distinct periods of time used for the building of the insolvency risk model. For each of these periods a full year's worth of insolvency data is available to initially train and further test the model. Each year accounts for 15,000-20,000 unique companies. Ten per cent of the population is initially excluded from the model development and subsequently used as a test sample.

    Each company and time period is considered as a distinct data point. This means that any given company at a given point in time is considered to be a separate entity from the same company at a different point in time. As time-dependent variables are used (e.g. distinct financial variables for accounts filed at a given time) this is a reasonable assumption. The benefit of using this method is that through random sampling there is a representative population from each period for both the model development and testing. This alleviates issues surrounding economic changes that may cause bias towards stronger or weaker economic time periods within the development and additionally within the testing of the model.

    Random sampling of the test population may not, however, produce a good representative test sample. Consequently, stratified sampling, using proportional allocation, was used instead. This is done on a sector/size basis in order to include different types of companies within the test sample across the entire population. For example, through proportional allocation the test sample has the same proportion of sector A/size A, sector A/size B, sector B/size A as the overall population. Through this stratified sampling the test sample will be representative across sector and size.

    Additionally, other variables within the test sample, such as age, annual account types and independent/corporate group membership, are compared to the overall population to verify that these variables are representative of the entire population.
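
    Stratified sampling with proportional allocation, as described above, can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually run: the record layout, the stratum key and the rounding rule are hypothetical.

```python
import random
from collections import defaultdict


def stratified_holdout(records, stratum_of, fraction=0.10, seed=0):
    """Split records into (development, holdout) samples.

    The same fraction is drawn at random from every stratum, so the holdout
    has the same sector/size mix as the overall population.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[stratum_of(record)].append(record)

    development, holdout = [], []
    for members in strata.values():
        rng.shuffle(members)
        k = round(len(members) * fraction)  # proportional allocation
        holdout.extend(members[:k])
        development.extend(members[k:])
    return development, holdout


# Hypothetical population: 60 records in sector A / size small, 40 in sector B / size large.
population = (
    [{"sector": "A", "size": "small", "id": i} for i in range(60)]
    + [{"sector": "B", "size": "large", "id": 60 + i} for i in range(40)]
)
dev, test = stratified_holdout(population, lambda r: (r["sector"], r["size"]))

print(len(dev), len(test))                       # 90 10
print(sum(r["sector"] == "A" for r in test))     # 6
```

    Because each stratum contributes the same fraction, very small strata may contribute no records at all after rounding; a production version would need a rule for that case.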

    3.1.8 Insolvency data collection

    It was noted subsequent to the initial model development that, owing to the way in which data is provided to Experian by Companies House, there were two inconsistencies between the data supplied and the statutory definition of insolvency for PPF purposes:

    The first is that the data included Members' Voluntary Liquidation (MVL); given the declaration of solvency required, this is not an insolvency event for PPF purposes.

    The second is that Company Voluntary Arrangements (CVAs) were not included. The impact of this omission is minimal, since the majority of CVAs in the PPF universe were ultimately followed by another insolvency event that had been captured, albeit with a slight timing difference.

    As a consequence of these two inconsistencies, the volume of insolvencies on which the model was developed and trained was just over 900 (with an insolvency rate of 1.02%), whilst the revised data gives a total of just under 900 insolvencies (with an insolvency rate of 0.89%). Subsequent analysis of the impact of correcting these data inconsistencies showed that, in fact, the model became more predictive. This is not surprising, since the variables used are designed to identify signs of stress, and MVL is not necessarily an event caused by stress. Consequently, it was not deemed necessary to re-evaluate the inputs to the scorecards in the light of these inconsistencies. However, an adjustment factor was applied to each scorecard to reflect the lower observed insolvency rates seen after correction of the datasets.

    3.2 Good/Bad Definition

    For scorecard development, insolvent companies are labelled Bad. Insolvency is recorded and acknowledged according to Companies House, as detailed below. Non-insolvent companies are labelled Good. Companies that have closed through dissolution are considered non-insolvent and are, therefore, given a Good label.

    The Good/Bad definition is measured over a 12-month outcome period from the point of scoring. The point of scoring for each company is considered to be 31st March of each year. If a company becomes insolvent within the 12 months following the point of assessment, it is labelled as Bad.
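
    The Good/Bad definition can be expressed as a small labelling function. This is a sketch only: the function and parameter names are hypothetical, and the treatment of the window boundaries (insolvency after the scoring date, up to and including the following 31st March) is an assumption that the text does not spell out.

```python
from datetime import date
from typing import Optional


def good_bad_label(scoring_year: int, insolvency_date: Optional[date]) -> str:
    """Label one employer-year observation scored at 31 March of scoring_year.

    insolvency_date is None for businesses with no recorded insolvency,
    including those that closed through dissolution.
    """
    scoring_date = date(scoring_year, 3, 31)
    window_end = date(scoring_year + 1, 3, 31)  # assumed inclusive end of the 12-month window

    if insolvency_date is not None and insolvency_date <= scoring_date:
        # Already insolvent at the point of scoring: such businesses are
        # assumed to have been removed from the modelling population.
        raise ValueError("business was insolvent before the scoring date")

    if insolvency_date is not None and insolvency_date <= window_end:
        return "Bad"
    return "Good"


print(good_bad_label(2010, date(2010, 9, 1)))  # Bad
print(good_bad_label(2010, date(2011, 6, 1)))  # Good
print(good_bad_label(2010, None))              # Good
```

    Because each employer is re-scored every year, the business that is Good for the 2010 observation in the second example would be labelled Bad for its 2011 observation.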

    3.3 Modelling method

    Experian uses logistic regression to create the scorecards, using the SAS business analytics software. Logistic regression is used because the dependent variable is binary (a company either becomes insolvent or not). There are several advantages of using logistic regression:

    It is robust; dependent and independent variables need not be normally distributed.

    It can handle nonlinear effects and does not assume a linear relationship between dependent and independent variables.

    It provides an objective assessment of the probability of a given outcome.

    Intuitive understanding of the results enables additional coherence checks, alongside statistical tests.

    By splitting the population into differing segments, a range of different variables can be used, depending upon which are predictive and whether a particular data element is available. As probabilities of insolvency are given for each segment, direct comparisons for each record can be made. Each component of the insolvency score is validated using the most demanding techniques:

    Wald Test (used to assess the significance of prediction of each predictor);

    calibration based on the Gini index (Forecast index);

    use of hold-out sample to validate the model is not over-fitted to the development population;

    stability test over time.

    Using these techniques the modelled scores are robust and consistent over time.

    Initially an extensive list of variables, from a variety of sources, was considered for use within the model. Each of these variables was analysed individually to determine whether it was predictive, based upon the individual variable's Gini coefficient and fill-in rate (the % of companies having the data). Each variable was split into a number of standardised buckets to identify whether the variable discriminates at these levels. This provides a view of which variables should be used within the scorecards.

    The combination of the variables is also considered when selecting variables for use within the model. This is analysed using Principal Component Analysis to further reduce the number of variables to consider for modelling, by eliminating variables with high levels of multi-collinearity prior to regression modelling.
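
    To make the link between bucketed variables and a probability of insolvency concrete, the sketch below applies a logistic scorecard to one employer. All variable names, buckets and coefficients are invented for illustration; they are not the fitted values from the model, and the additive log-odds form is simply the standard shape of a logistic regression.

```python
import math


def insolvency_probability(intercept, coefficients, buckets):
    """Probability of insolvency from a bucketed logistic scorecard.

    coefficients maps variable -> {bucket -> coefficient}; buckets maps
    variable -> the bucket this employer falls into (including "unknown").
    """
    log_odds = intercept + sum(
        coefficients[variable][bucket] for variable, bucket in buckets.items()
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical scorecard with two bucketed variables.
scorecard = {
    "gearing": {"low": -0.6, "high": 0.9, "unknown": 0.2},
    "years_trading": {"under_5": 0.5, "5_plus": -0.4, "unknown": 0.1},
}
intercept = -4.5  # hypothetical baseline log-odds

stable = insolvency_probability(
    intercept, scorecard, {"gearing": "low", "years_trading": "5_plus"}
)
stressed = insolvency_probability(
    intercept, scorecard, {"gearing": "high", "years_trading": "under_5"}
)
print(stable < stressed)  # True
```

    Each bucket, including the bucket for unknown values, carries a single coefficient, which is why a population in which one variable is always unknown cannot be discriminated on that variable.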

    4. Data Sources

    The specific insolvency scorecards use data collected by Experian, the vast majority of which comes directly from Companies House. In the development of the model both annual and interim accounts were used, provided they had been filed with Companies House. In making the model operational both interim and annual accounts will be used, provided they are filed with Companies House. Where financial information is sourced from elsewhere, annual accounts will be used.

    Within the models, only a fraction of a per cent of employers are scored using accounts with a year-end in the 3 months prior to the start of the levy period (i.e. 31st March each year). Around 1% of accounts had a year-end in the previous quarter (Oct - Dec), 4% in the quarter before that and 7% in the quarter before that. The majority of accounts had a year-end 12-15 months prior to the score date, i.e. the previous Jan - March (18%), or 16-18 months prior, i.e. the previous Oct - Dec (40%). 11% were older than this.

    For businesses that are not registered with Companies House, the financial information may be sourced from elsewhere.

    1. Not-for-Profit entities not filing accounts at Companies House have been searched for on the Charity Commission database; where we are able to match to this dataset, financials are returned from this source.

    2. Non-UK companies will currently have data sourced through Dynamic Business Information Limited (DBI), one of Experian's international partners, which specialises in the provision of offline, freshly-investigated international company reports, including the sourcing of additional document images, registry extracts and financial statements. From early in the 2015 financial year, much of the information on non-UK companies will be available via a new Global Data Network. This gives more immediate online access to reports from a number of countries where Experian has a presence: UK, US, ROI, Denmark, Norway, and others where Experian has BIGNet partners: Germany and Austria (Credit Reform), France (Coface Services), Spain (Axesor), Sweden (UC), Finland (Asiakastieto), Netherlands, Belgium (Graydon). All are included in the first phase of this project, which will roll out in June 2014.

    Below are details of the sub-sections of data that were analysed within the model build. Not all sub-sections of data listed below were used within the final scorecards.

    4.1 Business summary data

    This is data found on the annual return and other data feeds from Companies House. It allows us to derive such variables as time in business and industry classification. This data is present for all public and private limited companies. Where a business is not registered with Companies House this data is collected from alternative sources. In the case of businesses within the Public Sector much of this data is collected by Oscar Research, specialists in Public Sector Data. If a business is a non-registered entity and not covered by Oscar Research or DBI then, where possible, this data is collected from business directory sources, including: Thomson, 118 Information, Intelligent Data Services, B2 Group and Local Data Company, all specialists in collecting business data.

    4.2 Balance sheet and P&L data

    This data comes from the accounts submitted to Companies House. All companies that have filed P&L accounts will have the highest level of these fields populated. Smaller companies are not required to provide all of this data. The scorecard variables derived from the balance sheets are a mixture of absolute values, ratios of balance sheet fields, ratios of balance sheet fields in relation to industry sector and variations in balance sheet items over time.

    Primarily the latest set of accounts is used as the basis for most of the financial information. For the majority of businesses these will be filed once a year, though a small proportion of employers are listed businesses reporting interim accounts, which will be used where available in updating scores (although for the purposes of the model construction, only annual accounts were used). Experian collects financial accounts data from Companies House on a daily basis and is an industry leader in both the accuracy and the speed with which this data becomes available.

    Accounts data for non-UK companies is collected by DBI and is used to score these entities. However, they are not included within the initial scorecard build. DBI will check the list of non-UK businesses monthly for any newly-filed accounts and provide this data to us once a month.

    Similarly, accounts data for charities and Not-for-Profit organisations is collected through the Charity Commission in England and Wales, Scotland and Northern Ireland. This too can be refreshed monthly and possibly daily. We are in discussions around the possibility of daily updates to bring the frequency of these accounts in line with those from Companies House.


    4.3 Mortgages and charges data This data is captured from mortgages and charges filed at Companies House. The data includes charge types (mortgage, debenture, legal charge, etc.), created and satisfied dates, and lender details. Charges identified by the PPF as accepted Type B contingent assets (which specifically relate to the pension scheme) are excluded from the scoring, to ensure that if the only charge registered against an employer is this one, it does not count against them in their score. Investigation is currently under way as to whether any additional charges should justifiably be excluded from the scoring: specifically, if the description of the charge clearly indicates that it is an amendment or addition to an existing charge rather than a genuinely new charge, or if the lender is not a recognised lending institution. In terms of implementation of this data item for a given entity, only charges registered against that entity are considered. This applies to the consolidated accounts scorecard as well, where only charges registered against the parent company being scored are considered. However, in calculating the group or parent strength score, the latest charge registered against any of the businesses within that corporate group is deemed to be the most recent charge, and the age of this charge is what feeds into the group strength score.
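    The entity-level and group-level treatment of charges described above can be sketched as follows. This is only an illustration: the `Charge` record, its field names, the use of days as the unit of age, and the application of the Type B exclusion to the group-level figure are all assumptions made for the example, not details confirmed by the text.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Charge:
    """One charge registered at Companies House (hypothetical record shape)."""
    entity_id: str
    registered_on: date
    is_type_b_contingent_asset: bool = False


def latest_charge_age_days(
    charges: Iterable[Charge], entity_ids: Set[str], score_date: date
) -> Optional[int]:
    """Age in days of the most recent eligible charge against any listed entity.

    Accepted Type B contingent asset charges are ignored. Returns None when
    no eligible charge exists, so absence can be treated as its own category.
    """
    dates = [
        c.registered_on
        for c in charges
        if c.entity_id in entity_ids
        and not c.is_type_b_contingent_asset
        and c.registered_on <= score_date
    ]
    if not dates:
        return None
    return (score_date - max(dates)).days


def entity_mortgage_age_days(charges, entity_id: str, score_date: date) -> Optional[int]:
    """Entity-level figure: only charges registered against the entity itself."""
    return latest_charge_age_days(charges, {entity_id}, score_date)
```

    Passing the full set of group member identifiers to `latest_charge_age_days` gives the group-level figure, in which the latest charge against any group member counts as the most recent charge.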

    4.4 Payment performance data Experian currently captures information from the sales ledgers of around 5,000 B2B (Business to Business) suppliers (both directly and via invoice discounting houses), including all of the big utility firms, water companies and telecoms providers. In total, more than 20m transactions per month are captured in this way. Experian provides the contributors of this information with insight into their customer portfolio, based on combining the sales-ledger data with Experian's data assets. In exchange for this service, the contributors allow Experian to derive aggregate information about their customers, specifically how quickly they are paying their invoices. Thus Experian might find company ABC Ltd on the sales ledgers of 5 of these contributors. We see the date on which invoices are sent and the payment terms, and then we see when the invoices are paid. Some are paid by direct debit, others are paid early or on time, some are paid late or even very late. By aggregating the information from multiple contributors, we are able to establish a monthly picture of how quickly ABC is paying its invoices, expressed as Days Beyond Terms (DBT), based on the sample of its suppliers that contribute data to the Payment Performance programme. Analysis of this data shows it to be predictive of businesses that are struggling. If a business pays slowly or goes from paying on time to paying late it will often be because it is experiencing cash flow problems and is paying its suppliers late as a result. This is particularly powerful amongst smaller businesses where little financial accounts data is available. The processing of the contributor ledgers and subsequent calculation of the monthly DBT values for businesses appearing on the ledgers of these contributors is business-as-usual for Experian, and has not changed for many years, although that is not to say it will not change in future. The contributors, whilst largely stable, will change over time. Experian naturally aims to get as many suppliers as possible contributing to this scheme and this can, of course, impact the scores over time.

    This data is supplied from the trade debtor ledgers of a large number of contributing companies and details how their trade debtors are settling accounts. It includes information on how quickly a company settles trade debts as well as data on accounts that have received no payments, are in collections, or have been accepted on cash-only terms. The data is available and used only as an aggregated view, not at an individual creditor level. The aggregated view of the payment performance data is updated on a monthly basis.
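    To make the idea of a monthly DBT figure concrete, the following is a minimal sketch of how invoices from several contributors might be aggregated for one business. The `Invoice` shape, the flooring of early payments at zero and the unweighted mean are assumptions made purely for illustration; the text does not describe Experian's actual aggregation rules.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class Invoice:
    """One paid invoice seen on a contributor's sales ledger (hypothetical shape)."""
    contributor_id: str
    invoice_date: date
    terms_days: int
    paid_date: date


def days_beyond_terms(invoice: Invoice) -> int:
    """Days paid after the due date; early or on-time payment counts as 0 (assumption)."""
    days_to_pay = (invoice.paid_date - invoice.invoice_date).days
    return max(0, days_to_pay - invoice.terms_days)


def monthly_dbt(invoices: Iterable[Invoice], year: int, month: int) -> Optional[float]:
    """Unweighted mean DBT over invoices paid in the given month, across contributors."""
    paid_in_month = [
        days_beyond_terms(inv)
        for inv in invoices
        if inv.paid_date.year == year and inv.paid_date.month == month
    ]
    if not paid_in_month:
        return None  # the business does not appear on any ledger this month
    return sum(paid_in_month) / len(paid_in_month)
```

    A value-weighted mean (weighting each invoice by its amount) would be an equally plausible choice; the sketch uses the simplest form.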

    4.5 Shareholders database This details the number and type of shareholders, along with each shareholder's level of investment. It also provides details of company ownership, where the shares of a company are owned by members of a corporate group. This allows the shareholders of a corporate group, including parent companies and subsidiaries, to be identified. The source of this data is Companies House.

    4.6 Companies House data-capture governance Current standard quality control checks include:

    prior to loading, checking the latest information against the previously held information;

    automated de-duplication of records;

    automated exception reporting;

    checking with Companies House if there are any inconsistencies, or if there is anything that does not make sense or seems wrong;

    once the information has been loaded on to our database, checking again to make sure it was not corrupted during the load process;

    both before and after the data load, applying automated statistical analysis routines (such as number of records and special flags) to assess accuracy, supported by manual checks where necessary;

    regular database statistic reports, which are used for monitoring ongoing volume and quality;

    automated data validation routines, which are used to verify data quality; any exceptions that fail these routines are investigated within 24 hours.


    5. Data variables Initial univariate analysis was carried out on a number of variables. This analysis was carried out to determine both the data population rate for a particular variable (the proportion of all employers reporting the variable in question) and the Gini coefficient for each single variable. Those that had a high fill rate (above 60%) and showed signs of predictivity (single-variable Gini > 15%) within a particular model were considered for the logistic regression. The following is a list of all variables that were considered for the scorecards. Where values are missing from the data they will be treated differently depending upon the variable and the observed insolvency rates for the unknown population in relation to other bandings of the variable. In some cases these values may be treated as 0 values (in particular for change-over-time variables), whilst in others they will be treated as a separate category, particularly if the absence of a variable could be considered predictive in and of itself (for example, the absence of a mortgage registered at Companies House).
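    The screening step described above (fill rate above 60% and single-variable Gini above 15%) can be illustrated with a short sketch. It assumes the Gini coefficient is the usual 2 x AUC - 1 transformation of the area under the ROC curve, taken in absolute value so that the direction of the relationship does not matter, and that it is computed on populated records only; the function names are hypothetical and this is not presented as Experian's implementation.

```python
from typing import Optional, Sequence


def fill_rate(values: Sequence[Optional[float]]) -> float:
    """Proportion of employers for which the variable is populated."""
    return sum(v is not None for v in values) / len(values)


def gini(values: Sequence[Optional[float]], insolvent: Sequence[bool]) -> float:
    """Single-variable Gini = |2 * AUC - 1| on populated records.

    AUC is the share of (insolvent, solvent) pairs in which the insolvent
    employer has the higher value, with ties counted as one half. The
    pairwise loop is quadratic, which is fine for a sketch.
    """
    bad = [v for v, y in zip(values, insolvent) if v is not None and y]
    good = [v for v, y in zip(values, insolvent) if v is not None and not y]
    if not bad or not good:
        return 0.0
    wins = sum((b > g) + 0.5 * (b == g) for b in bad for g in good)
    auc = wins / (len(bad) * len(good))
    return abs(2.0 * auc - 1.0)


def passes_screen(
    values: Sequence[Optional[float]],
    insolvent: Sequence[bool],
    min_fill: float = 0.60,
    min_gini: float = 0.15,
) -> bool:
    """True if the variable would go forward to the logistic regression."""
    return fill_rate(values) > min_fill and gini(values, insolvent) > min_gini
```

    Missing values are simply excluded from the Gini here; in the approach described above they may instead be mapped to 0 or to a separate band, depending on the variable.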

    5.1 Variables for testing

    Balance Sheet
    Fixed Assets: Land & Buildings; Fixtures & Fittings; Plant & Vehicles; Total Tangible Fixed Assets; Intangible Fixed Assets; Other Fixed Assets; Total Fixed Assets
    Current Assets: Stocks; Work In Progress; Total Stocks / Work In Progress; Trade Debtors; Group Loans; Director Loans; Other Debtors; Total Debtors; Cash At Bank; Other Current Assets; Total Current Assets
    Current Liabilities: Trade Creditors; Bank Overdraft; Group Loans; Director Loans; Hire Purchase; Leasing; Total Hire Purchase / Leasing; Short Term Loans; Accruals / Deferred Income; Social Security / VAT; Corporation Tax; Dividends; Other Current Liabilities; Total Current Liabilities
    Balance Indicator Fields [1]: Working Capital; Total Assets; Capital Employed
    Long Term Liabilities: Group Loans; Director Loans; Hire Purchase; Leasing; Total Hire Purchase / Leasing; Other Long Term Loans; Accruals / Deferred Income; Other Long Term Liabilities; Total Long Term Liabilities
    Provisions: Deferred Taxation; Other Provisions; Total Provisions
    Balance Indicator Fields [2]: Total Net Assets
    Shareholders Funds: Issued Capital; Share Premium Accounts; Revaluation Reserve; Retained Earnings; Other Reserves; Total Shareholders Funds
    Balance Indicator Fields [3]: Net Assets; Minority Interests; Net Worth

    P&L Items
    Gross Profit: Turnover (UK); Turnover (Exports); Total Turnover; Cost of Sales; Total Expenses; Gross Profit
    Pre-Tax Profit / Loss: Depreciation; Other Expenses; Operating Profit; Other Income; Interest Payable; Exceptional Items; Discontinued Operations; Pre-Tax Profit / Loss
    Retained Profit / Loss: Tax Payable / Credit; Extraordinary Items / Debits; Dividends; Retained Profit / Loss
    Balance Indicator Fields [4]: Minority Interests

    Cash flow Items
    Operating Activities; Return on Investments; Taxation; Investing Activities; Capital Expenditure; Acquisitions and Disposals; Equity Dividends Paid; Management of Liquid Resources; Financing; Cash flow Total

    Remuneration Items
    Non-Audit Fee; Amortisation of Intangibles; Number of Employees; Wages; Social Security Costs; Other Pension Costs; Employees Remuneration; Fees / Emoluments; Pensions; Other Costs; Directors Remuneration; Highest Paid Director; Number Of Directors; Account Type

    Change over 4 Years
    Capital Employed; Cash At Bank; Total Current Assets; Total Current Liabilities; Depreciation; Employees Remuneration; Number of Employees; Total Fixed Assets; Issued Capital; Total Long Term Liabilities; Net Worth; Other Current Liabilities; Other Debtors; Other Income; Pre-Tax Profit / Loss; Retained Profit / Loss; Retained Earnings; Total Turnover; Total Shareholders Funds; Social Security / VAT; Total Stocks / Work In Progress; Total Tangible Fixed Assets; Total Net Assets; Total Assets; Trade Creditors; Trade Debtors; Working Capital

    Ratios - Considered in absolute terms and relative to sector
    Current Ratio = Total Current Assets / Total Current Liabilities
    Acid Test = (Total Current Assets - Total Stocks & Work In Progress) / Total Current Liabilities
    Stock Turnover = (52 * Total Turnover / Accounting Period) / Total Stocks & Work In Progress
    Credit Period (Days) = 365 * Total Debtors / (52 * Total Turnover / Accounting Period)
    Working Capital by Sales = 100 * Working Capital / (52 * Total Turnover / Accounting Period)
    Trade Credit by Debtors = Trade Creditors / Total Debtors
    Return on Capital = 100 * (52 * Pre-Tax Profit & Loss / Accounting Period) / Capital Employed
    Return on Assets = 100 * (52 * Pre-Tax Profit & Loss / Accounting Period) / (Capital Employed + Total Current Liabilities)
    Pre-Tax Margin = 100 * Pre-Tax Profit & Loss / Total Turnover
    Return on Shareholders Funds = 100 * (52 * Pre-Tax Profit & Loss / Accounting Period) / Total Shareholders Funds
    Borrowing Ratio = 100 * (Long Term Group Loans + Long Term Director Loans + Long Term Hire Purchase + Long Term Leasing + Other Long Term Loans + Short Term Loans + Current Liab. Group Loans + Current Liab. Director Loans + Current Liab. Hire Purchase + Current Liab. Leasing + Bank Overdraft) / (Total Shareholders Funds - Intangible Fixed Assets)
    Equity Gearing = 100 * Total Shareholders Funds / (Capital Employed + Total Current Liabilities)
    Debt Gearing = 100 * (Long Term Group Loans + Long Term Director Loans + Long Term Hire Purchase + Long Term Leasing + Other Long Term Loans) / (Total Shareholders Funds - Intangible Fixed Assets)
    Interest Cover = Pre-Tax Profit & Loss / Interest Payable
    Sales by Tangible Assets = (52 * Total Turnover / Accounting Period) / Total Tangible Fixed Assets
    Average Remuneration per Employee = (52 * Employees Remuneration / Accounting Period) / Number of Employees
    Profit per Employee = (52 * Pre-Tax Profit & Loss / Accounting Period) / Number of Employees
    Sales per Employee = (52 * Total Turnover / Accounting Period) / Number of Employees
    Capital Employed per Employee = Capital Employed / Number of Employees
    Tangible Assets per Employee = Total Tangible Fixed Assets / Number of Employees
    Total Assets per Employee = (Capital Employed + Total Current Liabilities) / Number of Employees
    Employee Remuneration by Sales = 100 * Employees Remuneration / Total Turnover
    Creditor Days (Cost of Sales Based) = 365 * Trade Creditors / ((52 / Accounting Period) * (Total Turnover - Gross Profit))
    Creditor Days (Sales Based) = 365 * Trade Creditors / (52 / Accounting Period * Total Turnover)

    Mortgages and Charges, CCJs, and Age
    Age of mortgage/charge; Number of mortgages/charges; Number of CCJs; CCJ Value; CCJ Age; Company Age

    Payment Performance
    Days Beyond Terms; Days Beyond Terms 3 Month Change; Days Beyond Terms 6 Month Change; Days Beyond Terms 12 Month Change; Days Beyond Terms vs 12 Month Change; 12 Month Change vs Number of Increases; Days Beyond Terms vs 12 Month Change vs Number of Increases

    Director Details
    Number of directors; Total number of companies associated with directors; Average number of companies associated with directors; Total number of companies associated with directors (grouping no directors and unknowns); Average number of companies associated with directors (grouping no directors and unknowns); Percentage of insolvent director companies; Percentage of failed director companies

    Group Details
    Average Strength of Group; Strength of Parent
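    As a worked illustration of the ratio definitions in this section, the sketch below evaluates a few of them, taking the Accounting Period to be measured in weeks (as implied by the factor of 52). The function and parameter names are hypothetical, and zero denominators are not handled because their treatment is not specified here.

```python
def annualise(value: float, accounting_period_weeks: float) -> float:
    """Scale a flow figure (turnover, profit) to a 52-week year."""
    return 52.0 * value / accounting_period_weeks


def acid_test(
    total_current_assets: float,
    total_stocks_wip: float,
    total_current_liabilities: float,
) -> float:
    """(Total Current Assets - Total Stocks & WIP) / Total Current Liabilities."""
    return (total_current_assets - total_stocks_wip) / total_current_liabilities


def creditor_days_sales_based(
    trade_creditors: float, total_turnover: float, accounting_period_weeks: float
) -> float:
    """365 * Trade Creditors / annualised Total Turnover."""
    return 365.0 * trade_creditors / annualise(total_turnover, accounting_period_weeks)


def return_on_capital(
    pre_tax_profit: float, capital_employed: float, accounting_period_weeks: float
) -> float:
    """100 * annualised Pre-Tax Profit / Capital Employed."""
    return 100.0 * annualise(pre_tax_profit, accounting_period_weeks) / capital_employed
```

    For example, a company reporting turnover of 1,500 over a 78-week period has annualised turnover of 1,000, so trade creditors of 100 correspond to 36.5 creditor days.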

    5.2 Scorecard variables Whilst numerous variables were tested in each model for predictive power and statistical significance, ultimately not all variables are used within the scorecards. Below are details of all of the variables used in one or more of the scorecards. Note that within the Not-for-Profit scorecard, shareholders funds are replaced by Total Net Assets, which is essentially the same calculation (the difference between Total Assets and Total Liabilities), but reflects the nature of NFPs and the lack of shareholders within these types of organisation.

    Variable | Source | Calculation | Notes
    Acid Test | Financial Accounts | [Current Assets - Stock and WIP] divided by Current Liabilities |
    Average Remuneration per Employee | Financial Accounts | Total Employee Remuneration divided by total number of Employees | Employee Remuneration figure annualised if accounting period is not 52 weeks
    Capital Employed | Financial Accounts | Taken directly from filed accounts | If not stated on filed accounts, calculated as Total Assets minus Current Liabilities
    Capital Employed Per Employee | Financial Accounts | Capital Employed divided by total number of Employees | If not stated on filed accounts, Capital Employed calculated as Total Assets minus Current Liabilities
    Cash | Financial Accounts | Taken directly from filed accounts |
    Change in Employee Remuneration | Financial Accounts | [Employee Remuneration in Year N minus Employee Remuneration in Year N-3] divided by Employee Remuneration in Year N-3 | Year N figure taken from latest set of accounts filed prior to the date of score. Year N-3 figure taken from the latest set of accounts filed prior to 3 years before the date of score
    Change in Fixed Assets | Financial Accounts | [Fixed Assets in Year N minus Fixed Assets in Year N-3] divided by Fixed Assets in Year N-3 | Year N figure taken from latest set of accounts filed prior to the date of score. Year N-3 figure taken from the latest set of accounts filed prior to 3 years before the date of score
    Change in Net Worth | Financial Accounts | [Net Worth in Year N minus Net Worth in Year N-3] divided by Net Worth in Year N-3 | Year N figure taken from latest set of accounts filed prior to the date of score. Year N-3 figure taken from the latest set of accounts filed prior to 3 years before the date of score
    Change in Shareholders Funds | Financial Accounts | [Shareholders Funds in Year N minus Shareholders Funds in Year N-3] divided by Shareholders Funds in Year N-3 | Year N figure taken from latest set of accounts filed prior to the date of score. Year N-3 figure taken from the latest set of accounts filed prior to 3 years before the date of score
    Change in Stock and Work in Progress | Financial Accounts | [Stock + Work in Progress in Year N minus Stock + Work in Progress in Year N-3] divided by Stock + Work in Progress in Year N-3 | Year N figure taken from latest set of accounts filed prior to the date of score. Year N-3 figure taken from the latest set of accounts filed prior to 3 years before the date of score
    Change in Total Assets | Financial Accounts | [Total Assets in Year N minus Total Assets in Year N-3] divided by Total Assets in Year N-3 | Year N figure taken from latest set of accounts filed prior to the date of score. Year N-3 figure taken from the latest set of accounts filed prior to 3 years before the date of score
    Change in Turnover | Financial Accounts | [Turnover in Year N minus Turnover in Year N-3] divided by Turnover in Year N-3 | Year N figure taken from latest set of accounts filed prior to the date of score. Year N-3 figure taken from the latest set of accounts filed prior to 3 years before the date of score
    Company Age | Companies House registration date | Time elapsed between date of incorporation of the business and date of score calculation |
    Creditor Days (Sales Based) | Financial Accounts | [Trade Creditors divided by Turnover] x 365 | Turnover figure annualised if accounting period is not 52 weeks. If reported turnover figure is 0 or null, but Other Income figure is positive, the Other Income figure is used in place of Turnover
    Current Ratio | Financial Accounts | Current Assets divided by Current Liabilities |
    Days Beyond Terms: Latest Month | Experian Payment Performance data | See below for details |
    Equity Gearing | Financial Accounts | Shareholders Funds divided by Total Assets |
    Fixed Assets | Financial Accounts | Taken directly from filed accounts |
    Mortgage Age | Companies House Mortgages and charges data | Time elapsed between most recently registered mortgage or charge at Companies House and date of score |
    Parent Strength | Various | See below for details |
    Pre Tax Margin | Financial Accounts | Pre-Tax Profit divided by Turnover | Turnover and Pre-Tax Profit figures annualised if accounting period is not 52 weeks. If reported turnover figure is 0 or null, but Other Income figure is positive, the Other Income figure is used in place of Turnover
    Pre Tax Profit | Financial Accounts | Taken directly from filed accounts | Pre-Tax Profit figure annualised if accounting period is not 52 weeks
    Retained Earnings | Financial Accounts | Taken directly from filed accounts |
    Return on Assets | Financial Accounts | Pre-Tax Profit divided by Total Assets | Pre-Tax Profit figure annualised if accounting period is not 52 weeks
    Return on Capital | Financial Accounts | Pre-Tax Profit divided by Capital Employed | Pre-Tax Profit figure annualised if accounting period is not 52 weeks. If not stated on filed accounts, Capital Employed calculated as Total Assets minus Current Liabilities
    Return on Shareholder Funds | Financial Accounts | Pre-Tax Profit divided by Shareholders Funds | Pre-Tax Profit figure annualised if accounting period is not 52 weeks
    Sales by Employee | Financial Accounts | Turnover divided by number of Employees | Turnover figure annualised if accounting period is not 52 weeks. If reported turnover figure is 0 or null, but Other Income figure is positive, the Other Income figure is used in place of Turnover
    Shareholders Funds | Financial Accounts | Taken directly from filed accounts |
    Total Assets | Financial Accounts | Taken directly from filed accounts |
    Total Stock & Work in Progress | Financial Accounts | Taken directly from filed accounts |
    Turnover | Financial Accounts | Taken directly from filed accounts | Turnover figure annualised if accounting period is not 52 weeks. If reported turnover figure is 0 or null, but Other Income figure is positive, the Other Income figure is used in place of Turnover
    Turnover by Stock | Financial Accounts | Turnover divided by Stock | Turnover figure annualised if accounting period is not 52 weeks. If reported turnover figure is 0 or null, but Other Income figure is positive, the Other Income figure is used in place of Turnover
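    Two of the rules in the table lend themselves to a short illustration: the Year N / Year N-3 selection of accounts for the change variables, and the substitution of Other Income for a zero or null Turnover. The sketch below is one possible reading of those rules. The `Accounts` record and all function names are hypothetical, annualising the substituted Other Income figure is an assumption, and the handling of a missing or zero Year N-3 base (returning `None`) is a placeholder rather than a documented treatment.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence


@dataclass(frozen=True)
class Accounts:
    """A filed set of accounts, reduced to the fields this sketch needs."""
    filed_on: date
    period_weeks: float
    turnover: Optional[float]
    other_income: Optional[float]
    total_assets: Optional[float]


def annualised_turnover(acc: Accounts) -> Optional[float]:
    """Turnover scaled to 52 weeks; Other Income stands in when turnover is 0 or null."""
    base = acc.turnover
    if not base and acc.other_income is not None and acc.other_income > 0:
        base = acc.other_income  # assumption: annualised in the same way as turnover
    if base is None:
        return None
    return 52.0 * base / acc.period_weeks


def years_before(d: date, years: int) -> date:
    """Same calendar day `years` earlier; 29 February maps to 28 February."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def latest_filed_before(accounts: Sequence[Accounts], cutoff: date) -> Optional[Accounts]:
    """Most recently filed accounts with a filing date strictly before the cutoff."""
    eligible = [a for a in accounts if a.filed_on < cutoff]
    return max(eligible, key=lambda a: a.filed_on) if eligible else None


def change_in_total_assets(accounts: Sequence[Accounts], score_date: date) -> Optional[float]:
    """[Year N minus Year N-3] divided by Year N-3, or None if not computable."""
    year_n = latest_filed_before(accounts, score_date)
    year_n3 = latest_filed_before(accounts, years_before(score_date, 3))
    if year_n is None or year_n3 is None:
        return None
    if year_n.total_assets is None or not year_n3.total_assets:
        return None  # missing or zero base: treatment not specified in the text
    return (year_n.total_assets - year_n3.total_assets) / year_n3.total_assets
```

    The same selection logic would apply to the other change variables, with the relevant field substituted for Total Assets.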

    For all variables above from Financial Accounts, the source of these accounts will be one of the following. In the unlikely event that accounts can come from more than one of these sources, the following order of acceptance would apply:

    1) Companies House; 2) Charity Commission; 3) Office of the Scottish Charity Regulator; 4) Northern Ireland Charity Commission; 5) other permitted public source (complete list to be confirmed); 6) other source as instructed by PPF; 7) direct submission.
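    The order of acceptance above amounts to a simple priority lookup, sketched below. The source labels are taken from the list; the function name and the choice to ignore unrecognised labels are assumptions made for the example.

```python
from typing import Iterable, Optional

# Order of acceptance as listed above, highest priority first.
SOURCE_PRIORITY = [
    "Companies House",
    "Charity Commission",
    "Office of the Scottish Charity Regulator",
    "Northern Ireland Charity Commission",
    "other permitted public source",
    "other source as instructed by PPF",
    "direct submission",
]
_RANK = {name: rank for rank, name in enumerate(SOURCE_PRIORITY)}


def preferred_source(available: Iterable[str]) -> Optional[str]:
    """Return the highest-priority source among those holding accounts for an entity."""
    known = [s for s in available if s in _RANK]
    if not known:
        return None
    return min(known, key=lambda s: _RANK[s])
```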

    Raw accounting information is captured as per Experian's usual business rules, and with the quality-control checks set out in section 6.8 applied at the point of data capture. Any manipulation or subsequent calculations using these raw figures are detailed in the Calculation and Notes columns above. All information sourced from Companies House is updated on a daily basis. All other sources are updated at a minimum of once per month. Information on non-UK companies is likely to remain at this frequency, whilst the goal is to be able to update the Charity Commissions' data daily to bring it into line with the Companies House frequency.


    From the point of submission of data to Companies House, the target is to have captured all data within 24 hours and to have run QA and generated exception reports within 48 hours. Any record generating an exception report is withheld for manual checking within the following 24 hours before being entered into the database. Thus the expected elapsed time from submission of any set of accounts to the information being available within the database is no more than 3 days. A further day would elapse before this information would be reflected within the PPF score. However, due to the uneven spread of accounts being filed over the course of a year, at peak filing times it may not be possible to meet these target data-capture timeframes. Consequently, we will guarantee that any accounts submitted to the appropriate place (Companies House, Charity Commission etc.) by the end of a month will be included in the score as of the end of the following month. If this is not possible, the score at the end of the month following submission will be retrospectively updated once the information has been captured and verified. In practice, we will endeavour to process accounts as quickly as possible and would expect that accounts filed in the middle of a month will still be reflected within that same month-end score, although this cannot be guaranteed.
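    The guarantee described above (accounts submitted by the end of a month are reflected in the score as of the end of the following month) can be expressed as a small date calculation. This is a sketch only; the function names are hypothetical and it models the guaranteed date, not the faster turnaround expected in practice.

```python
import calendar
from datetime import date


def month_end(year: int, month: int) -> date:
    """Last calendar day of the given month."""
    return date(year, month, calendar.monthrange(year, month)[1])


def guaranteed_score_date(submitted_on: date) -> date:
    """Month-end score date by which accounts submitted on this date are guaranteed to count."""
    if submitted_on.month == 12:
        return month_end(submitted_on.year + 1, 1)
    return month_end(submitted_on.year, submitted_on.month + 1)
```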

    5.2.1 Parent strength score The purpose of the Parent Strength Score (or Group Strength Score) is to reflect the reality that many PPF employers are not independent entities, but are part of corporate groups, in which their own strength and therefore insolvency probability is linked to that of their parent and sister companies. The primary justification for including a measure of the overall strength of the corporate group to which any given employer belongs i