Calculated Risk: Default Statistics, Or Mortgage Math Is Hard

by Anonymous on 8/16/2008 09:15:00 AM

I am very pleased to offer you this post, which is actually a "Guest Nerd" offering by our regular commenter and expert mort_fin, who works on undisclosed mortgage matters in an undisclosed location and often straightens us out in the comment threads when the conversation gets to technical matters of statistical analysis of mortgages. I helped a little bit with this post (any errors in the tables are mine), but the bulk of this is mort_fin's.

Some important context for the genesis of this: it came about after dear mort_fin, and a number of our other regulars, spent most of a frustrating Saturday afternoon arguing with another commenter about this post on the FHA "DAP" program (the infamous seller-money-laundered-downpayment-assistance-program). We basically came to the conclusion that a lot of people who defend the DAPs are not arguing in good faith about the performance of these loans--they pick out statistics they don't ever define clearly and wield them in misleading ways.

Because things like the FHA DAP are such important public policy questions, it seemed to mort_fin that there was much to be gained by helping non-specialists get a better grip on the various default statistics that are available and what they do (and do not) actually tell you. This is UberNerditude at its finest, and I thank mort_fin for taking the time and effort to help us move the intellectual ball a few yards in the never-ending battle with the DAP shills.

*************

Mort_fin says:

A recent flap in the comments surrounding default rates in FHA, especially with respect to down payment assistance programs, showed how easy it is to misunderstand and abuse mortgage default rates. So I thought I’d take a shot at writing The Fairly Intelligent Person’s Guide to Default Statistics.

The first issue to note is just the words. Default, as Tanta has noted in a previous excellent UberNerd, has a fairly precise legal definition, and a fairly vague usage in the popular and financial press. To a lawyer, if you move and rent out your abode you are probably in default, even if you keep making your mortgage payment every month, since you have violated the clause in the note that says you will occupy the premises. When reading the trade press, delinquency usually means not paying the mortgage, and default might mean that foreclosure proceedings have started, have finished, are being negotiated, etc. You can’t understand the analysis if you don’t read the fine print in the definitions. I’m going to stick with default meaning “foreclosure has happened” for the following examples.

To keep everything clear, and countable on fingers without resorting to toes, let’s say 10 people all get mortgages in a year (a group all getting mortgages at the same time is a “vintage” or a “cohort”). All the initial examples will relate to these 10 people. They are color coded based on their mortgage status. The first thing to notice is that life is complicated, and there are a lot of possible outcomes. Some loans end in foreclosure (default), some are refinanced into another mortgage (which can then end in a variety of ways), some people sell the house and pay off the mortgage, and some people just sit there paying the monthly nut for 10 years or more. And you don’t follow people forever—who knows what happened to any of these folks after 10 years?

In our sample pool of ten loans, each originated in 2000, we have the following outcomes:

• Fred: Purchase mortgage for 1 year, 1 year foreclosure process, foreclosed, becomes renter
• Matilda: Purchase mortgage for 1 year, refinanced mortgage for 2 years, 1 year foreclosure process, foreclosed, becomes renter
• Jose: Purchase mortgage for 3 years, refinanced mortgage for 5 years, sells home, becomes renter
• Rashid: Purchase mortgage for 4 years, foreclosure process for 1 year, foreclosed, becomes renter
• Seamus: Purchase mortgage for 4 years, foreclosure process for 2 years (stayed because of a bankruptcy filing), foreclosed, becomes renter
• LuAnne: Purchase mortgage for 4 years, refinanced mortgage for 1 year, refinanced mortgage again for 1 year, foreclosure process for 1 year, foreclosed, becomes renter
• Saty: Purchase mortgage for 2 years, purchases new home
• Mitko: Purchase mortgage for 5 years, refinanced mortgage for 5 years
• Bob: Purchase mortgage for 10 years

This may or may not be a “typical” set of outcomes for any given pool of ten mortgages. But these are all very possible outcomes, and the point of this little exercise is to see clearly just how these possible outcomes are—or are not—reflected in the default statistics we have come to rely on for measuring mortgage performance.

A vintage of loans something like this would get sliced and diced in various ways by analysts. You might see a “lifetime projected default rate” or a “cumulative to date default rate” or a “conditional default rate” or a “foreclosure initiated rate” (also sometimes called the “foreclosure inventory”). None of them is right or wrong, but any of them can be misunderstood or abused.

Start with the “cumulative to date default rate.” (Remember that this counts foreclosures completed, not started.) If these are loans originated in December of 2000, you might ask in 2000 or 2001 “what percent have gone bad?” and the answer would be zero. For most borrowers, it is rare to miss payments in the first year (the fact that it wasn't rare starting about 2 years ago should have been an enormous alarm bell for people), and it takes very roughly a year between the time people stop making payments and the time they finish foreclosure (the timeline varies widely by state). But at the end of 2002, you have one default, Fred, for a cumulative to date default rate of 10% (1 in 10). At the end of 2003 it’s still 10%, but by the end of 2004 it’s risen to 20% because Matilda has also gone bad. But, wait a minute, Matilda refinanced in 2001, so she never defaulted on a 2000 originated mortgage. I guess it stays at 10%. The “to date” cumulative rate eventually rises to 30% as first Rashid, and then Seamus, go to default.
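The cumulative-to-date arithmetic above can be sketched in a few lines of Python. This is only an illustration: the completion years for Rashid (2005) and Seamus (2006) are inferred from the borrower list assuming December 2000 originations, and the names ORIGINATED, FORECLOSURE_COMPLETED, and cumulative_default_rate are made up for this example.

```python
# Loans originated in the vintage. This denominator never changes.
ORIGINATED = 10

# Year in which foreclosure was completed on an ORIGINAL 2000 loan.
# Matilda and LuAnne are absent on purpose: they refinanced first, so
# their later foreclosures are not charged to the 2000 vintage.
# Rashid's and Seamus's years are inferred from the borrower list.
FORECLOSURE_COMPLETED = {"Fred": 2002, "Rashid": 2005, "Seamus": 2006}


def cumulative_default_rate(as_of_year):
    """Share of originated loans foreclosed by the end of as_of_year."""
    defaults = sum(1 for year in FORECLOSURE_COMPLETED.values() if year <= as_of_year)
    return defaults / ORIGINATED


for year in (2001, 2002, 2004, 2005, 2006, 2009):
    print(year, f"{cumulative_default_rate(year):.0%}")
```

The thing to notice is that the denominator is fixed at originations, so the number can only ratchet upward as foreclosures complete.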

At the end of 2009 the “to date” is 30%, and assuming that these are 30 year mortgages, we still do not know the “lifetime projected default rate” since Bob is still out there with an active loan. You know that the lifetime cumulative rate will be at least 30% since it can’t go down (well, actually in states with rights of redemption, it theoretically could go down, but those are pretty rare events) and it can’t be more than 40%. If Bob stays good it’s 30% and if he goes bad it’s 40%. Since few people go bad after 10 good years you would probably project a 30% lifetime cumulative default rate for these loans. Matilda and LuAnne don’t count, since they refinanced and are a “success” as far as the original lender is concerned (although they might be failures from the perspective of a policy to promote homeownership). And if you’ve taken comfort in the fact that the “to date” cumulative default rate is zero at the end of 2001, you’re in for an uncomfortable surprise next year.
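The floor-and-ceiling reasoning for the lifetime rate can be written down as a tiny helper. This is a rough sketch, not anyone's production model: the function name lifetime_default_bounds is invented here, and it ignores the rights-of-redemption wrinkle mentioned above.

```python
def lifetime_default_bounds(originated, defaults_to_date, still_active):
    """Return (floor, ceiling) for the lifetime cumulative default rate.

    Floor: nobody else ever defaults. Ceiling: every loan that is still
    active eventually defaults. Paid-off loans can no longer default.
    """
    floor = defaults_to_date / originated
    ceiling = (defaults_to_date + still_active) / originated
    return floor, ceiling


# End of 2009: three completed foreclosures, and only Bob's loan is still active.
low, high = lifetime_default_bounds(originated=10, defaults_to_date=3, still_active=1)
print(f"lifetime default rate is between {low:.0%} and {high:.0%}")
```

Where the projection lands inside those bounds is a judgment call about the loans still active, which is exactly why it is a projection.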

Note that if you want to assess what these things will cost you from a credit cost perspective, the relevant figure is lifetime cumulative defaults. When you originate the loans (which is the date that matters, since you can’t retroactively up the interest rate or the insurance premium, the horse is out of the barn at that point) all you have are projections. You don’t know what anyone has done, since they just started. After 5 or 6 years you can make a pretty good projection, but it’s far too late at that point. In this business you have two and only two options: highly uncertain knowledge when it’s useful, or very precise knowledge long after it’s useful. That’s why this business is so much fun. And the projections are sensitive not only to the quality of the underwriting (how did Fred manage to get a loan in the first place???) but also to future house prices and unemployment rates, and refinancing opportunities (a rolling loan gathers no loss).

But cumulative defaults aren’t the only, or even the most commonly presented, statistics. The commenter in the aforementioned Haloscan thread was led astray by the Foreclosures Initiated (or Foreclosure Inventory) statistic. This counts how many foreclosures are “in process.” Foreclosure is a process, not an event. The details vary by state, but a common method of judicial foreclosure is the filing of a “notice of default” in which the servicer tells the seriously delinquent borrowers that they are headed to court. This may start the foreclosure clock. Motions and countermotions are filed, court dates are scheduled and postponed, and a date for the sale of the property is set. This is the process that can take, in very rough terms, a year to play out.

In 2001 the foreclosure initiation rate in our example pool would be zero, but in 2002 it would be 11%. Why 11% you ask? Well, one loan (Fred) is in process, and there are 9 loans still active (Matilda refinanced, remember). So 1 out of 9 is 11% (with a little rounding). In 2005 the foreclosure initiation rate is 40%. Only 5 loans are still active, and two of them are in the process of being foreclosed upon.

Note that the foreclosure initiation rate tells you very little about how much of an insurance premium you needed to charge to cover the credit risk, or even whether you had a bunch of good loans or bad loans. This rate depends on the denominator as well as the numerator, and the denominator can change for all sorts of reasons, like borrowers moving and borrowers refinancing, that don’t have any direct bearing on whether these were good loans or bad loans. The inventory rate has its uses, but summarizing credit quality or expected costs isn’t one of them. It is a pretty sensitive number for summarizing current conditions – it tends to rise rapidly when things get bad, and fall back to earth when things get good.
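To see how much the denominator matters, here is a minimal sketch of the inventory calculation using the 2002 and 2005 figures from the example pool. The function name foreclosure_inventory_rate is invented for illustration, and the last scenario (two fewer prepayments by 2005) is a purely hypothetical what-if, not something from the pool.

```python
def foreclosure_inventory_rate(in_foreclosure, active_loans):
    """Loans currently in the foreclosure process divided by loans still active."""
    if active_loans <= 0:
        raise ValueError("need at least one active loan")
    return in_foreclosure / active_loans


# 2002: Fred is in process, 9 loans are still active (Matilda refinanced).
rate_2002 = foreclosure_inventory_rate(1, 9)

# 2005: two loans in process, only 5 loans still active.
rate_2005 = foreclosure_inventory_rate(2, 5)

# Hypothetical what-if: the same two foreclosures, but two fewer
# borrowers had prepaid, leaving 7 active loans instead of 5.
rate_2005_fewer_prepays = foreclosure_inventory_rate(2, 7)

print(f"2002: {rate_2002:.0%}")
print(f"2005: {rate_2005:.0%}")
print(f"2005, fewer prepayments (hypothetical): {rate_2005_fewer_prepays:.0%}")
```

Same two troubled loans, noticeably different rate, and the only thing that changed was how many good borrowers happened to leave the pool.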

The other commonly cited statistic is the CDR, the Conditional Default Rate. It is “conditional” because it is “conditioned by survival.” The denominator consists of all the loans that have survived until today, neither prepaying in the past nor defaulting in the past. Again, for the first two years, the CDR is zero. In 2002 it is 13% since 8 loans are still alive, and 1 is defaulting. In 2003 and 2004 it is back to zero, and then in 2005 it skyrockets up to 33%, as only 3 loans still survive, and one of them has gone bad. The CDR is useful as an input to complicated cash flow models, but by itself it doesn’t tell you much about credit quality, since, again, it depends on the denominator as much as it does on the numerator, and for older pools of loans the denominator can be pretty small.
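The CDR figures quoted above work out as follows. This is just the arithmetic from the example pool written as Python; the function name conditional_default_rate is made up for this sketch, and real cash flow models usually work with monthly rates rather than the annual counts used here.

```python
def conditional_default_rate(defaults_in_period, survivors):
    """Defaults during the period divided by loans that survived to the period.

    Survivors are loans that have neither prepaid nor defaulted earlier.
    """
    if survivors <= 0:
        raise ValueError("no surviving loans to condition on")
    return defaults_in_period / survivors


cdr_2002 = conditional_default_rate(1, 8)  # 8 loans alive, 1 defaulting
cdr_2005 = conditional_default_rate(1, 3)  # only 3 loans alive, 1 defaulting

print(f"2002 CDR: {cdr_2002:.1%}")
print(f"2005 CDR: {cdr_2005:.1%}")
```

One default in each year, but the later one looks nearly three times as bad simply because fewer loans are left to divide by.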

Returning to our little pool of ten loans, these are the values we get for these three measures over ten years. You can see how confused a conversation at any given point in time would be that tossed these numbers around without context:

The big analytical mistake you do not want to make here, of course, is the one Tanta likes to complain about in the work of various apologists for high-risk lending: assuming that if 30% of the loans in a given vintage fail, then 70% of the borrowers were “successful.” Here’s another way of looking at our example pool that contrasts the results of a standard vintage analysis (what happened to the loans that were originated in 2000?) with an actual borrower analysis (what was the mortgage performance of these ten borrowers over ten years?). You get very different numbers:
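The gap between the loan view and the borrower view can be tallied directly. The sketch below is an illustration under stated assumptions: the OUTCOMES table is my own encoding of the ten-year stories in this post, the variable names are invented, and the only thing assumed about Tania is what the post says about her further down, namely that she sold without being foreclosed.

```python
# name: (original 2000 loan foreclosed?, borrower foreclosed on ANY loan?)
# Both flags cover the ten-year window only.
OUTCOMES = {
    "Fred":    (True,  True),
    "Matilda": (False, True),   # foreclosed on a refinance
    "Jose":    (False, False),
    "Rashid":  (True,  True),
    "Seamus":  (True,  True),
    "LuAnne":  (False, True),   # foreclosed on a second refinance
    "Saty":    (False, False),
    "Tania":   (False, False),  # sold before foreclosure, per the post
    "Mitko":   (False, False),
    "Bob":     (False, False),
}

# Vintage analysis: what happened to the loans originated in 2000?
loan_failure_rate = sum(loan for loan, _ in OUTCOMES.values()) / len(OUTCOMES)

# Borrower analysis: what happened to the people?
borrower_failure_rate = sum(ever for _, ever in OUTCOMES.values()) / len(OUTCOMES)

print(f"loans foreclosed:     {loan_failure_rate:.0%}")
print(f"borrowers foreclosed: {borrower_failure_rate:.0%}")
```

Note that even the borrower count is generous, since it treats a forced sale just ahead of foreclosure as a non-failure.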

Now try a more complicated graphic, which looks a lot more like an active portfolio of loans than a static vintage. Imagine that 10 people a year are flowing into your sights as an analyst, and the world looks boringly the same from year to year—each new vintage performs just like the previous one.

Here's the tabular result:

In the first year, foreclosures (cumulative to date, inventory, and CDR) are all zero again. In year 2 the cumulative default rate and the CDR are still zero, but the inventory is now 1/19. There are 9 loans still active from the first cohort, and 10 new loans have come into the picture. Of course, new loans are almost never in foreclosure, so letting new loans flow into the picture lowers the foreclosure inventory rate substantially. It is still the case that 30% of loans in each vintage and 50% of borrowers will ultimately go bad, but now the foreclosure inventory is a little over 5%, and the cumulative default rate and CDR are still zero. Go out one more year, and 10 new loans have flowed into the picture. 30 loans have come in, 3 have left via prepayment, and 1 of the remaining 27 is a foreclosure that has happened in that year. So the CDR is now a little under 4% (1 out of 27), and the cumulative claim rate is a little over 3% (1 out of 30). Interpreting the numbers from a dynamic pool of mortgages (loans constantly flowing into the system) is harder than interpreting a static pool (always looking at the same set of loans).
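The dynamic-pool arithmetic is easy to lose track of, so here it is spelled out with exact fractions. This is a sketch of the counts quoted in the paragraph above, not a simulation of the portfolio; the variable names are invented, and it takes the 27 loans remaining after prepayments as the CDR denominator.

```python
from fractions import Fraction

# Year 2: nine first-cohort loans still active (one of them in the
# foreclosure process) plus ten brand-new loans.
inventory_year2 = Fraction(1, 9 + 10)

# Year 3: 30 loans originated so far, 3 gone via prepayment,
# 1 foreclosure completed during the year.
originated, prepaid, defaults = 30, 3, 1
remaining = originated - prepaid

cdr_year3 = Fraction(defaults, remaining)          # conditioned on survivors
cumulative_year3 = Fraction(defaults, originated)  # conditioned on originations

for label, value in [
    ("year 2 inventory", inventory_year2),
    ("year 3 CDR", cdr_year3),
    ("year 3 cumulative", cumulative_year3),
]:
    print(f"{label}: {value} = {float(value):.1%}")
```

Every one of these is far below the 30% of loans that each vintage will eventually lose, because fresh loans keep padding the denominators.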

A great real-life example of cumulative (to-date and projected) default rates and CDRs can be found in FHA’s Actuarial studies. Go down to Appendix, and click on Econometric Results in Excel. There are tabs for All_Orig_CumC (All Originations Cumulative Claims – to FHA, a foreclosure is a claim, since they are an insurer, not a mortgage investor) and tabs for All_Orig_ConC for the Conditional Claim Rates. In the cumulative tab, note that FHA projects 15.89% of 2007 originations will ultimately go bad, and this is entirely a projection. For 2000, they project 7.61% will go bad – as these loans are now 7 years old, this is based on actuals of 6.73% having gone bad, and a projection that there will only be a few more foreclosures left to go. On the Conditional Claims rate tab, note that anywhere from 0.5% to 3% of loans are expected to go bad in the next year, depending on how old the loans are (which cohort they are in). It is these annual rates, properly accumulated (you have to adjust the figures so your denominator is always originated loans, not surviving loans), that get you anywhere from 6% to 22% lifetime foreclosure rates, for the good years vs. the bad years. You may want to revisit the HUD site next year to see how projections get revised. These are based on an August 2007 house price forecast. I suspect that has now been rendered inoperative.
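For anyone wondering what "properly accumulated" means in practice, here is one minimal way to do it. Everything in this sketch is an assumption for illustration: the function name, the simplification that claims and prepayments are both annual rates applied to the loans surviving at the start of each year, and above all the input numbers, which are made up and are NOT taken from the FHA spreadsheet.

```python
def cumulative_claim_rate(conditional_claim_rates, conditional_prepay_rates):
    """Accumulate annual conditional rates into a share of ORIGINATED loans."""
    surviving = 1.0   # fraction of originated loans still active
    cumulative = 0.0  # claims to date as a fraction of originated loans
    for claim, prepay in zip(conditional_claim_rates, conditional_prepay_rates):
        cumulative += surviving * claim    # rescale to originations
        surviving *= 1.0 - claim - prepay  # claims and prepayments both leave
    return cumulative


# Hypothetical five-year rate paths, for illustration only.
claims = [0.005, 0.02, 0.03, 0.02, 0.01]
prepays = [0.05, 0.10, 0.15, 0.15, 0.10]

total = cumulative_claim_rate(claims, prepays)
naive_sum = sum(claims)  # wrong: treats every year's denominator as originations

print(f"properly accumulated: {total:.2%}")
print(f"naive sum of CDRs:    {naive_sum:.2%}")
```

Simply adding up the conditional rates overstates the cumulative rate, because each year's rate applies to a shrinking pool of survivors; faster prepayments shrink that pool sooner and pull the cumulative number down further.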

The lesson to learn here is to ask questions. 1) What are the definitions in use? 2) Are you looking at a static or a dynamic population? 3) What are you trying to measure? 4) What is happening to the denominator in your ratio?

If you’re trying to ascertain credit costs, failures to date or failures over the past year won’t get you where you want to go. And if you’re trying to ascertain homeownership success, mortgage failures alone won’t give you an answer. Matilda and LuAnne “succeeded” on their first mortgages, but still had a sheriff evict them eventually. Jose and Tania didn’t get foreclosed—the statistics would simply count them as a “voluntary prepayment”—but it’s hard to say that homeownership was a success for them since they ultimately found the cost of owning impossible to maintain and were simply “lucky” enough to sell before they were foreclosed. And even if you’re trying to do an apples to apples comparison—like “Did the CDR for this pool exceed the CDR for that pool?”—it’s important to keep in mind that numerators and denominators can shift because of prepayments, not just defaults. We really don’t know what was motivating the refinances in our pool—lowering interest rates? Taking cash out? We don’t really know whether the refinances improved or worsened the borrower’s actual financial position, but we do know that higher or lower prepayments in a pool can certainly make statistics like CDR “look” more or less frightening.

The unfortunate truth is that mortgage analysts simply do not in any normal circumstances have access to a dataset like the one we have made up for this post. Whether you are looking at a static pool with a single vintage or a dynamic portfolio with multiple vintages, you are tracking loans, not borrowers or properties, and you are tracking “prepayments” of those loans. You simply do not know whether that prepayment was a refinance or a sale of the home; you don’t know what that borrower did after the prepayment. This information simply isn’t in the standard databases. The bottom line about making claims regarding borrower “success” by reference to mortgage default statistics is “you can’t necessarily get there from here.”