Fungal Houses

Ever wondered why your flat’s Zestimate bounces around so much?

In high school economics class you might have learned about fungible goods. This strange word refers to things that can be swapped without the owners especially caring. A dollar is almost perfectly fungible, and so is an ounce of pure silver. Paintings and sentimental knick-knacks are not at all fungible. Fungible stuff is easy to trade on a centralized market, since a buyer should be happy to deal with any seller. This network effect is so important that markets “push back” and invent protocols to force fungibility. Two arbitrary flatbeds of lumber at Home Depot are probably not worth the same amount of cash. However, the CME’s random-length lumber contract puts strict guidelines on how that lumber must be delivered to satisfy the obligation of the futures contract’s short trader.

Real estate is seriously non-fungible. Even a sterile McMansion in the suburbs can have a leaky roof, quirky kitchen improvements, or emotional value for house-hunting recent college grads. If we consider many similar homes as a basket, or a portfolio of the loans secured by those homes, then the idiosyncrasies of each home should roughly net out to zero overall. Across ten thousand McMansions, there should be a few people willing to pay extra for a man cave, but also a few people who would dock the price. This is the foundation of real estate “structured products,” such as the residential mortgage-backed securities (RMBS) of recent infamy. Like the flatbed trucks delivering a certain sort of wood for a lumber futures contract, an RMBS makes a non-fungible good more fungible.
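A toy simulation makes the netting-out argument concrete (all numbers here are invented for illustration, not Altos methodology): give every home an idiosyncratic premium or discount symmetric around zero, and watch the basket average converge to the underlying value as the basket grows.

```python
import random

def basket_average(n_homes, base_value=300_000.0, quirk_scale=30_000.0, seed=7):
    """Average price of a basket of homes, each carrying an idiosyncratic
    premium or discount (man cave, leaky roof) symmetric around zero."""
    rng = random.Random(seed)
    prices = [base_value + rng.uniform(-quirk_scale, quirk_scale)
              for _ in range(n_homes)]
    return sum(prices) / len(prices)

# With a handful of homes the quirks dominate the average; across a big
# basket they nearly cancel and the average hugs the underlying value.
for n in (5, 100, 10_000):
    print(n, round(basket_average(n), 2))
```

That averaging logic is exactly what lets an RMBS treat thousands of quirky loans as one tradable block.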

The Usual Place
The combined idiosyncrasies of non-fungible things rarely net out to exactly zero, especially during a financial crisis. Nonetheless traders and real estate professionals want to think about a hypothetical, “typical” property. We define a local real estate market by city, neighborhood or even zipcode. How do we decide the value of a typical property? There is an entire industry built around answering this question. One simple, clean approach is to sample a bunch of real estate prices in a local market at a certain point in time, and then average the prices. Or maybe use a more robust descriptive statistic like the median price.
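A quick illustration of why the median is the more robust statistic, with made-up prices: one mansion in the sample is enough to wreck the mean.

```python
# Hypothetical closing prices sampled from one zipcode in one week.
prices = [410_000, 425_000, 430_000, 445_000, 2_900_000]  # one mansion in the mix

mean_price = sum(prices) / len(prices)

def median(xs):
    """Middle value of a sorted list (average of the two middle values
    when the list has even length)."""
    s = sorted(xs)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2

# The single outlier drags the mean far above what a typical home costs,
# while the median barely notices it.
print(mean_price)      # 922000.0
print(median(prices))  # 430000
```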

The most readily available residential home prices in the U.S. market are “closed” transactions, the price a home buyer actually paid for their new place. Using a closed transaction price is tricky, because it is published several months after a property is sold. Can a typical home price really be representative if it is so stale?

Even if we ignore the time lag problem, there is another serious challenge in using transactions to calculate a typical home price. Within any local real estate market worth thinking about, there are very few actual transactions compared with overall listing activity and buzz. Your town may have had a hundred single-family homes listed for sale last week, but only four or five closed purchases. A surprise during a buyer’s final walkthrough could wildly swing the average, “typical” home price. For the statistically inclined, this is a classic sample size problem.
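A small simulation shows the sample size problem directly. Assume, purely for illustration, that observed prices in a market are noise around $400k with a $60k spread; the weekly average computed from five closings swings several times more than one computed from a hundred listings.

```python
import random
import statistics

def sample_mean_spread(sample_size, n_trials=2_000, seed=1):
    """How much the weekly 'typical price' estimate jumps around when
    each week we only observe `sample_size` transactions."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.gauss(400_000, 60_000)
                              for _ in range(sample_size))
             for _ in range(n_trials)]
    return statistics.stdev(means)

# The spread shrinks with the square root of the sample size:
print(round(sample_mean_spread(5)))    # roughly 60_000 / sqrt(5), about 27k
print(round(sample_mean_spread(100)))  # roughly 60_000 / sqrt(100), about 6k
```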

There are plenty of ways to address the sample size problem, such as rolling averages and dropping outliers. Or you could just include transactions from a wider area, like the county or state. However, the wider the net you cast, the less “typical” the price!

Another approach is to sample from the active real estate market, those properties currently listed for sale. You get an order of magnitude more data, and the sample size problem goes away. However, everyone knows that listing prices do not have a clear-cut relationship with closing prices. Some sellers are unrealistic and ask too much, and some ask too little to start a bidding war. What is the premium or discount between listing price and actual value? We spend a lot of time thinking about this question. Even closed transaction prices are not necessarily the perfect measure of typical “value,” since taxes and mortgage specifics can distort the final price. Our solution is to assume that proportional changes in listing prices over time will roughly match proportional changes in the value of a typical house, especially given a larger sample from the active market.

A Picture
Below is a chart of Altos Research’s real estate prices back through 2009, across about 730 zipcodes. For each week on the horizontal axis, and for each zipcode, I calculate the proportional change in listing price (blue) and in sold price (red) since the previous week. Then I average the absolute values of these proportional changes, for a rough estimate of volatility. The volatility of sold prices is extreme.

Price Volatility
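The calculation behind the chart can be sketched in a few lines (toy data below; the real series covers about 730 zipcodes and years of weeks):

```python
import statistics

def weekly_volatility(series_by_zip):
    """Average |proportional week-over-week price change| across zipcodes,
    one rough volatility number per week.
    Input: {zipcode: [price_week0, price_week1, ...]}."""
    n_weeks = min(len(s) for s in series_by_zip.values())
    vols = []
    for t in range(1, n_weeks):
        changes = [abs(s[t] / s[t - 1] - 1.0) for s in series_by_zip.values()]
        vols.append(statistics.fmean(changes))
    return vols

# Tiny made-up example: two zipcodes, three weeks of prices.
series = {"94041": [500_000, 505_000, 495_000],
          "94043": [400_000, 398_000, 404_000]}
print(weekly_volatility(series))  # about [0.0075, 0.0174]
```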

Sarah Palin Email Word Cloud

After three years of legal wrangling, the diligent folks at Mother Jones released another set of Sarah Palin’s emails on Friday. There are plenty of subtleties to the story. Should a personal Yahoo! email account be used for government work? And why the frustrating digital / analog loop of printing emails to be scanned at the other end, like a fax machine?

For my own snickering, I spent a couple of hours over the weekend downloading the email PDFs, converting them to text, and then parsing out the choice “holy moly’s” and tender bits about Track in the Army. Here is a word cloud of the former governor’s emails, via the amazing Wordle project.

Sarah Palin's Email Word Cloud
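Once the PDFs are reduced to plain text (a tool like pdftotext handles that part), the parsing step is just word counting. A minimal sketch, run here on an invented stand-in sentence:

```python
import re
from collections import Counter

def word_frequencies(text, stopwords=frozenset({"the", "and", "to", "of", "a", "in"})):
    """Word counts of the kind that feed a word cloud: Wordle scales
    each word roughly by its frequency."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stopwords)

# Stand-in text; the real input would be the text extracted from the email PDFs.
sample = "Todd and the kids are in Juneau. Holy moly the legislature again!"
print(word_frequencies(sample).most_common(3))
```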

Case-Shiller April Forecasts

Another finger in the air, in the beginning of the month lull.

My forecasts for the March, 2011 Case-Shiller index levels were quite rushed. They were released quickly so I could publicly compare the forecasts with the CFE futures contracts about to expire. However, since the statistical models use active market data, there is no mathematical reason to wait on our forecasts until the end of the month. The April, 2011 index levels will be released on June 28th, but here are my forecasts given what the real estate markets were doing a few months ago:

City | Confidence | Forecast | Predicted HPI
Minneapolis, MN | +1 | -10.52% | 94.46
Phoenix, AZ | +1 | -2.85% | 97.42
Las Vegas, NV | +3 | -1.56% | 95.67
Atlanta, GA | +2 | -1.45% | 96.93
Boston, MA | 0 | -1.32% | 145.42
Los Angeles, CA | -2 | -1.22% | 165.73
Seattle, WA | +3 | -0.46% | 132.35
New York, NY | -1 | -0.21% | 163.15
San Francisco, CA | -3 | -0.20% | 129.56
Chicago, IL | +2 | -0.06% | 110.50
San Diego, CA | -3 | +0.18% | 154.16
Detroit, MI | 0 | +0.41% | 67.34
Charlotte, NC | 0 | +0.50% | 107.50
Miami, FL | 0 | +1.01% | 138.66
Dallas, TX | +1 | +1.62% | 114.72
Cleveland, OH | +1 | +2.12% | 98.85
Denver, CO | 0 | +2.27% | 123.29
Tampa, FL | +1 | +2.28% | 129.98
Portland, OR | +1 | +4.71% | 138.92
(The confidence score ranges from negative three for our weakest signals, up to positive three for strength. Unfortunately I am still sorting out a bug in our Washington, DC model.)

Dan Rice on How the Experts May Not Always Be Right: A Story About the Discovery of Preclinical Alzheimer’s Disease in 1991

Machine learning can be a check on conventional thinking, if we let it.

On the new analytics LinkedIn group started by Vincent Granville, Dan Rice wrote a personal account of his frustrations with the Alzheimer’s research of 20 years ago, before we understood more about the preclinical period of the disease:

The problem that I have with domain expert knowledge selecting the final variables that determine the model is that it no longer is data mining and it often is no longer even good science. From the time of Galileo, the most exciting and important findings in what we call science are those data-driven findings that prove the experts wrong. The problem is that the prior domain knowledge is usually incomplete or even wrong, which is the reason for research and analytics in the first place. I understand that the experts are helpful to generate a large list of candidate variables, but the experts will often be wrong when it comes to determining how, why and which of these variable combinations is causing the outcome.

I had an experience early in my research career that has made me forever distrustful of the expert. I was doing brain imaging research on the origins of Alzheimer’s disease in the early 1990’s and all the experts at that time said that the cause of Alzheimer’s disease must be happening right when the dementia and serious memory problems are observed which may be at most a year before the ultimate clinical diagnosis of dementia consistent with Alzheimer’s. We took a completely data-driven approach and measured every variable imaginable in both our brain imaging measure and in cognitive/memory testing. From all of these variables, we found one very interesting result. What the experts had referred to as a “silent brain abnormality” that is seen in about 25% of “normal elderly” at age 70 was associated with minor memory loss problems that were similar to but much less severe than in the early dementia in Alzheimer’s disease. We knew that the prevalence of clinically diagnosed dementia consistent with Alzheimer’s disease was 25% in community elderly at age 80. Thus, we had a very simple explanatory model that put the causal disease process of Alzheimer’s disease back 9-10 years earlier than anyone had imagined.

The problem was that all the experts who gave out research funding disagreed and would not even give me another grant from the National Institute on Aging to continue this research. For years, nobody did any of this preclinical Alzheimer’s research until about 10 years ago when people started replicating our very same pattern of results with extensions to other brain imaging measures. What is still controversial is whether you can accurately PET image the beta-amyloid putative causal protein in living patients, but it is no longer controversial that Alzheimer’s has an average preclinical period of at least 10 years. Ironically, one of the experts who sat on the very committee that rejected my grant applications suddenly became an expert in preclinical Alzheimer’s disease over the past 5 years. The experts are very often dead wrong. We allow experts to select variables in the RELR algorithm, but our users tell us that they seldom use this feature because they want the data to tell the story. The data are much more accurate than the experts if you have an accurate modeling algorithm.

(Quoted with permission of the author.)

Housing Finger in the Air

The March, 2011 Case-Shiller numbers will be released this Tuesday, but the CME’s May futures contracts expire tomorrow. Some of the real estate transactions that will be summarized in Tuesday’s numbers are up to five months old, whereas our data is at most one week old. This is why Altos Research calls its statistics “real time”: they are an order of magnitude more current than the benchmark in real estate data.

Below is a table of our forecasts for six of the Case-Shiller futures contracts. Check back in a few days, when I will compare with the actual March, 2011 numbers.

Metro Area | Feb-2011 CS HPI | Forecast | Signal
Boston, MA | 149.86 | -2.33% | 111bps below the future’s spot bid price
Chicago, IL | 113.26 | -1.28% | in the spread
Denver, CO | 121.26 | -3.31% | 64bps below the future’s spot bid price
Las Vegas, NV | 98.28 | -3.26% | 96bps below the future’s spot bid price
Los Angeles, CA | 168.25 | -8.64% | 763bps below the future’s spot bid price
San Diego, CA | 155.05 | +1.66% | 209bps above the future’s spot ask price
(all spot prices as of 10:30am PST on 26-May-2011)

Fighting the Last War: Shiller Paper

A new type of mortgage gets a price that means you never have to walk away.

Last month Robert J. Shiller, Rafal M. Wojakowski, Muhammed Shahid Ebrahim and Mark B. Shackleton published a paper with the financial engineering to price “continuous workout mortgages.” This is the Shiller of Irrational Exuberance and housing index fame.

A continuous workout mortgage leaves some of the risk of house price depreciation with the mortgage lender, since the mortgage balance automatically adjusts if the market tanks. The authors model an interest-only continuous workout mortgage as a loan bundled with a put option on the value of the home and a floor on interest rates. By design, the option to abandon the mortgage is always out of the money, so the borrower has little incentive to strategically default or walk away.

Pricing a continuous workout mortgage uses a standardized housing index rather than the individual property’s value. Helpfully, this prevents borrowers from trashing their own homes in order to reduce payments. So the bundled put option is on a housing index and not on the exact home. Others have written about the political and class bias encouraged when your savings are connected so directly to the neighborhood. Standard & Poor’s conveniently sells metropolitan housing indices. These S&P Case-Shiller housing indices have serious problems, including methodology transparency and data lag: no one can replicate and therefore validate the Case-Shiller numbers, the indices are published several months late, and they ignore the prices of homes pulled off the market without a sale.

Like proper quants, Shiller and colleagues push hard for a closed-form pricing formula. The party line is that clean formulas make for better markets, but computer simulation is easy enough nowadays and far more accurate. Ahh, job security! To get a formula for the interest rate a lender should charge for a continuous workout mortgage, they make the heroic Black-Scholes universe assumptions, including:

  • The housing index can be traded, and traded without any brokerage fees. Also, the index can be bought or sold at the same price (no bid/ask spread).
  • Cash can be borrowed or lent at the exact same interest rate.
  • No one pays taxes.
  • The variance (jitter) in the housing index is independent of how much a trader expects to earn from investing in the housing index. This one is rarely mentioned, but not so obscure once you drop the “risk neutral” jargon.

And so also like proper quants, Shiller and his colleagues assume the frictionless, massless pulley from a high school physics class.
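Under those assumptions, the put bundled into the mortgage prices with the textbook Black-Scholes formula. A sketch with purely illustrative numbers (this is the standard European put formula, not the paper’s full continuous workout pricing):

```python
from math import erf, exp, log, sqrt

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_put(index_level, strike, r, sigma, t):
    """Black-Scholes European put on a housing index: the piece of a
    continuous workout mortgage that absorbs downside index moves."""
    d1 = (log(index_level / strike) + (r + 0.5 * sigma ** 2) * t) / (sigma * sqrt(t))
    d2 = d1 - sigma * sqrt(t)
    return strike * exp(-r * t) * norm_cdf(-d2) - index_level * norm_cdf(-d1)

# Illustrative inputs only: index at 150, protection struck at-the-money,
# 3% rates, 10% index volatility, five-year horizon.
print(round(bs_put(150.0, 150.0, 0.03, 0.10, 5.0), 2))
```

Every input except the index level is exactly the sort of quantity the frictionless assumptions above pretend is knowable and constant.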

Dreaming of the Cloud

So far cloud 2011 is just client-server 1997 with new jargon.

As a modeler who manages a serious EC2 cluster, someone who has handed thousands of dollars to Amazon over the last few years, I remain frustrated at what the industry has settled on as the main unit of value. Root access on a Linux virtual machine does an admirable job of isolating my applications from other users, but it is a poor way to economically prioritize. We need a smarter metaphor to distribute a long-running job across a bunch of machines and to make sure we pay for what we use. I don’t so much care about having a fleet of machines ready to handle a spike in web traffic. Instead I want to be able to swipe my credit card to ramp up what would usually take a week, so it will finish in a couple hours.

(If you are a Moore’s Law optimist who thinks glacial, CPU-bound code is a thing of the past, you might be surprised to hear that one of my models has been training on an EC2 m1.large instance for the last 14 hours, and is just over halfway finished… Think render farms and statistical NLP, not Photoshop filters.)

My dream cloud interface is not about booting virtual machines and monitoring jobs, but about spending money so my job finishes quicker. The cloud should let me launch some code, and get it chugging along in the background. Then later, I would like to spend a certain amount of money, and let reverse auction magic decide how much more CPU & RAM that money buys. This should feel like bidding for AdWords on Google. So where I might use the Unix command “nice” to prioritize a job, I could call “expensiveNice” on a PID to get that job more CPU or RAM. Virtual machines are hip this week, but applications & jobs are still the more natural way to think about computing tasks.
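No provider offers anything like this today, so here is a purely hypothetical sketch of the arithmetic behind “expensiveNice”: turning a dollar bid into extra cores at a quoted spot price. Every name and number below is invented.

```python
def expensive_nice(pid, dollars, spot_price_per_core_hour, hours_remaining):
    """Hypothetical: convert a dollar budget into extra cores for the job
    with process id `pid`. A real implementation would feed a reverse
    auction, AdWords-style, instead of taking the spot price as given."""
    if hours_remaining <= 0 or spot_price_per_core_hour <= 0:
        raise ValueError("need a positive runtime and spot price")
    extra_cores = int(dollars / (spot_price_per_core_hour * hours_remaining))
    return {"pid": pid, "extra_cores": extra_cores}

# $40 against a $0.25 per core-hour spot price, for a job with ~20 hours left:
print(expensive_nice(4242, 40.0, 0.25, 20.0))  # {'pid': 4242, 'extra_cores': 8}
```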

This sort of flexibility might require cloud applications to distribute themselves across one or more CPUs. So perhaps the cloud provider insists that applications be multi-threaded. Or Amazon could offer “expensiveNice” for applications written in a side-effect free language like Haskell, so GHC can take care of the CPU distribution.

Banks from the Outside

How do you identify the big cheese at a bank, the decision maker you should sell to? It’s not as easy as it sounds.

Investment banks are notoriously opaque businesses with a characteristic personnel and power structure. Still, there is plenty in common across them, and a few generalizations an outsider can make when trying to deal with one.

The “bulge bracket” are the large investment banks. Bank pecking order and prestige are roughly based on a bank’s size and volume of transactions. Banks that do the most deals generate the highest bonus pools for their employees. The pecking order since the credit crisis is probably:

This list is obviously contentious — though Goldman Sachs and JPMorgan are the undisputed masters, and Citibank and BofA are both train wrecks. BofA is also known as Bank of Amerillwide, given its acquisitions. Bear Stearns opted out of the 1998 LTCM bailout, which is probably why it was allowed to fail during the credit crisis. Lehman Brothers had a reputation for being very aggressive but not too bright, while Merrill Lynch was always playing catchup. NYC is the capital of investment banking, but London and Hong Kong trump it in certain areas. I’ve indicated where each of the bulge brackets is culturally headquartered. Each bank has offices everywhere, but the big decision-makers migrate to the cultural headquarters.

Investment Bank Axes

There are two broad axes within each bank. One axis is “front office -ness” and the other is “title,” or rank. The front office directly makes the serious money. At the extreme are those doing traditional investment banking services like IPOs, M&A, and private equity. And of course, traders and (trading) sales are also in the front office. Next down that axis are quants and the research(ers) who recommend trades. Then the middle office is risk management, legal, and compliance. These are still important functions, but they have way less pull than the front office. The back office is operations like trade processing & accounting, as well as technology.

This first front office -ness axis is confusing because people doing every type of work turn up in all groups. JPMorgan employs 240,000 people, so there are bound to be gray areas. An M&A analyst might report into risk management, which is less prestigious than if the same person with the same title reported into a front office group.

The other axis is title or rank. This is simpler, but something that tends to trip up outsiders. Here is the pecking order:

  • C-level (CEO, CFO, CTO, General Counsel. Some banks confusingly have a number of CTOs, which makes that title more like the next rank down)
  • Managing Director (“MD”, partner level at Goldman Sachs, huge budgetary power, the highest rank we mere mortals ever dealt with)
  • Executive Director or (just) Director (confusingly lower in rank than an MD, still lots of budgetary power)
  • Senior Vice-President (typical boss level, mid-level management, usually budgetary power, confusingly lower in rank than a Director)
  • Vice-President (high non-manager level, rarely has budget)
  • Assistant Vice-President or Junior Vice-President (“AVP”, rookie with perks, no budget)
  • Associate or Junior Associate (rookie, no budget)
  • Analyst (right out of school, no budget, a “spreadsheet monkey”)
  • Non-officers (bank tellers, some system administration, building maintenance)

Almost everyone at an investment bank has a title. Reporting directly to someone several steps up in title is more prestigious. Contractors and consultants are not titled, but you should assume they are one step below their boss. If someone emphasizes their job function instead of title (“I’m a software developer at Goldman Sachs”), you should assume they are VP or lower. Large hedge funds and asset managers mimic this structure. So to review, who is probably a more powerful decision maker?

  • A. an MD in IT at BofA, based out of Los Angeles -or- B. an ED in Trading also at BofA, but based in Charlotte (highlight for the answer: B because front office wins)
  • A. an MD in Risk Management at Morgan Stanley in NYC -or- B. an SVP in M&A also at Morgan Stanley in NYC (A because title wins)
  • A. a Research Analyst at JPMorgan in NYC -or- B. a Junior Vice-President in Research at Citibank in London (A because NYC and front office wins)
  • A. a VP Trader at Morgan Stanley in Chicago -or- B. an SVP in Risk Management at UBS in London (toss up, probably A since traders win)
  • A. an Analyst IPO book runner at Goldman Sachs in NYC -or- B. an Analyst on the trading desk at JPMorgan in NYC (toss up, probably A because Goldman Sachs wins)

Sour Grapes: Seven Reasons Why “That” Twitter Prediction Model is Cooked

The financial press has been buzzing about the results of an academic paper published by researchers from Indiana University-Bloomington and Derwent Capital, a hedge fund in the United Kingdom.

The model described in the paper is seriously faulted for a number of reasons:

1. Picking the Right Data
They chose a very short bear-trending period, from February to the end of 2008. This results in a very small data set, “a time series of 64 days” as described in a buried footnote. You could have made almost a 20% return over the same period just by shorting the “DIA” Dow Jones ETF, without any interesting prediction model!

There is also ambiguity about the holding period of trades. Does their model predict the Dow Jones on the subsequent trading day? In that case, 64 points seems too small a sample for almost a year of training data. Or do they hold for a “random period of 20 days,” in which case their training data windows overlap and may mean double-counting? We can infer from the mean absolute errors reported in Table III that the holding period is a single trading day.

2. Massaging the Data They Did Pick
They exclude “exceptional” sub-periods from the sample, around the Thanksgiving holiday and the U.S. presidential election. This has no economic justification, since any predictive information from tweets should persist over these outlier periods.

3. What is Accuracy, Really?
The press claims the model is “87.6%” accurate, but this is only in predicting the direction of the stock index and not the magnitude. Trading correct directional signals that predict small magnitude moves can actually be a losing strategy due to transaction costs and the bid/ask spread.
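A toy P&L makes the point (numbers invented): a signal that calls direction perfectly, but only on tiny moves, still loses money once a round-trip trading cost is charged.

```python
def pnl_after_costs(predicted_up, returns, cost_per_trade=0.001):
    """P&L of trading a directional signal: long when the model says up,
    short otherwise, paying 10bps of cost on every position."""
    pnl = 0.0
    for up, r in zip(predicted_up, returns):
        pnl += (r if up else -r) - cost_per_trade
    return pnl

# Direction called perfectly on four tiny daily moves:
tiny_moves = [0.0004, -0.0006, 0.0005, -0.0003]
signal = [r > 0 for r in tiny_moves]
print(pnl_after_costs(signal, tiny_moves))  # negative: costs swamp the edge
```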

They compare with a “3.4%” likelihood under pure chance. This assumes there is no memory in the stock market, that market participants ignore the past when making decisions. It also contradicts the sliding window approach to formatting the training data, used throughout the paper.

The lowest mean absolute error in predictions is 1.83%, given their optimal combination of independent variables. The standard deviation of one day returns in the DIA ETF was 2.51% over the same period, which means their model is not all that much better than chance.
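Back-of-the-envelope, using only the two figures above and assuming roughly normal returns, the model barely beats guessing “no change” every day:

```python
from math import pi, sqrt

model_mae = 0.0183        # best reported mean absolute error, 1.83%
dia_daily_stdev = 0.0251  # stdev of one-day DIA returns over the period

# For a normal distribution, always predicting "no change" yields an
# expected absolute error of stdev * sqrt(2/pi).
naive_mae = dia_daily_stdev * sqrt(2.0 / pi)

print(round(naive_mae, 4))              # 0.02
print(round(model_mae / naive_mae, 2))  # 0.91
```

So the model’s best error is about 91% of what a zero-skill forecast would produce.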

The authors also do not report any risk-adjusted measure of return. Any informational advantage from a statistical model is worthless if the resulting trades are extremely volatile. The authors should have referenced the finance and microeconomics literature, and reported Sharpe or Sortino ratios.

4. Backtests & Out-of-sample Testing
Instead of conducting an out-of-sample backtest or simulation, the best practice when validating an un-traded model, they pick the perfect “test period because it was characterized by stabilization of DJIA values after considerable volatility in previous months and the absence of any unusual or significant socio-cultural events”.

5. Index Values, Not Prices
They use closing values of the Dow Jones Industrial Average, which are not tradable prices. You cannot necessarily buy or sell at these prices since this is a mathematical index, not a potential real trade. Tracking errors between a tradable security and the index will not necessarily cancel out because of market inefficiencies, transaction costs, or the bid/ask spread. This is especially the case during the 2008 bear trend. They should have used historic bid/ask prices of a Dow Jones tracking fund or ETF.

6. Causes & Effects
Granger Causality makes an assumption that the effects being observed are so-called covariance stationary. Covariance stationary processes have constant variance (jitter) and mean (average value) across time, which is almost precisely wrong for market prices. The authors do not indicate if they correct for this assumption through careful window or panel construction.
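A quick simulation shows why the assumption fails for price levels but is tolerable for returns (a plain random walk stands in for the market here):

```python
import random
import statistics

def variance_by_half(series):
    """Variance of the first vs. second half of a series: a crude check
    of the constant-variance part of covariance stationarity."""
    mid = len(series) // 2
    return statistics.pvariance(series[:mid]), statistics.pvariance(series[mid:])

rng = random.Random(0)
returns = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
prices = [0.0]
for r in returns:
    prices.append(prices[-1] + r)  # price level: cumulative sum of returns

# The returns look stationary (both halves have variance near 1); the
# price level does not: its variance is orders of magnitude larger and
# depends on which stretch of the sample you look at.
print(variance_by_half(returns))
print(variance_by_half(prices))
```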

7. Neural Parameters
The authors do not present arguments for their particular choice of “predefined” training parameters. This is especially dangerous with such a short history of training data and a modeling technique like neural networks, which is prone to high variance (overfitting).

Getting Bought

I am happy to announce that I just signed the paperwork to transfer my machine learning software to Altos Research, where my position is now Director of Quantitative Analytics. Altos and I have been working together on a contract basis since last November, when I started forecasting with the Altos data. The software itself (“Miri”) is my professional obsession — a programming library for data mining, modeling and statistics. Miri was the core of the FVM product we released in February.

Altos Research LLC is a real estate and analytics company founded back in 2005. We are about 15 people in Mountain View, CA who collect and analyze live real estate prices and property information from the web. Altos is not just “revenue positive,” but actually profitable. We are proud to have never taken outside funding.

Altos will continue to develop Miri, but I will also focus on technical sales, business development and my own trading portfolio. We have a serious opportunity to change the way the financial industry’s dinosaurs do modeling. I am still your friendly neighborhood data guy, just now mostly thinking about real estate.

My personal blog is at “”, where you can read my ramblings. And I talk shop on the Altos blog itself at “”.