A Different House Hedge

Where do stock market winners buy houses?

There are many ways to predict how the price of an asset will change in the future. For stocks, one approach is based on fundamental analysis and another approach uses portfolio diversification theory. A third approach to predicting stock movement is so-called “technical analysis,” which is too silly for more than a mention. There are also statistical arbitrageurs in the high-frequency market-making and trading arms race, who make minute predictions thousands of times per day. If we pretend real estate behaves like a stock, we can stretch the analogy into a new mathematical tool for hedging house prices.

Fundamentalism

Fundamental analysis is usually what people think about when picking stocks. This is the Benjamin Graham philosophy of digging into a company’s internals and financial statements, and then guessing whether or not the current stock price is correct. The successful stock picker can also profit from an overpriced share by temporarily borrowing the stock, selling it, and then later buying it back on the cheap. This is your classic “short,” which may or may not be unethical depending on your politics. Do short trades profit from misery, or reallocate wasted capital?

Fundamental analysis is notoriously difficult and time-consuming, yet it is the most obvious way to make money in the stock market. It is also what private equity and venture capital firms do, just applied to an unlisted company or even two guys in a garage in Menlo Park. When you overhear bankers talking about a “long/short equity fund,” they probably mean fundamental analysis done across many stocks, with the resulting portfolio traded so that it stays short one dollar for every dollar it is long. This gives some insulation against moves in a whole sector, or even moves in the overall economy. If you are long $100 of Chevron and short $100 of BP, the discovery of cheap cold fusion will not trash your portfolio, since that BP short will do quite well. However, for conservative investors like insurance companies and pension funds, government policy restricts how much capital can be used to sell assets short. These investors are less concerned about fundamental analysis, and more about portfolio diversification and the business cycle.

Highly Sensitive Stuff

If a long-only fund holds just automobile company stocks, the fund should be very concerned about the automobile sector failing as a whole. The fund is toast if the world stops driving, even if their money is invested in the slickest, most profitable car companies today. Perfect diversification could occur if an investor bought a small stake in every asset in the world. Though huge international indices try to get close, with so many illiquid assets around, perfect diversification remains just a theory. How can an investor buy a small piece of every condominium in the world? How could I buy a slice of a brand like Starbucks? Even worse, as time goes by companies recognize more types of illiquid assets on their balance sheets. Modern companies value intellectual property and human capital, but these assets are difficult to measure and highly illiquid. What currently unaccounted-for asset will turn up on balance sheets in 2050?

Smart fund managers understand that perfect diversification is impossible, and so they think in terms of a benchmark. A fund benchmark is usually a published blend of asset prices, like MSCI’s agricultural indices. The fund manager’s clients may not even want broad diversification, and may be happy to pay fund management fees for partial diversification across a single industry or country. Thinking back to our auto sector fund, they are concerned with how the fortunes of one car company are tied to the automobile industry as a whole. An edgy upstart like Tesla Motors is more sensitive to the automobile industry than a stalwart like Ford, which does more tangential business like auto loans and servicing.

Mathematically we calculate the sensitivity of a company to a benchmark by running a simple linear regression of historic stock returns against changes in the benchmark. If a company’s sensitivity to the benchmark is 2.5, then a $10 stock will increase to $12.50 when the benchmark gains 10%. A sensitivity of 0.25 means the stock would just edge up to $10.25 in the same scenario. A company can have negative sensitivity, especially against a benchmark in another related industry. Tesla probably has a negative sensitivity to changes in an electricity price index, since more expensive electricity would hurt Tesla’s business. No sensitivity (zero) would turn up against a totally unrelated benchmark. Sensitivity has a lot in common with correlation, another mathematical measure of co-movement.
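
As a rough sketch, here is that regression in a few lines of Python, assuming we already have aligned arrays of stock returns and benchmark changes (the numbers are made up for illustration):

```python
import numpy as np

# Made-up, aligned observations: the stock's returns and the
# benchmark's changes over the same periods.
stock_returns = np.array([0.021, -0.013, 0.008, 0.030, -0.004])
benchmark_changes = np.array([0.010, -0.007, 0.002, 0.015, -0.001])

# Ordinary least squares with an intercept; the slope is the sensitivity.
X = np.column_stack([np.ones_like(benchmark_changes), benchmark_changes])
intercept, sensitivity = np.linalg.lstsq(X, stock_returns, rcond=None)[0]

print(f"sensitivity (slope): {sensitivity:.2f}")
```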

One type of sensitivity is talked about more than any other. “Beta” is the sensitivity of a stock to the theoretical benchmark containing every asset in the world. Data providers like Bloomberg and Reuters probably estimate beta by regressing stock returns against one of those huge, international asset indices. An important model in finance and economics is called the Capital Asset Pricing Model, which earned a Nobel Prize for theorizing that higher beta means higher returns, since sensitivity to the world portfolio is the only sort of risk that cannot be diversified away. Though the CAPM beta is a poor model for real-life inefficient markets, sensitivities in general are a simple way to think about how a portfolio behaves over time. For instance, it turns out that sensitivities are additive. So $100 in a 0.25 sensitive stock and $50 in two different -0.25 sensitive stocks should be hedged against moves in the index and in the industry the index measures.
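
To see the additivity in dollar terms, weight each position’s sensitivity by its market value; in this made-up three-position book the dollar sensitivities cancel:

```python
# (market value, sensitivity) for a made-up three-position book.
positions = [(100.0, 0.25), (50.0, -0.25), (50.0, -0.25)]

# Dollar sensitivity of the book: sum of value * sensitivity per position.
dollar_sensitivity = sum(value * beta for value, beta in positions)

print(dollar_sensitivity)  # 0.0 -- the book is roughly hedged to the benchmark
```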

Back to Real Estate

Prices in certain local real estate markets are bolstered by a rally in the stock market. The recent murmurings of another IPO bubble suggest that newly minted paper millionaires will soon be shopping for homes in Los Altos Hills and Cupertino. We can put numbers behind this story by calculating real estate price sensitivity to a stock market benchmark. If we choose the S&P 500 as the benchmark, the sensitivity number will be a sort of real estate beta. Since real estate is far less liquid than most stocks, I regressed quarterly changes in our Altos Research median ask price against the previous quarter’s change in the S&P 500. Historically speaking, those real estate markets with a high beta have gotten a boost in prices after a good quarter in the stock market. Those markets with a low, negative beta are not “immune” to the stock market, but tend to be depressed by a stock market rally.
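
A sketch of that lagged regression, assuming quarterly series of median ask prices and S&P 500 closes are already in hand (the series below are placeholders, not actual Altos Research data):

```python
import numpy as np

# Placeholder quarterly series: median ask prices for one zipcode
# and S&P 500 closing levels over the same quarters.
median_ask = np.array([520000, 515000, 531000, 545000, 538000, 552000], dtype=float)
sp500 = np.array([1100.0, 1030.0, 1120.0, 1180.0, 1140.0, 1210.0])

ask_change = np.diff(median_ask) / median_ask[:-1]  # quarterly % change in ask price
sp_change = np.diff(sp500) / sp500[:-1]             # quarterly % change in the S&P 500

# Regress this quarter's ask-price change on the *previous* quarter's S&P change.
y = ask_change[1:]
x = sp_change[:-1]
X = np.column_stack([np.ones_like(x), x])
_, real_estate_beta = np.linalg.lstsq(X, y, rcond=None)[0]

print(f"real estate beta: {real_estate_beta:.2f}")
```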

Below is a map of the Bay Area’s real estate betas. These numbers were calculated using prices from Altos Research and benchmark levels from Yahoo! Finance. The darker red a zipcode, the greater the increase in that market’s home prices after a quarterly stock market rally. As we might expect, the betas in Silicon Valley are above average. However, there are also some surprises in Visalia and Wine Country.

Real Estate Beta, Bay Area

Our hypothesis for positive real estate beta is easy: those IPO millionaires. But what could cause a real estate market to tank after a good run in stocks? Perhaps negative real estate betas appear in more mobile labor markets, where stock market wealth triggers a move away from home ownership. Or maybe they turn up in markets where the condo stock is higher quality than the single-family homes, like in some college towns. Remember, the betas mapped above are based only on single-family home prices.

Real estate remains a difficult asset to hedge, an asset almost impossible for non-institutions to short. This is unfortunate, because a short hedge would be a convenient way for people with their wealth tied up in real estate to ride out a depressed market cycle. However, like long-only fund managers, real estate investors could benefit from thinking in terms of benchmark sensitivity. If we choose a benchmark that represents the broader real estate market, we could hedge real estate by purchasing non-property assets that have negative real estate betas. You would want your value-weighted real estate beta to net out to about zero. There are plenty of problems and assumptions lurking in any investment decision made with a crude linear sensitivity number, but at least real estate beta gives us another tool for thinking about housing risk.
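
As a toy example of netting value-weighted real estate beta to zero, suppose a home worth $500,000 carries a real estate beta of 1.0 against our chosen benchmark and some liquid asset carries a beta of -0.5; the hedge size falls out of one line of arithmetic (all numbers invented):

```python
home_value, home_beta = 500_000, 1.0  # illustrative property and its real estate beta
hedge_beta = -0.5                     # illustrative negative-beta, non-property asset

# Solve home_value * home_beta + hedge_notional * hedge_beta = 0
hedge_notional = -home_value * home_beta / hedge_beta

print(f"buy about ${hedge_notional:,.0f} of the negative-beta asset")  # $1,000,000
```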

(An abbreviated version of this post can be found at http://blog.altosresearch.com/a-different-house-hedge/ on Altos Research’s blog)

Posted in altos-research, hedging, quant, quantitative-analysis, real-estate, trading | 2 Comments

Fungal Houses

Ever wondered why your flat’s Zestimate bounces around so much?

In high school economics class you might have learned about fungible goods. This strange word refers to things that could be swapped without the owners especially caring. A dollar is almost perfectly fungible, and so is an ounce of pure silver. Paintings and emotional knick-knacks are not at all fungible. Fungible stuff is easy to trade on a centralized market, since a buyer should be happy to deal with any seller. This network effect is so important that markets “push back” and invent protocols to force fungibility. Two arbitrary flatbeds of lumber at Home Depot are probably not worth the same amount of cash. However, the CME’s random length lumber contract puts strict guidelines on how that lumber may be delivered to satisfy the obligation of the futures contract’s short trader.

Real estate is seriously non-fungible. Even a sterile McMansion in the suburbs can have a leaky roof, quirky kitchen improvements, or emotional value for house-hunting recent college grads. If we consider many similar homes as a basket, or a portfolio of the loans secured by those homes, then the idiosyncrasies of each home should net out to zero overall. Across those ten thousand McMansions, there should be a few people willing to pay extra for a man cave, but also a few people who would dock the price. This is the foundation of real estate “structured products,” such as the residential mortgage-backed securities (RMBS) of recent infamy. Like flatbed trucks delivering a certain sort of wood for a lumber futures contract, an RMBS makes a non-fungible good more fungible.

The Usual Place
The combined idiosyncrasies of non-fungible things rarely net out to exactly zero, especially during a financial crisis. Nonetheless traders and real estate professionals want to think about a hypothetical, “typical” property. We define a local real estate market by city, neighborhood or even zipcode. How do we decide the value of a typical property? There is an entire industry built around answering this question. One simple, clean approach is to sample a bunch of real estate prices in a local market at a certain point in time, and then average the prices. Or maybe use a more robust descriptive statistic like the median price.
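
A minimal sketch of that calculation, using made-up sampled prices for one market in one week; the median shrugs off an outlier that drags the mean around:

```python
import statistics

# Made-up single-family listing prices sampled from one zipcode in one week.
prices = [610_000, 585_000, 2_400_000, 430_000, 515_000, 470_000]

mean_price = statistics.mean(prices)      # dragged up by the $2.4M outlier
median_price = statistics.median(prices)  # a more robust "typical" price

print(f"mean: {mean_price:,.0f}   median: {median_price:,.0f}")
```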

The most readily available residential home prices in the U.S. market are “closed” transactions, the price a home buyer actually paid for their new place. Using a closed transaction price is tricky, because it is published several months after a property is sold. Can a typical home price really be representative if it is so stale?

Sampling
Even if we ignore the time lag problem, there is another serious challenge in using transactions to calculate a typical home price. Within any local real estate market worth thinking about, there are very few actual transactions compared with overall listing activity and buzz. Your town may have had a hundred single-family homes listed for sale last week, but only four or five closed purchases. A surprise during a buyer’s final walkthrough could wildly swing the average, “typical” home price. For the statistically inclined, this is a classic sample size problem.

There are plenty of ways to address the sample size problem, such as rolling averages and dropping outliers. Or you could just include transactions from a wider area like the county or state. However the wider the net you cast, the less “typical” the price!

Another approach is to sample from the active real estate market, those properties currently listed for sale. You get an order of magnitude more data and the sample size problem goes away. However, everyone knows that listing prices do not have a clear-cut relationship with closing prices. Some sellers are unrealistic and ask too much, and some ask too little to start a bidding war. What is the premium or discount between listing price and actual value? We spend a lot of time thinking about this question. Even closed transaction prices are not necessarily the perfect measure of typical “value,” since taxes and mortgage specifics can distort the final price. Our solution is to assume that proportional changes in listing prices over time will roughly match proportional changes in the value of a typical house, especially given a larger sample from the active market.

A Picture
Below is a chart of Altos Research’s real estate prices back through 2009, across about 730 zipcodes. For each week on the horizontal axis, and for each zipcode, I calculate the proportional change in listing price (blue) and in sold price (red) since the previous week. Then I average the absolute value of these proportional changes, for a rough estimate of volatility. The volatility of sold prices is extreme.
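
Roughly, the volatility estimate behind the chart works like this, sketched for a single zipcode with placeholder weekly prices (the real calculation runs the same arithmetic across all ~730 zipcodes, separately for listing and sold prices):

```python
import numpy as np

# Illustrative weekly price series for a single zipcode.
weekly_prices = np.array([500000, 503000, 498000, 507000, 505000], dtype=float)

# Proportional change since the previous week, then average absolute change.
weekly_changes = np.diff(weekly_prices) / weekly_prices[:-1]
rough_volatility = np.mean(np.abs(weekly_changes))

print(f"average absolute weekly change: {rough_volatility:.4%}")
```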

Price Volatility

Posted in altos-research, real-estate | Leave a comment

Sarah Palin Email Word Cloud

After three years of legal wrangling, the diligent folks at Mother Jones released another set of Sarah Palin’s emails on Friday. There are plenty of subtleties to the story. Should a personal Yahoo! email account be used for government work? And why the frustrating digital / analog loop of printing emails to be scanned at the other end, like a fax machine?

For my own snickering, I spent a couple of hours over the weekend downloading the email PDFs, converting them to text, and then parsing out the choice “holy moly’s” and tender bits about Track in the army. Here is a word cloud of the former governor’s emails, via the amazing Wordle project.
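
The pipeline was nothing fancy. Here is a sketch of the same idea using the pdftotext utility and a simple word count; the directory name and the stop-word list are placeholders:

```python
import collections
import re
import subprocess
from pathlib import Path

STOP_WORDS = {"the", "and", "to", "of", "a", "in", "for", "is", "on", "that"}  # trimmed for brevity
counts = collections.Counter()

for pdf in Path("palin_emails").glob("*.pdf"):
    # pdftotext (from poppler-utils) dumps the PDF's text to stdout when given "-".
    text = subprocess.run(["pdftotext", str(pdf), "-"],
                          capture_output=True, text=True).stdout
    words = re.findall(r"[a-z']+", text.lower())
    counts.update(w for w in words if w not in STOP_WORDS)

print(counts.most_common(50))  # feed these frequencies into Wordle or similar
```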

Sarah Palin's Email Word Cloud

Posted in natural-language-processing, politics | Leave a comment

Case-Shiller April Forecasts

Another finger in the air, in the beginning of the month lull.

My forecasts for the March, 2011 Case-Shiller index levels were quite rushed. They were released quickly so I could publicly compare the forecasts with the CFE futures contracts about to expire. However, since the statistical models use active market data, there is no mathematical reason to wait on our forecasts until the end of the month. The April, 2011 index levels will be released on June 28th, but here are my forecasts given what the real estate markets were doing a few months ago:

City              | Confidence | Forecast | Predicted HPI
Minneapolis, MN   | +1         | -10.52%  | 94.46
Phoenix, AZ       | +1         | -2.85%   | 97.42
Las Vegas, NV     | +3         | -1.56%   | 95.67
Atlanta, GA       | +2         | -1.45%   | 96.93
Boston, MA        | 0          | -1.32%   | 145.42
Los Angeles, CA   | -2         | -1.22%   | 165.73
Seattle, WA       | +3         | -0.46%   | 132.35
New York, NY      | -1         | -0.21%   | 163.15
San Francisco, CA | -3         | -0.20%   | 129.56
Chicago, IL       | +2         | -0.06%   | 110.50
San Diego, CA     | -3         | +0.18%   | 154.16
Detroit, MI       | 0          | +0.41%   | 67.34
Charlotte, NC     | 0          | +0.50%   | 107.50
Miami, FL         | 0          | +1.01%   | 138.66
Dallas, TX        | +1         | +1.62%   | 114.72
Cleveland, OH     | +1         | +2.12%   | 98.85
Denver, CO        | 0          | +2.27%   | 123.29
Tampa, FL         | +1         | +2.28%   | 129.98
Portland, OR      | +1         | +4.71%   | 138.92
(The confidence score ranges from negative three for our weakest signals, up to positive three for strength. Unfortunately I am still sorting out a bug in our Washington, DC model.)

Posted in altos-research, forecasting, real-estate, trading | Leave a comment

Dan Rice on How the Experts May Not Always Be Right: A Story About the Discovery of Preclinical Alzheimer’s Disease in 1991

Machine learning can be a check on conventional thinking, if we let it.

In the new analytics LinkedIn group started by Vincent Granville, Dan Rice wrote a personal account of his frustrations with the Alzheimer’s research of 20 years ago, before we understood more about the preclinical period of the disease:

The problem that I have with domain expert knowledge selecting the final variables that determine the model is that it no longer is data mining and it often is no longer even good science. From the time of Galileo, the most exciting and important findings in what we call science are those data-driven findings that prove the experts wrong. The problem is that the prior domain knowledge is usually incomplete or even wrong, which is the reason for research and analytics in the first place. I understand that the experts are helpful to generate a large list of candidate variables, but the experts will often be wrong when it comes to determining how, why and which of these variable combinations is causing the outcome.

I had an experience early in my research career that has made me forever distrustful of the expert. I was doing brain imaging research on the origins of Alzheimer’s disease in the early 1990’s and all the experts at that time said that the cause of Alzheimer’s disease must be happening right when the dementia and serious memory problems are observed which may be at most a year before the ultimate clinical diagnosis of dementia consistent with Alzheimer’s. We took a completely data-driven approach and measured every variable imaginable in both our brain imaging measure and in cognitive/memory testing. From all of these variables, we found one very interesting result. What the experts had referred to as a “silent brain abnormality” that is seen in about 25% of “normal elderly” at age 70 was associated with minor memory loss problems that were similar to but much less severe than in the early dementia in Alzheimer’s disease. We knew that the prevalence of clinically diagnosed dementia consistent with Alzheimer’s disease was 25% in community elderly at age 80. Thus, we had a very simple explanatory model that put the causal disease process of Alzheimer’s disease back 9-10 years earlier than anyone had imagined.

The problem was that all the experts who gave out research funding disagreed and would not even give me another grant from the National Institute on Aging to continue this research. For years, nobody did any of this preclinical Alzheimer’s research until about 10 years ago when people started replicating our very same pattern of results with extensions to other brain imaging measures. What is still controversial is whether you can accurately PET image the beta-amyloid putative causal protein in living patients, but it is no longer controversial that Alzheimer’s has an average preclinical period of at least 10 years. Ironically, one of the experts who sat on the very committee that rejected my grant applications suddenly became an expert in preclinical Alzheimer’s disease over the past 5 years. The experts are very often dead wrong. We allow experts to select variables in the RELR algorithm, but our users tell us that they seldom use this feature because they want the data to tell the story. The data are much more accurate than the experts if you have an accurate modeling algorithm.

(Quoted with permission of the author.)

Posted in machine-learning, medicine | Leave a comment

Housing Finger in the Air

The March, 2011 Case-Shiller numbers will be released this Tuesday, but the CME’s May futures contracts expire tomorrow. Some of the real estate transactions that will be summarized in Tuesday’s numbers are up to five months old, whereas our data is at most one week old. This is why Altos Research calls its statistics “real time”: they are an order of magnitude more current than the benchmark in real estate data.

Below is a table of our forecasts for six of the Case-Shiller futures contracts. Check back in a few days, when I will compare with the actual March, 2011 numbers.

Metro Area      | Feb-2011 CS HPI | Forecast | Signal
Boston, MA      | 149.86          | -2.33%   | 111bps below the future’s spot bid price
Chicago, IL     | 113.26          | -1.28%   | in the spread
Denver, CO      | 121.26          | -3.31%   | 64bps below the future’s spot bid price
Las Vegas, NV   | 98.28           | -3.26%   | 96bps below the future’s spot bid price
Los Angeles, CA | 168.25          | -8.64%   | 763bps below the future’s spot bid price
San Diego, CA   | 155.05          | +1.66%   | 209bps above the future’s spot ask price
(all spot prices as of 10:30am PST on 26-May-2011)

Posted in altos-research, forecasting, real-estate, trading | 1 Comment

Fighting the Last War: Shiller Paper

A new type of mortgage gets a price that means you never have to walk away.

Last month Robert J. Shiller, Rafal M. Wojakowski, Muhammed Shahid Ebrahim and Mark B. Shackleton published a paper with the financial engineering to price “continuous workout mortgages.” This is the Shiller of Irrational Exuberance and housing index fame.

A continuous workout mortgage leaves some of the risk of house price depreciation with the mortgage lender, since the mortgage balance automatically adjusts if the market tanks. The authors model an interest-only continuous workout mortgage as a loan bundled with a put option on the value of the home and a floor on interest rates. By design, the option to abandon the mortgage is always out of the money, so the borrower has little incentive to strategically default or walk away.

Pricing a continuous workout mortgage uses a standardized housing index, so the bundled put option is written on the index and not on the exact home. This removes the perverse incentive for borrowers to trash their own homes in order to reduce payments. Others have written about the political and class bias encouraged when your savings are connected so directly to the neighborhood. Standard & Poor’s conveniently sells metropolitan housing indices. These S&P Case-Shiller housing indices have serious problems, including methodology transparency and data lag: no one can replicate and therefore validate the Case-Shiller numbers, the indices are published several months late, and they ignore the prices of homes pulled off the market without a sale.

Like proper quants, Shiller and colleagues push hard for a closed-form pricing formula. The party line is that clean formulas make for better markets, but computer simulation is easy enough nowadays and far more accurate. Ahh, job security! To get a formula for the interest rate a lender should charge for a continuous workout mortgage, they make the heroic Black-Scholes universe assumptions, including:

  • The housing index can be traded, and traded without any brokerage fees. Also the index can be sold or bought for the same price.
  • Cash can be borrowed or lent at the exact same interest rate.
  • No one pays taxes.
  • The variance (jitter) in the housing index is independent of how much a trader expects to earn from investing in the housing index. This one is rarely mentioned, but not so obscure once you drop the “risk neutral” jargon.

And so also like proper quants, Shiller and his colleagues assume the frictionless, massless pulley from a high school physics class.
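
To make the closed-form-versus-simulation point concrete, here is a sketch pricing a generic Black-Scholes put two ways under the same frictionless assumptions. This is not the paper’s continuous workout formula, and the housing-index parameters are invented:

```python
import math
import random

# Invented parameters for a put on a housing index (not the paper's model):
S0, K, r, sigma, T = 150.0, 140.0, 0.03, 0.10, 1.0

def norm_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Closed-form Black-Scholes put price.
d1 = (math.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
d2 = d1 - sigma * math.sqrt(T)
bs_put = K * math.exp(-r * T) * norm_cdf(-d2) - S0 * norm_cdf(-d1)

# Crude Monte Carlo under the same lognormal assumptions.
random.seed(0)
n = 200_000
payoffs = []
for _ in range(n):
    z = random.gauss(0.0, 1.0)
    ST = S0 * math.exp((r - 0.5 * sigma**2) * T + sigma * math.sqrt(T) * z)
    payoffs.append(max(K - ST, 0.0))
mc_put = math.exp(-r * T) * sum(payoffs) / n

print(f"closed form: {bs_put:.3f}   simulation: {mc_put:.3f}")
```

With a couple hundred thousand paths the two numbers should agree to within a cent or two, which is roughly the “simulation is easy enough nowadays” point.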

Posted in mortgages, quant, quantitative-analysis, real-estate | Leave a comment

Dreaming of the Cloud

So far cloud 2011 is just client-server 1997 with new jargon.

As a modeler who manages a serious EC2 cluster, someone who has handed thousands of dollars to Amazon over the last few years, I remain frustrated at what the industry has settled on as the main unit of value. Root access on a Linux virtual machine does an admirable job of isolating my applications from other users, but it is a poor way to economically prioritize. We need a smarter metaphor to distribute a long-running job across a bunch of machines and to make sure we pay for what we use. I don’t so much care about having a fleet of machines ready to handle a spike in web traffic. Instead I want to be able to swipe my credit card to ramp up what would usually take a week, so it will finish in a couple hours.

(If you are a Moore’s Law optimist who thinks glacial, CPU-bound code is a thing of the past, you might be surprised to hear that one of my models has been training on an EC2 m1.large instance for the last 14 hours, and is just over halfway finished… Think render farms and statistical NLP, not Photoshop filters.)

My dream cloud interface is not about booting virtual machines and monitoring jobs, but about spending money so my job finishes quicker. The cloud should let me launch some code, and get it chugging along in the background. Then later, I would like to spend a certain amount of money, and let reverse auction magic decide how much more CPU & RAM that money buys. This should feel like bidding for AdWords on Google. So where I might use the Unix command “nice” to prioritize a job, I could call “expensiveNice” on a PID to get that job more CPU or RAM. Virtual machines are hip this week, but applications & jobs are still the more natural way to think about computing tasks.
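
None of this exists today, but to make the idea concrete, here is a purely hypothetical sketch of what an expensiveNice call might look like; the function and the provider bid endpoint are imaginary:

```python
# Purely hypothetical sketch: no cloud provider offers this API today.
def expensive_nice(pid: int, dollars: float) -> None:
    """Bid `dollars` to buy more CPU/RAM for the job with this PID.

    In the imagined cloud, the provider would run a reverse auction
    (AdWords-style) and reprioritize the job according to the winning bid.
    """
    bid = {"pid": pid, "budget_usd": dollars}
    # cloud_provider.submit_bid(bid)  # imaginary provider call
    print(f"bidding ${dollars:.2f} to speed up PID {pid}: {bid}")

# Usage, analogous to `nice`, but spending money instead of lowering priority:
expensive_nice(12345, 20.0)
```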

This sort of flexibility might require cloud applications to distribute themselves across one or more CPUs. So perhaps the cloud provider insists that applications be multi-threaded. Or Amazon could offer “expensiveNice” for applications written in a side-effect free language like Haskell, so GHC can take care of the CPU distribution.

Posted in amazon-ec2, cloud | Leave a comment

Banks from the Outside

How do you identify the big cheese at a bank, the decision maker you should sell to? It’s not as easy as it sounds.

Investment banks are notoriously opaque businesses with a characteristic personnel and power structure. Still, there is plenty in common across investment banks and a few generalizations an outsider can make when trying to deal with an investment bank.

The “bulge bracket” is the set of large investment banks. Bank pecking order and prestige are roughly based on a bank’s size and volume of transactions. Banks that do the most deals generate the highest bonus pools for their employees. The pecking order since the credit crisis is probably:

This list is obviously contentious, though Goldman Sachs and JPMorgan are the undisputed masters, and Citibank and BofA are both train wrecks. BofA is also known as Bank of Amerillwide, given its acquisitions. Bear Stearns opted out of the 1998 LTCM bailout, which is probably why it was allowed to fail during the credit crisis. Lehman Brothers had a reputation for being very aggressive but not too bright, while Merrill Lynch was always playing catch-up. NYC is the capital of investment banking, but London and Hong Kong trump it in certain areas. I have indicated where each of the bulge bracket banks is culturally headquartered. Each bank has offices everywhere, but the big decision-makers migrate to the cultural headquarters.

Investment Bank Axes

There are two broad axes within each bank. One axis is “front-office-ness” and the other is “title,” or rank. The front office directly makes serious money. At the extreme are those doing traditional investment banking services like IPOs, M&A, and private equity. And of course, traders and (trading) sales are also in the front office. Next down that axis are the quants and the researchers who recommend trades. Then the middle office is risk management, legal and compliance. These are still important functions, but they have far less pull than the front office. The back office is operations like trade processing & accounting, as well as technology.

This first axis, front-office-ness, is confusing because people doing every type of work turn up in all groups. JPMorgan employs 240 thousand people, so there are bound to be gray areas. An M&A analyst might report into risk management, which is less prestigious than if the same person with the same title reported into a front office group.

The other axis is title or rank. This is simpler, but something that tends to trip up outsiders. Here is the pecking order:

  • C-level (CEO, CFO, CTO, General Counsel. Some banks confusingly have a number of CTOs, which pushes that title closer to the next rung down:)
  • Managing Director (“MD”, partner level at Goldman Sachs, huge budgetary power, the highest rank we mere mortals ever dealt with)
  • Executive Director or (just) Director (confusingly lower in rank than an MD, still lots of budgetary power)
  • Senior Vice-President (typical boss level, mid-level management, usually budgetary power, confusingly lower in rank than a Director)
  • Vice-President (high non-manager level, rarely has budget)
  • Assistant Vice-President or Junior Vice-President (“AVP”, rookie with perks, no budget)
  • Associate or Junior Associate (rookie, no budget)
  • Analyst (right out of school, no budget, a “spreadsheet monkey”)
  • Non-officers (bank tellers, some system administration, building maintenance)

Almost everyone at an investment bank has a title. Reporting directly to someone several steps up in title is more prestigious. Contractors and consultants are not titled, but you should assume they are one step below their boss. If someone emphasizes their job function instead of their title (“I’m a software developer at Goldman Sachs”), you should assume they are VP or lower. Large hedge funds and asset managers mimic this structure. So to review, who is probably the more powerful decision maker? (A rough scoring sketch follows the examples below.)

  • A. an MD in IT at BofA, based out of Los Angeles -or- B. an ED in Trading also at BofA, but based in Charlotte (highlight for the answer: B because front office wins)
  • A. an MD in Risk Management at Morgan Stanley in NYC -or- B. an SVP in M&A also at Morgan Stanley in NYC (A because title wins)
  • A. a Research Analyst at JPMorgan in NYC -or- B. a Junior Vice-President in Research at Citibank in London (A because NYC and front office win)
  • A. a VP Trader at Morgan Stanley in Chicago -or- B. an SVP in Risk Management at UBS in London (toss up, probably A since traders win)
  • A. an Analyst IPO book runner at Goldman Sachs in NYC -or- B. an Analyst on the trading desk at JPMorgan in NYC (toss up, probably A because Goldman Sachs wins)
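
The quiz above boils down to a rough scoring heuristic. Here is a sketch; the weights and rankings are my own invention, not any bank’s org chart, and they only roughly reproduce the judgment calls:

```python
# Invented scoring of "decision-maker power" from the two axes plus location.
TITLE_RANK = {"C-level": 9, "MD": 8, "ED": 7, "SVP": 6, "VP": 5,
              "AVP": 4, "Associate": 3, "Analyst": 2, "Non-officer": 1}
OFFICE_RANK = {"front": 3, "middle": 2, "back": 1}
CAPITALS = {"NYC", "London", "Hong Kong"}  # the banking capitals

def power_score(title: str, office: str, city: str) -> float:
    # Invented weights: front-office-ness edges out title, location is a tiebreaker.
    return 4 * OFFICE_RANK[office] + 3 * TITLE_RANK[title] + (0.5 if city in CAPITALS else 0.0)

# Quiz 1: MD in IT (back office, LA) vs. ED in Trading (front office, Charlotte) -> B
print(power_score("MD", "back", "Los Angeles"), power_score("ED", "front", "Charlotte"))
# Quiz 2: MD in Risk (middle office, NYC) vs. SVP in M&A (front office, NYC) -> A
print(power_score("MD", "middle", "NYC"), power_score("SVP", "front", "NYC"))
```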
Posted in investment-banks, trading | Leave a comment

Sour Grapes: Seven Reasons Why “That” Twitter Prediction Model is Cooked

The financial press has been buzzing about the results of an academic paper published by researchers from Indiana University-Bloomington and Derwent Capital, a hedge fund in the United Kingdom.

The model described in the paper is seriously flawed, for a number of reasons:

1. Picking the Right Data
They chose a very short bear-trending period, from February to the end of 2008. This results in a very small data set, “a time series of 64 days” as described in a buried footnote. You could have made almost a 20% return over the same period by just shorting the “DIA” Dow Jones ETF, without any interesting prediction model!

There is also ambiguity about the holding period of trades. Does their model predict the Dow Jones on the subsequent trading day? In this case, 64 points seems too small a sample set for almost a year of training data. Or do they hold for a “random period of 20 days”, in which case their training data windows overlap and may mean double-counting. We can infer from the mean absolute errors reported in Table III that the holding period is a single trading day.

2. Massaging the Data They Did Pick
They exclude “exceptional” sub-periods from the sample, around the Thanksgiving holiday and the U.S. presidential election. This has no economic justification, since any predictive information from tweets should persist over these outlier periods.

3. What is Accuracy, Really?
The press claims the model is “87.6%” accurate, but this is only in predicting the direction of the stock index and not the magnitude. Trading correct directional signals that predict small magnitude moves can actually be a losing strategy due to transaction costs and the bid/ask spread.

They compare this accuracy with a “3.4%” likelihood of achieving it by pure chance. This assumes there is no memory in the stock market, that market participants ignore the past when making decisions. It also contradicts their own sliding window approach to formatting the training data, used throughout the paper.

The lowest mean absolute error in predictions is 1.83%, given their optimal combination of independent variables. The standard deviation of one day returns in the DIA ETF was 2.51% over the same period, which means their model is not all that much better than chance.

The authors also do not report any risk-adjusted measure of return. Any informational advantage from a statistical model is worthless if the resulting trades are extremely volatile. The authors should have referenced the finance and microeconomics literature, and reported Sharpe or Sortino ratios.
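
For reference, a minimal Sharpe ratio calculation on a daily strategy return series; the returns and the risk-free rate below are placeholders:

```python
import statistics

# Placeholder daily returns of a strategy, in decimal form.
daily_returns = [0.004, -0.002, 0.006, -0.005, 0.003, 0.001, -0.001, 0.002]
risk_free_daily = 0.0001  # placeholder daily risk-free rate

excess = [r - risk_free_daily for r in daily_returns]
# Annualized Sharpe ratio: mean excess return over its standard deviation,
# scaled by sqrt(252 trading days).
sharpe = (statistics.mean(excess) / statistics.stdev(excess)) * (252 ** 0.5)

print(f"annualized Sharpe ratio: {sharpe:.2f}")
```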

4. Backtests & Out-of-sample Testing
Instead of conducting an out-of-sample backtest or simulation, the best practice when validating an untraded model, they pick the perfect test period “because it was characterized by stabilization of DJIA values after considerable volatility in previous months and the absence of any unusual or significant socio-cultural events.”

5. Index Values, Not Prices
They use closing values of the Dow Jones Industrial Average, which are not tradable prices. You cannot necessarily buy or sell at these prices since this is a mathematical index, not a potential real trade. Tracking errors between a tradable security and the index will not necessarily cancel out because of market inefficiencies, transaction costs, or the bid/ask spread. This is especially the case during the 2008 bear trend. They should have used historic bid/ask prices of a Dow Jones tracking fund or ETF.

6. Causes & Effects
Granger Causality makes an assumption that the effects being observed are so-called covariance stationary. Covariance stationary processes have constant variance (jitter) and mean (average value) across time, which is almost precisely wrong for market prices. The authors do not indicate if they correct for this assumption through careful window or panel construction.
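
A quick sanity check for that assumption is an augmented Dickey-Fuller test on whatever series feeds the Granger test. Here is a sketch with statsmodels and a simulated random-walk price series:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

# Simulated random-walk "prices": not covariance stationary.
rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.normal(0.0, 1.0, 500))
changes = np.diff(prices)  # differencing usually restores stationarity

for name, series in [("prices", prices), ("changes", changes)]:
    stat, pvalue = adfuller(series)[:2]
    # A large p-value means we cannot reject a unit root (non-stationarity).
    print(f"{name}: ADF statistic = {stat:.2f}, p-value = {pvalue:.3f}")
```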

7. Neural Parameters
The authors do not present arguments for their particular choice of “predefined” training parameters. This is especially dangerous with such a short history of training data, and a modeling technique like neural networks, which is prone to high variance (over-fitting).

Posted in forecasting, natural-language-processing, trading, twitter | 1 Comment