Employee Founding

Am I a cofounder or an employee?

There is prestige in having been a cofounder of a startup, someone who was there from the beginning, taking the lifestyle risk in return for the possibility of striking gold and changing the world. Now with that breathless sentence out of the way, how do you know if you are a founder or an employee? To me there are four key questions to answer:

  • Is the startup funded externally, by an outside entity like a venture or seed fund? This would be someone without huge sunk costs choosing to hand over money in exchange for debt or equity and upside in the startup’s future.
  • Is the startup selling to businesses (“enterprise”), and does the venture have a paying client-or-two outside of the Silicon Valley scene? Consulting for your buddy’s startup does not count.
  • Is the startup selling to consumers, and have consumers written checks or swiped their credit cards for actual money? Tons of freemium traction does not count.
  • Are you working part-time on something else simultaneously? If you spend every Tuesday and Thursday working as a barista to pay the bills, you are not full-time.

If the answer to any of the four is “yes,” then you are probably an employee and not a founder or cofounder, de facto or otherwise.

Posted in politics, startups | Leave a comment

Outside Ukulele

A model for the POU, probability-of-ukulele.

The Outside Lands 2014 lineup looks to be one of the best in years, and as usual it will be difficult to decide which stage to watch over the weekend. To help, I wrote an NLP model that measures the degree to which a band is likely to lapse into entitlement and self-parody. So think of it as a musical spectrum, from Kanye West to Death Cab for Cutie.

  1. Kanye West
  2. Flume
  3. Paolo Nutini
  4. Ben Howard
  5. Watsky
  6. Ray LaMontagne
  7. Duck Sauce
  8. Jonathan Wilson
  9. Run the Jewels
  10. Jagwar Ma
  11. Tiësto
  12. Big Freedia
  13. Lykke Li
  14. The Brothers Comatose
  15. Kacey Musgraves
  16. Valerie June
  17. Atmosphere
  18. Tycho
  19. Macklemore & Ryan Lewis
  20. Tom Petty & the Heartbreakers
  21. Tegan & Sara
  22. Haim
  23. Bleachers
  24. Holy Ghost!
  25. Christopher Owens
  26. Dum Dum Girls
  27. Lucius
  28. Gold Panda
  29. Courtney Barnett
  30. Vance Joy
  31. Bear Hands
  32. RayLand Baxter
  33. Gardens & Villa
  34. Imelda May
  35. Mikal Cronin
  36. Finish Ticket
  37. Tumbleweed Wanderers
  38. Boys Noize
  39. SBTRKT
  40. The Kooks
  41. The Flaming Lips
  42. The Killers
  43. Grouplove
  44. Warpaint
  45. Arctic Monkeys
  46. Chromeo
  47. Typhoon
  48. Chvrches
  49. Capital Cities
  50. Local Natives
  51. John Butler Trio
  52. Deer Tick
  53. Greensky Bluegrass
  54. Woods
  55. Tedeschi Trucks Band
  56. The Soul Rebels
  57. The Districts
  58. Nicki Bluhm and The Gramblers
  59. Spoon
  60. Jenny Lewis
  61. Phosphorescent
  62. Cut Copy
  63. Night Terrors of 1927
  64. Givers
  65. Disclosure
  66. Death Cab For Cutie
Posted in culture, music, natural-language-processing | Leave a comment

I Program in Whatever

A friend just asked me how to get better at JavaScript, the programming language du jour for Silicon Valley gigs. Or more generally, whether “practice” is the way to overcome learning barriers in programming.

The short answer is, indeed, you just need to practice. The great Peter Norvig says becoming a good coder takes ten years, the proverbial ten thousand hours. Bright people heeding good advice can slash those required hours quite a bit.

Types of Language
Though if I were you, I would start by separating the learning of a particular programming language from becoming a good programmer. This is one of the trickiest concepts for people coming into programming from another field. The practice of computer science and software engineering has very little in common with the syntax or standard library of a particular programming language, JavaScript or otherwise. Sapir-Whorf be damned, but a programming language is just a tool while a (real) language is a way to communicate. Would you say becoming a radiologist is the same thing as learning to use an x-ray machine? Are statistics and Excel the same thing?

The infamous ThoughtWorks interview process for software engineers that I went through ages ago had almost no questions about standard libraries or syntax. (“How do you close a socket in C?”) No one cares, because you can always look that up in a book. Instead most of the questions were about abstraction, with a few here & there about algorithms. (“How would you clean up the coupling between this infovis module and the database?”)

Good programmers learn new languages trivially, because they all have the same underpinnings. I find it helpful to think of three schools of programming language nowadays. The first school is the popular (or aspiring) languages like JavaScript, Go, Ruby, Python, Java, C#, C++ and C. These languages all have their imperative syntactic roots in ALGOL from the late 1950s. They are heavy on syntax, and try to stop programmers from shooting themselves in the foot.

The next school of programming languages is the lower-case-el lisps like Scheme and Clojure. The most important distinction of a lisp is its homoiconicity, a cumbersome term that means you write code in a data structure the programming language is good at manipulating. Paul Graham of Y Combinator is a famous proponent of coding in lisps. They are more powerful and expressive than the popular languages, so it is easier for a good Scheme programmer to pick up Python than the other way around. Even Go’s statically-linked-by-default killer feature was common in the Lisp and Smalltalk communities thirty years ago.

My third class of programming languages are the functional languages providing different degrees of type safety, like Haskell, OCaml and Erlang. These languages discourage state and side-effects, and by doing so help code run across many CPUs or machines. Functional languages are also about code that is provably correct. This school of programming language is (arguably) more expressive and powerful than even the lisps, so a Haskell hacker should be able to pick up Clojure more easily than the Clojure programmer could learn OCaml.

I intentionally avoid classifying programming languages according to their object-orientedness, since OOP is just another way to generalize and abstract the coupling between different parts of a software system. You can do object-oriented programming in any language, but languages like Java and Ruby force the issue. (Yes, you can write object-oriented systems in old-school C.) Don’t bother with domain-specific languages like SQL, Matlab or R, since they encourage bad habits and are easy to learn later. Nothing is scarier than an R or Python programmer who has never written any lisp.

If you are trying to become a better programmer, the best thing you can do is learn the underlying history and structure of all three schools of programming language. “All of this has happened before, and all of this will happen again.” However the closer your learning language is to the third school, the quicker you will start to understand the core of the matter.

A Little More Advice
What side projects are you helping code? If the answer is “I just program at work” or “I just read a lot of code on Github,” then you will never be a great coder. The advice Hilary Mason got on Twitter a while back was iffy in this regard.

Have you worked through the amazing SICP book yet? There is a reason it was MIT’s main textbook for a zillion years. The book seems to be a cultural signal or marker of good coders. Others think the Van Roy & Haridi book is better than SICP, but the writing style is really dry.

Posted in culture, programming | Leave a comment

New Sentiment Dataset

The good folks in Stanford’s Natural Language Processing Group have built a powerful new dataset for a paper being presented at the EMNLP conference in Seattle next month. The underlying foundation of the dataset is not particularly exciting, being yet another corpus of labeled movie reviews: the review sentence “Stealing Harvard doesn’t care about cleverness, wit or any other kind of intelligent humor” is provided along with its negative sentiment label, for example. What is more interesting is that the corpus provides sentiment labels at every level of composition. So for the same sentence, the dataset also provides a distinct sentiment label for the sub-phrase “any other kind of intelligent humor,” which on its own is actually positive. Hence the dataset is a treebank, not just your typical corpus. A lot of Mechanical Turk wrangling went into this! This compositional and recursive labeling is a great resource for training contextual models, especially ones that go beyond the bag-of-words legacy.

Here at Trending we are experimenting with an online, regularized, high-dimensional linear approximation to the Stanford paper’s tensor RNN model, one that lets us use the whole Vowpal Wabbit stack. Next month they plan to release some (Matlab) code to parse the treebank, but they have already released the data itself. Therefore I put together a simple Ruby module to parse the treebank, for your own statistical NLP, sentiment and machine learning projects. It includes a bit of Graphviz logic to render phrase trees and their sentiment as SVG.
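
For the curious, the treebank distributes its trees as PTB-style s-expressions with an integer sentiment label on every node. My module is Ruby, but a minimal recursive parser is only a few lines of Python. Note the sample tree below is a made-up illustration in the treebank’s style, not an actual corpus line:

```python
import re

def parse_tree(s):
    """Parse one PTB-style sentiment tree, e.g. "(3 (2 Stealing) (2 Harvard))",
    into nested (label, children-or-word) tuples."""
    tokens = re.findall(r"[()]|[^\s()]+", s)
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = int(tokens[pos])      # the sentiment label for this node
        pos += 1
        if tokens[pos] != "(":        # leaf node: a single word
            children = tokens[pos]
            pos += 1
        else:                         # internal node: recurse on the children
            children = []
            while tokens[pos] == "(":
                children.append(node())
        assert tokens[pos] == ")"
        pos += 1
        return (label, children)

    return node()

# An invented tree in the treebank's style:
tree = parse_tree("(1 (2 Stealing) (1 (2 Harvard) (1 flops)))")
```

From the nested tuples it is straightforward to walk the tree and emit training examples for every phrase, which is exactly what makes the treebank richer than a flat corpus.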

The module is hosted on Github at http://github.com/someben/treebank/ under a nice Free license.

Posted in machine-learning, natural-language-processing, sentiment-analysis | Leave a comment

Separated by the Same Language

Some snarky, some important advice about America and England.

About ten years ago, I moved from Chicago to London for grad school. I intended to spend a few years in the United Kingdom, but my best-laid plans saw me there for about five years. This is a much longer span of time than the typical study-abroad stint or backpacker’s tour. This summer I returned to England for an extended visit and observation. Time has clarified some non-intuitive quirks I didn’t know I had picked up while living here. So a list for future expats, tourists and the curious:

  • London dominates English culture, far more than New York or Los Angeles dominate American culture. It is the largest city in the European Union, sprawling bigger than Paris or Rome, and probably the most diverse. On the ground, London fashion leads New York and Los Angeles by a few years. Yes, even New York City. Really.
  • The most bureaucratic aspect of a very bureaucratic country is consumer banking. Everything about English checking accounts, ATMs and credit cards is mind-bogglingly difficult, inefficient and wasteful. Things are still mostly done on paper, with proofs of residence, reference letters and other signs of class being necessities. Plan to spend literally ten times the amount of effort screwing around with English banks as you would in America.
  • The opposite is true of The Internet. When it comes to healthy competition among mobile phone providers and ISPs, England is incredibly high-tech. This is probably because England is geographically small and wealthy. So pay-as-you-go plans with dumb phones are convenient and dirt cheap, and getting fiber optic broadband to your flat is trivial.
  • The English are far more sensitive to class than Americans, especially around verbal accents. People in England can be extremely wealthy but still “low class,” and vice-versa. Differentiating wealth from class is probably the most alien aspect of English culture, for Americans. My favorite breakfast place in Bristol has a reputation for being posh (a.k.a. high class), but is actually less expensive than most supposedly bohemian hipsteraunts in the city. The English are more likely to “unlearn” a low-class accent, and Americans mistakenly think splashing a lot of cash guarantees privilege.
  • Restaurant servers in England rely less on tips for their income, which makes the service either atrocious, or more honest — depending on your politics. American-style tipping is becoming more common in England, but still the exception. Go with 10% atop the bill if you had good service, otherwise keep the change. You always have the right to dispute any gratuity automatically included in a bill. Do not tip if you pick up a round of drinks at the bar.
  • Speaking of which, English drinkers take turns buying full rounds of drinks for the group. This is good etiquette, and something Americans should take up. The English will notice if you never happen to run for a round, and you will get a bad reputation. Americans think of themselves as heavy drinkers, but we are actually more teetotaling than the English.
  • “In America a hundred years is a long time, and in England a hundred miles is a long way.” Because English culture is so old and the country so densely populated, there is a lot of diversity even between neighboring towns. Driving a couple hours for a visit is nothing to an American, but can baffle an English person.
  • Most English do a good job of differentiating American politics from the American people, even if we do elect those goofballs in DC. Politically speaking, our country is seen as an isolationist and violent bully. But culturally, everyone loves our hip hop and big-budget movies.
  • The English are as likely to think of themselves as European as not, so membership in the EU is a constant point of political tension here. The English are a bridge between the New and Old Worlds. The snarky newspaper headline is “Fog in the English Channel: Europe Cut-off!”
  • Being invited into an English home for a meal, tea or supper is a big deal, more so than in America. Take it as flattery and bring a bottle of wine.
  • The English can hate their (elected) government, but still love their country. This is one surprising upside of still having a monarch. Americans who hate their elected leaders are more likely to be seen as “unpatriotic.”
  • Taxes in the UK are actually not that much higher than in the US, despite what American politicians imply. My nominal tax rate as an evil banker in London was only a few percent more than it was working in Chicago. The English love to hate on the National Health Service (NHS), but it does a decent job of providing widely-accessible health care. There is a parallel private health care system for the wealthy, which is much more American in style. Most English see health care as a civil right like suffrage, unlike Americans who usually see health care as an expense.
  • That said, the English are not necessarily more healthy than Americans, but they are definitely thinner. You can usually spot the American tourist by their weight and the fact that they do not smoke.
  • The geography is confusing but easy to memorize. Britain or Great Britain is the large island off the coast of Europe. It contains the countries of England, Wales and Scotland. So the Scottish are British, but definitely not English! However the United Kingdom includes Northern Ireland, which is not (Great-) British. Sometimes the UK competes as a whole (e.g. at the Olympics), while at other times the individual countries in the UK matter (e.g. soccer). The UK flag (the Union Jack) is an overlay of the English, Scottish and old Irish flags. The English flag is St. George’s cross, for the dragon slayer: a red cross on a white field.
  • The English are a pretty secular people. They are not necessarily atheists, but religion is just not that big of a deal.
  • Beer is the only inexpensive thing in England. Well, maybe eggs and milk in the grocery store also. The best and most traditional beer is the hand-pulled sort you find at a pub. Start with these bitters (Timothy Taylor’s Landlord is a fine example), and then try the bright, alcohol-heavy and bubbly lagers. (Most Americans only ever drink lager or the occasional stout like Guinness.) Yes, English beer is served warmer than American, but the English weather is cooler too. Cocktails in England usually mean carefully measured 25ml shots, leading an English friend to flatter America as the “land of the free-pour.”
  • The best fish & chips is not found in pubs, but in dedicated shops called chippies. To find a chippy, look for counter service, paper-wrapped fries and a small menu. Good fish & chips has a tasty, crispy batter around surprisingly delicate fish; greasy fish inside is a bad sign. Examples are the Fryer’s Delight on Theobald’s Road in Bloomsbury in London, and Fish Lovers on Whiteladies Road [sic] in Bristol.
  • The solution to late-night, drunk munchies in England is your Middle Eastern kebab shop. Mayonnaise-heavy garlic sauce on your chips is a must, especially after a few pints.
  • Talk is of “the pub” as if there is only one, but this is just a quirk of language. There is not a place called The Pub, or ever just one pub in an area. You just say “meet me at the pub.” Similarly, English folks will refer to “my local [pub].”
  • The weather in England is grey and wet, but actually very mild. This is because of the Gulf Stream, even though the island sits at the same latitude as Scandinavia. Despite the Dickens novels, snow is rare here. And compared with America, there are very few bugs and insects. People have been living in every part of England “forever,” so there is very little actual wilderness even though the countryside is green and pretty. The high latitude also means very dark winters, and long summer days. There is nothing like leaving the pub at nine o’clock in August while there is still plenty of sunshine.
  • Americans are terrible with European and British geography, but the English are just as bad with ours. When I mention my hometown of Chicago to many English, they presume it is near the East Coast because of movies with skyscrapers and organized crime. Explaining that Chicago is a seventeen hour drive from New York City usually stuns the table… Two friends from Barcelona and the Black Forest in Germany actually grew up closer to each other than my wife and I, from Manhattan and Chicagoland.
  • Traditional businesses in England have flaky and frustrating hours, especially for an American used to working from nine to five and running errands outside of this window. While I lived in England, pubs were granted more flexible hours (2005) and smoking was banned (2007). So thankfully pubs are no longer required to close early and resort to lock-ins.
Posted in culture, england, politics, restaurants | 5 Comments

What is a Promise Worth?

How do you prevent hyperinflation without destroying the economy? The answer ain’t Bitcoin.

A virtual currency like Bitcoin uses a decentralized proof-of-work ledger (the block chain) to solve the double-spending problem. “Satoshi Nakamoto” deserves serious accolades for this clever architecture, but Bitcoin has a few serious problems. The first is its lack of security. The infrastructure around the currency is shoddy and fragile. The website where 80% of Bitcoin trading currently occurs is called the Magic: The Gathering Online Exchange (a.k.a. Mt.Gox). Recently Mt.Gox has crashed and been cracked, and it does not support easy shorting. More importantly, the Bitcoin system may never mature without a central authority spending a lot of (traditional) money to build out the infrastructure, with negligible or negative financial return-on-investment. Without a social program, in other words.

Even if Bitcoins did have the infrastructure and liquidity of a traditional currency like U.S. dollars or Japanese yen, there is another more fundamental problem with Bitcoin becoming the money of the future. Bitcoins are intrinsically deflationary.

The future will always be in one of two states: Either Bitcoin miners are running up against the limits of Moore’s Law, and are unable to profitably mine new Bitcoins. Or some bullshit singularity has occurred, giving us all access to infinite computational power. In this state, we would run up against the Bitcoin architecture’s hard-coded monetary supply cap of twenty-one million Bitcoins.
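
The twenty-one million figure is not arbitrary: it falls out of Bitcoin’s block-reward schedule, which starts at 50 BTC per block and halves every 210,000 blocks. A quick sketch of the geometric series (the real protocol counts integer satoshis and truncates, so the actual cap lands a hair under 21 million):

```python
# Block subsidy starts at 50 BTC and halves every 210,000 blocks;
# summing the geometric series gives the hard-coded supply cap.
subsidy = 50.0
total = 0.0
for _ in range(64):      # subsidies are negligible long before 64 halvings
    total += 210_000 * subsidy
    subsidy /= 2

total                    # just under 21,000,000 BTC
```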

If human desire is infinite, then people will always want more money for goods and services. (All else equal, of course!) So we have an intrinsically fixed supply of a fungible good along with increasing demand. Therefore a Bitcoin is guaranteed to increase in value over time. Any fraction of a Bitcoin is guaranteed to increase in value over time. This may sound good if you happen to have a lot of BTC (Bitcoin) in your wallet. However at a macroeconomic level deflation is catastrophic, as I will explain.

A Hamburger on Tuesday
Would you trade something today that is certain to be worth more tomorrow? What about if the “something” is a currency, a good that has no intrinsic value other than it being money? (You cannot heat your house with the digital dollars in your checking account. Gotta pay the utility company first.) In an emergency you might spend your deflating currency, but in general you should hold onto your BTC as long as possible. And since there is uncertainty about the degree to which Bitcoin will deflate, the market will not instantly price BTC correctly. The BTC price of goods and services will not instantly adjust to match the level of computational power available to miners.

Some Bitcoin proponents think we can instantly discount the BTC price of all goods and services to sync-up with systematic BTC deflation, but this would need a seriously high-tech payment infrastructure. Square and Stripe are trying, but does anyone seriously believe the prices of all goods and services can be discounted in real-time by a macroeconomic indicator? We can’t even ditch the wasteful dollar bill!

The Bitcoin bulls also emphasize a currency’s dual role as a means of transaction and a store-of-value, but intrinsic deflation trashes both roles simultaneously. As a means of transaction, deflation makes allocating capital (money) across projects and activities difficult, and again, requires that perfect payment infrastructure. Since systematic deflation destroys every asset’s value and discourages economic activity, deflationary currencies do badly as stores-of-value. Less economic activity means GDP contraction and decreased livelihood. Yes, despite what Professor von Nimby may have spewed in your Postmodern Marxist Studies class, GDP is a very strong indicator for overall human happiness. Perpetual economic contraction makes your savings account irrelevant. You might have a zillion super-valuable BTC in your digital wallet, but you have nothing to spend them on. In other words, if you think (hyper-) inflation is bad, deflation is even worse…

Passing Notes
Let us go back to a few of the original Bitcoin goals. Bitcoin proponents want an efficient, liquid currency immune from the distortion caused by a government or central bank’s monetary policy. This is reasonable since inflationary monetary policy has a sad history of trashing peoples’ savings accounts, in places like the Weimar Republic or more recently in Argentina. So how can we build the decentralized, non-deflationary currency of the future?

Notes are an ancient monetary concept desperate for rethinking in the Internet age. At its most basic level, a note is a promise to exchange money, goods or services at some point in the future. However a note is not quite a futures contract, because the promise need not ever be exercised. And a note is not really an options contract, because a note need not ever expire. The most obvious form of a note is what a U.S. dollar bill used to represent when we were on the gold standard. It was a promise that the holder of the note (dollar bill) could exchange the note for a dollar’s worth of physical gold at any time. Notes are a lot easier to store and deal with than gold, and so they make a lot of sense for getting work done efficiently. We could also talk about the fungibility of notes, but that is less important at this point. And notes are definitely easier to move around than loaves of bread, head of cattle, barrels of oil, or other physical stuff with intrinsic value.

A hoard of notes would also be a decent store-of-value in your savings account, as long as the writer of the notes remains solvent and trusted. For example, a million dollars worth of U.S. gold-convertible notes is a great retirement nest-egg, since most normal people expect the U.S. government to honor its promises for a long time.

When the entity writing the note is trusted by just about everyone — expected to honor its contract — then the writer can declare the notes to be unconvertible, all at once. The notes become fiat currency, currency that is not explicitly backed by anything but the trust that the note writer will not issue too many notes and inflate away peoples’ savings.

Why does most global economic activity happen using a handful of fiat currencies, like the U.S. dollar or Euro? Nations have traditionally supported their (fiat) currencies through policy and war, because before the Internet trust did not scale. Imagine a small town. Mel and Stannis are neighbors in this town. Mel trusts Stannis to honor his promises, and accepts a note from Stannis in return for mowing Stannis’s lawn for the next year. The note Stannis writes for Mel says something like “Stannis promises to give the bearer of this note 100 loaves of bread, anytime.” Mel’s landlord Dave also trusts Stannis, and so he has no problem taking Mel’s note as rent. Stannis has essentially printed his own money, which is a lot more convenient than baking 100 loaves of bread. Now in the next town over, no one really knows Stannis. Therefore Dave will have a hard time making use of Stannis’s note when he visits there to spend time with his grandparents. Dave and Mel trust Stannis, but the people living in the next town over do not.

In this parochial example, trust has not scaled across the network of transactions and relationships. The money Stannis created, the note he wrote, is not all that useful to Mel. Instead she could insist on being compensated by a note from an entity more trusted the world over, say the First Bank of Lannister which has a branch in both towns. Mel, Stannis, Dave and his grandparents all probably trust the First Bank of Lannister to pay its debts.

If Dave wants to spend Mel’s note written by Stannis in the next town over, he can ask a third party to guarantee or sign-off on the note. This can be done by exchanging Stannis’s promise for a promise by the First Bank of Lannister, which is more trusted throughout the realm. The First Bank of Lannister would be compensated for extending its trust by taking a cut of the promise from Stannis.

So before he leaves on his trip, Dave takes his rent check (note) from Mel into the First Bank of Lannister. They write a new note saying “The First Bank of Lannister promises to give the bearer of this note 95 loaves of bread, anytime” and give this note to Dave in exchange for the note written by Stannis. The bank has decided to take responsibility for chasing down Stannis if he turns out to renege on his promise, and in return they are compensated with the value of five loaves of bread. Here the First Bank of Lannister has also issued its own currency, but more as a middle-man than as someone doing economic activity like Mel’s lawn-mowing or Dave’s landlording.

This middle-man role is very important but also difficult to scale across a physical economy. Eventually someone refuses to trust the First Bank of Lannister, and then the chain of economic activity halts. This is why the world’s global economy has consolidated onto a few currencies, for reasons of both efficiency and trust.

The Internets
In the age of the Internet and pervasive social networks like Facebook and Linkedin, everyone is connected in a global network. This is the famous degrees-of-Kevin-Bacon or Erdős number concept. Any two people are connected by just a few steps along the network. Most of Stannis’s friends on Facebook would be willing to accept a note or promise from Stannis, and the same holds true for Dave, Mel and the First Bank of Lannister’s social networks. Since the whole of humanity is probably connected in a trust network, software can automatically write those middle-man notes along the chain of connections. Therefore any two people can automatically find a chain of trust for spending money.

Back to our example, but in the age of the Internet. Mel, Dave and Stannis all trust each other, since they are Linkedin contacts. Peter reneged on a note a few months ago, so no one really trusts Peter except Stannis. Everyone unfriended Peter but Stannis, so Peter has a very isolated social network. This time around we do not need to care about geography and small towns, since everyone is connected via the Internet and social networks. Let’s say Peter wants to buy an old iPad from Dave, and Dave thinks the iPad is worth about a hundred loaves of bread. Peter could try to write a note promising a hundred loaves of bread, but Dave would not accept this note since he does not trust Peter. Now for the cool part.

Peter goes to a notes exchange website (NoteEx), and asks for a hundred-loaf note that Dave will trust. The website knows that Stannis trusts Peter, and that Dave trusts Stannis. (See the triangle?) Through the website, Stannis writes Peter a note for one hundred loaves of bread that Peter gives to Dave in exchange for the iPad. Dave has a note he trusts in exchange for his good, at the price he wanted. Similarly Stannis receives a note written by Peter, whom he trusts. This note might be for 105 loaves of bread, giving Stannis a little cut in exchange for trusting the dodgy Peter. This five loaf interest, cut or edge is Stannis’s compensation as a middle-man.

This can all be done automatically by the NoteEx server with a list of middle-men volunteers. People volunteer to be middle-men up to a maximum amount of exposure or risk (i.e. one thousand loaves of bread total). Or middle-men could even offer to guarantee up to two degrees of Kevin Bacon away, for a much higher cut. After a bunch of people volunteer to be middle-men in the NoteEx process, all economic activity could be subsumed, with social networks ensuring that you only ever receive payment (promises) from people you trust. A NoteEx transaction could have more than one middle-man, up to the six degrees of Kevin Bacon maximum that we assume connects all people.
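
The chain-finding itself is ordinary graph search. A toy sketch under the story’s assumptions (the names, the trust table and the flat five-percent middle-man cut all come from the example above; NoteEx is of course hypothetical):

```python
from collections import deque

# Who trusts whom to honor a note. Peter is the pariah: only Stannis trusts him.
trusts = {
    "Dave":    {"Stannis", "Mel"},
    "Mel":     {"Stannis", "Dave"},
    "Stannis": {"Peter", "Mel", "Dave"},
    "Peter":   set(),
}

def trust_chain(payer, payee):
    """Breadth-first search for the shortest chain payer -> ... -> payee
    where each person in the chain trusts the person before them."""
    queue = deque([[payer]])
    seen = {payer}
    while queue:
        path = queue.popleft()
        if path[-1] == payee:
            return path
        for person, trusted in trusts.items():   # who trusts the chain's head?
            if path[-1] in trusted and person not in seen:
                seen.add(person)
                queue.append(path + [person])
    return None

def note_values(chain, price, cut=0.05):
    """Face value of the note each link writes to the next; every
    middle-man in between takes a cut for extending their trust."""
    hops = len(chain) - 2                        # number of middle-men
    return [round(price * (1 + cut) ** (hops - i), 1) for i in range(hops + 1)]

chain = trust_chain("Peter", "Dave")
```

For the iPad sale this finds Peter, Stannis, Dave, and the face values work out to the 105- and 100-loaf notes from the story: Stannis keeps five loaves for vouching for the dodgy Peter.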

Ironically, the good or service underlying the notes is not all that important, since notes are very rarely redeemed. In the same way that powerful governments can support fiat currencies backed by nothing, fiat notes backed by loaves of bread will not actually turn everyone into a baker. Usually notes are exchanged with their value being the trusted promise, but not necessarily the realization. Heavy stuff here.

Decentralized Bakery
The NoteEx website would be built atop an open and standard protocol, and competing notes exchanges could borrow from the Bitcoin architecture to be decentralized (i.e. the shared ledger). More importantly, there would be a natural level of inflation in the system as the cuts or interest that middle-men demand increase the total value of all promises across the economy. And of course, notes are an excellent store-of-value because who would you trust more to support you in an emergency or retirement than your tightest friends & family?

So! We have a theoretical monetary system free from government interference, and one that encourages economic activity through modest and natural inflation.

Posted in market-microstructure, politics, quant, quantitative-analysis, trading | 2 Comments

Hashing Language

How do you build a language model with a million dimensions?

The so-called “hashing trick” is a programming technique frequently used in statistical natural language processing for dimensionality reduction. The trick is so elegant and powerful that it would have warranted a Turing Award, if the first person to use the trick understood its power. John Langford cites a paper by George Forman & Evan Kirshenbaum from 2008 that uses the hashing trick, but it may have been discovered even earlier.[1] [2] Surprisingly most online tutorials and explanations of the hashing trick gloss over the main insights or get buried in notation. At the time of this writing, the Wikipedia entry on the hashing trick contains blatant errors.[3] Hence this post.

Hash, Man

A hash function is a programming routine that translates arbitrary data into a numeric representation. Hash functions are convenient, and useful for a variety of different purposes such as lookup tables (dictionaries) and cryptography, in addition to our hashing trick. An example of a (poor) hash function would map the letter “a” to 1, “b” to 2, “c” to 3 and so on, up to “z” being 26 — and then sum up the numbers represented by the letters. For the Benjamin Franklin quote “beware the hobby that eats” we get the following hash function output:

(beware) 2 + 5 + 23 + 1 + 18 + 5 +
(the) 20 + 8 + 5 +
(hobby) 8 + 15 + 2 + 2 + 25 +
(that) 20 + 8 + 1 + 20 +
(eats) 5 + 1 + 20 + 19
= 233

Any serious hashing function will limit the range of numbers it outputs. The hashing function we used on Benjamin Franklin could simply keep the last two digits of its sum, “modulo 100” in programming terms, and provide that lower number as its output. So in this case, the number 233 would be lopped-off, and the hash function would return just 33. We have a blunt quantitative representation or mapping of the input that is hopefully useful in a statistical model. The range of this hashing function is therefore 100 values, 0 to 99.
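The letter-summing function and its modulo step can be written in a few lines of Python (a toy, just like the hash it implements):

```python
def toy_hash(text):
    """Sum letter positions (a=1 ... z=26), ignoring everything that is
    not a letter, then keep only the last two digits of the sum."""
    total = sum(ord(c) - ord('a') + 1 for c in text.lower() if c.isalpha())
    return total % 100  # output range is 0 to 99

toy_hash("beware the hobby that eats")  # sum is 233, so this returns 33
```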

Now a big reason to choose one hashing function over another is the statistical distribution of the output across the function’s range, or uniformity. If you imagine feeding in a random quote, music lyric, blog post or tweet into a good hashing function, the chance of the output being any specific value in the range should be the same as every other possible output. For our hashing function with a 0-99 range, the number 15 should be output about 1% of the time, just like every other number between 0 and 99. Note that our letter-summing hash function above does not have good uniformity, and so you should not use it in the wild. As an aside, keep in mind that certain hash functions are more uniform on bigger input data, or vice-versa.

Another reason to favor one hashing function over another is whether or not a small change in the input produces a big change in the output. I call this concept cascading. If we tweak the Benjamin Franklin quote a little bit and feed “beware the hobby that bats” into our silly hash function, the sum is now 230, which gets lopped-off to 30 within the hash’s output range. This modest change in output, from 33 to 30, is another sign that our toy hash function is indeed just a toy. A small change in the input data did not cascade into a big change in the output number.

Here the important point is that a good hashing function will translate your input into each number in its output range with the same probability (uniformity), and a small change in your input data will cause a big change in the output (cascading).
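To contrast with the toy, here is a sketch of a better-behaved hash built on MD5 from Python’s standard library. MD5 is just a convenient illustration here; it is overkill for this purpose, but it has good uniformity and cascading over its range:

```python
import hashlib

def good_hash(text, buckets=100):
    """Map text to 0..buckets-1 using MD5, a hash with good
    uniformity and cascading properties."""
    digest = hashlib.md5(text.encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets

# Unlike the letter-summing toy, a one-letter change in the input
# produces an unrelated output somewhere in the 0-99 range.
print(good_hash("beware the hobby that eats"))
print(good_hash("beware the hobby that bats"))
```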

That’s Just Zipf-y

In human languages, very few words are used very frequently while very many words are very rare. For example, the word “very” turns up more than the word “rosebud” in this post. This relationship between word rank and frequency is very convex, non-linear or curved. This means that the 25th most common word in the English language (“from”) is not used just a little more frequently than the 26th most common word (“they”), but much more frequently.

This distribution of words is called Zipf’s Law. If you choose a random word from a random page in the Oxford English Dictionary, chances are that word will be used very rarely in your data. Similarly if you were to choose two words from the OED, chances are both of those words will not be common.

The Trick

If you are doing “bag-of-words” statistical modeling on a large corpus of English documents, it is easy to find yourself accommodating thousands or millions of distinct words or ngrams. For example the classic 20 newsgroup corpus from Ken Lang contains over 61,000 different single words, and far more two-word bigrams. Training a traditional statistical model with 61,000 independent variables or dimensions is computationally expensive, to say the least. We can slash the dimensionality of a bag-of-words model by applying Zipf’s Law and using a decent hashing function.

First we identify a hashing function with an output range that matches the dimensionality we wish the data had. Our silly hashing function above outputs a number from 0 to 99, so its range is 100. Using this function with the hashing trick means our statistical bag-of-words model will have a dimensionality of 100. Practically speaking we usually sit atop an existing high-quality hashing function, and use just a few of the least significant bits of the output. And for computational reasons, we usually choose a power of two as our hash function output range and desired dimensionality, so lopping-off the most significant bits can be done with a fast bitwise AND.

Then we run every word or ngram in the training data through our adapted hashing function. The output of the hash becomes our feature, a column index or dimension number. So if we choose 2^8 = 256 (two to the power of eight) as our hashing function’s range and the next ngram has a hash of 23, then we set our 23rd independent variable to the frequency count (or whatever) of that word. If the next hash is the number 258, masking off all but the lowest eight bits maps it to dimension 2, since 258 AND 255 = 2, or equivalently 258 MOD 256 = 2. Our statistical NLP model of the 20 newsgroup corpus suddenly goes from 61,000 to only 256 dimensions.
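The whole trick fits in a dozen lines. This sketch again uses MD5 purely for illustration; production systems such as Vowpal Wabbit use much faster non-cryptographic hashes like MurmurHash:

```python
import hashlib

NUM_BUCKETS = 2 ** 8  # 256 dimensions; a power of two so we can mask with AND

def hash_token(token):
    """Hash a token, then keep only its least significant 8 bits."""
    h = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16)
    return h & (NUM_BUCKETS - 1)  # equivalent to h % 256

def hashed_bag_of_words(document):
    """Map a document to a 256-dimensional vector of hashed token counts,
    regardless of how large the underlying vocabulary is."""
    vector = [0] * NUM_BUCKETS
    for token in document.lower().split():
        vector[hash_token(token)] += 1
    return vector

vec = hashed_bag_of_words("beware the hobby that eats the hobby")
len(vec)  # always 256, no matter the vocabulary size
sum(vec)  # 7, one count per token
```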

Wait a Sec’…!

Hold on, that cannot possibly work… If we use the numeric hash of a word, phrase or ngram as an index into our training data matrix, we are going to run into too many dangerous hash collisions, right?

A hash collision occurs when two different inputs hash to the same output number. Though remember that since we are using a good hashing function, the uniformity and cascading properties make the chance of a hash collision between any two words independent of how frequently either word is used. Read that last sentence again, because it is a big one.

The pair of words “from” & “rosebud” and the pair “from” & “they” each have the same chance of hash collision, even though the frequencies with which these words turn up in English vary widely. Any pair of words chosen at random from the OED has the same chance of hash collision. However Zipf’s Law says that if you choose any two words randomly from the OED, chances are one of the words will be very rare in any corpus of English language documents. Actually both words will probably be infrequent. Therefore if a collision in our hash function’s output occurs, the two colliding words are probably oddballs.

Two Reasons it Still Works

Statistical NLP bag-of-words models that use the hashing trick have roughly the same accuracy as models that operate on the full bag-of-words dimensionality. There are two reasons why hash collisions in the low-dimensional space of the hash function’s output range do not trash our models. First, any collisions that do occur probably occur between two rare words. In many models, rare words do not improve the model’s regression / classification accuracy and robustness. Rare words and ngrams are said to be non-discriminatory. Even if rare words are discriminatory in your problem domain, probability suggests the rare words do not co-occur in the same document. For this reason, the two rare words can be thought of as “sharing” the same representation in the model, whether this is decision tree sub-trees or a coefficient in a linear model. The Forman & Kirshenbaum paper says “a colliding hash is either unlikely to be selected as a feature (since both words are infrequent) or will almost always represent the word that led the classifier to select it.”

The second reason is sparsity, which also explains why we cannot use the hashing trick for dimensionality reduction in every statistical model. Zipf’s Law means most features or independent variables in a bag-of-words representation equal zero. In other words, a point in the dimensional space of the bag-of-words (a “word vector”) is generally sparse. Along these lines, John Langford says the hashing trick “preserves sparsity.” For any specific word, the chance of two random examples both having a non-zero value for that feature is low. Again this is because most words are rare.

The hashing trick is Zipf’s Law coupled with the uniformity & cascading properties of a good hash function, and using these to reduce the dimensionality of a sparse bag-of-words NLP model.

Notes

[1] Actually the first public version of the hashing trick John Langford knew of was in the first release of Vowpal Wabbit back in 2007. He also points out that the hashing trick enables very efficient quadratic features to be added to a model.
[2] Jeshua Bratman pointed out that the Sutton & Barto classic textbook on reinforcement learning mentions the hashing-trick way back in 1998. This is the earliest reference I have yet found.
[3] “the number of buckets [hashing function output range] M usually exceeds the vocabulary size significantly” from the “Hashing-Trick” Wikipedia entry, retrieved on 2013-01-29.
Posted in machine-learning, natural-language-processing, vowpal-wabbit | 12 Comments

Under the Hood of Buying & Selling Predictions

How do those futures markets like Betfair and Intrade work?

The managers of a prediction market decide upon a finite number of prediction contracts. A contract is essentially a description of some hypothetical future event. For example, a contract might be “Fadebook trades at least $50 per share in 2012.” Another important aspect of the contract specification is the anticipation and preemptive resolution of ambiguity. If Fadebook does a 2:1 split, does the event become “…at least $25 per share?” What if the asking price for Fadebook shares reaches $50, but the bid price does not? Contracts also specify an expiration date, a time by which the event must occur. In the case of the Fadebook contract, the obvious expiration date would be January 1st of 2013. For other contracts, the expiration will be arbitrary to allow for a final decision on the event’s occurrence or nonoccurrence.

Virtual Currency
A prediction is a single user’s opinion on the likelihood of the contract’s event occurring. The prediction market encourages users to form opinions about contract events, and encourages users to wager virtual currency on a contract. Users purchase virtual currency (henceforth “credits”) with real money, or perhaps are granted an amount of virtual currency in a freemium offering.

Contract Size
Each contract also specifies a size, which is both the minimum number of credits a user can wager on a contract, and the marginal increase in the size of a wager. If the prediction market chooses a size of “5 credits” for the Fadebook contract, then an individual user can wager only 5, 10, 15… credits on the contract. Contract sizes are necessary in order to better match both sides of the wager. Somewhat confusingly, experienced traders refer to each increment of the contract size as “a contract.” If I wager 15 credits on the Fadebook contract with a size of 5 credits, then I am said to be wagering “3 contracts.”

Direction
When making a wager on a contract, the user specifies the number of credits she will risk, in contract size units, and the direction of the wager. If a user believes the event is certain to occur and the user is eventually proven correct, she will profit from buying a contract whenever its likelihood is less than 100%. If a user believes an event is certain not to occur and turns out to be right, the user will profit from selling the contract when its likelihood is greater than 0%. The buyer of a contract is said to be long and the seller of a contract we call short.

Payoff
When the contract’s event occurs, the long side earns the contract size in credits for each contract, less the likelihood at which she initially made the wager. So if Fadebook’s stock hits $50 in October of 2012 while I am long 3 contracts bought at 25% likelihood, then the prediction market would immediately close the contract and credit my account with 5 credits (size), times 3 contracts, less 25% of the total, or 11.25 credits. The opposite occurs for the short side. In this case, the 3 contracts the short side sold settle at 100%, and the short’s account is reduced by 5 credits size, times 3 contracts, times the 75% move from her 25% entry. Again this is 11.25 credits but deducted from the short side of the wager.

If the contract expires as the year 2012 comes to a close, and I shorted 2 contracts at 60% back in February of 2012, then I will earn a profit of 5 credits size, times 2 contracts, times 60% or 6 credits. This is exactly what the long side of those 2 contracts would lose. The long side may be 1 user, or 2 different users each long 1 contract.
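The settlement arithmetic above can be captured in a small function. This is a sketch of the payoff rule as described, not any real exchange’s code:

```python
def payoff(size, contracts, entry_likelihood, event_occurred, direction):
    """Credits earned (positive) or lost (negative) by one side of a wager.

    entry_likelihood is a fraction, e.g. 0.25 for 25%.
    direction is "long" or "short"; the two sides always net to zero.
    """
    if event_occurred:
        # Contract settles at 100%; the long gains the move up from entry.
        long_profit = size * contracts * (1.0 - entry_likelihood)
    else:
        # Contract settles at 0%; the long loses her entry stake.
        long_profit = -size * contracts * entry_likelihood
    return long_profit if direction == "long" else -long_profit

# The Fadebook examples from the text:
payoff(5, 3, 0.25, True, "long")    # → 11.25
payoff(5, 3, 0.25, True, "short")   # → -11.25
payoff(5, 2, 0.60, False, "short")  # → 6.0
```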

By definition, the long and short sides of a contract will always balance. A user is not able to be long a contract unless another user is short. This would be similar to a real futures contract traded in Chicago or New York, but where the long side is committed to “buying” the event for 100% if it occurs.

Orders
The current likelihood (henceforth “price”) of a contract is determined by looking at the contract’s order book on the prediction market. Order books are how buyers and sellers of contracts are matched, and indicate a prediction contract’s current price or likelihood. An order book is an ordered list of buying and selling prices. If there is no market in a contract and a user wishes to make a wager, her estimated likelihood becomes the best buying or selling price for the contract. From now on we drop the price or likelihood’s percent sign for brevity.

Say the Fadebook contract has just been listed and publicized on the prediction market. If I believe the event is very likely to occur, then I might offer 99 to anyone willing to sell at 99. If I am correct, then I will eventually earn 1 credit per contract for my trouble. I will have paid 99 for something that will earn me 100. Though perhaps I want to leave myself more room to profit, and instead offer 25 to anyone willing to sell at 25. This would mean a profit of 75 per contract when it expires. The prices at which every user is willing to buy forms one side of the order book, the buying prices or bids.

A similar process happens on the selling side. I want to be short a contract when I do not believe the event will occur. So I would sell to anyone for more than 0 likelihood or price. If I went short at 90 likelihood and end up correct in my prediction, meaning the event does not occur, then I earn 90% of the contract size for each contract. The collected prices at which all users are willing to sell forms the other side of the order book, the selling prices or asks.

Market Orders
When a wager occurs at a price matching one of the buy or sell orders currently available in the market, the order is instantly made complete by matching a long and a short side of a wager. If I want to buy the Fadebook contract at 30 or 35 and there is already a user with a selling price or ask of 32, my wager will be immediately matched. In this sense, my order never actually appears in the prediction market order book for the contract, since the wagers are instantly matched.

Often a user will want to buy or sell at whatever likelihood or price the market is currently offering. With a market order, the user takes whatever happens to be the best price available at the time. Market orders are risky because the user may not know the exact price at which she commits to the wager. Market orders are also said to reduce liquidity in the market, and may be considered less healthy for the prediction market than regular limit orders.
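The matching step can be sketched as a toy model. A real order book tracks sizes, users and multiple resting orders per price level, but the core rule is the same: an incoming bid trades against the best (lowest) ask if their prices cross.

```python
def match_buy(asks, bid_price):
    """Match an incoming buy wager against the best resting ask.

    asks is a list of resting ask prices (whole-number likelihoods).
    Returns the matched price, or None if the bid would rest in the
    book instead. Removes the matched ask from the list in place.
    """
    if not asks:
        return None
    best_ask = min(asks)
    if best_ask <= bid_price:
        asks.remove(best_ask)
        return best_ask  # the wager trades at the resting ask's price
    return None

asks = [32, 40]
match_buy(asks, 35)  # → 32: a bid of 35 crosses the resting ask at 32
```

The selling side is symmetric: an incoming ask trades against the best (highest) resting bid.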

Trading Out Early
The fundamental advantage of prediction markets over traditional oddsmaking is the option for a user to exit a winning or losing wager early, before the contract event’s expiration. A user who is currently long or short a contract is free to trade the position to another party at any point, even if the user has only held the contract for a few minutes. This is advantageous for a user who wants to cut their losses early on a losing contract, or wants to take their profit early on a contract that becomes profitable.

If I went long 5 of the Fadebook contracts at 35 a few months ago, but that contract now has an order book centered around 50 bid/ask, then I can sell the 5 contracts for a 15 profit per contract right now. I do not need to hold my Fadebook contracts until 2013. This is not going short the contract, but selling to a willing buyer in order to net out my position with a profit.

Liquidity & Accurate Predictions
The more users are actively participating in wagering on a contract, the more accurate the collective estimate of likelihood. The most recent trade price or likelihood of a very busy and popular contract is an excellent estimate of the real likelihood of the contract event occurrence. If a single user believes a contract has a 35% likelihood, but a thousand other users are trading that contract around 75% likelihood, chances are the first user is wrong!

Prizes & Incentives
Prediction markets are more powerful when users are incentivized with real compensation. Therefore even if the prediction market’s virtual currency simplifies the regulatory aspects of the project, credits need to be closely tied to real money or prizes, so users assume actual risk when making wagers. Also any prizes need to incentivize users to make careful wagers and not necessarily “swing for the fences” on each wager. In other words, each bit of virtual currency must contribute to winning prizes.

Users who risk nothing of value when making wagers will not turn out to form a particularly accurate prediction market.

Posted in forecasting, market-microstructure, trading | Leave a comment

Sequential Learning Book

Things have been quiet around here since the winter because I have been focusing my modest writing and research skills on a new book for O’Reilly. We signed the contract a few days ago, so now I get to embrace a draconian authorship schedule over the next year. The book is titled Sequential Machine Learning and will focus on data mining techniques that train on petabytes of data. That is, far more training data than can fit in the memory of your entire Hadoop cluster.

Sequential machine learning algorithms do this by guaranteeing constant memory footprint and processing requirements. This ends up being an eyes-wide-open compromise in accuracy and non-linearity for serious scalability. Another perk is trivial out-of-sample model validation. The Vowpal Wabbit project by John Langford will be the book’s featured technology, and John Langford has graciously offered to help out with a foreword. Therefore the book will also serve as a detailed tutorial on using Vowpal Wabbit, for those of us who are more coder or hacker than statistician or academic.

The academic literature often uses the term “online learning” for this approach, but I find that term way too confusing given what startups like Coursera and Khan Academy are doing. (Note the terminology at the end of Hilary Mason’s excellent Bacon talk back in April.) So, resident O’Reilly geekery-evangelist Mike Loukides and I are going to do a bit of trailblazing with the more descriptive term “sequential.” Bear with us.

From the most basic principles, I will build up the foundation for statistical modeling. Several chapters assume statistical natural language processing as a typical use case, so sentiment analysis experts and Twitter miners should also have fun. My readers should know computers but need not be mathematicians. Although I have insisted that the O’Reilly toolstack support a few of my old-school LaTeX formulas…

Posted in book, machine-learning, sequential-machine-learning, vowpal-wabbit | 6 Comments

What is There to Eat Around Here?

Or, why clams are bourgeois — the presence of clams on menus is indicative of a place where people spend a lot of their money on housing. This is how I found out.

We have all played the proportional rent affordability game. How much of my income should I spend on where I live? One rule of thumb is “a third,” so if you take home $2,400 per month you aim to spend about $800 on rent or a mortgage payment. Some play the hypothetical budgeting version of the game. We might pay more of our income for housing if it means being able to live in a particularly desirable area.

Expensive Housing
Here is a map of income normalized by housing expense, for a bunch of Bay Area neighborhoods. This information is from our Altos Research active market real estate data. More technically, each dot on the map represents the ratio of a zipcode’s household income to the weighted average of single family home list prices and multi-family home list prices. I used median numbers, to minimize the impact of foreclosures or extremely wealthy households. Single and multi-family home prices were weighted by listing inventory, so urban condos matter as much as those McMansions in the ‘burbs. The green dots are areas where proportionally more income is spent on housing, and blue dots are the opposite.
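As a back-of-the-envelope sketch, one dot’s value could be computed like this. The function name and inputs are hypothetical stand-ins, not the actual Altos Research schema:

```python
def proportional_housing_expense(median_income, sf_price, mf_price,
                                 sf_inventory, mf_inventory):
    """Ratio of a zipcode's median household income to the
    inventory-weighted average of single-family (sf) and
    multi-family (mf) median list prices."""
    total_inventory = sf_inventory + mf_inventory
    weighted_price = (sf_price * sf_inventory +
                      mf_price * mf_inventory) / total_inventory
    return median_income / weighted_price

# A zipcode with $100k median income, $1M single-family and $500k
# multi-family medians, and equal inventory of each:
proportional_housing_expense(100_000, 1_000_000, 500_000, 800, 800)
```

Lower ratios mean proportionally more income goes to housing (the greener dots); higher ratios are the bluer dots.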

Bay Area Housing Proportional Housing Expense

The data shows that people living in the city of San Francisco spend a much larger proportion of their income on housing than Oaklanders or those in San Jose. If we assume that the real estate market is somewhat efficient, then those who choose to live in certain neighborhoods forgo savings and disposable income. Why is it that housing expenses for living in San Francisco are so much higher than San Jose, even when we control for income disparity?

The Real Estate Menu
Like a proper hack economist, I am going to gloss over the obvious driving factors of proportionally expensive housing, such as poor labor mobility, lack of job opportunities, and a history of minority disenfranchisement. I am a chef by training — culinary arts degree from CHIC, the Le Cordon Bleu school in Chicago — and remain fascinated by the hospitality industry. So instead of diving into big social problems, I focused on something flippant and easy to measure: Where people go out to eat, across areas with different levels of proportional housing expense.

I analyzed the menus of a random selection of 5,400 sit-down and so-called “fast casual” restaurants across the United States. This menu population is hopefully large and diverse enough to represent dining out in general, though it is obviously biased toward those restaurants with the money and gumption to post their menus online. However there is not a disproportionate number of national chain restaurants, since even the most common restaurant, T.G.I. Friday’s, is only about 2.5% of the population:

Restaurant Histogram

Menu Words
The next step in my analysis was counting the common words and phrases across the menus. Here are the top fifty:

1. sauce, 2. chicken, 3. cheese, 4. salad, 5. grilled, 6. served, 7. fresh, 8. tomato, 9. shrimp, 10. roasted, 11. served-with, 12. garlic, 13. cream, 14. red, 15. fried, 16. onions, 17. tomatoes, 18. beef, 19. rice, 20. onion, 21. bacon, 22. topped, 23. mushrooms, 24. topped-with, 25. steak, 26. vinaigrette, 27. spinach, 28. lettuce, 29. pork, 30. green, 31. potatoes, 32. spicy, 33. white, 34. salmon, 35. in-a, 36. soup, 37. peppers, 38. mozzarella, 39. lemon, 40. sweet, 41. with-a, 42. menu, 43. beans, 44. dressing, 45. fries, 46. tuna, 47. black, 48. greens, 49. chocolate, 50. basil

Pervasive ingredients like “chicken” turn up, as do common preparation and plating terms like “sauce” and “topped-with”. Perhaps my next project will be looking at how this list changes over time. For example, words like “fried” were taboo in the ’90s, but more common during this post-9/11 renaissance of honest comfort food. Nowadays chicken can be “fried” again, not necessarily “crispy” or “crunchy”.
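The counting step behind this list can be sketched with Python’s Counter. This assumes naive whitespace tokenization; real menu text needs lowercasing, punctuation stripping and stop-word handling:

```python
from collections import Counter

def top_terms(menus, n=50):
    """Count single words and adjacent two-word bigrams (joined with a
    hyphen, like "topped-with") across a list of menu strings."""
    counts = Counter()
    for menu in menus:
        tokens = menu.lower().split()
        counts.update(tokens)
        counts.update("-".join(pair) for pair in zip(tokens, tokens[1:]))
    return counts.most_common(n)

menus = ["grilled chicken topped with cheese",
         "fried chicken topped with bacon"]
top_terms(menus, 5)  # "chicken", "topped-with" etc. rise to the top
```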

A Tasty Model
Next I trained a statistical model using the menu words and phrases as independent variables. My dependent variable was the proportional housing expense in the restaurant’s zipcode. The model was not meant to be predictive per se, but instead to identify the characteristics of restaurant menus in more desirable areas. The model covers over five thousand restaurants, so menu idiosyncrasy and anecdote should average out. The algorithm used was our bespoke version of least-angle regression with the lasso modification. It trains well on even hundreds of independent variables, and highlights which are most informative. In this case, which of our many menu words and phrases are correlated with proportional housing expense?

Why Clams are Bourgeois

The twenty menu words and phrases most correlated with low proportional housing expense (the bluer dots) areas:

1. tortilla, 2. cream-sauce, 3. red-onion, 4. thai, 5. your-choice, 6. jumbo, 7. crisp, 8. sauce-and, 9. salads, 10. oz, 11. italian, 12. crusted, 13. stuffed, 14. marinara, 15. broccoli, 16. egg, 17. scallops, 18. roast, 19. lemon, 20. bean

Several of these words or phrases are associated with ethnic cuisines (i.e. “thai” and “tortilla”), and others emphasize portion size (i.e. “jumbo” and “oz” for ounce). Restaurants in high proportional housing expense areas (greener dots) tend to include the following words and phrases on their menus:

1. clams, 2. con, 3. organic, 4. mango, 5. tofu, 6. spices, 7. eggplant, 8. tomato-sauce, 9. cooked, 10. artichoke, 11. eggs, 12. toast, 13. roll, 14. day, 15. french-fries, 16. duck, 17. seasonal, 18. oil, 19. steamed, 20. lunch, 21. chips, 22. salsa, 23. baby, 24. arugula, 25. red, 26. braised, 27. grilled, 28. chocolate, 29. avocado, 30. dressing

These words reflect healthier or more expensive food preparation (i.e. “grilled” or “steamed”), as well as more exotic ingredients (i.e. “mango” and “clams”). Also, seasonal and organic menus are associated with high proportional housing expense. The word “con” turns up as a counter-example for Latin American cuisine, as in “con huevos” or “chili con queso”.

Food Crystal Ball
This sort of model for restaurant menus could also be used for forecasting, to statistically predict the sort of food that will be more successful in a particular neighborhood. This predictive power would be bolstered by the fact that the population of menus has a survivorship bias, because failed or struggling restaurants are less likely to post their menus online.

This confirms my suspicion that housing expense is counter-intuitive when it comes to dining out. People who spend more of their income on housing in order to live in a desirable location have less disposable income, but these are the people who pay more for exotic ingredients and more expensive food preparation. Maybe these folks can’t afford to eat in their own neighborhood?

Posted in altos-research, natural-language-processing, politics, real-estate, restaurant-menus, restaurants | 1 Comment