Some Ben? – Page 3 – machine learning, systematic trading, cooking

On Breath Catalogue

Breath Catalogue is a collaborative work by artist/scholars Megan Nicely and Kate Elswit, and data scientist/interaction designer Ben Gimpert, together with composer Daniel Thomas Davis and violist Stephanie Griffin. The project combines choreographic methods with medical technology to externalize breath as experience. Dance artists link breathing and movement patterns in both creation and performance. In Breath Catalogue, the goal is to expand the intrinsic dance connection between breath and gesture by visualizing and making audible the data obtained from the mover’s breath, and inserting this into the choreographic process to make the breath perceptible to the spectator. To do so, they are working with prototypes of breath monitors from the San Francisco-based startup Spire. Following the San Francisco premiere, Katharine Hawthorne interviewed Ben Gimpert to understand the inner workings of the technology interaction.

Katharine Hawthorne: What is the output of the breath sensor (what does it “measure”), and how does this get manipulated or translated into the visualizations?

Ben Gimpert: The sensor measures four things: the diaphragmatic or chest pressure placed on the device, as well as three dimensions of acceleration. These four numbers are sampled about thirty times per second, and then sent over Bluetooth radio to a laptop.

Is there latency in the sensor, in other words, how quickly is information transmitted and processed?

There is very little latency between sampling and receiving the data via Bluetooth on the computer. However, there are lot of complications. First the Bluetooth transmitter in the breath sensor can be easily disrupted or interfered-with by other radio frequency devices. Ironically, a dancer’s body can also block the radio transmitter in the device.

There is also an important but nuanced frame-of-reference problem when using this sort of sensor in performance: The breath sensor does not know the Euclidean origin of the space, what acceleration might occur at point (0, 0). It similarly does not know what is the beginning or end of a breath’s pressure. For this reason, the different breath visualizations avoid working with much memory of a breath. They always work from the difference between this moment’s breath pressure, and the last moment one thirtieth of a second ago. For the mathematically inclined, the viz uses plenty of moving averages and variance statistics. These moving averages give an intentional sort of latency, as Kate or Megan’s movement eases into the visuals.

I am curious about how you chose the specific graphics and visuals used in the piece (the lines and the other projected images).

The famous Joy Division album cover. Smoky particles at a rave in the nineties. The dancers wanting their breath to leave an almost-real residue in the space.

In each case the breath is not visualized literally, because that would be boring. If the pressure sensor has a low reading, suggesting that Kate or Megan is at an inhale, the code might move the frequency blanket imagery in a snapped wave upward. Or invert the breath by sending the neon bars outwards.

Relatedly, how much did you collaborate with the lighting designer on integrating the data visualizations into the overall visual landscape of the performance?

Alan [Willner] was great. He designed the lighting based on videos we sent him of the piece and the visualizations ahead-of-time.

Who is driving the collaboration? Did the dancers/choreographers suggest modes of interaction and then the visuals develop to suit the choreography? Or did the possible visualizations shape the movement landscape?

I have seen a lot of contemporary dance where an often-male technologist projects his video onto usually-female dancers. This is both sloppy politics, and pretty lazy. I wanted there to be a genuine feedback loop between what my code would project in the space, and how Kate and Megan move. So I was in the dance studio with the dancers throughout the creation of the piece.

Can you provide an example of a section where the “movement” led the development and/or a section where the “tech” led? I want to understand this feedback loop better. How was this process different than a traditional dance/tech collaboration?

The tech side of a typical tech/dance collaboration starts with an existing piece of software like MaxMSP or Isadora. The tech person puts together a couple cool looking visualizations, and then brings these along to the studio. In rehearsal, the visualizations are typically put on in the background while the dancers “interpret” or literalize the visualization with their bodies. This produces a lot of great looking stuff, but there is very little feedback going either direction. In Breath Catalogue, we developed a custom piece of software specifically for the piece. This custom approach with a hardware prototype like the sensor and avoided a proprietary (commercial) software dependency. In a very practice-as-research sense, I would often make live changes to the code while in the studio. The Breath Catalogue visualizations run in a web-browser, so it was easy for Kate and Megan to run them outside of the studio. at home. We are planning to release the Breath Catalogue software under an open source license, to support the community. (Some utility is already released on Github.)

A few specific examples of tech/dance collaboration in Breath Catalogue: At one point I was dragging the virtual 3D camera around the frequency blanket visualization (i.e. Joy Division). Kate and Megan asked me to hold at the point when the viz was like a roof above their heads. They developed some movement vocabulary based on this metaphor, and then later I made modifications to the JavaScript code so the roof looked more naturally lit. Another time, on a whim, Kate and Megan noticed that the breath sensor does a heightened job of tracking breath when the dancer is physically against a wall. That was the genesis of the “wall pant” section. My aesthetics run toward grand gestures and the baroque. In general, contemporary dance tends to minimalism and the referential, which nudged the visualizations toward abstract shapes and muted colors.

How much communication occurs between you and the performers throughout the performance?

Quite a bit. The breath sensor was an unpredictable aspect of the performance, but we three did not want to fake it. So we decided to err on adaptivity instead of pre-recording everything, and this meant a lot of thumbs-up & down cues during the transitions which Hope Mohr noticed for her review. Some of our music was cued off of Kate or Megan taking a certain shape, while at other points the dancers were waiting on the sensor’s connection.

There’s a moment in the piece when the Megan takes off the sensor and transfers it to Kate. Is their breath data significantly different? Also, has this moment ever caused any technical difficulties? Does the sensor have to recalibrate to a different body?

Yes, Kate and Megan each have a distinct style of breathing. If you are adventurous, this can be teased out of the breath data we posted online. In this piece, Megan’s breath is usually more staccato and Kate’s sustained. The sensor reconnects at several points, which is technically challenging. In the next iteration of Breath Catalogue, we will be using multiple sensors worn by one or more dancers. The visualization software that I built already supports this, but it is trickier from a hardware standpoint.

In your experience, how much of the data visualizations translate to the audience? How easy is it for an untrained eye to “get” what is going on and understand the connection between the performer’s breathing and the images?

It turns out to be quite difficult. We added a silent and dance-less moment at the beginning of the piece so the audience could understand the dancer’s breath’s direct effect on the viz. Yet, even with that, the most common question I have been asked about my work with Breath Catalogue was about the literal representation of the breath. As contemporary dance audiences, we are accustomed to referential and metaphorical movement. However I think visualizations are still expected to be literal, like an ECG. Or just decorative.

What is your favorite part of the piece?

In the next-to-last scene, the wireless pocket projector was reading live sensor data from the dancer via the attached mobile phone. Which was pretty fucking tough from a technical standpoint. Also the whimsical moment when Kate watches and adjusts her breath according to the baseline of that Police song. And when Megan grabs the pocket project for the film noir, and then bolts.

If you had the time to rework or extend any section, which would it be?

In one scene we remix the live breath data with data from earlier in that evening’s show. I would have made this more obvious to the audience, because it could be a pretty powerful way to connect breath and time passing.

Philanthropy Picks

The great Dinah Sanders does an annual blog post with her election picks, which is incredibly useful for navigating California’s referendum system. In this vein, here is a list of the philanthropies and charities where we donated this December 31st:

Organization for Black Struggle (25%), an old-school post-Black Power organization addressing the asshattery in Ferguson, MO. Open Society Foundations just gave them a lot of money to run with.
Missourians Organizing for Reform and Empowerment (10%), a new-school group addressing the asshattery in Ferguson, MO. Open Society Foundations just gave them a lot of money to run with.
California State Parks Foundation (10%), to protect some of the most beautiful places on Earth.
Doctors Without Borders (25%), big and famous for a reason, dealing with Ebola well.
OneVoice International (15%), supporting a two-state solution in Israel, smartly.
Girls Who Code (15%), young girls need to see serious, non-feminized (“softened”) science as an awesome career.

In previous years some of our favorites were Electronic Frontier Foundation (EFF), Planned Parenthood, and the American Civil Liberties Union (ACLU).

Just Put the Bird in the Fucking Oven

Every year there seems to be some elaborate new Thanksgiving turkey preparation technique. For a while we were all deep-frying the poor things, and our parents once tried putting a can of soda in the cavity. To baste from within, or something. By 2014 we have probably reached peak turducken, but nesting poultry is still a thing. Other tricks like butterflying (spatchcocking) and brining will have their day. These techniques have one thing in common. There is always the one anecdote of success, and a quiet majority that knows turducken was still pretty bland.

Yes the turkey is the focus of the Thanksgiving meal, but that does not mean it should be the focus of our cooking efforts. Look — turkeys are incredibly lean birds. They lack duck’s self-basting fattiness, or a chicken’s mild but distinctive flavor. Instead of endangering your porch or driveway with a dubious single-purpose deep-fryer, just put the bird in the fucking oven. Turkey is always dry, and you should accept the zen of this statement. Focus on your vegetable sides and gravy, and you will have a much better dinner.

Here is how to do Thanksgiving turkey right:

Order an organic, hormone-free, all-natural, free-range, beer-fed, daily-massaged, Wagyu, Angus turkey from a farm in Portland. Or do whatever is your closest approximation. Preferably he’s named Colin. (Yes, all eating turkeys are male, because the females lay eggs. Duh.) This might be the only decision that will actually matter for the bird’s taste and juiciness. Avoid a bird that has been frozen. Order about a pound of bird per person at your dinner, adjusting for kids and vegetarians.
Preheat your over to 450 degrees fahrenheit, or whatever that is in Europe.
Prepare a little bowl of seasoning. I like kosher salt, lots of cracked black pepper, minty and citrusy dried herbs like marjoram, and a pinch of sugar. You want several tablespoons of seasoning mix.
Wash the bird inside and out, removing the giblets (offal) inside. Yes, washing poultry may get food-borne nasties like Salmonella all over your kitchen. That is why you have paper towels and a disinfectant handy. Also make sure the bird is fully plucked. An old pair of dull tweezers can help. The more hippie your bird (see #1 above), the more likely it is to have some lingering feathers.
Dry the bird with paper towels. Brush him with melted butter, and then sprinkle all over with your seasoning mix.
Turn the bird upside down on your roasting pan. This bastes the dry breasts with the meagre fat that is in the bird. Oh, and the butter. Butcher-tie the legs and wings close to the body, if you are feeling fancy.
Just put the bird in the fucking oven.
After about a half-hour, or whenever the bird gets brown, turn your oven down to 325 degrees. Then after about three hours more, check the temperature inside a thigh. You want at least 165 degrees fahrenheit, but remember the bird will continue to carry-over cook a bit after you pull it out of the oven. Do not baste the bird, since this loses the heat in the oven and does not help much anyway. Do not open the oven to peek and smell and fret every ten minutes, even if your guests have arrived. Do not cover just the right breast with aluminum foil, and do not stuff the bird. It will all work out, I promise.

The Gravy to End All Gravies

I have been proposed marriage by men and women both, for my gravy. Get some chicken stock and a glassful of sherry boiling in a sauce pain. Add the turkey giblets. Turkey kidneys for the win! If you have some mirepoix chopped-up (onions, carrots and celery), toss them in the pan. Simmer for 45 minutes-or-so.

Since you have been smart and not bothered basting the turkey (right?), your roasting pan probably has a bunch of browned juices and fat. This is fond, the nectar of the gods. Strain your simmering stock right into the roasting pan. Scrape all that lovely fond up into the liquid, with a wooden spoon. (If you do not have wooden cooking spoons, you are a bad person and will always be a failure as a cook.) Return the liquid to your sauce pan, and simmer for about 20 more minutes, then strain again into a new saucepan.

Make a roux in a non-stick pan on the side. I use bread flour and whole, unsalted butter in approximately equal portions by volume. (Don’t overthink this.) Stir the roux as your butter melts. If you want to feel southern, let the roux brown a little bit. Whisk the roux into your simmering stock, and boil for ten minutes to thicken. Add some lemon juice and a ton of salt. If the gravy does not taste right, add more salt. If it still does not taste right, add more salt.

I can hear you asking about the cornstarch… Remember that part about making the best gravy ever? This requires butter, as all good things do. Compared to the glory that is roux, cornstarch is weak sauce.

Employee Founding

Am I a cofounder or an employee?

There is prestige to having been a cofounder of a startup, someone who was there from the beginning taking the lifestyle risk in return for the possibility of striking gold and changing the world. Now with that breathless sentence out of the way, how do you know if you are a founder or an employee? To me there are four key questions to answer:

Is the startup funded externally, from an outside entity like a venture or seed fund? This would be someone without huge sunk costs choosing to hand over money, in exchange debt or equity and upside in the startup’s future.
Is the startup selling to businesses (“enterprise”), and does the venture have a paying client-or-two outside of the Silicon Valley scene? Consulting for your buddy’s startup does not count.
Is the startup selling to consumers, and have consumers written checks or swiped their credit cards for actual money? Tons of freemium traction does not count.
Are you working part-time on something else simultaneously? If you spend every Tuesday and Thursday working as a barista to pay the bills, you are not full-time.

If the answer to any of the three is “yes,” then you are probably an employee and not a founder or cofounder, de facto or otherwise.

Outside Ukulele

A model for the POU, probability -of- ukulele.

The Outside Lands 2014 lineup looks to be one of the best in years, and as usual it will be difficult to decide which stage to watch over the weekend. To help, I wrote an NLP model that measures the degree to which a band is likely to lapse into entitlement and self-parody. So think of it as a musical spectrum, from Kanye West to Death Cab for Cutie.

Kanye West
Flume
Paolo Nutini
Ben Howard
Watsky
Ray LaMontagne
Duck Sauce
Jonathan Wilson
Run the Jewels
Jagwar Ma
Tiësto
Big Freedia
Lykke Li
The Brothers Comatose
Kacey Musgraves
Valerie June
Atmosphere
Tycho
Macklemore & Ryan Lewis
Tom Petty & the Heartbreakers
Tegan & Sara
Haim
Bleachers
Holy Ghost!
Christopher Owens
Dum Dum Girls
Lucius
Gold Panda
Courtney Barnett
Vance Joy
Bear Hands
RayLand Baxter
Gardens & Villa
Imelda May
Mikal Cronin
Finish Ticket
Tumbleweed Wanderers
Boys Noize
SBTRKT
The Kooks
The Flaming Lips
The Killers
Grouplove
Warpaint
Arctic Monkeys
Chromeo
Typhoon
Chvrches
Capital Cities
Local Natives
John Butler Trio
Deer Tick
Greensky Bluegrass
Woods
Tedeschi Trucks Band
The Soul Rebels
The Districts
Nicki Bluhm and The Gramblers
Spoon
Jenny Lewis
Phosphorescent
Cut Copy
Night Terrors of 1927
Givers
Disclosure
Death Cab For Cutie

I Program in Whatever

A friend just asked me how to get better at JavaScript, the programming language du jour for Silicon Valley gigs. Or more generally, whether “practice” is the way to overcome learning barriers in programming.

The short answer is, indeed, you just need to practice. The great Peter Norvig says becoming a good coder takes ten years. Bright people heeding good advice can slash these ten thousand required hours quite a bit.

Types of Language
Though if I were you, I would start with separating the learning of a particular programming language from becoming a good programmer. This is one of the trickiest concepts for people coming into programming from another field. The doing of computer science and software engineering has very little in common with the syntax or standard library of a particular programming language, JavaScript or otherwise. Sapir-Whorf be damned, but a programming language is just a tool while a (real) language is a way to communicate. Would you say becoming a radiologist is the same thing as learning to use an x-ray machine? Are statistics and Excel the same thing?

The infamous ThoughtWorks interview process for software engineers that I went through ages ago had almost no questions about standard libraries or syntax. (“How do you close a socket in C?”) No one cares, because you can always look that up in a book. Instead most of the questions were about abstraction, with a few here & there about algorithms. (“How would you cleanup the coupling between this infovis module and the database?”)

Good programmers learn new languages trivially, because they all have the same underpinnings. I find it helpful to think of three schools of programming language now-a-days. The first are the aspiring or popular languages like JavaScript, Go, Ruby, Python, Java, C#, C++ and C. These languages all have their imperative syntactic roots in ALGOL from the 1950’s. The languages are heavy on the syntax, and try to stop programmers from shooting themselves in the foot.

The next school of programming languages are the lower-case-el lisps like Scheme and Clojure. The most important distinction of a lisp is its homoiconicity, a cumbersome term that means you write code in a data structure the programming language is good at manipulating. Paul Graham of Y Combinator is a famous proponent of coding in lisps. They are more powerful and expressive than the popular languages, so it is easier for a good Scheme programmer to pick up Python than the opposite. Even Go’s statically-linked by default killer feature was common in the Lisp and Smalltalk communities thirty years ago.

My third class of programming languages are the functional languages providing different degrees of type safety, like Haskell, OCaml and Erlang. These languages discourage state and side-effects, and by doing so help code run across many CPUs or machines. Functional languages are also about code that is provably correct. This school of programming language is (arguably) more expressive and powerful than even the lisps, so a Haskell hacker should be able to pick up Clojure more easily than the Clojure programmer could learn OCaml.

I intentionally avoid classifying programming languages according to their object oriented-ness, since OOP is just another way to generalize and abstract the coupling between different parts of a software system. You can do object-oriented programming in any language, but languages like Java and Ruby force the issue. (Yes, you can write object-oriented systems in old school C.) Don’t bother with domain-specific languages like SQL, Matlab or R, since they encourage bad habits and are easy to learn later. Nothing is scarier than a R or Python programmer who has never written any lisp.

If you are trying to become a better programmer, the best thing you can do is learn the underlying history and structure of all three schools of programming language. “All of this has happened before, and all of this will happen again.” However the closer your learning language is to the third school, the quicker you will start to understand the core of the matter.

A Little More Advice
What side projects are you helping code? If the answer is “I just program at work” or “I just read a lot of code on Github,” then you will never be a great coder. The advice Hilary Mason got on Twitter a while back was iffy in this regard.

Have you worked through the amazing SICP book yet? There is a reason it was MIT’s main textbook for a zillion years. The book seems to be a cultural signal or marker of good coders. Others think the Van Roy & Haridi book is better than SICP, but the writing style is really dry.

New Sentiment Dataset

The good folks in Stanford’s Natural Language Processing Group have built a powerful new dataset for a paper being presented at the EMNLP conference in Seattle next month. The underlying foundation of the dataset is not particularly exciting, being yet another corpus of labeled movie reviews: The review sentence “Stealing Harvard doesn’t care about cleverness, wit or any other kind of intelligent humor” is provided along with its negative sentiment label, for example. What is more interesting is the corpus providing sentiment labels at every level of composition. So for the same sentence, the dataset also provides a distinct sentiment label for the sub-phrase “any other kind of intelligent humor” which is actually positive. Hence the dataset is a treebank, not just your typical corpus. A lot of Mechanical Turk wrangling went into this! This compositional and recursive labeling is a great resource for training contextual models, especially ones that go beyond the bag-of-words legacy.

Here at Trending we are experimenting with an online, regularized, high-dimensional linear approximation to the Stanford paper’s tensor RNN model, one that lets us use the whole Vowpal Wabbit stack. Next month they plan to release some (Matlab) code to parse the treebank, but have already released the data itself. Therefore I put together a simple Ruby module to parse the treebank, for your own statistical NLP, sentiment and machine learning projects. It includes a bit of Graphviz logic to render phrase trees and their sentiment as SVG:

The module is hosted on Github at “http://github.com/someben/treebank/” under a nice Free license.

Separated by the Same Language

Some snarky, some important advice about America and England.

About ten years ago, I moved from Chicago to London for grad school. I intended to spend a few years in the United Kingdom, but my best laid plans saw me there for about five years. This is a much longer span of time than the typical study abroad or a backpacker’s tour. This summer I returned to England for an extended visit and observation. Time has clarified some non-intuitive quirks I didn’t know I had learned while living here. So a list for future expats, tourists and the curious:

London dominates English culture, far more than New York or Los Angeles dominates American. It is the largest city in the European Union, sprawling bigger than Paris or Rome, and probably the most diverse. On the ground, London fashion leads New York and Los Angeles by a few years. Yes, even New York City. Really.
The most bureaucratic aspect of a very bureaucratic country is consumer banking. Everything about English checking accounts, ATMs and credits cards is mind-boggleingly difficult, inefficient and wasteful. Things are still mostly done on paper, with proofs of residence, reference letters and other signs of class being the necessity. Plan to spend literally ten times the amount of effort screwing around with English banks as you would in America.
The opposite is true of The Internet. When it comes to healthy competition among mobile phone providers and ISPs, England is incredibly high-tech. This is probably because England is geographically small and wealthy. So pay-as-you-go plans with dumb phones are convenient and dirt cheap, and getting fiber optic broadband to your flat is trivial.
The English are far more sensitive to class than Americans, especially around verbal accents. People in England can be extremely wealthy but still “low class,” and vice-versa. Differentiating wealth from class is probably the most alien aspect of English culture, for Americans. My favorite breakfast place in Bristol has a reputation for being posh (a.k.a. high class), but is actually less expensive than most supposedly bohemian hipsteraunts in the city. The English are more likely to “unlearn” a low-class accent, and Americans mistakenly think splashing a lot of cash guarantees privilege.
Restaurant servers in England rely less on tips for their income, which makes the service either atrocious, or more honest — depending on your politics. American-style tipping is becoming more common in England, but still the exception. Go with 10% atop the bill if you had good service, otherwise keep the change. You always have the right to dispute any gratuity automatically included in a bill. Do not tip if you pick up a round of drinks at the bar.
Speaking of which, English drinkers take turns buying full rounds of drinks for the group. This is good etiquette, and something Americans should take up. The English will notice if you never happen to run for a round, and you will get a bad reputation. Americans think of themselves as heavy drinkers, but we are actually more teetotaling than the English.
“In America a hundred years is a long time, and in England a hundred miles is a long way.” Because English culture is so old and the country so densely populated, there is a lot of diversity even between neighboring towns. Driving a couple hours for a visit is nothing to an American, but can baffle an English person.
Most English do a good job of differentiating American politics from the American people, even if we do elect those goofballs in DC. Politically speaking, our country is seen as an isolationist and violent bully. But culturally, everyone loves our hip hop and big-budget movies.
The English are as likely to think of themselves as European as not, so membership in the EU is a constant point of political tension here. The English are a bridge between the New and Old Worlds. The snarky newspaper headline is “Fog in the English Channel: Europe Cut-off!”
Being invited into an English home for a meal, tea or supper is a big deal, more so than in America. Take it as flattery and bring a bottle of wine.
The English can hate their (elected) government, but still love their country. This is one surprising upside of still having a monarch. Americans who hate their elected leaders are more likely to be seen as “unpatriotic.”
Taxes in the UK are actually not that much higher than in the US, despite what American politicians imply. My nominal tax rate as an evil banker in London was only a few percent more than it was working in Chicago. The English love to hate on the National Health Service (NHS), but it does a decent job of providing widely-accessible health care. There is a parallel private health care system for the wealthy, which is much more American in style. Most English see health care as a civil right like suffrage, unlike Americans who usually see health care as an expense.
That said, the English are not necessarily more healthy than Americans, but they are definitely thinner. You can usually spot the American tourist by their weight and the fact that they do not smoke.
The geography is confusing but easy to memorize. Britain or Great Britain is the large island off the coast of Europe. It contains the countries of England, Wales and Scotland. So the Scottish are British, but definitely not English! However the United Kingdom includes Northern Ireland, which is not (Great-) British. Sometimes the UK is represented as a whole (i.e. at The Olympics), while at other times the individual countries in the UK matter (i.e. soccer). The UK flag (the Union Jack) is an overlay of the English, Scottish and old Irish flags. The English flag is about St. George the dragon slayer, and looks like a red cross on white.
The English are a pretty secular people. They are not necessarily atheists, but religion is just not that big of a deal.
Beer is the only inexpensive thing in England. Well, maybe eggs and milk in the grocery store also. The best and most traditional beer is the hand-pulled sort you find at a pub. Start with these bitters, and then try the bright, alcohol-heavy and bubbly lagers. Timothy Taylor’s Landlord is a fine example. (Most Americans only ever drink lager or the occasional stout like Guinness.) Yes English beer is served warmer than American, but the English weather is cooler too. Cocktails in England usually mean carefully measured 25ml shots, leading an English friend to flatter America as the “land of the free-pour.”
The best fish & chips is not found in pubs, but in dedicated shops called chippies. To find a chippy, look for counter service, paper-wrapped fries and a small menu. Good fish & chips -fish has a tasty, crispy batter around surprisingly delicate fish. Greasy fish inside is not good fish & chips -fish. Examples are the Fryer’s Delight on Theobald’s Road in Bloomsbury in London, and Fish Lovers on Whiteladies Road [sic] in Bristol.
The solution to late-night, drunk munchies in England is your Middle Eastern kebab shop. Mayonnaise-heavy garlic sauce on your chips is a must, especially after a few pints.
Talk is of “the pub” as if there is only one, but this is just a quirk of language. There is not a place called The Pub, or ever just one pub in an area. You just say “meet me at the pub.” Similarly, English folks will refer to “my local [pub].”
The weather in England is grey and wet, but actually very mild. This is because of the North Sea jet stream, even though the island is on latitude with Scandinavia. Despite the Dickens novels, snow is rare here. And compared with America, there are very few bugs and insects. There have been people living in every part of England “forever,” so there is very little actual wilderness even though the countryside is green and pretty. The high latitude also means very dark winters, and long summer days. There is nothing like leaving the pub at nine o’clock in August while there is still plenty of sunshine.
Americans are terrible with European and British geography, but the English are just as bad with ours. When I mention my hometown of Chicago to many English, they presume it is near the East Coast because of movies with skyscrapers and organized crime. Explaining that Chicago is a seventeen hour drive from New York City usually stuns the table… Two friends from Barcelona and the Black Forest in Germany actually grew up closer to each other than my wife and I, from Manhattan and Chicagoland.
Traditional businesses in England have flaky and frustrating hours, especially as an American used to working from nine to five, and running errands outside of this window. While I lived in England, pubs were granted more flexible hours (2005) and smoking was banned (2007). So thankfully pubs are no longer required to close early and go lock-in.

What is a Promise Worth?

How do you prevent hyperinflation without destroying the economy? The answer ain’t Bitcoin.

A virtual currency like Bitcoin uses a decentralized proof-of-work ledger (the block chain) to solve the the double-spending problem. “Satoshi Nakamoto” deserve serious accolades for this clever architecture, but Bitcoin has a few serious problems. The first is its lack of security. The infrastructure around the currency is shoddy and fragile. The website where 80% of Bitcoin trading currently occurs is called the Magic: The Gathering Online Exchange (a.k.a. Mt.Gox). Recently Mt.Gox has crashed and been cracked, and does not support easy shorting. More importantly, the Bitcoin system may never mature without a central authority spending a lot of (traditional) money to build-out the infrastructure, with negligible or negative financial return-on-investment. Without a social program, in other words.

Even if Bitcoins did have the infrastructure and liquidity of a traditional currency like U.S. dollars or Japanese yen, there is another more fundamental problem with Bitcoin becoming the money of the future. Bitcoins are intrinsically deflationary.

The future will always be in one of two states: Either Bitcoin miners are running up against the limits of Moore’s Law, and are unable to profitably mine new Bitcoins. Or some bullshit singularity has occurred, giving us all access to infinite computational power. In this state, we would run up against the Bitcoin architecture’s hard-coded monetary supply cap of twenty-one million Bitcoins.

If human desire is infinite, then people will always want more money for goods and services. (All else equal, of course!) So we have an intrinsically fixed supply of a fungible good along with increasing demand. Therefore a Bitcoin is guaranteed to increase in value over time. Any fraction of a Bitcoin is guaranteed to increase in value over time. This may sound good if you happen to have a lot of BTC (Bitcoin) in your wallet. However at a macroeconomic level deflation is catastrophic, which I will explain.

A Hamburger on Tuesday
Would you trade something today that is certain to be worth more tomorrow? What about if the “something” is a currency, a good that has no intrinsic value other than it being money? (You cannot heat your house with the digital dollars in your checking account. Gotta pay the utility company first.) In an emergency you might spend your deflating currency, but in general you should hold onto your BTC as long as possible. And since there is uncertainty about the degree to which Bitcoin will deflate, the market will not instantly price BTC correctly. The BTC price of goods and services will not instantly adjust to match the level of computational power available to miners.

Some Bitcoin proponents think we can instantly discount the BTC price of all goods and services to sync-up with systematic BTC deflation, but this would need a seriously high-tech payment infrastructure. Square and Stripe are trying, but does anyone seriously believe the prices of all goods and services can be discounted in real-time by a macroeconomic indicator? We can’t even ditch the wasteful dollar bill!

The Bitcoin bulls also emphasize a currency’s dual role as a means of transaction and a store-of-value, but intrinsic deflation trashes both roles simultaneously. As a means of transaction, deflation makes allocating capital (money) across projects and activities difficult, and again, requires that perfect payment infrastructure. Since systematic deflation destroys every asset’s value and discourages economic activity, deflationary currencies do badly as stores-of-value. Less economic activity means GDP contraction and decreased livelihood. Yes, despite what Professor von Nimby may have spewed in your Postmodern Marxist Studies class, GDP is a very strong indicator for overall human happiness. Perpetual economic contraction makes your savings account irrelevant. You might have a zillion super-valuable BTC in your digital wallet, but you have nothing to spend them on. In other words, if you think (hyper-) inflation is bad, deflation is even worse…

Passing Notes
Let us go back to a few of the original Bitcoin goals. Bitcoin proponents want an efficient, liquid currency immune from the distortion caused by a government or central bank’s monetary policy. This is reasonable since inflationary monetary policy has a sad history of trashing peoples’ savings accounts, in places like the Weimar Republic or more recently in Argentina. So how can we build the decentralized, non-deflationary currency of the future?

Notes are an ancient monetary concept desperate for rethinking in the Internet age. At its most basic level, a note is a promise to exchange money, goods or services at some point in the future. However a note is not quite a futures contract, because the promise need not ever be exercised. And a note is not really an options contract, because a note need not ever expire. The most obvious form of a note is what a U.S. dollar bill used to represent when we were on the gold standard. It was a promise that the holder of the note (dollar bill) could exchange the note for a dollar’s worth of physical gold at any time. Notes are a lot easier to store and deal with than gold, and so they make a lot of sense for getting work done efficiently. We could also talk about the fungibility of notes, but that is less important at this point. And notes are definitely easier to move around than loaves of bread, head of cattle, barrels of oil, or other physical stuff with intrinsic value.

A hoard of notes would also be a decent store-of-value in your savings account, as long as the writer of the notes remains solvent and trusted. For example, a million dollars worth of U.S. gold-convertible notes is a great retirement nest-egg, since most normal people expect the U.S. government to honor its promises for a long time.

When the entity writing the note is trusted by just about everyone — expected to honor its contract — then the writer can declare the notes to be unconvertible, all at once. The notes become fiat currency, currency that is not explicitly backed by anything but the trust that the note writer will not issue too many notes and inflate away peoples’ savings.

Why does most global economic activity happen using a handful of fiat currencies, like the U.S. dollar or Euro? Nations have traditionally supported their (fiat) currencies through policy and war, because before the Internet trust did not scale. Imagine a small town. Mel and Stannis are neighbors in this town. Mel trusts Stannis to honor his promises, and accepts a note from Stannis in return for mowing Stannis’s lawn for the next year. Stannis’s note he writes for Mel says something like “Stannis promises to give the bearer of this note 100 loaves of bread, anytime.” Mel’s landlord Dave also trusts Stannis, and so he has no problem taking Mel’s note as rent. Stannis has essentially printed his own money that is a lot more convenient that baking 100 loaves of bread. Now in the next town over, no one really knows Stannis. Therefore Dave will have a hard time making use of Stannis’s note when he visits there to spend time with his grandparents. Dave and Mel trust Stannis, but the people living in the next town over do not.

In this parochial example, trust has not scaled across the network of transactions and relationships. The money Stannis created, the note he wrote, is not all that useful to Mel. Instead she could insist on being compensated by a note from an entity more trusted the world over, say the First Bank of Lannister which has a branch in both towns. Mel, Stannis, Dave and his grandparents all probably trust the First Bank of Lannister to pay its debts.

If Dave wants to spend Mel’s note written by Stannis in the next town over, he can ask a third party to guarantee or sign-off on the note. This can be done by exchanging Stannis’s promise for a promise by the First Bank of Lannister, which is more trusted throughout the realm. The First Bank of Lannister would be compensated for extending its trust by taking a cut of the promise from Stannis.

So before he leaves on his trip, Dave takes his rent check (note) from Mel into the First Bank of Lannister. They write a new note saying “The First Bank of Lannister promises to give the bearer of this note 95 loaves of bread, anytime” and gives this note to Dave in exchange for the note written by Stannis. The bank has decided to take responsibility for chasing down Stannis if he turns out to reneg on his promise, and in return they are compensated with the value of five loaves of bread. Here the Bank of Lannister has also issued its own currency, but more as a middle-man than someone doing economic activity like Mel’s lawnmowing or Dave’s landlording.

This middle-man role is very important but also difficult to scale across a physical economy. Eventually someone refuses to trust the First Bank of Lannister, and then the chain of economic activity halts. This is why the world’s global economy has consolidated onto a few currencies, for reasons of both efficiency and trust.

The Internets
In the age of the Internet and pervasive social networks like Facebook and Linkedin, everyone is connected in a global network. This is the famous degrees -of- Kevin Bacon or Erdös Number concept. Any two people are connected by just a few steps along the network. Most of Stannis’s friends on Facebook would be willing to accept a note or promise from Stannis, and the same holds true for Dave, Mel and the First Bank of Lannister’s social networks. Since the whole of humanity is probably connected in a trust network, software can automatically write those middle-man notes along the chain of connections. Therefore any two people can automatically find a chain of trust for spending money.

Back to our example, but in the age of the Internet. Mel, Dave and Stannis all trust each other, since they are Linkedin contacts. Peter reneged on a note a few months ago, so no one really trusts Peter except Stannis. Everyone unfriended Peter but Stannis, so Peter has a very isolated social network. This time around we do not need to care about geography and small towns, since everyone is connected via the Internet and social networks. Let’s say Peter wants to buy an old iPad from Dave, and Dave thinks the iPad is worth about a hundred loaves of bread. Peter could try to write a note promising a hundred loaves of bread, but Dave would not accept this note since he does not trust Peter. Now for the cool part.

Peter goes to a notes exchange website (NoteEx), and asks for a hundred-loaf note that Dave will trust. The website knows that Stannis trusts Peter, and that Dave trusts Stannis. (See the triangle?) Through the website, Stannis writes Peter a note for one hundred loaves of bread that Peter gives to Dave in exchange for the iPad. Dave has a note he trusts in exchange for his good, at the price he wanted. Similarly Stannis receives a note written by Peter, whom he trusts. This note might be for 105 loaves of bread, giving Stannis a little cut in exchange for trusting the dodgy Peter. This five loaf interest, cut or edge is Stannis’s compensation as a middle-man.

This can all be done automatically by the NoteEx server with a list of middle-men volunteers. People volunteer to be middle-men up to a maximum amount of exposure or risk (i.e. one thousand loaves of bread total). Or middle-men could even offer to guarantee up to two degrees of Kevin Bacon away, for a much higher cut. After a bunch of people volunteer to be middle-men in the NoteEx process, all economic activity could be subsumed, with social networks ensuring that you only ever receive payment (promises) from people you trust. A NoteEx transaction could have more than one middle-man, up to the six degrees of Kevin Bacon maximum that we assume connects all people.

Ironically, the good or service underlying the notes is not all that important, since notes are very rarely redeemed. In the same way that powerful governments can support fiat currencies backed by nothing, fiat notes backed by loaves of bread will not actually turn everyone into a baker. Usually notes are exchanged with their value being the trusted promise, but not necessarily the realization. Heavy stuff here.

Decentralized Bakery
The NoteEx website would be built atop an open and standard protocol, and competing notes exchanges could borrow from the Bitcoin architecture to be decentralized (i.e. the shared ledger). More importantly, there would be a natural level of inflation in the system as the cuts or interest that middle-men demand increase the total value of all promises across the economy. And of course, notes are an excellent store-of-value because who would you trust more to support you in an emergency or retirement than your tightest friends & family?

So! We have a theoretical monetary system free from government interference, and one that encourages economic activity through modest and natural inflation.

Hashing Language

How do you build a language model with a million dimensions?

The so-called “hashing trick” is a programming technique frequently used in statistical natural language processing for dimensionality reduction. The trick is so elegant and powerful that it would have warranted a Turing Award, if the first person to use the trick understood its power. John Langford cites a paper by George Forman & Evan Kirshenbaum from 2008 that uses the hashing trick, but it may have been discovered even earlier.^[1] ^[2] Surprisingly most online tutorials and explanations of the hashing trick gloss over the main insights or get buried in notation. At the time of this writing, the Wikipedia entry on the hashing trick contains blatant errors.^[3] Hence this post.

Hash, Man

A hash function is a programming routine that translates arbitrary data into a numeric representation. Hash functions are convenient, and useful for a variety of different purposes such as lookup tables (dictionaries) and cryptography, in addition to our hashing trick. An example of a (poor) hash function would map the letter “a” to 1, “b” to 2, “c” to 3 and so on, up to “z” being 26 — and then sum up the numbers represented by the letters. For the Benjamin Franklin quote “beware the hobby that eats” we get the following hash function output:

(beware) 2 + 5 + 23 + 1 + 18 + 5 +

(the) 20 + 8 + 5 +

(hobby) 8 + 15 + 2 + 2 + 25 +

(that) 20 + 8 + 1 + 20 +

(eats) 5 + 1 + 20 + 19

= 233

Any serious hashing function will limit the range of numbers it outputs. The hashing function we used on Benjamin Franklin could simply take the first two digits of its sum, the “modulo 100” in programming terms, and provide that lower number as its output. So in this case, the number 233 would be lopped-off, and the hash function would return just 33. We have a blunt quantitative representation or mapping of the input that is hopefully useful in a statistical model. The range of this hashing function is therefore 100 values, 0 to 99.

Now a big reason to choose one hashing function over another is the statistical distribution of the output across the function’s range, or uniformity. If you imagine feeding in a random quote, music lyric, blog post or tweet into a good hashing function, the chance of the output being any specific value in the range should be the same as every other possible output. For our hashing function with a 0-99 range, the number 15 should be output about 1% of the time, just like every other number between 0 and 99. Note that our letter-summing hash function above does not have good uniformity, and so you should not use it in the wild. As an aside, keep in mind that certain hash functions are more uniform on bigger input data, or vice-versa.

Another reason to favor one hashing function over another is whether or not a small change in the input produces a big change in the output. I call this concept cascading. If we tweak the Benjamin Franklin quote a little bit and feed “beware the hobby that bats” into our silly hash function, the sum is now 230, which gets lopped-off to 30 within the hash’s output range. This modest change in output from 33 or 30 is another sign that our toy hash function is indeed just a toy. A small change in the input data did not cascade into a big change in the output number.

Here the important point is that a good hashing function will translate your input into each number in its output range with same probability (uniformity), and a small change in your input data will cause a big change in the output (cascading).

That’s Just Zipf-y

In human languages, very few words are used very frequently while very many words are very rare. For example, the word “very” turns up more than the word “rosebud” in this post. This relationship between word and frequency is very convex, non-linear or curved. This means that the 25th most common word in the English language (“from”) is not just used a little more frequently than the 26th most common word (“they”), but much more than the lower ranked word (26th).

This distribution of words is called Zipf’s Law. If you choose a random word from a random page in the Oxford English Dictionary, chances are that word will be used very rarely in your data. Similarly if you were to choose two words from the OED, chances are both of those words will not be common.

The Trick

If you are doing “bag-of-words” statistical modeling on a large corpus of English documents, it is easy find yourself accommodating thousands or millions of distinct words or ngrams. For example the classic 20 newsgroup corpus from Ken Lang contains over 61,000 different single words, and exponentially more two-word bigrams. Training a traditional statistical model with 61,000 independent variables or dimensions is computationally expensive, to say the least. We can slash the dimensionality of a bag-of-words model by applying Zipf’s Law and using a decent hashing function.

First we identify a hashing function with an output range that matches the dimensionality we wish the data had. Our silly hashing function above output a number from 0 to 99, so its range is 100. Using this function with the hashing trick means our statistical bag-of-words model will have a dimensionality of 100. Practically speaking we usually sit atop an existing high-quality hashing function, and use just a few of the least significant bits of the output. And for computational reasons, we usually choose a power of two as our hash function output range and desired dimensionality, so lopping-off the most significant bits can be done with a fast bitwise AND.

Then we run every word or ngram in the training data through our adapted hashing function. The output of the hash becomes our feature, a column index or dimension number. So if we choose 2⁸ (two -to-the-power-of- eight) as our hashing function’s range and the next ngram has a hash of 23, then we set our 23rd independent variable to the frequency count (or whatever) of that word. If the next hash is the number 258, we map to the output 3 at the bit level for the third dimension, or 258 = 255 + 3 = 255 + (258 MOD 255) more mathematically. Our statistical NLP model of the 20 newsgroup corpus suddenly goes from 61,000 to only 256 dimensions.

Wait a Sec’…!

Hold on, that cannot possibly work… If we use the numeric hash of a word, phrase or ngram as an index into our training data matrix, we are going to run into too many dangerous hash collisions, right?

A hash collision occurs when two different inputs hash to the same output number. Though remember that since we are using a good hashing function, the uniformity and cascading properties make the chance of a hash collision between any two words independent of how frequently that word is used. Read that last sentence again, because it is a big one.

The pair of words “from” & “rosebud” and “from” & “they” each have the same chance of hash collision, even though the frequency with which the four words turn up in English is varied. Any pair of words chosen at random from the OED has the same chance of hash collision. However Zipf’s Law says that if you choose any two words randomly from the OED, chances are one of the words will be very rare in any corpus of English language documents. Actually both words will probably be infrequent. Therefore if a collision in our hash function’s output occurs, the two colliding words are probably oddballs.

Two Reasons it Still Works

Statistical NLP bag-of-words models that use the hashing trick have roughly the same accuracy as models that operate on the full bag-of-words dimensionality. There are two reasons why hash collisions in the low-dimensional space of the hash function’s output range do not trash our models. First any collisions that do occur, probably occur between two rare words. In many models, rare words do not improve the model’s regression / classification accuracy and robustness. Rare words and ngrams are said to be non-discriminatory. Now even if rare words are discriminatory in your problem domain, probability suggests the rare words do not co-occur in the same document. For this reason, the two rare words can be thought of as “sharing” the same representation in the model, whether this is decision tree sub-trees or a coefficient in a linear model. The Forman & Kirshenbaum paper says “a colliding hash is either unlikely to be selected as a feature (since both words are infrequent) or will almost always represent the word that led the classifier to select it.”

We cannot use the hashing trick for dimensionality reduction in every statistical model. Zipf’s Law means most features or independent variables in a bag-of-words representation equal zero. In other words, a point in the dimensional space of the bag-of-words (a “word vector”) is generally sparse. Along these lines, John Langford says the hashing trick “preserves sparsity.” For a random specific word, the chance of two random examples both having a non-zero value for that feature is low. Again this is because most words are rare.

The hashing trick is Zipf’s Law coupled with the uniformity & cascading properties of a good hash function, and using these to reduce the dimensionality of a sparse bag-of-words NLP model.

Notes

[1] Actually the first public version of the hashing trick John Langford knew of was in the first release of Vowpal Wabbit in back in 2007. He also points out that the hashing trick enables very efficient quadratic features to be added to a model.

[2] Jeshua Bratman pointed out that the Sutton & Barto classic textbook on reinforcement learning mentions the hashing-trick way back in 1998. This is the earliest reference I have yet found.

[3] “the number of buckets [hashing function output range] M usually exceeds the vocabulary size significantly” from the “Hashing-Trick” Wikipedia entry, retrieved on 2013-01-29.