Reforming Retail

What’s My Data Worth: A History of Restaurant Data Markets (Part 1)

This is part 1 of a 2-part series.

This industry wrongly assumes that there are myriad buyers of restaurant and retail data, and that whatever data they can find, in whatever format and scale, is instantly transformed into a Scrooge McDuck vault.

This couldn’t be further from the truth.

Since we’re tired of explaining how these data markets work, what follows is a relatively complete, modern history of on-premise data markets in food and beverage. (Note: on-premise is the colloquial usage, even though on-premises is the grammatically correct form. It’s the term used for offline brick-and-mortar sales that don’t occur in big-box grocery and mass merchandising; those channels are known as off-premise.)

The first attempt to monetize on-premise data was Gazelle Systems Inc., a dot-com era startup founded in 1999 and shuttered in 2003. The company developed a suite of customer-facing applications (loyalty, gift, online ordering, etc.) designed to be integrated with that era’s POS systems (you guessed it, a strategy doomed to failure), giving it access to full transaction-level detail. Gazelle was approached by IRI, which along with Nielsen remains one of the two biggest players in so-called syndicated store data, with the objective of permissioning, collecting, and cleansing beverage-alcohol sales for IRI’s largest client, Anheuser-Busch. Gazelle offered restaurants free use of the applications and detailed analytics of their customer base in exchange for blinded, aggregated use of the data. Nevertheless, after four years and over $10MM in venture funding, Gazelle’s traction was limited to v1.0 Micros and POSitouch integrations, with little adoption in the POS channel and only 1,500 restaurants nationwide to show for it. Yet another dot-com had bitten the dust.

In 2004 another company called GuestMetrics entered the on-premise data market. The sole purpose of the company was to collect item-level (or SKU, stock-keeping unit) data from food and beverage establishments and sell it to beverage suppliers and distributors. This business, in theory, could be very lucrative: roughly 25% of all food and beverage sales occur on-premise, and the off-premise data market is worth billions annually once you count the multiple layers of data products.

To gain access to the data, which was much harder to acquire because there were no such things as APIs or cloud POS back then, GuestMetrics developed numerous relationships with POS resellers and POS companies. In exchange for a very simple reporting product, GuestMetrics would acquire the rights to sell the SKU data.

This was a very costly initiative for GuestMetrics on several fronts.

First, convincing POS companies and their resellers to partner took an extremely long time, if they replied to the opportunity at all; as we’ve shown, the majority of legacy POS companies couldn’t even be bothered to return a simple email. Part of this could be chalked up to the fact that data was a completely novel concept to this industry: remember, at this time POS companies and their channels didn’t even participate in credit card residuals. It’s also fair to note that the reporting product GuestMetrics offered was very basic and faced stiff competition from a fairly wide spectrum of providers.

Second, cleansing the data was a nightmare. In off-premise, the market has matured enough to develop UPCs, or universal product codes. These codes, administered and maintained by a standards body named GS1, mean there’s complete traceability across the entire supply chain: when Pepsi creates Cherry Diet Pepsi in 12-ounce cans, everyone from Pepsi to the distributor to the grocer, and even the consumer (what do you think that barcode is on all the products you scan at the grocery store?), knows exactly what they’re handling.
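To make that traceability concrete, here’s a minimal sketch of how the check digit on a 12-digit UPC-A barcode is validated. The weighting scheme is the standard GS1 one; the function name is ours, and the sample code is simply a commonly cited example:

```python
# Minimal sketch: validating the check digit of a 12-digit UPC-A code.
def upc_a_is_valid(upc: str) -> bool:
    """Return True if the 12-digit UPC-A string has a correct check digit."""
    if len(upc) != 12 or not upc.isdigit():
        return False
    digits = [int(d) for d in upc]
    # Odd positions (1st, 3rd, ..., 11th) are weighted 3x; even positions 1x
    total = sum(d * 3 for d in digits[0:11:2]) + sum(digits[1:11:2])
    check = (10 - total % 10) % 10
    return check == digits[11]

print(upc_a_is_valid("036000291452"))  # a commonly cited example UPC -> True
```

Because every party computes the same 12 digits the same way, a case of Cherry Diet Pepsi is unambiguous from the bottling line to the checkout scanner.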

In on-premise, there’s no such thing as a UPC, and there are barely even SKUs, at least in food service. That’s because restaurants are real-time manufacturing operations – nothing goes out the front door looking anything like it did coming in the back door. When you order a Mike’s Special from a restaurant, how do you know what’s in it? You don’t, and probably neither does Mike, despite making the damn thing. The result is POS databases filled with very dirty records that must be cleansed if there’s any hope of suppliers and distributors taking interest. Look at the sample check below and tell us if you know what the hell these items are:

Precisely.

Worse, GuestMetrics made this harder on themselves because they never applied AI, or even a basic technical approach, to the data cleansing problem. Whereas any decent software engineer would build a fairly rigorous canonical database and pair it with some form of save-and-train user coding system, GuestMetrics employed a handful of bodies in New Orleans to manually clean the data. The tragedy was that no machine was ever trained, so nothing was ever going to become automated. The result was a perpetually high cost of goods.
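For illustration, here’s a minimal sketch of what that canonical-database-plus-save-and-train loop could look like, using Python’s stdlib fuzzy matching. The item strings and helper names are hypothetical:

```python
# Minimal sketch of a "canonical database + save-and-train" cleansing loop.
# Item names and helper names are hypothetical, for illustration only.
import difflib

# Canonical SKU list a data team curates over time
CANONICAL_ITEMS = ["Bud Light Draft 16oz", "House Margarita", "Caesar Salad"]
_canonical_by_key = {c.lower(): c for c in CANONICAL_ITEMS}

# Raw string -> canonical SKU decisions already made by human coders;
# persisting these is the "training" that lowers cost over time
learned_mappings: dict[str, str] = {}

def canonicalize(raw_item: str) -> str | None:
    """Map a raw POS item string to a canonical SKU, or None if a human must code it."""
    key = raw_item.strip().lower()
    if key in learned_mappings:                   # 1. reuse prior human decisions
        return learned_mappings[key]
    match = difflib.get_close_matches(key, list(_canonical_by_key), n=1, cutoff=0.8)
    if match:                                     # 2. fuzzy-match the canonical list
        return _canonical_by_key[match[0]]
    return None                                   # 3. route to a human coder

def record_human_decision(raw_item: str, canonical: str) -> None:
    """Persist a coder's call so the same dirty string never needs cleaning twice."""
    learned_mappings[raw_item.strip().lower()] = canonical

print(canonicalize("Bud Light Draught 16oz"))  # close enough -> fuzzy match resolves it
print(canonicalize("BL DRFT 16"))              # too abbreviated -> None, goes to a human
record_human_decision("BL DRFT 16", "Bud Light Draft 16oz")
print(canonicalize("BL DRFT 16"))              # now resolved automatically, forever
```

Even a loop this crude beats pure manual coding, because every human decision is saved and compounds instead of evaporating.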

Third, and related to the second point, GuestMetrics had to convince suppliers and distributors that there was sufficient on-premise data, at a high enough quality, to deserve their budgets. There wasn’t, and it didn’t. GuestMetrics was, like Gazelle before them, barely adding 1,000 restaurants a year by riding behind the POS and reseller bus instead of driving it.

This really gets at two issues: statistical sample size, and data cleanliness.

There are nearly 700,000 commercial restaurant & bar establishments in the US, broken into a lot of crazy groups like full-serve vs quick-serve, bar vs club, independent vs chain, etc., and never mind geographic distribution. Even 5% sampling penetration – the bare minimum for reasonable projectability and accuracy – means one not only needs 35,000+ establishments supplying data, but they must be the “right mix” of establishments. (Note: in reality statistical confidence is reached when one hits > 20% of stores, or 140,000 restaurants.)

Not easy to do. At all.

That 35,000-establishment count translates into roughly 1.75 billion guest checks a year, each with an average of ten individual item purchases. So in order to make the business go, you have to organize and cleanse 17.5 billion individual item sales a year, and you can’t do that manually. These are restaurants, remember: they name, organize, and enter their menu items any which way they want, and they’re all vastly different.
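The back-of-envelope math, for anyone checking our figures (the per-store check volume below is implied by the totals rather than stated anywhere):

```python
# Back-of-envelope math behind the sampling and volume figures above.
US_ESTABLISHMENTS = 700_000
MIN_SAMPLE_RATE = 0.05           # the "bare minimum" penetration cited above
ITEMS_PER_CHECK = 10

min_sample = int(US_ESTABLISHMENTS * MIN_SAMPLE_RATE)    # 35,000 establishments
checks_per_year = 1.75e9                                 # total guest checks cited above
checks_per_store = checks_per_year / min_sample          # ~50,000/yr, ~137/day implied
item_records = checks_per_year * ITEMS_PER_CHECK         # 17.5 billion line items

print(f"{min_sample:,} stores -> {item_records:,.0f} item records/yr "
      f"(~{checks_per_store:,.0f} checks per store per year)")
```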

Meanwhile, the counterparts at suppliers and distributors have been accustomed to relatively pristine data from off-premise data companies Nielsen and IRI for literal decades; nearly nobody employed at Pepsi today remembers what the initial set of data from Nielsen’s Scantrack looked like in 1979.

Additionally, Walmart represents about 25% of off-premise volume, and the next five largest grocers add up to another 35%. Together, just six merchants deliver 60% of the market – a sample size large enough for highly accurate projections. How do you reach 60% of on-premise volume with six merchants? You don’t.

The net result of these failures – some of which were GuestMetrics’ own doing, while others are simply endemic to the rat hole that is on-premise – meant GuestMetrics burned through nearly $25MM before merging with Restaurant Sciences, the next topic of discussion.

Restaurant Sciences took a wildly different tack than that of GuestMetrics, which we’ll discuss in part 2.

2 comments

  • Good post Jordan. Readers with some familiarity with statistics may wonder where the estimate of 20% of on-premises locations as a requisite sample size comes from. After all, a typical political poll might use anywhere from 400 to 2,500 respondents to provide a margin of error of 5% to 2% with a confidence of 95%, and the sample size is famously independent of the universe size. The analytics required to understand pricing, sales, etc. in this vast on-premises market are a completely different animal. They are not commonly dichotomous; the underlying data is not always normally distributed, and the population proportion p, even in the dichotomous circumstances, is rarely anything close to 0.5. More important, market analysts most frequently need to look at only specific subsets of the market – independents in the Detroit metro area; fast-food burger MUFSOs in the western suburbs of Boston; PBR can-only sales in nightclubs on the East Coast from 5 PM to 9 PM; etc. Those sorts of cuts dramatically drive up the numbers for what level of sampling may be required.

    • “But I’ll just make money selling data,” says every POS company that knows nothing about statistics and market requirements. Yes, it is always humorous to see how undereducated people really are. You cannot do everything.
