Big Data Off the Beaten Path

Mar 24, 2016 - by Art Shectman

At Elephant Ventures, we live and breathe big data and analytics, but our work doesn’t regularly take us into every nook and cranny of the big data world. Like most firms in the big data arena, we work mainly in mainstream fields such as marketing, advertising, manufacturing, finance, health, and national security.

But you can’t spend big chunks of your time thinking about big data without becoming aware of some of the niche applications that put it to interestingly specialized purposes.

Spotify is already known for its ability to present listeners with suggested artists in the form of customized “Discover Weekly” playlists extrapolated from the music listeners have already chosen for themselves. Now the service has added a new feature, “Fresh Finds.” Instead of personalizing a playlist based on the listener’s own tastes, it mines review sites and blogs for the latest trends.

It then parses the listening habits of some 50,000 users who, in Spotify’s judgment, are the hippest of the hip, the customers listening to new music before the rest of us even know it’s there. With an assist from some human curation, their cutting-edge favorites are made into playlists and organized by genre. The resulting playlists get featured placement in the Spotify app.

Spotify’s new idea is supposed to challenge users by introducing them to music that would never have made it into a playlist built from their existing tastes. No matter how challenging things get, though, even the most unappealing playlist won’t ruin a life.

In other applications, the stakes are far higher. Half a world away, and with a tip of its hat to dystopian science fiction, China has a big data project of its own. It’s meant to discern patterns of behavior that will predict terrorist acts or, at least, activities that threaten national stability. In China, the distinction is not always clear.

Given the Chinese government’s utter disdain for privacy protections, this is big data that’s truly big. The system can access banking, employment, telecommunications, shopping, and even hobby data. That information is supplemented by a national network of surveillance cameras (called “Skynet,” although probably not in homage to the “Terminator” franchise) and by a network of neighborhood informants that has been keeping an eye on things for decades, since long before computers entered the picture.

Even if you take privacy concerns out of the equation entirely, the predictive accuracy of this kind of system is open to serious question.
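Much of that doubt comes down to base rates. When the behavior being predicted is extremely rare, even a system that sounds very accurate ends up flagging mostly innocent people. The sketch below applies that arithmetic to purely hypothetical figures. The population, prevalence, and error rates are assumptions chosen for illustration, not estimates of the Chinese system, and the function name is made up for this example.

```python
def screening_outcomes(population, prevalence, sensitivity, false_positive_rate):
    """Expected true positives, false positives, and precision of a screen.

    All arguments are hypothetical illustration values, not real estimates.
    """
    threats = population * prevalence
    innocents = population - threats
    true_positives = threats * sensitivity              # real threats correctly flagged
    false_positives = innocents * false_positive_rate   # innocent people wrongly flagged
    precision = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, precision


# Hypothetical scenario: 1 in a million people is a genuine threat, the system
# catches 99% of them, and it wrongly flags only 0.1% of everyone else.
tp, fp, precision = screening_outcomes(
    population=1_000_000_000,
    prevalence=1e-6,
    sensitivity=0.99,
    false_positive_rate=0.001,
)
print(f"genuine threats flagged: {tp:,.0f}")
print(f"innocent people flagged: {fp:,.0f}")
print(f"share of flagged people who are genuine threats: {precision:.2%}")
```

With these assumed numbers, the system looks excellent on paper. Even so, it flags roughly a thousand innocent people for every genuine threat it catches (about 999,999 versus 990). The rarer the target behavior, the worse that ratio becomes.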

In the end, the Chinese project runs the risk of being all noise and no signal, but that’s an issue that also affects more innocuous applications of big data principles, including one focused on the weather.

In this case, it’s a project launched by OpenSignal, a company primarily concerned with mapping cell phone coverage around the world. Part of that work involved measuring battery temperature, which led the company to identify a correlation between battery and ambient temperatures. WeatherSignal, an app that collects atmospheric data from phones, grew out of that realization, spurred on by the fact that some phones can serve as functional weather stations: certain smartphones supplement an ambient thermometer with a light meter, a barometer for air pressure readings, and a hygrometer for humidity.

The obvious question to ask is whether you can trust the data the phones are generating. Aren’t the phones indoors a lot? In fact, don’t they spend a lot of time sleeping in our pockets? WeatherSignal addressed that last question with a specific fix, using readings from the light meter to determine if the phone was outdoors, but that’s almost beside the point.
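The light-meter fix can be pictured as a simple filter. Readings taken when the phone reports too little light to plausibly be outdoors are discarded. In this sketch, the `Reading` type, the sample values, and the 1,000-lux cutoff are all hypothetical. The source does not say how WeatherSignal actually sets its threshold.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """One hypothetical sensor sample reported by a phone."""
    phone_id: str
    temperature_c: float
    light_lux: float


# Hypothetical cutoff: indoor lighting is typically a few hundred lux,
# while outdoor daylight is usually well above 1,000 lux.
OUTDOOR_LUX_THRESHOLD = 1_000.0


def likely_outdoors(readings, threshold=OUTDOOR_LUX_THRESHOLD):
    """Keep only readings bright enough to suggest the phone is outside."""
    return [r for r in readings if r.light_lux >= threshold]


samples = [
    Reading("a", 31.5, 2.0),        # dark: probably in a pocket
    Reading("b", 22.0, 350.0),      # office lighting
    Reading("c", 18.4, 12_000.0),   # daylight
    Reading("d", 19.1, 4_500.0),    # overcast daylight
]
outdoor = likely_outdoors(samples)
print([r.phone_id for r in outdoor])  # ['c', 'd']
```

A filter like this trades volume for reliability. For example, it throws away every night-time reading, which is only acceptable when enough daytime samples remain.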

The real answer goes to the nature of big data itself. Big data is not about making use of a limited set of precise data points. To the contrary, it’s about taking vast amounts of noisy data, potentially drawn from multiple streams, and deriving valid insights because you know how to manage and analyze that data.
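A small simulation illustrates the point. Any single phone’s reading is unreliable, but the median of many readings lands close to the true value. As more phones contribute, the typical error shrinks. The true temperature, the noise level, and the outlier rate are all made-up parameters. The median is just one robust way to aggregate among many.

```python
import random
import statistics

TRUE_TEMP_C = 15.0   # hypothetical actual air temperature
OUTLIER_RATE = 0.1   # hypothetical share of phones giving wild readings


def noisy_reading(rng: random.Random) -> float:
    """Simulate one phone: Gaussian noise plus occasional large outliers."""
    if rng.random() < OUTLIER_RATE:
        # Wild reading in either direction, 10 to 20 degrees off.
        return TRUE_TEMP_C + rng.choice((-1.0, 1.0)) * rng.uniform(10.0, 20.0)
    return TRUE_TEMP_C + rng.gauss(0.0, 4.0)


def estimate_error(n_phones: int, trials: int = 200, seed: int = 0) -> float:
    """Mean absolute error of the median of n_phones readings, over many trials."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        readings = [noisy_reading(rng) for _ in range(n_phones)]
        errors.append(abs(statistics.median(readings) - TRUE_TEMP_C))
    return statistics.fmean(errors)


for n in (1, 10, 100, 1_000):
    print(f"{n:>5} phones: mean error {estimate_error(n):.2f} °C")
```

The median is used here instead of the mean because a handful of wild readings barely moves it. That is one example of knowing how to manage noisy data rather than trusting every data point.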

That lesson applies to big data regardless of the purpose to which it’s put. It’s not about the data itself. It’s about the way you approach it, the knowledge that you bring to the table and the tools that you apply. Without that, big data can seem to be nothing but noise. The signal is there, though, if you know how to find it.