An algorithm might create a playlist you enjoy, but don’t mistake that for creativity.

Sep 22, 2015

Zane Lowe’s first show as a DJ on Apple Music was a bit dizzying. The songs he played lurched from punk-pop to post-rock to grime to electronica to stadium rock and beyond. He showcased previously unheard songs along with tracks recorded decades ago by well-known rockers. Yet despite the disarray—or probably because of it—I enjoyed the show. Each new track took me in a surprising direction, while the mix of artists and the energy of the songs seemed to match Lowe’s slightly deranged chatter.

One thing that stands out about Apple Music, a streaming service you can use on computers and mobile devices for $10 a month, is the presence of human DJs like Lowe on a channel called Beats 1. Lowe’s show introduced me to unfamiliar artists, and it highlighted intriguing musical connections—between, say, a stadium anthem by AC/DC and a recent piece of remixed electronica by a Scottish artist called Hudson Mohawke. The emotion running through all the songs was upbeat, even defiant.

Just as computers cannot yet create powerful and imaginative art or prose, they cannot truly appreciate music. And arranging a poignant or compelling music playlist takes a type of insight they don’t have—the ability to find similarities in musical elements and to get the emotional resonance and cultural context of songs. For all the progress being made in artificial intelligence, machines are still hopelessly unimaginative and predictable. This is why Apple has hired hundreds of people to serve as DJs and playlist makers, in addition to the algorithmic recommendations it still offers.

Bringing in human experts is a clever way for Apple to differentiate itself. Despite having pioneered the digital distribution and storage of music, it now finds itself lagging behind streaming services such as Pandora, Spotify, Rdio, and Tidal. None of these emphasize curation by human experts as much as Apple Music does. And while the algorithms that all these companies use for recommending songs have improved greatly in recent years, there’s no real musical understanding or appreciation going on. It shows. The algorithms employ statistical techniques to parse listener data, making an educated guess as to what you might like. There is still no algorithm that can account for human taste.

Hearing things

Pandora, one of the first music streaming services, is a good example of the algorithmic approach. Through a decade-old effort called the Music Genome Project, Pandora has employed music experts to tag songs with hundreds of characteristics, such as the genre, the types of instruments used, and even the melodic phrasing and tonality. When you give Pandora a band, composer, or song as a starting point, it creates a “radio station” of music with similar attributes. Choose the Beatles, and Pandora may automatically cue up a song by the Beach Boys, informing you, “We’re playing this track because it features mellow rock instrumentation, demanding vocal performances, interweaving vocal harmony, mixed minor & major key tonality, and melodic songwriting.”
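The attribute-matching idea described above can be sketched in a few lines. This is only an illustration, not Pandora's actual system: the song names, attribute names, and scores below are invented, and cosine similarity is an assumed stand-in for whatever comparison the Music Genome Project really uses.

```python
from math import sqrt

# Hypothetical hand-tagged attribute scores from 0.0 to 1.0.
# These are invented for illustration, not real Music Genome Project data.
CATALOG = {
    "beatles_song": {"vocal_harmony": 0.9, "mellow_rock": 0.7, "melodic": 0.9, "electronic": 0.0},
    "beach_boys_song": {"vocal_harmony": 1.0, "mellow_rock": 0.8, "melodic": 0.8, "electronic": 0.0},
    "techno_track": {"vocal_harmony": 0.0, "mellow_rock": 0.0, "melodic": 0.2, "electronic": 1.0},
}


def cosine_similarity(a, b):
    """Compare two attribute dictionaries; 1.0 means identical proportions."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in keys)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_station(seed, catalog):
    """Rank every other song by how closely its tags match the seed song."""
    seed_attrs = catalog[seed]
    scored = [
        (name, cosine_similarity(seed_attrs, attrs))
        for name, attrs in catalog.items()
        if name != seed
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


station = build_station("beatles_song", CATALOG)
```

A ranking like this always favors songs that resemble the seed, which is one plausible reason such stations can feel predictable.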

Sadly, Pandora’s choices tend to be rather predictable—often just as bland and conventional as those on commercial radio. After beginning with the Beatles, you’re unlikely to hear a song in a very different style that was popular around the same time, for example, or a hip-hop artist who’s done a clever job sampling the work of Ringo and co.

More recently, algorithms have begun producing playlists that can feel a lot more nuanced and tailor-made. The world’s biggest streaming service, Spotify, which has more than 75 million users, is pushing the state of the art, using vast amounts of data to make personalized recommendations.

Chris Johnson, who leads one of Spotify’s data science teams in New York, says the company does employ humans to make some of its playlists. But it also collects as much data as possible on a user’s listening behavior, and then compares it with data collected from other users. The idea behind this technique, known as collaborative filtering, is that you’ll probably like a song that someone with similar tastes has already discovered and enjoyed. Last year, Spotify acquired a company called the Echo Nest that gathers information about new music posted to blogs, news websites, and social media. These opinions also now feed into Spotify’s recommendations, helping to make its music suggestions cleverer still.
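A toy version of collaborative filtering makes the principle concrete. It is a minimal sketch under stated assumptions, not Spotify's method: the listener names and songs are made up, and Jaccard overlap between listening histories is just one simple way to measure "similar tastes."

```python
# Hypothetical listening histories, invented for illustration.
LISTENS = {
    "ana": {"song_a", "song_b", "song_c"},
    "ben": {"song_a", "song_b", "song_d"},
    "cy": {"song_e", "song_f"},
}


def jaccard(a, b):
    """Share of songs two listeners have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, listens):
    """Suggest unheard songs, weighted by how similar each other listener is."""
    mine = listens[user]
    scores = {}
    for other, theirs in listens.items():
        if other == user:
            continue
        weight = jaccard(mine, theirs)
        for song in theirs - mine:
            scores[song] = scores.get(song, 0.0) + weight
    # Keep only songs backed by at least one listener with overlapping taste.
    ranked = [song for song in scores if scores[song] > 0]
    return sorted(ranked, key=lambda song: (-scores[song], song))


suggestions = recommend("ana", LISTENS)
```

Here "ana" is offered "song_d" because "ben" shares two of her songs, while nothing from "cy" is suggested. A track that nobody has played yet can never appear in the output at all.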

In July, Spotify began testing a personalized playlist generated this way. “We look at what you’re playing, playlists you’re creating, and basically everything we know about you. From that, every Monday, there’s going to be this new playlist of music,” Johnson told me.

The first few playlists I received included several songs that I instantly loved, though none strayed very far beyond the stuff I already listen to. It’s useful, but not quite mind-blowing.

There is an inherent limitation to such automated recommendation algorithms, too: they cannot suggest a new song, because there’s no data to show how much other listeners like it. In contrast to an algorithm, humans can usually tell, within a few moments of listening, just how much they like a new track. Here, though, recent advances in artificial intelligence are starting to help. Last year, Spotify began testing a way of analyzing a song itself rather than just the metadata associated with it. This involved training what’s known as a deep-learning network, roughly modeled on layers of neurons in the brain, to recognize frequency features of an audio signal (corresponding to the sound you hear and the way that sound changes over time) in millions of songs. These algorithms can classify a new song surprisingly well, as shown in example playlists posted by a member of Johnson’s team at Spotify.
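To give a sense of what "frequency features of an audio signal" might look like as input, here is a rough sketch that slices a signal into short frames and measures the strength of each frequency in every frame. It covers only this preprocessing step, with a slow textbook Fourier transform and a synthetic two-note signal that are both assumptions for illustration; the deep-learning network itself, and whatever features Spotify actually uses, are not shown.

```python
import cmath
import math


def frame_magnitudes(frame):
    """Strength of each frequency bin in one frame (naive discrete Fourier transform)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(signal, frame_size, hop):
    """Frequency content over time: one list of bin magnitudes per frame."""
    return [
        frame_magnitudes(signal[start:start + frame_size])
        for start in range(0, len(signal) - frame_size + 1, hop)
    ]


def dominant_bins(spec):
    """Index of the loudest frequency bin in each frame."""
    return [max(range(len(mags)), key=mags.__getitem__) for mags in spec]


# Synthetic "song": a low tone followed by a tone one octave higher.
FRAME = 32
low = [math.sin(2 * math.pi * 4 * t / FRAME) for t in range(2 * FRAME)]
high = [math.sin(2 * math.pi * 8 * t / FRAME) for t in range(2 * FRAME)]
spec = spectrogram(low + high, frame_size=FRAME, hop=FRAME)
peaks = dominant_bins(spec)
```

The resulting grid of numbers captures how the sound changes over time, and it can be computed for a brand-new track that no listener has played, which is what makes audio analysis attractive where listening data is missing.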

But even this feat is not evidence of real musical understanding or judgment. Spotify’s deep-learning system still has to be trained using millions of example songs, and it would be perplexed by a bold new style of music. What’s more, such algorithms cannot arrange songs in a creative way. Nor can they distinguish between a truly original piece and yet another me-too imitation of a popular sound. Johnson acknowledges this limitation, and he says human expertise will remain a key part of Spotify’s algorithms for the foreseeable future.

Apple’s Beats 1 offers a vastly different experience. One radio show, The Alligator Hour, which is fronted by the musician and producer Joshua Homme, celebrates obscure but extremely original songs. It also revels in the absurd connections that can be found between some songs—pairing, for instance, the melodic side of the Sex Pistols with the adrenaline that fuels Donna Summer’s disco. It’s delightfully weird. In another show, called Mixtape Delivery Service, the musician Annie Clark (stage name St. Vincent) plays a custom list of songs inspired by one listener’s mood or situation. In her first show, Clark arranged a retrospective of less well known but culturally significant dance music for an 11-year-old girl who wanted to learn more about the genre.

Auditory Turing test

What is it that gives people this ability? Could deep learning or other artificial-intelligence systems ever develop “taste” that goes beyond classifying the characteristics of a song to determine whether it is “good” or not? Might computers be able to identify that intangible quality that people naturally associate with talent or creativity or originality? When I asked Johnson if an algorithm might someday be able to scout out a hit song from an unsigned artist, he said: “That’s exactly what we want to do.”

It’s a bold ambition, and one that might prove elusive. Musical appreciation and creativity have nothing to do with finding statistical patterns in great piles of data.

“What differentiates something unusual or bizarre from something creative? That’s a difficult question,” says Eyal Reingold, a psychologist at the University of Toronto who studies human creativity. For a machine to demonstrate creativity, he says, “it would have to produce something that’s not only unusual—or something that’s not been programmed into it—but that is judged to be useful, at least in some cultural context.”

Indeed, the slippery nature of creativity has led some psychologists and computer scientists to suggest that it could be a useful way to measure machine intelligence. In a paper published in 2001, two academics from Rensselaer Polytechnic Institute, together with David Ferrucci, then an IBM researcher who would go on to create a computer called Watson that would win the game show Jeopardy!, argued that a creativity test could be a better way to judge whether a computer had achieved human-type intelligence. They noted that the test proposed in 1950 by Alan Turing, which gauges a machine’s intelligence through a typed conversation, encourages programmers to employ trickery rather than build something genuinely intelligent. They reasoned that feats of creativity, whether in painting, writing, music, or some other field, are much harder to fake and are fundamental to intelligence. And they called their alternative the Lovelace test, after Ada Lovelace, often considered the world’s first computer programmer, who noted in 1843 that the first computing machines, impressive though they might be, would be incapable of doing anything original.

Tellingly, efforts to pass the Lovelace test have largely foundered. Still, the challenge lives on. In fact, Michael Casey, a professor of music and computer science at Dartmouth College, plans to hold several Turing tests early next year, perhaps followed by some Lovelace tests. One will involve computer DJs, with dancers asked to judge whether the songs they just heard were cued up by a human or by a machine. Casey hopes that within this limited context, a machine will demonstrate something akin to musical creativity.

He hardly seems confident, though. “No matter what type of algorithm we’ve tried to apply in the past to music—whether it’s something that tries to mimic Bach or Mozart, or tries to recommend music—at a certain point it feels like it doesn’t have any ‘shape’ to it,” he says, a little ruefully. “It may, for a few seconds, fool you, but it doesn’t have an overall plan. And I think the same may be true of an automated DJ set.”

Perhaps this will be true for a long while yet. And if we want machines to come up with something as unique and original as a show on Apple’s Beats 1, then we might need to think a little more creatively about how we design them.