Information underload – Mike Caulfied on the limits of #Watson, #AI and #BigData

From Mike Caufield, a piece that reminds me of the adage Garbage In, Garbage Out:

For many years, the underlying thesis of the tech world has been that there is too much information and therefore we need technology to surface the best information. In the mid 2000s, that technology was pitched as Web 2.0. Nowadays, the solution is supposedly AI.

I’m increasingly convinced, however, that our problem is not information overload but information underload. We suffer not because there is just too much good information out there to process, but because most information out there is low quality slapdash takes on low quality research, endlessly pinging around the spin-o-sphere.

Take, for instance, the latest news on Watson. Watson, you might remember, was IBM’s former AI-based Jeopardy winner that was going to go from “Who is David McCullough?” to curing cancer.

So how has this worked out? Four years later, Watson has yet to treat a patient. It’s hit a roadblock with some changes in backend records systems. And most importantly, it can’t figure out how to treat cancer because we don’t currently have enough good information on how to treat cancer:

“IBM spun a story about how Watson could improve cancer treatment that was superficially plausible – there are thousands of research papers published every year and no doctor can read them all,” said David Howard, a faculty member in the Department of Health Policy and Management at Emory University, via email. “However, the problem is not that there is too much information, but rather there is too little. Only a handful of published articles are high-quality, randomized trials. In many cases, oncologists have to choose between drugs that have never been directly compared in a randomized trial.”
This is not just the case with cancer, of course. You’ve heard about the reproducibility crisis, right? Most published research findings are false. And they are false for a number of reasons, but primary reasons include that there are no incentives for researchers to check the research, that data is not shared, and that publications aren’t particularly interested in publishing boring findings. The push to commercialize university research has also corrupted expertise, putting a thumb on the scale for anything universities can license or monetize.

In other words, there’s not enough information out there, and what’s out there is generally worse than it should be.

You can find this pattern in less dramatic areas as well — in fact, almost any place that you’re told big data and analytics will save us. Take Netflix as an example. Endless thinkpieces have been written about the Netflix matching algorithm, but for many years that algorithm could only match you with the equivalent of the films in the Walmart bargain bin, because Netflix had a matching algorithm but nothing worth watching. (Are you starting to see the pattern here?)

In this case at least, the story has a happy ending. Since Netflix is a business and needs to survive, they decided not to pour the majority of their money into newer algorithms to better match people with the version of Big Momma’s House they would hate the least. Instead, they poured their money into making and obtaining things people actually wanted to watch, and as a result Netflix is actually useful now. But if you stick with Netflix or Amazon Prime today it’s more likely because you are hooked on something they created than that you are sold on the strength of their recommendation engine.

Let’s belabor the point: let’s talk about Big Data in education. It’s easy to pick on MOOCs, but remember that the big value proposition of MOOCs was that with millions of students we would finally spot patterns that would allow us to supercharge learning. Recommendation engines would parse these patterns, and… well, what? Do we have a bunch of superb educational content just waiting in the wings that I don’t know about? Do we even have decent educational research that can conclusively direct people to solutions? If the world of cancer research is compromised, the world of educational research is a control group wasteland.

Advertisements

1 thought on “Information underload – Mike Caulfied on the limits of #Watson, #AI and #BigData”

  1. Reblogged this on Séamus Sweeney and commented:

    The piece linked to here has applications well beyond medicine and the specific diagnostic issues with Watson – some much of the hype of Big Data is predicated on the quality of the Big Data itself being unquestionable.

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s