Friday, April 4, 2014

Making A Biologic At Scale -- A Challenge Uniquely Suited To REAL "Big Data" Analysis

As anyone who has worked around a vaccine shop of any size knows, there may be no scientific task more daunting than growing a perfectly uniform batch of organisms, from scratch, at a scale of several hundred thousand doses.

Imagine trying to simultaneously grow upwards of six hundred thousand purple pansies -- not one of them with a single blemish, on any petal. At all. Now imagine having to do that over and over, 52 weeks a year. Oh. And yes -- every single bit of contaminant, whether biological or substrate, must vanish before these mythical purple pansies are packed and shipped.

But that is the eye of the needle through which vaccines -- and many of the other newer biologic agents -- must pass. In the early 2000s, Baxter could not keep its anti-hemophilic biologic blood factor reliably streaming from the end of the line at its clearly state-of-the-art facility in Europe, no matter how hard it tried. Similarly, in the 2010 to 2012 timeframe, Merck and MRL could not reliably keep supplies of certain vaccines (like its Hep B vaccine) flowing -- and likely considered exiting the market altogether.

So. . . unlike a lot of the bluster about uses for big data crunching -- this infinitely variable biological problem set is truly an in-the-wild environment, where massively relational computing might help tease out the real root causes of production anomalies. Cue the info-media machines -- yes, a feature story follows -- on Merck, diagnosing a vaccine problem using real big data approaches. Nice. Do go read it all -- but here is a bit:

. . . .By early 2013, a Merck team was experimenting with a massively scalable distributed relational database. But when Llado and Megaro learned that Merck Research Laboratories (MRL) could provide their team with cloud-based Hadoop compute, they decided to change course.

Built on a Hortonworks Hadoop distribution running on Amazon Web Services, MRL's Merck Data Science Platform turned out to be a better fit for the analysis because Hadoop supports a schema-on-read approach. As a result, data from 16 disparate sources could be used in analysis without having to be transformed with time-consuming and expensive ETL processes to conform to a rigid, predefined relational database schema.

"We took all of our data on one vaccine, whether from the labs or the process historians or the environmental systems, and just dropped it into a data lake," says Llado. . . .
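[For the curious, here is a toy sketch of what "schema-on-read" means in that excerpt. To be clear, this is not Merck's code, and it is plain Python rather than Hadoop. The source names, field names and values are all made up for illustration. The idea is only that each raw payload stays exactly as its source system wrote it, and the fields and types get imposed at the moment a question is asked.]

```python
import csv
import io
import json

# Hypothetical raw payloads, kept exactly as each (made-up) source emitted them.
RAW_LAKE = [
    ("lab", "json", '{"batch": "B001", "potency": "0.97"}'),
    ("historian", "csv", "batch,temp_c\nB001,36.9\nB002,37.4"),
    ("environment", "json", '{"batch": "B002", "humidity_pct": 41}'),
]


def read_records(fmt, payload):
    """Parse one raw payload into a list of dicts, only at read time."""
    if fmt == "json":
        return [json.loads(payload)]
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(payload)))
    raise ValueError(f"unknown format: {fmt}")


def query(lake, fields):
    """Apply a schema on read: pick and cast only the fields this analysis needs."""
    merged = {}
    for _source, fmt, payload in lake:
        for rec in read_records(fmt, payload):
            row = merged.setdefault(rec["batch"], {})
            for name, cast in fields.items():
                if name in rec:
                    row[name] = cast(rec[name])
    return merged


# The "schema" lives in the query, not in the storage layer.
by_batch = query(
    RAW_LAKE,
    {"potency": float, "temp_c": float, "humidity_pct": float},
)
print(by_batch)
```

[A different question tomorrow can pass a different set of fields to `query` without anyone reloading or reshaping the stored data, which is the part that spares you the up-front ETL work.]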

And in the end, they figured it out. Amazing. Know that the next biologic production slowdown (and chaos theory posits that such events are a near certainty, over a large enough n. . .) will likely turn out to be due to the confluence of twenty or thirty wholly new, and very subtle, biological shifts. . . even ambient external plant humidity, for example. So I salute the REAL "big data" hunters and gatherers here.

And remember -- unlike the competitive gardener, who only loses "best in show" if her pansies are blemished. . . people may die if even one vial, out of 600,000 doses or so of a given vaccine, is biologically inactive or contains any form of particulate. The idea that we -- in the main -- see a safe, stable supply of any widely available modern biologic is a BIG science. . . miracle. Yes, I think that is the right noun. I plainly do intend to mix cosmology and mysticism here with pure inductive scientific reasoning. It is a modern miracle. Thank you, MRL -- and Baxter.

[And. . .what the heck did they put in MY coffee? Wow.]
