
Find Your Can of Beans in the Healthcare Big Data Supermarket


Search technology is pervasive. Whether you use Google, Bing, or any other search engine, if you use a computer you almost certainly rely on one. Search engines are known for a lot of things, but slow search speed isn’t one of them. Can you imagine a world where, if you typed a set of terms like “Pharmaceutical marketing in California” into Google’s search box, you would have to wait many hours for an answer? It’s unthinkable today, but it’s important to understand that it wasn’t always this way. For many industries and problem domains, waiting hours for an answer is still the rule, not the exception.

Google and other search engine developers helped pioneer a new approach to working with large data sets, one that rejects the limitations of traditional relational database technologies. They knew all too well that if you tried to index the entire internet with a traditional database, waiting hours for an answer would be the expected result. This problem is not limited to indexing the web, however. You don’t even need to deal with the entire internet before you can get into trouble with relational databases. The real problem comes about when you want to be able to ask arbitrary questions of data, not just pre-canned questions, and get those answers back immediately.

In the healthcare industry there is more than enough data available to get into trouble. From web traffic on multiple brand websites and micro-sites, internal and purchased demographic data, HCP prescribing behavior, opt-in preferences, campaign touch points, sales rep interactions, the social web, and opt-in patient information, record counts can easily reach into the billions and beyond. Data sizes for a single company can total terabytes or more. Asking questions of this amount of data is easy if your needs are constrained. It is hard if you want to ask ad hoc questions and get answers in near real time, which is where we see the real need, and what we aim to enable for all our customers.

In the past, healthcare marketers’ only solution has been to hire service providers that deploy relational databases, a lot of expensive hardware, and no shortage of outsourced service hours. Worse yet, service providers have been unable to fully enable this type of querying and segmentation, or have been forced to use sub-par software which then frustrates their customers. Finding viable answers to the question “who should I be targeting and when?” has taken months and, worse, has involved seven- to eight-figure price tags.

Clearly, there are many reasons why this state of affairs exists, and not all of them have to do with technology. Strategic vision (or lack thereof), process challenges, and volatile workforces all contribute to expenses typically found in the cost of finding customers and growing market share. That being said, enabling technology has a significant role to play in helping marketers and their service providers collaborate to make informed decisions.

One of the reasons that aggregated data is often hidden from brand managers is that the technology used to assess it isn’t designed to answer ad hoc questions on large amounts of data in a fast and user-friendly way. This leads to interfaces that, if a brand manager were to use them, would return results in hours, not seconds. Such a long wait for information makes it nearly impossible for brand managers to do their own exploration. Brand managers are forced to throw additional outsourced service resources at a problem just to get started. True collaboration at the data layer is far more a dream than reality. In the end, brand managers wind up in the back seat of a car they should be driving, and their internal IT departments have no way to help them.

This raises a question: why do relational databases and traditional on-premise software solutions fail to deliver? There are a number of reasons for this, but to illustrate the point, let’s consider an analogy.

Imagine a supermarket. Now envision the aisles of this supermarket filled with products of all different shapes, colors, pictures, brand names, inventory numbers, shelf positions, and prices. If your supermarket was small, and you wanted to find a can of beans with a green label, you could use just about any strategy you please and find that can in a reasonable amount of time.

But when the supermarket is large, like a Costco, and you need to answer a more specific question like “how many green cans of beans have the word ‘fresh’ somewhere on the label, a picture of a farm house, and a price between $2 and $3 per can?”, getting an answer will begin to take a lot more time if you don’t have the right strategy. Translating this type of problem to relational database land, one might try to store all the products in a database (think shelves) and create indexes (think signs) for commonly asked questions. Indeed, if you typically want to know how many cans of beans meet a single criterion, this can be a great solution. This would be like putting up a sign in Costco that said “Green label farmhouse pictured canned beans, $2.50, aisle 3, 265 cans in stock”. If you saw that sign, you would get an answer quickly. It should, however, be obvious why stores like Costco don’t create these particular signs. If Costco wanted to answer all potential questions, whatever they might happen to be, they would have to place a tremendous number of signs all over the place. In fact, they would need an infinite number of signs if they wanted to be ready with a quick answer to any possible question. Clearly this isn’t a practical solution, and we’re left with generic signs like “Canned Vegetables” and long search times for specific questions. This situation is analogous to using traditional databases to ask sophisticated questions of large data sets.
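To make the sign analogy concrete, here is a minimal Python sketch. It is only an illustration of the idea, not how any particular database works. The product records, field names, and the `canned_count` and `ad_hoc_count` helpers are all made up for this example. The "sign" is a count prepared ahead of time for one anticipated kind of question; any other question has to walk every shelf.

```python
from collections import Counter

# Hypothetical product records; the fields are illustrative assumptions.
PRODUCTS = [
    {"item": "beans", "color": "green", "label": "Fresh Farm Beans", "price": 2.50},
    {"item": "beans", "color": "green", "label": "Garden Beans", "price": 2.75},
    {"item": "beans", "color": "red", "label": "Fresh Chili Beans", "price": 2.25},
    {"item": "cereal", "color": "green", "label": "Fresh Oats", "price": 2.99},
]

# The "sign": counts precomputed for one anticipated question shape
# (how many products of a given item type and label color?).
SIGN = Counter((p["item"], p["color"]) for p in PRODUCTS)


def canned_count(item, color):
    """Answer the anticipated question with a single lookup on the sign."""
    return SIGN[(item, color)]


def ad_hoc_count(predicate):
    """Answer any other question by inspecting every product on every shelf."""
    return sum(1 for p in PRODUCTS if predicate(p))


def fresh_green_beans_2_to_3(p):
    """An unanticipated question that no existing sign covers."""
    return (
        p["item"] == "beans"
        and p["color"] == "green"
        and "fresh" in p["label"].lower()
        and 2.0 <= p["price"] <= 3.0
    )


print(canned_count("beans", "green"))          # one lookup
print(ad_hoc_count(fresh_green_beans_2_to_3))  # full scan of all products
```

With four products the full scan is instant. The trouble starts when the shelves hold billions of records and every new question means another full walk through the store.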

To give a practical healthcare data example, consider the following question: “How long does it take to find out how many cardiologists practice in California, and in the last 30 days have visited our brand website, signed up for our speaker program event, were delivered samples of our latest product, and have increased Rx market share in the last 90 days?” We want to know this so that we can invite them to an exclusive KOL program. Assuming we’re using a traditional relational database, which of the following do you think is the right answer:

A. Several hours
B. A few seconds
C. Many days

If you answered (B), you would definitely be wrong unless this was the one question you asked all the time. If you answered (A) you are most correct. Typical relational queries of a complex nature can take many hours. If you answered (C), you are partially correct. There are cases where the data is large and complex enough that the result could take multiple days to come back. If you need this information for a campaign or campaign adjustment that you want to execute in a hurry, like during a tradeshow, you don’t stand a chance with a traditional relational database.

Luckily, there is a better way. In the case of finding information about a can of beans, what if we were to deploy multiple people at the same time, each solving a part of the problem in parallel? We would first need to rearrange the shelves a bit to ensure they are set up in a way that makes it possible for separate people to do general counting on any problem. Thus, each person would be assigned a collection of products including cans of beans, boxes of cereal, detergents, etc. They would be given enough information to answer a part of any question asked, but no single person could answer the whole question alone. When we want to ask a question, we’d broadcast the question to all the people over the intercom. Each individual would begin counting using the products they have, reporting their answers by holding up a sign. A single coordinator would be responsible for collecting the held-up responses, tallying them, and reporting the final answer to us. This arrangement would let us get answers to arbitrary questions much faster than the single-person approach, because each person would finish their part quickly, having only to work on a subset of the information. Even better, we could continue to speed up the process by adding more people to do the counting, breaking up the products into smaller and smaller groupings.
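The counting arrangement described above can be sketched in a few lines of Python. Treat this as a toy under stated assumptions rather than a real deployment: the product records and the `partition`, `count_matches`, and `scatter_gather_count` names are hypothetical, and the "people" here are threads in a single process standing in for what would be separate machines in practice.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical product records; the fields are illustrative assumptions.
PRODUCTS = [
    {"item": "beans", "color": "green", "label": "Fresh Farm Beans", "price": 2.50},
    {"item": "beans", "color": "green", "label": "Garden Beans", "price": 2.75},
    {"item": "beans", "color": "red", "label": "Fresh Chili Beans", "price": 2.25},
    {"item": "cereal", "color": "green", "label": "Fresh Oats", "price": 2.99},
    {"item": "beans", "color": "green", "label": "Fresh Valley Beans", "price": 3.50},
    {"item": "beans", "color": "green", "label": "Farm Fresh Beans", "price": 2.10},
]


def partition(records, n_workers):
    """Rearrange the shelves: deal records round-robin so each worker
    holds a mixed subset of all product types."""
    return [records[i::n_workers] for i in range(n_workers)]


def count_matches(shard, predicate):
    """One worker: count the matching records in its own subset only."""
    return sum(1 for record in shard if predicate(record))


def scatter_gather_count(records, predicate, n_workers=3):
    """Coordinator: broadcast the question to every worker, collect the
    partial counts, and tally them into the final answer."""
    shards = partition(records, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(lambda shard: count_matches(shard, predicate), shards))
    return sum(partials)


def question(record):
    """Green cans of beans with 'fresh' on the label, priced $2 to $3."""
    return (
        record["item"] == "beans"
        and record["color"] == "green"
        and "fresh" in record["label"].lower()
        and 2.0 <= record["price"] <= 3.0
    )


print(scatter_gather_count(PRODUCTS, question))               # 3 workers
print(scatter_gather_count(PRODUCTS, question, n_workers=6))  # same answer, smaller groupings
```

The answer does not depend on how many workers share the counting; only the amount each one has to inspect changes. Python threads in one process will not actually speed up this kind of work, so the sketch shows the shape of the coordination, not the performance gain you would get from spreading the data across many servers.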


If we replace the word “people” with “computers”, and “products” with “database records”, we can start to see how a fragmented non-relational model for data storage and retrieval makes it possible to get fast answers to difficult questions on large data sets. With modern, non-relational technology, it is common to deploy many low cost servers each responsible for answering questions on a small portion of a large set of data. This newer approach makes it possible for brand managers and their service providers to collaborate like never before. It lets them play with their data first hand and gain insights they never could before. It reduces costs both in terms of capital and time to get marketing programs delivered. In short, there is no reason that healthcare marketers should NOT be enabled to properly segment and execute campaigns on the vast sources of data their organization holds.

Data doesn’t win. Using data to drive marketing objectives wins. Are you using technology that just slows you down, or does your technology help you win?

By Chris Hahn, Appature

