Is Your Analytics Floor Rotten?

written by Ben Scott, VP of Client Services

We’re remodeling one of our bathrooms right now (and yes, it’s taking three times as long as anyone planned).  But something happened right at the beginning that made me glad we’d started, regardless of the delays.  On the first day, during demo, the remodelers found a huge section of rotted subfloor.  It turns out water had been leaking out of the shower pan for quite some time, and the floor was on the verge of simply giving out on the spot.  One of the guys showed us how bad it was by pressing on a portion of it with his hand, which went right through.  So, on any given day, someone could easily have fallen through the floor.

You know though, subfloors don’t get a lot of play during the remodel planning.  I don’t think anyone is flipping through magazines looking at subfloors, or pinning subfloor work on Pinterest.  Few are likely to be convinced they need to do a remodel with the promise that the subfloor will be sound when it’s done (unless they already know the subfloor needs attention).  Subfloors just aren’t very appealing, and yet they absolutely have to be in good working order.  Imagine if our remodel had not included replacing the rotted subfloor.  It’s hard to picture because that would so obviously be a waste of time and money.  No matter how nice the bathroom looked when it was done, the unaddressed issue of the subfloor would make the entire room unsound.  In fact, we’d need to do the work again soon (like, when someone fell through the floor).  Yet, this type of thing happens all the time in organizational analytics projects.

Data management (which can be thought of as a combination of data governance and business logic management, along with ETL/ELT development work) is in a lot of ways the subfloor of the analytics world.  It’s just not particularly appealing or exciting.  If one wants investors to pay attention, it’s much better to talk about “AI” or “bots”.  As an executive, data management may be on the strategic roadmap, but it probably doesn’t get top billing.  It’s more likely that leadership wants to see talk of “predictive analytics” and “ML models”.  Yet, the truth is that if data management isn’t being planned for and executed against, the analytics system probably has a big rotten spot somewhere in its floor.

What we have to realize, and continue to be mindful of, is a foundational reality: decisions (human decisions about how to conceptualize, organize, and define data) are both unavoidable and have a massive downstream impact on the rest of the system.  This is true of more traditional BI solutions, as well as predictive models and even AI.  As Roman Yampolskiy is quoted as saying in an article by Hope Reese on TechRepublic, “Any time you have a dataset of human decisions, it includes bias.”  In the medical field, human decisions are everywhere, from the clinical to the revenue cycle to administration.  What’s the principal diagnosis?  The primary diagnosis?  The secondary diagnoses?  Which codes, primary and secondary, define a disease such as CHF?  Were those present on admission (POA), or not, or not known?  Do we need to make any special allowances for events such as transplants, pregnancy, or major accidents?  To which provider should the visit be attributed, the last attending or the longest attending?  These decisions are strung together in long chains, the outputs of which heavily influence decision making later in the system, both human and machine.  For instance, if CHF patients are deemed to be at a higher risk for hospital utilization, it becomes very important how CHF is defined (which codes are included, primary and secondary, as well as any defined exclusions).  An incorrect or incomplete definition could lead to higher levels of unmanaged risk in the population.
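To make the point concrete, here’s a minimal sketch of what a disease definition looks like when it’s made explicit and reviewable, rather than buried in a query somewhere.  The ICD-10-CM codes and the exclusion shown are illustrative only, not a complete or validated CHF value set; the function name and structure are our own for this example.

```python
# Sketch: a disease definition as an explicit, reviewable code set.
# Codes below are illustrative only -- NOT a complete or validated CHF
# value set.  A real definition would be governed and documented.

CHF_CODES = {"I50.9", "I50.22", "I50.32", "I11.0"}  # hypothetical working set
CHF_EXCLUSIONS = {"Z94.1"}  # e.g. heart transplant status, if excluded by the definition

def is_chf_patient(diagnosis_codes):
    """True if any diagnosis matches the CHF set and none match an exclusion."""
    codes = set(diagnosis_codes)
    if codes & CHF_EXCLUSIONS:
        return False
    return bool(codes & CHF_CODES)

# Narrowing or widening the code set changes who gets flagged as high risk:
print(is_chf_patient(["I50.9", "E11.9"]))   # True
print(is_chf_patient(["I50.9", "Z94.1"]))   # False (exclusion applies)
print(is_chf_patient(["I10"]))              # False
```

The value of writing it this way is that the definition itself becomes an artifact that clinicians and analysts can inspect, debate, and version, instead of an assumption hidden inside a report.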

To help minimize unintentional bias and maximize informed decision making, we work with all of our clients to make sure the subfloor of data management is installed and in good working order before we begin surfacing data for use (to either humans or machines).  Here’s how we go about it:

  1. We work with each client to be sure all relevant data and logic definitions (e.g. the primary and secondary codes, along with exclusions, for CHF) are clearly visible and well understood.  We’re completely transparent about this, and if there’s a need to change one of our standard definitions, we’ll change it.
  2. We also work with our clients to recommend data governance practices within their organization.  This may not rise to the level of a formal system, but it at least includes documentation and a change management process, so that logic definitions are introduced or changed only through predictable methods.
  3. We use the open source Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) (supported by OHDSI) for all of our databases.  This ensures that regardless of the data source, all incoming data is standardized into a well-known model for systemized analysis.  The CDM also allows for efficiency in re-usable analysis, in that the same types of analyses can be shared across data from disparate organizations.
  4. We run routine data quality checks across the system.  Just because there aren’t any hard errors during data import and processing doesn’t mean there aren’t some potential issues lurking within the data.  Are there more than the usual number of nulls for a given field?  Are there duplicates in a field that is supposed to be unique?  Is there an illogical diagnosis, such as pregnancy for a male?  These are potential data issues, and can cause downstream challenges if left unaddressed.
  5. Training: we love it.  Really, we do.  We think it might be the best way to increase collective awareness and coordination.  The client learns about things like how foundational choices made during data management ultimately impact decision making across the organization.  We learn about what we need to start, stop, or keep doing to be sure our solutions are producing real, tangible value.
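The data quality checks in step 4 can be sketched in a few lines.  This is an illustrative example over made-up visit records, not our production tooling; the field names, sample data, and null-rate threshold are all assumptions for the sake of the demo.

```python
# Sketch of the kinds of routine data quality checks described in step 4.
# Field names, sample records, and thresholds are illustrative only.

from collections import Counter

records = [
    {"visit_id": "V1", "sex": "M", "dx": "I50.9"},
    {"visit_id": "V2", "sex": "F", "dx": None},
    {"visit_id": "V2", "sex": "F", "dx": "O80"},   # duplicate visit_id
    {"visit_id": "V3", "sex": "M", "dx": "O80"},   # pregnancy dx on a male
]

def check_nulls(rows, field, max_rate=0.10):
    """True if the null rate for a field is within the expected threshold."""
    nulls = sum(1 for r in rows if r[field] is None)
    return nulls / len(rows) <= max_rate

def check_unique(rows, field):
    """Return values appearing more than once in a supposedly unique field."""
    counts = Counter(r[field] for r in rows)
    return [v for v, n in counts.items() if n > 1]

def check_logic(rows):
    """Find illogical combinations, e.g. a pregnancy code (ICD-10 'O' chapter) on a male."""
    return [r["visit_id"] for r in rows
            if r["sex"] == "M" and r["dx"] and r["dx"].startswith("O")]

print(check_nulls(records, "dx"))         # False: 25% nulls exceeds the 10% threshold
print(check_unique(records, "visit_id"))  # ['V2']
print(check_logic(records))               # ['V3']
```

None of these records would trigger a hard error on import, which is exactly the point: the problems only surface if someone goes looking for them.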

Data management, though not the most popular of all the analytics components, is nonetheless a fundamental and irreplaceable part of any analytics system.  Whether it’s humans or machines making downstream decisions, if the data isn’t being managed, there are probably going to be rotten areas, and it’s only a matter of time before someone or something falls through.