Wednesday, September 29, 2010

What's in Your Data?

You've heard the old adage "garbage in, garbage out" as it relates to data quality and the quality of subsequent analytics, right? It is true that numbers that misrepresent reality will not tell an accurate story when they are crunched, but data quality is easier said than done. And how bad do bad data need to be before they become really bad? After all, unless there are systemic issues or consistent biases with a dataset and how it is collected, the "averages" will tend to give a pretty good representation of, well, the average - good enough anyway to glean some business insight into a subject of study. The loser in this scenario, however, is a crisp understanding of the underlying business variability, because random errors in the data inflate the apparent spread even when they leave the average largely intact. Unfortunately, that variability oftentimes ends up being more important for business planning and decision making than the average. But still, this is no reason to discard a dataset that is suspected to be unclean.
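To make the point about averages versus variability concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the "true" sales figures are simulated, and the recording errors are assumed to be random with no bias in either direction. Under those assumptions the average barely moves, while the measured spread grows noticeably.

```python
import random
import statistics

def add_unbiased_noise(values, noise_sd, seed=0):
    """Simulate sloppy data capture: each value gets a random error
    centered on zero (an assumption; real errors may well be biased)."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, noise_sd) for v in values]

# Hypothetical "true" weekly sales figures, simulated for illustration only.
rng = random.Random(42)
true_sales = [rng.gauss(100.0, 10.0) for _ in range(5000)]

# The same figures as they might be recorded by an error-prone process.
recorded_sales = add_unbiased_noise(true_sales, noise_sd=15.0, seed=7)

true_mean = statistics.mean(true_sales)
recorded_mean = statistics.mean(recorded_sales)
true_sd = statistics.stdev(true_sales)
recorded_sd = statistics.stdev(recorded_sales)

print(f"mean:   true {true_mean:.1f}   recorded {recorded_mean:.1f}")
print(f"spread: true {true_sd:.1f}   recorded {recorded_sd:.1f}")
```

If the errors were biased instead (say, everyone rounds up), the average would drift too, which is the "systemic issues" case mentioned above.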

It turns out that there are a whole host of quantitative techniques for identifying and dealing with "bad" data. No technique will turn garbage into gold, however. One must eventually adopt a philosophy of sacrificing some data quality for the act of getting down to business and actually doing something with the data. Far too often there is an inordinate amount of effort put into data quality (rightfully so in some cases - like regulatory reporting), with analytics becoming an afterthought. If data are to be leveraged to make better, smarter decisions - with limited resources - there needs to be a relaxation of the high expectations for data quality.
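As one example of such a technique, the sketch below flags suspicious values using the median and the median absolute deviation (MAD), which are less distorted by the bad values themselves than the mean and standard deviation are. The order counts are invented, and the 3.5 cutoff is a commonly used rule of thumb rather than a law; treat both as assumptions to adjust for your own data.

```python
import statistics

def flag_outliers_mad(values, threshold=3.5):
    """Return a True/False flag per value using a modified z-score
    built from the median and the median absolute deviation (MAD)."""
    med = statistics.median(values)
    abs_dev = [abs(v - med) for v in values]
    mad = statistics.median(abs_dev)
    if mad == 0:
        # More than half the values are identical; this rule cannot score them.
        return [False] * len(values)
    # 0.6745 rescales the MAD so scores are comparable to ordinary z-scores
    # when the clean data are roughly normal.
    return [0.6745 * d / mad > threshold for d in abs_dev]

# Hypothetical daily order counts with two obvious entry errors.
daily_orders = [102, 98, 105, 97, 101, 1010, 99, 103, 100, -5]
flags = flag_outliers_mad(daily_orders)
suspect = [v for v, is_bad in zip(daily_orders, flags) if is_bad]
print(suspect)  # [1010, -5]
```

Flagging is not the same as deleting. A flagged value is a prompt to go look, and whether to drop, correct, or keep it is still a business call.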

We can have it both ways though, if we reconsider how we capture data: Any data that require a human to use any kind of judgement, and that additionally require that same (or another) human to input those data into a computer, are right off the bat, "bad" (think surveys). This can be mitigated somewhat by carefully controlled data forms, but the point is that no two humans will look at the same data call identically, and even the same human might look at it differently at different points in time. A far better way to capture data is through the machine capture of human activity. This is the "data exhaust" that gets emitted when humans interact with machines - login frequency, clicks, emails, web searches, non-cash purchases, monitoring systems, etc. Let's admit it, most of our business activity one way or another interacts with machines, and those machines do (or could) capture the who, what, where and when of this activity (it is up to business analytics to figure out the "why"). After all, business analytics at its core is all about trying to understand specific human behaviors that relate to one's business (sales, marketing, operations, etc.). So why not use a data source that specifically and automatically tracks that, rather than trying to replicate it with subjective surveys?

Whatever the quality of your data, your means of capturing them or your philosophy and tolerance towards data quality, think about ways to improve your data, but don't obsess over it.  Instead, understand the shortcomings of what you've got, take advantage of what's good, and get out there and use your data!

Sunday, September 5, 2010

Models for Decision Making

What are decision making models? Generally they are tools that allow senior managers to prospectively evaluate management decisions they plan to make for their organization. Decision making models are classified as predictive models and optimization models. Predictive models focus on the likely outcome of a decision over some future time period. For example, a business development manager may wish to understand the impact over time of a marketing campaign prior to initiating it. Predictive decision models in this case would draw on historical data of past marketing campaigns and on professional judgment to forecast the impact of a prospective campaign, given its characteristics. These types of models allow for what is often referred to as "what-if" analysis, allowing the business development manager to "turn the dials" of the base assumptions for the campaign (e.g. target prospects).

Optimization models, on the other hand, take "what-if" analysis of predictive models to the next level. They allow the business manager to understand the "best" set of decisions for a particular business strategy. For the marketing campaign example, an optimization model would suggest target prospects that would generate the best return on investment for the campaign.
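The difference between the two kinds of model can be sketched in a few lines of Python. All of the numbers here are hypothetical: the conversion probabilities stand in for the output of some predictive model, and the revenue and cost figures are placeholders. Expected profit is used as a simple stand-in for return on investment. The first function answers a "what-if" question for a target count the manager chooses; the second searches every possible count for the best one.

```python
def campaign_profit(conv_probs, n_targets,
                    revenue_per_sale=200.0, cost_per_contact=15.0):
    """What-if: expected profit from contacting the n most promising prospects."""
    ranked = sorted(conv_probs, reverse=True)
    chosen = ranked[:n_targets]
    expected_sales = sum(chosen)
    return expected_sales * revenue_per_sale - cost_per_contact * len(chosen)

def best_target_count(conv_probs, **assumptions):
    """Optimization: try every target count and keep the most profitable."""
    candidates = range(len(conv_probs) + 1)
    return max(candidates,
               key=lambda n: campaign_profit(conv_probs, n, **assumptions))

# Hypothetical predicted conversion probabilities, one per prospect.
probs = [0.30, 0.02, 0.12, 0.05, 0.20, 0.08, 0.01, 0.10]

# "Turning the dials" by hand: contact the top 3, or contact everyone.
profit_top3 = campaign_profit(probs, 3)
profit_all = campaign_profit(probs, len(probs))

# Letting the model pick the dial setting.
best_n = best_target_count(probs)
best_profit = campaign_profit(probs, best_n)
print(best_n, round(best_profit, 2))
```

With so few prospects, trying every option is perfectly adequate. Real campaigns with budgets, channels and timing to juggle are where dedicated solvers, including the one built into most spreadsheet software, start to earn their keep.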

For many small and medium-sized businesses, these types of decision making tools seem out of reach either because of a lack of quantitative resources, or because of the high cost of commercially available decision making tools. These barriers need not prevent the adoption of this type of business strategy, however. Spreadsheet models or spreadsheet simulation provide the necessary technology to deliver decision making tools at a low cost. Coupled with that, decision modeling consultants can offer quantitative resources on a contract basis that deliver business-specific tools to the hands of the business manager.

Historical vs. Predictive Analysis

Oftentimes consumers of business information wish to summarize available data through data summaries and data visualizations (pivot tables, charts, graphs, etc.). While these tools are incredibly important for providing insight into past business outcomes, they are somewhat limited in telling a complete and forward-looking story. They rely on a restricted set of past experiences and business scenarios (e.g. what sales results have been with current and past staff levels) and limit the ability to deduce potential future outcomes.

Predictive modeling, on the other hand, uses historical results to drive statistical models which can forecast outcomes for business scenarios that have not yet occurred (e.g. what will sales results be if we increase staff levels). This interpolation and extrapolation allows for “what-if” analysis that informs the decision making process. Also, if correctly designed, predictive analysis will not only offer point estimates, but also expected outcome ranges that consider inherent business variability and uncertainty.
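Here is a minimal sketch of the staffing example, using a straight-line fit. The staff and sales figures are invented, and a straight line is itself an assumption that may not hold far outside the observed range. The range uses a normal approximation for simplicity; with this few data points a t-based interval would be somewhat wider.

```python
import math
import statistics

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    resid_sd = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return {"slope": slope, "intercept": intercept, "resid_sd": resid_sd,
            "x_bar": x_bar, "sxx": sxx, "n": n}

def predict_with_range(x_new, fit, level=0.95):
    """Point estimate plus an approximate range for one new outcome."""
    point = fit["intercept"] + fit["slope"] * x_new
    # The range widens as x_new moves away from the staff levels already seen.
    se = fit["resid_sd"] * math.sqrt(
        1 + 1 / fit["n"] + (x_new - fit["x_bar"]) ** 2 / fit["sxx"])
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return point - z * se, point, point + z * se

# Hypothetical history: staff level and sales (in thousands) per period.
staff = [4, 5, 6, 7, 8, 9]
sales = [41, 52, 58, 71, 79, 92]
fit = fit_line(staff, sales)

inside = predict_with_range(6.5, fit)   # interpolation, within past experience
outside = predict_with_range(12, fit)   # extrapolation, beyond past experience
for low, point, high in (inside, outside):
    print(f"{point:.1f} (range {low:.1f} to {high:.1f})")
```

Note that the range for 12 staff comes out wider than the range for 6.5. The model is telling you it is less sure about scenarios that look less like anything in the history.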