Data, information, and evidence
Data is the foundation of information, information builds evidence and evidence allows you to make informed decisions about improving productivity, streamlining processes and understanding customers. You need data analysis tools to turn data into information from which you can gather the evidence required to make and support your decisions.
Today, companies can collect more data than was practical in the past, and do so at a faster rate and from multiple sources. Big data is the term that describes gathering, organizing, analyzing and managing this data to support evidence-based decision-making, and it is the next generation of business intelligence, data warehousing and data analytics. It requires new ways to organize data and new data analysis tools, and you may need a team of data scientists to understand the data.
You can use real-time data analysis for tactical decisions associated with day-to-day operations. Strategic data analysis teases out long-term trends and can occur in any time frame.
Let’s look at two trends related to big data:
- The ever-increasing volume of data
- The reducing gap between data collection and its analysis
How can these trends change the way your business operates?
Data volume is growing at an increasing rate
The first trend to consider is the growth of the data companies collect.
Before the advent of the Internet, processing power and storage were expensive and ad hoc queries were impractical. Applications collected the minimum data required to record a transaction, identify a customer or describe a product. The data that companies used to collect included general ledger, payroll, accounts payable, accounts receivable, customer details, product descriptions and invoices. If the CFO wanted to query the data, he or she submitted a query request to IT.
As the cost of processing power and storage fell, companies collected more data and extracted some of the data into data warehouses. Online analytical processing (OLAP) of the data in data warehouses allowed anyone to analyze data without asking for permission from IT.
Today, we collect transaction data and data from a wider variety of sources and in multiple formats. The size of structured data in databases has increased with growing transaction volumes, and unstructured data outside databases has increased at an even faster rate. Examples of the data collected today include (in no particular order):
- Audio files
- Collaboration data
- CRM data including sales tracking, prospects, customer profiles
- Customer demographic data
- Data gathered by sensors (e.g. temperature)
- Image files
- Customer preference data from loyalty programs
- Location data from mobile applications
- Search results
- Social data
- Spatial, GPS and location data
- Statistical data extracted from open data sets (e.g. data.gov and data.worldbank.org)
- Text files
- Video files
- Wikis and blogs
Data comes from multiple sources including documents, spreadsheets, social media and structured sources such as relational databases. Sensors collect readings (e.g. warehouse temperatures by the minute) and cameras record video (e.g. who went in and out of the reception area). Companies can collect session data when customers use Web applications. Reconstructing user session data reveals what other products the customer looked at besides a purchased item. Retaining this information provides a detailed story of a customer’s interaction with the company.
Many companies have always collected vast amounts of data; the finance and insurance industries are two examples. Although large collections of historical data have been around for some time, these large data sets are not big data in themselves. They become big data when you want to analyze them. The characteristics of big data are:
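As a rough illustration of reconstructing session data, the sketch below groups click-stream events by session and lists the products a customer viewed but did not buy. The event layout (session id, action, product) and the sample values are assumptions made for this example, not a format used by any particular Web application.

```python
from collections import defaultdict

# Hypothetical click-stream events: (session_id, action, product).
events = [
    ("s1", "view", "shirt"),
    ("s1", "view", "tie"),
    ("s2", "view", "shoes"),
    ("s1", "purchase", "shirt"),
    ("s1", "view", "belt"),
]


def viewed_but_not_purchased(events):
    """Group events by session and return, for each session that ended in a
    purchase, the products that were viewed but not bought."""
    viewed = defaultdict(set)
    purchased = defaultdict(set)
    for session_id, action, product in events:
        if action == "view":
            viewed[session_id].add(product)
        elif action == "purchase":
            purchased[session_id].add(product)
    # Sessions without a purchase are left out of the result.
    return {
        session_id: sorted(viewed[session_id] - items)
        for session_id, items in purchased.items()
    }


print(viewed_but_not_purchased(events))
```

In this sample, the customer in session s1 bought a shirt and also looked at a tie and a belt, which is the kind of detail a company could retain as part of the customer's story.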
- Volume – quantity or size
- Variety – many structure types and formats
- Velocity – speed of accumulation of the data
Variety and velocity are the new factors that distinguish big data from the data warehouses of the past. Of course, size matters, and data volume is growing faster than ever, placing pressure on storage capacity and processing power.
Implications of this trend
The upward trend in the volume of data places operational demands on companies that wish to retain and analyze their data.
The volume of data will force companies to expand their storage capacity and consider criteria for choosing what data to keep online, what data to discard and what data to send to offline storage.
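One way to picture such criteria is a small policy function that assigns each data set to a storage tier. The sketch below is only an illustration: the function name, the 90-day and seven-year thresholds and the regulatory hold flag are all assumptions, and real criteria would come from your own business and regulatory requirements.

```python
from datetime import date


def storage_tier(last_accessed, today, regulatory_hold=False,
                 online_days=90, offline_days=365 * 7):
    """Return 'online', 'offline' or 'discard' for one data set.

    The thresholds are illustrative assumptions, not recommendations.
    """
    age_days = (today - last_accessed).days
    if age_days <= online_days:
        return "online"
    # Data under a regulatory hold is never discarded, only moved offline.
    if regulatory_hold or age_days <= offline_days:
        return "offline"
    return "discard"


print(storage_tier(date(2024, 1, 1), date(2024, 2, 1)))
print(storage_tier(date(2010, 1, 1), date(2024, 1, 1), regulatory_hold=True))
```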
The variety of data collected complicates the design of storage solutions. Unstructured data presents a challenge in storage management because of the size and distribution of the data across multiple databases and file systems.
A complex challenge is how to organize the data to optimize it for data analysis, when there is no time for complex extract, transform and load processes required to populate data warehouses. More and more of the tactical data analysis will occur using the original data.
The velocity of data accumulation means that companies have less time to perform tactical data analysis. Not analyzing all of the data may result in lost sales opportunities or failing to see where process optimization is practical.
The usefulness of evidence produced by data analysis depends on the quality of the data. Data accuracy and validity are essential to a valid interpretation of the data. Ensuring the data is valid at collection will reduce the complexity of both real-time and strategic data analysis. Business rules control the data validation process. Today, business rules are spread through applications, policy and procedure documents and the ad hoc decisions managers make during day-to-day operations. Data validation is complex and inflexible when business rules are distributed across many applications. Business rules in policy and procedure documents are easy to change but difficult to enforce. Companies that extract the business rules from applications into a rules engine will be able to respond to business rule changes more quickly. They can also assign ownership of each business rule to the appropriate area of the business.
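To make the idea of extracting business rules concrete, here is a minimal sketch in which the rules live in one list, each with a named owner, and every record is validated against that list at collection. The rule names, owners and order fields are hypothetical, and a commercial rules engine would offer far more than this.

```python
# Each rule is a (name, owner, check) triple kept in one place
# instead of being scattered across applications.
# All names and fields below are hypothetical examples.
RULES = [
    ("quantity_positive", "sales",
     lambda order: order.get("quantity", 0) > 0),
    ("customer_present", "customer service",
     lambda order: bool(order.get("customer_id"))),
    ("price_not_negative", "finance",
     lambda order: order.get("unit_price", -1) >= 0),
]


def validate(record, rules=RULES):
    """Return the names of the rules the record fails (empty list = valid)."""
    return [name for name, _owner, check in rules if not check(record)]


good_order = {"customer_id": "C1", "quantity": 2, "unit_price": 9.5}
bad_order = {"quantity": 0, "unit_price": 5}

print(validate(good_order))
print(validate(bad_order))
```

Because the rules sit in a single list, changing a rule means editing one entry rather than several applications, and the owner field records which area of the business is responsible for it.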
Applications will have to expand to collect and process data such as individual customer preferences in order to capture the data necessary for up-sell and cross-sell opportunities in real-time.
Companies will have to devote more resources to managing data. Data managers will face the task of ensuring ongoing operation and availability of data in databases and file systems.
Data access and authorization requirements will become more urgent if companies are to prevent data theft or data compromise.
Privacy laws add to the data management workload. Companies face a dilemma: remove identity from the data, thereby losing the ability to profile individual customers, or retain the identity data and risk breaching privacy laws.
The reducing gap between data collection and its analysis
A second trend is the reducing gap between data collection and its analysis.
Data is not evidence. Data requires analysis to become evidence on which companies can make decisions.
“Knowing what happened and why it happened are no longer adequate. Organizations need to know what is happening now, what is likely to happen next and what actions should be taken to get the optimal results.”
Source: “Big Data, Analytics and the Path From Insights to Value,” Steve LaValle, Eric Lesser, Rebecca Shockley, Michael S. Hopkins and Nina Kruschwitz, MIT Sloan Management Review, Winter 2011, Vol. 52, No. 2.
“99.9 percent of the time, you’re going to say ’no.’ But now let’s imagine if the store could automatically look at all of my past purchases and see what other items I bought when I came in to buy a pair of pants — and then offer me 50 percent off a similar purchase? Now that would be relevant to me. The store isn’t offering me another lame credit card — it’s offering me something that I probably want, at an attractive price.”
Source: Minelli, Michael; Chambers, Michele; Dhiraj, Ambiga (2012-12-27). Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today’s Businesses (Wiley CIO) (Kindle Locations 571-574). Wiley. Kindle Edition.
In the past, data analysis was a review of history. Business transactions occurred and data analysis occurred sometime later. Applications saved transaction data in real-time in relational databases. At a later point in time, data extract applications examined the transaction data, cleansed it, reorganized it and saved it in a data warehouse. Only then did the data analysis occur. Two of the differences between data warehousing and big data are where the data resides and when the analysis occurs. Data warehouses require time-consuming tasks to extract, cleanse and transform the data so that it is ready for analysis. Today, companies can perform data analysis at the time a transaction occurs.
Data analysis used to be a back-office function. In today’s business context, the analysis occurs both in day-to-day operations (tactical analysis) and in post-event what-if analysis (strategic analysis).
The value of the data reduces as the gap between the business transaction and the data analysis increases. Opportunities for actions related to or provoked by a transaction diminish when the data analysis occurs after the event. A person purchasing a shirt may consider a tie at the time of the purchase but is less likely to think about a tie a week or two later.
The pace of business activity has increased, and post-event data analysis occurs too late to take advantage of opportunities for generating business from transactions. Today, companies must be able to analyze data at or near the time of its collection, determine the usefulness of the data and initiate actions that take advantage of the business opportunities a transaction creates.
Implications of this trend
The implications of this trend will vary according to the industry and nature of the products and services companies supply.
Companies selling consumer products need data analysis in real-time so that a Web site can suggest products related to items viewed by a customer. Point-of-sale systems can provide a similar service for customers purchasing products in a store, either by suggesting related products and offering discounts or by giving sales assistants an opportunity to chat with the customer.
Products with a long lead time up to a sale do not need real-time data analysis to suggest related products. Nevertheless, data analysis in real-time can pinpoint bottlenecks in production and distribution processes. Companies selling this type of product can also gather industry and competitor data and analyze the data to ensure the products they offer satisfy the requirements of potential customers.
Making decisions based on real-time data analysis can be misleading unless you are sure that you have sufficient data and that the data is accurate and a reliable indicator of behavior and preferences.
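A simple way to suggest related products is to count how often items were bought together in the past and offer the most frequent companions of the item in hand. The sketch below shows that idea with made-up baskets; the function names and data are assumptions for illustration, and production recommendation systems use considerably more sophisticated methods.

```python
from collections import Counter
from itertools import permutations

# Hypothetical past purchases, one set of products per transaction.
past_baskets = [
    {"shirt", "tie"},
    {"shirt", "tie", "belt"},
    {"shirt", "belt"},
    {"pants", "belt"},
    {"shirt", "tie"},
]


def build_cooccurrence(baskets):
    """Count, for every product, how often each other product
    appeared in the same basket."""
    counts = {}
    for basket in baskets:
        for item, other in permutations(basket, 2):
            counts.setdefault(item, Counter())[other] += 1
    return counts


def suggest(item, cooccurrence, limit=2):
    """Return up to `limit` products most often bought with `item`."""
    ranked = cooccurrence.get(item, Counter()).most_common(limit)
    return [other for other, _count in ranked]


cooccurrence = build_cooccurrence(past_baskets)
print(suggest("shirt", cooccurrence))
```

The counts can be built ahead of time from historical data, so the lookup at the Web site or point-of-sale terminal is fast enough to happen while the customer is still making the purchase.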
Companies operating data warehouses will have to determine what type of data analysis will occur in real-time and what analysis will remain in the data warehouse.
All companies will have to add real-time data analysis to the applications that interface with customers via a Web site, a call center or sales assistant at a point-of-sale terminal.
An analysis of the infrastructure capacity will determine whether it can cope with the additional processing load imposed by real-time data analysis.
In the past, transactions (e.g. a shirt sale) were discrete events. Post-transaction data analysis reduced the relationship between company and customer to simplistic aggregations from which companies extrapolated trends.
In the past, when designing products and services, companies had to use a generic approach – a sort of one size fits all. The advent of big data and the tools to analyze the data allows companies to design products and services for individuals or smaller groups than a whole population.
Transactions today are part of a relationship between a company and individual customers (e.g. companies can use a shirt sale as a catalyst for related sales and offer loyalty programs). Companies can collect demographic and relationship data to help suggest the type and style of product to offer.
The following table presents a summary of differences between the time before big data and now.
| |BEFORE BIG DATA|NOW|
|---|---|---|
|Transaction volume|Large|Larger and growing|
|Data volume|Small to medium|Medium to large|
|Data retention|Regulatory and only necessary data|Everything|
|Data access|Some data on-line with restore from off-line|Always on-line|
|Analysis level|Aggregated and summarized|Actual data for individual customers|
|Analysis possibilities|Infer from summary data|Compare individuals with others in same or similar demographic groups|
|Cost of real-time analysis|High to prohibitive|Affordable (probably essential for consumer products)|
Big data is big because the volume is larger than typical database and data analysis tools can collect, save, manage and analyze. Even so, volume is not the primary issue. The question is how to use the data to increase productivity, streamline processes, reduce costs and drive sales.
Big data and data analytics are not a panacea for efficiency and profitability. Companies that treat them as one can spend thousands of dollars on big data projects and achieve little insight. The cost of the technology components and corporate attitudes contribute to project failure.
Big data and analytics provide both an opportunity and a challenge. A successful big data initiative requires planning, management and, most important, it requires an agreed and measurable objective.
Don’t be driven by the hype. Technology companies offer tools for managing big data and analytics, and claim impressive results. You should test these claims in the context of your operations. Factors to consider include whether the reported results come from companies in the same industry, of comparable size and even with a similar corporate culture.
If you are not thinking about how to use data for competitive advantage for your company, you should be.