Tuesday, March 31, 2015

Moore's Law, Cloud Computing and Business Intelligence


DW/BI and Cloud Computing


As we all know, data warehousing and business intelligence (DW/BI) have been key to making the informed strategic decisions that drive an organization’s success. DW/BI systems are generally hosted on servers within the organization so as to be accessible to a network of decision-making executives. To meet the increasing demand for data analysis, the varied functionality expected from DW/BI tools and the enormous volumes of data these tools handle, DW/BI has started to move from servers and local machines to the cloud.

Cloud computing is one of the biggest advances in technology. Its greatest advantage is the flexibility it gives organizations to meet fluctuating demand. Organizations are moving to the cloud to meet their requirements for scalability, computing resources, speed, workload balancing, storage and reliability. In addition to these benefits, the cloud is an effective cost-saving option, because the resources in use can be scaled up or down as requirements change.
With this introduction to DW/BI on the cloud, let’s look at some of the latest advancements that combine DW/BI with cloud computing.

1. HauteLook with Amazon Redshift:


Redshift is a cloud-based, petabyte-scale data warehouse service available on Amazon Web Services (AWS). Unlike a traditional row-oriented database, it uses column-oriented storage and parallel processing to handle large-scale datasets. Redshift combines well with Attunity Replicate and Attunity CloudBeam, which provide the flexibility to change configurations based on data needs and thereby achieve high-speed performance. Since Redshift runs on AWS, it has the additional advantage of no hardware purchase or maintenance cost.
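To make the column-oriented, parallel idea concrete, here is a minimal conceptual sketch in Python. It is not how Redshift is implemented: the `to_columns` and `parallel_sum` helpers and the sample `rows` are hypothetical, and a thread pool merely stands in for separate compute nodes. It shows that a query such as SUM(amount) only needs to read one column, and that the work can be split into slices whose partial results are combined at the end.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample data in a row-oriented layout: each record is stored together.
rows = [
    {"order_id": 1, "region": "West", "amount": 120.0},
    {"order_id": 2, "region": "East", "amount": 75.5},
    {"order_id": 3, "region": "West", "amount": 42.0},
    {"order_id": 4, "region": "South", "amount": 310.0},
]


def to_columns(records):
    """Pivot row-oriented records into a column-oriented layout (one list per column)."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def parallel_sum(values, n_slices=2):
    """Split a column into slices, sum each slice in parallel, then combine the partials.

    Threads stand in for compute nodes here; this is a conceptual illustration only.
    """
    slices = [values[i::n_slices] for i in range(n_slices)]
    with ThreadPoolExecutor(max_workers=n_slices) as pool:
        partial_sums = list(pool.map(sum, slices))
    return sum(partial_sums)


columns = to_columns(rows)
# A query like SUM(amount) only has to read the "amount" column, not whole rows.
total = parallel_sum(columns["amount"], n_slices=2)
print(total)  # 547.5
```

Storing each column contiguously is what lets a columnar engine skip the columns a query does not touch; the slicing step mirrors, in a very simplified way, how a parallel warehouse spreads data across nodes.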

HauteLook, a company acquired by Nordstrom, uses Amazon Redshift for its cloud BI solution. The company reportedly improved its processing speed while holding its data warehousing costs steady by moving to Amazon Redshift.

2. P&G with Teradata Enterprise Analytics on Cloud:

Teradata offers Enterprise Analytics on Cloud in several deployment variations, as seen in its architecture: private cloud, public cloud and hybrid.
Teradata provides a unified data warehouse architecture that delivers Discovery as a service and Data Management as a service on the cloud. The Discovery platform uses MapReduce functions to drive sophisticated path and graph analysis on multi-structured data, enabling rapid exploration and discovery of data. The Data Management module capitalizes on Hadoop for large-scale dataset management, enabling better analysis and management of data of any size.
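As an illustration of the kind of path analysis MapReduce enables, the sketch below counts page-to-page transitions in clickstream sessions using explicit map, shuffle and reduce phases. It runs in a single Python process and is not Teradata's API; the session data and function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical clickstream: each session is the ordered list of pages a user visited.
sessions = [
    ["home", "search", "product", "checkout"],
    ["home", "product", "checkout"],
    ["home", "search", "product"],
]


def map_session(session):
    """Map phase: emit ((from_page, to_page), 1) for each consecutive step in a path."""
    return [((a, b), 1) for a, b in zip(session, session[1:])]


def shuffle(pairs):
    """Shuffle phase: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce phase: sum the counts for each transition."""
    return {key: sum(values) for key, values in groups.items()}


mapped = chain.from_iterable(map_session(s) for s in sessions)
transition_counts = reduce_counts(shuffle(mapped))
print(transition_counts[("search", "product")])    # 2
print(transition_counts[("product", "checkout")])  # 2
```

In a real cluster the map and reduce functions run on many machines and the shuffle moves data between them; the logic per record stays this simple.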

P&G, a world-class leader in the consumer products industry, uses Teradata Enterprise Analytics on Cloud for its global marketing. P&G has incorporated Teradata Customer Interaction Manager, Teradata Master Data Management and Teradata Warehouse Miner. Using these, the company was able to drill down into its global marketing campaigns and analyze their effectiveness. The company also cites the speed of deployment and the low capital cost of setting up this analytics tool. P&G was able to hold conversations in the cloud with about 4.8 billion consumers and to keep all of this data in a single, up-to-date location available to all of P&G around the globe.

3. Coach with MicroStrategy Cloud Business Intelligence Platform:


MicroStrategy provides a powerful cloud analytical BI platform that offers self-service analytics, big data analytics and mobile analytics. Its main advantage is that it allows information from different systems to be combined without the need for any scripting language. It provides powerful predictive and data integration capabilities, mobile support and advanced analytical tools to help customers get the best results. The most differentiating factor for MicroStrategy is that, unlike others, it hosts its software in its own data center while also allowing customers to keep their data on their own premises to alleviate security concerns.

Coach is a leading company in designer accessories. Using MicroStrategy BI, the company gained quick BI insights and faster decision-making capabilities. Some of the features Coach appreciated most are the reusable report templates, timely visibility into key metrics on the dashboard, visibility into granular details in the reports and interactive mobile applications.


Thursday, March 5, 2015


Data Visualizations
Data warehouses provide a logical view of the data from relational databases in a more generalized, consolidated and multidimensional form. The model focuses on providing easier access to data and arriving at better insights about the business to make improved decisions. There is no single way of modeling a data warehouse for a given business case; there can be many different data warehouse models for the same business, each with a different perspective. The best approach is to identify the one that best suits the client's requirements. To explain this better, let us take a closer look at different models and visualizations for different business vignettes. We will be looking at the following businesses: Healthcare, Telecommunications and Airline Transportation.

1. Healthcare:
Fig1: Main Dashboard with customizable metrics shown
Healthcare is a large industry with several loosely coupled operations involving numerous physicians, patients, specialists and managers across time and location. Management keeps track of all hospital-level hierarchical activities, from equipment supply and maintenance to performance tracking across departments. One of the best ways to visualize the data and metrics is shown in Fig1. The dashboard shows comprehensive census information for the hospital, patient cycle times, waiting and occupied physicians, overall staffing and even details on the usage and availability of patient rooms. Each data item is shown with its average turnaround metric, which gives a quick idea of where the hospital stands. This dashboard could also allow users to add extra metrics or data, making it customizable. The data above concerns hospital management.
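The average turnaround metric on such a dashboard boils down to simple arithmetic over timestamps. A minimal sketch, assuming hypothetical visit records with arrival and discharge times (the field names are illustrative, not taken from any specific hospital system):

```python
from datetime import datetime
from statistics import mean

# Hypothetical visit records: arrival and discharge timestamps for each patient visit.
visits = [
    {"patient": "P1", "arrived": "2015-03-02 08:00", "discharged": "2015-03-02 09:30"},
    {"patient": "P2", "arrived": "2015-03-02 08:15", "discharged": "2015-03-02 09:00"},
    {"patient": "P3", "arrived": "2015-03-02 09:00", "discharged": "2015-03-02 10:15"},
]

FMT = "%Y-%m-%d %H:%M"


def cycle_minutes(visit):
    """Cycle time of one visit (discharge minus arrival), in minutes."""
    start = datetime.strptime(visit["arrived"], FMT)
    end = datetime.strptime(visit["discharged"], FMT)
    return (end - start).total_seconds() / 60


average_cycle_time = mean(cycle_minutes(v) for v in visits)
print(f"Average patient cycle time: {average_cycle_time:.1f} min")  # 70.0 min
```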
Fig2: Drill-down Dashboard for each department workflow level
Drill-down options should be provided with the dashboard in order to manage department workflows efficiently. As a sample, Fig 2 shows the patient workflow in a specific department. The visualization uses pie charts, line charts and bar graphs. Graphs such as the cycle time for a patient visit in this department, the time spent on different procedures and the service distribution over time provide immense depth of detail on the performance of a department and its physicians.
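A drill-down is essentially the same aggregation repeated at a finer grain. The sketch below assumes hypothetical (department, procedure, minutes) records; the `drill_down` helper returns totals per department at the top level, or per procedure once a department is selected.

```python
from collections import defaultdict

# Hypothetical procedure-level records: (department, procedure, minutes spent).
records = [
    ("Cardiology", "ECG", 20),
    ("Cardiology", "Consultation", 35),
    ("Cardiology", "ECG", 25),
    ("Radiology", "X-ray", 15),
    ("Radiology", "MRI", 60),
]


def drill_down(rows, department=None):
    """Total minutes per department, or per procedure within one selected department."""
    totals = defaultdict(int)
    for dept, procedure, minutes in rows:
        if department is None:
            totals[dept] += minutes       # top level: one bar per department
        elif dept == department:
            totals[procedure] += minutes  # drill-down: one bar per procedure
    return dict(totals)


print(drill_down(records))                # {'Cardiology': 80, 'Radiology': 75}
print(drill_down(records, "Cardiology"))  # {'ECG': 45, 'Consultation': 35}
```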

2. Telecommunication:
Fig3: Dashboard for Telecom industry operations
The telecommunication business encompasses a number of operations, such as installing hardware, setting up the network, activating and upgrading services, generating bills, receiving payments and handling repair work orders. From a business perspective, some important insights could be the customer churn rate, customer additions by region and date, successful customer support, successful service plans and upgrades, and total installation costs by region and date.
The best way to visualize telecommunication data is to divide the processes and design a graphical view for each operation. A sample reference visualization is shown in Fig3. The financial indicators are grouped together and shown as line charts, such as total customers by date and financial growth over time. Line charts could also be used to show the performance of the service plans offered and overall performance against competitors. Bar graphs can provide information on customer satisfaction with each service provided and on plan upgrades over time.
Drill-down options could be provided here too, to see region-wise new customer growth, region-wise new service contracts signed and service plan distribution details. A consolidated table showing the current state of the company, with revenue, average revenue per user (ARPU) and overall growth rate, could give better insight into the data.
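Churn rate is one of the metrics listed above. A common definition is the number of customers lost during a period divided by the number of customers at the start of that period; the sketch below uses that definition with hypothetical regional figures (other definitions, such as dividing by the average customer base over the period, also exist).

```python
# Hypothetical monthly figures per region.
regions = {
    "North": {"start_customers": 2000, "lost": 50, "added": 120},
    "South": {"start_customers": 1500, "lost": 75, "added": 60},
}


def churn_rate(start_customers, lost):
    """Share of the customers present at the start of the period who left during it."""
    return lost / start_customers


for region, m in regions.items():
    rate = churn_rate(m["start_customers"], m["lost"])
    net_adds = m["added"] - m["lost"]
    print(f"{region}: churn {rate:.1%}, net additions {net_adds}")
# North: churn 2.5%, net additions 70
# South: churn 5.0%, net additions -15
```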
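The consolidated figures mentioned above are straightforward ratios. A minimal sketch with hypothetical quarterly numbers: ARPU is total revenue divided by the number of users, and the growth rate is the change in users relative to the previous period.

```python
# Hypothetical quarterly figures, for illustration only.
revenue = 1_250_000.0    # total revenue for the quarter, in dollars
active_users = 50_000    # subscribers this quarter
previous_users = 48_000  # subscribers in the previous quarter

arpu = revenue / active_users  # average revenue per user
growth_rate = (active_users - previous_users) / previous_users

print(f"ARPU: ${arpu:.2f}")               # ARPU: $25.00
print(f"Growth rate: {growth_rate:.2%}")  # Growth rate: 4.17%
```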

3. Transportation - Airline:
The airline business keeps track of many kinds of data, such as flight details, passenger information, payment processing, miles traveled, maintenance costs, flight construction/purchase costs and fueling costs. One of the most striking ways to represent data in this type of business is through bar charts and line charts. Performance lag could be shown by plotting two series for every metric: target and actual. Separate sections clearly showing overall performance and growth, fuel and other costs incurred, customer data, revenue information and flight information will be very useful. A customer section with line charts for customer satisfaction level, booking medium, customer base growth and average mileage points claimed gives detailed insight into the company's market reach and standing.
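Showing target and actual side by side is easier to read when the gap is also computed. A small sketch with hypothetical metrics and values; `performance_gap` is an illustrative helper, not part of any dashboard product.

```python
# Hypothetical monthly metrics: (metric name, target, actual).
metrics = [
    ("On-time departures (%)", 90.0, 84.0),
    ("Load factor (%)", 80.0, 82.0),
    ("Revenue ($M)", 120.0, 111.0),
]


def performance_gap(target, actual):
    """Return the absolute gap (actual - target) and the gap relative to the target."""
    gap = actual - target
    return gap, gap / target


for name, target, actual in metrics:
    gap, pct = performance_gap(target, actual)
    status = "behind target" if gap < 0 else "on or above target"
    print(f"{name}: target {target}, actual {actual}, gap {gap:+.1f} ({pct:+.1%}), {status}")
```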
Fig4: Dashboard for Flight operations
While the financial section shows line charts of revenue, profit margin and market standing among competitors, the flight section could show bar charts of delays by airport, flight cancellations by airport and miles flown with passenger load. An overall performance section could show the flights with the highest customer satisfaction, the routes with the most issues, a ranking of the reasons for flight disruption and the destinations with the highest profits as bar or pie charts. These would give the company an overall view for making predictions and business decisions, such as increasing the number of flights on a particular route, improving customer support or changing contracts with airports, which can have a strongly positive effect on business revenue.


Thursday, February 19, 2015

Structured Data Vs Unstructured Data


Information retrieval is the process of extracting useful information from any source to provide insights that resolve a business problem. This process involves mining any kind of data source available. The data can be of two different types.

Structured data refers to information whose data model has an organized structure, which provides a straightforward way to perform searches using traditional algorithms. Structured data mostly resides in a proper relational database. Only about 10%-20% of the available data is in this form. Structured data typically looks like a table: rows of records that share the same set of named, typed columns.
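Because structured data follows a known schema, searching it is a matter of a declarative query. A minimal sketch using Python's built-in `sqlite3` module with a hypothetical `customers` table:

```python
import sqlite3

# In-memory database with a fixed schema: every row has the same typed columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT, balance REAL)"
)
# Hypothetical sample records.
conn.executemany(
    "INSERT INTO customers (name, city, balance) VALUES (?, ?, ?)",
    [("Alice", "Seattle", 250.0), ("Bob", "Boston", 90.0), ("Cara", "Seattle", 410.0)],
)

# Because the structure is known in advance, a search is a simple declarative query.
rows = conn.execute(
    "SELECT name, balance FROM customers WHERE city = ? ORDER BY balance DESC",
    ("Seattle",),
).fetchall()
print(rows)  # [('Cara', 410.0), ('Alice', 250.0)]
conn.close()
```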



Unstructured data refers to all other data, which is not held in an organized data model or database. Unstructured data usually contains garbage in addition to useful information, and the challenge with this type of data is processing it to separate the useful information from the garbage. Almost 80%-90% of the available data is unstructured. Examples of unstructured data include social media content, word-processing documents and anything recorded on paper by a person.
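Processing unstructured data usually means extracting a few structured fields and discarding the rest. The sketch below is one simple, hypothetical approach: it pulls hashtags out of sample social media posts with a regular expression and treats posts that yield nothing as noise. Real pipelines use far richer techniques, such as language detection and entity extraction.

```python
import re

# Hypothetical social media posts: free text with no fixed schema.
posts = [
    "Loving the new #datawarehouse setup at work!! @alice :) :)",
    "lol",
    "Our #BI dashboard finally loads in 2 seconds #analytics",
]

HASHTAG = re.compile(r"#(\w+)")


def extract_hashtags(post):
    """Pull a structured field (lower-cased hashtags) out of free text."""
    return [tag.lower() for tag in HASHTAG.findall(post)]


extracted = {post: extract_hashtags(post) for post in posts}
# Posts with no extractable signal are treated as noise and dropped.
useful = {post: tags for post, tags in extracted.items() if tags}
all_tags = sorted({tag for tags in useful.values() for tag in tags})
print(all_tags)  # ['analytics', 'bi', 'datawarehouse']
```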



Growth of Unstructured Data Vs Structured Data: The volume of unstructured data has grown exponentially faster than that of structured data. Two major reasons account for this uncontrolled growth: user experience is better with rich content such as pictures, videos, music and X-rays, and such rich content takes up far more storage than plain text.



To manage the wild growth of unstructured data generated within an enterprise and to extract information from it, organizations have adopted two main methodologies: big data tools and business intelligence tools. The more conventional approach is BI.
A data warehouse is the central repository of integrated data from disparate operational systems. It gives structure to raw data by organizing it into OLAP cubes or dimensional models. The data organized by these modeling techniques is then used by BI reporting tools.
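Dimensional modeling usually means a star schema: a fact table of numeric measures that references descriptive dimension tables. The sketch below uses hypothetical store and date dimensions and a hypothetical `roll_up` helper to aggregate sales at different levels, which is the basic operation behind an OLAP cube.

```python
from collections import defaultdict

# Dimension tables (hypothetical): descriptive attributes keyed by surrogate keys.
dim_store = {
    1: {"city": "Seattle", "region": "West"},
    2: {"city": "Portland", "region": "West"},
    3: {"city": "Boston", "region": "East"},
}
dim_date = {
    20150101: {"year": 2015, "quarter": "Q1"},
    20150401: {"year": 2015, "quarter": "Q2"},
}

# Fact table: numeric measures plus foreign keys into the dimensions.
fact_sales = [
    {"store_key": 1, "date_key": 20150101, "amount": 100.0},
    {"store_key": 2, "date_key": 20150101, "amount": 50.0},
    {"store_key": 3, "date_key": 20150401, "amount": 80.0},
    {"store_key": 1, "date_key": 20150401, "amount": 30.0},
]


def roll_up(facts, *attributes):
    """Sum the measure grouped by the given dimension attributes (an OLAP-style roll-up)."""
    totals = defaultdict(float)
    for f in facts:
        # Join each fact to its dimension rows, then group by the requested attributes.
        context = {**dim_store[f["store_key"]], **dim_date[f["date_key"]]}
        totals[tuple(context[a] for a in attributes)] += f["amount"]
    return dict(totals)


print(roll_up(fact_sales, "region"))  # {('West',): 180.0, ('East',): 80.0}
print(roll_up(fact_sales, "region", "quarter"))
```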


The realization of the growth in unstructured data, and of the importance of tapping useful information from such raw data to make better business decisions, has pushed many organizations to adopt data warehousing.




Advantages of data warehousing:
  • Potentially high return on investment for organizations.
  • Centralized, structured and standardized data for easy interpretation and understanding.
  • Provides a competitive advantage.
  • Better and faster decision making by management, by providing the right information at the right time.
  • Better enterprise intelligence to enhance customer service.
  • Provides improved reporting capabilities.


Limitations of data warehousing:
  • The cost/benefit trade-off is a major disadvantage of data warehousing: it may consume a lot of IT man-hours and budget.
  • Extra reporting work may be a problem, because each type of data in the warehouse has to be generated by IT professionals.
  • It is time-consuming, as data must be extracted, cleaned and then loaded.
  • Data owners lose control over their data, which raises data security and privacy concerns.
  • Data flexibility can be a problem, as the data warehouse tends to hold static data with minimal ability to drill down to specific solutions.
  • A lot of time and money may be spent on training and maintaining data warehouses, especially in a large enterprise.
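The extract, clean and load steps mentioned in the list above can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical CSV export and an in-memory SQLite table standing in for the warehouse; production ETL adds scheduling, auditing and much stricter validation.

```python
import csv
import io
import sqlite3

# Extract source: a raw export from a hypothetical operational system (note the messy rows).
RAW = """order_id,customer,amount
1, alice ,120.50
2,BOB,not_a_number
3,carol,75
,dave,10
"""


def extract(text):
    """Read the CSV export into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Trim and normalize names; drop rows with a missing key or an invalid amount."""
    cleaned = []
    for row in rows:
        try:
            order_id = int(row["order_id"])
            amount = float(row["amount"])
        except ValueError:
            continue  # reject rows that fail validation
        cleaned.append((order_id, row["customer"].strip().title(), amount))
    return cleaned


def load(rows, conn):
    """Write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders (order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(clean(extract(RAW)), conn)
loaded = conn.execute("SELECT * FROM fact_orders ORDER BY order_id").fetchall()
print(loaded)  # [(1, 'Alice', 120.5), (3, 'Carol', 75.0)]
```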


Future of Data Warehouse:


Hadoop and the data warehouse will complement each other and grow together as the business need to reap value from big data grows. A new generation of data warehousing will emerge to enhance analytics and reporting while also integrating with the latest technology platforms that support processing of unstructured data. Future data warehouses will be able to provide a 360-degree view of an organization’s operations from a much broader perspective. In addition, data warehousing on the cloud will become the trend, and organizations will need to prepare for the transition. Access from anywhere, on any device, will also become the norm: all data warehousing activities should be supported through browser requests, so that people can work using tablets and mobile phones without installing specialized applications. One last capability that will be expected is the ability to transform web data as a whole into a data mesh and make connections as needed on the fly. This will enable the data warehouse to handle any type of data that may emerge in the future.





Tuesday, February 3, 2015

Comparative Analysis on Business Intelligence Solutions

BI tools used in the comparison: Tableau, SAP, IBM Cognos, MicroStrategy and Alteryx

Criteria for comparison:

  1. Interactive Data-Visualizations: The capability to present data in different interactive visualizations, enabling deeper analysis and a better understanding of the data. Some BI tools provide a drag-and-drop feature to make building these visualizations easier.
  2. Ad-Hoc querying capability: The capability that allows users to query for specific information and ask any business-related question beyond the reporting and dashboard data.
  3. Cost: The price of purchasing or licensing the product has to be affordable for the size of the organization. High prices may be too heavy a burden for smaller companies and reduce the value of the product in the market.
  4. Ease of use: The tool should be easy to use. It should allow business users to work with minimal or no assistance from technical experts, and incorporate drag-and-drop features for data blending, visualization and reporting, along with easy, understandable components that make navigation easier.
  5. Scalability: The tool should support analysis on datasets of any size and be suitable for any organization. A change in the organization's size should not force it to move to another tool.
  6. Reporting capabilities: The tool should support different types of reports with different visual renderings.
  7. Performance: Importing data into the environment, navigating between the tools, and generating reports and visualizations should all be fast for data sets of any size.


Tools in Detail:

Tableau:  

Tableau is one of the leading vendors in today’s Business Intelligence market. Some of its most notable features are:
  •   It provides a variety of data visualization tools with drag-and-drop options.
  •   Business users can handle it with almost no technical assistance.
  •   Support for predictive and trend analysis.
  •   Supports importing data from different data sources.
  •   Rapid-fire intelligence supports ad-hoc analysis, delivering results faster than any other product in the current market.
  •   Built-in data engine to support offline data analysis.
  •   Integration with R and maps.
  •   Supported on multiple platforms: mobile, desktop and tablets.
  •   Support for financial/budget forecasting.
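Trend analysis and forecasting appear in several of the tools compared here. As a hedged illustration of the simplest possible approach (not what Tableau or any other product necessarily uses internally), the sketch fits an ordinary least-squares line to hypothetical monthly sales and extrapolates one month ahead.

```python
# Hypothetical monthly sales figures.
months = [1, 2, 3, 4, 5, 6]
sales = [100.0, 110.0, 118.0, 131.0, 140.0, 151.0]


def fit_linear_trend(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


slope, intercept = fit_linear_trend(months, sales)
forecast_month_7 = slope * 7 + intercept  # extrapolate the trend one month ahead
print(f"trend: {slope:.2f} per month, month 7 forecast: {forecast_month_7:.1f}")
# trend: 10.23 per month, month 7 forecast: 160.8
```

A straight-line trend ignores seasonality and uncertainty; it is only meant to show what "trend analysis" computes at its most basic.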



The most commonly faced drawbacks of Tableau are its pricing, organization-specific customization and complex functionality that may seem difficult to use.






SAP:

SAP offers a wide range of products to organizations based on their size and needs. The most salient feature of the SAP Business Intelligence solution is that it can be completely customized to the needs of each client. SAP has been very successful in the BI industry to date despite the competition. Some of its features are:
  •   Supports analysis through predictive modeling, trend analysis, ad-hoc analysis and issue indicators.
  •   Comprehensive and customizable dashboard.
  •   Customized BI platform depending on client needs.
  •   Support on mobile platforms.
  •   Combines visual data from multiple sources to analyze trends and patterns in the data.
  •   Performance ratio is very high.
  •   Support for financial/budget forecasting. 

Despite these favorable features, the complexity of the tool requires users to be trained SAP professionals. The other drawback is pricing, which many organizations cannot afford.





IBM Cognos:

IBM Cognos is an online Business Intelligence platform hosted on the cloud that offers a variety of products to cater to the different needs of the industry. Its features include:
  •   Support for predictive analysis, trend analysis, ad-hoc analysis and analytic reports.
  •   Provides multiple interactive data visualization options online and offline.
  •   Supports integration with Microsoft Office applications.
  •   Scorecards are used for business strategy and performance management.
  •   Provides high scalability without compromising on its features and insights offered.
  •   Support for financial/budget forecasting.


The major drawbacks are low performance and a lack of ease of use.


MicroStrategy:

MicroStrategy focuses on ease of use, giving any user, technical or non-technical, the ability to use the tool. MicroStrategy can be hosted on the cloud. Some of its important features are:

  •   Provides a wide range of data visualization options.
  •   Supports map visualization for geographical data.
  •   Provides support for running on mobile platforms.
  •   Supports trend analysis, ad-hoc analysis and predictive analysis.
  •   Customizable dashboard options from which user can query and fetch results.
  •   Easy report generation using drag-and-drop options, with different rendering options.
  •   Supports ability to access data offline.
  •   Supports data import from different data sources.


Among MicroStrategy's major drawbacks, there is no integration with tools like R. It does not support financial/budget forecasting, and no predictive analysis tools are provided for easier analysis of data patterns.




Alteryx:

Alteryx is another leading Business Intelligence tool, with a focus on analytics, in particular big data analytics and customer analytics. Some important features are:
  •   Provides predictive capabilities that are usable by all types of users.
  •   Provides predictive tools that are customizable based on the user’s need.
  •   Provides seamless and fast data blending workflow which uses the drag-and-drop technique to get the right data in the right environment.
  •   Support for financial/budget forecasting.
  •   Supports customizable dashboards.
  •   Supports ad-hoc analysis from the dashboard itself.
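Data blending, as described in the feature list above, amounts to joining records from different sources on a shared key. The sketch below is a plain-Python inner join over two hypothetical sources; it does not use or imitate Alteryx's workflow API.

```python
# Hypothetical sources: a CRM export and a sales system that share a customer id.
crm = [
    {"customer_id": 1, "name": "Acme", "segment": "Enterprise"},
    {"customer_id": 2, "name": "Globex", "segment": "SMB"},
]
sales = [
    {"customer_id": 1, "amount": 500.0},
    {"customer_id": 1, "amount": 250.0},
    {"customer_id": 2, "amount": 120.0},
    {"customer_id": 3, "amount": 90.0},  # no matching CRM record, so it is dropped
]


def blend(left, right, key):
    """Inner-join two lists of records on a shared key."""
    index = {row[key]: row for row in left}
    return [{**index[r[key]], **r} for r in right if r[key] in index]


blended = blend(crm, sales, "customer_id")
revenue_by_segment = {}
for row in blended:
    segment = row["segment"]
    revenue_by_segment[segment] = revenue_by_segment.get(segment, 0.0) + row["amount"]
print(revenue_by_segment)  # {'Enterprise': 750.0, 'SMB': 120.0}
```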

On the negative side, Alteryx is harder to use, its performance does not meet market expectations and its reporting visualizations offer few options.

Comparison Chart:


Product/Criteria | Weight (max score) | Tableau | SAP | IBM | MicroStrategy | Alteryx
Ease of use | 15 | 12 | 6 | 9 | 11 | 7
Cost | 5 | 3 | 3 | 4 | 5 | 5
Performance | 15 | 14 | 13 | 8 | 11 | 5
Ad-hoc Analysis | 15 | 10 | 15 | 15 | 10 | 15
Reporting Tools | 20 | 14 | 18 | 18 | 11 | 9
Data Visualizations | 20 | 20 | 20 | 16 | 16 | 15
Scalability | 10 | 8 | 10 | 10 | 7 | 7
Total | 100 | 81 | 85 | 80 | 71 | 63
Rank | - | 2 | 1 | 3 | 4 | 5
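The comparison chart is a weighted scoring model: each criterion's weight is its maximum number of points, and a tool's total is the sum of its points. The sketch below re-checks that arithmetic using the numbers from the chart; the variable names are illustrative.

```python
# Weights (maximum points per criterion) and per-tool scores, taken from the comparison chart.
weights = {
    "Ease of use": 15, "Cost": 5, "Performance": 15, "Ad-hoc Analysis": 15,
    "Reporting Tools": 20, "Data Visualizations": 20, "Scalability": 10,
}
# Scores are listed in the same order as the criteria above.
scores = {
    "Tableau":       [12, 3, 14, 10, 14, 20, 8],
    "SAP":           [6, 3, 13, 15, 18, 20, 10],
    "IBM":           [9, 4, 8, 15, 18, 16, 10],
    "MicroStrategy": [11, 5, 11, 10, 11, 16, 7],
    "Alteryx":       [7, 5, 5, 15, 9, 15, 7],
}

caps = list(weights.values())
assert sum(caps) == 100  # the criteria weights add up to 100 points

totals = {}
for tool, points in scores.items():
    # No criterion score may exceed that criterion's weight.
    assert all(p <= cap for p, cap in zip(points, caps)), tool
    totals[tool] = sum(points)

ranking = sorted(totals, key=totals.get, reverse=True)
for rank, tool in enumerate(ranking, start=1):
    print(rank, tool, totals[tool])
```

Running it prints SAP (85), Tableau (81), IBM (80), MicroStrategy (71) and Alteryx (63), matching the totals and ranks in the chart.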