Thursday, February 19, 2015

Structured Data Vs Unstructured Data


Information retrieval is the process of extracting useful information from a data source to gain insights that help resolve a business problem. This process involves mining any kind of data source available. The data can be of two different types.

Structured Data refers to information whose data model has an organized structure, which provides a straightforward way to perform searches using traditional algorithms. Structured data mostly resides in a relational database. Only about 10%-20% of the available data is in this form.
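As a rough illustration of why searching structured data is straightforward, here is a minimal sketch using Python's built-in sqlite3 module. The customers table, its columns and its rows are made up for this example and are not taken from any real system.

```python
import sqlite3

# Hypothetical table: the schema and rows below are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, name TEXT, city TEXT, total_spend REAL)"
)
conn.executemany(
    "INSERT INTO customers (id, name, city, total_spend) VALUES (?, ?, ?, ?)",
    [
        (1, "Alice", "Austin", 120.50),
        (2, "Bob", "Boston", 75.00),
        (3, "Carol", "Austin", 310.25),
    ],
)


def customers_in_city(connection, city):
    """Return customer names for one city, sorted alphabetically."""
    rows = connection.execute(
        "SELECT name FROM customers WHERE city = ? ORDER BY name", (city,)
    ).fetchall()
    return [name for (name,) in rows]


austin_customers = customers_in_city(conn, "Austin")
print(austin_customers)
```

Because every row has the same named fields, the search is a single declarative query rather than custom parsing logic.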



Unstructured Data refers to all other available data that is not held in an organized data model or database. Unstructured data usually contains garbage data in addition to the useful information. The challenge with this type of data is processing it to separate the garbage from the useful information. Almost 80%-90% of the available data is unstructured. Some examples of unstructured data include social media content, Word documents, anything recorded on paper by a person, etc.
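To give a feel for the "garbage vs useful information" problem, the sketch below pulls two fields out of a free-text note with regular expressions. The note, the field names and the patterns are all assumptions chosen for illustration; real unstructured data usually needs far more robust processing.

```python
import re

# Hypothetical free-text note; most of it is noise for reporting purposes.
note = "hi!!! my order #48213 arrived broken :( pls call me at 555-0142 thx lol"

# Assumed formats: an order id is '#' followed by digits, a phone is ddd-dddd.
ORDER_PATTERN = re.compile(r"#(\d+)")
PHONE_PATTERN = re.compile(r"\b\d{3}-\d{4}\b")


def extract_fields(text):
    """Keep only the order id and phone number; ignore everything else."""
    order = ORDER_PATTERN.search(text)
    phone = PHONE_PATTERN.search(text)
    return {
        "order_id": order.group(1) if order else None,
        "phone": phone.group(0) if phone else None,
    }


record = extract_fields(note)
print(record)
```

The output is a small structured record, which is exactly the kind of data the first category describes.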



Growth of Unstructured Data Vs Structured Data: The volume of unstructured data has grown exponentially, faster than that of structured data. Two major factors contribute to this uncontrollable growth: the better user experience offered by rich content like pictures, videos, music, X-rays, etc., and the storage issues that accompany such rich content.



To manage the wild growth of unstructured data generated within an enterprise and to extract information from it, organizations have adopted two main approaches: Big Data tools and Business Intelligence (BI) tools. The most conventional approach is BI.
A data warehouse is the central repository of integrated data from disparate operational systems. It gives structure to the raw data by organizing it into OLAP cubes or dimensional models. The data from these models is then used by the BI reporting tools.
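A heavily simplified sketch of the idea behind dimensional modeling: each fact (here, a sales amount) is stored alongside its dimensions (region and quarter), and a report is a roll-up along one dimension. The fact rows and dimension names are invented for illustration, and a real OLAP cube does much more than this.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, quarter, sales_amount).
facts = [
    ("East", "Q1", 100),
    ("East", "Q2", 150),
    ("West", "Q1", 80),
    ("West", "Q2", 120),
]

REGION, QUARTER, SALES = 0, 1, 2


def roll_up(rows, dimension_index):
    """Sum the sales measure for each value of the chosen dimension."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[dimension_index]] += row[SALES]
    return dict(totals)


sales_by_region = roll_up(facts, REGION)
sales_by_quarter = roll_up(facts, QUARTER)
print(sales_by_region)
print(sales_by_quarter)
```

A BI reporting tool would typically run this kind of aggregation against the warehouse rather than against the operational systems.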


The growth of unstructured data, and the importance of tapping useful information from such raw data to make better business decisions, have pushed many organizations to adopt data warehousing.




Advantages of data warehousing:
  • Potential high returns on investment for organizations.
  • Centralized, structured and standardized data for easy interpretation and understanding.
  • Provides a competitive advantage.
  • Faster and better decision making by management, because the right information is available at the right time.
  • Better enterprise intelligence to enhance customer service.
  • Provides improved reporting capabilities.


Limitations of data warehousing:
  • Cost versus benefit is a major concern, as a data warehouse may consume a lot of IT man-hours and budget.
  • Extra reporting work may be a problem because a data warehouse requires each data type to be generated by IT professionals.
  • Time consuming, as it requires data to be extracted, cleaned and then loaded.
  • Data owners lose control over their data, which raises data security and privacy concerns.
  • Data flexibility can be a problem, as a data warehouse tends to hold static data with minimal ability to drill down to specific solutions.
  • A lot of time and money may be spent on training and on maintaining the data warehouse, especially in a large enterprise.
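The extract, clean and load steps mentioned in the list above can be sketched roughly as follows. The CSV content, the column names and the cleaning rules are assumptions made up for this example; an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from an operational system (note the messy values).
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,
3,carol,7.25
"""


def extract(text):
    """Extract: parse the raw CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Clean: drop rows with no amount, normalize names, convert types."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # assumed rule: a row without an amount is unusable
        cleaned.append(
            (int(row["order_id"]), row["customer"].strip().title(), float(amount))
        )
    return cleaned


def load(connection, rows):
    """Load: write the cleaned rows into the warehouse table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER, customer TEXT, amount REAL)"
    )
    connection.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    connection.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, clean(extract(RAW_CSV)))
loaded = warehouse.execute(
    "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```

Even in this toy version, every new source or rule means more code to write and maintain, which is where much of the time cost comes from.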


Future of Data Warehouse:


Hadoop and the data warehouse will complement each other and grow together as the business need to reap the value of big data grows. A new generation of data warehousing will emerge to enhance analytics and reporting, in addition to providing integration with the latest technology platforms that support processing of unstructured data. Future data warehousing will be able to provide a 360-degree view of an organization’s operations with a much broader perspective. In addition, data warehousing in the cloud will become the trend, and organizations will need to prepare for the transition. Compatibility with any device, anywhere, will also be expected: all data warehousing activities should be supported through browser requests, so that organizations can work from tablets and mobile phones without installing a specialized application. One last capability that will be expected is the ability to transform the entire web data into a data mesh and make connections as needed on the fly. This will enable the data warehouse to handle any type of data that may emerge in the future.





Tuesday, February 3, 2015

Comparative Analysis on Business Intelligence Solutions

BI Tools used in the comparison: Tableau, SAP, IBM Cognos, Microstrategy and Alteryx 

Criteria for comparison:

  1. Interactive Data Visualizations: The capability to present data in different interactive visualizations, enabling deeper analysis and a better understanding of the data. Some BI tools provide a drag-and-drop feature to make developing these visualizations easier.
  2. Ad-hoc querying capability: The capability that allows users to query for specific information and answer business-related questions beyond the data shown in reports and dashboards.
  3. Cost: The price of purchasing or licensing the product has to be affordable for the size of the organization. Prices that are too high for smaller companies reduce the value of the product in the market.
  4. Ease of use: The tool should be easy to use. It should allow business users to work with minimal or no assistance from technical experts, offer drag-and-drop features for data blending, visualization and reporting, and have easily understandable components that make navigation easier.
  5. Scalability: The tool should support analysis on datasets of any size and be suitable for an organization of any size. A change in the organization's size should not restrict it or force it to move to another tool.
  6. Reporting capabilities: The tool should support different types of reports with different visual renderings.
  7. Performance: Importing data into the environment, navigating between the tools, generating reports and creating visualizations should all be fast for data sets of any size.


Tools in Detail:

Tableau:  

Tableau is one of the leading vendors in today’s Business Intelligence market. Some of the most notable features are:
  •   It provides a variety of data visualization tools with drag-and-drop options.
  •   Business users can handle it with almost no technical assistance.
  •   Supports predictive and trend analysis.
  •   Supports importing of data from different data sources.
  •   Rapid-fire intelligence supports ad-hoc analysis and aims to deliver results faster than other products in the current market.
  •   Built-in data engine to support offline data analysis.
  •   Integration with R and Maps.
  •   Supported on multiple platforms like mobile, desktop and tablets.
  •   Support for financial/budget forecasting.



The most commonly cited drawbacks of Tableau are its pricing, the organization-specific customization it requires, and complex functionality that may be difficult to use.






SAP:

SAP offers a wide range of products to organizations based on their size and needs. The most salient feature of the SAP Business Intelligence solution is that it can be customized completely to the needs of the client. SAP has been very successful in the BI industry to date despite the competition. Some of its features are:
  •   Supports analysis through predictive modeling, trend analysis, ad-hoc analysis and issue indicators.
  •   Comprehensive and customizable dashboard.
  •   Customized BI platform depending on client needs.
  •   Support on mobile platforms.
  •   Combines visual data from multiple sources to analyze trends and patterns in the data.
  •   Very high performance.
  •   Support for financial/budget forecasting. 

Despite these favorable features, the complexity of the tool requires users to be trained SAP professionals. The other drawback is pricing, which many organizations cannot afford.





IBM Cognos:

IBM Cognos is an online Business Intelligence platform hosted in the cloud that offers a variety of products to cater to the different needs of the industry. Features provided are:
  •   Support for predictive analysis, trend analysis, ad-hoc analysis and analytic reports.
  •   Provides multiple interactive data visualization options online and offline.
  •   Supports integration of Microsoft Office applications.
  •   Scorecards are used for business strategy and performance management.
  •   Provides high scalability without compromising on its features and insights offered.
  •   Support for financial/budget forecasting.


The major drawbacks are low performance and a lack of ease of use.


Microstrategy:

Microstrategy focuses on ease of use, giving any user, technical or non-technical, the ability to use the tool. Microstrategy can be hosted in the cloud. Some of the important features are:

  •   Provides a wide range of data visualization options.
  •   Supports map visualization for geographical data.
  •   Supports running on mobile platforms.
  •   Supports trend analysis, ad-hoc analysis and predictive analysis.
  •   Customizable dashboard options from which the user can query and fetch results.
  •   Easy report generation using drag-and-drop options along with different rendering options.
  •   Supports offline data access.
  •   Supports data import from different data sources.


The major drawbacks of Microstrategy are that it does not integrate with R or similar tools and does not support financial/budget forecasting. In addition, no predictive analysis tools are provided to make analysis of data patterns easier.




Alteryx:

Alteryx is another leading Business Intelligence tool, one that focuses on the analytics level, in particular big data analytics and customer analytics. Some important features are:
  •   Provides predictive capabilities that are usable by all types of users.
  •   Provides predictive tools that are customizable based on the user’s need.
  •   Provides a seamless and fast data blending workflow that uses drag-and-drop to get the right data into the right environment.
  •   Support for financial/budget forecasting.
  •   Supports customizable dashboards.
  •   Supports ad-hoc analysis from the dashboard itself.

On the negative side, Alteryx has low ease of use, its performance does not meet market expectations, and its reporting visualizations offer few options.

Comparison Chart:

Product/Criteria     Score/Importance  Tableau  SAP  IBM  Microstrategy  Alteryx
Ease of use                        15       12    6    9             11        7
Cost                                5        3    3    4              5        5
Performance                        15       14   13    8             11        5
Ad-hoc Analysis                    15       10   15   15             10       15
Reporting Tools                    20       14   18   18             11        9
Data Visualizations                20       20   20   16             16       15
Scalability                        10        8   10   10              7        7
Total:                            100       81   85   80             71       63
Rank                                         2    1    3              4        5
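
For anyone who wants to re-check the arithmetic or try different scores, here is a small sketch that recomputes the totals and ranks from the comparison chart. The numbers are copied from the chart; the function and variable names are my own, and the check that no score exceeds its importance value is an assumption about how the chart was meant to be read.

```python
# Importance (maximum score) per criterion, as listed in the comparison chart.
CRITERIA_WEIGHTS = {
    "Ease of use": 15, "Cost": 5, "Performance": 15, "Ad-hoc Analysis": 15,
    "Reporting Tools": 20, "Data Visualizations": 20, "Scalability": 10,
}

# Per-tool scores, in the same criterion order as CRITERIA_WEIGHTS.
RAW_SCORES = {
    "Tableau": [12, 3, 14, 10, 14, 20, 8],
    "SAP": [6, 3, 13, 15, 18, 20, 10],
    "IBM": [9, 4, 8, 15, 18, 16, 10],
    "Microstrategy": [11, 5, 11, 10, 11, 16, 7],
    "Alteryx": [7, 5, 5, 15, 9, 15, 7],
}
SCORES = {
    tool: dict(zip(CRITERIA_WEIGHTS, values)) for tool, values in RAW_SCORES.items()
}


def total_scores(scores):
    """Sum each tool's scores across all criteria."""
    return {tool: sum(per_criterion.values()) for tool, per_criterion in scores.items()}


def rank_tools(totals):
    """Return tool names ordered from highest to lowest total."""
    return sorted(totals, key=totals.get, reverse=True)


def within_weights(scores, weights):
    """Assumed sanity check: no score may exceed its criterion's importance."""
    return all(
        value <= weights[criterion]
        for per_criterion in scores.values()
        for criterion, value in per_criterion.items()
    )


totals = total_scores(SCORES)
ranking = rank_tools(totals)
print(totals)
print(ranking)
```

Changing a weight or a score and re-running shows how sensitive the final ranking is to each criterion.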