Thursday, February 19, 2015

Structured Data Vs Unstructured Data


Information retrieval is the process of extracting useful information from a data source to gain insights that help resolve a business problem. It involves mining through whatever data sources are available, and that data can be of two different types.

Structured Data refers to information whose data model has an organized structure, which makes it straightforward to search using traditional algorithms. Structured data mostly resides in a relational database. Only about 10%-20% of available data is in this form.
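To make "organized structure" concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `customers` table, its columns and its rows are hypothetical. The point is that every record shares the same typed columns, so a plain SQL query is enough to search it.

```python
import sqlite3

# Hypothetical "customers" table: every record has the same typed columns,
# which is what makes this data "structured".
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT, balance REAL)"
)
conn.executemany(
    "INSERT INTO customers (id, name, city, balance) VALUES (?, ?, ?, ?)",
    [
        (1, "Alice", "Boston", 1200.50),
        (2, "Bob", "Chicago", 310.00),
        (3, "Carol", "Boston", 875.25),
    ],
)

# Because the schema is known in advance, a simple query finds what we need.
rows = conn.execute(
    "SELECT name, balance FROM customers WHERE city = ? ORDER BY balance DESC",
    ("Boston",),
).fetchall()
print(rows)  # [('Alice', 1200.5), ('Carol', 875.25)]
```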



Unstructured Data refers to all other data, which is not held in an organized data model or database. Unstructured data usually contains garbage in addition to the useful information, and the challenge is processing it to separate the useful information from the garbage. Almost 80%-90% of available data is unstructured. Some examples of unstructured data include social media content, Word documents, and anything written on paper by hand.
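Separating useful information from garbage can be illustrated with a minimal sketch. It assumes a hypothetical customer complaint posted on social media and uses simple regular expressions from Python's `re` module to pull out the few fields a business would care about. Regular expressions are only one basic technique and are chosen here for illustration. More robust approaches, such as natural-language processing, also exist.

```python
import re

# A hypothetical social-media post: free text with no schema, mixing
# useful facts with "garbage" (filler words, punctuation, hashtags).
post = (
    "OMG!!! :( my order #A1234 still hasn't arrived... "
    "ordered on 2015-02-10, pls email me at jane.doe@example.com #fail #angry"
)

# Each pattern targets one useful fact; everything else is ignored.
order_ids = re.findall(r"#([A-Z]\d+)", post)           # order ids like #A1234
dates = re.findall(r"\b\d{4}-\d{2}-\d{2}\b", post)     # ISO-style dates
emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", post)  # e-mail addresses

# The result is a small structured record extracted from unstructured text.
record = {
    "order_id": order_ids[0] if order_ids else None,
    "order_date": dates[0] if dates else None,
    "email": emails[0] if emails else None,
}
print(record)
```

Note that the lowercase hashtags `#fail` and `#angry` are not mistaken for order ids, because the hypothetical id pattern requires an uppercase letter followed by digits.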



Growth of Unstructured Data Vs Structured Data: The volume of unstructured data has grown exponentially faster than that of structured data. Two major factors contribute to this uncontrolled growth: users get a better experience from rich content such as pictures, videos, music, and X-rays, and that rich content brings much larger storage demands than plain text.



To manage the rapid growth of unstructured data generated within an enterprise and to extract information from it, organizations have adopted two main methodologies: Big Data tools and Business Intelligence (BI) tools. The more conventional of the two is BI.
A data warehouse is the central repository of integrated data from disparate operational systems. It gives structure to raw data by organizing it through dimensional modeling or into OLAP cubes. BI reporting tools then consume the data produced by these modeling techniques.
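Dimensional modeling can be sketched in a few lines of plain Python, assuming a hypothetical star schema: one sales fact table holding foreign keys and a measure (`amount`), plus product and store dimension tables holding descriptive attributes. The `rollup` function is a hand-rolled, in-memory stand-in for what an OLAP engine does when it aggregates a cube. It is not the API of any particular BI tool.

```python
from collections import defaultdict

# Hypothetical dimension tables: descriptive attributes keyed by id.
dim_product = {
    1: {"name": "Laptop", "category": "Electronics"},
    2: {"name": "Phone", "category": "Electronics"},
    3: {"name": "Desk", "category": "Furniture"},
}
dim_store = {
    10: {"city": "Boston", "region": "East"},
    20: {"city": "Denver", "region": "West"},
}

# Hypothetical fact table: one row per sale, foreign keys plus a measure.
fact_sales = [
    {"product_id": 1, "store_id": 10, "amount": 1200.0},
    {"product_id": 2, "store_id": 10, "amount": 800.0},
    {"product_id": 3, "store_id": 20, "amount": 300.0},
    {"product_id": 1, "store_id": 20, "amount": 1100.0},
]

# Maps a dimension name to (dimension table, foreign-key column in the fact table).
DIMENSIONS = {
    "product": (dim_product, "product_id"),
    "store": (dim_store, "store_id"),
}

def rollup(levels):
    """Sum 'amount' grouped by the given (dimension, attribute) pairs."""
    totals = defaultdict(float)
    for sale in fact_sales:
        key = []
        for dim_name, attribute in levels:
            table, foreign_key = DIMENSIONS[dim_name]
            key.append(table[sale[foreign_key]][attribute])
        totals[tuple(key)] += sale["amount"]
    return dict(totals)

# Roll up to a coarse level, then drill down to a finer one.
by_region = rollup([("store", "region")])
by_region_category = rollup([("store", "region"), ("product", "category")])
print(by_region)  # {('East',): 2000.0, ('West',): 1400.0}
print(by_region_category)
```

Keeping descriptive attributes in the dimension tables is what lets the same fact rows be summarized at different levels of detail without changing the facts themselves.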


Recognizing the growth in unstructured data, and the importance of tapping useful information from such raw data to make better business decisions, many organizations have adopted data warehousing.




Advantages of data warehousing:
  • Potentially high return on investment for organizations.
  • Centralized, structured and standardized data for easy interpretation and understanding.
  • A competitive advantage.
  • Faster, better-informed decision making by management, because the right information is available at the right time.
  • Better enterprise intelligence to enhance customer service.
  • Improved reporting capabilities.


Limitations of data warehousing:
  • Cost is a major disadvantage: a data warehouse may consume a lot of IT man-hours and budget, so its cost/benefit trade-off must be weighed carefully.
  • Extra reporting work may be a problem because each data type in the data warehouse has to be prepared by IT professionals.
  • Time-consuming, because data must be extracted, cleaned, and then loaded.
  • Data owners lose control over their data, which raises data security and privacy concerns.
  • Flexibility can be a problem, as a data warehouse tends to hold static data with limited ability to drill down into specific questions.
  • A lot of time and money may be wasted on training staff and maintaining data warehouses, especially in a large enterprise.
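The extract, clean, and load steps mentioned in the limitations above can be sketched as a tiny pipeline. This sketch assumes a hypothetical CSV export from an operational system and a hypothetical `fact_orders` table, and it uses only Python's standard `csv`, `io` and `sqlite3` modules. Real warehouses typically rely on dedicated ETL tooling. The sketch only shows that every cleaning rule has to be written, checked and maintained by hand, which is part of why the process is time-consuming.

```python
import csv
import io
import sqlite3

# Hypothetical raw export: inconsistent casing, stray whitespace,
# a row with a missing amount and a duplicate order.
RAW_CSV = """order_id,customer,amount
1001, alice ,250.00
1002,BOB,
1003,Carol,99.90
1001, alice ,250.00
"""

def extract(text):
    """Extract: read rows from the source (here, an in-memory CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows):
    """Clean: drop incomplete and duplicate rows, normalize names and types."""
    seen = set()
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue  # skip rows missing the measure
        order_id = int(row["order_id"])
        if order_id in seen:
            continue  # skip duplicate orders
        seen.add(order_id)
        cleaned.append((order_id, row["customer"].strip().title(), float(row["amount"])))
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")
load(clean(extract(RAW_CSV)), warehouse)
loaded = warehouse.execute("SELECT * FROM fact_orders ORDER BY order_id").fetchall()
print(loaded)  # [(1001, 'Alice', 250.0), (1003, 'Carol', 99.9)]
```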


Future of the Data Warehouse:


Hadoop and the data warehouse will complement each other and grow together as businesses look to reap the benefits of big data. A new generation of data warehousing will emerge that enhances analytics and reporting and integrates with the latest technology platforms for processing unstructured data. Future data warehouses will be able to provide a 360-degree view of an organization’s operations from a much broader perspective.

In addition, data warehousing in the cloud will become the trend, and organizations will need to prepare for the transition. Access from anywhere, on any device, will also be expected: all data warehousing activities should be supported through the browser, so that people can work from tablets and mobile phones without installing specialized applications. One last expected capability is transforming data from across the web into a data mesh and making connections on the fly as needed. This will enable the data warehouse to handle any type of data that may emerge in the future.



