Put simply, data is a collection of information. For example, your posts on Facebook, the videos you watch on YouTube and your searches for answers on Google are all data.
What is Big Data?
As the name suggests, big data means a very large amount of data. The term is often applied to data volumes above one terabyte (TB), but there is no exact threshold that defines big data. Big data is complex, which makes it difficult to collect, store, maintain, analyze and visualize. Companies process big data to extract useful information, which they then use for decision making and, ultimately, to make money. For example, companies use your data to show you things that match your interests.
Types of Big Data
There are 3 types of big data, namely:
- Structured Data
- Unstructured Data
- Semi-Structured Data
As the name suggests, structured data has a high level of organization: it follows a definite length and format, described by a data model or "schema" (a structural representation of what is in the database), and it can be retrieved easily. Structured data is stored in databases, in Excel spreadsheets, or in tabular form in CSV (comma-separated values) files, organized into rows and a fixed set of columns that clearly define its attributes. Because it is so well organized, structured data is easy for machines to interpret, and it is simple to store, format, process, query and analyze. It can be accessed and used easily by a computer program or by a person (user). In a related sense, web pages can also carry structured data as metadata that is hidden from the user but readable by search engines.
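To make the idea of a fixed set of columns concrete, here is a minimal Python sketch using the standard `csv` module. The column names and rows are hypothetical sample data; the point is only that every row shares the same columns, so any value can be looked up by its attribute name.

```python
import csv
import io

# Hypothetical sample CSV: every row has the same three columns (the schema).
CSV_TEXT = """id,name,department
1,Asha,Sales
2,Ben,Engineering
3,Chen,Engineering
"""

def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header columns."""
    reader = csv.DictReader(io.StringIO(text))
    return list(reader)

rows = load_rows(CSV_TEXT)

# Because the columns are fixed, a field can be looked up by name in every row.
engineers = [row["name"] for row in rows if row["department"] == "Engineering"]
print(engineers)  # ['Ben', 'Chen']
```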
Structured data stored in an RDBMS (Relational Database Management System) is commonly managed, queried and analyzed using SQL (Structured Query Language). Structured data is often estimated to make up only around 20 per cent of big data.
For example, consider an employee table: the "name" attribute (column) holds every employee's name, and the "email id" attribute holds every employee's email ID. Because structured data is presented in such a well-defined manner, little effort is needed to work with it.
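The employee-table example can be sketched with Python's built-in `sqlite3` module, using an in-memory SQLite database as a stand-in for a full RDBMS. The table name `employee`, the column `email_id` and the sample rows are hypothetical.

```python
import sqlite3

# An in-memory SQLite database stands in for an RDBMS in this sketch.
conn = sqlite3.connect(":memory:")

# The schema fixes the attributes (columns) that every row must have.
conn.execute("CREATE TABLE employee (name TEXT NOT NULL, email_id TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO employee (name, email_id) VALUES (?, ?)",
    [
        ("Asha", "asha@example.com"),
        ("Ben", "ben@example.com"),
    ],
)
conn.commit()

# Every value sits under a known attribute, so a query can target it directly.
ben_email = conn.execute(
    "SELECT email_id FROM employee WHERE name = ?", ("Ben",)
).fetchone()[0]
print(ben_email)  # ben@example.com

# Aggregate queries work the same way.
employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(employee_count)  # 2

conn.close()
```

Using `?` placeholders instead of building SQL strings by hand keeps the values separate from the query text.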
As the name suggests, unstructured data has no predefined structure or format. Because it lacks any specific form, unstructured data is difficult and time-consuming to process and analyze. Unstructured data is classified by its source into human-generated and machine-generated data. Human-generated data is found in abundance across the internet, for example emails, pictures uploaded to Facebook and videos we watch on YouTube. Machine-generated data includes satellite images, scientific data obtained from experiments, radar data collected by various technologies, and so on. Examples of unstructured data include text, audio, video, satellite images, and surveillance images and recordings.
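As a rough illustration of why unstructured data takes extra work, this Python sketch pulls email addresses out of a free-form message with a regular expression. The message and the pattern are assumptions made for illustration; the pattern is deliberately simple and is not a complete email validator.

```python
import re

# Hypothetical free-form text: there are no columns, so nothing marks where an email is.
MESSAGE = (
    "Hi team, please send the report to asha@example.com by Friday. "
    "Ben (ben@example.org) will review it before the meeting."
)

# A rough pattern for illustration; real email syntax is more permissive than this.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def extract_emails(text):
    """Return every substring of text that looks like an email address."""
    return EMAIL_PATTERN.findall(text)

print(extract_emails(MESSAGE))  # ['asha@example.com', 'ben@example.org']
```

In a structured employee table the same addresses would simply be read from an email column; here they have to be searched for and guessed at.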
Semi-structured data combines characteristics of both formats explained above, structured data and unstructured data. It is not stored in the traditional table format of a structured database, but it contains markers, such as tags or keys, that make it easier to process. For example, documents in NoSQL databases are considered semi-structured data: they contain keywords (keys) that can be used to process the document easily.
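Semi-structured data is often exchanged as JSON, a format that many NoSQL document databases use or closely resemble. This Python sketch parses a few hypothetical JSON documents with the standard `json` module. The documents do not all share the same fields, yet their keys still make it easy to pick out the values needed.

```python
import json

# Hypothetical documents, as a document (NoSQL) store might hold them.
# They share some keys, but not every document has the same fields.
DOCUMENTS = """
[
  {"name": "Asha", "email": "asha@example.com", "skills": ["SQL", "Python"]},
  {"name": "Ben", "phone": "555-0100"},
  {"name": "Chen", "email": "chen@example.com"}
]
"""

def emails_by_name(raw_json):
    """Map each document's name to its email, skipping documents without one."""
    docs = json.loads(raw_json)
    # The keys act as the markers that make the content easy to process.
    return {doc["name"]: doc["email"] for doc in docs if "email" in doc}

print(emails_by_name(DOCUMENTS))
# {'Asha': 'asha@example.com', 'Chen': 'chen@example.com'}
```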