What we consider big data today will change with innovation and technological progress. However, a generally accepted definition is a collection of data sets so large and complex that they become difficult to process using traditional data processing applications (Wah, Cheng, & Wang, 2015). The share of stored data that is digital has increased exponentially, from 25% in 2000 to over 98% in 2015. The volume of digital data is expected to double every two years, growing from a current capacity of 130 exabytes to approximately 40,000 exabytes by 2020. To grasp how large that figure is, note that one exabyte equals 1 billion gigabytes. Because of its size, big data offers companies an untapped resource with vast potential; it can be considered the new oil that will fuel the future information economy (Wah et al., 2015).

Big data falls into four distinct categories: structured, unstructured, semi-structured, and mixed. Structured data conforms to an existing data model and can be analyzed directly, so material information is readily extractable; unfortunately, structured data represents only about 5% of all big data. The majority is unstructured data, such as text, audio, video, and images, which cannot be analyzed using traditional data processing methods. Semi-structured data is analyzable but lacks a formal data model. Finally, mixed data is a combination of the previous three data types.
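To make the four categories concrete, the minimal Python sketch below pairs each category with a small, hypothetical record (the field names and values are invented for illustration and do not come from the cited study); it also verifies the exabyte-to-gigabyte conversion stated above.

```python
import json

# Unit arithmetic from the text: 1 exabyte = 10**18 bytes, 1 gigabyte = 10**9 bytes,
# so one exabyte equals 1 billion gigabytes.
GIGABYTE = 10**9
EXABYTE = 10**18
assert EXABYTE // GIGABYTE == 1_000_000_000  # 1 billion GB per EB

# Structured: conforms to a fixed, predefined schema (e.g., a relational table row).
# These field names and values are hypothetical.
structured_record = {"customer_id": 1042, "purchase_total": 59.99, "region": "EU"}

# Unstructured: no predefined data model -- free text, audio, video, or images.
unstructured_record = "The delivery arrived late, but the support team was very helpful."

# Semi-structured: analyzable, but the schema travels with the data itself
# (keys or tags embedded in the record) rather than being fixed in advance.
semi_structured_record = json.loads(
    '{"customer_id": 1042, "channel": "email", "attachments": []}'
)

# Mixed: a single data set combining the three kinds above.
mixed_record = {
    "row": structured_record,
    "free_text": unstructured_record,
    "metadata": semi_structured_record,
}
```

The distinction matters in practice because structured records can be queried with conventional tools, while the unstructured and semi-structured portions require additional processing before material information can be extracted.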