Solution
Ahmedali answered on
Apr 09 2020
Big Data & its Issues
Academic Research Paper
4/14/2018
Table of Contents
Introduction
Definition
Description
Application
Background
Data Challenges
Processing Challenges
Management Challenges
Big Data Platform Technology
Big Data Issues
Scalability Issues
Quality Issues
Security Issues
Privacy Issues
Solutions & Countermeasures
Conclusion
References
Abstract
Big Data is a technology that allows business organizations and firms to manage huge clusters of data sets. It is characterised by the 5Vs: Volume, Variety, Velocity, Variability, and Value. Big Data is being utilized in numerous sectors, such as healthcare organizations, education institutes, real-estate firms, financial organizations, e-commerce, marketing, sales and advertising, customer relations, and many others. Big data analytics is also utilized in maintaining customer interactions and relationships. There are, however, certain issues associated with Big Data, such as data challenges, management challenges, processing challenges, scalability issues, quality issues, and security and privacy issues.
The report introduces Big Data through its definition, description, and application. The background covers data challenges, management challenges, and processing challenges. The issues associated with Big Data are then listed and explained in four sections, viz. scalability issues, quality issues, security issues, and privacy issues. The solutions and countermeasures to these issues are also covered, followed by the conclusion.
Introduction
Definition
Big Data is a buzzword in the current era of technology and refers to huge clusters of information and data sets that may comprise structured, unstructured, or semi-structured data. These data sets may come from many different sources. Big Data tools and technologies are being utilized by business organizations and firms for handling and managing their data sets (Oussous, Benjelloun, Ait Lahcen & Belfkih, 2017).
Description
Big Data is characterised by five essential properties that are also referred to as the 5Vs of Big Data.
· Volume: Business organizations deal with huge clusters of data and information sets. The term "big" in Big Data refers to the huge volumes of these data sets.
· Velocity: The need for data sets in business organizations is increasing rapidly, and the speed at which this data is generated is also extremely fast (De Mauro, Greco & Grimaldi, 2016).
· Variety: The data sets involved in Big Data are diverse. The diversity lies in their structure, as the sets may be structured, unstructured, or semi-structured in nature. They may also belong to different data types and formats.
· Variability: The data sets in Big Data can be interpreted and utilized in varied forms and for varied purposes.
· Value: The process of data collection gathers data from numerous sources. However, it is necessary to ensure that these sets are accurate and relevant.
Application
Big Data tools and technologies are being applied in almost every business sector in the current times. Some of these application areas include healthcare organizations, education institutes, real-estate firms, financial organizations, e-commerce, marketing, sales and advertising, customer relations, and many others. Big data analytics is also utilized in maintaining customer interactions and relationships.
Background
Data Challenges
The amount of data and information that is being generated and utilized by business organizations is doubling every two years. Most of these data sets are unstructured, and this brings up the problem of data storage and analysis. The purpose of the business organizations is not simply to generate and store the data sets. Rather, it is to generate insights from these data sets in a timely manner. With the variety and volumes of the data, this task is becoming a major challenge. The increase in the number of data sources and the inclusion of a variety of data sets has led to the emergence of data security and privacy challenges (Sivarajah, Kamal, Irani & Weerakkody, 2017). Security and privacy protocols are not easy to apply, as no standard protocols can be used. Scalability and data quality issues are also prime data challenges associated with Big Data.
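To make the doubling claim concrete, the following is a minimal sketch of the arithmetic, assuming a constant doubling period of two years. The function name `projected_volume` and the starting volume are hypothetical and chosen only for illustration; they do not come from any cited study.

```python
def projected_volume(initial_volume: float, years: float,
                     doubling_period_years: float = 2.0) -> float:
    """Project data volume under steady exponential growth.

    Assumes the volume doubles once every `doubling_period_years`,
    i.e. V(t) = V0 * 2 ** (t / doubling_period).
    """
    return initial_volume * 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    # Hypothetical starting point: 1 unit of data (for example, 1 petabyte).
    for year in (0, 2, 4, 10):
        print(year, projected_volume(1.0, year))
```

Under this assumption, a data estate grows 32-fold in ten years, which illustrates why storage and analysis capacity planned for today's volume is quickly outgrown.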
Processing Challenges
The processing and analysis of the majority of Big Data sets must be done in real time. As a result, the bidding servers were moved to an auto-scaling infrastructure. The main processing issue in such an infrastructure is figuring out the instant at which all the files from the servers have been shipped and are ready for processing. There are also dependencies that make the processing more challenging. As a result, there are delays in processing the data sets and in making them available for analytics and reporting activities. Manual intervention may be needed in the event of hardware issues or when the configured timeout is exceeded, which may lead to further errors and mistakes. There are also cluster issues in Big Data technologies and tools, such as Hadoop (Ji et al., 2012). If the cluster fails, a manual re-run of the jobs is required to fix the issue. This may be time-consuming and may further delay the processing of the information and data sets.
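The readiness problem described above (knowing when every server has shipped its files, subject to a configured timeout) can be sketched as a simple polling loop. This is an illustrative sketch only, not the mechanism used by any particular platform. The function `wait_for_all_shipped` and the `list_shipped` callback are hypothetical names, and the sketch assumes the set of expected servers is known in advance, which an auto-scaling infrastructure does not guarantee.

```python
import time
from typing import Callable, Iterable, Set


def wait_for_all_shipped(
    expected_servers: Iterable[str],
    list_shipped: Callable[[], Set[str]],
    timeout_s: float,
    poll_interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Set[str]:
    """Poll until every expected server has shipped, or the timeout passes.

    `list_shipped` is a caller-supplied (hypothetical) function that returns
    the names of servers whose files have arrived so far.
    Returns the set of servers still missing; an empty set means the batch
    is complete and ready for processing.
    """
    expected = set(expected_servers)
    deadline = clock() + timeout_s
    while True:
        missing = expected - list_shipped()
        # Stop when everything has arrived or the configured timeout is hit.
        if not missing or clock() >= deadline:
            return missing
        sleep(poll_interval_s)
```

A non-empty result corresponds to the case in the text where the timeout is exceeded and an operator, or an automated retry policy, has to decide whether to process a partial batch or wait longer. The `clock` and `sleep` parameters exist only so the loop can be exercised without real waiting.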
Management Challenges
Data governance may come up as a major challenge in the management of Big Data sets. Solving these challenges may be complex for the management and may require many changes to policies and technologies. The management may need to appoint data administrators and managers to oversee data governance, and it may also become necessary to invest in automated data governance management tools, which may not be cost-effective from the organization's point of view. There may also be legal issues and obligations that arise from non-compliance with the applicable legal policies and norms. The management would be required to handle all such issues to avoid the occurrence of any major legal risk.
Big Data Platform Technology
Big Data makes use of a number of technologies and tools. Hadoop, together with its Hadoop Distributed File System (HDFS), is the primary platform and framework used for data analysis and transformation. It is based on the MapReduce paradigm, and the HDFS interface is patterned after the UNIX file system. A prime characteristic of Hadoop is that data partitioning and computation may be carried out across thousands of hosts. A Hadoop cluster scales computation, storage, and I/O bandwidth (Dhyani & Barthwal, 2014). The metadata and the application data are stored separately in the file system. The NameNode is the dedicated server on which HDFS stores the metadata. The dedicated...