Big Data has emerged in the past few years as a new paradigm providing abundant data and opportunities to improve and/or enable research and decision-support applications with unprecedented value for digital earth applications, including business, sciences and engineering. At the same time, Big Data presents challenges for digital earth to store, transport, process, mine and serve the data. Cloud computing provides fundamental support to address these challenges with shared computing resources, including computing, storage, networking and analytical software; the application of these resources has fostered impressive Big Data advancements. This paper surveys the two frontiers – Big Data and cloud computing – and reviews the advantages and consequences of utilizing cloud computing to tackle Big Data in the digital earth and relevant science domains. From the aspects of a general introduction, sources, challenges, technology status and research opportunities, the following observations are offered: (i) cloud computing and Big Data enable science discoveries and application developments; (ii) cloud computing provides major solutions for Big Data; (iii) Big Data, spatiotemporal thinking and various application domains drive the advancement of cloud computing and relevant technologies with new requirements; (iv) intrinsic spatiotemporal principles of Big Data and geospatial sciences provide the source for finding technical and theoretical solutions to optimize cloud computing and the processing of Big Data; (v) open availability of Big Data and processing capability pose social challenges of geospatial significance; and (vi) a weave of innovations is transforming Big Data into geospatial research, engineering and business values. This review introduces future innovations and a research agenda for cloud computing supporting the transformation of the volume, velocity, variety and veracity of Big Data into values for local to global digital earth science and applications.
The variety and veracity of social media and other streamed data pose new challenges to contemporary data processing and storage frameworks and architectures. For Big Data management, many non-traditional methodologies such as NoSQL and scalable SQL have been implemented (Nambiar, Chitor, and Joshi 2014). More often than not, NoSQL databases such as MongoDB and Hadoop Hive are used to store and manage social media data as document entries instead of relational tables (Padmanabhan et al. 2013; Huang and Xu 2014).
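As a minimal illustration of this document-oriented storage pattern (the connection string, collection and field names below are assumptions for the sketch, not details taken from the cited studies), a social media post can be stored in MongoDB as a schemaless document and retrieved through a geospatial index:

```python
from pymongo import MongoClient, GEOSPHERE

# Hypothetical sketch: store a social media post as a schemaless document
# and query it by location. Connection string, database, collection and
# field names are illustrative assumptions.
client = MongoClient("mongodb://localhost:27017")
posts = client["social_media"]["posts"]
posts.create_index([("geo", GEOSPHERE)])  # 2dsphere index for spatial queries

posts.insert_one({
    "user": "user_123",
    "text": "Flooding reported downtown",
    "created_at": "2015-06-01T14:32:00Z",
    "hashtags": ["flood", "disaster"],
    "geo": {"type": "Point", "coordinates": [-77.04, 38.91]},  # [lon, lat]
})

# Retrieve posts within roughly 5 km of a point of interest.
nearby = posts.find({
    "geo": {"$near": {"$geometry": {"type": "Point",
                                    "coordinates": [-77.03, 38.90]},
                      "$maxDistance": 5000}}
})
for doc in nearby:
    print(doc["user"], doc["text"])
```

Because each post is a self-contained document, new attributes can be added without schema migrations, which suits the variety of social media data.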
Meanwhile, to address big streaming data processing challenges, scalable distributed computing environments based on cloud computing are leveraged (Gao et al. 2014; Cao et al. 2015; Huang et al. 2015). For example, Zelenkauskaite and Simões (2014) implemented an Android-based mobile application and designed a cloud architecture to perform computationally intensive operations, including searching, data mining and large-scale data processing. Huang et al. (2015) presented a CyberGIS framework that integrates multiple data sources (e.g. social media, socioeconomic data) to track disaster events, produce maps, and perform spatial and statistical analysis for disaster management. The proposed framework supports spatial Big Data analytics across multiple sources. Cao et al. (2015) also presented a scalable computational framework using an Apache Hadoop cluster to process massive location-based social media data for efficient and systematic spatiotemporal data analysis. An interactive flow mapping interface supporting real-time and interactive visual exploration of movement dynamics was developed and used to demonstrate the advantages and performance of this framework.
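The general pattern behind such cluster-based spatiotemporal summarization can be sketched with PySpark; this is not Cao et al.'s implementation, and the input path, column names and bin sizes are assumptions:

```python
from pyspark.sql import SparkSession, functions as F

# Hypothetical sketch: bin geotagged posts into 0.1-degree spatial cells and
# hourly time slices on a cluster. The input path, column names and bin sizes
# are assumptions, not details of the cited framework.
spark = SparkSession.builder.appName("spatiotemporal-binning").getOrCreate()

posts = spark.read.json("hdfs:///data/geotagged_posts/*.json")

density = (
    posts
    .withColumn("ts", F.to_timestamp("created_at"))
    .withColumn("cell_lat", F.floor(F.col("lat") / 0.1) * 0.1)
    .withColumn("cell_lon", F.floor(F.col("lon") / 0.1) * 0.1)
    .withColumn("hour", F.date_trunc("hour", F.col("ts")))
    .groupBy("cell_lat", "cell_lon", "hour")
    .count()  # posts per space-time bin, computed in parallel across the cluster
)

density.write.mode("overwrite").parquet("hdfs:///output/post_density")
```

The resulting space-time counts are the kind of aggregate that flow mapping and other interactive visual exploration tools can then render.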
Additionally, multimedia streaming data (e.g. social media, remote sensing) are difficult to analyze and process in real time because of their rapid arrival rates and voluminous data fields. Zhang et al. (2015c) constructed a Markov chain model to predict the varying trend of big streaming data.
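A toy sketch of the underlying idea, not Zhang et al.'s actual model, is to discretize observed arrival rates into load states, estimate a first-order transition matrix from history, and predict the most likely next state so that cloud resources can be provisioned ahead of demand; the rates, thresholds and number of states below are invented for illustration:

```python
import numpy as np

# Toy illustration (not Zhang et al. 2015c's actual model): discretize observed
# arrival rates into load states, fit a first-order Markov transition matrix by
# counting, and predict the most likely next state.

def fit_transition_matrix(states, n_states):
    """P[i, j] = Pr(next state = j | current state = i), with Laplace smoothing."""
    counts = np.ones((n_states, n_states))
    for cur, nxt in zip(states[:-1], states[1:]):
        counts[cur, nxt] += 1
    return counts / counts.sum(axis=1, keepdims=True)

rates = np.array([120, 180, 450, 900, 850, 400, 150, 500, 950, 1000])  # msgs/s
thresholds = np.array([300, 700])        # boundaries for low / medium / high load
states = np.digitize(rates, thresholds)  # map each rate to state 0, 1 or 2

P = fit_transition_matrix(states, n_states=3)
current = states[-1]
predicted = int(np.argmax(P[current]))
print(f"current load state = {current}, predicted next state = {predicted}")
# A provisioning policy could map the predicted state to a number of instances.
```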
5.6. Quality of Service

Quality of Service (QoS) describes the overall performance of a service and is particularly important for Big Data applications and cloud computing when scheduling applications on a distributed cloud (Chen et al. 2013; Sandhu and Sood 2015b). If data services and cloud data centers are geographically distributed, it is essential to monitor QoS globally for Big Data implementation and cloud computing. For example, Xia et al. (2015a) used thousands of globally distributed volunteers to monitor OGC Web Map Services (WMS) and Web Coverage Services (WCS).
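A minimal client-side probe of the kind such monitoring relies on, recording availability and response time for a WMS GetCapabilities request, might look as follows; the endpoint URL and timeout are placeholders, and this is not the monitoring system described by Xia et al.:

```python
import time
import requests

# Hypothetical QoS probe for an OGC WMS endpoint; the URL is a placeholder and
# the metrics (availability, response time) are the ones typically aggregated
# across distributed monitoring sites.
WMS_URL = "https://example.org/wms"
PARAMS = {"SERVICE": "WMS", "REQUEST": "GetCapabilities", "VERSION": "1.3.0"}

def probe(url, params, timeout=10):
    start = time.monotonic()
    try:
        resp = requests.get(url, params=params, timeout=timeout)
        return {"available": resp.status_code == 200,
                "response_time_s": time.monotonic() - start}
    except requests.RequestException:
        return {"available": False, "response_time_s": timeout}

if __name__ == "__main__":
    print(probe(WMS_URL, PARAMS))
```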
Sandhu and Sood (2015b) proposed a global architecture for QoS-based scheduling of Big Data applications distributed across different cloud data centers. Kourtesis, Álvarez-Rodríguez, and Paraskakis (2014) outlined a semantic-based framework for QoS management that leverages semantic technologies and distributed, data-streamed processing techniques. However, more effort should be devoted to handling multiple QoS requirements from different users during resource and task scheduling within a single cloud environment or across multiple clouds.
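As a hypothetical sketch of the scheduling decision itself, rather than any of the cited architectures, candidate data centers can be ranked by a weighted QoS score; the metrics, weights and normalization below are assumptions:

```python
from dataclasses import dataclass

# Hypothetical sketch of QoS-based placement: rank candidate cloud data centers
# by a weighted score over latency, cost and availability. Metrics, weights and
# the normalization are illustrative assumptions.

@dataclass
class DataCenter:
    name: str
    latency_ms: float     # lower is better
    cost_per_hour: float  # lower is better
    availability: float   # fraction of successful probes, higher is better

def qos_score(dc, w_latency=0.5, w_cost=0.3, w_avail=0.2):
    # Invert "lower is better" metrics so that larger scores are always better.
    return (w_latency * (1.0 / (1.0 + dc.latency_ms))
            + w_cost * (1.0 / (1.0 + dc.cost_per_hour))
            + w_avail * dc.availability)

candidates = [
    DataCenter("us-east", latency_ms=40, cost_per_hour=1.2, availability=0.999),
    DataCenter("eu-west", latency_ms=120, cost_per_hour=0.9, availability=0.995),
]
best = max(candidates, key=qos_score)
print(f"Schedule the Big Data task on {best.name}")
```

Different users would supply different weights, which is precisely the multi-requirement scheduling problem identified above as open.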
5.7. Cloud computing benchmark and adoption