
Overview
This open access book explores cutting-edge solutions and best practices for big data and data-driven AI applications in the data-driven economy. It provides the reader with a basis for understanding how technical issues can be overcome to offer real-world solutions to major industrial areas.
The book starts with an introductory chapter that provides an overview of the book by positioning the following chapters in terms of their contributions to technology frameworks, which are key elements of the Big Data Value Public-Private Partnership and the upcoming Partnership on AI, Data and Robotics. The remainder of the book is then arranged in two parts. The first part, “Technologies and Methods”, contains horizontal contributions of technologies and methods that enable data value chains to be applied in any sector. The second part, “Processes and Applications”, details experience reports and lessons from using big data and data-driven approaches in processes and applications. Its chapters are co-authored with industry experts and cover domains including health, law, finance, retail, manufacturing, mobility, and smart cities. Contributions emanate from the Big Data Value Public-Private Partnership and the Big Data Value Association, which have acted as the European data community’s nucleus to bring together businesses with leading researchers to harness the value of data to benefit society, business, science, and industry.
The book is of interest to two primary audiences: first, undergraduate and postgraduate students and researchers in various fields, including big data, data science, data engineering, and machine learning and AI. Second, practitioners and industry experts engaged in data-driven systems, software design and deployment projects who are interested in employing these advanced methods to address real-world problems.
Preface

Dr Edward Curry
Co-Principal Investigator, Insight Centre for Data Analytics; Funded Investigator, Lero, the Irish Software Research Centre, Ireland
Computer science was created by humankind to solve problems. Around 100 BC, early hand-powered computing devices such as the Antikythera mechanism were designed to calculate astronomical positions. In the 1800s, Charles Babbage proposed the Analytical Engine to solve general-purpose computational tasks. In the 1900s, the Bombe designed by Turing and Welchman was critical to code-breaking. Advances in computer science have been driven by the need for humanity to solve the most pressing challenges of the day. Today, computer science tackles significant societal challenges such as organising the world’s information, personalised medicine, the search for the Higgs boson, climate change, and weather forecasting.
This book aims to educate the reader on how recent advances in technologies, methods, and processes for big data and data-driven Artificial Intelligence (AI) can deliver value to address problems in real-world applications. The book explores cutting-edge solutions and best practices for big data and data-driven AI applications in the data-driven economy. It provides the reader with a basis for understanding how technical issues can be overcome to offer real-world solutions to major industrial areas, including health, energy, transport, finance, manufacturing, and public administration.
The book’s contributions emanate from the Big Data Value Public-Private Partnership (BDV PPP) and the Big Data Value Association, which have acted as the European data community’s nucleus to bring together businesses with leading researchers to harness the value of data to benefit society, business, science, and industry. The technological basis established in the BDV PPP will seamlessly enable the future Partnership on AI, Data and Robotics.
The book is of interest to two primary audiences: first, undergraduate and postgraduate students and researchers in various fields, including big data, data science, data engineering, and machine learning and AI. Second, practitioners and industry experts engaged in data-driven systems and software design and deployment projects who are interested in employing these advanced methods to address real-world problems.
This book is arranged in two parts. The first part contains “horizontal” contributions of technologies and methods which can be applied in any sector. The second part includes contributions of innovative processes and applications within specific “vertical” sectors. The first chapter provides an overview of the book by positioning the chapters in terms of their contributions to technology frameworks, including the big data value reference model and the AI, Data and Robotics framework, which are key elements of the Big Data Value Public-Private Partnership and the upcoming Partnership on AI, Data and Robotics.
Part I: Technologies and Methods details key technical contributions which enable data value chains. The second chapter investigates ways to support semantic data enrichment at scale. The trade-offs and challenges of serverless data analytics are examined in the third chapter. Benchmarking of big data and Artificial Intelligence (AI) pipelines is the objective of the fourth chapter, while the fifth chapter presents an elastic software architecture for extreme-scale big data analytics. The sixth chapter details privacy-preserving technologies for trusted data spaces. Leveraging data-driven infrastructure management to facilitate AIOps is the focus of the seventh chapter, and unified big data workflows over High-Performance Computing (HPC) and the cloud are tackled in the eighth chapter.
Part II: Processes and Applications details experience reports and lessons from using big data and data-driven approaches in processes and applications. The chapters are co-authored with industry experts and cover domains including health, law, finance, retail, manufacturing, mobility, and smart cities. The ninth chapter presents a toolkit for deep learning and computer vision over HPC and cloud architectures. Applying AI to manage acute and chronic clinical conditions is the focus of the tenth chapter, while the eleventh chapter explores 3D human big data exchange between the health and garment sectors. In the twelfth chapter, we see how legal knowledge graphs can be used for multilingual compliance services in labour law, contract management, and geothermal energy. The thirteenth chapter focuses on big data analytics in the banking sector, with guidelines and lessons learned from CaixaBank. The fourteenth chapter explores data-driven AI and predictive analytics for the maintenance of industrial machinery using digital twins. The fifteenth chapter investigates big data analytics in the manufacturing sector, and the sixteenth chapter looks at the next generation of data-driven factory operations and optimisation. Large-scale trials of data-driven service engineering are covered in the seventeenth chapter. The eighteenth chapter describes approaches for model-based engineering and semantic interoperability for digital twins across the product life cycle. In the nineteenth chapter, a data science pipeline for big linked Earth observation data is presented, and the twentieth chapter looks ahead towards cognitive ports of the future. Distributed big data analytics in a smart city is the focus of the twenty-first chapter, and the twenty-second chapter looks at system architectures and applications of big data in the maritime domain. The book closes with the twenty-third chapter, which explores knowledge modelling and incident analysis for special cargo.
Highlights
Book Highlights and Key Related Concepts
SUPPORTING SEMANTIC DATA ENRICHMENT AT SCALE
Data enrichment is a critical task in the data preparation process in which a dataset is extended with additional information from various sources to perform analyses…
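As a minimal illustrative sketch only (not drawn from the chapter itself), enrichment can be pictured as joining a core dataset with an external source on a shared key. The pandas library is assumed, and all table and column names below are hypothetical:

```python
import pandas as pd

# Core dataset: customer transactions (hypothetical example data).
transactions = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "amount": [120.0, 75.5, 300.0],
})

# External source used for enrichment: demographic attributes per customer.
demographics = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["ES", "DE", "IE"],
    "age_band": ["25-34", "35-44", "45-54"],
})

# Enrichment step: extend the core dataset with the additional attributes,
# keeping every original row even when no match is found (left join).
enriched = transactions.merge(demographics, on="customer_id", how="left")
print(enriched)
```

At scale, the hard part the chapter addresses is not the join itself but discovering, matching, and semantically aligning the external sources; the sketch above only shows the final extension step.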
TRADE-OFFS AND CHALLENGES OF SERVERLESS DATA ANALYTICS
Serverless computing has become very popular today since it largely simplifies cloud programming. Developers no longer need to worry about provisioning…
BIG DATA AND AI PIPELINE FRAMEWORK TECHNOLOGY ANALYSIS FROM A BENCHMARKING PERSPECTIVE
Big Data and AI Pipeline patterns provide a good foundation for the analysis and selection of technical architectures for Big Data and AI systems. Experiences from many projects in the Big Data PPP program have shown that a number of projects use similar architectural patterns with variations only in the choice of various…
AN ELASTIC SOFTWARE ARCHITECTURE FOR EXTREME-SCALE BIG DATA ANALYTICS
This chapter describes a software architecture for processing big data analytics across the complete compute continuum, from the edge to the cloud. The new generation of smart systems requires processing…
PRIVACY-PRESERVING TECHNOLOGIES FOR TRUSTED DATA SPACES
The quality of a machine learning model depends on the volume of data used during the training process. To prevent low accuracy models, one needs to generate more training data or add external data sources of the same kind. If the first…
LEVERAGING DATA-DRIVEN INFRASTRUCTURE MANAGEMENT TO FACILITATE AIOPS FOR BIG DATA APPLICATIONS AND OPERATIONS
As institutions increasingly shift to distributed and containerized application deployments on remote heterogeneous cloud/cluster infrastructures, the cost and difficulty of efficiently managing and maintaining data-intensive applications have risen. A new emerging solution to this issue is Data-Driven Infrastructure…
LEVERAGING HIGH-PERFORMANCE COMPUTING AND CLOUD COMPUTING WITH UNIFIED BIG-DATA WORKFLOWS: THE LEXIS PROJECT
Traditional usage models of supercomputing centres have been extended by High-Throughput Computing (HTC), High-Performance Data Analytics (HPDA) and Cloud Computing. The complexity of current compute platforms calls for solutions to simplify usage and conveniently orchestrate computing tasks. These also enable…
THE DEEPHEALTH TOOLKIT: A KEY EUROPEAN FREE AND OPEN-SOURCE SOFTWARE FOR DEEP LEARNING AND COMPUTER VISION READY TO EXPLOIT HETEROGENEOUS HPC AND CLOUD ARCHITECTURES
At the present time, we are immersed in the convergence between Big Data, High-Performance …
APPLYING AI TO MANAGE ACUTE AND CHRONIC CLINICAL CONDITIONS
Computer systems deployed in hospital environments, particularly physiological and biochemical real-time monitoring of patients in an Intensive Care Unit (ICU) environment, routinely collect a large volume of data that can hold very useful information. However, the vast majority are either not stored and lost forever or are…
3D HUMAN BIG DATA EXCHANGE BETWEEN THE HEALTHCARE AND GARMENT SECTORS
3D personal data is a type of data that contains useful information for product design, online sales services, medical research and patient follow-up. Currently, hospitals store and grow massive collections of 3D data that are not accessible by researchers, professionals or companies. About 2.7 petabytes…
USING A LEGAL KNOWLEDGE GRAPH FOR MULTILINGUAL COMPLIANCE SERVICES IN LABOR LAW, CONTRACT MANAGEMENT, AND GEOTHERMAL ENERGY
This chapter provides insights about the work done and the results achieved by the Horizon 2020-funded Innovation Action “Lynx—Building the Legal Knowledge Graph for Smart Compliance Services in …
BIG DATA ANALYTICS IN THE BANKING SECTOR: GUIDELINES AND LESSONS LEARNED FROM THE CAIXABANK CASE
A large number of EU organisations already leverage Big Data pools to drive value and investments. This trend also applies to the banking sector. As a specific example, CaixaBank currently manages more than 300 different data sources (more than 4 PetaBytes of data and …
DATA-DRIVEN ARTIFICIAL INTELLIGENCE AND PREDICTIVE ANALYTICS FOR THE MAINTENANCE OF INDUSTRIAL MACHINERY WITH HYBRID AND COGNITIVE DIGITAL TWINS
This chapter presents a Digital Twin Pipeline Framework of the COGNITWIN project that supports Hybrid and Cognitive Digital Twins, through four Big Data and AI pipeline steps adapted for Digital Twins. The pipeline steps are Data Acquisition, Data Representation, AI/Machine Learning, and Visualisation and Control. Big Data and…
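As a purely schematic sketch (the COGNITWIN chapter defines its own framework and tooling), the four named steps can be read as composable stages of a single pipeline; every function name and the toy data below are hypothetical:

```python
from typing import Any, Dict, List

# Hypothetical stand-ins for the four pipeline steps named in the chapter.
def data_acquisition(source: str) -> List[Dict[str, Any]]:
    """Collect raw sensor readings from a (simulated) data source."""
    return [{"sensor": source, "value": v} for v in (1.0, 2.0, 3.0)]

def data_representation(raw: List[Dict[str, Any]]) -> List[float]:
    """Map raw records into a uniform numeric representation."""
    return [record["value"] for record in raw]

def ai_machine_learning(values: List[float]) -> float:
    """Apply a trivial 'model' (here, a mean) to the represented data."""
    return sum(values) / len(values)

def visualisation_and_control(prediction: float) -> None:
    """Report the result; a real twin would feed it back into control."""
    print(f"predicted value: {prediction:.2f}")

# Compose the steps into one pipeline, in the order the chapter lists them.
def run_pipeline(source: str) -> None:
    visualisation_and_control(
        ai_machine_learning(
            data_representation(
                data_acquisition(source))))

run_pipeline("furnace-temperature")
```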
BIG DATA ANALYTICS IN THE MANUFACTURING SECTOR: GUIDELINES AND LESSONS LEARNED THROUGH THE CENTRO RICERCHE FIAT (CRF) CASE
Manufacturing processes are highly complex. Production lines have several robots and digital tools, generating massive amounts of data. Unstructured, noisy and incomplete data have to be collected, aggregated, pre-processed and transformed into structured messages of a common, unified format in order to be analysed not…
NEXT-GENERATION BIG DATA-DRIVEN FACTORY 4.0 OPERATIONS AND OPTIMIZATION: THE BOOST 4.0 EXPERIENCE
This chapter presents the advanced manufacturing processes and big data-driven algorithms and platforms leveraged by the Boost 4.0 big data lighthouse project, which enable improved digital operations within increasingly automated and intelligent shop floors. The chapter illustrates how three different companies have been able to…
BIG DATA-DRIVEN INDUSTRY 4.0 SERVICE ENGINEERING LARGE-SCALE TRIALS: THE BOOST 4.0 EXPERIENCE
In the last few years, the potential impact of big data on the manufacturing industry has received enormous attention. This chapter details two …
MODEL-BASED ENGINEERING AND SEMANTIC INTEROPERABILITY FOR TRUSTED DIGITAL TWINS BIG DATA CONNECTION ACROSS THE PRODUCT LIFECYCLE
With the rising complexity of modern products and a trend from single products to Systems of Systems (SoS), where the produced system consists of multiple subsystems and the integration of multiple domains is a mandatory step, new approaches for development…
A DATA SCIENCE PIPELINE FOR BIG LINKED EARTH OBSERVATION DATA
The science of Earth observation uses satellites and other sensors to monitor our planet, e.g., for mitigating the effects of climate change. Earth observation data collected by satellites is a paradigmatic case of big data. Due to programs such as Copernicus in Europe and Landsat in the United States, Earth observation data is…
TOWARDS COGNITIVE PORTS OF THE FUTURE
In modern societies, the rampant growth of data management technologies, which have access to data sources from a plethora of heterogeneous systems, enables data analysts to extend their advantages to new areas and critical infrastructures. However, there is no global reference standard for data platform technology. Data…
DISTRIBUTED BIG DATA ANALYTICS IN A SMART CITY
This chapter describes an actual smart city use-case application for advanced mobility and intelligent traffic management, implemented in the city of …
PROCESSING BIG DATA IN MOTION: CORE COMPONENTS AND SYSTEM ARCHITECTURES WITH APPLICATIONS TO THE MARITIME DOMAIN
Rapidly extracting business value out of Big Data that stream in corporate data centres requires continuous analysis of massive, high-speed …
KNOWLEDGE MODELING AND INCIDENT ANALYSIS FOR SPECIAL CARGO
The airfreight industry of shipping goods with special handling needs, also known as special cargo, suffers from nontransparent shipping processes, resulting in inefficiency. The LARA project (Lane Analysis and Route Advisor) aims at addressing these limitations and bringing innovation in special cargo route planning so as to…
Editors
Edward Curry
Insight, DSI, NUI Galway
Edward Curry is a research leader at the Insight SFI Research Centre for Data Analytics. He has made contributions to semantic technologies, incremental data management, event processing middleware, software engineering, and distributed systems and information systems. Edward combines strong theoretical results with high-impact practical applications. He is also co-founder and elected Vice President of the Big Data Value Association, an industry-led European big data community.
Sören Auer
Leibniz University of Hannover, L3S
Sören Auer is Professor of Data Science and Digital Libraries at Leibniz Universität Hannover and Director of the TIB, the largest science and technology library in the world. He has made important contributions to semantic technologies, knowledge engineering and information systems. He is co-founder of several high-potential research and community projects, such as the Wikipedia semantification project DBpedia, the scholarly knowledge graph orkg.org, and the innovative technology start-up eccenca.com. Sören was also a founding director of the Big Data Value Association, led the semantic data representation in the International Data Space, and is an expert for industry, the European Commission and the W3C.
Arne J. Berre
Chief Scientist, SINTEF
Arne J. Berre is Chief Scientist at SINTEF Digital and Innovation Director at the Norwegian Center for AI Innovation (NorwAI), responsible for the GEMINI center on Big Data and AI. He is the leader of the BDVA/DAIRO TF6 on technical priorities, with responsibilities for data technology architectures, data science/AI, data protection, standardisation, benchmarking and HPC, and he leads the Norwegian committee for AI and Big Data within ISO SC 42 (AI).
Andreas Metzger
University of Duisburg-Essen, Germany
Andreas Metzger is senior academic councillor at the University of Duisburg-Essen and heads the Adaptive Systems and Big Data Applications group at paluno, the Ruhr Institute for Software Technology. His background and research interests are software engineering and machine learning for adaptive systems. Among other leadership roles, Andreas acted as Technical Coordinator of the European lighthouse project TransformingTransport, which demonstrated the transformations that big data and machine learning can bring to the mobility and logistics sector.
Maria S. Perez
Ontology Engineering Group, Universidad Politécnica de Madrid
Maria S. Perez is a full professor at the Universidad Politécnica de Madrid (UPM). She is part of the Board of Directors of the Big Data Value Association and also a member of the Research and Innovation Advisory Group of the EuroHPC Joint Undertaking. Her research interests include data science, big data, machine learning, storage, and high-performance and large-scale computing.
Sonja Zillner
Siemens AG
Sonja Zillner works at Siemens AG Technology as a Principal Research Scientist, focusing on the definition, acquisition and management of global innovation and research projects in the domain of semantics and artificial intelligence. Since 2020, she has been Lead of the Core Company Technology Module “Trustworthy AI” at Siemens Corporate Technology. Before that, from 2016 to 2019, she was invited to advise the Siemens Advisory Board on strategic decisions regarding artificial intelligence. In addition, Sonja is a professor at the Technical University of Munich.




