One datum and many values for sustainable Industry 4.0: a prognostic and health management use case

Today's industrial context, driven by the Industry 4.0 paradigm, is overwhelmed by data. The decreasing cost of innovative technologies and recent market dynamics have, respectively, pushed and pulled toward architectures and practices in which data are the masters. While advancing, we have to take care of waste, even though the intangibility of data makes them hard to associate with waste. In this paper we reflect on today's data-intensive context, focusing on the industrial sector. A smart approach for fully exploiting data collection infrastructures is proposed, and its application to a prognostic and health management (PHM) use case set inside an automatic painting system is presented. The contributions of this paper are mainly two. First, the general conceptual take-away of "data re-use" is presented and discussed. Second, a PHM solution for the painting system's number plates, based on optical character recognition (OCR), is proposed and tested as a proof of concept for "data re-use". In summary, the already-in-use data sharing principle for achieving transparency and integration inside Industry 4.0 is presented as complementary to the proposed "data re-use", in order to develop a truly sustainable shift toward the future.


INTRODUCTION
The Industry 4.0 paradigm has attracted a lot of attention from both practitioners and academics. Several efforts have been made to define and establish best practices from disparate points of view, with a particular focus on the sustainability of industrial practice [1]- [4]. Focusing on the industrial data domain, given that the innovative and technologically advanced configuration of the smart factory enables massive data collection, data fusion has been identified as a promising field for extracting value from raw industrial data. On the other hand, investments in Industry 4.0 technologies are generally expensive, which is why we think it is essential to define and envision all the values obtainable from each investment during the evaluation phase. Specifically, we refer to envisioning possible second-level and modular applications, starting from the same data, to make the investments worthwhile. To corroborate this concept, we show an implementation example inside an Italian manufacturing firm, where an investment has been exploited for performing tasks additional to those mentioned in the original innovation project. In this experimental work, we show how the cameras, usually adopted for implementing optical character recognition (OCR) to keep track of the batch number associated with the production orders, can be exploited for carrying out the maintenance of the painting system's number plates in a predictive way. Specifically, the number plates, essential for properly managing the equipment set-up coherently with the kind of product to be painted, enter the

STATE OF THE ART

Industry 4.0
Today's industrial context is studded with advanced, connected, and automated manufacturing equipment, able, among other things, to self-adapt working parameters coherently with diversified customers' orders [5], [6]. Flexibility, accuracy, precision, and effectiveness are mandatory for properly managing the low-volume and high-variety product portfolio which characterizes the vast majority of manufacturing firms [7], [8]. Industry 4.0, simplistically summarized as the wide adoption of innovative technologies within the manufacturing environment [9], has driven the change toward the essential smart factories of the future: digital, collaborative, connected, automated, and optimized [10]. Along with the industrial revolution, sensors themselves are becoming more and more advanced in order to meet the requirements of the contemporary manufacturing context [11], enabling applications ranging from self-identification up to self-configuration of industrial machinery, also referred to as self-X [12]. These innovative sensors have driven the change toward innovative practices, like PHM, which tries to predict upcoming failures and faults of manufacturing equipment by collecting data in real time and analyzing them with advanced analytics [13]. Recently, care for sustainability has also involved Industry 4.0. Several researchers have stressed the importance of sustainable industrial practice and development, highlighting the necessity for sustainable industries as well as attempting to develop a framework for sustainable Industry 4.0 [14], [15].

Prognostic and health management
Among the best practices in Industry 4.0 we find PHM, which can be summarized as the estimation of the current health status of a component or piece of equipment [16], and the subsequent prediction of its remaining useful life (RUL) with the aim of optimizing maintenance interventions [17]. The PHM concept is often connected to the predictive maintenance (PdM) philosophy, which can be viewed as the application of sensors and analytics to predict equipment failures and prevent them by doing maintenance before the failure occurs [18]. Several works in the literature deal with PdM using images or videos as the data to be analyzed. Nonetheless, none of them uses OCR or focuses on the maintenance of number plates. In this paper we present a PHM use case in which images of number plates are used for assessing their dirtiness, and the consequent need for maintenance of these plates, which are used by an automated painting system to manage product customization.

Data fusion
In order to present the smart factory from a practical point of view, some researchers have focused on architectures and implementation road-maps, presenting solutions in the realm of data fusion techniques as a fruitful value extraction method [19], [20]. Indeed, one very common pattern is to collect data, possibly heterogeneous and polymorphic (i.e. in different formats), from disparate sources, and to use all of them to achieve improved accuracy and more specific inferences than those achieved by the use of a single data source alone [21], [22]. Their proposals aim at merging various scattered pieces of information, collected or elaborated through technologically advanced instrumentation, in order to develop a broader and deeper understanding of things, useful for managing decision making coherently and in a more conscious way [23]. Certainly, this approach is promising and has already been attempted, but it has proved to require in-depth and complex studies on a case-by-case basis.

Data sharing
Additionally, several research streams on data management and data exploitation in Industry 4.0 are focused on presenting the value of data and information sharing inside the industrial environment [24]- [26]. Cyber-physical production systems (CPPS), in which each physical manufacturing asset is also developed in the virtual world through sensors that collect data in real time and update its digital twin, are an essential example of information sharing and transparency [27], [28]. Indeed, the virtual part of a CPPS usually lives in the cloud, which makes everything accessible from almost everywhere, putting in place the kind of sharing required in Industry 4.0 at a reduced cost [29]. In some works, information sharing is presented as the key to extending conclusions drawn for one system to similar systems [30]. The kind of sharing discussed in these works is the one incorporated in the integration and virtualization design principles [31], [32]. In more depth, horizontal integration is inter-company, while vertical integration is intra-company [33], and both strongly rely upon the availability of information to several interested users and entities.

Data vs value
Whether fed into data fusion models or simply shared within and outside the company boundaries, data is king in today's industrial context. One conclusion backed by almost all researchers dealing with the topic of industrial data is that technological advancement and the efforts toward Industry 4.0 have driven the development of data collection architectures able to gather a huge, often redundant, amount of data from a wide variety of sources [34]. The reliability of the data architecture itself, together with data manipulation and knowledge extraction methodologies, is hence of critical importance [35]. It is crucial to highlight the difference between data and information, the former being sometimes useless and unreliable [36], the latter being something that has to be extracted from the data in order to develop real and practical knowledge [37]. In fact, between data and knowledge there is an entire data life-cycle to be covered [33]. For this reason, too many data, collected and not fully utilized, can increase confusion, hamper the knowledge extraction process [38], and increase costs [39]. Moreover, an additional problem arising in data-driven contexts is the difficulty of providing the right data to the right "person"; when this fails, the usefulness of the entire system becomes borderline [40].
In study [41], in-situ processing is proposed as a means of solving the problem of excessive data transmission and storage. Despite this attempt, we found very few studies focused on highlighting how necessary it is to develop a critical and prioritized data collection architecture while ensuring the sustainability of the whole system. In particular, the sustainability of production model innovation, and so of the smart factory, should definitely be prioritized, as highlighted by Savastano et al. [42], Cioffi et al. [43], and Jamwal et al. [44].

Multi-purpose data
In order to extract as much value as possible from the investments in Industry 4.0, without being overwhelmed by a data acquisition frenzy, it is necessary to exploit the technologies already installed, and the data already gathered, for as many purposes as possible. In this way we can enlarge the obtainable results and values at almost no additional cost and without complicating the architectures and practices in use. In more depth, what we conceptually propose is to implement the design principles of semantic and operational interoperability, and decentralization [45], in the investment planning and data utilization phases, for nurturing and fully exploiting the innovative architecture of the smart factory. Two approaches to data utilization can and should coexist inside the manufacturing environment, for almost every kind of datum collected: edge processing, for decentralized decision making and precise information extraction, and cloud integration, possibly using data fusion techniques, for implementing the CPPS and for sharing information across company boundaries. This proposal has not been sufficiently detailed and developed in the existing literature. In study [46], field-level networking is proposed in order to cope with horizontal and vertical integration. Moreover, the authors envision field-level data gathering and processing for various purposes (from a layer and protocol point of view), but without organizing this concept and without going into detail. According to Lasi et al. [47], we lack models of development of Industry 4.0; moreover, according to Savastano et al. [42], the majority of the academic literature is technical and focused on engineering aspects, while very few works deal with managerial strategies and approaches. Coherently, inside Section 4, we will propose a conceptual and strategic approach for a better evaluation and prioritization of Industry 4.0 investments, for ensuring sustainability and full value extraction from installed technologies.
Our proposal is broader and more structured than the suggestion made in Tonelli et al. [48], which is the inclusion of primary and secondary stakeholders in the data collection phase, and than the concept of data re-use loosely mentioned by Gorecky et al. [49] and Preuveneers and Ilie-Zudor [40].

Optical character recognition in Industry 4.0
OCR is a computer vision technique that imparts the human reading capability to machines. In Jain and Sharma [50], several OCR-based non-industrial applications have been implemented in one system, but they remain somewhat disconnected from one another. Focusing on the industrial environment, OCR is widely adopted for automatically reading numbers associated with production planning systems [51], especially in those contexts where other identification technologies, like radio frequency identification (RFID), might not be reliable or durable enough. In study [52], an attempt at integrating multiple value extraction into one system has been proposed. In more detail, the authors propose a modular toolkit that allows various applications based on machine vision (MV) methods, thus extracting many values from the same system and, hence, from the same datum. Focusing on MV in general, a detailed description and list of industrial applications is presented in [53]. In some of the presented use cases the captured image is used for multiple purposes, which is one example of what we are conceptualizing in this paper.

MATERIALS AND METHODS
The use case we are going to present, as a practical example and origin of the concept, has been developed inside a renowned Italian manufacturing firm, iGuzzini Illuminazione S.p.A., which is a top-quality manufacturer of lighting devices, with strong experience in lighting projects for public and cultural heritage buildings. Among others, the renowned Scrovegni Chapel in Padua, Italy, has been illuminated by customized, high-tech products of the company [54]. Thanks to relevant investments in Industry 4.0, the firm has acquired an innovative autonomous painting system [55], made of several painting robots able to self-adjust working parameters coherently with each production batch. Automatic painting systems, being precise and effective yet fragile, require an extremely reliable identification system, which allows them to set working parameters coherently with the dimensions and requirements of the batch being processed. Products are automatically transported along the entire route by a suspended conveyor belt, to which the racks and their respective tags are attached. The numeric tags are metal plates with carved digits. They travel both inside and outside the painting rooms, together with the hanging products, which is why part of the sprayed paint can stick to the carvings, making the next reading of the plate harder.
In fact, in order to manage the robots' settings, an RGB camera, with 2064×1544 resolution and Gigabit Ethernet connection, frames the back-lit carved metal plates entering the painting room, with the aim of reading the associated number and acquiring the related batch information from the information system of the company, as depicted in Figures 1(a) and (b). Figure 1(a) shows the experimental setup adopted for the preliminary definition of positioning. Erroneous readings of the plates can cause incorrect robot settings and result in expensive damage to the robots' arms, together with production stops and related consequences. In Figure 2 we show the picture of a plate taken outside of the system setup, 2(a), and two images of, respectively, a clean plate, 2(b), for which the reading accuracy achieved is 93%, and a reasonably dirty plate, 2(c), for which the reading accuracy achieved is 67%. Figures 2(b) and 2(c) have been taken using the previously presented setup. As can be noted, the plate shown in Figure 2(c) may require maintenance after the current passage through the painting rooms given its actual health status, which is already not optimal even if the plate is still legible. The frames shown are captured by the installed camera and back-lit system, and are typical input frames both for the original OCR system, devoted to setting the painting robots' parameters, and for our plate maintenance system.
[Figure 1 panels: (a) preliminary system setup adopted; (b) definitive system setup adopted.]
Figure 2. Examples of tags used in this work; (a) outside the system setup, (b) and (c) snapped by the system described, respectively related to a clean and to a dirty tag.
Indeed, we decided to exploit the already-in-use data acquisition system, and we developed a PHM solution able to help maintainers in the upkeep of the painting number plates. From an architectural point of view, we are adding a modular application, able to work in real time and to extract additional value from the investment made by the company. The data acquisition architecture is not modified by our additional system, which works in parallel by taking as input the frames already captured for the automatic reading of the plates' numbers. These frames are temporarily stored in a database, which is easily accessible in real time, allowing us to exploit the collected data in parallel for the additional PHM-related purposes. Our prognostic system keeps track of the reading accuracy, as a percentage, of each number plate, with the aim of sending an alert to maintainers when needed.
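The parallel, non-intrusive arrangement described above can be illustrated with a minimal sketch. It is not the software installed at the firm: the class `FrameFanOut`, the stub functions, and the representation of a frame as nested lists of grey levels are all hypothetical names and assumptions introduced only for illustration. The point it shows is that one captured frame is handed, unchanged, to several independent modules.

```python
from typing import Callable, Dict, List

# Assumption: a frame is a grayscale image stored as rows of pixel intensities.
Frame = List[List[int]]


class FrameFanOut:
    """Hands every already-captured frame to each registered consumer module."""

    def __init__(self) -> None:
        self._consumers: Dict[str, Callable[[Frame], object]] = {}

    def register(self, name: str, consumer: Callable[[Frame], object]) -> None:
        # Adding a module does not change how frames are acquired.
        self._consumers[name] = consumer

    def dispatch(self, frame: Frame) -> Dict[str, object]:
        # The same frame is passed, unmodified, to every module.
        return {name: consumer(frame) for name, consumer in self._consumers.items()}


def stub_plate_reader(frame: Frame) -> str:
    # Hypothetical stand-in for the original OCR that identifies the batch.
    return "0421"


def stub_health_indicator(frame: Frame) -> float:
    # Hypothetical stand-in for a second module: here, the mean brightness.
    pixels = [p for row in frame for p in row]
    return sum(pixels) / len(pixels)


fan_out = FrameFanOut()
fan_out.register("batch_identification", stub_plate_reader)
fan_out.register("plate_health", stub_health_indicator)
results = fan_out.dispatch([[0, 255], [255, 255]])
```

In this sketch a further module can be added with one more `register` call, which mirrors the idea of extracting additional value without touching the acquisition architecture.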
The process steps we followed for developing the PHM system are the following:
Step 1. Collect sample images to be used for developing a custom OCR;
Step 2. Clean and pre-process the images;
Step 3. Train the customized OCR;
Step 4. Collect a testing data set;
Step 5. Label each test frame as either "Clean" or "Dirty";
Step 6. Apply the prognostic algorithm to the testing set, simulating the real behavior (one frame after another, as in the real-time situation; some frames are connected to the same plate, framed after one additional painting cycle).
[...], thus being a fairly balanced set of data. These samples have been collected using the system implemented. Each frame has been pre-processed through binarization, morphological closing, and noise reduction by means of blob analysis, and contains one digit only. By training a custom OCR, we are able to create a model optimized for the specific font and characteristics of the system under analysis.
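The pre-processing chain mentioned above (binarization, morphological closing, and blob-based noise reduction) can be sketched in plain Python as follows. This is only an illustrative sketch under stated assumptions, not the implementation used in the experiments: the threshold of 128, the 3x3 structuring element, the 4-connectivity, and the minimum blob area of 3 pixels are assumed values, and all function names are hypothetical.

```python
from typing import Callable, List

Image = List[List[int]]


def binarize(gray: Image, threshold: int = 128) -> Image:
    """Map pixels brighter than the (assumed) fixed threshold to 1, the rest to 0."""
    return [[1 if p > threshold else 0 for p in row] for row in gray]


def _window_op(img: Image, reducer: Callable[[List[int]], int]) -> Image:
    """Apply max (dilation) or min (erosion) over a 3x3 window clipped at the borders."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(reducer(window))
        out.append(row)
    return out


def closing(img: Image) -> Image:
    """Morphological closing: dilation followed by erosion, which fills small gaps."""
    return _window_op(_window_op(img, max), min)


def remove_small_blobs(img: Image, min_area: int) -> Image:
    """Blob analysis: delete 4-connected foreground components smaller than min_area."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if out[y][x] == 0 or seen[y][x]:
                continue
            stack, blob = [(y, x)], []
            seen[y][x] = True
            while stack:  # flood fill collecting one connected component
                cy, cx = stack.pop()
                blob.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and out[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            if len(blob) < min_area:
                for by, bx in blob:
                    out[by][bx] = 0
    return out


def preprocess(gray: Image, threshold: int = 128, min_area: int = 3) -> Image:
    """Binarize, close, then drop small blobs, in the order listed in the text."""
    return remove_small_blobs(closing(binarize(gray, threshold)), min_area)
```

A production system would more likely rely on an image-processing library; the pure-Python version is used here only so that each step is visible.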
Once our custom OCR had been trained, we labeled each testing sample. Among the 950 test images, 795 are "Clean" and 155 are "Dirty". The prognostic algorithm, whose flowchart is shown in Figure 3, starts by reading a frame collected by the already-in-use system, drawing it in real time from the database where the frames are temporarily stored. This frame is first binarized using a fixed threshold, which is easy to set given that the difference in intensity between dark and light pixels is very large, and that the working conditions are highly standardized. Once the frame has been binarized, we apply our custom OCR, extract the accuracy of the number reading, and store it in the historical series of accuracy values which we created for each number plate. If the current accuracy is lower than 60%, or if it has dropped by more than 20% with respect to the last reading of the plate under consideration, the system sends an alert to the maintenance manager, warning that the specific plate needs maintenance in order to avoid a possible misreading at the next cycle inside the painting system. The second criterion for generating an alert is a rapid drop in percentage, which suggests that the last painting cycle faced by the plate has left heavy paint residues on the carvings. Therefore, the plate requires maintenance even though its reading accuracy is higher than the threshold value identified as critical. Given that we store a historical series of accuracy values associated with each specific number plate, it is easy to find when one has suffered a strong decrease during the last painting cycle, hence suggesting a maintenance intervention. It is worth noting that some plates keep a constant accuracy for several cycles, while others suffer a rapid decrease. This is because every plate carries a specific order, which is associated with specific painting parameters and materials.
By changing these factors, the overall amount of dirt that can stick to the carvings changes too. The algorithm just described, applied to the testing set, suggested avoiding maintenance 798 times (no alert was generated), and suggested maintenance in 152 cases. All the 795 "Clean" plates have been read with an accuracy higher than 85%. Three "Dirty" plates have been read with an accuracy higher than the 60% threshold, which is why the system did not suggest maintenance for them. We have to say that the labeling has been carried out manually by us, and the degree of dirtiness affecting the plates labeled as "Dirty" is variable. The three misclassified plates are not among the dirtiest ones, which to some extent justifies the error made by the algorithm. For the sake of the experiment, we decided to avoid doing maintenance on the 152 plates for which the algorithm generated an alert. In 67 cases the frame of these plates after another cycle inside the painting rooms caused a misreading of the number, suggesting the importance and efficacy of the PHM system created.
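The two alert criteria described above can be sketched as follows. This is a minimal illustration rather than the deployed code: the class name `PlateHealthMonitor` and its method are hypothetical, and the 20% drop is assumed here to be measured in percentage points with respect to the previous reading of the same plate, which the text does not state explicitly.

```python
from collections import defaultdict
from typing import DefaultDict, List

ACCURACY_THRESHOLD = 60.0  # percent, critical value reported in the text
MAX_DROP = 20.0            # assumption: percentage points lost since the last reading


class PlateHealthMonitor:
    """Keeps one historical series of reading accuracy per number plate."""

    def __init__(self) -> None:
        self.history: DefaultDict[str, List[float]] = defaultdict(list)

    def update(self, plate_id: str, accuracy: float) -> bool:
        """Store the new reading and return True when an alert should be sent."""
        readings = self.history[plate_id]
        previous = readings[-1] if readings else None
        readings.append(accuracy)
        # Criterion 1: accuracy below the critical threshold.
        below_threshold = accuracy < ACCURACY_THRESHOLD
        # Criterion 2: rapid drop with respect to the previous painting cycle.
        rapid_drop = previous is not None and (previous - accuracy) > MAX_DROP
        return below_threshold or rapid_drop


monitor = PlateHealthMonitor()
first = monitor.update("plate-17", 93.0)   # healthy first reading, no alert
second = monitor.update("plate-17", 67.0)  # above 60% but 26 points lower: alert
third = monitor.update("plate-42", 58.0)   # below the 60% threshold: alert
```

The plate identifiers and accuracy values in the usage lines are illustrative; the second call shows why a plate can trigger an alert while still being above the critical threshold.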

RESULTS AND DISCUSSION
In the presented use case, the existing Industry 4.0 architecture has been empowered by the addition of a modular analytic system for an almost zero investment. The added application works in parallel with the existing system, and uses currently collected data to extract further value from them. From a conceptual and managerial point of view, it improves the return on investment. The efficacy of the prognostic solution has been practically proven by deliberately not maintaining the dirty plates for experimental purposes, and observing that more than a half of these not maintained plates would cause misreading at the next passage into the painting system. This suggests that the developed prognostic solution is sound. Based on the presented use case, we developed a conceptual take-away, briefly summarized in Figure 4, which can be extended to other use cases and to almost all Industry 4.0 projects. The center of the figure is dedicated to squares representing the data available inside typical advanced manufacturing contexts. In the current situation, these data are usually exploited for aggregation and sharing. We suggest adding the bottom layer of the figure, i.e. the "data multiple uses" layer, to the data management strategy in place. Specifically, we propose to enlarge the obtainable results and applications from the data collected through every investment made in this direction. We found that most of the companies dealing with Industry 4.0 have innovative monitoring architectures in place, but cannot fully exploit them for developing practical knowledge and useful decision-making support. We think that almost every investment could be regarded as something broader based on the concept of "data re-use" or "data multiple-utilization", which is present in the literature as highlighted inside the Introduction Section, but is only loosely mentioned in two past papers.
With respect to past works, we have better characterized the concept, and shown its core and value through a specific use case. In the belief that "data re-use" or "data multiple-utilization" should be seriously considered, we suggest involving several stakeholders from diverse departments while evaluating Industry 4.0 investments, in order to envision multiple ways of extracting value, thus avoiding excessive and possibly redundant data collection architectures [56] which do not lead to the extraction of practical, useful knowledge. The intangibility of data leads to the mistaken thought that they are associated with low or no costs, but this is not the reality. Data collection, storage, processing, and manipulation come at a non-negligible cost, from a monetary, ethical, and environmental point of view [57], [58]. Sustainability is mandatory for every investment in innovative technologies.
Focusing on some areas of interest for the authors, the value of MV systems could frequently be extended by adding modular solutions that use the same video stream as input data, in parallel. MV is indeed a branch of Artificial Intelligence that has achieved considerable success [59]. It is versatile and can be customized to meet the requirements and objectives of multiple actors starting from the same data collection architecture. In a past work by Pierleoni et al. [60], the data collection architecture used for counting assembled pieces could be exploited for assembly check purposes with almost no additional investment, except for the algorithm development. Another promising example of data multiple-utilization inside the typical predictive maintenance context is the exploitation of current consumption data. This kind of data is usually adopted for assessing equipment health status, usually after aggregation with other relevant sensor data, such as vibrations, acoustic emissions, and temperatures. Nonetheless, that datum alone is already very important for energy management purposes. For this reason, we envision an easy multi-purpose utilization of current consumption data. These are only two examples of how the concept described and motivated in this paper should guide future industrial data management. By taking it into consideration, sustainability increases, and additional knowledge and value are extracted with little effort.
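As a rough illustration of the current consumption example, the sketch below derives two values from the same series of current samples: an energy figure for energy management and a simple deviation indicator for health monitoring. It is a simplified sketch under strong assumptions that do not come from the paper: samples are RMS currents of a single-phase load taken at a fixed interval, the supply voltage and power factor are constant, and a healthy baseline current is known. Both function names are hypothetical.

```python
from typing import List


def energy_kwh(current_a: List[float], voltage_v: float,
               interval_s: float, power_factor: float = 1.0) -> float:
    """Energy use: sum of V * I * PF * dt over the samples, converted to kWh."""
    watt_seconds = sum(voltage_v * i * power_factor * interval_s for i in current_a)
    return watt_seconds / 3_600_000.0


def current_deviation(current_a: List[float], baseline_a: float) -> float:
    """Health indicator: relative deviation of the mean current from a healthy baseline."""
    mean_current = sum(current_a) / len(current_a)
    return (mean_current - baseline_a) / baseline_a


# The same four illustrative samples (one every 15 minutes) feed both purposes.
samples = [10.0, 10.0, 12.0, 12.0]
energy = energy_kwh(samples, voltage_v=230.0, interval_s=900.0)
deviation = current_deviation(samples, baseline_a=10.0)
```

With these illustrative inputs the hour of operation corresponds to about 2.53 kWh and a mean current 10% above the baseline. In practice the health assessment would typically combine current with other sensor data, as noted above.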

CONCLUSION
All researchers agree on the pivotal role of data in the context we live in. Nonetheless, data redundancy, data waste, and incomplete data exploitation should be avoided for an efficient transition to a valuable Industry 4.0. In this paper, we proposed an experimental use case of prognostic and health management inside an Industry 4.0 environment, together with a conceptual take-away regarding "data re-use" or "multi-purpose data" as a value extraction approach in synergy with already-in-use data management techniques, such as data fusion and edge computing, among others. The validity of the presented PHM solution has been tested and proved in sections 2 and 3, showing the results of the experiments performed. By extending and generalizing the reasoning behind the implemented system, we developed a stand-alone concept, which can be fruitful for all the managers and engineers involved in Industry 4.0 projects. Almost every industrial context could be coherently re-shaped considering the presented conceptual approach, thus improving the overall sustainability and effectiveness of investments (more knowledge extracted from the same data collection architectures).
The future scope of this work involves the extension and organization of the "multi-purpose data" concept introduced in this paper, in order to bring this research field to a higher level of abstraction and definition. We think that there is enough material for enlarging and setting the reference on this broad and reasonable concept, which we found vaguely implied in several works but which, to the best of our knowledge, has not been organized and presented by other researchers. The limitations of this work are mainly connected to the fact that a complete literature review about "data re-use", especially within the scope of sustainable Industry 4.0, is still missing.