Process Mining in Supply Chains: A Systematic Literature Review

ABSTRACT


INTRODUCTION
With the advancement of technology, collaborations between organizations have become easier to realize. The limitations of physical distance decrease and companies expand their scope to global proportions [1,2]. At the same time, the number of transactions between companies increases as they work together more closely. By synchronizing their processes [3,4], they are forced to become more flexible and more transparent. Hence, for closely collaborating partners, access to accurate, detailed, and complete information about the supply-chain-wide processes has become indispensable.
To facilitate the communication about and the synchronization of their processes, the majority of collaborating partners construct business process models [5]. These models graphically specify and represent the flow of activities within the supply chain, such that the current collaborative process can be analyzed or improved more effectively and efficiently [6]. The supply chain business process models can also be used to represent the relations between the public and the private process views of each partner in the supply chain [5] or to show the interactions between different partners in the supply chain. The construction of supply-chain-wide process models poses a real challenge, because the knowledge about the overall process is often distributed over the involved parties and no single party has an overview of the complete process and all its details. Therefore, in the context of supply chain process modeling, process mining may be used to construct the overall process model. Process mining comprises a wide variety of (semi-)automated techniques that study processes based on historical process data extracted from the supporting information systems into structured event logs. The best-known and most widely applied type of technique is process discovery [8], which automatically constructs a business process model that captures the real process by analyzing the event log [9,10]. Process discovery is thus proposed to produce more objective, more complete, and more up-to-date business process models [11]. It is currently not clear, however, how these techniques can be applied in the context of cross-organizational processes [12].
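As a minimal illustration of the kind of input process discovery works from, the sketch below counts directly-follows relations in a small event log. The log layout (a dictionary from case identifiers to ordered activity lists), the activity names, and the function name `directly_follows` are assumptions made for this example only. Actual discovery techniques go considerably further and derive a full process model from such relations.

```python
from collections import Counter


def directly_follows(log):
    """Count how often activity a is immediately followed by activity b."""
    counts = Counter()
    for trace in log.values():
        # Pair every activity with its direct successor in the same trace.
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


# Hypothetical event log: case id -> activities in order of occurrence.
example_log = {
    "order-1": ["receive order", "ship goods", "send invoice"],
    "order-2": ["receive order", "send invoice", "ship goods"],
    "order-3": ["receive order", "ship goods", "send invoice"],
}

dfg = directly_follows(example_log)
for (a, b), n in sorted(dfg.items()):
    print(f"{a} -> {b}: {n}")
```

The resulting counts show which orderings of activities dominate in the recorded behavior, which is the raw material a discovery algorithm turns into a model.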
Therefore, we conducted a Systematic Literature Review to collect, analyze, structure, and integrate the current academic knowledge about cross-organizational process mining. Apart from the collection of metadata, such as the number of published papers over time and the geographical spread of the authors, the analysis was mainly driven by two frameworks. These frameworks were selected because they provide insight into the addressed research topics, the proposed contributions, and the applied research methods. The first framework describes the types of research outcomes for each paper, whereas the second is applied to classify the types of practical solutions targeted by each paper. This paper proceeds as follows. Section 2 describes how we have implemented the Systematic Literature Review method. In Section 3, the results of the analysis are presented. Section 4 provides a discussion and conclusion.

RESEARCH METHOD
To reveal the current knowledge and to get insights into potentially missing knowledge about process mining of cross-organizational processes, the Systematic Literature Review (SLR) methodology was implemented. This method is considered reliable, thorough, and controllable [13]. We adopted the practical guidelines from [13-15]. Based on a search phrase, derived from the research question, a selection of databases is automatically searched to find relevant papers [14]. The resulting paper set is reduced by refining the search through the manual application of inclusion and exclusion criteria [13]. The final paper set is then studied to get insights into the current state of the art of the research domain and to identify research opportunities (as in [15]). The elements that lead to the paper selection are discussed in more detail below.

Research question
The research question is based on the general research goal to get an overview of current and missing academic knowledge about cross-organizational process mining. Such an overview is currently lacking, although researchers have discussed the need for it in the past [12,16]. Therefore, the research question addressed in this paper is: RQ1. Which knowledge about cross-organizational process mining exists in academic literature? By addressing this research question, an overview of current academic knowledge is created. This overview is useful for practitioners, who report difficulty in finding suitable information for their cross-organizational process mining projects. Researchers will also benefit from the overview. For example, the lack of knowledge about cross-organizational process mining was explicitly mentioned as a research challenge in the process mining community Manifesto [16].

Search and selection process
Search phrase. Based on the research question, a search string was composed to be used in an automatic search process in multiple databases to find the relevant literature for the overview. The search phrase relates to the two key concepts, which are "cross-organizational process" and "process mining". For the former concept, we consider two synonyms, i.e., "supply chain process" and "inter-organizational process". Further, the latter concept was split up into "process mining" and "workflow mining". Finally, because the early papers in these fields did not always use the more modern term "mining", we also included descriptions of these techniques that use on the one hand the words "process" or "workflow", and on the other hand one of these terms: "event log", "log file", or "audit trail". This way, the final search phrase is as follows: ("supply chain" OR "cross-organization" OR inter-organization) AND ("process mining" OR "workflow mining" OR ((process OR workflow) AND ("event log" OR "log file" OR "audit trail"))).
Databases. The search phrase was used to find articles in a set of academic databases. There is no standard set of databases. Inspired by the guidelines and the examples of [14,15], we selected the five databases presented in Table 1. This approach of selecting multiple databases is proposed to improve the completeness of the study. Note that the selected databases are academic databases, in line with the research question.
Inclusion and exclusion criteria. Because the automated search process returns too many papers that are not relevant, it is followed by a manual selection process that aims to eliminate these unrelated works from the paper set. This elimination happens according to predefined inclusion and exclusion criteria. For practical reasons, and according to the guidelines of [13], this process is performed in two phases. First, the inclusion and exclusion criteria are assessed based on only the title, abstract, and keywords. In case of doubt, the paper is kept in the paper set so that it can be processed further in the next step. Secondly, the criteria for the remaining papers are assessed again based on the full text.
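The boolean search phrase defined in this subsection can be read as a filter over the searchable text of a paper. The sketch below encodes it in Python under the assumption of plain case-insensitive substring matching; the function name `matches_search_phrase` is hypothetical, and the databases that were actually searched apply their own query syntax and matching rules, which this example does not reproduce.

```python
def matches_search_phrase(text):
    """Evaluate the boolean search phrase on one text, e.g. title plus abstract."""
    t = text.lower()

    def has(term):
        # Assumption: plain substring matching, so "cross-organization"
        # also matches "cross-organizational".
        return term in t

    scope = (
        has("supply chain")
        or has("cross-organization")
        or has("inter-organization")
    )
    log_based = (has("process") or has("workflow")) and (
        has("event log") or has("log file") or has("audit trail")
    )
    technique = has("process mining") or has("workflow mining") or log_based
    return scope and technique


# Hypothetical titles used only to exercise the filter.
for title in [
    "Cross-organizational process mining in practice",
    "Analyzing workflow audit trails in a supply chain",
    "Process mining in hospitals",
]:
    print(title, "->", matches_search_phrase(title))
```

A text has to satisfy both the scope part (supply chain or cross-/inter-organizational) and the technique part (process or workflow mining, or a log-based description) to be retrieved.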
The applied inclusion criteria (IC) and exclusion criteria (EC) are:
IC 1. Cross-organizational process model. The study needs to discuss research about process models that describe processes crossing the boundaries of a single organization, spanning two or more organizations within a supply chain.
IC 2. Process mining. The study needs to discuss research about techniques that aim to automatically construct, complete, or analyze process models from historical process execution data. The techniques should be data-driven: for example, but not limited to, techniques that start from event logs.
EC 1. Other models. Studies of other types of models than business process models are excluded. For example, we exclude studies about other types of process models (such as software process models) and general conceptual models (such as data models, business models, and value models).
EC 2. Management. Studies that discuss other aspects, tools, or techniques than modeling are excluded. For example, we explicitly exclude studies about business process management and supply chain management.
EC 3. Technology. Studies that discuss general technical aspects of collaborating partners are excluded. For example, we exclude studies that discuss supply chain software or technologies for data exchange between partners, if they do not relate their findings to a cross-organizational process (model).
Snowballing. To maximize the completeness of the paper set, as proposed by [13], we applied a technique called snowballing. More specifically, we applied backward snowballing, meaning that all papers referenced by the papers in the set so far were also considered. By applying the same inclusion and exclusion criteria, the paper set was extended in two steps (first considering only title, abstract, and keywords, and later also the full text of the referenced papers).
After removal of duplicates, 41 and 24 papers respectively were excluded. Next, based on the application of the selection criteria on the title, abstract, keywords, and conclusion, the paper set was further reduced to 52 and 28 papers respectively. After downloading the full papers from Springer and after assessing the selection criteria on the full texts, the resulting paper set contained 17 unique papers. The application of the snowballing technique added 679 papers to the set, which were reduced to 41 after assessing the title and finally to 4 additional papers after evaluating the full text. This way, the final paper set contains 21 unique articles about cross-organizational process discovery. An overview of these papers is presented in Table 2.
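The backward snowballing step with its two screening phases can be sketched as follows. This is an illustrative outline only: the paper identifiers, the citation map, and the two screening predicates are hypothetical stand-ins for the manual assessment against the inclusion and exclusion criteria.

```python
def backward_snowball(selected, references, passes_abstract, passes_full_text):
    """One round of backward snowballing over an already selected paper set.

    selected:          set of paper ids selected so far
    references:        dict mapping a paper id to the ids it cites
    passes_abstract:   hypothetical predicate for the first screening phase
    passes_full_text:  hypothetical predicate for the second screening phase
    """
    candidates = set()
    for paper in selected:
        candidates.update(references.get(paper, ()))
    candidates -= selected  # papers already in the set are not reconsidered

    after_abstract = {p for p in candidates if passes_abstract(p)}
    added = {p for p in after_abstract if passes_full_text(p)}
    return candidates, after_abstract, added


# Toy data, not taken from the review.
selected = {"A", "B"}
references = {"A": ["B", "C", "D"], "B": ["D", "E"]}

candidates, after_abstract, added = backward_snowball(
    selected,
    references,
    passes_abstract=lambda p: p in {"C", "D"},
    passes_full_text=lambda p: p == "D",
)
print(len(candidates), len(after_abstract), len(added))
print(len(selected | added))  # size of the extended paper set
```

With the counts reported above, the same funnel reads 679 referenced papers, 41 remaining after the first screening, and 4 added after the full-text assessment, so that the 17 papers from the database search grow to 17 + 4 = 21.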

RESULTS AND ANALYSIS
This section describes the results of our analysis on the final paper set. First, an overview of the number of papers and the geographical spread of the first authors is presented to provide a context for further analysis. Then, the papers are classified and discussed based on two frameworks (i.e., theoretical contribution types and practical contribution types). Lastly, we provide a less systematic overview of the field and of the technologies used to extract the necessary data.

Analysis of the meta-data
Figure 2 shows the number of papers that discuss process mining techniques in the context of supply chains, according to the selected paper set. Research into supply chain process mining does not appear to be abundant, although it appears to have accelerated since 2009. Further, the primary affiliation countries of the first authors are presented in Figure 3. From this figure, it can be concluded that supply chain process mining research is dominated by two countries: China and the Netherlands. Together they account for 12 of the 21 papers (57%) in the literature set.
Figure 3. The number of papers per country

Classification of the Artifact framework
Hevner et al. define four kinds of artifacts that can be developed and investigated by design science research [36]. We refer to this classification as the Artifact framework, presented in Table 3. According to the framework, products of design science research can be constructs (languages, terminology, definitions, and measures), models (abstractions and representations), methods (approaches and algorithms), or instantiations (prototype and implemented systems) [36]. For each paper of the paper set, it was determined which artifacts and contributions are proposed, and for each artifact, the type was derived from Table 3. A difference was made between newly proposed artifacts that can be considered the contributions proposed in the paper (presented in Table 4) and potential existing artifacts that were used for the research described in the paper (not represented in Table 4).
As can be noted in Table 4, the early contributions mainly focused on terminology and informal approaches to represent and analyze supply chain processes. Only later, from 2009 on, were concrete algorithms and techniques developed for (semi-)automated analysis based on historical process data (i.e., process mining techniques). The papers proposing an algorithm have an underlined x in the column labeled A3. It can be seen that 13 of the 21 papers (62%) propose a process mining (support) algorithm, which corresponds to 13 of the 17 papers (76%) published from 2009 onwards. Exactly 9 of these 13 algorithm-proposing papers (69%) also propose an implementation of the algorithm. The majority of the algorithms appear to focus on (support of) the integration of decentralized process data in a single event log to enable the execution of traditional process mining techniques on supply chain process data. The type of proposed process mining techniques (e.g., data preparation, discovery, conformance checking) is investigated further in the next section.

Classification in the Process Mining framework
The second framework was the Process Mining framework proposed by Van der Aalst [37], shown in Table 5. It describes the different types of techniques in the process mining field. The activities can be grouped into data preparation (F0), process specification in the form of models (F1, F2, F3), process auditing (F4, F5, F6, F7), and process navigation (F8, F9, F10).
F6 | Compare | Detect differences between as-is and to-be process models
F7 | Promote | Promote differences between as-is and to-be models to the to-be model
F8 | Explore | Visualize running process instances on as-is or to-be process models
F9 | Predict | Predict final properties of running process instances based on event logs
F10 | Recommend | Recommend next actions of running process instances based on event logs
For each paper in the set, it was determined which activities are supported by the proposed contributions. A distinction was made between direct support, being concepts about, models of, methods for, and instantiations for these process mining activities (shown as 'D' in the columns of Table 6), and indirect support, being preparatory artifacts (shown as 'I' in Table 6). Further, Table 6 also presents whether and how the proposed artifacts were evaluated. When the value of the contributions was shown with an artificial or simplified example or analysis, this was called demonstration. A more in-depth analysis of a real or at least realistic example was called case study. The term 'empirical' was added when non-trivial statistical techniques were used. Expert interview evaluation means that perception data was also used in the evaluation.
Table 6. Process mining techniques proposed directly (D) or indirectly (I) by the selected papers
Note that proposed algorithms without implementation are also regarded as direct contributions (e.g., [20]).
Note that papers may additionally present analysis techniques that are not included in this framework (e.g., [25]).
It can be noted that the majority of the papers (17 of 21 papers, 81%) focus on data preparation (8 of 21 papers, 38%) and process discovery (12 of 21 papers, 57%). In most cases, they (first) attempt to combine the data of different collaborating partners [19-22,25,28,29,32]. Indeed, when the data of the collaborating partners can be prepared in such a way that they can be combined in a single event log (grouping event data for the same process instance in a single trace), the existing process mining techniques can still be used. This way, no dedicated process mining algorithms or implementations for supply chain process models need to be created, which increases the reusability of the mature and robust existing techniques. This approach also means that many of the papers aim to contribute indirectly to all other types of process mining (i.e., F1-F10), which was not indicated in the table to avoid overload.
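The idea of combining partner data in a single event log, with one trace per process instance, can be sketched as follows. The example assumes that both partners already share a case identifier (here a purchase order number) and that their timestamps are comparable; in practice, establishing exactly this correlation is the difficult part addressed by the data preparation papers. All field names, values, and the function name `merge_event_logs` are hypothetical.

```python
from datetime import datetime


def merge_event_logs(*partner_logs):
    """Merge per-partner event lists into one log with one trace per case."""
    traces = {}
    for events in partner_logs:
        for event in events:
            traces.setdefault(event["case"], []).append(event)
    for trace in traces.values():
        trace.sort(key=lambda e: e["time"])  # order events within each trace
    return traces


# Hypothetical events; a shared order number serves as the case identifier.
supplier = [
    {"case": "PO-7", "activity": "confirm order",
     "time": datetime(2015, 3, 2, 9, 0), "org": "supplier"},
    {"case": "PO-7", "activity": "ship goods",
     "time": datetime(2015, 3, 4, 14, 0), "org": "supplier"},
]
buyer = [
    {"case": "PO-7", "activity": "place order",
     "time": datetime(2015, 3, 1, 16, 30), "org": "buyer"},
    {"case": "PO-7", "activity": "receive goods",
     "time": datetime(2015, 3, 6, 11, 0), "org": "buyer"},
]

merged = merge_event_logs(supplier, buyer)
print([e["activity"] for e in merged["PO-7"]])
# ['place order', 'confirm order', 'ship goods', 'receive goods']
```

Once the events of all partners sit in one trace per case, an existing single-organization discovery technique can be applied to the merged log without modification.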
Further, it appears that (relatively limited) demonstrations and (extended) case studies are the preferred forms of evaluation. More comprehensive empirical evaluations, such as multiple-case studies, comparisons of multiple techniques, or discussions of user perception, are hardly applied in this field. This may be due to the complex setting, in which multiple organizations are involved by definition, making it difficult for researchers to access appropriate data for evaluation purposes (both regarding quantity and quality of data).

Diving deeper
Apart from classifying the papers, we also analyzed their contents less systematically to reveal common strategies and approaches. Table 7 presents an overview of the investigated topics in the supply chain process mining field, according to the paper set.
[31], [20], [18]
Process mining for predictive analytics [29]
The concept of a virtual organization [24]
One common viewpoint on data-driven process analysis (i.e., process mining) in supply chains is that organizations have data that they want to remain private and other data that can be made public (e.g., [7,9,17,18,32,34]). Similarly, these authors typically distinguish between a private view on an organization's part of the supply-chain-wide process and a public view on the process. They consider an approach in which the public data is shared (with each other or with a trusted third party) to construct an overall process model, after which each organization can link its private data or model to this public process model to complement it with the details of its internal business processes. For example, Liu et al. [7] propose a method that includes three steps: (1) each organization discovers its private and public business process models from its event logs, (2) a trusted third-party middleware takes the public process models as input and generates cooperative public process model fragments of each organization, and (3) each organization combines its private business process model with the public fragments relevant to it to obtain the organization-specific cross-organization cooperative business process model.
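The public/private split and the three steps described above can be illustrated with a deliberately simplified sketch. It is not the method of Liu et al. [7]: process models are reduced here to sets of directly-follows pairs, and the activity names, the hand-over links between partners, and all function names are assumptions introduced for illustration.

```python
def directly_follows(trace):
    """Set of (a, b) pairs where b immediately follows a in the trace."""
    return set(zip(trace, trace[1:]))


def public_fragment(trace, public):
    """Step 1 (simplified): relations among the activities an organization shares."""
    return directly_follows([a for a in trace if a in public])


def third_party_merge(fragments, handovers):
    """Step 2 (simplified): combine public fragments with cross-partner links."""
    model = set(handovers)
    for pairs in fragments.values():
        model |= pairs
    return model


def organization_view(org, private_pairs, fragments, handovers):
    """Step 3 (simplified): own detailed relations plus the others' public parts."""
    others = {o: p for o, p in fragments.items() if o != org}
    return private_pairs | third_party_merge(others, handovers)


# Hypothetical traces and public activity sets.
buyer_trace = ["create request", "approve request", "send order",
               "receive goods", "check quality", "pay invoice"]
supplier_trace = ["receive order", "pick items", "ship goods", "send invoice"]
fragments = {
    "buyer": public_fragment(
        buyer_trace, {"send order", "receive goods", "pay invoice"}),
    "supplier": public_fragment(
        supplier_trace, {"receive order", "ship goods", "send invoice"}),
}
# Assumed message links between the partners' public activities.
handovers = {("send order", "receive order"),
             ("ship goods", "receive goods"),
             ("send invoice", "pay invoice")}

public_model = third_party_merge(fragments, handovers)
buyer_view = organization_view(
    "buyer", directly_follows(buyer_trace), fragments, handovers)
print(len(public_model), len(buyer_view))
```

In this toy setting the buyer's view contains its own internal steps (such as checking quality) next to the supplier's public steps, while the supplier's private step of picking items never leaves the supplier.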
Triggered by Table 7, another angle that we investigated further is the technological aspect of the papers: where does the historical process data that is used to construct event logs come from? Table 8 provides an overview. Many papers seem to focus on transactional data used for the physical or virtual exchange of goods (e.g., RFID), services (e.g., SaaS), or information (e.g., EDI).

CONCLUSION
The contribution of this paper is a structured overview of the current academic literature about supply chain process mining. The common practical approach appears to be to merge the data of the different partners in the chain into a single event log, such that existing process mining techniques can be utilized. Furthermore, in the context of privacy concerns, a distinction is made between the public and the private data of the partners. The public data is used, for example by a trusted third party, to produce a supply-chain-wide process model, after which each organization can map its private data onto this public model.
The studied paper set of 21 papers shows that it took until 2009 before considerable attention was given to supply chains in the process mining field. Regarding the affiliation country of the first author, China and the Netherlands dominate the research contributions. Fewer than 20 of the 21 papers discuss some formal or informal process mining approach; 13 papers propose a particular process mining algorithm, and nine papers also present an implementation of the algorithm available for download. The majority of papers focus on data preparation (8 papers) and process discovery (12 papers), and most papers use a (limited) demonstration (6 papers) or an (extended) case study (10 papers) to evaluate their contribution.
Although this Systematic Literature Review shows that the research into supply chain process mining appears to be limited (only 21 papers were found), we believe that the results are useful. The research in this paper addresses the need for an overview of the state of the art expressed by both practitioners and researchers [16]. Furthermore, it can drive future research. Whereas this study is limited to revealing the current academic literature, future work may focus on missing academic knowledge, by investigating whether the literature gaps that can be found in this paper are in fact also research gaps. Indeed, from Table 4, Table 6, Table 7, and Table 8, it can be derived which aspects are understudied, but further research is needed to investigate whether this is a problem or not. Subsequently, the discovered research gaps can be addressed appropriately in order to advance both the knowledge and the practice of process mining in supply chains.

Figure 1. Overview of the search and selection process

Table 1. Academic Databases

Table 2. Final Paper Set
Van der Aalst, 2011 | Intra- and inter-organizational process mining: Discovering processes within and between organizations | [27]
Buijs, et al., 2012 | Towards cross-organizational process mining in collections of process models and their executions | [28]
Engel, et al., 2012 | Mining inter-organizational business process models from EDI messages: A case study from the automotive sector | [29]
Rozsnyai, et al., 2012 | Business process insight: An approach and platform for the discovery and analysis of end-to-end business processes | [30]
Azzini, et al., 2013 | Consistent process mining over big data triple stores | [31]
Comuzzi, et al., 2013 | Optimized cross-organizational business process monitoring: Design and enactment | [32]
Zeng, et al., 2013 | Cross-organizational collaborative workflow mining from a multi-source log | [9]
Bernardi, et al., 2014 | Discovering cross-organizational business rules from the cloud

Table 4. Artifacts and Contributions of the Selected Papers
An approach to discover an inter-organizational process model, and a correlation algorithm to match EDI messages to an instance to build an event log, and a software implementation | [29]
Rozsnyai, et al., 2012 | x x | An approach and an algorithm to discover correlations between distributed process instance data and a software implementation linking the data correlation with process mining techniques | [30]
Azzini, et al., 2013 | x | An approach and algorithm for semantic lifting of dispersed process data (aggregating events) using semantic data mismatch detection and map reduction techniques | [31]
Comuzzi, et al., 2013 | xx x | An approach, based on formal definitions and an algorithm, to monitor cross-organizational process infrastructures, with a software implementation
Process Mining in Supply Chains: A Systematic …. (Bambang Jokonowo) 4631

Table 7. Primary Research Focuses in the Supply Chain Process Mining Literature

Table 8. Technological base of the presented techniques