Intelligent machine for ontological representation of massive pedagogical knowledge based on neural networks

Received Feb 24, 2020. Revised Aug 7, 2020. Accepted Nov 12, 2020.

Higher education is increasingly integrating free learning management systems (LMS). The main objective underlying the integration of such systems is the automation of online educational processes for the benefit of all the actors who use them. These processes are developed through the integration and implementation of learning scenarios similar to those of traditional learning systems. LMS produce big data traces emerging from actors' interactions in online learning. However, we note the absence of instruments adequate for representing the knowledge extracted from these big traces. In this context, the research at hand aims to transform the big data produced by interactions into big knowledge that can be used in MOOCs by actors at a given learning level within a given learning domain, be it formal or informal. To achieve this objective, we adopt ontological approaches, namely mapping, learning, and enrichment, in addition to artificial intelligence approaches relevant to our research context. In this paper, we propose three interconnected algorithms for a better ontological representation of learning actors' knowledge, relying heavily on artificial intelligence approaches throughout the stages of this work. To verify the validity of our contribution, we implement an experiment on an example of knowledge sources.


INTRODUCTION
Cloud computing is an immense infrastructure based on different models for providing various software and hardware services. The cloud computing paradigm has resulted from the integration of online systems. Notably, tertiary education is currently incorporating technological solutions (LMS) into education, that is, computerizing educational processes for the benefit of current learners and future learning actors alike. Recent research [1] has shown the vital importance of LMS in many educational establishments. Many learning activities can be incorporated in the form of learning scenarios in MOOC systems, which generate massive traces through actors' usage. That is, in the course of learning, such systems produce big data from the learning actors' interactions (including their effects and productions). These big data were dealt with in depth in our big data preprocessing work [2]. Based on the massive data produced by interactions, we proposed, as a first step, a machine learning system for their parallel preprocessing [2].

Ontological engineering
Knowledge mapping represents an important stage in actors' knowledge representation. Given this fact, we have made an in-depth analysis of the best methods used by researchers in ontological mapping. The main objectives of this stage are:
− To establish correspondences between database elements and ontological elements, on the one hand, and between ontology sources and ontological elements, on the other hand.
− To establish correspondences between two or more ontologies associated with our knowledge representation system.
In fact, some researchers use linked or open data for the representation of knowledge based on the SPARQL language [14], whereas others use ontological representation [15] as a solution to problems relating to knowledge mapping. Knowledge identification is achieved through two types of methods: unsupervised methods [14] and supervised web methods processed by experts in the field [16]. In the second case, technical teams work on the extraction of knowledge produced by actors in the field of education. Currently, through systems based on web technologies, the web domain produces massive data resulting from users' interactions. Numerous researchers have worked for years on integrated web methods to better manipulate these data [17]. Building on their analysis, we have examined linked and open data methods that facilitate data access. As a case in point, the researcher in [18] used such methods to facilitate access to cloud data. Linked and open data are based on the SPARQL standard [19], which offers a uniform approach for accessing massive data. Put another way, web-based methods, including linked and open data, integrate digital resources or massive data for any area.
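To make the linked and open data idea concrete, the following minimal Python sketch mimics how a SPARQL basic graph pattern selects knowledge from a triple store. The triples, prefixes, and predicate names are invented for illustration and do not come from the cited systems.

```python
# Minimal illustration of SPARQL-style pattern matching over RDF triples.
# All triples and predicate names below are hypothetical examples.

triples = [
    ("actor:alice", "edu:completed", "course:python101"),
    ("actor:alice", "edu:produced", "doc:summary1"),
    ("actor:bob", "edu:completed", "course:python101"),
    ("course:python101", "edu:level", "beginner"),
]

def match(pattern, store):
    """Return variable bindings for one basic graph pattern.
    Pattern terms starting with '?' are variables."""
    results = []
    for s, p, o in store:
        binding = {}
        ok = True
        for term, value in zip(pattern, (s, p, o)):
            if term.startswith("?"):
                binding[term] = value
            elif term != value:
                ok = False
                break
        if ok:
            results.append(binding)
    return results

# Analogue of: SELECT ?who WHERE { ?who edu:completed course:python101 }
bindings = match(("?who", "edu:completed", "course:python101"), triples)
print([b["?who"] for b in bindings])  # ['actor:alice', 'actor:bob']
```

A real system would delegate this matching to a SPARQL endpoint; the sketch only shows the uniform access pattern that makes linked data convenient for massive, heterogeneous sources.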
Several works have been proposed on ontological mapping; among them [20], we find articles proposing solutions for mapping between a dataset and an ontology, on the one hand, and mapping between two or more ontologies, on the other, as mentioned above. Accordingly, the main objective of the mapping function is to establish a correspondence/matching between a given operational data layer and an ontological knowledge representation for any given field (i.e., medicine, education, and so on). This correspondence is used to create an ontology common to knowledge extraction systems for the benefit of actors involved in online learning [7]. However, ontological mapping still encounters problems relating, for instance, to similarity between knowledge items, as introduced in our work [2], which presents a preprocessing machine for the big data produced by interacting actors. The preprocessed data output by our machine are represented in XML and RDF formats. Having reviewed some studies, we noted that ontological mapping between ontologies is performed by one type of methods and tools, whereas mapping between a dataset and an ontology is performed by another. Given this distinction, we have investigated the best mapping methods for either type of input knowledge source. Table 1 details the correspondence between knowledge sources (datasets and ontologies) and output ontologies. Ontological learning: this process aims to fill the proposed output ontology [7] with knowledge extracted from the big data layer [2]. Ontological learning is implemented by some researchers using their own algorithms [21], whereas others use existing algorithms proposed by the scientific community.
Table 2 shows the efficiency of some algorithms proposed by researchers who dealt with ontological learning in their scientific work. At this stage, the researchers in [18, 21] proposed ontological learning solutions for integrating the input knowledge sources (i.e., a dataset and an ontology) and the output ontology. For instance, in his work, one author undertook an in-depth study of ontological matching for knowledge extraction; during the ontological matching stages, he carried out ontological alignment between one or more knowledge sources. Having analyzed these works, we noted that they exhibit shortcomings in terms of the approaches used. Our analysis of the results of recently proposed works [3, 22] revealed that the best ontological learning methods are built on artificial intelligence (AI) approaches, owing to their contribution to knowledge extraction.
Ontological enrichment: as the literature demonstrates, many methods of ontological enrichment have been introduced. Their main objective is to identify novel knowledge sources for a given field, including knowledge relevant to education (the context of our study). Other studies [23] have proposed methods for handling the correspondence between the initially created ontologies and the ones identified by our machine, in order to enrich our ontological knowledge representation system for educational systems. The field of education produces knowledge through e-learning systems; this knowledge falls into different categories of knowledge sources and is found in different places. Researchers work on two phases of such knowledge identification: the first concerns localization, while the second involves the integration of intelligent agents for a better identification of delocalized knowledge [24]. Other researchers have offered web services to identify new distant knowledge in the field.

Knowledge extraction based on neural networks
Artificial intelligence (AI) approaches have been introduced into many areas of scientific research. The sciences subsumed within AI represent trends that will dominate the coming years owing to their effectiveness in the development of many sectors. More recently, researchers have proposed methods for deeper investigation and processing in the phases of carrying out their work in various fields, among which we may refer to robotics. For example, the authors in [25, 26] made a comparative study of works relating to this vibrant field and its outcomes and implications for industry. Researchers have also undertaken comparative studies to build agent-based models. The majority of the pinpointed ideas and concepts have an automatic aspect, in that they tackle methods proposed for handling massive data integration in a given area. Among these concepts, we may refer to artificial neural networks (ANN) and related approaches, which can be incorporated into various fields, especially education, where they can be used for parallel processing of big data as input and classification of knowledge as output. Among these works, we cite [27].
WORD2VEC: this model represents word embedding, which allows a vectorial representation built from knowledge sources. Word2vec can be used in two different ways: the continuous bag of words model (CBOW) and skip-gram. On the one hand, the vectorial representation of words with CBOW is obtained by predicting a word from its neighboring words; in this case, the "bag of words" notion means that word order has no impact. On the other hand, skip-gram predicts the context words from the central word; in this case, the heavily weighted words are the neighboring ones. The computation of center words is carried out by artificial neural networks, which simplify the data processing. Drawing on our analysis of works integrating word2vec approaches, we noted that word2vec is effective for knowledge extraction from textual documents [13, 28]; in other works, it was used for knowledge extraction from structured documents. However, we equally noticed that such works have shortcomings in terms of the nature and types of their knowledge sources.
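The skip-gram prediction described above can be sketched numerically: the probability of a context word given the center word is a softmax over the inner products of their vectors. The toy vectors below are invented values, not trained embeddings.

```python
import math

# Toy sketch of the skip-gram conditional probability: given a center word
# vector v_t and context ("output") vectors u_k, the probability of context
# word j is a softmax over the inner products u_k . v_t.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def skipgram_prob(center_vec, context_vecs, j):
    """P(w_j | w_t) = exp(u_j . v_t) / sum_k exp(u_k . v_t)."""
    scores = [dot(u, center_vec) for u in context_vecs]
    m = max(scores)                       # stabilize the softmax numerically
    exps = [math.exp(s - m) for s in scores]
    return exps[j] / sum(exps)

v_t = [0.2, -0.1, 0.4]                                        # center word vector
U = [[0.3, 0.0, 0.5], [-0.2, 0.1, 0.0], [0.1, 0.4, -0.3]]     # context vectors
probs = [skipgram_prob(v_t, U, j) for j in range(3)]
print(round(sum(probs), 6))  # 1.0 — a valid probability distribution
```

Training adjusts the vectors so that true context words receive high probability; libraries such as gensim implement this at scale, but the formula is exactly the one sketched here.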

THE METHODOLOGY
In order to achieve our main objective, which has already been laid out above, and which revolves around the representation of massive knowledge produced by learning actors, the present work adopts the methodology presented in Figure 1.

Figure 1. The methodology adopted for ontological knowledge representation in MOOCs
Based on this methodology, we adopt three algorithms: the first for ontological mapping, the second for ontological learning, and the third for ontological enrichment. Throughout the realization of these three algorithms, we use artificial intelligence approaches for the extraction and parallel processing of knowledge received from massive data as input. The three elements are introduced as follows:
− The first algorithm: the system receives massive data, preprocessed by generic interfaces and adapted to the educational field, from the operational massive data layer [2]. Worth noting is that the system involves two types of interfaces: a database interface and an ontological one. At this stage, our algorithm performs the matching between database attributes and ontological attributes, which have already been proposed in [7], that is, the conceptual framework used for knowledge extraction from learning actors.
− The second algorithm: the system incorporates ontological learning approaches during knowledge extraction from the database layer into the ontological layer. Hence, the system synchronizes the big data induced by the operational big data layer (SQL and NoSQL) and the ontological layer on the other interface. At this stage, the algorithm proposes an approach based on artificial neural network techniques to better identify new sources of local or distant knowledge, on the national or international scale; it then feeds the extracted knowledge into our representation system. In parallel, it verifies the quality of the acquired knowledge as well as its impact on the actors' profiles in the educational field.
− The third algorithm: the system makes use of methods for identifying knowledge in local MOOCs or in other Moroccan universities' MOOCs, then adapts it to our ontological knowledge representation system [2].
The algorithm is based on Web intelligent methods configured for the identification of new actors' knowledge sources; then, it adapts them to the general context of our knowledge extraction system. The objective of the abovementioned algorithm is to create an intelligent knowledge representation system, while incorporating the best methods of massive knowledge identification and extraction depending on the results induced by the learning' actors' interactions in a MOOC system.

MASSIVE DATA PREPROCESSING
In this stage, we proposed a MapReduce machine learning system based on the big data produced by MOOCs [2] to perform parallel preprocessing of massive data of various structures. The preprocessing of knowledge sources is achieved by analyzing the results induced by the learning actors' interactions in MOOCs. This system [2] processes SQL and NoSQL massive data with an algorithm based on approaches associated with the Hadoop ecosystem. It delivers massive semi-structured data in XML and RDF formats to the ontological knowledge representation system, the subject matter of this paper.
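The MapReduce style of preprocessing can be illustrated with a small self-contained sketch: interaction traces are mapped to key-value pairs, shuffled by key, then reduced to per-actor aggregates. The trace records and field names are invented examples, not the actual EMI trace schema.

```python
from collections import defaultdict
from functools import reduce

# Toy MapReduce-style sketch of trace preprocessing: raw interaction traces
# are mapped to (actor, 1) pairs, shuffled by key, then reduced to per-actor
# activity counts. Records below are hypothetical.

traces = [
    {"actor": "a1", "action": "read_doc"},
    {"actor": "a2", "action": "answer_quiz"},
    {"actor": "a1", "action": "copy_file"},
    {"actor": "a1", "action": "read_doc"},
]

def map_phase(records):
    """Emit one (key, value) pair per trace record."""
    return [(r["actor"], 1) for r in records]

def shuffle(pairs):
    """Group emitted values by key, as Hadoop does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Aggregate each key's values independently (hence parallelizable)."""
    return {k: reduce(lambda x, y: x + y, vs) for k, vs in groups.items()}

counts = reduce_phase(shuffle(map_phase(traces)))
print(counts)  # {'a1': 3, 'a2': 1}
```

In the real system the map and reduce phases run in parallel across the Hadoop cluster; the sketch only shows the dataflow shape.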

PROPOSED MACHINE
In this diagram, we propose a global architecture of our ontological knowledge representation system for learning actors, based on the big data resulting from their interactions in online learning. It should be noted that our system is open to all sorts of knowledge sources at the national level; it suffices to link the sources to the system. Figure 2 presents the elements required to produce the semantic representation of the actors' knowledge.

Ontological machine for knowledge representation in MOOC
This system performs the ontological representation of the big data preprocessed in [2]. The representation begins with the ontological mapping algorithm (algorithm 1) between the input dataset and/or ontology knowledge sources and the output ontologies. Then, the ontological learning algorithm (algorithm 2) feeds the knowledge sources extracted from online learning into the output ontologies. This process transforms massive data into massive ontological knowledge, making use of artificial neural networks throughout the stages of ontological knowledge construction.
Finally, in an important later stage, the ontological enrichment algorithm (algorithm 3) detects new knowledge sources in MOOCs. This algorithm represents one stage among others proposed in our framework [7]. At this point, we introduce a definition of our system variables. Definition 1: for each ontology given as input to the system, let wt and wj be the central target word and a context word, indexed by t and j in the dictionary, with vectors vt and uj respectively. The conditional probability of generating the context word given the central target word is obtained by a softmax over the inner product of the vectors: P(wj | wt) = exp(uj · vt) / Σk exp(uk · vt).

The contribution of this machine

Ontological mapping layer
In this stage, our system uses the approach most adequate for establishing correspondences between database attributes (SQL and NoSQL) and ontological attributes (OWL, RDF). Our investigation of the literature reveals that researchers have proposed many approaches and methods dealing with massive data integration [29]. Indeed, the relevance of our approach lies in incorporating the added value of big-data-based linked and open data through web knowledge integration. Hence, this algorithm offers generic solutions for the mapping between the elements of an RDBMS and those of RDF and OWL. This linkage is introduced by the conceptual modelling proposed in our ontology-based conceptual framework [7]. Accordingly, we represent a structured and/or semi-structured dataset in online learning systems as:
− DS (Attribute, type, value)
and the OWL ontological representation through the following attributes:
− ON (class, R, Value)
The objective at this stage lies in the transformation of the dataset into an ontology. Online learning systems produce big data in several structures, which are preprocessed by our machine learning system [2]; that work takes big data of any structure as input and applies many methods to analyze and preprocess them. Our aim here is to perform an automatic mapping between database attributes and ontological ones, respecting the standards of ontological mapping [18]. At a given stage, ontological mapping uses linked and open data to incorporate knowledge from various sources based on the SPARQL language, a query language for semantic web data access. The mapping algorithm: the mapping functions dealt with in various works are of different sorts.
Drawing on such works, we determine mapping functions between the knowledge sources (a dataset and/or ontology input) and the ontology output in our knowledge extraction system [2]. The following algorithm shows, in detail, all the steps to follow:
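As the algorithm listing itself is given separately, the core DS-to-ON transformation can be sketched as follows. The correspondence table entries (attribute and class names) are hypothetical examples, not the framework's actual vocabulary.

```python
# Illustrative sketch of the mapping step: dataset rows DS(attribute, type,
# value) are matched to ontology elements ON(class, relation, value) via a
# correspondence table. All names below are invented for illustration.

CORRESPONDENCE = {
    # database attribute -> (ontology class, relation)
    "learner_name": ("Actor", "hasName"),
    "course_title": ("Course", "hasTitle"),
    "score":        ("Evaluation", "hasScore"),
}

def map_dataset_to_ontology(rows):
    """Transform DS(attribute, type, value) tuples into ON(class, R, value)."""
    triples, unmapped = [], []
    for attribute, _dtype, value in rows:
        if attribute in CORRESPONDENCE:
            cls, relation = CORRESPONDENCE[attribute]
            triples.append((cls, relation, value))
        else:
            unmapped.append(attribute)   # left for expert review
    return triples, unmapped

rows = [("learner_name", "str", "Alice"), ("score", "int", 17), ("ip", "str", "10.0.0.1")]
triples, unmapped = map_dataset_to_ontology(rows)
print(triples)    # [('Actor', 'hasName', 'Alice'), ('Evaluation', 'hasScore', 17)]
print(unmapped)   # ['ip']
```

Attributes with no correspondence are reported rather than silently dropped, mirroring the supervised/expert-review step mentioned earlier.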

Ontological learning layer
In this stage, our machine parameterizes the knowledge representation system [7], which is based on the ontology resulting from all knowledge sources (NoSQL or SQL) [2] emanating from the operational big data layer. Our objective here is to propose an optimal method for automatically feeding our ontological knowledge base, by detecting the new knowledge produced in our basic system and by measuring the similarity of the received knowledge. This objective rests on existing studies of the best ontological learning methods through the computation of the similarity of acquired knowledge. In order to have an environment capable of systematic and rapid learning, we have made an in-depth analysis of the learning methods in the literature [30]; among them, we identified supervised and unsupervised learning methods [31]. In this algorithm, a range of processes is applied to our system. Following an analysis of the works published in scientific journals in our research area, we observed that all the adopted approaches were based on one or all of the following steps:
− Identification of structured data received from the operational massive data layer.
− Definition of the transmission channel between the massive data and massive knowledge interfaces.
− Definition of the transmission means from preprocessed massive data to the ontological layer.
− Implementation of the ontological learning algorithm.
− Integration of artificial neural network approaches for parallel processing of different knowledge sources.
Loading algorithm: in this algorithm, we modify the data identified by our knowledge extraction system. The goal at this stage is to make our learning system optimal and fast, and thus fully integrated into our target knowledge representation system. The algorithm is defined as follows:
Algorithm input: Mega_data: the preprocessed massive data; Global_Onto: the output ontologies; C: the communication channel.
Output: the space of knowledge relevant for learning actors (SPARQL or nSPARQL).
Start: add the stream received from Mega_data to O_collection while respecting the structure (wi, xi) defined at the start.
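A runnable reading of this loading step is sketched below. The identifiers follow the pseudocode (Mega_data, Global_Onto, O_collection); the word-overlap similarity test is our own illustrative placeholder for the paper's similarity measure.

```python
# Hedged sketch of the loading algorithm: preprocessed streams are fed into
# the output ontology collection, skipping near-duplicate knowledge. The
# similarity function here is a deliberately simple stand-in.

def similarity(knowledge, ontology):
    """Toy similarity: fraction of the item's words already in the ontology."""
    words = set(knowledge.split())
    known = {w for _, _, v in ontology for w in str(v).split()}
    return len(words & known) / len(words) if words else 0.0

def load(mega_data, global_onto, threshold=0.99):
    """Append each received stream to O_collection unless an almost
    identical knowledge item is already represented."""
    o_collection = list(global_onto)
    for stream in mega_data:
        if similarity(stream["value"], o_collection) < threshold:
            o_collection.append((stream["class"], stream["relation"], stream["value"]))
    return o_collection

global_onto = [("Actor", "hasEffect", "read document")]
mega_data = [
    {"class": "Actor", "relation": "hasEffect", "value": "read document"},
    {"class": "Actor", "relation": "hasProduction", "value": "exam answer"},
]
result = load(mega_data, global_onto)
print(len(result))  # 2 — the duplicate stream was not re-added
```

In production, the similarity measure would be replaced by the embedding-based comparison discussed in the word2vec section, and the channel C would carry the streams from the preprocessing layer.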

Ontological enrichment layer
This step gives our knowledge representation system more scalability in case other knowledge sources (SQL and/or NoSQL) are acquired, thus enriching it with each received ontological knowledge item. The system in question provides educational entities with a rich environment for learning actors' knowledge representation. Regarding the enrichment of such a representation, our studies show that the Moroccan government, like other governments, has set the objective of creating reference online learning systems (i.e., MOOCs); our system uses the actors' interactions in these systems as knowledge sources. The integration of MOOC systems is being generalized across all Moroccan universities [22, 32]. Analyzing these systems, we find that they yield massive knowledge that can be integrated into our knowledge representation system. It is therefore necessary to devise means of ontological enrichment that allow discovering the best new knowledge when connected to university MOOCs. This objective is achieved through the integration of artificial-intelligence-based agents for knowledge discovery in the Moroccan MOOC system. In the literature, we have identified numerous research works dealing with ontological enrichment projects, whose basic principle revolves around the ontological representation of educational resources. In addition to these methods, this algorithm aims to identify new knowledge sources and then adapt them to our proposed system [7], which provides a framework for extracting knowledge from learning actors.
Enrichment algorithm: as presented in the foregoing, this algorithm proposes a method for identifying distant knowledge sources, be they ontologies (Oi) or massive data (DSi). The algorithm details the steps to follow for a better ontological enrichment:
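The gist of the enrichment step can be sketched as follows: a remote source Oi or DSi is integrated only when its concepts are sufficiently novel relative to the local ontology. The concept sets and the Jaccard measure are our illustrative choices, not the paper's exact algorithm.

```python
# Hedged sketch of ontological enrichment: merge remote concept sets into
# the local ontology unless they largely duplicate what is already known.

def jaccard(a, b):
    """Similarity of two concept sets: |intersection| / |union|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def enrich(local_concepts, remote_sources, max_overlap=0.8):
    """Integrate concepts from remote sources that are not mere duplicates."""
    enriched = set(local_concepts)
    for source in remote_sources:
        if jaccard(enriched, source) < max_overlap:
            enriched |= source
    return enriched

local = {"Actor", "Course", "Evaluation"}
remote = [
    {"Actor", "Course", "Evaluation"},        # pure duplicate: skipped
    {"Recommendation", "Profile", "Actor"},   # mostly novel: merged
]
print(sorted(enrich(local, remote)))
# ['Actor', 'Course', 'Evaluation', 'Profile', 'Recommendation']
```

A deployed version would fetch the remote sources through the web-service or intelligent-agent mechanisms described above, and would compare concepts semantically rather than by exact name.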

EXPERIMENT
In this experiment, we emphasize the importance of the ontological mapping stage in the experimental evaluation of our machine's functioning. The results of this experiment give us an idea of the machine's real behavior. In this experimental phase, our system takes as input massive data of learning activity traces in XML format (massive data preprocessed by our learning machine [2]; for example, a dataset of 229,022 rows).
The massive data traces were extracted from the online learning system of the Mohammadia School of Engineers (EMI). For the ontological mapping, we use the ontology editor Protégé, which integrates plug-ins for transforming knowledge sources (dataset, XML) into an ontology (OWL) and performs ontological mapping for semi-structured data (XML). Among these plug-ins, we use Ontop, an ontological mapping tool in which the correspondence is specified either by coding or through a graphical assistant. Figure 3 shows in detail the correspondence made in our case.
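To illustrate the kind of correspondence the mapping plug-in establishes, the following stdlib-only sketch turns XML trace elements into ontology-style triples. The XML layout, attribute names, and property names are invented; the real mapping is configured inside the tool, not hand-coded like this.

```python
import xml.etree.ElementTree as ET

# Sketch of the XML-to-ontology correspondence: each trace element yields
# an individual typed as an Actor plus one property triple per action.
# The schema below is a hypothetical example.

XML_TRACES = """
<traces>
  <trace actor="alice" action="consult" resource="course3.pdf"/>
  <trace actor="bob" action="answer" resource="quiz1"/>
</traces>
"""

def xml_to_triples(xml_text):
    root = ET.fromstring(xml_text)
    triples = []
    for trace in root.findall("trace"):
        actor = trace.get("actor")
        triples.append((actor, "rdf:type", "onto:Actor"))
        triples.append((actor, "onto:" + trace.get("action"), trace.get("resource")))
    return triples

for t in xml_to_triples(XML_TRACES):
    print(t)
```

Running the sketch prints four triples: a type assertion and an action property for each of the two trace elements, which is the shape of output the mapping stage delivers to the ontological layer.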
At this step, we obtain as output an ontological knowledge representation of the learning actors. Figure 3 shows the representation class resulting from the application of the ontological mapping function provided by the Ontop plug-in. Following the analysis of the knowledge produced by our system, we observe that the latter sorts out knowledge available for use in both formal and informal online learning sessions, depending on the actors' profiles; accordingly, it can create an environment for developing such profiles. We noted that knowledge falls into two categories: relevant and irrelevant. Relevant knowledge, which is part of the massive knowledge, depends on the flow of the massive data extracted at the level of the operational data layer. Indeed, the extracted data are of different sorts, for example:
− Knowledge related to effects: knowledge reflecting the actors' effects, such as copying a file, consulting documents, and reading documents.
− Knowledge yielded by productions during a learning session: the result of actors' interactions (answering questions, test evaluations, exams, etc.).
− Knowledge produced by virtue of access to files: knowledge relating to the summaries produced from the files consulted during a learning session.
In a nutshell, this system offers a rich environment of both relevant and irrelevant knowledge. Relevant knowledge is within the reach of learning actors for following courses in an online learning system, while irrelevant knowledge is available for studying the actors' problems in online learning sessions. To evaluate this machine, we measured the probabilities of knowledge representation via two approaches: the first based on our machine and the second on prediction.
Table 3 shows the results for the same types of knowledge representation based on the different sources circulating in the EMI e-learning system.
By comparing the predicted results with those of the machine proposed in this paper, we found that the knowledge produced by our machine is relevant compared to the predicted knowledge extracted from an e-learning system, even with diverse knowledge sources. Indeed, our knowledge representation machine outperforms the probability-based analysis of knowledge representation. A second experiment on these knowledge sources was carried out: we keep the knowledge whose similarity between concepts varies between 5 and 10, i.e., with a high value of concept similarity. Applying our machine to these concepts, we observe that the number of knowledge items remains invariable. It should be noted that we retain only the pertinent knowledge circulating in the platform. Tables 3 and 4 give more details on this second experiment. The results displayed in Tables 3 and 4 are very encouraging, and the amount of knowledge produced by our machine is very high compared to the probabilistic approach, exactly as in the first experiment. These results show the effectiveness of our approach to representing the knowledge of learning actors. Having analyzed the obtained results, we can say that our system provides learning actors with a rich environment for developing their profiles. The system offers different kinds of knowledge because it can integrate delocalized knowledge, for instance knowledge produced by external educational entities. It is characterized by:
− Activating the actors' motivation in an online learning session.
− Giving autonomy to the actors.
− Proposing efficiency measures to actors.
− Providing an environment for recommending knowledge to actors.
To sum up, the system offers a knowledge-rich environment which can be used by learning actors in online learning, either for the development of their profiles or for conducting studies on problems relating to the use of learning systems.

CONCLUSION AND PERSPECTIVE
In this work, we surveyed the most recent works dealing with knowledge extraction and representation projects across domains, placing particular importance on knowledge representation projects in the domain of education. We addressed the issue of extracting actors' knowledge from their interactions in an online learning session, and for this purpose identified the most appropriate knowledge representation methods in this area. To achieve our main objective, we performed this extraction and representation from various sources based on three basic algorithms. First, the mapping algorithm performs ontological mapping between a knowledge source (dataset and/or ontology) and an output ontology representing the actors' knowledge; this is done through several transformations covering the entire scope of the objective of this stage. Second, the learning algorithm deals with the ontological feeding of our output ontology; this process determines the transmission channel carrying the massive preprocessed data towards massive ontological knowledge, that is, the output of our knowledge representation system based on e-learning systems. Third, the enrichment algorithm is intelligent in that it identifies the new knowledge around which our global knowledge representation system revolves, then adapts it to the global context of our actors' knowledge representation system.
The overall results of this ontology-based system show the existence of two sorts of knowledge (relevant and irrelevant), which can be used by learning actors in an online learning system such as a MOOC. We believe that this work will lead to a better implementation of our representation system in a real production environment, and thus to a better assessment of the actors' effectiveness, the system's impact on their profiles, and finally the integration of a generic approach to recommending knowledge to actors in an online learning session.