New method for summative evaluation of UML class diagrams based on graph similarities

ABSTRACT


INTRODUCTION
The evaluation of learners occupies a very important place in teaching. The knowledge acquired by students can be tested by the teacher in the form of a summative/certification evaluation, if the objective is, for example, to validate a unit of value, a course, a year or a diploma [1]. Indeed, evaluation is the process by which people make value judgments on a particular subject. In the learning process, this operation, already complicated at its base, takes on even larger proportions. In a teaching and learning community, the most effective assessment is one that encourages and rewards effective teaching practices based on learning outcomes [2, 3].
The assessment of learning allows learners to identify their own strengths and weaknesses, and to determine the types of information they need, essentially to correct their shortcomings [4]. When this assessment is used correctly, students learn that it is possible to carry out a self-assessment in order to improve their performance throughout their lives [5]. In all existing education systems, assessment remains the only educational tool that validates the achievements of students before they access the following learning subject [6]. Although the evaluation process is very complicated at the outset, this operation becomes even more tedious for the teacher when it comes to evaluating the learner's know-how on complex systems [7]. The difficulty of this task increases further as the number of students increases, which is always the case in higher education.
In this context, this article is a contribution to research efforts on improving the evaluation process for both the teacher and the students. The problem posed is how we can facilitate the task of correction. The correction is built here from the violation of a constraint (a mistake made by the learner). This environment has the advantage of allowing the text of the problem to be manipulated throughout the activity, but it forces the elements to be edited against the statement. Indeed, the learner has no real opportunity to represent elements not explicitly specified in the statement. The teacher can add new exercises by defining the statement and a corresponding ideal solution in a dedicated teacher interface. The use of the environment is restricted to novices, and the authors advocate not introducing implicit elements into the statement and adapting the statements to contain as little ambiguity as possible.

Diagram
The Diagram environment is designed to lead the learner, through interaction, to mobilize the three functions of metacognitive regulation and thus to facilitate the acquisition of object-oriented modeling concepts by fostering the emergence of instrumented action schemes to perform the prescribed task effectively [21]. The Diagram environment includes a subset of the features of traditional UML editors. It provides only the graphical elements needed to build a UML class diagram and simplifies the editing of the characteristics of the different elements. In addition, Diagram offers the opportunity to work simultaneously with the statement (describing the specifications of the exercise to be modeled) and with the UML class diagram, which facilitates visual control of the modeling. This feature provides greater opportunities for interaction because the learner can select elements of the statement and change their visual aspect [22].
Diagram offers three types of modeling scenarios. The first is to build a complete diagram from a statement (this is the activity of particular interest to us). The other two scenarios consist of completing a partial diagram and correcting an erroneous diagram. The environment does not correct the learner's errors and is not intended to replace the teacher during the construction of the UML diagram; Diagram assists the learner in his work by encouraging self-correction. The teacher remains present during the modeling activity (conducted in practical work sessions) to provide advice to the learner.

Graph transformation of UML diagram
Various approaches to graph transformation can be found in the literature. We have studied the existing approaches for transforming a class diagram into a graph [23]. Graph transformation can easily model graphical structures and has become a modeling tool often used for complex systems such as the class diagram. The example below represents the transformation of a class diagram into a directed, labeled graph in which the edges are oriented and may be multiple between the vertices, which are either classes or attributes. The vertices and the edges carry several characteristics. The advantage of this representation is that it reduces a class diagram to its simplest expression.
The representation in the form of a metamodel [24], such as the one defined by Holcher's studies, very precisely describes all the elements of a class diagram and the semantics of these elements. It also allows its structure to be clearly exposed. For example, Figure 1 shows a UML metamodel in which classes, attributes, operations and association ends are specialized named elements. A class can contain attributes and operations, which themselves can carry types. An association end defines the role of the linked class as well as a multiplicity, and an association has two association ends. The advantage of this metamodel is that it conforms to the OMG standard. Its disadvantage is that necessary elements such as visibility, association classes and association types are not represented. Based on this extract from the UML metamodel [25], we can transform a class diagram into a graph as shown in Figure 2. A class is a vertex that has an edge towards each attribute, which is also a vertex and which can be typed. A class also has an association end, a vertex which carries several labels such as the relation type and the multiplicity. This relation is named and links the class to the other class by an aggregation. The inheritance relationship is represented by a labeled edge. This representation clearly expresses the links maintained between the elements and their characteristics: they are made explicit using vertices and edges [26].
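To make this transformation concrete, the following sketch (in Python, with purely illustrative class and field names; it is not the implementation used in this work) encodes a small class diagram as a directed, labeled graph in which classes, attributes and association ends are vertices and labeled edges carry the links between them.

# Illustrative sketch: a class diagram encoded as a directed, labeled graph.
from dataclasses import dataclass, field

@dataclass
class Vertex:
    kind: str                    # "class", "attribute", "association_end", ...
    labels: dict = field(default_factory=dict)

@dataclass
class Edge:
    source: str
    target: str
    label: str                   # "has_attribute", "association", "inherits", ...

class UMLGraph:
    def __init__(self):
        self.vertices = {}       # vertex name -> Vertex
        self.edges = []          # list of Edge

    def add_vertex(self, name, kind, **labels):
        self.vertices[name] = Vertex(kind, labels)

    def add_edge(self, source, target, label):
        self.edges.append(Edge(source, target, label))

# Hypothetical example: a Library class aggregating Book, Book carrying a title attribute.
g = UMLGraph()
g.add_vertex("Library", "class", visibility="public")
g.add_vertex("Book", "class", visibility="public")
g.add_vertex("title", "attribute", type="String", visibility="private")
g.add_vertex("books_end", "association_end", relation="aggregation", multiplicity="0..*")
g.add_edge("Book", "title", "has_attribute")
g.add_edge("Library", "books_end", "association")
g.add_edge("books_end", "Book", "targets")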

MEASURING SIMILARITY AND MATCHING UML GRAPHS
Our domain of application is UML class diagrams. We define a similarity measure between class diagrams transformed into UML graphs. We saw in the previous section that a class diagram can be represented by a UML graph. Our main objective is to compare the class diagrams produced by the students, transformed into UML graphs, with the diagrams of the teacher. For this, we wish to define a similarity function able to produce the correspondences, the differences and the detected errors between these graphs [27]. To meet these different objectives, we studied the comparison of graphs using graph matching techniques and node similarity measures. We therefore build on this existing work on graph similarity measures to construct our own method.

Matching approaches to graphs
Different matching approaches have been defined and applied: graph isomorphism [28], which checks whether two graphs are structurally identical; subgraph isomorphism [29], which checks whether one graph is included in another; the search for a maximum common subgraph [30]; and the computation of the graph edit distance [31]. These matching problems are NP-complete or NP-hard, with the exception of graph isomorphism, whose exact complexity is not clearly established. We have therefore studied another technique, which consists in implementing a similarity measure and searching for a matching [32].
We focused on vertex- and edge-level approaches [33]. The comparison of the elements of the graphs is based in particular on the evaluation of their similarity or their differences; it then consists in identifying and qualifying their common points. This study proposes a comparison of two graphs in which each vertex and edge of one graph is paired with several vertices and edges of the other graph. The matching of the vertices is defined by the calculation of the similarity measure: the couples that have maximum similarity are selected and stored in a correspondence matrix, as shown in Figure 3. We have defined our method as a matching system that follows three sequential steps. The first is a preprocessing step for the input diagrams, in which each class diagram is transformed into a graph. The second is the matching process, which calculates a similarity for each pair of elements. The third returns a formative evaluation of the paired elements, with a list of differences and errors, and a summative evaluation to rank the compared diagrams.
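As an illustration of this matching step, the following hedged sketch fills a similarity matrix for every pair of vertices of the two UML graphs and then pairs each vertex with its most similar counterpart; the similarity function and the threshold are placeholders for the measures defined in the next sections.

# Sketch of the matching step: fill a similarity matrix for every pair of
# vertices, then keep, for each vertex, the most similar counterpart.
def build_similarity_matrix(vertices1, vertices2, similarity):
    """similarity(v1, v2) is any comparison function returning a score in [0, 1]."""
    return [[similarity(v1, v2) for v2 in vertices2] for v1 in vertices1]

def greedy_matching(matrix, threshold=0.0):
    """For each row, keep the column of maximal similarity if it reaches the threshold."""
    matches = []
    for i, row in enumerate(matrix):
        if not row:
            continue
        j = max(range(len(row)), key=lambda k: row[k])
        if row[j] >= threshold:
            matches.append((i, j, row[j]))
    return matches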

Similarity measure
The comparison of two graphs is the task of identifying the semantic correspondences between the elements of the two graphs [34]. This correspondence can be quantified in terms of similarity scores, which indicate the proximity of the two graphs. Their similarities and differences must therefore be precisely quantified to obtain an accurate match. The task is time-consuming because comparing two graphs to assess their similarity is a combinatorial problem generally called the graph matching problem [35]. An efficient comparison algorithm is therefore necessary to limit the complexity of the method and to provide an acceptable solution. Indeed, we improve this comparison by introducing one more metric and by revisiting the definitions of existing ones. Each time a couple is compared, a similarity measure is calculated and stored in the similarity matrix [36]. Finally, a mapping is determined, which extracts the correspondences and the differences resulting from the comparison of the two diagrams, as well as the proposed corrections of the errors committed by the students.
The properties which are relevant for the similarity of two nodes of the same type are either given by their attributes (for example the names), or by other nodes in the neighborhood of these nodes. We use a set of comparison functions to determine the similarity between two nodes. These functions compare two properties of the same type belonging to different nodes. They return a value between 0 and 1, a value of 0 means no similarity between the nodes, a value of 1 expresses equality [37].
Obviously, some properties are more relevant to the similarity of nodes than others. Therefore, weights and thresholds, which are external resources used by the matching process, must be assigned to each property. They are all configurable and can be adapted as required. The weights and thresholds should be chosen based on the semantics of the UML graph type and based on what users see as a significant change.
For each specific type of UML graph, a configuration file describes the similarity properties relevant to UML graph elements. Two elements of the same type are compared using a comparison function which returns a value between 0 and 1. The comparison function can define criteria for each type of element. The criteria take into account some parts of the elements depending on their types, and the actual structure of the compared UML graphs. The values of the different criteria are weighted, and the similarity value is calculated by addition, as expressed by the following formula [38]:

Sim(e1, e2) = Σ_{c ∈ C} s_c × compare_c(e1, e2)

where: e1 and e2 are the elements to compare; C is the set of criteria; s_c is the threshold for criterion c; compare_c is the comparison function for criterion c.
The total similarity of two elements is assessed according to the elements they contain. If the elements admit relationships with each other, the evaluation of their similarities can be taken into account during the calculation. The weight values are assigned and weighted by the user. The total similarity is calculated by the following formula:

TotalSim(e1, e2) = Σ_{i ∈ TS} p_i × Sim_i(e1, e2)

where: TS is the set of similarity types; p_i is the weight of similarity type i.
Table 1 presents an example of the thresholds and weights assigned by the user to obtain a syntactic, structural and semantic comparison:
− x, y and z are the weight values, such that x + y + z = 1
− With x = 0.5, y = 0.25 and z = 0.25, the syntactic similarity measure has a higher weight than the structural and semantic similarities
− a, b and c are the threshold values of the name, visibility and abstraction, such that a + b + c = 1
− e, f and g are the threshold values of the name, type and visibility, such that e + f + g = 1
− k, l and m are the threshold values of the association, the association end and the inheritance, such that k + l + m = 1
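The two formulas above can be rendered by a short sketch such as the following, where the criteria, thresholds and weights are assumed to come from the user-supplied configuration file and are purely illustrative.

# Sketch of the weighted comparison given by the two formulas above.
def element_similarity(e1, e2, criteria):
    """criteria: list of (threshold, compare) pairs whose thresholds sum to 1."""
    return sum(s_c * compare_c(e1, e2) for s_c, compare_c in criteria)

def total_similarity(e1, e2, similarity_types):
    """similarity_types: list of (weight, similarity) pairs whose weights sum to 1."""
    return sum(p_i * sim_i(e1, e2) for p_i, sim_i in similarity_types)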

COMPARISON FUNCTIONS FOR SYNTACTIC, STRUCTURAL AND SEMANTIC SIMILARITY
The similarity assessment tool has a set of comparison rules covering different aspects so that matches and differences are better assessed [39]. The comparison rules are expressed as follows:
− Syntactic similarity functions are used to measure the lexical similarity (names of classes, names of attributes, etc.) between the compared elements
− Structural similarity functions are used to measure the similarity of the properties (characteristics of attributes and operations, etc.) of the compared elements
− Semantic similarity functions are used to measure the similarity of the relations of the compared elements with their neighbors.
In the three types of comparisons, the concepts (class names, attribute names, operation names, and names of relationships between classes) are compared according to the syntactic similarity between two strings, using their editing distance, a domain ontology, and other resources such as dictionaries (synonyms and hyponyms) [40]. This comparison is appropriate for measuring the similarity between strings which may contain typos, acronyms, misspellings, etc. [41].
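A common way to turn the editing distance into a [0, 1] score is a normalized Levenshtein comparator such as the one sketched below; this is one plausible choice of string comparator, not necessarily the exact one used by the system.

# Normalized edit-distance comparator returning a score in [0, 1].
def levenshtein(s1: str, s2: str) -> int:
    if len(s1) < len(s2):
        s1, s2 = s2, s1
    previous = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, 1):
        current = [i]
        for j, c2 in enumerate(s2, 1):
            current.append(min(previous[j] + 1,                  # deletion
                               current[j - 1] + 1,               # insertion
                               previous[j - 1] + (c1 != c2)))    # substitution
        previous = current
    return previous[-1]

def name_similarity(n1: str, n2: str) -> float:
    """1.0 for identical names, 0.0 for completely different ones."""
    if not n1 and not n2:
        return 1.0
    return 1.0 - levenshtein(n1.lower(), n2.lower()) / max(len(n1), len(n2))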
A number of measures have been proposed in the literature to quantify the semantic similarity between two concepts. Some of these measures are based on the notion of information content (Resnik, 1995), while others are based on the length of the path [42]. These measures are simple, and their success lies simply in measuring the conceptual distance between two concepts in a concept hierarchy [43].

Syntactic comparison functions
The syntactic similarity measure identifies the syntactic identity of two elements (classes, attributes, operations and relationships). Consequently, our evaluation of syntactic similarity is based on a comparison of named elements similar to that defined in [44], invoking a character-string comparator for each pair of names to be compared. It searches for common substrings between the two strings of the two elements, calculates the editing distance for each pair and returns a maximum similarity value [45]. Special characters and separators are ignored. Each comparator memorizes the elements it compares. The calculated similarity measures are recorded in correspondence matrices to avoid recalculating them when comparing other auxiliary elements [46]. If the syntactic similarity of two elements has already been computed and these elements participate in another similarity measure, the existing comparator of these elements is consulted. The syntactic similarity measure is quantified using a set of similarity metrics defined as follows [47]:
a. Similarity measure between two classes C1 and C2, according to their names, their visibility and their abstraction:

CSim_syntax(C1, C2) = s_nc × compare_nc(C1, C2) + s_v × compare_v(C1, C2) + s_a × compare_a(C1, C2)     (1)

− s_nc, s_v and s_a represent arbitrary thresholds assigned to the similarity of the class names, visibilities and abstractions, respectively.
− compare_nc(C1, C2), compare_v(C1, C2) and compare_a(C1, C2) represent the comparison functions assigned to the similarity of the class names, visibilities and abstractions, respectively.
b. Similarity measure between two attributes A1 and A2, according to their names, their visibility and their type:

ASim_syntax(A1, A2) = s_na × compare_na(A1, A2) + s_v × compare_v(A1, A2) + s_t × compare_t(A1, A2)     (2)

− s_na, s_v and s_t represent arbitrary thresholds assigned to the similarity of the attribute names, visibilities and types, respectively.
− compare_na(A1, A2), compare_v(A1, A2) and compare_t(A1, A2) represent the comparison functions assigned to the similarity of the attribute names, visibilities and types, respectively.
c. Similarity measure between two operations O1 and O2, according to their names, their visibility and their type:

OSim_syntax(O1, O2) = s_no × compare_no(O1, O2) + s_v × compare_v(O1, O2) + s_t × compare_t(O1, O2)     (3)

− s_no, s_v and s_t represent arbitrary thresholds assigned to the similarity of the operation names, visibilities and types, respectively.
− compare_no(O1, O2), compare_v(O1, O2) and compare_t(O1, O2) represent the comparison functions assigned to the similarity of the operation names, visibilities and types, respectively.
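A minimal sketch of the three syntactic metrics (1)-(3), assuming that classes, attributes and operations are represented as simple dictionaries and reusing the name_similarity comparator sketched above; the threshold values are illustrative assumptions.

# Syntactic similarity of classes, attributes and operations (metrics (1)-(3)).
def csim_syntax(c1, c2, s_nc=0.6, s_v=0.2, s_a=0.2):
    """Class similarity from name, visibility and abstraction."""
    return (s_nc * name_similarity(c1["name"], c2["name"])
            + s_v * (c1["visibility"] == c2["visibility"])
            + s_a * (c1["abstract"] == c2["abstract"]))

def asim_syntax(a1, a2, s_na=0.6, s_v=0.2, s_t=0.2):
    """Attribute similarity from name, visibility and type."""
    return (s_na * name_similarity(a1["name"], a2["name"])
            + s_v * (a1["visibility"] == a2["visibility"])
            + s_t * (a1["type"] == a2["type"]))

def osim_syntax(o1, o2, s_no=0.6, s_v=0.2, s_t=0.2):
    """Operation similarity from name, visibility and return type."""
    return (s_no * name_similarity(o1["name"], o2["name"])
            + s_v * (o1["visibility"] == o2["visibility"])
            + s_t * (o1["type"] == o2["type"]))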

Structural comparison functions
The structural similarity measure that we propose builds on the syntactic similarity of all the named elements of the class diagrams. Indeed, the structural similarity calculation uses the comparators of the class names and of the attributes and operations of these classes. The result of the calculation is qualified using a set of similarity metrics defined as follows:
− Similarity measure between two classes C1 and C2, according to their names, their visibility and their abstraction, as determined by (1).
− The attribute similarity measure Sim_att(C1, C2) between two classes C1 and C2 is the similarity measure between their two sets of attributes A1 and A2, with a_k ∈ A1, b_l ∈ A2 and |A1| ≤ |A2|. The syntactic similarity ASim_syntax(a_k, b_l) between two attributes a_k and b_l is calculated on the basis of their syntactic similarity as defined in (2).
− The operation similarity measure Sim_op(C1, C2) between two classes C1 and C2 is the similarity measure between their two sets of operations O1 and O2, with o_k ∈ O1, p_l ∈ O2 and |O1| ≤ |O2|. The syntactic similarity OSim_syntax(o_k, p_l) between two operations o_k and p_l is calculated on the basis of their syntactic similarity as defined in (3).
In abstract form, the structural similarity measure of two classes C1 and C2 is calculated as:

CSim_struct(C1, C2) = p_c × CSim_syntax(C1, C2) + p_a × Sim_att(C1, C2) + p_o × Sim_op(C1, C2)     (4)

where p_c, p_a and p_o represent arbitrary weights assigned to the similarity measures of the class names, the attributes and the operations, respectively.
The structural similarity measure of two classes C1 and C2 is thus the weighted sum of the syntactic similarity of the class names, the syntactic similarity of their attributes and the syntactic similarity of their operations, respectively CSim_syntax(C1, C2), Sim_att(C1, C2) and Sim_op(C1, C2). For example, the names of the linked classes are compared and matched to each other, the attributes of a class in one diagram are compared and matched with the attributes of the corresponding class in the other diagram, and so on.
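The structural measure (4) can be sketched as follows, under the assumption (one plausible reading of the set-based definitions above, not a confirmed detail of the method) that the attribute and operation similarities average, over the smaller set, the best syntactic match found in the other set; it reuses the syntactic functions sketched earlier.

# Structural similarity: weighted sum of class-name, attribute-set and operation-set similarities.
def set_similarity(set1, set2, pair_similarity):
    """Average, over the smaller set, of the best similarity found in the other set."""
    if len(set1) > len(set2):
        set1, set2 = set2, set1
    if not set1:
        return 1.0 if not set2 else 0.0
    return sum(max(pair_similarity(a, b) for b in set2) for a in set1) / len(set1)

def csim_struct(c1, c2, p_c=0.5, p_a=0.25, p_o=0.25):
    sim_att = set_similarity(c1["attributes"], c2["attributes"], asim_syntax)
    sim_op = set_similarity(c1["operations"], c2["operations"], osim_syntax)
    return p_c * csim_syntax(c1, c2) + p_a * sim_att + p_o * sim_op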

Semantic comparison functions
The semantic similarity measure is determined by analyzing the direction of the relations between the elements and the structure of the diagrams. An inheritance relation between two classes implies the propagation of properties from the parent class to the child classes. A change in the direction of a relation between two classes, or the replacement of one relation type by another, strongly modifies the semantics of the diagram [48]. We propose to calculate the semantic similarity measure using three measures:

− The neighbor similarity measure, which takes into account the comparison of the neighboring classes and invokes a comparator of their structural similarity, already computed during the matching phase of the structural similarity measure as defined in (4)
− The relationship similarity measure, which takes into account the relationship name, the relationship type, the multiplicity, and the direction of directed relationships
− The inheritance similarity measure, which takes into account in particular the numbers of roots, of leaves, and of classes with multiple inheritance.
The semantic similarity measure is quantified using a set of similarity metrics defined as follows:
− The neighborhood similarity measure Sim_neighbor(C1, C2) calculates the neighborhood similarity of two classes C1 and C2 having the two sets of neighbors V1 and V2, respectively; the similarity Sim_struct(m_k, n_l) between two neighbors m_k ∈ V1 and n_l ∈ V2 is calculated on the basis of their structural similarity.
− The relation similarity measure Sim_relation(C1, C2) between the compared classes and their neighbors is measured as the weighted similarity of the comparison functions of the association-end type, the association name and the multiplicity:

Sim_relation(C1, C2) = s_rt × compare_rt(m_k, n_l) + s_rn × compare_rn(m_k, n_l) + s_rm × compare_rm(m_k, n_l)

where s_rt, s_rn and s_rm represent arbitrary thresholds assigned to the similarity of the association-end types, the association names and the association-end multiplicities, respectively, and compare_rt(m_k, n_l), compare_rn(m_k, n_l) and compare_rm(m_k, n_l) represent the corresponding comparison functions.
− The inheritance similarity measure is defined over the two sets of inheritance elements H1 and H2, with g_k ∈ H1, h_l ∈ H2 and |H1| ≤ |H2|; the similarity Sim_inheritance(g_k, h_l) between two elements g_k and h_l is calculated on the basis of their structural similarity.
The semantic similarity measure of two classes C1 and C2 is then the sum, weighted by user-defined weights, of the neighborhood similarity measure, the relationship similarity measure and the inheritance similarity measure [49]:

Sim_semantic(C1, C2) = p_v × Sim_neighbor(C1, C2) + p_r × Sim_relation(C1, C2) + p_h × Sim_inheritance(C1, C2)

where p_v, p_r and p_h represent arbitrary weights assigned to the neighborhood similarity measure, the relationship similarity measure and the inheritance similarity measure, respectively.
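A hedged sketch of the semantic measure, reusing name_similarity, set_similarity and csim_struct from the previous sketches; the field names and the weight and threshold values are assumptions, not the exact representation used by the system.

# Semantic similarity: weighted sum of neighborhood, relation and inheritance similarities.
def sim_relation(r1, r2, s_rt=0.4, s_rn=0.3, s_rm=0.3):
    return (s_rt * (r1["end_type"] == r2["end_type"])          # association, aggregation, ...
            + s_rn * name_similarity(r1["name"], r2["name"])
            + s_rm * (r1["multiplicity"] == r2["multiplicity"]))

def sim_semantic(c1, c2, p_v=0.4, p_r=0.3, p_h=0.3):
    sim_neighbor = set_similarity(c1["neighbors"], c2["neighbors"], csim_struct)
    sim_rel = set_similarity(c1["relations"], c2["relations"], sim_relation)
    sim_inherit = set_similarity(c1["inheritances"], c2["inheritances"], csim_struct)
    return p_v * sim_neighbor + p_r * sim_rel + p_h * sim_inherit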

Weight setting
Our goal is to select the most appropriate weights automatically to detect matches, so that each class in a given class diagram corresponds to the most similar class in the other class diagram, based on the similarity value. Indeed, we have carried out a series of experiments for the calibration of the weights. Each compared element must be assigned a weight which allows it to capture the similarity between these elements. The weight assignments of the constituents of the similarity measure are crucial for the accuracy of the metric. In this context, we consider that all pairs that are close under certain weights are similar, and that all less similar pairs are not alike. The weights of a similarity measure composed of n constituents are assigned values from 0 to 1 in steps of 0.05, such that p_1 + p_2 + ... + p_n = 1. The weights are assigned as illustrated by the following pseudo-code [50]:

for pv = 0 to 1 step 0.05 do
    for pr = 0 to 1 − pv step 0.05 do
        for pg = 0 to 1 − (pv + pr) step 0.05 do
            find the matching between the UML graph classes of G1 and G2
            evaluate the matching between the UML graph classes of G1 and G2
        end for
    end for
end for

A pair of class diagrams was chosen at random. The weights are then assigned in the same way as in the pseudo-code above. For each weight assignment, the similarity measure for each pair of classes in the two diagrams is calculated and added to the correspondence matrix. For each class of one diagram, the matching class in the other diagram is found. The weight setting that gives the best matching result is used to match the other pairs of diagrams.
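In Python, the exhaustive search described by the pseudo-code could look like the following sketch, written here for three weights that sum to 1; evaluate_matching is a hypothetical callback returning a quality score for the matching obtained with a given weight assignment.

# Grid search over weight triples in steps of 0.05, keeping the best-scoring setting.
def best_weights(g1_classes, g2_classes, evaluate_matching, step=0.05):
    best_score, best = -1.0, None
    steps = int(round(1 / step))
    for i in range(steps + 1):
        p_v = i * step
        for j in range(steps + 1 - i):
            p_r = j * step
            p_h = 1.0 - p_v - p_r              # the three weights sum to 1
            score = evaluate_matching(g1_classes, g2_classes, p_v, p_r, p_h)
            if score > best_score:
                best_score, best = score, (p_v, p_r, p_h)
    return best, best_score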

RESULTS AND DISCUSSIONS
In the previous section, we adapted the similarity measure between UML graphs to our own educational context. We showed how our matching method is applied to the comparison of UML graphs and how its results are used to provide automatic corrections. We now detail the actual implementation, review the different functionalities of our method and assess the quality of the results produced.
This matching method was developed to meet the needs of standardizing the formalism of the diagrams to be compared, syntactic validation, experimentation and reuse. We have chosen a representation of class diagrams based on the UML metamodel. The method is thus capable of measuring the similarities between several UML class diagrams, detecting differences, correcting errors and matching class diagrams. The results obtained (lists of syntactic and structural errors, identified differences and errors), delivered in the form of a textual report, enable us to carry out a summative evaluation based solely on the learner's production.

Assessment of class diagrams
The class diagrams modeled by learners were imported into our learning base, over three different exercises. Each exercise took place during a 1.5-hour continuous-assessment session with second-year computer science engineering students. The objective of these control sessions is to model class diagrams from a textual statement describing the specifications to be represented. The UML class diagrams constructed by the learners were corrected by the teacher for further analysis. We thus have sixty-four class diagrams at our disposal. Some of the produced diagrams may appear incomplete because some students only had time to start the exercise or did not have enough time to do all the exercises requested during the control session.
We evaluated the relevance and the quality of the results produced by our method on a corpus of one hundred class diagrams produced by the learners. From this corpus of diagrams, and a reference diagram for each exercise, we chose three exercises to configure and evaluate the system offline. We first tuned the criteria involved in the calculation of the similarity functions, and the general behavior of the method, using the UML class diagrams constructed by the learners for the first exercise. Then, we tested and optimized the method on the second group of class diagrams, from the second exercise. Some inconsistencies were identified and corrected, taking care not to degrade the matching quality of the first test. Finally, the third group served to validate the method without any modification of the criteria.

Offline assessment
In this subsection, we present the results of the offline assessment in the form of histograms for the three exercises. The diagrams are numbered along the abscissa axes. To study the intensity of the link that may exist between the matching similarity scores obtained by the method and the scores assigned by the teacher, we study the linear correlation between these two variables. The linear correlation is measured by calculating the linear correlation coefficient, which is equal to the ratio of their covariance to the (non-null) product of their standard deviations.
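For reference, this is the standard Pearson coefficient, with X denoting the similarity scores produced by the method and Y the scores assigned by the teacher:

r = cov(X, Y) / (σ_X × σ_Y) = Σ_i (x_i − x̄)(y_i − ȳ) / ( √(Σ_i (x_i − x̄)²) × √(Σ_i (y_i − ȳ)²) )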
To measure the quality and relevance of the matchings produced by the system, we compared the expected results with those the system actually finds. For the first exercise, we found a correlation coefficient equal to 0.83. We note in Figure 4 that some productions do not conform to the results provided by the teacher. For the second exercise, in Figure 5, we found a correlation coefficient equal to 0.98, and 0.96 for the third exercise as shown in Figure 6.

Experimental results and analysis
The application of the three quality measures to the results of all three exercises is shown in Table 2. The indicative computation times of the system were measured on an Intel Xeon server running Linux, with a processor clocked at 2.4 GHz. The results show that, over all compared diagrams, more than 80% of the matchings provided conform to those expected, whatever the diagrams compared. For 90% of the processed diagrams, the effort required to correct the matching is minimal (Overall value greater than 0.85). The quality of the diagnosis is relatively good on simple and average problems (diagrams of the first and second exercises). It can however be corrected and improved for more complex problems (diagrams of the third exercise). In particular, more than 85% of the results on average are in line with those expected for more than 75% of the compared diagrams of the last exercise. The effort required to correct errors and omissions with our method is greater in the last exercise: for just under 40% of the matched diagrams, the Overall value is greater than 0.85. However, for 85% of the diagrams, the Overall value is greater than 0.7 (a result which is still very acceptable in the field of diagram matching).

CONCLUSION
The problem to which we have tried to provide a solution relates to the summative evaluation of UML diagrams by a semi-automatic method. Indeed, although several systems have already tried to tackle this problem, they could not detect the errors made by the students, especially since the comparison is made only against a single solution, whereas a class diagram can be represented by several models. The objective of this paper was therefore to develop a semi-automatic system capable of correcting class diagrams through a comparison carried out thanks to syntactic, structural and semantic similarity measures, in order to find differences and especially to detect the errors made by students. We focused on the different methods of transforming UML diagrams into graphs, drawing on the different existing formalisms, which we were able to adapt to the problem addressed in this article. We started with a study of evaluation as a fundamental process in the validation of student achievement. At the end of this study, it was clear to us that formative evaluation remains the best suited to the problem to which we are trying to provide a solution.
The results of the system show that 70% of the matchings provided are in line with those expected over all compared diagrams. For 80% of the processed diagrams, the effort required to correct the matching is minimal (Overall value greater than 0.85). The quality of the diagnosis is relatively good on simple and medium problems (diagrams of the first and second exercises). In particular, more than 85% of the results on average are consistent with those expected for more than 75% of the graphs compared for the last exercise. The effort required to correct errors is greater in the last exercise: for slightly less than 40% of the matched diagrams, the Overall value is greater than 0.85. However, for 85% of the diagrams, the Overall value is greater than 0.7.
For research perspectives, one aspect to consider is a classification algorithm with an automatically generated learning base. This would allow us to carry out a summative and normative evaluation of the learners' productions. Generally speaking, we would divide the learning base into two main categories: class diagrams which are correct and class diagrams which are incorrect. Each diagram of each category is labeled according to its status and its degree of simplification. It is this label which will allow us to carry out a summative evaluation of the learners' productions. Indeed, a learner's class diagram with a maximum similarity measure to a labeled reference diagram will most likely belong to that diagram's class. Another perspective is to apply our method to other types of structured models whose formalism is defined. In particular, the method could be reused "directly" on other static models whose characteristics are very close to those of UML class diagrams and which can be considered as a subset of UML class diagrams.