http://ijece.iaescore.com Implementing data-driven decision support system based on independent educational data mart

Decision makers in the educational field always seek new technologies and tools that provide solid, fast answers to support the decision-making process. They need a platform that utilizes students' academic data and turns them into knowledge for making the right strategic decisions. In this paper, a roadmap for implementing a data-driven decision support system (DSS) based on an educational data mart is presented. The independent data mart is implemented on students' degrees in 8 subjects in a private school (Al-Iskandaria Primary School in Basrah province, Iraq). The DSS implementation roadmap starts from pre-processing the paper-based data source and ends with providing three categories of online analytical processing (OLAP) queries (multidimensional OLAP, desktop OLAP and web OLAP). A key performance indicator (KPI) is implemented as an essential part of the educational DSS to measure school performance. The static evaluation method shows that the proposed DSS satisfies the privacy, security and performance aspects with no errors after inspecting the DSS knowledge base. The evaluation shows that a data-driven DSS based on an independent data mart with KPI and OLAP is one of the best platforms to support short-to-long term academic decisions.


INTRODUCTION
The importance of data repositories has emerged with the existence of large institutions. Departments manage their own databases (marketing, financial and administrative), which organise massive amounts of common data. Finding data related to a specific subject and organising such data into a single database called 'the data store' are required; another requirement is maintaining the special rules for stores without modifying or changing the rules on the basis of topics, by using special software for each subject, via a process called schema integration. This process also specifies how to transfer the merged data [1]. A data warehouse (DW) can be defined as a subject-oriented, time-variant, integrated and non-volatile collection of data used to support strategic decision making [2]. A DW holds a collection of permanent historical data that assists in administrative decision making by helping access data for the purposes of time analysis, knowledge discovery and decision making [3]. It is specifically designed to extract, process and represent data in a suitable format for this purpose. The data extracted from different sources, rules, systems and places form a kind of database that contains a huge amount of existing data to help in making decisions within an organization [4].

Suman et al. [34] proposed a DW to solve many challenges in a higher education center. These challenges face the designer throughout the design process, such as the overall system design; the main processes of extraction, transformation and loading; and performing the analysis processes using multidimensional OLAP-based queries.
The tools used for developing the DW are Mondrian and the Pentaho business intelligence tool. However, the proposed system did not implement KPI as an essential component of the decision-making process.
Mihai et al. [35] proposed using an enterprise data warehouse (EDW) to support academic decisions in educational institutions. The design and implementation proceeded in two steps: the first is implementing the EDW to find performance measurements for the management of the entire academic staff, whereas the second is evaluating the correlation between financial allocation and educational performance. The result obtained from implementing the EDW is an indicator that measures the financial effort required for education and helps decision makers estimate future efforts to enhance education. However, the paper presented a general overview of EDW implementation and neither went through the design methodology nor explained how to select the proper implementation approach. Another model, implemented by Abdullah and Obaid [36], combined educational records from two simulated databases: 10 years of data from the Department of Computer Science at Basrah University and four years of data from Al-Iraq University, Iraq. They unified the data under a single EDW schema and then used OLAP to conduct a descriptive analysis and find student achievements through these years. The decision makers in the proposed model are lecturers and department heads. However, the EDW design approach, the access method to the EDW and the multidimensional cube were not clearly explained.
Kurniawan and Erwin [37] showed the advantages of using DW and DM in predicting students' performance. The model implementation passed through two stages: analysis and DW implementation. They used the Kimball approach to design the DW with a star schema as its structure. However, the type of DW in the model was neither determined nor distinguished from a data mart. Mohammed et al. [38] designed a DW-based architectural approach to combine databases from different Iraqi universities and increase information sharing among all universities, colleges and departments. They used the beneficial characteristics of DW applications to maximise information sharing among universities. However, the approach failed to show how to solve conflicts among databases from different universities. The difficulty in this approach is the huge data volume of a single university, because many colleges exist, with many departments and units in a single college. The paper also failed to explain how to deal with different standards and business rules among the databases of a single college and how to handle this obstacle.

DSS IMPLEMENTATION
The data mart is the base of the proposed DSS. The flow chart of the implemented data mart is demonstrated in Figure 1. Six basic steps are taken to implement the educational data mart: data pre-processing; data profiling; the data mart area, which holds the staging area and the data mart schema; ETL; educational cube building; and OLAP, KPI and reports. The evaluation process is performed after completing the knowledge base of the DSS. The DSS architecture consists of four major areas: the data preparation area; the data mart area; the KPI, OLAP and reports area; and the decision-making area, as shown in Figure 2. The data preparation area involves all transformation and selection processes performed on the paper-based data source and its conversion into an electronic data source. The data mart area involves a data staging area where all the extracted data are stored and transformed before being loaded into the data mart schema tables (fact table and dimensions). The OLAP and query area provides many access methods to the data mart analysis results, from online to offline and multidimensional OLAP (MOLAP). KPI is an essential tool in the DSS that helps stakeholders measure progress. The last area is the decision-making area, where all stakeholders (analysts, school managers, teachers and senior decision makers) access the reports and make their decisions.

Data preprocessing
The base element of the proposed DSS is the collected data. The data source of the educational data mart is the Al-Iskandaria Private Elementary School, located in the Bahadria District, Basrah Province, Iraq. The data are paper-based data sheets, which hold all the required documentation of all students' degrees. The school follows a paper-based procedure to calculate averages and success rates for different subjects and stages; thus, storing them in an electronic base takes a long time. The main table of student records contains the details shown in Table 1 after they are stored in the electronic base. In this table, the main attributes of the data source are explained in detail, and the content type, data type and details of each attribute are presented.

The second data-profiling table, shown in Table 2, presents the subjects taught in each class. Eight subjects are taught to students from classes 1 to 6. The subjects differ for each class; some subjects are taught in the first class, and different subjects are taught from the fourth class to the sixth class. For example, Islamic studies, Arabic language, mathematics, science, arts and physical education are taught in classes 1 to 6, whereas social studies is taught in classes 4 to 6. Finally, English language is taught in classes 5 and 6.
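For illustration, the class-to-subject mapping above can be sketched as a simple lookup table. The following is a minimal Python sketch; the dictionary structure and function name are our assumptions, not part of the school's system:

```python
# Subjects mapped to the range of classes (1-6) in which they are taught,
# following the description of the Al-Iskandaria school data source.
SUBJECT_CLASSES = {
    "Islamic studies": range(1, 7),
    "Arabic language": range(1, 7),
    "Mathematics": range(1, 7),
    "Science": range(1, 7),
    "Arts": range(1, 7),
    "Physical education": range(1, 7),
    "Social studies": range(4, 7),     # classes 4 to 6
    "English language": range(5, 7),   # classes 5 and 6
}

def subjects_for_class(class_no: int) -> list[str]:
    """Return the subjects taught in a given class (1-6)."""
    return [s for s, classes in SUBJECT_CLASSES.items() if class_no in classes]

print(subjects_for_class(5))  # includes English language but not for class 3
```

Such a mapping is useful during profiling and ETL, e.g. to flag source rows whose (class, subject) pair is invalid.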

Data profiling
Data profiling, sometimes called data analysis, is the assessment and examination of the consistency, integrity and quality of a data source. Data profiling is a fundamental process for examining the data quality of DW data sources, and it provides results that can be depended on when making decisions related to DW implementation. Data profiling concentrates on the individual attributes of the data source. It gives a complete summary that describes the data type, length, variance, uniqueness, null ratio and domain range, and it shows a full view of the data quality of all data source attributes [39], [40]. Data profiling is an important step in the process of building a DW or data mart; the building process does not actually succeed if this step is not performed. The results of data profiling can help in determining the dimensions and fact tables, the proposed primary keys, the null ratio in each column, the mean and standard deviation of each column, the maximum and minimum values in each column and the domain of each column. The result of the data profiling of student records is shown in Table 3. SSDT provides a data profiling tool that presents the results graphically for ease of understanding and use. The data profiling results are converted into table form, as shown in Table 3, which lists, for each attribute, the minimum and maximum values, the number of distinct values and the null count.
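The per-attribute statistics described above (distinct count, null ratio, minimum and maximum) can be computed in a few lines. A hedged Python sketch follows, assuming records arrive as dictionaries; the paper itself uses the SSDT profiling tool instead:

```python
# Minimal column-profiling sketch: for each attribute, report the distinct
# count, null ratio and (for numeric columns) min/max -- the same kinds of
# statistics summarised in Table 3. Names and sample data are illustrative.
def profile(rows: list[dict]) -> dict:
    report = {}
    for col in rows[0].keys():
        values = [r[col] for r in rows]
        non_null = [v for v in values if v is not None]
        stats = {
            "distinct": len(set(non_null)),
            "null_ratio": 1 - len(non_null) / len(values),
        }
        if non_null and all(isinstance(v, (int, float)) for v in non_null):
            stats["min"], stats["max"] = min(non_null), max(non_null)
        report[col] = stats
    return report

records = [
    {"Class": 1, "Mark": 85, "Subject": "Science"},
    {"Class": 2, "Mark": None, "Subject": "Arts"},
    {"Class": 1, "Mark": 50, "Subject": "Science"},
]
print(profile(records)["Class"])  # distinct count, null ratio, min and max
```

Running a report like this over every source column is what surfaces candidate primary keys (uniqueness near 100%) and columns needing missing-value handling.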

Data mart area
The data mart consists of two areas: the staging area and the data mart schema tables. The staging area consists primarily of a staging table into which the data are first extracted from the data source. The staging table undergoes all transformation processes and holds the final version of the data before they are loaded into the dimensions and fact table. The data mart schema is a star schema in which five dimension tables are connected to the fact table, whilst their data are taken from the staging table. The staging table is an intermediate storage area between the source systems and the DW; it is a temporary storage area where data are deleted after being successfully uploaded to the repository. This area is used in many major processes, such as archiving and preparing the data source; data extraction; cleaning; unifying; mirroring; conversion; loading and indexing; quality assurance; and updating [41], [42]. These processes are usually referred to as ETL. The staging area should be prepared while all the intended OLAP queries are considered. Many approaches can be used to design the data mart schema tables, such as the top-down, bottom-up, inside-out and mixed approaches. Selecting an approach depends on the overall size and implementation duration of the DW and can result in either an enterprise DW or a small data mart [43]. The top-down approach is used for long-term design models and takes further analysis and redesigning to fit all enterprise goals. The bottom-up approach is used for short-term design models where the results can be observed quickly [44]. Given that the intended DW is an independent educational data mart, the best approach to build the data mart is the bottom-up approach. For these reasons, using this approach to implement the data mart provides many facilities for building a solid solution that can provide answers to educational stakeholders.
Three famous schemas are used to implement a DW schema (i.e. star, snowflake and fact constellation). Each schema has its own advantages and disadvantages, which make designers prefer one schema over another. The star schema is the most famous for its simplicity and wide usage; it consists of a central table called the fact table and many other tables called dimensions that surround it. The fact table consists of concatenated keys referencing the dimensions and other columns called measurements, which represent the facts or functions that can be calculated along the dimension columns [45]. Figure 3 represents the proposed star schema of the educational data mart. The data mart schema is built using SSMS 2014. The schema consists of five dimensions (address, information, enrolment, subject and degree) and the fact table. The fact table holds five keys concatenating the dimension tables and one measurement (count), which is the count function for finding the number of students represented in all the OLAP query answers. Using dimensions is one of the key factors that hasten OLAP query responses [46], [47]. The next stage is performing the ETL tasks, which take approximately 70% of the DW development time and cost spent on implementing the overall DW model. ETL involves many tasks used to manipulate data to obtain their final cleaned and integrated version. The tasks of ETL include (but are not limited to):
- Data extraction: The first step in the process of transferring data to the DW. It means reading the data and understanding them from different sources and then copying the necessary parts to the data staging area to continue the work later. The extraction step represents the greatest effort in the DW.
- Data cleaning: The task of detecting errors in the data and correcting them if possible; it involves tasks such as dealing with missing elements and reducing noise by defining extreme values and correcting data conflicts [48].
- Data transformation: When data are extracted from the source system, a series of actions is applied to convert the data into valid and meaningful formulas [49].
- Load: Load services need support before and after loading, such as the regeneration of indexes and physical sections of the table. The specificity and structure of each target when loading are also considered [50].
- Refresh: The last step of ETL, where updates over time are transferred from data sources to repositories [51].
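As a sketch of the star schema and count measurement described above, the following uses SQLite in place of SSMS 2014; all table and column names are illustrative assumptions, not the paper's exact DDL:

```python
import sqlite3

# Five dimension tables around one fact table whose single measurement is a
# student count, mirroring the schema described in the text (names are ours).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_address   (address_key   INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE dim_info      (info_key      INTEGER PRIMARY KEY, student_name TEXT, age INTEGER);
CREATE TABLE dim_enrolment (enrolment_key INTEGER PRIMARY KEY, class INTEGER, year INTEGER);
CREATE TABLE dim_subject   (subject_key   INTEGER PRIMARY KEY, subject TEXT);
CREATE TABLE dim_degree    (degree_key    INTEGER PRIMARY KEY, grade TEXT);
CREATE TABLE fact_degrees (
    address_key   INTEGER REFERENCES dim_address,
    info_key      INTEGER REFERENCES dim_info,
    enrolment_key INTEGER REFERENCES dim_enrolment,
    subject_key   INTEGER REFERENCES dim_subject,
    degree_key    INTEGER REFERENCES dim_degree,
    student_count INTEGER            -- the single (count) measurement
);
""")

# Tiny sample load, then an OLAP-style query: failed students per class.
conn.executemany("INSERT INTO dim_enrolment VALUES (?,?,?)",
                 [(1, 1, 2018), (2, 2, 2018)])
conn.execute("INSERT INTO dim_degree VALUES (1, 'Fail')")
conn.executemany("INSERT INTO fact_degrees VALUES (NULL,NULL,?,NULL,?,1)",
                 [(1, 1), (1, 1), (2, 1)])
rows = conn.execute("""
    SELECT e.class, SUM(f.student_count) AS failed
    FROM fact_degrees f
    JOIN dim_enrolment e ON f.enrolment_key = e.enrolment_key
    JOIN dim_degree    d ON f.degree_key    = d.degree_key
    WHERE d.grade = 'Fail'
    GROUP BY e.class
    ORDER BY e.class
""").fetchall()
print(rows)  # [(1, 2), (2, 1)]
```

The final query illustrates why the star layout is fast for this workload: every analytical question reduces to one fact-to-dimension join per selected dimension plus an aggregate over the count measure.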

ETL
ETL is the stage where designers implicitly prepare for fast-answering OLAP queries. In ETL, three important factors are prepared to make OLAP fast, namely, measurements, loaded dimensions and concept hierarchies. The first two stages (extract and transform) are implemented on the previously created staging table. The extraction process not only involves selecting the required data but also testing whether the data fulfil the intended goals.
The extract and transform parts of ETL are implemented on the staging table by using an SSIS package. The data mart loading strategy is divided into two stages, namely, loading the dimensions and loading the fact table. Figure 4 shows both stages of loading the data mart schema tables. The first stage is loading the five dimension tables with data. The multicast tool is used to make mirror copies of the staging table's data flow so that the dimensions can be loaded using slowly changing dimension (SCD) transformations. The address and info dimensions are loaded using SCD with changing attributes, whereas the next three dimensions (subject, enrolment and degree) are loaded with fixed attributes. The major difference between fixed and changing attributes is that fixed attributes do not detect changes in the staging table after loading data into dimensions, whereas changing attributes detect changes and reflect them in the dimensions. Fixed attributes can be used for all of a dimension's attributes. The second stage is loading the fact table by looking up the dimension keys. Afterwards, a cube is built to provide an analytical platform for performing OLAP queries. The cube is constructed to provide analysts with a platform where they can ask questions and find the answers as charts or tables. The educational multidimensional cube is implemented using SSAS 2014 and consists of the required dimensions and hierarchies. The cube's dimensions can be used to answer OLAP queries on the basis of the measurement (count). The multidimensional cube is selected due to its advantages, such as fast complex-query response and excellent performance [52], [53].
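The fixed- versus changing-attribute behaviour described above can be sketched without SSIS. The following minimal Python sketch shows the overwrite logic of a changing attribute (SCD Type 1) against the no-change behaviour of a fixed attribute; the function and key names are our assumptions:

```python
# Sketch of the "changing attribute" vs "fixed attribute" SCD behaviour
# described in the text, on plain dictionaries. Dimension members are keyed
# by a business key; all names are illustrative.
def load_dimension(dimension: dict, staging_rows: list[dict],
                   key: str, changing: bool) -> None:
    for row in staging_rows:
        k = row[key]
        if k not in dimension:
            dimension[k] = dict(row)      # new member: always inserted
        elif changing:
            dimension[k].update(row)      # changing attribute: overwrite
        # fixed attribute: existing members are left untouched

dim_info = {}
load_dimension(dim_info, [{"id": 7, "address": "Bahadria"}], "id", changing=True)
# The source record changes; a changing attribute reflects it in the dimension:
load_dimension(dim_info, [{"id": 7, "address": "Basrah"}], "id", changing=True)
print(dim_info[7]["address"])  # Basrah
```

With `changing=False`, the second load would leave the original "Bahadria" value in place, which is exactly the fixed-attribute behaviour used for the subject, enrolment and degree dimensions.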

OLAP and KPI
In this stage, OLAP is implemented and the reports are constructed. OLAP is a technology used on the DW architecture to obtain fast and accurate results as answers to complex queries [4]. The OLAP cube is primarily implemented to support complex queries on highly dimensional data structures. OLAP processes dimensions and fact tables simultaneously, with the possibility to roll back if an error occurs. The OLAP cube is a popular and important component of the DW; the OLAP cube server stores security settings and complex calculations that can be integrated into data mining tools and algorithms [54]. The OLAP system is built on top of a relational database. OLAP has different categories, such as multidimensional OLAP (MOLAP), relational OLAP (ROLAP), hybrid OLAP (HOLAP), desktop OLAP (DOLAP), database OLAP and web OLAP. A MOLAP server is implemented by a multidimensional database, where all cube indexes are stored and retrieved. By contrast, a ROLAP server sends query parameters to and receives query answers from the relational database. HOLAP combines the strengths of MOLAP and ROLAP features. DOLAP can be considered a variation of ROLAP; in DOLAP, users have the portability to perform complex OLAP queries by using existing DOLAP software on a pre-created multidimensional dataset on the user's desktop. In database OLAP, OLAP calculations can be performed on a relational database management system that supports the OLAP structure. Web OLAP allows OLAP calculations from web browsers. Three ways can be used to view OLAP results, namely, the SSAS 2014 cube view, online reports using SSRS 2014 and offline reports using a Microsoft Excel 2013 pivot table [16], [55].
The first access method is the SSAS cube view. After the multidimensional educational cube is implemented using SSAS 2014, the resulting cube is stored in a separate location, which can be accessed from the SSMS login screen locally or remotely. The SSMS login screen provides remote access to the database, ETL, cube and reports. Cubes can be easily viewed by dragging and dropping dimension columns and the measurement. The educational cube is implemented as the multidimensional category because of its performance and flexibility in implementing complex queries. Figure 5 shows the grade counts in the midterm examinations of all the classes of students who reside in Bahadria City.
The first class has the greatest number of failed students with 228, followed by classes 2 and 4, each with 180 students. In class 3, 150 students failed, whereas classes 5 and 6 hold 30 and 20 failed students, respectively. The figure also illustrates the other grades with the number of students according to class and grade. The figure represents the slice OLAP operation, where a slice of data is selected on the basis of Bahadria City as a dimension along with other dimensions (class and mid-grade). The second method used to view reports is SSRS 2014, which constructs web OLAP. The reports are designed using SSRS 2014 to provide pre-defined web OLAP answers and remote access for analysts to view statistical and complex multidimensional query answers. Reports are stored in reporting servers that can be accessed remotely or locally within the intranet network. Figure 6 presents the results of Figure 5 as an SSRS report. The last method of viewing reports is using the offline Microsoft Excel 2013 pivot table to implement desktop OLAP. This method allows offline, instant access to the cube. In this method, the final staging table is imported into an Excel file after the transformation processes, and the pivot table report chart is used to view the results. The pivot table provides the flexibility to select the required chart to view the results on the basis of the selected columns and a measurement. Excel pivot tables provide fast calculation functions over a selected table. After the staging table is imported from SSMS 2014 and selected, a pivot table can easily provide a chart to view the selected columns with a count aggregate function. Figure 7 illustrates an example of an Excel pivot table. The figure lists the number of students according to their ages (10-15), classified on the basis of the three selected grades (excellent (E), good (G) and very good (V)).
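The desktop-OLAP pivot view above amounts to a cross tabulation over the chosen dimension columns with a count aggregate. A small Python sketch follows; the records here are made-up samples, not the school's data:

```python
from collections import Counter

# Pivot-table-style cross tabulation: (age, grade) -> student count, the kind
# of view Figure 7 shows in Excel. The sample records are illustrative only.
records = [
    {"age": 11, "grade": "E"}, {"age": 11, "grade": "E"},
    {"age": 12, "grade": "E"}, {"age": 12, "grade": "G"},
    {"age": 13, "grade": "V"},
]
pivot = Counter((r["age"], r["grade"]) for r in records)
for (age, grade), count in sorted(pivot.items()):
    print(f"age {age}, grade {grade}: {count} students")
```

Swapping the tuple for other dimension columns (class, subject, city) gives the other pivot views the text describes, which is why a single loaded staging table supports so many offline reports.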
Two ages (11 and 12) have the highest grades among all ages. Students aged 11 have the highest count of grade E among all students (with more than 300 cases), followed by those aged 12 (with more than 200 cases); these results can help analysts find the factors that improve student grades in these two age groups, including the factors that influence the grades at other ages. A key performance indicator (KPI) is one of the most powerful tools in any DSS. KPI helps in measuring performance according to specific criteria. SSAS provides a powerful environment to build and implement KPIs that help in constructing DSSs [10]. Figure 8 shows the KPI of the success rate that helps in measuring performance according to the number of successful students. The value cursor represents the number of successful students in the current year, and the goal is to increase the success rate by 10%.
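The success-rate KPI above (a current value compared against a goal of a 10% higher rate) can be sketched as follows; the thresholds and status labels are illustrative assumptions, not the paper's SSAS KPI definition:

```python
# Success-rate KPI sketch: value = current success rate, goal = previous rate
# raised by 10%, status = how near the value is to the goal. The 0.9 threshold
# and the label names are our assumptions.
def success_rate_kpi(passed: int, total: int, previous_rate: float) -> dict:
    value = passed / total
    goal = min(previous_rate * 1.10, 1.0)   # increase the success rate by 10%
    ratio = value / goal
    status = "on track" if ratio >= 1 else ("near" if ratio >= 0.9 else "behind")
    return {"value": value, "goal": goal, "status": status}

print(success_rate_kpi(passed=450, total=600, previous_rate=0.70))
# value 0.75 against a goal of 0.77 -> close to the target
```

In SSAS terms, `value`, `goal` and `status` correspond to the KPI's value, goal and status expressions, and a trend expression could be added by comparing against the prior period.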

Data-driven educational DSS evaluation
In this stage, the proposed DSS is evaluated to find the framework's feasibility. The proposed system is implemented on a platform with 8 GB of random-access memory (RAM), an Intel Core i3 2.4 GHz processor, an Intel(R) 82579LM Gigabit network adapter and 140 GB of free hard disk space. Access to the platform is restricted, and online access through the network is limited to permitted users only. These access restrictions satisfy the security aspects of the system. The privacy of the DSS is ensured by hiding all the personal information of the students in the system. The educational DSS holds KPI and three kinds of OLAP reports (web OLAP, MOLAP and desktop OLAP). Each OLAP report holds many strong points and serves a specific kind of user. In the proposed educational DSS, the user (manager, analyst, decision maker) has the flexibility to use and navigate any type of report according to need. Analysts can use MOLAP reports; MOLAP permits selecting more than one dimension to perform a multidimensional expression on the selected dimensions. Managers can use web OLAP reports to view the analysts' results on the predefined OLAP reports through the network. Desktop OLAP reports can be used by decision makers and any other users to perform all OLAP operations instantly and show the analytical results according to the selected dimensions. OLAP reports allow users to navigate the educational cube on the basis of the OLAP operations. KPI is a powerful tool in any DSS that helps give status and trend indicators and present the current value and how near it is to the goal.
The static evaluation method is performed to check the DSS for errors by inspecting the knowledge base without using the DSS. The proposed DSS confirms the security aspects, where the dashboards are accessed by permitted users only. These restrictions also confirm the privacy aspects, where students' personal information is hidden from analysts. In this static method, the knowledge base is reviewed to check for errors and mistakes without using the DSS, and the presentation is checked to confirm the static evaluation requirements. The DSS usability, cost effectiveness and effectiveness are checked on the basis of the system usage [56]-[61]. Subsequently, a dynamic evaluation is performed over test cases after applying the DSS in the school. In this step, experts check the results obtained from the test cases. All the usability factors are examined, such as learnability, understandability, error prevention, accuracy, operability, efficiency, attractiveness and effectiveness [62]-[67].

CONCLUSION AND FUTURE WORKS
In this study, a roadmap for implementing an independent educational data mart is explained. The independent educational data mart is the base of the resulting DSS. Many difficulties are dealt with in implementing an educational data mart, such as handling the paper-based data source, implementing the ETL package, applying OLAP and KPI, and finally designing and deploying reports. The data source pre-processing stage involves converting paper-based student information and degrees into an electronic data source by using SSMS. The ETL package is implemented using SSIS and involves many ETL tasks, such as deriving columns, filling missing values, constructing surrogate keys and constructing concept hierarchies. Four approaches can be used to design and implement models (i.e. top-down, bottom-up, inside-out and mixed). The best approach to design the data mart is the bottom-up approach because the result is required in a short period to support decisions, rather than constructing an enterprise DW. The educational data mart provides many benefits, such as a fast-implemented platform to support educational decisions related to the academic performance of students; it also provides academic decision makers, teachers and school managers with insight into the factors that improve students' success on the basis of their age, class, address and subject. In addition, the educational data mart provides educational stakeholders with a comprehensive overview of all data to detect outliers. Three famous schemas exist for implementing a DW, namely, the star schema, snowflake schema and fact constellation. Implementing a schema is a complementary step to the OLAP cube implementation because the OLAP cube completely depends on the previously constructed fact tables and dimensions.
The proposed schema for the educational data mart is the star schema, which is selected for many reasons, such as fast responses to OLAP queries, the ease of applying hierarchies on dimensions and the ease of adapting to changes in the schema. The educational data mart can be considered a solid platform for data mining applications that need the information collected from the entire school. The use of a data mart at the school level is preferred for this type of analysis and aggregation of large data. Given the complexity and breadth of its scope of work, the DW of such an institution is usually managed by central information technology (IT) departments. ETL tasks take approximately 70% of the time and effort during data mart implementation. Given that the data source of student degrees and information is converted into an electronic data source, the extraction task is performed smoothly.
The transformation task consists of many processes, such as deriving columns, converting data, filling missing values and constructing surrogate keys. The loading task primarily consists of two stages: loading dimensions and loading fact tables. SCD has three types that are used for loading dimensions, namely, historical, fixed and changing attributes; however, only two types (fixed and changing attributes) are used to load the dimensions here. A fixed attribute loads dimensions once and does not allow changes, whereas a changing attribute detects changes in the staging tables and reflects them in the dimensions. Loading fact tables involves looking up keys in the dimensions and calculating the count measurements, then loading these looked-up keys and calculated measurements into the fact tables. Many OLAP categories can be implemented throughout this model, such as MOLAP, desktop OLAP and web OLAP, to answer OLAP queries; MOLAP is especially fast because of its use of dimensions, concept hierarchies and measurements. These three factors, in addition to storing the cube in indexed form, make OLAP responses fast. KPI is an essential tool in any DSS; here, a success-rate KPI is implemented to measure the progress of increasing the success rate in the school. Based on the proposed educational data mart, data marts can be considered reliable platforms for performing machine learning and data mining algorithms, such as decision trees, neural networks, clustering and association rules, to detect and find hidden patterns that affect student performance in the knowledge base of the DSS.
The static evaluation method is performed to check the DSS for errors by inspecting the knowledge base without using the DSS. The proposed DSS confirms the security aspects, where the dashboards are accessed by permitted users only. These restrictions also confirm the privacy aspects, where the personal information of students is hidden from analysts. In this static method, the knowledge base is reviewed to check for errors and mistakes without using the DSS, and the presentation is checked to confirm the static evaluation requirements. The DSS usability, cost effectiveness and effectiveness are checked on the basis of the system usage. Subsequently, a dynamic evaluation is performed over test cases after applying the DSS in the school. In this step, experts check the results obtained from the test cases. All the usability factors are tested, such as learnability, understandability, error prevention, accuracy, operability, efficiency, attractiveness and effectiveness. The proposed system can be depended on as an analysis platform; it can be used to implement a mobile application that transmits online analytical results to all stakeholders. These results can be used to measure progress toward the school goals set earlier. This system can also be used to implement data mining algorithms in order to find hidden patterns, relationships among attributes and the causes of student failure. Using these algorithms can enhance students' performance and thus improve the academic performance of the school. In addition, knowledge mining will be implemented on the results of the data mining algorithms to allow users to find the relationships among features in the data.