Smart job searching system based on information retrieval techniques and similarity of fuzzy parameterized sets

Received Mar 31, 2020; Revised Jun 30, 2020; Accepted Jul 12, 2020
Searching for the proper vacancy among several choices is one of the most important decision-making problems. The necessity of dealing with uncertainty in such real-world problems has been a long-standing research challenge that has been approached through different methodologies and theories. The main contribution of this work is to match the applicant's curriculum vitae (CV) with the best available job opportunities based on certain criteria. The proposed job searching system (JSS) implements a series of approaches, namely segmentation, tokenization, part-of-speech tagging, gazetteer lookup, and fuzzy inference, to extract and arrange the required data from the job announcements and the CV. Moreover, this study designs a fuzzy parameterized structure to store such data as well as a measuring tool to calculate the degree of similarity between the job requirements and the applicant's CV. In addition, the system analyses the computed similarity scores in order to present the optimal job opportunities to the job seeker in descending order. The performance evaluation of the proposed system shows high recall and precision percentages for the matching process. The results also confirm the viability of the JSS approach in handling the fuzziness that is associated with the problem of job searching.


INTRODUCTION
Job searching is the process of matching the job seeker's application with the most appropriate job opportunities and vacancies [1]. An efficient matching process helps people acquire appropriate jobs, maximize their wages, and maintain their productive contributions. However, the current job searching problem arises because job seekers' familiarity with the advertised job opportunities in the market is insufficient, and because employers do not hold unlimited information either [2].
Nowadays, many individuals strive to find the opportunity that fits them best. From the organization's perspective, however, it can be difficult to find the best available applicants because of the varied nature of the given qualifications. Typically, searching for a job is accomplished using traditional methods such as looking for advertisements in magazines and newspapers, or going through job recruiters [3]. In other situations, an individual might not be satisfied with his/her current job for several reasons. One of these reasons is the mismatch that can arise when the employee has been incorrectly selected by the employer based on misleading qualifications. For example, an engineer who works in an administrative position would not be able to contribute much to the work because the position is not the technical one for which he/she was trained in the first place. Consequently, being in such a scenario may cause the individual to quit the job and look for another one. The constant mobility of workers increases the cost for organizations as well as individuals. Therefore, the traditional job-searching methods are considered ineffective for the average job seeker.
Int J Elec & Comp Eng, ISSN: 2088-8708. Smart job searching system based on information retrieval techniques and similarity ... (Malek Alksasbeh)
With the massive increase in the world population and the consequent demand for jobs, the need for electronic job searching applications has grown over the past few years [4]. The dramatic advancements in technology and the overall structure of the modern world have created new jobs that require more sophistication when being matched with the right individual. The recruiter must deal with a huge number of applications and must be able to find the best match for the job being offered.
Currently, most electronic job searching applications operate primarily on structured data that are composed and stored in a database. However, using traditional tools such as databases may cause the loss of some information that is essential to job seekers. Moreover, many of these traditional applications basically match the applicants' academic and professional qualifications with the employers' requirements. This matching process does not necessarily rank the applicants' qualifications by their relative appropriateness to the available jobs [5]. Furthermore, the existing job searching systems pick out only those firms whose requirements are met 100% by the applicant's qualifications and discard the rest, even if the applicant satisfies part of the criteria. In contrast, the proposed job searching system can also select the firms whose requirements are nearly met: if a firm's requirements are not met 100%, the firm should still be kept under consideration according to how many of its requirements are satisfied.
Recently, many researchers have investigated the problem of recruiting and job searching based on the concept of retrieval technology. However, few research studies have attempted to apply the theory of information retrieval and fuzzy sets to the process of recruiting and job matching. Berio and Harzallah [6] highlighted the importance of applying knowledge techniques to extract competence from textual resources in an attempt to categorize major keywords and match them with the required data in a process defined as the management stage. They used ontology techniques and an e-learning system. The authors also described what they call a competency resource aspect individual (CRAI) model to deliver a representation of the competencies attained in the job matching process.
Wei-Shen and Chung-Chian [7] introduced a selection tool that helps business managers find qualified talent more efficiently. Their tool is based on a fuzzy data mining method that can be applied to obtain valid fuzzy association rules from an existing transaction database using a dedicated "Apriori" algorithm. The authors conducted an experiment in a real-world study to validate their hypothesis. The main contribution of their work is a technique that can be extended beyond personnel selection, so that it can be used to predict the future organizational behavior of employees.
Dorn and Naz [8] designed a prototype based on HR-XML. The prototype utilizes a modified version of XML sheets which is dedicated to the problem of job searching. The system works by extracting significant information from the application. Such data include paycheck, topic, and abilities.
One of the challenges when designing a job recruiting system is the huge number of applications. Bhargavi et al., [9] presented a system that analyses the applicant's CV in order to distinguish the best candidate from a large quantity of CVs using a rule-based decision tree approach. The proposed system basically works by converting the textual document into ASCII. A template is used in the process of conversion.
Distinguishing applicants according to their skills and qualifications is another important challenge in this context. Lichtnow et al., [10] described a software program that recognizes the significant areas and skill degrees of individuals by analyzing their CVs. A text mining technique is used for classifying areas of expertise. This classification process works by linking the text document to a set of concepts previously defined in a domain ontology. Along the same line, Shatovska et al., [11] presented an intelligent management system to support recruitment and selection services using text mining methods. Their system can be considered a virtual employment consultant that primarily works by integrating a clustering approach that checks CVs for similarity estimation.
The authors in [12] proposed a finite state transducer (FST) method of formalism that can be used to extract key information from the intended CV. The authors proposed a model for the representation of the CV's content using FST formalism. In their illustration, the parser browses the CV in XML format, and then recognizes the different tags which should have a connection with the employer requirements to construct the transducer. Their approach ensures an easy and efficient CV retrieval.
Ben Abdessalem and Amdouni [13] proposed an electronic recruiting support system based on text mining methods. They introduced an approach for analyzing and structuring CVs written in French. Al-Ramadin et al., [14] introduced an automated job search (AJS) algorithm using information retrieval techniques. AJS searches the applicants' CVs and provides the employers with a list of candidates who could fit the vacant positions. AJS then offers the job seeker a list of jobs combined with links to their corresponding employers. A similar study can be found in the work of Owoseni et al., [15], who introduced a 3-tier electronic recruitment system that models serialized information about the applicants, such as their income and social circumstances. The documents are analyzed using document retrieval and natural language processing in order to examine documents and applications just as humans would.
Buettner and Timm [16] took a different approach to the problem of recruiting. They introduced an innovative consulting approach for recruiting human resources based on an automated assessment of the personality-organization environment fit, in which the job applicant's personality traits are automatically derived from social media usage. Slamet et al., [17], on the other hand, proposed a simpler method that simplifies job searching through the construction and collaboration of web scraping techniques. They used Naïve Bayes classification in a search engine developed specifically for this purpose. The results of testing the classification five times on eight categories show that the algorithm achieves an accuracy rate of 71.87%.
Qin et al., [18] proposed a semantic representation which directly deals with each single word in the given document. The system is particularly designed for both job requirements fulfillment and matching job seekers' experiences based on a neural network solution called recurrent neural network (RNN). In this technique, four hierarchical strategies that perform ability and awareness matching are designed to measure the importance of job requirements for semantic representation, as well as measuring the contribution of each job experience to a specific ability requirement.
Web-based applications have played a large part in this context. Punitavathi et al., [19] developed a web application for both online job seeking and recommending candidates using a system called professional social recommender (PSR). The system is based on text field filtering. The final architecture is composed of a job-seeker interface and a candidate recruitment interface, with reference to a recommendation database.
Fuzzy logic is another method that has been integrated in similar systems. Alqahtani et al., [20] proposed a recruitment system that incorporates fuzzy logic to select suitable candidates by matching the job position with the closest requirements. When the candidate does not completely meet the available position, a different job with lesser requirements can be kept under consideration. Thus, a person who satisfies 9 out of 10 requirements has a higher chance of being placed in the best position.
However, none of the efforts reported earlier in the literature has been designed specifically to solve the uncertainty problems of job searching portals by means of measuring the similarity of fuzzy-parameterized sets. Fuzzy-parameterized sets are well suited to storing data from job seekers and firms because, ultimately, the job search system matches the parameters that job seekers have with what the firms are looking for. This can be done easily by storing the data in a fuzzy-parameterized set, which can be conceived as a structure of objects and their parameters in which each parameter has a degree of belongingness. Belongingness, in this context, means how qualified the job seeker is in a certain skill (parameter), and it ranges from zero to one. From the firm's point of view, on the other hand, belongingness means how qualified the applicant should be. To summarize, fuzzy-parameterized sets and their similarity measure are introduced in this study to suit the problem at hand.
Thus, this study proposes an approach that tries to overcome those issues by using information retrieval preprocessing techniques, namely segmentation, tokenization, part-of-speech tagging, and gazetteer lookup, to retrieve the parameters with their fuzzy values. Then, the fuzzy-inference approach is implemented to assign the degree of membership for each parameter based on its fuzzy value. The proposed technique also transforms the retrieved parameters with their degrees of membership into fuzzy-parameterized sets in order to compute the similarity between the job seekers' curricula vitae (CVs) and the job announcements. Therefore, the proposed system equips existing job application portals with a job selection capability that increases the chances of selecting the "ideal job" based on the applicant's qualifications.

FUZZY PARAMETERIZED SETS AND THEIR SIMILARITY MEASURE
In this study, segmentation, tokenization, part-of-speech tagging, and gazetteer lookup are implemented to extract the requirements that the firm asks for from the job announcement, and the qualifications and other skills that the applicant has from the CV. More precisely, this study considers those requirements and qualifications as parameters. Thus, the job announcement is stored as the set of its parameters paired with the firm, and the CV is stored as the set of its parameters paired with the applicant. In addition, fuzzy inference is implemented to assign each parameter, whether from the job announcement or from the applicant's CV, a membership grade that indicates:
-How qualified the applicant should be in each parameter.
-How qualified the applicant is in each parameter.
Assigning each element a membership grade forms a desirable concept, which is captured by the notion of a fuzzy set [21]. In the forthcoming definition, fuzzy sets are formally defined over a universal set as follows.

Definition 1. A fuzzy set A over a universal set E is characterized by a membership function $\mu_A : E \to [0, 1]$.
Symbolically, a fuzzy set A over E can be written as follows: $A = \{(x, \mu_A(x)) : x \in E\}$. Therefore, the fuzzy set of the job announcement paired with the firm, and the fuzzy set of the CV paired with the applicant, are conceived as fuzzy parameterized sets. Clearly, the more common parameters there are, the greater the similarity. Furthermore, when common parameters have close membership grades, the similarity increases. Taking this into account, this study aims to construct a new similarity measure between the job announcement and the applicant's CV. In this context, two natural factors (axioms) that affect the measuring tool are:
-The more common parameters, the more similarity.
-The closer the membership grades of the same parameters, the more similarity.
Factor (1) is captured by a formula over the parameters common to the job announcement and the CV, and factor (2) by a formula over the membership grades of those parameters, in which the CV's membership function gives the grade of each parameter from the CV. Now, the way is paved to construct a similarity measure for the problem at hand: a formula that combines the two factors to compute the similarity between two fuzzy parameterized sets.
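As a minimal sketch of how the two factors might be combined, the Python code below assumes that factor (1) is the share of the announcement's parameters that also occur in the CV, that factor (2) is one minus the mean absolute difference of membership grades over the common parameters, and that the two are mixed with weights `alpha` and `beta`. These formulas, the weight values, the function names, and the sample data are all illustrative assumptions and are not the definitions used in this study.

```python
from typing import Dict

FuzzySet = Dict[str, float]  # parameter name -> membership grade in [0, 1]


def common_parameter_factor(job: FuzzySet, cv: FuzzySet) -> float:
    """Assumed factor (1): share of the job's parameters that the CV also has."""
    if not job:
        return 0.0
    common = job.keys() & cv.keys()
    return len(common) / len(job)


def grade_closeness_factor(job: FuzzySet, cv: FuzzySet) -> float:
    """Assumed factor (2): 1 - mean |grade difference| over common parameters."""
    common = job.keys() & cv.keys()
    if not common:
        return 0.0
    mean_gap = sum(abs(job[p] - cv[p]) for p in common) / len(common)
    return 1.0 - mean_gap


def similarity(job: FuzzySet, cv: FuzzySet,
               alpha: float = 0.6, beta: float = 0.4) -> float:
    """Weighted combination; alpha > beta so that factor (1) dominates."""
    return (alpha * common_parameter_factor(job, cv)
            + beta * grade_closeness_factor(job, cv))


# Hypothetical data: required grades (job) versus attained grades (CV).
job = {"python": 0.8, "sql": 0.6, "english": 0.7, "teamwork": 0.5}
cv = {"python": 0.9, "sql": 0.4, "english": 0.7, "java": 0.6}
score = similarity(job, cv)
print(round(score, 3))
```

Under these assumptions, a CV that shares every parameter of the announcement with identical grades scores 1, and a CV with no common parameter scores 0.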

RESEARCH METHOD
The proposed job searching system performs automated job announcement ranking by calculating the similarity measure of fuzzy parameterized sets in order to find a score that measures the correspondence between the applicant's CV and the firm's announcement. These similarity scores reflect how well the job specifications fit the applicant's CV. Figure 1 presents an overview of the proposed system architecture, which consists of the following modules: File acquisition (CV or job announcement), Segmentation, Tokenization, Part of speech tagging, Gazetteer, Fuzzy inference, Transform to fuzzy parameterized sets, Compute similarity, and finally Arrangement of optimal job choices based on a similarity score.

File acquisition
This module acquires text files that contain unstructured data from two sides. The first side takes the announcement provided by the firm as input, while the second side takes the application provided by the job seeker as a CV text document, as shown in Figure 2 and Figure 3.

Segmentation
This module identifies sentences in the CVs and available vacancies. In this study, the ANNIE sentence splitter is used to split CVs and vacancies into sentences. This splitter uses a gazetteer list of abbreviations to help recognize sentence-marking full stops [13,22,23]. For example, consider the sentence "Dr. John was born in June 1983"; the full stop after "Dr" is not a sentence-marking stop. Thus, each sentence is annotated as Sentence and each sentence break is annotated as Sentence Split.
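The role of the abbreviation list can be illustrated with a small Python sketch. It is a plain-Python stand-in, not the ANNIE sentence splitter or its API, and the abbreviation set and function name are hypothetical.

```python
import re
from typing import List

# Hypothetical abbreviation list; ANNIE uses its own gazetteer list for this.
ABBREVIATIONS = {"dr", "mr", "mrs", "prof", "e.g", "i.e"}


def split_sentences(text: str) -> List[str]:
    """Split on full stops unless the stop follows a known abbreviation."""
    sentences, start = [], 0
    for match in re.finditer(r"\.(\s+|$)", text):
        # Word immediately before this full stop.
        before = text[start:match.start()].split()
        last_word = before[-1].lower() if before else ""
        if last_word in ABBREVIATIONS:
            continue  # not a sentence-marking stop
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


sentences = split_sentences("Dr. John was born in June 1983. He is a web designer.")
for sentence in sentences:
    print(sentence)
```

The stop after "Dr" is skipped because "dr" is in the list, so the example text yields two sentences rather than three.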

Tokenization
In this module, tokenization splits each sentence into words and terms by removing empty sequences and various symbols in the text, such as punctuation and numbers. This module uses the ANNIE Tokenizer to tokenize the text documents and takes each word or term from its first character to its last character, where each word or term is called a token [22,23].
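A minimal Python sketch of this step, assuming that a token is simply a maximal run of letters, is given below. It follows the description above rather than the actual behaviour of the ANNIE Tokenizer, and the function name is hypothetical.

```python
import re
from typing import List


def tokenize(sentence: str) -> List[str]:
    """Keep alphabetic words; drop punctuation, digits, and other symbols."""
    return re.findall(r"[A-Za-z]+", sentence)


tokens = tokenize("Dr. John was born in June 1983.")
print(tokens)
```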

Part of speech tagging
This module follows the segmentation and tokenization modules and categorizes the tokens into various classes such as verbs, pronouns, proper nouns, noun phrases, etc. These classes are produced as annotations on each token based on predefined categorization rules applied through the ANNIE POS tagger [23]. Moreover, this tagger extracts each named entity, such as gender, job title, nationality, location, organization, etc., that can be represented with a proper name [22]. Named entities can simply be viewed as entity instances (e.g., Web designer is an instance of job title). Thus, this module serves as a preparatory step for the Gazetteer module.

Gazetteer
This module creates lookup annotation lists for each category created by the ANNIE POS tagger to provide information about named entities. For this purpose, this study extended the GATE gazetteer lists [23] to handle special information in CVs and vacancies from various domains such as information technology, physics, engineering, linguistics, etc., in order to enable the recognition of entities using lookup lists with one entry per line. This extension of the GATE gazetteer lists is stored as a "list" file that contains the requirement parameters with their fuzzy values. This helps to extract a crisp value that best represents a fuzzy set.
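The lookup idea can be sketched in Python as follows. The tab-separated entry layout, the sample terms, the linguistic labels, and the function names are hypothetical choices made for illustration; they are not the GATE list format or the lists used in this study.

```python
from typing import Dict, List, Tuple

# Hypothetical list contents: "<term>\t<parameter>\t<fuzzy value>", one entry per line.
LIST_FILE = """\
web designer\tjob_title\thigh
bachelor\tdegree\tmedium
master\tdegree\thigh
fluent\tenglish\thigh
"""


def load_gazetteer(raw: str) -> Dict[str, Tuple[str, str]]:
    """Map each lower-cased term to its (parameter, fuzzy value) pair."""
    entries = {}
    for line in raw.splitlines():
        if not line.strip():
            continue
        term, parameter, fuzzy_value = line.split("\t")
        entries[term.lower()] = (parameter, fuzzy_value)
    return entries


def lookup(text: str, gazetteer: Dict[str, Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Return (term, parameter, fuzzy value) for every listed term found in text."""
    lowered = text.lower()
    return [(term, *info) for term, info in gazetteer.items() if term in lowered]


hits = lookup("Experienced Web Designer with a Master degree", load_gazetteer(LIST_FILE))
print(hits)
```

Plain substring matching is used here only to keep the sketch short; a real gazetteer would match against token boundaries.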

Fuzzy inference
A fuzzy inference module can deal with either fuzzy inputs or crisp inputs, but its outputs are mostly fuzzy sets. In this study, the fuzzy inference module of the job searching system is based on the Mamdani fuzzy model adopted in [5], and it assigns each parameter a degree of membership that ranges between 0 and 1.
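To make the Mamdani scheme concrete, the following Python sketch runs a single-input inference with min implication, max aggregation, and centroid defuzzification. The input variable (years of experience), the triangular terms, and the three rules are hypothetical and are not the rule base adopted from [5].

```python
from typing import Callable, List

Membership = Callable[[float], float]


def triangle(a: float, b: float, c: float) -> Membership:
    """Triangular membership function with feet a and c and peak b."""
    def mu(x: float) -> float:
        if x == b:
            return 1.0
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x < b else (c - x) / (c - b)
    return mu


# Hypothetical input terms (low, medium, high) for years of experience in [0, 10].
INPUT_SETS: List[Membership] = [triangle(0, 0, 5), triangle(0, 5, 10), triangle(5, 10, 10)]
# Hypothetical output terms (low, medium, high qualification) on [0, 1].
OUTPUT_SETS: List[Membership] = [triangle(0, 0, 0.5), triangle(0, 0.5, 1), triangle(0.5, 1, 1)]


def mamdani_grade(years: float, steps: int = 100) -> float:
    """Rule i: IF experience is INPUT_SETS[i] THEN qualification is OUTPUT_SETS[i]."""
    years = min(max(years, 0.0), 10.0)
    firing = [mu(years) for mu in INPUT_SETS]  # fuzzification gives rule strengths
    numerator = denominator = 0.0
    for i in range(steps + 1):
        y = i / steps
        # min implication per rule, max aggregation across rules
        aggregated = max(min(w, out(y)) for w, out in zip(firing, OUTPUT_SETS))
        numerator += y * aggregated
        denominator += aggregated
    return numerator / denominator  # centroid defuzzification


for years in (0, 5, 10):
    print(years, round(mamdani_grade(years), 3))
```

Because the centroid is taken over a whole output term, the resulting grade stays strictly inside the interval (0, 1) even for the extreme inputs.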

Transform to fuzzy parameterized sets
Fuzzy inference based on the job announcement provides each common parameter extracted from the CV with a membership grade. This module transforms the two fuzzy sets, one for the job announcement and one for the CV, into fuzzy parameterized sets by combining each fuzzy set, over the universal set E, with the firm and the applicant, respectively.

Compute similarity
This module computes the similarity between the two fuzzy parameterized sets by applying the similarity measure to the job announcement's set and the CV's set.
Over the universal set E of ten parameters, consider a hypothetical job announcement by a firm and the CV of an applicant. Note that two weights are given to factors (1) and (2), respectively, where factor (1), by its nature, has more effect on the similarity.

Arrangement of optimal job choices based on a similarity score
In order to determine the optimal job choices, it is essential to invoke further re-ranking/re-ordering using the SQL Server rank functions [24]. These functions return positive similarity scores, where the highest score in the entire list indicates the best available vacancy; hit sets are sorted by descending score, as shown in Figure 4.
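The ordering step can be sketched in Python as follows. The firm names and scores are hypothetical, and the tie handling (equal scores share a rank, as the SQL `RANK()` window function does) is an assumption rather than a detail given in this study.

```python
from typing import Dict, List, Tuple


def rank_jobs(scores: Dict[str, float]) -> List[Tuple[int, str, float]]:
    """Sort by descending score; equal scores share the same rank."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    ranked, prev_score, prev_rank = [], None, 0
    for position, (job, score) in enumerate(ordered, start=1):
        rank = prev_rank if score == prev_score else position
        ranked.append((rank, job, score))
        prev_score, prev_rank = score, rank
    return ranked


ranking = rank_jobs({"firm_a": 0.81, "firm_b": 0.93, "firm_c": 0.81, "firm_d": 0.40})
for rank, job, score in ranking:
    print(rank, job, score)
```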

EXPERIMENTAL EVALUATION
Experiments were performed using 200 job announcements and 10 applicant CV documents containing natural-language requirement specifications in English, ranging between 300 and 1000 words in size. In this study, Recall (R), Precision (P), and F-measure are used as external evaluation measures, with a half weight accorded to partially correct answers [25]. These measures are common evaluation criteria in the domain of information retrieval, and they are used here to evaluate the system performance [13,25].
P is calculated from the amount of information correctly returned (correct), the amount of information that is partially returned (partial), and the amount of information incorrectly returned (spurious) by our system. R is derived from the amount of information correctly returned, the amount partially returned, and the amount of unreturned information (missing). F-measure combines P and R as a weighted average of the two measures. Table 1 exhibits the average R, P, and F-measure scores. The overall performance of the system was 95% R, 86% P, and 91% F-measure. Figure 5 shows the P, R, and F-measure curves for the 10 applicants.
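The measures described above can be sketched in Python, assuming the commonly used definitions in which a partially correct answer counts as half of a correct one and the F-measure is the weighted harmonic mean of P and R. The function names and the counts in the example are hypothetical and are not taken from the experiments.

```python
def precision(correct: int, partial: int, spurious: int) -> float:
    """P = (correct + 0.5 * partial) / (correct + partial + spurious)."""
    returned = correct + partial + spurious
    return (correct + 0.5 * partial) / returned if returned else 0.0


def recall(correct: int, partial: int, missing: int) -> float:
    """R = (correct + 0.5 * partial) / (correct + partial + missing)."""
    expected = correct + partial + missing
    return (correct + 0.5 * partial) / expected if expected else 0.0


def f_measure(p: float, r: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of P and R; beta = 1 weights them equally."""
    denominator = beta ** 2 * p + r
    return (beta ** 2 + 1) * p * r / denominator if denominator else 0.0


# Hypothetical counts for one applicant.
p = precision(correct=8, partial=2, spurious=1)
r = recall(correct=8, partial=2, missing=0)
f = f_measure(p, r)
print(round(p, 3), round(r, 3), round(f, 3))
```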

CONCLUSION
This study presented a job searching system based on the similarity of fuzzy parameterized sets, which helps job seekers find the best available vacancies that match their qualifications. Technically speaking, the smart system works by applying a set of information retrieval preprocessing techniques to retrieve the parameters along with their fuzzy values. Then, the fuzzy-inference approach is implemented to assign the degree of membership for each parameter based on its fuzzy value. Next, the retrieved parameters and their degrees of membership are transformed into fuzzy-parameterized sets. Finally, the system computes the similarity between the job seekers' CVs and the job announcements. The experimental evaluation has shown that the proposed retrieval and ranking process performs well. This process has been applied to the job vacancies provided by the firms, and it is based on the qualifications acquired from the CV.