Extraction (or parsing) refers to the process of analysing and processing data from the web according to predefined extraction goals. In contrast to spidering (or crawling), which collects a huge amount of data without yet sorting it by content, extraction classifies the collected data and tags important phrases for later use.

For example: When an extraction tool for a job search engine platform finds the workload indication “60-70%”, it tags this job ad as a part-time job.
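
A minimal sketch of such a rule in Python is shown here. The function name tag_workload, the regular expression, and the threshold (an upper bound below 100% counts as part-time) are assumptions made for illustration; they are not taken from any particular extraction tool.

```python
import re

# Assumed pattern: either a range such as "60-70%" or a single value such as "100%".
WORKLOAD_PATTERN = re.compile(
    r"(\d{1,3})\s*(?:-|–|to)\s*(\d{1,3})\s*%|(\d{1,3})\s*%"
)


def tag_workload(ad_text):
    """Return 'part-time', 'full-time', or None if no workload is indicated."""
    match = WORKLOAD_PATTERN.search(ad_text)
    if match is None:
        return None
    if match.group(1) is not None:
        # Range found: use its upper bound.
        upper = int(match.group(2))
    else:
        # Single percentage found.
        upper = int(match.group(3))
    # Assumed rule: anything below 100% is tagged as part-time.
    return "part-time" if upper < 100 else "full-time"


print(tag_workload("Workload: 60-70%"))  # part-time
```

A real tool would need many more patterns (for example, hours per week or wording such as "half-time"); the sketch covers only percentage indications.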

In combination with an ontology, an extraction tool can automatically tag and link extracted words and phrases according to the rules stored in the ontology. As a result, content can be annotated semantically and the quality of the tags is enhanced.

For example: When the job title “CEO assistant” is recognised, the extraction tool tags this job ad as an assistant job and not as a management position.
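
The following Python sketch illustrates how an ontology rule could produce this result. The toy ontology, the name tag_job_title, and the rule itself (the last known word of the title is treated as its head and decides the category) are hypothetical simplifications; a real ontology would hold far richer relations.

```python
# Hypothetical toy ontology: term -> job category.
JOB_TITLE_ONTOLOGY = {
    "ceo": "management",
    "manager": "management",
    "assistant": "assistant",
}


def tag_job_title(title):
    """Return the job category for a title, or None if no term is known."""
    words = title.lower().split()
    # Assumed rule: in an English compound title such as "CEO assistant",
    # the last known word is the head and determines the category.
    for word in reversed(words):
        if word in JOB_TITLE_ONTOLOGY:
            return JOB_TITLE_ONTOLOGY[word]
    return None


print(tag_job_title("CEO assistant"))  # assistant
print(tag_job_title("CEO"))            # management
```

This simple head-word rule would fail on a title such as "Assistant to the CEO", which is one reason why the rules are kept in an ontology rather than hard-coded.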

Moreover, the terms that the extraction tool extracts from the web are added to the ontology. As a result, the ontology is enriched with additional terms, and the matching quality improves for the particular platform using the extraction tool. The following example illustrates the rules underlying the technology for a job search engine platform, so that jobs are found online and are tagged and sorted correctly:

An extraction tool, together with an ontology, can search the web for the following content and process it according to these tags:

    • age
    • company
    • education
    • hard skill
    • industrial sector
    • IT skill
    • job experience
    • job title
    • language
    • location
    • skills
    • soft skills

In addition, this information can be found not only in job ads but also in CVs. This means that extraction connected to an ontology is able to match CVs with vacancies.
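
As a rough illustration of such matching, the Python sketch below tags a CV and a vacancy with the same small ontology and scores how many of the vacancy's tagged concepts also occur in the CV. The ontology entries, the function names extract_tags and match_score, and the scoring formula are all assumptions chosen for the example, not a description of an existing product.

```python
import re

# Hypothetical mini-ontology: surface term -> (tag, canonical concept).
TAG_ONTOLOGY = {
    "python": ("IT skill", "Python"),
    "sql": ("IT skill", "SQL"),
    "english": ("language", "English"),
    "german": ("language", "German"),
    "zurich": ("location", "Zurich"),
    "teamwork": ("soft skills", "teamwork"),
    "team player": ("soft skills", "teamwork"),  # synonym, same concept
}


def extract_tags(text):
    """Return the set of (tag, concept) pairs whose terms occur in the text."""
    lowered = text.lower()
    found = set()
    for term, (tag, concept) in TAG_ONTOLOGY.items():
        # Whole-word match so that "sql" does not match inside other words.
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            found.add((tag, concept))
    return found


def match_score(cv_text, vacancy_text):
    """Share of the vacancy's tagged concepts that also appear in the CV."""
    cv_tags = extract_tags(cv_text)
    vacancy_tags = extract_tags(vacancy_text)
    if not vacancy_tags:
        return 0.0
    return len(cv_tags & vacancy_tags) / len(vacancy_tags)


vacancy = "We need a team player with Python and SQL skills in Zurich. German required."
cv = "Python developer based in Zurich, fluent in English and German, values teamwork."
print(match_score(cv, vacancy))  # 0.8
```

Because "team player" and "teamwork" map to the same concept, the CV and the vacancy match on this soft skill even though they use different wording; only SQL is missing from the CV.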