labourR 1.0.0: Automatic Coding of Occupation Titles

Interested in publishing a one-time post on R-bloggers.com? Press here to learn how.
Occupations classification is an important step in tasks such as labour market analysis, epidemiological studies and official statistics. To assist research on the labour market, ESCO has defined a taxonomy for occupations. Occupations are specified and organized in a hierarchical structure based on the International Standard Classification of Occupations (ISCO). labourR is a new package that performs occupations coding for multilingual free-form text (e.g. a job title) using the ESCO hierarchical classification model.



The initial motivation was to retrieve the work experience history from a Curriculum Vitae generated from the Europass online CV editor. Document vectorization is performed using the ESCO model and a fuzzy match is allowed with various string distance metrics.

The labourR package:

  • Allows classifying multilingual free-text using the ESCO-ISCO hierarchy of occupations.
  • Computations are fully vectorized and memory efficient.
  • Includes facilities to assist research in information mining of labour market data.

Installation

You can install the released version of labourR from CRAN with,

install.packages("labourR")
Example
    library(labourR)
    corpus <- data.frame(
      id = 1:3,
      text = c("Data Scientist", "Junior Architect Engineer", "Cashier at McDonald's")
    )
    Given an ISCO level, the top suggested ISCO group is returned. num_leaves specifies the number of ESCO occupations used to perform a plurality vote,

    classify_occupation(corpus = corpus, isco_level = 3, lang = "en", num_leaves = 5)
    #>    id iscoGroup                                          preferredLabel
    #> 1:  1       251       Software and applications developers and analysts
    #> 2:  2       214 Engineering professionals (excluding electrotechnology)
    #> 3:  3       523                              Cashiers and ticket clerks
    
    For further information browse the vignettes.

    Published by

    Alexandros Kouretsis

    Skilled in Mathematical Modeling, Computational statistics, and Management. Strong research professional with a Doctor of Philosophy (PhD) focused in Theoretical and Mathematical Physics. Passionate about machine learning and data visualization.

    3 thoughts on “labourR 1.0.0: Automatic Coding of Occupation Titles”

      1. You may try

        remotes::install.github(“AleKoure/labourR”)

        and work through the dependencies by installing older versions. e.g.

        remotes::install_version(“ISOcodes”, “2018.06.29”)

        Many packages these days need versions of R >= 3.5.

    Leave a Reply to Duncan Williamson Cancel reply

    Your email address will not be published. Required fields are marked *

    This site uses Akismet to reduce spam. Learn how your comment data is processed.