labourR 1.0.0: Automatic Coding of Occupation Titles

Interested in publishing a one-time post on R-bloggers.com? Press here to learn how.

Occupations classification is an important step in tasks such as labour market analysis, epidemiological studies and official statistics. To assist research on the labour market, ESCO has defined a taxonomy for occupations. Occupations are specified and organized in a hierarchical structure based on the International Standard Classification of Occupations (ISCO). labourR is a new package that performs occupations coding for multilingual free-form text (e.g. a job title) using the ESCO hierarchical classification model.

The initial motivation was to retrieve the work experience history from a Curriculum Vitae generated from the Europass online CV editor. Document vectorization is performed using the ESCO model and a fuzzy match is allowed with various string distance metrics.

The labourR package:

Allows classifying multilingual free-text using the ESCO-ISCO hierarchy of occupations.
Computations are fully vectorized and memory efficient.
Includes facilities to assist research in information mining of labour market data.

Installation

You can install the released version of labourR from CRAN with,

install.packages("labourR")

Example

library(labourR)
corpus <- data.frame(
  id = 1:3,
  text = c("Data Scientist", "Junior Architect Engineer", "Cashier at McDonald's")
)

Given an ISCO level, the top suggested ISCO group is returned. num_leaves specifies the number of ESCO occupations used to perform a plurality vote,

classify_occupation(corpus = corpus, isco_level = 3, lang = "en", num_leaves = 5)
#>    id iscoGroup                                          preferredLabel
#> 1:  1       251       Software and applications developers and analysts
#> 2:  2       214 Engineering professionals (excluding electrotechnology)
#> 3:  3       523                              Cashiers and ticket clerks

For further information browse the vignettes.

Published by

Alexandros Kouretsis

Skilled in Mathematical Modeling, Computational statistics, and Management. Strong research professional with a Doctor of Philosophy (PhD) focused in Theoretical and Mathematical Physics. Passionate about machine learning and data visualization. View all posts by Alexandros Kouretsis

3 thoughts on “labourR 1.0.0: Automatic Coding of Occupation Titles”

I tried to install labourR in RStudio and got this error message

package ‘labourR’ is not available (for R version 3.4.3)

Any hope for me?

Duncan

Alexandros Kouretsis says:

July 26, 2020 at 12:11 pm

You may try

remotes::install.github(“AleKoure/labourR”)

and work through the dependencies by installing older versions. e.g.

remotes::install_version(“ISOcodes”, “2018.06.29”)

Many packages these days need versions of R >= 3.5.

Reply

Pingback: R Weekly 2020-30 Baby sleep, JS debugging – R-Craft

Published by

Alexandros Kouretsis

3 thoughts on “labourR 1.0.0: Automatic Coding of Occupation Titles”

Leave a Reply to Duncan Williamson Cancel reply