R-posts.com

labourR 1.0.0: Automatic Coding of Occupation Titles

Interested in publishing a one-time post on R-bloggers.com? Press here to learn how.
Occupations classification is an important step in tasks such as labour market analysis, epidemiological studies and official statistics. To assist research on the labour market, ESCO has defined a taxonomy for occupations. Occupations are specified and organized in a hierarchical structure based on the International Standard Classification of Occupations (ISCO). labourR is a new package that performs occupations coding for multilingual free-form text (e.g. a job title) using the ESCO hierarchical classification model.



The initial motivation was to retrieve the work experience history from a Curriculum Vitae generated from the Europass online CV editor. Document vectorization is performed using the ESCO model and a fuzzy match is allowed with various string distance metrics.

The labourR package:

Installation

You can install the released version of labourR from CRAN with,

install.packages("labourR")
Example
library(labourR)
corpus <- data.frame(
  id = 1:3,
  text = c("Data Scientist", "Junior Architect Engineer", "Cashier at McDonald's")
)
Given an ISCO level, the top suggested ISCO group is returned. num_leaves specifies the number of ESCO occupations used to perform a plurality vote,

classify_occupation(corpus = corpus, isco_level = 3, lang = "en", num_leaves = 5)
#>    id iscoGroup                                          preferredLabel
#> 1:  1       251       Software and applications developers and analysts
#> 2:  2       214 Engineering professionals (excluding electrotechnology)
#> 3:  3       523                              Cashiers and ticket clerks
For further information browse the vignettes.
Exit mobile version