SkiLLens: Recognising and Mapping Novel Skills from Millions of Job Ads Across Europe Using Language Models
Conference paper
Alessia De Santo, Lorenzo Malandri, Fabio Mercorio, Mario Mezzanzanica, Navid Nobani
Research publications in journals, conferences, and workshops.
Alessia De Santo, Lorenzo Malandri, Fabio Mercorio, Mario Mezzanzanica, Navid Nobani
Federico Clerici, Navid Nobani
Navid Nobani, Giovanni Officioso, Filippo Pallucchini, Giancarlo Sperlì, Fabio Mercorio
Roberto Boselli, Simone D’Amico, Navid Nobani
Carlos Chiatti, Marco Alberio, Giovanni Lamura, Navid Nobani, Daniele Vignoli
Erik Cambria, Lorenzo Malandri, Fabio Mercorio, Navid Nobani, Andrea Seveso
Erik Cambria, Lorenzo Malandri, Fabio Mercorio, Mario Mezzanzanica, Navid Nobani
Lorenzo Malandri, Fabio Mercorio, Mario Mezzanzanica, Navid Nobani, Andrea Seveso
Anna Giabelli, Lorenzo Malandri, Fabio Mercorio, Mario Mezzanzanica, Navid Nobani
Lorenzo Malandri, Fabio Mercorio, Mario Mezzanzanica, Navid Nobani
Lorenzo Malandri, Fabio Mercorio, Mario Mezzanzanica, Navid Nobani
Taxonomies organise knowledge through concepts connected by IS-A relationships, but maintaining and updating them is often costly and time-consuming. Word embeddings can help enrich taxonomies by capturing semantic similarities from large text corpora, though evaluating whether these embeddings preserve the taxonomy’s structure remains challenging. In this paper, we introduce MEET-LM, a methodology for generating and evaluating embeddings that preserve co-hyponymy relations derived from a domain taxonomy. We apply the method to more than 2 million ICT job vacancies classified using the ESCO taxonomy. A neural classifier trained on the resulting embeddings achieves 99.4% accuracy and an F1-score of 86.5%, demonstrating that the approach effectively captures taxonomy-based semantic relations.
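Co-hyponyms are concepts that share the same direct parent in the taxonomy, so labelled training pairs for a classifier like the one above can in principle be derived from the hierarchy alone. The sketch below shows that labelling step on a tiny, hypothetical taxonomy fragment (the occupation names and the `parent_of` map are illustrative, not actual ESCO codes). It does not reproduce the MEET-LM pipeline or its neural classifier.

```python
from itertools import combinations


def co_hyponym_pairs(parent_of: dict[str, str]) -> list[tuple[str, str, int]]:
    """Label every unordered pair of leaf concepts: 1 if they share a parent, else 0."""
    leaves = sorted(parent_of)
    labelled = []
    for a, b in combinations(leaves, 2):
        labelled.append((a, b, int(parent_of[a] == parent_of[b])))
    return labelled


# Hypothetical taxonomy fragment: leaf concept -> direct parent (not real ESCO data)
parent_of = {
    "software developer": "ICT professional",
    "data scientist": "ICT professional",
    "database administrator": "ICT professional",
    "network technician": "ICT technician",
    "help desk operator": "ICT technician",
}

pairs = co_hyponym_pairs(parent_of)
positives = sum(label for _, _, label in pairs)
print(len(pairs), positives)  # → 10 4
```

In a real taxonomy with many sibling groups, co-hyponym pairs are usually a small fraction of all pairs, which is one general reason to report F1 alongside accuracy for this kind of task.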
Navid Nobani, Lorenzo Malandri, Fabio Mercorio, Mario Mezzanzanica
Word embeddings are effective at capturing semantic and lexical similarities across many domains. However, when the training corpus is associated with a taxonomy (for example in classification tasks based on standard taxonomies), common intrinsic and extrinsic evaluation methods do not guarantee that the embeddings remain consistent with the taxonomy’s structure. This limitation reduces the applicability of distributional semantics in such contexts. To address this problem, we introduce MEET, a framework that includes a new evaluation measure, Hierarchical Semantic Similarity (HSS), designed to assess whether embeddings generated from a corpus preserve the semantic similarity relations defined by the taxonomy.
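One way to make "preserving the taxonomy's similarity relations" concrete is to compare, over a set of word pairs, the cosine similarity given by the embeddings with a taxonomy-derived similarity score. The sketch below does this with Spearman rank correlation. The HSS formula is not given in this abstract, so the taxonomy scores are simply supplied by the caller as placeholders, and Spearman correlation is an assumed choice of agreement statistic, not necessarily the one used in MEET. The vectors and scores are toy values.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def ranks(values):
    """1-based average ranks, so tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def taxonomy_agreement(embeddings, gold):
    """gold: list of (word_a, word_b, taxonomy_score) triples (placeholder scores)."""
    model_scores = [cosine(embeddings[a], embeddings[b]) for a, b, _ in gold]
    gold_scores = [s for _, _, s in gold]
    return spearman(model_scores, gold_scores)


# Toy 2-d vectors and hypothetical taxonomy scores
embeddings = {"developer": [1.0, 0.1], "programmer": [0.9, 0.2], "nurse": [0.1, 1.0]}
gold = [("developer", "programmer", 0.9), ("developer", "nurse", 0.1), ("programmer", "nurse", 0.2)]
rho = taxonomy_agreement(embeddings, gold)
print(rho)  # → 1.0
```

A rank correlation only asks whether the embeddings order the pairs the same way as the taxonomy, which sidesteps the fact that cosine similarities and taxonomy scores live on different scales.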
Lorenzo Malandri, Fabio Mercorio, Mario Mezzanzanica, Navid Nobani
Taxonomies provide a structured representation of semantic relations between terms. In the case of official taxonomies, refining them requires keeping the hierarchy updated while preserving its original structure. Most automated taxonomy refinement approaches rely on word embeddings, but they rarely verify whether these models actually encode the semantic similarities defined by the taxonomy. To address this issue, we introduce TaxoRef, a methodology that (i) models semantic similarity between taxonomic concepts using a new metric called Hierarchical Semantic Similarity (HSS), (ii) evaluates how well embeddings preserve these similarity relations, and (iii) uses the best-performing embeddings to support taxonomy refinement. We apply TaxoRef to more than 2 million ICT job advertisements classified under the ESCO European taxonomy. The results show that HSS outperforms existing taxonomy similarity measures and that TaxoRef effectively captures similarities between occupations, providing useful insights for improving and updating the taxonomy.
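Steps (ii) and (iii) amount to a model-selection loop: score each candidate embedding set by how well it agrees with taxonomy-based similarities, then keep the best one. The sketch below assumes Spearman correlation (using the no-ties formula) as the agreement score and takes the taxonomy scores as given placeholders standing in for HSS; the candidate names `model_a` and `model_b` and their vectors are hypothetical.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def rank(values):
    """1-based ranks; assumes no tied values."""
    order = sorted(range(len(values)), key=values.__getitem__)
    r = [0] * len(values)
    for pos, idx in enumerate(order, start=1):
        r[idx] = pos
    return r


def spearman_no_ties(x, y):
    """Spearman's rho via 1 - 6 * sum(d^2) / (n * (n^2 - 1)), valid without ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rank(x), rank(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def select_embedding(candidates, gold):
    """candidates: {model_name: {word: vector}}; gold: [(a, b, taxonomy_score)]."""
    gold_scores = [s for _, _, s in gold]
    scored = {}
    for name, emb in candidates.items():
        model_scores = [cosine(emb[a], emb[b]) for a, b, _ in gold]
        scored[name] = spearman_no_ties(model_scores, gold_scores)
    best = max(scored, key=scored.get)  # highest agreement with the taxonomy wins
    return best, scored


candidates = {
    "model_a": {"developer": [1.0, 0.1], "programmer": [0.9, 0.2], "nurse": [0.1, 1.0]},
    "model_b": {"developer": [1.0, 0.0], "programmer": [0.0, 1.0], "nurse": [0.9, 0.1]},
}
gold = [("developer", "programmer", 0.9), ("developer", "nurse", 0.1), ("programmer", "nurse", 0.2)]
best, scored = select_embedding(candidates, gold)
print(best, scored)  # → model_a {'model_a': 1.0, 'model_b': -1.0}
```

Here `model_b` places "nurse" closer to "developer" than "programmer" is, so it ranks the pairs in exactly the opposite order to the placeholder taxonomy scores.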
Navid Nobani, Fabio Mercorio, Mario Mezzanzanica
Navid Nobani, Mauro Pelucchi, Matteo Perico, Andrea Scrivanti, Alessandro Vaccarino
Lorenzo Malandri, Fabio Mercorio, Mario Mezzanzanica, Navid Nobani
Taxonomies play a key role in many Semantic Web and natural language processing applications, as they organise knowledge and support machine understanding. However, maintaining and updating these hierarchies so that they accurately represent a domain is still a time-consuming and error-prone task. Word embeddings can help enrich taxonomies by capturing lexical and semantic similarities from text, but evaluating whether embeddings preserve the taxonomy’s semantic structure remains challenging. In this work, we introduce MEET, a methodology for generating and evaluating embeddings that preserve semantic similarity relations derived from a taxonomy. We also propose a new metric, Hierarchical Semantic Similarity (HSS), to measure similarity between taxonomic concepts. Experiments show that HSS outperforms existing similarity measures and that embeddings selected through MEET achieve better performance on benchmark tasks. To support reproducibility, we released an open-source repository containing all materials used in the study, including HSS scores for 35,000 word pairs.
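The abstract does not give the HSS formula, so for context the sketch below implements Wu-Palmer similarity, a classic path-based measure that scores two concepts by the depth of their lowest common ancestor relative to their own depths. It is shown only as an example of a taxonomy similarity measure; it is not HSS, and whether it was among the paper's baselines is not stated here. The toy hierarchy is hypothetical and assumed to have a single root.

```python
def ancestors(node, parent_of):
    """Path from node up to the root, node first."""
    path = [node]
    while node in parent_of:
        node = parent_of[node]
        path.append(node)
    return path


def depth(node, parent_of):
    return len(ancestors(node, parent_of))  # the root has depth 1


def wu_palmer(a, b, parent_of):
    """2 * depth(LCS) / (depth(a) + depth(b)); assumes a single shared root."""
    anc_b = set(ancestors(b, parent_of))
    lcs = next(n for n in ancestors(a, parent_of) if n in anc_b)  # lowest common ancestor
    return 2 * depth(lcs, parent_of) / (depth(a, parent_of) + depth(b, parent_of))


# Hypothetical hierarchy: child -> parent, rooted at "occupation"
parent_of = {
    "ICT professional": "occupation",
    "ICT technician": "occupation",
    "software developer": "ICT professional",
    "data scientist": "ICT professional",
    "network technician": "ICT technician",
}

print(wu_palmer("software developer", "data scientist", parent_of))      # ≈ 0.667
print(wu_palmer("software developer", "network technician", parent_of))  # ≈ 0.333
```

Purely path-based measures like this depend only on the hierarchy's shape, which is part of why taxonomy-specific metrics are proposed for comparing concepts whose depth or branching varies across the tree.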