Beyond Data Scientists: a Review of Big Data Skills and Job Families

Greco, Marco; Grimaldi, Michele
2016-01-01

Abstract

Purpose – This paper aims to shed light on the heterogeneous nature of the skills required to ‘win’ with Big Data by analysing a large number of job posts published online. More specifically, we: 1) identify the most important ‘job families’ related to Big Data; 2) recognize homogeneous groups of skills (skillsets) that are most sought after by companies; 3) characterize each job family by the level of competence required within each Big Data skillset.

Design/methodology/approach – We implement a semi-automated, fully reproducible analytical methodology that can cope with the large number of job posts obtained by scraping some of the most popular online job search portals. Job families are identified through expert evaluation of the most important keywords appearing in job post titles. Skillsets, instead, are obtained using Latent Dirichlet Allocation (LDA), an unsupervised machine learning algorithm for topic modelling of text. Finally, we characterize the job families through a measure of the relative importance of each skillset.

Originality/value – This study represents one of the first attempts to classify jobs into families and describe them in terms of skill requirements by means of a large-scale, semi-automated analysis of job posts based on machine learning algorithms. To do so, we propose an original combination of analytical techniques, each well established in prior scientific work. The characterization of job families through text mining and topic modelling techniques is innovative and can be reapplied in future studies of any other professional field.

Practical implications – This paper brings clarity to the multifaceted nature of Big Data competency requirements and job role types. Our results can concretely help business leaders and HR managers devise clearer strategies for acquiring the skills needed to make the best use of Big Data. In addition, the structured classification of job families and skillsets will help establish a common language for the job market, through which supply and demand can meet more effectively.
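The methodology involves two simple computational steps around the LDA stage: ranking the most frequent keywords in job post titles so that an expert can group them into job families, and scoring each job family on each skillset. The following is a minimal, hypothetical Python sketch of those two steps, not the authors' implementation. The abstract does not specify the tokenization, the stop-word list, or the exact "relative importance" measure, so the function names, `STOP_WORDS`, and the ratio-of-means score used here are all assumptions. The per-post skillset weights (`doc_topics`) are assumed to come from any LDA implementation, with each row holding one post's topic proportions.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"and", "of", "the", "for", "in", "with", "a", "an", "to"}


def top_title_keywords(titles: Sequence[str], k: int = 10) -> List[Tuple[str, int]]:
    """Count lower-cased, whitespace-split title tokens (minus stop words)
    and return the k most frequent, as candidates for expert review."""
    counts = Counter(
        token
        for title in titles
        for token in title.lower().split()
        if token not in STOP_WORDS
    )
    return counts.most_common(k)


def relative_importance(
    doc_topics: Sequence[Sequence[float]],
    families: Sequence[str],
) -> Dict[str, List[float]]:
    """Assumed measure: for each job family, divide its mean weight on each
    topic (skillset) by the corpus-wide mean weight of that topic.
    Values above 1 mean the skillset is over-represented in that family."""
    n_docs = len(doc_topics)
    n_topics = len(doc_topics[0])
    overall = [sum(row[t] for row in doc_topics) / n_docs for t in range(n_topics)]

    # Group each post's topic distribution under its (expert-assigned) family label.
    grouped: Dict[str, List[Sequence[float]]] = defaultdict(list)
    for row, family in zip(doc_topics, families):
        grouped[family].append(row)

    result: Dict[str, List[float]] = {}
    for family, rows in grouped.items():
        means = [sum(r[t] for r in rows) / len(rows) for t in range(n_topics)]
        result[family] = [m / o if o > 0 else 0.0 for m, o in zip(means, overall)]
    return result


if __name__ == "__main__":
    titles = ["Big Data Engineer", "Data Scientist", "Senior Data Scientist", "Hadoop Engineer"]
    print(top_title_keywords(titles, k=3))
    # prints [('data', 3), ('engineer', 2), ('scientist', 2)]

    # Toy two-topic weights for four posts (invented for illustration only).
    doc_topics = [[0.8, 0.2], [0.2, 0.8], [0.3, 0.7], [0.7, 0.3]]
    families = ["engineer", "scientist", "scientist", "engineer"]
    scores = relative_importance(doc_topics, families)
    print({f: [round(v, 2) for v in vals] for f, vals in scores.items()})
    # prints {'engineer': [1.5, 0.5], 'scientist': [0.5, 1.5]}
```

A ratio against the corpus mean is used here because it makes families comparable even when some skillsets are common to almost every post. A difference of means or a per-family normalization would be equally plausible readings of "relative importance".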
Year: 2016
ISBN: 978-88-96687-09-3

Use this identifier to cite or link to this document: https://hdl.handle.net/11580/55712