The International Journal of Information and Learning Technology
Purpose: This paper aims to demonstrate how skills taxonomies can be used in combination with machine learning to integrate diverse online datasets and reveal skills gaps. The purpose of this study is then to show how the skills gaps revealed by the integrated datasets can be used to achieve better labour market alignment, keep educational offerings up to date and assist graduates to communicate the value of their qualifications. Design/methodology/approach: Using the ESCO taxonomy and natural language processing, this study captures skills data from three types of online data (job ads, course descriptions and resumes), allowing us to compare demand for skills and supply of skills for three different occupations. Findings: This study illustrates three practical applications for the integrated data, showing how they can be used to help workers who are disrupted by technology to identify alternative career pathways, assist educators to identify gaps in their course offerings and support stude...
Timely and accurate statistics on the labour market enable policymakers to rapidly respond to changing economic conditions. Estimates of job vacancies by national statistical agencies are highly accurate but reported infrequently and with time lags. In contrast, online job postings provide a high-frequency indicator of vacancies with less accuracy. In this study we develop a robust signal averaging algorithm to measure job vacancies using online job postings data. We apply the algorithm using data on Australian job postings and show that it accurately predicts changes in job vacancies over a 4.5-year period. We also show that the algorithm is significantly more accurate than using raw counts of job postings to predict vacancies. The algorithm therefore offers a promising approach to the timely and reliable measurement of changes in vacancies.
Postgres database of followees (backup created by pg_dump in pgAdmin). Partitioned into 9 smaller files with the 'split' command. To assemble, use "cat followees[1-9] > combined_followees.backup".
Postgres database of tweets (backup created by pg_dump in pgAdmin). Partitioned into 5 smaller files with the 'split' command. To assemble, use "cat rawdata[1-5] > combined_rawdata.backup".
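Where the 'cat' command is not available (for example on a machine without a Unix shell), the same reassembly can be sketched in Python. This is a minimal sketch, not part of the original upload: the function name `reassemble` and its parameters are hypothetical, and it assumes the parts sit in one directory and are named with the prefix followed by a number starting at 1 (followees1 to followees9, rawdata1 to rawdata5), as the commands above imply.

```python
import shutil
from pathlib import Path


def reassemble(parts_dir, prefix, count, output_name):
    """Concatenate prefix1 .. prefixN, in numeric order, into one file.

    Hypothetical helper: mirrors `cat prefix[1-N] > output_name`.
    """
    parts_dir = Path(parts_dir)
    out_path = parts_dir / output_name
    with open(out_path, "wb") as out:
        for i in range(1, count + 1):
            # Binary mode: the backup is copied byte for byte, unchanged.
            with open(parts_dir / f"{prefix}{i}", "rb") as part:
                shutil.copyfileobj(part, out)
    return out_path


if __name__ == "__main__":
    # Assumed layout: all part files are in the current directory.
    reassemble(".", "followees", 9, "combined_followees.backup")
    reassemble(".", "rawdata", 5, "combined_rawdata.backup")
```

Opening each part explicitly, rather than globbing, means a missing part raises FileNotFoundError instead of silently producing a truncated backup.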
Information flow during catastrophic events is a critical aspect of disaster management. Modern communication platforms, in particular online social networks, provide an opportunity to study such flow and derive early-warning sensors, thus improving emergency preparedness and response. Performance of the social networks sensor method, based on topological and behavioral properties derived from the “friendship paradox”, is studied here for over 50 million Twitter messages posted before, during, and after Hurricane Sandy. We find that differences in users’ network centrality effectively translate into a moderate awareness advantage (up to 26 hours), and that geo-location of users within or outside of the hurricane-affected area plays a significant role in determining the scale of such an advantage. Emotional response appears to be universal regardless of the position in the network topology, and displays characteristic, easily detectable patterns, opening a possibility to implement a simple “sentiment sensing” technique that can detect and locate disasters.
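The friendship paradox behind the sensor method can be illustrated with a small sketch. The abstract does not spell out the selection procedure, so this assumes a common reading: a "control" group is a set of users, and the "sensor" group is formed by picking one random friend of each control user. All function names and the toy graph are hypothetical, and the graph is treated as undirected for simplicity.

```python
import random


def mean_degree(adj):
    """Average number of friends over users who have at least one friend."""
    nodes = [u for u in adj if adj[u]]
    return sum(len(adj[u]) for u in nodes) / len(nodes)


def mean_friend_degree(adj):
    """Average, over the same users, of the mean degree of their friends."""
    nodes = [u for u in adj if adj[u]]
    total = 0.0
    for u in nodes:
        total += sum(len(adj[v]) for v in adj[u]) / len(adj[u])
    return total / len(nodes)


def pick_sensors(adj, control, rng):
    """Assumed sensor selection: one uniformly random friend per control user."""
    return [rng.choice(sorted(adj[u])) for u in control]


if __name__ == "__main__":
    # Toy star network: user 0 is friends with users 1 to 4.
    star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
    print(mean_degree(star), mean_friend_degree(star))
    print(pick_sensors(star, [1, 2, 3], random.Random(0)))
```

On this five-node star the mean degree is 1.6 while the mean friend degree is 3.4, so friends of users are on average better connected than the users themselves. That degree gap is what lets a friend-based sensor group sit closer to the centre of the network than a randomly chosen one.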
Supporting materials for Bandit Strategies in Social Search: the case of the DARPA Red Balloon Challenge. (pdf)
China is the world's second largest economy. After four decades of economic miracles, China's economy is transitioning into an advanced, knowledge-based economy. Yet, we still lack a detailed understanding of the skills that underlie the Chinese labor force, and the development and spatial distribution of these skills. For example, the US standardized skill taxonomy O*NET played an important role in understanding the dynamics of manufacturing and knowledge-based work, as well as potential risks from automation and outsourcing. Here, we use Machine Learning techniques to bridge this gap, creating China's first workforce skill taxonomy and mapping it to O*NET. This enables us to reveal workforce skill polarization into social-cognitive skills and sensory-physical skills, to explore China's regional inequality in light of workforce skills, and to compare it to traditional metrics such as education. We build an online tool for the public and policy makers to explore the...
When facing threats from automation, a worker residing in a large Chinese city might not be as lucky as a worker in a large U.S. city, depending on the type of large city in which one resides. Empirical studies found that large U.S. cities exhibit resilience to automation impacts because of the increased occupational and skill specialization. However, in this study, we observe polarized responses in large Chinese cities to automation impacts. The polarization might be attributed to the elaborate master planning of the central government, through which cities are assigned different industrial goals to achieve globally optimal economic success and, thus, a fast-growing economy. By dividing Chinese cities into two groups based on their administrative levels and premium resources allocated by the central government, we find that Chinese cities follow two distinct industrial development trajectories: one trajectory, backed by government support, leads to a diversified industrial structur...
Immersive virtual- and augmented-reality headsets can overlay a flat image against any surface or hang virtual objects in the space around the user. The technology is rapidly improving and may, in the long term, replace traditional flat panel displays in many situations. When displays are no longer intrinsically flat, how should we use the space around the user for abstract data visualisation? In this paper, we ask this question with respect to origin-destination flow data in a global geographic context. We report on the findings of three studies exploring different spatial encodings for flow maps. The first experiment focuses on different 2D and 3D encodings for flows on flat maps. We find that participants are significantly more accurate with raised flow paths whose height is proportional to flow distance but fastest with traditional straight line 2D flows. In our second and third experiments, we compared flat maps, 3D globes and a novel interactive design we call MapsLink, involvi...
Collective search for people and information has tremendously benefited from emerging communication technologies that leverage the wisdom of the crowds, and has been increasingly influential in solving time-critical tasks such as the DARPA Network Challenge (DNC, also known as the Red Balloon Challenge). However, while collective search often invests significant resources in encouraging the crowd to contribute new information, the effort invested in verifying this information is comparable, yet often neglected in crowdsourcing models. This paper studies how the exploration-verification trade-off displayed by the teams modulated their success in the DNC, as teams had limited human resources that they had to divide between recruitment (exploration) and verification (exploitation). Our analysis suggests that team performance in the DNC can be modelled as a modified multi-armed bandit (MAB) problem, where information arrives at the team originating from sources of different levels of ve...
Uploads
Papers by Haohui Chen