Case Study: Automatically Extracting Skills from Job Descriptions and Resumes

For years, companies have used traditional resume parsers to extract job skills from candidates' resumes. These are typically built into applicant tracking systems (ATS) and are used to screen candidates and rank them based on their skills. However, these systems can be brittle, and the extracted skills are usually noisy and difficult to use. Recent improvements in natural language processing have made it possible to greatly improve the accuracy of job skill extraction. In this post, we'll show how to use Taylor's off-the-shelf job skill extraction model to get a list of canonical skills from a job description or resume. Then, we'll show how to do this with a custom skills taxonomy.

How Skill Extraction Works

When gathering information about a candidate's skills, the goal is usually to get a list of known skills that can be compared across candidates, or compared against the requirements of the job description. For this reason, merely extracting raw text from a resume or job description is not that useful on its own—a candidate might list "Node", while the job description lists "Node.js". This is why it's important to use a taxonomy of ground-truth skills, and ensure that a candidate's skills are mapped to the corresponding skills in the taxonomy.
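To make the mismatch concrete, here is a minimal sketch. The skill sets and the `ALIASES` table are made up for illustration; a real taxonomy lookup would be far larger and would not rely on a hand-written dictionary.

```python
# Raw skill strings as they might appear in each document (illustrative values).
resume_skills = {"Node", "React"}
job_skills = {"Node.js", "React"}

# Comparing raw strings misses the Node / Node.js pair.
raw_overlap = resume_skills & job_skills

# Hypothetical alias table mapping raw mentions to canonical taxonomy entries.
ALIASES = {"node": "Node.js", "node.js": "Node.js", "react": "React"}


def canonicalize(skills):
    """Map each raw skill to its canonical name, dropping unknown skills."""
    return {ALIASES[s.lower()] for s in skills if s.lower() in ALIASES}


canonical_overlap = canonicalize(resume_skills) & canonicalize(job_skills)

print(sorted(raw_overlap))        # ['React']
print(sorted(canonical_overlap))  # ['Node.js', 'React']
```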

Mismatch between raw skills from résumé and job description

At Taylor, we have a couple of off-the-shelf job skill taxonomies—if you're just getting started, these are a great place to begin. Once you've settled on a taxonomy, the challenge is to a) reliably extract skills from a document, and b) find the right skills (if any) in the taxonomy.

Semantic and string matching

At Taylor, we use two approaches to surface possible matches. One is simple string-matching—if the raw text contains the ground-truth skill, or has a low edit distance, it's considered a candidate match. The other approach is to use a pre-trained embedding model to find the closest semantic matches. This allows us to surface "front-end development" as a candidate match for "React", for example. Finally, we use a ranking model to score the candidates, and return the top match, or "No match" if the extracted skill isn't in the taxonomy.
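The flow described above can be sketched roughly as follows. This is not Taylor's implementation: the two-dimensional `TOY_EMBEDDINGS` vectors stand in for a real pre-trained embedding model, the thresholds are arbitrary, and the `score` function is a simple placeholder for a trained ranking model. All function names are hypothetical.

```python
import math

TAXONOMY = ["Node.js", "React", "Front-End Development"]

# Hypothetical 2-d vectors standing in for a pre-trained embedding model.
TOY_EMBEDDINGS = {
    "node": [0.15, 0.85],
    "node.js": [0.10, 0.90],
    "react": [0.90, 0.10],
    "front-end development": [0.80, 0.20],
}


def edit_distance(a, b):
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def string_candidates(raw, taxonomy, max_distance=3):
    """Skills contained in the raw text, or within a small edit distance of it."""
    raw_l = raw.lower()
    return {
        s for s in taxonomy
        if s.lower() in raw_l or edit_distance(raw_l, s.lower()) <= max_distance
    }


def semantic_candidates(raw, taxonomy, min_similarity=0.9):
    """Skills whose toy embedding is close to the raw text's toy embedding."""
    vec = TOY_EMBEDDINGS.get(raw.lower())
    if vec is None:
        return set()
    return {
        s for s in taxonomy
        if cosine(vec, TOY_EMBEDDINGS[s.lower()]) >= min_similarity
    }


def score(raw, skill):
    """Placeholder for a ranking model: best of string and semantic similarity."""
    a, b = raw.lower(), skill.lower()
    string_sim = 1 - edit_distance(a, b) / max(len(a), len(b))
    vec = TOY_EMBEDDINGS.get(a)
    semantic_sim = cosine(vec, TOY_EMBEDDINGS[b]) if vec else 0.0
    return max(string_sim, semantic_sim)


def resolve(raw, taxonomy=TAXONOMY):
    """Return the top-scoring candidate, or "No match" if there are none."""
    candidates = string_candidates(raw, taxonomy) | semantic_candidates(raw, taxonomy)
    if not candidates:
        return "No match"
    return max(candidates, key=lambda s: score(raw, s))


print(resolve("Node"))
print(resolve("Underwater basket weaving"))
```

With these toy vectors, "Front-End Development" is surfaced as a semantic candidate for "React", but the exact string match still ranks first. Unioning the two candidate sets before ranking keeps each matcher simple and leaves the final decision to a single scoring step.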

Testing and Deploying our Extraction Models

After creating an account, you can access the Extraction Playground, which lets you test our off-the-shelf job skill extraction models. The two pre-trained models are job-skills-200 and job-skills-1000; choose based on your preferred level of granularity.

Navigating to the extraction playground

Selecting a model

After pasting in an excerpt from a résumé or job description, click "Extract entities" to see the results in the right-hand panel. Each result comes with a confidence score, indicating how similar the raw text of the extracted skill is to the matched skill from the taxonomy.

Extracted & resolved job skill

Once you're ready to put extraction into production, it's as simple as creating an API key and copying a few lines of Python or JavaScript code into your application. When using our pre-trained models, make sure to prefix your chosen model with "taylor:" to indicate that you're using one of our models.

import json

import requests

api_key = "xx-your-api-key-here"

res = requests.post(
    "https://api.trytaylor.ai/api/entities/extract",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        # The "taylor:" prefix selects one of the pre-trained models.
        "pipeline_name": "taylor:job-skills-200",
        # One entry per document to extract skills from.
        "texts": [
            "I have 5 years of experience with Node and am an experienced React and Next.js developer"
        ],
    },
)

# Pretty-print the JSON response.
print(json.dumps(res.json(), indent=2))

The model will return a response like:

[
  [
    {
      "canonical_name": "Next.js",
      "matching_score": 1.0
    },
    {
      "canonical_name": "Node.js",
      "matching_score": 1.0
    },
    {
      "canonical_name": "React",
      "matching_score": 1.0
    }
  ]
]
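Once you have a response in this shape (one list of matches per input text), comparing a candidate against a job's requirements becomes a set operation. The sketch below hard-codes the example response; the `min_score` threshold and the `required_skills` set are assumptions chosen for illustration.

```python
# Response shape copied from the example above: one list of matches per input text.
response = [
    [
        {"canonical_name": "Next.js", "matching_score": 1.0},
        {"canonical_name": "Node.js", "matching_score": 1.0},
        {"canonical_name": "React", "matching_score": 1.0},
    ]
]


def skills_for_text(matches, min_score=0.8):
    """Keep canonical names whose score clears a (hypothetical) threshold."""
    return {m["canonical_name"] for m in matches if m["matching_score"] >= min_score}


candidate_skills = skills_for_text(response[0])

# Hypothetical requirements, already expressed as canonical taxonomy names.
required_skills = {"Node.js", "React", "PostgreSQL"}

matched = sorted(candidate_skills & required_skills)
missing = sorted(required_skills - candidate_skills)

print("Matched:", matched)  # Matched: ['Node.js', 'React']
print("Missing:", missing)  # Missing: ['PostgreSQL']
```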

Using a Custom Taxonomy

Off-the-shelf models are great to get started, but depending on the industries you're focused on, it's possible that you'll want to use a taxonomy with more coverage of the skills you're most interested in. For example, if you're focused on data science, you might want more depth on statistical modeling skills, and less focus on soft skills like "communication" and "teamwork". Luckily, creating an extraction model for a custom taxonomy takes seconds.

From the Dashboard, you'll select "Extractions" from the "Build" tab, and click "Create Extraction" at the top. To provide the taxonomy, you can upload a CSV file with a "label" column and a "label_description" column, like this:

label	label_description
Time Management	Ability to efficiently organize and prioritize tasks to meet deadlines
Communication	Skill in conveying information clearly and effectively, both verbally and in writing
Problem Solving	Capacity to analyze complex issues and develop innovative solutions
Teamwork	Ability to collaborate effectively with others to achieve common goals
Adaptability	Flexibility to adjust to new situations and changing work environments

Descriptions are optional (i.e. the second column can be blank), but they can help with disambiguation in complex taxonomies. You'll also need to specify the sort of entity you want to extract, which in this case is "job skill". It will take a few seconds to create the extraction, and when it's done, you can go back to the playground and use it just like the pre-trained models.
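If your taxonomy lives in code or a database, you can generate the CSV rather than writing it by hand. Here is a minimal sketch using Python's standard csv module. The data science rows and the `write_taxonomy` helper are invented for illustration; only the "label" and "label_description" column names come from the format described above.

```python
import csv
import io

# Illustrative rows: (label, optional description). A blank description is allowed.
rows = [
    ("Linear Regression", "Fitting and interpreting linear models"),
    ("Time Series Forecasting", "Predicting future values from historical data"),
    ("A/B Testing", ""),
]


def write_taxonomy(rows, fileobj):
    """Write a taxonomy CSV with the two expected column headers."""
    writer = csv.writer(fileobj)
    writer.writerow(["label", "label_description"])
    for label, description in rows:
        writer.writerow([label, description])


# Write to an in-memory buffer here; pass an open file to save to disk instead.
buffer = io.StringIO()
write_taxonomy(rows, buffer)
csv_text = buffer.getvalue()

print(csv_text)
```

To write to disk, open the file with `newline=""` (as the csv module documentation recommends) and pass it in place of the buffer.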

Conclusion

In this post, we discussed the importance of accurate skill extraction, and the challenges of resolving messy extracted skills to a ground-truth taxonomy. Thanks to Taylor's entity extraction solution, anyone can start extracting skills to enrich job descriptions and parse candidate résumés. To get started, create an account and try out the extraction playground. Interested in a different extraction or classification use case? Talk to us to see how we can help.
