Skills & Occupation Taxonomy
Comprehensive skills and occupation taxonomy extracted from 900M+ job postings. It includes 37,000+ technical skills, 3,000+ professional certifications, and 400+ soft skills, identified through a two-step process: dictionary matching followed by NLP relevance filtering to remove spurious matches. Coverage exceeds 85% for descriptions longer than 200 characters, with typically 5-15 skills per posting (2023+ postings). SOC classification uses title and description context rather than title-only matching, achieving >95% accuracy at the 2-digit level and 85-92% at the 6-digit level. The taxonomy match rate exceeds 90%.
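The two-step process above can be illustrated with a minimal Python sketch. It assumes a tiny hypothetical dictionary (SKILL_DICTIONARY) and stands in for the NLP relevance model with a simple context-word heuristic (AMBIGUOUS_CONTEXT), so it shows the shape of the technique rather than the production pipeline.

```python
import re
from dataclasses import dataclass

# Hypothetical mini-dictionary; the real taxonomy has 37,000+ entries.
SKILL_DICTIONARY = ["python", "go", "sql", "kubernetes", "excel"]

# Stand-in for the NLP relevance filter (an assumption, not the real model):
# ambiguous terms are kept only if a supporting context word appears nearby.
AMBIGUOUS_CONTEXT = {
    "go": {"golang", "programming", "backend", "developer", "services"},
}


@dataclass
class Match:
    skill: str
    start: int
    end: int


def dictionary_match(text: str, dictionary: list[str]) -> list[Match]:
    """Step 1: find whole-word, case-insensitive occurrences of each skill."""
    matches = []
    for skill in dictionary:
        pattern = re.compile(r"\b" + re.escape(skill) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            matches.append(Match(skill, m.start(), m.end()))
    return matches


def is_relevant(match: Match, text: str, window: int = 60) -> bool:
    """Step 2: drop matches whose surrounding context does not support them."""
    required = AMBIGUOUS_CONTEXT.get(match.skill)
    if required is None:
        return True  # unambiguous terms pass through unchanged
    context = text[max(0, match.start - window): match.end + window].lower()
    words = set(re.findall(r"[a-z]+", context))
    return bool(words & required)


def extract_skills(text: str) -> list[str]:
    """Run both steps and return unique skills in order of first appearance."""
    kept = [m for m in dictionary_match(text, SKILL_DICTIONARY) if is_relevant(m, text)]
    skills: list[str] = []
    for m in sorted(kept, key=lambda item: item.start):
        if m.skill not in skills:
            skills.append(m.skill)
    return skills


print(extract_skills("Backend developer writing Go services with SQL and Kubernetes."))
print(extract_skills("You will go above and beyond using Excel."))
```

In the second call, "go" matches the dictionary but is discarded by the filter because no supporting context word appears nearby, which is the kind of false positive the relevance step targets.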
All taxonomy fields are derived from our NLP enrichment pipeline. Skills, SOC codes, seniority, and normalized titles are available on every job posting record.
Key Highlights
- 37,000+ technical skills, 3,000+ certifications, 400+ soft skills: one of the largest commercial skills taxonomies available
- Two-step extraction: dictionary matching + NLP relevance filtering removes false positives
- Separate hard skills, soft skills, and certification fields for clean downstream use
- SOC classification using title + description context: >95% accuracy at 2-digit, 85-92% at 6-digit
- Skills extraction F1 score: 85-92% on bulleted/structured sections, 65-78% on narrative text
- Taxonomy match rate >90%, continuously updated as new postings are processed
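The headline metrics can be checked against your own labeled data using standard definitions. This sketch assumes F1 is computed over the set of skills extracted per posting, and that SOC accuracy compares predicted and gold codes truncated to the first 2 digits (the SOC major group, e.g. "15") or all 6 digits (the detailed occupation, e.g. "15-1252"). The function names are hypothetical.

```python
def f1_score(predicted: set[str], gold: set[str]) -> float:
    """Harmonic mean of precision and recall over one posting's skill sets."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def soc_level(code: str, digits: int) -> str:
    """Truncate an SOC code such as "15-1252" to its first `digits` digits."""
    return code.replace("-", "")[:digits]


def soc_accuracy(pairs: list[tuple[str, str]], digits: int) -> float:
    """Share of (predicted, gold) code pairs that agree at the given level."""
    correct = sum(soc_level(p, digits) == soc_level(g, digits) for p, g in pairs)
    return correct / len(pairs)


pairs = [
    ("15-1252", "15-1252"),  # exact match
    ("15-1241", "15-1244"),  # right major group, wrong detailed occupation
    ("29-1141", "15-1252"),  # wrong major group
    ("29-1141", "29-1141"),  # exact match
]
print(f1_score({"python", "sql", "excel"}, {"python", "sql", "tableau", "aws"}))
print(soc_accuracy(pairs, digits=2), soc_accuracy(pairs, digits=6))
```

Because a 6-digit match implies a 2-digit match, accuracy at the 2-digit level is always at least as high as at the 6-digit level, which is consistent with the ranges quoted above.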
Use Cases
- Skills gap analysis and workforce development
- Curriculum alignment for educational institutions
- Talent matching and job recommendation engines
- Emerging skills detection and trend forecasting
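For emerging skills detection, one simple approach is to compare each skill's share of postings between two periods. This sketch assumes each record carries the nlpSkills list plus a hypothetical year field; it illustrates the idea and is not a forecasting model.

```python
from collections import Counter


def skill_share(postings: list[dict], year: int) -> dict[str, float]:
    """Fraction of postings in `year` that mention each skill at least once."""
    subset = [p for p in postings if p["year"] == year]
    if not subset:
        return {}
    counts = Counter(skill for p in subset for skill in set(p["nlpSkills"]))
    return {skill: c / len(subset) for skill, c in counts.items()}


def emerging_skills(postings: list[dict], base_year: int, target_year: int) -> list[tuple[str, float]]:
    """Skills seen in the target year, ranked by change in posting share."""
    base = skill_share(postings, base_year)
    target = skill_share(postings, target_year)
    growth = {skill: share - base.get(skill, 0.0) for skill, share in target.items()}
    return sorted(growth.items(), key=lambda kv: kv[1], reverse=True)


postings = [
    {"year": 2022, "nlpSkills": ["python", "sql"]},
    {"year": 2022, "nlpSkills": ["sql", "excel"]},
    {"year": 2023, "nlpSkills": ["python", "llm"]},
    {"year": 2023, "nlpSkills": ["llm", "sql"]},
]
print(emerging_skills(postings, 2022, 2023))
```

Using share of postings rather than raw counts keeps the comparison fair when posting volume differs between periods.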
Sample Fields
- nlpSkills
- nlpSoftSkills
- nlpCertifications
- nlpQualifications
- nlpSocCode
- nlpSocTitle
- nlpNormalizedTitle
- nlpNormalizedTitleScore
- nlpSeniority
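A typed view of these fields can help when loading sample files. In this sketch the field names come from the sample schema, but the Python types (lists of strings, an optional float title score) and the from_dict helper are assumptions; keys outside the listed fields are ignored.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class TaxonomyFields:
    # Field names follow the sample schema; the types are assumptions.
    nlpSkills: list[str] = field(default_factory=list)
    nlpSoftSkills: list[str] = field(default_factory=list)
    nlpCertifications: list[str] = field(default_factory=list)
    nlpQualifications: list[str] = field(default_factory=list)
    nlpSocCode: Optional[str] = None  # e.g. "15-1252"
    nlpSocTitle: Optional[str] = None  # e.g. "Software Developers"
    nlpNormalizedTitle: Optional[str] = None
    nlpNormalizedTitleScore: Optional[float] = None  # assumed confidence score
    nlpSeniority: Optional[str] = None  # e.g. "Mid", "Senior", "Lead"

    def soc_major_group(self) -> Optional[str]:
        """First two digits of the SOC code, i.e. the SOC major group."""
        return self.nlpSocCode[:2] if self.nlpSocCode else None


def from_dict(raw: dict) -> TaxonomyFields:
    """Build a TaxonomyFields from a raw record, ignoring unknown keys."""
    known = {f.name for f in fields(TaxonomyFields)}
    return TaxonomyFields(**{k: v for k, v in raw.items() if k in known})


record = from_dict({
    "nlpSocCode": "15-1212",
    "nlpSocTitle": "Information Security Analysts",
    "nlpSkills": ["siem"],
    "companyName": "CrowdStrike",  # not a taxonomy field, so it is dropped
})
print(record.soc_major_group(), record.nlpSkills)
```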
See This Data Live
Interactive charts from our 900M+ deduplicated job postings, updated daily.
Sample Records
A preview of real records from this dataset. Unlock all fields by requesting a free sample.
| Job Title | Company | City | State/Province | Seniority | Work Mode | Min Salary | Max Salary | SOC Code | SOC Title |
|---|---|---|---|---|---|---|---|---|---|
| Full Stack Developer | Shopify | Ottawa | ON | Mid | Remote | 120,000 | 165,000 | 15-1252 | Software Developers |
| Cybersecurity Analyst | CrowdStrike | Austin | TX | Mid | Hybrid | 105,000 | 145,000 | 15-1212 | Information Security Analysts |
| Data Engineer | Snowflake | San Mateo | CA | Senior | Hybrid | 165,000 | 225,000 | 15-1252 | Software Developers |
| Cloud Architect | Accenture | Atlanta | GA | Lead | Remote | 175,000 | 240,000 | 15-1241 | Computer Network Architects |
| AI Research Scientist | OpenAI | San Francisco | CA | Senior | On-site | 250,000 | 400,000 | 15-2051 | Data Scientists |
Relevant Solutions
Job Market Data for HR Tech Platforms
Add salary benchmarking and skills intelligence to your platform without building your own ML models.
Job Market Training Data for AI & ML Teams
Pre-enriched, deduplicated job market training data. Skip 6 months of pipeline building.
Job Market Data for Academic Research
Longitudinal dataset for labor economics: wage dynamics, skill demand, and remote work adoption.
Job Market Data for Consulting Firms
Project-ready labor market data with no annual contract required.
Job Market Data for Healthcare Workforce Planning
Track clinical hiring pipelines with degree-level granularity: 338M+ degree requirement records across nursing, allied health, and physician roles.
More Datasets
Canaria delivers five integrated datasets that join cleanly with each other.
Job Postings Data
900M+ deduplicated job postings from Indeed, LinkedIn, ATS, and 15+ sources
Company Profiles
28.5M company profiles with firmographics, hiring signals, and industry classification
Salary Data
AI-predicted salaries (MAPE <15%) plus parsed posted salaries and ~11M Glassdoor reports
Google Maps Business Data
52M detailed + 193M basic business records with ratings, reviews, and geocoordinates