Job Postings Data
The flagship Canaria dataset: 907M deduplicated job postings aggregated from 15+ sources and enriched with 82 structured fields through our NLP enrichment pipeline. Sources include Indeed (226M), LinkedIn (176M), PJF (216M), SimplyHired (105M), 200,000+ ATS employer career portals, CareerBuilder, and more. Semantic deduplication removes 40-60% of cross-source duplicates using vector similarity, MinHash/Jaccard, and graph-based transitive matching. Every record includes SOC classification, seniority (100% complete), salary prediction (MAPE <15%), work mode detection, and skills extracted from a taxonomy of 37,000+ skills.
All records are fully enriched through our NLP pipeline. Raw and enriched fields are delivered together for full transparency.
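As a rough illustration of how Jaccard similarity and transitive matching can combine, here is a minimal sketch. It is not Canaria's pipeline: it uses exact Jaccard on word shingles (MinHash approximates this at scale), a union-find structure for the transitive step, and an arbitrary threshold of 0.6. The function names and sample texts are hypothetical.

```python
from itertools import combinations


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles for a posting description."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find(parent: list[int], i: int) -> int:
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def dedup_clusters(postings: list[str], threshold: float = 0.6) -> list[list[int]]:
    """Group posting indices into duplicate clusters.

    Pairs at or above the (assumed) threshold are linked; union-find then
    merges links transitively, so A~B and B~C put A, B, C in one cluster.
    """
    sets = [shingles(p) for p in postings]
    parent = list(range(len(postings)))
    for i, j in combinations(range(len(postings)), 2):
        if jaccard(sets[i], sets[j]) >= threshold:
            parent[find(parent, i)] = find(parent, j)
    clusters: dict[int, list[int]] = {}
    for i in range(len(postings)):
        clusters.setdefault(find(parent, i), []).append(i)
    return list(clusters.values())


# Hypothetical postings: the first two differ by one trailing word.
postings = [
    "senior software engineer needed in mountain view to build backend services",
    "senior software engineer needed in mountain view to build backend services today",
    "registered nurse for night shift at nashville hospital",
]
clusters = dedup_clusters(postings)
```

The all-pairs loop is quadratic and only suitable for small samples; at hundreds of millions of records, candidate pairs would have to come from an index such as MinHash LSH or a vector search rather than exhaustive comparison.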
Key Highlights
- Multi-source semantic deduplication that removes 40-60% of cross-source duplicates using vector similarity, MinHash/Jaccard, and graph-based transitive matching
- Full NLP enrichment pipeline producing 82 structured fields per record
- SOC classification using title + description context (>95% at 2-digit, 85-92% at 6-digit)
- Seniority classification: 100% complete (always returns a value)
- Work mode detection (remote, hybrid, on-site) extracted from description text
- Salary prediction (MAPE <15%) trained on 50M+ Glassdoor/Indeed observations
- Source composition: Indeed 226M, LinkedIn 176M, PJF 216M, SimplyHired 105M, 200K+ ATS portals, and more
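For readers unfamiliar with the MAPE figure quoted for salary prediction, mean absolute percentage error is the average of |actual - predicted| / |actual|, expressed as a percentage. The sketch below shows the calculation on two made-up salaries; the numbers are illustrative only and are not drawn from the dataset.

```python
def mape(actual: list[float], predicted: list[float]) -> float:
    """Mean absolute percentage error, in percent.

    Assumes no actual value is zero (a zero salary would divide by zero).
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical example: each prediction is off by 10% of the actual salary.
example_actual = [100_000.0, 80_000.0]
example_predicted = [90_000.0, 88_000.0]
example_mape = mape(example_actual, example_predicted)
```

Under this definition, a MAPE below 15% means predictions deviate from the reference salaries by less than 15% on average, with over- and under-predictions counted equally.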
Use Cases
- Market research and competitive intelligence
- Economic indicators and labor market signals
- Workforce planning and talent strategy
- AI/ML training data
- Academic research and longitudinal analysis
- Recruiting and staffing intelligence
Sample Fields
jobTitle, companyName, location, description, datePosted, normTitle, soc, socTitle, seniority, salaryAvgAnnual, nlpSkills, nlpSoftSkills, remote, employment, srcBase
Delivery Formats
See This Data Live
Interactive charts from our 900M+ deduplicated job postings, updated daily.
Sample Records
A preview of real records from this dataset. Unlock all fields by requesting a free sample.
| Job Title | Company | City | State | Seniority | Work Mode | Min Salary | Max Salary | SOC Code | SOC Title | +8 more |
|---|---|---|---|---|---|---|---|---|---|---|
| Senior Software Engineer | | Mountain View | CA | Senior | Hybrid | 185,000 | 255,000 | 15-1252 | Software Developers | … |
| Data Analyst | JPMorgan Chase | New York | NY | Mid | On-site | 95,000 | 130,000 | 15-2051 | Data Scientists | … |
| Product Manager | Meta | Menlo Park | CA | Senior | Hybrid | 172,000 | 240,000 | 11-2021 | Marketing Managers | … |
| Registered Nurse | HCA Healthcare | Nashville | TN | Mid | On-site | 72,000 | 95,000 | 29-1141 | Registered Nurses | … |
| DevOps Engineer | Datadog | Boston | MA | Mid | Remote | 140,000 | 185,000 | 15-1244 | Network Architects | … |
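The SOC codes in the sample follow the standard `NN-NNNN` layout: the first two digits identify the major group (the "2-digit" level), and the full code identifies the detailed occupation (the "6-digit" level). A minimal sketch of rolling the sample codes up to major groups follows; the helper name is hypothetical and the codes are copied from the table above.

```python
import re
from collections import Counter

# SOC codes are two digits, a hyphen, then four digits, e.g. "15-1252".
SOC_PATTERN = re.compile(r"^(\d{2})-(\d{4})$")


def soc_major_group(code: str) -> str:
    """Return the 2-digit major group prefix of a 6-digit SOC code."""
    match = SOC_PATTERN.match(code)
    if match is None:
        raise ValueError(f"not a valid SOC code: {code!r}")
    return match.group(1)


# SOC codes from the five sample records shown above.
sample_codes = ["15-1252", "15-2051", "11-2021", "29-1141", "15-1244"]
major_group_counts = Counter(soc_major_group(c) for c in sample_codes)
```

Aggregating at the 2-digit level trades occupational detail for the higher classification accuracy quoted for that level.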
Raw vs. Enriched
See how Canaria transforms a basic job posting into a fully enriched record with 82 structured fields.
Raw Data
What scrapers give you
No SOC codes, no salary prediction, no skills extraction, no seniority, no deduplication...
Building enrichment in-house costs $500K-$1M in Year 1
Canaria Enriched
82 fields per record
+ 62 more fields (location, dedup metadata, qualifications, clearance, travel...)
Relevant Solutions
Labor Market Data for Investment Professionals
Hiring velocity as an economic leading indicator with SOC-level granularity
Job Market Data for Competitive Intelligence
Competitor hiring patterns, skills trends, and geographic expansion signals
Job Market Data for HR Tech Platforms
Add salary benchmarking and skills intelligence to your platform without building ML
Job Market Training Data for AI & ML Teams
Pre-enriched, deduplicated job market training data. Skip 6 months of pipeline building.
Job Market Data for Workforce Planning
Enterprise-grade labor market intelligence with full data transparency and no vendor lock-in
Job Market Data for Academic Research
Longitudinal dataset for labor economics: wage dynamics, skill demand, remote work adoption
Job Market Data for Recruiting & Staffing
Know where hiring is heating up, what competitors pay, and which skills are pulling demand, before your clients ask.
Job Market Data for Consulting Firms
Project-ready labor market data with no annual contract required.
Job Market Data for DEI and ESG Analytics
Benchmark employer benefits across 286M+ records: 401K, PTO, health insurance, equity compensation, and 17 more benefit types.
Job Market Data for Healthcare Workforce Planning
Track clinical hiring pipelines with degree-level granularity: 338M+ degree requirement records across nursing, allied health, and physician roles.
More Datasets
Canaria delivers five integrated datasets that join cleanly with each other.
Company Profiles
28.5M company profiles with firmographics, hiring signals, and industry classification
Salary Data
AI-predicted salaries (MAPE <15%) plus parsed posted salaries and ~11M Glassdoor reports
Skills & Occupation Taxonomy
37,000+ skills, 3,000+ certifications, SOC codes, and normalized titles
Google Maps Business Data
52M detailed + 193M basic business records with ratings, reviews, and geocoordinates