Economists & University Researchers
Job Market Data for Academic Research
900M+ unique postings from 2022 to present, all 82 enriched fields retained for reproducibility
Longitudinal dataset for labor economics: wage dynamics, skill demand, remote work adoption
The Canaria dataset enables empirical analysis across multiple labor economics domains: wage dynamics, skill requirements, remote work trends, and firm-level hiring behavior. All enrichment methodology is fully documented and published for peer review, and every raw, parsed, and enriched field is retained so results are reproducible.
Common Challenges
- ✕ Commercial labor market datasets are black boxes with no published methodology, making peer review impossible
- ✕ Raw scraped data requires months of cleaning before analysis: deduplication, location parsing, and salary normalization
- ✕ Historical data with sufficient coverage for longitudinal studies is either unavailable or prohibitively expensive
- ✕ SOC codes in most commercial sources use title-only matching, which misclassifies roles with ambiguous titles
How Canaria Helps
- ✓ Full methodological documentation published as a PDF covering deduplication, SOC classification, and salary prediction
- ✓ All raw, parsed, and enriched fields retained in delivery for auditing and reproducibility checks
- ✓ Historical archive from 2022 to present with daily updates, suitable for longitudinal and panel studies
- ✓ Academic pricing available: 50% off standard tiers with .edu verification
Example Use Cases
1. Analyze wage dynamics by SOC code and MSA using stated and predicted salary fields from 2022 to present
2. Study remote work adoption trends using work mode classification across 900M+ postings over three years
3. Train NLP models for job description understanding using 900M+ records with SOC and skills ground truth labels
Relevant Data Fields
- correctDate
- soc
- salaryAvgAnnual
- nlpSkills
- remote
- seniority
- finalState
- companyIndustry

These are a subset of the 82 fields available in every Canaria record.
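As a rough illustration of the wage dynamics and remote work use cases, the sketch below computes median annual salary by year and SOC code, and the share of remote postings by year. It uses the field names listed above, but the record layout is an assumption: `correctDate` is treated as an ISO date string, `salaryAvgAnnual` as a number that may be missing, and `remote` as a boolean. The sample rows are invented for illustration and are not Canaria data; the actual delivery format and field types should be checked against the published documentation.

```python
from collections import defaultdict
from datetime import date
from statistics import median

# Hypothetical sample rows (invented values, assumed field types).
records = [
    {"correctDate": "2022-03-14", "soc": "15-1252", "salaryAvgAnnual": 110000, "remote": True},
    {"correctDate": "2022-07-02", "soc": "15-1252", "salaryAvgAnnual": 120000, "remote": False},
    {"correctDate": "2022-09-21", "soc": "29-1141", "salaryAvgAnnual": 80000, "remote": False},
    {"correctDate": "2023-01-10", "soc": "15-1252", "salaryAvgAnnual": 130000, "remote": True},
    {"correctDate": "2023-04-05", "soc": "15-1252", "salaryAvgAnnual": None, "remote": True},
    {"correctDate": "2023-06-18", "soc": "29-1141", "salaryAvgAnnual": 85000, "remote": False},
    {"correctDate": "2023-11-30", "soc": "29-1141", "salaryAvgAnnual": 90000, "remote": False},
]


def posting_year(row):
    """Extract the calendar year from the assumed ISO-formatted correctDate."""
    return date.fromisoformat(row["correctDate"]).year


def median_salary_by_year_soc(rows):
    """Median salaryAvgAnnual per (year, SOC code), skipping rows with no salary."""
    groups = defaultdict(list)
    for row in rows:
        salary = row.get("salaryAvgAnnual")
        if salary is None:
            continue
        groups[(posting_year(row), row["soc"])].append(salary)
    return {key: median(values) for key, values in sorted(groups.items())}


def remote_share_by_year(rows):
    """Fraction of postings flagged remote in each year."""
    totals = defaultdict(int)
    remote_counts = defaultdict(int)
    for row in rows:
        year = posting_year(row)
        totals[year] += 1
        if row.get("remote"):
            remote_counts[year] += 1
    return {year: remote_counts[year] / totals[year] for year in sorted(totals)}


medians = median_salary_by_year_soc(records)
shares = remote_share_by_year(records)

for (year, soc), value in medians.items():
    print(f"{year} {soc}: median salary {value:,.0f}")
for year, share in shares.items():
    print(f"{year}: remote share {share:.1%}")
```

The same grouping logic carries over to a dataframe or SQL engine at full scale; plain dictionaries are used here only to keep the sketch dependency-free. Grouping by MSA instead of SOC code would need a metro-area field, which is not among the fields listed above.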