Data Schema Explorer
Browse field-level schemas for all five Canaria data products. Select a product below to explore its columns, types, and coverage.
For accuracy benchmarks and coverage rates on every enriched field, see our data quality benchmarks. For definitions of terms like SOC, MAPE, and semantic deduplication, see the glossary.
Job Postings: 82 fields, 907M rows
| Field Name | Type | Description | Example | Coverage |
|---|---|---|---|---|
| jobId | FixedString(64) | Primary key. SHA256 hash for deduplication. Unique per posting. | cj_9f8a2b3c4d... | 100% |
| jobUrl | String | Link to full job description page. May expire when job removed. | https://indeed.com/viewjob?jk=abc123 | 100% |
| sourceWebsite | String | Job board identifier (indeed, linkedin, glassdoor, greenhouse, lever, workday, etc.) | indeed | 100% |
| jobTitle | String | Raw job title exactly as posted by employer. | Sr. SW Eng II - Backend | 99% |
| jobDescription | String | Full job description HTML/text from the posting source. | <p>We are looking for a...</p> | 97% |
| companyName | String | Company name as listed in the posting, before normalization. | Macys Inc. | 96% |
| scrapedLocation | String | Location string as displayed in the original posting. | New York, NY 10001 | 93% |
| scrapedSalary | String | Salary text verbatim from the posting (when provided). | $90,000 - $120,000 a year | ~35% |
| jobDate | Date | Date the posting was first observed on the source. | 2025-11-15 | 98% |
| jobKey | String | Source-specific job ID from original board (Indeed ID, LinkedIn ID, etc.) | jk_abc123def456 | 100% |
| sourceCountry | String | Country code for regional job board version (us, ca, uk, etc.) | us | 100% |
| jobFunction | String | Job function/category assigned by job board (Engineering, Sales, etc.) | Engineering | ~60% |
| department | String | Organizational unit (Engineering, Sales, HR, etc.) | Engineering | ~30% |
| companyProfileUrl | String | Link to company profile on the job board. | https://indeed.com/cmp/Acme-Corp | ~70% |
| scrapedSeniority | String | Seniority from posting metadata (Entry Level, Senior, etc.) | Senior | ~25% |
| scrapedEmployment | String | Employment type from metadata (Full-time, Part-time, Contract). | Full-time | ~45% |
| scrapedBenefits | String | Benefits from structured posting fields. | Health, Dental, Vision, 401k | ~20% |
| scrapedResponsibilities | String | Job responsibilities if provided as structured field. | Design and implement backend services... | ~15% |
| scrapedQualifications | String | Required qualifications if provided as structured field. | Bachelor's degree in Computer Science... | ~15% |
| city | String | Parsed city name, standardized. | New York | 91% |
| state | String | Two-letter US state code. | NY | 93% |
| zip_code | String | 5-digit US ZIP code, parsed or geocoded. | 10001 | 78% |
| county | String | US county name derived from geocoding. | New York County | 76% |
| cbsa_code | String | Core-Based Statistical Area code (metro/micro area). | 35620 | 74% |
| latitude | Float64 | Latitude coordinate of the posting location. | 40.7128 | 75% |
| longitude | Float64 | Longitude coordinate of the posting location. | -74.0060 | 75% |
| finalCity | String | Best-available city (priority: parsed > scraped > calc). | New York | 91% |
| finalState | String | Best-available state. | NY | 93% |
| finalZipcode | String | Best-available zipcode. | 10001 | 78% |
| finalCountry | String | Best-available country. | US | 92.5% |
| parsedCity | String | City extracted by NLP parser. | New York | ~88% |
| parsedState | String | State extracted by NLP parser. | NY | ~90% |
| parsedCountry | String | Country extracted by NLP parser. | US | ~90% |
| calcCity | String | City from geocoding service. | New York | ~75% |
| calcState | String | State from geocoding. | NY | ~78% |
| nlpNormalizedTitle | String | Standardized job title ("SW Eng II" -> "Software Engineer"). | Software Engineer | ~100% |
| nlpNormalizedTitleScore | Float32 | Title normalization confidence (0.0-1.0). | 0.94 | ~100% |
| nlpSocCode | String | Standard Occupational Classification code (6-digit, BLS 2018). | 15-1252 | 85-90% |
| nlpSocTitle | String | Official BLS occupation title for the SOC code. | Software Developers | 85-90% |
| nlpSeniority | String | ML-classified seniority level. | Senior | 85-95% |
| nlpEmployment | String | ML-classified employment type. | Full-time | 85-90% |
| nlpRemote | String | ML-classified work mode. | Remote | 85%+ (2023+) |
| postingLanguage | String | Detected language of posting (ISO 639-1). | en | 95%+ |
| parsedAnnualSalaryMin | Float32 | Lower bound, normalized to annual USD. | 90000.0 | ~35% (stated) |
| parsedAnnualSalaryAvg | Float32 | Midpoint (min+max)/2 in annual USD. | 105000.0 | ~35% (stated) |
| parsedAnnualSalaryMax | Float32 | Upper bound, normalized to annual USD. | 120000.0 | ~35% (stated) |
| nlpSalary | Float32 | AI-predicted annual salary (MAPE <15%). | 108500.0 | 85-95% (2023+) |
| nlpDescriptionLength | UInt32 | Description character count (quality signal). | 4250 | 97% |
| nlpSkills | Array(String) | Technical/hard skills extracted from description. | ["Python", "SQL", "AWS", "Docker"] | 80-93% |
| nlpSoftSkills | Array(String) | Interpersonal and behavioral skills. | ["Communication", "Leadership", "Teamwork"] | 70-85% |
| nlpCertifications | Array(String) | Professional certifications required or preferred. | ["PMP", "AWS Solutions Architect", "CPA"] | ~30% |
| nlpDegreeLevels | Array(String) | Education degrees mentioned in posting. | ["Bachelor's", "Master's"] | ~60% |
| nlpDegreeLevelMin | String | Minimum acceptable degree for the position. | Bachelor's | ~55% |
| nlpQualifications | Array(String) | Other qualification requirements. | ["5+ years experience", "US citizen"] | ~50% |
| nlpExperienceRequirements | Array(String) | Years/type of experience extracted. | ["3-5 years software development"] | ~45% |
| nlpBenefits | Array(String) | Benefits extracted from description text. | ["401k", "Health Insurance", "PTO", "Dental"] | ~65% |
| nlpOffersVisaSponsorship | Boolean | Does employer offer visa sponsorship? Critical for international candidates. | true | ~15% |
| nlpRequiresClearance | Boolean | Does job require security clearance? Defense/gov sector filtering. | true | ~8% |
| nlpClearanceLevels | Array(String) | Specific clearance levels required. | ["Secret", "Top Secret"] | ~5% |
| nlpCitizenshipRequired | Array(String) | Citizenship/authorization requirements. | ["US Citizen", "Green Card"] | ~10% |
| nlpOffersEquity | Boolean | Does compensation include equity/stock? Startup indicator. | true | ~12% |
| nlpRequiresTravel | Boolean | Does the job require travel? | true | ~20% |
| nlpTravelPercentages | Array(String) | Quantified travel requirements. | ["25%", "50%"] | ~10% |
| nlpIsShiftWork | Boolean | Is this a shift-based position? | false | ~8% |
| nlpShiftTypes | Array(String) | Specific shift types when applicable. | ["Night", "Weekend", "Rotating"] | ~5% |
| nlpLanguagesRequired | Array(String) | Non-English language requirements. | ["Spanish", "Mandarin"] | ~8% |
| nlpIsManagerialRole | Boolean | Is this a people management position? | true | ~90% |
| nlpIsUrgentHiring | Boolean | Time-sensitive hire signal (market demand indicator). | true | ~5% |
| nlpNumberOfOpenings | Array(String) | How many positions available (hiring volume signal). | ["3"] | ~10% |
| nlpTeamSizes | Array(String) | Size of team this role joins (organizational context). | ["15-20"] | ~8% |
| nlpExpectedStartDates | Array(String) | When role is expected to start (hiring timeline). | ["2025-01-15"] | ~5% |
| companyIndustry | String | Industry classification. Company database value preferred, posting value as fallback. | Technology | ~85% |
| companySize | String | Employee count range. | 1001-5000 | ~70% |
| companyHqLocation | String | Headquarters location. | San Francisco, CA | ~65% |
| companyFoundedYear | Date | Company founding date. | 2010 | ~55% |
| companyRevenue | String | Revenue range. | $100M-$500M | ~50% |
| companyType | String | Organization type. | Private | ~60% |
| companyOfficeLocations | Array(String) | Office locations array. | ["SF", "NYC", "Austin"] | ~40% |
| contentId | FixedString(64) | Content hash for cross-stage joins. SHA256(norm(title) + norm(company) + raw(desc)). | a1b2c3d4e5... | 92% |
| firstScrapedTime | DateTime64 | First time posting was detected by scraper (UTC). | 2025-10-01T08:30:00Z | 100% |
| lastScrapedTime | DateTime64 | Most recent observation as active (UTC). Duration = active period. | 2025-11-15T14:22:00Z | 100% |
| firstModifyTime | DateTime64 | Earliest enrichment timestamp across all processing stages. | 2025-10-02T03:00:00Z | ~95% |
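
Three of the descriptions above are effectively small formulas: the `contentId` hash, the "best-available" priority for `finalCity`, and the active period between `firstScrapedTime` and `lastScrapedTime`. The sketch below illustrates them under stated assumptions. The `normalize` helper is hypothetical (the table does not define `norm()`), the three parts are assumed to be concatenated without separators, and "available" is assumed to mean non-empty. It is not the production pipeline.

```python
import hashlib
from datetime import datetime, timezone


def normalize(text: str) -> str:
    """Hypothetical stand-in for norm(): lowercase and collapse whitespace."""
    return " ".join(text.lower().split())


def content_id(title: str, company: str, description: str) -> str:
    """SHA256(norm(title) + norm(company) + raw(desc)) as 64 hex characters."""
    payload = normalize(title) + normalize(company) + description
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def best_available(parsed, scraped, calc):
    """Return the first non-empty value in the order parsed > scraped > calc."""
    for value in (parsed, scraped, calc):
        if value:
            return value
    return None


def active_days(first_scraped: str, last_scraped: str) -> float:
    """Days between firstScrapedTime and lastScrapedTime (UTC, ISO 8601)."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    start = datetime.strptime(first_scraped, fmt).replace(tzinfo=timezone.utc)
    end = datetime.strptime(last_scraped, fmt).replace(tzinfo=timezone.utc)
    return (end - start).total_seconds() / 86400


# Example values taken from the table rows above.
cid = content_id("Sr. SW Eng II - Backend", "Macys Inc.", "<p>We are looking for a...</p>")
city = best_available(None, "New York", "New York")
days = active_days("2025-10-01T08:30:00Z", "2025-11-15T14:22:00Z")
```

A 64-character hex digest fits the `FixedString(64)` type listed for `contentId`. Because the description is hashed raw, any change to the description text yields a different `contentId`, while case or spacing differences in the title and company do not (under this assumed normalizer).
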
Company Profiles: 36 fields, 28.5M rows
| Field Name | Type | Description | Example | Coverage |
|---|---|---|---|---|
| companyProfileUrlId | String | Primary key. Unique identifier derived from profile URL. | linkedin_acme-corp | 100% |
| companyNameId | String | Normalized company name identifier for joins. | acme_corp | 100% |
| companyName | String | Display name of the company. | Acme Corporation | 100% |
| companyProfileUrl | String | Full URL to the company profile page. | https://linkedin.com/company/acme-corp | 100% |
| companyProfileName | String | Profile slug or vanity name from the source. | acme-corp | 93% |
| companyKey | String | Internal key used for cross-table joins. | ck_acme_corp_123 | 89% |
| companyHeadquarters | String | Headquarters city and state/country. | San Francisco, CA | 77% |
| country | String | Country code of headquarters. | US | 81% |
| companyIndustry | String | Primary industry classification. | Technology | 84% |
| companySize | String | Employee count range bucket. | 1001-5000 | 83% |
| companyFoundDate | DateTime64 | Date the company was founded. | 2005-06-01 | 33% |
| companyWebsite | String | Company website URL. | https://acme.com | 68% |
| companyShortDesc | String | Short tagline or summary description. | Enterprise cloud infrastructure | 84% |
| companyLogoUrl | String | URL to company logo image. | https://media.licdn.com/dms/image/... | 87% |
| companyLocations | Array(String) | Array of all office locations. | ["San Francisco, CA", "New York, NY", "Austin, TX"] | 100% |
| companyAffiliatedPages | String | Affiliated company pages or subsidiaries. | Acme Labs, Acme Cloud | 28% |
| companyType | String | Organization type (Public, Private, Nonprofit, etc.). | Privately Held | 49% |
| companyEmployeeCount | UInt32 | Exact employee count when available. | 3500 | 82% |
| companySpecialties | String | Comma-separated list of company specialties. | Cloud Computing, AI, DevOps | 26% |
| companyFollowerCount | UInt32 | Number of followers on the profile platform. | 85000 | 80% |
| companyEmployees | String | JSON blob of featured employee profiles. | [{"name": "Jane Smith", "title": "CEO"}] | 75% |
| companySimilarPages | String | Similar company profiles suggested by LinkedIn. | Globex Corp, Initech | 66% |
| companyUpdates | String | Recent company posts or updates from the profile. | [{"text": "We're hiring!", "date": "2025-11-01"}] | 39% |
| companyDesc | String | Full company description / about text. | Acme Corporation is a leading provider of... | 14% |
| companyCeo | String | Name of the company CEO or top executive. | Jane Smith | 1% |
| companyRevenue | String | Revenue range bucket. | $1B-$5B | 6% |
| companyIndustryUrl | String | URL slug for the industry category on source platform. | /cmp/_industry/technology | 52% |
| companyRating | Float64 | Average employer rating (1.0-5.0 scale). | 4.2 | 29% |
| companyReviewCount | UInt32 | Total number of employer reviews. | 1250 | 29% |
| dbInsertTimestamp | DateTime | Timestamp when the record was inserted into ClickHouse. | 2025-11-01T12:00:00Z | 100% |
| firstScrapedTimestamp | DateTime | First time this company profile was scraped. | 2024-06-15T08:00:00Z | 100% |
| lastScrapedTimestamp | DateTime | Most recent scrape of this company profile. | 2025-11-15T14:00:00Z | 100% |
| lastModifiedTimestamp | DateTime | Last time any field in this record was updated. | 2025-11-10T09:30:00Z | 100% |
| companyScrapeStatus | String | Current scraping status (active, archived, error). | active | >99% |
| src | String | Source platform identifier. | | 100% |
| raw | JSON | Raw JSON blob from the original scrape for debugging. | {"raw_html": "..."} | 100% |
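
Several Company Profiles fields are typed `String` but carry structured content: `companySpecialties` is a comma-separated list and `companyEmployees` is a JSON blob. A minimal sketch of decoding both on the client side follows. The helper names are hypothetical, and the input shapes are assumed from the example values in the table.

```python
import json


def split_list(value: str) -> list[str]:
    """Split a comma-separated String field into trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def parse_json_field(value: str):
    """Decode a JSON-in-String field; return None if empty or malformed."""
    if not value:
        return None
    try:
        return json.loads(value)
    except json.JSONDecodeError:
        return None


# Example values taken from the table rows above.
specialties = split_list("Cloud Computing, AI, DevOps")
employees = parse_json_field('[{"name": "Jane Smith", "title": "CEO"}]')
```

Returning `None` for malformed input keeps a single bad row from aborting a batch. Whether that is the right policy depends on the consumer.
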
Google Maps: 44 fields, 52M detailed + 193M fast rows
| Field Name | Type | Description | Example | Coverage |
|---|---|---|---|---|
| id | String | Primary key. Unique place identifier. | 0x808fcb5... | 100% |
| cid | String | Google CID (customer ID) for the place. | 12345678901234567 | 100% |
| title | String | Business name as displayed on Google Maps. | Starbucks | 100% |
| unique_key | String | Deduplication key derived from name + location. | starbucks_sf_94105 | 100% |
| data_id | String | Google internal data identifier for the listing. | 0x808fcb5:0x1234abcd | 100% |
| address | String | Short-form address as shown on the listing. | 123 Market St | ~95% |
| complete_address | String | Full formatted address including city, state, zip. | 123 Market St, San Francisco, CA 94105 | ~90% |
| street | String | Street name and number. | 123 Market St | ~85% |
| city | String | City name. | San Francisco | ~95% |
| state | String | State or province abbreviation. | CA | ~95% |
| postal_code | String | ZIP or postal code. | 94105 | ~85% |
| country | String | Country code. | US | ~98% |
| latitude | Float64 | Latitude coordinate from Google Maps. | 37.7749 | ~99% |
| longitude | Float64 | Longitude coordinate from Google Maps. | -122.4194 | ~99% |
| plus_code | String | Google Plus Code for precise location. | 849VCWC8+R9 | ~80% |
| timezone | String | IANA timezone for the location. | America/Los_Angeles | ~75% |
| category | String | Primary Google Maps business category. | Coffee shop | ~95% |
| categories | Array(String) | All business categories assigned by Google. | ["Coffee shop", "Cafe", "Breakfast restaurant"] | ~90% |
| status | String | Operational status of the business. | Operational | ~85% |
| description | String | Business description from the listing. | Premium coffee and handcrafted beverages... | ~50% |
| about | String | About section with structured attributes (JSON). | {"Service options": ["Dine-in", "Takeout"]} | ~45% |
| price_range | String | Price level indicator. | $$ | ~60% |
| owner | String | Business owner or operator name. | Starbucks Corporation | ~40% |
| review_count | UInt32 | Total number of Google reviews. | 342 | ~90% |
| review_rating | Float32 | Average star rating (1.0-5.0). | 4.3 | ~90% |
| reviews_link | String | Direct link to the Google reviews page. | https://search.google.com/local/reviews?placeid=... | ~85% |
| reviews_per_rating | String | JSON breakdown of reviews by star count. | {"5": 180, "4": 90, "3": 40, "2": 20, "1": 12} | ~80% |
| popular_times | String | Hourly busyness data by day of week (JSON). | {"Monday": [{"hour": 8, "busyness": 30}, ...]} | ~35% |
| user_reviews | String | Sample of recent user reviews (JSON array). | [{"rating": 5, "text": "Great coffee!"}] | ~70% |
| user_reviews_extended | String | Extended review data with metadata (author, date, response). | [{"author": "John", "date": "2025-10", ...}] | ~50% |
| phone | String | Business phone number. | +1 (415) 555-0123 | ~75% |
| web_site | String | Business website URL. | https://www.starbucks.com/store-locator/... | ~65% |
| emails | Array(String) | Email addresses found on listing or website. | ["info@business.com"] | ~20% |
| thumbnail | String | URL to the listing thumbnail image. | https://lh5.googleusercontent.com/p/... | ~85% |
| images | String | JSON array of photo URLs from the listing. | ["https://lh5.googleusercontent.com/p/..."] | ~75% |
| url | String | Google Maps URL for this listing. | https://www.google.com/maps/place/... | ~98% |
| link | String | Short Google Maps link. | https://maps.google.com/?cid=12345... | ~98% |
| reservations | String | Reservation links or availability. | https://www.opentable.com/... | ~15% |
| order_online | String | Online ordering links. | https://order.starbucks.com/... | ~25% |
| menu | String | Menu link or structured menu data. | https://www.starbucks.com/menu | ~30% |
| open_hours | String | Operating hours by day of week (JSON). | {"Monday": "6:00 AM - 8:00 PM", ...} | ~70% |
| detailed_scraped_at | DateTime | Timestamp of the detailed scrape. | 2025-11-01T12:00:00Z | 100% |
| ch_insertion_time | DateTime64 | ClickHouse insertion timestamp. | 2025-11-01T12:05:00Z | 100% |
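
The `reviews_per_rating` field stores a JSON object keyed by star count, so it can be used to cross-check `review_count` and to recompute an average rating. The sketch below assumes the JSON shape shown in the example column. The function name is hypothetical.

```python
import json


def rating_summary(reviews_per_rating: str) -> tuple[int, float]:
    """Return (total reviews, mean star rating) from the JSON breakdown."""
    raw = json.loads(reviews_per_rating)
    counts = {int(stars): n for stars, n in raw.items()}
    total = sum(counts.values())
    if total == 0:
        return 0, float("nan")
    mean = sum(stars * n for stars, n in counts.items()) / total
    return total, mean


# Example value taken from the reviews_per_rating row above.
total, mean = rating_summary('{"5": 180, "4": 90, "3": 40, "2": 20, "1": 12}')
```

For the example breakdown the total is 342, which matches the `review_count` example. A recomputed mean need not equal `review_rating` exactly, since the two values may be captured at different times or rounded differently by the source.
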
Salary Data: 35 fields, 907M job postings (predicted) + 11M salary observations
| Field Name | Type | Description | Example | Coverage |
|---|---|---|---|---|
| parsedAnnualSalaryMin | Float32 | Lower bound of stated salary, normalized to annual USD. | 90000.0 | ~35% (stated) |
| parsedAnnualSalaryAvg | Float32 | Midpoint (min+max)/2 in annual USD. | 105000.0 | ~35% (stated) |
| parsedAnnualSalaryMax | Float32 | Upper bound of stated salary, normalized to annual USD. | 120000.0 | ~35% (stated) |
| nlpSalary | Float32 | AI-predicted annual salary trained on 50M+ observations (MAPE <15%). | 108500.0 | 85-95% (2023+) |
| scrapedSalary | String | Raw salary text verbatim from the posting. | $90,000 - $120,000/yr | ~35% |
| company_code | String | Glassdoor internal company identifier. | E1234 | 100% |
| company_name | String | Company name as listed on Glassdoor. | | 100% |
| job_title | String | Job title for this salary submission. | Software Engineer | 100% |
| location_raw | String | Raw location string from the salary submission. | San Francisco, CA | ~95% |
| city | String | Parsed city name. | San Francisco | ~90% |
| state | String | Parsed state abbreviation. | CA | ~90% |
| metro | String | Metro area designation. | San Francisco-Oakland-Berkeley, CA | ~85% |
| country | String | Country code. | US | ~98% |
| submitted_date | String | Date the salary was submitted by the employee. | 2025-03-15 | ~95% |
| scrape_time | DateTime64 | Timestamp when this record was scraped. | 2025-11-01T08:00:00Z | 100% |
| years_of_exp | String | Years of experience reported by the submitter. | 5-7 | ~60% |
| pay_json | String | Full pay breakdown as JSON (base, bonus, stock, etc.). | {"base": 150000, "bonus": 20000, "stock": 50000} | ~90% |
| total_pay_raw | String | Total compensation as reported (text). | $220,000 | ~85% |
| base_additional_raw | String | Base + additional pay text from Glassdoor. | $150K base + $70K additional | ~80% |
| base_pay | Float64 | Parsed base pay in USD. | 150000.0 | ~90% |
| additional_pay | Float64 | Additional pay (bonus, stock, commission) in USD. | 70000.0 | ~75% |
| anonymity_min | Float64 | Glassdoor anonymity range lower bound. | 140000.0 | ~80% |
| anonymity_max | Float64 | Glassdoor anonymity range upper bound. | 160000.0 | ~80% |
| pay_period | String | Pay frequency (Annual, Monthly, Hourly). | Annual | ~95% |
| currency_code | String | ISO currency code. | USD | ~98% |
| salary_min_annual | Float64 | Minimum annual salary (normalized from pay_period). | 140000.0 | ~85% |
| salary_max_annual | Float64 | Maximum annual salary (normalized from pay_period). | 180000.0 | ~85% |
| salary_avg_annual | Float64 | Average annual salary (midpoint of min/max). | 160000.0 | ~85% |
| source | String | Data source identifier. | glassdoor | 100% |
| source_file_type | String | File format of the source data. | json | 100% |
| source_file | String | Original source file path for traceability. | glassdoor_salaries_2025_q4.json | 100% |
| salary_detailed_url | String | URL to the Glassdoor salary detail page. | https://glassdoor.com/Salary/Google-Software-Engineer-... | ~90% |
| submitted_count | UInt32 | Total salary submissions for this job title. | 1250 | ~95% |
| submitted_count_company | UInt32 | Salary submissions for this title at this company. | 85 | ~90% |
| salary_general_url | String | URL to Glassdoor general salary page for this title. | https://glassdoor.com/Salaries/software-engineer-salary-... | ~95% |
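
The salary fields above involve three simple calculations: annualizing a figure from its `pay_period`, taking the midpoint of a min/max range, and measuring prediction error as MAPE. The sketch below illustrates each. The multipliers (12 months and 2,080 hours per year) are assumptions, since the table does not state which conversion factors are used.

```python
# Assumed conversion factors: 12 months, or 40 hours x 52 weeks = 2,080 hours.
PERIODS_PER_YEAR = {"Annual": 1, "Monthly": 12, "Hourly": 2080}


def annualize(amount: float, pay_period: str) -> float:
    """Convert an amount quoted per pay_period into an annual figure."""
    return amount * PERIODS_PER_YEAR[pay_period]


def midpoint(low: float, high: float) -> float:
    """Midpoint (min + max) / 2, as used for the *Avg / *_avg_annual fields."""
    return (low + high) / 2


def mape(actual: list[float], predicted: list[float]) -> float:
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    errors = [abs(p - a) / abs(a) for a, p in zip(actual, predicted)]
    return 100 * sum(errors) / len(errors)


# Example values taken from the table rows above.
avg_stated = midpoint(90000.0, 120000.0)
avg_observed = midpoint(140000.0, 180000.0)
hourly_as_annual = annualize(50.0, "Hourly")
```

With the example rows, the midpoints come out to 105000.0 and 160000.0, matching `parsedAnnualSalaryAvg` and `salary_avg_annual`.
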
Skills & Taxonomy: 30 fields, 907M rows (enrichment layer on job postings)
| Field Name | Type | Description | Example | Coverage |
|---|---|---|---|---|
| nlpSkills | Array(String) | Technical/hard skills extracted from job description. | ["Python", "SQL", "AWS", "Docker"] | 80-93% |
| nlpSoftSkills | Array(String) | Interpersonal and behavioral skills. | ["Communication", "Leadership", "Teamwork"] | 70-85% |
| nlpCertifications | Array(String) | Professional certifications required or preferred. | ["PMP", "AWS Solutions Architect", "CPA"] | ~30% |
| nlpQualifications | Array(String) | Other qualification requirements extracted from text. | ["5+ years experience", "US citizen"] | ~50% |
| nlpExperienceRequirements | Array(String) | Years and type of experience requirements. | ["3-5 years software development"] | ~45% |
| nlpDegreeLevels | Array(String) | Education degrees mentioned in posting. | ["Bachelor's", "Master's"] | ~60% |
| nlpDegreeLevelMin | String | Minimum acceptable degree for the position. | Bachelor's | ~55% |
| nlpBenefits | Array(String) | Benefits extracted from description text. | ["401k", "Health Insurance", "PTO"] | ~65% |
| nlpSocCode | String | Standard Occupational Classification code (6-digit, BLS 2018). | 15-1252 | 85-90% |
| nlpSocTitle | String | Official BLS occupation title for the SOC code. | Software Developers | 85-90% |
| nlpSeniority | String | ML-classified seniority level. | Senior | 85-95% |
| nlpEmployment | String | ML-classified employment type. | Full-time | 85-90% |
| nlpRemote | String | ML-classified work mode (Remote, Hybrid, On-site). | Remote | 85%+ (2023+) |
| nlpNormalizedTitle | String | Standardized job title via NLP normalization. | Software Engineer | ~100% |
| nlpNormalizedTitleScore | Float32 | Title normalization confidence score (0.0-1.0). | 0.94 | ~100% |
| nlpIsManagerialRole | Boolean | Is this a people management position? | true | ~90% |
| nlpIsUrgentHiring | Boolean | Time-sensitive hire signal (market demand indicator). | true | ~5% |
| nlpNumberOfOpenings | Array(String) | How many positions available (hiring volume signal). | ["3"] | ~10% |
| nlpTeamSizes | Array(String) | Size of team this role joins. | ["15-20"] | ~8% |
| nlpExpectedStartDates | Array(String) | When role is expected to start. | ["2025-01-15"] | ~5% |
| nlpOffersVisaSponsorship | Boolean | Does employer offer visa sponsorship? | true | ~15% |
| nlpRequiresClearance | Boolean | Does job require security clearance? | true | ~8% |
| nlpClearanceLevels | Array(String) | Specific clearance levels required. | ["Secret", "Top Secret"] | ~5% |
| nlpCitizenshipRequired | Array(String) | Citizenship/authorization requirements. | ["US Citizen", "Green Card"] | ~10% |
| nlpOffersEquity | Boolean | Does compensation include equity/stock? | true | ~12% |
| nlpRequiresTravel | Boolean | Does the job require travel? | true | ~20% |
| nlpTravelPercentages | Array(String) | Quantified travel requirements. | ["25%", "50%"] | ~10% |
| nlpIsShiftWork | Boolean | Is this a shift-based position? | false | ~8% |
| nlpShiftTypes | Array(String) | Specific shift types when applicable. | ["Night", "Weekend", "Rotating"] | ~5% |
| nlpLanguagesRequired | Array(String) | Non-English language requirements. | ["Spanish", "Mandarin"] | ~8% |
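
Because `nlpSkills` is an array and `nlpSocCode` follows the SOC format (two-digit major group, a hyphen, then four digits), a common first step is counting skill frequency within an occupation group. The sketch below uses hypothetical in-memory rows shaped like the table above. It is an illustration only, not a query against the actual dataset.

```python
from collections import Counter

# Hypothetical sample rows; field names follow the schema above.
rows = [
    {"nlpSocCode": "15-1252", "nlpSkills": ["Python", "SQL", "AWS", "Docker"]},
    {"nlpSocCode": "15-1252", "nlpSkills": ["Python", "Docker"]},
    {"nlpSocCode": "13-2011", "nlpSkills": ["Excel", "SQL"]},
    {"nlpSocCode": "15-1252", "nlpSkills": None},  # field not populated
]


def soc_major_group(soc_code: str) -> str:
    """Return the two-digit SOC major group, e.g. '15' for '15-1252'."""
    return soc_code.split("-")[0]


def skill_counts(rows, major_group: str) -> Counter:
    """Count postings per skill within one SOC major group."""
    counter = Counter()
    for row in rows:
        if soc_major_group(row["nlpSocCode"]) != major_group:
            continue
        # A set avoids double-counting a skill repeated within one posting.
        counter.update(set(row.get("nlpSkills") or []))
    return counter


computer_skills = skill_counts(rows, "15")
```

Rows where the array is missing are skipped rather than counted as empty, which matters when coverage is below 100%.
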