Data Schema Explorer
Browse field-level schemas for all five Canaria data products. Select a product below to explore its columns, types, and coverage.
For accuracy benchmarks and coverage rates on every enriched field, see our data quality benchmarks. For definitions of terms like SOC, MAPE, and semantic deduplication, see the glossary.
Job Postings: 100 fields, 1B+ rows
| Field Name | Type | Description | Example | Coverage |
|---|---|---|---|---|
| jobID | FixedString(64) | Primary key. Per-posting SHA-256 key. Changes if the employer edits the title. | 9f8a2b3c4d... | 100% |
| sourceJobKey | String | Raw source-side posting ID (Indeed / LinkedIn / Greenhouse). Title-edit-stable; pair with sourceWebsite for stable joins. | jk_abc123def456 | 100% |
| sourceWebsite | String | Source the posting was collected from (indeed, linkedin, greenhouse, lever, workday, etc.). | indeed | 100% |
| sourceRegion | String | Geographic region tag from the source (country-level shard for global aggregators). | us | 100% |
| jobDate | DateTime64 | Canonical posting date for the role (UTC). Source-published timestamp when available, otherwise first-seen. | 2025-11-15 | ~98% |
| firstSeenDate | DateTime64 | First time the posting was observed in our pipeline (UTC). | 2025-10-01T08:30:00Z | 100% |
| lastSeenDate | DateTime64 | Most recent time the posting was observed active (UTC). Counterpart to firstSeenDate; together they bound the window in which the posting was observed. | 2025-11-15T14:22:00Z | 100% |
| contentID | FixedString(64) | Content-version hash over title + company + description. Join key for the job-description companion table. | a1b2c3d4e5... | ~97% |
| dedupJobID | FixedString(64) | Conceptual dedup / cluster key. Same value across near-duplicate postings of one role. Part of the AI-deduplication layer. | 7c2f0e9a11... | ~97% |
| jobTitle | String | Raw posting title exactly as published by the employer. | Sr. SW Eng II - Backend | 99% |
| jobURL | String | Direct link to the posting on the source site. | https://indeed.com/viewjob?jk=abc123 | 100% |
| atsApplicationURL | String | Link to the employer's ATS application form (Greenhouse, Lever, Workday) when the posting links out to one. | https://boards.greenhouse.io/acme/jobs/123 | ~70% |
| jobDescription | String | Full cleaned description text. Delivered as a separate companion table joined on contentID, not stored in the wide table. | We are looking for a... | ~97% |
| nlpPostingLanguage | String | Detected language of the posting (ISO 639-1). | en | 95%+ |
| extractedEmails | Array(String) | Email addresses found in the description text. | ["careers@acme.com"] | ~10% |
| extractedPhones | Array(String) | Phone numbers found in the description text. | ["+1-415-555-0100"] | ~8% |
| extractedURLs | Array(String) | URLs mentioned in the description text. Distinct from the posting's own jobURL. | ["https://acme.com/careers"] | ~25% |
| scrapedLocation | String | Location string exactly as published, before normalization. | New York, NY 10001 | 93% |
| finalCity | String | Resolved city in normalized form. | New York | 91% |
| finalState | String | Resolved state or region. US postings carry the 2-letter code. | NY | 93% |
| finalCountry | String | Resolved ISO 3166-1 alpha-2 country code, uppercase. | US | ~92% |
| finalZipcode | String | Resolved postal code when the source publishes one at posting level. | 10001 | 78% |
| scrapedSalary | String | Salary text verbatim from the posting, before normalization. | $90,000 - $120,000 a year | ~35% |
| annualSalaryMinimum | Float32 | Lower bound of the salary range, annualized. Annualized values above $5M are set to NULL. | 90000.0 | ~35% (stated) |
| annualSalaryAverage | Float32 | Midpoint of the salary range, annualized. | 105000.0 | ~35% (stated) |
| annualSalaryMaximum | Float32 | Upper bound of the salary range, annualized. | 120000.0 | ~35% (stated) |
| salaryCurrencySymbol | String | Raw currency glyph as it appeared on the posting. | $ | ~35% |
| salaryCurrencyCode | String | ISO 4217 currency code. Use for filtering and conversion. | USD | ~35% |
| salaryCurrencyDetectionSource | String | How the currency was determined (explicit symbol, ISO code, country, or fallback). | symbol | ~35% |
| nlpIsPaid | UInt8 | 1 for paid roles, 0 for unpaid (internships, volunteer), NULL when no signal. | 1 | ~90% |
| nlpBenefits | Array(String) | Benefits mentioned in the description text. | ["401k", "Health Insurance", "PTO"] | ~65% |
| mentionsEquityCompensation | UInt8 | 1 if the description mentions equity / RSU / stock compensation. | 1 | ~12% |
| nlpNormalizedTitle | String | Canonical job title from our taxonomy ("SW Eng II" -> "Software Engineer"). | Software Engineer | ~100% |
| nlpNormalizedTitleConfidence | Float32 | Model confidence for nlpNormalizedTitle (0.0-1.0). | 0.94 | ~100% |
| nlpSocCode | String | US SOC 2018 6-digit occupation code, classified from title and description. | 15-1252 | 85-90% |
| nlpSocTitle | String | Short label for the SOC code. | Software Developers | 85-90% |
| nlpSocConfidence | Float32 | Model confidence for nlpSocCode (0.0-1.0). | 0.91 | 85-90% |
| nlpSeniority | String | Seniority band classified from title and description. | Senior | 85-95% |
| nlpSeniorityTrack | String | Career track: ic, manager, or mixed. Distinguishes IC from manager at the same band. | ic | ~85% |
| nlpSeniorityConfidence | Float32 | Model confidence for nlpSeniority and nlpSeniorityTrack (0.0-1.0). | 0.88 | ~85% |
| mentionsManagerialRole | UInt8 | 1 if the description language indicates a people-management role. | 1 | ~90% |
| nlpEmploymentStatus | String | Employment status classification (full-time, part-time, contract, internship, temporary). | full-time | 85-90% |
| nlpEmploymentConfidence | Float32 | Model confidence for nlpEmploymentStatus (0.0-1.0). | 0.90 | 85-90% |
| nlpEmploymentTypeMentions | Array(String) | Employment types mentioned in the text. Verbatim mentions; nlpEmploymentStatus is the single classified value. | ["full-time", "contract"] | ~70% |
| nlpWorkerClassification | String | Staffing-market worker classification: w2, 1099, c2c, statutory_employee, or unknown. | w2 | ~80% |
| nlpRemoteStatus | String | Remote-work classification: Remote, Hybrid, On-Site, or Unknown. | Remote | 85%+ (2023+) |
| nlpRemoteConfidence | Float32 | Model confidence for nlpRemoteStatus (0.0-1.0). | 0.93 | 85%+ (2023+) |
| nlpHybridDaysPerWeek | Float32 | In-office days per week for hybrid roles. Populated only when nlpRemoteStatus = Hybrid. | 3 | ~10% |
| nlpRemoteWorkMentions | Array(String) | Remote-work phrases extracted from the text. | ["fully remote", "work from home"] | ~40% |
| mentionsRemoteWork | UInt8 | 1 if the description text mentions remote work. | 1 | ~40% |
| parsedRemotePercentages | Array(String) | Remote-percent strings parsed from the description (verbatim spans). | ["3 days in office"] | ~8% |
| remotePercentageLabels | Array(String) | Semantic label for each parsedRemotePercentages entry. | ["hybrid 3 days"] | ~8% |
| nlpShiftTypes | Array(String) | Shift-type phrases extracted (night shift, weekends, rotating). | ["Night", "Weekend"] | ~5% |
| mentionsShiftWork | UInt8 | 1 if the description mentions shift work. | 0 | ~8% |
| nlpTravelRequirements | Array(String) | Travel-requirement phrases extracted. | ["up to 25% travel"] | ~10% |
| mentionsTravelRequirement | UInt8 | 1 if the description mentions a travel requirement. | 1 | ~20% |
| nlpStartDateSignals | Array(String) | Start-date phrases extracted from the posting text. | ["ASAP", "flexible"] | ~5% |
| parsedStartDates | Array(String) | Start-date strings parsed from the description (verbatim spans). | ["2026-01-15"] | ~5% |
| startDateLabels | Array(String) | Semantic label for each parsedStartDates entry (immediate, future-dated, flexible). | ["immediate"] | ~5% |
| nlpTechnicalSkills | Array(String) | Technical / hard skills extracted from the description. | ["Python", "SQL", "AWS"] | 80-93% |
| nlpSoftSkills | Array(String) | Interpersonal and behavioral skills extracted. | ["Communication", "Leadership"] | 70-85% |
| nlpCertifications | Array(String) | Certifications named in the posting. | ["PMP", "AWS Solutions Architect"] | ~30% |
| nlpLicenseRequirements | Array(String) | Professional license requirements extracted. | ["active RN license", "valid CDL"] | ~15% |
| nlpQualifications | Array(String) | Free-text qualification mentions. | ["5+ years experience"] | ~50% |
| nlpDegreeLevels | Array(String) | All degree levels mentioned in the description. | ["Bachelor's", "Master's"] | ~60% |
| nlpMinimumDegree | String | Lowest degree level required, derived from nlpDegreeLevels. | Bachelor's | ~55% |
| nlpExperienceLevels | Array(String) | Experience-level cues mentioned (entry-level, senior, executive). | ["senior"] | ~45% |
| nlpExperienceRequirements | Array(String) | Years-of-experience strings parsed (verbatim spans). | ["3-5 years"] | ~45% |
| nlpLanguageRequirements | Array(String) | Non-English language requirements. | ["Spanish", "Mandarin"] | ~8% |
| nlpPhysicalRequirements | Array(String) | Physical-requirement phrases extracted. | ["lift 50 lbs"] | ~10% |
| nlpOffersVisaSponsorship | UInt8 | 1 if the role offers visa sponsorship, 0 if explicitly not, NULL when no signal. | 1 | ~15% |
| nlpVisaSponsorshipMentions | Array(String) | Visa-sponsorship phrases extracted from the posting. | ["H-1B sponsorship available"] | ~15% |
| mentionsVisaSponsorship | UInt8 | 1 if the description mentions visa sponsorship language. | 1 | ~15% |
| nlpCitizenshipRequirements | Array(String) | Citizenship-requirement phrases extracted. | ["US Citizen", "Green Card"] | ~10% |
| nlpRequiresSecurityClearance | UInt8 | 1 if the role requires a security clearance, 0 if explicitly not, NULL when no signal. | 1 | ~8% |
| nlpClearanceLevels | Array(String) | Security clearance levels mentioned. | ["Secret", "Top Secret"] | ~5% |
| mentionsClearanceRequirement | UInt8 | 1 if the description mentions a security clearance requirement. | 0 | ~8% |
| nlpScreeningRequirements | Array(String) | Screening-requirement phrases (background check, drug test, credit check). | ["background check"] | ~12% |
| nlpEeocStatement | String | EEOC / equal-opportunity statement category detected on the posting. | standard | ~40% |
| nlpAdvertiserType | String | Advertiser inferred from the posting context (direct employer, staffing agency, job board). | direct employer | ~80% |
| nlpCompanyTypes | Array(String) | Company-type cues mentioned in the posting (startup, fortune 500, non-profit). | ["startup"] | ~30% |
| nlpUrgencySignals | Array(String) | Urgency-signal phrases (immediate start, urgent hiring). | ["urgent hiring"] | ~5% |
| mentionsUrgentHiring | UInt8 | 1 if the description carries urgent-hiring language. | 0 | ~5% |
| companyName | String | Company name as published on the posting, before normalization. | Macys Inc. | 96% |
| companyProfileURL | String | Link to the employer's profile / company page on the source site. | https://indeed.com/cmp/Acme-Corp | ~70% |
| nlpNormalizedCompanyName | String | Canonical, deduplicated employer name from our company database. | Macy's | ~90% |
| nlpNormalizedIndustryName | String | Canonical industry (LinkedIn taxonomy, ~437 entries), deduplicated across spelling variants. | Retail | ~85% |
| naicsSector | String | NAICS-2022 sector (2-digit), derived from the canonical industry. | 44-45 | ~80% |
| naicsIndustryGroup | String | NAICS-2022 industry group (4-digit) when the industry is specific enough. | 4551 | ~70% |
| naicsCode | String | NAICS-2022 6-digit national industry code. NULL for industries too generic to map. | 455110 | ~60% |
| naicsTitle | String | Short label for the most-specific populated NAICS level. | Department Stores | ~80% |
| nlpNormalizedCompanyType | String | Canonical ownership / organizational type (Public Company, Privately Held, etc.). | Public Company | ~60% |
| companyEmployeeSizeBucket | String | Canonical employee-count bucket for the employer. | 10000+ | ~70% |
| companyEmployeeSizeMinimum | UInt32 | Lower bound (inclusive) of the employee-count bucket. | 10000 | ~70% |
| companyEmployeeSizeMaximum | UInt32 | Upper bound of the employee-count bucket; NULL for open-ended top buckets such as 10000+. | NULL | ~65% |
| companyRevenueMinimumUSD | Float64 | Lower bound (inclusive) of the employer's annual revenue range, in USD. | 1000000000 | ~50% |
| companyRevenueMaximumUSD | Float64 | Upper bound of the employer's annual revenue range, in USD; NULL for open-ended top buckets. | 5000000000 | ~50% |
| companyHeadquartersCountryISO2 | String | ISO 3166-1 alpha-2 country code (UPPERCASE) for the employer's headquarters country. | US | ~81% |
| companyHeadquartersCity | String | Employer headquarters city in normalized form. | New York | ~65% |
| companyHeadquartersState | String | Employer headquarters state or region. US employers carry the 2-letter code. | NY | ~60% |
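Because jobID changes when the employer edits the title, one source posting can appear under more than one jobID over time. The sketch below shows one way to collapse such rows on the title-edit-stable pair (sourceWebsite, sourceJobKey), keeping the most recently seen version, and then to count distinct roles with dedupJobID. It assumes each row is a Python dict keyed by the field names above, with lastSeenDate already parsed to a datetime; the helper names and sample rows are hypothetical and not part of any Canaria client library.

```python
from datetime import datetime
from typing import Any, Iterable

Row = dict[str, Any]


def stable_key(row: Row) -> tuple[str, str]:
    """Title-edit-stable key; jobID is avoided because it changes on title edits."""
    return (row["sourceWebsite"], row["sourceJobKey"])


def latest_versions(rows: Iterable[Row]) -> list[Row]:
    """Keep one row per (sourceWebsite, sourceJobKey): the most recently seen one."""
    latest: dict[tuple[str, str], Row] = {}
    for row in rows:
        key = stable_key(row)
        current = latest.get(key)
        if current is None or row["lastSeenDate"] > current["lastSeenDate"]:
            latest[key] = row
    return list(latest.values())


def count_distinct_roles(rows: Iterable[Row]) -> int:
    """Count near-duplicate clusters via dedupJobID; rows lacking one count individually."""
    clusters = set()
    for row in rows:
        clusters.add(row.get("dedupJobID") or ("unclustered", row["jobID"]))
    return len(clusters)


# Hypothetical sample: one Indeed posting whose title was edited, plus a LinkedIn copy.
rows = [
    {"jobID": "a1", "sourceWebsite": "indeed", "sourceJobKey": "jk_1",
     "dedupJobID": "c1", "lastSeenDate": datetime(2025, 11, 1)},
    {"jobID": "a2", "sourceWebsite": "indeed", "sourceJobKey": "jk_1",
     "dedupJobID": "c1", "lastSeenDate": datetime(2025, 11, 15)},
    {"jobID": "b1", "sourceWebsite": "linkedin", "sourceJobKey": "li_9",
     "dedupJobID": "c1", "lastSeenDate": datetime(2025, 11, 10)},
]
deduped = latest_versions(rows)
print(len(deduped))                   # 2
print(count_distinct_roles(deduped))  # 1
```

Joining on the stable pair first and clustering second keeps the two concerns separate: the first step removes edit history of a single posting, while dedupJobID (about 97% coverage) groups distinct postings that describe the same role.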
Company Profiles: 36 fields, 28.5M rows
| Field Name | Type | Description | Example | Coverage |
|---|---|---|---|---|
| companyProfileUrlId | String | Primary key. Unique identifier derived from profile URL. | linkedin_acme-corp | 100% |
| companyNameId | String | Normalized company name identifier for joins. | acme_corp | 100% |
| companyName | String | Display name of the company. | Acme Corporation | 100% |
| companyProfileUrl | String | Full URL to the company profile page. | https://linkedin.com/company/acme-corp | 100% |
| companyProfileName | String | Profile slug or vanity name from the source. | acme-corp | 93% |
| companyKey | String | Internal key used for cross-table joins. | ck_acme_corp_123 | 89% |
| companyHeadquarters | String | Headquarters city and state/country. | San Francisco, CA | 77% |
| country | String | Country code of headquarters. | US | 81% |
| companyIndustry | String | Primary industry classification. | Technology | 84% |
| companySize | String | Employee count range bucket. | 1001-5000 | 83% |
| companyFoundDate | DateTime64 | Date the company was founded. | 2005-06-01 | 33% |
| companyWebsite | String | Company website URL. | https://acme.com | 68% |
| companyShortDesc | String | Short tagline or summary description. | Enterprise cloud infrastructure | 84% |
| companyLogoUrl | String | URL to company logo image. | https://media.licdn.com/dms/image/... | 87% |
| companyLocations | Array(String) | Array of all office locations. | ["San Francisco, CA", "New York, NY", "Austin, TX"] | 100% |
| companyAffiliatedPages | String | Affiliated company pages or subsidiaries. | Acme Labs, Acme Cloud | 28% |
| companyType | String | Organization type (Public, Private, Nonprofit, etc.). | Privately Held | 49% |
| companyEmployeeCount | UInt32 | Exact employee count when available. | 3500 | 82% |
| companySpecialties | String | Comma-separated list of company specialties. | Cloud Computing, AI, DevOps | 26% |
| companyFollowerCount | UInt32 | Number of followers on the profile platform. | 85000 | 80% |
| companyEmployees | String | JSON blob of featured employee profiles. | [{"name": "Jane Smith", "title": "CEO"}] | 75% |
| companySimilarPages | String | Similar company profiles suggested by LinkedIn. | Globex Corp, Initech | 66% |
| companyUpdates | String | Recent company posts or updates from the profile. | [{"text": "We're hiring!", "date": "2025-11-01"}] | 39% |
| companyDesc | String | Full company description / about text. | Acme Corporation is a leading provider of... | 14% |
| companyCeo | String | Name of the company CEO or top executive. | Jane Smith | 1% |
| companyRevenue | String | Revenue range bucket. | $1B-$5B | 6% |
| companyIndustryUrl | String | URL slug for the industry category on source platform. | /cmp/_industry/technology | 52% |
| companyRating | Float64 | Average employer rating (1.0-5.0 scale). | 4.2 | 29% |
| companyReviewCount | UInt32 | Total number of employer reviews. | 1250 | 29% |
| dbInsertTimestamp | DateTime | Timestamp when the record was inserted into ClickHouse. | 2025-11-01T12:00:00Z | 100% |
| firstScrapedTimestamp | DateTime | First time this company profile was scraped. | 2024-06-15T08:00:00Z | 100% |
| lastScrapedTimestamp | DateTime | Most recent scrape of this company profile. | 2025-11-15T14:00:00Z | 100% |
| lastModifiedTimestamp | DateTime | Last time any field in this record was updated. | 2025-11-10T09:30:00Z | 100% |
| companyScrapeStatus | String | Current scraping status (active, archived, error). | active | >99% |
| src | String | Source platform identifier. |  | 100% |
| raw | JSON | Raw JSON blob from the original scrape for debugging. | {"raw_html": "..."} | 100% |
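Several Company Profiles fields are typed String but carry structured content: companySpecialties is a comma-separated list and companyEmployees is a JSON blob. A minimal sketch of decoding both, assuming individual specialties never contain commas themselves (an assumption, not something the schema guarantees); the function names are hypothetical.

```python
import json
from typing import Any, Optional


def split_comma_list(value: Optional[str]) -> list[str]:
    """Split a comma-separated text field (e.g. companySpecialties) into trimmed items."""
    if not value:
        return []
    return [item.strip() for item in value.split(",") if item.strip()]


def parse_json_field(value: Optional[str]) -> Any:
    """Decode a JSON-encoded text field (e.g. companyEmployees); None if empty or malformed."""
    if not value:
        return None
    try:
        return json.loads(value)
    except json.JSONDecodeError:
        return None


# Sample values copied from the Example column above.
profile = {
    "companyName": "Acme Corporation",
    "companySpecialties": "Cloud Computing, AI, DevOps",
    "companyEmployees": '[{"name": "Jane Smith", "title": "CEO"}]',
}

specialties = split_comma_list(profile["companySpecialties"])
employees = parse_json_field(profile["companyEmployees"]) or []
print(specialties)                      # ['Cloud Computing', 'AI', 'DevOps']
print([e["title"] for e in employees])  # ['CEO']
```

Returning None on malformed JSON rather than raising lets a bulk load continue past bad records; swap in an exception if you would rather surface them.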
Google Maps: 44 fields, 52M detailed rows + 193M fast rows
| Field Name | Type | Description | Example | Coverage |
|---|---|---|---|---|
| id | String | Primary key. Unique place identifier. | 0x808fcb5... | 100% |
| cid | String | Google CID (customer ID) for the place. | 12345678901234567 | 100% |
| title | String | Business name as displayed on Google Maps. | Starbucks | 100% |
| unique_key | String | Deduplication key derived from name + location. | starbucks_sf_94105 | 100% |
| data_id | String | Google internal data identifier for the listing. | 0x808fcb5:0x1234abcd | 100% |
| address | String | Short-form address as shown on the listing. | 123 Market St | ~95% |
| complete_address | String | Full formatted address including city, state, zip. | 123 Market St, San Francisco, CA 94105 | ~90% |
| street | String | Street name and number. | 123 Market St | ~85% |
| city | String | City name. | San Francisco | ~95% |
| state | String | State or province abbreviation. | CA | ~95% |
| postal_code | String | ZIP or postal code. | 94105 | ~85% |
| country | String | Country code. | US | ~98% |
| latitude | Float64 | Latitude coordinate from Google Maps. | 37.7749 | ~99% |
| longitude | Float64 | Longitude coordinate from Google Maps. | -122.4194 | ~99% |
| plus_code | String | Google Plus Code for precise location. | 849VCWC8+R9 | ~80% |
| timezone | String | IANA timezone for the location. | America/Los_Angeles | ~75% |
| category | String | Primary Google Maps business category. | Coffee shop | ~95% |
| categories | Array(String) | All business categories assigned by Google. | ["Coffee shop", "Cafe", "Breakfast restaurant"] | ~90% |
| status | String | Operational status of the business. | Operational | ~85% |
| description | String | Business description from the listing. | Premium coffee and handcrafted beverages... | ~50% |
| about | String | About section with structured attributes (JSON). | {"Service options": ["Dine-in", "Takeout"]} | ~45% |
| price_range | String | Price level indicator. | $$ | ~60% |
| owner | String | Business owner or operator name. | Starbucks Corporation | ~40% |
| review_count | UInt32 | Total number of Google reviews. | 342 | ~90% |
| review_rating | Float32 | Average star rating (1.0-5.0). | 4.3 | ~90% |
| reviews_link | String | Direct link to the Google reviews page. | https://search.google.com/local/reviews?placeid=... | ~85% |
| reviews_per_rating | String | JSON breakdown of reviews by star count. | {"5": 180, "4": 90, "3": 40, "2": 20, "1": 12} | ~80% |
| popular_times | String | Hourly busyness data by day of week (JSON). | {"Monday": [{"hour": 8, "busyness": 30}, ...]} | ~35% |
| user_reviews | String | Sample of recent user reviews (JSON array). | [{"rating": 5, "text": "Great coffee!"}] | ~70% |
| user_reviews_extended | String | Extended review data with metadata (author, date, response). | [{"author": "John", "date": "2025-10", ...}] | ~50% |
| phone | String | Business phone number. | +1 (415) 555-0123 | ~75% |
| web_site | String | Business website URL. | https://www.starbucks.com/store-locator/... | ~65% |
| emails | Array(String) | Email addresses found on listing or website. | ["info@business.com"] | ~20% |
| thumbnail | String | URL to the listing thumbnail image. | https://lh5.googleusercontent.com/p/... | ~85% |
| images | String | JSON array of photo URLs from the listing. | ["https://lh5.googleusercontent.com/p/..."] | ~75% |
| url | String | Google Maps URL for this listing. | https://www.google.com/maps/place/... | ~98% |
| link | String | Short Google Maps link. | https://maps.google.com/?cid=12345... | ~98% |
| reservations | String | Reservation links or availability. | https://www.opentable.com/... | ~15% |
| order_online | String | Online ordering links. | https://order.starbucks.com/... | ~25% |
| menu | String | Menu link or structured menu data. | https://www.starbucks.com/menu | ~30% |
| open_hours | String | Operating hours by day of week (JSON). | {"Monday": "6:00 AM - 8:00 PM", ...} | ~70% |
| detailed_scraped_at | DateTime | Timestamp of the detailed scrape. | 2025-11-01T12:00:00Z | 100% |
| ch_insertion_time | DateTime64 | ClickHouse insertion timestamp. | 2025-11-01T12:05:00Z | 100% |
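reviews_per_rating is delivered as a JSON string keyed by star count. The sketch below decodes it and recomputes the review total and the simple mean star rating. With the example histogram from the table, the total matches the example review_count of 342, while the simple mean comes out near 4.19 rather than the example review_rating of 4.3. Treat a recomputed mean as a rough cross-check, not a replacement for review_rating. The function name is hypothetical.

```python
import json


def rating_histogram_stats(reviews_per_rating: str) -> tuple[int, float]:
    """Return (total reviews, simple mean star rating) from a reviews_per_rating JSON string."""
    counts = {int(star): int(n) for star, n in json.loads(reviews_per_rating).items()}
    total = sum(counts.values())
    if total == 0:
        return 0, 0.0
    mean = sum(star * n for star, n in counts.items()) / total
    return total, mean


total, mean = rating_histogram_stats('{"5": 180, "4": 90, "3": 40, "2": 20, "1": 12}')
print(total)           # 342
print(round(mean, 2))  # 4.19
```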
Salary Data: 35 fields, 1B+ job-posting rows (predicted) + 11M salary-observation rows
| Field Name | Type | Description | Example | Coverage |
|---|---|---|---|---|
| parsedAnnualSalaryMin | Float32 | Lower bound of stated salary, normalized to annual USD. | 90000.0 | ~35% (stated) |
| parsedAnnualSalaryAvg | Float32 | Midpoint (min+max)/2 in annual USD. | 105000.0 | ~35% (stated) |
| parsedAnnualSalaryMax | Float32 | Upper bound of stated salary, normalized to annual USD. | 120000.0 | ~35% (stated) |
| nlpSalary | Float32 | AI-predicted annual salary from a model trained on 50M+ observations. Fuses three sources, with 95% confidence intervals per benchmark cell. | 108500.0 | 85-95% (2023+) |
| scrapedSalary | String | Raw salary text verbatim from the posting. | $90,000 - $120,000/yr | ~35% |
| company_code | String | Glassdoor internal company identifier. | E1234 | 100% |
| company_name | String | Company name as listed on Glassdoor. |  | 100% |
| job_title | String | Job title for this salary submission. | Software Engineer | 100% |
| location_raw | String | Raw location string from the salary submission. | San Francisco, CA | ~95% |
| city | String | Parsed city name. | San Francisco | ~90% |
| state | String | Parsed state abbreviation. | CA | ~90% |
| metro | String | Metro area designation. | San Francisco-Oakland-Berkeley, CA | ~85% |
| country | String | Country code. | US | ~98% |
| submitted_date | String | Date the salary was submitted by the employee. | 2025-03-15 | ~95% |
| scrape_time | DateTime64 | Timestamp when this record was scraped. | 2025-11-01T08:00:00Z | 100% |
| years_of_exp | String | Years of experience reported by the submitter. | 5-7 | ~60% |
| pay_json | String | Full pay breakdown as JSON (base, bonus, stock, etc.). | {"base": 150000, "bonus": 20000, "stock": 50000} | ~90% |
| total_pay_raw | String | Total compensation as reported (text). | $220,000 | ~85% |
| base_additional_raw | String | Base + additional pay text from Glassdoor. | $150K base + $70K additional | ~80% |
| base_pay | Float64 | Parsed base pay in USD. | 150000.0 | ~90% |
| additional_pay | Float64 | Additional pay (bonus, stock, commission) in USD. | 70000.0 | ~75% |
| anonymity_min | Float64 | Glassdoor anonymity range lower bound. | 140000.0 | ~80% |
| anonymity_max | Float64 | Glassdoor anonymity range upper bound. | 160000.0 | ~80% |
| pay_period | String | Pay frequency (Annual, Monthly, Hourly). | Annual | ~95% |
| currency_code | String | ISO 4217 currency code. | USD | ~98% |
| salary_min_annual | Float64 | Minimum annual salary (normalized from pay_period). | 140000.0 | ~85% |
| salary_max_annual | Float64 | Maximum annual salary (normalized from pay_period). | 180000.0 | ~85% |
| salary_avg_annual | Float64 | Average annual salary (midpoint of min/max). | 160000.0 | ~85% |
| source | String | Data source identifier. | glassdoor | 100% |
| source_file_type | String | File format of the source data. | json | 100% |
| source_file | String | Original source file path for traceability. | glassdoor_salaries_2025_q4.json | 100% |
| salary_detailed_url | String | URL to the Glassdoor salary detail page. | https://glassdoor.com/Salary/Google-Software-Engineer-... | ~90% |
| submitted_count | UInt32 | Total salary submissions for this job title. | 1250 | ~95% |
| submitted_count_company | UInt32 | Salary submissions for this title at this company. | 85 | ~90% |
| salary_general_url | String | URL to Glassdoor general salary page for this title. | https://glassdoor.com/Salaries/software-engineer-salary-... | ~95% |
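salary_min_annual and salary_max_annual are described as normalized from pay_period, and salary_avg_annual as the midpoint of the two. The exact conversion factors are not documented here, so the sketch below assumes 12 months per year and 2,080 working hours per year (40 hours x 52 weeks); both multipliers and the function names are hypothetical. It also ignores currency_code, so only compare results within a single currency.

```python
# Hypothetical multipliers; the factors behind salary_min_annual / salary_max_annual
# are not documented. 2,080 hours = 40 hours/week x 52 weeks.
PERIOD_MULTIPLIERS = {"Annual": 1.0, "Monthly": 12.0, "Hourly": 2080.0}


def annualize(amount: float, pay_period: str) -> float:
    """Convert an amount quoted per pay_period into an annual figure."""
    try:
        return amount * PERIOD_MULTIPLIERS[pay_period]
    except KeyError:
        raise ValueError(f"unsupported pay_period: {pay_period!r}") from None


def annual_range(low: float, high: float, pay_period: str) -> tuple[float, float, float]:
    """Return (min, max, midpoint) in annual terms; the midpoint is (min + max) / 2."""
    lo, hi = annualize(low, pay_period), annualize(high, pay_period)
    return lo, hi, (lo + hi) / 2


print(annual_range(140000, 180000, "Annual"))  # (140000.0, 180000.0, 160000.0)
print(annualize(50, "Hourly"))                 # 104000.0
```

The first call reproduces the example values of salary_min_annual, salary_max_annual, and salary_avg_annual from the table.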
Skills & Taxonomy: 30 fields, 1B+ rows (enrichment layer on job postings)
| Field Name | Type | Description | Example | Coverage |
|---|---|---|---|---|
| nlpSkills | Array(String) | Technical/hard skills extracted from job description. | ["Python", "SQL", "AWS", "Docker"] | 80-93% |
| nlpSoftSkills | Array(String) | Interpersonal and behavioral skills. | ["Communication", "Leadership", "Teamwork"] | 70-85% |
| nlpCertifications | Array(String) | Professional certifications required or preferred. | ["PMP", "AWS Solutions Architect", "CPA"] | ~30% |
| nlpQualifications | Array(String) | Other qualification requirements extracted from text. | ["5+ years experience", "US citizen"] | ~50% |
| nlpExperienceRequirements | Array(String) | Years and type of experience requirements. | ["3-5 years software development"] | ~45% |
| nlpDegreeLevels | Array(String) | Education degrees mentioned in posting. | ["Bachelor's", "Master's"] | ~60% |
| nlpDegreeLevelMin | String | Minimum acceptable degree for the position. | Bachelor's | ~55% |
| nlpBenefits | Array(String) | Benefits extracted from description text. | ["401k", "Health Insurance", "PTO"] | ~65% |
| nlpSocCode | String | Standard Occupational Classification code (6-digit, BLS 2018). | 15-1252 | 85-90% |
| nlpSocTitle | String | Official BLS occupation title for the SOC code. | Software Developers | 85-90% |
| nlpSeniority | String | ML-classified seniority level. | Senior | 85-95% |
| nlpEmployment | String | ML-classified employment type. | Full-time | 85-90% |
| nlpRemote | String | ML-classified work mode (Remote, Hybrid, On-site). | Remote | 85%+ (2023+) |
| nlpNormalizedTitle | String | Standardized job title via NLP normalization. | Software Engineer | ~100% |
| nlpNormalizedTitleScore | Float32 | Title normalization confidence score (0.0-1.0). | 0.94 | ~100% |
| nlpIsManagerialRole | Boolean | Is this a people management position? | true | ~90% |
| nlpIsUrgentHiring | Boolean | Time-sensitive hire signal (market demand indicator). | true | ~5% |
| nlpNumberOfOpenings | Array(String) | Number of open positions mentioned (hiring-volume signal). | ["3"] | ~10% |
| nlpTeamSizes | Array(String) | Size of the team this role joins. | ["15-20"] | ~8% |
| nlpExpectedStartDates | Array(String) | Expected start dates for the role. | ["2025-01-15"] | ~5% |
| nlpOffersVisaSponsorship | Boolean | Does employer offer visa sponsorship? | true | ~15% |
| nlpRequiresClearance | Boolean | Does job require security clearance? | true | ~8% |
| nlpClearanceLevels | Array(String) | Specific clearance levels required. | ["Secret", "Top Secret"] | ~5% |
| nlpCitizenshipRequired | Array(String) | Citizenship/authorization requirements. | ["US Citizen", "Green Card"] | ~10% |
| nlpOffersEquity | Boolean | Does compensation include equity/stock? | true | ~12% |
| nlpRequiresTravel | Boolean | Does the job require travel? | true | ~20% |
| nlpTravelPercentages | Array(String) | Quantified travel requirements. | ["25%", "50%"] | ~10% |
| nlpIsShiftWork | Boolean | Is this a shift-based position? | false | ~8% |
| nlpShiftTypes | Array(String) | Specific shift types when applicable. | ["Night", "Weekend", "Rotating"] | ~5% |
| nlpLanguagesRequired | Array(String) | Non-English language requirements. | ["Spanish", "Mandarin"] | ~8% |
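Array(String) fields such as nlpSkills combine naturally with the SOC hierarchy: in SOC 2018 the first two digits of a code identify its major group (15-1252 belongs to 15-0000, Computer and Mathematical Occupations). The sketch below counts the most frequent skills per major group and skips rows without nlpSocCode. Grouping at the major-group level is an illustrative choice, and the function names and sample postings are hypothetical. In the Job Postings table the equivalent skill field is named nlpTechnicalSkills.

```python
from collections import Counter, defaultdict
from typing import Any, Iterable


def soc_major_group(soc_code: str) -> str:
    """SOC 2018 major group: first two digits followed by -0000 (15-1252 -> 15-0000)."""
    return soc_code[:2] + "-0000"


def top_skills_by_major_group(
    postings: Iterable[dict[str, Any]], n: int = 3
) -> dict[str, list[tuple[str, int]]]:
    """Count nlpSkills per SOC major group, skipping postings with no nlpSocCode."""
    counters: dict[str, Counter] = defaultdict(Counter)
    for posting in postings:
        soc = posting.get("nlpSocCode")
        if not soc:
            continue  # roughly 10-15% of rows lack a SOC code per the coverage column
        counters[soc_major_group(soc)].update(posting.get("nlpSkills") or [])
    return {group: counter.most_common(n) for group, counter in counters.items()}


postings = [
    {"nlpSocCode": "15-1252", "nlpSkills": ["Python", "SQL", "AWS"]},
    {"nlpSocCode": "15-1253", "nlpSkills": ["Python", "Selenium"]},
    {"nlpSocCode": None, "nlpSkills": ["Excel"]},
]
result = top_skills_by_major_group(postings, n=2)
print(result)
```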