Infinity Constellation

Senior Web Scraping Engineer — Labrynth

Reposted 15 Days Ago
Remote
Hiring Remotely in USA
Senior level
The Senior Web Scraping Engineer will design and maintain large-scale web scraping systems, focusing on automation, cloud workflows, and incorporating LLM-based techniques to enhance data extraction quality.
The summary above was generated by AI

Stack @ Labrynth: GCP · Python · Pydantic/PydanticAI · Docling · Django · Cloud Run · LLMs · GitHub · ClickUp · Selenium

About Labrynth:

Labrynth is a Silicon Valley startup building next-generation Hermeneutical-Agent systems: AI that can read, reason, and execute on the world’s most complex regulations.
Our Application Validator is live, performing audit-grade, evidence-grounded compliance checks. Next, we’re expanding the Application Generator to create regulator-ready drafts backed by verified data and citations.

You’ll help shape both — advancing safety, evaluation rigor, latency, and cost-efficiency across large-scale, production AI systems at the edge of applied research and real-world impact.

Our mission is to transform bureaucratic and complex processes using AI and automation, turning them into fast, transparent, and scalable pipelines. We are a spin-off of Invisible Technologies, the world's largest AI model trainer, and are backed by the Infinity Constellation group. We already work with enterprise clients, governments, and large-scale projects, so your work will have a real impact on accelerating major developments.

About the Role

We are looking for a Senior Web Scraping Engineer to design, build, and operate large-scale data collection systems. You will be responsible for developing robust scrapers with tools such as Selenium, BeautifulSoup, Playwright, and Scrapy, and for creating automated cloud workflows that run reliably on a schedule, generate logs, and surface failures proactively.

You will also experiment with and apply LLM-based techniques to improve scraping robustness and data extraction quality.


Key Responsibilities
  • Design, implement, and maintain web scraping pipelines for a wide variety of websites and data sources.

  • Build scrapers using tools and frameworks such as Selenium, Playwright, BeautifulSoup, Scrapy (and similar libraries) with a focus on reliability, performance, and maintainability.

  • Create automated workflows for scraping and data processing:

    • Containerize scraping jobs (e.g., using Docker).

    • Deploy and orchestrate them in the cloud (e.g., AWS, GCP, Azure).

    • Configure scheduling (e.g., run daily/weekly/hourly) and dependency management.

  • Implement monitoring, alerting, and logging:

    • Capture detailed logs for each job run.

    • Track job statuses and failures.

    • Implement notifications/alerts when a scraper breaks or a website changes.

  • Handle anti-bot measures (proxies, captchas, rate limits) and design scrapers that are resilient to layout and structure changes.

  • Work closely with data engineering / product / ML teams to understand data requirements and ensure data quality.

  • Utilize LLMs (Large Language Models) to:

    • Parse and extract structured information from messy HTML or semi-structured content.

    • Increase robustness of scrapers to frequent UI/DOM changes.

    • Prototype new scraping / extraction strategies using LLM APIs.

  • Write clean, well-tested, and well-documented code, and contribute to best practices, code reviews, and tooling for the team.

  • Continuously improve the scraping platform, including performance optimizations, standardization, and reusability of components.

Requirements
  • 3+ years of professional experience working with web scraping or data collection at scale.

  • Strong proficiency in Python and common scraping libraries/frameworks such as:

    • Selenium, Playwright, BeautifulSoup, Scrapy (or similar).

  • Solid understanding of HTML, CSS, JavaScript, HTTP, and browser behavior.

  • Experience building automated, production-grade workflows:

    • Orchestrators / schedulers (e.g., Airflow, Prefect, Dagster, or similar).

    • Building ETL/ELT pipelines and integrating with databases, data warehouses, or storage (e.g., PostgreSQL, BigQuery, S3, GCS).

  • Hands-on experience with cloud platforms (AWS, GCP, or Azure), including:

    • Deploying and running scheduled jobs.

    • Managing infrastructure-as-code or similar deployment processes.

  • Strong experience with logging, monitoring, and alerting:

    • Ability to design logging for scraping jobs and to debug failures from logs.

    • Familiarity with tools like CloudWatch, Stackdriver, ELK, Prometheus, Grafana, or similar.

  • Experience with containers (Docker) and familiarity with CI/CD workflows.

  • Exposure to LLMs (e.g., OpenAI, Anthropic, etc.) for tasks like parsing, information extraction, or automation.

  • Strong problem-solving skills and the ability to debug complex, dynamic websites.

  • Comfortable working in a fast-paced environment, with good communication skills in English.

Nice-to-Have
  • Experience with Kubernetes or other container orchestration systems.

  • Experience dealing with large-scale crawling, distributed scraping, and high-concurrency systems.

  • Familiarity with handling CAPTCHAs, rotating proxies, and headless browsers at scale.

  • Background in data engineering.

  • Contributions to open-source web scraping tools or frameworks.

Working Model
  • Remote-first; primary collaboration in Americas time zones with ~5 hours overlap.

  • Fully remote, flexible hours

  • Payment in USD (contractor/freelance basis)

    • Budget: 5,000 USD/month

  • Work on a global team, with real-world challenges and rapid growth opportunities

Top Skills

Airflow
BeautifulSoup
BigQuery
CloudWatch
Dagster
Docker
ELK
GCP
GCS
Grafana
LLMs
Playwright
PostgreSQL
Prefect
Prometheus
Python
S3
Scrapy
Selenium
Stackdriver
