This job post is closed and the position is probably filled. Please do not apply.
About the job:

We are looking for two Senior Backend Engineers to develop and grow our crawling and extraction services. Our automated service is used directly by our customers via API, as well as by us for internal projects. Our extraction capabilities include automated product and article extraction from single pages or whole domains using machine learning and custom-built components, and we plan to expand them to jobs and news. The service is still in the early stages of development, serving its first customers.

As a professional services company, we are often required to build a custom crawling and extraction pipeline for a specific customer. That requires planning the crawl and extraction around the customer's needs, including crawling time estimation and hardware allocation. The volume is often very high, and solutions have to be properly designed to provide the required performance, reliability and maintainability.

Our platform has several components that communicate via Apache Kafka and use HBase as permanent storage. Most components are written in Python, while several crucial components are built with Scala and Kafka Streams. Currently, the main priorities are improving the reliability and scalability of the system, integration with other Scrapinghub services, and implementation of auto-scaling and other features.
This is going to be a challenging journey for every good Backend Engineer!

Job Responsibilities:

* Design and implementation of a large scale web crawling and extraction service.

* Solution architecture for large scale crawling and data extraction: design, hardware and development effort estimations, writing proposal drafts, explaining and motivating the solution for customers.

* Implementation and troubleshooting of Apache Kafka applications: workers, hardware estimation, performance tuning, debugging.

* Interaction with data science engineers and customers.

* Careful coding for critical and production environments, along with good communication and learning skills.

Requirements:

* Experience building at least one large scale data processing system or high load service, including an understanding of the CPU and memory cost of a given piece of code.

* Good knowledge of Python.

* Experience with any distributed messaging system (RabbitMQ, Kafka, ZeroMQ, etc.).

* Basic knowledge of Docker containers.

* Linux knowledge.

* Good communication skills in English.

* Understanding of different ways to solve a problem, and the ability to choose wisely between a quick hotfix, a long-term solution, or a design change.

Bonus points for:

* Kafka Streams and microservices based on Apache Kafka, understanding Kafka message delivery semantics and how to achieve them in practice.

* HBase: data model, selecting the access patterns, maintenance processes.

* Understanding how the web works: research on link structure, major components of link graphs.

* Algorithms and data structures background.

* Experience with web data processing tasks: web crawling, finding similar items, mining data streams, link analysis, etc.

* Experience with microservices.

* Experience with the JVM.

* Open source activity.

#Salary and compensation
The company did not publish salary data, so this range is an estimate based on similar Cloud, Senior, Engineer, Backend, Scala and Apache jobs:

$75,000 to $120,000/year

#Benefits
* 401(k)

* Distributed team

* Async

* Vision insurance

* Dental insurance

* Medical insurance

* Unlimited vacation

* Paid time off

* 4 day workweek

* 401k matching

* Company retreats

* Coworking budget

* Learning budget

* Free gym membership

* Mental wellness budget

* Home office budget

* Pay in crypto

* Pseudonymous

* Profit sharing

* Equity compensation

* No whiteboard interview

* No monitoring system

* No politics at work

* We hire old (and young)

# How do you apply?

This job post has been closed by the poster, which means they probably have enough applicants now. Please do not apply.