This job post is closed and the position is probably filled. Please do not apply.
About the job:

We are looking for two Senior Backend Engineers to develop and grow our crawling and extraction services. Our automated service is used directly by our customers via API, as well as by us for internal projects. Our extraction capabilities include automated product and article extraction from single pages or whole domains using machine learning and custom-built components, and we plan to expand to jobs and news. The service is still in the early stages of development, serving its first customers.

As a professional services company, we are often required to build a custom crawling and extraction pipeline for a specific customer. That requires crawl and extraction planning with respect to customer needs, including crawling-time estimation and hardware allocation. The volume is often very high, and solutions have to be properly designed to provide the required performance, reliability, and maintainability.

Our platform has several components communicating via Apache Kafka and using HBase as permanent storage. Most components are written in Python, while several crucial components are built with Scala and Kafka Streams. Current priorities are improving the reliability and scalability of the system, integration with other Scrapinghub services, and implementation of auto-scaling and other features.
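The crawl-planning step mentioned above largely comes down to throughput arithmetic. A minimal sketch of that kind of estimate (the function name, rates, and overhead factor are illustrative assumptions, not Scrapinghub's actual planning tooling):

```python
def estimate_crawl_hours(num_pages, pages_per_sec_per_worker, workers, overhead=1.2):
    """Rough crawl-time estimate: total pages divided by aggregate
    throughput, padded by an overhead factor for retries, politeness
    delays, and transient failures. All parameter values are illustrative."""
    throughput = pages_per_sec_per_worker * workers  # pages/sec across the fleet
    seconds = num_pages / throughput * overhead
    return seconds / 3600.0

# Example: 10M pages at 5 pages/s per worker, with 20 workers
hours = estimate_crawl_hours(10_000_000, 5, 20)  # ≈ 33.3 hours
```

An estimate like this can be inverted to answer the hardware-allocation question instead: fix the deadline and solve for the number of workers.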
This is going to be a challenging journey for every good Backend Engineer!

Job Responsibilities:

* Design and implement a large-scale web crawling and extraction service.
* Architect solutions for large-scale crawling and data extraction: design, hardware and development-effort estimation, writing proposal drafts, explaining and motivating the solution to customers.
* Implement and troubleshoot Apache Kafka applications: workers, hardware estimation, performance tuning, debugging.
* Interact with data science engineers and customers.
* Write code carefully for critical production environments, with good communication and a willingness to learn.

Requirements:

* Experience building at least one large-scale data processing system or high-load service, with an understanding of the CPU and memory cost of a given piece of code.
* Good knowledge of Python.
* Experience with a distributed messaging system (RabbitMQ, Kafka, ZeroMQ, etc.).
* Docker container basics.
* Linux knowledge.
* Good communication skills in English.
* Understanding of the ways a problem can be solved, and the ability to choose wisely between a quick hotfix, a long-term solution, and a design change.

Bonus points for:

* Kafka Streams and microservices based on Apache Kafka, including an understanding of Kafka message delivery semantics and how to achieve them in practice.
* HBase: data model, selecting access patterns, maintenance processes.
* Understanding how the web works: research on link structure, major components of link graphs.
* A background in algorithms and data structures.
* Experience with web data processing tasks: web crawling, finding similar items, mining data streams, link analysis, etc.
* Experience with microservices.
* Experience with the JVM.
* Open source activity.
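The bonus point about Kafka delivery semantics usually reduces to this: Kafka consumers get at-least-once delivery by default, so effectively-once processing requires idempotent handlers. A broker-free sketch of that idea (the class and its fields are hypothetical, not part of any Kafka client API):

```python
class IdempotentHandler:
    """Deduplicates messages by key so that redeliveries (normal under
    Kafka's at-least-once semantics) do not repeat their side effects."""

    def __init__(self):
        self._seen = set()   # keys already handled (in production: a persistent store)
        self.processed = []  # side effects actually performed

    def handle(self, key, payload):
        if key in self._seen:
            return False     # redelivery: skip the side effect
        self._seen.add(key)
        self.processed.append(payload)
        return True

# Simulate a redelivery after a consumer crash or rebalance
h = IdempotentHandler()
h.handle("msg-1", "extract page A")
h.handle("msg-2", "extract page B")
h.handle("msg-1", "extract page A")  # duplicate delivery is ignored
```

In a real deployment the seen-key set would live in a store that is updated atomically with the side effect (or Kafka's transactional producer would be used); the in-memory set here only illustrates the principle.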

See more jobs at Scrapinghub

# How do you apply?

This job post is older than 30 days and the position is probably filled. Try applying to jobs posted recently instead.