
O'Reilly Auto Parts


closed

Site Reliability Engineer




sys admin, engineer, admin

This job post is closed and the position is probably filled. Please do not apply.
The Site Reliability Engineer is responsible for the availability and performance of the platforms and services of O’Reilly Auto Parts. This role creates and defines monitoring and incident response tools and processes.

The Site Reliability Engineer will create a bridge between development and operations by applying a software engineering mindset to system administration. Time will be split between operations/on-call duties and developing systems and software that help increase site reliability and performance.

ESSENTIAL JOB FUNCTIONS:

* Deploy methodologies for building and operating highly available and scalable services.
* Work closely with the Network Operations Center to develop monitoring tools, analyze the root cause of incidents, and improve the Network Operations Center’s ability to independently resolve issues.
* Evaluate, build, and modify automation for deploying and operating production services.
* Provide leadership in reducing and resolving production incidents.
* Proactively monitor and review application performance. Monitor specific metrics, set thresholds, and trigger alerts based on those thresholds.
* Collect and analyze logging and diagnostic information.
* Identify opportunities to improve all operations processes.
* Facilitate effective transition of services into production, ensuring that all requirements have been met in accordance with O’Reilly’s Change Management standards.
* Properly document all incident responses.
* Provide updates and documentation to runbooks and operational manuals.
* Document mean time to recover (MTTR) and mean time to failure (MTTF).
* Participate in on-call rotations.

SKILLS/EDUCATION/KNOWLEDGE/EXPERIENCE/ABILITIES:

Required:

* Bachelor’s Degree or equivalent work experience.
* 5+ years of professional experience in Site Reliability, Linux Systems Administration, DevOps, or Infrastructure Engineering.
* Experience with programming languages including Java, JavaScript, and SQL.
* Experience with scripting languages such as Bash, Python, or Ruby.
* Familiarity with automation and configuration management tools and frameworks.
* Excellent analytical and problem-solving skills.
* Strong written and verbal communication skills.
* Must be well organized, detail oriented, and able to self-prioritize work.
* Must exhibit a high degree of professionalism.
* Able to act with composed urgency in stressful situations.

Desired:

* ITIL Foundations Certification.
* CRE or CMRP Certifications.



# How do you apply?

This job post is older than 30 days and the position is probably filled. Try applying to jobs posted recently instead.