Site Reliability Engineer (Prometheus) Job at Bay Area TeK Solutions LLC, Remote

  • Bay Area TeK Solutions LLC
  • Remote

Job Description

We are looking for a skilled Senior Site Reliability Engineer (SRE) with deep expertise in Prometheus, Grafana, and Kubernetes to join our remote team. In this role, you will manage and optimize the infrastructure supporting a large-scale hardware monitoring project, ensuring high availability, reliability, and scalability across thousands of hardware servers.

Key Responsibilities:

  • Monitoring and Observability: Design, implement, and maintain comprehensive monitoring systems using Prometheus and Grafana to track and visualize metrics from thousands of hardware servers.
  • Kubernetes Orchestration: Deploy, manage, and optimize applications on Kubernetes clusters, ensuring strong performance and scalability.
  • Automation and Scripting: Develop and implement automation for routine tasks, including alerting, system monitoring, and response mechanisms.
  • Incident Management: Troubleshoot, diagnose, and resolve infrastructure incidents to maintain the uptime and reliability of services.
  • Performance Tuning: Optimize system performance, ensuring efficient data storage, querying, and alerting in Prometheus and Grafana environments.
  • CI/CD Integration: Collaborate with development teams to integrate monitoring into the CI/CD pipeline and ensure smooth deployments.
  • Capacity Planning: Perform capacity analysis and ensure that systems are appropriately scaled to handle increasing load.
  • Post-Deployment Support: Support the monitoring solution after it is implemented, including troubleshooting incidents.

Required Skills:

  • Prometheus: Advanced experience in configuring, tuning, and managing Prometheus for large-scale environments.
  • Grafana: Proficiency in setting up Grafana dashboards for real-time monitoring and alerting.
  • Kubernetes: Strong hands-on experience with managing Kubernetes clusters, deployments, and container orchestration.
  • Scripting: Proficiency in scripting languages such as Python or Bash to automate tasks.
  • Alerting & Incident Management: Experience setting up advanced alerting and incident management processes.
  • Infrastructure as Code (IaC): Experience with tools like Helm.
  • CI/CD Pipelines: Knowledge of CI/CD tools and automation frameworks for seamless deployment.

Job Tags

Full time, Remote job
