Learning & Doing

Troubleshooting Workloads on GKE for Site Reliability Engineers

Introduction

Site Reliability Engineers (SREs) have a broad set of responsibilities, and managing incidents is a critical part of their role. You will learn how to use the integrated capabilities of Google Cloud's operations suite, which include logging, monitoring, and rich out-of-the-box dashboards.

Lab

Task 1. Navigating Google Kubernetes Engine (GKE) resource pages

Task 2. Accessing operational data through GKE Dashboards

git clone --depth 1 --branch cloudskillsboost_asm https://github.com/GoogleCloudPlatform/cloud-ops-sandbox.git
cd cloud-ops-sandbox/sre-recipes
./sandboxctl sre-recipes restore "recipe3"

Task 3. Proactive monitoring with logs-based metrics

Metric Type: Counter
Log metric name: Error_Rate_SLI
Filter Selection: (Copy and paste the filter below)
resource.labels.cluster_name="cloud-ops-sandbox" AND resource.labels.namespace_name="default" AND resource.type="k8s_container" AND labels.k8s-pod/app="recommendationservice" AND severity>=ERROR
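
The same metric can probably also be created from Cloud Shell instead of the console form. The following is only a sketch under a few assumptions: gcloud is installed and already pointed at the lab project, the description text is made up for illustration, and the function names create_error_rate_metric and preview_matching_logs are hypothetical helpers, not part of the lab. The metric name and filter are copied from the settings above. As far as I know, a metric created this way with only a filter is a counter, which matches the Metric Type above.

```shell
#!/usr/bin/env bash
# Values copied from the Task 3 settings.
METRIC_NAME="Error_Rate_SLI"
LOG_FILTER='resource.labels.cluster_name="cloud-ops-sandbox" AND resource.labels.namespace_name="default" AND resource.type="k8s_container" AND labels.k8s-pod/app="recommendationservice" AND severity>=ERROR'

# Hypothetical helper: show a few log entries that match the filter,
# so you can check the filter before turning it into a metric.
# Assumes gcloud is authenticated against the lab project.
preview_matching_logs() {
  gcloud logging read "$LOG_FILTER" --limit=5
}

# Hypothetical helper: create the logs-based metric.
# The description text is an illustrative assumption.
create_error_rate_metric() {
  gcloud logging metrics create "$METRIC_NAME" \
    --description="Count of ERROR-or-higher log entries from recommendationservice" \
    --log-filter="$LOG_FILTER"
}

# Nothing is sent to Google Cloud until one of the helpers is called.
echo "Metric: $METRIC_NAME"
echo "Filter: $LOG_FILTER"
```

After sourcing the script, run preview_matching_logs first and then create_error_rate_metric. Keeping the filter in a single-quoted variable avoids having to escape the double quotes that the filter itself contains.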

Task 4. Creating a SLO

Task 5. Define an alert on the SLO

Closing

Friends of the Learning & Doing blog, that concludes this walkthrough of Troubleshooting Workloads on GKE for Site Reliability Engineers. I hope you find it useful. See you in the next post.
