FinOps in production: optimizing cloud costs with tagging, rightsizing, and Kubecost

Published January 17, 2026

Cloud
FinOps
Kubernetes

FinOps is becoming critical in 2026: cloud budgets are growing about 40% per year while roughly 30% of spend is wasted. This guide covers automated tagging, rightsizing, Kubecost for Kubernetes, and continuous real-time optimization.

Contents

  • What is FinOps?
  • Tagging and cost allocation
  • Rightsizing instances and storage
  • Kubecost: FinOps for Kubernetes
  • Reserved Instances and Savings Plans
  • Spot instances and resilient architecture
  • Real-time dashboards and alerts
  • FinOps culture and governance
  • Conclusion

What is FinOps?

Definition and 2026 context

FinOps is a cultural practice and discipline that brings finance, technology, and business together to optimize cloud spending.

The problem:

  • Cloud spend: ~40% annual growth
  • Average waste: 30% of the cloud budget
  • Visibility: fewer than 50% of companies know their real costs
  • Attribution: billing teams accurately is impossible

FinOps goals:

  • Visibility: real-time costs per team/project
  • Accountability: each team owns its costs
  • Optimization: ROI-driven decisions
  • Predictability: accurate budgets and forecasts

2026 statistics

  • 82% of companies formally adopt FinOps
  • $1.3T in global cloud spend
  • 30% average savings after adopting FinOps
  • 2-6 months typical ROI for a FinOps initiative
  • FinOps Engineer among the top 10 most in-demand cloud roles

The FinOps Foundation model

┌────────────────────────────────────────┐
│         INFORM (Visibility)            │
│  • Cost allocation                     │
│  • Resource tagging                    │
│  • Forecasting                         │
│  • Benchmarking                        │
└────────────────────────────────────────┘
              ↓
┌────────────────────────────────────────┐
│        OPTIMIZE (Efficiency)           │
│  • Rightsizing                         │
│  • Reserved Instances                  │
│  • Spot instances                      │
│  • Storage optimization                │
└────────────────────────────────────────┘
              ↓
┌────────────────────────────────────────┐
│        OPERATE (Governance)            │
│  • Automated policies                  │
│  • Budget alerts                       │
│  • Chargeback/Showback                 │
│  • FinOps culture                      │
└────────────────────────────────────────┘

Tagging and cost allocation

Tagging strategy

Essential tags:

  • Environment: prod/staging/dev
  • Team: owning team
  • Project: project/product
  • CostCenter: finance cost center
  • Owner: responsible person's email
  • Application: application name
  • ManagedBy: terraform/manual/autoscaling
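
Once tags are in place, cost allocation becomes a Cost Explorer query. A minimal sketch with boto3 (assuming Cost Explorer is enabled on the account) that groups month-to-date spend by the Team tag:

#!/usr/bin/env python3
# cost_by_team.py - month-to-date spend grouped by the Team tag

import boto3
from datetime import date

ce = boto3.client('ce')  # Cost Explorer

today = date.today()  # note: on the 1st of the month the window is empty
response = ce.get_cost_and_usage(
    TimePeriod={
        'Start': today.replace(day=1).strftime('%Y-%m-%d'),
        'End': today.strftime('%Y-%m-%d')
    },
    Granularity='MONTHLY',
    Metrics=['UnblendedCost'],
    GroupBy=[{'Type': 'TAG', 'Key': 'Team'}]
)

for period in response['ResultsByTime']:
    for group in period['Groups']:
        team = group['Keys'][0]  # e.g. "Team$platform"; "Team$" means untagged
        amount = float(group['Metrics']['UnblendedCost']['Amount'])
        print(f"{team:<30} ${amount:,.2f}")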

AWS tagging policy

Condition keys inside a single IAM statement are ANDed, so one StringNotLike block listing every tag would only deny requests missing all of them at once. Enforce each required tag with its own Deny statement (the Null operator checks tag presence; duplicate the first statement for Project and CostCenter), plus a value check on Environment that also fires when the tag is absent:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyCreateWithoutTeamTag",
      "Effect": "Deny",
      "Action": [
        "ec2:RunInstances",
        "rds:CreateDBInstance",
        "s3:CreateBucket",
        "elasticloadbalancing:CreateLoadBalancer"
      ],
      "Resource": "*",
      "Condition": {
        "Null": {
          "aws:RequestTag/Team": "true"
        }
      }
    },
    {
      "Sid": "DenyInvalidEnvironment",
      "Effect": "Deny",
      "Action": [
        "ec2:RunInstances",
        "rds:CreateDBInstance",
        "s3:CreateBucket",
        "elasticloadbalancing:CreateLoadBalancer"
      ],
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:RequestTag/Environment": ["prod", "staging", "dev"]
        }
      }
    }
  ]
}

Apply it via AWS Organizations:

# Service Control Policy (SCP)
aws organizations create-policy \
  --name RequireTagsPolicy \
  --type SERVICE_CONTROL_POLICY \
  --content file://require-tags-policy.json

# Attach it to an OU
aws organizations attach-policy \
  --policy-id p-abc123 \
  --target-id ou-xyz789

Automatic tagging with Terraform

# variables.tf
variable "default_tags" {
  type = map(string)
  default = {
    Environment = "prod"
    ManagedBy   = "terraform"
    Team        = "platform"
    CostCenter  = "engineering"
  }
}

# provider.tf
provider "aws" {
  region = "eu-west-1"
  
  default_tags {
    tags = var.default_tags
  }
}

# main.tf - provider default_tags are merged in automatically,
# so each resource only declares its specific tags
resource "aws_instance" "app" {
  ami           = "ami-12345678"
  instance_type = "t3.medium"
  
  tags = {
    Name        = "app-server"
    Application = "payment-api"
    Owner       = "team-payments@company.com"
  }
}

Tag Compliance Checker

#!/usr/bin/env python3
# check_tags.py

import boto3
import json
from datetime import datetime

REQUIRED_TAGS = ['Environment', 'Team', 'Project', 'CostCenter']

def check_ec2_tags():
    ec2 = boto3.client('ec2')
    
    instances = ec2.describe_instances()
    non_compliant = []
    
    for reservation in instances['Reservations']:
        for instance in reservation['Instances']:
            instance_id = instance['InstanceId']
            tags = {tag['Key']: tag['Value'] for tag in instance.get('Tags', [])}
            
            missing_tags = [tag for tag in REQUIRED_TAGS if tag not in tags]
            
            if missing_tags:
                non_compliant.append({
                    'InstanceId': instance_id,
                    'MissingTags': missing_tags,
                    'State': instance['State']['Name']
                })
    
    return non_compliant

def check_rds_tags():
    rds = boto3.client('rds')
    
    instances = rds.describe_db_instances()
    non_compliant = []
    
    for instance in instances['DBInstances']:
        db_id = instance['DBInstanceIdentifier']
        arn = instance['DBInstanceArn']
        
        tags_response = rds.list_tags_for_resource(ResourceName=arn)
        tags = {tag['Key']: tag['Value'] for tag in tags_response['TagList']}
        
        missing_tags = [tag for tag in REQUIRED_TAGS if tag not in tags]
        
        if missing_tags:
            non_compliant.append({
                'DBInstanceId': db_id,
                'MissingTags': missing_tags,
                'Status': instance['DBInstanceStatus']
            })
    
    return non_compliant

def main():
    print("Checking tag compliance...")
    
    ec2_issues = check_ec2_tags()
    rds_issues = check_rds_tags()
    
    report = {
        'Timestamp': datetime.now().isoformat(),
        'EC2': {
            'NonCompliantCount': len(ec2_issues),
            'NonCompliant': ec2_issues
        },
        'RDS': {
            'NonCompliantCount': len(rds_issues),
            'NonCompliant': rds_issues
        }
    }
    
    print(json.dumps(report, indent=2))
    
    # Slack notification if there are issues
    if ec2_issues or rds_issues:
        # send_slack_alert(report)
        pass

if __name__ == '__main__':
    main()
Schedule a daily run via cron:

0 9 * * * /usr/local/bin/check_tags.py | mail -s "Tag Compliance Report" finops@company.com

Rightsizing instances and storage

Analyzing EC2 utilization

# CloudWatch metrics for the last 14 days
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-abc123 \
  --start-time 2026-01-03T00:00:00Z \
  --end-time 2026-01-17T00:00:00Z \
  --period 3600 \
  --statistics Average,Maximum

# Example output:
# Average: 12%
# Maximum: 28%
# → Instance oversized, rightsizing recommended

Automated rightsizing script

#!/usr/bin/env python3
# rightsizing_recommendations.py

import boto3
from datetime import datetime, timedelta

def get_cpu_utilization(instance_id, days=14):
    cloudwatch = boto3.client('cloudwatch')
    
    end_time = datetime.utcnow()
    start_time = end_time - timedelta(days=days)
    
    response = cloudwatch.get_metric_statistics(
        Namespace='AWS/EC2',
        MetricName='CPUUtilization',
        Dimensions=[{'Name': 'InstanceId', 'Value': instance_id}],
        StartTime=start_time,
        EndTime=end_time,
        Period=3600,
        Statistics=['Average', 'Maximum']
    )
    
    if not response['Datapoints']:
        return None, None
    
    avg = sum(d['Average'] for d in response['Datapoints']) / len(response['Datapoints'])
    max_cpu = max(d['Maximum'] for d in response['Datapoints'])
    
    return avg, max_cpu

def get_rightsizing_recommendation(instance_type, avg_cpu, max_cpu):
    """
    Utilization-based recommendations:
    - avg < 20% and max < 40%: downsize
    - avg > 70% or max > 90%: upsize
    """
    
    # Instance type mapping (simplified; m5.large is the smallest m5 size)
    downsize_map = {
        't3.xlarge': 't3.large',
        't3.large': 't3.medium',
        't3.medium': 't3.small',
        'm5.2xlarge': 'm5.xlarge',
        'm5.xlarge': 'm5.large'
    }
    
    upsize_map = {v: k for k, v in downsize_map.items()}
    
    if avg_cpu < 20 and max_cpu < 40:
        return downsize_map.get(instance_type, instance_type), "downsize"
    elif avg_cpu > 70 or max_cpu > 90:
        return upsize_map.get(instance_type, instance_type), "upsize"
    
    return instance_type, "optimal"

def analyze_instances():
    ec2 = boto3.client('ec2')
    
    instances = ec2.describe_instances(
        Filters=[{'Name': 'instance-state-name', 'Values': ['running']}]
    )
    
    recommendations = []
    
    for reservation in instances['Reservations']:
        for instance in reservation['Instances']:
            instance_id = instance['InstanceId']
            instance_type = instance['InstanceType']
            
            tags = {tag['Key']: tag['Value'] for tag in instance.get('Tags', [])}
            name = tags.get('Name', 'N/A')
            
            avg_cpu, max_cpu = get_cpu_utilization(instance_id)
            
            if avg_cpu is None:
                continue
            
            recommended_type, action = get_rightsizing_recommendation(
                instance_type, avg_cpu, max_cpu
            )
            
            if action != "optimal":
                # Estimate savings
                current_cost = get_instance_cost(instance_type)
                new_cost = get_instance_cost(recommended_type)
                monthly_savings = (current_cost - new_cost) * 730  # hours/month
                
                recommendations.append({
                    'InstanceId': instance_id,
                    'Name': name,
                    'CurrentType': instance_type,
                    'AvgCPU': f"{avg_cpu:.1f}%",
                    'MaxCPU': f"{max_cpu:.1f}%",
                    'Recommendation': recommended_type,
                    'Action': action,
                    'MonthlySavings': f"${monthly_savings:.2f}"
                })
    
    return recommendations

def get_instance_cost(instance_type):
    """On-demand price per hour (simplified - use the AWS Price List API in practice)"""
    prices = {
        't3.small': 0.0208,
        't3.medium': 0.0416,
        't3.large': 0.0832,
        't3.xlarge': 0.1664,
        'm5.large': 0.096,
        'm5.xlarge': 0.192,
        'm5.2xlarge': 0.384
    }
    return prices.get(instance_type, 0)

def main():
    print("Analyzing EC2 instances for rightsizing...")
    
    recommendations = analyze_instances()
    
    print(f"\nFound {len(recommendations)} rightsizing opportunities:")
    print("-" * 100)
    
    for rec in recommendations:
        print(f"Instance: {rec['InstanceId']} ({rec['Name']})")
        print(f"  Current: {rec['CurrentType']} - CPU: {rec['AvgCPU']} avg, {rec['MaxCPU']} max")
        print(f"  Recommendation: {rec['Action'].upper()} to {rec['Recommendation']}")
        print(f"  Monthly savings: {rec['MonthlySavings']}")
        print()
    
    total_savings = sum(float(r['MonthlySavings'].replace('$', '')) for r in recommendations)
    print(f"Total potential monthly savings: ${total_savings:.2f}")
    print(f"Annual savings: ${total_savings * 12:.2f}")

if __name__ == '__main__':
    main()

Storage optimization

Unattached EBS volumes:

# List available (unattached) volumes
aws ec2 describe-volumes \
  --filters Name=status,Values=available \
  --query 'Volumes[*].[VolumeId,Size,VolumeType,CreateTime]' \
  --output table

# Cost reference:
# gp3: $0.08/GB/month
# io2: $0.125/GB/month

# Snapshot, wait for completion, then delete unused volumes
for vol in $(aws ec2 describe-volumes --filters Name=status,Values=available --query 'Volumes[*].VolumeId' --output text); do
    snap=$(aws ec2 create-snapshot --volume-id "$vol" --description "Backup before deletion" --query 'SnapshotId' --output text)
    aws ec2 wait snapshot-completed --snapshot-ids "$snap"
    aws ec2 delete-volume --volume-id "$vol"
done

S3 lifecycle policies:

{
  "Rules": [
    {
      "Id": "MoveToIA",
      "Status": "Enabled",
      "Filter": {},
      "Transitions": [
        {
          "Days": 90,
          "StorageClass": "STANDARD_IA"
        },
        {
          "Days": 180,
          "StorageClass": "GLACIER"
        },
        {
          "Days": 365,
          "StorageClass": "DEEP_ARCHIVE"
        }
      ],
      "NoncurrentVersionTransitions": [
        {
          "NoncurrentDays": 30,
          "StorageClass": "STANDARD_IA"
        }
      ],
      "NoncurrentVersionExpiration": {
        "NoncurrentDays": 90
      }
    },
    {
      "Id": "DeleteOldBackups",
      "Status": "Enabled",
      "Filter": {
        "Prefix": "backups/"
      },
      "Expiration": {
        "Days": 730
      }
    }
  ]
}
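
To roll the policy out across several buckets, a short boto3 sketch (the bucket names and the lifecycle.json filename are illustrative):

#!/usr/bin/env python3
# apply_lifecycle.py - apply one lifecycle policy to a list of buckets

import json
import boto3

s3 = boto3.client('s3')

BUCKETS = ['company-logs', 'company-backups']  # hypothetical bucket names

with open('lifecycle.json') as f:
    lifecycle = json.load(f)  # the {"Rules": [...]} document above

for bucket in BUCKETS:
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=lifecycle
    )
    print(f"Lifecycle policy applied to s3://{bucket}")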

Kubecost: FinOps for Kubernetes

Installing Kubecost

# Add the Helm repo
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm repo update

# Install Kubecost
helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost --create-namespace \
  --set kubecostToken="aGVsbEB3b3JsZAo=" \
  --set prometheus.server.persistentVolume.enabled=true \
  --set prometheus.server.persistentVolume.size=32Gi

# Verify
kubectl get pods -n kubecost
# kubecost-cost-analyzer-xxx     3/3     Running
# kubecost-prometheus-server-xxx 2/2     Running

# Port-forward the UI
kubectl port-forward -n kubecost svc/kubecost-cost-analyzer 9090:9090

# Access: http://localhost:9090

Configuring cloud billing

AWS:

# kubecost-values.yaml
kubecostProductConfigs:
  cloudIntegrationSecret: cloud-integration
  awsSpotDataRegion: eu-west-1
  awsSpotDataBucket: kubecost-spot-data-bucket
  athenaProjectID: my-project
  athenaBucketName: aws-athena-query-results-bucket
  athenaRegion: eu-west-1
  athenaDatabase: athenacurcfn_cur
  athenaTable: cur

GCP:

kubecostProductConfigs:
  cloudIntegrationSecret: cloud-integration
  gcpBillingDataDataset: billing_export
  gcpProjectID: my-gcp-project

# Secret holding the credentials
kubectl create secret generic cloud-integration \
  -n kubecost \
  --from-file=cloud-integration.json=gcp-key.json

# Upgrade with the new values
helm upgrade kubecost kubecost/cost-analyzer \
  -n kubecost \
  -f kubecost-values.yaml

Allocation by namespace/label

# Kubecost API - costs by namespace
curl "http://localhost:9090/model/allocation?window=7d&aggregate=namespace"

# Output JSON:
{
  "data": [
    {
      "namespace": "production",
      "totalCost": 12456.78,
      "cpuCost": 5432.10,
      "ramCost": 4321.09,
      "pvCost": 2703.59
    },
    {
      "namespace": "staging",
      "totalCost": 1234.56,
      ...
    }
  ]
}

Allocation by label:

# Costs by team
curl "http://localhost:9090/model/allocation?window=30d&aggregate=label:team"

# Costs by application
curl "http://localhost:9090/model/allocation?window=30d&aggregate=label:app"

Savings recommendations

# API recommendations
curl "http://localhost:9090/model/savings"

# Output:
{
  "clusterSizing": {
    "overprovisioned": [
      {
        "namespace": "dev",
        "deployment": "test-app",
        "container": "app",
        "currentCPU": "2000m",
        "recommendedCPU": "500m",
        "monthlySavings": 87.45
      }
    ]
  },
  "abandonedWorkloads": [
    {
      "namespace": "staging",
      "deployment": "old-api",
      "monthlyCost": 234.56,
      "reason": "0 requests last 30 days"
    }
  ]
}

Kubecost Alerts

# kubecost-alerts.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubecost-alerts
  namespace: kubecost
data:
  alerts.json: |
    {
      "alerts": [
        {
          "type": "budget",
          "name": "Production Budget Alert",
          "threshold": 10000,
          "window": "monthly",
          "aggregation": "namespace",
          "filter": "namespace=production",
          "ownerContact": ["team-platform@company.com"]
        },
        {
          "type": "spendChange",
          "name": "Staging Spend Spike",
          "threshold": 50,
          "window": "1d",
          "aggregation": "namespace",
          "filter": "namespace=staging",
          "ownerContact": ["team-dev@company.com"]
        },
        {
          "type": "efficiency",
          "name": "Low Efficiency Alert",
          "threshold": 0.5,
          "window": "7d",
          "aggregation": "deployment",
          "ownerContact": ["finops@company.com"]
        }
      ]
    }

Reserved Instances and Savings Plans

Analyzing RI coverage

#!/usr/bin/env python3
# ri_coverage.py

import boto3
from datetime import datetime, timedelta

def analyze_ri_coverage():
    ce = boto3.client('ce')  # Cost Explorer
    
    end = datetime.now().date()
    start = end - timedelta(days=30)
    
    response = ce.get_reservation_coverage(
        TimePeriod={
            'Start': start.strftime('%Y-%m-%d'),
            'End': end.strftime('%Y-%m-%d')
        },
        Granularity='MONTHLY',
        GroupBy=[
            {'Type': 'DIMENSION', 'Key': 'INSTANCE_TYPE'},
            {'Type': 'DIMENSION', 'Key': 'REGION'}
        ]
    )
    
    print("Reserved Instance Coverage Report")
    print("=" * 80)
    
    for item in response['CoveragesByTime']:
        period = item['TimePeriod']
        
        for group in item['Groups']:
            instance_type = group['Attributes'].get('INSTANCE_TYPE', 'N/A')
            region = group['Attributes'].get('REGION', 'N/A')
            
            coverage = group['Coverage']
            coverage_hours = coverage['CoverageHours']
            
            on_demand_hours = float(coverage_hours.get('OnDemandHours', 0))
            reserved_hours = float(coverage_hours.get('ReservedHours', 0))
            total_hours = float(coverage_hours.get('TotalRunningHours', 0))
            
            if total_hours > 0:
                coverage_pct = (reserved_hours / total_hours) * 100
                
                print(f"\n{instance_type} in {region}")
                print(f"  Total Hours: {total_hours:.0f}")
                print(f"  Reserved Hours: {reserved_hours:.0f}")
                print(f"  On-Demand Hours: {on_demand_hours:.0f}")
                print(f"  Coverage: {coverage_pct:.1f}%")
                
                # Recommend a purchase when coverage < 70%
                if coverage_pct < 70 and total_hours > 500:
                    print(f"  ⚠️  RECOMMENDATION: Consider purchasing RI")

def get_ri_recommendations():
    ce = boto3.client('ce')
    
    response = ce.get_reservation_purchase_recommendation(
        Service='Amazon Elastic Compute Cloud - Compute',
        AccountScope='PAYER',
        LookbackPeriodInDays='THIRTY_DAYS',
        TermInYears='ONE_YEAR',
        PaymentOption='NO_UPFRONT'
    )
    
    print("\n" + "=" * 80)
    print("RI Purchase Recommendations")
    print("=" * 80)
    
    for rec in response['Recommendations']:
        # RecommendationDetails is a list; instance data sits under InstanceDetails
        for details in rec['RecommendationDetails']:
            instance = details.get('InstanceDetails', {}).get('EC2InstanceDetails', {})
            
            print(f"\nInstance Type: {instance.get('InstanceType', 'N/A')}")
            print(f"Region: {instance.get('Region', 'N/A')}")
            print(f"Recommended: {details.get('RecommendedNumberOfInstancesToPurchase', 0)} instances")
            print(f"Monthly Savings: ${float(details.get('EstimatedMonthlySavingsAmount', 0)):.2f}")
            print(f"Upfront Cost: ${float(details.get('UpfrontCost', 0)):.2f}")
            print(f"Monthly Cost: ${float(details.get('RecurringStandardMonthlyCost', 0)):.2f}")

if __name__ == '__main__':
    analyze_ri_coverage()
    get_ri_recommendations()

Savings Plans vs Reserved Instances

Criterion         | Reserved Instances    | Savings Plans
------------------|-----------------------|-------------------------------
Flexibility       | Fixed (type/region)   | Flexible (type/region/family)
Discount          | 40-72%                | 40-66%
Commitment        | Specific instance     | $/hour of compute
Scope             | EC2 only              | EC2, Lambda, Fargate
Changes           | Modify/exchange       | Automatic
Recommended for   | Stable workloads      | Variable workloads

2026 recommendation: Savings Plans for 60-70% of the base load, Spot for flexible workloads.
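
Cost Explorer exposes a Savings Plans counterpart to the RI recommendation API used above. A sketch (field names per the boto3 GetSavingsPlansPurchaseRecommendation response, which returns amounts as strings):

#!/usr/bin/env python3
# sp_recommendations.py - Compute Savings Plans purchase recommendation

import boto3

ce = boto3.client('ce')

response = ce.get_savings_plans_purchase_recommendation(
    SavingsPlansType='COMPUTE_SP',
    TermInYears='ONE_YEAR',
    PaymentOption='NO_UPFRONT',
    LookbackPeriodInDays='THIRTY_DAYS'
)

summary = response['SavingsPlansPurchaseRecommendation'] \
    .get('SavingsPlansPurchaseRecommendationSummary', {})

print(f"Hourly commitment: ${summary.get('HourlyCommitmentToPurchase', '0')}/h")
print(f"Estimated monthly savings: ${summary.get('EstimatedMonthlySavingsAmount', '0')}")
print(f"Estimated savings: {summary.get('EstimatedSavingsPercentage', '0')}%")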


Spot instances and resilient architecture

Spot instances: 70-90% discount

Use cases:

  • CI/CD runners
  • Batch processing
  • Data analytics
  • Dev/test environments
  • Stateless applications

Kubernetes with Spot instances

# spot-nodegroup.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: production
  region: eu-west-1

nodeGroups:
  # On-Demand for critical workloads
  - name: on-demand
    instanceType: m5.xlarge
    minSize: 3
    maxSize: 10
    desiredCapacity: 5
    labels:
      workload-type: critical
    taints:
      - key: workload-type
        value: critical
        effect: NoSchedule
  
  # Spot for interruption-tolerant workloads
  - name: spot
    instancesDistribution:
      instanceTypes:
        - m5.xlarge
        - m5a.xlarge
        - m5n.xlarge
      onDemandBaseCapacity: 0
      onDemandPercentageAboveBaseCapacity: 0
      spotInstancePools: 3
    minSize: 0
    maxSize: 50
    desiredCapacity: 10
    labels:
      workload-type: flexible
    taints:
      - key: workload-type
        value: flexible
        effect: NoSchedule

A spot-friendly Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-processor
spec:
  replicas: 10
  selector:
    matchLabels:
      app: batch-processor
  template:
    metadata:
      labels:
        app: batch-processor
    spec:
      # Tolerate the Spot taint
      tolerations:
      - key: workload-type
        operator: Equal
        value: flexible
        effect: NoSchedule
      
      # Prefer scheduling onto Spot nodes
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: workload-type
                operator: In
                values:
                - flexible
      
      # Graceful shutdown
      terminationGracePeriodSeconds: 120
      
      containers:
      - name: processor
        image: batch-processor:v1
        
        # Handle SIGTERM properly
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "sleep 15"]
        
        resources:
          requests:
            cpu: 1000m
            memory: 2Gi
          limits:
            cpu: 2000m
            memory: 4Gi

Spot interruption handler

# Install the AWS Node Termination Handler
helm repo add eks https://aws.github.io/eks-charts
helm install aws-node-termination-handler \
  eks/aws-node-termination-handler \
  --namespace kube-system \
  --set enableSpotInterruptionDraining=true \
  --set enableScheduledEventDraining=true

A custom handler:

#!/usr/bin/env python3
# spot_handler.py - runs on each Spot node
# Note: with IMDSv2 enforced, a session token must be fetched first.

import os
import requests
import time
import subprocess

METADATA_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"
NODE_NAME = os.environ['NODE_NAME']  # injected via the Kubernetes downward API

def check_spot_termination():
    try:
        response = requests.get(METADATA_URL, timeout=1)
        if response.status_code == 200:
            return True, response.json()
    except requests.RequestException:
        pass
    return False, None

def drain_node():
    # Cordon the node
    subprocess.run(['kubectl', 'cordon', NODE_NAME])
    
    # Drain with a grace period
    subprocess.run([
        'kubectl', 'drain', NODE_NAME,
        '--ignore-daemonsets',
        '--delete-emptydir-data',
        '--grace-period=90'
    ])

if __name__ == '__main__':
    while True:
        terminating, action = check_spot_termination()
        
        if terminating:
            print(f"Spot termination notice received: {action}")
            drain_node()
            break
        
        time.sleep(5)

Real-time dashboards and alerts

CloudWatch Billing Dashboard

#!/usr/bin/env python3
# create_billing_dashboard.py

import boto3
import json

cloudwatch = boto3.client('cloudwatch')

dashboard_body = {
    "widgets": [
        {
            "type": "metric",
            "properties": {
                # Billing metrics carry a Currency dimension and live in us-east-1
                "metrics": [
                    ["AWS/Billing", "EstimatedCharges", "Currency", "USD"]
                ],
                "period": 21600,
                "stat": "Maximum",
                "region": "us-east-1",
                "title": "Total AWS Charges (MTD)",
                "yAxis": {
                    "left": {
                        "label": "USD"
                    }
                }
            }
        },
        {
            "type": "metric",
            "properties": {
                # Dimensions are name/value pairs inside each metric array
                "metrics": [
                    ["AWS/Billing", "EstimatedCharges", "ServiceName", "AmazonEC2", "Currency", "USD"],
                    ["AWS/Billing", "EstimatedCharges", "ServiceName", "AmazonRDS", "Currency", "USD"],
                    ["AWS/Billing", "EstimatedCharges", "ServiceName", "AmazonS3", "Currency", "USD"],
                    ["AWS/Billing", "EstimatedCharges", "ServiceName", "AmazonEKS", "Currency", "USD"]
                ],
                "period": 21600,
                "stat": "Maximum",
                "region": "us-east-1",
                "title": "Charges by Service",
                "yAxis": {
                    "left": {
                        "label": "USD"
                    }
                }
            }
        }
    ]
}

cloudwatch.put_dashboard(
    DashboardName='FinOps-Billing',
    DashboardBody=json.dumps(dashboard_body)
)

print("Dashboard created: FinOps-Billing")

Budget Alerts

# AWS Budget with alerts
aws budgets create-budget \
  --account-id 123456789012 \
  --budget file://budget.json \
  --notifications-with-subscribers file://notifications.json

// budget.json
{
  "BudgetName": "Monthly-Production-Budget",
  "BudgetLimit": {
    "Amount": "10000",
    "Unit": "USD"
  },
  "TimeUnit": "MONTHLY",
  "BudgetType": "COST",
  "CostFilters": {
    "TagKeyValue": ["user:Environment$production"]
  }
}
// notifications.json
[
  {
    "Notification": {
      "NotificationType": "ACTUAL",
      "ComparisonOperator": "GREATER_THAN",
      "Threshold": 80,
      "ThresholdType": "PERCENTAGE"
    },
    "Subscribers": [
      {
        "SubscriptionType": "EMAIL",
        "Address": "finops@company.com"
      },
      {
        "SubscriptionType": "SNS",
        "Address": "arn:aws:sns:eu-west-1:123456789012:budget-alerts"
      }
    ]
  },
  {
    "Notification": {
      "NotificationType": "FORECASTED",
      "ComparisonOperator": "GREATER_THAN",
      "Threshold": 100,
      "ThresholdType": "PERCENTAGE"
    },
    "Subscribers": [
      {
        "SubscriptionType": "EMAIL",
        "Address": "cto@company.com"
      }
    ]
  }
]

FinOps culture and governance

Chargeback vs Showback

Showback: cost transparency, no internal billing

# showback_report.py - weekly email to each team
# (format_top_resources and send_email are assumed helpers; get_team_costs is sketched below)

def generate_showback_report(team):
    costs = get_team_costs(team, days=7)
    
    report = f"""
    FinOps Weekly Report - {team}
    
    Last 7 days costs: ${costs['total']:.2f}
    
    Breakdown:
    - Compute (EC2/EKS): ${costs['compute']:.2f}
    - Storage (EBS/S3): ${costs['storage']:.2f}
    - Database (RDS): ${costs['database']:.2f}
    - Networking: ${costs['network']:.2f}
    
    Trend: {costs['trend']}% vs last week
    
    Top 5 resources:
    {format_top_resources(costs['top_resources'])}
    
    Optimization opportunities:
    - {len(costs['recommendations'])} rightsizing recommendations
    - Potential monthly savings: ${costs['potential_savings']:.2f}
    
    View detailed breakdown: https://finops.company.com/teams/{team}
    """
    
    send_email(f"{team}@company.com", "Weekly FinOps Report", report)

Chargeback: actual billing back to teams

# chargeback_invoice.py - monthly run

def generate_chargeback_invoice(team, month):
    costs = get_team_costs(team, month=month)
    
    # Apply a markup (infrastructure overhead)
    markup = 1.15  # 15% overhead
    total_with_markup = costs['total'] * markup
    
    invoice = {
        'team': team,
        'period': month,
        'subtotal': costs['total'],
        'markup': costs['total'] * 0.15,
        'total': total_with_markup,
        'cost_center': get_cost_center(team)
    }
    
    # Export to the ERP (export_to_erp and get_cost_center are assumed helpers)
    export_to_erp(invoice)
    
    return invoice

FinOps KPIs

# finops_kpis.py - executive dashboard
# (input variables are assumed to be fed from Cost Explorer / CloudWatch exports)

def calculate_finops_kpis():
    return {
        # Unit costs
        'cost_per_customer': total_costs / total_customers,
        'cost_per_transaction': total_costs / total_transactions,
        'cost_per_api_call': total_costs / total_api_calls,
        
        # Efficiency
        'compute_utilization': used_compute / provisioned_compute,
        'storage_utilization': used_storage / provisioned_storage,
        'waste_percentage': wasted_spend / total_spend,
        
        # Coverage
        'ri_coverage': reserved_hours / total_hours,
        'spot_usage': spot_hours / total_hours,
        
        # Governance
        'tagged_resources': tagged / total_resources,
        'budget_adherence': actual_spend / budgeted_spend
    }

FinOps checklist

Phase 1: Visibility (Month 1)

  • Tagging strategy defined and enforced
  • Tag compliance ≥90%
  • Cost Explorer configured
  • Billing dashboards created
  • Data exported to a data lake

Phase 2: Analysis (Month 2)

  • EC2/RDS utilization analysis
  • Rightsizing recommendations
  • Storage optimization (EBS/S3)
  • Idle resources identified
  • Quick wins implemented (20-30% savings)

Phase 3: Optimization (Months 3-4)

  • Kubecost deployed (if on K8s)
  • RI/Savings Plans purchased (60-70% of base load)
  • Spot instance architecture in place
  • Budgets and alerts active
  • Weekly showback reports

Phase 4: Governance (Months 5-6)

  • Automated policies (tag enforcement)
  • Chargeback implemented
  • Monthly FinOps reviews
  • KPIs tracked and reported
  • FinOps culture established

Conclusion

FinOps becomes essential in 2026 as cloud budgets keep growing. Tagging, rightsizing, Reserved Instances, and Kubecost make it possible to save 30-50% while preserving performance and agility.

Key takeaways:

  • Tagging = the foundation of visibility
  • Rightsizing = 20-30% quick wins
  • Kubecost = essential for Kubernetes FinOps
  • RI/Savings Plans = 40-70% discount on base load
  • Spot = 70-90% discount on flexible workloads

Typical gains:

  • Savings: 30-50% of the cloud budget
  • ROI: 2-6 months
  • Visibility: 100% of resources tagged
  • Efficiency: +40% compute utilization
  • Waste: -80% idle resources

Priority actions:

  1. Enforce mandatory tagging
  2. Run a rightsizing analysis on EC2/RDS
  3. Deploy Kubecost (if on K8s)
  4. Buy RI/Savings Plans for the base load
  5. Architect for Spot instances