FinOps in production: optimizing cloud costs with tagging, rightsizing, and Kubecost

Published January 17, 2026

Cloud
FinOps
Kubernetes

FinOps is becoming critical in 2026: cloud budgets are growing about 40% per year while roughly 30% of spend is wasted. This guide covers automated tagging, rightsizing, Kubecost for Kubernetes, and continuous real-time optimization.

Contents

  • What is FinOps?
  • Tagging and cost allocation
  • Rightsizing instances and storage
  • Kubecost: FinOps for Kubernetes
  • Reserved Instances and Savings Plans
  • Spot instances and resilient architecture
  • Real-time dashboards and alerts
  • FinOps culture and governance
  • Conclusion

What is FinOps?

Definition and 2026 context

FinOps is a cultural practice and discipline that brings finance, technology, and business together to optimize cloud spending.

The problem:

  • Cloud spend: ~40% annual growth
  • Average waste: 30% of the cloud budget
  • Visibility: fewer than 50% of companies know their real costs
  • Attribution: billing teams accurately is impossible

FinOps goals:

  • Visibility: real-time costs per team/project
  • Accountability: each team owns its costs
  • Optimization: ROI-driven decisions
  • Predictability: accurate budgets and forecasts

2026 statistics

  • 82% of companies formally adopt FinOps
  • $1.3T in global cloud spend
  • 30% average savings after adopting FinOps
  • 2-6 months typical ROI for a FinOps initiative
  • FinOps Engineer among the top 10 most in-demand cloud roles

The FinOps Foundation model

┌────────────────────────────────────────┐
│         INFORM (Visibility)            │
│  • Cost allocation                     │
│  • Resource tagging                    │
│  • Forecasting                         │
│  • Benchmarking                        │
└────────────────────────────────────────┘
              ↓
┌────────────────────────────────────────┐
│        OPTIMIZE (Efficiency)           │
│  • Rightsizing                         │
│  • Reserved Instances                  │
│  • Spot instances                      │
│  • Storage optimization                │
└────────────────────────────────────────┘
              ↓
┌────────────────────────────────────────┐
│        OPERATE (Governance)            │
│  • Automated policies                  │
│  • Budget alerts                       │
│  • Chargeback/Showback                 │
│  • FinOps culture                      │
└────────────────────────────────────────┘

Tagging and cost allocation

Tagging strategy

Essential tags:

  • Environment: prod/staging/dev
  • Team: owning team
  • Project: project/product
  • CostCenter: finance cost center
  • Owner: responsible person's email
  • Application: application name
  • ManagedBy: terraform/manual/autoscaling
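
Once tags are in place, cost allocation becomes a Cost Explorer query. A minimal sketch with boto3 (assuming Cost Explorer is enabled on the account) that groups month-to-date spend by the Team tag:

#!/usr/bin/env python3
# cost_by_team.py - month-to-date spend grouped by the Team tag

import boto3
from datetime import date

ce = boto3.client('ce')  # Cost Explorer

today = date.today()  # note: on the 1st of the month the window is empty
response = ce.get_cost_and_usage(
    TimePeriod={
        'Start': today.replace(day=1).strftime('%Y-%m-%d'),
        'End': today.strftime('%Y-%m-%d')
    },
    Granularity='MONTHLY',
    Metrics=['UnblendedCost'],
    GroupBy=[{'Type': 'TAG', 'Key': 'Team'}]
)

for period in response['ResultsByTime']:
    for group in period['Groups']:
        team = group['Keys'][0]  # e.g. "Team$platform"; "Team$" means untagged
        amount = float(group['Metrics']['UnblendedCost']['Amount'])
        print(f"{team:<30} ${amount:,.2f}")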

AWS tagging policy

Condition keys inside a single IAM statement are ANDed, so one StringNotLike block listing every tag would only deny requests missing all of them at once. Enforce each required tag with its own Deny statement (the Null operator checks tag presence; duplicate the first statement for Project and CostCenter), plus a value check on Environment that also fires when the tag is absent:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyCreateWithoutTeamTag",
      "Effect": "Deny",
      "Action": [
        "ec2:RunInstances",
        "rds:CreateDBInstance",
        "s3:CreateBucket",
        "elasticloadbalancing:CreateLoadBalancer"
      ],
      "Resource": "*",
      "Condition": {
        "Null": {
          "aws:RequestTag/Team": "true"
        }
      }
    },
    {
      "Sid": "DenyInvalidEnvironment",
      "Effect": "Deny",
      "Action": [
        "ec2:RunInstances",
        "rds:CreateDBInstance",
        "s3:CreateBucket",
        "elasticloadbalancing:CreateLoadBalancer"
      ],
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:RequestTag/Environment": ["prod", "staging", "dev"]
        }
      }
    }
  ]
}

Apply it via AWS Organizations:

# Service Control Policy (SCP)
aws organizations create-policy \
  --name RequireTagsPolicy \
  --type SERVICE_CONTROL_POLICY \
  --content file://require-tags-policy.json

# Attach it to an OU
aws organizations attach-policy \
  --policy-id p-abc123 \
  --target-id ou-xyz789

Automatic tagging with Terraform

# variables.tf
variable "default_tags" {
  type = map(string)
  default = {
    Environment = "prod"
    ManagedBy   = "terraform"
    Team        = "platform"
    CostCenter  = "engineering"
  }
}

# provider.tf
provider "aws" {
  region = "eu-west-1"
  
  default_tags {
    tags = var.default_tags
  }
}

# main.tf - provider default_tags are merged in automatically,
# so each resource only declares its specific tags
resource "aws_instance" "app" {
  ami           = "ami-12345678"
  instance_type = "t3.medium"
  
  tags = {
    Name        = "app-server"
    Application = "payment-api"
    Owner       = "team-payments@company.com"
  }
}

Tag Compliance Checker

#!/usr/bin/env python3
# check_tags.py

import boto3
import json
from datetime import datetime

REQUIRED_TAGS = ['Environment', 'Team', 'Project', 'CostCenter']

def check_ec2_tags():
    ec2 = boto3.client('ec2')
    
    instances = ec2.describe_instances()
    non_compliant = []
    
    for reservation in instances['Reservations']:
        for instance in reservation['Instances']:
            instance_id = instance['InstanceId']
            tags = {tag['Key']: tag['Value'] for tag in instance.get('Tags', [])}
            
            missing_tags = [tag for tag in REQUIRED_TAGS if tag not in tags]
            
            if missing_tags:
                non_compliant.append({
                    'InstanceId': instance_id,
                    'MissingTags': missing_tags,
                    'State': instance['State']['Name']
                })
    
    return non_compliant

def check_rds_tags():
    rds = boto3.client('rds')
    
    instances = rds.describe_db_instances()
    non_compliant = []
    
    for instance in instances['DBInstances']:
        db_id = instance['DBInstanceIdentifier']
        arn = instance['DBInstanceArn']
        
        tags_response = rds.list_tags_for_resource(ResourceName=arn)
        tags = {tag['Key']: tag['Value'] for tag in tags_response['TagList']}
        
        missing_tags = [tag for tag in REQUIRED_TAGS if tag not in tags]
        
        if missing_tags:
            non_compliant.append({
                'DBInstanceId': db_id,
                'MissingTags': missing_tags,
                'Status': instance['DBInstanceStatus']
            })
    
    return non_compliant

def main():
    print("Checking tag compliance...")
    
    ec2_issues = check_ec2_tags()
    rds_issues = check_rds_tags()
    
    report = {
        'Timestamp': datetime.now().isoformat(),
        'EC2': {
            'NonCompliantCount': len(ec2_issues),
            'NonCompliant': ec2_issues
        },
        'RDS': {
            'NonCompliantCount': len(rds_issues),
            'NonCompliant': rds_issues
        }
    }
    
    print(json.dumps(report, indent=2))
    
    # Slack notification if there are issues
    if ec2_issues or rds_issues:
        # send_slack_alert(report)
        pass

if __name__ == '__main__':
    main()
Schedule a daily run via cron:

0 9 * * * /usr/local/bin/check_tags.py | mail -s "Tag Compliance Report" finops@company.com

Rightsizing instances and storage

Analyzing EC2 utilization

# CloudWatch metrics for the last 14 days
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-abc123 \
  --start-time 2026-01-03T00:00:00Z \
  --end-time 2026-01-17T00:00:00Z \
  --period 3600 \
  --statistics Average,Maximum

# Example output:
# Average: 12%
# Maximum: 28%
# → Instance oversized, rightsizing recommended

Automated rightsizing script

#!/usr/bin/env python3
# rightsizing_recommendations.py

import boto3
from datetime import datetime, timedelta

def get_cpu_utilization(instance_id, days=14):
    cloudwatch = boto3.client('cloudwatch')
    
    end_time = datetime.utcnow()
    start_time = end_time - timedelta(days=days)
    
    response = cloudwatch.get_metric_statistics(
        Namespace='AWS/EC2',
        MetricName='CPUUtilization',
        Dimensions=[{'Name': 'InstanceId', 'Value': instance_id}],
        StartTime=start_time,
        EndTime=end_time,
        Period=3600,
        Statistics=['Average', 'Maximum']
    )
    
    if not response['Datapoints']:
        return None, None
    
    avg = sum(d['Average'] for d in response['Datapoints']) / len(response['Datapoints'])
    max_cpu = max(d['Maximum'] for d in response['Datapoints'])
    
    return avg, max_cpu

def get_rightsizing_recommendation(instance_type, avg_cpu, max_cpu):
    """
    Utilization-based recommendations:
    - avg < 20% and max < 40%: downsize
    - avg > 70% or max > 90%: upsize
    """
    
    # Instance type mapping (simplified; m5.large is the smallest m5 size)
    downsize_map = {
        't3.xlarge': 't3.large',
        't3.large': 't3.medium',
        't3.medium': 't3.small',
        'm5.2xlarge': 'm5.xlarge',
        'm5.xlarge': 'm5.large'
    }
    
    upsize_map = {v: k for k, v in downsize_map.items()}
    
    if avg_cpu < 20 and max_cpu < 40:
        return downsize_map.get(instance_type, instance_type), "downsize"
    elif avg_cpu > 70 or max_cpu > 90:
        return upsize_map.get(instance_type, instance_type), "upsize"
    
    return instance_type, "optimal"

def analyze_instances():
    ec2 = boto3.client('ec2')
    
    instances = ec2.describe_instances(
        Filters=[{'Name': 'instance-state-name', 'Values': ['running']}]
    )
    
    recommendations = []
    
    for reservation in instances['Reservations']:
        for instance in reservation['Instances']:
            instance_id = instance['InstanceId']
            instance_type = instance['InstanceType']
            
            tags = {tag['Key']: tag['Value'] for tag in instance.get('Tags', [])}
            name = tags.get('Name', 'N/A')
            
            avg_cpu, max_cpu = get_cpu_utilization(instance_id)
            
            if avg_cpu is None:
                continue
            
            recommended_type, action = get_rightsizing_recommendation(
                instance_type, avg_cpu, max_cpu
            )
            
            if action != "optimal":
                # Estimate savings
                current_cost = get_instance_cost(instance_type)
                new_cost = get_instance_cost(recommended_type)
                monthly_savings = (current_cost - new_cost) * 730  # hours/month
                
                recommendations.append({
                    'InstanceId': instance_id,
                    'Name': name,
                    'CurrentType': instance_type,
                    'AvgCPU': f"{avg_cpu:.1f}%",
                    'MaxCPU': f"{max_cpu:.1f}%",
                    'Recommendation': recommended_type,
                    'Action': action,
                    'MonthlySavings': f"${monthly_savings:.2f}"
                })
    
    return recommendations

def get_instance_cost(instance_type):
    """On-demand price per hour (simplified - use the AWS Price List API in practice)"""
    prices = {
        't3.small': 0.0208,
        't3.medium': 0.0416,
        't3.large': 0.0832,
        't3.xlarge': 0.1664,
        'm5.large': 0.096,
        'm5.xlarge': 0.192,
        'm5.2xlarge': 0.384
    }
    return prices.get(instance_type, 0)

def main():
    print("Analyzing EC2 instances for rightsizing...")
    
    recommendations = analyze_instances()
    
    print(f"\nFound {len(recommendations)} rightsizing opportunities:")
    print("-" * 100)
    
    for rec in recommendations:
        print(f"Instance: {rec['InstanceId']} ({rec['Name']})")
        print(f"  Current: {rec['CurrentType']} - CPU: {rec['AvgCPU']} avg, {rec['MaxCPU']} max")
        print(f"  Recommendation: {rec['Action'].upper()} to {rec['Recommendation']}")
        print(f"  Monthly savings: {rec['MonthlySavings']}")
        print()
    
    total_savings = sum(float(r['MonthlySavings'].replace('$', '')) for r in recommendations)
    print(f"Total potential monthly savings: ${total_savings:.2f}")
    print(f"Annual savings: ${total_savings * 12:.2f}")

if __name__ == '__main__':
    main()

Storage optimization

Unattached EBS volumes:

# List available (unattached) volumes
aws ec2 describe-volumes \
  --filters Name=status,Values=available \
  --query 'Volumes[*].[VolumeId,Size,VolumeType,CreateTime]' \
  --output table

# Cost reference:
# gp3: $0.08/GB/month
# io2: $0.125/GB/month

# Snapshot, wait for completion, then delete unused volumes
for vol in $(aws ec2 describe-volumes --filters Name=status,Values=available --query 'Volumes[*].VolumeId' --output text); do
    snap=$(aws ec2 create-snapshot --volume-id "$vol" --description "Backup before deletion" --query 'SnapshotId' --output text)
    aws ec2 wait snapshot-completed --snapshot-ids "$snap"
    aws ec2 delete-volume --volume-id "$vol"
done

S3 lifecycle policies:

{
  "Rules": [
    {
      "Id": "MoveToIA",
      "Status": "Enabled",
      "Filter": {},
      "Transitions": [
        {
          "Days": 90,
          "StorageClass": "STANDARD_IA"
        },
        {
          "Days": 180,
          "StorageClass": "GLACIER"
        },
        {
          "Days": 365,
          "StorageClass": "DEEP_ARCHIVE"
        }
      ],
      "NoncurrentVersionTransitions": [
        {
          "NoncurrentDays": 30,
          "StorageClass": "STANDARD_IA"
        }
      ],
      "NoncurrentVersionExpiration": {
        "NoncurrentDays": 90
      }
    },
    {
      "Id": "DeleteOldBackups",
      "Status": "Enabled",
      "Filter": {
        "Prefix": "backups/"
      },
      "Expiration": {
        "Days": 730
      }
    }
  ]
}
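
To roll the policy out across several buckets, a short boto3 sketch (the bucket names and the lifecycle.json filename are illustrative):

#!/usr/bin/env python3
# apply_lifecycle.py - apply one lifecycle policy to a list of buckets

import json
import boto3

s3 = boto3.client('s3')

BUCKETS = ['company-logs', 'company-backups']  # hypothetical bucket names

with open('lifecycle.json') as f:
    lifecycle = json.load(f)  # the {"Rules": [...]} document above

for bucket in BUCKETS:
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=lifecycle
    )
    print(f"Lifecycle policy applied to s3://{bucket}")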

Kubecost: FinOps for Kubernetes

Installing Kubecost

# Add the Helm repo
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm repo update

# Install Kubecost
helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost --create-namespace \
  --set kubecostToken="aGVsbEB3b3JsZAo=" \
  --set prometheus.server.persistentVolume.enabled=true \
  --set prometheus.server.persistentVolume.size=32Gi

# Verify
kubectl get pods -n kubecost
# kubecost-cost-analyzer-xxx     3/3     Running
# kubecost-prometheus-server-xxx 2/2     Running

# Port-forward the UI
kubectl port-forward -n kubecost svc/kubecost-cost-analyzer 9090:9090

# Access: http://localhost:9090

Configuring cloud billing

AWS:

# kubecost-values.yaml
kubecostProductConfigs:
  cloudIntegrationSecret: cloud-integration
  awsSpotDataRegion: eu-west-1
  awsSpotDataBucket: kubecost-spot-data-bucket
  athenaProjectID: my-project
  athenaBucketName: aws-athena-query-results-bucket
  athenaRegion: eu-west-1
  athenaDatabase: athenacurcfn_cur
  athenaTable: cur

GCP:

kubecostProductConfigs:
  cloudIntegrationSecret: cloud-integration
  gcpBillingDataDataset: billing_export
  gcpProjectID: my-gcp-project

# Secret holding the credentials
kubectl create secret generic cloud-integration \
  -n kubecost \
  --from-file=cloud-integration.json=gcp-key.json

# Upgrade with the new values
helm upgrade kubecost kubecost/cost-analyzer \
  -n kubecost \
  -f kubecost-values.yaml

Allocation by namespace/label

# Kubecost API - costs by namespace
curl "http://localhost:9090/model/allocation?window=7d&aggregate=namespace"

# Output JSON:
{
  "data": [
    {
      "namespace": "production",
      "totalCost": 12456.78,
      "cpuCost": 5432.10,
      "ramCost": 4321.09,
      "pvCost": 2703.59
    },
    {
      "namespace": "staging",
      "totalCost": 1234.56,
      ...
    }
  ]
}

Allocation by label:

# Costs by team
curl "http://localhost:9090/model/allocation?window=30d&aggregate=label:team"

# Costs by application
curl "http://localhost:9090/model/allocation?window=30d&aggregate=label:app"

Savings recommendations

# API recommendations
curl "http://localhost:9090/model/savings"

# Output:
{
  "clusterSizing": {
    "overprovisioned": [
      {
        "namespace": "dev",
        "deployment": "test-app",
        "container": "app",
        "currentCPU": "2000m",
        "recommendedCPU": "500m",
        "monthlySavings": 87.45
      }
    ]
  },
  "abandonedWorkloads": [
    {
      "namespace": "staging",
      "deployment": "old-api",
      "monthlyCost": 234.56,
      "reason": "0 requests last 30 days"
    }
  ]
}

Kubecost Alerts

# kubecost-alerts.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubecost-alerts
  namespace: kubecost
data:
  alerts.json: |
    {
      "alerts": [
        {
          "type": "budget",
          "name": "Production Budget Alert",
          "threshold": 10000,
          "window": "monthly",
          "aggregation": "namespace",
          "filter": "namespace=production",
          "ownerContact": ["team-platform@company.com"]
        },
        {
          "type": "spendChange",
          "name": "Staging Spend Spike",
          "threshold": 50,
          "window": "1d",
          "aggregation": "namespace",
          "filter": "namespace=staging",
          "ownerContact": ["team-dev@company.com"]
        },
        {
          "type": "efficiency",
          "name": "Low Efficiency Alert",
          "threshold": 0.5,
          "window": "7d",
          "aggregation": "deployment",
          "ownerContact": ["finops@company.com"]
        }
      ]
    }

Reserved Instances and Savings Plans

Analyzing RI coverage

#!/usr/bin/env python3
# ri_coverage.py

import boto3
from datetime import datetime, timedelta

def analyze_ri_coverage():
    ce = boto3.client('ce')  # Cost Explorer
    
    end = datetime.now().date()
    start = end - timedelta(days=30)
    
    response = ce.get_reservation_coverage(
        TimePeriod={
            'Start': start.strftime('%Y-%m-%d'),
            'End': end.strftime('%Y-%m-%d')
        },
        Granularity='MONTHLY',
        GroupBy=[
            {'Type': 'DIMENSION', 'Key': 'INSTANCE_TYPE'},
            {'Type': 'DIMENSION', 'Key': 'REGION'}
        ]
    )
    
    print("Reserved Instance Coverage Report")
    print("=" * 80)
    
    for item in response['CoveragesByTime']:
        period = item['TimePeriod']
        
        for group in item['Groups']:
            instance_type = group['Attributes'].get('INSTANCE_TYPE', 'N/A')
            region = group['Attributes'].get('REGION', 'N/A')
            
            coverage = group['Coverage']
            coverage_hours = coverage['CoverageHours']
            
            on_demand_hours = float(coverage_hours.get('OnDemandHours', 0))
            reserved_hours = float(coverage_hours.get('ReservedHours', 0))
            total_hours = float(coverage_hours.get('TotalRunningHours', 0))
            
            if total_hours > 0:
                coverage_pct = (reserved_hours / total_hours) * 100
                
                print(f"\n{instance_type} in {region}")
                print(f"  Total Hours: {total_hours:.0f}")
                print(f"  Reserved Hours: {reserved_hours:.0f}")
                print(f"  On-Demand Hours: {on_demand_hours:.0f}")
                print(f"  Coverage: {coverage_pct:.1f}%")
                
                # Recommend a purchase when coverage < 70%
                if coverage_pct < 70 and total_hours > 500:
                    print(f"  ⚠️  RECOMMENDATION: Consider purchasing RI")

def get_ri_recommendations():
    ce = boto3.client('ce')
    
    response = ce.get_reservation_purchase_recommendation(
        Service='Amazon Elastic Compute Cloud - Compute',
        AccountScope='PAYER',
        LookbackPeriodInDays='THIRTY_DAYS',
        TermInYears='ONE_YEAR',
        PaymentOption='NO_UPFRONT'
    )
    
    print("\n" + "=" * 80)
    print("RI Purchase Recommendations")
    print("=" * 80)
    
    for rec in response['Recommendations']:
        # RecommendationDetails is a list; instance data sits under InstanceDetails
        for details in rec['RecommendationDetails']:
            instance = details.get('InstanceDetails', {}).get('EC2InstanceDetails', {})
            
            print(f"\nInstance Type: {instance.get('InstanceType', 'N/A')}")
            print(f"Region: {instance.get('Region', 'N/A')}")
            print(f"Recommended: {details.get('RecommendedNumberOfInstancesToPurchase', 0)} instances")
            print(f"Monthly Savings: ${float(details.get('EstimatedMonthlySavingsAmount', 0)):.2f}")
            print(f"Upfront Cost: ${float(details.get('UpfrontCost', 0)):.2f}")
            print(f"Monthly Cost: ${float(details.get('RecurringStandardMonthlyCost', 0)):.2f}")

if __name__ == '__main__':
    analyze_ri_coverage()
    get_ri_recommendations()

Savings Plans vs Reserved Instances

Criterion         | Reserved Instances    | Savings Plans
------------------|-----------------------|-------------------------------
Flexibility       | Fixed (type/region)   | Flexible (type/region/family)
Discount          | 40-72%                | 40-66%
Commitment        | Specific instance     | $/hour of compute
Scope             | EC2 only              | EC2, Lambda, Fargate
Changes           | Modify/exchange       | Automatic
Recommended for   | Stable workloads      | Variable workloads

2026 recommendation: Savings Plans for 60-70% of the base load, Spot for flexible workloads.
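
Cost Explorer exposes a Savings Plans counterpart to the RI recommendation API used above. A sketch (field names per the boto3 GetSavingsPlansPurchaseRecommendation response, which returns amounts as strings):

#!/usr/bin/env python3
# sp_recommendations.py - Compute Savings Plans purchase recommendation

import boto3

ce = boto3.client('ce')

response = ce.get_savings_plans_purchase_recommendation(
    SavingsPlansType='COMPUTE_SP',
    TermInYears='ONE_YEAR',
    PaymentOption='NO_UPFRONT',
    LookbackPeriodInDays='THIRTY_DAYS'
)

summary = response['SavingsPlansPurchaseRecommendation'] \
    .get('SavingsPlansPurchaseRecommendationSummary', {})

print(f"Hourly commitment: ${summary.get('HourlyCommitmentToPurchase', '0')}/h")
print(f"Estimated monthly savings: ${summary.get('EstimatedMonthlySavingsAmount', '0')}")
print(f"Estimated savings: {summary.get('EstimatedSavingsPercentage', '0')}%")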


Spot instances and resilient architecture

Spot instances: 70-90% discount

Use cases:

  • CI/CD runners
  • Batch processing
  • Data analytics
  • Dev/test environments
  • Stateless applications

Kubernetes with Spot instances

# spot-nodegroup.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: production
  region: eu-west-1

nodeGroups:
  # On-Demand for critical workloads
  - name: on-demand
    instanceType: m5.xlarge
    minSize: 3
    maxSize: 10
    desiredCapacity: 5
    labels:
      workload-type: critical
    taints:
      - key: workload-type
        value: critical
        effect: NoSchedule
  
  # Spot for interruption-tolerant workloads
  - name: spot
    instancesDistribution:
      instanceTypes:
        - m5.xlarge
        - m5a.xlarge
        - m5n.xlarge
      onDemandBaseCapacity: 0
      onDemandPercentageAboveBaseCapacity: 0
      spotInstancePools: 3
    minSize: 0
    maxSize: 50
    desiredCapacity: 10
    labels:
      workload-type: flexible
    taints:
      - key: workload-type
        value: flexible
        effect: NoSchedule

A spot-friendly Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-processor
spec:
  replicas: 10
  selector:
    matchLabels:
      app: batch-processor
  template:
    metadata:
      labels:
        app: batch-processor
    spec:
      # Tolerate the Spot taint
      tolerations:
      - key: workload-type
        operator: Equal
        value: flexible
        effect: NoSchedule
      
      # Prefer scheduling onto Spot nodes
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: workload-type
                operator: In
                values:
                - flexible
      
      # Graceful shutdown
      terminationGracePeriodSeconds: 120
      
      containers:
      - name: processor
        image: batch-processor:v1
        
        # Handle SIGTERM properly
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "sleep 15"]
        
        resources:
          requests:
            cpu: 1000m
            memory: 2Gi
          limits:
            cpu: 2000m
            memory: 4Gi

Spot interruption handler

# Install the AWS Node Termination Handler
helm repo add eks https://aws.github.io/eks-charts
helm install aws-node-termination-handler \
  eks/aws-node-termination-handler \
  --namespace kube-system \
  --set enableSpotInterruptionDraining=true \
  --set enableScheduledEventDraining=true

A custom handler:

#!/usr/bin/env python3
# spot_handler.py - runs on each Spot node
# Note: with IMDSv2 enforced, a session token must be fetched first.

import os
import requests
import time
import subprocess

METADATA_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"
NODE_NAME = os.environ['NODE_NAME']  # injected via the Kubernetes downward API

def check_spot_termination():
    try:
        response = requests.get(METADATA_URL, timeout=1)
        if response.status_code == 200:
            return True, response.json()
    except requests.RequestException:
        pass
    return False, None

def drain_node():
    # Cordon the node
    subprocess.run(['kubectl', 'cordon', NODE_NAME])
    
    # Drain with a grace period
    subprocess.run([
        'kubectl', 'drain', NODE_NAME,
        '--ignore-daemonsets',
        '--delete-emptydir-data',
        '--grace-period=90'
    ])

if __name__ == '__main__':
    while True:
        terminating, action = check_spot_termination()
        
        if terminating:
            print(f"Spot termination notice received: {action}")
            drain_node()
            break
        
        time.sleep(5)

Real-time dashboards and alerts

CloudWatch Billing Dashboard

#!/usr/bin/env python3
# create_billing_dashboard.py

import boto3
import json

cloudwatch = boto3.client('cloudwatch')

dashboard_body = {
    "widgets": [
        {
            "type": "metric",
            "properties": {
                # Billing metrics carry a Currency dimension and live in us-east-1
                "metrics": [
                    ["AWS/Billing", "EstimatedCharges", "Currency", "USD"]
                ],
                "period": 21600,
                "stat": "Maximum",
                "region": "us-east-1",
                "title": "Total AWS Charges (MTD)",
                "yAxis": {
                    "left": {
                        "label": "USD"
                    }
                }
            }
        },
        {
            "type": "metric",
            "properties": {
                # Dimensions are name/value pairs inside each metric array
                "metrics": [
                    ["AWS/Billing", "EstimatedCharges", "ServiceName", "AmazonEC2", "Currency", "USD"],
                    ["AWS/Billing", "EstimatedCharges", "ServiceName", "AmazonRDS", "Currency", "USD"],
                    ["AWS/Billing", "EstimatedCharges", "ServiceName", "AmazonS3", "Currency", "USD"],
                    ["AWS/Billing", "EstimatedCharges", "ServiceName", "AmazonEKS", "Currency", "USD"]
                ],
                "period": 21600,
                "stat": "Maximum",
                "region": "us-east-1",
                "title": "Charges by Service",
                "yAxis": {
                    "left": {
                        "label": "USD"
                    }
                }
            }
        }
    ]
}

cloudwatch.put_dashboard(
    DashboardName='FinOps-Billing',
    DashboardBody=json.dumps(dashboard_body)
)

print("Dashboard created: FinOps-Billing")

Budget Alerts

# AWS Budget with alerts
aws budgets create-budget \
  --account-id 123456789012 \
  --budget file://budget.json \
  --notifications-with-subscribers file://notifications.json

// budget.json
{
  "BudgetName": "Monthly-Production-Budget",
  "BudgetLimit": {
    "Amount": "10000",
    "Unit": "USD"
  },
  "TimeUnit": "MONTHLY",
  "BudgetType": "COST",
  "CostFilters": {
    "TagKeyValue": ["user:Environment$production"]
  }
}
// notifications.json
[
  {
    "Notification": {
      "NotificationType": "ACTUAL",
      "ComparisonOperator": "GREATER_THAN",
      "Threshold": 80,
      "ThresholdType": "PERCENTAGE"
    },
    "Subscribers": [
      {
        "SubscriptionType": "EMAIL",
        "Address": "finops@company.com"
      },
      {
        "SubscriptionType": "SNS",
        "Address": "arn:aws:sns:eu-west-1:123456789012:budget-alerts"
      }
    ]
  },
  {
    "Notification": {
      "NotificationType": "FORECASTED",
      "ComparisonOperator": "GREATER_THAN",
      "Threshold": 100,
      "ThresholdType": "PERCENTAGE"
    },
    "Subscribers": [
      {
        "SubscriptionType": "EMAIL",
        "Address": "cto@company.com"
      }
    ]
  }
]

FinOps culture and governance

Chargeback vs Showback

Showback: cost transparency, no internal billing

# showback_report.py - weekly email to each team
# (format_top_resources and send_email are assumed helpers; get_team_costs is sketched below)

def generate_showback_report(team):
    costs = get_team_costs(team, days=7)
    
    report = f"""
    FinOps Weekly Report - {team}
    
    Last 7 days costs: ${costs['total']:.2f}
    
    Breakdown:
    - Compute (EC2/EKS): ${costs['compute']:.2f}
    - Storage (EBS/S3): ${costs['storage']:.2f}
    - Database (RDS): ${costs['database']:.2f}
    - Networking: ${costs['network']:.2f}
    
    Trend: {costs['trend']}% vs last week
    
    Top 5 resources:
    {format_top_resources(costs['top_resources'])}
    
    Optimization opportunities:
    - {len(costs['recommendations'])} rightsizing recommendations
    - Potential monthly savings: ${costs['potential_savings']:.2f}
    
    View detailed breakdown: https://finops.company.com/teams/{team}
    """
    
    send_email(f"{team}@company.com", "Weekly FinOps Report", report)

Chargeback: actual billing back to teams

# chargeback_invoice.py - monthly run

def generate_chargeback_invoice(team, month):
    costs = get_team_costs(team, month=month)
    
    # Apply a markup (infrastructure overhead)
    markup = 1.15  # 15% overhead
    total_with_markup = costs['total'] * markup
    
    invoice = {
        'team': team,
        'period': month,
        'subtotal': costs['total'],
        'markup': costs['total'] * 0.15,
        'total': total_with_markup,
        'cost_center': get_cost_center(team)
    }
    
    # Export to the ERP (export_to_erp and get_cost_center are assumed helpers)
    export_to_erp(invoice)
    
    return invoice

FinOps KPIs

# finops_kpis.py - executive dashboard
# (input variables are assumed to be fed from Cost Explorer / CloudWatch exports)

def calculate_finops_kpis():
    return {
        # Unit costs
        'cost_per_customer': total_costs / total_customers,
        'cost_per_transaction': total_costs / total_transactions,
        'cost_per_api_call': total_costs / total_api_calls,
        
        # Efficiency
        'compute_utilization': used_compute / provisioned_compute,
        'storage_utilization': used_storage / provisioned_storage,
        'waste_percentage': wasted_spend / total_spend,
        
        # Coverage
        'ri_coverage': reserved_hours / total_hours,
        'spot_usage': spot_hours / total_hours,
        
        # Governance
        'tagged_resources': tagged / total_resources,
        'budget_adherence': actual_spend / budgeted_spend
    }

FinOps checklist

Phase 1: Visibility (Month 1)

  • Tagging strategy defined and enforced
  • Tag compliance ≥90%
  • Cost Explorer configured
  • Billing dashboards created
  • Data exported to a data lake

Phase 2: Analysis (Month 2)

  • EC2/RDS utilization analysis
  • Rightsizing recommendations
  • Storage optimization (EBS/S3)
  • Idle resources identified
  • Quick wins implemented (20-30% savings)

Phase 3: Optimization (Months 3-4)

  • Kubecost deployed (if on K8s)
  • RI/Savings Plans purchased (60-70% of base load)
  • Spot instance architecture in place
  • Budgets and alerts active
  • Weekly showback reports

Phase 4: Governance (Months 5-6)

  • Automated policies (tag enforcement)
  • Chargeback implemented
  • Monthly FinOps reviews
  • KPIs tracked and reported
  • FinOps culture established

Conclusion

FinOps becomes essential in 2026 as cloud budgets keep growing. Tagging, rightsizing, Reserved Instances, and Kubecost make it possible to save 30-50% while preserving performance and agility.

Key takeaways:

  • Tagging = the foundation of visibility
  • Rightsizing = 20-30% quick wins
  • Kubecost = essential for Kubernetes FinOps
  • RI/Savings Plans = 40-70% discount on base load
  • Spot = 70-90% discount on flexible workloads

Typical gains:

  • Savings: 30-50% of the cloud budget
  • ROI: 2-6 months
  • Visibility: 100% of resources tagged
  • Efficiency: +40% compute utilization
  • Waste: -80% idle resources

Priority actions:

  1. Enforce mandatory tagging
  2. Run a rightsizing analysis on EC2/RDS
  3. Deploy Kubecost (if on K8s)
  4. Buy RI/Savings Plans for the base load
  5. Architect for Spot instances