Cloud
FinOps
Kubernetes

FinOps in production: optimizing cloud costs with tagging, rightsizing, and Kubecost

January 17, 2026

15 min read

FinOps is becoming critical in 2026: cloud budgets are growing by roughly 40% per year while an average of 30% of spend is wasted. This guide covers automated tagging, rightsizing, Kubecost for Kubernetes, and continuous, near-real-time optimization.

This article draws on Kubernetes, Kubecost for Kubernetes cost visibility, Terraform for IaC, and Prometheus for monitoring to implement a complete FinOps strategy.

Contents

  • What is FinOps?
  • Tagging and cost allocation
  • Rightsizing instances and storage
  • Kubecost: FinOps for Kubernetes
  • Reserved Instances and Savings Plans
  • Spot instances and resilient architecture
  • Real-time dashboards and alerts
  • FinOps culture and governance
  • Conclusion

What is FinOps?

Definition and 2026 context

FinOps is a cultural practice and discipline that brings finance, technology, and business together to optimize cloud spend.

The problem:

  • Cloud spend: +40% annual growth
  • Average waste: 30% of the cloud budget
  • Visibility: fewer than 50% of companies know their real costs
  • Attribution: billing teams accurately is impossible

FinOps goals:

  • Visibility: real-time costs per team/project
  • Accountability: each team owns its costs
  • Optimization: ROI-driven decisions
  • Predictability: accurate budgets and forecasts

2026 statistics

  • 82% of companies have formally adopted FinOps
  • $1.3T in global cloud spend
  • 30% average savings after adopting FinOps
  • 2-6 months typical ROI for a FinOps initiative
  • FinOps Engineer is among the top 10 most in-demand cloud roles

The FinOps Foundation model
┌────────────────────────────────────────┐
│         INFORM (Visibility)            │
│  • Cost allocation                     │
│  • Resource tagging                    │
│  • Forecasting                         │
│  • Benchmarking                        │
└────────────────────────────────────────┘
              ↓
┌────────────────────────────────────────┐
│        OPTIMIZE (Efficiency)           │
│  • Rightsizing                         │
│  • Reserved Instances                  │
│  • Spot instances                      │
│  • Storage optimization                │
└────────────────────────────────────────┘
              ↓
┌────────────────────────────────────────┐
│        OPERATE (Governance)            │
│  • Automated policies                  │
│  • Budget alerts                       │
│  • Chargeback/Showback                 │
│  • FinOps culture                      │
└────────────────────────────────────────┘

Tagging and cost allocation

Tagging strategy

Essential tags:

  • Environment: prod/staging/dev
  • Team: owning team
  • Project: project/product
  • CostCenter: finance cost center
  • Owner: responsible person's email
  • Application: application name
  • ManagedBy: terraform/manual/autoscaling
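Enforcing that list in code pays off before any cost report is trusted. Below is a minimal validation helper; the function and its return format are ours for illustration, not an AWS API:

```python
# Required tag keys and allowed Environment values, per the list above
REQUIRED_TAGS = {"Environment", "Team", "Project", "CostCenter"}
ALLOWED_ENVIRONMENTS = {"prod", "staging", "dev"}

def validate_tags(tags: dict) -> list:
    """Return human-readable violations for one resource's tag set."""
    violations = [f"missing tag: {key}" for key in sorted(REQUIRED_TAGS - tags.keys())]
    env = tags.get("Environment")
    if env is not None and env not in ALLOWED_ENVIRONMENTS:
        violations.append(f"invalid Environment: {env}")
    return violations

# A compliant resource yields no violations; a sloppy one lists each problem
print(validate_tags({"Environment": "production", "Team": "platform"}))
# → ['missing tag: CostCenter', 'missing tag: Project', 'invalid Environment: production']
```

The same helper can back a CI check that fails a pipeline when planned resources carry incomplete tags.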
AWS tagging policy

A note on semantics: condition keys inside a single statement are ANDed, so one StringNotLike block covering all four tags would only deny a request in which every tag is wrong. Use one Deny statement per required tag instead; the CostCenter statement below shows the pattern (repeat it for Team and Project).

{
	"Version": "2012-10-17",
	"Statement": [
		{
			"Sid": "DenyMissingCostCenter",
			"Effect": "Deny",
			"Action": [
				"ec2:RunInstances",
				"rds:CreateDBInstance",
				"s3:CreateBucket",
				"elasticloadbalancing:CreateLoadBalancer"
			],
			"Resource": "*",
			"Condition": {
				"Null": {
					"aws:RequestTag/CostCenter": "true"
				}
			}
		},
		{
			"Sid": "DenyInvalidEnvironment",
			"Effect": "Deny",
			"Action": [
				"ec2:RunInstances",
				"rds:CreateDBInstance",
				"s3:CreateBucket",
				"elasticloadbalancing:CreateLoadBalancer"
			],
			"Resource": "*",
			"Condition": {
				"StringNotEquals": {
					"aws:RequestTag/Environment": ["prod", "staging", "dev"]
				}
			}
		}
	]
}

Apply it via AWS Organizations:

# Service Control Policy (SCP)
aws organizations create-policy \
  --name RequireTagsPolicy \
  --type SERVICE_CONTROL_POLICY \
  --content file://require-tags-policy.json

# Attach the policy to an OU
aws organizations attach-policy \
  --policy-id p-abc123 \
  --target-id ou-xyz789
Automatic tagging with Terraform
# variables.tf
variable "default_tags" {
  type = map(string)
  default = {
    Environment = "prod"
    ManagedBy   = "terraform"
    Team        = "platform"
    CostCenter  = "engineering"
  }
}

# provider.tf
provider "aws" {
  region = "eu-west-1"

  default_tags {
    tags = var.default_tags
  }
}

# main.tf - default_tags from the provider are applied automatically;
# only resource-specific tags need to be declared here (repeating the
# defaults via merge() produces duplicate-tag diffs on plan)
resource "aws_instance" "app" {
  ami           = "ami-12345678"
  instance_type = "t3.medium"

  tags = {
    Name        = "app-server"
    Application = "payment-api"
    Owner       = "team-payments@company.com"
  }
}
Tag Compliance Checker
#!/usr/bin/env python3
# check_tags.py

import boto3
import json
from datetime import datetime

REQUIRED_TAGS = ['Environment', 'Team', 'Project', 'CostCenter']

def check_ec2_tags():
    ec2 = boto3.client('ec2')

    instances = ec2.describe_instances()
    non_compliant = []

    for reservation in instances['Reservations']:
        for instance in reservation['Instances']:
            instance_id = instance['InstanceId']
            tags = {tag['Key']: tag['Value'] for tag in instance.get('Tags', [])}

            missing_tags = [tag for tag in REQUIRED_TAGS if tag not in tags]

            if missing_tags:
                non_compliant.append({
                    'InstanceId': instance_id,
                    'MissingTags': missing_tags,
                    'State': instance['State']['Name']
                })

    return non_compliant

def check_rds_tags():
    rds = boto3.client('rds')

    instances = rds.describe_db_instances()
    non_compliant = []

    for instance in instances['DBInstances']:
        db_id = instance['DBInstanceIdentifier']
        arn = instance['DBInstanceArn']

        tags_response = rds.list_tags_for_resource(ResourceName=arn)
        tags = {tag['Key']: tag['Value'] for tag in tags_response['TagList']}

        missing_tags = [tag for tag in REQUIRED_TAGS if tag not in tags]

        if missing_tags:
            non_compliant.append({
                'DBInstanceId': db_id,
                'MissingTags': missing_tags,
                'Status': instance['DBInstanceStatus']
            })

    return non_compliant

def main():
    print("Checking tag compliance...")

    ec2_issues = check_ec2_tags()
    rds_issues = check_rds_tags()

    report = {
        'Timestamp': datetime.now().isoformat(),
        'EC2': {
            'NonCompliantCount': len(ec2_issues),
            'NonCompliant': ec2_issues
        },
        'RDS': {
            'NonCompliantCount': len(rds_issues),
            'NonCompliant': rds_issues
        }
    }

    print(json.dumps(report, indent=2))

    # Slack notification if anything is non-compliant
    if ec2_issues or rds_issues:
        # send_slack_alert(report)
        pass

if __name__ == '__main__':
    main()
# Cron daily
0 9 * * * /usr/local/bin/check_tags.py | mail -s "Tag Compliance Report" finops@company.com

Rightsizing instances and storage

Analyzing EC2 utilization
# CloudWatch metrics over 14 days
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-abc123 \
  --start-time 2026-01-03T00:00:00Z \
  --end-time 2026-01-17T00:00:00Z \
  --period 3600 \
  --statistics Average,Maximum

# Example output:
# Average: 12%
# Maximum: 28%
# → Instance is oversized; downsizing recommended
Automated rightsizing script
#!/usr/bin/env python3
# rightsizing_recommendations.py

import boto3
from datetime import datetime, timedelta

def get_cpu_utilization(instance_id, days=14):
    cloudwatch = boto3.client('cloudwatch')

    end_time = datetime.utcnow()
    start_time = end_time - timedelta(days=days)

    response = cloudwatch.get_metric_statistics(
        Namespace='AWS/EC2',
        MetricName='CPUUtilization',
        Dimensions=[{'Name': 'InstanceId', 'Value': instance_id}],
        StartTime=start_time,
        EndTime=end_time,
        Period=3600,
        Statistics=['Average', 'Maximum']
    )

    if not response['Datapoints']:
        return None, None

    avg = sum(d['Average'] for d in response['Datapoints']) / len(response['Datapoints'])
    max_cpu = max(d['Maximum'] for d in response['Datapoints'])

    return avg, max_cpu

def get_rightsizing_recommendation(instance_type, avg_cpu, max_cpu):
    """
    Recommendations based on utilization:
    - avg < 20% and max < 40%: downsize
    - avg > 70% or max > 90%: upsize
    """

    # Instance type mapping (simplified, one size down per step;
    # m5.large is the smallest m5 size, so it has no downsize target)
    downsize_map = {
        't3.xlarge': 't3.large',
        't3.large': 't3.medium',
        't3.medium': 't3.small',
        'm5.2xlarge': 'm5.xlarge',
        'm5.xlarge': 'm5.large'
    }

    upsize_map = {v: k for k, v in downsize_map.items()}

    if avg_cpu < 20 and max_cpu < 40:
        return downsize_map.get(instance_type, instance_type), "downsize"
    elif avg_cpu > 70 or max_cpu > 90:
        return upsize_map.get(instance_type, instance_type), "upsize"

    return instance_type, "optimal"

def analyze_instances():
    ec2 = boto3.client('ec2')

    instances = ec2.describe_instances(
        Filters=[{'Name': 'instance-state-name', 'Values': ['running']}]
    )

    recommendations = []

    for reservation in instances['Reservations']:
        for instance in reservation['Instances']:
            instance_id = instance['InstanceId']
            instance_type = instance['InstanceType']

            tags = {tag['Key']: tag['Value'] for tag in instance.get('Tags', [])}
            name = tags.get('Name', 'N/A')

            avg_cpu, max_cpu = get_cpu_utilization(instance_id)

            if avg_cpu is None:
                continue

            recommended_type, action = get_rightsizing_recommendation(
                instance_type, avg_cpu, max_cpu
            )

            if action != "optimal":
                # Estimate monthly savings (~730 hours/month)
                current_cost = get_instance_cost(instance_type)
                new_cost = get_instance_cost(recommended_type)
                monthly_savings = (current_cost - new_cost) * 730

                recommendations.append({
                    'InstanceId': instance_id,
                    'Name': name,
                    'CurrentType': instance_type,
                    'AvgCPU': f"{avg_cpu:.1f}%",
                    'MaxCPU': f"{max_cpu:.1f}%",
                    'Recommendation': recommended_type,
                    'Action': action,
                    'MonthlySavings': f"${monthly_savings:.2f}"
                })

    return recommendations

def get_instance_cost(instance_type):
    """On-demand price per hour (simplified - use the AWS Price List API in practice)"""
    prices = {
        't3.small': 0.0208,
        't3.medium': 0.0416,
        't3.large': 0.0832,
        't3.xlarge': 0.1664,
        'm5.large': 0.192,
        'm5.xlarge': 0.384,
        'm5.2xlarge': 0.768
    }
    return prices.get(instance_type, 0)

def main():
    print("Analyzing EC2 instances for rightsizing...")

    recommendations = analyze_instances()

    print(f"\nFound {len(recommendations)} rightsizing opportunities:")
    print("-" * 100)

    for rec in recommendations:
        print(f"Instance: {rec['InstanceId']} ({rec['Name']})")
        print(f"  Current: {rec['CurrentType']} - CPU: {rec['AvgCPU']} avg, {rec['MaxCPU']} max")
        print(f"  Recommendation: {rec['Action'].upper()} to {rec['Recommendation']}")
        print(f"  Monthly savings: {rec['MonthlySavings']}")
        print()

    total_savings = sum(float(r['MonthlySavings'].replace('$', '')) for r in recommendations)
    print(f"Total potential monthly savings: ${total_savings:.2f}")
    print(f"Annual savings: ${total_savings * 12:.2f}")

if __name__ == '__main__':
    main()
Storage optimization

Unattached EBS volumes:

# List available (unattached) volumes
aws ec2 describe-volumes \
  --filters Name=status,Values=available \
  --query 'Volumes[*].[VolumeId,Size,VolumeType,CreateTime]' \
  --output table

# Cost estimate
# gp3: $0.08/GB/month
# io2: $0.125/GB/month

# Snapshot, then delete unused volumes
for vol in $(aws ec2 describe-volumes --filters Name=status,Values=available --query 'Volumes[*].VolumeId' --output text); do
    aws ec2 create-snapshot --volume-id $vol --description "Backup before deletion"
    aws ec2 delete-volume --volume-id $vol
done
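Before deleting anything, it helps to know what those idle volumes actually cost. A quick estimate using the per-GB figures quoted in the comments above (illustrative prices, not live AWS pricing):

```python
# $/GB/month, from the comments above -- check current pricing for your region
PRICE_PER_GB_MONTH = {"gp3": 0.08, "io2": 0.125}

def monthly_cost_of_unattached(volumes):
    """volumes: iterable of (volume_type, size_gb) for 'available' volumes."""
    return sum(PRICE_PER_GB_MONTH.get(vtype, 0.0) * size for vtype, size in volumes)

# Two forgotten 100 GB gp3 volumes and one 200 GB io2 volume:
cost = monthly_cost_of_unattached([("gp3", 100), ("gp3", 100), ("io2", 200)])
# ≈ $41/month for storage nobody is using
```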

S3 lifecycle policies:

{
	"Rules": [
		{
			"Id": "MoveToIA",
			"Status": "Enabled",
			"Transitions": [
				{
					"Days": 90,
					"StorageClass": "STANDARD_IA"
				},
				{
					"Days": 180,
					"StorageClass": "GLACIER"
				},
				{
					"Days": 365,
					"StorageClass": "DEEP_ARCHIVE"
				}
			],
			"NoncurrentVersionTransitions": [
				{
					"NoncurrentDays": 30,
					"StorageClass": "STANDARD_IA"
				}
			],
			"NoncurrentVersionExpiration": {
				"NoncurrentDays": 90
			}
		},
		{
			"Id": "DeleteOldBackups",
			"Status": "Enabled",
			"Prefix": "backups/",
			"Expiration": {
				"Days": 730
			}
		}
	]
}
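The transition rules above can be modeled in a few lines, which is handy for unit-testing a policy before shipping it. This mirrors the Days thresholds in the JSON; it is not the S3 API:

```python
# Thresholds from the lifecycle policy above, most aggressive first
TRANSITIONS = [(365, "DEEP_ARCHIVE"), (180, "GLACIER"), (90, "STANDARD_IA")]

def storage_class_for_age(age_days):
    """Storage class a current object version should occupy at a given age."""
    for threshold, storage_class in TRANSITIONS:
        if age_days >= threshold:
            return storage_class
    return "STANDARD"

print(storage_class_for_age(30), storage_class_for_age(120), storage_class_for_age(400))
# → STANDARD STANDARD_IA DEEP_ARCHIVE
```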

Kubecost: FinOps for Kubernetes

Installing Kubecost
# Add the Helm repo
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm repo update

# Install Kubecost
helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost --create-namespace \
  --set kubecostToken="aGVsbEB3b3JsZAo=" \
  --set prometheus.server.persistentVolume.enabled=true \
  --set prometheus.server.persistentVolume.size=32Gi

# Verify
kubectl get pods -n kubecost
# kubecost-cost-analyzer-xxx     3/3     Running
# kubecost-prometheus-server-xxx 2/2     Running

# Port-forward the UI
kubectl port-forward -n kubecost svc/kubecost-cost-analyzer 9090:9090

# Access: http://localhost:9090
Cloud billing configuration

AWS:

# kubecost-values.yaml
kubecostProductConfigs:
  cloudIntegrationSecret: cloud-integration
  awsSpotDataRegion: eu-west-1
  awsSpotDataBucket: kubecost-spot-data-bucket
  athenaProjectID: my-project
  athenaBucketName: aws-athena-query-results-bucket
  athenaRegion: eu-west-1
  athenaDatabase: athenacurcfn_cur
  athenaTable: cur

GCP:

kubecostProductConfigs:
  cloudIntegrationSecret: cloud-integration
  gcpBillingDataDataset: billing_export
  gcpProjectID: my-gcp-project
# Secret holding the credentials
kubectl create secret generic cloud-integration \
  -n kubecost \
  --from-file=cloud-integration.json=gcp-key.json

# Upgrade with the new config
helm upgrade kubecost kubecost/cost-analyzer \
  -n kubecost \
  -f kubecost-values.yaml
Allocation by namespace/label
# Kubecost allocation API - costs per namespace
curl "http://localhost:9090/model/allocation?window=7d&aggregate=namespace"

# Output JSON:
{
  "data": [
    {
      "namespace": "production",
      "totalCost": 12456.78,
      "cpuCost": 5432.10,
      "ramCost": 4321.09,
      "pvCost": 2703.59
    },
    {
      "namespace": "staging",
      "totalCost": 1234.56,
      ...
    }
  ]
}

Allocation by label:

# Costs per team
curl "http://localhost:9090/model/allocation?window=30d&aggregate=label:team"

# Costs per application
curl "http://localhost:9090/model/allocation?window=30d&aggregate=label:app"
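Responses shaped like the sample above are easy to post-process for showback. A sketch that computes each namespace's share of spend (the response here is trimmed to the fields shown earlier):

```python
import json

# Trimmed allocation response, shaped like the sample output above
sample = json.loads("""
{"data": [
  {"namespace": "production", "totalCost": 12456.78},
  {"namespace": "staging", "totalCost": 1234.56}
]}
""")

def cost_share_by_namespace(allocation):
    """Map namespace -> fraction of total cost."""
    total = sum(item["totalCost"] for item in allocation["data"])
    return {item["namespace"]: item["totalCost"] / total for item in allocation["data"]}

shares = cost_share_by_namespace(sample)
# production carries roughly 91% of the spend in this sample
```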
Savings recommendations
# API recommendations
curl "http://localhost:9090/model/savings"

# Output:
{
  "clusterSizing": {
    "overprovisioned": [
      {
        "namespace": "dev",
        "deployment": "test-app",
        "container": "app",
        "currentCPU": "2000m",
        "recommendedCPU": "500m",
        "monthlySavings": 87.45
      }
    ]
  },
  "abandonedWorkloads": [
    {
      "namespace": "staging",
      "deployment": "old-api",
      "monthlyCost": 234.56,
      "reason": "0 requests last 30 days"
    }
  ]
}
Kubecost Alerts
# kubecost-alerts.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubecost-alerts
  namespace: kubecost
data:
  alerts.json: |
    {
      "alerts": [
        {
          "type": "budget",
          "name": "Production Budget Alert",
          "threshold": 10000,
          "window": "monthly",
          "aggregation": "namespace",
          "filter": "namespace=production",
          "ownerContact": ["team-platform@company.com"]
        },
        {
          "type": "spendChange",
          "name": "Staging Spend Spike",
          "threshold": 50,
          "window": "1d",
          "aggregation": "namespace",
          "filter": "namespace=staging",
          "ownerContact": ["team-dev@company.com"]
        },
        {
          "type": "efficiency",
          "name": "Low Efficiency Alert",
          "threshold": 0.5,
          "window": "7d",
          "aggregation": "deployment",
          "ownerContact": ["finops@company.com"]
        }
      ]
    }
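Each alert type above reduces to a threshold comparison. Kubecost evaluates these server-side; the function and its `observed` input below are ours, just to spell the logic out:

```python
def fired_alerts(alerts, observed):
    """observed maps alert name -> measured value ($ spend, % change, or efficiency ratio)."""
    fired = []
    for alert in alerts:
        value = observed.get(alert["name"])
        if value is None:
            continue
        # budget/spendChange fire above the threshold, efficiency fires below it
        if alert["type"] in ("budget", "spendChange") and value > alert["threshold"]:
            fired.append(alert["name"])
        elif alert["type"] == "efficiency" and value < alert["threshold"]:
            fired.append(alert["name"])
    return fired

alerts = [
    {"type": "budget", "name": "Production Budget Alert", "threshold": 10000},
    {"type": "efficiency", "name": "Low Efficiency Alert", "threshold": 0.5},
]
result = fired_alerts(alerts, {"Production Budget Alert": 11200, "Low Efficiency Alert": 0.62})
# → only the budget alert fires ($11,200 > $10,000; efficiency 0.62 is healthy)
```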

Reserved Instances and Savings Plans

RI coverage analysis
#!/usr/bin/env python3
# ri_coverage.py

import boto3
from datetime import datetime, timedelta

def analyze_ri_coverage():
    ce = boto3.client('ce')  # Cost Explorer

    end = datetime.now().date()
    start = end - timedelta(days=30)

    response = ce.get_reservation_coverage(
        TimePeriod={
            'Start': start.strftime('%Y-%m-%d'),
            'End': end.strftime('%Y-%m-%d')
        },
        Granularity='MONTHLY',
        GroupBy=[
            {'Type': 'DIMENSION', 'Key': 'INSTANCE_TYPE'},
            {'Type': 'DIMENSION', 'Key': 'REGION'}
        ]
    )

    print("Reserved Instance Coverage Report")
    print("=" * 80)

    for item in response['CoveragesByTime']:
        period = item['TimePeriod']

        for group in item['Groups']:
            instance_type = group['Attributes'].get('INSTANCE_TYPE', 'N/A')
            region = group['Attributes'].get('REGION', 'N/A')

            coverage = group['Coverage']
            coverage_hours = coverage['CoverageHours']

            on_demand_hours = float(coverage_hours.get('OnDemandHours', 0))
            reserved_hours = float(coverage_hours.get('ReservedHours', 0))
            total_hours = float(coverage_hours.get('TotalRunningHours', 0))

            if total_hours > 0:
                coverage_pct = (reserved_hours / total_hours) * 100

                print(f"\n{instance_type} in {region}")
                print(f"  Total Hours: {total_hours:.0f}")
                print(f"  Reserved Hours: {reserved_hours:.0f}")
                print(f"  On-Demand Hours: {on_demand_hours:.0f}")
                print(f"  Coverage: {coverage_pct:.1f}%")

                # Recommend an RI purchase if coverage < 70% on sustained usage
                if coverage_pct < 70 and total_hours > 500:
                    print(f"  ⚠️  RECOMMENDATION: Consider purchasing RI")

def get_ri_recommendations():
    ce = boto3.client('ce')

    response = ce.get_reservation_purchase_recommendation(
        Service='Amazon Elastic Compute Cloud - Compute',
        AccountScope='PAYER',
        LookbackPeriodInDays='THIRTY_DAYS',
        TermInYears='ONE_YEAR',
        PaymentOption='NO_UPFRONT'
    )

    print("\n" + "=" * 80)
    print("RI Purchase Recommendations")
    print("=" * 80)

    for rec in response['Recommendations']:
        details = rec['RecommendationDetails']

        print(f"\nInstance Type: {details.get('InstanceType', 'N/A')}")
        print(f"Region: {details.get('Region', 'N/A')}")
        print(f"Recommended: {details.get('RecommendedNumberOfInstancesToPurchase', 0)} instances")
        print(f"Monthly Savings: ${float(details.get('EstimatedMonthlySavingsAmount', 0)):.2f}")
        print(f"Upfront Cost: ${float(details.get('UpfrontCost', 0)):.2f}")
        print(f"Monthly Cost: ${float(details.get('RecurringStandardMonthlyCost', 0)):.2f}")

if __name__ == '__main__':
    analyze_ri_coverage()
    get_ri_recommendations()
Savings Plans vs Reserved Instances
Criterion     | Reserved Instances   | Savings Plans
--------------|----------------------|------------------------------
Flexibility   | Fixed (type/region)  | Flexible (type/region/family)
Discount      | 40-72%               | 40-66%
Commitment    | Specific instance    | $/hour of compute
Scope         | EC2 only             | EC2, Lambda, Fargate
Changes       | Modify/exchange      | Applied automatically
Best for      | Stable workloads     | Variable workloads

2026 recommendation: Savings Plans for 60-70% of base load, Spot instances for flexible workloads.
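The arithmetic behind that recommendation is simple. Assuming a 40% Savings Plan discount (the low end of the range in the table) on the committed share, with the remainder on demand:

```python
def blended_rate(on_demand_rate, committed_share, discount):
    """Effective $/hour when committed_share of usage runs under a Savings Plan."""
    sp_rate = on_demand_rate * (1 - discount)
    return committed_share * sp_rate + (1 - committed_share) * on_demand_rate

# m5.xlarge-style on-demand rate of $0.384/h, 65% of load committed at a 40% discount
rate = blended_rate(0.384, 0.65, 0.40)
savings_pct = (1 - rate / 0.384) * 100
# → the overall bill drops by 26% (0.65 x 40%), with no commitment risk on the other 35%
```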


Spot instances and resilient architecture

Spot instances: 70-90% discount

Use cases:

  • CI/CD runners
  • Batch processing
  • Data analytics
  • Dev/test environments
  • Stateless applications
Kubernetes with Spot instances
# spot-nodegroup.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: production
  region: eu-west-1

nodeGroups:
  # On-Demand for critical workloads
  - name: on-demand
    instanceType: m5.xlarge
    minSize: 3
    maxSize: 10
    desiredCapacity: 5
    labels:
      workload-type: critical
    taints:
      - key: workload-type
        value: critical
        effect: NoSchedule

  # Spot for interruption-tolerant workloads
  - name: spot
    instancesDistribution:
      instanceTypes:
        - m5.xlarge
        - m5a.xlarge
        - m5n.xlarge
      onDemandBaseCapacity: 0
      onDemandPercentageAboveBaseCapacity: 0
      spotInstancePools: 3
    minSize: 0
    maxSize: 50
    desiredCapacity: 10
    labels:
      workload-type: flexible
    taints:
      - key: workload-type
        value: flexible
        effect: NoSchedule

Spot-friendly Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-processor
spec:
  replicas: 10
  selector:
    matchLabels:
      app: batch-processor
  template:
    metadata:
      labels:
        app: batch-processor
    spec:
      # Tolerate the Spot node taint
      tolerations:
        - key: workload-type
          operator: Equal
          value: flexible
          effect: NoSchedule

      # Prefer scheduling onto Spot nodes
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: workload-type
                    operator: In
                    values:
                      - flexible

      # Graceful shutdown
      terminationGracePeriodSeconds: 120

      containers:
        - name: processor
          image: batch-processor:v1

          # Handle SIGTERM properly
          lifecycle:
            preStop:
              exec:
                command: ['/bin/sh', '-c', 'sleep 15']

          resources:
            requests:
              cpu: 1000m
              memory: 2Gi
            limits:
              cpu: 2000m
              memory: 4Gi
Spot interruption handler
# Install the AWS Node Termination Handler
helm repo add eks https://aws.github.io/eks-charts
helm install aws-node-termination-handler \
  eks/aws-node-termination-handler \
  --namespace kube-system \
  --set enableSpotInterruptionDraining=true \
  --set enableScheduledEventDraining=true

A custom handler:

#!/usr/bin/env python3
# spot_handler.py - runs on every Spot node

import os
import subprocess
import time

import requests

METADATA_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"
NODE_NAME = os.environ["NODE_NAME"]  # inject via the Downward API

def check_spot_termination():
    try:
        response = requests.get(METADATA_URL, timeout=1)
        if response.status_code == 200:
            return True, response.json()
    except requests.RequestException:
        pass
    return False, None

def drain_node():
    # Cordon first so no new pods are scheduled here
    subprocess.run(['kubectl', 'cordon', NODE_NAME], check=True)

    # Drain with a grace period that fits inside the ~2 min Spot notice
    subprocess.run([
        'kubectl', 'drain', NODE_NAME,
        '--ignore-daemonsets',
        '--delete-emptydir-data',
        '--grace-period=90'
    ], check=True)

if __name__ == '__main__':
    while True:
        terminating, action = check_spot_termination()

        if terminating:
            print(f"Spot termination notice received: {action}")
            drain_node()
            break

        time.sleep(5)

Real-time dashboards and alerts

CloudWatch Billing Dashboard
#!/usr/bin/env python3
# create_billing_dashboard.py

import boto3
import json

cloudwatch = boto3.client('cloudwatch')

# Note: AWS/Billing metrics are published only in us-east-1 and
# always carry a Currency dimension
dashboard_body = {
    "widgets": [
        {
            "type": "metric",
            "properties": {
                "metrics": [
                    ["AWS/Billing", "EstimatedCharges", "Currency", "USD", {"stat": "Maximum"}]
                ],
                "period": 21600,
                "stat": "Maximum",
                "region": "us-east-1",
                "title": "Total AWS Charges (MTD)",
                "yAxis": {
                    "left": {
                        "label": "USD"
                    }
                }
            }
        },
        {
            "type": "metric",
            "properties": {
                "metrics": [
                    ["AWS/Billing", "EstimatedCharges", "ServiceName", "AmazonEC2", "Currency", "USD"],
                    ["AWS/Billing", "EstimatedCharges", "ServiceName", "AmazonRDS", "Currency", "USD"],
                    ["AWS/Billing", "EstimatedCharges", "ServiceName", "AmazonS3", "Currency", "USD"],
                    ["AWS/Billing", "EstimatedCharges", "ServiceName", "AmazonEKS", "Currency", "USD"]
                ],
                "period": 21600,
                "stat": "Maximum",
                "region": "us-east-1",
                "title": "Charges by Service",
                "yAxis": {
                    "left": {
                        "label": "USD"
                    }
                }
            }
        }
    ]
}

cloudwatch.put_dashboard(
    DashboardName='FinOps-Billing',
    DashboardBody=json.dumps(dashboard_body)
)

print("Dashboard created: FinOps-Billing")
Budget Alerts
# AWS Budget with alerts
aws budgets create-budget \
  --account-id 123456789012 \
  --budget file://budget.json \
  --notifications-with-subscribers file://notifications.json
// budget.json
{
	"BudgetName": "Monthly-Production-Budget",
	"BudgetLimit": {
		"Amount": "10000",
		"Unit": "USD"
	},
	"TimeUnit": "MONTHLY",
	"BudgetType": "COST",
	"CostFilters": {
		"TagKeyValue": ["user:Environment$prod"]
	}
}
// notifications.json
[
	{
		"Notification": {
			"NotificationType": "ACTUAL",
			"ComparisonOperator": "GREATER_THAN",
			"Threshold": 80,
			"ThresholdType": "PERCENTAGE"
		},
		"Subscribers": [
			{
				"SubscriptionType": "EMAIL",
				"Address": "finops@company.com"
			},
			{
				"SubscriptionType": "SNS",
				"Address": "arn:aws:sns:eu-west-1:123456789012:budget-alerts"
			}
		]
	},
	{
		"Notification": {
			"NotificationType": "FORECASTED",
			"ComparisonOperator": "GREATER_THAN",
			"Threshold": 100,
			"ThresholdType": "PERCENTAGE"
		},
		"Subscribers": [
			{
				"SubscriptionType": "EMAIL",
				"Address": "cto@company.com"
			}
		]
	}
]
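The two notifications encode a common pattern: alert the FinOps team early on actual spend, escalate to leadership on forecast. The evaluation itself is a percentage check; the sketch below models the JSON's semantics and is not the Budgets API:

```python
def triggered(notifications, actual, forecast, budget):
    """Return subscriber addresses for every notification whose threshold is crossed."""
    hit = []
    for n in notifications:
        spend = actual if n["Notification"]["NotificationType"] == "ACTUAL" else forecast
        pct = spend / budget * 100
        if pct > n["Notification"]["Threshold"]:
            hit.extend(s["Address"] for s in n["Subscribers"])
    return hit

notifications = [
    {"Notification": {"NotificationType": "ACTUAL", "Threshold": 80},
     "Subscribers": [{"Address": "finops@company.com"}]},
    {"Notification": {"NotificationType": "FORECASTED", "Threshold": 100},
     "Subscribers": [{"Address": "cto@company.com"}]},
]
# $8,500 spent and $9,800 forecast against a $10,000 budget:
hit = triggered(notifications, actual=8500, forecast=9800, budget=10000)
# → only finops@company.com is alerted (85% actual; forecast still under 100%)
```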

FinOps culture and governance

Chargeback vs Showback

Showback: cost transparency, no internal billing

# showback_report.py - weekly email to each team

def generate_showback_report(team):
    costs = get_team_costs(team, days=7)

    report = f"""
    FinOps Weekly Report - {team}

    Last 7 days costs: ${costs['total']:.2f}

    Breakdown:
    - Compute (EC2/EKS): ${costs['compute']:.2f}
    - Storage (EBS/S3): ${costs['storage']:.2f}
    - Database (RDS): ${costs['database']:.2f}
    - Networking: ${costs['network']:.2f}

    Trend: {costs['trend']}% vs last week

    Top 5 resources:
    {format_top_resources(costs['top_resources'])}

    Optimization opportunities:
    - {len(costs['recommendations'])} rightsizing recommendations
    - Potential monthly savings: ${costs['potential_savings']:.2f}

    View detailed breakdown: https://finops.company.com/teams/{team}
    """

    send_email(f"{team}@company.com", "Weekly FinOps Report", report)
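`get_team_costs`, `format_top_resources`, and `send_email` above are assumed helpers. The two calculations the report leans on are small enough to sketch outright (the names are ours):

```python
def week_over_week_trend(this_week, last_week):
    """Percentage change vs last week, rounded to one decimal."""
    if last_week == 0:
        return 0.0
    return round((this_week - last_week) / last_week * 100, 1)

def top_resources(costs, n=5):
    """The n most expensive resources, highest first."""
    return sorted(costs.items(), key=lambda kv: kv[1], reverse=True)[:n]

print(week_over_week_trend(4620.0, 4200.0))
# → 10.0  (spend is up 10% on the week)
top = top_resources({"rds-prod": 900.0, "eks-nodes": 2100.0, "s3-logs": 150.0})
```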

Chargeback: actually billing teams for what they use

# chargeback_invoice.py - monthly run

def generate_chargeback_invoice(team, month):
    costs = get_team_costs(team, month=month)

    # Apply a markup for infrastructure overhead
    markup = 1.15  # 15% overhead
    total_with_markup = costs['total'] * markup

    invoice = {
        'team': team,
        'period': month,
        'subtotal': costs['total'],
        'markup': costs['total'] * 0.15,
        'total': total_with_markup,
        'cost_center': get_cost_center(team)
    }

    # Export to the ERP
    export_to_erp(invoice)

    return invoice
FinOps KPIs
# finops_kpis.py - executive dashboard

def calculate_finops_kpis():
    return {
        # Unit costs
        'cost_per_customer': total_costs / total_customers,
        'cost_per_transaction': total_costs / total_transactions,
        'cost_per_api_call': total_costs / total_api_calls,

        # Efficiency
        'compute_utilization': used_compute / provisioned_compute,
        'storage_utilization': used_storage / provisioned_storage,
        'waste_percentage': wasted_spend / total_spend,

        # Coverage
        'ri_coverage': reserved_hours / total_hours,
        'spot_usage': spot_hours / total_hours,

        # Governance
        'tagged_resources': tagged / total_resources,
        'budget_adherence': actual_spend / budgeted_spend
    }

FinOps checklist

Phase 1: Visibility (Month 1)

  • Tagging strategy defined and enforced
  • Tag compliance ≥90%
  • Cost Explorer configured
  • Billing dashboards created
  • Data exported to a data lake

Phase 2: Analysis (Month 2)

  • EC2/RDS utilization analysis
  • Rightsizing recommendations
  • Storage optimization (EBS/S3)
  • Idle resources identified
  • Quick wins implemented (20-30% savings)

Phase 3: Optimization (Months 3-4)

  • Kubecost deployed (if on K8s)
  • RIs/Savings Plans purchased (60-70% of base load)
  • Spot instance architecture in place
  • Budgets and alerts active
  • Weekly showback reports

Phase 4: Governance (Months 5-6)

  • Automated policies (tag enforcement)
  • Chargeback implemented
  • Monthly FinOps reviews
  • KPIs tracked and reported
  • FinOps culture established

Conclusion

FinOps becomes essential in 2026 as cloud budgets keep growing. Tagging, rightsizing, Reserved Instances, and Kubecost can save 30-50% of spend while preserving performance and agility.

Key takeaways:

  • Tagging is the foundation of visibility
  • Rightsizing delivers 20-30% quick wins
  • Kubecost is essential for Kubernetes FinOps
  • RIs/Savings Plans: 40-70% discount on base load
  • Spot: 70-90% discount on flexible workloads

Typical gains:

  • Savings: 30-50% of the cloud budget
  • ROI: 2-6 months
  • Visibility: 100% of resources tagged
  • Efficiency: +40% compute utilization
  • Waste: -80% idle resources

Priority actions:

  1. Make tagging mandatory
  2. Run a rightsizing analysis on EC2/RDS
  3. Deploy Kubecost (if on K8s)
  4. Purchase RIs/Savings Plans for base load
  5. Architect for Spot instances