Every week I hear a fresh story about an AWS key leak that turned into an all-nighter. The pattern is boring by now. Static keys on a laptop. An old repo on GitHub. A forgotten user with full power.
Here is the short version from the trenches. Do the boring things right in IAM and you sleep fine. Skip them and the pager gets a workout.
What belongs to the root account?
Almost nothing. Do not create access keys for root. Turn on MFA for root using a hardware token or a phone app. Store the recovery codes offline. Then put the root password in a password manager and forget it exists.
Create an admin group with a named admin user. That is your break glass identity. Use that for rare account tasks, not for daily work.
How do I keep keys off laptops?
Use short-lived credentials from STS AssumeRole. Keep a single long-term key in a locked-down base profile and jump to roles for real work. The CLI makes this smooth.
# ~/.aws/config
[profile base]
region = us-east-1
output = json
[profile dev]
source_profile = base
role_arn = arn:aws:iam::111122223333:role/DevEngineer
mfa_serial = arn:aws:iam::111122223333:mfa/you
# usage:
# aws --profile dev sts get-caller-identity
Now your terminal session has expiring creds and an audit trail that points to a role, not to a static user key you pasted into env vars last summer.
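One way to keep that setup honest is to lint the config file. This is only a sketch, assuming the profile layout shown above. The `ops` profile and its role are made-up names, added so there is a failing case to catch. It flags any role profile that has no mfa_serial.

```python
import configparser

# Sample config mirroring the post, plus a hypothetical "ops" profile
# that forgot mfa_serial (illustration only).
SAMPLE_CONFIG = """
[profile base]
region = us-east-1
output = json

[profile dev]
source_profile = base
role_arn = arn:aws:iam::111122223333:role/DevEngineer
mfa_serial = arn:aws:iam::111122223333:mfa/you

[profile ops]
source_profile = base
role_arn = arn:aws:iam::111122223333:role/Ops
"""


def role_profiles_missing_mfa(config_text):
    """Return section names that assume a role without an mfa_serial."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    missing = []
    for section in parser.sections():
        has_role = parser.has_option(section, "role_arn")
        has_mfa = parser.has_option(section, "mfa_serial")
        if has_role and not has_mfa:
            missing.append(section)
    return missing


print(role_profiles_missing_mfa(SAMPLE_CONFIG))  # prints ['profile ops']
```

Point it at the text of your own ~/.aws/config to run the same check for real.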
What should default deny look like in practice?
Start from least privilege. Give read-only access to most users. Add narrow write bits only where needed. When in doubt, add an explicit Deny for the scary stuff like wildcards on iam or kms or full s3.
{
"Version": "2012-10-17",
"Statement": [
{ "Effect": "Deny", "Action": "iam:*", "Resource": "*" },
{ "Effect": "Allow", "Action": ["s3:GetObject","s3:ListBucket"], "Resource": "*" }
]
}
Attach guardrail denies to groups that everyone has. Then grant tiny allow policies for the handful of tasks each team needs.
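Why does the Deny work as a guardrail? Because an explicit Deny beats any Allow, and anything not allowed is denied by default. Here is a toy evaluator to make that order concrete. It is a sketch, not the real IAM engine: it only matches on Action and ignores Resource, Condition, and everything else.

```python
from fnmatch import fnmatchcase

# The guardrail policy from the post, as a Python dict.
POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Deny", "Action": "iam:*", "Resource": "*"},
        {"Effect": "Allow", "Action": ["s3:GetObject", "s3:ListBucket"], "Resource": "*"},
    ],
}


def as_list(value):
    """Policy fields may be a single string or a list of strings."""
    return value if isinstance(value, list) else [value]


def evaluate(policy, action):
    """Simplified decision: explicit deny wins, then allow, else implicit deny."""
    decision = "implicitDeny"
    for statement in policy["Statement"]:
        patterns = as_list(statement["Action"])
        # Action names are case-insensitive, so compare in lower case.
        if not any(fnmatchcase(action.lower(), p.lower()) for p in patterns):
            continue
        if statement["Effect"] == "Deny":
            return "explicitDeny"
        decision = "allowed"
    return decision


for name in ("iam:CreateUser", "s3:GetObject", "ec2:RunInstances"):
    print(name, evaluate(POLICY, name))
```

The three sample actions come back as explicitDeny, allowed, and implicitDeny in that order.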
How do I grant access without sharing passwords?
Put permissions on roles and hand them to EC2 through instance profiles. Apps pull temporary creds from the instance metadata service. No keys in files. No secrets in AMIs. No copy paste into init scripts.
# EC2 role policy for S3 read in one bucket
{
"Version":"2012-10-17",
"Statement":[
{ "Effect":"Allow", "Action":["s3:GetObject"],
"Resource":"arn:aws:s3:::my-app-data/*" }
]
}
If the code can run on EC2 it should run with a role. That single move wipes out a whole class of leaks.
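The policy above is only the permissions half. The role also needs a trust policy that lets the EC2 service assume it. A minimal sketch of both documents follows. The helper names are mine, not anything from AWS, and the bucket name is the sample one from the post.

```python
import json


def ec2_trust_policy():
    """Trust policy that lets the EC2 service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "ec2.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def bucket_read_policy(bucket):
    """Permissions policy: read objects in one bucket and nothing else."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


print(json.dumps(ec2_trust_policy(), indent=2))
print(json.dumps(bucket_read_policy("my-app-data"), indent=2))
```

Generating both from code keeps the pair together in git, so nobody ships a role with a trust policy wider than intended.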
How do I share safely across accounts?
Use cross-account AssumeRole. Let the partner or the other team jump into a role with a trust policy. No long-term keys to rotate. No user sprawl.
# Trust policy on target role
{
"Version":"2012-10-17",
"Statement":[
{
"Effect":"Allow",
"Principal":{ "AWS":"arn:aws:iam::444455556666:root" },
"Action":"sts:AssumeRole",
"Condition":{ "StringEquals":{ "sts:ExternalId":"shared-access-123" } }
}
]
}
Give that role only what it needs. Timebox it. Log it. Sleep fine.
Can I enforce TLS and stop accidental public data?
Yes. Use an S3 bucket policy with the aws:SecureTransport condition key to reject plain HTTP. Then block public gets unless they come from the right IAM principals or IPs.
# S3 bucket policy snippet
{
"Version":"2012-10-17",
"Statement":[
{
"Sid":"DenyInsecure",
"Effect":"Deny",
"Principal":"*",
"Action":"s3:*",
"Resource":["arn:aws:s3:::my-private-bucket","arn:aws:s3:::my-private-bucket/*"],
"Condition":{ "Bool":{"aws:SecureTransport":"false"} }
}
]
}
Public by default is a trap. Make the bucket say no unless the request is precise.
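If bucket policies live in git, a small check can confirm the TLS deny is still there after every edit. A sketch under one assumption: the deny is written the way the snippet above writes it, as a Bool condition on aws:SecureTransport. Other valid spellings would slip past it.

```python
# The bucket policy from the post, as a Python dict.
BUCKET_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyInsecure",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::my-private-bucket",
                "arn:aws:s3:::my-private-bucket/*",
            ],
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }
    ],
}


def as_list(value):
    """Statement may be a single object or a list of objects."""
    return value if isinstance(value, list) else [value]


def denies_plain_http(policy):
    """True if some Deny statement triggers on aws:SecureTransport = false."""
    for statement in as_list(policy.get("Statement", [])):
        if statement.get("Effect") != "Deny":
            continue
        bool_block = statement.get("Condition", {}).get("Bool", {})
        if str(bool_block.get("aws:SecureTransport", "")).lower() == "false":
            return True
    return False


print(denies_plain_http(BUCKET_POLICY))  # prints True
```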
How do I review who can do what?
On the console, open a user and check Access Advisor to see services they touched. For quick checks use the CLI policy simulator to test an action before rollout.
# Will this principal be allowed to delete an EC2 snapshot?
aws iam simulate-principal-policy \
--policy-source-arn arn:aws:iam::111122223333:user/alice \
--action-names ec2:DeleteSnapshot
Do this in pull requests for infra changes. Trust but verify before merge.
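To use the simulator as a merge gate you need to read its answer. The sketch below parses the JSON the command prints and lists every action that was not allowed. The sample output is hand-written for illustration and trimmed to the fields the code reads.

```python
import json

# Hand-written sample of simulator output (illustrative, trimmed).
SAMPLE_OUTPUT = """
{
  "EvaluationResults": [
    {
      "EvalActionName": "ec2:DeleteSnapshot",
      "EvalResourceName": "*",
      "EvalDecision": "implicitDeny"
    },
    {
      "EvalActionName": "ec2:DescribeSnapshots",
      "EvalResourceName": "*",
      "EvalDecision": "allowed"
    }
  ]
}
"""


def denied_actions(simulator_json):
    """Return the action names whose decision is anything but 'allowed'."""
    results = json.loads(simulator_json)["EvaluationResults"]
    return [r["EvalActionName"] for r in results if r["EvalDecision"] != "allowed"]


print(denied_actions(SAMPLE_OUTPUT))  # prints ['ec2:DeleteSnapshot']
```

In a pipeline you would feed it the command's stdout and fail the build when the list does not match what the change is supposed to grant.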
What about audit and alerts?
Turn on CloudTrail in all regions. Send logs to a locked S3 bucket with log file validation. Point it to CloudWatch Logs and create metric filters for trouble like root logins or denied calls.
# Example filter for root sign in
aws logs put-metric-filter \
--log-group-name CloudTrail/Account \
--filter-name RootLogin \
--filter-pattern '{ ($.userIdentity.type = "Root") && ($.eventName = "ConsoleLogin") && ($.responseElements.ConsoleLogin = "Success") }' \
--metric-transformations metricName=RootLogin,metricNamespace=Security,metricValue=1
Wire that metric to SNS or your chat room. Fast signal beats long postmortems.
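Filter patterns are easy to get subtly wrong, so it can help to restate the same three checks in plain code and run them on sample events before you trust the alarm. This is a local mirror of the pattern's logic, not the CloudWatch matcher itself. The events are hand-made and trimmed to the fields the pattern reads.

```python
def is_root_console_login(event):
    """Mirror of the filter: root identity, ConsoleLogin event, success."""
    identity = event.get("userIdentity") or {}
    response = event.get("responseElements") or {}
    return (
        identity.get("type") == "Root"
        and event.get("eventName") == "ConsoleLogin"
        and response.get("ConsoleLogin") == "Success"
    )


# Hand-made sample events (illustrative only).
ROOT_OK = {
    "userIdentity": {"type": "Root"},
    "eventName": "ConsoleLogin",
    "responseElements": {"ConsoleLogin": "Success"},
}
USER_OK = {
    "userIdentity": {"type": "IAMUser"},
    "eventName": "ConsoleLogin",
    "responseElements": {"ConsoleLogin": "Success"},
}
ROOT_FAILED = {
    "userIdentity": {"type": "Root"},
    "eventName": "ConsoleLogin",
    "responseElements": {"ConsoleLogin": "Failure"},
}

for sample in (ROOT_OK, USER_OK, ROOT_FAILED):
    print(is_root_console_login(sample))
```

Only the first sample should match.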
How do I push people to MFA without nagging?
Set a strong account password policy. Then add a policy that denies risky actions unless MFA is on. Users learn fast when the console says no.
# Deny changes without MFA
{
"Version":"2012-10-17",
"Statement":[
{
"Effect":"Deny",
"Action":[ "iam:*", "ec2:TerminateInstances" ],
"Resource":"*",
"Condition":{ "BoolIfExists":{ "aws:MultiFactorAuthPresent":"false" } }
}
]
}
MFA for root. MFA for admins. MFA for anyone who can move money or delete data.
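The IfExists suffix in that condition matters. Requests signed with a plain long-term access key carry no aws:MultiFactorAuthPresent key at all. A bare Bool test does not match when the key is missing, so the Deny would skip those requests. BoolIfExists treats a missing key as a match. A toy model of the difference, covering just this one condition and not the real evaluator:

```python
MFA_KEY = "aws:MultiFactorAuthPresent"


def deny_applies(request_context, if_exists):
    """Does a Deny conditioned on MFA_KEY == "false" apply to this request?

    if_exists=True models BoolIfExists, if_exists=False models plain Bool.
    """
    if MFA_KEY not in request_context:
        # Missing key: BoolIfExists matches, plain Bool does not.
        return if_exists
    return request_context[MFA_KEY] == "false"


static_key_request = {}                  # long-term key, no MFA context
mfa_session = {MFA_KEY: "true"}          # session created with MFA
no_mfa_session = {MFA_KEY: "false"}      # session created without MFA

print(deny_applies(static_key_request, if_exists=True))   # prints True
print(deny_applies(static_key_request, if_exists=False))  # prints False
print(deny_applies(mfa_session, if_exists=True))          # prints False
print(deny_applies(no_mfa_session, if_exists=True))       # prints True
```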
How do I avoid policy spaghetti?
Keep policies small and named by task. Use groups. Put users in groups. Tag resources and use conditions instead of huge resource lists. Store policy JSON in git next to the Terraform or CloudFormation templates.
Give policies a clear path like app or team names. Put the version in the description field. Rotate by creating a new policy and swapping the attach, not by live editing.
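Here is what tags plus conditions can look like when the JSON is generated from code. A sketch, assuming your instances carry a `team` tag. The tag key and the team name are hypothetical. One small function replaces a hand-kept list of instance ARNs per team.

```python
import json


def tag_scoped_policy(team):
    """Allow start/stop only on instances tagged team=<team>.

    The "team" tag key is an assumption for this example.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["ec2:StartInstances", "ec2:StopInstances"],
                "Resource": "*",
                "Condition": {"StringEquals": {"ec2:ResourceTag/team": team}},
            }
        ],
    }


print(json.dumps(tag_scoped_policy("payments"), indent=2))
```

New instances are covered the moment they get the tag, with no policy edit.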
What changed recently that helps?
Lambda just landed in preview and people are toying with push based tasks. Do not glue it together with hard coded keys. Use roles for Lambda the same way you do for EC2. Same least privilege rules. Same logging story.
KMS is here too, which means you can stop rolling your own crypto keys in S3. Tie KMS grants to roles and let CloudTrail tell you who touched what.
Compact conclusion
Seven habits for no drama IAM:
No root keys. MFA on root. Roles for humans with STS. Roles for EC2 and Lambda. Deny on the scary stuff. CloudTrail in all regions with alerts. Small policies tied to tasks.
If you do only that, you avoid most of the classic disasters. Your keys stay short lived. Your blast radius stays small. Your logs tell the story when you need it.
Pro tip for today: run aws sts get-caller-identity before anything risky. If it prints the wrong role, stop. Save a night of pain.
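That check is easy to script. The sketch below reads the JSON that get-caller-identity prints and pulls the role name out of the Arn field, so a wrapper can refuse to continue on a mismatch. The sample output and the session name in it are made up for illustration.

```python
import json

# Made-up sample of get-caller-identity output (trimmed to Account and Arn).
SAMPLE_IDENTITY = """
{
  "Account": "111122223333",
  "Arn": "arn:aws:sts::111122223333:assumed-role/DevEngineer/my-session"
}
"""


def assumed_role_name(caller_identity_json):
    """Return the role name for an assumed-role ARN, or None otherwise."""
    arn = json.loads(caller_identity_json)["Arn"]
    # ARN layout: arn:partition:service:region:account:resource
    resource = arn.split(":", 5)[5]
    parts = resource.split("/")
    if parts[0] == "assumed-role" and len(parts) >= 3:
        return parts[1]
    return None


print(assumed_role_name(SAMPLE_IDENTITY))  # prints DevEngineer
```

Compare the result with the role you expect and bail out if it differs or comes back as None, which means you are on a user or root identity.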