Deployment Troubleshooting¶
Common deployment issues and solutions for SYNDI systems.
Deployment Failures¶
Stack in ROLLBACK_COMPLETE State¶
Symptom:
⚠️ Stack rawscribe-stage-myorg is in ROLLBACK_COMPLETE state
Cause: Previous deployment failed and CloudFormation rolled back
Solution:
# Automatic handling via rs-deploy-only
ORG=myorg ENV=stage make rs-deploy-only
The deployment automatically:
Detects ROLLBACK_COMPLETE state
Deletes the failed stack
Waits for deletion to complete
Proceeds with fresh deployment
Manual solution:
# Delete stack manually
aws cloudformation delete-stack \
--stack-name rawscribe-stage-myorg \
--region us-east-1
# Wait for deletion
aws cloudformation wait stack-delete-complete \
--stack-name rawscribe-stage-myorg
# Redeploy
ORG=myorg ENV=stage make rs-deploy
“Stack already exists”¶
Symptom:
Error: Stack rawscribe-stage-myorg already exists
Cause: Organization was previously deployed
Solution:
Option 1: Use different organization name
ORG=myorg2 ENV=stage make rs-deploy
Option 2: Update existing stack
ORG=myorg ENV=stage make rs-deploy-only
Option 3: Delete and redeploy
make rs-teardown ENV=stage ORG=myorg
# Wait 2-3 minutes
ORG=myorg ENV=stage make rs-deploy
“Bucket already exists”¶
Symptom:
Error: Bucket rawscribe-forms-stage-myorg-288761742376 already exists
Cause: CREATE_BUCKETS=true but buckets already exist
Solution:
# Use CREATE_BUCKETS=false for existing buckets
CREATE_BUCKETS=false ENABLE_AUTH=true \
ORG=myorg ENV=stage make rs-deploy
“Insufficient permissions”¶
Symptom:
Error: User: arn:aws:iam::123456:user/myuser is not authorized to perform: cloudformation:CreateStack
Cause: AWS IAM user lacks required permissions
Solution:
Check required permissions:
CloudFormation: CreateStack, UpdateStack, DescribeStacks
Lambda: CreateFunction, UpdateFunctionCode
S3: CreateBucket, PutObject, GetObject
Cognito: CreateUserPool, CreateUserPoolClient
API Gateway: CreateRestApi, CreateDeployment
IAM: CreateRole, AttachRolePolicy, PutRolePolicy
CloudFront: CreateDistribution
Request administrator access or specific permissions from your AWS admin.
Build Failures¶
SAM Build Fails¶
Symptom:
Error: Unable to find a supported build workflow for runtime python3.9
Cause: Build dependencies missing or corrupted build directory
Solution:
# Clean and rebuild
rm -rf .aws-sam-stage-myorg/
ORG=myorg ENV=stage make rs-deploy
Dependencies Not Installing¶
Symptom:
Error: Could not install packages due to an OSError
Cause: Network issues or package conflicts
Solution:
# Update requirements.txt with specific versions
vim backend/layers/dependencies/requirements.txt
# Clean and rebuild
rm -rf .aws-sam-stage-myorg/
ORG=myorg ENV=stage make rs-deploy
Layer Build Timeout¶
Symptom:
Error: Build timed out after 600 seconds
Cause: Too many dependencies or slow network
Solution:
# Build locally first
cd backend/layers/dependencies
pip install -r requirements.txt -t python/
# Then deploy
cd ../../../
ORG=myorg ENV=stage make rs-deploy
Configuration Issues¶
Config Not Found¶
Symptom:
❌ Base config not found: infra/.config/lambda/stage.json
Cause: Config files missing
Solution:
# Check if base configs exist
ls infra/.config/lambda/stage.json
ls infra/.config/webapp/stage.json
# If missing, restore from example or git
git checkout infra/.config/
# Or create minimal configs (see config-examples.md)
Configs Not Syncing¶
Symptom: API endpoint not updating in configs after deployment
Cause: Forgot to run sync-configs
Solution:
make sync-configs ENV=stage ORG=myorg
Lambda Function Issues¶
Function Not Updating¶
Symptom: Code changes not reflected in Lambda
Cause: Using cached build or wrong command
Solution:
# Force rebuild
rm -rf .aws-sam-stage-myorg/
ORG=myorg ENV=stage make rs-deploy
# Or use direct update
ORG=myorg ENV=stage make rs-deploy-function
“Function size too large”¶
Symptom:
Error: RequestEntityTooLargeException: Request must be smaller than 69905067 bytes
Cause: Lambda package > 50MB uncompressed
Solution:
# rs-deploy-function automatically handles this
# It uploads via S3 if package > 69MB
ORG=myorg ENV=stage make rs-deploy-function
# Or reduce package size:
# - Remove unnecessary dependencies
# - Use Lambda layers for large packages
Lambda Timeout¶
Symptom: Lambda execution times out
Cause: Function timeout set too low
Solution: Update template.yaml:
Globals:
Function:
Timeout: 60 # Increase from 30
Then redeploy:
ORG=myorg ENV=stage make rs-deploy
Authentication Issues¶
Cognito Resources Not Created¶
Symptom: No User Pool created after deployment
Cause: ENABLE_AUTH=false
Solution:
# Redeploy with auth enabled
ENABLE_AUTH=true ORG=myorg ENV=stage make rs-deploy-only
Admin User Not Created¶
Symptom: Admin user missing after deployment
Cause: Missing ADMIN_USERNAME or ADMIN_PASSWORD
Solution:
# Redeploy with admin credentials
ENABLE_AUTH=true \
ADMIN_USERNAME=admin@myorg.com \
ADMIN_PASSWORD=SecurePass2025! \
ORG=myorg ENV=stage make rs-deploy-only
JWT Validation Fails¶
Symptom: Valid tokens rejected by Lambda
Cause: Mismatched Cognito configuration
Solution:
# Check Lambda environment variables
aws lambda get-function-configuration \
--function-name rawscribe-stage-myorg-backend \
--query 'Environment.Variables' | jq
# Ensure COGNITO_USER_POOL_ID and COGNITO_CLIENT_ID are set
# Redeploy if needed
ORG=myorg ENV=stage make rs-deploy-only
Teardown and Redeploy¶
Safe Teardown (Preserves Data)¶
Removes Lambda and API Gateway, keeps Cognito and S3:
# Teardown stack (preserves User Pool and S3 buckets)
make rs-teardown ENV=stage ORG=myorg
What gets preserved:
✅ Cognito User Pool - User accounts remain
✅ S3 Buckets - All data preserved
✅ User passwords - No reset needed
What gets removed:
❌ Lambda function
❌ API Gateway
❌ CloudFormation stack
❌ CloudFront distribution
Redeploy after teardown:
# Redeploy with existing resources
CREATE_BUCKETS=false ENABLE_AUTH=true \
ORG=myorg ENV=stage make rs-deploy
# CloudFormation will discover and reuse existing User Pool and buckets
Complete Teardown (DANGEROUS - Destroys Data!)¶
WARNING: This deletes ALL data including user accounts and ELN submissions!
# Delete CloudFormation stack (initiates deletion)
aws cloudformation delete-stack \
--stack-name rawscribe-stage-myorg \
--region us-east-1
# Wait for deletion
aws cloudformation wait stack-delete-complete \
--stack-name rawscribe-stage-myorg
# Manually delete S3 buckets (CloudFormation can't delete non-empty buckets)
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
# Empty and delete each bucket
aws s3 rm s3://rawscribe-forms-stage-myorg-${ACCOUNT_ID} --recursive
aws s3 rb s3://rawscribe-forms-stage-myorg-${ACCOUNT_ID}
aws s3 rm s3://rawscribe-eln-stage-myorg-${ACCOUNT_ID} --recursive
aws s3 rb s3://rawscribe-eln-stage-myorg-${ACCOUNT_ID}
aws s3 rm s3://rawscribe-eln-drafts-stage-myorg-${ACCOUNT_ID} --recursive
aws s3 rb s3://rawscribe-eln-drafts-stage-myorg-${ACCOUNT_ID}
aws s3 rm s3://rawscribe-lambda-stage-myorg-${ACCOUNT_ID} --recursive
aws s3 rb s3://rawscribe-lambda-stage-myorg-${ACCOUNT_ID}
aws s3 rm s3://syndi-frontend-stage-myorg-${ACCOUNT_ID} --recursive
aws s3 rb s3://syndi-frontend-stage-myorg-${ACCOUNT_ID}
Verification After Teardown¶
# Check Lambda (should not exist)
aws lambda get-function --function-name rawscribe-stage-myorg-backend
# Expected: ResourceNotFoundException
# Check API Gateway (should not exist)
aws apigateway get-rest-apis \
--query "items[?name=='rawscribe-stage-myorg-api'].name"
# Expected: []
# Check User Pool (should exist if safe teardown)
aws cognito-idp list-user-pools --max-results 60 \
--query "UserPools[?contains(Name,'rawscribe-stage-myorg')].Name"
# Check S3 buckets (should exist if safe teardown)
aws s3 ls | grep "rawscribe.*myorg"
Performance Issues¶
Slow Deployments¶
Symptom: Deployment takes > 10 minutes
Causes and Solutions:
1. Layer rebuild:
# Check if requirements.txt changed
git diff backend/layers/dependencies/requirements.txt
# If unchanged, use rs-deploy-only (skips rebuild)
ORG=myorg ENV=stage make rs-deploy-only
2. Network issues:
# Check AWS connectivity
aws sts get-caller-identity
# Try different region (edit AWS config)
aws configure set region us-west-2
3. Large deployment artifacts:
# Check package size
ls -lh .aws-sam-stage-myorg/RawscribeLambda/
# Reduce if needed:
# - Remove unused dependencies
# - Optimize imports
Slow Lambda Cold Starts¶
Symptom: First request takes > 5 seconds
Solution: TBD - Lambda warming strategies
Resource Cleanup¶
Clean Build Directories¶
# Remove specific org build
rm -rf .aws-sam-stage-myorg/
# Remove all build directories
rm -rf .aws-sam-*/
# Force clean rebuild
make clean-frontend clean-backend
ORG=myorg ENV=stage make rs-deploy
Clean Local Test Data¶
# Clean test artifacts
make clean-test
# Clean local S3 simulation
rm -rf .local/s3/*
# Recreate local environment
make setup-local ENV=dev ORG=myorg
Verification Commands¶
Check Stack Exists¶
aws cloudformation describe-stacks \
--stack-name rawscribe-stage-myorg \
--query 'Stacks[0].StackName' \
--output text
Check Stack Status¶
make check-rs-stack-status ENV=stage ORG=myorg
# Or directly
aws cloudformation describe-stacks \
--stack-name rawscribe-stage-myorg \
--query 'Stacks[0].StackStatus' \
--output text
List All Resources¶
# Complete deployment check
make check-rs ENV=stage ORG=myorg
# List CloudFormation resources
aws cloudformation describe-stack-resources \
--stack-name rawscribe-stage-myorg
Getting Help¶
View Deployment Logs¶
# CloudFormation events
aws cloudformation describe-stack-events \
--stack-name rawscribe-stage-myorg \
--max-items 20
# Lambda logs
make rs-watch-log ENV=stage ORG=myorg
Check Recent Changes¶
# View stack events
aws cloudformation describe-stack-events \
--stack-name rawscribe-stage-myorg \
--query 'StackEvents[0:10].[Timestamp,ResourceStatus,ResourceType,ResourceStatusReason]' \
--output table