AutoConnectToPuttyWithEMR

AWS EMR SSH timeouts caused by inactivity can be resolved by deploying an auto-connect script through EMR Bootstrap Actions, which injects keep-alive settings into the SSH daemon configuration (sshd_config) on the cluster nodes. When long-running Spark, Hive, or Hadoop jobs execute on Amazon EMR, firewalls or NAT gateways can drop connections that have gone silent, causing the shell session to freeze.

Understanding the Solution

The primary strategy is to modify the server-side OpenSSH configuration automatically at cluster launch. The master and core nodes then send periodic keepalive messages to your SSH client, so the connection keeps carrying traffic and stays open even when the terminal sits idle for long periods.

1. Write the Auto-Connect Bootstrap Script

You must write a lightweight Bash script that adds client-alive directives to the SSH configuration. Save this file as emr-ssh-keepalive.sh and upload it to an Amazon S3 bucket that your EMR cluster can access.

    #!/bin/bash
    # AWS EMR SSH Auto-Connect Keep-Alive Script

    # Append keepalive rules to sshd_config (tee -a appends instead of overwriting)
    echo "ClientAliveInterval 60" | sudo tee -a /etc/ssh/sshd_config
    echo "ClientAliveCountMax 120" | sudo tee -a /etc/ssh/sshd_config

    # Restart the SSH daemon to apply the changes
    sudo systemctl restart sshd
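One caveat with plain appending: sshd uses the first value it obtains for each keyword, so an appended line has no effect if the same directive is already set earlier in the file, and repeated runs leave duplicate lines. The following is a minimal sketch of an idempotent alternative, not the script's original method. The helper name set_sshd_option and the scratch file demo_config are hypothetical, and the demo edits a temporary file rather than the live /etc/ssh/sshd_config. As a simplification, it removes every uncommented line for the keyword, including any inside a Match block.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: make "KEY VALUE" the only uncommented setting for KEY
# and place it on the first line, ahead of any Match block.
set_sshd_option() {
  local file="$1" key="$2" value="$3" tmp
  tmp="$(mktemp)"
  {
    printf '%s %s\n' "$key" "$value"
    # Keep every line whose first word is not KEY (sshd keywords are case-insensitive).
    awk -v key="$key" 'tolower($1) != tolower(key)' "$file"
  } > "$tmp"
  cat "$tmp" > "$file"   # overwrite in place so the file keeps its owner and mode
  rm -f "$tmp"
}

# Demo against a scratch file (assumption: not the real sshd_config).
demo_config="$(mktemp)"
printf '%s\n' '#ClientAliveInterval 0' 'clientaliveinterval 300' 'PermitRootLogin no' > "$demo_config"

set_sshd_option "$demo_config" ClientAliveInterval 60
set_sshd_option "$demo_config" ClientAliveCountMax 120
set_sshd_option "$demo_config" ClientAliveInterval 60   # repeat run does not duplicate the line

cat "$demo_config"
```

On a cluster node the function would have to run as root against /etc/ssh/sshd_config. Running sudo sshd -t before the restart checks that the edited file is still valid.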

ClientAliveInterval 60: If the node's SSH daemon receives nothing from the client for 60 seconds, it sends a message through the encrypted channel and asks the client to respond.

ClientAliveCountMax 120: The daemon tolerates up to 120 consecutive unanswered messages before closing the session. At one message every 60 seconds, an unresponsive client is kept for roughly 120 × 60 = 7,200 seconds, or 2 hours.

2. Configure the EMR Cluster Step-by-Step

To use the auto-connect helper script, you must bind it as a Bootstrap Action when creating your cluster.
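A bootstrap action that exits with an error can stop the cluster from launching, so it may be worth checking the script locally before uploading it. The sketch below is one possible pre-upload check, not something the original workflow requires. The helper name check_bootstrap_script and the scratch files good.sh and bad.sh are hypothetical.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical pre-upload check: parse the script without executing it.
check_bootstrap_script() {
  local script="$1"
  # A shebang tells the node which interpreter should run the script.
  if [ "$(head -c 2 "$script")" != '#!' ]; then
    echo "missing shebang: $script" >&2
    return 1
  fi
  # bash -n reads the commands and reports syntax errors but runs nothing.
  bash -n "$script"
}

# Demo with scratch files; in practice pass the path to emr-ssh-keepalive.sh.
workdir="$(mktemp -d)"
printf '%s\n' '#!/bin/bash' 'echo "ClientAliveInterval 60" | sudo tee -a /etc/ssh/sshd_config' > "$workdir/good.sh"
printf '%s\n' '#!/bin/bash' 'if true; then echo "unterminated"' > "$workdir/bad.sh"

check_bootstrap_script "$workdir/good.sh" && echo "good.sh: OK"
check_bootstrap_script "$workdir/bad.sh" 2>/dev/null || echo "bad.sh: syntax error detected"
```

The check only catches parse errors. It does not confirm that the commands succeed on an EMR node.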

Upload Code: Store emr-ssh-keepalive.sh inside your deployment bucket (e.g., s3://my-emr-bootstrap-bucket/scripts/).

Launch via AWS CLI: Execute the cluster creation snippet below, replacing placeholder values with your target AWS resource IDs:

    aws emr create-cluster \
      --name "Keep-Alive Enhanced EMR Cluster" \
      --release-label emr-7.1.0 \
      --applications Name=Spark Name=Hadoop \
      --ec2-attributes KeyName=your-ec2-key,InstanceProfile=EMR_EC2_DefaultRole \
      --service-role EMR_DefaultRole \
      --instance-type m5.xlarge \
      --instance-count 3 \
      --bootstrap-actions Path="s3://my-emr-bootstrap-bucket/scripts/emr-ssh-keepalive.sh",Name="SSH AutoConnect Action"

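Verify Settings: Once the cluster reaches the RUNNING or WAITING state, connect over SSH as usual, replacing the host portion with your cluster's master public DNS name:

    ssh -i your-ec2-key.pem hadoop@://amazonaws.com

After logging in, print the daemon's effective keepalive values to confirm that the bootstrap action took effect:

    sudo sshd -T | grep -i clientalive

3. Alternative Local Machine Configurations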

If you cannot rebuild or restart an active production EMR cluster, apply matching parameters to your local developer machine instead.

For Linux/macOS Clients: Append keepalive entries to your per-user SSH client configuration file (~/.ssh/config):

    Host *.amazonaws.com
        ServerAliveInterval 59
        ServerAliveCountMax 3

For PuTTY Windows Clients: Select Connection in the category tree of the configuration window and change the “Seconds between keepalives (0 to turn off)” field from 0 to 59. Save the session so the setting persists for later connections.
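For the Linux/macOS case, the client-side entry can also be added by script. The following is a minimal sketch under stated assumptions: the helper name add_keepalive_block and the scratch path are hypothetical, and the demo writes to a temporary file instead of the real ~/.ssh/config.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: append the keepalive block unless the Host pattern already exists.
add_keepalive_block() {
  local config="$1" pattern='*.amazonaws.com'
  mkdir -p "$(dirname "$config")"
  touch "$config"
  if grep -qxF "Host $pattern" "$config"; then
    echo "Host $pattern already present in $config; leaving it unchanged"
    return 0
  fi
  cat >> "$config" <<EOF

Host $pattern
    ServerAliveInterval 59
    ServerAliveCountMax 3
EOF
  chmod 600 "$config"   # keep the client config readable by its owner only
}

# Demo on a scratch file; pass "$HOME/.ssh/config" to change the real one.
scratch_config="$(mktemp -d)/config"
add_keepalive_block "$scratch_config"
add_keepalive_block "$scratch_config"   # second call finds the block and skips
cat "$scratch_config"
```

With these values the client sends a keepalive after 59 seconds of silence and gives up after 3 unanswered messages, which is about 177 seconds. On recent OpenSSH clients, ssh -G followed by a host name prints the configuration that would apply to that host. For a single session, the same settings can be passed as -o ServerAliveInterval=59 -o ServerAliveCountMax=3.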
