Presto setup with AWS EC2 & S3 (2/2)

Chyi-Kwei Yau
May 20, 2021

This is the second part of my Presto setup notes (check here for the first part). In the previous post we set up a VPC with a standalone metastore; in this post we will set up the Presto coordinator and workers and run some quick tests.

Presto

For the Presto configs, we need fixed URIs for the metastore and the Presto coordinator. To keep the config simple, I will create 2 ENIs (Elastic Network Interfaces), one for the metastore and one for the coordinator. These 2 private IPs will be attached to the active metastore and coordinator EC2 instances, so we can use fixed addresses in our configuration. I use:

  • 10.0.0.20 for metastore
  • 10.0.0.10 for presto coordinator
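
Creating an ENI with a fixed private IP is a one-liner with the AWS CLI; the subnet and security group IDs below are placeholders for your own:

# Create the coordinator ENI (repeat with 10.0.0.20 for the metastore)
aws ec2 create-network-interface --subnet-id subnet-xxxxxxxx \
  --groups sg-xxxxxxxx --private-ip-address 10.0.0.10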

For high availability, you can also use a script that checks instance status and re-attaches the ENIs to healthy instances, along the lines of the sketch below.
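
A minimal failover sketch with the AWS CLI; the ENI and instance IDs are placeholders, and device-index 1 assumes the ENI is attached as a secondary interface (eth1):

#!/bin/bash
# Hypothetical failover sketch: move the coordinator ENI to a healthy instance.
ENI_ID=eni-xxxxxxxx
NEW_INSTANCE_ID=i-xxxxxxxx

# Find the ENI's current attachment (if any) and detach it.
ATTACHMENT_ID=$(aws ec2 describe-network-interfaces \
  --network-interface-ids "$ENI_ID" \
  --query 'NetworkInterfaces[0].Attachment.AttachmentId' --output text)
if [ "$ATTACHMENT_ID" != "None" ]; then
  aws ec2 detach-network-interface --attachment-id "$ATTACHMENT_ID" --force
fi

# Attach the ENI to the healthy instance as a secondary interface.
aws ec2 attach-network-interface --network-interface-id "$ENI_ID" \
  --instance-id "$NEW_INSTANCE_ID" --device-index 1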

Presto AMI

We will create a shared image (AMI) for the coordinator and workers since most of the configs are similar. The image holds a few config templates, and the final configs are generated when each server initializes (through EC2 “user data”).

To build the image:

# Install OpenJDK 11
sudo yum update -y
sudo amazon-linux-extras install java-openjdk11
# Install Presto
cd ~
wget https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.252/presto-server-0.252.tar.gz
tar -xvf presto-server-0.252.tar.gz
mv presto-server-0.252 presto
sudo mv presto /opt/presto
sudo mkdir -p /opt/presto/etc/
sudo chown ec2-user /opt/presto/etc/
# Create the config template dir (the templates below go here)
sudo mkdir -p /opt/presto/conf_template/
sudo chown ec2-user /opt/presto/conf_template/
# Create the data dir
sudo mkdir -p /var/presto/data
sudo chown ec2-user /var/presto/data

Also, create the following templates in the “/opt/presto/conf_template/” folder. (Note: parts of these scripts are copied and modified from the presto-on-aws repo.)

config.properties

coordinator={{isCoor}}
node-scheduler.include-coordinator={{includeCoor}}
http-server.http.port=8080
query.max-memory={{maxMem}}
query.max-memory-per-node={{maxMemPerNode}}
query.max-total-memory-per-node={{maxTotalMemPerNode}}
discovery.uri=http://{{discoverUri}}:8080

node.properties

node.environment={{nodeEnv}}
node.id={{nodeID}}
node.data-dir=/var/presto/data

jvm.config

-server
-Xmx16G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
-Djdk.attach.allowAttachSelf=true
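
The catalog configs are not shown in this note, but the quick test at the end assumes a Hive catalog (backed by the standalone metastore from part 1) and a TPCH catalog. A minimal sketch, assuming the metastore Thrift service listens on the default port 9083, is two extra files; since they are identical on every node, they can be baked straight into the AMI under “/opt/presto/etc/catalog/” instead of templated:

catalog/hive.properties

connector.name=hive-hadoop2
hive.metastore.uri=thrift://10.0.0.20:9083

catalog/tpch.properties

connector.name=tpch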

presto.service

[Unit]
Description=Presto
After=syslog.target network.target
[Service]
Environment=JAVA_HOME=/usr/lib/jvm/java-11-openjdk-11.0.9.11-0.amzn2.0.1.x86_64
User=ec2-user
Type=forking
ExecStart=/opt/presto/bin/launcher start
ExecStop=/opt/presto/bin/launcher stop
Restart=always
[Install]
WantedBy=multi-user.target

Presto Coordinator

Once the AMI is created, we can set up the coordinator with “user data” like:

#!/bin/bash
# add config.properties
cp /opt/presto/conf_template/config.properties /opt/presto/etc/config.properties
chown ec2-user /opt/presto/etc/config.properties
sed -i -e "s/{{isCoor}}/true/g" /opt/presto/etc/config.properties
sed -i -e "s/{{includeCoor}}/false/g" /opt/presto/etc/config.properties
sed -i -e "s/{{maxMem}}/50GB/g" /opt/presto/etc/config.properties
sed -i -e "s/{{maxMemPerNode}}/6GB/g" /opt/presto/etc/config.properties
sed -i -e "s/{{maxTotalMemPerNode}}/7GB/g" /opt/presto/etc/config.properties
sed -i -e "s/{{discoverUri}}/localhost/g" /opt/presto/etc/config.properties
echo "discovery-server.enabled=true" >> /opt/presto/etc/config.properties
# add node.properties
cp /opt/presto/conf_template/node.properties /opt/presto/etc/node.properties
chown ec2-user /opt/presto/etc/node.properties
sed -i -e "s/{{nodeEnv}}/production/g" /opt/presto/etc/node.properties
sed -i -e "s/{{nodeID}}/$(curl -s http://169.254.169.254/latest/meta-data/instance-id)/g" /opt/presto/etc/node.properties
# add jvm.config
cp /opt/presto/conf_template/jvm.config /opt/presto/etc/jvm.config
chown ec2-user /opt/presto/etc/jvm.config
# install and start as a service
sudo cp /opt/presto/conf_template/presto.service /etc/systemd/system/presto.service
sudo systemctl enable presto
sudo systemctl start presto

(Note: you will need to adjust the memory settings based on your instance type.)
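
To check that the coordinator came up, you can look at the service status and tail the server log (the log path follows from the “node.data-dir” set above):

sudo systemctl status presto
tail -f /var/presto/data/var/log/server.log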

Presto Worker

Similar to the coordinator, I created an Auto Scaling group for the workers with the following “user data”:

#!/bin/bash
# add config.properties
cp /opt/presto/conf_template/config.properties /opt/presto/etc/config.properties
chown ec2-user /opt/presto/etc/config.properties
sed -i -e "s/{{isCoor}}/false/g" /opt/presto/etc/config.properties
sed -i -e "s/{{includeCoor}}/false/g" /opt/presto/etc/config.properties
sed -i -e "s/{{maxMem}}/50GB/g" /opt/presto/etc/config.properties
sed -i -e "s/{{maxMemPerNode}}/7GB/g" /opt/presto/etc/config.properties
sed -i -e "s/{{maxTotalMemPerNode}}/8GB/g" /opt/presto/etc/config.properties
sed -i -e "s/{{discoverUri}}/10.0.0.10/g" /opt/presto/etc/config.properties
# add node.properties
cp /opt/presto/conf_template/node.properties /opt/presto/etc/node.properties
chown ec2-user /opt/presto/etc/node.properties
sed -i -e "s/{{nodeEnv}}/production/g" /opt/presto/etc/node.properties
sed -i -e "s/{{nodeID}}/$(curl -s http://169.254.169.254/latest/meta-data/instance-id)/g" /opt/presto/etc/node.properties
# add jvm.config
cp /opt/presto/conf_template/jvm.config /opt/presto/etc/jvm.config
chown ec2-user /opt/presto/etc/jvm.config
# install and start as a service
sudo cp /opt/presto/conf_template/presto.service /etc/systemd/system/presto.service
sudo systemctl enable presto
sudo systemctl start presto
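
Before dropping into the CLI, you can also confirm that workers registered with the coordinator through its REST API (using the fixed coordinator IP from above):

curl http://10.0.0.10:8080/v1/node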

Quick Test

Now the Presto coordinator and workers should be up and running. First, we can connect to the coordinator from the CLI:

./presto-cli --server 10.0.0.10:8080 --catalog hive --schema default
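
(If the CLI is not installed yet, the executable JAR can be pulled from Maven Central; this assumes the same 0.252 release as the server.)

wget https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.252/presto-cli-0.252-executable.jar -O presto-cli
chmod +x presto-cli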

Once connected, check if the workers were discovered:

select * from system.runtime.nodes;

You should see the worker instance IDs in the “node_id” column.

To test that the Hive connector and S3 are working, we can create a table in S3 with TPCH data:

CREATE SCHEMA hive.mytest WITH (location = 's3a://{{s3 bucket name}}/mytest/');
CREATE TABLE hive.mytest.lineitem_sf10 AS SELECT * FROM tpch.sf10.lineitem;
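
As a quick sanity check on the new table, a count should return 59,986,052 rows (the TPCH lineitem size at scale factor 10):

SELECT count(*) FROM hive.mytest.lineitem_sf10;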

We can also use the web UI to check query progress and other stats. You can set up an SSH tunnel to reach the coordinator:

ssh -L localhost:8080:10.0.0.10:8080 ssh-login

Now you can access the web UI at localhost:8080.
