Running Presto on Amazon Web Services

Dec 9, 2020

Article by Hakim Pocketwalla, Big Data & Cloud Developer

Introduction

Presto is an open source, distributed query engine for running fast, interactive, analytical queries on datasets of various sizes. It is SQL compliant and supports many data sources.

Presto is a distributed system that runs on a cluster of machines. A full installation includes a coordinator and multiple workers.
Queries are submitted from a client such as the Presto CLI to the coordinator. The coordinator parses, analyzes and plans the query execution, then distributes the processing to the workers.

Presto Architecture

Presto Requirements

Presto has a few basic requirements:

Linux or Mac OS X
Java 8, 64-bit
Python 2.4+

Installing and using Presto

We can download Presto from its official website as a tarball package: https://prestosql.io/download.html
Unpacking it produces a single top-level directory.

We now need to create an "etc" directory inside the installation directory.
The "etc" directory will hold the following configuration files:

Node Properties: environmental configuration specific to each node
JVM Config: command line options for the Java Virtual Machine
Config Properties: configuration for the Presto server
Catalog Properties: configuration for Connectors (data sources)

More details on these configuration files can be found in the Presto deployment guide:
https://prestodb.io/docs/current/installation/deployment.html
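
As a rough illustration of the layout described above, the following shell sketch creates a minimal single-node "etc" directory. The property names follow the examples in the Presto deployment guide; the environment name, node id, memory sizes and the scratch PRESTO_HOME location are assumptions chosen for illustration, not values from a real cluster.

```shell
#!/bin/sh
set -eu

# Assumption: PRESTO_HOME is the unpacked tarball directory.
# A scratch directory is used by default so the sketch is safe to run anywhere.
PRESTO_HOME="${PRESTO_HOME:-$(mktemp -d)}"
mkdir -p "$PRESTO_HOME/etc/catalog"

# Node Properties: environment settings specific to this node (values are illustrative)
cat > "$PRESTO_HOME/etc/node.properties" <<EOF
node.environment=demo
node.id=demo-node-1
node.data-dir=$PRESTO_HOME/data
EOF

# JVM Config: one command line option per line (heap size is an assumption)
cat > "$PRESTO_HOME/etc/jvm.config" <<'EOF'
-server
-Xmx4G
-XX:+UseG1GC
-XX:+HeapDumpOnOutOfMemoryError
EOF

# Config Properties: a single node acting as both coordinator and worker
cat > "$PRESTO_HOME/etc/config.properties" <<'EOF'
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=2GB
query.max-memory-per-node=1GB
discovery-server.enabled=true
discovery.uri=http://localhost:8080
EOF

# Catalog Properties: one file per connector; JMX needs no extra settings
cat > "$PRESTO_HOME/etc/catalog/jmx.properties" <<'EOF'
connector.name=jmx
EOF

echo "Wrote Presto configuration under $PRESTO_HOME/etc"
```

On a real multi-node installation the worker nodes would use coordinator=false and point discovery.uri at the coordinator, and each node needs its own unique node.id.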

Running Presto on Amazon Web Services

As mentioned earlier, Presto works in a distributed environment and hence needs a cluster to run on.
Lucky for us, Amazon Web Services offers Presto as an application that can be installed on EMR clusters, with all the basic configuration already done.
We need not create any of the basic configuration files mentioned in the previous section. AWS provides a complete working setup configured across all nodes of the cluster.
As of the date of writing this blog, EMR version 5.30.1 ships Presto version 0.232 and EMR version 6.0.0 ships Presto version 0.230.
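
A cluster like this could be requested from the command line roughly as follows. This is only a sketch: it assumes the AWS CLI is installed and configured, and the cluster name, key pair name, instance type and instance count are hypothetical placeholders. By default it only prints the command rather than calling AWS.

```shell
#!/bin/sh
set -eu

# DRY_RUN=1 (the default) prints the command instead of creating a cluster.
if [ "${DRY_RUN:-1}" = "1" ]; then AWS="echo aws"; else AWS="aws"; fi

# Request an EMR cluster with the Presto application installed.
# Name, key pair, instance type and count are placeholders, not recommendations.
create_presto_cluster() {
  $AWS emr create-cluster \
    --name "presto-demo" \
    --release-label emr-5.30.1 \
    --applications Name=Presto \
    --instance-type m5.xlarge \
    --instance-count 3 \
    --use-default-roles \
    --ec2-attributes KeyName=my-key-pair
}

create_presto_cluster
```

Setting DRY_RUN=0 would submit the request for real, which starts billable instances.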

We can check our configurations at the path:
/etc/presto/conf/

The Presto installation directory can be found at:
/usr/lib/presto/

From here we can navigate into the bin folder and start the Presto CLI using the prepackaged executable.

Note: On EMR, Presto is set up to run on port 8889 rather than the default port 8080.
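
Because of the non-default port, it helps to pass the server address explicitly when starting the CLI. The sketch below assumes it is run on the EMR master node and that a presto-cli launcher is on the PATH; the "hive" catalog and "default" schema are illustrative names. If the launcher is not found, the function only prints the command it would have run.

```shell
#!/bin/sh

# Presto on EMR listens on 8889, not the default 8080.
PRESTO_SERVER="localhost:8889"

# Run the Presto CLI against the EMR coordinator, or print the command
# when presto-cli is not available on this machine.
run_presto() {
  if command -v presto-cli >/dev/null 2>&1; then
    presto-cli --server "$PRESTO_SERVER" "$@"
  else
    echo "presto-cli --server $PRESTO_SERVER $*"
  fi
}

# List the schemas of the (assumed) hive catalog without opening an interactive session.
run_presto --catalog hive --schema default --execute "SHOW SCHEMAS;"
```

Dropping the --execute option opens the interactive prompt instead.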

Configuring external connectors with Presto on EMR

To configure external connectors that are supported by Presto, we need to create their respective configuration files at the following path:
/etc/presto/conf/catalog/

The naming convention for a properties file is:
"<supported connector name>.properties"
Each properties file contains the additional configuration required for that particular connector.

Suppose we want to configure MySQL as a data source for Presto. Let us have a look at how we can do it:

First we navigate to the directory: /etc/presto/conf/catalog/
Then we create a new file with the name: "mysql.properties"
Inside the "mysql.properties" file, we add the following configuration:

connector.name=mysql
connection-url=jdbc:mysql://host:port
connection-user=mysql_username
connection-password=mysql_password

We now need to copy this file to the same location on each of the worker nodes.
Once the file is copied, we need to stop and start the Presto service on each node of the cluster. We can do so using the commands:
sudo systemctl stop presto-server
sudo systemctl start presto-server
We can now start the Presto CLI and begin using MySQL with Presto.
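
The steps above can be sketched as a small script. It is only a sketch under assumptions that are not part of the article: WORKERS holds hypothetical worker hostnames, SSH access with sudo rights to each node is already in place, and the file is staged in a scratch directory first. With DRY_RUN=1 (the default) the copy and restart commands are printed rather than executed.

```shell
#!/bin/sh
set -eu

CATALOG_DIR="/etc/presto/conf/catalog"
WORKERS="${WORKERS:-worker-1 worker-2}"   # hypothetical hostnames
DRY_RUN="${DRY_RUN:-1}"
STAGING="${STAGING:-$(mktemp -d)}"

# Write the catalog file locally first. host:port and the credentials
# are the article's placeholders and must be replaced with real values.
cat > "$STAGING/mysql.properties" <<'EOF'
connector.name=mysql
connection-url=jdbc:mysql://host:port
connection-user=mysql_username
connection-password=mysql_password
EOF

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Install the file on the current node and restart Presto.
run sudo cp "$STAGING/mysql.properties" "$CATALOG_DIR/"
run sudo systemctl stop presto-server
run sudo systemctl start presto-server

# Copy the same file to every worker and restart Presto there as well.
for host in $WORKERS; do
  run scp "$STAGING/mysql.properties" "$host:/tmp/mysql.properties"
  run ssh "$host" "sudo mv /tmp/mysql.properties $CATALOG_DIR/ && sudo systemctl stop presto-server && sudo systemctl start presto-server"
done
```

Presto names each catalog after its properties file, so once the services are back up a command such as presto-cli --server localhost:8889 --catalog mysql --execute "SHOW SCHEMAS;" should list the MySQL databases, assuming the presto-cli launcher is on the PATH.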

In a similar fashion, we can configure as many of the supported Presto connectors as necessary.
More information on the connectors can be found in the official Presto documentation:
https://prestodb.io/docs/current/index.html

Written by BigData & Cloud Practice
