Running Presto on Amazon Web Services

Article by Hakim Pocketwalla, Big Data & Cloud Developer

Introduction

Presto is a distributed system that runs on a cluster of machines. A full installation includes a coordinator and multiple workers.
Queries are submitted from a client such as the Presto CLI to the coordinator. The coordinator parses, analyzes and plans the query execution, then distributes the processing to the workers.

Presto Architecture

Presto Requirements

  • Linux or Mac OS X
  • Java 8, 64-bit
  • Python 2.4+

Installing and using Presto

We now need to create an “etc” directory inside the installation directory.
The “etc” directory will hold the following configuration:

  • Node Properties: environmental configuration specific to each node
  • JVM Config: command line options for the Java Virtual Machine
  • Config Properties: configuration for the Presto server
  • Catalog Properties: configuration for Connectors (data sources)
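As a rough illustration of how these four pieces fit together, the sketch below creates a minimal “etc” layout. The file names follow the Presto deployment guide; the function name make_presto_etc and all of the values written (node id, heap size, port, the jmx catalog) are placeholder assumptions for a single-node test setup, not recommended settings.

```shell
# Create a skeleton Presto "etc" directory under the given installation directory.
# All values below are illustrative placeholders, not tuned settings.
make_presto_etc() {
  presto_home="$1"
  mkdir -p "$presto_home/etc/catalog"

  # Node Properties: environment settings specific to this node
  cat > "$presto_home/etc/node.properties" <<EOF
node.environment=production
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
node.data-dir=/var/presto/data
EOF

  # JVM Config: one JVM command line option per line
  cat > "$presto_home/etc/jvm.config" <<EOF
-server
-Xmx16G
EOF

  # Config Properties: Presto server settings (this node acts as coordinator and worker)
  cat > "$presto_home/etc/config.properties" <<EOF
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
discovery-server.enabled=true
discovery.uri=http://localhost:8080
EOF

  # Catalog Properties: one file per data source; the JMX connector needs no extra settings
  cat > "$presto_home/etc/catalog/jmx.properties" <<EOF
connector.name=jmx
EOF
}
```

Calling make_presto_etc with the path of the installation directory writes the files there; on a real cluster each node would need its own node.id, and workers would set coordinator=false.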

More details on these configurations can be found in the Presto deployment guide:
https://prestodb.io/docs/current/installation/deployment.html

Running Presto on Amazon Web Services

We can check our configurations at the path:
/etc/presto/conf/

The Presto installation directory can be found at:
/usr/lib/presto/

From here we can navigate into the bin folder and use the prepackaged Presto CLI executable to start the Presto CLI.

Note: On EMR, Presto is set up to run on port 8889 rather than the default port 8080.
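Because of the non-default port, the CLI has to be pointed at the coordinator explicitly. The sketch below shows one way to do that. It assumes the commands run on the master node, that a presto-cli launcher is on the PATH (otherwise substitute the executable from the bin folder), and that a hive catalog with a default schema exists; the helper names presto_server and start_presto_cli are hypothetical.

```shell
# Coordinator port on EMR, per the note above (a stock Presto install uses 8080).
PRESTO_PORT=8889

# Build the host:port address the CLI should connect to.
# An optional first argument overrides the port.
presto_server() {
  echo "localhost:${1:-$PRESTO_PORT}"
}

# Start an interactive CLI session against the local coordinator.
# Assumes the hive catalog and its default schema are available.
start_presto_cli() {
  presto-cli --server "$(presto_server)" --catalog hive --schema default
}
```

Running start_presto_cli opens the interactive prompt; the same --server value applies when passing a single statement with the CLI's --execute option.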

Configuring external connectors with Presto on EMR

The naming convention used to create a properties file is:
“<supported connector name>.properties”
Each properties file contains the additional configuration required for that particular connector.
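The file name matters because Presto takes the catalog name used in queries from it: the name of the file without the .properties suffix. A small sketch of that mapping follows; the helper name catalog_name is hypothetical and only illustrates the rule.

```shell
# Derive the catalog name Presto will use from a catalog properties file path:
# the file name with the ".properties" suffix removed.
catalog_name() {
  basename "$1" .properties
}
```

For example, catalog_name /etc/presto/conf/catalog/mysql.properties yields mysql, so tables from that source would be addressed as mysql.<schema>.<table>.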

Suppose we want to configure MySQL as a source for Presto. Let us look at how we can do it:

  • First we navigate to the directory: /etc/presto/conf/catalog/
  • Then we create a new file with the name: “mysql.properties”
  • Inside the “mysql.properties” file, we add the following configuration:

connector.name=mysql
connection-url=jdbc:mysql://host:port
connection-user=mysql_username
connection-password=mysql_password

  • We now need to copy this file to each of the worker nodes, in the same location.
  • Once the file is copied, we need to stop and start the Presto service on each node of the cluster. We can do so by using the commands:
    sudo systemctl stop presto-server
    sudo systemctl start presto-server
  • We can now start the Presto CLI and begin using MySQL with Presto.
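The steps above can be sketched as a pair of shell helpers. This is only an outline under stated assumptions: the function names write_mysql_catalog and deploy_to_worker are hypothetical, the worker host is whatever name or address reaches the node, and the copy step assumes SSH access to the worker with sudo rights for the current user.

```shell
# Write a MySQL catalog file into the given catalog directory.
# Arguments: catalog directory, MySQL host, port, user, password.
write_mysql_catalog() {
  catalog_dir="$1"; db_host="$2"; db_port="$3"; db_user="$4"; db_password="$5"
  cat > "$catalog_dir/mysql.properties" <<EOF
connector.name=mysql
connection-url=jdbc:mysql://$db_host:$db_port
connection-user=$db_user
connection-password=$db_password
EOF
}

# Copy the catalog file to one worker and restart Presto there.
# Assumes SSH access and sudo rights on the worker (hypothetical setup).
deploy_to_worker() {
  worker="$1"
  scp /etc/presto/conf/catalog/mysql.properties "$worker:/tmp/mysql.properties"
  ssh "$worker" "sudo mv /tmp/mysql.properties /etc/presto/conf/catalog/"
  ssh "$worker" "sudo systemctl stop presto-server"
  ssh "$worker" "sudo systemctl start presto-server"
}
```

A typical run would call write_mysql_catalog once on the master node with the real connection details, then deploy_to_worker for each worker, and restart presto-server on the master as well. Afterwards, a statement such as SHOW SCHEMAS FROM mysql; in the Presto CLI is a quick way to confirm that the catalog is visible.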

In a similar fashion we can configure as many of the supported Presto connectors as necessary.
More information on the connectors can be found in the official Presto documentation:
https://prestodb.io/docs/current/index.html
