Running Presto on Amazon Web Services

Introduction

Presto is an open-source, distributed SQL query engine for running fast, interactive, analytical queries against datasets of any size. It supports ANSI SQL and can query many different data sources through connectors.

Presto Architecture

Presto Requirements

Presto has a few basic requirements:

  • Linux or Mac OS X
  • Java 8, 64-bit
  • Python 2.4+ (used by the bin/launcher script)

Installing and using Presto

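We can download Presto as a tarball from its official website: https://prestosql.io/download.html
Extracting the tarball produces a single top-level directory, the installation directory. Inside it, we create an etc directory that holds the following configuration: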

  • Node Properties: environmental configuration specific to each node
  • JVM Config: command line options for the Java Virtual Machine
  • Config Properties: configuration for the Presto server
  • Catalog Properties: configuration for Connectors (data sources)
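As a rough sketch for a manual (non-EMR) install, the script below creates that etc layout for a single machine acting as both coordinator and worker. The property names follow the examples in the Presto deployment documentation, but PRESTO_HOME, the node ID, the data directory, the heap size and the memory limits are placeholder assumptions to adjust for your own hardware.

```bash
#!/usr/bin/env bash
# Sketch: create the etc/ layout for a single-node test install of Presto.
set -euo pipefail

# Assumption: PRESTO_HOME is the extracted tarball directory (defaults to cwd).
PRESTO_HOME="${PRESTO_HOME:-.}"
ETC="${PRESTO_HOME}/etc"
mkdir -p "${ETC}/catalog"

# Node Properties: node.id is a placeholder and must be unique for every node.
cat > "${ETC}/node.properties" <<'EOF'
node.environment=production
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
node.data-dir=/var/presto/data
EOF

# JVM Config: one JVM option per line; heap size is a placeholder.
cat > "${ETC}/jvm.config" <<'EOF'
-server
-Xmx16G
-XX:+UseG1GC
-XX:+ExitOnOutOfMemoryError
EOF

# Config Properties: this single node acts as coordinator and worker at once.
cat > "${ETC}/config.properties" <<'EOF'
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=1GB
discovery-server.enabled=true
discovery.uri=http://localhost:8080
EOF

# Catalog Properties: one file per data source; the JMX connector needs no external system.
cat > "${ETC}/catalog/jmx.properties" <<'EOF'
connector.name=jmx
EOF
```

With these files in place, bin/launcher run starts the server in the foreground, and bin/launcher start runs it as a daemon.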

Running Presto on Amazon Web Services

As mentioned earlier, Presto runs in a distributed environment and therefore needs a cluster to work with.
Conveniently, Amazon EMR can install Presto as part of a cluster, with the basic configuration already in place.
We do not need to create any of the configuration files described in the previous section: AWS provides a complete working setup configured across all nodes of the cluster.
At the time of writing, EMR 5.30.1 ships Presto 0.232 and EMR 6.0.0 ships Presto 0.230.
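A cluster like this can be launched from the AWS CLI. The sketch below makes several assumptions: the cluster name, instance type, instance count and key pair name are placeholders, and --use-default-roles assumes the default EMR roles already exist (aws emr create-default-roles creates them). By default it only prints the command; setting RUN=1 actually calls the AWS CLI.

```bash
#!/usr/bin/env bash
# Sketch: launch a small EMR cluster with Presto pre-installed.
set -euo pipefail

# Placeholders (assumptions): release label and EC2 key pair name.
RELEASE="${RELEASE:-emr-5.30.1}"
KEY_NAME="${KEY_NAME:-my-key-pair}"

# Build the command as an array so arguments with special characters stay intact.
cmd=(aws emr create-cluster
  --name "presto-demo"
  --release-label "$RELEASE"
  --applications Name=Hadoop Name=Presto
  --instance-type m5.xlarge
  --instance-count 3
  --use-default-roles
  --ec2-attributes "KeyName=${KEY_NAME}")

# Print the command by default; only execute it when RUN=1.
if [[ "${RUN:-0}" == "1" ]]; then
  "${cmd[@]}"
else
  printf '%s\n' "${cmd[*]}"
fi
```

An instance count of 3 gives one master node running the Presto coordinator and two core nodes running Presto workers.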

Configuring external connectors with Presto on EMR

To configure an external connector supported by Presto, we create its configuration file in the following directory:
/etc/presto/conf/catalog/

  • First we navigate to the directory: /etc/presto/conf/catalog/
  • Then we create a new file named “mysql.properties”
  • Inside “mysql.properties”, we add the MySQL connector settings: connector.name=mysql, plus the connection-url, connection-user and connection-password for our MySQL server.
  • We then copy this file to the same location on each worker node.
  • Once the file is copied, we restart the Presto service on every node of the cluster, coordinator included, by running:
    sudo systemctl stop presto-server
    sudo systemctl start presto-server
  • We can now start the Presto CLI and query MySQL through Presto.
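The steps above can be scripted from the master node roughly as follows. This is a sketch under assumptions: the MySQL URL and credentials are placeholders, WORKERS is a hypothetical space-separated list of worker hostnames, and it assumes the master can SSH to the workers as the default EMR user hadoop (for example via SSH agent forwarding). The master (coordinator) still needs its own copy in /etc/presto/conf/catalog/ and a restart. With WORKERS left empty, the script only writes mysql.properties to the current directory.

```bash
#!/usr/bin/env bash
# Sketch: stage a MySQL catalog file and, optionally, push it to the workers.
set -euo pipefail

# Placeholders (assumptions): replace with your own MySQL endpoint and credentials.
MYSQL_URL="${MYSQL_URL:-jdbc:mysql://mysql.example.com:3306}"
MYSQL_USER="${MYSQL_USER:-presto}"
MYSQL_PASSWORD="${MYSQL_PASSWORD:-change-me}"

# Hypothetical space-separated list of worker hostnames; empty means "stage only".
WORKERS="${WORKERS:-}"

# The catalog name seen in Presto ("mysql") comes from the file name.
cat > mysql.properties <<EOF
connector.name=mysql
connection-url=${MYSQL_URL}
connection-user=${MYSQL_USER}
connection-password=${MYSQL_PASSWORD}
EOF

# For each worker: copy the file, move it into the catalog directory, restart Presto.
for host in $WORKERS; do
  scp mysql.properties "hadoop@${host}:/tmp/mysql.properties"
  ssh "hadoop@${host}" 'sudo mv /tmp/mysql.properties /etc/presto/conf/catalog/ &&
    sudo systemctl stop presto-server && sudo systemctl start presto-server'
done
```

After the restart, presto-cli --catalog mysql --schema <database> opens the CLI against the new catalog, where SHOW TABLES; lists the tables of that MySQL database.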

BigData & Cloud Practice

Abzooba is an AI and data company. The BD&C Practice is one of the fastest-growing groups in Abzooba, helping several Fortune 500 clients in their cognitive journey.