Setting up (permissions for) Elasticsearch Service in AWS

Recently, I moved one of my projects from CloudSearch (CS) to Elasticsearch Service (ES), which provides roughly the same service. For me, the rationale for moving from CS to ES was lowering costs: CS is approximately three times more expensive than ES. You pay for CS and ES ‘by the hour’, and if you access them a lot, you’ll basically pay for all hours every month. To give you an idea: CS set me back around AU$54 a month (AU$0.075/hr), whereas ES will cost as little as AU$17 a month (AU$0.024/hr). As a bonus: ES can be used in the Free Tier program, which CS doesn’t allow.

I used CS for two purposes: to find documents based on search queries, and to make suggestions for search terms while people are typing (like Google does all the time). For ES to be useful, you have to index documents in it, which I solved by creating a lambda that updates ES and is triggered by updates in DynamoDB.

We’ll set up an Elasticsearch domain, and create an API plus lambda to act as a proxy for it. This way, you can use “nice” URLs instead of the horrific AWS URLs.
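To give you an idea of the difference, a raw Elasticsearch Service endpoint looks something like this (both domains below are hypothetical):

https://search-es-coffeebean-ninja-abc123xyz.us-east-1.es.amazonaws.com/_search?q=coffee

whereas the proxied version can look like this:

https://api.coffeebean.ninja/search?q=coffee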

Elasticsearch works with domains, so the first action you have to perform is to create one. In AWS, go to Services and click Elasticsearch Service. (Top tip: you can sort AWS services by name, which I find very helpful.)

Sorting AWS services by Group or Alphabetically

In the ES dashboard, click on ‘Create a new domain’. Type in a domain name, and select a version (5.1 is the most recent one at the time of writing). In the next screen, you can select more advanced options. I’ve kept all default values, especially since that way I qualify for the Free Tier plan for ES. Click Next, then select an Access Policy. This is where things start to get interesting. (I’m writing this post because I spent quite some time figuring out the permissions, and I want to share my experience so no one else has to go through that.)

You have to choose a policy in order to create your domain, but we will revisit this later on, so feel free to choose any template and click Next. Creating an ES domain takes a few minutes (up to 10 minutes, according to Amazon).

I created two Serverless (the framework, not the concept) services that will act as proxies for ES; one for the actual searching, and one for updating (indexing) documents. Here’s a small part of the serverless.yml configuration file regarding these services:

functions:
  elasticsearchUpdate:
    handler: elasticsearch.update
    events:
      - stream: <dynamodb-stream-1>
      - stream: <dynamodb-stream-2>
  searchProxy:
    handler: search.handler
    events:
      - http:
          path: search
          method: get
          cors: true

The elasticsearchUpdate service updates the ES domain with documents originating from DynamoDB table updates. (Those streams are set up in DynamoDB. In short: select a table, click on ‘Manage Stream’ in Overview, select ‘New image’, and finally click ‘Enable’. Use the stream’s ARN to update the appropriate fields in the serverless configuration.)

The searchProxy service will act as the actual proxy to the outside world. (The http event in the configuration creates an API Gateway endpoint /search that allows the GET method and calls the function handler in the file search.js.)

API Gateway becomes very useful if you add a custom domain. You’ll need to create an SSL certificate for your subdomain to do so, but that is easy and free if you use ACM (AWS Certificate Manager). It falls outside the scope of this post, but it’s fairly straightforward to achieve and definitely worth the effort.

The lambda that proxies the search requests is fairly simple:

'use strict';
const fetch = require('node-fetch');

const ES_SERVICE_ENDPOINT = `https://${process.env.ES_SERVICE_ENDPOINT}`;

// Convenience function: wraps (error, data) in a Lambda proxy response
const done = (cb) => (error, data) => {
    cb(null, {
        statusCode: error ? 400 : 200,
        body: JSON.stringify(error ? error.message : data),
        headers: {
            'Content-Type': 'application/json',
            'Access-Control-Allow-Origin': '*',
            'Access-Control-Allow-Credentials': true,
        },
    });
};

module.exports.handler = (event, context, callback) => {
    console.log(JSON.stringify(event, null, 3));

    // Get the search query parameter (queryStringParameters is null
    // when the request has no query string at all)
    const q = event.queryStringParameters && event.queryStringParameters.q;

    // Return an empty result set if the query is missing
    if (!q) {
        return done(callback)(null, {hits: {total: 0, hits: []}});
    }

    // Call the Elasticsearch Service endpoint
    fetch(`${ES_SERVICE_ENDPOINT}/_search?q=${encodeURIComponent(q)}`)
        .then(response => response.json())
        .then(json => done(callback)(null, {json}))
        .catch(err => {
            console.log(`${err.code}: ${err.message}`);
            return done(callback)(err);
        });
};

Because I use API Gateway in ‘Lambda proxy integration’ mode, I use a convenience function done, which makes sure we send something useful back. The actual handler retrieves the query parameter q, builds the ES search URL, and calls fetch. Upon receiving the response, we simply return the response’s JSON to the original requester.
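Once deployed, you can try the endpoint with curl (the API Gateway URL below is a made-up example; Serverless prints your real one after deployment):

curl 'https://abc123.execute-api.us-east-1.amazonaws.com/dev/search?q=coffee'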

The ES proxy for updating (indexing) documents is a bit more complicated, which is why I won’t reproduce it here. However, you can find it as a Gist at this location: https://gist.github.com/Sandyman/6bd342264b79838ae38958a7e08283b7.

It’s fairly straightforward, and with the help of the comments you should be able to figure it out. Please note that the lambda handles two streams from two different tables.
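The gist of it: a lambda triggered by DynamoDB streams receives an event with a Records array, and each record carries the ARN of the stream it came from, so you can tell the two tables apart. Here’s a minimal sketch of that dispatching (the table names are made up; the real indexing logic is in the Gist):

'use strict';

module.exports.update = (event, context, callback) => {
    event.Records.forEach(record => {
        // eventSourceARN identifies the stream (and thus the table) a record came from
        const fromTable1 = record.eventSourceARN.includes(':table/Table1/');

        // With the 'New image' stream view, insert/update records carry the
        // full item in DynamoDB's attribute-value format
        const image = record.dynamodb.NewImage;
        console.log(fromTable1 ? 'Table1' : 'Table2', record.eventName, JSON.stringify(image));

        // ...convert the image to a plain document and index it in ES...
    });
    callback(null, `Processed ${event.Records.length} records`);
};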

The lambda uses a package called aws-es to do the actual heavy lifting. Add it to your project using npm or yarn:

npm i aws-es -S

or:

yarn add aws-es
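For completeness, here’s roughly what indexing a document with aws-es looks like. This is a sketch based on the package’s README (do check it for the exact options); the index and type names are placeholders, and the credentials belong to the es-user we’ll create below:

'use strict';
const AWSES = require('aws-es');

// Credentials of the (otherwise permissionless) IAM user created below
const elasticsearch = new AWSES({
    accessKeyId: process.env.ACCESS_KEY_ID,
    secretAccessKey: process.env.SECRET_ACCESS_KEY,
    service: 'es',
    region: 'us-east-1',
    host: process.env.ES_SERVICE_ENDPOINT,
});

// Index (or overwrite) a single document
elasticsearch.index({
    index: 'posts',              // placeholder index name
    type: 'post',                // placeholder type name
    id: 'some-document-id',
    body: {title: 'Hello', text: 'World'},
}, (err, data) => {
    if (err) console.log(err);
    else console.log(data);
});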

There’s one thing left to do, which we kind of ignored when we created the domain: setting up permissions for ES. The search proxy permissions aren’t the problem: they can be tackled by adding a policy statement that allows es:ESHttpGet on the domain:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "*"
            },
            "Action": "es:ESHttpGet",
            "Resource": "arn:aws:es:us-east-1:<account-id>:domain/es-coffeebean-ninja/*"
        }
    ]
}

Click ‘Modify access policy’ in ES to change this. (You might want to write your JSON in a separate editor, because ES shows an error popup every time you make a mistake, which makes the built-in editor quite useless.) Click Submit when you’re done.

Setting up permissions for indexing documents was the really frustrating part. It took me a couple of days to figure out how to get it to play nicely, and it still feels a bit like a hack. But hey, it works!

The first thing you need to do is create a user in IAM who has no permissions whatsoever. None. Go to Users, select ‘Add user’, enter a name (e.g., es-user) and select ‘Programmatic access’. Click ‘Next: Permissions’ and, without adding any permissions, click ‘Next: Review’. Ignore the warning, and click ‘Create user’.

On the next screen, you’ll be shown the user’s Access Key and Secret Access Key. Make sure to copy them and save them securely; you’ll need them later on. (Please note that it is not possible to retrieve these credentials later. If you lose them, you’ll have to create new ones.)

Now, copy the user’s ARN, which can be found on the user’s summary page and return to ES. Once again, select ‘Modify access policy’ for your domain and add the following section in the Statement array:

{
    "Effect": "Allow",
    "Principal": {
        "AWS": "arn:aws:iam::<account-id>:user/es-user"
    },
    "Action": "es:*",
    "Resource": "arn:aws:es:us-east-1:<account-id>:domain/es-coffeebean-ninja/*"
},

It provides the user es-user with full access to the ES domain (as indicated by es:*).
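For reference, the complete access policy now contains both statements and looks something like this:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "*"
            },
            "Action": "es:ESHttpGet",
            "Resource": "arn:aws:es:us-east-1:<account-id>:domain/es-coffeebean-ninja/*"
        },
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<account-id>:user/es-user"
            },
            "Action": "es:*",
            "Resource": "arn:aws:es:us-east-1:<account-id>:domain/es-coffeebean-ninja/*"
        }
    ]
}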

Finally, make sure your lambda uses the Access Key and Secret Access Key of this user for authentication. I solved this by adding environment variables in the serverless configuration file, which can be accessed in the lambda function using process.env.SOME_ENV_VARIABLE. You’ll see how this works if you look at the Gist again.
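In serverless.yml that looks something like this (the variable names are my own choice, apart from ES_SERVICE_ENDPOINT, which the search proxy above reads):

provider:
  environment:
    ES_SERVICE_ENDPOINT: <your-es-domain-endpoint>
    ACCESS_KEY_ID: <access-key-of-es-user>
    SECRET_ACCESS_KEY: <secret-access-key-of-es-user>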

That’s it! You have successfully set up a convenient way to access your Elasticsearch Service. Try it, and hopefully with the help of this post you’ll get it working a bit quicker than I did! And if you’re wondering why I left out the second use case (suggestions): I haven’t put in the effort to get that working yet. Maybe I’ll write another post about it when it’s done.