Dummies Guide to Deploying a Custom Pytorch Model on AWS Sagemaker

Harsh Vardhan Solanki
9 min read · Dec 28, 2020


If you are offended by the 'Dummies' in the title, this post is not for you. In this post, I'll walk you through every minute step of deploying a machine learning model that was built and trained locally with a custom algorithm, as a REST API using Sagemaker and Docker. I'm writing this because there are a lot of moving parts in the code that you need to understand, and most of the references I came across skip them, treating them as too basic to mention. If, like me, you want to understand the nitty-gritty of what you are doing, you may find those "easy-enough" parts challenging.

Sagemaker is a fully managed machine learning service. It lets you build models with built-in algorithms, and it has native support for bring-your-own-algorithm containers and ML frameworks such as Apache MXNet, PyTorch, SparkML, TensorFlow and scikit-learn.

The post is divided into the following parts:

  1. Changing the structure of your trained model’s directory to accommodate new files
  2. Refactoring server and inference code
  3. Sagemaker endpoint and model creation
  4. Invoking the model from Sagemaker notebook or lambda (Model serving and hosting)

If you are like me and have been googling this, you have probably come across two approaches so far:

One refactors model_fn, input_fn, output_fn and predict_fn to serve the model. The other refactors inference.py and the server files for deployment. Either can fulfil your requirement, but I found the latter much easier, at least for my purpose.

In my case, I had built and trained the model locally with the following directory structure:

Everything below prefixed with (I) is something you need to pay attention to; everything else is specific to my model.

  • (I) text_to_audio.py: the main file that generates my model’s output, i.e. given a text string, it generates an audio file (building that model is out of scope for this post and will be covered in another one)
  • (I) Dockerfile: describes how to build your Docker container image.
  • Dependencies: My model’s dependencies
  • docker_compose_development.yml and docker_compose.yml: With Compose, you use a YAML file to configure your application’s services. Then, with a single command, you create and start all the services from your configuration
  • docker_model.sh: Download pretrained models
  • entrypoint.sh: model specific
  • models: directory containing trained model
  • pipelines: directory containing dependencies for creating a pipeline for my model

Step 1: Changing the structure of your trained model’s directory to accommodate new files

So, as you might have noticed, you only need to be concerned with the script file that generates your model’s output and its Dockerfile.

Now, we will start by copying a few files that will help us create a Flask-based REST endpoint for deploying your pre-trained model. You can copy the 4 files we will need from the link below:

Once done, your directory will look something like this:

We have copied the following files: nginx.conf, serve, wsgi.py and predictor.py

The files that we’ll put in the container are:

  • nginx.conf is the configuration file for the nginx front-end. Generally, you should be able to take this file as-is.
  • predictor.py is the program that actually implements the Flask web server and your model's output for this app. You'll want to customize the actual prediction parts for your application.
  • serve is the program which gets started when the container is started for hosting. It simply launches the gunicorn server which runs multiple instances of the Flask app defined in predictor.py.
  • wsgi.py is a small wrapper used to invoke the Flask app. You should be able to take this file as-is.

In summary, the two files you will probably want to change for your application are serve and predictor.py.
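For reference, wsgi.py is only a couple of lines; a minimal version, assuming the Flask app object in predictor.py is called app, looks like this:

# wsgi.py - a thin wrapper so gunicorn can find the Flask app defined in predictor.py
import predictor as myapp

# The serve script points gunicorn at this module's "app" object
app = myapp.app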

Step 2: Refactoring server and inference code

When an endpoint is invoked, Sagemaker interacts with the Docker container, which runs the inference code for hosting, processes the request and returns the response. The container needs to implement a web server that responds to /invocations and /ping on port 8080.

The container receives GET requests on /ping from the Sagemaker infrastructure and should respond with an HTTP 200 status code and an empty body, which tells Sagemaker that the container is ready to accept inference requests at the /invocations endpoint.

/invocations is the endpoint that receives POST requests and responds according to the model’s code.

Following is the code in my output generation script (text_to_audio.py):
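The full script is model-specific and not reproduced here; the skeleton below only sketches its command-line interface, with placeholder argument names and a placeholder synthesize function (these are illustrative, not my actual code):

# text_to_audio.py - skeleton of the output-generation script (illustrative only)
import argparse

def synthesize(text, output_path):
    # Placeholder: load the trained model, generate audio from the text
    # and write the resulting file to output_path
    ...

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--text", required=True, help="input text to convert to audio")
    parser.add_argument("--output", required=True, help="path where the audio file is saved")
    args = parser.parse_args()
    synthesize(args.text, args.output)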

As you can see, our model expects a text string and an output path where the generated audio is saved. The same can be accommodated in our predictor.py script, shown below:

We use subprocess.Popen to take the input text and output path from the JSON object and pass them to text_to_audio.py.
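A minimal sketch of what that predictor.py does (the JSON field names and the command-line flags below are assumptions for illustration, not the exact code) is:

# predictor.py - minimal Flask app implementing the two routes Sagemaker expects
# (sketch; the "text"/"output_path" field names and flags are illustrative)
import json
import subprocess

import flask

app = flask.Flask(__name__)

@app.route("/ping", methods=["GET"])
def ping():
    # Health check: respond with HTTP 200 and an empty body when the container is ready
    return flask.Response(response="", status=200, mimetype="application/json")

@app.route("/invocations", methods=["POST"])
def invocations():
    # Parse the JSON payload sent to the endpoint
    payload = json.loads(flask.request.data.decode("utf-8"))
    text = payload["text"]                # assumed field name
    output_path = payload["output_path"]  # assumed field name

    # Hand the work off to the existing inference script
    proc = subprocess.Popen(["python", "text_to_audio.py", "--text", text, "--output", output_path])
    proc.wait()

    result = json.dumps({"output_path": output_path})
    return flask.Response(response=result, status=200, mimetype="application/json")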

Step 3: Sagemaker endpoint, endpoint configuration and model creation

Sagemaker deploys custom models as Docker containers, whose images are stored in the Elastic Container Registry (ECR). You put your scripts, algorithms and inference code in the container, along with the runtime, system tools, libraries and anything else needed to run your model, which gives you the flexibility to run your own model.

You create Docker containers from images saved in ECR, and you build those images from the scripted instructions in a Dockerfile.

The Dockerfile describes the image that you want to build, including the complete operating system installation you want to run. You can ignore most of my Dockerfile, since it uses a GPU installation as the base image and then runs the tools needed to install my inference code’s dependencies. All you need to note here is the following:

  • You need to copy the files nginx.conf, predictor.py, serve and wsgi.py to /opt/code (as in the AWS example) or /app/ (in my model’s case) and set it as the working directory.

Another thing to notice is that everything is copied to your working directory (whether that is /app or /opt/ml) and that the entrypoint is the ‘serve’ file.
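To make that concrete, here is a stripped-down sketch of such a Dockerfile; the base image and the dependency list are placeholders (my actual file starts from a GPU base image):

# Dockerfile sketch - base image and dependencies are placeholders
FROM python:3.8

# Install the serving stack and the model's dependencies
RUN apt-get update && apt-get install -y --no-install-recommends nginx ca-certificates \
    && rm -rf /var/lib/apt/lists/*
RUN pip install --no-cache-dir flask gunicorn gevent

# Send Python output straight to the logs and put /app on the PATH
ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE
ENV PATH="/app:${PATH}"

# Copy the serving files and the inference code into the working directory
COPY nginx.conf predictor.py serve wsgi.py text_to_audio.py /app/
WORKDIR /app

# The 'serve' script is the entrypoint Sagemaker runs when hosting the model
ENTRYPOINT ["/app/serve"]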

To build a local image, use the following command.

docker build -t <image-name> .

I used nvidia-docker for the GPU image, but for CPU you can simply use docker.

Then, create a repository in AWS ECR and click on the push commands button, as highlighted in the image below. It will give you the commands required to push your Docker image to ECR:
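The generated commands follow this general pattern (region, account id and image name are placeholders):

aws ecr get-login-password --region <region> | docker login --username AWS --password-stdin <aws_account_id>.dkr.ecr.<region>.amazonaws.com
docker build -t <image-name> .
docker tag <image-name>:latest <aws_account_id>.dkr.ecr.<region>.amazonaws.com/<image-name>:latest
docker push <aws_account_id>.dkr.ecr.<region>.amazonaws.com/<image-name>:latest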

Another way of pushing is to use a script called build_and_push.sh (it can be found in the aws-sagemaker-example link above). It depends on which approach you want to use to push the image to ECR.

Before pushing the image, you need to configure your AWS CLI and log in. Just type aws configure, press enter and provide the requested information.

This tells your EC2 instance or local computer which AWS account to use for deployment and storage.

Sagemaker model creation: Go to Inference > Models

It requires two mandatory things:

Model name: Provide whatever you wish to name your model

Location of inference code image: this is the image URI generated in the previous section, in the format shown in the cell (aws_account_id.dkr……..)

Endpoint Configuration: Sagemaker > Inference > Endpoint Configuration

It requires the following mandatory things:

  • Endpoint configuration name
  • Add model (it will show your previously generated model in the above section)

Endpoint: Sagemaker > Inference > Endpoint

Once you are done creating a model and an endpoint configuration, it is time to create an endpoint.

It requires the following mandatory things:

  • Endpoint name
  • Attach endpoint configuration (use the above created configuration from the dropdown)
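If you prefer scripting over the console, the same three steps can be done with boto3; the names, image URI, role ARN and instance type in this sketch are placeholders, not values from this post:

import boto3

sm = boto3.client("sagemaker")

# 1. Model: points at the inference image pushed to ECR
sm.create_model(
    ModelName="text-to-audio-model",
    PrimaryContainer={"Image": "<aws_account_id>.dkr.ecr.<region>.amazonaws.com/<image-name>:latest"},
    ExecutionRoleArn="arn:aws:iam::<aws_account_id>:role/<sagemaker-execution-role>",
)

# 2. Endpoint configuration: attaches the model to an instance type
sm.create_endpoint_config(
    EndpointConfigName="text-to-audio-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "text-to-audio-model",
        "InstanceType": "ml.p2.xlarge",   # pick a GPU or CPU type that fits your model
        "InitialInstanceCount": 1,
    }],
)

# 3. Endpoint: deploys the configuration (takes a few minutes to become InService)
sm.create_endpoint(
    EndpointName="text-to-audio-endpoint",
    EndpointConfigName="text-to-audio-config",
)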

Step 4: Testing locally before deploying

This is one of the most important things that I have seen other posts miss. I had to change a lot of things in my code over time to make it work, and it is neither economical nor efficient to rebuild the Docker image, recreate the model and recreate the endpoint every time just to test it via Sagemaker or a lambda function.

So, before going there, we will test everything locally (in my case, on an EC2 instance) to see if the code actually works.

The question any beginner might ask is: how do we do that? There might be many ways, but this is how I did it:

  1. I created a .py test file to send the payload to my Flask application (a sketch of this file appears after these steps).

Here, I provide the path of the above-mentioned input.json, load it as a dictionary and then send its contents as the JSON payload of a POST request.

2. Now, you can run the Docker image you created above with the following command:

docker run -p 8080:8080 -it <docker_image>

3. Once the container is running, you can simply type the following command to run the script we created in the first step:

python ~/predict_test.py

If the script prints a 200 status code, you are good to go: the model is loading correctly and generating its prediction/output as intended.
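As mentioned in the first step, here is a minimal sketch of that predict_test.py test file; the payload field names and file paths are assumptions:

# predict_test.py - send the contents of input.json to the locally running container
import json

import requests

# Load the input JSON and post it to the /invocations route served on port 8080
with open("input.json") as f:
    payload = json.load(f)

response = requests.post(
    "http://localhost:8080/invocations",
    data=json.dumps(payload),
    headers={"Content-Type": "application/json"},
)

# A 200 status code means the container loaded the model and produced its output
print(response.status_code)
print(response.text)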

Here’s an example of what it will output if successful:

However, you might encounter a few other errors, which can usually be resolved by googling them on Stack Overflow. A few of the errors I encountered are as follows:

  • I chose a CPU-based instance during model creation instead of a GPU one (which my model requires). This gave me an error stating that the server was overloaded.
  • I had to install a few dependencies in my Docker image, like ca-certificates, gevent, nginx and so on.

Step 5: Invoking the model from a Sagemaker notebook or a lambda function (Model serving and hosting)

I will be passing the following text as input to the predictor.py file. It is in JSON format, in a file named input.json:
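Its shape (the field names here are the same assumed ones used throughout this post, not necessarily the exact keys in my file) is something like:

{
  "text": "Hello world, this is a test of my text to audio model.",
  "output_path": "output/sample.wav"
}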

I will now load this input.json in my Sagemaker notebook instance and use boto3 to call the endpoint:
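A minimal version of that notebook cell (the endpoint name is a placeholder) looks like this:

import boto3

# Read the same input.json and send it to the deployed endpoint
with open("input.json") as f:
    payload = f.read()

runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName="text-to-audio-endpoint",   # the endpoint created in Step 3
    ContentType="application/json",
    Body=payload,
)

print(response["Body"].read().decode("utf-8"))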

Another way to call our endpoint and check if it is working is by using a lambda function in the following manner:
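A sketch of such a Lambda function (reading the endpoint name from an environment variable is my assumption here):

import json
import os

import boto3

runtime = boto3.client("sagemaker-runtime")

def lambda_handler(event, context):
    # Forward the incoming event to the Sagemaker endpoint
    response = runtime.invoke_endpoint(
        EndpointName=os.environ["ENDPOINT_NAME"],
        ContentType="application/json",
        Body=json.dumps(event),
    )
    result = response["Body"].read().decode("utf-8")
    return {"statusCode": 200, "body": result}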

Using the above approach, I was able to create a REST endpoint that calls my custom-built model inside Sagemaker and saves the audio output to an S3 bucket, as shown in the image below:

s3 output with audio files generated from the provided text

References:

https://www.cellstrat.com/2020/09/27/practical-guide-to-deploy-ml-models-in-aws-sagemaker/

http://francescopochetti.com/fast-neural-style-transfer-sagemaker-deployment/

https://techblog.realtor.com/pytorch-aws-sagemaker/

https://github.com/edkrueger/spam-detector
