
Use a generative AI foundation model in VPC mode with no internet connectivity using Amazon SageMaker JumpStart
With recent advances in generative AI, there is a lot of discussion about how to use generative AI across industries to solve specific business problems. Generative AI is a type of AI that can create new content and ideas, including conversations, stories, images, videos, and music. It is powered by very large models that are pretrained on vast amounts of data and are commonly referred to as foundation models (FMs). FMs can perform a variety of tasks spanning multiple domains, such as writing blog posts, generating images, solving math problems, engaging in dialogue, and answering questions based on documents. The size and general-purpose nature of FMs make them different from traditional ML models, which typically perform specific tasks, such as analyzing text for sentiment, classifying images, and forecasting trends.
While organizations want to use the power of FMs, they also want FM-based solutions to run in their own protected environments. Organizations operating in heavily regulated spaces, such as global financial services and healthcare and life sciences, have audit and compliance requirements to run their environments in their own VPCs. In fact, direct internet access is often disabled in these environments to avoid exposure to unintended traffic, both ingress and egress.
Amazon SageMaker JumpStart is an ML hub that offers algorithms, models, and ML solutions. With SageMaker JumpStart, ML practitioners can choose from a growing list of best-performing open-source FMs. It also provides the ability to deploy these models in your own Virtual Private Cloud (VPC).
In this post, we demonstrate how to use JumpStart to deploy a Flan-T5 XXL model in a VPC with no internet connectivity. We cover the following topics:
- How to deploy a foundation model using SageMaker JumpStart in a VPC with no internet access
- Advantages of deploying FMs via SageMaker JumpStart models in VPC mode
- Alternative ways to customize the deployment of foundation models via JumpStart
In addition to Flan-T5 XXL, JumpStart provides many different foundation models for various tasks. For the complete list, see Get Started with Amazon SageMaker JumpStart.
Solution overview
As part of the solution, we cover the following steps:
- Set up a VPC without an internet connection.
- Set up Amazon SageMaker Studio using the VPC we built.
- Deploy the Flan-T5 XXL generative AI foundation model using JumpStart in the VPC with no internet access.
Here is an architectural diagram of the solution.
Let’s walk through the different steps to implement this solution.
Prerequisites
To follow this post, you will need the following:
Set up a VPC without an internet connection
Create a new CloudFormation stack using the 01_network.yaml template. This template creates a new VPC and adds two private subnets across two Availability Zones with no internet connectivity. It then deploys a gateway VPC endpoint for accessing Amazon Simple Storage Service (Amazon S3) and interface VPC endpoints for SageMaker and a few other services, which let resources in the VPC connect to those AWS services through AWS PrivateLink.
Give the stack a name, such as No-Internet, and complete the stack creation process.
This solution is not highly available, because the CloudFormation template creates the VPC interface endpoints in only one subnet to reduce costs while you follow the steps in this post.
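The exact endpoint list in the networking template isn't reproduced in this post. As a rough illustration, the following Python sketch derives the service names of endpoints commonly used by SageMaker in a private VPC. The INTERFACE_SERVICES list and the endpoint_service_names() helper are our own assumptions rather than the contents of 01_network.yaml, so compare them against the template before relying on them.

```python
# Illustrative sketch: the endpoint list below is an assumption and may not
# match the endpoints actually created by 01_network.yaml.
from typing import Dict, List

# Interface endpoints (AWS PrivateLink) assumed for SageMaker in a private VPC.
INTERFACE_SERVICES: List[str] = [
    "sagemaker.api",      # SageMaker control plane (CreateModel, CreateEndpoint, and so on)
    "sagemaker.runtime",  # InvokeEndpoint calls
    "sts",                # AWS STS, used by SDKs for credentials and identity
    "logs",               # Amazon CloudWatch Logs
]
# Amazon S3 is reached through a gateway endpoint (a route-table entry, not an ENI).
GATEWAY_SERVICES: List[str] = ["s3"]


def endpoint_service_names(region: str) -> Dict[str, List[str]]:
    """Return fully qualified endpoint service names, grouped by endpoint type."""
    prefix = f"com.amazonaws.{region}"
    return {
        "Interface": [f"{prefix}.{svc}" for svc in INTERFACE_SERVICES],
        "Gateway": [f"{prefix}.{svc}" for svc in GATEWAY_SERVICES],
    }


if __name__ == "__main__":
    for kind, names in endpoint_service_names("eu-west-1").items():
        print(kind, names)
```

Keeping S3 as a gateway endpoint is the usual choice because it adds no hourly cost, which matches the cost-saving intent of the template.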
Set up Studio using the VPC
Create another CloudFormation stack using 02_sagemaker_studio.yaml, which creates a Studio domain, Studio user profiles, and supporting resources such as IAM roles. Choose a name for the stack; for this post, we use the name SageMaker-Studio-VPC-No-Internet. Provide the name of the VPC stack you created earlier (No-Internet) as the CoreNetworkingStackName parameter and leave everything else as default.
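If you prefer to create the Studio stack programmatically instead of in the console, the request can be sketched as follows. The studio_stack_request() helper is hypothetical, and the CAPABILITY_NAMED_IAM acknowledgement is an assumption based on the template creating IAM roles. The returned dictionary is shaped for the Boto3 CloudFormation create_stack() call.

```python
# Hypothetical helper: builds create_stack() arguments for the Studio stack.
from typing import Any, Dict


def studio_stack_request(
    template_body: str,
    stack_name: str = "SageMaker-Studio-VPC-No-Internet",
    networking_stack: str = "No-Internet",
) -> Dict[str, Any]:
    """Return keyword arguments for boto3's CloudFormation create_stack()."""
    return {
        "StackName": stack_name,
        "TemplateBody": template_body,
        # Only CoreNetworkingStackName is overridden; every other parameter keeps its default.
        "Parameters": [
            {"ParameterKey": "CoreNetworkingStackName", "ParameterValue": networking_stack},
        ],
        # The template creates IAM roles, so CloudFormation needs an explicit acknowledgement.
        # CAPABILITY_NAMED_IAM is an assumption; it also covers roles without custom names.
        "Capabilities": ["CAPABILITY_NAMED_IAM"],
    }


# Usage (needs AWS credentials, so it is left commented out):
#   import boto3
#   from pathlib import Path
#   body = Path("02_sagemaker_studio.yaml").read_text()
#   boto3.client("cloudformation").create_stack(**studio_stack_request(body))
```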
Wait for AWS CloudFormation to report that stack creation is complete. You can confirm that the Studio domain is available for use in the SageMaker console.
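To wait for completion from a script rather than the console, you can poll the stack status. This is a minimal sketch: wait_for_stack() is a hypothetical helper that accepts any object exposing the Boto3 CloudFormation describe_stacks() method, which keeps it testable without AWS credentials. Boto3 also ships a built-in waiter, get_waiter("stack_create_complete"), that does the same job.

```python
import time
from typing import Any

# Statuses from which a create operation will never reach CREATE_COMPLETE.
FAILED_STATUSES = {"CREATE_FAILED", "ROLLBACK_IN_PROGRESS", "ROLLBACK_COMPLETE", "ROLLBACK_FAILED"}


def wait_for_stack(cfn: Any, stack_name: str, poll_seconds: float = 30.0, max_polls: int = 120) -> str:
    """Poll describe_stacks() until the stack is CREATE_COMPLETE, failing fast on rollback."""
    for _ in range(max_polls):
        status = cfn.describe_stacks(StackName=stack_name)["Stacks"][0]["StackStatus"]
        if status == "CREATE_COMPLETE":
            return status
        if status in FAILED_STATUSES:
            raise RuntimeError(f"stack {stack_name} ended in {status}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"stack {stack_name} not complete after {max_polls} polls")


# Usage (needs AWS credentials):
#   import boto3
#   wait_for_stack(boto3.client("cloudformation"), "SageMaker-Studio-VPC-No-Internet")
```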
To verify that the Studio domain user has no internet access, launch Studio from the SageMaker console. Choose File, New, and Terminal, then attempt to access an internet resource. As shown in the following screenshot, the terminal keeps waiting for the resource and eventually times out.
This proves that Studio is operating in a VPC that doesn’t have internet access.
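The same check can be scripted from the Studio terminal or a notebook. The sketch below uses a hypothetical can_reach() helper, with example.com as an arbitrary target, to attempt a TCP connection and report whether it succeeded. In this VPC you should expect it to return False after the timeout.

```python
import socket
from typing import Any, Callable


def can_reach(
    host: str = "example.com",
    port: int = 443,
    timeout: float = 5.0,
    connect: Callable[..., Any] = socket.create_connection,
) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError:
        # DNS failures, refused connections, and timeouts are all OSError subclasses.
        return False
    conn.close()
    return True


if __name__ == "__main__":
    print("internet reachable:", can_reach())
```

The connect parameter exists only so the function can be exercised without a network; in normal use you leave it at its default.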
Deploy the Flan-T5 XXL generative AI foundation model using JumpStart
You can deploy this model from Studio as well as via the API. JumpStart provides all the code to deploy the model via a SageMaker notebook accessible from within Studio. For this post, we showcase this capability from Studio.
- On the Studio welcome page, choose JumpStart under Prebuilt and automated solutions.
- Choose the Flan-T5 XXL model under Foundation Models.
- By default, it opens the Deploy tab. Expand the Deployment Configuration section to change the hosting instance and endpoint name, or to add any additional tags. There is also an option to change the S3 bucket location where the model artifacts are stored for creating the endpoint. For this post, we leave everything at its default value. Make a note of the endpoint name; you use it to invoke the endpoint for predictions.
- Expand the Security Settings section, where you can specify the IAM role for creating the endpoint. You can also specify the VPC configuration by providing the subnets and security groups. The subnet IDs and security group IDs can be found on the Outputs tab of the VPC stack in the AWS CloudFormation console. SageMaker JumpStart requires at least two subnets as part of this configuration. The subnets and security groups control access to and from the model container.
Note: Regardless of whether the SageMaker JumpStart model is deployed in a VPC or not, it always runs in network isolation mode, which isolates the model container so that no inbound or outbound network calls can be made to or from it. Because we are using a VPC, SageMaker downloads the model artifacts through the VPC we specified. Running the model container in network isolation doesn't prevent your SageMaker endpoint from responding to inference requests: a server process runs alongside the model container and forwards inference requests to it, but the model container itself has no network access.
- Choose Deploy to deploy the model. You can see the near-real-time status of the endpoint creation. Endpoint creation can take 5–10 minutes to complete.
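For reference, the VPC settings entered in the Security Settings section correspond to fields of the SageMaker CreateModel API. The following sketch assumes you call that API directly with Boto3; build_vpc_config() and create_model_request() are hypothetical helpers, and the image URI, role ARN, and IDs are placeholders. It enforces the two-subnet minimum stated above and sets EnableNetworkIsolation, mirroring the network isolation behavior described in the note.

```python
# Hypothetical helpers; all IDs, ARNs, and URIs are placeholders.
from typing import Any, Dict, List, Sequence


def build_vpc_config(subnet_ids: Sequence[str], security_group_ids: Sequence[str]) -> Dict[str, List[str]]:
    """Return a VpcConfig dict, enforcing the two-subnet minimum JumpStart requires."""
    subnets = list(dict.fromkeys(subnet_ids))  # de-duplicate while keeping order
    if len(subnets) < 2:
        raise ValueError("SageMaker JumpStart requires at least two subnets")
    if not security_group_ids:
        raise ValueError("at least one security group ID is required")
    return {"Subnets": subnets, "SecurityGroupIds": list(security_group_ids)}


def create_model_request(
    model_name: str,
    image_uri: str,
    model_data_url: str,
    role_arn: str,
    vpc_config: Dict[str, List[str]],
) -> Dict[str, Any]:
    """Return keyword arguments for boto3's SageMaker create_model()."""
    return {
        "ModelName": model_name,
        "PrimaryContainer": {"Image": image_uri, "ModelDataUrl": model_data_url},
        "ExecutionRoleArn": role_arn,
        "VpcConfig": vpc_config,
        # Blocks all inbound and outbound network calls from the model container.
        "EnableNetworkIsolation": True,
    }


# Usage (needs AWS credentials):
#   import boto3
#   vpc = build_vpc_config(["subnet-aaaa", "subnet-bbbb"], ["sg-1234"])
#   boto3.client("sagemaker").create_model(**create_model_request(
#       "flan-t5-xxl-demo", "example-image-uri", "s3://example-bucket/model.tar.gz",
#       "arn:aws:iam::123456789012:role/example-role", vpc))
```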
Observe the value of the Model data location field on this page. All SageMaker JumpStart models are hosted in a SageMaker managed S3 bucket (s3://jumpstart-cache-prod-{region}). Therefore, regardless of which model is picked from JumpStart, the model is deployed from the publicly accessible SageMaker JumpStart S3 bucket, and the traffic never goes to public model zoo APIs to download the model. Inside the VPC, this bucket is reached through the S3 gateway endpoint rather than over the internet. This is why model endpoint creation starts successfully even though we are creating the endpoint in a VPC without direct internet access.
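To confirm that an artifact location points at the JumpStart cache bucket for your Region, you can parse it. This sketch uses a hypothetical parse_jumpstart_uri() helper and a regular expression that assumes standard AWS Region names; it relies only on the s3://jumpstart-cache-prod-{region} naming described above.

```python
import re
from typing import Tuple

# Assumes standard AWS Region names such as eu-west-1 or us-gov-west-1.
_JUMPSTART_URI = re.compile(
    r"^s3://jumpstart-cache-prod-(?P<region>[a-z]{2}(?:-[a-z]+)+-\d+)/(?P<key>.+)$"
)


def jumpstart_cache_bucket(region: str) -> str:
    """Name of the SageMaker managed JumpStart bucket for a Region."""
    return f"jumpstart-cache-prod-{region}"


def parse_jumpstart_uri(uri: str) -> Tuple[str, str]:
    """Split a JumpStart cache URI into (region, object key); reject other locations."""
    match = _JUMPSTART_URI.match(uri)
    if match is None:
        raise ValueError(f"not a JumpStart cache location: {uri}")
    return match.group("region"), match.group("key")
```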
The model artifacts can also be copied to any private model zoo or to your own S3 bucket to further control and secure the model source location. You can use the following AWS Command Line Interface (AWS CLI) command to download the model locally:
aws s3 cp s3://jumpstart-cache-prod-eu-west-1/huggingface-infer/prepack/v1.0.2/infer-prepack-huggingface-text2text-flan-t5-xxl.tar.gz .
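The AWS CLI command above downloads the archive to your machine. If you would rather make a server-side copy into your own bucket, a sketch using the Boto3 S3 client's managed copy() method looks like the following. copy_to_private_bucket(), the destination bucket, and the jumpstart-models/ prefix are hypothetical.

```python
# Hypothetical helper; the destination bucket and prefix are placeholders.
from typing import Any


def copy_to_private_bucket(
    s3: Any, region: str, key: str, dest_bucket: str, dest_prefix: str = "jumpstart-models/"
) -> str:
    """Copy one JumpStart artifact into your own bucket (S3 to S3) and return its new URI."""
    dest_key = dest_prefix + key.rsplit("/", 1)[-1]  # keep only the file name
    s3.copy(
        CopySource={"Bucket": f"jumpstart-cache-prod-{region}", "Key": key},
        Bucket=dest_bucket,
        Key=dest_key,
    )
    return f"s3://{dest_bucket}/{dest_key}"


# Usage (needs AWS credentials and access to both buckets):
#   import boto3
#   copy_to_private_bucket(
#       boto3.client("s3"),
#       "eu-west-1",
#       "huggingface-infer/prepack/v1.0.2/infer-prepack-huggingface-text2text-flan-t5-xxl.tar.gz",
#       "my-private-model-bucket",
#   )
```

A server-side copy avoids moving a multi-gigabyte archive through your workstation, and the returned URI can then be used as the model data location.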
- After a few minutes, the endpoint is created successfully and shows the status In Service. Choose Open Notebook in the Use Endpoint from Studio section. This is a sample notebook provided as part of the JumpStart experience to quickly test the endpoint.
- In the notebook, choose the image Data Science 3.0 and the kernel Python 3. When the kernel is ready, you can run the notebook cells to make predictions on the endpoint. Note that the notebook uses the invoke_endpoint() API from the AWS SDK for Python (Boto3) to make predictions. Alternatively, you can use the SageMaker Python SDK predict() method to achieve the same result.
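Outside the sample notebook, an equivalent invoke_endpoint() call can be sketched as follows. The payload key text_inputs and the response key generated_texts reflect our understanding of the JumpStart Flan-T5 container and should be treated as assumptions; check them against the sample notebook. query_flan_t5() is a hypothetical helper that accepts a Boto3 sagemaker-runtime client.

```python
import json
from typing import Any, List


def query_flan_t5(runtime: Any, endpoint_name: str, prompt: str, max_length: int = 50) -> List[str]:
    """Send one prompt to the endpoint and return the generated texts."""
    # Assumed request schema for the JumpStart Flan-T5 container.
    payload = {"text_inputs": prompt, "max_length": max_length}
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload).encode("utf-8"),
    )
    # The response body is a stream; read it fully before decoding the JSON.
    result = json.loads(response["Body"].read())
    # Assumed response schema: a list of strings under the generated_texts key.
    return result["generated_texts"]


# Usage (run where the endpoint is reachable, such as the Studio notebook):
#   import boto3
#   runtime = boto3.client("sagemaker-runtime")
#   print(query_flan_t5(runtime, "your-endpoint-name", "Translate to German: My name is Arthur"))
```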
This concludes the steps for deploying the Flan-T5 XXL model using JumpStart in a VPC without internet access.
Advantages of deploying SageMaker JumpStart models in VPC mode
The following are some of the advantages of deploying SageMaker JumpStart models in VPC mode:
- Because SageMaker JumpStart doesn't download the models from a public model zoo, it can be used in fully locked-down environments where there is no internet access.
- Because network access can be limited and scoped down for SageMaker JumpStart models, it helps teams improve the security posture of the environment.
- Due to the VPC boundaries, access to the endpoint can also be limited via subnets and security groups, which adds an extra layer of security.
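As one illustration of the security group layer, the following sketch builds a rule that allows HTTPS only from a given CIDR range, for example the VPC's own range. Attaching it to the security group of the SageMaker runtime interface endpoint is our assumption about where such a rule is most useful. https_ingress_request() is hypothetical, and the returned dictionary is shaped for the Boto3 EC2 authorize_security_group_ingress() call.

```python
import ipaddress
from typing import Any, Dict


def https_ingress_request(security_group_id: str, allowed_cidr: str) -> Dict[str, Any]:
    """Return arguments for an HTTPS-only (TCP 443) ingress rule from one IPv4 CIDR range."""
    network = ipaddress.ip_network(allowed_cidr)  # raises ValueError on a malformed CIDR
    if network.version != 4:
        raise ValueError("this sketch only handles IPv4 ranges")
    return {
        "GroupId": security_group_id,
        "IpPermissions": [
            {
                "IpProtocol": "tcp",
                "FromPort": 443,
                "ToPort": 443,
                "IpRanges": [{"CidrIp": str(network), "Description": "HTTPS from inside the VPC"}],
            }
        ],
    }


# Usage (needs AWS credentials):
#   import boto3
#   boto3.client("ec2").authorize_security_group_ingress(
#       **https_ingress_request("sg-0123456789abcdef0", "10.0.0.0/16"))
```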
Alternative ways to customize foundation model deployment via SageMaker JumpStart
In this section, we share some alternative ways to deploy the model.
Use the SageMaker JumpStart API from your IDE of choice
The models offered by SageMaker JumpStart don't require you to access Studio. You can deploy them to SageMaker endpoints from any IDE, thanks to the JumpStart APIs. You can skip the Studio setup steps discussed earlier in this post and use the JumpStart APIs to deploy the model. These APIs provide arguments through which the VPC configuration can be supplied as well. The APIs are part of the SageMaker Python SDK itself. For more information, see Pre-trained model.
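A minimal sketch of this path with the SageMaker Python SDK's JumpStartModel class follows. The model ID huggingface-text2text-flan-t5-xxl is an assumption inferred from the artifact name shown earlier, and jumpstart_model_kwargs() and deploy_flan_t5() are hypothetical helpers. We also assume that, when no instance type or count is passed, JumpStart fills in its defaults. The SDK is imported lazily so the argument builder can be inspected without it installed.

```python
from typing import Any, Dict, Sequence

# Assumed JumpStart model ID, inferred from the artifact name
# infer-prepack-huggingface-text2text-flan-t5-xxl.tar.gz.
FLAN_T5_XXL_ID = "huggingface-text2text-flan-t5-xxl"


def jumpstart_model_kwargs(
    role_arn: str, subnet_ids: Sequence[str], security_group_ids: Sequence[str]
) -> Dict[str, Any]:
    """Arguments for JumpStartModel, including the VPC configuration."""
    return {
        "model_id": FLAN_T5_XXL_ID,
        "role": role_arn,
        "vpc_config": {"Subnets": list(subnet_ids), "SecurityGroupIds": list(security_group_ids)},
    }


def deploy_flan_t5(
    role_arn: str, subnet_ids: Sequence[str], security_group_ids: Sequence[str], endpoint_name: str
) -> Any:
    """Deploy the model into the VPC (instance type and count left to JumpStart defaults, assumed)."""
    # Imported here so the argument builder above works without the SDK installed.
    from sagemaker.jumpstart.model import JumpStartModel

    model = JumpStartModel(**jumpstart_model_kwargs(role_arn, subnet_ids, security_group_ids))
    return model.deploy(endpoint_name=endpoint_name)
```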
Use the notebook provided by SageMaker JumpStart from SageMaker Studio
SageMaker JumpStart also provides notebooks to deploy the model directly. On the model detail page, choose Open notebook to open a sample notebook containing the code to deploy the endpoint. The notebook uses the SageMaker JumpStart Industry API, which allows you to list and filter the models, retrieve the artifacts, and deploy and query the endpoints. You can also edit the notebook code to suit the specific requirements of your use case.
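When exploring the catalog from code, it can help to narrow the model list by task. The sketch below assumes JumpStart model IDs follow a framework-task-name pattern (as in huggingface-text2text-flan-t5-xxl). filter_model_ids() is a hypothetical helper, and list_jumpstart_models() from sagemaker.jumpstart.notebook_utils appears only in the commented usage line.

```python
from typing import Iterable, List


def filter_model_ids(model_ids: Iterable[str], task: str, keyword: str = "") -> List[str]:
    """Keep IDs shaped like framework-task-name whose task segment matches `task`."""
    selected = []
    for model_id in model_ids:
        # Split into at most three parts: framework, task, and the rest of the name.
        parts = model_id.split("-", 2)
        if len(parts) == 3 and parts[1] == task and keyword in parts[2]:
            selected.append(model_id)
    return selected


# Usage (requires the SageMaker Python SDK):
#   from sagemaker.jumpstart.notebook_utils import list_jumpstart_models
#   print(filter_model_ids(list_jumpstart_models(), task="text2text", keyword="flan"))
```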
Clean up resources
Refer to the CLEANING.md file for detailed steps to delete Studio, the VPC, and the other resources created as part of this post.
Troubleshooting
If you’re having trouble creating a CloudFormation stack, see CloudFormation Troubleshooting.
Conclusion
Generative AI powered by large language models is changing how people acquire and apply insights from information. However, organizations operating in heavily regulated spaces are required to use generative AI capabilities in a way that lets them innovate faster while also simplifying the access patterns to those capabilities.
We encourage you to try the approach in this post to embed generative AI capabilities in your existing environment while still keeping it inside your own VPC with no internet access. For further reading on SageMaker JumpStart foundation models, check out the following:
About the Authors
Vikesh Pandey is a Machine Learning Specialist Solution Architect at AWS, helping customers from the financial industry design and build solutions across generative AI and ML. Outside of work, Vikesh enjoys trying different dishes and exercising outdoors.
Mehran Nikoo is a Senior Solutions Architect at AWS, working with Digital Native businesses in the UK and helping them achieve their goals. Passionate about applying his software engineering experience to machine learning, he specializes in end-to-end machine learning and MLOps practice.