Announcing new Jupyter contributions by AWS to democratize generative AI and scale ML workloads
Project Jupyter is a multi-stakeholder open source project that builds applications, open standards, and tools for data science, machine learning (ML), and computational science. Jupyter notebooks, first released in 2011, have become the de facto standard tool used by millions of users worldwide across academic, research, and industry sectors. Jupyter enables users to work with code and data interactively, and to build and share computational narratives that provide a complete and reproducible record of their work.
Given the importance of Jupyter to data scientists and ML developers, AWS is an active sponsor of and contributor to Project Jupyter. Our goal is to work within the open source community to help Jupyter become the best notebook platform for data science and ML. AWS is a platinum sponsor of Project Jupyter through the NumFOCUS Foundation, and I am proud and honored to lead a team of dedicated AWS engineers who contribute to Jupyter software and participate in Jupyter’s community and governance. Our open source contributions span the JupyterLab, Jupyter Server, and Jupyter Notebook subprojects. We are also members of the Jupyter working groups for Security and for Diversity, Equity, and Inclusion (DEI). In parallel with these open source contributions, we have AWS product teams working to integrate Jupyter with products such as Amazon SageMaker.
Today at JupyterCon, we’re excited to announce several new tools for Jupyter users to enhance their experience and boost development productivity. All of these tools are open source and can be used wherever you run Jupyter.
Introducing two generative AI extensions for Jupyter
Generative AI can significantly increase the productivity of data scientists and developers when they write code. Today, we’re announcing two Jupyter extensions that bring generative AI to Jupyter users through a chat UI, IPython magic commands, and autocompletion. These extensions allow you to perform a variety of development tasks using generative AI models in JupyterLab and Jupyter notebooks.
Jupyter AI, an open source project to bring generative AI to Jupyter notebooks
Using the power of large language models like ChatGPT, AI21’s Jurassic-2, and (coming soon) Amazon Titan, Jupyter AI is an open source project that brings generative AI features to Jupyter notebooks. For example, using a large language model, Jupyter AI can help programmers generate, debug, and explain their source code. Jupyter AI can also answer questions about local files and generate entire notebooks from a simple natural language prompt. Jupyter AI offers magic commands that work in any notebook or IPython shell, and a friendly chat UI in JupyterLab. Both experiences work with dozens of models from a variety of model providers. JupyterLab users can select any text or notebook cell, enter a natural language prompt to perform a task with that selection, then insert the AI-generated response wherever they want. Jupyter AI is integrated with Jupyter’s MIME type system, which lets you work with any input and output type Jupyter supports (text, images, and so on). Jupyter AI also provides integration points that allow third parties to configure their own models. Jupyter AI is an official open source project of Project Jupyter.
Amazon CodeWhisperer Jupyter extension
Autocompletion is foundational for developers, and generative AI can significantly enhance the code suggestion experience. That’s why we announced the general availability of Amazon CodeWhisperer earlier in 2023. CodeWhisperer is an AI coding companion that uses foundation models under the hood to drastically improve developer productivity. It works by generating code suggestions in real time based on developers’ natural language comments and the prior code in their integrated development environment (IDE).
Today, we’re excited to announce that JupyterLab users can install and use the CodeWhisperer extension for free to generate real-time, single-line, or full-function code suggestions for Python notebooks in JupyterLab and Amazon SageMaker Studio. With CodeWhisperer, you can write a natural language comment in English that describes a specific task, such as “Create a pandas dataframe using a CSV file”. Based on this comment, CodeWhisperer recommends one or more code snippets directly in the notebook that can accomplish the task. You can quickly accept the top suggestion, view more suggestions, or continue writing your own code.
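To make the comment-driven workflow concrete, here is a hand-written sketch of the kind of code such a comment might lead to. It is not actual CodeWhisperer output; the function name `load_csv` and the sample data are assumptions made for illustration, and an in-memory CSV stands in for a real file so the example runs anywhere pandas is installed.

```python
import io

import pandas as pd


# Create a pandas dataframe using a CSV file
def load_csv(source):
    """Read CSV data from a path or file-like object into a DataFrame."""
    return pd.read_csv(source)


# Hypothetical sample data; replace with a real file path in practice.
sample = io.StringIO("name,score\nada,90\ngrace,95\n")
df = load_csv(sample)
print(df.shape)
```

In a notebook, you would typically write only the comment and then accept, browse, or ignore whatever the extension suggests.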
During its preview, CodeWhisperer proved effective at generating code that speeds up coding tasks, helping developers complete tasks 57% faster on average. Additionally, developers who used CodeWhisperer were 27% more likely to complete a coding task successfully than those who didn’t. This is a huge jump in developer productivity. CodeWhisperer also includes a built-in reference tracker that detects whether a code suggestion may resemble open source training data and flags such suggestions.
Introducing a new Jupyter extension for building, training, and deploying ML at scale
Our mission at AWS is to democratize access to ML across industries. To that end, in 2017 we launched Amazon SageMaker notebook instances: fully managed compute instances running Jupyter that include all the popular data science and ML packages. In 2019, we made a significant leap forward with the launch of SageMaker Studio, an IDE for ML built on top of JupyterLab that lets you build, train, tune, debug, deploy, and monitor models from a single application. Tens of thousands of customers use Studio to empower data science teams of all sizes. In 2021, we further extended the benefits of SageMaker to the community of millions of Jupyter users by launching Amazon SageMaker Studio Lab, a free notebook service, again based on JupyterLab, that includes free compute and persistent storage.
Today, we’re excited to announce three new capabilities to help you scale your ML development faster.
In 2022, we released new capabilities that let customers run notebooks as scheduled jobs in SageMaker Studio and Studio Lab. Thanks to these capabilities, many of our customers have saved time by not having to manually set up complex cloud infrastructure to scale their ML workflows.
We are happy to announce that the notebook scheduling tool is now an open source Jupyter extension that allows JupyterLab users to run and schedule notebooks on SageMaker from anywhere JupyterLab runs. Users can select a notebook and automate it as a job that runs in a production environment via a simple yet powerful user interface. Once a notebook is selected, the tool takes a snapshot of the entire notebook, packages its dependencies in a container, builds the infrastructure, runs the notebook as an automated job on a user-defined schedule, and deprovisions the infrastructure once the job is done. This reduces the time it takes to move a notebook to production from weeks to hours.
SageMaker open source distribution
Data scientists and developers want to start developing ML applications quickly, and it can be complicated to install mutually compatible versions of all the necessary packages. To eliminate this manual work and increase productivity, we are happy to announce a new open source distribution that includes the most popular packages for ML, data science, and data visualization. This distribution includes deep learning frameworks such as PyTorch, TensorFlow, and Keras; popular Python packages such as NumPy, scikit-learn, and pandas; and IDEs such as JupyterLab and Jupyter Notebook. The distribution is versioned using SemVer and will be released regularly going forward. Containers are available through Amazon ECR Public Gallery, and the source code is available on GitHub. This gives companies transparency into the packages and the build process, making it easier for them to reproduce, customize, or recertify the distribution. The base image comes with pip and Conda/Mamba, so data scientists can quickly install additional packages to meet their specific needs.
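Because the distribution follows SemVer, a team could decide which releases to adopt automatically based on the major version. The sketch below is a minimal illustration of that idea under stated assumptions: it handles only plain `MAJOR.MINOR.PATCH` strings, and the helper names and version lists are hypothetical. Real image tags may carry extra suffixes that this parser does not handle.

```python
import re

# Plain MAJOR.MINOR.PATCH only; pre-release and build metadata are out of scope here.
SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def parse_version(tag):
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically."""
    match = SEMVER.match(tag)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {tag!r}")
    return tuple(int(part) for part in match.groups())


def is_compatible_upgrade(current, candidate):
    """True if candidate is newer than current and keeps the same major version."""
    cur, cand = parse_version(current), parse_version(candidate)
    return cand[0] == cur[0] and cand > cur


def latest_compatible(current, available):
    """Pick the newest version that stays within the current major version."""
    compatible = [tag for tag in available if is_compatible_upgrade(current, tag)]
    return max(compatible, key=parse_version, default=current)


# Hypothetical version list for illustration.
print(latest_compatible("1.2.0", ["1.2.1", "1.10.0", "2.0.0", "1.1.9"]))
```

Comparing parsed tuples rather than raw strings matters here: as strings, "1.10.0" would sort before "1.2.1".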
Amazon CodeGuru Jupyter extension
Amazon CodeGuru Security now supports security and code quality scanning in JupyterLab and SageMaker Studio. This new capability helps notebook users detect security vulnerabilities such as injection flaws, data leaks, weak cryptography, or missing encryption within notebook cells. You can also detect many common issues that affect the readability, reproducibility, and correctness of computational notebooks, such as misuse of ML library APIs, invalid execution order, and nondeterminism. When a vulnerability or quality issue is identified in a notebook, CodeGuru generates recommendations that help you remediate the issue based on AWS security best practices.
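As a generic illustration of the nondeterminism category mentioned above, the sketch below contrasts an unseeded random sample with a seeded one. It is not CodeGuru output and does not reflect its actual rules; the function names and the seed value are assumptions chosen for the example.

```python
import random


def sample_rows_unseeded(rows, k):
    """Nondeterministic: a different sample each time the notebook is rerun."""
    return random.sample(rows, k)


def sample_rows_seeded(rows, k, seed=42):
    """Reproducible: a dedicated generator with a fixed seed yields the same sample."""
    rng = random.Random(seed)
    return rng.sample(rows, k)


rows = list(range(100))
first = sample_rows_seeded(rows, 5)
second = sample_rows_seeded(rows, 5)
print(first == second)
```

Using a local `random.Random` instance instead of seeding the global generator keeps the result independent of the order in which other cells run.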
We’re excited to see how the Jupyter community will use these tools to scale, increase productivity, and leverage generative AI to transform their industries. Check out the following resources to learn more about Jupyter on AWS and how to install and get started with these new tools:
About the Author
Brian Granger is a leader of the IPython project, co-founder of Project Jupyter, and an active contributor to a number of other open source projects focused on data science in Python. In 2016, he co-created the Altair package for statistical visualization in Python. He is a member of the advisory board of the NumFOCUS Foundation, a faculty member at the Cal Poly Center for Innovation and Entrepreneurship, and Sr. Principal Technologist at AWS.