diff --git a/README.md b/README.md
index 201d620..5fc580e 100644
--- a/README.md
+++ b/README.md
@@ -4,15 +4,19 @@ This repository contains examples of Docker images that are valid custom images for KernelGateway Apps in SageMaker Studio. These custom images enable you to bring your own packages, files, and kernels for use with notebooks, terminals, and interactive consoles within SageMaker Studio.
+You can find more information about using Custom Images in the [SageMaker Developer Guide](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-byoi.html).
+
 ### Examples
-- [echo-kernel-image](examples/echo-kernel-image) - This example uses the echo_kernel from Jupyter as a "Hello World" introduction into writing custom KernelGateway images.
-- [jupyter-docker-stacks-julia-image](examples/jupyter-docker-stacks-julia-image) - This example leverages the Data Science image from Jupyter Docker Stacks to add a Julia kernel.
-- [r-image](examples/r-image) - This example contains the `ir` kernel and a selection of R packages, along with the AWS Python SDK (boto3) and the SageMaker Python SDK which can be used from R using `reticulate`
-- [rapids-image](examples/rapids-image) - This example uses the offical rapids.ai image from Dockerhub. Use with a GPU instance on Studio
-- [scala-image](examples/scala-image) - This example adds a Scala kernel based on [Almond Scala Kernel](https://almond.sh/).
-- [tf2.3-image](examples/tf23-image) - This examples uses the official TensorFlow 2.3 image from DockerHub and demonstrates bundling custom files along with the image.
-#### One-time setup
+- [echo-kernel-image](examples/echo-kernel-image) - Uses the echo_kernel from Jupyter as a "Hello World" introduction into writing custom KernelGateway images.
+- [javascript-tf-image](examples/javascript-tf-image) - [tslab](https://www.npmjs.com/package/tslab)-based kernels for JavaScript or TypeScript, including [TensorFlow.js](https://www.tensorflow.org/js) and CUDA GPU libraries.
+- [jupyter-docker-stacks-julia-image](examples/jupyter-docker-stacks-julia-image) - Leverages the Data Science image from Jupyter Docker Stacks to add a Julia kernel.
+- [r-image](examples/r-image) - Contains the `ir` kernel and a selection of R packages, along with the AWS Python SDK (boto3) and the SageMaker Python SDK, which can be used from R via `reticulate`.
+- [rapids-image](examples/rapids-image) - Uses the official rapids.ai image from DockerHub. Use with a GPU instance in Studio.
+- [scala-image](examples/scala-image) - Adds a Scala kernel based on [Almond Scala Kernel](https://almond.sh/).
+- [tf2.3-image](examples/tf23-image) - Uses the official TensorFlow 2.3 image from DockerHub and demonstrates bundling custom files along with the image.
+
+### One-time setup
 
 All examples have a one-time setup to create an ECR repository
@@ -28,4 +32,4 @@ See [DEVELOPMENT.md](DEVELOPMENT.md)
 
 ### License
 
-This sample code is licensed under the MIT-0 License. See the LICENSE file.
\ No newline at end of file
+This sample code is licensed under the MIT-0 License. See the LICENSE file.
diff --git a/examples/javascript-tf-image/Dockerfile b/examples/javascript-tf-image/Dockerfile
new file mode 100644
index 0000000..800b99d
--- /dev/null
+++ b/examples/javascript-tf-image/Dockerfile
@@ -0,0 +1,195 @@
+# A CUDA-capable, tslab-based JS/TS kernel container for SageMaker Studio, including TensorFlow.js.
+#
+# Python & CUDA configuration with inspiration from the AWS TensorFlow Deep Learning containers, e.g.:
+# https://github.com/aws/deep-learning-containers/blob/master/tensorflow/training/docker/2.4/py3/cu110/Dockerfile.gpu
+#
+# Use with Jupyter kernel 'jslab' or 'tslab'; user config as per NB_UID/NB_GID; home folder /home/sagemaker-user
+FROM nvidia/cuda:11.0-base-ubuntu18.04
+
+ARG NB_USER="sagemaker-user"
+ARG NB_UID="1000"
+ARG NB_GID="100"
+
+ARG NODEJS_VERSION=14.x
+ARG PYTHON_VERSION=3.9.4
+ARG PYTHON=python3.9
+ARG PYTHON_PIP=python3-pip
+ARG PIP=pip3
+
+# Prevent setup prompts hanging on user input in our non-interactive environment:
+ENV DEBIAN_FRONTEND=noninteractive
+ENV DEBCONF_NONINTERACTIVE_SEEN=true
+
+# Python config for logging, IO, etc:
+ENV PYTHONDONTWRITEBYTECODE=1
+ENV PYTHONUNBUFFERED=1
+ENV PYTHONIOENCODING=UTF-8
+ENV LANG=C.UTF-8
+ENV LC_ALL=C.UTF-8
+
+# Optimizing TF for Intel/MKL, as per:
+# https://software.intel.com/content/www/us/en/develop/articles/maximize-tensorflow-performance-on-cpu-considerations-and-recommendations-for-inference.html
+# (May not be relevant for TensorFlow.js?)
+ENV KMP_AFFINITY=granularity=fine,compact,1,0
+ENV KMP_BLOCKTIME=1
+ENV KMP_SETTINGS=0
+ENV MANUAL_BUILD=0
+
+USER root
+WORKDIR /root
+
+# Set up the NB user with root privileges:
+RUN apt-get update && \
+    apt-get install -y sudo && \
+    useradd -m -s /bin/bash -N -u $NB_UID $NB_USER && \
+    chmod g+w /etc/passwd && \
+    echo "${NB_USER} ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers && \
+    # Prevent apt-get cache from being persisted to this layer.
+    rm -rf /var/lib/apt/lists/*
+
+RUN apt-get update \
+    && apt-get install -y --no-install-recommends --allow-unauthenticated \
+    ca-certificates \
+    cuda-command-line-tools-11-0 \
+    cuda-cudart-dev-11-0 \
+    libcufft-dev-11-0 \
+    libcurand-dev-11-0 \
+    libcusolver-dev-11-0 \
+    libcusparse-dev-11-0 \
+    curl \
+    libcudnn8=8.0.5.39-1+cuda11.0 \
+    # TensorFlow doesn't require libnccl anymore but Open MPI still depends on it
+    libnccl2=2.7.8-1+cuda11.0 \
+    libgomp1 \
+    libnccl-dev=2.7.8-1+cuda11.0 \
+    libfreetype6-dev \
+    libhdf5-serial-dev \
+    liblzma-dev \
+    libtemplate-perl \
+    libzmq3-dev \
+    git \
+    unzip \
+    wget \
+    libtool \
+    libssl1.1 \
+    openssl \
+    build-essential \
+    zlib1g-dev \
+    && apt-get update \
+    && apt-get install -y --no-install-recommends --allow-unauthenticated \
+    libcublas-11-0=11.2.0.252-1 \
+    libcublas-dev-11-0=11.2.0.252-1 \
+    # The 'apt-get install' of nvinfer-runtime-trt-repo-ubuntu1804-5.0.2-ga-cuda10.0
+    # adds a new list which contains the libnvinfer library, so it needs another
+    # 'apt-get update' to retrieve that list before it can actually install the
+    # library.
+    # We don't install libnvinfer-dev since we don't need to build against TensorRT,
+    # and libnvinfer4 doesn't contain the libnvinfer.a static library.
+    # nvinfer-runtime-trt-repo doesn't have a 1804-cuda10.1 version yet. See:
+    # https://developer.download.nvidia.cn/compute/machine-learning/repos/ubuntu1804/x86_64/
+    && apt-get update && apt-get install -y --no-install-recommends --allow-unauthenticated \
+    nvinfer-runtime-trt-repo-ubuntu1804-5.0.2-ga-cuda10.0 \
+    && apt-get update && apt-get install -y --no-install-recommends --allow-unauthenticated \
+    libnvinfer7=7.1.3-1+cuda11.0 \
+    && rm -rf /var/lib/apt/lists/*
+
+# Set default NCCL parameters
+RUN echo NCCL_DEBUG=INFO >> /etc/nccl.conf
+
+# /usr/local/lib/libpython* needs to be accessible for dynamic linking
+ENV LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
+
+RUN apt-get update \
+    && apt-get install -y --no-install-recommends \
+    libbz2-dev \
+    libc6-dev \
+    libffi-dev \
+    libgdbm-dev \
+    libncursesw5-dev \
+    libreadline-gplv2-dev \
+    libsqlite3-dev \
+    libssl-dev \
+    tk-dev \
+    && rm -rf /var/lib/apt/lists/* \
+    && apt-get clean
+
+# Install specific Python version:
+RUN wget https://www.python.org/ftp/python/$PYTHON_VERSION/Python-$PYTHON_VERSION.tgz \
+    && tar -xvf Python-$PYTHON_VERSION.tgz \
+    && cd Python-$PYTHON_VERSION \
+    && ./configure --enable-shared && make && make install \
+    && rm -rf ../Python-$PYTHON_VERSION*
+
+RUN ${PIP} --no-cache-dir install --upgrade \
+    pip \
+    setuptools
+
+# Provide a "python" binary for any tools that need it:
+RUN ln -s $(which ${PYTHON}) /usr/local/bin/python \
+    && ln -s $(which ${PIP}) /usr/bin/pip
+
+# Install specific NodeJS version:
+RUN curl --silent --location https://deb.nodesource.com/setup_$NODEJS_VERSION | bash -
+RUN apt-get install --yes nodejs
+
+# Optional Python packages for AWS/SageMaker/DataScience/C++ in case they're helpful:
+RUN ${PIP} install --no-cache-dir \
+    pybind11 \
+    cmake==3.18.2.post1 \
+    # (Numpy & Pandas needed for SageMaker SDK)
+    numpy==1.20.0 \
+    pandas==1.2.4 \
+    # python-dateutil==2.8.1 to satisfy botocore associated with latest awscli
+    python-dateutil==2.8.1 \
+    # install PyYAML>=5.4 to avoid conflict with latest awscli
+    "pyYAML>=5.4,<5.5" \
+    requests==2.25.1 \
+    "awscli<2" \
+    "sagemaker>=2,<3" \
+    sagemaker-experiments==0.* \
+    smclarify \
+    smdebug==1.0.8
+
+# More setup of optional Python packages:
+ENV CPATH="/usr/local/lib/python3.9/dist-packages/pybind11/include/"
+RUN apt-get update && apt-get -y install cmake protobuf-compiler
+
+# Required Python packages:
+RUN ${PIP} install --no-cache-dir \
+    # tslab brings a separate kernel, but uses Python during setup so needs ipykernel to be present:
+    "ipykernel>=5,<6" \
+    && ${PYTHON} -m ipykernel install --sys-prefix
+
+# Install NodeJS libraries:
+# `npm install -g` pushes global installs to /usr/lib/node_modules in this case, which tslab doesn't seem
+# able to resolve per the issue below - even if we `ENV NODE_PATH=/usr/lib/node_modules`... So instead,
+# we'll install central/kernel-provided libs non-"globally" at the filesystem root:
+WORKDIR /
+RUN npm install \
+    aws-sdk@2 \
+    # (For performance, use -node-gpu where you can, else -node, else tfjs)
+    @tensorflow/tfjs@3.5 \
+    @tensorflow/tfjs-node@3.5 \
+    @tensorflow/tfjs-node-gpu@3.5 \
+    # tslab supports both TypeScript and JavaScript:
+    typescript@1.4
+
+# The `tslab` kernel provider should be installed globally, and then hooked into Jupyter:
+RUN npm install -g tslab@1.0 \
+    && tslab install
+
+# Now final user setup:
+USER $NB_UID
+
+# Set up user env vars:
+# (Bash default shell gives a better Jupyter terminal UX than `sh`)
+ENV SHELL=/bin/bash \
+    NB_USER=$NB_USER \
+    NB_UID=$NB_UID \
+    NB_GID=$NB_GID \
+    HOME=/home/$NB_USER
+
+WORKDIR $HOME
+
+# SageMaker will override the entrypoint when running in context - so just set bash for debugging:
+CMD ["/bin/bash"]
diff --git a/examples/javascript-tf-image/README.md b/examples/javascript-tf-image/README.md
new file mode 100644
index 0000000..4e397d6
--- /dev/null
+++ b/examples/javascript-tf-image/README.md
@@ -0,0 +1,107 @@
+## JavaScript / TypeScript Image with TensorFlow.js
+
+### Overview
+
+> NOTE: This Dockerfile installs dependencies that may be licensed under copyleft licenses such as GPLv3. You should review the license terms and make sure they are acceptable for your use case before proceeding to download and use this image.
+
+A SageMaker Studio-compatible notebook kernel image for JavaScript or TypeScript, based on [tslab](https://www.npmjs.com/package/tslab).
+
+This example:
+
+- Derives from [nvidia/cuda](https://hub.docker.com/r/nvidia/cuda) images (as some of the [AWS Deep Learning Containers](https://github.com/aws/deep-learning-containers) do), for GPU driver support
+- Includes [TensorFlow.js](https://www.tensorflow.org/js) and the [AWS SDK for JavaScript](https://aws.amazon.com/sdk-for-javascript/)
+- Packages some additional Python-oriented AWS and SageMaker utilities, including the [AWS CLI](https://aws.amazon.com/cli/) and the [SageMaker SDK for Python](https://sagemaker.readthedocs.io/en/stable/)
+
+
+### Building the image
+
+Build the Docker image and push it to Amazon ECR.
+
+> ⏰ **Note:** This image can take several minutes to build Python components from source, and typically reaches several GB in size.
+
+```bash
+# Modify these as required. The Docker registry endpoint can be adjusted for your region; see https://docs.aws.amazon.com/general/latest/gr/ecr.html#ecr-docker-endpoints
+REGION=
+ACCOUNT_ID=
+IMAGE_NAME=custom-jsts
+
+aws --region ${REGION} ecr get-login-password | docker login --username AWS --password-stdin ${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/smstudio-custom
+docker build . -t ${IMAGE_NAME} -t ${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/smstudio-custom:${IMAGE_NAME}
+```
+
+```bash
+docker push ${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/smstudio-custom:${IMAGE_NAME}
+```
+
+
+### Using with SageMaker Studio
+
+Once the image is pushed to Amazon ECR, you can attach it to your SageMaker Studio domain via either the console UI or the CLI.
+See the [SageMaker Developer Guide](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-byoi-create.html) for more details.
+
+The tslab installation in this image exports **two Jupyter kernels** you can use:
+
+- `tslab`: For working in [TypeScript](https://www.typescriptlang.org/)
+- `jslab`: For standard JavaScript
+
+Both kernel options see the same globally-pre-installed npm packages, and use the same SageMaker Studio filesystem configuration (user, group, folder) as shown in [app-image-config-input.json](app-image-config-input.json).
+
+At the time of writing, SageMaker Studio does not yet fully support multi-kernel images, so you'll need to create **two** ["SageMaker Images"](https://console.aws.amazon.com/sagemaker/home?#/images) (referencing the same [ECR](https://console.aws.amazon.com/ecr/repositories/private/) URI) if you want to expose both JS and TS options to users in your domain.
+
+For example, to create the SageMaker Image via the AWS CLI:
+
+```bash
+# IAM Role in your account to be used for the SageMaker Image setup process:
+ROLE_ARN=
+
+# You may want to use an alternative SM_IMAGE_NAME if registering two SageMaker Images against the same ECR image:
+SM_IMAGE_NAME=${IMAGE_NAME}
+
+aws --region ${REGION} sagemaker create-image \
+    --image-name ${SM_IMAGE_NAME} \
+    --role-arn ${ROLE_ARN}
+
+aws --region ${REGION} sagemaker create-image-version \
+    --image-name ${SM_IMAGE_NAME} \
+    --base-image "${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/smstudio-custom:${IMAGE_NAME}"
+
+# Verify the image-version is created successfully.
+# Do NOT proceed if the image-version is in CREATE_FAILED state or in any other state apart from CREATED.
+aws --region ${REGION} sagemaker describe-image-version --image-name ${SM_IMAGE_NAME}
+```
+
+...and then configure the container runtime settings in SageMaker:
+
+```bash
+# TODO: Edit the JSON to point to whichever of the 'tslab' or 'jslab' kernels you intend to use
+aws --region ${REGION} sagemaker create-app-image-config --cli-input-json file://app-image-config-input.json
+```
+
+The final step is to attach your SageMaker Custom Image(s) to your SageMaker Studio domain. You can do this from the [AWS Console for SageMaker Studio](https://console.aws.amazon.com/sagemaker/home?#/studio), or using the `aws sagemaker update-domain` [command](https://docs.aws.amazon.com/cli/latest/reference/sagemaker/update-domain.html) in the AWS CLI.
+
+
+### Further information
+
+#### Magics and package installations
+
+SageMaker Python users may be used to installing packages inline in kernels using `!pip install ...` commands.
+
+[Magics](https://ipython.readthedocs.io/en/stable/interactive/magics.html) (like `%%bash` or `!` for running shell scripts) are generally specific to the IPython (Python) kernel and may not be available in other implementations (like the `tslab` kernels used here).
+
+See [this discussion](https://github.com/yunabe/tslab/issues/35) for alternatives, and the roadmap for running shell commands like `npm install` in tslab.
+
+You could, for example, run a cell like:
+
+```js
+const { execSync } = require("child_process");
+execSync("npm install lodash", { encoding: "utf-8" });
+```
+
+#### A note on GPU-accelerated notebooks
+
+This kernel includes CUDA libraries in case you want to experiment with GPU-accelerated instance types like `ml.g4dn.xlarge` in notebooks.
+
+**However**, remember that a general best practice is to **package your high-resource code as SageMaker Jobs** (such as Processing, Training, or Batch Transform jobs) and keep your notebook environment resources modest (such as an `ml.t3.medium`). Working with SageMaker Jobs early in the build process can help:
+
+- Optimize infrastructure costs (since these jobs spin up and release their infrastructure on demand)
+- Improve experiment tracking (since the SageMaker APIs automatically store the history of training job inputs, parameters, metrics, and so on)
+- Accelerate the path to production (for example, by training models in container environments that are already set up for inference deployment)
diff --git a/examples/javascript-tf-image/app-image-config-input.json b/examples/javascript-tf-image/app-image-config-input.json
new file mode 100644
index 0000000..2102643
--- /dev/null
+++ b/examples/javascript-tf-image/app-image-config-input.json
@@ -0,0 +1,16 @@
+{
+  "AppImageConfigName": "custom-ts",
+  "KernelGatewayImageConfig": {
+    "KernelSpecs": [
+      {
+        "Name": "tslab",
+        "DisplayName": "TensorFlow.js (TypeScript)"
+      }
+    ],
+    "FileSystemConfig": {
+      "MountPath": "/home/sagemaker-user",
+      "DefaultUid": 1000,
+      "DefaultGid": 100
+    }
+  }
+}
\ No newline at end of file
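The `app-image-config-input.json` in this change registers only the `tslab` (TypeScript) kernel. To expose the `jslab` (JavaScript) kernel as well, a second AppImageConfig could be created along these lines. This is a sketch: the `AppImageConfigName` and `DisplayName` values are illustrative placeholders, not part of the sample; only the `Name` (`jslab`) and the `FileSystemConfig` values are taken from the files above.

```json
{
  "AppImageConfigName": "custom-js",
  "KernelGatewayImageConfig": {
    "KernelSpecs": [
      {
        "Name": "jslab",
        "DisplayName": "TensorFlow.js (JavaScript)"
      }
    ],
    "FileSystemConfig": {
      "MountPath": "/home/sagemaker-user",
      "DefaultUid": 1000,
      "DefaultGid": 100
    }
  }
}
```

Saved as a separate file (e.g. a hypothetical `app-image-config-js.json`), this would be passed to `aws sagemaker create-app-image-config --cli-input-json` in the same way as the TypeScript config, and referenced by its `AppImageConfigName` when attaching the second SageMaker Image to the domain.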