In this competitive world of technology, Machine Learning and Artificial Intelligence have emerged as breakthrough technologies for developing advanced applications like image recognition, natural language processing, speech translation, and more. However, developing such AI-powered applications requires massive amounts of computational power, far beyond the capabilities of CPUs (Central Processing Units).
That’s because CPUs have only a handful of cores and threads, so they can process only a few threads at a time. This becomes a bottleneck for the highly parallelizable computations that deep learning algorithms require. This gave rise to the use of GPUs (Graphics Processing Units), which ship with thousands of cores, can handle thousands of threads simultaneously, and are designed for mathematically intensive tasks such as real-time 3D graphics rendering, crypto mining, and deep learning.
NVIDIA, the GPU manufacturer giant, revolutionized the applicability of neural networks by developing CUDA, a parallel computing platform and API model for GPU acceleration. This allowed developers to leverage the processing prowess of NVIDIA GPUs for general-purpose computing through languages like C/C++ and Python. To further ease the development of GPU-accelerated deep learning applications, companies like Meta and Google developed frameworks like PyTorch and TensorFlow. Built on top of CUDA, these frameworks provide high-level APIs for building and training neural networks in Python without directly working with low-level CUDA code.
PyTorch, developed by Facebook’s AI Research lab, has emerged as one of the leading choices for developing, training, and running inference on deep learning models in both research and production. With its imperative programming model and Pythonic syntax, PyTorch is widely adopted in natural language processing and reinforcement learning applications.
However, setting up PyTorch on Windows with GPU support can be challenging because of its many dependencies: NVIDIA drivers, the CUDA toolkit, the cuDNN library, and matching PyTorch or TensorFlow versions. In this comprehensive guide, we provide a step-by-step process to set up PyTorch for GPU devices on Windows 10/11. We begin with the prerequisites, such as hardware requirements and driver setup, and then install CUDA, cuDNN, Anaconda, and PyTorch. We also share details on correctly configuring environment variables and verifying GPU access.
By the end, you will learn the complete process of setting up PyTorch on Windows to leverage the power of NVIDIA GPUs for accelerating deep neural network training and inferencing.
Before diving into the installation process, let’s familiarize ourselves with some common terminologies:
- CPU (Central Processing Unit) – The main processor in a computer that handles computations. CPUs are optimized for sequential (serial) processing.
- GPU (Graphics Processing Unit) – Specialized electronic circuits designed to rapidly process parallel workloads. GPUs are optimized for parallel processing and ideal for machine learning workloads. Popular GPUs are made by Nvidia.
- NVIDIA – Leading manufacturer of GPUs commonly used for AI/ML workloads. Popular Nvidia GPUs include the Tesla and GeForce RTX series.
- CUDA – Parallel computing platform created by Nvidia that allows software developers to leverage the parallel computing capabilities of Nvidia GPUs.
- cuDNN – Nvidia’s library of GPU-accelerated deep neural network primitives. Helps optimize common deep learning operations.
- Anaconda – Open source Python distribution designed for large-scale data processing, predictive analytics, and scientific computing. Comes bundled with many popular data science libraries.
- PyTorch – Open source machine learning framework based on Python and optimized for GPU acceleration. Provides flexibility like Python with high performance like C++.
- TensorFlow – End-to-end open-source machine learning platform developed by Google. Offers many high-level APIs for building and training ML models.
- IDE (Integrated Development Environment) – A software application that provides tools and interfaces for programmers to develop software and applications. Examples: Visual Studio Code, PyCharm, and Jupyter Notebook.
- CUDA Cores – Processing units within Nvidia GPUs designed specifically to perform the calculations required for parallel computing. More CUDA cores lead to improved parallel processing performance.
- CUDA Toolkit – Software development kit created by Nvidia that provides GPU-accelerated libraries, compilers, development tools, and APIs for developing software that leverages Nvidia GPUs.
- Conda Env – Self-contained directory that contains a specific collection of conda packages, Python interpreter, and other dependencies needed to run an application. Helpful for managing dependencies.
- FP32/FP64 – Floating point precision formats that represent 32-bit (single precision) or 64-bit (double precision) floating point values. FP32 requires less memory, so it is more commonly used, but FP64 offers higher precision.
- NVCC – Nvidia’s C/C++ compiler that can compile code for both CPU and GPU. Part of the CUDA Toolkit.
- Half Precision – 16-bit floating point format that requires less memory and bandwidth compared to 32-bit FP32. Useful for some ML applications.
- Auto Mixed Precision – Training deep learning neural networks using both lower precision (FP16) and higher precision (FP32) automatically based on need. Helps accelerate training while retaining accuracy.
- Tensor Cores – Specialized cores within Nvidia GPUs designed specifically to accelerate mixed precision matrix multiplication operations commonly used in deep learning.
- Machine Learning – The field of computer science that gives computers the ability to learn without being explicitly programmed. Focuses on computer programs that can teach themselves to grow, change, and improve on their own by using algorithms and statistical models to analyze data.
- Deep Learning – A subfield of machine learning that uses neural networks modeled after the human brain and containing multiple layers. Excels at finding patterns in large amounts of unstructured data like images, video, audio, and text.
- Natural Language Processing (NLP) – The field of AI focused on enabling computers to understand, interpret, and manipulate human language. The key component of conversational AI.
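To make the FP32/FP64 and Half Precision entries above more concrete, here is a minimal sketch that uses only Python’s standard `struct` module rather than PyTorch. The names `FORMATS`, `round_trip`, and `describe` are illustrative and not part of any library. It stores the value 0.1 in each format and reports how many bytes the format needs and how much rounding error it introduces.

```python
import struct

# struct format codes: 'e' = 16-bit half, 'f' = 32-bit single, 'd' = 64-bit double
FORMATS = {"FP16": "e", "FP32": "f", "FP64": "d"}


def round_trip(value, code):
    """Store `value` in the given format and read it back."""
    return struct.unpack(code, struct.pack(code, value))[0]


def describe(value=0.1):
    """Return (name, bytes per value, stored value, absolute error) per format."""
    rows = []
    for name, code in FORMATS.items():
        size = struct.calcsize(code)
        stored = round_trip(value, code)
        rows.append((name, size, stored, abs(stored - value)))
    return rows


for name, size, stored, err in describe():
    print(f"{name}: {size} bytes per value, 0.1 stored as {stored!r}, error {err:.3e}")
```

The smaller formats halve the memory per value at each step, at the cost of a larger rounding error. That trade-off is what mixed precision training tries to balance.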
Prerequisites to Set Up PyTorch for Your GPU on Windows 10/11
First things first: you can’t set up CUDA or machine learning frameworks like PyTorch or TensorFlow on just any machine that has a GPU. There are certain hardware and software requirements that must be met. Let’s go through the key prerequisites in this section.
- NVIDIA GPU – A CUDA-capable GPU from NVIDIA is essential. CUDA and related libraries like cuDNN only work with NVIDIA GPUs that have CUDA cores. Check NVIDIA’s list of CUDA-capable GPUs to confirm that your card is supported.
- Compatible Motherboard – The motherboard should have a PCIe slot to accommodate the NVIDIA GPU. It should be compatible with installing and interfacing with the GPU.
- Minimum 8GB RAM – Having enough RAM is important for running larger deep learning models. The CUDA guide recommends at least 8GB for optimal performance. If you are into heavy-duty tasks then we suggest going for 32GB or higher for desktop computers.
- Enough Disk Space – You will need free disk space to install the NVIDIA drivers, CUDA toolkit, cuDNN, PyTorch, and other libraries. We installed CUDA 12.2 on our Windows PC, and it took up to 20 GB of disk space.
- Windows 10/11 or Windows Server 2022/2019 – On the desktop, the OS should be Windows 10 or Windows 11 for full compatibility. On the server side, use Microsoft Windows Server 2022 or Microsoft Windows Server 2019.
- NVIDIA GPU Drivers – The latest Game Ready driver from NVIDIA that supports your GPU model. It allows Windows to recognize the GPU.
- CUDA Toolkit – Provides libraries, APIs, and compilers like nvcc to enable GPU acceleration.
- cuDNN – The GPU-accelerated library for deep learning primitives from NVIDIA.
- Visual Studio – The Visual C++ redistributables are needed to run Python on Windows.
- Anaconda/Miniconda – To manage Python packages and environments.
- PyTorch/TensorFlow – The deep learning framework we aim to install and use with CUDA/GPUs.
These are the essential prerequisites in terms of hardware and software for setting up PyTorch on Windows with CUDA GPU acceleration based on NVIDIA’s official documentation. The actual installation steps will be covered later in this guide.
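A few of these prerequisites can be checked from Python before you install anything. The sketch below uses only the standard library. The function names and the `MIN_FREE_GB` threshold are our own assumptions (the threshold is based on the roughly 20 GB our CUDA 12.2 installation used), not official requirements. It does not check RAM, which would need a third-party package.

```python
import platform
import shutil

MIN_FREE_GB = 20  # assumption: based on the ~20 GB our CUDA 12.2 install used


def free_disk_gb(path="."):
    """Return free space on the drive holding `path`, in gigabytes."""
    return shutil.disk_usage(path).free / 1024**3


def check_prerequisites(path="."):
    """Collect a few facts that can be checked without extra packages."""
    free_gb = free_disk_gb(path)
    return {
        "os": f"{platform.system()} {platform.release()}",
        "is_windows": platform.system() == "Windows",
        "is_64bit": platform.machine().endswith("64"),
        "free_disk_gb": round(free_gb, 1),
        "enough_disk": free_gb >= MIN_FREE_GB,
        # nvidia-smi ships with the NVIDIA driver; None means it is not on PATH
        "nvidia_smi": shutil.which("nvidia-smi"),
    }


for key, value in check_prerequisites().items():
    print(f"{key}: {value}")
```

Pass the drive you plan to install to, for example `check_prerequisites("C:\\")`, if it differs from the current working directory.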
The Demo PC System Specification
We tried this on one of our Windows PCs, which has the below hardware and software.
- CPU: Intel Core i7 7700 8 cores 16 threads with 3.2 GHz
- RAM: 32 GB DDR4 2400 MHz
- GPU: GTX 1050 2 GB
- SSD: Intel NVME M.2 1TB
- OS: Windows 10 22H
- NVIDIA CUDA Tool Kit: 12.2 (on System) and 12.1 (on Conda)
- cuDNN: 8.9.4
- Anaconda: 22.9.0
- Python: 3.8, 3.9, 3.10
- PyTorch: 2.1.0.dev20230902 py3.9_cuda12.1_cudnn8_0 pytorch-nightly
- Visual Studio Community 2022 with ‘Universal Windows Platform development‘ and ‘Desktop Development with C++‘ (optional)
- PyCharm: 2023.2.1 (Community Edition)
How to Set Up PyTorch for Your GPU on Windows 10/11?
This is the phase in which we spend most of our time, sometimes more than expected. The real challenge here is neither the procedure nor the applications, packages, or drivers. It’s the compatibility between them (GPU card, storage, RAM, Python, CUDA, cuDNN, PyTorch, or TensorFlow). That’s why it takes time to figure out the combination of versions that matches your computer hardware.
We have tried to cover detailed step-by-step instructions to set up PyTorch for your GPU on a Windows PC. We used Windows 10 for this demo; however, the procedure is the same for Windows 11.
Let’s start this process with the installation of NVIDIA Drivers.
Time needed: 1 hour.
- Install NVIDIA GPU Drivers
The first thing you need is the proper driver for your NVIDIA GPU. Head over to NVIDIA’s driver download page and get the latest Game Ready driver or Studio Driver. Make sure to select Windows 10/11 – 64-bit as the operating system.
Alternatively, you can go to the NVIDIA GPU card page and download the driver. You will get the same driver installer either way.
Run the downloaded executable and follow the on-screen instructions. Restart the system afterwards to complete the driver installation.
Note: If you are not sure which GPU you have, open the ‘Device Manager’ window and expand ‘Display adapters’. Your GPU card should be listed there if it is installed correctly.
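As an alternative to Device Manager, once the driver is installed you can ask it directly which GPUs it sees. This is a minimal sketch that calls `nvidia-smi` (which ships with the NVIDIA driver) using its CSV query mode. The helper names `parse_gpu_csv` and `list_gpus` are ours, and the parser assumes that GPU names contain no commas.

```python
import shutil
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=name,driver_version,memory.total",
    "--format=csv,noheader",
]


def parse_gpu_csv(text):
    """Turn nvidia-smi CSV lines into a list of dicts, one per GPU."""
    gpus = []
    for line in text.strip().splitlines():
        name, driver, memory = [field.strip() for field in line.split(",")]
        gpus.append({"name": name, "driver": driver, "memory": memory})
    return gpus


def list_gpus():
    """Return detected GPUs, or an empty list if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True)
    return parse_gpu_csv(result.stdout) if result.returncode == 0 else []


print(list_gpus() or "No NVIDIA GPU detected; check the driver installation.")
```

An empty result usually means that the driver is not installed yet or that the system has not been restarted since the installation.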
- Install CUDA Toolkit
CUDA or Compute Unified Device Architecture is NVIDIA’s parallel computing platform and API model that allows software developers to leverage GPU acceleration. It is required for frameworks like PyTorch to utilize NVIDIA graphics cards.
Go to the CUDA toolkit archive and download the latest stable version that matches your operating system, is supported by your GPU and driver, and works with the PyTorch build you plan to use (Python 3.x recommended).
Again, run the installer accepting all the default options. Remember the installation path as we will need to configure environment variables later on.
- Install cuDNN Library
To further boost performance for deep neural networks, we need the cuDNN library from NVIDIA. You will need to create an NVIDIA developer account to access the download link.
Download the cuDNN version that matches your CUDA toolkit version. For example, if you installed CUDA 10.1, you need cuDNN 7.6.X.
Unzip the downloaded package and copy the contents of bin, include, and lib to the respective directory paths where CUDA is installed on your machine. For example:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\include
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\lib
We downloaded cuDNN 8.9.4, unzipped it, and copied the files from bin, include, and lib to the respective folders underneath NVIDIA GPU Computing Toolkit\CUDA\v12.2\
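If you prefer to script the copy step, the following sketch shows one way to do it with Python’s standard library. The function name `copy_cudnn` and the example paths in the comments are hypothetical. Because the CUDA folder sits under Program Files, the script would most likely need to run from an elevated (administrator) prompt.

```python
import shutil
from pathlib import Path


def copy_cudnn(cudnn_root, cuda_root, folders=("bin", "include", "lib")):
    """Copy each cuDNN folder into the CUDA folder of the same name."""
    copied = []
    for folder in folders:
        source = Path(cudnn_root) / folder
        if not source.is_dir():
            continue  # skip folders missing from this cuDNN archive
        # dirs_exist_ok merges into the existing CUDA folder (Python 3.8+)
        shutil.copytree(source, Path(cuda_root) / folder, dirs_exist_ok=True)
        copied.append(folder)
    return copied


# Hypothetical locations; adjust both to match your machine.
# copy_cudnn(r"C:\Downloads\cudnn-extracted",
#            r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2")
```

The function returns the list of folders it actually copied, so you can confirm that none of the three was skipped.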
- Configure Environment Variables
We need to update a few environment variables for CUDA, cuDNN, and the NVIDIA driver to work properly.
Press Win + R to open the Run prompt, type “sysdm.cpl”, and press Enter. The System Properties window opens. Go to the ‘Advanced’ tab and click ‘Environment Variables’. Under System Variables, select Path, click Edit, and add the following:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\bin
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\libnvvp
C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common
C:\Program Files\NVIDIA Corporation\Nsight Compute 2023.2.2\
C:\Program Files\NVIDIA Corporation\NVIDIA NvDLISR
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2
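After editing the variables, open a new terminal (already open terminals keep the old values) and confirm that the entries are visible. The sketch below is one way to check this from Python. `missing_path_entries` is an illustrative helper, and the `CUDA_ROOT` value assumes the default CUDA 12.2 installation path used in this guide.

```python
import os
import shutil


def missing_path_entries(expected, path_value=None):
    """Return the expected directories that are absent from PATH."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")

    def normalize(entry):
        # Windows paths are case-insensitive, and a trailing slash does not matter
        return entry.strip().rstrip("\\/").lower()

    present = {normalize(p) for p in path_value.split(os.pathsep) if p.strip()}
    return [p for p in expected if normalize(p) not in present]


# Assumption: the default install location for CUDA 12.2
CUDA_ROOT = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2"
expected = [CUDA_ROOT + r"\bin", CUDA_ROOT + r"\libnvvp"]

print("Missing from PATH:", missing_path_entries(expected) or "nothing")
print("nvcc found at:", shutil.which("nvcc"))
```

If `nvcc` is reported as `None`, the CUDA `bin` folder is not on the Path that this terminal sees.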
- Install Anaconda
We will use Anaconda to set up and manage the Python environment for PyTorch.
1. Download the latest Anaconda installer for Windows from https://www.anaconda.com/products/distribution
2. Choose Python 3.10 or higher during installation.
3. Complete the installation process and restart your terminal.
4. Open the Anaconda Prompt which will have the Conda environment activated by default.
To verify that the installation is successful, open the ‘Anaconda Prompt’ and enter a command such as conda --version, which prints the installed Conda version.
Refer to these online documents for installation, setting up the environmental variable, and troubleshooting: https://docs.anaconda.com/free/anaconda/install/windows/
Once installed, open the Anaconda Prompt terminal and create a new Conda environment:
conda create -n <Conda env name> python=3.9
Activate the environment:
conda activate <Conda env name>
To deactivate the Conda Env:
conda deactivate
To activate the Base Env:
conda activate base
To see the list of Environments:
conda env list
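To confirm from inside Python which interpreter and Conda environment are actually in use, you can run a short check like the one below. It is a sketch that relies on the `CONDA_DEFAULT_ENV` variable, which Conda sets when an environment is activated. The function name is our own.

```python
import os
import sys


def describe_environment():
    """Report which Python interpreter and Conda environment are in use."""
    return {
        "python_version": ".".join(map(str, sys.version_info[:3])),
        "executable": sys.executable,
        "prefix": sys.prefix,
        # conda sets CONDA_DEFAULT_ENV when an environment is activated
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV", "(not set)"),
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

The `executable` path should point inside the folder of the environment you created. If it points at the base installation, activate your environment first.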
- Install IDE (Optional)
For this demo, we use PyCharm as our IDE for Python development. A separate guide covers how to install PyCharm on Windows. If you are not a fan of IDEs, you can download the Python interpreter and use it directly from the command line.
Download and install PyCharm Community Edition from jetbrains.com/pycharm. Make sure to customize the installer to add Anaconda Python environment support.
Once setup is complete, open PyCharm. We need to configure it to use the Conda environment we created earlier. Go to File > Settings > Project Interpreter. Click the gear icon and select “Add”. Locate the Python executable in the <conda env> folder in the Anaconda installation.
Great! Now PyCharm is configured to use the right Python environment.
Let us know if you need any help setting up the IDE to use the PyTorch GPU environment we have configured.
- Take Note of the CUDA Toolkit, CUDA Runtime API, CUDA Driver API, and GPU Card Versions
Before installing PyTorch, make a note of the CUDA Toolkit version installed on your system. CUDA has two APIs: the Runtime API and the Driver API. Run the commands below to check both versions.
Run this command to check the Runtime API version:
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:09:35_Pacific_Daylight_Time_2023
Cuda compilation tools, release 12.2, V12.2.140
Run this command to list the GPUs that the driver recognizes:
nvidia-smi -L
GPU 0: NVIDIA GeForce GTX 1050 (UUID: GPU-c45d4514-6d35-9cb7-cfb9-c2fae7306659)
Run this command to check the driver version and the Driver API version (shown as CUDA Version):
nvidia-smi
Sun Sep 3 23:18:11 2023
| NVIDIA-SMI 537.13 Driver Version: 537.13 CUDA Version: 12.2 |
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
| 0 NVIDIA GeForce GTX 1050 WDDM | 00000000:01:00.0 On | N/A |
| 45% 39C P8 N/A / 75W | 1306MiB / 2048MiB | 0% Default |
| | | N/A |
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
| 0 N/A N/A 4552 C+G ....Search_cw5n1h2txyewy\SearchApp.exe N/A |
| 0 N/A N/A 4896 C+G ...ocal\Programs\Evernote\Evernote.exe N/A |
| 0 N/A N/A 7772 C+G ...CBS_cw5n1h2txyewy\TextInputHost.exe N/A |
| 0 N/A N/A 8088 C+G ...oogle\Chrome\Application\chrome.exe N/A |
| 0 N/A N/A 8932 C+G ...ekyb3d8bbwe\PhoneExperienceHost.exe N/A |
| 0 N/A N/A 9884 C+G ...m Files\Mozilla Firefox\firefox.exe N/A |
| 0 N/A N/A 10504 C+G ...2txyewy\StartMenuExperienceHost.exe N/A |
| 0 N/A N/A 11060 C+G C:\Windows\explorer.exe N/A |
| 0 N/A N/A 13208 C+G ...les\Microsoft OneDrive\OneDrive.exe N/A |
| 0 N/A N/A 14972 C+G ...pulse\Screenpresso\Screenpresso.exe N/A |
| 0 N/A N/A 16352 C+G ...Brave-Browser\Application\brave.exe N/A |
| 0 N/A N/A 16864 C+G ...72.0_x64__8wekyb3d8bbwe\GameBar.exe N/A |
| 0 N/A N/A 17252 C+G ....Search_cw5n1h2txyewy\SearchApp.exe N/A |
| 0 N/A N/A 17404 C+G ...m Files\Mozilla Firefox\firefox.exe N/A |
| 0 N/A N/A 17780 C+G ...GeForce Experience\NVIDIA Share.exe N/A |
| 0 N/A N/A 18216 C+G ...es (x86)\MSI\Fast Boot\FastBoot.exe N/A |
| 0 N/A N/A 18288 C+G ...l\Microsoft\Teams\current\Teams.exe N/A |
| 0 N/A N/A 19000 C+G ....0_x64__8wekyb3d8bbwe\HxOutlook.exe N/A |
| 0 N/A N/A 20136 C+G ...e Stream\18.104.22.168\GoogleDriveFS.exe N/A |
| 0 N/A N/A 21352 C+G ...2.0_x64__cv1g1gvanyjgm\WhatsApp.exe N/A |
| 0 N/A N/A 23116 C+G ...ork Manager\MSI_Network_Manager.exe N/A |
| 0 N/A N/A 29064 C+G ...5n1h2txyewy\ShellExperienceHost.exe N/A |
| 0 N/A N/A 29400 C+G ...l\Microsoft\Teams\current\Teams.exe N/A |
CUDA Version: 12.2
GPU: NVIDIA GeForce GTX 1050
Driver Version: 537.13
The above information is required to download the PyTorch and TensorFlow frameworks.
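Reading these versions by eye works, but the comparison can also be scripted. The sketch below extracts the toolkit release from `nvcc --version` text and the CUDA Version field from `nvidia-smi` text. It then applies a conservative check that the driver supports a CUDA version at least as new as the toolkit. The function names are illustrative, and the sample strings are copied from the output shown in this step.

```python
import re


def nvcc_release(nvcc_output):
    """Extract the toolkit release, e.g. (12, 2), from `nvcc --version` text."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def driver_cuda_version(smi_output):
    """Extract the 'CUDA Version' field from `nvidia-smi` text."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def toolkit_supported(nvcc_output, smi_output):
    """True when the driver's CUDA version is at least the toolkit's release."""
    toolkit = nvcc_release(nvcc_output)
    driver = driver_cuda_version(smi_output)
    return toolkit is not None and driver is not None and toolkit <= driver


# Sample text copied from the command output in this guide
nvcc_text = "Cuda compilation tools, release 12.2, V12.2.140"
smi_text = "| NVIDIA-SMI 537.13 Driver Version: 537.13 CUDA Version: 12.2 |"
print(nvcc_release(nvcc_text), driver_cuda_version(smi_text),
      toolkit_supported(nvcc_text, smi_text))
```

In practice you would feed these functions the real output of the two commands, for example captured with `subprocess.run`.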
- Install PyTorch and CUDA for your GPU
Now we are all set to install PyTorch and CUDA for your GPU on a Windows machine. Visit the PyTorch website: https://pytorch.org/get-started/locally/ to construct the command to install PyTorch and CUDA.
Select the PyTorch Build, OS, Package, Programming Language, and CUDA version.
You can choose Pip instead of the Conda package. That is absolutely fine and gives the same outcome. Since we used Anaconda as our package manager, we will go with the Conda option.
In our case, we selected:
1. PyTorch Build: Nightly (Since we are using the latest NVIDIA CUDA Tool Kit)
2. OS: Windows
3. Package: Conda (You can use Pip. Both will give the same outcome. )
4. Language: Python
5. Compute Platform: CUDA 12.1 (Since we can’t select 12.2, we selected the closest version)
Copy the command and run it in your Conda terminal. Note: Ensure you have activated the correct Conda Environment before you execute this command.
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch-nightly -c nvidia
This will install the latest PyTorch nightly build with CUDA 12.1. In our case, that was version 2.1.0.dev20230902 (py3.9_cuda12.1_cudnn8_0) from the pytorch-nightly channel.
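The rule we followed when picking the Compute Platform (take the newest offered CUDA version that does not exceed the installed one) can be written down as a small helper. This is only a sketch of that reasoning. The `offered` list is an assumption about what the selector showed when we ran it and will differ over time, and the helper names are our own.

```python
def parse_version(text):
    """Convert '12.1' into (12, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def closest_platform(installed, offered):
    """Pick the newest offered CUDA version that does not exceed `installed`."""
    candidates = [v for v in offered if parse_version(v) <= parse_version(installed)]
    return max(candidates, key=parse_version) if candidates else None


# Assumption: the options the selector offered when we ran it; yours may differ
offered = ["11.8", "12.1"]
choice = closest_platform("12.2", offered)
print(f"conda install pytorch torchvision torchaudio pytorch-cuda={choice} "
      "-c pytorch-nightly -c nvidia")
```

Always prefer the command generated by the PyTorch website itself. The helper only illustrates why 12.1 was the natural choice for a system with CUDA 12.2.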
- Verify GPU Usage
We can validate that everything is working as expected by running a small PyTorch program:
import torch

# Use the GPU when CUDA is available; otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using", device, "device")
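For a slightly more thorough check, the following sketch also reports the PyTorch version, the CUDA and cuDNN versions that PyTorch was built with, and the GPU name. It then runs a small matrix multiplication on the GPU as an end-to-end test. The function name `gpu_report` is our own, and the guarded import is only there so that the script fails gently when PyTorch is missing from the active environment.

```python
try:
    import torch
except ImportError:  # PyTorch is missing from the active environment
    torch = None


def gpu_report():
    """Summarize what PyTorch can see; returns {} if PyTorch is not installed."""
    if torch is None:
        return {}
    report = {
        "torch": torch.__version__,
        "cuda_build": torch.version.cuda,         # None on CPU-only builds
        "cudnn": torch.backends.cudnn.version(),  # None if cuDNN is unavailable
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["gpu_name"] = torch.cuda.get_device_name(0)
        # Run a small matrix multiplication on the GPU as an end-to-end check
        x = torch.rand(256, 256, device="cuda")
        report["matmul_ok"] = bool(torch.isfinite(x @ x).all())
    return report


for key, value in gpu_report().items():
    print(f"{key}: {value}")
```

If `cuda_available` is False while `cuda_build` shows a version, the usual suspects are the driver or a mismatch between the driver and the CUDA build, rather than PyTorch itself.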
We also tried setting up TensorFlow for the GPU, but unfortunately we couldn’t configure it. After going through the TensorFlow site, we learned that recent TensorFlow releases have no GPU support on native Windows or on Mac; those platforms get CPU-only builds. If you want to run TensorFlow on your GPU, you need Linux, or WSL2 on Windows, as the TensorFlow documentation explains:
“GPU support on native Windows is only available for 2.10 or earlier versions, starting in TF 2.11, CUDA build is not supported for Windows. For using TensorFlow GPU on Windows, you will need to build/install TensorFlow in WSL2 or use TensorFlow-cpu with TensorFlow-DirectML-Plugin”
In this comprehensive guide, we went through the entire process of setting up PyTorch on Windows 10/11 with CUDA GPU acceleration.
We looked at the hardware and software prerequisites, such as having an NVIDIA GPU and a compatible motherboard, and installing the necessary NVIDIA drivers, CUDA toolkit, cuDNN library, etc.
The step-by-step installation process was explained in detail including tips like doing a clean driver install and verifying CUDA installation using nvcc.
Configuring the environment variables properly is key to ensuring PyTorch can locate the CUDA install directory. Using Anaconda for Python package management simplifies setting up Conda environments for PyTorch.
A PyTorch nightly build was installed with matching CUDA support to take advantage of GPU acceleration. A simple PyTorch program was run to confirm that the GPU is accessible.
Following these steps will help you correctly configure PyTorch on Windows so that deep learning workloads can run on your NVIDIA GPU, potentially 10-50x faster than on the CPU. However, the actual speedup depends on the GPU card and on the CPU you compare it against. We plan to cover performance testing in a separate post.
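Until then, you can get a rough feel for the difference on your own machine with a sketch like the one below. It times a matrix multiplication on the CPU and, when available, on the GPU. The function names, matrix size, and repeat count are arbitrary choices of ours, and a single matmul is not a substitute for a real training benchmark.

```python
import time

try:
    import torch
except ImportError:  # PyTorch is missing from the active environment
    torch = None


def time_matmul(device, size=1024, repeats=5):
    """Average seconds for one size x size matrix multiplication on `device`."""
    a = torch.rand(size, size, device=device)
    b = torch.rand(size, size, device=device)
    a @ b  # warm-up run so one-time setup cost is not measured
    if device == "cuda":
        torch.cuda.synchronize()  # GPU work is asynchronous; wait before timing
    start = time.perf_counter()
    for _ in range(repeats):
        a @ b
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / repeats


def compare_devices(size=1024):
    """Time the CPU, and the GPU when available; {} if PyTorch is not installed."""
    if torch is None:
        return {}
    timings = {"cpu": time_matmul("cpu", size)}
    if torch.cuda.is_available():
        timings["cuda"] = time_matmul("cuda", size)
    return timings


timings = compare_devices()
for device, seconds in timings.items():
    print(f"{device}: {seconds * 1000:.2f} ms per matmul")
if "cuda" in timings:
    print(f"speedup: {timings['cpu'] / timings['cuda']:.1f}x")
```

The calls to `torch.cuda.synchronize()` matter here: without them, the timer would stop before the GPU has finished its work and the speedup would look larger than it is.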
You now have the required setup to start developing and training large neural network models efficiently by exploiting the massively parallel processing capabilities of NVIDIA GPUs. The flexible PyTorch framework offers you full control over model building, debugging, and optimization for your AI projects.