
Distributed Computing

January 9, 2025
September 21, 2016

FIXME: some of this info is about distributed computing, some is about hardware (GPU) acceleration

data-analytics
data-analytics-python

9 crushing performance problems in scalable systems | InfoWorld
Get more done at the Linux command line with GNU Parallel | Opensource.com

The Cluster Documentation Project » ADMIN Magazine
Cluster Documentation Project

Multigrid method - Wikiwand
Fallacies of distributed computing - Wikiwand
Fallacies of Distributed Systems
Understanding the 8 Fallacies of Distributed Systems - DZone Microservices

Four Distributed Systems Architectural Patterns by Tim Berglund - YouTube
Why Distributed Systems Are Hard - YouTube

HPC » ADMIN Magazine
Parallelizing Code – Loops » ADMIN Magazine

Jepsen: On the perils of network partitions
Strong consistency models
The network is reliable
Burn the Library

Tick or Tock? Keeping Time and Order in Distributed Databases | PingCAP

A split brain happens when a cluster partitions into multiple autonomous sub-clusters and more than one of them believes it is the "master". This can cause irreconcilable changes and data loss.
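A common guard against split brain is a majority quorum: a sub-cluster may only elect or keep a master while it can reach a strict majority of the configured members, and two disjoint sub-clusters cannot both hold a strict majority. A minimal sketch in C, assuming the cluster size is fixed and known to every node; has_quorum is a hypothetical helper, not the API of any particular cluster manager.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: may a sub-cluster that can reach `reachable_nodes`
 * members (itself included) act as master in a cluster of `cluster_size`
 * configured members?
 *
 * Strict-majority rule: more than half of ALL configured members must be
 * reachable. Two disjoint sub-clusters cannot both exceed half, so at most
 * one side of a partition can hold quorum.
 */
bool has_quorum(size_t reachable_nodes, size_t cluster_size)
{
    /* A node cannot see more members than the cluster has. */
    assert(reachable_nodes <= cluster_size);

    /* reachable * 2 > size avoids integer-division rounding issues. */
    return cluster_size > 0 && reachable_nodes * 2 > cluster_size;
}
```

For example, a 5-node cluster split 3/2 leaves only the 3-node side with quorum, while a 4-node cluster split 2/2 leaves neither side with quorum, which is one reason clusters are often sized with an odd number of nodes.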

Google I/O 2009 - Transactions Across Datacenters.. - YouTube
Debugging Incidents in Google’s Distributed Systems - ACM Queue

How to do distributed locking — Martin Kleppmann’s blog
Distributed Locks Are Dead, Long Live Distributed Locks - DZone Java

Consensus Mechanisms

How to Agree: Different Types of Consensus for Blockchain
Building Your Own Consensus | Hackaday

Understanding Blockchain Fundamentals, Part 1: Byzantine Fault Tolerance

Episode 377: Heidi Howard on Distributed Consensus : Software Engineering Radio
Transaction Processing in Distributed Systems | CoolShell
Paxos (computer science) - Wikiwand
Paxos Made Live - An Engineering Perspective (2006 Invited Talk) – Google Research
Paxos Made Moderately Complex

Raft Consensus Algorithm
Raft (computer science) - Wikiwand

Byzantine Fault Tolerance

Byzantine fault - Wikiwand
Byzantine Fault Tolerance Explained | Binance Academy
The-Byzantine-Generals-Problem.pdf
How does blockchain solve the Byzantine generals problem?

Node.js

Mostafa-Samir/klyng: A message-passing distributed computing framework for node.js

bithound/farm.bithound.io: “All animals are equal, but some animals are more equal than others.” a simple "framework" that bitHound used for working in a distributed environment; uses ZeroMQ

substack/dnode: turtles all the way down rpc
substack/dnode-protocol: Implements the dnode protocol abstractly in node.js

substack/fleet: multi-server continuous git-based deployment and process management
substack/seaport: semver service registry for clusters
substack/airport: role-based port management for upnode

Python

pyamgx – Accelerated Python Library » ADMIN Magazine (algebraic multigrid)

Scale your pandas workflow by changing a single line of code. — Modin documentation (pandas API on Ray/Dask)
Ray documentation
Dask: Scalable analytics in Python

High-Performance Python – Distributed Python » ADMIN Magazine

ArrayFire

ArrayFire | Faster Code
Blog | ArrayFire
ArrayFire Users - Google Groups

Configuring ArrayFire Environment

The API docs seem outdated; the source code may provide some functions that are not in the docs
Docs Overview
Docs Tutorials
Docs Functions
Docs Complete List of ArrayFire Functions

arrayfire/arrayfire: ArrayFire: a general purpose GPU library.
arrayfire/arrayfire-python: Python bindings for ArrayFire: A general purpose GPU library.
arrayfire/arrayfire-rust: Rust wrapper for ArrayFire

Build

Home · arrayfire/arrayfire Wiki
Jetson/Installing ArrayFire - eLinux.org

ArrayFire master branch (3.7) as of 2019-08-01 reports "Unsupported compiler Intel" when building. Use the official 3.6 release.

ArrayFire 3.6

CUDA 10 requires CMake 3.12.3 or newer (the CMake packaged with Ubuntu 18.04 is older, so build CMake from source there)

On a PC, it's easiest to install the prebuilt binary.

GPU

The GPU evolution: from simple graphics to AI brains - YouTube
How GPUs are Beginning to Displace Clusters for Big Data & Data Science - By Dan Voyce
How do Graphics Cards Work? Exploring GPU Architecture - YouTube

AmgX | NVIDIA Developer (algebraic multigrid, physics)
AmgX: Multi-Grid Accelerated Linear Solvers for Industrial Applications

PyOpenCL

NVIDIA GPUDirect | NVIDIA Developer
A First Look at GPU Communication Technology (Part 1)
How to Overlap Data Transfers in CUDA C/C++ | NVIDIA Developer Blog

What differences and relations between SVM, HSA, HMM and Unified Memory

How to Build Your GPU Cluster: Process and Hardware Options
PyTorch Multi GPU: 3 Techniques Explained
TensorFlow Multiple GPU: 5 Strategies and 2 Quick Tutorials
Keras Multi GPU: A Practical Guide

Scaling Up

NVLink from NVIDIA

UALink (Unified Accelerator Link)

Eight Tech Giants Team Up to Rein In NVIDIA! Is Jensen Huang Panicking? - YouTube
Everyone Except Nvidia Forms Ultra Accelerator Link (UALink) Consortium
AMD & Intel Team Up For UALink As Open Alternative To NVIDIA's NVLink - Phoronix
UALink will be the NVLink Standard Backed by AMD Intel Broadcom Cisco and More
Nvidia’s NVLink Vs. UALink. How NVIDIA’s Hype United Tech Giants in… | by Ali Waseem | Jun, 2024 | Medium

Scaling Out

Spectrum-X from NVIDIA

Ultra Ethernet Consortium

CUDA

docker-nvidia

CUDA - Wikiwand
An Even Easier Introduction to CUDA | NVIDIA Developer Blog
Programming Guide :: CUDA Toolkit Documentation

CUDA Compatibility :: NVIDIA Data Center GPU Driver Documentation

CUDA Toolkit Downloads | NVIDIA Developer
Guides - Installing the NVIDIA CUDA Toolkit | Linode
Installation Guide Linux :: CUDA Toolkit Documentation (post-installation: set PATH and LD_LIBRARY_PATH)
CUDA on WSL :: CUDA Toolkit Documentation

nvidia-smi reports the driver version (e.g. 410.48) and the newest CUDA version that driver supports, not the version of the installed CUDA runtime/toolkit; use nvcc --version for that.
Different CUDA versions shown by nvcc and NVIDIA-smi - Stack Overflow (important)
How to get the CUDA version? - Stack Overflow
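Both numbers can also be read from code. The CUDA runtime API functions cudaRuntimeGetVersion() and cudaDriverGetVersion() each write an int encoded as 1000 * major + 10 * minor (so 10020 means 10.2); the driver value is the newest CUDA version the installed driver supports. The sketch below only decodes that integer, so it compiles without the CUDA toolkit; the cuda_version struct and decode_cuda_version function are hypothetical names.

```c
#include <assert.h>

/* Hypothetical decoded form of a CUDA version number. */
typedef struct {
    int major;
    int minor;
} cuda_version;

/*
 * Decode the value written by cudaRuntimeGetVersion(&v) or
 * cudaDriverGetVersion(&v) (declared in cuda_runtime_api.h),
 * which uses the encoding 1000 * major + 10 * minor.
 */
cuda_version decode_cuda_version(int encoded)
{
    assert(encoded >= 0);

    cuda_version v;
    v.major = encoded / 1000;        /* 10020 -> 10 */
    v.minor = (encoded % 1000) / 10; /* 10020 -> 20 -> 2 */
    return v;
}
```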

PyCUDA

Getting Started With CUDA for Python Programmers - YouTube (2024-01)
Going Further with CUDA for Python Programmers - YouTube (2024-01)

Getting Started with NVIDIA GPU CUDA Core Programming Using Visual Studio in 2021 - YouTube
Intro to Parallel Programming CUDA - Udacity 458 - YouTube
CUDA Programming - YouTube
Tutorial: CUDA programming in Python with numba and cupy - YouTube

Home - CUDA Tutorial
Imaging and Computer Vision | NVIDIA

NVIDIA Collective Communications Library (NCCL) | NVIDIA Developer (multi-GPU and multi-node collective communication primitives)

CUDA Monopoly

Why Nvidia's AI monopoly is coming to an end - YouTube

vosen/ZLUDA: CUDA on ??? GPUs
ZLUDA Project Paves the Way for CUDA on Intel GPUs | Tom's Hardware
Software allows CUDA code to run on AMD and Intel GPUs without changes — ZLUDA is back but both companies ditched it, nixing future updates | Tom's Hardware
Nvidia bans using translation layers for CUDA software — previously the prohibition was only listed in the online EULA, now included in installed files [Updated] | Tom's Hardware
ZLUDA: CUDA For AMD GPUs Returns From The Grave - YouTube

SCALE documentation
New SCALE tool enables CUDA applications to run on AMD GPUs | Tom's Hardware
AMD ‘Scales’ up its CUDA capabilities – Jon Peddie Research

ROCm

AMD ROCm™ Software

Triton

Python code is compiled to LLVM IR and then to PTX (NVIDIA's GPU assembly), skipping the CUDA C++ compiler

Welcome to Triton’s documentation! — Triton documentation
openai/triton: Development repository for the Triton language and compiler

Introducing Triton: Open-Source GPU Programming for Neural Networks
Wanna use your Nvidia GPU for acceleration but put off by CUDA? OpenAI has a Python-based alternative • The Register

NPU

Do we really need NPUs now? - YouTube (questionable, as there is currently no need for long-running AI apps in the background)

Pacemaker

ClusterLabs > Pacemaker

MPI

Message Passing Interface - Wikiwand
Open MPI: Open Source High Performance Computing

A Comprehensive MPI Tutorial Resource · MPI Tutorial
Performance Comparison of OpenMP, MPI, and MapReduce in Practical Problems

MPI for Python — MPI for Python documentation
mpi4py – High-Performance Distributed Python » ADMIN Magazine

mpidotnet/MPI.NET: MPI.NET updated for .NET 4.0 and Linux

OpenMP

OpenMP - Wikiwand
Home - OpenMP
openmp - GCC Wiki

OpenMP Task Parallelism for Faster Genomic Data Processing
OpenMP Parallelization and Optimization of Graph-Based Machine Learning Algorithms
Advancement of Computing on Large Datasets via Parallel Computing and Cyberinfrastructure

OpenMP » ADMIN Magazine
In the Loop » ADMIN Magazine

OpenACC

OpenMP-like compiler directives (pragmas), mostly used for NVIDIA GPUs
enables hybrid CPU + GPU programming
easier to use than CUDA
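To illustrate the directive style, here is a minimal SAXPY (y = a*x + y) sketch in C. The function name saxpy is just illustrative; the copyin/copy clauses tell the compiler which array is only read on the device and which must be copied back. An OpenACC-enabled compiler (for example nvc -acc from the NVIDIA HPC SDK, or gcc -fopenacc on a GCC built with offloading support) may offload the loop to a GPU, while a compiler without OpenACC support ignores the pragma and runs the same loop on the CPU.

```c
#include <assert.h>
#include <stddef.h>

/*
 * y[i] = a * x[i] + y[i] for i in [0, n).
 * copyin(x[0:n]): x is only read on the device.
 * copy(y[0:n]):   y is copied to the device and back after the loop.
 * Without OpenACC support the pragma is ignored and the loop runs serially.
 */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    assert(n == 0 || (x != NULL && y != NULL));

    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

This is the hybrid CPU + GPU idea above in practice: the same source builds as plain serial C or as accelerated code, depending only on the compiler and its flags.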

OpenACC - Wikiwand
Homepage | OpenACC
OpenACC - GCC Wiki

OpenACC: More Science Less Programming | NVIDIA Developer

Parallel Computing: What is better and why: OpenACC or OpenMP?
OpenMP + OpenACC

Urbit

Urbit

Urbit with Galen Wolfe-Pauly - Software Engineering Daily

Edge servers

That's It, I'm Done With Serverless. - YouTube
Regional execution for ultra-low latency rendering at the edge – Vercel

Edge Location (AWS Serverless): slow cold start; your code is deployed to the specific location(s)
Edge Runtime: faster "cold start" (effectively not cold); your code is deployed globally, like a CDN (e.g. Netlify, Vercel)
Regional Edge Runtime: faster "cold start" (effectively not cold); the edge server is closer to the database but further from the user

Edge Runtime Cons:


SIMD

xtensor-stack/xsimd: C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, NEON, AVX512)

An Introduction to GCC Compiler Intrinsics in Vector Processing | Linux Journal

Linear Algebra

xtensor-stack/xtensor-benchmark: Easy to use benchmarks for linear algebra frameworks

Intel® Math Kernel Library (Intel® MKL) | Intel® Software
Math Kernel Library - Wikiwand

LAPACK — Linear Algebra PACKage

OpenBLAS : An optimized BLAS library

Armadillo: C++ library for linear algebra & scientific computing

Boosting numpy: Why BLAS Matters - Weblog
Is your Numpy optimized for speed? - Towards Data Science (different backends)

clifford: Geometric Algebra for Python — Clifford documentation