FIXME: some of this info is about distributed computing, some is about hardware (GPU) acceleration
data-analytics
data-analytics-python
9 crushing performance problems in scalable systems | InfoWorld
Get more done at the Linux command line with GNU Parallel | Opensource.com
The Cluster Documentation Project » ADMIN Magazine
Cluster Documentation Project
Multigrid method - Wikiwand
Fallacies of distributed computing - Wikiwand
Fallacies of Distributed Systems
Understanding the 8 Fallacies of Distributed Systems - DZone Microservices
Four Distributed Systems Architectural Patterns by Tim Berglund - YouTube
Why Distributed Systems Are Hard - YouTube
HPC » ADMIN Magazine
Parallelizing Code – Loops » ADMIN Magazine
Jepsen: On the perils of network partitions
Strong consistency models
The network is reliable
Burn the Library
Tick or Tock? Keeping Time and Order in Distributed Databases| PingCAP
A split brain is what happens when multiple autonomous sub-clusters form and more than one of them believes it is the "master". This can cause irreconcilable changes and data loss.
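A minimal sketch of the split-brain situation described above, plus one common mitigation (a strict-majority quorum) that is not taken from these notes. The node names, the "lowest name wins" election rule, and both function names are hypothetical and chosen only for illustration.

```python
def leaders_without_quorum(partitions):
    """Each isolated sub-cluster naively elects its own master (lowest node name)."""
    return [min(nodes) for nodes in partitions if nodes]


def leaders_with_quorum(partitions, cluster_size):
    """Only a sub-cluster holding a strict majority of all nodes may elect a master."""
    return [min(nodes) for nodes in partitions if len(nodes) > cluster_size // 2]


# Hypothetical 5-node cluster split by a network partition into 2 + 3 nodes.
partitions = [{"n1", "n2"}, {"n3", "n4", "n5"}]

naive = leaders_without_quorum(partitions)
safe = leaders_with_quorum(partitions, cluster_size=5)

print("without quorum:", sorted(naive))  # two masters at once: split brain
print("with quorum:", safe)              # at most one master
```

With a strict majority requirement, at most one side of any partition can hold a quorum, so at most one master can exist at a time. The cost is that the minority side stops accepting writes.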
Google I/O 2009 - Transactions Across Datacenters.. - YouTube
Debugging Incidents in Google’s Distributed Systems - ACM Queue
How to do distributed locking — Martin Kleppmann’s blog
Distributed Locks Are Dead, Long Live Distributed Locks - DZone Java
Consensus Mechanisms
How to Agree: Different Types of Consensus for Blockchain
Building Your Own Consensus | Hackaday
Understanding Blockchain Fundamentals, Part 1: Byzantine Fault Tolerance
Episode 377: Heidi Howard on Distributed Consensus : Software Engineering Radio
Transaction Processing in Distributed Systems | CoolShell
Paxos (computer science) - Wikiwand
Paxos Made Live - An Engineering Perspective (2006 Invited Talk) – Google Research
Paxos Made Moderately Complex
Raft Consensus Algorithm
Raft (computer science) - Wikiwand
Byzantine Fault Tolerance
Byzantine fault - Wikiwand
Byzantine Fault Tolerance Explained | Binance Academy
The-Byzantine-Generals-Problem.pdf
How does blockchain solve the Byzantine generals problem?
Node.js
Mostafa-Samir/klyng: A message-passing distributed computing framework for node.js
bithound/farm.bithound.io: “All animals are equal, but some animals are more equal than others.” a simple "framework" that bitHound used for working in a distributed environment; uses ZeroMQ
substack/dnode: turtles all the way down rpc
substack/dnode-protocol: Implements the dnode protocol abstractly in node.js
substack/fleet: multi-server continuous git-based deployment and process management
substack/seaport: semver service registry for clusters
substack/airport: role-based port management for upnode
Python
pyamgx – Accelerated Python Library » ADMIN Magazine algebraic multigrid
Scale your pandas workflow by changing a single line of code. — Modin documentation Pandas API on Ray/Dask
Ray documentation
Dask: Scalable analytics in Python
High-Performance Python – Distributed Python » ADMIN Magazine
ArrayFire
ArrayFire | Faster Code
Blog | ArrayFire
ArrayFire Users - Google Groups
Configuring ArrayFire Environment
The API docs seem outdated; the source code may provide some functions that are not in the docs
Docs Overview
Docs Tutorials
Docs Functions
Docs Complete List of ArrayFire Functions
arrayfire/arrayfire: ArrayFire: a general purpose GPU library.
arrayfire/arrayfire-python: Python bindings for ArrayFire: A general purpose GPU library.
arrayfire/arrayfire-rust: Rust wrapper for ArrayFire
Build
Home · arrayfire/arrayfire Wiki
Jetson/Installing ArrayFire - eLinux.org
ArrayFire master branch (3.7) as of 20190801 reports "Unsupported compiler Intel" upon build. Use official 3.6 release.
ArrayFire 3.6
- reports "Unsupported platform" on Ubuntu 16.04, requires CMake 3.8+
- requires glibc 2.27 at runtime (included in Ubuntu 18.04)
CUDA 10 requires CMake 3.12.3 (need to build from source on Ubuntu 18.04)
For PC it's easiest to install the prebuilt binary.
GPU
The GPU evolution: from simple graphics to AI brains - YouTube
How GPUs are Beginning to Displace Clusters for Big Data & Data Science - By Dan Voyce
How do Graphics Cards Work? Exploring GPU Architecture - YouTube
AmgX | NVIDIA Developer algebraic, physics
AmgX: Multi-Grid Accelerated Linear Solvers for Industrial Applications
NVIDIA GPUDirect | NVIDIA Developer
A First Look at GPU Communication Technologies (Part 1)
How to Overlap Data Transfers in CUDA C/C++ | NVIDIA Developer Blog
What differences and relations between SVM, HSA, HMM and Unified Memory
How to Build Your GPU Cluster: Process and Hardware Options
PyTorch Multi GPU: 3 Techniques Explained
TensorFlow Multiple GPU: 5 Strategies and 2 Quick Tutorials
Keras Multi GPU: A Practical Guide
Scaling Up
NVLink from NVIDIA
UALink (Ultra Accelerator Link)
- Infinity Fabric from AMD: connects CPUs and GPUs
- Switching Technologies from Broadcom: supports PCIe 7.0 and xGMI
- OAM (Open Accelerator Module) from Meta and Microsoft: server framework for accelerators
Eight giants team up to rein in Nvidia! Is Jensen Huang worried? - YouTube
Everyone Except Nvidia Forms Ultra Accelerator Link (UALink) Consortium
AMD & Intel Team Up For UALink As Open Alternative To NVIDIA's NVLink - Phoronix
UALink will be the NVLink Standard Backed by AMD Intel Broadcom Cisco and More
Nvidia’s NVLink Vs. UALink. How NVIDIA’s Hype United Tech Giants in… | by Ali Waseem | Jun, 2024 | Medium
Scaling Out
Spectrum-X from NVIDIA
CUDA
CUDA - Wikiwand
An Even Easier Introduction to CUDA | NVIDIA Developer Blog
Programming Guide :: CUDA Toolkit Documentation
CUDA Compatibility :: NVIDIA Data Center GPU Driver Documentation
CUDA Toolkit Downloads | NVIDIA Developer
Guides - Installing the NVIDIA CUDA Toolkit | Linode
Installation Guide Linux :: CUDA Toolkit Documentation post install, set PATH and LD_LIBRARY_PATH
CUDA on WSL :: CUDA Toolkit Documentation
Different CUDA versions shown by nvcc and NVIDIA-smi - Stack Overflow
nvidia-smi reports the NVIDIA driver version (around 410.48), not the CUDA runtime (toolkit) version; for that, use nvcc --version
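A small sketch of how the two versions could be compared from a script. It assumes `nvcc --version` prints a line containing `release X.Y` and that `nvidia-smi --query-gpu=driver_version --format=csv,noheader` is available. The helper names and the sample output string are illustrative, not captured from a real machine.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Extract the toolkit version (e.g. '10.0') from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", text)
    return match.group(1) if match else None


def run_if_available(cmd):
    """Return the command's stdout, or None if the tool is missing or fails to run."""
    if shutil.which(cmd[0]) is None:
        return None
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return None
    return result.stdout


# Illustrative sample of the relevant nvcc output line (an assumption).
SAMPLE_NVCC_OUTPUT = "Cuda compilation tools, release 10.0, V10.0.130"
print("sample toolkit version:", parse_nvcc_release(SAMPLE_NVCC_OUTPUT))

# On a machine with the tools installed, compare the real values.
nvcc_out = run_if_available(["nvcc", "--version"])
smi_out = run_if_available(
    ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"]
)
print("toolkit (nvcc):", parse_nvcc_release(nvcc_out) if nvcc_out else "nvcc not found")
print("driver (nvidia-smi):", smi_out.strip() if smi_out else "nvidia-smi not found")
```

The two numbers are expected to differ: one is the driver release, the other the installed toolkit release.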
How to get the CUDA version? - Stack Overflow
Getting Started With CUDA for Python Programmers - YouTube 2024-01
Going Further with CUDA for Python Programmers - YouTube 2024-01
Getting Started with NVIDIA GPU CUDA Core Programming Using Visual Studio in 2021 - YouTube
Intro to Parallel Programming CUDA - Udacity 458 - YouTube
CUDA Programming - YouTube
Tutorial: CUDA programming in Python with numba and cupy - YouTube
Home - CUDA Tutorial
Imaging and Computer Vision |NVIDIA
NVIDIA Collective Communications Library (NCCL) | NVIDIA Developer multi-GPU and multi-node collective communication primitives
CUDA Monopoly
Why Nvidia's AI monopoly is coming to an end - YouTube
vosen/ZLUDA: CUDA on ??? GPUs
ZLUDA Project Paves the Way for CUDA on Intel GPUs | Tom's Hardware
Software allows CUDA code to run on AMD and Intel GPUs without changes — ZLUDA is back but both companies ditched it, nixing future updates | Tom's Hardware
Nvidia bans using translation layers for CUDA software — previously the prohibition was only listed in the online EULA, now included in installed files [Updated] | Tom's Hardware
ZLUDA: CUDA For AMD GPUs Returns From The Grave - YouTube
SCALE documentation
New SCALE tool enables CUDA applications to run on AMD GPUs | Tom's Hardware
AMD ‘Scales’ up its CUDA capabilities – Jon Peddie Research
ROCm
Triton
Python-based kernel language that is compiled to LLVM IR and then to PTX, skipping the CUDA compiler (nvcc)
Welcome to Triton’s documentation! — Triton documentation
openai/triton: Development repository for the Triton language and compiler
Introducing Triton: Open-Source GPU Programming for Neural Networks
Wanna use your Nvidia GPU for acceleration but put off by CUDA? OpenAI has a Python-based alternative • The Register
NPU
Do we really need NPUs now? - YouTube questionable, as there is currently no need for long-running AI apps in the background
Pacemaker
MPI
Message Passing Interface - Wikiwand
Open MPI: Open Source High Performance Computing
A Comprehensive MPI Tutorial Resource · MPI Tutorial
Performance Comparison of OpenMP, MPI, and MapReduce in Practical Problems
MPI for Python — MPI for Python documentation
mpi4py – High-Performance Distributed Python » ADMIN Magazine
mpidotnet/MPI.NET: MPI.NET updated for .NET 4.0 and Linux
OpenMP
OpenMP - Wikiwand
Home - OpenMP
openmp - GCC Wiki
OpenMP Task Parallelism for Faster Genomic Data Processing
OpenMP Parallelization and Optimization of Graph-Based Machine Learning Algorithms
Advancement of Computing on Large Datasets via Parallel Computing and Cyberinfrastructure
OpenMP » ADMIN Magazine
In the Loop » ADMIN Magazine
OpenACC
OpenMP-like, directive-based programming standard, mainly used with NVIDIA GPUs
enables hybrid CPU + GPU programming
easier to use than CUDA
OpenACC - Wikiwand
Homepage | OpenACC
OpenACC - GCC Wiki
OpenACC: More Science Less Programming | NVIDIA Developer
Parallel Computing: What is better and why: OpenACC or OpenMP?
OpenMP + OpenACC
Urbit
Urbit with Galen Wolfe-Pauly - Software Engineering Daily
Edge servers
That's It, I'm Done With Serverless. - YouTube
Regional execution for ultra-low latency rendering at the edge – Vercel
Edge Location (AWS Serverless): slow cold start; your code is deployed to specific location(s)
Edge Runtime: faster "cold start" (actually not cold); your code is deployed globally, think CDN; e.g.: Netlify, Vercel
Regional Edge Runtime: faster "cold start" (actually not cold); edge server is closer to the DB, farther from the user
Edge Runtime Cons:
- Compatibility (not all functions of your runtime are available)
- No native runtime (cannot run Rust/Go binary from JavaScript)
SIMD
An Introduction to GCC Compiler Intrinsics in Vector Processing | Linux Journal
Linear Algebra
xtensor-stack/xtensor-benchmark: Easy to use benchmarks for linear algebra frameworks
Intel® Math Kernel Library (Intel® MKL) | Intel® Software
Math Kernel Library - Wikiwand
LAPACK — Linear Algebra PACKage
OpenBLAS : An optimized BLAS library
Armadillo: C++ library for linear algebra & scientific computing
Boosting numpy: Why BLAS Matters - Weblog
Is your Numpy optimized for speed? - Towards Data Science different backends
clifford: Geometric Algebra for Python — Clifford documentation