
Distributed Computing

September 29, 2023
September 21, 2016

FIXME: some entries cover distributed computing, others cover hardware (GPU) acceleration

data-analytics
data-analytics-python

9 crushing performance problems in scalable systems | InfoWorld
Get more done at the Linux command line with GNU Parallel | Opensource.com

The Cluster Documentation Project » ADMIN Magazine
Cluster Documentation Project

Multigrid method - Wikiwand
Fallacies of distributed computing - Wikiwand
Fallacies of Distributed Systems
Understanding the 8 Fallacies of Distributed Systems - DZone Microservices

Four Distributed Systems Architectural Patterns by Tim Berglund - YouTube
Why Distributed Systems Are Hard - YouTube

HPC » ADMIN Magazine
Parallelizing Code – Loops » ADMIN Magazine

Jepsen: On the perils of network partitions
Strong consistency models
The network is reliable
Burn the Library

Tick or Tock? Keeping Time and Order in Distributed Databases| PingCAP

A split brain occurs when a cluster breaks into multiple autonomous sub-clusters and more than one of them believes it is the "master". This can cause irreconcilable changes and data loss.
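One common guard against split brain is to let a partition act as "master" only when it can reach a strict majority (quorum) of the full cluster. The following is a minimal sketch under the assumption that every node knows the fixed cluster size; `has_quorum` and the node names are hypothetical and not taken from any library.

```python
def has_quorum(reachable_nodes: set, cluster_size: int) -> bool:
    """Return True if this partition holds a strict majority of the cluster."""
    return len(reachable_nodes) > cluster_size // 2


# Hypothetical 5-node cluster split by a network partition into groups of 3 and 2.
cluster = {"a", "b", "c", "d", "e"}
side_one = {"a", "b", "c"}
side_two = cluster - side_one

# Two disjoint groups cannot both hold a strict majority,
# so at most one side is allowed to elect a master.
print(has_quorum(side_one, len(cluster)))
print(has_quorum(side_two, len(cluster)))
```

With an even split (for example 2 and 2 in a 4-node cluster) neither side has a quorum, so the cluster stops accepting writes instead of diverging. That is the trade-off this rule makes.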

Building Your Own Consensus | Hackaday
Episode 377: Heidi Howard on Distributed Consensus : Software Engineering Radio
Transaction Processing in Distributed Systems | CoolShell
Paxos (computer science) - Wikiwand
Paxos Made Live - An Engineering Perspective (2006 Invited Talk) – Google Research
Paxos Made Moderately Complex

Raft Consensus Algorithm
Raft (computer science) - Wikiwand

Google I/O 2009 - Transactions Across Datacenters.. - YouTube
Debugging Incidents in Google’s Distributed Systems - ACM Queue

How to do distributed locking — Martin Kleppmann’s blog
Distributed Locks Are Dead, Long Live Distributed Locks - DZone Java

Byzantine Fault Tolerance

Byzantine fault - Wikiwand
Byzantine Fault Tolerance Explained | Binance Academy
The-Byzantine-Generals-Problem.pdf
How does blockchain solve the Byzantine generals problem?

Node.js

Mostafa-Samir/klyng: A message-passing distributed computing framework for node.js

bithound/farm.bithound.io: “All animals are equal, but some animals are more equal than others.” a simple "framework" that bitHound used for working in a distributed environment; uses ZeroMQ

substack/dnode: turtles all the way down rpc
substack/dnode-protocol: Implements the dnode protocol abstractly in node.js

substack/fleet: multi-server continuous git-based deployment and process management
substack/seaport: semver service registry for clusters
substack/airport: role-based port management for upnode

Python

pyamgx – Accelerated Python Library » ADMIN Magazine algebraic multigrid

Scale your pandas workflow by changing a single line of code. — Modin documentation Pandas API on Ray/Dask
Ray documentation
Dask: Scalable analytics in Python

High-Performance Python – Distributed Python » ADMIN Magazine

ArrayFire

ArrayFire | Faster Code
Blog | ArrayFire
ArrayFire Users - Google Groups

Configuring ArrayFire Environment

The API docs seem outdated; the source code may provide functions that are not in the docs
Docs Overview
Docs Tutorials
Docs Functions
Docs Complete List of ArrayFire Functions

arrayfire/arrayfire: ArrayFire: a general purpose GPU library.
arrayfire/arrayfire-python: Python bindings for ArrayFire: A general purpose GPU library.
arrayfire/arrayfire-rust: Rust wrapper for ArrayFire

Build

Home · arrayfire/arrayfire Wiki
Jetson/Installing ArrayFire - eLinux.org

ArrayFire master branch (3.7), as of 2019-08-01, reports "Unsupported compiler Intel" at build time. Use the official 3.6 release instead.

ArrayFire 3.6

CUDA 10 requires CMake 3.12.3 (CMake needs to be built from source on Ubuntu 18.04)

For PC it's easiest to install the prebuilt binary.

GPU

How GPUs are Beginning to Displace Clusters for Big Data & Data Science - By Dan Voyce

AmgX | NVIDIA Developer algebraic, physics
AmgX: Multi-Grid Accelerated Linear Solvers for Industrial Applications

PyOpenCL

NVIDIA GPUDirect | NVIDIA Developer
A First Look at GPU Communication Technologies (Part 1)
How to Overlap Data Transfers in CUDA C/C++ | NVIDIA Developer Blog

What differences and relations between SVM, HSA, HMM and Unified Memory

CUDA

docker-nvidia

CUDA - Wikiwand
An Even Easier Introduction to CUDA | NVIDIA Developer Blog
Programming Guide :: CUDA Toolkit Documentation

CUDA Compatibility :: NVIDIA Data Center GPU Driver Documentation

CUDA Toolkit Downloads | NVIDIA Developer
Guides - Installing the NVIDIA CUDA Toolkit | Linode
Installation Guide Linux :: CUDA Toolkit Documentation post install, set PATH and LD_LIBRARY_PATH
CUDA on WSL :: CUDA Toolkit Documentation

nvidia-smi reports the driver's version (e.g. 410.48), not the CUDA runtime version; for the installed toolkit version, use nvcc --version
Different CUDA versions shown by nvcc and NVIDIA-smi - Stack Overflow !important
How to get the CUDA version? - Stack Overflow
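To see the two versions side by side, a small script can call both tools and pull out the relevant fields. This is a sketch only: the regular expressions assume the usual output shapes (`release X.Y` in `nvcc --version`, `Driver Version: N.M` in the `nvidia-smi` banner), and the helper names are hypothetical.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Extract the toolkit release (e.g. '10.0') from `nvcc --version` output."""
    match = re.search(r"release\s+(\d+\.\d+)", text)
    return match.group(1) if match else None


def parse_smi_driver(text):
    """Extract the driver version (e.g. '410.48') from the `nvidia-smi` banner."""
    match = re.search(r"Driver Version:\s*([\d.]+)", text)
    return match.group(1) if match else None


def run(cmd):
    """Run a command and return its stdout, or None if the tool is not on PATH."""
    if shutil.which(cmd[0]) is None:
        return None
    return subprocess.run(cmd, capture_output=True, text=True).stdout


nvcc_out = run(["nvcc", "--version"])
smi_out = run(["nvidia-smi"])

# Assumed output formats; either tool may be missing on a given machine.
print("toolkit (nvcc):", parse_nvcc_release(nvcc_out) if nvcc_out else "not available")
print("driver (nvidia-smi):", parse_smi_driver(smi_out) if smi_out else "not available")
```

If `nvcc` is reported as not available even though the toolkit is installed, the toolkit's `bin` directory is probably missing from `PATH`.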

PyCUDA

Getting Started with NVIDIA GPU CUDA Core Programming Using Visual Studio in 2021 - YouTube
Intro to Parallel Programming CUDA - Udacity 458 - YouTube
CUDA Programming - YouTube
Tutorial: CUDA programming in Python with numba and cupy - YouTube

Home - CUDA Tutorial
Imaging and Computer Vision |NVIDIA

NVIDIA Collective Communications Library (NCCL) | NVIDIA Developer multi-GPU and multi-node collective communication primitives

Triton

Welcome to Triton’s documentation! — Triton documentation
openai/triton: Development repository for the Triton language and compiler

Introducing Triton: Open-Source GPU Programming for Neural Networks
Wanna use your Nvidia GPU for acceleration but put off by CUDA? OpenAI has a Python-based alternative • The Register

MPI

Message Passing Interface - Wikiwand
Open MPI: Open Source High Performance Computing

A Comprehensive MPI Tutorial Resource · MPI Tutorial
Performance Comparison of OpenMP, MPI, and MapReduce in Practical Problems

MPI for Python — MPI for Python documentation
mpi4py – High-Performance Distributed Python » ADMIN Magazine

mpidotnet/MPI.NET: MPI.NET updated for .NET 4.0 and Linux

OpenMP

OpenMP - Wikiwand
Home - OpenMP
openmp - GCC Wiki

OpenMP Task Parallelism for Faster Genomic Data Processing
OpenMP Parallelization and Optimization of Graph-Based Machine Learning Algorithms
Advancement of Computing on Large Datasets via Parallel Computing and Cyberinfrastructure

OpenMP » ADMIN Magazine
In the Loop » ADMIN Magazine

OpenACC

OpenMP-like, directive-based programming standard for accelerators such as NVIDIA GPUs
enables hybrid CPU + GPU programming
easier to use than CUDA

OpenACC - Wikiwand
Homepage | OpenACC
OpenACC - GCC Wiki

OpenACC: More Science Less Programming | NVIDIA Developer

Parallel Computing: What is better and why: OpenACC or OpenMP?
OpenMP + OpenACC

Urbit

Urbit

Urbit with Galen Wolfe-Pauly - Software Engineering Daily

Edge servers

That's It, I'm Done With Serverless. - YouTube
Regional execution for ultra-low latency rendering at the edge – Vercel

Edge Location (AWS Serverless): slow cold start; your code is deployed to specific location(s)
Edge Runtime: faster "cold start" (actually not cold); your code is deployed globally, like a CDN; e.g. Netlify, Vercel
Regional Edge Runtime: faster "cold start" (actually not cold); the edge server is closer to the DB but farther from the user

Edge Runtime Cons:


SIMD

xtensor-stack/xsimd: C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, NEON, AVX512)

An Introduction to GCC Compiler Intrinsics in Vector Processing | Linux Journal

Linear Algebra

xtensor-stack/xtensor-benchmark: Easy to use benchmarks for linear algebra frameworks

Intel® Math Kernel Library (Intel® MKL) | Intel® Software
Math Kernel Library - Wikiwand

LAPACK — Linear Algebra PACKage

OpenBLAS : An optimized BLAS library

Armadillo: C++ library for linear algebra & scientific computing

Boosting numpy: Why BLAS Matters - Weblog
Is your Numpy optimized for speed? - Towards Data Science different backends

clifford: Geometric Algebra for Python — Clifford documentation