devops
database
elastic-stack
elastic-kibana
Analytics - Wikiwand
Data Analytics Reference Stack | Clear Linux* Project
Data Science Timeline - Noteworthy - The Journal Blog
Data Analyst VS Data Scientist – What's the Difference?
Software analytics - Wikiwand
Web analytics - Wikiwand
IT operations analytics - Wikiwand
Session (web analytics) - Wikiwand
Behavioral analytics - Wikiwand
not to be confused with User Behavioral Analytics, used in security context for threat detection
Business intelligence - Wikiwand
Cohort analysis - Wikiwand
10 Steps To Get You Started With Behavioral Analytics
Six Ways to Create Better Customer Behavior Analytics | Datameer
What is Operational Analytics? - Definition from Techopedia
Operations Analytics | Coursera
First, data (logs or events triggered by applications and services) must be collected and stored in a data store.
[Data Series - Microsoft Virtual Academy](https://mva.microsoft.com/search/SearchResults.aspx#!q="Data Series%3A"&lang=1033)
Data Scientia – Data Science | AI | Machine Learning | IoT | Cloud Analytics
Data Science Simplified Part 12: Resampling Methods – Data Scientia
The Best Free Data Science eBooks - Towards Data Science
Introducing Application Insights Analytics | Brian Harry's blog
Apache Hadoop Ecosystem and Open Source Big Data Projects | Hortonworks ❗!important
4 free maths courses to do in quarantine and level up your Data Science skills | by Gonzalo Ferreiro Volpi | Towards Data Science
Machine Learning and Data Science free online courses to do in quarantine | Towards Data Science
Prefect Docs
101 Machine Learning Algorithms for Data Science with Cheat Sheets | R-bloggers
7 Open Source Data Science Projects | Machine Learning Projects
Use cases
Online transaction processing - Wikiwand OLTP
What is OLTP (online transaction processing)? - Definition from WhatIs.com
Online analytical processing - Wikiwand OLAP
What is OLAP (online analytical processing)? - Definition from WhatIs.com
Hybrid transactional/analytical processing - Wikiwand NoSQL/NewSQL database can serve this purpose
RTA
Data warehouse - Wikiwand
Extract, transform, load - Wikiwand
ETL
ETLs vs ELTs: Why are ELTs Disrupting the Data Market? | by SeattleDataGuy | Coriers | Mar, 2021 | Medium
A good nudge trumps a good prediction - O'Reilly Radar
Whether prediction should be user friendly or business friendly
Stream Architecture
What is Stream Processing? - data Artisans
How a Stream Works - DZone Big Data
What is a Streaming Database?
The state can be built from events
"Turning the database inside out with Apache Samza" by Martin Kleppmann - YouTube
"Transactions: myths, surprises and opportunities" by Martin Kleppmann - YouTube
Streaming Architecture with Ted Dunning | Software Engineering Daily
Spark: Batch first, then stream; ELT job, working set in memory
Flink: Stream first, then batch; exactly-once event processing
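The two notes above contrast a batch-first engine, which handles a stream as a series of small batches, with a stream-first engine, which handles each event as it arrives. A minimal sketch of that difference in plain Python follows. It is not Spark or Flink code, and `micro_batch_sum` and `per_event_sum` are hypothetical names used only for illustration:

```python
from itertools import islice
from typing import Iterable, Iterator, List


def micro_batches(events: Iterable[int], size: int) -> Iterator[List[int]]:
    """Group a stream into fixed-size batches (batch-first view of a stream)."""
    it = iter(events)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def micro_batch_sum(events: Iterable[int], size: int) -> List[int]:
    """Emit one running total per batch."""
    total, out = 0, []
    for batch in micro_batches(events, size):
        total += sum(batch)
        out.append(total)
    return out


def per_event_sum(events: Iterable[int]) -> List[int]:
    """Emit a running total after every single event (stream-first view)."""
    total, out = 0, []
    for event in events:
        total += event
        out.append(total)
    return out


stream = [1, 2, 3, 4, 5]
batch_results = micro_batch_sum(stream, size=2)  # one result per batch
event_results = per_event_sum(stream)            # one result per event
```

Both approaches reach the same final total. The batch version emits fewer results and each one arrives only when its batch is complete, while the per-event version emits a result after every event.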
Streaming pipeline:
Type | Example | Storage Media | Usage |
---|---|---|---|
Message bus | Redis, Kafka | RAM, Disk | low latency data ingest |
Datalake | S3/HDFS | Disk | high capacity low cost long term storage |
Data warehouse | Elasticsearch | RAM | data structuring and indexing, fast interactive query |
Database | MySQL, MongoDB | RAM, Disk | data access with indexing |
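
The roles in the table above, together with the earlier note that state can be built from events, can be sketched with in-memory stand-ins. This is a toy illustration only: `bus`, `lake`, `index`, and the event fields `user` and `action` are hypothetical, and no real Kafka, S3, or Elasticsearch API is involved.

```python
import json
from collections import defaultdict, deque

# Hypothetical stand-ins for the rows of the table above.
bus = deque()              # message bus: short-lived, low-latency buffer
lake = []                  # data lake: append-only raw records, kept long term
index = defaultdict(list)  # warehouse/database role: records indexed for lookup


def publish(event: dict) -> None:
    """Producer side: put a serialized event on the bus."""
    bus.append(json.dumps(event))


def drain() -> int:
    """Consumer side: move every buffered event to the lake and the index."""
    moved = 0
    while bus:
        raw = bus.popleft()
        lake.append(raw)                    # keep the raw record unchanged
        event = json.loads(raw)
        index[event["user"]].append(event)  # structure it for fast queries
        moved += 1
    return moved


def rebuild_index() -> dict:
    """Replay the lake to rebuild the index: state derived from events."""
    rebuilt = defaultdict(list)
    for raw in lake:
        event = json.loads(raw)
        rebuilt[event["user"]].append(event)
    return rebuilt


publish({"user": "alice", "action": "login"})
publish({"user": "bob", "action": "view"})
publish({"user": "alice", "action": "logout"})
moved = drain()
```

Because the lake keeps every raw event, the indexed view can be discarded and rebuilt at any time by replaying it, which is what `rebuild_index` does.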
Apache Flink vs. Apache Spark - DZone Big Data
Apache Flink: Does the world need another streaming engine? | ZDNet
Choose your real-time weapon: Storm or Spark? | InfoWorld
ksqlDB: The database purpose-built for stream processing applications.
Apache Flink: Scalable Stream and Batch Data Processing
How Netflix Optimized Flink for Massive Scale on AWS
Why Apache Flink - data Artisans
Apache Kafka
Apache Kafka - Hortonworks
Kafka Design Patterns with Gwen Shapira | Software Engineering Daily
Best Practices for Apache Kafka® in Production: Confluent Online Talk Series - Confluent
How to install Kafka using Docker - ITNEXT
Apache Kafka, Data Pipelines, and Functional Reactive Programming with Node.js | Heroku
Apache Kafka Crash Course - YouTube
Top 10 Problems When Using Apache Kafka - Pandio
Apache Pulsar Apache Pulsar is an open-source distributed pub-sub messaging system
Comparing Apache Kafka and Apache Pulsar | by Jaroslaw Kijanowski | SoftwareMill Tech Blog
7 Reasons We Chose Apache Pulsar over Apache Kafka | DataStax
5 More Reasons to Choose Apache Pulsar over Kafka | DataStax
Apache NiFi
Apache NiFi - Hortonworks
Apache Storm
Apache Storm - Hortonworks
Apache Storm: Architecture - DZone Big Data
Apache Spark™ - Unified Analytics Engine for Big Data
Apache Spark - Hortonworks
Spark and Streaming with Matei Zaharia | Software Engineering Daily
Apache Spark Tutorials - Frank Kane - YouTube
Apache Spark 2 using Python 3 - YouTube
Spark SQL: An Introductory Guide - DZone Big Data
We interrupt this revolution: Apache Spark changes the rules of the game | ZDNet
Apache Beam
Apache Beam - Wikiwand
a unified API for batch and stream processing that abstracts over execution engines such as Flink, Spark, and Dataflow
Beam is introducing a framework through which APIs in languages other than Java can be supported, and Python is the first one.
Cloud Dataflow - Stream & Batch Data Processing | Google Cloud
Hadoop and Spark: A tale of two cities | ZDNet
The Streaming Database | Materialize
Batch Architecture
Apache Hadoop
Big Data: What is Hadoop - An Easy Explanation For Absolutely Anyone
Is Hadoop Officially Dead?
Why is Hadoop dying? | Packt Hub
Big data
The Data Science Venn Diagram — Drew Conway
The Third Wave Data Scientist – Towards Data Science
Data Skeptic
A data cleaner's cookbook - About
Chris Albon
OpenRefine | OpenRefine
OpenRefine/OpenRefine: OpenRefine is a free, open source power tool for working with messy data and improving it
Pachyderm - Scalable, Reproducible Data Science
Containerized data analytics at scale, with Minio and Pachyderm
Data Science eBook by Analyticbridge - 2nd Edition - Data Science Central
Extracting value from the IoT - O'Reilly Radar
Collecting data and loading it into a data warehouse is not sufficient. You also need capabilities for accessing, modeling, and analyzing your data.
Awesome Data Science Repository - Data Science Central
Nyandwi/machine_learning_complete: A comprehensive machine learning repository containing 30+ notebooks on different concepts, algorithms and techniques.
PredictionIO Open Source Machine Learning Server
The Art and Science of Data-Driven Journalism
Kaggle: Your Home for Data Science
Introduction to Data Science
Explore Your Data: The Fundamentals of Network Analysis
Design vs. Data: Enemies or Friends? how to evolve and extend a code base.
Cathy O'Neil on Weapons of Math Destruction | EconTalk | Library of Economics and Liberty crucial decisions based on machine learning statistics are unreliable, as no one really knows how the algorithm works
An expert's guide to big data storage architecture
Big data tutorial: Everything you need to know
Apache
a49a/bigdata-sql-benchmark: Flink, Presto, Trino TPC-DS benchmark
Apache Iceberg The open table format for analytic datasets, supports SQL and Spark, Trino, Flink, Presto engine
Apache Airflow data pipeline in Python, SQL-like query
Jupyter
Project Jupyter | Home
Jupyter and the future of IPython — IPython Jupyter was formerly IPython Notebook
Search, explore & show/display Jupyter notebooks in the terminal | Towards Data Science
Welcome to nbdev | nbdev
nbdev: use Jupyter Notebooks for everything · fast.ai
Jupyter is now a full-fledged IDE - Towards Data Science
MatrixDS – A community for working on and sharing advanced analytics
Deepnote - Data science notebook for teams
Introducing the Jupyter Extension for VS Code - Python
Jupyter Notebook: An Introduction – Real Python
Basics of Jupyter Notebook and Python | Packt Hub
28 Jupyter Notebook tips, tricks and shortcuts
Jupyter Notebooks in Visual Studio Code | Visual Studio Toolbox | Channel 9
Get Started With Jupyter Notebook: A Tutorial
Introduction to Jupyter Notebooks | Programming Historian
Top 10 Magic Commands in Python to Boost your Productivity | by Siddhesh Jadhav | Towards Data Science
IPython Cookbook - IPython Cookbook, Second Edition (2018)
ipython/ipython-in-depth: IPython and Jupyter in-depth Tutorial, first presented at PyCon 2012
Jupyter Notebooks as Markdown Documents, Julia, Python or R Scripts — Jupytext documentation
A gallery of interesting Jupyter Notebooks · jupyter/jupyter Wiki
JupyterLab Documentation
Voilà Dashboards
And Voilà!. … from Jupyter notebooks to standalone… | by QuantStack | Jupyter Blog
Dashboarding with JupyterLab 3. Project Jupyter offers a complete suite… | by Carlos Herrero | Jan, 2021 | Jupyter Blog
Hello, Colaboratory - Colaboratory
Google Drive + Google Colab + GitHub; Don’t Just Read, Do It!
Microsoft Azure Notebooks
scrapbook documentation a library for recording a notebook’s data values and generated visual content as "scraps"
papermill documentation tool for parameterizing and executing Jupyter Notebooks
nteract/papermill: 📚 Parameterize, execute, and analyze notebooks
Introduction to Papermill - Towards Data Science
Automated Report Generation with Papermill: Part 1 - Practical Business Python
Automated Report Generation with Papermill: Part 2 - Practical Business Python
Project Jupyter
The Jupyter Notebook — Jupyter Notebook documentation
nbviewer FAQ
jupyter/nbconvert: Jupyter Notebook Conversion
Binder executable notebooks from URL
JupyterLab
JupyterLab Documentation — JupyterLab documentation
JupyterHub
JupyterHub — JupyterHub documentation
ipywidgets — Jupyter Widgets documentation
A very simple demo of interactive controls on Jupyter notebook
Interactive Visualizations with Pandas, Seaborn and Ipywidgets | by Zoltan Guba | Python in Plain English
jupyter-repo2docker — repo2docker documentation
Docker Without the Hassle – Towards Data Science
Create your own GPU accelerated Jupyter Notebook Server for Google Colab using Docker | by Sascha Kirch | Apr, 2022 | Towards Data Science
neuron - Visual Studio Marketplace
Data Science in Visual Studio Code using Neuron, a new VS Code extension – Microsoft Faculty Connection
Jupylet
JUPYLET PROGRAMMER’S REFERENCE GUIDE — Jupylet documentation
Polyglot Notebook
Polyglot Notebooks - Visual Studio Marketplace
Announcing Polyglot Notebooks! Multi-language notebooks in Visual Studio Code - .NET Blog
Polyglot Notebooks fully released for VS Code, with support for multiple languages - not including Python • DEVCLASS history of Jupyter Notebook
Datasets
Fueling the Gold Rush: The Greatest Public Datasets for AI
Data Asset eXchange – IBM Developer
Open Data Kit
Computer Vision Datasets
Access Free Google Cloud Public Dataset with Python
Datasets – Google Research
Dataset Search
Find Open Datasets and Machine Learning Projects | Kaggle
Google just published 25 million free datasets - Towards Data Science
COCO - Common Objects in Context
An Introduction to the COCO Dataset
資料一線通 | DATA.GOV.HK
Open Data Hong Kong - 香港開放數據 | Hong Kong's Open Data community
g0vhk.io - Home | Facebook
70 Amazing Free Data Sources You Should Know
Datasets for Data Mining and Data Science
Downloading The Kinetics Dataset For Human Action Recognition in Deep Learning
Analysis of the MRNet Knee MRI dataset | The Startup
Label Studio Open-source data labeling, annotation and exploration tool
Business Analytics
Commercial
Big Data Integration and Analytics | Hitachi Vantara
Business Intelligence and Analytics | Tableau Software
Introduction to Tableau - Learn The Part - Medium
Data Visualization | Microsoft Power BI
Get started with Power BI in 15 minutes! Once I get serious, even I scare myself ~ - YouTube
The 5 best self-service BI tools compared | CIO
Open source
Apache Superset (incubating) — Apache Superset documentation
Redash helps you make sense of your data | Redash
Metabase
Easy analytics with Grafana, Postgres, and Kubernetes.
Data Processing
Tabula: Extract Tables from PDFs
香港地址解析器 Hong Kong Address Parser
Data Analytics Reference Stack | Clear Linux* Project
AugLy: A new data augmentation library to help build more robust AI models
facebookresearch/AugLy: A data augmentations library for audio, image, text, and video.
Data Build Tool/dbt
What is dbt? | dbt Developer Hub
Transform Your Data Like a Pro With dbt (Data Build Tool) - DEV Community
DataStation
DataStation | The Data IDE for Developers
multiprocessio/datastation: Easily query, script, and visualize data from every database, file, and API.
multiprocessio/dsq: Commandline tool for running SQL queries against JSON, CSV, Excel, Parquet, and more.
Python
JavaScript
Danfo.js Documentation - Danfo.js Pandas for JavaScript
Hello from Scikit.js | Scikit.js Scikit Learn for JavaScript
JSdata
Crossfilter Pandas for JavaScript
How to Create an Interactive Dashboard with Crossfilter and Dc.Js
scijs
ndarray
Implementing Multidimensional Arrays in JavaScript | 0 FPS
tidy.js
tidy.js – Intro & Demo / Peter Beshai / Observable
C
Articles on Mathematics, Physics and Computer Science
muparser - fast math parser library
Go
DataFrames in Go with gota, qframe, and dataframe-go - MungingData
gonum
plot package - gonum.org/v1/plot - pkg.go.dev
tobgu/qframe: Immutable data frame for Go
go-gota/gota: Gota: DataFrames and data wrangling in Go (Golang)
rocketlaunchr/dataframe-go: DataFrames for Go: For statistics, machine-learning, and data manipulation/exploration
Rust
Vector | A lightweight, ultra-fast tool for building observability pipelines
Polars
Polars
Pandas vs. Polars: A Syntax and Speed Comparison | by Leonie Monigatti | Jan, 2023 | Towards Data Science
Why Polars uses less memory than Pandas
Replacing Pandas with Polars. A Practical Guide. - Confessions of a Data Guy
Polars for initial data analysis, Polars for production