Data Analytics

September 29, 2023
September 21, 2016

devops
database
elastic-stack
elastic-kibana

Analytics - Wikiwand
Data Analytics Reference Stack | Clear Linux* Project
Data Science Timeline - Noteworthy - The Journal Blog
Data Analyst VS Data Scientist – What's the Difference?

Software analytics - Wikiwand
Web analytics - Wikiwand
IT operations analytics - Wikiwand
Session (web analytics) - Wikiwand

Behavioral analytics - Wikiwand
Not to be confused with User Behavioral Analytics, which is used in a security context for threat detection
Business intelligence - Wikiwand
Cohort analysis - Wikiwand
10 Steps To Get You Started With Behavioral Analytics
Six Ways to Create Better Customer Behavior Analytics | Datameer

From unstructured data to actionable intelligence: Using machine learning for threat intelligence - Microsoft Security

What is Operational Analytics? - Definition from Techopedia
Operations Analytics | Coursera

Business analytics - Wikiwand

First, data such as logs or events emitted by applications and services must be collected and stored in a data store.

[Data Series - Microsoft Virtual Academy](https://mva.microsoft.com/search/SearchResults.aspx#!q="Data Series%3A"&lang=1033)

Data Scientia – Data Science | AI | Machine Learning | IoT | Cloud Analytics
Data Science Simplified Part 12: Resampling Methods – Data Scientia
The Best Free Data Science eBooks - Towards Data Science

Introducing Application Insights Analytics | Brian Harry's blog

Apache Hadoop Ecosystem and Open Source Big Data Projects | Hortonworks ❗!important

4 free maths courses to do in quarantine and level up your Data Science skills | by Gonzalo Ferreiro Volpi | Towards Data Science
Machine Learning and Data Science free online courses to do in quarantine | Towards Data Science

Prefect Docs
101 Machine Learning Algorithms for Data Science with Cheat Sheets | R-bloggers
7 Open Source Data Science Projects | Machine Learning Projects

Use cases

OLTP vs. OLAP

Online transaction processing - Wikiwand OLTP
What is OLTP (online transaction processing)? - Definition from WhatIs.com
Online analytical processing - Wikiwand OLAP
What is OLAP (online analytical processing)? - Definition from WhatIs.com
Hybrid transactional/analytical processing - Wikiwand a NoSQL/NewSQL database can serve this purpose
RTA
Data warehouse - Wikiwand
Extract, transform, load - Wikiwand
ETL
ETLs vs ELTs: Why are ELTs Disrupting the Data Market? | by SeattleDataGuy | Coriers | Mar, 2021 | Medium
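
As a rough illustration of the OLTP/OLAP split and the ETL step listed above, the sketch below uses Python's built-in `sqlite3`. The table names (`orders`, `daily_sales`), the `run_etl` helper, and the sample rows are all made up for the example; a real setup would normally use separate transactional and analytical stores.

```python
import sqlite3

# Hypothetical sample data: (order_id, customer, amount_cents, day)
ORDERS = [
    (1, "alice", 1200, "2023-09-01"),
    (2, "bob", 800, "2023-09-01"),
    (3, "alice", 500, "2023-09-02"),
]


def run_etl(orders):
    """Load rows into an OLTP-style table, then ETL them into a daily summary."""
    db = sqlite3.connect(":memory:")

    # OLTP side: one row per transaction, optimized for small writes.
    db.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, customer TEXT, amount_cents INTEGER, day TEXT)"
    )
    db.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", orders)

    # Extract + transform: aggregate the transactional rows per day.
    rows = db.execute(
        "SELECT day, COUNT(*), SUM(amount_cents) "
        "FROM orders GROUP BY day ORDER BY day"
    ).fetchall()

    # Load: write the aggregates into an OLAP-style summary table.
    db.execute(
        "CREATE TABLE daily_sales ("
        "day TEXT PRIMARY KEY, order_count INTEGER, revenue_cents INTEGER)"
    )
    db.executemany("INSERT INTO daily_sales VALUES (?, ?, ?)", rows)
    db.commit()

    # Analytical queries now read the small summary table instead of raw orders.
    return db.execute(
        "SELECT day, order_count, revenue_cents FROM daily_sales ORDER BY day"
    ).fetchall()


if __name__ == "__main__":
    print(run_etl(ORDERS))
```

In an ELT variant the raw rows would be loaded into the warehouse first and the aggregation would run there afterwards.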

A good nudge trumps a good prediction - O'Reilly Radar

Whether prediction should be user friendly or business friendly

Stream Architecture

What is Stream Processing? - data Artisans
How a Stream Works - DZone Big Data
What is a Streaming Database?

The state can be built from events
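
A minimal sketch of that idea, assuming a made-up account ledger: the current state is never stored directly, it is derived by replaying the event log in order. The `Event` type, its `deposit`/`withdraw` kinds, and the `replay` helper are hypothetical names for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str      # "deposit" or "withdraw" (hypothetical event types)
    account: str
    amount: int


def apply_event(state, event):
    """Return a new state dict with one event applied; the input is not mutated."""
    balance = state.get(event.account, 0)
    if event.kind == "deposit":
        balance += event.amount
    elif event.kind == "withdraw":
        balance -= event.amount
    else:
        raise ValueError(f"unknown event kind: {event.kind}")
    new_state = dict(state)
    new_state[event.account] = balance
    return new_state


def replay(events, initial=None):
    """Rebuild state by folding the events, in order, over an initial snapshot."""
    state = dict(initial or {})
    for event in events:
        state = apply_event(state, event)
    return state


if __name__ == "__main__":
    log = [
        Event("deposit", "a", 100),
        Event("withdraw", "a", 30),
        Event("deposit", "b", 5),
    ]
    print(replay(log))
```

Because replay is deterministic, a snapshot taken after the first N events plus the remaining events yields the same state as replaying the whole log.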

"Turning the database inside out with Apache Samza" by Martin Kleppmann - YouTube
"Transactions: myths, surprises and opportunities" by Martin Kleppmann - YouTube

Streaming Architecture with Ted Dunning | Software Engineering Daily
Spark: batch first, then streaming; ELT jobs, working set held in memory
Flink: stream first, then batch; exactly-once event processing

Streaming pipeline:

| Type | Example | Storage Media | Usage |
| --- | --- | --- | --- |
| Message bus | Redis, Kafka | RAM, Disk | low latency data ingest |
| Datalake | S3/HDFS | Disk | high capacity low cost long term storage |
| Data warehouse | Elasticsearch | RAM | data structuring and indexing, fast interactive query |
| Database | MySQL, MongoDB | RAM, Disk | data access with indexing |
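
To make the roles in the table above concrete, here is a toy in-process sketch. The three stand-ins are assumptions for illustration, not real clients: a `deque` plays the message bus, a plain list plays the data lake, and a dict keyed by a field plays the indexed warehouse. The `run_pipeline` function and the `service`/`msg` event fields are hypothetical.

```python
from collections import defaultdict, deque


def run_pipeline(events):
    """Push events through bus -> lake -> index and return the lake and the index."""
    bus = deque()              # message bus stand-in: in-memory FIFO for ingest
    lake = []                  # data lake stand-in: append-only raw storage
    index = defaultdict(list)  # warehouse stand-in: messages indexed by service

    # Producers publish events to the bus.
    for event in events:
        bus.append(event)

    # A consumer drains the bus, archives the raw event, and indexes it.
    while bus:
        event = bus.popleft()
        lake.append(event)
        index[event["service"]].append(event["msg"])

    return lake, dict(index)


if __name__ == "__main__":
    sample = [
        {"service": "api", "msg": "started"},
        {"service": "db", "msg": "slow query"},
        {"service": "api", "msg": "timeout"},
    ]
    lake, index = run_pipeline(sample)
    print(len(lake), index["api"])
```

The lake keeps every raw event in arrival order, while the index trades extra memory for fast lookup by service, which mirrors the storage and usage columns of the table.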

Apache Flink vs. Apache Spark - DZone Big Data
Apache Flink: Does the world need another streaming engine? | ZDNet
Choose your real-time weapon: Storm or Spark? | InfoWorld

ksqlDB: The database purpose-built for stream processing applications.

Apache Flink: Scalable Stream and Batch Data Processing
How Netflix Optimized Flink for Massive Scale on AWS
Why Apache Flink - data Artisans

Apache Kafka
Apache Kafka - Hortonworks
Kafka Design Patterns with Gwen Shapira | Software Engineering Daily
Best Practices for Apache Kafka® in Production: Confluent Online Talk Series - Confluent
How to install Kafka using Docker - ITNEXT
Apache Kafka, Data Pipelines, and Functional Reactive Programming with Node.js | Heroku
Apache Kafka Crash Course - YouTube
Top 10 Problems When Using Apache Kafka - Pandio

Apache Pulsar Apache Pulsar is an open-source distributed pub-sub messaging system
Comparing Apache Kafka and Apache Pulsar | by Jaroslaw Kijanowski | SoftwareMill Tech Blog
7 Reasons We Chose Apache Pulsar over Apache Kafka | DataStax
5 More Reasons to Choose Apache Pulsar over Kafka | DataStax

Apache NiFi
Apache NiFi - Hortonworks

Apache Storm
Apache Storm - Hortonworks
Apache Storm: Architecture - DZone Big Data

Apache Spark™ - Unified Analytics Engine for Big Data
Apache Spark - Hortonworks
Spark and Streaming with Matei Zaharia | Software Engineering Daily
Apache Spark Tutorials - Frank Kane - YouTube
Apache Spark 2 using Python 3 - YouTube
Spark SQL: An Introductory Guide - DZone Big Data
We interrupt this revolution: Apache Spark changes the rules of the game | ZDNet

Apache Beam
Apache Beam - Wikiwand
A unified stream API that abstracts over execution engines such as Flink, Spark, and Dataflow
Beam is introducing a framework through which APIs in languages other than Java can be supported, and Python is the first one.

Cloud Dataflow - Stream & Batch Data Processing | Google Cloud
Hadoop and Spark: A tale of two cities | ZDNet

Benthos | Benthos

The Streaming Database | Materialize

Batch Architecture

Apache Hadoop
Big Data: What is Hadoop - An Easy Explanation For Absolutely Anyone

Is Hadoop Officially Dead?
Why is Hadoop dying? | Packt Hub

Big data

onurakpolat/awesome-bigdata: A curated list of awesome big data frameworks, ressources and other awesomeness.

The Data Science Venn Diagram — Drew Conway
The Third Wave Data Scientist – Towards Data Science

Data Skeptic
A data cleaner's cookbook - About
Chris Albon

OpenRefine | OpenRefine
OpenRefine/OpenRefine: OpenRefine is a free, open source power tool for working with messy data and improving it

Pachyderm - Scalable, Reproducible Data Science
Containerized data analytics at scale, with Minio and Pachyderm

Data Science eBook by Analyticbridge - 2nd Edition - Data Science Central

Extracting value from the IoT - O'Reilly Radar

Collecting data and loading it into a data warehouse is not sufficient. You also need capabilities for accessing, modeling, and analyzing your data.

Awesome Data Science Repository - Data Science Central
Nyandwi/machine_learning_complete: A comprehensive machine learning repository containing 30+ notebooks on different concepts, algorithms and techniques.

PredictionIO Open Source Machine Learning Server

The Art and Science of Data-Driven Journalism

Comparison of top data science libraries for Python, R and Scala [Infographic] - Data Science Central

Kaggle: Your Home for Data Science
Introduction to Data Science
Explore Your Data: The Fundamentals of Network Analysis

Design vs. Data: Enemies or Friends? How to evolve and extend a code base.

Cathy O'Neil on Weapons of Math Destruction | EconTalk | Library of Economics and Liberty Crucial decisions based on machine learning statistics are unreliable because no one really knows how the algorithm works

An expert's guide to big data storage architecture
Big data tutorial: Everything you need to know

Apache

a49a/bigdata-sql-benchmark: Flink, Presto, Trino TPC-DS benchmark
Apache Iceberg The open table format for analytic datasets, supports SQL and Spark, Trino, Flink, Presto engine
Apache Airflow data pipeline in Python, SQL-like query

Jupyter

Project Jupyter | Home
Jupyter and the future of IPython — IPython Jupyter was formerly IPython Notebook
Search, explore & show/display Jupyter notebooks in the terminal | Towards Data Science

Welcome to nbdev | nbdev
nbdev: use Jupyter Notebooks for everything · fast.ai
Jupyter is now a full-fledged IDE - Towards Data Science
MatrixDS – A community for working on and sharing advanced analytics
Deepnote - Data science notebook for teams

Introducing the Jupyter Extension for VS Code - Python

Jupyter Notebook: An Introduction – Real Python
Basics of Jupyter Notebook and Python | Packt Hub
28 Jupyter Notebook tips, tricks and shortcuts
Jupyter Notebooks in Visual Studio Code | Visual Studio Toolbox | Channel 9
Get Started With Jupyter Notebook: A Tutorial
Introduction to Jupyter Notebooks | Programming Historian
Top 10 Magic Commands in Python to Boost your Productivity | by Siddhesh Jadhav | Towards Data Science

IPython Cookbook - IPython Cookbook, Second Edition (2018)
ipython/ipython-in-depth: IPython and Jupyter in-depth Tutorial, first presented at PyCon 2012

Jupyter Notebooks as Markdown Documents, Julia, Python or R Scripts — Jupytext documentation

A gallery of interesting Jupyter Notebooks · jupyter/jupyter Wiki
JupyterLab Documentation

Voilà Dashboards
And Voilà!. … from Jupyter notebooks to standalone… | by QuantStack | Jupyter Blog
Dashboarding with JupyterLab 3. Project Jupyter offers a complete suite… | by Carlos Herrero | Jan, 2021 | Jupyter Blog

Hello, Colaboratory - Colaboratory
Google Drive + Google Colab + GitHub; Don’t Just Read, Do It!
Microsoft Azure Notebooks

scrapbook documentation a library for recording a notebook’s data values and generated visual content as "scraps"

papermill documentation tool for parameterizing and executing Jupyter Notebooks
nteract/papermill: 📚 Parameterize, execute, and analyze notebooks
Introduction to Papermill - Towards Data Science
Automated Report Generation with Papermill: Part 1 - Practical Business Python
Automated Report Generation with Papermill: Part 2 - Practical Business Python

Project Jupyter
The Jupyter Notebook — Jupyter Notebook documentation
nbviewer FAQ
jupyter/nbconvert: Jupyter Notebook Conversion
Binder executable notebooks from URL

JupyterLab
JupyterLab Documentation — JupyterLab documentation

JupyterHub
JupyterHub — JupyterHub documentation

ipywidgets — Jupyter Widgets documentation
A very simple demo of interactive controls on Jupyter notebook
Interactive Visualizations with Pandas, Seaborn and Ipywidgets | by Zoltan Guba | Python in Plain English

jupyter-repo2docker — repo2docker documentation
Docker Without the Hassle – Towards Data Science
Create your own GPU accelerated Jupyter Notebook Server for Google Colab using Docker | by Sascha Kirch | Apr, 2022 | Towards Data Science

neuron - Visual Studio Marketplace
Data Science in Visual Studio Code using Neuron, a new VS Code extension – Microsoft Faculty Connection

Jupylet

JUPYLET PROGRAMMER’S REFERENCE GUIDE — Jupylet documentation

Polyglot Notebook

Polyglot Notebooks - Visual Studio Marketplace

Announcing Polyglot Notebooks! Multi-language notebooks in Visual Studio Code - .NET Blog
Polyglot Notebooks fully released for VS Code, with support for multiple languages - not including Python • DEVCLASS history of Jupyter Notebook

Datasets

Fueling the Gold Rush: The Greatest Public Datasets for AI
Data Asset eXchange – IBM Developer
Open Data Kit
Computer Vision Datasets

Access Free Google Cloud Public Dataset with Python

Datasets – Google Research
Dataset Search
Find Open Datasets and Machine Learning Projects | Kaggle
Google just published 25 million free datasets - Towards Data Science

COCO - Common Objects in Context
An Introduction to the COCO Dataset

資料一線通 | DATA.GOV.HK
Open Data Hong Kong - 香港開放數據 | Hong Kong's Open Data community
g0vhk.io - Home | Facebook

70 Amazing Free Data Sources You Should Know
Datasets for Data Mining and Data Science

Downloading The Kinetics Dataset For Human Action Recognition in Deep Learning
Analysis of the MRNet Knee MRI dataset | The Startup

Label Studio Open-source data labeling, annotation and exploration tool

Business Analytics

Commercial

Big Data Integration and Analytics | Hitachi Vantara
Business Intelligence and Analytics | Tableau Software
Introduction to Tableau - Learn The Part - Medium

Data Visualization | Microsoft Power BI
Get started with Power BI in 15 minutes! Once I get serious, even I scare myself ~ - YouTube

The 5 best self-service BI tools compared | CIO

Open source

Apache Superset (incubating) — Apache Superset documentation
Redash helps you make sense of your data | Redash
Metabase

Easy analytics with Grafana, Postgres, and Kubernetes.


Data Processing

Tabula: Extract Tables from PDFs
香港地址解析器 Hong Kong Address Parser
Data Analytics Reference Stack | Clear Linux* Project

AugLy: A new data augmentation library to help build more robust AI models
facebookresearch/AugLy: A data augmentations library for audio, image, text, and video.

Data Build Tool/dbt

What is dbt? | dbt Developer Hub

Transform Your Data Like a Pro With dbt (Data Build Tool) - DEV Community

DataStation

DataStation | The Data IDE for Developers
multiprocessio/datastation: Easily query, script, and visualize data from every database, file, and API.
multiprocessio/dsq: Commandline tool for running SQL queries against JSON, CSV, Excel, Parquet, and more.

Python

data-analytics-python

JavaScript

Danfo.js Documentation - Danfo.js Pandas for JavaScript
Hello from Scikit.js | Scikit.js Scikit Learn for JavaScript
JSdata

Crossfilter Pandas for JavaScript
How to Create an Interactive Dashboard with Crossfilter and Dc.Js

scijs
ndarray
Implementing Multidimensional Arrays in JavaScript | 0 FPS

tidy.js
tidy.js – Intro & Demo / Peter Beshai / Observable

C

Articles on Mathematics, Physics and Computer Science

muparser - fast math parser library

Go

DataFrames in Go with gota, qframe, and dataframe-go - MungingData

gonum
plot package - gonum.org/v1/plot - pkg.go.dev

tobgu/qframe: Immutable data frame for Go
go-gota/gota: Gota: DataFrames and data wrangling in Go (Golang)
rocketlaunchr/dataframe-go: DataFrames for Go: For statistics, machine-learning, and data manipulation/exploration

Rust

Vector | A lightweight, ultra-fast tool for building observability pipelines

Polars

Polars
Pandas vs. Polars: A Syntax and Speed Comparison | by Leonie Monigatti | Jan, 2023 | Towards Data Science
Why Polars uses less memory than Pandas
Replacing Pandas with Polars. A Practical Guide. - Confessions of a Data Guy
Polars for initial data analysis, Polars for production