An Overview of Python’s Datatable package – Towards Data Science
Algorithmic Trading with Python – Free 4-hour Course With Example Code Repos
Algorithmic Trading Using Python - Full Course - YouTube
Hello! — Practical Data Science
Python Data Analysis Library (pandas): massage data into a tabular state so it can be modeled
Spyder - Documentation Scientific PYthon Development EnviRonment
Python Data Transformation Tools for ETL - Towards Data Science
Data Preprocessing in Data Mining & Machine Learning
Data Preprocessing in Python - Towards Data Science
Assessing the Quality of Data - Towards Data Science
Five Command Line Tools for Data Science - Towards Data Science
Data Scientists, The 5 Graph Algorithms that you should know
Why and How to Use Pandas with Large Data - Towards Data Science
Tips for Handling Large Datasets in Python - KDnuggets
Build a Data Science App with Python in 10 Easy Steps - KDnuggets
Building Interactive Data Science Applications with Python - KDnuggets
Top 10 Python Libraries for Data Science - Towards Data Science
5 essential Python programming tools for data science—now updated
5 Hidden Gem Python Libraries for Data Science - KDnuggets
Essential Python Libraries for Data Manipulation - KDnuggets
Weekend Reading: Python | Linux Journal science and ML
Oktoberfest : Quick analysis using Pandas, Matplotlib, and Plotly
Python Data Science Handbook | Python Data Science Handbook ❗!important
pydata/pydata-cookbook: PyData Cookbook Project
Data Analysis with Dr Mike Pound - YouTube Computerphile
Data Analysis with Python - Full Course for Beginners (Numpy, Pandas, Matplotlib, Seaborn) - YouTube
Try Docker image for Intel® Distribution for Python* | Intel® Software
NumPy
NumPy — NumPy
NumPy - Wikiwand
Numpy and Scipy Documentation — Numpy and Scipy documentation
NumPy User Guide — NumPy Manual
Indexing — NumPy Manual
Quickstart tutorial — NumPy Manual
numpy.reshape — NumPy v1.21 Manual
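Axis 0: rows in a 2-D array (sum(axis=0) collapses the rows, giving one value per column)
Axis 1: columns in a 2-D array (sum(axis=1) collapses the columns, giving one value per row)
Axis 2: in a 3-D array of shape (depth, rows, columns), axis 0 is depth, axis 1 rows, axis 2 columns (the last axis is always the columns)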
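A quick check of the axis convention on small arrays; the shapes are arbitrary examples chosen so the sums are easy to verify by hand.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)  # shape (2, 3): 2 rows, 3 columns
# [[0 1 2]
#  [3 4 5]]

col_sums = a.sum(axis=0)  # reduce along axis 0 (down the rows): one value per column
row_sums = a.sum(axis=1)  # reduce along axis 1 (across the columns): one value per row

b = np.arange(24).reshape(2, 3, 4)  # axis 0 = 2 blocks, axis 1 = 3 rows, axis 2 = 4 columns
print(col_sums, row_sums, b.sum(axis=0).shape)  # [3 5 7] [ 3 12] (3, 4)
```

Reducing along an axis removes that axis from the result's shape, which is why b.sum(axis=0) leaves a (3, 4) array.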
Welcome to numpy-ml — numpy-ml documentation
ddbourgin/numpy-ml: Machine learning, in numpy
Free Deep Learning Tutorial - Deep Learning Prerequisites: The Numpy Stack in Python V2 | Udemy
How to create NumPy arrays from scratch? - Towards Data Science
A Visual Intro to NumPy and Data Representation – Jay Alammar – Visualizing machine learning one concept at a time
Numpy Guide for People In a Hurry – Towards Data Science
Reshape numpy arrays—a visualization | Towards Data Science
The Easiest Python Numpy Tutorial Ever - Towards Data Science
A Complete Beginners Guide to Matrix Multiplication for Data Science with Python Numpy | by Chris I. | Towards Data Science
Array Oriented Programming with Python NumPy | by Semi Koen | Towards Data Science
NumPy Crash Course: Array Basics - Towards Data Science
27 NumPy Operations for beginners - Towards Data Science
10 quick Numpy tricks that will make life easier for a data scientist | by Harsh Maheshwari | Jun, 2021 | Towards Data Science
Look Ma, No For-Loops: Array Programming With NumPy – Real Python
np.linspace(): Create Evenly or Non-Evenly Spaced Arrays – Real Python
Python Numpy Array Tutorial (article) - DataCamp
NumPy indexing explained. NumPy is the universal standard for… | by Àlex Escolà Nixon | Towards Data Science
Deep Learning Prerequisites: The Numpy Stack in Python (V2+) | Udemy free
Deep Learning Prerequisites: The Numpy Stack in Python Extra Resources - Lazy Programmer
A NumPy affair: Broadcasting - Towards Data Science
Count values
python - Frequency counts for unique values in a NumPy array - Stack Overflow
import numpy as np
def count_unique(keys):
    uniq_keys = np.unique(keys)  # sorted unique values
    bins = uniq_keys.searchsorted(keys)  # find index of each key in uniq_keys
    return uniq_keys, np.bincount(bins)  # count how often each index occurs
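NumPy (since 1.9) can also return the counts directly from np.unique via return_counts=True, which gives the same pairs as the helper above without the searchsorted/bincount step. The input array here is made up.

```python
import numpy as np

keys = np.array([3, 1, 3, 2, 3, 1])

# Sorted unique values plus how often each one occurs
values, counts = np.unique(keys, return_counts=True)
print(dict(zip(values.tolist(), counts.tolist())))  # {1: 2, 2: 1, 3: 3}
```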
Numpy on GPU
CuPy
Here’s how to use CuPy to make Numpy 700X faster - Towards Data Science
FilipeMaia/afnumpy: A GPU-ready drop-in replacement for numpy.
Apache MXNet (incubating) Documents — deepnumpy documentation
NDArray - Scientific computing on CPU and GPU — mxnet documentation
Shohei Hido - CuPy: A NumPy-compatible Library for GPU - PyCon 2018 - YouTube slide
CuPy: A NumPy Compatible Library for High Performance Computing with GPU | SciPy 2019 | SciPy 2019 | - YouTube
SciPy
SciPy.org — SciPy.org
SciPy - Wikiwand
SciPy is an open-source Python library for scientific and technical computing. It builds on NumPy and gives Python programmers a wide range of high-level routines (optimization, integration, interpolation, linear algebra, statistics, signal processing) for manipulating and analyzing data. SciPy is popular in mathematics, science, and engineering.
Numpy and Scipy Documentation — Numpy and Scipy documentation
Optimizing complex simulations? Use Scipy interpolation | by Tirthajyoti Sarkar | Oct, 2021 | Towards Data Science
Linear Algebra in Python: Matrix Inverses and Least Squares – Real Python
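A minimal SciPy sketch in the spirit of the least-squares article above: fit a straight line with scipy.linalg.lstsq. The data points are made up and lie exactly on y = 2x + 1, so the fit should recover intercept 1 and slope 2.

```python
import numpy as np
from scipy import linalg

# Made-up data lying exactly on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1

# Design matrix: a column of ones (intercept) next to x (slope)
A = np.column_stack([np.ones_like(x), x])

# Least-squares solution of A @ coef approximately equal to y
coef, residues, rank, sv = linalg.lstsq(A, y)
intercept, slope = coef
print(intercept, slope)  # approximately 1.0 and 2.0
```

Solving the least-squares problem directly is generally preferred over forming an explicit inverse of A.T @ A.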
Numba
Numba: A High Performance Python Compiler
numba/numba: NumPy aware dynamic Python compiler using LLVM
3. Numba for CUDA GPUs — Numba documentation
Numba: High-Performance Python with CUDA Acceleration | NVIDIA Developer Blog
Numba: Tell those C++ bullies to get lost | SciPy 2016 Tutorial | Gil Forsyth & Lorena Barba - YouTube
Make Python code 1000x Faster with Numba - YouTube
Accelerating Scientific Workloads with Numba - Siu Kwan Lam - YouTube
How to Accelerate an Existing Codebase with Numba | SciPy 2019 | Siu Kwan Lam, Stanley Seibert - YouTube
Numba: High-Performance Python with CUDA Acceleration | Svelte Hacker News
Python Numba or NumPy: understand the differences - Towards Data Science
Run Your Python User Defined Functions in Native CUDA Kernels with RAPIDS cuDF | by Jiqun Tu | RAPIDS AI | Medium
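A small sketch of the typical Numba use case: an explicit nested loop over a NumPy array, compiled with @njit. The function name pairwise_min_distance is hypothetical, and the try/except fallback (an assumption added so the sketch still runs where numba is not installed) simply leaves the function as plain Python.

```python
import numpy as np

try:
    from numba import njit  # compiles the function to machine code via LLVM
except ImportError:  # hypothetical fallback: run uncompiled if numba is missing
    def njit(func):
        return func


@njit
def pairwise_min_distance(points):
    """Smallest Euclidean distance between any two rows of an (n, 2) array."""
    n = points.shape[0]
    best = np.inf
    # Explicit loops: slow in plain Python, fast once numba compiles them
    for i in range(n):
        for j in range(i + 1, n):
            dx = points[i, 0] - points[j, 0]
            dy = points[i, 1] - points[j, 1]
            d = np.sqrt(dx * dx + dy * dy)
            if d < best:
                best = d
    return best


pts = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]])
print(pairwise_min_distance(pts))  # sqrt(2), about 1.414
```

The first call pays the compilation cost; later calls with the same argument types reuse the compiled code.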
Dask: Scalable analytics in Python
Data Pre-Processing in Python: How I learned to love parallelized applies with Dask and Numba
xarray
xarray: N-D labeled arrays and datasets in Python — xarray documentation
better API than raw NumPy: dimensions and coordinates have names, so data is addressed by label, akin to pandas columns
Rapids
Open GPU Data Science | RAPIDS
Getting Started | RAPIDS
rapidsai/cudf: cuDF - GPU DataFrame Library
Python Pandas at Extreme Performance - Towards Data Science
GPU Accelerated Data Analytics & Machine Learning - Towards Data Science
Here’s how you can accelerate your Data Science on GPU
Here’s how you can speedup Pandas with cuDF and GPUs
XGBoost Documentation — xgboost documentation gradient boosting model
A Gentle Introduction to XGBoost for Applied Machine Learning
Introduction to XGBoost in Python
New Features and Optimizations for GPUs in XGBoost 1.1
abhishekkrthakur/autoxgb: XGBoost + Optuna
[QST] Can cuml and cudf installed on nvidia jetson tx1/tx2/nano? · Issue #665 · rapidsai/cuml
Pandas
Python Data Analysis Library — pandas: Python Data Analysis Library
User Guide — pandas documentation
API Reference — pandas documentation
Hannah Stepanek - Thinking like a Panda: Everything you need to know to use pandas the right way. - YouTube
Pandas Cheat Sheet: Data Science and Data Wrangling in Python - KDnuggets
Python-Pandas cheat sheet: 30 functions-methods | by Jyoti Kumar | Aug, 2022 | Medium
pandas - Getting started with pandas | pandas Tutorial
Learn Python, Data Science & Machine Learning with expert instruction
Pandas Tutorials – Dunder Data – Medium
Explore Your Dataset With Pandas – Real Python
Finding Temporal Patterns in Twitter Posts: Exploratory Data Analysis with Python | by Dmitrii Eliuseev | May, 2023 | Towards Data Science
Full Stack Pandas. Lesser known functionality of the… | by Sayar Banerjee | Towards Data Science
Pandas Makes Python Better. Something I’ve wanted to talk about for… | by Emma Boudreau | Towards Data Science
10 Python Skills They Don’t Teach in Bootcamp | Towards Data Science
Improve pandas performance with eval and query | Python in Plain English
Using numba to make pandas operations faster | Towards Data Science
Stylin’ with Pandas - Practical Business Python
Efficiently Cleaning Text with Pandas - Practical Business Python
10 Python One-Liners That Will Boost Your Data Science Workflow - MachineLearningMastery.com
7 Ways to Improve Your Data Cleaning Skills with Python - KDnuggets
pandas-profiling/pandas-profiling: Create HTML profiling reports from pandas DataFrame objects
Validation
pandera can define schemas with a class-based, Pydantic-style model syntax
How to Use Pandas With Pandera to Validate Your Data in Python - YouTube
ArjanCodes/2023-pandera
Modin
Scale your pandas workflow by changing a single line of code — Modin documentation
modin-project/modin: Modin: Speed up your Pandas workflows by changing a single line of code
Tutorials
Python Pandas Tutorial
Python - Data Science Tutorial - Tutorialspoint
Time Series Tutorial - Tutorialspoint
Examining Data Using Pandas | Linux Journal
Introduction to Pandas | Machine Learning, Deep Learning, and Computer Vision
Fast, Flexible, Easy and Intuitive: How to Speed Up Your Pandas Projects – Real Python
3 Excel Functions and How to Do Them in Python! | Towards Data Science
Top 3 Pandas Functions I Wish I Knew Earlier | by Dario Radečić | Towards Data Science
Exploratory Data Analysis using Python | ActiveState
Quick and Dirty Data Analysis with Pandas
Pandas: The Swiss Army Knife for Your Data, Part 1
Pandas: The Swiss Army Knife for Your Data, Part 2
Video series: Easier data analysis in Python using the pandas library
Applying Statistics in Python — part I - Towards Data Science
Applying Statistics in Python — part II - Towards Data Science
Why Are We Teaching Pandas Instead of SQL? | HackerNoon: compares pandas and SQL
An End-to-End Project on Time Series Analysis and Forecasting with Python
The Easiest Data Cleaning Method using Python & Pandas
Seven Clean Steps To Reshape Your Data With Pandas Or How I Use Python Where Excel Fails
DataFrame
Intro to Data Structures — pandas documentation
Series:
1-D array with an index as axis labels
behaves like a fixed-size dict keyed by the index labels
mostly compatible with NumPy's ndarray
DataFrame:
2-D labeled data, like a table or a dict of Series keyed by column name
index is the row labels, columns are the column (field) labels
not intended to work as a 2-D ndarray
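A minimal sketch of the two structures described above; the labels and values are made up for illustration.

```python
import pandas as pd

# Series: 1-D values plus an index of labels; label lookup works like a dict
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(s["b"], "c" in s)  # 20 True  ("in" checks the index, like dict keys)

# DataFrame: a dict of Series sharing one row index; dict keys become column labels
df = pd.DataFrame({"x": s, "y": s * 2})
print(df.index.tolist(), df.columns.tolist())  # ['a', 'b', 'c'] ['x', 'y']

# Selecting one column gives back a Series with the same row index
print(df["y"]["c"])  # 60
```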
Show all rows
import pandas as pd
pd.set_option('display.max_rows', None)  # never truncate rows
pd.set_option('display.max_rows', df.shape[0] + 1)  # or: just enough rows for this df
with pd.option_context('display.max_rows', None, 'display.max_columns', None):  # temporary; more options can be specified also
    print(df)
Intro to pandas data structures
Working with DataFrames
Using pandas on the MovieLens dataset
Creating Pandas DataFrames from Lists and Dictionaries - Practical Business Python
Python Dataclasses With Properties and Pandas | by Sebastian Ahmed | The Startup | Medium
SettingWithCopyWarning in Pandas: Views vs Copies – Real Python
Views and Copies in pandas — Practical Data Science
Python Pandas DataFrame: load, edit, view data | Shane Lynn
How to Merge Large DataFrames Efficiently with Pandas - KDnuggets
Combining Data in Pandas With merge(), .join(), and concat() – Real Python
Reshape pandas dataframe | Towards Data Science: convert long to wide with pd.pivot_table
Reshape pandas dataframe in Python | Towards Data Science: convert wide to long with pd.melt
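A round trip between the two shapes mentioned in the links above, on a tiny hypothetical temperature table: pivot_table goes long to wide, melt goes back.

```python
import pandas as pd

# Hypothetical long-format data: one row per (city, month) measurement
long_df = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Rome", "Rome"],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "temp": [-4, -3, 8, 9],
})

# Long -> wide: one row per city, one column per month
wide = long_df.pivot_table(index="city", columns="month", values="temp", aggfunc="mean")

# Wide -> long: melt the month columns back into rows
back = wide.reset_index().melt(id_vars="city", var_name="month", value_name="temp")
print(wide, back, sep="\n\n")
```

pivot_table aggregates duplicate (index, column) pairs with aggfunc; here every pair is unique, so the mean is just the original value.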
Using the Pandas Data Frame as a Database. - Towards Data Science
Build pipelines with Pandas using “pdpipe” - Towards Data Science
Exploring your data with just 1 line of Python - Towards Data Science
Apply and Lambda usage in pandas - Towards Data Science
How to Use the pivot_table Function for Advanced Data Summarization in Pandas - KDnuggets
datas-frame – Modern Pandas (Part 1)
datas-frame – Modern Pandas (Part 2): Method Chaining
datas-frame – Modern Pandas (Part 3): Indexes
datas-frame – Modern Pandas (Part 4): Performance
datas-frame – Modern Pandas (Part 6): Visualization
datas-frame – Modern Pandas (Part 7): Timeseries
datas-frame – Modern Pandas (Part 8): Scaling
Pandas Tutorial 1: Pandas Basics (read_csv, DataFrame, Data Selection)
Pandas Tutorial 2: Aggregation and Grouping
Pandas Tutorial 3: Important Data Formatting Methods (merge, sort, reset_index, fillna)
Data types (dtype)
Overview of Pandas Data Types - Practical Business Python
Categorical data — pandas documentation
Pandas Category Type: Pros and Cons | by Arli | Jan, 2023 | Level Up Coding
Working with Large Data Sets Made Easy: Understanding Pandas Data Types - YouTube
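df.info()  # dtypes, non-null counts, memory usage
df["A"] = df["A"].astype("category")  # astype returns a new Series; assign it back
df["A"].value_counts()  # counts per category
df = df.astype({"A": "category", "B": "boolean", "D": "datetime64[ns, UTC]"})  # tz-aware dtype is "datetime64[ns, <tz>]"; UTC is just an example zone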
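Why the category dtype is worth it for low-cardinality string columns: a sketch comparing memory use of the same made-up column stored as plain strings and as a category.

```python
import pandas as pd

# Hypothetical low-cardinality column: 3 distinct strings repeated many times
s = pd.Series(["red", "green", "blue"] * 10_000)

as_strings = s.memory_usage(deep=True)  # deep=True counts the string data itself
as_category = s.astype("category").memory_usage(deep=True)  # small integer codes + 3 categories
print(as_strings, as_category)
```

The exact byte counts depend on the pandas version and string backend, but the category version stores one small integer code per row plus a single copy of each distinct string.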
Indexing/Selection
Tips for Selecting Columns in a DataFrame - Practical Business Python
Python : 10 Ways to Filter Pandas DataFrame
python - How to select rows from a DataFrame based on column values - Stack Overflow
pandas.DataFrame.query — pandas documentation
MultiIndex / advanced indexing — pandas documentation
python - How do I select rows from a DataFrame based on column values? - Stack Overflow
import numpy as np
import pandas as pd
idx = pd.MultiIndex.from_tuples([('Chris', 48), ('Brian', np.nan), ('David', 65), ('Chris', 34), ('John', 28)],
                                names=['Name', 'Age'])  # two index levels; NaN marks a missing age
col = ['Salary']
df = pd.DataFrame([120000, 140000, 90000, 101000, 59000], index=idx, columns=col)
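A few ways to select from the MultiIndex frame above: by outer label with .loc, by an inner level with .xs, and with a boolean filter on one level. The frame is rebuilt here so the sketch runs on its own; sorting first is a choice made to keep label lookups efficient.

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("Chris", 48), ("Brian", np.nan), ("David", 65), ("Chris", 34), ("John", 28)],
    names=["Name", "Age"],
)
df = pd.DataFrame({"Salary": [120000, 140000, 90000, 101000, 59000]}, index=idx)

# Sort so label lookups on the outer level are efficient (avoids a PerformanceWarning)
df = df.sort_index()

# All rows for one outer label; the remaining index is "Age"
chris = df.loc["Chris"]

# Cross-section on the inner level, selected by level name
age_28 = df.xs(28, level="Age")

# Boolean filter on an index level; NaN compares as False
over_40 = df[df.index.get_level_values("Age") > 40]
print(chris, age_28, over_40, sep="\n\n")
```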
Serialization
IO tools (text, CSV, HDF5, …) — pandas documentation
Serializing pandas DataFrames | Pythontic.com
pandas-datareader — pandas-datareader documentation
How to use Pandas read_html to Scrape Data from HTML Tables
The Best Format to Save Pandas Data | by Ilia Zaitsev | Towards Data Science benchmarks
Tips and Tricks
Python Pandas: Tricks & Features You May Not Know – Real Python
Idiomatic Pandas: Tricks & Features You May Not Know – Real Python
25 Tricks for Pandas
10 Powerful Python Tricks for Data Science you Should Try Today
How to make your Pandas operation 100x faster | by Yifei Huang | Towards Data Science
Pandas tips and tricks. This post includes some useful tips for… | by Shir Meir Lador | Towards Data Science
5 lesser-known pandas tricks. 5 lesser-known pandas tricks that help… | by Roman Orac | Towards Data Science
Pandas and Python Tips and Tricks for Data Science and Data Analysis | by Zoumana Keita | Dec, 2022 | Towards Data Science
Display Customizations for pandas Power Users | by Roman Orac | Towards Data Science
My Python Pandas Cheat Sheet. The pandas functions I use everyday as… | by Chris I. | Towards Data Science
How To Make Your Pandas Loop 71803 Times Faster | by Benedikt Droste | Towards Data Science
Articles: Speed up your data science and scientific computing code
For the Love of God, Stop Using iterrows() – r y x, r: use df.itertuples(index=False) instead
# map each column position to its name, e.g. "0:A"
col_mapping = [f"{i}:{name}" for i, name in enumerate(df.columns)]
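A sketch of the itertuples advice on a made-up price/quantity table, next to the vectorized version that avoids the Python loop entirely. itertuples yields namedtuples, so columns are read as attributes (column names that are not valid identifiers get positional field names instead).

```python
import pandas as pd

df = pd.DataFrame({"price": [2.0, 3.5, 1.25], "qty": [4, 2, 8]})

# itertuples yields lightweight namedtuples; much cheaper than iterrows' per-row Series
totals = [row.price * row.qty for row in df.itertuples(index=False)]

# Vectorized column arithmetic gives the same numbers without a Python-level loop
vectorized = (df["price"] * df["qty"]).tolist()
print(totals, vectorized)  # [8.0, 7.0, 10.0] [8.0, 7.0, 10.0]
```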
python - Pretty-print an entire Pandas Series / DataFrame - Stack Overflow
Binning Data with Pandas qcut and cut - Practical Business Python
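The difference between the two functions in the article above, sketched on made-up ages: cut uses bin edges you choose, qcut picks edges from quantiles so the bins hold roughly equal counts.

```python
import pandas as pd

ages = pd.Series([5, 17, 25, 40, 62, 80])

# cut: fixed edges (right-inclusive by default): (0, 18], (18, 65], (65, 100]
groups = pd.cut(ages, bins=[0, 18, 65, 100], labels=["child", "adult", "senior"])

# qcut: edges at quantiles; q=2 splits at the median (32.5 here)
halves = pd.qcut(ages, q=2, labels=["lower", "upper"])
print(groups.tolist(), halves.tolist())
```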
GUI/Visualizer
A GUI for pandas | bamboolib
Introducing Bamboolib — a GUI for Pandas - Towards Data Science
Bamboolib — Learn and use Pandas without Coding - Towards Data Science
man-group/dtale: Visualizer for pandas data structures
dtale · PyPI
D-Tale (house_data)
Styling — pandas documentation
How to Use Conditional Formatting in Pandas to Enhance Data Visualization - KDnuggets
groupby
groupby — pandas documentation
pandas.core.groupby.DataFrameGroupBy.agg — pandas documentation
Pandas Grouper and Agg Functions Explained - Practical Business Python
Apply Operations To Groups In Pandas
Summarising, Aggregating, and Grouping data in Python Pandas | Shane Lynn
How to use Pandas Count and Value_Counts | kanoki
python - Get statistics for each group (such as count, mean, etc) using pandas GroupBy? - Stack Overflow: make groupby() result a dataframe
# group the dataframe by regiment
gb = df.groupby('regiment')
# for each regiment
for name, group in gb:
    # print the name of the regiment
    print(name)
    # print the data of that regiment
    print(group)
# make count result a dataframe to add more statistics
counts = gb.size().to_frame(name='counts')
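Another way to get several per-group statistics as a flat DataFrame in one step is named aggregation in .agg. The regiment names and the preTestScore column are made-up sample data, not from the original snippet.

```python
import pandas as pd

# Made-up data in the spirit of the regiment example
df = pd.DataFrame({
    "regiment": ["Dragoons", "Dragoons", "Scouts", "Scouts", "Scouts"],
    "preTestScore": [4, 24, 31, 2, 3],
})

# Named aggregation: output_name=(source_column, function);
# as_index=False keeps "regiment" as an ordinary column instead of the index
stats = df.groupby("regiment", as_index=False).agg(
    counts=("preTestScore", "count"),
    mean_score=("preTestScore", "mean"),
    max_score=("preTestScore", "max"),
)
print(stats)
```

"count" counts non-null values; with missing data it can differ from gb.size(), which counts rows.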
filter
python - pandas: filter rows of DataFrame with operator chaining - Stack Overflow
How To Filter Pandas Dataframe By Values of Column? — Python, R, and Linux Tips
df.field == value
creates a boolean Series (a mask) aligned to the DataFrame's index,
so df[df.field == value]
keeps only the rows where the mask is True
Or use:
pandas.DataFrame.query — pandas documentation
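The filtering patterns above on a small made-up frame: a single mask, two conditions combined with &, and the same filter written with query().

```python
import pandas as pd

df = pd.DataFrame({"field": ["a", "b", "a", "c"], "score": [10, 25, 30, 5]})

# The comparison returns a boolean Series aligned to df's index
mask = df.field == "a"
print(mask.tolist())  # [True, False, True, False]

# Indexing with the mask keeps the True rows (and their original index labels)
subset = df[mask]

# Combine conditions with & or |, wrapping each condition in parentheses
combined = df[(df.field == "a") & (df.score > 15)]

# Same filter with query(); @ refers to a Python variable
threshold = 15
queried = df.query("field == 'a' and score > @threshold")
print(subset.index.tolist(), combined.index.tolist(), queried.index.tolist())
```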
Pivot Tables
Pivot Tables | Python Data Science Handbook
Pandas Crosstab Explained - Practical Business Python
pbpython/crosstab_cheatsheet.pdf at master · chris1610/pbpython
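A sketch contrasting the two tools linked above on made-up sales rows: crosstab counts combinations of two categorical columns, while pivot_table aggregates a value column over the same keys (margins=True adds "All" totals).

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["North", "North", "South", "South", "South"],
    "product": ["A", "B", "A", "A", "B"],
    "units": [3, 1, 4, 2, 5],
})

# crosstab: frequency table of region x product
freq = pd.crosstab(df["region"], df["product"])

# pivot_table: sum of units per region x product, with row and column totals
totals = df.pivot_table(index="region", columns="product", values="units",
                        aggfunc="sum", margins=True)
print(freq, totals, sep="\n\n")
```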
Check memory usage
df.memory_usage(deep=True)
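One common follow-up after checking memory usage is downcasting numeric columns. A minimal sketch, assuming an int64 column whose values are small enough to fit a narrower type; the data is made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"small_ints": np.arange(1000, dtype="int64") % 100})  # values 0..99

before = df.memory_usage(deep=True).sum()

# Downcast to the smallest signed integer type that holds every value (int8 here)
df["small_ints"] = pd.to_numeric(df["small_ints"], downcast="integer")

after = df.memory_usage(deep=True).sum()
print(df["small_ints"].dtype, before, after)
```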
#perfmatters
Enhancing Performance — pandas documentation
Scale your pandas workflow by changing a single line of code. — Modin documentation
4 Methods to Optimize Python Code for Data Science
Parallel Pandas – KRSTN: reports concurrent.futures.ProcessPoolExecutor as faster than multiprocessing.Pool
How to use Pandas the RIGHT way to speed up your code
Parallelize Pandas map() and apply() while accounting for future records – Adeel's Corner
JAX
JAX: High-Performance Array Computing — JAX documentation
Successor to Autograd; combines Autograd-style automatic differentiation with XLA compilation
PyTorch
Using PyTorch to accelerate analytics
GPU Accelerated Python - YouTube
Accelerate PyTorch across any distributed configuration
Kedro
Welcome to Kedro’s documentation!
kedro-org/kedro: A Python framework for creating reproducible, maintainable and modular data science code.
pingouin
Installation — pingouin documentation
The new kid on the statistics-in-Python block: pingouin
Input/Output
IO Tools (Text, CSV, HDF5, …) — pandas documentation
For reading: xlrd (XLS), openpyxl (XLSX)
For writing: openpyxl / xlsxwriter (XLSX), PyTables (HDF5)
Three Ways of Storing and Accessing Lots of Images in Python – Real Python
Intake — intake documentation
intake/intake: Intake is a lightweight package for finding, investigating, loading and disseminating data.
PySpark
Welcome to Spark Python API Docs! — PySpark master documentation
First Steps With PySpark and Big Data Processing – Real Python
PySpark and SparkSQL Basics - Towards Data Science
JDSL
Mats Eikeland Mollestad | How I Accidentally Created the “JDSL” of Data Pipelines - And It's Awesome
Accidently created JDSL | Prime Reacts - YouTube