Fundamentals of Data Visualization - Mike Hung's ML/DS/DL/AI

Understand torch.scatter

Cheat Sheets for Machine Learning and Data Science

What frustrates Data Scientists in Machine Learning projects?

My bloody experience working for an AI startup.

Python Data Science Handbook

Free Book

An Introduction to VisiData

r/datascience - Imposter syndrome can really feel overwhelming.

Data in Wonderland

Explores communication with data in various forms through seminal and cutting-edge ideas in writing, data analyses, and visualization.

PyTorch Internals (how pytorch works from the inside)

Introduction to GPUs: CUDA

How to fine tune VERY large model if it doesn’t fit on your GPU

Memory-efficient techniques to defeat the problem of “CUDA memory error” during training

GitHub - cybertronai/gradient-checkpointing: Make huge neural nets fit in memory

Make huge neural nets fit in memory.

Gradient Accumulation in PyTorch

Increasing batch size to overcome memory constraints

Python for Data Analysis, 3E

Free Book

Fundamentals of Data Visualization

Free Book

Machine Learning Operations (MLOps): Overview, Definition, and Architecture

The final goal of all industrial machine learning (ML) projects is to develop ML products and rapidly bring them into production. However, it is highly challenging to automate and operationalize ML products and thus many ML endeavors fail to deliver on their expectations. The paradigm of Machine Lea…

MLU-Explain

Visual explanations of core machine learning concepts.

GitHub - mryab/efficient-dl-systems: Efficient Deep Learning Systems course materials (HSE, YSDA)

Efficient Deep Learning Systems course materials (HSE, YSDA)

OpenML

OpenML is an open platform for sharing datasets, algorithms, and experiments - to learn how to learn better, together.

Hugging Face – The AI community building the future.

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

Inference Optimization for Convolutional Neural Networks

Quantization and fusion for faster inference

Build your open-source MLOps stack | MyMLOps

Explore 50+ most popular open-source MLOps tools. Build your own stack based on our template.

Drivendata

Part 1: Key Concepts in RL — Spinning Up documentation

Introduction to Machine Learning Interviews Book · MLIB

Neural Fields: Home

paulbridger.com

Deep dive machine learning articles with a focus on solving the hard problems in production engineering.

Learnings from Google’s comprehensive research into activation functions

This is a field that is heating up. Keep an eye on it.

14.4. Anchor Boxes — Dive into Deep Learning 1.0.0-alpha1.post0 documentation

Interpreting the Latent Space of GANs for Semantic Face Editing

Despite the recent advance of Generative Adversarial Networks (GANs) in high-fidelity image synthesis, there lacks enough understanding of how GANs are able to map a latent code sampled from a random distribution to a photo-realistic image. Previous work assumes the latent space learned by GANs foll…

The Principles of Deep Learning Theory

This book develops an effective theory approach to understanding deep neural networks of practical relevance. Beginning from a first-principles component-level picture of networks, we explain how to determine an accurate description of the output of trained networks by solving layer-to-layer iterati…

You Only Learn One Representation: Unified Network for Multiple Tasks

People “understand” the world via vision, hearing, tactile, and also the past experience. Human experience can be learned through normal learning (we call it explicit knowledge), or subconsciously (we call it implicit knowledge). These experiences learned through normal learning or subconsciously…

Few-Shot Forecasting of Time-Series with Heterogeneous Channels

Learning complex time series forecasting models usually requires a large amount of data, as each model is trained from scratch for each task/data set. Leveraging learning experience with similar datasets is a well-established technique for classification problems called few-shot classification. Howe…

Information Theory Basic for Machine Learning & Deep Learning

Information theory

Efficient Transformers: A Survey

Transformer model architectures have garnered immense interest lately due to their effectiveness across a range of domains like language, vision and reinforcement learning. In the field of natural language processing for example, Transformers have become an indispensable staple in the modern deep le…

Generalized Out-of-Distribution Detection: A Survey

Out-of-distribution (OOD) detection is critical to ensuring the reliability and safety of machine learning systems. For instance, in autonomous driving, we would like the driving system to issue an alert and hand over the control to humans when it detects unusual scenes or objects that it has never…

VOS: Learning What You Don’t Know by Virtual Outlier Synthesis

Out-of-distribution (OOD) detection has received much attention lately due to its importance in the safe deployment of neural networks. One of the key challenges is that models lack supervision signals from unknown data, and as a result, can produce overconfident predictions on OOD data. Previous ap…

Model Assertions for Monitoring and Improving ML Models

ML models are increasingly deployed in settings with real world interactions such as vehicles, but unfortunately, these models can fail in systematic ways. To prevent errors, ML engineering teams monitor and continuously improve these models. We propose a new abstraction, model assertions, that adap…

Bounding Box Regression with Uncertainty for Accurate Object Detection

Large-scale object detection datasets (e.g., MS-COCO) try to define the ground truth bounding boxes as clear as possible. However, we observe that ambiguities are still introduced when labeling the bounding boxes. In this paper, we propose a novel bounding box regression loss for learning bounding b…

Masked Visual Pre-training for Motor Control

Self-supervised visual pre-training from real-world images is effective for learning motor control tasks from pixels.

I Deep Faked Myself In Every Meeting For A Whole Week

Not even one person noticed.

The Next Big Challenge for Data is Organizational - Locally Optimistic

We have the tools we need to build the data world we want to live in. Now we just need a little organization. It’s time to bring the assembly line to data.

Data team structure: embedded or centralised?

Embedded data teams are closer to business problems, but ownership of what to do when data goes wrong is complicated.

The Secret of Delivering Machine Learning to Production

The vast majority of Machine Learning (ML) projects FAIL.

Home - Made With ML

Learn how to responsibly deliver value with ML.

CS 329S: Machine Learning Systems Design

What I’ve learned about documentation for data teams

I used to be a hypocrite. There are numerous records of me in Slack complaining about other people’s lack of documentation. It would not be hard to track down the time I was venting furiously to a colleague about the non-existing API docs for a tool I was working to build out analytics for on a tigh…

Andrej Karpathy

Blog of Sr. Director of AI at Tesla

MLOps Is a Mess But That’s to be Expected

I discuss the messy state of MLOps today and how we are still in the early phases of a broader transformation to bring machine learning value to enterprises globally.

Labeling and Crowdsourcing - Data-centric AI Resource Hub

Asking someone to perform an annotation task, such as labeling an image with text or classifying it into a certain category, may seem simple. However, the huge number of different interpretations of the task makes it difficult for machine learning practitioners to effectively source new training dat…

Why 85% of Machine Learning Projects Fail - How to Avoid This – IIoT World

According to Gartner, 85% of Machine Learning (ML) projects fail. Worse yet, the research company predicts that this trend will continue through 2022. Does this point to some weakness in ML itself? No, it points to weaknesses in the way […]