
AI for Genomic Science

Welcome to AI for Genomic Science

Author: Joon-Yong An, Korea University

Last Update: 2025/11/2 (Under construction; probably weekly updates by 2026, if I am not too lazy.)


About This Book

This textbook introduces how artificial intelligence is revolutionizing biological research, from analyzing genetic variants to modeling entire cells. It is designed specifically for biology undergraduates in their third year or later (junior/senior) who want to understand the computational approaches that are transforming genomics. No prior experience in AI or advanced programming is required.

The field of genomics has been rapidly transformed by machine learning, deep learning, and large-scale computational methods. These advances now allow us to analyze massive genomic datasets, predict functional impacts of variants, and model complex biological systems with unprecedented accuracy. This textbook takes a biology-first approach to cover the essential AI concepts and methods that undergraduate students need to master, integrating computational techniques with genomic applications.

This book is written for biology majors who have a solid foundation in molecular biology and genetics, are curious about computational approaches, want to understand the why and how behind AI methods in genomics, and are comfortable with basic mathematics. Rather than assuming an extensive programming background, we start from the basics and build up gradually, emphasizing conceptual understanding alongside practical applications.

All 17 chapters are now available, organized into 5 parts. Each chapter includes an interactive companion page with hands-on simulations and visualizations to reinforce key concepts.


What Makes This Book Different?

🧬 Biology-First Approach

Each chapter starts with a real biological challenge—experimental limitations that motivate computational solutions. You’ll never wonder “why do I need to learn this?”

💻 Hands-On Coding Labs

Every chapter includes Google Colab-based coding exercises. No installation needed—just click and start learning! All code is heavily commented and designed for beginners.

🎯 Clear Mathematical Explanations

Math concepts are explained in Math Boxes with biological examples. We won’t shy away from equations, but we’ll make sure you understand what they mean.

📚 Real Case Studies

Learn from actual research papers and real datasets. See how these methods are being used to make biological discoveries right now.


What You’ll Learn

By the end of this book, you will be able to:

✅ Understand the fundamental concepts of machine learning and deep learning
✅ Explain how AI methods predict the effects of genetic variants
✅ Use pre-trained models to analyze genomic sequences
✅ Interpret results from tools like CADD, DeepSEA, Enformer, and DNABERT
✅ Understand how language models are applied to DNA and RNA sequences
✅ Analyze single-cell omics data using foundation models
✅ Critically evaluate AI-based studies in genomics literature
✅ Write basic Python code for bioinformatics analyses


Book Structure

This textbook is organized into five parts:

Part 1: Foundations of AI for Biology (Chapters 1–4)

Learn the essential AI concepts every biologist should know — from Bayesian intuition to neural network architectures.

Part 2: Genomics Foundations and Traditional Methods (Chapters 5–7)

Understand how traditional and machine learning methods help us characterize genetic variation and interpret variant effects.

Part 3: Deep Learning for Genomics (Chapters 8–10)

Explore how CNNs and transformers predict regulatory elements and variant effects directly from DNA sequences.

Part 4: Language Models and Foundation Models (Chapters 11–14)

Discover how NLP techniques power DNA language models and genomic foundation models, from BERT-style architectures to next-generation long-context models.

Part 5: Single-Cell Genomics and Whole-Cell Modeling (Chapters 15–17)

See how AI is resolving cell-type heterogeneity at single-cell resolution and driving progress toward whole-cell computational models.


How to Use This Book

For Self-Study:

  1. Read each chapter sequentially — concepts build on each other
  2. Open the Interactive companion page linked at the top of every chapter for hands-on exploration
  3. Work through the Coding Labs in Google Colab
  4. Try the Discussion Questions to deepen your understanding
  5. Explore the Further Reading for topics that interest you

For Classroom Use:

Prerequisites:


Getting Started

Setting Up Your Environment

All coding exercises use Google Colab, which runs in your web browser. You’ll need:

  1. A Google account (free)
  2. An internet connection
  3. That’s it!

No software installation required. We’ll walk you through everything in Chapter 1.


Quick Navigation

Part 1: Foundations of AI for Biology

Part 2: Genomics Foundations and Traditional Methods

Part 3: Deep Learning for Genomics

Part 4: Language Models and Foundation Models

Part 5: Single-Cell Genomics and Whole-Cell Modeling


Happy Learning! 🧬🤖


License

License information to be added