Chronon: Airbnb’s Feature Engineering Framework

By Nikhil Simha et al.

Table of Contents

Announcements
Agenda
Goals and Requirements
API Overview
Concepts & Examples
Dependencies Overview
Integration guide
Goals - management
Uniform API
Python + Spark SQL
Online & Offline
Raw Data -> Training Data
Raw Data -> Feature Serving
Feature Repository
Compiled Team based Feature monitoring
Goals - API
GroupBy - Aggregation engine
Join - PITC joins
Staging Query
Arbitrary ETL to prepare data
Goals - computation
Realtime Features
Stream processing + Batch processing + Storage + Fetching
Backfills
Requirements
Offline - problem statement (item recommendation)
Offline - problem statement
Online - problem statement
Examples – E-Commerce platform
Aggregation operations
Aggregation Inputs
GroupBy Concepts
Aggregations
Windows – Sliding
Windows – Hopping
Windows – Sawtooth
Join Concepts
User workflow
Explore
Compile
Run.py - testing
Model Server
Scheduling needs
Repo structure
Scripts
Workflows – offline
Workflows – Online
Online Integration
Airflow Scheduling
Perf Stats
Data Stats
Cases
Problem statement - Events
Approaches
Tiling windows
Window tiling
Topology 1/2
Topology 2/2
Window tiling - final
Resources
Opinions
Appendix - Tree Tiling

Summary

This presentation gives an overview of Chronon, Airbnb’s feature engineering framework. It covers the framework's goals and requirements, its API, core concepts with examples, dependencies, and an integration guide. The management goals include a uniform API (Python and Spark SQL) that works both online and offline, turning raw data into training data and into served features, along with a feature repository and feature monitoring. At the API level, the deck introduces GroupBy (the aggregation engine), Join (PITC joins), and Staging Query (arbitrary ETL to prepare data). The computation goals cover realtime features (stream processing, batch processing, storage, and fetching) and backfills. The deck then states the offline and online problems, walks through e-commerce examples, and explains aggregations over sliding, hopping, and sawtooth windows. It also describes the user workflow (explore, compile, test with run.py), the model server, scheduling with Airflow, repository structure, performance and data statistics, and window tiling for event data. It closes with resources, opinions, and an appendix on tree tiling.
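To make the GroupBy and PITC join topics listed above concrete, here is a minimal plain-Python sketch of a point-in-time-correct windowed sum. It does not use Chronon's API. The function name `pitc_window_sum`, the sample events, and the choice to exclude events at exactly the query time are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

def pitc_window_sum(events, query_key, query_ts, window):
    """Sum values for query_key over [query_ts - window, query_ts).

    Only events strictly before query_ts are counted, so a training row
    never sees data from its own future (assumed boundary convention).
    """
    start = query_ts - window
    return sum(
        value
        for key, ts, value in events
        if key == query_key and start <= ts < query_ts
    )

# Hypothetical purchase events: (user_id, event_time, amount)
events = [
    ("u1", datetime(2023, 1, 1), 10.0),
    ("u1", datetime(2023, 1, 5), 20.0),
    ("u1", datetime(2023, 1, 9), 40.0),
    ("u2", datetime(2023, 1, 4), 5.0),
]

# Hypothetical training rows (the left side of the join): (user_id, label_time)
queries = [
    ("u1", datetime(2023, 1, 6)),
    ("u1", datetime(2023, 1, 10)),
    ("u2", datetime(2023, 1, 3)),
]

# One 7-day sum feature per training row, computed as of that row's timestamp.
features = [
    pitc_window_sum(events, key, ts, timedelta(days=7)) for key, ts in queries
]
print(features)
```

Each training row gets the aggregate as it would have looked at its own timestamp: the first row ignores the later January 9 purchase, and the third row sees no events at all because the only `u2` purchase happens after it. A production system would not rescan all events per row as this sketch does.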