What is Ilum?
Modular Data Lakehouse Platform for Cloud-Native Apache Spark Workloads
Ilum is an Apache Spark management platform designed for Kubernetes and Apache Hadoop Yarn environments. It provides enterprise-grade cluster orchestration, interactive Spark sessions via REST API, and seamless integration with modern data engineering tools, including Jupyter, Apache Airflow, MLflow, and the Delta Lake/Iceberg/Hudi table formats.
Key Capabilities
- Kubernetes-native Spark operator with automatic pod orchestration and resource management
- Multi-cluster management across cloud providers (GKE, EKS, AKS) and on-premise deployments
- Interactive Spark sessions accessible through REST API endpoints for building Spark-based microservices
- Apache Hadoop Yarn integration for hybrid cluster architectures
- Built-in S3-compatible object storage for cloud-native data lake architectures
- Horizontal scalability from single-node development to production clusters with hundreds of executors
Get started with Ilum → | View architecture documentation →
Ilum - Apache Spark on Kubernetes Platform
Ilum transforms Apache Spark cluster management by providing a unified control plane for Kubernetes- and Yarn-based Spark deployments. Unlike traditional Spark management approaches, Ilum treats Spark applications as first-class microservices with REST API interfaces, enabling real-time data processing architectures.
Architecture
Ilum's architecture consists of two core components:
- Ilum Core: Backend service providing gRPC/Kafka-based job orchestration, cluster state management, and REST API endpoints
- Ilum UI : Web-based dashboard for cluster monitoring, job submission, and resource visualization
The platform supports both Python (PySpark) and Scala, with native integration for the Spark SQL, Spark Streaming, and MLlib frameworks.
Data Lakehouse Capabilities
Ilum provides comprehensive support for modern table formats:
- Delta Lake: ACID transactions with time travel and schema evolution
- Apache Iceberg : Partition evolution and hidden partitioning for large-scale analytics
- Apache Hudi : Record-level upserts and incremental data processing
- Ilum Tables: Unified API abstraction across multiple table formats
Integration with the Hive Metastore and the Nessie catalog enables SQL-based metadata management and Git-like data versioning.
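As an illustration, catalog integration of this kind is typically wired through standard Spark properties. The sketch below configures an Iceberg catalog backed by Nessie; the catalog name (`nessie`), the Nessie URI, and the warehouse path are placeholder assumptions, not Ilum defaults:

```properties
# Enable Iceberg SQL extensions (assumes the Iceberg Spark runtime jar is on the classpath)
spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions

# Register a Spark catalog named "nessie" backed by a Nessie server (URI and paths are illustrative)
spark.sql.catalog.nessie=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.nessie.catalog-impl=org.apache.iceberg.nessie.NessieCatalog
spark.sql.catalog.nessie.uri=http://nessie:19120/api/v2
spark.sql.catalog.nessie.ref=main
spark.sql.catalog.nessie.warehouse=s3a://warehouse/
```

With this in place, tables created as `nessie.db.table` are versioned on the `main` Nessie branch, giving the Git-like semantics described above.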
REST API for Spark Microservices
Ilum exposes Spark functionality through RESTful endpoints:
```text
# Submit Spark job
POST /api/v1/jobs

# Query interactive session
POST /api/v1/sessions/{id}/execute

# Monitor job status
GET /api/v1/jobs/{id}/status
```
This enables building responsive data applications where Spark computations are triggered by HTTP requests, supporting use cases like:
- Real-time feature engineering for ML models
- On-demand data transformations via API
- Streaming analytics with REST-based controls
- Jupyter notebook execution through HTTP interface
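As a sketch of how the job-submission endpoint above can be driven from any HTTP client, the snippet below builds a request with the Python standard library. The payload fields (`clusterName`, `jobClass`, `args`) and the service URL are illustrative assumptions; the exact schema is defined in the API reference:

```python
import json
import urllib.request

# Placeholder service address, not an Ilum default
ILUM_URL = "http://ilum-core:9888"

def build_job_request(cluster: str, job_class: str, args: list) -> urllib.request.Request:
    """Build (but do not send) a job-submission request.
    Payload field names are assumptions for illustration."""
    payload = json.dumps({
        "clusterName": cluster,
        "jobClass": job_class,
        "args": args,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{ILUM_URL}/api/v1/jobs",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_job_request("dev", "org.example.WordCount", ["s3a://bucket/input"])
# urllib.request.urlopen(req)  # uncomment to submit against a live deployment
```

Because the request is plain HTTP, the same pattern works from any language or orchestration tool, which is what makes Spark-as-a-microservice architectures practical.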
Multi-Cluster Orchestration
Ilum manages heterogeneous Spark clusters from a single control plane:
- Cloud clusters: GKE, EKS, AKS with auto-scaling groups
- On-premise clusters: Bare metal Kubernetes or Hadoop Yarn deployments
- Hybrid architectures: Mixed cloud and on-premise for data sovereignty requirements
Each cluster maintains independent resource quotas, storage backends, and security policies while sharing centralized monitoring and job scheduling.
Comparison with Alternative Solutions
| Feature | Ilum | Databricks | Cloudera |
|---|---|---|---|
| Kubernetes native | ✓ | Partial | Partial |
| Multi-cluster management | ✓ | Limited | ✓ |
| Vendor lock-in | None | High | High |
| REST API for sessions | ✓ | ✓ | Limited |
| On-premise deployment | ✓ | Limited | ✓ |
| Cloud deployment | ✓ | ✓ | ✓ |
| Yarn integration | ✓ | ✗ | ✓ |
Video Overview
Prefer a guided path? Build your first data product on Ilum in hours. Official course →.
Features
Spark Cluster Management
- Kubernetes Operator Integration: Native CRD-based Spark application deployment with pod lifecycle management
- Multi-cluster Control Plane: Centralized management for GKE, EKS, AKS, and on-premise Kubernetes clusters
- Horizontal Pod Autoscaling: Dynamic executor scaling based on CPU/memory metrics and queue depth
- Resource Quotas: Namespace-level limits for CPU cores, memory, and persistent volume claims
Interactive Computing
- REST API Endpoints: HTTP interface for Spark session creation, code execution, and result retrieval
- Jupyter Integration: Spark Magic kernels with automatic session binding and DataFrame visualization
- Apache Zeppelin Notebooks: Multi-language interpreters (Scala, Python, SQL) with paragraph-level execution
- Code Groups: Reusable Spark contexts shared across multiple notebook sessions
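Outside a notebook, the interactive-session endpoint can be driven directly over HTTP. The sketch below prepares an execute request with the Python standard library; the `code` payload field and the service URL are assumptions for illustration, not the documented schema:

```python
import json
import urllib.request

ILUM_URL = "http://ilum-core:9888"  # placeholder service address

def build_execute_request(session_id: str, code: str) -> urllib.request.Request:
    """Build (but do not send) a request that runs a code snippet
    in an existing interactive Spark session."""
    payload = json.dumps({"code": code}).encode("utf-8")
    return urllib.request.Request(
        f"{ILUM_URL}/api/v1/sessions/{session_id}/execute",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_execute_request("42", "spark.range(10).count()")
# urllib.request.urlopen(req)  # send against a live deployment
```

The same long-lived session can serve many such requests, which is what makes shared Spark contexts across notebook sessions possible.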
Storage & Data Formats
- S3-Compatible Object Storage: MinIO-based distributed storage with S3 API compatibility
- Table Format Support: Delta Lake, Iceberg, Hudi with ACID guarantees and schema evolution
- Catalog Integration: Hive Metastore, AWS Glue, Nessie for metadata management
- Distributed File Systems: HDFS, GCS, Azure Blob Storage, and S3 connectivity
Orchestration & Scheduling
- Built-in Scheduler: Cron-based job scheduling with dependency management
- Apache Airflow Integration: DAG-based workflow orchestration with Spark operators
- Kestra Support: Event-driven pipelines with Spark task execution
- dbt Core: SQL transformations with Spark as execution engine
Monitoring & Observability
- Spark History Server: Job timeline, stage metrics, and executor resource utilization
- Prometheus Integration: Custom metrics for application-level monitoring
- Grafana Dashboards: Pre-configured visualizations for cluster health and job performance
- Loki Log Aggregation: Centralized logging with Promtail collectors
- OpenLineage: Data lineage tracking for table-level dependencies
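As a sketch, Prometheus scraping of Spark applications can be switched on through upstream Spark 3.x settings; the values below are illustrative, not Ilum defaults:

```properties
# Expose driver metrics at /metrics/prometheus on the Spark UI port
spark.ui.prometheus.enabled=true

# Route metrics through Spark's built-in PrometheusServlet sink
spark.metrics.conf.*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
spark.metrics.conf.*.sink.prometheusServlet.path=/metrics/prometheus
```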
Security & Access Control
- RBAC Policies: Kubernetes-native role-based access with fine-grained permissions
- OAuth2/OIDC: Integration with Keycloak, Okta, Azure AD for authentication
- TLS/mTLS: Certificate-based encryption for inter-service communication
- LDAP/Active Directory: Enterprise directory service integration
- Network Policies: Pod-to-pod traffic restrictions and egress controls
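Pod-to-pod restrictions like these are expressed as standard Kubernetes NetworkPolicy objects. In the sketch below, the namespace, label selectors (`app: ilum-core`), and port are assumptions for illustration; `spark-role: driver` is the label Spark on Kubernetes sets on driver pods:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-spark-drivers
  namespace: ilum            # assumed namespace
spec:
  podSelector:
    matchLabels:
      app: ilum-core         # assumed label on the core service pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              spark-role: driver   # label applied by Spark on Kubernetes
      ports:
        - protocol: TCP
          port: 9888               # assumed service port
```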
Explore full feature documentation → | Request new features →
Advantages
Cloud-Native Architecture
Ilum is designed as a cloud-native-first platform with containerized services, declarative configuration, and GitOps-compatible deployment:
- Helm Charts: Parameterized Kubernetes manifests for reproducible deployments
- Container Images: Official images for Spark 3.x with pre-installed connectors (S3, GCS, Azure)
- Custom Resource Definitions: Kubernetes API extensions for Spark application management
- Service Mesh Ready: Compatible with Istio/Linkerd for advanced traffic management
No Vendor Lock-In
Unlike proprietary platforms, Ilum provides:
- Open APIs: REST and gRPC interfaces following OpenAPI specifications
- Standard Protocols: JDBC/ODBC connectivity, S3 API compatibility, Kafka integration
- Portable Workloads: Spark applications run on any Kubernetes cluster without modification
- Multi-Cloud Support: Deploy across AWS, GCP, Azure without platform-specific dependencies
Hadoop Migration Path
For organizations migrating from Hadoop/HDFS ecosystems:
- Yarn Compatibility: Run existing Yarn-based Spark jobs without code changes
- HDFS Connector: Direct access to HDFS clusters during migration phases
- Hive Metastore: Reuse existing table metadata and partitioning schemes
- Incremental Migration: Gradual transition with hybrid Yarn/Kubernetes deployments
Performance Optimization
Ilum includes performance enhancements:
- Dynamic Allocation: Automatic executor scaling based on shuffle data and pending tasks
- Adaptive Query Execution (AQE): Runtime optimization for join strategies and partition coalescing
- Columnar Caching: Parquet/ORC in-memory caching with LRU eviction policies
- Network-Aware Scheduling: Pod placement considering data locality and network topology
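The first two optimizations map onto standard Spark settings. A minimal sketch, with illustrative executor bounds (not Ilum defaults):

```properties
# Dynamic allocation; shuffle tracking replaces the external shuffle service on Kubernetes
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.shuffleTracking.enabled=true
spark.dynamicAllocation.minExecutors=1
spark.dynamicAllocation.maxExecutors=20

# Adaptive Query Execution: runtime join selection and partition coalescing
spark.sql.adaptive.enabled=true
spark.sql.adaptive.coalescePartitions.enabled=true
```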
Enterprise Integration
Built for enterprise data platforms:
- Apache Kafka: Native Spark Structured Streaming integration with exactly-once semantics
- Apache Airflow: Managed Airflow instances with Spark operators pre-configured
- MLflow: Model registry and experiment tracking for machine learning pipelines
- Superset/Tableau: BI tool connectivity via JDBC drivers and load balancers
Read architecture documentation → | View use cases →
Project Roadmap
Explore planned features and integrations:
- Flink Operator: Stream processing workloads alongside Spark batch jobs
- GPU Scheduling: CUDA-enabled executors for deep learning workloads
- Cost Attribution: Resource usage tracking with cloud billing integration
View full roadmap → | See changelog →
Additional Resources
- API Reference: REST API documentation for programmatic access
- Security Guide: Authentication, authorization, and network policies
- Production Deployment: Best practices for production clusters
- User Guides: Step-by-step tutorials for common workflows