API Reference Overview
DataBeak provides 40+ tools for CSV manipulation through the Model Context Protocol (MCP). All tools return structured responses and handle errors consistently.
Tool Categories
📁 I/O Operations
Tools for loading CSV data from URLs or raw string content, and for inspecting the resulting sessions:
- `load_csv_from_url` - Load CSV from an HTTP/HTTPS URL
- `load_csv_from_content` - Load CSV from string content
- `get_session_info` - Get current session details and statistics
- `list_sessions` - List all active sessions
- `close_session` - Close and clean up a session
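Every DataBeak tool is invoked the same way over MCP: the client sends a JSON-RPC 2.0 `tools/call` request that names the tool and passes its arguments. The sketch below only builds that request payload with the standard library. The helper `build_tool_call` is hypothetical, and the argument name `url` for `load_csv_from_url` is an assumption, not taken from DataBeak's published signatures.

```python
import json
from typing import Any, Dict


def build_tool_call(request_id: int, tool_name: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message (hypothetical helper)."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Assumption: the argument is called "url"; check the tool's published input schema.
payload = build_tool_call(1, "load_csv_from_url", {"url": "https://example.com/data.csv"})
print(payload)
```

In practice an MCP client library builds and sends these messages for you. A `tools/list` request returns each tool's actual input schema, which is the authoritative source for argument names.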
🔧 Data Manipulation
Tools for transforming and modifying CSV data:
- `filter_rows` - Filter rows with complex conditions (AND/OR logic)
- `sort_data` - Sort by single or multiple columns
- `select_columns` - Select specific columns by name or pattern
- `rename_columns` - Rename columns using a mapping
- `add_column` - Add computed columns with formulas
- `remove_columns` - Remove unwanted columns
- `update_column` - Update column values with transformations
- `change_column_type` - Convert column data types
- `fill_missing_values` - Fill null/NaN values using a chosen strategy
- `remove_duplicates` - Remove duplicate rows, optionally comparing only key columns
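Duplicate removal with optional key columns means that two rows count as duplicates when they agree on the chosen columns, or on every column when no key is given. The standard-library sketch below illustrates that rule over a list of row dictionaries and keeps the first occurrence. The function name is hypothetical and "keep first" is an assumed policy. In pandas the same idea is `DataFrame.drop_duplicates(subset=..., keep="first")`, but this page does not say whether DataBeak uses that call internally.

```python
from typing import Iterable, Optional, Sequence


def remove_duplicates(rows: Iterable[dict], key_columns: Optional[Sequence[str]] = None) -> list:
    """Keep the first occurrence of each row; compare only key_columns when given (hypothetical)."""
    seen = set()
    unique = []
    for row in rows:
        # With no key columns, every column in the row participates in the comparison.
        columns = key_columns if key_columns else sorted(row)
        # Pair each value with its column name so rows with different columns never collide.
        # Cell values must be hashable for the set lookup.
        key = tuple((col, row.get(col)) for col in columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


rows = [
    {"id": 1, "name": "Ada"},
    {"id": 1, "name": "Ada L."},
    {"id": 2, "name": "Ada"},
]
print(remove_duplicates(rows, key_columns=["id"]))  # keeps the first id=1 row and the id=2 row
print(len(remove_duplicates(rows)))                 # 3: no two rows match on every column
```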
📊 Data Analysis
Tools for statistical analysis and insights:
- `get_statistics` - Descriptive statistics for numeric columns
- `get_column_statistics` - Detailed statistics for specific columns
- `get_correlation_matrix` - Pearson, Spearman, and Kendall correlations
- `group_by_aggregate` - Group data with aggregation functions
- `get_value_counts` - Frequency counts for categorical data
- `detect_outliers` - Find outliers using IQR, Z-score, or custom methods
- `profile_data` - Comprehensive data profiling report
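The two named outlier methods rest on simple rules. IQR flags values outside `[Q1 - k*IQR, Q3 + k*IQR]`. Z-score flags values whose distance from the mean exceeds a threshold number of standard deviations. The standard-library sketch below implements both. The defaults `k = 1.5` and `threshold = 3.0` are common conventions assumed here, not DataBeak's documented parameters, and the function names are hypothetical.

```python
import statistics
from typing import List


def iqr_outliers(values: List[float], factor: float = 1.5) -> List[float]:
    """Return values outside [Q1 - factor*IQR, Q3 + factor*IQR]."""
    # method="inclusive" uses linear interpolation between data points (like pandas' default quantile).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if v < low or v > high]


def zscore_outliers(values: List[float], threshold: float = 3.0) -> List[float]:
    """Return values whose absolute z-score (population std dev) exceeds threshold."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical: nothing can be an outlier
    return [v for v in values if abs(v - mean) / stdev > threshold]


readings = [10, 12, 11, 13, 12, 11, 100]
print(iqr_outliers(readings))          # [100]
print(zscore_outliers(readings))       # []
print(zscore_outliers(readings, 2.0))  # [100]
```

With only seven points, the extreme value 100 inflates both the mean and the standard deviation, so its z-score is only about 2.45 and it passes the 3.0 cutoff. The quartile-based IQR rule is less affected by that single value, which is why the two methods can disagree on small samples.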
✅ Data Validation
Tools for schema validation and quality checking:
- `validate_schema` - Validate data against schema definitions
- `check_data_quality` - Overall data quality scoring
- `find_anomalies` - Detect statistical and pattern anomalies
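This page does not show DataBeak's schema definition format. The sketch below illustrates the general idea of schema validation under a hypothetical format: a mapping from column name to expected Python type, plus a set of columns allowed to be null. It reports violations as readable messages instead of raising on the first error. The function name and the schema shape are assumptions.

```python
from typing import Any, Dict, FrozenSet, List


def validate_rows(
    rows: List[Dict[str, Any]],
    schema: Dict[str, type],
    nullable: FrozenSet[str] = frozenset(),
) -> List[str]:
    """Return human-readable schema violations; an empty list means the data is valid."""
    errors = []
    for index, row in enumerate(rows):
        for column, expected in schema.items():
            if column not in row:
                errors.append(f"row {index}: missing column {column!r}")
                continue
            value = row[column]
            if value is None:
                if column not in nullable:
                    errors.append(f"row {index}: {column!r} is null")
            # Note: bool passes an int check because bool subclasses int.
            elif not isinstance(value, expected):
                errors.append(
                    f"row {index}: {column!r} expected {expected.__name__}, got {type(value).__name__}"
                )
    return errors


schema = {"age": int, "status": str}
rows = [{"age": 30, "status": "active"}, {"age": "thirty", "status": None}]
for problem in validate_rows(rows, schema):
    print(problem)
```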
🔄 Session Management
Tools for managing data sessions:
- `list_sessions` - List all active sessions
- `close_session` - Close and clean up a session
- `get_session_info` - Get session metadata and statistics
⚙️ System Tools
System information and health monitoring:
- `health_check` - Server health and status
- `get_server_info` - Server capabilities and configuration
Common Patterns
Error Handling
All tools return a consistent response format:
Error responses:
Session Management
Most tools require a `session_id` parameter. Sessions are created and managed automatically, and they expire after a configurable timeout (`DATABEAK_SESSION_TIMEOUT`, 3600 seconds by default).
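A session registry with timeouts can be sketched as a dictionary from session ID to data and last-access time, with expired entries evicted on lookup. The `SessionStore` class below is hypothetical. It treats the timeout as idle time since the last access, which is an assumption because this page does not say whether DataBeak's timeout is idle-based or absolute. The default of 3600 seconds matches `DATABEAK_SESSION_TIMEOUT`.

```python
import time
import uuid
from typing import Any, Callable, Dict, Optional, Tuple


class SessionStore:
    """Hypothetical in-memory session registry with idle-timeout expiry."""

    def __init__(self, timeout_seconds: float = 3600, clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_seconds = timeout_seconds
        self._clock = clock  # injectable so tests can control time
        self._sessions: Dict[str, Tuple[Any, float]] = {}  # id -> (data, last access time)

    def create(self, data: Any) -> str:
        session_id = uuid.uuid4().hex
        self._sessions[session_id] = (data, self._clock())
        return session_id

    def get(self, session_id: str) -> Optional[Any]:
        """Return the session's data, or None if it is unknown or has expired."""
        self._evict_expired()
        entry = self._sessions.get(session_id)
        if entry is None:
            return None
        data, _ = entry
        self._sessions[session_id] = (data, self._clock())  # touching a session keeps it alive
        return data

    def close(self, session_id: str) -> bool:
        """Remove a session; return True if it existed."""
        return self._sessions.pop(session_id, None) is not None

    def _evict_expired(self) -> None:
        now = self._clock()
        expired = [sid for sid, (_, last) in self._sessions.items() if now - last > self.timeout_seconds]
        for sid in expired:
            del self._sessions[sid]
```

Injecting `clock` lets a test advance time deterministically instead of sleeping. The default `time.monotonic` is unaffected by wall-clock adjustments.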
Data Types
DataBeak supports rich data types, including:
- Strings, numbers, and booleans
- Dates and datetime objects
- Null values (JSON `null` → Python `None` → pandas `NaN`)
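The null chain above can be illustrated with the standard library alone. `json.loads` turns JSON `null` into Python `None`, and a numeric column stores missing values as `NaN`, the same way a pandas float column does. On the way back out, `NaN` has to become `None` again, because `json.dumps` writes a non-standard `NaN` token by default. The helper names `to_numeric` and `to_json_safe` are hypothetical.

```python
import json
import math
from typing import List, Optional


def to_numeric(values: List[Optional[float]]) -> List[float]:
    """JSON null arrives as Python None; store it as NaN, as a pandas float column would."""
    return [math.nan if v is None else float(v) for v in values]


def to_json_safe(values: List[float]) -> List[Optional[float]]:
    """Turn NaN back into None so json.dumps writes null instead of the non-standard NaN."""
    return [None if isinstance(v, float) and math.isnan(v) else v for v in values]


parsed = json.loads('{"price": [1.5, null, 3]}')
prices = to_numeric(parsed["price"])
print(prices)                            # [1.5, nan, 3.0]
print(json.dumps(to_json_safe(prices)))  # [1.5, null, 3.0]
```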
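Filtering Conditions
Filter operations accept a list of conditions combined by a `"logic"` field. The value is either `"AND"` (every condition must match) or `"OR"` (at least one condition must match):
```json
{
  "conditions": [
    {"column": "age", "operator": ">", "value": 18},
    {"column": "status", "operator": "==", "value": "active"}
  ],
  "logic": "AND"
}
```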
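The sketch below shows how a condition list like the one above could be evaluated against row dictionaries. Each condition produces a boolean, and the results are combined with `all` for AND or `any` for OR. The source only shows `>` and `==`, so the other operators in the table are assumptions. Treating a null cell as failing every condition and defaulting `logic` to AND are also assumptions. The functions `row_matches` and `apply_filter` are hypothetical and are not DataBeak's `filter_rows`.

```python
import operator
from typing import Any, Callable, Dict, List

# Hypothetical operator table: only ">" and "==" appear in the example above.
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def row_matches(row: Dict[str, Any], spec: Dict[str, Any]) -> bool:
    """Evaluate a {"conditions": [...], "logic": "AND" | "OR"} spec against one row."""
    results = []
    for cond in spec["conditions"]:
        cell = row.get(cond["column"])
        if cell is None:
            # Assumption: a null (or missing) cell fails every condition.
            results.append(False)
        else:
            results.append(OPERATORS[cond["operator"]](cell, cond["value"]))
    logic = spec.get("logic", "AND").upper()  # defaulting to AND is an assumption
    if logic == "AND":
        return all(results)
    if logic == "OR":
        return any(results)
    raise ValueError(f"unsupported logic: {logic!r}")


def apply_filter(rows: List[Dict[str, Any]], spec: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the rows that satisfy the filter spec, preserving order."""
    return [row for row in rows if row_matches(row, spec)]


people = [
    {"age": 30, "status": "active"},
    {"age": 15, "status": "active"},
    {"age": None, "status": "active"},
]
spec = {
    "conditions": [
        {"column": "age", "operator": ">", "value": 18},
        {"column": "status", "operator": "==", "value": "active"},
    ],
    "logic": "AND",
}
print(apply_filter(people, spec))  # [{'age': 30, 'status': 'active'}]
```

Switching `"logic"` to `"OR"` keeps all three rows here, because each one has `status == "active"`.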
Environment Configuration
All tools respect the following environment variables, which share the `DATABEAK_` prefix:
| Variable | Default | Purpose |
|---|---|---|
| `DATABEAK_SESSION_TIMEOUT` | 3600 | Session timeout (seconds) |
| `DATABEAK_MAX_DOWNLOAD_SIZE_MB` | 100 | Maximum URL download size (MB) |
| `DATABEAK_MAX_MEMORY_USAGE_MB` | 1000 | Maximum DataFrame memory (MB) |
| `DATABEAK_MAX_ROWS` | 1,000,000 | Maximum DataFrame rows |
| `DATABEAK_URL_TIMEOUT_SECONDS` | 30 | URL download timeout (seconds) |
| `DATABEAK_HEALTH_MEMORY_THRESHOLD_MB` | 2048 | Health monitoring memory threshold (MB) |
See `DatabeakSettings` for all configuration options.
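The prefix-plus-default lookup behind the environment table can be sketched with a dataclass. Each field maps to `DATABEAK_<FIELD_NAME>`, and an override is applied only when that variable is set. The `Settings` class and `load_settings` function are hypothetical stand-ins. The real `DatabeakSettings` may be built differently and may validate values in its own way.

```python
import os
from dataclasses import dataclass, fields
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical stand-in for DatabeakSettings, using the documented defaults."""

    session_timeout: int = 3600
    max_download_size_mb: int = 100
    max_memory_usage_mb: int = 1000
    max_rows: int = 1_000_000
    url_timeout_seconds: int = 30
    health_memory_threshold_mb: int = 2048


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Apply DATABEAK_<FIELD> overrides from env (os.environ by default)."""
    source = os.environ if env is None else env
    overrides = {}
    for field in fields(Settings):
        raw = source.get(f"DATABEAK_{field.name.upper()}")
        if raw is not None:
            overrides[field.name] = int(raw)  # every option in the table is an integer
    return Settings(**overrides)


print(load_settings({"DATABEAK_MAX_ROWS": "500000"}).max_rows)  # 500000
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the function easy to test.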
Advanced Features
Null Value Support
Full support for null values across all operations:
- JSON `null` values are preserved and handled correctly
- Python `None` and pandas `NaN` compatibility
- Filtering and other operations work seamlessly with nulls
Stateless Architecture
Clean MCP server design:
- Session-based processing - Data operations without internal state
- External persistence - Context handles data persistence as needed
- Resource efficient - No overhead from history or auto-save tracking
- MCP-aligned - Follows Model Context Protocol server patterns
For detailed examples and tutorials, see the Quick Start Guide.