Real-world performance tuning from scaling a MariaDB cluster to handle 2M+ queries/day. This isn't theory - it's what actually worked in production.
Before optimization:
- Query time: 800ms average
- Slow queries: 15-20% of total
- Max concurrent connections: 50
- Frequent deadlocks
- Server load during peak: 8.5
After optimization:
- Query time: 35ms average (96% improvement)
- Slow queries: < 1%
- Max concurrent connections: 500
- Zero deadlocks (proper indexing + isolation levels)
- Server load during peak: 2.1
Real impact: Scaled from 100 users to 300+ users on the same hardware.
- production-optimized.cnf - Production config for high-traffic apps
- replication-master.cnf - Master server optimizations
- replication-slave.cnf - Slave server tuning for read replicas
- development.cnf - Local development setup
- sysbench-test.sh - Benchmark your database
- query-profiling.sql - Find slow queries and bottlenecks
- index-analyzer.sql - Identify missing indexes
- performance-report.sh - Generate comprehensive performance report
- add-indexes.sql - Indexes that made the biggest difference
- table-partitioning.sql - Partition large tables for better performance
- query-optimization.sql - Before/after query examples
# Install sysbench
apt-get install sysbench
# Run benchmark
./benchmarks/sysbench-test.sh
# Results will show:
# - Read/write throughput
# - Query latency (avg, 95th percentile, max)
# - Deadlocks and errors

-- Enable slow query log
source benchmarks/query-profiling.sql;
-- Wait 24 hours, then check slow queries
SELECT * FROM mysql.slow_log
WHERE query_time > 1
ORDER BY query_time DESC
LIMIT 10;

# Backup current config
cp /etc/mysql/mariadb.conf.d/50-server.cnf /etc/mysql/mariadb.conf.d/50-server.cnf.backup
# Copy optimized config
cp configs/production-optimized.cnf /etc/mysql/mariadb.conf.d/99-custom.cnf
# Restart MariaDB
systemctl restart mariadb
# Check if it started successfully
systemctl status mariadb

-- Analyze your tables
source benchmarks/index-analyzer.sql;
-- Add recommended indexes
source migration-scripts/add-indexes.sql;

Key settings that made the difference:
[mysqld]
# InnoDB Buffer Pool (most important setting)
# Set to 70-80% of available RAM for database-only servers
innodb_buffer_pool_size = 8G
innodb_buffer_pool_instances = 8
# Query cache (disabled in MariaDB 10.6+, use at connection layer instead)
query_cache_type = 0
query_cache_size = 0
# Connection settings
max_connections = 500
max_connect_errors = 1000000
thread_cache_size = 50
table_open_cache = 4000
# InnoDB optimizations
innodb_log_file_size = 512M
innodb_flush_log_at_trx_commit = 2 # Trade some durability for speed
innodb_flush_method = O_DIRECT
innodb_file_per_table = 1
# Temp tables
tmp_table_size = 64M
max_heap_table_size = 64M

Why these settings?
- innodb_buffer_pool_size: This is THE most important setting. Before: 128MB (default). After: 8GB. Query time dropped from 800ms to 200ms just from this.
- innodb_flush_log_at_trx_commit: Changed from 1 to 2. Reduces fsync calls. I lose max 1 second of transactions if server crashes, but gained 3x write speed.
- max_connections: Increased from 151 to 500 after monitoring connection spikes during peak hours.
- innodb_flush_method: O_DIRECT prevents double buffering (OS cache + InnoDB cache). Freed up 2GB RAM.
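After applying a config like this, it's worth verifying that the running server actually picked up the new values and that the buffer pool is being used. A quick sketch (these are all standard MariaDB/MySQL system and status variables):

```sql
-- Confirm the running server picked up the new values
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';

-- How much of the buffer pool is actually in use
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_total';
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_free';
```

If `pages_free` stays near `pages_total` after a day of real traffic, the pool is likely oversized for the working set; if it drops to zero immediately, check the buffer pool hit ratio covered later in this guide.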
Master configuration:
[mysqld]
server-id = 1
log_bin = /var/log/mysql/mysql-bin.log
binlog_format = ROW
expire_logs_days = 7
max_binlog_size = 100M
# Replication optimizations
sync_binlog = 0 # Performance over durability
innodb_flush_log_at_trx_commit = 2

Slave configuration:
[mysqld]
server-id = 2
read_only = 1
relay_log = /var/log/mysql/relay-bin
relay_log_recovery = 1
# Slave optimizations
slave_parallel_threads = 4 # Parallel replication
slave_parallel_mode = optimistic

-- Enable slow query log
SET GLOBAL slow_query_log = 1;
SET GLOBAL long_query_time = 1; -- Log queries > 1 second
-- After running for a while, analyze
SELECT
    sql_text,
    COUNT(*) AS execution_count,
    AVG(query_time) AS avg_time,
    MAX(query_time) AS max_time
FROM mysql.slow_log
GROUP BY sql_text
ORDER BY avg_time DESC
LIMIT 20;

-- Tables with full table scans
SELECT
    object_schema,
    object_name,
    count_star AS total_queries,
    count_read AS full_scans
FROM performance_schema.table_io_waits_summary_by_table
WHERE count_read > 1000
ORDER BY count_read DESC;
-- Queries not using indexes
SELECT
    digest_text,
    count_star,
    sum_rows_examined,
    sum_rows_sent
FROM performance_schema.events_statements_summary_by_digest
WHERE sum_rows_examined > sum_rows_sent * 100
ORDER BY sum_rows_examined DESC
LIMIT 20;

The buffer pool hit ratio should be > 99% for good performance:
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';
-- Calculate hit ratio:
-- ((Innodb_buffer_pool_read_requests - Innodb_buffer_pool_reads) /
-- Innodb_buffer_pool_read_requests) * 100

I learned this the hard way:
-- ❌ WRONG: Separate indexes
CREATE INDEX idx_user ON orders (user_id);
CREATE INDEX idx_status ON orders (status);
-- ✅ CORRECT: Composite index
CREATE INDEX idx_user_status ON orders (user_id, status);
-- This query benefits from composite index:
SELECT * FROM orders WHERE user_id = 123 AND status = 'pending';
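One way to check that the composite index is actually chosen is `EXPLAIN` (a sketch; the exact output columns vary by MariaDB version, and the index name assumes the DDL above):

```sql
EXPLAIN SELECT * FROM orders WHERE user_id = 123 AND status = 'pending';
-- Expect key = idx_user_status and a small "rows" estimate;
-- key = NULL would indicate a full table scan.
```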
-- Index order matters! Put most selective column first
-- Exception: If you always filter by user_id, put it first

Include frequently selected columns:
-- Instead of:
CREATE INDEX idx_email ON users (email);
-- Do this if you always SELECT name with email lookup:
CREATE INDEX idx_email_name ON users (email, name);
-- Query is entirely satisfied by index (no table lookup needed)
SELECT name FROM users WHERE email = 'user@example.com';

These made the biggest difference:
-- Users table (40M+ rows)
CREATE INDEX idx_email ON users (email);
CREATE INDEX idx_created ON users (created_at);
CREATE INDEX idx_active_created ON users (is_active, created_at);
-- Tickets table (10M+ rows)
CREATE INDEX idx_user_status ON tickets (user_id, status);
CREATE INDEX idx_priority_created ON tickets (priority, created_at);
CREATE INDEX idx_assigned ON tickets (assigned_to, status);
-- Messages table (50M+ rows) - partitioned by date
CREATE INDEX idx_ticket ON messages (ticket_id, created_at);
CREATE INDEX idx_user ON messages (user_id, created_at);
ALTER TABLE messages ADD FULLTEXT idx_content (content);

For tables with time-series data:
-- Partition messages table by month
ALTER TABLE messages
PARTITION BY RANGE (YEAR(created_at) * 100 + MONTH(created_at)) (
    PARTITION p202401 VALUES LESS THAN (202402),
    PARTITION p202402 VALUES LESS THAN (202403),
    PARTITION p202403 VALUES LESS THAN (202404),
    -- Add partitions as needed
    PARTITION p_future VALUES LESS THAN MAXVALUE
);
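Range partitions need periodic maintenance as new months arrive. A hedged sketch of rolling the window forward, using the partition naming scheme above (run during low traffic, since it rebuilds the affected partitions):

```sql
-- Split p_future so new rows for April land in their own partition
ALTER TABLE messages REORGANIZE PARTITION p_future INTO (
    PARTITION p202404 VALUES LESS THAN (202405),
    PARTITION p_future VALUES LESS THAN MAXVALUE
);

-- Drop the oldest month once it is past retention
-- (removes all its rows instantly, far cheaper than DELETE)
ALTER TABLE messages DROP PARTITION p202401;
```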
-- Query performance improvement:
-- Before: 12s for date range query
-- After: 0.8s (only scans relevant partitions)

-- Check table sizes
SELECT
    table_schema,
    table_name,
    ROUND((data_length + index_length) / 1024 / 1024, 2) AS size_mb,
    table_rows
FROM information_schema.tables
WHERE table_schema NOT IN ('information_schema', 'performance_schema', 'mysql')
ORDER BY (data_length + index_length) DESC
LIMIT 20;
-- Check fragmentation
SELECT
    table_name,
    ROUND(data_length / 1024 / 1024, 2) AS data_mb,
    ROUND(data_free / 1024 / 1024, 2) AS free_mb,
    ROUND((data_free / data_length) * 100, 2) AS fragmentation_pct
FROM information_schema.tables
WHERE table_schema = 'your_database'
AND data_free > 0
ORDER BY fragmentation_pct DESC;
-- Optimize fragmented tables
OPTIMIZE TABLE your_table;

# Analyze tables for query optimizer
mysqlcheck --analyze --all-databases
# Check and repair tables
mysqlcheck --check --all-databases
# Update statistics
mysqlcheck --optimize --all-databases

-- On slave server
SHOW SLAVE STATUS\G
-- Check these values:
-- Seconds_Behind_Master: Should be < 5
-- Slave_IO_Running: Should be Yes
-- Slave_SQL_Running: Should be Yes

- Buffer pool size is everything - Increased from 128MB to 8GB, instant 4x speedup
- Composite indexes over multiple single indexes - Reduced query time by 10x
- Partitioning time-series data - Made old data queries usable again
- Connection pooling at app layer - Reduced connection overhead by 80%
- Read replicas for reports - Offloaded heavy analytics from master
- Query cache - Caused more problems than it solved, disabled it
- Too many indexes - Slowed down writes, removed 30% of indexes with no impact on reads
- Over-tuning - Spent days optimizing queries that ran 10 times/day
- Premature sharding - Added complexity without benefits, went with replication instead
- Not backing up before config changes - Crashed production at 2 AM. Lesson learned.
- Changing multiple settings at once - Couldn't tell what helped. Change one thing at a time.
- Ignoring slow query log - Optimized the wrong queries. Always profile first.
- Setting innodb_buffer_pool_size too high - Left no RAM for OS. Caused swapping. Bad.
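A rough sanity check before raising `innodb_buffer_pool_size`: compare it against total InnoDB data + index size and against machine RAM. A sketch using `information_schema` (a sizing rule of thumb, not a guarantee):

```sql
-- Total InnoDB data + index size
SELECT
    ROUND(SUM(data_length + index_length) / 1024 / 1024 / 1024, 2) AS total_gb
FROM information_schema.tables
WHERE engine = 'InnoDB';

-- Currently configured buffer pool, in GB
SELECT ROUND(@@innodb_buffer_pool_size / 1024 / 1024 / 1024, 2) AS pool_gb;

-- A pool much larger than total_gb wastes RAM; a pool above roughly
-- 70-80% of machine RAM risks swapping (the mistake above).
```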
- Never disable binary logging in production - You'll need it for point-in-time recovery
- Use separate replication user with minimal privileges
- Enable SSL for replication traffic
- Regular backups before making changes
- Test in staging before applying to production
- MariaDB Performance Tuning
- MySQL High Performance - Best book on the subject
- Percona Monitoring - Free monitoring tool
These configs worked for my workload (OLTP with 300 concurrent users, 2M+ queries/day). Your workload might be different. Always:
- Benchmark before and after
- Change one setting at a time
- Test in staging first
- Monitor for at least 24 hours after changes
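For the before/after comparison, snapshotting a few global counters before a change and again 24 hours later gives a simple baseline (a sketch; all of these are standard status variables):

```sql
SHOW GLOBAL STATUS WHERE Variable_name IN
    ('Questions', 'Slow_queries', 'Threads_connected',
     'Innodb_buffer_pool_reads', 'Created_tmp_disk_tables');
-- Diff the counters between snapshots: after a good change,
-- Slow_queries and Innodb_buffer_pool_reads should grow more
-- slowly relative to Questions.
```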
MIT - Use at your own risk. Always backup before making changes.
These optimizations went into production in 2024 on the Land of Promise Minecraft network. Still running strong today.