MariaDB Optimization Guide

Real-world performance tuning from scaling a MariaDB cluster to handle 2M+ queries/day. This isn't theory - it's what actually worked in production.

📊 Results

Before optimization:

  • Query time: 800ms average
  • Slow queries: 15-20% of total
  • Max concurrent connections: 50
  • Frequent deadlocks
  • Server load during peak: 8.5

After optimization:

  • Query time: 35ms average (96% improvement)
  • Slow queries: < 1%
  • Max concurrent connections: 500
  • Zero deadlocks (proper indexing + isolation levels)
  • Server load during peak: 2.1

Real impact: Scaled from 100 users to 300+ users on the same hardware.

🎯 What's Inside

Configuration Files

  • production-optimized.cnf - Production config for high-traffic apps
  • replication-master.cnf - Master server optimizations
  • replication-slave.cnf - Slave server tuning for read replicas
  • development.cnf - Local development setup

Benchmarking

  • sysbench-test.sh - Benchmark your database
  • query-profiling.sql - Find slow queries and bottlenecks
  • index-analyzer.sql - Identify missing indexes
  • performance-report.sh - Generate comprehensive performance report

Migration Scripts

  • add-indexes.sql - Indexes that made the biggest difference
  • table-partitioning.sql - Partition large tables for better performance
  • query-optimization.sql - Before/after query examples

🚀 Quick Start

1. Benchmark Current Performance

# Install sysbench
apt-get install sysbench

# Run benchmark
./benchmarks/sysbench-test.sh

# Results will show:
# - Read/write throughput
# - Query latency (avg, 95th percentile, max)
# - Deadlocks and errors

2. Analyze Your Queries

-- Enable slow query log
source benchmarks/query-profiling.sql;

-- Wait 24 hours, then check slow queries
SELECT * FROM mysql.slow_log 
WHERE query_time > 1 
ORDER BY query_time DESC 
LIMIT 10;

3. Apply Optimizations

# Backup current config
cp /etc/mysql/mariadb.conf.d/50-server.cnf /etc/mysql/mariadb.conf.d/50-server.cnf.backup

# Copy optimized config
cp configs/production-optimized.cnf /etc/mysql/mariadb.conf.d/99-custom.cnf

# Restart MariaDB
systemctl restart mariadb

# Check if it started successfully
systemctl status mariadb

4. Add Missing Indexes

-- Analyze your tables
source benchmarks/index-analyzer.sql;

-- Add recommended indexes
source migration-scripts/add-indexes.sql;

๐Ÿ“ Configuration Breakdown

production-optimized.cnf

Key settings that made the difference:

[mysqld]
# InnoDB Buffer Pool (most important setting)
# Set to 70-80% of available RAM for database-only servers
innodb_buffer_pool_size = 8G
innodb_buffer_pool_instances = 8

# Query cache (off by default in modern MariaDB; cache at the application/connection layer instead)
query_cache_type = 0
query_cache_size = 0

# Connection settings
max_connections = 500
max_connect_errors = 1000000
thread_cache_size = 50
table_open_cache = 4000

# InnoDB optimizations
innodb_log_file_size = 512M
innodb_flush_log_at_trx_commit = 2  # Trade some durability for speed
innodb_flush_method = O_DIRECT
innodb_file_per_table = 1

# Temp tables
tmp_table_size = 64M
max_heap_table_size = 64M

Why these settings?

  • innodb_buffer_pool_size: This is THE most important setting. Before: 128MB (default). After: 8GB. Query time dropped from 800ms to 200ms just from this.
  • innodb_flush_log_at_trx_commit: Changed from 1 to 2. Reduces fsync calls. I lose max 1 second of transactions if server crashes, but gained 3x write speed.
  • max_connections: Increased from 151 to 500 after monitoring connection spikes during peak hours.
  • innodb_flush_method: O_DIRECT prevents double buffering (OS cache + InnoDB cache). Freed up 2GB RAM.
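To put a number on the 70-80% buffer pool rule, here's a quick sketch that reads total RAM from /proc/meminfo (Linux only; the 75% figure is just the midpoint of the range, not a value from this guide's configs):

```shell
# Rough sizing helper (illustrative): suggest ~75% of total RAM
# for innodb_buffer_pool_size on a dedicated database server.
total_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
suggested_mb=$(( total_kb * 75 / 100 / 1024 ))
echo "suggested innodb_buffer_pool_size = ${suggested_mb}M"
```

Remember the lesson below about setting this too high - leave enough RAM for the OS and connection overhead.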

Replication Setup

Master configuration:

[mysqld]
server-id = 1
log_bin = /var/log/mysql/mysql-bin.log
binlog_format = ROW
expire_logs_days = 7
max_binlog_size = 100M

# Replication optimizations
sync_binlog = 0  # Performance over durability
innodb_flush_log_at_trx_commit = 2

Slave configuration:

[mysqld]
server-id = 2
read_only = 1
relay_log = /var/log/mysql/relay-bin
relay_log_recovery = 1

# Slave optimizations
slave_parallel_threads = 4  # Parallel replication
slave_parallel_mode = optimistic

๐Ÿ” Finding Performance Bottlenecks

Check Slow Queries

-- Enable slow query log
SET GLOBAL slow_query_log = 1;
SET GLOBAL long_query_time = 1;  -- Log queries > 1 second

-- After running for a while, analyze
SELECT 
    sql_text,
    COUNT(*) as execution_count,
    AVG(query_time) as avg_time,
    MAX(query_time) as max_time
FROM mysql.slow_log
GROUP BY sql_text
ORDER BY avg_time DESC
LIMIT 20;

Find Missing Indexes

-- Tables read via full scans (reads that used no index)
SELECT 
    object_schema,
    object_name,
    count_read AS rows_read_without_index
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE index_name IS NULL
  AND count_read > 1000
ORDER BY count_read DESC;

-- Queries not using indexes
SELECT 
    digest_text,
    count_star,
    sum_rows_examined,
    sum_rows_sent
FROM performance_schema.events_statements_summary_by_digest
WHERE sum_rows_examined > sum_rows_sent * 100
ORDER BY sum_rows_examined DESC
LIMIT 20;

Monitor Buffer Pool Hit Ratio

Should be > 99% for good performance:

SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';

-- Calculate hit ratio:
-- ((Innodb_buffer_pool_read_requests - Innodb_buffer_pool_reads) / 
--   Innodb_buffer_pool_read_requests) * 100
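The formula in the comment can be computed directly; a small sketch with illustrative counter values (substitute the two numbers from your own SHOW GLOBAL STATUS output):

```shell
# Compute the buffer pool hit ratio from the two status counters.
reads=1200          # Innodb_buffer_pool_reads (reads that hit disk)
requests=1500000    # Innodb_buffer_pool_read_requests (logical reads)

hit_ratio=$(awk -v r="$reads" -v q="$requests" \
    'BEGIN { printf "%.2f", (q - r) / q * 100 }')
echo "hit ratio: ${hit_ratio}%"   # below ~99% suggests a bigger buffer pool
```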

💡 Indexing Strategy

Composite Index Rules

I learned this the hard way:

-- โŒ WRONG: Separate indexes
CREATE INDEX idx_user ON orders (user_id);
CREATE INDEX idx_status ON orders (status);

-- ✅ CORRECT: Composite index
CREATE INDEX idx_user_status ON orders (user_id, status);

-- This query benefits from composite index:
SELECT * FROM orders WHERE user_id = 123 AND status = 'pending';

-- Index order matters! Put most selective column first
-- Exception: If you always filter by user_id, put it first

Covering Indexes

Include frequently selected columns:

-- Instead of:
CREATE INDEX idx_email ON users (email);

-- Do this if you always SELECT name with email lookup:
CREATE INDEX idx_email_name ON users (email, name);

-- Query is entirely satisfied by index (no table lookup needed)
SELECT name FROM users WHERE email = 'user@example.com';

Indexes I Added

These made the biggest difference:

-- Users table (40M+ rows)
CREATE INDEX idx_email ON users (email);
CREATE INDEX idx_created ON users (created_at);
CREATE INDEX idx_active_created ON users (is_active, created_at);

-- Tickets table (10M+ rows)
CREATE INDEX idx_user_status ON tickets (user_id, status);
CREATE INDEX idx_priority_created ON tickets (priority, created_at);
CREATE INDEX idx_assigned ON tickets (assigned_to, status);

-- Messages table (50M+ rows) - partitioned by date
CREATE INDEX idx_ticket ON messages (ticket_id, created_at);
CREATE INDEX idx_user ON messages (user_id, created_at);
ALTER TABLE messages ADD FULLTEXT idx_content (content);

🔧 Table Partitioning

For tables with time-series data:

-- Partition messages table by month
ALTER TABLE messages
PARTITION BY RANGE (YEAR(created_at) * 100 + MONTH(created_at)) (
    PARTITION p202401 VALUES LESS THAN (202402),
    PARTITION p202402 VALUES LESS THAN (202403),
    PARTITION p202403 VALUES LESS THAN (202404),
    -- Add partitions as needed
    PARTITION p_future VALUES LESS THAN MAXVALUE
);

-- Query performance improvement:
-- Before: 12s for date range query
-- After: 0.8s (only scans relevant partitions)
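Monthly partitions need ongoing maintenance. A sketch of the usual pattern (table and partition names follow the example above): carve new months out of the MAXVALUE catch-all rather than appending past it, and drop old months instead of deleting rows.

```sql
-- Carve the next month out of the catch-all partition.
ALTER TABLE messages
REORGANIZE PARTITION p_future INTO (
    PARTITION p202404 VALUES LESS THAN (202405),
    PARTITION p_future VALUES LESS THAN MAXVALUE
);

-- Dropping an old month is near-instant compared to DELETE:
ALTER TABLE messages DROP PARTITION p202401;
```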

📈 Monitoring & Maintenance

Daily Health Check

-- Check table sizes
SELECT 
    table_schema,
    table_name,
    ROUND((data_length + index_length) / 1024 / 1024, 2) AS size_mb,
    table_rows
FROM information_schema.tables
WHERE table_schema NOT IN ('information_schema', 'performance_schema', 'mysql')
ORDER BY (data_length + index_length) DESC
LIMIT 20;

-- Check fragmentation
SELECT 
    table_name,
    ROUND(data_length / 1024 / 1024, 2) AS data_mb,
    ROUND(data_free / 1024 / 1024, 2) AS free_mb,
    ROUND((data_free / data_length) * 100, 2) AS fragmentation_pct
FROM information_schema.tables
WHERE table_schema = 'your_database'
AND data_free > 0
ORDER BY fragmentation_pct DESC;

-- Optimize fragmented tables
OPTIMIZE TABLE your_table;

Weekly Tasks

# Analyze tables for query optimizer
mysqlcheck --analyze --all-databases

# Check tables for corruption
mysqlcheck --check --all-databases

# Defragment and rebuild tables (heavy operation - schedule off-peak)
mysqlcheck --optimize --all-databases

Monitor Replication Lag

-- On slave server
SHOW SLAVE STATUS\G

-- Check these values:
-- Seconds_Behind_Master: Should be < 5
-- Slave_IO_Running: Should be Yes
-- Slave_SQL_Running: Should be Yes
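Those three checks can be wrapped into a cron-able script. A sketch (the canned status text stands in for real `mysql -e 'SHOW SLAVE STATUS\G'` output; the 5-second threshold matches the guideline above):

```shell
#!/bin/sh
MAX_LAG=5

# Parse SHOW SLAVE STATUS\G output and report replication health.
check_lag() {
    lag=$(printf '%s\n' "$1" | awk -F': ' '/Seconds_Behind_Master/ {print $2}')
    io=$(printf '%s\n' "$1" | awk -F': ' '/Slave_IO_Running/ {print $2}')
    if [ "$io" != "Yes" ]; then
        echo "CRITICAL: replication IO thread not running"
    elif [ "$lag" -gt "$MAX_LAG" ]; then
        echo "WARNING: slave is ${lag}s behind master"
    else
        echo "OK: lag ${lag}s"
    fi
}

# Illustrative input; in production, feed the real status output:
status="Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Seconds_Behind_Master: 2"
check_lag "$status"
```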

🎓 Lessons Learned

What Worked

  1. Buffer pool size is everything - Increased from 128MB to 8GB, instant 4x speedup
  2. Composite indexes over multiple single indexes - Reduced query time by 10x
  3. Partitioning time-series data - Made old data queries usable again
  4. Connection pooling at app layer - Reduced connection overhead by 80%
  5. Read replicas for reports - Offloaded heavy analytics from master

What Didn't Work

  1. Query cache - Caused more problems than it solved, disabled it
  2. Too many indexes - Slowed down writes, removed 30% of indexes with no impact on reads
  3. Over-tuning - Spent days optimizing queries that ran 10 times/day
  4. Premature sharding - Added complexity without benefits, went with replication instead

Mistakes I Made

  1. Not backing up before config changes - Crashed production at 2 AM. Lesson learned.
  2. Changing multiple settings at once - Couldn't tell what helped. Change one thing at a time.
  3. Ignoring slow query log - Optimized the wrong queries. Always profile first.
  4. Setting innodb_buffer_pool_size too high - Left no RAM for OS. Caused swapping. Bad.

๐Ÿ” Security Notes

  • Never disable binary logging in production - You'll need it for point-in-time recovery
  • Use separate replication user with minimal privileges
  • Enable SSL for replication traffic
  • Regular backups before making changes
  • Test in staging before applying to production
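A minimal replication user along the lines of the second bullet (a sketch; the host pattern and password are placeholders):

```sql
-- Dedicated replication account with only the privilege it needs,
-- restricted to the replica subnet and forced onto TLS.
CREATE USER 'repl'@'10.0.0.%' IDENTIFIED BY 'change-me' REQUIRE SSL;
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'10.0.0.%';
```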

📚 Resources

๐Ÿค Contributing

These configs worked for my workload (OLTP with 300 concurrent users, 2M+ queries/day). Your workload might be different. Always:

  1. Benchmark before and after
  2. Change one setting at a time
  3. Test in staging first
  4. Monitor for at least 24 hours after changes

📄 License

MIT - Use at your own risk. Always backup before making changes.


These optimizations went into production in 2024 on the Land of Promise Minecraft network. Still running strong today.
