What's in *Your* Binlog?

Over the last couple of months I have run into a number of replication problems where I needed to run reports on MySQL binlogs to understand what sort of updates servers were processing as well as to compute peak and average throughput.   It seems that not even Maatkit has a simple tool to report on binlog contents, so I wrote a quick Perl script called analyze-binlog.pl to summary output from mysqlbinlog, which is the standard tool to dump binlogs to text. 

The script operates as a filter with the following syntax:
Usage: ./binlog-analyze.pl [-h] [-q] [-v]
Options:
-h : Print help
-q : Suppress excess output
-v : Print verbosely for debugging
To get a report, you just run mysqlbinlog on a binlog file and pipe the results into analyze-binlog.pl.  Here is typical invocation and output.  The -q option keeps the output as short as possible.
$ mysqlbinlog /var/lib/mysql/mysql-bin.001430 | ./binlog-analyze.pl -q
===================================
| SUMMARY INFORMATION |
===================================
Server Version : 5.0.89
Binlog Version : 4
Duration : 1:03:37 (3817s)

===================================
| SUMMARY STATISTICS |
===================================
Lines Read : 17212685
Events : 3106006
Bytes : 1073741952
Queries : 2235077
Xacts : 817575
Max. Events/Second: 5871.00
Max. Bytes/Second : 1990077.00
Max. Event Bytes : 524339
Avg. Events/Second: 813.73
Avg. Bytes/Second : 281305.20
Avg. Queries/Sec. : 585.56
Avg. Xacts/Sec. : 214.19
Max. Events Time : 9:01:02

===================================
| EVENT COUNTS |
===================================
Execute_load_query : 10
Intvar : 53160
Query : 2235077
Rotate : 1
Start : 1
User_var : 182
Xid : 817575

===================================
| SQL STATEMENT COUNTS |
===================================
begin : 817585
create temp table : 0
delete : 31781
insert : 20
insert into : 411266
select into : 0
update : 633857

There are lots of things to see in the report, so here are a few examples.  For one thing, peak update rates generate 5871 events and close to 2Mb of log output per second.  That's loaded but not enormously so--MySQL replication can easily dump over 10,000 events per second into the binlog using workhorse 4-core machines.  The application(s) connected to the database execute a large number of fast, short transactions--typical of data logging operations, for example storing session data.  We can also see from the Execute_load_query events that somebody executed MySQL LOAD DATA INFILE commands.  That's interesting to me because we are just putting them into Tungsten and need to look out for them in user databases.

To interprete the binlog report most effectively, you need to understand MySQL binlog event types.  MySQL replication developers have kindly provided a very helpful description of the MySQL binlog format that is not hard to read.  You'll need to refer to it if you get very deeply into binlog analysis.  It certainly beats reading the MySQL replication code, which is a bit of a thicket.

Anyway, I hope this script proves useful.  As you may have noted from the URL the script is checked into the Tungsten project on SourceForge and will be part of future releases.  I plan to keep tweaking it regularly to add features and fix bugs.  Incidentally, if you see any bugs let me know.  There are without doubt a couple left.