Quick Installation of Replication from MySQL to MongoDB

Proof-of-concept Tungsten support for MongoDB arrived last May, when I posted about our hackathon effort to replicate from MySQL to MongoDB.  That code then lay fallow for a few months while we worked on other things like parallel replication, but the period of idleness has ended.  Earlier this week I checked in fixes to Tungsten Replicator to add one-line installation support for MongoDB slaves.

MySQL to MongoDB replication will be officially supported in the Tungsten Replicator 2.0.5 build, which will be available in a few weeks.  However, you can try out MySQL to MongoDB replication right now.  Here is a quick how-to using my lab hosts logos1 for the MySQL master and logos2 for the MongoDB slave. 

1. Download the latest development build of Tungsten Replicator.   See the nightly builds page for S3 URLs.

$ cd /tmp
$ wget --no-check-certificate https://s3.amazonaws.com/files.continuent.com/builds/nightly/tungsten-2.0-snapshots/tungsten-replicator-2.0.5-332.tar.gz

2. Untar and cd into the release. 

$ tar -xzf tungsten-replicator-2.0.5-332.tar.gz
$ cd tungsten-replicator-2.0.5-332

3. Install a MySQL master replicator on a host that has MySQL installed and configured for row-based replication, i.e. binlog_format=row.  Note that you need to enable the colnames and pkey filters.  The colnames filter adds column names to row updates, and the pkey filter strips from update and delete queries every column other than those in the primary key.  Last but not least, ensure strings are converted to Unicode rather than transported as raw bytes; raw bytes are what we use in homogeneous MySQL replication to finesse character set issues.

$ tools/tungsten-installer --master-slave -a \
  --datasource-type=mysql \
  --master-host=logos1  \
  --datasource-user=tungsten  \
  --datasource-password=secret  \
  --service-name=mongodb \
  --home-directory=/opt/continuent \
  --cluster-hosts=logos1 \
  --mysql-use-bytes-for-string=false \
  --svc-extractor-filters=colnames,pkey \
  --svc-parallelization-type=disk --start-and-report
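
To make the effect of the two filters concrete, here is a minimal Python sketch.  The event layout (a dict with "key" and "values" entries) and both function names are hypothetical and invented for illustration; this is not Tungsten's internal event format or filter code.

```python
# Hypothetical, simplified model of a row UPDATE event.  This is NOT
# Tungsten's real event format, only an illustration of the two filters.

def add_column_names(event, table_columns):
    """Mimic the idea of 'colnames': attach names to positional values."""
    named = dict(event)
    named["key"] = dict(zip(table_columns, event["key"]))
    named["values"] = dict(zip(table_columns, event["values"]))
    return named

def keep_primary_key_only(event, pkey_columns):
    """Mimic the idea of 'pkey': keep only primary key columns in the
    part of an update or delete that identifies the row."""
    trimmed = dict(event)
    trimmed["key"] = {c: v for c, v in event["key"].items() if c in pkey_columns}
    return trimmed

raw_update = {
    "table": "bar",
    "action": "UPDATE",
    "key": [1, "hello from mysql"],   # before-image, by position
    "values": [1, "hello again"],     # after-image, by position
}
named = add_column_names(raw_update, ["id1", "data"])
filtered = keep_primary_key_only(named, {"id1"})
print(filtered["key"])      # only the primary key column remains
print(filtered["values"])   # new values, now keyed by column name
```

With only id1 left in the key, the slave can locate the target document by querying a single named property.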

4. Finally, install a MongoDB slave.  Before you do this, ensure mongod 1.8.x is up and running on the host as described in the original blog post on MySQL to MongoDB replication.   My mongod is running on the default port of 27017, so no --slave-port option is necessary.

$ tools/tungsten-installer --master-slave -a \
  --datasource-type=mongodb \
  --master-host=logos1  \
  --datasource-user=tungsten  \
  --datasource-password=secret  \
  --service-name=mongodb \
  --home-directory=/opt/continuent \
  --cluster-hosts=logos2 \
  --skip-validation-check=InstallerMasterSlaveCheck \
  --svc-parallelization-type=disk --start-and-report

That's it.  You can test replication by logging into MySQL on the master, adding a row to a table, and confirming it reaches the slave.   First the SQL commands:

$ mysql -utungsten -psecret -hlogos1 test
Welcome to the MySQL monitor.  Commands end with ; or \g.
...
mysql> create table bar(id1 int primary key, data varchar(30));
Query OK, 0 rows affected (0.15 sec)

mysql> insert into bar values(1, 'hello from mysql');
Query OK, 1 row affected (0.00 sec)

Now check the contents of MongoDB:  

$ mongo logos2:27017/test
MongoDB shell version: 1.8.3
connecting to: logos2:27017/test
system.indexes
> db.bar.find()
{ "_id" : ObjectId("4e85269484aef8fcae4b0010"), "id1" : "1", "data" : "hello from mysql" }

Voila!  We may still have bugs, but at least MySQL to MongoDB replication is now easy to install.   

Speaking of bugs, I have been fixing problems as they pop up in testing.  The most significant improvement is a feature I call auto-indexing on MongoDB slaves.  MongoDB materializes collections automatically when you put in the first update, but it does nothing about indexes.  My first TPC-B runs processed fewer than 100 transactions per second on the MongoDB slave, which is pretty pathetic. The bottleneck comes from MongoDB update operations of the form 'db.account.findAndModify(myquery,mydoc)'.  You must index the properties used in the query or things will be very slow.

Auto-indexing cures the update bottleneck by ensuring that there is an index corresponding to the SQL primary key for any table that we update.  MongoDB makes this logic very easy to implement--you can issue a command like 'db.account.ensureIndex({account_id:1})' to create an index.  What's really cool is that MongoDB will do this even if the collection is not yet materialized--e.g., before you load data.   It seems to be another example of how MongoDB collections materialize whenever you refer to them, which is a very useful feature.  
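
The bookkeeping behind auto-indexing can be sketched in a few lines of Python.  This is a simplified illustration, not the replicator's actual code: the AutoIndexer class, its before_update method, and the create_index callback are all hypothetical names, and the callback stands in for whatever issues the real ensureIndex command against MongoDB.

```python
class AutoIndexer:
    """Ensure an index exists on the primary key before the first update.

    `create_index` is a caller-supplied callable (hypothetical here); in a
    real applier it would issue something like
    db.<collection>.ensureIndex({<key>: 1}) against MongoDB.
    """

    def __init__(self, create_index):
        self._create_index = create_index
        self._indexed = set()   # (collection, key columns) already handled

    def before_update(self, collection, pkey_columns):
        signature = (collection, tuple(pkey_columns))
        if signature in self._indexed:
            return False        # index already requested; nothing to do
        self._create_index(collection, {col: 1 for col in pkey_columns})
        self._indexed.add(signature)
        return True

# Record the index requests instead of talking to a real server.
issued = []
indexer = AutoIndexer(lambda coll, spec: issued.append((coll, spec)))
indexer.before_update("account", ["account_id"])
indexer.before_update("account", ["account_id"])   # no second request
indexer.before_update("teller", ["teller_id"])
print(issued)
```

Remembering which keys were already handled keeps the index check off the hot path, so only the first update to each table pays for it.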

TPC-B updates into MongoDB are now running at over 1000 transactions per second on my test hosts. I plan to fix more bugs and goose up performance still further over the next few weeks.  Working with MongoDB is forcing us to unlearn assumptions within Tungsten, which is necessary if we are to work with non-relational databases.  It's great preparation for big game hunting next year:  replication to HBase and Cassandra.

Innovation Hobnob and Articles

"In one way or another the American is an improvisation, the character in a play of his or her own invention, hoping that the audience--fortunately consisting of actors as makeshift as oneself--will accept the performance at par, believe the instructions." ~Lewis Lapham

It's been a while since I hobnobbed with the innovation crowd here in Chicago, so I wandered into the House of Blues on Monday night to check out the 500+ attendees of the Chicago Innovation Awards' annual Nominee Reception. With more than 400 organizations nominated for their new products and services this year, the Chicago Innovation Awards, now in its 10th year, does a yeoman's job shining a spotlight on creativity in Chicagoland industries.

While Chicago may or may not be the innovation hub it desires to be, I enjoyed hearing speaker John Barron, publisher of the Chicago Sun-Times, wax eloquent about the grand innovations of this midwestern home of mine. He pointed to our brave history of reversing the flow of our river and inventing public conversation and public sobbing (Oprah), all with a brisk wind chill to focus our thoughts.

The truth is, innovation continues to be a leading conversation topic in cities, within companies and among politicians and writers throughout this country. And the innovation imperative remains strong--we must continue to change and invent, as we always have. As Lewis Lapham wrote in Harper's earlier this year, what truly unites Americans is not their pride or armies or GDP or common ancestry "but rather their complicity in a shared work of the imagination...If America is about nothing else, it is about making it up as one goes along."

Here are some recent articles from thinkers and improvisers trying to steer us through a bumpy ride of needed innovation:

*Tom Friedman is back with a new book, That Used To Be Us: How America Fell Behind in the World It Invented and How We Can Come Back. Click here for a link to a free chapter, interviews and more.

*Harvard Business School's Teresa Amabile (one of the leading researchers on creativity and one of my mentors) recently published a book, The Progress Principle: Using Small Wins to Ignite Joy, Engagement, and Creativity at Work--read more about it here. More from me on it in the near future.

*Did you miss Fast Company's 100 Most Creative People in Business issue? Check out the list here and a great guide to creativity by Conan O'Brien here.

*How did 9/11 spawn creativity and innovation? Read this Inc. article here.

*Innovation is dead, say PayPal founders. Check out this Forbes article here.

*Can innovation be part of a small company's everyday routine? Read this Crain's Chicago Business article. And check out the video below--can songwriting techniques help business?


Apple Grows Tablet Market Share

"Apple Inc. (NASDAQ:: AAPL) continues its success story with the iPad 2, as it has emerged as the leading tablet maker in the second quarter of 2011 in terms of global shipments. Research firm IDC confirmed that 9.3 million units of Apple’s iPad 2 being shipped during the quarter, netting a 68.3% market share worldwide, up from 65.7% in the previous quarter."
Bottom line: Apple iPad sales will explode in the third quarter. IDC estimates at least 62 million tablet shipments in 2011. With Apple gaining 70% market share, this means at least 45 million iPads will be sold in 2011. Apple has sold a little over 14 million iPads through the first two quarters. This means another 31 million iPads will be sold in the last two quarters. Recently, Apple has stepped up its local TV advertising for the iPad 2 and is poised to sell significant numbers of the iPad 2 at home in the US and worldwide! Apple will also announce the iPad later in the 4th quarter, along with the shipment of iPhone 5. Finally, Apple iCloud is also rolling out in the 4th quarter. All eyes are on Apple to deliver another home run out of the ballpark!
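
The estimate above is simple arithmetic, and a short Python check makes the assumptions explicit.  The inputs are the rounded figures from the text, and the variable names are chosen here for illustration.  Note that exactly 70% of 62 million comes to about 43.4 million, so the 45 million figure implies a share a little above 70%.

```python
total_tablets_2011 = 62_000_000    # IDC estimate quoted above
apple_share = 0.70                 # market share assumed in the text
sold_first_half = 14_000_000       # "a little over 14 million", rounded
projected_in_text = 45_000_000     # full-year iPad figure stated in the text

ipads_at_70_percent = round(total_tablets_2011 * apple_share)
implied_share = projected_in_text / total_tablets_2011
remaining_second_half = projected_in_text - sold_first_half

print(ipads_at_70_percent)            # units at exactly 70% share
print(round(implied_share * 100, 1))  # share (%) implied by 45 million
print(remaining_second_half)          # units left for the last two quarters
```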

How does Apple, the #1 innovative company in the world, innovate and create game-changing innovations such as the iPod, iTunes, iPhone, iPad and more? What is Apple's secret recipe for innovation success?


Download Apple's Innovation Strategy, and learn how Apple became the #1 innovator through:

• Creativity and Innovation
• Innovation in Products
• Innovation in Business Model
• Innovation in Customer Experience
• Innovation and Leadership
• Steve Jobs Visionary Leadership
Revised in 2011! Steve Jobs interview

References: http://finance.yahoo.com/news/Apple-Leads-Tablet-zacks-76625781.html

What's Next for Tungsten Replicator

As Giuseppe Maxia recently posted, we released Tungsten Replicator 2.0.4 this week.  It has a raft of bug fixes and new features, of which one-line installations are the single biggest improvement.  I set up replicators dozens of times a day, and having a single command for standard cluster topologies is a huge step forward.  Kudos to Jeff Mace for getting this nailed down.

So what's next?  You can see what we are up to in general by looking at our issues list.  We cannot do everything at once, but here are the current priorities for Tungsten Replicator 2.0.5.
  • Parallel replication speed and robustness.  I'm currently working on eliminating choke points in performance (like this one) as well as eliminating corner cases that cause the replicator to require manual intervention, such as aging out logs that are still needed by slaves.  
  • Multi-master replication.  This includes better support for system of record architectures, many masters to one slave, and replication between the same databases on different sites.  Stephane Giron nailed a key MyISAM multi-master bug for the last release.  We will continue to polish this as we work through our current projects.   
  • Better installations for more types of databases.  Jeff recently hacked in support for PostgreSQL as well as Oracle slaves, and we are contemplating addition of MongoDB support.  Heterogeneous replication is getting simpler to set up.  
  • Filter usability.  Giuseppe has a list of improvements for filters, which are one of the most powerful Tungsten Replicator features but not as easy for non-developers to use as we would like.  Better installation support is first on the list followed by ability to load and unload dynamically.  
  • Data warehouse loading.  We have a design for fast data warehouse loading that I hope we'll be able to implement in the next few weeks.  Linas Virbalas has also been working on this problem along with a number of other heterogeneous projects for customers.  
This is a lot of work and not everything will necessarily be finished when 2.0.5 goes out.  However, I hope we'll make progress on all of them.  In case you are wondering how we pick things, replicator development is largely driven by customer projects.   If you have something you need in the replicator, please contact Continuent.

After this build we will... Er, let's get 2.0.5 done first.  Suffice it to say we have a long list of useful and interesting features to discuss in future blog articles.

The Inimitable Mr. Steven Jobs

There have been countless articles praising Steve Jobs since he announced his retirement from Apple on August 25th.  Most either catalogue Steve Jobs's many triumphs or assess the impact of his creativity on society.  Those are entertaining topics but not especially useful.  A more practical question is why Steve Jobs is so good at creating new products and whether the rest of us can imitate him.

Steve Jobs's best work seems to follow a repeated pattern.  Let's call it the Apple pattern, though of course it could just as well be the Pixar pattern or NeXT pattern:
  1. See the whole picture of some crucial human/technology interaction and recognize gaps.  
  2. Design products to fill those gaps that combine artistic sensibility and innovative technology.
  3. Get a large organization to implement designs in a way that makes the end result like the handiwork of a single highly-focused craftsman. 
    Two things about the pattern seem particularly striking.  First, Steve Jobs is a complete package.  I have been in the tech industry for over three decades and have met people who did one or at most two of these things at the level necessary to create products that move large markets.  Almost nobody does all three.  The fact that Steve is excellent in all areas simultaneously may be a root cause behind his long run of successes.

    Second, Jobs's ability to drive implementation teams is extraordinary.  Maybe it's just the manager in me, but I find his ability to pick the right people to run teams and to keep those teams pointed in a clear direction without product-destroying compromises quite remarkable.  This is far harder than generating ideas in the first place.  The heart of the Apple pattern is as much about understanding people as technology--not just users but the creators as well.  I have never heard Jobs make pronouncements on team management, but there is an excellent talk from Ed Catmull of Pixar that summarizes the tensions quite well.

    Steve Jobs is commonly compared to great inventors like Edison, Ford, and Disney.  When thinking about imitation, another parallel seems more illuminating:  John Churchill, Duke of Marlborough and hands-down the greatest English general of all time.


    A possible Jobs ancestor?
    Marlborough possessed a seldom equalled ability to see war as an integrated whole across geography and branches of arms, devise unexpected strategies to exploit the weaknesses of his enemies, and execute them flawlessly in the difficult conditions of early 18th Century campaigns.  Execution extended from handling fractious allies down to the painstaking work to ensure his men had proper meals after each day's march.  In other words:  analogous problem-solving abilities to Steve Jobs, translated into the field of warfare.   The parallel extends to the lavish praise of contemporaries and later historians.  Winston Churchill famously described Marlborough as follows.  
    He commanded the armies of Europe against France for ten campaigns. He fought four great battles and many important actions ... He never fought a battle that he did not win, nor besieged a fortress that he did not take ... He quitted war invincible.
    Grand problem-solvers like Marlborough and Jobs are sufficiently rare they tend to be one-offs who change society but leave no obvious successors.  English military superiority on the Continent waned after Marlborough's retirement.  Something similar will likely befall Apple after Jobs, current happy talk about product pipelines and cash position notwithstanding.  It is simply not possible to imitate Jobs by committee, which is effectively what will happen once he is completely absent.  The driving force is gone.

    That said, we can all imitate Steve Jobs, albeit on a smaller scale.  Many highly successful products start with a single person who conceives the idea and drives at least the first couple of iterations to completion.  Seeing the whole problem, applying innovative designs to solve it, and managing the team to get it done is a fundamental pattern that applies across a wide range of endeavors.  Here is just one of many examples.

    Many years ago at Sybase I worked for a manager named Mark Deppe.  Early in the 1990s Mark learned that Wall Street firms were patching together crude publish/subscribe messaging applications to move data between financial systems in order to speed up trades.   He recognized that there was a much better way to do this using log-based data replication and built the Sybase Replication Server product.  The Rep Server went on to generate hundreds of millions of dollars in sales.  It still sells well today, over 15 years later.  Mark was a great architect but also a great builder of teams.  He paid as much or more attention to hiring and managing people as he did to technology.  He trusted the people he hired, and he gave them the freedom and support to do great work.  At the same time Mark was also incredibly attentive to detail and did all the project management for the first releases himself.  Years later he said it was too important a task to hand off to anyone else.

    Mark Deppe was the best technical manager I ever worked for.  I have consciously imitated his best practices for many years.  Looking back it seems I was unconsciously imitating the Apple design pattern.  But perhaps that was not a complete coincidence.   Before joining Sybase Mark was at Apple where he worked with (guess who?) Steve Jobs.

    ------------------
    NOTE:  After this article was published I found the flow hard to understand and edited it a week or so later to make it more readable.  The argument is the same as before.