Falcon Feature Preview
Introduction
This Feature Preview is intended to let the community test some important performance improvements to the Falcon storage engine. The performance changes are listed below, along with key bug fixes.
Version 6.0.7
Online Add & Drop Index
Before this version, adding or dropping an index made the table unavailable for the duration of the operation. To add an index, MySQL first copied the data from the table to a temporary table, then recreated the existing indexes, then created the new index, then dropped the old table, and finally renamed the temporary table to the original table name. Dropping an index also required copying all the data and recreating all the surviving indexes.
Starting with 6.0.7, Falcon creates indexes on the fly, without interrupting read/write access to the table. A request to drop an index blocks until any queries using that index have completed; the space used by the index is then released without affecting other access.
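For example, with a hypothetical table orders, both of the following statements now run while other sessions continue to read and write the table:

    -- Built on the fly in 6.0.7; readers and writers are not blocked.
    ALTER TABLE orders ADD INDEX idx_customer (customer_id);

    -- Waits for queries using the index to finish, then releases its space.
    ALTER TABLE orders DROP INDEX idx_customer;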
Page Checksum Protection
By default, Falcon now checksums pages before writing them to disk and validates the checksum after reading the page. To turn off checksumming, set the parameter falcon_checksums to 'OFF'.
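For example, assuming falcon_checksums is settable as a global server variable (a reasonable assumption for a Falcon parameter, but an assumption nonetheless):

    -- Disable page checksumming; trades corruption detection for a little write speed.
    SET GLOBAL falcon_checksums = OFF;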
Serial Log File Truncation
Falcon writes its serial log to two files, alternately. While the Falcon front end writes to one file, the gopher threads move committed changes from the inactive file to the database. When all changes from the inactive file have been written to the database, Falcon switches and begins writing to the formerly inactive file, overwriting the obsolete entries there. Under some circumstances the switch may be delayed, causing the active file to become very large. A new parameter, falcon_serial_log_file_size, directs Falcon to truncate an oversized inactive file immediately before switching to it. The default is 10 megabytes.
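For example, to make the default 10-megabyte limit explicit in a test run (that the variable is dynamic and takes its value in bytes are both assumptions here):

    -- Truncate the inactive serial log file at the switch if it has grown past ~10 MB.
    SET GLOBAL falcon_serial_log_file_size = 10485760;  -- assumed units: bytes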
Version 6.0.6
Optimize Limit Queries
Normally, Falcon performs index look-ups in two stages: first it reads the index, setting bits in a sparse bitmap to indicate records that meet the indexed criteria; then it uses the bitmap to read the record data. That algorithm allows Falcon to make separate passes over the index and the data, and ensures that data is read in storage order. However, it also means that rows are returned in storage order, not index order.
For queries that use LIMIT and some queries that use GROUP BY or DISTINCT, the MySQL optimizer produces better plans if rows are returned in index order. Starting in 6.0.6, Falcon can return rows in index order if the optimizer indicates that the query will run faster.
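A sketch of the kind of query that benefits, using a hypothetical orders table with an index on created_at; Falcon can now walk the index in order and stop after ten rows instead of building a bitmap over every matching record:

    -- Index-ordered retrieval lets the LIMIT short-circuit the scan.
    SELECT order_id, customer_id, created_at
      FROM orders
     ORDER BY created_at DESC
     LIMIT 10;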
Online Add & Drop Column
This feature allows Falcon to add columns to an existing table without copying the table. Unfortunately, there is a bug in the implementation, and the feature is temporarily unavailable.
Record Cache Backlogging
Although it uses Multi-Version Concurrency Control (MVCC), Falcon stores only the most recently committed version of each record in the database. Old versions of records and uncommitted records are kept in the Record Cache in memory. While a transaction is active, Falcon must keep older record versions, even after new versions are committed, to support repeatable reads. Prior to 6.0.6, long-running transactions could cause Falcon to preserve huge numbers of old record versions, which could eventually fill the Record Cache completely. This feature allows chains of record versions, complete with their RecordVersion objects, to be written out to a special tablespace.
Tests of very high concurrency and tests containing longer transactions could get a "Record Memory Exhausted" error when the record cache filled with record versions.
The record backlog is a tablespace designed to hold record versions from any and all tables. It contains a single table, indexed by the table identifier and record number of the chain of versions being stored. Both writing to the backlog and retrieving records from it are expensive operations compared with accessing them in the Record Cache. The backlog allows Falcon to continue operating in a low-memory situation, but it should not be used as an alternative to an adequately sized Record Cache. It is an emergency backup of records from the Record Cache, and its use should be considered a stopgap, like swapping or paging from system memory.
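The practical advice, then, is to size the record cache so that backlogging stays rare. As a sketch, assuming falcon_record_memory_max (the cap on record cache memory mentioned in the bug list below) can be set at runtime rather than only at startup:

    -- Give the record cache 512 MB; the value is purely illustrative.
    SET GLOBAL falcon_record_memory_max = 536870912;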
Version 6.0.5
Several performance improvements were made in this release (see the Downloads section below).
Supernodes
Each index page now carries an array of 16 supernodes: pointers to keys within the page that are stored fully expanded, with no prefix compression. This allows a page to be searched more quickly, using a binary search of the supernode keys followed by the normal sequential search over the narrowed range. Previously, the whole page was searched sequentially. This reduces the time spent searching key pages tremendously.
Thread Signaling
Bug#34890 documents a problem that existed in very high concurrency situations, in which a thread waiting on a SyncObject could miss its signal and then wait until the next time that SyncObject was locked and unlocked. If the SyncObject was a Transaction::syncObject, the transaction might complete without the waiter ever being signaled, producing a false wait-lock timeout. For other SyncObjects, the waiting thread would stall up to 10 seconds before retrying the lock. These hangs and stalls occurred on machines with multiple CPUs under high concurrency. They were very intermittent and timing dependent. Performance tests usually exhibited high deviation between identical runs, with the overall result of lowering performance.
Version 6.0.4
Performance Improvements
Previous alpha versions of Falcon showed performance problems in two areas that are addressed in the 6.0.4 release. The first was that forcing the pages written by a checkpoint operation to disk took longer than the period between checkpoint operations. The second was that the front half of Falcon, where transactions run, tended to get significantly ahead of the back half, which is responsible for integrating committed changes into the database. Fixing the first problem required a multi-step reworking of the I/O architecture of Falcon, which, to nobody's surprise, uncovered several surprises. Improving the speed of checkpoints helped the back end keep up with the front. Adding more gopher threads got the two ends running together again.
The measurements were made and the changes were tested on Linux. The solutions may change performance on Windows and other platforms, but as of this moment we don't know whether the changes are beneficial elsewhere, let alone what the degree of change may be. We're more hopeful for *nix platforms. We will work on performance on Windows in the future. We know lots of tricks there, too.
Pool of Asynchronous I/O Threads
Having gone through all that effort, adding a group of threads writing in parallel seemed like a reasonable next step. That, of course, led to the question: how many threads are enough? And that led to the next parameter, falcon_io_threads, which defaults to 2. When and if we discover a reliable algorithm for picking the right number of I/O threads, we would like to abandon this parameter.
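Until then, the count is hand-tuned; for example (assuming the variable can be changed at runtime rather than only at startup):

    -- Try four parallel page writers instead of the default two.
    SET GLOBAL falcon_io_threads = 4;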
Direct IO
When fsync was eliminated in version 6.0.3, Falcon used O_SYNC to force each page through the file system cache to disk. That led to much discussion of the unnecessary cost of copying pages from the Falcon page cache to the system file cache. So O_SYNC was changed to O_DIRECT. Then the two were compared and tested on different file systems. RAM disk doesn't support O_DIRECT, and at least one system showed O_SYNC being faster than O_DIRECT, so we compromised: O_DIRECT is used by default, Falcon falls back to O_SYNC where necessary, and the parameter falcon_direct_io allows an override.
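The override would look something like this; the accepted values are an assumption here, since the parameter simply selects between O_DIRECT and O_SYNC:

    -- Force the O_SYNC path on file systems where O_DIRECT performs poorly.
    SET GLOBAL falcon_direct_io = 0;  -- assumed encoding: 0 = O_SYNC, 1 = O_DIRECT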
Pool of Gopher Threads
Falcon has a front end and a back end. The two run largely asynchronously. The front end handles transactions from start to durable commit. The front end uses the page cache to read data into the record cache. A running transaction makes data changes in the record cache and index changes in its deferred indexes. During a prepare or commit, the transaction's changes move to the serial log. Once the serial log is flushed to disk, the transaction is durable.
At that point, the back end of Falcon starts moving committed changes from the serial log into the database where they are easier to find. In fact, unmoved changes remain in memory and continue to be referenced there until the gopher gets them into the database. If the system crashes before the gopher has emptied the serial log, the recovery process picks up where the gopher left off and nothing much happens until all committed changes are on disk.
As the front end got faster, the back end lagged, causing bizarre performance problems. The strangest was that running a complex memory-intensive query on the Information Schema made the system faster. From our tests, we believe that the query tied up the front end, allowing the back end to finish pushing committed changes into the database and releasing the transactions that had committed but were kept around until they became "write complete" - meaning that their changes were in the database.
Part of the solution was to get the single gopher some friends to share the load. The number of gopher threads is governed by yet another parameter, falcon_gopher_threads, which defaults to five.
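As with the I/O threads, the right number is workload dependent; for example (again assuming a runtime-settable variable):

    -- Add gophers if the back end falls behind during sustained update load.
    SET GLOBAL falcon_gopher_threads = 8;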
Thread Scheduler
Changing to asynchronous I/O threads uncovered another interesting situation. Both reads into the cache and writes to the serial log starved while checkpoints ran along briskly. So Falcon developed yet another characteristic of an operating system: an I/O thread scheduler that gives different priority to different operations.
Version 6.0.3 - Alpha
Eliminate fsync
Prior to Version 6.0.3, Falcon used buffered I/O to write the pages flushed in a checkpoint operation. When all pages were written, Falcon used an fsync to force them to disk. The frequency of checkpoints is determined by the parameter falcon_checkpoint_schedule which defaults to once every 30 seconds. The fsync often took longer than 30 seconds. The next checkpoint waited for the first to complete. Delayed checkpoints tended to have more work than those that occurred on schedule, so the delays propagated. [Think air traffic control with a squall line going from Chicago to Atlanta.]
Page consolidation
The last of the I/O performance improvements in 6.0.3 is page consolidation. It is much faster to write a large contiguous block in a single operation than in a sequence of page-sized writes. Unfortunately, the page cache doesn't necessarily keep pages in storage order. Nor should it. During the first pass over the cache, Falcon determines which of the pages to be written are actually contiguous. Those pages are moved to a write buffer and written in a single operation.
Feature Changes
New Settings
- FALCON_CONSISTENT_READ - Chooses between truly repeatable reads and InnoDB-style behavior for Repeatable Read transactions (see Falcon Repeatable Read below).
- FALCON_DIRECT_IO - Allows the user to select between O_DIRECT and O_SYNC
- FALCON_GOPHER_THREADS - Number of Gopher threads
- FALCON_IO_THREADS - Number of IO threads
- FALCON_LARGE_BLOB_THRESHOLD - Blobs below this threshold are stored in data pages instead of blob pages. This provides faster transaction durability since only the serial log needs to be written at the end of the transaction, not the blob pages.
- FALCON_LOCK_TIMEOUT - Specifies how long Falcon will make one transaction wait for another. The default is 0, which means wait indefinitely.
- FALCON_SERIAL_LOG_DIR - Allows the serial log to be placed on a separate disk.
- FALCON_SERIAL_LOG_PRIORITY - Allows the serial log to be written to at a higher priority.
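A quick way to see which of these settings your build exposes, and their current values:

    -- List all Falcon parameters known to the server.
    SHOW GLOBAL VARIABLES LIKE 'falcon%';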
Falcon Repeatable Read
The ISO SQL standard defines four isolation levels for transactions: Serializable, Repeatable Read, Read Committed, and Read Uncommitted. The definitions of the levels are based on the behavior of systems that lock records and ranges.
The ISO SQL Standard describes Repeatable Read transactions as having the isolation level provided by read/write record locks without locks on ranges. Reading the same record twice will always get the same value for its fields, but a select with the same criteria may get more records each time it runs. Oddly, the standard defines "Repeatable Read" as not repeatable.
However, Falcon and other engines that rely on Multi-Version Concurrency Control provide an isolation level that is completely repeatable but not serializable, because it allows some update anomalies. This is the mode a Falcon transaction gets when it chooses the Repeatable Read isolation level. Each transaction sees a stable snapshot of the records that were committed when the transaction started.
InnoDB's implementation of Repeatable Read includes an anomaly that causes a simple select statement to get different results from a select for update. A simple select sees the state of the database that was committed when the transaction began. A select for update sees all committed changes as of the instant it runs - effectively Read Committed mode. Update and delete statements also run in Read Committed mode.
This hybrid of Repeatable-Read and Read-Committed isolation levels improves throughput in a highly concurrent environment full of database updates. With careful coding, it also gets consistent results. In Falcon's normal Repeatable Read mode, a transaction cannot update or delete a record if it cannot select the most recent committed version. When that situation arises, the transaction gets an update conflict error and must roll back before the operation can succeed. By allowing select for update, update, and delete statements to access the most recently committed version of records, InnoDB allows more transactions to succeed, at the cost of possibly inconsistent results for improperly coded applications.
Falcon originally provided Repeatable-Read transactions that were consistent. In the 6.0.2 and 6.0.3 alpha releases, it emulated InnoDB. Now there is a setting that lets you choose between the two modes. The parameter falcon_consistent_read is on by default and provides truly repeatable reads. Turning the parameter off makes Repeatable Read transactions behave like InnoDB's.
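A sketch of the difference, using a hypothetical table t with columns id and qty. With falcon_consistent_read on, session 1 keeps its snapshot and the select for update draws an update conflict; with it off, the select for update sees session 2's committed change, InnoDB-style:

    -- Session 1
    START TRANSACTION;
    SELECT qty FROM t WHERE id = 1;             -- snapshot value, say 10

    -- Session 2, concurrently (autocommit)
    UPDATE t SET qty = 15 WHERE id = 1;

    -- Session 1, continued
    SELECT qty FROM t WHERE id = 1;             -- still 10: repeatable
    SELECT qty FROM t WHERE id = 1 FOR UPDATE;  -- consistent read on: update conflict error
                                                -- consistent read off: returns 15
    COMMIT;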
Bug Fixes
Bugs Fixed in Version 6.0.7
- Bug#35072 Falcon crash in RecoveryObjects::findRecoveryObject
- Bug#35939 Drift in Falcon row count reported by SHOW TABLE STATUS
- Bug#35991 Falcon assertion on TRUNCATE in Section::getSectionPage
- Bug#36703 Unknown symbol EncodedDataStream::decode when linking mysqld with Sun Studio
- Bug#36825 falcon_index_chill_threshold and falcon_record_chill_threshold have values in MB
- Bug#36990 Remove unsupported 'create tablespace' parameters and falcon_initial_allocation
- Bug#37622 Falcon does not compile on Solaris 9 on SPARC using Sun Studio compiler
- Bug#37726 Falcon crash in WalkDeferred::getNext
- Bug#38186 falcon_bug_31295 fails on pushbuild
- Bug#38535 AMD64 support for Falcon
- Bug#38556 Linking error when building Falcon as a shared library using Sun Studio compiler
- Bug#38594 Falcon crash in MemMgr and Sync object during exit of mysqld
- Bug#38743 falcon.falcon_tablespace_priv fails randomly on Windows
- Bug#38746 Falcon does not build on linux with valgrind enabled
Bugs Fixed in Version 6.0.6
- Bug#32287 Compiling mysql-6.0 on x86 assumes Falcon support
- Bug#32398 Falcon: tablespace file can be table file
- Bug#33933 Falcon assertionFailed() in fetchNext()
- Bug#34602 Falcon assertion in Transaction::commitNoUpdates
- Bug#35322 Falcon duplicate primary keys on updateable views
- Bug#35692 Running falcon_record_cache_memory_leak2-big.test crashes Falcon
- Bug#35768 Running DBT2 crashes Falcon; sometimes
- Bug#35929 MySQL 6.0.4 fails to compile with Sun Studio compiler due to using gcc options
- Bug#36097 Falcon: searches fail after repeated inserts
- Bug#36269 Thread stalls during DBT2 run
- Bug#36294 Assertion in Cache::writePage
- Bug#36296 Falcon: commitNoUpdates is sleeping too often
- Bug#36330 Falcon DBT2 crash in Transaction::needToLock
- Bug#36367 Falcon DBT2 crash in Cache::ioThread
- Bug#36368 Compiling storage/falcon/BigInt.cpp fails using Sun Studio 12 compiler
- Bug#36396 Assertion in IO::pread (on closed tablespace file)
- Bug#36400 Compiling Falcon on Solaris 10/x86 fails with Sun Studio 12
- Bug#36403 Compiling Falcon on Solaris fails due to dtrace
- Bug#36438 Falcon crash in Record::poke
- Bug#36467 Falcon assertion in ha_partition.cc: virtual int ha_partition::extra()
- Bug#36486 Falcon compilation fails on Solaris 10/x86
- Bug#36603 Falcon; Performance drop when ageGroup hits 2^31
- Bug#36620 Legacy leftovers in Falcon startup I/O
- Bug#36636 Falcon; missing fsync when compiled with HAVE_PREAD
- Bug#36745 Falcon crash on solaris
- Bug#36991 falcon_max_transaction_backlog has no effect
- Bug#37078 falcon_bug_26828 fails sometimes on Pushbuild
- Bug#37080 Falcon deadlock on concurrent insert and truncate
- Bug#37251 Livelock between UPDATE and DELETE threads
- Bug#37343 Assertion in IndexNode::parseNode, ASSERT(key - (UCHAR*) indexNode < 14);
- Bug#37344 Crash in IndexWalker::rebalanceDelete
- Bug#37587 falcon_bug_33404.test hangs forever
- Bug#37679 Falcon does not compile on OpenSolaris/Nevada using the Sun Studio compiler
- Bug#37725 Falcon: assertion in waitForTransaction "waitingFor was not NULL"
Bugs Fixed in Version 6.0.5
- Bug#33041 Cannot set FALCON_CONSISTENT_READ for local session.
- Bug#33484 "Create table ... engine=falcon" fails with error 156 but Falcon is present
- Bug#34085 Create table on falcon hangs when it cannot allocate memory for the page cache
- Bug#34351 "Record has changed since last read" error on non-overlapping transactions
- Bug#34486 Problem setting falcon_record_chill_threshold and falcon_index_chill_threshold
- Bug#34567 Falcon deadlock between ALTERs of temporary and non-temporary tables
- Bug#34632 Falcon assertion in Table::checkUniqueRecordVersion
- Bug#34778 Possible memory leak running UPDATEs in tight loop
- Bug#34890 falcon_bug_22150.test fails on Pushbuild
- Bug#34990 Falcon: falcon_bug_34351_A & falcon_bug_34251_C fail periodically
- Bug#35490 FALCON_DATABASE_IO should be FALCON_TABLESPACE_IO
- Bug#35538 Falcon three-way deadlock between scavenger, gopher and an insert
- Bug#35688 Falcon: Crash recovery failure with blob
- Bug#35982 Falcon crashes on concurrent load data infile
Bugs Fixed in Version 6.0.4
- Bug#22125 Falcon: Double precision searches fail if index exists
- Bug#22168 Inserting bad early dates
- Bug#22173 TRUNCATE does not reset auto_increment counter
- Bug#22564 auto_increment column gets automatically incremented
- Bug#27424 Falcon: crash if case sensitive database names
- Bug#27425 Falcon: case sensitive table names
- Bug#27426 Falcon: searches fail if datetime column and index exists
- Bug#29151 Falcon: running sysbench 0.4.8 leads to duplicate key errors
- Bug#29211 Falcon: information_schema has a falcon_tables view
- Bug#29452 Falcon: two-way deadlock with unique index and trigger
- Bug#29823 Falcon falcon_database_io table doesn't report stats for user DB's
- Bug#30281 Falcon: missing privilege check for dropping tablespace
- Bug#31005 Falcon: setting falcon_serial_log_dir has no effect
- Bug#31045 Error in compiling Falcon 6.0.0.2 alpha FreeBSD
- Bug#31110 Falcon: missing engine check while dropping tablespace
- Bug#31114 Falcon: creating tablespace with same name twice returns Unknown error -103
- Bug#31286 Falcon crashes when falcon_record_memory_max is exceeded
- Bug#31296 Falcon does not remove associated tablespace file.
- Bug#31490 The funcs_1 test "falcon_func_view" fails due to differences in a datetime col
- Bug#31671 Falcon engine does not support ROW or STATEMENT binlog_format
- Bug#31967 Falcon: hang changing falcon_record_memory_max
- Bug#32191 Memory overrun when using join buffering for falcon table with a blob
- Bug#32194 Falcon: incorrect count of changed rows
- Bug#32413 Memory usage not constrained by falcon_record_memory_max, assertion failure
- Bug#33517 Falcon crash on recovery
Bugs Fixed in Version 6.0.3 - Alpha
- Bug#30826 Falcon - Crash if OPTIMIZE PARTITION of a file with no records
- Bug#29332 Falcon deadlocks when running falcon_bug_28026.test
Known Open Bugs
Open, Verified, Analyzing, and In Progress Falcon bugs can be browsed in the MySQL bug database.
Downloads
Binary packages
Preview builds for Linux x86-64 and Windows-32 are available for download from https://downloads.mysql.com/forge/falcon_feature_preview/
Sources
Source code is available from our public Bzr trees at [1] - please consult the reference manual for more information on how to build a MySQL binary from a source tree.
The source tree for this feature preview is also found here. This source has been compiled on Linux 32-bit and 64-bit, FreeBSD 32-bit and 64-bit, Mac/Intel and Mac/PPC, Windows 32-bit and 64-bit, and Solaris/x86 and Solaris/SPARC.