Migrate large MySQL Databases using AWS DMS

Submitted by Eric on Wed, 03/28/2018 - 16:59
AWS DMS

AWS Database Migration Service is a very powerful tool to help you migrate databases, but it can be very complicated to configure. We will go over some general information and also some details that I discovered after some trial and error (as I didn't find any information anywhere), but in the end AWS DMS can help you save a lot of time on a migration.
During some tests we have done a migration from a MySQL bare metal server to a AWS RDS instance running Aurora it would take from 5 - 7 hours. Using AWS DMS we saw as little as 1 hour and half for a 50GB database, this is a huge improvement. So enough chit chat, let's get to it

Things you need to configure

  • Replication Instance

    This is what initiates the connection between the source and target databases, transfers the data and caches any changes that occur on the source database during the initial data load. It's recommended to shut down or down size the instance after any testing or the migration itself.

    You will have to inform namedescriptioninstance class (size, see chart below), VPC, if you want Multi-AZ (HA and redundant, helps on performance), if publicly accessible (needed to connect to the outside world, not needed if connecting to RDS).

    There is the advanced settings that include allocated storagereplication subnet groupAZVPC security group and KMS master key. Last settings is the maintenance where you can select a window for minor version upgrade.

    Replication Instance Type

    vCPU

    Memory (GB)

    General Purpose

    dms.t2.micro

    1

    1

    dms.t2.small

    1

    2

    dms.t2.medium

    2

    4

    dms.t2.large

    2

    8

    Compute Optimized

    dms.c4.large

    2

    3.75

    dms.c4.xlarge

    4

    7.5

    dms.c4.2xlarge

    8

    15

    dms.c4.4xlarge

    16

    30


  • Endpoints

The endpoint is where the replication instance will connect to, either to a source or target endpoint. Source endpoint is the database you are migrating FROM and target endpoint is the database you are migrating TO.
Endpoints can be on-premise, on RDS or in EC2.

First thing to configure is the type of the endpoint, source or target. Next steps are endpoint identifier (a name to it), source/target engine (select what kind of database you are migrate from/to), server name (IP or domain to connect to the database), port, SSL mode (if this is selected, then you need to select/add the CA certificate in the next step), user name and password (both used to connect to the database).
In the advanced settings we have e

  • Extra connection attributes

Attributes can be very helpful during the migration. You can set to disable foreign_key_checks on the target database before the migration, so there is no error (this is a must for Magento).
Make sure to read the documentation here: http://docs.aws.amazon.com/dms/latest/userguide/CHAP_Introduction.ConnectionAttributes.html


  • Tasks

A task is where all the work happens. You use tasks to specify what tables and schemas to use for your migration and to apply specific replication requirements to a database migration. You must have at least one source and one target database and a replication instance up and running.
To create a task (http://docs.aws.amazon.com/dms/latest/userguide/CHAP_GettingStarted.html#CHAP_GettingStarted.Tasks) you need to configure the task name, select the replication instance, source and target endpoint and the migration type. If you want to start the task right after it is created, select start task on create.
For the migration type you can select (NAME - API): (http://docs.aws.amazon.com/dms/latest/userguide/CHAP_Introduction.FullLoadandCDC.html)

  • Migrate existing data (Full Load)
  • Migrate existing data and replicate ongoing changes (full-load-and-cdc)
  • Replicate data changes only (CDC)

There are some limitations when modifying the task: (http://docs.aws.amazon.com/dms/latest/userguide/CHAP_Tasks.Modifying.html)

  • You cannot modify the source or target endpoint of a task.

  • You cannot change the migration type from CDC to either Full_Load or Full_Load_and_CDC.

  • You cannot change the migration type from Full Load to either CDC or Full_Load_and_CDC.

  • A task that have been run must have a status of Stopped or Failed to be modified.

Still inside tasks we have task settings and this has very important settings. They are:

  • Target table preparation mode:
    • Do nothing - Data and metadata of the target tables are not changed.

    • Drop tables on target - The tables are dropped and new tables are created in their place.

    • Truncate - Tables are truncated without affecting table metadata.
  • Stop task after full load completes (only for Full_Load_and_CDC):
    • Don't stop - Do not stop the task, immediately apply cached changes and continue on
    • Stop before applying cached changes - Stop the task prior to the application of cached changes. This will allow you to add secondary indexes which may speed the application of changes.
    • Stop after applying cached changes - Stop the task after cached changes have been applied. This will allow you to add foreign keys, triggers etc. if you are using Transactional Apply.
  • Include LOB columns in replication: (LOB Support: http://docs.aws.amazon.com/dms/latest/userguide/CHAP_Introduction.LOBSupport.html)
    • Don't include LOB columns - LOB columns will be excluded from the migration.
    • Full LOB mode - Migrate complete LOBs regardless of size. LOBs are migrated piecewise in chunks controlled by the LOB chunk size. This method is slower than using Limited LOB Mode.

    • Limited LOB mode - Truncate LOBs to ‘Max LOB Size’ This method is faster than using Full LOB Mode.

  • Max LOB size (kb):
    • In Limited LOB Mode, LOB columns which exceed the setting of Max LOB Size will be truncated to the specified Max LOB Size.
  • Table mappings: (http://docs.aws.amazon.com/dms/latest/userguide/CHAP_Tasks.CustomizingTasks.TableMapping.html)
    • Selection rules: Here you can select what schema, tables and columns you want to include or exclude in the import. You are also able to filter the column name to some condition.
    • Transformation rules: In this setting you are able to change name, remove or add prefix to the schema, table or column being imported.

Things to take in consideration

  • AWS provides a Schema Conversion Tool (AWS SCT) for when you need to convert from one source database to some different target database on RDS. More here: http://docs.aws.amazon.com/SchemaConversionTool/latest/userguide/Welcome.html
    T
    he convertion needs to be completed before you migrate using AWS DMS: http://docs.aws.amazon.com/SchemaConversionTool/latest/userguide/CHAP_SchemaConversionTool.DMSIntegration.html
  • AUTO_INCREMENT attribute on a column is NOT migrated to a target database column. This is very important during a migration. (http://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.MySQL.html#CHAP_Source.MySQL.Limitations)
    • If you need to import the AUTO_INCREMENT, then you must import a dump of the database with no data, only the schema.
    • To dump a database with no data, use the -d or --no-data (mysqldump -d -u someuser -p mydatabase > mydatabase.sql OR mysqldump --no-data -u someuser -p mydatabase > mydatabase.sql)
    • With the dump containing the schema only, you can import to your database using MySQL or some other UI (phpMyAdmin for example).
    • Another thing to take in consideration is the Next autoindexAfter importing the dump the next autoindex will be set to 1, but some tables might have to start at 0 and not 1. If this is the case, AWS DMS will import the row containing 0 as 1, but the row 1 won't be imported because it's duplicate. This happens with Magento, this is the list of tables (by default) that start at 0 and not 1:
      • core_store, core_store_group, core_website, customer_group
      • These columns will import the first row as 1 and not as 0. For Magento you can run this SQL after the migration is completed using DMS to move the rows back to 0 and create the correct ones as 1:

        SET FOREIGN_KEY_CHECKS=0;
        UPDATE `core_store` SET store_id = 0 WHERE code='admin';
        UPDATE `core_store_group` SET group_id = 0 WHERE name='Default';
        UPDATE `core_website` SET website_id = 0 WHERE code='admin';
        UPDATE `customer_group` SET customer_group_id = 0 WHERE customer_group_code='NOT LOGGED IN';
        INSERT INTO `core_store` (`store_id`, `code`, `website_id`, `group_id`, `name`, `sort_order`, `is_active`) VALUES
          (1, 'default', 1, 1, 'Default Store View', 0, 1);
        INSERT INTO `core_store_group` (`group_id`, `website_id`, `name`, `root_category_id`, `default_store_id`) VALUES
          (1, 1, 'Main Website Store', 1013, 1);
        INSERT INTO `core_website` (`website_id`, `code`, `name`, `sort_order`, `default_group_id`, `is_default`, `is_staging`, `master_login`, `master_password`, `visibility`) VALUES
          (1, 'base', 'Main Website', 0, 1, 1, 0, NULL, NULL, NULL);
        INSERT INTO `customer_group` (`customer_group_id`, `customer_group_code`, `tax_class_id`) VALUES
          (1, 'General', 3);
        SET FOREIGN_KEY_CHECKS=1;

         

          • Keep in mind that it will change to the name provided, so change to whatever name you need before using the SQL or just change inside Magento admin later.
        • You must set the target table preparation mode to TRUNCATE, so it keeps the schema during the migration.
      • The fastest import is using the Limited LOB Mode, but you must find out the LOB size from the source database or you might truncate some information during the migration.
        • If you don't know the size, put some higher number (12288 for example), it will still be faster than use the Full LOB mode.
        • For comparision, these are the numbers we had on WMG DB migration. In total we had 50GB migrated:
          Full LOB mode: 3 hours.
          Limited LOB MOde: 1 hour and 30 minutes.
      • If any table fail, don't stop the migration, simply select the table and drop and reload the migration: http://docs.aws.amazon.com/dms/latest/userguide/CHAP_Tasks.ReloadTables.html
      • When you create the task, there is an advanced settings where you can control the table settings (metadata) and also tune the settings with:
        • Maximum number of tables to load in parallel - Max of 49
        • Transaction consistency timeout (seconds)
        • Commit rate during full load
        • For comparision we used 35 parallels number of tables to load, 150 seconds for timeout and 50000 commit rate.

FacebookTwitterGoogle+Share
Category