IMPORT

The IMPORT statement imports tabular data (e.g., CSVs) into a single table.

Note:
For details about importing SQL dumps, see Import Data.

Requirements

Before using IMPORT, you should have:

  • The schema of the table you want to import.
  • The tabular data you want to import (e.g., CSV), preferably hosted on cloud storage. This location must be accessible to all nodes using the same address. This means that you cannot use a node's local file storage.

    For ease of use, we recommend using cloud storage. However, if that isn't readily available to you, we also have a guide on easily creating your own file server.

Details

Import Targets

Imported tables must not exist and must be created in the IMPORT statement. If the table you want to import already exists, you must drop it with DROP TABLE.

You can only import a single table at a time.

You can specify the target database in the table name in the IMPORT statement. If it's not specified there, the active database in the SQL session is used.
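For example, here is a minimal sketch of preparing a target and qualifying the database name (the test database is a placeholder, and the Azure URL is borrowed from the examples below):

> DROP TABLE IF EXISTS test.customers;

> IMPORT TABLE test.customers (
        id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
        name TEXT
)
CSV DATA ('azure://acme-co/customer-import-data.csv?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co')
;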

Create Table

Your IMPORT statement must include a CREATE TABLE statement (representing the schema of the data you want to import) using one of the following methods:

  • A reference to a file that contains a CREATE TABLE statement
  • An inline CREATE TABLE statement

We also recommend including all secondary indexes you want to use in the CREATE TABLE statement. It is possible to add secondary indexes later, but it is significantly faster to specify them during import.
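As a sketch, the file referenced by CREATE USING simply contains the CREATE TABLE statement, including any secondary indexes. For example, a hypothetical customer-create-table.sql might contain:

CREATE TABLE customers (
        id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
        name TEXT,
        INDEX name_idx (name)
);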

CSV Data

The tabular data to import must be valid CSV files, with the caveat that the comma delimiter can be set to another single character. In particular:

  • Files must be UTF-8 encoded.
  • If the delimiter (a comma by default), a double quote ("), newline (\n), or carriage return (\r) appears in a field, the field must be enclosed by double quotes.
  • If double quotes are used to enclose fields, then a double quote appearing inside a field must be escaped by preceding it with another double quote. For example: "aaa","b""bb","ccc"

CockroachDB-specific requirements:

  • If a column is of type BYTES, it can either be a valid UTF-8 string or a string literal beginning with the two characters \ and x. For example, a field whose value should be the bytes 1, 2 would be written as \x0102.
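Putting these rules together, a hypothetical CSV file for a table with columns (id INT, name TEXT, raw BYTES) might look like the following. The quoted second field contains a comma and an escaped double quote, and the BYTES column mixes a plain UTF-8 string with a \x-prefixed hex value:

1,"Smith, Jane",\x0102
2,"Bob ""Bobby"" Jones",hello
3,carol,\x00ff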

Object Dependencies

When importing tables, keep the following rules in mind, because IMPORT creates only a single new table, which must not already exist:

  • Objects that the imported table depends on must already exist
  • Objects that depend on the imported table can only be created after the import completes
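For example, a view that selects from the imported table depends on it, so (as a minimal sketch, with a hypothetical view name) it could only be created once the IMPORT of customers has finished:

> CREATE VIEW customer_names AS SELECT name FROM customers;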

Available Storage Requirements

Each node in the cluster is assigned an equal part of the converted CSV data, and so must have enough temp space to store it. In addition, data is persisted as a normal table, and so there must also be enough space to hold the final, replicated data. The node's first-listed/default store directory must have enough available storage to hold its portion of the data.

If you set --max-disk-temp-storage when running cockroach start, its value must also be greater than the portion of the data a node will store in temp space.

Import File Location

You can store the tabular data you want to import using remote cloud storage (Amazon S3, Google Cloud Platform, etc.). Alternatively, you can use an HTTP server accessible from all nodes.

For simplicity's sake, it's strongly recommended to use cloud/remote storage for the data you want to import. Local files are supported; however, they must be accessible identically from all nodes in the cluster.

Table Users and Privileges

Imported tables are treated as new tables, so you must GRANT privileges to them.
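For example, to give a hypothetical user maxroach read and write access to the imported customers table:

> GRANT SELECT, INSERT ON TABLE customers TO maxroach;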

Performance

All nodes are used to convert the tabular data into key-value data, which means that every node's CPU and RAM will be partially consumed by the IMPORT task in addition to serving normal traffic.

Viewing and Controlling Import Jobs

After CockroachDB successfully initiates an import, it registers the import as a job, which you can view with SHOW JOBS.

After the import has been initiated, you can control it with PAUSE JOB, RESUME JOB, and CANCEL JOB.
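As a sketch, you would find the import's job ID with SHOW JOBS and pass it to one of the control statements (the job ID below is a placeholder):

> SHOW JOBS;

> PAUSE JOB 27536791415282;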

Warning:
Pausing and then resuming an `IMPORT` job will cause it to restart from the beginning.

Synopsis

IMPORT TABLE table_name
    { CREATE USING create_table_file | ( table_elem_list ) }
    CSV DATA ( file_to_import [, ...] )
    [ WITH kv_option [, ...] ]

Note:
The IMPORT statement cannot be used within a transaction.

Required Privileges

Only the root user can run IMPORT.

Parameters

Parameter           Description
table_name          The name of the table you want to import/create.
create_table_file   The URL of a plain text file containing the CREATE TABLE statement you want to use (see this example for syntax).
table_elem_list     The table definition you want to use (see this example for syntax).
file_to_import      The URL of the file you want to import.
WITH kv_option      Control your import's behavior with these options.

Import File URLs

URLs for the files you want to import must use the following format:

[scheme]://[host]/[path]?[parameters]
Location                     Scheme      Host                   Parameters
Amazon S3                    s3          Bucket name            AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY
Azure                        azure       Container name         AZURE_ACCOUNT_KEY, AZURE_ACCOUNT_NAME
Google Cloud [1]             gs          Bucket name            AUTH (optional): can be default or implicit
HTTP [2]                     http        Remote host            N/A
NFS/Local [3]                nodelocal   File system location   N/A
S3-compatible services [4]   s3          Bucket name            AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION, AWS_ENDPOINT

Considerations

  • [1] If the AUTH parameter is implicit, all GCS connections use Google's default authentication strategy. If the AUTH parameter is default, the cloudstorage.gs.default.key cluster setting must be set to the contents of a service account file, which will be used during authentication. If the AUTH parameter is not specified, the cloudstorage.gs.default.key setting will be used if it is non-empty; otherwise, the implicit behavior is used.

  • [2] You can easily create your own HTTP server with Caddy or nginx. A custom root CA can be appended to the system's default CAs by setting the cloudstorage.http.custom_ca cluster setting, which will be used when verifying certificates from HTTPS URLs.

  • [3] The file system backup location on the NFS drive is relative to the path specified by the --external-io-dir flag set while starting the node. If the flag is set to disabled, then imports from local directories and NFS drives are disabled.

  • [4] A custom root CA can be appended to the system's default CAs by setting the cloudstorage.http.custom_ca cluster setting, which will be used when verifying certificates from an S3-compatible service.

  • The location parameters often contain special characters that need to be URI-encoded. Use JavaScript's encodeURIComponent function or Go's url.QueryEscape function to URI-encode the parameters. Other languages provide similar functions to URI-encode special characters.
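For example, here is a hedged sketch of an IMPORT that reads from Amazon S3 (the bucket name and credentials are placeholders, and the secret key is URI-encoded):

> IMPORT TABLE customers (
        id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
        name TEXT
)
CSV DATA ('s3://acme-co/customer-import-data.csv?AWS_ACCESS_KEY_ID=AKIAEXAMPLEKEY&AWS_SECRET_ACCESS_KEY=url%2Fencoded%2Fsecret')
;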

Import Options

You can control the IMPORT process's behavior using any of the following key-value pairs as a kv_option.

delimiter

If you are not using a comma as your column delimiter, you can specify another Unicode character as the delimiter.

Required? No
Key delimiter
Value The Unicode character that delimits columns in your rows
Example To use tab-delimited values: WITH delimiter = e'\t'

comment

Do not import rows that begin with this character.

Required? No
Key comment
Value The Unicode character that identifies rows to skip
Example WITH comment = '#'

nullif

Convert values to SQL NULL if they match the specified string.

Required? No
Key nullif
Value The string that should be converted to NULL
Example To use empty columns as NULL: WITH nullif = ''

Examples

Use Create Table Statement from a File

> IMPORT TABLE customers
CREATE USING 'azure://acme-co/customer-create-table.sql?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co'
CSV DATA ('azure://acme-co/customer-import-data.csv?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co')
;

Use Create Table Statement from a Statement

> IMPORT TABLE customers (
        id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
        name TEXT,
        INDEX name_idx (name)
)
CSV DATA ('azure://acme-co/customer-import-data.csv?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co')
;

Import a Tab-Separated File

> IMPORT TABLE customers (
        id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
        name TEXT,
        INDEX name_idx (name)
)
CSV DATA ('azure://acme-co/customer-import-data.tsv?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co')
WITH
    delimiter = e'\t'
;

Skip Commented Lines

> IMPORT TABLE customers (
        id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
        name TEXT,
        INDEX name_idx (name)
)
CSV DATA ('azure://acme-co/customer-import-data.csv?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co')
WITH
    comment = '#'
;

Use Blank Characters as NULL

> IMPORT TABLE customers (
        id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
        name TEXT,
        INDEX name_idx (name)
)
CSV DATA ('azure://acme-co/customer-import-data.csv?AZURE_ACCOUNT_KEY=hash&AZURE_ACCOUNT_NAME=acme-co')
WITH
    nullif = ''
;

Known Limitation

IMPORT can sometimes fail with a "context canceled" error, or can restart itself many times without ever finishing. If this is happening, it is likely due to a high amount of disk contention. This can be mitigated by setting the kv.bulk_io_write.max_rate cluster setting to a value below your max disk write speed. For example, to set it to 10MB/s, execute:

> SET CLUSTER SETTING kv.bulk_io_write.max_rate = '10MB';
