Avatar billede jobless Nybegynder
08. december 2005 - 16:31 Der er 7 kommentarer

Import i 2005 express

Hej.

Jeg har testet lidt på SQL Server 2005 express, men har nogle performance-problemer ved import.

Min txt-fil er ca. 22 mb. og indeholder ca. 75.000 poster.

Så vidt jeg kan se, findes der ikke nogen metode til import indbygget i 2005 express, så derfor har jeg linket tabellen ned i en access-db, og importerer herigennem, med access's indbyggede import-funktionalitet.

Den nævnte import tager lidt over 1,25 time, og dette er ikke holdbart i længden (1 import om dagen).

Findes der alternative måder at importere til sql 2005 express ?

MVH

JoBless
Avatar billede lorentsnv Nybegynder
08. december 2005 - 17:18 #1
Jeg kender ikke til om der findes importmuligheder direkte fra 2005 express. Ved vanlig SQL Server, vil man bruge DTS/Integration services, men disse følger ikke med i 2005 express. Tidligere havde man også et bulccopy program med til SQL Server, men om det findes til Express, det ved jeg ikke.

Import af 75.000 records bør kunne laves på få minutter, lidt afhængig af record-størrelse, indexer m.m. Hvis du importerer til en stor tabel, men mange eller store indexer, falder performance drastisk, da den løbende skal vedligeholde indexer.

I mange tilfælder, ville man have en decideret tabel til at importere til, unde indexer, eller kun index på et enkelt felt for at hurtig kunne fremfinde nye records. Dermed kan man hurtig får tabellen ind på SQL Server. Du kunne f.eks. prøve at oprette en ny tom tabel på sql server med de nødvendige felter, og undgå at bruge indexer. Forsøg at importer tildenne tabel, go check performance her.

Hvis du skal transformere/ændre data i forbindelse med din import, ville jeg valgt at gøre dette ved hjælp af stored procedures, når du ikke kan bruge DTS/Integration services. Lav en 'rå' import til en staging tabel på sql server, og kør derefter en Stored procedure der transformerer data og flytter dem over i den rigtige tabel.

Hvordan er netværksforbindelsen mellem din Access og SQL Server? Alternativ til Access er at lave noget ADO.Net programmering, enten i en applikation, eller som webapplikation.

Der findes nok også 3.parts leverandører som kan lave import til SQL Server.
Avatar billede arne_v Ekspert
08. december 2005 - 21:04 #2
bcp mener jeg også er med i de gratis versioner

og det behøver ikke være en .NET app - du kan skrive en loader i hvad som helst også
C++, VB6, VBS, Java
Avatar billede jobless Nybegynder
08. december 2005 - 21:16 #3
Jeg har nu testet, vedr. import til en identisk tabel uden indexes.

Filen jeg importerede var 21,3 mb (74.635 poster), og det tog ml. 3 og 4 minutter.

Hvad er så vejen frem, en stored procedure som flytter data over i den ønskede tabel, eller bcp (som arne_v foreslår, men jeg ved ikke hvad det er) ?
Avatar billede arne_v Ekspert
08. december 2005 - 21:36 #4
ingen af delene - det er indexene som driller

2 ideer du kan prøve:

lig index på en anden disk end data

fjern index, importer, tilføj index igen

(det er generelle metoder - jeg ved ikke om de er gode i SQLServer)
Avatar billede lorentsnv Nybegynder
08. december 2005 - 22:48 #5
Det er rigtig at der også kan  bruges forskellige værktøj. Med Micfosofts porgrammeringsværktøj vil jeg mene et du vil bruge enten ADO eller ADO.net, enten du programmerer via C++, C#, VB6, VB.Net eller ASP/ASP.Net.

Men ved at du kan importere data til en ren tabel på 3-4 minutter, er du måske allerede kommet langt på vej, og så er spørgsmålet måske heller hvordan du mest effektivt får flyttet disse data videre.

Hvis SQL Server Express ligger på en stor server, med muligheder for at splitte dine data på forskellige diske, kan det godt betale sig. Det mest normale vil være at flytte loggen til en anden disk.

Men der kan laves en del optimering, selv uden mulighed for at bruge forskellige diske. Det første du måske skal vurdere, er om du behøver alle de indexer som ligger på din 'target'-tabel. Det er et ret vanlig 'problem' at der bliver lagt alt for mange indexer på tabeller, noget der gør insert/update funktioner i tabellen langsomme.

I første omgang kan du måske, enten ved hjækp af en sp, eller ved at køre sql, lave en Insert into din target-tabel:
Insert into targettabel
Select * from sourcetabel

Dette vil give et indtryk af hvor lang tid det vil tage at flytte data fra din import-tabel til din target-tabel. Hvis dette tager for lang tid (f.eks. mere end 10 minutter), kan du så vurdere forskellige optimeringsmuligheder. En mulighed er, som tidligere nævnt, at vurdere om du kan undvære nogle af de indexer som er på tabellen, eventuelt gøre dem mindre. Normalt vil det ikke kunne betale sig at have mere end 2-3 kolonner med i én index.

Hvis performance fortsat er meget dårlig, er der muligheder for at midlertidig fjerne index, loade dine data, og lægge på index igen. Dette kan i nogle tilfælde godt  betale sig når du importerer store mængder data.

Jeg lavet et hurtig søg på Google på 'sql server 2005 express bulk import', hvilket giver en massse resultater. Bulk-import vil normalt være det hurtigste, men jeg kender ikke til hvilke muligheder der findes i 2005 express. Prøv eventuelt at se om du kan finde noget mere her. Med bulk-import, vil den slå fra opdatering af indexer mens selve importen foretages, og så bliver indexerne automatisk opdateret når importen er afsluttet. Bulk-import kan imidlertid kun anvendes når der er ren insert, og ikke vis der også skal opdateres records.
Avatar billede lorentsnv Nybegynder
08. december 2005 - 23:02 #6
Med en hurtig søg på internet kan jeg se at der medfølger et bcp-program (bulck-copy-program) til SQL Server Express. Jeg kender ikke til hvordan det fungerer med SQL Server Express, men tilsyneladende er det et command line utility.

Selve programmet ligger i 90\Tools\Bin, og heder bcp.exe. Prøv eventuelt at se om du kan finde nogen informationer om hvordan du bruger dette værktøj. Det kan tenkes at der skal settes en parameter på din database for at bpc skal fungere, men jeg ved det ikke med sikkerhed. Men et eller andet sted på internet, eller i books online til SQL Server Express (hvis en sådan findes), fides der mere informationer om bcp.exe.
Avatar billede lorentsnv Nybegynder
08. december 2005 - 23:04 #7
The bcp utility bulk copies data between an instance of Microsoft SQL Server 2005 and a data file in a user-specified format. The bcp utility can be used to import large numbers of new rows into SQL Server tables or to export data out of tables into data files. Except when used with the queryout option, the utility requires no knowledge of Transact-SQL. To import data into a table, you must either use a format file created for that table or understand the structure of the table and the types of data that are valid for its columns.

Syntax

bcp {[[database_name.][owner].]{table_name | view_name} | "query"}
    {in | out | queryout | format} data_file
    [-mmax_errors] [-fformat_file] [-x] [-eerr_file]
    [-Ffirst_row] [-Llast_row] [-bbatch_size]
    [-n] [-c] [-w] [-N] [-V (60 | 65 | 70 | 80)] [-6]
    [-q] [-C { ACP | OEM | RAW | code_page } ] [-tfield_term]
    [-rrow_term] [-iinput_file] [-ooutput_file] [-apacket_size]
    [-Sserver_name[\instance_name]] [-Ulogin_id] [-Ppassword]
    [-T] [-v] [-R] [-k] [-E] [-h"hint [,...n]"]


Arguments
database_name
Is the name of the database in which the specified table or view resides. If not specified, this is the default database for the user.

owner
Is the name of the owner of the table or view. owner is optional if the user performing the operation owns the specified table or view. If owner is not specified and the user performing the operation does not own the specified table or view, SQL Server 2005 returns an error message, and the operation is canceled.

table_name
Is the name of the destination table when importing data into SQL Server (in), and the source table when exporting data from SQL Server (out).

view_name
Is the name of the destination view when copying data into SQL Server (in), and the source view when copying data from SQL Server (out). Only views in which all columns refer to the same table can be used as destination views. For more information on the restrictions for copying data into views, see INSERT (Transact-SQL).

query
Is a Transact-SQL query that returns a result set. If the query returns multiple result sets, such as a SELECT statement that specifies a COMPUTE clause, only the first result set is copied to the data file; subsequent result sets are ignored. Use double quotation marks around the query and single quotation marks around anything embedded in the query. queryout must also be specified when bulk copying data from a query.

in | , out| , queryout | , format
Specifies the direction of the bulk copy, as follows:

in copies from a file into the database table or view.

Note: 
The bcp utility included with SQL Server  6.5 does not support bulk copying into tables that contain the sql_variant or bigint data types.



out copies from the database table or view to a file.


queryout copies from a query and must be specified only when bulk copying data from a query.


format creates a format file based on the option specified (-n, -c, -w, -6, or -N) and the table or view delimiters. When bulk copying data, the bcp command can refer to a format file, which saves you from re-entering format information interactively. The format option requires the -f option; creating an XML format file, also requires the -x option. For more information, see Creating a Format File.


data_file
Is the full path of the data file. When data is bulk imported into SQL Server, the data file contains the data to be copied into the specified table or view. When data is bulk exported from SQL Server, the data file contains the data copied from the table or view. The path can have from 1 through 255 characters. The data file can contain a maximum of 2,147,483,647 rows.

Important: 
For the format option, you must specify nul as the value of data_file (format nul).



-m max_errors
Specifies the maximum number of syntax errors that can occur before the bcp operation is canceled. A syntax error implies a data conversion error to the target data type. The max_errors total excludes any errors that can be detected only at the server, such as constraint violations.

A row that cannot be copied by the bcp utility is ignored and is counted as one error. If this option is not included, the default is 10.

Note: 
The -m option also does not apply to converting the money or bigint data types.



-f format_file
Specifies the full path of a format file. The meaning of this option depends on the environment in which it is used, as follows:

If -f is used with the format option, the specified format_file is created for the specified table or view. To create an XML format file, also specify the -x option.


If used with the in or out option, -f requires an existing format file.

Note: 
Using a format file in with the in or out option is optional. In the absence of the -f option, if -n, -c, -w, -6, or -N is not specified, the command prompts for format information and lets you save your responses in a format file (whose default file name is Bcp.fmt).



-x
Used with the format and -f format_file options, generates an XML-based format file instead of the default non-XML format file. The -x does not work when importing or exporting data. It generates an error if used without both format and -f format_file.

-e err_file
Specifies the full path of an error file used to store any rows that the bcp utility cannot transfer from the file to the database. Error messages from the bcp command go to the user's workstation. If this option is not used, an error file is not created.

-F first_row
Specifies the number of the first row to export from a table or import from a data file. This parameter requires a value greater than (>) 0 but less than (<) or equal to (=) the total number rows. In the absence of this parameter, the default is the first row of the file.

-L last_row
Specifies the number of the last row to export from a table or import from a data file. This parameter requires a value greater than (>) 0 but less than (<) or equal to (=) the number of the last row. In the absence of this parameter, the default is the last row of the file.

-b batch_size
Specifies the number of rows per batch of imported data. Each batch is imported and logged as a separate transaction that imports the whole batch before being committed. By default, all the rows in the data file are imported as one batch. To distribute the rows among multiple batches, specify a batch_size that is smaller than the number of rows in the data file. If the transaction for any batch fails, only insertions from the current batch are rolled back. Batches already imported by committed transactions are unaffected by a later failure.

Do not use this option in conjunction with the -h"ROWS_PER_BATCH = bb" option.

-n
Performs the bulk-copy operation using the native (database) data types of the data. This option does not prompt for each field; it uses the native values.

-c
Performs the operation using a character data type. This option does not prompt for each field; it uses char as the storage type, without prefixes and with \t (tab character) as the field separator and \n (newline character) as the row terminator.

-w
Performs the bulk copy operation using Unicode characters. This option does not prompt for each field; it uses nchar as the storage type, no prefixes, \t (tab character) as the field separator, and \n (newline character) as the row terminator. This option cannot be used with SQL Server 6.5 or earlier versions.

-N
Performs the bulk-copy operation using the native (database) data types of the data for noncharacter data, and Unicode characters for character data. This option offers a higher performance alternative to the -w option, and is intended for transferring data from one instance of SQL Server to another using a data file. It does not prompt for each field. Use this option when you are transferring data that contains ANSI extended characters and you want to take advantage of the performance of native mode. -N cannot be used with SQL Server 6.5 or earlier versions.

-V ( 60| 65| 70| 80)
Performs the bulk-copy operation using data types from an earlier version of SQL Server. This option does not prompt for each field; it uses the default values. For example, to bulk copy date formats supported by the bcp utility provided with SQL Server 6.5 (but no longer supported by ODBC) into SQL Server 2005, use the -V 65 parameter.

Important: 
When data is bulk exported from SQL Server into a data file, the bcp utility does not generate SQL Server 6.0 or SQL Server 6.5 date formats for any datetime or smalldatetime data, even if -V is specified. Dates are always written in ODBC format. Additionally, null values in bit columns are written as the value 0 because SQL Server 6.5 and earlier versions do not support nullable bit data.



-6
Performs the bulk-copy operation using SQL Server 6.0 or SQL Server 6.5 data types. This option is supported for compatibility with earlier versions only. For SQL Server 7 and later, use the -V option instead.

-q
Executes the SET QUOTED_IDENTIFIERS ON statement in the connection between the bcp utility and an instance of SQL Server. Use this option to specify a database, owner, table, or view name that contains a space or a single quotation mark. Enclose the entire three-part table or view name in quotation marks ("").

To specify a database name that contains a space or single quotation mark, you must use the –q option.

For more information, see Remarks later in this topic.

-C { ACP | OEM | RAW | code_page }
Supported for compatibility with early versions of SQL Server. For SQL Server 7.0 and later, Microsoft recommends that you specify a collation name for each column in a format file.

Specifies the code page of the data in the data file. code_page is relevant only if the data contains char, varchar, or text columns with character values greater than 127 or less than 32.

Code page value  Description 
ACP
ANSI/Microsoft Windows (ISO 1252).

OEM
Default code page used by the client. This is the default code page used if -C is not specified.

RAW
No conversion from one code page to another occurs. This is the fastest option because no conversion occurs.

code_page
Specific code page number; for example, 850.


-t field_term
Specifies the field terminator. The default is \t (tab character). Use this parameter to override the default field terminator.

-r row_term
Specifies the row terminator. The default is \n (newline character). Use this parameter to override the default row terminator.

-i input_file
Specifies the name of a response file, containing the responses to the command prompt questions for each data field when a bulk copy is being performed using interactive mode (-n, -c, -w, -6, or -N not specified).

-o output_file
Specifies the name of a file that receives output redirected from the command prompt.

-a packet_size
Specifies the number of bytes, per network packet, sent to and from the server. A server configuration option can be set by using SQL Server Management Studio (or the sp_configure system stored procedure). However, the server configuration option can be overridden on an individual basis by using this option. packet_size can be from 4096 to 65535 bytes; the default is 4096.

Increased packet size can enhance performance of bulk-copy operations. If a larger packet is requested but cannot be granted, the default is used. The performance statistics generated by the bcp utility show the packet size used.

-S server_name[ \instance_name]
Specifies the instance of SQL Server to which to connect. If no server is specified, the bcp utility connects to the default instance of SQL Server on the local computer. This option is required when a bcp command is run from a remote computer on the network or a local named instance. To connect to the default instance of SQL Server on a server, specify only server_name. To connect to a named instance of SQL Server 2005, specify server_name\instance_name.

-U login_id
Specifies the login ID used to connect to SQL Server.

Security Note: 
When the bcp utility is connecting to SQL Server with a trusted connection using integrated security, use the -T option (trusted connection) instead of the user name and password combination.



-P password
Specifies the password for the login ID. If this option is not used, the bcp command prompts for a password. If this option is used at the end of the command prompt without a password, bcp uses the default password (NULL).

Security Note: 
Do not use a blank password. Use a strong password.



To mask your password, do not specify the -P option along with the -U option. Instead, after specifying bcp along with the -U option and other switches (do not specify -P), press ENTER, and the command will prompt you for a password. This method ensures that your password will be masked when it is entered.

-T
Specifies that the bcp utility connects to SQL Server with a trusted connection using integrated security. The security credentials of the network user, login_id, and password are not required. If –T is not specified, you need to specify –U and –P to successfully log in.

-v
Reports the bcp utility version number and copyright.

-R
Specifies that currency, date, and time data is bulk copied into SQL Server using the regional format defined for the locale setting of the client computer. By default, regional settings are ignored.

-k
Specifies that empty columns should retain a null value during the operation, rather than have any default values for the columns inserted.

-E
Specifies that identity value or values in the imported data file are to be used for the identity column. If -E is not given, the identity values for this column in the data file being imported are ignored, and SQL Server 2005 automatically assigns unique values based on the seed and increment values specified during table creation.

If the data file does not contain values for the identity column in the table or view, use a format file to specify that the identity column in the table or view should be skipped when importing data; SQL Server 2005 automatically assigns unique values for the column. For more information, see DBCC CHECKIDENT (Transact-SQL).

The -E option has a special permissions requirement. For more information, see Remarks later in this topic.

-h " hint[ ,... n] "
Specifies the hint or hints to be used during a bulk import of data into a table or view. This option cannot be used when bulk copying data into SQL Server 6.x or earlier.

ORDER(column [ASC | DESC] [,...n])
The sort order of the data in the data file. Bulk copy performance is improved if the data being loaded is sorted according to the clustered index on the table. If the data file is sorted in a different order or if there is no clustered index on the table, the ORDER hint is ignored. The names of the columns supplied must be valid columns in the destination table. By default, bcp assumes the data file is unordered.

ROWS_PER_BATCH = bb
Number of rows of data per batch (as bb). Used when -b is not specified, resulting in the entire data file being sent to the server as a single transaction. The server optimizes the bulk load according to the value bb. By default, ROWS_PER_BATCH is unknown.

KILOBYTES_PER_BATCH = cc
Approximate number of kilobytes of data per batch (as cc). By default, KILOBYTES_PER_BATCH is unknown.

TABLOCK
Specifies that a bulk update table-level lock is acquired for the duration of the bulk load operation; otherwise, a row-level lock is acquired. This hint significantly improves performance because holding a lock for the duration of the bulk-copy operation reduces lock contention on the table. A table can be loaded concurrently by multiple clients if the table has no indexes and TABLOCK is specified. By default, locking behavior is determined by the table option table lock on bulk load.

CHECK_CONSTRAINTS
Specifies that all constraints on the target table or view must be checked during the bulk-import operation. Without the CHECK_CONSTRAINTS hint, any CHECK constraints are ignored, and after the operation the constraint on the table is marked as not-trusted.

Note: 
UNIQUE, PRIMARY KEY, FOREIGN KEY, or NOT NULL constraints are always enforced.



At some point, you will need to check the constraints on the entire table. If the table was nonempty before the bulk import operation, the cost of revalidating the constraint may exceed the cost of applying CHECK constraints to the incremental data. Therefore, Microsoft recommends that normally you enable constraint checking during an incremental bulk import.

A situation in which you might want constraints disabled (the default behavior) is if the input data contains rows that violate constraints. With CHECK constraints disabled, you can import the data and then use Transact-SQL statements to remove data that is not valid.

Note: 
In SQL Server 2005, bcp enforces new data validation and data checks that might cause existing scripts to fail when they are executed on invalid data in a data file.


Note: 
The -mmax_errors switch does not apply to constraint checking.



FIRE_TRIGGERS
Specified with the in argument, any insert triggers defined on the destination table will run during the bulk-copy operation. If FIRE_TRIGGERS is not specified, no insert triggers will run. FIRE_TRIGGERS is ignored for the out, queryout, and format arguments.
Avatar billede Ny bruger Nybegynder

Din løsning...

Tilladte BB-code-tags: [b]fed[/b] [i]kursiv[/i] [u]understreget[/u] Web- og emailadresser omdannes automatisk til links. Der sættes "nofollow" på alle links.

Loading billede Opret Preview
Kategori
Computerworld tilbyder specialiserede kurser i database-management

Log ind eller opret profil

Hov!

For at kunne deltage på Computerworld Eksperten skal du være logget ind.

Det er heldigvis nemt at oprette en bruger: Det tager to minutter og du kan vælge at bruge enten e-mail, Facebook eller Google som login.

Du kan også logge ind via nedenstående tjenester