While attempting to do some data merging across several Excel spreadsheets, I found some interesting behavior in how OLEDB determines data types of columns with an Excel source.
For me, the issue manifested itself with a column that looked like the following (shortened for brevity’s sake):
10132 10133 10134D 10134 10135
By default, Excel looks at the first 8 rows of each column to determine the type. If most of the items are numeric, it will create the column as an int or double type. The problem is, if the column also contains text data, like the one above (10134D), the column can’t be converted to a numeric type and OLEDB will just drop the value with no error or explanation.
With the above data, since 4 out of 5 are numeric, OLEDB turns the column into a numeric type, and the 3rd row’s column of that data would be NULL.
I tried several ways of fixing the issue but found that the only solution that worked was to set the IMEX property in the connection string to 1 in the Extended Properties section like so:
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=blah.xls;Extended Properties="Excel 8.0;IMEX=1;"
An additional extended property you might find useful is “HDR=1”, which tells the driver that the first row of the spreadsheet contains text data as a header and not column data.
Apparently this tells the OLEDB driver for Excel that the data is “intermixed” and that it should convert any intermixed data to the type specified in the registry at:
By default, this is “Text”, which is exactly what I wanted. You’ll also notice a key in there named “TypeGuessRows”. This is the number of rows that Excel will use to determine the data-type of the column, the default being 8 as described above. You may need to tweak this value to a number greater than the row at which your data changes in the column you want as a string type.
Anyway, it’s probably useless to most, but it took me a bit of time and research to figure out so I thought I’d pass it along.