Worst, but still importable data I’ve ever seen

Arthur Tabachneck, Insurance Bureau of Canada

Coder’s CornerApril 12, 2010ForumSAS

suppose you had the following Excel file:

(spreadsheet image not reproduced; the callouts on the slide label the cell formats as follows)

format: text
format: as shown
format: m/d/yyyy
format: d/m/yyyy
format: text
format: d-mon
format: text

how the file got so bad:

members of a secretarial pool were asked to enter the data, in Excel, while they were covering the front desk

they (four different secretaries), obviously, weren’t given sufficient instructions

their task was simply to enter some data, which happened to include a date

proc import can only be used if:

1. you license SAS/Access Interface to PC File Formats

and

2. at least half of the relevant rows (based on your system’s and SAS guessingrows settings) are formatted as dates

or

3. you manually edit the spreadsheet and/or change your guessing rows settings so that condition #2 holds

If proc import can be used, three steps are necessary

step 1: use mixed=no

which will import date-formatted cells and assign missing values to the other cells

step 2: use mixed=yes

which will import all cells as text
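Steps 1 and 2 could be written roughly as follows. This is only a sketch: the data set names inputa and inputb are the ones read in step 3, and the workbook path is the one opened in the DDE solution further on. The getnames=yes setting and the reliance on the default worksheet are assumptions, as is the use of dbms=excel (which requires the license mentioned earlier).

```sas
/* step 1 (sketch): mixed=no keeps the column numeric, so cells   */
/* formatted as dates arrive as SAS dates and the rest as missing */
proc import out=inputa
     datafile="c:\worst data.xls"
     dbms=excel replace;
  getnames=yes;   /* assumption: the first row holds column names */
  mixed=no;
run;

/* step 2 (sketch): mixed=yes reads the same column as text */
proc import out=inputb
     datafile="c:\worst data.xls"
     dbms=excel replace;
  getnames=yes;   /* assumption: the first row holds column names */
  mixed=yes;
run;
```

Both data sets keep the rows in worksheet order, which is what lets step 3 pair them with two set statements.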

step 3: merge the two files and use inputn to read missing dates

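/* global option: ambiguous dates are read as day-month-year */
options datestyle=dmy;

data want (drop=bdate);
  set inputa;                        /* mixed=no: dates, or missing      */
  set inputb (rename=(date=bdate));  /* mixed=yes: the same cell as text */
  if missing(date) then do;
    /* first try: read the text value as it stands */
    date=inputn(bdate, 'anydtdte', 20);
  end;
  if missing(date) then do;
    /* second try: swap the first two '-' delimited parts and re-read */
    date=inputn(catt(scan(bdate,2,'-'),
                     scan(bdate,1,'-'),
                     scan(bdate,3,'-')), 'anydtdte', 20);
  end;
run;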

resulting in the following file

however, if proc import can’t be used

or

if you simply want a better solution

you can do it with DDE

step 1: set desired options and filename

options noxsync noxwait xmin;
filename sas2xl dde 'excel|system';

Step 2: Open Excel

data _null_;
  length fid rc start stop time 8;
  fid=fopen('sas2xl','s');            /* is Excel already running?     */
  if (fid le 0) then do;
    rc=system('start excel');         /* no: launch it                 */
    start=datetime();
    stop=start+10;                    /* wait at most 10 seconds       */
    do while (fid le 0);
      fid=fopen('sas2xl','s');
      time=datetime();
      if (time ge stop) then fid=1;   /* timed out: force loop to end  */
    end;
  end;
  rc=fclose(fid);
run;

Step 3: Open workbook and insert old-style macro sheet

data _null_;
  file sas2xl;
  put '[open("c:\worst data.xls")]';
run;

data _null_;
  file sas2xl;
  put '[workbook.next()]';
  put '[workbook.insert(3)]';
run;

filename xlmacro dde 'excel|macro1!r1c1:r99c1' notab lrecl=200;

Step 4: Create and run Excel macro

data _null_;
  /* write the macro, one command per row, into the macro sheet */
  file xlmacro;
  put '=set.name("Tag",!$b$1)';
  put '=formula("<>",Tag)';
  put '=set.name("OldValue",!$c$1)';
  put '=set.name("NewValue",!$b$2)';
  put '=for.cell("CurrentCell",sheet1!$a$2:$a$99,true)';
  put '=formula(get.cell(5,CurrentCell),OldValue)';
  put '=formula("=concatenate(Tag,OldValue)",NewValue)';
  put '=formula(NewValue, CurrentCell)';
  put '=next()';
  put '=halt(true)';
  put '!dde_flush';
  /* run the macro, save sheet1 as CSV (file type 6) and quit Excel */
  file sas2xl;
  put '[run("macro1!r1c1")]';
  put '[workbook.activate("sheet1")]';
  put '[error(false)]';
  put '[save.as("c:\DateTest",6)]';
  put '[quit()]';
run;

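Step 5: Import the data

/* global option: ambiguous dates are read as day-month-year */
options datestyle=dmy;

data want (keep=date);
  infile "c:\DateTest.csv" dsd dlm="," lrecl=32768 firstobs=2;
  informat rawdate $20.;
  input rawdate;
  format date date9.;
  rawdate=substr(rawdate,3);        /* drop the two-character "<>" tag */
  if anyalpha(rawdate) then do;     /* value contains letters: text    */
    date=inputn(rawdate, 'anydtdte', 20);
    if missing(date) then do;
      /* swap the first two '-' delimited parts and re-read */
      date=inputn(catt(scan(rawdate,2,'-'),
                       scan(rawdate,1,'-'),
                       scan(rawdate,3,'-')), 'anydtdte', 20);
    end;
  end;
  else date=rawdate-21916;          /* Excel serial date to SAS date   */
run;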
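The constant 21916 in step 5 is the gap between the two date origins: Excel (1900 date system) counts days so that serial 21916 falls on January 1, 1960, which is day 0 in SAS. A small check, using a serial number chosen here purely for illustration (it is not taken from the spreadsheet):

```sas
data _null_;
  excel_serial = 40280;                 /* illustrative value only       */
  sas_date = excel_serial - 21916;      /* same arithmetic as in step 5  */
  put sas_date= date9.;                 /* writes sas_date=12APR2010     */
run;
```

This only applies to workbooks that use Excel's default 1900 date system; a workbook saved with the 1904 date system would need a different offset.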

and obtain the desired result

regardless of your system’s guessing rows setting

or how your data is arranged

Author Contact Information

Your comments and questions are valued and encouraged.

Contact the author:

Dr. Arthur Tabachneck
Director, Data Management
Insurance Bureau of Canada

Toronto, Ontario L3T 5K9 Canada

atabachneck at ibc dot ca or art297 at netscape dot net

Key References

Microsoft Corporation. Function Reference Microsoft EXCEL Spreadsheet with Business Graphics and Database: Version 4.0 for Apple® Macintosh® Series or Windows™ Series. Document AB26298-0592, 1992.

Vyverman, K. Excel Exposed: Using Dynamic Data Exchange to Extract Metadata from MS Excel Workbooks, SESUG 17, 2003, paper TU15, St. Pete Beach, FL.

Vyverman, K. Re: How to flag special formatting from Excel in a SAS dataset. SAS-L Post, 2002, http://www.listserv.uga.edu/cgi-bin/wa?A2=ind0209a&L=sas-l&D=1&O=A&P=12088

Vyverman, K. Re: MS Excel column widths. SAS-L Post, 2002, http://www.listserv.uga.edu/cgi-bin/wa?A2=ind0201b&L=sas-l&P=25268

Vyverman, K. Using Dynamic Data Exchange to Export Your SAS Data to MS Excel – Against All ODS, Part I, SUGI 26, 2001, paper 190-27, Long Beach, CA.