Clinical SAS Interview Questions


What therapeutic areas have you worked in?
Recently I have been working in the oncology therapeutic area on different cancer types:

1. Renal cell cancer (RCC)
2. Metastatic colorectal cancer (mCRC)
3. Advanced non-small cell lung cancer (NSCLC)

Earlier I worked on neuroscience and specialty care studies, such as diabetes and depression.

What are your responsibilities?
Some of them include (not necessarily all of them):

· Extracting data from various internal and external sources (Oracle Clinical, CSV files, Excel spreadsheets) using SAS/ACCESS, INFILE and INPUT statements, and PROC DOWNLOAD
· Creating and deriving analysis datasets, listings and summary tables for different clinical trial protocols
· Mapping, pooling and analysis of clinical study data for safety
· Using Base SAS procedures (MEANS, FREQ, UNIVARIATE, SUMMARY, TABULATE, REPORT etc.) for summarization, cross-tabulation and statistical analysis
· Developing macros to automate listings and summary tables for multiple protocols that share similar safety tables and listings
· Validating and QC of the efficacy and safety tables
· Creating ad hoc reports with SAS procedures and using ODS statements to generate output formats such as HTML and PDF
· Creating statistical reports using PROC REPORT, DATA _NULL_ and SAS macros
· Analyzing the data according to the Statistical Analysis Plan (SAP)
· Generating tables for demographics, adverse events, labs, concomitant treatment/medication and Quality of Life (QoL)
· Quality control and reporting of data issues (outliers, qualifiers and missing data) directly to the data management team

Can you tell me something about your last project's study design?
I recently worked on protocol A4061023, a study of the investigational compound axitinib (AG-13736) in refractory metastatic renal cell cancer (RCC). It is a Phase II, open-label, non-randomised, single-group study of the safety and efficacy of the drug, with 62 subjects enrolled.

The primary endpoint of the study is the objective response rate (ORR) of axitinib in patients with RCC, that is, the proportion of patients with a complete response (CR) or partial response (PR) by RECIST (Response Evaluation Criteria In Solid Tumors).

Secondary endpoints include progression-free survival (PFS), duration of response (DR), the FKSI (cancer-related symptoms) quality-of-life questionnaire and the safety profile of AG-13736.

Functional Assessment of Cancer Therapy (FACT)

FKSI-15: a 15-item scale for patients with kidney cancer

Some of the inclusion criteria of the study were: age 18 or older, any gender, and
1. RCC with metastases and nephrectomy
2. Failure of prior sorafenib-based therapy

Some of the exclusion criteria were:
1. Gastrointestinal abnormalities
2. Active seizure disorder

RCC (renal cell cancer) is a carcinoma of the kidney (a kidney neoplasm); "renal" means of or relating to the kidneys.


What were the primary and secondary endpoints in your last project?

How many analysis datasets did you create?
It depends on the study and on the safety and efficacy parameters that need to be determined. Approximately 20-30 datasets are required to analyze a study for safety and efficacy, for example DM (Demographics), MH (Medical History), AE (Adverse Events), PE (Physical Examination), EG (ECG), VS (Vital Signs), CM (Concomitant Medication), LB (Laboratory), QS (Questionnaire), IE (Inclusion and Exclusion), DT (Death), CO (Comments), EX (Exposure), Final, Primdiag, Pcancer, Prad, Bpbp.

How did you create analysis datasets?
Analysis datasets are used for the statistical analysis of the data. They contain the raw data together with variables derived from the raw datasets, and these derived variables are used to produce the TLGs of the clinical study. The safety and efficacy endpoints (parameters) dictate which datasets the study requires for generating the TLGs. Sometimes an analysis dataset carries variables that are not needed for the planned statistical reports but may be needed for ad hoc reports. One important thing to keep in mind while generating a VAD is to exclude variables that will never be used for any kind of analysis and that were collected only to facilitate data collection. This keeps the size of the VAD to a minimum and improves performance as well.

What do you mean by treatment-emergent adverse events?

A treatment-emergent adverse event is defined as any event not present prior to the initiation of the treatments, or any event already present that worsens in either intensity or frequency following exposure to the treatments.

Some of the common AEs are:

a. Abnormal lab test finding
b. Clinical symptom or sign
c. Changes in physical examination findings
d. Hypersensitivity
e. Progression/worsening of underlying disease
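The definition above can be sketched as a flag derivation. This is only an illustration under stated assumptions: the datasets ae (one record per event, with start date aestdt) and dm (one record per subject, with first dose date trtsdt) and all variable names are hypothetical, and the sketch covers only the "starts on or after first dose" part of the definition. Worsening of pre-existing events and partial dates would follow the rules in the SAP.

```sas
/* Hypothetical inputs: ae (subjid, aeterm, aestdt) and dm (subjid, trtsdt). */
proc sort data=ae; by subjid; run;
proc sort data=dm; by subjid; run;

data ae_flag;
  merge ae (in=in_ae) dm (keep=subjid trtsdt);
  by subjid;
  if in_ae;                       /* keep only subjects with at least one AE */
  /* Flag events that start on or after the first dose date. */
  if not missing(aestdt) and not missing(trtsdt) and aestdt >= trtsdt
    then teaefl = 'Y';
  else teaefl = 'N';
run;
```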

Can you explain something about the datasets?

DEMOGRAPHIC analysis dataset contains all subjects’ demographic data (i.e., Age, Race, and Gender), disposition data (i.e., Date patient withdrew from the study), treatment groups and key dates such as date of first dose, date of last collected Case Report Form (CRF) and duration on treatment. The dataset has the format of one observation per subject.

LABORATORY analysis dataset contains all subjects’ laboratory data, in the format of one observation per subject, per test, per visit, per lab result number. Here, we derive the study visits according to the study window defined in the SAP. If the laboratory data are collected from multiple local lab centers, this analysis dataset will also centralize the laboratory data and standardize measurement units by using conversion factors.

EFFICACY analysis dataset contains derived primary and secondary endpoint variables as defined in the SAP. In addition, this dataset can contain other efficacy parameters of interest, such as censor variables pertaining to the time to an efficacy event. This dataset has the format of one record per subject per analysis period.

ADVERSE EVENT analysis dataset contains all adverse events (AEs) reported, including serious adverse events (SAEs), for all subjects. A treatment-emergent flag is calculated, as well as a flag indicating whether an event was reported within 30 days after the subject permanently discontinued from the study. This dataset has a format of one record per subject per adverse event per start date. Partial and missing AE start and/or stop dates are imputed using logic defined in the SAP.

It is crucial to generate analysis datasets in a specific order, as some variables derived in one analysis dataset may be used as inputs to derive variables in other analysis datasets. For example, the Cycles dataset is a VAD generated from the number of days in a cycle as defined in the protocol, and it is then used in generating other VADs.

What is your involvement with CDISC standards? What is meant by CDISC and where do you use it?
CDISC (Clinical Data Interchange Standards Consortium) is an organization that develops industry standards which pharmaceutical companies use to submit clinical data to the FDA.

There are many advantages to using CDISC standards: reduced time for regulatory submissions, more efficient regulatory review of submissions, and savings in time and money on data transfers among business partners.

CDISC standards are used in the following activities:
· Developing CRTs for submission to the FDA as part of an NDA
· Mapping, pooling and analysis of clinical study data for safety
· Creating the annotated case report form (aCRF) using CDISC SDTM mapping

What do you mean when you say you created tables, listings and graphs for ISS and ISE?

There are many reasons to integrate and to summarize all the data from a clinical trial program. Each clinical trial in the program is unique in its objective and design. Some are small safety studies among normal volunteers, while others are efficacy trials in a large patient population.

The primary reason to create an integrated summary is to compare and contrast the various study results and to arrive at one consolidated review of the benefit/risk profile.

Also, pooling the data from various studies enables the examination of trends in rare subgroups of patients, such as the elderly, those with differing disease states (mild vs. severe).

How do you do data cleaning?
It is always important to check the data we are using, especially the variables that feed the analysis. I use PROC FREQ, PROC SQL, PROC MEANS, PROC COMPARE and utility functions (date functions, for example) to clean the data.


Can you tell me about CRTs?

Case Report Tabulations (CRTs) are used for an NDA electronic submission to the FDA. CRTs are made up of datasets and the accompanying documentation for the datasets.

The Food and Drug Administration (FDA) now strongly encourages all new drug applications (NDAs) be submitted electronically. Electronic submissions could help FDA application reviewers scan documents efficiently and check analyses by manipulating the very datasets and code used to generate them. The potential saving in reviewer time and cost is enormous while improving the quality of oversight. As described, one important part of the application package is the case report tabulations (CRTs), now serving as the instrument for submitting datasets. CRTs are made up of two parts:

1. Datasets in SAS® transport file format
2. The accompanying documentation for the datasets

In practice, however, applicants may discuss what data to include as part of the CRTs with the FDA review division prior to the electronic submission. Note that some FDA reviewers' software requires that files are first loaded into random access memory (RAM), the Guidance explains, so individual files cannot exceed 25 MB. To accommodate this, programmers might need to break up large datasets. In addition to raw and derived variables, each dataset should include core demographic information such as age, sex, race, ethnicity and site location. This helps reviewers track and analyze basic information quickly.

Where do you use MedDRA and WHO? Can you write code? How do you use it?

What is MedDRA?
The Medical Dictionary for Regulatory Activities (MedDRA) has been developed as a pragmatic, clinically validated medical terminology. MedDRA is applicable to all phases of drug development and to the health effects of devices. MedDRA is used to report adverse event data from clinical trials.

What are the structural elements of the terminology in MedDRA?
The structural elements of the MedDRA terminology are as follows:
SOC (System Organ Class) – Highest level of the terminology, distinguished by anatomical or physiological system, etiology, or purpose
HLGT (High Level Group Term) – Subordinate to SOC, superordinate descriptor for one or more HLTs
HLT (High Level Term) – Subordinate to HLGT, superordinate descriptor for one or more PTs
PT (Preferred Term) – Represents a single medical concept
LLT (Lowest Level Term) – Lowest level of the terminology, related to a single PT

The WHODRUG dictionary
The WHODRUG dictionary was started in 1968. The dictionary contains information on both single and multiple ingredient medications. Drugs are classified according to the type of drug name being entered (i.e. proprietary/trade name, nonproprietary name, chemical name, etc.). At present, 52 countries submit medication data to the WHO Collaborating Center, which is responsible for the maintenance and distribution of the drug dictionary.


What do you mean by "used the macro facility to produce weekly and monthly reports"?
The SAS macro facility can do a lot of things; in particular it is used to:
• reduce code repetition
• increase control over program execution
• minimize manual intervention
• create modular code

Did you ever see a case where a patient was randomized to one drug but given another? If so, which population would you put that patient into?
This situation is rare, but if it happens I would analyze the patient in the group of the drug actually received for safety analyses. An intent-to-treat efficacy analysis would keep the patient in the randomized group.

What would you do if you had to pool the data from one parallel study and one crossover study? Or, if the same subject appears in two groups taking two different drugs and you had to pool these two groups, how would you do it?

This situation arises when the study has a crossover design. I would treat the same patient as two different patients, one in each treatment group.

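How would you transpose a dataset using a data step?

With the input sorted by name and at most three dates per name:

data new (keep=name date1-date3);
  set old;
  by name;
  array dates {3} date1-date3;
  retain date1-date3;
  if first.name then do;
    i = 1;
    call missing(of dates{*});  /* clear values retained from the previous name */
  end;
  else i + 1;
  dates{i} = date;
  if last.name;                 /* output one record per name */
run;

Similar results can be achieved with PROC TRANSPOSE:

proc transpose data=old out=new prefix=DATE;
  var date;
  by name;
run;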

How do you deal with missing values? Or, if a patient misses one lab, how would you assign values for the missing values? Can you write the code?

Whenever SAS encounters an invalid or blank value in the file being read, the value is defined as missing. In all subsequent processes and output, the value is represented as a period (if the variable is numeric-valued) or is left blank (if the variable is character-valued).

In DATA step programming, use a period to refer to missing numeric values.


For example, to recode missing values in the variable A to the value 99, use the following statement:

IF a=. THEN a=99;

Use the MISSING statement to define certain characters to represent special missing values for all numeric variables. The special missing values can be any of the 26 letters of the alphabet, or an underscore. In the example below, the values 'a' and 'b' will be interpreted as special missing values for every numeric variable.

MISSING a b;

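Did you ever create efficacy tables?
Yes, I have created efficacy tables. Efficacy tables are developed to present information about the primary objectives/parameters of the study.

What are the stat procedures you used?
FREQ, GLM, MEANS, SUMMARY etc.

Can you use all the data step functions in a macro definition?
Most of them, through %SYSFUNC. A few, such as LAG, INPUT and PUT, cannot be called with %SYSFUNC; INPUTN/INPUTC and PUTN/PUTC are used instead of INPUT and PUT.

If I have a dataset with different subjids and each subjid has many records, how can I obtain the last but one record for each patient?
Sort by subject, count each subject's records in a first pass, then output record n-1 in a second pass:

proc sort data=old;
  by subjid;
run;

data new;
  /* First pass: count the records for this subject. */
  do n = 1 by 1 until (last.subjid);
    set old;
    by subjid;
  end;
  /* Second pass: re-read the same records and output record n-1. */
  do i = 1 to n;
    set old;
    if i = n - 1 then output;
  end;
  drop n i;
run;

Subjects with a single record produce no output. Note that a subsetting IF LAST.SUBJID; keeps the last record per subject, and the following step keeps the first record per subject, not the last but one:

proc sort data=old out=new nodupkey;
  by subjid;
run;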

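Can you get the value of a data step variable to be used in another program later in the same SAS session? How do you do that?
Store the value in a macro variable with CALL SYMPUT (or CALL SYMPUTX) inside the data step, then reference the macro variable later in the session. %PUT only writes the value to the log.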
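As a small illustration of passing a data step value to later code, the sketch below stores a record count in a macro variable. SASHELP.CLASS is used only as a stand-in dataset, and the macro variable name nobs is arbitrary.

```sas
data _null_;
  set sashelp.class end=last;
  /* On the final record, copy the record counter into macro variable NOBS. */
  if last then call symputx('nobs', _n_);
run;

/* The macro variable can be referenced once the data step has finished. */
%put Number of records read: &nobs;
```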

What would you do if you have to access a previous record's values in the current record?
By using the LAG function.
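A minimal sketch of the LAG approach, assuming a hypothetical dataset vitals sorted by subjid and visit with a numeric variable weight. LAG is called on every record and then reset at the start of each subject, because calling LAG conditionally gives unexpected results.

```sas
data change;
  set vitals;                      /* hypothetical input, sorted by subjid and visit */
  by subjid;
  prev_wt = lag(weight);           /* value of weight from the previous record */
  if first.subjid then prev_wt = .;   /* do not carry values across subjects */
  if nmiss(weight, prev_wt) = 0 then wt_chg = weight - prev_wt;
run;
```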

What is a p-value? Why should you calculate it? Which procedures can you use for that?
The p-value is the probability, assuming the null hypothesis is true, of observing a result at least as extreme as the one actually obtained. It is calculated to judge whether an observed effect is statistically significant. For example, in a regression model, if the overall p-value were greater than 0.05, you would say that the group of independent variables does not show a statistically significant relationship with the dependent variable, or that the group of independent variables does not reliably predict the dependent variable. Note that this is an overall significance test assessing whether the group of independent variables, when used together, reliably predicts the dependent variable; it does not address the ability of any particular independent variable to predict the dependent variable. We can calculate p-values using PROC FREQ, PROC ANOVA, PROC GLM and PROC TTEST.

What do you usually do with PROC LIFETEST?
PROC LIFETEST is used to obtain Kaplan-Meier and life-table survival estimates (and plots).
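A hedged sketch of a typical Kaplan-Meier call follows. The dataset adtte and the variables aval (time to event), cnsr (1 = censored) and trt (treatment group) are hypothetical names chosen for illustration.

```sas
/* Hypothetical time-to-event dataset: adtte (aval, cnsr, trt). */
proc lifetest data=adtte method=km plots=survival;
  time aval * cnsr(1);   /* values of cnsr listed in parentheses are censored */
  strata trt;            /* separate curves and a comparison test by treatment */
run;
```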

Which procedure do you usually use to create reports?
PROC REPORT and DATA _NULL_.

How do you use a macro that was created by someone else and is stored in a folder outside the SAS installation?
With the SAS autocall library, by pointing the SASAUTOS system option at that folder.

Can you tell me something about macro libraries?
These are the libraries that store all the macros required for developing the TLGs of the clinical trial. They are necessary for controlling and managing the macros. Macros stored in an autocall library are compiled automatically the first time they are called; alternatively, a %INCLUDE statement can bring in the source file that defines a macro.
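A minimal sketch of setting up an autocall library, assuming a hypothetical Windows folder c:\project\macros in which each macro is saved in a file named after the macro (for example mymacro.sas for %mymacro):

```sas
/* Hypothetical folder holding one .sas file per macro. */
filename mymacs 'c:\project\macros';

/* Search the project folder first, then the default SAS autocall library. */
options mautosource sasautos=(mymacs sasautos);

/* A macro stored there could now be invoked directly, e.g. %mymacro */
```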

Did you use ODS?
Yes, I have used ODS (Output Delivery System), which is normally used to control the appearance of the output from tables, listings and graphs. ODS creates output in HTML, PDF and RTF formats.

General syntax (PDFSECURITY= takes NONE, LOW or HIGH; it and PDFPASSWORD= are set as system options):

options pdfsecurity=high pdfpassword=(open="open" owner="owner");
ods pdf file="Abc.pdf" startpage=never;

Proc1 statements ...
Proc2 statements ...

ods pdf startpage=now;

SAS statements ...

ods pdf close;

Your resume says you created HTML, RTF and PDF. Why did you have to create all three? Can you tell me specifically why each format is used?
There are several formats in which SAS output can be created.
To publish or place the output on the Internet, we create it in HTML format.
We generally create SAS output in RTF, because RTF can be opened in Word or other word processors.
If we need to send printable reports through email, we create the output in PDF. PDF output is also needed when we send the documents required to file an NDA with the FDA.

Can you generate statistics using PROC SQL?
Yes, we can generate statistics like N, Mean, Median, Max, Min, STD and SUM using PROC SQL. Unlike PROC MEANS, however, PROC SQL does not produce any of these statistics by default; each one has to be requested explicitly.
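For illustration, each statistic is requested explicitly as a summary function in the SELECT clause. The sketch uses SASHELP.CLASS as a stand-in dataset; the column aliases are arbitrary.

```sas
proc sql;
  select sex,
         count(age) as n,
         mean(age)  as mean_age,
         std(age)   as sd_age,
         min(age)   as min_age,
         max(age)   as max_age
    from sashelp.class
   group by sex;
quit;
```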

When do you prefer PROC SQL? Give me a situation.
The SQL procedure supports almost all the functions available in the DATA step for creating as well as manipulating data. For tasks such as joins and grouped summaries, PROC SQL often requires less code than the equivalent DATA step and sort/merge logic, and in some cases less time to execute.

How do you delete a macro variable?
Use the %SYMDEL statement, which deletes macro variables from the global symbol table. Multiple variables may be deleted by listing their names in the statement (var1 and var2 are placeholder names):

%symdel var1 var2;

Why do you have to use the PROC IMPORT and PROC EXPORT wizards? Give me a situation.
These two help us transfer files/data between SAS and external data sources.

What is the SAP? What is the protocol?
The SAP gives the technical and detailed elaboration of the principal features of the analysis described in the protocol. It explains the detailed procedures for executing the statistical analysis of the primary and secondary endpoints.

The SAP includes:
1. Variable derivation
2. Visit window definition
3. Planned interim analysis
4. LOT
5. How to deal with missing values

The protocol provides the overall framework under which the study is to be conducted and includes the type of analysis planned, for example the administration schedule of the drug and the study design.

What is the difference between SAS 8 and SAS 9? Or, what changed between SAS 8 and 9?

1. The installation directory is V8 for version 8 and V9 for version 9. Note that SAS 8 and 9 share some components, so be careful when uninstalling if both are installed on the same machine.

2. The ODS Document Viewer is a new addition in SAS 9 for viewing documents created with the ODS DOCUMENT statement.

3. Version 9 supports format and informat names longer than 8 bytes. This is the only change in dataset structure. V9 datasets are backward compatible if they conform to V8 naming conventions. The new maximum length is 32 characters for a numeric format name and 31 for a character format name.

4. Three new informats are available to read various date, time and datetime forms of data into a SAS dataset:

a. ANYDTDTEw. – to convert date values
b. ANYDTTMEw. – to convert time values
c. ANYDTDTMw. – to convert datetime values

It is important to note that these informats make assumptions on a record-by-record basis. Ambiguous values can be interpreted incorrectly.


5. The DATESTYLE= system option is now available to set the default assumption for ambiguous dates to DMY, MDY or YMD.

6. In SAS 9, CALL SORTN and CALL SORTC are quick ways to sort the values of a list of variables inside the data step. They are not designed to replace PROC SORT; they are a simpler way of ordering values held in variables of the same type within one observation.
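A small sketch of CALL SORTN, using an array with made-up initial values; the values are reordered across the variables x1-x5 within the single observation.

```sas
data _null_;
  array x{5} x1-x5 (3 9 1 7 5);   /* illustrative starting values */
  call sortn(of x{*});            /* sorts the values across x1-x5 in ascending order */
  put x1= x2= x3= x4= x5=;        /* writes x1=1 x2=3 x3=5 x4=7 x5=9 to the log */
run;
```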

7. Multithreaded architecture: one of the biggest enhancements in SAS 9 is its ability to support multithreaded access to files for use in the data step and certain procedures.

For example, PROC SORT can sort the data on four processors by dividing it into equal chunks, and PROC MEANS can summarize each chunk separately and then combine the results to get the final summary.

Some of the procedures which support multithreading are:
a. PROC SORT
b. PROC SUMMARY
c. PROC MEANS
d. PROC REPORT
e. PROC TABULATE
f. PROC SQL

By default, multithreading is on in version 9 for all these procedures. A new option (THREADS/NOTHREADS) is available on each procedure to turn this feature off.

8. The CPUCOUNT= option limits the number of CPUs used for multithreaded work. The default is to use the maximum number of CPUs available.

9. A dataset can be referenced by its physical location, and leading zeros are supported in macro variable names when using the INTO clause.

What is ICH and what are its guidelines?
The International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH) is a unique project that brings together the regulatory authorities of Europe, Japan and the United States.

The objective of such harmonization is a more effective use of human, animal and material resources, and the elimination of unnecessary delay in the global development process, whilst maintaining the safety, quality and efficacy needed to protect public health.

How would you validate a TLG? Or, how do you validate a program?
Check for:

· Conditions expected from the input data and the kind of data we are dealing with
· Extreme values and expected ranges (range checking)
· Number of observations, tracking the number of observations and variables
· Handling of missing values
· All pathways through the code, to find logical dead-ends, infinite loops and code that is never executed
· Algorithms and mathematical calculations
· The _ERROR_ flag in the data step

Other techniques:

· Run PROC PRINT before and after DATA steps and compare the results
· Use OBS= to limit the number of observations printed
· Use FIRSTOBS= to skip past a trouble spot
· Use PROC COMPARE to confirm a 100% match between two datasets (length, label, data, formats and informats)
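The PROC COMPARE check can be sketched as below. The librefs prod and qc, the dataset name adsl and the key usubjid are hypothetical; both datasets are assumed to be sorted by the ID variable.

```sas
/* Hypothetical production and independent QC versions of the same dataset. */
proc compare base=prod.adsl compare=qc.adsl listall criterion=1e-8;
  id usubjid;            /* match records by subject rather than by position */
run;

/* SYSINFO holds the PROC COMPARE return code; 0 means no differences found. */
%put PROC COMPARE return code: &sysinfo;
```

The return code has to be captured immediately after the procedure, because later steps reset SYSINFO.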

What are the data and basic syntax/coding errors?

· Errors or omissions in DATA step coding
· Array subscript out of range
· Uninitialized variables
· Invalid data
· Hanging DOs or ENDs
· Invalid numeric operations
· Automatic type conversions

Warning and informational messages in the log point to errors in:

· DROP, KEEP
· DELETE
· BY
· OUTPUT
· MERGE (repeats of BY values)
· Subsetting IFs

What has been your most common programming mistake?
1. Misspelling a keyword
2. Spelling library or dataset names incorrectly
3. Missing a semicolon at the end of a statement
4. Not executing a macro after writing it. A macro must be invoked with %macroname to execute.

CALL SYMPUT
SYMPUT is a routine that you CALL inside a data step to create a macro (symbolic) variable. The variable cannot be referenced inside the data step that creates it. The data step must end with a RUN; statement before you reference the variable in your code. The value of the macro variable is assigned during the execution of the data step.

Is it possible to reference a GLOBAL and a LOCAL macro variable separately when they have the same name?
Yes. In this example the first %PUT statement (inside the macro) writes "second" and the next %PUT statement writes "first", because the scope of each variable is declared where it is defined.

%global abc;
%let abc=first;

%macro test();
  %local abc;
  %let abc=second;
  %put "inside macro &abc";
%mend test;

%test;

%put "Outside macro &abc";

What is PROC CDISC?

It is a new SAS procedure that is available as a hotfix for SAS 8.2 and comes as part of SAS 9.1.3. PROC CDISC allows us to import and export XML files that are compliant with the CDISC ODM version 1.2 schema.

What is LOCF?

Pharmaceutical companies conduct longitudinal studies on human subjects that often span several months. It is unrealistic to expect patients to keep every scheduled visit over such a long period of time. Despite every effort, patient data are not collected for some time points. These eventually become missing values in a SAS dataset.

For reporting purposes, the most recent previously available value is substituted for each missing visit. This is called Last Observation Carried Forward (LOCF). LOCF does not mean the last SAS dataset observation carried forward; it means the last non-missing value carried forward.

It is the values of individual measures that are the "observations" in this case. If you have multiple variables containing these values, they are carried forward independently.
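A minimal LOCF sketch for a single measure, assuming a hypothetical dataset labs with one record per subject per scheduled visit (subjid, visitnum, result), where a missed visit is present as a record with a missing result. A retained helper variable holds the last non-missing value within each subject.

```sas
proc sort data=labs;               /* hypothetical input dataset */
  by subjid visitnum;
run;

data labs_locf;
  set labs;
  by subjid;
  retain last_val;
  if first.subjid then last_val = .;            /* never carry across subjects */
  if not missing(result) then last_val = result;
  locf_result = last_val;   /* observed value, or the last non-missing one */
  drop last_val;
run;
```

With several measures, each would need its own retained variable so that the values are carried forward independently.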

Name several ways to achieve efficiency in your program. Explain the trade-offs.

1. Using the macro facility to reuse code at different program phases in the same session, which avoids writing the same code again and again, for example for summary statistics.

2. Dropping or keeping only those variables which are of significance to the program.

3. If using SAS 9, making sure multithreading is being used by the procedures which support it.

4. Using the CLASS statement in place of BY where possible, as it avoids the time spent sorting the datasets.

Q. What is the benefit of using PROC COPY instead of just copying and pasting datasets?

PROC COPY copies multiple datasets from one physical location to another and can also create a transport file for sending datasets electronically. With the XPORT engine it writes an .xpt file in a platform-independent encoding.

libname source 'SAS-data-library-on-sending-host';
libname xptout xport 'filename-on-sending-host';

proc copy in=source out=xptout memtype=data;
  select bonus budget salary;   /* dataset names */
run;

On Windows:

libname olddata 'c:\sas';          * This is the location of the SAS data sets;
libname plum xport 'c:\abc.xpt';   * This is what the transport file is called;

proc copy in=olddata out=plum;
  select chars cleaning;
run;

To read the transport file back into SAS datasets:

libname olddata 'c:\';                 * New SAS data sets will be written here;
libname pear sasv5xpt 'c:\abc.xpt';    * This is what the transport file is called;

proc copy in=pear out=olddata;
  select cleaning;
run;

How did you use PROC PRINTTO?

When you use the PRINTTO procedure with its LOG= and PRINT= options, you can route the SAS log or SAS procedure output to an external file or a fileref from any mode. Specify the external file or the fileref in the PROC PRINTTO statement.

proc printto print='/u/myid/output/prog1' new;
run;

The NEW option causes any existing information in the file to be cleared. If you omit the NEW option from the PROC PRINTTO statement, the SAS log or procedure output is appended to the existing file.

There are two options, PRINT= and LOG=. PRINT= directs the output of a procedure and LOG= directs the log output. Running PROC PRINTTO with no options restores the default destinations.

If you plan to specify the same destination several times in your SAS program, you can assign a fileref to the file using a FILENAME statement, for example to a printer or to a file:

filename myoutput printer;
filename myoutput "c:\abc.rpt";

Usage of PROC UNIVARIATE?
The UNIVARIATE procedure provides data summarization tools and information on the distribution of numeric variables. For example, PROC UNIVARIATE:

· calculates the median, mode, range and quantiles
· calculates confidence limits
· tabulates extreme observations and extreme values
· generates frequency tables

Usage of PROC GLM?
PROC GLM analyzes data within the framework of general linear models. PROC GLM handles models relating one or several continuous dependent variables to one or several independent variables. The independent variables may be either classification variables, which divide the observations into discrete groups, or continuous variables.

DATA _NULL_ usage

A DATA _NULL_ step runs without creating a dataset, for example to write a message when a query finds no records:

proc sql noprint;
  select count(*) into :n1 from datasetname;
quit;

%macro nodata;
  %if &n1 <= 0 %then %do;
    data _null_;
      file print;
      put // "No data matching the criteria";
    run;
  %end;
%mend nodata;
%nodata

PROC SUMMARY usage

PROC SUMMARY;
  BY city;
  VAR income;
  ID area;
  OUTPUT OUT=two SUM=t_income MEAN=m_income;
RUN;

An OUTPUT statement builds a dataset containing the specified statistics (SUMs and MEANs). The VAR statement lists the variables for which these statistics are desired. PROC SUMMARY produces an output dataset similar to PROC MEANS, but no printed output is generated. Thus it is essentially equivalent to PROC MEANS with the NOPRINT option.

Q. How do you create Analysis Dataset / What is the process of Creating Analysis Dataset.Ans. For standard datasets like Demog, Vital Sings and Concomitant Medication normally we don’t get specific requirement. These datasets are generated by standard production code and then quality checked by independent programmers. But sometime even on standard dataset a statisticians needs some specific endpoints to be derived for example Adjuvant and Neoadjuvent therapy classification along with demographics information then we have to program datasets and tables keeping the requirements in mind. Study specific datasets like efficacy and others always have requirements mentioned including the details like variable name, length, label and derivation. So we have to understand the requirements and derivation, but before starting to program the requirements are checked and approved by study manager.Another important thing to keep in mind while generating analysis dataset is to drop the unused variables or data which does not belong to the real reason behind generating that dataset, this is to improve the performance of the dataset for further report generation.Further analysis dataset is quality checked by an independent programmer by deriving it independently by understanding the requirement and doing a Proc compare on the dataset to ensure the quality.

Q. How do you set up a study, or how do you start?
Ans. It starts with working closely with the statisticians and study team on the requirements and the programming plan, which gives step-by-step detail of the important endpoints (primary or secondary) that need to be derived and how they will be derived, keeping the SAP and protocol in mind. Then the Tables, Figures and Listings are decided and the LOT is prepared, which helps divide the work among team members. That is how study setup starts.

Q. How do you QC a raw dataset, or how do you make sure the quality of the raw data is good?
Ans. The raw data is checked while quality checking the tables based on that dataset. During table QC, different checks are applied to the kinds of values coming from a particular raw dataset. For example, if the CRF says a variable should have only 3 values, running PROC FREQ and checking how many unique values are in the data is useful. Likewise, when cross-checking against the demog data, a male subject with a value for childbearing potential would be a data issue. It is important that the raw dataset is free of data issues so that the reports show correct statistics for the data collected.

Another way of checking for data issues is to confirm whether a particular kind of record may repeat in the raw dataset or must be unique. For example, there should be only one screening record per subject, but labs will have multiple entries for the laboratory tests at different visits.
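The two checks described above (allowed values and one-record-per-subject) might be sketched as follows. This is an illustration under assumptions: sashelp.class stands in for a raw demography data set, name plays the role of the subject identifier, and work.dups is a hypothetical name for the data set that collects repeated keys.

```sas
/* 1. List the distinct values of a coded variable, including missing, */
/*    and compare them against what the CRF allows.                    */
proc freq data=sashelp.class;
   tables sex / missing;
run;

/* 2. One record per subject expected: any record with a repeated key  */
/*    is written to work.dups for review.                              */
proc sort data=sashelp.class out=work.one_per_subj
          nodupkey dupout=work.dups;
   by name;
run;
```

If work.dups ends up with observations for a data set that should be unique per subject, that is a candidate data issue to report to data management.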

Q. How do you set up TFLs or generate TFLs?
Ans. Once the VADs are ready, TFL generation begins based on what is listed in the LOT. Along with the LOT we are provided with mockups of the TFLs, which show how the statistician wants each TFL presented.

Sometimes the LOT or mockup is not provided. In that case we have to read the protocol and SAP in detail to find out which TFLs would be required for the submission of that particular study. This is called requirement review. The requirement review is checked by the statisticians or study team, who then give us the green signal for production of the TFLs.

TFLs need to have the proper number, title, footnote, source data and other mandatory information asked for by the client, with the formats and table structure given in the mockups. All the TFLs need to be consistent with each other in the treatment arms and the type of population displayed.

Q. When is a particular submission or deliverable ready?
Ans. When

1. Comparison against the mockup is done
2. None of the requirements are missed
3. None of the crucial data for the reports is missed out
4. All the statistical information required is present in the TFLs
5. It is presented in the layout and order in which it was requested
6. Logs of the programs are clean, with no errors or potential warnings
7. QC for the VADs and TFLs has been finished
8. Numbers are presented in an appropriate manner and with specific units

Q. How do you deal with ad hoc reports, or what kind of ad hoc reports have you worked on?
Ans. After the tables are delivered we sometimes receive ad hoc requests. A statistician will ask for a table that is already in the original deliverable, but broken out by a different grouping such as age, race or gender, or a table for each cycle. These ad hoc requests have to be dealt with quickly and diligently, by understanding the request clearly and doing the programming and quality check in parallel.

Q. What do you understand by different protocols of the same compound?
Ans. Sometimes a drug is compared

1. Against a comparator drug in the market
2. Against placebo
3. Across different age groups
4. Across different dose levels

Syntax of PROC SUMMARY

PROC SUMMARY DATA=TEST.DATA NWAY;
CLASS SEX COUNTRY STATE;
VAR EXERLOSS WEITLOSS AEROLOSS;
OUTPUT OUT=TEST1 SUM=;
Run;

The resulting data set contains the summed totals for each loss variable and is ordered as expected: alphabetically by STATE within COUNTRY within SEX.

12. What is the Program Data Vector (PDV)? What are its functions?

Function: to store the current observation.

The PDV (Program Data Vector) is a logical area in memory where SAS builds a dataset one observation at a time. When SAS processes a DATA step it goes through two phases: the compilation phase and the execution phase. During the compilation phase, the input buffer is created to hold a record from an external file (when raw data is being read), and after the input buffer the PDV is created. Besides the dataset variables, the PDV contains two automatic variables, _N_ and _ERROR_.
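One way to watch the PDV at work is to write its contents to the log at the top and bottom of each iteration. The sketch below is illustrative only: sashelp.class is a sample data set shipped with SAS, and work.pdv_demo and age_next are hypothetical names. PUT _ALL_ lists every variable in the PDV, including _N_ and _ERROR_.

```sas
data work.pdv_demo;
   put 'Top of iteration:';
   put _all_;                     /* PDV before the next observation is read */

   set sashelp.class(obs=3 keep=name age);
   age_next = age + 1;            /* new variable, reset to missing each iteration */

   put 'Bottom of iteration:';
   put _all_;                     /* PDV just before the implicit OUTPUT */
run;
```

Comparing the two listings in the log shows which values are carried over from the previous iteration (variables read by SET) and which are reset (variables created in the step).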

14. At compile time when a SAS data set is read, what items are created?
Automatic variables are created, along with the PDV and the descriptor information. The input buffer is created only when raw data is read from an external file.

Name statements that are recognized at compile time only?
DROP, KEEP, RENAME, LABEL, RETAIN, LENGTH, FORMAT, INFORMAT, ATTRIB, ARRAY, BY, WHERE

Name statements that are execution only.
INFILE, INPUT

Identify statements whose placement in the DATA step is critical.
DATA, INPUT, RUN.

In the flow of DATA step processing, what is the first action in a typical DATA step?
The DATA step begins with a DATA statement. Each time the DATA statement executes, a new iteration of the DATA step begins and the _N_ automatic variable is incremented by 1. _N_ acts as an iteration counter for the DATA step.

_N_ indicates the number of times SAS has looped through the DATA step. This is not necessarily equal to the observation number, since a simple subsetting IF statement can change the relationship between the observation number and the number of iterations of the DATA step.
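A small sketch of that difference, assuming sashelp.class (a sample data set shipped with SAS) as input; work.teens and iteration are hypothetical names. The subsetting IF drops some records, so the saved iteration count no longer lines up with the Obs column that PROC PRINT shows for the output data set.

```sas
data work.teens;
   set sashelp.class;
   if age >= 13;          /* subsetting IF: younger records are not output     */
   iteration = _n_;       /* DATA step iteration on which this record was read */
run;

proc print data=work.teens;   /* compare the Obs column with ITERATION */
   var name age iteration;
run;
```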

The _ERROR_ variable has a value of 1 if there is an error in the data for that observation and 0 if there is not.

E.g. if we want to keep every third record in a dataset, we can use _N_ as follows:

Data sasdatasetname; Set old; if mod(_n_,3) = 1; /* subsetting IF: keeps records 1, 4, 7, ... */ run;

Name some of the SAS functions you use on a daily basis?
Input, Put, Substr, Length, Lag, Sum, Varnum, Today(), Day(), Mean()
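A few of these functions in one DATA _NULL_ step, as a hedged illustration; all variable names and literal values are made up for the example. LAG and VARNUM are left out because they need an input data set to be meaningful.

```sas
data _null_;
   num    = input('123', 8.);        /* character to numeric                  */
   txt    = put(num, z5.);           /* numeric to character, zero padded     */
   part   = substr('ABCDEF', 2, 3);  /* 3 characters starting at position 2   */
   len    = length('SAS   ');        /* length excluding trailing blanks      */
   total  = sum(1, ., 3);            /* SUM ignores missing arguments         */
   avg    = mean(1, ., 3);           /* MEAN ignores missing arguments too    */
   rundt  = today();                 /* current date as a SAS date value      */
   dom    = day(rundt);              /* day of the month from a SAS date      */
   put num= txt= part= len= total= avg= rundt= date9. dom=;
run;
```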