Rheumatoid Arthritis

CONTENTS

S.No Topics

1. Acknowledgement

2. Abstract

3. Introduction

4. Review of Literature

5. Methodology

6. Tools and Software

7. Observation and Results

8. Conclusion

9. References

ACKNOWLEDGEMENT

I express my heartfelt thanks to Assistant Prof. Mr. Dharmendra Giri and the Head of the Department of “KURUKSHETRA UNIVERSITY” for providing me an opportunity to do my internship in SYNMEDICS LABORATORY PVT. LTD., FARIDABAD.

I would also like to thank the Directors of “SYNMEDICS LABORATORY PVT. LTD.” for granting me the opportunity to work in such a fine institute and for providing all the facilities necessary to carry out my project.

I take this opportunity to express my deep sense of gratitude to the faculty members who helped me to complete my training and my project, especially Mr. D.S TOMAR, who gave memorable support and guidance at each and every step of my work, without which it would not have been possible to complete my project work on “In-silico Genomic Analysis of genes, In-silico competitive inhibition of herbal plant extracts on CD-40 gene and Quality control of Anti-Rheumatoid drugs”.

I would like to thank Mr. V.K TYAGI and Mr. ISH JAIN, who have always been inimitable and have enlightened me by sharing their knowledge and providing all the information required for carrying out my project.

ABSTRACT:

Rheumatoid arthritis (RA) is an autoimmune disease that is strongly associated with the expression of several genes: HLA-DRB1, CD40, IL2, PADI4 and STAT4. Here I present a genome analysis of these 5 genes responsible for Rheumatoid Arthritis. I found that these genes have the property of antigenicity and can trigger the immune system even when only a small concentration of ligand is available. I analysed these genes with tools such as CLC Workbench and Geneious Pro. After the genome analysis, I performed competitive inhibition with chemical constituents of natural herbs such as nirgundi and brahmi. I found very good interaction with 44 of the 46 ligands taken. I also performed quality control of anti-rheumatic drugs at Synmedics Laboratory and found the exact proportions of the combination drugs given in Rheumatoid Arthritis.

INTRODUCTION

BIOINFORMATICS: Bioinformatics uses computer databases to store, retrieve and assist in understanding biological information. Genome-scale sequencing projects have led to an explosion of genetic sequences available for automated analysis. These gene sequences are the codes which direct the production of proteins that in turn regulate all life processes. The student will be shown how these sequences can lead to a much fuller understanding of many biological processes, allowing pharmaceutical and biotechnology companies to determine, for example, new drug targets or to predict whether particular drugs are applicable to all patients. Students will be introduced to the basic concepts behind Bioinformatics and Computational Biology tools. Hands-on sessions will familiarize students with the details and use of the most commonly used online tools and resources. The course will cover the use of NCBI's Entrez, BLAST, PSI-BLAST, ClustalW, Pfam, PRINTS, BLOCKS, Prosite and the PDB. An introduction to database design and the principles of programming languages will be presented.

In all areas of biological and medical research, the role of the computer has been dramatically enhanced in the last five to ten years. While the first wave of computational analysis focused on sequence analysis, where many highly important unsolved problems still remain, current and future needs will in particular concern the sophisticated integration of extremely diverse sets of data. These novel types of data originate from a variety of experimental techniques, many of which are capable of data production at the levels of entire cells, organs, organisms, or even populations. The main driving force behind the changes has been the advent of new, efficient experimental techniques, primarily DNA sequencing, that have led to an exponential growth of linear descriptions of protein, DNA and RNA molecules. Other new data-producing techniques work as massively parallel versions of traditional experimental methodologies. Genome-wide gene expression measurement using DNA microarrays is, in essence, a realization of tens of thousands of Northern blots. As a result, computational support in experiment design, processing of results and interpretation of results has become essential.

Major Research Areas:

Sequence analysis

Since the phage Φ-X174 was sequenced in 1977, the DNA sequences of hundreds of organisms have been decoded and stored in databases. These data are analyzed to determine genes that code for proteins, as well as regulatory sequences. A comparison of genes within a species or between different species can show similarities between protein functions, or relations between species (the use of molecular systematics to construct phylogenetic trees). With the growing amount of data, it long ago became impractical to analyze DNA sequences manually. Today, computer programs are used to search the genomes of thousands of organisms, containing billions of nucleotides. These programs compensate for mutations (exchanged, deleted or inserted bases) in the DNA sequence in order to identify sequences that are related but not identical. A variant of this sequence alignment is used in the sequencing process itself. The so-called shotgun sequencing technique (which was used, for example, by The Institute for Genomic Research to sequence the first bacterial genome, Haemophilus influenzae) does not give a sequential list of nucleotides, but instead the sequences of thousands of small DNA fragments (each about 600-800 nucleotides long). The ends of these fragments overlap and, when aligned in the right way, make up the complete genome. Shotgun sequencing yields sequence data quickly, but the task of assembling the fragments can be quite complicated for larger genomes. In the case of the Human Genome Project, it took several months of CPU time (on a circa-2000 vintage DEC Alpha computer) to assemble the fragments. Shotgun sequencing is the method of choice for virtually all genomes sequenced today, and genome assembly algorithms are a critical area of bioinformatics research.

Another aspect of bioinformatics in sequence analysis is the automatic search for genes and regulatory sequences within a genome. Not all of the nucleotides within a genome are genes. Within the genome of higher organisms, large parts of the DNA do not serve any obvious purpose. This so-called junk DNA may, however, contain unrecognized functional elements. Bioinformatics helps to bridge the gap between genome and proteome projects, for example in the use of DNA sequences for protein identification.
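
The overlap step of shotgun assembly described above can be illustrated with a small sketch. This is only a toy greedy merge written for this report, not the algorithm used by any real assembler; the function names, the minimum overlap of 3 bases and the three short reads are assumptions made for illustration.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(fragments: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments that share the largest overlap."""
    contigs = list(fragments)
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, -1, -1
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # no overlaps left; the remaining contigs stay separate
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Hypothetical reads (real reads are hundreds of bases long and contain errors).
reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
print(greedy_assemble(reads))  # ['ATGGCGTACCTGA']
```

Real assemblers must also cope with sequencing errors, repeats and reverse-complement reads, which this sketch ignores.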

Genome annotation

In the context of genomics, annotation is the process of marking the genes and other biological features in a DNA sequence. The first genome annotation software system was designed in 1995 by Dr. Owen White, who was part of the team that sequenced and analyzed the first genome of a free-living organism to be decoded, the bacterium Haemophilus influenzae. Dr. White built a software system to find the genes (places in the DNA sequence that encode a protein), the transfer RNA, and other features, and to make initial assignments of function to those genes. Most current genome annotation systems work similarly, but the programs available for analysis of genomic DNA are constantly changing and improving.
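
As a rough illustration of the gene-finding step of annotation, the sketch below scans the forward strand of a DNA string for open reading frames (an ATG followed by an in-frame stop codon). It is a simplified assumption of how such a search can work, not the method of the annotation system mentioned above; the function name, the `min_codons` parameter and the example sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3) -> list[tuple[int, int, str]]:
    """Return (start, end, sequence) for forward-strand open reading frames.

    An ORF here runs from an ATG to the first in-frame stop codon, inclusive.
    Positions are 0-based and `end` is exclusive.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                end = pos + 3
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end, dna[start:end]))
                start = None
    return orfs


# Hypothetical sequence with a single short ORF in the third reading frame.
print(find_orfs("CCATGAAATTTTAGGG"))  # [(2, 14, 'ATGAAATTTTAG')]
```

A practical gene finder would also search the reverse strand and, for eukaryotes, would have to handle introns; both are left out here.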

Computational evolutionary biology

Evolutionary biology is the study of the origin and descent of species, as well as their change over time. Informatics has assisted evolutionary biologists in several key ways; it has enabled researchers to: trace the evolution of a large number of organisms by measuring changes in their DNA, rather than through physical taxonomy or physiological observations alone; more recently, compare entire genomes, which permits the study of more complex evolutionary events, such as gene duplication, lateral gene transfer, and the prediction of bacterial speciation factors; build complex computational models of populations to predict the outcome of the system over time; and track and share information on an increasingly large number of species and organisms. Future work endeavours to reconstruct the now more complex tree of life. The area of research within computer science that uses genetic algorithms is sometimes confused with computational evolutionary biology, but the two areas are unrelated.

Measuring biodiversity

Biodiversity of an ecosystem might be defined as the total genomic complement of a particular environment, from all of the species present, whether it is a biofilm in an abandoned mine, a drop of sea water, a scoop of soil, or the entire biosphere of the planet Earth. Databases are used to collect the species names, descriptions, distributions, genetic information, status and size of populations, habitat needs, and how each organism interacts with other species. Specialized software programs are used to find, visualize, and analyze the information, and most importantly, communicate it to other people. Computer simulations model such things as population dynamics, or calculate the cumulative genetic health of a breeding pool (in agriculture) or endangered population (in conservation). One very exciting potential of this field is that entire DNA sequences, or genomes, of endangered species can be preserved, allowing the results of Nature's genetic experiment to be remembered in silico, and possibly reused in the future, even if that species is eventually lost.

Analysis of gene expression

The expression of many genes can be determined by measuring mRNA levels with multiple techniques including microarrays, expressed cDNA sequence tag (EST) sequencing, serial analysis of gene expression (SAGE) tag sequencing, massively parallel signature sequencing (MPSS), or various applications of multiplexed in-situ hybridization. All of these techniques are extremely noise-prone and/or subject to bias in the biological measurement, and a major research area in computational biology involves developing statistical tools to separate signal from noise in high-throughput gene expression studies. Such studies are often used to determine the genes implicated in a disorder: one might compare microarray data from cancerous epithelial cells to data from non-cancerous cells to determine the transcripts that are up-regulated and down-regulated in a particular population of cancer cells.

Analysis of regulation

Regulation is the complex orchestration of events starting with an extra-cellular signal and ultimately leading to an increase or decrease in the activity of one or more protein molecules. Bioinformatics techniques have been applied to explore various steps in this process. For example, promoter analysis involves the elucidation and study of sequence motifs in the genomic region surrounding the coding region of a gene. These motifs influence the extent to which that region is transcribed into mRNA. Expression data can be used to infer gene regulation: one might compare microarray data from a wide variety of states of an organism to form hypotheses about the genes involved in each state. In a single-cell organism, one might compare stages of the cell cycle, along with various stress conditions (heat shock, starvation, etc.). One can then apply clustering algorithms to that expression data to determine which genes are co-expressed. For example, the upstream regions (promoters) of co-expressed genes can be searched for over-represented regulatory elements.

Analysis of protein expression

Protein microarrays and high throughput (HT) mass spectrometry (MS) can provide a snapshot of the proteins present in a biological sample. Bioinformatics is very much involved in making sense of protein microarray and HT MS data; the former approach faces similar problems as microarrays targeted at mRNA, while the latter involves the problem of matching large amounts of mass data against predicted masses from protein sequence databases, and the complicated statistical analysis of samples where multiple, but incomplete, peptides from each protein are detected.

Analysis of mutations in cancer

Massive sequencing efforts are currently underway to identify point mutations in a variety of genes in cancer. The sheer volume of data produced requires automated systems to read sequence data, and to compare the sequencing results to the known sequence of the human genome, including known germline polymorphisms. Oligonucleotide microarrays, including comparative genomic hybridization and single nucleotide polymorphism arrays, able to probe simultaneously up to several hundred thousand sites throughout the genome, are being used to identify chromosomal gains and losses in cancer. Hidden Markov model and change-point analysis methods are being developed to infer real copy number changes from often noisy data. Further informatics approaches are being developed to understand the implications of lesions found to be recurrent across many tumors. Some modern tools (e.g. Quantum 3.1) provide the means to change a protein sequence at specific sites through alterations to its amino acids and to predict changes in bioactivity after the mutations.

Prediction of protein structure

Protein structure prediction is another important application of bioinformatics. The amino acid sequence of a protein, the so-called primary structure, can be easily determined from the sequence on the gene that codes for it. In the vast majority of cases, this primary structure uniquely determines a structure in its native environment. (Of course, there are exceptions, such as the bovine spongiform encephalopathy - aka Mad Cow Disease - prion.) Knowledge of this structure is vital in understanding the function of the protein. For lack of better terms, structural information is usually classified as one of secondary, tertiary and quaternary structure. A viable general solution to such predictions remains an open problem. As of now, most efforts have been directed towards heuristics that work most of the time. One of the key ideas in bioinformatics is the notion of homology. In the genomic branch of bioinformatics, homology is used to predict the function of a gene: if the sequence of gene A, whose function is known, is homologous to the sequence of gene B, whose function is unknown, one could infer that B may share A's function. In the structural branch of bioinformatics, homology is used to determine which parts of a protein are important in structure formation and interaction with other proteins. In a technique called homology modeling, this information is used to predict the structure of a protein once the structure of a homologous protein is known. This currently remains the only way to predict protein structures reliably. One example of this is the similar protein homology between hemoglobin in humans and the hemoglobin in legumes (leghemoglobin). Both serve the same purpose of transporting oxygen in the organism. Though both of these proteins have completely different amino acid sequences, their protein structures are virtually identical, which reflects their near identical purposes. 
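
Homology between two sequences is usually judged from an alignment. The sketch below is a minimal global alignment (the Needleman-Wunsch dynamic-programming method) followed by a percent-identity calculation. The scoring values (match +1, mismatch -1, gap -2), the function names and the two example strings are assumptions chosen for illustration; real protein comparisons use substitution matrices such as BLOSUM rather than a flat match/mismatch score.

```python
def needleman_wunsch(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -2) -> tuple[str, str]:
    """Globally align two sequences and return the two gapped strings."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    aligned_a, aligned_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Share of alignment columns in which both sequences carry the same residue."""
    matches = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


# Hypothetical toy sequences; the second lacks one residue of the first.
top, bottom = needleman_wunsch("GATTACA", "GATACA")
print(top)
print(bottom)
print(round(percent_identity(top, bottom), 1))
```

Percent identity is only a crude indicator: as the hemoglobin example above shows, proteins with low sequence identity can still share a structure.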
Comparative genomics

The core of comparative genome analysis is the establishment of the correspondence between genes (orthology analysis) or other genomic features in different organisms. It is these intergenomic maps that make it possible to trace the evolutionary processes responsible for the divergence of two genomes. A multitude of evolutionary events acting at various organizational levels shape genome evolution. At the lowest level, point mutations affect individual nucleotides. At a higher level, large chromosomal segments undergo duplication, lateral transfer, inversion, transposition, deletion and insertion. Ultimately, whole genomes are involved in processes of hybridization, polyploidization and endosymbiosis, often leading to rapid speciation. The complexity of genome evolution poses many exciting challenges to developers of mathematical models and algorithms, who have recourse to a spectrum of algorithmic, statistical and mathematical techniques, ranging from exact, heuristic, fixed-parameter and approximation algorithms for problems based on parsimony models to Markov Chain Monte Carlo algorithms for Bayesian analysis of problems based on probabilistic models.

Modeling biological systems

Systems biology involves the use of computer simulations of cellular subsystems (such as the networks of metabolites and enzymes which comprise metabolism, signal transduction pathways and gene regulatory networks) to both analyze and visualize the complex connections of these cellular processes. Artificial life or virtual evolution attempts to understand evolutionary processes via the computer simulation of simple (artificial) life forms.

High-throughput image analysis

Computational technologies are used to accelerate or fully automate the processing, quantification and analysis of large amounts of high-information-content biomedical imagery. Modern image analysis systems augment an observer's ability to make measurements from a large or complex set of images, by improving accuracy, objectivity, or speed. A fully developed analysis system may completely replace the observer. Although these systems are not unique to biomedical imagery, biomedical imaging is becoming more important for both diagnostics and research. Some examples are: high-throughput and high-fidelity quantification and sub-cellular localization (high-content screening, cytohistopathology); morphometrics; clinical image analysis and visualization; determining the real-time air-flow patterns in breathing lungs of living animals; quantifying occlusion size in real-time imagery from the development of and recovery during arterial injury; making behavioral observations from extended video recordings of laboratory animals; and infrared measurements for metabolic activity determination.

GENOMICS: As of September 2007, the complete sequence was known of about 1879 viruses, 577 bacterial species and roughly 23 eukaryote organisms, of which about half are fungi. Genomics is the study of the genomes of organisms. The field includes intensive efforts to determine the entire DNA sequence of organisms and fine-scale genetic mapping efforts. The field also includes studies of intragenomic phenomena such as heterosis, epistasis, pleiotropy and other interactions between loci and alleles within the genome. In contrast, the investigation of the roles and functions of single genes is a primary focus of molecular biology or genetics and is a common topic of modern medical and biological research. Research on single genes does not fall into the definition of genomics unless the aim of this genetic, pathway, and functional information analysis is to elucidate its effect on, place in, and response to the entire genome's networks.

For the United States Environmental Protection Agency, "the term 'genomics' encompasses a broader scope of scientific inquiry and associated technologies than when genomics was initially considered. A genome is the sum total of all an individual organism's genes. Thus, genomics is the study of all the genes of a cell, or tissue, at the DNA (genotype), mRNA (transcriptome), or protein (proteome) levels."

Genomics was established by Fred Sanger when he first sequenced the complete genomes of a virus and a mitochondrion. His group established techniques of sequencing, genome mapping, data storage, and bioinformatic analyses in the 1970s and 1980s. A major branch of genomics is still concerned with sequencing the genomes of various organisms, but the knowledge of full genomes has created the possibility for the field of functional genomics, mainly concerned with patterns of gene expression during various conditions. The most important tools here are microarrays and bioinformatics.

The study of the full set of proteins in a cell type or tissue, and the changes during various conditions, is called proteomics. A related concept is materiomics, which is defined as the study of the material properties of biological materials (e.g. hierarchical protein structures and materials, mineralized biological tissues, etc.) and their effect on the macroscopic function and failure in their biological context, linking processes, structure and properties at multiple scales through a materials science approach.

The actual term 'genomics' is thought to have been coined by Dr. Tom Roderick, a geneticist at the Jackson Laboratory (Bar Harbor, ME), at a meeting held in Maryland on the mapping of the human genome in 1986. In 1972, Walter Fiers and his team at the Laboratory of Molecular Biology of the University of Ghent (Ghent, Belgium) were the first to determine the sequence of a gene: the gene for the bacteriophage MS2 coat protein. In 1976, the team determined the complete nucleotide sequence of bacteriophage MS2 RNA. The first DNA-based genome to be sequenced in its entirety was that of bacteriophage Φ-X174 (5,386 bp), sequenced by Frederick Sanger in 1977.

The first free-living organism to be sequenced was Haemophilus influenzae in 1995, and since then genomes have been sequenced at a rapid pace. Most of the bacteria whose genomes have been completely sequenced are problematic disease-causing agents, such as Haemophilus influenzae. Of the other sequenced species, most were chosen because they were well-studied model organisms or promised to become good models. Yeast (Saccharomyces cerevisiae) has long been an important model organism for the eukaryotic cell, while the fruit fly Drosophila melanogaster has been a very important tool (notably in early pre-molecular genetics). The worm Caenorhabditis elegans is an often used simple model for multicellular organisms. The zebrafish Brachydanio rerio is used for many developmental studies on the molecular level and the flower Arabidopsis thaliana is a model organism for flowering plants. The Japanese pufferfish (Takifugu rubripes) and the spotted green pufferfish (Tetraodon nigroviridis) are interesting because of their small and compact genomes, containing very little non-coding DNA compared to most species.

A rough draft of the human genome was completed by the Human Genome Project in early 2001, creating much fanfare. By 2007 the human sequence was declared "finished" (less than one error in 10,000 bases and all chromosomes assembled). Display of the results of the project required significant bioinformatics resources. The sequence of the human reference assembly can be explored using the UCSC Genome Browser.

TYPES OF GENOMICS: In the last few years some interesting findings have been recorded and several new branches have emerged. Consequently, the area of genomics has widened considerably. However, genomics is broadly categorised into two branches, structural genomics and functional genomics.

1. Structural Genomics: Structural genomics deals with DNA sequencing, sequence assembly, sequence organisation and management. Basically it is the starting stage of genome analysis, i.e. the construction of high-resolution genetic, physical or sequence maps of the organism. The complete DNA sequence of an organism is its ultimate physical map. Due to rapid advancement in DNA technology and the completion of several genome sequencing projects over the last few years, the concept of structural genomics has come to a stage of transition. Now it also includes the systematic determination of the 3D structure of proteins found in living cells, because proteins vary between groups of individuals and so there are also variations in genome sequences.

2. Functional Genomics: Based on the information from structural genomics, the next step is to reconstruct genome sequences and to find out the functions that the genes perform. This information also lends support to the design of experiments to find out the functions of a specific genome. The strategy of functional genomics has widened the scope of biological investigations. This strategy is based on the systematic study of everything from a single gene/protein to all genes/proteins. Therefore, large-scale experimental methodologies (along with statistically analysed/computed results) characterise functional genomics. Hence, functional genomics provides novel information about the genome. This eases the understanding of genes, the function of proteins, and protein interactions. The wealth of knowledge about this untold story has been unraveled by scientists since the development of microarray technology and proteomics. These two technologies helped to explore the instantaneous events of all the genes expressed in a cell/tissue under varying environmental conditions such as temperature, pH, etc.

GENOMICS APPLICATIONS:

1. Functional genomics

2. Gene identification

3. Comparative genomics

4. Genome-wide expression analysis

GENOMIC ANALYSIS:

The main focus of the genome analysis will be the microarray technique and the preprocessing and analysis methods associated with it. The microarray technique generates a gene expression profile, which gives the expression states of genes in a cell by reporting the mRNA concentration. The mRNA concentration in turn reports the cell status, determined by which proteins are currently produced and in what quantity. DNA microarray technologies such as cDNA and oligonucleotide arrays provide a means of measuring tens of thousands of genes simultaneously (a snapshot of the cell). Microarrays are a large-scale, high-throughput method for molecular biological experimentation.

Genes that share expression patterns, and hence might be regulated together, are assumed to be in the same genetic pathway. Therefore the microarray technique helps to understand the dynamics and regulation behavior in a cell. One of the goals of microarray technology is the detection of genes that are differentially expressed between tissue samples, such as healthy and cancerous tissues, to see which genes are relevant for cancer. It has important applications in pharmaceutical and clinical research and helps in understanding gene regulation and interactions. Genome analysis also includes genome anatomy and genome individuality (e.g. repetitions or single nucleotide polymorphisms). We will also address current genomic research questions about alternative splicing and nucleosome position.
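
The detection of differentially expressed genes mentioned above is often summarised as a log2 fold change between two conditions. The sketch below shows the arithmetic only; the gene names, intensity values, pseudocount and the two-fold threshold are hypothetical, and a real analysis would add normalisation and a statistical test across replicates.

```python
import math


def log2_fold_changes(healthy: dict[str, list[float]],
                      diseased: dict[str, list[float]],
                      pseudocount: float = 1.0) -> dict[str, float]:
    """Per-gene log2 ratio of mean diseased to mean healthy intensity."""
    changes = {}
    for gene, healthy_values in healthy.items():
        mean_healthy = sum(healthy_values) / len(healthy_values)
        mean_diseased = sum(diseased[gene]) / len(diseased[gene])
        # The pseudocount avoids division by zero for unexpressed genes.
        ratio = (mean_diseased + pseudocount) / (mean_healthy + pseudocount)
        changes[gene] = math.log2(ratio)
    return changes


def classify(changes: dict[str, float], threshold: float = 1.0) -> dict[str, str]:
    """Label genes whose expression changed at least two-fold (|log2| >= 1)."""
    labels = {}
    for gene, value in changes.items():
        if value >= threshold:
            labels[gene] = "up-regulated"
        elif value <= -threshold:
            labels[gene] = "down-regulated"
        else:
            labels[gene] = "unchanged"
    return labels


# Hypothetical spot intensities for three made-up genes, two replicates each.
healthy = {"gene_A": [7.0, 7.0], "gene_B": [63.0, 63.0], "gene_C": [15.0, 15.0]}
diseased = {"gene_A": [31.0, 31.0], "gene_B": [15.0, 15.0], "gene_C": [15.0, 15.0]}

changes = log2_fold_changes(healthy, diseased)
print(classify(changes))
```

With these made-up numbers, gene_A comes out four-fold higher in the diseased sample, gene_B four-fold lower, and gene_C unchanged.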

Genomics – Main Research Areas

With the completion of the entire human DNA sequence, an important stage of genomic research has come to an end. However, scientists still have plenty of work to do in genomics, such as determining the role of genomic variations in cell function and placing each gene in the context of the entire genome. In addition to the human genome, the genomes of other species are being studied as well, in order to better understand biology and to be able to use the knowledge of genomics to improve human health and to treat diseases. The advances in genomics were followed by advances in technology which enable scientists to conduct more complex research, but this also led to specialization within the field of genomics. Thus the study of genomes has been divided into several research areas with an emphasis on:

Human genomics

As already mentioned, the main goal of genomics in the field of the human genome, to sequence the entire human DNA, was completed in 2007, when the human genome sequence was declared finished by the Human Genome Project. Now, however, scientists have to interpret the data and make the knowledge useful for practical applications such as treating and preventing diseases.

Bacteriophage genomics

This field of genomic research is focused on the study of bacteriophages, or viruses which infect bacteria. In the past, bacteriophages were also used to define gene structure, and it was a bacteriophage whose genome was sequenced first.

Cyanobacteria genomics

This refers to the study of cyanobacteria, also known as blue-green algae, which get their energy through the process of photosynthesis. This phylum of bacteria is thought to play an important role in shaping the Earth’s atmosphere and the biodiversity of life on our planet by producing oxygen.

Metagenomics

This is the study of metagenomes, or genetic material that is obtained directly from environmental samples. Genome sequencing of samples taken from the environment has shown that traditional research, which was based on cultivated clonal cultures, has missed most microbial diversity. Metagenomics has revealed many previously unknown characteristics of the microbial world. As a result, this field of genomic research has great potential to revolutionize the understanding not only of the microbial world but of the entire living world.

Functional genomics

This field of genomics is focused on the interpretation of the data created by genomic research projects in order to describe the functions and interactions of genes. In contrast to genomics, which is mainly focused on obtaining information from the DNA sequence, functional genomics is also interested in the DNA at the level of the genes. However, it does not use a gene-by-gene approach but rather a genome-wide method.

Pharmacogenomics

This is a branch of pharmacology and genomics which researches the relationship between drug response and genetic variation in order to develop drug therapy which ensures optimal efficacy and minimum risk of side effects with respect to the patient’s genotype. This approach has been shown to be very helpful in treating conditions such as cancer, cardiovascular disease, diabetes, asthma and depression.

Drug designing

Drug design, sometimes referred to as rational drug design or more simply rational design, is the inventive process of finding new medications based on the knowledge of a biological target. The drug is most commonly an organic small molecule that activates or inhibits the function of a biomolecule such as a protein, which in turn results in a therapeutic benefit to the patient. In the most basic sense, drug design involves the design of small molecules that are complementary in shape and charge to the biomolecular target with which they interact and therefore will bind to it. Drug design frequently but not necessarily relies on computer modeling techniques. This type of modeling is often referred to as computer-aided drug design.

Drug discovery and development is an intense, lengthy and interdisciplinary endeavour. Drug discovery is mostly portrayed as a linear, consecutive process that starts with target and lead discovery, followed by lead optimization and preclinical in-vitro and in-vivo studies to determine whether such compounds satisfy a number of pre-set criteria for initiating clinical development.

Typically a drug target is a key molecule involved in a particular metabolic or signalling pathway that is specific to a disease condition or pathology, or to the infectivity or survival of a microbial pathogen. Some approaches attempt to stop the functioning of the pathway in the diseased state by causing a key molecule to stop functioning. Drugs may be designed that bind to the active region and inhibit this key molecule. However, these drugs would also have to be designed in such a way as not to affect any other important molecules that may be similar in appearance to the key molecules. Sequence homologies are often used to identify such risks. Other approaches may be to enhance the normal pathway by promoting specific molecules in the normal pathways that may have been affected in the diseased state.

For the pharmaceutical industry, bringing a drug from discovery to market takes approximately 12-14 years and costs up to $1.2-$1.4 billion. Traditionally, drugs were discovered by synthesizing compounds in a time-consuming multi-step process, testing them against a battery of in vivo biological screens and further investigating the promising candidates for their pharmacokinetic properties, metabolism and potential toxicity. Such a development process has resulted in high attrition rates, with failures attributed to poor pharmacokinetics (39%), lack of efficacy (30%), animal toxicity (11%), adverse effects in humans (10%) and various commercial and miscellaneous factors. Today, the process of drug discovery has been revolutionised with the advent of genomics, proteomics, bioinformatics and efficient technologies such as combinatorial chemistry, high throughput screening (HTS), virtual screening, de-novo design, in vitro and in-silico ADMET screening and structure-based drug design.

In-silico Drug Designing

In-silico methods can help in identifying drug targets via bioinformatics tools. They can also be used to analyze the target structures for possible binding or active sites, generate candidate molecules, check their drug-likeness, dock these molecules with the target, rank them according to their binding affinities, and further optimize the molecules to improve their binding characteristics.
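The filter-and-rank steps listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the workflow used in this project: the ligand names, descriptor values and docking scores are hypothetical, and drug-likeness is approximated with a strict form of Lipinski's rule of five (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors and at most 10 acceptors), which is one common choice among several.

```python
from dataclasses import dataclass


@dataclass
class Ligand:
    name: str
    mol_weight: float     # molecular weight in Da
    logp: float           # octanol-water partition coefficient
    h_donors: int         # hydrogen-bond donors
    h_acceptors: int      # hydrogen-bond acceptors
    docking_score: float  # kcal/mol; more negative = stronger predicted binding


def passes_rule_of_five(lig: Ligand) -> bool:
    """Strict Lipinski check: every one of the four criteria must hold."""
    return (lig.mol_weight <= 500 and lig.logp <= 5
            and lig.h_donors <= 5 and lig.h_acceptors <= 10)


def rank_candidates(ligands):
    """Keep drug-like ligands and sort them from best to worst docking score."""
    drug_like = [lig for lig in ligands if passes_rule_of_five(lig)]
    return sorted(drug_like, key=lambda lig: lig.docking_score)


# Hypothetical candidates, for illustration only.
candidates = [
    Ligand("ligand_A", 312.4, 2.1, 2, 5, -7.8),
    Ligand("ligand_B", 648.9, 6.3, 6, 12, -9.4),  # fails the drug-likeness filter
    Ligand("ligand_C", 284.3, 3.0, 1, 4, -8.5),
]
ranked = rank_candidates(candidates)
print([lig.name for lig in ranked])
```

Filtering before ranking keeps a ligand with an attractive docking score but poor drug-likeness (here the hypothetical ligand_B) from reaching the top of the list.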


The use of computers and computational methods permeates all aspects of drug discovery today and forms the core of structure-based drug design. High-performance computing, data management software and the internet facilitate access to the huge amounts of data generated and transform massive, complex biological data into workable knowledge in the modern drug discovery process. The use of complementary experimental and informatics techniques increases the chance of success in many stages of the discovery process, from the identification of novel targets and elucidation of their functions to the discovery and development of lead compounds with desired properties. Computational tools offer the advantage of delivering new drug candidates more quickly and at a lower cost. The major roles of computation in drug discovery are:

(1) Virtual screening & de novo design,

(2) In silico ADME/T prediction and

(3) Advanced methods for determining protein-ligand binding

Significance of in-silico drug design

As structures of more and more protein targets become available through crystallography, NMR and bioinformatics methods, there is an increasing demand for computational tools that can identify and analyze active sites and suggest potential drug molecules that can bind to these sites specifically. A global push is also essential to combat life-threatening diseases such as AIDS, tuberculosis and malaria. "Millions for Viagra and pennies for the diseases of the poor" describes the current state of investment in pharma R&D. The time and cost required to design a new drug are immense and at an unacceptable level. According to some estimates, it costs about $880 million and 14 years of research to develop a new drug before it is introduced in the market. Intervention of computers at some plausible steps is imperative to bring down the cost and time required in the drug discovery process.

Structure based drug design

The crystal structure of a ligand bound to a protein provides a detailed insight into the interactions made between the protein and the ligand. Structure-based design uses this information to identify where the ligand can be changed to modulate the physicochemical and ADME properties of the compound, by showing which parts of the compound are important to affinity and which parts can be altered without affecting the binding. The equilibrium between target and ligand is governed by the free energy of the complex compared to the free energy of the individual target and ligand. This includes not only the interaction between target and ligand but also the solvation and entropy of the three different species and the energy of the conformation of the free species.
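The free-energy comparison described above can be made concrete with a short calculation. The sketch below assumes the standard thermodynamic relation between the binding free energy and the dissociation constant, ΔG = RT ln(Kd) with a 1 M standard state; the three free-energy values are hypothetical and chosen only to give a round number.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(g_complex, g_target, g_ligand):
    """Free energy of the complex minus that of the separate target and ligand."""
    return g_complex - (g_target + g_ligand)


def dissociation_constant(delta_g, temperature=298.15):
    """Kd in mol/L from a binding free energy in kcal/mol (dG = RT ln Kd)."""
    return math.exp(delta_g / (R_KCAL * temperature))


# Hypothetical free energies in kcal/mol, for illustration only.
dg = binding_free_energy(g_complex=-110.0, g_target=-60.0, g_ligand=-40.0)
kd = dissociation_constant(dg)
print(f"dG = {dg:.1f} kcal/mol, Kd = {kd:.2e} M")
```

Because Kd depends exponentially on ΔG, a change of roughly 1.4 kcal/mol at room temperature shifts the affinity by about a factor of ten, which is why small structural changes to a ligand can matter so much.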

What is quality control and what are the methods applied for it?

Quality control is a process that is used to ensure a certain level of quality in a product or service. It might include whatever actions a business deems necessary to provide for the control and verification of certain characteristics of a product or service. Most often, it involves thoroughly examining and testing the quality of products or the results of services. The basic goal of this process is to ensure that the products or services that are provided meet specific requirements and characteristics, such as being dependable, satisfactory, safe and fiscally sound.

Companies that engage in quality control typically have a team of workers who focus on testing a certain number of products or observing services being done. The products or services that are examined usually are chosen at random. The goal of the quality control team is to identify products or services that do not meet a company's specified standards of quality. If a problem is identified, the job of a quality control team or professional might involve stopping production or service until the problem has been corrected. Depending on the particular service or product as well as the type of problem identified, production or services might not cease entirely.

Dissolution testing

In the pharmaceutical industry, drug dissolution testing is routinely used to provide critical in vitro drug release information for both quality control purposes, i.e., to assess batch-to-batch consistency of solid oral dosage forms such as tablets, and drug development, i.e., to predict in vivo drug release profiles.

In vitro drug dissolution data generated from dissolution testing experiments can be related to in vivo pharmacokinetic data by means of in vitro-in vivo correlations (IVIVC). A well established predictive IVIVC model can be very helpful for drug formulation design and post-approval manufacturing changes.[2]

The main objective of developing and evaluating an IVIVC is to establish the dissolution test as a surrogate for human bioequivalence studies, as stated by the Food and Drug Administration (FDA). Analytical data from drug dissolution testing are sufficient in many cases to establish safety and efficacy of a drug product without in vivo tests, following minor formulation and manufacturing changes (Qureshi and Shabnam, 2001). Thus, the dissolution testing, which is conducted in a dissolution apparatus, must be able to provide accurate and reproducible results.

Several dissolution apparatuses exist. In United States Pharmacopeia (USP) General Chapter <711> Dissolution, there are four dissolution apparatuses standardized and specified.[3] They are:

• USP Dissolution Apparatus 1 - Basket (37°C)

• USP Dissolution Apparatus 2 - Paddle (37°C)

• USP Dissolution Apparatus 3 - Reciprocating Cylinder (37°C)

• USP Dissolution Apparatus 4 - Flow-Through Cell (37°C)

USP Dissolution Apparatus 2 is the most widely used apparatus among these four.

The performance of a dissolution apparatus is highly dependent on hydrodynamics, owing to the nature of dissolution testing. The design of the apparatus and the way it is operated have a large impact on the hydrodynamics and thus on performance. Over the past few years, researchers have carried out hydrodynamic studies in dissolution apparatuses using both experimental methods and numerical modelling such as Computational Fluid Dynamics (CFD). The main target was USP Dissolution Apparatus 2, because many researchers suspect that it provides inconsistent and sometimes faulty data. These hydrodynamic studies clearly showed that USP Dissolution Apparatus 2 does have intrinsic hydrodynamic issues which could result in problems. In 2005, Professor Piero Armenante from New Jersey Institute of Technology (NJIT) and Professor Fernando Muzzio submitted a technical report to the FDA that discussed these intrinsic hydrodynamic issues on the basis of the research findings of their two groups.

More recently, hydrodynamic studies were conducted in USP Dissolution Apparatus 4.

Disintegration Test

Purpose

Disintegration tests are performed as per pharmacopoeial standards. Disintegration is a measure of the quality of oral dosage forms such as tablets and capsules. Each pharmacopoeia (USP, BP, IP, etc.) has its own set of standards and specifies its own disintegration tests. The USP, European Pharmacopoeia and Japanese Pharmacopoeia tests have been harmonised by the International Conference on Harmonisation (ICH) and are interchangeable. The disintegration test is performed to find out the time it takes for a solid oral dosage form such as a tablet or capsule to disintegrate completely. The disintegration time is a measure of quality: if it is too high, for example, the tablet may be too highly compressed, the capsule shell gelatin may not be of pharmacopoeial quality, or there may be several other causes. If the disintegration time is not uniform across a set of samples being analysed, it indicates batch inconsistency and a lack of batch uniformity.

History

As early as 1935, the Swiss Pharmacopoeia required that a disintegration test be performed on all tablets and capsules as a criterion of performance (1). The disintegration test was seen as a test for the uniformity of compressional characteristics, and compression characteristics were optimised on the basis of the disintegration test and the hardness test. The era of modern medicine may be considered to start in 1937, and from that year tablets became important (2). Tabletting technology was mostly empirical up to 1950, and until then formulators depended largely on the disintegration test to optimise compression characteristics. Drug release testing by way of dissolution testing was not much used to characterise tablets, probably because convenient and sensitive chemical analyses were not available before this period.

The British Pharmacopoeia was the first, in 1945, to adopt an official disintegration test. Before 1950, the test became official in the USP as well. Even at that time, it was recognised that disintegration does not ensure good performance. The USP-NF of that period says "disintegration does not imply complete solution of the tablet or even of its active ingredient." In 1950, sporadic reports started appearing of vitamin tablet products failing to release their total drug content. It was only then that formulators realised that tablets and capsules showing the required disintegration time might still show poor dissolution, which might affect their clinical performance. Chapman et al. demonstrated that formulations with long disintegration times might not show good bioavailability. Later, John Wagner demonstrated the relationship between the poor performance of some drug products in disintegration tests and their failure to release the drug during gastrointestinal transit.

In the 1960s two separate developments occurred. One was the development of sensitive instrumental methods of analysis; the other was the growth of a new generation of pharmaceutical scientists who started applying the principles of physical chemistry to pharmacy. In the USA this development was attributed to Takeru Higuchi and his students. Later, more pharmaceutical scientists such as Campagna, Nelson and Levy worked in this field, and more and more instances of a lack of correlation between disintegration time and bioavailability surfaced. In 1970 the first dissolution apparatus, the rotating basket, was designed and adopted in the USA. An excellent review of the disintegration test was written by Wagner in 1971 (3).

Disintegration Test Apparatus

The disintegration test is conducted using the disintegration apparatus. Although there are slight variations between the different pharmacopoeias, the basic construction and working of the apparatus remain the same. The apparatus consists of a basket made of transparent polyvinyl or other plastic material. Six tubes of equal diameter are set into the basket, and a stainless steel wire mesh of uniform mesh size is fixed to each of the six tubes. Small metal discs may be used to keep the dosage form completely immersed. The entire basket-rack assembly is moved by a reciprocating motor fixed to the apex of the assembly. The assembly is immersed in a vessel containing the medium in which the disintegration test is to be carried out. The vessel is provided with a thermostat to regulate the temperature of the fluid medium to the desired temperature.

Disintegration Test Method

The disintegration test for each dosage form is given in the pharmacopoeia. There are some general tests for typical types of dosage forms. However, the disintegration test prescribed in the individual monograph of a product is to be followed. If the monograph does not specify a test, the general test for the specific dosage form may be employed. Some of the types of dosage forms and their disintegration tests are:

1. Uncoated tablets: tested using distilled water as the medium at 37 ± 2 °C at 29-32 cycles per minute; the test is completed after 15 minutes. The result is acceptable when there is no palpable core at the end of the cycle (for at least 5 tablets or capsules) and the mass does not stick to the immersion disc.

2. Coated tablets: the same test procedure is adopted, but the time of operation is 30 minutes.

3. Enteric coated (gastric resistant) tablets: the test is carried out first in distilled water (at room temperature for 5 minutes per USP; BP and IP do not include the distilled water step), then in 0.1 M HCl (up to 2 hours; BP) or simulated gastric fluid (1 hour; USP), followed by phosphate buffer, pH 6.8 (1 hour; BP) or simulated intestinal fluid without enzymes (1 hour; USP).

4. Chewable tablets: exempted from the disintegration test (BP and IP); 4 hours (USP).

These are a few examples for illustration. The disintegration tests for capsules, both hard and soft gelatin capsules, are performed in a similar manner. The USP also provides disintegration tests for suppositories, pessaries, etc.

Applications of Disintegration test:

1. The disintegration test is a simple test which helps the formulator in the preformulation stage.

2. It helps in the optimisation of manufacturing variables, such as compressional force and dwell time.

3. It is also a simple in-process control tool to ensure uniformity from batch to batch and among different tablets.

4. It is also an important test in the quality control of tablets and hard gelatine capsules.

Advantages of Disintegration tests:

This test is simple in concept and in practice. It is very useful in preformulation, optimisation and in quality control.

Disadvantages:

Disintegration test cannot be relied upon for the assurance of bioavailability.

HIGH PERFORMANCE LIQUID CHROMATOGRAPHY - HPLC

High performance liquid chromatography is a powerful tool in analysis. This section looks at how it is carried out and shows how it uses the same principles as thin layer chromatography and column chromatography.

Carrying out HPLC

Introduction

High performance liquid chromatography is basically a highly improved form of column chromatography. Instead of a solvent being allowed to drip through a column under gravity, it is forced through under high pressures of up to 400 atmospheres. That makes it much faster.

It also allows you to use a very much smaller particle size for the column packing material which gives a much greater surface area for interactions between the stationary phase and the molecules flowing past it. This allows a much better separation of the components of the mixture. The other major improvement over column chromatography concerns the detection methods which can be used. These methods are highly automated and extremely sensitive.


The column and the solvent

Confusingly, there are two variants in use in HPLC depending on the relative polarity of the solvent and the stationary phase.

Normal phase HPLC

Normal phase HPLC works in essentially the same way as thin layer chromatography or column chromatography. Although it is described as "normal", it isn't the most commonly used form of HPLC.

The column is filled with tiny silica particles, and the solvent is non-polar - hexane, for example. A typical column has an internal diameter of 4.6 mm (and may be less than that), and a length of 150 to 250 mm.

Polar compounds in the mixture being passed through the column will stick longer to the polar silica than non-polar compounds will. The non-polar ones will therefore pass more quickly through the column.

Reversed phase HPLC

In this case, the column size is the same, but the silica is modified to make it non-polar by attaching long hydrocarbon chains to its surface - typically with either 8 or 18 carbon atoms in them. A polar solvent is used - for example, a mixture of water and an alcohol such as methanol.

In this case, there will be a strong attraction between the polar solvent and polar molecules in the mixture being passed through the column. There won't be as much attraction between the hydrocarbon chains attached to the silica (the stationary phase) and the polar molecules in the solution. Polar molecules in the mixture will therefore spend most of their time moving with the solvent.

Non-polar compounds in the mixture will tend to form attractions with the hydrocarbon groups because of van der Waals dispersion forces. They will also be less soluble in the solvent because of the need to break hydrogen bonds as they squeeze in between the water or methanol molecules, for example. They therefore spend less time in solution in the solvent and this will slow them down on their way through the column.

That means that now it is the polar molecules that will travel through the column more quickly.

Reversed phase HPLC is the most commonly used form of HPLC.

Looking at the whole process

A flow scheme for HPLC

Injection of the sample

Injection of the sample is entirely automated, and you wouldn't be expected to know how this is done at this introductory level. Because of the pressures involved, it is not the same as in gas chromatography (if you have already studied that).

Retention time

The time taken for a particular compound to travel through the column to the detector is known as its retention time. This time is measured from the time at which the sample is injected to the point at which the display shows a maximum peak height for that compound.

Different compounds have different retention times. For a particular compound, the retention time will vary depending on:

• the pressure used (because that affects the flow rate of the solvent)

• the nature of the stationary phase (not only what material it is made of, but also particle size)

• the exact composition of the solvent

• the temperature of the column

That means that conditions have to be carefully controlled if you are using retention times as a way of identifying compounds.

The detector

There are several ways of detecting when a substance has passed through the column. A common method which is easy to explain uses ultra-violet absorption.

Many organic compounds absorb UV light of various wavelengths. If you have a beam of UV light shining through the stream of liquid coming out of the column, and a UV detector on the opposite side of the stream, you can get a direct reading of how much of the light is absorbed.

The amount of light absorbed will depend on the amount of a particular compound that is passing through the beam at the time.

You might wonder why the solvents used don't absorb UV light. They do! But different compounds absorb most strongly in different parts of the UV spectrum.

Methanol, for example, absorbs at wavelengths below 205 nm, and water below 190 nm. If you were using a methanol-water mixture as the solvent, you would therefore have to use a wavelength greater than 205 nm to avoid false readings from the solvent.
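The rule for choosing a detection wavelength can be written as a small check. The sketch below uses only the two cutoff values quoted above (methanol 205 nm, water 190 nm); the function names are hypothetical, and the 254 nm value in the example is simply an assumed candidate wavelength.

```python
# UV cutoffs in nm, as quoted in the text: below these the solvent itself absorbs.
UV_CUTOFF_NM = {"methanol": 205, "water": 190}


def min_detection_wavelength(solvents):
    """The highest cutoff among the solvents in the mobile phase."""
    return max(UV_CUTOFF_NM[name] for name in solvents)


def wavelength_is_usable(wavelength_nm, solvents):
    """True if the detector wavelength lies above every solvent cutoff."""
    return wavelength_nm > min_detection_wavelength(solvents)


mobile_phase = ["methanol", "water"]
print(min_detection_wavelength(mobile_phase))   # the methanol cutoff dominates
print(wavelength_is_usable(254, mobile_phase))  # 254 nm is an assumed candidate
```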

Interpreting the output from the detector

The output will be recorded as a series of peaks - each one representing a compound in the mixture passing through the detector and absorbing UV light. As long as you were careful to control the conditions on the column, you could use the retention times to help to identify the compounds present - provided, of course, that you (or somebody else) had already measured them for pure samples of the various compounds under those identical conditions.
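Identification by retention time amounts to comparing each observed peak with reference values measured for pure compounds under identical conditions. The sketch below illustrates the idea; the compound names, retention times and the 0.05 minute tolerance are all hypothetical.

```python
def identify_peaks(observed_rt, reference_rt, tolerance=0.05):
    """Match observed retention times (minutes) against reference values.

    A peak is assigned only if exactly one reference compound lies within
    the tolerance; otherwise it is left unassigned (None).
    """
    assignments = {}
    for rt in observed_rt:
        matches = [name for name, ref in reference_rt.items()
                   if abs(rt - ref) <= tolerance]
        assignments[rt] = matches[0] if len(matches) == 1 else None
    return assignments


# Hypothetical reference values for pure samples under identical conditions.
reference = {"compound_X": 3.20, "compound_Y": 5.75}
observed = [3.22, 4.10, 5.71]

assignments = identify_peaks(observed, reference)
for rt, name in assignments.items():
    print(f"{rt:.2f} min -> {name or 'unidentified'}")
```

A match on retention time alone is only supporting evidence, since two different compounds can elute at nearly the same time.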

But you can also use the peaks as a way of measuring the quantities of the compounds present. Let's suppose that you are interested in a particular compound, X.

If you injected a solution containing a known amount of pure X into the machine, not only could you record its retention time, but you could also relate the amount of X to the peak that was formed.

The area under the peak is proportional to the amount of X which has passed the detector, and this area can be calculated automatically by the computer linked to the display. The area it would measure is shown in green in the (very simplified) diagram.

If the solution of X was less concentrated, the area under the peak would be less - although the retention time would still be the same.

This means that it is possible to calibrate the machine so that it can be used to find how much of a substance is present - even in very small quantities.
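Calibration of this kind is usually done by injecting several standards of known concentration and fitting a straight line of peak area against concentration. The sketch below assumes a linear detector response and uses hypothetical standards (concentrations in mg/L, areas in arbitrary units); it is an illustration of the principle, not data from this project.

```python
def fit_calibration(concentrations, areas):
    """Least-squares straight line: area = slope * concentration + intercept."""
    n = len(concentrations)
    mean_c = sum(concentrations) / n
    mean_a = sum(areas) / n
    sxx = sum((c - mean_c) ** 2 for c in concentrations)
    sxy = sum((c - mean_c) * (a - mean_a)
              for c, a in zip(concentrations, areas))
    slope = sxy / sxx
    intercept = mean_a - slope * mean_c
    return slope, intercept


def concentration_from_area(area, slope, intercept):
    """Invert the calibration line to estimate an unknown concentration."""
    return (area - intercept) / slope


# Hypothetical standards of pure X.
std_conc = [1.0, 2.0, 4.0, 8.0]
std_area = [1500.0, 3000.0, 6000.0, 12000.0]

slope, intercept = fit_calibration(std_conc, std_area)
unknown = concentration_from_area(4500.0, slope, intercept)
print(f"slope = {slope:.1f}, intercept = {intercept:.1f}, unknown = {unknown:.2f} mg/L")
```

The calibration is valid only for compound X at the chosen wavelength, because another compound may absorb more or less strongly at the same concentration.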

Be careful, though! If you had two different substances in the mixture (X and Y) could you say anything about their relative amounts? Not if you were using UV absorption as your detection method.

In the diagram, the area under the peak for Y is less than that for X. That may be because there is less Y than X, but it could equally well be because Y absorbs UV light at the wavelength you are using less than X does. There might be large quantities of Y present, but if it only absorbed weakly, it would only give a small peak.

Coupling HPLC to a mass spectrometer

This is where it gets really clever! When the detector is showing a peak, some of what is passing through the detector at that time can be diverted to a mass spectrometer. There it will give a fragmentation pattern which can be compared against a computer database of known patterns. That means that the identity of a huge range of compounds can be found without having to know their retention times.

Introduction to Spectroscopy

In previous sections of this text the structural formulas of hundreds of organic compounds have been reported, often with very little supporting evidence. These structures, and millions of others described in the scientific literature, are in fact based upon sound experimental evidence, which was omitted at the time in order to focus on other aspects of the subject. Much of the most compelling evidence for structure comes from spectroscopic experiments, as will be demonstrated in the following topics.

"The Light of Knowledge" is an often used phrase, but it is particularly appropriate in reference to spectroscopy. Most of what we know about the structure of atoms and molecules comes from studying their interaction with light (electromagnetic radiation). Different regions of the electromagnetic spectrum provide different kinds of information as a result of such interactions. Realizing that light may be considered to have both wave-like and particle-like characteristics, it is useful to consider that a given frequency or wavelength of light is associated with a "light quantum" of energy we now call a photon. Frequency and energy change proportionally (E = hν), but wavelength has an inverse relationship to these quantities (ν = c/λ).

In order to "see" a molecule, we must use light having a wavelength smaller than the molecule itself (roughly 1 to 15 angstrom units). Such radiation is found in the X-ray region of the spectrum, and the field of X-ray crystallography yields remarkably detailed pictures of molecular structures amenable to examination. The chief limiting factor here is the need for high quality crystals of the compound being studied. The methods of X-ray crystallography are too complex to be described here; nevertheless, as automatic instrumentation and data handling techniques improve, it will undoubtedly prove to be the procedure of choice for structure determination. The spectroscopic techniques described below do not provide a three-dimensional picture of a molecule, but instead yield information about certain characteristic features. A brief summary of this information follows:

• Mass Spectrometry: Sample molecules are ionized by high energy electrons. The mass to charge ratio of these ions is measured very accurately by electrostatic acceleration and magnetic field perturbation, providing a precise molecular weight. Ion fragmentation patterns may be related to the structure of the molecular ion.

• Ultraviolet-Visible Spectroscopy: Absorption of this relatively high-energy light causes electronic excitation. The easily accessible part of this region (wavelengths of 200 to 800 nm) shows absorption only if conjugated pi-electron systems are present.

• Infrared Spectroscopy: Absorption of this lower energy radiation causes vibrational and rotational excitation of groups of atoms within the molecule. Because of their characteristic absorptions, identification of functional groups is easily accomplished.

• Nuclear Magnetic Resonance Spectroscopy: Absorption in the low-energy radio-frequency part of the spectrum causes excitation of nuclear spin states. NMR spectrometers are tuned to certain nuclei (e.g. 1H, 13C, 19F and 31P). For a given type of nucleus, high-resolution spectroscopy distinguishes and counts atoms in different locations in the molecule.

Visible and Ultraviolet Spectroscopy

1. Background

An obvious difference between certain compounds is their color. Thus, quinone is yellow; chlorophyll is green; the 2,4-dinitrophenylhydrazone derivatives of aldehydes and ketones range in color from bright yellow to deep red, depending on double bond conjugation; and aspirin is colorless. In this respect the human eye is functioning as a spectrometer analyzing the light reflected from the surface of a solid or passing through a liquid. Although we see sunlight (or white light) as uniform or homogeneous in color, it is actually composed of a broad range of radiation wavelengths in the ultraviolet (UV), visible and infrared (IR) portions of the spectrum. The component colors of the visible portion can be separated by passing sunlight through a prism, which bends the light in differing degrees according to wavelength.

Electromagnetic radiation such as visible light is commonly treated as a wave phenomenon, characterized by a wavelength or frequency. Wavelength is defined as the distance between adjacent peaks (or troughs), and may be designated in meters, centimeters or nanometers (10^-9 meters). Frequency is the number of wave cycles that travel past a fixed point per unit of time, and is usually given in cycles per second, or hertz (Hz). Visible wavelengths cover a range from approximately 400 to 800 nm. The longest visible wavelength is red and the shortest is violet. Other common colors of the spectrum, in order of decreasing wavelength, may be remembered by the mnemonic ROY G BIV. The wavelengths of what we perceive as particular colors in the visible portion of the spectrum are listed below. In horizontal diagrams, wavelength increases on moving from left to right.

• Violet: 400 - 420 nm

• Indigo: 420 - 440 nm

• Blue: 440 - 490 nm

• Green: 490 - 570 nm

• Yellow: 570 - 585 nm

• Orange: 585 - 620 nm

• Red: 620 - 780 nm

When white light passes through or is reflected by a colored substance, a characteristic portion of the mixed wavelengths is absorbed. The remaining light will then assume the complementary color to the wavelength(s) absorbed. On a color wheel, complementary colors are diametrically opposite each other. Thus, absorption of 420-430 nm light renders a substance yellow, and absorption of 500-520 nm light makes it red. Green is unique in that it can be created by absorption close to 400 nm as well as absorption near 800 nm.

Early humans valued colored pigments, and used them for decorative purposes. Many of these were inorganic minerals, but several important organic dyes were also known. These included the crimson pigment, kermesic acid, the blue dye, indigo, and the yellow saffron pigment, crocetin. A rare dibromo-indigo derivative, punicin, was used to color the robes of the royal and wealthy. The deep orange hydrocarbon carotene is widely distributed in plants, but is not sufficiently stable to be used as a permanent pigment, other than for food coloring. A common feature of all these colored compounds is a system of extensively conjugated pi-electrons.

2. The Electromagnetic Spectrum

The visible spectrum constitutes but a small part of the total radiation spectrum. Most of the radiation that surrounds us cannot be seen, but can be detected by dedicated sensing instruments. This electromagnetic spectrum ranges from very short wavelengths (including gamma and x-rays) to very long wavelengths (including microwaves and broadcast radio waves). The following chart displays many of the important regions of this spectrum, and demonstrates the inverse relationship between wavelength and frequency (ν = c/λ).

The energy associated with a given segment of the spectrum is proportional to its frequency. The equation E = hν = hc/λ describes this relationship, which provides the energy carried by a photon of a given wavelength of radiation.
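The two relationships, ν = c/λ and E = hν, can be evaluated directly for any wavelength. The sketch below uses the defined SI values of the Planck constant and the speed of light; the function names are hypothetical.

```python
PLANCK_H = 6.62607015e-34  # Planck constant, J*s
LIGHT_C = 2.99792458e8     # speed of light in vacuum, m/s


def frequency_hz(wavelength_nm):
    """Frequency from wavelength: nu = c / lambda."""
    return LIGHT_C / (wavelength_nm * 1e-9)


def photon_energy_joules(wavelength_nm):
    """Energy of a single photon: E = h * nu."""
    return PLANCK_H * frequency_hz(wavelength_nm)


for wavelength in (800, 400, 200):
    print(f"{wavelength} nm: nu = {frequency_hz(wavelength):.3e} Hz, "
          f"E = {photon_energy_joules(wavelength):.3e} J")
```

Halving the wavelength doubles both the frequency and the photon energy, which is the inverse relationship described above.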


3. UV-Visible Absorption Spectra

To understand why some compounds are colored and others are not, and to determine the relationship of conjugation to color, we must make accurate measurements of light absorption at different wavelengths in and near the visible part of the spectrum. Commercial optical spectrometers enable such experiments to be conducted with ease, and usually survey both the near ultraviolet and visible portions of the spectrum.

The visible region of the spectrum comprises photon energies of 36 to 72 kcal/mole, and the near ultraviolet region, out to 200 nm, extends this energy range to 143 kcal/mole. Ultraviolet radiation having wavelengths less than 200 nm is difficult to handle, and is seldom used as a routine tool for structural analysis.
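The energy range quoted above can be checked by converting the energy of one photon to the energy of a mole of photons. The sketch below assumes the thermochemical calorie (1 kcal = 4184 J); the function name is hypothetical.

```python
PLANCK_H = 6.62607015e-34  # Planck constant, J*s
LIGHT_C = 2.99792458e8     # speed of light in vacuum, m/s
AVOGADRO = 6.02214076e23   # photons per mole
JOULES_PER_KCAL = 4184.0   # thermochemical kilocalorie


def molar_photon_energy_kcal(wavelength_nm):
    """Energy of one mole of photons of the given wavelength, in kcal/mol."""
    energy_per_photon = PLANCK_H * LIGHT_C / (wavelength_nm * 1e-9)
    return energy_per_photon * AVOGADRO / JOULES_PER_KCAL


for wavelength in (800, 400, 200):
    print(f"{wavelength} nm: {molar_photon_energy_kcal(wavelength):.1f} kcal/mol")
```

The results agree with the rounded figures in the text: about 36 kcal/mole at 800 nm, just under 72 kcal/mole at 400 nm and about 143 kcal/mole at 200 nm.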

The energies noted above are sufficient to promote or excite a molecular electron to a higher energy orbital. Consequently, absorption spectroscopy carried out in this region is sometimes called "electronic spectroscopy". A diagram showing the various kinds of electronic excitation that may occur in organic molecules is shown on the left. Of the six transitions outlined, only the two lowest energy ones (left-most, colored blue) are achieved by the energies available in the 200 to 800 nm spectrum. As a rule, energetically favored electron promotion will be from the highest occupied molecular orbital (HOMO) to the lowest unoccupied molecular orbital (LUMO), and the resulting species is called an excited state.

When sample molecules are exposed to light having an energy that matches a possible electronic transition within the molecule, some of the light energy will be absorbed as the electron is promoted to a higher energy orbital. An optical spectrometer records the wavelengths at which absorption occurs, together with the degree of absorption at each wavelength. The resulting spectrum is presented as a graph of absorbance (A) versus wavelength, as in the isoprene spectrum shown below. Since isoprene is colorless, it does not absorb in the visible part of the spectrum and this region is not displayed on the graph. Absorbance usually ranges from 0 (no absorption) to 2 (99% absorption), and is precisely defined in context with spectrometer operation.
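The link between absorbance and the percentage of light absorbed follows from the standard definition A = log10(I0/I), where I0 is the incident and I the transmitted intensity. This definition is assumed here rather than taken from the text, and the function names are hypothetical.

```python
import math


def absorbance_from_transmittance(transmittance):
    """A = -log10(T), where T = I / I0 is the fraction of light transmitted."""
    return -math.log10(transmittance)


def percent_absorbed(absorbance):
    """Percentage of the incident light that is absorbed at a given absorbance."""
    return (1.0 - 10.0 ** (-absorbance)) * 100.0


for a in (0.0, 1.0, 2.0):
    print(f"A = {a:.0f}: {percent_absorbed(a):.0f}% of the light absorbed")
```

Because the scale is logarithmic, each additional absorbance unit removes nine tenths of the light that remained: 90% is absorbed at A = 1 and 99% at A = 2.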

Because the absorbance of a sample will be proportional to the number of absorbing molecules in the spectrometer light beam (e.g. their molar concentration in the sample tube), it is necessary to correct the absorbance value for this and other operational factors if the spectra of different compounds are to be compared in a meaningful way. The corrected absorption value is called "molar absorptivity", and is particularly useful when comparing the spectra of different compounds and determining the relative strength of light absorbing functions (chromophores). Molar absorptivity (ε) is defined as:

Molar absorptivity, ε = A / (c · l)

(where A = absorbance, c = sample concentration in moles/liter, and l = length of light path through the sample in cm)

If the isoprene spectrum was obtained from a dilute hexane solution (c = 4 × 10⁻⁵ moles per liter) in a 1 cm sample cuvette, a simple calculation using the above formula indicates a molar absorptivity of 20,000 at the maximum absorption wavelength. Indeed the entire vertical absorbance scale may be changed to a molar absorptivity scale once this information about the sample is in hand.
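The isoprene calculation can be written out as a small sketch. The peak absorbance of 0.8 used here is not stated in the text; it is the value implied by ε = 20,000, c = 4 × 10⁻⁵ moles per liter and a 1 cm path, and is therefore an assumption of this example. The function name is illustrative.

```python
def molar_absorptivity(absorbance: float,
                       concentration_mol_per_l: float,
                       path_length_cm: float) -> float:
    """Molar absorptivity, epsilon = A / (c * l)."""
    return absorbance / (concentration_mol_per_l * path_length_cm)


# Assumed peak absorbance of 0.8 (back-calculated, not read from a spectrum).
epsilon = molar_absorptivity(absorbance=0.8,
                             concentration_mol_per_l=4e-5,
                             path_length_cm=1.0)
print(f"epsilon is about {epsilon:,.0f}")
```

Dividing every absorbance value on the vertical axis by the same product c · l is what converts the whole scale to molar absorptivity.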

Chromophore | Example | Excitation | λmax, nm | ε | Solvent
C=C | Ethene | π → π* | 171 | 15,000 | hexane
C≡C | 1-Hexyne | π → π* | 180 | 10,000 | hexane
C=O | Ethanal | n → π* | 290 | 15 | hexane
C=O | Ethanal | π → π* | 180 | 10,000 | hexane
N=O | Nitromethane | n → π* | 275 | 17 | ethanol
N=O | Nitromethane | π → π* | 200 | 5,000 | ethanol
C-X (X=Br) | Methyl bromide | n → σ* | 205 | |
C-X (X=I) | Methyl iodide | n → σ* | 255 | |

From the chart above it should be clear that the only molecular moieties likely to absorb light in the 200 to 800 nm region are pi-electron functions and hetero atoms having non-bonding valence-shell electron pairs. Such light absorbing groups are referred to as chromophores. A list of some simple chromophores and their light absorption characteristics is provided in the table above. The oxygen non-bonding electrons in alcohols and ethers do not give rise to absorption above 160 nm. Consequently, pure alcohol and ether solvents may be used for spectroscopic studies.

The presence of chromophores in a molecule is best documented by UV-Visible spectroscopy, but the failure of most instruments to provide absorption data for wavelengths below 200 nm makes the detection of isolated chromophores problematic. Fortunately, conjugation generally moves the absorption maxima to longer wavelengths, as in the case of isoprene, so conjugation becomes the major structural feature identified by this technique.

Molar absorptivities may be very large for strongly absorbing chromophores (>10,000) and very small if absorption is weak (10 to 100). The magnitude of ε reflects both the size of the chromophore and the probability that light of a given wavelength will be absorbed when it strikes the chromophore.


4. The Importance of Conjugation

A comparison of the absorption spectrum of 1-pentene, λmax = 178 nm, with that of isoprene (above) clearly demonstrates the importance of chromophore conjugation. Further evidence of this effect is shown below. The spectrum on the left illustrates that conjugation of double and triple bonds also shifts the absorption maximum to longer wavelengths. From the polyene spectra displayed in the center diagram, it is clear that each additional double bond in the conjugated pi-electron system shifts the absorption maximum about 30 nm in the same direction. Also, the molar absorptivity (ε) roughly doubles with each new conjugated double bond. Spectroscopists use the terms defined in the table below when describing shifts in absorption. Thus, extending conjugation generally results in bathochromic and hyperchromic shifts in absorption.

The appearance of several absorption peaks or shoulders for a given chromophore is common for highly conjugated systems, and is often solvent dependent. This fine structure reflects not only the different conformations such systems may assume, but also electronic transitions between the different vibrational energy levels possible for each electronic state. Vibrational fine structure of this kind is most pronounced in vapor phase spectra, and is increasingly broadened and obscured in solution as the solvent is changed from hexane to methanol.

Terminology for Absorption Shifts

Nature of Shift | Descriptive Term
To Longer Wavelength | Bathochromic
To Shorter Wavelength | Hypsochromic
To Greater Absorbance | Hyperchromic
To Lower Absorbance | Hypochromic

To understand why conjugation should cause bathochromic shifts in the absorption maxima of chromophores, we need to look at the relative energy levels of the pi-orbitals. When two double bonds are conjugated, the four p-atomic orbitals combine to generate four pi-molecular orbitals (two are bonding and two are antibonding). This was described earlier in the section concerning diene chemistry. In a similar manner, the three double bonds of a conjugated triene create six pi-molecular orbitals, half bonding and half antibonding. The energetically most favorable π → π* excitation occurs from the highest energy bonding pi-orbital (HOMO) to the lowest energy antibonding pi-orbital (LUMO). The following diagram illustrates this excitation for an isolated double bond (only two pi-orbitals), a conjugated diene and a conjugated triene. In each case the HOMO is colored blue and the LUMO is colored magenta. Increased conjugation brings the HOMO and LUMO orbitals closer together. The energy (ΔE) required to effect the electron promotion is therefore less, and the wavelength that provides this energy is increased correspondingly (remember λ = h·c/ΔE).



Examples of π → π* Excitation

Many other kinds of conjugated pi-electron systems act as chromophores and absorb light in the 200 to 800 nm region. These include unsaturated aldehydes and ketones and aromatic ring compounds. A few examples are displayed below. The spectrum of the unsaturated ketone (on the left) illustrates the advantage of a logarithmic display of molar absorptivity. The π → π* absorption located at 242 nm is very strong, with an ε = 18,000. The weak n → π* absorption near 300 nm has an ε = 100.

Benzene exhibits very strong light absorption near 180 nm (ε > 65,000), weaker absorption at 200 nm (ε = 8,000) and a group of much weaker bands at 254 nm (ε = 240). Only the last group of absorptions is completely displayed because of the 200 nm cut-off characteristic of most spectrophotometers. The added conjugation in naphthalene, anthracene and tetracene causes bathochromic shifts of these absorption bands, as displayed in the chart on the left below. All the absorptions do not shift by the same amount, so for anthracene (green shaded box) and tetracene (blue shaded box) the weak absorption is obscured by stronger bands that have experienced a greater red shift. As might be expected from their spectra, naphthalene and anthracene are colorless, but tetracene is orange.

The spectrum of the bicyclic diene (above right) shows some vibrational fine structure, but in general is similar in appearance to that of isoprene, shown above. Closer inspection discloses that the absorption maximum of the more highly substituted diene has moved to a longer wavelength by about 15 nm. This "substituent effect" is general for dienes and trienes, and is even more pronounced for enone chromophores.

Infrared Spectroscopy

1. Introduction

As noted in a previous chapter, the light our eyes see is but a small part of a broad spectrum of electromagnetic radiation. On the immediate high energy side of the visible spectrum lies the ultraviolet, and on the low energy side is the infrared. The portion of the infrared region most useful for analysis of organic compounds is not immediately adjacent to the visible spectrum, but is that having a wavelength range from 2,500 to 16,000 nm, with a corresponding frequency range from 1.9 × 10¹³ to 1.2 × 10¹⁴ Hz.


Photon energies associated with this part of the infrared (from 1 to 15 kcal/mole) are not large enough to excite electrons, but may induce vibrational excitation of covalently bonded atoms and groups. The covalent bonds in molecules are not rigid sticks or rods, such as found in molecular model kits, but are more like stiff springs that can be stretched and bent. The mobile nature of organic molecules was noted in the chapter concerning conformational isomers. We must now recognize that, in addition to the facile rotation of groups about single bonds, molecules experience a wide variety of vibrational motions, characteristic of their component atoms. Consequently, virtually all organic compounds will absorb infrared radiation that corresponds in energy to these vibrations. Infrared spectrometers, similar in principle to the UV-Visible spectrometer described elsewhere, permit chemists to obtain absorption spectra of compounds that are a unique reflection of their molecular structure. An example of such a spectrum is that of the flavoring agent vanillin, shown below.

The complexity of this spectrum is typical of most infrared spectra, and illustrates their use in identifying substances. The gap in the spectrum between 700 & 800 cm-1 is due to solvent (CCl4) absorption. Further analysis (below) will show that this spectrum also indicates the presence of an aldehyde function, a phenolic hydroxyl and a substituted benzene ring. The inverted display of absorption, compared with UV-Visible spectra, is characteristic. Thus a sample that did not absorb at all would record a horizontal line at 100% transmittance (top of the chart).

The frequency scale at the bottom of the chart is given in units of reciprocal centimeters (cm-1) rather than Hz, because the numbers are more manageable. The reciprocal centimeter is the number of wave cycles in one centimeter, whereas frequency in cycles per second or Hz is equal to the number of wave cycles in 3 × 10¹⁰ cm (the distance covered by light in one second). Wavelength units are in micrometers, or microns (μ), instead of nanometers for the same reason. Most infrared spectra are displayed on a linear frequency scale, as shown here, but in some older texts a linear wavelength scale is used.

Infrared spectra may be obtained from samples in all phases (liquid, solid and gaseous). Liquids are usually examined as a thin film sandwiched between two polished salt plates (note that glass absorbs infrared radiation, whereas NaCl is transparent). If solvents are used to dissolve solids, care must be taken to avoid obscuring important spectral regions by solvent absorption. Perchlorinated solvents such as carbon tetrachloride, chloroform and tetrachloroethene are commonly used. Alternatively, solids may either be incorporated in a thin KBr disk, prepared under high pressure, or mixed with a little non-volatile liquid and ground to a paste (or mull) that is smeared between salt plates.
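The interconversion between reciprocal centimeters, micrometers and Hz described above can be sketched in a few lines. The function names are illustrative, and the speed of light is taken as the rounded 3 × 10¹⁰ cm per second used in the text.

```python
LIGHT_SPEED_CM_PER_S = 3e10  # rounded value, as used in the text


def wavenumber_to_micrometers(wavenumber_per_cm: float) -> float:
    """Wavelength in micrometers: 1 cm = 10,000 micrometers."""
    return 1e4 / wavenumber_per_cm


def micrometers_to_wavenumber(wavelength_um: float) -> float:
    """Frequency in reciprocal centimeters from wavelength in micrometers."""
    return 1e4 / wavelength_um


def wavenumber_to_hz(wavenumber_per_cm: float) -> float:
    """Cycles per second: cycles per cm times the cm covered by light in one second."""
    return wavenumber_per_cm * LIGHT_SPEED_CM_PER_S


# 4000 cm-1 corresponds to 2.5 micrometers (2,500 nm) and about 1.2e14 Hz.
print(wavenumber_to_micrometers(4000.0), wavenumber_to_hz(4000.0))
# 16 micrometers (16,000 nm) corresponds to 625 cm-1.
print(micrometers_to_wavenumber(16.0))
```

The two conversions between cm-1 and micrometers are the same reciprocal relation, so applying either function twice returns the starting value.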

2. Vibrational Spectroscopy

A molecule composed of n atoms has 3n degrees of freedom, six of which are translations and rotations of the molecule itself. This leaves 3n-6 degrees of vibrational freedom (3n-5 if the molecule is linear). Vibrational modes are often given descriptive names, such as stretching, bending, scissoring, rocking and twisting. The four-atom molecule of formaldehyde, the gas phase spectrum of which is shown below, provides an example of these terms. We expect six fundamental vibrations (12 minus 6), and these have been assigned to the spectrum absorptions.
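The counting rule for fundamental vibrations can be expressed as a minimal sketch. The function name is illustrative; water and carbon dioxide are added here only as familiar non-linear and linear three-atom examples.

```python
def vibrational_modes(n_atoms: int, linear: bool = False) -> int:
    """Number of fundamental vibrations: 3n-6, or 3n-5 for a linear molecule."""
    if n_atoms < 2:
        raise ValueError("a molecule needs at least two atoms to vibrate")
    return 3 * n_atoms - (5 if linear else 6)


print(vibrational_modes(4))               # formaldehyde, H2C=O
print(vibrational_modes(3))               # water (non-linear)
print(vibrational_modes(3, linear=True))  # carbon dioxide (linear)
```

A linear molecule has one fewer rotation, which is why it keeps one more vibrational degree of freedom.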

Gas Phase Infrared Spectrum of Formaldehyde, H2C=O

Assigned fundamental vibrations: CH2 asymmetric stretch, CH2 symmetric stretch, C=O stretch, CH2 scissoring, CH2 rocking, CH2 wagging.

The exact frequency at which a given vibration occurs is determined by the strengths of the bonds involved and the mass of the component atoms. In practice, infrared spectra do not normally display separate absorption signals for each of the 3n-6 fundamental vibrational modes of a molecule. The number of observed absorptions may be increased by additive and subtractive interactions leading to combination tones and overtones of the fundamental vibrations, in much the same way that sound vibrations from a musical instrument interact. Furthermore, the number of observed absorptions may be decreased by molecular symmetry, spectrometer limitations, and spectroscopic selection rules. One selection rule that influences the intensity of infrared absorptions is that a change in dipole moment should occur for a vibration to absorb infrared energy. Absorption bands associated with C=O bond stretching are usually very strong because a large change in the dipole takes place in that mode.

Some General Trends:

i) Stretching frequencies are higher than corresponding bending frequencies. (It is easier to bend a bond than to stretch or compress it.)

ii) Bonds to hydrogen have higher stretching frequencies than those to heavier atoms.

iii) Triple bonds have higher stretching frequencies than corresponding double bonds, which in turn have higher frequencies than single bonds. (Except for bonds to hydrogen).

The general regions of the infrared spectrum in which various kinds of vibrational bands are observed are outlined in the following chart. Note that the blue colored sections above the dashed line refer to stretching vibrations, and the green colored band below the line encompasses bending vibrations. The complexity of infrared spectra in the 1450 to 600 cm-1 region makes it difficult to assign all the absorption bands, and because of the unique patterns found there, it is often called the fingerprint region. Absorption bands in the 4000 to 1450 cm-1 region are usually due to stretching vibrations of diatomic units, and this is sometimes called the group frequency region.


3. Group Frequencies

Detailed information about the infrared absorptions observed for various bonded atoms and groups is usually presented in tabular form. The following table provides a collection of such data for the most common functional groups, with stretching absorptions and bending absorptions listed separately. Since most organic compounds have C-H bonds, a useful rule is that absorption in the 2850 to 3000 cm-1 range is due to sp3 C-H stretching, whereas absorption above 3000 cm-1 is from sp2 C-H stretching, or from sp C-H stretching if it is near 3300 cm-1.

Typical Infrared Absorption Frequencies

Stretching Vibrations

Functional Class | Range (cm-1) | Intensity | Assignment
Alkanes | 2850-3000 | str | CH3, CH2 & CH (2 or 3 bands)
Alkenes | 3020-3100 | med | =C-H & =CH2 (usually sharp)
Alkenes | 1630-1680 | var | C=C (symmetry reduces intensity)
Alkenes | 1900-2000 | str | C=C asymmetric stretch
Alkynes | 3300 | str | C-H (usually sharp)
Alkynes | 2100-2250 | var | C≡C (symmetry reduces intensity)
Arenes | 3030 | var | C-H (may be several bands)
Arenes | 1600 & 1500 | med-wk | C=C (in ring) (2 bands) (3 if conjugated)
Alcohols & Phenols | 3580-3650 | var | O-H (free), usually sharp
Alcohols & Phenols | 3200-3550 | str | O-H (H-bonded), usually broad
Alcohols & Phenols | 970-1250 | str | C-O
Amines | 3400-3500 (dil. soln.) | wk | N-H (1°-amines), 2 bands
Amines | 3300-3400 (dil. soln.) | wk | N-H (2°-amines)
Amines | 1000-1250 | med | C-N
Aldehydes & Ketones | 2690-2840 (2 bands) | med | C-H (aldehyde C-H)
Aldehydes & Ketones | 1720-1740 | str | C=O (saturated aldehyde)
Aldehydes & Ketones | 1710-1720 | str | C=O (saturated ketone)
Aldehydes & Ketones | 1690 | str | aryl ketone
Aldehydes & Ketones | 1675 | str | α, β-unsaturation
Aldehydes & Ketones | 1745 | str | cyclopentanone
Aldehydes & Ketones | 1780 | str | cyclobutanone
Carboxylic Acids & Derivatives | 2500-3300 (acids), overlap C-H | str | O-H (very broad)
Carboxylic Acids & Derivatives | 1705-1720 (acids) | str | C=O (H-bonded)
Carboxylic Acids & Derivatives | 1210-1320 (acids) | med-str | O-C (sometimes 2 peaks)
Carboxylic Acids & Derivatives | 1785-1815 (acyl halides) | str | C=O
Carboxylic Acids & Derivatives | 1750 & 1820 (anhydrides) | str | C=O (2 bands)
Carboxylic Acids & Derivatives | 1040-1100 | str | O-C
Carboxylic Acids & Derivatives | 1735-1750 (esters) | str | C=O
Carboxylic Acids & Derivatives | 1000-1300 | str | O-C (2 bands)
Carboxylic Acids & Derivatives | 1630-1695 (amides) | str | C=O (amide I band)
Nitriles | 2240-2260 | med | C≡N (sharp)
Isocyanates, Isothiocyanates, Diimides, Azides & Ketenes | 2100-2270 | med | -N=C=O, -N=C=S, -N=C=N-, -N3, C=C=O

Bending Vibrations

Functional Class | Range (cm-1) | Intensity | Assignment
Alkanes | 1350-1470 | med | CH2 & CH3 deformation
Alkanes | 1370-1390 | med | CH3 deformation
Alkanes | 720-725 | wk | CH2 rocking
Alkenes | 880-995 | str | =C-H & =CH2 (out-of-plane bending)
Alkenes | 780-850 | med | =C-H & =CH2 (out-of-plane bending)
Alkenes | 675-730 | med | cis-RCH=CHR
Alkynes | 600-700 | str | C-H deformation
Arenes | 690-900 | str-med | C-H bending & ring puckering
Alcohols & Phenols | 1330-1430 | med | O-H bending (in-plane)
Alcohols & Phenols | 650-770 | var-wk | O-H bend (out-of-plane)
Amines | 1550-1650 | med-str | NH2 scissoring (1°-amines)
Amines | 660-900 | var | NH2 & N-H wagging (shifts on H-bonding)
Aldehydes & Ketones | 1350-1360 | str | α-CH3 bending
Aldehydes & Ketones | 1400-1450 | str | α-CH2 bending
Aldehydes & Ketones | 1100 | med | C-C-C bending
Carboxylic Acids & Derivatives | 1395-1440 | med | C-O-H bending
Carboxylic Acids & Derivatives | 1590-1650 | med | N-H (1°-amide) II band
Carboxylic Acids & Derivatives | 1500-1560 | med | N-H (2°-amide) II band
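The C-H stretching rule of thumb stated before the table can be sketched as a simple lookup. This is only an illustration: the text does not say how close to 3300 cm-1 counts as "near", so the ±50 cm-1 window is an assumption, and the 3100 cm-1 upper limit for sp2 C-H is borrowed from the alkene range in the table. The function name is hypothetical.

```python
def classify_ch_stretch(wavenumber_per_cm: float) -> str:
    """Rough assignment of a C-H stretching band from its position in cm-1."""
    if 2850 <= wavenumber_per_cm <= 3000:
        return "sp3 C-H"
    if 3000 < wavenumber_per_cm <= 3100:
        # Upper limit taken from the 3020-3100 alkene range (assumption).
        return "sp2 C-H"
    if abs(wavenumber_per_cm - 3300) <= 50:
        # "Near 3300" is not quantified in the text; +/- 50 is an assumption.
        return "sp C-H"
    return "not covered by this rule"


for band in (2950, 3050, 3300, 1715):
    print(band, "->", classify_ch_stretch(band))
```

A real assignment also weighs band shape and intensity, since O-H and N-H stretches can appear in the same region as the sp C-H band.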

To illustrate the usefulness of infrared absorption spectra, examples for five C4H8O isomers are presented below their corresponding structural formulas. Try to associate each spectrum (A - E) with one of the isomers in the row above it.

4. Other Functional Groups

Infrared absorption data for some functional groups not listed in the preceding table are given below. Most of the absorptions cited are associated with stretching vibrations. Standard abbreviations (str = strong, wk = weak, brd = broad & shp = sharp) are used to describe the absorption bands.

Functional Class | Characteristic Absorptions

Sulfur Functions
S-H thiols | 2550-2600 cm-1 (wk & shp)
S-OR esters | 700-900 (str)
S-S disulfide | 500-540 (wk)
C=S thiocarbonyl | 1050-1200 (str)
S=O sulfoxide | 1030-1060 (str)
S=O sulfone | 1325 ± 25 (as) & 1140 ± 20 (s) (both str)
S=O sulfonic acid | 1345 (str)
S=O sulfonyl chloride | 1365 ± 5 (as) & 1180 ± 10 (s) (both str)
S=O sulfate | 1350-1450 (str)

Phosphorus Functions
P-H phosphine | 2280-2440 cm-1 (med & shp); 950-1250 (wk) P-H bending
(O=)PO-H phosphonic acid | 2550-2700 (med)
P-OR esters | 900-1050 (str)
P=O phosphine oxide | 1100-1200 (str)
P=O phosphonate | 1230-1260 (str)
P=O phosphate | 1100-1200 (str)
P=O phosphoramide | 1200-1275 (str)

Silicon Functions
Si-H silane | 2100-2360 cm-1 (str)
Si-OR | 1000-1100 (str & brd)
Si-CH3 | 1250 ± 10 (str & shp)

Oxidized Nitrogen Functions
=NOH oxime, O-H (stretch) | 3550-3600 cm-1 (str)
=NOH oxime, C=N | 1665 ± 15
=NOH oxime, N-O | 945 ± 15
N-O amine oxide, aliphatic | 960 ± 20
N-O amine oxide, aromatic | 1250 ± 50
N=O nitroso | 1550 ± 50 (str)
N=O nitro | 1530 ± 20 (as) & 1350 ± 30 (s)

Nuclear Magnetic Resonance Spectroscopy

1. Background

Over the past fifty years nuclear magnetic resonance spectroscopy, commonly referred to as nmr, has become the preeminent technique for determining the structure of organic compounds. Of all the spectroscopic methods, it is the only one for which a complete analysis and interpretation of the entire spectrum is normally expected. Although larger amounts of sample are needed than for mass spectroscopy, nmr is non-destructive, and with modern instruments good data may be obtained from samples weighing less than a milligram. To be successful in using nmr as an analytical tool, it is necessary to understand the physical principles on which the methods are based.

The nuclei of many elemental isotopes have a characteristic spin (I). Some nuclei have integral spins (e.g. I = 1, 2, 3 ....), some have fractional spins (e.g. I = 1/2, 3/2, 5/2 ....), and a few have no spin, I = 0 (e.g. 12C, 16O, 32S, ....). Isotopes of particular interest and use to organic chemists are 1H, 13C, 19F and 31P, all of which have I = 1/2. Since the analysis of this spin state is fairly straightforward, our discussion of nmr will be limited to these and other I = 1/2 nuclei.

The following features lead to the nmr phenomenon:

1. A spinning charge generates a magnetic field. The resulting spin-magnet has a magnetic moment (μ) proportional to the spin.

2. In the presence of an external magnetic field (B0), two spin states exist, +1/2 and -1/2. The magnetic moment of the lower energy +1/2 state is aligned with the external field, but that of the higher energy -1/2 spin state is opposed to the external field. Note that the arrow representing the external field points North.

3. The difference in energy between the two spin states is dependent on the external magnetic field strength, and is always very small. The following diagram illustrates that the two spin states have the same energy when the external field is zero, but diverge as the field increases. At a field equal to Bx a formula for the energy difference is given (remember I = 1/2 and μ is the magnetic moment of the nucleus in the field).

Strong magnetic fields are necessary for nmr spectroscopy. The international unit for magnetic flux density is the tesla (T). The earth's magnetic field is not constant, but is approximately 10⁻⁴ T at ground level. Modern nmr spectrometers use powerful magnets having fields of 1 to 20 T. Even with these high fields, the energy difference between the two spin states is less than 0.1 cal/mole. To put this in perspective, recall that infrared transitions involve 1 to 10 kcal/mole and electronic transitions are nearly 100 times greater.

For nmr purposes, this small energy difference (ΔE) is usually given as a frequency in units of MHz (10⁶ Hz), ranging from 20 to 900 MHz, depending on the magnetic field strength and the specific nucleus being studied. Irradiation of a sample with radio frequency (rf) energy corresponding exactly to the spin state separation of a specific set of nuclei will cause excitation of those nuclei in the +1/2 state to the higher -1/2 spin state. Note that this electromagnetic radiation falls in the radio and television broadcast spectrum. Nmr spectroscopy is therefore the energetically mildest probe used to examine the structure of molecules. The nucleus of a hydrogen atom (the proton) has a magnetic moment μ = 2.7927, and has been studied more than any other nucleus.

4. For spin 1/2 nuclei the energy difference between the two spin states at a given magnetic field strength will be proportional to their magnetic moments. For the four common nuclei noted above, the magnetic moments are: 1H μ = 2.7927, 19F μ = 2.6273, 31P μ = 1.1305 & 13C μ = 0.7022. These moments are in nuclear magnetons, which are 5.05078 × 10⁻²⁷ J T⁻¹. The following diagram gives the approximate frequencies that correspond to the spin state energy separations for each of these nuclei in an external magnetic field of 2.35 T. The formula in the colored box shows the direct correlation of frequency (energy difference) with magnetic moment (h = Planck's constant = 6.626069 × 10⁻³⁴ J s).
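The frequencies mentioned in point 4 can be estimated from the quoted magnetic moments. The formula itself is not reproduced in the text, so the sketch below assumes the standard relation ν = μ·B0/(h·I) with I = 1/2; the function and variable names are illustrative, while the numerical constants are the ones quoted above.

```python
NUCLEAR_MAGNETON = 5.05078e-27  # J/T, as quoted in the text
PLANCK = 6.626069e-34           # J s, as quoted in the text

# Magnetic moments in nuclear magnetons, as quoted in the text.
MAGNETIC_MOMENTS = {"1H": 2.7927, "19F": 2.6273, "31P": 1.1305, "13C": 0.7022}


def resonance_frequency_mhz(moment_in_magnetons: float,
                            field_tesla: float,
                            spin: float = 0.5) -> float:
    """Spin-state separation as a frequency, assuming nu = mu * B0 / (h * I)."""
    delta_e_joules = moment_in_magnetons * NUCLEAR_MAGNETON * field_tesla / spin
    return delta_e_joules / PLANCK / 1e6


for nucleus, moment in MAGNETIC_MOMENTS.items():
    print(f"{nucleus}: {resonance_frequency_mhz(moment, 2.35):.1f} MHz at 2.35 T")
```

Under this assumption the proton comes out very close to 100 MHz at 2.35 T, and the other nuclei scale in direct proportion to their magnetic moments.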


2. Proton NMR Spectroscopy

This important and well-established application of nuclear magnetic resonance will serve to illustrate some of the novel aspects of this method. To begin with, the nmr spectrometer must be tuned to a specific nucleus, in this case the proton. The actual procedure for obtaining the spectrum varies, but the simplest is referred to as the continuous wave (CW) method. A typical CW-spectrometer is shown in the following diagram. A solution of the sample in a uniform 5 mm glass tube is oriented between the poles of a powerful magnet, and is spun to average any magnetic field variations, as well as tube imperfections. Radio frequency radiation of appropriate energy is broadcast into the sample from an antenna coil (colored red). A receiver coil surrounds the sample tube, and emission of absorbed rf energy is monitored by dedicated electronic devices and a computer. An nmr spectrum is acquired by varying or sweeping the magnetic field over a small range while observing the rf signal from the sample. An equally effective technique is to vary the frequency of the rf radiation while holding the external field constant.

As an example, consider a sample of water in a 2.3487 T external magnetic field, irradiated by 100 MHz radiation. If the magnetic field is smoothly increased to 2.3488 T, the hydrogen nuclei of the water molecules will at some point absorb rf energy and a resonance signal will appear.

Since protons all have the same magnetic moment, we might expect all hydrogen atoms to give resonance signals at the same field / frequency values. Fortunately for chemistry applications, this is not true. The diagram displays a number of representative proton signals over the same magnetic field range. It is not possible, of course, to examine isolated protons in the spectrometer described above; but from independent measurement and calculation it has been determined that a naked proton would resonate at a lower field strength than the nuclei of covalently bonded hydrogens. With the exception of water, chloroform and sulfuric acid, which are examined as liquids, all the other compounds are measured as gases.

Why should the proton nuclei in different compounds behave differently in the nmr experiment? The answer to this question lies with the electron(s) surrounding the proton in covalent compounds and ions. Since electrons are charged particles, they move in response to the external magnetic field (Bo) so as to generate a secondary field that opposes the much stronger applied field. This secondary field shields the nucleus from the applied field, so Bo must be increased in order to achieve resonance (absorption of rf energy). As illustrated in the drawing on the right, Bo must be increased to compensate for the induced shielding field. In the upper diagram, those compounds that give resonance signals at the higher field side of the diagram (CH4, HCl, HBr and HI) have proton nuclei that are more shielded than those on the lower field (left) side of the diagram.

The magnetic field range displayed in the above diagram is very small compared with the actual field strength (only about 0.0042%). It is customary to refer to small increments such as this in units of parts per million (ppm). The difference between 2.3487 T and 2.3488 T is therefore about 42 ppm. Instead of designating a range of nmr signals in terms of magnetic field differences (as above), it is more common to use a frequency scale, even though the spectrometer may operate by sweeping the magnetic field. Using this terminology, we would find that at 2.34 T the proton signals shown above extend over a 4,200 Hz range (for a 100 MHz rf frequency, 42 ppm is 4,200 Hz). Most organic compounds exhibit proton resonances that fall within a 12 ppm range (the shaded area), and it is therefore necessary to use very sensitive and precise spectrometers to resolve structurally distinct sets of hydrogen atoms within this narrow range. In this respect it might be noted that the detection of a part-per-million difference is equivalent to detecting a 1 millimeter difference in distances of 1 kilometer.
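The parts-per-million arithmetic above can be checked with a short sketch. The function names are illustrative. Note that the unrounded field difference works out slightly above the "about 42 ppm" quoted in the text.

```python
def field_difference_ppm(field_low_tesla: float, field_high_tesla: float) -> float:
    """Relative field increment expressed in parts per million."""
    return (field_high_tesla - field_low_tesla) / field_low_tesla * 1e6


def ppm_to_hz(ppm: float, spectrometer_mhz: float) -> float:
    """One ppm of a frequency given in MHz is that same number of Hz."""
    return ppm * spectrometer_mhz


# Field sweep from the water example: 2.3487 T to 2.3488 T.
print(f"{field_difference_ppm(2.3487, 2.3488):.1f} ppm")
# Frequency width of 42 ppm and of the 12 ppm organic range at 100 MHz.
print(ppm_to_hz(42.0, 100.0), ppm_to_hz(12.0, 100.0))
```

At 100 MHz the 12 ppm window that holds most organic proton signals is only 1,200 Hz wide, which is why high spectrometer precision is needed.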

Chemical Shift

Unlike infrared and uv-visible spectroscopy, where absorption peaks are uniquely located by a frequency or wavelength, the location of different nmr resonance signals is dependent on both the external magnetic field strength and the rf frequency. Since no two magnets will have exactly the same field, resonance frequencies will vary accordingly and an alternative method for characterizing and specifying the location of nmr signals is needed. This problem is illustrated by the eleven different compounds shown in the following diagram. Although the eleven resonance signals are distinct and well separated, an unambiguous numerical locator cannot be directly assigned to each.

One method of solving this problem is to report the location of an nmr signal in a spectrum relative to a reference signal from a standard compound added to the sample. Such a reference standard should be chemically unreactive, and easily removed from the sample after the measurement. Also, it should give a single sharp nmr signal that does not interfere with the resonances normally observed for organic compounds. Tetramethylsilane, (CH3)4Si, usually referred to as TMS, meets all these characteristics, and has become the reference compound of choice for proton and carbon nmr.

Since the separation (or dispersion) of nmr signals is magnetic field dependent, one additional step must be taken in order to provide an unambiguous location unit. This is illustrated in the diagram for the acetone, methylene chloride and benzene signals. To correct these frequency differences for their field dependence, we divide them by the spectrometer frequency (100 or 500 MHz in the example). The resulting number would be very small, since we are dividing Hz by MHz, so it is multiplied by a million, as shown by the formula in the blue shaded box. Note that νref is the resonant frequency of the reference signal and νsamp is the frequency of the sample signal. This operation gives a locator number called the Chemical Shift, having units of parts-per-million (ppm), and designated by the symbol δ.
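The chemical shift calculation described above can be sketched as follows. The formula box is not reproduced in the text, so the order of subtraction (sample minus reference) is an assumption, and the 220 Hz and 1,100 Hz offsets are illustrative numbers rather than measured data. The function name is hypothetical.

```python
def chemical_shift_ppm(nu_samp_hz: float,
                       nu_ref_hz: float,
                       spectrometer_mhz: float) -> float:
    """delta = (nu_samp - nu_ref) / spectrometer frequency, times one million."""
    spectrometer_hz = spectrometer_mhz * 1e6
    return (nu_samp_hz - nu_ref_hz) / spectrometer_hz * 1e6


# Offsets are measured from the reference signal, so nu_ref is taken as 0 Hz here.
# An illustrative signal 220 Hz from the reference on a 100 MHz instrument...
print(round(chemical_shift_ppm(220.0, 0.0, 100.0), 3))
# ...appears 1,100 Hz from the reference at 500 MHz, but has the same shift.
print(round(chemical_shift_ppm(1100.0, 0.0, 500.0), 3))
```

Because the frequency offset and the spectrometer frequency both scale with the magnetic field, their ratio is independent of the magnet used.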

The compounds referred to above share two common characteristics:

• The hydrogen atoms in a given molecule are all structurally equivalent, averaged for fast conformational equilibria.

• The compounds are all liquids, save for neopentane which boils at 9 °C and is a liquid in an ice bath.

The first feature assures that each compound gives a single sharp resonance signal. The second allows the pure (neat) substance to be poured into a sample tube and examined in an nmr spectrometer. In order to take the nmr spectrum of a solid, it is usually necessary to dissolve it in a suitable solvent. Early studies used carbon tetrachloride for this purpose, since it has no hydrogen that could introduce an interfering signal. Unfortunately, CCl4 is a poor solvent for many polar compounds and is also toxic. Deuterium labeled compounds, such as deuterium oxide (D2O), chloroform-d (DCCl3), benzene-d6 (C6D6), acetone-d6 (CD3COCD3) and DMSO-d6 (CD3SOCD3) are now widely used as nmr solvents. Since the deuterium isotope of hydrogen has a different magnetic moment and spin, it is invisible in a spectrometer tuned to protons.

From the previous discussion and examples we may deduce that one factor contributing to chemical shift differences in proton resonance is the inductive effect. If the electron density about a proton nucleus is relatively high, the induced field due to electron motions will be stronger than if the electron density is relatively low. The shielding effect in such high electron density cases will therefore be larger, and a higher external field (Bo) will be needed for the rf energy to excite the nuclear spin. Since silicon is less electronegative than carbon, the electron density about the methyl hydrogens in tetramethylsilane is expected to be greater than the electron density about the methyl hydrogens in neopentane (2,2-dimethylpropane), and the characteristic resonance signal from the silane derivative does indeed lie at a higher magnetic field. Such nuclei are said to be shielded. Elements that are more electronegative than carbon should exert an opposite effect (reduce the electron density); and, as the data in the following tables show, methyl groups bonded to such elements display lower field signals (they are deshielded). The deshielding effect of electron withdrawing groups is roughly proportional to their electronegativity, as shown by the left table. Furthermore, if more than one such group is present, the deshielding is additive (table on the right), and proton resonance is shifted even further downfield.


Proton Chemical Shifts of Methyl Derivatives

Compound   (CH3)4C    (CH3)3N   (CH3)2O   CH3F
δ          0.9        2.1       3.2       4.1

Compound   (CH3)4Si   (CH3)3P   (CH3)2S   CH3Cl
δ          0.0        0.9       2.1       3.0

Proton Chemical Shifts (ppm)

Cpd. / Sub.   X=Cl   X=Br   X=I   X=OR   X=SR
CH3X          3.0    2.7    2.1   3.1    2.1
CH2X2         5.3    5.0    3.9   4.4    3.7
CHX3          7.3    6.8    4.9   5.0    ---

The general distribution of proton chemical shifts associated with different functional groups is summarized in the following chart. Bear in mind that these ranges are approximate, and may not encompass all compounds of a given class. Note also that the ranges specified for OH and NH protons (colored orange) are wider than those for most CH protons. This is due to hydrogen bonding variations at different sample concentrations.

Proton Chemical Shift Ranges*

[Chart: proton chemical shift ranges, from the low field region to the high field region]

* For samples in CDCl3 solution. The δ scale is relative to TMS at δ = 0.

A calculator that predicts aliphatic proton chemical shifts, developed at Colby College, is available at http://www.colby.edu/chemistry/NMR/H1pred.html.

Signal Strength

The magnitude or intensity of nmr resonance signals is displayed along the vertical axis of a spectrum, and is proportional to the molar concentration of the sample. Thus, a small or dilute sample will give a weak signal, and doubling or tripling the sample concentration increases the signal strength proportionally. If we take the nmr spectrum of equal molar amounts of benzene and cyclohexane in carbon tetrachloride solution, the resonance signal from cyclohexane will be twice as intense as that from benzene because cyclohexane has twice as many hydrogens per molecule. This is an important relationship when samples incorporating two or more different sets of hydrogen atoms are examined, since it allows the ratio of hydrogen atoms in each distinct set to be determined. To this end it is necessary to measure the relative strength as well as the chemical shift of the resonance signals that comprise an nmr spectrum. Two common methods of displaying the integrated intensities associated with a spectrum are illustrated by the following examples. In the three spectra in the top row, a horizontal integrator trace (light green) rises as it crosses each signal by a distance proportional to the signal strength. Alternatively, an arbitrary number, selected by the instrument's computer to reflect the signal strength, is printed below each resonance peak, as shown in the three spectra in the lower row. From the relative intensities shown here, together with the previously noted chemical shift correlations, the reader should be able to assign the signals in these spectra to the set of hydrogens that generates each.

Hint: When evaluating relative signal strengths, it is useful to set the smallest integration to unity and convert the other values proportionally.
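The hint about setting the smallest integration to unity can be sketched as follows. The integrator readings used here are hypothetical numbers, not values taken from the spectra discussed in the text.

```python
def normalize_integrations(integrals):
    """Scale raw integration values so that the smallest one equals 1."""
    smallest = min(integrals)
    if smallest <= 0:
        raise ValueError("integration values must be positive")
    return [value / smallest for value in integrals]


# Hypothetical integrator readings for a spectrum with two signals.
raw = [22.5, 67.5]
print(normalize_integrations(raw))  # [1.0, 3.0]
```

A result of 1 : 3 would indicate, for example, one hydrogen in the first set for every three in the second.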

Hydroxyl Proton Exchange and the Influence of Hydrogen Bonding

The last two compounds in the lower row are alcohols. The OH proton signal is seen at 2.37 δ in 2-methyl-3-butyn-2-ol, and at 3.87 δ in 4-hydroxy-4-methyl-2-pentanone, illustrating the wide range over which this chemical shift may be found. A six-membered ring intramolecular hydrogen bond in the latter compound is in part responsible for its low field shift. We can take advantage of rapid OH exchange with the deuterium of heavy water to assign hydroxyl proton resonance signals. As shown in the following equation, this removes the hydroxyl proton from the sample and its resonance signal in the nmr spectrum disappears. Experimentally, one simply adds a drop of heavy water to a chloroform-d solution of the compound and runs the spectrum again.

R-O-H + D2O ⇌ R-O-D + D-O-H

Hydrogen bonding shifts the resonance signal of a proton to lower field (higher frequency). Numerous experimental observations support this statement, and a few of these will be described here.


i) The chemical shift of the hydroxyl hydrogen of an alcohol varies with concentration. Very dilute solutions of 2-methyl-2-propanol, (CH3)3COH, in carbon tetrachloride solution display a hydroxyl resonance signal having a relatively high-field chemical shift (< 1.0 δ). In concentrated solution this signal shifts to a lower field, usually near 2.5 δ.

ii) The more acidic hydroxyl group of phenol generates a lower-field resonance signal, which shows a similar concentration dependence to that of alcohols. OH resonance signals for different percent concentrations of phenol in chloroform-d are shown in the following diagram (C-H signals are not shown).

iii) Because of their favored hydrogen-bonded dimeric association, the hydroxyl proton of carboxylic acids displays a resonance signal significantly down-field of other functions. For a typical acid it appears from 10.0 to 13.0 δ and is often broader than other signals. The spectra shown below for chloroacetic acid (left) and 3,5-dimethylbenzoic acid (right) are examples.

iv) Intramolecular hydrogen bonds, especially those defining a six-membered ring, generally display a very low-field proton resonance. The case of 4-hydroxypent-3-ene-2-one (the enol tautomer of 2,4-pentanedione) not only illustrates this characteristic, but also provides an instructive example of the sensitivity of the nmr experiment to dynamic change. In the nmr spectrum of the pure liquid, sharp signals from both the keto and enol tautomers are seen, their mole ratio being 4 : 21 (keto tautomer signals are colored purple). Chemical shift assignments for these signals are shown in the shaded box above the spectrum. The chemical shift of the hydrogen-bonded hydroxyl proton is δ 14.5, exceptionally downfield. We conclude, therefore, that the rate at which these tautomers interconvert is slow compared with the inherent time scale of nmr spectroscopy.


Although hydroxyl protons have been the focus of this discussion, it should be noted that corresponding N-H groups in amines and amides also exhibit hydrogen bonding nmr shifts, although to a lesser degree. Furthermore, OH and NH groups can undergo rapid proton exchange with each other; so if two or more such groups are present in a molecule, the nmr spectrum will show a single signal at an average chemical shift. For example, 2-hydroxy-2-methylpropanoic acid, (CH3)2C(OH)CO2H, displays a strong methyl signal at δ 1.5 and a broader OH signal, one-third as strong, at δ 7.3. Note that the average of the expected carboxylic acid signal (ca. 12) and the alcohol signal (ca. 2) is 7. Rapid exchange of these hydrogens with heavy water, as noted above, would cause the low field signal to disappear.

π-Electron Functions

An examination of the proton chemical shift chart (above) makes it clear that the inductive effect of substituents cannot account for all the differences in proton signals. In particular the low field resonance of hydrogens bonded to double bond or aromatic ring carbons is puzzling, as is the very low field signal from aldehyde hydrogens. The hydrogen atom of a terminal alkyne, in contrast, appears at a relatively higher field. All these anomalous cases seem to involve hydrogens bonded to pi-electron systems, and an explanation may be found in the way these pi-electrons interact with the applied magnetic field.

Pi-electrons are more polarizable than are sigma-bond electrons, as addition reactions of electrophilic reagents to alkenes testify. Therefore, we should not be surprised to find that field induced pi-electron movement produces strong secondary fields that perturb nearby nuclei. The pi-electrons associated with a benzene ring provide a striking example of this phenomenon. The electron cloud above and below the plane of the ring circulates in reaction to the external field so as to generate an opposing field at the center of the ring and a supporting field at the edge of the ring. This kind of spatial variation is called anisotropy, and it is common to nonspherical distributions of electrons, as are found in all the functions mentioned above. Regions in which the induced field supports or adds to the external field are said to be deshielded, because a slightly weaker external field will bring about resonance for nuclei in such areas. However, regions in which the induced field opposes the external field are termed shielded because an increase in the applied field is needed for resonance. Shielded regions are designated by a plus sign, and deshielded regions by a negative sign.

Note that the anisotropy about the triple bond nicely accounts for the relatively high field chemical shift of ethynyl hydrogens. The shielding and deshielding regions about the carbonyl group have been described in two ways.


Sigma bonding electrons also have a less pronounced, but observable, anisotropic influence on nearby nuclei. This is seen in the small deshielding shift that occurs in the series CH3–R, R–CH2–R, R3CH; as well as the deshielding of equatorial versus axial protons on a fixed cyclohexane ring.

Solvent Effects

Chloroform-d (CDCl3) is the most common solvent for nmr measurements, thanks to its good solubilizing character and relatively unreactive nature (except toward 1° and 2° amines). As noted earlier, other deuterium labeled compounds, such as deuterium oxide (D2O), benzene-d6 (C6D6), acetone-d6 (CD3COCD3) and DMSO-d6 (CD3SOCD3) are also available for use as nmr solvents. Because some of these solvents have π-electron functions and/or may serve as hydrogen bonding partners, the chemical shifts of different groups of protons may change depending on the solvent being used. The following table gives a few examples, obtained with dilute solutions at 300 MHz.

Some Typical 1H Chemical Shifts (δ values) in Selected Solvents

                          Solvent
Compound         Signal   CDCl3       C6D6        CD3COCD3    CD3SOCD3    CD3C≡N      D2O
(CH3)3C–O–CH3    C–CH3    1.19        1.07        1.13        1.11        1.14        1.21
                 O–CH3    3.22        3.04        3.13        3.03        3.13        3.22
(CH3)3C–O–H      C–CH3    1.26        1.05        1.18        1.11        1.16        ---
                 O–H      1.65        1.55        3.10        4.19        2.18        ---
C6H5CH3          CH3      2.36        2.11        2.32        2.30        2.33        ---
                 C6H5     7.15-7.20   7.00-7.10   7.10-7.20   7.10-7.15   7.15-7.30   ---
(CH3)2C=O                 2.17        1.55        2.09        2.09        2.08        2.22

For most of the above resonance signals and solvents the changes are minor, being on the order of ±0.1 ppm. However, two cases result in more extreme changes and these have provided useful applications in structure determination. First, spectra taken in benzene-d6 generally show small upfield shifts of most C–H signals, but in the case of acetone this shift is about five times larger than normal. Further study has shown that carbonyl groups form weak π–π collision complexes with benzene rings, that persist long enough to exert a significant shielding influence on nearby groups. In the case of substituted cyclohexanones, axial α-methyl groups are shifted upfield by 0.2 to 0.3 ppm; whereas equatorial methyls are slightly deshielded (shift downfield by about 0.05 ppm). These changes are all relative to the corresponding chloroform spectra.

The second noteworthy change is seen in the spectrum of tert-butanol in DMSO, where the hydroxyl proton is shifted 2.5 ppm down-field from where it is found in dilute chloroform solution. This is due to strong hydrogen bonding of the alcohol O–H to the sulfoxide oxygen, which not only de-shields the hydroxyl proton, but secures it from very rapid exchange reactions that prevent the display of spin-spin splitting. Similar but weaker hydrogen bonds are formed to the carbonyl oxygen of acetone and the nitrogen of acetonitrile. A useful application of this phenomenon is described elsewhere in this text.

Spin-Spin Interactions

The nmr spectrum of 1,1-dichloroethane (below right) is more complicated than we might have expected from the previous examples. Unlike its 1,2-dichloro-isomer (below left), which displays a single resonance signal from the four structurally equivalent hydrogens, the two signals from the different hydrogens are split into close groupings of two or more resonances. This is a common feature in the spectra of compounds having different sets of hydrogen atoms bonded to adjacent carbon atoms. The signal splitting in proton spectra is usually small, ranging from fractions of a Hz to as much as 18 Hz, and is designated as J (referred to as the coupling constant). In the 1,1-dichloroethane example all the coupling constants are 6.0 Hz.


[Spectra: 1,2-dichloroethane (left) and 1,1-dichloroethane (right)]

The splitting patterns found in various spectra are easily recognized, provided the chemical shifts of the different sets of hydrogen that generate the signals differ by two or more ppm. The patterns are symmetrically distributed on both sides of the proton chemical shift, and the central lines are always stronger than the outer lines. The most commonly observed patterns have been given descriptive names, such as doublet (two equal intensity signals), triplet (three signals with an intensity ratio of 1:2:1) and quartet (a set of four signals with intensities of 1:3:3:1). Four such patterns are displayed in the following illustration. The line separation is always constant within a given multiplet, and is called the coupling constant (J). The magnitude of J, usually given in units of Hz, is magnetic field independent.

The splitting patterns shown above display the ideal or "First-Order" arrangement of lines. This is usually observed if the spin-coupled nuclei have very different chemical shifts (i.e. Δν is large compared to J). If the coupled nuclei have similar chemical shifts, the splitting patterns are distorted (second order behavior). In fact, signal splitting disappears if the chemical shifts are the same. Two examples that exhibit minor 2nd order distortion are shown below (both are taken at a frequency of 90 MHz). The ethyl acetate spectrum on the left displays the typical quartet and triplet of a substituted ethyl group. The spectrum of 1,3-dichloropropane on the right demonstrates that equivalent sets of hydrogens may combine their influence on a second, symmetrically located set. Even though the chemical shift difference between the A and B protons in the 1,3-dichloropropane spectrum is fairly large (140 Hz) compared with the coupling constant (6.2 Hz), some distortion of the splitting patterns is evident. The line intensities closest to the chemical shift of the coupled partner are enhanced. Thus the B set triplet lines closest to A are increased, and the A quintet lines nearest B are likewise stronger. A smaller distortion of this kind is visible for the A and C couplings in the ethyl acetate spectrum.
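The comparison of Δν with J can be made concrete with a short calculation. The sketch below uses the 1,3-dichloropropane figures quoted above (Δν = 140 Hz at 90 MHz, J = 6.2 Hz). The function names and the 300 MHz instrument are illustrative assumptions.

```python
def hz_to_ppm(delta_nu_hz, spectrometer_mhz):
    """Convert a frequency separation in Hz to a chemical shift difference in ppm."""
    return delta_nu_hz / spectrometer_mhz


def delta_nu_over_j(delta_nu_hz, j_hz):
    """Ratio of the chemical shift separation to the coupling constant."""
    return delta_nu_hz / j_hz


# Values quoted in the text for 1,3-dichloropropane at 90 MHz.
print(round(delta_nu_over_j(140.0, 6.2), 1))  # 22.6

# The same ppm separation on a hypothetical 300 MHz instrument:
# delta-nu scales with the field, while J stays the same.
separation_ppm = hz_to_ppm(140.0, 90.0)
print(round(delta_nu_over_j(separation_ppm * 300.0, 6.2), 1))  # 75.3
```

Because J is field independent while Δν grows with the field, the same compound appears closer to first order on a higher-field spectrometer.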

What causes this signal splitting, and what useful information can be obtained from it? If an atom under examination is perturbed or influenced by a nearby nuclear spin (or set of spins), the observed nucleus responds to such influences, and its response is manifested in its resonance signal. This spin-coupling is transmitted through the connecting bonds, and it functions in both directions. Thus, when the perturbing nucleus becomes the observed nucleus, it also exhibits signal splitting with the same J. For spin-coupling to be observed, the sets of interacting nuclei must be bonded in relatively close proximity (e.g. vicinal and geminal locations), or be oriented in certain optimal and rigid configurations. Some spectroscopists place a number before the symbol J to designate the number of bonds linking the coupled nuclei. Using this terminology, a vicinal coupling constant is 3J and a geminal constant is 2J.

The following general rules summarize important requirements and characteristics for spin 1/2 nuclei:

1) Nuclei having the same chemical shift (called isochronous) do not exhibit spin-splitting. They may actually be spin-coupled, but the splitting cannot be observed directly.

2) Nuclei separated by three or fewer bonds (e.g. vicinal and geminal nuclei) will usually be spin-coupled and will show mutual spin-splitting of the resonance signals (same J's), provided they have different chemical shifts. Longer-range coupling may be observed in molecules having rigid configurations of atoms.

3) The magnitude of the observed spin-splitting depends on many factors and is given by the coupling constant J (units of Hz). J is the same for both partners in a spin-splitting interaction and is independent of the external magnetic field strength.

4) The splitting pattern of a given nucleus (or set of equivalent nuclei) can be predicted by the n+1 rule, where n is the number of neighboring spin-coupled nuclei with the same (or very similar) Js. If there are 2 neighboring, spin-coupled, nuclei the observed signal is a triplet (2+1=3); if there are three spin-coupled neighbors the signal is a quartet (3+1=4). In all cases the central line(s) of the splitting pattern are stronger than those on the periphery. The intensity ratio of these lines is given by the numbers in Pascal's triangle. Thus a doublet has 1:1 or equal intensities, a triplet has an intensity ratio of 1:2:1, a quartet 1:3:3:1 etc.
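The n+1 rule and the Pascal's triangle intensities in rule 4 can be generated with binomial coefficients. This is a minimal sketch for first-order patterns only, and the function name is an illustrative choice.

```python
from math import comb


def first_order_multiplet(n_neighbors):
    """Relative line intensities for coupling to n equivalent spin-1/2 neighbors.

    The n+1 rule gives n+1 lines, and row n of Pascal's triangle gives
    their relative intensities.
    """
    return [comb(n_neighbors, k) for k in range(n_neighbors + 1)]


for n in range(1, 4):
    print(n, "neighbors:", first_order_multiplet(n))
# 1 neighbors: [1, 1]
# 2 neighbors: [1, 2, 1]
# 3 neighbors: [1, 3, 3, 1]
```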

If a given nucleus is spin-coupled to two or more sets of neighboring nuclei by different J values, the n+1 rule does not predict the entire splitting pattern. Instead, the splitting due to one J set is added to that expected from the other J sets. Bear in mind that there may be fortuitous coincidence of some lines if a smaller J is a factor of a larger J.
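Successive splitting by different J values can be sketched by applying the n+1 rule once per set of neighbors. The coupling constants below are hypothetical, and the code assumes ideal first-order behavior. The second example shows the fortuitous coincidence of lines that arises when a smaller J is a factor of a larger J.

```python
from collections import Counter
from math import comb


def split_lines(lines, j_hz, n_neighbors):
    """Split every line (offset in Hz -> intensity) by n equivalent spin-1/2 neighbors."""
    result = Counter()
    for offset, intensity in lines.items():
        for k in range(n_neighbors + 1):
            shifted = round(offset + (k - n_neighbors / 2) * j_hz, 6)
            result[shifted] += intensity * comb(n_neighbors, k)
    return dict(sorted(result.items()))


def multiplet(couplings):
    """Build a first-order pattern from (J in Hz, number of neighbors) pairs."""
    lines = {0.0: 1}
    for j_hz, n_neighbors in couplings:
        lines = split_lines(lines, j_hz, n_neighbors)
    return lines


# Doublet of doublets: four lines of equal intensity.
print(multiplet([(10.0, 1), (4.0, 1)]))
# {-7.0: 1, -3.0: 1, 3.0: 1, 7.0: 1}

# Doublet of triplets with J = 8 and 4 Hz: two of the six lines coincide.
print(multiplet([(8.0, 1), (4.0, 2)]))
# {-8.0: 1, -4.0: 2, 0.0: 2, 4.0: 2, 8.0: 1}
```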

Magnitude of Some Typical Coupling Constants


Spin 1/2 nuclei include 1H, 13C, 19F & 31P. The spin-coupling interactions described above may occur between similar or dissimilar nuclei. If, for example, a 19F is spin-coupled to a 1H, both nuclei will appear as doublets having the same J constant. Spin coupling with nuclei having spin other than 1/2 is more complex and will not be discussed here.

REVIEW OF LITERATURE:

In the case of rheumatoid arthritis, a study has revealed that nine genes are associated with the disease.

Of these nine genes, I have studied five: HLA-DR1, CD40, IL2, PADI4 and STAT4.[1]

Rheumatoid arthritis (RA) is an autoimmune disease that is strongly associated with the expression of several HLA-DR haplotypes, including DR1 (DRB1*0101). Although the antigen that initiates RA remains elusive, it has been shown that many patients have autoimmunity directed to type II collagen (CII). HLA-DRB1 belongs to the HLA class II beta chain paralogs. The class II molecule is a heterodimer consisting of an alpha (DRA) and a beta chain (DRB), both anchored in the membrane. It plays a central role in the immune system by presenting peptides derived from extracellular proteins.[2]

The protein encoded by the CD40 gene is a member of the TNF-receptor superfamily. This receptor has been found to be essential in mediating a broad variety of immune and inflammatory responses including T cell-dependent immunoglobulin class switching, memory B cell development, and germinal center formation. AT-hook transcription factor AKNA is reported to coordinately regulate the expression of this receptor and its ligand, which may be important for homotypic cell interactions. Adaptor protein TNFR2 interacts with this receptor and serves as a mediator of the signal transduction. Reported findings indicate that the interaction of platelet CD154 with CD40 on neighboring cells is temporally limited to prevent an uncontrolled inflammation at the site of thrombus formation. Thus, similar to the very tight regulation of the CD154-CD40 interaction in the immune system, an effective mechanism controls the inflammatory potential of platelet CD154 in the vascular system.[3]

The protein encoded by the IL2 gene is a secreted cytokine that is important for the proliferation of T and B lymphocytes. The receptor of this cytokine is a heterotrimeric protein complex whose gamma chain is also shared by interleukin 4 (IL4) and interleukin 7 (IL7). The expression of this gene in mature thymocytes is monoallelic, which represents an unusual regulatory mode for controlling the precise expression of a single gene. Patients with rheumatoid arthritis had raised levels of sIL-2R both in their sera and in their synovial fluid compared to patients with osteoarthritis and age-matched healthy controls. Mononuclear cells from the synovial fluid of rheumatoid arthritis patients were found to spontaneously produce high levels of sIL-2R, which eluted at approximately m.w. 40,000 on gel filtration.[4]

Peptidylarginine deiminases (PADIs) convert peptidylarginine into citrulline via post-translational modification. Anti-citrullinated peptide antibodies are highly specific for rheumatoid arthritis (RA). A genome-wide case-control study of single-nucleotide polymorphisms found that the PADI4 gene polymorphism is closely associated with RA. The results showed that PADI4 is mainly distributed in cells of various haematopoietic lineages and expressed at high levels in the inflamed RA synovium. The co-localization of PADI4, citrullinated protein and apoptotic cells in fibrin deposits suggests that PADI4 is responsible for fibrin citrullination and is involved in apoptosis. The immunoreactivity of citrullinated fibrin with IgA and IgM in the RA synovium supports the notion that citrullinated fibrin is a potential antigen of RA autoimmunity.[5]

The STAT4 gene encodes a transcription factor involved in the signaling pathways of several cytokines, including interleukin-12 (IL-12), the type I interferons, and IL-23. Recently, the association of a STAT4 haplotype marked by rs7574865 with rheumatoid arthritis (RA) and systemic lupus erythematosus was reported. The cited study investigated the role of this STAT4 tagging polymorphism in other immune-mediated diseases. Unlike several other risk genes for RA such as PTPN22, PADI4, and FCRL3, a haplotype of the STAT4 gene shows consistent association with RA susceptibility across Whites and Asians, suggesting that this risk haplotype predates the divergence of the major racial groups.[6]

METHODOLOGY:

1. Genome Analysis:

1. Information about the genes was studied with the help of published articles.

2. The nucleotide sequences of the genes were retrieved from NCBI.

3. A set of tools was listed for analysing the sequences and extracting information about the genes.

4. Commercial software such as CLC WORKBENCH and GENEIOUS PRO, along with certain other software, was used for better analysis.

5. The outputs of the various tools were assembled, analysed, and their significance reported.

6. Finally, the desired results were obtained.

7. The results were analysed for further development in the study of these genes as well as for the development of drugs.
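As an illustration of handling the sequences retrieved in step 2, a FASTA file can be read with a few lines of plain Python. This is only a sketch: the record name and sequence are hypothetical placeholders, not data from this project, and GC content is shown merely as an example of a simple downstream calculation.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping header -> sequence."""
    records = {}
    header = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = []
        elif header is not None:
            records[header].append(line.upper())
    return {name: "".join(chunks) for name, chunks in records.items()}


def gc_fraction(sequence):
    """Fraction of G and C bases in a nucleotide sequence."""
    if not sequence:
        return 0.0
    return sum(base in "GC" for base in sequence) / len(sequence)


# Hypothetical FASTA record used only to exercise the functions.
example = """>hypothetical_record placeholder sequence
ATGGCCTAA
GGC
"""
for name, seq in parse_fasta(example).items():
    print(name, len(seq), round(gc_fraction(seq), 3))
```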

2. Drug Designing:

1. The target gene CD-40 is selected and its structure is downloaded from the PDB via PDB id 3QD6.

2. Herbal plants such as Brahmi, Nirgundi, etc. are selected and searched for their chemical constituents.

3. The chemical structures of these plant extracts are searched via PubChem and downloaded in sdf format.

4. A total of 45 chemicals are found and their format is converted by Open Babel.

5. Docking of these ligands with the target is done by iGEMDOCK.

6. The results are analysed and an interaction table is obtained.
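The format conversion in step 4 was carried out with the Open Babel GUI. For a batch of ligands, the same conversion could be scripted around the `obabel` command-line program, as sketched below. The target format (mol2), the file name and the helper function names are assumptions made for illustration, since the report does not state which output format was used.

```python
import shutil
import subprocess
from pathlib import Path


def build_obabel_command(sdf_path, out_format="mol2"):
    """Build the obabel argument list for converting one SDF file.

    obabel infers the input and output formats from the file extensions.
    """
    source = Path(sdf_path)
    target = source.with_suffix("." + out_format)
    return ["obabel", str(source), "-O", str(target)]


def convert(sdf_path, out_format="mol2"):
    """Run the conversion; requires the obabel executable on PATH."""
    command = build_obabel_command(sdf_path, out_format)
    if shutil.which(command[0]) is None:
        raise RuntimeError("obabel executable not found on PATH")
    subprocess.run(command, check=True)
    return command[-1]


# Hypothetical ligand file name; only the command is printed here.
print(build_obabel_command("ligand_01.sdf"))
```

Building the command separately from running it makes it possible to inspect or log the conversions before any file is written.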

3. Quality Control:

1. The sample tablets are analysed for colour and shape.

2. The average weight of the tablets is calculated.

3. Uniformity of weight is calculated.

4. The tablets are analysed with the disintegration machine.

5. The tablets are analysed for hardness.

6. The tablets are analysed with the dissolution machine.

7. The tablets are analysed by microbiological test.

8. An assay is applied for computing the concentration of contents.

9. The tablets are checked for related impurities.
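Steps 2 and 3 above amount to a short calculation, sketched here. The tablet weights are hypothetical. The acceptance rule (at most two tablets outside the percentage limit and none outside twice the limit) is an assumption modelled on common pharmacopoeial practice, and the limit itself must be taken from the applicable monograph.

```python
from statistics import mean


def weight_uniformity(weights_mg, limit_percent):
    """Average weight and percentage deviation check for a set of tablets."""
    average = mean(weights_mg)
    deviations = [abs(w - average) / average * 100 for w in weights_mg]
    outside_limit = sum(d > limit_percent for d in deviations)
    outside_double = sum(d > 2 * limit_percent for d in deviations)
    return {
        "average_mg": average,
        "max_deviation_percent": max(deviations),
        "outside_limit": outside_limit,
        "outside_double_limit": outside_double,
        # Assumed acceptance rule, see the note above.
        "passes": outside_limit <= 2 and outside_double == 0,
    }


# Hypothetical weights (mg) of 20 tablets and an assumed 5 % limit.
weights = [500] * 18 + [510, 490]
print(weight_uniformity(weights, limit_percent=5.0))
```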

TOOLS AND SOFTWARE:

Several tools and software packages were used for the genome analysis of these genes. A short description of these tools and of the commercial software is given below:

NCBI:

The National Center for Biotechnology Information (NCBI) is part of the United States National Library of Medicine (NLM), a branch of the National Institutes of Health. The NCBI is located in Bethesda, Maryland and was founded in 1988 through legislation sponsored by Senator Claude Pepper. The NCBI houses genome sequencing data in GenBank and an index of biomedical research articles in PubMed Central and PubMed, as well as other information relevant to biotechnology.



PUBMED:

PubMed is a free database accessing primarily the MEDLINE database of references and abstracts on life sciences and biomedical topics. The United States National Library of Medicine (NLM) at the National Institutes of Health maintains the database as part of the Entrez information retrieval system. PubMed was first released in January 1996.


EXPASY:

ExPASy is a bioinformatics resource portal operated by the Swiss Institute of Bioinformatics (SIB) and in particular the SIB Web Team. It is an extensible and integrative portal accessing many scientific resources, databases and software tools in different areas of life sciences. Scientists can seamlessly access a wide range of resources in many different domains, such as proteomics, genomics, phylogeny/evolution, systems biology, population genetics, transcriptomics, etc. The individual resources (databases, web-based and downloadable software tools) are hosted in a decentralised way by different groups of the SIB and partner institutions, while a single web portal provides a common entry point to all of them.


EXPASY TRANSLATE TOOL :

The Translate tool is part of the ExPASy genome tool kit. It translates nucleotide sequences into protein sequences.
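What such a translation involves can be sketched in Python using the standard genetic code. This is an illustrative reimplementation, not the ExPASy code. It writes stop codons as "*" and uses one-letter codes, whereas the Translate tool output shown later writes Met and Stop in full. The input sequence is a hypothetical example.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate a DNA sequence codon by codon; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def six_frames(dna):
    """Translate all three forward and all three reverse reading frames."""
    frames = {}
    for strand, seq in (("5'3'", dna), ("3'5'", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand} Frame {offset + 1}"] = translate(seq[offset:])
    return frames


# Hypothetical nine-base example sequence.
for label, protein in six_frames("ATGGCCTAA").items():
    print(label, protein)
```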

EBI TOOLS:

The European Bioinformatics Institute (EBI) is a centre for research and services in bioinformatics, and is part of the European Molecular Biology Laboratory (EMBL). It is located on the Wellcome Trust Genome Campus in Hinxton, Great Britain.


ATLAS :

Atlas is an EBI functional genomics tool, which is used to examine and explore data from gene expression experiments.

CLUSTAL-W:

CLUSTAL-W is a multiple sequence alignment program. It builds the alignment progressively, following a guide tree computed from pairwise comparisons of the sequences.

CLC MAIN WORKBENCH :

GENEIOUS PRO :

BOXSHADE :

BOXSHADE is a program for pretty-printing multiple alignment output. The program itself does not do any alignment; you have to use a multiple alignment program like ClustalW or Pileup and use the output of these programs as input for BOXSHADE. Of course, you can also use manually aligned sequences.

iGEMDOCK:

GEMDOCK (Generic Evolutionary Method for molecular DOCKing) is a molecular docking tool: a program for computing a ligand conformation and orientation relative to the active site of a target protein. The tool was developed by Jinn-Moon Yang, a professor of the Institute of Bioinformatics, National Chiao Tung University. GEMDOCK has been evaluated on several terms:

Molecular docking

o Two testing sets, 100 and 305 protein-ligand complexes from the Protein Data Bank (PDB)

o phospholipase A2/vitamin E (1fe7-vit), tubulin/taxol (1jff-TA1)

o Practical applications on dengue virus E protein, sulfotransferase, hydantoinase, and amine oxidase

Virtual screening for three different target proteins

o Thymidine kinase (TK)

o Estrogen receptor alpha (ER alpha)

o Dihydrofolate reductase (DHFR)

The accuracy of molecular docking and the screening utility were better than other docking methods. These results have been published.

The main features of GEMDOCK were

Scoring function : an empirical scoring function having fewer local minima to replace the relatively complicated AMBER-based energy function.

Evolutionary algorithm : a differential evolution operator to reduce the disadvantages of Gaussian and Cauchy mutations, and a new rotamer-based mutation operator to reduce the search space of ligand structure conformations.

GEMDOCK may be run as either a purely flexible or hybrid docking approach. GEMDOCK is an automatic system that generates all related docking variables, such as atom formal charge, atom type, and the ligand binding site of a protein.

http://gemdock.life.nctu.edu.tw/dock/method_ea.php

http://gemdock.life.nctu.edu.tw/dock/method_sf.php

http://gemdock.life.nctu.edu.tw/dock/result_er.php

http://gemdock.life.nctu.edu.tw/dock/result_tk.php

http://gemdock.life.nctu.edu.tw/dock/result_ecao.php

http://gemdock.life.nctu.edu.tw/dock/result_ht.php

http://gemdock.life.nctu.edu.tw/dock/result_st.php

http://www.plosone.org/article/info:doi%2F10.1371%2Fjournal.pone.0000428

http://gemdock.life.nctu.edu.tw/dock/result_set100.php

http://www.life.nctu.edu.tw/

http://bioxgem.life.nctu.edu.tw/

http://gemdock.life.nctu.edu.tw/dock/


OPEN BABEL GUI:

Open Babel is a full chemical software toolbox. In addition to converting file formats, it offers a complete programming library for developing chemistry software. The library is written primarily in C++ and also offers interfaces to other languages (e.g., Perl and Python) using essentially the same API.

The heart of Open Babel lies in the OBMol, OBAtom, and OBBond classes, which handle operations on molecules, atoms, and bonds respectively. Newcomers should start by looking at the OBMol class, which is designed to store the basic information in a molecule and to perceive information about a molecule.

http://openbabel.org/api/2.2.0/classOpenBabel_1_1OBMol.shtml

http://openbabel.org/api/2.2.0/classOpenBabel_1_1OBBond.shtml

http://openbabel.org/api/2.2.0/classOpenBabel_1_1OBAtom.shtml

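The division of labour between a molecule object and its atoms and bonds can be pictured with a small pure-Python analogue. This sketch does not use Open Babel and does not reproduce its API: the classes `Atom`, `Bond` and `Molecule`, their methods, and the short mass table are hypothetical stand-ins that only mirror the roles OBAtom, OBBond and OBMol play.

```python
from dataclasses import dataclass, field

# Approximate average atomic masses for a few elements (illustrative subset)
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


@dataclass
class Atom:
    """Toy counterpart of an atom record: just the element symbol."""
    element: str


@dataclass
class Bond:
    """Toy counterpart of a bond record: two atom indices and a bond order."""
    begin: int
    end: int
    order: int = 1


@dataclass
class Molecule:
    """Toy container that stores atoms and bonds and derives simple properties."""
    atoms: list = field(default_factory=list)
    bonds: list = field(default_factory=list)

    def add_atom(self, element):
        self.atoms.append(Atom(element))
        return len(self.atoms) - 1  # index of the new atom

    def add_bond(self, begin, end, order=1):
        self.bonds.append(Bond(begin, end, order))

    def num_atoms(self):
        return len(self.atoms)

    def mol_weight(self):
        return sum(ATOMIC_MASS[a.element] for a in self.atoms)

    def neighbors(self, index):
        """Indices of atoms bonded to the atom at `index`."""
        found = []
        for bond in self.bonds:
            if bond.begin == index:
                found.append(bond.end)
            elif bond.end == index:
                found.append(bond.begin)
        return found


# Build water: one oxygen bonded to two hydrogens
water = Molecule()
o = water.add_atom("O")
h1 = water.add_atom("H")
h2 = water.add_atom("H")
water.add_bond(o, h1)
water.add_bond(o, h2)
print(water.num_atoms(), round(water.mol_weight(), 3), water.neighbors(o))
```

The real library adds file-format conversion and property perception on top of this kind of container; the linked API pages describe the actual class interfaces.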

RESULTS AND OBSERVATIONS:

RESULT OF NCBI:

GENBANK FORMAT: HLA-DRB1 GENE

FASTA FORMAT: HLA-DRB1 GENE

GENBANK FORMAT: CD40 GENE

FASTA FORMAT: CD40 GENE

GENBANK FORMAT: IL2 GENE

FASTA FORMAT: IL2 GENE

GENBANK FORMAT: PADI4 GENE

FASTA FORMAT: PADI4 GENE

GENBANK FORMAT: STAT4 GENE

RESULT OF TRANSLATE TOOL:

GENE HLA-DRB1:

5'3' Frame 1

L E Y S T S E C H F F N G T E R V R F L D R Y F Y N Q E E Y V R F D S D V G E F R A V T E L G R P D E E Y W N S Q K D L L E Q K R G R V D N Y C R H N Y G V V E S F T V Q

5'3' Frame 2

W S T L R L S V I S S Met G R S G C G S W T D T S I T K R S T C A S T A T W G S S G R Stop R S W G G L Met R S T G T A R R T S W S R S G A G W T T T A D T T T G L W R A S Q C

5'3' Frame 3

G V L Y V Stop V S F L Q W D G A G A V P G Q I L L Stop P R G V R A L R Q R R G G V P G G D G A G A A Stop Stop G V L E Q P E G P P G A E A G P G G Q L L Q T Q L R G C G E L H S A

3'5' Frame 1

L H C E A L H N P V V V S A V V V H P A P L L L Q E V L L A V P V L L I R P P Q L R H R P E L P H V A V E A H V L L L V I E V S V Q E P H P L R P I E E Met T L R R R V L Q

3'5' Frame 2

C T V K L S T T P Stop L C L Q Stop L S T R P R F C S R R S F W L F Q Y S S S G R P S S V T A R N S P T S L S K R T Y S S W L Stop K Y L S R N R T R S V P L K K Stop H S D V E Y S

3'5' Frame 3

A L Stop S S P Q P R S C V C S S C P P G P A S A P G G P S G C S S T P H Q A A P A P S P P G T P P R R C R S A R T P L G Y R S I C P G T A P A P S H Stop R N D T Q T Stop S T P
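The six reading frames reported by the translate tool can be reproduced in principle with a short Python sketch. It assumes the standard genetic code and is not the ExPASy implementation; the function names and the short demo sequence are hypothetical. Three frames are read from the given strand (5'3') and three from its reverse complement (3'5'), and `expasy_style` renders the result with `Met` and `Stop` in the same way as the listings in this section.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna, offset=0):
    """Translate complete codons starting at `offset`; unknown codons become X."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(dna):
    """Return a dict mapping frame labels to one-letter protein strings."""
    frames = {}
    for offset in range(3):
        frames[f"5'3' Frame {offset + 1}"] = translate(dna, offset)
    rc = reverse_complement(dna)
    for offset in range(3):
        frames[f"3'5' Frame {offset + 1}"] = translate(rc, offset)
    return frames


def expasy_style(protein):
    """Render a protein string with 'Met' and 'Stop' spelled out, space separated."""
    names = {"M": "Met", "*": "Stop"}
    return " ".join(names.get(aa, aa) for aa in protein)


# Made-up nine-base demo sequence, not one of the genes analysed here
for label, protein in six_frames("ATGGCCTAA").items():
    print(label, expasy_style(protein))
```

Frames that contain long stretches without `Stop` are the usual candidates for the true coding frame, which is how output of this kind is normally read.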

GENE CD-40:5'3' Frame 1A K A G A G E S A E A S L G R P V V L P P G L T S L W F V C L C S A S S G A A C Stop P L S I Q N H P L H A E K N S T Stop Stop T V S A V L C A S Q D R N W Stop V T A Q S S L K R N A F L A V K A N S Stop T P G T E R H T A T S T N T A T P T Stop G F G S S R R A P Q K Q T P S A P V K K A G T V R V R P V R A V S C T A H A R P A L G S S R L L Q G F L I P S A S P A Q S A S S P Met C H L L S K N V T L G Q A V R P K T W L C N R Q A Q T R L Met L S V V P R I G Stop E P W W Stop S P S S S G S C L P S S W C W S L S K R W P R S Q P I R P P T P S R N P R R S I F P T I F L A P T L L L Q C R R L Y Met D A N R S P R R Met A K R V A S Q C R R D S E A A P T Q E C G H V G K Q A V G Q R A W C C C C C G V R V R G W H Stop L G I A P R F C L H P C S L R Q E T W H W Met Q K Q F T L K N L S L H P G A H P V S Q L V L K T E A E V W W W W C W G Met V Stop Stop Y P P D L P I Q Q F G A Q R G I Met V A S L R P G S H I H R C P L Q H C L Stop Stop Stop T T G S C L T V H Q Q E T G Stop I K L E Y I Y T T E S Q K H C Stop V R K K R H A A E Stop W V W N F L K K Y Met L L C Met Y I A Y G Y Met Y K Y N Met H H I L I Stop Q G F W K G T Q K T H S S K S G D V W G G E E G S G 5'3' Frame 2P R L G Q G S Q Q R P R S G A Q W S C R L V S P R Y G S S A S A V R P L G L L A D R C P S R T T H C Met Q R K T V P N K Q S V L F F V P A R T E T G E Stop L H R V H Stop N G Met P S L R Stop K R I P R H L E Q R D T L P P A Q I L R P Q P R A S G P A E G H L R N R H H L H L Stop R R L A L Y E Stop G L Stop E L C P A P L Met L A R L W G Q A D C Y R G F Stop Y H L R A L P S R L L L Q C V I C F R K Met S P L D K L Stop D Q R P G C A T G R H K Q D Stop C C L W S P G S A E S P G G D P H H L R D P V C H P L G A G L Y Q K G G Q E A N Q Stop G P P P Q A G T P G D Q F S R R S S W L Q H C C S S A G D F T W Met P T G H P G G W Q R E S H L S A G E T V R L H P P R S V A T W A N R Q L A R E P G A A A A V A Stop G Stop G A G T D W A Stop L P A S A C T P A V Stop D R R P G T G C R N S S P Stop R T S H F T L E P I Q S P 
N L Y Stop R Q R Q K F G G G G V G V W F S N I H Q T F R S S S L V P R E A S W W L P C A Q E A I Y T D A H C S I V C D S E Q L E A A Stop L S I S R R L A K Stop N Stop N I F I Q Q N L K N T V E Stop G K K G Met L L N D G Y G T F Stop K S T C F Y V C I L P Met D I C I N T I C I I Y Stop Y N K G S G R V H R K P T A R R V V T S G V G K K G L G 5'3' Frame 3Q G W G R G V S R G L A R A P S G P A A W S H L A Met V R L P L Q C V L W G C L L T A V H P E P P T A C R E K Q Y L I N S Q C C S L C Q P G Q K L V S D C T E F T E T E C L P C G E S E F L D T W N R E T H C H Q H K Y C D P N L G L R V Q Q K G T S E T D T I C T C E E G W H C T S E A C E S C V L H R S C S P G F G V K Q I A T G V S D T I C E P C P V G F F S N V S S A F E K C H P W T S C E T K D L V V Q Q A G T N K T D V V C G P Q D R L R A L V V I P I I F G I L F A I L L V L V F I K K V A K K P T N K A P H P K Q E P Q E I N F P D D L P G S N T A A P V Q E T L H G C Q P V T Q E D G K E S R I S V Q E R Q Stop G C T H P G V W P R G Q T G S W P E S L V L L L L W R E G E G L A L T G H S S P L L P A P L Q F E T G D L A L D A E T V H L E E P L T S P W S P S S L P T C I K D R G R S L V V V V L G Y G L V I S T R P S D P A V W C P E R H H G G F P A P R K P Y T Q Met P I A A L F V I V N N W K L L N C P S A G D W L N K I R I Y L Y N R I S K T L L S K E K K A C C Stop Met Met G Met E L F K K V H A F Met Y V Y C L W I Y V Stop I Q Y A S Y I D I T R V L E G Y T E N P Q L E E W Stop R L G W G R R V W G 3'5' Frame 1P P D P S S P P Q T S P L F E L W V F C V P F Q N P C Y I N I Stop C I L Y L Y I Y P Stop A I Y I H K S Met Y F F K K F H T H H S A A C L F F L T Q Q C F Stop D S V V Stop I Y S N F I Stop P V S C Stop W T V K Q L P V V H Y H K Q C C N G H L C I W L P G R R E A T Met Met P L W A P N C W I G R S G G Y Y Stop T I P Q H H H H Q T S A S V F N T S W E T G W A P G Stop S E R F F K V N C F C I Q C Q V S C L K L Q G C R Q K R G A Met P S Q C Q P L T L T P Q Q Q Q H Q A L W P T A C L P T W P H S W V G A A S L S L L H Stop D A T L F A I 
L L G D R L A S Met Stop S L L H W S S S V G A R K I V G K I D L L G F L L G V G G L I G W L L G H L F D K D Q H Q E D G K Q D P E D D G D H H Q G S Q P I L G T T D N I S L V C A C L L H N Q V F G L T A C P R V T F F E S R Stop H I G E E A D W A G L A D G I R N P C S N L L D P K A G R A Stop A V Q D T A L T G L T R T V P A F F T G A D G V C F Stop G A L L L D P K P Stop V G V A V F V L V A V C L S V P G V Stop E F A F T A R K A F R F S E L C A V T H Q F L S W L A Q R T A L T V Y Stop V L F F S A C S G W F W Met D S G Q Q A A P E D A L Q R Q T N H S E V R P G G R T T G R P S E A S A D S P A P A L 3'5' Frame 2P Q T L L P H P R R H H S S S C G F S V Y P S R T L V I S I Y D A Y C I Y T Y I H R Q Y T Y I K A C T F L K S S I P I I Q Q H A F F S L L N S V F E I L L Y K Y I L I L F S Q S P A D G Q L S S F Q L F T I T N N A A Met G I C V Y G F L G A G K P P Stop C L S G H Q T A G S E G L V D I T K P Y P N T T T T K L L P L S L I Q V G R L D G L Q G E V R G S S R Stop T V S A S S A R S P V S N C R G A G R S G E L C P V S A S P S P S R H S S S S T R L S G Q L P V C P R G H T P G W V Q P H C L S C T E Met R L S L P S S W V T G W H P C K V S C T G A A V L E P G R S S G K L I S W G S C L G W G A L L V G F L A T F L I K T S T K R Met A N R I P K Met Met G I T T R A L S R S W G P Q T T S V L F V P A C C T T R S L V S Q L V Q G Stop H F S K A D D T L E K K P T G Q G S Q Met V S E T P V A I C L T P K P G E H E R C R T Q L S Q A S L V Q C Q P S S Q V Q Met V S V S E V P F C W T R S P R L G S Q Y







L C W W Q C V S L F Q V S R N S L S P Q G R H S V S V N S V Q S L T S F C P G W H K E Q H Stop L F I R Y C F S L H A V G G S G W T A V S K Q P Q R T H C R G R R T I A R Stop D Q A A G P L G A R A R P L L T P L P Q P W 3'5' Frame 3P R P F F P T P D V T T L R A V G F L C T L P E P L L Y Q Y Met Met H I V F I H I S I G N I H T Stop K H V L F Stop K V P Y P S F S S Met P F F P Y S T V F L R F C C I N I F Stop F Y L A S L L L Met D S Stop A A S S C S L S Q T Met L Q W A S V Y Met A S W A Q G S H H D A S L G T K L L D R K V W W I L L N H T P T P P P P N F C L C L Stop Y K L G D W Met G S R V K Stop E V L Q G E L F L H P V P G L L S Q T A G V Q A E A G S Y A Q S V P A P H P H A T A A A A P G S L A N C L F A H V A T L L G G C S L T V S P A L R C D S L C H P P G Stop P V G I H V K S P A L E Q Q C W S Q E D R R E N Stop S P G V P A W G G G P Y W L A S W P P F Stop Stop R P A P R G W Q T G S R R Stop W G S P P G L S A D P G D H R Q H Q S C L C L P V A Q P G L W S H S L S K G D I F R K Q Met T H W R R S R L G R A R R W Y Q K P L Stop Q S A Stop P Q S R A S Met S G A G H S S H R P H S Y S A S L L H R C R W C L F L R C P S A G P E A L G W G R S I C A G G S V S L C S R C L G I R F H R K E G I P F Q Stop T L C S H S P V S V L A G T K N S T D C L L G T V F L C Met Q W V V L D G Q R S A S S P R G R T A E A D E P Stop R G E T R R Q D H W A P E R G L C Stop L P C P S L G

GENE IL2:5'3' Frame 1S K P Q P S A C S A Q A A A V A F W R C L G H D T H L T C P P S T Y L Stop T I F L A L Q H G P K R N S A E V P G Stop G P K Q E N Y Stop R G V C Q Stop I S E A E K A I Y Q V Q G R Q N L S Y N C G Stop E A Q E Y Q E K Q I Stop G Y F A L Stop L Stop P G R T I P D N L Stop Stop G F Q L H Q C Q L H Stop G S L W T Q G L Y C H P G S F I Y N P P G L L E D D L G I Stop C P Y H C Y G Met H G V Stop N G K E K V Stop A L L G Stop A R R D A A G I W P F L C I L Stop S Stop K K E I Stop L Y N Q D S K S Stop V Q Stop Stop N S N Y L P V S L Q E L A R P Stop C T F I Y R P Y S Stop A H L G C T L L P R G Stop Q C S H Met H S L Q C W L W K D W C Y L C Y Stop L Y Met D V A K R W E S S K A L Y S Stop E K S H S P S R L L F S Stop F T K K Y H K S S K N D E P T K D K N G N Q R I F F L Stop L Stop D F Stop N K C K R R A S F A P C Stop I K H F F Stop L S G A K L Q F Stop Q K C Stop H N H E Met A D K G I S N S W G A S S E A S K F G F G L S F V Stop G Met F Stop F Stop T C K C S R K I F Stop F K G A N N T D Q I N S F Stop I D T A E R N Q G G G Q Q G K L F L F G I S T T Stop F L F C R D A G S K S N A C F F S R T E L F T A I Stop L Stop T P N T Stop C L Stop C K A P Stop L Stop C S W C I F L H T F S G K S L F F I Met A S K W Y Q F Stop D V S Stop F T Stop E A R W N C F S F F S V A N I L Y I P L L L L Q F T Stop F F I T E F S N Q Y F L T I E P G V S C T S N C S K D R Stop Stop N P P S T S C T D T Stop I I Y C G Stop G S W R I L T K C S Q I L I L S C E G K N W N I T G Met G W N I Stop T K E I Stop Stop L C D T Stop T K Q E C K T P K S Stop I R T T S R S F F S P T S S P R K N S R V L L S C R Stop R L Y A G P I Y R N I F Y Stop L S Stop H H G K F N I F K T D T E D S W K K F H K E Stop E F E N F A K H E K E Y L Stop F L P T K Q A C R I C S V K Stop L Q L I S E F W F C K P F F K T Q R T K E S T T N L E Y L I K L Q I Y N N Met G C K Y T C K Stop N Y Stop N T A S Stop N K C S I C I I S N Met K I C Stop C V N S F Stop K K S K Met P I S A S F A F S Y H L H Stop V E N C K Stop K F V T 
Stop A Y V Q N A I Stop E T L L E W I Y F S F L P V I F I F F Y F S T Stop T Stop T S K G L Stop D L D L N Stop F L H C Q N I L Stop K V K K K T Y F V G C N T N C S Stop Q Stop L F P D S Y F C L N G V Y L V N L P K C C G K L E Y Stop E N E K L Y L L E Stop N V Q I Met T I I Stop Met Stop Q G I Q L K S Stop Stop V L T K V I K L P I L E K Stop S Met K Y N S Y L L V A K D I N C I C L Y R I F Q I Met C N F Y L T N Q K Y Stop F K Met N F Y Met N Met D L P Stop E N L V Q L Stop F Y V V N K L A G N C F Y K E S T Stop L P L Met H Stop K Y F Y L N N F I Y N F Stop K H V V L F K H H L F F S I F H L E V Q Stop G K L N E V L L S V S C S T Met Y P T D T Q Stop T F W L L N Stop K K K K K K K K 5'3' Frame 2V N H S L Q H A L L R R Q Q W L F G G V S A Met T H I Stop H A L P Q P T Y R L F F L L C S Met D Q R E I L Q K F L D E A Q S K K I T K E E F A N E F L K L K R Q S T K Y K A D K T Y P T T V A E K P K N I K K N R Y K D I L P Y D Y S R V E L S L I T S D E D S S Y I N A N F I K G V Y G P K A Y I A T Q G P L S T T L L D F W R Met I W E Y S V L I I V Met A C Met E Y E Met G K K K C E R Y W A E P G E Met Q L E F G P F S V S C E A E K R K S D Y I I R T L K V K F N S E T R T I Y Q F H Y K N W P D H D V P S S I D P I L E L I W D V R C Y Q E D D S V P I C I H C S A G C G R T G V I C A I D Y T W Met L L K D G S Q A K H C I P E K N H T L Q A D S Y S P N L P K S T T K A A K Met Met N Q Q R T K Met E I K E S S S F D F R T S E I S A K E E L V L H P A K S S T S F D F L E L N Y S F D K N A D T T Met K W Q T K A F P I V G E P L Q K H Q S L D L G S L L F E G C S N S K P V N A A G R Y F N S K V P I T R T K S T P F E L I Q Q R E T K E V D S K E N F S Y L E S Q P H D S C F V E Met Q A Q K V Met H V S S A E L N Y S L P Y D S K H Q I R N A S N V K H H D S S A L G V Y S Y I P L V E N P Y F S S W P P S G T S S K Met S L D L P E K Q D G T V F P S S L L P T S S T S L F S Y Y N S H D S L S L N S P T N I S S L L N Q E S A V L A T A P R I D D E I P P P L P V R T P E S F I V V E E A G E F S P N V P K S L S S A 
V K V K I G T S L E W G G T S E P K K F D D S V I L R P S K S V K L R S P K S E L H Q D R S S P P P P L P E R T L E S F F L A D E D C Met Q A Q S I E T Y S T S Y P D T Met E N S T S S K Q T L K T P G K S F T R S K S L K I L R N Met K K S I C N S C P P N K P A E S V Q S N N S S S F L N F G F A N R F S K P K G P R N P P P T W N I Stop Stop N S R F I I I W A A S T P A N K T T R I L L V K I S A L Y A Stop Y Q I Stop R Y A N V L I A F K R K A K C Q Stop V P V L H F H I I C I E L K T A N K S L S L E L Met Y R Met L Y E K H F Stop N G F I F H F C Q L F L F S F T F L H K H K L Q K V C K I W I S T N F Y I A R I Y Y K K L K K K L T L W V A I Q T A L D N D Y S L T V I F A Stop Met E Y T L Stop I F P N V V E N W N I K K Met R N Y I Y Stop N K Met C K Stop Stop Q L F E C N K E F N Stop N P D K F Stop P K S L N Y Q F Stop K S N Q Stop N I I A I F W Stop Q K I Stop I V Y V Y T G S F R S C A I F I Stop P I R N T S L K Stop I S I Stop I W I C H K K I Stop F N S N F Met Stop Stop I N W Q V I V F T K N P P D F P Stop C I K N I F I Stop I T L F I T F R N Met Stop Y C L N I I C S S V F F I W K S N R A N Stop Met K Y Y Y L S L V V Q C I Q Q T L N K L F G C Stop T E K K K K K K K 5'3' Frame 3Stop T T A F S Met L C S G G S S G F L E V S R P Stop H T F D Met P S L N L L I D Y F S C S A A W T K E K F C R S S W Met R P K A R K L L K R S L P Met N F Stop S Stop K G N L P S T R Q T K P I L Q L W L R S P R I S R K T D I R I F C P Met I I A G Stop N Y P Stop Stop P L Met R I P A T S Met P T S L R E F Met D P R L I L P P R V L Y L Q P S W T S G G Stop F G N I V S L S L L W H A W S Met K W E R K S V S A T G L S Q E R C S W N L A L S L Y P V K L K K G N L I I Stop S G L Stop K L S S I V K L E L S T S F I T R I G Q T Met Met Y L H L Stop T L F L S S S G Met Y V V T K R Met T V F P Y A F T A V L A V E G L V L F V L L I I H G C C





Stop K Met G V K Q S I V F L R K I T L S K Q T L I L L I Y Q K V P Q K Q Q K Stop Stop T N K G Q K W K S K N L L P L T L G L L K Stop V Q K K S Stop F C T L L N Q A L L L T F W S Stop I T V L T K Met L T Q P Stop N G R Q R H F Q Stop L G S L F R S I K V W I W A L F C L R D V L I L N L Stop Met Q Q E D I L I Q R C Q Stop H G P N Q L L L N Stop Y S R E K P R R W T A R K T F L I W N L N H Met I L V L Stop R C R L K K Stop C Met F L Q Q N Stop I I H C H Met T L N T K Y V Met P L Met Stop S T Met T L V L L V Y I L T Y L Stop W K I L I F H H G L Q V V P V L R C L L I Y L R S K Met E L F F L L L C C Q H P L H P S S L I T I H Met I L Y H Stop I L Q P I F P H Y Stop T R S Q L Y Stop Q L L Q G Stop Met Met K S P L H F L Y G H L N H L L W L R K L E N S H Q Met F P N P Y P Q L Stop R Stop K L E H H W N G V E H L N Q R N L Met T L Stop Y L D Q A R V Stop N S E V L N Q N Y I K I V L L P H L L S Q K E L Stop S P S F L P Met K I V C R P N L Stop K H I L L A I L T P W K I Q H L Q N R H Stop R L L E K V S Q G V R V Stop K F C E T Stop K R V S V I L A H Q T S L Q N L F S Q I T P A H F Stop I L V L Q T V F Q N P K D Q G I H H Q L G I F N K T P D L Stop Stop Y G L Q V H L Q I K L L E Y C Stop L K Stop V L Y Met H N I K Y E D Met L Met C Stop Stop L L K E K Q N A N K C Q F C I F I S F A L S Stop K L Q I K V C H L S L C T E C Y Met R N T F R Met D L F F I F A S Y F Y F L L L F Y I N I N F K R F V R F G S Q L I S T L P E Y T I K S Stop K K N L L C G L Q Y K L L L T Met T I P Stop Q L F L P K W S I P C K S S Q Met L W K T G I L R K Stop E I I F I R I K C A N N D N Y L N V T R N S T E I L I S F N Q S H Stop I T N S R K V I N E I Stop Stop L S F G S K R Y K L Y Met F I Q D L S D H V Q F L S N Q S E I L V Stop N E F L Y E Y G S A I R K S S S T L I L C S K Stop I G R Stop L F L Q R I H L T S P N A L K I F L F K Stop L Y L Stop L L E T C S I V Stop T S F V L Q Y F S F G S P I G Q I E Stop S I I I C L L Stop Y N V S N R H S I N F L V V K L K K K K K K K K 3'5' Frame 1F F F F F F F F S V Stop Q P 
K S L L S V C W I H C T T R D R Stop Stop Y F I Q F A L L D F Q Met K N T E E Q Met Met F K Q Y Y Met F L K V I N K V I Stop I K I F L Met H Stop G K S G G F F V K T I T C Q F I Y Y I K L E L N Stop I F L W Q I H I H I E I H F K L V F L I G Stop I K I A H D L K D P V Stop T Y T I Y I F C Y Q K I A I I F H Stop L L F Stop N W Stop F N D F G Stop N L S G F Q L N S L L H S N N C H Y L H I L F Stop Stop I Stop F L I F L I F Q F S T T F G K I Y K V Y S I Stop A K I T V R E Stop S L S R A V C I A T H K V S F F F N F L Stop Y I L A Met Stop K L V E I Q I L Q T F Stop S L C L C R K V K E N K N N W Q K Stop K I N P F Stop K C F S Y S I L Y I S S S D K L L F A V F N S Met Q Met I Stop K C K T G T Y W H F A F L L K A I N T L A Y L H I Stop Y Y A Y R A L I L T S S I L V V L F A G V L A A H I I I N L E F Y Stop I F Q V G G G F L G P L G F E K R F A K P K F R N E L E L F D Stop T D S A G L F G G Q E L Q I L F F Met F R K I F K L L L L V K L F P G V F S V C F E D V E F S Met V S G Stop L V E Y V S I D W A C I Q S S S A R K K D S R V L S G R G G G G E E R S Stop C S S D L G L R S F T L L L G L S I T E S S N F F G S D V P P H S S D V P I F T F T A E D K D L G T F G E N S P A S S T T I N D S G V R T G S G G G I S S S I L G A V A S T A D S W F N S E E I L V G E F S D K E S C E L Stop Stop E K R D V E D V G N R E E G K T V P S C F S G K S R D I L E L V P L G G H D E K Stop G F S T K G Met Stop E Y T P R A L E S W C F T L E A L R I W C L E S Y G S E Stop F S S A E E T C I T F Stop A C I S T K Q E S C G Stop D S K Stop E K F S L L S T S L V S L C C I N S K G V D L V R V I G T F E L K Y L P A A F T G L E L E H P S N K R E P K S K L Stop C F Stop R G S P T I G N A F V C H F Met V V S A F L S K L Stop F S S R K S K E V L D L A G C K T S S S F A L I S E V L K S K E E D S L I S I F V L C W F I I F A A F V V L F G K L G E Stop E S A W R V Stop F F S G I Q C F A Stop L P S F S N I H V Stop S I A Q I T P V L P Q P A L Q Stop Met H Met G T L S S S W Stop Q R T S Q Met S S R I G S I D E G T S W S 
G Q F L Stop Stop N W Stop I V R V S L L N L T F R V L I I Stop S D F L F S A S Q D T E K G P N S S C I S P G S A Q Stop R S H F F F P I S Y S Met H A I T Met I R T L Y S Q I I L Q K S R R V V D K G P W V A I Stop A L G P Stop T P L Met K L A L Met Stop L E S S S E V I R D S S T R L Stop S Stop G K I S L Y L F F L I F L G F S A T V V G Stop V L S A L Y L V D C L F S F R N S L A N S S L V I F L L W A S S R N F C R I S L W S Met L Q S K K N S L Stop V G Stop G R A C Q Met C V Met A E T P P K S H C C R L S R A C Stop R L W F T 3'5' Frame 2F F F F F F F F Q F N N Q K V Y Stop V S V G Y I V L Q E T D N N T S F N L P Y W T S K Stop K I L K N K Stop C L N N T T C F Stop K L Stop I K L F K Stop K Y F Stop C I R G S Q V D S L Stop K Q L P A N L F T T Stop N Stop S Stop T R F S Y G R S I F I Stop K F I L N Stop Y F Stop L V R Stop K L H Met I Stop K I L Y K H I Q F I S F A T K R Stop L L Y F I D Y F S R I G N L Met T L V K T Y Q D F S Stop I P C Y I Q I I V I I C T F Y S N K Y N F S F S Stop Y S S F P Q H L G R F T R Y T P F R Q K Stop L S G N S H C Q E Q F V L Q P T K Stop V F F L T F Y S I F W Q C R N Stop L R S K S Y K P F E V Y V Y V E K Stop K K I K I T G K N E K Stop I H S K S V S H I A F C T Stop A Q V T N F Y L Q F S T Q C K Stop Y E N A K L A L I G I L L F F Stop K L L T H Stop H I F I F D I Met H I E H L F Stop L A V F Stop Stop F Y L Q V Y L Q P I L L Stop I W S F I K Y S K L V V D S L V L W V L K N G L Q N Q N S E Met S W S Y L T E Q I L Q A C L V G K N Y R Y S F S C F A K F S N S Y S L Stop N F F Q E S S V S V L K Met L N F P W C Q D S Stop Stop N Met F L Stop I G P A Y N L H R Q E R R T L E F F L G E E V G E K N D L D V V L I Stop D F G V L H S C L V Stop V S Q S H Q I S L V Q Met F H P I P V Met F Q F L P S Q L R I R I W E H L V R I L Q L P Q P Q Stop Met I Q V S V Q E V E G G F H H L S L E Q L L V Q L T P G S I V R K Y W L E N S V I K N H V N C N K R R G Met Stop R Met L A T E K K E K Q F H L A S Q V N Q E T S Stop N W Y H L E A Met Met K N K D F P L K V C K 
N I H Q E H Stop S H G A L H Stop R H Y V F G V Stop S H Met A V N N S V L L K K H A L L F E P A S L Q N K N H V V E I P N K K S F P C C P P P W F L S A V S I Q K E L I W S V L L A P L N Stop N I F L L H L Q V Stop N Stop N I P Q T K E S P N P N F D A S E E A P Q L L E Met P L S A I S W L C Q H F C Q N C N L A P E S Q K K C L I Stop Q G A K L A L L L H L F Q K S Stop S Q R K K I L Stop F P F L S F V G S S F L L L L W Y F L V N Stop E N K S L L G E C D F S Q E Y N A L L D S H L L A T S Met Y N Q Stop H K Stop H Q S F H S Q H C S E C I W E H C H P L G N N V H P R Stop A Q E Stop G L Stop Met K V H H G L A N S C N E T G R Stop F E F H Y Stop T Stop L L E S Stop L Y N Q I S F F Q L H R I Q R K G Q I P A A S L L A Q P S S A H T F S F P F H T P C Met P Stop Q Stop Stop G H Y I P K S S S R S P G G L Stop I K D P G W Q Y K P W V H K L P Stop Stop S W H Stop C S W N P H Q R L S G I V L P G Y N H R A K Y P Y I C F S Stop Y S W A S Q P Q L Stop D R F C L P C T W Stop I A F S A S E I H W Q T P L Stop Stop F S C F G P H P G T S A E F L F G P C C R A R K I V Y K Stop V E G G H V K C V S W P R H L Q K A T A A A Stop A E H A E G C G L 3'5' Frame 3F F F F F F F F S L T T K K F I E C L L D T L Y Y K R Q I I I L H S I C P I G L P N E K Y Stop R T N D V Stop T I L H V S K S Y K Stop S Y L N K N I F N A L G E V R W I L C K N N Y L P I Y L L H K I R V E L D F L Met A D P Y S Y R N S F Stop T S I S D W L D K N C T Stop S E R S C I N I Y N L Y L L L P K D S Y Y I S L I T F L E L V I Stop Stop L W L K L I R I S V E F L V T F K Stop L S L F A H F I L I N I I S H F L N I P V F H N I W E D L Q G I L H L G K N N C Q G I V I V K S S L Y C N P Q S K F F F Stop L F I V Y S G N V E I S Stop D P N L T N L L K F Met F Met Stop K S K R K Stop K Stop L A K Met K N K S I L K V F L I Stop H S V H K L K Stop Q T F I C S F Q L N A N D Met K Met Q N W H L L A F C F S F K S Y Stop H I S I S S Y L I L C I Stop S T Y F N Stop Q Y S S S




F I C R C T C S P Y Y Y K S G V L L N I P S W W W I P W S F G F Stop K T V C K T K I Q K Stop A G V I Stop L N R F C R L V W W A R I T D T L F H V S Q N F Q T L T P C E T F S R S L Q C L F Stop R C Stop I F H G V R I A S R I C F Y R L G L H T I F I G K K E G L Stop S S F W E R R W G R R T I L Met Stop F Stop F R T S E F Y T L A W S K Y H R V I K F L W F R C S T P F Q Stop C S N F Y L H S Stop G Stop G F G N I W Stop E F S S F L N H N K Stop F R C P Y R K W R G D F I I Y P W S S C Stop Y S Stop L L V Q Stop Stop G N I G W R I Q Stop Stop R I Met Stop I V I R E E G C R G C W Q Q R R R K N S S I L L L R Stop I K R H L R T G T T W R P Stop Stop K I R I F H Stop R Y V R I Y T K S T R V Met V L Y I R G I T Y L V F R V I W Q Stop I I Q F C Stop R N Met H Y F L S L H L Y K T R I Met W L R F Q I R K V F L A V H L L G F S L L Y Q F K R S Stop F G P C Y W H L Stop I K I S S C C I Y R F R I R T S L K Q K R A Q I Q T L Met L L K R L P N Y W K C L C L P F H G C V S I F V K T V I Stop L Q K V K R S A Stop F S R V Q N Stop L F F C T Y F R S P K V K G R R F F D F H F C P L L V H H F C C F C G T F W Stop I R R I R V C L E S V I F L R N T Met L C L T P I F Stop Q H P C I I N S T N N T S P S T A S T A V N A Y G N T V I L L V T T Y I P D E L K N R V Y R Stop R Y I Met V W P I L V Met K L V D S S S F T I E L N F Stop S P D Y I I R F P F F S F T G Y R E R A K F Q L H L S W L S P V A L T L F L S H F I L H A C H N N D K D T I F P N H P P E V Q E G C R Stop R T L G G N I S L G S I N S L N E V G I D V A G I L I R G Y Q G Stop F Y P A I I I G Q N I L I S V F L D I L G L L S H S C R I G F V C L V L G R L P F Q L Q K F I G K L L F S N F L A L G L I Q E L L Q N F S L V H A A E Q E K Stop S I S R L R E G Met S N V C H G R D T S K K P L L P P E Q S Met L K A V V Y

GENE PADI4:5'3' Frame 1R Q G A P G Stop T L C R R K D F L F F L F F F F F E T E S C S V T Q A G V Q W H D L G S L Q L L P P G L K Stop F F C L S L P S S W D Y R C L P P R P A N V Y V F N R D R V S P C W S G Stop S Stop S P D L R Stop S T G G G I F Stop F Q P A W Met I Stop T C S S V L L E V Met N S E R L R D Q G S L C W K G L Q N S L A K Stop P G L G A A A Stop P Stop S G R G C R G L G V R P R K I K R K C H H F L L R N L E W L P N S S E T R F P H L E N G E R C Stop G C C E D Stop R Stop C Met L G P G I G Q G Y T R A W H V G Stop S Q I Stop N P P R G R V T L S L V L H S S V E Met R L Y L L P H L Met V R I R N N R A Q G P S D R Stop T H S R R Stop Stop Met V V S I I T L G A S T F Met L N K A Stop V S F Stop A R A R H L G G I R Stop I F V E Stop T A Q C W E Stop T V L Met H E Stop C Stop S K S Stop A I C C K N S T A L Y P D T E P L G E S S D P N S F P P N D V I A I N W S R L A S Q S L G Q E L R W P Q P S G P Q Met P R G Y L A T P R T P L L P L R E L G W Stop F C L S A G R E A S S S R V S G L D P E P L T Met Y K R V H S P T P K V P F G L Stop K G Stop C H Q Q K K Q P G R V L R D L H R I L S G S S T G L G A G N T N S H G V K P P V L S D K S R H W P C S Q A S W G Met G I Stop T Q G V A N L R K R Met K R S A K K L K R G K D E S T T Y L Q I T T D T G T S D A S L H G L Stop H Stop K L A D S P H V Q Q Stop G S S H V R Met Q L E L Q A Q V G H A P A P S L L A S H S T A L G S C P Q P C P S Q S P W P L F H C E Stop P S V F L Q R G F A S K V Q P F S L S V S E C V C V C V C V T N L V L H C S L C N C K Stop H A F S F A S P L P H H S P G S L V E D P R A H P S A F S L P Y P H S F P R G A G D V D I S L F Y P F Q P G P V S Stop A P E S C V Q T S Y R A S S H Q Met T L G K A R T K L L I S T Met P T P R P S L S L P C P A A G S P I P P G P Q A P N C G V L L H P S V F L N T P H P V P L Q L L L A V L H H K Met S R T Stop P I L P S S T A T T L A Stop A P P S V T S L C A A A T S Q V F W P P P G S P T V C S H I T A R V I T F K Q V G S R S F S A Q N D S R L P T L L G A E A K A L R Met P E R P Y 
Met I A P R P G L L S P S L T A F Q P H W P L C S F L N P P G Met S L P Q G L C T D C T G C S L D L E P S P C V P G S L Y P V L F P F S T P L T Y D R I H L L H L L L T G C L S P N K N K Q T K R L Y E G R I F V H F V H S V S T V P G T V P G H Q Stop A L S Q Y L L N Stop Q N C F G A R P L C G S F T A N L N P K L L K K I R K K R R N R R K Q P Q E T Stop D L G Stop R P A L P L I C C V T V G R S H S L S V L Q S P L L Q N E D W I G C S L R A L P T L T Met D S S L L P L P S R S S N Stop F Stop S C G Q P E K L G L L T L C A G S H V D A S E R R T W P H H H L L V Q K I N F E P R E G G L N P D P H C R A Y E T L G V S L N L S K S L W S H L H Met G I L I G R T P Stop G C Stop D E A G F E L K P V Stop S H S L T L N Stop Stop G K F F A S K P K Q A N K K Q T Q D I Q P T K R S Met F T P D S H A C T L Stop V L L V G T H P P P P S C L I K G R A W H Q W P R C N H S S E A T W A S P W Q A W P T P A L S G L T P R G P G K R Q V S W S Met Q R T S T S Met P G S S S S L P S S A A F Stop L W H L G Q E L V T F T L Stop A S I F L L Stop N G L Stop Stop Stop A H I H L V L Stop R D V A V I Q S G K H V S Y R Q A A N E S H T L E L D T F G S N S G P T T L E L W K N C T S V F S S V K W V Stop Stop W H Q P Q R A A V R V K Stop G D N G Met I Stop F T G L S Stop Stop A I L C E V Q E N V P G I E N V S C H V G G P G G E T C P K P G W S L N V Stop P G L Stop L L T Stop N L P A F P T F S S L I C L F D I G E I G Stop I K V F R A S R Q G W D C G H E D Q D P T P Q P H S P P Q A F Stop A I H P S Q E T D R A T L P L V P A L T P I Stop R S E G Q P Q G R G D F E S P H P Stop P E W G G V E P L G H R P Q V L T H G L C W A I Stop R N Q P R G F L Q P E G R A S P T Met A Q G T L I R V T P E Q P T H A V C V L G T L T Q L D I C R Stop E G G L L G F W R Q V R R C W Met T Q F Y Stop H R S Met C L A Q A L G S S L C S H C Q G S S W R E K T P A A L G S A S L Met A V G S L P C T S A N S P G T H D R Stop I I Met G Q T R P S I S L S S S R Q Q K L T E S L Stop L A Q G H T A R V P W K Stop G Stop P L G L L V H S T V C Y P T S L S 
Stop A S L S G C I I A W V G Met A K R G S V A A E T E V G S L D L L L A S K E A L G N Stop A S V A S S V K W G Stop Y S H H G L V I G I Q C E K Met C R C W L S T Stop H S G G P I L G S F Y L L I Stop Q Met L S A Met L C G R V W V K Q E S Stop G Met L S G S V Q G H C C Y C R T G I K R N F K T C K Q K L S S Met L K K K K T I F P G K K K K K E L E S L K N Stop L P V F L W L V S L I A P L F P G I V K T L F L Q L C S C K V T R Q I N S S C K N F F P Stop K V K N N I Met H V S I K Stop L S L F L A S I I C F P L H R S T P A P Stop N A Stop K I T Stop L F V Q G S V F Stop N V S L T E P V N L N N K S S S T P R S L Stop F L N Y P A T I A V V R N R S G C P W L P H P S L G I G G P R Met V P T G V P G T F L C Y C L S P S S Q R T A G N V G Met V S Q G H Met V E A G D Q E V G E K R K K K I T Stop H H L R R P Q L L Stop E W V Stop R K Stop Q E L K L A Q G L S Q H A Q A S P L P S V H E Stop Stop S R S Q V L P L A L E R W V D S W S V L G R E E G T A H F S Q K G W G L R S S Q G P L N P S F L L L Q P H V E P V T L S P E H L T Stop Q W L I F T S H F L H S T H S Met A L G G R Stop C C Y S H F T K E E T E A P R S Stop V T H L K P H S Stop Stop V G E P G L N P G S L T L G P S I L P Met G L R S R P S C H F L S V L N I A P D S A S L P S P E Q P S A S S Stop T T Q V G G P L P P H L R P S Y S C L H N F A Q A I L S V P K V R N P G G R G C S E P R L G Stop C T P A W V T E Stop D S V S K K K R Q K H L T E G H I N Stop R W Q A G H G G L C L Stop S Q H F E R P R W A N H L W S E F R D Q P G Q H G E T P S L L K I Q K N Stop T G R N G S H L Stop S Q H F G R P R Q A D H K V R R S R P S W L T W Stop N A V S T K N T K I S R A W Stop H A A I V P A T W E A E A G E S L E P R R R T L Q Stop T E I V P L N S S L G N T V S L R L K K K K K N Stop Met G V V A H A C N L N Y S G G Stop G T R V A Stop T W E V E V A V R Q D C T T A L Q Q G Q Q S E T L S Q K K K S Stop R W Q S Q D C L T L N P P C F A A H P A S K C W G V D S P G H C S L W A E L I Q L P K D L V K A Stop K G W G S R R C Q L Q R A V C I Q K L P S Met V S 
L L P P K V S F F F F F F L K T E S R S V Q P R L E C S G K I L A H


R K L H L P G S R N S P A S A S P V A G T T G A C H H A W L I F L Y F Q Stop R R C F T V L A R Met V S I S Stop P R D P P A S A S Q S A G I T G L S H R A Q P G F L L N R P V G Met T V G S L D K S V C L S E P R F T H L Stop N G K I N T P S W V T A K I E L Y N A H K V S G R G V P F P Met Q I K E N D Stop S E C Q S F S E V Y F Q G Stop G R A Q E K N T E T G K I C S P C F F L N G S G T S I F K G K R V G I G P D V V A H A C N P S T L G D Q G G W I T Stop G Q E F E T S L D N Met V K P H L Y Stop K Y K N Stop P G V V A D A C N P S Y S G S Stop G K R I T Stop T Q E A E V A V S Stop D C T I A L Q P G Q Stop E Stop N S A S K K K K K K W V L G Q R G R K K K K Met G R Stop E V S G C I E S L I S F H Stop I H F P C E V G R G I V T Y A L S S G N L Y C H K Stop Stop T Stop S K G R N Q I C I C C R Stop A E G Stop L Stop F L S F V P C L Stop R Stop A I N L H C Q G E I Q Q N W F T G K I L G L T R N F L V G K L Stop G R Y V A F F F F F L I F V A I L F R N K Met G L G Met V A H A C N P N T L G G Stop G G Q I Met R S R D G D H P G Q H G E T L S L L K I Q K L A E H G G V R L Stop S Q L L A R V R Q E N R L N L G G G G C S E Q R W H H C P P A Stop A T E R D S A S K K I E R K K Q W G Q A W Stop L Met P V I P T L W E A E V G R S L E V R S S R P A Stop P T W Stop N P I S T K N T K T Stop P G V V V H A C N P S Y S G G Stop G T R I A Stop T W E A E A A V S Q D H A T V L Q P G R Q C K T L S K K K K K K K R K G R K E G K K E R K K E N G R Q I C Met T Q F L A Stop L S L W L S E F G V L R F I F L Stop Q Q R G L H E P L C V R V A D T E C H P H G G G Stop S G D G G C W S W R N D V T V W G S S S Q C W L Y G G C W K L Q E D S A L D G L Met I T H V G H T L E G Met H T P G V N P Q Met F Met K V P C L G A E N I W N T A R G S T L I L E S N P S G P V P S W P L P L Stop A S V S P S S R E S G Y K G A R P T R L C P S P F P S Stop G S H Met G Q R G L R R T Stop P L G P P A V C P S A S I V F Stop A P Y S S S W W T R H R A C P S P V P I L Stop P Q C P G S W K A W L I T E A L R E Stop R K T T A H R T K R Met S L Q 
A L A E K Q E P W L S F G S A R Stop V Y H K W R H L S R N A P E P L G H L L A G N C P L G Stop A V C C F I Y C Y Y Stop Stop P S H T E Q L P H A G P S K P C N H P G S R S T Stop F P F Y R E G S Stop G P G E V S G Stop A K V T E P V R S S P R S L P L S L P S R L P A A P G L A W E T F P R G S Stop T T P P Q L V C R R V S D P K L L Q P A G D G Stop V Q N H K H Met L C P G W L V Stop T D W P A V V C G V L G I R Met F H S H Stop P W Q S P R G P G Q C H P G Stop Stop Q P S H P A P H P R Q H G N Q A G S R W E S H H L S Stop G D R Q G Q P W L E F S R C R Met T S L S R R T G G S P Q G N P R F S T L S H P P L S S A Q G L V H S E G Stop V S V W Stop L I D Stop L N E R Met K E Stop V C V L R E T Met P V T Q E V L F K C L L S E Stop K K V E S R K G V I P L C I P Y N V W D V L G T Q L S Stop N L F A T R A G G G Stop K G K R E K L R G I W R E R E H T Q K H S T I L T S A T T I P Y S A G A P T S Q A T G T W F F A S Met A L R E G P P T L A P L V K D C S V S S T G E R E G E E Q V K E N F L K I G A C C D P H P F L I P S L V A L G K R N S S F P F L E K I L G S Y T G F P F R G C S L N P Q S P P T V T S Q P P G L H L S L E P G C L P D D P Q L C P D P S S L G P Y P Q G A S T Q Met A R Q K Q N Q P G S L P F P P P D E E A A T P I F P F Met S I T L L A G C S G S R L Stop S Q H F G R P R W A D H Met R P E V Stop D Q P G Q H G E T P S L L K I Q K L A G H G G A R L Stop S Q L L R R L R R E N H L N L Stop G G G C C E P R S H H C T P A W V T E Q D S V S K I N K Stop P G A V A H A C N P S T L G G Stop G G Q I T R S G D R D H P G Stop H S E T P S L L K I Q K N Stop P G L V A G A C S P S Y L G G Stop G R R Met A Stop T W E V E L A V S Q D R T T A L E P G R Q S E T P S Q N K Stop I N K L I N Stop L I N Stop N K Stop I N K I T P L P K N N P P T R L P C P A P S F Y L Stop L H F S R A F S R S C G T A Stop L G Q S L S V L I C K Met G R G Met P A S Q Stop Q D L I V L P Stop Met F S R Stop D S V C R T P G S Q Stop L S T D T S V C I L S S L S F F I C R R G T P I V R Met S Q S C H G D A C R D G V Stop P C C 
P G W S Q T Y G L K Q S T C L S F P K C Stop G Y R H E P P C L A V C Met A F L H K K I A I T Met H A T D I Met P T E S R S C R L G W S A Met V R S W L T A I S T S Q V Q A I L L P Q P P E Stop L G L Q T P T T T P G Stop F F C I F S T D G D S P C W P G W S R T P D L V I R P S R P P K V L G L Q A Stop A T Met P G H H N A L F I L S C A S W H G S L R F H S K I R V I Q Q T R I S I F F G S G S H T Stop V C I K S I W R A C Stop N T D C C R A L W L A P V I P A L W E A K A G G S L E A R S S R P A W E T Y Stop D S H L Y K N V F Stop I S W Met W Stop C V P V V P A T Q E A E A G G L L E P R S S S L P Stop A Met I V P L H S S K T S S Q K N Q N K Q K T H T S G P H S Stop S F Q L S K S E W S P R I C I F N K F P S E A D A A V P K T T L W E A L V F R K C W Met L G T A E W H Q C R E G G S R D V A G Met A P G W A A R G P R P A I I S Q H P E T S L C L S E P V F S S V K E Stop Q W L T H R Met I V K T K V N G H T E K G D Stop L V I C C R H V N T N S G N S V R T G E G A K T E Met S L R Met T E I L N V K L P N G G G K N N L T A E C V L Met S Stop S D F L Stop L Stop A L N K H F Y I F T F C R S Y A H K A K I G Met L I F F L F F F L R W S L A L S P R L E C S G A I L A H C K L R L P G S C H S L A S A S R V A G T T G T R H N S R L I F F C I F N R D G V S P C Stop P G W S R S P D L V I H P P R P P K V L G L Q A G A T A P S L F F S F L W R Q G L A I L P R L V E S S P P T S A S Q N A G I T G G S H R A W P V Y Stop L L K K S T T T C P L L L P F S Stop E D A D F G S S V F L I V C H F S R P R P S Y H S R V Q G F T Stop D Q E R D R G Met A W W L Met S V I P A L W K A K Met G G S L E P R S S R P V W R Stop G D S I S T N N N Y F K I S Q G Stop W H V P V V P A T Q E A E V E G S L Q P G R L S C S A L C L K K E R E R H S K T E G V L V S Y L L K E T Q D K L N L I A Stop W S K E P L V N Q A A P F T Q T R I G S E R L R S C H Met V E Q Y L R T E K R K Stop P T E H G S E A Q K Q L D Stop L Q L G I G L I Stop T W F E Q L A A Stop D Q P K Q C L V Stop K Stop V T G C L H L Q L G Y S L L C T K K P F S R T Stop N T 
Stop G G S C G V N L I Stop H I A A V T N H Y K L S D F K Q H K L I T Y Q F A R S E V Q N Met N L T W L K S R C R Q G C I P S G S S R G K P I P C S F Stop R C P H S L A C G P F Q Q Q H H S D L C C H H H S F P D S D P P A S L L Stop R P L Y Y W K V V P I Q T P K E G S Stop I L P E K E F K V N P Stop S E S K F I K N A K E Stop K N G Y S I D R A V P K A A G C P F L W L F L D Y Met L N K G W I I H E F S G Stop R V G N S L N Stop G F L H F L D H I G Stop L P S V A I V S V N C H D A G G S V F Stop H A N V L Stop L V Y N E Q Stop G R P E V T L V I V L V L V G F G Q L L Y C H L F Y Q Q G L Y D L C L V L T S Y L I L Stop L R Met P N L L G Met Q P S R S Q P Y F Met Q P L F K Met E L L W F K C L Stop H L W L H W A H L A N L G Stop S P H L K I V N L I T F A S F L F C V R Stop H L H R F G G L G C G H L W E T I V L P T T E G Stop S H F E E E I P F G W S D K G R I R W Q R T H Stop D G P L G L D W I L Q E G C G K G E P S R Stop G A G E K Q S P G D G K V G G R C R E L W A I Q S D H S T G R A A R G T T G G Q A R A H P H S L Y E A E I Q Y T Stop L L S L G K Stop L H L S V P Q F T H L Stop N E Q Y P P Y R L V V R T E L I N V W K E C E Q C L E Q S K L P V S N C C H L R Met Stop N Stop L R E G R Met R A Q A R V G R R W N E I S S S V Y F R P L R F F K G Y V K F H L L K D Q H Q N V S D C L V N Q Stop K Stop P R Stop V S I I L E G L F A K V K D P C L G G R S Met P F S E D D F E A L D I Stop R R K G G Stop W K K Met K K F F Stop C V G G Stop E T N S C I L L S L Stop S A F Y Stop K H H F R V R Q G R G V V T Y A F I Stop L S E S A F L H E I T Stop I Stop R R G S N Y I C I C L R G A E G Stop L Stop V L S F V L Y L Stop R Stop A L S F A L Y L Stop R Stop A I S W A W W L Met P V I P T L W E A K A G G S P E V R S L R P A Stop P I W Stop N P I S T K N T K N Stop P G V V A C T C S P S Y L G G Stop D R R I A Stop T W E A E A A V S Stop D C T T A L Stop P G Stop Q S E T P S P P K K K D K L Stop Met Y N C Stop G E I Q Q N C F R V K I F G A H R E F P C G P I V R E V C S F L S L Stop L F Y L Stop T Met 
R G R F E G P S S Q D S S L W L S E F G V P R C I F L S Q S V S G K I H F L L G N L C L L P S L P S F F P S S F S F S L F L P F S P S H F F F L L V Y C I S Y L Y C V T N C P K T Stop W L K

Met I T F I V S Stop F L Stop V S N P V W F S W V L Y F K V S H N Q G I G R A V V Met L R L H Q G R Stop H F Q A H S Y G C W Stop D S G P C R V L E P Q F L P G C W L E A S L S F L P H W P L H S S Q N G S L L H Q G E Q G K R Q A E R E A E R E T E R E
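
The translated output above uses one-letter residue codes, with Met marking possible start positions and Stop marking stop codons. A minimal sketch of how open reading frames (ORFs) could be pulled out of such a string is given below. The function names and the short example string are illustrative assumptions and are not part of the tools used in this project.

```python
def find_orfs(translated, min_len=1):
    """Return ORFs (from a Met up to the next Stop) found in a
    space-separated translation that uses 'Met' and 'Stop' tokens."""
    orfs = []
    current = None  # residues of the ORF being read, or None when outside an ORF
    for token in translated.split():
        if token == "Stop":
            # A Stop closes the ORF that is currently open, if any.
            if current is not None and len(current) >= min_len:
                orfs.append("".join(current))
            current = None
        elif token == "Met":
            # The first Met opens an ORF; later ones are internal methionines.
            if current is None:
                current = ["M"]
            else:
                current.append("M")
        elif current is not None:
            current.append(token)
    # An ORF that is still open at the end (no Stop seen) is not reported.
    return orfs


def longest_orf(translated):
    """Return the longest ORF, or an empty string if there is none."""
    orfs = find_orfs(translated)
    return max(orfs, key=len) if orfs else ""


# Hypothetical short example in the same format as the output above.
example = "R K Met A L Stop G Met V S I S Stop P R"
print(find_orfs(example))
print(longest_orf(example))
```

Applied to one of the translated frames above, the same functions would list every Met-to-Stop stretch in that frame, which is one simple way to compare candidate ORFs between frames.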

RESULT OF ATLAS TOOL:

HLA-DR1 GENE

CD40 GENE

IL2 GENE

PADI4 GENE

STAT4 GENE

RESULT FOR CLUSTALW

RESULTS FOR CLC WORKBENCH

SEQUENCE VIEWS: HLA-DR1 GENE

CD40 GENE

IL2 GENE

PADI4 GENE

STAT4

RNA SEQUENCES: HLA-DR1

CD40

IL2

PADI4:

STAT4

NUCLEOTIDE SEQUENCE STATISTICS:

READING FRAME TABLE:

MULTIPLE SEQUENCE ALIGNMENT:

ALIGNMENT TREE:

RESULT FOR GENEIOUS PRO

SEQUENCE VIEWS: HLA-DR1

CD40

IL2

PADI4

STAT4

TRANSLATED VIEWS: HLA-DR1

CD40

IL2

PADI4

STAT4

DOTPLOTS OF THE GENES: HLA-DR1

CD40

IL2

PADI4

STAT4

MULTIPLE SEQUENCE ALIGNMENT:

ALIGNMENT TREE: VIEW1

VIEW2

VIEW3

Results for iGEMDOCK:

The target protein was obtained from the Protein Data Bank (PDB ID 3QD6), and the first ligand docked against it was N-acetyl-D-glucosamine. In iGEMDOCK the protein is loaded by clicking Prepare Binding Site and the ligand is loaded by clicking Prepare Compounds. The population size, the number of generations and the other default settings can be changed as required. Docking is started by clicking Start Docking and takes at most 10-15 minutes to complete. The protein-ligand binding can then be inspected by clicking View Docked Poses and Post-Analyze.

DOCKING WITH SINGLE PROTEIN

This picture shows the interaction between the protein and the ligand, i.e. between 3QD6 and N-acetyl-D-glucosamine. A docked structure may not be obtained if the ligand is very large: a ligand approaching the size of the protein turns the problem into a protein-protein interaction, which iGEMDOCK cannot handle.

Docked poses can be viewed by clicking the Display button. Protein-ligand binding was displayed in RasMol, a molecular graphics program used primarily to depict and explore biological macromolecule structures such as those found in the Protein Data Bank. The colours of the protein and the ligand can be changed easily, and the binding can be displayed in several styles: wireframe, backbone, sticks, cartoons, strands, ribbons, and ball and stick. It is also possible to measure torsion angles and distances, pick atoms and rotate bonds.

CHEMICAL LIGANDS:

1. asarone
2. ascorbic acid
3. beta carotene
4. luteoline
5. quercitrin
6. taraxasterol
7. geraniol
8. dehydroxy-isocalamendiol
9. riboflavin
10. tripterine
11. beta pinene
12. hentriacontane
13. n tetradecane
14. syringic acid
15. myrtenol
16. isovaleric acid
17. camphene
18. niacin
19. acetic acid
20. palmitic acid
21. hentriacontane
22. hentriacontane
23. sitoindocide 4
24. scoparianoside b
25. withaferal a
26. cuscohygrine
27. convoline
28. 8-azabicyclo[3.2.1]oct-3-yl 4-hydroxy-3-methoxy benzoate
29. valtrates (combination)
30. scopoletine
31. campherol
32. alpha eleostearic acid
33. momordica
34. withenone
35. bacoside a
36. momordicine 28
37. juglone
38. vanelic acid
39. syringic acid
40. betulinic acid
41. gamma sitosterol
42. pedunculagin
43. ferulic acid
44. 4 chomeric acid
45. quercumine
46. NAG (N-acetyl-glucosamine)

SUMMARY TABLE:

Because of the large number of ligands, the full summary table could not be laid out clearly. A representative summary table is therefore shown as an example:

The figure above shows the summary table of a protein-ligand docking run. In the first row the ligand is acetic acid, and the docking output lists the binding energy (interpreted here as the Gibbs free energy of binding) together with the van der Waals and hydrogen-bonding contributions. The second row gives the same details for the second docking, with glycolic acid. The third row gives the binding energy of 2OR8 with oxaloacetic acid, and the fourth and fifth rows list the binding energy, VDW and hydrogen-bonding values in the same way. The lower (more negative) the binding energy, the more stable the protein-ligand complex, and vice versa.
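
The rule that a lower binding energy means a more stable complex can be applied by sorting the summary table on total energy. A minimal sketch is shown below. It assumes the total energy is the sum of the van der Waals, hydrogen-bonding and electrostatic terms. The ligand names follow the example above, but the energy values are invented purely for illustration and are not results from this study.

```python
from dataclasses import dataclass


@dataclass
class DockingResult:
    ligand: str
    vdw: float          # van der Waals term
    hbond: float        # hydrogen-bonding term
    elec: float = 0.0   # electrostatic term

    @property
    def total(self):
        # Assumption: total energy is the plain sum of the three terms.
        return self.vdw + self.hbond + self.elec


def rank_by_stability(results):
    """Sort so that the lowest (most negative) total energy comes first."""
    return sorted(results, key=lambda r: r.total)


# Hypothetical values, for illustration only.
results = [
    DockingResult("acetic acid", vdw=-30.0, hbond=-12.5),
    DockingResult("glycolic acid", vdw=-35.0, hbond=-15.0),
    DockingResult("oxaloacetic acid", vdw=-40.0, hbond=-22.0, elec=-1.5),
]

for r in rank_by_stability(results):
    print(f"{r.ligand:20s} total={r.total:8.2f} VDW={r.vdw:8.2f} HBond={r.hbond:8.2f}")
```

With 46 ligands the same sort gives a single ranked list, which is easier to read than the raw summary table.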

INTERACTION TABLE

A ligand does not interact with the whole structure of the protein. It binds only in a restricted region known as the active site. Inside this pocket there are amino acid residues that interact with the ligand. The figure above lists those residues with their names and interaction energies. Amino acids present in the active site are marked in green. Different ligands interact differently with the protein, and the residues involved in the interaction also differ. Green indicates competitive binding with the substrate; black, shown lower in the figure, indicates uncompetitive binding with the substrate; and red indicates an activator for the substrate, so that the substrate binds to the protein instead of the ligand.

INTERACTION PROFILE CLUSTERS

This figure shows, in colour, the clustering of five ligands with the protein. Clustering cannot be performed with a single ligand; at least two ligands are required. The first column shows the clustering of the 46 ligands with the protein, the second column displays the arrays (residues), and the third column shows the name of the selected gene. The arrays are found by clicking a particular gene. The figure can be viewed only in Java TreeView, so Java must be installed on the computer. The figure is divided into rows and columns; clicking a particular position shows the residues of the entire column. The colours of the columns can be changed using the options on the toolbar. Clustering of protein sequences from different organisms has been used to identify orthologous and paralogous proteins.
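
Clustering groups ligands whose interaction profiles are similar, that is, ligands that contact much the same residues. A minimal sketch of one such similarity measure, the Tanimoto (Jaccard) coefficient between sets of contacted residues, is given below. This is a simplified stand-in and not the procedure iGEMDOCK itself uses; the ligand labels and residue names are hypothetical.

```python
from itertools import combinations


def tanimoto(a, b):
    """Shared residues divided by all residues contacted by either ligand."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def most_similar_pair(profiles):
    """Return the pair of ligand names with the highest Tanimoto similarity."""
    return max(
        combinations(sorted(profiles), 2),
        key=lambda pair: tanimoto(profiles[pair[0]], profiles[pair[1]]),
    )


# Hypothetical interaction profiles: residues contacted by each ligand.
profiles = {
    "ligand_A": {"TYR-82", "GLU-74", "ASN-86"},
    "ligand_B": {"TYR-82", "GLU-74", "ARG-73"},
    "ligand_C": {"LYS-143", "GLY-144"},
}

for x, y in combinations(sorted(profiles), 2):
    print(x, y, round(tanimoto(profiles[x], profiles[y]), 2))
print("closest pair:", most_similar_pair(profiles))
```

At least two profiles are needed before any pair can be compared, which matches the requirement above that clustering cannot be done with a single ligand.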

CONCLUSION

The genes related to rheumatoid arthritis were studied here. The study indicates that nine genes can cause this autoimmune disease. Of these nine genes, the following five were selected for the study:

1. HLA-DR1
2. CD-40
3. IL-2
4. PADI-4
5. STAT-4

The following tools were applied in the study:

1. Translate tool
2. Atlas genomic tool
3. Multiple sequence alignment tool (ClustalW)
4. CLC Main Workbench
5. Geneious Pro
6. Boxshade

These tools gave the following results: the translated protein sequence of each gene, a multiple sequence alignment score of 89, the ORFs of these genes, primer binding sites, nucleotide counts for every gene and comparative statistics. Alignment trees were produced by ClustalW as well as by CLC Workbench and Geneious Pro. Dotplots of every gene were produced by Geneious Pro, and the multiple sequence alignment was edited with Boxshade. The results show that all the genes belong to the immune class and that polymorphisms or mutations in these genes cause rheumatoid arthritis. From the genome analysis I selected CD40 and found that it is a TNF receptor with a good number of active sites and multiple domains, and that it can bind many natural or herbal plant extracts. It bound well with 44 of the 46 ligands.

In the quality control of Movex Active, which is labelled as containing Glucosamine Sulphate 500 mg, Chondroitin Sulphate 400 mg and Diclofenac Potassium 50 mg, the assay found 514.38 mg of Glucosamine Sulphate, 419.5 mg of Chondroitin Sulphate and 51.51 mg of Diclofenac Potassium. These results are acceptable, since the permitted range is -50 to +50 mg for Glucosamine Sulphate and Chondroitin Sulphate and -5 to +5 mg for Diclofenac Potassium.
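
The acceptance check on the assay values is simple arithmetic and can be sketched as follows. The label claims, found amounts and permitted ranges are the ones stated above, with the Diclofenac Potassium range taken to be in mg; the function name is an illustrative assumption.

```python
def within_limit(label_mg, found_mg, tolerance_mg):
    """True if the found amount lies within label claim +/- tolerance."""
    return abs(found_mg - label_mg) <= tolerance_mg


# (name, label claim in mg, amount found in mg, permitted deviation in mg)
assays = [
    ("Glucosamine Sulphate", 500.0, 514.38, 50.0),
    ("Chondroitin Sulphate", 400.0, 419.5, 50.0),
    ("Diclofenac Potassium", 50.0, 51.51, 5.0),
]

for name, label, found, tolerance in assays:
    deviation = found - label            # signed difference from the label claim
    percent = 100.0 * found / label      # amount found as % of label claim
    status = "pass" if within_limit(label, found, tolerance) else "fail"
    print(f"{name}: {deviation:+.2f} mg, {percent:.1f}% of label claim, {status}")
```

All three components fall inside their permitted ranges, in agreement with the conclusion drawn above.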

REFERENCES

Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE (2000). The Protein Data Bank. Nucleic acids research, 28:235-42.

Bracho MA, Saludes V, Martró E, Bargalló A, González-Candelas F, Ausina V (2008). Complete genome of a European hepatitis C virus subtype 1g isolate: phylogenetic and genetic analyses. Virology Journal, 5:5-72.

Brown L, van der Ouderaa F (2007). Nutritional genomics: food industry applications from farm to fork. The British journal of nutrition, 97:1027-35.

Byers P (2006). The role of genomics in medicine--past, present and future. Journal of Zhejiang University. Science. B, 7:159-60.

Chaston J, Douglas AE (2012). Making the most of "omics" for symbiosis research. The Biological Bulletin, 223:21-9.

Chakravarti DN, Fiske MJ, Fletcher LD, Zagursky RJ (2000). Application of genomics and proteomics for identification of bacterial gene products as potential vaccine candidates. Vaccine, 19:601-12.

Collins F, Green E, Guttmacher A, Guyer M (2003). A vision for the future of genomics research. Nature, 422:835-847.

David J, Elizabeth A (2000). Genomics, gene expression and DNA arrays. Nature, 4:827-836.

Daga PR, Duan J, Doerksen RJ (2010). Computational model of hepatitis B virus DNA polymerase: molecular dynamics and docking to understand resistant mutations. Protein science, 19:796-807.

Di Tommaso P, Moretti S, Xenarios I, Orobitg M, Montanyola A, Chang JM, Taly JF, Notredame C (2011). T-Coffee: a web server for the multiple sequence alignment of protein and RNA sequences using structural information and homology extension. Nucleic acids research, 39:13-17.

Dudley J, Butte AJ (2008). Enabling integrative genomic analysis of high-impact human diseases through text mining. Pacific Symposium on Biocomputing, 580-91.

Emerson SU, Huang YK, Nguyen H, Brockington A, Govindarajan S, St Claire M, Shapiro M, Purcell RH (2002). Identification of VP1/2A and 2C as virulence genes of hepatitis A virus and demonstration of genetic instability of 2C. Journal of virology, 76:8551-9.

Fleischmann RD, Adams MD, White O, Clayton RA, Kirkness EF, Kerlavage AR, Bult CJ, Tomb JF, Dougherty BA, Merrick JM, et al (1995). Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science, 269:496-512.

Guo Y, Guo H, Zhang L, Xie H, Zhao X, Wang F, Li Z, Wang Y, Ma S, Tao J, Wang W, Zhou Y, Yang W, Cheng J (2005). Genomic analysis of anti-hepatitis B virus (HBV) activity by small interfering RNA and lamivudine in stable HBV-producing cells. Journal of virology, 79:14392-403.

Handelsman J (2005). Metagenomics: Application of Genomics to Uncultured Microorganisms. Microbiology and Molecular Biology Reviews, 69: 195.

Hsu KC, Chen YF, Lin SR, Yang JM. (2011). iGEMDOCK: a graphical environment of enhancing GEMDOCK using pharmacological interactions and post-screening analysis. BMC Bioinformatics, 1:1471-2105.

Huang SY, Zou X (2010). Advances and challenges in protein-ligand docking. International Journal of Molecular Sciences, 11:3016-34.

Jou JH, Muir AJ (2005). American Academy of Pediatrics Committee on Infectious Diseases. Hepatitis A vaccine recommendations. Pediatrics, 120:189-99.

Kanehisa M, Goto S, Hattori M, Aoki-Kinoshita KF, Itoh M, Kawashima S, Katayama T, Araki M, Hirakawa M. (2006). From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Research, 1:354-7.

Kapushesky M, Emam I, Holloway E, Kurnosov P, Zorin A, Malone J, Rustici G, Williams E, Parkinson H, Brazma A (2010). Gene expression atlas at the European bioinformatics institute. Nucleic acids research, 28: 690-698.

Martin A, Lemon SM (2006). Hepatitis A virus: from discovery to vaccines. Hepatology, 43:162-174.

Marsden RL, Lewis TA, Orengo CA (2007). Towards a comprehensive structural coverage of completed genomes: a structural genomics viewpoint. BMC Bioinformatics, 8:86.

Prabdial-Sing N, Puren AJ, Bowyer SM (2012). Sequence-based in silico analysis of well studied hepatitis C virus epitopes and their variants in other genotypes (particularly genotype 5a) against South African human leukocyte antigen backgrounds. BMC Immunology, 13:67.

Sievers F, Wilm A, Dineen D, Gibson TJ, Karplus K, Li W, Lopez R, McWilliam H, Remmert M, Söding J, Thompson JD, Higgins DG (2011). Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Molecular systems biology, 7:539.

Singh S, Gupta SK, Nischal A, Khattri S, Nath R, Pant KK, Seth PK (2011). Identification and characterization of novel small-molecule inhibitors against hepatitis delta virus replication by using docking strategies. Hepatitis monthly, 11:803-809.

Su AI, Pezacki JP, Wodicka L, Brideau AD, Supekova L, Thimme R, Wieland S, Bukh J, Purcell RH, Schultz PG, Chisari FV (2002). Genomic analysis of the host response to hepatitis C virus infection. Proceedings of the National Academy of Sciences of the United States of America, 99:15669-74.

Untergasser A, Cutcutache I, Koressaar T, Ye J, Faircloth BC, Remm M, Rozen SG. (2012). Primer3--new capabilities and interfaces. Nucleic Acids Research, 40:15-30.

Xu B, Zhi N, Hu G, Wan Z, Zheng X, Liu X, Wong S, Kajigaya S, Zhao K, Mao Q, Young NS (2013). Hybrid DNA virus in Chinese patients with seronegative hepatitis discovered by deep sequencing. Proceedings of the National Academy of Sciences of the United States of America, 38:13-20.

Yadav SP (2007). The wholeness in suffix -omics, -omes, and the word om. Journal of Biomolecular Techniques, 18:277.

Zhang C, Kim SH (2003). Overview of structural genomics: from structure to function. Europe PubMed central, 7:28-32.

URLs

www.ncbi.nlm.nih.gov

www.geneious.com

http://www.ebi.ac.uk/gxa

www.uniprot.org/

www.rcsb.org/

www.drugbank.ca/

www.genome.jp/kegg/compound/

blast.ncbi.nlm.nih.gov/

www.ebi.ac.uk/Tools/msa/clustalo/

www.ebi.ac.uk/Tools/msa/tcoffee/

Simgene.com/Primer3

www.ncbi.nlm.nih.gov (NCBI, Chemicals & Bioassays)

cactus.nci.nih.gov/translate/

string-db.org/
