Get quick access to your product right after purchase—no wait times.
Estimated delivery time: 0-24 hours.
Rest assured with our e-book digital products, delivered instantly via email! We stand by the quality of our digital offerings, but if you're not satisfied, we offer a hassle-free return policy.
You can request a refund for most digital purchases within 30 days of delivery. If you encounter any issues or if the product doesn't meet your expectations, simply contact our support team.
Although these digital products aren't physically returned, your satisfaction is our priority. Our team will work with you to ensure a smooth resolution or provide a refund promptly to your original payment method.
Shop with confidence! Enjoy the assurance of secure transactions with PayPal, a trusted and globally recognized payment gateway. Benefit from the safety and coverage of PayPal Buyer Protection, ensuring your purchase is secure and protected. Pay seamlessly with your credit or debit card through PayPal, providing you with an additional layer of security and convenience. Your peace of mind is our priority, and with PayPal, your transactions are backed by industry-leading safety measures and buyer guarantees.
Statistics for Biology and Health
Series Editors: W. Wong, M. Gail, K. Krickeberg, A. Tsiatis, J. Samet

Bioinformatics and Computational Biology Solutions Using R and Bioconductor
Robert Gentleman, Rafael A. Irizarry, Vincent J. Carey, Sandrine Dudoit, Wolfgang Huber, Editors
With 128 Illustrations

Editors

Robert Gentleman
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024

Vincent J. Carey
Channing Laboratory
Brigham and Women’s Hospital
Harvard Medical School
181 Longwood Ave
Boston MA 02115
USA

Preface

During the past few years, there have been enormous advances in genomics and molecular biology, which carry the promise of understanding the functioning of whole genomes in a systematic manner. The challenge of interpreting the vast amounts of data from microarrays and other high throughput technologies has led to the development of new tools in the fields of computational biology and bioinformatics, and opened exciting new connections to areas such as chemometrics, exploratory data analysis, statistics, machine learning, and graph theory.

The Bioconductor project is an open source and open development software project for the analysis and comprehension of genomic data. It is rooted in the open source statistical computing environment R. This book’s coverage is broad and ranges across most of the key capabilities of the Bioconductor project. Thanks to the hard work and dedication of many developers, a responsive and enthusiastic user community has formed. Although this book is self-contained with respect to the data processing and data analytic tasks covered, readers of this book are advised to acquaint themselves with other aspects of the project by touring the project web site www.bioconductor.org.

This book represents an innovative approach to publishing about scientific software. We made a commitment at the outset to have a fully computable book.
Tables, figures, and other outputs are dynamically generated directly from the experimental data. Through the companion web site, www.bioconductor.org/mogr, readers have full access to the source code and necessary supporting libraries and hence will be able to see how every plot and statistic was computed. They will be able to reproduce those calculations on their own computers and should be able to extend most of those computations to address their own needs.

Acknowledgments

This book, like so many projects in bioinformatics and computational biology, is a large collaborative effort. The editors would like to thank the chapter authors for their dedication and their efforts in producing widely used software, and also in producing well-written descriptions of how to use that software.

We would like to thank the developers of R, without whom there would be no Bioconductor project. Many of these developers have provided additional help and engaged in discussions about software development and design. We would like to thank the many Bioconductor developers and users who have helped us to find bugs, think differently about problems, and whose enthusiasm has made the long hours somewhat more bearable.

We would also like to thank Dorit Arlt, Michael Boutros, Sabina Chiaretti, James MacDonald, Meher Majety, Annemarie Poustka, Jerome Ritz, Mamatha Sauermann, Holger Sültmann, Stefan Wiemann, and Seth Falcon, who have contributed in many different ways to the production of this monograph. Much of the preliminary work on the MLInterfaces package, described in Chapter 16, was carried out by Jess Mar, Department of Biostatistics, Harvard School of Public Health. Ms Mar’s efforts were supported in part by a grant from Insightful Corporation.

The Bioconductor project is supported by grant 1R33 HG002708 from the NIH as well as by institutional funds at both the Dana Farber Cancer Institute and the Fred Hutchinson Cancer Research Center. W.H. received project-related funding from the German Ministry for Education and Research through National Genome Research Network (NGFN) grant FKZ 01GR0450.

Seattle: Robert Gentleman
Boston: Vincent Carey
Cambridge (UK): Wolfgang Huber
Baltimore: Rafael Irizarry
Berkeley: Sandrine Dudoit
February 2005

Contributors

J. Gentry, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
F. Hahne, Division of Molecular Genome Analysis, German Cancer Research Center, Heidelberg, FRG
L. Harris, Department of Cancer Biology, Dana Farber Cancer Institute, Boston, MA, USA
T. Hothorn, Institut für Medizininformatik, Biometrie und Epidemiologie, Friedrich-Alexander-Universität Erlangen-Nürnberg, FRG
W. Huber, European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
J. Ibrahim, Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA
J. D. Iglehart, Department of Cancer Biology, Dana Farber Cancer Institute, Boston, MA, USA
R. A. Irizarry, Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
X. Li, Department of Biostatistics and Computational Biology, Dana Farber Cancer Institute, Boston, MA, USA
X. Lu, Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA
A. Miron, Department of Cancer Biology, Dana Farber Cancer Institute, Boston, MA, USA
A. C. Paquet, Department of Biostatistics, University of California, San Francisco, CA, USA
K. S. Pollard, Center for Biomolecular Science and Engineering, University of California, Santa Cruz, USA
D. Scholtens, Department of Preventive Medicine, Northwestern University, Chicago, IL, USA
Q. Shi, Department of Cancer Biology, Dana Farber Cancer Institute, Boston, MA, USA

Contents

I Preprocessing data from genomic experiments

1 Preprocessing Overview
W. Huber, R.A. Irizarry, and R. Gentleman
  1.1 Introduction
  1.2 Tasks
    1.2.1 Prerequisites
    1.2.2 Stepwise and integrated approaches
  1.3 Data structures
    1.3.1 Data sources
    1.3.2 Facilities in R and Bioconductor
  1.4 Statistical background
    1.4.1 An error model
    1.4.2 The variance-bias trade-off
    1.4.3 Sensitivity and specificity of probes
  1.5 Conclusion

2 Preprocessing High-density Oligonucleotide Arrays
B.M. Bolstad, R.A. Irizarry, L. Gautier, and Z. Wu
  2.1 Introduction
  2.2 Importing and accessing probe-level data
    2.2.1 Importing
    2.2.2 Examining probe-level data
  2.3 Background adjustment and normalization
    2.3.1 Background adjustment
    2.3.2 Normalization
    2.3.3 vsn
  2.4 Summarization
    2.4.1 expresso
    2.4.2 threestep
    2.4.3 RMA
    2.4.4 GCRMA
    2.4.5 affypdnn
  2.5 Assessing preprocessing methods
    2.5.1 Carrying out the assessment
  2.6 Conclusion

3 Quality Assessment of Affymetrix GeneChip Data
B.M. Bolstad, F. Collin, J. Brettschneider, K. Simpson, L. Cope, R.A. Irizarry, and T.P. Speed
  3.1 Introduction
  3.2 Exploratory data analysis
    3.2.1 Multi-array approaches
  3.3 Affymetrix quality assessment metrics
  3.4 RNA degradation
  3.5 Probe level models
    3.5.1 Quality diagnostics using PLM
  3.6 Conclusion

4 Preprocessing Two-Color Spotted Arrays
Y.H. Yang and A.C. Paquet
  4.1 Introduction
  4.2 Two-color spotted microarrays
    4.2.1 Illustrative data
  4.3 Importing and accessing probe-level data
    4.3.1 Importing
    4.3.2 Reading target information
    4.3.3 Reading probe-related information
    4.3.4 Reading probe and background intensities
    4.3.5 Data structure: the marrayRaw class
    4.3.6 Accessing the data
    4.3.7 Subsetting
  4.4 Quality assessment
    4.4.1 Diagnostic plots
    4.4.2 Spatial plots of spot statistics - image
    4.4.3 Boxplots of spot statistics - boxplot
    4.4.4 Scatter-plots of spot statistics - plot
  4.5 Normalization
    4.5.1 Two-channel normalization
    4.5.2 Separate-channel normalization
  4.6 Case study

5 Cell-Based Assays
W. Huber and F. Hahne
  5.1 Scope
  5.2 Experimental technologies
    5.2.1 Expression assays
    5.2.2 Loss of function assays
    5.2.3 Monitoring the response
  5.3 Reading data
    5.3.1 Plate reader data
    5.3.2 Further directions in normalization
    5.3.3 FCS format
  5.4 Quality assessment and visualization
    5.4.1 Visualization at the level of individual cells
    5.4.2 Visualization at the level of microtiter plates
    5.4.3 Brushing with Rggobi
  5.5 Detection of effectors
    5.5.1 Discrete Response
    5.5.2 Continuous response
    5.5.3 Outlook

6 SELDI-TOF Mass Spectrometry Protein Data
X. Li, R. Gentleman, X. Lu, Q. Shi, J.D. Iglehart, L. Harris, and A. Miron
  6.1 Introduction
  6.2 Baseline subtraction
  6.3 Peak detection
  6.4 Processing a set of calibration spectra
    6.4.1 Apply baseline subtraction to a set of spectra
    6.4.2 Normalize spectra
    6.4.3 Cutoff selection
    6.4.4 Identify peaks
    6.4.5 Quality assessment
    6.4.6 Get proto-biomarkers
  6.5 An example
  6.6 Conclusion

II Meta-data: biological annotation and visualization

7 Meta-data Resources and Tools in Bioconductor
R. Gentleman, V. J. Carey, and J. Zhang
  7.1 Introduction
  7.2 External annotation resources
  7.3 Bioconductor annotation concepts: curated persistent packages and Web services
    7.3.1 Annotating a platform: HG-U95Av2
    7.3.2 An Example
    7.3.3 Annotating a genome
  7.4 The annotate package
  7.5 Software tools for working with Gene Ontology (GO)
    7.5.1 Basics of working with the GO package
    7.5.2 Navigating the hierarchy
    7.5.3 Searching for terms
    7.5.4 Annotation of GO terms to LocusLink sequences: evidence codes
    7.5.5 The GO graph associated with a term
  7.6 Pathway annotation packages: KEGG and cMAP
    7.6.1 KEGG
    7.6.2 cMAP
    7.6.3 A Case Study
  7.7 Cross-organism annotation: the homology packages
  7.8 Annotation from other sources
  7.9 Discussion

8 Querying On-line Resources
V. J. Carey, D. Temple Lang, J. Gentry, J. Zhang, and R. Gentleman
  8.1 The Tools
    8.1.1 Entrez
    8.1.2 Entrez examples
  8.2 PubMed
    8.2.1 Accessing PubMed information
    8.2.2 Generating HTML output for your abstracts
  8.3 KEGG via SOAP
  8.4 Getting gene sequence information
  8.5 Conclusion

9 Interactive Outputs
C.A. Smith, W. Huber, and R. Gentleman
  9.1 Introduction
  9.2 A simple approach
  9.3 Using the annaffy package
  9.4 Linking to On-line Databases
  9.5 Building HTML pages
    9.5.1 Limiting the results
    9.5.2 Annotating the probes
    9.5.3 Adding other data
  9.6 Graphical displays with drill-down functionality
    9.6.1 HTML image maps
    9.6.2 Scalable Vector Graphics (SVG)
  9.7 Searching Meta-data
    9.7.1 Text searching
  9.8 Concluding Remarks

10 Visualizing Data