Machine Learning in Production - Andrew Kelleher, Adam Kelleher

Developing and Optimizing Data Science Workflows and Applications. Recommended for ages 18 to 67. 1st edition. Electronic book text. Language: English
eBook (PDF), 288 pages
EAN 9780134116594
Published February 2019
Publisher/manufacturer: Pearson ITP
37,99 incl. VAT
Description

Foundational Hands-On Skills for Succeeding with Real Data Science Projects

"This pragmatic book introduces both machine learning and data science, bridging gaps between data scientist and engineer, and helping you bring these techniques into production. It helps ensure that your efforts actually solve your problem, and offers unique coverage of real-world optimization in production settings."
- From the Foreword by Paul Dix, series editor

Machine Learning in Production is a crash course in data science and machine learning for people who need to solve real-world problems in production environments. Written for technically competent "accidental data scientists" with more curiosity and ambition than formal training, this complete and rigorous introduction stresses practice, not theory.

Building on agile principles, Andrew and Adam Kelleher show how to quickly deliver significant value in production, resisting overhyped tools and unnecessary complexity. Drawing on their extensive experience, they help you ask useful questions and then execute production projects from start to finish.

The authors show just how much information you can glean with straightforward queries, aggregations, and visualizations, and they teach indispensable error analysis methods to avoid costly mistakes. They turn to workhorse machine learning techniques such as linear regression, classification, clustering, and Bayesian inference, helping you choose the right algorithm for each production problem. Their concluding section on hardware, infrastructure, and distributed systems offers unique and invaluable guidance on optimization in production environments.

Andrew and Adam always focus on what matters in production: solving the problems that offer the highest return on investment, using the simplest, lowest-risk approaches that work.
- Leverage agile principles to maximize development efficiency in production projects
- Learn from practical Python code examples and visualizations that bring essential algorithmic concepts to life
- Start with simple heuristics and improve them as your data pipeline matures
- Avoid bad conclusions by implementing foundational error analysis techniques
- Communicate your results with basic data visualization techniques
- Master basic machine learning techniques, starting with linear regression and random forests
- Perform classification and clustering on both vector and graph data
- Learn the basics of graphical models and Bayesian inference
- Understand correlation and causation in machine learning models
- Explore overfitting, model capacity, and other advanced machine learning techniques
- Make informed architectural decisions about storage, data transfer, computation, and communication

Register your book for convenient access to downloads, updates, and/or corrections as they become available. See inside book for details.

Portrait

Andrew Kelleher is a staff software engineer and distributed systems architect at Venmo. He was previously a staff software engineer at BuzzFeed and has worked on data pipelines and algorithm implementations for modern optimization. He graduated with a BS in physics from Clemson University. He runs a meetup in New York City that studies the fundamentals behind distributed systems in the context of production applications, and was ranked one of Fast Company's most creative people two years in a row.

Adam Kelleher wrote this book while working as principal data scientist at BuzzFeed and adjunct professor at Columbia University in the City of New York. As of May 2018, he is chief data scientist for research at Barclays and teaches causal inference and machine learning products at Columbia. He graduated from Clemson University with a BS in physics, and has a PhD in cosmology from the University of North Carolina at Chapel Hill.

Table of Contents

Foreword
Preface
About the Authors

Part I: Principles of Framing

Chapter 1: The Role of the Data Scientist
1.1 Introduction
1.2 The Role of the Data Scientist
1.3 Conclusion

Chapter 2: Project Workflow
2.1 Introduction
2.2 The Data Team Context
2.3 Agile Development and the Product Focus
2.4 Conclusion

Chapter 3: Quantifying Error
3.1 Introduction
3.2 Quantifying Error in Measured Values
3.3 Sampling Error
3.4 Error Propagation
3.5 Conclusion

Chapter 4: Data Encoding and Preprocessing
4.1 Introduction
4.2 Simple Text Preprocessing
4.3 Information Loss
4.4 Conclusion

Chapter 5: Hypothesis Testing
5.1 Introduction
5.2 What Is a Hypothesis?
5.3 Types of Errors
5.4 P-values and Confidence Intervals
5.5 Multiple Testing and "P-hacking"
5.6 An Example
5.7 Planning and Context
5.8 Conclusion

Chapter 6: Data Visualization
6.1 Introduction
6.2 Distributions and Summary Statistics
6.3 Time-Series Plots
6.4 Graph Visualization
6.5 Conclusion

Part II: Algorithms and Architectures

Chapter 7: Introduction to Algorithms and Architectures
7.1 Introduction
7.2 Architectures
7.3 Models
7.4 Conclusion

Chapter 8: Comparison
8.1 Introduction
8.2 Jaccard Distance
8.3 MinHash
8.4 Cosine Similarity
8.5 Mahalanobis Distance
8.6 Conclusion

Chapter 9: Regression
9.1 Introduction
9.2 Linear Least Squares
9.3 Nonlinear Regression with Linear Regression
9.4 Random Forest
9.5 Conclusion

Chapter 10: Classification and Clustering
10.1 Introduction
10.2 Logistic Regression
10.3 Bayesian Inference, Naive Bayes
10.4 K-Means
10.5 Leading Eigenvalue
10.6 Greedy Louvain
10.7 Nearest Neighbors
10.8 Conclusion

Chapter 11: Bayesian Networks
11.1 Introduction
11.2 Causal Graphs, Conditional Independence, and Markovity
11.3 D-separation and the Markov Property
11.4 Causal Graphs as Bayesian Networks
11.5 Fitting Models
11.6 Conclusion

Chapter 12: Dimensional Reduction and Latent Variable Models
12.1 Introduction
12.2 Priors
12.3 Factor Analysis
12.4 Principal Components Analysis
12.5 Independent Component Analysis
12.6 Latent Dirichlet Allocation
12.7 Conclusion

Chapter 13: Causal Inference
13.1 Introduction
13.2 Experiments
13.3 Observation: An Example
13.4 Controlling to Block Non-causal Paths
13.5 Machine-Learning Estimators
13.6 Conclusion

Chapter 14: Advanced Machine Learning
14.1 Introduction
14.2 Optimization
14.3 Neural Networks
14.4 Conclusion

Part III: Bottlenecks and Optimizations

Chapter 15: Hardware Fundamentals
15.1 Introduction
15.2 Random Access Memory
15.3 Nonvolatile/Persistent Storage
15.4 Throughput
15.5 Processors
15.6 Conclusion

Chapter 16: Software Fundamentals
16.1 Introduction
16.2 Paging
16.3 Indexing
16.4 Granularity
16.5 Robustness
16.6 Extract, Transfer/Transform, Load
16.7 Conclusion

Chapter 17: Software Architecture
17.1 Introduction
17.2 Client-Server Architecture
17.3 N-tier/Service-Oriented Architecture
17.4 Microservices
17.5 Monolith
17.6 Practical Cases (Mix-and-Match Architectures)
17.7 Conclusion

Chapter 18: The CAP Theorem
18.1 Introduction
18.2 Consistency/Concurrency
18.3 Availability
18.4 Partition Tolerance
18.5 Conclusion

Chapter 19: Logical Network Topological Nodes
19.1 Introduction
19.2 Network Diagrams
19.3 Load Balancing
19.4 Caches
19.5 Databases
19.6 Queues
19.7 Conclusion

Bibliography
Index

Technical details
You can read this eBook with, for example, the following devices:
• tolino Reader
Download the eBook directly to your tolino via the reader shop, or transfer it to your tolino using free software such as Adobe Digital Editions.
• Sony Reader & other eBook readers
Download the eBook directly via the reader shop, or transfer it to a standard reading device using the free software Sony READER FOR PC/Mac or Adobe Digital Editions.
• Tablets & smartphones
If you would like to read this eBook on your smartphone or tablet, you will find our free reading app for iPhone/iPad and Android smartphones/tablets here.
• PC & Mac
Read the eBook right after downloading with free reading software such as Adobe Digital Editions or Sony READER FOR PC/Mac, or directly via the eBook library in your account under "My eBooks" - "Read online immediately via My Library".

Please note that Kindle devices do not support this format, so this eBook cannot be read on Kindle devices.
Manufacturer
Libri GmbH
Friedensallee 273

DE - 22763 Hamburg

Email: GPSR@libri.de

Website: www.libri.de
