Practical Binary Analysis Build Your Own Linux Tools


Practical Binary Analysis: Build Your Own Linux Tools for Binary Instrumentation, Analysis, and Disassembly
Author: Dennis Andriesse
ISBN-10: 1593279124
ISBN-13: 9781593279127
Publication date: 2018-12-11
Pages: 449

Stop manually analyzing binary! Practical Binary Analysis is the first book of its ki
PRACTICAL BINARY ANALYSIS. Copyright © 2019 by Dennis Andriesse.

All rights reserved. No part of this work may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval system, without the prior written permission of the copyright owner and the publisher.

ISBN-10: 1-59327-912-4
ISBN-13: 978-1-59327-912-7

Publisher: William Pollock
Production Editor: Riley Hoffman
Cover Illustration: Rick Reese
Interior Design: Octopod Studios
Developmental Editor: Annie Choi
Technical Reviewers: Thorsten Holz and Tim Vidas
Copyeditor: Kim Wimpsett
Compositor: Riley Hoffman
Proofreader: Paula L. Fleming

For information on distribution, translations, or bulk sales, please contact No Starch Press, Inc. directly:
No Starch Press, Inc.
245 8th Street, San Francisco, CA 94103
phone: 1.415.863.9900; info@nostarch.com
www.nostarch.com

Library of Congress Cataloging-in-Publication Data
Names: Andriesse, Dennis, author.
Title: Practical binary analysis : build your own Linux tools for binary instrumentation, analysis, and disassembly / Dennis Andriesse.
Description: San Francisco : No Starch Press, Inc., [2019] | Includes index.
Identifiers: LCCN 2018040696 (print) | LCCN 2018041700 (ebook) | ISBN 9781593279134 (epub) | ISBN 1593279132 (epub) | ISBN 9781593279127 (print) | ISBN 1593279124 (print)
Subjects: LCSH: Disassemblers (Computer programs) | Binary system (Mathematics) | Assembly languages (Electronic computers) | Linux.
Classification: LCC QA76.76.D57 (ebook) | LCC QA76.76.D57 A53 2019 (print) | DDC 005.4/5--dc23
LC record available at https://lccn.loc.gov/2018040696

No Starch Press and the No Starch Press logo are registered trademarks of No Starch Press, Inc. Other product and company names mentioned herein may be the trademarks of their respective owners.
Rather than use a trademark symbol with every occurrence of a trademarked name, we are using the names only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark.

The information in this book is distributed on an "As Is" basis, without warranty. While every precaution has been taken in the preparation of this work, neither the author nor No Starch Press, Inc. shall have any liability to any person or entity with respect to any loss or damage caused or alleged to be caused directly or indirectly by the information contained in it.

INTRODUCTION

The vast majority of computer programs are written in high-level languages like C or C++, which computers can't run directly. Before you can use these programs, you must first compile them into binary executables containing machine code that the computer can run. But how do you know that the compiled program has the same semantics as the high-level source? The unnerving answer is that you don't.

There's a big semantic gap between high-level languages and binary machine code that not many people know how to bridge. Even most programmers have limited knowledge of how their programs really work at the lowest level, and they simply trust that the compiled program is true to their intentions. As a result, many compiler bugs, subtle implementation errors, binary-level backdoors, and malicious parasites can go unnoticed.

To make matters worse, there are countless binary programs and libraries (in industry, at banks, in embedded systems) for which the source code is long lost or proprietary. That means it's impossible to patch those programs and libraries or assess their security at the source level using conventional methods.
This is a real problem even for major software companies, as evidenced by Microsoft's recent release of a painstakingly handcrafted binary patch for a buffer overflow in its Equation Editor program, which is part of the Microsoft Office suite.

In this book, you'll learn how to analyze and even modify programs at the binary level. Whether you're a hacker, a security researcher, a malware analyst, a programmer, or simply interested, these techniques will give you more control over and insight into the binary programs you create and use every day.

WHAT IS BINARY ANALYSIS, AND WHY DO YOU NEED IT?

Binary analysis is the science and art of analyzing the properties of binary computer programs, called binaries, and the machine code and data they contain. Briefly put, the goal of all binary analysis is to figure out (and possibly modify) the true properties of binary programs: in other words, what they really do as opposed to what we think they should do.

Many people associate binary analysis with reverse engineering and disassembly, and they're at least partially correct. Disassembly is an important first step in many forms of binary analysis, and reverse engineering is a common application of binary analysis and is often the only way to document the behavior of proprietary software or malware. However, the field of binary analysis encompasses much more than this.

Broadly speaking, you can divide binary analysis techniques into two classes, or a combination of these:

Static analysis
Static analysis techniques reason about a binary without running it. This approach has several advantages: you can potentially analyze the whole binary in one go, and you don't need a CPU that can run the binary. For instance, you can statically analyze an ARM binary on an x86 machine.
The downside is that static analysis has no knowledge of the binary's runtime state, which can make the analysis very challenging.

Dynamic analysis
In contrast, dynamic analysis runs the binary and analyzes it as it executes. This approach is often simpler than static analysis because you have full knowledge of the entire runtime state, including the values of variables and the outcomes of conditional branches. However, you see only the executed code, so the analysis may miss interesting parts of the program.

Both static and dynamic analyses have their advantages and disadvantages, and you'll learn techniques from both schools of thought in this book. In addition to passive binary analysis, you'll also learn binary instrumentation techniques that you can use to modify binary programs without needing source. Binary instrumentation relies on analysis techniques like disassembly, and at the same time it can be used to aid binary analysis. Because of this symbiotic relationship between binary analysis and instrumentation techniques, this book covers both.

I already mentioned that you can use binary analysis to document or pentest programs for which you don't have source. But even if source is available, binary analysis can be useful to find subtle bugs that manifest themselves more clearly at the binary level than at the source level. Many binary analysis techniques are also useful for advanced debugging. This book covers binary analysis techniques that you can use in all these scenarios and more.

WHAT MAKES BINARY ANALYSIS CHALLENGING?

Binary analysis is challenging and much more difficult than equivalent analysis at the source code level. In fact, many binary analysis tasks are fundamentally undecidable, meaning that it's impossible to build an analysis engine for these problems that always returns a correct result! To give you an idea of the challenges to expect, here is a list of some of the things that make binary analysis difficult.
Unfortunately, the list is far from exhaustive.

No symbolic information
When we write source code in a high-level language like C or C++, we give meaningful names to constructs such as variables, functions, and classes. We call these names symbolic information, or symbols for short. Good naming conventions make the source code much easier to understand, but they have no real relevance at the binary level. As a result, binaries are often stripped of symbols, making it much harder to understand the code.

No type information
Another feature of high-level programs is that they revolve around variables with well-defined types, such as int, float, or string, as well as more complex data structures like struct types. In contrast, at the binary level, types are never explicitly stated, making the purpose and structure of data hard to infer.

No high-level abstractions
Modern programs are compartmentalized into classes and functions, but compilers throw away these high-level constructs. That means binaries appear as huge blobs of code and data, rather than well-structured programs, and restoring the high-level structure is complex and error-prone.

Mixed code and data
Binaries can (and do) contain data fragments mixed in with the executable code. This makes it easy to accidentally interpret data as code, or vice versa, leading to incorrect results.

Location-dependent code and data
Because binaries are not designed to be modified, even adding a single machine instruction can cause problems as it shifts other code around, invalidating memory addresses and references from elsewhere in the code. As a result, any kind of code or data modification is extremely challenging and prone to breaking the binary.

As a result of these challenges, we often have to live with imprecise analysis results in practice. An important part of binary analysis is coming up with creative ways to build usable tools despite analysis errors.

WHO SHOULD READ THIS BOOK?
This book's target audience includes security engineers, academic security researchers, hackers and pentesters, reverse engineers, malware analysts, and computer science students interested in binary analysis. But really, I've tried to make this book accessible for anyone interested in binary analysis.

That said, because this book covers advanced topics, some prior knowledge of programming and computer systems is required. To get the most out of this book, you should have the following:

- A reasonable level of comfort programming in C and C++
- A basic working knowledge of operating system internals (what a process is, what virtual memory is, and so on)
- Knowledge of how to use a Linux shell (preferably bash)
- A working knowledge of x86/x86-64 assembly. If you don't know any assembly yet, make sure to read Appendix A first.

If you've never programmed before or you don't like delving into the low-level details of computer systems, this book is probably not for you.

WHAT'S IN THIS BOOK?

The primary goal of this book is to make you a well-rounded binary analyst who's familiar with all the major topics in the field, including both basic topics and advanced topics like binary instrumentation, taint analysis, and symbolic execution. This book does not presume to be a comprehensive resource, as the binary analysis field and tools change so quickly that a comprehensive book would likely be outdated within a year. Instead, the goal is to make you knowledgeable enough on all important topics so that you're well prepared to learn more independently.

Similarly, this book doesn't dive into all the intricacies of reverse engineering x86 and x86-64 code (though Appendix A covers the basics) or analyzing malware on those platforms. There are many dedicated books on those subjects already, and it makes no sense to duplicate their contents here.
For a list of books dedicated to manual reverse engineering and malware analysis, refer to Appendix D.

This book is divided into four parts.

Part I: Binary Formats introduces you to binary formats, which are crucial to understanding the rest of this book. If you're already familiar with the ELF and PE binary formats and libbfd, you can safely skip one or more chapters in this part.

Chapter 1: Anatomy of a Binary provides a general introduction to the anatomy of binary programs.
Chapter 2: The ELF Format introduces you to the ELF binary format used on Linux.
Chapter 3: The PE Format: A Brief Introduction contains a brief introduction to PE, the binary format used on Windows.
Chapter 4: Building a Binary Loader Using libbfd shows you how to parse binaries with libbfd and builds a binary loader used in the rest of this book.

Part II: Binary Analysis Fundamentals contains fundamental binary analysis techniques.

Chapter 5: Basic Binary Analysis in Linux introduces you to basic binary analysis tools for Linux.
Chapter 6: Disassembly and Binary Analysis Fundamentals covers basic disassembly techniques and fundamental analysis patterns.
Chapter 7: Simple Code Injection Techniques for ELF is your first taste of how to modify ELF binaries with techniques like parasitic code injection and hex editing.

Part III: Advanced Binary Analysis is all about advanced binary analysis techniques.

Chapter 8: Customizing Disassembly shows you how to build your own custom disassembly tools with Capstone.
Chapter 9: Binary Instrumentation is about modifying binaries with Pin, a full-fledged binary instrumentation platform.
Chapter 10: Principles of Dynamic Taint Analysis introduces you to the principles of dynamic taint analysis, a state-of-the-art binary analysis technique that allows you to track data flows in programs.
Chapter 11: Practical Dynamic Taint Analysis with libdft teaches you to build your own dynamic taint analysis tools with libdft.
Chapter 12: Principles of Symbolic Execution is dedicated to symbolic execution, another advanced technique with which you can automatically reason about complex program properties.
Chapter 13: Practical Symbolic Execution with Triton shows you how to build practical symbolic execution tools with Triton.

Part IV: Appendixes includes resources that you may find useful.

Appendix A: A Crash Course on x86 Assembly contains a brief introduction to x86 assembly language for those readers not yet familiar with it.
Appendix B: Implementing PT_NOTE Overwriting Using libelf provides implementation details on the elfinject tool used in Chapter 7 and serves as an introduction to libelf.
Appendix C: List of Binary Analysis Tools contains a list of binary analysis tools you can use.
Appendix D: Further Reading contains a list of references, articles, and books related to the topics discussed in this book.

HOW TO USE THIS BOOK

To help you get the most out of this book, let's briefly go over the conventions with respect to code examples, assembly syntax, and development platform.

Instruction Set Architecture
While you can generalize many techniques in this book to other architectures, I'll focus the practical examples on the Intel x86 Instruction Set Architecture (ISA) and its 64-bit version x86-64 (x64 for short).
I'll refer to both the x86 and x64 ISA simply as x86 ISA. Typically, the examples will deal with x64 code unless specified otherwise.

The x86 ISA is interesting because it's incredibly common both in the consumer market, especially in desktop and laptop computers, and in binary analysis research (in part because of its popularity in end user machines). As a result, many binary analysis frameworks are targeted at x86.

In addition, the complexity of the x86 ISA allows you to learn about some binary analysis challenges that don't occur on simpler architectures. The x86 architecture has a long history of backward compatibility (dating back to 1978), leading to a very dense
