This is the design document for the Allure Defender system. It presents the high-level design and API of the components that make up the system. We first outline the high-level pieces and then describe the individual components, their behaviors, expected inputs and outputs, and relationships. We discuss specific implementation and design choices, including the languages and libraries that will be used. In addition, we cover specific use cases and illustrate some running examples. Finally, we refer to a running system that implements many of the components covered in this document.
The goal of the document is to enable a designer to create a working system and/or verify that a working system conforms to the specifications outlined here.
The document generation component will create documents (henceforth referred to as Decoy Documents, or DDs for short) in various formats (e.g., Word, Excel, PDF, PowerPoint, email messages, Instant Messaging logs, …) that contain one or more of several features:
• a “mark” allowing Allure Defender to determine whether a file is a DD, and possibly allow legitimate users to avoid accessing/triggering the DD;
• one or more “beacons”, which will cause the application processing the DD to emit some sort of discernible signal;
• Enticing Information (henceforth referred to as EI) which, if acted upon by the adversary, will allow detection. Such information includes URLs (for various protocols), account information (e.g., username/password), and others that may be developed in the future; and
• Enticing Content (henceforth referred to as EC) that will attract the adversary to the DD (e.g., if they are using a search function) without raising suspicion, will support the presence of the EI in the document, and will allow the DDs to “fit in” with the rest of the environment on which they have been deployed.
DDs may be deployed on servers, databases, user desktops and laptops, mobile devices, honeypots, or other locations. It is desirable that all of these seeding techniques be supported.
The EC may be generated based on templates, synthesized from private sources (e.g., by mining existing documents at the directory/account/system/server to be seeded), synthesized from public sources (e.g., documents acquired through search engines) based on high-level templates, or synthesized from public sources using information mined from existing documents at the directory/account/system/server to be seeded. Any combination of these techniques may be used to generate DDs, and a specific DD may be the result of several such techniques being used simultaneously.
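The four features listed above can be pictured as fields of a single record. The following is a minimal sketch only, assuming a Python representation; the class names (DecoyDocument, Beacon, EnticingInformation, EIKind), the field names, and the example URL are hypothetical and are not part of the Allure Defender API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class EIKind(Enum):
    """Kinds of Enticing Information named in the text (hypothetical enum)."""
    URL = "url"
    CREDENTIALS = "credentials"


@dataclass
class EnticingInformation:
    kind: EIKind
    value: str  # e.g. a URL or a "username:password" string


@dataclass
class Beacon:
    # Hypothetical fields: an identifier the detection side can correlate
    # with the signal emitted when the DD is processed.
    beacon_id: str
    callback_url: str


@dataclass
class DecoyDocument:
    doc_format: str                    # e.g. "pdf", "word", "excel"
    enticing_content: str              # EC: text that makes the DD fit in
    mark: Optional[str] = None         # lets the system recognize a DD
    beacons: List[Beacon] = field(default_factory=list)
    enticing_information: List[EnticingInformation] = field(default_factory=list)

    def features(self) -> List[str]:
        """Return the names of the features present in this DD."""
        present = []
        if self.mark is not None:
            present.append("mark")
        if self.beacons:
            present.append("beacon")
        if self.enticing_information:
            present.append("EI")
        if self.enticing_content:
            present.append("EC")
        return present
```

A DD built with only a mark, one EI entry, and some EC would report exactly those three features, which reflects the point that a given DD need not carry every feature.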
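As an illustration of combining two of these techniques, the sketch below fills a high-level template with terms mined from existing documents at the location to be seeded. It is a simplified assumption of how such a combination might look, not the system's actual generator: the function names (mine_terms, from_template, generate_ec), the word-frequency heuristic, and the "$topics" placeholder are all hypothetical.

```python
import re
from collections import Counter
from string import Template
from typing import Dict, List


def mine_terms(existing_docs: List[str], top_n: int = 3) -> List[str]:
    """Return the most frequent words (4+ letters) found in existing documents.

    A deliberately naive stand-in for mining private sources.
    """
    words = re.findall(r"[a-z]{4,}", " ".join(existing_docs).lower())
    return [word for word, _ in Counter(words).most_common(top_n)]


def from_template(template: str, fields: Dict[str, str]) -> str:
    """Template-based generation; unknown placeholders are left untouched."""
    return Template(template).safe_substitute(fields)


def generate_ec(template: str, existing_docs: List[str], fields: Dict[str, str]) -> str:
    """Combine both techniques: mined terms are injected as the 'topics' field."""
    merged = dict(fields, topics=", ".join(mine_terms(existing_docs)))
    return from_template(template, merged)


if __name__ == "__main__":
    docs = ["budget budget budget forecast forecast review", "budget forecast"]
    print(generate_ec("Notes on $topics for $team", docs, {"team": "Finance"}))
```

A production generator would presumably use richer language modeling than word counts, but the structure (independent techniques whose outputs are merged into one DD) is the point of the sketch.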
The misbehavior detection component consists of a variety of subcomponents, some of which are specific to the beacon techniques used:
• honeypot servers, pointed to by URLs and similar information;
• intrusion detection systems combined with legitimate servers/services, when the latter can be used for detection purposes without compromising primary functionality (e.g., invalid username/password login attempts, specific directories in a filesystem or web server hierarchy, DNS server queries, and so on);
• Data Leakage Prevention (DLP) subsystems, which may operate at various points in the system, e.g., network, filesystem, memory, and others. The DLP may be a priori aware of the identity and location of the DDs, or it may be able to identify them on the fly via the “mark”.
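To make on-the-fly identification via the “mark” concrete, the sketch below shows one possible scheme in which the mark is a keyed HMAC tag embedded in the document bytes, so that a DLP sensor holding the key can recognize DDs without knowing their locations in advance. This scheme is an assumption for illustration only; the document does not specify how the mark is constructed, and the "DDMARK" layout, the 16-hex-digit document ID, and the function names are hypothetical.

```python
import hashlib
import hmac
import re
from typing import Optional

# Hypothetical mark layout: DDMARK:<16 hex digit doc id>:<64 hex digit HMAC-SHA256 tag>
MARK_RE = re.compile(rb"DDMARK:([0-9a-f]{16}):([0-9a-f]{64})")


def make_mark(key: bytes, doc_id: str) -> bytes:
    """Build the mark that the generation side would embed in a DD."""
    tag = hmac.new(key, doc_id.encode(), hashlib.sha256).hexdigest()
    return f"DDMARK:{doc_id}:{tag}".encode()


def find_mark(key: bytes, data: bytes) -> Optional[str]:
    """Scan a byte stream and return the doc id of the first valid mark, if any.

    A candidate counts only if its tag verifies under the key, so text that
    merely looks like a mark is not reported as a DD.
    """
    for match in MARK_RE.finditer(data):
        doc_id = match.group(1).decode()
        tag = match.group(2).decode()
        expected = hmac.new(key, doc_id.encode(), hashlib.sha256).hexdigest()
        if hmac.compare_digest(tag, expected):
            return doc_id
    return None


if __name__ == "__main__":
    key = b"example-key"
    payload = b"header " + make_mark(key, "00112233aabbccdd") + b" trailer"
    print(find_mark(key, payload))
```

Using a keyed tag rather than a fixed string is one way to keep the mark recognizable to the defender while making it harder for an adversary to identify or forge; whether that trade-off matches the intended design is left open here.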
The design of the architecture (see Figure 1) attempts to cleanly divide the functionality of the different subsystems into self-managing components, allowing maximum flexibility for the system to adapt to changes while still allowing all the components to work together seamlessly. The design reflects the facts that (a) documents may be requested via different interfaces (e.g., webserver front-end, client-side logic interacting over the network, client-side application with generation library, and possibly others); (b) the documents may contain a combination of enticing information, marks, and beacons, based on the desired configuration; (c) the corresponding detection capabilities can vary (and should be extensible so that we can add further capabilities as future research directs); and (d) the documents, and specifically the enticing content, may be generated through a variety of means.
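The extensibility requirement in (c) suggests a plug-in style interface for detection capabilities. The sketch below is one hedged way to express that, assuming detectors are simple callables registered by name; the Event and DetectorRegistry types, the decoy account name, and the "auth" source label are hypothetical and do not come from the running system.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Event:
    source: str   # hypothetical label for where the event came from, e.g. "auth"
    detail: str   # e.g. the username used in a login attempt


# A detector inspects an event and returns an alert message, or None.
Detector = Callable[[Event], Optional[str]]


class DetectorRegistry:
    """Holds independently developed detectors and runs them over each event."""

    def __init__(self) -> None:
        self._detectors: Dict[str, Detector] = {}

    def register(self, name: str, detector: Detector) -> None:
        self._detectors[name] = detector

    def evaluate(self, event: Event) -> List[str]:
        alerts = []
        for name, detector in self._detectors.items():
            result = detector(event)
            if result is not None:
                alerts.append(f"{name}: {result}")
        return alerts


# Example detector: a login attempt using credentials that exist only in DDs.
DECOY_ACCOUNTS = {"jsmith_backup"}  # hypothetical decoy username


def decoy_login_detector(event: Event) -> Optional[str]:
    if event.source == "auth" and event.detail in DECOY_ACCOUNTS:
        return f"login attempt with decoy account {event.detail!r}"
    return None


if __name__ == "__main__":
    registry = DetectorRegistry()
    registry.register("decoy-login", decoy_login_detector)
    print(registry.evaluate(Event("auth", "jsmith_backup")))
```

New detection capabilities would then be added by registering another callable, without changes to the generation side or to existing detectors.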