Input File (Executable / Malware Sample)
The pipeline starts with a suspicious file such as an .exe, .apk, or .dll.
Raw binary → image (grayscale or RGB)
Each byte mapped to one pixel intensity (0–255)
Visual patterns for an image model (ViT)
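A minimal sketch of the byte-to-pixel mapping, assuming a fixed image width of 256 and zero padding for the last row (both are assumptions; some approaches pick the width from the file size instead). The RGB variant, also an assumption about how "RGB" is meant here, packs three consecutive bytes into one pixel.

```python
import math


def bytes_to_grayscale(data: bytes, width: int = 256):
    """Reshape raw bytes into rows of grayscale pixels (0-255).

    Each byte is used directly as one pixel intensity; the last row is
    zero-padded so every row has exactly `width` pixels.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    height = max(1, math.ceil(len(data) / width))
    padded = data + bytes(height * width - len(data))  # zero padding
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


def bytes_to_rgb(data: bytes, width: int = 256):
    """Group every 3 consecutive bytes into one (R, G, B) pixel (assumed layout)."""
    n_pixels = math.ceil(len(data) / 3)
    padded = data + bytes(n_pixels * 3 - len(data))
    pixels = [tuple(padded[i:i + 3]) for i in range(0, len(padded), 3)]
    height = max(1, math.ceil(len(pixels) / width))
    pixels += [(0, 0, 0)] * (height * width - len(pixels))  # pad the last row
    return [pixels[r * width:(r + 1) * width] for r in range(height)]
```

For a real sample this would be called on the file contents, e.g. `bytes_to_grayscale(open("sample.exe", "rb").read())`, where `sample.exe` is a placeholder name.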
Behavior logging in a sandbox
File operations, registry, network, API calls, CPU/memory usage
Output: sequence of events / feature vector
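A minimal sketch of the two behavior-log outputs, assuming each logged event is a hypothetical `(category, action)` pair and that CPU/memory usage is summarized as two peak values. The sequence form suits a text encoder; the count vector is a simpler fixed-length alternative.

```python
from collections import Counter

# Hypothetical event categories taken from the activity types listed above.
CATEGORIES = ["file", "registry", "network", "api"]


def events_to_sequence(events):
    """Flatten (category, action) events into one space-separated string."""
    return " ".join(f"{cat}:{action}" for cat, action in events)


def events_to_vector(events, cpu_peak, mem_peak_mb, categories=CATEGORIES):
    """Per-category event counts followed by CPU and memory peaks."""
    counts = Counter(cat for cat, _ in events)
    return [float(counts.get(c, 0)) for c in categories] + [cpu_peak, mem_peak_mb]
```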
Image: Vision Transformer (ViT) → embeddings
Behavior log: Transformer Encoder (DistilBERT) → embeddings
Combine image & behavior embeddings
→ multi-view representation
Outputs:
✅ Benign
☠️ Known Malware
🧬 Unknown (Zero-Day) Malware
📊 Anomaly score
If malware (known or zero-day) → terminate / quarantine
If benign → allow execution
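A sketch of how the outputs could drive the response, assuming the classifier returns probabilities keyed by `benign`, `known_malware`, and `zero_day`, and assuming a hypothetical rule (not stated in the source) that a "benign" prediction with a high anomaly score is escalated to zero-day. The `0.8` threshold is illustrative only.

```python
from enum import Enum


class Verdict(Enum):
    BENIGN = "benign"
    KNOWN_MALWARE = "known_malware"
    ZERO_DAY = "zero_day"


def decide(probs, anomaly_score, anomaly_threshold=0.8):
    """Map class probabilities plus an anomaly score to (verdict, action)."""
    label = max(probs, key=probs.get)  # most probable class
    if label == "benign" and anomaly_score >= anomaly_threshold:
        # Assumed rule: looks benign but unlike anything seen before.
        verdict = Verdict.ZERO_DAY
    else:
        verdict = Verdict(label)
    action = "allow" if verdict is Verdict.BENIGN else "quarantine"
    return verdict, action
```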
Input File (.exe, .apk, .dll)
Binary-to-Image Converter
- Read bytes as pixels
- Output: Grayscale/RGB image
- Used as the file's image representation
Image Data (2D/3D Pixel Array)
Sandbox (VM, Cuckoo, CDR, etc.)
- Executes file
- Monitors system calls
- Captures network traffic
- Logs registry/file operations
Behavior Log (Structured Text)
Vision Transformer (ViT)
- Patch Embedding
- Self-Attention Blocks
- CLS token for image embedding
Image Embedding (Feature Vector)
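A minimal sketch of the ViT input stage: split the image into non-overlapping patches, project each flattened patch linearly, and prepend a CLS token (optionally adding position embeddings). The projection matrix `proj`, `cls_token`, and `pos` are hypothetical learned parameters passed in by hand; the self-attention blocks are omitted, and after them the CLS position's output would serve as the image embedding.

```python
def extract_patches(img, patch):
    """Split an H x W grayscale image into patch x patch blocks, flattened row-major."""
    h, w = len(img), len(img[0])
    if h % patch or w % patch:
        raise ValueError("image height and width must be multiples of the patch size")
    patches = []
    for top in range(0, h, patch):          # scan patches row by row
        for left in range(0, w, patch):
            patches.append([img[top + i][left + j] / 255.0  # normalize to [0, 1]
                            for i in range(patch) for j in range(patch)])
    return patches


def embed_patches(patches, proj, cls_token, pos=None):
    """Project each patch with proj (patch*patch x D) and prepend the CLS token."""
    dim = len(cls_token)
    tokens = [list(cls_token)]
    for p_vec in patches:
        tokens.append([sum(p * proj[k][d] for k, p in enumerate(p_vec)) for d in range(dim)])
    if pos is not None:  # optional position embeddings, one per token
        tokens = [[t + e for t, e in zip(tok, pe)] for tok, pe in zip(tokens, pos)]
    return tokens
```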
Transformer Encoder (DistilBERT)
- Tokenization
- CLS token for sequence embedding
Behavior Embedding (Feature Vector)
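A toy sketch of the tokenization step with a CLS token, assuming whitespace-separated event tokens and a hypothetical vocabulary built from training logs. A real DistilBERT setup would use its own WordPiece tokenizer, which likewise adds `[CLS]` at the start and `[SEP]` at the end; this sketch only mirrors that layout.

```python
SPECIAL = ["[PAD]", "[UNK]", "[CLS]", "[SEP]"]


def build_vocab(sequences):
    """Assign an id to every distinct token, after the special tokens."""
    vocab = {tok: i for i, tok in enumerate(SPECIAL)}
    for seq in sequences:
        for tok in seq.split():
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(seq, vocab, max_len=16):
    """Return (ids, attention_mask): [CLS] tokens... [SEP], truncated and padded."""
    body = [vocab.get(t, vocab["[UNK]"]) for t in seq.split()][: max_len - 2]
    ids = [vocab["[CLS]"]] + body + [vocab["[SEP]"]]
    mask = [1] * len(ids) + [0] * (max_len - len(ids))  # 0 marks padding
    ids += [vocab["[PAD]"]] * (max_len - len(ids))
    return ids, mask
```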
Fusion Layer
- Input: img_emb + beh_emb
- Method: Concatenation / Attention-based Fusion
- Output: Unified Embedding
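A sketch of both fusion options. Concatenation simply joins the two vectors. For "attention-based fusion" the source gives no details, so this assumes one simple variant: score each view with a hypothetical learned vector `w`, softmax the scores, and take the weighted sum (which requires both embeddings to share a dimension).

```python
import math


def concat_fusion(img_emb, beh_emb):
    """Unified embedding = [img_emb ; beh_emb]."""
    return list(img_emb) + list(beh_emb)


def attention_fusion(img_emb, beh_emb, w):
    """Weight each view by softmax(w . emb) and sum them element-wise."""
    if len(img_emb) != len(beh_emb):
        raise ValueError("attention fusion here assumes equal embedding sizes")
    views = [img_emb, beh_emb]
    scores = [sum(a * b for a, b in zip(w, v)) for v in views]
    m = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    alphas = [e / sum(exps) for e in exps]
    fused = [sum(a * v[i] for a, v in zip(alphas, views)) for i in range(len(img_emb))]
    return fused, alphas
```

Concatenation keeps all information but doubles the input size of the classifier; the weighted variant keeps the size fixed and exposes per-view weights that can hint at which view drove a decision.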
Multi-Layer Perceptron (Classification Head)
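A minimal sketch of the classification head, assuming one ReLU hidden layer and a softmax over three classes matching the outputs listed here. The weights are hypothetical inputs; how the zero-day class would be trained (given that such samples are by definition unseen) is not specified in the source.

```python
import math

CLASSES = ["benign", "known_malware", "zero_day"]


def linear(x, W, b):
    """y = W x + b, with W given as one row per output unit."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(v):
    return [max(0.0, t) for t in v]


def softmax(v):
    m = max(v)                           # stabilize the exponentials
    e = [math.exp(t - m) for t in v]
    s = sum(e)
    return [t / s for t in e]


def mlp_head(x, W1, b1, W2, b2):
    """Unified embedding -> hidden ReLU layer -> class probabilities."""
    logits = linear(relu(linear(x, W1, b1)), W2, b2)
    return dict(zip(CLASSES, softmax(logits)))
```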
✅ Benign
☠️ Malware
🧬 Zero-Day
📊 Anomaly Score
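The source does not say how the anomaly score is computed. One common choice, sketched here as an assumption, is the distance from a sample's unified embedding to the nearest centroid of the known classes seen in training: larger distances mean the sample resembles nothing known.

```python
import math


def centroids(embeddings_by_class):
    """Mean embedding per known class, e.g. benign and known malware."""
    out = {}
    for label, embs in embeddings_by_class.items():
        dim = len(embs[0])
        out[label] = [sum(e[i] for e in embs) / len(embs) for i in range(dim)]
    return out


def anomaly_score(emb, cents):
    """Euclidean distance from emb to the nearest known-class centroid."""
    return min(math.dist(emb, c) for c in cents.values())
```

A raw distance is unbounded, so in practice it would need calibrating (for example against distances observed on held-out benign samples) before comparing it with a fixed threshold.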
by Ganga