Step 1: Input File (Executable / Malware Sample)
- You start with a suspicious file such as an .exe, .apk, or .dll.

Step 2: Binary to Image Conversion
- Raw binary → image (grayscale or RGB)
- Each byte as pixel intensity (0–255)
- Visual patterns for image models (ViT)
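The byte-to-pixel mapping in Step 2 can be sketched as follows. This is a minimal illustration, not a fixed recipe: the function name `bytes_to_grayscale`, the row width, and the zero-padding of the last row are all assumptions made for the example.

```python
import math

def bytes_to_grayscale(data: bytes, width: int = 16) -> list[list[int]]:
    """Lay raw bytes out row by row as a 2D grid of pixel intensities (0-255)."""
    if width <= 0:
        raise ValueError("width must be positive")
    height = math.ceil(len(data) / width)
    # Zero-pad the tail so the last row is complete (padding choice is an assumption).
    padded = data + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]

# Stand-in bytes for illustration only, not a real sample.
sample = bytes([0x4D, 0x5A, 0x90, 0x00, 0x03, 0xFF])
image = bytes_to_grayscale(sample, width=4)
print(image)  # [[77, 90, 144, 0], [3, 255, 0, 0]]
```

In practice the width is often chosen from the file size so that images stay roughly square before being resized for the image model.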

Step 3: Simulated Execution
- Behavior logging in sandbox
- File ops, registry, network, API, CPU/memory
- Sequence of events / feature vector
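To make the two output forms of Step 3 concrete, here is a small sketch that turns logged events into both a token sequence and a count-based feature vector. The event format (category, action), the category names, and both helper functions are hypothetical; a real sandbox report will have its own schema.

```python
from collections import Counter

# Hypothetical event categories, mirroring the monitored items listed in Step 3.
CATEGORIES = ["file", "registry", "network", "api", "cpu_memory"]

def to_feature_vector(events: list[tuple[str, str]]) -> list[int]:
    """Count how many logged events fall into each category."""
    counts = Counter(category for category, _ in events)
    return [counts.get(c, 0) for c in CATEGORIES]

def to_sequence(events: list[tuple[str, str]]) -> str:
    """Render events in order as space-separated tokens for a text model."""
    return " ".join(f"{category}:{action}" for category, action in events)

# Made-up events for illustration only.
events = [
    ("file", "write"),
    ("registry", "set_value"),
    ("network", "connect"),
    ("file", "delete"),
]
vector = to_feature_vector(events)
sequence = to_sequence(events)
print(vector)    # [2, 1, 1, 0, 0]
print(sequence)  # file:write registry:set_value network:connect file:delete
```

The sequence form preserves event order, which is what a Transformer encoder consumes; the count vector discards order and suits simpler models.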

Step 4: Feature Extraction
- Image: Vision Transformer (ViT) → embeddings
- Behavior log: Transformer Encoder (DistilBERT) → embeddings

Step 5: Fusion
- Combine image & behavior embeddings
- Multi-view representation

Step 6: Detection / Classification
Outputs:
- ✅ Benign
- ☠️ Known Malware
- 🧬 Unknown (Zero-Day) Malware
- Anomaly score

Step 7: (Optional) Threat Response
- If malware → terminate / quarantine
- If benign → allow execution
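The response rule in Step 7 amounts to a small decision function. The sketch below only returns action names and touches no real process or file; the `Verdict` enum, the `plan_response` name, and the `is_running` flag (terminate only when the sample is still executing) are assumptions for illustration.

```python
from enum import Enum

class Verdict(Enum):
    BENIGN = "benign"
    KNOWN_MALWARE = "known_malware"
    ZERO_DAY = "zero_day"

def plan_response(verdict: Verdict, is_running: bool) -> list[str]:
    """Map a classifier verdict to a list of response actions."""
    if verdict is Verdict.BENIGN:
        return ["allow_execution"]
    # Both malware verdicts get the same treatment in this sketch.
    actions = ["terminate"] if is_running else []
    actions.append("quarantine")
    return actions

print(plan_response(Verdict.ZERO_DAY, is_running=True))   # ['terminate', 'quarantine']
print(plan_response(Verdict.BENIGN, is_running=False))    # ['allow_execution']
```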

Suspicious File (.exe, .apk, .dll)

Binary-to-Image Converter
- Read bytes as pixels
- Output: Grayscale/RGB image
- Image representation
→ Image Data (2D/3D Pixel Array)

Sandbox Environment (VM, Cuckoo, CDR, etc.)
- Executes file
- Monitors system calls
- Captures network traffic
- Logs registry/file operations
→ Behavior Log (Structured Text)

Vision Transformer (ViT)
- Patch Embedding
- Self-Attention Blocks
- CLS token for image embedding
→ Image Embedding (Feature Vector)
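The first stage of patch embedding, splitting the image into non-overlapping patches and flattening each one, can be sketched as below. The learned linear projection, position embeddings, and CLS token that follow in a real ViT are omitted; `extract_patches` and the 4×4 toy image are assumptions for the example.

```python
def extract_patches(image: list[list[int]], patch: int) -> list[list[int]]:
    """Split a 2D image into non-overlapping patch x patch tiles, each flattened row-major."""
    if not image or not image[0]:
        raise ValueError("image must be non-empty")
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image dimensions must be divisible by the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            patches.append(
                [image[top + dy][left + dx] for dy in range(patch) for dx in range(patch)]
            )
    return patches

# Toy 4x4 image whose pixel values are 0..15, for illustration only.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = extract_patches(image, patch=2)
print(patches)  # [[0, 1, 4, 5], [2, 3, 6, 7], [8, 9, 12, 13], [10, 11, 14, 15]]
```

Each flattened patch plays the role of one input token for the self-attention blocks.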

Text Transformer (e.g., BERT)
- Tokenization
- Self-Attention Blocks
- CLS token for sequence embedding
→ Behavior Embedding (Feature Vector)

FUSION LAYER
- Input: img_emb + beh_emb
- Method: Concatenation / Attention-based Fusion
- Output: Unified Embedding
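The two fusion methods named in the fusion layer can be compared with a short sketch. Concatenation is unambiguous; the attention variant shown here is only one simple possibility (a softmax-weighted sum of the two views, scored against a query vector), and the `query` parameter stands in for weights that would normally be learned. It also assumes both embeddings have the same dimension.

```python
import math

def concat_fusion(img_emb: list[float], beh_emb: list[float]) -> list[float]:
    """Unified embedding = image embedding followed by behavior embedding."""
    return list(img_emb) + list(beh_emb)

def attention_fusion(img_emb: list[float], beh_emb: list[float],
                     query: list[float]) -> list[float]:
    """Softmax-weighted sum of the two views; each view is scored by its dot product with query."""
    if not (len(img_emb) == len(beh_emb) == len(query)):
        raise ValueError("all vectors must have the same dimension")
    scores = [sum(q * x for q, x in zip(query, view)) for view in (img_emb, beh_emb)]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    w_img, w_beh = (e / total for e in exps)
    return [w_img * a + w_beh * b for a, b in zip(img_emb, beh_emb)]

img_emb = [1.0, 3.0]
beh_emb = [3.0, 5.0]
print(concat_fusion(img_emb, beh_emb))                 # [1.0, 3.0, 3.0, 5.0]
print(attention_fusion(img_emb, beh_emb, [0.0, 0.0]))  # [2.0, 4.0] (equal weights)
```

Concatenation keeps every feature from both views but doubles the dimension; the weighted sum keeps the dimension fixed and lets the model emphasize one view over the other.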

CLASSIFICATION & OUTPUT
Multi-Layer Perceptron (Classification Head)
Outputs:
- ✅ Benign
- ☠️ Malware
- 🧬 Zero-Day
- 📊 Anomaly Score
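A forward pass through a small classification head might look like the sketch below. Everything specific here is an assumption: the one-hidden-layer shape, the hand-picked (untrained) weights, and in particular the anomaly score, which is defined as one minus the top class probability only for illustration, since the outline does not say how the score is computed.

```python
import math

LABELS = ["benign", "malware", "zero_day"]

def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(embedding, w1, b1, w2, b2):
    """Linear -> ReLU -> Linear -> softmax; returns (label, probabilities, anomaly score)."""
    hidden = relu(linear(embedding, w1, b1))
    probs = softmax(linear(hidden, w2, b2))
    best = max(range(len(LABELS)), key=probs.__getitem__)
    # Assumed definition: low confidence in the top class means a high anomaly score.
    return LABELS[best], probs, 1.0 - probs[best]

# Hand-picked, untrained weights for a 2-dimensional unified embedding.
w1, b1 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
w2, b2 = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]], [0.0, 0.0, 0.0]
label, probs, score = classify([0.0, 2.0], w1, b1, w2, b2)
print(label)  # malware
```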
