digraph binary visualization and identification in memgaze.pl

on blog at
memgaze.pl (167K)

Someone recently pointed out to me that much of what memgaze does has already been done, and better. True, and if I could've remembered Ghidra's name I think I would've tried to use it with the digraph plugin. But Ghidra is very complex and heavy to set up. What it, and its sources, are good for is good ideas! It's full of them. In particular, a form of binary visualization and identification using "digraphs", from Voyage of a Reverser_A Visual Study of Binary Species_ Sergey Bratus_ Greg Conti_ BlackHat USA_ 2010.pdf. To quote them,

A digraph is when bytes in the binary data are considered in sequential pairs. So that for the binary data of the ascii text "black hat" it would be,

bl	(98,108)
la	(108,97)
ac	(97,99)
ck	(99,107)
k_	(107,32)
_h	(32,104)
ha	(104,97)
at	(97,116)

Or the ascii string "battelle" would be,
ba		(0x62, 0x61)
 at		(0x61, 0x74)
  tt		(0x74, 0x74)
   te 		(0x74, 0x65)
    el		(0x65, 0x6c)
     ll		(0x6c, 0x6c)
      le	(0x6c, 0x65)
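
The two tables above can be reproduced with a sliding window of width two over the bytes. This is a minimal sketch in Python rather than memgaze's own Perl, and the function name `digraph_pairs` is made up for illustration; it is not taken from memgaze.pl.

```python
def digraph_pairs(data: bytes) -> list[tuple[int, int]]:
    """Return every overlapping pair of consecutive byte values.

    Hypothetical helper for illustration; not memgaze's actual code.
    """
    # zip(data, data[1:]) pairs byte i with byte i+1, so n bytes give n-1 pairs.
    return list(zip(data, data[1:]))


if __name__ == "__main__":
    # Print the "black hat" table: the two characters, then their byte values.
    for first, second in digraph_pairs(b"black hat"):
        print(f"{chr(first)}{chr(second)}\t({first},{second})")
```

Note that the pairs overlap: every byte except the first and last appears in two pairs, once as the second member and once as the first.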

I basically just vibe coded this into memgaze 'image' mode and the result was instantly useful. The pairs of numbers (from the sequential byte pairs) are considered coordinates of pixels in an image object 256*256 in size and plotted as green pixels on a black background. But I noticed that detail was missing when I used *lots* of data so I added a 'normalized' mode too. I think it's pretty cool and both normal and normalized mode show different aspects.
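
Here is a rough sketch of the plotting step, again in Python rather than Perl. Several things in it are my assumptions and not confirmed details of memgaze: the first byte of each pair is taken as x and the second as y, plain mode lights a pixel fully if the pair occurs at all, and 'normalized' mode scales brightness by the logarithm of how often the pair occurs. The names `digraph_counts`, `green_levels` and `write_ppm` are hypothetical.

```python
import math

SIZE = 256  # one row and one column per possible byte value


def digraph_counts(data: bytes) -> list[list[int]]:
    """Count how often each (first, second) byte pair occurs."""
    counts = [[0] * SIZE for _ in range(SIZE)]
    for x, y in zip(data, data[1:]):
        counts[y][x] += 1  # assumed orientation: row = second byte, column = first
    return counts


def green_levels(counts: list[list[int]], normalized: bool = False) -> list[list[int]]:
    """Turn pair counts into green intensities in the range 0..255."""
    if not normalized:
        # Plain mode (assumed): any pair that occurs at all is fully lit.
        return [[255 if c else 0 for c in row] for row in counts]
    peak = max(max(row) for row in counts)
    if peak == 0:
        return [[0] * SIZE for _ in range(SIZE)]
    # Normalized mode (assumed log scaling): frequent pairs are brighter,
    # but rare pairs stay visible instead of being crushed to black.
    scale = 255 / math.log1p(peak)
    return [[round(math.log1p(c) * scale) for c in row] for row in counts]


def write_ppm(path: str, levels: list[list[int]]) -> None:
    """Write the intensities as a binary PPM: green pixels on black."""
    with open(path, "wb") as f:
        f.write(b"P6\n%d %d\n255\n" % (SIZE, SIZE))
        for row in levels:
            f.write(bytes(v for g in row for v in (0, g, 0)))
```

Something like `write_ppm("out.ppm", green_levels(digraph_counts(data), normalized=True))` would then produce an image most viewers can open. With lots of input, the plain version tends toward a solid block of green because most pairs occur at least once, which is one plausible reason the detail goes missing.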

Mouse hover over the example digraph images below for the process name and which part of memory is represented. The 8-bit WAV is really distinctive. Images always have arcs or continuous curved lines. Text is always boxy arrays. It becomes quite easy to identify data types just by eye. It would probably be fairly feasible to train a small visual model to do this automagically with a synthetic dataset of labeled binary data types and normalized digraph images.
