Data Data Data

Data data data. Data data data data, data data. Data data data data data data data.

Developers developers de... oh whoops, wrong one. Data data data.

I wanted to talk a bit about data, its perceptions, and how it is used or misused. So, to start this conversation, I would like people to look at the first two lines of this post. To a computer, specifically an AI, this may appear as a sequence representing some choice in lexical ambiguity, or it may see it as simply some ASCII strings which we could map to known words, which we could in turn map to known and habitual usage to find the most likely meaning. Whichever way we think about this, a computer may see the first sentence by itself and assume one situation, then the second and assume another, or both together and assume a third. This is a fundamental issue with data, even to a computer: perception changes how we investigate, diagnose, or define it.

Now let's say that I wanted to run this data myself. How would I figure out the meaning? I find the first one, and it seems stupid, so I pass it over because it is illogical against known trends of thought. Then I see the second one and see that it could be referencing something, so I attempt to remember or look up what that reference could be, eventually finding the rant that was turned into a fancy musical meme. But I'm still left to deduce its relevance to a topic about data. So I look back real quick: I see it's using the term data in place of developers, and the original rant was about the importance of developers, so in a split second of deduction I find that this is going to be a topic about the importance of data.

"Is this how you see the world too my friend?"

Data can be represented in many ways, come from many sources, be stored in various ways, and be analyzed in various ways. On the infosec side of the world I too often see people unwilling, or unable, to take data and expand it into an understanding of the world around us. Maybe it's because infosec people stake claims as whitehats and defenders, further criminalizing those who aren't with them. But maybe there are other reasons too. I grew up in a weird timeframe: I was told that what you do online, who hurts you online, and how much information you put online is up to you. Now I'm told to tell my children to be worried about bullies saying mean words online. I was told the internet was the future and it was all about techies and businesses, while today I'm told to tell my children that the internet is a highly regulated, highly managed, multiple provider network where we should be scared to assert data.

"Ding ding ding, we have a wiener!"

Social constructs appear to have done huge damage to the ways we grew up. We were told information was free and should be free, because to criminalize data was against humanity. Now we have freedom fighters telling others to stop posting everything from political banter and hate speech all the way to personal feelings or technical manuals they didn't purchase. Freedom my ass. Freedom of information was such a big win: we told the government to politely tell us the things it did once they become irrelevant, but only what can't be redacted. But damn it, we slapped the title freedom on that bill, and it's sure to make everyone reference it as soon as you say information isn't free in America. In fact, information is criminal. A friend links a post showing data containing someone's social security number, then the cops raid your computer for any reason they choose to give, and bam, felony charge. You then have to defend yourself and hope for time served plus an ankle monitor for 9 months. Hope you can keep your job. Worse yet, say you make a habit of looking at pastebin pages where people got doxxed, and you save them because you want to help solve issues with doxxing. Oh snap, the cops saw it, they don't like you, and they decide to press charges of 20-to-life per social security number saved willingly. Oh, but you made a script to do it, so it was functionally just cache? Well, good luck defending that with the careless state of Americans as your jury.

 "To live, is to commit a crime."

So, we've seen social corruption and governmental corruption; let's take this back a step or two. Data can be any perceivable idea. I dream of demons ripping the flesh off everyone I know and dropping them from 200ft to let them splatter and struggle to breathe. This is data. Every, single word. It's what we do with data that counts, right? Well, sort of. But no. We need every bit of data, we need to be able to parse and analyze it, and we need to understand how this is done. While people sit here with their $20,000 platform that underperforms expectations, they think it takes a large development team to do this work and for it to be effective enough for analysis, and if they can't do it then we have no hope and blah blah blah blah blah. To all of this, I would like to mention the life lesson that rings true in many, many ways for me: "With all of our technology, we operate everything at a rudimentary level." I say this because I find it true in everything. We use 120+ year old capacitor concepts to power industrial machines and wartime weaponry. We use signals of true or false to identify traits, which are themselves other patterns of true or false, to run massive computing architectures. We use Linux cron jobs to power many "industry standard" tools that keep everyone safe. But really they just parse data like they're told, the way they're told. Without an understanding of how that data is used, we have no idea what it's reporting. But we can totally read the manual! That'll tell us! RIGHT!? Fuckers. NO! We are at a stage where our "professionals" either hacked their way in or went to school and learned very little. Sometimes we do find some who went to school and hacked their way in. But the essential problem remains that data is being parsed under our noses. We don't even spend the time to look anymore.

Storage gets bigger, data gets smaller, learning becomes less.

Data, in the eyes of humans, can be many, many things and used in many, many ways. But we have to revert back to arbitrary notions before we understand what it really is. Someone says they're going to the store, but you realize they can't be going to the store, because the store is out of their way and their habits define a pattern directly against going to that particular store. So, to enumerate better possibilities and to enrich the data that you already have (they don't seem likely to go there), you go to the store and find their car is not there. You ask the cashier if they've seen them, and the cashier says no. You message them, and the response is that they are still at the store and will be back shortly. You go back and find them there before you. This little bit of data lets you estimate a range of drive time and distance, assuming regulations such as speed laws are followed. You proceed to call them out for it, and they show you a receipt from the ATM at that particular location. However, since that ATM is only accessible if the cashier sees you, there is a functional flaw here. Further, the ATM receipt was dated 2 hours before. Their excuse is that this was due to DST. By this point you don't believe it. You know they're lying, but how do you prove it without just telling them to gtfo? Well, for one, you should tell them to gtfo. But also, the amount of enrichment you do on your daily life's data can aid in identifying problems like this. You simply say, "I went there, I proved you weren't there, you made it here before me, from within a range of (blah), which correlates to x number of friends you have." Cheating people hate being told who they're cheating with and how. It's almost funny. You can watch them struggle to find a new excuse or to change the lies they already told. It's great, actually. But back to the data viewpoint: this is all very minor data points enriched to solve a problem that many people have.

How can we do the same with our everyday data as analysts, for any form of infosec study though? Can we turn enrichment into an actually useful tool? Well, several tools are made to enrich data, mostly doing the same basic functions: resolving domains, caching domain resolutions, storing large lists of data believed to be linked one way or another, etc. But none of these things need some multi-million dollar tool. Any hacker with any system can pull this off.
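To show how little it takes, here is a minimal sketch in Python of two of those basic functions: resolving domains with a cache, and grouping domains that land on the same address. The names `normalize`, `CachingResolver`, and `group_by_address` are made up for this post, not taken from any real tool, and the lookup function is swappable so you can run it without touching the network.

```python
import socket
from typing import Callable, Dict, Iterable, List, Optional, Set


def normalize(domain: str) -> str:
    """Lowercase a domain and drop surrounding whitespace and a trailing dot."""
    return domain.strip().lower().rstrip(".")


class CachingResolver:
    """Resolve domains to IPv4 addresses and remember every answer.

    `lookup` defaults to the standard library's socket.gethostbyname;
    pass any function with the same shape to use another data source.
    """

    def __init__(self, lookup: Callable[[str], str] = socket.gethostbyname) -> None:
        self._lookup = lookup
        self.cache: Dict[str, Optional[str]] = {}

    def resolve(self, domain: str) -> Optional[str]:
        key = normalize(domain)
        if key not in self.cache:
            try:
                self.cache[key] = self._lookup(key)
            except OSError:
                # Failed lookups are cached too, so we do not keep retrying them.
                self.cache[key] = None
        return self.cache[key]


def group_by_address(
    domains: Iterable[str], resolver: CachingResolver
) -> Dict[str, List[str]]:
    """Map each resolved address to the sorted list of domains pointing at it."""
    linked: Dict[str, Set[str]] = {}
    for domain in domains:
        address = resolver.resolve(domain)
        if address is not None:
            linked.setdefault(address, set()).add(normalize(domain))
    return {address: sorted(names) for address, names in linked.items()}
```

Calling `group_by_address(my_domains, CachingResolver())` does live DNS lookups through your system resolver. Two domains sharing an address is only a lead, not proof of a link (shared hosting exists), which is exactly why you need to understand the data and not just the tool.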

"Review of time and place"

We need to teach people how to use data. Data is the key, not the toolset. Understanding how an MBR can be changed by changing the 16-bit asm, versus understanding only that a tool shows deviance between known good versions, makes a world of difference when trying to identify bad actors, habits, or other activities.
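As a rough illustration of that gap, here is a small Python sketch that does the "tool" half: it compares a suspect 512-byte MBR against a known good copy and says which region each changed byte falls in. It assumes the classic MBR layout (446 bytes of bootstrap code, a 64-byte partition table, and the 2-byte boot signature); `diff_mbr` and `region_of` are names I made up for this post. Knowing what the changed bytes actually do still means reading the 16-bit asm yourself.

```python
from typing import List, Tuple

SECTOR_SIZE = 512

# Classic MBR layout as (name, start offset, end offset exclusive).
# Assumption: no disk signature or other vendor-specific fields are broken out.
REGIONS = [
    ("bootstrap code", 0, 446),
    ("partition table", 446, 510),
    ("boot signature", 510, 512),
]


def region_of(offset: int) -> str:
    """Name the MBR region that a byte offset falls in."""
    for name, start, end in REGIONS:
        if start <= offset < end:
            return name
    raise ValueError(f"offset {offset} is outside the {SECTOR_SIZE}-byte sector")


def diff_mbr(known_good: bytes, suspect: bytes) -> List[Tuple[int, int, int, str]]:
    """Return (offset, good byte, suspect byte, region) for every byte that differs."""
    if len(known_good) != SECTOR_SIZE or len(suspect) != SECTOR_SIZE:
        raise ValueError(f"both images must be exactly {SECTOR_SIZE} bytes")
    return [
        (offset, good, bad, region_of(offset))
        for offset, (good, bad) in enumerate(zip(known_good, suspect))
        if good != bad
    ]
```

A change in the bootstrap code region tells you where to start disassembling; a change in the partition table is a different story entirely. The diff only points, the understanding is still on you.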

