Monday, April 18, 2016

Security or Something Like It

An interesting conversation happened around me today about all the high-profile data breaches that have occurred and whether they could have been prevented. There was, of course, the one lunatic who was valiantly arguing that they could have been but weren't because of some grand conspiracy.

The unfortunate thing was that at the core, he was correct. Security is possible, and not nearly as hard as we think. Just harder than we would like.

As I organized my internal retort, I realized that cause and cure, like many things, are very much related. So I wanted to write down my thoughts while they were fresh, for your amusement.

The crux of making security accessible and easy is reducing the factors and the bits involved. The crux of making security robust and reliable is increasing the factors and using addition by division to increase the number of bits involved.

Let me explain through gratuitous use of some over-simplification.

When logging in with a name and password, I have only the two bits and the one step. If that login grants me access to a system that lets me see customers, their details, and their transactions, that is easy for me to use.

If you want to hack me, just get my two bits and go to town on everything. This is vulnerable.

Today, the big thing is to add an additional factor for two-factor authentication. This is just part of the way to three-factor authentication, which is about the most you can ask for with today's accessible technologies. A simple way to explain three-factor authentication is that it requires something you are, something you have, and something you know. In modern two-factor, we use a phone or fob as the "something you have" and the password as the "something you know". This is because we haven't really gotten wide adoption of "something you are", like fingerprints, retinal scans, or other biometrics. Some of you may have experienced this at passport or other controls where they've taken your fingerprints or a retinal scan, but this isn't exactly day-to-day for most people.
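To make the two factors of modern two-factor concrete, here is a minimal Python sketch, assuming a salted PBKDF2 password hash as the "something you know" and an RFC 6238 time-based one-time code (the kind a phone app or fob generates) as the "something you have". The names `hash_password`, `totp`, and `login` are hypothetical, and the sketch leaves out the clock-drift windows, rate limiting, and secure secret storage a real system would need.

```python
import hashlib
import hmac
import os
import struct
import time
from typing import Optional


def hash_password(password: str, salt: bytes, iterations: int = 200_000) -> bytes:
    """Something you know: keep only a salted PBKDF2-SHA256 hash of the password."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


def totp(secret: bytes, at: float, step: int = 30, digits: int = 6) -> str:
    """Something you have: an RFC 6238 time-based one-time code (HMAC-SHA1)."""
    counter = int(at // step)
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation from RFC 4226
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def login(password: str, code: str, stored_hash: bytes, salt: bytes,
          device_secret: bytes, now: Optional[float] = None) -> bool:
    """Grant access only if both factors check out; either one alone fails."""
    now = time.time() if now is None else now
    knows = hmac.compare_digest(hash_password(password, salt), stored_hash)
    has = hmac.compare_digest(totp(device_secret, now).encode(), code.encode())
    return knows and has


if __name__ == "__main__":
    salt = os.urandom(16)
    stored = hash_password("correct horse", salt)
    secret = b"12345678901234567890"  # RFC 6238 test secret, never a real key
    print(login("correct horse", totp(secret, 59), stored, salt, secret, now=59))  # True
    print(login("correct horse", "000000", stored, salt, secret, now=59))  # False
```

The point is the shape of the check: two independent secrets, each verified on its own, so stealing the password alone is no longer enough.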

When there is only one step, authentication is strengthened by adding these factors, and those factors make the system harder to use. If you want to address the other aspects of security, you also need to add more bits.

Going back to our example, suppose I had split the data so that customer data was saved in one system and transaction data was saved in another. I've divided the data, which adds more bits. Or I could separate the work into two steps (request work, commit work) done by two different actors. This is on the way to the Four-eyes principle, which requires two people to collaborate to complete an activity. Notice that both are addition by division: one divides the data, the other divides the people.
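The request/commit split can be sketched in a few lines of Python. This is a hypothetical illustration, not a production workflow engine: `WorkItem`, `commit`, and `FourEyesError` are made-up names, and a real system would also authenticate each actor and keep an audit trail.

```python
from dataclasses import dataclass
from typing import Optional


class FourEyesError(Exception):
    """Raised when a commit would violate the two-actor rule."""


@dataclass
class WorkItem:
    """A unit of work that one actor requests and a different actor commits."""
    description: str
    requested_by: str
    committed_by: Optional[str] = None


def commit(item: WorkItem, actor: str) -> WorkItem:
    """Commit step: refuse if already committed or if the requester tries to commit."""
    if item.committed_by is not None:
        raise FourEyesError("work item already committed")
    if actor == item.requested_by:
        raise FourEyesError("the requester cannot commit their own work")
    item.committed_by = actor
    return item


if __name__ == "__main__":
    refund = WorkItem("refund $500 to customer 42", requested_by="alice")
    try:
        commit(refund, "alice")  # same actor: blocked
    except FourEyesError as err:
        print("blocked:", err)
    commit(refund, "bob")  # second pair of eyes: allowed
    print(refund.committed_by)  # bob
```

Stealing one person's credentials now gets an attacker halfway through an activity rather than all the way.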

You could extend this again to real-time audit and detection systems that use the same technique to determine whether the work being performed matches some normal profile. This works by dividing control into normal processing and oversight. It adds bits, but it catches discrepancies between systems.
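As a toy illustration of oversight running alongside normal processing, the sketch below builds a "normal profile" from historical transaction amounts and flags new amounts more than three standard deviations from the mean. The single-number profile, the 3.0 threshold, and the function names are assumptions chosen for brevity; real detection systems profile many more dimensions (who, when, where, how often).

```python
import statistics
from typing import Iterable, List, Tuple

Profile = Tuple[float, float]  # (mean, standard deviation)


def build_profile(history: Iterable[float]) -> Profile:
    """Summarize 'normal' activity from past transaction amounts (needs 2+ values)."""
    values = list(history)
    return statistics.mean(values), statistics.stdev(values)


def flag_anomalies(amounts: Iterable[float], profile: Profile,
                   threshold: float = 3.0) -> List[float]:
    """Oversight step: return amounts more than `threshold` standard deviations from the mean."""
    mean, stdev = profile
    return [a for a in amounts if abs(a - mean) > threshold * stdev]


if __name__ == "__main__":
    history = [100, 120, 95, 110, 105, 98, 102, 115]  # hypothetical past refunds
    profile = build_profile(history)
    print(flag_anomalies([104, 2500, 99], profile))  # [2500]
```

The profile is built and checked by a process separate from the one doing the work, which is the division of control described above.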

In the end, it's easy to throw more encryption at the problem and say that it will fix any security vulnerabilities that might exist. In reality, if the number of bits involved doesn't change, you're just pushing the vegetables around the plate. You can add factors to reduce entry vulnerabilities, or use addition by division to increase the number of bits and change your overall vulnerability profile.

You can have it easy, secure, or fast. Pick two.

Thursday, October 08, 2015

On Being A Consultant

I've been an engineer for multiple decades. I've been a consultant for more than a decade. I've only been a great consultant for a few years.

It took me a few years to become a master engineer. It took me decades to become a master architect. It has taken me longer still to become a master consultant.

Growing up through the ranks of Microsoft and shipping products such as Visual Studio can teach one the value of delivering technical excellence, but not necessarily the soft skills and relationship management that make the process smooth and the experiences positive. In truth, surrounded by anti-social geniuses, it was as easy to learn bad business habits as it was to learn set theory and how to write elegant code in one pass.

Becoming capable as a consultant required unlearning many aspects of the engineering value system. Similar to the inspired theory made famous in "A Beautiful Mind", it requires solving for more than one definition of the right answer. That inherently invalidates the idea of getting something right the first time without any interaction.

As I learned to find a balance, I realized that I enjoyed the process of finding it. Bringing together strictly technical elegance and the wider spectrum of elegance in business was itself a unique type of challenge, and one I found rewarding. That is why I have not found myself particularly at home in large corporations doing maintenance on legacy systems year in and year out, or even within the same enterprise solving and re-solving the same problems over and over for incremental refinement. Instead, I am most at home when I can bring together seemingly unrelated solutions to craft evolutionary advancements where the impact can be significant. For example, I might apply the same sophisticated algorithms that solve just-in-time manufacturing problems to slotting seats and routes for an airline. These problems are many, but they are intrinsically non-repeating, so making a consistent income from this type of interesting work requires exposure to many industries and many clients. If one wants to keep doing innovative work for innovative companies, one must always be searching for the next bit of innovative work. And therein lies the rub: while I am working, I prefer to be focused on my work, not on finding the next bit of work that will bring me income once the current one is completed.

That leads me to online marketplaces and project boards as a means to expand the pipeline of available opportunities. Recently I decided to start working with some "new" agency types as a way to extend my network. While my social and client network seems to have no end of need for talents such as mine, it does have limits, and the idea of drawing opportunities from a much larger pool is enticing. Previously I have been reluctant to post my profile in more publicly trafficked spaces because of the lack of oversight and proper vetting. My time is valuable and my talents unique, so it makes sense that I would employ a buffer between my time and the deluge of clients who don't appreciate that I am not the best choice to build the [landing page|brochure|insert mundane UI here] for their [flower shop|dry cleaners|insert small business here].

After a few experiences, I have concluded that these "new" types of agencies are really the same old staffing agencies and development shops in disguise. The worst of these has been TopTal, which is really just a bunch of elitist shysters who have found an interesting marketing approach to increase their margins while providing no additional value.

Friday, May 08, 2015

The Iron Triangle

There is a concept in technology known by some as the Iron Triangle. I tend to approach it through the simple adage:
You can have it fast, cheap, or right. Pick two.
What this tries to make clear is that building anything requires a trade-off between how quickly you want it done, how much you are willing to pay, and how important it is to meet a need completely. The first tends to be obvious, because everyone wants everything as soon as they can get it, preferably now. The second is also straightforward because everyone tends to think that everything should be low cost, preferably free. The last one is where things start to get complicated.

Do you value performance? Accuracy? Precision? Reliability? Maintainability? Scalability? Everyone has their own ideas about what "right" can mean, and sometimes they don't even bother to work out what they think "right" means before they start making assumptions. In the same way that we assume everything is free and will be done right away, we blindly assume everything we produce will scale easily, work repeatedly, and be easy to maintain. We might be asking for a very intricate algorithm, but we don't even bother writing down the nuances of all the conditions and error cases.

How can anyone accurately estimate the effort to build something without having a solid definition of what "done" means? If you can't take time to write down your expectations, it seems unreasonable to get upset when these hidden expectations aren't met. Simply put:
A lack of planning on your part does not constitute an emergency on my part.
It's good in theory. But in practice, people seem entirely comfortable being completely unreasonable on a regular basis. Which I suppose keeps consultants like me in business. So no complaints here, just observations.

Tuesday, November 11, 2014

Spotify = Evil and Un-American

And wrote this:

So Spotify licenses 20M songs. Assume half are never played. Splitting the $2B they've paid out among the remaining 10M songs means they're paying an average of $200 per song. Total. To date!

So assume a top seller is earning 100 times that, pocketing $20k for their song, while most are earning in the $2 range. Obviously the artists are upset, and they should be.
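Here is the back-of-the-envelope arithmetic written out so the assumptions are explicit: 20M licensed songs, half never played, $2B paid out to date, and a top seller at 100 times the average. These are the post's rough figures, not audited numbers, and the function name is hypothetical.

```python
def average_payout_per_song(total_paid: float, licensed_songs: int,
                            played_share: float) -> float:
    """Average lifetime payout per song, counting only songs that are ever played."""
    played_songs = licensed_songs * played_share
    return total_paid / played_songs


if __name__ == "__main__":
    average = average_payout_per_song(2_000_000_000, 20_000_000, 0.5)
    top_seller = 100 * average  # the post's "100 times" top-seller assumption
    print(f"average per song: ${average:,.0f}")     # average per song: $200
    print(f"top seller:       ${top_seller:,.0f}")  # top seller:       $20,000
```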

Spotify is the Walmart of streaming music; creating an unsustainable consumption model of greed that is a blight on the earth and the enemy of anything that is decent and just.

Monday, November 25, 2013

Reference vs Master

Had an interview this morning with some industry analysts who were researching Enterprise Architecture subjects. As usual, the subject of Master Data Management came up. As sometimes happens, I verbalized something quite important that I previously hadn't had a chance to write down.

In the conversation, we went through the last eleven (11) master data management initiatives I've been involved with and looked for the common thread. In every engagement where I was called in because they were failing, and in every engagement where we collectively failed, it was because Master Data was attempted before Reference Data. In every case where the engagement went smoothly or where we were able to get things back on track, it was because we prioritized a single case of Reference Data and then iterated.

This seems glaringly obvious in retrospect. Prior to this impromptu postmortem, I hadn't realized our collective experiences painted such a clear picture of how not to screw something up.

If you are thinking about Master Data Management, you first need to build good Reference Data. The underlying reason is that most people, even architects, don't fully understand and recognize the difference. Since we don't always separate these types of data, we don't always prioritize properly; we end up focusing on a moving, complicated target, and that increases the likelihood of challenges leading to failure.

Reference data is non-volatile, exists independent of business process and interactions, and is globally identifiable. Master data is slow-changing and can exist independent of a business process. A geographic location (lat/long) is reference data, whereas a physical address is master data. An address, like many kinds of master data, is built from reference data. For example, an address may include a geographic location. The name of the building might come from master data but the name of the region or country would come from reference data. Why is a country name reference data but the name of the building is not? Because of the global identification. If the globally identified data comes from an outside source, managed externally, then it is reference data. Understanding which components of your data constructs are reference instead of master is the first, most crucial step.
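One way to make the distinction visible in a data model is to keep reference data in small, immutable types and let master data point at them. This Python sketch is illustrative only: the class and field names are hypothetical, and ISO 3166 country codes stand in as an example of globally identified data managed by an outside source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Country:
    """Reference data: globally identified and managed externally (e.g. ISO 3166)."""
    iso_code: str
    name: str


@dataclass(frozen=True)
class GeoLocation:
    """Reference data: a lat/long pair identifies the same place for everyone."""
    latitude: float
    longitude: float


@dataclass
class Address:
    """Master data: owned by the organization and slow-changing, built from reference data."""
    address_id: str                 # internal key, only meaningful inside this organization
    building_name: Optional[str]    # master data: the organization's own label
    street: str
    country: Country                # reference data
    location: Optional[GeoLocation] = None  # reference data


if __name__ == "__main__":
    us = Country("US", "United States")
    office = Address("ADDR-0001", "Building 7", "1 Example Way", us,
                     GeoLocation(47.64, -122.13))  # hypothetical values
    office.building_name = "Building 7 (renamed)"  # master data is allowed to change
    print(office.country.iso_code)  # US
```

Freezing the reference types is a design choice that mirrors "non-volatile": a `Country` cannot be edited in place, while an `Address` can.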

Why did the prioritization for reference data become a leading indicator for success? Because reference data is non-volatile and can withstand the winds of change within an organization. It gives an immovable target with quantified, known complexity that can be addressed. And like other forms of data it needs to be published, consumed, syndicated, replicated, secured, and so forth. Rather than figure out how these functional and technical capabilities must be delivered and iterated within your organization using volatile data of potentially unknown complexity, you can first provide these functions using data constructs of known complexity and fixed definition. Only once you have a tried and tested cadence for the cyclic functions, and proven templates for the iterative functions, can you then embrace more complex and volatile master data.

A real-world anecdote comes from a conversation I had a few short months ago with a colleague struggling to get traction on a Master Data Management engagement. This organization had multiple sources of customer data and, like many organizations, had determined they needed a master customer record. But how do you wrangle twenty-seven CRM systems (no joke, they have 27 customer relationship management systems!) and their dependent systems into all using a single master? To start with, they all had different formats, fields, and definitions. They had many different ways to identify a master record, and many allowed multiple records for a single customer for various purposes. When you expanded to include the downstream systems that relied on those CRM databases, the number exploded to over a hundred. No one wanted to take on the challenge, and I don't blame them.

We made a plan, which they followed, and they are now on their sixth successful iteration.

Rather than trying to consolidate entire records straight away, we started with just the customer name. They created an extremely simple reference data set of all customer names by pulling extracts from all systems. Every entry was given a unique key that matched the key from the system it came from. This list (10+ million records) was run through de-duplication and data quality software. What resulted was a single list of unique names with unique identifiers. Each unique identifier had alternate keys attached for as many of the source systems as had references for that customer. Then we published it at a fixed location and made it available in several ways (SQL, XML, CSV, etc.).
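The shape of that first iteration can be sketched in Python: pool the extracts, group records that look like the same customer, assign each group a master identifier, keep every source system's key as an alternate key, and publish the result as CSV. The `normalize` function here is a crude, hypothetical stand-in for the real de-duplication and data quality software, which would do fuzzy matching; the `CUST-` identifier format and the `system:key` alternate-key encoding are also assumptions.

```python
import csv
import io
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Extract = Tuple[str, str, str]  # (source_system, source_key, customer_name)


def normalize(name: str) -> str:
    """Crude matching key: lowercase, drop punctuation, collapse whitespace.
    A stand-in for real de-duplication / data quality software."""
    cleaned = re.sub(r"[^\w\s]", "", name.casefold())
    return " ".join(cleaned.split())


def build_name_master(extracts: Iterable[Extract]) -> List[dict]:
    """Group extracts by matching key; one master ID per group, all source keys kept."""
    groups: Dict[str, List[Extract]] = defaultdict(list)
    for system, key, name in extracts:
        groups[normalize(name)].append((system, key, name))
    master = []
    for i, (_, members) in enumerate(sorted(groups.items()), start=1):
        master.append({
            "master_id": f"CUST-{i:08d}",  # hypothetical ID format
            "name": members[0][2],         # first spelling seen wins
            "alternate_keys": sorted(f"{s}:{k}" for s, k, _ in members),
        })
    return master


def to_csv(master: List[dict]) -> str:
    """Publish the master list as CSV (one of several formats it could be served in)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["master_id", "name", "alternate_keys"])
    for row in master:
        writer.writerow([row["master_id"], row["name"], ";".join(row["alternate_keys"])])
    return buf.getvalue()


if __name__ == "__main__":
    extracts = [
        ("crm_a", "A-1", "Acme Corp."),
        ("crm_b", "77", "ACME  corp"),
        ("crm_a", "A-2", "Globex"),
    ]
    print(to_csv(build_name_master(extracts)))
```

Keeping every source key as an alternate key, rather than replacing it, is what lets each system adopt the list at its own pace.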

Each system then undertook, at its own pace, an exercise of reconciling its data with the master list. Some used replication and pulled the master list nightly, simply making the single list their new source. Others used a combination of queries and manual entry to reconcile. By calendar week 8, 30% of the systems were using the customer name master list as a fully integrated data source.
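On the consuming side, a minimal reconciliation step might look like the following. It assumes a hypothetical published CSV with `master_id`, `name`, and `alternate_keys` columns, where alternate keys are semicolon-separated `system:key` pairs; the function names are also hypothetical. Each system builds an index from its own keys to master IDs, adopts the matches, and routes the rest to manual review, mirroring the mix of automated and manual reconciliation described above.

```python
import csv
import io
from typing import Dict, List, Tuple


def load_key_index(published_csv: str, system: str) -> Dict[str, str]:
    """Map this system's local keys to master IDs using the published master list."""
    prefix = system + ":"
    index: Dict[str, str] = {}
    for row in csv.DictReader(io.StringIO(published_csv)):
        for alt in row["alternate_keys"].split(";"):
            if alt.startswith(prefix):
                index[alt[len(prefix):]] = row["master_id"]
    return index


def reconcile(local_keys: List[str], index: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Split local records into automatic matches and ones needing manual review."""
    matched = {k: index[k] for k in local_keys if k in index}
    needs_review = [k for k in local_keys if k not in index]
    return matched, needs_review


if __name__ == "__main__":
    published = (
        "master_id,name,alternate_keys\n"
        "CUST-00000001,Acme Corp.,crm_a:A-1;crm_b:77\n"
    )
    index = load_key_index(published, "crm_b")
    print(reconcile(["77", "99"], index))  # ({'77': 'CUST-00000001'}, ['99'])
```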

It took only two weeks for the data mastering crew to clean and prep the customer name master, which they handed off to the support team who helped all those systems consume that data. By calendar week 6, the data mastering crew had handed over an address master to the support team. By calendar week 12, 30% of the systems were using the address master lists as fully integrated data sources.

By calendar week 8, the data mastering crew had handed over employee and organization master lists to the support team. By calendar week 12, they had handed off office, contact information, and account.

The organization is 5 months in, and they have 90% integration across all systems for customer, address, employee, and organization. Along the way they have established and iterated patterns and processes for replicating data, auditing compliance, publishing secure feeds, and publishing secure subsets of data by organization. They even decommissioned two smaller systems just by cleaning and securing access to a single list of lists. There are 11 more planned for decommissioning over the next 6 months.

They got their footing by focusing on reference data. Every time they take on a new set, they start with the reference data. Separating it out lets them decouple the application from the data just enough to make progress. Once they have the pattern in place, they iterate to add complexity and features and expand the data set. It's methodical, direct, and mostly non-threatening.

At the time, I didn't have a codified reason for why starting with reference data was so important. Now I've written down why it was good advice.