Monday, November 25, 2013

Reference vs Master

Had an interview this morning with some industry analysts who were researching Enterprise Architecture subjects. Per usual, the subject of Master Data Management came up. As sometimes happens, I verbalized something quite important that I previously hadn't had a chance to write down.

In the conversation, we went through the last eleven (11) master data management initiatives I've been involved with and looked for the common thread. In every engagement where I was called in because they were failing and in every engagement where we collectively failed, it was because Master Data was attempted before Reference Data. In every case where the engagement went smoothly or where we were able to get things back on track, it was because we prioritized a single case of Reference Data and then iterated.

This seems glaringly obvious in retrospect. Prior to this impromptu postmortem, I hadn't realized our collective experiences painted such a clear picture of how not to screw something up.

If you are thinking about Master Data Management, you first need to build good Reference data. The underlying reason is that most people, even architects, don't fully understand and recognize the difference. Since we don't always separate these types of data, we don't always prioritize properly. We end up focusing on a moving, complicated target, which increases the likelihood of challenges leading to failure.

Reference data is non-volatile, exists independent of business process and interactions, and is globally identifiable. Master data is slow-changing and can exist independent of a business process. A geographic location (lat/long) is reference data, whereas a physical address is master data. An address, like many kinds of master data, is built from reference data. For example, an address may include a geographic location. The name of the building might come from master data but the name of the region or country would come from reference data. Why is a country name reference data but the name of the building is not? Because of the global identification. If the globally identified data comes from an outside source, managed externally, then it is reference data. Understanding which components of your data constructs are reference instead of master is the first, most crucial step.
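To make the distinction concrete, here is a minimal sketch of an address built from reference data. The class names, the field names, the building names, and the coordinates are all made up for illustration; only the split between the two kinds of data comes from the description above. Reference records are frozen because they are non-volatile, while the address stays editable because master data changes slowly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Country:
    """Reference data: managed externally and globally identified."""
    iso_code: str  # assumed identifier, e.g. an ISO 3166-1 alpha-2 code
    name: str


@dataclass(frozen=True)
class GeoPoint:
    """Reference data: a latitude/longitude pair."""
    lat: float
    lon: float


@dataclass
class Address:
    """Master data: slow-changing and assembled from reference data."""
    building_name: str   # master data, named and managed locally
    country: Country     # reference data
    location: GeoPoint   # reference data


UNITED_STATES = Country("US", "United States")

# Hypothetical record; the building name and coordinates are illustrative.
hq = Address("Building 7", UNITED_STATES, GeoPoint(47.6, -122.3))

# Master data may change slowly over time; the reference parts do not.
hq.building_name = "North Campus"
```

Because the reference classes are frozen, any attempt to reassign `UNITED_STATES.name` raises an error, which mirrors the idea that the organization consumes that data rather than owning it.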

Why did the prioritization for reference data become a leading indicator for success? Because reference data is non-volatile and can withstand the winds of change within an organization. It gives an immovable target with quantified, known complexity that can be addressed. And like other forms of data it needs to be published, consumed, syndicated, replicated, secured, and so forth. Rather than figure out how these functional and technical capabilities must be delivered and iterated within your organization using volatile data of potentially unknown complexity, you can first provide these functions using data constructs of known complexity and fixed definition. Only once you have a tried and tested cadence for the cyclic functions, and proven templates for the iterative functions, can you then embrace more complex and volatile master data.

A real world anecdote is from a conversation I had a few short months ago with a colleague struggling to get traction on a Master Data Management engagement. This organization had multiple sources of customer data and like many organizations had determined they needed a master customer record. But how do you wrangle twenty-seven systems (no joke, they have 27 customer relationship management systems!) and their dependent systems into all using a single master? To start with, they all had different formats, fields, and definitions. They had many different ways to identify a master record and many allowed multiple records for a single customer for various purposes. When you expanded to include the down-stream systems that relied on those CRM databases the number exploded to over a hundred. No one wanted to take on the challenge and I don't blame them.

We made a plan, which they followed, and they are now on their sixth successful iteration.

Rather than try consolidating the entire records straight away, we just started with customer name. They created an extremely simple reference data-set of all customer names by pulling extracts from all systems. Every entry was given a unique key that matched the key from the system it came from. This list (10+ million records) was run through de-duplication and data quality software. What resulted was a single list of unique names with unique identifiers. Each unique identifier had alternate keys attached for as many of the source systems as had references for that customer. Then we published it at a fixed location and made it available in several ways (SQL, XML, CSV, etc).
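The shape of that name list can be sketched in a few lines. This is only an illustration under loud assumptions: the real work was done by de-duplication and data quality software, which I stand in for here with a naive `normalize` function (lowercase, collapse whitespace). The system names, keys, and field names are hypothetical.

```python
from collections import defaultdict


def normalize(name: str) -> str:
    """Naive stand-in for real de-duplication and data quality software."""
    return " ".join(name.lower().split())


def build_name_master(extracts):
    """Build a list of unique names, each with alternate keys per source.

    extracts: iterable of (source_system, source_key, customer_name).
    """
    groups = defaultdict(list)
    for system, key, name in extracts:
        groups[normalize(name)].append((system, key, name))

    master = []
    for master_id, (_, rows) in enumerate(sorted(groups.items()), start=1):
        alternate_keys = defaultdict(list)
        for system, key, _ in rows:
            # A system may hold several records for the same customer.
            alternate_keys[system].append(key)
        master.append({
            "master_id": master_id,        # the new unique identifier
            "name": rows[0][2],            # first spelling seen is kept
            "alternate_keys": dict(alternate_keys),
        })
    return master


# Hypothetical extracts from three source systems.
extracts = [
    ("crm_a", "A-17", "Acme Corp"),
    ("crm_b", "9931", "acme  corp"),
    ("crm_b", "9950", "ACME CORP"),
    ("crm_c", "C42", "Globex"),
]
name_master = build_name_master(extracts)
```

Here `name_master` holds two entries: one for Acme with keys from `crm_a` and both `crm_b` records, and one for Globex with its single `crm_c` key. Each source system can find its own records through the alternate keys without changing anything else about its data.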

Each system then undertook an exercise at their own pace of reconciling their data with the master list. Some used replication and pulled the master list nightly just making the single list their new source. Others used a combination of queries and manual entry to reconcile. By calendar week 8, 30% of the systems were using the customer name master list as a fully integrated data source.

It only took two weeks for the data mastering crew to clean and prep the customer name master, which they handed off to the support team who helped all those systems with consuming that data. By calendar week 6, the data mastering crew handed over an address master to the support team. By calendar week 12, 30% of the systems were using the address master lists as fully integrated data sources.

By calendar week 8, the data mastering crew handed over employee and organization master lists to the support team. By calendar week 12, they had handed off office, contact information, and account.

The organization is 5 months in and they have 90% integration across all systems for customer, address, employee and organization. Along the way they have established and iterated patterns and processes for replicating data, auditing compliance, publishing secure feeds, and publishing secure subsets of data by organization. They even decommissioned two smaller systems just by cleaning and securing access to a single list. There are 11 more planned for decommissioning over the next 6 months.

They got their footing by focusing on reference data. Every time they take on a new set, they first start with the reference data. Allowing them to separate this means they can separate the application from the data just enough to make progress. Once they get the pattern in place, they iterate to add complexity and features and expand the data set. It's methodical and direct and mostly non-threatening.

At the time, I didn't have a codified reason for why starting with reference data was so important. Now I've written down why it was good advice.

Monday, August 26, 2013

Quantified ALM

In most good consulting approaches, they start by pointing out that the way to effectively change something is to first measure it.  If you can't measure it, how will you know it changed? By how much? For what reason?

Being able to measure is a cornerstone of any good management approach. It's the cornerstone of the quantified self movement which is built around the premise that once you measure something, being able to change or maintain becomes much easier.

With software and technology, this is no different. When it comes to the application lifecycle the ability to measure is just as necessary. Unfortunately, ALM tends to evolve at a fairly slow pace. While there are pockets of innovation, true change only happens inside big, slow, enterprises that are typically not being managed by the most *cough* technically astute. As such, they might ask for measurements but this is largely for show. If you don't understand what you are measuring, how relevant can the measure be?

The gents over at McKinsey took a swipe at this in a recent article:

While I think just substituting Use Case Points as the answer to how to measure is an easy thing to propose, it is certainly not the only or most obvious choice for addressing this significant need. The article does, however, break down the aspects of the problem and exposes some of the challenges that have created the situation in the first place. It is a good contextual read and likely to provide at least one uncommon perspective on something we take for granted but shouldn't. Definitely give it a read.

Tuesday, April 23, 2013

Component Interaction Diagrams

If you are going to work with other people to build something, you need a way to communicate clearly about what you are building.  From ages past, the clearest way to do that was by drawing pictures.  With software, we do the same thing.  Most tools and methodologies have different techniques, diagrams, and types of illustrations that are central to their documentation approach.  I'm familiar with most of them.  Over the years, I've refined the best parts of each to construct a set of diagrams that follow the Key Principles and allow me to Move Quickly In the Dark.

The first is called a Component Interaction Diagram, and the second is a Process Flow Diagram.  In reality, all of the documentation formats across the various methodologies and approaches have their strengths and weaknesses and are proposed by smart people for varying reasons.  With a Component Interaction Diagram (CID) and corresponding Process Flow Diagram (PFD) we attempt to focus the diagrams so that they provide the maximum value with the least effort and stay relevant for the longest amount of time to the widest possible audience.  Rather than try and justify the format up front, I'm going to explain how you create them.  As we discuss each aspect of the diagrams and the guidelines for them, the reasons for each will become apparent.  If they don't, perhaps you'll find clear reasons why you prefer whatever documentation you have chosen.

Component Interaction Diagram
A Component Interaction Diagram (CID) has a singular purpose.  It depicts the components utilized in a particular solution scope and the points of interaction between them.  How it does that, the information that can be layered on top of it or derived from it, and the variety of ways that it can be utilized are all secondary considerations.  The way in which we meet the primary purpose is what will allow us to use it to maximum advantage for the widest audiences.  So as we go through the guidelines and process of creation, consider all the downstream impacts and you'll understand why it requires such rigid precision.  You'll also uncover areas where you can forgo precision or formality, and the consequences of choosing to deviate.  In many cases, you may be perfectly happy with only reaping some of the benefits, a decision which would merit following an abbreviated process.

Okay, with the disclaimers and background out of the way, let us begin with the guidelines.

  •  A given CID should have a clear, identified and immutable scope that is independent of time.
  •  Use symbols to represent the components in scope for a given diagram.
  •  Use lines to represent interactions between the components in a given diagram.
  •  The consuming or external components are placed in the left most region in a given diagram.
  •  The persistent storage or most granular processes are placed in the right most region in a given diagram.
  •  Do not depict containment.
  •  Do not depict state, sequence, or directionality.
  •  Do not depict the flow of data or process logic.
  •  Environmental or grouping boundaries are an accepted practice.
  •  Use different symbols to represent components of different types.
  •  The set of symbols in use should be consistent across a given set of diagrams.
  •  Symbols should not contain other symbols.
  •  Every component should only exist once in a given diagram.
  •  Every component should have a unique identifying label in a given diagram.
  •  Classes, tables, and other structures containing state are represented as separate storage components.
  •  Methods, functions, and other processing constructs are to be represented as separate process components.
  •  There should be a minimal number of formats for lines in a given diagram.
  •  There should only be a single line between any two components in a given diagram.
  •  Every interaction line should connect exactly two components in a given diagram.
  •  Interaction lines should not have labels, but line format may indicate classification.
  •  Utilize call-outs to describe details in common language about a specific component or specific interaction.
  •  Utilize note boxes to provide context for a set of components or to describe the interaction semantics.
  •  Utilize note boxes to provide rationale for the approach, usage recommendations, or exception semantics.
  •  Always provide a legend for symbols and line formats if there are multiple.

Since the guidelines are fairly rigid, let's discuss the intent and reasoning for the common areas of concern.

A diagram should have a particular scope independent of any particular processing state.  Ensuring that every component only shows up once on the diagram allows the diagram to serve as an inventory. Ensuring there is no state or sequencing allows us to track completion against the inventory independent of the orthogonal or cross-cutting nature of the components.  Simply put, a component is complete for a given diagram when it satisfies the interactions present on the given diagram. Enforcing that each component only shows up once allows for an accurate depiction of multiple dependency and ensures that polymorphic or iteratively developed components and functionality libraries are properly decomposed.
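One way to see why these rules let a diagram double as an inventory is to model it as data. The sketch below is my own illustration, not a tool I am recommending: the `Diagram` class, its method names, and the example components are all hypothetical. Components are keyed by a unique label, and interactions are stored as unordered pairs so there is no direction and at most one line between any two components.

```python
from dataclasses import dataclass, field


@dataclass
class Diagram:
    components: dict = field(default_factory=dict)   # label -> component type
    interactions: set = field(default_factory=set)   # frozenset({a, b}) pairs

    def add_component(self, label, kind):
        # Every component exists once, under a unique label.
        if label in self.components:
            raise ValueError(f"component {label!r} is already on the diagram")
        self.components[label] = kind

    def connect(self, a, b):
        # Every line joins exactly two components that are on the diagram.
        if a == b or a not in self.components or b not in self.components:
            raise ValueError("a line must join two distinct known components")
        # A frozenset has no order (no directionality), and adding the same
        # pair to a set twice keeps a single line between the two.
        self.interactions.add(frozenset((a, b)))

    def interactions_of(self, label):
        return {pair for pair in self.interactions if label in pair}

    def is_complete(self, label, satisfied):
        # Complete when every interaction it has on this diagram is satisfied.
        return self.interactions_of(label) <= satisfied


cid = Diagram()
cid.add_component("Web UI", "process")
cid.add_component("Order Service", "process")
cid.add_component("Orders Table", "storage")
cid.connect("Web UI", "Order Service")
cid.connect("Order Service", "Orders Table")
cid.connect("Order Service", "Web UI")  # same line as the first, not a new one

done = {frozenset(("Web UI", "Order Service"))}
web_ui_complete = cid.is_complete("Web UI", done)
order_service_complete = cid.is_complete("Order Service", done)
```

With only the first interaction satisfied, `web_ui_complete` is true and `order_service_complete` is false, because the Order Service still owes its interaction with the Orders Table. Nothing in the structure records time or sequence, which is exactly what keeps the completion check this simple.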

Adding state, sequence or flow to a diagram requires the introduction of time which modifies the scope. The nature of state, sequence and flows means the diagram would  not have a clearly delineated scope. Introducing time information to a CID requires that the user understand the particular pre- and post- conditions to validate the scope of the diagram. This will inevitably complicate the diagram, often requiring multiple diagrams and that the audience makes assumptions about the nature of the components or interactions.  All of these side effects will allow a single diagram to have multiple interpretations which can all appear accurate.  For these reasons and others this information should be represented separately in a Process Flow Diagram using the symbols and components from this diagram.

For a particular set of diagrams to be consumed easily by a variety of audiences, the information needs to be presented consistently.  Therefore positioning components consistently within a diagram provides the ability to perform comparisons between diagrams and to follow interactions through symbols across different diagrams.

Components on a diagram need to stand independently so that their attributes can be granularly managed for inventory, tracking, and validation.  Containment makes calculation and attribute management very difficult and can introduce artificial assumptions about scope.  Decomposition becomes unwieldy and harder to validate with the introduction of containment.  The use of bounding boxes or shaded backgrounds for grouping is an accepted alternative but should be used sparingly to reduce complexity and ease consumption.

The two major classifications of components that are appropriate on a CID are components for storage and for processing.  Decomposing processing components is usually self-explanatory, with the only challenge being to find the appropriate level at which to stop.  Reasons to decompose below the assembly or interface boundary include tracking the contributing teams, the use of different skill sets, or when implementation is iterative.  Consider that every decomposition increases the barriers to construction and consumption.

Deciding which storage components to decompose can be challenging. As a general rule of thumb, only persistent or shared data structures need to be present on the diagram as storage components.  For example, a class that is used to transmit data across an interface isn't appropriate because it is transient (not persisted) and is only accessed by the components independently.  Alternatively an in-memory class that manages thread state and is monitored by a controlling process is shared but not persistent and therefore still meets the criteria for placement on the diagram.  A file which receives log updates or a table which is updated as the outcome of a process are both persistent and therefore meet the criteria for use on the diagram.  In any case, all data structures for which you desire tracking can be included. Transient structures are strongly discouraged because of the complications involved with fitting them into the diagram.  Again, the trick is to balance the desire to track at the most granular level against the ease of constructing and consuming these diagrams.
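The rule of thumb reduces to a single test, sketched here with the examples from the paragraph above. The function name and the flags are mine, and marking the log file and table as not shared is an assumption made only to show that persistence alone is enough.

```python
def belongs_on_cid(persistent: bool, shared: bool) -> bool:
    """Rule of thumb: a data structure is a storage component on the
    diagram if it is persistent or shared."""
    return persistent or shared


# (persistent, shared) for each example structure.
candidates = {
    "interface transfer class": (False, False),  # neither persisted nor shared
    "thread-state class": (False, True),         # in memory, but monitored
    "log file": (True, False),                   # persistent
    "result table": (True, False),               # persistent
}

on_diagram = [
    name for name, (persistent, shared) in candidates.items()
    if belongs_on_cid(persistent, shared)
]
```

Only the interface transfer class is left off. Anything you explicitly want to track can still be added by hand; the test just gives you the default.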

Since my examples are generally scrubbed for particular purposes you'll just have to contact me if you'd like one.

Wednesday, March 27, 2013

Not for Nothing

Because I am often championing nimbleness and agility, people are often surprised at the level of importance I put on planning and writing things down.  In the typical situation, we are discussing some objective or goal and there is just this ambiguous laundry list of tasks, activities and milestones that spews out.  I insist on proper organization, writing down the relationships, and in general having all the elements of a formal plan laid out. The reason is purely selfish and based on historical evidence.

Most people can't hold the complex elements of a plan in their head and manipulate it effectively. For collaboration, all parties can function no faster than the slowest member. So writing down the elements of the plan means that when we start trying to optimize everyone can follow along. This is particularly necessary when you start to execute against a plan and need to recall the reasons behind your decisions.

One aspect of the plan that I typically insist on clarity about is the roles and responsibilities. Often more than just needing to understand what your outcome looks like, you need to understand why it looks a certain way, and who is driving those criteria. This is embodied in one of the tenets:

Learn the who and the why before the what or you will end up creating nothing for no one.

If you are going to react to tactical considerations while you are executing a strategic plan, you need an awareness of more than just what the criteria for success look like, but also who is defining and evaluating the criteria and towards what purpose.

If you are dealing with a fickle audience, an unknown solution, or a unique innovation, the definition of a successful outcome might be sufficiently ambiguous that you won't be able to effectively optimize without a rapid feedback loop. Obviously being able to bring outcomes to an audience or test rapidly is helpful but not always practical.  Being able to provide a preliminary evaluation of success without the cost of involving your constituency is crucial for agility and nimbleness. It doesn't replace the need for frequent and rapid testing, but it does mean you can often discard unusable alternatives more quickly.

More important than being able to do your own work more efficiently, by grounding your criteria with their source and rationale, you validate outcome in context. There is nothing worse than doing something difficult and amazing only to realize that the impact is only felt by a fraction of your patrons. Take the time upfront to make sure you know who you are working for and why what you are doing will matter to them. More clarity with these things will help ensure you keep asking the right questions at the right times.

Thursday, January 31, 2013

Control, Governance, and Management

If you've ever gone through an oddly-structured, ill-timed, poorly-communicated reorganization (and who hasn't?) you are familiar with how screwed up something as simple as roles and responsibilities can become. Over the years, I've developed a simple way of helping clarify things. At least it has worked for me and others I've shared it with.

Let's start by clarifying the nature of the oversight being discussed. This starts, as so many things do, with getting the terminology down. Our thoughts always follow our words.  So here is some terminology that I have used. The specific terminology isn't important, call things what you like. What is important is the way providing a definition dissects the differences in how we think.

Two of the critical natures that need to be isolated are financial and authoritative. Borrowing from industry, these are generally referred to as Controls. There is usually a financial Control in place to authorize incurring a cost or paying a debt. Someone is always looking after the money. They control the finances. For small and medium organizations this is also the role providing the authority or direction. While the specific authorization varies by size of organization, the distinction of control is between the intent and the action.  For example, a corporate board would give intent and parameters of activity to an executive but would not actually oversee the carrying out of the action.  Similarly your boss might ask you to fill out a form but isn't going to stand around watching you do it or shuffle it off to human resources on your behalf.  They express an intent and parameters for action and leave the actual activity to you. Controls are the means by which we express intent, set parameters for, and evaluate the results of actions.

Moving from the abstract down to the most concrete is another industry term referred to as Management. These are the boots on the ground, the supervisory hands and feet of the structure. This is the guy in the hard-hat watching as the workers actually dig. It is how contributors get their activity assignments and to whom they report status and completion.  Management brings life by way of action to the intentions of Control.

The middle is where things tend to get complicated but if you've made it this far, you can likely see that we've set things up to address the middle quite cleanly. In translating intent to actions, there is an additional role that provides the decision-making, priority setting, and so forth necessary to ensure that the Management actions are achieving the intentions within the parameters of activity established by Control. This middle structure, which in industry parlance is often referred to as Governance, provides this necessary translation. It absorbs the feedback from failed or variant activity, and provides corrective and deterministic input to the actions as they progress. It provides a means for multiple types of activities with potentially conflicting agendas, skills, and maturity to cohesively fulfill a given intention.

Consider that a single intention ("build website") can have multiple approaches ("hire firm to build" vs "hire employee to build") and diverse activities ("create look and feel" vs "code database"). In even simple cases multiple contributors ("graphic artist" vs "sql developer") can be directed by multiple managers ("creative" vs "development") and they might have conflicting opinions on how to interpret the intention and parameters for the activity ("build website cheaply" vs "build website quickly" vs "build website with lots of bells and whistles").

Management addresses the differences in a specific interpretation ("build website as quickly as possible using our existing graphics and software"). Governance addresses the differences in the interpretation and the parameters of activity for the intention ("build website with this many bells and whistles and without buying new graphics or software"). Control addresses the differences in intention ("build website within this time frame and for this amount of money").

If you use the right terminology you can figure out the kind of conversation people think they are having during the confusing times. If they are trying to exert control, you can express that you believe control belongs somewhere else (or with you). If they are trying to exert management, you can express that you believe management belongs with you (or somewhere else). The point is that you don't have to confront the content of their position, only the forum for discussion and resolution. This will often make evident the different thinking about roles and responsibility. Once that issue is out and you are aligned, typically knocking out disagreements about how to move forward becomes vastly easier. Right or wrong, we might rarely be able to settle our differences on smaller issues ("the what should we do") if we have different ideas on the bigger issues ("the when and how we should do it"), or the large issues ("the why we do it").

Interestingly enough, the inverse of this technique is how we ensure we are able to evaluate the performance at each level.

Clear as mud?

Saturday, January 19, 2013

Management vs Leadership

One of those planning conversations that so quickly goes off the rails into parts unknown recently wandered into a discussion about management styles and corporate culture and an odd assortment of other loosely connected things.

Along the way, there were several references to management styles and writings about the same. From time to time, someone speaking about a management style would reference a book or article about leadership and vice versa. The seemingly accepted interchangeability of these two concepts baffled me. To my mind, they are vastly different if loosely connected. Much in the way a wedding planner and a bride can often be interchanged but you wouldn't want to confuse one with the other.

An example of this was a reference to the style known as Management By Walking Around. There was an implication that the more face time managers have with employees, the more productive and the more loyal those employees become. Which I totally disagree with. In my experience and extremely limited tracking on the subject I do agree that the frequency and depth of personal connection with contributors will increase productivity, provided that connection is about more than productivity. Should there fail to be an exhibition of leadership during these engagements, then a tipping point emerges where the contact becomes an inhibitor to productivity. Especially in areas requiring significant creativity or innovation. Prior to the tipping point for productivity being reached there can be observed a decrease in the attributes signifying loyalty.

The roadmap follows like this: too much face time without leadership makes employees feel over-managed and under-supported. They lose their part of the connection to the business objectives, and they cease seeing their contribution as impactful. When they don't feel connected, they care less about their own productivity and eventually the productivity falls away. It's hard to stay productive when you aren't inspired.

From the other direction, if you exhibit leadership too infrequently and with insufficient management, they might do their best to be productive but not know how to measure or account for what they do. They will then get frustrated at the disconnect between their hard work and successful business outcomes. In this case, business failings, which are the summation of productivity, will be the precursor to diminishing loyalty. It's hard to stay loyal to a leader in the face of failure.

Giving clear instruction and frequent oversight is management. Having actionable metrics and attainable objectives is good management. But these are nothing without inspiration, encouragement, a sense of purpose and a clear, personal connection to business outcomes. Ensuring people understand why they are working hard and have a transparent view of the impact they, specifically and personally, are making, that is leadership.

You can lead to long-term failure, you can manage to short-term success. Without leadership, your management won't have longevity and without management your leadership won't see results.