It's Not About Custom Applications; It's About Clean Data

by Michael Szul on


Every year, the Global Azure Bootcamp takes place in cities across the world, all on the same day. These events are run by local organizations (some affiliated with Microsoft, some not) that find sponsors, feed the hungry programmers, and organize sessions on Azure services, Microsoft technology, and broader industry trends. I was lucky enough to be a speaker this year (talking about Microsoft's Bot Framework), and I sat in on several of the sessions that came before mine. One of those was a low-key, high-quality talk on executive education in analytics from Jim Hadley, CEO of Tiber Solutions. Jim made several key points that got me reflecting not only on the software industry, but also on some of the false perceptions that permeate information technology (IT) in larger corporations and institutions.

I've spoken out in the past about the problems that arise when management believes it can simply buy programming off the shelf. Not only is this an affront to a programmer's years of experience and expertise, marginalizing their impact and value, but it also amplifies one of the misconceptions that so often lead to failed initiatives.

Solutions are built, not bought. They are tailored to meet the strategic goals of an organization.

Jim points out that solutions are built rather than bought. These solutions are meant to meet the strategic goals of an institution, and no off-the-shelf product can do that, because off-the-shelf products must be generalized for a larger audience in order to sustain a viable business with a strong customer base. Store-bought solutions can be configurable to cover a wider range of needs, but they will certainly not be as flexible as custom-developed programs.

This is not the misconception, however. The custom vs. off-the-shelf debate happens out in the open, and plenty of people hold differing opinions on it. The misconception lies in what "custom" means, what should actually be custom, and who should make that decision.

Tools and technologies are confused as being solutions by themselves.

Jim made his point about custom solutions while detailing the success drivers for implementing technology in the enterprise. It is one of several bullet points that directly follow a discussion of what causes projects to fail. One of those causes is that decision-makers often mistake tools, technology, and software for the actual solution, and that is never the case. Every solution is custom because every business is different, and that solution could involve custom software, or off-the-shelf software that still requires custom configuration, a few tweaked data points, middleware, and integration.

In reality, you could have a team of five software engineers who never build a single web or desktop application, yet rather than sitting around twiddling their thumbs, they spend their time on configuration, integration, DevOps, administrative scripting, technical business analysis, data tracing, analytics, and maintenance. Programmers do more than just write custom applications.

We are drowning in information (data) but starved for knowledge. --John Naisbitt

It's a gross misconception that custom software is too expensive. If you only look from quarter to quarter, or from budget year to budget year, that might ring true, but long-term thinking and strategic planning reveal the true value of custom solutions in general, and of custom software specifically.

Custom transactional software reduces costs by validating data early and reducing overhead later. Dirty or bad data is the bane of any data science or analytics project. Recent surveys from various online technical resources put data cleaning at 60% of a data scientist's time, and sometimes as high as 80%. Data management and storage require engineering, extract/transform/load (ETL) packages, and various other forms of data architecture to build a viable data warehouse. When all of that data comes from disparate systems in various states of disarray, your analytics or business intelligence teams grow exponentially, and in many cases they are forced to specialize in a specific area of data analytics or a specific area of the business. This ends up creating expensive data stewards and report writers whose time would be better spent on exploratory work that produces meaningful decision support.

Custom transactional software (built on top of informed technical business analysis) offers the ability to validate data entering the transactional workflow, pre-cleaning it before it gets to any data mart or data warehouse. People on the transactional side can then inform the decisions on the analytics end when it comes to denormalizing data structures for reporting and analysis purposes.
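To make "validating data entering the transactional workflow" concrete, here is a minimal Python sketch. The `OrderRecord` fields and the rules in `validate` are hypothetical, chosen only for illustration; in a real system they would come out of the technical business analysis described above. What matters is where the check happens: each record is normalized and validated before it is written, so a malformed record is rejected at the source instead of being cleaned up later in a data mart or warehouse.

```python
import re
from dataclasses import dataclass, replace

# Hypothetical record shape: the fields stand in for whatever the
# business analysis says this transaction must capture.
@dataclass(frozen=True)
class OrderRecord:
    customer_id: str
    email: str
    country_code: str
    quantity: int

# Deliberately simple patterns for illustration, not full RFC/ISO validation.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
COUNTRY_RE = re.compile(r"^[A-Z]{2}$")


def normalize(record: OrderRecord) -> OrderRecord:
    """Trim whitespace and fix casing so equivalent inputs are stored identically."""
    return replace(
        record,
        customer_id=record.customer_id.strip(),
        email=record.email.strip().lower(),
        country_code=record.country_code.strip().upper(),
    )


def validate(record: OrderRecord) -> list[str]:
    """Return every rule the record violates; an empty list means it is clean."""
    errors: list[str] = []
    if not record.customer_id:
        errors.append("customer_id is required")
    if not EMAIL_RE.match(record.email):
        errors.append(f"email is malformed: {record.email!r}")
    if not COUNTRY_RE.match(record.country_code):
        errors.append(f"country_code must be two letters: {record.country_code!r}")
    if not 1 <= record.quantity <= 1000:
        errors.append(f"quantity out of range: {record.quantity}")
    return errors


def accept(record: OrderRecord, store: list[OrderRecord]) -> OrderRecord:
    """Gate at the point of entry: only normalized, valid records reach the store."""
    cleaned = normalize(record)
    errors = validate(cleaned)
    if errors:
        # Reject at the source instead of letting bad data flow downstream.
        raise ValueError("; ".join(errors))
    store.append(cleaned)
    return cleaned
```

For example, `accept(OrderRecord(" C-1001 ", "Jane@Example.com ", "us", 3), store)` stores a record with the ID trimmed, the email lowercased, and the country code uppercased, while a record with a blank ID, an address without an `@`, and a quantity of 0 raises a `ValueError` that lists every violated rule. Collecting all violations, rather than stopping at the first, is a design choice that lets the person entering the data fix everything in one pass.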

Beyond data, custom software also allows business rules and processes to be explicitly defined and automated. When software is bought off the shelf, you are often left with the "best" prepackaged solution. That solution might not capture all of the necessary data points, or might not have the appropriate interfaces. It surely will not satisfy every business need, or be configurable to every process. What happens in these instances? Most often, people are hired to fill the gaps the software leaves. This means full-time employees, with salaries and benefits, performing manual validation, data copying, or simple administrative support and reporting, all because the upfront investment in an appropriate custom solution was not made.

Ultimately, the earlier you begin to clean data and automate processes in a workflow, the less expensive the overhead is at the tail end of the process.

If data is the new oil, doesn't it make sense to fix your refinery tanks rather than hire people to clean up the oil spills?

As industries and companies get larger, projects get larger. As projects get larger, teams get larger, and when teams get larger, specialization often occurs. The problem is that specialization creates the need for knowledge transfers and hand-offs. Jim prefers to invest in "Swiss army knife" resources, people whose expertise allows them to be involved in projects from inception to deployment (or from business analysis to analytics reporting). When this happens, the baton is never dropped, and solutions end up of better quality because informed decisions are made with technical resources in the room.

In Fred Brooks' The Mythical Man-Month: Essays on Software Engineering, Brooks uses the term "surgeons" to describe these Swiss army knife resources: people who can get the job done from start to finish, with a support structure available to lean on or delegate to. Brooks is also quick to point out that adding people to a project late in the game makes it take longer, not shorter. Each additional person adds another node to the communication graph, and the more cross-communication there is, the more time is spent on meetings and knowledge transfer, slowing down the project's momentum.
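Brooks makes this communication cost concrete in The Mythical Man-Month: if each part of a task must be coordinated with every other part, the coordination effort grows with the number of pairwise channels among the n people involved. Writing that count as C(n) (our notation, not Brooks'), doubling a five-person team to ten more than quadruples the channels:

```latex
C(n) = \frac{n(n-1)}{2}, \qquad C(5) = \frac{5 \cdot 4}{2} = 10, \qquad C(10) = \frac{10 \cdot 9}{2} = 45
```

Headcount grows linearly while the channels grow roughly with the square of the team size, which is why hand-offs between specialists get expensive so quickly.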

I put the emphasis on custom applications, but the reality is that what you are after is clean data. Clean data limits overhead and lets you make deliberate personnel decisions rather than hiring to fill gaps. An investment in enterprise architecture means giving a key group of individuals the tools and decision-making power to keep the technology strategy aligned with the business strategy, since information technology is inseparable from business needs and operations. This is an investment in custom solutions, which could mean custom application development, or simply choosing the right application to customize, integrate, and support. Confusing off-the-shelf software with the actual solution (which should be custom) creates a pitfall that can lead to bad data and failed analytics. In fact, Gartner reports that more than 60% of all analytics projects fail. That means more than half of the projects meant to create data-driven enterprises end up failing. Some of the major reasons are a lack of understanding of the value of custom solutions; specialization of personnel, which creates hand-offs that let things fall through the cracks; and a failure to recognize that successful solutions are the result of collaboration between IT and the business, where neither should ever simply be a service to the other.