The Generative Turn for Tech Strategy
Disclaimer: all ideas are my own and may not reflect my employer's or my affiliated organizations' views.
In this blog, I argue that generative AI has triggered a paradigm shift in computing. I call the paradigm shift “The Generative Turn.”
I will review past paradigm shifts in computing and explain why generative AI is the next shift. I will recap new trends and patterns, give my recommendations for senior leadership, and give some advice on the direction and timing of these initiatives.
Inspired by Art
A few weeks ago, I saw the author Kate Crawford use the term "The Generative Turn" to describe how the concept of art has changed since the introduction of generative image tools like Stable Diffusion, DALL-E, and MidJourney. She defines The Generative Turn as:
A moment where what we previously understood as how everything from illustration to film directing to publishing works is all about to change very rapidly.
I love how Kate has taken a strategic-systems thinking view of generative AI's impact on art and the benefits and costs to society. Her definition nicely echoes the definition of the Technological Singularity. It got me thinking, how will the generative AI “turn” impact our overall IT roadmap?
As in many of my blogs, I attempt to take a longer-term, holistic systems-thinking approach to this analysis. With all the rapid changes in generative AI, this is more challenging than ever.
The Past Paradigm Shifts in Computing
Thomas Kuhn, an influential historian and philosopher of science, defined a paradigm shift as:
A fundamental change in the basic concepts, principles, and practices of a scientific discipline.
According to Kuhn, progress in science and technology usually proceeds under a shared framework or “paradigm” of assumptions, concepts, and methods that define the research agenda and guide scientific inquiry.
A paradigm shift occurs when the existing assumptions and beliefs can no longer help us plan for the future. In response to this crisis, new assumptions and ways of thinking emerge that fundamentally change how we do strategic IT planning. The new paradigm involves a different set of assumptions and beliefs that enable business strategies that were previously unattainable.
Kuhn emphasized that paradigm shifts are not gradual or linear processes but rather involve sudden and revolutionary changes in how technologists think about the world. Kuhn argued that paradigm shifts are driven by a combination of changes in science, technology, and society's acceptance of these changes.
Here is a summary of some of the past paradigm shifts in computing.
Personal computing: In the late 1970s, low-cost PCs began to replace mainframes. This created an explosion of new software, like spreadsheets, that could be quickly customized to the tasks of workers.
Graphical user interface (GUI): The introduction of GUIs in the 1980s allowed non-technical staff to work with on-screen metaphors like documents and folders. People didn’t have to memorize command-line interfaces, making it easier for everyone to interact with computers.
The Internet: By the late 1990s, PCs were no longer islands of information. Software packages could assume that they were connected to the Internet and be continually upgraded with new features. Web browsers and Wikipedia made it easier for everyone to access distributed knowledge.
Cloud computing: By 2010, AWS and other companies allowed organizations to perform tasks using flexible and scalable storage and compute resources.
These paradigm shifts in computing changed how we do strategic planning for our information systems. They made it easier for non-technical people to use low-cost resources to help them with tasks. Now let's see why generative AI will also be a big paradigm shift in your IT strategy development.
Background and Assumptions
This blog makes a few assumptions. I will summarize the key points and provide links if you want to dive deeper into my perspectives.
First, this blog assumes that you have some background in AI. It assumes you know that AI is trying to simulate the role of the human neocortex and its role in predicting future events. I suggest my blog on One Thousand Brains and the EKG if you need some background on this topic. A key take-home point is that we must deeply understand our brains and their use of reference frames to make true progress in AI.
The second assumption is that you understand that there are many different ways to represent knowledge of our world. We call these representations "models." Some of these representations are convenient, and some are strategic. Some models are vague, and some are extremely precise and capture the nuances of business entities and their relationships with each other. Putting all your knowledge in a two-dimensional array is convenient for reusing the GPU hardware in your video game system. However, it might not be the right strategic choice for sparse knowledge representations that model the human brain, where a neuron is connected to only about 10,000 other neurons. The brain has roughly 86 billion neurons, and each neuron is only sparsely connected to the rest. An adjacency matrix of those connections would be well over 99.999% zeros.
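As a quick sanity check on that sparsity claim, here is a back-of-envelope calculation. The figures are rough, commonly cited estimates (roughly 86 billion neurons, about 10,000 connections each), not measurements of any real connectome:

```python
# Rough figures only: ~86 billion neurons, each connected to ~10,000 others.
neurons = 86_000_000_000
synapses_per_neuron = 10_000

# Fraction of a full adjacency matrix that would be non-zero.
density = synapses_per_neuron / neurons
sparsity = 1.0 - density

print(f"non-zero fraction: {density:.2e}")
print(f"zero fraction: {sparsity:.7%}")
```

A dense two-dimensional array would spend almost all of its storage on zeros, which is why sparse representations are the strategic choice here.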
Problems in Flatland
Flattening your data to fit on a punch card, in an Excel spreadsheet, or in a table in an RDBMS is an extremely convenient representation. Getting closer to how the human brain has evolved over time to represent knowledge in reference frames and knowledge graphs is strategic. In summary, convenient representations give us short-term wins but hold us back from long-term progress. Many of my blogs on Scalable Enterprise Knowledge Graphs cover this topic. Getting out of Flatland requires strong strategic leadership.
Most large language models are convenient because they work well with back-propagation and easy-to-parallelize transformers. But don’t be fooled: this is not how the brain represents knowledge.
My last assumption is that you have a basic understanding of deep neural networks and embeddings. Specifically, understanding the relationships between graph embeddings and similarity search for similar knowledge is really helpful.
Many of the new patterns we will discuss require our corporate knowledge to be stored in a vector search system such as Pinecone or Vespa. Keyword-only searching will limit any future AI strategy. Similarity search is an embarrassingly parallel operation and can be sped up by many orders of magnitude using low-cost FPGAs. Don’t let anyone on your staff tell you they can’t find the 100 most similar items in a collection of 10 million in under 50 milliseconds. This is easy peasy with the right hardware.
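To make this concrete, here is a minimal sketch with made-up data showing why similarity search parallelizes so well: once vectors are normalized, cosine similarity reduces to a single matrix-vector multiply, which is exactly the operation GPUs and FPGAs accelerate. The corpus here is 100,000 random vectors rather than 10 million, but the same code scales with hardware:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus: 100,000 items embedded as 128-dimensional unit vectors.
corpus = rng.standard_normal((100_000, 128)).astype(np.float32)
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)

query = rng.standard_normal(128).astype(np.float32)
query /= np.linalg.norm(query)

# With unit vectors, cosine similarity is just a dot product --
# one embarrassingly parallel matrix-vector multiply.
scores = corpus @ query

# Indices of the 100 most similar items, best first.
top100 = np.argsort(scores)[::-1][:100]
```

Every row's dot product is independent of every other row's, so the work splits cleanly across as many processing elements as you have.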
If more people had diverse hardware and software backgrounds, my hope is that they would also reach similar conclusions to my recommendations. This lack of diversity in IT strategy may be holding back your organization.
LangChain and AutoGPT
LangChain and AutoGPT are the two latest design patterns to evolve from the Generative AI community. Let's quickly summarize their strategic impacts.
LangChain: Recursive Prompt Enrichment
LangChain is a newly named architectural pattern that combines the use of embeddings with large language models in orchestrated tasks. The central question these patterns ask is: does the current prompt contain enough knowledge to answer the question posed? If not, the system gathers more knowledge (usually through an API call to a knowledge graph or general knowledge base) and then asks again: can we now return the response to our user?
The inherently recursive patterns in LangChain follow a well-traveled content-enrichment pattern. But the combination of prompt enrichment with embeddings and large language models was a huge step toward building agile agent frameworks. It reflects many of the same processes the human neocortex uses in problem-solving: reflecting, planning, and filling knowledge gaps using inference.
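A toy sketch of this recursive enrichment loop may help. Everything here — the knowledge base, the sufficiency check, and the "LLM" call — is a stand-in I invented for illustration, not LangChain's actual API:

```python
# Stand-in knowledge base; a real system would call a knowledge graph.
KNOWLEDGE_BASE = {
    "ticket status": "Ticket #42 is currently open.",
    "sla": "Similar tickets are usually resolved within 2 business days.",
}

def missing_topics(prompt: str, required: list[str]) -> list[str]:
    """Which required topics are not yet covered by the prompt?"""
    return [t for t in required if KNOWLEDGE_BASE[t] not in prompt]

def answer_with_llm(prompt: str) -> str:
    """Stand-in for a real large-language-model call."""
    return f"FINAL ANSWER based on: {prompt}"

def enrich_and_answer(question: str, required: list[str]) -> str:
    prompt = question
    # Keep enriching until the prompt contains enough knowledge to answer.
    while gaps := missing_topics(prompt, required):
        # "API call" to the knowledge base to fill one gap.
        prompt += "\n" + KNOWLEDGE_BASE[gaps[0]]
    return answer_with_llm(prompt)

result = enrich_and_answer("What is the status of my ticket?",
                           ["ticket status", "sla"])
print(result)
```

The loop is the essence of the pattern: test whether the prompt is sufficient, fetch what is missing, and repeat until the model can answer.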
AutoGPT: Language is the New API
In quick succession to LangChain came the related realization that goals can be broken down into subtasks that each execute in their own context. The figure above shows how language models can take abstract goals in the prompt and convert them into discrete task lists. These tasks might depend on each other, and their order must be carefully choreographed.
For example, a task that plans a vacation with travel, lodging, and events might have a task to calculate the total vacation cost. You might ask it to find hotels in safe neighborhoods. Lower-cost hotels might be in a higher-risk area of a city. The total budget task can only be executed after all the other event-specific items return line items of their estimated costs.
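Using Python's standard-library graphlib, the choreography described above can be sketched in a few lines. The task names and costs are invented for illustration; the point is that the total-budget task cannot run until every line-item task has produced its estimate:

```python
from graphlib import TopologicalSorter

# task -> set of tasks it depends on (illustrative names only)
deps = {
    "book_travel": set(),
    "book_lodging": set(),
    "plan_events": set(),
    "total_budget": {"book_travel", "book_lodging", "plan_events"},
}

# A topological sort guarantees dependencies run before dependents.
order = list(TopologicalSorter(deps).static_order())

# Made-up line-item estimates returned by the event-specific tasks.
costs = {"book_travel": 600, "book_lodging": 900, "plan_events": 250}
results = {}
for task in order:
    if task == "total_budget":
        results[task] = sum(results[t] for t in deps[task])
    else:
        results[task] = costs[task]

print(order)
print(results["total_budget"])  # 1750
```

An agent framework does the same thing at a larger scale: derive a dependency graph from the goal, then execute tasks in an order that respects it.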
Some newer systems take the next step. They assume that if two large language models are front ends to complex systems, they will naturally evolve to use language as their preferred form of communication. Rather than calling static, brittle APIs, they only need ways to convert language into the appropriate internal API calls. In some special situations, language can be mapped directly to a graph query that will not disrupt response times for other users. We don't want our agents creating CPU-intensive queries that consume too many resources at the wrong time; that gets in the way of high-availability systems that demand consistent user response times.
The Speed of Open Source Communities
It is interesting to note that both LangChain and AutoGPT were created by the open-source community just a few months ago. They were not created by large companies with billion-dollar R&D budgets and hundreds of patent attorneys. Many people feared that large companies like Microsoft and Google were building an unassailable moat around their products. This turned out to be completely false. Open-source software has quickly ramped up its quality, as documented in the recently leaked document We Have No Moat, and Neither Does OpenAI. This document is worthwhile reading for anyone creating generative AI business strategies.
Remember this when you consider what your company contributes to the open-source community and whether your executives are putting on their systems-thinking hats to contribute back to these communities.
A Concrete Example: A Helpdesk Chatbot
The concepts listed above are abstract, so let's anchor our understanding with a concrete example. Assume you have a helpdesk with a chatbot agent. Users submit tickets through the chatbot and later return to check each ticket's status.
The trouble-ticketing system is stored in a knowledge graph so that each ticket can be wrapped in the context of the event. If I say I have a problem with my PC, it will extract the facts that I have a Mac, that I am running macOS Ventura 13.3.1 (a), the patch levels I have applied, how much RAM I have, and my free disk space. Without this contextual information, the trouble-ticketing system can't precisely find similar problem/solution pairs in its history.
When a status-update chat comes in, the question is compared with similar prior questions. From those prior chats, the system knows it can't answer this question without getting the trouble-ticket status from the knowledge graph. It fires off a get-last-open-ticket-for-user query to an API, takes the results, rebuilds the prompt, and sends it to a large language model, which composes a friendly response in full English sentences, including related information such as how long similar tickets usually take to resolve.
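A toy walk-through of that flow may make the steps clearer. Here difflib stands in for real embedding similarity, and stub functions stand in for the knowledge-graph API and the language model; the ticket data is invented:

```python
import difflib

# Prior questions and what the system learned about answering them.
PRIOR_QUESTIONS = {
    "what is the status of my ticket": "NEEDS: get-last-open-ticket-for-user",
    "how do i reset my password": "ANSWERABLE from knowledge base",
}

def get_last_open_ticket_for_user(user: str) -> dict:
    """Stand-in for the knowledge-graph API call."""
    return {"id": 42, "status": "open", "typical_resolution": "2 days"}

def handle_chat(user: str, question: str) -> str:
    # 1. Compare the question with similar prior questions.
    match = difflib.get_close_matches(question.lower(),
                                      PRIOR_QUESTIONS, n=1, cutoff=0.4)[0]
    # 2. Prior chats tell us this question needs live ticket status.
    if PRIOR_QUESTIONS[match].startswith("NEEDS"):
        ticket = get_last_open_ticket_for_user(user)
        prompt = (f"Question: {question}\n"
                  f"Ticket {ticket['id']} is {ticket['status']}; similar "
                  f"tickets resolve in {ticket['typical_resolution']}.")
    else:
        prompt = f"Question: {question}"
    # 3. Stand-in for sending the rebuilt prompt to a large language model.
    return f"LLM response composed from:\n{prompt}"

reply = handle_chat("dan", "What's the status of my ticket?")
print(reply)
```

The essential move is step 2: the system recognizes from similar prior questions that the prompt is insufficient and enriches it with live data before calling the model.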
While your current trouble-ticketing system might have an API for getting the latest open ticket for a given user, that is a brittle, narrow interface designed to answer one and only one question. AutoGPT assumes you can send natural language to any service, and the service will figure out which API it needs to call. This shifts the burden of mapping a question to the right API call onto the agent with the most data. This burden shift is what is so wonderful about AutoGPT.
If the ticketing system's API documentation only lists the numeric codes for ticket status, not their meanings (open, closed, etc.), then we are dead in the water. This tells you that machine-readable API documentation needs to be accessible to large language models. These descriptions must be compact enough to fit into a 4K-token prompt budget and must fully describe the meanings of any numeric codes. Your data governance team is still responsible for getting this information onto the API documentation site. Burying the codes deep within a data dictionary may not help.
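As a sketch, a compact machine-readable description like the one below would let a language model interpret numeric status codes while staying well inside a 4K-token prompt budget. The endpoint name, fields, and codes are invented for illustration:

```python
import json

# Hypothetical, compact API description with code meanings inline,
# so a language model can interpret API responses on its own.
TICKET_API_DOC = {
    "endpoint": "get-last-open-ticket-for-user",
    "params": {"user_id": "string"},
    "returns": {"status_code": "integer, see status_codes"},
    "status_codes": {"1": "open", "2": "in progress",
                     "3": "waiting on customer", "4": "closed"},
}

doc_text = json.dumps(TICKET_API_DOC, indent=2)
# A rough estimate (~4 characters per token) to check the description
# fits a 4K-token prompt budget with plenty of room to spare.
approx_tokens = len(doc_text) // 4
print(approx_tokens)
```

The crucial detail is the status_codes map: without it, a model receiving `"status_code": 1` from the API has no way to tell the user their ticket is open.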
Three Trends that The Generative Turn is Accelerating
We have seen many demonstrations of truly complex prompts that require the orchestration of many intelligent agents working together. Now let’s see how these will impact your IT strategy.
Trend #1: The Abstraction of APIs
Let's assume you have three cloud vendors that host GPUs. Each has a proprietary API for renting GPUs by the hour, with many options and rate schedules depending on how many GPUs you need, how long you want to use them, and their power and connectivity. Cloud vendors work hard to lock your code into their platform's APIs and keep you on their generic GPU hardware, even when your tasks might run better on AI hardware customized to your situation.
With tools like AutoGPT and intelligent agent frameworks, we can now democratize access to any online resource. Third parties can build "aggregators" that take your English-language requests and recommend a specific service optimized for your context. This trend will push vendors toward providing efficient, low-cost computing that can be tailored to workloads.
API abstraction lowers vendors' ability to lock us into their systems and increases the portability of our code. This also includes the APIs of large language models themselves. Switching from ChatGPT to Google Bard to OpenAssistant to make rapid cost and quality comparisons will be easy.
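A minimal sketch of such an abstraction layer, with stub functions in place of real vendor SDKs (real adapters would call the vendors' actual client libraries):

```python
from typing import Callable

def fake_openai(prompt: str) -> str:
    """Stand-in for a real OpenAI-style completion call."""
    return f"[openai-style] {prompt}"

def fake_bard(prompt: str) -> str:
    """Stand-in for a real Bard-style completion call."""
    return f"[bard-style] {prompt}"

# A registry of interchangeable back ends behind one interface.
PROVIDERS: dict[str, Callable[[str], str]] = {
    "chatgpt": fake_openai,
    "bard": fake_bard,
}

def complete(provider: str, prompt: str) -> str:
    """Application code depends only on this function, never on a vendor."""
    return PROVIDERS[provider](prompt)

# Switching vendors for a cost/quality comparison is a one-word change.
a = complete("chatgpt", "Summarize our Q2 results.")
b = complete("bard", "Summarize our Q2 results.")
print(a)
print(b)
```

Because application code only ever calls `complete`, swapping or benchmarking providers never touches the rest of the codebase.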
Trend #2: Embeddings Everywhere
Most of the new design patterns that generative AI has created need fast ways to find similar things. We compare prompts with documents, goals, tasks, data, data mappings, codes, reference data, and almost anything that your data governance team works on.
I call this the “Embeddings Everywhere” strategy. I want senior leadership to continually ask of every IT service, “How can we quickly find similar items?” Every data scientist in your organization should be familiar with algorithms that create vectors for business items. This includes every customer, every product, every product review, every call to your support center, every product feature, every Agile story, every acceptance test plan, every bug report, every error message in a log file, and so on.
At the heart of these processes is the idea that every question your customers ask in web search and chatbot prompts should be enriched with the most similar question/answer pairs and sent back through your language models. Grasping that cosine similarity is an embarrassingly parallelizable problem should be at the heart of your IT strategy. Any process that takes more than 50 milliseconds to find the ten most similar items will get in the way.
If you are a young engineer at your firm, documenting and cataloging the new timeless patterns of generative AI can only accelerate your career.
Trend #3: Combine Fine-Tuning and Prompt Enrichment Patterns
Smaller models, such as OpenAssistant at around 12B parameters, can be fine-tuned on your local datasets quickly and cost-effectively using commodity GPUs. Very large models such as GPT-4, rumored to be about 1 trillion parameters, are not cost-effective to fine-tune. Not only do we have to pay $84/hour for clusters of expensive GPUs to fine-tune GPT-scale models, but we also need about 4 trillion bytes of RAM to host the models with reasonable response times. Hosting a fine-tuned GPT-4 model would run close to $2,000 daily at current prices.
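The daily hosting figure follows directly from the hourly rate quoted above:

```python
hourly_rate = 84               # dollars/hour for the GPU cluster, as quoted
daily_cost = hourly_rate * 24  # running around the clock
print(daily_cost)              # 2016 -- close to the ~$2,000/day figure
```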
The alternative is to enrich your prompt with just the right contextual information. I use the term Prompt Enrichment because it really is the same as the famous Enterprise Integration Pattern called Content Enricher. I encourage everyone to use the same words so that we don’t need to invent new words for the same patterns.
Now let’s wrap up by reviewing a short list of my recommendations. We will examine the new training, skills, and resources your IT department will need.
Recommendation #1: Prompt Engineering for All
Every knowledge worker in your organization needs to gain experience creating great prompts. These skills are necessary to coax the right knowledge from large language models like ChatGPT, Bard, and OpenAssistant. Organizations like PwC have already made billion-dollar commitments to educating their knowledge workers to partner with generative AI when solving client problems.
I have been most optimistic about generative AI for software developers and educators. There are many documented productivity gains for these roles.
Yet, we can’t focus on providing training only for individual contributors. Every management level needs to lead in transforming your organization to the new generative AI paradigm.
Recommendation #2: Provide Safe Playgrounds for Learning and Evaluation
Many organizations are concerned that their intellectual property (IP) is slowly leaking out through the knowledge in their prompts. Although this claim is difficult to prove, there is good justification for concern. Even though users can opt out of allowing ChatGPT to learn from their prompts, many people I talk to are unaware of this option. So it is best to provide a safer alternative: services that are guaranteed not to log prompts. You might have to set up a separate playground within a cloud provider like Microsoft Azure.
Recommendation #3: Consider an Enterprise Scale-Out Vector Database
In the last year, many new search engines have been created around the concept of storing embeddings: given any document, they quickly return the most similar documents. We call these vector databases. Although they are very new, they are the foundation of modern generative AI systems that leverage the prompt-enrichment pattern.
Most data scientists I work with are very comfortable with mapping items into a higher-dimensional space and creating pretty 2D projections of similar items in their Jupyter Notebooks. Principal Component Analysis can be done in just a few lines of Python, and ChatGPT will show you how to do it. These demos and pilots can show your team how data-science algorithms create and use embeddings to find related knowledge. This is an essential first step.
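For example, here is a 2D PCA projection in a few lines of NumPy, with random data standing in for real embeddings; the resulting two columns are what gets plotted in those notebook drawings:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 50))   # 200 items with 50-dim embeddings

Xc = X - X.mean(axis=0)              # center the data
# SVD of the centered data gives the principal directions in Vt.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X2d = Xc @ Vt[:2].T                  # project onto the top 2 components

print(X2d.shape)  # (200, 2)
```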
However, very few data science people I work with know how to get these services to scale. They don't have a holistic systems-thinking plan to create embeddings for all their business entities: their customers, products, tickets, problems, solutions, data elements, and reference data.
There is one primary reason for this. And it is not that they don’t have the interest or skills to scale their pilot projects. The primary obstacle is that their senior leadership has never asked them to apply systems thinking to these problems. Data scientists are often only asked to consider the CSV file IT gave them. We must push everyone to take leadership roles in building out infrastructure that can scale with demand.
Recommendation #4: Build Natural Language APIs for Document Collections and Databases
In order to leverage all the wonderful things about generative AI, your teams need to start visualizing a future of intelligent agents working together on small tasks that are part of larger tasks. This means your organization needs to make its databases “agent ready” by binding natural language to APIs and database queries.
You can start this process by looking at the 100 most common questions your users ask about your database. Write them down and create a document for each one, including the parameters, the code used, and definitions. Then add these documents to your vector database. When a question comes in, you match it to the right document, extract the parameters, and execute the query. You would be surprised how often basic data dictionaries and data stewardship are all you need to accelerate the adoption of AI within your organization.
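A toy version of that matching step, with three documented questions instead of 100 and a crude word-overlap score standing in for a real vector database. All the SQL and parameter names here are invented for illustration:

```python
# Documented questions mapped to parameterized queries (illustrative only).
QUERY_DOCS = {
    "how many open tickets does a user have":
        "SELECT COUNT(*) FROM tickets WHERE user = :user AND status = 'open'",
    "what products did a customer buy last month":
        "SELECT product FROM orders WHERE customer = :customer "
        "AND order_date >= :month_start",
    "which stores are low on inventory":
        "SELECT store FROM inventory WHERE qty < reorder_point",
}

def overlap(a: str, b: str) -> float:
    """Crude similarity: fraction of shared words (Jaccard index)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

def match_query(question: str) -> str:
    """Match an incoming question to the most similar documented one."""
    best = max(QUERY_DOCS, key=lambda doc_q: overlap(question, doc_q))
    return QUERY_DOCS[best]

sql = match_query("How many open tickets does Dan have?")
print(sql)
```

A production version would replace `overlap` with embedding similarity from the vector database and would extract the parameters (here, the user "Dan") before executing the query.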
Recommendation #5: Build Foundations for Intelligent Autonomous Agents
I have been advocating building autonomous agents for almost 20 years. I worked with world-class people like Arun Batchu (now at Gartner) and the late Gary Berosik to imagine a world where intelligent agents could be given abstract descriptions in plain English and then scurry off, find the perfect gift for our spouse, and put the suggestions in our inbox by the next morning. We have never been more excited about the potential to quickly build flexible agents that make our world a better place for everyone.
Creating responsible intelligent agents is complex. We can’t just give an agent a task and say, “Go to it!”. We need to give it a cost and time budget and instructions on minimizing the impact on production systems. If agents lower the responsiveness of production systems for real users, they will be quickly shut down.
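A hedged sketch of such a budget guard: the agent receives cost and time budgets up front and stops before exceeding either. The tasks and per-task costs are invented for illustration:

```python
import time

def run_agent(tasks, max_cost, max_seconds):
    """Execute tasks until either the cost or the time budget runs out."""
    spent, done = 0.0, []
    start = time.monotonic()
    for name, cost in tasks:
        # Stop *before* exceeding either budget, not after.
        if spent + cost > max_cost:
            break
        if time.monotonic() - start > max_seconds:
            break
        spent += cost
        done.append(name)
    return done, spent

# Hypothetical subtasks with estimated dollar costs.
tasks = [("search flights", 0.40), ("compare hotels", 0.80),
         ("summarize options", 0.30)]
done, spent = run_agent(tasks, max_cost=1.00, max_seconds=5.0)
print(done, spent)
```

A real framework would also throttle query rates against production systems; the point is that the budget check happens before each action, so the agent can never overshoot.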
Conclusion: Direction vs. Distance
I usually pride myself on being able to summarize to senior leadership the key things they should consider when asking how AI will impact their business unit. I discuss topics like data science skills, machine learning progress, AI hardware acceleration, knowledge representation, semantic search, developer productivity, knowledge management, etc. Yet I always remember the important quote from the futurist Paul Saffo:
Never Mistake a Clear View for a Short Distance
— Paul Saffo
It is clear that the latest developments in generative AI have quickly altered the roadmap for IT organizations and society as a whole. No one could have predicted that a tool like AutoGPT would get over 100K stars on GitHub in its first 30 days. The pace of change is accelerating in unexpected ways, and key technologies are working together to compound the rate of innovation. That is the short distance that Paul refers to. The art world and IT strategy are both seeing huge shifts in their direction. This is the clear view that Paul references.
Despite our clear view, we must continue to focus on the direction of our strategy without being able to commit to specific calendar-driven deadlines. We need to do this because only our development teams can make accurate predictions on their development rates after they have adequate experience with these technologies. This is the Agile way.
Yet our inability to predict when AI will have a specific capability should not prevent us from moving in the right direction. Failing to do so is like a deer caught in the bright headlights of an approaching car: it freezes until it is too late to move. This is a disaster for any organization competing in technology today. Our leaders must be agile and supportive, and must clear away obstacles so our employees can embrace the productivity gains of generative AI.