Unlocking AECO Knowledge with GPT: Natural Language Queries for Document Management

By Dr. Jeff Chen, Director of Digital Transformation and George Broadbent, VP of Asset Management, Symetri

A Mountain of Documents

Architecture, Engineering, Construction, and Operations (AECO) stands as one of the most document-intensive industries in the world. From early design sketches to finalized contracts and comprehensive Operations and Maintenance (O&M) manuals, stakeholders are inundated with reams of paperwork and digital documents. (See Image 1) This paperwork, albeit essential, often slows down processes. Imagine an engineer looking for a specific clause in a 200-page contract or an operator trying to understand a certain aspect of a complex O&M manual. Such manual searches not only consume time but also increase the potential for error.

The Rise of a Textual Savant

Generative Pre-trained Transformer (GPT) has emerged as a game-changer in the field of artificial intelligence. Its foundational architecture has been heralded for its ability to comprehend, generate, and interact with human-like textual finesse. To truly understand GPT’s potential, imagine an assistant that has read virtually every book, article, and significant document up to a last training cut-off in 2021 (as in the case of GPT-4). It retains this vast sea of knowledge, ready to offer insights, generate texts, and answer a plethora of questions with a precision that is remarkably close to human intellect.

One of the standout features of GPT is its “zero-shot” or “few-shot” learning capabilities. For many tasks, GPT does not have to be extensively trained. Give it a set of instructions or a couple of examples, and it gets to work, generating relevant responses. This ability means it is versatile and can be adapted to numerous scenarios without heavy retraining.
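As a minimal sketch of few-shot prompting, the snippet below assembles a prompt from an instruction and two worked examples. The function name `build_few_shot_prompt`, the classification task, and the sample document titles are illustrative assumptions, not part of any GPT interface. The resulting string would be sent to whichever GPT service is in use.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, worked examples, and a new query into one prompt."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Document: {text}")
        lines.append(f"Category: {label}")
        lines.append("")
    # The final entry is left open so the model completes the category.
    lines.append(f"Document: {query}")
    lines.append("Category:")
    return "\n".join(lines)


# Hypothetical examples; real ones would come from a firm's own documents.
examples = [
    ("Chiller plant quarterly maintenance checklist", "O&M manual"),
    ("General conditions of the construction agreement", "Contract"),
]

prompt = build_few_shot_prompt(
    "Classify each AECO document title into a category.",
    examples,
    "Fire pump annual inspection procedure",
)
print(prompt)
```

Changing the task here means changing the instruction and the examples, not retraining the model.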

While GPT can be likened to a superhuman textual brain, it is crucial to remember its knowledge is not inexhaustible. Information created after 2021 is a blind spot for it. This means that GPT can provide a vast general knowledge base, and even industry-specific insights, only up to its last update. For real-time, contemporary data or post-2021 developments, a supplementary method becomes essential.

In the context of AECO, while GPT could provide general knowledge on industry standards, protocols, and best practices available up to 2021, newer project documents, recently established protocols, or contractual changes past this date would be out of its purview. Thus, there is an inherent need to marry GPT’s prowess with another technological solution to ensure continuous knowledge updates and relevance, and that is where embeddings come in.

Embeddings and Vector Databases

Embeddings, in the realm of digital technology, work like magic. They take complex, lengthy, and often hard-to-understand texts and convert them into vectors: numerical representations that are compact yet bursting with meaning. Each vector captures the essence, or soul, of its original text. Picture a fantastical library in which every document glows with a color of its own. The vectors are those glowing colors, each hue and shade representing a theme, topic, or sentiment.

Now, when a question arises, it is not matched word for word against the documents. Instead, the question itself is transformed into a radiant vector. The moment this happens, the library comes alive. The books (or documents) that resonate with the query light up, leading straight to the most relevant pieces of information. This real-time matching and guidance is orchestrated by what is known as a ‘vector database.’
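Stripped of the metaphor, the matching step compares the query vector with each stored document vector and returns the closest ones. The sketch below illustrates this under loud assumptions: `embed` is a toy word-count stand-in for a learned embedding model, `TinyVectorStore` is a hypothetical in-memory substitute for a real vector database, and the document titles are invented. A learned embedding would also capture meaning (for example, relating "climate" to "climates"), which this toy version does not.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: word counts. A real system would call a learned embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyVectorStore:
    """In-memory stand-in for a vector database (illustrative only)."""

    def __init__(self):
        self.items = []  # list of (text, vector) pairs

    def add(self, text):
        self.items.append((text, embed(text)))

    def search(self, query, top_k=2):
        """Return the top_k (score, text) pairs most similar to the query."""
        query_vec = embed(query)
        scored = [(cosine_similarity(query_vec, vec), text) for text, vec in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


store = TinyVectorStore()
store.add("Passive cooling design for a school in a tropical climate")
store.add("Seismic safety protocol for high-rise buildings")
store.add("Chiller plant operations and maintenance manual")

results = store.search("Which designs have been optimized for tropical climates?", top_k=1)
print(results[0][1])
```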

The magic deepens when this is applied to the AECO world. Picture an architect standing in this library, wondering, “Which designs have been optimized for tropical climates?” As the question forms, related project blueprints and design documents shimmer in response, ensuring the architect does not waste hours but gets instantaneous resources. (See Image 2)

In digital, real-world applications, this means that the vastness of AECO documentation — from initial drafts to finalized O&M manuals — can be transformed into a responsive, intuitive database. Questions from, “Show contracts that involve sustainable materials,” to “Where are the protocols for earthquake-resistant infrastructures?” get swift, precise answers, eliminating the tediousness of manual searches and ushering in an era of streamlined information retrieval.
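Once relevant passages have been retrieved, they can be handed to GPT together with the question so that the answer draws on current project documents and not only on training data. The following is a sketch under stated assumptions: `build_grounded_prompt`, the character budget, and the file names and excerpts are all hypothetical, and the prompt wording is one of many reasonable choices.

```python
def build_grounded_prompt(question, passages, max_chars=2000):
    """Combine retrieved passages and a question into one prompt for a language model."""
    context_lines = []
    used = 0
    for source, text in passages:
        entry = f"[{source}] {text}"
        if used + len(entry) > max_chars:
            break  # stop before the context grows past the budget
        context_lines.append(entry)
        used += len(entry)
    context = "\n".join(context_lines)
    return (
        "Answer the question using only the excerpts below. "
        "If the excerpts do not contain the answer, say so.\n\n"
        f"Excerpts:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical passages, shaped as a vector search might return them.
passages = [
    ("contract_A.pdf", "The contractor shall use recycled steel for all framing."),
    ("contract_B.pdf", "Timber shall be sourced from certified sustainable forests."),
]
prompt = build_grounded_prompt("Which contracts involve sustainable materials?", passages)
print(prompt)
```

Labeling each excerpt with its source lets the answer point back to the document it came from.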

In essence, embeddings and vector databases weave the tapestry of the future, where information is not just sought but is intuitively and vividly presented, ensuring that the AECO industry remains not just on the cutting edge but also marvelously efficient.

(Sidebar 1) Unlocking Potentials

Implementing GPT combined with embeddings offers the AECO sector unprecedented advantages:

  • Efficiency: Searching becomes instantaneous. A project manager could request, “Show all protocols for seismic safety in high-rise buildings,” and get immediate results.
  • Precision: Instead of sifting through irrelevant data, stakeholders receive only pertinent documents, minimizing information overload.
  • Learning: New employees can get up to speed quickly, asking questions about company protocols or past projects, and receiving precise answers.

(Sidebar 2) Use Cases:

  • Design Phase: Architects can effortlessly retrieve design standards or past project references that match current project specifications.
  • Construction: Contractors can instantly access material safety data sheets or machinery operation manuals without hands-on searches.
  • Operations: Facility managers can query specific O&M procedures, ensuring optimal building operations and safety compliance.

Boundaries of the System

Harnessing the power of embeddings and vector databases has undeniably brought a huge change in how vast amounts of information are accessed and processed. Yet, like all pioneering technologies, this method comes with its unique set of challenges.

First, there is the question of text chunking: how should the ideal size of text to be fed into the system for vector conversion be determined? If too small a slice is taken, the risk is losing context, making the resulting vector a poor representative of the actual content. Think of it as trying to understand the plot of a novel by reading a random paragraph: there is some information, but not the whole story. Conversely, if the text chunks are too large, the system is flooded with unnecessary data and also risks exceeding the token limit, especially when the text is further processed with GPT.
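One common way to balance these two risks is to split text into fixed-size chunks that overlap slightly, so that context at a boundary appears in both neighboring chunks. The sketch below is illustrative only: `chunk_words` is a hypothetical helper, the sizes are arbitrary defaults, and it counts words as a rough proxy for tokens. Actual token counts depend on the tokenizer of the model in use.

```python
def chunk_words(text, chunk_size=200, overlap=40):
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


# Ten placeholder words, split into chunks of four with one word of overlap.
sample = " ".join(f"word{i}" for i in range(10))
chunks = chunk_words(sample, chunk_size=4, overlap=1)
print(chunks)
```

Larger values of `chunk_size` preserve more context per vector, and smaller values keep each chunk focused. The right setting is usually found by testing against real queries.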

Second, while embeddings can determine thematic relevance, they are not inherently designed for precise data retrieval. They shine when answering queries like, “How can BIM benefit our agency?”, and pulling insights from presentations, roadmaps, and guiding documents. However, for more data-specific questions such as, “How many BIM projects were completed last year?”, the embedding method might falter. It is a matter of qualitative versus quantitative data retrieval: while embeddings excel at the former, they are less adept at the latter.
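One possible way to handle such counting questions, offered here as an assumption and not as something this article prescribes, is to answer them from structured project records instead of from embeddings. In the sketch below, the `Project` record, its fields, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    uses_bim: bool
    completed_year: int


def count_bim_projects(projects, year):
    """Exact count from structured records, with no similarity search involved."""
    return sum(1 for p in projects if p.uses_bim and p.completed_year == year)


# Hypothetical records; in practice these would come from a project database.
projects = [
    Project("Bridge rehabilitation", True, 2022),
    Project("Transit hub", True, 2022),
    Project("Parking structure", False, 2022),
    Project("Library renovation", True, 2021),
]
print(count_bim_projects(projects, 2022))  # prints 2
```

A filter over records gives an exact figure, whereas a similarity search returns only the passages that look most related to the question.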

To navigate these challenges, it is essential to see embeddings and vector databases as evolving tools. As the AECO industry and its informational needs grow, so too will the sophistication and adaptability of these technologies. Balancing the size of text chunks, refining the vector database for more precise queries, and integrating other AI systems can further optimize and refine this promising avenue of information management.

A Bright Horizon 

Navigating the vastness of AECO documentation has been a historic challenge. But with the integration of GPT and embeddings, the industry stands on the brink of an information revolution. The potential to instantly access the right knowledge not only streamlines processes but also fosters a culture of informed decision making. While the system has its limitations, its introduction marks a significant step forward, setting the stage for a more informed, efficient, and agile AECO industry.

Dr. Jeff Chen, Ph.D., LEED AP is Director of Digital Transformation, Symetri. Dr. Chen leads digital technology integration services for all aspects of client businesses to drive efficiency, reduce environmental impacts, and increase sustainability.

George Broadbent is Vice President of Asset Management, Symetri. Prior to his current role, George was Director of Asset Management. He has more than 25 years of diversified professional experience in Asset Management, Electronic Content Management, System Architecture and Vital Records Planning and Management.