I want to share a few observations from relying on LLMs that got me into a sticky situation.
Quite recently I have been leaning a bit too much on LLMs, I would say.
My usual go-to for a search is now one of these resource-draining beasts. Most of the time we get back a relatively clean and concise answer (in my case anyway, as I use it for small bits of code rather than a boatload of functionality). Whereas before it would be the sprawl of the internet via Google, which seems so cumbersome now. Would you agree?
In the beginning…
I was tasked with setting up a replacement for OntoText’s GraphDB in the form of AWS’s Neptune.
The team and I had a discussion, set about investigating what was required, and had a perusal of the AWS docs.
After some time we got back together, discussed our findings and set about implementing a Neptune DB.
It went relatively smoothly: we set up the infra – permissions, roles, access, the DB itself – using the AWS CDK on the first day. A sketch of the cluster setup is below.
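For anyone curious, the Neptune side of that CDK code was along these lines. This is a minimal sketch, assuming the experimental @aws-cdk/aws-neptune-alpha module; the construct names and instance size are my own placeholders:

```typescript
import * as cdk from "aws-cdk-lib";
import * as ec2 from "aws-cdk-lib/aws-ec2";
// Neptune's CDK support lives in a separate alpha module.
import * as neptune from "@aws-cdk/aws-neptune-alpha";

export class GraphStack extends cdk.Stack {
  constructor(scope: cdk.App, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // Neptune is only reachable from inside a VPC.
    const vpc = new ec2.Vpc(this, "GraphVpc", { maxAzs: 2 });

    // The cluster itself; the instance size is a placeholder.
    const cluster = new neptune.DatabaseCluster(this, "NeptuneDb", {
      vpc,
      instanceType: neptune.InstanceType.T3_MEDIUM,
    });

    // Let anything inside the VPC (e.g. a Lambda) reach the cluster's port.
    cluster.connections.allowDefaultPortFrom(
      ec2.Peer.ipv4(vpc.vpcCidrBlock),
      "Allow access from within the VPC"
    );
  }
}
```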
So far so good
Now here’s the kicker: we needed to implement free-text search. As with most graph DBs, there isn’t a search directly available. I did think this odd, being new to the graph DB paradigm and used to typical SQL DBs.
Apparently the way to go is OpenSearch: an index is created and is used whenever a search happens.
The docs state that Neptune is integrated with OpenSearch, so we thought it would be straightforward and our run of success would continue. The docs also read somewhat like a marketing plea: OpenSearch is merely a building block that can fit into many of AWS’s offerings – Dynamo, Kinesis, SQS and so on. This was a little overwhelming.
Overwhelming
LLM to the rescue – give us an example of how to set this up. We were to-ing and fro-ing the entire day. Nothing was working, but we were learning as we went.
The drain of the mobbing session was kicking in and, with everyone feeling fatigued, we decided to call it there.
On the plus side, we had a Neptune DB cluster and an OpenSearch Serverless service ready to go.
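For completeness, the serverless side in CDK looked roughly like this. Again a sketch with my own names, slotting into the same stack constructor as before; OpenSearch Serverless requires an encryption policy before a collection can be created, and the network and data-access policies it also needs are elided here:

```typescript
import * as oss from "aws-cdk-lib/aws-opensearchserverless";

// Inside the same Stack constructor as the Neptune cluster above.
// An encryption policy must exist before the collection is created.
const encryptionPolicy = new oss.CfnSecurityPolicy(this, "SearchEncryption", {
  name: "graph-search-encryption",
  type: "encryption",
  policy: JSON.stringify({
    Rules: [{ ResourceType: "collection", Resource: ["collection/graph-search"] }],
    AWSOwnedKey: true,
  }),
});

// The serverless collection that will hold the full-text index.
const collection = new oss.CfnCollection(this, "SearchCollection", {
  name: "graph-search",
  type: "SEARCH",
});
collection.addDependency(encryptionPolicy);
```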
With some time left at the end of the day, I was feeling I would not be defeated by this. I changed tack with my prompts and started at a higher level than I usually do.
The response it gave back was gold! I was buzzing! It had satisfied my every need!
It had addressed all the issues we had faced earlier in the day and gave a high-level overview of the steps required to achieve the full-text search.
I finished the day in high spirits, ready to make good progress the next.
Ready for anything
Waking up the next day, I was excited at the prospects to come. I had a good high-level plan for the day to get this stood up and working. I posted with excitement on Slack, outlined the plan and got buy-in from the team.
Today was going to be a good day. We had a high-level plan: call a Neptune endpoint, set up an index, and set up the query to delegate the search to OpenSearch and return the corresponding graph.
We had a Lambda in situ that we were extending to add the index. The example the LLM gave needed some tweaking from a cURL request to a signed AWS HTTP request, something like the sketch below.
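The gist of that tweak, as best I can sketch it: OpenSearch Serverless requests have to be SigV4-signed against the service name aoss, so the cURL call becomes a signed HttpRequest. The endpoint, index name and mapping here are all placeholders:

```typescript
import { SignatureV4 } from "@smithy/signature-v4";
import { Sha256 } from "@aws-crypto/sha256-js";
import { HttpRequest } from "@smithy/protocol-http";
import { defaultProvider } from "@aws-sdk/credential-provider-node";

// Placeholder endpoint for the OpenSearch Serverless collection.
const HOST = "my-collection-id.eu-west-1.aoss.amazonaws.com";

// OpenSearch Serverless requests are signed against the "aoss" service.
const signer = new SignatureV4({
  credentials: defaultProvider(),
  region: "eu-west-1",
  service: "aoss",
  sha256: Sha256,
});

export async function createIndex(indexName: string): Promise<void> {
  // An illustrative single-field mapping.
  const body = JSON.stringify({
    mappings: { properties: { name: { type: "text" } } },
  });

  // The same PUT the cURL example made, rebuilt as a signable request.
  const request = new HttpRequest({
    method: "PUT",
    protocol: "https:",
    hostname: HOST,
    path: `/${indexName}`,
    headers: { "content-type": "application/json", host: HOST },
    body,
  });

  const signed = await signer.sign(request);
  const res = await fetch(`https://${HOST}/${indexName}`, {
    method: signed.method,
    headers: signed.headers as Record<string, string>,
    body,
  });
  if (!res.ok) throw new Error(`Index creation failed: ${res.status}`);
}
```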
With expectations running high, we triggered the HTTP call to make the magic happen – and happen it did not.
We got some errors here and there, but then it became clear.
The endpoint does not even exist.
I was so embarrassed! I had gone all in on this being a valid solution.
What is going on here
The word fuming was an understatement, so I went back to the LLM and gave it a roasting. I mentioned that it had given me wrong data, that I had got stakeholder buy-in for it, and asked how I was supposed to explain this now, for dramatic effect.
I could not believe how it wormed its way out of it. It claimed that the endpoints it used were from POCs, sandboxes and pre-release codebases not available in the current production version. It then suggested I could say that to the team – can you believe it!
Then I copied and pasted a sample query from the docs and asked how to implement this SPARQL query for the full-text search.
Its response was, again, that this was not valid in the current version of the production code. I checked the version I was using, asked which version it was checking against, and there were patch version differences.
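For reference, the pattern the docs describe (as I now understand it) delegates the text match to OpenSearch through a SPARQL SERVICE clause, posted to Neptune’s standard SPARQL endpoint. A rough sketch; the hostnames and field IRI are placeholders, and the exact config vocabulary is worth verifying against the docs for your engine version:

```typescript
// Placeholder Neptune SPARQL endpoint.
const NEPTUNE_SPARQL =
  "https://my-cluster.cluster-xxxx.eu-west-1.neptune.amazonaws.com:8182/sparql";

// Neptune hands the text match off to OpenSearch via a SERVICE clause.
const query = `
  PREFIX neptune-fts: <http://aws.amazon.com/neptune/vocab/v01/services/fts#>
  SELECT ?res WHERE {
    SERVICE neptune-fts:search {
      neptune-fts:config neptune-fts:endpoint 'https://my-search-endpoint.eu-west-1.es.amazonaws.com' .
      neptune-fts:config neptune-fts:queryType 'match' .
      neptune-fts:config neptune-fts:field <http://example.org/name> .
      neptune-fts:config neptune-fts:query 'michael' .
      neptune-fts:config neptune-fts:return ?res .
    }
  }`;

// Standard SPARQL-over-HTTP: POST the query form-encoded.
export async function fullTextSearch(): Promise<unknown> {
  const res = await fetch(NEPTUNE_SPARQL, {
    method: "POST",
    headers: { "content-type": "application/x-www-form-urlencoded" },
    body: new URLSearchParams({ query }),
  });
  if (!res.ok) throw new Error(`SPARQL query failed: ${res.status}`);
  return res.json();
}
```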
At this stage I was very disappointed in myself for trusting the LLM and for how confidently I had pursued this avenue with the team.
On the plus side, the complicated docs and how to integrate these services are starting to make a little more sense now.
Conclusion
Seeing is believing. Like a figure of authority dictating their opinion as absolute, I succumbed to the conviction that this hallucination was true.
It was as if the LLM had summarised the docs, ticked all the boxes of my prompt and hallucinated an all-encompassing API sprawling across multiple repos.
A lesson has been learnt here. Going forwards I’m sure to double-check the facts and not blurt out an unsupported strategy.
On reflection, I was trying to take a shortcut and expecting an easy solution instead of studying the production service’s actual capabilities.
Will I continue to use an LLM to carry out this work? Most likely. Will I trust its every suggestion? Certainly not.