Blog

Beyond the Chatbot: AI and the Future of Digital Banking

Written by Cheryl Brown | 14 May, 2026

In the past couple of years, AI conversations at financial institutions have changed from "should we?" to "where should we next?" Adam Blue and Corey Gross discuss how banks and credit unions can move from broad experimentation to focused AI investments that align with their strategy in areas like operational workflows, fraud prevention, and support resolution.


Related Links

AI for Everyone, Q2

The 36th Chamber of Shaolin

Transcript

Adam Blue

Hey, everyone. Welcome to Cut to Context, where we are going to get real about AI today. Joining me today is Corey Gross, VP in the Product organization here at Q2, who has spent a ton of time working with customers on AI initiatives, on what's working and what's not working in AI, and really getting deep into it.

As we get going here, Corey, I'm reminded of my uncle who lived in Texas for years. I used to go visit him as a kid, and he had an odd kind of mid-‘70s Texas wisdom, and he would tell me repeatedly, "There's really only two kinds of people in the world. People that think there are two kinds of people in the world, and people that are smart enough to know better." Of course, in and of itself an absolute paradox. He somehow made it into like a Country Joe & the Fish kind of Zen koan sort of thing.

I feel like today we've got two kinds of people in the world: people who are pretty sure that AI is going to change everything so fundamentally that we either enter a utopia or a hellish, complete disaster-scape of AI destroying humanity, and people who think that sounds a little crazy. So, I think I know where you come down, but as we jump in here, maybe take that notion and think about what it really means when we talk about the future of digital banking and AI.

Corey Gross

Yeah, loaded question. I think what people have been trained to think is: let's take the generative AI chatbot and plug it into digital banking, and it becomes the hub, instead of any existing UI, for retrieving information about your account balances, your product holdings, looking up transactions, etc. And I think you and I are of the same mind that that doesn't have the muscle-memory quality that most folks want from a digital banking UI.

But I think where the immediate gains are when it comes to AI and banking is more operational transformation. A lot of back-office functions with very complex, multisystem workflows are ripe for agentic innovation. How do you take something that historically requires a lot of context switching between multiple systems and compress it from days to hours, or hours to minutes? That's where I see the immediate value in an agentic digital banking system.

From a user perspective, I think there's no doubt that people want to be able to match the context they're building in these LLM solutions with their banking data, and I think we also share the belief that it's the customer's data. They own it. And so they want to be able to get advice that feels personalized and tailored to them. I do believe there is a place for AI to assist the customer in making more confident financial decisions and receiving information they would otherwise have to spend a lot of time digging for. But right away, I feel like the problem space ripe for disruption is operational.

Adam Blue

Yeah. Interesting. I think that's a great take. So, given the focus on operational kind of opportunities right now, where do you think there are some examples of AI really earning its keep in operations in the back office in delivering better digital experience?

Corey Gross

Yeah, for sure. When I came into Q2, I didn't just look at AI as the bright, shiny tool from an LLM perspective. I looked at AI as the spectrum of capabilities, from rules and machine learning all the way through. And historically at Q2, and elsewhere in banking, you've seen machine learning play a very important role when it comes to personalization. So, basically, you take a whole lot of data, and trend analysis becomes the killer differentiation. When you have this much data, how do you make big data small, so that you can make decisions that appear more personal and therefore get a stronger response to that information? And then when it comes to fraud, you're looking at a whole lot of data, whether it's behavioral data from logins, or transactions, or what have you, and how do you use machine learning to predict when a potential negative outcome might occur, or has possibly occurred?

And you're seeing that carry through in the generative, and now agentic age where I think a great example of a use case is being able to monitor user activity, and be able to detect anomalies. And when you see anomalous user behavior, you can step in, and make sure that you can stop a session before money leaves the bank, or something else bad happens. So, I think those are good examples of where AI has earned its keep, and you'll see it continue to develop because it's the most high leverage area for a bank.
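
(The anomaly-detection pattern described above can be sketched in a few lines. The per-session feature, the numbers, and the threshold here are purely illustrative; production fraud models are far richer, so treat this as a minimal sketch of the idea, not an implementation.)

```python
from statistics import mean, stdev

def anomaly_score(history, current):
    """How many standard deviations the current session sits from this user's history."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return 0.0
    return abs(current - mu) / sigma

# Hypothetical behavioral feature: seconds between login and first transfer.
past_sessions = [42.0, 38.5, 45.2, 40.1, 39.8, 43.3]
FLAG_THRESHOLD = 3.0

score = anomaly_score(past_sessions, 2.1)  # a suspiciously fast transfer
if score > FLAG_THRESHOLD:
    print("anomalous session: hold it for review before money leaves the bank")
```

The point is the shape of the workflow: score behavior against the user's own baseline, and step in before money moves when the score crosses a line.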

Adam Blue

Great. Yeah, I agree. I think those are great use cases. It's interesting if you read a lot of press, a lot of media. We live in a world where Twitter, or X, as I guess it's officially called now, is in some ways as valid a source of news about cutting-edge technology as CNN, or The Register, or anything else. I see two really strong tracks. There are individual people who say, "I got up this morning, I spent $4,000 in tokens writing code, I reinvented CRM in a morning, and the world has changed forever." And I think there's some truth in that. People are doing amazing stuff. And then I see broader analysis, whether it's from Harvard Business Review or some of the bigger think tanks, that says, "Yeah, AI is here, but it doesn't seem to be having any real impact on profitability, ability to accrete revenue, or ability to serve customers." And I'm just wondering, the world is a big and wonderful place: how could it be that both of these things are true at the same time? What's the story we can take away from such widely varying reports about how AI is working, from individuals on one hand and research on the other? Where's your head at?

Corey Gross

Yeah. It's so funny you bring up that meme, because there's the one I saw: "I rewrote Salesforce in four hours and a couple thousand bucks." But then there's also, "OpenClaw replaced all of my subscriptions, so I've gone from paying $500 a month in software to $1,300 a month in API costs, and 15 extra hours fixing YAML." So, the reality is both are true, and my refrain is: it depends. It always starts with what is the problem we're trying to solve.

There is a place for software. There is a place for SaaS, because it's really good at delivering repeatable results for the intended use case in a relatively inexpensive way, and it shifts the burden of building, maintaining, and upgrading software from the organization using it to the supplier. But then there are use cases where, holy crap, that's a lot of money I just spent on a SaaS product, and I don't need all the highfalutin stuff and features that come with it. What I really need are these tools, these pieces, that help smooth out this workflow I have.

And so I think the nuance is always in the middle of those two extremes, and for me it always starts with: what's the job, and what's the right tool for that job? Everyone's now jumping to generative AI to solve a lot of these problems, when some of the perceived bugs of generative AI, or LLMs, are actually the features. The feature is the randomness. But if you misapply that randomness to something that requires precision, you're going to get a bad outcome, and probably a more expensive one, not just in the consequences of a bad result, but in the actual cost of operating and maintaining that service.

So, I come back to: what's the problem we're trying to solve? What's the tool best suited to solve it? And then, what's more economical for the person or the organization: to build that piece of software, or that bundle of software, themselves, or to outsource it, because what you're actually optimizing for is a solution where risk management is the thing you're really spending the money on?

Have a third party manage the compliance, the maintenance, the updates, and the support apparatus required to ensure that all of your users have the right outlet if there are any issues, and go from there. I think there's this rush to AI-everything because of the perception that, "my 9-year-old built a Claude Code weather app in nine minutes, so if I'm a staff engineer, I should be able to rebuild an entire stack." But there's a lot more in between those extremes.

Adam Blue

Yeah, yeah. And I think building things fast is amazing, but sometimes it can feel a little bit like the early 2000s, where it was just, "Let's have bacon on everything." Everything had bacon in it for the better part of a year. And at some point it's like, that is maybe enough bacon.

So, maybe give us a couple examples, Corey, of problems where the nondeterminism of AI is actually really interesting. To some extent it's a little like painting with watercolors versus drawing with colored pencils: it's just harder to manage and harder to control. Where are some use cases where that actually ends up being an asset for an FI, as opposed to a liability?

Corey Gross

Yeah. Right now you've seen financial advisors, where you want a two-way conversation to think something through ... well, let me separate customer-facing solutions from the internal stuff. Most everyone I know is using the randomness, the generative solutions, for a lot of their creative writing and internal research, going down the Claude or ChatGPT rabbit hole to accelerate their learning, and also to accelerate the collation of their thoughts and ideas into something that can be presented and receive feedback, so that you can iterate and get to a working version of your work product quicker than you could before. That's where you want that spark of creativity: to break the cold-start problem of standing in front of a blank screen for a whole lot of time, feeling like you're wasting your life and your hours. I think that is the no-brainer value out of that spark of randomness.

From an end-user perspective, think of the sometimes frustrating interactions you can have with deterministic solutions, where you get an ask and an answer, but there's more nuance, more flavor, to your inquiry. Financial advisory is something where you want to provide more context: I'm married, I've got a family of four, and we want to travel to such and such a place. This is what my financial profile looks like. This is my cashflow situation. How do I accomplish this goal? Give me options, and let me iterate with you and talk that out into something that makes sense for you and resonates with you, versus just a spreadsheet, or a simple output of, here's what you should do. Because that feels cold, it feels distant, and it doesn't feel personalized to you.

So, whenever it comes to personalization, you want a little bit more of that randomness, because even though it isn't a person, people trust people; people trust the expertise and taste of others. And if you feel like you're able to have that back and forth, it feels like it's getting closer to recommending something based on your proclivities or your particular situation.

Adam Blue

Yeah. It's interesting you bring up "cold" as a term, because I think there can be some simulacrum of warmth in an interaction with an LLM, which is interesting. I had some work to do, and I was using that thought-partner pattern, right? Where you don't ask the LLM for the answer; you propose something, you get feedback, you iterate. I find it pretty useful for working on visions for things. And so I used ChatGPT Enterprise, and it gave me usable, good answers. And then I used our internal tool, Kraglin, which is trained on a whole lot of very specific domain expertise here at Q2. And it was really interesting, the difference in the interactions between the two, just based on the context and what the engine had access to.

They both use foundation models; they're relatively similar. And the overall output from Kraglin, because I was working on something in the Q2 realm, I think was a little better, but there was still value in the ChatGPT piece as well. I think it just underscores that the nondeterminism ... randomness is a little too strong ... the fuzziness of an LLM implementation is something you can make work for you in a lot of interesting ways.

Corey Gross

That's right. I think people are just looking for LLMs to be the cheat code for the answer, but really, you said it best, they're the work partner, and you can use multiple kinds of solutions, whether it's Kraglin for us, and ChatGPT, or whatever. I have a friend who uses this concept of braiding. You start with LLM A, which has very generalized context and no domain-specific knowledge. It gives you an idea. You bring that into your domain-specific-knowledge LLM, and then you bounce them back and forth, and you probably get to an answer that feels more right to you, because it's like soliciting feedback from multiple people: the layperson but also the expert.
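
(The braiding loop described here can be sketched generically. The two models are stubbed as plain callables for illustration; in practice they would wrap whatever LLM APIs you actually use, and the prompt format is hypothetical.)

```python
def braid(general, specialist, question, rounds=2):
    """Alternate a general-purpose model and a domain-specific one,
    feeding each one's draft to the other as context to refine."""
    draft = general(question)
    for _ in range(rounds):
        draft = specialist(f"{question}\n\nDraft to refine:\n{draft}")
        draft = general(f"{question}\n\nDraft to refine:\n{draft}")
    return draft

# Stub "models" for illustration only; each just tags the question it was asked.
def general(prompt):
    return f"[general take] {prompt.splitlines()[0]}"

def specialist(prompt):
    return f"[domain take] {prompt.splitlines()[0]}"

answer = braid(general, specialist, "How should we stage this rollout?")
print(answer)
```

The structure is the point: each pass carries the other model's draft along as context, rather than running the two models in parallel and merging at the end.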

Adam Blue

That's really interesting. I'll try that. I kind of used them in parallel, which felt productive, but it had not occurred to me to cross the streams in "Ghostbusters" parlance. I will give that …

Corey Gross

Now you're speaking my language.

Adam Blue

I will give that a shot. Make a prediction that will almost certainly be wrong: what does agentic banking really look like for an account holder? What do you think it is, or could be?

Corey Gross

Yeah, exactly. The proclamation, or vision, that will almost certainly be incorrect. With that proviso out of the way, I think we're entering an era where openness, which is something we've invested in very heavily, is going to be prized more than ever before, because no single organization is going to think up the exhaustive list of potential use cases for an agent, or the different things you would want an agent to do.

And so to me, agentic banking is enabling trusted third parties: building the right orchestration, the right safety nets, the guardrails, that system of trust, and allowing trusted third parties to interact with digital banking to accomplish very bespoke workflows. We've talked about the example of an agent that could go inside of digital banking and fetch your tax documents so that it can prepare your tax return in an instant, instead of having a human go through a bunch of user interface screens, download the package, and then upload it into a third-party system, etc.

I think it's that openness that allows the user to select the agents appropriate for them, so they can accomplish their job as fast as possible. Many of those agents will be authored by Q2, because we're probably in the best position to write agents that interact with our systems, but there are going to be third parties, invited into our ecosystem through Innovation Studio and the marketplace, whose workflows will cover all kinds of use cases.

And there will be FIs that want their brand expressed, with use cases very specific to the workflows they've designed, executed by agents they author. So what agentic banking furthers is that choice aspect, for both the institution and the user, be they a retail customer, small business, or commercial, because I think the use cases in each of those domains are going to be larger than any individual company can build.

Adam Blue

Yeah, yeah. I can really see that evolving, and the tax thing is such a great example, because it's a lot easier if you do it a little bit all year long. If you think about the domain problem of your taxes, you have an extraordinarily complex set of laws and regulations that you have to map the story of your life over a calendar year against. And the way an LLM works, the way it works on language instead of working on data, those are two totally different things, right? The notion of language and the notion of data are just so radically different when you dig into them as concepts. But being able to operate on the story of your financial life for a given calendar year, and then map that in a discrete way against the set of rules for filling out your taxes, is such a great use case, and it's so hard to do procedurally in an effective way. So I think there's a ton of potential there for that kind of thing.

So, one of the challenges we run into, then, for all the value of this technology, which is really fascinating, and for the way it helps us keep problems in the space of natural language instead of boiling them down. There's nothing so joyless in life as having a conversation with someone and then mapping it into a discrete set of data elements that go into a table in a database. You lose everything interesting about what happened. It's like watching a basketball game versus just seeing the final score and who had how many points. One is not a substitute for the other.

But even though we can operate in that way now, and LLMs give us the capacity not to lose the story underneath the data, I think it's tough to measure: are we doing a good job? Are we doing a bad job? Is this an improvement? What happened when I changed out the foundation model? So, maybe talk a little bit about what you've learned around measurement, whether it's measuring outcomes, or model effectiveness, or drift. Just take us through that.

Corey Gross

Man, that could be the entire subject of the follow-up conversation to this one. You've got to build what you intend to measure into the vision, or the objective, for the solution you're designing. Too often we treat KPIs or measurement as an afterthought to the thing we've built, but it has to be built into the design of the solution. A good example is what we're doing with Q2 Assistant, where we have a very clear objective: we want to make the customer support process more effective, both for the person we're trying to solve a problem for, the account holder, and for the agent trying to resolve the dispute or issue as quickly as possible. So there are already some built-in measurements, some metrics we can use to determine whether the solution is more effective than the traditional way of handling it.

And so build that in from the second the first prompt is submitted until the conversation is terminated and the issue is resolved, and then you can A/B test that against … so time-to-resolution, in this case, is an important metric. Also the effort it took to resolve the dispute: there's the time elapsed from when the customer logged the issue, or the ticket, to when they get the communication back that it's resolved, but there's also the human time it took to actually complete those steps. Then there are escalations, right? Think about all the different people in any case-handling process who need to be included. What is the escalation rate? Are we reducing the number of escalations?

And this is just one use case in one domain, but it can be anything, down to false positives. When we talk about the fraud domain, a big problem is that pendulum between safety and customer experience. Oftentimes, when we index too hard toward safety, you disrupt the customer experience, or the operator experience, and they have to deal with a whole slew of false positives. So how can you not just increase safety, but bring the false-positive rate down as well? There are ways we can measure a solution's effectiveness on those dimensions.

And then, in terms of the evaluation gates you brought up, that's a whole conversation. But when we think about evals, my mind goes first to hallucinations, right? What is the risk that I put a prompt into an AI solution and get back nonsense, or, worse, information that appears authentic but isn't based in any real fact? And so we write evals so we can measure whether the answer we received back was a positive, correct response based on the prompt we gave it.
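
(The support metrics discussed above, time-to-resolution and escalation rate, reduce to simple arithmetic over case records. The `Ticket` fields below are hypothetical stand-ins for whatever a real case-handling system stores; this is a sketch of the measurement, not of any Q2 system.)

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Ticket:
    opened_at: float    # epoch seconds when the first prompt was submitted
    resolved_at: float  # epoch seconds when the issue was resolved
    escalated: bool     # did the case require a human escalation?

def support_metrics(tickets):
    """Mean time-to-resolution (seconds) and escalation rate for a batch of cases."""
    ttr = mean(t.resolved_at - t.opened_at for t in tickets)
    escalation_rate = sum(t.escalated for t in tickets) / len(tickets)
    return ttr, escalation_rate

# A/B comparison: assistant-handled cases vs. a traditionally handled baseline.
assistant = [Ticket(0, 300, False), Ticket(0, 420, False), Ticket(0, 900, True)]
baseline = [Ticket(0, 3600, False), Ticket(0, 7200, True)]
print(support_metrics(assistant))
print(support_metrics(baseline))
```

The key design point from the conversation survives even at this scale: the metrics are defined before the pilot runs, so the A/B comparison is built in rather than bolted on.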

Adam Blue

I think that's great, Corey. Those are great ways to approach it. What we're finding on the very technical side, like on Jesse's team, where they're kind of at the front edge, is that the ability to measure, the metrics, and the ability to evaluate what's changing and what we're doing are just as important as understanding the underlying capabilities of the LLM and the technology itself.

All right, last question for today. You spend a lot of time talking with our financial institution customers, and more importantly, talking with people at banks, and credit unions that are really just trying to do their job of helping their communities grow. What are you hearing from them that they would like to have or that they're excited about with respect to AI?

Corey Gross

It's also a big topic. You could recall, a couple of years back, when we just started talking about AI in earnest at CONNECT, a lot of FIs expressed a lot of reservation, a lot of consternation internally, about whether they should invest in this, whether they should be having the conversations, whether it was too early. They were concerned about all the things folks were talking about on Bloomberg and Business Insider: hallucinations, IP leakage, data leakage, etc. And it just felt like the risk profile was way too high for the bank to seriously investigate or consider. But here we are, just a couple of years later, and it's a 180, right? We're talking with customers and partners, boards and executive management teams, and they're all varying degrees of all-in. But I think the challenge now is where to start, where to start in earnest, I mean.

There aren't many FIs I come across these days that don't have some kind of AI tool in production, either built internally or procured from a third party, whether it's a fraud solution or something else, and a lot of them are running a lot of POCs with different vendors, whether it's us or other partners of theirs. And the question now becomes where to place their bets, because it's a lot even for a large technology company to run 35 POCs in parallel and hope those filter down to something valuable. The cost is too high, not just in people, but in the infrastructure cost of executing on these projects, to have to cast that wide a net. So, I think what they're looking for now is direction on strategy: what are the right strategic bets to place, so they can whittle those 20 POCs down to five, get those five to more measurable, outcomes-based pilots, and find something they want to seriously invest in and scale?

And so, thematically, I think the first one is investing in foundational tools to enable their people, because as their people become more familiar with using the tools, whether it's ChatGPT Enterprise, or Claude Code, or Claude Cowork, you create an organization that understands what it's building with. It's kind of like the shift from steam to electricity: you're talking about transformation at a human level in terms of how people work.

And then from there, a lot of them are looking at, OK, it's not just about buying or building; it's about really designing a workflow suited for the AI age, or the agentic age. And the high-leverage workflows, no surprise, are usually in fraud, in back-office operations, in improving the customer experience, in improving the efficiency of executing a lot of these daily, hourly workflows that can otherwise take up a lot of human effort. So I think they're at the stage where they're bought into the value and the potential of AI. It's really about sharpening the pen on which bets to make that are most aligned with their winning strategy, which we can consult with them about. We can advise based on what we see in the industry, but I truly believe that every FI is going to have a different strat ... They should have a different strategy for how they want to win and compete, and we're just one partner that helps them get there, but they have to think more holistically about what bets they want to make.

Adam Blue

Yeah, yeah. That's an interesting takeaway: maybe it's more important that the AI choices a financial institution makes are really aligned with its strategy for serving account holders and executing against its mission than that there's some universal set of top-three AI projects everybody should be pursuing.

Corey Gross

And I think we've historically seen, in each of these technology waves, that's where folks get it wrong. They read a McKinsey report, or an XYZ analyst report or thought piece, and they think there's a one-size-fits-all approach to big data, or a one-size-fits-all approach to cloud. Really, the nuance is: what is the mission and vision of your institution? What is the strategy you've built to get you there? That should inform which initiatives you prioritize when it comes to AI.

Adam Blue

All right, fantastic. Well, thanks for the time today, Corey. I think this was really great, and I think people are going to enjoy it. Here at the very end, what we like to do is recommend a companion piece of media, or content, or something to think about. So, here's a deep cut. I bet you've seen this film: "The 36th Chamber of Shaolin," from 1978, starring Gordon Liu and produced by the incomparable Shaw Brothers. It is a fantastic film. I watched it again recently; I'd seen it three or four times as a kid on Saturday afternoons. It's a classic of mid- to late-'70s martial-arts exploitation cinema, but it's a real film. There's a fantastic story. There's a little bit of history of the oppression of the Chinese people during the Manchu Dynasty. There is a lot of very loud kung fu, which makes the film very entertaining, but there's a message there, which I won't spoil, that's really fantastic. So watch the podcast today, think about it, and then watch "The 36th Chamber of Shaolin," turn it up loud, and rip off the knob. Thanks for being on Cut to Context today, Corey. Appreciate it.

Corey Gross

Anytime. Thank you.