June 17, 2021

To CAPTCHA or not to CAPTCHA?

Amongst the varied and interesting topics in the field of digital fraud detection and identity proofing that I discuss with my clients is bot detection and mitigation. And on this topic, I can usually guarantee a lively debate when we get to CAPTCHAs. Clients tend to divide quite cleanly into two camps: for and against. And those in the latter camp often have a rather visceral reaction to the very notion of using a CAPTCHA and are militantly against it.

I find the topic of CAPTCHAs supremely fascinating. Let’s take a quick detour into the stages of their evolution (if you don’t want the history lesson, just skip to the next section).
The evolution of CAPTCHA

Stage 1. CAPTCHAs first came into use in the mid-to-late 1990s, most prominently at the search engine AltaVista, which had a huge spam problem and introduced the concept of the warped letters.
Stage 2. Next came reCAPTCHA, which presented a pair of words and marked the start of using this process to solve AI problems, in this case book scanning. The first word was known to the system; the second was unknown to the system and came from a book scan. Once a number of people had typed the same answer for the second word, it was resolved for the book scan. Google later acquired reCAPTCHA.
Stage 3. reCAPTCHA v2 then came along, which involves ticking a box to say you're not a robot, Google running checks on your user behaviour, and then a potential secondary check: the now-infamous selection of images. This is another good example of CAPTCHAs being used to solve machine-vision problems, in this case in the traffic environment, one would assume for self-driving cars. Given the academic and commercial focus on this particular machine-vision problem, bad actors have been able to leverage a wealth of tools to build systems that defeat these challenges. In fact, given that Google Cloud sells machine-learning systems, it's entirely likely that some of Google's servers are creating CAPTCHAs while others are breaking them.
Stage 4. Google has since rolled out reCAPTCHA v3, which removes the visible challenge entirely, scoring each request in the background, and appears to use the same kind of tools that most bot detection vendors do in terms of user behaviour analysis.
Stage 5. But Google is not the only player in town. CAPTCHAs continue to evolve, with many vendors investing heavily in their own versions. Some claim exceptionally high solve rates for humans and low solve rates for bots. The approach some have taken is to focus on machine-vision challenges that are easy for humans to solve yet have no commercial applications, so that, unlike the self-driving car applications, bad actors can't leverage academic, commercial and open-source research to build tools to solve these CAPTCHAs.

The case for and against
So back to the point in hand: why do some businesses like to use CAPTCHAs? Well, the logic is straightforward. If your bot detection solution believes the user is a bot, then you can block that user. But no solution is perfect; what if the user is not a bot? Then you've just blocked a good user. And so the CAPTCHA represents an opportunity, it represents hope: if this is actually a good user, they can prove it by solving the CAPTCHA and continue on their way.

So why do some businesses hate CAPTCHAs? Well, the solve rate for (good) humans on CAPTCHAs can sometimes be quite low, resulting in good users not being able to proceed. Plus, for those good users who do solve the CAPTCHA, it's often an annoying addition to the UX. And the solve rates for bad actors can sometimes be quite high, although this is a nuanced point: in some cases bots have indeed been trained to solve the CAPTCHAs, but in other cases the bot hands the session over to a human to solve it. There is a thriving industry of human-powered CAPTCHA-solving services, typically staffed by people in low-income environments who are paid a pittance per CAPTCHA and solve thousands of them daily.
So should you use a CAPTCHA or not?
I think so. Just don't use the 'pick a street sign from this matrix of images' Google version of a CAPTCHA. There are many far more evolved CAPTCHAs available today from the likes of Arkose Labs, GeeTest or PerimeterX that have their own approaches and nuances but consistently do a better job than the dreaded matrix of traffic images. Given the pressure on most digital commerce businesses to reduce false positives, giving users a chance to prove that they're human, rather than just blocking them, is worth exploring. In a world where bad actors use humans to augment bots, though, you need a CAPTCHA that's smart enough to detect when the humans solving it are just a little too fast, perhaps a sign that they spend their day solving these things over and over. When that's detected, the CAPTCHA should dynamically be made harder to discourage such activity and render it economically nonviable. Using a CAPTCHA also forces interaction with the user in a controlled manner, allowing more telemetry to be obtained about the user and their level of humanity.
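To make the adaptive-difficulty idea concrete, here is a minimal sketch of the escalation logic. Everything in it is an assumption for illustration: the thresholds, the class and field names, and the use of median solve time as the "too fast" signal are not taken from any real CAPTCHA product.

```python
# Sketch of adaptive CAPTCHA difficulty: if a session's solve times look
# implausibly fast (a possible sign of professional human solvers or
# trained bots), escalate to a harder challenge. All thresholds and
# names are illustrative assumptions, not a real vendor's API.
from dataclasses import dataclass, field
from statistics import median

SUSPICIOUS_MEDIAN_SECONDS = 3.0  # assumed: real humans usually take longer
MAX_DIFFICULTY = 5

@dataclass
class CaptchaSession:
    difficulty: int = 1
    solve_times: list = field(default_factory=list)

    def record_solve(self, seconds: float) -> None:
        """Record a successful solve; escalate difficulty once the
        session's median solve time drops below the suspicion threshold."""
        self.solve_times.append(seconds)
        if (len(self.solve_times) >= 3
                and median(self.solve_times) < SUSPICIOUS_MEDIAN_SECONDS):
            self.difficulty = min(self.difficulty + 1, MAX_DIFFICULTY)

session = CaptchaSession()
for t in [1.2, 0.9, 1.1]:  # three very fast solves in a row
    session.record_solve(t)
print(session.difficulty)  # → 2: difficulty has been escalated
```

A real system would track this per account or per device fingerprint rather than per session, and would combine solve time with other telemetry, but the economic logic is the same: make each additional fast solve cost the solver more.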

I always advise clients to perform A/B testing: split your traffic, use a CAPTCHA on one segment only, and compare the results. Make decisions based on actual data and metrics, not preconceptions based on outdated approaches such as the traffic image matrix.
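The split itself is simple to set up. Below is a hedged sketch of one way to do it: deterministic bucketing by hashed user ID so repeat visitors stay in the same segment, and a basic completion-rate comparison. The function names, bucketing scheme and toy data are all my own illustrative assumptions.

```python
# Illustrative A/B split for a CAPTCHA trial: hash each user id to assign
# a stable segment, serve the CAPTCHA only to the 'captcha' segment, then
# compare how many sessions in each segment complete the protected flow.
import hashlib

def bucket(user_id: str) -> str:
    """Deterministically assign a user to 'captcha' or 'control' so that
    repeat visits land in the same segment."""
    h = int(hashlib.sha256(user_id.encode()).hexdigest(), 16)
    return "captcha" if h % 2 == 0 else "control"

def completion_rate(outcomes: list) -> float:
    """Fraction of sessions (True = completed) that finished the flow."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0

# Toy results: True means the user completed the protected flow.
results = {
    "captcha": [True, True, False, True],
    "control": [True, True, True, False],
}
for segment, outcomes in results.items():
    print(segment, completion_rate(outcomes))
```

In practice you would also segment the metrics by the bot-detection verdict, so you can see separately how the CAPTCHA affects users scored as likely-human versus likely-bot.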

I’m always interested to hear about experiences of implementing different CAPTCHAs – reach out and let me know what you experienced!

As an interesting final aside, Amazon filed a patent in 2017 for a new type of CAPTCHA that is easy for machines to solve but presents a visual challenge that humans would typically get wrong, and thus the process is subverted. Human fallibility may in fact be the future when it comes to defeating bots.


BrandPost: Justice is Blind – But Your Law Firm Shouldn’t Be

Would Perry Mason have leveraged metadata and AI? The encyclopedic knowledge and quick recall of his previous cases gave the fictional barrister and his clients an instant advantage in the courtroom. But few real-world lawyers have the mind of a Mason. Fortunately, we live in the digital era, in which lawyers can find information quickly thanks to an AI-enabled, metadata-driven enterprise search and knowledge management system.

Law practices often tout their experience in handling certain types of cases. But if the firm's lawyers can't find information about that litigation, it's as if those cases never happened. Ingestion, storage and retrieval of data are critical to the institutional memory of a successful practice. When data enters a law firm, it must be profiled. AI-enabled smart profiling works automatically to identify a document's content and to tag it in metadata. The document can then be searched on, shared, updated, protected, transferred, governed, and retired. When it comes time to retrieve relevant data, AI algorithms search for information intelligently, finding topics that are closely related, misspelled, or otherwise valuable.

BrandPost: 20 Years of the Agile Manifesto: Looking Back and Accelerating Forward

This year, we celebrate the 20th anniversary of the Agile Manifesto. Over the last two decades, I have watched organizations evolve and teams emerge to become increasingly effective, efficient and collaborative by implementing the Agile values and principles.

Looking Back

If we could turn back time, we'd see how most corporate workplaces primarily consisted of row after row of small cubicles, with some private offices around the exterior walls. Where we worked was a reflection of how we worked. After many long months of solo effort, we would attempt to integrate everyone's work, a messy process that invariably led to significant additional work and costs.

BrandPost: Robotic process automation comes of age

Chief information officers (CIOs) understand that the IT technologies and operations they oversee are in states of continual flux and evolution. Even with this knowledge, however, it can be difficult for these decision makers to stay abreast of every advance that could bring value to the organizations they serve. The field of robotic process automation (RPA) is one such area, where the growing sophistication and reach of the technology may have escaped those not closely tracking this discipline. RPA – most commonly understood as a technology for automating repetitive, manual tasks in which people interact with computers – has grown much more powerful and multifaceted in recent years.

BrandPost: How automation and AI can be used to improve business resilience today

Members of IDG's Influencer Network weigh in on the transformative power of these two technologies.

As a recent article on CIO.com observed, the pandemic "has seen accelerated interest in process automation as organizations have scrambled to overhaul business processes and double down on digital transformations in response to disruptions brought about by COVID-19. And for IT leaders stepping into or already steeped in such modernization efforts, artificial intelligence — mainly in the form of machine learning — holds the promise to revolutionize automation, pushing them closer to their end-to-end process automation dreams."

OT security on the shop floor

Managing your OT estate and knowing what's connected to it are two of the main challenges that leave businesses open to cyberattacks.

Re-engineering the Decision – Update

Back in April I blogged about Re-engineering the Decision – Our Storyline for Data and Analytics, which shared some of the insight into our main focus for data and analytics going forward. Re-engineering the decision is the aspirational focus through which effective, outcome-driven D&A will help business and executive leaders re-engineer their decision-making capabilities, making those capabilities a competitive differentiator once again.

In that original blog I shared the original overview and some additional press clippings that demonstrated the importance and power of re-engineering decisions.

I wanted to provide an update. Our Marketing team has been busy building momentum on the topic with a steady stream of decision-making information assets. Here are a few of them.

Map Your Way to More Efficient Growth Investments
Why Do Executives Move Forward With Strategic Initiatives Even When They See Pitfalls Ahead?
How Machine Customers Will Impact Future Decision-Making

You can access the whole set of assets here.

Additionally, our own research has been moving forward, and we have made inroads into connecting more strongly with (business) applications and software engineering. This is not an IT or technical connection per se; more and more business roles in marketing, sales, HR, finance, supply chain, etc. embed various degrees of D&A, application and SWE capability. The valuable distinction here is not "business and IT" but "data and analytics, AI and apps wherever they exist".

Here are some of the additional research assets:

Top Trends in Data and Analytics for 2021: Engineering Decision Intelligence
Improve Critical Business Outcomes With Real-Time Data-Driven Insights
Should Your Project Use a Decision Management Suite?
2 Steps to Improve Business Decisions Using Data and Analytics

Composability is part of the “how” decisions can be re-engineered at speed:

Top Trends in Data and Analytics for 2021: Composable Data and Analytics
Strategic Architecture Roadmap for Composable Enterprise Applications (Presentation)
Composable Analytics Shapes the Future of Analytics Applications

Data Fabric sits at the heart of D&A and the emerging composable platform:

Infographic: An Intelligent Composable Business Demands a Data Fabric
Top Trends in Data and Analytics for 2021: Data Fabric Is the Foundation

For the Chief Data Officer we also have a Leadership Vision deck that operates as a desk reference for the year ahead. This one was published in Q3 last year for 2021 (in case you missed it) and provides a view of how data, analytics and AI all help with re-engineering the decision. There is a recorded webinar too, in case you prefer that format.

Keep an eye on the topic of decision making. It's all starting to change, again, for the better. More research is coming: on decision modeling, how to organize for effective delivery (XOps), how to value decisions, how to measure the impact of decisions, and how to connect decisions and their dependencies together across value streams.

Bias isn’t the only problem with credit scores—and no, AI can’t help

We already knew that biased data and biased algorithms skew automated decision-making in a way that disadvantages low-income and minority groups. For example, software used by banks to predict whether or not someone will pay back credit-card debt typically favors wealthier white applicants. Many researchers and a slew of start-ups are trying to fix the problem by making these algorithms more fair.

But in the biggest-ever study of real-world mortgage data, economists Laura Blattner at Stanford University and Scott Nelson at the University of Chicago show that differences in mortgage approval between minority and majority groups are not just down to bias, but also to the fact that minority and low-income groups have less data in their credit histories. This means that when this data is used to calculate a credit score, and that credit score is used to predict loan default, the prediction will be less precise. It is this lack of precision that leads to inequality, not just bias. The implications are stark: fairer algorithms won't fix the problem.

"It's a really striking result," says Ashesh Rambachan, who studies machine learning and economics at Harvard University but was not involved in the study. Bias and patchy credit records have been hot issues for some time, but this is the first large-scale experiment that looks at loan applications of millions of real people. Credit scores squeeze a range of socio-economic data, such as employment history, financial records, and purchasing habits, into a single number.
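The precision point can be illustrated with a toy simulation (my own sketch, not the authors' methodology): estimating the same underlying default rate from thin versus thick credit files. The group sizes and the 10% default probability are made-up assumptions purely for illustration.

```python
# Toy illustration of the study's core point: with fewer credit-history
# records, default-rate estimates are noisier, so decisions based on them
# are less precise even if the model is perfectly unbiased.
import random

random.seed(0)
TRUE_DEFAULT_RATE = 0.1  # assumed true default probability, same for everyone

def estimated_rate(n_records: int) -> float:
    """Estimate the default rate from n simulated borrower records."""
    defaults = sum(random.random() < TRUE_DEFAULT_RATE for _ in range(n_records))
    return defaults / n_records

def estimate_spread(n_records: int, trials: int = 200) -> float:
    """Range of estimates across repeated samples of the same size."""
    estimates = [estimated_rate(n_records) for _ in range(trials)]
    return max(estimates) - min(estimates)

print(estimate_spread(20))    # thin files: estimates swing widely
print(estimate_spread(2000))  # thick files: estimates cluster near 0.1
```

Both groups have the same true risk here, yet the thin-file group gets much noisier scores, which is the mechanism the economists argue drives the approval gap independently of bias.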
