Perplexity AI
Tested iOS App version 2.13.0 in February 2024
Answer to Question 1
Screenshot of Perplexity AI answer to Question 1 – When was the Battle of Hastings? – showing the way Perplexity AI displays its results, with sources, image gallery, etc.
Accuracy Issues
Screenshot of Perplexity AI answer to Question 6 – Who has won the Best Director Oscar the most times?
This shows one of the accuracy issues that Perplexity AI got marked down on. It got the actual answer correct – John Ford, with 4 Best Director Oscars – but then incorrectly states ‘He is followed by Francis Ford Coppola, who has won the award five times’. Francis Ford Coppola has won 1 Best Director Oscar but 5 Oscars overall, so the issue is really the wording: it says ‘the award’ when the last reference to the award was ‘Best Director Oscar’, not all Oscars.
Also, the follow-up question – How many times has Francis Ford Coppola won the Best Director Oscar? – leads to inaccuracies. In the original test in February it stated he has won it 3 times, and in a retry on 9 March it said he has won it 2 times – whereas he has won it once only, for The Godfather Part II.
I assume that over time, with updated versions of the software and models, these kinds of issues will gradually disappear, but it’s an interesting test and result because it’s a question you can check yourself and verify the answer – in other cases, you might be relying on the answer provided.
Perplexity AI scores were as follows:
- Ease of Use / Usability / UX / UI – Score = 9
- Very good all the way through. Easy to use and navigate. Enjoyed working with the answers.
- Accuracy – Score = 8
- 2 points off for errors in some of the answers – see the explanation and screenshots above for Question 6 and its follow-up question – but otherwise very good. The errors were for questions in fairly specific areas, but it should still have supplied correct answers.
- Response time – Score = 10
- Very quick all the way through.
- Error handling – Score = 10
- Worked well with garbled questions and nonsense questions and garbage data.
- Source quotation/Provenance – Score = 10
- Gave 4 or 5 sources every time, and the ones I tapped were relevant.
- Working with the response e.g. further options – Score = 10
- Provides text, images, sources and the chance to share the answer, so you can view it on a desktop browser, for instance.
- Overall rating and score – Overall Score = 9.5
- Very good all round; the only negative points are a couple of errors in the answer text for Question 6 and its follow-up question. Answered general questions and more specific questions, and handled nonsense questions and garbage data.
- Ranking = 1
- Overview – A much fuller experience, with answers offering more info, images, sources and links than the others. A couple of errors in the answer text prevented a 10/10.
More info on Perplexity AI – main website
According to this Wired article, Perplexity AI uses a combination of OpenAI’s GPT, Meta’s open-source Llama AI model, a model from French startup Mistral AI, and Anthropic’s Claude – all wrapped up in Perplexity’s own ‘answer engine’.
All the posts
I have created a post for each AI chatbot test in order to fully explore the test results, along with a Results post where I crown the winning AI chatbot. You can jump straight there, or read each test post via the links below.
– Testing the AI Chatbots: Questions
– Testing the AI Chatbots: Perplexity AI
– Testing the AI Chatbots: OpenAI ChatGPT
– Testing the AI Chatbots: Microsoft Copilot
– Testing the AI Chatbots: Google Gemini
– Testing the AI Chatbots: Results
Note: The heading images used in these posts were created via Bing Image Creator
About my testing services: iOS App Testing / Android App Testing / Website Testing / AI Chatbot Testing