Museum Websites Usability Comparison Study

Boston website (left), Montreal website (right).

Museum websites usually feature inviting graphics about their exhibitions to entice people to visit. But does an aesthetically pleasing website mean a usable one? Should museums even care about the usability of their websites? Trish, Tarry, Taylor, and I decided to investigate this as a class project (we were not affiliated with the museums).

We selected the Museum of Fine Arts (Boston) and the Montreal Museum of Fine Arts as the focus of the study. The two museums are of similar size and have similar content on their websites, but the sites have distinctly different visual aesthetics and organize their information differently. This let us design a study in which participants completed similar tasks on the two sites while still allowing us to compare their performance.


STUDY DESIGN

Between-Subjects

We designed the study to be between-subjects, so that each participant only had to complete the tasks on one website. The main reason was that we recruited friends and family with no incentives, and asking them to complete tasks on both sites would have demanded too much of their time.

Participants

We aimed for 10 participants per site (20 total), and ended up with 13 for Boston and 15 for Montreal. However, some participants completed the study on their mobile devices (instead of desktop), so we had to remove their data. Additionally, some participants had issues with the Loop11 platform and ended up attempting the tasks multiple times; since this would distort the performance data, we removed those as well. We ended up with 11 participants per site (22 total) in our analysis.

Remote, Unmoderated Testing

We set up the tasks and the pre-/post-task questions on Loop11. We required participants to record their screens so we could support our quantitative analysis with qualitative data.

Metrics

Performance metrics

  • Task success (Accuracy)

  • Time on task

  • Lostness (Efficiency)

Self-reported metrics

  • Task ease rating: before (expectation) and after task (actual)

  • System Usability Scale (SUS)

  • Likes and dislikes about the website

  • Likelihood to visit the museum

Tasks

Since we were interested in general usability, we chose some of the most common tasks visitors would complete on the websites for this study:

  1. If you are planning on visiting the museum TODAY, what tours and events are available?

  2. What are the featured exhibitions at the museum on display right now?

  3. How much are tickets to see both the featured exhibitions and collections for a 27-year-old adult?

  4. What types of items are for purchase in the museum gift shop?

  5. What are the benefits associated with a museum membership at the least expensive level (for example, admission, events, tours, discounts)?

  6. What are the accommodations the museum provides for individuals with disabilities?

Please follow this link to see the full study protocol and the success criteria.


ANALYSIS: PERFORMANCE DATA

Average task success at the task level. Error bars represent 90% confidence interval.

Task Success

At the site level, participants on the Boston site had a statistically significantly higher task success rate than participants on the Montreal site (t-test, p < 0.0001).
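For readers who want the mechanics, here is a minimal sketch of how a site-level comparison like this, along with the 90% confidence intervals behind the error bars, could be computed. It assumes each participant's overall success rate is the unit of analysis, and the numbers are made up for illustration; this is not our actual data or necessarily our exact procedure.

```python
# Minimal sketch: independent-samples t-test plus 90% confidence intervals.
# The success rates below are illustrative placeholders, not the study's data.
import numpy as np
from scipy import stats

boston   = np.array([1.0, 0.83, 1.0, 0.83, 1.0, 0.83, 1.0, 1.0, 0.83, 1.0, 1.0])
montreal = np.array([0.5, 0.67, 0.33, 0.5, 0.67, 0.5, 0.33, 0.67, 0.5, 0.67, 0.5])

# Between-subjects design, so an independent-samples (unpaired) t-test applies.
t_stat, p_value = stats.ttest_ind(boston, montreal)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# 90% confidence interval around each site's mean (the chart's error bars).
for name, sample in [("Boston", boston), ("Montreal", montreal)]:
    lo, hi = stats.t.interval(0.90, len(sample) - 1,
                              loc=sample.mean(), scale=stats.sem(sample))
    print(f"{name}: mean = {sample.mean():.2f}, 90% CI = [{lo:.2f}, {hi:.2f}]")
```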

At the task level, participants on the Boston site outperformed those on the Montreal site on every task except Task 1 (tours and events available today). After looking into the qualitative data, I found that 2 of the 4 participants who failed had landed on the exhibitions page, presumably because they misread the task question.

MONTREAL SITE: Confusing terminology for the gift shop in Task 4.

For Task 2 (Current exhibitions), Task 4 (Gift shop), Task 5 (Membership benefits), and Task 6 (Accessibility), participants on the Montreal site performed statistically significantly worse. Qualitative data showed that 2 participants failed Task 2 because the website switched to French and they didn’t know what was going on.

“My damn page automatically translated from one language to another, which was annoying.”

For Task 4, many participants clicked the “Shop Online” link in the utility navigation, only to find that it led to admission ticket purchases, not to the gift shop. The actual gift shop link, “M Boutique and Bookstore,” is under “Information.”

MONTREAL SITE: VIP CARD (Membership benefits) section.

For Task 5, many participants failed because they didn’t understand what the “VIP Card” actually included. The site did explain the card’s details (free access to the museum, special events, and discounts), but that information was at the bottom of the page. Most participants who failed never saw it, and simply copied what was listed at the top (shown in the image on the right).

“Everything is sent by email. Not exactly a benefit, but listed nonetheless.”

For Task 6, the Boston site had all the accessibility information on one page, which made it easy to find. The Montreal site, however, spread its accessibility information across multiple pages (under “Guidelines to Visitors” and “Location and Direction”). This resulted in a high failure rate for participants on the Montreal site.

Time on Task

Average task time at the site level and at the task level. Error bars represent 90% confidence interval.

At the site level, participants on the Boston site took more time to complete the tasks (t-test, p < 0.001).

At the task level, participants on the Boston site took far longer than those on the Montreal site to complete Task 3 (Admission). I triangulated this with the qualitative data and found that participants were spending time looking up exhibition ticket prices after they had already found the admission tickets (even though general admission includes the exhibitions). This could be a result of how the question was worded:

“How much are tickets to see both the featured exhibitions and collections for a 27-year-old adult?”

The question was worded this way because the Montreal Museum offers free general admission to its own collections but charges $15 for featured exhibitions for visitors under the age of 30. When we designed the tasks, we wanted to make sure people understood the pricing structure. When we conducted the pilot study, only the Montreal site was tested because it was the harder one to use, so we did not anticipate how much the question wording would influence the Boston participants.

Average lostness at the site level and at the task level. Error bars represent 90% confidence interval.

Lostness

The lostness score was calculated as a function of how many pages participants had to go through to find the information they wanted. At the site level, participants on the Montreal site were significantly more lost than those on the Boston site (t-test, p = 0.04). At the task level, there were also significant differences for Task 3 (Admission), Task 5 (Membership benefits), and Task 6 (Accessibility). The same factors discussed above that impacted task success for these tasks also impacted the lostness score.
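The formula isn’t spelled out above, so for reference, here is a sketch assuming the commonly used lostness measure from Smith (1996), which fits the “function of how many pages” description; our exact calculation may have differed in detail.

```python
# Sketch of Smith's (1996) lostness measure -- an assumption on my part,
# since the exact formula used in the study is not documented here.
from math import sqrt

def lostness(total_pages: int, unique_pages: int, optimal_pages: int) -> float:
    """Lostness L = sqrt((N/S - 1)^2 + (R/N - 1)^2).

    S = total_pages:   all pages visited, counting revisits
    N = unique_pages:  distinct pages visited
    R = optimal_pages: minimum pages needed to complete the task

    0 means a perfectly efficient path; Smith's work suggests scores
    above roughly 0.5 indicate that a user is observably lost.
    """
    return sqrt((unique_pages / total_pages - 1) ** 2 +
                (optimal_pages / unique_pages - 1) ** 2)

# Example: 12 page views (8 distinct) on a task with a 3-page optimal path.
print(round(lostness(total_pages=12, unique_pages=8, optimal_pages=3), 2))  # 0.71
```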

Participants were confused about which terminology referred to museum membership.

For Task 5, participants also seemed confused by the terminology the Montreal website uses to describe museum membership. Many participants clicked around on the VIP page (membership) and the various Philanthropic Circles pages (annual giving).



ANALYSIS: SELF-REPORTED DATA

Task Ease

I compared the pre-task (expected) ease rating and the post-task (actual) ease rating for the two sites, and found that the difference in individual ratings (actual minus expected) differed significantly between the two sites (t-test, p = 0.02). The average individual difference was 0.06 for Boston and -0.39 for Montreal, meaning the Boston site met participants’ expectations, while the Montreal site was much harder to use than participants had expected.

Task ease expectation and actual experience comparison between the two sites.

I also plotted the average expected and actual task ease ratings (by task) to see if any task would fall into the “big opportunity” or “fix it fast” category. However, because of the small sample, all of the ratings fell into the “don’t touch it” category. When zoomed in to that quadrant, though, all the ratings in the lower half except one (Boston Task 3, where our question wording was confusing) belong to Montreal tasks.

Average Expectation vs. Actual Task Ease rating.
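For context, those category labels come from Albert and Dixon’s expectation measure. A sketch of the classification logic is below; the 7-point scale, the midpoint threshold, and the fourth “promote it” label are my assumptions for illustration, not details from our protocol.

```python
# Sketch of the expectation-vs-actual quadrant classification.
# Assumes a 7-point ease scale with quadrant boundaries at the midpoint (4);
# the example averages are hypothetical, not our study's data.
QUADRANTS = {
    (True,  True):  "don't touch it",   # expected easy, was easy
    (True,  False): "fix it fast",      # expected easy, was hard
    (False, False): "big opportunity",  # expected hard, was hard
    (False, True):  "promote it",       # expected hard, was easy
}

def classify(expected: float, actual: float, midpoint: float = 4.0) -> str:
    return QUADRANTS[(expected >= midpoint, actual >= midpoint)]

print(classify(expected=6.2, actual=5.9))  # -> "don't touch it"
```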

System Usability Scale (SUS)

Average SUS scores by site.

The Boston site received an average SUS score of 77, while the Montreal site received an average of 62. According to MeasuringU, the average SUS score across different types of systems is 68; on the same scale, the Boston site scored a B+ while the Montreal site scored a C-. The difference between the two SUS scores is also statistically significant (t-test, p = 0.015), meaning participants found the Boston site far more usable than the Montreal site.
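For readers unfamiliar with SUS: the score is derived from a 10-item questionnaire with 5-point agreement responses. Below is a sketch of the standard scoring; the sample responses are hypothetical.

```python
# Standard SUS scoring: odd items contribute (response - 1), even items
# contribute (5 - response); the total is scaled by 2.5 to a 0-100 range.
def sus_score(responses: list[int]) -> float:
    assert len(responses) == 10, "SUS uses exactly 10 items"
    total = sum((r - 1) if i % 2 == 1 else (5 - r)
                for i, r in enumerate(responses, start=1))
    return total * 2.5

# One (hypothetical) participant's answers to items 1-10:
print(sus_score([4, 2, 5, 1, 4, 2, 4, 2, 5, 1]))  # -> 85.0
```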

Likes and Dislikes

Percentage of positive and negative comments by participants.

We calculated the percentage of positive and negative comments participants gave throughout the study, as well as in the “Are there any specific things you liked or disliked about this website?” question. Overall, participants provided more negative comments than positive ones. The Montreal site received far more negative comments than the Boston site, but it also received comparatively more positive comments.

After closer examination, we found that participants generally liked how the information was organized on the Boston site, and the visual appeal of the Montreal site. On the flip side, participants disliked some of the visual and interaction design elements on the Boston site, and the information architecture (navigation, terminology, organization of content) on the Montreal site.

Example positive and negative comments for the two sites.

Likelihood to Visit

Average likelihood of visiting the museums.

All the performance and self-reported metrics showed that the Boston site is more usable than the Montreal site, so we were shocked to find no difference in participants’ likelihood to visit the museums (t-test, p = 0.702).

We looked at the reasons participants gave for their ratings. 3 participants explicitly mentioned that the website experience does not influence their museum visits, and 5 mentioned they were influenced (positively by the visuals/exhibits, negatively by the usability of the website). Most participants offered other factors that impact their likelihood to visit, such as general interest in museums (7 of 22), exhibits (6 of 22), distance (6 of 22), cost (2 of 22), and circumstances such as timing and friend availability (2 of 22).

We triangulated this finding with the qualitative questions we asked at the beginning of the study. We found that when planning a trip to a museum, most people conduct web searches (on the museum site, Google, or Tripadvisor) to find information such as exhibits, hours, location, cost, reviews, and services. A small number of people prefer not to plan at all and to just show up and explore the museum on their own.

We also asked what matters most to them when they are on a museum website, and most people responded with being able to find logistics information (such as hours, cost, and directions) or to plan their day at the museum (such as exhibits, maps, and food options). Some also mentioned that they wanted the website to be visually appealing and easy to navigate. Essentially, people want to find the information they need easily on the website. So even though participants rarely said outright that website usability would affect their likelihood of visiting, the website experience is still influential.


LEARNINGS

Start early

My team started writing the study protocol much earlier than the other teams in the class and was ready to launch the study about 2 weeks before anyone else. This gave us extra cushion for the unexpected hiccups in the study: participants having issues with the Loop11 platform, having to recruit more participants than initially intended, and so on. Even with these interruptions, we still had enough time to conduct a thorough data analysis and triangulate between the qualitative and quantitative data we received.

Pilot test all the sites

For this study, we only pilot tested the Montreal site because it was more difficult to navigate, and we did not anticipate that the question wording for Task 3 would have such a big impact on participants’ interpretations. We experienced similar issues with Task 4, where we asked participants to find the page that would tell them “what types of items are for purchase in the museum gift shop.” The question was worded this way because the Boston site had an actual online gift shop where visitors can purchase items directly, while the Montreal site only had an overview page describing what types of items are in the physical store. Some participants on the Boston site ended up browsing the items for sale after landing on the right page, which inflated their task times by a factor of 2 to 3. Luckily, since the goal was to see whether participants could find the right page, I was able to go back to the video recordings and adjust the task times manually. If we had pilot tested the questions on both sites, we would have caught these problems much sooner.


SPECIAL THANKS

Special thanks to my teammates, Trish, Tarry, and Taylor!