Hartmut Hoehle
Department of Information Systems
Sam M. Walton College of Business
University of Arkansas
U.S.A.
hartmut@hartmuthoehle.com
Viswanath Venkatesh
Department of Information Systems
Sam M. Walton College of Business
University of Arkansas
U.S.A.
KEYWORDS: Usability, mobile applications, survey instrument development, continued use, mobile
application loyalty
INTRODUCTION
In most developed countries, mobile phone penetration rates have reached over 100% per capita,
with individuals often owning more than one mobile phone (Wood 2010). Google announced that 500,000
new Android-based devices are registered every day (Reisinger 2011). In conjunction with this trend, over
the last 5 years or so, mobile phone technology has changed significantly, with devices and operating
systems becoming more sophisticated. These developments have led to a large variety of mobile
applications designed for smartphone operating systems provided by mobile operating system vendors,
such as Apple, Google, and Microsoft. For instance, Apple’s iStore includes over 1 million mobile
applications providing consumers with access to a variety of services via iPhones. As of May 2013, the
available mobile applications had been downloaded more than 50 billion times from Apple’s iTunes store
(iTunes 2013). These statistics accentuate that it is vital for organizations to integrate mobile applications into their channel strategies.
Despite these trends, and a recognition that they are missing out on business opportunities, two-thirds of all
firms selling consumer goods do not have established mobile strategies (Forrester Research 2011).
Integrating mobile channels into existing strategies is a major challenge for firms. First, the development of
mobile applications requires considerable investments by firms (Deloitte 2012; Forrester Research 2011). Recent studies show that
mobile applications can cost up to millions of dollars to develop (Deloitte 2012). Second, despite the
success of some, the majority of mobile applications fail outright or are not as successful as expected
(Deloitte 2012). For instance, only 1% of all mobile applications have been downloaded more than one
million times, and once downloaded, one in four mobile applications is never used again. Likewise, 80% of
all branded mobile applications are downloaded less than 1,000 times and the most popular 10% of mobile
applications generate 80% of all downloads in Apple’s iTunes store (Deloitte 2012; Dredge 2011). Recent
market research identifies the lack of usability as the most important factor
influencing consumers’ decisions to reject mobile applications (Deloitte 2012; Forrester Research 2011;
Youens 2011).
The lack of mobile application usability can be due to various reasons. For example, many mobile
applications do not prioritize the most essential aspects of the application, and content is presented
ineffectively, which in turn negatively influences user interactions (Forrester Research 2011; Youens 2011).
Further, instead of developing applications that account for the unique characteristics of mobile devices
(e.g., small screens, tiny input mechanisms, various screen resolutions), firms often identically replicate the
content of their Internet-based applications on mobile channels (Forrester Research 2011). Presenting a
large body of content on mobile devices is problematic because the application interface is overloaded with
information, site links, and text (Adipat et al. 2011). As a result, users feel overwhelmed and become
frustrated because the application violates key usability principles for mobile applications, such as when
large buttons that facilitate data input are sacrificed to make room for detailed information (Deloitte 2012).
To guide practitioners in developing mobile applications,
leading operating system vendors (e.g., Apple, Google, Microsoft) offer guidelines that aim to help mobile
application developers build better applications. However, these guidelines neither indicate how important
each recommendation is nor provide criteria for evaluating whether an application follows it (Apple
2012). For example, Apple’s mobile user experience guidelines recommend: “Beautiful artwork also helps
to build your app’s brand in people’s eyes” (Apple 2012). Although this suggests artwork is important, it
does not establish how important it is, when it is important (i.e., types of applications) and whether a
particular application has done this effectively (i.e., evaluation). Rigorously developed and comprehensive
survey instruments can help practitioners in achieving better mobile application design and evaluation. This
is due to the fact that such instruments support application designers in quantifying the impact of a given
design feature. However, our review of prior work revealed a lack of theoretically sound survey
instruments for holistically measuring mobile application usability. First, we found that much research
conceptualized and measured mobile application usability without proactively taking the mobile context into
consideration. Instead of integrating critical contextual factors for mobile applications, such as tiny input
mechanisms (Kurniawan 2008), we found that field studies typically used instruments that evolved from
website usability, much like website usability before it evolved from software usability (see Venkatesh and
Ramesh 2006). Second, related to the void of context-specific conceptualizations of mobile application
usability, our literature review suggested that prior research treated mobile application usability as
performance, using metrics such as response time and error rates for tasks users performed (e.g.,
Burigat et al. 2008; Hummel et al. 2008; Li et al. 2009). Although these measurement techniques account
for the mobile application usability context and are useful for evaluating the interplay among individuals
(e.g., users), technology (e.g., mobile application) and tasks (e.g., typing text), these features are typically
studied in a piecemeal fashion and are therefore less suited for providing a comprehensive understanding
of the antecedents of mobile application usability. Therefore, while a useful starting point, these studies do
not provide a comprehensive view of mobile application usability, thereby potentially neglecting critical
factors pertaining to various aspects of the mobile application usability context. Third, we found that prior
research frequently associated mobile application usability with broader concepts,
including satisfaction with system use, ease of learning and effectiveness of mobile applications (e.g.,
Gebauer et al. 2007; Lee et al. 2007; Min et al. 2009; Urbaczewski and Koivisto 2007). Associating these
concepts with mobile application usability seems problematic as this practice could result in interpretational
confounding (Burt 1976) that occurs when the empirical meaning of a latent variable differs from the
meaning assigned by a researcher (Bollen 2007). We concur that mobile application usability leads to
higher consumer satisfaction, but the two concepts are conceptually distinct and should be treated as such.
The same principle applies to the concepts of ease of learning and mobile application effectiveness.
Against the backdrop of these issues, we believe it is important to reason from the ground up about
mobile application usability as it will “contribute to our understanding of current technological and
organizational problems or challenges faced by IS or other practitioners” (Straub and Ang 2011, p. 3). A
context-specific and comprehensive survey instrument for measuring mobile application usability would be
appropriate to start addressing the above-mentioned practical issues. Such an instrument would also
advance our theoretical understanding of mobile application usability by being more precise and limiting
interpretational confounding. We argue it is critical that IS and HCI research can draw on a comprehensive
mobile application usability conceptualization and associated instrument that integrates context-specific
factors for two major reasons. First, recent advances in theory development and knowledge creation
suggest that theories and models grounded in specific contexts reveal rich insights (Alvesson and
Kärreman 2007; Bamberger 2008; Brown et al. 2010; Hong et al. 2014; Johns 2006; Van der Heijden 2004;
Venkatesh et al. 2012). Second, a holistic instrument integrating IT-specific antecedents addresses recent
calls for more specific theoretical models that step “backward toward IT, implementation, and design
factors, leading to research that is able to provide actionable advice” (Benbasat and Barki 2007, p. 213). To
address these gaps, this paper develops a context-specific and fine-grained conceptualization and
measurement of mobile application usability. In order to achieve this goal, we adapt Apple’s user
experience guidelines, integrate them with the extant literature on mobile application usability, and develop
and validate a survey instrument following a rigorous 10-step procedure (MacKenzie et al. 2011).
This work is expected to make key scientific and practical contributions. Having a context-specific
conceptualization and measurement of mobile application usability will aid theory development in IS and
HCI research alike. Specifically, our comprehensive view on mobile application usability centers around the
IT artifact, and we provide conceptual clarity and develop and validate an associated survey instrument.
Our mobile application usability instrument can be used as a springboard for theoretically motivated studies
that allow researchers to develop a cumulative research tradition in this emerging research area.
Practitioners will benefit from a comprehensive measurement instrument as our IT-centric view of mobile
application usability will help them to determine the most desirable features of mobile applications by
surveying current or potential customers during the design, planning, and development phases of mobile
applications. During the maintenance and review phases of mobile applications, our instrument can be
used to identify desirable mobile application functions and design features. This should help practitioners to
design more successful mobile applications and integrate mobile applications into their channel strategies
more effectively. This is particularly important as recent market research suggests that practitioners need
theoretical frameworks dedicated to analyzing and overcoming usability problems of mobile
applications (Nielsen 2012). Forrester Research (2011) found that 70% of firms are currently reconsidering
their mobile strategies and they are planning to integrate mobile channels into their distribution channels. A
comprehensive framework for assessing the usability of mobile applications would help practitioners in
designing successful mobile applications (Nielsen 2012; Nielsen Norman Group 2012).
Consistent with prior research, we define mobile application usability, drawing on the International Organization for Standardization
(ISO) definition of usability, as the extent to which a mobile application can be used by specified users to
achieve specified goals with effectiveness, efficiency, and satisfaction in a specified context of use (see
Venkatesh and Ramesh 2006). A mobile application is an IT software artifact that is specifically developed
for mobile operating systems installed on handheld devices, such as smartphones or tablet computers.
Mobile applications are either pre-installed on mobile devices or can be downloaded from various mobile
application stores (e.g., Apple’s iTunes store). It is important to note that mobile application usability differs
from mobile device usability in that the latter commonly refers to the extent to which the operating
system (e.g., iOS) is user friendly (Adipat et al. 2011). Mobile application usability has been studied in a
range of contexts and researchers have used a variety of conceptualizations and measurement
approaches to study the topic. Although there is a considerable amount of literature available on mobile
application usability (Hess et al. 2009; Hong et al. 2004c; Palmer 2003; Thong et al. 2002; Wells et al.
2011a; Wells et al. 2011b), our literature review suggests that there is a lack of theoretically sound
instruments for measuring mobile application usability holistically. Specifically, based on a review of
literature on mobile application usability, we identify three major issues that we discuss next.
First, we found that much research has conceptualized and measured mobile application usability
without integrating critical context-specific factors that are relevant to individuals using mobile applications
(e.g., the mobile application’s ability to display content in horizontal and vertical mode driven by the way the
user holds the mobile device). Instead, we found that most field studies used conceptualizations and
instruments, typically designed for traditional personal computers and websites (see Table 1). For instance,
Venkatesh and Ramesh (2006) applied a conceptualization and scales based on Microsoft’s usability
guidelines (MUG) to a multinational research context and tested the generalizability of the MUG instrument
to mobile sites. The MUG conceptualization and scales were originally developed by Agarwal and
Venkatesh (2002) based on Microsoft’s white paper on usability for assessing the usability of websites (see
Keeker 1997). Although Venkatesh and Ramesh’s (2006) study confirmed that specific aspects of the MUG
conceptualization and scales were more important in mobile contexts (e.g., content, ease of
use), the MUG instrument was not originally developed for the context of mobile application usability. By simply examining the generalizability and differences
between website and mobile site usability, it is likely that important requirements of the mobile context were
omitted (e.g., fingertip-sized buttons that are used to select features on small multi-touch screens).
Second, we found much of the research that evaluated mobile application usability was conducted
in laboratory settings. These studies typically used performance measures (e.g., speed, error rates) to
evaluate mobile application usability. For example, Jokela
et al. (2006) asked usability raters to benchmark two application interfaces using several predefined tasks
(e.g., inserting a new entry in a phone book application). Next, the participants were asked to rate each
application interface in terms of its efficiency and its overall usability. Adipat et al. (2011) studied the
hierarchy of mobile applications and conducted an experiment in which they exposed participants to
multiple mobile applications. By manipulating the presentation of mobile sites and the task level, they
confirmed that hierarchical text and colorful presentation techniques influence the perceived usability of
mobile applications. However, we found that many of these studies evaluated mobile application usability in
a piecemeal fashion, rather than attempting to holistically study the concept of mobile application usability
and associated context. Consequently, although able to identify error rates and the time users take to
perform a given task using a mobile application, these studies suffer from their inability to predict and
explain why users took more or less time to perform a given task. A holistic instrument providing an IT-
centric view of mobile applications would help in understanding the triggers that influence such
performance outcomes.
Third, our literature review indicated that research on mobile application usability does not
sufficiently “build upon each other’s work” (Keen 1980, p. 9) but instead uses various conceptualizations and
scales for assessing mobile application usability. For instance, Lee and Benbasat (2003) used seven
design elements for e-commerce sites, namely context, content, community, customization,
communication, connection, and commerce, to examine mobile commerce interfaces. Other studies combined
concepts commonly seen in the technology acceptance literature (e.g., ease of use; Venkatesh 2000;
Venkatesh et al. 2003a) with HCI principles (e.g., design aesthetics), as well as with concepts from the
marketing research discipline (e.g., satisfaction) (see Cyr et al. 2006; Cyr et al. 2009; Huang et al. 2006;
Urbaczewski and Koivisto 2007). These studies often argued that satisfaction, efficiency, and effectiveness
would represent a user’s perception of mobile application usability. Associating such concepts with mobile
application usability risks the interpretational confounding discussed earlier. Table 1
summarizes the literature and shows the various conceptualizations and measurement approaches used in
prior research.
Taken together, this discussion confirms the several issues that we raised earlier. There is a lack of
theoretical and methodological clarity surrounding the overall usability of mobile applications. Next, we
provide a background on Apple’s user experience guidelines, which is a white paper by the leading vendor
in the mobile application market.
Table 1 (excerpt). Conceptualizations and measurement approaches of mobile application usability in prior research
Usability conceptualization | Measurement approach | Method | Source
Color, text and menu icons | Specific; alternative mobile applications were benchmarked measuring menu color, text and menu icons | Experiment (empirical) | Sonderegger and Sauer (2010)
Design, customer needs, satisfaction, innovativeness, feedback and efficiency | General; mobile application usability was evaluated using a Likert-scale questionnaire for the identified usability attributes | Field study (empirical) | Kim et al. (2012)
Icon characteristics (semantically close meaning, familiar, labeled and concrete) | Specific; alternative mobile applications were benchmarked manipulating icon characteristics | Experiment (empirical) | Leung et al. (2011)
Learnability, efficiency, memorability, error, satisfaction, effectiveness, simplicity, comprehensibility and learning performance | General; mobile application usability attributes derived from prior studies | Conceptual (non-empirical) | Zhang and Adipat (2005)
Predictability, learnability, structure principle, consistency, memorability, familiarity | General; mobile application usability attributes were measured through single-item questionnaire | Field study (empirical) | Ji et al. (2006)
Efficiency and direct usability measures | General; mobile application usability attributes were measured through single-item questionnaire | Experiment (empirical) | Jokela et al. (2006)
Apple’s iTunes store is the most accepted mobile application store and consumers have
downloaded more than 50 billion apps as of the middle of 2013. Gartner (2011) predicts that Apple will remain
the single best-selling store until 2015 and reports that Apple holds a market share of more
than 80% in terms of revenues made through mobile applications. Apple’s first iOS was released in 2007
and within four years the platform became one of the most widely accepted operating systems for
smartphones (Apple 2012). Much of the success of the iPhone has been attributed to the user friendliness
of the interface of iPhone applications. Apple’s user experience guidelines aim to support developers in
designing user-friendly and successful applications for the iPhone and iPad. The guidelines particularly
focus on the user interface design and place less emphasis on technical assistance during the
programming and system development process (Apple 2012). In total, the guidelines include 28 sections
related to the user experience of mobile applications and most sections include recommendations for
developers. For instance, one section recommends using succinct and short text that users can absorb
quickly and easily (Apple 2012). We found Apple’s guidelines particularly suited to inform the development
of our instrument. We also considered usability guidelines published by independent usability consultants and researchers (e.g., Masse 1998; Nielsen
2012; Nielsen Norman Group 2012) but decided to focus on Apple’s user experience guidelines due to the
company’s prevailing position in the mobile application market. We also felt that Apple’s guidelines would
help us provide a relevant contribution to academics and practitioners alike. Rosemann and Vessey (2008)
suggest that one systematic approach for developing relevant research is that it is “not necessarily based in
theory, [but] involves examining a practical intervention using a well-established, rigorous research
approach” (p. 7). We embrace this notion by employing and evaluating user experience guidelines
developed by practitioners, and rigorously developing the constructs and the associated survey instrument.
To do so, we followed MacKenzie et al.’s (2011) scale
development and validation procedure. This procedure synthesized prior scale development literature
(DeVellis 2011; Straub 1989; Straub et al. 2004), and integrated several methodological strategies for
construct and scale development and validation. In total, the procedure consists of 10 steps that are shown
in Figure 1.
Figure 1. Overview of the scale development and validation procedure (first five steps: 1. Construct definition; 2. Measure development; 3. Content validity assessment; 4. Measurement model specification; 5. Pre-test of the scales).
We closely followed these guidelines to develop the constructs and scales to assess the usability
of mobile applications. Below, we discuss how we applied the 10-step procedure outlined by MacKenzie et
al. (2011) in order to develop and validate the mobile application usability conceptualization and measures.
In the discussion of each step, we describe our actions as they relate to the development of the mobile application usability conceptualization and instrument.
The first step of the 10-step procedure is to develop a conceptual definition of the constructs
because the lack of a precise and detailed conceptualization of the focal constructs can cause significant
measurement errors during the testing phase (DeVellis 2011; MacKenzie et al. 2011).
To inform the construct conceptualization, the lead author systematically reviewed and analyzed
Apple’s user experience guidelines (Apple 2012). In order to guide this process, the following three
questions were posed: (i) what are the main usability criteria suggested by Apple’s guidelines? (ii) which
keywords characterize each usability criterion? and (iii) which recommendations are associated with
each keyword? These questions were used to systematically review and code Apple’s comprehensive user
experience guidelines. Strauss and Corbin’s (1990) open and axial coding procedures were followed to
identify conceptually similar themes discussed in Apple’s guidelines. Open coding is the “analytical process
through which concepts are identified and their properties and dimensions are discovered in the data”
(Strauss and Corbin 1990, p. 101). Axial coding is the process of “relating categories to their subcategories,
termed ‘axial’ because coding occurs around the axis of a category, linking categories at the level of
properties and dimensions” (Strauss and Corbin 1990, p. 123). Initially, Apple’s guidelines were carefully
read and the content was coded using Strauss and Corbin’s (1990, p. 119) line-by-line analysis. The most
essential open codes identified are shown in Table 2. Next, the open codes were clustered and
subcategories were formed by summarizing conceptually similar codes. The subcategories are also listed
in Table 2. Then, using axial coding, the open codes were inspected for similarities and/or differences and
clustered into major categories.
For example, two open codes were identified that discussed the concept of branding in mobile
applications: (i) in mobile applications, incorporate a brand’s colors or images in a refined, unobtrusive way;
and (ii) the exception to these guidelines is your mobile application icon, which should be completely
focused on your brand. Next, both open codes were conceptualized as one subcategory that was labeled
as brand appropriately. Then, using axial coding, the major category was labeled as branding. Following
this approach, Apple’s guidelines were initially coded by the lead author and organized in a matrix, as
outlined by Miles and Huberman (1999). Organizing codes in a data matrix is useful to compress coded
information and it supports drawing conclusions (Miles and Huberman 1999). Next, the second author
reviewed the user experience guidelines and associated coding outcomes. In a few cases where there was
a disagreement between the authors, two independent judges who were unfamiliar with the study were
asked to facilitate a discussion in order to reach a coding consensus. Both judges were IS researchers who
were familiar with the IS literature in
general, but neither of them considered himself/herself a mobile application usability expert. They also
indicated that they were unfamiliar with the literature on mobile application usability. We purposefully
selected both judges who were unfamiliar with the literature on mobile application usability because we
aimed to obtain unbiased feedback on the codes derived from Apple’s user experience guidelines. Table 2
shows the final matrix derived from Apple’s guidelines. All generated axial codes, shown in the left column
in Table 2, were then used as a base for the construct development. Specifically, we used these axial
codes and related them to the extant usability literature. In some cases, the user experience guidelines
matched existing constructs. For instance, Apple’s guidelines suggest that the content of mobile
applications needs to be relevant to the users and developers should consider the target audience when
designing mobile applications. Prior research studying the usability of web and mobile sites has also
established that the content of web and mobile applications drives users’ intentions to use the applications
(Venkatesh and Ramesh 2006). Thus, the concept of content was informed by practitioner guidelines and
prior research. In some cases, the user experience guidelines suggested construct domains for which we
could not identify existing theoretical conceptualizations or concept definitions in the literature. For instance,
realism was suggested by Apple’s guidelines to be an essential part of mobile applications due to the fact
that users would instantly associate with the application because it would look familiar to them. Examples
of realism include address books or time and date applications in which traditional designs are used to
better communicate the purpose of the application (e.g., by using a traditional address book design).
Although we found some literature supporting realism in mobile applications (e.g., Kang 2007), our
literature review did not reveal any theoretically motivated constructs with overlapping themes to the
concept of realism.
Table 3 summarizes the construct domains derived from Apple’s guidelines and existing literature. The leftmost column lists the construct domains we
identified based on the coding procedure explained earlier. The middle columns show the open codes we
derived from Apple’s guidelines and prior work that has studied the identified construct domains. The
rightmost column provides prior research that has studied the construct domains that we identified. In total,
our analysis of Apple’s user experience guidelines led to twenty initial constructs that represented the most
essential concepts outlined in the user experience guidelines. Following the guidelines of MacKenzie et al.
(2011), we conceptualized each construct by identifying the construct entities as well as the entities to
which each construct applies. Table 4 lists all constructs and definitions.
The next task was to determine whether any of the constructs should be aggregated into higher-order
constructs. This is an important issue for the scale development process, and this needs to be done once
all constructs have been conceptualized and defined (MacKenzie et al. 2011). Constructs sharing a
common theme and similar characteristics should be theoretically abstracted to a higher level and identified
as dimensions of a multidimensional higher-order construct (MacKenzie et al. 2011).
To identify potential higher-order constructs, we carefully examined the constructs shown in Table
4 for conceptual similarities and reviewed the existing literature on each identified construct. During this
phase, we discussed how distinctive the constructs were from each other and if eliminating any of them
would restrict the domain of the construct in a significant way (MacKenzie et al. 2011). We also asked two
judges to help us identify conceptual similarities among the constructs. Both judges were IS researchers
and unfamiliar with the study’s content. To facilitate this process, both judges were provided with 20 cards.
Each card was labeled with a construct name and associated construct definition (as shown in Table 4) that
was identified in the previous phase of the construct development procedure. The card sorting results were
then compared with the higher-order constructs identified by both authors. After a thorough discussion
between the authors, six second-order constructs were identified to represent aggregations of the 20
first-order constructs. We also considered whether the second-order constructs should be aggregated into
third-order constructs, i.e., higher-level abstractions (Rindskopf and Rose 1988; Wetzels et al. 2009).
However, we concluded that each second-order construct represented a unique part of mobile application
usability and it was thus not necessary or useful to form third-order constructs. Table 5 shows all
first-order constructs and the second-order constructs they form.
Second-order Construct: Application Design. Application design is the degree to which a user perceives
that a mobile application is generally designed well. Prior literature on application design suggests that
users are influenced by several factors when evaluating the overall design of a mobile application. For
example, it is important that the application preserves data that is input by the user (Adipat et al. 2011). If
users are required to input the same data twice (e.g., when switching from one screen to another), they will
become frustrated and dissatisfied with the mobile application (Adipat et al. 2011; Kurniawan 2008; Tan et
al. 2009). It is also essential that a mobile application is instantly ready to be used after being switched on
and long wait times normally lead to user frustration (see Devaraj et al. 2002; Devaraj et al. 2006; Galletta
et al. 2006). Moreover, due to the fact that mobile phones are handheld devices, systems designers should
consider that the mobile application displays the information well, independent of whether the mobile device
is held horizontally or vertically (Wobbrock et al. 2008). Finally, Apple’s user experience guidelines suggest
that well-designed applications employ subtle branding efforts. This is consistent with marketing research
on website design that suggests that subtle branding efforts can increase consumer satisfaction with overall
website design (Dou et al. 2010). Hence, data preservation, instant start, orientation and branding are
proposed as first-order constructs forming application design.
Second-order Construct: Application Utility. Application utility is defined as the degree to which a user
perceives a mobile application generally serves its purpose well. Several factors were proposed in Apple’s
guidelines related to this concept. Mobile applications should focus on the content that is most relevant to
users, so that the user gets
what he/she expects, and the main purpose of the application should be emphasized (see Hess et al. 2009;
Hong et al. 2004b; Hong et al. 2004c; Li et al. 2009; Thong et al. 2002; Venkatesh and Ramesh 2006;
Wells et al. 2011a; Wells et al. 2011b). For example, prior research on mobile applications has emphasized
the concept of context-awareness, which can be defined as “the ability of an application to detect and
understand its situational context and to adapt its behavior in a user preferred manner accordingly” (Zhang
et al. 2009, p. 29). An example of a context-aware application is a GPS-based mobile application that
provides an interface for viewing map products and managing geographic data in real time. Because timely
and accurate geographic information is most relevant to users of mobile GPS-based map applications
(Kaikkonen et al. 2005; Kakihara and Sorensen 2002), it is important that these context-aware applications
center on location-based information and do not focus on less relevant features (e.g., detailed
environmental information) that could distract the user from the most relevant content of the application.
Likewise, a mobile application should enable users to search for information. If bounded by limited screen-
size, a search function can lead to a better user experience because it helps users to better navigate an
application (Campbell et al. 2009; Hess et al. 2009; Valacich et al. 2007; Wells et al. 2005). Finally, an
application should also help users to share information with others and collaborate with friends and
colleagues (Hess et al. 2009; Oulasvirta et al. 2007). For example, many mobile gaming applications
emphasize this feature and users view this as a useful feature because it leads to a more social user
experience. Thus, content relevance, search and collaboration are proposed as first-order constructs
forming application utility.
Second-order Construct: User Interface Graphics. User interface graphics is defined as the degree to
which a user perceives a mobile application’s user interface graphics to be effectively designed. For
example, it is important that animations are designed subtly and are not used extensively in mobile
applications (see Hess et al. 2005; Hong et al. 2004a; Hong et al. 2007; Lim et al. 2000). Likewise, it would
help users if interface elements were designed realistically (see Kang
2007). For instance, many pre-installed applications including Apple’s calendar and address book functions
are designed realistically to represent traditional calendars or address books. This would allow users to
instantly recognize the function of an application (Flavian et al. 2006). Finally, the graphics used in a mobile
application should be designed so as to be aesthetically appealing (Aladwani and Palvia 2002; Cyr et al.
2009; Hess et al. 2005; Hong et al. 2007; Kim and Stoel 2004; Parboteeah et al. 2009; Wells et al. 2011a;
Wells et al. 2011b). Therefore, subtle animation, realism and aesthetic graphics are proposed as first-order
constructs forming user interface graphics.
Second-order Construct: User Interface Input. User interface input is the degree to which a user
perceives that a mobile application allows easy input of data. For example, fingertip-size controls would
help users to select functions and menus in mobile applications (Kurniawan 2008). Relatively large controls
would be more usable because it will be easier for users to pick desired functions. Further, mobile
application controls should be immediately obvious and intuitive to use (Jokela et al. 2006; Seffah et al.
2006; Sorensen and Altaitoon 2008). Apple’s guidelines suggest that users are not willing to spend much
time learning mobile applications and it is essential that the user interface is clear when using an
application for the first time. The effort it takes users to input data should be minimized and it is important
that a mobile application offers mechanisms for easy data input. This could include dropdown menus or
automated data entry support (see Hong et al. 2004a; Valacich et al. 2007; Wells et al. 2005). Finally,
although user settings should be available in mobile applications, they should be designed to remain in the
background of the application and users should not be frequently prompted to adjust the settings when
using a mobile application (Jokela et al. 2006; Sorensen and Altaitoon 2008; Tan et al. 2009; Xu et al.
2009). Therefore, fingertip-size controls, control obviousness, effort minimization and de-emphasis of user
settings are proposed as first-order constructs forming user interface input.
Second-order Construct: User Interface Output. User interface output is defined as the degree to which a user
perceives that a mobile application presents content effectively. Apple’s guidelines suggest that mobile
applications should apply user-centric terminology and avoid technical jargon. Technical terms and
difficult-to-read text passages frustrate users and decrease user friendliness. Similarly, the
displayed text should be written in a concise style and lengthy descriptions should be avoided (Hess et al.
2009). The same principle would apply to the labels of icons and thus lengthy icon descriptions should be
avoided (Gebauer et al. 2007; Huizingh 2000; Jokela et al. 2006; Kurniawan 2008; Robbins and Stylianou
2003). Finally, to increase the recognizability of functions used in a mobile application, standardized user
interface elements should be employed. Users will appreciate standard elements because it would make
them believe that they are already familiar with the user interface (Hess et al. 2009; Hong et al. 2002; Hong
et al. 2007; Vila and Kuster 2011; Wells et al. 2011a; Wells et al. 2011b). Hence, user-centric terminology,
concise language, short icon labeling and standardized user interface elements are proposed as first-order
constructs forming user interface output.
Second-order Construct: User Interface Structure. User interface structure is the degree to which a user
perceives that a mobile application is structured effectively. Apple’s guidelines suggest that the user
interface of mobile applications should be structured from top-to-bottom. Users would intuitively start
searching for the most important information on the top of the screen (Li et al. 2009; Valacich et al. 2007;
Wells et al. 2011b). Thus, it would be best to place key information at the top of the screen. Also, mobile
applications require a logical and predictable path (see Adipat et al. 2011; Wells et al. 2011a: Wells et al.
2011b). Therefore, top-to-bottom structure and logical path are proposed as first-order constructs forming
user interface structure.
Following MacKenzie et al. (2011), we next discuss if the identified second-order constructs should
be viewed as defining characteristics of mobile application usability. This would indicate that the identified
mobile application usability dimensions should be modeled as formative indicators of the mobile application
usability construct. To
decide a construct’s directionality, MacKenzie et al. (2011) proposed considering if changes in one of the
sub-dimensions would be associated with a change in the focal construct. Thus, in thinking through the
relationship between the second-order constructs and mobile application usability, we felt that the sub-
dimensions are defining characteristics of mobile application usability—hence, mobile application usability
should be modeled formatively. For example, it is reasonable to say that an increase in the level of
application design would be associated with an increase in the overall usability of a given mobile
application, without necessarily being associated with any changes in the application utility or user interface
graphics. Similar arguments could be made for all second-order constructs. Thus, we model mobile
application usability as a formatively specified higher-order construct.
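To make this specification concrete, the formative relationship can be sketched in standard latent-variable notation; the symbols below are our illustration, not formulas from the original paper:

\[ \mathrm{MAU} = \sum_{j=1}^{6} \gamma_j \,\eta_j + \zeta \]

where MAU denotes overall mobile application usability, the \(\eta_j\) are the six second-order dimensions (application design, application utility, user interface graphics, user interface input, user interface output, and user interface structure), \(\gamma_j\) is the weight of dimension \(j\), and \(\zeta\) is a disturbance term.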
Once the constructs of interest are well defined, the next step is the creation of items (MacKenzie
et al. 2011). During the item development, the codes derived from the mobile user experience guidelines
were leveraged. Many sections of Apple’s guidelines included information that was helpful in designing the
initial items. The open codes derived from the guidelines were also helpful during the item creation process
because they included keywords describing the construct domains identified in step 1. We also searched
the literature for prior research that may have examined similar constructs to identify items that may be
relevant to us. While doing this, we examined the identified literature for additional keywords that could be
incorporated into the item wordings.
Altogether, 120 items were initially developed to capture the most essential aspects of the
constructs outlined in Table 4. Next, the simplicity and wording of the items were examined—i.e., face
validity check. Face validity checks are useful in situations when items are developed from scratch and
have not yet been tested with individuals (DeVellis 2011; MacKenzie et al. 2011; Straub et al. 2004). The
face validity check focused on the items themselves and did not ask participants to rank or respond to the
items on rating scales. Five doctoral
students and one Masters student volunteered to participate in the face validity check. A pre-requisite for
participation in the face validity check was that the participant be a mobile smartphone user, which would
ensure that they understood the context of the items. The participants were provided with a paper-based
survey that included all the 120 items. The participants were asked to examine all items and to comment on
the clarity of the questions. To identify weak items, we asked the participants to flag items whose wording
was confusing or vague. In total, 41 items were identified as too vague or worded unclearly, with specific
changes suggested to certain items. The authors discussed the 41 flagged items. Three of these 41 items
remained in the item pool after the wording was modified. This led to a pool of 82 items that were used in
the subsequent content validity assessment.
Content validity is defined as the extent to which a scale represents all facets of a given construct
(Lewis et al. 2005; MacKenzie et al. 2011; Straub et al. 2004). According to MacKenzie et al. (2011),
researchers should consider two major components when assessing the content validity of a survey
instrument: (1) is the individual item representative of an aspect of the content domain of the construct?
and (2) are the items as a set collectively representative of the entire content domain of the construct? In
the IS field, content validity has been assessed infrequently by researchers developing survey instruments.
MacKenzie et al. (2011) proposed using a variance analysis approach to assess content validity of
the items. Although this technique is rarely used in IS research, organizational studies have used it
frequently (see Hinkin and Tracey 1999; Yao et al. 2007). This procedure includes the use of a matrix in
which items are placed in the rows and construct definitions are listed at the top of the columns. Then,
raters would indicate the extent to which items capture the construct domain using a Likert-type scale (e.g.,
1=not at all; 5=completely). A one-way ANOVA can be used to assess if an item’s mean rating on one
construct definition differs significantly from its mean ratings on the other construct definitions. A
disadvantage of this approach is that raters would need “sufficient intellectual ability to rate the
correspondence between items and the theoretical definitions” (MacKenzie et al. 2011, p. 306).
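To illustrate, the following is a minimal sketch of this rating-and-ANOVA procedure in Python; the construct labels, ratings, and sample sizes are hypothetical, not data from the study.

```python
# Minimal sketch of the variance-analysis approach to content validity
# (Hinkin and Tracey 1999): raters score one item against several
# construct definitions, and a one-way ANOVA tests whether the mean
# rating differs across definitions. All values are hypothetical.
from scipy.stats import f_oneway

# Ratings of a single item against three construct definitions
# (1 = not at all ... 5 = completely), one list per definition.
ratings_by_definition = {
    "branding":          [5, 4, 5, 4, 5],  # intended construct
    "data preservation": [2, 1, 2, 2, 1],
    "instant start":     [1, 2, 1, 1, 2],
}

f_stat, p_value = f_oneway(*ratings_by_definition.values())
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# A significant F, with the highest mean on the intended definition,
# supports the item's content validity.
```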
A slightly less comprehensive technique for assessing the content validity of new scales was
proposed by Anderson and Gerbing (1991). Anderson and Gerbing’s (1991) approach assumes that each
item represents a single construct. Thus, rather than rating each item-construct combination, respondents
are only asked to assign each item to a single corresponding construct definition. Although the variance
analysis approach to content validity (Hinkin and Tracey 1999) is appealing due to its preciseness (Yao et
al. 2007), Anderson and Gerbing’s (1991) approach seems to be particularly suited for large survey
instruments (Yao et al. 2007). This is due to the fact that raters would be required to rate every item-to-
construct definition combination in the approach suggested by Hinkin and Tracey (1999), whereas Anderson and
Gerbing’s (1991) procedure would only require selecting one construct per item.
As recommended by MacKenzie et al. (2011), we initially attempted to use Hinkin and Tracey’s
(1999) variance analysis approach to assess the content validity of our newly developed scales. Thus, we
developed a matrix and organized construct definitions in the columns and placed the items in the rows
(Yao et al. 2007). It is important to note that Hinkin and Tracey (1999) used this approach with four
constructs and 39 items. MacKenzie et al. (2011) stressed that it is important to avoid overburdening the
raters by exposing them to too many content domains at the same time. They proposed limiting the content
domains to a maximum of eight to ten aspects at the same time. We, therefore, split the pool of items and
developed several matrixes, with a view toward reducing the complexity of the rating exercise. They also
suggested that researchers who construct new scales would be well-advised to employ construct
definitions from related constructs in order to ensure that the new items have content that is free from
external content domains. Therefore, we aimed to cluster similar construct definitions together in order to
identify potential overlap across constructs. In particular, we combined the first-order constructs that
form each second-order construct. For example, as noted earlier in the
survey development process, the higher-order construct application design is formed by four first-order
constructs: branding, data preservation, instant start, and orientation. Therefore, these first-order constructs
were combined in one matrix in order to identify potential overlap across these constructs.
Next, we asked four independent raters to evaluate and examine the matrixes—two raters were IS
researchers and two were administrative staff at an Australian university. Each rater was provided with
instructions and paper-based matrixes (Table 6 illustrates a matrix used for the content validity check).
Each respondent was asked to rate how well each item (row) corresponds to each construct
definition (column) on a scale from 1-7 (1=strongly disagree... 7=strongly agree). Once respondents
completed the survey, we interviewed them to ask them about their experiences with the rating procedure.
All respondents reported that the instructions were clear but they found it challenging to complete the
rating task. One of the
administrative staff provided feedback via email and suggested: “I found it difficult to link the concepts to
the questions. Some of them did not correspond to each other at all and I was wondering why I should rate
them. I think it would be better to simplify the questions.” Further, three out of four raters suggested that it
would be more logical to pick-and-choose the most appropriate construct definition for each item (rather
than rating how well each construct definition corresponds to each item). Interestingly, the approach
suggested by the respondents corresponded to the content validity assessment approach described by
Anderson and Gerbing (1991).1
Given the overall length of the survey and the feedback we received, we decided to use the
modified matrixes, as suggested by Anderson and Gerbing (1991). We anticipated that ranking all the initial
item-to-construct definition combinations would overburden raters. The content validity survey was re-
organized and the instructions were re-written by asking raters to select only one corresponding construct
definition for each item. It should be noted that the items were worded generically for the task but when
they were actually administered, the application could be replaced with the name of the actual application.
This approach is consistent with prior instrument development research in IS—e.g., Compeau et al. (2012).
We discussed this new survey with the raters who had previously assessed the matrixes. Each rater
confirmed that the task would be more meaningful to them and they indicated it would be easier to
complete.
Content Validity Check: MacKenzie et al. (2011) and Anderson and Gerbing (1991) suggested that
content validity raters should come from the main population of interest. Therefore, we executed this step
by collecting data from actual consumers recruited by a market research firm. We obtained 350 responses.
Out of 350 respondents, 318 participants took a reasonable length of time to complete the survey; these
318 responses were retained for the analyses.
1 When queried about whether they were familiar with the Anderson and Gerbing (1991) technique or if they had used such a
technique before, none of the respondents indicated familiarity.
Appendix 1 summarizes the respondent demographics. As evident from Appendix 1, the sample
had a large proportion of students, which matched the profile of the sampling frame provided by the market
research firm. Thus, it was felt that non-response bias was not a concern. Further, comparing early versus
late responses was not felt to be useful because all responses were collected during a single weekend and
no reminders were employed (Churchill 1979; Hair et al. 1998). At first glance, the high proportion of young
students may seem problematic because newly developed items should be understood by the main
population of interest (Compeau et al. 2012; MacKenzie et al. 2011). However, such a sample was
appropriate to check the content validity of the scales for two major reasons—(1) mobile applications are
primarily used by young individuals and these applications are particularly popular among students
(Hampton et al. 2011); and (2) Hinkin and Tracey (1999) suggested that even samples containing
exclusively students are appropriate for content validity checks because the sorting procedure requires
analytical thinking skills (see Compeau et al. 2012). For these reasons, the respondent demographics were
deemed appropriate for the content validity check.
We computed two indexes from the survey data following Anderson and Gerbing (1991). First, the
proportion of substantive agreement (PSA) was computed. PSA indicates the proportion of respondents who
assign an item to its intended construct, and is computed as

\[ P_{SA} = \frac{n_c}{N} \]

where nc is the number of respondents who assigned an item to its intended construct and N is the total
number of respondents. PSA values can range between 0 and 1, with a high value indicating higher
agreement that the construct definition represents the items judged (Anderson and Gerbing 1991).
Second, the substantive validity coefficient (CSV) was computed. CSV is the extent to which
respondents assign items to the posited construct rather than to any other construct. The ratio is computed
as

\[ C_{SV} = \frac{n_c - n_0}{N} \]

where nc is the number of respondents assigning an item to the intended construct, n0 is the highest
number of assignments of the item to any other construct, and N is the total number of respondents.
CSV values can range from -1 to 1. Positive values suggest that an item was assigned to its intended
construct more often than to any other construct. Negative values suggest the opposite.
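As an illustration, both indices can be computed in a few lines of Python; the sorting data below are hypothetical, not the study’s responses.

```python
# Minimal sketch of Anderson and Gerbing's (1991) substantive validity
# indices; the sorting data below are hypothetical, not study data.
from collections import Counter

def psa_csv(assignments, intended):
    """Compute (PSA, CSV) for one item.

    assignments: construct label chosen by each rater for the item.
    intended: the item's posited construct.
    """
    counts = Counter(assignments)
    n = len(assignments)
    n_c = counts[intended]  # assignments to the intended construct
    n_0 = max((v for k, v in counts.items() if k != intended), default=0)
    return n_c / n, (n_c - n_0) / n

# Ten hypothetical raters sort one branding item.
raters = ["branding"] * 8 + ["aesthetic graphics"] * 2
print(psa_csv(raters, "branding"))  # -> (0.8, 0.6), above the .60 cut-off
```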
The results of these analyses are shown in Appendix 2. We applied a threshold of .60 as a cut-off
point for PSA and CSV values for our content analysis. When using a .60 cut-off value, the results suggest
that more than 60% of all raters associated the items with the intended construct definitions.
Overall, the content validity ratios were high indicating that most respondents sorted the majority
of items into their posited construct domains. Out of 82 items, eleven items did not meet the .60 cut-off
value. Appendix 2 shows that the CSV values obtained for DAPR4, CRLV2, CRLV4, COOB2, ICOL1,
ICOL2, ICOL3, ICOL4, TTPS2, TTPS3 and TTPS5 were lower than the threshold of .60. However, all PSA
values obtained for these items were above the .60 cut-off value. Thus, we carefully inspected the item
wordings and compared them with the construct definitions. In some cases (e.g., CRLV2), we re-worded the
items slightly in order to better align the items with the construct definitions. In contrast, all short-icon
labeling (ICOL 1-4) items were closely associated with concise language items (CLAN1-4) and the resulting
CSV values were lower than .05. These findings could be interpreted as suggesting that individuals assume
that if a mobile application makes use of concise language, it will also have concise labels for icons. After
careful consideration, we decided to exclude the ICOL construct. Appendix 3 shows the item pool
following the content validity check, including 78 items for measuring the constructs derived from Apple’s
user experience guidelines.
The next step in the survey development process is the specification of the measurement model
(MacKenzie et al. 2011). This process focuses on specifying how the indicators relate to the constructs and
how the first-order constructs relate to the higher-order constructs. We compared two alternative
measurement models. Although we developed the rationale for the second-order constructs in step 1
earlier, we wanted to ensure that our model was a good specification. In the first measurement model, all
constructs were modeled as first-order constructs. In this case, one path between the reflective indicators
and each first-order construct was set as 1 when setting up the measurement model (Hair et al. 1998;
MacCallum and Browne 1993; MacKenzie et al. 2011). Our proposed measurement model, i.e., second-
order model, is shown in Figure 2. Here too, one path between the indicators and each first-order construct
was set as 1. We repeated this procedure for the relationships between the first-order constructs and the
second-order constructs. We also used 4 global reflective items to measure each second-order construct.
The global items were developed based on the second-order construct definitions that were discussed
earlier in step 1. In order to avoid overburdening raters, the global items were not included in the content
validity assessment outlined in the previous section. All remaining steps of the instrument development
procedure were followed to develop the global items as outlined above. Specifically, we developed items
that represented the second-order construct definitions discussed earlier. We also searched for relevant
literature that might have examined similar constructs in order to inform our item development process. In
order to evaluate the wording of the global items, we conducted a face validity check involving the same six
participants that supported us in evaluating the wording of the first-order constructs. The face validity check
was conducted using the same procedure as for the first-order constructs explained earlier. Flagged items
that were identified as too vague were deleted or their wording was modified. All global items used to
measure the second-order constructs are shown in Table 7.
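For reference, the scaling constraint applied above (fixing one loading per construct to 1) can be written in generic latent-variable notation; this is our sketch, not a formula from the paper:

\[ y_{ik} = \lambda_{ik}\,\eta_i + \varepsilon_{ik}, \qquad \lambda_{i1} = 1 \]

where \(y_{ik}\) is the k-th reflective indicator of first-order construct \(\eta_i\), \(\lambda_{ik}\) is its loading, and \(\varepsilon_{ik}\) is measurement error; fixing \(\lambda_{i1}\) to 1 sets the scale of \(\eta_i\), and the analogous constraint is applied between the first- and second-order constructs.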
Once the measurement model is formally specified, the next step of the scale development is the
pre-test of the survey instrument. The psychometric properties, including the convergent, discriminant and
nomological validity, of the scales should be investigated (MacKenzie et al. 2011; Straub 1989; Straub et al.
2004). To evaluate the nomological validity of the focal construct, it is important to collect data for constructs
that are theoretically related to it.
For the pre-test, we created a survey including instructions for the participants and the items
developed as discussed in step 2 earlier. All items were measured using a 7-point Likert-agreement scale
(1=strongly disagree... 7=strongly agree). Before collecting data from a large sample, we asked two
individuals to read the instructions and provide feedback on the items and survey structure. Both individuals
were administrative staff at an Australian university. They read the instructions that accompanied the
survey and examined all items. We then debriefed
these two individuals, and they confirmed that the instructions were clear and easy to follow.
In order to collect data from a wider audience, we surveyed a different set of U.S. consumers who
were also recruited by the same market research firm that we had used earlier. Due to the exploratory
nature of this phase, we aimed to collect 500 responses, the number we deemed necessary to investigate the
psychometric properties of the scales (Hair et al. 1998; MacCallum et al. 1999). Given that our survey
instrument included 102 newly developed items, our ratio of items to responses was in the range specified by
MacKenzie et al. (2011). Due to the popularity of social media sites among mobile smartphone users
(Hampton et al. 2011), we tailored all questions to a mobile social media application, such as Facebook.
This approach of tailoring the questions to a specific application, as noted earlier, is consistent with extant
research (e.g., Adipat et al. 2011; Devaraj et al. 2002; Devaraj et al. 2006; Hess et al. 2009; Hong et al.
2004a; Hong et al. 2004b; Thong et al. 2002; Wells et al. 2011a; Wells et al. 2011b; Venkatesh et al.
2003b; Venkatesh and Ramesh 2006). At the beginning of the survey, we asked two qualifying questions:
(i) what is the frequency with which you access mobile social media applications via your smartphone? and
(ii) which mobile social media application do you use most often via your smartphone? Respondents who
did not use mobile social media applications at all were disqualified from participating in the survey and we
did not collect data from these individuals. Similarly, we excluded those potential respondents who did not
use one of the following mobile social media applications: Facebook, Google+, LinkedIn, MySpace or
Twitter. We pursued this sampling strategy for three reasons. First, these mobile social media applications
are very popular and successful, and we felt that this would allow us to understand the usability factors that
can truly contribute to the outcome variables. We felt that the findings of our work should be particularly
informative for application designers with less extensive funding opportunities. Second, we felt asking
questions regarding a specific mobile social media application was necessary to ensure that all
respondents have well-established thoughts about a specific social media application so that they could
meaningfully evaluate the usability of that
application. Third, these mobile applications pursue similar business strategies and they provide
comparable value propositions to users (e.g., allowing users to create and share user generated content).
This was particularly useful because it allowed us to group the responses in the data analyses discussed
below. We felt this was reasonable because the usability principles should not differ at a theoretical level.
During the data collection, the online survey was programmed to carry forward the mobile social media
application response that each individual provided. For instance, rather than the item being stated as “the
mobile social media application allows you to connect with other people,” the item was displayed as
“Facebook allows you to connect with other people” for a respondent who had selected Facebook.
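A minimal sketch of this carry-forward piping, with a hypothetical item template and answer:

```python
# Minimal sketch of the "carry forward" piping described above; the
# item template and the respondent's answer are hypothetical.
item_template = "{app} allows you to connect with other people."
chosen_app = "Facebook"  # answer to the second qualifying question
print(item_template.format(app=chosen_app))
# -> Facebook allows you to connect with other people.
```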
In total, 500 responses were collected for the pre-test. As with the content validity check, all
responses were scrutinized for the duration that the respondents took to complete the survey. Those
respondents who took too little time and/or did not correctly answer reverse-coded filler items were
excluded from the sample. We applied a cut-off threshold of eight minutes to complete the survey including
all items of our instrument (113 items as shown in Appendix 3, Table 7 and Table 8) and excluded those
responses that were completed in less than eight minutes. We felt that a response rate approaching fifteen questions per minute (i.e., 113 items in under eight minutes) would indicate that the survey taker did not pay adequate attention to the questions.
This led to 440 usable responses. Appendix 1 provides information on the respondent demographics. As
with the content validity check, we felt that non-response bias was not a concern because the sample
matched the profile of the sampling frame provided by the market research firm. We also felt that
comparing early versus late responses was not useful because all responses were collected during a single
weekend and no reminders were employed (Churchill 1979; Hair et al. 1998).
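For illustration, the screening rules above can be expressed in a few lines of Python; the file and column names are assumptions for the sketch, not our actual survey export:

    import pandas as pd

    # Hypothetical export of the pre-test responses (column names assumed).
    df = pd.read_csv("pretest_responses.csv")

    MIN_MINUTES = 8  # 113 items in under 8 minutes ~ 15 questions per minute

    # Rule 1: exclude respondents who completed the survey too quickly.
    too_fast = (df["duration_seconds"] / 60) < MIN_MINUTES

    # Rule 2: exclude respondents who gave the same rating to an item and its
    # reverse-coded filler counterpart, which signals inattentive responding.
    inattentive = df["filler_item"] == df["filler_item_reversed"]

    usable = df[~too_fast & ~inattentive]
    print(f"{len(usable)} usable responses out of {len(df)}")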
As indicated above, assessing the nomological network of focal constructs is an important part of
the scale development procedure. Bagozzi (1980, p. 129) argues that we: “must also consider the
relationship of the concept under investigation to other concepts in an overall context of a theoretical…empirical relationships." We followed Bagozzi's (1980) recommendation and explored the nomological
network of mobile application usability. In order to assess the nomological validity of our proposed mobile
application usability conceptualization, we collected data on two dependent variables, namely continued
intention to use and mobile application loyalty.
Among those with experience with a technology, continued intention to use is an appropriate dependent
variable (Bhattacherjee 2001; Brown et al. 2012; Brown et al. 2014). We define continued intention as the
degree to which a user feels he or she will keep using a mobile application (adapted from Bhattacherjee
2001). There are theoretical reasons and empirical evidence that usability is a key predictor for individuals’
ongoing use of technological applications (Agarwal and Venkatesh 2002; Devaraj et al. 2002; Devaraj et al.
2006; Hong et al. 2004a; Venkatesh and Agarwal 2006; Venkatesh and Ramesh 2006; Wells et al. 2011b).
Brand loyalty is widely used in marketing research, and we felt that examining the impact of usability on an outcome of such organizational significance would help validate our conceptualization and scales and further establish criterion validity. We adapted the concept of brand loyalty to the context of mobile
applications. We define mobile application loyalty as the degree to which a user has a deeply held
commitment to rebuy or repatronize a mobile application (adapted from Johnson et al. 2006). Much
practitioner-based research suggests that mobile application loyalty is a consequence of the overall
usability of a given application (Deloitte 2012; Forrester Research 2011; Youens 2011). For instance,
Gartner (2012) argues that mobile application loyalty is ultimately driven by user experience and the
usability of the mobile application. Further, we found a considerable amount of research on website
usability that established a theoretical link between web application usability and web application loyalty
(Casalo et al. 2008; Cyr et al. 2006; Cyr 2008; Flavian et al. 2006). Therefore, we felt it is important to
examine a broader outcome in conjunction with mobile application usability. Further, MacKenzie et al.
(2011) recommended expanding the "nomological network to include…consequences of the focal construct… learning more about a theoretical construct is a matter of elaborating the nomological network in which it occurs, or of increasing the
definiteness of the components.” Thus, we decided to expand the nomological network of mobile
application usability and examine whether mobile application usability is a predictor of consumers’ mobile
application loyalty. We adapted a previously validated scale to measure continued intention to use
(Bhattacherjee 2001; Venkatesh and Goyal 2010). We measured mobile application loyalty by adapting a
previously validated scale of brand loyalty (Johnson et al. 2006). Table 8 lists the items. Figure 3 displays
the nomological network that we examined.
With the pre-test data, MacKenzie et al. (2011) suggest purification and refinement of the survey
instrument. This involves a set of statistical tests to evaluate the measurement properties of the scales using the pre-test data. Evaluating the measurement model requires: (a) assessing goodness of fit; (b) assessing the validity of the set of
indicators at the construct level; (c) assessing the reliability of the set of indicators at the construct level; (d)
assessing individual indicator validity and reliability; and (e) eliminating weak indicators.
We used SPSS and AMOS to perform the statistical tests reported in this and subsequent sections.
We compared the two alternative measurement models discussed in step 4: first-order vs. second-order
models. Consistent with prior research, we used the Χ2-difference test (see Tanriverdi 2005; Wallace et al.
2004) and comparative model fit. The results suggested that the model including the second-order
constructs had a slightly lower Χ2 but was not statistically significantly different from the first-order model.
However, the parsimony in predicting variables that comes with the second-order model caused us to favor
this model. Further, all of the other fit statistics were better for the second-order model and in the
acceptable range (see Hair et al. 1998; Straub et al. 2004). In fact, three of the six fit statistics were below
the acceptable levels for the first-order measurement model. Next, we followed the procedures outlined by MacKenzie et al. (2011) to evaluate the second-order measurement model.
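Although we performed these tests in AMOS, the Χ2-difference computation can be illustrated in Python; the fit values below are placeholders rather than our estimates:

    from scipy.stats import chi2

    # Placeholder fit statistics for the two nested measurement models.
    chi2_first, df_first = 2105.4, 964     # first-order model (assumed values)
    chi2_second, df_second = 2089.7, 978   # second-order model (assumed values)

    delta_chi2 = abs(chi2_second - chi2_first)
    delta_df = abs(df_second - df_first)
    p_value = chi2.sf(delta_chi2, delta_df)

    # A non-significant difference favors the more parsimonious
    # second-order specification.
    print(f"delta chi2 = {delta_chi2:.1f}, delta df = {delta_df}, p = {p_value:.3f}")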
Appendix 4 includes the goodness of fit statistics of the measurement model. Overall, the
goodness of fit statistics were well in line with the recommended cutoff values (Hair et al. 1998; MacKenzie
et al. 2011; Straub et al. 2004), thus supporting the validity of the measurement model. We then assessed
the validity of the indicators at the construct level by examining the average variance extracted (AVE) of all
first-order constructs. The results confirmed that all AVEs were in excess of .70, which is higher than the
recommended threshold of .50 (MacKenzie et al. 2011). We also examined the reliability of the scales
using Cronbach's alpha. All Cronbach's alpha values ranged from .75 to .85 and were above the desired
threshold of .70. Following MacKenzie et al. (2011), we then computed
the unique proportion of variance in the second-order construct accounted for by each first-order construct
and Fornell and Larcker’s (1981) construct reliability index. Appendix 6 shows these results. Most first-order
constructs explained a considerable amount of variance in the respective second-order constructs. Notably,
content relevance explained 35% of unique variance in application utility. All construct reliability index
scores were above .77 and higher than the recommended threshold of .70 (Fornell and Larcker 1981).
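For illustration, these statistics can be computed from standardized loadings and raw item scores as follows; although our analyses were run in SPSS and AMOS, the logic is shown here in Python with made-up loading values:

    import numpy as np

    def ave(loadings):
        # Average variance extracted: mean of the squared standardized loadings.
        lam = np.asarray(loadings, dtype=float)
        return np.mean(lam ** 2)

    def construct_reliability(loadings):
        # Fornell and Larcker's (1981) construct reliability index.
        lam = np.asarray(loadings, dtype=float)
        return lam.sum() ** 2 / (lam.sum() ** 2 + np.sum(1 - lam ** 2))

    def cronbach_alpha(items):
        # items: respondents x items matrix for a single scale.
        x = np.asarray(items, dtype=float)
        k = x.shape[1]
        return k / (k - 1) * (1 - x.var(axis=0, ddof=1).sum() / x.sum(axis=1).var(ddof=1))

    loadings = [0.85, 0.88, 0.90, 0.82]  # illustrative values only
    print(f"AVE = {ave(loadings):.2f}, CR = {construct_reliability(loadings):.2f}")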
Next, we inspected the item loadings for the reflectively measured first- and second-order
constructs, and assessed the weights of each first-order construct on the respective second-order
construct. Appendix 7 shows these results. All items loaded highly on the intended constructs, with item-to-
construct loadings between .65 and .93 and only four loadings lower than .70 (UCT1, TTPS3, DUS4,
SANM1), thus supporting convergent validity. The weights of the first-order formative constructs on the
respective second-order constructs were significant (p<.001 in all cases), thus indicating that each first-
order construct contributed substantially to the corresponding second-order construct. These findings were
positive because a well-distributed weight structure is desired when using formatively measured constructs
(Cenfetelli and Bassellier 2009; Diamantopoulos et al. 2008; Petter et al. 2007). We also examined the
variance inflation factors (VIFs) for all first-order constructs forming the second-order constructs. All VIF
values were less than 4 indicating that multicollinearity was not a major issue (see Petter et al. 2007). The
bottom part of Appendix 7 shows the results of the item loadings obtained for the reflectively measured
second-order constructs. As discussed earlier in step 4, we used four global items to identify each second-
order construct. The findings confirmed that the items loaded highly on the intended second-order constructs.
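For reference, the VIF check can be scripted as follows; the construct scores are simulated and the column names merely illustrate the four first-order constructs of user interface input:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    # Simulated first-order construct scores (real analyses use the survey data).
    rng = np.random.default_rng(0)
    scores = pd.DataFrame(
        rng.normal(size=(440, 4)),
        columns=["control_obviousness", "deemphasis_of_settings",
                 "effort_minimization", "fingertip_size_controls"])

    # VIFs are computed with an intercept included.
    X = sm.add_constant(scores)
    vifs = {col: variance_inflation_factor(X.values, i)
            for i, col in enumerate(X.columns) if col != "const"}
    print(vifs)  # values below 4 would suggest multicollinearity is not an issue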
The structural model results are shown in Table 9. Usability explained 42% of variance in
continued intention to use. The R2 was lower for mobile application loyalty (17%). Although encouraging
that usability predicted mobile application loyalty, the lower variance explained is to be expected given that
loyalty is driven by several factors beyond the usability of applications, such as affective commitment and perceived value (Johnson et al. 2006). All paths between the second-order constructs and continued intention
to use were significant. Most influential was application design (.36), followed by application utility (.28) as
well as user interface graphics (.28). Three of the six second-order constructs were significant predictors of
mobile application loyalty, with application design and application utility being the strongest, and user
interface graphics having a weak effect. Due to these promising findings, we did not feel that any items
needed to be removed.
Once the scales are pre-tested, refined and problematic indicators are eliminated, data should be
collected from a new sample in order to re-examine the purified scales. This is an essential step that is intended to provide further evidence of the stability of the scales (MacKenzie et al. 2011).
Similar to the pre-test, we collected data from a new sample consisting of U.S. consumers using
mobile social media applications. To ensure that we collected data from a new sample, we hired a different
market research firm from the one used for the content validity check and pre-test. The new firm conducted
the data collection using the same survey instrument that we used for the pre-test. Following the procedure
outlined in step 5, we screened respondents who did not qualify based on their usage patterns, time spent
on the survey and response to reverse-coded items. In total, 408 usable responses were collected.
Appendix 8 provides the information on the respondent demographics. As in the case of the pre-test, we
felt that comparing early versus late responses was not useful because all responses were collected during a single
weekend and no reminders were employed (Churchill 1979; Hair et al. 1998). The sample was well
distributed across income, jobs, and age groups. Men were over-represented, but the sample corresponded to the sampling frame provided by the market research firm.
The next step is to assess scale validity (MacKenzie et al. 2011). A major goal during this step is
to evaluate if the items used to assess the focal construct are: “(1) accurate representations of the
underlying construct (through experimental manipulation or comparing groups known to differ on the
construct), (2) adequately capture the multidimensional nature of the construct, (3) are distinguishable from
the indicators of other constructs (discriminant validity), and (4) are related to the measures of other
constructs specified in the construct’s theoretical network (nomological validity)” (MacKenzie et al. 2011, p.
317).
MacKenzie et al. (2011) suggested that experimental manipulation checks and videotaped
observations work well when benchmarking different outcome scenarios (e.g., performance measures). In
contrast, they noted that individuals’ internal states (e.g., attitudes, anger, frustration, values) would be less
suited to experimental manipulation (MacKenzie et al. 2011). Given that prior research has pointed out that
usability is a subjective judgment made by users (e.g., Devaraj et al. 2002; Devaraj et al. 2006; Hess et al.
2009; Hong et al. 2004a; Hong et al. 2004b; Thong et al. 2002; Wells et al. 2011a; Wells et al. 2011b), we
concluded that an experimental manipulation was impractical to validate our newly developed instrument.
Also, a literature review did not reveal any meaningful theoretical patterns across groups that could be used
to validate the scales. Moreover, given the number of constructs being examined here and the length of the
paper even as it stands now, we felt that we could not develop hypotheses comparing different groups in
this paper.
Instead, we felt that benchmarking our instrument against a previously validated
conceptualization and instrument based on MUG (Agarwal and Venkatesh 2002; Venkatesh and Agarwal
2006; Venkatesh and Ramesh 2006) would further our case in support of nomological validity. We use the
MUG scales as an illustration only and do not wish to critique the instrument; we chose it for
the following reasons. First, the MUG scales were drawn from usability guidelines developed for websites
(see Agarwal and Venkatesh 2002) and it is likely that critical contextual factors pertaining to mobile
application usability were not included as a result of this. Second, the MUG scales were the most
comprehensive conceptualization of application usability that was previously validated in a field study
focusing on mobile application usability. Third, we used Google Scholar and found that the MUG studies,
i.e., Agarwal and Venkatesh (2002), Venkatesh et al. (2003b), Venkatesh and Agarwal (2006) and
Venkatesh and Ramesh (2006), were cited in excess of 1000 times. This provided us with confidence that
the instruments were well accepted. Given that we could not compare the model across groups, we felt that
such a benchmarking was a good alternative. For the competing model, we adapted the scales employed
by Venkatesh and Ramesh (2006), shown in Appendix 9. We initially used the new sample to assess the fit
of the measurement model. As listed in Appendix 4, all fit indexes met the recommended thresholds, thus
indicating a good model fit. Appendix 10 shows that the AVEs were all above .70, which is above the
recommended cutoff value. The results confirmed that the AVEs for each construct exceeded the squared
correlation of the construct with other constructs (Fornell and Larcker 1981). Appendix 10 also shows that
the reliabilities, assessed using Cronbach’s alphas, ranged from .70 to .83 for all scales and were above
the threshold of .70. Following MacKenzie et al. (2011), we also computed the unique proportion of
variance that each first-order construct explained in the associated second-order construct. Appendix 11
shows that most first-order constructs explained a significant proportion of variance in the second-order
constructs. Similar to the pre-test, content relevance explained more than 30% of unique variance in
application utility.
Next, we inspected the item loadings for the reflectively measured first-order constructs. We also
assessed the weights of each first-order construct on the respective second-order construct. Appendix 12
shows these results. All item loadings ranged between .62 and .91. Only four items were lower than .70
(SANM1, DUS4, UCT1, and TTPS3). These findings indicated a high level of convergent validity. The
weights for the formative constructs were all significant and ranged between .24 and .58, thus indicating
that each first-order construct contributed substantially to the corresponding second-order construct.
Further, all VIFs were less than 4 indicating that multicollinearity was not an issue (Petter et al. 2007). Also,
the item loadings for the reflective indicators of the second-order constructs exceeded .70.
Next, we tested the structural model using the two outcome variables: continued intention to use
and mobile application loyalty. Table 10 shows these results. Usability explained 41% of variance in the
continued intention to use construct. Consistent with the pre-test results (Table 9), the R2 was lower in
explaining mobile application loyalty (16%). All structural paths from the second-order constructs to
continued intention to use were significant and the path coefficients ranged between .19 and .35.
Comparing these results with the pre-test, we concluded that the findings for the measurement and
structural models were stable and consistent across the two samples. The MUG results are also shown in
Table 10. In comparison to our model, MUG is far less predictive, with the variance explained in intention and loyalty being substantially lower.
The next step in the scale development was to cross-validate our results. This step is best
performed using a new sample in order to assess the stability of the scales (MacKenzie et al. 2011).
We collected data from a new sample of U.S. consumers using
mobile social media applications. We employed the same market research firm that we used for study 1.
Care was taken not to invite any study 1 participants. Following steps 5 and 7, we excluded individuals who
were not qualified to participate in the survey (e.g., being unfamiliar with mobile social media applications),
those who spent too little time on the survey, and those who responded incorrectly to the reverse-coded items. We
collected 412 usable responses. Appendix 8 shows the respondent demographics. As in the pre-test and
study 1, we felt comparing early versus late responses was not useful because all responses were
collected during a single weekend and we did not send out reminders to invited participants (Churchill
1979; Hair et al. 1998). Also similar to the pre-test and study 1, men were over-represented in this sample
but this was in line with the sampling frame that we were provided. We followed the same approach as
discussed for the pre-test and study 1 to assess the psychometric properties of the scales.
Appendix 4 shows that all fit indexes were in line with recommended cutoff values indicating that
the measurement model fit was good. All AVEs ranged from .70 to .74 and exceeded the .50 threshold.
The results confirmed that the AVEs for each construct exceeded the squared correlation of the construct
with other constructs (Fornell and Larcker 1981). Further, all Cronbach’s alphas ranged from .74 to .82 and
were above the recommended .70 threshold. Appendix 13 shows these results.
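As an illustration, the Fornell-Larcker comparison of AVEs against squared inter-construct correlations can be scripted as follows; the scores and AVE values are placeholders for the sketch:

    import numpy as np
    import pandas as pd

    # Simulated construct-level scores; real analyses use the sample data.
    rng = np.random.default_rng(1)
    scores = pd.DataFrame(
        rng.normal(size=(412, 3)),
        columns=["application_design", "application_utility", "ui_graphics"])
    aves = {"application_design": 0.72, "application_utility": 0.70,
            "ui_graphics": 0.74}  # placeholder AVEs

    squared_corr = scores.corr() ** 2
    for name, ave_val in aves.items():
        # Discriminant validity holds when the AVE exceeds the squared
        # correlation with every other construct (Fornell and Larcker 1981).
        others = squared_corr[name].drop(name)
        print(name, "pass" if (ave_val > others).all() else "check")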
Following MacKenzie et al. (2011), we again computed the unique proportion of variance that each first-order construct explained in
the associated second-order construct. The first-order constructs explained a significant proportion of
variance in the second-order constructs. Appendix 14 shows these results. Fornell and Larcker’s (1981)
construct reliability scores ranged between .78 and .89, thus exceeding the recommended threshold of .70.
Appendix 15 shows that the item-to-construct loadings for the first-order constructs ranged
between .69 and .91. Only the loading of SANM1 was lower than .70, and the majority of item loadings
exceeded .80. All first-order construct weights were significant, and ranged between .24 and .60. This
suggested that all first-order constructs contributed substantially to the intended second-order constructs.
The VIFs of the formative first-order constructs were all lower than 4, thus suggesting multicollinearity was
not a problem. Appendix 15 also shows that the global items used to measure the second-order constructs
loaded cleanly onto the intended constructs and the loadings exceeded .70 in all cases.
The structural model results are shown in Table 11. Usability explained 47% of the variance in
continued intention to use. All six second-order constructs were significant, with application design,
application utility and user interface graphics being the strongest predictors. Interestingly, these three
strongest predictors of continued intention to use were the only significant predictors of mobile application
loyalty. The pattern of results was thus highly consistent with the pre-test and study 1. Table 11 also reports
the results of the prediction using the MUG constructs. As in study 1, the variance explained by the MUG constructs was substantially lower than that explained by our model.
The final step includes developing norms for the new scales. This is an important step because it
helps in interpreting the findings and guides future research (MacKenzie et al. 2011).
Generally, it is useful to estimate the distribution of the population of interest before developing
sampling strategies (DeVellis 2011; MacKenzie et al. 2011). Hence, we studied the U.S. user statistics of
social media sites published by each provider (e.g., Facebook) and found that younger individuals predominantly use
social media applications. For instance, 56% of all U.S. Facebook users are between 18 and 35 years old
(Insidefacebook 2011). Therefore, our sampling strategy matched the population of social media application users,
and we felt it was less meaningful to aim for a representative sample of the entire U.S. population including
a large proportion of older adults. Another consideration for choosing the most suitable sample was our
resource constraints. Representative samples are difficult to obtain and we instructed the market research
firms to provide a sampling frame that matched the published statistics on social media users in the U.S. As
recommended by MacKenzie et al. (2011), we also carefully inspected all four samples for the shape of the
distribution and examined the skewness and kurtosis—we found no significant issues.
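Such distribution checks can be scripted as follows; the data are simulated, and the cutoffs shown are common rules of thumb rather than values prescribed by MacKenzie et al. (2011):

    import numpy as np
    from scipy.stats import skew, kurtosis

    # Simulated responses on a 7-point scale; real checks loop over every item
    # in each of the four samples.
    rng = np.random.default_rng(2)
    responses = rng.integers(1, 8, size=500).astype(float)

    s = skew(responses)
    k = kurtosis(responses)  # excess kurtosis; 0 for a normal distribution

    # Common rules of thumb flag |skewness| > 2 or |excess kurtosis| > 7.
    print(f"skewness = {s:.2f}, excess kurtosis = {k:.2f}")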
Another important aspect when developing norms for scales is the sample size. The sample should
be large enough to conclude that the scales are truly stable (Lee and Baskerville 2003; MacKenzie et al.
2011). Beyond the content validity check, we validated our scales using three independent samples, each
comprising more than 400 individuals. We hired two different market research firms to conduct the data
collection. In total, we surveyed more than 1,200 individuals who use mobile social media applications and
we felt these samples were sufficiently large to conclude that the scales are stable. Finally, scholars have noted that scales should be validated
across research contexts and time (Ancona et al. 2001; Johns 2006; MacKenzie et al. 2011). Johns (2006)
specifically noted that a particular context of study can lead to different results, such as relationships going
from significant to non-significant. Others have argued that the concept of time is important, and could
impact the stability of theories and associated scales (Ancona et al. 2001; Harrison et al. 1998; MacKenzie
et al. 2011). Due to practical constraints, we only tested our conceptualization and survey instrument in the
context of mobile social media applications in the U.S. using cross-sectional data. Future studies could
extend the scope of this work and develop further norms for our usability instrument using samples in other
countries to understand cross-cultural differences and also by collecting longitudinal data to understand
differences in the importance of different usability elements over time. Still, given our results, we believe it
is reasonable to say that the scales are stable in the context of our work—i.e., current users of mobile social media applications in the U.S.
DISCUSSION
The current study developed and validated a mobile application usability conceptualization and
survey instrument following the 10-step procedure proposed by MacKenzie et al. (2011). We adapted
Apple’s user experience guidelines for mobile applications to develop our conceptualization and instrument.
We conceptualized and used 19 first-order constructs to form 6 second-order constructs. The scale
development process included item generation and a content validity assessment using 318 U.S.
consumers. Once we specified the measurement model, we collected three waves of data from over 1,200
U.S. consumers of social media applications. The scales were tested and found to be reliable and valid.
The fit indexes of the proposed measurement model were good in all three samples. We also found that
our conceptualization of mobile application usability was a good predictor of continued intention to use and
mobile application loyalty, thus supporting the nomological and predictive validity of our scales. Finally, we
benchmarked our model against a model based on Microsoft’s usability guidelines (Venkatesh and Ramesh
2006) and found that our model explained more variance in continued intention to use and
mobile application loyalty (see Hong et al. 2014). Based on our findings, we proposed norms to help future research.
Despite mobile application usability being an important emerging concept in IS, our literature
review revealed that there is a lack of theoretical and methodological clarity on holistically evaluating mobile
application usability. Our work addresses this issue by providing a context-specific and fine-grained
conceptualization of mobile application usability and offers a reliable and valid instrument. This advances IS and HCI research in several ways.
First, our work highlights the importance of the user context in IS and HCI studies and provides an
example for context-specific theorizing related to an IT artifact. Our literature review suggested that prior
research primarily drew on scales developed for website usability for measuring mobile application
usability. Although it is certainly reasonable to use conceptualizations and instruments for website usability
as a starting point (e.g., Venkatesh and Ramesh 2006), such instruments may not capture important factors
relevant to the mobile application usability context. To address this issue, we developed and validated
several constructs that were unique to the mobile application context. For instance, the fingertip-size control
construct comprised several items that focused on the button size of mobile applications. Apple’s guidelines
suggest that fingertip-size controls should be considered for the usability of mobile applications because
users input data via touchscreen interfaces. Our results confirmed that the fingertip-size control
construct substantially contributed to the user interface input construct. Likewise, web-based applications
are commonly displayed on fixed computer screens. Therefore, it seems to be unnecessary for web-based
applications to be displayed in horizontal and vertical modes. In contrast, mobile applications are displayed
on smartphones and users often change the way they hold the smartphone when using a mobile
application. Due to this, application orientation becomes critical in this context and our studies confirmed
the importance of this construct. When
applying website usability instruments, such factors might remain undetected because the user interface
principles of web-based applications fundamentally differ from those of their mobile counterparts (e.g., web-based
applications are normally controlled using a mouse cursor rather than using one’s fingertips in combination
with touch screens). When benchmarking the MUG instrument against our instrument developed
specifically for the mobile application usability context, we found that our instrument explained more
variance in the outcome variables, thus highlighting the importance of context in theory development in the
IS and HCI research. As such, our work follows several calls for developing more precise and context-
specific theories (Alvesson and Kärreman 2007; Bamberger 2008; Brown et al. 2010; Hong et al. 2014;
Johns 2006; Van der Heijden 2004; Venkatesh et al. 2012) because there is “a general tendency to seek
causal explanations at lower rather than higher levels of analysis, a tactic referred to unflatteringly as reductionism" (Johns 2006).
Second, given that our conceptualization and scales are a more comprehensive and accurate
representation of mobile application usability than prior instruments, future research can build on them. Our
literature review indicated that researchers have often pursued a “pick-and-choose” strategy and combined
various theoretical constructs to measure mobile application usability. We believe that such a strategy is no
longer necessary because our instrument provides clarity in terms of the underlying constructs of the
overall usability of mobile applications. Instead of combining theoretically unrelated constructs to measure
mobile application usability, researchers could use our entire instrument or relevant parts (especially all or
some of the second-order constructs) to investigate specific aspects of mobile application usability. For
example, HCI research has investigated the concept of user interfaces to a great extent (Hess et al. 2009;
Hong et al. 2004c; Palmer 2003; Thong et al. 2002; Wells et al. 2011a; Wells et al. 2011b). Our
conceptualization includes constructs that could be used to study user interaction and interface design of
mobile applications, namely user interface input and user interface output. User interface input specifies how
easily users can enter data into a given mobile application. Four first-order constructs were identified to assess user
interface input, i.e., control obviousness, de-emphasis of user settings, effort minimization, and fingertip-
size controls. User interface output specifies how effectively a given mobile application presents
content. Three first-order constructs were identified to assess user interface output, i.e., concise
language, standardized user interface element, and user-centric terminology. Our results confirmed that
each first-order construct significantly contributed to the associated second-order constructs. Therefore,
HCI research could use our conceptualization and instrument to investigate how user interface design
(including user interface input and output mechanisms) performs in terms of relating to dependent variables
typically studied in HCI, such as response time and error rates. Our work also has implications that go
beyond HCI research and we believe a large audience will be interested in leveraging our instrument. For
instance, there is considerable interest in understanding the interplay between personality traits and HCI
(Devaraj et al. 2008; Junglas et al. 2008; McElroy et al. 2007). We think that our instrument is an ideal
candidate for exploring this relationship in more depth. Specifically, future research could leverage the
second-order constructs and investigate the interaction effects between mobile application usability,
personality traits (e.g., agreeableness, conscientiousness, extraversion, neuroticism, openness) and use of
mobile applications. Such a study could reveal findings that contribute to a variety of streams in IS research.
Third, brand loyalty is frequently used in the marketing literature to explain why consumers are
committed to a brand or a company (Brakus et al. 2009; Johnson et al. 2006; Melnyk et al. 2009; Wagner et
al. 2009). Traditionally, marketing research has paid less attention to techno-centric topics and studies
have explained brand loyalty as a function of affective commitment, perceived value, and brand equity
(e.g., Johnson et al. 2006). We adapted the concept of brand loyalty to mobile applications and explored
the nomological network of mobile application usability. We found empirical evidence that mobile
application loyalty was significantly influenced by usability. Our mobile application usability
conceptualization was a significant predictor of mobile application loyalty across all three
samples. Due to practical constraints in our work and our focus on mobile application usability, we did not
include other possible predictors of mobile application loyalty. Future research should certainly conduct
such an investigation with the goal of comparing the previously known predictors to a model based on
usability, followed by a study with an eye toward integrating this varied set of predictors into a cohesive
whole.
Fourth, we believe that our survey development exemplar contributes to measurement theory.
MacKenzie et al. (2011) provided comprehensive instrument validation guidelines for IS in particular and
behavioral and social sciences research in general. We applied their recommendations, and our experience can inform future
research aiming to develop instruments. Overall, we did not encounter major issues in following
MacKenzie et al. (2011). In some steps, we had to deviate from their recommendations due to practical
considerations. For example, during the content validity check, we initially followed their preferred approach, Hinkin and Tracey's (1999) content validity assessment. Given the feedback we obtained during the
face validity check, we decided to employ Anderson and Gerbing’s (1991) approach because we felt that
ranking all item-construct pairs would overburden raters. In examining studies that applied Hinkin and
Tracey's (1999) approach, we found it to be used almost exclusively when there were fewer than 50 items
and four to six construct definitions (e.g., Yao et al. 2007). Although we did not conduct a group comparison
(e.g., mobile social media application user versus non-user) because developing the theoretical bases for
expected differences was beyond the scope of this paper, we believe that other researchers will encounter
such practical constraints as well. Instead, we benchmarked our new constructs and scales against a
conceptualization and instrument based on Microsoft’s usability guidelines. Future work could also consider
this option. It is important to note that during all stages of the instrument development process, we asked
participants to provide us with feedback, and this was generally positive, thus adding to the credibility of our
scales, the validity of the overall procedure proposed by MacKenzie et al. (2011), and the changes we made to it.
Further, although MacKenzie et al. (2011) provide a step-by-step guide for validating instruments, the item generation process is not discussed in depth due to the
focus of their paper. Therefore, we believe that we complement the work of MacKenzie et al. (2011) in that
we exemplify how to leverage coding procedures during the item generation step. Future research aiming
to develop instruments could learn from our work and codify alternative information sources, such as other vendors' design guidelines or practitioner reports.
Finally, our mobile application usability conceptualization and instrument can be used as a
springboard for future research. As noted earlier related to step 10, researchers could replicate our work in
new contexts or test the stability of the scales over time (Ancona et al. 2001; Johns 2006). For example,
studies could investigate existing mobile applications besides social media applications, such as mobile
news, mobile marketing and mobile entertainment. Testing our conceptualization of mobile application
usability and associated instrument in a new context is critical in order to see if the conceptualization is
comparable in terms of predicting the outcomes in the context of interest. If our conceptualization predicts
the outcomes fairly well in alternative contexts, this would suggest that those contexts do not include
attributes that affect our conceptualization of mobile application usability (see Alvesson and Kärreman
2007). Future studies could also use our conceptualization and instrument in combination with other
theories, such as IS continuance theory (Bhattacherjee 2001), IS success model (DeLone and McLean
1992), unified theory of acceptance and use of technology (Venkatesh et al. 2003a) and task-technology fit
(Goodhue and Thompson 1995), to study why individuals use mobile applications. Such theoretically
motivated studies could also explain mobile application use in mobile commerce environments and inter-
and intra-organizational information sharing facilitated by mobile information applications (Kohli 2007). Our
work can also serve as a starting point for design science studies in the context of consumer adoption and use
of mobile applications. For example, it would be interesting to investigate how users differentiate usability of
mobile applications on different mobile device formats including iPhones and iPads (or alternative brands,
such as Android-based smartphones and tablets). Future studies could also extend our mobile
application usability conceptualization and explore if mobile device usability conceptually differs from
mobile application usability. The findings would be particularly useful for organizations offering mobile
applications on various mobile devices. Further, much practitioner-based literature has recognized that
there is a push toward touch-based applications for new laptop generations running Windows 8 (Ovide
2013). For example, the Wall Street Journal recently reported that over 100,000 applications would soon be
available for Microsoft's new Windows 8 operating system that users can operate via touch-based
interfaces (Ovide 2013). Recent reports suggested that users are disappointed with the usability of the
operating system. For instance, the Nielsen Norman Group (2014) evaluated the usability of Windows 8
and reported several usability problems from a user perspective, including cognitive overload triggered by
the double desktops and error-prone gestures. Given this, it would be interesting to measure how individuals
rate such applications in terms of the usability criteria we identified (e.g., fingertip-sized controls). Finally, it
is worthwhile to note that our sampling strategy focused exclusively on mobile application users who had
prior experience with mobile social media applications. Future studies should examine the impact of user experience on the relative importance of the usability constructs we identified.
Limitations
One limitation concerns the self-reported measure of continued intention to use. As noted by
Burton-Jones and Straub (2006), technology use has been conceptualized and measured variously
including as breadth of use (e.g., Saga and Zmud 1994), variety of use (e.g., Igbaria et al. 1997), and extent
of use (e.g., Venkatesh and Davis 2000). The same principles apply to studies using continued intention to
use as a proxy for individuals’ ongoing technology use behavior. By adapting previously validated scales
from Bhattacherjee (2001) and Venkatesh and Goyal (2010), we purposefully focused on the extent of their
use. Both papers have been cited in excess of 1500 times, thereby underscoring their impact. By aligning
our conceptualization of the continued intention to use construct with these papers, we expect that our
findings will be comparable to prior work. Nonetheless, given the variety of
conceptualizations of technology use in IS research (see Burton-Jones and Straub 2006; Sykes and
Venkatesh forthcoming), we note that the generalizability of our findings is constrained by the limitations of self-reported intention measures.
The items used in our work did not focus on structural elements of use (see Burton-Jones
and Straub 2006), as we did not ask users about the particular tasks they perform on mobile social media
applications. One of our major goals was to explain why individuals continue to use mobile social media
applications in general. We suggest that future studies adapt our scales to specific tasks that users perform
on mobile applications, such as posting news on mobile social media applications,
chatting with friends on mobile instant messaging applications, or transferring money on mobile banking
applications. A potential starting point for such studies could be an augmented task-technology fit model
(Goodhue and Thompson 1995) that evaluates the fit between the tasks (e.g., chatting with friends) that users
perform on mobile applications and the technological artifacts (e.g., mobile social media applications) used to
perform the task. It is also important to acknowledge that smartphones and associated applications
continue to develop and user interface design will adapt to the technological progress. As a consequence
of these developments, mobile application vendors, including Apple, will likely release updated versions of
their user interface guidelines. Future research on mobile application usability should monitor these developments and adapt the instrument accordingly.
Practical Implications
Our findings have important implications for practitioners because we identified critical usability
elements in the context of mobile applications. Our instrument could be used by practitioners to study to-be
developed, to-be implemented, and existing mobile applications. For example, companies could use our
mobile application usability conceptualization and associated instrument in all phases of the system
development lifecycle.
During the investigation phase, when practitioners determine the scope of the application and
conduct a feasibility analysis, our coding matrix could be used as a checklist for the required usability
elements with which a given mobile application needs to conform. Our codes are a comprehensive
reflection of Apple’s guidelines and the coding matrix could be particularly useful in situations where
multiple team members discuss the mobile application scope and conduct a feasibility analysis. For
example, the codes derived for the concise language first-order construct emphasize precise application
feature descriptions. The mobile application development team might decide to consult editors in order to
ensure that the text written for the application is precise and adequate. Other codes, including subtle
animation or aesthetic graphics, indicate that it might be useful to consult with graphic designers regarding
the integration of these elements. Many other usability elements could be discussed with system
developers (e.g., interface structure, control obviousness, fingertip-size controls) as they will be able to
estimate the time and costs associated with developing the application.
During the analysis and design phase, our instrument could be used to acquire information about
the most desirable features of a to-be developed mobile application. If designing mobile applications from
scratch, it might be useful to adapt our scales and survey potential users regarding their expectations
related to the most important usability elements of the mobile application. Another way to use our
instrument will be to design a prototype and invite potential users to test it. Once potential users have
tested the prototype, our instrument could be used to obtain feedback from them. Given the iterative nature
of the analysis and design phase, the feedback obtained could be incorporated into the next application design iteration.
Given that the application implementation phase is mostly technical in nature, our instrument may
be less useful there, but we believe it will be particularly valuable during the application
maintenance phase. For example, user
complaints could be checked against our coding matrix. If the complaints deviate from the
usability requirements listed in the matrix, practitioners could use the comprehensive usability guidelines
offered by vendors and develop strategies to overcome these problems. Bug reports might be an
alternative source for identifying potential weaknesses and they could also be benchmarked against the
codes identified in our work. During the review stage of the application development lifecycle, our
comprehensive mobile application usability instrument could be leveraged for detailed usability testing in
laboratory research environments or field studies. The findings would reveal how developers can fine-tune
their applications in terms of continuous improvement. The findings could then be used to evaluate the
usability of existing mobile applications, and designers could take actions based on the criteria identified in our work. For instance, low scores on first-order constructs associated
with important second-order constructs can guide practitioners on how they can improve their current
mobile applications.
CONCLUSIONS
Due to the widespread diffusion of mobile devices, consumers expect user-friendly and well-
designed mobile applications from service providers in various industries. So far, little systematic help has
been offered for evaluating existing mobile applications or designing new ones. Our work
developed a conceptualization and associated survey instrument based on Apple’s general user
experience guidelines that can aid such an endeavor. Our work also serves as an exemplar that uses the
procedure proposed by MacKenzie et al. (2011), albeit with some modifications that we found to be
necessary. The conceptualization and instrument of mobile application usability is an important contribution
for IS and HCI research because it helps theory development in various research areas, such as mobile
technology adoption research, mobile user interface evaluation and mobile application development.