Strengthened Readings of Scalars

A Closer Look at Strengthened Readings of Scalars

Mandy Simons1 & Tessa Warren2

1Carnegie Mellon University, Department of Philosophy

2University of Pittsburgh, Learning Research and Development Center and Department of Psychology

Abstract

The majority of the extensive experimental and theoretical literature on scalar strengthening assumes that the phenomenon is uniform across all types of scalars. The experiment reported here contributes to the growing evidence against scalar uniformity, while also exploring van Tiel et al.'s (2014) suggestion that boundedness plays a role in the observed variation. The experiment uses a novel approach to probing the interpretation of scalars, and also investigates the content of strengthened interpretations.

Key words: language comprehension, scalar implicatures, experimental pragmatics, sentence processing

Introduction

Scalar terms include words such as some, cool, and like (v), which partition a semantic scale. It has been argued that the semantics of scalar terms places a lower bound on their interpretation (e.g. if Jane merely likes donuts, this falls below the lower bound of love) but not an upper bound (e.g. if Jane loves donuts then we can still say that she likes them; Horn, 1972, 1989). Nonetheless, sentences containing scalar terms are often taken to imply an upper bound, giving rise to a strengthened interpretation. Examples are given in (1)-(3) below.

(1) Some of John’s children are at home.
Strengthened interpretation: Some of John’s children are at home, and at least one of his children is not at home.

(2) It was a cool day.
Strengthened interpretation: The temperature was in the range characterizable as “cool” but not in the range characterizable as “cold”.

(3) Jane likes ice cream.
Strengthened interpretation: Jane has positive feelings about ice cream, but not feelings that can be characterized by “love”.

Standard accounts of scalar strengthening (e.g. Horn, 1972; Gazdar, 1979; Geurts, 2010) all depend on the following idea derived from Grice (1967): speakers are expected to be as informative as they can, consistent with current conversational goals and their current information. The assumption that a speaker has been as informative as she can leads to the inference that certain stronger alternatives to the content she has expressed are considered, by her, not to be assertible. In the simplest case, where the speaker is assumed to be fully informed, this is taken to be because the speaker does not believe the stronger alternatives to be true. More recent accounts (e.g. Fox, 2007; Chierchia, Fox, & Spector, 2012) grammaticize this kind of reasoning, but are still driven by considerations of informativity.

A central debate about scalar strengthening in the current literature is whether it involves ad hoc pragmatic reasoning on the part of the interpreter, or is a relatively “shallow” phenomenon, involving lexically-given alternatives to specific scalar forms, possibly encapsulated in the grammar. One question that bears on this debate is the extent to which scalar inferences are variable. If scalar inference is pragmatic, we would expect it to be sensitive both to features of context and to lexical pragmatic differences among scalar terms. On the other hand, if scalar inference is simply dependent on the availability of lexical alternatives, we might expect more homogeneity.

Two types of contextual variability have been widely acknowledged. First, context influences whether or not a particular occurrence (token) of a scalar will be strengthened: scalar effects are pragmatically canceled in some environments. Second, the grainedness of strengthening is contextually variable. For example, if we care only about whether or not all students passed a test, an utterance of Some students passed will trigger the implication “not all,” but perhaps not the implication “not most”. However, if we care about exactly how many passed, the same utterance might carry the “not most” implication.

The current paper investigates a different kind of variability: the extent to which strengthening varies across scalar types, for example, whether the term cool is strengthened to “not cold” at a different rate from the rate at which possibly is strengthened to “not certainly”. This question is orthogonal to contextual cancelation of scalar effects and to contextual modification of grainedness, both of which, according to standard views, should affect all scalars equally.

Current theories of scalar strengthening do not predict variability across types. Perhaps for this reason, the experimental literature on scalars has been dominated by the assumption that all scalar items behave in the same way, and therefore that findings pertaining to any scalar item can safely be generalized to all scalars (cf. Doran, Baker, McNabb, Larson, & Ward, 2009; van Tiel, van Miltenburg, Zevakhina, & Geurts, 2014). Consistent with this, the experimental literature on scalar implicatures is almost entirely based on only two scalar exemplars: the quantifiers (typically some) and the connective or (van Tiel et al., 2014).

Another prediction of current theories of scalar strengthening is that strengthened interpretations rule out both maximal values on the relevant semantic scale and (contextually available) stronger but non-maximal values. Most existing experimental work on scalars investigates whether or not sentences containing weak scalars are understood to be consistent with the top of the associated semantic scale (e.g., subjects are asked questions like “Jane says I ate some of the chips; does she mean that she did not eat all of the chips?”). But an additional question is whether, and to what extent, weak scalars are understood to be consistent with stronger but non-maximal values on the scale (i.e., if some implies “not all,” does it equally imply “not most”?). The current study contributes to the very small body of work investigating this question (e.g., Zevakhina, 2012) and the related question of whether non-minimal scalars trigger strengthening at the same rates as minimal ones (e.g., Beltrama & Xiang, 2012).

The aims of the current experiment are to: (1) investigate whether there is variation among scalar items in the degree to which unembedded occurrences (i.e. main clause occurrences not in the scope of any operator) of these items give rise to scalar inferences, and (2) explore the content of strengthened interpretations.

Two previous papers have reported experiments investigating the homogeneity assumption. Doran et al. (2009) tested a range of scalars using a task in which participants were instructed to take the perspective of a character named Literal Lucy, who always interpreted statements literally, and from that perspective to judge the truth value of a statement made by a second character. Doran et al. found considerable variation in strengthening across the set of scalars they tested. However, the complexity of Doran et al.’s perspective-taking task raises concerns about their findings, as does the fact that the judgments they gathered were their participants’ guesses about the interpretations of a third party.

Van Tiel et al. (2014) tested a wide range of scalars in two questionnaires. Participants made inferences about a speaker’s mental model based on his or her use of one of two contrasting scalars, as in the following example:

(4) John says: “She is intelligent.”
Would you conclude from this that, according to John, she is brilliant?

Participants answered yes or no. In a second version of the survey, the statements to be judged had more specific predicates and full noun phrases instead of pronouns. Both surveys showed similar variation in how likely different scalars were to be strengthened. Further experiments investigated what factors might account for the observed variation, and found two properties that made a significant contribution: semantic distance and boundedness. The current experiment follows up on van Tiel et al., using a different method and a different conception of scalar inference.

Van Tiel et al. adopted an approach consistent with much of the current literature, ultimately informed by Horn (1972) and Gazdar (1979), according to which scalar inference is driven by the interpreter’s knowledge of a lexical scale (Horn scale) associated with a given scalar item. A lexical scale is an ordered n-tuple of lexical items standing in asymmetrical entailment relations. The lexical scales that would be invoked to explain the inferences in (1)-(3) above are shown in (5)-(7) below:

(5) <all, most, many, a few, some>

(6) <frigid, cold, cool>

(7) <love, like>

Gazdar (1979) takes the elements in these scales to be semantic representations and not expressions of English. He argues that this position is required for consistency with the underlying Gricean idea, because “to read off [scalar implicatures] from the actual lexical items given in the surface structure would be tantamount to treating them as conventional implicatures” (p. 56).

This view of scales has, though, largely disappeared from the literature, with the result that scalar inferences are typically seen as inferences involving formally determinable alternative sentences (e.g. Katzir, 2007). On this view, utterance of a sentence S containing a weak scalar item implies the negation of sentences derivable from S by replacing the weak scalar item with logically stronger ones (e.g. by replacing cool with cold).
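The alternative-based view just described can be made concrete with a small sketch. The Python below is purely illustrative of that view, which the authors do not adopt (their own position is stated in the next paragraph). It assumes the items of the lexical scales in (5)-(7), each listed from strongest to weakest, and treats sentences as strings with a slot for the scalar, ignoring syntax and any real entailment checking. The names stronger_alternatives and strengthened_reading are hypothetical.

```python
# Illustrative lexical (Horn) scales, each listed from strongest to weakest.
HORN_SCALES = [
    ("all", "most", "many", "a few", "some"),
    ("frigid", "cold", "cool"),
    ("love", "like"),
]


def stronger_alternatives(scalar):
    """Return the scale-mates that precede (are stronger than) `scalar`."""
    for scale in HORN_SCALES:
        if scalar in scale:
            return list(scale[: scale.index(scalar)])
    return []  # not on any listed scale: no alternatives, so no strengthening


def strengthened_reading(template, scalar):
    """Return the asserted sentence followed by the negated stronger alternatives.

    `template` is a sentence with a '{}' slot for the scalar (hypothetical format).
    """
    asserted = template.format(scalar)
    negated = [f"not ({template.format(alt)})" for alt in stronger_alternatives(scalar)]
    return [asserted] + negated


print(strengthened_reading("It was a {} day.", "cool"))
# ['It was a cool day.', 'not (It was a frigid day.)', 'not (It was a cold day.)']
```

On this purely formal view, the strengthened content is fixed entirely by which words share a scale with the uttered item, which is exactly the assumption the present study questions.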

In contrast, we take scalar inference to be driven by reasoning about the underlying semantic scale associated with scalar expressions: e.g. quantitative/proportional scales associated with the quantifiers; scales of temperature associated with temperature expressions; etc. We therefore take scalar strengthening to be an inference that guides the interpreter’s construction of a mental model of the content expressed by the speaker. We view lexical scales merely as realizations of the underlying semantic scale over which reasoning takes place.

We nonetheless recognize the importance of the expressions in these associated scales as vehicles for increasing the salience of alternative portions of the underlying scale, and it is for this reason that our experiment was explicitly designed to avoid the use of scale-mates in probing participants’ interpretations of scalars. The design was motivated by Geurts and Pouscoulous (2009) (see also Pouscoulous, 2006; Geurts, 2009; Doran et al., 2009), who found that experimental designs in which subjects are presented with the scalar alternatives of a target scalar result in increased reports of strengthened interpretations. Van Tiel et al. (2014) used such a design and justified it on the grounds that a general heightening of scalar effects should not affect the relative frequency of strengthening across scalars. However, it is unknown whether this heightening effect is uniform across scalars. One contribution of the current paper is to offer a new method for investigating scalar implicatures without presenting scalar alternatives.

The current study was designed to probe maximally natural scalar interpretation. To this end, weak scalars were embedded in the relatively rich context of paragraphs of 3-6 sentences. Participants read the paragraphs and after each one judged whether each of seven sentences was consistent or inconsistent with the paragraph. To minimize the likelihood that participants would focus on scalar interpretation, four of these sentences were fillers that did not involve scalars. The remaining sentences probed scalar interpretation in a novel way, avoiding the use of explicit scalar alternatives. Instead, these probe sentences contained descriptions of events or states of affairs consistent with different readings of the scalar sentences from the passage. For example, one passage contained the sentence She noticed that many of her pencils were chewed on. To test whether many is judged consistent with all without offering all as an explicit alternative, we asked participants to judge whether the sentence 100% of her pencils were chewed on was consistent with the passage. (Note that the phrase “100%” is not itself a scale mate of the term many on standard views: first, scale mates must be equally lexicalized, and many and “100%” are not; second, they belong to different syntactic categories: many is a determiner, whereas “100%” is not.)

Although our method avoids explicit comparisons between scale mates, it may encourage semantic/pragmatic reasoning about the meaning of the scalar term in its context. We take this to be a benefit. Our goal is, in part, to investigate whether the standard mechanistic approaches to scalar inferences – taking the scalar alternatives to be linguistically given and the process of strengthening to be automatically triggered – are consistent with ordinary interpretation, which happens in many different contexts. If strengthening is affected (in ways going beyond granularity) by contextually induced reasoning, this is important to take into account in our models of the process.

Experiment

Method

Participants

Forty-three native American-English speaking undergraduates from Carnegie Mellon University received $8 each for completing the questionnaire.

Design and Stimuli

The experiment had a 9 × 3 within-subjects (repeated-measures) design. The first factor was the scalar. We tested eight scalar words, each associated with a non-maximal point on an underlying scale: cool, warm, good, like, many, some, possible, and think. Possible was tested twice, giving nine levels of this factor: once for a strengthening that excluded the top of its scale (equivalent to the meaning of possible but not certain) and once for a strengthening that excluded a higher but non-maximal point on its scale (roughly the meaning of possible but not probable).

The target scalars were divided into three triads: {cool, good, possible1 (not certain)}, {many, possible2 (not probable), think}, and {some, like, warm}. For each triad, we constructed twelve naturalistic paragraphs of 3-6 sentences each, and included one instance of each scalar of the triad in each paragraph. Every paragraph was about a different situation or scenario, and the scalars were used with a variety of argument types. For example, the verb like appeared with direct objects denoting activities, food, other people, etc. This variety allowed us to sample strengthening across a range of naturalistic uses.

The second factor was the relation between the content of the probe sentence and the semantic scale underlying the target scalar. For each scalar, we devised ways to invoke a region of the underlying semantic scale clearly consistent with the bounded (strengthened) reading of the scalar, and parallel ways to invoke a region of the scale above that bound. Probe sentences were designed to test in two different ways whether the target scalar was judged compatible with this higher region of the scale. Unstrengthened-Compatible (UNSTR) sentences described the relevant feature of the passage as lying above the expected bound, i.e. they invoked a point or region at or close to the top of the underlying semantic scale. These sentences would be judged consistent with the passage only if the scalar item were given an unstrengthened interpretation. Crucially, the Unstrengthened-Compatible prompts were simple, non-modal statements (e.g. There was a 100% chance that Sally would run into Steven at the pool). Sentences in the Range (RNG) condition made reference to a range of values on the underlying semantic scale, including both points clearly consistent with the strengthened reading and points lying above the upper bound induced by strengthening. The Range sentences always included the words anywhere between modifying the given range, so as to elicit a response of “compatible with passage” only if all values in the identified range were compatible (e.g. The temperature was anywhere between 32 and 60 degrees Fahrenheit, or The chance of rain was anywhere between 30% and 100%). Because the Range sentences included the values from the Unstrengthened-Compatible condition, we expected these two conditions to pattern together.

In the third condition, Strengthened-Compatible (STR), sentences described the relevant feature of the passage as lying at a point or within a small range of values clearly within the strengthened interpretation of the scalar. These sentences were expected to be judged consistent with the passage on any reading of the target scalar, since anything compatible with the strengthened reading is also compatible with the unstrengthened reading. See Table 1 for an example item from the cool, good, possible/certain triad. (The complete materials are available at http://www.cmu.edu/dietrich/philosophy/people/faculty/core-faculty/simons.html.)
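The predicted response pattern for the three probe conditions can be sketched by treating each reading of a scalar as an interval on its underlying scale, and a probe as consistent only if every value it allows falls inside that interval (the logic that the anywhere between wording is meant to enforce). In this hypothetical Python sketch, the numeric bounds for many on a 0-100% scale and the probe values are invented for illustration; they are not taken from the experimental materials.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A closed region [low, high] of an underlying semantic scale."""
    low: float
    high: float

    def contains(self, other: "Interval") -> bool:
        # Consistent only if every value the probe allows is allowed by the reading.
        return self.low <= other.low and other.high <= self.high


# Hypothetical readings of "many" on a 0-100% scale; the bounds are invented.
UNSTRENGTHENED = Interval(40, 100)  # lower bound only: the top of the scale is included
STRENGTHENED = Interval(40, 90)     # strengthening adds an upper bound below 100%

# Hypothetical probes, one per condition.
PROBES = {
    "STR": Interval(60, 60),      # a point clearly within the strengthened reading
    "UNSTR": Interval(100, 100),  # e.g. "100% of her pencils were chewed on"
    "RNG": Interval(50, 100),     # e.g. "anywhere between 50% and 100%"
}


def predicted_judgments(reading):
    """Map each condition to True ("consistent with passage") or False."""
    return {condition: reading.contains(probe) for condition, probe in PROBES.items()}


print(predicted_judgments(STRENGTHENED))    # {'STR': True, 'UNSTR': False, 'RNG': False}
print(predicted_judgments(UNSTRENGTHENED))  # {'STR': True, 'UNSTR': True, 'RNG': True}
```

Under these assumptions, a reader who strengthens accepts only the STR probe, while a reader who does not strengthen accepts all three; this is why UNSTR and RNG responses are expected to pattern together.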

** TABLE 1 ABOUT HERE **

Each of the 36 experimental paragraphs was followed by an Unstrengthened-Compatible sentence for one scalar, a Strengthened-Compatible sentence for a different scalar, and a Range sentence for the remaining scalar, as well as four filler sentences. Three counterbalancing lists were created such that the assignment of Unstrengthened-Compatible, Strengthened-Compatible, and Range sentences rotated in a Latin Square design across the three scalars within each paragraph. The order in which scalars appeared in a paragraph was varied. Paragraph presentation order was randomized, as was the order of the statements following each passage. Filler sentences varied in their consistency with the paragraph: approximately half were consistent and half inconsistent. Some fillers were easy to judge as consistent or inconsistent with the paragraph, others less so.
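One way to implement such a Latin Square rotation is sketched below in Python. Each scalar's condition is obtained by cyclically shifting the three conditions by the list number. The additional shift by paragraph number is an assumption not stated above, added so that within each list every scalar occurs equally often in every condition across a triad's twelve paragraphs. The function names assign_conditions and condition_counts are hypothetical.

```python
from collections import Counter

CONDITIONS = ("UNSTR", "STR", "RNG")


def assign_conditions(scalars, list_index, paragraph_index):
    """Return {scalar: condition} for one paragraph in one counterbalancing list.

    Assumption: the cyclic shift depends on both the list and the paragraph index.
    """
    n = len(CONDITIONS)
    return {
        scalar: CONDITIONS[(position + list_index + paragraph_index) % n]
        for position, scalar in enumerate(scalars)
    }


def condition_counts(scalars, list_index, n_paragraphs=12):
    """Count how often each (scalar, condition) pair occurs within one list."""
    counts = Counter()
    for paragraph_index in range(n_paragraphs):
        assignment = assign_conditions(scalars, list_index, paragraph_index)
        for scalar, condition in assignment.items():
            counts[(scalar, condition)] += 1
    return counts


triad = ("cool", "good", "possible1")
print(assign_conditions(triad, 0, 0))  # {'cool': 'UNSTR', 'good': 'STR', 'possible1': 'RNG'}
print(condition_counts(triad, 0))      # every (scalar, condition) pair occurs 4 times
```

Every paragraph receives one probe of each type, and across the three lists a given scalar in a given paragraph appears once in each condition.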

Apparatus

The questionnaire used Qualtrics software and was administered via the web.

Procedure

Instructions read as follows: for each statement following the passage, “decide whether or not the statement is consistent with what is, for you, the most natural way of understanding the passage” (bold in original). There followed one example passage, example statements, and example consistency judgments with explanations. Each subsequent questionnaire page had a passage followed by seven statements. Participants judged the consistency of each statement with the passage by pressing either an icon labeled “consistent with passage”, or one labeled “not consistent with passage.” After responding to all statements, participants clicked a “continue” button to move to the next passage. Participants could only advance if every statement had been responded to, and after moving on, it was not possible to return to an earlier passage. Participants were encouraged to complete the questionnaire in one sitting; however, after accessing the questionnaire they were able to save it and complete it later (but not to return to questions that had already been completed).