The New York War Crimes | “All the Consent That’s Fit to Manufacture”

NYT VS. Palestine

Computational Analysis Methodology and Data

March 14, 2024

Article Bank

Two methods were used to gather URLs of relevant news articles from The New York Times (NYT). First, a keyword search of the New York Times API for “Gaza,” “Israel” and “Palestine,” limited to a maximum of 50 articles per day, yielded 2,518 articles published between 7 Oct 2023 and 22 Feb 2024 (inclusive). We excluded every type of NYT content other than news articles. Removing URLs pointing to opinion, video, live, interactive and other non-news content reduced the number of articles to 1,680.
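The content-type filter described above can be sketched as a check on each URL's path. This is only an illustration: the set of excluded path segments and the function names are assumptions, and the "other" content types that were also removed are not covered.

```python
from urllib.parse import urlparse

# Hypothetical set of path segments that mark non-news content.
EXCLUDED_SEGMENTS = {"opinion", "video", "live", "interactive"}


def is_news_article(url: str) -> bool:
    """Return True if no path segment marks the URL as excluded content."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return not any(s in EXCLUDED_SEGMENTS for s in segments)


def filter_news_urls(urls):
    """Keep only the URLs that pass the news-article check, in order."""
    return [u for u in urls if is_news_article(u)]
```

Matching whole path segments rather than substrings avoids dropping a news article whose headline slug happens to contain a word such as "live."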

Second, similar keyword searches on the New York Times’ search page, with the same filtering process applied, yielded an additional 535 articles, bringing the total number of articles under analysis to 2,215. These URLs are listed here.

We then downloaded the full text of each of these URLs. When possible, the Internet Archive’s Wayback Machine was used to download the oldest “snapshot” of the article; otherwise, the text was downloaded from the NYT website. All downloading was done with a Selenium-automated web browser. Due to copyright restrictions, we will not be releasing the full text of these articles.
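One way to implement the "snapshot first, live site otherwise" rule is through the Wayback Machine's public availability endpoint, which returns the snapshot closest to a requested timestamp. The sketch below is an assumption about how such a lookup could work, not the method used for this analysis: it requests a very early timestamp as a proxy for the oldest snapshot, and it only builds the query and interprets the JSON reply, leaving the HTTP request and the browser automation out.

```python
import json
from urllib.parse import urlencode

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(article_url: str, timestamp: str = "19960101") -> str:
    """Build a query asking for the snapshot closest to an early timestamp."""
    params = urlencode({"url": article_url, "timestamp": timestamp})
    return f"{AVAILABILITY_ENDPOINT}?{params}"


def snapshot_or_original(article_url: str, response_text: str) -> str:
    """Return the snapshot URL from the reply, else fall back to the live URL."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return article_url
```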

Finally, we filtered the remaining articles, keeping only those in which an NYT-assigned keyword, “Israel-Gaza War (2023– ),” appeared among the three most relevant keywords for the article. This process yielded a more focused set of 1,003 articles used in Charts 2 and 4.
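The top-3 keyword filter can be sketched as follows, assuming each article record carries a list of keyword entries with a "value" and a numeric "rank" (1 being the most relevant). That record layout and the function name are assumptions made for illustration.

```python
TARGET_KEYWORD = "Israel-Gaza War (2023– )"


def has_top_keyword(article: dict, target: str = TARGET_KEYWORD, top_n: int = 3) -> bool:
    """Return True if `target` is among the article's top_n ranked keywords."""
    # Sort by rank so the check does not depend on the order of the list.
    keywords = sorted(article.get("keywords", []), key=lambda k: k["rank"])
    return any(k["value"] == target for k in keywords[:top_n])
```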

Keyword-based Methods

Charts 1 and 3 are based on simple keyword counting in the full text of all 2,215 articles in the article bank. For Chart 1, article texts were scanned for the presence of the words “antisemitic” or “antisemitism” and “Islamophobic” or “Islamophobia.” The numbers of articles containing both the former and the latter, the former but not the latter, and the latter but not the former are shown in the chart. Chart 3 counts the number of mentions of “Israel,” “Israeli” or “Israelis”; “Palestine,” “Palestinian” or “Palestinians”; “Gaza,” “Gazan” or “Gazans”; and “Hamas,” yielding 1,677 articles mentioning at least one of these words. These counts were then aggregated (summed) across the 21 weeks from October 7th to February 22nd. The data underlying Chart 3 can be found here.
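The two keyword counts can be sketched with regular expressions. The patterns, the category labels and the function names below are illustrative assumptions; in particular, matching here is case-insensitive for Chart 1 and whole-word for both charts, which may differ from the matching actually used.

```python
import re
from collections import Counter

ANTISEMITISM = re.compile(r"\bantisemiti(?:c|sm)\b", re.IGNORECASE)
ISLAMOPHOBIA = re.compile(r"\bIslamophobi(?:c|a)\b", re.IGNORECASE)

# One pattern per group of word forms counted for Chart 3.
TERM_GROUPS = {
    "Israel": re.compile(r"\bIsrael(?:is|i)?\b"),
    "Palestine": re.compile(r"\bPalestin(?:e|ians|ian)\b"),
    "Gaza": re.compile(r"\bGaza(?:ns|n)?\b"),
    "Hamas": re.compile(r"\bHamas\b"),
}


def chart1_category(text: str) -> str:
    """Classify an article by which of the two term families it contains."""
    has_a = bool(ANTISEMITISM.search(text))
    has_i = bool(ISLAMOPHOBIA.search(text))
    if has_a and has_i:
        return "both"
    if has_a:
        return "antisemitism only"
    if has_i:
        return "Islamophobia only"
    return "neither"


def chart3_counts(text: str) -> Counter:
    """Count mentions of each term group in one article's text."""
    return Counter({name: len(p.findall(text)) for name, p in TERM_GROUPS.items()})
```

Because the per-article results are Counter objects, weekly totals can be obtained by summing the counters of all articles published in the same week.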

LLM-based Methods

Charts 2 and 4 were made possible by a mixture of human and automatic annotation of the people mentioned in the articles. First, each of the 1,003 articles in the smaller, filtered article bank was passed to several commercially available Large Language Models (LLMs): Google Gemini, OpenAI gpt-3.5-turbo, and OpenAI gpt-4-turbo-preview. The LLMs were prompted to output JSON data detailing the names of people and organizations mentioned in the article; their nationality (if inferable from the article); whether or not they were directly quoted; and whether or not they represent the Israeli government or military in an official capacity. The full prompt used is here.

To validate the output, 50 articles, along with 274 named people and organizations identified in them by the LLMs, were passed to human readers to annotate. Readers were asked to indicate whether the LLM had responded correctly in the case of each person or organization, and if not, to correct the output.

Inter-Annotator Agreement (IAA) was then assessed by comparing the annotations given by human readers and the LLMs. Cohen’s kappa is a common statistic used to assess IAA, with scores above 0.8 generally indicating “almost perfect” agreement. When comparing the human-annotated nationalities with those annotated by LLMs, Cohen’s kappa scores averaged 0.89 (± 0.09) across LLM models, with annotations agreeing 91% (± 7%) of the time. In assessing whether the source officially represents the Israeli government or military, 100% agreement was reached.

When assessing IAA between the LLMs themselves, Cohen’s kappa scores averaged 0.92 (± 0.05) for nationality annotations, which agreed 94% (± 4%) of the time; and 0.95 (± 0.04) for Israeli official status annotations, which agreed 98% (± 1.6%) of the time. Unfortunately, whether the source was directly quoted was not annotated by readers; however, IAA between LLMs yielded an average Cohen’s kappa of 0.5 (± 0.17), indicating “moderate agreement,” with an average 79% (± 7%) agreement.
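For readers unfamiliar with the statistic, Cohen’s kappa compares the observed agreement between two annotators with the agreement expected by chance given each annotator’s label frequencies. A minimal sketch of the standard two-annotator formula follows; the function name and the example labels are illustrative, and this is not the code used to produce the figures above.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b) -> float:
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Share of items on which the two annotators gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)
```

With four items on which the annotators agree three times, as in a human list of two “Israeli” and two “Palestinian” labels against a model list of three “Israeli” and one “Palestinian,” raw agreement is 75% but kappa is only 0.5, which shows why the two figures are reported side by side.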

Further validation data can be found here.

Additional LLM prompts were subsequently made to determine whether the named source was an official representative of the United States government or military, or a representative of any Palestinian government or military. The prompts used for those annotations are here and here, respectively.

For Chart 2, the total number of direct quotes was aggregated across six pairs of nationality and official status: American officials and non-officials, Israeli officials and non-officials, and Palestinian officials and non-officials. The data underlying the chart can be found here.
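The aggregation into six groups can be sketched as a tally over annotated source records. The field names ("nationality," "is_official," "directly_quoted") are hypothetical stand-ins for the LLM output, whose exact schema is given only in the linked prompt, and the sketch counts one quote per directly quoted source mention.

```python
from collections import Counter

NATIONALITIES = ("American", "Israeli", "Palestinian")


def count_direct_quotes(sources) -> Counter:
    """Tally directly quoted sources by (nationality, official status)."""
    counts = Counter()
    for source in sources:
        if source["nationality"] in NATIONALITIES and source["directly_quoted"]:
            status = "official" if source["is_official"] else "non-official"
            counts[(source["nationality"], status)] += 1
    return counts
```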

NLP-based Methods

For Chart 4, an additional processing step was applied to sentences in which Israeli and Palestinian sources appeared. We used the spaCy natural language processing (NLP) library to parse each sentence for every word’s part of speech and grammatical role. With this word-level grammatical data linked to the annotated source name data mentioned above, we determined the frequency with which Israeli and Palestinian people and organizations governed active (e.g. “killed”) vs. passive (e.g. “was killed”) verbs in the sentences in which they appeared as a grammatical subject. We also noted which specific passive and active verbs appeared most frequently.

This process revealed that Israeli people and organizations, when the subject of a verb, governed 95.1% active verbs and 4.9% passive verbs, whereas Palestinian people and organizations governed 91.8% active and 8.2% passive verbs. Consequently, Chart 4 shows that the likelihood of Palestinian sources governing a passive verb is ~1.7x (8.2 / 4.9) that of Israeli ones.
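The active/passive tally and the resulting ratio can be sketched as below. The sketch assumes the parse has already been reduced to (group, dependency label, verb) triples, and that passive and active subjects carry the labels "nsubjpass" and "nsubj," as in spaCy's English models; the triple format and the function names are assumptions, and spaCy itself is not called here.

```python
from collections import Counter

# Dependency labels assumed for active and passive grammatical subjects.
ACTIVE_DEP, PASSIVE_DEP = "nsubj", "nsubjpass"


def voice_shares(subjects) -> dict:
    """Compute active and passive shares per group.

    `subjects` is an iterable of (group, dep_label, verb) triples, one per
    occurrence of an annotated source as the subject of a verb.
    """
    counts = {}
    for group, dep, _verb in subjects:
        if dep in (ACTIVE_DEP, PASSIVE_DEP):
            counts.setdefault(group, Counter())[dep] += 1
    shares = {}
    for group, c in counts.items():
        total = c[ACTIVE_DEP] + c[PASSIVE_DEP]
        shares[group] = {
            "active": c[ACTIVE_DEP] / total,
            "passive": c[PASSIVE_DEP] / total,
        }
    return shares


def passive_ratio(shares: dict, group_a: str, group_b: str) -> float:
    """How many times likelier group_a is than group_b to govern a passive verb."""
    return shares[group_a]["passive"] / shares[group_b]["passive"]
```

Applied to the reported shares, the same division gives 8.2 / 4.9 ≈ 1.67, which rounds to the ~1.7x figure.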

The words shown in Chart 4 are the most frequent passive and active verbs, with the exception of the word “said,” the most common active verb for both groups, which was removed as uninformative. The data for the chart can be found here.