R Meetup Day 4


Fourth session of the R meetup for Digital Humanists
American University of Beirut

I had a specific task this week. I was preparing data scraped from a library catalog, a sample of which appears below, and I wanted to extract every line beginning with “Published/Created” and make a list of them:

===================================================
Permalink: http://lccn.loc.gov/60057948
Personal name: Belot, J. B. (Jean Baptiste), 1822-1904.
Main title: Dictionnaire français-arabe.
Published/Created: Beyrouth, Impr. catholique, 1890.
Description: v. 22 cm.
CALL NUMBER: PJ6645.F6 B394 1890 Copy 1
Request in: Jefferson or Adams Building Reading Rooms
===================================================
Permalink: http://lccn.loc.gov/2013415453
Personal name: Arrajānī, Aḥmad ibn Muḥammad, 1067 or 1068-1149 or 1150.
Uniform title: Poems
Main title: Dīwān al-adīb al-mudaqqiq wa-al-balīgh al-muḥaqqiq al-Qāḍī al-ʻādil al-rashīd al-Imām Nāṣiḥ al-Dīn Abī Bakr Aḥmad ibn al-Ḥusayn al-Arrajānī / ṣaḥḥaḥahu wa-fassara al-gharīb min alfāẓihi Aḥmad ibn ʻAbbās al-Azharī.
Published/Created: Bayrūt : Maṭbaʻat Jarīdat Bayrūt, [1]307 [1889 or 1890]
Description: 453 p. ; 22 cm.
CALL NUMBER: PJ7755 .A72 1889 Arab Cage Copy 1
Request in: African & Middle Eastern Reading Room (Jefferson, LJ220)
===================================================

Here is the code I used:

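# Read the catalog export into a character vector, one element per line
beipub <- scan("LOCbookspublishers.txt", what = "character", sep = "\n")
# Keep the lines containing "Published/Created" and write them to the file 'BPN'
write(grep("Published/Created", beipub, value = TRUE), file = "BPN")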
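
To see what the grep() call is doing without the full file, here is a minimal sketch that runs it on a few lines copied from the first sample record above. The names sample_lines, pub_lines and pub_index are only illustrative and are not part of the original script.

```r
# A few lines from the first sample record (sample_lines is an illustrative name)
sample_lines <- c(
  "Permalink: http://lccn.loc.gov/60057948",
  "Published/Created: Beyrouth, Impr. catholique, 1890.",
  "Description: v. 22 cm.",
  "CALL NUMBER: PJ6645.F6 B394 1890 Copy 1"
)

# value = TRUE returns the matching lines themselves
pub_lines <- grep("Published/Created", sample_lines, value = TRUE)
print(pub_lines)

# Without value = TRUE, grep() returns the positions of the matching elements
pub_index <- grep("Published/Created", sample_lines)
print(pub_index)  # 2
```

Because the pattern is not anchored, it matches the text anywhere in a line. If a stricter match on the start of the line is wanted, the pattern "^Published/Created" would be one way to do it, assuming the lines in the file carry no leading whitespace.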

This scan-and-grep approach was very convenient and fast, given that the file included almost 10,000 entries. I pulled the resulting file 'BPN' into LibreOffice Calc, set the delimiters to colon and comma, and created a table as follows: