Data Mining

This week in DITA we looked at Data Mining, which is generally defined as the process of analyzing data from different perspectives and summarizing it into useful information. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. Last week, we went through some text analysis methods, which are a subset of data mining.

This post explores my experience of practicing data mining with two major projects: the Old Bailey Online and one of the digitization projects from Utrecht University.

The Old Bailey Online:

The Old Bailey Proceedings Online allows access to over 197,000 trials and biographical details of approximately 2,500 men and women executed at Tyburn, free of charge for non-commercial use.

I started with the site's own search engine, using London as the keyword to focus the search on the city and Richard as the given name to refine it. I selected Murder as the offence, for the period 1680 to 1850. For the verdict I chose Guilty, including all the punishment categories. I used the Calculate Total option to display the full number of results obtained.

Screen Shot 2014-11-30 at 21.07.38

The API Demonstrator differs from the original site search in two ways. First, it searches by trial rather than by offence, defendant, etc., as is the case on the original site. Second, it lets you explore the result sets before exporting them to Zotero (for weeding) or to Voyant Tools for further linguistic analysis.

I used the API to search for murders in London committed by men against women in the period 29 April 1674 to 1 April 1913. This query returned 25 trials. These results can be explored in detail either by choosing Undrill in relation to specific query components, or by using Break Down to identify relevant sub-sets of trials. We can also find texts similar to any trial by using the More Like option.

Screen Shot 2014-12-06 at 23.11.31

Screen Shot 2014-12-06 at 23.31.21
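To make the idea behind this kind of query and the Break Down option more concrete, here is a minimal sketch in Python. It does not call the Old Bailey API. The Trial class, its field names, the search and break_down helpers and the sample records are all invented for illustration, so it only shows the general idea of filtering a set of trials and then counting the results by one field.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Trial:
    # All fields are hypothetical; real Old Bailey records are structured differently.
    trial_id: str
    trial_date: date
    offence: str
    defendant_gender: str
    victim_gender: str
    verdict: str


# Invented sample records, for illustration only (these are not real trials).
TRIALS = [
    Trial("sample-1", date(1700, 5, 1), "murder", "male", "female", "guilty"),
    Trial("sample-2", date(1755, 9, 10), "murder", "male", "female", "not guilty"),
    Trial("sample-3", date(1802, 2, 17), "theft", "female", "male", "guilty"),
    Trial("sample-4", date(1850, 7, 3), "murder", "male", "male", "guilty"),
    Trial("sample-5", date(1899, 11, 20), "murder", "male", "female", "guilty"),
]


def search(trials, start, end, **criteria):
    """Keep trials dated within [start, end] whose fields match every criterion."""
    return [
        t for t in trials
        if start <= t.trial_date <= end
        and all(getattr(t, field) == value for field, value in criteria.items())
    ]


def break_down(trials, field):
    """Count the trials in a result set by the value of one field."""
    return Counter(getattr(t, field) for t in trials)


results = search(
    TRIALS, date(1674, 4, 29), date(1913, 4, 1),
    offence="murder", defendant_gender="male", victim_gender="female",
)
print(len(results))
print(break_down(results, "verdict"))
```

Each extra keyword passed to search narrows the result set, and break_down then splits whatever is left into sub-sets, which is roughly what I was doing by hand in the Demonstrator.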

Exporting the results using Voyant Tools

The results can be exported for further analysis with tools such as Zotero and Voyant Tools. I used the Send to Voyant option to export the full text of 100 trials to the Voyant Tools site, as shown below.

Screen Shot 2014-12-07 at 00.04.02

Screen Shot 2014-12-07 at 00.03.18
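One of the simplest things a text analysis tool can do with an exported corpus is count how often each word appears. The sketch below shows that idea in plain Python. It is not how Voyant Tools works internally, and the stop-word list, the word_frequencies helper and the sample sentence are my own assumptions for illustration.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; real tools ship much longer ones.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "was", "that", "he", "she", "his", "her"}


def word_frequencies(text, top_n=5):
    """Return the top_n most common words in text, ignoring case and stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS).most_common(top_n)


# Invented sample text, standing in for the exported trial transcripts.
sample = "The prisoner was indicted for the murder. The witness saw the prisoner."
print(word_frequencies(sample, top_n=2))
```

With the real exported trials in place of the sample sentence, the same count would give a rough picture of the vocabulary that dominates the result set.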

Utrecht University Digital Humanities Lab Text Mining Research Projects

I explored Circulation of Knowledge and Learned Practices in the 17th-century Dutch Republic, a Utrecht University project. The project aims to provide free online search access to historical sources from various disciplines all over the world. It allows scholars to discuss problems, share information and add transcriptions or footnotes. One of its great advantages is its ability to invoke new questions, new interpretations and new information, and to bring all this together on an expanding website. The application used by the project is called ePistolarium, which allows researchers to browse and analyze around 20,000 letters that were written by and sent to 17th-century scholars who lived in the Dutch Republic. In addition, the ePistolarium enables visualizations of geographical, time-based, social network and co-citation inquiries.

I used this application to search for all the letters containing the phrase "Digital Information" in the period 1650 to 1690. There were 25 results, which can be displayed either as a list or visualised, as shown below.

Screen Shot 2014-12-07 at 01.02.27

Screen Shot 2014-12-07 at 01.04.41

Screen Shot 2014-12-07 at 01.08.31
