Exploring family relations from online obituaries using text mining and data visualisation tools

Using a copy of all the obituaries published online by the Ahram newspaper from January 2002 to April 2008, it is possible to find family relations between individuals in certain professions with Linux command-line tools (gawk, sed, bash). The example given here explores the family links within a sample of 456 Egyptian state security officers.

This is a very brief description of the method.

  1. Convert the HTML files downloaded by curl into one giant text file.
  2. Move each separate obituary onto a line of its own.
  3. Extract the officer names sandwiched between rank and place of work into a separate text file.
  4. Search for each officer's name in every obituary; this is how family links between different officers are discovered.
  5. Write the output in Graphviz .dot format, which can be drawn as a graph similar to the one below.

    Graph showing 63 family links between 174 officers from 456 Egyptian state security officers. Each link corresponds to a family tie of a variable degree of relationship including in-laws.
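As a rough illustration of the five steps listed above, here is a minimal Python sketch. It is not the attached script, which used gawk, sed and bash. The rank and workplace markers, the sample pages and the names in them are all invented for the example (the real obituaries are in Arabic), and treating two officers named in the same obituary as linked is an assumption about how step 4 works.

```python
import html
import re
from itertools import combinations

# Hypothetical markers: the post does not give the actual rank and
# workplace phrases that surround an officer's name.
RANK = "Colonel"
WORKPLACE = "of State Security"


def html_to_line(page):
    """Steps 1-2: strip tags and collapse one obituary onto a single line."""
    text = re.sub(r"<[^>]+>", " ", page)
    return " ".join(html.unescape(text).split())


def extract_officers(obituaries):
    """Step 3: collect names (1 to 4 words) between rank and workplace."""
    pattern = re.compile(
        re.escape(RANK) + r"\s+(\S+(?:\s+\S+){0,3}?)\s+" + re.escape(WORKPLACE)
    )
    return {name for line in obituaries for name in pattern.findall(line)}


def find_links(obituaries, officers):
    """Step 4: assume officers named in the same obituary are related."""
    links = set()
    for line in obituaries:
        present = sorted(name for name in officers if name in line)
        links.update(combinations(present, 2))
    return links


def to_dot(links):
    """Step 5: emit an undirected Graphviz graph, one edge per link."""
    edges = ['  "{}" -- "{}";'.format(a, b) for a, b in sorted(links)]
    return "\n".join(["graph officers {"] + edges + ["}"])


# Invented sample pages standing in for the downloaded HTML files.
pages = [
    "<p>Obituary of Hassan Ali, father of Colonel Omar Hassan of State "
    "Security and uncle of Colonel Tarek Said of State Security</p>",
    "<p>Obituary of Mona Adel,<br>aunt of Colonel Tarek Said of State Security</p>",
]

obituaries = [html_to_line(page) for page in pages]
officers = extract_officers(obituaries)
links = find_links(obituaries, officers)
print(to_dot(links))
```

Saving the printed text to a file such as links.dot and running `dot -Tpng links.dot -o links.png` should render the graph. Plain substring matching will confuse officers who share a name, so a real run would need some de-duplication.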

    This is just a preview of what might be possible using a data set of 43,156 obituaries. Without control group(s), this graph says nothing beyond being pretty and showing that there are family links between officers. Statistical methods could be applied to the same data to answer different questions.

    UPDATE: I attached the list of SS officers and the script used to find links between officers and output a Graphviz .dot file. You can download the obituaries dataset from here.

    UPDATE: Zeinobia wrote a very interesting post about the use of Ahram obituaries by the IDF before 1973 and what the obituaries mean to many Egyptian families:

    Before the Yom Kippur war in 1973 the IDF soldiers and officers used to stand on the East bank of the Suez Canal calling Egyptian army officers and soldiers by name along with their families names surprisingly in order to prove that the IDF is the best army in this universe that knew everything everywhere.

    The intelligence did not take to much long to figure out how the IDF and also Mossad got their info : Al Ahram Obituary pages , the most famous Who is Who pages in Egypt society and the cash cow of the famous newspaper along with its ad. Egyptian families especially the big and rich ones like to show off with their relations through big obituary , it has become a shallow and silly tradition that the longer the obituary is regardless of how expensive it will be , the most prestigious the family is. As personal experience there are still family members who are angry from my grandma on how she dared and published a small obituary for my grandfather in the famous newspaper !!

    UPDATE: If you look carefully you will find a couple of mistakes, duplicate names. I will try to fix those, but not now as I am incredibly busy.

Attachment: SS_list.text (31.78 KB)
