Minamata Fish Market Decline 1950-1956

The visualizations might be ready to print out and think about.

After putting a table of data into R-style, “long form” lists, Racket's group-by and sort allows for creative labeling and ordering. The hours spent with R/ggplot2 got me ready for “pilfering” ideas into Racket. (fn:2)

Harada水俣病p11 Original Table

Working with this table of fisheries data made it easy to appreciate functional programming with sort. You are free to create a function that will sort, or order, the data. It's simple to adhere to Howard Wainer's rules for “table construction.”(fn:1)

Harada 水俣病p11 longform rows plot

With a few more iterations I might find a way to abstract out the common parts to the row-based and column-based plots. There is still a lot of repetition... If I did this more frequently I guess I'd develop a common function that is directed toward a row-based or a column-based plot with a few key words and settings. But maybe clear visualizations, with sensible label placement, repetition and high ink-to-data ratios need hand-crafting. So maybe the utility functions discovered while building these plots will be enough: row-sum, col-sum, and percent-of-col seem general enough but I wonder how it is helpful to get the percentage of the first figure in a row: cell-percent-of-first-row-val .

It's been very educational, “wrangling” the little tables of data from Masazumi Harada's first Minamata Disease book. I'm getting a feel for how important “naming” is. While working with “long-form” data it helps to keep the table convention in mind, so I want a consistent use of “row”, “column” and “cell”... I ended up using “cell” for the list that has the original table cell's value after the row-header and column-header. Maybe a new name would be better. Is “long-form-cell” too long? How about “r-c-v-list”? I'm not sure how I feel about the naming so far, but the convention of going from left to right, row-name to column-name to value has helped me keep track of the processes.

I wonder why the amount of crab that reached market always seemed the least affected every year. And why shad was alway most affected? Even crab declined 58% by 1956, but it increased the first year. These visualizations have given me some interesting questions for the next chance I get to hang out at a port talking with fishermen.

Harada 水俣病p11 longform cols plot

code

#lang racket

(define my-home (find-system-path 'home-dir))
(define learning-directory "Learning-Minamata")

;; 単位 貫 カン 3.75kg 8.3lb
(define data-file "HM-1-p11-FisheryFigures.csv")
;; 魚種,昭和25〜28年平均,29年,30年,31年
;; ボラ,16000,14521,10136,5901
;; エビ,4727,2425,1558,945
;; 片ロイワシ,44515,27076,12536,6926
;; コノシロ,8457,1811,1615,318
;; タチ,13851,7931,6535,5354
;; タコ,3896,2430,2033,1179
;; イカ,3293,2517,1480,1043
;; カキ,2659,1973,1427,429
;; ナマコ,2750,2302,1630,535
;; ハモ,2083,1726,1243,603
;; カニ,1441,1702,1024,600
;; その他,18789,8102,4731,1660
;; 計,122460,74516,45948,25493


(define data-path (build-path my-home learning-directory data-file))

(define get-path
  (lambda (file-name)
    (build-path (current-directory-for-user) file-name)))

(define get-data
  (lambda (pth)
    (let* ((inp (open-input-file pth))
	   (lines (port->lines inp)))
      (close-input-port inp)
      (map (lambda (s) (string-split
			(regexp-replace* "\"" s "")
				     ","))
	   lines))))

(define fishery-data-Nengo (get-data data-path))
;(define headers (first fishery-data))	
;; '("魚種" "昭和25〜28年平均" "29年" "30年" "31年")

;; new column headers
(define headers-en-19
  '("fish" "1950-53avg" "1954" "1955" "1956"))

;; new row headers
(define fish-en '("mullet" "shrimp" "anchovy" "shad"
                   "hairtail" "octopus" "squid" 
                   "oyster" "sea slug" "sea eel" "crab" "other" "total"))

(define replace-firsts
  (lambda (orig-list new-firsts)
    (define replace-first  ;; Racket-style guide "lambda is cute but..."
      (lambda (lst frst)
        (cons frst (rest lst))))
    (map replace-first orig-list new-firsts)))

(define (replace-headers-firsts table headers firsts)
  (cons headers
	(replace-firsts (cdr table) firsts)))

(define (貫string->kg-number str)
  (* 3.75 (string->number str)))
(define (貫strings-row->kg-numbers row)
  (define head (car row))
  (cons head
	(map 貫string->kg-number (cdr row))))

(define fishery-data-en-19-kg
  (cons headers-en-19
	(replace-firsts
	 (map 貫strings-row->kg-numbers
	      (cdr fishery-data-Nengo))
	 fish-en)))
;; '(("fish" "1950-53avg" "1954" "1955" "1956")
;;   ("mullet" 60000.0 54453.75 38010.0 22128.75)
;;   ("shrimp" 17726.25 9093.75 5842.5 3543.75)
;;   ("anchovy" 166931.25 101535.0 47010.0 25972.5)
;;   ("shad" 31713.75 6791.25 6056.25 1192.5)
;;   ("hairtail" 51941.25 29741.25 24506.25 20077.5)
;;   ("octopus" 14610.0 9112.5 7623.75 4421.25)
;;   ("squid" 12348.75 9438.75 5550.0 3911.25)
;;   ("oyster" 9971.25 7398.75 5351.25 1608.75)
;;   ("sea slug" 10312.5 8632.5 6112.5 2006.25)
;;   ("sea eel" 7811.25 6472.5 4661.25 2261.25)
;;   ("crab" 5403.75 6382.5 3840.0 2250.0)
;;   ("other" 70458.75 30382.5 17741.25 6225.0)
;;   ("total" 459225.0 279435.0 172305.0 95598.75))

(define (data-table->long-form table)
  (let ((long-cells '()))
    (define column-headers (cdr (car table)))
    (define data-rows (cdr table))
    (define (add-cell cell)
      (set! long-cells (cons cell long-cells)))
    (define (data-row->long-cells row)
      (define row-head (car row))
      (define numbers (cdr row))
      (define (data-cell->long-cell num c-head)
	(add-cell (list row-head c-head num)))
      (map data-cell->long-cell numbers column-headers))
    (map data-row->long-cells (cdr table))
    (reverse long-cells)))
;; (data-table->long-form-try-2 fishery-data-19-kg)    

(define fish-market-en-long-form
  (data-table->long-form ;; remove last "total" row
   (reverse (cdr (reverse fishery-data-en-19-kg)))))
   ;fishery-data-en-19-kg))

(define (get-row header table)
  (filter (lambda (cell) (string=? (first cell) header))
	  table))

(define (row-sum header table)
  (define row (get-row header table))
  (apply + (map third row)))

(define (get-col header table) ;; get-column
  (filter (lambda (cell) (string=? (second cell) header))
	    table))

(define (col-sum header table) ;; sum column figures
  (define col (get-col header table))
  (apply + (map third col)))

(define (get-cell r-head c-head table)
  (define row (get-row r-head table))
  (car (get-col c-head row)))

(define (percent-of-col r-head c-head table)
  (define col (get-col c-head table))
  ;; efficient but brittle, use col column but relies on
  ;; convention that first-row second-col third-num
  (define sum (apply + (map third col))) 
  (define cell (car (get-row r-head col)))
  (define num (third cell))
  (* 100 (/ num sum)))

(define (percent-of-col-num header num table)
  ;(define col (get-column header table))
  (* 100 (/ num (col-sum header table))))

(define (cell-percent-of-first-row-val r-head c-head table)
  (define row (get-row r-head table))
  (define cell (car (get-col c-head row)))
  (define num (third cell))
  (define row-nums (map third row))
  (define standard (first row-nums))
  (* 100 (/ num standard)))

(require plot/utils)

(define (percent->diff-label num)
  (if (= num 100)
      ""
      (string-append
       (->plot-label (- (- 100 num)) 0)
       "%")))

(define (kg-fig->kg-label num)
  (string-append (->plot-label num 0) "kg"))

(require plot)

(define horizontal-line
  (lambda (x y #:clr (clr "black")) ;; lne-wdh lne-stl ..etc
    (lines (list (vector 0 y) (vector x y)) #:color clr)))

(define labeled-point
  (lambda (x y l #:ancr (ancr 'left) #:fnt-sze (fnt-sze 8) #:fnt-clr (fnt-clr "black") #:pnt-sze (pnt-sze 5)
            #:pnt-clr (pnt-clr 0) #:lbl-angl (lbl-angl 0) )
    (point-label (vector x y) l #:anchor ancr #:size fnt-sze #:color fnt-clr #:point-size pnt-sze
                #:point-color pnt-clr #:angle lbl-angl)))

;; plot-a-line, was plot-a-fish,plot a cell!!
(define (plot-a-cell cell y table)
  (define fish (first cell))
  (define year (second cell))
  (define num (third cell))
  (define x-percent (cell-percent-of-first-row-val fish year table))
  (define kglabel (kg-fig->kg-label num))
  (define fish-kglabel (string-append fish " " kglabel))
  (define %diff-label (percent->diff-label x-percent))
  ;; need `parameterize`? to set data-table as default?
  (define %of-col (percent-of-col-num year num table))
  (define %of-col-label (string-append
			 (->plot-label %of-col 0) "% of" year " total"))
  (define kg-and-%of-year-label
    (string-append fish-kglabel ", " %of-col-label))
  (list
   (points (list (vector x-percent y)) #:sym 'fullcircle)
   (labeled-point 0 y kg-and-%of-year-label
		  #:ancr 'top-left #:pnt-sze 0 #:fnt-clr "gray" #:fnt-sze 10)
   (horizontal-line x-percent y)
   (labeled-point 100 y %diff-label #:ancr 'bottom-right #:pnt-sze 0 #:fnt-sze 10)
   (labeled-point 100 y year #:ancr 'bottom-left #:pnt-sze 3 #:pnt-clr "red"  #:fnt-sze 10)
   (lines (list (vector x-percent y) (vector 100 y)) #:color "red")))

;; A row is a group of long-form cells derived from row in original table
(define (plot-a-row row y-center table)
  (define y-label (first (car row)))
  (define y-limit .35)
  (define ys (linear-seq
	      (- y-center y-limit) (+ y-center y-limit)
	      (length  row)))
  (define (helper rows ys plots)
    (cond
     ((empty? rows)  plots) ;; no need to revers, ys determing plot order
     (#t (helper (cdr rows) (cdr ys)
		 (cons (plot-a-cell (car rows) (car ys) table) plots)))))
  (list (labeled-point 110 y-center y-label #:fnt-sze 14 #:pnt-sze 0 #:ancr 'bottom-left)
	;(labeled-point 0 y-center y-label #:fnt-sze 14 #:pnt-sze 0 #:ancr 'left)
	(helper row ys '())))
;; 
(define (get-group-1956-percent row-group)
  (define r-head (first (car row-group)))
  ;; hard-coding 1956) works better with `last
  ;; with cell-percent-of-first-row-val
  (define c-head "1956")
  (cell-percent-of-first-row-val r-head c-head fish-market-en-long-form))

(define (compare-1956-percent group1 group2)
  (< (get-group-1956-percent group1)
     (get-group-1956-percent group2)))

(define (plot-rows table)
  (define groups (group-by first table))
  (define len (length groups))
  (define y-seq (linear-seq 1 len len))
  (define (helper rows ys plots)
    (cond ;; no need to reverse consed list
     ((empty? rows) plots) ;; just reverse the ys
     (#t (helper (cdr rows) (cdr ys)
		 (cons (plot-a-row (car rows) (car ys) table)
		       plots)))))
  (helper (sort groups compare-1956-percent)
	  (reverse y-seq) '()))

;; make a function that takes a table and
;; then plots the groups with the #:y-max set right
(parameterize
    ((plot-title "Harada「水俣病」p.11魚類別漁獲高調査表'53-'56")
     (plot-x-axis? #f)
     (plot-x-far-axis? #f)
     (plot-x-label #f)
     (plot-y-axis? #f)
     (plot-y-far-axis? #f)
     (plot-y-label #f)
     (plot-width 600)
     (plot-height 1200)
     )
  (plot
   (plot-rows fish-market-en-long-form)
   #:x-min 0 #:x-max 140
   #:y-min .5 #:y-max (+ .5 (length (group-by first fish-market-en-long-form)))
   #:out-file "Harada-MinamataByo-p11-rowplot-6.png"
   #:out-kind 'png
   ))


(define (get-percent cell)
  (define fish (first cell))
  (define year (second cell))
  (cell-percent-of-first-row-val fish year fish-market-en-long-form))
  
(define (compare-percent c1 c2)
  (< (get-percent c1) (get-percent c2)))
;; (sort ex-1956 compare-percent)

(define (plot-a-col-cell cell y table)
  (define fish (first cell))
  (define year (second cell))
  (define num (third cell))
  (define x-percent (cell-percent-of-first-row-val fish year table))
  (define kglabel (kg-fig->kg-label num))
  (define fish-kglabel (string-append fish " " kglabel))
  (define %diff-label (percent->diff-label x-percent))
  ;; need `parameterize`? to set data-table as default?
  (define %of-col (percent-of-col-num year num table))
  (define %of-col-label (string-append
			 (->plot-label %of-col 0) "% of" year " total"))
  (define kg-and-%of-year-label
    (string-append fish-kglabel ", " %of-col-label))
  (list
   (points (list (vector x-percent y)) #:sym 'fullcircle)
   (labeled-point 0 y kg-and-%of-year-label
		  #:ancr 'top-left #:pnt-sze 0 #:fnt-clr "gray" #:fnt-sze 10)
   (horizontal-line x-percent y)
   (labeled-point 100 y %diff-label #:ancr 'bottom-right #:pnt-sze 0 #:fnt-sze 10)
   (labeled-point 100 y fish #:ancr 'bottom-left #:pnt-sze 3 #:pnt-clr "red"  #:fnt-sze 10)
   (lines (list (vector x-percent y) (vector 100 y)) #:color "red")))

;; plot a column with meaningful order: Howard Wainer's advice
;;  against A-first order with no relation to data; use compare-percent
;; the table column is not a long-form row with col-header as second element of each cell
(define (plot-a-col row y-center table)
  (define y-label (second (car row)))
  (define y-limit .40)
  (define ys (linear-seq
	      (- y-center y-limit) (+ y-center y-limit)
	      (length  row)))
  (define (helper rows ys plots)
    (cond
     ((empty? rows)  plots) ;; no need to revers, ys determing plot order
     (#t (helper (cdr rows) (cdr ys)
		 (cons (plot-a-col-cell (car rows) (car ys) table) plots)))))
  ;; y-limit needs global definition or parameterization or something... 
  (list (labeled-point 20 (+ .40 y-center) y-label #:fnt-sze 16 #:pnt-sze 0
                      #:ancr 'bottom-left #:lbl-angl 0)
	;(labeled-point 0 y-center y-label #:fnt-sze 14 #:pnt-sze 0 #:ancr 'left)
	(helper (sort row compare-percent) ys '())))

(define (plot-cols table)
  (define groups (group-by second table))
  (define len (length groups))
  (define y-seq (linear-seq 1 len len))
  (define (helper rows ys plots)
    (cond ;; no need to reverse consed list
     ((empty? rows) plots) ;; just reverse the ys
     (#t (helper (cdr rows) (cdr ys)
		 (cons (plot-a-col (car rows) (car ys) table)
		       plots)))))
  (helper groups y-seq '()))

(parameterize
    ((plot-title "Harada Masazumi「水俣病」p.11 魚類別漁獲 '53-'56 減収")
     (plot-x-axis? #f)
     (plot-x-far-axis? #f)
     (plot-x-label #f)
     (plot-y-axis? #f)
     (plot-y-far-axis? #f)
     (plot-y-label #f)
     (plot-width 600)
     (plot-height 1200)
     )
  (plot
   (plot-cols fish-market-en-long-form)
   #:x-min 0 #:x-max 130
   #:y-min .5 #:y-max (+ .5 (length (group-by second fish-market-en-long-form)))
   #:out-file "Harada-MinamataByo-p11-colplot-6.png" 
   #:out-kind 'png
   ))

The first rule of table construction is to: 1. Order the rows and columns in a way that makes sense. We are almost never interested in “Austria First.” Two useful ways to order the data are:

a. Size places-Put the largest first. Often we look most carefully at what is on top and less carefully further down. … b. Naturally-Time is ordered from the past to the future. Showing data in that order melds well with what the viewer might expect. This is always a good idea.

  1. Round-a lot! This is so for three reasons:

a. Humans cannot understand more than two digits very easily. b. We can almost never justify more than two digits of accuracy statistically. c. We almost never care about accuracy of more than two digits. – Stable URL: http://www.jstor.org/stable/1176346

#DrRacket #DataVisualization #Minamata #HaradaMasazumi #MasazumiHarada #水俣病 #原田正純 #データビジュアリゼーション #漁業