Saturday, February 7, 2015

Days 5 & 6--The end of the beginning and Regression

Day 5 Logged--1:16
Day 6 Logged--1:45

Day 5 was great--I figured out the script for all the segregation indices and banged out everything I needed--exposure and dissimilarity indices for:
white-black
white-nonwhite
white-hispanic
black-hispanic
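For anyone who wants to check the numbers by hand (which I keep promising to do), the two indices reduce to short formulas. The sketch below is in Python rather than STATA, and it is not the user-written package I used. It assumes per-school counts for a single district, and it uses each school's total enrollment (all races) as the denominator in the exposure index. That is one common convention, but treat it as an assumption. The function names and the three-school district are made up for illustration.

```python
def dissimilarity(a, b):
    """Two-group dissimilarity index D for one district.

    a, b: per-school enrollment counts for the two groups.
    D = 0.5 * sum_i |a_i/A - b_i/B|, with A and B the district totals.
    """
    A, B = sum(a), sum(b)
    return 0.5 * sum(abs(ai / A - bi / B) for ai, bi in zip(a, b))


def exposure(a, b, total):
    """Exposure of group a to group b: sum_i (a_i/A) * (b_i/t_i).

    total: per-school total enrollment t_i. Using all-race totals (rather
    than a_i + b_i) is an assumption of this sketch.
    """
    A = sum(a)
    return sum((ai / A) * (bi / ti) for ai, bi, ti in zip(a, b, total) if ti > 0)


# Hypothetical district with three schools (only these three groups enrolled).
white = [300, 50, 150]
black = [20, 200, 80]
hispanic = [30, 50, 70]
total = [w + b + h for w, b, h in zip(white, black, hispanic)]
nonwhite = [t - w for t, w in zip(total, white)]

pairs = {
    "white-black": (white, black),
    "white-nonwhite": (white, nonwhite),
    "white-hispanic": (white, hispanic),
    "black-hispanic": (black, hispanic),
}
for name, (g1, g2) in pairs.items():
    print(name, round(dissimilarity(g1, g2), 3), round(exposure(g1, g2, total), 3))
```

Unlike D, exposure is not symmetric: exposure(black, white, total) and exposure(white, black, total) generally differ, so it matters which group goes first.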

I then fell down a little rabbit hole on my tests because things weren't adding up--and then I realized it's because the Reardon dataset includes Tennessee, which I dropped because Tennessee didn't submit demographic data in 1990, so its districts don't have data for the full 20-year panel.  I'm leaning towards just leaving them out, but I suppose I can take a look at appending them before the next phase.

What's the next phase?  Well, it's actually making some observations and starting to answer my research questions.  And that's where the 6-year layoff since a real statistics class is starting to show. I can look at either the theory or the programming behind how similar analyses are done...and I understand about 25% of it.  It's like reading a foreign language you studied in high school twenty years later--you understand enough to persist, but are largely baffled.

My gameplan is to go back to the very basics.  I (embarrassingly) redefined my dependent and independent variables tonight, and am reviewing the basics on regression and panel data.  I think it's foolish to try to modify or replicate analyses I can't understand, and I think in reality I'm only seeking to confirm what others have already found with regard to the segregation indices in districts under court order.  So my idea is to start from IV/DV and work my way up to a "Eureka" moment around how to run MY analysis.

So tomorrow night, instead of diving back into STATA, I'm diving back into reading up on regression and panel data (Fixed Effects vs Random Effects) to start to get a better sense of what I really should be doing here.
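A note to self on what "fixed effects" actually does, since that's the first thing to nail down. With one regressor, the fixed-effects (within) estimator subtracts each district's own average from x and y and then runs ordinary regression on what's left, so anything constant within a district drops out. As far as I understand it, that's the idea behind STATA's `xtreg, fe`. The Python below is only a toy sketch with made-up numbers and hypothetical function names; random effects (a GLS estimator) isn't shown.

```python
from collections import defaultdict


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with an intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sum((xi - mx) ** 2 for xi in x)
    return num / den


def within_slope(ids, x, y):
    """Fixed-effects (within) slope: demean x and y by district, then OLS."""
    groups = defaultdict(list)
    for g, xi, yi in zip(ids, x, y):
        groups[g].append((xi, yi))
    # District means of x and y.
    means = {
        g: (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))
        for g, pts in groups.items()
    }
    dx = [xi - means[g][0] for g, xi in zip(ids, x)]
    dy = [yi - means[g][1] for g, yi in zip(ids, y)]
    return sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)


# Toy panel: two districts, three years each. Within each district y rises
# by 2 per unit of x, but district 2 has a higher baseline AND higher x.
ids = [1, 1, 1, 2, 2, 2]
x = [0, 1, 2, 3, 4, 5]
y = [5, 7, 9, 20, 22, 24]
print("pooled OLS slope:", ols_slope(x, y))
print("within (FE) slope:", within_slope(ids, x, y))
```

The pooled slope comes out around 4.3 because it mixes in the difference between the two districts, while the within slope recovers the 2 that holds inside each district.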

Also, it's the weekend, so I had two cocktails while reading. Not sure if it was causal, but it was definitely correlated.

Friday, February 6, 2015

Days 3 & 4--X Marks the Spot/A Night off

Day 3 Time Logged--1:00
Day 4 Time Logged--0:00

We'll work backwards here.  I took tonight off--I know, after only 3 lousy nights!  I had a rough day, the deadly combination of not enough sleep, work issues, and cancer stress (nothing specifically bad happened, it's just generally stressful when your kid is working through recovery from brain cancer). The next thing I know, it's 9:40 and I feel like I've been worked over by some guys with baseball bats.

So...I'll put in an extra hour sometime this weekend.  Feeling much better today, and actually excited to put some work in.

Two nights ago, I took some big steps.  First, I reworked the dataset to match the Reardon dataset.  I added a few districts, but more importantly, the rework did 2 things: 1) it dropped some districts that were erroneously reported as being under court order in some previous resources (Reardon's team and I used pretty much all the same resources--specifically John Logan's list from the Mumford Center, correspondence with the DOE, the Vigdor/Clotfelter/etc. list from 2004, and Armour and Rossell).  On some of these lists, old cases that were voluntary orders or HEW plans that weren't court-ordered ended up getting treated as court-ordered cases.  2) It dropped all districts with fewer than 2000 kids, because Reardon drops those; they tend not to work well with deseg indices.

That's right, deseg indices!  I programmed them in!  Sort of--it turns out that there are a LOT of user-written STATA program packages ("ado files" for those of you in the know), including several for segregation indices.  The trick is making sure you understand how to plug your data into the right spots in the pre-written formulas so that the calculations work.  It took a little while to figure this out, but by the end of the night I was able to calculate dissimilarity and exposure indices, and the same program, with the same formula structure, can calculate several others that may be useful at some point (isolation and information theory indices are both possible).
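Since isolation and the information theory index are on the "maybe useful later" list, here's a Python sketch of the textbook versions so I know what they're measuring. It assumes per-school counts for every racial group in one district. This is my own illustration of the formulas, not the code inside any STATA package; the function names are hypothetical, and returning 0 when only one group is present is a choice I made for the sketch.

```python
import math


def isolation(a, total):
    """Isolation index of group a: sum_i (a_i/A) * (a_i/t_i)."""
    A = sum(a)
    return sum((ai / A) * (ai / ti) for ai, ti in zip(a, total) if ti > 0)


def entropy(shares):
    """Entropy of a list of group shares, skipping empty groups."""
    return sum(p * math.log(1 / p) for p in shares if p > 0)


def theil_h(groups):
    """Theil's information theory index H for one district.

    groups: one list of per-school counts per racial group, all the same length.
    H = sum_i t_i * (E - E_i) / (T * E), where E is the district's entropy,
    E_i is school i's entropy, t_i is school i's enrollment, T the district's.
    """
    school_totals = [sum(counts) for counts in zip(*groups)]
    T = sum(school_totals)
    E = entropy([sum(g) / T for g in groups])
    if E == 0:  # only one group present: H is undefined, report 0 (assumption)
        return 0.0
    h = 0.0
    for i, ti in enumerate(school_totals):
        if ti == 0:
            continue
        Ei = entropy([g[i] / ti for g in groups])
        h += ti * (E - Ei) / (T * E)
    return h


# Hypothetical district with three schools.
white = [300, 50, 150]
black = [20, 200, 80]
hispanic = [30, 50, 70]
total = [w + b + h for w, b, h in zip(white, black, hispanic)]
print("black isolation:", round(isolation(black, total), 3))
print("Theil H (3 groups):", round(theil_h([white, black, hispanic]), 3))
```

H runs from 0 (every school mirrors the district's mix) to 1 (every school is single-race), and unlike the pairwise indices it handles all the groups at once.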

The next step is to run all of the indices on the following match-ups:
1. Black/White
2. White/Nonwhite
3. White/Hispanic
4. Black/Hispanic (not sure what this will reveal, but I think it's worth running if the code's already there)

Once I have all of these, I can start to work towards answering the questions for the chapter.

Doug

Wednesday, February 4, 2015

Day 3--Moving Forward

Day 2 time logged--1:15 Cumulative Time: 2:37

So I think I have a way forward for my first chapter.  I'm going to keep the blogging to a minimum tonight because I want to get my hour in and be done with it--or hit a breakthrough and run with it.

Basically, here's the plan:

  1. Use the Reardon districts
  2. Compare changes in composition and segregation between districts never under court order, continuously under court order, and released from court order
  3. Run/replicate changes in segregation in districts released (consider using a balanced panel of only districts that have 5 years pre and 5 years post)
  4. Use Orfield regions
  5. Use only dissimilarity and exposure index
  6. Check Black/Hispanic segregation to see if there's something there

We'll see how it goes.

UPDATE-- Boo-yah! Figured out how to get the segregation indices programmed, and I think I checked it and it works out.  I'm going to run a few hand calculations tomorrow to check, but I can now run dissimilarity, exposure, and isolation indices on everything!  So tomorrow night, we will!

Tuesday, February 3, 2015

Day 2--Mixed Results

Day 1--Time logged, 1:22

So day 1 started pleasantly enough--my old files booted up just fine in my shiny new copy of STATA 13, which appeared to be running super fast on my new computer.  After a couple of quick sort-resorts to remind myself I hadn't forgotten everything, I dove back into my first research chapter.

My dissertation looks at a group of approximately 425 districts that were following court-ordered desegregation plans at the time of the Dowell ruling in 1991.  I wanted to see how these districts fared throughout the 90s and 00s, particularly those that were released from court order.  My first chapter really looks at two questions in particular:

1. How did racial composition of districts under court order change relative to the overall public school population?
2. Did desegregation decelerate/resegregation occur in districts that were released from their court-ordered plans?

So I set off for Google Scholar to look for some resources that would help me start doing the analysis to answer these questions...and boy did I come across one.

For those of you who don't know Sean Reardon (all of you?), he's like the Michael Jordan/Paul McCartney/Picasso of quantifying segregation using various metrics.  And he essentially worked with some other folks to answer my first set of dissertation questions in 2012--while I was still screwing around raising a 1-year-old and traveling to recruit schools into a massive randomized controlled study for work.

Here's the good, the bad, and the ugly about finding this paper:

The Good

1. The paper confirms that my data set was pretty spot on--I'm going to compare mine to theirs (they dropped all districts with fewer than 2000 kids, I dropped only those below 500), but it looks like we pulled from all the same sources.
2. Reardon and his co-authors show a few different ways to measure changes in districts from the time they were released from court orders.
3.  The conclusion to the paper specifically calls out a few of my other research questions as areas that warrant further research.

The Bad

1. I feel like writing this chapter is pointless now.  Reardon and his co-authors take a very sophisticated, detailed look at how the gains in desegregation achieved under court orders do indeed recede after those orders are lifted.  It's really thorough and well done--to the point that I emailed my advisor a copy of the paper along with a plea to consider dropping this question and moving straight on to my other two questions.  He replied that if our data sets are different, it's worth writing this up. I'm going to take one more look tonight and hope that they are different.  If not, I'll soldier through, but it feels pointless, and though I'll give full credit to Reardon and his crew and admit that I'm basically replicating his research, it feels a bit like plagiarizing.  At the very least, I feel like I've already read the ending of the mystery I was planning to write.

The Ugly

I'm not going to say I'm totally lost trying to follow the "discrete-time hazard models" and "comparative interrupted time-series models", but...uh, this is significantly more complicated than running a basic linear regression.  Stata (and Google! And whatever versions of "Advanced Statistical Modeling for Dummies" I can find), don't fail me now!
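Part of what makes a discrete-time hazard model less scary than it sounds: as I understand it, it's usually fit as a logistic regression on "person-period" data. Here that would mean one row per district per year that the district is still at risk of release, an event flag in the year it's released, and no rows after that. Below is a Python sketch of just that reshaping step, with hypothetical IDs and years; the actual specification in the Reardon paper may well differ.

```python
def person_period(release_years, start, end):
    """Expand districts into district-year rows for a discrete-time hazard model.

    release_years: dict of district id -> release year, or None if the district
    was still under court order at the end of the window (right-censored).
    Returns (district, year, released) tuples; a district leaves the risk set
    after the year it is released.
    """
    rows = []
    for did in sorted(release_years):
        release = release_years[did]
        for year in range(start, end + 1):
            released = int(release == year)
            rows.append((did, year, released))
            if released:  # no longer at risk once released
                break
    return rows


for row in person_period({"A": 1993, "B": None}, 1991, 1994):
    print(row)
```

Those rows would then feed a logit of `released` on year indicators and district covariates; which covariates belong there is exactly the part I still need to work out.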

So--at least I have a somewhat clearer (if complicated) path in front of me.  Time to slog on and put my hour in for the night...why do I know it's only an hour?  Because last night, our youngest decided he wanted to wake up and hang out with us from midnight until 2. So yeah, short night for me.

Monday, February 2, 2015

Day 1--The Last Homework Assignment

This blog is about trying to revive a partially completed dissertation that's been on life support for years.  I won't bore you with the details of why in the first post--I'm a part-time student who has a full-time job helping lead an educational non-profit.  I had two kids, one of them got (and is beating!) cancer. So yeah, a lot of shit.

Life has been getting back to normal the past few months, and people have been asking about my PhD/dissertation in one of 3 ways:

1. They refer to me as a PhD and I have to inform them that no--I never actually did finish my dissertation.
2. They ask me how the dissertation is going, and I chuckle and inform them it's not.
3. They act surprised that I'm in/still in a graduate program, since I started my PhD in the fall of 2006. Yes, fall of 2006.

So technically, I should have been DQ'd from the program at the end of the Spring 2014 semester, or at least have started to lose credits since I've been going so long.  But because my son had cancer, and we were pretty much at his bedside for a year, I've been on a leave of absence since the fall 2013 semester.

Anyways, when people ask me how the dissertation is going (See #2 above), I usually admit that I could get the dissertation done if I could just put 100 hours into it.  Then I keep wondering where this 100 hours is going to magically come from...

...and then in the shower yesterday morning, I started thinking about it--100 hours is essentially a month or so of giving up TV/Internet/Chillout time in the evenings and dedicating it to the dissertation.  That's basically what a dedicated high school or college student puts into reading and/or homework every weeknight, so not totally unreasonable.  Plus, it's a finite number.  I'll push myself for 100 hours and see where things stand.

It's also worth noting that I fall for gimmicks that involve a finite amount of time.  The Whole 30, half marathon training plans, Lent--any sort of time-bound self-improvement plan resonates with me, so I figure giving the dissertation some short-term, intensive focus can't hurt.  Also, the metacognitive, self-aware, blogging-it factor (which is admittedly very 2007--thanks a lot, Julie and Julia) is kind of a draw, though I'm not sure I'll share this with anyone.  Maybe a few of my friends from my doctoral cohort that I still talk to.

So anyways, before I get started for the evening, here are the rough rules:

1. I put in a minimum of 1 hour a day on this--I was originally going to say 3 hours, but I also don't want to give up the first day I can't hit that amount.  I'm going to shoot for 3 every night, but have to put in at least an hour.
2. I keep going until I hit 100 hours or a completed draft.
3. Time writing this blog or doing indirectly related things (tweaking STATA, moving files around Google Drive, shopping for robes) doesn't count towards the 1-hour-a-day minimum or the 100-hour project limit.
4. Not sure how to count meetings with my advisor (who probably thinks I'm a joke by now), committee members (who will no doubt be surprised that I'm enrolled), and others--but for now, we're not going to count them until they come up or they prove helpful.

I realize that I should probably give an overview of the dissertation, but it's already 9:04, so I really ought to get writing.  I'll give an overview tomorrow night as well as how much time I've logged.