The value of intellectual play: Mill, machine learning, and a drilling problem I couldn’t stop thinking about

Featured

A few years back, I watched a CSEG talk by Lee Hunt (then at Jupiter Resources) called Value thinking: from the classical to the hyper-modern. One case study in particular stuck with me—so much so that I ended up exploring it in a Jupyter Lab notebook, bringing it up in a job interview, and eventually testing whether an AI could reason through it on its own.

This post is about that journey. It’s also about what happens when you let yourself get genuinely curious about someone else’s problem. And—fair warning—it involves a 19th-century philosopher, a seven-well dataset, and a neural network that learned to distrust AVO attributes.

The problem

Jupiter Resources had a history of occasionally encountering drilling trouble in the Wilrich reservoir—specifically, loss of circulation when drilling into large systems of open fractures. Mud loss. The kind of problem that can cost you a well.

They had done extensive geophysical work with multiple seismic attributes that, in theory, should correlate with fractures: Curvature, Coherence, AVAz (amplitude variation with azimuth), VVAZ (velocity variation with azimuth), and Diffraction imaging. But they lacked direct calibration data for the drilling problem, and some of the attributes were giving conflicting results.

Lee Hunt, who led the team and the geophysical work, suspected from the start that the AVO-based attributes might be compromised. He had seen evidence as far back as 2014 that AVAz and VVAZ responses in the Wilrich were dominated by an overlying coal, not the fractures themselves—the attributes were measuring a different geological signal entirely. Diffraction imaging was planned early as a complementary measure, precisely because it might not be affected by the coals in the same way (personal communication).

Seven wells. Five attributes. Four of the wells had experienced drilling problems; three had not. Here’s the data:

The question: which attribute—or combination—could reliably predict drilling problems, so that future wells could be flagged ahead of time?

Mill’s Methods: 19th-century philosophy meets drilling risk

Rather than accept uncertainty and provide no geophysical guidance at all, the team at Jupiter tried something different: Mill’s Methods of Induction, a set of logical procedures for identifying causal relationships laid out by philosopher John Stuart Mill in 1843. They’re often illustrated with a food poisoning example (who ate what, who got sick), but they work just as well here. The goal was to find a pattern that could help advise the operations team: flag high-risk well locations ahead of time so contingency plans could be in place.

This approach was characteristic of Lee Hunt’s attitude toward quantitative geophysics—an attitude I had come to admire through his other work. A few years earlier, he had published a CSEG Recorder column called “Many correlation coefficients, null hypotheses, and high value,” a tutorial on statistics for geophysicists that included synthetic production data and an explicit invitation: “You can do it, too. Write in to tell us how.”

I took him up on it. I worked through his examples in Jupyter notebooks, built visualizations, explored prediction intervals, learned a good deal of scientific computing along the way. I reached out to him about the work. I even wrote up some of that exploration in a blog post on distance correlation and variable clustering—the kind of technical deep-dive where you’re learning as much about the tools as about the data. That extended engagement gave me a feel for his way of thinking: understand the statistics, accept the uncertainty, improve your techniques if you can—but don’t just throw up your hands when the data is messy.

Method of Agreement: Look at all the problem wells (A, B, F, G). What do they have in common? Curvature is TRUE for all four. So is Diffraction imaging. The other attributes vary.

Method of Difference: Compare problem wells to non-problem wells (C, D, E). Neither Curvature nor Diffraction alone perfectly discriminates—Well E has Curvature TRUE but no problem; Well D has Diffraction TRUE but no problem.

Joint Method: But here’s the key insight—Curvature AND Diffraction together form a perfect discriminator. Every well where both are TRUE had problems. Every well where at least one is FALSE did not.

This wasn’t a claim about causation. It was a decision rule: when the next well location shows both high curvature and diffraction anomalies, flag it as elevated risk and ensure contingency protocols are in place.
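For readers who like to see the rule written down, here is a minimal sketch of the Joint Method as a decision rule in Python. It uses only the Curvature and Diffraction columns described above; the values for wells D and E follow from the statements in the text, while well C’s values are an assumption (the text only says that at least one of its attributes is FALSE).

wells = {
    #     (Curvature, Diffraction, drilling problem)
    'A': (True,  True,  True),
    'B': (True,  True,  True),
    'C': (False, False, False),   # assumed: the text only says at least one attribute is FALSE
    'D': (False, True,  False),   # Diffraction TRUE, no problem, so Curvature must be FALSE
    'E': (True,  False, False),   # Curvature TRUE, no problem, so Diffraction must be FALSE
    'F': (True,  True,  True),
    'G': (True,  True,  True),
}

def flag_high_risk(curvature, diffraction):
    """Joint Method rule: flag elevated risk only when BOTH attributes are TRUE."""
    return curvature and diffraction

# the rule reproduces the outcome for every well in the table
assert all(flag_high_risk(c, d) == problem for c, d, problem in wells.values())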

The logic is sound because of asymmetric costs. Preparing for mud loss (having lost circulation material on site, adjusting mud weight plans) is a minor expense. Not preparing when you should have—that’s where you lose time, money, sometimes the well. You don’t need certainty to justify preparation. You need a defensible signal.

What a neural network learned

I wanted to see if a data-driven approach would arrive at the same answer. Looking at the table myself, and spending some time applying Mill’s Methods, I had already seen the pattern—Curvature and Diffraction together were the key predictors. But I was curious: what would a simple neural network learn on its own?

I trained a two-layer network (no hidden layer)—mathematically equivalent to logistic regression—on the same seven wells. (Yes, seven wells. I know. But stay with me.)

The network classified all seven wells correctly. But the real insight came from the weights it learned:

Attribute Weight
Curvature +14.6
Diffraction +9.7
Coherence ~0
AVAz −4.9
VVAZ −14.5

Curvature and Diffraction were strongly positive—predictive of problems. Coherence contributed almost nothing. But AVAz and VVAZ were negative—the network learned to suppress them.
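To make the “no hidden layer” point concrete, here is a minimal sketch of what such a model computes: a single weighted sum of the five boolean attributes passed through a sigmoid, which is exactly the form of logistic regression. The weights are the ones in the table above; the bias value is illustrative (I am not quoting the trained one).

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def problem_probability(x, w, b):
    """x: attributes as 0/1 (Curvature, Diffraction, Coherence, AVAz, VVAZ);
    w, b: weights and bias. Returns the predicted probability of a drilling problem."""
    return sigmoid(np.dot(w, x) + b)

w = np.array([14.6, 9.7, 0.0, -4.9, -14.5])   # weights from the table above
x = np.array([1, 1, 0, 0, 0], dtype=float)    # both Curvature and Diffraction TRUE
print(problem_probability(x, w, b=-12.0))     # b is illustrative; the output is close to 1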

A way to think about negative weights: imagine training a network to identify ducks from a set of photos that includes birds, ducks, and people in duck suits. The network will learn to weight “duck features” positively, but also to weight “human features” negatively—to avoid being fooled by the costumes. In the Wilrich case, the AVAz and VVAZ attributes were like duck suits: they looked like fracture indicators, but they were actually measuring something else.

This was interesting. All five attributes have theoretical justification for detecting fractures. Why would the network actively discount two of them?

When I mentioned this result to Lee Hunt, he confirmed what he had long suspected (personal communication): the AVAz and VVAZ responses in the Wilrich were dominated by an overlying coal, not the fractures themselves. He had measured this effect and documented it in a 2014 paper, where multiple attributes—including AVAz—showed statistically significant correlations to coal thickness rather than to reservoir properties. The neural network had learned, from just seven data points, to suppress exactly the attributes that Lee’s domain knowledge had already flagged as problematic.

This is Mill’s Method of Residues in action: if you know something else causes an observation, subtract it out. And it’s a reminder that domain knowledge and data-driven methods can converge on the same answer when both are applied honestly. I found this deeply satisfying.

What the AI got right—and what it missed

More recently, I revisited this problem using ChatGPT with the Wolfram plugin. I wanted to see if an AI, given just the table and a prompt about Mill’s Methods, could reason its way to the same conclusions.

It did—mechanically. It correctly identified Curvature and Diffraction as the consistent factors among problem wells. It noted that neither attribute alone was a perfect discriminator. It even offered to run logistic regression.

But it missed the interpretive leap. It hedged with phrases like “although there are exceptions” when in fact there were no exceptions to the conjunction rule. And it didn’t articulate the pragmatic framing: that the goal wasn’t to find the true cause, but to build a defensible decision rule under uncertainty.

That framing—the shift from epistemology to operations—required domain knowledge and judgment. The AI could apply Mill’s Methods. It couldn’t tell me why that application was useful here.

Drafting this post, I worked with a different AI—Claude—and found the collaboration more useful in a different way: not for solving the problem, but for reflection. Having to explain the context, the history, the why of my interest helped me articulate what I’d been carrying around in my head for years. Sometimes the value of a thinking partner isn’t in the answers, but in the questions that force you to be clearer.

Why this stuck with me

I’ll be honest: I kept thinking about this problem for years. It became part of a longer arc of engagement with Lee’s work—first the statistics tutorial, then the Wilrich case study, each building on the last.

When I interviewed for a geophysics position (Lee was retiring, and I was a candidate for his role), I mentioned this case study. I pulled out a pen and paper and wrote the entire seven-well table from memory. They seemed impressed—not because memorizing a table is hard, but because it signaled that I’d actually enjoyed thinking about it. That kind of retention only happens when curiosity is real.

I didn’t get the job. The other candidate had more operational experience, and that was the right call. But the process was energizing, and I’m sure that enthusiasm carried into my next opportunity, where I landed happily and stayed for over six years.

I tell this not to brag, but to make a point: intellectual play compounds. You don’t always see the payoff immediately. Sometimes you explore a problem just because it’s interesting—because someone like Lee writes “You can do it, too” and you decide to take him seriously—and it pays dividends in ways you didn’t expect.

The convergence

Three very different approaches—19th-century inductive logic, a simple neural network, and (later) an AI assistant—all pointed to the same answer: Curvature and Diffraction predict drilling problems in this dataset. The AVO attributes are noise, or worse, misleading.

When three methods converge, you can trust the signal. And you can make decisions accordingly.

That’s the real lesson here: rigorous reasoning under uncertainty isn’t about finding the One True Cause. It’s about building defensible heuristics, being honest about what you don’t know, and updating as new data comes in. Mill understood this in 1843. A neural network can learn it from seven wells. And sometimes, so can an AI—with a little help.

I hope you enjoyed this as much as I enjoyed putting it together.


The original case study was presented by Lee Hunt in his CSEG talk “Value thinking: from the classical to the hyper-modern.” The neural network analysis is in my Geoscience_ML_notebook_4. Lee documented the coal correlation issue in Hunt et al., “Precise 3D seismic steering and production rates in the Wilrich tight gas sands of West Central Alberta” (SEG Interpretation, May 2014), and later reflected on confirmation bias as an obstacle to recognizing such issues in “Useful Mistakes, Cognitive Biases and Seismic” (CSEG Recorder, April 2021). My thanks to Lee for the original inspiration, for confirming the geological context, and for sharing the original presentation materials.


  • Hunt, L., 2013, Many correlation coefficients, null hypotheses, and high value: CSEG Recorder, December 2013. Link
  • Hunt, L., S. Hadley, S. Reynolds, R. Gilbert, J. Rule, M. Kinzikeev, 2014, Precise 3D seismic steering and production rates in the Wilrich tight gas sands of West Central Alberta: SEG Interpretation, May 2014.
  • Hunt, L., 2021, Useful Mistakes, Cognitive Biases and Seismic: CSEG Recorder, April 2021.
  • My neural network analysis: Geoscience_ML_notebook_4
  • My earlier exploration of Lee’s production data: Data exploration in Python: distance correlation and variable clustering
  • ChatGPT + Wolfram session on Mill’s Methods: Gist

AI/HI Transparency Statement (modified from Brewin: https://www.theguardian.com/books/2024/apr/04/why-i-wrote-an-ai-transparency-statement-for-my-book-and-think-other-authors-should-too)

Has any text been generated using AI? Yes
Has any text been improved or corrected using HI? Yes

Additional context: This post emerged from a conversation with Claude AI (Anthropic). I provided the source materials (a ChatGPT + Wolfram session, a Jupyter notebook, personal history with the problem), direction, and editorial judgment throughout. Claude drafted the post based on these inputs and our discussion of structure, voice, and framing. I reviewed multiple drafts, revised as needed, rewrote some key sections, and made all final decisions about what went to publication. The core analysis—Mill’s Methods, the neural network, the interpretation—was done by me years before this collaboration; the AI’s role was in helping articulate and structure that work for a blog audience.

ChatGPT as an essay-writing assistant – Part III

Featured

The Challenge of a Satisfying Conclusion

When I published Part II of this series back in February 2025, I had a plan for Part III. Show the prompts I used, analyze the time investment, evaluate the result against Part I’s GPT-3.5 baseline, maybe try one more iteration with even newer tools. Straightforward. Methodical.

But I never finished it. To be honest, I lost interest. Another marginally better AI-generated essay wasn’t going to cut it—not for me, and probably not for you readers either. Another iteration showing GPT-4.5 writes slightly better than GPT-4? That’s predictable, uninspiring… so I dropped it.

But the unfinished series sat there in the back of my mind. I wasn’t actively working on it, but I also couldn’t quite let it go. It created a kind of block—I found myself not writing about anything at all, partly because this felt incomplete, partly because my interests had genuinely shifted elsewhere.

Recently though, I came back to this question. Not because I wanted to complete the series for completeness’ sake, but because I wanted to understand what would actually make Part III worthwhile.

So I asked for help. I brainstormed with Claude (Anthropic’s AI) about what Part III should actually be about—what would make it worth writing and worth reading. And something clicked.

What Was the Question Really Asking For?

Looking back now, with decades between me and that moment in Professoressa Carbone’s classroom, I think I understand what she was asking for. She wasn’t looking for recitation of Plato’s philosophy mechanically applied to medieval warfare. She wanted to see if I could reason using philosophical frameworks in unfamiliar territory. Synthesis, not facts. Thinking, not performing memorization.

At 15, I wasn’t ready for that. I had volunteered for the oral examination thinking I could rely on prepared material about Plato’s recent lessons. Instead, she cut through my preparation with a single question that required genuine philosophical thinking: “What would Plato have thought about the Hundred Years’ War?”

It was a brilliant pedagogical move. It required understanding Plato’s ideas deeply enough to apply them to a completely different context—a context Plato never encountered, in a historical period he never knew. It required the kind of intellectual flexibility and reasoning that, honestly, I didn’t have yet.

The humiliation I felt wasn’t really about not knowing facts. It was about being exposed as someone trying to get by on memorization rather than understanding. And I think she knew it. She saw through my bluff.

So What Would Satisfy?

This brings me back to the problem of Part III. Showing that AI can now generate a more sophisticated-sounding essay than my 15-year-old self could produce doesn’t prove anything interesting. AI is very good at generating sophisticated-sounding content. That’s almost the problem.

What would actually satisfy—both as closure for this series and as something worth your time reading—is demonstrating the kind of reasoning Professoressa Carbone was asking for. Can I, now, with the benefit of intellectual maturity and AI assistance, actually think through what Plato might have thought about prolonged warfare between nations? Not just string together plausible-sounding paragraphs with proper citations, but engage in genuine philosophical reasoning?

What Would That Actually Look Like?

If I were to actually write that essay—the one demonstrating real philosophical reasoning rather than AI-generated content—what would it need?

Looking back at the GPT-4 essay from Part II, it has proper citations and coherent structure, but it’s superficial. It lists Platonic concepts (philosopher-kings, guardians, ideal states) and applies them mechanically to medieval warfare. That’s exactly the kind of recitation Professoressa Carbone was testing me against.

Real reasoning would require:

  • Connecting Plato’s specific ideas to specific events or decisions during the Hundred Years’ War—not just general principles applied generally
  • Exploring how Plato’s concepts might actually illuminate something about prolonged conflict between nations that we wouldn’t see otherwise
  • Considering contemporary interpretations or modern applications—what do we learn about conflict, governance, or political philosophy from this exercise?
  • Drawing genuine insights about both Plato and warfare, not just restating both

That’s the essay I’d want to write someday. Not as an academic exercise, but as personal closure—proving to myself I can do the kind of thinking she was asking for.

Closure for Now

But that’s not this post. This post is about giving you, the readers, closure on this series. About acknowledging honestly what I learned about AI as a writing assistant, and why simple iteration wasn’t the answer.

Here’s what I’ve learned:

AI is excellent at generating plausible content. GPT-4 produced an essay that looks credible—proper structure, citations, coherent arguments. For many purposes, that’s enough.

But AI doesn’t reason, it recognizes patterns. The essay from Part II strings together familiar ideas in familiar ways. It’s sophisticated pattern matching, not thinking. It can’t do what Professoressa Carbone was asking for: genuine synthesis that produces new insight.

The real value of AI as a writing assistant isn’t in replacing thinking—it’s in supporting it. AI can help with research, organization, articulation. It can reduce cognitive load so you can focus on the hard part: the actual reasoning. But you still have to do the reasoning.

Writing with AI requires clarity about what you’re trying to accomplish. If you want content generation, AI does that well. If you want thinking support, you need to know what thinking you’re trying to do. The tool can’t figure that out for you.

This series started with a simple question: can AI help me write an essay? The answer turned out to be more nuanced than I expected. It depends entirely on what kind of essay, and what role you want AI to play. For the essay I’d need to write to truly answer Professoressa Carbone’s question—the one that demonstrates reasoning rather than recitation—AI could help, but it couldn’t do the essential work.

Maybe someday I’ll write that essay. For now, I’m moving on to other projects where I’m excited about what AI can do: document extraction in geoscience, agentic workflows, problems where AI’s strengths align better with what I’m trying to accomplish.

Thank you for following this journey with me. Even if it didn’t end where I originally planned, I learned something worth sharing.

A Final Thought: Rigor Without Brutality

I started this series partly because of concerns about AI in education—concerns rooted in my own experience.

ChatGPT has educators calling for more in-class writing and oral examinations. I agree we need assessment that can’t be faked by AI. But I’m deeply opposed to the brutality that often came with those older systems.

Here’s the thing: the brutality was never necessary for the educational value. Professoressa Carbone’s question was pedagogically brilliant. The public humiliation didn’t make it more effective; it just made it traumatic.

We need assessment methods that demand genuine reasoning, in environments that support both students and teachers. It’s possible to have rigorous evaluation without breaking people in the process.

AI forces us to confront what we actually value in education: not the appearance of learning, but the development of genuine understanding and reasoning. The question is whether we can build systems that nurture that without the cruelty.

AI/HI Transparency Statement (modified from Brewin: https://www.theguardian.com/books/2024/apr/04/why-i-wrote-an-ai-transparency-statement-for-my-book-and-think-other-authors-should-too)

Has any text been generated using AI? Yes
Has any text been improved or corrected using HI? Yes

Additional context: This post was collaboratively written through an iterative conversation with Claude (Anthropic). The human author provided the direction, constraints, personal context, and decisions about what to include/exclude. The AI assistant drafted text, which was then reviewed and revised based on feedback. Sections were rewritten multiple times to match the author’s voice and intentions. The final editorial decisions, including what content made it to publication, were made by the human author.

10-year anniversary

Featured

July 15th, 2022

Dear readers:

I started writing this blog 10 years ago, and it is to date one of the things I am most proud of.

This is a big milestone for me, so I would like to begin by thanking all the people who encouraged me at the beginning, and in particular, for their valiant support and feedback: Matt Hall, Evan Bianco, Oliver Kuhn, Steve Lynch, and last but not least my life partner Rebecca.

A lot of the work I did in the first couple of years was about furthering my understanding of the use of colours in scientific visualization, and sharing it with others, in particular how to use better colormaps, for example the Perceptual rainbow palette, part a and part b.

I am grateful I achieved those knowledge-sharing goals:

  • The material is referenced in the matplotlib documentation
  • The blog has often been used as a reference in talks and other publications on colormaps, beginning with this classic matplotlib talk given by Kristen Thyng at SciPy 2014
  • I was thrilled to have received positive feedback on my work from Bernice Rogowitz, someone I hold in very high esteem
  • Some of that work on the blog resulted in being invited by Matt Hall to write a tutorial for The Leading Edge (SEG). The tutorial came with a Jupyter notebook demonstrating how to evaluate default colour maps and favour more perceptual alternatives
  • I am particularly proud to see that the article is still ranking in the top 20 most downloaded papers from The Leading Edge (between 2010-2020)
  • Additionally, the two blog posts are to date top results for the Google searches “perceptual rainbow” and “perceptual palette images”

Ultimately, I am very happy to have created a space for sharing and exchanging ideas freely.

So, to celebrate these 10 years of MyCarta, I treated it to a new domain, mycartablog.com (but the old domain, and links still work) and a brand new look (it took me a while to get there but I like it a lot) with a theme that should now be responsive for all devices (welcome to the new era Matteo!).

I will also soon publish a short series of sweet new posts on colormaps and visualization (and republish them on LinkedIn).

Thank you all for sharing this journey!

Be a geoscience and data science detective

Featured

September 16, 2020

Introduction

These days everyone talks about data science. But here’s a question: if you are a geoscientist, and like me you have some interest in data science (that is, doing more quantitative and statistical analyses with data), why choose between the two? Do both… always! Your domain knowledge is indispensable, and so is an attitude of active curiosity (for me, an even more important ingredient). So, my advice for aspiring geoscientists and data scientists is: “yes, do some course work, read some articles, maybe a book, get the basics of Python or R if you do not know a programming language”, but then jump right into doing! But – please! – skip the Titanic, Iris, and Cars datasets, or any other data from your MOOC, or that many have already looked at. Get some data that is interesting to you as a geoscientist, or put it together yourself.

Today I will walk you through a couple of examples; the first one I presented as a lightning talk at the Transform 2020 virtual conference organized by Software Underground. This project had in fact begun at the 2018 Geophysics Sprint (organized by Agile Scientific ahead of the Annual SEG conference), culminating with this error flag demo notebook. Later on, inspired by watching the Stanford University webinar How to be a statistical detective, I decided to resume it, with a focus on a more rigorous approach. And that leads to my last bit of advice before getting into the details of the statistical analysis: keep improving your geo-computing projects (if you want more advice on this aspect, take a look at my chapter in the upcoming Software Underground book, 52 things you should know about geocomputing): it will help you showcase not only your skills, but also your passion and motivation for deeper and broader understanding.

First example: evaluate the quality of seismic inversion from a published article

In this first example, the one presented at Transform 2020, I wanted to evaluate the quality of seismic inversion from a published article in the November 2009 CSEG Recorder, Inversion Driven Processing. In the article, the evaluation was done by the authors at a blind well location, but only qualitatively, as illustrated in Figure 5, shown below for reference. In the top panel (a) the evaluation is for the inversion without additional processing (SPNA, Signal Protected Noise Attenuation); in the bottom panel (b) it is for the inversion with SPNA. On the right side of each panel the inverted seismic trace is plotted against the upscaled impedance well log (calculated by multiplying the well density log by the well velocity log from the compressional sonic); on the left, the upscaled impedance log is inserted in a seismic impedance section as a colored trace (at the location of the well), using the same color scale and range used for the impedance section.

Figure 5 caption: Acoustic impedance results at the blind well for data without (a) and with (b) SPNA. The figure shows a 200 ms window of inverted seismic data with well B, the blind well, in the middle on the left, along with acoustic impedance curves for the well (red) and inverted seismic (blue) on the right. The data with SPNA shows a better fit to the well, particularly over the low frequencies.

What the authors reported in the figure caption is the extent to which the evaluation was discussed in the paper; unfortunately, it is not backed up in any quantitative way, for example by comparing a score, such as R^2, for the two methods. Please notice that I am not picking on this paper in particular, which in fact I rather liked, but I am critical of the lack of supporting statistics, and wanted to supplement the paper with my own.

In order to do that, I hand-digitized the logs and inversion traces from the figure above, then interpolated them to regularly-sampled time intervals (by the way: if you are interested in a free tool to digitize plots, check out WebPlotDigitizer).

My plan was to split my evaluation in an upper and lower zone, but rather than using the seismically-picked horizon, I decided to add a fake top at 1.715 seconds, where I see a sharp increase in impedance in Figure 5. This was an arbitrary choice on my part of a more geological horizon, separating the yellow-red band from the green blue band in the impedance sections. The figure below shows all the data in a Matplotlib figure:

The first thing I did, then, was to look at the Root Mean Square Error (RMSE) in the upper and lower zones obtained using the fake top. They are summarized in the table below:

Based on the RMSE, it looks like case b, the inversion with Signal Protected Noise Attenuation applied to the data, is the better result for the Upper zone, but not for the Lower one. This result is in agreement with my visual comparison of the two methods.
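For completeness, here is a minimal sketch of the zone-by-zone RMSE comparison. It assumes regularly-sampled arrays twt (two-way time, in seconds), well_imp (the upscaled impedance log), and inv_a, inv_b (the digitized inversion traces without and with SPNA); these names are placeholders, not the ones in my notebook.

import numpy as np

def rmse(predicted, actual):
    return np.sqrt(np.mean((np.asarray(predicted) - np.asarray(actual)) ** 2))

top = 1.715                                   # the 'fake top', in seconds
upper, lower = twt <= top, twt > top          # boolean masks for the two zones

for name, inv in [('without SPNA (a)', inv_a), ('with SPNA (b)', inv_b)]:
    print(name,
          'Upper RMSE:', rmse(inv[upper], well_imp[upper]),
          'Lower RMSE:', rmse(inv[lower], well_imp[lower]))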

But let’s dig a bit deeper. After looking at the RMSE, I used the updated version of the error_flag function, which I first wrote at the 2018 Geophysics Sprint, listed below:
 
import numpy as np
import pandas as pd
import scipy as sp
import scipy.stats  # makes sp.stats available

def error_flag(pred, actual, stat, dev=1.0, method=1):
    """Calculate the difference between a predicted and an actual curve
    and return a curve flagging large differences based on a user-defined distance
    (in deviation units) from either the mean difference or the median difference.

    Matteo Niccoli, October 2018. Updated in May 2020.

    Parameters:
        pred : array
            predicted log array
        actual : array
            original log array
        stat : {'mean', 'median'}
            The statistic to use. The following options are available:
                - mean: uses numpy.mean for the statistic,
                  and np.std for dev calculation
                - median: uses numpy.median for the statistic,
                  and scipy.stats.median_absolute_deviation (MAD) for dev calculation
        dev : float, optional
            the number of deviations to use. The default is 1.0
        method : int {1, 2, 3}, optional
            The error method to use. The following options are available
            (default is 1):
                1: difference between curves larger than mean difference plus dev
                2: curve slopes have opposite sign (after a 3-sample window smoothing)
                3: curve slopes of opposite sign OR difference larger than mean plus dev
    Returns:
        flag : array
            The error flag array
    """
    flag = np.zeros(len(pred))
    err = np.abs(pred - actual)

    if stat == 'mean':
        err_stat = np.mean(err)
        err_dev = np.std(err)
    elif stat == 'median':
        err_stat = np.median(err)
        err_dev = sp.stats.median_absolute_deviation(err)

    # 3-sample moving-window smoothing before taking the slope sign
    pred_sm = pd.Series(np.convolve(pred, np.ones(3), 'same'))
    actual_sm = pd.Series(np.convolve(actual, np.ones(3), 'same'))
    ss = np.sign(pred_sm.diff().fillna(pred_sm))
    ls = np.sign(actual_sm.diff().fillna(actual_sm))

    if method == 1:
        flag[np.where(err > (err_stat + (dev * err_dev)))] = 1
    elif method == 2:
        flag[np.where((ss + ls) == 0)] = 1
    elif method == 3:
        flag[np.where(np.logical_or(err > (err_stat + (dev * err_dev)), (ss + ls) == 0))] = 1
    return flag

I believe this new version is greatly improved because:

  • Users now can choose between mean/standard deviation and median/median absolute deviation as a statistic for the error. The latter is more robust in the presence of outliers
  • I added a convolutional smoother prior to the slope calculation, so as to make it less sensitive to noisy samples
  • I expanded and improved the docstring

The figure below uses the flag returned by the function to highlight areas of poorer inversion results, which I identified by passing very restrictive parameters to the function (a usage sketch follows this list):

  • using the median statistic and a deviation of only 0.5 (in MAD units) to trigger the flag
  • combining the above with checking for the slope sign
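Here is a minimal usage sketch with those restrictive parameters, reusing the placeholder array names from the RMSE sketch above, and also counting the flagged samples:

flag_a = error_flag(inv_a, well_imp, stat='median', dev=0.5, method=3)
flag_b = error_flag(inv_b, well_imp, stat='median', dev=0.5, method=3)

# number and percentage of flagged samples for each result
for name, flag in [('without SPNA (a)', flag_a), ('with SPNA (b)', flag_b)]:
    print(name, int(flag.sum()), 'flagged samples,', round(100 * flag.mean(), 1), '%')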
 

I also wrote short routines to count the number and percentage of samples that have been flagged, for each result, which are summarized in the table below:

The error flag method is in agreement with the RMSE result: case b, the inversion on Signal Protected Noise Attenuated data, is the better result for the Upper zone, but for the Lower zone the inversion without SPNA is the better one. Very cool!

But I was not satisfied yet. I was inspired to probe even deeper after a number of conversations with my friend Thomas Speidel, and after reading the chapter on Estimation from Computational and Inferential Thinking (UC Berkeley). Specifically, I was left with this question in mind: “Can we be equally confident in those two ‘better’ results?”

This question can be answered with a bootstrapped Confidence Interval for the proportions of flagged samples, which I do below using the code in the book, with some modifications and some other tools from the datascience library. The results are shown below. The two plots, one for the Upper and one for the Lower zone, show the distribution of bootstrap flagged proportions for the two inversion results, with SPNA in yellow, and without SPNA in blue, respectively, and the Confidence Intervals in cyan and brown, respectively (the CI upper and lower bounds are also added to the table, for convenience).
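(The notebook follows the book’s code and uses the datascience library; the sketch below does the same thing with plain NumPy, assuming flags_a and flags_b are the 0/1 error-flag arrays for one zone, without and with SPNA respectively; the names are placeholders.)

import numpy as np

rng = np.random.default_rng(0)

def bootstrap_proportion(flags, n_boot=5000, ci=95):
    """Bootstrap the proportion of flagged samples and return the draws and percentile CI."""
    flags = np.asarray(flags)
    props = np.array([rng.choice(flags, size=len(flags), replace=True).mean()
                      for _ in range(n_boot)])
    lo, hi = np.percentile(props, [(100 - ci) / 2, 100 - (100 - ci) / 2])
    return props, (lo, hi)

props_a, ci_a = bootstrap_proportion(flags_a)
props_b, ci_b = bootstrap_proportion(flags_b)
print('without SPNA CI:', ci_a)
print('with SPNA CI:   ', ci_b)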

By comparing the amount (or paucity) of overlap between the distributions (and between the confidence intervals) in the two plots, I believe I can be more confident in the conclusion drawn for the Lower zone, which is that the inversion on data without SPNA is better (less proportion of flagged errors), as there is far less overlap.

I am very pleased with these results. Of course, there are some caveats to keep in mind, mainly that:
  • I may have introduced small, but perhaps significant errors with hand digitizing
  • I chose a statistical measure (the median and median absolute deviation) over a number of possible ones
  • I chose an arbitrary geologic reference horizon without a better understanding of the reservoir and the data, just the knowledge from the figure in the paper

However, I am satisfied with this work, because it points to a methodology that I can use in the future. And I am happy to share it! The Jupyter notebook is available on GitHub.

Second example: evaluate regression results from a published article

My second, more recent example, is shorter but no less interesting, in my opinion. It is a case study correlating inversion data to petrophysical estimates of porosity-height in the Doig and Montney Formations in Western Canada (from the paper Tight gas geophysics: AVO inversion for reservoir characterization, Close et al. CSEG Recorder, May 2010, an article which I enjoyed very much reading).

The authors indicated that Vp/Vs and/or Poisson’s ratio maps from seismic inversion are good indicators of porosity in the Lower Doig and Upper Montney reservoirs in the wells used in their study, so it was reasonable to try to predict Phi-H from Vp/Vs via regression. The figure below, from the article, shows one such Vp/Vs ratio map and the Vp/Vs vs. Phi-H cross-plot for 8 wells.

Figure 7 caption: Figure 7. a) Map of median Vp/Vs ratio value and porosity-height from 8 wells through the Lower Doig and Upper Montney. The red arrows highlight wells with very small porosity-height values and correspond in general to areas of higher Vp/Vs ratio. The blue arrow highlights a well at the edge of the seismic data where the inversion is adversely affected by decreased fold. The yellow line is the approximate location of a horizontal well where micro-seismic data were recorded during stimulation. b) Cross-plot of porosity-height values as shown in (a) against Vp/Vs extracted from the map. The correlation co-efficient of all data points (blue line) of 0.7 is improved to 0.9 (red line) by removing the data point marked by the blue arrow which again corresponds to the well near the edge of the survey (a).

They also show in the figure that by removing one of the data points, corresponding to a well near the edge of the survey (where the seismic inversion result is presumably not as reliable, due to lower offset and azimuth coverage), the correlation coefficient is improved from 0.7 to 0.9 (red line).

So, the first thing I set out to do was to reproduce the crossplot. I hand-digitized the porosity-height and Vp/Vs pairs in the cross-plot, again using WebPlotDigitizer. However, I switched around the axes, which seems more natural to me, since the objective of a regression effort would be to predict Phi-h (the dependent variable) at the location of yet-to-be-drilled wells, given Vp/Vs from seismic inversion. I also labelled the wells using their row index, after having loaded them in a Pandas DataFrame. And I used Ordinary Least Squares regression from the statsmodels library twice: once with all data points, and a second time after removal of the well labelled as 5 in my plot above.
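As a minimal sketch of those two fits (the DataFrame and column names here, df, 'VpVs' and 'PhiH', are placeholders, not necessarily the ones in my notebook):

import statsmodels.api as sm

def fit_ols(frame):
    X = sm.add_constant(frame['VpVs'])          # intercept + slope
    return sm.OLS(frame['PhiH'], X).fit()

res_all = fit_ols(df)                           # all digitized data points
res_trim = fit_ols(df.drop(index=5))            # after removing the well labelled as 5
print(res_all.rsquared, res_trim.rsquared)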

So, I was able to reproduce the analysis from the figure. I think removing an outlier with insight from domain knowledge (the observation that a poorer inversion result at this location reasonably explains the deviation from the trend) is a legitimate choice. However, I would like to dig a bit deeper, to back up the decision with other analyses and tests, and to show how one might do it with their own data.

The first thing to look at is an Influence plot, which is a plot of the residuals, scaled by their standard deviation, against the leverage, for each observation. Influence plots are useful to distinguish high-leverage observations from outliers, and they are one of statsmodels’ standard Regression plots, so we get the next figure almost for free (with minor modifications to the default example). Here it is below, together with the OLS regression result, with all data points.
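A minimal sketch of that plot, assuming the res_all fit from the sketch above:

import matplotlib.pyplot as plt
import statsmodels.api as sm

fig, ax = plt.subplots(figsize=(8, 6))
sm.graphics.influence_plot(res_all, criterion='cooks', ax=ax)   # scaled residuals vs. leverage
plt.show()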

From the Influence plot it is very obvious that the point labelled as zero has high leverage (but not a high normalized residual). This is not a concern, because points with high leverage are important but do not alter the regression model much. On the other hand, the point labelled as 5 has a very high normalized residual. This point is an outlier, and it will influence the regression line, reducing the R^2 and correlation coefficient. This analysis is a robust way to legitimize removing that data point.

Next I run some inferential tests. As I’ve written in an earlier notebook on Data loading, visualization, significance testing, I find the critical r very useful in the context of bivariate analysis. The critical r is the value of the correlation coefficient at which you can rule out chance as an explanation for the relationship between variables observed in the sample, and I look at it in combination with the confidence interval of the correlation coefficient.

The two plots below display, in dark blue and light blue respectively, the upper and lower confidence interval bounds as the correlation coefficient r varies between 0 and 1 (x axis). These two bounds change with the number of wells (they get closer with more wells, and farther apart with fewer wells). The lower bound intersects the x axis (y=0) at a value equal to the critical r (white dot). The green dots highlight the actual confidence interval for a specific correlation coefficient, in this case 0.73 with 9 wells, and 0.9 with 8 wells.

By the way: these plots are screen captures from the interactive tool I built taking advantage of the Jupyter interactive functionality (ipywidgets). You can try the tool by running the Jupyter notebook.

With 9 wells and cc=0.73, the resulting critical r = 0.67 tells us that, for a 95% confidence level (0.05 alpha), we need at least a correlation coefficient of 0.67 in the sample (the 9 wells drilled) to be able to confidently say that there is correlation in the population (e.g. any well, future wells). However, the confidence interval is quite broad, ranging between 0.13 and 0.94 (you can get these numbers by running confInt(0.73, 9) in a cell).

With 8 wells (having removed the outlier) and cc=0.9, the critical r is now 0.71, meaning that the requirement for rejecting the null hypothesis (there is no association between Vp/Vs and Phi-H) is now a bit higher. However, a cc increased to 0.9, with only one less well, also results in a confidence interval ranging from 0.53 to 0.98, hence our confidence is greatly increased.
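The numbers above can be reproduced with standard formulas: the critical r from the t distribution, and the confidence interval for r via the Fisher z-transform. The sketch below is my own re-implementation, not the confInt tool from the notebook.

import numpy as np
from scipy import stats

def critical_r(n, alpha=0.05):
    """Smallest |r| significant at the given alpha (two-tailed) for n observations."""
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
    return t_crit / np.sqrt(t_crit ** 2 + n - 2)

def conf_int_r(r, n, conf=0.95):
    """Confidence interval for a correlation coefficient via the Fisher z-transform."""
    z, se = np.arctanh(r), 1 / np.sqrt(n - 3)
    z_crit = stats.norm.ppf(1 - (1 - conf) / 2)
    return np.tanh(z - z_crit * se), np.tanh(z + z_crit * se)

print(critical_r(9), conf_int_r(0.73, 9))   # approx. 0.67, (0.13, 0.94)
print(critical_r(8), conf_int_r(0.90, 8))   # approx. 0.71, (0.53, 0.98)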

This second analysis also corroborates the choice of removing the outlier data point. One thing worth mentioning before moving on to the next test is that these confidence interval bounds are the expected population ones, based on the sample correlation coefficient and the number of observations; the data itself was not used. Of course, I could also have calculated, and shown you, the OLS regression confidence intervals, as I have done in this notebook (with a different dataset).

My final test involved using the distance correlation (dcor.distance_correlation) and p-value (dcor.independence.distance_covariance_test) from the dcor library. I have written before in a blog post how much I like the distance correlation, because it does not assume a linear relationship between variables, and because a distance correlation of zero does mean that there is no dependence between those two variables (contrary to Pearson and Spearman). In this case the relationship seems to be linear, but it is still a valuable test to compare DC and p-value before and after removing the outlier. Below is a summary of the test:
All data points:
D.C. =  0.745
p-value =  0.03939
Data without outlier:
D.C. =  0.917
p-value =  0.0012
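For reference, here is a minimal sketch of the two calls that produce those numbers, again assuming a DataFrame df with placeholder 'VpVs' and 'PhiH' columns; the p-values come from resampling, so they will vary slightly from run to run.

import dcor

def dc_and_pvalue(x, y, nruns=2000):
    dc = dcor.distance_correlation(x, y)
    pv = dcor.independence.distance_covariance_test(x, y, exponent=1.0, num_resamples=nruns)[0]
    return dc, pv

trimmed = df.drop(index=5)                       # without the outlier well
print(dc_and_pvalue(df['VpVs'], df['PhiH']))     # all data points
print(dc_and_pvalue(trimmed['VpVs'], trimmed['PhiH']))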

The distance correlation values are very similar to the correlation coefficients from OLS, again going up once the outlier is removed. But, more interestingly, with all data points the p-value of about 0.04 is very close to the alpha of 0.05, whereas once we remove the outlier the p-value goes down by more than an order of magnitude, to about 0.001. This again backs up the decision to remove the outlier data point.

Data exploration in Python: distance correlation and variable clustering

Featured

April 10, 2019

In my last post I wrote about visual data exploration with a focus on correlation, confidence, and spuriousness. As a reminder to aficionados, but mostly for new readers’ benefit: I am using a very small toy dataset (only 21 observations) from the paper Many correlation coefficients, null hypotheses, and high value (Hunt, 2013).

The dependent/target variable is oil production (measured in tens of barrels of oil per day) from a marine barrier sand. The independent variables are: Gross pay, in meters; Phi-h, porosity multiplied by thickness, with a 3% porosity cut-off; Position within the reservoir (a ranked variable, with 1.0 representing the uppermost geological facies, 2.0 the middle one, 3.0 the lowest one); Pressure draw-down in MPa. Three additional ‘special’ variables are: Random 1 and Random 2, which are range bound and random, and were included in the paper, and Gross pay transform, which I created specifically for this exercise to be highly correlated to Gross pay, by passing Gross pay to a logarithmic function, and then adding a bit of normally distributed random noise.
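As a minimal sketch of how such a variable can be built from the loaded DataFrame (the noise scale of 0.1 here is illustrative, not the value I actually used):

import numpy as np

rng = np.random.default_rng(42)
data['Gross pay transform'] = np.log(data['Gross pay']) + rng.normal(0, 0.1, len(data))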

Correlation matrix with ellipses

I am very pleased with having been able to put together, by the end of it, a good looking scatter matrix that incorporated:

  • bivariate scatter-plots in the upper triangle, annotated with rank correlation coefficient, confidence interval, and probability of spurious correlation
  • contours in the lower triangle
  • shape of the bivariate distributions (KDE) on the diagonal

In a comment to the post, Matt Hall got me thinking about other ways to visualize the correlation coefficient. I did not end up using a colourmap for the facecolour of the plots (this would probably have been relatively easy, although in an earlier attempt using hex-bin plots, scaling the colourmap of each plot independently, to account for outliers, proved challenging). But after some digging I found the Biokit library, which comes with a lot of useful visualizations, among which corrplot is exactly what I was looking for. With only a bit of tinkering I was able to produce, shown in Figure 1, a correlation matrix with:

  • correlation coefficient in upper triangle (colour and intensity indicate whether positive or negative correlation, and its strength, respectively)
  • bivariate ellipses in the lower triangle (ellipse direction and colour indicates whether positive or negative correlation; ellipticity and colour intensity are proportional to the correlation coefficient)

Figure 1. Correlation matrix using the Biokit library

Also notice that – quite conveniently – the correlation matrix of Figure 1 is reordered with strongly correlated variables adjacent to one another, which facilitates interpretation. This is done using the rank correlation coefficient, with pandas.DataFrame.corr, and Biokit’s corrplot:

from biokit.viz import corrplot  # Biokit's correlation plot
import matplotlib.pyplot as plt

corr = data.corr(method='spearman')
c = corrplot.Corrplot(corr)
c.plot(method='ellipse', cmap='PRGn_r', shrink=1, rotation=45, upper='text', lower='ellipse')
fig = plt.gcf()
fig.set_size_inches(10, 8);

The insightful take-away is that with this reordering, the more ‘interesting’ variables (interesting because of strong correlation, as defined in this case by the rank correlation coefficient) are close together and repositioned along the diagonal, so we can immediately appreciate that Production, Phi-h, and Gross pay, plus to a lesser extent Position (albeit this one with negative correlation to Production), are related to one another. This is a great intuition, and supports our hypothesis (in an inferential test), backed by physical expectation, that production should be related to those other quantities.

But I think it is time to move away from either Pearson or Spearman correlation coefficient.

Correlation matrix with distance correlation and its p-value

I learned about distance correlation from Thomas when we were starting to work on our 2018 CSEG/CASP Geoconvention talk, “Data science tools for petroleum exploration and production”. What I immediately liked about distance correlation is that it does not assume a linear relationship between variables, and even more importantly, whereas with Pearson and Spearman a correlation value of zero does not prove independence between any two variables, a distance correlation of zero does mean that there is no dependence between those two variables.

Working on the R side, Thomas used the energy package for inferential statistics. With the dcor function he calculated the distance correlation, and with the dcov.test function the p-value, via bootstrapping.

For Python, I used dcor.distance_correlation and dcor.independence.distance_covariance_test from the dcor library (with many thanks to Carlos Ramos Carreño, author of the Python library, who was kind enough to point me to the table of energy-dcor equivalents). So, for example, for one variable pair, we can do this:

print ("distance correlation = {:.2f}".format(dcor.distance_correlation(data['Production'], 
                                                                        data['Gross pay'])))
print("p-value = {:.7f}".format(dcor.independence.distance_covariance_test(data['Production'], 
                                                                           data['Gross pay'], 
                                                                           exponent=1.0, 
                                                                           num_resamples=2000)[0]))
>>> distance correlation  = 0.91
>>> p-value = 0.0004998

So, wanting to apply these tests in a pairwise fashion to all variables, I modified the dist_corr function and the corrfunc function from the existing notebook:

import dcor
import matplotlib.pyplot as plt

def dist_corr(X, Y, pval=True, nruns=2000):
    """Distance correlation with p-value from bootstrapping"""
    dc = dcor.distance_correlation(X, Y)
    pv = dcor.independence.distance_covariance_test(X, Y, exponent=1.0, num_resamples=nruns)[0]
    if pval:
        return (dc, pv)
    else:
        return dc

def corrfunc(x, y, **kws):
    """Annotate a pairgrid cell with the distance correlation, coloured by p-value"""
    d, p = dist_corr(x, y)
    # print("{:.4f}".format(d), "{:.4f}".format(p))
    if p > 0.1:
        pclr = 'Darkgray'
    else:
        pclr = 'Darkblue'
    ax = plt.gca()
    ax.annotate("DC = {:.2f}".format(d), xy=(.1, 0.99), xycoords=ax.transAxes, color=pclr, fontsize=14)

so that I can then just do:

g = sns.PairGrid(data, diag_sharey=False)
axes = g.axes
g.map_upper(plt.scatter, linewidths=1, edgecolor="w", s=90, alpha = 0.5)
g.map_upper(corrfunc)
g.map_diag(sns.kdeplot, lw = 4, legend=False)
g.map_lower(sns.kdeplot, cmap="Blues_d")
plt.show();

Figure 2. Revised Seaborn pairgrid matrix with distance correlation colored by p-value (gray if p > 0.10, blue if p <= 0.10)

Clustering using distance correlation

I really like the result in Figure 2.  However, I want to have more control on how the pairwise plots are arranged; a bit like in Figure 1, but using my metric of choice, which would be again the distance correlation. To do that, I will first show how to create a square matrix of distance correlation values, then I will look at clustering of the variables; but rather than passing the raw data to the algorithm, I will pass the distance correlation matrix. Get ready for a ride!

For the first part, making the square matrix of distance correlation values, I adapted the code from this brilliant SO answer on Euclidean distance (I recommend you read the whole answer):

# Create the distance method using distance_correlation
distcorr = lambda column1, column2: dcor.distance_correlation(column1, column2) 
# Apply the distance method pairwise to every column
rslt = data.apply(lambda col1: data.apply(lambda col2: distcorr(col1, col2)))
# check output
pd.options.display.float_format = '{:,.2f}'.format
rslt

Table I. Distance correlation matrix.

The matrix in Table I looks like what I wanted, but let’s calculate a couple of values directly, to be sure:

print ("distance correlation = {:.2f}".format(dcor.distance_correlation(data['Production'], 
data['Phi-h'])))
print ("distance correlation = {:.2f}".format(dcor.distance_correlation(data['Production'], 
data['Position'])))
>>> distance correlation = 0.88
>>> distance correlation = 0.45

Excellent, it checks out!

Now I am going to take a bit of a detour, and use that matrix, rather than the raw data, to cluster the variables, and then display the result with a heat-map and accompanying dendrograms. That can be done with Biokit’s heatmap:

from biokit.viz import heatmap  # Biokit's clustered heatmap

data.rename(index=str, columns={"Gross pay transform": "Gross pay tr"}, inplace=True)
distcorr = lambda column1, column2: dcor.distance_correlation(column1, column2)
rslt = data.apply(lambda col1: data.apply(lambda col2: distcorr(col1, col2)))
h = heatmap.Heatmap(rslt)
h.plot(vmin=0.0, vmax=1.1, cmap='cubehelix')
fig = plt.gcf()
fig.set_size_inches(22, 18)
plt.gcf().get_axes()[1].invert_xaxis();

Figure 3. Biokit heatmap with dendrograms, using correlation distance matrix

That’s very nice, but please notice how much ‘massaging’ it took: first, I had to flip the axis of the dendrogram for the rows (on the left) because it would be incorrectly reversed by default; and then, I had to shorten the name of Gross pay transform so that its row label would not end up under the colorbar (also, the diagonal is flipped upside down, and I could not reverse it or else the column labels would go under the top dendrogram). I suppose the latter too could be done on the Matplotlib side, but why bother when we can get it all done much more easily with Seaborn? Plus, you will see below that I actually have to… but I’m putting the cart before the horse… here’s the Seaborn code:

g = sns.clustermap(rslt, cmap="mako",  standard_scale =1)
fig = plt.gcf()
fig.set_size_inches(18, 18);

and the result in Figure 4. We really get everything for free!

Figure 4. Seaborn clustermap, using correlation distance matrix

Before moving to the final fireworks, a bit of interpretation: again what comes up is that Production, Phi-h, Gross pay, and Gross pay transform group together, as in Figure 1, but now the observation is based on a much more robust metric. Position is not as ‘close’: it ends up in a different cluster, although its distance correlation from Production is 0.45, and the p-value is <0.10, hence it is still a relevant variable.

I think this is as far as I would go with interpretation. It does also show me that Gross pay and Gross pay transform are very much related to one another, with high dependence, but it still does not tell me, in the context of variable selection for predicting Production, which one I should drop: I only know it should be Gross pay transform because I created it myself. For proper variable selection I will look at techniques like the Least Absolute Shrinkage and Selection Operator (LASSO, which Thomas has showcased in his R notebook) and Recursive Feature Elimination (I’ll be testing Sebastian Raschka‘s Sequential Feature Selector from the mlxtend library).

Correlation matrix with distance correlation, p-value, and plots rearranged by clustering

I started this whole exercise by saying I wanted to control how the pairwise plots were arranged in the scatter matrix, and that to do so required the use of Seaborn. Indeed, it turns out the reordered row/column indices (in our case they are the same) can be easily accessed:

a = (g.dendrogram_col.reordered_ind)
print(a)
>>> [1, 6, 2, 7, 0, 5, 3, 4]

and if this is the order of variables in the original DataFrame:

b = list(data)
print (b)
>>> ['Position', 'Gross pay', 'Phi-h', 'Pressure', 'Random 1', 'Random 2', 'Gross pay transform', 'Production']

we can rearrange them with those reordered column indices:

data = data[[b[i] for i in a]]
print(list(data))
>>> ['Gross pay', 'Gross pay transform', 'Phi-h', 'Production', 'Position', 'Random 2', 'Pressure', 'Random 1']

after which we can simply re-run the code below!!!

g = sns.PairGrid(data, diag_sharey=False)
axes = g.axes
g.map_upper(plt.scatter, linewidths=1, edgecolor="w", s=90, alpha = 0.5)
g.map_upper(corrfunc)
g.map_diag(sns.kdeplot, lw = 4, legend=False)
g.map_lower(sns.kdeplot, cmap="Blues_d")
plt.show()

Figure 5. Revised Seaborn pairgrid matrix with distance correlation colored by p-value (gray if p > 0.10, blue if p <= 0.10), and plots rearranged by clustering results

UPDATE: I just uploaded the notebook on GitHub.

ChatGPT as an essay-writing assistant – Part II

The blog post below was produced entirely by GPT-4.0, following a series of iterative prompts I provided, from the Introduction to the References and Footnotes included. Please refer to my AI (Artificial Intelligence) and HI (Human Intelligence) table in the last section. In the next post, I will include the full listing of my prompts, a text evaluation, and a time-investment-versus-return analysis. I will also attempt prompting for some further improvements beyond this initial analysis.

Plato’s Perspective on the Hundred Years’ War

Introduction

The Hundred Years’ War, a prolonged and devastating conflict between England and France (1337-1453), was a historical event unknown to the ancient Greek philosopher Plato. Nonetheless, by examining his philosophical ideas, primarily in “The Republic” and “Laws,” we may speculate on his potential views regarding such a conflict.

The Philosopher King and the Ideal State

In his magnum opus, “The Republic,” Plato presents the concept of a philosopher-king, an individual embodying wisdom and justice, who governs the ideal state (Plato, Republic, Stephanus 473c-d, Book V)1. This enlightened ruler, possessing both knowledge and moral integrity, would guide society toward harmony and justice.

War as Necessary Evil

In the “Laws,” Plato acknowledges that war might be a necessary recourse, particularly when defending the state against injustice (Plato, Laws, Stephanus 625c, Book I)2. However, he also emphasizes the importance of striving for peace and pursuing diplomacy whenever possible.

Striving for Peace

Plato contends that an ideal state should always seek to cultivate peace with other nations, fostering a spirit of friendship and cooperation (Plato, Republic, Stephanus 470a-b, Book V)3. This suggests that, had he been aware of the Hundred Years’ War, Plato would likely have advocated for diplomatic efforts to resolve the conflict.

The Guardians as Defenders of the State

In “The Republic,” Plato introduces the guardians, a class of citizens specifically trained to defend the state and uphold its values (Plato, Republic, Stephanus 414a-b, Book III)4. These individuals, educated in both martial and intellectual pursuits, would protect the state from external threats, such as those posed during the Hundred Years’ War.

Governance in Times of War

Though Plato acknowledges the potential necessity of war, he cautions against allowing it to dominate the state’s agenda (Plato, Republic, Stephanus 468a-b, Book V)5. Instead, he argues that governance should remain focused on promoting the well-being of citizens and maintaining social harmony.

Fostering Social Harmony and Virtue

Plato maintains that an ideal state should strive to cultivate virtue among its citizens, promoting wisdom, courage, moderation, and justice (Plato, Republic, Stephanus 427e-428a, Book IV)6. In this context, the prolonged conflict of the Hundred Years’ War might be seen as a distraction from the state’s primary goal of fostering a harmonious and virtuous society.

Conclusion

Though Plato lived centuries before the Hundred Years’ War, his philosophical ideas can provide valuable insights into how he might have viewed such a conflict. By examining his writings, particularly “The Republic” and “Laws,” we can speculate that Plato would have recognized the potential necessity of war in certain circumstances but would have strongly advocated for peace, diplomacy, and the pursuit of justice and virtue in governance.

References

Title        | Author | Date Written | Chicago citation
The Republic | Plato  | 380 BCE      | Plato. The Republic. Translated by G.M.A. Grube. Indianapolis: Hackett Publishing Company, 1992.
Laws         | Plato  | 348 BCE      | Plato. Laws. Translated by A.E. Taylor. London: Dent, 1934.

Footnotes

  1. Plato, Republic, 473c-d. Perseus Digital Library | The Internet Encyclopedia of Philosophy 
  2. Plato, Laws, 628a-630d. Perseus Digital Library | The Internet Encyclopedia of Philosophy 
  3. Plato, Republic, 470a-471c. Perseus Digital Library | The Internet Encyclopedia of Philosophy 
  4. Plato, Republic, 433b-c. Perseus Digital Library | The Internet Encyclopedia of Philosophy 
  5. Plato, Republic, 414b-415d. Perseus Digital Library | The Internet Encyclopedia of Philosophy 
  6. Plato, Laws, 348, 663a-b. Perseus Digital Library | The Internet Encyclopedia of Philosophy 

AI (Artificial Intelligence) and HI (Human Intelligence) Statement

Modified from Brewin https://www.theguardian.com/books/2024/apr/04/why-i-wrote-an-ai-transparency-statement-for-my-book-and-think-other-authors-should-too

Has any text been generated using AI? Yes
Has any text been improved or corrected using HI? No
Have any methods of analysis been suggested using HI? Yes
Have any methods of analysis been suggested using AI? No
Do any analyses utilize AI technologies, such as Large Language Models, for tasks like analyzing, summarizing, or retrieving information from data? Yes

ChatGPT as an essay-writing assistant – Part I


Introduction and background

In this multi-part series of articles, I will document my efforts in writing a high school Philosophy paper using ChatGPT. The paper will answer, in writing, a question I was asked during an impromptu oral assessment in Philosophy class in my third year of high school.

Let me take a step back to give you some context. I was enrolled in a Classical High School in Rome, Italy. This type of school has a strong five-year emphasis on History and Philosophy, as well as Latin, Greek, and Italian literature and grammar, with a bit of Science and Math. Students are typically evaluated on weekly written assignments, a minimum of three scheduled written tests each semester, and a minimum of two oral assessments per semester. For the oral assessments, the teacher would (usually) pick two names at random from the attendance list once a week and gauge those students’ knowledge both of the weekly lesson, in detail, and of anything covered in the year up to that point; this could last between 30 and 60 minutes, and would happen in front of an audience of (often roaring) classmates. Seriously: brutal!

Only occasionally were students allowed to give a presentation on a prepared topic or volunteer for a full oral evaluation. This was the case for me during a block on Plato. And so it was that the teacher, the famously feared Professoressa Carbone (who taught both History and Philosophy), probed into my soul with her terrifying eyes and asked with a sinister smile, “Tell me, Niccoli, what would Plato have thought about the Hundred Years’ War?”

My first prompt

When I heard that OpenAI’s ChatGPT was available, this question seemed a natural fit for my first investigation. So I went ahead and asked “What would Plato have thought about the Hundred Years’ War?”, with only minor modifications to request citations and to break the response down into steps.

GPT-3.5 result

Below is the result from the default GPT-3.5 model:

Essay written by ChatGPT (GPT-3.5)

Time investment and return

  • Time investment: the whole investigation took about 30 minutes, including a very light read on prompting from a reputable source, and reading the answer (because the citations did not come as full references, there was really nothing else to do other than that).
  • Return: was this worth 30 minutes of my time? The answer is, as often, it depends! On the one hand, this is a really poor answer in my view (see the next section for a more detailed evaluation). So, as an essay (imagining myself as a student looking for a quick free lunch), this would not do it; not even close. Not for me, anyway: even at 13, in my first year of high school, I would have known this was not going to cut it, or in junior high for that matter. On the other hand, I was deliberate in providing a very simple prompt, neither engineered nor iteratively refined. As such, the response will come in useful as a benchmark against which I will compare other efforts (for example, using a better model, GPT-4, and improving the prompt).

Evaluation

Is the answer grammatically correct? Yes. Is it written in a uniform style, appropriate for a school essay? Yes. Does it contain any hallucinations? I would say no; everything sounds reasonable. However, I must admit that without references to specific passages from Plato’s work, it is hard to say for sure. From what I remember of Plato’s philosophy it is reasonable, and that is all.

On the other hand, had I been handed this as a teacher, I would not be impressed. If this were the output of an in-class exam, I probably would have marked it as a D- (a D for the very mediocre text and a minus for the ridiculousness of the citations). Had it instead been handed in as an at-home written assignment (in which case the student would have had access to their notes and books), it would certainly be worth an F (Fail). Now I sound like Professoressa Carbone.

You may reasonably wonder at the end of this article: how did I fare in that evaluation in 1986? Fair question. And I am not ashamed to confess it was a disaster. At 15 I was still very immature, intelligent but unengaged in school work, and also unable to do a real analysis; by that I mean interpreting information rather than just accumulating facts and connecting them. I honestly do not recall the mark I got (we had different levels of failure that went progressively deeper, not unlike the circles in Dante’s Inferno), but I remember that it was definitely an F.

P.S. The model did a perfectly fine job of proofreading my first draft of this post. That was definitely worth a lot, given that I type very slowly and make lots of typos.

Keep advancing your Python coding skills

October 22, 2020

In my last post I touched on the topic of continuously improving your geo-computing projects (also take a look at my chapter from the upcoming Software Underground book, 52 things you should know about geocomputing).

However, one aspect that I intentionally left out was that of coding skills, as I was planning to get back to it with a dedicated post, which you are reading just now.

2018 vs 2020 comparison of flag percentage calculation

In the Jupyter notebook I compare the results of seismic inversion from two methods (with or without inversion-tailored noise attenuation), using a custom function that flags poor predictions of the target well log, with the median and median absolute deviation of the error as the statistic; the results are shown below.
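For context, a robust flag of that kind can be written in a few lines. The sketch below is not the notebook’s actual function, and the threshold k is an assumption:

import numpy as np

def flag_errors(error, k=2.0):
    """Flag samples whose error deviates from the median by more than k median absolute deviations."""
    med = np.median(error)
    mad = np.median(np.abs(error - med))
    return np.abs(error - med) > k * mad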

One may just do this visual comparison, but I also included calculations to count the number and percentage of samples that have been flagged in each case. Below is a cell of code from the Jupyter notebook (let’s call it the 2020 code) that does just that.

# count the flagged samples in each zone
zone_errors_a['flagged samples'] = result_a.groupby('zone', sort=False).flag.sum().values
zone_errors_b['flagged samples'] = result_b.groupby('zone', sort=False).flag.sum().values

def calc_proportion(dtf):
    """Calculate the percentage of flagged samples in a zone."""
    x = dtf.flag
    return round(100 * x.sum() / len(x), 1)

# percentage of flagged samples in each zone
zone_errors_a['proportion (%)'] = result_a.groupby('zone', sort=False).apply(calc_proportion).values
zone_errors_b['proportion (%)'] = result_b.groupby('zone', sort=False).apply(calc_proportion).values

I am a lot happier with this code than with the original code (circa 2018), which is in the cell below.

zones_a=list(result_a['zone'].unique())
zones_b=list(result_b['zone'].unique())
zone_errors_a['flagged samples']=[result_a.loc[result_a.zone==z,'flag'].sum() for z in zones_a]
zone_errors_b['flagged samples']=[result_b.loc[result_b.zone==z,'flag'].sum() for z in zones_b]
zone_errors_a['proportion (%)']=[round(result_a.loc[result_a.zone==z,  'flag'].sum()/len(result_a.loc[result_a.zone==z,'flag'])*100,1) for z in zones_a]                                
zone_errors_b['proportion (%)']=[round(result_b.loc[result_b.zone==z,  'flag'].sum()/len(result_b.loc[result_b.zone==z,'flag'])*100,1) for z in zones_b]                                    

The major differences in the older code are:

  • I was using unique instead of Pandas’ groupby
  • I was using list comprehensions to work through the DataFrame, instead of Pandas’ apply and a custom function to calculate the percentages on the entire DataFrame at once.

I find the 2020 code much tidier and easier to read.
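As an aside, since the flag column appears to hold 0/1 (or boolean) values, the percentage could arguably be computed even more directly with groupby and mean. A possible further simplification, not from the notebook:

# the mean of a 0/1 flag is the proportion of flagged samples; multiply by 100 for a percentage
zone_errors_a['proportion (%)'] = result_a.groupby('zone', sort=False).flag.mean().mul(100).round(1).values
zone_errors_b['proportion (%)'] = result_b.groupby('zone', sort=False).flag.mean().mul(100).round(1).values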

Enter Pandas for Everyone

The above changes happened in but a few hours over two evenings, after having worked through chapters 9 and 10 of Pandas for Everyone by Daniel Chen, a very accessible read for all aspiring data scientists, which I highly recommend (also, watch Daniel’s fully-packed 2019 PyCon tutorial).

And before you ask: no, you do not get the Agile Scientific sticker with the book, I am sorry.

🙂

Comparison of 2016 vs 2020 code snippets from the 2016 SEG Machine Learning contest

A second example is of code used to calculate the first and second derivatives for all geophysical logs from the wells in the 2016 SEG Machine Learning contest.

The two cells of code below do exactly the same thing: loop through the wells and, for each one in turn, loop through the logs, calculate the derivatives, add them to a temporary Pandas DataFrame, then concatenate everything into a single output DataFrame. In this case, the only difference is moving away from unique to groupby.

I use the %%timeit cell magic to compare the runtimes for the two cells.

2016 code
%%timeit
# for training data
# calculate all 1st and 2nd derivative for all logs, for all wells
train_deriv_df = pd.DataFrame()             # final dataframe
for well in train_data['Well Name'].unique():        # for each well
    new_df = pd.DataFrame() # make a new temporary dataframe
   
    for log in ['GR', 'ILD_log10', 'DeltaPHI', 'PHIND' ,'PE']: # for each log
        # calculate and write to temporary dataframe
        new_df[str(log) + '_d1'] = np.array(np.gradient(train_feat_df[log][train_feat_df['Well Name'] == well]))
        new_df[str(log) + '_d2'] = np.array(np.gradient(np.gradient(train_feat_df[log][train_feat_df['Well Name'] == well])))
         
    # append all rows of temporary dataframe to final dataframe          
    train_deriv_df = pd.concat([train_deriv_df, new_df])
86 ms ± 1.47 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
2020 code
%%timeit
# for training data
# calculate all 1st and 2nd derivative for all logs, for all wells
train_deriv_df = pd.DataFrame() # final dataframe
for _, data in train_feat_df.groupby('Well Name'): # for each well        
    new_df = pd.DataFrame()                        # make a new temporary dataframe
   
    for log in ['GR', 'ILD_log10', 'DeltaPHI', 'PHIND' ,'PE']: # for each log
        # calculate and write to temporary dataframe 
        new_df[str(log) + '_d1'] = np.gradient(data[log])
        new_df[str(log) + '_d2'] = np.gradient(np.gradient(data[log]))
    # append all rows of temporary dataframe to final dataframe          
    train_deriv_df = pd.concat([train_deriv_df, new_df])
52.3 ms ± 353 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

We go down from 86 ms to 52.3 ms, which is a modest improvement, but certainly the code is more compact and a whole lot lighter to read (i.e. more pythonic, or pandas-ish if you prefer): I am happy!
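For what it’s worth, one could probably go a step further and drop the well loop and the final concatenation altogether, letting groupby.transform do the per-well work. This is a sketch, not from the notebook, and it assumes each well has at least two samples per log:

import numpy as np
import pandas as pd

# per-well first and second derivatives, index-aligned with the feature DataFrame
train_deriv_df = pd.DataFrame(index=train_feat_df.index)
for log in ['GR', 'ILD_log10', 'DeltaPHI', 'PHIND', 'PE']:
    grouped = train_feat_df.groupby('Well Name')[log]
    train_deriv_df[log + '_d1'] = grouped.transform(np.gradient)
    train_deriv_df[log + '_d2'] = grouped.transform(lambda s: np.gradient(np.gradient(s)))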

As an aside, if you want to know more about timing code execution, see section 1.07 from Jake VanderPlas’ outstanding Python Data Science Handbook, which I also cannot recommend enough (and do yourself a favor: watch his series Reproducible Data Analysis in Jupyter).

By the way, below I show the notebook code comparison generated using the nbdiff-web option from the awesome nbdime library, a recent discovery.

Geoscience Machine Learning bits and bobs – data completeness

2016 Machine learning contest – Society of Exploration Geophysicists

In a previous post I showed how to use pandas.isnull to find out, for each well individually, whether a column has any null values, and sum to get how many there are for each column. Here is one of the examples (with more modern, pandas-ish syntax compared to the example in the previous post):

for well, data in training_data.groupby('Well Name'):
    print(well)
    print(data.isnull().values.any())
    print(data.isnull().sum(), '\n')

Simple and quick, the output showed me that, for example, the well ALEXANDER D is missing 466 samples from the PE log:

ALEXANDER D
True
Facies         0
Formation      0
Well Name      0
Depth          0
GR             0
ILD_log10      0
DeltaPHI       0
PHIND          0
PE           466
NM_M           0
RELPOS         0
dtype: int64

A more appealing and versatile alternative, which I discovered after the contest, comes with the matrix function from the missingno library. With the code below I can turn each well into a Pandas DataFrame on the fly, then into a missingno matrix plot.

import numpy as np
import matplotlib.pyplot as plt
import missingno as msno

for well, data in training_data.groupby('Well Name'):
    msno.matrix(data, color=(0., 0., 0.45))
    fig = plt.gcf()
    fig.set_size_inches(20, np.round(len(data)/100))  # height of the plot for each well reflects well length
    axes = fig.get_axes()
    axes[0].set_title(well, color=(0., 0.8, 0.), fontsize=14, ha='center')

I find that looking at these two plots provides a very compelling and informative way to inspect data completeness, and I wonder whether they could be used, together with domain knowledge from petrophysics, to guide the strategy for dealing with missing data.

Interpreting the dendrogram in a top-down fashion, as suggested in the library documentation, my first thought is that it may be worth trying to predict missing values sequentially rather than for all logs at once. For example, looking at the largest cluster on the left, and starting from the top right, I am thinking of testing the use of GR to first predict missing values in RDEP, then both to predict missing values in RMED, then DTC; then add CALI and use all the logs completed so far to predict RHOB, and so on.
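If you want to reproduce that kind of view, missingno also provides a nullity dendrogram. A minimal sketch is below, shown here on training_data; the dendrogram I describe above may come from a different dataset or options:

import missingno as msno
import matplotlib.pyplot as plt

# nullity dendrogram: logs that tend to be missing together cluster early (low in the tree)
msno.dendrogram(training_data)
plt.show()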

Naturally, this strategy will need to be tested against alternative strategies using lithology prediction accuracy. I would do that in the context of learning curves: I imagine comparing the training and cross-validation error first using only rows with no NaNs, then replacing all NaNs with the column mean, and then comparing this sequential log-completion strategy with an all-in-one strategy.
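A rough sketch of the kind of comparison I have in mind is below; the classifier choice is illustrative and the column names are taken from the SEG 2016 contest data used earlier in this post, so treat it as an outline rather than a worked-out experiment:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import learning_curve
from sklearn.pipeline import make_pipeline

logs = ['GR', 'ILD_log10', 'DeltaPHI', 'PHIND', 'PE']

# strategy 1: use only rows with no missing log values
complete = training_data.dropna(subset=logs)
X1, y1 = complete[logs], complete['Facies']
model1 = RandomForestClassifier(random_state=0)

# strategy 2: keep all rows and replace NaNs with the column mean inside a pipeline
X2, y2 = training_data[logs], training_data['Facies']
model2 = make_pipeline(SimpleImputer(strategy='mean'), RandomForestClassifier(random_state=0))

# (in practice the cross-validation folds should be grouped by well to avoid leakage)
for name, model, X, y in [('drop NaN rows', model1, X1, y1), ('mean imputation', model2, X2, y2)]:
    sizes, train_scores, valid_scores = learning_curve(model, X, y, cv=5, scoring='accuracy')
    print(name, np.round(valid_scores.mean(axis=1), 3))  # cross-validation accuracy at each training size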

Busting bad colormaps with Python and Panel

I have not done much work with, or written here on the blog about, colormaps and perception in quite some time.

Last spring, however, I decided to build a web-based app to show the effects of using a bad colormap. This stemmed from two needs: first, to further my understanding of Panel, after working through the awesome tutorial by James Bednar, Panel: Dashboards (at PyData Austin 2019); and second, to enable people to explore interactively the effects of bad colormaps on their perception, and consequently on their ability to interpret faults on a 3D seismic horizon.

I introduced the app at the Transform 2020 virtual subsurface conference, organized by Software Underground last June. Please watch the recording of my lightning talk as it explains in detail the machinery behind it.

I am writing this post in part to discuss some changes to the app. Here’s how it looks right now:

The most notable change is the switch from one drop-down selector to two drop-down selectors, in order to support both the Matplotlib collection and the Colorcet collection of colormaps. Additionally, the app has since been featured in the resource list on the Awesome Panel site, an achievement I am really proud of.
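For readers curious about the mechanics, the two-selector pattern in Panel can look roughly like the sketch below; this is not the app’s actual code, just the general idea of swapping one selector’s options based on another:

import panel as pn
import matplotlib.pyplot as plt
import colorcet as cc

pn.extension()

collection = pn.widgets.Select(name='Collection', options=['Matplotlib', 'Colorcet'])
cmap = pn.widgets.Select(name='Colormap', options=sorted(plt.colormaps()))

def update_cmaps(event):
    # swap the list of colormap names when the collection selector changes
    cmap.options = sorted(cc.cm.keys()) if event.new == 'Colorcet' else sorted(plt.colormaps())

collection.param.watch(update_cmaps, 'value')

pn.Column(collection, cmap)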


You can try the app yourself by running the notebook interactively with Binder, either by clicking on the button below:
Binder

or by copying and pasting this address into your browser:

https://mybinder.org/v2/gh/mycarta/Colormap-distorsions-Panel-app/master?urlpath=%2Fpanel%2FDemonstrate_colormap_distortions_interactive_Panel

Let’s look at a couple of examples of insights I gained from using the app. For those who jumped straight to this example, the top row shows:

  • the horizon, plotted using the benchmark grayscale colormap, on the left
  • the horizon intensity, derived using skimage.color.rgb2gray, in the middle
  • the Sobel edges detected on the intensity, on the right

and the bottom row shows:

  • the horizon, plotted using the Matplotlib gist_rainbow colormap, on the left
  • the intensity of the colormapped horizon, in the middle. This is possible thanks to a function that makes a figure (but does not display it), plots the horizon with the specified colormap, then saves the plot canvas to an RGB NumPy array (see the sketch after this list)
  • the Sobel edges detected on the colormapped intensity, on the right
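A minimal sketch of how such a render-to-array function can work is below. This is not the app’s code, and the names (colormapped_to_rgb, horizon) are just illustrative:

import numpy as np
import matplotlib.pyplot as plt
from skimage.color import rgb2gray
from skimage.filters import sobel

def colormapped_to_rgb(data, cmap):
    """Plot a 2D array with a colormap and return the rendered image as an RGB array."""
    fig, ax = plt.subplots()
    ax.imshow(data, cmap=cmap)
    ax.set_axis_off()
    fig.canvas.draw()                                    # render to the canvas (assumes an Agg-based backend)
    rgb = np.asarray(fig.canvas.buffer_rgba())[..., :3]  # grab the pixels, drop the alpha channel
    plt.close(fig)                                       # never display the figure
    return rgb

# e.g. intensity and edges for the gist_rainbow row:
# intensity = rgb2gray(colormapped_to_rgb(horizon, 'gist_rainbow'))
# edges = sobel(intensity)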

I think the effects of this colormap are already apparent when comparing the bottom left plot to the top left plot. However, simulating perception can be quite revealing for those who have not considered these effects before. The intensity in the bottom middle plot is very washed out in the areas corresponding to green in the bottom left plot, and as a result many of the faults are no longer visible, or only with much difficulty, as demonstrated by the Sobel edges in the bottom right.

And if you are not quite convinced yet, I have created these hill-shaded maps, using Matt Hall’s delightful function from this notebook (and check out his blog post):

Below is another example, using the Colorcet cet_rainbow, which is one of Peter Kovesi’s perceptually uniform colormaps. I use many of Peter’s colormaps, but I had never used this one, because I use my own perceptual rainbow, which does not have a fully saturated yellow or a fully saturated red. I think the app demonstrates that, even though the artifacts are more subtle, this rainbow is still introducing some. The yellow colour creates narrow flat bands, visible in the intensity and Sobel plots and indicated by yellow arrows; the red colour is also bad, as usual, causing an artificial decrease in intensity (magenta arrows).