Using classes (or, more broadly, object-oriented programming) can really take your coding to the next level. While they can be a bit confusing at first, some of the things you can do with them would be very difficult, if not impossible, to accomplish with function-based programming alone. A very good and concise definition of classes in Python can be found in the official documentation, which I’ll quote verbatim below:
“Classes provide a means of bundling data and functionality together. Creating a new class creates a new type of object, allowing new instances of that type to be made. Each class instance can have attributes attached to it for maintaining its state. Class instances can also have methods (defined by its class) for modifying its state.”
If you’re already using functions in Python, moving to classes is a very manageable step. The approach I will take in this post is to compare a class and a set of functions that do the same thing, namely some data fitting. As we go over the comparison, I will also be referencing bits and pieces of the quote above, hopefully making it clearer.

First of all, you will want to bundle data and functionality that actually belong together. With that in mind, using classes will make sense in the long run.
The attributes are where the data (or state) is kept inside an instance of the class. Because of the nature of classes, each instance exists in the computer memory completely independently of other instances of the same class. In other words, the attributes (or properties) of each instance can hold different values. All instances still share the same methods of the class. These are actions that can be performed by a given instance and that usually modify the attribute values of that particular instance.
If you use functions on a regular basis and look inside a Python class, you’ll recognize that the methods are basically functions. That set of functions that you created to perform related tasks on your data can all be put inside a class. Besides making your code more modular and self-contained, the fact that you can have multiple instances of the same class can come in really handy, as we’ll see later on. All the code in this post can be found on my GitHub page, as well.
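Here’s a minimal sketch of that independence, using a made-up Counter class that has nothing to do with the fitting code:

class Counter:
    """Tiny illustrative class: each instance keeps its own state."""
    def __init__(self):
        self.count = 0        # attribute holding this instance's state

    def increment(self):      # method that modifies this instance's state
        self.count += 1

a = Counter()
b = Counter()
a.increment()
a.increment()
b.increment()
print(a.count, b.count)       # prints 2 1, since each instance is independent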

Let’s get started on our data fitting code. The data part is a set of ordered (x, y) pairs that will be used in a linear regression. The functionality part is the set of actions that can be performed on the data. More specifically: defining the data points, fitting a line to them, and plotting the original data points and the line of best fit.
As a general naming recommendation, variables (or attributes) should be nouns, or groups of nouns, since they contain or define something. Variable names such as figuresize, x_values, and numdatapoints are all typical in Python.
Since functions or methods “do something” with “something”, I always start their names with a verb followed by one or more nouns. So open_figure, define_data, and find_data_points are all valid names. It is a convention in Python to use underscores in function or method names. If you also use underscores in your variable names, that leading verb is all you have to quickly distinguish between attributes (variables) and methods (functions) in your code.
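To illustrate the pattern, here’s a quick, made-up snippet (these names aren’t part of the fitting code, just examples of the convention):

# Nouns (or groups of nouns) for data
sampling_rate = 100                        # Hz
temperature_readings = [20.5, 21.0, 21.4]

# Verb + noun(s) for actions
def compute_average(readings):
    return sum(readings) / len(readings)

def print_summary(readings):
    print('Average temperature = {:1.1f}'.format(compute_average(readings)))

print_summary(temperature_readings)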
Data Fitting Functions
With that said, let’s check below the three (very simple) functions that will be used. One important aspect is that most functions have input arguments and return values. In this simple example, the functions define_data and fit_data return dictionaries containing the output information. In larger projects, functions may be calling functions, which in turn call more functions, and handling and tracking inputs and outputs can become pretty complex. At that point, many people resort to global variables, which, in my opinion, should be avoided at all costs. Having local variables that stay local to their functions will make debugging the code much more straightforward.
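As a quick aside before we get to the fitting functions, here is a minimal, made-up contrast between the two approaches (the sales example is purely illustrative):

# Relying on a global variable: any function can change it,
# which makes it hard to track what modified it and when.
total = 0

def add_sale_global(amount):
    global total
    total += amount

# Keeping things local: inputs and outputs are explicit,
# so the data flow is visible at every call site.
def add_sale(total, amount):
    return total + amount

running_total = add_sale(0, 10)
running_total = add_sale(running_total, 5)

Now, on to the actual fitting functions.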
import numpy as np
import plotly.graph_objects as go
from scipy.stats import linregress
def define_data(x, y, xname=None, yname=None):
    """
    Creates dictionary containing data information.
    """
    if len(x) == len(y):
        data = {
            'xname': xname,
            'yname': yname,
            'x': x,
            'y': y,
        }
    else:
        raise Exception("'x' and 'y' must have the same length.")
    return data


def fit_data(data):
    """
    Calculates linear regression to data points.
    """
    f = linregress(data['x'], data['y'])
    fit = {
        'slope': f.slope,
        'intercept': f.intercept,
        'r2': f.rvalue**2
    }
    print('Slope = {:1.3f}'.format(fit['slope']))
    print('Intercept = {:1.3f}'.format(fit['intercept']))
    print('R-squared = {:1.3f}'.format(fit['r2']))
    return fit


def plot_data(data, fit):
    """
    Creates scatter plot of data and best fit regression line.
    """
    # Making sure x and y values are numpy arrays
    x = np.array(data['x'])
    y = np.array(data['y'])
    # Creating plotly figure
    fig = go.Figure()
    # Adding data points
    fig.add_trace(
        go.Scatter(
            name='data',
            x=x,
            y=y,
            mode='markers',
            marker=dict(size=10, color='#FF0F0E')
        )
    )
    # Adding regression line
    fig.add_trace(
        go.Scatter(
            name='fit',
            x=x,
            y=fit['slope']*x + fit['intercept'],
            mode='lines',
            line=dict(dash='dot', color='#202020')
        )
    )
    # Adding other figure objects
    fig.update_xaxes(title_text=data['xname'])
    fig.update_yaxes(title_text=data['yname'])
    fig.update_layout(
        paper_bgcolor='#F8F8F8',
        plot_bgcolor='#FFFFFF',
        width=600, height=300,
        margin=dict(l=60, r=30, t=30, b=30),
        showlegend=False
    )
    fig.show()
Below is a little program that uses the three functions to fit some x, y values. Notice how inputs and outputs have to be passed around between functions. Also, if we want to fit a different set of data points, we start carrying around multiple variables in memory, which can eventually be overwritten or confused with one another. By the way, I’m assuming you saved a file named fitfunctions.py in a modules folder that is either on the Python path or in your current working folder.
from modules.fitfunctions import define_data, fit_data, plot_data
# Defining first data set
x1 = [0, 1, 2, 3, 4]
y1 = [2.1, 2.8, 4.2, 4.9, 5.1]
data1 = define_data(x1, y1, xname='x1', yname='y1')
# Fitting data
fit1 = fit_data(data1)
# Plotting results
plot_data(data1, fit1)
# Defining second data set
x2 = [0, 1, 2, 3, 4]
y2 = [3, 5.1, 6.8, 8.9, 11.2]
data2 = define_data(x2, y2, xname='x2', yname='y2')
# Fitting data
fit2 = fit_data(data2)
# Plotting results
plot_data(data2, fit2)
Data Fitting Class
Now let’s take a look at a class (saved in fitclass.py in the same modules folder) that contains the three functions as methods. As far as naming conventions in Python are concerned, classes use UpperCamelCase, so our class will be called FitData. In addition to the three functions (now methods inside the class), there’s a constructor method, which by convention is named __init__. It’s in the constructor that we can initialize attributes, in this case the data and fit dictionaries, as well as call other methods. Notice that the first argument to every method is self, which will hold the pointer to the class instance once that instance is created, and is how attributes and methods are accessed inside the class.
By passing only self to the fit_data method, I can access any attribute or method of the class through the self argument with dot notation. For instance, self.data['x'] gets the array of x values anytime I need it inside the class.
By the same token, methods don’t need to have return values (although sometimes you may want them to). They can use the self pointer to store results as class attributes. The method fit_data gets the x, y values from the data attribute and stores the fitting parameters in the fit attribute, such as self.fit['slope'] = f.slope.
import numpy as np
import plotly.graph_objects as go
from scipy.stats import linregress
class FitData:

    def __init__(self):
        """
        Class constructor.
        """
        self.data = dict()
        self.fit = dict()

    def define_data(self, x, y, xname=None, yname=None):
        """
        Creates dictionary containing data information.
        """
        if len(x) == len(y):
            self.data['x'] = x
            self.data['y'] = y
            self.data['xname'] = xname
            self.data['yname'] = yname
        else:
            raise Exception("'x' and 'y' must have the same length.")

    def fit_data(self):
        """
        Calculates linear regression to data points.
        """
        f = linregress(self.data['x'], self.data['y'])
        self.fit['slope'] = f.slope
        self.fit['intercept'] = f.intercept
        self.fit['r2'] = f.rvalue**2
        print('Slope = {:1.3f}'.format(self.fit['slope']))
        print('Intercept = {:1.3f}'.format(self.fit['intercept']))
        print('R-squared = {:1.3f}'.format(self.fit['r2']))

    def plot_data(self):
        """
        Creates scatter plot of data and best fit regression line.
        """
        # Making sure x and y values are numpy arrays
        x = np.array(self.data['x'])
        y = np.array(self.data['y'])
        # Creating plotly figure
        fig = go.Figure()
        # Adding data points
        fig.add_trace(
            go.Scatter(
                name='data',
                x=x,
                y=y,
                mode='markers',
                marker=dict(size=10, color='#FF0F0E')
            )
        )
        # Adding regression line
        fig.add_trace(
            go.Scatter(
                name='fit',
                x=x,
                y=self.fit['slope']*x + self.fit['intercept'],
                mode='lines',
                line=dict(dash='dot', color='#202020')
            )
        )
        # Adding other figure objects
        fig.update_xaxes(title_text=self.data['xname'])
        fig.update_yaxes(title_text=self.data['yname'])
        fig.update_layout(
            paper_bgcolor='#F8F8F8',
            plot_bgcolor='#FFFFFF',
            width=600, height=300,
            margin=dict(l=60, r=30, t=30, b=30),
            showlegend=False
        )
        fig.show()
The program below can be contrasted with the one that employed the functions. We start by creating two instances of the FitData class, which will be two distinct and independent objects in the computer memory. Each instance now holds its own self pointer and can therefore access attributes and call methods using dot notation. fitdata1.plot_data() will run that method with whatever attributes it needs that are already stored in that particular instance. Also, print(fitdata2.data) would display the values assigned to the data attribute of the fitdata2 object.
from modules.fitclass import FitData
# Creating class instances
fitdata1 = FitData()
fitdata2 = FitData()
# Defining first data set
x1 = [0, 1, 2, 3, 4]
y1 = [2.1, 2.8, 4.2, 4.9, 5.1]
fitdata1.define_data(x1, y1, xname='x1', yname='y1')
# Fitting data
fitdata1.fit_data()
# Plotting results
fitdata1.plot_data()
# Defining second data set
x2 = [0, 1, 2, 3, 4]
y2 = [3, 5.1, 6.8, 8.9, 11.2]
fitdata2.define_data(x2, y2, xname='x2', yname='y2')
# Fitting data
fitdata2.fit_data()
# Plotting results
fitdata2.plot_data()
You should also observe that all the variables that were being passed around are now encapsulated inside each object, making for code that is more organized and less prone to errors. Of course, these examples are quite simple, and keeping track of what’s going on is still pretty straightforward either way.
Finally, similar to functions, methods can call other methods. But since input arguments and return values can all be stored in class attributes, nesting method calls becomes much cleaner and easier than nesting function calls.
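To make that concrete, here is a sketch of what such a convenience method could look like. The FitDataRunner subclass and its run_fit method are my own illustrative additions and are not part of the FitData class shown above:

from modules.fitclass import FitData

class FitDataRunner(FitData):
    """Hypothetical subclass illustrating a method that calls other methods."""
    def run_fit(self, x, y, xname=None, yname=None):
        # Each call below reads from and writes to self, so nothing
        # needs to be passed between the methods explicitly.
        self.define_data(x, y, xname=xname, yname=yname)
        self.fit_data()
        self.plot_data()

runner = FitDataRunner()
runner.run_fit([0, 1, 2, 3, 4], [2.1, 2.8, 4.2, 4.9, 5.1], xname='x', yname='y')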
Random Remarks
I was first exposed to object-oriented programming a little over six years ago, when I was playing around with a LEGO EV3 Mindstorms set. It was in MATLAB, which was my programming language of many, many years. It took me some time to wrap my head around the concept, but once I understood its power, a whole new dimension of programming opened up! I then went on to write the code for an entire engine test cell automation system: GUIs, instrumentation interfaces, and everything in between. The level of complexity and modularization required to succeed was only achievable through the use of classes.
Speaking of GUIs, those are built upon the concept of classes. If you ever venture down that path, that’s another strong reason to learn more about classes and start using them. Once you get it, you’ll never look back.